Congratulations!

Big Data Fundamentals with PySpark

Upendra Devisetty

Science Analyst, CyVerse

Fundamentals of Big Data and Apache Spark

  • Chapter 1: Fundamentals of Big Data and introduction to Spark as a distributed computing framework

    • Main components: Spark Core and Spark built-in libraries - Spark SQL, Spark MLlib, GraphX, and Spark Streaming

    • PySpark: Apache Spark’s Python API to execute Spark jobs

    • PySpark shell: For developing interactive applications in Python

    • Spark modes: Local and cluster mode


Spark components

  • Chapter 2: Introduction to RDDs, different features of RDDs, methods of creating RDDs, and RDD operations (transformations and actions)

  • Chapter 3: Introduction to Spark SQL, DataFrame abstraction, creating DataFrames, DataFrame operations and visualizing Big Data through DataFrames

  • Chapter 4: Introduction to Spark MLlib, the three C's of Machine Learning (Collaborative filtering, Classification and Clustering)


Where to go next?

