Big Data Fundamentals with PySpark
Upendra Devisetty
Science Analyst, CyVerse
Chapter 1: Fundamentals of Big Data and an introduction to Spark as a distributed computing framework
Main components: Spark Core and Spark built-in libraries - Spark SQL, Spark MLlib, GraphX, and Spark Streaming
PySpark: Apache Spark’s Python API to execute Spark jobs
PySpark shell: For developing interactive applications in Python
Spark modes: local mode and cluster mode (a minimal local-mode example is sketched below)
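A minimal sketch of a first PySpark job run in local mode. In the PySpark shell, a SparkContext is already available as `sc`; the explicit setup, the `local[*]` master string, and the sample data below are illustrative assumptions for standalone scripts:

```python
# Minimal sketch: run a first Spark job in local mode.
# In the PySpark shell, `sc` already exists; creating one explicitly
# is only needed in standalone scripts.
from pyspark import SparkContext

sc = SparkContext(master="local[*]", appName="FirstApp")

# Distribute a small Python list across the available local cores
nums = sc.parallelize([1, 2, 3, 4, 5])
print(nums.count())  # 5

sc.stop()
```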
Chapter 2: Introduction to RDDs, their key features, methods of creating RDDs, and RDD operations (Transformations and Actions)
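A short sketch of the Chapter 2 ideas using illustrative data: two ways of creating an RDD, lazy Transformations, and eager Actions (the file path in the comment is a placeholder):

```python
# Sketch: creating RDDs and applying Transformations (lazy) and
# Actions (eager). All values are illustrative.
from pyspark import SparkContext

sc = SparkContext(master="local[*]", appName="RDDBasics")

# Creating RDDs: from a Python collection or from an external file
rdd = sc.parallelize(range(1, 6))
# rdd_from_file = sc.textFile("data.txt")  # placeholder path

# Transformations return new RDDs and are not computed yet
squares = rdd.map(lambda x: x * x)
evens = squares.filter(lambda x: x % 2 == 0)

# Actions trigger execution and return results to the driver
print(evens.collect())                     # [4, 16]
print(squares.reduce(lambda a, b: a + b))  # 55

sc.stop()
```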
Chapter 3: Introduction to Spark SQL, the DataFrame abstraction, creating DataFrames, DataFrame operations, and visualizing Big Data through DataFrames
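A brief sketch of the Chapter 3 DataFrame abstraction: building a DataFrame, filtering with the DataFrame API, and running the equivalent SQL query. The column names and rows are made-up examples; for visualization, a common pattern is pulling a small result into pandas with toPandas():

```python
# Sketch: DataFrame creation, DataFrame operations, and SQL queries.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("DataFrameBasics").getOrCreate()

people = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45), ("Carol", 29)],
    ["name", "age"],
)

# DataFrame API: filter and select
people.filter(people.age > 30).select("name").show()

# Equivalent SQL query over a temporary view
people.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 30").show()

# For plotting, small results are often converted to pandas first
# pdf = people.toPandas()

spark.stop()
```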
Chapter 4: Introduction to Spark MLlib and the three C's of Machine Learning (Collaborative filtering, Classification, and Clustering)
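A minimal sketch of one of the three C's, Clustering, using spark.mllib's KMeans; the 2-D feature vectors and the choice of k are toy assumptions:

```python
# Sketch: K-means clustering with Spark MLlib on toy 2-D points.
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

sc = SparkContext(master="local[*]", appName="MLlibKMeans")

points = sc.parallelize([
    [0.0, 0.0], [0.1, 0.1],   # points near the origin
    [9.0, 9.0], [9.1, 9.1],   # points far from the origin
])

# Train a model with k=2 clusters
model = KMeans.train(points, k=2, maxIterations=10)

print(model.clusterCenters)         # two learned cluster centers
print(model.predict([0.05, 0.05]))  # index of the nearest center

sc.stop()
```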