Congratulations!

Big Data Fundamentals with PySpark

Upendra Devisetty

Science Analyst, CyVerse

Fundamentals of BigData and Apache Spark

  • Chapter 1: Fundamentals of BigData and introduction to Spark as a distributed computing framework

    • Main components: Spark Core and Spark built-in libraries - Spark SQL, Spark MLlib, GraphX, and Spark Streaming

    • PySpark: Apache Spark’s Python API to execute Spark jobs

    • PySpark shell: For developing interactive applications in Python

    • Spark modes: Local and cluster mode


Spark components

  • Chapter 2: Introduction to RDDs, different features of RDDs, methods of creating RDDs and RDD operations (Transformations and Actions)

  • Chapter 3: Introduction to Spark SQL, DataFrame abstraction, creating DataFrames, DataFrame operations and visualizing Big Data through DataFrames

  • Chapter 4: Introduction to Spark MLlib, the three C's of Machine Learning (Collaborative filtering, Classification and Clustering)


Where to go next?

