Machine Learning with PySpark
Andrew Collier
Data Scientist, Fathom Data

Languages for interacting with Spark.
From Python import the pyspark module.
import pyspark
Check version of the pyspark module.
pyspark.__version__
'2.4.1'
In addition to pyspark there are the submodules
pyspark.sql,
pyspark.streaming,
pyspark.mllib (deprecated) and
pyspark.ml.

Remote Cluster using Spark URL: spark://<IP address | DNS name>:<port>
Example:
spark://13.59.151.161:7077

Local Cluster
Examples:
local: only 1 core;
local[4]: 4 cores; or
local[*]: all available cores.

from pyspark.sql import SparkSession
Create a local cluster using a SparkSession builder.
spark = SparkSession.builder \
    .master('local[*]') \
    .appName('first_spark_application') \
    .getOrCreate()
Interact with Spark...
# Close connection to Spark
spark.stop()
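
The master strings listed earlier (remote spark:// URLs and the local variants) can be assembled with a small helper. This is a minimal sketch, not part of pyspark: the names master_url and run_with_session are hypothetical, and the default port 7077 is assumed from the example URL above. The second function shows one way to make sure the connection is always closed, even if the work in between fails.

```python
from typing import Optional


def master_url(host: Optional[str] = None, port: int = 7077,
               cores: Optional[int] = None) -> str:
    """Build a Spark master string (hypothetical helper, not a pyspark API)."""
    if host is not None:
        # Remote cluster: spark://<IP address | DNS name>:<port>
        return f"spark://{host}:{port}"
    if cores is None:
        # Local cluster using all available cores
        return "local[*]"
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # 'local' means a single core; 'local[n]' means n cores
    return "local" if cores == 1 else f"local[{cores}]"


def run_with_session(master: str, app_name: str = "first_spark_application"):
    """Start a session, do some work, and always close the connection."""
    # Imported here so master_url() can be used without pyspark installed
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master(master).appName(app_name).getOrCreate()
    try:
        # Interact with Spark...
        return spark.version
    finally:
        # Close connection to Spark
        spark.stop()


print(master_url())                      # local[*]
print(master_url(cores=4))               # local[4]
print(master_url(host="13.59.151.161"))  # spark://13.59.151.161:7077
```

With pyspark installed, run_with_session(master_url()) would start a local cluster on all available cores, return the Spark version, and stop the session.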