Working with Databricks notebooks

Data Transformation with Spark SQL in Databricks

Disha Mukherjee

Lead Data Engineer

Your instructor

Disha Mukherjee


Introduction to Databricks SQL

Introduction to PySpark

Icons for notebooks, data, transformations

Databricks workspace overview

Diagram of driver and workers

Databricks notebook with multiple cells

Preview table of customer transactions

Catalog UI showing a volume with Upload to volume option

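# Path to the CSV file uploaded to the volume
path = "/Volumes/.../default/transactions.csv"
# Treat the first row as column names and let Spark infer each column's type
df = spark.read.csv(path, header=True, inferSchema=True)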

df:pyspark.sql.connect.dataframe.DataFrame
    ID:integer
    Date:timestamp
    Customer_ID:string
    ...
    Transaction_Status:string

df.printSchema()

root
 |-- ID: integer (nullable = true)
 |-- Date: timestamp (nullable = true)
 ...
 |-- Transaction_Status: string (nullable = true)
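
Because inferSchema guesses types from the data, it can help to compare the result against the types you expect. The sketch below is one possible way to do that, not part of the original slides: the helper name find_type_mismatches and the EXPECTED_TYPES mapping are hypothetical, and only the four columns visible in the output above are listed. It works on the (name, type) pairs that df.dtypes returns, which use short type names such as "int" rather than "integer". A hard-coded sample list stands in for df.dtypes so the code also runs outside a notebook.

```python
# Expected types for the columns visible in the schema output above.
# df.dtypes reports short type names, e.g. "int" for an integer column.
EXPECTED_TYPES = {
    "ID": "int",
    "Date": "timestamp",
    "Customer_ID": "string",
    "Transaction_Status": "string",
}


def find_type_mismatches(dtypes, expected):
    """Return {column: (expected_type, actual_type)} for every expected
    column that is missing or has a different type.

    dtypes is a list of (column_name, type_name) pairs, the shape of df.dtypes.
    """
    actual = dict(dtypes)
    mismatches = {}
    for name, want in expected.items():
        got = actual.get(name)  # None when the column is missing
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


# Hypothetical stand-in for df.dtypes, with Date inferred as a string
# to simulate a case where inference did not produce a timestamp.
sample_dtypes = [
    ("ID", "int"),
    ("Date", "string"),
    ("Customer_ID", "string"),
    ("Transaction_Status", "string"),
]

mismatches = find_type_mismatches(sample_dtypes, EXPECTED_TYPES)
print(mismatches)  # {'Date': ('timestamp', 'string')}
```

In a notebook, the call would be find_type_mismatches(df.dtypes, EXPECTED_TYPES); an empty dictionary means every listed column has the expected type.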

# Print the first 5 rows without truncating long cell values
df.show(5, truncate=False)

Previewing outputs in Databricks
