Working with Databricks notebooks

Data Transformation with Spark SQL in Databricks

Disha Mukherjee

Lead Data Engineer

Your instructor

Disha Mukherjee

  • Lead Data Engineer
  • Worked at Ford Credit, Just Eat and PayU

 

Instructor photo

Introduction to Databricks SQL

  • You will apply those skills inside Databricks 🧱

Introduction to PySpark

What to expect

Icons for notebooks, data, transformations

What is Databricks?

Databricks workspace overview

  • Provides tools for engineering and analytics 📊
  • Handles provisioning and scaling for us 🔄

Clusters and Spark

Diagram of driver and workers

  • Spark splits large datasets across workers and processes them in parallel 🚀
  • Exercises use serverless compute - same architecture, no setup

Databricks notebooks

Databricks notebook with multiple cells

Dataset introduction

Preview table of customer transactions

Unity Catalog Volumes

Catalog UI showing a volume with Upload to volume option

Loading data into a DataFrame

path = "/Volumes/.../default/transactions.csv"
df = spark.read.csv(path, header=True, inferSchema=True)
df:pyspark.sql.connect.dataframe.DataFrame
    ID:integer
    Date:timestamp
    Customer_ID:string
    ....
    Transaction_Status:string
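
As a minimal sketch of the loading step above: Unity Catalog volume paths follow the pattern /Volumes/<catalog>/<schema>/<volume>/<file>. The helper names volume_path and load_csv, and the catalog and volume names in the usage comment, are hypothetical and not taken from the course; the elided part of the slide's path depends on your own workspace. load_csv spells out the same read as spark.read.csv(path, header=True, inferSchema=True) through the generic reader API.

```python
def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build a path of the form /Volumes/<catalog>/<schema>/<volume>/<parts...>."""
    return "/".join(["/Volumes", catalog, schema, volume, *parts])


def load_csv(spark, path: str):
    """Read a CSV file that has a header row, letting Spark infer column types.

    `spark` is the SparkSession; in a Databricks notebook it is predefined.
    """
    return (
        spark.read.format("csv")
        .option("header", "true")       # first line holds column names
        .option("inferSchema", "true")  # Spark scans the data to guess types
        .load(path)
    )


# Hypothetical usage inside a notebook (names are placeholders, not the course's):
# path = volume_path("my_catalog", "default", "my_volume", "transactions.csv")
# df = load_csv(spark, path)
```

Schema inference costs an extra pass over the file, which is fine for a small exercise dataset but worth knowing about for larger ones.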

Inspecting the DataFrame

df.printSchema()
root
 |-- ID: integer (nullable = true)
 |-- Date: timestamp (nullable = true)
 ...
 |-- Transaction_Status: string (nullable = true)
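
Because the types above were inferred rather than declared, it can help to check them in code instead of by eye. The sketch below is one possible way to do that, using the DataFrame's dtypes attribute, which is a list of (column name, type string) pairs. The helper name find_type_mismatches and the EXPECTED mapping are hypothetical; only the three columns whose types are visible in the printed schema are listed.

```python
def find_type_mismatches(df, expected: dict) -> dict:
    """Return {column: (expected_type, actual_type)} for every column that
    is missing or has a different type than expected."""
    actual = dict(df.dtypes)  # e.g. {"ID": "int", "Date": "timestamp", ...}
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Types read off the printSchema() output; note that dtypes reports
# an integer column as "int".
EXPECTED = {
    "ID": "int",
    "Date": "timestamp",
    "Transaction_Status": "string",
}

# Hypothetical usage in a notebook:
# problems = find_type_mismatches(df, EXPECTED)
# An empty dict means every listed column has the expected type.
```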

Previewing the data

df.show(5, truncate=False)

Previewing outputs in Databricks
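
The two preview styles can be wrapped in one small helper. This is only a sketch: the name preview and its render parameter are hypothetical. In a Databricks notebook you could pass the built-in display function as render to get the interactive table; without it, the helper falls back to the plain-text df.show output used above.

```python
def preview(df, n: int = 5, render=None):
    """Show the first n rows of a DataFrame.

    render: optional callable that takes a DataFrame, for example the
    Databricks notebook built-in `display`. If omitted, print a text
    table without truncating long values.
    """
    sample = df.limit(n)  # keep the preview small
    if render is not None:
        render(sample)
    else:
        sample.show(n, truncate=False)
    return sample


# Hypothetical usage in a notebook:
# preview(df)                  # plain-text table
# preview(df, render=display)  # rich table (Databricks notebooks only)
```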

Driver logs

Driver logs

Let's practice!
