Notebook fundamentals

Introduction to Databricks Lakehouse

Gang Wang

Senior Data Scientist

What is a Databricks notebook?

recraft: half: A scientist at a clean modern laboratory workbench with multiple colorful instruments and tools arranged neatly, representing a multi-tool development environment

An interactive document made up of runnable code cells
Attached to a cluster, which executes the code
Mixes code, results, and documentation in one place
Supports Python, SQL, Scala, and R

Magic commands

# Default language: Python
df = spark.table("silver_taxi_trips")
display(df)

%sql
SELECT COUNT(*) AS total_trips
FROM silver_taxi_trips

%md
## Analysis notes
Revenue is **highest** in the Northeast region.
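The same count can also be run from a Python cell without switching languages. The sketch below assumes the `spark` session that Databricks notebooks predefine; `count_trips` is a hypothetical helper name used only for illustration.

```python
def count_trips(spark, table_name="silver_taxi_trips"):
    """Count the rows of a table by running SQL from a Python cell."""
    # spark.sql returns a DataFrame holding the query result
    result_df = spark.sql(f"SELECT COUNT(*) AS total_trips FROM {table_name}")
    # first() returns the single result row; index it by the column alias
    return result_df.first()["total_trips"]

# In a notebook cell attached to a cluster (spark is predefined there):
# total = count_trips(spark)
```

Passing `spark` in as an argument keeps the helper easy to test outside a notebook.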

Available magic commands

Command	Purpose
`%python`	Run Python code
`%sql`	Run SQL queries
`%scala`	Run Scala code
`%r`	Run R code
`%md`	Render Markdown
`%sh`	Run shell commands
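One way to picture the table above: the first line of a cell decides how the cell is run, and a cell without a magic command uses the notebook's default language. The sketch below is an illustrative model only, not how Databricks implements magic commands; `cell_language` and `LANGUAGE_MAGICS` are hypothetical names.

```python
# Illustrative model only: NOT the Databricks implementation.
LANGUAGE_MAGICS = {
    "%python": "python",
    "%sql": "sql",
    "%scala": "scala",
    "%r": "r",
    "%md": "markdown",
    "%sh": "shell",
}

def cell_language(cell_text, default="python"):
    """Return the language a cell would run as, based on its first line."""
    lines = cell_text.strip().splitlines()
    first_line = lines[0].strip() if lines else ""
    first_token = first_line.split()[0] if first_line else ""
    # Anything that is not a known language magic falls back to the default
    return LANGUAGE_MAGICS.get(first_token, default)

print(cell_language("%sql\nSELECT COUNT(*) AS total_trips\nFROM silver_taxi_trips"))
print(cell_language('df = spark.table("silver_taxi_trips")'))
```

The first call prints `sql` and the second prints `python`, matching the cells shown earlier.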

radial: Cluster, python, sql, md, scala, r, sh

Running another notebook with %run

%run executes another notebook inline, in the current notebook's context
Its functions and variables become available in the calling notebook
%run must be the only command in its cell
Useful for reusable utilities and shared config

%run /Shared/utils/data_helpers

# Next cell: use a function defined
# in data_helpers
clean_df = clean_nulls(raw_df)
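The slides do not show what `data_helpers` contains. As a minimal sketch, assuming `clean_nulls` simply drops rows that contain null values, the helper notebook might define something like the following; the `subset` parameter is a hypothetical addition.

```python
# Hypothetical contents of /Shared/utils/data_helpers (an assumption, not shown in the course)
def clean_nulls(df, subset=None):
    """Drop rows with nulls in any column, or only in the `subset` columns if given."""
    # DataFrame.dropna accepts a `subset` argument in both PySpark and pandas
    return df.dropna(subset=subset)
```

Once the calling notebook has executed `%run /Shared/utils/data_helpers`, `clean_nulls` can be called there as if it had been defined locally.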

Interpreting results

Code cells show their output directly below the cell
SQL queries render as interactive tables
DataFrames passed to display() render with built-in visualization options
Errors show stack traces with line numbers
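To see what a stack trace contains, the sketch below triggers an error on purpose and captures the traceback as text. In a notebook, an uncaught exception would show similar text under the cell; `average_fare` is a hypothetical function used only for illustration.

```python
import traceback

def average_fare(fares):
    """Return the mean fare; fails on an empty list."""
    return sum(fares) / len(fares)

try:
    average_fare([])
except ZeroDivisionError:
    # format_exc() returns the same text Python would print for an uncaught error
    trace_text = traceback.format_exc()
    print(trace_text)
```

The printed trace names the failing function and the line where the division happened, which is usually the quickest place to start debugging.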

recraft: half: A laptop screen displaying a colorful data dashboard with charts, tables, and a code cell, representing notebook output and visualization

Summary

Notebooks are interactive documents attached to clusters
Magic commands let you mix Python, SQL, Scala, R, Markdown, and shell commands
%run loads functions and variables from other notebooks into your session
Results render inline with built-in tables and charts

Let's practice!
