Databricks Concepts
Kevin Barlow
Data Practitioner

Spark is a highly flexible framework that can read from a wide range of data sources and file types.
Common data sources and types:



# Delta table
spark.read.table("table_name")

# CSV files
spark.read.format("csv").load("*.csv")

# Postgres table (read over JDBC; chained calls need wrapping parentheses)
(spark.read.format("jdbc")
    .option("driver", driver)
    .option("url", url)
    .option("dbtable", table)
    .option("user", user)
    .option("password", password)
    .load())
A Delta table provides table-like qualities (ACID transactions, schema enforcement, time travel) on top of an open file format: Parquet data files plus a JSON transaction log.



DataFrames are two-dimensional representations of data, organized into rows and named columns.
| id | customerName | bookTitle |
|---|---|---|
| 1 | John Data | Guide to Spark |
| 2 | Sally Bricks | SQL for Data Engineering |
| 3 | Adam Delta | Keeping Data Clean |
df = (spark.read
      .format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("/data.csv"))
Kinds of tables in Databricks

Managed table -- Databricks manages both the catalog metadata and the underlying data files:
df.write.saveAsTable(table_name)
CREATE TABLE table_name
USING delta
AS ...

External (unmanaged) table -- Databricks manages only the metadata; the data files live at a storage path you control:
df.write.option("path", "<path>").saveAsTable(table_name)
CREATE TABLE table_name
USING delta
LOCATION "<path>"
AS ...