Databricks Concepts
Kevin Barlow
Data Practitioner
Spark is a highly flexible framework and can read from various data sources/types.
Common data sources and types:
# Delta table
spark.read.table(table_name)   # name of an existing table

# CSV files
spark.read.format("csv").load("*.csv")

# Postgres table (via JDBC)
(spark.read.format("jdbc")
    .option("driver", driver)        # JDBC driver class
    .option("url", url)              # JDBC connection string
    .option("dbtable", table)        # source table name
    .option("user", user)
    .option("password", password)
    .load())
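The driver, url, table, user, and password values above are ordinary Python variables defined earlier in the notebook. A minimal sketch of what they might hold, using entirely hypothetical connection details and a Databricks secret scope for the credentials:
# Hypothetical Postgres connection details; host, database, and scope names are placeholders
driver = "org.postgresql.Driver"
url = "jdbc:postgresql://<host>:5432/<database>"
table = "customers"                                             # hypothetical source table
user = dbutils.secrets.get(scope="jdbc-creds", key="username")  # avoid hard-coding credentials
password = dbutils.secrets.get(scope="jdbc-creds", key="password")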
A Delta table provides table-like qualities to an open file format.
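A minimal sketch of what that means in practice (the /tmp/delta/example path is hypothetical): writing any DataFrame in Delta format produces ordinary Parquet data files plus a _delta_log transaction log, and the result reads back like a table.
# Write a tiny DataFrame in Delta format to a hypothetical path
spark.range(3).write.format("delta").mode("overwrite").save("/tmp/delta/example")

# On disk this is open Parquet files plus a _delta_log/ directory (the transaction log);
# reading it back gives table-like behavior (schema, ACID transaction history)
delta_df = spark.read.format("delta").load("/tmp/delta/example")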
DataFrames are two-dimensional representations of data.
| id | customerName | bookTitle                |
|----|--------------|--------------------------|
| 1  | John Data    | Guide to Spark           |
| 2  | Sally Bricks | SQL for Data Engineering |
| 3  | Adam Delta   | Keeping Data Clean       |
# Read a CSV file into a DataFrame
df = (spark.read
      .format("csv")
      .option("header", "true")       # first row holds the column names
      .option("inferSchema", "true")  # infer column data types from the data
      .load("/data.csv"))
Kinds of tables in Databricks
# Managed table: Databricks manages both the metadata and the underlying files
df.write.saveAsTable(table_name)

-- SQL equivalent
CREATE TABLE table_name
USING delta
AS ...
# External table: data is stored at a storage path you specify
(df.write
    .option("path", "<path>")
    .saveAsTable(table_name))

-- SQL equivalent
CREATE TABLE table_name
USING delta
LOCATION "<path>"
AS ...
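One way to tell the two kinds apart (a sketch, assuming table_name already exists) is to inspect the table's metadata, where the Type field reports MANAGED or EXTERNAL:
# Inspect table metadata; the "Type" row reports MANAGED or EXTERNAL
spark.sql(f"DESCRIBE TABLE EXTENDED {table_name}").show(truncate=False)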