Data Transformation with Spark SQL in Databricks
Disha Mukherjee
Lead Data Engineer







path = "/Volumes/.../default/transactions.csv"
df = spark.read.csv(path, header=True, inferSchema=True)

df: pyspark.sql.connect.dataframe.DataFrame
  ID: integer
  Date: timestamp
  Customer_ID: string
  ...
  Transaction_Status: string

df.printSchema()

root
 |-- ID: integer (nullable = true)
 |-- Date: timestamp (nullable = true)
 ...
 |-- Transaction_Status: string (nullable = true)

df.show(5, truncate=False)
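
Because inferSchema=True lets Spark guess each column type from the data, it can help to confirm the guess before transforming anything. The sketch below is one possible check, not part of the original slides: the helper name find_schema_mismatches and the EXPECTED_TYPES mapping are hypothetical, and only the four columns visible in the schema output are listed (the elided columns are left out). It relies on DataFrame.dtypes, which returns (name, type) pairs using Spark's short type names, so an integer column appears as "int".

```python
# Expected short type names for the columns visible in the schema output.
# Assumption: DataFrame.dtypes reports IntegerType as "int", not "integer".
EXPECTED_TYPES = {
    "ID": "int",
    "Date": "timestamp",
    "Customer_ID": "string",
    "Transaction_Status": "string",
}


def find_schema_mismatches(dtypes, expected):
    """Return {column: (expected_type, actual_type)} for problem columns.

    dtypes is a list of (name, type) pairs, as returned by DataFrame.dtypes.
    A column missing from the DataFrame is reported with actual_type None.
    Columns not listed in `expected` are ignored.
    """
    actual = dict(dtypes)
    mismatches = {}
    for name, want in expected.items():
        got = actual.get(name)  # None when the column is absent
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


# In the notebook, with `df` loaded as in the cell above:
# mismatches = find_schema_mismatches(df.dtypes, EXPECTED_TYPES)
# assert not mismatches, f"Unexpected schema: {mismatches}"
```

An empty result means the inferred types match expectations for the listed columns; anything else names the column along with the expected and actual types.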

