Data Transformation with Spark SQL in Databricks
Disha Mukherjee
Lead Data Engineer

$$
# Persist the validated DataFrame as a managed Delta table
df_valid.write.format("delta") \
    .mode("overwrite") \
    .saveAsTable("transactions_clean")

print(f"Rows written: {df_valid.count():,}")
# Output: Rows written: 33,223
$$
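To sanity-check the write, the table can be read back from the metastore. A minimal sketch, assuming the same SparkSession and the transactions_clean table created above:

$$
# Read the managed Delta table back and verify its shape
clean = spark.table("transactions_clean")
clean.printSchema()
print(f"Rows read back: {clean.count():,}")
$$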

Column-naming note: the raw header customer-id should be renamed to Customer_ID so that every column follows the same naming convention before downstream transformations.
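A minimal sketch of that rename, assuming the ingested DataFrame is named df_raw (a hypothetical name, not taken from the original notebook):

$$
# Rename the raw column to the agreed convention
# (df_raw is a hypothetical name for the DataFrame loaded from the source CSV)
df_clean_names = df_raw.withColumnRenamed("customer-id", "Customer_ID")
df_clean_names.printSchema()
$$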

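The same flow can be declared as a Delta Live Tables pipeline, with one table per medallion layer (bronze ingest, silver cleaning, gold aggregation):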
$$

import dlt

# Bronze: raw CSV ingest with an explicit schema
# (schema and FILE_PATH are defined elsewhere in the pipeline notebook)
@dlt.table(name="transactions_bronze")
def transactions_bronze():
    return spark.read.format("csv").schema(schema).load(FILE_PATH)

# Silver: drop nulls and filter out invalid records
@dlt.table(name="transactions_silver")
def transactions_silver():
    return dlt.read("transactions_bronze").na.drop(...).filter(...)

# Gold: per-category revenue aggregation
@dlt.table(name="category_revenue_gold")
def category_revenue_gold():
    return dlt.read("transactions_silver").groupBy("Category").agg(...)

$$
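Because each downstream table reads its source with dlt.read, Delta Live Tables infers the bronze → silver → gold dependency graph and materializes the tables in order when the pipeline runs; the elided na.drop, filter, and agg arguments are filled in with the project's own validation and aggregation rules.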
