Data Transformation with Spark SQL in Databricks
Disha Mukherjee
Lead Data Engineer

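from pyspark.sql import functions as F

# Row counts before and after removing exact duplicate rows
total_rows = df_enriched.count()
distinct_rows = df_enriched.distinct().count()

# Percentage of rows with a missing Customer_ID
null_rate = (
    df_enriched.filter(F.col("Customer_ID").isNull()).count()
    / total_rows * 100
)

# The ":," format spec adds thousands separators, matching the output below
print(f"Total rows: {total_rows:,}")
print(f"Duplicates: {total_rows - distinct_rows:,}")
print(f"Null rate: {null_rate:.2f}%")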
Total rows: 33,223
Duplicates: 0
Null rate: 0.00%
