Introduction to Data Engineering
Vincent Vankrunkelsven
Data Engineer @ DataCamp



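from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# Load the users table and keep implausible ages (over 100) as outliers
df = spark.read.parquet("users.parquet")
outliers = df.filter(df["age"] > 100)
print(outliers.count())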
A data engineer understands the abstractions underneath code like this.
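A short Spark call like filter(...).count() hides distributed work. As a rough, hypothetical illustration in plain Python (not Spark's actual implementation), the outlier count can be modeled as rows split into partitions, each partition filtered and counted independently, and the partial counts summed. The helper names count_outliers_in_partition and distributed_count, the round-robin partitioning, and the use of threads as a stand-in for cluster machines are all assumptions made for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical, simplified model of df.filter(df["age"] > 100).count():
# threads stand in for cluster executors; this is NOT how Spark is implemented.

def count_outliers_in_partition(partition, max_age=100):
    """Filter one partition and return how many rows have age > max_age."""
    return sum(1 for row in partition if row["age"] > max_age)

def distributed_count(rows, num_partitions=3):
    # Round-robin split of the rows into partitions (an assumption for the sketch).
    partitions = [rows[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        # Each partition is filtered and counted independently.
        partial_counts = list(pool.map(count_outliers_in_partition, partitions))
    # Combine the per-partition results into a single total.
    return sum(partial_counts)

users = [{"age": a} for a in (23, 104, 57, 130, 99, 101, 45)]
print(distributed_count(users))  # 3
```

The caller only sees one number; the splitting, parallel work, and final combination are exactly the kind of abstraction the data engineer is expected to understand.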

JoinProductOrder needs to run after both CleanProduct and CleanOrder have finished.
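Resolving a dependency like this is a topological-ordering problem. The following is a minimal sketch, assuming the jobs are just names in a dictionary, that uses Python's standard-library graphlib.TopologicalSorter to group jobs into batches whose dependencies are already met. The jobs dictionary and the run_in_dependency_order helper are hypothetical; a real scheduler would actually launch each job instead of marking it done immediately.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each key lists the jobs it depends on.
# JoinProductOrder may only start once both cleaning jobs have finished.
jobs = {
    "JoinProductOrder": ["CleanProduct", "CleanOrder"],
}

def run_in_dependency_order(graph):
    """Return batches of jobs; jobs in the same batch have no unmet dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()
        batches.append(sorted(ready))  # sorted only to make the output stable
        # A real scheduler would run these jobs (possibly in parallel) here;
        # this sketch simply marks them as finished.
        sorter.done(*ready)
    return batches

print(run_in_dependency_order(jobs))
# [['CleanOrder', 'CleanProduct'], ['JoinProductOrder']]
```

The two cleaning jobs land in the same batch because neither depends on the other, so they could run concurrently; JoinProductOrder only becomes ready after both are reported done.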
Databases


Processing

Scheduling



