Introduction to Data Engineering
Vincent Vankrunkelsven
Data Engineer @ DataCamp
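from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # reuse or start a Spark session
df = spark.read.parquet("users.parquet")
outliers = df.filter(df["age"] > 100)  # users with an implausible age
print(outliers.count())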
The data engineer understands the abstractions.
JoinProductOrder needs to run after CleanProduct and CleanOrder.
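The dependency above can be sketched as a small task graph. This is only an illustration using Python's standard-library graphlib module, not the API of any particular scheduling tool; the dependencies dictionary and the execution_order helper are hypothetical names introduced for this example.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks that must finish before it can start.
# (Illustrative structure; the task names come from the slide.)
dependencies = {
    "JoinProductOrder": {"CleanProduct", "CleanOrder"},
    "CleanProduct": set(),
    "CleanOrder": set(),
}


def execution_order(deps):
    """Return one valid run order in which every task follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(order)
```

The two cleaning tasks do not depend on each other, so they may appear in either order (a scheduler could also run them in parallel), but JoinProductOrder always comes last.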
Databases
Processing
Scheduling