ETL and ELT in Python
Jake Roach
Data Engineer
Data pipelines should be thoroughly tested
Validating pipelines limits maintenance efforts after deployment
Tools and techniques to test data pipelines
End-to-end testing
# Extract, transform, and load data as part of a pipeline
...
# Take a look at the data made available in a Postgres database
loaded_data = pd.read_sql("SELECT * FROM clean_stock_data", con=db_engine)
print(loaded_data.shape)
(6438, 4)
print(loaded_data.head())
timestamps           volume      open      close
1997-05-15 13:30:00  1443120000  0.121875  0.097917
1997-05-16 13:30:00  294000000   0.098438  0.086458
1997-05-19 13:30:00  122136000   0.088021  0.085417
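The manual look at shape and head() can also be turned into automated checks. The following is a minimal sketch, not the course's own implementation. It assumes an in-memory SQLite database stands in for the Postgres target and a three-row sample stands in for the real clean_stock_data table. The helper name check_loaded_data and its parameters are hypothetical.

```python
import sqlite3

import pandas as pd

# Assumption: an in-memory SQLite database stands in for the Postgres target.
con = sqlite3.connect(":memory:")

# Assumption: a three-row sample stands in for the pipeline's real output.
sample = pd.DataFrame({
    "timestamps": ["1997-05-15 13:30:00", "1997-05-16 13:30:00", "1997-05-19 13:30:00"],
    "volume": [1443120000, 294000000, 122136000],
    "open": [0.121875, 0.098438, 0.088021],
    "close": [0.097917, 0.086458, 0.085417],
})
sample.to_sql("clean_stock_data", con=con, index=False)


def check_loaded_data(con, expected_columns, min_rows=1):
    """Hypothetical helper: return a list of problems found in the loaded table."""
    loaded = pd.read_sql("SELECT * FROM clean_stock_data", con=con)
    problems = []
    # The table should expose exactly the columns downstream users expect.
    if list(loaded.columns) != list(expected_columns):
        problems.append(f"unexpected columns: {list(loaded.columns)}")
    # An empty or nearly empty table usually means the load step failed.
    if len(loaded) < min_rows:
        problems.append(f"expected at least {min_rows} rows, found {len(loaded)}")
    # Missing values should have been handled during the transform step.
    if loaded.isna().any().any():
        problems.append("table contains missing values")
    return problems


problems = check_loaded_data(con, ["timestamps", "volume", "open", "close"], min_rows=3)
print(problems)
```

An empty list means every check passed. Returning a list of problems rather than raising on the first one makes it easier to see everything that is wrong with a load in a single run.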
# Extract, transform, and load data as part of a pipeline
...
# Take a look at the data made available in a Postgres database
loaded_data = pd.read_sql("SELECT * FROM clean_stock_data", con=db_engine)
# Compare the two DataFrames.
print(clean_stock_data.equals(loaded_data))
True
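DataFrame.equals() returns a single True or False, and it also requires matching dtypes, so a failure gives little to debug with. As a hedged alternative, the sketch below uses pandas.testing.assert_frame_equal, which describes where two DataFrames differ. It assumes an in-memory SQLite database in place of Postgres and a small numeric sample in place of the real data. The helper name frames_match is hypothetical.

```python
import sqlite3

import pandas as pd


def frames_match(expected, actual):
    """Hypothetical helper: return (True, "") on a match, else (False, details)."""
    try:
        # check_dtype=False tolerates dtype changes caused by the database round trip.
        pd.testing.assert_frame_equal(expected, actual, check_dtype=False)
    except AssertionError as err:
        return False, str(err)
    return True, ""


# Assumption: a small numeric sample stands in for the transformed data.
clean_stock_data = pd.DataFrame({
    "volume": [1443120000, 294000000, 122136000],
    "open": [0.121875, 0.098438, 0.088021],
    "close": [0.097917, 0.086458, 0.085417],
})

# Assumption: an in-memory SQLite database stands in for the Postgres target.
con = sqlite3.connect(":memory:")
clean_stock_data.to_sql("clean_stock_data", con=con, index=False)
loaded_data = pd.read_sql("SELECT * FROM clean_stock_data", con=con)

# Compare what was transformed with what was actually loaded.
ok, message = frames_match(clean_stock_data, loaded_data)
print(ok)

# Simulate a bad load to see the kind of detail a failure produces.
corrupted = loaded_data.copy()
corrupted.loc[0, "close"] = 0.0
bad_ok, bad_message = frames_match(clean_stock_data, corrupted)
print(bad_ok)
print(bad_message)
```

Whether to relax the dtype check is a design choice: leave check_dtype at its default when column types are part of the contract with downstream users.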