The data pipeline
Understanding Data Engineering
Hadrien Lacroix
Content Developer at DataCamp
If data is the new oil...
The Economist, 2017-05-06, by David Parkins
Back to data engineering
Ingest
Process
Store
Need pipelines
Automate flow from one station to the next
Provide up-to-date, accurate, relevant data
Data pipelines ensure an efficient flow of data
Automate
Extracting
Transforming
Combining
Validating
Loading
Reduce
Human intervention
Errors
Time it takes data to flow
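To make the "automate" and "reduce" points concrete, here is a minimal sketch of a pipeline that chains extracting, transforming, validating, combining and loading so that no person has to hand data from one step to the next. It assumes hypothetical in-memory sources (`SONGS_SOURCE`, `PLAYS_SOURCE`) and a hypothetical target table `song_stats` in an in-memory SQLite database. These names are illustrations, not part of any real system, and running validation before combining is only one reasonable ordering.

```python
import sqlite3

# Hypothetical raw sources standing in for real systems (files, APIs, databases).
SONGS_SOURCE = [
    {"song_id": 1, "title": " Blue Sky ", "artist": "Ana"},
    {"song_id": 2, "title": "night drive", "artist": "Ben"},
]
PLAYS_SOURCE = [
    {"song_id": 1, "seconds": 210},
    {"song_id": 2, "seconds": -5},  # invalid: a negative listening time
    {"song_id": 1, "seconds": 95},
]


def extract():
    """Extract: pull raw records from the hypothetical sources."""
    return list(SONGS_SOURCE), list(PLAYS_SOURCE)


def transform(songs):
    """Transform: strip stray whitespace and use title case for song titles."""
    return [{**s, "title": s["title"].strip().title()} for s in songs]


def validate(plays):
    """Validate: drop play events that fail a simple sanity check."""
    return [p for p in plays if p["seconds"] > 0]


def combine(songs, plays):
    """Combine: join plays to songs and total the listening time per song."""
    totals = {}
    for p in plays:
        totals[p["song_id"]] = totals.get(p["song_id"], 0) + p["seconds"]
    return [
        (s["song_id"], s["title"], s["artist"], totals.get(s["song_id"], 0))
        for s in songs
    ]


def load(rows, conn):
    """Load: write the combined rows into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS song_stats "
        "(song_id INTEGER PRIMARY KEY, title TEXT, artist TEXT, total_seconds INTEGER)"
    )
    # INSERT OR REPLACE keeps reruns from duplicating rows.
    conn.executemany("INSERT OR REPLACE INTO song_stats VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run every stage in sequence with no manual hand-off between them."""
    songs, plays = extract()
    rows = combine(transform(songs), validate(plays))
    load(rows, conn)
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for row in run_pipeline(conn):
        print(row)
    # (1, 'Blue Sky', 'Ana', 305)
    # (2, 'Night Drive', 'Ben', 0)
```

Because the whole flow is one function call, it can be scheduled to run repeatedly, which is where the savings in human intervention, errors and latency come from.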
ETL and data pipelines
ETL (Extract, Transform, Load)
Popular framework for designing data pipelines
1) Extract data
2) Transform extracted data
3) Load transformed data to another database
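The three ETL steps can be sketched as three small functions, with one SQLite database as the source and a separate one as the "other database" the data is loaded into. This is a minimal illustration under stated assumptions: the `users` and `users_clean` tables and the cleaning rules (lowercased emails, uppercased country codes, rows without an email dropped) are hypothetical choices, not a prescribed ETL implementation.

```python
import sqlite3


def extract(source):
    """1) Extract: read raw rows from the source database (hypothetical `users` table)."""
    return source.execute("SELECT user_id, email, country FROM users").fetchall()


def transform(rows):
    """2) Transform: normalize fields and drop rows that have no email."""
    cleaned = []
    for user_id, email, country in rows:
        if not email:
            continue
        cleaned.append((user_id, email.strip().lower(), (country or "").strip().upper()))
    return cleaned


def load(rows, target):
    """3) Load: write the transformed rows into a different database."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS users_clean "
        "(user_id INTEGER PRIMARY KEY, email TEXT, country TEXT)"
    )
    target.executemany("INSERT OR REPLACE INTO users_clean VALUES (?, ?, ?)", rows)
    target.commit()


def etl(source, target):
    """Chain the steps in ETL order."""
    load(transform(extract(source)), target)


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE users (user_id INTEGER, email TEXT, country TEXT)")
    source.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, " Ana@Example.com ", "fr"), (2, None, "us"), (3, "ben@example.com", " de")],
    )
    target = sqlite3.connect(":memory:")
    etl(source, target)
    print(target.execute("SELECT * FROM users_clean ORDER BY user_id").fetchall())
    # [(1, 'ana@example.com', 'FR'), (3, 'ben@example.com', 'DE')]
```

Keeping each step in its own function mirrors the framework: any one of them can be swapped out (a different source, extra cleaning rules, another target) without touching the others.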
Data pipelines
Move data from one system to another
May follow ETL
Data may not be transformed
Data may be loaded directly into applications
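By contrast, a data pipeline does not have to transform anything. The sketch below simply moves rows from a source database to an application in small batches, unchanged. The `tracks` table, the batch size of 2 and the `PlaylistApp` class are hypothetical stand-ins for a real source and a real consuming application.

```python
import sqlite3
from typing import Callable, Iterator, List, Tuple


def extract_batches(conn: sqlite3.Connection, query: str, batch_size: int = 2) -> Iterator[List[Tuple]]:
    """Read the query result a few rows at a time instead of all at once."""
    cursor = conn.execute(query)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield batch


def move(conn: sqlite3.Connection, query: str, deliver: Callable[[List[Tuple]], None]) -> int:
    """Hand each batch to the application as-is: there is no transform step."""
    moved = 0
    for batch in extract_batches(conn, query):
        deliver(batch)
        moved += len(batch)
    return moved


class PlaylistApp:
    """Hypothetical application that consumes the raw rows directly."""

    def __init__(self) -> None:
        self.tracks: List[Tuple] = []

    def receive(self, batch: List[Tuple]) -> None:
        self.tracks.extend(batch)


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    source.execute("CREATE TABLE tracks (track_id INTEGER, title TEXT)")
    source.executemany("INSERT INTO tracks VALUES (?, ?)", [(1, "Intro"), (2, "Outro"), (3, "Bonus")])
    app = PlaylistApp()
    count = move(source, "SELECT track_id, title FROM tracks ORDER BY track_id", app.receive)
    print(count, app.tracks)
    # 3 [(1, 'Intro'), (2, 'Outro'), (3, 'Bonus')]
```

Passing the application's `receive` method as a callback keeps the pipeline unaware of what the application does with the data, which is what lets the same mover feed different consumers.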
Summary
What a data pipeline is
What it does
Why it's important
How data pipelines are implemented at Spotflix
What ETL is and its nuances
Let's practice!