The data pipeline

Understanding Data Engineering

Hadrien Lacroix

Content Developer at DataCamp

If data is the new oil...

data is the new oil - economist cover

1 The Economist, 2017-05-06, by David Parkins

oil well

piping from oil well

distilling

residue

heavy oil

diesel

kerosene

naphtha

gasoline

kerosene is delivered directly to airport

gasoline is delivered to gas storage facility

gasoline is delivered from gas storage facility to gas stations

naphtha undergoes chemical transformations

plastic is sent to the factory

Back to data engineering

  • Ingest data
  • Process data
  • Store data
  • Each step needs pipelines
  • Pipelines automate the flow from one station to the next
  • The result: up-to-date, accurate, relevant data

data engineer

mobile

computer

website

pipes from mobile app, desktop app and website

data rack

artists

albums

tracks

playlists

customers

employees

Artists database

sales employees

engineering employees

support employees

United States sales employees

Belgium sales employees

France sales employees
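The slides above split the employees data first by department and then, for sales, by country. A minimal sketch of that kind of split, assuming hypothetical employee records with `department` and `country` fields (the names, fields, and the `split_by` helper are illustrative and not taken from the slides):

```python
from collections import defaultdict

# Hypothetical employee records; names and fields are illustrative only.
EMPLOYEES = [
    {"name": "Ana", "department": "sales", "country": "United States"},
    {"name": "Bram", "department": "sales", "country": "Belgium"},
    {"name": "Chloe", "department": "sales", "country": "France"},
    {"name": "Dev", "department": "engineering", "country": "Belgium"},
    {"name": "Eli", "department": "support", "country": "France"},
]


def split_by(records, field):
    """Group records into separate lists keyed by the value of `field`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    return dict(groups)


# First split: one group per department.
by_department = split_by(EMPLOYEES, "department")
# Second split: the sales group is divided again, one group per country.
sales_by_country = split_by(by_department["sales"], "country")
```

In a real system each group would likely be routed to its own table or database by a pipeline rather than held in memory.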

check and clean tracks

write clean tracks to database
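The two steps above (check and clean tracks, then write the clean tracks to a database) could be sketched as follows. This is only an illustration: the record fields, the validity rules (non-empty title and artist, positive duration, no duplicates), and the in-memory SQLite table are all assumptions, not details given in the slides.

```python
import sqlite3

# Hypothetical raw track records, as they might arrive from the apps.
RAW_TRACKS = [
    {"title": "  Song A ", "artist": "Artist 1", "duration_s": 215},
    {"title": "Song B", "artist": "", "duration_s": 187},          # missing artist
    {"title": "Song C", "artist": "Artist 2", "duration_s": -5},   # invalid duration
    {"title": "song a", "artist": "Artist 1", "duration_s": 215},  # duplicate of Song A
]


def clean_tracks(raw_tracks):
    """Return valid, de-duplicated tracks with trimmed text fields."""
    seen = set()
    clean = []
    for track in raw_tracks:
        title = track["title"].strip()
        artist = track["artist"].strip()
        duration = track["duration_s"]
        if not title or not artist or duration <= 0:
            continue  # reject incomplete or corrupted records
        key = (title.lower(), artist.lower())
        if key in seen:
            continue  # reject duplicates (case-insensitive match)
        seen.add(key)
        clean.append((title, artist, duration))
    return clean


def write_tracks(conn, tracks):
    """Create the (assumed) tracks table if needed and insert the clean rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tracks "
        "(title TEXT, artist TEXT, duration_s INTEGER)"
    )
    conn.executemany("INSERT INTO tracks VALUES (?, ?, ?)", tracks)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real tracks database
write_tracks(conn, clean_tracks(RAW_TRACKS))
stored = conn.execute("SELECT title, artist, duration_s FROM tracks").fetchall()
```

Only the first record passes every check here, so a single row reaches the database.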

Oprah distributing pipelines

Data pipelines ensure an efficient flow of data

Automate

  • Extracting
  • Transforming
  • Combining
  • Validating
  • Loading

Reduce

  • Human intervention
  • Errors
  • Time it takes data to flow
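One way to picture this automation is a runner that passes each step's output to the next step, so no one has to hand data over manually. The sketch below assumes five toy step functions named after the bullets above; the play-count data and the dictionary standing in for a database are hypothetical.

```python
from functools import reduce


def extract(_):
    # Hypothetical source: play counts collected from two apps.
    return {
        "mobile": [("track_1", 3), ("track_2", 1)],
        "desktop": [("track_1", 2)],
    }


def transform(sources):
    # Tag each record with the source it came from.
    return {
        name: [(track, plays, name) for track, plays in rows]
        for name, rows in sources.items()
    }


def combine(sources):
    # Merge all sources into one flat list of records.
    return [row for rows in sources.values() for row in rows]


def validate(rows):
    # Fail loudly instead of loading bad data.
    for track, plays, _source in rows:
        if plays < 0:
            raise ValueError(f"negative play count for {track}")
    return rows


def load(rows):
    # Stand-in for a database write: total plays per track.
    totals = {}
    for track, plays, _source in rows:
        totals[track] = totals.get(track, 0) + plays
    return totals


STEPS = [extract, transform, combine, validate, load]


def run_pipeline(steps):
    """Run each step on the previous step's output, with no manual hand-offs."""
    return reduce(lambda data, step: step(data), steps, None)


result = run_pipeline(STEPS)
```

In practice a scheduler or orchestration tool would trigger `run_pipeline` on a timetable; that part is left out of this sketch.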

ETL and data pipelines

ETL

  • Popular framework for designing data pipelines
  • 1) Extract data
  • 2) Transform extracted data
  • 3) Load transformed data to another database
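The three ETL steps can be illustrated with a small sketch using Python's built-in `sqlite3` module. The two in-memory databases, the table names, and the millisecond-to-second conversion are assumptions chosen for the example, not part of the slides.

```python
import sqlite3

# Source database (hypothetical): track durations stored in milliseconds.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE tracks (title TEXT, duration_ms INTEGER)")
source.executemany(
    "INSERT INTO tracks VALUES (?, ?)",
    [("Song A", 215000), ("Song B", 187500)],
)
source.commit()

# Target database (hypothetical): a separate store for the transformed data.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE tracks_clean (title TEXT, duration_s REAL)")

# 1) Extract: read the raw rows from the source.
rows = source.execute(
    "SELECT title, duration_ms FROM tracks ORDER BY title"
).fetchall()

# 2) Transform: convert milliseconds to seconds.
transformed = [(title, ms / 1000) for title, ms in rows]

# 3) Load: write the transformed rows into the other database.
target.executemany("INSERT INTO tracks_clean VALUES (?, ?)", transformed)
target.commit()

loaded = target.execute(
    "SELECT title, duration_s FROM tracks_clean ORDER BY title"
).fetchall()
```

The source table is left untouched; only the target receives the transformed rows.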

Data pipelines

  • Move data from one system to another
  • May follow ETL
  • Data may not be transformed
  • Data may be loaded directly into applications

Summary

  • What a data pipeline is
  • What it does
  • Why it's important
  • How data pipelines are implemented at Spotflix
  • What ETL is and its nuances

Let's practice!
