The data pipeline

Understanding Data Engineering

Hadrien Lacroix

Content Developer at DataCamp

If data is the new oil...

data is the new oil - economist cover

1 The Economist, 2017-05-06, by David Parkins

oil well

piping from oil well

distilling

residue

heavy oil

diesel

kerosene

naphtha

gasoline

kerosene is delivered directly to airport

gasoline is delivered to gas storage facility

gasoline is delivered from gas storage facility to gas stations

naphtha undergoes chemical transformations

plastic is sent to the factory

Back to data engineering

  • Ingest
  • Process
  • Store
  • Need pipelines
  • Automate flow from one station to the next
  • Provide up-to-date, accurate, relevant data

data-engineer

mobile

computer

website

pipes from mobile app, desktop app and website

data rack

artists

albums

tracks

playlists

customers

employees

Artists database

sales employees

engineering employees

Support employees

United States sales employees

Belgium sales employees

France sales employees

check and clean tracks

write clean tracks to database

Oprah distributing pipelines

Data pipelines ensure an efficient flow of data

Automate

  • Extracting
  • Transforming
  • Combining
  • Validating
  • Loading

Reduce

  • Human intervention
  • Errors
  • Time it takes data to flow
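
The five automated steps above can be sketched as one small function chain. This is a minimal illustration, not how any particular company implements it: the in-memory `TRACKS` and `ARTISTS` lists, the field names, and the validation rule are all made-up assumptions standing in for real source systems.

```python
# Hypothetical in-memory sources standing in for two upstream systems.
TRACKS = [
    {"track_id": 1, "title": " Intro ", "artist_id": 10},
    {"track_id": 2, "title": "", "artist_id": 11},  # empty title, unknown artist
    {"track_id": 3, "title": "Outro", "artist_id": 10},
]
ARTISTS = [{"artist_id": 10, "name": "Example Artist"}]


def extract():
    """Extracting: read raw records from each source."""
    return list(TRACKS), list(ARTISTS)


def transform(tracks):
    """Transforming: strip stray whitespace from titles."""
    return [{**t, "title": t["title"].strip()} for t in tracks]


def combine(tracks, artists):
    """Combining: attach the artist name to each track (None if unknown)."""
    names = {a["artist_id"]: a["name"] for a in artists}
    return [{**t, "artist": names.get(t["artist_id"])} for t in tracks]


def validate(rows):
    """Validating: keep only rows with a non-empty title and a known artist."""
    return [r for r in rows if r["title"] and r["artist"] is not None]


def load(rows, target):
    """Loading: append the clean rows to the target store; return the row count."""
    target.extend(rows)
    return len(rows)


def run_pipeline(target):
    """Run every step in order, with no human intervention in between."""
    tracks, artists = extract()
    rows = validate(combine(transform(tracks), artists))
    return load(rows, target)


warehouse = []  # a plain list stands in for the destination database
loaded = run_pipeline(warehouse)
```

Because each step is a function, a scheduler could call `run_pipeline` on a timer, which is where the reduction in human intervention and errors would come from.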

ETL and data pipelines

ETL

  • Popular framework for designing data pipelines
  • 1) Extract data
  • 2) Transform extracted data
  • 3) Load transformed data to another database
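
As a rough sketch of the three ETL steps, the example below moves rows between two in-memory SQLite databases using Python's standard `sqlite3` module. The table names (`employees`, `employees_clean`), the columns, and the sample rows are invented for illustration, and the transformation (normalizing capitalization) is just one possible choice.

```python
import sqlite3


def extract(source):
    """1) Extract: read raw rows from the source database."""
    return source.execute("SELECT id, name, country FROM employees").fetchall()


def transform(rows):
    """2) Transform: title-case names and upper-case country codes."""
    return [(emp_id, name.title(), country.upper()) for emp_id, name, country in rows]


def load(rows, target):
    """3) Load: write the transformed rows to another database."""
    target.execute(
        "CREATE TABLE IF NOT EXISTS employees_clean "
        "(id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    target.executemany("INSERT INTO employees_clean VALUES (?, ?, ?)", rows)
    target.commit()


# Hypothetical source database with made-up sample rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE employees (id INTEGER, name TEXT, country TEXT)")
source.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "jane doe", "be"), (2, "john smith", "fr")],
)

# A separate database plays the role of the destination.
target = sqlite3.connect(":memory:")
load(transform(extract(source)), target)

result = target.execute(
    "SELECT name, country FROM employees_clean ORDER BY id"
).fetchall()
```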

Data pipelines

  • Move data from one system to another
  • May follow ETL
  • Data may not be transformed
  • Data may be directly loaded in applications
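
Not every pipeline transforms data. The sketch below, under the assumption that events arrive as newline-delimited JSON, simply moves each record from one system to another unchanged. The `move` helper and the sample events are hypothetical, and `io.StringIO` buffers stand in for the real source and the receiving application.

```python
import io


def move(source, sink):
    """Copy every record from source to sink unchanged; return how many were moved."""
    count = 0
    for line in source:
        sink.write(line)
        count += 1
    return count


# Made-up events; StringIO objects stand in for real systems.
raw_events = '{"event": "play", "track_id": 1}\n{"event": "skip", "track_id": 2}\n'
source = io.StringIO(raw_events)
sink = io.StringIO()

moved = move(source, sink)
```

There is no transform step here: the data arrives in the sink exactly as it left the source, so this is a data pipeline but not ETL.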

Summary

  • What a data pipeline is
  • What it does
  • Why it's important
  • How data pipelines are implemented at Spotflix
  • What ETL is and its nuances

Let's practice!
