Scheduling data

Understanding Data Engineering

Hadrien Lacroix

Content Developer at DataCamp

Scheduling

  • Can apply to any task listed in data processing
  • Scheduling is the glue of your system
  • Holds the pieces together and organizes how they work together
  • Runs tasks in a specific order and resolves all dependencies
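As a rough illustration of "runs tasks in a specific order and resolves all dependencies", the sketch below uses Python's standard-library `graphlib` to derive a valid run order. The task names and the dependency map are hypothetical, chosen only to echo the employee and department tables used in these slides; real schedulers handle much more (retries, timing, monitoring).

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_employees": set(),
    "update_employee_table": {"extract_employees"},
    "update_department_tables": {"update_employee_table"},
}

def run_order(deps):
    """Return an order in which every task comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())

order = run_order(dependencies)
for task in order:
    print("running", task)
```

A scheduler would replace the `print` call with the actual work for each task; the point here is only that the order is derived from the declared dependencies rather than written by hand.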


data pipeline


image showing a clock - the employees table gets updated every morning at 6 AM




image showing a sensor listening to the employees table before splitting into departments


Manual, time, and sensor scheduling

  • Manually
    • Example: manually update the employee table
  • Automatically run at a specific time
    • Example: update the employee table at 6 AM
  • Automatically run if a specific condition is met
    • Sensor scheduling
    • Example: update the department tables if a new employee was added
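The three ways of triggering a task can be sketched in a few lines of plain Python. This is a minimal sketch, not how any particular scheduling tool works: the function names, the row-count check used as the "sensor" condition, and the 6 AM default are assumptions made for illustration.

```python
from datetime import datetime

def time_trigger(now: datetime, hour: int = 6) -> bool:
    """Time scheduling: fire when the clock reaches the scheduled hour."""
    return now.hour == hour and now.minute == 0

def sensor_trigger(previous_count: int, current_count: int) -> bool:
    """Sensor scheduling: fire when a watched condition is met.
    Assumed condition: the employee table has gained rows since the last check."""
    return current_count > previous_count

def tasks_to_run(now, previous_count, current_count, manual_request=False):
    """Decide which (hypothetical) tasks should run at this check."""
    tasks = []
    # Manual or time-based: refresh the employee table.
    if manual_request or time_trigger(now):
        tasks.append("update_employee_table")
    # Sensor-based: refresh department tables when a new employee appears.
    if sensor_trigger(previous_count, current_count):
        tasks.append("update_department_tables")
    return tasks

print(tasks_to_run(datetime(2024, 1, 1, 6, 0), previous_count=10, current_count=10))
```

In practice the sensor would poll the table (or listen for an event) rather than receive the counts as arguments; passing them in keeps the sketch self-contained.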

Batches and streams

  • Batches
    • Group records at intervals
    • Often cheaper
    • Examples: songs uploaded by artists, employee table, revenue table
  • Streams
    • Send individual records right away
    • Example: new users signing in
  • Another example: online vs. offline listening
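To make the batch versus stream distinction concrete, here is a small Python sketch. It is only an illustration: the record names are made up, and a fixed group size stands in for the time interval that a real batch job would use.

```python
from typing import Iterable, Iterator, List

def stream(records: Iterable[str]) -> Iterator[str]:
    """Stream: pass each record on individually, as soon as it arrives."""
    for record in records:
        yield record

def batch(records: Iterable[str], size: int) -> Iterator[List[str]]:
    """Batch: collect records and send them on as a group.
    Assumption: a fixed group size stands in for a time interval."""
    group: List[str] = []
    for record in records:
        group.append(record)
        if len(group) == size:
            yield group
            group = []
    if group:  # send the remaining partial group
        yield group

# Hypothetical records, e.g. new users signing in.
signups = ["user_1", "user_2", "user_3", "user_4", "user_5"]
streamed = list(stream(signups))
batched = list(batch(signups, size=2))
print(streamed)
print(batched)
```

The streamed version produces five separate deliveries, while the batched version produces three. Fewer, larger deliveries are one reason batching is often cheaper.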

Scheduling tools

airflow and luigi logos


Summary

  • What scheduling is
  • Different ways to set it up
  • Difference between batches and streams
  • How scheduling is implemented at Spotflix
  • Airflow, Luigi

Let's practice!

