Scheduling data
Understanding Data Engineering
Hadrien Lacroix
Content Developer at DataCamp
Scheduling
Can apply to any task listed in data processing
Scheduling is the glue of your system
Holds each piece and organizes how they work together
Runs tasks in a specific order and resolves all dependencies
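One way to picture "runs tasks in a specific order and resolves all dependencies" is a topological sort over a task graph. The sketch below uses Python's standard-library graphlib module. The task names are hypothetical, made up for illustration, and a real scheduler does far more than this (retries, logging, parallelism).

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "clean_employee_data": {"extract_employee_data"},
    "update_employee_table": {"clean_employee_data"},
    "update_department_tables": {"update_employee_table"},
}


def run_order(deps):
    """Return one valid order in which every task comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
for task in order:
    print("running", task)
```

Because each task here depends on exactly one earlier task, only one order is valid. With a wider graph, several orders could satisfy the same dependencies.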
Manual, time, and sensor scheduling
Manually
Example: manually update the employee table
Automatically run at a specific time (time scheduling)
Example: update the employee table at 6 AM
Automatically run if a specific condition is met (sensor scheduling)
Example: update the department tables if a new employee was added
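The three ways of triggering a task can be sketched in a few lines of Python. This is a minimal illustration, not how any particular scheduler works: the function names, the row-count check used as the "sensor", and the single tick function are all assumptions made for the example.

```python
from datetime import datetime, time


def update_employee_table():
    return "employee table updated"


def update_department_tables():
    return "department tables updated"


def time_trigger(now, run_at=time(6, 0)):
    """Time scheduling: fire when the clock reads the scheduled hour and minute."""
    return (now.hour, now.minute) == (run_at.hour, run_at.minute)


def sensor_trigger(previous_count, current_count):
    """Sensor scheduling: fire when the watched condition holds (a new employee row)."""
    return current_count > previous_count


def tick(now, previous_count, current_count):
    """One scheduler check: run every task whose trigger fired and collect the results."""
    ran = []
    if time_trigger(now):
        ran.append(update_employee_table())
    if sensor_trigger(previous_count, current_count):
        ran.append(update_department_tables())
    return ran


# Manual scheduling: a person simply calls the task.
manual_result = update_employee_table()

# Time scheduling: at 6 AM the employee table is updated.
print(tick(datetime(2024, 1, 1, 6, 0), previous_count=10, current_count=10))

# Sensor scheduling: a new employee appeared, so department tables are updated.
print(tick(datetime(2024, 1, 1, 9, 30), previous_count=10, current_count=11))
```

A sensor has to keep checking its condition, which is why sensor scheduling tends to use more resources than running at a fixed time.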
Batches and streams
Batches
Group records at intervals
Often cheaper
Examples: songs uploaded by artists, employee table, revenue table
Streams
Send individual records right away
Example: new users signing in
Another example: online vs. offline listening
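The difference between batches and streams can be shown with a small Python sketch. As a simplifying assumption, records are grouped here by count rather than by time interval, and the send callback and song names are hypothetical stand-ins for a real destination and real data.

```python
def stream(records, send):
    """Stream: send each record on its own as soon as it arrives."""
    for record in records:
        send([record])


def batch(records, send, batch_size=3):
    """Batch: collect records into groups and send each group together."""
    group = []
    for record in records:
        group.append(record)
        if len(group) == batch_size:
            send(group)
            group = []  # start a fresh list so the sent group is not modified
    if group:  # flush the final, possibly smaller, batch
        send(group)


songs = ["song_1", "song_2", "song_3", "song_4", "song_5"]

stream_calls, batch_calls = [], []
stream(songs, stream_calls.append)
batch(songs, batch_calls.append)

print(len(stream_calls), "sends when streaming")
print(len(batch_calls), "sends when batching")
```

Fewer sends for the same records is one reason batching is often cheaper, at the cost of records arriving later.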
Scheduling tools
Summary
What scheduling is
Different ways to set it up
Difference between batches and streams
How scheduling is implemented at Spotflix
Airflow, Luigi
Let's practice!