Introduction to Apache Airflow in Python
Mike Metzger
Data Engineer
A DAG run can be triggered manually or via schedule_interval, and has a state such as running, failed, or success.
When scheduling a DAG, there are several attributes of note:
- start_date: The date / time to initially schedule the DAG run
- end_date: Optional attribute for when to stop running new DAG instances
- max_tries: Optional attribute for how many attempts to make
- schedule_interval: How often to run

schedule_interval represents:
- How often to schedule the DAG, between start_date and end_date
- A value defined via cron style syntax or via built-in presets

In cron syntax, an asterisk (*) represents running for every interval (i.e., every minute, every day, etc.).

0 12 * * *           # Run daily at noon
* * 25 2 *           # Run once per minute on February 25
0,15,30,45 * * * *   # Run every 15 minutes
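To make the asterisk and comma-list rules concrete, here is a minimal sketch that expands a single cron field into the values it matches. The function name expand_cron_field is hypothetical and not part of Airflow or cron. It assumes the standard five-field order (minute, hour, day of month, month, day of week) and handles only `*` and comma-separated integers.

```python
def expand_cron_field(field, low, high):
    """Expand one cron field into the sorted list of values it matches.

    Only '*' and comma-separated integers are handled; ranges ('1-5')
    and steps ('*/15') are deliberately left out of this sketch.
    """
    if field == "*":
        # An asterisk matches every value in the field's range.
        return list(range(low, high + 1))
    values = sorted({int(part) for part in field.split(",")})
    for value in values:
        if not low <= value <= high:
            raise ValueError(f"{value} is outside {low}-{high}")
    return values


# Minute field of "0,15,30,45 * * * *": four times per hour.
print(expand_cron_field("0,15,30,45", 0, 59))  # [0, 15, 30, 45]
# Hour field of the same expression: every hour of the day.
print(len(expand_cron_field("*", 0, 23)))  # 24
```

Together, the two fields give a run at minutes 0, 15, 30, and 45 of every hour, which is the "every 15 minutes" schedule.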
Preset:     cron equivalent:
@hourly     0 * * * *
@daily      0 0 * * *
@weekly     0 0 * * 0
@monthly    0 0 1 * *
@yearly     0 0 1 1 *
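The correspondence between presets and cron expressions can be written as a plain lookup table. This sketch uses Airflow's preset names (@hourly through @yearly). PRESET_TO_CRON and to_cron are hypothetical names chosen for illustration, not part of Airflow's API.

```python
# Airflow preset names mapped to their cron equivalents.
PRESET_TO_CRON = {
    "@hourly": "0 * * * *",   # minute 0 of every hour
    "@daily": "0 0 * * *",    # midnight every day
    "@weekly": "0 0 * * 0",   # midnight on Sunday
    "@monthly": "0 0 1 * *",  # midnight on the 1st of each month
    "@yearly": "0 0 1 1 *",   # midnight on January 1
}


def to_cron(schedule):
    """Return the cron expression for a preset; pass anything else through."""
    return PRESET_TO_CRON.get(schedule, schedule)


print(to_cron("@daily"))      # 0 0 * * *
print(to_cron("0 12 * * *"))  # 0 12 * * *
```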
Airflow has two special schedule_interval presets:
- None: Don't schedule ever, used for manually triggered DAGs
- @once: Schedule only once

When scheduling a DAG, Airflow will:
- Use start_date as the earliest possible value
- Schedule the first run at start_date + schedule_interval
'start_date': datetime(2020, 2, 25),
'schedule_interval': '@daily'
This means the earliest starting time to run the DAG is on February 26th, 2020
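The arithmetic behind that date can be checked with the standard library alone. This is a sketch rather than Airflow's own scheduling code: it assumes a one-day timedelta is a fair stand-in for the '@daily' preset, and earliest_run is a name chosen for illustration.

```python
from datetime import datetime, timedelta

start_date = datetime(2020, 2, 25)
schedule_interval = timedelta(days=1)  # assumption: stands in for '@daily'

# The first run is scheduled one full interval after start_date.
earliest_run = start_date + schedule_interval
print(earliest_run)  # 2020-02-26 00:00:00
```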