Airflow sensors

Introduction to Apache Airflow in Python

Mike Metzger

Data Engineer

Sensors

What is a sensor?

  • An operator that waits for a certain condition to be true
    • Creation of a file
    • Upload of a database record
    • Certain response from a web request
  • Can define how often to check for the condition to be true
  • Are assigned to tasks

Sensor details

  • Derived from airflow.sensors.base.BaseSensorOperator (airflow.sensors.base_sensor_operator in Airflow 1.x)
  • Sensor arguments:
    • mode - How to check for the condition
      • mode='poke' - The default; keep the task slot and check repeatedly
      • mode='reschedule' - Give up the task slot and try again later
    • poke_interval - How long to wait between checks, in seconds
    • timeout - How long to wait, in seconds, before failing the task
  • Also includes normal operator attributes
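
To make poke_interval and timeout concrete, here is a minimal plain-Python sketch of a poke-style wait loop. It is a simplified model, not Airflow's actual implementation: run_sensor, SensorTimeout, and FakeClock are hypothetical names invented for this illustration, and the fake clock stands in for real waiting so the example finishes instantly.

```python
import time
from typing import Callable


class SensorTimeout(Exception):
    """Raised when the condition is still false after `timeout` seconds."""


def run_sensor(poke: Callable[[], bool],
               poke_interval: float = 60.0,
               timeout: float = 7 * 24 * 3600.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Call poke() until it returns True; return how many checks were made.

    Illustrative model only: a failed check is followed by a timeout test,
    then a wait of `poke_interval` seconds before the next check.
    """
    started = clock()
    checks = 0
    while True:
        checks += 1
        if poke():
            return checks
        if clock() - started >= timeout:
            raise SensorTimeout(f"condition not met after {timeout} seconds")
        sleep(poke_interval)


class FakeClock:
    """Hypothetical stand-in for time, so the demo does not really wait."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


# The condition becomes true on the third check.
answers = iter([False, False, True])
fake = FakeClock()
checks = run_sensor(lambda: next(answers), poke_interval=300, timeout=3600,
                    clock=fake.time, sleep=fake.sleep)
print(checks, fake.now)
```

In this model the loop holds on to its worker for the whole wait, which is the behaviour described for mode='poke'. With mode='reschedule', the wait between checks would instead be handed back to the scheduler so the slot is free for other tasks.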

File sensor

  • Is part of the airflow.sensors library (airflow.sensors.filesystem)
  • Checks for the existence of a file at a certain location
  • Can also check if any files exist within a directory
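from airflow.sensors.filesystem import FileSensor

# Check for salesdata.csv every 300 seconds (5 minutes)
file_sensor_task = FileSensor(task_id='file_sense',
                              filepath='salesdata.csv',
                              poke_interval=300,
                              dag=sales_report_dag)

# generate_report runs only after the sensor has found the file
init_sales_cleanup >> file_sensor_task >> generate_report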
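
The two checks described for the file sensor (a file exists at a location, or any file exists within a directory) can be sketched in plain Python. This is an approximation for illustration, not the FileSensor source: path_is_ready is a hypothetical helper, and the real sensor also resolves its path through a filesystem connection, which is left out here.

```python
import os
import tempfile


def path_is_ready(filepath: str) -> bool:
    """Return True if filepath is an existing file, or a directory
    that contains at least one file somewhere beneath it."""
    if os.path.isfile(filepath):
        return True
    if os.path.isdir(filepath):
        for _root, _dirs, files in os.walk(filepath):
            if files:
                return True
    return False


# Demo in a throwaway directory so no real data is touched.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, 'salesdata.csv')

    file_before = path_is_ready(target)    # file not created yet
    dir_before = path_is_ready(workdir)    # directory is still empty

    with open(target, 'w') as handle:
        handle.write('id,amount\n')

    file_after = path_is_ready(target)     # file now exists
    dir_after = path_is_ready(workdir)     # directory now holds a file

print(file_before, dir_before, file_after, dir_after)
```

A sensor built on a check like this would call it once per poke_interval until it returns True or the timeout is reached.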

Other sensors

  • ExternalTaskSensor - Wait for a task in another DAG to complete
  • HttpSensor - Request a web URL and check for content
  • SqlSensor - Run a SQL query to check for content
  • Many others in airflow.sensors and airflow.providers.*.sensors

Why sensors?

Use a sensor...

  • When it is uncertain when a condition will become true
  • If you don't want the task to fail immediately
  • To add task repetition without loops

Let's practice!
