Creating a production pipeline

Introduction to Apache Airflow in Python

Mike Metzger

Data Engineer

Running DAGs & Tasks

To run a specific task from the command line:

airflow tasks test <dag_id> <task_id> <date>

To run a full DAG:

airflow dags trigger -e <date> <dag_id>
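
As a rough sketch, the two commands above can be assembled from Python when they need to be scripted. The helper names, the DAG id `etl_update`, the task id `clean_data`, and the date are hypothetical placeholders, not part of the course material. The snippet only builds and prints the commands; it does not run them.

```python
import shlex


def task_test_command(dag_id, task_id, date):
    """Build the argument list for testing one task of a DAG."""
    return ["airflow", "tasks", "test", dag_id, task_id, date]


def dag_trigger_command(dag_id, date):
    """Build the argument list for triggering a full DAG run."""
    return ["airflow", "dags", "trigger", "-e", date, dag_id]


# Hypothetical identifiers, used only for illustration.
test_cmd = task_test_command("etl_update", "clean_data", "2024-01-01")
trigger_cmd = dag_trigger_command("etl_update", "2024-01-01")

# Print the commands in a form that can be pasted into a shell.
print(shlex.join(test_cmd))
print(shlex.join(trigger_cmd))
```

To execute a command instead of printing it, the list could be passed to `subprocess.run(test_cmd, check=True)` on a machine where Airflow is installed.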

Operators reminder

  • BashOperator - expects a bash_command
  • PythonOperator - expects a python_callable
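  • BranchPythonOperator - requires a python_callable that accepts **kwargs and returns the task_id (or list of task_ids) to follow. Airflow 1.x also needs provide_context=True; Airflow 2 passes the context automatically.
  • FileSensor - requires a filepath argument and might need the mode or poke_interval arguments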
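
A minimal sketch of a callable that could be handed to a BranchPythonOperator. The function name `choose_branch` and the task ids `weekend_task` and `weekday_task` are hypothetical. The callable is plain Python, so it can be exercised outside Airflow by passing a stand-in for the context value it reads.

```python
from datetime import date


def choose_branch(**kwargs):
    """Return the task_id to follow, based on the run date."""
    # Airflow supplies 'ds' in the task context as a YYYY-MM-DD string.
    run_date = date.fromisoformat(kwargs["ds"])
    # weekday() returns 5 for Saturday and 6 for Sunday.
    if run_date.weekday() >= 5:
        return "weekend_task"  # hypothetical task_id
    return "weekday_task"      # hypothetical task_id


# Exercise the callable with stand-in context values.
print(choose_branch(ds="2024-01-06"))  # a Saturday
print(choose_branch(ds="2024-01-03"))  # a Wednesday

# Inside a DAG file this would be wired up roughly as:
# branch = BranchPythonOperator(
#     task_id="branch", python_callable=choose_branch, dag=dag)
```

Each returned task_id must match a task placed directly downstream of the branch task.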

Template reminders

  • Many objects in Airflow can use templates
  • Certain fields may use templated strings, while others do not
  • One way to check is to use built-in documentation:
  1. Open a python3 interpreter
  2. Import the necessary libraries (e.g., from airflow.operators.bash import BashOperator)
  3. At the prompt, run help(<Airflow object>), e.g., help(BashOperator)
  4. Look for a line that references template_fields. It lists the arguments that can use templates.
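
As an alternative to scanning the help() output, the same information can be read from the class attribute directly. This is a sketch: `templated_fields` is a hypothetical helper, and `DemoOperator` is a stand-in class with illustrative values so the snippet runs without Airflow installed.

```python
def templated_fields(operator_cls):
    """Return the argument names a class declares in template_fields."""
    # Classes without the attribute yield an empty tuple.
    return tuple(getattr(operator_cls, "template_fields", ()))


class DemoOperator:
    """Hypothetical stand-in for an Airflow operator class."""
    template_fields = ("bash_command", "env")  # illustrative values


print(templated_fields(DemoOperator))

# With Airflow installed, the same helper applies to a real operator:
# from airflow.operators.bash import BashOperator
# print(templated_fields(BashOperator))
```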

Template documentation example

[Image: Airflow python3 help]

[Image: Airflow template help]

Let's practice!
