Creating a production pipeline

Introduction to Apache Airflow in Python

Mike Metzger

Data Engineer

Running DAGs & Tasks

To run a specific task from command-line:

airflow tasks test <dag_id> <task_id> <date>

To run a full DAG:

airflow dags trigger -e <date> <dag_id>
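
When these commands need to be launched from a script, they can be assembled as argument lists in Python. The following is a minimal sketch under stated assumptions: the helper names, the DAG id etl_update, the task id check_file, and the date are all hypothetical and do not come from the course material.

```python
import shlex


def task_test_command(dag_id, task_id, date):
    """Build the argv for: airflow tasks test <dag_id> <task_id> <date>."""
    return ["airflow", "tasks", "test", dag_id, task_id, date]


def dag_trigger_command(dag_id, date):
    """Build the argv for: airflow dags trigger -e <date> <dag_id>."""
    return ["airflow", "dags", "trigger", "-e", date, dag_id]


# Hypothetical identifiers, used only for illustration.
test_cmd = task_test_command("etl_update", "check_file", "2023-01-08")
trigger_cmd = dag_trigger_command("etl_update", "2023-01-08")

# Print the commands as they would be typed in a shell.
print(shlex.join(test_cmd))
print(shlex.join(trigger_cmd))

# To actually execute one of them on a machine with Airflow installed:
# import subprocess
# subprocess.run(test_cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell quoting problems if a DAG or task id ever contains unusual characters.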

Operators reminder

  • BashOperator - expects a bash_command
  • PythonOperator - expects a python_callable
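  • BranchPythonOperator - requires a python_callable that accepts **kwargs and returns the task_id (or list of task_ids) to run next. On Airflow 1.x, provide_context=True is also required; Airflow 2 passes the context automatically.
  • FileSensor - requires a filepath argument and might need the mode or poke_interval arguments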
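
The branching callable is the least obvious item in the list above, so here is a minimal sketch of one. The branching rule and the task ids (even_year_task, odd_year_task, branch_task) are hypothetical; the only assumptions about Airflow are that the context arrives as keyword arguments and that "ds" holds the run date as a YYYY-MM-DD string.

```python
def choose_branch(**kwargs):
    """Return the task_id of the task that should run next.

    Airflow passes the run's context as keyword arguments; "ds" is the
    run date as a YYYY-MM-DD string.
    """
    year = int(kwargs["ds"][:4])
    # Hypothetical rule and task ids, for illustration only.
    if year % 2 == 0:
        return "even_year_task"
    return "odd_year_task"


# Outside Airflow, the callable can be exercised with a hand-built context.
print(choose_branch(ds="2024-03-01"))
print(choose_branch(ds="2023-03-01"))

# Inside a DAG file it would be wired up roughly like this:
# from airflow.operators.python import BranchPythonOperator
# branch_task = BranchPythonOperator(
#     task_id="branch_task",
#     python_callable=choose_branch,
#     dag=dag,  # an existing DAG object, assumed to be defined elsewhere
# )
# branch_task >> [even_year_task, odd_year_task]
# On Airflow 1.x, also pass provide_context=True.
```

Because the callable is plain Python, its branching logic can be checked this way before the DAG is ever loaded by the scheduler.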

Template reminders

  • Many objects in Airflow can use templates
  • Certain fields may use templated strings, while others do not
  • One way to check is to use built-in documentation:
  1. Open a python3 interpreter
  2. Import the necessary libraries (e.g., from airflow.operators.bash import BashOperator)
  3. At the prompt, run help(<Airflow object>), e.g., help(BashOperator)
  4. Look for a line referencing template_fields. It lists the arguments that can use templates.
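
As an alternative to scanning the help() output, the template_fields attribute can be read directly from the class. The sketch below is one possible way to do that; the helper name get_template_fields is hypothetical, and it returns None when Airflow cannot be imported so that it can run in any environment.

```python
import importlib


def get_template_fields(module_path, class_name):
    """Return a class's template_fields as a tuple, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        return None  # Airflow (or this module path) is not available here
    operator_class = getattr(module, class_name)
    # Classes without the attribute yield an empty tuple.
    return tuple(getattr(operator_class, "template_fields", ()))


fields = get_template_fields("airflow.operators.bash", "BashOperator")
if fields is None:
    print("Airflow is not importable in this environment")
else:
    print(fields)
```

On an Airflow 2 installation the printed tuple should include bash_command, which means that argument accepts templated strings.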

Template documentation example

[Screenshot: Airflow python3 help]

[Screenshot: Airflow template help]

Let's practice!
