Airflow operators

Introduction to Apache Airflow in Python

Mike Metzger

Data Engineer

Operators

  • Represent a single task in a workflow.
  • Run independently (usually).
  • Generally do not share information.
  • Various operators to perform different tasks.

# New way, Airflow 2.x+
from airflow.operators.empty import EmptyOperator
EmptyOperator(task_id='example')

# Old way, Airflow <2.0 (EmptyOperator was then named DummyOperator)
from airflow.operators.dummy_operator import DummyOperator
DummyOperator(task_id='example', dag=dag_name)
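
In Airflow 2.x an operator picks up its DAG from the surrounding context, so the dag= argument is no longer needed. The snippet below is a minimal sketch of that pattern; the dag_id, start_date, and schedule_interval values are illustrative placeholders, not part of the course.

from datetime import datetime
from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id='example_dag',                  # illustrative name
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,                # run only when triggered manually
) as dag:
    # Operators created inside the `with` block are attached to the DAG
    # automatically, so no dag= argument is needed.
    placeholder = EmptyOperator(task_id='example')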

BashOperator

BashOperator(
    task_id='bash_example',
    bash_command='echo "Example!"',
    # Next line only for Airflow before version 2
    dag=dag
)
BashOperator(
    task_id='bash_script_example',
    bash_command='runcleanup.sh',
)
  • Executes a given Bash command or script.
  • Runs the command in a temporary directory (by default).
  • Can specify environment variables for the command (see the sketch after this list).
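
As a rough illustration of the last two points, the sketch below passes an explicit environment variable to the command; the task_id, variable name, and path are hypothetical, not from the course.

from airflow.operators.bash import BashOperator

env_example_task = BashOperator(
    task_id='bash_env_example',                                # hypothetical task name
    bash_command='echo "Output goes to $OUTPUT_DIR" && pwd',   # pwd prints the temporary working directory
    env={'OUTPUT_DIR': '/tmp/output'},                         # variables visible to the command
)

Note that when env is supplied, the command sees only those variables rather than inheriting the worker's full environment by default.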

BashOperator examples

from airflow.operators.bash import BashOperator

example_task = BashOperator(task_id='bash_ex', bash_command='echo 1')

bash_task = BashOperator(
    task_id='clean_addresses',
    bash_command='cat addresses.txt | awk "NF==10" > cleaned.txt',
)

Operator gotchas

  • Not guaranteed to run in the same location / environment (see the debugging sketch after this list).
  • May require extensive use of environment variables.
  • Can be difficult to run tasks with elevated privileges.
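
One way to investigate these issues is to have a task print out its own runtime context. This is a small debugging sketch, not part of the course; the task_id is hypothetical.

from airflow.operators.bash import BashOperator

debug_runtime_task = BashOperator(
    task_id='show_runtime_environment',     # hypothetical task name
    bash_command='echo "user: $(whoami)"; echo "dir: $(pwd)"; env | sort',
)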

Let's practice!
