Airflow tasks

Introduction to Apache Airflow in Python

Mike Metzger

Data Engineer

Tasks

Tasks are:

  • Instances of operators
  • Usually assigned to a variable in Python
    example_task = BashOperator(task_id='bash_example',
                              bash_command='echo "Example!"')
    
  • Referred to by the task_id within the Airflow tools

Task dependencies


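  • Define a given order of task completion
  • Are not required for a given workflow, but are present in most workflows
  • Are referred to as upstream or downstream tasks
  • In Airflow 1.8 and later, are defined using the bitshift operators
    • >>, or the upstream operator: the task on the left is upstream of (runs before) the task on the right
    • <<, or the downstream operator: the task on the left is downstream of (runs after) the task on the right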
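Python lets a class decide what >> and << mean by defining the __rshift__ and __lshift__ methods, which is the language mechanism that makes this syntax possible. The sketch below uses a hypothetical ToyTask class (not Airflow's BaseOperator) that only records the task_ids of its upstream and downstream neighbours, and checks that a >> b and b << a describe the same ordering.

```python
class ToyTask:
    """Hypothetical stand-in for an operator instance (not Airflow's API)."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream_ids = set()    # task_ids that must finish before this task
        self.downstream_ids = set()  # task_ids that wait for this task

    def __rshift__(self, other):
        # self >> other: self runs first, so other is downstream of self
        self.downstream_ids.add(other.task_id)
        other.upstream_ids.add(self.task_id)
        return other

    def __lshift__(self, other):
        # self << other: other runs first, so reuse >> with the operands swapped
        other >> self
        return other


first = ToyTask("first_task")
second = ToyTask("second_task")
first >> second
print(second.upstream_ids)  # {'first_task'}

# The same edge written the other way round
first_b = ToyTask("first_task")
second_b = ToyTask("second_task")
second_b << first_b
print(second_b.upstream_ids == second.upstream_ids)  # True
```

Both toy methods return the right-hand operand, a design choice that lets several bitshift operators be chained in one expression.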

Upstream vs Downstream

Upstream means before

Downstream means after
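Taken together, upstream and downstream edges fix an order in which tasks may run. As a rough model only (not how the Airflow scheduler is implemented), the snippet below stores each task's upstream task_ids in a plain dict, using hypothetical task_ids, and lets the standard library's graphlib.TopologicalSorter (Python 3.9+) derive a run order.

```python
from graphlib import TopologicalSorter

# Map each task_id to the set of task_ids upstream of it (they must run before it).
upstream = {
    "second_task": {"first_task"},
    "third_task": {"second_task"},
}

# static_order() yields every upstream task before any of its downstream tasks.
order = list(TopologicalSorter(upstream).static_order())
print(order)  # ['first_task', 'second_task', 'third_task']
```

This only models the ordering constraint; it says nothing about how Airflow actually schedules, retries, or parallelizes tasks.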


Simple task dependency

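from airflow.operators.bash import BashOperator  # Airflow 2.x import path

# Define the tasks (dag is assumed to be a DAG object created earlier)
task1 = BashOperator(task_id='first_task',
                     bash_command='echo 1',
                     dag=dag)

task2 = BashOperator(task_id='second_task',
                     bash_command='echo 2',
                     dag=dag)

# Set first_task to run before second_task
task1 >> task2
# or, equivalently: task2 << task1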

Task dependencies in the Airflow UI

[Figure: simple_dependency DAG view]

[Figure: simple_dependency DAG view, tasks markup]

[Figure: simple_dependency DAG with bitshift dependency]


Multiple dependencies

Chained dependencies:

task1 >> task2 >> task3 >> task4

Mixed dependencies:

task1 >> task2 << task3

or, equivalently:

task1 >> task2
task3 >> task2

[Figure: chained_dependencies]

[Figure: mixed_dependencies]
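Both forms work because Python gives >> and << the same precedence and groups them left to right, so task1 >> task2 << task3 is evaluated as (task1 >> task2) << task3. The sketch below defines a hypothetical ToyTask class (not Airflow's BaseOperator) whose bitshift methods record upstream task_ids and return the right-hand operand; that return value is an assumption about how such chaining is typically implemented.

```python
class ToyTask:
    """Hypothetical stand-in for an Airflow task; records upstream task_ids only."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream_ids = set()

    def __rshift__(self, other):
        # self >> other: self must finish before other starts
        other.upstream_ids.add(self.task_id)
        return other  # return the right operand so the chain keeps going

    def __lshift__(self, other):
        # self << other: other must finish before self starts
        self.upstream_ids.add(other.task_id)
        return other


# Chained: parsed as ((task1 >> task2) >> task3) >> task4
task1, task2, task3, task4 = (ToyTask(f"task{i}") for i in range(1, 5))
task1 >> task2 >> task3 >> task4
print(task4.upstream_ids)  # {'task3'}

# Mixed, with fresh tasks: parsed as (task1 >> task2) << task3,
# so task2 waits for both task1 and task3
task1, task2, task3 = (ToyTask(f"task{i}") for i in range(1, 4))
task1 >> task2 << task3
print(sorted(task2.upstream_ids))  # ['task1', 'task3']
```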


Let's practice!
