Airflow DAGs

Introduction to Apache Airflow in Python

Mike Metzger

Data Engineer

What is a DAG?

DAG, or Directed Acyclic Graph:

  • Directed: there is an inherent flow representing dependencies between components.
  • Acyclic: it does not loop, cycle, or repeat.
  • Graph: the actual set of components.
  • Seen in Airflow, Apache Spark, dbt
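
The three properties above can be illustrated without Airflow at all. The following is a minimal sketch using Python's standard-library graphlib module (Python 3.9+); the component names extract, transform, and load are hypothetical and chosen only for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a component; its value is the set of components it depends on.
# The component names are hypothetical.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# Because the graph is directed and acyclic, a valid execution order exists.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # ['extract', 'transform', 'load']

# Making the first component depend on the last one creates a cycle,
# so the graph is no longer a DAG.
cyclic = {**dependencies, "extract": {"load"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    is_dag = True
except CycleError:
    is_dag = False
print(is_dag)  # False
```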

[Figure: DAG]

1 https://en.m.wikipedia.org/wiki/Directed_acyclic_graph

DAG in Airflow

Within Airflow, DAGs:

  • Are written in Python (but can use components written in other languages).
  • Are made up of components (typically tasks) to be executed, such as operators, sensors, etc.
  • Contain dependencies defined explicitly or implicitly.
    • e.g., copy the file to the server before trying to import it into the database service.

Define a DAG

Example DAG:

from datetime import datetime

from airflow import DAG

# Arguments applied by default to the tasks in this DAG.
default_arguments = {
    'owner': 'jdoe',
    'email': '[email protected]',
    'start_date': datetime(2020, 1, 20)
}

with DAG('etl_workflow', default_args=default_arguments) as etl_dag:
    pass  # define the workflow's tasks inside this block

Define a DAG (before Airflow 2.x)

Example DAG:

from datetime import datetime

from airflow import DAG

default_arguments = {
    'owner': 'jdoe',
    'email': '[email protected]',
    'start_date': datetime(2020, 1, 20)
}

etl_dag = DAG('etl_workflow', default_args=default_arguments)

DAGs on the command line

Using airflow:

  • The airflow command line program contains many subcommands.
  • airflow -h for descriptions.
  • Many are related to DAGs.
  • airflow dags list to show all recognized DAGs.
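
If these subcommands need to be scripted, one possible approach is to call the airflow program from Python with subprocess. This is only a sketch: the helper name run_airflow_cli is hypothetical, and it returns None when no airflow executable is found on the PATH.

```python
import shutil
import subprocess
from typing import Optional


def run_airflow_cli(*args: str) -> Optional[str]:
    """Run `airflow <args>` and return its stdout.

    Returns None if the airflow command is not installed (hypothetical helper).
    """
    if shutil.which("airflow") is None:
        return None
    result = subprocess.run(
        ["airflow", *args],
        capture_output=True,
        text=True,
        check=False,  # do not raise if the command exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Equivalent to running `airflow -h` and `airflow dags list` in a shell.
    print(run_airflow_cli("-h"))
    print(run_airflow_cli("dags", "list"))
```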

Command line vs Python

Use the command line tool to:

  • Start Airflow processes
  • Manually run DAGs / Tasks
  • Get logging information from Airflow

Use Python to:

  • Create a DAG
  • Edit the individual properties of a DAG

Let's practice!
