Introduction to Apache Airflow

Introduction to Apache Airflow in Python

Mike Metzger

Data Engineer

What is data engineering?

Data engineering is:

  • Taking any action involving data and turning it into a reliable, repeatable, and maintainable process.

What is a workflow?

A workflow is:

  • A set of steps to accomplish a given data engineering task
    • Such as: downloading files, copying data, filtering information, writing to a database, etc.
  • Of varying levels of complexity
  • A term with various meanings depending on context
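As a rough illustration of "a set of steps", the sketch below chains three plain Python functions: download, filter, write to a database. It does not use Airflow; the sample data, function names, and the `payments` table are all hypothetical and exist only for this example.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for a downloaded file.
RAW_CSV = "name,amount\nalice,10\nbob,-3\ncarol,7\n"


def download_file() -> str:
    """Step 1: pretend to download a file; here we just return sample text."""
    return RAW_CSV


def filter_rows(raw: str) -> list[tuple[str, int]]:
    """Step 2: parse the CSV and keep only rows with a positive amount."""
    reader = csv.DictReader(io.StringIO(raw))
    return [(r["name"], int(r["amount"])) for r in reader if int(r["amount"]) > 0]


def write_to_database(rows, conn) -> int:
    """Step 3: write the filtered rows to a table and return the row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM payments").fetchone()[0]


def run_workflow() -> int:
    """Run the steps in order; each step consumes the previous step's output."""
    conn = sqlite3.connect(":memory:")
    try:
        return write_to_database(filter_rows(download_file()), conn)
    finally:
        conn.close()
```

Calling `run_workflow()` runs the three steps once, by hand. A workflow tool takes over the parts this sketch leaves out, such as scheduling, retries, and monitoring.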

Example Workflow


What is Airflow?

Airflow is a platform to program workflows, including:

  • Creation
  • Scheduling
  • Monitoring


Airflow continued...

  • Can run programs written in any language, but the workflows themselves are written in Python
  • Implements workflows as DAGs: Directed Acyclic Graphs
  • Accessed via code, the command line, or the web interface / REST API

1 https://airflow.apache.org/docs/stable/

Other workflow tools

Other tools:

  • Luigi
  • SSIS
  • Bash scripting


Quick introduction to DAGs

DAG stands for Directed Acyclic Graph

  • In Airflow, this represents the set of tasks that make up your workflow.
  • Consists of the tasks and the dependencies between tasks.
  • Created with various details about the DAG, including the name, start date, owner, etc.
  • Further depth in the next lesson.
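To make "tasks and the dependencies between tasks" concrete, here is a minimal sketch that uses Python's standard-library `graphlib`, not Airflow's own API. The task names are hypothetical. It shows how a dependency graph fixes a valid run order, and why the graph must be acyclic.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "download_file": set(),
    "filter_data": {"download_file"},
    "write_to_database": {"filter_data"},
}

# static_order() yields tasks so that every task follows its dependencies.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)


def has_cycle(graph: dict[str, set[str]]) -> bool:
    """Return True if the dependency graph contains a cycle."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False
```

A graph such as `{"a": {"b"}, "b": {"a"}}` has no valid run order, because each task waits on the other. That is the situation the "acyclic" requirement rules out.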

DAG Example


DAG code example

Simple DAG definition:

from airflow import DAG

# Define the DAG; the start date is passed through default_args
etl_dag = DAG(
    dag_id='etl_pipeline',
    default_args={"start_date": "2024-01-08"}
)

Running a workflow in Airflow

Running a simple Airflow task

airflow tasks test <dag_id> <task_id> [execution_date]

Using a DAG named example-etl and a task named download-file, for the date 2024-01-10:

airflow tasks test example-etl download-file 2024-01-10
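The same command can also be assembled from Python, for example inside a helper script. The sketch below only mirrors the command form shown above; the helper names `build_test_command` and `run_task_test` are hypothetical, and actually running the command assumes the `airflow` executable is installed and on the PATH.

```python
import shutil
import subprocess
from typing import Optional


def build_test_command(
    dag_id: str, task_id: str, execution_date: Optional[str] = None
) -> list[str]:
    """Build the argument list for `airflow tasks test`."""
    command = ["airflow", "tasks", "test", dag_id, task_id]
    if execution_date is not None:
        # The date is optional, matching [execution_date] in the usage line.
        command.append(execution_date)
    return command


def run_task_test(dag_id: str, task_id: str, execution_date: Optional[str] = None) -> int:
    """Run the command and return its exit code (assumes airflow is installed)."""
    if shutil.which("airflow") is None:
        raise RuntimeError("the airflow executable was not found on PATH")
    return subprocess.run(build_test_command(dag_id, task_id, execution_date)).returncode


command = build_test_command("example-etl", "download-file", "2024-01-10")
print(" ".join(command))
```

Passing the arguments as a list instead of a single shell string avoids quoting problems if a DAG or task name contains unusual characters.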

Let's practice!
