Introduction to Apache Airflow

Introduction to Apache Airflow in Python

Mike Metzger

Data Engineer

What is a workflow?

  • Workflow - A set of steps to accomplish a given data engineering process
  • Example: Download files, copy data, filter information, and write to a database

Workflow diagram of sequential data engineering steps from downloading files to writing to a database
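
The example workflow above can be sketched in plain Python, before any orchestrator is involved. This is only a minimal illustration: the function names, the in-memory "downloaded" lines, the age filter, and the SQLite table are all hypothetical stand-ins and are not part of Airflow.

```python
import sqlite3


def download_files():
    # Stand-in for a real download: return raw lines as if read from files.
    return ["alice,34", "bob,17", "carol,52"]


def copy_data(lines):
    # Copy the raw lines into structured records.
    records = []
    for line in lines:
        name, age = line.split(",")
        records.append({"name": name, "age": int(age)})
    return records


def filter_information(records, min_age=18):
    # Keep only the records that pass the (hypothetical) filter condition.
    return [r for r in records if r["age"] >= min_age]


def write_to_database(records, conn):
    # Write the filtered records to a table.
    conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO people (name, age) VALUES (:name, :age)", records
    )
    conn.commit()


def run_workflow(conn):
    # Each step consumes the output of the previous one, in a fixed order.
    lines = download_files()
    records = copy_data(lines)
    kept = filter_information(records)
    write_to_database(kept, conn)
    return len(kept)
```

Calling `run_workflow(sqlite3.connect(":memory:"))` runs the four steps in sequence. A script like this has no scheduling, retries, or monitoring, which is the gap a workflow platform is meant to fill.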

What is Airflow?

Airflow - a platform to orchestrate workflows:

  • Create
  • Schedule
  • Monitor

Apache Airflow logo

Icons representing creating, scheduling, and monitoring workflows

What is Airflow?

  • Can run programs written in any language, but workflows are defined in Python
  • Implements workflows as Dags (directed acyclic graphs)
  • Accessed via the web interface, code, the command line, or the REST API
  • Used for ETL pipelines, ML workflows, automation, etc.

Apache Airflow logo

Quick introduction to Dags

Dag - a model that represents everything needed to execute a workflow.

  • Consists of the tasks and their dependencies
  • Created with various details, including the name, owner, and email

Example Dag of tasks connected by dependency arrows
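
The idea of "tasks and their dependencies" can be illustrated with Python's standard library alone. The sketch below is not Airflow code: the task names are hypothetical, and `graphlib.TopologicalSorter` is used only to show how a dependency graph determines a valid execution order.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "download_files": set(),
    "copy_data": {"download_files"},
    "filter_information": {"copy_data"},
    "write_to_database": {"filter_information"},
}


def execution_order(deps):
    # static_order() yields every task after all of its dependencies and
    # raises graphlib.CycleError if the graph contains a cycle.
    return list(TopologicalSorter(deps).static_order())
```

The cycle check is why the graph must be acyclic: if two tasks depended on each other, no valid execution order would exist.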

Airflow components

  • Scheduler - Triggers scheduled workflows and submits tasks to the executor to run
  • API server - Provides consistent, secure access to the Airflow platform
  • Dag processor - Parses Dag files and serializes them into the metadata database, where the scheduler reads them to figure out what tasks to run and when to run them
  • Metadata database - Stores the state of Dags and tasks

Diagram of Airflow components: scheduler, API server, Dag processor, and metadata database
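
To make the division of responsibilities concrete, here is a toy model of the four roles using only the standard library. It is an assumption-heavy sketch, not how Airflow is implemented: every function name, the table layout, and the state strings are hypothetical, and the "scheduler" here runs tasks directly instead of submitting them elsewhere.

```python
import sqlite3
from graphlib import TopologicalSorter


def make_metadata_db():
    # Metadata database role: one table holding the state of each task.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_state (dag_id TEXT, task_id TEXT, state TEXT)"
    )
    return conn


def process_dag(dag_id, deps, conn):
    # Dag processor role: record the tasks found in a parsed Dag definition.
    for task_id in deps:
        conn.execute(
            "INSERT INTO task_state VALUES (?, ?, ?)",
            (dag_id, task_id, "pending"),
        )


def run_dag(dag_id, deps, callables, conn):
    # Scheduler role (simplified): walk tasks in dependency order, run each
    # one directly, store its state, and stop at the first failure.
    for task_id in TopologicalSorter(deps).static_order():
        try:
            callables[task_id]()
            state = "success"
        except Exception:
            state = "failed"
        conn.execute(
            "UPDATE task_state SET state = ? WHERE dag_id = ? AND task_id = ?",
            (state, dag_id, task_id),
        )
        if state == "failed":
            break


def get_states(dag_id, conn):
    # API server role: read-only view of the stored task states.
    rows = conn.execute(
        "SELECT task_id, state FROM task_state WHERE dag_id = ?", (dag_id,)
    )
    return dict(rows)
```

In this toy version, tasks downstream of a failure are left in the `pending` state, so a reader of `get_states` can tell which tasks never ran.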

Running a workflow in Airflow UI

Airflow Dags view showing two loaded Dags with interaction options

Airflow Dags view with the trigger play button highlighted

Airflow trigger Dag popup with default run options and a Trigger button

Airflow Dags view showing a successful latest run marked with a green checkmark

Airflow Dag run tasks view listing individual task details such as generate_random_number

Let's practice!
