Airflow core concepts

Building Data Pipelines with Airflow

Volker Janz

Senior Developer Advocate at Astronomer

Meet your instructor

  Volker Janz profile photo

$$

Volker Janz

$$

  • Senior Developer Advocate, Astronomer
  • 14+ years as a data engineer in gaming
  • Worked with Airflow since version 1.x
  • Speaker, mentor, and newsletter lead at Data Engineer Things
Building Data Pipelines with Airflow

What you'll build

 

  • Author Dags with the TaskFlow API
  • Build dynamic workflows with task mapping and asset-based scheduling
  • Handle failures with retries and callbacks
  • Run SQL workloads through Airflow

Visualization of each chapter content

Building Data Pipelines with Airflow

Before we start

 

$$

  • Comfortable with Dags, tasks, and operators
  • Familiar with scheduling basics

Introduction to Airflow - course page

Building Data Pipelines with Airflow

Quick refresher

from airflow.sdk import dag, task

@dag
def star_wars_dag():


@task def get_star_wars_person(): import requests return requests.get("https://swapi.dev/api/people/1/").json()
@task.bash def print_name(person): return f"echo '{person['name']}'"
person = get_star_wars_person() print_name(person) star_wars_dag()
  • A Dag is a collection of tasks with dependencies
  • Tasks are individual units of work
  • Operators / decorators define what each task does
  • Dependencies define the order of execution

$$

Simple Dag

Building Data Pipelines with Airflow

Airflow architecture

$$

Airflow 3 architecture

 

$$

  • Orchestration: Scheduler, Dag Processor
  • Execution: Worker, Triggerer
  • Interface & Storage: API Server, Metadata DB
Building Data Pipelines with Airflow

Scheduling approaches

# No automatic runs: trigger manually (default)
@dag(schedule=None)
def my_pipeline(): ...

# Time-based: runs every day at 6 AM @dag(schedule="0 6 * * *") def daily_pipeline(): ...
# Data-aware: runs when an Asset updates @dag(schedule=[Asset("my_asset")]) def downstream_pipeline(): ...
Building Data Pipelines with Airflow

Two ways to write Dags

Classic operators

extract = PythonOperator(
    task_id="extract",
    python_callable=extract_fn)
extract >> transform
  • Choose when no decorators are available

TaskFlow API

@task
def extract():
    return {"users": 150}

data = extract()
transform(data)
  • Simple Python decorators
  • Less boilerplate code
  • Can be combined with classic operators
Building Data Pipelines with Airflow

Exercises in this course

$$

IDE exercise screenshot

 

  • IDE exercises: edit real .py files
  • Click "Run this file" or use python3 filename.py
  • dag.test() runs the full Dag in one process
Building Data Pipelines with Airflow

Let's practice!

Building Data Pipelines with Airflow

Preparing Video For Download...