Debugging and troubleshooting in Airflow

Introduction to Apache Airflow in Python

Mike Metzger

Data Engineer

Typical issues...

  • DAG won't run on schedule
  • DAG won't load
  • Syntax errors

DAG won't run on schedule

  • Check if scheduler is running

Airflow scheduler not running

  • Fix by running airflow scheduler from the command line.

DAG won't run on schedule (part 2)

  • At least one schedule_interval period hasn't passed since the start_date.
    • Modify the start_date or schedule_interval to meet your requirements.
  • Not enough free task slots within the executor to run.
    • Change executor type
    • Add system resources
    • Add more systems
    • Change DAG scheduling
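
The first bullet above can be illustrated with a small calculation. This is a minimal sketch, assuming a simplified model in which the first run is triggered once one full schedule_interval has elapsed after the start_date and the interval is a fixed timedelta. The helper names first_run_time and has_interval_passed are hypothetical and are not part of Airflow.

```python
from datetime import datetime, timedelta


def first_run_time(start_date: datetime, schedule_interval: timedelta) -> datetime:
    """Earliest moment the first DAG run would be triggered (simplified model)."""
    return start_date + schedule_interval


def has_interval_passed(start_date: datetime,
                        schedule_interval: timedelta,
                        now: datetime) -> bool:
    """True once at least one full schedule_interval has elapsed since start_date."""
    return now >= first_run_time(start_date, schedule_interval)


# Hypothetical DAG settings, for illustration only.
start = datetime(2023, 1, 1)
daily = timedelta(days=1)

print(first_run_time(start, daily))                                 # 2023-01-02 00:00:00
print(has_interval_passed(start, daily, datetime(2023, 1, 1, 12)))  # False
```

If the check returns False for the current time, the DAG has not missed a run; it is simply not due yet.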

DAG won't load

  • DAG not in web UI
  • DAG not in airflow dags list

Possible solutions

  • Verify the DAG file is in the correct folder
  • Determine the DAGs folder via the dags_folder setting in airflow.cfg
  • Note: the folder must be an absolute path
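
The lookup described above can be sketched in Python. This is a minimal sketch, assuming dags_folder sits in the [core] section of airflow.cfg and that the path is POSIX-style. The helper read_dags_folder and the sample path are hypothetical; in practice you would pass in the text of your own airflow.cfg.

```python
import configparser
from pathlib import PurePosixPath


def read_dags_folder(cfg_text: str) -> str:
    """Return the dags_folder value from the [core] section of airflow.cfg text."""
    # Interpolation is disabled so '%' characters elsewhere in the file are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get("core", "dags_folder")


def is_absolute(folder: str) -> bool:
    """Check that the configured folder is an absolute (POSIX-style) path."""
    return PurePosixPath(folder).is_absolute()


# Hypothetical configuration fragment, for illustration only.
sample_cfg = """
[core]
dags_folder = /home/user/airflow/dags
"""

folder = read_dags_folder(sample_cfg)
print(folder)               # /home/user/airflow/dags
print(is_absolute(folder))  # True
```

A relative value such as dags would fail the second check, which matches the note that the folder must be an absolute path.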

Airflow dags_folder

Syntax errors

  • The most common reason a DAG file won't appear
  • Sometimes difficult to find errors in DAG
  • Two quick methods:

    • Run airflow dags list-import-errors

    • Run python3 <dagfile.py>
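
A related check can be done from Python itself. This is a minimal sketch that uses the built-in compile() to parse source text without executing it. Unlike running python3 <dagfile.py>, it reports only syntax errors and will not catch failed imports or runtime problems. The helper find_syntax_error and the sample sources are hypothetical.

```python
from typing import Optional


def find_syntax_error(source: str, filename: str = "<dag>") -> Optional[str]:
    """Return a short description of the first syntax error, or None if the source parses."""
    try:
        # compile() only parses the code; nothing in the file is executed.
        compile(source, filename, "exec")
    except SyntaxError as err:
        return f"{filename}:{err.lineno}: {err.msg}"
    return None


# Hypothetical file contents, for illustration only.
good_source = "x = 1\n"
bad_source = "def broken(:\n    pass\n"

print(find_syntax_error(good_source, "good_dag.py"))  # None
print(find_syntax_error(bad_source, "bad_dag.py"))
```

To check a real file, read its text first, for example with pathlib.Path(path).read_text(), and pass the path as the filename.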

airflow dags list-import-errors

airflow dags list-import-errors with errors

Running the Python interpreter

python3 dagfile.py:

  • With errors

python3 - errors

  • Without errors

python3 - no errors

Let's practice!
