Passing data between tasks with XCom

Introduction to Apache Airflow in Python

Mike Metzger

Data Engineer

What is XCom?

  • "Cross Communication"
    • Allows tasks to talk to each other
  • Stored in the Airflow metadata database
    • Pass small amounts of data
    • Filenames, URI, row counts

Illustration of XCom passing small data between Airflow tasks

Introduction to Apache Airflow in Python

What not to send via XCom

  • Large files
  • DataFrames
  • Full databases
  • Large images

Illustration of data types to avoid sending via XCom, such as large files and DataFrames

Introduction to Apache Airflow in Python

Implementing XCom

  • Many ways to use XCom
  • We'll focus on the TaskFlow API
  • Extension of what we've already done with @tasks
Introduction to Apache Airflow in Python

XCom example

@dag(dag_id='Example_XCom')
def example_xcom():

@task def get_data(): return data
@task(multiple_outputs=True) def clean_data(sourcedata): return clean(sourcedata) # Example, not implemented
clean_data(get_data()) example_xcom()
Introduction to Apache Airflow in Python

XCom dependencies

  • XComs automatically define dependency order
  • Example
    clean_data(get_data())
    
  • Conceptually the same as get_data() >> clean_data()
  • Further example
     result = clean_data(get_data())
     result >> alert_when_complete()
    
Introduction to Apache Airflow in Python

Viewing XCom data

Airflow XCom page listing stored values by key, Dag, and task

Introduction to Apache Airflow in Python

Let's practice!

Introduction to Apache Airflow in Python

Preparing Video For Download...