Introduction to Data Engineering
Vincent Vankrunkelsven
Data Engineer @ DataCamp

How to schedule?
cron scheduling toolDirected Acyclic Graph

cron

# Create the DAG object dag = DAG(dag_id="example_dag", ..., schedule_interval="0 * * * *")# Define operations start_cluster = StartClusterOperator(task_id="start_cluster", dag=dag) ingest_customer_data = SparkJobOperator(task_id="ingest_customer_data", dag=dag) ingest_product_data = SparkJobOperator(task_id="ingest_product_data", dag=dag) enrich_customer_data = PythonOperator(task_id="enrich_customer_data", ..., dag = dag)# Set up dependency flow start_cluster.set_downstream(ingest_customer_data) ingest_customer_data.set_downstream(enrich_customer_data) ingest_product_data.set_downstream(enrich_customer_data)
Introduction to Data Engineering