Running a data pipeline in production

ETL and ELT in Python

Jake Roach

Data Engineer

Data pipeline architecture patterns

# Define ETL functions
...
def load(clean_data):
    ...

# Run the data pipeline
raw_stock_data = extract("raw_stock_data.csv")
clean_stock_data = transform(raw_stock_data)
load(clean_stock_data)

> ls
 etl_pipeline.py

# Import extract, transform, and load functions
from pipeline_utils import extract, transform, load

# Run the data pipeline
raw_stock_data = extract("raw_stock_data.csv")
clean_stock_data = transform(raw_stock_data)
load(clean_stock_data)

> ls
 etl_pipeline.py
 pipeline_utils.py
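
The slides import `extract`, `transform`, and `load` from `pipeline_utils.py` but do not show that module. As a minimal sketch only, assuming the raw file is a CSV with hypothetical `ticker` and `close` columns and that the cleaned rows are written to another CSV (the `path` parameter and its default are also assumptions), it might look like this:

```python
# pipeline_utils.py (hypothetical sketch; the course's actual module is not shown)
import csv


def extract(file_path):
    """Read a CSV file into a list of row dictionaries."""
    with open(file_path, newline="") as f:
        return list(csv.DictReader(f))


def transform(raw_data):
    """Drop rows with a missing price and cast the price to float."""
    clean_data = []
    for row in raw_data:
        # "close" is an assumed column name; empty strings and None are skipped
        if row.get("close"):
            clean_data.append({**row, "close": float(row["close"])})
    return clean_data


def load(clean_data, path="clean_stock_data.csv"):
    """Write the cleaned rows to a CSV file (assumed destination)."""
    if not clean_data:
        return  # nothing to write
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(clean_data[0].keys()))
        writer.writeheader()
        writer.writerows(clean_data)
```

Keeping these functions in their own module lets `etl_pipeline.py` stay a short script that only wires the steps together, and lets each step be tested on its own.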

Running a pipeline end to end

import logging
from pipeline_utils import extract, transform, load

logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.DEBUG)
try:
    # Extract, transform, and load data
    raw_stock_data = extract("raw_stock_data.csv")
    clean_stock_data = transform(raw_stock_data)
    load(clean_stock_data)

    logging.info("Successfully extracted, transformed and loaded data.")  # Log success message

# Handle exceptions, log messages
except Exception as e:
    logging.error(f"Pipeline failed with error: {e}")
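
The `try`/`except` above logs the failure, but because the exception is caught the script still exits with status 0, so a scheduler that checks exit codes would see a success. One possible variation, sketched here under the assumption that the caller wants a success flag it can turn into an exit code (the `run_pipeline` helper and its parameters are hypothetical, not part of the course code):

```python
import logging


def run_pipeline(extract, transform, load, source):
    """Run the three steps in order; return True on success, False on failure."""
    try:
        raw_data = extract(source)
        clean_data = transform(raw_data)
        load(clean_data)
        logging.info("Successfully extracted, transformed and loaded data.")
        return True
    except Exception:
        # logging.exception logs at ERROR level and appends the traceback
        logging.exception("Pipeline failed")
        return False
```

In `etl_pipeline.py`, the result could then be passed on with something like `sys.exit(0 if run_pipeline(extract, transform, load, "raw_stock_data.csv") else 1)`, so the calling tool can tell a failed run from a successful one.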

Orchestrating data pipelines in production

Orchestration tools by market share.

1 https://open.substack.com/pub/seattledataguy/p/the-state-of-data-engineering-part?r=1po78c&utm_campaign=post&utm_medium=web

Let's practice!
