Running a data pipeline in production

ETL and ELT in Python

Jake Roach

Data Engineer

Data pipeline architecture patterns

# Define ETL function
...
def load(clean_data):
...

# Run the data pipeline
raw_stock_data = extract("raw_stock_data.csv")
clean_stock_data = transform(raw_stock_data)
load(clean_stock_data)

> ls
 etl_pipeline.py

# Import extract, transform, and load functions
from pipeline_utils import extract, transform, load

# Run the data pipeline
raw_stock_data = extract("raw_stock_data.csv")
clean_stock_data = transform(raw_stock_data)
load(clean_stock_data)

> ls
 etl_pipeline.py
 pipeline_utils.py
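
The slides do not show what lives inside `pipeline_utils.py`. As a minimal sketch of how such a module might look, the version below uses only the standard library `csv` module. The function bodies, the `close` column name, and the default output path are all assumptions made for illustration, not the course's implementation.

```python
# pipeline_utils.py (hypothetical contents; the slides elide these bodies)
import csv


def extract(file_path):
    """Read a CSV file into a list of row dictionaries."""
    with open(file_path, newline="") as f:
        return list(csv.DictReader(f))


def transform(raw_data):
    """Keep rows with a non-empty 'close' value and cast it to float.

    The 'close' column is an assumed field name for stock data.
    """
    clean_data = []
    for row in raw_data:
        if row.get("close"):
            clean_data.append({**row, "close": float(row["close"])})
    return clean_data


def load(clean_data, file_path="clean_stock_data.csv"):
    """Write the cleaned rows to a CSV file (assumed destination)."""
    if not clean_data:
        return
    with open(file_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(clean_data[0].keys()))
        writer.writeheader()
        writer.writerows(clean_data)
```

Keeping the three functions in their own module means `etl_pipeline.py` only has to import and call them, which is the separation the two directory listings illustrate.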

Running a pipeline end-to-end

import logging
from pipeline_utils import extract, transform, load

logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.DEBUG)
try:
    # Extract, transform, and load data
    raw_stock_data = extract("raw_stock_data.csv")
    clean_stock_data = transform(raw_stock_data)
    load(clean_stock_data)

    logging.info("Successfully extracted, transformed and loaded data.")  # Log success message

# Handle exceptions, log messages
except Exception as e:
    logging.error(f"Pipeline failed with error: {e}")
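
To see the failure path of this pattern in action, the sketch below wraps the same try/except logic in a function and feeds it a step that raises. The `run_pipeline` wrapper and the stand-in step functions are hypothetical names introduced only for this illustration.

```python
import logging

logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.DEBUG)


def run_pipeline(extract, transform, load, source):
    """Run the three steps; return True on success, False on failure."""
    try:
        load(transform(extract(source)))
        logging.info("Successfully extracted, transformed and loaded data.")
        return True
    except Exception as e:
        # Any exception raised by a step is logged instead of crashing the run
        logging.error(f"Pipeline failed with error: {e}")
        return False


# Hypothetical stand-in for an extract step whose source file is missing
def failing_extract(path):
    raise FileNotFoundError(f"{path} not found")


ok = run_pipeline(failing_extract, lambda data: data, lambda data: None, "missing.csv")
```

Here `ok` is `False` and an `ERROR`-level message is written to the log. Returning a status is one possible design choice that lets a caller, such as a scheduler, react to a failed run.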

Orchestrating data pipelines in production

Orchestration tools by market share.

1 https://open.substack.com/pub/seattledataguy/p/the-state-of-data-engineering-part?r=1po78c&utm_campaign=post&utm_medium=web

Let's practice!
