Loading data to a SQL database with pandas

ETL and ELT in Python

Jake Roach

Data Engineer

Load data to a SQL database with pandas

ETL pipeline with the load component highlighted.

Loading data into a SQL database with pandas

Data consumers accessing a SQL database.

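pandas provides the DataFrame method .to_sql() to persist data to a SQL table. Key parameters:

  • name: name of the target table
  • con: database connection, typically a SQLAlchemy engine
  • if_exists: what to do if the table already exists ("fail", "replace", or "append")
  • index: whether to write the DataFrame index as a column
  • index_label: column name to use for the index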
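
The parameters above can be tried out without a running database server. The following is a minimal sketch, assuming an in-memory SQLite connection as a stand-in for a real database; the table name stock_demo and the sample rows are hypothetical and only serve to show how if_exists, index, and index_label behave.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data; the index plays the role of the timestamps
stock_data = pd.DataFrame(
    {"ticker": ["AAA", "BBB"], "close": [10.5, 20.25]},
    index=pd.Index(["2024-01-02", "2024-01-03"]),
)

# Assumption: an in-memory SQLite database stands in for the real target
connection = sqlite3.connect(":memory:")

# "replace" creates the table (dropping any existing one first);
# index=True writes the index as a column named by index_label
stock_data.to_sql(
    name="stock_demo",
    con=connection,
    if_exists="replace",
    index=True,
    index_label="timestamps",
)

# "append" adds the same rows to the existing table
stock_data.to_sql(
    name="stock_demo",
    con=connection,
    if_exists="append",
    index=True,
    index_label="timestamps",
)

# "fail" raises a ValueError because the table already exists
fail_raised = False
try:
    stock_data.to_sql(name="stock_demo", con=connection, if_exists="fail")
except ValueError:
    fail_raised = True

# Read the table back to inspect the result
written = pd.read_sql("SELECT * FROM stock_demo", connection)
print(written)
```

Choosing "append" suits pipelines that load new batches on each run, while "replace" suits full refreshes of a table.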

Persisting data to Postgres with pandas

import sqlalchemy

# Create a database engine from a connection URI
connection_uri = "postgresql+psycopg2://repl:password@localhost:5432/market"
db_engine = sqlalchemy.create_engine(connection_uri)

# Use the .to_sql() method to persist the DataFrame to a SQL table
clean_stock_data.to_sql(
    name="filtered_stock_data",
    con=db_engine,
    if_exists="append",       # add rows if the table already exists
    index=True,               # write the DataFrame index as a column
    index_label="timestamps"  # column name for the index
)

Validating data persistence with pandas

It's important to validate that data is persisted as expected.

  • Ensure data can be queried
  • Make sure counts match
  • Validate that each row is present

import pandas as pd

# Pull data written to the SQL table
to_validate = pd.read_sql("SELECT * FROM filtered_stock_data", db_engine)
# Validate counts, record equality, etc.
...
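
The validation steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the course's own validation code: an in-memory SQLite connection replaces the Postgres engine, and the table name validation_demo and the sample rows are hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical cleaned data with a named index
clean_data = pd.DataFrame(
    {"ticker": ["AAA", "BBB", "CCC"], "close": [10.5, 20.25, 30.0]},
    index=pd.Index(["2024-01-02", "2024-01-03", "2024-01-04"], name="timestamps"),
)

# Assumption: in-memory SQLite stands in for the real database
connection = sqlite3.connect(":memory:")
clean_data.to_sql(
    name="validation_demo",
    con=connection,
    if_exists="replace",
    index=True,
    index_label="timestamps",
)

# 1. Ensure the data can be queried
to_validate = pd.read_sql("SELECT * FROM validation_demo", connection)

# 2. Make sure row counts match
counts_match = len(to_validate) == len(clean_data)

# 3. Validate that each row is present by comparing records one by one
expected_records = clean_data.reset_index().to_dict("records")
actual_records = to_validate.to_dict("records")
rows_match = actual_records == expected_records

print(counts_match, rows_match)
```

Comparing plain records keeps the check independent of column dtypes, which can change during a round trip through a database. For stricter checks, pandas.testing.assert_frame_equal is an option.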

Let's practice!
