Working with Databases

Scaling and Optimizing Data Pipelines with Polars

Liam Brannigan

Data Scientist & Polars Contributor

Querying a database from Polars

Conceptual diagram showing Polars sending a SQL query and uri to a database.

Scaling and Optimizing Data Pipelines with Polars

Querying a database from Polars

Conceptual diagram showing Polars sending a SQL query to a database and receiving a result table via a database engine.

Scaling and Optimizing Data Pipelines with Polars

Databases Polars can query

Databases Polars can work with

  • Embedded → runs locally, no server
  • DuckDBanalytical database
Scaling and Optimizing Data Pipelines with Polars

Building permits table

SELECT *
FROM building_permits
LIMIT 5
shape: (5, 5)
| issue_date | permit_type | review_type | street_name  | work_description             |
| ---        | ---         | ---         | ---          | ---                          |
| date       | str         | str         | str          | str                          |
|------------|-------------|-------------|--------------|------------------------------|
| 2026-04-30 | Signs       | SIGN PERMIT | CLARK ST     | Wall sign install            |
| 2026-04-28 | Renovation  | SELF CERT   | OAKLEY AVE   | Add dormer to 8-unit roof    |
| 2026-03-21 | Easy Permit | EASY PERMIT | SEMINARY AVE | Parapet masonry repair       |
| 2026-03-11 | Easy Permit | EASY PERMIT | RHODES AVE   | Rebuild south chimney side   |
| 2026-03-03 | Easy Permit | EASY PERMIT | GIDDINGS ST  | Porch lintel + tuckpointing  |
Scaling and Optimizing Data Pipelines with Polars

Connection URI

import polars as pl

uri = "postgresql://analyst:[email protected]:5432/buildings"
  • postgresql:// protocol
  • analyst:secret username and password
  • data.chicago.org:5432 host and port
  • buildings database name
Scaling and Optimizing Data Pipelines with Polars

Querying a database

query = """
    SELECT issue_date, permit_type
    FROM building_permits
    WHERE issue_date >= DATE '2026-01-01'
"""







Scaling and Optimizing Data Pipelines with Polars

Querying a database

query = """
    SELECT issue_date, permit_type
    FROM building_permits
    WHERE issue_date >= DATE '2026-01-01'
"""

permit_counts = (
    pl.read_database_uri(query=query, uri=uri)



)
Scaling and Optimizing Data Pipelines with Polars

Querying a database

query = """
    SELECT issue_date, permit_type
    FROM building_permits
    WHERE issue_date >= DATE '2026-01-01'
"""

permit_counts = (
    pl.read_database_uri(query=query, uri=uri)
    .group_by("permit_type")
    .len()
    .sort("len", descending=True)
)
Scaling and Optimizing Data Pipelines with Polars

A grouped permit count

permit_counts
shape: (3, 2)
| permit_type                     | len |
| ---                             | --- |
| str                             | u32 |
|---------------------------------|-----|
| Easy Permit                     | 58  |
| New Construction                | 14  |
| Renovation                      | 11  |
Scaling and Optimizing Data Pipelines with Polars

Choose your engine

pl.read_database_uri(
    query=query,
    uri=uri,
    engine="connectorx",
)
  • More supported databases
pl.read_database_uri(
    query=query,
    uri=uri,
    engine="adbc",
)
  • Faster for supported databases
1 https://arrow.apache.org/adbc/current/driver/status.html
Scaling and Optimizing Data Pipelines with Polars

Writing back to the database

permit_counts.write_database(



)
Scaling and Optimizing Data Pipelines with Polars

Writing back to the database

permit_counts.write_database(
    table_name="permit_type_summary",
    connection=uri,

)
Scaling and Optimizing Data Pipelines with Polars

Writing back to the database

permit_counts.write_database(
    table_name="permit_type_summary",
    connection=uri,
    if_table_exists="replace",
)
  • append
  • fail
Scaling and Optimizing Data Pipelines with Polars

DuckDB integration

import duckdb

db = duckdb.connect("permits.duckdb")


$$

  • DuckDBbuilt-in Polars output
Scaling and Optimizing Data Pipelines with Polars

DuckDB integration

import duckdb

db = duckdb.connect("permits.duckdb")

result = db.sql("SELECT * FROM permit_type_summary").pl()
shape: (3, 2)
| permit_type                     | len |
| ---                             | --- |
| str                             | u32 |
|---------------------------------|-----|
| Easy Permit                     | 58  |
| New Construction                | 14  |
| Renovation                      | 11  |
Scaling and Optimizing Data Pipelines with Polars

Let's practice!

Scaling and Optimizing Data Pipelines with Polars

Preparing Video For Download...