Scaling and Optimizing Data Pipelines with Polars
Liam Brannigan
Data Scientist & Polars Contributor

import polars as pl

events = pl.read_parquet("chicago_events.parquet")
events.select(
"event_date", "event_title", "tags", "area"
).head()
shape: (5, 4)
| event_date | event_title | tags | area |
| --- | --- | --- | --- |
| date | str | list[str] | str |
|------------|------------------|-------------------------------------|---------------|
| 2025-06-14 | Chef Showcase | ["food", "chef_demo"] | West Loop |
| 2025-06-29 | Folk Festival | ["music", "neighborhood", "crafts"] | Andersonville |
| 2025-05-25 | Fireworks Night | ["nightlife", "family"] | Downtown |
| 2025-09-20 | Rail History Day | ["history", "family", "education"] | Pullman |
| 2025-07-04 | Grant Park Concert | ["music", "holiday", "family"] | Loop |
events["tags"][0]
shape: (2,)
Series: '' [str]
[
"food"
"chef_demo"
]
events.select(
"event_title",
"tags",
pl.col("tags").list.get(0).alias("primary_tag"),
).head(3)
shape: (3, 3)
| event_title | tags | primary_tag |
| --- | --- | --- |
| str | list[str] | str |
|--------------------|-------------------------------------|-------------|
| Chef Showcase | ["food", "chef_demo"] | food |
| Folk Festival | ["music", "neighborhood", "crafts"] | music |
| Fireworks Night | ["nightlife", "family"] | nightlife |
events.select(
"event_title", "tags",
pl.col("tags").list.len().alias("tag_count"),
pl.col("tags").list.contains("family").alias("has_family"),
).filter(pl.col("has_family")).head(3)
shape: (3, 4)
| event_title | tags | tag_count | has_family |
| --- | --- | --- | --- |
| str | list[str] | u32 | bool |
|--------------------|------------------------------------|-----------|------------|
| Fireworks Night | ["nightlife", "family"] | 2 | true |
| Rail History Day | ["history", "family", "education"] | 3 | true |
| Grant Park Concert | ["music", "holiday", "family"] | 3 | true |
list.first() → index
list.unique() → deduplicate
list.mean() → numeric aggregation
events.select("event_title","tags").explode("tags")
shape: (1346328, 2)
| event_title | tags |
| --- | --- |
| str | str |
|--------------------|--------------|
| Chef Showcase | food |
| Chef Showcase | chef_demo |
| Folk Festival | music |
| Folk Festival | neighborhood |
| Folk Festival | crafts |
| Fireworks Night | nightlife |
| ... | ... |
events.select("event_title","tags").explode("tags")["tags"].value_counts(sort=True).head()
shape: (5, 2)
| tags | count |
| --- | --- |
| str | u32 |
|-----------|---------|
| family | 59864 |
| nightlife | 45302 |
| music | 35612 |
| food | 23948 |
| culture | 21566 |