Grouped aggregations

Introduction to Polars

Liam Brannigan

Data Scientist & Polars Contributor

Our vacation rentals dataset

rentals
shape: (49, 8)
| name      | type    | price | bedrooms | doubles | singles | review | beach |
| ---       | ---     | ---   | ---      | ---     | ---     | ---    | ---   |
| str       | str     | i64   | i64      | i64     | i64     | f64    | bool  |
|-----------|---------|-------|----------|---------|---------|--------|-------|
| Waves     | Cottage | 540   | 4        | 1       | 2       | 8.9    | false |
| Seashells | Cottage | 540   | 4        | 2       | 2       | 8.7    | true  |
| ...       | ...     | ...   | ...      | ...     | ...     | ...    | ...   |
Introduction to Polars

Counting values in a column

count_data = rentals["type"]
shape: (49,)
Series: 'type' [str]
[
    "Cottage"
    "Cottage"
    "..."
]
Introduction to Polars

Counting values in a column

count_data = rentals["type"].value_counts()
shape: (4, 2)
| type      | count |
| ---       | ---   |
| str       | u32   |
|-----------|-------|
| Villa     | 8     |
| House     | 12    |
| Apartment | 10    |
| Cottage   | 30    |
Introduction to Polars

Counting values in a column

count_data = rentals["type"].value_counts(sort=True)
shape: (4, 2)
| type      | count |
| ---       | ---   |
| Cottage   | 30    |
| House     | 12    |
| Apartment | 10    |
| Villa     | 8     |
Introduction to Polars

Visualizing aggregated data

import plotly.express as px
px.bar(count_data, x="type", y="count", title="Count of Property Type")

Bar chart showing count of each property type

Introduction to Polars

Grouping and aggregating

rentals.group_by("type")


Introduction to Polars

Grouping and aggregating

rentals.group_by("type").agg(
    pl.col("price").mean()
)
shape: (4, 2)
| type     | price  |
| ---      | ---    |
| str      | f64    |
|----------|--------|
| Cottage  | 768.37 |
| House    | 1146.83|
| Apartment| 566.2  |
| Villa    | 1635.0 |
Introduction to Polars

Controlling group-by order

rentals.group_by("type", maintain_order=True).agg(
    pl.col("price").mean()
)
shape: (6, 2)
| type      | price      |
| ---       | ---        |
| str       | f64        |
|-----------|------------|
| Cottage   | 878.566667 |
| Villa     | 1640.0     |
| ...       | ...        |
df.group_by("type").agg(
    pl.col("price").mean()
).sort("price")
shape: (6, 2)
| type      | price      |
| ---       | ---        |
| str       | f64        |
|-----------|------------|
| Caravan   | 551.0      |
| null      | 775.0      |
| ...       | ...        |
Introduction to Polars

Multiple aggregations

rentals.group_by("type").agg(
    pl.col("price").mean().name.suffix("_mean"),
    pl.col("price").max().name.suffix("_max"),

pl.col("review").mean().alias("review_mean")
)
shape: (6, 4)
| type      | price_mean | price_max | review_mean |
| ---       | ---        | ---       | ---         |
| str       | f64        | i64       | f64         |
|-----------|------------|-----------|-------------|
| Apartment | 1220.0     | 1220      | 9.7         |
| Caravan   | 551.0      | 690       | 8.05        |
| ---       | ---        | ---       | ---         |
Introduction to Polars

Grouping by multiple columns

(
  rentals
  .group_by("type", "bedrooms")
  .agg(pl.col("price").mean(),pl.len())
)
shape: (10, 4)
| type      | bedrooms | price     | len  |
| ---       | ---      | ---       | ---  |
| str       | i64      | f64       | u32  |
|-----------|----------|-----------|------|
| Cottage   | 4        | 911.055   | 4    |
| Cottage   | 5        | 1146.00   | 1    |
| Villa     | 4        | 1598.66   | 6    |
| ...       | ...      | ...       | ...  |
Introduction to Polars

Query optimizations

(
  pl.scan_csv("vacation_rentals.csv")
  .group_by("type", "bedrooms")
  .agg(pl.col("price").mean())
)
Introduction to Polars

Query optimizations

(
  pl.scan_csv("vacation_rentals.csv")
  .group_by("type", "bedrooms")
  .agg(pl.col("price").mean())
  .explain()
)
AGGREGATE
    [col("price").mean()] BY [col("type"), col("bedrooms")] FROM
  simple PROJ 3/3 ["price", "type", "bedrooms"]
    Csv SCAN [vacation_rentals.csv]
    PROJECT 3/8 COLUMNS
Introduction to Polars

Let's practice!

Introduction to Polars

Preparing Video For Download...