Transforming data by group

Data Transformation with Polars

Liam Brannigan

Data Scientist & Polars Contributor

Why compare to group statistics?

Time series chart of electricity prices over a day

  • Are the prices higher or lower than normal?
Data Transformation with Polars

Why compare to group statistics?

Time series chart of electricity prices over a day with the hourly average prices

Data Transformation with Polars

Why compare to group statistics?

Time series chart of electricity prices over a day with the hourly average prices and an annotation to show cheap prices at midday

Data Transformation with Polars

Why compare to group statistics?

Time series chart of electricity prices over a day with the hourly average prices and an annotation to show normal prices in the evening

Data Transformation with Polars

Extracting the hour

prices.with_columns(
    pl.col("time").dt.hour().alias("hour")
)
| time                | price | solar   | hour |
| ---                 | ---   | ---     | ---  |
| datetime[µs]        | f64   | f64     | i8   |
|---------------------|-------|-------  |------|
| 2025-07-05 10:00:00 | 2.3   | 419.0   | 10   |
| 2025-07-05 11:00:00 | 0.5   | 481.0   | 11   |
| 2025-07-05 12:00:00 | 0.0   | 462.0   | 12   |
Data Transformation with Polars

Window function with .over()

prices.with_columns(
    pl.col("price").mean()
)
Data Transformation with Polars

Window function with .over()

prices.with_columns(
    pl.col("price").mean().over("hour")
)
Data Transformation with Polars

Window function with .over()

prices.with_columns(
    pl.col("price").mean().over("hour").alias("hourly_avg")
)
Data Transformation with Polars

Window function with .over()

prices.with_columns(
    pl.col("price").mean().over("hour").alias("hourly_avg")
)
| time                | price | solar | hour | hourly_avg |
| ---                 | ---   | ---   | ---  | ---        |
| datetime[µs]        | f64   | f64   | i8   | f64        |
|---------------------|-------|-------|------|------------|
| 2025-07-05 10:00:00 | 2.3   | 419.0 | 10   | 49.6       |
| 2025-07-05 11:00:00 | 0.5   | 481.0 | 11   | 44.4       |
| 2025-07-05 12:00:00 | 0.0   | 462.0 | 12   | 39.5       |
Data Transformation with Polars

Creating a weather category

  • Sunny: solar > 400 W/m²
  • Cloudy: solar <= 400 W/m²
Data Transformation with Polars

Creating a weather category

prices.with_columns(
    pl.when(pl.col("solar") > 400)
    .then(pl.lit("sunny"))
    .otherwise(pl.lit("cloudy"))

.alias("weather")
)
| time                | price | solar | weather |
| ---                 | ---   | ---   | ---     |
| datetime[µs]        | f64   | f64   | str     |
|---------------------|-------|-------|---------|
| 2025-07-05 10:00:00 | 2.3   | 419.0 | sunny   |
| 2025-07-06 10:00:00 | 80.5  | 181.0 | cloudy  |
| 2025-07-07 10:00:00 | 41.5  | 150.0 | cloudy  |
Data Transformation with Polars

Grouping by multiple columns

prices.with_columns(
    pl.col("price").mean()
)
Data Transformation with Polars

Grouping by multiple columns

prices.with_columns(
    pl.col("price").mean().over("hour", "weather")
)
Data Transformation with Polars

Grouping by multiple columns

prices.with_columns(
    pl.col("price").mean().over("hour", "weather").alias("avg_by_hour_weather")
)
Data Transformation with Polars

Grouping by multiple columns

prices.with_columns(
    pl.col("price").mean().over("hour", "weather").alias("avg_by_hour_weather")
)
| time                | price | hour | weather | avg_by_hour_weather |
| ---                 | ---   | ---  | ---     | ---                 |
| datetime[µs]        | f64   | i8   | str     | f64                 |
|---------------------|-------|------|---------|---------------------|
| 2025-07-05 10:00:00 | 2.3   | 10   | sunny   | 34.2                |
| 2025-07-06 10:00:00 | 80.5  | 10   | cloudy  | 58.0                |
| 2025-07-07 10:00:00 | 41.5  | 10   | cloudy  | 58.0                |
Data Transformation with Polars

Comparing to group average

prices.with_columns(
    (pl.col("price")                                     )

)
Data Transformation with Polars

Comparing to group average

prices.with_columns(
    (pl.col("price") - pl.col("price").mean().over("hour"))

)
Data Transformation with Polars

Comparing to group average

prices.with_columns(
    (pl.col("price") - pl.col("price").mean().over("hour"))
    .alias("diff_from_hourly_avg")
)
| time                | price | hour | diff_from_hourly_avg |
| ---                 | ---   | ---  | ---                  |
| datetime[µs]        | f64   | i8   | f64                  |
|---------------------|-------|------|----------------------|
| 2025-07-05 10:00:00 | 2.3   | 10   | -47.3                |
| 2025-07-06 10:00:00 | 25.1  | 10   | -24.5                |
| 2025-07-07 10:00:00 | 2.7   | 10   | -46.9                |
Data Transformation with Polars

Other window aggregations

.over() works with many aggregations:

  • .sum().over() - group totals
  • .max().over() - group maximums
  • .count().over() - group sizes
Data Transformation with Polars

Let's practice!

Data Transformation with Polars

Preparing Video For Download...