Introduction to Polars
Liam Brannigan
Data Scientist & Polars Contributor
rentals
shape: (49, 8)
| name | type | price | bedrooms | doubles | singles | review | beach |
| --- | --- | --- | --- | --- | --- | --- | --- |
| str | str | i64 | i64 | i64 | i64 | f64 | bool |
|-----------|---------|-------|----------|---------|---------|--------|-------|
| Waves | Cottage | 540 | 4 | 1 | 2 | 8.9 | false |
| Seashells | Cottage | 540 | 4 | 2 | 2 | 8.7 | true |
| ... | ... | ... | ... | ... | ... | ... | ... |
count_data = rentals["type"]
shape: (49,)
Series: 'type' [str]
[
"Cottage"
"Cottage"
"..."
]
count_data = rentals["type"].value_counts()
shape: (4, 2)
| type | count |
| --- | --- |
| str | u32 |
|-----------|-------|
| Villa | 8 |
| House | 12 |
| Apartment | 10 |
| Cottage | 30 |
count_data = rentals["type"].value_counts(sort=True)
shape: (4, 2)
| type | count |
| --- | --- |
| Cottage | 30 |
| House | 12 |
| Apartment | 10 |
| Villa | 8 |
import plotly.express as px
px.bar(count_data, x="type", y="count", title="Count of Property Type")
rentals.group_by("type")
rentals.group_by("type").agg(
pl.col("price").mean()
)
shape: (4, 2)
| type | price |
| --- | --- |
| str | f64 |
|----------|--------|
| Cottage | 768.37 |
| House | 1146.83|
| Apartment| 566.2 |
| Villa | 1635.0 |
rentals.group_by("type", maintain_order=True).agg(
pl.col("price").mean()
)
shape: (6, 2)
| type | price |
| --- | --- |
| str | f64 |
|-----------|------------|
| Cottage | 878.566667 |
| Villa | 1640.0 |
| ... | ... |
df.group_by("type").agg(
pl.col("price").mean()
).sort("price")
shape: (6, 2)
| type | price |
| --- | --- |
| str | f64 |
|-----------|------------|
| Caravan | 551.0 |
| null | 775.0 |
| ... | ... |
rentals.group_by("type").agg( pl.col("price").mean().name.suffix("_mean"), pl.col("price").max().name.suffix("_max"),
pl.col("review").mean().alias("review_mean")
)
shape: (6, 4)
| type | price_mean | price_max | review_mean |
| --- | --- | --- | --- |
| str | f64 | i64 | f64 |
|-----------|------------|-----------|-------------|
| Apartment | 1220.0 | 1220 | 9.7 |
| Caravan | 551.0 | 690 | 8.05 |
| --- | --- | --- | --- |
(
rentals
.group_by("type", "bedrooms")
.agg(pl.col("price").mean(),pl.len())
)
shape: (10, 4)
| type | bedrooms | price | len |
| --- | --- | --- | --- |
| str | i64 | f64 | u32 |
|-----------|----------|-----------|------|
| Cottage | 4 | 911.055 | 4 |
| Cottage | 5 | 1146.00 | 1 |
| Villa | 4 | 1598.66 | 6 |
| ... | ... | ... | ... |
(
pl.scan_csv("vacation_rentals.csv")
.group_by("type", "bedrooms")
.agg(pl.col("price").mean())
)
(
pl.scan_csv("vacation_rentals.csv")
.group_by("type", "bedrooms")
.agg(pl.col("price").mean())
.explain()
)
AGGREGATE
[col("price").mean()] BY [col("type"), col("bedrooms")] FROM
simple PROJ 3/3 ["price", "type", "bedrooms"]
Csv SCAN [vacation_rentals.csv]
PROJECT 3/8 COLUMNS
Introduction to Polars