Data Transformation with Polars
Liam Brannigan
Data Scientist & Polars Contributor



venues
shape: (4, 6)
| business | location | review | price | type | hygiene_rating |
| --- | --- | --- | --- | --- | --- |
| str | str | f64 | i64 | str | i64 |
|-----------------|-------------|--------|-------|------------|----------------|
| 7burgers | Wakey Wakey | 4.2 | 15 | restaurant | 4 |
| Costa Coffee | City Point | 4.5 | 8 | café | 5 |
| The Queens Head | Denman St. | 4.7 | 25 | bar | 5 |
| Costa Coffee | Waterloo | 4.1 | 8 | café | 3 |
venues.with_columns(
    pl.col("price").rank(descending=False).alias("rank_default")
)
shape: (4, 7)
| business | location | review | price | ... | rank_default |
| --- | --- | --- | --- | --- | --- |
| str | str | f64 | i64 | ... | f64 |
|-----------------|-------------|--------|-------|-------|--------------|
| 7burgers | Wakey Wakey | 4.2 | 15 | ... | 3.0 |
| Costa Coffee | City Point | 4.5 | 8 | ... | 1.5 |
| The Queens Head | Denman St. | 4.7 | 25 | ... | 4.0 |
| Costa Coffee | Waterloo | 4.1 | 8 | ... | 1.5 |
venues.with_columns(
    pl.col("price").rank(descending=False).alias("rank_default"),
    pl.col("price").rank(method="min", descending=False).alias("rank_min"),
)
shape: (4, 8)
| business | ... | price | rank_default | rank_min |
| --- | --- | --- | --- | --- |
| str | ... | i64 | f64 | u32 |
|-----------------|-----|-------|--------------|----------|
| 7burgers | ... | 15 | 3.0 | 3 |
| Costa Coffee | ... | 8 | 1.5 | 1 |
| The Queens Head | ... | 25 | 4.0 | 4 |
| Costa Coffee | ... | 8 | 1.5 | 1 |
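The difference between the two rank columns comes down to how ties are resolved. The following is a minimal pure-Python sketch of that idea, not Polars code: `rank_values` is a hypothetical helper written for illustration, and it covers only the "average" and "min" rules shown in the table above.

```python
# Pure-Python sketch of two tie-handling rules for ascending ranks.
# `rank_values` is a hypothetical helper, not part of Polars.

def rank_values(values, method="average"):
    """Return 1-based ascending ranks; ties are resolved by `method`."""
    ordered = sorted(values)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1         # lowest position among the ties
        last = first + ordered.count(v) - 1  # highest position among the ties
        if method == "average":
            ranks.append((first + last) / 2)  # ties share the mean position
        elif method == "min":
            ranks.append(first)               # ties share the lowest position
        else:
            raise ValueError(f"unsupported method: {method}")
    return ranks


prices = [15, 8, 25, 8]  # the price column from venues

print(rank_values(prices))                # [3.0, 1.5, 4.0, 1.5]
print(rank_values(prices, method="min"))  # [3, 1, 4, 1]
```

The two Costa Coffee rows tie at a price of 8 and occupy positions 1 and 2, so "average" gives each 1.5 while "min" gives each 1.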
user_reviews = pl.read_csv("user_reviews.csv")
shape: (4, 4)
| business | Alice | Bob | Charlie |
| --- | --- | --- | --- |
| str | i64 | i64 | i64 |
|-----------------|-------|------|---------|
| 7burgers | 9 | 6 | 8 |
| Costa Coffee | 3 | 9 | 5 |
| The Queens Head | 9 | 7 | 9 |
| Nando's | 7 | 8 | 8 |
correlations = user_reviews.select(pl.selectors.integer()).corr()
correlations
shape: (3, 3)
| Alice | Bob | Charlie |
| --- | --- | --- |
| f64 | f64 | f64 |
|---------|---------|---------|
| 1.0 | -0.913 | 0.953 |
| -0.913 | 1.0 | -0.745 |
| 0.953 | -0.745 | 1.0 |
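As a cross-check on the matrix above, the pairwise values can be reproduced by hand. This sketch assumes the matrix holds Pearson correlation coefficients; `pearson` is a hypothetical helper written for illustration, and the lists are copied from the user_reviews table.

```python
from math import sqrt

# Hypothetical helper: Pearson correlation of two equal-length sequences.
def pearson(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each sequence
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Ratings copied from the user_reviews table
alice = [9, 3, 9, 7]
bob = [6, 9, 7, 8]
charlie = [8, 5, 9, 8]

print(round(pearson(alice, bob), 3))      # -0.913
print(round(pearson(alice, charlie), 3))  # 0.953
print(round(pearson(bob, charlie), 3))    # -0.745
```

Alice and Charlie rate venues similarly (close to 1), while Bob's ratings move in the opposite direction to both.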
correlations.with_columns(pl.Series("user", correlations.columns))
shape: (3, 4)
| Alice | Bob | Charlie | user |
| --- | --- | --- | --- |
| f64 | f64 | f64 | str |
|---------|---------|---------|---------|
| 1.0 | -0.913 | 0.953 | Alice |
| -0.913 | 1.0 | -0.745 | Bob |
| 0.953 | -0.745 | 1.0 | Charlie |
venues.select("review", "price").describe(percentiles=[0.33, 0.67])
shape: (8, 3)
| statistic | review | price |
| --- | --- | --- |
| str | f64 | f64 |
|------------|----------|----------|
| count | 4.0 | 4.0 |
| null_count | 0.0 | 0.0 |
| mean | 4.375 | 14.0 |
| std | 0.275379 | 8.041559 |
| min | 4.1 | 8.0 |
| 33% | 4.2 | 8.0 |
| 67% | 4.5 | 15.0 |
| max | 4.7 | 25.0 |
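The 33% and 67% rows for review can be checked with a small sketch. It assumes a "nearest" rule, where the percentile is the sorted value whose position is closest to q * (n - 1); the exact rounding rule Polars applies may differ in edge cases, and `nearest_percentile` is a hypothetical helper, not a Polars function.

```python
# Hypothetical helper: percentile by picking the nearest sorted value.
# Assumes a "nearest" interpolation rule; this is a simplification.
def nearest_percentile(values, q):
    ordered = sorted(values)
    position = q * (len(ordered) - 1)   # fractional index into the sorted data
    return ordered[int(position + 0.5)]  # round to the nearest whole index


reviews = [4.2, 4.5, 4.7, 4.1]  # the review column from venues

print(nearest_percentile(reviews, 0.33))  # 4.2
print(nearest_percentile(reviews, 0.67))  # 4.5
```

With only four rows, each percentile simply lands on one of the existing review scores rather than an interpolated value.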