Calculating ranks and correlations

Data Transformation with Polars

Liam Brannigan

Data Scientist & Polars Contributor

Ranking

Table showing number of world cup wins.

Ranking

Table showing number of world cup wins with a ranking column.

Ranking

Table showing number of world cup wins with two ranking columns derived using different methods.

Our venues data

venues
shape: (4, 6)
| business        | location    | review | price | type       | hygiene_rating |
| ---             | ---         | ---    | ---   | ---        | ---            |
| str             | str         | f64    | i64   | str        | i64            |
|-----------------|-------------|--------|-------|------------|----------------|
| 7burgers        | Wakey Wakey | 4.2    | 15    | restaurant | 4              |
| Costa Coffee    | City Point  | 4.5    | 8     | café       | 5              |
| The Queens Head | Denman St.  | 4.7    | 25    | bar        | 5              |
| Costa Coffee    | Waterloo    | 4.1    | 8     | café       | 3              |

Rank venues by price

venues.with_columns(
    pl.col("price").rank(descending=False).alias("rank_default")
)

Ranked venues

shape: (4, 7)
| business        | location    | review | price | ...   | rank_default |
| ---             | ---         | ---    | ---   | ---   | ---          |
| str             | str         | f64    | i64   | ...   | f64          |
|-----------------|-------------|--------|-------|-------|--------------|
| 7burgers        | Wakey Wakey | 4.2    | 15    | ...   | 3.0          |
| Costa Coffee    | City Point  | 4.5    | 8     | ...   | 1.5          |
| The Queens Head | Denman St.  | 4.7    | 25    | ...   | 4.0          |
| Costa Coffee    | Waterloo    | 4.1    | 8     | ...   | 1.5          |
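
The 1.5 values in rank_default come from ties: the two venues priced at 8 share positions 1 and 2, so each receives the average of those positions. The pure-Python sketch below illustrates that idea only; `average_ranks` is a hypothetical helper written for this example, not how Polars implements ranking.

```python
def average_ranks(values):
    """Return 1-based ascending ranks, giving tied values the mean of their positions."""
    # Indices of the values in ascending order
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the group while the next value is tied with the current one
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # The tied group occupies positions start+1 .. end+1; share their mean
        shared = ((start + 1) + (end + 1)) / 2
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


prices = [15, 8, 25, 8]  # the price column from the venues table
ranks = average_ranks(prices)
print(ranks)  # [3.0, 1.5, 4.0, 1.5]
```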

Rank venues: add min ranking

venues.with_columns(
    pl.col("price").rank(descending=False).alias("rank_default"),
    pl.col("price").rank(method="min", descending=False).alias("rank_min")
)
shape: (4, 8)
| business        | ... | price | rank_default | rank_min |
| ---             | --- | ---   | ---          | ---      |
| str             | ... | i64   | f64          | u32      |
|-----------------|-----|-------|--------------|----------|
| 7burgers        | ... | 15    | 3.0          | 3        |
| Costa Coffee    | ... | 8     | 1.5          | 1        |
| The Queens Head | ... | 25    | 4.0          | 4        |
| Costa Coffee    | ... | 8     | 1.5          | 1        |

Which ranking style should we use?

Default rank
  • Floating point output
  • Good for analytics

Min rank
  • Integer-style output
  • Good for top-N displays

Correlating user reviews

user_reviews = pl.read_csv("user_reviews.csv")
shape: (4, 4)
| business        | Alice | Bob  | Charlie |
| ---             | ---   | ---  | ---     |
| str             | i64   | i64  | i64     |
|-----------------|-------|------|---------|
| 7burgers        | 9     | 6    | 8       |
| Costa Coffee    | 3     | 9    | 5       |
| The Queens Head | 9     | 7    | 9       |
| Nando's         | 7     | 8    | 8       |

Calculating correlations

correlations = user_reviews.select(pl.selectors.integer()).corr()
correlations
shape: (3, 3)
| Alice   | Bob     | Charlie |
| ---     | ---     | ---     |
| f64     | f64     | f64     |
|---------|---------|---------|
| 1.0     | -0.913  | 0.953   |
| -0.913  | 1.0     | -0.745  |
| 0.953   | -0.745  | 1.0     |
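
Each entry in the matrix is a Pearson correlation coefficient between two users' scores. The pure-Python sketch below recomputes the three off-diagonal values from the user_reviews table to show what the number means; `pearson` is a hypothetical helper for this example, not a Polars function.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each series
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Scores copied from the user_reviews table
alice = [9, 3, 9, 7]
bob = [6, 9, 7, 8]
charlie = [8, 5, 9, 8]

print(round(pearson(alice, bob), 3))      # -0.913
print(round(pearson(alice, charlie), 3))  # 0.953
print(round(pearson(bob, charlie), 3))    # -0.745
```

A value near 1 means two users tend to score venues the same way; a value near -1 means one tends to score high where the other scores low.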

Adding a user column

correlations.with_columns(pl.Series("user", correlations.columns))
shape: (3, 4)
| Alice   | Bob     | Charlie | user    |
| ---     | ---     | ---     | ---     |
| f64     | f64     | f64     | str     |
|---------|---------|---------|---------|
| 1.0     | -0.913  | 0.953   | Alice   |
| -0.913  | 1.0     | -0.745  | Bob     |
| 0.953   | -0.745  | 1.0     | Charlie |

Quick summary with .describe()

venues.select("review", "price").describe(percentiles=[0.33, 0.67])
shape: (8, 3)
| statistic  | review   | price    |
| ---        | ---      | ---      |
| str        | f64      | f64      |
|------------|----------|----------|
| count      | 4.0      | 4.0      |
| null_count | 0.0      | 0.0      |
| mean       | 4.375    | 14.0     |
| std        | 0.275379 | 8.041559 |
| min        | 4.1      | 8.0      |
| 33%        | 4.2      | 8.0      |
| 67%        | 4.5      | 15.0     |
| max        | 4.7      | 25.0     |

Let's practice!
