Applying custom transformations

Data Transformation with Polars

Liam Brannigan

Data Scientist & Polars Contributor

Venues dataset

import polars as pl
venues = pl.read_csv("venues.csv")
shape: (4, 6)
| business         | location    | type       | hygiene_rating | review | price |
| ---              | ---         | ---        | ---            | ---    | ---   |
| str              | str         | str        | i64            | f64    | i64   |
|------------------|-------------|------------|----------------|--------|-------|
| 7burgers         | Wakey Wakey | restaurant | 4              | 4.2    | 15    |
| Bang Bang Burger | Forest Rd.  | restaurant | 3              | 3.8    | 12    |
| Costa Coffee     | City Point  | café       | 5              | 4.5    | 8     |
| The Queens Head  | Denman St.  | bar        | 5              | 4.7    | 25    |
  • Task: rescale the review column

Rescale reviews

def rescale_review(x):
    return 2 * x

venues.with_columns(
    pl.col("review").map_elements(rescale_review)
)
shape: (4, 6)
| business         | location    | type       | hygiene_rating | review | price |
| ---              | ---         | ---        | ---            | ---    | ---   |
| str              | str         | str        | i64            | f64    | i64   |
|------------------|-------------|------------|----------------|--------|-------|
| 7burgers         | Wakey Wakey | restaurant | 4              | 8.4    | 15    |
| Bang Bang Burger | Forest Rd.  | restaurant | 3              | 7.6    | 12    |
| Costa Coffee     | City Point  | café       | 5              | 9.0    | 8     |
| The Queens Head  | Denman St.  | bar        | 5              | 9.4    | 25    |

Specify return_dtype for control

venues.with_columns(
    pl.col("review").map_elements(
        rescale_review, return_dtype=pl.Float64
    )
)
shape: (4, 6)
| business         | location    | type       | hygiene_rating | review | price |
| ---              | ---         | ---        | ---            | ---    | ---   |
| str              | str         | str        | i64            | f64    | i64   |
|------------------|-------------|------------|----------------|--------|-------|
| 7burgers         | Wakey Wakey | restaurant | 4              | 8.4    | 15    |
| Bang Bang Burger | Forest Rd.  | restaurant | 3              | 7.6    | 12    |
| Costa Coffee     | City Point  | café       | 5              | 9.0    | 8     |
| The Queens Head  | Denman St.  | bar        | 5              | 9.4    | 25    |

Polars Warning

PolarsInefficientMapWarning:

Expr.map_elements is significantly slower than the native expressions API.
Only use if you absolutely CANNOT implement your logic otherwise.
Replace this expression...
  - pl.col("review").map_elements(rescale_review)
with this one instead:
  + 2 * pl.col("review")

Rescale reviews (native expression)

venues.with_columns(
    (2 * pl.col("review")).alias("review")
)
shape: (4, 6)
| business         | location    | type       | hygiene_rating | review | price |
| ---              | ---         | ---        | ---            | ---    | ---   |
| str              | str         | str        | i64            | f64    | i64   |
|------------------|-------------|------------|----------------|--------|-------|
| 7burgers         | Wakey Wakey | restaurant | 4              | 8.4    | 15    |
| Bang Bang Burger | Forest Rd.  | restaurant | 3              | 7.6    | 12    |
| Costa Coffee     | City Point  | café       | 5              | 9.0    | 8     |
| The Queens Head  | Denman St.  | bar        | 5              | 9.4    | 25    |

Choose the right tool

Native expressions
  • Preferred default
  • Fast
  • Optimized

.map_elements()
  • Use when needed
  • Legible
  • Third-party packages

Standardize location text

  • Denman St. -> DENMAN STREET
  • Forest Rd. -> FOREST ROAD

  • Task: standardize the format of the location column

Standardize locations

venues.with_columns(
    pl.col("location")
    .str.replace_many(["St.", "Rd."], ["Street", "Road"])
    .str.to_uppercase()
    .alias("location_clean")
)
shape: (4, 7)
| business         | location    | ... | location_clean |
| ---              | ---         | --- | ---            |
| str              | str         | ... | str            |
|------------------|-------------|-----|----------------|
| 7burgers         | Wakey Wakey | ... | WAKEY WAKEY    |
| Bang Bang Burger | Forest Rd.  | ... | FOREST ROAD    |
| Costa Coffee     | City Point  | ... | CITY POINT     |
| The Queens Head  | Denman St.  | ... | DENMAN STREET  |

Store an expression in a variable

standardize_locations_expr = (
    pl.col("location")
    .str.replace_many(["St.", "Rd."], ["Street", "Road"])
    .str.to_uppercase()
    .alias("location_clean")
)

type(standardize_locations_expr)
polars.Expr

Reuse the expression

venues.with_columns(
    standardize_locations_expr
)
shape: (4, 7)
| business         | location    | ... | location_clean |
| ---              | ---         | --- | ---            |
| str              | str         | ... | str            |
|------------------|-------------|-----|----------------|
| 7burgers         | Wakey Wakey | ... | WAKEY WAKEY    |
| Bang Bang Burger | Forest Rd.  | ... | FOREST ROAD    |
| Costa Coffee     | City Point  | ... | CITY POINT     |
| The Queens Head  | Denman St.  | ... | DENMAN STREET  |

Add a custom expression method

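def standardize(expr):
    # Expand the abbreviations, then uppercase the result
    return (
        expr
        .str.replace_many(["St.", "Rd."], ["Street", "Road"])
        .str.to_uppercase()
    )

# Attach the function to pl.Expr so it can be called as a method
pl.Expr.standardize = standardize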

Use the custom method

restaurants.with_columns(
    pl.col("address").standardize().alias("address_clean")
)
shape: (4, 7)
| business         | address     | ... | address_clean  |
| ---              | ---         | --- | ---            |
| str              | str         | ... | str            |
|------------------|-------------|-----|----------------|
| 7burgers         | Wakey Wakey | ... | WAKEY WAKEY    |
| Bang Bang Burger | Forest Rd.  | ... | FOREST ROAD    |
| Costa Coffee     | City Point  | ... | CITY POINT     |
| The Queens Head  | Denman St.  | ... | DENMAN STREET  |

Let's practice!
