Joining DataFrames

Data Transformation with Polars

Liam Brannigan

Data Scientist & Polars Contributor

Reviews data

shape: (5, 4)
| business         | location    | review | price |
| ---              | ---         | ---    | ---   |
| str              | str         | f64    | i64   |
|------------------|-------------|--------|-------|
| 7burgers         | Wakey Wakey | 4.2    | 15    |
| Bang Bang Burger | Forest Rd.  | 3.8    | 12    |
| Costa Coffee     | City Point  | 4.5    | 8     |
| Costa Coffee     | Waterloo    | 4.1    | 8     |
| The Queens Head  | Denman St.  | 4.7    | 25    |
Data Transformation with Polars

The inspections dataset

inspections = pl.read_csv("restaurant_inspections.csv")
shape: (5, 4)
| business        | location    | type       | hygiene_rating |
| ---             | ---         | ---        | ---            |
| str             | str         | str        | i64            |
|-----------------|-------------|------------|----------------|
| 7burgers        | Wakey Wakey | restaurant | 4              |
| Costa Coffee    | City Point  | café       | 5              |
| Costa Coffee    | Waterloo    | café       | 3              |
| The Queens Head | Denman St.  | bar        | 5              |
| Wagamama        | Soho        | restaurant | 4              |
Data Transformation with Polars

What is a join?

Diagram showing a single DataFrame.

Data Transformation with Polars

What is a join?

Diagram showing two DataFrames.

Data Transformation with Polars

What is a join?

Diagram showing two DataFrames with a matching key highlighted.

Data Transformation with Polars

What is a join?

Diagram showing two DataFrames being combined by matching rows on shared columns.

Data Transformation with Polars

What is a join?

Diagram showing two DataFrames being combined by matching rows on shared columns with a matching key highlighted.

Data Transformation with Polars

Join on a single column

reviews.join(                                       )
Data Transformation with Polars

Join on a single column

reviews.join(inspections,                           )
Data Transformation with Polars

Join on a single column

reviews.join(inspections, on="business"             )
Data Transformation with Polars

Join on a single column

reviews.join(inspections, on="business", how="inner")
shape: (6, 7)
| business      | location    | ... | location_right | hygiene_rating |
| ---           | ---         | ... | ---            | ---            |
| str           | str         | ... | str            | i64            |
|---------------|-------------|-----|----------------|----------------|
| 7burgers      | Wakey Wakey | ... | Wakey Wakey    | 4              |
| Costa Coffee  | City Point  | ... | City Point     | 5              |
| Costa Coffee  | Waterloo    | ... | City Point     | 5              |
| Costa Coffee  | City Point  | ... | Waterloo       | 3              |
| Costa Coffee  | Waterloo    | ... | Waterloo       | 3              |
| The Queens Hd | Denman St.  | ... | Denman St.     | 5              |
Data Transformation with Polars

Join on multiple columns

reviews.join(inspections, on=["business", "location"])
Data Transformation with Polars

Join on multiple columns

reviews.join(inspections, on=["business", "location"])
shape: (4, 6)
| business        | location    | review | price | type       | hygiene_rating |
| ---             | ---         | ---    | ---   | ---        | ---            |
| str             | str         | f64    | i64   | str        | i64            |
|-----------------|-------------|--------|-------|------------|----------------|
| 7burgers        | Wakey Wakey | 4.2    | 15    | restaurant | 4              |
| Costa Coffee    | City Point  | 4.5    | 8     | café       | 5              |
| Costa Coffee    | Waterloo    | 4.1    | 8     | café       | 3              |
| The Queens Head | Denman St.  | 4.7    | 25    | bar        | 5              |
Data Transformation with Polars

Left join

reviews.join(inspections, on=["business", "location"],           )
Data Transformation with Polars

Left join

reviews.join(inspections, on=["business", "location"], how="left")
shape: (5, 6)
| business         | location    | review | price | type       | hygiene_rating |
| ---              | ---         | ---    | ---   | ---        | ---            |
| str              | str         | f64    | i64   | str        | i64            |
|------------------|-------------|--------|-------|------------|----------------|
| 7burgers         | Wakey Wakey | 4.2    | 15    | restaurant | 4              |
| Bang Bang Burger | Forest Rd.  | 3.8    | 12    | null       | null           |
| Costa Coffee     | City Point  | 4.5    | 8     | café       | 5              |
| Costa Coffee     | Waterloo    | 4.1    | 8     | café       | 3              |
| The Queens Head  | Denman St.  | 4.7    | 25    | bar        | 5              |
Data Transformation with Polars

Full join

reviews.join(inspections, on=["business", "location"],                          )
Data Transformation with Polars

Full join

reviews.join(inspections, on=["business", "location"], how="full"               )
Data Transformation with Polars

Full join

reviews.join(inspections, on=["business", "location"], how="full", coalesce=True)
shape: (6, 6)
| business         | location    | review | price | type       | hygiene_rating |
| ---              | ---         | ---    | ---   | ---        | ---            |
| str              | str         | f64    | i64   | str        | i64            |
|------------------|-------------|--------|-------|------------|----------------|
| 7burgers         | Wakey Wakey | 4.2    | 15    | restaurant | 4              |
| Costa Coffee     | City Point  | 4.5    | 8     | café       | 5              |
| Costa Coffee     | Waterloo    | 4.1    | 8     | café       | 3              |
| The Queens Head  | Denman St.  | 4.7    | 25    | bar        | 5              |
| Wagamama         | Soho        | null   | null  | restaurant | 4              |
| Bang Bang Burger | Forest Rd.  | 3.8    | 12    | null       | null           |
Data Transformation with Polars

Choosing a join strategy

2 tables

Data Transformation with Polars

Inner join

Inner join representation

Data Transformation with Polars

Left join

Left join representation

Data Transformation with Polars

Full join

Full join representation

Data Transformation with Polars

Let's practice!

Data Transformation with Polars

Preparing Video For Download...