Joining data: a real-world necessity

Pandas Joins for Spreadsheet Users

John Miller

Principal Data Scientist

Pandas for spreadsheet users

  • Learn based on similarities to spreadsheets
  • Understand the power and flexibility of pandas
  • Use data from the National Football League (NFL)

football punt

Common situations

big data cartoon

  • Datasets split by time or other factor
  • Datasets with related factors

Split data

  • Influenced by reporting cycle
  • Common splits
    • Time
    • Geography
    • Business unit
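
As a minimal sketch of how split data can be recombined, assume two hypothetical season tables with identical columns (the table names, column names, and values below are illustrative and not taken from the course data). Stacking them with pd.concat is roughly the pandas counterpart of pasting one sheet's rows below another's:

```python
import pandas as pd

# Hypothetical tables split by time: one per season, same columns.
games_2017 = pd.DataFrame({
    "game_id": [1, 2],
    "season": [2017, 2017],
    "home_team": ["NE", "GB"],
})
games_2018 = pd.DataFrame({
    "game_id": [3, 4],
    "season": [2018, 2018],
    "home_team": ["KC", "LA"],
})

# Stack the rows; ignore_index=True renumbers the combined rows from 0.
all_games = pd.concat([games_2017, games_2018], ignore_index=True)
print(all_games)
```

The same pattern would apply to data split by geography or business unit, as long as the pieces share the same columns.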

Split data example

games data tables

Split data example

Split data columns

Split data example

Split data keys

Complementary data

  • Results from collecting data for different purposes
  • Department-specific data
  • Storage in separate files or database tables
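
A minimal sketch of combining complementary data, assuming two hypothetical tables that describe the same games but were collected for different purposes (the names and the attendance figures are made up for illustration). Matching rows on a shared key with DataFrame.merge is similar in spirit to a lookup formula in a spreadsheet:

```python
import pandas as pd

# Hypothetical complementary tables, e.g. kept by different departments.
games = pd.DataFrame({
    "game_id": [1, 2, 3],
    "home_team": ["NE", "GB", "KC"],
})
attendance = pd.DataFrame({
    "game_id": [1, 2, 3],
    "attendance": [65878, 78215, 76416],  # made-up values
})

# Match rows on the shared key; how="left" keeps every row of games.
games_with_attendance = games.merge(attendance, on="game_id", how="left")
print(games_with_attendance)
```

Here the result gains a column rather than rows, which is the main difference from stacking split data.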

Complementary data example

Complementary data

Complementary data example

Complementary data columns

Complementary data example

Complementary data rows

Let's practice!
