Grouping by multiple columns

Python for Spreadsheet Users

Chris Cardillo

Data Scientist

Fruit sales

fruit_sales
many fruit sales.png

fruit_sales.info()
fruit sales info.png
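
The slides show fruit_sales only as screenshots, so here is a minimal stand-in to follow along with. Only the 'store' and 'product_name' column names come from the slides. The values and the 'units_sold' and 'revenue' columns are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for fruit_sales. Only 'store' and 'product_name'
# appear in the slides; the other columns and all values are assumed.
fruit_sales = pd.DataFrame({
    "store": ["North", "North", "North", "South", "South"],
    "product_name": ["apple", "apple", "banana", "apple", "banana"],
    "units_sold": [3, 5, 2, 4, 1],
    "revenue": [1.5, 2.5, 0.5, 2.0, 0.25],
})

# Show the rows, then a summary of columns, non-null counts, and dtypes.
print(fruit_sales)
fruit_sales.info()
```

Note that the same store and product can appear on more than one row, which is the situation grouping is meant to tidy up.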

Fruit sales

many apples.png

Adding a list of column names

Before
fruit_sales.groupby('store', as_index=False).sum()
After
fruit_sales.groupby(['store', 'product_name'], as_index=False).sum()
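
A small sketch of the before and after, run on a made-up fruit_sales (the numeric columns and all values are assumptions, not data from the course). In the "before" call, numeric_only=True is an addition of this sketch: depending on the pandas version, summing the leftover text column product_name may concatenate strings or fail, and this argument sidesteps that.

```python
import pandas as pd

# Hypothetical data; only the 'store' and 'product_name' names are from the slides.
fruit_sales = pd.DataFrame({
    "store": ["North", "North", "North", "South", "South"],
    "product_name": ["apple", "apple", "banana", "apple", "banana"],
    "units_sold": [3, 5, 2, 4, 1],
    "revenue": [1.5, 2.5, 0.5, 2.0, 0.25],
})

# Before: one row per store (numeric_only=True is an assumption, see above).
by_store = fruit_sales.groupby("store", as_index=False).sum(numeric_only=True)

# After: one row per store AND product, by passing a list of column names.
by_store_and_fruit = fruit_sales.groupby(
    ["store", "product_name"], as_index=False
).sum()

print(by_store)
print(by_store_and_fruit)
```

With this sample data, the first result has one row per store and the second has one row per store and product combination.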

What is a list?

shopping_list = ['milk', 'eggs', 'cheese']
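
A quick sketch of what can be done with a list, using the shopping_list from the slide. The groups list is the same kind of object, just holding column names.

```python
shopping_list = ['milk', 'eggs', 'cheese']

# A list keeps its items in order and knows how many it holds.
print(len(shopping_list))

# Items are looked up by position, and positions start at 0.
print(shopping_list[0])

# A list of column names is built exactly the same way.
groups = ['store', 'product_name']
print(type(groups))
```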

By store, by fruit

groups = ['store', 'product_name']

fruit_sales_less = fruit_sales.groupby(groups, as_index=False).sum()

grouped and summarized.png

now one apple.png
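
The slides always pass as_index=False. As a hedged illustration of what that argument does, the sketch below runs the same grouping with and without it on made-up data (the numeric columns and values are assumptions). Without it, pandas moves the grouping columns into the row index. With it, they stay as ordinary columns, which is closer to how a spreadsheet looks.

```python
import pandas as pd

# Hypothetical data; only the 'store' and 'product_name' names are from the slides.
fruit_sales = pd.DataFrame({
    "store": ["North", "North", "North", "South", "South"],
    "product_name": ["apple", "apple", "banana", "apple", "banana"],
    "units_sold": [3, 5, 2, 4, 1],
    "revenue": [1.5, 2.5, 0.5, 2.0, 0.25],
})

groups = ['store', 'product_name']

# Default behaviour: the grouping columns become the index.
indexed = fruit_sales.groupby(groups).sum()

# as_index=False: the grouping columns remain regular columns.
fruit_sales_less = fruit_sales.groupby(groups, as_index=False).sum()

print(list(indexed.columns))
print(list(fruit_sales_less.columns))
```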

The benefits of grouping by more columns before .sum()

  • Grouping is not "one column or none": a list lets you group by several
  • Reduces data down to what matters
  • Helps make spreadsheet data more manageable
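
One way to check the "reduce data down" point is to compare row counts and totals before and after grouping. This is only a sketch on made-up data (the 'units_sold' column and all values are assumptions): the grouped table has fewer rows, but the overall total is unchanged.

```python
import pandas as pd

# Hypothetical data; only the 'store' and 'product_name' names are from the slides.
fruit_sales = pd.DataFrame({
    "store": ["North", "North", "North", "South", "South"],
    "product_name": ["apple", "apple", "banana", "apple", "banana"],
    "units_sold": [3, 5, 2, 4, 1],
})

groups = ['store', 'product_name']
fruit_sales_less = fruit_sales.groupby(groups, as_index=False).sum()

# Fewer rows after grouping...
rows_before = len(fruit_sales)
rows_after = len(fruit_sales_less)
print(rows_before, rows_after)

# ...but no sales are lost: the grand total is the same.
total_before = fruit_sales["units_sold"].sum()
total_after = fruit_sales_less["units_sold"].sum()
print(total_before, total_after)
```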

Your turn!
