Aggregation

Market Basket Analysis in Python

Isaiah Hull

Visiting Associate Professor of Finance, BI Norwegian Business School

Exploring the data

import pandas as pd

# Load novelty gift data.
gifts = pd.read_csv('datasets/novelty_gifts.csv')

# Preview data with head() method.
print(gifts.head())
  InvoiceNo                          Description
0    562583      IVORY STRING CURTAIN WITH POLE 
1    562583        PINK AND BLACK STRING CURTAIN
2    562583                PSYCHEDELIC TILE HOOK
3    562583                ENAMEL COLANDER CREAM
4    562583  SMALL FOLDING SCISSOR(POINTED EDGE)

Exploring the data

# Print number of transactions.
print(len(gifts['InvoiceNo'].unique()))
9709
# Print number of items.
print(len(gifts['Description'].unique()))
3461
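
The counts above treat each unique InvoiceNo as one transaction. As a minimal sketch of how the rows could be grouped into baskets, assuming the same two-column layout: the small `sample` DataFrame below is made up for illustration and stands in for `gifts`, and invoice numbers other than 562583 are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for gifts: one row per (invoice, item) pair.
sample = pd.DataFrame({
    'InvoiceNo': [562583, 562583, 562584, 562584, 562585],
    'Description': ['ENAMEL COLANDER CREAM', 'PSYCHEDELIC TILE HOOK',
                    'PSYCHEDELIC TILE HOOK', 'RED SPOT GIFT BAG LARGE',
                    'ENAMEL COLANDER CREAM'],
})

# nunique() matches len(...unique()) when the column has no missing values.
n_transactions = sample['InvoiceNo'].nunique()
n_items = sample['Description'].nunique()
print(n_transactions, n_items)

# Collect the items of each invoice into one list per transaction.
baskets = sample.groupby('InvoiceNo')['Description'].apply(list)
print(baskets)
```

Each entry of `baskets` is one transaction, which is the unit that support and other metrics are computed over.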

Pruning and aggregation

Pruning: The diagram shows a list of items, some of which have been crossed out to indicate that they've been pruned.

Aggregation: The diagram shows a list of items being mapped to a smaller list to indicate that they've been aggregated.
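
Pruning can be sketched in a few lines. The slide does not say which criterion is used to drop items, so the example below assumes one common choice, a minimum support threshold. The `toy` data and the `min_support` value are made up for illustration.

```python
import pandas as pd

# Hypothetical one-hot data: rows are transactions, columns are items.
toy = pd.DataFrame({
    'RED SPOT GIFT BAG LARGE': [True, False, True, True],
    'DOLLY GIRL BEAKER': [False, False, False, True],
    'PSYCHEDELIC TILE HOOK': [True, True, False, False],
})

# Assumed pruning rule: keep items bought in at least half of transactions.
min_support = 0.5

# Support of an item is the share of transactions that contain it.
support = toy.mean()

# Keep only the columns whose support meets the threshold.
pruned = toy.loc[:, support >= min_support]
print(list(pruned.columns))
```

Pruning discards rare items entirely, whereas aggregation keeps their information by folding them into a broader category.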


Aggregating the data

# Load one-hot encoded data
onehot = pd.read_csv('datasets/online_retail_onehot.csv')

# Print preview of DataFrame
print(onehot.head(2))
   50'S CHRISTMAS GIFT BAG LARGE  DOLLY GIRL BEAKER  ...  ZINC WILLIE WINKIE  CANDLE STICK
0                          False              False  ...                             False
1                          False              False  ...                              True
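
The slides load a file that is already one-hot encoded and do not show how it was built. One possible way to get a table of this shape from invoice and item rows is `pd.crosstab`, sketched below on made-up data. The `sample` DataFrame and its invoice numbers are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per (invoice, item) pair.
sample = pd.DataFrame({
    'InvoiceNo': [562583, 562583, 562584, 562584, 562585],
    'Description': ['ENAMEL COLANDER CREAM', 'PSYCHEDELIC TILE HOOK',
                    'PSYCHEDELIC TILE HOOK', 'RED SPOT GIFT BAG LARGE',
                    'ENAMEL COLANDER CREAM'],
})

# crosstab counts how often each item appears on each invoice;
# comparing with 0 turns the counts into True/False flags.
onehot_sample = pd.crosstab(sample['InvoiceNo'], sample['Description']) > 0
print(onehot_sample)
```

The result has one row per transaction and one boolean column per item, matching the layout previewed above.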

Aggregating the data

# Identify column headers for bags and boxes
bag_headers = [i for i in onehot.columns if 'bag' in i.lower()]
box_headers = [i for i in onehot.columns if 'box' in i.lower()]
# Select the columns for bags and boxes
bags = onehot[bag_headers]
boxes = onehot[box_headers]
print(bags)
       50'S CHRISTMAS GIFT BAG LARGE   RED SPOT GIFT BAG LARGE  
0                              False                     False   
1                              False                     False
...                             ...                      ...

Aggregating the data

# Sum across columns within each row; True if the transaction contains any item in the category
bags = (bags.sum(axis=1) > 0.0).values
boxes = (boxes.sum(axis=1) > 0.0).values
print(bags)
[False  True False ... False  True False]

Aggregating the data

import numpy as np

# Add results to DataFrame
aggregated = pd.DataFrame(np.vstack([bags, boxes]).T, columns=['bags', 'boxes'])
print(aggregated.head())
    bags  boxes
0  False  False
1   True  False
2  False  False
3  False  False
4   True  False
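
The steps on the previous slides (identify headers, select columns, collapse each row to a single flag) can be wrapped into one function. The sketch below is not from the course: `aggregate_by_keyword` is a hypothetical helper, the `toy_onehot` data is made up, and 'HYPOTHETICAL LUNCH BOX' is an invented item name.

```python
import pandas as pd

def aggregate_by_keyword(onehot, categories):
    """Return one boolean column per category.

    categories maps a category name to the keyword searched for
    in the lowercased column headers (substring match).
    """
    result = {}
    for name, keyword in categories.items():
        headers = [c for c in onehot.columns if keyword in c.lower()]
        # True if the transaction contains any item in the category.
        result[name] = onehot[headers].any(axis=1)
    return pd.DataFrame(result)

# Hypothetical one-hot data with two bag items, one box item, one other item.
toy_onehot = pd.DataFrame({
    "50'S CHRISTMAS GIFT BAG LARGE": [False, True, False],
    'RED SPOT GIFT BAG LARGE': [False, False, True],
    'HYPOTHETICAL LUNCH BOX': [False, False, True],
    'DOLLY GIRL BEAKER': [True, False, False],
})

aggregated_toy = aggregate_by_keyword(toy_onehot, {'bags': 'bag', 'boxes': 'box'})
print(aggregated_toy)
```

Using `any(axis=1)` gives the same flags as `sum(axis=1) > 0` on boolean columns. Because the match is a plain substring test, a keyword such as 'bag' also matches any header that merely contains those letters, so the selected headers are worth inspecting.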

Market basket analysis with aggregates

  • Aggregation process:
    • Items -> Categories
    • Compute metrics
    • Identify rules
# Compute support
print(aggregated.mean())
bags     0.130075
boxes    0.071429
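
Once items are aggregated, the remaining steps (compute metrics, identify rules) work on the category columns. As a minimal sketch, the example below computes joint support, confidence, and lift for the rule bags -> boxes using the standard definitions. The `toy_aggregated` data is made up and does not reproduce the support values printed above.

```python
import pandas as pd

# Hypothetical aggregated data: one row per transaction.
toy_aggregated = pd.DataFrame({
    'bags':  [True, True, False, True, False, False, True, False],
    'boxes': [True, False, False, True, False, True, False, False],
})

# Support of each category: share of transactions that contain it.
support = toy_aggregated.mean()

# Joint support: share of transactions that contain both categories.
support_both = (toy_aggregated['bags'] & toy_aggregated['boxes']).mean()

# Confidence of bags -> boxes: support(bags and boxes) / support(bags).
confidence = support_both / support['bags']

# Lift: joint support relative to what independence would predict.
lift = support_both / (support['bags'] * support['boxes'])

print(support_both, confidence, lift)
```

A lift above 1 suggests the two categories appear together more often than they would if purchases were independent.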

Let's practice!
