The simplest metric

Market Basket Analysis in Python

Isaiah Hull

Visiting Associate Professor of Finance, BI Norwegian Business School

Metrics and pruning

  • A metric is a measure of performance for rules.
    • {humor} $\rightarrow$ {poetry}
      • 0.81
    • {fiction} $\rightarrow$ {travel}
      • 0.23
  • Pruning is the use of metrics to discard rules.
    • Retain: {humor} $\rightarrow$ {poetry}
    • Discard: {fiction} $\rightarrow$ {travel}
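
As a minimal sketch of pruning, the two example rules can be filtered against a cutoff. The metric values are the ones listed above; the threshold of 0.5 is an assumption chosen only for illustration.

```python
# Metric values for the two example rules (taken from the list above).
rules = {
    ("humor", "poetry"): 0.81,
    ("fiction", "travel"): 0.23,
}

# Hypothetical cutoff, chosen only for illustration.
threshold = 0.5

# Pruning: keep the rules whose metric meets the threshold, discard the rest.
retained = {rule: value for rule, value in rules.items() if value >= threshold}
discarded = {rule: value for rule, value in rules.items() if value < threshold}

print("Retain:", retained)
print("Discard:", discarded)
```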

The simplest metric

  • The support metric measures the share of transactions that contain an itemset.

 

$$\frac{\text{number of transactions with item(s)}}{\text{number of transactions}}$$

 

$$\frac{\text{number of transactions with milk}}{\text{total transactions}}$$
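
The ratio above can be sketched in plain Python. The function name `support` and the toy baskets are assumptions made for illustration; they do not come from the slides.

```python
def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    # Count the transactions that include all items of the itemset.
    matches = sum(1 for t in transactions if itemset <= set(t))
    return matches / len(transactions)


# Hypothetical toy data, for illustration only.
baskets = [["milk", "bread"], ["milk"], ["bread", "jam"], ["coffee"]]

print(support({"milk"}, baskets))           # 0.5
print(support({"milk", "bread"}, baskets))  # 0.25
```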


Support for language

TID  Transaction
0    travel, humor, fiction
1    humor, language
2    humor, biography, cooking
3    cooking, language
4    travel
5    poetry, health, travel, history
6    humor
7    travel
8    poetry, fiction, humor
9    fiction, biography

Support for {language} = 2 / 10 = 0.2

Support for {humor} $\rightarrow$ {language}

TID  Transaction
0    travel, humor, fiction
1    humor, language
2    humor, biography, cooking
3    cooking, language
4    travel
5    poetry, health, travel, history
6    humor
7    travel
8    poetry, fiction, humor
9    fiction, biography

Support for {humor} $\rightarrow$ {language} = 1 / 10 = 0.1

Only transaction 1 contains both humor and language.
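
The support of a rule counts the transactions that contain both sides, so it does not depend on the rule's direction. A short sketch on the ten transactions in the table shows this; the helper name `rule_support` is an assumption made for illustration.

```python
# The ten transactions from the table above.
transactions = [
    ["travel", "humor", "fiction"],
    ["humor", "language"],
    ["humor", "biography", "cooking"],
    ["cooking", "language"],
    ["travel"],
    ["poetry", "health", "travel", "history"],
    ["humor"],
    ["travel"],
    ["poetry", "fiction", "humor"],
    ["fiction", "biography"],
]


def rule_support(antecedent, consequent, transactions):
    """Share of transactions containing every item on both sides of the rule."""
    items = set(antecedent) | set(consequent)
    return sum(items <= set(t) for t in transactions) / len(transactions)


humor_to_language = rule_support({"humor"}, {"language"}, transactions)
language_to_humor = rule_support({"language"}, {"humor"}, transactions)

print(humor_to_language, language_to_humor)  # 0.1 0.1
```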

Preparing the data

print(transactions)
[['travel', 'humor', 'fiction'],
...
['fiction', 'biography']]
from mlxtend.preprocessing import TransactionEncoder
# Instantiate the transaction encoder and fit it to the transactions
encoder = TransactionEncoder().fit(transactions)

Preparing the data

import pandas as pd

# One-hot encode the transactions with the fitted encoder
onehot = encoder.transform(transactions)
# Convert the one-hot encoded array to a DataFrame
onehot = pd.DataFrame(onehot, columns=encoder.columns_)
print(onehot)
   biography  cooking  ...  poetry  travel
0  False      False   ...   False    True
...
9  True       False   ...   False    False
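
To make the encoded table less of a black box, here is a plain-Python sketch of the same idea. It is not the mlxtend implementation; it assumes the columns are the unique items in sorted order, which matches the printed output above. The names `columns` and `onehot_rows` are hypothetical.

```python
# The ten transactions used throughout this section.
transactions = [
    ["travel", "humor", "fiction"],
    ["humor", "language"],
    ["humor", "biography", "cooking"],
    ["cooking", "language"],
    ["travel"],
    ["poetry", "health", "travel", "history"],
    ["humor"],
    ["travel"],
    ["poetry", "fiction", "humor"],
    ["fiction", "biography"],
]

# One column per unique item, in sorted order (an assumption, see above).
columns = sorted({item for t in transactions for item in t})

# One row per transaction: True where the transaction contains the item.
onehot_rows = [[item in t for item in columns] for t in transactions]

print(columns)
print(onehot_rows[0])
```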

Computing support for single items

print(onehot.mean())
biography    0.2
cooking      0.2
fiction      0.3
health       0.1
history      0.1
humor        0.5
language     0.2
poetry       0.2
travel       0.4
dtype: float64

Computing support for multiple items

import numpy as np

# Add a column that is True when a transaction contains both fiction and poetry
onehot['fiction+poetry'] = np.logical_and(onehot['fiction'], onehot['poetry'])

print(onehot.mean())
biography         0.2
cooking           0.2
...               ...
travel            0.4
fiction+poetry    0.1
dtype: float64
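
The same idea extends to itemsets of any size. The sketch below is one possible generalization, not the course's method: the helper `itemset_support` and the small one-hot table are assumptions made for illustration. It uses `DataFrame.all(axis=1)` so that no extra column has to be added to the table.

```python
import pandas as pd

# Small hypothetical one-hot table (four transactions), for illustration only.
onehot = pd.DataFrame({
    "fiction": [True, False, True, False],
    "poetry":  [False, True, True, False],
    "humor":   [True, False, True, True],
})


def itemset_support(onehot, items):
    """Share of rows in which every column listed in `items` is True."""
    # A row contains the itemset only if all of the selected columns are True.
    return onehot[list(items)].all(axis=1).mean()


print(itemset_support(onehot, ["fiction", "poetry"]))           # 0.25
print(itemset_support(onehot, ["fiction", "humor"]))            # 0.5
print(itemset_support(onehot, ["fiction", "poetry", "humor"]))  # 0.25
```

Because the helper leaves `onehot` unchanged, a later `onehot.mean()` still reports single-item supports only.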

Let's practice!
