The simplest metric

Market Basket Analysis in Python

Isaiah Hull

Visiting Associate Professor of Finance, BI Norwegian Business School

Metrics and pruning

  • A metric is a measure of performance for rules.
    • {humor} $\rightarrow$ {poetry}
      • 0.81
    • {fiction} $\rightarrow$ {travel}
      • 0.23
  • Pruning is the use of metrics to discard rules.
    • Retain: {humor} $\rightarrow$ {poetry}
    • Discard: {fiction} $\rightarrow$ {travel}
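
As a minimal sketch of pruning, the two rules above can be filtered by their metric value. The metric values come from the slide; the cut-off of 0.5 and the helper name `prune` are assumptions made only for illustration.

```python
# Metric values taken from the slide; the 0.5 cut-off is a hypothetical choice.
rules = {
    ("humor", "poetry"): 0.81,
    ("fiction", "travel"): 0.23,
}
THRESHOLD = 0.5  # assumed for illustration, not given in the slides


def prune(rules, threshold):
    """Keep only the rules whose metric value is at least the threshold."""
    return {rule: value for rule, value in rules.items() if value >= threshold}


kept = prune(rules, THRESHOLD)
for (antecedent, consequent), value in kept.items():
    print(f"{{{antecedent}}} -> {{{consequent}}}: {value}")
```

With this threshold only {humor} -> {poetry} survives; a different cut-off would retain a different set of rules.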

The simplest metric

  • The support metric measures the share of transactions that contain an itemset.

$$\frac{\text{number of transactions with item(s)}}{\text{number of transactions}}$$

$$\frac{\text{number of transactions with milk}}{\text{number of transactions}}$$


Support for language

TID Transaction
0 travel, humor, fiction
1 humor, language
2 humor, biography, cooking
3 cooking, language
4 travel
5 poetry, health, travel, history
6 humor
7 travel
8 poetry, fiction, humor
9 fiction, biography

Support for {language} = 2 / 10 = 0.2
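
The count above can be reproduced directly from the ten transactions. This is a plain-Python sketch; the helper name `support` is an assumption and is not part of any library used in the course.

```python
# The ten transactions from the table, one list per TID.
transactions = [
    ["travel", "humor", "fiction"],
    ["humor", "language"],
    ["humor", "biography", "cooking"],
    ["cooking", "language"],
    ["travel"],
    ["poetry", "health", "travel", "history"],
    ["humor"],
    ["travel"],
    ["poetry", "fiction", "humor"],
    ["fiction", "biography"],
]


def support(itemset, transactions):
    """Share of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset.issubset(t))
    return hits / len(transactions)


print(support({"language"}, transactions))  # 0.2
```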

Support for {humor} $\rightarrow$ {language}

TID Transaction
0 travel, humor, fiction
1 humor, language
2 humor, biography, cooking
3 cooking, language
4 travel
5 poetry, health, travel, history
6 humor
7 travel
8 poetry, fiction, humor
9 fiction, biography

Support for {humor} $\rightarrow$ {language} = 1 / 10 = 0.1

Only transaction 1 contains both humor and language. Support counts co-occurrence, so {language} $\rightarrow$ {humor} has the same support.
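
A short sketch of why the direction of the rule does not matter for support: the rule's support is the support of the union of antecedent and consequent. The helper names `support` and `rule_support` are hypothetical.

```python
# The ten transactions from the table, one set per TID.
transactions = [
    {"travel", "humor", "fiction"},
    {"humor", "language"},
    {"humor", "biography", "cooking"},
    {"cooking", "language"},
    {"travel"},
    {"poetry", "health", "travel", "history"},
    {"humor"},
    {"travel"},
    {"poetry", "fiction", "humor"},
    {"fiction", "biography"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_support(antecedent, consequent, transactions):
    """Support of a rule: support of antecedent and consequent together."""
    return support(antecedent | consequent, transactions)


forward = rule_support({"humor"}, {"language"}, transactions)
backward = rule_support({"language"}, {"humor"}, transactions)
print(forward, backward)  # 0.1 0.1
```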

Preparing the data

print(transactions)
[['travel', 'humor', 'fiction'],
...
['fiction', 'biography']]
from mlxtend.preprocessing import TransactionEncoder
# Instantiate transaction encoder and fit it to the transactions
encoder = TransactionEncoder().fit(transactions)

Preparing the data

import pandas as pd
# One-hot encode itemsets by applying transform to the fitted encoder
onehot = encoder.transform(transactions)
# Convert one-hot encoded data to DataFrame
onehot = pd.DataFrame(onehot, columns = encoder.columns_)
print(onehot)
   biography  cooking  ...  poetry  travel
0  False      False   ...   False    True
...
9  True       False   ...   False    False
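
To make the encoding step less opaque, here is a plain-Python sketch of what one-hot encoding produces for these transactions. It does not use mlxtend and is not a description of how `TransactionEncoder` is implemented; the alphabetical column order is an assumption chosen to match the printed output above.

```python
# The ten transactions from the earlier slides.
transactions = [
    ["travel", "humor", "fiction"],
    ["humor", "language"],
    ["humor", "biography", "cooking"],
    ["cooking", "language"],
    ["travel"],
    ["poetry", "health", "travel", "history"],
    ["humor"],
    ["travel"],
    ["poetry", "fiction", "humor"],
    ["fiction", "biography"],
]

# One column per distinct item, sorted alphabetically (assumed order).
columns = sorted({item for t in transactions for item in t})

# One row per transaction, one boolean per column:
# True if the transaction contains that item.
onehot = [[item in t for item in columns] for t in transactions]

print(columns)
print(onehot[0])
```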

Computing support for single items

print(onehot.mean())
biography    0.2
cooking      0.2
fiction      0.3
health       0.1
history      0.1
humor        0.5
language     0.2
poetry       0.2
travel       0.4
dtype: float64
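
The column mean equals support because True counts as 1 and False as 0, so the mean of a boolean column is the share of transactions containing that item. A small sketch with a hand-written list standing in for `onehot['humor']` (the list is filled in from the ten transactions, not taken from the DataFrame):

```python
# Hypothetical stand-in for onehot['humor'], one entry per TID 0..9.
humor = [True, True, True, False, False, False, True, False, True, False]

# sum() treats True as 1, so this is (transactions with humor) / (all transactions).
support_humor = sum(humor) / len(humor)
print(support_humor)  # 0.5
```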

Computing support for multiple items

import numpy as np

# Define itemset that contains fiction and poetry
onehot['fiction+poetry'] = np.logical_and(onehot['fiction'],onehot['poetry'])

print(onehot.mean())
biography         0.2
cooking           0.2
...               ...
travel            0.4
fiction+poetry    0.1
dtype: float64
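
Adding one column per combination by hand gets tedious when many pairs are of interest. As a rough sketch, assuming the raw transaction lists are still available, the support of every co-occurring pair can be counted in one pass with the standard library. The variable names here are illustrative only.

```python
from collections import Counter
from itertools import combinations

# The ten transactions from the earlier slides.
transactions = [
    ["travel", "humor", "fiction"],
    ["humor", "language"],
    ["humor", "biography", "cooking"],
    ["cooking", "language"],
    ["travel"],
    ["poetry", "health", "travel", "history"],
    ["humor"],
    ["travel"],
    ["poetry", "fiction", "humor"],
    ["fiction", "biography"],
]

# Count, for each pair of items, how many transactions contain both.
pair_counts = Counter()
for t in transactions:
    # Sorting makes ("fiction", "poetry") and ("poetry", "fiction") the same key.
    for pair in combinations(sorted(set(t)), 2):
        pair_counts[pair] += 1

# Convert counts to support values.
pair_support = {pair: n / len(transactions) for pair, n in pair_counts.items()}

print(pair_support[("fiction", "poetry")])  # 0.1
```

Pairs that never occur together are simply absent from `pair_support`, which corresponds to a support of 0.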

Let's practice!
