Market Basket Analysis in Python
Isaiah Hull
Visiting Associate Professor of Finance, BI Norwegian Business School
$$\frac{\text{number of transactions with item(s)}}{\text{number of transactions}}$$
$$\frac{\text{number of transactions with milk}}{\text{total transactions}}$$
TID | Transaction |
---|---|
0 | travel, humor, fiction |
1 | humor, language |
2 | humor, biography, cooking |
3 | cooking, language |
4 | travel |
5 | poetry, health, travel, history |
6 | humor |
7 | travel |
8 | poetry, fiction, humor |
9 | fiction, biography |

Support for {language} = 2 / 10 = 0.2
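The support calculation above can be sketched in plain Python. This is a minimal illustration, not the course's own code: the helper name `support` is hypothetical, and the transaction list is simply the ten rows of the table typed out by hand.

```python
# The ten example transactions from the table (TID 0 to 9).
transactions = [
    ['travel', 'humor', 'fiction'],
    ['humor', 'language'],
    ['humor', 'biography', 'cooking'],
    ['cooking', 'language'],
    ['travel'],
    ['poetry', 'health', 'travel', 'history'],
    ['humor'],
    ['travel'],
    ['poetry', 'fiction', 'humor'],
    ['fiction', 'biography'],
]

def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset` (hypothetical helper)."""
    itemset = set(itemset)
    matches = sum(itemset.issubset(t) for t in transactions)
    return matches / len(transactions)

language_support = support({'language'}, transactions)
print(language_support)  # 0.2
```

Only TIDs 1 and 3 contain `language`, so the count is 2 out of 10.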
TID | Transaction |
---|---|
0 | travel, humor, fiction |
1 | humor, language |
2 | humor, biography, cooking |
3 | cooking, language |
4 | travel |
5 | poetry, health, travel, history |
6 | humor |
7 | travel |
8 | poetry, fiction, humor |
9 | fiction, biography |

Support for {language} $\rightarrow$ {humor} = 1 / 10 = 0.1
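The support of a rule such as {language} $\rightarrow$ {humor} is the share of transactions that contain both sides of the rule. A minimal sketch under that definition follows; the function name `rule_support` is an assumption made for illustration.

```python
# The ten example transactions from the table (TID 0 to 9).
transactions = [
    ['travel', 'humor', 'fiction'],
    ['humor', 'language'],
    ['humor', 'biography', 'cooking'],
    ['cooking', 'language'],
    ['travel'],
    ['poetry', 'health', 'travel', 'history'],
    ['humor'],
    ['travel'],
    ['poetry', 'fiction', 'humor'],
    ['fiction', 'biography'],
]

def rule_support(antecedent, consequent, transactions):
    """Share of transactions containing antecedent and consequent together (hypothetical helper)."""
    both = set(antecedent) | set(consequent)
    matches = sum(both.issubset(t) for t in transactions)
    return matches / len(transactions)

language_humor = rule_support({'language'}, {'humor'}, transactions)
print(language_humor)  # 0.1
```

Only TID 1 contains both `language` and `humor`, which gives 1 / 10. Because the calculation uses the union of the two itemsets, swapping antecedent and consequent returns the same value.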
print(transactions)
[['travel', 'humor', 'fiction'],
...
['fiction', 'biography']]
from mlxtend.preprocessing import TransactionEncoder
import pandas as pd
# Instantiate transaction encoder and fit it to the transactions
encoder = TransactionEncoder().fit(transactions)
# One-hot encode itemsets by applying transform
onehot = encoder.transform(transactions)
# Convert one-hot encoded array to a DataFrame
onehot = pd.DataFrame(onehot, columns = encoder.columns_)
print(onehot)
biography cooking ... poetry travel
0 False False ... False True
...
9 True False ... False False
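To make the encoding step less of a black box, the same kind of table can be built by hand with pandas. This is only a sketch of the idea, not how `TransactionEncoder` is implemented; the name `onehot_manual` is hypothetical, and the alphabetical column order is an assumption chosen to match the printed output.

```python
import pandas as pd

# The ten example transactions (TID 0 to 9).
transactions = [
    ['travel', 'humor', 'fiction'],
    ['humor', 'language'],
    ['humor', 'biography', 'cooking'],
    ['cooking', 'language'],
    ['travel'],
    ['poetry', 'health', 'travel', 'history'],
    ['humor'],
    ['travel'],
    ['poetry', 'fiction', 'humor'],
    ['fiction', 'biography'],
]

# Unique items, sorted alphabetically, become the columns.
items = sorted({item for t in transactions for item in t})

# One row per transaction; each cell is True if the item is in that transaction.
onehot_manual = pd.DataFrame(
    [[item in t for item in items] for t in transactions],
    columns=items,
)
print(onehot_manual)
```

Each row is a transaction and each column an item, so a column mean is that item's support.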
print(onehot.mean())
biography 0.2
cooking 0.2
fiction 0.3
health 0.1
history 0.1
humor 0.5
language 0.2
poetry 0.2
travel 0.4
dtype: float64
import numpy as np
# Define itemset that contains fiction and poetry
onehot['fiction+poetry'] = np.logical_and(onehot['fiction'], onehot['poetry'])
print(onehot.mean())
biography 0.2
cooking 0.2
... ...
travel 0.4
fiction+poetry 0.1
dtype: float64
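The fiction+poetry column generalizes to itemsets of any size. The sketch below assumes a one-hot DataFrame like the one above and rebuilds it by hand so that it runs on its own; `itemset_support` is a hypothetical helper, not part of mlxtend or pandas.

```python
import pandas as pd

# The ten example transactions (TID 0 to 9).
transactions = [
    ['travel', 'humor', 'fiction'],
    ['humor', 'language'],
    ['humor', 'biography', 'cooking'],
    ['cooking', 'language'],
    ['travel'],
    ['poetry', 'health', 'travel', 'history'],
    ['humor'],
    ['travel'],
    ['poetry', 'fiction', 'humor'],
    ['fiction', 'biography'],
]

# Hand-built one-hot table: rows are transactions, columns are items.
items = sorted({item for t in transactions for item in t})
onehot = pd.DataFrame(
    [[item in t for item in items] for t in transactions],
    columns=items,
)

def itemset_support(onehot, itemset):
    """Share of rows in which every item of `itemset` is True (hypothetical helper)."""
    return onehot[list(itemset)].all(axis=1).mean()

fiction_poetry = itemset_support(onehot, ['fiction', 'poetry'])
print(fiction_poetry)
```

Unlike adding a `fiction+poetry` column, this leaves `onehot` unchanged, so a later `onehot.mean()` still reports single-item supports only.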