Market Basket Analysis in Python
Isaiah Hull
Visiting Associate Professor of Finance, BI Norwegian Business School
$$\text{Support} = \frac{\text{number of transactions with item(s)}}{\text{number of transactions}}$$
$$\text{Support for milk} = \frac{\text{number of transactions with milk}}{\text{number of transactions}}$$
| TID | Transaction |
|---|---|
| 0 | travel, humor, fiction |
| 1 | humor, language |
| 2 | humor, biography, cooking |
| 3 | cooking, language |
| 4 | travel |
| 5 | poetry, health, travel, history |
| 6 | humor |
| 7 | travel |
| 8 | poetry, fiction, humor |
| 9 | fiction, biography |

Support for {language} = 2 / 10 = 0.2
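The support value for {language} can be reproduced in plain Python. This is a minimal sketch: the list is transcribed from the ten transactions in the table, and `support` is a hypothetical helper name, not part of any library.

```python
# Ten transactions transcribed from the table (TID 0 through 9).
transactions = [
    ['travel', 'humor', 'fiction'],
    ['humor', 'language'],
    ['humor', 'biography', 'cooking'],
    ['cooking', 'language'],
    ['travel'],
    ['poetry', 'health', 'travel', 'history'],
    ['humor'],
    ['travel'],
    ['poetry', 'fiction', 'humor'],
    ['fiction', 'biography'],
]

def support(itemset, transactions):
    """Share of transactions containing every item in `itemset` (hypothetical helper)."""
    items = set(itemset)
    hits = sum(1 for t in transactions if items.issubset(t))
    return hits / len(transactions)

print(support({'language'}, transactions))  # 0.2
```

Two of the ten transactions (TID 1 and TID 3) contain language, which gives 2 / 10.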
| TID | Transaction |
|---|---|
| 0 | travel, humor, fiction |
| 1 | humor, language |
| 2 | humor, biography, cooking |
| 3 | cooking, language |
| 4 | travel |
| 5 | poetry, health, travel, history |
| 6 | humor |
| 7 | travel |
| 8 | poetry, fiction, humor |
| 9 | fiction, biography |

Support for {language} $\rightarrow$ {humor} = 1 / 10 = 0.1
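The support of the rule {language} $\rightarrow$ {humor} counts transactions that contain both items. A minimal sketch, assuming the transactions are stored as Python sets; `rule_support` is a hypothetical helper name introduced only for illustration.

```python
# Transactions transcribed from the table (TID 0 through 9), stored as sets.
transactions = [
    {'travel', 'humor', 'fiction'},
    {'humor', 'language'},
    {'humor', 'biography', 'cooking'},
    {'cooking', 'language'},
    {'travel'},
    {'poetry', 'health', 'travel', 'history'},
    {'humor'},
    {'travel'},
    {'poetry', 'fiction', 'humor'},
    {'fiction', 'biography'},
]

def rule_support(antecedent, consequent, transactions):
    """Share of transactions containing both sides of the rule (hypothetical helper)."""
    both = set(antecedent) | set(consequent)
    return sum(both <= t for t in transactions) / len(transactions)

print(rule_support({'language'}, {'humor'}, transactions))  # 0.1
# Swapping antecedent and consequent leaves the support unchanged.
print(rule_support({'humor'}, {'language'}, transactions))  # 0.1
```

Only TID 1 contains both language and humor, so the support is 1 / 10 in either direction.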
print(transactions)
[['travel', 'humor', 'fiction'],
...
['fiction', 'biography']]
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
# Instantiate transaction encoder and fit it to the transactions
encoder = TransactionEncoder().fit(transactions)
# One-hot encode itemsets by applying transform
onehot = encoder.transform(transactions)
# Convert one-hot encoded data to DataFrame
onehot = pd.DataFrame(onehot, columns=encoder.columns_)
print(onehot)
biography cooking ... poetry travel
0 False False ... False True
...
9 True False ... False False
print(onehot.mean())
biography 0.2
cooking 0.2
fiction 0.3
health 0.1
history 0.1
humor 0.5
language 0.2
poetry 0.2
travel 0.4
dtype: float64
import numpy as np
# Define itemset that contains fiction and poetry
onehot['fiction+poetry'] = np.logical_and(onehot['fiction'], onehot['poetry'])
print(onehot.mean())
biography 0.2
cooking 0.2
... ...
travel 0.4
fiction+poetry 0.1
dtype: float64
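The same idea extends to itemsets of any size without adding a column for each one. The sketch below is an assumption-laden illustration: it builds the one-hot frame by hand so that it runs without mlxtend, and `itemset_support` is a hypothetical helper name.

```python
import pandas as pd

# Transactions transcribed from the table (TID 0 through 9).
transactions = [
    ['travel', 'humor', 'fiction'],
    ['humor', 'language'],
    ['humor', 'biography', 'cooking'],
    ['cooking', 'language'],
    ['travel'],
    ['poetry', 'health', 'travel', 'history'],
    ['humor'],
    ['travel'],
    ['poetry', 'fiction', 'humor'],
    ['fiction', 'biography'],
]

# One-hot frame built by hand (an assumption made to keep the sketch
# self-contained; the slides obtain `onehot` from TransactionEncoder).
items = sorted({item for t in transactions for item in t})
onehot = pd.DataFrame(
    [[item in t for item in items] for t in transactions],
    columns=items,
)

def itemset_support(onehot, itemset):
    """Share of rows where every column in `itemset` is True (hypothetical helper)."""
    return onehot[list(itemset)].all(axis=1).mean()

print(itemset_support(onehot, ['fiction', 'poetry']))  # 0.1
```

Taking `all(axis=1)` across the selected columns is the row-wise logical AND, so for two items it matches the `np.logical_and` column above.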