Market Basket Analysis in Python
Isaiah Hull
Visiting Associate Professor of Finance, BI Norwegian Business School
TID | Transaction |
---|---|
1 | Coffee, Milk |
2 | Bread, Milk, Orange |
3 | Bread, Milk |
4 | Bread, Milk, Sugar |
5 | Bread, Jam, Milk |
... | ... |
$$Confidence(X \rightarrow Y) = \frac{Support(X \& Y)}{Support(X)}$$
$$Support(Milk\&Coffee) = 0.20$$
$$Support(Milk) = 1.00$$
$$\frac{Support(Milk\&Coffee)}{Support(Milk)} = \frac{0.20}{1.00} = 0.20$$
With Support(Milk) = 0.80 instead:
$$\frac{Support(Milk\&Coffee)}{Support(Milk)} = \frac{0.20}{0.80} = 0.25$$
With Support(Milk) = 0.20 instead:
$$\frac{Support(Milk\&Coffee)}{Support(Milk)} = \frac{0.20}{0.20} = 1.00$$
$$Lift(X \rightarrow Y) = \frac{Support(X \& Y)}{Support(X) Support(Y)}$$
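As a rough illustration, the two ratios above (confidence and lift) can be computed by hand for the transaction table. This sketch assumes the data consists of only the five transactions shown; the rows hidden behind "..." are not included, and the helper names `support`, `confidence`, and `lift` are made up for this example.

```python
# Assumption: only the five visible transactions from the table are used.
transactions = [
    {"Coffee", "Milk"},
    {"Bread", "Milk", "Orange"},
    {"Bread", "Milk"},
    {"Bread", "Milk", "Sugar"},
    {"Bread", "Jam", "Milk"},
]

def support(*items):
    """Share of transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(wanted <= t for t in transactions) / len(transactions)

def confidence(x, y):
    """Support(X & Y) / Support(X)."""
    return support(x, y) / support(x)

def lift(x, y):
    """Support(X & Y) / (Support(X) * Support(Y))."""
    return support(x, y) / (support(x) * support(y))

print(support("Milk", "Coffee"))     # 0.2
print(support("Milk"))               # 1.0
print(confidence("Milk", "Coffee"))  # 0.2
print(lift("Milk", "Coffee"))        # 1.0
```

Because milk appears in every one of these transactions, the lift of 1.0 says that knowing a basket contains milk tells us nothing extra about coffee.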
from mlxtend.preprocessing import TransactionEncoder
import pandas as pd
# Split library strings into lists
libraries = data['Library'].apply(lambda t: t.split(','))
# Convert to list of lists
libraries = list(libraries)
# Fit the encoder and keep a reference so its column names can be reused
encoder = TransactionEncoder().fit(libraries)
# One-hot encode books
books = encoder.transform(libraries)
# Convert one-hot encoded data to DataFrame
books = pd.DataFrame(books, columns=encoder.columns_)
# Print first five rows
print(books.head())
Hunger Gatsby
0 False True
1 False True
2 False False
3 False True
4 False True
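To make the encoding step concrete, here is a minimal pure-Python sketch of what one-hot encoding does to a list of libraries. It is not how `TransactionEncoder` is implemented, and the `library_strings` values are hypothetical stand-ins, since the real `data['Library']` column is not shown.

```python
# Hypothetical input: each string is one reader's library, comma-separated.
library_strings = ["Gatsby", "Hunger,Gatsby", "Hunger"]

# Split library strings into lists, as in the snippet above.
libraries = [s.split(",") for s in library_strings]

# One column per distinct book, in sorted order.
columns = sorted({book for library in libraries for book in library})

# One row per library: True where the library contains the column's book.
onehot = [[book in library for book in columns] for library in libraries]

print(columns)  # ['Gatsby', 'Hunger']
for row in onehot:
    print(row)
# [True, False]
# [True, True]
# [False, True]
```

Each row is a transaction and each column a book, which is the shape the support calculations rely on: a column mean is that book's support.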
import numpy as np
# Compute support for both books together and for each book alone.
supportHG = np.logical_and(books['Hunger'], books['Gatsby']).mean()
supportH = books['Hunger'].mean()
supportG = books['Gatsby'].mean()
# Compute confidence (Hunger -> Gatsby) and lift.
confidence = supportHG / supportH
lift = supportHG / (supportH * supportG)
# Print support of Gatsby, confidence, and lift.
print(supportG, confidence, lift)
(0.30, 0.16, 0.53)
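The three printed numbers are linked: dividing the confidence formula by Support(Y) gives the lift formula, so lift should equal confidence divided by `supportG`. The sketch below checks this using the rounded values from the output above; the variable names are chosen for this example only.

```python
# Rounded values copied from the printed output.
support_g, confidence, lift = 0.30, 0.16, 0.53

# Lift = Confidence(Hunger -> Gatsby) / Support(Gatsby)
implied_lift = confidence / support_g

print(round(implied_lift, 2))  # 0.53
```

A lift below 1 means the two books appear together less often than they would if they were chosen independently.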