Market Basket Analysis in Python
Isaiah Hull
Visiting Associate Professor of Finance, BI Norwegian Business School
$${n \choose k} = \frac{n!}{(n-k)!k!}$$
Item Count | Itemset Size | Combinations |
---|---|---|
3461 | 0 | 1 |
3461 | 1 | 3461 |
3461 | 2 | 5,987,530 |
3461 | 3 | 6,903,622,090 |
3461 | 4 | 5,968,181,296,805 |
$$\sum_{k=0}^{n}{n \choose k} = 2^{n}$$
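The counts in the table and the identity above can be checked with Python's standard library. This is a minimal sketch: the names `n_items`, `counts`, and the small `n` used for the sum check are illustrative and do not come from the original slides.

```python
from math import comb

n_items = 3461  # item count from the table above

# Number of possible itemsets of each size k = 0..4
counts = {k: comb(n_items, k) for k in range(5)}
for k, c in counts.items():
    print(f"{n_items} | {k} | {c:,}")

# Check the identity sum_k C(n, k) = 2^n on a small, illustrative n
n = 10
total = sum(comb(n, k) for k in range(n + 1))
print(total == 2 ** n)
```

The count grows so quickly with itemset size that enumerating every itemset is impractical, which is why the search has to be restricted.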
# Import pandas and the Apriori algorithm
import pandas as pd
from mlxtend.frequent_patterns import apriori
# Load one-hot encoded novelty gifts data
onehot = pd.read_csv('datasets/online_retail_onehot.csv')
# Print the first five rows
print(onehot.head())
50'S CHRISTMAS GIFT BAG LARGE ... ZINC WILLIE WINKIE CANDLE STICK \
0 False ... False
1 False ... False
2 False ... False
3 False ... False
4 False ... False
# Compute frequent itemsets
frequent_itemsets = apriori(onehot, min_support=0.0005,
                            max_len=4, use_colnames=True)
# Print number of itemsets
print(len(frequent_itemsets))
3652
# Print itemsets
print(frequent_itemsets.head())
support itemsets
0 0.000752 ( 50'S CHRISTMAS GIFT BAG LARGE)
1 0.001504 ( DOLLY GIRL BEAKER)
...
1500 0.000752 (PING MICROWAVE APRON, FOOD CONTAINER SET 3 LO...
1501 0.000752 (WOOD 2 DRAWER CABINET WHITE FINISH, FOOD CONT...
...
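To show what `min_support` and `max_len` control, the following is a simplified pure-Python sketch of a level-wise frequent itemset search. It is not mlxtend's implementation. The toy transactions and the helper names `support` and `frequent_itemsets_sketch` are hypothetical and exist only for illustration.

```python
from itertools import combinations

# Toy transactions (made-up data, for illustration only)
transactions = [
    {"bag", "beaker", "apron"},
    {"bag", "beaker"},
    {"bag", "apron"},
    {"beaker"},
]

def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets_sketch(transactions, min_support, max_len):
    """Level-wise search: only itemsets that are already frequent get extended."""
    items = sorted(set().union(*transactions))
    current = [frozenset([i]) for i in items
               if support(frozenset([i]), transactions) >= min_support]
    result = {s: support(s, transactions) for s in current}
    size = 1
    while current and size < max_len:
        size += 1
        # Candidates are unions of two frequent itemsets from the previous level
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == size}
        current = [c for c in candidates
                   if support(c, transactions) >= min_support]
        result.update({c: support(c, transactions) for c in current})
    return result

found = frequent_itemsets_sketch(transactions, min_support=0.5, max_len=2)
for itemset, s in found.items():
    print(sorted(itemset), s)
```

In this toy data, the pair containing `beaker` and `apron` appears in only one of four transactions, so it falls below the 0.5 threshold and is dropped. No larger itemset containing it would ever be considered.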