Market Basket Analysis in Python
Isaiah Hull
Visiting Associate Professor of Finance, BI Norwegian Business School
The mlxtend module offers a means of pruning association rules.
association_rules() takes frequent itemsets, a metric, and a threshold.
# Import pandas, the Apriori algorithm, and the association rule function
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
# Load one-hot encoded novelty gifts data
onehot = pd.read_csv('datasets/online_retail_onehot.csv')
# Apply Apriori algorithm
frequent_itemsets = apriori(onehot,
                            use_colnames=True,
                            min_support=0.0001)
# Compute association rules
rules = association_rules(frequent_itemsets,
                          metric="support",
                          min_threshold=0.0)
# Print the rules.
print(rules)
antecedents ... conviction
0 (CARDHOLDER GINGHAM CHRISTMAS TREE) ... inf
...
79505 (SET OF 3 HEART COOKIE CUTTERS) ... 1.998496
# Print the frequent itemsets.
print(frequent_itemsets)
support itemsets
0 0.000752 ( 50'S CHRISTMAS GIFT BAG LARGE)
...
4707 0.000752 (PIZZA PLATE IN BOX, CHRISTMAS ...
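The support column above is the share of transactions that contain every item in an itemset. The following is a minimal sketch of that calculation on a hypothetical toy dataset. It enumerates itemsets by brute force rather than using the Apriori candidate pruning that mlxtend applies, and the function name `frequent_itemsets_bruteforce` and the items are invented for illustration.

```python
from itertools import combinations

# Hypothetical toy transactions; not taken from the novelty gifts dataset.
transactions = [
    {"bag", "card"},
    {"bag", "card", "plate"},
    {"bag"},
    {"plate"},
]

def frequent_itemsets_bruteforce(transactions, min_support, max_len=2):
    """Return {itemset: support} for itemsets that meet min_support."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            # Support = share of transactions containing every item in the set
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
    return result

toy_itemsets = frequent_itemsets_bruteforce(transactions, min_support=0.5)
for itemset, support in toy_itemsets.items():
    print(sorted(itemset), support)
```

Lowering `min_support` admits rarer itemsets, which is why the very small value of 0.0001 used above yields thousands of itemsets.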
# Compute association rules
rules = association_rules(frequent_itemsets,
                          metric="support",
                          min_threshold=0.001)
# Print the rules.
print(rules)
antecedents ... conviction
0 (BIRTHDAY CARD, RETRO SPOT) ... 2.977444
1 (JUMBO BAG RED RETROSPOT) ... 1.247180
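Raising the support threshold from 0.0 to 0.001 shrinks the rule set because only rules whose chosen metric is at least the threshold survive. A minimal sketch of that pruning step in plain Python follows. The rules and their metric values are made up, and `prune` is a hypothetical helper, not part of mlxtend.

```python
# Hypothetical rules with made-up metric values, for illustration only.
toy_rules = [
    {"antecedents": {"bag"}, "consequents": {"card"},
     "support": 0.0020, "confidence": 0.40},
    {"antecedents": {"card"}, "consequents": {"bag"},
     "support": 0.0020, "confidence": 0.65},
    {"antecedents": {"plate"}, "consequents": {"bag"},
     "support": 0.0004, "confidence": 0.90},
]

def prune(rules, metric, min_threshold):
    """Keep rules whose value for `metric` is at least `min_threshold`."""
    return [rule for rule in rules if rule[metric] >= min_threshold]

# A threshold of 0.0 keeps every rule; 0.001 drops the rare one.
print(len(prune(toy_rules, "support", 0.0)))
print(len(prune(toy_rules, "support", 0.001)))
```

The same helper works for any metric column, so switching `metric` to "confidence" prunes on confidence instead of support.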
print(rules.columns)
Index(['antecedents', 'consequents', 'antecedent support',
'consequent support', 'support', 'confidence', 'lift', 'leverage',
'conviction'],
dtype='object')
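Each of these columns can be derived from three support values. The sketch below computes them from scratch for a single rule using the standard definitions. The toy transactions and the helper `rule_metrics` are hypothetical, and the code assumes the antecedent occurs in at least one transaction. It also shows why conviction is reported as inf when a rule's confidence equals 1.

```python
import math

# Hypothetical toy transactions; not taken from the novelty gifts dataset.
transactions = [
    {"bag", "card"},
    {"bag", "card", "plate"},
    {"bag"},
    {"plate"},
]

def rule_metrics(transactions, antecedent, consequent):
    """Compute standard rule metrics for antecedent -> consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    ant = sum(a <= t for t in transactions) / n        # antecedent support
    con = sum(c <= t for t in transactions) / n        # consequent support
    sup = sum((a | c) <= t for t in transactions) / n  # joint support
    conf = sup / ant  # assumes the antecedent appears at least once
    return {
        "antecedent support": ant,
        "consequent support": con,
        "support": sup,
        "confidence": conf,
        "lift": conf / con,
        "leverage": sup - ant * con,
        # The denominator is zero when confidence is 1, hence infinity
        "conviction": math.inf if conf == 1 else (1 - con) / (1 - conf),
    }

print(rule_metrics(transactions, {"card"}, {"bag"}))
print(rule_metrics(transactions, {"bag"}, {"card"}))
```

In the toy data every transaction that contains a card also contains a bag, so the rule from card to bag has a confidence of 1 and infinite conviction, while the reverse rule does not.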
print(rules[['antecedents','consequents']])
antecedents consequents
0 (JUMBO BAG RED RETROSPOT) (BIRTHDAY CARD, RETRO SPOT)
1 (BIRTHDAY CARD, RETRO SPOT) (JUMBO BAG RED RETROSPOT)
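The two rules above are mirror images because a two-item itemset can be split into an antecedent and a consequent in exactly two ways. A minimal sketch of that enumeration follows. The generator `candidate_rules` is a hypothetical helper, and treating "BIRTHDAY CARD, RETRO SPOT" as a single item name is an assumption based on the printed output.

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield every (antecedent, consequent) split of an itemset."""
    items = sorted(itemset)
    # Antecedents range from one item up to all but one item
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = frozenset(items) - frozenset(antecedent)
            yield frozenset(antecedent), consequent

# Assumed to be two items, the second of which contains a comma in its name
pair = {"JUMBO BAG RED RETROSPOT", "BIRTHDAY CARD, RETRO SPOT"}
for antecedent, consequent in candidate_rules(pair):
    print(set(antecedent), "->", set(consequent))
```

The number of candidate rules grows quickly with itemset size: a three-item itemset already yields six splits, which is one reason pruning by a metric threshold matters.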
# Compute association rules
rules = association_rules(frequent_itemsets,
                          metric="antecedent support",
                          min_threshold=0.002)
# Print the number of rules.
print(len(rules))
3899