Basic Apriori results pruning

Market Basket Analysis in Python

Isaiah Hull

Visiting Associate Professor of Finance, BI Norwegian Business School

Apriori and association rules

  • Apriori prunes itemsets.
    • Applies minimum support threshold.
    • Modified version can prune by number of items.
    • Doesn't tell us about association rules.
  • Association rules.
    • Many more association rules than itemsets.
    • {Bags, Boxes}: Bags -> Boxes OR Boxes -> Bags.
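
To see why rules outnumber itemsets, the splits of a single itemset can be listed directly. The sketch below is illustrative only: rules_from_itemset() is a hypothetical helper, not part of mlxtend, and "Cards" is a made-up third item added next to the Bags and Boxes from the slide.

```python
from itertools import combinations


def rules_from_itemset(itemset):
    """Yield every (antecedent, consequent) split of an itemset.

    Both sides are non-empty and together they cover the whole itemset.
    """
    items = sorted(itemset)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


# The two-item example from the slide gives two rules.
pair_rules = list(rules_from_itemset({"Bags", "Boxes"}))

# Adding a hypothetical third item gives six rules from one itemset.
triple_rules = list(rules_from_itemset({"Bags", "Boxes", "Cards"}))

for antecedent, consequent in pair_rules:
    print(antecedent, "->", consequent)
print(len(pair_rules), len(triple_rules))
```

A single itemset with k items produces 2^k - 2 such splits, so the rule count grows much faster than the itemset count.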

How to compute association rules

  • Computing rules from Apriori results.
    • Difficult to enumerate for a large number of items (n) and large itemset sizes (k).
    • Enumerating every rule could undo the itemset pruning done by Apriori.
  • Reducing number of association rules.
    • The mlxtend module offers a means of pruning association rules.
    • association_rules() takes frequent itemsets, a metric, and a threshold.
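
A rough count shows how quickly enumeration gets out of hand. The sketch below assumes n is the number of distinct items and k the itemset size, and counts only rules that split an itemset into a non-empty antecedent and a non-empty consequent. count_itemsets() and count_rules() are hypothetical helpers written for this illustration.

```python
from math import comb


def count_itemsets(n):
    """Number of non-empty itemsets that can be formed from n items."""
    return 2 ** n - 1


def count_rules(n):
    """Number of possible rules over n items.

    Each itemset of size k >= 2 can be split into 2**k - 2 rules,
    and there are comb(n, k) itemsets of that size.
    """
    return sum(comb(n, k) * (2 ** k - 2) for k in range(2, n + 1))


for n in (2, 3, 5, 10):
    print(n, count_itemsets(n), count_rules(n))
```

The sum simplifies to 3^n - 2^(n+1) + 1, which is why pruning before or during rule generation matters even for modest n.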

How to compute association rules

# Import pandas, the Apriori algorithm, and the association rules function
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Load one-hot encoded novelty gifts data
onehot = pd.read_csv('datasets/online_retail_onehot.csv')

# Apply Apriori algorithm
frequent_itemsets = apriori(onehot, 
                            use_colnames=True, 
                            min_support=0.0001)
# Compute association rules
rules = association_rules(frequent_itemsets,
                          metric = "support", 
                          min_threshold = 0.0)

The importance of pruning

# Print the rules.
print(rules)
                               antecedents  ... conviction
0      (CARDHOLDER GINGHAM CHRISTMAS TREE)  ...      inf
...
79505      (SET OF 3 HEART COOKIE CUTTERS)  ... 1.998496
# Print the frequent itemsets.
print(frequent_itemsets)
       support                                           itemsets
0     0.000752                   ( 50'S CHRISTMAS GIFT BAG LARGE)
...
4707  0.000752                  (PIZZA PLATE IN BOX, CHRISTMAS ...

The importance of pruning

# Compute association rules
rules = association_rules(frequent_itemsets,
                          metric = "support", 
                          min_threshold = 0.001)

# Print the rules.
print(rules)
                   antecedents  ...  conviction
0  (BIRTHDAY CARD, RETRO SPOT)  ...    2.977444
1    (JUMBO BAG RED RETROSPOT)  ...    1.247180

Exploring the set of rules

print(rules.columns)
Index(['antecedents', 'consequents', 'antecedent support',
       'consequent support', 'support', 'confidence', 'lift', 'leverage',
       'conviction'],
      dtype='object')
print(rules[['antecedents','consequents']])
                   antecedents                  consequents
0    (JUMBO BAG RED RETROSPOT)  (BIRTHDAY CARD, RETRO SPOT)
1  (BIRTHDAY CARD, RETRO SPOT)    (JUMBO BAG RED RETROSPOT)
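
The antecedents and consequents columns hold frozensets, so they are filtered with set operations rather than string comparison. The sketch below builds a small stand-in table by hand: rules_demo and its item names are invented for illustration and are not taken from the novelty gifts data.

```python
import pandas as pd

# Hypothetical rules table that mimics the two set-valued columns.
rules_demo = pd.DataFrame({
    "antecedents": [
        frozenset({"Bags"}),
        frozenset({"Boxes"}),
        frozenset({"Bags", "Boxes"}),
    ],
    "consequents": [
        frozenset({"Boxes"}),
        frozenset({"Bags"}),
        frozenset({"Cards"}),
    ],
})

# Keep rules whose antecedent contains a given item.
has_bags = rules_demo["antecedents"].apply(lambda items: "Bags" in items)
bags_rules = rules_demo[has_bags]

# Keep rules with exactly one item in the antecedent.
single_antecedent = rules_demo[rules_demo["antecedents"].apply(len) == 1]

print(bags_rules)
print(single_antecedent)
```

The same pattern should carry over to the rules DataFrame returned by association_rules(), since only the column names and frozenset values are relied on.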

Pruning with other metrics

# Compute association rules
rules = association_rules(frequent_itemsets,
                          metric = "antecedent support", 
                          min_threshold = 0.002)

# Print the number of rules.
print(len(rules))
3899
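
To make the metric names concrete, the sketch below computes a few of them by hand and applies a minimum threshold. It is a plain-Python illustration under stated assumptions, not mlxtend's implementation: the transactions are invented, and support(), rule_metrics(), and prune() are hypothetical helpers.

```python
# Toy transactions, invented for illustration.
transactions = [
    {"Bags", "Boxes"},
    {"Bags", "Boxes", "Cards"},
    {"Bags"},
    {"Boxes", "Cards"},
    {"Cards"},
]


def support(itemset):
    """Share of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rule_metrics(antecedent, consequent):
    """Compute a few standard metrics for the rule antecedent -> consequent."""
    joint = support(antecedent | consequent)
    return {
        "antecedent support": support(antecedent),
        "consequent support": support(consequent),
        "support": joint,
        "confidence": joint / support(antecedent),
        "lift": joint / (support(antecedent) * support(consequent)),
    }


def prune(rules, metric, min_threshold):
    """Keep rules whose chosen metric is at least min_threshold."""
    return [
        (a, c) for a, c in rules
        if rule_metrics(a, c)[metric] >= min_threshold
    ]


candidate_rules = [
    ({"Bags"}, {"Boxes"}),
    ({"Boxes"}, {"Bags"}),
    ({"Cards"}, {"Bags"}),
]

kept = prune(candidate_rules, "confidence", 0.6)
print(kept)
```

Changing the metric name or the threshold changes which rules survive, which is the same lever the metric and min_threshold arguments provide.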

Let's practice!
