Basic Apriori: pruning rules

Market Basket Analysis in Python

Isaiah Hull

Visiting Associate Professor of Finance, BI Norwegian Business School

Apriori and association rules

  • Apriori prunes itemsets.
    • Applies a minimum support threshold.
    • A modified version can prune by number of items.
    • Says nothing about association rules.
  • Association rules.
    • Many more association rules than itemsets.
    • {Bags, Boxes}: Bags -> Boxes OR Boxes -> Bags.
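
To make the gap between itemsets and rules concrete, the sketch below enumerates every rule a single itemset can generate by splitting it into a non-empty antecedent and a non-empty consequent. The helper name rules_from_itemset and the extra item "Cards" are hypothetical and not part of mlxtend or the course data; under this splitting scheme an itemset of k items yields 2^k - 2 rules.

```python
from itertools import combinations

def rules_from_itemset(itemset):
    """Return all (antecedent, consequent) splits of an itemset."""
    items = sorted(itemset)
    rules = []
    # Antecedent sizes run from 1 to k - 1 so both sides stay non-empty.
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules

# The two-item example from the slide, plus a hypothetical three-item set.
two_item_rules = rules_from_itemset({"Bags", "Boxes"})
three_item_rules = rules_from_itemset({"Bags", "Boxes", "Cards"})

for antecedent, consequent in two_item_rules:
    print(antecedent, "->", consequent)
print(len(two_item_rules), len(three_item_rules))
```

Adding one item to the itemset triples the rule count here (from 2 to 6), which is why rules outnumber itemsets so quickly.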

Computing association rules

  • Computing rules from Apriori results.
    • Difficult to enumerate for high n and k.
    • Could undo the itemset pruning done by Apriori.
  • Reducing the number of association rules.
    • mlxtend offers ways to prune rules.
    • association_rules() takes frequent itemsets, a metric, and a threshold.
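
The idea of pruning by a metric and a threshold can be sketched by hand as follows. This is not mlxtend's implementation: the transactions, the candidate rules, and the helper names support and prune are all hypothetical, chosen only to show how a minimum threshold on one metric decides which rules survive.

```python
# Hypothetical toy transactions; not the course's retail dataset.
transactions = [
    {"Bags", "Boxes"},
    {"Bags", "Boxes", "Cards"},
    {"Bags"},
    {"Cards"},
]

def support(itemset):
    """Share of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Candidate rules as (antecedent, consequent) pairs.
candidates = [
    (frozenset({"Bags"}), frozenset({"Boxes"})),
    (frozenset({"Boxes"}), frozenset({"Bags"})),
    (frozenset({"Cards"}), frozenset({"Bags"})),
]

def prune(candidates, metric, min_threshold):
    """Keep rules whose chosen metric is at least min_threshold."""
    kept = []
    for antecedent, consequent in candidates:
        joint = support(antecedent | consequent)
        metrics = {
            "support": joint,
            "antecedent support": support(antecedent),
            "confidence": joint / support(antecedent),
        }
        if metrics[metric] >= min_threshold:
            kept.append((antecedent, consequent, metrics[metric]))
    return kept

by_support = prune(candidates, "support", 0.5)
by_confidence = prune(candidates, "confidence", 0.75)
print(len(by_support), len(by_confidence))
```

Bags -> Boxes and Boxes -> Bags share the same support because they come from the same itemset, yet only Boxes -> Bags clears the confidence threshold, so the choice of metric changes which rules are kept.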

Computing association rules

# Import pandas and the Apriori and association rule functions
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Load one-hot encoded novelty gifts data
onehot = pd.read_csv('datasets/online_retail_onehot.csv')

# Apply Apriori algorithm
frequent_itemsets = apriori(onehot, 
                            use_colnames=True, 
                            min_support=0.0001)
# Compute association rules
rules = association_rules(frequent_itemsets,
                          metric = "support", 
                          min_threshold = 0.0)

The importance of pruning

# Print the rules.
print(rules)
                               antecedents  ... conviction
0      (CARDHOLDER GINGHAM CHRISTMAS TREE)  ...      inf
...
79505      (SET OF 3 HEART COOKIE CUTTERS)  ... 1.998496
# Print the frequent itemsets.
print(frequent_itemsets)
       support                                           itemsets
0     0.000752                   ( 50'S CHRISTMAS GIFT BAG LARGE)
...
4707  0.000752                  (PIZZA PLATE IN BOX, CHRISTMAS ...
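
The growth in rule counts can be estimated with a short calculation. The sketch below gives an upper bound only, under the assumption that no pruning is applied and every itemset of two or more items is retained; the function name max_rule_count is hypothetical.

```python
from math import comb

def max_rule_count(n_items):
    """Upper bound on rules when every itemset of 2+ items is kept."""
    # Each k-item itemset splits into 2**k - 2 antecedent/consequent pairs.
    return sum(comb(n_items, k) * (2**k - 2) for k in range(2, n_items + 1))

for n in (3, 5, 10):
    print(n, max_rule_count(n))
```

With only 10 distinct items the bound already reaches 57,002 rules, so a support threshold or another metric threshold is needed to keep the rule set manageable.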

The importance of pruning

# Compute association rules
rules = association_rules(frequent_itemsets,
                          metric = "support", 
                          min_threshold = 0.001)

# Print the rules.
print(rules)
                   antecedents  ...  conviction
0  (BIRTHDAY CARD, RETRO SPOT)  ...    2.977444
1    (JUMBO BAG RED RETROSPOT)  ...    1.247180

Exploring the set of rules

print(rules.columns)
Index(['antecedents', 'consequents', 'antecedent support',
       'consequent support', 'support', 'confidence', 'lift', 'leverage',
       'conviction'],
      dtype='object')
print(rules[['antecedents','consequents']])
                   antecedents                  consequents
0    (JUMBO BAG RED RETROSPOT)  (BIRTHDAY CARD, RETRO SPOT)
1  (BIRTHDAY CARD, RETRO SPOT)    (JUMBO BAG RED RETROSPOT)

Pruning with other metrics

# Compute association rules
rules = association_rules(frequent_itemsets,
                          metric = "antecedent support", 
                          min_threshold = 0.002)

# Print the number of rules.
print(len(rules))
3899

Let's practice!
