Identifying association rules

Market Basket Analysis in Python

Isaiah Hull

Visiting Associate Professor of Finance, BI Norwegian Business School

Loading and preparing data

import pandas as pd

# Load transactions from a CSV file with pandas.
books = pd.read_csv("datasets/bookstore.csv")
# Split transaction strings into lists.
transactions = books['Transaction'].apply(lambda t: t.split(','))
# Convert the Series of lists into a list of lists.
transactions = list(transactions)

Exploring the data

print(transactions[:5])
[['language', 'travel', 'humor', 'fiction'],
 ['humor', 'language'],
 ['humor', 'biography', 'cooking'],
 ['cooking', 'language'],
 ['travel']]

Association rules

  • Association rule

    • Contains antecedent and consequent
      • {health} $\rightarrow$ {cooking}
  • Multi-antecedent rule

    • {humor, travel} $\rightarrow$ {language}
  • Multi-consequent rule

    • {biography} $\rightarrow$ {history, language}
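
One way to hold such rules in code is as a pair of sets. The sketch below is only an illustration: the tuple-of-frozensets layout and the `describe` helper are hypothetical names introduced here, not part of the course material.

```python
# Hypothetical representation: a rule is an (antecedent, consequent) pair of frozensets.
example_rules = [
    (frozenset({"health"}), frozenset({"cooking"})),
    (frozenset({"humor", "travel"}), frozenset({"language"})),
    (frozenset({"biography"}), frozenset({"history", "language"})),
]


def describe(rule):
    """Label a rule by the size of its antecedent and consequent."""
    antecedent, consequent = rule
    # A rule with several items on both sides is reported as multi-antecedent.
    if len(antecedent) > 1:
        return "multi-antecedent"
    if len(consequent) > 1:
        return "multi-consequent"
    return "simple"


for rule in example_rules:
    antecedent, consequent = rule
    print(sorted(antecedent), "->", sorted(consequent), ":", describe(rule))
```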

Difficulty of selecting rules

  • Finding useful rules is difficult.

    • Set of all possible rules is large.
    • Most rules are not useful.
    • Must discard most rules.
  • What if we restrict ourselves to simple rules?

    • One antecedent and one consequent.
    • Still challenging, even for small dataset.
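
To get a feel for how large the full set of rules is, the sketch below counts them for a given number of items. It assumes a rule is any pair of non-empty, non-overlapping item sets (antecedent and consequent); the function name `count_all_rules` is introduced here for illustration.

```python
from math import comb


def count_all_rules(n_items):
    """Count rules with a non-empty antecedent and a non-empty, disjoint consequent."""
    total = 0
    for antecedent_size in range(1, n_items):
        remaining = n_items - antecedent_size
        # Ways to choose the antecedent, times the non-empty subsets
        # of the remaining items that can serve as the consequent.
        total += comb(n_items, antecedent_size) * (2 ** remaining - 1)
    return total


# With the 9 book categories in this dataset:
print(count_all_rules(9))  # 18660
```

Under the same assumption the count has the closed form 3^n - 2^(n+1) + 1, so it grows exponentially with the number of items.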

Generating the rules

  • fiction
  • poetry
  • history
  • biography
  • cooking
  • health
  • travel
  • language
  • humor

Generating the rules

Fiction Rules        Poetry Rules         ...   Humor Rules
fiction->poetry      poetry->fiction      ...   humor->fiction
fiction->history     poetry->history      ...   humor->history
fiction->biography   poetry->biography    ...   humor->biography
fiction->cooking     poetry->cooking      ...   humor->cooking
...                  ...                  ...   ...
fiction->humor       poetry->humor        ...

Generating rules with itertools

from itertools import permutations

# Extract unique items.
flattened = [item for transaction in transactions for item in transaction]
items = list(set(flattened))
# Compute and print rules.
rules = list(permutations(items, 2))
print(rules)
[('fiction', 'poetry'), 
 ('fiction', 'history'),
 ...
 ('humor', 'travel'), 
 ('humor', 'language')]

Counting the rules

# Print the number of rules
print(len(rules))
72

The plot shows the total number of rules as a function of the number of unique items.
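
The plot is not reproduced here. For the one-antecedent, one-consequent rules generated above, the relationship can be tabulated directly: n unique items give n × (n − 1) ordered pairs. The sketch below is an illustration only, and `simple_rule_count` is a name introduced for it.

```python
from itertools import permutations


def simple_rule_count(n_items):
    """Count one-antecedent, one-consequent rules over n_items unique items."""
    return len(list(permutations(range(n_items), 2)))


# Tabulate the rule count for 2 to 10 unique items.
counts = {n: simple_rule_count(n) for n in range(2, 11)}
for n, count in counts.items():
    print(n, count)
```

For the 9 categories in the bookstore data this reproduces the 72 rules counted above.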


Looking ahead

# Import the apriori and association_rules functions
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import apriori

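# Compute frequent itemsets using the Apriori algorithm.
# `onehot` is assumed to be a one-hot encoded DataFrame of the
# transactions; it is not defined in this section.
frequent_itemsets = apriori(onehot, min_support=0.001,
                            max_len=2, use_colnames=True)

# Compute association rules from frequent_itemsets,
# keeping those with a lift of at least 1.0
rules = association_rules(frequent_itemsets,
                          metric="lift",
                          min_threshold=1.0)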
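
The `onehot` input used above is not built in this section. As a rough sketch of what a one-hot encoding of the transactions looks like, the plain-Python example below uses the five transactions printed earlier; the name `onehot_rows` and the list-of-dicts layout are assumptions made for illustration.

```python
# The first five transactions printed earlier in this section.
transactions = [
    ['language', 'travel', 'humor', 'fiction'],
    ['humor', 'language'],
    ['humor', 'biography', 'cooking'],
    ['cooking', 'language'],
    ['travel'],
]

# Unique items, sorted so the column order is reproducible.
items = sorted({item for transaction in transactions for item in transaction})

# One row per transaction, one boolean per item.
onehot_rows = [
    {item: item in transaction for item in items}
    for transaction in transactions
]

print(items)
print(onehot_rows[1])
```

Passing rows like these to `pd.DataFrame` would give a boolean table with one column per item, which is the kind of input `apriori` works on.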

Let's practice!
