Market Basket Analysis in Python
Isaiah Hull
Visiting Associate Professor of Finance, BI Norwegian Business School
import pandas as pd
# Load transactions with pandas.
books = pd.read_csv("datasets/bookstore.csv")
# Split each comma-separated transaction string into a list of items.
transactions = books['Transaction'].apply(lambda t: t.split(','))
# Convert the Series of lists into a list of lists.
transactions = list(transactions)
print(transactions[:5])
[['language', 'travel', 'humor', 'fiction'],
['humor', 'language'],
['humor', 'biography', 'cooking'],
['cooking', 'language'],
['travel']]
Association rule
Multi-antecedent rule
Multi-consequent rule
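The three rule types named above differ only in how many items sit on each side of the arrow. The following is a minimal sketch, assuming a rule is stored as an (antecedent, consequent) pair of frozensets. The `Rule` tuple, the `describe` helper, and the example rules are illustrative and are not taken from the slides.

```python
from typing import FrozenSet, NamedTuple


class Rule(NamedTuple):
    """Hypothetical container for a rule: antecedent -> consequent."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]


def describe(rule: Rule) -> str:
    """Label a rule by the number of items on each side."""
    many_antecedents = len(rule.antecedent) > 1
    many_consequents = len(rule.consequent) > 1
    if many_antecedents and many_consequents:
        return "multi-antecedent, multi-consequent rule"
    if many_antecedents:
        return "multi-antecedent rule"
    if many_consequents:
        return "multi-consequent rule"
    return "simple association rule"


# Illustrative rules built from the bookstore categories.
examples = [
    Rule(frozenset({"fiction"}), frozenset({"poetry"})),
    Rule(frozenset({"humor", "travel"}), frozenset({"language"})),
    Rule(frozenset({"biography"}), frozenset({"history", "language"})),
]

labels = [describe(rule) for rule in examples]
for rule, label in zip(examples, labels):
    print(sorted(rule.antecedent), "->", sorted(rule.consequent), ":", label)
```

Frozensets are used here because item order within an antecedent or consequent does not matter, and because they are hashable.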
Finding useful rules is difficult.
What if we restrict ourselves to simple rules?
Fiction Rules | Poetry Rules | ... | Humor Rules |
---|---|---|---|
fiction->poetry | poetry->fiction | ... | humor->fiction |
fiction->history | poetry->history | ... | humor->history |
fiction->biography | poetry->biography | ... | humor->biography |
fiction->cooking | poetry->cooking | ... | humor->cooking |
... | ... | ... | ... |
fiction->humor | poetry->humor | ... |
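To see how much the restriction to simple rules shrinks the search space, the sketch below counts both kinds of rules for a given number of items. It assumes nine distinct categories, which is what the rule count of 72 reported later implies. The function names are illustrative.

```python
def count_simple_rules(n_items: int) -> int:
    """One antecedent item and one different consequent item."""
    return n_items * (n_items - 1)


def count_all_rules(n_items: int) -> int:
    """Rules with non-empty, disjoint antecedent and consequent.

    Each item is in the antecedent, in the consequent, or in neither
    (3 ** n assignments). Remove the assignments with an empty
    antecedent (2 ** n) or an empty consequent (2 ** n), then add back
    the one assignment counted twice, where both are empty.
    """
    return 3 ** n_items - 2 ** (n_items + 1) + 1


n_items = 9  # assumption: number of distinct book categories
simple = count_simple_rules(n_items)
everything = count_all_rules(n_items)
print(f"simple rules: {simple}, all possible rules: {everything}")
```

With nine items this gives 72 simple rules against 18,660 rules of any size, and the gap widens quickly as items are added.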
from itertools import permutations
# Extract unique items.
flattened = [item for transaction in transactions for item in transaction]
items = list(set(flattened))
# Compute and print rules.
rules = list(permutations(items, 2))
print(rules)
[('fiction', 'poetry'),
('fiction', 'history'),
...
('humor', 'travel'),
('humor', 'language')]
# Print the number of rules
print(len(rules))
72
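The `apriori` call that follows takes a DataFrame named `onehot`, which these slides do not construct. The sketch below shows one way such a table could be built with pandas alone, assuming `onehot` holds one row per transaction and one boolean column per item. It uses only the five transactions printed earlier, so it is a toy input and not the full bookstore dataset. mlxtend also provides a `TransactionEncoder` class that serves the same purpose.

```python
import pandas as pd

# The five sample transactions printed earlier (toy data).
transactions = [
    ['language', 'travel', 'humor', 'fiction'],
    ['humor', 'language'],
    ['humor', 'biography', 'cooking'],
    ['cooking', 'language'],
    ['travel'],
]

# Sorted list of the distinct items seen in the sample.
items = sorted({item for transaction in transactions for item in transaction})

# One row per transaction, one boolean column per item.
onehot = pd.DataFrame(
    [{item: item in transaction for item in items} for transaction in transactions],
    columns=items,
)

# The column mean is the share of transactions containing each item,
# which is that item's support.
print(onehot.mean())
```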
# Import the Apriori and association rules functions
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import apriori
# Compute frequent itemsets using the Apriori algorithm
# (onehot is assumed to be a one-hot encoded DataFrame of the transactions)
frequent_itemsets = apriori(onehot, min_support = 0.001,
                            max_len = 2, use_colnames = True)
# Compute all association rules for frequent_itemsets
rules = association_rules(frequent_itemsets,
                          metric = "lift",
                          min_threshold = 1.0)
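The call above keeps rules whose lift is at least 1.0. As a rough illustration of what that metric measures, the sketch below computes support, confidence, and lift by hand for a single rule, using the standard definition lift = support(A and C) / (support(A) * support(C)). It runs on the five sample transactions printed earlier, not the full dataset, and the `support` helper and the choice of rule are illustrative.

```python
# The five sample transactions printed earlier (toy data).
transactions = [
    ['language', 'travel', 'humor', 'fiction'],
    ['humor', 'language'],
    ['humor', 'biography', 'cooking'],
    ['cooking', 'language'],
    ['travel'],
]


def support(itemset, transactions):
    """Share of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(itemset <= set(transaction) for transaction in transactions)
    return hits / len(transactions)


# Illustrative rule: humor -> language
antecedent, consequent = {'humor'}, {'language'}

joint = support(antecedent | consequent, transactions)
confidence = joint / support(antecedent, transactions)
lift = joint / (support(antecedent, transactions) * support(consequent, transactions))

print(f"support={joint:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two items appear together more often than they would if they were purchased independently, which is why 1.0 is a natural threshold.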