Advanced Apriori results pruning

Market Basket Analysis in Python

Isaiah Hull

Visiting Associate Professor of Finance, BI Norwegian Business School

Applications

Cross-Promotion

This image shows an example of cross-promotion.

Aggregation

This image shows an example of aggregation.

The Apriori algorithm

List of Lists

This image shows an example of a list of lists of the items in transactions.

One-Hot Encoding

This image shows the one-hot encoding of transactions.
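
As a minimal sketch of what one-hot encoding does, the pure-Python snippet below turns a list of lists into one boolean row per transaction, with one column per distinct item. The toy transactions and the helper name `one_hot_encode` are illustrative assumptions, not part of the course data; in practice `TransactionEncoder` from mlxtend does this job.

```python
def one_hot_encode(transactions):
    """Return (columns, rows): sorted item names and one boolean row per transaction."""
    # Collect every distinct item and fix a column order
    columns = sorted({item for transaction in transactions for item in transaction})
    # A cell is True when the transaction contains the column's item
    rows = [[item in transaction for item in columns] for transaction in transactions]
    return columns, rows

# Hypothetical toy transactions
transactions = [['bag', 'box'], ['candle'], ['bag', 'candle', 'sign']]
columns, rows = one_hot_encode(transactions)
print(columns)
for row in rows:
    print(row)
```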

Apriori Algorithm

This image illustrates the Apriori algorithm being applied to a generic set of 4 items.
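
The level-wise search with pruning can be sketched in a few lines of pure Python. This is a simplified illustration under stated assumptions, not mlxtend's implementation: the function name `apriori_sketch` and the toy transactions are hypothetical, and support is recomputed naively on every call.

```python
from itertools import combinations

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Share of transactions that contain every item in the itemset
        return sum(itemset <= basket for basket in baskets) / n

    # Level 1: frequent single items
    items = {item for basket in baskets for item in basket}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent

# Hypothetical toy transactions over 4 generic items
transactions = [['A', 'B', 'C'], ['A', 'B'], ['A', 'C'], ['B', 'D']]
frequent = apriori_sketch(transactions, min_support=0.5)
for itemset, value in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), value)
```

In this toy example the candidate {A, B, C} is never counted against the data, because its subset {B, C} is not frequent.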

The Apriori algorithm

import pandas as pd
import numpy as np
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# allow_pickle=True is needed if the file stores Python lists as objects
itemsets = np.load('itemsets.npy', allow_pickle=True)
print(itemsets)

[['EASTER CRAFT 4 CHICKS'],
['CERAMIC CAKE DESIGN SPOTTED MUG', 'CHARLOTTE BAG APPLES DESIGN'],
['SET 12 COLOUR PENCILS DOLLY GIRL'],
...
['JUMBO BAG RED RETROSPOT', ... 'LIPSTICK PEN FUSCHIA']]

The Apriori algorithm

# One-hot encode data
encoder = TransactionEncoder()
onehot = encoder.fit(itemsets).transform(itemsets)
onehot = pd.DataFrame(onehot, columns = encoder.columns_)

# Apply Apriori algorithm and print
frequent_itemsets = apriori(onehot, use_colnames=True, min_support=0.001)
print(frequent_itemsets)

      support                                           itemsets
0    0.001504                               ( DOLLY GIRL BEAKER)
1    0.002256                         ( RED SPOT GIFT BAG LARGE)
...
428  0.001504  (BIRTHDAY CARD, RETRO SPOT, JUMBO BAG RED RETR...

Apriori algorithm results

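from mlxtend.frequent_patterns import association_rules

# Number of distinct items (one-hot columns)
print(len(onehot.columns))
# Number of frequent itemsets
print(len(frequent_itemsets))
# Compute association rules (default metric and threshold)
rules = association_rules(frequent_itemsets)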

Association rules

print(rules['consequents'])

0                   (DOTCOM POSTAGE)
                        ... 
9                 (HERB MARKER THYME)
                        ...
234        (JUMBO BAG RED RETROSPOT)
235         (WOODLAND CHARLOTTE BAG)
236    (RED RETROSPOT CHARLOTTE BAG)
237       (STRAWBERRY CHARLOTTE BAG)
238      (CHARLOTTE BAG SUKI DESIGN)
Name: consequents, Length: 239, dtype: object

Filtering with multiple metrics

targeted_rules = rules[rules['consequents'] == {'HERB MARKER THYME'}].copy()

filtered_rules = targeted_rules[(targeted_rules['antecedent support'] > 0.01) &
                        (targeted_rules['support'] > 0.009) &
                        (targeted_rules['confidence'] > 0.85) &
                        (targeted_rules['lift'] > 1.00)]

print(filtered_rules['antecedents'])

9        (HERB MARKER BASIL)
25     (HERB MARKER PARSLEY)
27    (HERB MARKER ROSEMARY)
Name: antecedents, dtype: object
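
To make the four filter columns concrete, here is a small sketch that computes antecedent support, support, confidence, and lift for a single rule directly from transactions. The helper name `rule_metrics` and the toy baskets are assumptions for illustration; the sketch also assumes the antecedent and consequent each appear in at least one transaction, so no division by zero occurs.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Compute standard metrics for the rule antecedent -> consequent."""
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    a, c = set(antecedent), set(consequent)

    support_a = sum(a <= b for b in baskets) / n          # P(A)
    support_c = sum(c <= b for b in baskets) / n          # P(C)
    support_ac = sum((a | c) <= b for b in baskets) / n   # P(A and C)

    confidence = support_ac / support_a                   # P(C | A)
    lift = confidence / support_c                         # > 1 suggests association
    return {'antecedent support': support_a, 'support': support_ac,
            'confidence': confidence, 'lift': lift}

# Hypothetical toy baskets
baskets = [['basil', 'thyme'], ['basil', 'thyme'], ['basil'], ['parsley']]
metrics = rule_metrics(baskets, ['basil'], ['thyme'])
print(metrics)
```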

Grouping products

The image shows a store floorplan where boxes are grouped with bags and signs are grouped with candles.

The image shows a store floorplan where boxes are grouped with candles and signs are grouped with bags.

The image shows a store floorplan where boxes are grouped with signs and candles are grouped with bags.

Aggregation and dissociation

# Load aggregated data
aggregated = pd.read_csv('datasets/online_retail_aggregated.csv')

# Compute frequent itemsets
onehot = encoder.fit(aggregated).transform(aggregated)
data = pd.DataFrame(onehot, columns = encoder.columns_)
frequent_itemsets = apriori(data, use_colnames=True)

# Compute standard metrics
rules = association_rules(frequent_itemsets)

# Compute Zhang's rule
rules['zhang'] = zhangs_rule(rules)
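
`zhangs_rule` is not defined in this section. As a hedged sketch, the function below computes Zhang's metric from three scalar supports using the standard definition; the name `zhang_metric` is hypothetical, and the sketch assumes the denominator is nonzero. Values range from -1 to 1, and negative values indicate dissociation.

```python
def zhang_metric(support_a, support_c, support_ac):
    """Zhang's metric for A -> C from the supports of A, C, and A-and-C."""
    # Numerator: how far joint support departs from independence
    numerator = support_ac - support_a * support_c
    # Denominator: normalizes the result into [-1, 1]
    denominator = max(support_ac * (1 - support_a),
                      support_a * (support_c - support_ac))
    return numerator / denominator

# Items bought together less often than independence predicts
dissociated = zhang_metric(support_a=0.5, support_c=0.5, support_ac=0.1)
# Items bought together more often than independence predicts
associated = zhang_metric(support_a=0.5, support_c=0.5, support_ac=0.4)
print(dissociated, associated)
```

A vectorized version over the `antecedent support`, `consequent support`, and `support` columns of `rules` would follow the same arithmetic.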

Zhang's rule

# Print rules that indicate dissociation
print(rules[rules['zhang'] < 0][['antecedents','consequents']])

  antecedents consequents
2       (bag)    (candle)
3    (candle)       (bag)
4      (sign)       (bag)
5       (bag)      (sign)

Selecting a floorplan

The image shows a store floorplan where boxes are grouped with bags and signs are grouped with candles.

Let's practice!
