The Apriori algorithm

Market Basket Analysis in Python

Isaiah Hull

Visiting Associate Professor of Finance, BI Norwegian Business School

Counting itemsets

$${n \choose k} = \frac{n!}{(n-k)!k!}$$

Item Count   Itemset Size   Combinations
3461         0              1
3461         1              3,461
3461         2              5,987,530
3461         3              6,903,622,090
3461         4              5,968,181,296,805
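
The counts in the table can be reproduced with Python's standard library. This is a minimal sketch that assumes Python 3.8 or later (for `math.comb`); the names `n_items` and `combination_counts` are illustrative and not part of the course code.

```python
from math import comb

# Number of distinct items in the dataset (taken from the table)
n_items = 3461

# Number of possible itemsets of each size k = 0..4
combination_counts = {k: comb(n_items, k) for k in range(5)}

for k, count in combination_counts.items():
    print(f"{n_items}  {k}  {count:,}")
```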

Counting itemsets

$$\sum_{k=0}^{n}{n \choose k} = 2^{n}$$

  • $n = 3461 \rightarrow 2^{3461}$
  • $2^{3461} \gg 10^{82}$
  • Estimated number of atoms in the observable universe: roughly $10^{82}$.
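
Because Python integers have arbitrary precision, the comparison can be checked exactly. The sketch below is illustrative only; `total_itemsets` and `atoms_in_universe` are hypothetical names, and $10^{82}$ is the rough estimate quoted on the slide.

```python
n_items = 3461

# Total number of itemsets over n_items items, including the empty set
total_itemsets = 2 ** n_items

# Rough estimate of the number of atoms in the universe
atoms_in_universe = 10 ** 82

print(total_itemsets > atoms_in_universe)

# Number of decimal digits in 2**3461, to convey the scale
print(len(str(total_itemsets)))
```

The digit count shows that the total is not just larger than $10^{82}$ but larger by many orders of magnitude.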

Reducing the number of itemsets

  • Not possible to consider all itemsets.
    • Not even possible to enumerate them.
  • How do we remove an itemset without even evaluating it?
    • Could set maximum $k$ value.
  • Apriori algorithm offers alternative.
    • Doesn't require enumeration of all itemsets.
    • Sensible rule for pruning.
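
A quick calculation suggests why a cap on $k$ may not be enough by itself. The sketch below assumes a hypothetical cap of `max_k = 4` and counts every itemset with at most that many items.

```python
from math import comb

n_items = 3461
max_k = 4  # hypothetical cap on itemset size

# Number of itemsets containing between 0 and max_k items
capped_total = sum(comb(n_items, k) for k in range(max_k + 1))

print(f"{capped_total:,}")
```

Even with this cap, the candidate count stays in the trillions, which is why a pruning rule is still useful.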

The Apriori principle

  • Apriori principle.
    • Subsets of frequent sets are frequent; equivalently, supersets of infrequent sets are infrequent.
    • Retain sets known to be frequent.
    • Prune sets not known to be frequent.
  • {Candles} = Infrequent
    • -> {Candles, Signs} = Infrequent
  • {Candles, Signs} = Infrequent
    • -> {Candles, Signs, Boxes} = Infrequent
  • {Candles, Signs, Boxes} = Infrequent
    • -> {Candles, Signs, Boxes, Bags} = Infrequent
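
The pruning rule can be sketched as a level-wise search in plain Python. This is an illustrative toy, not the `mlxtend` implementation: the function `apriori_sketch`, the toy `transactions`, and the 0.4 threshold are all assumptions made for the example.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support, max_len=None):
    """Return {itemset: support} for all frequent itemsets (toy version)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Share of baskets that contain every item in the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: keep only the frequent single items
    items = sorted({item for b in baskets for item in b})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current and (max_len is None or k <= max_len):
        # Join: build size-k candidates from frequent (k-1)-itemsets
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune: drop candidates that have any infrequent (k-1)-subset,
        # without ever counting their support
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        # Count support only for the surviving candidates
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Toy baskets: Candles appears in only one of five transactions
transactions = [
    {"Signs", "Boxes", "Bags"},
    {"Signs", "Boxes"},
    {"Signs", "Bags"},
    {"Boxes", "Bags"},
    {"Candles", "Signs"},
]

frequent = apriori_sketch(transactions, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Because {Candles} falls below the threshold at the first level, no larger itemset containing Candles is ever generated or counted.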

Apriori implementation

# Import pandas and the Apriori algorithm
import pandas as pd
from mlxtend.frequent_patterns import apriori

# Load one-hot encoded novelty gifts data
onehot = pd.read_csv('datasets/online_retail_onehot.csv')

# Print the first five rows
print(onehot.head())
    50'S CHRISTMAS GIFT BAG LARGE ...  ZINC WILLIE WINKIE  CANDLE STICK  \
0                           False ...              False   
1                           False ...              False   
2                           False ...              False   
3                           False ...              False   
4                           False ...              False

Apriori implementation

# Compute frequent itemsets
frequent_itemsets = apriori(onehot, min_support=0.0005,
                            max_len=4, use_colnames=True)

# Print number of itemsets
print(len(frequent_itemsets))
3652

Apriori implementation

# Print itemsets
print(frequent_itemsets.head())
      support                          itemsets
0     0.000752  ( 50'S CHRISTMAS GIFT BAG LARGE)
1     0.001504              ( DOLLY GIRL BEAKER)
...
1500  0.000752  (PING MICROWAVE APRON, FOOD CONTAINER SET 3 LO...
1501  0.000752  (WOOD 2 DRAWER CABINET WHITE FINISH, FOOD CONT...
...

Let's practice!
