The apriori algorithm

Market Basket Analysis in R

Christopher Bruffaerts

Statistician

Association rule mining

Association rule mining allows to discover interesting relationships between items in a large transactional database.

This mining task can be divided into two subtasks:

  • Frequent itemset generation: determine all frequent itemsets of a potentially large database of transactions. An itemset is said to be frequent if it satisfies a minimum support threshold.

  • Rule generation: from the above frequent itemsets, generate association rules with confidence above a minimum confidence threshold.

  The apriori algorithm is a classic and fast mining algorithm belonging to the class of association rule mining algorithms.

Market Basket Analysis in R

Idea behind the apriori algorithm

The apriori algorithm:

  • Bottom-up approach
  • Generates candidate itemsets by exploiting the apriori principle

Apriori principle:

  • If an itemset is frequent, then all of its subsets must also be frequent.
    • e.g. if {A,B} is frequent, then both {A} and {B} are frequent
  • For an infrequent itemset, all its super-sets are infrequent.
    • e.g. if {A} is infrequent, then {A,B}, {A,C} and {A,B,C} are infrequent.
1 Agrawal and Srikant (1994)
Market Basket Analysis in R

Example: 1-itemset

item_set_lattice2

TID Transaction
1 {A, B, C, D}
2 {A, B, D}
3 {A, B}
4 {B, C, D}
5 {B, C}
6 {C, D}
7 {B, D}
1 Minimum support threshold = 3/7 = 0.42
Market Basket Analysis in R

Example: 2-itemsets

item_set_lattice3

TID Transaction
1 {A, B, C, D}
2 {A, B, D}
3 {A, B}
4 {B, C, D}
5 {B, C}
6 {C, D}
7 {B, D}
1 Minimum support threshold = 3/7 = 0.42
Market Basket Analysis in R

Example: 3-itemsets

all_itemsets

TID Transaction
1 {A, B, C, D}
2 {A, B, D}
3 {A, B}
4 {B, C, D}
5 {B, C}
6 {C, D}
7 {B, D}
1 Minimum support threshold = 3/7 = 0.42
Market Basket Analysis in R

Example: frequent itemsets

item_set_lattice3

Itemset Count Support
{A} 3 0.42
{B} 6 0.85
{C} 4 0.57
{D} 5 0.71
{A,B} 3 0.42
{B,C} 3 0.42
{B,D} 4 0.57
{C,D} 3 0.42
1 Minimum support threshold = 3/7 = 0.42
Market Basket Analysis in R

Apriori: rule generation

After the computationally expensive frequent itemset generation, apriori generates rules:

  • Start with high-confidence rules with single precedent
    • e.g. {A,C} $\rightarrow$ {B}
  • Build more complex rules, with more items on the right hand side
    • e.g. {A,C} $\rightarrow$ {B, D}

 

Trick: pruning of association rule

e.g.: if the rule {B,C,D} $\rightarrow$ {A} has low confidence, all rules containing item A in its consequent can be discarded (such as the rule {B,D} $\rightarrow$ {A, C} or {D} $\rightarrow$ {A,B, C}).

Market Basket Analysis in R

A first try with the apriori

Transactional data

inspect(head(trans,2))
    items     transactionID
[1] {A,B,C,D} 1            
[2] {A,B,D}   2            

First call to the apriori function - frequent itemsets

support.all = apriori(trans, 
                      parameter = list(supp = 3/7, target="frequent itemsets"))
Market Basket Analysis in R

Output of the apriori - frequent itemsets

Frequent itemsets

inspect(support.all)
    items support   count
[1] {A}   0.4285714 3    
[2] {C}   0.5714286 4    
[3] {D}   0.7142857 5    
[4] {B}   0.8571429 6    
[5] {A,B} 0.4285714 3    
[6] {C,D} 0.4285714 3    
[7] {B,C} 0.4285714 3    
[8] {B,D} 0.5714286 4

item_set_lattice3

Market Basket Analysis in R

Extracting rules with the apriori function

Parameter: the mining parameters change the characteristics of the mined itemsets or rules.

  • Support = 3/7
  • Confidence = 60%
  • Minimum length of rule = 2

Call to the apriori function for rule generation with specific arguments

rules.all = apriori(trans,
                parameter = list(supp=3/7, conf=0.6, minlen=2),
                control = list(verbose=F)
                   )
Market Basket Analysis in R

Extracting rules: output

Inspecting the rules

inspect(rules.all)
    lhs    rhs support   confidence lift      count
[1] {A} => {B} 0.4285714 1.0000000  1.1666667 3    
[2] {C} => {D} 0.4285714 0.7500000  1.0500000 3    
[3] {D} => {C} 0.4285714 0.6000000  1.0500000 3    
[4] {C} => {B} 0.4285714 0.7500000  0.8750000 3    
[5] {D} => {B} 0.5714286 0.8000000  0.9333333 4    
[6] {B} => {D} 0.5714286 0.6666667  0.9333333 4
Market Basket Analysis in R

Let's practice!

Market Basket Analysis in R

Preparing Video For Download...