Metrics in market basket analysis

Market Basket Analysis in R

Christopher Bruffaerts

Statistician

Metrics used for rule extraction

TID Transaction
1 {Bread, Butter, Cheese, Wine}
2 {Bread, Butter, Wine}
3 {Bread, Butter}
4 {Butter, Cheese, Wine}
5 {Butter, Cheese}
6 {Cheese, Wine}
7 {Butter, Wine}

Goal: Extract association rules

Examples:

  • {Bread} $\rightarrow$ {Butter}
    • Bread = "Antecedent"
    • Butter = "Consequent"
  • {Butter, Cheese} $\rightarrow$ {Wine}

Metrics: support, confidence, lift, ...

Support measure

Support: "popularity of an itemset"

  • supp(X) = fraction of transactions that contain itemset X.
  • supp(X $\cup$ Y) = fraction of transactions that contain every item of both X and Y.

Examples:

  • supp({Bread}) = 3/7 ≈ 43%
  • supp({Bread} $\cup$ {Butter}) = 3/7 ≈ 43%
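To make the definition concrete, here is a minimal base-R sketch (no arules needed) that stores the seven transactions as a list of character vectors and computes support directly from the definition. The names `baskets` and `supp` are illustrative only, not part of any package.

```r
# The seven example transactions (TID 1-7), one character vector per basket
baskets = list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)

# supp(X): fraction of transactions that contain every item of itemset X
supp = function(itemset, data = baskets) {
  contains = vapply(data, function(t) all(itemset %in% t), logical(1))
  sum(contains) / length(data)
}

supp("Bread")               # 3/7, about 0.43
supp(c("Bread", "Butter"))  # X union Y = {Bread, Butter}: also 3/7
```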

Confidence measure

Confidence: "how often the rule is true"

conf(X $\rightarrow$ Y) = supp(X $\cup$ Y) / supp(X)

Confidence is the fraction of transactions containing X that also contain Y.

Example:

X = {Bread}

Y = {Butter}

conf(X $\rightarrow$ Y) = $\frac{3/7}{3/7}$ = 100%
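The formula can be checked with a small base-R sketch. The helpers `supp` and `conf` below are hypothetical names for illustration; the second call shows that confidence is not symmetric.

```r
baskets = list(
  c("Bread", "Butter", "Cheese", "Wine"), c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"), c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"), c("Cheese", "Wine"), c("Butter", "Wine")
)

# Support: share of baskets containing all items of the itemset
supp = function(itemset, data = baskets) {
  sum(vapply(data, function(t) all(itemset %in% t), logical(1))) / length(data)
}

# conf(X -> Y) = supp(X union Y) / supp(X)
conf = function(x, y, data = baskets) {
  supp(union(x, y), data) / supp(x, data)
}

conf("Bread", "Butter")  # 1: every basket with Bread also has Butter
conf("Butter", "Bread")  # 0.5: only 3 of the 6 Butter baskets have Bread
```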

Lift measure

Lift: "how strong is the association"

lift(X $\rightarrow$ Y) = $\dfrac{supp(X \cup Y)}{supp(X) \times supp(Y)}$

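  • Lift > 1: X and Y are bought together more often than expected if they were independent
  • Lift = 1: X and Y are independent
  • Lift < 1: X and Y are bought together less often than expected if they were independent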

Example:

X = {Bread}; Y = {Butter}

lift(X $\rightarrow$ Y) = $\frac{3/7}{(3/7) \times (6/7)} = \frac{7}{6} \approx 1.17$
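A base-R sketch of the lift formula follows; `supp` and `lift` are illustrative helper names, not the arules API. The second rule, {Wine} $\rightarrow$ {Butter}, gives an example with lift below 1.

```r
baskets = list(
  c("Bread", "Butter", "Cheese", "Wine"), c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"), c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"), c("Cheese", "Wine"), c("Butter", "Wine")
)

supp = function(itemset, data = baskets) {
  sum(vapply(data, function(t) all(itemset %in% t), logical(1))) / length(data)
}

# lift(X -> Y) = supp(X union Y) / (supp(X) * supp(Y))
lift = function(x, y, data = baskets) {
  supp(union(x, y), data) / (supp(x, data) * supp(y, data))
}

lift("Bread", "Butter")  # 7/6, about 1.17 (> 1)
lift("Wine", "Butter")   # (4/7) / ((5/7) * (6/7)) = 14/15, about 0.93 (< 1)
```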

The apriori function for frequent itemsets

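library(arules)
# Example transactions from the table above, coerced to an arules "transactions" object
trans = as(list(c("Bread", "Butter", "Cheese", "Wine"),
                c("Bread", "Butter", "Wine"),
                c("Bread", "Butter"),
                c("Butter", "Cheese", "Wine"),
                c("Butter", "Cheese"),
                c("Cheese", "Wine"),
                c("Butter", "Wine")),
           "transactions")
# Frequent itemsets
supp.cw = apriori(trans, # the transactional dataset
                  # Parameter list
                  parameter = list(
                    # Minimum support
                    supp = 0.2,
                    # Minimum confidence (not used when mining itemsets)
                    conf = 0.4,
                    # Minimum length
                    minlen = 2,
                    # Target
                    target = "frequent itemsets"),
                  # Appearance argument: only itemsets built from these items
                  appearance = list(
                    items = c("Cheese", "Wine"))
                 )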
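As a cross-check on what `supp.cw` should contain, the sketch below applies the same thresholds by brute-force enumeration over the allowed items. This is a swapped-in technique, not the Apriori algorithm, and `frequent_itemsets` is a hypothetical helper. On the seven baskets only {Cheese, Wine} qualifies, with support 3/7.

```r
baskets = list(
  c("Bread", "Butter", "Cheese", "Wine"), c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"), c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"), c("Cheese", "Wine"), c("Butter", "Wine")
)

supp = function(itemset, data = baskets) {
  sum(vapply(data, function(t) all(itemset %in% t), logical(1))) / length(data)
}

# Check every itemset of size minlen..length(items) drawn from `items`
# and keep those whose support reaches minsupp (brute force, not Apriori)
frequent_itemsets = function(items, minsupp, minlen = 1, data = baskets) {
  out = list()
  if (minlen > length(items)) return(out)
  for (k in seq(minlen, length(items))) {
    for (set in combn(items, k, simplify = FALSE)) {
      s = supp(set, data)
      if (s >= minsupp) out[[length(out) + 1]] = list(items = set, support = s)
    }
  }
  out
}

res = frequent_itemsets(c("Cheese", "Wine"), minsupp = 0.2, minlen = 2)
res
```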

The apriori function for rules

library(arules)
# Rules
rules.b.rhs = apriori(trans, # the transactional dataset
                      # Parameter list
                      parameter = list(
                        # Minimum support
                        supp = 0.2,
                        # Minimum confidence
                        conf = 0.4,
                        # Minimum length (items in lhs + rhs)
                        minlen = 2,
                        # Target
                        target = "rules"),
                      # Appearance argument: "Butter" only on the rhs,
                      # every other item only on the lhs
                      appearance = list(
                        rhs = "Butter",
                        default = "lhs")
                      )
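The same constraints (rhs = {Butter}, non-empty lhs, support ≥ 0.2, confidence ≥ 0.4) can be applied by brute force in base R to see which rules qualify on the seven baskets. This is an illustrative enumeration, not how `apriori` works internally, and `rules_with_rhs` is a hypothetical helper. It finds five rules: {Bread}, {Cheese}, {Wine}, {Bread, Wine} and {Cheese, Wine} on the lhs.

```r
baskets = list(
  c("Bread", "Butter", "Cheese", "Wine"), c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"), c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"), c("Cheese", "Wine"), c("Butter", "Wine")
)

supp = function(itemset, data = baskets) {
  sum(vapply(data, function(t) all(itemset %in% t), logical(1))) / length(data)
}

# Every rule lhs -> {rhs}, where lhs is a non-empty subset of the other items
rules_with_rhs = function(rhs, minsupp, minconf, data = baskets) {
  others = setdiff(sort(unique(unlist(data))), rhs)
  rows = list()
  for (k in seq_along(others)) {
    for (lhs in combn(others, k, simplify = FALSE)) {
      s = supp(c(lhs, rhs), data)
      if (s == 0 || s < minsupp) next      # support filter
      cf = s / supp(lhs, data)
      if (cf < minconf) next               # confidence filter
      rows[[length(rows) + 1]] = data.frame(
        lhs = paste0("{", paste(lhs, collapse = ","), "}"),
        rhs = paste0("{", rhs, "}"),
        support = s, confidence = cf, lift = cf / supp(rhs, data),
        stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, rows)
}

rules = rules_with_rhs("Butter", minsupp = 0.2, minconf = 0.4)
rules
```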

Frequent itemsets with the apriori

Retrieve the frequent itemsets

supp.all = apriori(trans,
                   parameter = list(supp = 3/7,
                                    target = "frequent itemsets"))
inspect(head(sort(supp.all, by = "support"), 3))
    items          support   count
[1] {Butter}       0.8571429 6    
[2] {Wine}         0.7142857 5    
[3] {Cheese}       0.5714286 4
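To see the full set behind this top-3 listing, the base-R sketch below checks every non-empty itemset against the threshold supp = 3/7 by brute force (an illustration, not Apriori). It finds 8 frequent itemsets, and {Cheese} ties with {Butter, Wine} at 4/7, so which of the two appears third depends on how ties are ordered.

```r
baskets = list(
  c("Bread", "Butter", "Cheese", "Wine"), c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"), c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"), c("Cheese", "Wine"), c("Butter", "Wine")
)

supp = function(itemset, data = baskets) {
  sum(vapply(data, function(t) all(itemset %in% t), logical(1))) / length(data)
}

# Brute force: test every non-empty itemset over the four items
items = sort(unique(unlist(baskets)))
found = data.frame(items = character(0), support = numeric(0),
                   stringsAsFactors = FALSE)
for (k in seq_along(items)) {
  for (set in combn(items, k, simplify = FALSE)) {
    s = supp(set)
    if (s >= 3/7) {
      found = rbind(found, data.frame(
        items = paste0("{", paste(set, collapse = ","), "}"),
        support = s, stringsAsFactors = FALSE))
    }
  }
}

# Sort by decreasing support; order() leaves ties in enumeration order
found = found[order(-found$support), ]
found
```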

Inspect confidence and lift measures

Retrieve the rules

# Rules with "Butter" on the rhs (default thresholds: supp = 0.1, conf = 0.8)
rules.b.rhs = apriori(trans,
                      parameter = list(
                        minlen = 2,
                        target = "rules"),
                      appearance = list(
                        rhs = "Butter",
                        default = "lhs")
                      )
inspect(head(sort(rules.b.rhs, by = "lift"), 5))
    lhs                    rhs      support confidence lift count
[1] {Bread}             => {Butter} 0.43    1.0        1.17 3
[2] {Bread,Cheese}      => {Butter} 0.14    1.0        1.17 1
[3] {Bread,Wine}        => {Butter} 0.29    1.0        1.17 2
[4] {Bread,Cheese,Wine} => {Butter} 0.14    1.0        1.17 1
[5] {Wine}              => {Butter} 0.57    0.8        0.93 4
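The rounded values in this table can be recomputed from the definitions with a short base-R sketch; `supp` and `lhs.sets` are illustrative names, and the five lhs itemsets are taken from the output listing.

```r
baskets = list(
  c("Bread", "Butter", "Cheese", "Wine"), c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"), c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"), c("Cheese", "Wine"), c("Butter", "Wine")
)

supp = function(itemset, data = baskets) {
  sum(vapply(data, function(t) all(itemset %in% t), logical(1))) / length(data)
}

# The five lhs itemsets from the output, each paired with rhs {Butter}
lhs.sets = list("Bread", c("Bread", "Cheese"), c("Bread", "Wine"),
                c("Bread", "Cheese", "Wine"), "Wine")

# One row per rule: support, confidence and lift
metrics = t(sapply(lhs.sets, function(x) {
  s = supp(c(x, "Butter"))
  cf = s / supp(x)
  c(support = s, confidence = cf, lift = cf / supp("Butter"))
}))
round(metrics, 2)
```

Rounded to two decimals, the supports are 3/7, 1/7, 2/7, 1/7 and 4/7, and the lifts are 7/6 for the four Bread rules and 14/15 for {Wine} $\rightarrow$ {Butter}.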

Let's practice!
