Metrics in market basket analysis

Market Basket Analysis in R

Christopher Bruffaerts

Statistician

Metrics for rule extraction

TID Transaction
1 {Bread, Butter, Cheese, Wine}
2 {Bread, Butter, Wine}
3 {Bread, Butter}
4 {Butter, Cheese, Wine}
5 {Butter, Cheese}
6 {Cheese, Wine}
7 {Butter, Wine}

Goal: extract association rules

Examples:

  • {Bread} $\rightarrow$ {Butter}
    • {Bread} = "antecedent"
    • {Butter} = "consequent"
  • {Butter, Cheese} $\rightarrow$ {Wine}

Metrics: support, confidence, lift, ...
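
The later slides pass an object named `trans` to `apriori()` without showing how it is built. As a minimal sketch, assuming the seven baskets above are available as a plain R list (the name `baskets` is illustrative and not from the course), the coercion to the arules `transactions` class could look like this:

```r
library(arules)

# The seven example transactions, one character vector per basket.
# `baskets` is a hypothetical name used only for this sketch.
baskets <- list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)

# Coerce the list to the sparse transactions format used by arules
trans <- as(baskets, "transactions")

# Relative frequency (support) of each single item
itemFrequency(trans)
```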


Support measure

Support: "popularity of an itemset"

  • supp(X) = fraction of transactions that contain itemset X.
  • supp(X $\cup$ Y) = fraction of transactions that contain both X and Y.

Examples:

  • supp({Bread}) = 3/7 ≈ 43%
  • supp({Bread} $\cup$ {Butter}) = 3/7 ≈ 43%
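
The support definition can be checked by hand in base R. The sketch below assumes the seven example transactions are stored in a list; `baskets` and `supp()` are hypothetical helper names for illustration, not part of arules:

```r
# The seven example transactions (hypothetical object name)
baskets <- list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)

# Fraction of baskets that contain every item in `itemset`
supp <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

supp("Bread", baskets)               # 3/7
supp(c("Bread", "Butter"), baskets)  # 3/7
supp("Butter", baskets)              # 6/7
```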

Confidence measure

Confidence: "how often the rule holds"

conf(X $\rightarrow$ Y) = supp(X $\cup$ Y) / supp(X)

Confidence is the fraction of transactions containing X that also contain Y.

Example:

X = {Bread}

Y = {Butter}

conf(X $\rightarrow$ Y) = $\frac{3/7}{3/7}$ = 100%
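
To make the confidence formula concrete, here is a small base R sketch on the same seven transactions. The names `baskets`, `supp()` and `conf()` are hypothetical helpers for illustration only. The last call shows that confidence is not symmetric: swapping antecedent and consequent changes the value.

```r
# The seven example transactions (hypothetical object name)
baskets <- list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)

# Fraction of baskets that contain every item in `itemset`
supp <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# conf(X -> Y) = supp(X u Y) / supp(X)
conf <- function(lhs, rhs, baskets) {
  supp(c(lhs, rhs), baskets) / supp(lhs, baskets)
}

conf("Bread", "Butter", baskets)  # 1
conf("Wine", "Butter", baskets)   # 4/5 = 0.8
conf("Butter", "Bread", baskets)  # 3/6 = 0.5
```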


Lift measure

Lift: "how strong is the association"

lift(X $\rightarrow$ Y) = $\dfrac{supp(X \cup Y)}{supp(X) \times supp(Y)}$

  • Lift > 1: X and Y are bought together more often than expected if they were independent
  • Lift < 1: X and Y are bought together less often than expected if they were independent

Example:

X = {Bread}; Y = {Butter}

lift(X $\rightarrow$ Y) = $\frac{3/7}{(3/7) \times (6/7)} = \frac{7}{6} \approx 1.17$
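
The lift calculation can be reproduced by hand in base R as well. This is a sketch under the same assumptions as the slides' example data; `baskets`, `supp()` and `lift()` are hypothetical helper names. It covers one rule with lift above 1 and one with lift below 1.

```r
# The seven example transactions (hypothetical object name)
baskets <- list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)

# Fraction of baskets that contain every item in `itemset`
supp <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# lift(X -> Y) = supp(X u Y) / (supp(X) * supp(Y))
lift <- function(lhs, rhs, baskets) {
  supp(c(lhs, rhs), baskets) / (supp(lhs, baskets) * supp(rhs, baskets))
}

lift("Bread", "Butter", baskets)  # 7/6, about 1.17: positive association
lift("Wine", "Butter", baskets)   # 14/15, about 0.93: slightly negative association
```

Unlike confidence, lift is symmetric: `lift("Butter", "Bread", baskets)` gives the same value as `lift("Bread", "Butter", baskets)`.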


The apriori function for frequent itemsets

library(arules)
# Frequent itemsets
supp.cw = apriori(trans, # the transactional dataset
                  # Parameter list
                  parameter=list(
                    # Minimum Support
                    supp=0.2,
                    # Minimum Confidence
                    conf=0.4,
                    # Minimum length
                    minlen=2,
                    # Target
                    target="frequent itemsets"),
                  # Appearance argument
                  appearance = list(
                    items = c("Cheese","Wine"))
                 )

The apriori function for rules

library(arules)
# Rules
rules.b.rhs = apriori(trans, # the transactional dataset
                  # Parameter list
                  parameter=list(
                    # Minimum Support
                    supp=0.2,
                    # Minimum Confidence
                    conf=0.4,
                    # Minimum length
                    minlen=2,
                    # Target
                    target="rules"),
                  # Appearance argument
                   appearance = list(
                     rhs = "Butter",
                    default = "lhs")
                 )

Frequent itemsets with apriori

Retrieve the frequent itemsets

supp.all = apriori(trans, 
           parameter=list(supp=3/7,
           target="frequent itemsets"))        
inspect(head(sort(supp.all,by="support"),3))
    items          support   count
[1] {Butter}       0.8571429 6    
[2] {Wine}         0.7142857 5    
[3] {Cheese}       0.5714286 4

Inspect confidence and lift measures

Retrieve the rules

# Rules with "Butter" on the rhs
rules.b.rhs = apriori(trans, 
                 parameter=list(
                    minlen=2,
                    target="rules"), 
                 appearance = list(
                     rhs="Butter",
                    default = "lhs")
           )
inspect(head(sort(rules.b.rhs, by="lift"), 5))
    lhs                    rhs      support  confidence lift  count
[1] {Bread}             => {Butter} 0.42      1.0        1.16     3    
[2] {Bread,Cheese}      => {Butter} 0.14      1.0        1.16     1    
[3] {Bread,Wine}        => {Butter} 0.28      1.0        1.16     2    
[4] {Bread,Cheese,Wine} => {Butter} 0.14      1.0        1.16     1    
[5] {Wine}              => {Butter} 0.57      0.8        0.93     4
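
To see the same metrics at full precision instead of the printed table, one option is to pull them out with `quality()`. The sketch below is self-contained and therefore rebuilds the example data; the names `baskets`, `rules` and `metrics` are assumptions for illustration, and the `apriori()` call mirrors the one on the slide with its default support and confidence thresholds.

```r
library(arules)

# The seven example transactions (hypothetical object name)
baskets <- list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)
trans <- as(baskets, "transactions")

# Rules with "Butter" on the right-hand side, as on the slide
rules <- apriori(trans,
                 parameter = list(minlen = 2, target = "rules"),
                 appearance = list(rhs = "Butter", default = "lhs"),
                 control = list(verbose = FALSE))

# One row per rule, with the quality measures as plain numeric columns
metrics <- data.frame(rule = labels(rules), quality(rules))
metrics[order(-metrics$lift), c("rule", "support", "confidence", "lift")]
```

The lift of {Bread} => {Butter} is 7/6 = 1.1667 at full precision, so the 1.16 shown in the table above appears to be a shortened display of that value.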

Let's practice!
