Market Basket Analysis in R
Christopher Bruffaerts
Statistician
TID | Transaction |
---|---|
1 | {Bread, Butter, Cheese, Wine} |
2 | {Bread, Butter, Wine} |
3 | {Bread, Butter} |
4 | {Butter, Cheese, Wine} |
5 | {Butter, Cheese} |
6 | {Cheese, Wine} |
7 | {Butter, Wine} |
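The `apriori()` calls later in these slides operate on a transactions object named `trans`, but the slides do not show how it is created. A minimal sketch of one way to build it, assuming the seven baskets above are typed in by hand (the name `baskets` and the use of the TIDs as list names are illustrative choices, not taken from the source):

```r
library(arules)

# The seven example baskets, one character vector per transaction
baskets <- list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)
names(baskets) <- as.character(1:7)  # assumed: reuse the TIDs as transaction IDs

# Coerce the list to the sparse "transactions" class that arules works with
trans <- as(baskets, "transactions")

inspect(trans)  # prints one row per transaction
```

Real data usually arrives as a file or a data frame rather than a hand-typed list, so this is only meant to make the small example reproducible.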
Goal: Extract association rules
Example: {Bread} $\rightarrow$ {Butter}
Metrics: support, confidence, lift, ...
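Support: "popularity of an itemset"
supp(X) = (number of transactions containing X) / (total number of transactions)
Examples:
supp({Bread}) = 3/7 ≈ 42.9%
supp({Butter}) = 6/7 ≈ 85.7%
supp({Bread, Butter}) = 3/7 ≈ 42.9%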
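Support can be checked by hand in base R, without any package. The sketch below assumes the baskets are stored as a plain list of character vectors; the names `baskets` and `supp` are illustrative helpers, not part of arules.

```r
# The seven example baskets as a plain list (no packages needed)
baskets <- list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)

# Support of an itemset: share of baskets that contain every item in it
supp <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

supp("Bread", baskets)               # 3 of the 7 baskets
supp("Butter", baskets)              # 6 of the 7 baskets
supp(c("Bread", "Butter"), baskets)  # 3 of the 7 baskets
```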
Confidence: "how often the rule is true"
conf(X $\rightarrow$ Y) = supp(X $\cup$ Y) / supp(X)
Confidence is the share of transactions containing X that also contain Y.
Example:
X = {Bread}
Y = {Butter}
conf(X $\rightarrow$ Y) = $\frac{3/7}{3/7}$ = 100%
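The confidence formula can be reproduced in a few lines of base R. This is a sketch under the same assumptions as the hand calculation above: `baskets`, `supp`, and `conf` are illustrative names, not arules functions.

```r
# The seven example baskets as a plain list
baskets <- list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)

# Support: share of baskets containing every item of the itemset
supp <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Confidence of lhs -> rhs: supp(lhs and rhs together) / supp(lhs)
conf <- function(lhs, rhs, baskets) {
  supp(union(lhs, rhs), baskets) / supp(lhs, baskets)
}

conf("Bread", "Butter", baskets)  # every Bread basket also holds Butter
conf("Wine", "Butter", baskets)   # 4 of the 5 Wine baskets hold Butter
```

Confidence is not symmetric: `conf("Butter", "Bread", baskets)` divides by the support of Butter instead and gives a different value.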
Lift: "how strong is the association"
lift(X $\rightarrow$ Y) = $\dfrac{supp(X \cup Y)}{supp(X) \times supp(Y)}$
Example:
X = {Bread}; Y = {Butter}
lift(X $\rightarrow$ Y) = $\frac{3/7}{(3/7) \times (6/7)} = \frac{7}{6} \approx 1.17$
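A lift above 1 means X and Y appear together more often than they would if they were independent; a lift below 1 means less often. The sketch below recomputes the example in base R, again with illustrative helper names (`baskets`, `supp`, `lift`) that are not part of arules.

```r
# The seven example baskets as a plain list
baskets <- list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)

# Support: share of baskets containing every item of the itemset
supp <- function(itemset, baskets) {
  mean(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Lift of lhs -> rhs: joint support relative to the product of the
# individual supports (the value expected under independence)
lift <- function(lhs, rhs, baskets) {
  supp(union(lhs, rhs), baskets) / (supp(lhs, baskets) * supp(rhs, baskets))
}

lift("Bread", "Butter", baskets)  # 7/6, slightly above 1
lift("Wine", "Butter", baskets)   # 14/15, slightly below 1
```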
library(arules)
# Frequent itemsets
supp.cw <- apriori(trans,  # the transactional dataset
  # Parameter list
  parameter = list(
    # Minimum support
    supp = 0.2,
    # Minimum confidence
    conf = 0.4,
    # Minimum length
    minlen = 2,
    # Target
    target = "frequent itemsets"),
  # Appearance argument
  appearance = list(
    items = c("Cheese", "Wine"))
)
# Rules
rules.b.rhs <- apriori(trans,  # the transactional dataset
  # Parameter list
  parameter = list(
    # Minimum support
    supp = 0.2,
    # Minimum confidence
    conf = 0.4,
    # Minimum length
    minlen = 2,
    # Target
    target = "rules"),
  # Appearance argument
  appearance = list(
    rhs = "Butter",
    default = "lhs")
)
Retrieve the frequent itemsets
supp.all <- apriori(trans,
  parameter = list(supp = 3/7,
                   target = "frequent itemsets"))
inspect(head(sort(supp.all, by = "support"), 3))
    items    support   count
[1] {Butter} 0.8571429 6
[2] {Wine}   0.7142857 5
[3] {Cheese} 0.5714286 4
Retrieve the rules
# Rules with "Butter" on rhs
rules.b.rhs <- apriori(trans,
  parameter = list(
    minlen = 2,
    target = "rules"),
  appearance = list(
    rhs = "Butter",
    default = "lhs")
)
# Show the five rules with the highest lift
inspect(head(sort(rules.b.rhs, by = "lift"), 5))
    lhs                    rhs      support confidence lift count
[1] {Bread}             => {Butter} 0.42    1.0        1.16 3
[2] {Bread,Cheese}      => {Butter} 0.14    1.0        1.16 1
[3] {Bread,Wine}        => {Butter} 0.28    1.0        1.16 2
[4] {Bread,Cheese,Wine} => {Butter} 0.14    1.0        1.16 1
[5] {Wine}              => {Butter} 0.57    0.8        0.93 4