Market Basket Analysis in R
Christopher Bruffaerts
Statistician
TID | Transaction |
---|---|
1 | {Bread, Butter, Cheese, Wine} |
2 | {Bread, Butter, Wine} |
3 | {Bread, Butter} |
4 | {Butter, Cheese, Wine} |
5 | {Butter, Cheese} |
6 | {Cheese, Wine} |
7 | {Butter, Wine} |
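A minimal sketch of how the table above could be turned into the transactions object that the later calls refer to as `trans`, assuming the arules package is installed. The name `baskets` is illustrative and does not come from the slides.

```r
library(arules)

# One character vector per transaction, in TID order (illustrative name)
baskets <- list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)

# Coerce the list to the arules "transactions" class
trans <- as(baskets, "transactions")

# Absolute item counts, useful as a sanity check against the table
item_counts <- itemFrequency(trans, type = "absolute")
```

Calling `inspect(trans)` afterwards prints the seven baskets for a visual check.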
Apply apriori on transactions:
library(arules)
# Keep rules with support >= 3/7, confidence >= 0.6 and at least 2 items
rules = apriori(trans,
                parameter = list(supp = 3/7, conf = 0.6, minlen = 2),
                control = list(verbose = FALSE))
Create dataframe with extracted rules
df_rules = as(rules, "data.frame")
df_rules
rules support confidence lift count
1 {Bread} => {Butter} 0.4285714 1.0000000 1.1666667 3
2 {Cheese} => {Wine} 0.4285714 0.7500000 1.0500000 3
3 {Wine} => {Cheese} 0.4285714 0.6000000 1.0500000 3
4 {Cheese} => {Butter} 0.4285714 0.7500000 0.8750000 3
5 {Wine} => {Butter} 0.5714286 0.8000000 0.9333333 4
6 {Butter} => {Wine} 0.5714286 0.6666667 0.9333333 4
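The three measures in the table can be recomputed by hand. The sketch below does this in base R for {Cheese} => {Wine}, assuming the seven transactions listed at the start; `baskets` and `count_of` are illustrative names, not part of arules.

```r
# Transactions from the table at the start (illustrative name)
baskets <- list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)
n <- length(baskets)

# Number of baskets that contain every item in `items`
count_of <- function(items) {
  sum(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# support    = share of baskets containing both sides of the rule
# confidence = count(LHS and RHS) / count(LHS)
# lift       = confidence / support(RHS)
support_cw    <- count_of(c("Cheese", "Wine")) / n
confidence_cw <- count_of(c("Cheese", "Wine")) / count_of("Cheese")
lift_cw       <- confidence_cw / (count_of("Wine") / n)
```

A lift above 1, as here, means Wine appears with Cheese more often than its overall frequency alone would suggest; the two Butter-consequent rules with lift below 1 indicate the opposite.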
Frequent itemsets for Cheese and Wine
supp_cheese_wine = apriori(trans,
                           parameter = list(target = "frequent itemsets",
                                            supp = 3/7),
                           appearance = list(items = c("Cheese", "Wine")))
inspect(supp_cheese_wine)
items support count
[1] {Cheese} 0.5714286 4
[2] {Wine} 0.7142857 5
[3] {Cheese,Wine} 0.4285714 3
Specific rules for Cheese
rules_cheese_rhs = apriori(data = trans,
                           parameter = list(supp = 3/7, conf = 0.2, minlen = 2),
                           appearance = list(rhs = "Cheese"),
                           control = list(verbose = FALSE))
inspect(rules_cheese_rhs)
lhs rhs support confidence lift count
[1] {Wine} => {Cheese} 0.4285714 0.6 1.050 3
[2] {Butter} => {Cheese} 0.4285714 0.5 0.875 3
What is a redundant rule?
A rule is redundant if a more general rule with the same or a higher confidence exists.
Super-rule:
A rule is more general if it has the same RHS but one or more items removed from the LHS.
Example:
Super-rules of {A} $\rightarrow$ {C}:
Non-redundant rules are defined as:
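In the notation used by the arules documentation for `is.redundant`, a rule $X \rightarrow Y$ is redundant when some proper subset of its LHS predicts $Y$ at least as well, and non-redundant when no such subset exists. The symbols $X$, $X'$ and $Y$ stand for generic itemsets and are not names used elsewhere in these slides.

$$
\text{redundant:}\quad \exists\, X' \subset X :\ \mathrm{conf}(X' \rightarrow Y) \ge \mathrm{conf}(X \rightarrow Y)
$$

$$
\text{non-redundant:}\quad \forall\, X' \subset X :\ \mathrm{conf}(X' \rightarrow Y) < \mathrm{conf}(X \rightarrow Y)
$$

For instance, {Butter} $\rightarrow$ {Bread} and {Wine} $\rightarrow$ {Bread} are both super-rules of {Butter, Wine} $\rightarrow$ {Bread}.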
Set of generated rules
rules = apriori(trans,
                parameter = list(supp = 0.05, conf = 0.5, minlen = 2),
                appearance = list(rhs = "Bread", default = "lhs"),
                control = list(verbose = FALSE))
Set of pruned rules (non-redundant)
redundant_rules = is.redundant(rules)
non_redundant_rules = rules[!redundant_rules]
Comparing extracted rules and non-redundant rules
inspect(rules)
lhs rhs support confidence lift count
[1] {Butter} => {Bread} 0.4285714 0.5 1.166667 3
[2] {Butter,Wine} => {Bread} 0.2857143 0.5 1.166667 2
[3] {Butter,Cheese,Wine} => {Bread} 0.1428571 0.5 1.166667 1
inspect(non_redundant_rules)
lhs rhs support confidence lift count
[1] {Butter} => {Bread} 0.4285714 0.5 1.166667 3
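The pruning result above can be checked by hand. The base R sketch below recomputes the confidence of the three rules and flags a rule when another listed rule has a strictly smaller LHS and at least the same confidence. It is an illustration of the definition, not the arules implementation, and all helper names are hypothetical.

```r
# Transactions from the table at the start (illustrative name)
baskets <- list(
  c("Bread", "Butter", "Cheese", "Wine"),
  c("Bread", "Butter", "Wine"),
  c("Bread", "Butter"),
  c("Butter", "Cheese", "Wine"),
  c("Butter", "Cheese"),
  c("Cheese", "Wine"),
  c("Butter", "Wine")
)

# Number of baskets that contain every item in `items`
count_of <- function(items) {
  sum(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Confidence of the rule lhs => rhs
confidence_of <- function(lhs, rhs) count_of(c(lhs, rhs)) / count_of(lhs)

# LHS of the three extracted rules, all with RHS {Bread}
lhs_list <- list(
  "Butter",
  c("Butter", "Wine"),
  c("Butter", "Cheese", "Wine")
)
conf <- vapply(lhs_list, confidence_of, numeric(1), rhs = "Bread")

# TRUE when `a` is a proper subset of `b`
is_proper_subset <- function(a, b) all(a %in% b) && length(a) < length(b)

# Rule i is redundant if some listed rule j is more general
# and has the same or a higher confidence
redundant <- vapply(seq_along(lhs_list), function(i) {
  any(vapply(seq_along(lhs_list), function(j) {
    is_proper_subset(lhs_list[[j]], lhs_list[[i]]) && conf[j] >= conf[i]
  }, logical(1)))
}, logical(1))
```

All three rules have confidence 0.5, so the two longer rules add nothing over {Butter} => {Bread} and are the ones flagged, which matches the single rule left in `non_redundant_rules`.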