Market Basket Analysis in R
Christopher Bruffaerts
Statistician
Transaction: Activity of buying or selling something.
Transactional data: List of all items bought by a customer in a single purchase.
Example of one transaction:
TID Product
1 1 Bread
2 1 Butter
3 1 Cheese
4 1 Wine
Transactions-class: represents transaction data used for mining itemsets or rules.
Coercion from: list, matrix, data.frame.
However, you will need to prepare your data first.
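As a minimal sketch of coercion with as(), assuming the arules package is installed. The baskets and the incidence matrix below are hypothetical and not taken from the store data.

```r
library(arules)

# Hypothetical baskets: one character vector of items per transaction
baskets <- list(
  c("Bread", "Butter"),
  c("Butter", "Wine"),
  c("Bread")
)
trx_from_list <- as(baskets, "transactions")

# The same baskets as a logical incidence matrix
# (rows = transactions, columns = items)
m <- matrix(
  c(TRUE,  TRUE,  FALSE,
    FALSE, TRUE,  TRUE,
    TRUE,  FALSE, FALSE),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("T1", "T2", "T3"), c("Bread", "Butter", "Wine"))
)
trx_from_matrix <- as(m, "transactions")

length(trx_from_list)   # number of transactions
size(trx_from_matrix)   # number of items in each transaction
```

Both objects describe the same three baskets. Which starting point is convenient depends on how the raw data is stored.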
Important when considering transactional data
Field/column used to identify a product
Field/column used to identify a transaction
Transactional data from the store
my_transactions = data.frame(
  "TID" = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  "Product" = c("Bread", "Butter", "Cheese", "Wine",
                "Bread", "Butter", "Wine",
                "Bread", "Butter",
                "Butter", "Cheese", "Wine",
                "Butter", "Cheese",
                "Cheese", "Wine",
                "Butter", "Wine")
)
Transaction glimpse
head(my_transactions, 10)
TID Product
1 1 Bread
2 1 Butter
3 1 Cheese
4 1 Wine
5 2 Bread
6 2 Butter
7 2 Wine
8 3 Bread
9 3 Butter
10 4 Butter
Create lists with the split function
# Transform TID into a factor
my_transactions$TID = factor(my_transactions$TID)
# Split the products into groups, one group per transaction ID
data_list = split(my_transactions$Product,
                  my_transactions$TID)
data_list
$`1`
[1] Bread Butter Cheese Wine
Levels: Bread Butter Cheese Wine
$`2`
[1] Bread Butter Wine
Levels: Bread Butter Cheese Wine
$`3`
[1] Bread Butter
Levels: Bread Butter Cheese Wine
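A transaction is treated as a set of items, so a product listed twice under the same TID carries no extra information, and coercion to the transactions class may reject baskets that contain repeated items. A minimal base R sketch, using a small hypothetical data frame (raw, dedup and baskets are illustrative names), that drops repeated TID/Product rows before splitting:

```r
# Hypothetical long-format data with "Cheese" repeated in transaction 1
raw <- data.frame(
  TID = c(1, 1, 1, 2, 2),
  Product = c("Bread", "Cheese", "Cheese", "Butter", "Wine")
)

# Keep each (TID, Product) pair only once
dedup <- unique(raw)

# Split the remaining products by transaction ID
baskets <- split(dedup$Product, factor(dedup$TID))

# Number of distinct items per basket
lengths(baskets)
```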
Transforming to transaction class
# Load arules, which provides the transactions class and inspect()
library(arules)
# Transform to transactional dataset
data_trx = as(data_list, "transactions")
# Inspect transactions
inspect(data_trx)
Inspection of the transactional data
items transactionID
[1] {Bread,Butter,Cheese,Wine} 1
[2] {Bread,Butter,Wine} 2
[3] {Bread,Butter} 3
[4] {Butter,Cheese,Wine} 4
[5] {Butter,Cheese} 5
[6] {Cheese,Wine} 6
[7] {Butter,Wine} 7
Overview of transactions
inspect(head(data_trx))
items transactionID
[1] {Bread,Butter,Cheese,Wine} 1
[2] {Bread,Butter,Wine} 2
[3] {Bread,Butter} 3
[4] {Butter,Cheese,Wine} 4
[5] {Butter,Cheese} 5
[6] {Cheese,Wine} 6
Accessing specific transactions
inspect(data_trx[1])
inspect(data_trx[1:3])
Summary of the transactional object
summary(data_trx)
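summary() reports, among other things, the number of transactions and items, the density, the most frequent items, and the distribution of basket sizes. A sketch of how some of these figures could be obtained one at a time, assuming arules is installed and rebuilding the seven baskets from the inspect() output above (the name baskets is illustrative):

```r
library(arules)

# The seven baskets shown in the inspect() output, keyed by transaction ID
baskets <- list(
  `1` = c("Bread", "Butter", "Cheese", "Wine"),
  `2` = c("Bread", "Butter", "Wine"),
  `3` = c("Bread", "Butter"),
  `4` = c("Butter", "Cheese", "Wine"),
  `5` = c("Butter", "Cheese"),
  `6` = c("Cheese", "Wine"),
  `7` = c("Butter", "Wine")
)
data_trx <- as(baskets, "transactions")

length(data_trx)                             # number of transactions
size(data_trx)                               # number of items per transaction
itemFrequency(data_trx, type = "absolute")   # number of baskets containing each item
```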
Plotting the ItemMatrix
image(data_trx)
Warning: use image() only on a limited number of transactions.
Useful to identify:
Density = 18/28 = 0.64 (18 filled cells in a matrix of 7 transactions by 4 items)
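The density figure can be checked by hand. A minimal base R sketch, using the seven baskets shown in the inspect() output (incidence, filled and cells are illustrative names):

```r
# Long-format data matching the seven baskets in the inspect() output
my_transactions <- data.frame(
  TID = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Butter", "Cheese", "Wine",
              "Bread", "Butter", "Wine",
              "Bread", "Butter",
              "Butter", "Cheese", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Butter", "Wine")
)

# Logical incidence matrix: TRUE where a transaction contains an item
incidence <- table(my_transactions$TID, my_transactions$Product) > 0

filled <- sum(incidence)      # 18 filled cells
cells  <- length(incidence)   # 7 transactions x 4 items = 28 cells
round(filled / cells, 2)      # 0.64
```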