Transactional data

Market Basket Analysis in R

Christopher Bruffaerts

Statistician

What is a transaction?

Transaction: Activity of buying or selling something.

Transactional data: List of all items bought by a customer in a single purchase.

Example of one transaction:

  TID Product
1   1   Bread
2   1  Butter
3   1  Cheese
4   1    Wine

The transactions class in R

transactions class (arules package): represents transaction data used for mining itemsets or rules.

Coercion from:

  • lists
  • matrices
  • data frames

However, you will need to prepare your data first.
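As a minimal sketch of two of these coercions, assuming the arules package is installed: the object names (basket_list, basket_matrix) and the tiny baskets are illustrative and not taken from the course data.

```r
library(arules)  # assumed to be installed; provides the transactions class

# From a list: one character vector of items per transaction
basket_list <- list(c("Bread", "Butter"), c("Butter", "Wine"))
trx_from_list <- as(basket_list, "transactions")

# From a logical matrix: rows are transactions, columns are items
basket_matrix <- matrix(
  c(TRUE,  TRUE,  FALSE,
    FALSE, TRUE,  TRUE),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("Bread", "Butter", "Wine"))
)
trx_from_matrix <- as(basket_matrix, "transactions")

inspect(trx_from_list)
inspect(trx_from_matrix)
```

Both objects describe the same two baskets; only the input shape differs.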

Important when considering transactional data

  • Field/column used to identify a product

  • Field/column used to identify a transaction
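The slides do not spell out the preparation steps, so the following base R sketch shows only one plausible reading: make sure both identifying columns are filled in and keep a single row per (transaction, product) pair. The data frame raw and its values are hypothetical.

```r
# Hypothetical raw purchase log; column names follow the slides, values are made up
raw <- data.frame(
  TID     = c(1, 1, 1, 2, 2, NA),
  Product = c("Bread", "Butter", "Butter", "Wine", "Wine", "Cheese")
)

# Drop rows where the transaction or product identifier is missing
clean <- raw[complete.cases(raw[, c("TID", "Product")]), ]

# Keep one row per (transaction, product) pair
clean <- unique(clean[, c("TID", "Product")])

clean
```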

Back to the grocery store (1)

Transactional data from the store

my_transactions = data.frame(
  "TID" = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  "Product" = c("Bread", "Butter", "Cheese", "Wine",
                "Bread", "Butter", "Wine",
                "Bread", "Butter",
                "Butter", "Cheese", "Wine",
                "Butter", "Cheese",
                "Cheese", "Wine",
                "Butter", "Wine"),
  # Store Product as a factor (no longer the default since R 4.0)
  stringsAsFactors = TRUE
)

Transaction glimpse

head(my_transactions, 10)
   TID Product
1    1   Bread
2    1  Butter
3    1  Cheese
4    1    Wine
5    2   Bread
6    2  Butter
7    2    Wine
8    3   Bread
9    3  Butter
10   4  Butter

Back to the grocery store (2)

Create lists with the split function

# Transform TID into a factor
my_transactions$TID = 
  factor(my_transactions$TID)

# Split into groups
data_list = split(my_transactions$Product,
                   my_transactions$TID)
data_list
$`1`
[1] Bread  Butter Cheese Wine  
Levels: Bread Butter Cheese Wine

$`2`
[1] Bread  Butter Wine  
Levels: Bread Butter Cheese Wine

$`3`
[1] Bread  Butter
Levels: Bread Butter Cheese Wine
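As a small base R sketch of what split does here: it groups the first vector by the values of the second, which it treats as a factor. Because split coerces its grouping argument itself, the explicit factor() step should be optional for grouping purposes. The purchases data frame is illustrative.

```r
# Illustrative data: two transactions with two and three items
purchases <- data.frame(
  TID     = c(1, 1, 2, 2, 2),
  Product = c("Bread", "Butter", "Bread", "Butter", "Wine")
)

# Group products by transaction, with and without an explicit factor
by_numeric <- split(purchases$Product, purchases$TID)
by_factor  <- split(purchases$Product, factor(purchases$TID))

# Both calls yield one list element per transaction ID
identical(by_numeric, by_factor)

# Number of purchased rows per transaction
lengths(by_factor)
```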

Back to the grocery store (3)

Transforming to transaction class

# Load arules, which provides the transactions class
library(arules)

# Transform to transactional dataset
data_trx = as(data_list, "transactions")

# Inspect transactions
inspect(data_trx)

Inspection of the transactional data

    items                      transactionID
[1] {Bread,Butter,Cheese,Wine} 1            
[2] {Bread,Butter,Wine}        2            
[3] {Bread,Butter}             3            
[4] {Butter,Cheese,Wine}       4            
[5] {Butter,Cheese}            5            
[6] {Cheese,Wine}              6            
[7] {Butter,Wine}              7

More inspections of transactions

Overview of transactions

inspect(head(data_trx))
    items                      transactionID
[1] {Bread,Butter,Cheese,Wine} 1            
[2] {Bread,Butter,Wine}        2            
[3] {Bread,Butter}             3            
[4] {Butter,Cheese,Wine}       4            
[5] {Butter,Cheese}            5            
[6] {Cheese,Wine}              6

Accessing specific transactions

inspect(data_trx[1])
inspect(data_trx[1:3])

Summary of the transactional object

summary(data_trx)
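The printed summary reports, among other things, the most frequent items and the distribution of transaction sizes. As a sketch, the same figures can be pulled out individually with size() and itemFrequency() from arules (assumed to be installed); the list below rebuilds the seven grocery transactions so the example runs on its own.

```r
library(arules)  # assumed to be installed

# The seven grocery transactions, one character vector per transaction
data_list <- list(
  "1" = c("Bread", "Butter", "Cheese", "Wine"),
  "2" = c("Bread", "Butter", "Wine"),
  "3" = c("Bread", "Butter"),
  "4" = c("Butter", "Cheese", "Wine"),
  "5" = c("Butter", "Cheese"),
  "6" = c("Cheese", "Wine"),
  "7" = c("Butter", "Wine")
)
data_trx <- as(data_list, "transactions")

# Number of distinct items in each transaction
trx_sizes <- size(data_trx)
trx_sizes

# Number of transactions containing each item
item_counts <- itemFrequency(data_trx, type = "absolute")
item_counts
```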

Overview of transactions

Plotting the itemMatrix

image(data_trx)

Warning: call image() on a limited number of transactions only; with a large dataset the plot becomes slow to draw and hard to read.
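One way to follow this advice, sketched here with the Groceries example dataset that ships with arules (an assumption; it is not the course's grocery store data), is to plot only the first few transactions or a random sample.

```r
library(arules)  # assumed to be installed

# Example dataset bundled with arules (several thousand transactions)
data("Groceries")

# Plot only the first 50 transactions
first_50 <- Groceries[1:50]
image(first_50)

# Or plot a random sample of 50 transactions
set.seed(42)  # arbitrary seed, only for reproducibility
random_50 <- sample(Groceries, 50)
image(random_50)
```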

Useful to identify:

  • Patterns in the transactions
  • Sparsity in the data

Density = 18 non-empty cells / (7 transactions × 4 items) = 18/28 ≈ 0.64
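The density figure can be checked by hand. The base R sketch below rebuilds the grocery data, cross-tabulates transactions against products, and divides the number of non-empty cells by the total number of cells; the variable names (incidence, filled, total) are illustrative.

```r
# The grocery store data: 18 purchased items across 7 transactions
my_transactions <- data.frame(
  TID = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Butter", "Cheese", "Wine",
              "Bread", "Butter", "Wine",
              "Bread", "Butter",
              "Butter", "Cheese", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Butter", "Wine")
)

# TRUE where a transaction contains a product (7 rows x 4 columns)
incidence <- table(my_transactions$TID, my_transactions$Product) > 0

filled <- sum(incidence)        # non-empty cells
total  <- prod(dim(incidence))  # transactions x items

density <- filled / total
round(density, 2)
```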

Let's inspect transactions!