Market Basket Analysis in R
Christopher Bruffaerts
Statistician
What's in the store?
Basket 1: {"Bread", "Cheese"}
Basket 2: {"Bread", "Wine" , "Cheese"}
Multiple baskets
If 100 customers visit the grocery store, can we find associations of items that occur together?
Example: Bread and Cheese
Outcome: “if this, then that”
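A minimal sketch of the "if this, then that" idea, using only the two example baskets shown above. The object names (`baskets`, `has_bread`, `has_both`, `share_with_cheese`) are illustrative assumptions, and the ratio is just one simple way to express "if Bread, then Cheese", not necessarily the measure the course uses later.

```r
# The two example baskets, stored as character vectors in a list
baskets <- list(
  c("Bread", "Cheese"),
  c("Bread", "Wine", "Cheese")
)

# Which baskets contain Bread, and which contain both Bread and Cheese?
has_bread <- sapply(baskets, function(b) "Bread" %in% b)
has_both  <- sapply(baskets, function(b) all(c("Bread", "Cheese") %in% b))

# Share of Bread baskets that also contain Cheese
share_with_cheese <- sum(has_both) / sum(has_bread)
print(share_with_cheese)
```

With only two baskets the result is trivial, but the same counting logic applies to any number of baskets.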
Learning from multiple baskets
Different applications
Create a dataset containing multiple baskets!
my_baskets = data.frame(
  "Basket" = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  "Product" = c("Bread", "Cheese", "Cheese", "Cheese",
                "Bread", "Butter", "Wine",
                "Butter", "Butter",
                "Butter", "Wine", "Wine",
                "Butter", "Cheese",
                "Cheese", "Wine",
                "Wine", "Wine")
)
A glimpse at my baskets
head(my_baskets)
  Basket Product
1      1   Bread
2      1  Cheese
3      1  Cheese
4      1  Cheese
5      2   Bread
6      2  Butter
Questions
library(dplyr)  # provides n_distinct(), %>%, group_by(), summarize()
n_distinct(my_baskets$Product)
[1] 4
n_distinct(my_baskets$Basket)
[1] 7
df_basket =
  my_baskets %>%
  group_by(Basket) %>%
  summarize(
    n_total = n(),
    n_items = n_distinct(Product))
  Basket n_total n_items
   <dbl>   <int>   <int>
1      1       4       2
2      2       3       3
Average basket sizes
df_basket %>%
  summarize(
    avg_total_items = mean(n_total),
    avg_dist_items = mean(n_items))
# A tibble: 1 x 2
avg_total_items avg_dist_items
<dbl> <dbl>
1 2.57 1.86
Distribution of basket size
# Distribution of distinct items
library(ggplot2)
ggplot(df_basket, aes(n_items)) +
  geom_bar()
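The same distribution can also be read off as a table rather than a bar chart. The sketch below rebuilds the data so it runs on its own; the names `size_distribution` and `n_baskets` are illustrative assumptions, not part of the original slides.

```r
library(dplyr)

# Rebuild the example data so this block is self-contained
my_baskets <- data.frame(
  Basket  = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",
              "Bread", "Butter", "Wine",
              "Butter", "Butter",
              "Butter", "Wine", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Wine", "Wine")
)

# Total and distinct item counts per basket, as on the earlier slide
df_basket <- my_baskets %>%
  group_by(Basket) %>%
  summarize(
    n_total = n(),
    n_items = n_distinct(Product))

# Number of baskets for each distinct-item count (the bar heights)
size_distribution <- df_basket %>%
  count(n_items, name = "n_baskets")

print(size_distribution)
```

Each row of `size_distribution` corresponds to one bar of the plot.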
Which item are you looking at?
How many times does an item appear across all baskets?
How many baskets contain that item?
Example:
Filtering for Cheese in R
# Total number of Cheese items and number of baskets containing Cheese
my_baskets %>%
  filter(Product == "Cheese") %>%
  summarize(
    n_tot_items = n(),
    n_basket_item = n_distinct(Basket))
n_tot_items n_basket_item
1 5 3
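The Cheese filter can be generalized to every product at once by grouping on `Product` instead of filtering. This is a sketch under the same column names as above; `item_summary` is a hypothetical name introduced here for illustration.

```r
library(dplyr)

# Rebuild the example data so this block is self-contained
my_baskets <- data.frame(
  Basket  = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",
              "Bread", "Butter", "Wine",
              "Butter", "Butter",
              "Butter", "Wine", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Wine", "Wine")
)

# For each product: total occurrences and number of baskets containing it
item_summary <- my_baskets %>%
  group_by(Product) %>%
  summarize(
    n_tot_items = n(),
    n_basket_item = n_distinct(Basket))

print(item_summary)
```

The Cheese row of `item_summary` should match the filtered result above (5 items in 3 baskets).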
Association rule mining: finding frequent co-occurring associations among a collection of items.
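As a rough illustration of "co-occurring" items, the sketch below counts how many baskets contain each pair of distinct products, using base R only. It is a simple counting exercise on the example data, not an association rule mining algorithm, and the names `items_by_basket`, `pairs` and `pair_counts` are assumptions made for this example.

```r
# Rebuild the example data so this block is self-contained
my_baskets <- data.frame(
  Basket  = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",
              "Bread", "Butter", "Wine",
              "Butter", "Butter",
              "Butter", "Wine", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Wine", "Wine")
)

# Distinct products per basket, sorted so each pair has a single label
items_by_basket <- lapply(
  split(as.character(my_baskets$Product), my_baskets$Basket),
  function(x) sort(unique(x))
)

# All product pairs within each basket (baskets with one product yield none)
pairs <- unlist(lapply(items_by_basket, function(items) {
  if (length(items) < 2) return(character(0))
  combn(items, 2, FUN = paste, collapse = " & ")
}))

# Number of baskets in which each pair occurs, most frequent first
pair_counts <- sort(table(pairs), decreasing = TRUE)
print(pair_counts)
```

Pairs that appear in many baskets are natural candidates for "if this, then that" statements.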
Example of rule extraction:
Agenda for the rest of the course: