Market Basket Analysis in R
Christopher Bruffaerts
Statistician
Market Basket course
Basket = collection of items
Items
Examples of baskets:
Your basket @ the grocery store
Your Amazon shopping cart
Your courses @ DataCamp
The movies you watched on Netflix
What's in the store?
What are you up for today?
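# Items available in the store
store = c("Bread", "Butter", "Cheese", "Wine")
# Fix the random seed so the sampled basket is reproducible
set.seed(1234)
n_items = 4
# One transaction (TID = 1) with n_items products drawn with replacement
my_basket = data.frame(
  TID = rep(1, n_items),
  Product = sample(store, n_items, replace = TRUE))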
R output
my_basket
  TID Product
1   1   Bread
2   1  Cheese
3   1  Cheese
4   1  Cheese
My original basket
One record per item purchased
  TID Product
1   1   Bread
2   1  Cheese
3   1  Cheese
4   1  Cheese
My adjusted basket
One record per distinct item purchased
# A tibble: 2 x 3
    TID Product Quantity
  <dbl> <fct>      <int>
1     1 Bread          1
2     1 Cheese         3
Reshaping the basket data
# Adjusting my basket (requires dplyr)
library(dplyr)
my_basket = my_basket %>%
  add_count(Product) %>%
  unique() %>%
  rename(Quantity = n)
# Number of distinct items
n_distinct(my_basket$Product)
2
# Total basket size
my_basket %>% summarize(sum(Quantity))
4
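The same reshaping can also be sketched with dplyr's count(), which groups and tallies in one step. This is a minimal sketch, assuming dplyr is installed. The basket is typed in by hand to mirror the output shown earlier (rather than re-sampled), and the names adjusted, n_items_distinct and basket_size are illustrative only.

```r
library(dplyr)

# Basket typed in by hand to mirror the output shown earlier
# (not re-sampled, so the result does not depend on the R version)
my_basket <- data.frame(
  TID = rep(1, 4),
  Product = c("Bread", "Cheese", "Cheese", "Cheese")
)

# One record per distinct item, with its quantity
adjusted <- my_basket %>%
  count(TID, Product, name = "Quantity")

# Number of distinct items and total basket size
n_items_distinct <- n_distinct(adjusted$Product)
basket_size <- sum(adjusted$Quantity)
```

The total basket size should equal the number of rows in the original basket, which is a quick sanity check after any reshaping of this kind.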
Visualizing items in my basket
# Plotting items (requires ggplot2)
library(ggplot2)
ggplot(my_basket,
       aes(x = reorder(Product, Quantity), y = Quantity)) +
  geom_col() +
  coord_flip() +
  xlab("Items") +
  ggtitle("Summary of items in my basket")
Question: Is there any relationship between items within a basket?
Back to examples
Your basket @ the grocery store, e.g. spaghetti and tomato sauce
Your Amazon shopping cart, e.g. a phone and a phone case
Your courses @ DataCamp, e.g. "Introduction to R" and "Intermediate R"
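One simple way to start exploring that question is to count how often two items land in the same basket. The sketch below uses base R only. The baskets data frame is hypothetical, invented purely for illustration, and the names incidence and co_occurrence are assumptions rather than anything defined in the course.

```r
# Hypothetical baskets, invented purely for illustration
baskets <- data.frame(
  TID = c(1, 1, 2, 2, 3, 3, 3),
  Product = c("Spaghetti", "Tomato sauce",
              "Spaghetti", "Tomato sauce",
              "Spaghetti", "Bread", "Wine")
)

# Basket-by-item incidence matrix: 1 if the item is in the basket, else 0
tab <- table(baskets$TID, baskets$Product)
incidence <- matrix(as.numeric(tab > 0),
                    nrow = nrow(tab),
                    dimnames = dimnames(tab))

# Item-by-item co-occurrence counts: entry [i, j] is the number of
# baskets that contain both item i and item j
co_occurrence <- crossprod(incidence)

# Number of baskets containing both spaghetti and tomato sauce
co_occurrence["Spaghetti", "Tomato sauce"]
```

The diagonal of co_occurrence holds the number of baskets containing each item, so comparing an off-diagonal entry with the diagonal gives a rough sense of how strongly two items go together.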