What is market basket analysis?

Market Basket Analysis in R

Christopher Bruffaerts

Statistician

Multiple baskets @ grocery store

What's in the store?

Basket 1: {"Bread", "Cheese"}

Basket 2: {"Bread", "Wine", "Cheese"}

Multiple baskets

If 100 customers visit the grocery store, can we find associations of items that occur together?

Example: Bread and Cheese

Outcome: “if this, then that”
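
The question above can be made concrete with the two example baskets shown earlier. What follows is a minimal base R sketch, not the course's own code; the names `baskets`, `has_both` and `n_both` are chosen here purely for illustration.

```r
# The two example baskets from the grocery store slide
baskets <- list(
  c("Bread", "Cheese"),
  c("Bread", "Wine", "Cheese")
)

# TRUE for each basket that contains both Bread and Cheese
has_both <- sapply(baskets, function(b) all(c("Bread", "Cheese") %in% b))

# Number of baskets in which the pair occurs together
n_both <- sum(has_both)
n_both
```

With 100 baskets instead of two, the same count would show how often the pair is bought together.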

Market basket applications

Learning from multiple baskets

Different applications

  • E-commerce: “customers who bought this also bought this”
  • Retail: items which are “bundled or placed together”
  • Social media: friends and connections recommendation
  • Videos and movies recommendation

Multiple baskets in R

Create a dataset containing multiple baskets!

my_baskets = data.frame(
  "Basket" = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  "Product" = c("Bread", "Cheese", "Cheese", "Cheese",
                "Bread", "Butter", "Wine",
                "Butter", "Butter",
                "Butter", "Wine", "Wine",
                "Butter", "Cheese",
                "Cheese", "Wine",
                "Wine", "Wine")
)

A glimpse at my baskets

head(my_baskets)
  Basket Product
1      1   Bread
2      1  Cheese
3      1  Cheese
4      1  Cheese
5      2   Bread
6      2  Butter

What's in our baskets?

Questions

  • How many distinct items are there?
library(dplyr)  # provides n_distinct(), %>%, group_by(), summarize()
n_distinct(my_baskets$Product)
[1] 4
  • How many baskets are there?
n_distinct(my_baskets$Basket)
[1] 7
  • How many items are there in each basket?
df_basket =
  my_baskets %>%
  group_by(Basket) %>%
  summarize(
    n_total = n(),
    n_items = n_distinct(Product))
  Basket n_total n_items
   <dbl>   <int>   <int>
1      1       4       2
2      2       3       3

How big are baskets?

Average basket sizes

df_basket %>%
  summarize(
    avg_total_items = mean(n_total), 
    avg_dist_items = mean(n_items))
# A tibble: 1 x 2
  avg_total_items avg_dist_items
            <dbl>          <dbl>
1            2.57           1.86
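
The two averages can be cross-checked without dplyr. The sketch below rebuilds the same data frame and uses base R's `tapply()`; the vector names `n_total` and `n_items` simply mirror the column names above, and the approach is an alternative offered for verification, not the method used in the slides.

```r
# Same baskets as defined earlier in the slides
my_baskets <- data.frame(
  Basket = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",
              "Bread", "Butter", "Wine",
              "Butter", "Butter",
              "Butter", "Wine", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Wine", "Wine")
)

# Items per basket, counting repeats: 4 3 2 3 2 2 2
n_total <- tapply(my_baskets$Product, my_baskets$Basket, length)

# Distinct items per basket: 2 3 1 2 2 2 1
n_items <- tapply(my_baskets$Product, my_baskets$Basket,
                  function(p) length(unique(p)))

mean(n_total)  # 18 / 7, about 2.57
mean(n_items)  # 13 / 7, about 1.86
```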

Distribution of basket size

# Distribution of distinct items
library(ggplot2)
ggplot(df_basket, aes(n_items)) +
  geom_bar()
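
The bar heights in this plot are the number of baskets at each distinct-item count. As a hedged illustration, the same numbers can be tabulated in base R; `size_counts` is a name introduced here and is not part of the original slides.

```r
# Same baskets as defined earlier in the slides
my_baskets <- data.frame(
  Basket = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",
              "Bread", "Butter", "Wine",
              "Butter", "Butter",
              "Butter", "Wine", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Wine", "Wine")
)

# Number of distinct items in each basket
n_items <- tapply(my_baskets$Product, my_baskets$Basket,
                  function(p) length(unique(p)))

# How many baskets have 1, 2 or 3 distinct items
size_counts <- table(n_items)
size_counts
```

For this toy dataset, two baskets hold one distinct item, four hold two, and one holds three.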

Specific products in the baskets

Which item are you looking at?

  • How many times does an item appear across all baskets?

  • How many baskets contain that item?

Example: Cheese

Filtering for Cheese in R

# Number of baskets containing Cheese
my_baskets %>%
  filter(Product == "Cheese") %>%
  summarize(
    n_tot_items = n(),
    n_basket_item = n_distinct(Basket))
  n_tot_items n_basket_item
1           5             3
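
The same two questions can be answered for every product at once. The following is a base R sketch under the assumption that the data frame is the one built earlier; `item_counts` is an illustrative name, and the column names are reused from the Cheese example only for readability.

```r
# Same baskets as defined earlier in the slides
my_baskets <- data.frame(
  Basket = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",
              "Bread", "Butter", "Wine",
              "Butter", "Butter",
              "Butter", "Wine", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Wine", "Wine")
)

# Total appearances of each product (repeats within a basket count)
n_tot_items <- table(my_baskets$Product)

# Drop repeated (Basket, Product) rows so each basket counts a product once
n_basket_item <- table(unique(my_baskets)$Product)

# Combine both counts into one table, one row per product
item_counts <- data.frame(
  Product       = names(n_tot_items),
  n_tot_items   = as.vector(n_tot_items),
  n_basket_item = as.vector(n_basket_item[names(n_tot_items)])
)
item_counts
```

The Cheese row should agree with the filtered result above: 5 appearances across 3 baskets.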

Association rule mining

Association rule mining: finding frequently co-occurring associations among a collection of items.

Example of rule extraction:

  • {Bread} $\rightarrow$ {Butter}
  • {Bread, Cheese} $\rightarrow$ {Wine}
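
Before any formal metric is introduced, a rule such as these can be explored by simply counting baskets. The sketch below assumes the toy baskets defined earlier; `count_rule()` is a hypothetical helper written for this illustration and does not come from any package.

```r
# Same baskets as defined earlier in the slides
my_baskets <- data.frame(
  Basket = c(1,1,1,1, 2,2,2, 3,3, 4,4,4, 5,5, 6,6, 7,7),
  Product = c("Bread", "Cheese", "Cheese", "Cheese",
              "Bread", "Butter", "Wine",
              "Butter", "Butter",
              "Butter", "Wine", "Wine",
              "Butter", "Cheese",
              "Cheese", "Wine",
              "Wine", "Wine")
)

# One character vector of products per basket
items_by_basket <- split(my_baskets$Product, my_baskets$Basket)

# Hypothetical helper: for a rule lhs -> rhs, count the baskets that
# contain every item in lhs, and those that also contain every item in rhs
count_rule <- function(lhs, rhs) {
  has_lhs <- sapply(items_by_basket, function(b) all(lhs %in% b))
  has_rhs <- sapply(items_by_basket, function(b) all(rhs %in% b))
  c(n_lhs = sum(has_lhs), n_both = sum(has_lhs & has_rhs))
}

count_rule("Bread", "Butter")             # {Bread} -> {Butter}
count_rule(c("Bread", "Cheese"), "Wine")  # {Bread, Cheese} -> {Wine}
```

In this dataset two baskets contain Bread and one of them also contains Butter, while the only basket with both Bread and Cheese contains no Wine.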

So what's coming next?

Agenda for the rest of the course:

  • Chapter 2: Metrics & techniques in market basket analysis
  • Chapter 3: Visualization in market basket analysis
  • Chapter 4: Case study: Movie recommendations @ MovieLens

Let's play with baskets!
