Market Basket Analysis in R
Christopher Bruffaerts
Statistician
Market basket analysis
Focus on the what, not the how much;
in other words: what is in customers' baskets?

Key metrics
Caution
The set of rules found can be very large.
In that case, do not inspect or display everything: always use a subset, or the head or tail functions!
Back to the grocery store
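A minimal sketch of this advice, assuming the rules are mined with `apriori()` from the arules package. The support and confidence thresholds and the object names (`rules`, `top_rules`, `milk_rules`) are illustrative assumptions, not values taken from the slides.

```r
library(arules)
data(Groceries)

# Mine association rules; thresholds are illustrative assumptions
rules <- apriori(Groceries,
                 parameter = list(supp = 0.01, conf = 0.3, minlen = 2),
                 control = list(verbose = FALSE))

# Check how many rules were found before printing anything
length(rules)

# Show only the five rules with the highest lift
top_rules <- head(sort(rules, by = "lift"), n = 5)
inspect(top_rules)

# Or restrict to a subset, e.g. rules that predict whole milk with lift > 1.5
milk_rules <- subset(rules, rhs %in% "whole milk" & lift > 1.5)
inspect(head(milk_rules, n = 3))
```

Checking `length(rules)` first is a cheap way to decide whether a full `inspect(rules)` would be readable at all.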

Dataset from the arules package
# Loading the arules package
library(arules)
# Loading the Groceries dataset
data(Groceries)
summary(Groceries)
transactions as itemMatrix in sparse format with
 9835 rows (elements/itemsets/transactions) and
 169 columns (items) and a density of 0.02609146
most frequent items:
      whole milk other vegetables       rolls/buns             soda           yogurt          (Other)
            2513             1903             1809             1715             1372            34055
element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46   29
  18   19   20   21   22   23   24   26   27   28   29   32
  14   14    9   11    4    6    1    1    1    1    3    1
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   2.000   3.000   4.409   6.000  32.000
includes extended item information - examples:
       labels  level2           level1
1 frankfurter sausage meat and sausage
2     sausage sausage meat and sausage
3  liver loaf sausage meat and sausage
# Plotting a sample of 200 transactions
image(sample(Groceries, 200))
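The density and mean basket size reported by `summary(Groceries)` can be reproduced by hand from the item counts in that output. The sketch below uses only base R and numbers copied from the summary; the variable names are illustrative.

```r
# Values copied from the summary(Groceries) output
n_transactions <- 9835
n_items        <- 169
top_counts     <- c(2513, 1903, 1809, 1715, 1372)  # five most frequent items
other_count    <- 34055                            # all remaining items

# Total number of (transaction, item) pairs, i.e. non-empty matrix cells
total_purchases <- sum(top_counts) + other_count

# Density = share of non-empty cells in the 9835 x 169 sparse matrix
density <- total_purchases / (n_transactions * n_items)

# Mean basket size = items bought per transaction
mean_size <- total_purchases / n_transactions

# Compare with 0.02609146 and 4.409 in the summary
print(density, digits = 7)
print(round(mean_size, 3))
```

Roughly 2.6% of the matrix cells are filled, which is why arules stores transactions in a sparse format.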

Most popular items
itemFrequencyPlot(Groceries,type="relative",
topN=10,horiz=TRUE,col='steelblue3')

Least popular items
par(mar=c(2,10,2,2), mfrow=c(1,1))
barplot(sort(table(unlist(LIST(Groceries))))[1:10],
horiz = TRUE,las = 1,col='orange')

Contingency tables
# Contingency table
tbl = crossTable(Groceries)
tbl[1:4,1:4]
            frankfurter sausage liver loaf ham
frankfurter         580      99          7  25
sausage              99     924         10  49
liver loaf            7      10         50   3
ham                  25      49          3 256
Sorted contingency table
# Sorted contingency table
tbl = crossTable(Groceries, sort = TRUE)
tbl[1:4,1:4]
                 whole milk other vegetables rolls/buns soda
whole milk             2513              736        557  394
other vegetables        736             1903        419  322
rolls/buns              557              419       1809  377
soda                    394              322        377 1715
Contingency tables
# Counts
tbl['whole milk','flour']
[1] 83
# Chi-square test
crossTable(Groceries, measure='chi')['whole milk', 'flour']
[1] 0.003595389
Contingency tables with other metrics
crossTable(Groceries, measure = 'lift', sort = TRUE)[1:4, 1:4]
                 whole milk other vegetables rolls/buns      soda
whole milk               NA        1.5136341   1.205032 0.8991124
other vegetables  1.5136341               NA   1.197047 0.9703476
rolls/buns        1.2050318        1.1970465         NA 1.1951242
soda              0.8991124        0.9703476   1.195124        NA
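The lift value for whole milk and soda can be recomputed from the counts in the sorted contingency table. The sketch below uses base R only and applies the standard association-rule definitions of support, confidence and lift; the variable names are illustrative.

```r
# Counts copied from the sorted contingency table and the dataset summary
n          <- 9835   # total number of transactions
count_milk <- 2513   # baskets containing whole milk
count_soda <- 1715   # baskets containing soda
count_both <- 394    # baskets containing both

# Support: share of baskets containing the item(s)
support_milk <- count_milk / n
support_soda <- count_soda / n
support_both <- count_both / n

# Confidence of {whole milk} => {soda}: P(soda | whole milk)
confidence_milk_soda <- support_both / support_milk

# Lift: observed co-occurrence relative to what independence would predict
lift_milk_soda <- support_both / (support_milk * support_soda)

print(round(confidence_milk_soda, 4))
print(round(lift_milk_soda, 7))
```

Lift is symmetric in the two items, so the lift table itself is symmetric. A value below 1, as here, means whole milk and soda appear together less often than would be expected if they were bought independently.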
MovieLens: a web-based recommender system that suggests movies to watch.
