Market Basket Analysis in Python
Isaiah Hull
Visiting Associate Professor of Finance, BI Norwegian Business School


| TID | Transaction |
|---|---|
| 1 | biography, history |
| 2 | fiction |
| 3 | biography, poetry |
| 4 | fiction, history |
| 5 | biography |
| ... | ... |
| 75000 | fiction, poetry |

Identify products that are frequently purchased together.
Use those associations to make recommendations.

| TID | Transaction |
|---|---|
| 11 | fiction, biography |
| 12 | fiction, biography |
| 13 | history, biography |
| ... | ... |
| 19 | fiction, biography |
| 20 | fiction, biography |
| ... | ... |
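
One way to make "frequently purchased together" concrete is to count how often each pair of items appears in the same transaction, then recommend the most frequent partner of an item a customer has bought. The following is a minimal sketch in plain Python, not necessarily the approach used elsewhere in this course; `count_pairs` and `recommend` are hypothetical helper names, and the sample reuses transactions 11, 12, 13, 19 and 20 from the table above.

```python
from collections import Counter
from itertools import combinations


def count_pairs(transactions):
    """Count how often each unordered pair of items is bought together."""
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so ('fiction', 'biography') and
        # ('biography', 'fiction') map to the same key.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return counts


def recommend(item, pair_counts):
    """Return the item most often bought together with `item`, or None."""
    partners = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            partners[b] += n
        elif b == item:
            partners[a] += n
    if not partners:
        return None
    return partners.most_common(1)[0][0]


# Transactions 11, 12, 13, 19 and 20 from the table above.
sample = [
    ['fiction', 'biography'],
    ['fiction', 'biography'],
    ['history', 'biography'],
    ['fiction', 'biography'],
    ['fiction', 'biography'],
]

pair_counts = count_pairs(sample)
print(pair_counts.most_common())
# [(('biography', 'fiction'), 4), (('biography', 'history'), 1)]
print(recommend('biography', pair_counts))  # fiction
print(recommend('history', pair_counts))    # biography
```

Sorting each basket before forming pairs is what makes the counts independent of the order in which items were recorded.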
```python
import pandas as pd

# Load the transactions with pandas.
books = pd.read_csv("datasets/bookstore.csv")

# Print the first two rows.
print(books.head(2))
```
```
TID Transaction
0 biography, history
1 fiction
```
For a refresher, see the Pandas Cheat Sheet.
```python
# Split each transaction string into a list of items.
transactions = books['Transaction'].apply(lambda t: t.split(','))

# Convert the Series of lists into a plain Python list of lists.
transactions = list(transactions)

# Print the first transaction.
print(transactions[0])
```
```
['biography', 'history']
```
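
If the CSV stores items with a space after each comma (as the table rows above are displayed), `t.split(',')` keeps that leading space, so `' history'` and `'history'` would be treated as different items. A minimal sketch, assuming such spacing; the inline CSV text is hypothetical and only stands in for `datasets/bookstore.csv`:

```python
import io

import pandas as pd

# Hypothetical CSV contents with a space after each comma inside the quotes.
csv_text = """TID,Transaction
0,"biography, history"
1,fiction
2,"biography, poetry"
"""
books = pd.read_csv(io.StringIO(csv_text))

# A plain split keeps the leading space on every item after the first.
naive = books['Transaction'].apply(lambda t: t.split(','))

# Split on commas, then strip surrounding whitespace from each item.
transactions = books['Transaction'].apply(
    lambda t: [item.strip() for item in t.split(',')]
).tolist()

print(naive[0])         # ['biography', ' history']
print(transactions[0])  # ['biography', 'history']
```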
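```python
# Count transactions equal to exactly ['biography', 'fiction'].
# list.count() compares whole lists, so item order matters and
# transactions with additional items are not counted.
transactions.count(['biography', 'fiction'])
```
```
218
```
```python
# Count transactions equal to exactly ['fiction', 'poetry'].
transactions.count(['fiction', 'poetry'])
```
```
5357
```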
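
Because `list.count()` only finds exact, order-sensitive matches, it misses transactions such as `['fiction', 'biography']` or `['biography', 'fiction', 'poetry']`. To count every transaction that contains both items, a set-based check is one option. `count_containing` is a hypothetical helper, and the list below is illustrative rather than the bookstore data:

```python
def count_containing(transactions, itemset):
    """Count transactions that include every item in `itemset`,
    in any order and possibly alongside other items."""
    target = set(itemset)
    return sum(1 for t in transactions if target.issubset(t))


# Illustrative transactions, not taken from bookstore.csv.
transactions = [
    ['biography', 'fiction'],
    ['fiction', 'biography'],
    ['fiction', 'biography', 'poetry'],
    ['biography'],
]

print(transactions.count(['biography', 'fiction']))              # 1
print(count_containing(transactions, ['biography', 'fiction']))  # 3

# Share of transactions that contain both items.
share = count_containing(transactions, ['biography', 'fiction']) / len(transactions)
print(share)  # 0.75
```

Dividing the count by `len(transactions)`, as in the last lines, expresses how common the item combination is relative to all transactions.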
