Market Basket Analysis in Python
Isaiah Hull
Visiting Associate Professor of Finance, BI Norwegian Business School
TID | Transaction |
---|---|
1 | biography, history |
2 | fiction |
3 | biography, poetry |
4 | fiction, history |
5 | biography |
... | ... |
75000 | fiction, poetry |
Identify products frequently purchased together.
Construct recommendations based on these findings.
TID | Transaction |
---|---|
11 | fiction, biography |
12 | fiction, biography |
13 | history, biography |
... | ... |
19 | fiction, biography |
20 | fiction, biography |
... | ... |
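The goal of identifying products frequently purchased together can be sketched with a simple pair count. This is a minimal illustration, not the course's method: the `sample` list is a hypothetical stand-in that copies only the five rows visible in the table above, and `pair_counts` is a name introduced here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: only the rows visible in the table (TIDs 11-13, 19, 20).
sample = [
    ["fiction", "biography"],
    ["fiction", "biography"],
    ["history", "biography"],
    ["fiction", "biography"],
    ["fiction", "biography"],
]

pair_counts = Counter()
for basket in sample:
    # Deduplicate and sort so that item order within a basket does not matter.
    for pair in combinations(sorted(set(basket)), 2):
        pair_counts[pair] += 1

# Most frequent pair in the sample.
print(pair_counts.most_common(1))
# [(('biography', 'fiction'), 4)]
```

In this sample, biography and fiction appear together in four of the five baskets, which is the kind of pattern a recommendation could be built on.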
import pandas as pd
# Load transactions with pandas.
books = pd.read_csv("datasets/bookstore.csv")
# Print the first two rows.
print(books.head(2))
TID Transaction
0 biography, history
1 fiction
For a refresher, see the Pandas Cheat Sheet.
# Split transaction strings into lists, trimming the space after each comma.
transactions = books['Transaction'].apply(lambda t: [item.strip() for item in t.split(',')])
# Convert the Series of lists into a list of lists.
transactions = list(transactions)
# Print the first transaction.
print(transactions[0])
['biography', 'history']
# Count the transactions that are exactly ['biography', 'fiction'], in that order.
transactions.count(['biography', 'fiction'])
218
# Count the transactions that are exactly ['fiction', 'poetry'], in that order.
transactions.count(['fiction', 'poetry'])
5357
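Note that `list.count` only matches transactions equal to the given list, element for element and in the same order. To count every transaction that contains both items, regardless of order or additional items, a set-based check is one option. The sketch below assumes a hypothetical helper, `count_containing`, and a small made-up `example` list; neither comes from the course data.

```python
def count_containing(transactions, items):
    """Count transactions that include every item in `items`, in any order."""
    wanted = set(items)
    return sum(wanted.issubset(t) for t in transactions)

# Hypothetical transactions for illustration only.
example = [
    ["biography", "history"],
    ["fiction"],
    ["fiction", "biography"],
    ["biography", "fiction", "poetry"],
]

# Exact, order-sensitive match: no transaction equals this list.
print(example.count(["biography", "fiction"]))
# 0

# Containment check: two transactions include both items.
print(count_containing(example, ["biography", "fiction"]))
# 2
```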