Confidence and lift

Market Basket Analysis in Python

Isaiah Hull

Visiting Associate Professor of Finance, BI Norwegian Business School

When support is misleading

TID Transaction
1 Coffee, Milk
2 Bread, Milk, Orange
3 Bread, Milk
4 Bread, Milk, Sugar
5 Bread, Jam, Milk
... ...

  1. Milk and bread are frequently purchased together.
    • The rule {Milk} $\rightarrow$ {Bread} has high support.
  2. The rule is not informative for marketing.
    • Milk and bread are both popular items, so they often appear in the same basket by chance.
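
As a quick check, the sketch below computes support in plain Python for the five transactions listed above. It assumes those five rows are the whole dataset, which is an assumption because the table is truncated at "...". The helper name `support` is hypothetical and does not come from any library.

```python
# The five transactions shown in the table above, treated as the full
# dataset (assumption: the original table continues with "...").
transactions = [
    {"Coffee", "Milk"},
    {"Bread", "Milk", "Orange"},
    {"Bread", "Milk"},
    {"Bread", "Milk", "Sugar"},
    {"Bread", "Jam", "Milk"},
]


def support(itemset, transactions):
    """Share of transactions that contain every item in `itemset` (hypothetical helper)."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


print(support({"Milk", "Bread"}, transactions))  # 0.8
print(support({"Milk"}, transactions))           # 1.0
print(support({"Bread"}, transactions))          # 0.8
```

The pair {Milk, Bread} has high support (0.8) mainly because milk is in every transaction and bread is in four of five.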

The confidence metric

  1. We can improve on support by adding further metrics.
  2. Adding confidence gives a more complete picture.

$$Confidence(X \rightarrow Y) = \frac{Support(X \& Y)}{Support(X)}$$
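
A minimal sketch of the confidence formula in plain Python follows. It assumes the five grocery transactions in the table above are the whole dataset. `support` and `confidence` are illustrative helper names, not library functions. The sketch also shows that confidence depends on the direction of the rule.

```python
# Five grocery transactions from the table above, assumed to be the
# whole dataset; helper names are hypothetical.
transactions = [
    {"Coffee", "Milk"},
    {"Bread", "Milk", "Orange"},
    {"Bread", "Milk"},
    {"Bread", "Milk", "Sugar"},
    {"Bread", "Jam", "Milk"},
]


def support(itemset):
    """Share of transactions containing every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Support(X & Y) / Support(X) for the rule X -> Y."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)


print(confidence({"Milk"}, {"Bread"}))  # 0.8
print(confidence({"Bread"}, {"Milk"}))  # 1.0
```

The two directions differ because the denominator is the support of the antecedent: 1.0 for milk and 0.8 for bread.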

Interpreting the confidence metric

This image shows a list of transactions from a grocery store. One of those transactions includes jam and bread.

$$Support(Milk\&Coffee) = 0.20$$

$$Support(Milk) = 1.00$$

Interpreting the confidence metric

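  $$\frac{Support(Milk\&Coffee)}{Support(Milk)} = \frac{0.20}{1.00} = 0.20$$

  If milk were less common, with Support(Milk&Coffee) unchanged at 0.20, confidence would rise:

  $$\frac{Support(Milk\&Coffee)}{Support(Milk)} = \frac{0.20}{0.80} = 0.25$$

  $$\frac{Support(Milk\&Coffee)}{Support(Milk)} = \frac{0.20}{0.20} = 1.00$$

  A confidence of 1.00 means every transaction that contains milk also contains coffee.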
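
On the actual data, the confidence of {Milk} $\rightarrow$ {Coffee} is 0.20. That is exactly the share of all transactions that contain coffee. The sketch below checks this in plain Python. It assumes the five listed transactions are the whole dataset, and `support` and `confidence` are hypothetical helper names.

```python
# Five grocery transactions from the table above, assumed to be the
# whole dataset; helper names are hypothetical.
transactions = [
    {"Coffee", "Milk"},
    {"Bread", "Milk", "Orange"},
    {"Bread", "Milk"},
    {"Bread", "Milk", "Sugar"},
    {"Bread", "Jam", "Milk"},
]


def support(itemset):
    """Share of transactions containing every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Support(X & Y) / Support(X) for the rule X -> Y."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


conf_milk_coffee = confidence({"Milk"}, {"Coffee"})
base_rate_coffee = support({"Coffee"})
print(conf_milk_coffee, base_rate_coffee)  # 0.2 0.2
```

Because the two numbers match, knowing that a basket contains milk does not change the chance that it contains coffee. Confidence alone does not reveal this comparison, and that gap is what lift addresses.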

The lift metric

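  • Lift provides another metric for evaluating the relationship between items.
    • Numerator: proportion of transactions that contain both X and Y.
    • Denominator: proportion expected if X and Y were assigned to transactions randomly and independently.
    • Lift > 1: X and Y appear together more often than independence predicts; lift < 1: less often.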

$$Lift(X \rightarrow Y) = \frac{Support(X \& Y)}{Support(X) Support(Y)}$$
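
A minimal sketch of lift on the five grocery transactions follows. It assumes they are the whole dataset, and `support` and `lift` are hypothetical helper names.

```python
# Five grocery transactions from the first table, assumed to be the
# whole dataset; helper names are hypothetical.
transactions = [
    {"Coffee", "Milk"},
    {"Bread", "Milk", "Orange"},
    {"Bread", "Milk"},
    {"Bread", "Milk", "Sugar"},
    {"Bread", "Jam", "Milk"},
]


def support(itemset):
    """Share of transactions containing every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)


def lift(antecedent, consequent):
    """Support(X & Y) / (Support(X) * Support(Y))."""
    x, y = set(antecedent), set(consequent)
    return support(x | y) / (support(x) * support(y))


print(lift({"Milk"}, {"Bread"}))   # 1.0
print(lift({"Milk"}, {"Coffee"}))  # 1.0
```

Both pairs have a lift of exactly 1, which is what independence predicts. Milk is in every basket, so it carries no information about bread or coffee, even though {Milk, Bread} has high support.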

Preparing the data

from mlxtend.preprocessing import TransactionEncoder
import pandas as pd

# `data` is assumed to be a DataFrame whose 'Library' column holds
# one comma-separated string of book titles per user library.
# Split library strings into lists
libraries = data['Library'].apply(lambda t: t.split(','))

# Convert to list of lists
libraries = list(libraries)

# Fit the encoder, then one-hot encode the books
encoder = TransactionEncoder().fit(libraries)
books = encoder.transform(libraries)

# Convert one-hot encoded data to DataFrame
books = pd.DataFrame(books, columns=encoder.columns_)
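
To see what the encoding step produces without mlxtend, here is a plain-Python sketch of a comparable one-hot transformation. It runs on a few made-up library strings, not GoodBooks-10K data; the title "Mockingbird" is hypothetical. Sorting the columns is a choice made for this sketch.

```python
# Hypothetical sample rows shaped like data['Library']:
# one comma-separated string of book titles per user library.
library_strings = [
    "Hunger,Gatsby",
    "Gatsby",
    "Hunger",
    "Gatsby,Mockingbird",
]

# Split each string into a list of titles (same step as above).
libraries = [s.split(",") for s in library_strings]

# One column per distinct title, in sorted order.
columns = sorted({title for lib in libraries for title in lib})

# One Boolean row per library: True if the title is in that library.
onehot = [[title in lib for title in columns] for lib in libraries]

print(columns)
for row in onehot:
    print(row)
```

Passing these to `pd.DataFrame(onehot, columns=columns)` would give a table with one Boolean column per title, the same layout as `books`.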

Computing confidence and lift

# Print the first five rows (transactions)
print(books.head())
       Hunger           Gatsby
0      False            True
1      False            True
2      False            False
3      False            True
4      False            True

Dataset: GoodBooks-10K.
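
Because True counts as 1 and False as 0, the mean of a Boolean column is the share of rows where the book appears, which is its support. The sketch below applies this to the five preview rows printed above only. Those rows are a small sample, so the results differ from full-dataset supports.

```python
# Boolean columns copied from the five rows printed above.
hunger = [False, False, False, False, False]
gatsby = [True, True, False, True, True]

# Mean of a Boolean column = share of rows containing the book = support
# (computed here for these five rows only).
support_hunger = sum(hunger) / len(hunger)
support_gatsby = sum(gatsby) / len(gatsby)
print(support_hunger)  # 0.0
print(support_gatsby)  # 0.8
```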

Computing confidence and lift

import numpy as np

# Compute support.
supportHG = np.logical_and(books['Hunger'], books['Gatsby']).mean()
supportH = books['Hunger'].mean()
supportG = books['Gatsby'].mean()
# Compute confidence of {Hunger} -> {Gatsby} and lift.
confidence = supportHG / supportH
lift = supportHG / (supportH * supportG)
# Print results, rounded to two decimals.
print(f"({supportG:.2f}, {confidence:.2f}, {lift:.2f})")
(0.30, 0.16, 0.53)
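
Writing H for Hunger and G for Gatsby, dividing the lift formula through by Support(H) shows that lift is the confidence of {H} $\rightarrow$ {G} divided by the support of G. This gives a quick consistency check on the printed (rounded) values:

$$Lift(H \rightarrow G) = \frac{Support(H \& G)}{Support(H) Support(G)} = \frac{Confidence(H \rightarrow G)}{Support(G)} \approx \frac{0.16}{0.30} \approx 0.53$$

A lift below 1 means Hunger and Gatsby appear together less often than they would if readers chose the two books independently.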

Let's practice!
