Making content-based recommendations

Building Recommendation Engines in Python

Rob O'Callaghan

Director of Data

Introducing the Jaccard similarity

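Jaccard similarity (figure: Venn diagram of two overlapping sets, A and B): the number of attributes the two items share divided by the number of attributes found in either item.

$$J(A,B)=\frac{|A\cap B|}{|A \cup B|}$$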
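As a minimal sketch of the formula, the Jaccard similarity of two sets can be computed with Python's built-in set operations. The `jaccard` helper and the genre sets below are illustrative and not part of the course code; returning 0.0 for two empty sets is an assumed convention.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumed convention: two empty sets have similarity 0.0
        return 0.0
    return len(a & b) / len(union)


# Illustrative genre sets
hobbit_genres = {"Adventure", "Fantasy"}
got_genres = {"Fantasy"}

print(jaccard(hobbit_genres, got_genres))  # 0.5 (1 shared genre out of 2 in total)
```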

Calculating Jaccard similarity between books

genres_array_df:

Book               Adventure  Fantasy  Tragedy  Social commentary  ...
The Hobbit                 1        1        0                  0  ...
The Great Gatsby           0        0        1                  1  ...
A Game of Thrones          0        1        0                  0  ...
Macbeth                    0        0        1                  0  ...
...                      ...      ...      ...                ...  ...

Calculating Jaccard similarity between books

from sklearn.metrics import jaccard_score


hobbit_row = book_genre_df.loc['The Hobbit']
GOT_row = book_genre_df.loc['A Game of Thrones']
print(jaccard_score(hobbit_row, GOT_row))
0.5
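The snippet above uses `book_genre_df` without showing how it is built. Below is a self-contained sketch, assuming `book_genre_df` is simply `genres_array_df` indexed by book title and restricted to the four books and four genres shown on the slide (the full table has more of both).

```python
import pandas as pd
from sklearn.metrics import jaccard_score

# Only the four books and four genres shown on the slide (assumption:
# the full table has more rows and columns)
genres_array_df = pd.DataFrame({
    "Book": ["The Hobbit", "The Great Gatsby", "A Game of Thrones", "Macbeth"],
    "Adventure": [1, 0, 0, 0],
    "Fantasy": [1, 0, 1, 0],
    "Tragedy": [0, 1, 0, 1],
    "Social commentary": [0, 1, 0, 0],
})

# Assumed relationship: book_genre_df is the same table indexed by title
book_genre_df = genres_array_df.set_index("Book")

hobbit_row = book_genre_df.loc["The Hobbit"]
GOT_row = book_genre_df.loc["A Game of Thrones"]

# Shared genres: Fantasy (1). Genres in either book: Adventure, Fantasy (2).
score = jaccard_score(hobbit_row, GOT_row)
print(score)
```

With these rows the score is 1/2, matching the 0.5 printed on the slide.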

Finding the distance between all items

from scipy.spatial.distance import pdist, squareform


jaccard_distances = pdist(book_genre_df.values, metric='jaccard')
print(jaccard_distances)
[1.  0.5 1.  1.  0.5 1. ]
square_jaccard_distances = squareform(jaccard_distances)
print(square_jaccard_distances)
[[0.  1.  0.5 1. ]
 [1.  0.  1.  0.5]
 [0.5 1.  0.  1. ]
 [1.  0.5 1.  0. ]]
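`pdist` returns a condensed vector with one distance per unordered pair of rows, and `squareform` expands it into a symmetric matrix with zeros on the diagonal. The sketch below, which assumes only the four books and four genres shown earlier, labels each condensed entry with the pair it belongs to. The `books`, `genres`, and `pair_distances` names are illustrative.

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist, squareform

books = ["The Hobbit", "The Great Gatsby", "A Game of Thrones", "Macbeth"]
# Rows follow `books`; columns are Adventure, Fantasy, Tragedy, Social commentary
genres = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
], dtype=bool)

condensed = pdist(genres, metric="jaccard")

# Condensed order matches combinations(range(n), 2): (0,1), (0,2), (0,3), (1,2), ...
pair_distances = {}
for (i, j), d in zip(combinations(range(len(books)), 2), condensed):
    pair_distances[(books[i], books[j])] = float(d)
    print(f"{books[i]} / {books[j]}: {d}")

# Expand to a 4 x 4 symmetric matrix indexed by row position
square = squareform(condensed)
```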

Finding the distance between all items

print(square_jaccard_distances)
[[0.  1.  0.5 1. ]
 [1.  0.  1.  0.5]
 [0.5 1.  0.  1. ]
 [1.  0.5 1.  0. ]]
jaccard_similarity_array = 1 -  square_jaccard_distances
print(jaccard_similarity_array)
[[1.  0.  0.5 0. ]
 [0.  1.  0.  0.5]
 [0.5 0.  1.  0. ]
 [0.  0.5 0.  1. ]]

Creating a usable distance table

import pandas as pd

# Despite the name, distance_df holds similarity scores (1 - Jaccard distance)
distance_df = pd.DataFrame(jaccard_similarity_array,
                           index=genres_array_df['Book'],
                           columns=genres_array_df['Book'])

distance_df.head()
            The Hobbit The Great Gatsby  A Game of Thrones          Macbeth     ...
The Hobbit        1.00             0.15               0.75             0.01     ...
The Great Gatsby  0.15             1.00               0.01             0.43     ...
...

Comparing books

print(distance_df['The Hobbit']['A Game of Thrones'])
0.75
print(distance_df['The Hobbit']['The Great Gatsby'])
0.15

Finding the most similar books

print(distance_df['The Hobbit'].sort_values(ascending=False))
title
The Hobbit                                                             1.00
The Two Towers                                                         0.91
A Game of Thrones                                                      0.50
...
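The sorted output always lists the queried book first, with a similarity of 1.0 to itself. One possible way to turn it into recommendations is to drop that entry and keep the top few rows. The `most_similar` helper and `similarity_df` name below are hypothetical, and the sketch uses only the four books and four genres shown earlier, so its scores differ from the full-dataset output above.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

books = ["The Hobbit", "The Great Gatsby", "A Game of Thrones", "Macbeth"]
# Rows follow `books`; columns are Adventure, Fantasy, Tragedy, Social commentary
genres = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
], dtype=bool)

# Similarity = 1 - Jaccard distance, labeled by book title
similarity_df = pd.DataFrame(
    1 - squareform(pdist(genres, metric="jaccard")),
    index=books,
    columns=books,
)


def most_similar(similarity_df, title, n=3):
    """Return the n titles most similar to `title`, excluding the title itself."""
    scores = similarity_df[title].drop(labels=title)
    return scores.sort_values(ascending=False).head(n)


top = most_similar(similarity_df, "The Hobbit", n=2)
print(top)
```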

Let's practice!
