Building Recommendation Engines in Python
Rob O'Callaghan
Director of Data
tfidf_summary_df:
Book | Adventure | Fantasy | Tragedy | Social commentary |
---|---|---|---|---|
The Hobbit | 1 | 1 | 0 | 0 |
Macbeth | 0 | 0 | 1 | 0 |
... | ... | ... | ... | ... |
User Profile:
User Profile | Adventure | Fantasy | Tragedy | Social commentary |
---|---|---|---|---|
User_001 | ??? | ??? | ??? | ??? |
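The following is a minimal sketch of how the ??? cells could be filled. It assumes the binary table above and a hypothetical reading history in which User_001 has read both The Hobbit and Macbeth. The profile is the column-wise mean of the rows for the books the user has read, which is the same averaging step the slides apply to the TF-IDF table.

```python
import pandas as pd

# Binary attribute table from the slide (rows: books, columns: attributes)
books = pd.DataFrame(
    {"Adventure": [1, 0], "Fantasy": [1, 0], "Tragedy": [0, 1], "Social commentary": [0, 0]},
    index=["The Hobbit", "Macbeth"],
)

# Hypothetical reading history for User_001
read = ["The Hobbit", "Macbeth"]

# The profile is the column-wise mean over the books the user has read
user_profile = books.loc[read].mean()
print(user_profile)
```

Each profile value is the share of the user's books that have that attribute. In this toy case that is 0.5 for Adventure, Fantasy and Tragedy, and 0.0 for Social commentary.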
list_of_books_read = ['The Hobbit', 'Foundation', 'Nudge']
user_books = tfidf_summary_df.reindex(list_of_books_read)
print(user_books)
age ancient angry brave battle fellow ...
The Hobbit 0.21 0.53 0.41 0.64 0.01 0.02 ...
Foundation 0.31 0.90 0.42 0.33 0.64 0.04 ...
Nudge 0.61 0.01 0.45 0.31 0.12 0.74 ...
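`reindex` behaves differently from `.loc` when a title is missing from the index, which matters if a reading list contains books without TF-IDF vectors. The sketch below uses a hypothetical two-row table and a hypothetical missing title, "Dune". `reindex` returns an all-NaN row for the missing title, whereas `.loc` raises a `KeyError`.

```python
import pandas as pd

# Hypothetical two-row TF-IDF table
tfidf = pd.DataFrame(
    {"age": [0.21, 0.31], "ancient": [0.53, 0.90]},
    index=["The Hobbit", "Foundation"],
)

# "Dune" is a hypothetical title that is not in the index
titles = ["The Hobbit", "Dune"]

# reindex keeps the requested order and inserts an all-NaN row for missing titles
via_reindex = tfidf.reindex(titles)
print(via_reindex)

# .loc with a missing label raises KeyError instead
try:
    tfidf.loc[titles]
except KeyError as err:
    print("KeyError:", err)

# Dropping all-NaN rows keeps only titles that actually have TF-IDF vectors
present = via_reindex.dropna(how="all")
print(present)
```

`DataFrame.mean()` skips NaN by default, so a missing title does not corrupt the averaged profile. It does silently shrink the set of books the profile is built from, however.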
user_prof = user_books.mean()
print(user_prof)
age 0.376667
ancient 0.480000
angry 0.426667
brave 0.256667
...
print(user_prof.values.reshape(1,-1))
[[0.376667, 0.480000, 0.426667, 0.256667, ...]]
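The `reshape(1, -1)` call is needed because `mean()` returns a 1-D Series. scikit-learn's `cosine_similarity` expects 2-D inputs shaped `(n_samples, n_features)` and rejects 1-D arrays. The following sketch uses a hypothetical three-feature profile to show the shape change. The `-1` tells NumPy to infer the number of columns.

```python
import numpy as np
import pandas as pd

# Hypothetical user profile as returned by DataFrame.mean(): a 1-D Series
user_prof = pd.Series([0.376667, 0.480000, 0.426667], index=["age", "ancient", "angry"])

flat = user_prof.values        # shape (3,): a single axis
row = flat.reshape(1, -1)      # shape (1, 3): one sample, NumPy infers the 3 columns

print(flat.shape, row.shape)
```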
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
# Create a subset of only the books the user has not read
non_user_books = tfidf_summary_df.drop(list_of_books_read, axis=0)
# Calculate the cosine similarity between the user profile and every unread book
user_prof_similarities = cosine_similarity(user_prof.values.reshape(1, -1), non_user_books)
# Wrap in a DataFrame for ease of use (index by the unread books so the lengths match)
user_prof_similarities_df = pd.DataFrame(user_prof_similarities.T, index=non_user_books.index, columns=["similarity_score"])
sorted_similarity_df = user_prof_similarities_df.sort_values(by="similarity_score", ascending=False)
print(sorted_similarity_df)
similarity_score
Title
The Two Towers 0.422488
Dune 0.363540
The Magicians Nephew 0.316075
... ...
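To make the ranking step concrete, the sketch below runs the whole pipeline on a hypothetical four-book TF-IDF table: profile, drop read books, score, and sort. It computes cosine similarity by hand with NumPy (dot product divided by the product of the L2 norms), which is what `sklearn.metrics.pairwise.cosine_similarity` returns for dense inputs. All titles, terms and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy TF-IDF matrix; titles, terms and values are hypothetical
tfidf = pd.DataFrame(
    [[0.9, 0.1, 0.0],
     [0.8, 0.2, 0.1],
     [0.0, 0.1, 0.9],
     [0.7, 0.0, 0.2]],
    index=["Book A", "Book B", "Book C", "Book D"],
    columns=["battle", "brave", "tragic"],
)
read = ["Book A", "Book B"]

# 1. Profile = mean of the read books' vectors, as a (1, n_features) row
profile = tfidf.loc[read].mean().to_numpy().reshape(1, -1)

# 2. Candidates = every book the user has not read
candidates = tfidf.drop(read, axis=0)

# 3. Cosine similarity = dot product / (norm of candidate * norm of profile)
matrix = candidates.to_numpy()
dots = matrix @ profile.T                                     # shape (n_candidates, 1)
norms = np.linalg.norm(matrix, axis=1, keepdims=True) * np.linalg.norm(profile)
scores = dots / norms

# 4. Index by the candidates (not the full table) so the lengths match, then rank
ranked = pd.DataFrame(scores, index=candidates.index, columns=["similarity_score"])
ranked = ranked.sort_values(by="similarity_score", ascending=False)
print(ranked)
```

Book D shares the profile's emphasis on "battle" and ranks first. Book C is dominated by "tragic" and ranks last.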