Building Recommendation Engines in Python
Rob O'Callaghan
Director of Data
book_df
DataFrame:
user | book |
---|---|
User_233 | The Great Gatsby |
User_651 | The Catcher in the Rye |
User_131 | The Lord of the Rings |
User_965 | Little Women |
User_651 | Fifty Shades of Grey |
... | ... |
book_df['book'].value_counts()
40 Shades of Grey 524
Harry Potter and the Sorcerer's Stone 487
The Da Vinci Code 455
The Twilight Saga 401
The Lord of the Rings 278
...
print(book_df['book'].value_counts().index)
Index(['40 Shades of Grey', "Harry Potter and the Sorcerer's Stone",
'The Da Vinci Code', 'The Twilight Saga',
'The Lord of the Rings'],
dtype='object')
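The non-personalized approach above can be sketched end to end as follows. The rows are made-up toy data, and the lowercase column names "user" and "book" and the choice of a top 2 are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy data; the real book_df is far larger.
book_df = pd.DataFrame({
    "user": ["User_233", "User_651", "User_131", "User_965", "User_651", "User_131"],
    "book": [
        "The Great Gatsby",
        "The Lord of the Rings",
        "The Great Gatsby",
        "The Great Gatsby",
        "The Catcher in the Rye",
        "The Lord of the Rings",
    ],
})

# Count how often each book appears; value_counts() sorts by count, descending.
book_counts = book_df["book"].value_counts()

# The index holds the titles in popularity order, so slicing it gives
# the most-read books as a simple recommendation list.
top_books = book_counts.index[:2].tolist()
print(top_books)
```

Every user receives the same list here, which is what makes this a non-personalized recommendation.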
user_ratings
DataFrame:
user | book | rating |
---|---|---|
User_233 | The Great Gatsby | 3.0 |
User_651 | The Catcher in the Rye | 5.0 |
User_131 | The Lord of the Rings | 3.0 |
User_965 | Little Women | 4.0 |
User_651 | Fifty Shades of Grey | 2.0 |
... | ... | ... |
avg_rating_df = user_ratings[["book", "rating"]].groupby(['book']).mean()
avg_rating_df.head()
rating
book
Hamlet 4.1
The Da Vinci Code 2.1
Gone with the Wind 4.2
Fifty Shades of Grey 1.2
Wuthering Heights 3.9
...
sorted_avg_rating_df = avg_rating_df.sort_values(by="rating", ascending=False)
sorted_avg_rating_df.head()
rating
book
The Girl in the Fog 5.0
Behind the Bell 5.0
Across the River and into the Trees 5.0
The Complete McGonagall 5.0
What Is to Be Done? 5.0
...
(user_ratings['book'] == 'The Girl in the Fog').sum()
1
(user_ratings['book'] == 'Valley of the Dolls').sum()
1
(user_ratings['book'] == 'Across the River and into the Trees').sum()
1
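One way to spot this problem in a single step, rather than checking titles one by one, is to compute the review count next to the average rating. This is a minimal sketch on made-up data; the column names are assumed to match the code above.

```python
import pandas as pd

# Hypothetical toy ratings: one book with a single 5.0 review,
# one book with several mixed reviews.
user_ratings = pd.DataFrame({
    "user": ["User_1", "User_2", "User_3", "User_4"],
    "book": ["Hamlet", "Hamlet", "Hamlet", "The Girl in the Fog"],
    "rating": [4.0, 5.0, 3.0, 5.0],
})

# Aggregate mean and count together so a high average can be
# read alongside the number of reviews it is based on.
rating_summary = (
    user_ratings.groupby("book")["rating"]
    .agg(["mean", "count"])
    .sort_values(by="mean", ascending=False)
)
print(rating_summary)
```

The top row has a perfect average but only one review, which is why the average alone is a weak ranking signal.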
book_frequency = user_ratings["book"].value_counts()
print(book_frequency)
40 Shades of Grey 524
Harry Potter and the Sorcerer's Stone 487
...
frequently_reviewed_books = book_frequency[book_frequency > 100].index
print(frequently_reviewed_books)
Index(['The Lord of the Rings', 'To Kill a Mockingbird', 'Of Mice and Men',
'1984', 'Hamlet'])
frequent_books_df = user_ratings[user_ratings["book"].isin(frequently_reviewed_books)]
frequent_books_avgs = frequent_books_df[["book", "rating"]].groupby('book').mean()
print(frequent_books_avgs.sort_values(by="rating", ascending=False).head())
rating
book
To Kill a Mockingbird 4.7
1984 4.7
Harry Potter and the Sorcerer's Stone 4.6
...
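The steps above (count reviews, keep frequently reviewed books, average, sort) could be wrapped in one helper. The function name top_rated_books, its parameters, and the toy data are hypothetical and not part of the original slides; the threshold of 100 mirrors the filter used above.

```python
import pandas as pd

def top_rated_books(ratings, min_reviews=100, n=5):
    """Return the n best-rated books among those with more than min_reviews ratings.

    Hypothetical helper; assumes columns named "book" and "rating".
    """
    # Number of ratings per book.
    counts = ratings["book"].value_counts()
    # Titles that clear the review threshold.
    frequent = counts[counts > min_reviews].index
    # Keep only rows for those titles, then average per book.
    frequent_df = ratings[ratings["book"].isin(frequent)]
    avgs = frequent_df[["book", "rating"]].groupby("book").mean()
    return avgs.sort_values(by="rating", ascending=False).head(n)

# Toy data: two books with three ratings each, one with a single 5.0.
toy_ratings = pd.DataFrame({
    "book": ["1984"] * 3 + ["Hamlet"] * 3 + ["Behind the Bell"],
    "rating": [5.0, 4.0, 5.0, 4.0, 4.0, 3.0, 5.0],
})

result = top_rated_books(toy_ratings, min_reviews=2)
print(result)
```

With the threshold lowered to 2 for the toy data, the single-review book is dropped even though its average is the highest.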