Building Recommendation Engines in Python
Robert O'Callaghan
Director of Data
0  User_223  The Great Gatsby        <---| Read by the same user
1  User_223  The Catcher in the Rye  <---|
2  User_131  The Lord of the Rings
3  User_965  Little Women            <---| Read by the same user
4  User_965  Fifty Shades of Grey    <---|
...
From:
| User | book_title |
|---|---|
| User_223 | The Great Gatsby |
| User_223 | The Catcher in the Rye |
To:
| | Book A | Book B |
|---|---|---|
| 0 | The Great Gatsby | The Catcher in the Rye |
| 1 | The Catcher in the Rye | The Great Gatsby |
Books seen with The Great Gatsby
-> The Catcher in the Rye
Books seen with The Catcher in the Rye
-> The Great Gatsby
from itertools import permutations

def create_pairs(x):
    pairs = permutations(x.values, 2)
    return pairs

permutations(list, length_of_permutations)
Generates an iterable object containing all permutations

from itertools import permutations

def create_pairs(x):
    pairs = list(permutations(x.values, 2))
    return pairs

list()
Converts this object to a usable list
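
The difference between the iterable and the list can be seen in a small sketch. The three titles here are illustrative only; the point is that the object returned by permutations is lazy and can be consumed only once, which is why it is wrapped in list() before further use.

```python
from itertools import permutations

# Illustrative reading list for a single user (not taken from the dataset)
titles = ["The Great Gatsby", "The Catcher in the Rye", "Little Women"]

lazy_pairs = permutations(titles, 2)  # an iterator; nothing is stored yet
first_pass = list(lazy_pairs)         # materialises every ordered pair
second_pass = list(lazy_pairs)        # the iterator is already exhausted

print(len(first_pass))   # 3 books * 2 remaining choices = 6 ordered pairs
print(len(second_pass))  # 0, because the iterator was consumed above
```

Both (A, B) and (B, A) appear in the result, which is what allows a lookup by either book later on.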
from itertools import permutations
import pandas as pd

def create_pairs(x):
    # Build every ordered pair of books read by one user
    pairs = pd.DataFrame(list(permutations(x.values, 2)),
                         columns=['book_a', 'book_b'])
    return pairs

pd.DataFrame()
Converts the list to a DataFrame containing the columns book_a
and book_b
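
As a quick check of the finished function, it can be called on a single user's titles before it is applied to the whole dataset. This is a minimal sketch: the Series one_user is a hand-made stand-in for one group of book_title values, not part of the course data.

```python
from itertools import permutations
import pandas as pd

def create_pairs(x):
    # Same function as above: every ordered pair of one user's books
    pairs = pd.DataFrame(list(permutations(x.values, 2)),
                         columns=['book_a', 'book_b'])
    return pairs

# Hypothetical single-user input, mimicking one group of book_title values
one_user = pd.Series(['The Great Gatsby', 'The Catcher in the Rye'],
                     name='book_title')

pairs = create_pairs(one_user)
print(pairs)
```

With two books the result has two rows, one for each ordering of the pair.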
book_pairs = book_df.groupby('userId')['book_title'].apply(create_pairs)
print(book_pairs.head())
            book_a                  book_b
userId
User_223 0  The Great Gatsby        The Catcher in the Rye
         1  The Catcher in the Rye  The Great Gatsby
User_965 0  Little Women            Fifty Shades of Grey
         1  Fifty Shades of Grey    Little Women
User_773 0  The Twilight Saga       Harry Potter and the Sorcerer's Stone
...
book_pairs = book_pairs.reset_index(drop=True)
print(book_pairs.head())
   book_a                  book_b
0  The Great Gatsby        The Catcher in the Rye
1  The Catcher in the Rye  The Great Gatsby
2  Little Women            Fifty Shades of Grey
3  Fifty Shades of Grey    Little Women
4  The Twilight Saga       Harry Potter and the Sorcerer's Stone
...
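
The grouping and index reset can be tried end to end on a toy table. This is a sketch under assumptions: the four rows of book_df below copy the example rows shown at the start and stand in for the real dataset, and every user has at least two books.

```python
from itertools import permutations
import pandas as pd

def create_pairs(x):
    # Every ordered pair of one user's books
    return pd.DataFrame(list(permutations(x.values, 2)),
                        columns=['book_a', 'book_b'])

# Toy stand-in for book_df (assumed values, not the course dataset)
book_df = pd.DataFrame({
    'userId': ['User_223', 'User_223', 'User_965', 'User_965'],
    'book_title': ['The Great Gatsby', 'The Catcher in the Rye',
                   'Little Women', 'Fifty Shades of Grey'],
})

# One small DataFrame of pairs per user, stacked under a userId index level
book_pairs = book_df.groupby('userId')['book_title'].apply(create_pairs)

# The user is no longer needed once the pairs exist, so drop that index
book_pairs = book_pairs.reset_index(drop=True)
print(book_pairs)
```

Dropping the index with drop=True discards the userId level rather than turning it into a column, leaving only book_a and book_b.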
pair_counts = book_pairs.groupby(['book_a', 'book_b']).size()
book_a             book_b
The Twilight Saga  Fifty Shades of Grey    16
                   Pride and Prejudice     12
...
pair_counts_df = pair_counts.to_frame(name = 'size').reset_index()
print(pair_counts_df.head())
   book_a             book_b                size
1  The Twilight Saga  Fifty Shades of Grey  16
2  The Twilight Saga  Pride and Prejudice   12
...
pair_counts_sorted = pair_counts_df.sort_values('size', ascending=False)
pair_counts_sorted[pair_counts_sorted['book_a'] == 'Lord of the Rings']
     book_a             book_b                                 size
137  Lord of the Rings  The Hobbit                             12
147  Lord of the Rings  Harry Potter and the Sorcerer's Stone  10
143  Lord of the Rings  The Colour of Magic                    9
...
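
The counting, sorting and filtering steps can be wrapped into a small lookup. This is a sketch, not the course's own code: the helper most_seen_with and the single-letter titles are hypothetical, and the hand-made book_pairs table stands in for the one built above.

```python
import pandas as pd

# Hand-made pair table standing in for book_pairs (illustrative values only)
book_pairs = pd.DataFrame({
    'book_a': ['A', 'A', 'A', 'B', 'B', 'C'],
    'book_b': ['B', 'B', 'C', 'A', 'A', 'A'],
})

# Count how often each ordered pair occurs, then flatten to plain columns
pair_counts_df = (book_pairs.groupby(['book_a', 'book_b'])
                  .size()
                  .to_frame(name='size')
                  .reset_index())

def most_seen_with(title, counts, n=3):
    """Hypothetical helper: the n books most often paired with `title`."""
    matches = counts[counts['book_a'] == title]
    return matches.sort_values('size', ascending=False).head(n)

top = most_seen_with('A', pair_counts_df)
print(top)
```

Because every pair was stored in both orders, filtering on book_a alone is enough to find everything read alongside a given title.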