Building Recommendation Engines in Python
Robert O'Callaghan
Director of Data
0  User_223  The Great Gatsby        <---| Read by the same user
1  User_223  The Catcher in the Rye  <---|
2  User_131  The Lord of the Rings
3  User_965  Little Women            <---| Read by the same user
4  User_965  Fifty Shades of Grey    <---|
... ...
| userId | book_title |
|---|---|
| User_223 | The Great Gatsby |
| User_223 | The Catcher in the Rye |
To:
| | book_a | book_b |
|---|---|---|
| 0 | The Great Gatsby | The Catcher in the Rye |
| 1 | The Catcher in the Rye | The Great Gatsby |
Books seen with The Great Gatsby -> The Catcher in the Rye
Books seen with The Catcher in the Rye -> The Great Gatsby
from itertools import permutations

def create_pairs(x):
    pairs = permutations(x.values, 2)
    return pairs
permutations(list, length_of_permutations): creates an iterable of all permutations of the given length

from itertools import permutations

def create_pairs(x):
    pairs = list(permutations(x.values, 2))
    return pairs
permutations(list, length_of_permutations): creates an iterable of all permutations of the given length
list(): converts the iterable into a usable list
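To see why `permutations` is used here rather than an order-insensitive alternative, the short sketch below compares it with `itertools.combinations` on a hypothetical three-book reading list. The list `books` is made up for illustration.

```python
from itertools import combinations, permutations

# Hypothetical reading list for a single user
books = ["The Great Gatsby", "The Catcher in the Rye", "Little Women"]

# permutations yields ordered pairs, so (A, B) and (B, A) both appear
ordered_pairs = list(permutations(books, 2))

# combinations yields each unordered pair only once, for comparison
unordered_pairs = list(combinations(books, 2))

print(len(ordered_pairs))    # 3 * 2 = 6 ordered pairs
print(len(unordered_pairs))  # 3 unordered pairs
```

Keeping both orders means every book can later be looked up in the first column, whichever of the two titles the reader starts from.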
from itertools import permutations
import pandas as pd

def create_pairs(x):
    # x holds the book titles read by a single user
    pairs = pd.DataFrame(list(permutations(x.values, 2)),
                         columns=['book_a', 'book_b'])
    return pairs
permutations(list, length_of_permutations): creates an iterable of all permutations of the given length
list(): converts the iterable into a usable list
pd.DataFrame(): converts the list into a DataFrame with the columns book_a and book_b
book_pairs = book_df.groupby('userId')['book_title'].apply(create_pairs)
print(book_pairs.head())
             book_a                  book_b
userId
User_223  0  The Great Gatsby        The Catcher in the Rye
          1  The Catcher in the Rye  The Great Gatsby
User_965  0  Little Women            Fifty Shades of Grey
          1  Fifty Shades of Grey    Little Women
User_773  0  The Twilight Saga       Harry Potter and the Sorcerer's Stone
...
book_pairs = book_pairs.reset_index(drop=True)
print(book_pairs.head())
   book_a                  book_b
0  The Great Gatsby        The Catcher in the Rye
1  The Catcher in the Rye  The Great Gatsby
2  Little Women            Fifty Shades of Grey
3  Fifty Shades of Grey    Little Women
4  The Twilight Saga       Harry Potter and the Sorcerer's Stone
...
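The grouping and flattening steps above can be sketched end to end on the five sample rows shown at the start. This is a minimal sketch, assuming the input DataFrame is called `book_df` and has the columns `userId` and `book_title`; the rows are typed in by hand for illustration.

```python
from itertools import permutations

import pandas as pd


def create_pairs(x):
    # Every ordered pair of titles read by one user
    return pd.DataFrame(list(permutations(x.values, 2)),
                        columns=["book_a", "book_b"])


# Hand-typed sample rows (illustrative only)
book_df = pd.DataFrame({
    "userId": ["User_223", "User_223", "User_131", "User_965", "User_965"],
    "book_title": ["The Great Gatsby", "The Catcher in the Rye",
                   "The Lord of the Rings", "Little Women",
                   "Fifty Shades of Grey"],
})

# One small DataFrame of pairs per user, stacked under a (userId, row) index
book_pairs = book_df.groupby("userId")["book_title"].apply(create_pairs)

# Drop the (userId, row) index so the rows are numbered 0..n-1
book_pairs = book_pairs.reset_index(drop=True)
print(book_pairs)
```

User_131 read only one book, so that user contributes no pairs; the two users with two books each contribute two ordered pairs.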
pair_counts = book_pairs.groupby(['book_a', 'book_b']).size()
book_a             book_b
The Twilight Saga  Fifty Shades of Grey    16
                   Pride and Prejudice     12
...
pair_counts_df = pair_counts.to_frame(name='size').reset_index()
print(pair_counts_df.head())
   book_a             book_b                size
0  The Twilight Saga  Fifty Shades of Grey  16
1  The Twilight Saga  Pride and Prejudice   12
...
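The counting step can be tried on a small pairs table. The sketch below uses a hand-made `book_pairs` DataFrame (hypothetical data, not the course dataset) so the resulting counts are easy to check by eye.

```python
import pandas as pd

# Toy, hand-made pairs table (hypothetical data)
book_pairs = pd.DataFrame({
    "book_a": ["The Hobbit", "Lord of the Rings", "The Hobbit",
               "Lord of the Rings", "The Hobbit", "Little Women"],
    "book_b": ["Lord of the Rings", "The Hobbit", "Lord of the Rings",
               "The Hobbit", "Little Women", "The Hobbit"],
})

# Number of times each ordered pair occurs (a Series with a two-level index)
pair_counts = book_pairs.groupby(["book_a", "book_b"]).size()

# Back to a flat DataFrame with columns book_a, book_b, size
pair_counts_df = pair_counts.to_frame(name="size").reset_index()
print(pair_counts_df)

# Use bracket access for the counts: pair_counts_df.size is the
# built-in DataFrame attribute (number of cells), not the column.
print(pair_counts_df["size"].tolist())
```

Because the pairs were generated as permutations, each pair appears in both directions with the same count.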
pair_counts_sorted = pair_counts_df.sort_values('size', ascending=False)
pair_counts_sorted[pair_counts_sorted['book_a'] == 'Lord of the Rings']
     book_a             book_b                                 size
137  Lord of the Rings  The Hobbit                             12
147  Lord of the Rings  Harry Potter and the Sorcerer's Stone  10
143  Lord of the Rings  The Colour of Magic                    9
...
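The lookup above could be wrapped in a small helper. This is only a sketch: `recommend` is a hypothetical function name, not part of the slides, and the `Little Women` row in the sample table is made up to show that other books are filtered out.

```python
import pandas as pd


def recommend(pair_counts_df, book, n=3):
    """Hypothetical helper: the n titles most often seen with `book`."""
    matches = pair_counts_df[pair_counts_df["book_a"] == book]
    top = matches.sort_values("size", ascending=False).head(n)
    return top["book_b"].tolist()


# Small sample table; the last row is invented for illustration
pair_counts_df = pd.DataFrame({
    "book_a": ["Lord of the Rings", "Lord of the Rings",
               "Lord of the Rings", "Little Women"],
    "book_b": ["The Colour of Magic", "The Hobbit",
               "Harry Potter and the Sorcerer's Stone",
               "Pride and Prejudice"],
    "size": [9, 12, 10, 4],
})

print(recommend(pair_counts_df, "Lord of the Rings", n=2))
```

A title that never appears in `book_a` simply yields an empty list, which a caller might treat as "no recommendation available".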