Building Recommendation Engines in Python
Robert O'Callaghan
Director of Data
0  User_223  The Great Gatsby        <---| Read by the same user
1  User_223  The Catcher in the Rye  <---|
2  User_131  The Lord of the Rings
3  User_965  Little Women            <---| Read by the same user
4  User_965  Fifty Shades of Grey    <---|
...  ...
| User | book_title |
|---|---|
| User_223 | The Great Gatsby |
| User_223 | The Catcher in the Rye |
Into:
| | Book A | Book B |
|---|---|---|
| 0 | The Great Gatsby | The Catcher in the Rye |
| 1 | The Catcher in the Rye | The Great Gatsby |
Books seen with The Great Gatsby -> The Catcher in the Rye
Books seen with The Catcher in the Rye -> The Great Gatsby
from itertools import permutations

def create_pairs(x):
    # Lazily generate every ordered pair of the books in x
    pairs = permutations(x.values, 2)
    return pairs
permutations(list, length_of_permutations) Generates an iterable object containing all permutations

from itertools import permutations

def create_pairs(x):
    # Materialize the iterator so the pairs can be reused
    pairs = list(permutations(x.values, 2))
    return pairs
permutations(list, length_of_permutations) Generates an iterable object containing all permutations
list() Converts the iterable into a usable list
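The difference between the lazy iterator and the list can be seen in a small sketch. The `books_read` list is a hypothetical stand-in for one user's books, using titles from the sample data above; it is an illustration, not part of the original slides.

```python
from itertools import permutations

# Hypothetical reading list for one user (titles taken from the sample data)
books_read = ["The Great Gatsby", "The Catcher in the Rye"]

# permutations() returns a lazy iterator, not a list
lazy_pairs = permutations(books_read, 2)

# list() consumes the iterator and stores every ordered pair
ordered_pairs = list(lazy_pairs)
print(ordered_pairs)

# The iterator is now exhausted, so a second pass yields nothing
leftover = list(lazy_pairs)
print(leftover)
```

Both orderings of each pair appear, which is what lets every book be looked up later as the "first" book of a pair.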
from itertools import permutations
import pandas as pd

def create_pairs(x):
    # Build a two-column DataFrame with one row per ordered pair
    pairs = pd.DataFrame(list(permutations(x.values, 2)),
                         columns=['book_a', 'book_b'])
    return pairs
permutations(list, length_of_permutations) Generates an iterable object containing all permutations
list() Converts the iterable into a usable list
pd.DataFrame() Converts the list into a DataFrame with the columns book_a and book_b
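As a quick check of the finished function, it can be called on a single user's titles before it is applied to every group. The `one_user` Series below is a hypothetical example (the three-book reading list is assumed, not taken from the dataset).

```python
from itertools import permutations
import pandas as pd

def create_pairs(x):
    # Build a two-column DataFrame with one row per ordered pair
    pairs = pd.DataFrame(list(permutations(x.values, 2)),
                         columns=['book_a', 'book_b'])
    return pairs

# Hypothetical reading list for one user
one_user = pd.Series(["Little Women", "Fifty Shades of Grey", "The Hobbit"])

pairs = create_pairs(one_user)
print(pairs)
```

Three books give 3 x 2 = 6 ordered pairs, so the number of rows grows roughly with the square of a user's reading list.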
book_pairs = book_df.groupby('userId')['book_title'].apply(create_pairs)
print(book_pairs.head())
            book_a                  book_b
userId
User_223 0  The Great Gatsby        The Catcher in the Rye
         1  The Catcher in the Rye  The Great Gatsby
User_965 0  Little Women            Fifty Shades of Grey
         1  Fifty Shades of Grey    Little Women
User_773 0  The Twilight Saga       Harry Potter and the Sorcerer's Stone
...
book_pairs = book_pairs.reset_index(drop=True)
print(book_pairs.head())
   book_a                  book_b
0  The Great Gatsby        The Catcher in the Rye
1  The Catcher in the Rye  The Great Gatsby
2  Little Women            Fifty Shades of Grey
3  Fifty Shades of Grey    Little Women
4  The Twilight Saga       Harry Potter and the Sorcerer's Stone
...
pair_counts = book_pairs.groupby(['book_a', 'book_b']).size()
book_a             book_b
The Twilight Saga  Fifty Shades of Grey    16
                   Pride and Prejudice     12
...
pair_counts_df = pair_counts.to_frame(name = 'size').reset_index()
print(pair_counts_df.head())
   book_a             book_b                size
0  The Twilight Saga  Fifty Shades of Grey  16
1  The Twilight Saga  Pride and Prejudice   12
...
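The steps so far (pairing per user, flattening the index, counting pairs) can be run end to end on a tiny dataset. The sketch below uses a hypothetical `book_df` with three made-up users, each assumed to have read at least two books; the column names follow the ones used above.

```python
from itertools import permutations
import pandas as pd

# Hypothetical toy data: three users, each with at least two books
book_df = pd.DataFrame({
    'userId': ['User_1', 'User_1',
               'User_2', 'User_2', 'User_2',
               'User_3', 'User_3'],
    'book_title': ['The Hobbit', 'Little Women',
                   'The Hobbit', 'Little Women', 'The Great Gatsby',
                   'The Hobbit', 'The Great Gatsby'],
})

def create_pairs(x):
    # One row per ordered pair of books read by the same user
    return pd.DataFrame(list(permutations(x.values, 2)),
                        columns=['book_a', 'book_b'])

# Pair up the books within each user, then drop the (userId, row) index
book_pairs = book_df.groupby('userId')['book_title'].apply(create_pairs)
book_pairs = book_pairs.reset_index(drop=True)

# Count how many users produced each (book_a, book_b) pair
pair_counts = book_pairs.groupby(['book_a', 'book_b']).size()

# Turn the counts back into a flat DataFrame
pair_counts_df = pair_counts.to_frame(name='size').reset_index()
print(pair_counts_df)
```

Because each user contributes a given pair at most once here, `size` can be read as the number of users who read both books.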
pair_counts_sorted = pair_counts_df.sort_values('size', ascending=False)
pair_counts_sorted[pair_counts_sorted['book_a'] == 'Lord of the Rings']
     book_a             book_b                                 size
137  Lord of the Rings  The Hobbit                             12
147  Lord of the Rings  Harry Potter and the Sorcerer's Stone  10
143  Lord of the Rings  The Colour of Magic                    9
...
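The lookup above can be wrapped in a small helper. The function `most_seen_with` is a hypothetical name introduced for illustration, and the `pair_counts_df` below is rebuilt by hand from the rows shown in the outputs above rather than from the full dataset.

```python
import pandas as pd

# Rows copied from the outputs shown above (deliberately unsorted)
pair_counts_df = pd.DataFrame({
    'book_a': ['Lord of the Rings', 'The Twilight Saga', 'Lord of the Rings',
               'The Twilight Saga', 'Lord of the Rings'],
    'book_b': ['The Colour of Magic', 'Fifty Shades of Grey', 'The Hobbit',
               'Pride and Prejudice', "Harry Potter and the Sorcerer's Stone"],
    'size': [9, 16, 12, 12, 10],
})

def most_seen_with(book, counts, n=3):
    """Hypothetical helper: the n titles most often paired with `book`."""
    matches = counts[counts['book_a'] == book]
    ranked = matches.sort_values('size', ascending=False)
    return ranked['book_b'].head(n).tolist()

top = most_seen_with('Lord of the Rings', pair_counts_df, n=2)
print(top)
```

The title must match `book_a` exactly, so a lookup for a title spelled differently from the data returns an empty list.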