Non-personalized suggestions

Building Recommendation Engines in Python

Robert O'Callaghan

Director of Data

Identifying pairs

0     User_223           The Great Gatsby <---| Read by the same user
1     User_223     The Catcher in the Rye <---|
2     User_131        The Lord of the Rings
3     User_965               Little Women <---| Read by the same user
4     User_965       Fifty Shades of Grey <---|
... ...

Permutations versus combinations

From:

User        book_title
User_223    The Great Gatsby
User_223    The Catcher in the Rye

To:

   Book A                    Book B
0  The Great Gatsby          The Catcher in the Rye
1  The Catcher in the Rye    The Great Gatsby

Books seen with The Great Gatsby -> The Catcher in the Rye

Books seen with The Catcher in the Rye -> The Great Gatsby
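
To make the distinction concrete, here is a small sketch comparing the two itertools functions on the two titles above. The variable names are illustrative only. Permutations keep both orderings, which is what lets each book be looked up as the starting title.

```python
from itertools import combinations, permutations

books = ["The Great Gatsby", "The Catcher in the Rye"]

# Permutations are ordered, so the pair appears once in each direction.
ordered_pairs = list(permutations(books, 2))

# Combinations ignore order, so the pair appears only once.
unordered_pairs = list(combinations(books, 2))

print(ordered_pairs)
print(unordered_pairs)
```

With combinations, only one of the two lookups above would be possible without extra work.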


Creating the pairing function

from itertools import permutations

def create_pairs(x):


  return pairs

Creating the pairing function

from itertools import permutations

def create_pairs(x):
  pairs = permutations(x.values, 2)

  return pairs
  • permutations(list, length_of_permutations) Generates an iterable object containing all permutations

Creating the pairing function

from itertools import permutations

def create_pairs(x):
  pairs = list(permutations(x.values, 2))

  return pairs
  • permutations(list, length_of_permutations) Generates an iterable object containing all permutations

  • list() Converts this object to a usable list


Creating the pairing function

from itertools import permutations
import pandas as pd

def create_pairs(x):
  # Build every ordered pair of titles, then label the two columns
  pairs = pd.DataFrame(list(permutations(x.values, 2)),
                       columns=['book_a', 'book_b'])
  return pairs
  • permutations(list, length_of_permutations) Generates an iterable object containing all permutations

  • list() Converts this object to a usable list

  • pd.DataFrame() Converts the list to a DataFrame containing the columns book_a and book_b
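
As a quick check of the finished function, the sketch below runs it on one user's titles. The Series `one_user` is a made-up stand-in for a single group of `book_title` values, not data from the course.

```python
from itertools import permutations

import pandas as pd


def create_pairs(x):
    # Every ordered pair of the values in x, as a two-column DataFrame.
    pairs = pd.DataFrame(list(permutations(x.values, 2)),
                         columns=['book_a', 'book_b'])
    return pairs


# Hypothetical reading list for a single user (illustrative only).
one_user = pd.Series(["The Great Gatsby",
                      "The Catcher in the Rye",
                      "Little Women"])

user_pairs = create_pairs(one_user)
print(user_pairs)
```

Three titles give 3 x 2 = 6 ordered pairs, and a book is never paired with itself.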


Applying the function to the data

book_pairs = book_df.groupby('userId')['book_title'].apply(create_pairs)
print(book_pairs.head())
                                book_a                                   book_b
userId                                                   
User_223     0        The Great Gatsby                   The Catcher in the Rye
             1  The Catcher in the Rye                         The Great Gatsby
User_965     0            Little Women                     Fifty Shades of Grey
             1    Fifty Shades of Grey                             Little Women
User_773     0       The Twilight Saga    Harry Potter and the Sorcerer's Stone
                                                                            ...
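
The grouped apply can be tried end to end on a tiny frame. In this sketch, `book_df` is a hypothetical five-row stand-in built from the example rows at the start of the section; the real dataset is assumed to have the same `userId` and `book_title` columns.

```python
from itertools import permutations

import pandas as pd


def create_pairs(x):
    # Every ordered pair of the values in x, as a two-column DataFrame.
    return pd.DataFrame(list(permutations(x.values, 2)),
                        columns=['book_a', 'book_b'])


# Hypothetical stand-in for book_df (illustrative rows only).
book_df = pd.DataFrame({
    'userId': ['User_223', 'User_223', 'User_131', 'User_965', 'User_965'],
    'book_title': ['The Great Gatsby', 'The Catcher in the Rye',
                   'The Lord of the Rings', 'Little Women',
                   'Fifty Shades of Grey'],
})

# One small DataFrame of pairs per user, stacked under a (userId, row) index.
book_pairs = book_df.groupby('userId')['book_title'].apply(create_pairs)
print(book_pairs)
```

User_131 read a single book, so that user contributes no pairs to the result.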

Cleaning up the results

book_pairs = book_pairs.reset_index(drop=True)
print(book_pairs.head())
                     book_a                                   book_b
0          The Great Gatsby                   The Catcher in the Rye
1    The Catcher in the Rye                         The Great Gatsby
2              Little Women                     Fifty Shades of Grey
3      Fifty Shades of Grey                             Little Women
4         The Twilight Saga    Harry Potter and the Sorcerer's Stone
                                                                 ...

Counting the pairings

pair_counts = book_pairs.groupby(['book_a', 'book_b']).size()
book_a                                book_b                             
The Twilight Saga                     Fifty Shades of Grey           16
                                      Pride and Prejudice            12
                                                                    ...
pair_counts_df = pair_counts.to_frame(name='size').reset_index()
print(pair_counts_df.head())
     book_a                                book_b                       size    
0    The Twilight Saga                     Fifty Shades of Grey           16
1    The Twilight Saga                     Pride and Prejudice            12
                                                                         ...
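
The counting step can be sketched on a hand-built frame. The four rows of `book_pairs` below are invented for illustration, so the counts are not the ones shown above.

```python
import pandas as pd

# Hypothetical pairs (illustrative only): one pairing occurs twice.
book_pairs = pd.DataFrame({
    'book_a': ['The Twilight Saga', 'The Twilight Saga',
               'The Twilight Saga', 'Pride and Prejudice'],
    'book_b': ['Fifty Shades of Grey', 'Fifty Shades of Grey',
               'Pride and Prejudice', 'The Twilight Saga'],
})

# size() counts the rows in each (book_a, book_b) group.
pair_counts = book_pairs.groupby(['book_a', 'book_b']).size()

# Name the counts column and move the group keys back into columns.
pair_counts_df = pair_counts.to_frame(name='size').reset_index()
print(pair_counts_df)
```

Because the column is named size, it has to be read as pair_counts_df['size']; the attribute pair_counts_df.size is the DataFrame's element count instead.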

Looking up recommendations

pair_counts_sorted = pair_counts_df.sort_values('size', ascending=False)
pair_counts_sorted[pair_counts_sorted['book_a'] == 'Lord of the Rings']
                  book_a                                     book_b size
137    Lord of the Rings                                 The Hobbit   12
147    Lord of the Rings      Harry Potter and the Sorcerer's Stone   10
143    Lord of the Rings                        The Colour of Magic    9
                                                                     ...
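
The lookup can be wrapped in a small function. In this sketch, `most_paired_with` is a hypothetical helper name, and the counts frame reuses the three rows shown above plus one invented, unrelated row.

```python
import pandas as pd

# Illustrative counts: the three rows above plus one made-up unrelated pair.
pair_counts_df = pd.DataFrame({
    'book_a': ['Lord of the Rings', 'Little Women',
               'Lord of the Rings', 'Lord of the Rings'],
    'book_b': ['The Colour of Magic', 'Pride and Prejudice',
               'The Hobbit', "Harry Potter and the Sorcerer's Stone"],
    'size': [9, 4, 12, 10],
})


def most_paired_with(counts, title, n=3):
    """Hypothetical helper: the n titles most often read alongside `title`."""
    matches = counts[counts['book_a'] == title]
    top = matches.sort_values('size', ascending=False).head(n)
    return top['book_b'].tolist()


suggestions = most_paired_with(pair_counts_df, 'Lord of the Rings', n=2)
print(suggestions)
```

A title that never appears in book_a returns an empty list, so the caller needs a fallback for unseen books.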

Let's practice!
