Dealing with sparsity

Building Recommendation Engines in Python

Rob O'Callaghan

Director of Data

Sparse matrices

Small not sparse matrix

Sparse matrices

Small not sparse matrix and large sparse matrix

Measuring sparsity

print(user_ratings_df)
title     The Great Gatsby    The Catcher in the Rye    Fifty Shades of Grey
User                    
User_233               3.0                       NaN                     NaN
User_651               NaN                       5.0                     4.0
User_965               4.0                       3.0                     NaN
     ...               ...                       ...                     ...

Measuring sparsity

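# Count the cells that hold no rating (NaN)
number_of_empty = user_ratings_df.isnull().values.sum()

# Total number of cells (rows * columns) in the same DataFrame
total_number = user_ratings_df.size
sparsity = number_of_empty / total_number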
print(sparsity)
0.0114
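
The same calculation can be tried on a tiny, self-contained DataFrame. This is only a sketch: toy_ratings_df and its values are made up for illustration (loosely modelled on the three rows printed earlier) and are not the course dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings; NaN marks a book the user has not rated.
toy_ratings_df = pd.DataFrame(
    {
        "The Great Gatsby": [3.0, np.nan, 4.0],
        "The Catcher in the Rye": [np.nan, 5.0, 3.0],
        "Fifty Shades of Grey": [np.nan, 4.0, np.nan],
    },
    index=["User_233", "User_651", "User_965"],
)

# Count the empty cells, then divide by the total number of cells.
number_of_empty = toy_ratings_df.isnull().values.sum()
total_number = toy_ratings_df.size
toy_sparsity = number_of_empty / total_number

print(number_of_empty, total_number)  # 4 empty cells out of 9
print(round(toy_sparsity, 3))
```

Here sparsity is the fraction of cells that are empty, so a value close to 1 means almost no ratings are present.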

Why sparsity matters

Large sparse matrix

Why sparsity matters

Large sparse matrix with empty cell highlighted

Why sparsity matters

Large sparse matrix with nearest populated neighbors highlighted

Measuring sparsity per column

user_ratings_df.notnull().sum()
The Pelican Brief                           1
Snow Crash                                  1
The Great Gatsby                           12
Fifty Shades of Grey                        9
Leviathan                                   1
                                           ..
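
A runnable version of the per-column count, again on a made-up toy DataFrame (toy_ratings_df is a hypothetical name, not the course data). Sorting the counts is an added convenience for spotting thinly rated titles and is not shown on the slide.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings; NaN marks a missing rating.
toy_ratings_df = pd.DataFrame(
    {
        "The Great Gatsby": [3.0, np.nan, 4.0],
        "The Catcher in the Rye": [np.nan, 5.0, 3.0],
        "Fifty Shades of Grey": [np.nan, 4.0, np.nan],
    },
    index=["User_233", "User_651", "User_965"],
)

# notnull() gives a True/False mask; summing it counts ratings per column.
occupied_count = toy_ratings_df.notnull().sum()

# Sort so the most-rated titles come first and sparse columns sink to the end.
occupied_count = occupied_count.sort_values(ascending=False)
print(occupied_count)
```

Columns with only one or two ratings give neighbor-based methods very little to compare against, which is the problem the preceding slides illustrate.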

Matrix factorization

Large sparse matrix

Matrix factorization

Large sparse matrix next to its factors

Matrix factorization

Large sparse matrix next to its factors and a large filled in matrix
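
The idea in the picture can be sketched with a small example. The two factor matrices below are hand-picked for illustration so that their product agrees with the known ratings; they are not the output of any factorization algorithm, and all names and values are hypothetical.

```python
import numpy as np

# Hypothetical sparse ratings matrix (3 users x 3 items); NaN = no rating.
sparse_ratings = np.array([
    [3.0, np.nan, np.nan],
    [np.nan, 4.0, 2.0],
    [4.0, 5.0, np.nan],
])

# Hand-picked factors: one row per user, one column per item.
user_factors = np.array([
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
])
item_factors = np.array([
    [3.0, 1.0, 2.0],
    [1.0, 4.0, 2.0],
])

# Multiplying the factors gives a matrix with every cell filled in.
filled_ratings = user_factors @ item_factors
print(filled_ratings)

# The product reproduces the ratings that were already known.
known = ~np.isnan(sparse_ratings)
print(np.allclose(filled_ratings[known], sparse_ratings[known]))
```

The cells that were NaN in sparse_ratings now hold values in filled_ratings, and those values can be read as estimated ratings.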

Matrix multiplication

Two rectangular matrices

Matrix multiplication

Two rectangular matrices next to a larger matrix that is the product of the matrices

Matrix multiplication

print(matrix_x)
[[4, 1], 
 [2, 2], 
 [3, 3]]
print(matrix_b)
[[1, 0, 4], 
 [0, 1, 6]]

Matrix multiplication

import numpy as np

dot_product = np.dot(matrix_x, matrix_b)
print(dot_product)
[[ 4  1 22]
 [ 2  2 20]
 [ 3  3 30]]
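
A self-contained version of the same multiplication, assuming matrix_x and matrix_b are built as NumPy arrays from the values printed earlier. It also spells out the shape rule and computes one cell by hand.

```python
import numpy as np

matrix_x = np.array([[4, 1],
                     [2, 2],
                     [3, 3]])       # shape (3, 2)
matrix_b = np.array([[1, 0, 4],
                     [0, 1, 6]])    # shape (2, 3)

# The inner dimensions must match: columns of matrix_x == rows of matrix_b.
assert matrix_x.shape[1] == matrix_b.shape[0]

# For 2-D arrays, np.dot and the @ operator give the same matrix product.
dot_product = np.dot(matrix_x, matrix_b)
same_product = matrix_x @ matrix_b

# Each cell is a row of matrix_x times a column of matrix_b, summed:
# row 0, column 2 -> 4*4 + 1*6 = 22
cell_0_2 = (matrix_x[0, :] * matrix_b[:, 2]).sum()

print(dot_product.shape)  # (3, 3): rows of matrix_x by columns of matrix_b
print(cell_0_2)
```

The result has as many rows as the first matrix and as many columns as the second, which is why two thin factor matrices can produce a large filled-in matrix.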

Let's practice!
