Overview of binary, implicit ratings

Building Recommendation Engines with PySpark

Jamen Long

Data Scientist at Nike

Binary ratings

binary_movie_ratings.show()
+------+-------+-------------+
|userId|movieId|binary_rating|
+------+-------+-------------+
|    26|    474|            0|
|    26|   2529|            1|
|    26|     26|            0|
|    26|   1950|            0|
|    26|   4823|            1|
|    26|  72011|            1|
|    26| 142507|            0|
|    38|   1325|            0|
|    38|   6011|            1|
+------+-------+-------------+
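The table above holds one row per (user, movie) pair, including pairs rated 0. A minimal sketch of how such rows could be derived, assuming 1 means "seen" and 0 means "not seen" (the slide does not spell this out). It is written in plain Python so it runs without a Spark session; the `views` set and the `to_binary_ratings` helper are hypothetical names, not part of the course code.

```python
from itertools import product

# Hypothetical viewing log: (userId, movieId) pairs that a user has watched.
views = {(26, 2529), (26, 4823), (38, 6011)}

def to_binary_ratings(views):
    """Return one row per (user, movie) pair: 1 if watched, else 0."""
    users = sorted({u for u, _ in views})
    movies = sorted({m for _, m in views})
    return [(u, m, 1 if (u, m) in views else 0) for u, m in product(users, movies)]

binary_rows = to_binary_ratings(views)
for row in binary_rows:
    print(row)
```

Filling in the zeros explicitly is what makes the data so lopsided: every movie a user has not watched becomes a row of its own.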

Class imbalance

getSparsity(binary_ratings)
Sparsity: .993
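The slide does not show how `getSparsity` is defined. One plausible reading, assumed here, is the share of entries whose binary rating is 0, in which case .993 would mean roughly 99.3% of rows are zeros and a model that always predicts 0 would look highly accurate. The `get_sparsity` function below is a hypothetical plain-Python stand-in for illustration, not the course's implementation.

```python
def get_sparsity(binary_rows):
    """Fraction of (user, movie) entries whose binary rating is 0."""
    total = len(binary_rows)
    if total == 0:
        raise ValueError("no ratings supplied")
    ones = sum(rating for _, _, rating in binary_rows)
    return 1.0 - ones / total

# First four rows of the table shown earlier: (userId, movieId, binary_rating).
rows = [(26, 474, 0), (26, 2529, 1), (26, 26, 0), (26, 1950, 0)]
sparsity = get_sparsity(rows)
print(f"Sparsity: {sparsity:.3f}")
```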


Item weighting and user weighting

  • Item weighting: movies that more users have seen receive a higher weight
  • User weighting: users who have seen more movies have lower weights applied to their unseen movies
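The two weighting ideas above can be sketched as follows. The slides give no formulas, so everything here is an assumption for illustration: the item weight is taken as the share of users who have seen the movie, the user weight as one minus the share of movies the user has seen, and the two are simply multiplied. The `unseen_weights` helper is a hypothetical name.

```python
from collections import Counter

def unseen_weights(binary_rows):
    """Weight each unseen (rating 0) entry by item popularity and user activity."""
    seen = [(u, m) for u, m, r in binary_rows if r == 1]
    users = {u for u, _, _ in binary_rows}
    movies = {m for _, m, _ in binary_rows}
    views_per_movie = Counter(m for _, m in seen)
    views_per_user = Counter(u for u, _ in seen)

    weights = {}
    for u, m, r in binary_rows:
        if r == 0:
            # Assumed item weighting: grows with the movie's number of viewers.
            item_w = views_per_movie[m] / len(users)
            # Assumed user weighting: shrinks as the user has seen more movies.
            user_w = 1.0 - views_per_user[u] / len(movies)
            weights[(u, m)] = item_w * user_w
    return weights

# Toy data: (userId, movieId, binary_rating).
rows = [(1, 10, 1), (1, 20, 0), (2, 10, 1), (2, 20, 1), (3, 10, 0), (3, 20, 0)]
weights = unseen_weights(rows)
for pair, w in sorted(weights.items()):
    print(pair, round(w, 3))
```

In this toy data, movie 10 has more viewers than movie 20, so user 3's unseen entry for movie 10 gets the larger weight; user 1 has seen more movies than user 3, so user 1's unseen entry for movie 20 gets a smaller weight than user 3's.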

Let's practice!
