From ratings to recommendations

Introduction to Data Engineering

Vincent Vankrunkelsven

Data Engineer @ DataCamp

The recommendations table

 

user_id course_id rating
1 1 4.8
1 74 4.78
1 21 4.5
2 32 4.9

 

The rating column holds the estimated rating of a course the user hasn't taken yet.

Recommendation techniques

 

  • Matrix factorization
  • Building Recommendation Engines with PySpark

Common sense transformation

Diagram representing courses table

 

Diagram representing rating table

Recommendations

user_id course_id rating
1 1 4.8
1 74 4.78
1 21 4.5
2 32 4.9

Average course ratings

Average course rating

course_id avg_rating
1 4.8
74 4.78
21 4.5
32 4.9

 

We want to recommend highly rated courses
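
The averaging step can be sketched in pandas as follows. This is a minimal sketch, not the course's own implementation: the `ratings` DataFrame and its values are hypothetical, and only the column names `user_id`, `course_id`, and `rating` come from the tables above.

```python
import pandas as pd

# Hypothetical ratings data; the values are illustrative only.
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "course_id": [1, 74, 1, 21, 74],
    "rating": [5.0, 4.5, 4.0, 4.5, 5.0],
})

# One row per course, holding the mean of all ratings that course received,
# sorted so the highest rated courses come first.
avg_course_ratings = (
    ratings.groupby("course_id", as_index=False)["rating"]
    .mean()
    .rename(columns={"rating": "avg_rating"})
    .sort_values("avg_rating", ascending=False)
)

print(avg_course_ratings)
```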

Use the right programming language

Rating

user_id course_id programming_language rating
1 1 r 4.8
1 74 sql 4.78
1 21 sql 4.5
1 32 python 4.9

 

User 1 has rated more SQL courses than courses in any other language, so recommend a SQL course to user 1.
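
One possible way to find each user's most-rated language, sketched in pandas using the rows from the table above. The variable names `language_counts` and `top_language` are hypothetical, and ties between languages are broken arbitrarily here.

```python
import pandas as pd

# The rating table from the slide above.
rating = pd.DataFrame({
    "user_id": [1, 1, 1, 1],
    "course_id": [1, 74, 21, 32],
    "programming_language": ["r", "sql", "sql", "python"],
    "rating": [4.8, 4.78, 4.5, 4.9],
})

# Count how many courses each user rated per programming language.
language_counts = (
    rating.groupby(["user_id", "programming_language"])
    .size()
    .reset_index(name="n_rated")
)

# Keep, for each user, the language with the highest count.
top_language = (
    language_counts.sort_values(["user_id", "n_rated"], ascending=[True, False])
    .drop_duplicates("user_id")
    .loc[:, ["user_id", "programming_language"]]
)

print(top_language)
```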

Recommend new courses

Rating

user_id course_id programming_language rating
1 1 r 4.8
1 74 sql 4.78
1 21 sql 4.5
1 32 python 4.9

 

Don't recommend user and course combinations that already appear in the rating table.
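
Excluding already-rated combinations is an anti-join. A minimal pandas sketch, assuming a hypothetical `candidates` DataFrame of user and course pairs under consideration (course 40 is invented for illustration):

```python
import pandas as pd

# Hypothetical candidate pairs; courses 74 and 21 were already rated by user 1.
candidates = pd.DataFrame({
    "user_id": [1, 1, 1],
    "course_id": [74, 21, 40],
})

# Pairs that already appear in the rating table from the slide above.
already_rated = pd.DataFrame({
    "user_id": [1, 1, 1, 1],
    "course_id": [1, 74, 21, 32],
})

# Left merge with indicator=True marks each candidate as "both" (already rated)
# or "left_only" (new); keeping only "left_only" rows is the anti-join.
merged = candidates.merge(
    already_rated, on=["user_id", "course_id"], how="left", indicator=True
)
new_combinations = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(new_combinations)
```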

Our recommendation transformation

 

  • Use the technology the user has rated most
  • Don't recommend courses the user has already rated
  • Recommend the three highest-rated courses from the remaining combinations
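
The three rules above can be combined into one transformation. The following is a sketch under stated assumptions, not the course's actual code: the function name `recommend`, the shape of the `courses` table (`course_id`, `programming_language`), and all sample data beyond courses 12, 52, and 32 are hypothetical. Courses that nobody has rated drop out because they have no average rating.

```python
import pandas as pd

def recommend(ratings, courses, n=3):
    """Hypothetical helper: top-n unseen courses in each user's most-rated language."""
    # Average rating per course.
    avg = (ratings.groupby("course_id", as_index=False)["rating"].mean()
           .rename(columns={"rating": "avg_rating"}))

    # Rule 1: the language each user has rated most (ties broken arbitrarily).
    rated = ratings.merge(courses, on="course_id")
    counts = (rated.groupby(["user_id", "programming_language"])
              .size().reset_index(name="n_rated"))
    top_lang = (counts.sort_values(["user_id", "n_rated"], ascending=[True, False])
                .drop_duplicates("user_id")[["user_id", "programming_language"]])

    # All courses in that language, with their average rating.
    candidates = top_lang.merge(courses, on="programming_language").merge(avg, on="course_id")

    # Rule 2: anti-join against the pairs the user already rated.
    seen = ratings[["user_id", "course_id"]].drop_duplicates()
    merged = candidates.merge(seen, on=["user_id", "course_id"], how="left", indicator=True)
    unseen = merged[merged["_merge"] == "left_only"]

    # Rule 3: the n highest-rated remaining courses per user.
    top = (unseen.sort_values(["user_id", "avg_rating"], ascending=[True, False])
           .groupby("user_id").head(n))
    return (top[["user_id", "course_id", "avg_rating"]]
            .rename(columns={"avg_rating": "rating"}).reset_index(drop=True))

# Illustrative data: courses 60 to 63 and user 2 are invented.
courses = pd.DataFrame({
    "course_id": [12, 52, 32, 60, 61, 62, 63],
    "programming_language": ["sql", "sql", "r", "sql", "sql", "sql", "sql"],
})
ratings = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 2, 2, 2],
    "course_id": [12, 52, 32, 60, 61, 62, 63, 12],
    "rating": [4.78, 4.5, 4.9, 5.0, 4.0, 4.5, 3.0, 4.0],
})

recommendations = recommend(ratings, courses)
print(recommendations)
```

With this sample data, user 1 is offered SQL courses only, and courses 12 and 52 are excluded because user 1 already rated them.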

Rating

user_id course_id programming_language rating
1 12 sql 4.78
1 52 sql 4.5
1 32 r 4.9

 

Recommend the three highest-rated SQL courses other than courses 12 and 52, which user 1 has already rated.

Let's practice!
