Building Recommendation Engines with PySpark
Jamen Long
Data Scientist at Nike
als_model = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
                rank=25, maxIter=100, regParam=.05, alpha=40,
                nonnegative=True,
                coldStartStrategy="drop",
                implicitPrefs=False)
Arguments
userCol: Name of the column that contains user IDs
itemCol: Name of the column that contains item IDs
ratingCol: Name of the column that contains ratings
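The three column arguments imply a "long" layout: one row per (user, item, rating) observation. As a minimal sketch, assuming the raw ratings start out as a nested dictionary (the `ratings_by_user` data and the `to_long_rows` helper are hypothetical, made up for illustration), the rows could be flattened like this:

```python
# Hypothetical sample data: {userId: {movieId: rating}}
ratings_by_user = {
    1: {10: 4.0, 20: 3.5},
    2: {10: 5.0},
}

def to_long_rows(nested):
    """Flatten nested ratings into (userId, movieId, rating) tuples."""
    return [
        (user_id, movie_id, rating)
        for user_id, movies in nested.items()
        for movie_id, rating in movies.items()
    ]

rows = to_long_rows(ratings_by_user)
for row in rows:
    print(row)
```

With a SparkSession available (assumed here to be named `spark`), rows in this shape could be passed to `spark.createDataFrame(rows, ["userId", "movieId", "rating"])` so the column names match the ALS arguments.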
Hyperparameters
rank, $k$: Number of latent features
maxIter: Number of iterations
regParam: Lambda, the regularization parameter
alpha: Discussed later. Only used with implicit ratings.
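To make rank and regParam concrete, here is a simplified pure-Python sketch of the quantity that matrix factorization tries to minimize. It is not Spark's implementation (Spark, for instance, scales the regularization term by the number of ratings per user or item); the factor values, the rating triples, and the helper names `predict` and `regularized_loss` are made up for illustration.

```python
def predict(user_factors, item_factors):
    """Predicted rating: dot product of two latent vectors of length k (the rank)."""
    return sum(u * v for u, v in zip(user_factors, item_factors))

def regularized_loss(ratings, U, V, reg_param):
    """Squared error on observed ratings plus an L2 penalty weighted by reg_param (lambda)."""
    squared_error = sum((r - predict(U[u], V[i])) ** 2 for u, i, r in ratings)
    penalty = sum(x * x for vec in list(U.values()) + list(V.values()) for x in vec)
    return squared_error + reg_param * penalty

# Illustrative factors with rank k = 2
U = {1: [1.0, 2.0], 2: [0.5, 1.0]}      # user factors keyed by userId
V = {10: [2.0, 1.0], 20: [1.0, 0.0]}    # item factors keyed by movieId
ratings = [(1, 10, 4.0), (1, 20, 1.0), (2, 10, 3.0)]

loss = regularized_loss(ratings, U, V, reg_param=0.05)
print(loss)
```

A larger rank gives each user and item more latent features to fit the ratings, while a larger regParam penalizes large factor values more heavily. Each of the maxIter iterations alternates between updating the user factors and the item factors to reduce a loss of this kind.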
Additional Arguments
nonnegative = True: Constrains the latent factors to be nonnegative, so predictions are never negative
coldStartStrategy = "drop": Drops rows with NaN predictions, which occur when the test split contains users or items that were absent from the training split
implicitPrefs: True for implicit ratings, False for explicit ratings
from pyspark.ml.recommendation import ALS

als = ALS(userCol="userId", itemCol="movieId", ratingCol="rating",
          rank=25, maxIter=100, regParam=.05,
          nonnegative=True,
          coldStartStrategy="drop",
          implicitPrefs=False)
# Fit ALS to training dataset
model = als.fit(training_data)
# Generate predictions on test dataset
predictions = model.transform(test_data)
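Once predictions exist, a common next step is to score them with root mean squared error (RMSE). The pure-Python sketch below illustrates why dropping cold-start rows matters: a single NaN prediction makes the metric NaN. The (rating, prediction) pairs and the helper names `drop_cold_start` and `rmse` are hypothetical; in PySpark the same metric is usually computed with `RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction")`.

```python
import math

def drop_cold_start(rows):
    """Keep only (rating, prediction) pairs whose prediction is not NaN."""
    return [(r, p) for r, p in rows if not math.isnan(p)]

def rmse(rows):
    """Root mean squared error over (rating, prediction) pairs."""
    return math.sqrt(sum((r - p) ** 2 for r, p in rows) / len(rows))

# Hypothetical (rating, prediction) pairs; NaN marks a user or item unseen in training
rows = [(4.0, 3.5), (2.0, 2.5), (5.0, float("nan")), (3.0, 3.0)]

kept = drop_cold_start(rows)
score = rmse(kept)
print(len(kept), score)
print(rmse(rows))  # NaN, because one prediction is NaN
```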