Regression: regularization

Practicing Machine Learning Interview Questions in Python

Lisa Stuart

Data Scientist

Regularization algorithms

  • Ridge regression
  • Lasso regression
  • ElasticNet regression

Ordinary least squares

OLS Plot

OLS Formula

1 https://en.wikipedia.org/wiki/Linear_regression#Simple_and_multiple_linear_regression
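The OLS formula on this slide is an image that is not preserved here. A standard textbook form follows; it may not match the slide's exact notation. It assumes n observations y_i, p features x_{ij}, an intercept β_0 and coefficients β_j. OLS chooses the coefficients that minimize the residual sum of squares (RSS):

```latex
\hat{\beta}^{\text{OLS}} = \underset{\beta_0,\,\beta}{\arg\min} \; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2

% Matrix form, with a column of ones in X for the intercept and X^\top X invertible:
\hat{\beta}^{\text{OLS}} = (X^\top X)^{-1} X^\top y
```

No penalty term appears, so nothing discourages large coefficients when features are many or highly correlated.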

Ridge loss function

Ridge regression plot

Ridge regression formula

1 https://gerardnico.com/data_mining/ridge_regression#tuning_parameter_math_lambdamath
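The ridge formula on this slide is an image that is not preserved here. A standard textbook form follows, using the same assumed notation: n observations y_i, p features x_{ij}, intercept β_0, coefficients β_j, and tuning parameter λ. Ridge adds an L2 penalty on the coefficients to the residual sum of squares:

```latex
\hat{\beta}^{\text{ridge}} = \underset{\beta_0,\,\beta}{\arg\min} \; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2, \qquad \lambda \ge 0

% Closed form for centered X and y (the intercept is not penalized):
\hat{\beta}^{\text{ridge}} = (X^\top X + \lambda I)^{-1} X^\top y
```

Setting λ = 0 recovers OLS. For λ > 0 the matrix X^T X + λI is always invertible, so the solution is unique, and coefficients shrink toward zero without reaching it exactly. scikit-learn's `Ridge` calls λ `alpha`.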

Lasso loss function

Lasso regression plot

Lasso regression formula

1 https://stats.stackexchange.com/questions/155192/why-discrepancy-between-lasso-and-randomforest
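The lasso formula on this slide is an image that is not preserved here. A standard textbook form follows, assuming n observations y_i, p features x_{ij}, intercept β_0, coefficients β_j and tuning parameter λ. Lasso adds an L1 penalty (sum of absolute values) to the residual sum of squares:

```latex
\hat{\beta}^{\text{lasso}} = \underset{\beta_0,\,\beta}{\arg\min} \; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert, \qquad \lambda \ge 0
```

There is no closed-form solution in general, so the problem is solved numerically (scikit-learn uses coordinate descent). The kink of |β_j| at zero is what lets coefficients land exactly on zero. scikit-learn's `Lasso` minimizes (1/(2n))·RSS + alpha·Σ|β_j|, so its `alpha` corresponds to λ = 2n·alpha in the form above.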

Ridge vs lasso

Regularization          L1 (Lasso)                                L2 (Ridge)
Penalizes               sum of absolute values of coefficients    sum of squares of coefficients
Solutions               sparse                                    non-sparse
Number of solutions     possibly multiple                         one
Feature selection       yes                                       no
Robust to outliers?     no (loss is still squared error)          no (loss is still squared error)
Complex patterns?       no                                        yes
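The "sparse vs non-sparse" row can be checked by fitting both models to the same data and counting coefficients that are exactly zero. This is a minimal sketch on synthetic data. The data, the feature count, the true coefficients and both alpha values are arbitrary illustrative assumptions, not taken from the slides.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (illustrative assumption): 10 features, only the first 3 affect y
rng = np.random.default_rng(0)
n_samples, n_features = 200, 10
X = rng.normal(size=(n_samples, n_features))
true_coef = np.array([3.0, -2.0, 1.5] + [0.0] * (n_features - 3))
y = X @ true_coef + rng.normal(scale=0.5, size=n_samples)

# alpha values are arbitrary; scikit-learn scales the Lasso and Ridge
# objectives differently, so equal alpha would not mean equal penalty strength
lasso = Lasso(alpha=0.2).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# L1 penalty: irrelevant coefficients are typically set exactly to zero
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
# L2 penalty: coefficients shrink but (almost surely) stay nonzero
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))

print("Lasso coef:", np.round(lasso.coef_, 2))
print("Ridge coef:", np.round(ridge.coef_, 2))
print("Exact zeros -> lasso:", n_zero_lasso, "| ridge:", n_zero_ridge)
```

Lasso zeroing the irrelevant features is what the "Feature selection: yes" row refers to. Ridge keeps every feature with a small weight.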

ElasticNet

ElasticNet formula
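The ElasticNet formula on this slide is an image that is not preserved here. A standard form follows, assuming the same notation as ordinary lasso and ridge (n observations y_i, p features x_{ij}, coefficients β_j). ElasticNet combines both penalties with separate weights λ_1 and λ_2. The second line is scikit-learn's documented `ElasticNet` objective, where `alpha` is α and `l1_ratio` is ρ:

```latex
\hat{\beta}^{\text{enet}} = \underset{\beta_0,\,\beta}{\arg\min} \; \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert + \lambda_2 \sum_{j=1}^{p} \beta_j^2

% scikit-learn parameterization (intercept handled separately):
\frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \alpha\rho \lVert \beta \rVert_1 + \frac{\alpha(1-\rho)}{2} \lVert \beta \rVert_2^2
```

With ρ = 1 this reduces to scikit-learn's lasso objective, and with ρ = 0 only the L2 penalty remains.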


Regularization with Boston housing data

Features                            CHAS    NOX     RM
Coefficient estimates               2.7     -17.8   3.8
Regularized coefficient estimates   0       0       0.95
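In the table, regularization sets the CHAS and NOX coefficients to exactly zero and shrinks RM. The Boston housing loader was removed from scikit-learn in version 1.2, so the sketch below uses synthetic data instead. The data, true coefficients and alpha grid are illustrative assumptions, and the slide's numbers are not reproduced. The sketch fits a lasso at increasing alpha and shows coefficients dropping to zero, weakest effects first.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (illustrative assumption): 5 features with decreasing true effects
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
true_coef = np.array([4.0, 2.0, 1.0, 0.5, 0.0])
y = X @ true_coef + rng.normal(scale=1.0, size=300)

# Increasing penalty strength: larger alpha sets more coefficients to zero
alphas = [0.01, 0.1, 1.0, 3.0]
n_nonzero = []
for alpha in alphas:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    n_nonzero.append(int(np.count_nonzero(coef)))
    print(f"alpha={alpha:<5} coef={np.round(coef, 2)}")
```

The strongest feature survives the largest penalty, which mirrors RM keeping a (smaller) nonzero coefficient in the table.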

Regularization functions

# Imports for the names below
import numpy as np
import sklearn.linear_model
import sklearn.metrics
import sklearn.model_selection

# Lasso estimator
sklearn.linear_model.Lasso

# Lasso estimator with cross-validation
sklearn.linear_model.LassoCV

# Ridge estimator
sklearn.linear_model.Ridge

# Ridge estimator with cross-validation
sklearn.linear_model.RidgeCV

# ElasticNet estimator
sklearn.linear_model.ElasticNet

# ElasticNet estimator with cross-validation
sklearn.linear_model.ElasticNetCV

# Train/test split
sklearn.model_selection.train_test_split

# Mean squared error of a fitted estimator `mod` on the test set
sklearn.metrics.mean_squared_error(y_test, mod.predict(X_test))

# Best regularization parameter of a fitted CV estimator `mod_cv`
mod_cv.alpha_

# Array of 13 log-spaced values from 1e-6 to 1e6 (one per power of ten)
alphas = np.logspace(-6, 6, 13)
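Putting these functions together, a minimal end-to-end sketch might look like the following. It assumes synthetic data from `make_regression` (not part of the original slides), a 70/30 split and 5-fold cross-validation. Those choices are illustrative, not prescribed by the course.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative assumption): 20 features, 5 informative
X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Candidate regularization strengths: 1e-6, 1e-5, ..., 1e6
alphas = np.logspace(-6, 6, 13)

# Cross-validate alpha on the training set only; LassoCV then refits
# on the full training set using the best alpha
mod_cv = LassoCV(alphas=alphas, cv=5).fit(X_train, y_train)
lasso_mse = mean_squared_error(y_test, mod_cv.predict(X_test))

# RidgeCV with the default cv=None uses efficient leave-one-out CV
ridge_cv = RidgeCV(alphas=alphas).fit(X_train, y_train)
ridge_mse = mean_squared_error(y_test, ridge_cv.predict(X_test))

print("Lasso best alpha:", mod_cv.alpha_, "test MSE:", round(lasso_mse, 2))
print("Ridge best alpha:", ridge_cv.alpha_, "test MSE:", round(ridge_mse, 2))
```

Because the CV estimators refit with `alpha_`, a separate `Lasso` or `Ridge` fit is only needed if you want to reuse the chosen alpha elsewhere. The test set is touched once, after alpha has been selected.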

Let's practice!
