Regression: regularization

Practicing Machine Learning Interview Questions in Python

Lisa Stuart

Data Scientist

Regularization algorithms

Ridge regression
Lasso regression
ElasticNet regression

Ordinary least squares

OLS Plot

OLS Formula

¹ https://en.wikipedia.org/wiki/Linear_regression#Simple_and_multiple_linear_regression
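
The slide's formula image is not preserved in this text. As a stand-in, here is the standard OLS objective in common textbook notation; the symbols (n observations, p features, intercept β₀) are assumptions and may differ from the slide's:

```latex
\hat{\beta}^{\text{OLS}} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
```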

Ridge loss function

Ridge regression plot

Ridge regression formula

¹ https://gerardnico.com/data_mining/ridge_regression#tuning_parameter_math_lambdamath
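
The slide's formula image is not preserved in this text. A common way to write the ridge objective is the residual sum of squares plus an L2 penalty; the symbols (λ ≥ 0 as the tuning parameter, n observations, p features) are assumed notation:

```latex
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
  + \lambda \sum_{j=1}^{p} \beta_j^2,
  \qquad \lambda \ge 0
```

With λ = 0 the penalty vanishes and the objective reduces to ordinary least squares.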

Lasso loss function

Lasso regression plot

Lasso regression formula

¹ https://stats.stackexchange.com/questions/155192/why-discrepancy-between-lasso-and-randomforest
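
The slide's formula image is not preserved in this text. A common way to write the lasso objective replaces the squared penalty with an absolute-value (L1) penalty; the symbols (λ ≥ 0, n observations, p features) are assumed notation:

```latex
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert,
  \qquad \lambda \ge 0
```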

Ridge vs lasso

Regularization	L1 (Lasso)	L2 (Ridge)
penalizes	sum of absolute values of coefficients	sum of squares of coefficients
solutions	sparse	non-sparse
number of solutions	multiple	one
feature selection	yes	no
robust to outliers?	yes	no
complex patterns?	no	yes
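
The "sparse vs. non-sparse" row can be checked with a small sketch. The synthetic dataset, the alpha values, and the variable names are assumptions chosen for illustration, not part of the slides:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Synthetic data (assumption): 20 features, only 5 of which drive the target
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Same penalty strength for both models (illustrative value)
lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# Count coefficients that were set exactly to zero
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))

print("Lasso coefficients equal to zero:", n_zero_lasso)
print("Ridge coefficients equal to zero:", n_zero_ridge)
```

Lasso typically drops some of the uninformative features entirely, which is why it doubles as a feature selector; ridge shrinks coefficients but leaves them non-zero.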

ElasticNet

ElasticNet formula
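
The slide's formula image is not preserved in this text. ElasticNet combines the L1 and L2 penalties; one common way to write it uses two tuning parameters λ₁ and λ₂ (assumed notation):

```latex
\hat{\beta}^{\text{enet}} = \arg\min_{\beta_0,\,\beta}
  \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2
  + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
  + \lambda_2 \sum_{j=1}^{p} \beta_j^2,
  \qquad \lambda_1, \lambda_2 \ge 0
```

scikit-learn's ElasticNet expresses the same idea with alpha (overall strength) and l1_ratio (mix between the two penalties), and scales the squared-error term by 1/(2n).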

Regularization with Boston housing data

Features	CHAS	NOX	RM
Coefficient estimates	2.7	-17.8	3.8
Regularized coefficient estimates	0	0	0.95
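
The same shrinkage effect can be reproduced in a small sketch. The Boston housing data is not loaded here (it was removed from recent scikit-learn releases), so this uses synthetic data with three features; the dataset, the alpha value, and the variable names are assumptions:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic stand-in (assumption): 3 features, only 1 informative
X, y = make_regression(n_samples=150, n_features=3, n_informative=1,
                       noise=5.0, random_state=1)

# Unregularized vs. L1-regularized fit (alpha is an illustrative value)
ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=5.0).fit(X, y)

# Total absolute size of the coefficients under each model
ols_l1 = float(np.abs(ols.coef_).sum())
lasso_l1 = float(np.abs(lasso.coef_).sum())

print("OLS coefficients:  ", np.round(ols.coef_, 2))
print("Lasso coefficients:", np.round(lasso.coef_, 2))
```

The regularized coefficients are pulled toward zero, and weak features may be removed altogether, mirroring the pattern in the table above.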

Regularization functions

# Lasso estimator 
sklearn.linear_model.Lasso

# Lasso estimator with cross-validation
sklearn.linear_model.LassoCV

# Ridge estimator
sklearn.linear_model.Ridge

# Ridge estimator with cross-validation
sklearn.linear_model.RidgeCV

# ElasticNet estimator
sklearn.linear_model.ElasticNet

# ElasticNet estimator with cross-validation
sklearn.linear_model.ElasticNetCV

# Train/test split
sklearn.model_selection.train_test_split

# Mean squared error (mod is a placeholder name for a fitted estimator)
sklearn.metrics.mean_squared_error(y_test,
                           mod.predict(X_test))

# Best regularization parameter (mod_cv is a fitted *CV estimator)
mod_cv.alpha_

# Array of 13 log-spaced alpha values from 1e-6 to 1e6
alphas = np.logspace(-6, 6, 13)
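
The functions above fit together roughly as follows. This is a minimal sketch on synthetic data; the dataset, split size, cv setting, and variable names are assumptions, and LassoCV could be swapped for RidgeCV or ElasticNetCV:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumption, stands in for a real dataset)
X, y = make_regression(n_samples=300, n_features=10, n_informative=3,
                       noise=5.0, random_state=42)

# Hold out 30% of the rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Candidate regularization strengths: 1e-6, 1e-5, ..., 1e6
alphas = np.logspace(-6, 6, 13)

# Cross-validated lasso picks the best alpha from the candidates
mod_cv = LassoCV(alphas=alphas, cv=5, max_iter=10000).fit(X_train, y_train)
best_alpha = mod_cv.alpha_

# Evaluate the selected model on the held-out data
mse = mean_squared_error(y_test, mod_cv.predict(X_test))

print("Best alpha:", best_alpha)
print("Test MSE:", mse)
```

Very small alpha values can trigger convergence warnings from the coordinate-descent solver; they do not stop the fit.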

Let's practice!
