Visualizing model fit

Introduction to Regression with statsmodels in Python

Maarten Van den Broeck

Content Developer at DataCamp

Residual properties of a good fit

  • Residuals are normally distributed
  • The mean of the residuals is zero
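
Both properties can also be checked numerically. The sketch below is only an illustration: it fits a model to synthetic data (the `fish` DataFrame and its coefficients are made up here, not the course dataset) and uses a Shapiro-Wilk test from SciPy as one possible normality check. For a model with an intercept, the overall residual mean is zero by construction, so the more informative question is whether the mean stays near zero across the range of fitted values, which is what the diagnostic plots show.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.formula.api import ols

# Synthetic stand-in for the fish data (an assumption, NOT the course dataset)
rng = np.random.default_rng(42)
length_cm = rng.uniform(25, 40, size=200)
mass_g = -1000 + 50 * length_cm + rng.normal(0, 60, size=200)
fish = pd.DataFrame({"length_cm": length_cm, "mass_g": mass_g})

mdl = ols("mass_g ~ length_cm", data=fish).fit()

# Property 2: the mean of the residuals is (numerically) zero
resid_mean = mdl.resid.mean()

# Property 1: Shapiro-Wilk test; a statistic close to 1 is consistent with normality
shapiro_stat, shapiro_p = stats.shapiro(mdl.resid)

print(f"mean of residuals: {resid_mean:.2e}")
print(f"Shapiro-Wilk statistic: {shapiro_stat:.3f}, p-value: {shapiro_p:.3f}")
```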

Bream and perch again

Bream: the "good" model

from statsmodels.formula.api import ols
mdl_bream = ols("mass_g ~ length_cm", data=bream).fit()

The scatter plot of bream masses versus their lengths, with a trend line, that has been shown previously.

Perch: the "bad" model

mdl_perch = ols("mass_g ~ length_cm", data=perch).fit()

The scatter plot of perch masses versus their lengths, with a trend line, that has been shown previously.

Residuals vs. fitted

Bream

A scatter plot of bream model residuals versus fitted values, with a LOWESS trend line. The trend line stays close to the x-axis.

Perch

A scatter plot of perch model residuals versus fitted values, with a LOWESS trend line. The trend line forms a V shape.

Q-Q plot

Bream

A Q-Q plot of bream model standardized residuals versus theoretical quantiles. The points closely follow the line where residuals and quantiles are equal, except for two outliers.

Perch

A Q-Q plot of perch model standardized residuals versus theoretical quantiles. The points don't closely follow the line where residuals and quantiles are equal, particularly on the right-hand side of the plot.

Scale-location plot

Bream

A scatter plot of bream model square root standardized residuals versus fitted values, with a LOWESS trend line. The trend line moves slightly upwards as fitted values increase.

Perch

A scatter plot of perch model square root standardized residuals versus fitted values, with a LOWESS trend line. The trend line moves up and down several times as fitted values increase.

residplot()

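import matplotlib.pyplot as plt
import seaborn as sns
# Residuals are plotted against length_cm, a linear rescaling of the fitted values
sns.residplot(x="length_cm", y="mass_g", data=bream, lowess=True)
plt.xlabel("Fitted values")
plt.ylabel("Residuals")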

A scatter plot of bream model residuals versus fitted values, with a LOWESS trend line. The trend line stays close to the x-axis.
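
residplot() places the explanatory variable on the x-axis. If the fitted values themselves are wanted there, one alternative is to plot the model's own `fittedvalues` and `resid` attributes. This is a minimal sketch on synthetic data (the `fish` DataFrame is made up for illustration, not the course dataset), not the course's method.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.formula.api import ols

# Synthetic stand-in for the fish data (an assumption, NOT the course dataset)
rng = np.random.default_rng(1)
length_cm = rng.uniform(25, 40, size=200)
mass_g = -1000 + 50 * length_cm + rng.normal(0, 60, size=200)
fish = pd.DataFrame({"length_cm": length_cm, "mass_g": mass_g})

mdl = ols("mass_g ~ length_cm", data=fish).fit()

fig, ax = plt.subplots()
# Scatter of residuals against fitted values, with a LOWESS trend line
sns.regplot(x=mdl.fittedvalues, y=mdl.resid, lowess=True, ci=None,
            line_kws={"color": "black"}, ax=ax)
# Reference line at zero: a good fit keeps the trend line close to it
ax.axhline(0, linestyle="dotted", color="grey")
ax.set_xlabel("Fitted values")
ax.set_ylabel("Residuals")
```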

qqplot()

from statsmodels.api import qqplot
qqplot(data=mdl_bream.resid, fit=True, line="45")

Bream Q-Q plot, as seen before.
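
To see what a Q-Q plot compares, the points can be built by hand: sort the standardized residuals and pair them with the normal quantiles at matching probabilities. The sketch below uses hypothetical, randomly generated residuals and the plotting positions (i - 0.5) / n, which is one common convention and not necessarily the one qqplot() uses internally.

```python
import numpy as np
from scipy import stats

# Hypothetical residuals for illustration (NOT taken from the bream model)
rng = np.random.default_rng(0)
resid = rng.normal(0, 60, size=100)

# Standardize, then sort to get the sample quantiles (y-axis of the Q-Q plot)
std_resid = (resid - resid.mean()) / resid.std(ddof=1)
sample_q = np.sort(std_resid)

# Theoretical normal quantiles at matching probabilities (x-axis)
n = len(sample_q)
probs = (np.arange(1, n + 1) - 0.5) / n
theo_q = stats.norm.ppf(probs)

# A correlation near 1 means the points lie close to a straight line
corr = np.corrcoef(theo_q, sample_q)[0, 1]
print(f"Q-Q correlation: {corr:.3f}")
```

If the residuals are normally distributed, the pairs (theo_q, sample_q) fall close to the 45-degree line drawn by line="45".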

Scale-location plot

import numpy as np

# Internally studentized (standardized) residuals
model_norm_residuals_bream = mdl_bream.get_influence().resid_studentized_internal
# Square root of their absolute values
model_norm_residuals_abs_sqrt_bream = np.sqrt(np.abs(model_norm_residuals_bream))
sns.regplot(x=mdl_bream.fittedvalues, y=model_norm_residuals_abs_sqrt_bream, ci=None, lowess=True)
plt.xlabel("Fitted values")
plt.ylabel("Sqrt of abs val of stdized residuals")

Bream scale-location plot, as seen before.
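
The standardized residuals on the y-axis are each residual divided by its estimated standard deviation, sigma_hat * sqrt(1 - h_ii), where h_ii is the observation's leverage. The sketch below recomputes them by hand on synthetic data (the `fish` DataFrame is made up for illustration, not the course dataset) and compares the result with statsmodels' value.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in for the fish data (an assumption, NOT the course dataset)
rng = np.random.default_rng(7)
length_cm = rng.uniform(25, 40, size=150)
mass_g = -1000 + 50 * length_cm + rng.normal(0, 60, size=150)
fish = pd.DataFrame({"length_cm": length_cm, "mass_g": mass_g})

mdl = ols("mass_g ~ length_cm", data=fish).fit()
influence = mdl.get_influence()

# Leverage h_ii of each observation and the residual standard error
leverage = influence.hat_matrix_diag
sigma_hat = np.sqrt(mdl.mse_resid)

# Standardized residual: raw residual over its estimated standard deviation
manual_std_resid = mdl.resid / (sigma_hat * np.sqrt(1 - leverage))

# y-axis values of the scale-location plot
sqrt_abs_std_resid = np.sqrt(np.abs(manual_std_resid))

# Check the hand computation against statsmodels
matches = np.allclose(manual_std_resid, influence.resid_studentized_internal)
print("manual computation matches statsmodels:", matches)
```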

Let's practice!
