Introduction to Regression with statsmodels in Python
Maarten Van den Broeck
Content Developer at DataCamp
Bream

Perch

The coefficient of determination is sometimes called "r-squared" or "R-squared".
It is the proportion of the variance in the response variable that is explained by the explanatory variable.
1 means a perfect fit.
0 means the worst possible fit.
Look at the "R-squared" line of the model summary.
from statsmodels.formula.api import ols
mdl_bream = ols("mass_g ~ length_cm", data=bream).fit()
print(mdl_bream.summary())
# Some lines of output omitted
OLS Regression Results
Dep. Variable: mass_g R-squared: 0.878
Model: OLS Adj. R-squared: 0.874
Method: Least Squares F-statistic: 237.6
print(mdl_bream.rsquared)
0.8780627095147174
coeff_determination = bream["length_cm"].corr(bream["mass_g"]) ** 2
print(coeff_determination)
0.8780627095147173
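The match between `rsquared` and the squared correlation can be checked from first principles. The sketch below is only an illustration: it uses a small made-up dataset (not the bream data), and the helper names `fit_line` and `pearson_corr` are hypothetical. It computes R-squared as 1 minus the ratio of the residual sum of squares to the total sum of squares, then compares it with the squared Pearson correlation.

```python
import math

# Hypothetical toy data, NOT the bream dataset from the slides
length_cm = [23.0, 25.0, 27.0, 30.0, 32.0, 35.0]
mass_g = [240.0, 300.0, 380.0, 470.0, 560.0, 690.0]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Least-squares intercept and slope for the model y ~ x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def pearson_corr(x, y):
    """Pearson correlation coefficient between x and y."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


intercept, slope = fit_line(length_cm, mass_g)
fitted = [intercept + slope * xi for xi in length_cm]

# Residual and total sums of squares
ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(mass_g, fitted))
ss_total = sum((yi - mean(mass_g)) ** 2 for yi in mass_g)

r_squared = 1 - ss_resid / ss_total
corr_squared = pearson_corr(length_cm, mass_g) ** 2

print("r_squared:   ", r_squared)
print("corr_squared:", corr_squared)
```

The two values agree up to floating-point rounding, which is the same tiny discrepancy visible in the last digit of the two outputs above. The equality holds for simple linear regression with one explanatory variable and an intercept.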

import numpy as np
mse = mdl_bream.mse_resid
print('mse: ', mse)
mse: 5498.555084973521
rse = np.sqrt(mse)
print("rse: ", rse)
rse: 74.15224261594197
residuals_sq = mdl_bream.resid ** 2
print("residuals sq: \n", residuals_sq)
residuals sq:
0 138.957118
1 260.758635
2 5126.992578
3 1318.919660
4 390.974309
...
30 2125.047026
31 6576.923291
32 206.259713
33 889.335096
34 7665.302003
Length: 35, dtype: float64
residuals_sq = mdl_bream.resid ** 2
resid_sum_of_sq = sum(residuals_sq)
print("resid sum of sq :", resid_sum_of_sq)
resid sum of sq : 181452.31780412616
residuals_sq = mdl_bream.resid ** 2
resid_sum_of_sq = sum(residuals_sq)
deg_freedom = len(bream.index) - 2
print("deg freedom: ", deg_freedom)
deg freedom: 33
The degrees of freedom equal the number of observations minus the number of model coefficients. Here there are 35 observations and 2 coefficients (intercept and slope).
residuals_sq = mdl_bream.resid ** 2
resid_sum_of_sq = sum(residuals_sq)
deg_freedom = len(bream.index) - 2
rse = np.sqrt(resid_sum_of_sq/deg_freedom)
print("rse :", rse)
rse : 74.15224261594197
mdl_bream has an RSE of 74.
The difference between predicted and observed bream masses is typically about 74 g.
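The RSE steps can be wrapped in a small helper that takes the number of coefficients as an argument instead of hard-coding 2. This is a minimal sketch in plain Python; the function name `residual_standard_error` and the observed and fitted values are hypothetical (the fitted values are not the result of an actual least-squares fit).

```python
import math


def residual_standard_error(observed, fitted, n_coeffs):
    """RSE = sqrt(sum of squared residuals / (n_obs - n_coeffs))."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    deg_freedom = len(observed) - n_coeffs
    if deg_freedom <= 0:
        raise ValueError("need more observations than model coefficients")
    resid_sum_of_sq = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    return math.sqrt(resid_sum_of_sq / deg_freedom)


# Hypothetical masses in grams, chosen only to illustrate the arithmetic
observed = [240.0, 300.0, 380.0, 470.0]
fitted = [250.0, 290.0, 370.0, 480.0]

# Intercept + slope = 2 coefficients, so 4 - 2 = 2 degrees of freedom
rse = residual_standard_error(observed, fitted, n_coeffs=2)
print("rse:", rse)
```

Because the result is in the same unit as the response variable, it can be read directly as a typical prediction error in grams.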
residuals_sq = mdl_bream.resid ** 2
resid_sum_of_sq = sum(residuals_sq)
n_obs = len(bream.index)
rmse = np.sqrt(resid_sum_of_sq/n_obs)
print("rmse :", rmse)
rmse : 72.00244396727619
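RSE and RMSE differ only in the denominator: RSE divides the residual sum of squares by the degrees of freedom, RMSE by the number of observations. The sketch below reuses the residual sum of squares and the observation count printed above (no model fit is needed) to show that the two are related by a factor of sqrt(deg_freedom / n_obs). The variable `n_coeffs` is introduced here for illustration.

```python
import math

# Values taken from the outputs above
resid_sum_of_sq = 181452.31780412616
n_obs = 35
n_coeffs = 2  # intercept and slope

deg_freedom = n_obs - n_coeffs

rse = math.sqrt(resid_sum_of_sq / deg_freedom)
rmse = math.sqrt(resid_sum_of_sq / n_obs)

print("rse :", rse)
print("rmse:", rmse)

# RMSE is always slightly smaller than RSE because n_obs > deg_freedom
print("rmse / rse            :", rmse / rse)
print("sqrt(deg_freedom / n) :", math.sqrt(deg_freedom / n_obs))
```

With many observations and few coefficients the ratio approaches 1, so the two metrics become nearly interchangeable. RSE accounts for the number of coefficients, which matters when comparing models of different size.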