Intermediate Regression with statsmodels in Python
Maarten Van den Broeck
Content Developer at DataCamp
import matplotlib.pyplot as plt
import seaborn as sns

sns.scatterplot(x="length_cm",
                y="height_cm",
                data=fish,
                hue="mass_g")
grid = sns.FacetGrid(data=fish,
                     col="species",
                     hue="mass_g", col_wrap=2,
                     palette="plasma")
grid.map(sns.scatterplot,
         "length_cm",
         "height_cm")
plt.show()
No interactions
from statsmodels.formula.api import ols
ols("mass_g ~ length_cm + height_cm + species + 0", data=fish).fit()
Two-way interactions between pairs of variables
ols(
    "mass_g ~ length_cm + height_cm + species + "
    "length_cm:height_cm + length_cm:species + height_cm:species + 0",
    data=fish).fit()
Three-way interaction between all three variables
ols(
    "mass_g ~ length_cm + height_cm + species + "
    "length_cm:height_cm + length_cm:species + height_cm:species + "
    "length_cm:height_cm:species + 0",
    data=fish).fit()
ols(
    "mass_g ~ length_cm + height_cm + species + "
    "length_cm:height_cm + length_cm:species + height_cm:species + "
    "length_cm:height_cm:species + 0",
    data=fish).fit()
is the same as
ols(
    "mass_g ~ length_cm * height_cm * species + 0",
    data=fish).fit()
ols(
    "mass_g ~ length_cm + height_cm + species + "
    "length_cm:height_cm + length_cm:species + height_cm:species + 0",
    data=fish).fit()
is the same as
ols(
    "mass_g ~ (length_cm + height_cm + species) ** 2 + 0",
    data=fish).fit()
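The two shorthands above can be checked by listing the terms by hand. The sketch below is plain Python and is not how statsmodels parses formulas; `expand_terms` is a hypothetical helper introduced only for illustration. It enumerates every main effect and interaction up to a chosen order, which is what `*` (all orders) and `** 2` (up to two-way) stand for.

```python
from itertools import combinations


def expand_terms(variables, max_order):
    """List main effects and interactions up to max_order, joined with ':'.

    Hypothetical helper for illustration only; it is not part of statsmodels.
    """
    terms = []
    for order in range(1, max_order + 1):
        # Each combination of `order` variables is one term in the formula
        for combo in combinations(variables, order):
            terms.append(":".join(combo))
    return terms


variables = ["length_cm", "height_cm", "species"]

# length_cm * height_cm * species: main effects, two-way and three-way terms
three_way = expand_terms(variables, max_order=3)

# (length_cm + height_cm + species) ** 2: main effects and two-way terms only
two_way = expand_terms(variables, max_order=2)

print(" + ".join(three_way))
print(" + ".join(two_way))
```

The only difference between the two lists is the final `length_cm:height_cm:species` term, which is why `** 2` is the shorthand to reach for when the three-way interaction should be left out.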
mdl_mass_vs_all = ols(
    "mass_g ~ length_cm * height_cm * species + 0",
    data=fish).fit()
from itertools import product
import numpy as np
import pandas as pd

# Every combination of the explanatory values
length_cm = np.arange(5, 61, 5)
height_cm = np.arange(2, 21, 2)
species = fish["species"].unique()
p = product(length_cm, height_cm, species)
explanatory_data = pd.DataFrame(p,
                                columns=["length_cm",
                                         "height_cm",
                                         "species"])
# Add the model's predicted mass for each combination
prediction_data = explanatory_data.assign(
    mass_g = mdl_mass_vs_all.predict(explanatory_data))
print(prediction_data)
     length_cm  height_cm species       mass_g
0            5          2   Bream  -570.656437
1            5          2   Roach    31.449145
2            5          2   Perch    43.789984
3            5          2    Pike   271.270093
4            5          4   Bream  -451.127405
..         ...        ...     ...          ...
475         60         18    Pike  2690.346384
476         60         20   Bream  1531.618475
477         60         20   Roach  2621.797668
478         60         20   Perch  3041.931709
479         60         20    Pike  2926.352397

[480 rows x 4 columns]
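The row count follows directly from the grid of explanatory values. As a minimal sketch using only the standard library, the code below rebuilds the same combinations with `range()` in place of `np.arange()` and a hard-coded species list. The species order is an assumption read off the printed output, not taken from the `fish` dataset.

```python
from itertools import product

# Same values as np.arange(5, 61, 5) and np.arange(2, 21, 2)
length_cm = list(range(5, 61, 5))
height_cm = list(range(2, 21, 2))
# Assumed order, copied from the printed prediction table
species = ["Bream", "Roach", "Perch", "Pike"]

# product() varies the last argument fastest, so species cycles first
grid = list(product(length_cm, height_cm, species))

print(len(length_cm), len(height_cm), len(species))
print(len(grid))
print(grid[0], grid[-1])
```

Twelve lengths times ten heights times four species gives the 480 rows reported above, and the first and last tuples line up with rows 0 and 479 of the table.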