Introduction to Regression with statsmodels in Python
Maarten Van den Broeck
Content Developer at DataCamp
| species | mass_g |
|---|---|
| Bream | 242.0 |
| Perch | 5.9 |
| Pike | 200.0 |
| Roach | 40.0 |
| ... | ... |
import matplotlib.pyplot as plt
import seaborn as sns
sns.displot(data=fish,
            x="mass_g",
            col="species",
            col_wrap=2,
            bins=9)
plt.show()
summary_stats = fish.groupby("species")["mass_g"].mean()
print(summary_stats)
species
Bream 617.828571
Perch 382.239286
Pike 718.705882
Roach 152.050000
Name: mass_g, dtype: float64
from statsmodels.formula.api import ols
mdl_mass_vs_species = ols("mass_g ~ species", data=fish).fit()
print(mdl_mass_vs_species.params)
Intercept 617.828571
species[T.Perch] -235.589286
species[T.Pike] 100.877311
species[T.Roach] -465.778571
From previous slide, model with intercept
mdl_mass_vs_species = ols("mass_g ~ species", data=fish).fit()
print(mdl_mass_vs_species.params)
Intercept 617.828571
species[T.Perch] -235.589286
species[T.Pike] 100.877311
species[T.Roach] -465.778571
The coefficients are relative to the intercept, which is the mean mass of the reference species (Bream). For example, the mean mass of Perch is $617.83 - 235.59 = 382.24$.
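The relationship between the intercept and the other coefficients can be checked numerically. The sketch below uses a small made-up DataFrame named `toy` (hypothetical values, not the fish dataset from the slides) and compares the intercept plus the Perch coefficient with the Perch group mean.

```python
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical toy data, invented for illustration (not the course's fish dataset)
toy = pd.DataFrame({
    "species": ["Bream", "Bream", "Perch", "Perch", "Pike", "Pike"],
    "mass_g": [600.0, 640.0, 380.0, 390.0, 700.0, 740.0],
})

# Model with an intercept: Bream (first alphabetically) is the reference level
mdl_with_intercept = ols("mass_g ~ species", data=toy).fit()
params = mdl_with_intercept.params

# Group means computed directly from the data
group_means = toy.groupby("species")["mass_g"].mean()

# Intercept + Perch coefficient should reproduce the Perch mean
perch_from_model = params["Intercept"] + params["species[T.Perch]"]
print(perch_from_model, group_means["Perch"])
```

Here the intercept equals the Bream mean, and each remaining coefficient is the difference between that species' mean and the Bream mean.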
Model without an intercept
mdl_mass_vs_species = ols("mass_g ~ species + 0", data=fish).fit()
print(mdl_mass_vs_species.params)
species[Bream] 617.828571
species[Perch] 382.239286
species[Pike] 718.705882
species[Roach] 152.050000
With a single categorical explanatory variable and no intercept, the coefficients are the category means.
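As a quick check of this claim, the sketch below fits a no-intercept model on a small made-up DataFrame named `toy` (hypothetical values, not the fish dataset) and compares the coefficients with the output of `groupby().mean()`.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Hypothetical toy data, invented for illustration (not the course's fish dataset)
toy = pd.DataFrame({
    "species": ["Bream", "Bream", "Perch", "Perch", "Pike", "Pike"],
    "mass_g": [600.0, 640.0, 380.0, 390.0, 700.0, 740.0],
})

# "+ 0" removes the intercept, so each species gets its own coefficient
mdl_no_intercept = ols("mass_g ~ species + 0", data=toy).fit()

# Group means computed directly; groupby sorts species alphabetically,
# which matches the order of the model coefficients
group_means = toy.groupby("species")["mass_g"].mean()

coefs_match_means = np.allclose(mdl_no_intercept.params.to_numpy(),
                                group_means.to_numpy())
print(mdl_no_intercept.params)
print(coefs_match_means)
```

The comparison uses `np.allclose` rather than exact equality because the fitted coefficients come from a numerical least-squares solve.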