Generalized Linear Models in Python
Ita Cirovic Donev
Data Science Consultant
Importing statsmodels
import statsmodels.api as sm
Support for formulas
import statsmodels.formula.api as smf
Use glm() directly
from statsmodels.formula.api import glm
FORMULA based
from statsmodels.formula.api import glm
model = glm(formula=formula, data=data, family=family)  # family must be passed by keyword
ARRAY based
import statsmodels.api as sm
X = sm.add_constant(X)
model = sm.GLM(y, X, family=family)
$$\texttt{\color{#00A388}{response}} \sim \texttt{\color{#FF6138}{explanatory variable(s)}}$$ $$\texttt{\color{#00A388}{output}} \sim \texttt{\color{#FF6138}{input(s)}}$$
formula = 'y ~ x1 + x2'
C(x1): treat x1 as a categorical variable
x1:x2: an interaction term between x1 and x2
x1*x2: an interaction term between x1 and x2, plus the individual variables
family = sm.families.____()
The family functions:
Other distribution families are documented on the statsmodels website.
print(model_GLM.summary())
Generalized Linear Model Regression Results
=============================================================================
Dep. Variable: y No. Observations: 173
Model: GLM Df Residuals: 171
Model Family: Binomial Df Model: 1
Link Function: logit Scale: 1.0000
Method: IRLS Log-Likelihood: -97.226
Date: Mon, 21 Jan 2019 Deviance: 194.45
Time: 11:30:01 Pearson chi2: 165.
No. Iterations: 4 Covariance Type: nonrobust
=============================================================================
coef std err z P>|z| [0.025 0.975]
-----------------------------------------------------------------------------
Intercept -12.3508 2.629 -4.698 0.000 -17.503 -7.199
width 0.4972 0.102 4.887 0.000 0.298 0.697
=============================================================================
$\texttt{\color{#007AFF}{.params}}$ returns the estimated regression coefficients
model_GLM.params
Intercept -12.350818
width 0.497231
dtype: float64
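Because the summary reports a logit link, each coefficient is on the log-odds scale. A minimal sketch, using the coefficients printed above, of turning them into odds ratios: under that assumption, each one-unit increase in `width` multiplies the odds of y = 1 by about 1.64.

```python
import numpy as np
import pandas as pd

# Coefficients copied from the model_GLM.params output above (logit link)
params = pd.Series({"Intercept": -12.350818, "width": 0.497231})

# exp(coef) is the multiplicative change in the odds of y = 1
# for a one-unit increase in the predictor
odds_ratios = np.exp(params)
print(odds_ratios)
```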
$\texttt{\color{#007AFF}{.conf\_int(alpha=0.05, cols=None)}}$ returns confidence intervals for the coefficients (95% by default, since alpha=0.05)
model_GLM.conf_int()
0 1
Intercept -17.503010 -7.198625
width 0.297833 0.696629
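The intervals above are Wald intervals based on the normal distribution (the summary reports z statistics), i.e. coef ± z × std err. The sketch below redoes that arithmetic from the rounded values in the summary table, so the results match `conf_int()` only to about three decimals; changing `alpha` (for example to 0.1) gives the corresponding narrower interval.

```python
from statistics import NormalDist

# Estimates and standard errors copied from the summary table above
coef = {"Intercept": -12.3508, "width": 0.4972}
std_err = {"Intercept": 2.629, "width": 0.102}

alpha = 0.05                             # same default as conf_int()
z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% interval

# Wald interval: coef +/- z * std err
intervals = {
    name: (coef[name] - z * std_err[name], coef[name] + z * std_err[name])
    for name in coef
}
for name, (lower, upper) in intervals.items():
    print(f"{name}: [{lower:.3f}, {upper:.3f}]")
```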
model_GLM.predict(test_data)  # predicted probabilities that y = 1
0 0.029309
1 0.470299
2 0.834983
3 0.972363
4 0.987941