Generalized Linear Models in Python
Ita Cirovic Donev
Data Science Consultant
Importing statsmodels
import statsmodels.api as sm
Support for formulas
import statsmodels.formula.api as smf
Using glm() directly
from statsmodels.formula.api import glm
FORMULA based
from statsmodels.formula.api import glm
model = glm(formula, data, family)
ARRAY based
import statsmodels.api as sm
X = sm.add_constant(X)
model = sm.GLM(y, X, family)
$$\texttt{\color{#00A388}{response}} \sim \texttt{\color{#FF6138}{explanatory variable(s)}}$$ $$\texttt{\color{#00A388}{output}} \sim \texttt{\color{#FF6138}{input(s)}}$$
formula = 'y ~ x1 + x2'
x1 as a categorical variable
x1 and x2
family = sm.families.____()
The family functions:
Other distribution families can be found on the statsmodels website.
print(model_GLM.summary())
                 Generalized Linear Model Regression Results
=============================================================================
Dep. Variable:                      y   No. Observations:                 173
Model:                            GLM   Df Residuals:                     171
Model Family:                Binomial   Df Model:                           1
Link Function:                  logit   Scale:                         1.0000
Method:                          IRLS   Log-Likelihood:               -97.226
Date:                Mon, 21 Jan 2019   Deviance:                      194.45
Time:                        11:30:01   Pearson chi2:                    165.
No. Iterations:                     4   Covariance Type:            nonrobust
=============================================================================
                  coef    std err          z      P>|z|     [0.025     0.975]
-----------------------------------------------------------------------------
Intercept     -12.3508      2.629     -4.698      0.000    -17.503     -7.199
width           0.4972      0.102      4.887      0.000      0.298      0.697
=============================================================================
$\texttt{\color{#007AFF}{.params}}$ prints regression coefficients
model_GLM.params
Intercept -12.350818
width 0.497231
dtype: float64
$\texttt{\color{#007AFF}{.conf\_int(alpha=0.05, cols=None)}}$ prints confidence intervals
model_GLM.conf_int()
0 1
Intercept -17.503010 -7.198625
width 0.297833 0.696629
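These bounds appear to be consistent with a normal-based (Wald) interval, coefficient ± z × standard error. The sketch below recomputes them from the rounded values in the summary table using only the standard library; small differences from the output above are expected because the inputs are rounded.

```python
from statistics import NormalDist

# Rounded values copied from the summary table
coef = {"Intercept": -12.3508, "width": 0.4972}
std_err = {"Intercept": 2.629, "width": 0.102}

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # roughly 1.96 for alpha = 0.05

# Wald interval: coefficient plus or minus z_crit standard errors
intervals = {
    name: (coef[name] - z_crit * std_err[name], coef[name] + z_crit * std_err[name])
    for name in coef
}
for name, (lower, upper) in intervals.items():
    print(f"{name}: [{lower:.3f}, {upper:.3f}]")
```

Changing `alpha` here mirrors the `alpha` argument of `.conf_int()`: a smaller value gives a wider interval.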
model_GLM.predict(test_data)
0 0.029309
1 0.470299
2 0.834983
3 0.972363
4 0.987941
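For a Binomial model with the logit link, each prediction is a probability obtained by passing the linear predictor through the inverse logit. The sketch below reproduces that calculation by hand with the fitted coefficients from `.params`; the width values are hypothetical, since the contents of `test_data` are not shown.

```python
import math

# Fitted coefficients taken from model_GLM.params
intercept, slope = -12.350818, 0.497231

def predicted_probability(width):
    """Inverse logit of the linear predictor intercept + slope * width."""
    eta = intercept + slope * width
    return 1 / (1 + math.exp(-eta))

# Hypothetical width values, chosen only for illustration
for width in (20.0, 25.0, 30.0):
    print(width, round(predicted_probability(width), 4))
```

Because the slope is positive, the predicted probability increases with width and always stays between 0 and 1.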