Generalized Linear Models in Python
Ita Cirovic Donev
Data Science Consultant
Importing statsmodels
import statsmodels.api as sm
Formula support
import statsmodels.formula.api as smf
Use glm() directly
from statsmodels.formula.api import glm
FORMULA based
from statsmodels.formula.api import glm
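# Pass data and family by keyword; a third positional argument would not be read as the family
model = glm(formula, data=data, family=family)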
ARRAY based
import statsmodels.api as sm
X = sm.add_constant(X)
model = sm.GLM(y, X, family=family)
$$\texttt{\color{#00A388}{response}} \sim \texttt{\color{#FF6138}{explanatory variable(s)}}$$ $$\texttt{\color{#00A388}{output}} \sim \texttt{\color{#FF6138}{input(s)}}$$
formula = 'y ~ x1 + x2'
C(x1): treat x1 as a categorical variable
x1:x2: an interaction term between x1 and x2
x1*x2: an interaction term between x1 and x2, plus the individual variables
family = sm.families.____()
Family functions:
Other distribution families can be reviewed on the statsmodels website.
print(model_GLM.summary())
                 Generalized Linear Model Regression Results
==============================================================================
Dep. Variable:                      y   No. Observations:                  173
Model:                            GLM   Df Residuals:                      171
Model Family:                Binomial   Df Model:                            1
Link Function:                  logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:                -97.226
Date:                Mon, 21 Jan 2019   Deviance:                       194.45
Time:                        11:30:01   Pearson chi2:                     165.
No. Iterations:                     4   Covariance Type:             nonrobust
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    -12.3508      2.629     -4.698      0.000     -17.503      -7.199
width          0.4972      0.102      4.887      0.000       0.298       0.697
==============================================================================
$\texttt{\color{#007AFF}{.params}}$ prints the regression coefficients
model_GLM.params
Intercept   -12.350818
width         0.497231
dtype: float64
$\texttt{\color{#007AFF}{.conf\_int(alpha=0.05, cols=None)}}$ prints the confidence intervals
model_GLM.conf_int()
                   0         1
Intercept -17.503010 -7.198625
width       0.297833  0.696629
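The intervals above are consistent with normal-based (Wald) intervals of the form coefficient ± z × standard error. The sketch below recomputes them from the rounded `coef` and `std err` values shown in the summary table, using only the standard library; because the inputs are rounded, the results agree with the `conf_int()` output only to about two or three decimals. The names `estimates` and `intervals` are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
# Two-sided critical value of the standard normal distribution (about 1.96)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# (coef, std err) pairs copied from the summary table, rounded as printed there
estimates = {
    "Intercept": (-12.3508, 2.629),
    "width": (0.4972, 0.102),
}

# Wald interval: coef -/+ z_crit * std err
intervals = {
    name: (coef - z_crit * se, coef + z_crit * se)
    for name, (coef, se) in estimates.items()
}

for name, (lower, upper) in intervals.items():
    print(f"{name}: [{lower:.3f}, {upper:.3f}]")
```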
model_GLM.predict(test_data)
0 0.029309
1 0.470299
2 0.834983
3 0.972363
4 0.987941
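For a Binomial model with a logit link, the predicted values are probabilities obtained by applying the inverse logit to the linear predictor. The sketch below reproduces that calculation by hand with the coefficients from `.params`; the width values are hypothetical and are not the contents of `test_data`, which is not shown here.

```python
import math

# Coefficients copied from model_GLM.params
intercept = -12.350818
slope_width = 0.497231

def predicted_probability(width):
    """Inverse logit of the linear predictor intercept + slope * width."""
    eta = intercept + slope_width * width
    return 1 / (1 + math.exp(-eta))

# Hypothetical width values chosen only for illustration
widths = [21.0, 25.0, 29.0, 33.0]
probabilities = [predicted_probability(w) for w in widths]

for w, p in zip(widths, probabilities):
    print(f"width={w}: P(y=1) = {p:.4f}")
```

Because the coefficient on width is positive, the predicted probability increases with width.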