How to build a GLM in Python?

Generalized Linear Models in Python

Ita Cirovic Donev

Data Science Consultant

statsmodels

  • Importing statsmodels

    import statsmodels.api as sm
    
  • Formula support

    import statsmodels.formula.api as smf
    
  • Use glm() directly

    from statsmodels.formula.api import glm
    

Model fitting process

  1. Describe the model $\rightarrow \texttt{\color{#007AFF}{glm()}}$
  2. Fit the model $\rightarrow \texttt{\color{#007AFF}{.fit()}}$
  3. Summarize the model $\rightarrow \texttt{\color{#007AFF}{.summary()}}$
  4. Make predictions $\rightarrow \texttt{\color{#007AFF}{.predict()}}$

Describing the model

FORMULA based

from statsmodels.formula.api import glm
model = glm(formula, data, family)

ARRAY based

import statsmodels.api as sm
# The array interface does not add an intercept automatically
X = sm.add_constant(X)
model = sm.GLM(y, X, family)

Formula argument

$$\texttt{\color{#00A388}{response}} \sim \texttt{\color{#FF6138}{explanatory variable(s)}}$$ $$\texttt{\color{#00A388}{output}} \sim \texttt{\color{#FF6138}{input(s)}}$$

formula = 'y ~ x1 + x2'
  • $\texttt{\color{#FF6138}{C(x1)}}$ : treat x1 as a categorical variable
  • $\texttt{\color{#FF6138}{-1}}$ : remove the intercept
  • $\texttt{\color{#FF6138}{x1:x2}}$ : interaction term between x1 and x2
  • $\texttt{\color{#FF6138}{x1*x2}}$ : the x1:x2 interaction plus the individual variables x1 and x2
  • $\texttt{\color{#FF6138}{np.log(x1)}}$ : apply a vectorized function to a model variable

Family argument

family = sm.families.____()

Family functions:

  • $\texttt{\color{#007AFF}{Gaussian}(link = sm.families.links.\color{deeppink}{identity()})}$ $\rightarrow$ the default family
  • $\texttt{\color{#007AFF}{Binomial}(link = sm.families.links.\color{deeppink}{logit()})}$
    • $\texttt{\color{deeppink}{probit()}}$, $\texttt{\color{deeppink}{cauchy()}}$, $\texttt{\color{deeppink}{log()}}$, and $\texttt{\color{deeppink}{cloglog()}}$
  • $\texttt{\color{#007AFF}{Poisson}(link = sm.families.links.\color{deeppink}{log()})}$
    • $\texttt{\color{deeppink}{identity()}}$ and $\texttt{\color{deeppink}{sqrt()}}$

Other distribution families are listed on the statsmodels website.


Summarizing the model

print(model_GLM.summary())
                 Generalized Linear Model Regression Results                  
=============================================================================
Dep. Variable:                      y  No. Observations:                  173
Model:                            GLM  Df Residuals:                      171
Model Family:                Binomial  Df Model:                            1
Link Function:                  logit  Scale:                          1.0000
Method:                          IRLS  Log-Likelihood:                -97.226
Date:                Mon, 21 Jan 2019  Deviance:                       194.45
Time:                        11:30:01  Pearson chi2:                     165.
No. Iterations:                     4  Covariance Type:             nonrobust
=============================================================================
                 coef    std err         z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------
Intercept    -12.3508      2.629    -4.698      0.000     -17.503      -7.199
width          0.4972      0.102     4.887      0.000       0.298       0.697
=============================================================================

Regression coefficients

$\texttt{\color{#007AFF}{.params}}$ returns the regression coefficients

model_GLM.params
Intercept   -12.350818
width         0.497231
dtype: float64

$\texttt{\color{#007AFF}{.conf\_int(alpha=0.05, cols=None)}}$ returns the confidence intervals

model_GLM.conf_int()
                   0         1
Intercept -17.503010 -7.198625
width       0.297833  0.696629

Predictions

  • Include all model variables in the test data
  • $\texttt{\color{#007AFF}{.predict(\color{#FF931B}{test\_data})}}$ computes the predictions
model_GLM.predict(test_data)
0    0.029309
1    0.470299
2    0.834983
3    0.972363
4    0.987941

Let's practice!
