Multivariable logistic regression

Generalized Linear Models in Python

Ita Cirovic Donev

Data Science Consultant

Multivariable setting

  • Model formula with a single explanatory variable $$ \text{logit}(y) = \beta_0+\beta_1\color{red}{x_1} $$
  • Model formula with $p$ explanatory variables $$ \text{logit}(y) = \beta_0+\beta_1x_1 + \beta_2\color{red}{x_2} + \dots + \beta_p \color{red}{x_p} $$

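  • In Python

    import statsmodels.api as sm
    from statsmodels.formula.api import glm

    # Logistic regression: Binomial family with its default logit link
    model = glm('y ~ x1 + x2 + x3 + x4',
                data = my_data,
                family = sm.families.Binomial()).fit()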
    

Example: well switching

formula = 'switch ~ distance100 + arsenic'
wells_fit = glm(formula = formula, data = wells, 
                family = sm.families.Binomial()).fit()
===============================================================================
                  coef    std err          z      P>|z|      [0.025      0.975]
-------------------------------------------------------------------------------
Intercept       0.0027      0.079      0.035      0.972      -0.153       0.158
distance100    -0.8966      0.104     -8.593      0.000      -1.101      -0.692
arsenic         0.4608      0.041     11.134      0.000       0.380       0.542
===============================================================================
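  • Both slope coefficients are statistically significant (P>|z| reported as 0.000)
  • The signs of the coefficients make sense
  • A one-unit increase in distance100 lowers the logit by about 0.90 (coefficient -0.8966), holding arsenic constant
  • A one-unit increase in arsenic raises the logit by about 0.46 (coefficient 0.4608), holding distance100 constant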
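
Changes on the logit scale can be hard to read, so one common way to interpret them is to exponentiate the coefficients into odds ratios or to convert a logit into a probability. The sketch below uses the coefficients from the table; the helper `predicted_probability` and the input values passed to it are hypothetical and only serve as an illustration.

```python
import math

# Coefficients copied from the fitted model summary
intercept = 0.0027
coefs = {"distance100": -0.8966, "arsenic": 0.4608}

# exp(coefficient) is the multiplicative change in the odds of switching
# for a one-unit increase, holding the other variable constant
odds_ratios = {name: math.exp(b) for name, b in coefs.items()}
for name, ratio in odds_ratios.items():
    print(f"{name}: odds multiplied by {ratio:.2f} per unit")


def predicted_probability(distance100, arsenic):
    """Hypothetical helper: turn the linear predictor into a probability."""
    logit = (intercept
             + coefs["distance100"] * distance100
             + coefs["arsenic"] * arsenic)
    return 1 / (1 + math.exp(-logit))


# Illustrative inputs only, not observations from the wells data
print(predicted_probability(distance100=0.5, arsenic=1.0))
```

An odds ratio below 1 (distance100) means the odds of switching fall as the variable increases, while a ratio above 1 (arsenic) means they rise.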

Impact of adding a variable

  • Impact of the arsenic variable
  • The distance100 coefficient changes from -0.6291 to -0.8966
  • The farther a household is from a safe well
    • The more likely it is to have a higher arsenic level
Model with arsenic (switch ~ distance100 + arsenic):

                  coef    std err
---------------------------------
Intercept       0.0027      0.079
distance100    -0.8966      0.104
arsenic         0.4608      0.041

Model without arsenic (distance100 only):

                  coef    std err
---------------------------------
Intercept       0.6060      0.060
distance100    -0.6291      0.097

Multicollinearity

  • Explanatory variables that are correlated with other variables in the model

Scatter patterns of two variables with correlations 0.8, 0.4, 0, -0.4 and -0.8.

  • Standard errors of the coefficients increase
    • Coefficients may no longer be statistically significant
1 https://en.wikipedia.org/wiki/Correlation_and_dependence

Presence of multicollinearity?

What to look for?

  • A coefficient is not significant, but the variable is highly correlated with $y$
  • Adding or removing a variable changes the coefficients substantially
  • The sign of a coefficient is not logical
  • Pairwise correlations between variables are high
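
The last check in the list can be done with a correlation matrix. The following sketch uses synthetic explanatory variables; the column names and the 0.8 cutoff for flagging a pair are assumptions for illustration, not a rule from the course.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
X = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.3, size=n),  # nearly a copy of x1
    "x3": rng.normal(size=n),                  # unrelated to x1 and x2
})

# Pairwise Pearson correlations between the explanatory variables
corr = X.corr()
print(corr.round(2))

# Flag each pair once if its absolute correlation exceeds the cutoff
threshold = 0.8  # illustrative cutoff
cols = list(corr.columns)
flagged = [(a, b)
           for i, a in enumerate(cols)
           for b in cols[i + 1:]
           if abs(corr.loc[a, b]) > threshold]
print("Highly correlated pairs:", flagged)
```

Pairwise correlations only detect relationships between two variables at a time, so a clean matrix does not rule out multicollinearity involving three or more variables.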

Variance inflation factor (VIF)

  • Most widely used diagnostic for multicollinearity
    • Computed for each explanatory variable
    • Measures how much the variance of a coefficient is inflated
  • Suggested threshold: VIF > 2.5
  • In Python
from statsmodels.stats.outliers_influence import variance_inflation_factor

Let's practice!
