Binary data and logistic regression

Generalized Linear Models in Python

Ita Cirovic Donev

Data Science Consultant

Binary response data

  • Two-class response $\rightarrow \large{\texttt{\color{#079EA1}{0},\color{#ED715F}{1}}}$

Examples:

  • Credit scoring $\rightarrow \texttt{\color{#ED715F}{"Default"}/\color{#079EA1}{"Non-Default"}}$
  • Passing a test $\rightarrow \texttt{\color{#079EA1}{"Pass"}/\color{#ED715F}{"Fail"}}$
  • Fraud detection $\rightarrow \texttt{\color{#ED715F}{"Fraud"}/\color{#079EA1}{"No-Fraud"}}$
  • Choice of product $\rightarrow \texttt{\color{#2485F2}{"Product ABC"}/\color{#F2AC30}{"Product XYZ"}}$

Binary data

UNGROUPED

  • Single event
  • Flip a single coin
  • Two possible outcomes: 0/1
  • $Bernoulli(p)$ or
  • $Binomial(n=1,p)$

GROUPED

  • Multiple events
  • Flip multiple coins
  • Number of successes in $n$ trials
  • $Binomial(n,p)$
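
The difference between ungrouped and grouped binary data can be sketched with simulated coin flips. This is a minimal illustration, assuming a fair coin (`p = 0.5`), a group size of `n = 20`, and an arbitrary seed; none of these values come from the slides.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seed chosen only for reproducibility
p = 0.5  # assumed probability of heads (fair coin)

# Ungrouped: each record is one flip, stored as 0 or 1 -> Binomial(n=1, p)
single_flips = rng.binomial(n=1, p=p, size=10)

# Grouped: each record is the number of heads in n flips -> Binomial(n, p)
n = 20
heads_per_group = rng.binomial(n=n, p=p, size=10)

print(single_flips)      # ten values, each 0 or 1
print(heads_per_group)   # ten counts, each between 0 and n
```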

Logistic function

Scatter plot of hours of study versus pass/fail response (0/1)

  • Test outcome: $PASS=1$ or $FAIL=0$

  • We want to model

$P(y=1)=\beta_0 + \beta_1x_1$

$P(\text{Pass})=\beta_0 + \beta_1 \times \text{Hours of study}$

Logistic fit to the hours-of-study and pass/fail response (0/1) data

  • Use the logistic function, which maps any real number $z$ to a value between 0 and 1

$f(z) = \frac{1}{1+\exp(-z)}$
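
To see why the logistic function is used, it can help to compare the raw linear predictor with its logistic transform. The sketch below uses hypothetical coefficients (`beta_0 = -3.0`, `beta_1 = 0.5`) and hypothetical study hours chosen only for illustration; they are not estimates from the course data.

```python
import math

def logistic(z):
    """Logistic function f(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, for illustration only
beta_0, beta_1 = -3.0, 0.5

for hours in [0, 2, 6, 10, 14]:
    linear = beta_0 + beta_1 * hours  # can fall below 0 or rise above 1
    prob = logistic(linear)           # always strictly between 0 and 1
    print(f"hours={hours:2d}  linear={linear:5.2f}  logistic={prob:.3f}")
```

With these assumed coefficients the linear predictor is negative at 0 hours and exceeds 1 at 14 hours, so it cannot be read as a probability, while the logistic value stays inside the unit interval.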

Odds and odds ratio

$$ \text{ODDS} = \frac{\text{event occurring}}{\text{event NOT occurring}} $$

$$ \text{ODDS RATIO} = \frac{\text{odds}_1}{\text{odds}_2} $$

Odds example

  • 4 games: 3 wins and 1 loss

  • Odds are 3 to 1 (3 wins in the numerator, 1 loss in the denominator)
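
The 3-to-1 example can be written out in a few lines, together with an odds ratio. Team A uses the record from the example; team B and its record are hypothetical, added only so that there is a second set of odds to compare against.

```python
# Team A: the record from the example, 3 wins and 1 loss
wins_a, losses_a = 3, 1
odds_a = wins_a / losses_a  # 3.0, i.e. "3 to 1"

# Team B: hypothetical record, used only to illustrate an odds ratio
wins_b, losses_b = 2, 2
odds_b = wins_b / losses_b  # 1.0, i.e. "1 to 1"

# Odds ratio: how many times larger team A's odds are than team B's
odds_ratio = odds_a / odds_b
print(odds_a, odds_b, odds_ratio)  # 3.0 1.0 3.0
```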

Odds and probability

  $$ \text{odds} \neq \text{probability} $$

  $$ \text{odds} = \frac{\text{probability}}{1-\text{probability}} $$

  $$ \text{probability} = \frac{\text{odds}}{1+\text{odds}} $$
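
The two conversion formulas can be checked on the record of 3 wins in 4 games. The helper names below are hypothetical, chosen only for this sketch.

```python
def odds_from_probability(p):
    """odds = probability / (1 - probability)"""
    return p / (1 - p)

def probability_from_odds(odds):
    """probability = odds / (1 + odds)"""
    return odds / (1 + odds)

p_win = 3 / 4  # 3 wins out of 4 games
odds_win = odds_from_probability(p_win)

print(odds_win)                         # 3.0
print(probability_from_odds(odds_win))  # 0.75
```

A probability of 0.75 and odds of 3 describe the same record, which is why the two quantities should not be used interchangeably.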

From probability model to logistic regression

Step 1. Probability model

$E(y)=\mu=P(y=1)=\beta_0 + \beta_1x_1$

Step 2. Logistic function

$f(z) = \large{\frac{1}{1+\exp(-z)}}$

Step 3. Apply the logistic function $\rightarrow$ INVERSE-LOGIT

$\mu = \large{\frac{1}{1+\exp(-(\beta_0+\beta_1x_1))}} = \large{\frac{\exp(\beta_0+\beta_1x_1)}{1+\exp(\beta_0+\beta_1x_1)}}$

$1-\mu = \large{\frac{1}{1+\exp(\beta_0+\beta_1x_1)}}$

  • Probability $\rightarrow$ odds $$ \text{ODDS}=\frac{\mu}{1-\mu} = \exp(\beta_0+\beta_1x_1) $$
  • Log transformation $\rightarrow \color{#CF5383}{\text{LOGISTIC REGRESSION}}$

  $$ \text{LOGIT}(\mu)=\log\left(\frac{\mu}{1-\mu}\right) = \beta_0+\beta_1x_1 $$
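
The chain from probability to odds to logit can be verified numerically. The sketch below assumes hypothetical coefficients and a single hypothetical input value; it only checks that the identities hold, and does not fit anything.

```python
import math

# Hypothetical values, for illustration only
beta_0, beta_1 = -3.0, 0.5
x_1 = 8.0

eta = beta_0 + beta_1 * x_1      # linear predictor beta_0 + beta_1 * x_1

mu = 1 / (1 + math.exp(-eta))    # inverse-logit: probability P(y = 1)
odds = mu / (1 - mu)             # probability -> odds
logit = math.log(odds)           # odds -> log odds (logit)

print(math.isclose(odds, math.exp(eta)))  # odds equal exp(linear predictor)
print(math.isclose(logit, eta))           # logit recovers the linear predictor
```

Both checks print `True`: the log odds are linear in $x_1$, which is the relationship that logistic regression estimates.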

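Logistic regression in Python

Function - glm()

import statsmodels.api as sm
from statsmodels.formula.api import glm

# Binomial family gives logistic regression (logit is its default link)
model_GLM = glm(formula = 'y ~ x',
                data = my_data,
                family = sm.families.Binomial()).fit()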

Input

y = [0,1,1,0,...]
y = ['No','Yes','Yes',...]
y = ['Fail','Pass','Pass',...]

Let's practice!
