Going beyond linear regression

Generalized Linear Models in Python

Ita Cirovic Donev

Data Science Consultant

Course objectives

  • Learn the building blocks of GLMs
  • Train GLMs
  • Interpret model results
  • Assess model performance
  • Compute predictions
  • Chapter 1: How GLMs extend the linear model
  • Chapter 2: Binomial (logistic) regression
  • Chapter 3: Poisson regression
  • Chapter 4: Multivariable logistic regression
Linear model review

Scatterplot of years of experience and salary.

$\text{salary} \sim \text{experience}$

$\text{salary} = \beta_0 + \beta_1 \times \text{experience} + \epsilon$

$y = \beta_0 + \beta_1 x_1 + \epsilon$

where:
$y$ - response variable (output)
$x_1$ - explanatory variable (input)
$\beta$ - model parameters
$\beta_0$ - intercept
$\beta_1$ - slope
$\epsilon$ - random error
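To make the roles of $\beta_0$, $\beta_1$ and $\epsilon$ concrete, here is a minimal sketch that simulates data from $y = \beta_0 + \beta_1 x_1 + \epsilon$ and recovers the parameters by ordinary least squares. It uses plain NumPy rather than the course's `statsmodels` workflow. The "true" coefficients are borrowed from the fitted salary line shown on a later slide. The error spread, the sample size and the experience range are arbitrary assumptions made only for this simulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Assumed "true" parameters for the simulation (taken from the slide's fitted line)
beta0_true, beta1_true = 25790.0, 9449.0

experience = rng.uniform(0, 10, n)            # explanatory variable x1 (assumed range)
epsilon = rng.normal(0, 5000, n)              # random error, assumed normal with constant variance
salary = beta0_true + beta1_true * experience + epsilon   # response y

# Design matrix with a column of ones for the intercept beta_0
X = np.column_stack([np.ones(n), experience])

# Least-squares estimates of (beta_0, beta_1)
beta_hat, *_ = np.linalg.lstsq(X, salary, rcond=None)
print(beta_hat)
```

The estimates land close to, but not exactly on, the simulated values because of the random error term.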


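LINEAR MODEL - ols()

```python
from statsmodels.formula.api import ols

# Ordinary least squares fit of y on X; my_data is a pandas DataFrame
model = ols(formula='y ~ X', data=my_data).fit()
```

GENERALIZED LINEAR MODEL - glm()

```python
import statsmodels.api as sm
from statsmodels.formula.api import glm

# The family is left blank on purpose: it is chosen to match
# the distribution of the response (e.g. Gaussian, Binomial, Poisson)
model = glm(formula='y ~ X',
            data=my_data,
            family=sm.families.____).fit()
```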

Linear model assumptions

Linear fit to the years of experience and salary data.

$$ \text{salary} = 25790 + 9449 \times \text{experience} $$

Regression function

$E[y] = \mu = \beta_0 + \beta_1 x_1$
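The regression function gives the *mean* response for a given input. The error term drops out because the errors are assumed to average zero. Plugging the fitted coefficients from the salary line into $\mu = \beta_0 + \beta_1 x_1$ is plain arithmetic, sketched below. The function name `expected_salary` is hypothetical.

```python
def expected_salary(experience, b0=25790, b1=9449):
    """Mean salary E[y] = mu = b0 + b1 * x1, using the slide's fitted coefficients."""
    return b0 + b1 * experience

print(expected_salary(0))  # 25790: the intercept, i.e. zero years of experience
print(expected_salary(5))  # 73035 = 25790 + 9449 * 5
```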

Assumptions

  • Linear in the parameters
  • Errors are independent and normally distributed
  • Constant variance

What if ...?

  • The response is binary or a count $\rightarrow \text{NOT continuous}$

Distribution plots of continuous, binary, and Poisson random variables.

  • The variance of $y$ is not constant $\rightarrow \text{it depends on the mean}$
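A quick simulation illustrates the second point. For a Poisson count the variance equals the mean $\lambda$. For a binary (Bernoulli) outcome with mean $p$ the variance is $p(1-p)$. In both cases the variance moves with the mean, so the constant-variance assumption cannot hold. The parameter values and sample size below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Poisson counts: variance equals the mean lambda
poisson_stats = {}
for lam in (1.0, 5.0, 20.0):
    draws = rng.poisson(lam, size=n)
    poisson_stats[lam] = (draws.mean(), draws.var())

# Binary (Bernoulli) outcomes: variance is p * (1 - p), a function of the mean p
bernoulli_stats = {}
for p in (0.1, 0.5, 0.9):
    draws = rng.binomial(1, p, size=n)
    bernoulli_stats[p] = (draws.mean(), draws.var())

for lam, (m, v) in poisson_stats.items():
    print(f"Poisson   lambda={lam:5.1f}  mean={m:6.3f}  var={v:6.3f}")
for p, (m, v) in bernoulli_stats.items():
    print(f"Bernoulli p={p:.1f}  mean={m:.3f}  var={v:.3f}  p(1-p)={p * (1 - p):.3f}")
```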

Dataset - horseshoe crab nests¹

| Variable | Description |
|---|---|
| sat | Number of satellites in the nest |
| y | Whether the nest has ≥1 satellite; 0/1 |
| weight | Weight of the female crab (kg) |
| width | Width of the female crab (cm) |
| color | 1 - light medium, 2 - medium, 3 - dark medium, 4 - dark |
| spine | 1 - both good, 2 - one worn/broken, 3 - both worn/broken |

¹ A. Agresti, An Introduction to Categorical Data Analysis, 2007.
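The binary column `y` is a recoding of the count `sat`. A nest scores 1 when it has at least one satellite. Here is a sketch of that recoding with pandas. The DataFrame name `crab` and its rows are hypothetical placeholders, not values from the Agresti data.

```python
import pandas as pd

# Hypothetical rows for illustration only (NOT taken from the real data set)
crab = pd.DataFrame({
    "sat": [0, 2, 0, 5, 1],
    "weight": [1.8, 2.4, 2.0, 3.1, 2.6],
})

# y = 1 if the nest has at least one satellite, otherwise 0
crab["y"] = (crab["sat"] >= 1).astype(int)
print(crab)
```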

Linear model and binary response

$\text{satellite crab} \sim \text{female crab weight}$

y ~ weight

$P(\text{satellite present}) = P(y=1)$



Scatterplot of female crab weight and the response (at least one satellite present).



Linear fit to the female crab weight and response (at least one satellite present) data.



Reading probability values off the linear model fit for female crab weight and the response (at least one satellite present).
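A straight line fitted to a 0/1 response is not constrained to the interval $[0, 1]$. At the edges of the data its "probabilities" can drop below 0 or exceed 1. Here is a minimal sketch with a deliberately simple toy data set (an assumption, not the crab data): the response is 0 for $x \le 5$ and 1 above.

```python
import numpy as np

# Toy binary data (assumption): response switches from 0 to 1 after x = 5
x = np.arange(11, dtype=float)          # 0, 1, ..., 10
y = (x > 5).astype(float)

# Ordinary least-squares line y ~ x
X = np.column_stack([np.ones_like(x), x])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

fitted = b0 + b1 * x
print(fitted.min(), fitted.max())       # below 0 at x = 0, above 1 at x = 10
```

With these values the line is roughly $-0.227 + 0.136x$, so the fitted value is negative at $x = 0$ and larger than 1 at $x = 10$. Neither can be read as a probability.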


Linear model and binary data

Adding the GLM (binomial) fit to the linear fit for the female crab weight and response (at least one satellite present) data.



Reading probability values off the GLM (binomial) fit for female crab weight and the response (at least one satellite present).
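The binomial GLM keeps its fitted values inside $(0, 1)$ because the linear predictor $\eta = \beta_0 + \beta_1 x_1$ is passed through the inverse logit $\mu = 1/(1 + e^{-\eta})$. Here is a from-scratch sketch of how such a model can be fitted by iteratively reweighted least squares (IRLS). It is written in NumPy for transparency. It is not `statsmodels`' internal code, and the helper names `sigmoid` and `fit_logistic_irls` are hypothetical. The data are simulated: the weights and the true coefficients (-4, 2) are assumptions, not the crab data.

```python
import numpy as np

def sigmoid(eta):
    """Inverse logit link: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

def fit_logistic_irls(X, y, n_iter=25, tol=1e-10):
    """Binomial GLM with logit link, fitted by IRLS.

    X must already include a column of ones for the intercept.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                 # linear predictor
        mu = sigmoid(eta)              # fitted probabilities
        w = mu * (1.0 - mu)            # binomial variance = IRLS weights
        z = eta + (y - mu) / w         # working response
        XtW = X.T * w                  # X^T W without forming diag(w)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated data (assumption): hypothetical weights in kg and a logistic truth
rng = np.random.default_rng(0)
weight = rng.uniform(1.2, 5.2, 500)
y = rng.binomial(1, sigmoid(-4.0 + 2.0 * weight))

X = np.column_stack([np.ones_like(weight), weight])
beta_hat = fit_logistic_irls(X, y)
probs = sigmoid(X @ beta_hat)
print(beta_hat, probs.min(), probs.max())
```

Every fitted probability stays strictly between 0 and 1, unlike the straight-line fit. In practice the same model is fitted with `glm(..., family=sm.families.Binomial())`.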


From probabilities to classes

Splitting the model output at a chosen probability threshold.
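Turning fitted probabilities into class labels is a single comparison against the threshold. Here is a sketch. The function name `to_class`, the default of 0.5 and the convention that a probability exactly at the threshold maps to class 1 are all assumptions. The probabilities are made-up values.

```python
import numpy as np

def to_class(probs, threshold=0.5):
    """Return 1 where the predicted probability reaches the threshold, else 0."""
    return (np.asarray(probs) >= threshold).astype(int)

probs = [0.12, 0.48, 0.50, 0.73, 0.95]   # hypothetical fitted probabilities
print(to_class(probs))                    # [0 0 1 1 1]
print(to_class(probs, threshold=0.7))     # [0 0 0 1 1]
```

Raising the threshold labels fewer nests as having a satellite. The choice trades false positives against false negatives.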


Let's practice!
