The Problem of Overdispersion

Generalized Linear Models in Python

Ita Cirovic Donev

Data Science Consultant

Understanding the data

[Figure: distribution of the number of satellites per crab]

# mean of y
y_mean = crab['sat'].mean()
2.919
# variance of y
y_variance = crab['sat'].var()
9.912

The mean is not equal to the variance (a Poisson model assumes they are equal)

  • $variance > mean$ $\rightarrow$ overdispersion
  • $variance < mean$ $\rightarrow$ underdispersion

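Consequences of overdispersion:

  • Standard errors are underestimated (too small)
  • p-values are too small, so effects can look significant when they are not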

How to check for overdispersion

[Figure: summary of the fitted model, highlighting the residual degrees of freedom (Df Residuals) and the Pearson chi-square statistic]

Compute the estimated overdispersion

ratio = crab_fit.pearson_chi2 / crab_fit.df_resid
print(ratio)
3.134
  • Ratio $= 1$ $\rightarrow$ approximately Poisson
  • Ratio $< 1$ $\rightarrow$ underdispersion
  • Ratio $> 1$ $\rightarrow$ overdispersion
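
To make the ratio less of a black box, here is a minimal sketch of what it measures, assuming a Poisson variance function: the Pearson chi-square is the sum of squared Pearson residuals $(y - \mu)^2 / \mu$, and the residual degrees of freedom are the number of observations minus the number of estimated parameters. The helper `pearson_dispersion` and the counts in `y` are hypothetical illustrations, not the crab data or a statsmodels function.

```python
import numpy as np

def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square over residual df, assuming Poisson variance Var(y) = mu."""
    y = np.asarray(y, dtype=float)
    mu = np.asarray(mu, dtype=float)
    pearson_chi2 = np.sum((y - mu) ** 2 / mu)  # sum of squared Pearson residuals
    df_resid = y.size - n_params               # observations minus fitted parameters
    return pearson_chi2 / df_resid

# Hypothetical counts (made up for illustration, NOT the crab data)
y = np.array([0, 0, 1, 2, 0, 8, 3, 0, 5, 11])

# Intercept-only Poisson fit: the fitted mean is the sample mean for every row
mu = np.full(y.size, y.mean())

ratio = pearson_dispersion(y, mu, n_params=1)
print(round(ratio, 3))
```

For an intercept-only model, this ratio reduces to the sample variance divided by the sample mean, which ties it back to the mean-versus-variance comparison earlier in the slides.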

Negative Binomial Regression

  • $E(y)=\lambda$
  • $Var(y) = \lambda+\alpha\lambda^2$
  • $\alpha$: dispersion parameter
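
The variance formula above can be checked by simulation. The sketch below assumes the usual mapping from $(\lambda, \alpha)$ to NumPy's `negative_binomial(n, p)` parameters, namely $n = 1/\alpha$ and $p = 1/(1 + \alpha\lambda)$. The values of `lam` and `alpha` are arbitrary illustrations, not estimates from the crab data.

```python
import numpy as np

rng = np.random.default_rng(0)

lam, alpha = 3.0, 1.0            # illustrative mean and dispersion (assumed values)
n = 1.0 / alpha                  # NumPy's "number of successes" parameter
p = 1.0 / (1.0 + alpha * lam)    # NumPy's success probability

# Draw a large sample so the empirical moments are close to the theoretical ones
y = rng.negative_binomial(n, p, size=200_000)

theoretical_var = lam + alpha * lam ** 2
print("sample mean:     ", y.mean())
print("sample variance: ", y.var())
print("lambda + alpha*lambda^2:", theoretical_var)
```

As $\alpha$ approaches 0, the quadratic term vanishes and the variance approaches $\lambda$, which is the Poisson case.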

Negative Binomial GLM in Python

import statsmodels.api as sm
from statsmodels.formula.api import glm
model = glm('y ~ x', data = my_data, 
            family = sm.families.NegativeBinomial(alpha = 1)).fit()

Let's practice!
