Generalized Linear Models in Python
Ita Cirovic Donev
Data Science Consultant
Model formula $$ \text{logit}(y) = \beta_0+\beta_1x_1 + \color{blue}{\beta_2}\color{red}{x_2} + ... + \color{blue}{\beta_p}\color{red}{x_p} $$
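As a minimal sketch of what the logit on the left-hand side of the formula means, the snippet below defines the logit and its inverse using only the standard library. The function names `logit` and `inv_logit` are illustrative choices, not part of statsmodels.

```python
import math

def logit(p):
    """Log-odds of a probability p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Map a linear predictor eta back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

print(logit(0.5))             # a probability of 0.5 has log-odds 0
print(inv_logit(logit(0.8)))  # the round trip recovers roughly 0.8
```

The linear combination of the x variables lives on the logit scale; applying `inv_logit` to it gives the modeled probability.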
In Python
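import statsmodels.api as sm
from statsmodels.formula.api import glm
# Logistic regression: Binomial family (logit link by default)
model = glm('y ~ x1 + x2 + x3 + x4',
            data = my_data,
            family = sm.families.Binomial()).fit()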
formula = 'switch ~ distance100 + arsenic'
wells_fit = glm(formula = formula, data = wells,
                family = sm.families.Binomial()).fit()
print(wells_fit.summary())
===============================================================================
                   coef    std err          z      P>|z|     [0.025     0.975]
-------------------------------------------------------------------------------
Intercept        0.0027      0.079      0.035      0.972     -0.153      0.158
distance100     -0.8966      0.104     -8.593      0.000     -1.101     -0.692
arsenic          0.4608      0.041     11.134      0.000      0.380      0.542
===============================================================================
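The coefficients in the summary are on the logit scale. As a rough sketch, exponentiating a coefficient gives the multiplicative change in the odds per unit increase of that variable, and the inverse logit of the linear predictor gives a probability. The coefficient values are copied from the table; the helper `predict_prob` and the household values (`distance100=0.5`, `arsenic=1.0`) are hypothetical and chosen only for illustration.

```python
import math

# Coefficients copied from the wells_fit summary table
coefs = {"Intercept": 0.0027, "distance100": -0.8966, "arsenic": 0.4608}

# exp(coefficient): multiplicative change in the odds per unit increase
odds_ratios = {name: math.exp(b) for name, b in coefs.items()
               if name != "Intercept"}

def predict_prob(distance100, arsenic):
    """Probability of switching implied by the fitted coefficients."""
    eta = (coefs["Intercept"]
           + coefs["distance100"] * distance100
           + coefs["arsenic"] * arsenic)
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical household, values chosen only for illustration
p = predict_prob(distance100=0.5, arsenic=1.0)

print(odds_ratios)
print(round(p, 3))
```

An odds ratio below 1 (distance100) means the odds of switching fall as the variable increases; an odds ratio above 1 (arsenic) means they rise.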
A unit change in distance100 corresponds to a negative difference of 0.90 in the logit, holding arsenic fixed.
A unit change in arsenic corresponds to a positive difference of 0.46 in the logit, holding distance100 fixed.
Adding the arsenic variable: the distance100 coefficient changes from -0.63 to -0.90.
Model with arsenic:
                  coef    std err
---------------------------------
Intercept       0.0027      0.079
distance100    -0.8966      0.104
arsenic         0.4608      0.041
Model without arsenic:
                  coef    std err
---------------------------------
Intercept       0.6060      0.060
distance100    -0.6291      0.097
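To make the size of that change concrete, the arithmetic can be sketched as follows. The two coefficients are copied from the tables; the variable names are illustrative.

```python
# distance100 coefficient in each fit, copied from the two tables
coef_without_arsenic = -0.6291
coef_with_arsenic = -0.8966

# Absolute shift and shift relative to the original magnitude
shift = coef_with_arsenic - coef_without_arsenic
relative_change = shift / abs(coef_without_arsenic)

print(f"absolute shift:  {shift:.4f}")
print(f"relative change: {relative_change:.1%}")
```

A negative relative change here means the coefficient became more negative once arsenic entered the model.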
What to look for?
from statsmodels.stats.outliers_influence import variance_inflation_factor