How to fit a GLM in Python?

Generalized Linear Models in Python

Ita Cirovic Donev

Data Science Consultant

statsmodels

  • Importing statsmodels

    import statsmodels.api as sm
    
  • Support for formulas

    import statsmodels.formula.api as smf
    
  • Use glm() directly

    from statsmodels.formula.api import glm
    

Process of model fit

  1. Describe the model $\rightarrow \texttt{\color{#007AFF}{glm()}}$
  2. Fit the model $\rightarrow \texttt{\color{#007AFF}{.fit()}}$
  3. Summarize the model $\rightarrow \texttt{\color{#007AFF}{.summary()}}$
  4. Make model predictions $\rightarrow \texttt{\color{#007AFF}{.predict()}}$

Describing the model

FORMULA based

from statsmodels.formula.api import glm
model = glm(formula=formula, data=data, family=family)

ARRAY based

import statsmodels.api as sm
X = sm.add_constant(X)
model = sm.GLM(y, X, family=family)

Formula Argument

$$\texttt{\color{#00A388}{response}} \sim \texttt{\color{#FF6138}{explanatory variable(s)}}$$
$$\texttt{\color{#00A388}{output}} \sim \texttt{\color{#FF6138}{input(s)}}$$

formula = 'y ~ x1 + x2'
  • $\texttt{\color{#FF6138}{C(x1)}}$ : treat x1 as a categorical variable
  • $\texttt{\color{#FF6138}{-1}}$ : remove the intercept
  • $\texttt{\color{#FF6138}{x1:x2}}$ : an interaction term between x1 and x2
  • $\texttt{\color{#FF6138}{x1*x2}}$ : the individual variables x1 and x2 plus their interaction term (same as x1 + x2 + x1:x2)
  • $\texttt{\color{#FF6138}{np.log(x1)}}$ : apply vectorized functions to model variables

Family Argument

family = sm.families.____()

The family functions:

  • $\texttt{\color{#007AFF}{Gaussian}(link = sm.families.links.\color{deeppink}{identity()})}$ $\rightarrow$ the default family
  • $\texttt{\color{#007AFF}{Binomial}(link = sm.families.links.\color{deeppink}{logit()})}$
    • Other supported links: $\texttt{\color{deeppink}{probit()}}$, $\texttt{\color{deeppink}{cauchy()}}$, $\texttt{\color{deeppink}{log()}}$, and $\texttt{\color{deeppink}{cloglog()}}$
  • $\texttt{\color{#007AFF}{Poisson}(link = sm.families.links.\color{deeppink}{log()})}$
    • Other supported links: $\texttt{\color{deeppink}{identity()}}$ and $\texttt{\color{deeppink}{sqrt()}}$

You can review the other distribution families on the statsmodels website.


Summarizing the model

print(model_GLM.summary())
                 Generalized Linear Model Regression Results                  
=============================================================================
Dep. Variable:                      y  No. Observations:                  173
Model:                            GLM  Df Residuals:                      171
Model Family:                Binomial  Df Model:                            1
Link Function:                  logit  Scale:                          1.0000
Method:                          IRLS  Log-Likelihood:                -97.226
Date:                Mon, 21 Jan 2019  Deviance:                       194.45
Time:                        11:30:01  Pearson chi2:                     165.
No. Iterations:                     4  Covariance Type:             nonrobust
=============================================================================
                 coef    std err         z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------
Intercept    -12.3508      2.629    -4.698      0.000     -17.503      -7.199
width          0.4972      0.102     4.887      0.000       0.298       0.697
=============================================================================

Regression coefficients

$\texttt{\color{#007AFF}{.params}}$ holds the estimated regression coefficients

model_GLM.params
Intercept   -12.350818
width         0.497231
dtype: float64

$\texttt{\color{#007AFF}{.conf\_int(alpha=0.05, cols=None)}}$ returns confidence intervals for the coefficients

model_GLM.conf_int()
                   0         1
Intercept -17.503010 -7.198625
width       0.297833  0.696629

Predictions

  • The test data must contain every explanatory variable used in the model
  • $\texttt{\color{#007AFF}{.predict(\color{#FF931B}{test\_data})}}$ computes predictions
model_GLM.predict(test_data)
0    0.029309
1    0.470299
2    0.834983
3    0.972363
4    0.987941

Let's practice!
