Model formula

Generalized Linear Models in Python

Ita Cirovic Donev

Data Science Consultant

Formula and model matrix

Start of the diagram with data sources X and Y.

Formula diagram

Model matrix diagram

Diagram of the input to the glm class.
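The flow in the diagrams above (data, then formula, then model matrix) can be sketched with patsy's `dmatrices`, which splits a formula at `~` into a response and a model matrix. This is a minimal sketch on hypothetical toy data; the column names `y`, `x1`, and `x2` are illustrative and not taken from the course data.

```python
import pandas as pd
from patsy import dmatrices

# Hypothetical toy data; column names are illustrative only.
data = pd.DataFrame({"y": [0, 1, 1], "x1": [1, 2, 3], "x2": [4, 5, 6]})

# Left of '~' becomes the response, right of '~' becomes the model matrix.
y, X = dmatrices("y ~ x1 + x2", data=data)

print(y.design_info.column_names)  # names of the response columns
print(X.design_info.column_names)  # Intercept plus one column per term
print(X.shape)
```

A formula-based model class is assumed to perform this same split internally before fitting.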

Model matrix

  • Model matrix: $y \sim \mathbf{X}$

  • Model formula

    'y ~ x1 + x2'
    
  • Check model matrix structure
    from patsy import dmatrix
    x1 = [1, 2, 3]
    x2 = [4, 5, 6]
    dmatrix('x1 + x2')
    
DesignMatrix with shape (3, 3)
  Intercept  x1  x2
          1   1   4
          1   2   5
          1   3   6
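One way to check the model matrix structure is to ask patsy for a pandas DataFrame, so the column names and shape can be inspected directly. The sketch below assumes pandas is installed and passes the variables through a hypothetical `data` dictionary rather than relying on names in the calling scope.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical inputs, matching the three-row example on the slide.
data = {"x1": np.array([1, 2, 3]), "x2": np.array([4, 5, 6])}

# return_type="dataframe" yields a pandas DataFrame instead of a DesignMatrix.
X_df = dmatrix("x1 + x2", data=data, return_type="dataframe")

print(list(X_df.columns))  # the intercept is added automatically
print(X_df.shape)
```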

Variable transformation

import numpy as np
'y ~ x1 + np.log(x2)'
dmatrix('x1 + np.log(x2)')
DesignMatrix with shape (3, 3)
  Intercept  x1  np.log(x2)
          1   1     1.38629
          1   2     1.60944
          1   3     1.79176
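As a self-contained check of the transformation above, the transformed column can be compared with `np.log` applied by hand. The `data` dictionary is a hypothetical stand-in for the slide's variables; note that `np` must be importable where `dmatrix` is called, since the formula is evaluated as Python code.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical inputs, matching the slide's x1 and x2.
data = {"x1": np.array([1, 2, 3]), "x2": np.array([4, 5, 6])}

# np.log is applied to x2 while the matrix is built.
X = dmatrix("x1 + np.log(x2)", data=data)

log_column = np.asarray(X)[:, 2]
print(X.design_info.column_names)
print(np.allclose(log_column, np.log(data["x2"])))
```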

Centering and standardization

  • Stateful transforms
'y ~ center(x1) + standardize(x2)'
dmatrix('center(x1) + standardize(x2)')
DesignMatrix with shape (3, 3)
  Intercept  center(x1)  standardize(x2)
          1          -1         -1.22474
          1           0          0.00000
          1           1          1.22474
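"Stateful" means the transform remembers quantities computed on the original data (here the mean) and reuses them on new data. A minimal sketch of that behaviour, assuming patsy's `build_design_matrices` is used to apply a stored design to a hypothetical new observation:

```python
import numpy as np
from patsy import dmatrix, build_design_matrices

# Hypothetical training data; the mean of x1 is 2.
train = {"x1": np.array([1.0, 2.0, 3.0])}
X_train = dmatrix("center(x1)", data=train)

# Apply the same design to new data: the remembered training mean (2)
# is subtracted, not the mean of the new data.
new = {"x1": np.array([10.0])}
X_new = build_design_matrices([X_train.design_info], new)[0]

print(np.asarray(X_train))
print(np.asarray(X_new))
```

With a plain `x1 - np.mean(x1)` in the formula, the new observation would instead be centered on its own mean.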

Build your own transformation

def my_transformation(x):
  return 4 * x
dmatrix('x1 + x2 + my_transformation(x2)')
DesignMatrix with shape (3, 4)
  Intercept  x1  x2  my_transformation(x2)
          1   1   4                     16
          1   2   5                     20
          1   3   6                     24
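Any function visible where `dmatrix` is called can be used inside a formula in the same way. The sketch below uses a hypothetical helper, `log_plus_one`, which is not part of the slides, to show the pattern end to end.

```python
import numpy as np
from patsy import dmatrix

def log_plus_one(x):
    """Hypothetical custom transformation: log(x + 1)."""
    return np.log(x + 1)

# Hypothetical inputs, matching the slide's x1 and x2.
data = {"x1": np.array([1, 2, 3]), "x2": np.array([4, 5, 6])}

# The function is looked up by name in the calling scope.
X = dmatrix("x1 + log_plus_one(x2)", data=data)

print(X.design_info.column_names)
print(np.asarray(X)[:, 2])
```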

Arithmetic operations

x1 = np.array([1, 2, 3])
x2 = np.array([4, 5, 6])

dmatrix('I(x1 + x2)')
DesignMatrix with shape (3, 2)
  Intercept  I(x1 + x2)
          1           5
          1           7
          1           9
# With plain Python lists, + concatenates instead of adding element-wise
x1 = [1, 2, 3]
x2 = [4, 5, 6]

dmatrix('I(x1 + x2)')
DesignMatrix with shape (6, 2)
  Intercept  I(x1 + x2)
          1           1
          1           2
          1           3
          1           4
          1           5
          1           6
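The three cases can be put side by side: `+` as a formula operator, `+` as arithmetic inside `I()`, and `+` on Python lists. This is a sketch with hypothetical `data` dictionaries in place of the slide's free variables.

```python
import numpy as np
from patsy import dmatrix

arrays = {"x1": np.array([1, 2, 3]), "x2": np.array([4, 5, 6])}
lists = {"x1": [1, 2, 3], "x2": [4, 5, 6]}

# Outside I(), '+' adds a term: two separate columns.
X_terms = dmatrix("x1 + x2", data=arrays)

# Inside I(), '+' is ordinary arithmetic: one column holding the sums.
X_sum = dmatrix("I(x1 + x2)", data=arrays)

# On lists, Python's '+' concatenates, giving six rows.
X_concat = dmatrix("I(x1 + x2)", data=lists)

print(X_terms.shape, X_sum.shape, X_concat.shape)
```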

Coding the categorical data

Color type: red, green, blue

Diagram of color type: red, green, blue and color observations in data

Diagram of one-hot encoding using color of red, green and blue.

Patsy coding

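  • Strings and booleans are automatically coded as categorical
  • Numerical $\rightarrow$ categorical
    • Wrap the variable in the C() function
  • Reference group
    • Default: first group
    • Treatment argument: choose the reference group
    • levels argument: set the order of the groups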

The C() function

  • Numeric variable
    dmatrix('color', data = crab)
    
DesignMatrix with shape (173, 2)
  Intercept  color
          1      2
          1      3
          1      1
  [... rows omitted]
  • How many levels?
    crab['color'].value_counts()
    
2    95
3    44
4    22
1    12

The C() function

  • Categorical variable
    dmatrix('C(color)', data = crab)
    
DesignMatrix with shape (173, 4)
  Intercept  C(color)[T.2]  C(color)[T.3]  C(color)[T.4]
          1              1              0              0
          1              0              1              0
          1              0              0              0
  [... rows omitted]
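The difference between the numeric and the categorical treatment can be reproduced without the crab data. The sketch below uses a small hypothetical DataFrame with a `color` column coded 1 to 4; it is a stand-in, not the course dataset.

```python
import pandas as pd
from patsy import dmatrix

# Hypothetical stand-in for the crab data: color codes 1 to 4.
df = pd.DataFrame({"color": [2, 3, 1, 4, 2]})

# Treated as numeric: a single column holding the codes.
X_num = dmatrix("color", data=df)

# Treated as categorical: one indicator column per non-reference level.
X_cat = dmatrix("C(color)", data=df)

print(X_num.design_info.column_names)
print(X_cat.design_info.column_names)
```

With four levels and an intercept, three indicator columns are produced; the first level is absorbed into the intercept.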

Changing the reference group

dmatrix('C(color, Treatment(4))', data = crab)
DesignMatrix with shape (173, 4)
  Intercept  C(color, Treatment(4))[T.1]  C(color, Treatment(4))[T.2]  C(color, Treatment(4))[T.3]
          1                            0                            1                            0
          1                            0                            0                            1
          1                            1                            0                            0
  [... rows omitted]
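A quick way to confirm which group is the reference is to look at its rows: all of its indicator columns should be zero. A minimal sketch on a hypothetical `color` column, standing in for the crab data:

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Hypothetical stand-in for the crab data.
df = pd.DataFrame({"color": [2, 3, 1, 4, 2]})

# Treatment(4) makes level 4 the reference group.
X = dmatrix("C(color, Treatment(4))", data=df)

# Rows belonging to the reference group have zeros in every indicator column.
reference_rows = np.asarray(X)[(df["color"] == 4).to_numpy(), 1:]

print(X.design_info.column_names)
print(reference_rows)
```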

Changing the reference group

l = [1, 2, 3, 4]
dmatrix('C(color, levels = l)', data = crab)
DesignMatrix with shape (173, 4)
  Intercept  C(color, levels=l)[T.2]  C(color, levels=l)[T.3]  C(color, levels=l)[T.4]
          1                        1                        0                        0
          1                        0                        1                        0
          1                        0                        0                        0
  [... rows omitted]
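With `levels = [1, 2, 3, 4]` the first level, 1, remains the reference. To actually move the reference with `levels`, a different level has to come first. The sketch below assumes a hypothetical ordering, `levels_order`, and a hypothetical `color` column in place of the crab data.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Hypothetical stand-in for the crab data.
df = pd.DataFrame({"color": [2, 3, 1, 4, 2]})

# Hypothetical ordering: level 3 comes first, so it becomes the reference.
levels_order = [3, 1, 2, 4]
X = dmatrix("C(color, levels=levels_order)", data=df)

# The row with color 3 should have zeros in every indicator column.
row_color_3 = np.asarray(X)[1, 1:]

print(X.design_info.column_names)
print(row_color_3)
```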

Multiple intercepts

'y ~ C(color)-1'
dmatrix('C(color)-1', data = crab)
DesignMatrix with shape (173, 4)
  C(color)[1]  C(color)[2]  C(color)[3]  C(color)[4]
            0            1            0            0
            0            0            1            0
            1            0            0            0
  [... rows omitted]
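Removing the intercept with `-1` gives every level its own indicator column, so each row contains exactly one 1. A minimal sketch on a hypothetical `color` column, standing in for the crab data:

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Hypothetical stand-in for the crab data.
df = pd.DataFrame({"color": [2, 3, 1, 4, 2]})

# '- 1' drops the intercept; all four levels get a column.
X = dmatrix("C(color) - 1", data=df)

names = X.design_info.column_names
row_sums = np.asarray(X).sum(axis=1)

print(names)
print(row_sums)
```

In a fitted model, each of these columns would then carry its own group-specific intercept instead of a difference from a reference group.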

Let's practice!
