Generalized Linear Models in Python
Ita Cirovic Donev
Data Science Consultant
Model matrix: $y \sim \mathbf{X}$
Model formula
'y ~ x1 + x2'
from patsy import dmatrix
dmatrix('x1 + x2')
Intercept x1 x2
1 1 4
1 2 5
1 3 6
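The snippet above relies on x1 and x2 already existing in the calling scope. A minimal self-contained sketch, assuming the same three-row inputs that appear in the output, passes them explicitly through the data argument. The dictionary name `data` is an illustrative choice, not something from the slides.

```python
import numpy as np
from patsy import dmatrix

# Illustrative inputs matching the values in the output above
data = {"x1": np.array([1, 2, 3]), "x2": np.array([4, 5, 6])}

# Passing data explicitly avoids depending on variables in the calling scope
X = dmatrix("x1 + x2", data)

print(X.design_info.column_names)  # column labels, including the intercept
print(np.asarray(X))               # the underlying numeric array
```

The intercept column is added automatically; only the right-hand side of the formula is needed to build the model matrix.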
import numpy as np
'y ~ x1 + np.log(x2)'
dmatrix('x1 + np.log(x2)')
DesignMatrix with shape (3, 3)
Intercept x1 np.log(x2)
1 1 1.38629
1 2 1.60944
1 3 1.79176
'y ~ center(x1) + standardize(x2)'
dmatrix('center(x1) + standardize(x2)')
DesignMatrix with shape (3, 3)
Intercept center(x1) standardize(x2)
1 -1 -1.22474
1 0 0.00000
1 1 1.22474
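To see where these numbers come from, the two transforms can be reproduced by hand. This is a sketch assuming the same x1 and x2 as above and patsy's default of a population standard deviation (ddof=0) in standardize; the variable names are illustrative.

```python
import numpy as np
from patsy import dmatrix

data = {"x1": np.array([1.0, 2.0, 3.0]), "x2": np.array([4.0, 5.0, 6.0])}
X = np.asarray(dmatrix("center(x1) + standardize(x2)", data))

# center(): subtract the mean
centered = data["x1"] - data["x1"].mean()

# standardize(): subtract the mean, then divide by the standard deviation
# (ddof=0 is assumed here, matching patsy's default)
standardized = (data["x2"] - data["x2"].mean()) / data["x2"].std(ddof=0)

print(np.allclose(X[:, 1], centered))
print(np.allclose(X[:, 2], standardized))
```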
def my_transformation(x):
    return 4 * x
dmatrix('x1 + x2 + my_transformation(x2)')
DesignMatrix with shape (3, 4)
Intercept x1 x2 my_transformation(x2)
1 1 4 16
1 2 5 20
1 3 6 24
x1 = np.array([1, 2, 3])
x2 = np.array([4, 5, 6])
dmatrix('I(x1 + x2)')
DesignMatrix with shape (3, 2)
Intercept I(x1 + x2)
1 5
1 7
1 9
# Plain Python lists: + concatenates instead of adding element-wise
x1 = [1, 2, 3]
x2 = [4, 5, 6]
dmatrix('I(x1 + x2)')
DesignMatrix with shape (6, 2)
Intercept I(x1 + x2)
1 1
1 2
1 3
1 4
1 5
1 6
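The two results differ because I() evaluates its argument as ordinary Python, so the meaning of + depends on the input type. A short sketch of that difference follows; the names with the `_list` and `X_` prefixes and suffixes are illustrative.

```python
import numpy as np
from patsy import dmatrix

x1_list, x2_list = [1, 2, 3], [4, 5, 6]

# Lists: + concatenates, giving six values
concatenated = x1_list + x2_list

# NumPy arrays: + adds element-wise, giving three values
elementwise = np.array(x1_list) + np.array(x2_list)

# The same expression inside I() follows the same rules
X_lists = dmatrix("I(x1 + x2)", {"x1": x1_list, "x2": x2_list})
X_arrays = dmatrix("I(x1 + x2)", {"x1": np.array(x1_list), "x2": np.array(x2_list)})

print(X_lists.shape, X_arrays.shape)
```

Converting inputs to arrays (or a DataFrame) before building the matrix is one way to avoid the accidental concatenation.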
C() function: treats a variable as categorical
Treatment: sets the reference group
levels: sets the order of the levels
dmatrix('color', data = crab)
DesignMatrix with shape (173, 2)
Intercept color
1 2
1 3
1 1
[... rows omitted]
crab['color'].value_counts()
2 95
3 44
4 22
1 12
dmatrix('C(color)', data = crab)
DesignMatrix with shape (173, 4)
Intercept C(color)[T.2] C(color)[T.3] C(color)[T.4]
1 1 0 0
1 0 1 0
1 0 0 0
[... rows omitted]
dmatrix('C(color, Treatment(4))', data = crab)
DesignMatrix with shape (173, 4)
Intercept C(color, Treatment(4))[T.1] C(color, Treatment(4))[T.2] C(color, Treatment(4))[T.3]
1 0 1 0
1 0 0 1
1 1 0 0
[... rows omitted]
l = [1, 2, 3, 4]
dmatrix('C(color, levels = l)', data = crab)
DesignMatrix with shape (173, 4)
Intercept C(color, levels=l)[T.2] C(color, levels=l)[T.3] C(color, levels=l)[T.4]
1 1 0 0
1 0 1 0
1 0 0 0
[... rows omitted]
'y ~ C(color)-1'
dmatrix('C(color)-1', data = crab)
DesignMatrix with shape (173, 4)
C(color)[1] C(color)[2] C(color)[3] C(color)[4]
0 1 0 0
0 0 1 0
1 0 0 0
[... rows omitted]
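The three categorical encodings can be compared side by side on a small example. The crab data is not reproduced here, so this sketch uses a hypothetical five-row DataFrame named `toy` with only a color column coded 1 to 4. It is a stand-in for illustration, not the actual dataset.

```python
import numpy as np
import pandas as pd
from patsy import dmatrix

# Hypothetical stand-in for the crab data: color codes 1-4 only
toy = pd.DataFrame({"color": [2, 3, 1, 4, 2]})

# Default treatment coding: the first level (1) is the reference group
default_ref = dmatrix("C(color)", toy)

# Level 4 as the reference group
ref_four = dmatrix("C(color, Treatment(4))", toy)

# No intercept: one indicator column per level
no_intercept = dmatrix("C(color) - 1", toy)

for m in (default_ref, ref_four, no_intercept):
    print(m.design_info.column_names)
    print(np.asarray(m))
```

In each coding, a row belonging to the reference group has zeros in every indicator column, so its effect is absorbed by the intercept. Dropping the intercept removes the reference group and gives every level its own column.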