Model formula

Generalized Linear Models in Python

Ita Cirovic Donev

Data Science Consultant

Formula and model matrix

Start of the diagram with the data sources X and Y.

Formula diagram

Diagram of the model matrix

Diagram of the inputs to the glm class.
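As a minimal sketch of the step these diagrams describe, patsy's dmatrices can split a two-sided formula into a response and a model matrix. The toy values and the `data` dictionary are assumptions for illustration, not course data.

```python
import numpy as np
from patsy import dmatrices

# Toy data (assumed for illustration)
data = {
    'y': np.array([0, 1, 1]),
    'x1': np.array([1, 2, 3]),
    'x2': np.array([4, 5, 6]),
}

# Left of '~' becomes the response, right of '~' becomes the model matrix
y, X = dmatrices('y ~ x1 + x2', data=data)

print(y.design_info.column_names)  # response column name(s)
print(X.design_info.column_names)  # intercept plus predictors
print(X.shape)
```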

Model matrix

  • Model matrix: y ~ X

  • Model formula

    'y ~ x1 + x2'
    
  • Check the structure of the model matrix
    from patsy import dmatrix
    dmatrix('x1 + x2')
    
  Intercept  x1  x2
          1   1   4
          1   2   5
          1   3   6
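The call above relies on x1 and x2 already existing in the calling environment. A self-contained sketch, assuming the toy values x1 = [1, 2, 3] and x2 = [4, 5, 6], passes them explicitly through the `data` argument instead:

```python
import numpy as np
from patsy import dmatrix

# Toy predictors (assumed values)
data = {'x1': np.array([1, 2, 3]), 'x2': np.array([4, 5, 6])}

# Only the right-hand side of the formula is needed for dmatrix
X = dmatrix('x1 + x2', data=data)

print(X.design_info.column_names)  # an intercept column is added automatically
print(np.asarray(X))
```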

Variable transformation

import numpy as np
'y ~ x1 + np.log(x2)'
dmatrix('x1 + np.log(x2)')
DesignMatrix with shape (3, 3)
  Intercept  x1  np.log(x2)
          1   1     1.38629
          1   2     1.60944
          1   3     1.79176
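One way to check the transformed column is to compare it with NumPy directly. This sketch assumes the same toy values and passes them through a `data` dictionary; `np` is looked up in the calling environment.

```python
import numpy as np
from patsy import dmatrix

data = {'x1': np.array([1, 2, 3]), 'x2': np.array([4, 5, 6])}

# Any expression patsy can evaluate may appear as a term, here np.log
X = dmatrix('x1 + np.log(x2)', data=data)

# The third column should match np.log applied to x2
log_column = np.asarray(X)[:, 2]
print(np.allclose(log_column, np.log(data['x2'])))
```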

Centering and standardizing

  • Stateful transforms
'y ~ center(x1) + standardize(x2)'
dmatrix('center(x1) + standardize(x2)')
DesignMatrix with shape (3, 3)
  Intercept  center(x1)  standardize(x2)
          1          -1         -1.22474
          1           0          0.00000
          1           1          1.22474
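"Stateful" means the transform remembers the mean and standard deviation of the data it was built on and reuses them for new data. A minimal sketch, assuming the same toy values plus a hypothetical new observation, using patsy's build_design_matrices:

```python
import numpy as np
from patsy import dmatrix, build_design_matrices

train = {'x1': np.array([1, 2, 3]), 'x2': np.array([4, 5, 6])}
X = dmatrix('center(x1) + standardize(x2)', data=train)

# The same columns computed by hand (population std, ddof=0)
centered = train['x1'] - train['x1'].mean()
standardized = (train['x2'] - train['x2'].mean()) / train['x2'].std()

# Hypothetical new data is transformed with the mean and std memorized from `train`
new = {'x1': np.array([10]), 'x2': np.array([5])}
(X_new,) = build_design_matrices([X.design_info], new)
print(np.asarray(X_new))
```

Here x1 = 10 is centered with the training mean of 2, and x2 = 5 equals the training mean, so its standardized value is 0.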

Create your own transformation

def my_transformation(x):
    return 4 * x
dmatrix('x1 + x2 + my_transformation(x2)')
DesignMatrix with shape (3, 4)
  Intercept  x1  x2  my_transformation(x2)
          1   1   4                     16
          1   2   5                     20
          1   3   6                     24
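A self-contained version of the example above, assuming the same toy values. The function only needs to be visible where dmatrix is called; the variables are passed through a `data` dictionary here.

```python
import numpy as np
from patsy import dmatrix

def my_transformation(x):
    # Element-wise because x arrives as a NumPy array
    return 4 * x

data = {'x1': np.array([1, 2, 3]), 'x2': np.array([4, 5, 6])}

# The function name is resolved from the calling environment
X = dmatrix('x1 + x2 + my_transformation(x2)', data=data)

print(X.design_info.column_names)
print(np.asarray(X)[:, 3])  # the transformed column
```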

Arithmetic operations

# NumPy arrays: + adds element-wise
x1 = np.array([1, 2, 3])
x2 = np.array([4, 5, 6])

dmatrix('I(x1 + x2)')
DesignMatrix with shape (3, 2)
  Intercept  I(x1 + x2)
          1           5
          1           7
          1           9
# Python lists: + concatenates, so the result has 6 rows
x1 = [1, 2, 3]
x2 = [4, 5, 6]

dmatrix('I(x1 + x2)')
DesignMatrix with shape (6, 2)
  Intercept  I(x1 + x2)
          1           1
          1           2
          1           3
          1           4
          1           5
          1           6
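The two results differ only because of how Python evaluates + inside I(): arrays add element-wise, lists concatenate. A short sketch of both cases side by side, with the inputs passed through `data` dictionaries (an assumption made to keep the example self-contained):

```python
import numpy as np
from patsy import dmatrix

as_arrays = {'x1': np.array([1, 2, 3]), 'x2': np.array([4, 5, 6])}
as_lists = {'x1': [1, 2, 3], 'x2': [4, 5, 6]}

# I() protects the expression so + is plain Python arithmetic, not a formula operator
X_arrays = dmatrix('I(x1 + x2)', data=as_arrays)
X_lists = dmatrix('I(x1 + x2)', data=as_lists)

print(X_arrays.shape, np.asarray(X_arrays)[:, 1])  # element-wise sum
print(X_lists.shape, np.asarray(X_lists)[:, 1])    # concatenated values
```

Converting inputs to NumPy arrays (or using DataFrame columns) before building the matrix avoids the accidental concatenation.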

Categorical data encoding

Color type: red, green, blue

Diagram of the color type (red, green, blue) and the color observations in the data

Diagram of one-hot encoding with the colors red, green, and blue.

Patsy coding

  • Strings and booleans are coded automatically
  • Numeric → categorical
    • C() function
  • Reference group
    • Default: first group
    • Treatment
    • levels
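To illustrate the automatic coding of strings, here is a small sketch with a hypothetical color column. patsy sorts the levels, so 'blue' comes first and becomes the reference group.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical string-valued variable
data = {'color': ['red', 'green', 'blue', 'green']}

# No C() needed: strings are treated as categorical automatically
X = dmatrix('color', data=data)

print(X.design_info.column_names)
print(np.asarray(X))
```

The reference level 'blue' has no column of its own; a blue observation shows up as a row with zeros in both indicator columns.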

The C() function

  • Numeric variable
    dmatrix('color', data = crab)
    
DesignMatrix with shape (173, 2)
  Intercept  color
          1      2
          1      3
          1      1
  [... rows omitted]
  • How many levels?
    crab['color'].value_counts()
    
2    95
3    44
4    22
1    12

The C() function

  • Categorical variable
    dmatrix('C(color)', data = crab)
    
DesignMatrix with shape (173, 4)
  Intercept  C(color)[T.2]  C(color)[T.3]  C(color)[T.4]
          1              1              0              0
          1              0              1              0
          1              0              0              0
  [... rows omitted]
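The crab data is not reproduced here, so this sketch uses a hypothetical five-row stand-in with the same four color codes to contrast the numeric and categorical treatments of one column.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical stand-in for crab['color'] (integer codes 1 to 4)
data = {'color': [2, 3, 1, 4, 2]}

X_numeric = dmatrix('color', data=data)         # one numeric column
X_categorical = dmatrix('C(color)', data=data)  # one indicator per non-reference level

print(X_numeric.shape, X_numeric.design_info.column_names)
print(X_categorical.shape, X_categorical.design_info.column_names)
```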

Changing the reference group

dmatrix('C(color, Treatment(4))', data = crab)
DesignMatrix with shape (173, 4)
  Intercept  C(color)[T.1]  C(color)[T.2]  C(color)[T.3]
          1              0              1              0
          1              0              0              1
          1              1              0              0
  [... rows omitted]
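A runnable sketch of the same idea on a hypothetical four-row stand-in for the crab data. With Treatment(4), observations with color 4 are the reference, so their row contains only the intercept.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical stand-in for crab['color']
data = {'color': [2, 3, 1, 4]}

# Treatment(4) makes level 4 the reference group
X = dmatrix('C(color, Treatment(4))', data=data)
rows = np.asarray(X)

print(X.design_info.column_names)
print(rows)
```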

Changing the reference group

l = [1, 2, 3, 4]
dmatrix('C(color, levels = l)', data = crab)
DesignMatrix with shape (173, 4)
  Intercept  C(color)[T.2]  C(color)[T.3]  C(color)[T.4]
          1              1              0              0
          1              0              1              0
          1              0              0              0
  [... rows omitted]
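With levels in the order [1, 2, 3, 4] the reference group is still level 1, the same as the default. Because the first entry of levels is used as the reference, reordering the list moves it. A sketch on a hypothetical stand-in for the crab data, with `level_order` as an illustrative name:

```python
import numpy as np
from patsy import dmatrix

# Hypothetical stand-in for crab['color']
data = {'color': [2, 3, 1, 4]}

# Put 4 first so it becomes the reference group
level_order = [4, 1, 2, 3]
X = dmatrix('C(color, levels=level_order)', data=data)
rows = np.asarray(X)

print(X.design_info.column_names)
print(rows)
```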

Multiple intercepts

'y ~ C(color) - 1'
dmatrix('C(color) - 1', data = crab)
DesignMatrix with shape (173, 4)
  C(color)[1]  C(color)[2]  C(color)[3]  C(color)[4]
            0            1            0            0
            0            0            1            0
            1            0            0            0
  [... rows omitted]
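Removing the intercept with - 1 gives every level its own indicator column, so each row contains exactly one 1. A sketch on a hypothetical stand-in for the crab data:

```python
import numpy as np
from patsy import dmatrix

# Hypothetical stand-in for crab['color']
data = {'color': [2, 3, 1, 4]}

# '- 1' drops the global intercept; each level gets its own column
X = dmatrix('C(color) - 1', data=data)
rows = np.asarray(X)

print(X.design_info.column_names)
print(rows.sum(axis=1))  # one active indicator per row
```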

Let's practice!
