Model formula

Generalized Linear Models in Python

Ita Cirovic Donev

Data Science Consultant

Formula and model matrix

Start of the diagram, with data sources X and Y.

Formula and model matrix

Formula diagram

Formula and model matrix

Model matrix diagram

Formula and model matrix

Diagram of the inputs to the glm class.

Model matrix

  • Model matrix: y ∼ X

  • Model formula

    'y ~ x1 + x2'
    
  • Check the model matrix structure
    from patsy import dmatrix
    dmatrix('x1 + x2')
    
  Intercept  x1  x2
          1   1   4
          1   2   5
          1   3   6
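
As a minimal sketch of the inspection step above, the snippet below rebuilds the same three-row matrix. The values of `x1` and `x2` are assumed from the output shown, and passing them through a `data` dictionary (instead of relying on variables in scope) is a choice made here to keep the example self-contained.

```python
import numpy as np
from patsy import dmatrix

# Assumed illustrative data, chosen to match the output shown on the slide.
data = {'x1': np.array([1, 2, 3]), 'x2': np.array([4, 5, 6])}

# Only the right-hand side of the formula is needed for the model matrix.
# patsy adds the Intercept column automatically.
X = dmatrix('x1 + x2', data=data)

column_names = X.design_info.column_names
values = np.asarray(X)

print(column_names)
print(values)
```

The `design_info.column_names` attribute is a convenient way to confirm which terms ended up in the matrix before fitting a model.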

Variable transformation

import numpy as np
'y ~ x1 + np.log(x2)'
dmatrix('x1 + np.log(x2)')
DesignMatrix with shape (3, 3)
  Intercept  x1  np.log(x2)
          1   1     1.38629
          1   2     1.60944
          1   3     1.79176
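
A short check, assuming the same illustrative `x1` and `x2` values as in the output above, that the transformed column is simply `np.log` applied element-wise to `x2`:

```python
import numpy as np
from patsy import dmatrix

# Assumed illustrative data matching the slide output.
data = {'x1': np.array([1, 2, 3]), 'x2': np.array([4, 5, 6])}

# np must be importable in the calling scope so the formula can resolve np.log.
X = dmatrix('x1 + np.log(x2)', data=data)

log_column = np.asarray(X)[:, 2]
matches_numpy = np.allclose(log_column, np.log(data['x2']))
print(X.design_info.column_names)
print(matches_numpy)
```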

Centering and standardization

  • Stateful transforms
'y ~ center(x1) + standardize(x2)'
dmatrix('center(x1) + standardize(x2)')
DesignMatrix with shape (3, 3)
  Intercept  center(x1)  standardize(x2)
          1          -1         -1.22474
          1           0          0.00000
          1           1          1.22474
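
The sketch below, using assumed illustrative data, compares `center()` and `standardize()` with a hand computation in NumPy, and then shows what "stateful" means in practice: the mean and standard deviation memorized from the original data are reused when the same design is applied to a new observation. The `new_data` values are hypothetical.

```python
import numpy as np
from patsy import dmatrix, build_design_matrices

# Assumed illustrative data matching the slide output.
data = {'x1': np.array([1.0, 2.0, 3.0]), 'x2': np.array([4.0, 5.0, 6.0])}
X = dmatrix('center(x1) + standardize(x2)', data=data)
values = np.asarray(X)

# The same columns computed by hand.
centered = data['x1'] - data['x1'].mean()
# standardize() divides by the population standard deviation (ddof=0) by default.
standardized = (data['x2'] - data['x2'].mean()) / data['x2'].std()

# Apply the memorized mean and standard deviation to a new, hypothetical observation.
new_data = {'x1': np.array([10.0]), 'x2': np.array([5.0])}
new_values = np.asarray(build_design_matrices([X.design_info], new_data)[0])

print(values)
print(new_values)
```

The new row is centered with the original mean of `x1` (2), not with the mean of the new data, which is the behavior needed when predicting on unseen observations.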

Custom transformation

def my_transformation(x):
    return 4 * x
dmatrix('x1 + x2 + my_transformation(x2)')
DesignMatrix with shape (3, 4)
  Intercept  x1  x2  my_transformation(x2)
          1   1   4                     16
          1   2   5                     20
          1   3   6                     24
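
A self-contained version of the custom transformation, as a sketch with assumed data. The function only needs to be defined in the scope where `dmatrix` is called, since patsy resolves names in the formula from the calling environment.

```python
import numpy as np
from patsy import dmatrix

def my_transformation(x):
    # Any function that maps an array to an array of the same length works here.
    return 4 * x

# Assumed illustrative data matching the slide output.
data = {'x1': np.array([1, 2, 3]), 'x2': np.array([4, 5, 6])}

X = dmatrix('x1 + x2 + my_transformation(x2)', data=data)
names = X.design_info.column_names
transformed = np.asarray(X)[:, 3]

print(names)
print(transformed)
```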

Arithmetic operations

x1 = np.array([1, 2, 3])
x2 = np.array([4, 5, 6])

dmatrix('I(x1 + x2)')
DesignMatrix with shape (3, 2)
  Intercept  I(x1 + x2)
          1           5
          1           7
          1           9
x1 = [1, 2, 3]
x2 = [4, 5, 6]

dmatrix('I(x1 + x2)')
DesignMatrix with shape (6, 2)
  Intercept  I(x1 + x2)
          1           1
          1           2
          1           3
          1           4
          1           5
          1           6
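
The two outputs above differ because `I()` evaluates its argument as ordinary Python: `+` adds NumPy arrays element-wise but concatenates lists. A minimal sketch reproducing both cases side by side, with the data passed through dictionaries for self-containment:

```python
import numpy as np
from patsy import dmatrix

as_arrays = {'x1': np.array([1, 2, 3]), 'x2': np.array([4, 5, 6])}
as_lists = {'x1': [1, 2, 3], 'x2': [4, 5, 6]}

# Inside I(), x1 + x2 is plain Python addition, not the formula "+" operator.
X_arrays = np.asarray(dmatrix('I(x1 + x2)', data=as_arrays))
X_lists = np.asarray(dmatrix('I(x1 + x2)', data=as_lists))

print(X_arrays.shape, X_arrays[:, 1])  # element-wise sum, 3 rows
print(X_lists.shape, X_lists[:, 1])    # concatenated lists, 6 rows
```

Converting inputs to NumPy arrays (or pandas columns) before using arithmetic inside `I()` avoids the silent change in the number of rows.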

Coding categorical data

Color types: red, green, blue

Coding categorical data

Diagram of the color types (red, green, blue) and the color observations in the data

Coding categorical data

Diagram of one-hot encoding using the colors red, green, and blue.

Patsy coding

  • Strings and booleans are coded automatically
  • Numeric → categorical
    • C() function
  • Reference group
    • Default: first group
    • Treatment
    • levels
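
To illustrate the first and third bullets, here is a small sketch with a hypothetical `color` column of strings. No `C()` is needed: patsy codes the strings automatically, sorts the levels, and uses the first one (here `blue`) as the reference group.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical observations, for illustration only.
data = {'color': ['red', 'green', 'blue', 'green']}

X = dmatrix('color', data=data)
names = X.design_info.column_names
values = np.asarray(X)

print(names)
print(values)
```

The reference level gets no column of its own. An observation in the reference group shows up as zeros in every dummy column, with only the intercept set to 1.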

The C() function

  • Numeric variable
    dmatrix('color', data = crab)
    
DesignMatrix with shape (173, 2)
  Intercept  color
          1      2
          1      3
          1      1
  [... rows omitted]
  • How many levels?
    crab['color'].value_counts()
    
2    95
3    44
4    22
1    12

The C() function

  • Categorical variable
    dmatrix('C(color)', data = crab)
    
DesignMatrix with shape (173, 4)
  Intercept  C(color)[T.2]  C(color)[T.3]  C(color)[T.4]
          1              1              0              0
          1              0              1              0
          1              0              0              0
  [... rows omitted]
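
The `crab` dataset is not included here, so the sketch below uses a small hypothetical stand-in with the same four integer color codes. It contrasts the two cases: without `C()` the codes are treated as one numeric column, with `C()` they become dummy columns with level 1 as the reference.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical stand-in for the crab data: integer color codes 1 to 4.
data = {'color': np.array([2, 3, 1, 4, 2])}

numeric = dmatrix('color', data=data)
categorical = dmatrix('C(color)', data=data)

numeric_names = numeric.design_info.column_names
categorical_names = categorical.design_info.column_names

print(numeric_names, np.asarray(numeric).shape)
print(categorical_names, np.asarray(categorical).shape)
```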

Changing the reference group

dmatrix('C(color, Treatment(4))', data = crab)
DesignMatrix with shape (173, 4)
  Intercept  C(color)[T.1]  C(color)[T.2]  C(color)[T.3]
          1              0              1              0
          1              0              0              1
          1              1              0              0
  [... rows omitted]
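
A sketch of the `Treatment` option on a hypothetical stand-in for the crab data. With `Treatment(4)`, level 4 becomes the reference, so observations with color 4 have zeros in all dummy columns. Note that the column labels patsy prints may carry the full term text rather than the shortened `C(color)` form, so the check below looks only at the level suffixes.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical stand-in for the crab data: integer color codes 1 to 4.
data = {'color': np.array([2, 3, 1, 4, 2])}

X = dmatrix('C(color, Treatment(4))', data=data)
names = X.design_info.column_names
values = np.asarray(X)

# Keep only the "[T.level]" part of each dummy column label.
suffixes = [name[name.rindex('['):] for name in names[1:]]
reference_row = values[3]  # the observation with color == 4

print(suffixes)
print(reference_row)
```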

Changing the reference group

l = [1, 2, 3, 4]
dmatrix('C(color, levels = l)', data = crab)
DesignMatrix with shape (173, 4)
  Intercept  C(color)[T.2]  C(color)[T.3]  C(color)[T.4]
          1              1              0              0
          1              0              1              0
          1              0              0              0
  [... rows omitted]
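
With `levels`, the first element of the list is the reference group, which is why `[1, 2, 3, 4]` above reproduces the default coding. The sketch below, on hypothetical stand-in data, reorders the list so that level 4 comes first and therefore becomes the reference. Writing the list inline in the formula is a choice made here for self-containment.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical stand-in for the crab data: integer color codes 1 to 4.
data = {'color': np.array([2, 3, 1, 4, 2])}

default_order = dmatrix('C(color, levels=[1, 2, 3, 4])', data=data)
reordered = dmatrix('C(color, levels=[4, 1, 2, 3])', data=data)

def level_suffixes(design_matrix):
    # Extract the "[T.level]" part of each dummy column label.
    names = design_matrix.design_info.column_names[1:]
    return [name[name.rindex('['):] for name in names]

default_suffixes = level_suffixes(default_order)
reordered_suffixes = level_suffixes(reordered)

print(default_suffixes)    # level 1 is the reference
print(reordered_suffixes)  # level 4 is the reference
```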

Multiple intercepts

'y ~ C(color)-1'
dmatrix('C(color)-1', data = crab)
DesignMatrix with shape (173, 4)
  C(color)[1]  C(color)[2]  C(color)[3]  C(color)[4]
            0            1            0            0
            0            0            1            0
            1            0            0            0
  [... rows omitted]
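
A sketch of the `-1` syntax on hypothetical stand-in data. Removing the intercept makes patsy emit one indicator column per level, so each row contains exactly one 1 and, in a fitted model, each level gets its own intercept.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical stand-in for the crab data: integer color codes 1 to 4.
data = {'color': np.array([2, 3, 1, 4, 2])}

X = dmatrix('C(color) - 1', data=data)
names = X.design_info.column_names
values = np.asarray(X)
row_sums = values.sum(axis=1)

print(names)
print(values)
```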

Let's practice!
