Generalized Linear Models in Python
Ita Cirovic Donev
Data Science Consultant
$\color{#00A388}{\text{salary}} \sim \color{#FF6138}{\text{experience}}$
$\normalsize{\color{#00A388}{\text{salary}} = \color{#007AFF}{\beta_0} + \color{#007AFF}{\beta_1}\times\color{#FF6138}{\text{experience}} + \color{#B12BFF}\epsilon}$
$\normalsize{\color{#00A388}y = \color{#007AFF}{\beta_0} + \color{#007AFF}{\beta_1}\color{#FF6138}{x_1} + \color{#B12BFF}\epsilon}$
where:
$y$ - response variable (output)
$\color{#FF6138}{x_1}$ - explanatory variable (input)
$\color{#007AFF}{\beta}$ - model parameters
$\color{#007AFF}{\beta_0}$ - intercept
$\color{#007AFF}{\beta_1}$ - slope
$\color{#B12BFF}{\epsilon}$ - random error
LINEAR MODEL - ols()
from statsmodels.formula.api import ols
model = ols(formula='y ~ X',
            data=my_data).fit()
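To make the `ols()` call above concrete, here is a minimal sketch that fits `salary ~ experience` on simulated data. The data frame `salary_data`, the sample size, and the noise level are assumptions made for illustration; this is not the dataset behind the slides.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Simulated data (hypothetical, not the course dataset)
rng = np.random.default_rng(0)
experience = rng.uniform(0, 20, size=200)   # years of experience
noise = rng.normal(0, 5000, size=200)       # random error term (epsilon)
salary = 25790 + 9449 * experience + noise  # y = beta_0 + beta_1 * x_1 + epsilon

salary_data = pd.DataFrame({"salary": salary, "experience": experience})

# Fit salary ~ experience by ordinary least squares
model = ols(formula="salary ~ experience", data=salary_data).fit()

# Estimated intercept (beta_0) and slope (beta_1)
print(model.params)
```

The estimates should land near the coefficients used to simulate the data, with the gap shrinking as the sample grows or the noise falls.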
GENERALIZED LINEAR MODEL - glm()
import statsmodels.api as sm
from statsmodels.formula.api import glm
model = glm(formula='y ~ X',
            data=my_data,
            family=sm.families.____).fit()
$$ \normalsize{{\text{salary} = \color{blue}{25790} + \color{blue}{9449}\times\text{experience}}} $$
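The fitted equation can be read as a prediction rule. The sketch below plugs the two coefficients from the equation into a small helper; the function name `predict_salary` and the assumption that experience is measured in years are illustrative choices, not part of the slides.

```python
# Coefficients taken from the fitted equation above
BETA_0 = 25790  # intercept: expected salary at zero experience
BETA_1 = 9449   # slope: change in expected salary per unit of experience


def predict_salary(experience):
    """Return the expected salary for a given amount of experience."""
    return BETA_0 + BETA_1 * experience


print(predict_salary(0))  # 25790
print(predict_salary(5))  # 73035
```

Each additional unit of experience raises the expected salary by 9449, and that increment stays the same at every level of experience. That constant step is what "linear" means here.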
Regression function
$\normalsize{E[y] = \mu = \beta_0 + \beta_1x_1}$
Assumptions
Variable Name | Description |
---|---|
sat | Number of satellites residing in the nest |
y | There is at least one satellite residing in the nest; 0/1 |
weight | Weight of the female crab in kg |
width | Width of the female crab in cm |
color | 1 - light medium, 2 - medium, 3 - dark medium, 4 - dark |
spine | 1 - both good, 2 - one worn or broken, 3 - both worn or broken |
$\text{satellite crab} \sim \text{female crab weight}$
y ~ weight
$P(\text{satellite crab is present})=P(y=1)$