Technical conditions for linear regression

Inference for Linear Regression in R

Jo Hardin

Professor, Pomona College

What are the technical conditions?

$$Y = \beta_0 + \beta_1 \cdot X + \epsilon$$

$$\epsilon \sim N(0, \sigma_\epsilon)$$

  • L: linear model
  • I: independent observations
  • N: points are normally distributed around the line
  • E: equal variability around the line for all values of the explanatory variable
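To make the model statement concrete, here is a small simulation sketch (the seed, the coefficient values 2 and 3, and the variable names are invented for illustration, not taken from the course). The data satisfy all four LINE conditions by construction, so `lm()` should recover coefficients close to the true values.

```r
# simulate data that satisfies the LINE conditions (illustrative values)
set.seed(4747)
n <- 100
explanatory <- runif(n, min = 0, max = 10)

# linear mean (L), independent draws (I), normal errors (N),
# constant error SD (E)
response <- 2 + 3 * explanatory + rnorm(n, mean = 0, sd = 1)
lineardata <- data.frame(explanatory, response)

fit <- lm(response ~ explanatory, data = lineardata)
coef(fit)  # estimates should be close to the true values 2 and 3
```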


Linear model: residuals

library(broom)    # for augment()
library(ggplot2)

linear_lm <- augment(
  lm(response ~ explanatory,
     data = lineardata)
)

ggplot(linear_lm,
       aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0)

Fitted value: $\hat{Y}_i = b_0 + b_1 X_i$

Residual: $e_i= Y_i - \hat{Y}_i$
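Both definitions can be checked by hand on a toy fit (the four data points below are invented for illustration); `fitted()` and `resid()` return exactly the quantities defined above.

```r
# toy data, invented for illustration
x <- c(1, 2, 3, 4)
y <- c(2.1, 3.9, 6.2, 7.8)
fit <- lm(y ~ x)

b0 <- coef(fit)[["(Intercept)"]]
b1 <- coef(fit)[["x"]]

y_hat <- b0 + b1 * x   # fitted values: Y-hat_i = b0 + b1 * X_i
e     <- y - y_hat     # residuals:     e_i = Y_i - Y-hat_i

all.equal(unname(fitted(fit)), y_hat)  # matches the built-in fitted values
all.equal(unname(resid(fit)), e)       # matches the built-in residuals
```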


Not linear

The model and the LINE conditions are the same as before; in this example the L condition fails, because the relationship between the explanatory variable and the response is not linear.


Not linear: residuals

nonlinear_lm <- augment(
  lm(response ~ explanatory,
     data = nonlineardata))
ggplot(nonlinear_lm,
       aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0)
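The course's `nonlineardata` is not shown, but a data set like it can be simulated (everything below is an invented illustration): a quadratic mean function breaks the L condition, so the residual plot produced by the code above shows a curved pattern rather than an even scatter around zero.

```r
# invented illustration of data violating the L (linearity) condition
set.seed(4747)
explanatory <- runif(100, min = 0, max = 10)
# the true mean is quadratic in the explanatory variable
response <- 2 + explanatory^2 + rnorm(100, mean = 0, sd = 2)
nonlineardata <- data.frame(explanatory, response)
```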



Not normal

The model and the LINE conditions are the same as before; in this example the N condition fails, because the points are not normally distributed around the line.


Not normal: residuals

nonnormal_lm <- augment(
  lm(response ~ explanatory, 
     data = nonnormaldata))
ggplot(nonnormal_lm, 
      aes(x = .fitted, y = .resid)) + 
  geom_point() +
  geom_hline(yintercept = 0) 
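A data set like `nonnormaldata` might be simulated as follows (an invented illustration): centered exponential errors are strongly right-skewed, which breaks the N condition even though the mean is still linear.

```r
# invented illustration of data violating the N (normality) condition
set.seed(4747)
explanatory <- runif(100, min = 0, max = 10)
# centered exponential errors are right-skewed, not normal
errors <- rexp(100, rate = 1/2) - 2
response <- 2 + 3 * explanatory + errors
nonnormaldata <- data.frame(explanatory, response)
```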



Not equal variance

The model and the LINE conditions are the same as before; in this example the E condition fails, because the variability around the line changes across values of the explanatory variable.


Not equal variance: residuals

nonequal_lm <- augment(
  lm(response ~ explanatory, 
     data = nonequaldata))
ggplot(nonequal_lm, 
       aes(x = .fitted, y = .resid)) + 
  geom_point() +
  geom_hline(yintercept = 0)
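A data set like `nonequaldata` might be simulated as follows (an invented illustration): the error standard deviation grows with the explanatory variable, breaking the E condition and producing the classic fan shape in the residual plot.

```r
# invented illustration of data violating the E (equal variance) condition
set.seed(4747)
explanatory <- runif(100, min = 0, max = 10)
# the error SD grows with the explanatory variable (a fan shape)
response <- 2 + 3 * explanatory +
  rnorm(100, mean = 0, sd = 0.5 * explanatory)
nonequaldata <- data.frame(explanatory, response)
```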



Let's practice!
