Model assessment and selection

Modeling with Data in the Tidyverse

Albert Y. Kim

Assistant Professor of Statistical and Data Sciences

Refresher: Multiple regression

Two models with different pairs of explanatory/predictor variables:

# Model 1 - Two numerical:
model_price_1 <- lm(log10_price ~ log10_size + yr_built, 
                    data = house_prices)

# Model 3 - One numerical & one categorical:
model_price_3 <- lm(log10_price ~ log10_size + condition, 
                    data = house_prices)

Refresher: Sum of squared residuals

# Model 1
model_price_1 <- lm(log10_price ~ log10_size + yr_built, 
                    data = house_prices)
get_regression_points(model_price_1) %>%
  mutate(sq_residuals = residual^2) %>%
  summarize(sum_sq_residuals = sum(sq_residuals))
# A tibble: 1 x 1
  sum_sq_residuals
             <dbl>
1             585.
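
As a reminder of what the pipeline above computes, the sum of squared residuals can be written as follows. The notation is an assumption for illustration and does not appear on the slides: y_i is the observed log10_price of house i, \hat{y}_i is the value fitted by the model, and n is the number of houses.

```latex
\text{sum\_sq\_residuals} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```

Each term is one squared residual, which is what the mutate() step stores in sq_residuals before summarize() adds them up.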

Refresher: Sum of squared residuals

# Model 3
model_price_3 <- lm(log10_price ~ log10_size + condition, 
                    data = house_prices)

get_regression_points(model_price_3) %>%
  mutate(sq_residuals = residual^2) %>%
  summarize(sum_sq_residuals = sum(sq_residuals))
# A tibble: 1 x 1
  sum_sq_residuals
             <dbl>
1             608.
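
The two results above (585 for Model 1, 608 for Model 3) can be compared directly: the smaller sum of squared residuals suggests that Model 1 fits this data more closely. Below is a minimal sketch of that comparison in base R. The helper name sum_sq_residuals() is hypothetical, and the built-in mtcars data is used only as a stand-in so the example runs without the house_prices data.

```r
# Hypothetical helper (not from the slides): sum of squared residuals
# for any fitted lm object, using base R's residuals().
sum_sq_residuals <- function(model) {
  sum(residuals(model)^2)
}

# Stand-in models on the built-in mtcars data, NOT the house_prices models.
model_a <- lm(mpg ~ wt + hp, data = mtcars)
model_b <- lm(mpg ~ wt + qsec, data = mtcars)

# Collect both values in a named vector for a side-by-side comparison.
ssr <- c(model_a = sum_sq_residuals(model_a),
         model_b = sum_sq_residuals(model_b))
print(ssr)

# The model with the smaller sum of squared residuals fits this data more closely.
best <- names(ssr)[which.min(ssr)]
print(best)
```

With the models from the slides already fit, sum_sq_residuals(model_price_1) and sum_sq_residuals(model_price_3) should give values close to those shown above. Small differences are possible because get_regression_points() may round the residuals before they are squared.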

Let's practice!
