Predicting house price using size & condition

Modeling with Data in the Tidyverse

Albert Y. Kim

Assistant Professor of Statistical and Data Sciences

Refresher: Parallel slopes
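For reference, the parallel slopes model fit in this section can be written as below. The notation $b_0$, $b_{\text{cond}}$, and $b_{\text{size}}$ is introduced here only for illustration. It assumes R's default treatment coding, in which condition 1 is the baseline level (offset 0) and every condition level shares the same slope for log10_size:

$$\hat{y} = b_0 + b_{\text{cond}} + b_{\text{size}} \cdot \text{log10\_size}, \qquad b_{\text{cond}} = 0 \text{ when condition} = 1$$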

Making a prediction

Visualizing predictions

Numerical predictions

Using the values in the estimate column of the regression table below:

  • First house (log10_size = 2.90, condition 3): $\hat{y} = 2.88 + 0.032 + 0.837 \cdot 2.90 = 5.34$
  • Second house (log10_size = 3.60, condition 4): $\hat{y} = 2.88 + 0.044 + 0.837 \cdot 3.60 = 5.94$
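# Load moderndive for house_prices and the get_regression_*() helpers
library(moderndive)
# Fit regression model and get regression table
# (assumes log10_price and log10_size were already added to house_prices with mutate())
model_price_3 <- lm(log10_price ~ log10_size + condition,
                    data = house_prices)
get_regression_table(model_price_3)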
# A tibble: 6 x 7
  term       estimate std_error statistic p_value lower_ci...
  <chr>         <dbl>     <dbl>     <dbl>   <dbl>    <dbl>...
1 intercept     2.88      0.036     80.0    0        2.81...
2 log10_size    0.837     0.006    134.     0        0.825...
...
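
The bullet-point arithmetic can be reproduced directly in R. This is a minimal sketch that hard-codes the rounded estimates quoted in the bullets (2.88, 0.837, 0.032, 0.044) rather than extracting them from model_price_3. It assumes, as R does by default, that condition 1 is the baseline level with an offset of 0; the object names b0, b_size, b_cond, new_size, and new_cond are hypothetical.

```r
# Rounded estimates quoted on the slide (hypothetical object names)
b0 <- 2.88                                      # intercept (baseline: condition 1)
b_size <- 0.837                                 # common slope for log10_size
b_cond <- c("1" = 0, "3" = 0.032, "4" = 0.044)  # intercept offsets by condition

# The two "new" houses
new_size <- c(2.90, 3.60)
new_cond <- c("3", "4")

# Parallel slopes prediction: intercept + condition offset + slope * size
log10_price_hat <- b0 + unname(b_cond[new_cond]) + b_size * new_size
round(log10_price_hat, 2)
# [1] 5.34 5.94
```

Each house gets its own intercept through its condition offset, but both use the same slope of 0.837, which is exactly what "parallel slopes" means.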

Defining "new" data

library(dplyr)  # for tibble(), mutate(), and the %>% pipe
# Create data frame of "new" houses (data_frame() is deprecated; use tibble())
new_houses <- tibble(
  log10_size = c(2.9, 3.6),
  condition = factor(c(3, 4))
)
new_houses
# A tibble: 2 x 2
  log10_size condition
       <dbl> <fct>    
1        2.9 3        
2        3.6 4
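
The "new" data must use the same column names as the explanatory variables, and the condition values must be levels that appeared when the model was fit. Base R's predict() checks new factor values against the training levels, so a factor holding only some of those levels (like factor(c(3, 4)) here) is fine, while a never-seen level is an error. The sketch below shows this on a small made-up data set; toy, fit, x, g, and y are hypothetical names, not part of house_prices.

```r
# Made-up toy data: numeric x, two-level factor g, response y
toy <- data.frame(
  x = c(1, 2, 3, 1, 2, 3),
  g = factor(c("a", "a", "a", "b", "b", "b")),
  y = c(2.0, 3.1, 3.9, 3.0, 4.1, 4.9)
)
fit <- lm(y ~ x + g, data = toy)  # parallel slopes: one slope, two intercepts

# New data whose factor contains only one of the training levels: works
ok_new <- data.frame(x = 2.5, g = factor("b"))
pred_ok <- predict(fit, newdata = ok_new)
pred_ok

# New data with a level never seen during fitting: predict() stops with an error
bad_new <- data.frame(x = 2.5, g = factor("c"))
bad <- tryCatch(predict(fit, newdata = bad_new), error = function(e) e)
conditionMessage(bad)
```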

Making predictions using new data

# Make predictions on new data
get_regression_points(model_price_3,
                      newdata = new_houses)
# A tibble: 2 x 4
     ID log10_size condition log10_price_hat
  <int>      <dbl> <fct>               <dbl>
1     1        2.9 3                    5.34
2     2        3.6 4                    5.94
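
The log10_price_hat values match the hand calculation on the "Numerical predictions" slide: fitted intercept, plus the offset for the house's condition, plus the common slope times log10_size. A minimal sketch checking that identity follows. It uses base R's predict() on made-up data as a stand-in, and the names toy, fit, b, new_pts, pred, and manual are hypothetical rather than part of moderndive.

```r
# Made-up toy data (hypothetical): numeric x, factor g with levels a (baseline) and b
toy <- data.frame(
  x = c(1, 2, 3, 1, 2, 3),
  g = factor(c("a", "a", "a", "b", "b", "b")),
  y = c(2.0, 3.1, 3.9, 3.0, 4.1, 4.9)
)
fit <- lm(y ~ x + g, data = toy)
b <- coef(fit)  # named: "(Intercept)", "x", "gb"

new_pts <- data.frame(x = c(1.5, 2.5), g = factor(c("a", "b")))

# Prediction via predict()
pred <- unname(predict(fit, newdata = new_pts))

# Same prediction by hand: intercept + offset for level b (0 for baseline a) + slope * x
manual <- unname(b["(Intercept)"] + ifelse(new_pts$g == "b", b["gb"], 0) + b["x"] * new_pts$x)

all.equal(pred, manual)  # TRUE
```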

Making predictions using new data

# Make predictions in original units by undoing log10()
get_regression_points(model_price_3,
                      newdata = new_houses) %>% 
  mutate(price_hat = 10^log10_price_hat)
# A tibble: 2 x 5
     ID log10_size condition log10_price_hat price_hat
  <int>      <dbl> <fct>               <dbl>     <dbl>
1     1        2.9 3                    5.34   219786.
2     2        3.6 4                    5.94   870964.
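
Undoing log10() means raising 10 to the fitted value. Because the table prints log10_price_hat rounded to two decimals, exponentiating the printed value does not exactly reproduce price_hat. The sketch below uses only numbers printed in the table (the variable names are hypothetical) and shows that a difference of about 0.002 on the log10 scale amounts to roughly 1,000 in price.

```r
# Printed (rounded) log10 predictions and printed price predictions from the table
log10_price_hat_printed <- c(5.34, 5.94)
price_hat_printed <- c(219786, 870964)

# Undo log10(): raise 10 to the (rounded) log10 prediction
10^log10_price_hat_printed  # first value is about 218776, not 219786

# Going the other way recovers the unrounded log10 predictions behind price_hat
log10(price_hat_printed)    # about 5.342 and 5.940
```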

Let's practice!
