Explaining house price with size & condition

Modeling with Data in the Tidyverse

Albert Y. Kim

Assistant Professor of Statistical and Data Sciences

Refresher: Exploratory data analysis

library(dplyr)
library(moderndive)

# Log10-transform price and living-space size
house_prices <- house_prices %>%
  mutate(
    log10_price = log10(price),
    log10_size = log10(sqft_living)
  )

# Group mean & sd of log10_price and counts
house_prices %>% 
  group_by(condition) %>% 
  summarize(mean = mean(log10_price), 
            sd = sd(log10_price), n = n())
# A tibble: 5 x 4
  condition  mean    sd     n
  <fct>     <dbl> <dbl> <int>
1 1          5.42 0.293    30
2 2          5.45 0.233   172
3 3          5.67 0.224 14031
...

House price, size, and condition

Parallel slopes model
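
The plot for this slide did not survive extraction. A minimal sketch of how such a figure could be drawn is given here, assuming the ggplot2 package is installed and that moderndive provides geom_parallel_slopes() (available in recent moderndive releases). The names house_prices_log and parallel_slopes_plot are illustrative and not from the original slides.

```r
library(dplyr)
library(ggplot2)
library(moderndive)

# Recreate the log10-transformed variables used throughout the slides
house_prices_log <- house_prices %>%
  mutate(
    log10_price = log10(price),
    log10_size = log10(sqft_living)
  )

# One regression line per condition level, all sharing a single slope
parallel_slopes_plot <- ggplot(
  house_prices_log,
  aes(x = log10_size, y = log10_price, color = condition)
) +
  geom_point(alpha = 0.1) +           # faint points to reduce overplotting
  geom_parallel_slopes(se = FALSE) +  # same slope, different intercepts
  labs(
    x = "log10 size",
    y = "log10 price",
    title = "House prices in Seattle"
  )

parallel_slopes_plot
```

Each condition level gets its own intercept while the slope for log10_size is shared, which is what makes the lines parallel.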

House price, size, and condition relationship

# Fit regression model using formula of form: y ~ x1 + x2
model_price_3 <- lm(log10_price ~ log10_size + condition,
                    data = house_prices)

# Output regression table
get_regression_table(model_price_3)
# A tibble: 6 x 7
  term       estimate std_error statistic p_value lower_ci...
  <chr>         <dbl>     <dbl>     <dbl>   <dbl>    <dbl>...
1 intercept     2.88      0.036     80.0    0        2.81...
2 log10_size    0.837     0.006    134.     0        0.825...
3 condition2   -0.039     0.033     -1.16   0.246   -0.104...
4 condition3    0.032     0.031      1.04   0.3     -0.028...
...
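
To see how the table is read, here is a small worked example in base R. It uses the rounded estimates shown above, so the result is approximate; the 1000 square foot house and the variable names are made up for illustration. Condition 1 is the baseline level, so the condition3 estimate is an offset added to the intercept.

```r
# Rounded estimates copied from the regression table above
intercept <- 2.88
slope_log10_size <- 0.837
offset_condition3 <- 0.032  # difference in intercept relative to condition 1

# Hypothetical house: 1000 square feet of living space, condition 3
sqft_living <- 1000
log10_size <- log10(sqft_living)

# Fitted value on the log10 scale: intercept + offset + slope * log10_size
pred_log10_price <- intercept + offset_condition3 + slope_log10_size * log10_size

# Undo the log10 transformation to get back to dollars
pred_price <- 10^pred_log10_price

pred_log10_price
pred_price
```

With these rounded inputs the fitted log10 price is 5.423, which corresponds to a price of roughly 265,000 dollars.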

Let's practice!
