Explaining house price with size & condition

Modeling with Data in the Tidyverse

Albert Y. Kim

Assistant Professor of Statistical and Data Sciences

Refresher: Exploratory data analysis

library(dplyr)
library(moderndive)

# Log10-transform price and living-area size (sqft_living)
house_prices <- house_prices %>%
  mutate(
    log10_price = log10(price),
    log10_size = log10(sqft_living)
  )
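Base 10 logarithms are convenient here because a one-unit step on the log10 scale corresponds to a tenfold change on the original scale. Here is a minimal base R illustration. It uses made-up prices, not values from house_prices:

```r
# Hypothetical prices (not taken from house_prices), each ten times the previous one
prices <- c(1e5, 1e6, 1e7)

# On the log10 scale, equal multiplicative steps become equal additive steps
log10(prices)
#> [1] 5 6 7

# Raising 10 to the log10 value returns the original dollar amounts
10^log10(prices)
```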

Refresher: Exploratory data analysis

# Mean and SD of log10_price, plus the number of houses, for each condition
house_prices %>% 
  group_by(condition) %>% 
  summarize(mean = mean(log10_price), 
            sd = sd(log10_price), n = n())
# A tibble: 5 x 4
  condition  mean    sd     n
  <fct>     <dbl> <dbl> <int>
1 1          5.42 0.293    30
2 2          5.45 0.233   172
3 3          5.67 0.224 14031
...
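These means are computed on the log10 scale. Raising 10 to a group mean therefore gives that group's geometric mean price in dollars. The sketch below applies this to the rounded means printed above for conditions 1 to 3 only. The vector name is illustrative:

```r
# Rounded group means of log10_price copied from the summary above (conditions 1-3 only)
group_means <- c("1" = 5.42, "2" = 5.45, "3" = 5.67)

# 10^mean(log10(x)) is the geometric mean of x, so this converts back to dollars
geo_mean_price <- 10^group_means
round(geo_mean_price, -3)
```

The results are roughly $263,000, $282,000 and $468,000. Condition 3 houses are therefore noticeably pricier on average than condition 1 or 2 houses, before size is taken into account.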

House price, size, and condition

Parallel slopes model
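Written out, the model fit below uses a single common slope for log10_size and lets the intercept shift with condition, with condition 1 as the baseline. The coefficient labels and indicator notation $\mathbb{1}$ are our own, not the slides':

```latex
\widehat{\text{log10\_price}} = b_0 + b_1 \cdot \text{log10\_size}
  + b_2 \cdot \mathbb{1}_{\text{condition}=2} + b_3 \cdot \mathbb{1}_{\text{condition}=3}
  + b_4 \cdot \mathbb{1}_{\text{condition}=4} + b_5 \cdot \mathbb{1}_{\text{condition}=5}
```

For a condition 1 house the intercept is $b_0$. For condition $k \ge 2$ it is $b_0 + b_k$. The slope $b_1$ is the same for every condition, which is why the five fitted lines are parallel.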

Parallel slopes model
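A factor such as condition enters lm() as a set of indicator (dummy) columns. The first level is dropped and serves as the baseline. The toy data frame below is hypothetical. base R's model.matrix() shows the columns R builds under its default treatment contrasts:

```r
# Toy data: one observation per condition level (not taken from house_prices)
toy <- data.frame(condition = factor(1:5))

# model.matrix() builds the design matrix that lm() uses internally;
# with default treatment contrasts, level 1 becomes the baseline
X <- model.matrix(~ condition, data = toy)
colnames(X)

# The condition 1 row has 0 in every indicator column; the condition 2 row has a 1 in condition2
X[1:2, ]
```

The resulting column names are "(Intercept)", "condition2", ..., "condition5". These match the term names in the regression table, which has no condition1 row.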

House price, size, and condition relationship

# Fit regression model using formula of form: y ~ x1 + x2
model_price_3 <- lm(log10_price ~ log10_size + condition,
                    data = house_prices)

# Output regression table
get_regression_table(model_price_3)
# A tibble: 6 x 7
  term       estimate std_error statistic p_value lower_ci...
  <chr>         <dbl>     <dbl>     <dbl>   <dbl>    <dbl>...
1 intercept     2.88      0.036     80.0    0        2.81...
2 log10_size    0.837     0.006    134.     0        0.825...
3 condition2   -0.039     0.033     -1.16   0.246   -0.104...
4 condition3    0.032     0.031      1.04   0.3     -0.028...
...
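Reading the table on the log10 scale: a 1% increase in size is associated with roughly a 0.837% increase in price, holding condition fixed. A condition coefficient multiplies price by 10 raised to that coefficient, relative to condition 1. The sketch below uses the rounded estimates above, so it is approximate. The house (2,000 sq ft, condition 3) is hypothetical:

```r
# Rounded coefficients copied from the regression table above
b0 <- 2.88        # intercept (condition 1 baseline)
b_size <- 0.837   # common slope for log10_size
b_cond3 <- 0.032  # intercept shift for condition 3 vs condition 1

# Hypothetical house: 2,000 sq ft of living space, in condition 3
log10_price_hat <- b0 + b_size * log10(2000) + b_cond3

# Back-transform the fitted value to dollars (about $473,000)
price_hat <- 10^log10_price_hat
round(price_hat)

# Multiplicative effect of condition 2 vs condition 1 at the same size:
# about 0.914, i.e. roughly 9% lower (but note its p-value of 0.246)
cond2_multiplier <- 10^(-0.039)
cond2_multiplier
```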
Let's practice!
