Explaining house price with size & condition

Modeling with Data in the Tidyverse

Albert Y. Kim

Assistant Professor of Statistical and Data Sciences

Refresher: Exploratory data analysis

library(dplyr)
library(moderndive)

# log transform variables
house_prices <- house_prices %>%
  mutate(
    log10_price = log10(price),
    log10_size = log10(sqft_living)
  )


Refresher: Exploratory data analysis

# Group mean & sd of log10_price and counts
house_prices %>% 
  group_by(condition) %>% 
  summarize(mean = mean(log10_price), 
            sd = sd(log10_price), n = n())
# A tibble: 5 x 4
  condition  mean    sd     n
  <fct>     <dbl> <dbl> <int>
1 1          5.42 0.293    30
2 2          5.45 0.233   172
3 3          5.67 0.224 14031
...

House price, size, and condition


Parallel slopes model
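
As a sketch of what "parallel slopes" means here, the model fitted later in this deck can be written as one shared slope for log10_size plus a separate intercept shift for each non-baseline condition level. The symbols b_0, b_size, b_2, ..., b_5 are notation assumed for this illustration and do not appear in the slides; condition 1 is taken as the baseline level.

```latex
\widehat{\log_{10}(\text{price})}
  = b_0
  + b_{\text{size}} \cdot \log_{10}(\text{size})
  + b_2 \cdot \mathbb{1}[\text{condition} = 2]
  + b_3 \cdot \mathbb{1}[\text{condition} = 3]
  + b_4 \cdot \mathbb{1}[\text{condition} = 4]
  + b_5 \cdot \mathbb{1}[\text{condition} = 5]
```

Every condition level gets the same slope b_size, so the five fitted lines are parallel and differ only in their intercepts.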


House price, size, and condition relationship

# Fit regression model using formula of form: y ~ x1 + x2
model_price_3 <- lm(log10_price ~ log10_size + condition,
                    data = house_prices)

# Output regression table
get_regression_table(model_price_3)
# A tibble: 6 x 7
  term       estimate std_error statistic p_value lower_ci...
  <chr>         <dbl>     <dbl>     <dbl>   <dbl>    <dbl>...
1 intercept     2.88      0.036     80.0    0        2.81...
2 log10_size    0.837     0.006    134.     0        0.825...
3 condition2   -0.039     0.033     -1.16   0.246   -0.104...
4 condition3    0.032     0.031      1.04   0.3     -0.028...
...
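
To show how the rows of the regression table combine, here is a minimal sketch that computes a fitted value by hand. It uses the rounded estimates shown in the table and a hypothetical house (1,000 square feet of living space, condition 3) that is not taken from the data; the variable names are also made up for this illustration.

```r
# Rounded estimates copied from the regression table above
b_intercept  <- 2.88    # baseline intercept (condition 1)
b_log10_size <- 0.837   # slope shared by all condition levels
b_condition3 <- 0.032   # intercept offset for condition 3 vs. condition 1

# Hypothetical house (an assumption for illustration, not a row of house_prices)
sqft_living <- 1000
log10_size  <- log10(sqft_living)

# Parallel slopes: shared slope plus a condition-specific intercept shift
log10_price_hat <- b_intercept + b_log10_size * log10_size + b_condition3

# Back-transform from log10 dollars to dollars
price_hat <- 10^log10_price_hat

log10_price_hat
price_hat
```

With these rounded inputs the fitted log10 price is about 5.42, which back-transforms to roughly $265,000. Because the table estimates are rounded, a prediction from the fitted model object itself would differ slightly.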

Let's practice!
