Modeling with Data in the Tidyverse
Albert Y. Kim
Assistant Professor of Statistical and Data Sciences
library(dplyr)
library(moderndive)
# log transform variables
house_prices <- house_prices %>%
  mutate(
    log10_price = log10(price),
    log10_size = log10(sqft_living)
  )
# Group mean & sd of log10_price and counts
house_prices %>%
  group_by(condition) %>%
  summarize(mean = mean(log10_price),
            sd = sd(log10_price), n = n())
# A tibble: 5 x 4
  condition  mean    sd     n
  <fct>     <dbl> <dbl> <int>
1 1          5.42 0.293    30
2 2          5.45 0.233   172
3 3          5.67 0.224 14031
...
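As a bridge between the group means above and the regression that follows, here is a minimal sketch of how those means relate to regression coefficients. It assumes a model with `condition` as the only predictor, which is not fitted in the slides. The names `model_condition_only`, `group_means`, and `rebuilt_means` are illustrative only.

```r
library(dplyr)
library(moderndive)

# Recreate the log10-transformed outcome used above
house_prices <- house_prices %>%
  mutate(log10_price = log10(price))

# Group means of log10_price by condition, as in the summary table
group_means <- house_prices %>%
  group_by(condition) %>%
  summarize(mean = mean(log10_price))

# Illustrative model (an assumption, not from the slides):
# condition is the only explanatory variable
model_condition_only <- lm(log10_price ~ condition, data = house_prices)
b <- coef(model_condition_only)

# The intercept is the mean for the baseline level (condition 1);
# every other coefficient is that level's mean minus the baseline mean
rebuilt_means <- c(b[["(Intercept)"]], b[["(Intercept)"]] + b[-1])
```

Comparing `rebuilt_means` with `group_means$mean` shows why a categorical predictor enters the regression table as offsets from a baseline level rather than as one row per level.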
# Fit regression model using formula of form: y ~ x1 + x2
model_price_3 <- lm(log10_price ~ log10_size + condition,
                    data = house_prices)
# Output regression table
get_regression_table(model_price_3)
# A tibble: 6 x 7
  term       estimate std_error statistic p_value lower_ci...
  <chr>         <dbl>     <dbl>     <dbl>   <dbl>    <dbl>...
1 intercept     2.88      0.036     80.0    0        2.81...
2 log10_size    0.837     0.006    134.     0        0.825...
3 condition2   -0.039     0.033     -1.16   0.246   -0.104...
4 condition3    0.032     0.031      1.04   0.3     -0.028...
...
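To show how the rows of the regression table combine, here is a minimal sketch that uses the fitted model to predict the price of one house. The house itself (2000 square feet of living space, condition 3) and the names `new_house`, `log10_pred`, `price_pred`, and `log10_by_hand` are hypothetical and not from the slides.

```r
library(dplyr)
library(moderndive)

# Recreate the log10-transformed variables and the model from above
house_prices <- house_prices %>%
  mutate(
    log10_price = log10(price),
    log10_size = log10(sqft_living)
  )
model_price_3 <- lm(log10_price ~ log10_size + condition,
                    data = house_prices)

# Hypothetical house (an assumption for illustration):
# 2000 square feet of living space, condition 3
new_house <- tibble(
  log10_size = log10(2000),
  condition = factor("3", levels = levels(house_prices$condition))
)

# Prediction on the log10 scale, then back-transformed to dollars
log10_pred <- predict(model_price_3, newdata = new_house)
price_pred <- 10^log10_pred

# The same prediction assembled by hand from the fitted coefficients:
# intercept + slope * log10_size + offset for condition 3
b <- coef(model_price_3)
log10_by_hand <- b[["(Intercept)"]] +
  b[["log10_size"]] * log10(2000) +
  b[["condition3"]]
```

Condition 1 has no row in the table because it is the baseline level. For a condition 1 house the offset term is simply dropped, and the `condition2` and `condition3` estimates are differences relative to that baseline.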