More than 2 explanatory variables

Intermediate Regression in R

Richie Cotton

Data Evangelist at DataCamp

From last time

ggplot(
  fish,
  aes(length_cm, height_cm, color = mass_g)
) +
  geom_point() +
  scale_color_viridis_c(option = "inferno")

Scatter plot of fish height vs. length, colored by mass (inferno palette)

Facet by species

ggplot(
  fish,
  aes(length_cm, height_cm, color = mass_g)
) +
  geom_point() +
  scale_color_viridis_c(option = "inferno") +
  facet_wrap(vars(species))

Scatter plot of fish height vs. length, colored by mass, faceted by species

Different levels of interaction

No interactions

lm(mass_g ~ length_cm + height_cm + species + 0, data = fish)

2-way interactions between pairs of variables

lm(
  mass_g ~ length_cm + height_cm + species + length_cm:height_cm + length_cm:species + height_cm:species + 0, 
  data = fish
)

3-way interaction between all three variables

lm(
  mass_g ~ length_cm + height_cm + species + length_cm:height_cm + length_cm:species + height_cm:species + length_cm:height_cm:species + 0, 
  data = fish
)
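
To see what the `+ 0` in these formulas does, here is a minimal sketch on a made-up four-row data frame. The name `toy` and all of its values are invented for illustration and are not taken from the fish dataset. It compares the design-matrix columns with and without `+ 0`.

```r
# Made-up data for illustration only (not the real fish dataset).
toy <- data.frame(
  length_cm = c(10, 20, 30, 40),
  height_cm = c(3, 5, 8, 12),
  species = c("A", "A", "B", "B")
)

# Without an intercept: one column per species level.
cols_no_intercept <- colnames(
  model.matrix(~ length_cm + height_cm + species + 0, data = toy)
)

# With the default intercept: the first species level is absorbed into it.
cols_intercept <- colnames(
  model.matrix(~ length_cm + height_cm + species, data = toy)
)

print(cols_no_intercept)
print(cols_intercept)
```

With `+ 0` the intercept is dropped and every species gets its own column, so each species coefficient can be read as that species' own intercept.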

All interactions

lm(
  mass_g ~ length_cm + height_cm + species + length_cm:height_cm + length_cm:species + height_cm:species + length_cm:height_cm:species + 0, 
  data = fish
)

Same as

lm(
  mass_g ~ length_cm * height_cm * species + 0, 
  data = fish
)
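
One way to check that the long and the short formula describe the same model is to expand each into its terms with `terms()`, which needs no data. The object names `explicit` and `shorthand` exist only for this sketch.

```r
# The long form: every main effect and interaction written out.
explicit <- mass_g ~ length_cm + height_cm + species +
  length_cm:height_cm + length_cm:species + height_cm:species +
  length_cm:height_cm:species + 0

# The short form: `*` expands to all main effects and all interactions.
shorthand <- mass_g ~ length_cm * height_cm * species + 0

explicit_terms <- attr(terms(explicit), "term.labels")
shorthand_terms <- attr(terms(shorthand), "term.labels")

print(shorthand_terms)

# TRUE when both formulas contain exactly the same terms.
same_terms <- setequal(explicit_terms, shorthand_terms)
print(same_terms)
```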

Only 2-way interactions

lm(
  mass_g ~ length_cm + height_cm + species + length_cm:height_cm + length_cm:species + height_cm:species + 0, 
  data = fish
)

Same as

lm(
  mass_g ~ (length_cm + height_cm + species) ^ 2 + 0, 
  data = fish
)

Not the same as squaring an explanatory variable (see note 1)

lm(
  mass_g ~ I(length_cm ^ 2) + height_cm + species + 0, 
  data = fish
)
1 For squaring explanatory variables, see "Introduction to Regression in R", Chapter 2, "Transforming variables"
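
The difference between `^ 2` as a formula operator and `^ 2` as arithmetic can be inspected with `terms()`. This is a small sketch. The object names are made up for illustration, and the third formula is included only to show what happens when the power sits outside `I()`.

```r
# `^ 2` on a group of terms: main effects plus all 2-way interactions.
two_way <- mass_g ~ (length_cm + height_cm + species) ^ 2 + 0

# Power inside I(): arithmetic squaring of length_cm.
squared <- mass_g ~ I(length_cm ^ 2) + height_cm + species + 0

# Power outside I(): treated as a formula operator on a single term,
# so no squaring happens.
power_outside <- mass_g ~ I(length_cm) ^ 2 + height_cm + species + 0

two_way_terms <- attr(terms(two_way), "term.labels")
squared_terms <- attr(terms(squared), "term.labels")
power_outside_terms <- attr(terms(power_outside), "term.labels")

print(two_way_terms)
print(squared_terms)
print(power_outside_terms)
```

Only the version with the power inside `I()` produces a squared length term. The grouped `^ 2` produces pairwise interactions and no 3-way term.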

The prediction flow

mdl_mass_vs_all <- lm(mass_g ~ length_cm * height_cm * species + 0, data = fish)

explanatory_data <- expand_grid(
  length_cm = seq(5, 60, 6),
  height_cm = seq(2, 20, 2),
  species = unique(fish$species)
)

prediction_data <- explanatory_data %>% 
  mutate(mass_g = predict(mdl_mass_vs_all, explanatory_data))
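
The three steps above (fit, build a grid, predict) can be tried end to end without the fish dataset. The sketch below rests on loud assumptions: `toy_fish` is simulated data with two invented species "A" and "B", and base R's `expand.grid()` stands in for `tidyr::expand_grid()` so that no packages are needed. The row order differs between the two functions, but the set of combinations is the same.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated stand-in for the fish dataset: every value here is made up.
n <- 40
toy_fish <- data.frame(
  species = rep(c("A", "B"), each = n),
  length_cm = runif(2 * n, 5, 60),
  height_cm = runif(2 * n, 2, 20)
)
species_factor <- ifelse(toy_fish$species == "A", 1, 1.5)
toy_fish$mass_g <- 0.5 * toy_fish$length_cm * toy_fish$height_cm *
  species_factor + rnorm(2 * n, sd = 5)

# 1. Fit the model with all interactions and no global intercept.
mdl <- lm(mass_g ~ length_cm * height_cm * species + 0, data = toy_fish)

# 2. Build every combination of explanatory values.
explanatory_data <- expand.grid(
  length_cm = seq(5, 60, 6),
  height_cm = seq(2, 20, 2),
  species = unique(toy_fish$species),
  stringsAsFactors = FALSE
)

# 3. Add a column of predictions.
prediction_data <- explanatory_data
prediction_data$mass_g <- predict(mdl, explanatory_data)

# 10 lengths x 10 heights x 2 species = 200 rows.
print(nrow(prediction_data))
print(head(prediction_data))
```

The grid grows multiplicatively with each explanatory variable, so coarse steps in `seq()` keep the number of predictions manageable.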

Visualizing predictions

ggplot(
  fish,
  aes(length_cm, height_cm, color = mass_g)
) +
  geom_point() +
  scale_color_viridis_c(option = "inferno") +
  facet_wrap(vars(species)) +
  geom_point(
    data = prediction_data, 
    size = 3, shape = 15
  )

Scatter plot of fish height vs. length, colored by mass, faceted by species, with predictions

Let's practice!
