Selecting based on correlation with other features

Dimensionality Reduction in R

Matt Pickard

Owner, Pickard Predictives, LLC

Review correlation plot creation

library(tidyverse)
library(corrr)

# Correlate the numeric columns and plot the lower triangle
healthcare_df %>%
  select(where(is.numeric)) %>%
  correlate() %>%
  shave() %>%
  rplot(print_cor = TRUE) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
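
The plot is convenient for scanning, but the same correlations can also be listed as a sorted table. The following is a minimal sketch, assuming the dplyr and corrr packages are installed. The built-in mtcars data stands in for healthcare_df, and cor_pairs is a name chosen only for this illustration.

```r
library(dplyr)
library(corrr)

# mtcars stands in for healthcare_df here (an assumption for illustration)
cor_pairs <- mtcars %>%
  select(where(is.numeric)) %>%
  correlate(quiet = TRUE) %>%   # correlation data frame, diagonal set to NA
  shave() %>%                   # keep the lower triangle only
  stretch(na.rm = TRUE) %>%     # long format with columns x, y, r
  arrange(desc(abs(r)))         # strongest relationships first

head(cor_pairs)
```

Sorting by absolute value puts strong negative correlations next to strong positive ones, which is what matters when looking for redundant features.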

Correlation plot

Figure: correlation plot of the healthcare company attrition data

Correlation strength

Figure: correlation plot of the healthcare company attrition data, shown with a table of correlation strength ranges

A correlation filter?

Figure: correlation plot of the healthcare company attrition data

A correlation filter?

Figure: Venn diagram of percent salary hike and performance rating, showing a large overlap of mutual information

A correlation filter?

Figure: Venn diagram of percent salary hike with performance rating removed

A correlation filter?

Figure: both parts of the Venn diagram removed, showing that dropping both features discards valuable information
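
The idea in these slides, dropping one feature from a highly correlated pair rather than both, can be sketched by hand in base R. This is a simplified illustration only. The data is simulated, the column names merely echo the slide, and the "drop the row variable" rule is an assumption made for this sketch, not necessarily the selection logic that a packaged correlation filter applies.

```r
set.seed(42)
n <- 200

# Simulated stand-in data: column names echo the slide, values are made up
salary_hike <- rnorm(n, mean = 15, sd = 3)
sim_df <- data.frame(
  percent_salary_hike = salary_hike,
  performance_rating  = salary_hike + rnorm(n, sd = 1),  # strongly tied to salary hike
  years_at_company    = rnorm(n, mean = 7, sd = 4)       # unrelated to the other two
)

# Absolute correlations, keeping the lower triangle so each pair appears once
cor_mat <- abs(cor(sim_df))
cor_mat[upper.tri(cor_mat, diag = TRUE)] <- 0

# For every pair above the threshold, mark only ONE member (the row variable)
high <- which(cor_mat > 0.7, arr.ind = TRUE)
to_drop <- unique(rownames(cor_mat)[high[, "row"]])

filtered_df <- sim_df[, !(names(sim_df) %in% to_drop), drop = FALSE]
names(filtered_df)
```

Only performance_rating is dropped here, so the information it shares with percent_salary_hike is still available through the feature that remains.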

A correlation filter recipe

library(tidymodels)

# Create and prep the recipe
corr_recipe <-
  recipe(Attrition ~ ., data = healthcare_df) %>%
  step_corr(all_numeric_predictors(), threshold = 0.7) %>%
  prep()

# Apply the recipe to the data
filtered_healthcare_df <- corr_recipe %>%
  bake(new_data = NULL)

# Identify the features that were removed
tidy(corr_recipe, number = 1)
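
Because healthcare_df is not bundled with R, the same recipe pattern can be tried on a built-in data set. This is a minimal sketch, assuming the recipes package is installed. mtcars stands in for healthcare_df, mpg plays the role of the outcome, and the object names are hypothetical.

```r
library(recipes)

# mtcars stands in for healthcare_df, with mpg as the outcome (illustrative assumption)
demo_recipe <- recipe(mpg ~ ., data = mtcars) %>%
  step_corr(all_numeric_predictors(), threshold = 0.7) %>%
  prep()

# Return the training data with the filter applied
demo_filtered <- bake(demo_recipe, new_data = NULL)

# Columns dropped by the filter, as reported by tidy() and by a direct comparison
removed <- tidy(demo_recipe, number = 1)$terms
setdiff(names(mtcars), names(demo_filtered))
```

The two lists of removed columns should agree, which is a quick check that tidy() is reporting on the correlation step rather than some other step in a longer recipe.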

Let's practice!
