Introduction to dimensionality reduction

Dimensionality Reduction in R

Matt Pickard

Owner, Pickard Predictives, LLC

Dimensions

  • Dimensions are the vertical components of a tidy table
  • Dimensions = Columns = Features
  • # of dimensions = # of columns
df %>% ncol()
3

A tidy table highlighting the vertical dimensions


What is dimensionality reduction?

Eliminating or combining features with little or no new information

Example

Tidy table with more columns

Tidy table with more columns highlighting features with redundant information

Tidy table with more columns highlighting the feature with all the same values
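
The two kinds of low-information feature highlighted in the example (a redundant column and a column with all the same values) can be sketched on a toy table. The data frame, its column names, and its values below are made up for illustration and are not the course data; this is a minimal base R sketch rather than the course's own method.

```r
# Hypothetical toy data, invented for this illustration.
houses <- data.frame(
  sqft_living    = c(1180, 2570, 770, 1960),
  sqft_above     = c(1180, 2170, 770, 1050),
  sqft_basement  = c(0, 400, 0, 910),
  num_hvac_units = c(1, 1, 1, 1)
)

# sqft_living is fully determined by the other two area columns,
# so it carries no new information once they are known.
is_redundant <- all(houses$sqft_living == houses$sqft_above + houses$sqft_basement)

# num_hvac_units has the same value in every row.
is_constant <- length(unique(houses$num_hvac_units)) == 1

# Eliminate both low-information features.
reduced <- houses[, !(names(houses) %in% c("sqft_living", "num_hvac_units"))]

ncol(houses)   # dimensions before
ncol(reduced)  # dimensions after
```

Here the redundant feature is simply dropped; combining features into a new one is the other option named in the definition above.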


Dimensionality reduction visually

3D projection to 2D surfaces
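
As a rough sketch of what projecting 3D data onto a 2D surface means, the base R snippet below drops one axis at a time from a small set of points. The points are hypothetical, and projecting onto the coordinate planes is only the simplest possible case, assumed here for illustration.

```r
# Hypothetical 3D points (x, y, z); values are illustrative only.
points_3d <- cbind(
  x = c(1, 2, 3, 4),
  y = c(2, 5, 3, 8),
  z = c(5, 5, 5, 5)   # z never changes, so it carries no information
)

# Projecting onto a coordinate plane means dropping the remaining axis.
proj_xy <- points_3d[, c("x", "y")]  # drop z
proj_xz <- points_3d[, c("x", "z")]  # drop y

# Dropping the constant z axis leaves all pairwise distances unchanged.
same_after_dropping_z <- isTRUE(all.equal(
  as.vector(dist(points_3d)), as.vector(dist(proj_xy))
))

# Dropping y, which does vary, changes the distances between points.
same_after_dropping_y <- isTRUE(all.equal(
  as.vector(dist(points_3d)), as.vector(dist(proj_xz))
))
```

In this toy case the projection onto the x-y plane loses nothing, while the projection onto the x-z plane discards real structure.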


Finding numeric columns with no variance

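library(dplyr)
library(tidyr)

# Variance of every column (assumes all columns of df are numeric)
df %>%
  summarize(
    across(
      everything(),
      ~ var(.x, na.rm = TRUE))) %>%
  # Reshape to one row per feature
  pivot_longer(everything(), names_to = "feature", values_to = "variance")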
# A tibble: 7 × 2
  feature              variance
  <chr>                   <dbl>
1 sqft_living           843534.
2 sqft_above            685735.
3 sqft_basement         195873.
4 sqft_living_near15    475480.
5 sqft_lot_near15    863386815.
6 num_garages                0 
7 num_hvac_units             0
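
Once the zero-variance features are identified, one possible next step is to drop them. The sketch below assumes a small hypothetical tibble standing in for df (its values are invented) and uses exact comparison with zero, which may need a tolerance on real data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for df; values are illustrative only.
df <- tibble(
  sqft_living    = c(1180, 2570, 770, 1960),
  num_garages    = c(2, 2, 2, 2),
  num_hvac_units = c(1, 1, 1, 1)
)

# Names of the features whose variance is exactly zero.
zero_var_features <- df %>%
  summarize(across(everything(), ~ var(.x, na.rm = TRUE))) %>%
  pivot_longer(everything(), names_to = "feature", values_to = "variance") %>%
  filter(variance == 0) %>%
  pull(feature)

# Drop those features from the table.
df_reduced <- df %>% select(-all_of(zero_var_features))

names(df_reduced)
```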

Mutual information

A Venn diagram with an intersection
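
The intersection in the Venn diagram can be quantified for discrete features with the standard identity I(X; Y) = H(X) + H(Y) - H(X, Y), where H is Shannon entropy. The slide does not state a formula, so treat the base R sketch below as one common way to compute it; the helper names `entropy` and `mutual_information` and the toy vectors are hypothetical.

```r
# Shannon entropy (in bits) computed from a table of counts.
entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]          # treat 0 * log(0) as 0
  -sum(p * log2(p))
}

# Mutual information as the overlap of the two entropies:
# I(X; Y) = H(X) + H(Y) - H(X, Y)
mutual_information <- function(x, y) {
  entropy(table(x)) + entropy(table(y)) - entropy(table(x, y))
}

# Hypothetical toy features, invented for this illustration.
has_basement  <- c("yes", "no", "yes", "no")
basement_flag <- c(1, 0, 1, 0)          # the same information, recoded
other_feature <- c("a", "a", "b", "b")  # independent of has_basement here

mi_redundant   <- mutual_information(has_basement, basement_flag)
mi_independent <- mutual_information(has_basement, other_feature)
```

A feature that shares all of its information with another (the recoded flag) is a candidate for removal, while a feature with no overlap contributes new information.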


Create a correlation plot

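library(dplyr)
library(ggplot2)
library(corrr)

house_sales_df %>%
  select(where(is.numeric)) %>%  # correlations need numeric columns
  correlate() %>%                # pairwise correlations
  shave() %>%                    # keep the lower triangle only
  rplot(print_cor = TRUE) +      # plot, printing the coefficients
  theme(axis.text.x = element_text(angle = 90, hjust = 1))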

Correlation plot

Correlation plot
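
Reading strongly correlated pairs off a plot gets harder as the number of features grows, so it can help to list them. The sketch below uses base R `cor()` rather than corrr so it runs without extra packages; the toy data frame and the 0.9 cutoff are assumptions made for illustration.

```r
# Hypothetical toy data, invented for this illustration.
houses <- data.frame(
  sqft_living = c(1180, 2570, 770, 1960, 1680),
  sqft_above  = c(1180, 2170, 770, 1860, 1680),
  sqft_lot    = c(5650, 7242, 10000, 5000, 8080)
)

cor_mat <- cor(houses)

# Blank out the diagonal and upper triangle so each pair appears once.
cor_mat[upper.tri(cor_mat, diag = TRUE)] <- NA

# Reshape the matrix to one row per feature pair.
pairs <- as.data.frame(as.table(cor_mat), stringsAsFactors = FALSE)
names(pairs) <- c("feature_1", "feature_2", "r")
pairs <- pairs[!is.na(pairs$r), ]

# Keep pairs whose absolute correlation exceeds the assumed cutoff.
threshold <- 0.9
high_pairs <- pairs[abs(pairs$r) > threshold, ]

high_pairs
```

Each pair listed shares most of its linear information, so one feature of the pair is a candidate for removal.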


Let's practice!
