Feature selection vs. feature extraction

Dimensionality Reduction in Python

Jeroen Boeye

Head of Machine Learning, Faktion

Why reduce dimensionality?

 

Your dataset will:
  • be less complex
  • require less disk space
  • require less computation time
  • have a lower chance of model overfitting

Feature selection

Columns: income, age, favorite color

removed feature

insurance_df.drop('favorite color', axis=1)
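
A minimal sketch of the same call on a toy DataFrame. The values in insurance_df below are made up for illustration; only the column names and the drop call come from the slide. Note that drop returns a new DataFrame and leaves the original untouched.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the slide.
insurance_df = pd.DataFrame({
    "income": [30000, 52000, 41000],
    "age": [25, 47, 33],
    "favorite color": ["blue", "green", "red"],
})

# axis=1 means "drop a column"; the result is a new DataFrame.
reduced_df = insurance_df.drop("favorite color", axis=1)

print(reduced_df.columns.tolist())
```

To keep the change, assign the result back to a variable, as done with reduced_df here.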

Building a pairplot on ANSUR data

import seaborn as sns
sns.pairplot(ansur_df, hue="gender", diag_kind='hist')

body pairplot

body pairplot annotated

constant pairplot
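
A pairplot makes a feature with no variation easy to spot, but the same check can be done numerically. The sketch below assumes that the "constant" panel refers to a column whose value never changes; toy_df, its column names, and its values are hypothetical stand-ins, not the real ANSUR data.

```python
import pandas as pd

# Hypothetical stand-in for ansur_df; names and values are invented.
toy_df = pd.DataFrame({
    "stature_m": [1.78, 1.65, 1.82, 1.70],
    "weight_kg": [81.5, 60.2, 90.1, 72.4],
    "n_legs": [2, 2, 2, 2],  # same value in every row
})

# A column with a single unique value carries no information for a model.
constant_cols = [col for col in toy_df.columns if toy_df[col].nunique() == 1]

# Drop the constant columns; the result is a new DataFrame.
reduced_df = toy_df.drop(constant_cols, axis=1)

print(constant_cols)
print(reduced_df.columns.tolist())
```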

Feature selection

Feature selection schema

Feature extraction

Feature extraction schema

Feature extraction - Example

4 vs. 4 pairplot

pairplot pca transform
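
The slide refers to a PCA transform. As a rough illustration of how feature extraction differs from feature selection, here is a minimal sketch of PCA computed with NumPy's SVD on synthetic data. The data, the variable names, and the choice of two components are assumptions made for this example; in practice scikit-learn's PCA class is the usual tool.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 samples, 4 strongly correlated features.
base = rng.normal(size=(200, 1))
X = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(4)])

# PCA via SVD: center the data, then project onto the top right-singular vectors.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

n_components = 2  # assumed for illustration
X_pca = X_centered @ Vt[:n_components].T

# Share of the total variance captured by each component.
explained_variance_ratio = S**2 / np.sum(S**2)

print(X_pca.shape)
print(explained_variance_ratio[:n_components])
```

Each new column of X_pca is a linear combination of all four original features, so no original feature survives unchanged. With feature selection, by contrast, the kept columns are the original ones.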

Let's practice!
