Dimensionality reduction: feature extraction

Practicing Machine Learning Interview Questions in Python

Lisa Stuart

Data Scientist

Unsupervised learning methods

  • Principal component analysis (PCA) --> Lesson 3.1
  • Singular value decomposition (SVD) --> Lesson 3.1
  • Clustering/grouping --> Lesson 3.3
  • Exploratory data mining

Dimensionality reduction != feature selection

PCA plot

Feature selection

1 https://slideplayer.com/slide/9699240/
2 https://www.analyticsvidhya.com/blog/2016/03/practical-guide-principal-component-analysis-python/
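
The difference between the two approaches can be sketched in code. This is a minimal illustration, assuming scikit-learn and its bundled iris dataset (neither is named on this slide). `SelectKBest` stands in for feature selection in general, and `PCA` for feature extraction.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Feature selection: keep 2 of the original columns, values unchanged.
selector = SelectKBest(score_func=f_classif, k=2)
X_selected = selector.fit_transform(X, y)
kept = selector.get_support(indices=True)  # indices of the retained columns

# Feature extraction: build 2 new features, each a linear
# combination of all 4 original columns.
pca = PCA(n_components=2)
X_extracted = pca.fit_transform(X)

print("selected column indices:", kept)
print("selected data is a slice of X:", np.array_equal(X_selected, X[:, kept]))
print("PCA weights on the original features:\n", pca.components_)
```

Both results have two columns, but the selected columns are copied straight from X, while each PCA column mixes all four original features.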

Curse of dimensionality

Dimensionality vs performance plot

1 https://www.visiondummy.com/2014/04/curse-dimensionality-affect-classification/

1-D search

1-D search space


2-D search

2-D search space


3-D search

3-D search space
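
The growth of the search space from 1-D to 3-D can be made concrete with a little arithmetic. The sketch below assumes a regular grid with 10 bins per axis and 1,000 samples. Both numbers, and the helper names `grid_cells` and `samples_per_cell`, are hypothetical and chosen only for illustration.

```python
def grid_cells(bins_per_axis: int, n_dims: int) -> int:
    """Number of cells in a regular grid with the given bins on every axis."""
    return bins_per_axis ** n_dims


def samples_per_cell(n_samples: int, bins_per_axis: int, n_dims: int) -> float:
    """Average number of samples that land in each grid cell."""
    return n_samples / grid_cells(bins_per_axis, n_dims)


# The same 1,000 samples get spread over exponentially more cells
# as dimensions are added.
for n_dims in (1, 2, 3, 10):
    cells = grid_cells(10, n_dims)
    density = samples_per_cell(1000, 10, n_dims)
    print(f"{n_dims:>2}-D: {cells} cells, {density} samples per cell")
```

With a fixed sample size, the average density drops by a factor of 10 for every added dimension, which is one way to read the curse of dimensionality.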


Dimensionality reduction methods

  • PCA
  • SVD

PCA

Iris PCA plot

  • PCA
    • Quantifies the relationship between features (e.g. x and y in a 2-D dataset)
    • Calculated by finding principal axes
    • Translates, rotates and scales the data onto the principal axes
    • Lower-dimensional projection of the data
1 https://scikit-learn.org/stable/modules/decomposition.html
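
The PCA steps listed above can be sketched as follows. This is a minimal example, assuming scikit-learn's bundled iris dataset and two retained components (the slide does not state how many were used for its plot). The manual projection is included only to show what the translation and rotation amount to.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # labels are not used by PCA

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)  # lower-dimensional projection of the data

# Principal axes: one unit-length direction per row, mutually orthogonal.
axes = pca.components_

# The same projection by hand: translate (center on the mean),
# then rotate onto the principal axes.
X_manual = (X - pca.mean_) @ axes.T

print("projected shape:", X_2d.shape)
print("axes are orthonormal:", np.allclose(axes @ axes.T, np.eye(2)))
print("manual projection matches:", np.allclose(X_2d, X_manual))
```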

SVD

Iris SVD plot

  • SVD
    • Linear algebra and vector calculus
    • Decomposes data matrix into three matrices
    • Results in 'singular' values
    • For centered data, total variance equals the sum of squares (SS) of the singular values divided by n - 1
1 https://galaxydatatech.com/2018/07/15/singular-value-decomposition/
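
The decomposition into three matrices, and the link between singular values and variance, can be checked numerically. The sketch below uses NumPy on synthetic random data (an assumption made only to keep the example self-contained) and centers the data first, since the variance relationship is exact only for centered columns.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))  # synthetic data, for illustration only
X_centered = X - X.mean(axis=0)

# Decompose into three matrices: U (left singular vectors),
# s (singular values), Vt (right singular vectors, transposed).
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Multiplying the three factors back together recovers the data.
reconstructed = U @ np.diag(s) @ Vt

# Total variance is the sum of the per-column sample variances.
total_variance = X_centered.var(axis=0, ddof=1).sum()
ss_singular = (s ** 2).sum()
n_samples = X_centered.shape[0]

print("reconstruction matches:", np.allclose(X_centered, reconstructed))
print("variance == SS / (n - 1):",
      np.isclose(total_variance, ss_singular / (n_samples - 1)))
```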

Dimension reduction functions

Function/method                              Returns
sklearn.decomposition.PCA                    principal component analysis
sklearn.decomposition.TruncatedSVD           singular value decomposition
PCA/TruncatedSVD.fit_transform(X)            fits the model and returns the transformed data
PCA/TruncatedSVD.explained_variance_ratio_   fraction of variance explained by each component
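
A short usage sketch of these functions is shown below. It assumes the iris dataset and `n_components=2`, neither of which is prescribed by the slide, and sets `random_state` only so that the randomized SVD solver gives repeatable results.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, TruncatedSVD

X, _ = load_iris(return_X_y=True)

# PCA centers the data internally before decomposing it.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# TruncatedSVD works on the data as given (no centering);
# n_components must be smaller than the number of features.
svd = TruncatedSVD(n_components=2, random_state=0)
X_svd = svd.fit_transform(X)

print("PCA shape:", X_pca.shape)
print("PCA explained variance ratio:", pca.explained_variance_ratio_)
print("SVD shape:", X_svd.shape)
print("SVD explained variance ratio:", svd.explained_variance_ratio_)
```

Because `TruncatedSVD` does not center the data, its components and variance ratios generally differ from those of `PCA` on the same input.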

Let's practice!
