Principal Component Analysis

Linear Algebra for Data Science in R

Eric Eager

Data Scientist at Pro Football Focus

Big Data

head(select(combine, height:shuttle))
  height weight forty vertical bench broad_jump three_cone shuttle
1     71    192  4.38     35.0    14        127       6.71    3.98
2     73    298  5.34     26.5    27         99       7.81    4.71
3     77    256  4.67     31.0    17        113       7.34    4.38
4     74    198  4.34     41.0    16        131       6.56    4.03
5     76    257  4.87     30.0    20        118       7.12    4.23
6     78    262  4.60     38.5    18        128       7.53    4.48
nrow(combine)
2885

Big Data - Redundancy
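
One way to see redundancy among measurements is to inspect their correlation matrix. The sketch below is an illustration only: the `combine` data is not available here, so it assumes R's built-in `mtcars` data set as a stand-in, and the column choice is arbitrary.

```r
# Stand-in data (assumption): built-in mtcars, not the combine data from the slides
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Pairwise correlations; entries near +1 or -1 suggest overlapping information
cor_mat <- cor(vars)
round(cor_mat, 2)
```

Strongly correlated columns carry much of the same information, which is the kind of redundancy that a lower-dimensional summary can exploit.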

Principal Component Analysis

  • One of the more useful methods from applied linear algebra
  • Non-parametric way of extracting meaningful information from confusing data sets
  • Uncovers hidden, low-dimensional structures that underlie your data
  • These structures are more easily visualized and are often interpretable to subject-matter experts
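
The points above can be sketched in a few lines of R. This is a minimal illustration, not the course's own code: it assumes the built-in `mtcars` data set as a stand-in for `combine`, and uses base R's `prcomp()` with centering and scaling so that variables measured in different units contribute comparably.

```r
# Stand-in data (assumption): built-in mtcars, not the combine data from the slides
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# PCA on centered, unit-variance columns
pca <- prcomp(vars, center = TRUE, scale. = TRUE)

# Proportion of total variance captured by each principal component
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
round(prop_var, 3)

# Link to linear algebra: the component variances are the eigenvalues
# of the correlation matrix of the original variables
eig_vals <- eigen(cor(vars))$values

# Scores on the first two components, a low-dimensional view of the data
head(pca$x[, 1:2])
```

If the first one or two entries of `prop_var` account for most of the variance, plotting the first two columns of `pca$x` gives a two-dimensional picture that retains much of the structure in the original variables.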

Principal Component Analysis - Motivating Example

Let's practice!
