Removing redundant features

Preprocessing for Machine Learning in Python

James Chapman

Curriculum Manager, DataCamp

Redundant features

  • Remove noisy features
  • Remove correlated features
  • Remove duplicated features
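Duplicated features are the easiest of the three cases to handle in code. The sketch below uses a hypothetical toy DataFrame in which the column "b_copy" repeats "b" exactly. It transposes the frame so that each column becomes a row, then uses pandas' `duplicated()` to keep only the first occurrence of each set of values. This detects exact duplicates only; near-duplicates call for the correlation check discussed later.

```python
import pandas as pd

# Hypothetical toy data: "b_copy" is an exact duplicate of "b"
df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [4, 5, 6],
    "b_copy": [4, 5, 6],
})

# Transpose so each column becomes a row, flag rows that repeat an
# earlier row, then keep only the columns that were not flagged
deduped = df.loc[:, ~df.T.duplicated()]
print(deduped.columns.tolist())  # ['a', 'b']
```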

Scenarios for manual removal

city           state  lat        long
hico           tx     31.982778  -98.033333
mackinaw city  mi     45.783889  -84.727778
winchester     ky     37.990000  -84.179722
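In this table, city/state and lat/long both describe the same location, so one pair is redundant. The sketch below rebuilds the three rows shown and drops city and state with `DataFrame.drop`. Keeping lat/long rather than city/state is an assumption made here for illustration. The right choice depends on which representation the model can use better.

```python
import pandas as pd

# The three rows from the table above
df = pd.DataFrame({
    "city": ["hico", "mackinaw city", "winchester"],
    "state": ["tx", "mi", "ky"],
    "lat": [31.982778, 45.783889, 37.990000],
    "long": [-98.033333, -84.727778, -84.179722],
})

# Assumption: lat/long already encode location, so city and state are redundant
df_reduced = df.drop(columns=["city", "state"])
print(df_reduced.columns.tolist())  # ['lat', 'long']
```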

Correlated features

  • Statistically correlated: features move together directionally
  • Linear models assume features are not strongly correlated with each other (no multicollinearity)
  • Measure correlation with Pearson's correlation coefficient (from -1 to 1)
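Pearson's coefficient is the covariance of two features divided by the product of their standard deviations. The sketch below computes it directly on a pair of hypothetical arrays and checks the result against NumPy's `np.corrcoef`. pandas' `df.corr()` also uses Pearson by default (`method="pearson"`).

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: sum of co-deviations divided by the root of the product of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean of x
    dy = y - y.mean()  # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Hypothetical values
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))  # True
```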

Correlated features

print(df)
        A     B     C
0    3.06  3.92  1.04
1    2.76  3.40  1.05
2    3.24  3.17  1.03
...
print(df.corr())
          A         B         C
A  1.000000  0.787194  0.543479
B  0.787194  1.000000  0.565468
C  0.543479  0.565468  1.000000
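A correlation matrix like the one above can drive automatic removal: for each pair of features whose absolute correlation exceeds a threshold, drop one of them. The sketch below is a common pattern, not something prescribed by the slides. The helper name `drop_correlated`, the threshold of 0.75, and the synthetic data are all hypothetical. It masks the matrix to its upper triangle so each pair is checked only once.

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.75):
    """Hypothetical helper: drop one column from each pair with |Pearson r| > threshold."""
    corr = df.corr().abs()
    # Keep only the upper triangle above the diagonal so each pair is checked once
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    # A column is dropped if it is highly correlated with any column to its left
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop), to_drop

# Synthetic data: B is nearly a scaled copy of A, C is independent noise
rng = np.random.default_rng(0)
a = rng.normal(size=100)
df = pd.DataFrame({
    "A": a,
    "B": a * 2 + rng.normal(scale=0.1, size=100),
    "C": rng.normal(size=100),
})
reduced, dropped = drop_correlated(df, threshold=0.75)
print(dropped)
```

Which column of a correlated pair to keep is a judgment call. This sketch keeps the one that appears first, but domain knowledge may favor the other.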

Let's practice!

