Information and feature importance

Dimensionality Reduction in R

Matt Pickard

Owner, Pickard Predictives, LLC

Quote on information gain

1 Provost, Foster; Fawcett, Tom (2013-07-27). Data Science for Business: What you need to know about data mining and data-analytic thinking. O'Reilly Media. Kindle Edition.

Feature importance

Feature importance: a measure of how much information a feature contributes when building a model

Predictor target model illustration

Many ways to measure feature importance

  • Correlation (with the target variable), as sketched after this list
  • Standardized regression coefficients
  • Information gain
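
For example, the correlation measure can be computed directly with base R's cor(). A minimal sketch using a small simulated data frame (df, x1, x2, and y are names introduced here for illustration, not the course data):

# Rank numeric features by the absolute value of their correlation
# with the target; higher means more important.
set.seed(1)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- df$x1 + rnorm(100)           # target driven mostly by x1

predictors <- setdiff(names(df), "y")
importance <- sapply(predictors, function(p) abs(cor(df[[p]], df$y)))
sort(importance, decreasing = TRUE)  # x1 should rank well above x2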

Decision tree example

A set of observations of loan defaults with characteristics of shape, color, outline, and texture


Decision tree and information gain

Information gain: the amount of information we gain about one variable by observing another variable

Information gain equation (a parent set being split into children by some feature):

IG(parent) = entropy(parent) - [p(left) * entropy(left) + p(right) * entropy(right)]

where p(left) and p(right) are the proportions of the parent's observations that fall into each child


Entropy

  • A measure of disorder
  • As purity goes up, entropy goes down
  • Entropy values range from 0 (perfect purity) to 1 (maximum disorder, for a two-class target)

Entropy graph
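
The curve in the graph can be reproduced in a few lines of base R; a minimal sketch of the two-class entropy curve:

# Two-class entropy as a function of the proportion of one class.
# Entropy peaks at 1 when the split is 50/50 and falls to 0 at perfect purity.
p <- seq(0.001, 0.999, by = 0.001)
ent <- -p * log2(p) - (1 - p) * log2(1 - p)
plot(p, ent, type = "l", xlab = "Proportion of one class", ylab = "Entropy")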


Entropy: root node

Entropy equation (summing over the classes in the node, here yes/no):

entropy = -Σ p_i * log2(p_i)

p_yes <- 7/16   # proportion of "yes" observations in the root node
p_no <- 9/16    # proportion of "no" observations
entropy_root <- -(p_yes * log2(p_yes)) - (p_no * log2(p_no))
entropy_root
0.989

Image of observations in root node
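
Because the same arithmetic repeats for every node, it can help to wrap it in a function. A sketch (entropy2 is a name introduced here, not part of the course code):

# Two-class entropy from the count of "yes" observations and the node size.
entropy2 <- function(n_yes, n_total) {
  p <- c(n_yes, n_total - n_yes) / n_total
  p <- p[p > 0]        # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}
entropy2(7, 16)        # root node: 0.989, matching entropy_root above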

Entropy: children nodes

p_left_yes <- 2/9   # proportion of "yes" observations in the left child
p_left_no <- 7/9    # proportion of "no" observations
entropy_left <- -(p_left_yes * log2(p_left_yes)) - (p_left_no * log2(p_left_no))
entropy_left
0.764

Decision tree split to make first level from root

Entropy: children nodes

p_right_yes <- 5/7   # proportion of "yes" observations in the right child
p_right_no <- 2/7    # proportion of "no" observations
entropy_right <- -(p_right_yes * log2(p_right_yes)) - (p_right_no * log2(p_right_no))
entropy_right
0.863

Decision tree split to make first level from root


Information gain: root to children

p_left <- 9/16    # proportion of observations sent to the left child
p_right <- 7/16   # proportion sent to the right child
info_gain <- entropy_root - (p_left * entropy_left + p_right * entropy_right)
info_gain
0.181

Decision tree split to make first level from root
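
The whole calculation can also be wrapped into a single function. A sketch that reuses the hypothetical entropy2() helper from earlier (info_gain2 is likewise a name introduced here):

# Information gain of a two-way split, from "yes"/total counts per node.
info_gain2 <- function(yes_parent, n_parent, yes_left, n_left, yes_right, n_right) {
  weighted_children <- (n_left / n_parent) * entropy2(yes_left, n_left) +
    (n_right / n_parent) * entropy2(yes_right, n_right)
  entropy2(yes_parent, n_parent) - weighted_children
}
info_gain2(7, 16, 2, 9, 5, 7)   # 0.181, matching info_gain above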


Compare information gain across features

Feature   Information gain
shape     0.181
texture   0.180
outline   0.106
color     0.106

Decision tree with question mark at split


Let's practice!
