Dimensionality Reduction in R
Matt Pickard
Owner, Pickard Predictives, LLC

Feature importance: a measure of how much information a feature contributes to a model
There are many ways to measure feature importance.

Information gain - the amount of information we gain about one variable (such as a class label) by observing another (such as a feature). It is computed from entropy, a measure of how mixed a set of class labels is.
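Formally (a standard formulation; the slides themselves show only the R arithmetic), the entropy of a set $S$ with class proportions $p_c$, and the information gain of splitting $S$ on feature $a$ into subsets $S_v$, are:

$$H(S) = -\sum_{c} p_c \log_2 p_c \qquad\qquad IG(S, a) = H(S) - \sum_{v} \frac{|S_v|}{|S|}\, H(S_v)$$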

As a worked example, split 16 observations (7 yes, 9 no) on the shape feature, sending 9 observations (2 yes, 7 no) to the left child and 7 (5 yes, 2 no) to the right. First, compute the entropy of the root node:

# Entropy of the root node (7 yes, 9 no out of 16)
p_yes <- 7/16
p_no <- 9/16
entropy_root <- -(p_yes * log2(p_yes)) + -(p_no * log2(p_no))
entropy_root
0.989

# Entropy of the left child (2 yes, 7 no out of 9)
p_left_yes <- 2/9
p_left_no <- 7/9
entropy_left <- -(p_left_yes * log2(p_left_yes)) + -(p_left_no * log2(p_left_no))
entropy_left
0.764

# Entropy of the right child (5 yes, 2 no out of 7)
p_right_yes <- 5/7
p_right_no <- 2/7
entropy_right <- -(p_right_yes * log2(p_right_yes)) + -(p_right_no * log2(p_right_no))
entropy_right
0.863

# Information gain: root entropy minus the weighted average of the child entropies
p_left <- 9/16
p_right <- 7/16
info_gain <- entropy_root - (p_left * entropy_left + p_right * entropy_right)
info_gain
0.181
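Plugging the values above into the formula confirms the result:

$$IG = 0.989 - \left(\tfrac{9}{16} \times 0.764 + \tfrac{7}{16} \times 0.863\right) \approx 0.181$$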

Repeating the calculation for every candidate feature ranks them by information gain:

| Feature | Information Gain |
|---|---|
| shape | 0.181 |
| texture | 0.180 |
| outline | 0.106 |
| color | 0.106 |
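The slides compute each gain by hand. A minimal generic sketch (not from the original slides; the data frame `shapes` and its `class` target column are hypothetical) that computes information gain for any categorical feature:

```r
# Entropy of a vector of class labels
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  p <- p[p > 0]  # drop empty levels so 0 * log2(0) never produces NaN
  -sum(p * log2(p))
}

# Information gain of splitting `target` on a categorical `feature`
information_gain <- function(df, feature, target) {
  h_root   <- entropy(df[[target]])
  children <- split(df[[target]], df[[feature]])
  weights  <- sapply(children, length) / nrow(df)
  h_root - sum(weights * sapply(children, entropy))
}

# Hypothetical usage: rank every feature of a data frame `shapes`
# whose target column is `class`
# feats <- setdiff(names(shapes), "class")
# sort(sapply(feats, information_gain, df = shapes, target = "class"),
#      decreasing = TRUE)
```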
