Dimensionality Reduction in R
Matt Pickard
Owner, Pickard Predictives, LLC
Feature importance: a measure of how much information a feature contributes to a model
There are many ways to measure feature importance
Information gain - the reduction in uncertainty (entropy) about one variable that we get by observing another variable
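For a binary target, the entropy of a node is H = -(p_yes * log2(p_yes)) - (p_no * log2(p_no)), measured in bits; it is 0 for a pure node and 1 for an even 50/50 split. The code below applies this formula to a root node with 7 "yes" and 9 "no" observations.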
# Class proportions at the root node: 7 "yes" and 9 "no" out of 16
p_yes <- 7/16
p_no <- 9/16
entropy_root <- -(p_yes * log2(p_yes)) - (p_no * log2(p_no))
entropy_root
0.989
# Class proportions in the left child node: 2 "yes" and 7 "no" out of 9
p_left_yes <- 2/9
p_left_no <- 7/9
entropy_left <- -(p_left_yes * log2(p_left_yes)) - (p_left_no * log2(p_left_no))
entropy_left
0.764
# Class proportions in the right child node: 5 "yes" and 2 "no" out of 7
p_right_yes <- 5/7
p_right_no <- 2/7
entropy_right <- -(p_right_yes * log2(p_right_yes)) - (p_right_no * log2(p_right_no))
entropy_right
0.863
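The three calculations above repeat the same pattern, so they can be wrapped in a small helper; this entropy() function is a convenience sketch, not part of the original example:

# Hypothetical helper: entropy (in bits) of a vector of class proportions
entropy <- function(p) {
  p <- p[p > 0]  # drop zero proportions to avoid log2(0) = -Inf
  -sum(p * log2(p))
}
entropy(c(7/16, 9/16))  # 0.989, matches entropy_root
entropy(c(2/9, 7/9))    # 0.764, matches entropy_left
entropy(c(5/7, 2/7))    # 0.863, matches entropy_right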
# Weights: the proportion of the 16 observations that fall in each child node
p_left <- 9/16
p_right <- 7/16
# Information gain = root entropy minus the weighted average child entropy
info_gain <- entropy_root - (p_left * entropy_left + p_right * entropy_right)
info_gain
0.181
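Putting the pieces together, the gain of a split can also be computed directly from class counts; info_gain_split() below is a hypothetical sketch using the same entropy logic, not a function from the original example:

# Hypothetical sketch: information gain of a binary split, from class counts
info_gain_split <- function(parent, left, right) {
  h <- function(counts) {
    p <- counts / sum(counts)
    p <- p[p > 0]
    -sum(p * log2(p))
  }
  n <- sum(left) + sum(right)
  h(parent) - (sum(left) / n * h(left) + sum(right) / n * h(right))
}
info_gain_split(parent = c(yes = 7, no = 9),
                left   = c(yes = 2, no = 7),
                right  = c(yes = 5, no = 2))  # 0.181, matches info_gain above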
| Feature | Information Gain |
|---------|------------------|
| shape   | 0.181            |
| texture | 0.180            |
| outline | 0.106            |
| color   | 0.106            |
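To rank every feature at once, rather than computing each gain by hand, an existing package can be used; one option is the FSelector package's information.gain(), shown here against a hypothetical shapes_df data frame whose target column is class:

library(FSelector)
# Hypothetical data: shapes_df has a factor target `class` plus the
# features shape, texture, outline, and color
information.gain(class ~ ., data = shapes_df)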