SHAP kernel explainer

Explainable AI in Python

Fouad Trad

Machine Learning Engineer

SHAP kernel explainer

Derives SHAP values for any model
- K-nearest neighbors
- Neural networks
- Tree-based models
Slower than type-specific explainers

Image showing that SHAP explainers are divided between general explainers that can be applied to any model and type-specific explainers which are optimized for specific model types.
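Why a general explainer works on any model, and why it is slow: Shapley values only need a prediction function and some background data. The sketch below computes them exactly by brute force for one row. It is an illustration, not what the shap library does internally; Kernel SHAP approximates the same quantity by sampling coalitions and fitting a weighted linear regression. The names exact_shap, toy_predict, background, and x are hypothetical.

```python
from itertools import combinations
from math import factorial


def exact_shap(predict, x, background):
    """Exact Shapley values for one row x, treating predict as a black box.

    Features outside a coalition are filled in from each background row,
    and the resulting predictions are averaged.
    """
    n = len(x)

    def value(coalition):
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contrib += weight * (value(s | {i}) - value(s))
        phi.append(contrib)
    return phi


# Hypothetical toy model: any callable works, no access to internals needed
def toy_predict(row):
    return 2.0 * row[0] - 1.0 * row[1] + 0.5 * row[2]


background = [[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]]
x = [3.0, 1.0, 2.0]

phi = exact_shap(toy_predict, x, background)
baseline = sum(toy_predict(r) for r in background) / len(background)
print(phi)                                   # approximately [4.0, 1.0, -0.5]
print(sum(phi), toy_predict(x) - baseline)   # both approximately 4.5
```

The number of coalitions doubles with every added feature, and each one costs a model call per background row. That cost is the reason a model-agnostic explainer is slower than a type-specific one that can exploit the model's structure.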

Heart disease

age	sex	blood_pressure	ecg_results	thalassemia
52	1	125	1	3
53	1	140	0	3
70	1	145	1	3
61	1	148	1	3
62	0	138	1	2

mlp_clf: multilayer perceptron predicting risk of heart disease

Insurance charges

age	gender	bmi	children	smoker	charges
19	0	27.900	0	1	16884.92
18	1	33.770	1	0	1725.55
28	1	33.000	3	0	4449.46
33	1	22.705	0	0	21984.47
32	1	28.880	0	0	3866.85

mlp_reg: multilayer perceptron predicting insurance charges

Creating kernel explainers

MLPRegressor

import shap

explainer = shap.KernelExplainer(
  mlp_reg.predict,      # Model's prediction function
  shap.kmeans(X, 10)    # Representative summary of dataset
)

shap_values_reg = explainer.shap_values(X)

MLPClassifier

import shap

explainer = shap.KernelExplainer(
  mlp_clf.predict_proba,  # Model's prediction function (class probabilities)
  shap.kmeans(X, 10)      # Representative summary of dataset
)

shap_values_cls = explainer.shap_values(X)
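What a "representative summary of the dataset" can look like: shap.kmeans(X, 10) condenses X into 10 cluster centers, so each coalition is evaluated against 10 background rows instead of the full dataset. The NumPy sketch below shows the idea only. It is a plain k-means written for illustration, not the shap implementation, and summarize_background and the toy data are hypothetical.

```python
import numpy as np


def summarize_background(X, k, n_iter=20, seed=0):
    """Condense X into k centroids plus the number of rows in each cluster."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Start from k distinct rows picked at random
    centers = X[rng.choice(len(X), size=k, replace=False)]

    for _ in range(n_iter):
        # Squared distance from every row to every center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = X[labels == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)

    # Final assignment against the updated centers
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    weights = np.bincount(labels, minlength=k)
    return centers, weights


# Hypothetical data: two well-separated groups of three rows each
X_toy = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
centers, weights = summarize_background(X_toy, k=2)
print(centers)
print(weights)
```

Fewer centroids make the explainer faster but describe the data more coarsely, so the choice of 10 is a speed versus fidelity trade-off rather than a fixed rule.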

Feature importance

MLPRegressor

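import numpy as np
import matplotlib.pyplot as plt

# Mean absolute SHAP value per feature (averaged over samples)
mean_reg = np.abs(shap_values_reg).mean(axis=0)
plt.bar(X.columns, mean_reg)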

Image showing a bar plot for feature importances in the regression task, highlighting that smoking and age are the most influential factors in predicting charges.

MLPClassifier

# Keep class index 1 (the positive class); assumes an array shaped (samples, features, classes)
mean_cls = np.abs(shap_values_cls[:, :, 1]).mean(axis=0)
plt.bar(X.columns, mean_cls)

Image showing a bar plot for feature importances in the classification task, highlighting that chest pain type and thalassemia are the most influential factors in predicting heart disease.
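The two importance computations differ only in the shape of the SHAP array. The sketch below uses small made-up arrays (shap_reg_demo and shap_cls_demo are hypothetical, not output from mlp_reg or mlp_clf). It assumes the classifier's SHAP values come back as one array shaped (samples, features, classes); some older shap versions return a list with one array per class instead, in which case the class would be selected with a list index.

```python
import numpy as np

# Hypothetical regression SHAP values: 3 samples, 2 features
shap_reg_demo = np.array([[1.0, -2.0],
                          [-3.0, 0.5],
                          [2.0, 0.5]])

# Mean absolute contribution per feature; the sign is dropped on purpose,
# since importance measures the size of the effect, not its direction
mean_reg_demo = np.abs(shap_reg_demo).mean(axis=0)
print(mean_reg_demo)  # [2. 1.]

# Hypothetical binary-classification SHAP values: 3 samples, 2 features, 2 classes.
# Class 0 is built here as the mirror image of class 1 for illustration.
shap_cls_demo = np.stack([-shap_reg_demo, shap_reg_demo], axis=2)
print(shap_cls_demo.shape)  # (3, 2, 2)

# Select class index 1, then average over samples
mean_cls_demo = np.abs(shap_cls_demo[:, :, 1]).mean(axis=0)
print(mean_cls_demo)  # [2. 1.]

# Feature indices ordered from most to least important
order = np.argsort(mean_reg_demo)[::-1]
print(order)  # [0 1]
```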

Comparing with model-specific approaches

Linear regression

plt.bar(X.columns, np.abs(lin_reg.coef_))

Image showing a bar plot for feature importances using a linear regression model, highlighting that smoking and age are the most influential factors in predicting charges.

Logistic regression

plt.bar(X.columns, np.abs(log_reg.coef_[0]))

Image showing a bar plot for feature importances using a logistic regression model, highlighting that chest pain type is the most influential factor in predicting heart disease.
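Why coefficient magnitudes are a sensible reference point: for a linear model with features treated as independent, the SHAP value of feature i on one row is coef_i * (x_i - mean(x_i)). When features are standardized, mean |SHAP| is therefore roughly proportional to |coef_i|, and the two rankings agree. The sketch below checks this on synthetic data. The coefficients and data are hypothetical, and the source does not say whether lin_reg and log_reg were fit on standardized features, so that is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic features on very different raw scales
X_raw = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 0.1])

# Standardize so every feature has mean 0 and standard deviation 1
X_std = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# Hypothetical coefficients of a linear model fit on the standardized features
coef = np.array([0.5, -2.0, 1.0])

# SHAP values of a linear model under feature independence
shap_lin = coef * (X_std - X_std.mean(axis=0))
mean_abs_shap = np.abs(shap_lin).mean(axis=0)

# Both rankings, from least to most important
print(np.argsort(np.abs(coef)))
print(np.argsort(mean_abs_shap))
```

On unscaled features the coefficient ranking can differ from the SHAP ranking, because a coefficient's size then also reflects the units of its feature.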

Let's practice!
