Explainability metrics

Explainable AI in Python

Fouad Trad

Machine Learning Engineer

Consistency

  • Assesses the stability of explanations when the model is trained on different data subsets
  • Low consistency → explanations are not robust

Image showing a dataset divided into two subsets: subset 1 and subset 2. A model gets trained on each subset; from each trained model we derive feature importances, and we compute the cosine similarity between them.

Cosine similarity to measure consistency

For two feature-importance vectors f1 and f2:

cosine_similarity(f1, f2) = (f1 · f2) / (||f1|| ||f2||)

Image showing consistency key values and their meanings:

  • 1 → highly consistent explanations
  • 0 → no consistent explanations
  • -1 → opposite explanations
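As a quick illustration of these key values, here is a minimal sketch; the vectors are toy values chosen purely for demonstration:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy importance vectors chosen to hit each key value
f = np.array([[1.0, 0.0]])
print(cosine_similarity(f, np.array([[2.0, 0.0]])))   # [[1.]]  same direction -> highly consistent
print(cosine_similarity(f, np.array([[0.0, 1.0]])))   # [[0.]]  orthogonal -> no consistency
print(cosine_similarity(f, np.array([[-1.0, 0.0]])))  # [[-1.]] opposite direction -> opposite explanations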

Admissions dataset

GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Chance of Admit
      337 |         118 |                 4 | 4.5 | 4.5 | 9.65 |            0.92
      324 |         107 |                 4 | 4.0 | 4.5 | 8.87 |            0.76
      316 |         104 |                 3 | 3.0 | 3.5 | 8.00 |            0.72
      322 |         110 |                 3 | 3.5 | 2.5 | 8.67 |            0.80
      314 |         103 |                 2 | 2.0 | 3.0 | 8.21 |            0.45

 

  • X1, y1: first part of the dataset
  • X2, y2: second part of the dataset
  • model1, model2: random forest regressors, one trained on each part (see the sketch below)
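A minimal sketch of this setup, assuming the admissions data is loaded into a pandas DataFrame with 'Chance of Admit' as the target; the file name and the simple half-and-half split are illustrative:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv('admissions.csv')  # hypothetical file name
X = df.drop(columns=['Chance of Admit'])
y = df['Chance of Admit']

# Split the dataset into two parts
half = len(df) // 2
X1, y1 = X.iloc[:half], y.iloc[:half]
X2, y2 = X.iloc[half:], y.iloc[half:]

# Train one random forest regressor per part
model1 = RandomForestRegressor(random_state=42).fit(X1, y1)
model2 = RandomForestRegressor(random_state=42).fit(X2, y2)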

Computing consistency

import numpy as np
import shap
from sklearn.metrics.pairwise import cosine_similarity

# Explain each model on its own subset with SHAP
explainer1 = shap.TreeExplainer(model1)
explainer2 = shap.TreeExplainer(model2)
shap_values1 = explainer1.shap_values(X1)
shap_values2 = explainer2.shap_values(X2)

# Global importance: mean absolute SHAP value per feature
feature_importance1 = np.mean(np.abs(shap_values1), axis=0)
feature_importance2 = np.mean(np.abs(shap_values2), axis=0)

# Cosine similarity between the two importance vectors
consistency = cosine_similarity([feature_importance1], [feature_importance2])
print("Consistency between SHAP values:", consistency)
Consistency between SHAP values: [[0.99706516]]

Faithfulness

  • Evaluates whether the features identified as important actually influence the model's predictions
  • Low faithfulness → explanations can mislead trust in the model's reasoning
  • Useful in sensitive applications

Image showing a model generating an original prediction for an input sample. SHAP or LIME is used to locally explain this original prediction. A feature highlighted as important is then perturbed, and the modified sample is fed into the model to generate a new prediction. Faithfulness is the absolute value of the difference between the new prediction and the original prediction:

faithfulness = |new prediction - original prediction|

Computing faithfulness

X_instance = X_test.iloc[[0]]  # select one test instance

original_prediction = model.predict_proba(X_instance)[0, 1]
print(f"Original prediction: {original_prediction}")
Original prediction: 0.43

Image showing LIME's feature importance explanation for the selected sample.
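The LIME explanation in the image could be produced with something like the following sketch; X_train and the explainer settings here are assumptions, not necessarily the course's exact setup:

from lime.lime_tabular import LimeTabularExplainer

lime_explainer = LimeTabularExplainer(
    X_train.values,                          # assumed training features
    feature_names=X_train.columns.tolist(),
    mode='classification'
)

# Locally explain the original prediction for this instance
explanation = lime_explainer.explain_instance(
    X_instance.values[0], model.predict_proba
)
print(explanation.as_list())  # (feature rule, weight) pairs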


Computing faithfulness

# Perturb the feature identified as most important by LIME
important_feature = 'GRE Score'
X_instance[important_feature] = 310

new_prediction = model.predict_proba(X_instance)[0, 1]
print(f"Prediction after perturbing {important_feature}: {new_prediction}")

faithfulness_score = np.abs(original_prediction - new_prediction)
print(f"Local Faithfulness Score: {faithfulness_score}")
Prediction after perturbing GRE Score: 0.77

Local Faithfulness Score: 0.34
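To repeat this check for other features or instances, the two steps above can be wrapped in a small helper. This is a sketch; the function name and signature are our own:

import numpy as np

def local_faithfulness(model, instance, feature, new_value):
    """Absolute change in the positive-class probability after perturbing one feature."""
    original = model.predict_proba(instance)[0, 1]
    perturbed = instance.copy()
    perturbed[feature] = new_value
    new = model.predict_proba(perturbed)[0, 1]
    return np.abs(original - new)

# Same perturbation as above
print(local_faithfulness(model, X_test.iloc[[0]], 'GRE Score', 310))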

Let's practice!
