Explainability metrics

Explainable AI in Python

Fouad Trad

Machine Learning Engineer

Consistency

  • Assesses the stability of explanations when the model is trained on different data subsets
  • Low consistency → explanations are not robust

Image showing a dataset divided into two subsets: subset 1 and subset 2. A model gets trained on each subset; from each trained model we derive feature importances, and we compute the cosine similarity between them.

Cosine similarity to measure consistency

For two feature-importance vectors f1 and f2:

cosine_similarity(f1, f2) = (f1 · f2) / (||f1|| ||f2||)

Image showing consistency key values and their meanings:

  • 1 → highly consistent explanations
  • 0 → no consistent explanations
  • -1 → opposite explanations
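As a quick illustration of these key values, here is a minimal sketch; the vectors are toy values chosen purely for demonstration:

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy importance vectors chosen to hit each key value
f = np.array([[1.0, 0.0]])
print(cosine_similarity(f, np.array([[2.0, 0.0]])))   # [[1.]]  same direction -> highly consistent
print(cosine_similarity(f, np.array([[0.0, 1.0]])))   # [[0.]]  orthogonal -> no consistency
print(cosine_similarity(f, np.array([[-1.0, 0.0]])))  # [[-1.]] opposite direction -> opposite explanations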

Admissions dataset

GRE Score | TOEFL Score | University Rating | SOP | LOR | CGPA | Chance of Admit
      337 |         118 |                 4 | 4.5 | 4.5 | 9.65 |            0.92
      324 |         107 |                 4 | 4.0 | 4.5 | 8.87 |            0.76
      316 |         104 |                 3 | 3.0 | 3.5 | 8.00 |            0.72
      322 |         110 |                 3 | 3.5 | 2.5 | 8.67 |            0.80
      314 |         103 |                 2 | 2.0 | 3.0 | 8.21 |            0.45

 

  • X1, y1: first part of the dataset
  • X2, y2: second part of the dataset
  • model1, model2: random forest regressors, one trained on each part (see the sketch below)
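A minimal sketch of this setup, assuming the admissions data is loaded into a pandas DataFrame with 'Chance of Admit' as the target; the file name and the simple half-and-half split are illustrative:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv('admissions.csv')  # hypothetical file name
X = df.drop(columns=['Chance of Admit'])
y = df['Chance of Admit']

# Split the dataset into two parts
half = len(df) // 2
X1, y1 = X.iloc[:half], y.iloc[:half]
X2, y2 = X.iloc[half:], y.iloc[half:]

# Train one random forest regressor per part
model1 = RandomForestRegressor(random_state=42).fit(X1, y1)
model2 = RandomForestRegressor(random_state=42).fit(X2, y2)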

Computing consistency

import numpy as np
import shap
from sklearn.metrics.pairwise import cosine_similarity

# Explain each model on its own subset with SHAP
explainer1 = shap.TreeExplainer(model1)
explainer2 = shap.TreeExplainer(model2)
shap_values1 = explainer1.shap_values(X1)
shap_values2 = explainer2.shap_values(X2)

# Global importance: mean absolute SHAP value per feature
feature_importance1 = np.mean(np.abs(shap_values1), axis=0)
feature_importance2 = np.mean(np.abs(shap_values2), axis=0)

# Cosine similarity between the two importance vectors
consistency = cosine_similarity([feature_importance1], [feature_importance2])
print("Consistency between SHAP values:", consistency)
Consistency between SHAP values: [[0.99706516]]

Faithfulness

  • Evaluates whether the features identified as important actually influence the model's predictions
  • Low faithfulness → explanations can mislead trust in the model's reasoning
  • Useful in sensitive applications

Image showing a model generating an original prediction for an input sample. SHAP or LIME is used to locally explain this original prediction. A feature highlighted as important is then perturbed, and the modified sample is fed into the model to generate a new prediction. Faithfulness is the absolute value of the difference between the new prediction and the original prediction:

faithfulness = |new prediction - original prediction|

Computing faithfulness

X_instance = X_test.iloc[[0]]  # select one test instance

original_prediction = model.predict_proba(X_instance)[0, 1]
print(f"Original prediction: {original_prediction}")
Original prediction: 0.43

Image showing LIME's feature importance explanation for the selected sample.
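The LIME explanation in the image could be produced with something like the following sketch; X_train and the explainer settings here are assumptions, not necessarily the course's exact setup:

from lime.lime_tabular import LimeTabularExplainer

lime_explainer = LimeTabularExplainer(
    X_train.values,                          # assumed training features
    feature_names=X_train.columns.tolist(),
    mode='classification'
)

# Locally explain the original prediction for this instance
explanation = lime_explainer.explain_instance(
    X_instance.values[0], model.predict_proba
)
print(explanation.as_list())  # (feature rule, weight) pairs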


Computing faithfulness

# Perturb the feature identified as most important by LIME
important_feature = 'GRE Score'
X_instance[important_feature] = 310

new_prediction = model.predict_proba(X_instance)[0, 1]
print(f"Prediction after perturbing {important_feature}: {new_prediction}")

faithfulness_score = np.abs(original_prediction - new_prediction)
print(f"Local Faithfulness Score: {faithfulness_score}")
Prediction after perturbing GRE Score: 0.77

Local Faithfulness Score: 0.34
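To repeat this check for other features or instances, the two steps above can be wrapped in a small helper. This is a sketch; the function name and signature are our own:

import numpy as np

def local_faithfulness(model, instance, feature, new_value):
    """Absolute change in the positive-class probability after perturbing one feature."""
    original = model.predict_proba(instance)[0, 1]
    perturbed = instance.copy()
    perturbed[feature] = new_value
    new = model.predict_proba(perturbed)[0, 1]
    return np.abs(original - new)

# Same perturbation as above
print(local_faithfulness(model, X_test.iloc[[0]], 'GRE Score', 310))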

Let's practice!
