  • Why XAI and Where XAI?
  • Which XAI? (available methods)
  • Integrated Gradients
  • How to use it?
  • Potential issues with XAI methods
Why XAI and Where XAI?

Source: AAAI'20 Conference

Wrong decisions might be costly

Source: https://taiwanenglishnews.com/tesla-on-autopilot-crashes-into-overturned-truck/

But XAI can help us improve models

Source: W. Samek, A. Binder, MICCAI’18

And we can learn from it

Source: Google DeepMind's AlphaGo beating world champion Lee Se-dol at Go

The Law

"Article 22 of GDPR empowers individuals with the right to demand an explanation of how an AI system made a decision that affects them" - European Commission
"Provide an assessment of the risks posed by the automated decision system to the privacy or security and the risks that contribute to inaccurate, unfair, biased, or discriminatory decisions impacting consumers" - Algorithmic Accountability Act 2019
Which XAI?

  • Surrogate Model Based
  • Attribution Based
  • Contrastive Explanations
  • Counterfactual / Recourse Based
  • Example similarity
Surrogate Model Based Explanation

Black Box model

Global Surrogate model

Source: J48 and VTJ48 decision trees doi:
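A global surrogate is an interpretable model trained to mimic the black box's predictions (not the true labels) over the whole input space; its fidelity is how often it agrees with the black box. A minimal numpy sketch, assuming a hypothetical `black_box` function and a one-split decision stump as the surrogate (the slide uses full J48 trees):

```python
import numpy as np

def black_box(X):
    # Hypothetical opaque classifier: its decision mostly depends on x0
    return (np.tanh(3 * X[:, 0]) + 0.1 * np.sin(5 * X[:, 1]) > 0).astype(int)

def stump_predict(X, j, t, direction):
    # direction=1: class 1 above the threshold; direction=-1: class 1 below it
    return (direction * (X[:, j] - t) > 0).astype(int)

def fit_stump(X, y):
    # Exhaustively search the one-split tree (feature, threshold, direction)
    # that best reproduces the labels y
    best = (0.0, 0, 0.0, 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for direction in (1, -1):
                acc = np.mean(stump_predict(X, j, t, direction) == y)
                if acc > best[0]:
                    best = (acc, j, t, direction)
    return best

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y_bb = black_box(X)                      # train on the black box's outputs, not true labels
acc, j, t, direction = fit_stump(X, y_bb)
fidelity = np.mean(stump_predict(X, j, t, direction) == y_bb)
print(f"surrogate: split on x{j} at {t:.3f}, fidelity {fidelity:.1%}")
```

The surrogate is only trustworthy as an explanation to the extent its fidelity is high; a deeper tree trades readability for fidelity.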

LIME - Local Interpretable Model-agnostic Explanations
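LIME explains one prediction at a time: it samples perturbations around the instance, weights them by proximity, and fits an interpretable (here linear) model to the black box's outputs; the coefficients are the explanation. A from-scratch numpy sketch for tabular input, assuming a hypothetical `black_box_proba`, Gaussian perturbations and a Gaussian proximity kernel (the `lime` package uses its own sampling, kernel and feature selection):

```python
import numpy as np

def black_box_proba(X):
    # Hypothetical black box: probability of class 1, with a feature interaction
    z = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 0] * X[:, 2]
    return 1.0 / (1.0 + np.exp(-z))

def lime_explain(f, x, n_samples=5000, scale=0.5, kernel_width=0.75, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Sample perturbations in the neighbourhood of x
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # 2. Query the black box on the perturbed samples
    y = f(Z)
    # 3. Weight samples by proximity to x
    d = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(d ** 2) / kernel_width ** 2)
    # 4. Weighted least squares: y ~ intercept + (Z - x) @ coef
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return coef[1:]  # local importance of each feature (intercept dropped)

x = np.array([0.2, -0.4, 1.0])
local_weights = lime_explain(black_box_proba, x)
print(np.round(local_weights, 3))
```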

Attribution Based Explanation

You have access to the model's structure (e.g. you can compute its gradients)

Integrated Gradients

$$ IntegratedGrads_{i}(x) ::= (x_{i} - x'_{i})\times\int_{\alpha=0}^1\frac{\partial F(x'+\alpha \times (x - x'))}{\partial x_i}{d\alpha} $$

$$ IntegratedGrads^{approx}_{i}(x)::=(x_{i}-x'_{i})\times\sum_{k=1}^{m}\frac{\partial F(x' + \frac{k}{m}\times(x - x'))}{\partial x_{i} } \times \frac{1}{m} $$

Source: Axiomatic Attribution for Deep Networks - M. Sundararajan, A. Taly, Q. Yan
Integrated Gradients - variables

  • $i$ - feature
  • $x$ - input (image)
  • $x'$ - baseline (image)
  • $\alpha$ - interpolation constant
  • $k$ - interpolation step index, $k = 1, \dots, m$ (so $\frac{k}{m}$ plays the role of $\alpha$)
  • $m$ - number of steps
  • $(x_{i}-x'_{i})$ - scale factor
  • $F()$ - model's prediction function
  • $\frac{\partial F}{\partial x_{i} }$ - gradient with respect to feature $x_i$
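To make the variables concrete, the approximation can be computed for a toy model whose gradient is known in closed form. A minimal numpy sketch, assuming a hypothetical one-unit logistic model $F(x)=\sigma(w \cdot x)$ in place of a real network; it also checks the completeness property from the paper, $\sum_i IntegratedGrads_i(x) \approx F(x) - F(x')$:

```python
import numpy as np

w = np.array([1.5, -2.0, 0.5])   # hypothetical model weights

def F(x):
    # Model's prediction function: a logistic "network" with a single unit
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def grad_F(x):
    # Gradient of F w.r.t. each feature, for a batch of inputs of shape (m, n)
    s = F(x)
    return (s * (1.0 - s))[:, None] * w

def integrated_gradients(x, x_baseline, m=50):
    k = np.arange(1, m + 1)[:, None]                        # k = 1..m, shape (m, 1)
    interpolated = x_baseline + (k / m) * (x - x_baseline)  # x' + k/m * (x - x')
    summed = grad_F(interpolated).sum(axis=0)               # sum of the m local gradients
    return (x - x_baseline) * summed / m                    # scale factor and 1/m

x = np.array([1.0, 0.5, 2.0])
x_baseline = np.zeros(3)
attributions = integrated_gradients(x, x_baseline)
print(attributions)
print(attributions.sum(), F(x) - F(x_baseline))
```

Completeness gives a cheap sanity check on the choice of $m$: if the two numbers on the last line disagree noticeably, increase the number of steps.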
Test subjects


$$IntegratedGrads^{approx}_{i}(x)::=(x_{i}-x'_{i})\times\sum_{k=1}^{m}\frac{\partial F(\overbrace{x' + \frac{k}{m}\times(x - x')}^\text{interpolate m images at k intervals})}{\partial x_{i} } \times \frac{1}{m}$$
Interpolation process
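The interpolation step is a single broadcast: stacking the $m$ values $\frac{k}{m}$ along a new batch axis turns one baseline and one image into a batch of $m$ images walking from $x'$ to $x$. A numpy sketch, assuming images stored as `(height, width, channels)` arrays in $[0, 1]$, a random stand-in image and a black baseline:

```python
import numpy as np

def interpolate_images(baseline, image, m):
    # alphas = k/m for k = 1..m, shaped (m, 1, 1, 1) to broadcast over H, W, C
    alphas = (np.arange(1, m + 1) / m).reshape(-1, 1, 1, 1)
    return baseline[None] + alphas * (image - baseline)[None]

rng = np.random.default_rng(0)
image = rng.uniform(0.0, 1.0, size=(224, 224, 3))  # stand-in for a real photo
baseline = np.zeros_like(image)                     # black baseline image
batch = interpolate_images(baseline, image, m=50)
print(batch.shape)
```

The whole batch can then be sent through the model in one forward/backward pass to get all $m$ gradients at once.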
Compute Gradients

$$ IntegratedGrads^{approx}_{i}(x)::=(x_{i}-x'_{i})\times\sum_{k=1}^{m}\frac{\overbrace{\partial F(\text{interpolated images})}^\text{compute gradients} }{\partial x_{i} } \times \frac{1}{m} $$

$$ IntegratedGrads^{approx}_{i}(x)::=(x_{i}-x'_{i})\times \overbrace{\sum_{k=1}^{m} }^\text{Sum m local gradients} \text{gradients(interpolated images)} \times \overbrace{\frac{1}{m} }^\text{Divide by m steps} $$
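A one-dimensional worked example shows what the sum and the $\frac{1}{m}$ factor do and why $m$ matters. Take the toy function $F(x) = x^2$ (an illustrative choice, not from the slides) with baseline $x' = 0$, so $\frac{\partial F}{\partial x}$ at the point $\alpha x$ is $2\alpha x$:

$$ IntegratedGrads(x) = (x - 0)\times\int_{\alpha=0}^1 2\alpha x \, d\alpha = x^2 = F(x) - F(x') $$

$$ IntegratedGrads^{approx}(x) = (x - 0)\times\sum_{k=1}^{m} 2\frac{k}{m}x \times \frac{1}{m} = \frac{2x^2}{m^2}\times\frac{m(m+1)}{2} = x^2\left(1 + \frac{1}{m}\right) $$

The sum evaluates each gradient at the right end of its step, so it overshoots the exact value by $\frac{x^2}{m}$, i.e. by 2% with $m = 50$; more steps buy accuracy at the cost of more gradient computations.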
Class: "guinea pig"
Class: "bee"
Class: "guinea pig"
Class: "hare"
Contrastive Explanations

Why Spoonbill, rather than Flamingo?

Source: Contrastive Explanations in Neural Networks - 2020 - M. Prabhushankar, G. Kwon, D. Temel, G. AlRegib
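One simple way to put the contrastive question in gradient terms is to attribute the *difference* between the two class scores, so features that favour the contrast class come out negative. A numpy sketch for a hypothetical linear classifier (logits $z = Wx$, made-up weights and features), where the gradient of $z_P - z_Q$ is just $W_P - W_Q$; this illustrates the question being asked, not the method of the cited paper:

```python
import numpy as np

classes = ["spoonbill", "flamingo", "heron"]   # hypothetical label set
W = np.array([[ 1.0,  0.8, -0.5, 0.2],         # spoonbill
              [ 1.0, -0.6,  0.9, 0.1],         # flamingo
              [-0.3,  0.2,  0.1, 0.7]])        # heron
x = np.array([0.9, 1.2, 0.3, 0.4])             # hypothetical input features

def contrastive_attribution(W, x, p, q):
    # Gradient of (z_p - z_q) w.r.t. x is W[p] - W[q]; multiplying by the input
    # gives each feature's contribution to "p rather than q"
    return x * (W[p] - W[q])

logits = W @ x
p = int(np.argmax(logits))                      # predicted class P
q = classes.index("flamingo")                   # contrast class Q
attr = contrastive_attribution(W, x, p, q)
print(classes[p], np.round(attr, 2))
```

For this linear model the contributions sum exactly to $z_P - z_Q$; features shared by both classes (the first one here) get zero contrastive credit even though they matter for each class on its own.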

Counterfactual / Recourse Based Explanations

How to change X to predict Y?

Source: Counterfactual Explanations & Adversarial Examples -- Common Grounds, Essential Differences, and Potential Transfers - 2020 - T. Freiesleben
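A common way to search for a counterfactual is gradient descent on a loss that pushes the prediction towards the target class $Y$ while penalising the distance from the original input $X$. A numpy sketch of that idea, assuming a hypothetical logistic-regression model so the gradient has a closed form; `lam` (an assumed name) trades validity against closeness:

```python
import numpy as np

w, b = np.array([2.0, -1.0, 0.5]), -1.0   # hypothetical trained logistic regression

def predict_proba(x):
    # Probability of the target class Y
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def find_counterfactual(x, target=1.0, lam=0.1, lr=0.1, steps=2000):
    # Minimise (p(x_cf) - target)^2 + lam * ||x_cf - x||^2 by gradient descent
    x_cf = x.copy()
    for _ in range(steps):
        p = predict_proba(x_cf)
        grad_pred = 2.0 * (p - target) * p * (1.0 - p) * w   # chain rule through the sigmoid
        grad_dist = 2.0 * lam * (x_cf - x)                     # keeps x_cf close to x
        x_cf -= lr * (grad_pred + grad_dist)
    return x_cf

x = np.array([0.1, 0.5, 0.2])
x_cf = find_counterfactual(x)
print(predict_proba(x), predict_proba(x_cf))
print(np.round(x_cf - x, 3))   # the suggested change to X
```

Raising `lam` yields smaller changes but may no longer flip the prediction; the difference from an adversarial example lies mostly in how the change is constrained and used, not in the optimisation itself.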

Example similarity Explanations

Author: Andrej Karpathy, generated from ILSVRC 2012 images
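Explanation by example similarity justifies a prediction by showing the training examples that lie closest to the input in the network's feature (embedding) space. A numpy sketch, assuming the embeddings were already extracted from some layer of the model (random vectors and hypothetical labels stand in for them here):

```python
import numpy as np

def nearest_examples(train_emb, query_emb, k=3):
    # Cosine similarity between the query and every training embedding
    train_n = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    query_n = query_emb / np.linalg.norm(query_emb)
    sims = train_n @ query_n
    top = np.argsort(-sims)[:k]      # indices of the k most similar examples
    return top, sims[top]

rng = np.random.default_rng(0)
train_emb = rng.normal(size=(1000, 128))         # stand-in for extracted features
train_labels = rng.integers(0, 10, size=1000)    # hypothetical class labels
query_emb = train_emb[42] + 0.05 * rng.normal(size=128)  # near-copy of example 42

idx, sims = nearest_examples(train_emb, query_emb)
print(idx, np.round(sims, 3), train_labels[idx])
```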

How to use it

Available Tools


  1. Captum
  2. tf-explain
  3. Lime
  4. SHAP


Captum
  • Dedicated to interpreting PyTorch models
  • Developed by Facebook under Facebook Open Source
  • Available on PyPI
  • Used to develop and troubleshoot models
  • Can be used in production to help users understand the model's predictions

Available Methods

  • Primary Attribution: Evaluates contribution of each input feature to the output of a model.
    • Integrated Gradients
    • Gradient SHAP
    • Saliency
    • Guided Backpropagation, Deconvolution
    • Guided GradCAM
  • Layer Attribution: Evaluates contribution of each neuron in a given layer to the output of the model.
    • Layer Conductance
    • Internal Influence
    • Layer Activation
    • GradCAM
  • Neuron Attribution: Evaluates contribution of each input feature on the activation of a particular hidden neuron.
    • Neuron Conductance
    • Neuron Integrated Gradients
    • Neuron GradientSHAP


                from captum.attr import GuidedGradCam
                from torchvision import models

                # Load pretrained AlexNet and switch it to inference mode
                alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
                alexnet.eval()

                # Create object to interpret the model
                # To make GradCam work we pass reference to last conv layer (features[10] in AlexNet)
                guided_gc = GuidedGradCam(alexnet, alexnet.features[10])

                # Predict output on data
                # batch: normalized image tensor of shape (N, 3, 224, 224), prepared beforehand
                out = alexnet(batch)

                # Pick the class with the highest score for each image
                score, index = out.max(1)

                # Use interpreter to calculate attributions for the predicted classes
                attributions = guided_gc.attribute(batch, target=index)

Captum Insights

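                import torch
                import torch.nn.functional as F
                from captum.insights import AttributionVisualizer
                from captum.insights.features import ImageFeature
                from torchvision import models

                # Load pretrained AlexNet
                alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
                alexnet.eval()

                # Launch visualization inside the notebook
                visualizer = AttributionVisualizer(
                    models=[alexnet],
                    score_func=lambda o: F.softmax(o, 1),
                    classes=[...],  # class labels
                    features=[
                        ImageFeature(
                            "Photo",
                            baseline_transforms=[...],  # functions producing a baseline image
                            input_transforms=[...],  # preprocessing, e.g. normalization
                        )
                    ],
                    dataset=...,  # iterable of captum.insights.Batch objects
                )
                visualizer.render()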




tf-explain
  • Dedicated to interpreting TensorFlow (Keras) models
  • No official backing from a large company
  • Available on PyPI
  • Used to develop and troubleshoot models
  • Can be used in production to help users understand the model's predictions

Available Methods

  • Activations Visualization
  • Vanilla Gradients
  • Gradients*Inputs
  • Occlusion Sensitivity
  • Grad-CAM
  • SmoothGrad
  • Integrated Gradients
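Of the methods above, occlusion sensitivity is the easiest to write by hand: slide a grey patch over the image and record how much the class score drops at each position. A model-agnostic numpy sketch, assuming a hypothetical `score_fn` that returns the target-class score for one image (a toy function stands in for a real model):

```python
import numpy as np

def occlusion_sensitivity(image, score_fn, patch=8, fill=0.5):
    h, w = image.shape[:2]
    base_score = score_fn(image)
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill   # grey patch on all channels
            # Large drop in score => the occluded region mattered for the prediction
            heatmap[i // patch, j // patch] = base_score - score_fn(occluded)
    return heatmap

# Toy "model": the score is the mean brightness of the top-left 16x16 corner
def score_fn(img):
    return img[:16, :16].mean()

image = np.ones((32, 32, 3))
heatmap = occlusion_sensitivity(image, score_fn)
print(np.round(heatmap, 3))
```

Patch size and fill value are hyperparameters; both change the resulting map.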

Usage - Core API

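                from tf_explain.core.grad_cam import GradCAM

                # model: trained tf.keras model, images: preprocessed batch of images
                explainer = GradCAM()

                # Explain class 281 ("tabby cat" in ImageNet) for the given images
                grid = explainer.explain((images, None), model, class_index=281)

                explainer.save(grid, output_dir=".", output_name="grad_cam.png")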

Usage - Callbacks

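                from tf_explain.callbacks.grad_cam import GradCAMCallback

                callbacks = [
                    GradCAMCallback(
                        validation_data=(x_val, y_val),
                        class_index=0,  # class whose Grad-CAM maps are produced during training
                        output_dir=output_dir,
                    )
                ]
                model.fit(x_train, y_train, batch_size=2, epochs=2, callbacks=callbacks)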


LIME
  • Applicable to any black-box classifier
  • Works on text, tabular and image data
  • Available on PyPI


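                from lime import lime_image

                explainer = lime_image.LimeImageExplainer()
                explanation = explainer.explain_instance(
                    image,  # input image, numpy array (H, W, 3)
                    predict,  # predict function of interpreted classifier: images -> class probabilities
                    top_labels=5,
                )

                # Superpixels supporting (green) and opposing (red) the top class
                temp, mask = explanation.get_image_and_mask(explanation.top_labels[0], positive_only=False, num_features=10)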

Input image

Interpretation of the "dog" prediction

Green areas contribute positively to the prediction; red areas have a negative effect on it

DeepShap - Usage

                import numpy as np
                import shap

                model = ...  # classifier model (PyTorch)
                images = ...  # image data set as a torch tensor (N, C, H, W)
                background = images[:100]      # reference samples for the explainer
                test_images = images[100:103]  # images to explain

                explainer = shap.DeepExplainer(model, background)
                shap_values = explainer.shap_values(test_images)

                # Move the channel axis last, (N, C, H, W) -> (N, H, W, C), for plotting
                shap_numpy = [np.swapaxes(np.swapaxes(s, 1, -1), 1, 2) for s in shap_values]
                test_numpy = np.swapaxes(np.swapaxes(test_images.numpy(), 1, -1), 1, 2)

                shap.image_plot(shap_numpy, -test_numpy)

DeepShap - Example

Contribution of features to each class prediction

SHAP - Gradient Explainer

Combines Integrated Gradients, SHAP and SmoothGrad into a single interpretation method.

Contribution of features to the top 2 predictions
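The core of GradientExplainer is usually described as *expected gradients*: Integrated Gradients averaged over baselines drawn from a background data set, with the point on the path sampled at random (SmoothGrad-style input noise is an optional extra). A Monte Carlo numpy sketch of the expected-gradients part only, assuming a hypothetical logistic model with an analytic gradient; this is not SHAP's code. In expectation the attributions sum to $F(x)$ minus the average prediction on the background:

```python
import numpy as np

w = np.array([1.0, -2.0, 0.5])   # hypothetical model weights

def F(X):
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def grad_F(X):
    s = F(X)
    return (s * (1.0 - s))[:, None] * w

def expected_gradients(x, background, n_samples=20000, seed=0):
    rng = np.random.default_rng(seed)
    # Sample a baseline x' from the background set and a point alpha on the path
    baselines = background[rng.integers(0, len(background), size=n_samples)]
    alphas = rng.uniform(0.0, 1.0, size=(n_samples, 1))
    points = baselines + alphas * (x - baselines)
    # Average (x - x') * gradient over the sampled points
    return ((x - baselines) * grad_F(points)).mean(axis=0)

rng = np.random.default_rng(1)
background = rng.normal(size=(50, 3))
x = np.array([1.0, 0.5, -1.0])
phi = expected_gradients(x, background)
print(np.round(phi, 3), phi.sum(), F(x) - F(background).mean())
```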

XAI challenges

How to evaluate an explanation?

Why is one explanation better than another?

Two distinct approaches

Human-grounded Measures

Computational Measures

Human-grounded Measures

Metrics like IoU (Intersection over Union) between the explanation and human-annotated regions

Researchers conduct surveys asking: "Which part of the data is important?"
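For IoU, the attribution map is thresholded into a binary mask and compared with the region a human marked as important. A numpy sketch, assuming a keep-the-top-20%-of-pixels threshold (an arbitrary, illustrative choice) and toy arrays in place of a real saliency map and annotation:

```python
import numpy as np

def explanation_iou(attribution, human_mask, top_fraction=0.2):
    # Binarise: keep the top `top_fraction` of pixels by attribution magnitude
    threshold = np.quantile(np.abs(attribution), 1.0 - top_fraction)
    pred_mask = np.abs(attribution) >= threshold
    intersection = np.logical_and(pred_mask, human_mask).sum()
    union = np.logical_or(pred_mask, human_mask).sum()
    return intersection / union if union > 0 else 0.0

# Toy 10x10 example: the human marked the left two columns as important
human_mask = np.zeros((10, 10), dtype=bool)
human_mask[:, :2] = True
attribution = np.zeros((10, 10))
attribution[:, :2] = 1.0          # this explanation agrees with the human
print(explanation_iou(attribution, human_mask))
```

The threshold is itself a hyperparameter, so IoU scores are only comparable when computed the same way.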

Computational Measures

Dozens of approaches

No established standards

Open research area


Infidelity: how well the attributions predict the change in the network's output when the input image is perturbed


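import numpy as np
import torch
from captum.attr import Saliency
from captum.metrics import infidelity

net = ImageClassifier()  # placeholder for your own torch.nn.Module classifier
saliency = Saliency(net)

# Computes saliency maps for class 3 for the input image.
attribution = saliency.attribute(input, target=3)

# Define a perturbation function for the input:
# returns (perturbation, perturbed input)
def perturb_fn(inputs):
    noise = torch.tensor(np.random.normal(0, 0.003, inputs.shape)).float()
    return noise, inputs - noise

# Infidelity of the saliency map for the same target class
infidelity(net, perturb_fn, input, attribution, target=3)
# >> 0.2177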
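The same score can be written out directly. Infidelity is the expected squared difference between $I^{T}\Phi(x)$, the output change the attributions $\Phi$ predict for a perturbation $I$, and the actual change $F(x) - F(x - I)$. A numpy sketch, assuming a hypothetical linear model: its gradient is exact, so the saliency map should score (near) zero while random attributions should not:

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0, 0.3])   # hypothetical linear model weights

def F(X):
    return X @ w

def infidelity_score(F, x, attribution, n_samples=1000, std=0.003, seed=0):
    rng = np.random.default_rng(seed)
    I = rng.normal(0.0, std, size=(n_samples, x.size))   # perturbations
    predicted_change = I @ attribution                    # what the attributions predict
    actual_change = F(x) - F(x - I)                       # what the model actually does
    return np.mean((predicted_change - actual_change) ** 2)

x = np.array([1.0, 2.0, -0.5, 0.7])
saliency_map = w.copy()                                   # gradient of a linear model is w
random_attr = np.random.default_rng(1).normal(size=4)
print(infidelity_score(F, x, saliency_map), infidelity_score(F, x, random_attr))
```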


Remove the inputs the explanation marks as "important": the network's confidence should decrease
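This deletion test can be scored by removing features in order of decreasing attribution and tracking the model's output: a faithful explanation makes the score fall faster than a random deletion order. A numpy sketch, assuming a hypothetical linear scoring model, "removal" meaning replacement by 0, and gradient × input as the explanation under test:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.uniform(0.0, 1.0, size=20)          # hypothetical model: score = w . x
x = rng.uniform(0.5, 1.0, size=20)

def score(v):
    return float(v @ w)

def deletion_curve(x, order, baseline_value=0.0):
    # Remove features one by one in the given order, recording the score each time
    v = x.copy()
    scores = [score(v)]
    for idx in order:
        v[idx] = baseline_value
        scores.append(score(v))
    return np.array(scores)

attribution = x * w                          # gradient * input for a linear model
by_attribution = np.argsort(-attribution)    # most important first
random_order = rng.permutation(x.size)

curve_attr = deletion_curve(x, by_attribution)
curve_rand = deletion_curve(x, random_order)
# Area under the deletion curve (approximated by the mean): lower is better
print(curve_attr.mean(), curve_rand.mean())
```

On real images, zeroing pixels creates inputs the network never saw during training, which is one reason this family of metrics is still debated.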

XAI challenges

How to explain a wrong prediction?

Wrong prediction

The explanation makes even less sense

XAI challenges

Hyperparameters have a massive impact on a given explanation
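The Integrated Gradients baseline is one example. For a linear model $F(x) = w \cdot x$ the formula reduces exactly to $(x_i - x'_i) \times w_i$, so a pixel that happens to equal the baseline gets zero attribution however much the model relies on it. A numpy sketch comparing a black and a white baseline, with hypothetical weights and pixel values:

```python
import numpy as np

w = np.array([0.8, 0.8, 0.8])          # hypothetical model: every pixel matters equally
x = np.array([0.0, 0.5, 1.0])          # a black, a grey and a white pixel

def ig_linear(x, baseline, w):
    # For F(x) = w . x the gradient is w everywhere, so IG is exact
    return (x - baseline) * w

black = ig_linear(x, np.zeros_like(x), w)
white = ig_linear(x, np.ones_like(x), w)
print(black)
print(white)
```

With the black baseline the black pixel gets no credit; with the white baseline the white pixel gets none, and every sign flips, although the model is the same.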

OK, I'm hyped

Where should I start my PhD?


  • XAI methods don't always work well
  • but we still need them to trust ML systems
  • Out-of-the-box implementations are available that will work with your model
  • An open problem with many challenges



"There's no such thing as a stupid question!"

Kemal Erdem, Piotr Mazurek, Piotr Rarus
