Integrated Gradients

\begin{equation} IG_{i}(x)=\left(x_{i}-x_{i}^{\prime}\right) \int_{\alpha=0}^{1} \frac{\partial F\left(x^{\prime}+\alpha \left(x-x^{\prime}\right)\right)}{\partial x_{i}} d \alpha \end{equation}

Piotr Mazurek

Agenda

  • Method introduction
  • How to use it without understanding it - the Captum API
  • Intuition behind the method
  • Equations explained in detail
  • How to code it from scratch
  • Open questions and challenges

Visualisation

Engineering Approach

Use IG implementation from Captum

How to use it?

Load your data


from PIL import Image
from torchvision import transforms, models

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # standard ImageNet statistics
                         std=[0.229, 0.224, 0.225])
])

img = Image.open('image/for/which/I/need/prediction.jpg')

input = transform(img).unsqueeze(0)

input.size()
>> torch.Size([1, 3, 224, 224])
					
Inspired by the official Captum tutorial

How to use it?

Load the trained model


model = models.resnet18(pretrained=True)
model = model.eval()
            

Get the prediction


import torch.nn.functional as F

output = F.softmax(model(input), dim=1)
output.size()
>> torch.Size([1, 1000])
            

prediction_score, pred_label_idx = torch.topk(output, 1)
pred_label_idx.squeeze_()
>> tensor(21)
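Optionally, the index can be mapped to a human-readable label (a sketch; the JSON path below is a placeholder for a local copy of the standard ImageNet class-index file):


import json

# hypothetical local copy of the ImageNet class-index mapping
with open('imagenet_class_index.json') as f:
    idx_to_label = {int(k): v[1] for k, v in json.load(f).items()}

idx_to_label[pred_label_idx.item()]
>> 'kite'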
            

How to use it?

Pass your model to the Captum IntegratedGradients object


from captum.attr import IntegratedGradients

integrated_gradients = IntegratedGradients(model)

attributions_ig = integrated_gradients.attribute(input,
                                    target=pred_label_idx,
                                    n_steps=200)
attributions_ig.size()
>> torch.Size([1, 3, 224, 224])

        

Enjoy the out-of-the-box model explainability
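A possible next step (a sketch, not part of the original slides) is to render the attributions with Captum's built-in visualisation helpers:


from captum.attr import visualization as viz
import numpy as np

# Captum's visualisation helpers expect an HxWxC numpy array
attr = np.transpose(attributions_ig.squeeze().cpu().detach().numpy(), (1, 2, 0))

viz.visualize_image_attr(attr, method="heat_map", sign="positive",
                         show_colorbar=True)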

Ok, it works for images, but can we do better?

Actually yes

IG is compatible with a wide range of DL models

NLP example

Analyzing BERT


from transformers import BertTokenizer, BertForQuestionAnswering
from captum.attr import LayerIntegratedGradients

tokenizer = BertTokenizer.from_pretrained(model_path)
model = BertForQuestionAnswering.from_pretrained(model_path)

inputs = tokenizer("What's up?", return_tensors="pt")

# Token ids are discrete, so attributions are taken w.r.t. the embedding
# layer - a simplified sketch, the full example is in the linked tutorial
def start_scores(input_ids):
    return model(input_ids).start_logits

lig = LayerIntegratedGradients(start_scores, model.bert.embeddings)

attributions_ig = lig.attribute(inputs["input_ids"],
                    target=start_scores(inputs["input_ids"]).argmax(),
                    n_steps=200)
More here

Visual question answering

API is great

But how does it actually work?

Baseline

"Baseline input is meant to represent “absence” of feature input"

Path intuition

Quick recap - what is a gradient?

\begin{equation} \frac{\partial F(x)}{\partial x} \end{equation}
"How F(x) changes if we change x"

Gradients in path

How can we "integrate" gradients?

\begin{equation} IG_{i}(x)=\left(x_{i}-x_{i}^{\prime}\right) \int_{\alpha=0}^{1} \frac{\partial F\left(x^{\prime}+\alpha \left(x-x^{\prime}\right)\right)}{\partial x_{i}} d \alpha \end{equation}
Where:
$i=$ feature
$x=$ input
$x^{\prime}=$ baseline
$\alpha=$ interpolation constant to perturb features by
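In other words, the gradients are accumulated along the straight-line path from the baseline to the input (writing the path as $\gamma$):

\begin{equation} \gamma(\alpha) = x^{\prime} + \alpha \left(x - x^{\prime}\right), \qquad \gamma(0) = x^{\prime}, \quad \gamma(1) = x \end{equation}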

Integral approximation

\begin{equation} IG^{a}_{i}(x)=(x_{i}-x'_{i}) \sum_{k=1}^{m} \frac{\partial F(x' + \frac{k}{m} (x - x'))}{\partial x_{i}} \frac{1}{m} \end{equation}
Where:
$i=$ feature, $x=$ input, $x^{\prime}=$ baseline
$k=$ scaled feature perturbation constant
$m=$ number of steps in the Riemann sum approximation of the integral
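A tiny 1-D sanity check of this approximation (not from the slides), with F(x) = x², x = 1 and baseline x' = 0: by the completeness property of IG, the attribution should come out close to F(x) - F(x') = 1.


def F(x):
    return x ** 2

x, x_prime, m = torch.tensor(1.0), torch.tensor(0.0), 50

grads = []
for k in range(1, m + 1):
    # interpolated point x' + (k/m)(x - x'), tracked for autograd
    point = (x_prime + (k / m) * (x - x_prime)).requires_grad_(True)
    F(point).backward()
    grads.append(point.grad)

(x - x_prime) * sum(grads) / m
>> tensor(1.0200)   # close to F(x) - F(x') = 1, up to the Riemann-sum error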

Still don't get it?

Little visualisation

Can we code it?

\begin{equation} IG^{a}_{i}(x)=(x_{i}-x'_{i}) \sum_{k=1}^{m}\frac{\partial F(x' + \frac{k}{m} (x - x'))}{\partial x_{i}} \frac{1}{m} \end{equation}

Let's generate the "path"

\begin{equation} \left\{ x' + \frac{k}{m} (x - x') \right\}_{k=1}^{m} \end{equation}

baseline = torch.zeros_like(input)  # all-zeros ("black image") baseline

steps = 200

# interpolated points on the straight line from baseline to input, k = 1..m
scaled_inputs = [baseline + (k / steps) * (input - baseline)
                 for k in range(1, steps + 1)]

Calculate gradients

\begin{equation} \sum_{k=1}^{m}\frac{\partial F(x' + \frac{k}{m} (x - x'))}{\partial x_{i}} \end{equation}

gradients = []
for scaled_input in scaled_inputs:
    # take the gradient w.r.t. the interpolated input itself
    scaled_input = scaled_input.detach().requires_grad_(True)

    output = F.softmax(model(scaled_input), dim=1)
    output = output[0, pred_label_idx]  # score of the predicted class

    model.zero_grad()
    output.backward()

    gradients.append(scaled_input.grad.detach()[0])

Sum gradients

\begin{equation} IG^{a}_{i}(x)=(x_{i}-x'_{i}) \sum_{k=1}^{m}\frac{\partial F(x' + \frac{k}{m} (x - x'))}{\partial x_{i}} \frac{1}{m} \end{equation}

# average the gradients along the path, then scale by (input - baseline)
avg_grads = torch.stack(gradients).mean(dim=0)
attributions_ig = (input - baseline) * avg_grads

attributions_ig.size()
>> torch.Size([1, 3, 224, 224])
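As a rough sanity check (not in the slides), this result can be compared with the Captum output from the beginning, assuming that output was kept under a different name such as attributions_captum (a hypothetical rename to avoid the clash with the variable above); the values should be close but not identical, because Captum places the integration steps slightly differently.


# attributions_captum = the earlier Captum result (hypothetical rename)
(attributions_ig - attributions_captum).abs().max()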

How does it look in practice?

Attribution visualisation

Open questions:

How to choose a good baseline?

Do we need to bother with adding "zero" gradients?

How do we know the explanation is "good"?

Thanks

"There's no such thing as a stupid question!"

Piotr Mazurek
Presentation available at: https://tugot17.github.io/Integrated-Gradients-Presentation/