Adversarial Attacks on Deep Learning Models

A demonstration of adversarial attacks on CNN-based image classification models

This project uses automatic differentiation to explain model classifications and to create adversarial examples. I first explored how gradients can be used to show which portion of the input the model relied on for its classification. I then implemented Grad-CAM from scratch.
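
A minimal sketch of the gradient-based explanation described above, not the project's actual code. It assumes a small, randomly initialized CNN as a hypothetical stand-in for the real classifier, and the helper name `saliency_map` is illustrative. The idea is to backpropagate the top-class score to the input pixels and treat the gradient magnitude as a rough importance map.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in classifier (assumption); the project itself used ResNet18.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).eval()


def saliency_map(model, image):
    """Return |d(top-class score) / d(pixel)|, reduced over colour channels."""
    # Track gradients with respect to the input, not the weights.
    image = image.clone().detach().requires_grad_(True)
    logits = model(image)
    top_class = logits.argmax(dim=1)
    # Sum of each sample's top-class logit, so one backward pass covers the batch.
    score = logits.gather(1, top_class.unsqueeze(1)).sum()
    score.backward()
    # Max over channels gives one importance value per pixel: shape (N, H, W).
    return image.grad.abs().amax(dim=1)


image = torch.rand(1, 3, 32, 32)  # random placeholder image, values in [0, 1]
sal = saliency_map(model, image)
```

Larger values in `sal` mark pixels whose small changes would most affect the predicted class score.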

I then explored two basic adversarial methods that perturb the input images so that ResNet18 predicts a different class.
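
The text above does not name the two methods, so the sketch below is an assumption: it shows the Fast Gradient Sign Method (FGSM), one commonly used basic attack, on a hypothetical stand-in model rather than ResNet18. The function name `fgsm_attack` and the `epsilon` value are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-in classifier (assumption); the project itself used ResNet18.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).eval()


def fgsm_attack(model, image, label, epsilon=0.03):
    """One-step FGSM: move every pixel by epsilon in the direction that raises the loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    # Gradient of the loss with respect to the input pixels.
    grad = torch.autograd.grad(loss, image)[0]
    adv = image + epsilon * grad.sign()
    # Keep the result a valid image with pixel values in [0, 1].
    return adv.clamp(0.0, 1.0).detach()


image = torch.rand(1, 3, 32, 32)      # random placeholder image
label = model(image).argmax(dim=1)    # attack the model's current prediction
adv_image = fgsm_attack(model, image, label)
```

The perturbation is bounded by `epsilon` per pixel, so `adv_image` stays visually close to `image`. Whether the predicted class actually changes depends on the model and on `epsilon`.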

Key Features:

  • Used gradients to highlight important image regions.
  • Implemented Grad-CAM from scratch.
  • Implemented adversarial methods to create misleading inputs.
  • Analyzed model behavior and robustness.
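
As a rough illustration of the Grad-CAM feature listed above, the sketch below weights the last convolutional feature maps by their spatially averaged gradients. It is not the project's implementation: `TinyCNN` is a hypothetical stand-in that returns its feature maps directly, whereas with ResNet18 the activations would more likely be captured from the final convolutional stage with a hook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class TinyCNN(nn.Module):
    """Hypothetical stand-in for ResNet18 (assumption) that also returns its feature maps."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        fmap = self.features(x)                    # (N, 16, H/2, W/2)
        logits = self.head(fmap.mean(dim=(2, 3)))  # global average pool, then classify
        return logits, fmap


def grad_cam(model, image):
    """Grad-CAM heatmap for the predicted class, scaled to [0, 1]."""
    logits, fmap = model(image)
    top_class = logits.argmax(dim=1)
    score = logits.gather(1, top_class.unsqueeze(1)).sum()
    # Gradient of the class score with respect to the feature maps.
    grads = torch.autograd.grad(score, fmap)[0]
    # Channel weights: gradients averaged over the spatial dimensions.
    weights = grads.mean(dim=(2, 3), keepdim=True)
    # Weighted sum of feature maps; ReLU keeps only positive evidence.
    cam = F.relu((weights * fmap).sum(dim=1, keepdim=True))
    # Upsample to the input resolution and normalize each map by its maximum.
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    cam = cam / (cam.amax(dim=(2, 3), keepdim=True) + 1e-8)
    return cam.squeeze(1).detach()


model = TinyCNN().eval()
image = torch.rand(1, 3, 32, 32)  # random placeholder image
cam = grad_cam(model, image)
```

The resulting `cam` can be overlaid on the input image as a heatmap to show which regions supported the predicted class.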

Tools and Technologies:

  • Python
  • PyTorch

Source Code

The complete source code for this project is available on GitHub.