Adversarial attacks on text classification models

Deep Learning for Text with PyTorch

Shubham Jain

Instructor

What are adversarial attacks?

  • Small, deliberate tweaks to input data
  • Not random noise, but calculated malicious changes
  • Can drastically affect the AI's decision-making

Importance of robustness

  • AI systems deciding if user comments are toxic or benign
  • AI unintentionally amplifying negative stereotypes from biased data
  • AI giving misleading information

Fast Gradient Sign Method (FGSM)

  • Exploits the gradient information the model learned during training
  • Makes the smallest possible change that deceives the model (sketched below)

FGSM Attack
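To make the idea concrete, here is a minimal FGSM sketch in PyTorch. It assumes a classifier that takes a batch of continuous embedding vectors directly (for text, attacks are applied at the embedding level because tokens are discrete); the toy model, epsilon value, and input sizes are illustrative, not part of the course code.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, embeddings, labels, epsilon=0.01):
    """One FGSM step: nudge each embedding by epsilon in the direction
    (the sign of the gradient) that increases the loss the most."""
    embeddings = embeddings.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(embeddings), labels)
    loss.backward()
    # Tiny, worst-case perturbation that can flip the prediction
    return (embeddings + epsilon * embeddings.grad.sign()).detach()

# Toy usage: a linear classifier over 64-dimensional embeddings
model = torch.nn.Sequential(torch.nn.Linear(64, 2))
x, y = torch.randn(4, 64), torch.randint(0, 2, (4,))
x_adv = fgsm_attack(model, x, y, epsilon=0.05)
```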


Projected Gradient Descent (PGD)

  • More advanced than FGSM: it applies gradient-sign steps iteratively
  • Searches for the most effective disturbance within a fixed perturbation budget (see the sketch below)

PGD Attack
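A minimal PGD sketch, building on the same embedding-level setup as the FGSM example above; the step size, budget, and iteration count are illustrative.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, embeddings, labels, epsilon=0.05, alpha=0.01, steps=10):
    """Iterative FGSM: repeatedly take small gradient-sign steps (alpha),
    then project the total perturbation back into an epsilon-ball
    around the original embeddings."""
    original = embeddings.clone().detach()
    perturbed = original.clone()
    for _ in range(steps):
        perturbed = perturbed.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(perturbed), labels)
        loss.backward()
        with torch.no_grad():
            stepped = perturbed + alpha * perturbed.grad.sign()
            # Projection step: keep the disturbance within the epsilon budget
            perturbed = original + torch.clamp(stepped - original, -epsilon, epsilon)
    return perturbed.detach()
```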


The Carlini & Wagner (C&W) attack

  • Frames the attack as an optimization problem over the loss function
  • Not just about deceiving the model, but about keeping the perturbation nearly undetectable (sketched below)

C&W Attack
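A simplified, untargeted C&W-style sketch: an optimizer searches for the smallest perturbation whose margin loss forces a wrong prediction. The trade-off constant c, learning rate, and step count are illustrative, and the full C&W attack includes details (such as a change of variables and a binary search over c) that are omitted here.

```python
import torch

def cw_attack(model, embeddings, labels, c=1.0, lr=0.01, steps=100):
    """C&W-style optimization: jointly minimize the perturbation size and a
    margin term that stays positive while the model still predicts the true label."""
    delta = torch.zeros_like(embeddings, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        logits = model(embeddings + delta)
        true_logit = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
        # Best score among the wrong classes
        wrong_logit = logits.scatter(1, labels.unsqueeze(1), float("-inf")).max(dim=1).values
        margin = torch.clamp(true_logit - wrong_logit, min=0)
        # Small perturbation (hard to detect) plus pressure to misclassify
        loss = (delta ** 2).sum() + c * margin.sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (embeddings + delta).detach()
```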


Building defenses: strategies

  • Model Ensembling:
    • Combine predictions from multiple models so a single fooled model does not decide the outcome (see the sketch after this list)

  • Robust Data Augmentation:
    • Train on varied and deliberately perturbed versions of the data

  • Adversarial Training:
    • Anticipate deception by training on adversarial examples

Model Ensembling
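As an illustration of the ensembling idea, here is a minimal sketch that averages class probabilities from several independently trained models; the toy models and input sizes are placeholders.

```python
import torch
import torch.nn as nn

def ensemble_predict(models, embeddings):
    """Average class probabilities across several independently trained models.
    A perturbation crafted against one model is less likely to fool all of them."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(embeddings), dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)

# Toy usage: three small classifiers over 64-dimensional embeddings
models = [nn.Sequential(nn.Linear(64, 2)) for _ in range(3)]
predictions = ensemble_predict(models, torch.randn(4, 64))
```

Adversarial training, in turn, would reuse an ordinary training loop but mix perturbed copies of each batch (for example, produced with the FGSM sketch above) in with the clean examples.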


Building defenses: tools & techniques

  • Adversarial Robustness Toolbox (ART):
    • Ready-made attacks and defenses for strengthening text models built with PyTorch (usage sketched below)

  • Gradient Masking:
    • Obscure or perturb the model's gradient information so attackers cannot exploit it

  • Regularization Techniques:
    • Keep the model balanced with penalties such as weight decay and dropout (example below)

Adversarial Robustness Toolbox
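A minimal sketch of the Adversarial Robustness Toolbox (ART) workflow, assuming the library is installed via `pip install adversarial-robustness-toolbox`; the toy linear model and random data stand in for a trained text classifier operating on embedding vectors.

```python
import numpy as np
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

# Stand-in model and data: a linear classifier over 64-dimensional embeddings
model = nn.Sequential(nn.Linear(64, 2))
x_test = np.random.randn(8, 64).astype(np.float32)

# Wrap the PyTorch model so ART's attacks and defenses can operate on it
classifier = PyTorchClassifier(
    model=model,
    loss=nn.CrossEntropyLoss(),
    input_shape=(64,),
    nb_classes=2,
)

# Craft FGSM adversarial examples and inspect how predictions shift
attack = FastGradientMethod(estimator=classifier, eps=0.05)
x_adv = attack.generate(x=x_test)
preds = np.argmax(classifier.predict(x_adv), axis=1)
print("Predictions on adversarial inputs:", preds)
```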

Gradient Masking

Regularization Techniques
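As one concrete way to apply standard regularization in PyTorch, the sketch below combines dropout with weight decay; the layer sizes and hyperparameters are illustrative only.

```python
import torch.nn as nn
import torch.optim as optim

# Dropout discourages over-reliance on any single feature, and weight decay
# penalizes large weights; both make the decision boundary less brittle
# against small, adversarial input changes.
model = nn.Sequential(
    nn.Linear(64, 32),
    nn.ReLU(),
    nn.Dropout(p=0.3),   # randomly zeroes 30% of activations during training
    nn.Linear(32, 2),
)
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```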

1 https://adversarial-robustness-toolbox.readthedocs.io/en/latest/

Let's practice!

