Adversarial Examples


Adversarial Examples. Presentation by Ian Goodfellow, Deep Learning Summer School, Montreal, August 9, 2015

In this presentation:
- Intriguing Properties of Neural Networks. Szegedy et al., ICLR 2014.
- Explaining and Harnessing Adversarial Examples. Goodfellow et al., ICLR 2015.
- Distributional Smoothing by Virtual Adversarial Examples. Miyato et al., arXiv 2015.

Universal engineering machine (model-based optimization): make new inventions by finding the input that maximizes the model's predicted performance. (figure: model predictions over the training-data region versus the extrapolation region)

Deep neural networks are as good as humans at...
- ...recognizing objects and faces (Szegedy et al., 2014; Taigman et al., 2013)
- ...solving CAPTCHAs and reading addresses (Goodfellow et al., 2013)
- ...and other tasks...

Do neural networks understand these tasks?
- John Searle's Chinese Room thought experiment: 你好嗎? ("How are you?") -> 我很好, 你呢? ("I'm fine, and you?")
- What happens for a sentence not in the instruction book? 您貴姓大名? ("May I ask your name?") ->

Turning objects into airplanes

Attacking a linear model

Clever Hans ("Clever Hans, Clever Algorithms", Bob Sturm)

Adversarial examples from overfitting (figure: 2-D scatter of o and x training points)

Adversarial examples from underfitting (figure: 2-D scatter of o and x training points)

Different kinds of low capacity:
- Linear model: overconfident when extrapolating
- RBF: no opinion in most places

Modern deep nets are very (piecewise) linear: rectified linear unit, maxout, carefully tuned sigmoid, LSTM.

A thin manifold of accuracy (figure: the argument to the softmax along a cross-section of input space)

Not every class change is a mistake (figure panels: clean example, perturbation, corrupted example)
- Perturbation changes the true class
- Random perturbation does not change the class
- Perturbation changes the input to a rubbish class
All three perturbations have L2 norm 3.96. This is actually small; we typically use 7!

Guaranteeing that class changes are mistakes (figure panels: clean example, perturbation, corrupted example)
Each pixel has some precision below which changes can't be measured, so if we constrain the max norm of the perturbation, we can guarantee that we do not change the true class. To make adversarial examples for other kinds of data, we need to define different invariant constraint regions.
- The max norm constraint blocks the previous attempt at changing the true class
- Random perturbations still do not change the class
- The max norm constraint blocks the change to the rubbish class
All three perturbations have L2 norm 3.96, the same as on the previous slide. They now have max norm 0.14142. This is accomplished by taking 0.14142 * sign(perturbation) and choosing the sign randomly for pixels that were not perturbed before.
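For reference, the relationship between the two norms quoted above follows from the input size. Assuming 28x28 = 784 MNIST pixels (consistent with the figures in the talk, but not stated on this slide), a perturbation that moves every pixel by epsilon = 0.14142 has

```latex
\|\eta\|_\infty = \varepsilon = 0.14142, \qquad
\|\eta\|_2 = \varepsilon\sqrt{d} = 0.14142 \times \sqrt{784} = 0.14142 \times 28 \approx 3.96.
```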

The Fast Gradient Sign Method
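A minimal sketch of the method named on this slide, as defined in Goodfellow et al. (2014): x_adv = x + epsilon * sign(grad_x J(theta, x, y)). The PyTorch code below is illustrative only; the model, loss, and epsilon value are assumptions, not part of the talk.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon):
    """One fast-gradient-sign step: move each input feature by +/- epsilon
    in the direction that increases the training loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    # Keep pixels in the valid range (assumes inputs scaled to [0, 1]).
    return x_adv.clamp(0.0, 1.0).detach()
```

Because every feature moves by exactly epsilon, the perturbation satisfies the max norm constraint from the previous slides while its L2 norm grows with the square root of the input dimension.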

Linear adversarial examples

High-dimensional linear models (figure panels: weights, clean examples, adversarial examples, signs of weights)

Higher-dimensional linear models (Andrej Karpathy, "Breaking Linear Classifiers on ImageNet")
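The argument on the two preceding slides can be reproduced directly: for a linear score w·x, adding epsilon·sign(w) shifts the score by epsilon·||w||_1, which grows with the input dimension even though no single feature moves by more than epsilon. A small NumPy sketch (dimensions and epsilon chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
epsilon = 0.14142  # per-feature max-norm budget, as on the earlier MNIST slides

for d in (10, 784, 10_000):            # input dimensionality
    w = rng.normal(size=d)              # weights of a linear model
    x = rng.normal(size=d)              # a clean input
    x_adv = x + epsilon * np.sign(w)    # perturbation aligned with the weight signs
    shift = w @ x_adv - w @ x           # change in the pre-activation
    print(f"d={d:6d}  score shift = {shift:9.2f}"
          f"  (= epsilon * ||w||_1 = {epsilon * np.abs(w).sum():9.2f})")
```

The printed shift scales roughly linearly with d, which is why a tiny max-norm perturbation can swing the output of a high-dimensional linear model.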

RBFs behave more intuitively far from the data

Easy to optimize = easy to perturb. Do we need to move past gradient-based optimization to overcome adversarial examples?

Ubiquitous hallucinations

Methods based on expensive search, strong hand-designed priors (Nguyen et al., 2015; Olah, 2015)

Cross-model, cross-dataset generalization

Cross-model, cross-dataset generalization
- Neural net -> nearest neighbor: 25.3% error rate
- Smoothed nearest neighbor -> nearest neighbor: 47.2% error rate (a non-differentiable model doesn't provide much protection, it just requires the attacker to work indirectly)
- Adversarially trained neural net -> nearest neighbor: 22.15% error rate (adversarially trained neural net -> self: 18% error rate)
- Maxout net -> ReLU net: 99.4% error rate, agree on the wrong class 85% of the time
- Maxout net -> tanh net: 99.3% error rate
- Maxout net -> softmax regression: 88.9% error rate, agree on the wrong class 67% of the time
- Maxout net -> shallow RBF: 36.8% error rate, agree on the class 43% of the time

Adversarial examples in the human visual system (circles are concentric but appear intertwined) (Pinna and Gregory, 2002)

Failed defenses
- Defenses that fail due to cross-model transfer:
  - Ensembles
  - Voting after multiple saccades
- Other failed defenses:
  - Noise resistance
  - Generative modeling / unsupervised pretraining
  - Denoising the input with an autoencoder (Gu and Rigazio, 2014)
- Defenses that solve the adversarial task only if they break the clean task performance:
  - Weight decay (L1 or L2)
  - Jacobian regularization (like double backprop)
  - Deep RBF network

Limiting total variation (weight constraints): usually underfits before it solves the adversarial example problem.
Limiting sensitivity to infinitesimal perturbation (double backprop, CAE):
- Very hard to make the derivative close to 0
- Only provides a constraint very near the training examples, so it does not solve adversarial examples.
Limiting sensitivity to finite perturbation (adversarial training):
- Easy to fit because the slope is not constrained
- Constrains the function over a wide area
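A sketch of adversarial training as described in Goodfellow et al. (2014), which mixes the clean loss with the loss on FGSM-perturbed inputs: J~(theta, x, y) = alpha * J(theta, x, y) + (1 - alpha) * J(theta, x + epsilon * sign(grad_x J), y). The code reuses the hypothetical fgsm helper sketched earlier; alpha = 0.5 and epsilon are illustrative defaults.

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.14142, alpha=0.5):
    """One update on a weighted mixture of the clean loss and the FGSM loss."""
    x_adv = fgsm(model, x, y, epsilon)   # finite perturbation, recomputed every step
    optimizer.zero_grad()                # clear gradients left over from the fgsm call
    clean_loss = F.cross_entropy(model(x), y)
    adv_loss = F.cross_entropy(model(x_adv), y)
    loss = alpha * clean_loss + (1 - alpha) * adv_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

Unlike the infinitesimal penalties above, the perturbed point sits a finite distance from x, so the constraint is enforced over a wide region rather than only at the training example.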

Generative modeling cannot solve the problem. These two class-mixture models implement the same marginal over x with totally different posteriors over the classes. The likelihood criterion can't prefer one to the other, and in many cases will prefer the bad one.
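An illustrative construction of this point (not from the slides): two class-mixture models over a one-dimensional input with identical marginals but completely different posteriors.

```latex
% Model A: each class owns one Gaussian component.
p_A(x \mid y{=}1) = \mathcal{N}(x; -1, 1), \qquad
p_A(x \mid y{=}2) = \mathcal{N}(x; +1, 1), \qquad
p_A(y{=}1) = p_A(y{=}2) = \tfrac{1}{2}.

% Model B: both classes share the whole mixture.
p_B(x \mid y) = \tfrac{1}{2}\,\mathcal{N}(x; -1, 1) + \tfrac{1}{2}\,\mathcal{N}(x; +1, 1)
\quad \text{for } y \in \{1, 2\}.

% Both models have the same marginal p(x), so the likelihood of x cannot
% prefer one to the other, yet the posteriors disagree everywhere:
p_A(y{=}1 \mid x) = \sigma(-2x), \qquad p_B(y{=}1 \mid x) = \tfrac{1}{2}.
```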

Security implications
- Must consider the existence of adversarial examples when deciding whether to use machine learning
- Attackers can shut down a system that detects and refuses to process adversarial examples
- Attackers can control the output of a naive system
- Attacks can resemble regular data, or can appear to be unstructured noise, or can be structured but unusual
- The attacker does not need access to your model, parameters, or training set

Universal approximator theorem: neural nets can represent either function. Maximum likelihood doesn't cause them to learn the right function. But we can fix that...

Training on adversarial examples 0.0782% error on MNIST

Weaknesses persist

More weaknesses

Perturbation's effect on class distributions

Perturbation's effect after adversarial training

Virtual adversarial training
- Penalize the full KL divergence between predictions on the clean and adversarial point
- Does not need y
- Semi-supervised learning
- MNIST results: 0.64% test error (statistically tied with the state of the art)
- 100 labeled examples: VAE -> 3.33% error, Virtual Adversarial -> 2.12%, Ladder network -> 1.13%
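A condensed sketch of the virtual adversarial penalty following Miyato et al. (2015): find the small perturbation the model's predictive distribution is most sensitive to (approximated here with a single power-iteration step) and penalize the KL divergence between the clean and perturbed predictions. No label y is needed, so it applies to unlabeled data. The xi, epsilon, and iteration count are illustrative defaults, not values from the talk.

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, xi=1e-6, epsilon=0.14142, n_power=1):
    """KL(p(y|x) || p(y|x + r_vadv)) for an approximately worst-case r_vadv."""
    with torch.no_grad():
        p_clean = F.softmax(model(x), dim=1)
    d = torch.randn_like(x)                      # random initial direction
    for _ in range(n_power):                     # power iteration on the KL's sensitivity
        d = xi * F.normalize(d.flatten(1), dim=1).view_as(x)
        d.requires_grad_(True)
        kl = F.kl_div(F.log_softmax(model(x + d), dim=1), p_clean,
                      reduction="batchmean")
        d = torch.autograd.grad(kl, d)[0]
    r_vadv = epsilon * F.normalize(d.flatten(1), dim=1).view_as(x)
    return F.kl_div(F.log_softmax(model(x + r_vadv), dim=1), p_clean,
                    reduction="batchmean")
```

The penalty is added to the usual supervised loss on labeled batches and used on its own for unlabeled batches.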

Clearing up common misconceptions
- Inputs that the model processes incorrectly are ubiquitous, not rare, and occur most often in half-spaces rather than pockets
- Adversarial examples are not specific to deep learning
- Deep learning is uniquely able to overcome adversarial examples, due to the universal approximator theorem
- An attacker does not need access to a model or its training set
- Common off-the-shelf regularization techniques like model averaging and unsupervised learning do not automatically solve the problem

Please use evidence, not speculation
- It's common to say that obviously some technique will fix adversarial examples, and then just assume it will work without testing it
- It's common to say in the introduction to a new paper on regularizing neural nets that the regularization research is justified because of adversarial examples
- Usually this is wrong
- Please actually test your method on adversarial examples and report the results
- Consider doing this even if you're not primarily concerned with adversarial examples

Recommended adversarial example benchmark
- Fix epsilon
- Compute the error rate on test data perturbed by the fast gradient sign method
- Report the error rate, epsilon, and the version of the model used for both forward and back-prop
- Alternative variant: design your own fixed-size perturbation scheme and report the error rate and size. For example, rotation by some angle.
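A sketch of this benchmark protocol, reusing the hypothetical fgsm helper from earlier; the epsilon = 0.25 default is a placeholder to be reported alongside the result, not a value prescribed by the talk.

```python
import torch

def fgsm_benchmark(model, test_loader, epsilon=0.25):
    """Error rate on test data perturbed by one fast gradient sign step at fixed epsilon."""
    model.eval()
    errors, total = 0, 0
    for x, y in test_loader:
        x_adv = fgsm(model, x, y, epsilon)      # same model used for forward and back-prop
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        errors += (pred != y).sum().item()
        total += y.numel()
    error_rate = errors / total
    print(f"epsilon = {epsilon}, FGSM test error = {error_rate:.2%}")
    return error_rate
```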

Alternative adversarial example benchmark
- Use L-BFGS or another optimizer
- Search for the minimum-size misclassified perturbation
- Report the average size
- Report exhaustive detail to make the optimizer reproducible
- Downsides: computational cost, difficulty of reproducing, hard to guarantee the perturbations will really be mistakes

Recommended fooling image / rubbish class benchmark
- Fix epsilon
- Fit a Gaussian to the training inputs
- Draw samples from the Gaussian
- Perturb them toward a specific positive class with the fast gradient sign method
- Report the rate at which you achieved this positive class
- Report the rate at which the model believed any specific non-rubbish class was present (probability of that class being present exceeds 0.5)
- Report epsilon
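A sketch of this benchmark under the stated protocol: fit a diagonal Gaussian to the training inputs (a simplifying assumption), sample from it, push the samples toward a chosen positive class with one fast gradient sign step, and report how often the model is fooled. Function and parameter names are hypothetical.

```python
import torch
import torch.nn.functional as F

def rubbish_class_benchmark(model, train_x, target_class, epsilon=0.25, n_samples=1000):
    """Fooling-image benchmark: Gaussian samples perturbed toward a specific class."""
    model.eval()
    # Fit an independent (diagonal) Gaussian to the training inputs and sample from it.
    mu, sigma = train_x.mean(dim=0), train_x.std(dim=0)
    samples = mu + sigma * torch.randn(n_samples, *mu.shape)
    # One fast gradient sign step *toward* the target class: descend that class's loss.
    samples.requires_grad_(True)
    y_target = torch.full((n_samples,), target_class, dtype=torch.long)
    loss = F.cross_entropy(model(samples), y_target)
    grad = torch.autograd.grad(loss, samples)[0]
    fooled = (samples - epsilon * grad.sign()).detach()
    with torch.no_grad():
        probs = F.softmax(model(fooled), dim=1)
    target_rate = (probs.argmax(dim=1) == target_class).float().mean().item()
    confident_rate = (probs.max(dim=1).values > 0.5).float().mean().item()
    print(f"epsilon = {epsilon}, target-class rate = {target_rate:.2%}, "
          f"any class > 0.5 rate = {confident_rate:.2%}")
```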

Conclusion
- Many modern ML algorithms:
  - get the right answer for the wrong reason on naturally occurring inputs
  - get very wrong answers on almost all inputs
- Adversarial training can expand the very narrow manifold of accuracy
- Deep learning is on course to overcome adversarial examples, but maybe only by expanding our instruction book
- Measure your model's error rate on fast gradient sign method adversarial examples and report it!