Blind Attacks on Machine Learners

Blind Attacks on Machine Learners
Alex Beatson, Zhaoran Wang, Han Liu (Princeton University)
NIPS 2016. Presenter: Anant Kharkar

Outline
1. Introduction: Motivation
2. Bounds: Minimax Problem, Scenarios
3. Results: Informed Learner, Blind Learner
4. Summary

Motivation
- Context: data-injection attacks, in which adversarial data is added to the existing distribution.
- Past work assumes the attacker has knowledge of the learner's algorithm (or can query it).
- Here, a blind attacker is considered, against both informed and blind learners.
- Statistical privacy: users may want to protect their data by adding noise.
- Objective: the adversary makes it difficult to estimate the distribution's parameters.

Notation
- Distribution of interest: $F_\theta$ with density $f_\theta$, from family $\mathcal{F}$; genuine data $X_i$.
- Malicious distribution: $G_\phi$ with density $g_\phi$, from family $\mathcal{G}$; injected data $\tilde{X}_i$.
- Combined dataset $Z$, with distribution $P$ and density
$$p(z) = \alpha f_\theta(z) + (1 - \alpha)\, g_\phi(z)$$
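As a concrete illustration (not from the slides), a minimal numpy sketch of drawing the combined dataset $Z$, using a Gaussian $F_\theta$ and a uniform $G_\phi$ purely as hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_contaminated(n, alpha, theta=0.0, lo=-5.0, hi=5.0):
    """Draw Z_1:n from p(z) = alpha*f_theta(z) + (1-alpha)*g_phi(z),
    with f_theta = N(theta, 1) and g_phi = Uniform(lo, hi) as stand-ins."""
    from_f = rng.random(n) < alpha                   # genuine with probability alpha
    return np.where(from_f,
                    rng.normal(theta, 1.0, size=n),  # genuine draws from F_theta
                    rng.uniform(lo, hi, size=n))     # injected draws from G_phi

Z = sample_contaminated(1000, alpha=0.8)
print(Z.std())  # spread inflated by the injected uniform mass
```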

Minimax
- Minimax risk: worst-case bound on the population risk of an estimator:
$$M_n = \inf_{\hat\psi} \sup_{\psi \in \Psi} \mathbb{E}_{Z_{1:n} \sim P_\psi^n}\, L(\psi, \hat\psi_n)$$
- Intuitively: the minimum worst-case risk, i.e. the smallest achievable worst-case expected loss (e.g., squared $\ell_2$ error).
- KL divergence: measures the deviation between two distributions.
- Mutual information $I(Z; V)$: measures the dependence between random variables.
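As a quick aside (mine, not the slides'), the KL divergence used throughout is straightforward to compute for discrete distributions; a minimal sketch:

```python
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q) for discrete distributions given as probability vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                       # convention: 0 * log(0/q) = 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p, q = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]
print(kl_divergence(p, q), kl_divergence(q, p))  # asymmetric; both 0 iff p == q
```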

Bounds
- Le Cam:
$$M_n \geq \frac{L(\psi_1, \psi_2)}{2}\left[\frac{1}{2} - \sqrt{\frac{n\, D_{\mathrm{KL}}(P_{\phi_1} \,\|\, P_{\phi_2})}{2}}\right]$$
- Fano:
$$M_n \geq \delta\left[1 - \frac{I(Z_{1:n}; V) + \log 2}{\log |V|}\right]$$
- $I(Z; V)$ is upper-bounded by pairwise $D_{\mathrm{KL}}$ terms.
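Both bounds are easy to evaluate numerically. A hedged sketch, using the constants as reconstructed above (statements of Le Cam's lemma vary in their constants):

```python
import numpy as np

def le_cam_bound(loss_sep, n, d_kl):
    """M_n >= (loss_sep/2) * [1/2 - sqrt(n * d_kl / 2)], clipped at 0."""
    return max(0.0, 0.5 * loss_sep * (0.5 - np.sqrt(0.5 * n * d_kl)))

def fano_bound(delta, mutual_info, num_hypotheses):
    """M_n >= delta * [1 - (I(Z_1:n; V) + log 2) / log |V|], clipped at 0."""
    return max(0.0, delta * (1.0 - (mutual_info + np.log(2.0)) / np.log(num_hypotheses)))

print(le_cam_bound(loss_sep=1.0, n=100, d_kl=1e-3))               # larger when n*KL is small
print(fano_bound(delta=0.1, mutual_info=0.5, num_hypotheses=16))  # larger when I is small
```

Both lower bounds grow as the KL/mutual-information terms shrink, which is exactly the lever the attacker pulls in the following slides.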

Blind Attacker, Informed Learner
- Attacker knows $\mathcal{F}$ but not $F_\theta$; learner knows $G_\phi$.
- Objective: maximize $M_n$ by choice of $G_\phi$:
$$\phi^* = \operatorname*{argmax}_\phi M_n = \operatorname*{argmax}_\phi\, \inf_{\hat\psi} \sup_{\psi \in \Psi} \mathbb{E}_{Z_{1:n} \sim P_\psi^n}\, L(\psi, \hat\psi_n)$$
- Surrogate: minimize the averaged pairwise KL divergence, which upper-bounds the mutual information,
$$\hat\phi = \operatorname*{argmin}_\phi \frac{1}{|V|^2} \sum_{\theta_i \in V} \sum_{\theta_j \in V} D_{\mathrm{KL}}\!\left(P_{\theta_i,\phi} \,\|\, P_{\theta_j,\phi}\right), \qquad \frac{1}{|V|^2} \sum_{i,j} D_{\mathrm{KL}}\!\left(P_{\theta_i,\phi} \,\|\, P_{\theta_j,\phi}\right) \geq \frac{1}{n}\, I(Z^n; \theta)$$
- Driving the KL terms down drives $I(Z_{1:n}; \theta)$ down, which raises the Fano lower bound.
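To make the surrogate concrete, a toy sketch (all modeling choices here are mine: Gaussian $F_\theta$ and $G_\phi$, a finite grid over $\phi$, numerical KL on a discretized line):

```python
import numpy as np

grid = np.linspace(-6.0, 6.0, 1201)
dz = grid[1] - grid[0]

def gauss(mu):
    return np.exp(-0.5 * (grid - mu) ** 2) / np.sqrt(2.0 * np.pi)

def kl(p, q):
    return float(np.sum(p * np.log(p / q)) * dz)   # numerical D_KL on the grid

def pairwise_kl(phi, thetas, alpha):
    """Sum over (theta_i, theta_j) of D_KL(P_{theta_i,phi} || P_{theta_j,phi})."""
    g = gauss(phi)                                  # toy Gaussian attack density
    ps = [alpha * gauss(th) + (1.0 - alpha) * g for th in thetas]
    return sum(kl(pi, pj) for pi in ps for pj in ps)

thetas = [-1.0, 1.0]
phis = np.linspace(-3.0, 3.0, 61)
best_phi = min(phis, key=lambda phi: pairwise_kl(phi, thetas, alpha=0.7))
print(best_phi)  # the attack parameter that makes the hypotheses hardest to tell apart
```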

Blind Attacker, Blind Learner
- Learner does not know $G_\phi$, but knows the family $\mathcal{G}$.
$$\hat{G} = \operatorname*{argmax}_{G \in \mathcal{G}}\, \inf_{\hat\theta} \sup_{(F_\theta, G_\phi) \in \mathcal{F} \times \mathcal{G}} \mathbb{E}_{Z_{1:n}}\, L(\theta, \hat\theta)$$
- Surrogate, as before: minimize the averaged pairwise KL divergence, now over hypothesis pairs $(\theta_i, \phi_i)$,
$$\hat{G} = \operatorname*{argmin}_{G \in \mathcal{G}} \frac{1}{|V|^2} \sum_{(\theta_i, \phi_i) \in V}\, \sum_{(\theta_j, \phi_j) \in V} D_{\mathrm{KL}}\!\left(P_{\theta_i,\phi_i} \,\|\, P_{\theta_j,\phi_j}\right)$$
- This average again upper-bounds $\frac{1}{n}\, I(Z^n; \theta)$.
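The blind-learner objective differs only in that each hypothesis carries its own attack parameter; extending the same toy setup (again my stand-ins, not the paper's):

```python
import numpy as np

grid = np.linspace(-6.0, 6.0, 1201)
dz = grid[1] - grid[0]

def gauss(mu):
    return np.exp(-0.5 * (grid - mu) ** 2) / np.sqrt(2.0 * np.pi)

def kl(p, q):
    return float(np.sum(p * np.log(p / q)) * dz)

def blind_objective(pairs, alpha):
    """Sum of pairwise KLs over hypotheses (theta_i, phi_i), where
    P_{theta,phi} = alpha*F_theta + (1-alpha)*G_phi."""
    ps = [alpha * gauss(th) + (1.0 - alpha) * gauss(ph) for th, ph in pairs]
    return sum(kl(ps[i], ps[j])
               for i in range(len(ps)) for j in range(len(ps)) if i != j)

# Pair each theta with a phi that mimics the other theta: the objective
# collapses as alpha -> 1/2, previewing the mimic-attack result below.
for alpha in (0.9, 0.7, 0.5):
    print(alpha, blind_objective([(-1.0, 1.0), (1.0, -1.0)], alpha))
```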

Informed Learner
- Uniform attack (KL bound):
$$D_{\mathrm{KL}}(P_i \,\|\, P_j) + D_{\mathrm{KL}}(P_j \,\|\, P_i) \;\leq\; \frac{\alpha^2}{(1 - \alpha)}\, \|F_i - F_j\|_{\mathrm{TV}}^2\, \mathrm{Vol}(\mathcal{Z})$$
- Le Cam bound:
$$M_n \;\geq\; \frac{L(\theta_1, \theta_2)}{2}\left(\frac{1}{2} - \sqrt{\frac{\alpha^2}{2(1 - \alpha)}\, n\, \|F_1 - F_2\|_{\mathrm{TV}}^2\, \mathrm{Vol}(\mathcal{Z})}\right)$$
- Fano bound:
$$M_n \;\geq\; \delta\left(1 - \frac{\frac{\alpha^2}{(1 - \alpha)}\, \mathrm{Vol}(\mathcal{Z})\, n\, \tau_\delta + \log 2}{\log |V|}\right)$$
- The uniform attack bounds the effective sample size at $n\, \frac{\alpha^2}{(1 - \alpha)}\, \mathrm{Vol}(\mathcal{Z})$.
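Plugging illustrative numbers into the reconstructed bounds (the exact constants should be checked against the paper; this only shows the scaling):

```python
import numpy as np

def uniform_attack_le_cam(loss_sep, n, alpha, tv_sq, vol):
    """M_n >= (loss_sep/2)*[1/2 - sqrt(alpha^2/(2(1-alpha)) * n * tv_sq * vol)]."""
    rate = alpha ** 2 / (2.0 * (1.0 - alpha)) * n * tv_sq * vol
    return max(0.0, 0.5 * loss_sep * (0.5 - np.sqrt(rate)))

def effective_sample_size(n, alpha, vol):
    """n * alpha^2 * Vol(Z) / (1 - alpha), as stated on the slide."""
    return n * alpha ** 2 * vol / (1.0 - alpha)

for alpha in (0.9, 0.7, 0.5):   # smaller alpha = more injected data
    print(alpha,
          effective_sample_size(1000, alpha, vol=1.0),
          uniform_attack_le_cam(1.0, 1000, alpha, tv_sq=1e-4, vol=1.0))
```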

Blind Learner
- For $\alpha \leq \frac{1}{2}$, the attacker can make learning impossible (the pairwise KL divergences sum to 0).
- Mimic attack ($G_\phi = F_{\theta'}$):
$$D_{\mathrm{KL}}(P_i \,\|\, P_j) + D_{\mathrm{KL}}(P_j \,\|\, P_i) \;\leq\; \frac{(2\alpha - 1)^2}{(1 - \alpha)}\, \|F_i - F_j\|_{\mathrm{TV}}^2 \;\leq\; \frac{4\alpha^4}{1 - \alpha}\, \|F_1 - F_2\|_{\mathrm{TV}}^2$$
- The KL divergence $\to 0$ as $\alpha \to \frac{1}{2}$, driving the minimax lower bounds up.
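The mimic attack's effect is easy to verify directly: with Gaussian stand-ins for the two hypotheses (my choice, for illustration), the two contaminated densities coincide exactly at $\alpha = 1/2$:

```python
import numpy as np

grid = np.linspace(-6.0, 6.0, 1201)

def gauss(mu):
    return np.exp(-0.5 * (grid - mu) ** 2) / np.sqrt(2.0 * np.pi)

def contaminated(theta, phi, alpha):
    # p(z) = alpha * f_theta(z) + (1 - alpha) * g_phi(z)
    return alpha * gauss(theta) + (1.0 - alpha) * gauss(phi)

# Mimic attack: inject data drawn from the *other* hypothesis.
for alpha in (0.9, 0.7, 0.5):
    p1 = contaminated(theta=-1.0, phi=+1.0, alpha=alpha)  # truth -1, attack mimics +1
    p2 = contaminated(theta=+1.0, phi=-1.0, alpha=alpha)  # truth +1, attack mimics -1
    print(alpha, float(np.max(np.abs(p1 - p2))))  # -> 0 at alpha = 1/2: indistinguishable
```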

Summary
- Injection attacks against machine learners.
- 2 cases: blind learner and informed learner (the attacker is always blind).
- 2 attacks: uniform injection and mimic.
- The attacker maximizes lower bounds on the minimax risk.