Adversarial Label Flips Attack on Support Vector Machines

1 Adversarial Label Flips Attack on Support Vector Machines
Han Xiao, Huang Xiao, Claudia Eckert
Institute of Informatics, TU München
August 26, 2012

2 Overview
1. Causative Attack
2. Problem Formulation
3. Label Flip Attack on SVMs
4. Experiment
5. Summary
H. Xiao (TUM), Attack on SVM, August 26, 2012

3 Learner meets evil teachers
What if haters dominate? Are they going to subvert the learning algorithm? And to what extent?

5 Motivation
Problem: How can adversarial label noise be introduced into the training data so that the classification algorithm reaches its maximal error rate?
Motivation:
- Crowd-labeled training data contains adversarial noise.
- Previous work assumes that labels are missing at random or follow some distribution.
- Little is known about adversarial label noise.
- The goal is to improve the robustness of the learner.

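7 Notations
- Input space: $\mathcal{X} \subseteq \mathbb{R}^{D}$
- Response space: $\mathcal{Y} := \{-1, 1\}$
- Instance: $x \in \mathcal{X}$ is a $D$-dimensional vector
- Hypothesis space: $\mathcal{H}$
- Classification hypothesis: $f \in \mathcal{H}$, $f : \mathcal{X} \to \mathbb{R}$
- Negative set (benign): $\mathcal{X}^{-} := \{x \in \mathcal{X} \mid \operatorname{sign}(f(x)) = -1\}$
- Positive set (malicious): $\mathcal{X}^{+} := \{x \in \mathcal{X} \mid \operatorname{sign}(f(x)) = +1\}$
- Loss function: $V : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_{0+}$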

11 Classification Algorithm: Tikhonov Regularization Problem
Problem: Given a training set $S := \{(x_i, y_i) \mid x_i \in \mathcal{X},\ y_i \in \mathcal{Y}\}_{i=1}^{n}$, find the classifier $f_S \in \mathcal{H}$ that performs best on some test set $T$.
Solving the Tikhonov regularization problem:
$$f_S := \arg\min_{f} \; \gamma \sum_{i=1}^{n} V(y_i, f(x_i)) + \|f\|_{\mathcal{H}}^{2},$$
where $\gamma \in \mathbb{R}_{0+}$ is a fixed parameter that quantifies the trade-off.
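As a rough illustration of the regularized objective above, the sketch below evaluates it for one candidate classifier. It assumes a linear hypothesis f(x) = w·x + b, the hinge loss as V, and the squared norm of w standing in for the norm of f in H; the names `hinge_loss` and `tikhonov_objective` and the toy data are hypothetical and not taken from the slides.

```python
def hinge_loss(y, score):
    """Hinge loss V(y, f(x)) = max(0, 1 - y * f(x)) for y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)


def tikhonov_objective(w, b, X, y, gamma):
    """Evaluate gamma * sum_i V(y_i, f(x_i)) + ||w||^2 for f(x) = w.x + b.

    Assumption: ||w||^2 plays the role of the RKHS norm of f (linear kernel).
    """
    scores = [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]
    data_term = sum(hinge_loss(yi, s) for yi, s in zip(y, scores))
    reg_term = sum(wj * wj for wj in w)
    return gamma * data_term + reg_term


# Toy data (illustrative only): three points on a line.
X = [[2.0, 0.0], [0.5, 0.0], [-1.0, 0.0]]
y = [1, 1, -1]
objective = tikhonov_objective(w=[1.0, 0.0], b=0.0, X=X, y=y, gamma=2.0)
print(objective)
```

Only the second point lies inside the margin, so it is the only one that contributes to the loss term; a larger gamma would penalize that violation more heavily relative to the norm of w.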

12 Label Flips Attack
Given a training set, the adversary contaminates the training data by flipping labels.
Adversarial Label Flip Attack: Find a combination of label flips under a given budget so that a classifier trained on such data has maximal classification error on some test data.

13 A Bilevel Formulation
- Training set: $S := \{(x_i, y_i) \mid x_i \in \mathcal{X},\ y_i \in \mathcal{Y}\}_{i=1}^{n}$
- Indicator: $z_i \in \{0,1\}$, $i = 1, \dots, n$
- Tainted label: $y'_i := y_i(1 - 2z_i)$, so that if $z_i = 1$ then $y'_i = -y_i$ (i.e. flipped), otherwise $y'_i = y_i$
- Tainted training set: $S' := \{(x_i, y'_i)\}_{i=1}^{n}$
- Flipping cost: $c_i \in \mathbb{R}_{0+}$
Finding the optimal label flips: given $S$, a test set $T$ and a budget $C$, solve
$$\max_{z} \sum_{(x,y) \in T} V(y, f_{S'}(x))$$
$$\text{s.t.} \quad f_{S'} \in \arg\min_{f} \; \gamma \sum_{i=1}^{n} V(y'_i, f(x_i)) + \|f\|_{\mathcal{H}}^{2}, \quad \sum_{i=1}^{n} c_i z_i \le C, \quad z_i \in \{0,1\}.$$
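The flip indicator and budget constraint can be sketched in a few lines. This is only an illustration of the tainted-label rule and the cost constraint; the function names `taint_labels`, `flip_cost`, and `is_feasible` are hypothetical.

```python
def taint_labels(y, z):
    """Apply the tainted-label rule y'_i = y_i * (1 - 2 * z_i)."""
    return [yi * (1 - 2 * zi) for yi, zi in zip(y, z)]


def flip_cost(z, c):
    """Total cost sum_i c_i * z_i of the flips encoded by z."""
    return sum(ci * zi for ci, zi in zip(c, z))


def is_feasible(z, c, budget):
    """Check z_i in {0, 1} and the budget constraint sum_i c_i * z_i <= C."""
    return all(zi in (0, 1) for zi in z) and flip_cost(z, c) <= budget


y = [1, -1, 1, -1]
z = [0, 1, 1, 0]          # flip the second and third labels
c = [1.0, 1.0, 1.0, 1.0]  # unit cost per flip (illustrative)
tainted = taint_labels(y, z)
print(tainted, flip_cost(z, c), is_feasible(z, c, budget=2.0))
```

With unit costs the budget is simply the maximum number of labels the adversary may flip.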

17 Relaxing the Problem
Difficulties:
- Even a linear bilevel problem can be hard.
- Exhaustive search over all combinations is prohibitive.
Idea: Combine the two conflicting objective functions into one.
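To give a sense of why exhaustive search is prohibitive, the sketch below counts the candidate flip vectors under unit flipping costs, where a budget of C allows at most C flips. The helper name `count_flip_sets` is hypothetical; the second call uses 100 training points and a budget of 20 flips, the sizes of the synthetic experiment in this talk.

```python
from math import comb


def count_flip_sets(n, budget):
    """Number of vectors z in {0,1}^n with at most `budget` ones (unit costs)."""
    return sum(comb(n, k) for k in range(min(n, budget) + 1))


small = count_flip_sets(4, 2)      # 1 + 4 + 6 = 11
large = count_flip_sets(100, 20)   # already far beyond brute force
print(small, large)
```

Each candidate would require retraining the classifier, so even this modest setting rules out enumeration.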

18 A Relaxed Formulation
Define an auxiliary function: let $A$ and $B$ be two sets of labeled instances, and define
$$g(B, f_A) := \gamma \sum_{(x,y) \in B} V(y, f_A(x)) + \|f_A\|_{\mathcal{H}}^{2},$$
where $f_A$ is trained on $A$ and the loss is measured on $B$.
Finding the near-optimal label flips:
$$\min_{z} \; g(S', f_{S'}) - g(S', f_S) \quad \text{s.t.} \quad \sum_{i=1}^{n} c_i z_i \le C, \quad z_i \in \{0,1\}.$$

19 A Relaxed Formulation
Refine the objective function:
1. Construct a new set $U := \{(x_i, y_i)\}_{i=1}^{2n}$ as follows:
   $(x_i, y_i) \in S$ for $i = 1, \dots, n$,
   $x_i := x_{i-n}$ and $y_i := -y_{i-n}$ for $i = n+1, \dots, 2n$.
2. Introduce $q_i \in \{0,1\}$, $i = 1, \dots, 2n$, for each element in $U$, where $q_i = 1$ if $(x_i, y_i) \in S'$.
3. Replacing $S'$ by $U$ in $g(S', f_{S'}) - g(S', f_S)$, we obtain
$$\min_{q, f} \; \gamma \sum_{i=1}^{2n} q_i \left[ V(y_i, f(x_i)) - V(y_i, f_S(x_i)) \right] + \|f\|_{\mathcal{H}}^{2}$$
$$\text{s.t.} \quad \sum_{i=n+1}^{2n} c_i q_i \le C, \quad q_i + q_{i+n} = 1, \; i = 1, \dots, n, \quad q_i \in \{0,1\}, \; i = 1, \dots, 2n.$$
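The construction of the doubled set and of the selection variables can be sketched as follows. The function names are hypothetical, and the mapping from a flip vector z to q (keep the original copy when z_i = 0, pick the flipped copy when z_i = 1) is how the constraint q_i + q_{i+n} = 1 is read here.

```python
def build_doubled_set(X, y):
    """Build U: the first n entries copy S, the last n repeat x_i with label -y_i."""
    X_u = list(X) + list(X)
    y_u = list(y) + [-yi for yi in y]
    return X_u, y_u


def selection_from_flips(z):
    """Map flips z to q: q_i = 1 - z_i (original copy), q_{i+n} = z_i (flipped copy)."""
    return [1 - zi for zi in z] + list(z)


X = [[0.0], [1.0], [2.0]]
y = [1, -1, 1]
z = [0, 1, 0]  # flip only the second label

X_u, y_u = build_doubled_set(X, y)
q = selection_from_flips(z)
n = len(y)

# The selected entries of U are exactly the tainted training set S'.
selected_labels = [y_u[i] if q[i] == 1 else y_u[i + n] for i in range(n)]
print(y_u, q, selected_labels)
```

Working with q over the fixed set U turns the choice of flips into a selection problem, which is what makes the later linear-programming step possible.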

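22 Label Flip Attack on SVMs
The SVM can be formulated as a Tikhonov regularization problem:
$$\min_{w, \xi, b} \; \gamma \sum_{i=1}^{n} \xi_i + \frac{1}{2}\|w\|^{2} \quad \text{s.t.} \quad y_i(w^{\top} x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n,$$
where $\xi_i$ is the hinge loss of $(x_i, y_i)$ resulting from $f_S$.
Denote by $\epsilon_i$ the hinge loss of $(x_i, y_i)$ resulting from the tainted classifier $f_{S'}$:
$$\min_{q, w, \epsilon, b} \; \gamma \sum_{i=1}^{2n} q_i(\epsilon_i - \xi_i) + \frac{1}{2}\|w\|^{2}$$
$$\text{s.t.} \quad y_i(w^{\top} x_i + b) \ge 1 - \epsilon_i, \quad \epsilon_i \ge 0, \quad i = 1, \dots, 2n,$$
$$\sum_{i=n+1}^{2n} c_i q_i \le C, \quad q_i + q_{i+n} = 1, \; i = 1, \dots, n, \quad q_i \in \{0,1\}, \; i = 1, \dots, 2n.$$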
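A small sketch may help make the two sets of slack variables concrete. It assumes a linear kernel, evaluates the hinge losses of a clean classifier and of a candidate tainted classifier on the doubled set, and then computes the attack objective for one choice of q. The weights, the data, and the function names are illustrative assumptions, not fitted values.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hinge_slacks(w, b, X, y):
    """Slack of each (x_i, y_i): max(0, 1 - y_i * (w.x_i + b))."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]


def attack_objective(q, eps, xi, w, gamma):
    """gamma * sum_i q_i * (eps_i - xi_i) + 0.5 * ||w||^2 over the doubled set."""
    loss_gap = sum(qi * (ei - si) for qi, ei, si in zip(q, eps, xi))
    return gamma * loss_gap + 0.5 * dot(w, w)


# Doubled set for S = {([2], +1), ([-2], -1)}: originals first, flipped copies last.
X_u = [[2.0], [-2.0], [2.0], [-2.0]]
y_u = [1, -1, -1, 1]

xi = hinge_slacks([1.0], 0.0, X_u, y_u)    # losses under the clean classifier f_S
eps = hinge_slacks([0.5], 0.0, X_u, y_u)   # losses under a candidate tainted classifier
q = [1, 0, 0, 1]                            # keep point 1, use the flipped copy of point 2
value = attack_objective(q, eps, xi, w=[0.5], gamma=1.0)
print(xi, eps, value)
```

A negative loss gap means the selected labels are fitted better by the tainted classifier than by the clean one, which is the situation the attack looks for.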

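24 Label Flip Attack on SVMs
Alternately solving a QP and an LP:
$$\min_{w, \epsilon, b} \; \gamma \sum_{i=1}^{2n} q_i \epsilon_i + \frac{1}{2}\|w\|^{2} \quad \text{s.t.} \quad y_i(w^{\top} x_i + b) \ge 1 - \epsilon_i, \quad \epsilon_i \ge 0, \quad i = 1, \dots, 2n. \tag{1}$$
$$\min_{q} \; \gamma \sum_{i=1}^{2n} q_i(\epsilon_i - \xi_i) \quad \text{s.t.} \quad \sum_{i=n+1}^{2n} c_i q_i \le C, \quad q_i + q_{i+n} = 1, \; i = 1, \dots, n, \quad 0 \le q_i \le 1, \; i = 1, \dots, 2n. \tag{2}$$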
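The LP step (2) can be sketched without a solver. Substituting q_i = 1 - q_{i+n} leaves one variable per training point, and the problem becomes a fractional knapsack that a greedy pass solves exactly. This is a sketch of that reduction rather than the authors' implementation: the function name `solve_flip_lp` is hypothetical, `cost[i]` is assumed to hold the cost of selecting the flipped copy i + n, and the QP step (1), which needs a weighted SVM solver, is not shown.

```python
def solve_flip_lp(eps, xi, cost, budget):
    """Solve LP (2) for q, given losses eps and xi on the doubled set (length 2n)."""
    n = len(eps) // 2
    # gain[i]: decrease of the objective when point i is fully flipped.
    gain = [(eps[i] - xi[i]) - (eps[i + n] - xi[i + n]) for i in range(n)]
    flip = [0.0] * n
    remaining = budget
    # Zero-cost flips first, then the best gain per unit of cost.
    order = sorted(
        (i for i in range(n) if gain[i] > 0),
        key=lambda i: float("inf") if cost[i] == 0 else gain[i] / cost[i],
        reverse=True,
    )
    for i in order:
        if cost[i] == 0:
            flip[i] = 1.0
        elif remaining > 0:
            flip[i] = min(1.0, remaining / cost[i])
            remaining -= flip[i] * cost[i]
    # q for the original copies, followed by q for the flipped copies.
    return [1.0 - f for f in flip] + flip


eps = [0.0, 0.5, 0.0, 2.0, 1.0, 3.5]   # illustrative losses under the tainted classifier
xi = [0.0, 0.0, 0.0, 3.0, 3.0, 3.0]    # illustrative losses under the clean classifier
q_one = solve_flip_lp(eps, xi, cost=[1.0, 1.0, 1.0], budget=1.0)
q_frac = solve_flip_lp(eps, xi, cost=[1.0, 1.0, 1.0], budget=1.5)
print(q_one, q_frac)
```

With unit costs and an integer budget the greedy solution is integral, so q can be read directly as a set of flips. In an alternating scheme, this step would be followed by re-solving (1) with the new q, repeating until the objective stops decreasing.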

25 Experiment Design
- Adversarial cost $c_i := 1$ for all labels.
- Original training and test sets are balanced.
- Train SVM (LIBSVM) on the tainted training set.
- Worst performance is % error rate (i.e. random guess).
Baselines:
- Random: introduces label noise from a non-adversarial perspective.
- Nearest: a thoughtless labeler fails to distinguish instances on the border.
- Furthest: a malicious labeler deliberately gives wrong labels.
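One possible reading of the three baselines is sketched below. It assumes that "nearest" and "furthest" refer to the distance from the decision boundary of the classifier trained on clean data, with the absolute decision value |f_S(x_i)| used as a proxy for that distance. The function names and the scores are hypothetical.

```python
import random


def random_flips(n, k, seed=0):
    """Pick k of n labels uniformly at random (non-adversarial noise)."""
    return sorted(random.Random(seed).sample(range(n), k))


def nearest_flips(scores, k):
    """Pick the k instances with the smallest |f_S(x)| (closest to the border)."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    return sorted(ranked[:k])


def furthest_flips(scores, k):
    """Pick the k instances with the largest |f_S(x)| (farthest from the border)."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
    return sorted(ranked[:k])


scores = [0.1, -2.0, 0.5, -0.3, 1.5]  # illustrative decision values f_S(x_i)
near = nearest_flips(scores, 2)
far = furthest_flips(scores, 2)
rand = random_flips(len(scores), 2)
print(near, far, rand)
```

Each function returns the indices of the labels to flip, so all baselines spend the same budget under unit costs and differ only in which labels they choose.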

27 Results on Synthetic Data
Train: 100, flip: 20, test: 800. Panels: (a) Synthetic data, (b) No flips, (c) Random, (d) Nearest, (e) Furthest, (f) ALFA.

Pattern / classifier  No flips  Random  Nearest  Furthest  ALFA
Linear pattern
  Linear SVM          1.8%      1.9%    6.9%     9.5%      21.8%
  RBF SVM             3.2%      4.0%    3.5%     26.5%     32.4%
Parabolic pattern
  Linear SVM          23.5%     28.8%   29.2%    [?].5%    48.0%
  RBF SVM             5.1%      9.4%    10.1%    12.9%     [?].8%

28 Results on 10 Data Sets
Train: 200, flip: 1,...,60, test: 800.
[Figure: one panel per data set (a9a, acoustic, connect-4, covtype, dna, gisette, ijcnn1, letter, seismic, satimage), comparing Rand, Nearest, Furthest, and ALFA for linear and RBF kernels.]

29 Required Cost for % Error Rate
[Table: the numeric entries did not survive extraction. Columns: Rand., Near., Furt., ALFA, repeated in three groups. Rows, listed once for the SVM with linear kernel and once for the SVM with RBF kernel: a9a, acoustic, connect, covtype, dna, gisette, ijcnn, letter, seismic, satimage.]

30 Summary
1. A framework for the adversarial label flips attack.
2. More aggressive than random noise and other baselines.
3. Also effective on the robust label-noise SVM (Battista B. et al., ACML 11).
4. Can be extended to regression and active learning scenarios.


Machine Learning. Ensemble Methods. Manfred Huber Machine Learning Ensemble Methods Manfred Huber 2015 1 Bias, Variance, Noise Classification errors have different sources Choice of hypothesis space and algorithm Training set Noise in the data The expected

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction

More information

Graph-Based Anomaly Detection with Soft Harmonic Functions

Graph-Based Anomaly Detection with Soft Harmonic Functions Graph-Based Anomaly Detection with Soft Harmonic Functions Michal Valko Advisor: Milos Hauskrecht Computer Science Department, University of Pittsburgh, Computer Science Day 2011, March 18 th, 2011. Anomaly

More information

CSC 411 Lecture 17: Support Vector Machine

CSC 411 Lecture 17: Support Vector Machine CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17

More information

Support Vector Machine for Classification and Regression

Support Vector Machine for Classification and Regression Support Vector Machine for Classification and Regression Ahlame Douzal AMA-LIG, Université Joseph Fourier Master 2R - MOSIG (2013) November 25, 2013 Loss function, Separating Hyperplanes, Canonical Hyperplan

More information

Machine Learning And Applications: Supervised Learning-SVM

Machine Learning And Applications: Supervised Learning-SVM Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine

More information

Announcements. Proposals graded

Announcements. Proposals graded Announcements Proposals graded Kevin Jamieson 2018 1 Bayesian Methods Machine Learning CSE546 Kevin Jamieson University of Washington November 1, 2018 2018 Kevin Jamieson 2 MLE Recap - coin flips Data:

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Reading: Ben-Hur & Weston, A User s Guide to Support Vector Machines (linked from class web page) Notation Assume a binary classification problem. Instances are represented by vector

More information

Machine Learning. Support Vector Machines. Manfred Huber

Machine Learning. Support Vector Machines. Manfred Huber Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data

More information

Advanced Introduction to Machine Learning CMU-10715

Advanced Introduction to Machine Learning CMU-10715 Advanced Introduction to Machine Learning CMU-10715 Risk Minimization Barnabás Póczos What have we seen so far? Several classification & regression algorithms seem to work fine on training datasets: Linear

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Ryan M. Rifkin Google, Inc. 2008 Plan Regularization derivation of SVMs Geometric derivation of SVMs Optimality, Duality and Large Scale SVMs The Regularization Setting (Again)

More information

SVM TRADE-OFF BETWEEN MAXIMIZE THE MARGIN AND MINIMIZE THE VARIABLES USED FOR REGRESSION

SVM TRADE-OFF BETWEEN MAXIMIZE THE MARGIN AND MINIMIZE THE VARIABLES USED FOR REGRESSION International Journal of Pure and Applied Mathematics Volume 87 No. 6 2013, 741-750 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu doi: http://dx.doi.org/10.12732/ijpam.v87i6.2

More information

Support Vector Machines for Classification: A Statistical Portrait

Support Vector Machines for Classification: A Statistical Portrait Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,

More information

Stat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable.

Stat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable. Linear SVM (separable case) First consider the scenario where the two classes of points are separable. It s desirable to have the width (called margin) between the two dashed lines to be large, i.e., have

More information

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels

SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels SVMs: Non-Separable Data, Convex Surrogate Loss, Multi-Class Classification, Kernels Karl Stratos June 21, 2018 1 / 33 Tangent: Some Loose Ends in Logistic Regression Polynomial feature expansion in logistic

More information

Training algorithms for fuzzy support vector machines with nois

Training algorithms for fuzzy support vector machines with nois Training algorithms for fuzzy support vector machines with noisy data Presented by Josh Hoak Chun-fu Lin 1 Sheng-de Wang 1 1 National Taiwan University 13 April 2010 Prelude Problem: SVMs are particularly

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Support Vector Machine (SVM) Hamid R. Rabiee Hadi Asheri, Jafar Muhammadi, Nima Pourdamghani Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Introduction

More information

Machine Learning and Data Mining. Support Vector Machines. Kalev Kask

Machine Learning and Data Mining. Support Vector Machines. Kalev Kask Machine Learning and Data Mining Support Vector Machines Kalev Kask Linear classifiers Which decision boundary is better? Both have zero training error (perfect training accuracy) But, one of them seems

More information

Low Bias Bagged Support Vector Machines

Low Bias Bagged Support Vector Machines Low Bias Bagged Support Vector Machines Giorgio Valentini Dipartimento di Scienze dell Informazione Università degli Studi di Milano, Italy valentini@dsi.unimi.it Thomas G. Dietterich Department of Computer

More information

CIS 520: Machine Learning Oct 09, Kernel Methods

CIS 520: Machine Learning Oct 09, Kernel Methods CIS 520: Machine Learning Oct 09, 207 Kernel Methods Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture They may or may not cover all the material discussed

More information

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang

Machine Learning Basics Lecture 2: Linear Classification. Princeton University COS 495 Instructor: Yingyu Liang Machine Learning Basics Lecture 2: Linear Classification Princeton University COS 495 Instructor: Yingyu Liang Review: machine learning basics Math formulation Given training data x i, y i : 1 i n i.i.d.

More information

Support Vector Machines (SVMs).

Support Vector Machines (SVMs). Support Vector Machines (SVMs). SemiSupervised Learning. SemiSupervised SVMs. MariaFlorina Balcan 3/25/215 Support Vector Machines (SVMs). One of the most theoretically well motivated and practically most

More information

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM

Support Vector Machines (SVM) in bioinformatics. Day 1: Introduction to SVM 1 Support Vector Machines (SVM) in bioinformatics Day 1: Introduction to SVM Jean-Philippe Vert Bioinformatics Center, Kyoto University, Japan Jean-Philippe.Vert@mines.org Human Genome Center, University

More information

Naïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 18 Oct. 31, 2018

Naïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 18 Oct. 31, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Naïve Bayes Matt Gormley Lecture 18 Oct. 31, 2018 1 Reminders Homework 6: PAC Learning

More information

Machine Learning. Lecture 9: Learning Theory. Feng Li.

Machine Learning. Lecture 9: Learning Theory. Feng Li. Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell

More information

4.1 Online Convex Optimization

4.1 Online Convex Optimization CS/CNS/EE 53: Advanced Topics in Machine Learning Topic: Online Convex Optimization and Online SVM Lecturer: Daniel Golovin Scribe: Xiaodi Hou Date: Jan 3, 4. Online Convex Optimization Definition 4..

More information

Machine Learning 4771

Machine Learning 4771 Machine Learning 477 Instructor: Tony Jebara Topic 5 Generalization Guarantees VC-Dimension Nearest Neighbor Classification (infinite VC dimension) Structural Risk Minimization Support Vector Machines

More information

Learning SVM Classifiers with Indefinite Kernels

Learning SVM Classifiers with Indefinite Kernels Learning SVM Classifiers with Indefinite Kernels Suicheng Gu and Yuhong Guo Dept. of Computer and Information Sciences Temple University Support Vector Machines (SVMs) (Kernel) SVMs are widely used in

More information

Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers

Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers Erin Allwein, Robert Schapire and Yoram Singer Journal of Machine Learning Research, 1:113-141, 000 CSE 54: Seminar on Learning

More information

SVMs, Duality and the Kernel Trick

SVMs, Duality and the Kernel Trick SVMs, Duality and the Kernel Trick Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 26 th, 2007 2005-2007 Carlos Guestrin 1 SVMs reminder 2005-2007 Carlos Guestrin 2 Today

More information

Surrogate regret bounds for generalized classification performance metrics

Surrogate regret bounds for generalized classification performance metrics Surrogate regret bounds for generalized classification performance metrics Wojciech Kotłowski Krzysztof Dembczyński Poznań University of Technology PL-SIGML, Częstochowa, 14.04.2016 1 / 36 Motivation 2

More information

Statistical and Computational Learning Theory

Statistical and Computational Learning Theory Statistical and Computational Learning Theory Fundamental Question: Predict Error Rates Given: Find: The space H of hypotheses The number and distribution of the training examples S The complexity of the

More information

Introduction to Signal Detection and Classification. Phani Chavali

Introduction to Signal Detection and Classification. Phani Chavali Introduction to Signal Detection and Classification Phani Chavali Outline Detection Problem Performance Measures Receiver Operating Characteristics (ROC) F-Test - Test Linear Discriminant Analysis (LDA)

More information