A Magic CV Theory for Large-Margin Classifiers


1 A Magic CV Theory for Large-Margin Classifiers Hui Zou School of Statistics, University of Minnesota June 30, 2018 Joint work with Boxiang Wang

2 Outline 1 Background 2 Magic CV formula 3 Magic support vector machines 4 Magic CV applications in kernel learning theory 5 Numerical studies

3 Binary classification Observations: a collection of i.i.d. training data (x_1, y_1), (x_2, y_2), ..., (x_n, y_n). Predictors (covariates, input vector): x_i = (x_{i1}, ..., x_{ip}). Response (class label, output variable): y_i ∈ {−1, +1}. Build a model f̂ using the training data. Given any new input x, we predict the class label ŷ = f̂(x). 2/37

4 Classification toolbox: linear discriminant analysis, logistic regression, kernel density classifier, naïve Bayes classifier, neural network, boosting ensembles, random forest, support vector machine (SVM), ... Experiment: Fernández et al. (2014, JMLR) compared 179 commonly used classifiers on 121 UCI data sets. Conclusion: the best classifiers are random forest, kernel SVM, neural networks, and boosting ensembles. 3/37

5 Support vector machine The non-separable case: the distance y_i(ω_0 + x_i^T ω) may be negative. Introduce slack variables η_i ≥ 0 and redefine the distance d_i as
d_i = y_i(ω_0 + x_i^T ω) + η_i, such that d_i ≥ 0 for all i.
[Figure: two classes (y = +1 and y = −1) separated by a hyperplane with normal vector ω in the (x_1, x_2) plane; a point x_i on the wrong side is shown with its slack η_i.] 4/37

6 Support vector machine SVM (Vapnik, 1995):
argmax_{ω_0, ω} min_i d_i
subject to d_i = y_i(ω_0 + x_i^T ω) + η_i ≥ 0 for all i, η_i ≥ 0 for all i, ω^T ω = 1, and Σ_i η_i ≤ t.
The tuning parameter t controls the extent of the slack variables. 5/37

7 Computing SVM in the dual space Lagrange dual function:
max_α L_D = max_α [ Σ_{i=1}^n α_i − (1/2) Σ_{i=1}^n Σ_{i'=1}^n α_i α_{i'} y_i y_{i'} ⟨x_i, x_{i'}⟩ ]
subject to Σ_{i=1}^n α_i y_i = 0 and 0 ≤ α_i ≤ γ for all i.
The solution has the form
f̂(x) = β̂_0 + Σ_{i=1}^n α̂_i y_i ⟨x_i, x⟩.
The coefficients α̂_i are nonzero only for the observations that are the support vectors. 6/37
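As a quick illustration of how the dual solution is used for prediction, here is a minimal Python sketch; the variable names are illustrative, and the dual coefficients and intercept are assumed to come from whatever solver was used.

```python
import numpy as np

def svm_decision(X_train, y_train, alpha_hat, beta0_hat, x_new):
    """f(x) = beta0 + sum_i alpha_i * y_i * <x_i, x>.

    Only the support vectors (observations with alpha_i > 0) contribute to the sum.
    """
    return beta0_hat + np.sum(alpha_hat * y_train * (X_train @ x_new))
```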

8 Kernel trick in the dual space Lagrange dual and solution with a kernel function:
max_α L_D = max_α [ Σ_{i=1}^n α_i − (1/2) Σ_{i=1}^n Σ_{i'=1}^n α_i α_{i'} y_i y_{i'} K(x_i, x_{i'}) ]
subject to Σ_{i=1}^n α_i y_i = 0 and 0 ≤ α_i ≤ γ for all i.
f̂(x) = β̂_0 + Σ_{i=1}^n α̂_i y_i K(x_i, x).
Gaussian kernel: K(x, x') = exp(−σ ‖x − x'‖²_2). 7/37
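A minimal sketch of the Gaussian kernel matrix in the slide's parameterization (function and variable names are illustrative):

```python
import numpy as np

def gaussian_kernel(X, Z, sigma):
    """Pairwise kernel matrix with entries exp(-sigma * ||x_i - z_j||^2)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Z**2, axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-sigma * np.maximum(sq, 0.0))  # clip tiny negative values from round-off
```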

9 State-of-the-art SVM solvers: the interior point method (example: R package kernlab) and sequential minimal optimization (Platt, 1999; Osuna et al., 1997; Keerthi et al., 2001; Fan et al., 2005), e.g., the LIBSVM library and R package e1071. 8/37

10 Tuning the SVM The kernel SVM has high prediction accuracy. The generalization error of SVM depends on the choice of the tuning parameter. Two tasks: 1 Model comparison/selection: e.g., choose the tuning parameter for a procedure. 2 Model assessment: estimate the generalization error for the final model. Cross-validation: perhaps the simplest and the most widely used tool. 9/37

11 Cross-validation

12 V-fold cross-validation
[Figure: the data are split into V folds; in turn, each fold serves as the validation set while the remaining folds are used for training.]
Cross-validation error: CV(λ) = V^{-1} Σ_{v=1}^V L(Y_v, f̂^{[-v]}_λ(X_v)).
Tuning parameter: λ̂ = argmin_λ CV(λ).
Generalization error: Err(f̂_λ) ≈ CV(λ).
Leave-one-out CV (LOOCV): V = n. Ten-fold CV: V = 10. 10/37
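For reference, a brute-force V-fold CV loop of the kind the magic formula is designed to avoid; `fit` and `loss` are placeholders for whatever classifier and loss are being tuned, and all names are illustrative.

```python
import numpy as np

def v_fold_cv(X, y, lam_grid, fit, loss, V=10, seed=0):
    """Brute-force V-fold CV: refit the model on each training split for each lambda.

    fit(X_tr, y_tr, lam) must return a predict(X) function; loss(y, yhat) a scalar.
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % V            # fold assignment tau(i)
    cv = np.zeros(len(lam_grid))
    for j, lam in enumerate(lam_grid):
        for v in range(V):
            tr, va = folds != v, folds == v
            predict = fit(X[tr], y[tr], lam)
            cv[j] += loss(y[va], predict(X[va])) / V
    return cv                                      # tune with lam_grid[np.argmin(cv)]
```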

13 LOOCV or ten-fold CV? LOOCV is an almost unbiased estimator of the true generalization error, i.e., it has small bias. LOOCV is deterministic, while ten-fold CV is random across different training/validation splits. LOOCV is computationally expensive because it requires n model fits. LOOCV is often claimed to have larger variance than ten-fold CV. The last claim is quite popular, but it is not generally true. 11/37

14 [Figure: mean and variance of the classification error versus log(lambda) for LOOCV, 10-fold, 5-fold, and 2-fold CV, each compared with the true error.]
1 LOOCV has almost no bias in estimating the generalization error.
2 The variance of LOOCV is no larger than that of other V-fold CV. 12/37

15 Magic CV Formula

16 LOOCV for regression Model: y = f(x) + ε. Estimate f using regularization:
f̂_λ = argmin_f [ (1/n) Σ_{i=1}^n (y_i − f(x_i))² + λ P(f) ].
Examples: ridge regression, f(x) = x^T β, P(f) = ‖β‖²_2; smoothing spline, P(f) = ∫ f''(u)² du.
LOOCV estimate: f̂^{[-v]}_λ = argmin_f (1/n) Σ_{i=1, i≠v}^n (y_i − f(x_i))² + λ P(f).
LOOCV error: LOOCV(λ) = (1/n) Σ_{i=1}^n (y_i − f̂^{[-i]}_λ(x_i))². 13/37

17 Magic leave-one-out lemma for regression (Craven and Wahba, 1979) Suppose that f̂_λ(x_i) = H_i y, where H_i is the ith row of the hat matrix H, and that the fit is self-stable. Then
LOOCV(λ) = (1/n) Σ_{i=1}^n (y_i − f̂^{[-i]}_λ(x_i))² = (1/n) Σ_{i=1}^n (y_i − f̂_λ(x_i))² / (1 − h_{ii})². 14/37
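For a concrete instance of the lemma, the sketch below (illustrative, not from the talk) checks the shortcut for ridge regression, whose hat matrix under the slide's (1/n)-scaled loss is H = X(XᵀX + nλI)⁻¹Xᵀ. The two functions should return the same value up to floating-point error.

```python
import numpy as np

def ridge_loocv_shortcut(X, y, lam):
    """LOOCV error via the hat matrix: (1/n) * sum_i ((y_i - yhat_i) / (1 - h_ii))**2."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T)
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)

def ridge_loocv_bruteforce(X, y, lam):
    """LOOCV error by refitting the ridge model n times."""
    n, p = X.shape
    errs = []
    for i in range(n):
        keep = np.arange(n) != i
        beta = np.linalg.solve(X[keep].T @ X[keep] + n * lam * np.eye(p),
                               X[keep].T @ y[keep])
        errs.append((y[i] - X[i] @ beta) ** 2)
    return np.mean(errs)

rng = np.random.default_rng(0)
X, y = rng.standard_normal((50, 5)), rng.standard_normal(50)
print(ridge_loocv_shortcut(X, y, 0.1), ridge_loocv_bruteforce(X, y, 0.1))
```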

18 Self-stability property [Figure: two panels plotting y against x, comparing the full-data fit f(x, y) with the leave-one-out fit f(x^{[-5]}, y^{[-5]}) and with the fit f(x, ỹ) obtained after replacing the held-out response by a pseudo response ỹ.] 15/37

19 The Question We must consider the computational cost of the SVM with CV. Can we compute the exact LOOCV of the SVM and related classifiers without repeating the algorithm n times? 16/37

20 Our contributions Propose a magic CV formula to compute the exact cross-validation error in the context of large-margin classification. Develop a magic SVM by designing a very efficient algorithm to solve the kernel SVM and by applying the magic CV formula to tune the parameters. Obtain theoretical bounds on the expectation and variance of the cross-validation error based on the CV formula. 17/37

21 Performance demo [Table: run time (seconds) and classification error (%), with standard errors in parentheses, for magicsvm, kernlab, and e1071 on eight benchmark data sets: arrhythmia (n = 452, p = 191), musk (n = 476, p = 166), australian (n = 690, p = 14), SAfrica (n = 462, p = 65), hepatitis (n = 112, p = 18), sonar (n = 208, p = 6), LSVT (n = 126, p = 309), and valley (n = 606, p = 100).] 18/37

22 Magic CV formula for SVM

23 Cross-validation estimates: A(X^{[-v]}, y^{[-v]}) = A(X, ỹ). [Figure: schematic contrasting the regression case with the large-margin classification case; deleting the fold-v observations is represented by replacing their responses with pseudo responses ỹ.] 19/37

24 Cross-validation estimates: A(X^{[-v]}, y^{[-v]}) = A(X, ỹ). [Figure: the same schematic with the pseudo responses ỹ filled in for both the regression and the large-margin classification cases.] 19/37

25 Kernel SVM in the primal space SVM:
argmax_{ω_0, ω} min_i d_i, subject to d_i = y_i(ω_0 + x_i^T ω) + η_i ≥ 0, η_i ≥ 0 for all i, ω^T ω = 1, Σ_i η_i ≤ t.
Equivalent loss + penalty form:
argmin_{β_0, β} [ (1/n) Σ_{i=1}^n [1 − y_i(β_0 + x_i^T β)]_+ + λ β^T β ].
[Figure: the SVM hinge loss as a function of the margin.] 20/37

26 Kernel SVM in the primal space
1 min_{f ∈ H_K} [ (1/n) Σ_{i=1}^n [1 − y_i f(x_i)]_+ + λ ‖f‖²_{H_K} ].
2 Mercer theorem: a kernel function K has an eigen-expansion, K(x, x') = Σ_{t=1}^∞ γ_t φ_t(x) φ_t(x').
3 The Hilbert space H_K is defined as the collection of functions f(x) = Σ_{t=1}^∞ θ_t φ_t(x), with the inner product defined as ⟨ Σ_{t=1}^∞ θ_t φ_t(x), Σ_{t'=1}^∞ δ_{t'} φ_{t'}(x) ⟩_{H_K} = Σ_{t=1}^∞ θ_t δ_t / γ_t. 21/37

27 4 The representer theorem (Kimeldorf and Wahba, 1971): the solution of
min_{f ∈ H_K} [ (1/n) Σ_{i=1}^n [1 − y_i f(x_i)]_+ + λ ‖f‖²_{H_K} ]
has the finite-dimensional form f̂(x) = Σ_{i=1}^n α̂_i K(x, x_i), so that f̂(x_i) = K_i^T α̂, where
α̂ = argmin_α [ (1/n) Σ_{i=1}^n [1 − y_i K_i^T α]_+ + λ α^T K α ],
with K the n × n kernel matrix and K_i its ith column. 22/37
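In this finite-dimensional form the objective and the fitted function are straightforward to evaluate; a minimal sketch with illustrative names:

```python
import numpy as np

def ksvm_objective(alpha, K, y, lam):
    """(1/n) * sum_i [1 - y_i * K_i' alpha]_+  +  lam * alpha' K alpha."""
    margins = y * (K @ alpha)                   # y_i * f(x_i), with f(x_i) = K_i' alpha
    return np.maximum(0.0, 1.0 - margins).mean() + lam * alpha @ K @ alpha

def ksvm_predict(alpha, K_new):
    """f(x_new_j) = sum_i alpha_i K(x_new_j, x_i), where K_new[j, i] = K(x_new_j, x_i)."""
    return K_new @ alpha
```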

28 Magic leave-one-out formula for SVM If we let ỹ_v = 0 and ỹ_i = y_i for i ≠ v, then we have
f̂^{[-v]}_λ = argmin_{f ∈ H_K} [ (1/n) Σ_{i=1}^n L(ỹ_i f(x_i)) + λ ‖f‖²_{H_K} ].
The formula holds because L(ỹ_v f(x_v)) = L(0) is a constant that does not depend on f, so zeroing the label of observation v and deleting it yield the same minimizer. 23/37
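Here is a small numerical check of the identity (an illustrative sketch, not the authors' code): deleting observation v and keeping it with label ỹ_v = 0 should produce the same held-out prediction. For differentiability the sketch uses the squared hinge loss; the argument only requires L(0) to be a constant, so it applies to the hinge as well. The kernel, λ, and sample size are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize

def fit(K_sub, y_sub, n, lam):
    """Minimize (1/n) * sum_i L(y_i * K_i' alpha) + lam * alpha' K alpha, L = squared hinge."""
    def obj(a):
        loss = np.maximum(0.0, 1.0 - y_sub * (K_sub @ a)) ** 2
        return loss.sum() / n + lam * a @ K_sub @ a
    return minimize(obj, np.zeros(len(y_sub)), method="BFGS").x

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 3))
y = np.sign(rng.standard_normal(30))
K = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
n, lam, v = len(y), 0.1, 0

keep = np.arange(n) != v
a_del = fit(K[np.ix_(keep, keep)], y[keep], n, lam)   # (a) delete observation v
y_tilde = y.copy()
y_tilde[v] = 0.0
a_zero = fit(K, y_tilde, n, lam)                       # (b) keep it, zero its label

# The held-out predictions should agree up to optimizer tolerance.
print(K[v, keep] @ a_del, K[v, :] @ a_zero)
```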

29 Magic cross-validation formula (V-fold CV) The ith data point is allocated to fold τ(i) by randomization: τ: {1, ..., n} → {1, ..., V}. The V-fold CV estimate is
f̂^{[-v]}_λ = argmin_{f ∈ H_K} (1/n) Σ_{i: τ(i) ≠ v} L(y_i f(x_i)) + λ ‖f‖²_{H_K}.
If we define ỹ_i = 0 when τ(i) = v and ỹ_i = y_i when τ(i) ≠ v, then we have
f̂^{[-v]}_λ = argmin_{f ∈ H_K} [ (1/n) Σ_{i=1}^n L(ỹ_i f(x_i)) + λ ‖f‖²_{H_K} ]. 24/37
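The bookkeeping behind the V-fold version is simple; a minimal sketch with illustrative names:

```python
import numpy as np

def assign_folds(n, V, seed=0):
    """Random fold labels tau(i) in {0, ..., V-1}, balanced as evenly as possible."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n) % V

def pseudo_responses(y, tau, v):
    """ytilde_i = 0 if tau(i) = v, and ytilde_i = y_i otherwise."""
    y_tilde = y.astype(float).copy()
    y_tilde[tau == v] = 0.0
    return y_tilde
```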

30 Magic SVM: an efficient algorithm for training the SVM (exact smoothing principle, accelerated proximal gradient descent), and tuning the SVM by leave-one-out cross-validation (magic CV formula). 25/37

31 Smoothed SVM loss:
L_δ(u) = 0 if u ≥ 1 + δ; = (1/(4δ)) [u − (1 + δ)]² if 1 − δ < u < 1 + δ; = 1 − u if u ≤ 1 − δ.
Lipschitz gradient: |L'_δ(u_1) − L'_δ(u_2)| ≤ (1/(2δ)) |u_1 − u_2|.
[Figure: L_δ(u) for δ = 0.5, 0.25, 0.1, 0.01, approaching the SVM hinge loss as δ shrinks.] 26/37
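A direct implementation of the smoothed loss, with a quick check that it tracks the hinge loss for small δ (an illustrative sketch):

```python
import numpy as np

def smoothed_svm_loss(u, delta):
    """The piecewise loss L_delta(u) from the slide."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= 1 + delta, 0.0,
           np.where(u <= 1 - delta, 1.0 - u,
                    (u - (1 + delta)) ** 2 / (4 * delta)))

u = np.linspace(-1.0, 2.0, 7)
hinge = np.maximum(0.0, 1.0 - u)
print(np.max(np.abs(smoothed_svm_loss(u, 0.01) - hinge)))  # at most delta/4
```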

32 Theorem (finite exact smoothing of SVM) With the training data K and y given, suppose α^SVM and α^δ are the unique solutions of the following problems:
α^SVM = argmin_{α ∈ R^n} [ (1/n) Σ_{i=1}^n L(y_i K_i^T α) + λ α^T K α ],
α^δ = argmin_{α ∈ R^n} [ (1/n) Σ_{i=1}^n L_δ(y_i K_i^T α) + λ α^T K α ].
Then there exists a small δ₀ > 0 such that α^δ = α^SVM whenever δ < δ₀.
The exact SVM solution is virtually attained before δ reaches 0. Define a sequence δ^{(d+1)} = r δ^{(d)} with 0 < r < 1. We solve α̂_{δ^{(d)}} sequentially and terminate the algorithm when some α̂_{δ^{(d)}} satisfies the KKT conditions of the SVM problem. 27/37

33 Smoothed SVM:
min_{α ∈ R^n} Q^δ(α) = min_{α ∈ R^n} [ (1/n) Σ_{i=1}^n L_δ(y_i K_i^T α) + λ α^T K α ].
Accelerated proximal gradient descent update:
α^{(t+1)} = argmin_{α ∈ R^n} [ λ α^T K α + (1/(4nδ)) ‖ α − ᾱ^{(t)} + 2δ ∇l_δ(ᾱ^{(t)}) ‖²_2 ]
= ᾱ^{(t)} − ( 2λK + (1/(2nδ)) KK )^{-1} ( ∇l_δ(ᾱ^{(t)}) + 2λ K ᾱ^{(t)} ),
ᾱ^{(t)} = α^{(t)} + ((r_t − 1)/r_{t+1}) (α^{(t)} − α^{(t−1)}), with r_1 = 1 and r_{t+1} = (1 + √(1 + 4 r_t²))/2. 28/37

34 Algorithm 1 Magic SVM
Require: y, K, λ, and r (e.g., r = 2/3).
1: Initialize δ. Define L_δ. Initialize each α^{[-v]}.
2: repeat
3: Compute P_δ^{-1}(K) = (2λK + (1/(2nδ)) KK)^{-1}.
4: for v = 1, ..., n do
5: Let ỹ_i = y_i if i ≠ v, and ỹ_v = 0.
6: repeat
7: Compute z, with z_i = ỹ_i L'_δ(ỹ_i K_i^T α^{[-v]})/n.
8: α^{[-v]} ← α^{[-v]} − P_δ^{-1}(K) (K z + 2λK α^{[-v]}).
9: until the convergence condition is met.
10: end for
11: Update δ ← rδ.
12: until the KKT condition check of the SVM is passed.
The complexity of regular LOOCV for the SVM is O(n^4); the complexity of the entire magic LOOCV SVM is O(n^3). 29/37
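A condensed Python sketch of Algorithm 1 (illustrative only, not the released magicsvm implementation): it follows the slide's update with P_δ(K) = 2λK + KK/(2nδ), but uses a fixed number of inner iterations instead of a convergence check, no acceleration, and a fixed number of δ reductions instead of the KKT stopping rule.

```python
import numpy as np

def smoothed_svm_grad(u, delta):
    """Derivative of the smoothed SVM loss L_delta."""
    return np.where(u >= 1 + delta, 0.0,
           np.where(u <= 1 - delta, -1.0, (u - (1 + delta)) / (2 * delta)))

def magic_loocv_svm(K, y, lam, delta0=0.5, r=2/3, n_delta=10, inner_iters=50):
    """Compute all n leave-one-out kernel SVM fits via the magic CV formula (sketch)."""
    n = len(y)
    alphas = np.zeros((n, n))                     # one alpha^{[-v]} per held-out point
    delta = delta0
    for _ in range(n_delta):                      # delta <- r * delta continuation
        P_inv = np.linalg.inv(2 * lam * K + K @ K / (2 * n * delta))
        for v in range(n):
            y_t = y.astype(float).copy()
            y_t[v] = 0.0                          # magic CV: zero out the held-out label
            a = alphas[v]
            for _ in range(inner_iters):
                z = y_t * smoothed_svm_grad(y_t * (K @ a), delta) / n
                a = a - P_inv @ (K @ z + 2 * lam * K @ a)
            alphas[v] = a
        delta *= r
    f_loo = np.array([K[v] @ alphas[v] for v in range(n)])   # held-out decision values
    return np.mean(np.sign(f_loo) != y), alphas               # LOOCV misclassification error
```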

35 The algorithm can be generalized to other variants of CV, such as V-fold CV, delete-v CV, etc. The algorithm can also be generalized to other kernel machines, e.g., logistic regression, the squared SVM, and the Huber SVM. 30/37

36 Simulation Define µ_+ = (1, ..., 1, 0, ..., 0) and µ_− = (0, ..., 0, 1, ..., 1).
Positive class: Σ_{k=1}^{10} 0.1 N(µ_{k+}, 4I), where µ_{k+} ~ N(µ_+, I).
Negative class: Σ_{k=1}^{10} 0.1 N(µ_{k−}, 4I), where µ_{k−} ~ N(µ_−, I).
[Figure: ratios of the run time without the magic CV to the run time with the magic CV, plotted against the sample size, for p = 0.2n and p = 0.5n.] 31/37
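A sketch of this data-generating process in Python; the dimension p and the half/half split of ones and zeros in µ_+ and µ_− are illustrative guesses, since the slide does not give them.

```python
import numpy as np

def simulate(n_per_class, p, seed=0):
    """Each class is a 10-component Gaussian mixture with equal weights 0.1 and covariance 4I."""
    rng = np.random.default_rng(seed)
    mu_pos = np.r_[np.ones(p // 2), np.zeros(p - p // 2)]
    mu_neg = np.r_[np.zeros(p // 2), np.ones(p - p // 2)]
    X, y = [], []
    for mu, label in [(mu_pos, +1), (mu_neg, -1)]:
        centers = rng.multivariate_normal(mu, np.eye(p), size=10)   # mu_k ~ N(mu, I)
        comp = rng.integers(0, 10, size=n_per_class)                # pick a component per point
        X.append(centers[comp] + 2.0 * rng.standard_normal((n_per_class, p)))  # N(mu_k, 4I)
        y.append(np.full(n_per_class, label))
    return np.vstack(X), np.concatenate(y)

X, y = simulate(n_per_class=100, p=20)
```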

37 [Table: run time (seconds) and objective value, with standard errors in parentheses, for magicsvm, kernlab, and e1071 over several (n, p) settings of the simulation.] 32/37

38 Magic CV in kernel learning theory

39 Theorem (Bounding the expectation of the V-fold CV error) Suppose each observation in T_n = {(x_i, y_i)}_{i=1}^n is sampled from the same distribution, and the ith data point is allocated to the τ(i)th set. Let B = sup_x K(x, x) and Λ = sup_u |L'(u)|. Then the expectation of the V-fold CV error, 2 < V ≤ n, satisfies
E_{T_n} err_{V-CV} ≤ E_{T_n} err(f̂_λ) + B Λ² / (2λV).
V-fold CV error: err_{V-CV} = (1/n) Σ_{v=1}^V Σ_{i: τ(i)=v} L(y_i f̂^{[-v]}_λ(x_i)).
Training error: err(f̂_λ) = (1/n) Σ_{i=1}^n L(y_i f̂_λ(x_i)). 33/37

40 Corollary (Generalization error bound for the kernel SVM) Suppose each observation in T_n = {(x_i, y_i)}_{i=1}^n is sampled from the same distribution. Define f̃ = argmin_{f ∈ H_K} Err(f). Then
E_{T_n} Err(f̂_λ) ≤ Err(f̃) + λ ‖f̃‖²_{H_K} + 1/(2λn).
Gaussian kernel: B = sup_x K(x, x) = 1; SVM: Λ = sup_u |L'(u)| = 1. 34/37

41 Theorem (Bounding the variance of the V-fold CV) Suppose each observation in T_n = {(x_i, y_i)}_{i=1}^n is sampled from the same distribution, and the ith data point is allocated to the τ(i)th set. Let B = sup_x K(x, x) and Λ = sup_u |L'(u)|. Then
Var_{T_n}(err_LOOCV) ≤ (1/n) (1 + 4/λ). 35/37

42 Thank You
