A Magic CV Theory for Large-Margin Classifiers
1 A Magic CV Theory for Large-Margin Classifiers. Hui Zou, School of Statistics, University of Minnesota. June 30, 2018. Joint work with Boxiang Wang.
2 Outline
1 Background
2 Magic CV formula
3 Magic support vector machines
4 Magic CV applications in kernel learning theory
5 Numerical studies
3 Binary classification
Observations: a collection of i.i.d. training data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$.
Predictors (covariates, input vector): $x_i = (x_{i1}, \ldots, x_{ip})$.
Response (class label, output variable): $y_i \in \{-1, 1\}$.
Build a model $\hat f$ using the training data. Given any new input $x$, we predict the class label $\hat y = \hat f(x)$.
4 Classification toolbox
Linear discriminant analysis, logistic regression, kernel density classifier, naïve Bayes classifier, neural network, boosting ensembles, random forest, support vector machine (SVM), ...
Experiment: Fernández et al. (2014, JMLR) compared 179 commonly used classifiers on 121 UCI data sets. Conclusion: the best classifiers are random forests, kernel SVMs, neural networks, and boosting ensembles.
5 Support vector machine: the non-separable case
The distance $y_i(\omega_0 + x_i^T \omega)$ may be negative.
Introduce slack variables $\eta_i \ge 0$ and redefine the distance as $d_i = y_i(\omega_0 + x_i^T \omega) + \eta_i$, so that $d_i \ge 0$ for all $i$.
[Figure: the two classes $y = +1$ and $y = -1$ in the $(x_1, x_2)$ plane, with a point $x_i$ on the wrong side of its margin and its slack $\eta_i$.]
6 Support vector machine
SVM (Vapnik, 1995):
$$\max_{\omega_0, \omega}\ \min_i d_i \quad \text{subject to} \quad d_i = y_i(\omega_0 + x_i^T \omega) + \eta_i \ge 0\ \forall i, \quad \eta_i \ge 0\ \forall i, \quad \omega^T \omega = 1, \quad \textstyle\sum_i \eta_i \le t.$$
The tuning parameter $t$ controls the extent of the slack variables.
7 Computing the SVM in the dual space
Lagrange dual function:
$$\max_\alpha L_D = \max_\alpha \left[\sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{i'=1}^n \alpha_i \alpha_{i'} y_i y_{i'} \langle x_i, x_{i'} \rangle\right] \quad \text{subject to} \quad \sum_{i=1}^n \alpha_i y_i = 0 \ \text{ and } \ 0 \le \alpha_i \le \gamma\ \forall i.$$
The solution has the form $\hat f(x) = \hat\beta_0 + \sum_{i=1}^n \hat\alpha_i y_i \langle x_i, x \rangle$. The coefficients $\hat\alpha_i$ are nonzero only for the observations that are support vectors.
8 Kernel trick in the dual space
Lagrange dual and solution with a kernel function:
$$\max_\alpha L_D = \max_\alpha \left[\sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{i'=1}^n \alpha_i \alpha_{i'} y_i y_{i'} K(x_i, x_{i'})\right] \quad \text{subject to} \quad \sum_{i=1}^n \alpha_i y_i = 0 \ \text{ and } \ 0 \le \alpha_i \le \gamma\ \forall i,$$
$$\hat f(x) = \hat\beta_0 + \sum_{i=1}^n \hat\alpha_i y_i K(x_i, x).$$
Gaussian kernel: $K(x, x') = \exp(-\sigma \|x - x'\|_2^2)$.
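In R (the language of the packages cited on the next slide), the Gaussian kernel matrix is a one-liner; a minimal sketch, with sigma the bandwidth in the formula above:

```r
# Gram matrix with entries K[i, j] = exp(-sigma * ||x_i - x_j||_2^2)
gauss_kernel <- function(X, sigma = 1) exp(-sigma * as.matrix(dist(X))^2)
```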
9 State-of-the-art SVM solvers
Interior point method, e.g., the R package kernlab.
Sequential minimal optimization (Platt, 1999; Osuna et al., 1997; Keerthi et al., 2001; Fan et al., 2005): the LIBSVM library, R package e1071.
10 Tuning the SVM
The kernel SVM has high prediction accuracy, but its generalization error depends on the choice of the tuning parameter.
Two tasks:
1. Model comparison/selection: e.g., choose the tuning parameter for a procedure.
2. Model assessment: estimate the generalization error of the final model.
Cross-validation is perhaps the simplest and most widely used tool for both.
11 Cross-validation
12 V-fold cross-validation
[Diagram: the data are split into V folds; each fold in turn serves as the validation set while the remaining V − 1 folds are used for training.]
Cross-validation error: $\mathrm{CV}(\lambda) = \frac{1}{V}\sum_{v=1}^V L\big(Y_v, \hat f_\lambda^{[-v]}(X_v)\big)$.
Tuning parameter: $\hat\lambda = \operatorname*{argmin}_\lambda \mathrm{CV}(\lambda)$.
Generalization error: $\mathrm{Err}(\hat f_\lambda) \approx \mathrm{CV}(\lambda)$.
Leave-one-out CV (LOOCV): $V = n$. Ten-fold CV: $V = 10$.
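For concreteness, the V-fold CV error above can be computed by brute force with the e1071 package mentioned on the solvers slide. This is a minimal sketch under my own setup (the function name, fold assignment, and 0-1 loss are illustrative), not code from the talk:

```r
# Brute-force V-fold CV error for a Gaussian-kernel SVM: fit V times,
# validate on the held-out fold each time, and average the 0-1 losses.
library(e1071)

vfold_cv_error <- function(X, y, V = 10, cost = 1, gamma = 1) {
  n <- nrow(X)
  tau <- sample(rep(1:V, length.out = n))      # random fold allocation tau(i)
  err <- numeric(V)
  for (v in 1:V) {
    train <- tau != v
    fit <- svm(X[train, ], factor(y[train]),
               kernel = "radial", cost = cost, gamma = gamma)
    pred <- predict(fit, X[!train, ])
    err[v] <- mean(as.character(pred) != as.character(y[!train]))  # 0-1 loss
  }
  mean(err)                                    # CV error, averaged over folds
}
```

Every candidate tuning value multiplies this cost by another V model fits; with LOOCV (V = n) the refitting becomes prohibitive, which is what the magic formula later removes.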
13 LOOCV or ten-fold CV?
LOOCV is an almost unbiased estimator of the true generalization error, i.e., it has small bias.
LOOCV is deterministic, while ten-fold CV is random across different training/validation splits.
LOOCV is computationally expensive: it requires n model fits.
LOOCV is often claimed to have larger variance than ten-fold CV.
The last statement is quite popular, but it is not generally true.
14 [Figure: mean and variance of the classification error versus log(lambda) for LOOCV, ten-fold, five-fold, and two-fold CV, each compared with the true error.]
1. LOOCV has almost no bias in estimating the generalization error.
2. LOOCV has variance no larger than that of other V-fold CVs.
15 Magic CV Formula
16 LOOCV for regression
Model: $y = f(x) + \epsilon$. Estimate $f$ by regularization:
$$\hat f_\lambda = \operatorname*{argmin}_f \left[\frac{1}{n}\sum_{i=1}^n (y_i - f(x_i))^2 + \lambda P(f)\right].$$
Examples: ridge regression, $f(x) = x^T\beta$ with $P(f) = \|\beta\|_2^2$; smoothing spline, $P(f) = \int f''(u)^2\,du$.
LOOCV estimate:
$$\hat f_\lambda^{[-i]} = \operatorname*{argmin}_f \left[\frac{1}{n}\sum_{j=1,\, j \ne i}^n (y_j - f(x_j))^2 + \lambda P(f)\right].$$
LOOCV error: $\mathrm{LOOCV}(\lambda) = \frac{1}{n}\sum_{i=1}^n \big(y_i - \hat f_\lambda^{[-i]}(x_i)\big)^2$.
17 Magic leave-one-out lemma for regression (Craven and Wahba, 1979)
Suppose that $\hat f_\lambda$ is a linear smoother, $\hat f_\lambda(x_i) = H_i y$, and is self-stable. Then
$$\mathrm{LOOCV}(\lambda) = \frac{1}{n}\sum_{i=1}^n \big(y_i - \hat f_\lambda^{[-i]}(x_i)\big)^2 = \frac{1}{n}\sum_{i=1}^n \frac{(y_i - \hat f_\lambda(x_i))^2}{(1 - h_{ii})^2},$$
where $h_{ii}$ is the $i$th diagonal element of the smoother matrix $H$. The leave-one-out error is obtained from the single full-data fit, with no refitting.
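The lemma is easy to verify numerically for ridge regression, where $H = X(X^TX + n\lambda I)^{-1}X^T$. Below is a small sketch on simulated data of my own (not from the talk), comparing the hat-matrix shortcut with brute-force refitting:

```r
# Magic LOOCV for ridge regression: one fit plus the diagonal of the hat
# matrix reproduces the n leave-one-out residuals exactly.
set.seed(1)
n <- 50; p <- 5; lambda <- 0.1
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% rnorm(p) + rnorm(n))

H <- X %*% solve(crossprod(X) + n * lambda * diag(p), t(X))  # hat matrix
loocv_magic <- mean(((y - H %*% y) / (1 - diag(H)))^2)

loocv_brute <- mean(sapply(1:n, function(i) {                # n refits
  beta_i <- solve(crossprod(X[-i, ]) + n * lambda * diag(p),
                  crossprod(X[-i, ], y[-i]))
  (y[i] - X[i, ] %*% beta_i)^2
}))
all.equal(loocv_magic, loocv_brute)   # TRUE: shortcut equals brute force
```

The 1/n factor is kept in the leave-one-out objective, matching the slide, so the penalty matrix $n\lambda I$ is the same in every refit and the identity is exact.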
18 Self-stability property
[Figure: the leave-one-out fit $f(x^{[-5]}, y^{[-5]})$, obtained by deleting the fifth observation, coincides with the full-data fit $f(x, \tilde y)$ in which the deleted response is replaced by its own fitted value.]
19 The question
We must also consider the computational cost of the SVM with CV: can we compute the exact LOOCV of the SVM and related classifiers without repeating the algorithm n times?
20 Our contributions
1. Propose a magic CV formula to compute the exact cross-validation error in the context of large-margin classification.
2. Develop the magic SVM by designing a very efficient algorithm to solve the kernel SVM and applying the magic CV formula to tune the parameters.
3. Obtain theoretical bounds on the expectation and variance of the cross-validation error based on the CV formula.
21 Performance demo
[Table: run time in seconds and classification error in % (standard errors in parentheses) for magicsvm, kernlab, and e1071 on eight benchmark data sets: arrhythmia (n = 452, p = 191), musk (n = 476, p = 166), australian (n = 690, p = 14), SAfrica (n = 462, p = 65), hepatitis (n = 112, p = 18), sonar (n = 208, p = 6), LSVT (n = 126, p = 309), and valley (n = 606, p = 100).]
22 Magic CV formula for SVM
23–24 Cross-validation estimates: $A(X^{[-v]}, y^{[-v]}) = A(X, \tilde y)$.
[Figure, shown in two animation steps: in regression, deleting the validation points is equivalent to refitting the full data with the validation responses replaced by their fitted values; in large-margin classification, it is equivalent to refitting with the validation labels set to $\tilde y_i = 0$.]
25 Kernel SVM in the primal space
SVM:
$$\max_{\omega_0, \omega}\ \min_i d_i \quad \text{subject to} \quad d_i = y_i(\omega_0 + x_i^T\omega) + \eta_i \ge 0,\ \eta_i \ge 0\ \forall i,\ \omega^T\omega = 1,\ \textstyle\sum_i \eta_i \le t.$$
Equivalent loss + penalty form:
$$\operatorname*{argmin}_{\beta_0, \beta} \left[\frac{1}{n}\sum_{i=1}^n \big[1 - y_i(\beta_0 + x_i^T\beta)\big]_+ + \lambda\beta^T\beta\right].$$
[Figure: the SVM hinge loss.]
26 Kernel SVM in the primal space
1. $\min_{f \in \mathcal{H}_K} \left[\frac{1}{n}\sum_{i=1}^n [1 - y_i f(x_i)]_+ + \lambda \|f\|_{\mathcal{H}_K}^2\right]$.
2. Mercer theorem: a kernel function $K$ has an eigen-expansion $K(x, x') = \sum_{t=1}^\infty \gamma_t \phi_t(x)\phi_t(x')$.
3. The Hilbert space $\mathcal{H}_K$ is defined as the collection of functions $f(x) = \sum_{t=1}^\infty \theta_t \phi_t(x)$, with the inner product $\big\langle \sum_{t} \theta_t \phi_t, \sum_{t'} \delta_{t'} \phi_{t'} \big\rangle_{\mathcal{H}_K} = \sum_{t=1}^\infty \theta_t \delta_t / \gamma_t$.
27 4. The representer theorem (Kimeldorf and Wahba, 1971): the solution of
$$\min_{f \in \mathcal{H}_K}\left[\frac{1}{n}\sum_{i=1}^n [1 - y_i f(x_i)]_+ + \lambda\|f\|_{\mathcal{H}_K}^2\right]$$
is finite-dimensional, $\hat f(x) = \sum_{i=1}^n \hat\alpha_i K(x, x_i)$, where $K_i$ denotes the $i$th column of the kernel matrix $K$ and
$$\hat\alpha = \operatorname*{argmin}_{\alpha}\left[\frac{1}{n}\sum_{i=1}^n \big[1 - y_i K_i^T\alpha\big]_+ + \lambda\alpha^T K\alpha\right].$$
28 Magic leave-one-out formula for SVM
If we let $\tilde y_v = 0$ and $\tilde y_i = y_i$ for $i \ne v$, then
$$\hat f_\lambda^{[-v]} = \operatorname*{argmin}_{f \in \mathcal{H}_K} \left[\frac{1}{n}\sum_{i=1}^n L(\tilde y_i f(x_i)) + \lambda\|f\|_{\mathcal{H}_K}^2\right].$$
Intuition: with $\tilde y_v = 0$, the $v$th term $L(\tilde y_v f(x_v)) = L(0) = 1$ is a constant that does not depend on $f$, so zeroing the label is equivalent to deleting the observation.
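A toy numerical check of this identity, under my own setup (not the speaker's code): the objective is minimized once with $y_v$ zeroed and once with observation $v$ deleted, and the two fits coincide. The hinge is replaced by the slightly smoothed loss previewed on a later slide so that BFGS applies; with $\tilde y_v = 0$ the $v$th term contributes the constant $L^\delta(0) = 1$, so the two objectives differ only by $1/n$:

```r
# Magic LOO check for the kernel SVM objective: zeroing label v and
# deleting observation v give the same minimizer.
set.seed(1)
n <- 30; lambda <- 0.1; v <- 7; delta <- 0.01
x <- matrix(rnorm(2 * n), n, 2)
y <- ifelse(x[, 1] + x[, 2] > 0, 1, -1)
K <- exp(-as.matrix(dist(x))^2)              # Gaussian kernel, sigma = 1

Lsm <- function(u)                           # smoothed hinge loss
  ifelse(u >= 1 + delta, 0,
         ifelse(u <= 1 - delta, 1 - u, (u - (1 + delta))^2 / (4 * delta)))

obj <- function(alpha, ytil, keep = 1:n) {   # (1/n) sum loss + lambda a'Ka
  f <- drop(K %*% alpha)
  sum(Lsm(ytil[keep] * f[keep])) / n + lambda * sum(alpha * (K %*% alpha))
}
ytil <- y; ytil[v] <- 0                      # the magic step: zero label v
ctrl <- list(maxit = 500, reltol = 1e-12)
a_magic <- optim(rep(0, n), obj, ytil = ytil, method = "BFGS", control = ctrl)$par
a_drop  <- optim(rep(0, n), obj, ytil = y, keep = (1:n)[-v],
                 method = "BFGS", control = ctrl)$par
max(abs(K %*% (a_magic - a_drop)))           # ~ 0: identical fitted functions
```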
29 Magic cross-validation formula (V-fold CV)
The $i$th data point is allocated to fold $\tau(i)$ by randomization: $\tau: \{1, \ldots, n\} \to \{1, \ldots, V\}$.
The V-fold CV estimate:
$$\hat f_\lambda^{[-v]} = \operatorname*{argmin}_{f \in \mathcal{H}_K} \frac{1}{n}\sum_{\{i:\, \tau(i) \ne v\}} L(y_i f(x_i)) + \lambda\|f\|_{\mathcal{H}_K}^2.$$
Define $\tilde y_i = 0$ if $\tau(i) = v$ and $\tilde y_i = y_i$ if $\tau(i) \ne v$; then
$$\hat f_\lambda^{[-v]} = \operatorname*{argmin}_{f \in \mathcal{H}_K} \left[\frac{1}{n}\sum_{i=1}^n L(\tilde y_i f(x_i)) + \lambda\|f\|_{\mathcal{H}_K}^2\right].$$
30 Magic SVM
An efficient algorithm for training the SVM: the exact smoothing principle combined with accelerated proximal gradient descent.
Tune the SVM by leave-one-out cross-validation, computed via the magic CV formula.
31 Smoothed SVM loss:
$$L^\delta(u) = \begin{cases} 0 & u \ge 1 + \delta, \\ \dfrac{1}{4\delta}\big[u - (1 + \delta)\big]^2 & 1 - \delta < u < 1 + \delta, \\ 1 - u & u \le 1 - \delta. \end{cases}$$
Lipschitz gradient: $\big|L^{\delta\prime}(u_1) - L^{\delta\prime}(u_2)\big| \le \frac{1}{2\delta}\,|u_1 - u_2|$.
[Figure: $L^\delta(u)$ for $\delta = 0.5, 0.25, 0.1, 0.01$, approaching the SVM hinge loss.]
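Written out directly (a small sketch; the last two lines reproduce the loss plot on this slide):

```r
# Smoothed SVM loss L_delta(u): quadratic on (1-delta, 1+delta), hinge elsewhere.
Lsm <- function(u, delta) {
  ifelse(u >= 1 + delta, 0,
         ifelse(u <= 1 - delta, 1 - u, (u - (1 + delta))^2 / (4 * delta)))
}
curve(pmax(1 - x, 0), from = -1, to = 3, lty = 2, ylab = "loss")  # hinge
for (d in c(0.5, 0.25, 0.1, 0.01)) curve(Lsm(x, d), add = TRUE)
```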
32 Theorem (finite exact smoothing of SVM)
With the training data $K$ and $y$ given, suppose $\alpha^{\mathrm{SVM}}$ and $\alpha^\delta$ are the unique solutions of
$$\alpha^{\mathrm{SVM}} = \operatorname*{argmin}_{\alpha \in \mathbb{R}^n}\left[\frac{1}{n}\sum_{i=1}^n L(y_i K_i^T\alpha) + \lambda\alpha^T K\alpha\right], \qquad \alpha^\delta = \operatorname*{argmin}_{\alpha \in \mathbb{R}^n}\left[\frac{1}{n}\sum_{i=1}^n L^\delta(y_i K_i^T\alpha) + \lambda\alpha^T K\alpha\right].$$
Then there exists a $\delta^* > 0$ such that $\alpha^\delta = \alpha^{\mathrm{SVM}}$ whenever $\delta < \delta^*$.
The exact SVM solution is thus attained at a strictly positive $\delta$, before the limit $\delta \to 0$. Define a sequence $\delta^{(d+1)} = r\delta^{(d)}$ with $0 < r < 1$; solve for $\hat\alpha^{\delta^{(d)}}$ sequentially and terminate the algorithm when some $\hat\alpha^{\delta^{(d)}}$ satisfies the KKT conditions of the SVM problem.
33 Smoothed SVM:
$$\min_{\alpha \in \mathbb{R}^n} Q^\delta(\alpha) = \min_{\alpha \in \mathbb{R}^n}\left[\frac{1}{n}\sum_{i=1}^n L^\delta(y_i K_i^T\alpha) + \lambda\alpha^T K\alpha\right].$$
Accelerated proximal gradient descent update:
$$\alpha^{(t+1)} = \operatorname*{argmin}_{\alpha \in \mathbb{R}^n}\left[\lambda\alpha^T K\alpha + \nabla\ell^\delta(\bar\alpha^{(t)})^T(\alpha - \bar\alpha^{(t)}) + \frac{1}{4n\delta}(\alpha - \bar\alpha^{(t)})^T KK(\alpha - \bar\alpha^{(t)})\right]$$
$$= \bar\alpha^{(t)} - \left(2\lambda K + \frac{1}{2n\delta}KK\right)^{-1}\left(\nabla\ell^\delta(\bar\alpha^{(t)}) + 2\lambda K\bar\alpha^{(t)}\right),$$
where $\nabla\ell^\delta(\alpha) = Kz$ with $z_i = y_i L^{\delta\prime}(y_i K_i^T\alpha)/n$ is the gradient of the loss term, and the momentum step is
$$\bar\alpha^{(t)} = \alpha^{(t)} + \frac{r_t - 1}{r_{t+1}}\big(\alpha^{(t)} - \alpha^{(t-1)}\big), \qquad r_1 = 1, \qquad r_{t+1} = \frac{1 + \sqrt{1 + 4r_t^2}}{2}.$$
34 Algorithm 1: Magic SVM
Require: y, K, λ, and r (e.g., r = 2/3).
1: Initialize δ. Define L^δ. Initialize each α^{[-v]}.
2: repeat
3:   Compute P_δ^{-1}(K) = (2λK + (1/(2nδ))KK)^{-1}.
4:   for v = 1, ..., n do
5:     Let ỹ_i = y_i if i ≠ v, and ỹ_v = 0.
6:     repeat
7:       Compute z, with z_i = ỹ_i L^{δ′}(ỹ_i K_i^T α^{[-v]})/n.
8:       α^{[-v]} ← α^{[-v]} − P_δ^{-1}(K)(Kz + 2λK α^{[-v]}).
9:     until the convergence condition is met.
10:    end for
11:   Update δ ← rδ.
12: until the KKT condition check of the SVM is passed.
The complexity of regular LOOCV for the SVM is $O(n^4)$; the complexity of the entire magic LOOCV SVM is $O(n^3)$.
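Algorithm 1 translates almost line for line into R. The sketch below is my own simplification (a fixed small δ instead of the annealing δ ← rδ with KKT checks, no momentum step, and a fixed inner-iteration budget); what it keeps is the structural point that the factorization of P_δ(K) is computed once and shared by all n leave-one-out fits:

```r
# Simplified magic LOOCV for the kernel SVM (Algorithm 1, stripped down).
magic_loocv_error <- function(K, y, lambda, delta = 0.01, iters = 200) {
  n <- length(y)
  Lsm_prime <- function(u)                   # derivative of smoothed hinge
    ifelse(u >= 1 + delta, 0,
           ifelse(u <= 1 - delta, -1, (u - (1 + delta)) / (2 * delta)))
  P <- 2 * lambda * K + K %*% K / (2 * n * delta)
  R <- chol(P + 1e-10 * diag(n))             # factorize once, reuse n times
  err <- numeric(n)
  for (v in 1:n) {
    ytil <- y; ytil[v] <- 0                  # magic CV: zero held-out label
    alpha <- rep(0, n)
    for (t in 1:iters) {
      f <- drop(K %*% alpha)
      z <- ytil * Lsm_prime(ytil * f) / n    # line 7 of Algorithm 1
      g <- drop(K %*% z) + 2 * lambda * f    # K z + 2 lambda K alpha
      alpha <- alpha - backsolve(R, forwardsolve(t(R), g))  # line 8
    }
    err[v] <- as.numeric(y[v] * sum(K[v, ] * alpha) < 0)   # 0-1 LOO error
  }
  mean(err)
}
```

One $O(n^3)$ factorization is shared, and each proximal step then costs only $O(n^2)$, which is consistent with the $O(n^3)$ total complexity quoted above.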
35 The algorithm can be generalized to other variants of CV, such as V-fold CV and delete-v CV. It can also be generalized to other kernel machines, e.g., kernel logistic regression, the squared-hinge SVM, and the Huber SVM.
36 Simulation
Define $\mu_+ = (1, \ldots, 1, 0, \ldots, 0)$ and $\mu_- = (0, \ldots, 0, 1, \ldots, 1)$.
Positive class: $\sum_{k=1}^{10} 0.1\, N(\mu_{k+}, 4I)$, where $\mu_{k+} \sim N(\mu_+, I)$.
Negative class: $\sum_{k=1}^{10} 0.1\, N(\mu_{k-}, 4I)$, where $\mu_{k-} \sim N(\mu_-, I)$.
[Figure: ratio of the run time without the magic CV to the run time with the magic CV, against sample size, for p = 0.2n and p = 0.5n.]
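The class distributions can be sampled as below; this is my reading of the design, with the split of ones and zeros in $\mu_+$ and $\mu_-$ assumed to be half and half (the slide does not state the counts):

```r
# Draw n points from one class: a 10-component Gaussian mixture with
# component centers mu_k ~ N(mu, I) and component covariance 4I (sd = 2).
simulate_class <- function(n, p, mu) {
  centers <- t(replicate(10, rnorm(p, mean = mu)))
  k <- sample(1:10, n, replace = TRUE)         # equal mixture weights 0.1
  centers[k, ] + matrix(rnorm(n * p, sd = 2), n, p)
}
p <- 10
mu_plus  <- c(rep(1, p / 2), rep(0, p / 2))    # assumed half-and-half split
mu_minus <- c(rep(0, p / 2), rep(1, p / 2))
X <- rbind(simulate_class(100, p, mu_plus), simulate_class(100, p, mu_minus))
y <- rep(c(1, -1), each = 100)
```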
37 [Table: run time in seconds and attained objective values (standard errors in parentheses) for magicsvm, kernlab, and e1071 on the simulated data across several (n, p) settings.]
38 Magic CV in kernel learning theory
39 Theorem (bounding the expectation of the V-fold CV error)
Suppose each observation in $T_n = \{(x_i, y_i)\}_{i=1}^n$ is sampled from the same distribution, and the $i$th data point is allocated to the $\tau(i)$th fold. Let $B = \sup_x K(x, x)$ and $\Lambda = \sup_u |L'(u)|$. Then the expectation of the V-fold CV error, $2 \le V \le n$, satisfies
$$E_{T_n}\, \mathrm{err}_{V\text{-CV}} \le E_{T_n}\, \mathrm{err}(\hat f_\lambda) + \frac{B\Lambda^2}{2\lambda V}.$$
V-fold CV error: $\mathrm{err}_{V\text{-CV}} = \frac{1}{n}\sum_{v=1}^{V}\sum_{\{i:\, \tau(i) = v\}} L\big(y_i \hat f_\lambda^{[-v]}(x_i)\big)$.
Training error: $\mathrm{err}(\hat f_\lambda) = \frac{1}{n}\sum_{i=1}^n L\big(y_i \hat f_\lambda(x_i)\big)$.
40 Corollary (generalization error bound for kernel SVM)
Suppose each observation in $T_n = \{(x_i, y_i)\}_{i=1}^n$ is sampled from the same distribution. Define $\bar f = \operatorname*{argmin}_{f \in \mathcal{H}_K} \mathrm{Err}(f)$. Then
$$E_{T_n}\, \mathrm{Err}(\hat f_\lambda) \le \mathrm{Err}(\bar f) + \lambda\|\bar f\|_{\mathcal{H}_K}^2 + \frac{1}{2\lambda n}.$$
Gaussian kernel: $B = \sup_x K(x, x) = 1$; SVM hinge loss: $\Lambda = \sup_u |L'(u)| = 1$, so that $B\Lambda^2/(2\lambda n) = 1/(2\lambda n)$.
41 Theorem (bounding the variance of the V-fold CV error)
Suppose each observation in $T_n = \{(x_i, y_i)\}_{i=1}^n$ is sampled from the same distribution, and the $i$th data point is allocated to the $\tau(i)$th fold. Let $B = \sup_x K(x, x)$ and $\Lambda = \sup_u |L'(u)|$. Then
$$\mathrm{Var}_{T_n}(\mathrm{err}_{\mathrm{LOOCV}}) \le \frac{1}{n}\left(1 + \frac{4}{\lambda}\right).$$
42 Thank You