Machine Learning 4771
1 Machine Learning 4771 Instructor: Tony Jebara
2 Topic 5 Generalization Guarantees VC-Dimension Nearest Neighbor Classification (infinite VC dimension) Structural Risk Minimization Support Vector Machines
3 Empirical Risk Minimization Example: non-pdf linear classifiers f(x;θ) = sign(θ^T x + θ_0) ∈ {−1,+1}. Recall ERM:
R_emp(θ) = (1/N) Σ_{i=1}^N L(y_i, f(x_i;θ)) ∈ [0,1]
Have loss function: quadratic: L(y,x,θ) = (y − f(x;θ))²; linear: L(y,x,θ) = |y − f(x;θ)|; binary: L(y,x,θ) = step(−y f(x;θ)).
Empirical R_emp(θ) approximates the true risk (expected error):
R(θ) = E_P{L(x,y,θ)} = ∫_{X×Y} P(x,y) L(x,y,θ) dx dy ∈ [0,1]
But we don't know the true P(x,y)! If we had infinite data, the law of large numbers says:
lim_{N→∞} min_θ R_emp(θ) = min_θ R(θ)
But, in general, we can't make guarantees for the ERM solution: argmin_θ R_emp(θ) ≠ argmin_θ R(θ)
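As a minimal numeric sketch (not from the slides; the function names and toy data are my own illustration), the binary-loss empirical risk of a linear classifier can be computed directly:

import numpy as np

def f_linear(X, theta, theta0):
    # f(x; theta) = sign(theta^T x + theta0), with outputs in {-1, +1}
    return np.where(X @ theta + theta0 >= 0.0, 1, -1)

def empirical_risk(X, y, theta, theta0):
    # R_emp(theta) = (1/N) sum_i L(y_i, f(x_i; theta)) with the binary (0/1) loss
    return float(np.mean(f_linear(X, theta, theta0) != y))

X = np.array([[1.0, 2.0], [-1.0, -0.5], [2.0, 1.0]])
y = np.array([1, -1, 1])
print(empirical_risk(X, y, theta=np.array([1.0, 1.0]), theta0=0.0))  # 0.0 here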
4 Bounding the True Risk ERM is inconsistent: nothing is guaranteed, and we may do better on training than on test! R(θ̂) ≥ R_emp(θ̂). Idea: add a prior or regularizer to R_emp(θ). Define a capacity or confidence term C(θ) which favors simpler θ:
J(θ) = R_emp(θ) + C(θ)
If R(θ) ≤ J(θ), we have a bound: J(θ) is a guaranteed risk. After training, we can guarantee the future error rate is at most min_θ J(θ).
5 Bound the True Risk with VC But, how to find a guarantee? Difficult, but there is one. Theorem (Vapnik): with probability 1−η, where η is a number in [0,1], the following bound holds:
R(θ) ≤ J(θ) = R_emp(θ) + √( (2h log(2eN/h) + 2 log(4/η)) / N )
N = number of data points; h = Vapnik-Chervonenkis (VC) dimension (1970s) = capacity of the classifier class f(·;θ). Note, the above is independent of the true P(x,y): it is a worst-case scenario bound, guaranteed for all P(x,y). The VC dimension is not just the # of parameters a classifier has; VC measures the # of different datasets it can classify perfectly. Structural Risk Minimization: minimize the risk bound J(θ).
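A small helper (my own sketch, using the bound exactly as reconstructed above) makes the behavior of the confidence term concrete:

import math

def vc_bound(r_emp, N, h, eta=0.05):
    # J(theta) = R_emp(theta) + sqrt((2 h log(2eN/h) + 2 log(4/eta)) / N)
    conf = math.sqrt((2 * h * math.log(2 * math.e * N / h) + 2 * math.log(4 / eta)) / N)
    return r_emp + conf

print(vc_bound(r_emp=0.05, N=100000, h=50))  # small confidence term: a useful guarantee
print(vc_bound(r_emp=0.05, N=1000, h=500))   # confidence term > 1: the bound is trivial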
6 VC Dimension & Shattering How to compute h, the VC dimension, for a family of functions f(·;θ)? h = # of training points that can be shattered. Recall, a classifier maps input to output: f(x;θ) → y ∈ {−1,+1}. Shattering: I pick h points & place them at x_1,…,x_h. You challenge me with any of the 2^h possible labelings y_1,…,y_h = ±1. The VC dimension is the maximum # of points I can place such that some f(x;θ) correctly classifies them for an arbitrary labeling y_1,…,y_h. Example: for the 2d linear classifier f(x;θ) = sign(x_1 θ_1 + x_2 θ_2 + θ_0), h = 3: it can't ever shatter 4 points, or 3 points on a straight line!
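The shattering definition can be checked by brute force for small point sets: enumerate all 2^h labelings and test each for linear separability. A sketch (my own, assuming scipy is available; separability is posed as feasibility of a small linear program):

import itertools
import numpy as np
from scipy.optimize import linprog

def linearly_separable(points, labels):
    # feasible iff some (theta1, theta2, theta0) satisfies y_i (theta . x_i + theta0) >= 1
    A_ub = [[-y * p[0], -y * p[1], -y] for p, y in zip(points, labels)]
    b_ub = [-1.0] * len(points)
    res = linprog(c=[0.0, 0.0, 0.0], A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 3)
    return res.success

def can_shatter(points):
    return all(linearly_separable(points, labels)
               for labels in itertools.product([-1, 1], repeat=len(points)))

print(can_shatter([(0, 0), (1, 0), (0, 1)]))          # True: 3 points in general position
print(can_shatter([(0, 0), (1, 1), (2, 2)]))          # False: 3 points on a straight line
print(can_shatter([(0, 0), (1, 0), (0, 1), (1, 1)]))  # False: no 4 points can be shattered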
7 VC Dimension & Shattering More generally, for higher-dimensional linear classifiers, a hyperplane in R^d shatters any set of linearly independent points. We can choose d+1 such points (d linearly independent points plus the offset), so h = d+1. Note: VC is not necessarily proportional to the # of parameters. Example: the sinusoidal 1d classifier f(x;θ) = sign(sin(θx)) has number of parameters = 1 but h = infinity! Since I can choose x_i = 10^{−i}, i = 1,…,h, then no matter what labeling y_1,…,y_h = ±1 you challenge with, f shatters perfectly using
θ = π (1 + Σ_{i=1}^h ((1−y_i)/2) 10^i)
But, as a side note, if I choose 4 equally spaced x's then sin cannot shatter them.
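A quick numeric check of this construction (my own verification script, assuming the reconstructed formula for θ above):

import itertools
import math

h = 5
xs = [10.0 ** (-i) for i in range(1, h + 1)]             # x_i = 10^{-i}
for ys in itertools.product([-1, 1], repeat=h):          # all 2^h labelings
    theta = math.pi * (1 + sum(((1 - y) // 2) * 10 ** i
                               for i, y in zip(range(1, h + 1), ys)))
    preds = [1 if math.sin(theta * x) > 0 else -1 for x in xs]
    assert preds == list(ys)
print("sign(sin(theta*x)) shattered all", 2 ** h, "labelings of", h, "points")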
8 VC Dimension & Shattering Recall that the VC dimension gives an upper bound. We want to minimize h since that minimizes C(θ) & J(θ). If we can't compute h exactly but can compute an upper bound h+ ≥ h, we can plug h+ into the bound & still get a guarantee. Also, sometimes the bound is trivial: we need roughly h/N ≤ 0.3 before C(θ) < 1 (recall R(θ) ∈ [0,1]). Note: h low → good guaranteed performance; h → ∞ → poor guaranteed performance.
9 Nearest Neighbors & VC Consider the Nearest Neighbors classification algorithm: Input a query example x. Find the training example x_i in {x_1,…,x_N} closest to x. Predict the label for x as the y_i of that neighbor. Often use the Euclidean distance ||x − x_i|| to measure closeness. Nearest neighbors shatters any set of points! So VC = ∞, C(θ) = ∞, guaranteed risk = ∞. But it still works well in practice. (Recall: h → ∞ gives a poor performance guarantee; h low gives a good one.)
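A minimal 1-nearest-neighbor sketch (my own Python, not from the slides):

import numpy as np

def nn_classify(x_query, X_train, y_train):
    # predict the label of the Euclidean-nearest training example
    dists = np.linalg.norm(X_train - x_query, axis=1)
    return y_train[int(np.argmin(dists))]

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y_train = np.array([-1, 1, 1])
print(nn_classify(np.array([0.2, 0.1]), X_train, y_train))  # -1

Note that 1-NN always attains R_emp = 0 on its own training set (each training point is its own nearest neighbor), consistent with its infinite VC dimension.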
10 VC Dimension & Large Margins Linear classifiers are too big a function class since h = d+1. We can reduce the VC dimension if we restrict them: constrain linear classifiers to data living inside a sphere. Gap-tolerant classifiers: a linear classifier whose activity is constrained to a sphere & outside a margin. Only count errors in the shaded region; elsewhere L(x,y,θ) = 0. If M is small relative to D, we can still shatter 3 points. (Figure: M = margin, D = diameter, d = dimensionality.)
11 VC Dimension & Large Margins But, as M grows relative to D, we can only shatter 2 points! (Can't shatter 3; can shatter 2.) For hyperplanes, as M grows vs. D, we shatter fewer points! The VC dimension h goes down if the gap-tolerant classifier has a larger margin; the general formula is:
h ≤ min( ceil(D²/M²), d ) + 1
Before, we just had h = d+1. Now we have a smaller h. If data can be anywhere, D is infinite and we are back to h = d+1. Typically real data is bounded (by a sphere), so D is fixed. Maximizing M reduces h, improving the guaranteed risk J(θ). Note: R(θ) doesn't count errors in the margin or outside the sphere.
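A one-liner sketch of the gap-tolerant VC formula (my own helper, using the bound exactly as reconstructed above):

import math

def gap_tolerant_vc(D, M, d):
    # h <= min(ceil(D^2 / M^2), d) + 1
    return min(math.ceil(D ** 2 / M ** 2), d) + 1

print(gap_tolerant_vc(D=2.0, M=2.0, d=100))  # 2: a large margin can only shatter 2 points
print(gap_tolerant_vc(D=2.0, M=0.1, d=100))  # 101 = d+1: a tiny margin recovers the plain hyperplane case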
12 Structural Risk Minimization Structural Risk Minimization: minimize the risk bound J(θ) by reducing empirical error & reducing the VC dimension h:
R(θ) ≤ J(θ) = R_emp(θ) + ((2h log(2eN/h) + 2 log(4/η)) / N) · (1 + √(1 + N R_emp(θ) / (h log(2eN/h) + log(4/η))))
For each model i in a list of hypothesis classes h_1 ≤ h_2 ≤ h_3 ≤ … (a space of different classifiers or hypotheses): 1) compute its h = h_i; 2) θ* = argmin_θ R_emp(θ); 3) compute J(θ*, h_i). Then choose the model with the lowest J(θ*, h_i), i.e. argmin_{θ,h} J(θ,h). Or, directly optimize over both θ* and h. If possible, minimize empirical error while also minimizing VC. For gap-tolerant linear classifiers, minimize R_emp(θ) while maximizing the margin; support vector machines do just that!
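The three-step SRM recipe maps directly onto code. A sketch (my own; each class supplies a bound h_i on its VC dimension and an ERM fitting routine, and the simpler square-root form of the bound from slide 5 is used for scoring):

import math

def srm_select(model_classes, N, eta=0.05):
    # model_classes: list of (h_i, erm_fit) pairs, where erm_fit() -> (theta_star, r_emp)
    best = None
    for h, erm_fit in model_classes:       # step 1: h = h_i for this class
        theta_star, r_emp = erm_fit()      # step 2: theta* = argmin R_emp within the class
        J = r_emp + math.sqrt((2 * h * math.log(2 * math.e * N / h)
                               + 2 * math.log(4 / eta)) / N)
        if best is None or J < best[0]:    # step 3: keep the lowest guaranteed risk
            best = (J, h, theta_star)
    return best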
13 Support Vector Machines Support vector machines are (in the simplest case) linear classifiers that do structural risk minimization (SRM): directly maximize the margin to reduce the guaranteed risk J(θ). Assume first that the 2-class data is linearly separable: we have {(x_1,y_1),…,(x_N,y_N)} where x_i ∈ R^D and y_i ∈ {−1,+1}, with classifier f(x;θ) = sign(w^T x + b). The decision boundary or hyperplane is given by w^T x + b = 0. Note: we can scale w & b while keeping the same boundary. Many solutions exist which have empirical error R_emp(θ) = 0. We want the widest or thickest one (max margin); it is also unique!
14 Side Note: Constraints How to minimize a function subject to equality constraints? Unconstrained:
min_{x_1,x_2} f(x) = min_{x_1,x_2} b_1 x_1 + b_2 x_2 + ½ H_11 x_1² + H_12 x_1 x_2 + ½ H_22 x_2² = min_x b^T x + ½ x^T H x
∇f(x) = b + Hx = 0 → x = −H⁻¹ b
Constrained: only walk on x_1 = 2x_2, i.e. x_1 − 2x_2 = 0. Use Lagrange multipliers: for each constraint, subtract it times a lambda variable. Lambda blows up the minimization if we don't satisfy the constraint:
min_{x_1,x_2} max_λ f(x) − λ·(equality condition) = min_{x_1,x_2} max_λ b_1 x_1 + b_2 x_2 + ½ H_11 x_1² + H_12 x_1 x_2 + ½ H_22 x_2² − λ(x_1 − 2x_2)
15 Side Note: Constraints Cost minimization with equality constraints: 1) Subtract each constraint times an extra variable (a Lagrange multiplier λ, like an adversary variable). 2) Take partials with respect to x and set to zero. 3) Plug the solution into the constraint to find lambda.
min_x max_λ f(x) − λ·(equality condition) = min_x max_λ b^T x + ½ x^T H x − λ(x_1 − 2x_2)
Writing the constraint as c^T x = 0 with c = [1, −2]^T:
∂/∂x: b + Hx − λc = 0 → x = H⁻¹(λc − b)
Plug into c^T x = 0: c^T H⁻¹(λc − b) = 0 → λ = (c^T H⁻¹ b) / (c^T H⁻¹ c)
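A numeric sanity check of this recipe (my own worked example; H and b are arbitrary choices with H positive definite):

import numpy as np

H = np.array([[2.0, 0.5], [0.5, 1.0]])  # positive definite (my choice)
b = np.array([1.0, -1.0])
c = np.array([1.0, -2.0])               # constraint c^T x = 0, i.e. x1 = 2 x2

Hinv = np.linalg.inv(H)
lam = (c @ Hinv @ b) / (c @ Hinv @ c)   # step 3: solve for lambda
x_star = Hinv @ (lam * c - b)           # step 2: stationary point of the Lagrangian
print(x_star, c @ x_star)               # constraint satisfied (~0)

# brute-force check: the constraint line is x = t * (2, 1)
f = lambda x: b @ x + 0.5 * x @ H @ x
ts = np.linspace(-5.0, 5.0, 100001)
t_best = ts[int(np.argmin([f(t * np.array([2.0, 1.0])) for t in ts]))]
print(f(x_star), f(t_best * np.array([2.0, 1.0])))  # values nearly match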
16 Support Vector Machines Define: the decision hyperplane w^T x + b = 0; H+ = positive margin hyperplane: w^T x + b = +1; H- = negative margin hyperplane: w^T x + b = −1; q = distance from the decision plane to the origin: q = min_x ||x − 0|| subject to w^T x + b = 0.
1) min_x ½ x^T x − λ(w^T x + b); grad_x = 0: x − λw = 0 → x = λw
2) Plug into the constraint: w^T(λw) + b = 0 → λ = −b / w^T w
3) Solution: x̂ = (−b / w^T w) w
4) Distance: q = ||x̂ − 0|| = |b| ||w|| / w^T w = |b| / ||w||
5) Define H+: w^T x + b = +1 and H-: w^T x + b = −1 without loss of generality, since we can scale b & w.
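A two-line numeric check of the distance formula (my own example values):

import numpy as np

w, b = np.array([3.0, 4.0]), 2.0
x_hat = (-b / (w @ w)) * w  # closest point on w^T x + b = 0 to the origin
print(np.linalg.norm(x_hat), abs(b) / np.linalg.norm(w), w @ x_hat + b)  # 0.4, 0.4, ~0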
17 Support Vector Machines The constraints on the SVM for R_emp(θ) = 0 are thus:
w^T x_i + b ≥ +1 for y_i = +1
w^T x_i + b ≤ −1 for y_i = −1
Or more simply: y_i(w^T x_i + b) − 1 ≥ 0.
The margin of the SVM is m = d+ + d−, the distances from the decision plane to H+ and H-. Distances to the origin: H+ has q+ = |b − 1| / ||w|| and H- has q- = |b + 1| / ||w||; therefore d+ = d− = 1/||w|| and the margin is m = 2/||w||. We want to max the margin, or equivalently minimize ||w|| or ½||w||². SVM Problem: min_w ½||w||² subject to y_i(w^T x_i + b) − 1 ≥ 0. This is a quadratic program! Can plug this into a matlab function called qp(), done!
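The slides point at Matlab's qp(); a rough Python equivalent (my own sketch, assuming the cvxopt package is installed and inventing toy data for illustration) solves the same primal QP over the stacked variable z = (w, b):

from cvxopt import matrix, solvers
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])  # toy separable data
y = np.array([1.0, 1.0, -1.0, -1.0])
N, D = X.shape

P = np.zeros((D + 1, D + 1))
P[:D, :D] = np.eye(D)  # cost 1/2 ||w||^2; b is unpenalized...
P[D, D] = 1e-8         # ...except a tiny ridge so the solver's KKT system stays nonsingular
q = np.zeros(D + 1)
G = -y[:, None] * np.hstack([X, np.ones((N, 1))])  # y_i (w^T x_i + b) >= 1 as G z <= h
h = -np.ones(N)

sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
w = np.array(sol['x'][:D]).ravel()
b = sol['x'][D]
print(w, b, y * (X @ w + b))  # all margins >= 1 (up to solver tolerance)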
18 Side Note: Optimization Tools A hierarchy of Matlab optimization packages to use, each problem class contained in the next:
Linear Programming (LP): min_x b^T x s.t. c_i^T x ≥ α_i ∀i
⊂ Quadratic Programming (QP): min_x ½ x^T H x + b^T x s.t. c_i^T x ≥ α_i ∀i
⊂ Quadratically Constrained Quadratic Programming (QCQP)
⊂ Semidefinite Programming (SDP)
⊂ Convex Programming (CP)
⊂ Polynomial-Time Algorithms (P)
19 Side Note: Optimization Tools Each data point adds a linear inequality y_i(w^T x_i + b) ≥ 1 to the QP. Each point cuts away a half-space of allowable SVMs and reduces the green (feasible) region. The SVM, which minimizes ½ w^T w, is the closest point to the origin that is still in the green region. The perceptron algorithm just puts us randomly somewhere in the green region. QP runs in cubic polynomial time: there are D values in the w vector, so it needs O(D³) run time. But, there is a DUAL SVM in O(N³)!
20 SVM in Dual Form We can also solve the problem via convex duality. Primal SVM problem L_P: min_w ½||w||² subject to y_i(w^T x_i + b) − 1 ≥ 0. This is a quadratic program: a quadratic cost function with multiple linear inequalities (these carve out a convex hull). Subtract from the cost each inequality times an α_i Lagrange multiplier, then take derivatives with respect to w & b:
L_P = min_{w,b} max_{α≥0} ½||w||² − Σ_i α_i (y_i(w^T x_i + b) − 1)
∂L_P/∂w = w − Σ_i α_i y_i x_i = 0 → w = Σ_i α_i y_i x_i
∂L_P/∂b = −Σ_i α_i y_i = 0
Plug back in to get the dual:
L_D = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j x_i^T x_j
Also have the constraints: Σ_i α_i y_i = 0 & α_i ≥ 0 ∀i. The above L_D must be maximized! Convex duality: also a qp().
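A matching sketch of the dual QP (again my own, assuming cvxopt; same toy data as the primal example above). Maximizing L_D is minimizing ½ α^T Q α − 1^T α with Q_ij = y_i y_j x_i^T x_j:

from cvxopt import matrix, solvers
import numpy as np

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
N = len(y)

Yx = y[:, None] * X
Q = Yx @ Yx.T                                              # Q_ij = y_i y_j x_i^T x_j
sol = solvers.qp(matrix(Q), matrix(-np.ones(N)),
                 matrix(-np.eye(N)), matrix(np.zeros(N)),  # alpha_i >= 0
                 matrix(y.reshape(1, -1)), matrix(0.0))    # sum_i alpha_i y_i = 0
alpha = np.array(sol['x']).ravel()

w = (alpha * y) @ X              # w = sum_i alpha_i y_i x_i
sv = int(np.argmax(alpha))       # an active constraint: y_sv (w^T x_sv + b) = 1
b = y[sv] - w @ X[sv]
print(alpha, w, b)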