The definitions and notation are those introduced in the lectures slides. R Ex D [h
|
|
- Cathleen Henry
- 6 years ago
- Views:
Transcription
1 Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 2 October 04, 2016 Due: October 18, 2016 A. Rademacher complexity The definitions and notation are those introduced in the lectures slides. 1. What is the Rademacher complexity of a hypothesis set reduced to a single hypothesis? An alternative definition of the Rademacher is based on absolute values: R (H) = 1 m E h H m σ ih(x i ). Show the following upper bound for a hypothesis set reduced to a single hypothesis h: R Ex D h ({h}) 2 (x). (1) m (hint: you can use the inequality E X EX 2 which is valid for any random variable X, by Jensen s inequality). Solution: Let h be that single hypothesis. By definition, R({h}) = 1 m m E σ i h(x i ) = 1 m m E Eσ i h(x i ) = 0, S σ since E σ σ i = 0 for all i 1, m. Using Jensen s inequality, with the 1
2 alternative definition the Rademacher complexity can be bounded as follows: R ({h}) = 1 m E σ i h(x i ) = 1 m E 2 σ i h(x i ) 1 E 2 σ i h(x i ) (by Jensen s inequality) m = 1 E σ i σ j h(x i )h(x j ) m i,j=1 = 1 m ES h(x i ) m 2 (Eσ i σ j = 0 for i j) = m 1 Ex m ES h(x 1 ) 2 h = 2 (x) m. (i.i.d. sample) 2. Fix m 1. Prove the following identities for any α R and any two hypothesis sets H and H of functions mapping from X to R: R m (αh) = α R m (H) (2) R m (H + H ) R m (H) + R m (H ). (3) R m ({max(h, h ): h H, h H }) R m (H) + R m (H ), (4) where max(h, h ) denotes the function x max x X (h(x), h (x)) (hint: you could use the identity max(a, b) = 1 2 a + b + a b valid for all a, b R and the contraction lemma (Lecture 4)). Solution: If α 0, then h αh σ i h(x i ) = h H otherwise if α < 0, then h αh σ i h(x i ) = h αh α σ i h(x i ) = α h H α σ i h(x i ) = ( α) h H 2 σ i h(x i ), ( σ i )h(x i ).
3 Since σ i and σ i have the same distribution, this shows that R m (αh) = α R m (H). The second inequality actually holds with equality, and follows from: R m (H + H ) = 1 m E = 1 m E = 1 m E h H,h H h H,h H h H σ i (h(x i ) + h (x i )) σ i h(x i ) + σ i h(x i ) + 1 m E h H,h H h H σ i h (x i ) σ i h (x i ). For the third inequality, using the identity max(a, b) = 1 2 a + b + a b valid for all a, b R, and the sub-additivity of we can write: R m ({max(h, h ): h H, h H }) = 1 2m E σ i h(x i ) + h (x i ) + h(x i ) h (x i ) h H,h H 1 2 R m(h) + R m (H ) + 1 2m E h H,h H σ i h(x i ) h (x i ). Since the absolute value function is 1-Lipschitz, by the contraction lemma, the third term can be bounded as follows 1 2m E σ i h(x i ) h (x i ) h H,h H 1 2m E σ i h(x i ) h (x i ) h H,h H 1 2m E σ i h(x i ) + σ i h (x i ) h H,h H h H,h H = 1 2m E σ i h(x i ) + 1 h H,h H 2m E σ i h (x i ) h H,h H = 1 2 R m(h) + R m (H ), using the fact that σ i and σ i follow the same distribution. 3
4 B. VC-dimension 1. What is the VC-dimension of the family of subsets of the real line x, x + 1 x + 2, + ), with x R? Solution: The VC-dimension of this family is less than 4 since the labeling + + is not possible to obtain using any hypothesis: for any set of 4 points {x 1, x 2, x 3, x 4 } with x 1 < x 2 < x 3 < x 4, the labeling of the first 3 points imposes x 1 x, x + 1, x 2 (x + 1, x + 2), and x 3 x + 2, + ) for some x R, which implies a positive labeling for x 4. The VC-dimension is equal to 3 since there is a set of 3 points that can be fuly shattered. An example is {1/4, 9/8, 7/4} with the following intervals for each labeling sequence: , +1 2, + ) + + 1/4, 5/4 9/4, + ) + + 1/2, +1/2 3/2, + ) + 0, 1 2, + ) , 0 1, + ) 1/2, 3/2 5/2, + 3/2, 5/2 7/2, + ) 2, 3 4, + ). 2. VC-dimension of sine functions. Consider the hypothesis family of sine functions: {x sign(sin(ωx)): ω R}. (a) Show that for any x R the points x, 2x, 3x and 4x cannot be shattered by this family of sine functions. Solution: Fix x R and pose there exists an ω that realizes the labeling +. Thus sin(ωx) < 0, sin(2ωx) < 0, sin(3ωx) 0 and sin(4ωx) < 0. We will show that this implies sin 2 (ωx) < 1 2 and sin 2 (ωx) 3 4, a contradiction. Using the identity sin(2θ) = 2 sin(θ) cos(θ) and the fact that sin(4ωx) < 0 we have 2 sin(2ωx) cos(2ωx) = sin(4ωx) < 0. 4
5 Since sin(2ωx) < 0 we can divide both sides of this inequality by 2 sin(2ωx) to conclude cos(2ωx) > 0. Applying the identity cos(2θ) = 1 2 sin 2 (θ) yields 1 2 sin 2 (ωx) > 0, or sin 2 (ωx) < 1 2. Using the identity sin(3θ) = 3 sin(θ) 4 sin 3 (θ) and the fact that sin(3ωx) 0 we have 3 sin(ωx) 4 sin 3 (ωx) = sin(3ωx) 0 Since sin(ωx) < 0 we can divide both sides of this inequality by sin(ωx) to conclude 3 4 sin 2 (ωx) 0, or sin 2 (ωx) 3 4. Interestingly, it is possible to find a set of four evenly spaced points that can be shattered by this family of sine functions, provided that the distance from the origin to the nearest point is allowed to differ from the distance between the points. For example, the points {1, 1.5, 2, 2.5} are shattered by the frequencies ω { 10, 9, 8, 7, 6, 5, 4, 1, 1, 2, 4, 5, 11, 14, 15, 23}. (b) Show that the VC-dimension of the family of sine functions is infinite. (hint: show that {2 m : m N} can be fully shattered for any m > 0.) Solution: For any m > 0, consider the set of points (x 1,..., x m ) with arbitrary labels (y 1,..., y m ) { 1, +1} m. Now, choose the parameter ω = π(1+ m 2i y i ) where y i = 1 y i 2. We show that this single parameter will always correctly classify the entire sample for any m > 0 and choice of labels. For any j 1, m we have ωx j = ω2 j = π(2 j + 2 i j y i) j 1 m j = π(2 j + ( 2 i j y i) + y j + ( 2 i y i)). The last term can be dropped from the sum since it only contributes multiples of 2π. Since y i {0, 1} the remaining term, that is, π(2 j + ( j 1 2i j y i ) + y j ) = π( j 1 2 i y i + 2 j + y j ), can be upper and 5
6 lower bound as follows: j 1 π( j 1 π( 2 i y i + 2 j + y j) π( j 2 i + y j) < π(1 + y j), 2 i y i + 2 j + y j) > πy j. Thus, if y j = 1 we have y j = 0 and 0 < ωx j < π, which implies sign(ωx j ) = 1. Similarly, for y j = 1 we have sign(ωx j ) = 1. C. Support Vector Machines 1. Download and install the libsvm software library from: cjlin/libsvm/ 2. Consider the spambase data set Download a shuffled version of that dataset from mohri/ml16/spambase.data.shuffled Use the libsvm scaling tool to scale the features of all the data. Use the first 3450 examples for training, the last 1151 for testing. The scaling parameters should be computed only on the training data and then applied to the test data. 3. Consider the binary classification that consists of predicting if the message if a spam using the 57 features. Use SVMs combined with polynomial kernels to tackle this binary classification problem. To do that, randomly split the training data into ten equal-sized disjoint sets. For each value of the polynomial degree, d = 1, 2, 3, 4, plot the average cross-validation error plus or minus one standard deviation as a function of C (let other parameters of polynomial kernels in libsvm be equal to their default values), varying C in powers of 2, starting from a small value C = 2 k to C = 2 k, for some value of k. k should be chosen so that you see a significant variation in training error, starting from a very high training 6
7 misclassification fraction Validation Error vs. C d=1 d=2 d=3 d= log 2 (C) Figure 1: Average error according to 10-fold cross-validation, with error-bars indicating one standard deviation. error to a low training error. Expect longer training times with libsvm as the value of C increases. Solution: Figure 1 shows the average cross-validation performance as a function of the regularization parameter C. Note that the algorithm starts to exhibit some over-fitting as C becomes very large. The performance for several choices of d and C are essentially indistinguishable; one suitable choice of optimal parameters is C = 2 9 and d = Let (C, d ) be the best pair found previously. Fix C to be C. Plot the ten-fold cross-validation error and the test errors for the hypotheses obtained as a function of d. Plot the average number of port vectors obtained as a function of d. How many of the port vectors lie on the margin hyperplanes? Solution: The first plot in Figure 2 shows that the test error decreases (slightly) with an increase in degree and also that the cross-validation error is (slightly) optimistic when compared to the test error on the held-out dataset. The second plot shows that the total number of marginal port vectors increases with d (as does the dimension of the feature space) while the total number of overall port vectors decreases. This also implies that the number of port vectors due to mistakes is decreasing, which agrees with the 7
8 misclassification fraction Error vs. Degree test error validation error count Number of Support Vectors vs. Degree num SV num marginal SV degree (d) degree (d) Figure 2: The test and validation error as a function of degree (left panel) as well as the number of total and marginal port vectors (right panel). first plot. 5. Suppose we replace in the primal optimization problem of SVMs the penalty term m ξ i = ξ 1 with ξ = max m ξ i. Give the associated dual optimization problem. Show that it differs from the standard dual optimization problem of SVMs only by the constraints, which can be expressed in terms of α 1. Solution: The optimization problem for this version of SVMs can be written as follows: min w,b,ξ 1 2 w 2 + Cξ (5) subject to y i (w x i + b) 1 ξ i 1, m ξ 0. The corresponding Lagrange function can be written as L = 1 2 w 2 + Cξ α i y i (w x i + b) 1 + ξ βξ. 8
9 Differentiating with respect to the primal variables gives: w L = 0 = w = α i y i x i b L = 0 = ξ L = 0 = α i y i = 0 α i + β = C. Plugging in the first equality in L and using the second and third yields: L = α i 1 2 α i α j y i y j (x i x j ). i,j=1 In view of the third equality, the condition β 0 can be equivalently written as m α i C. Thus, the equivalent dual optimization problem can be written as max α α i 1 2 α i α j y i y j (x i x j ) i,j=1 subject to (α 0) ( α 1 C) ( α i y i = 0). More generally, a p -constraint on ξ in the primal optimization problem leads to a q -constraint (dual norm constraint) on α in the dual, where p and q are conjugate: 1/p + 1/q = 1. 9
Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 April 5, 2013 Due: April 19, 2013
Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 April 5, 2013 Due: April 19, 2013 A. Kernels 1. Let X be a finite set. Show that the kernel
More informationwhere x i is the ith coordinate of x R N. 1. Show that the following upper bound holds for the growth function of H:
Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 2 October 25, 2017 Due: November 08, 2017 A. Growth function Growth function of stum functions.
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear
More information1. Kernel ridge regression In contrast to ordinary least squares which has a cost function. m (θ T x (i) y (i) ) 2, J(θ) = 1 2.
CS229 Problem Set #2 Solutions 1 CS 229, Public Course Problem Set #2 Solutions: Theory Kernels, SVMs, and 1. Kernel ridge regression In contrast to ordinary least squares which has a cost function J(θ)
More informationLearning with Rejection
Learning with Rejection Corinna Cortes 1, Giulia DeSalvo 2, and Mehryar Mohri 2,1 1 Google Research, 111 8th Avenue, New York, NY 2 Courant Institute of Mathematical Sciences, 251 Mercer Street, New York,
More informationMachine Learning 4771
Machine Learning 477 Instructor: Tony Jebara Topic 5 Generalization Guarantees VC-Dimension Nearest Neighbor Classification (infinite VC dimension) Structural Risk Minimization Support Vector Machines
More informationEvaluation. Andrea Passerini Machine Learning. Evaluation
Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain
More informationSupport Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs
E0 270 Machine Learning Lecture 5 (Jan 22, 203) Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in
More informationEvaluation requires to define performance measures to be optimized
Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation
More informationPAC-learning, VC Dimension and Margin-based Bounds
More details: General: http://www.learning-with-kernels.org/ Example of more complex bounds: http://www.research.ibm.com/people/t/tzhang/papers/jmlr02_cover.ps.gz PAC-learning, VC Dimension and Margin-based
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationJeff Howbert Introduction to Machine Learning Winter
Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable
More informationLecture 10: A brief introduction to Support Vector Machine
Lecture 10: A brief introduction to Support Vector Machine Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics, University of Pittsburgh Xingye Qiao Department
More informationIntroduction to Machine Learning
Introduction to Machine Learning Machine Learning: Jordan Boyd-Graber University of Maryland RADEMACHER COMPLEXITY Slides adapted from Rob Schapire Machine Learning: Jordan Boyd-Graber UMD Introduction
More informationLecture Learning infinite hypothesis class via VC-dimension and Rademacher complexity;
CSCI699: Topics in Learning and Game Theory Lecture 2 Lecturer: Ilias Diakonikolas Scribes: Li Han Today we will cover the following 2 topics: 1. Learning infinite hypothesis class via VC-dimension and
More informationLecture Support Vector Machine (SVM) Classifiers
Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in
More informationNon-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines
Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationSupport Vector Machines for Classification and Regression
CIS 520: Machine Learning Oct 04, 207 Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may
More informationFoundations of Machine Learning
Introduction to ML Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu page 1 Logistics Prerequisites: basics in linear algebra, probability, and analysis of algorithms. Workload: about
More informationSupport Vector Machines: Maximum Margin Classifiers
Support Vector Machines: Maximum Margin Classifiers Machine Learning and Pattern Recognition: September 16, 2008 Piotr Mirowski Based on slides by Sumit Chopra and Fu-Jie Huang 1 Outline What is behind
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationLinear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training
More informationi=1 cosn (x 2 i y2 i ) over RN R N. cos y sin x
Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 November 16, 017 Due: Dec 01, 017 A. Kernels Show that the following kernels K are PDS: 1.
More informationSupport Vector Machines
Wien, June, 2010 Paul Hofmarcher, Stefan Theussl, WU Wien Hofmarcher/Theussl SVM 1/21 Linear Separable Separating Hyperplanes Non-Linear Separable Soft-Margin Hyperplanes Hofmarcher/Theussl SVM 2/21 (SVM)
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationLearning Theory. Piyush Rai. CS5350/6350: Machine Learning. September 27, (CS5350/6350) Learning Theory September 27, / 14
Learning Theory Piyush Rai CS5350/6350: Machine Learning September 27, 2011 (CS5350/6350) Learning Theory September 27, 2011 1 / 14 Why Learning Theory? We want to have theoretical guarantees about our
More informationAnnouncements - Homework
Announcements - Homework Homework 1 is graded, please collect at end of lecture Homework 2 due today Homework 3 out soon (watch email) Ques 1 midterm review HW1 score distribution 40 HW1 total score 35
More informationLinear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training
More informationSupport Vector Machines
EE 17/7AT: Optimization Models in Engineering Section 11/1 - April 014 Support Vector Machines Lecturer: Arturo Fernandez Scribe: Arturo Fernandez 1 Support Vector Machines Revisited 1.1 Strictly) Separable
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationSupport Vector Machine
Andrea Passerini passerini@disi.unitn.it Machine Learning Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
More informationCS446: Machine Learning Spring Problem Set 4
CS446: Machine Learning Spring 2017 Problem Set 4 Handed Out: February 27 th, 2017 Due: March 11 th, 2017 Feel free to talk to other members of the class in doing the homework. I am more concerned that
More informationSupport Vector and Kernel Methods
SIGIR 2003 Tutorial Support Vector and Kernel Methods Thorsten Joachims Cornell University Computer Science Department tj@cs.cornell.edu http://www.joachims.org 0 Linear Classifiers Rules of the Form:
More informationLecture 9: Large Margin Classifiers. Linear Support Vector Machines
Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation
More informationLinear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)
Support vector machines In a nutshell Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers) Solution only depends on a small subset of training
More informationSupport Vector Machines, Kernel SVM
Support Vector Machines, Kernel SVM Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machine Learning Algorithms February 27, 2017 1 / 40 Outline 1 Administration 2 Review of last lecture 3 SVM
More informationDiscriminative Models
No.5 Discriminative Models Hui Jiang Department of Electrical Engineering and Computer Science Lassonde School of Engineering York University, Toronto, Canada Outline Generative vs. Discriminative models
More informationLinear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights
Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University
More informationMachine Learning. Support Vector Machines. Fabio Vandin November 20, 2017
Machine Learning Support Vector Machines Fabio Vandin November 20, 2017 1 Classification and Margin Consider a classification problem with two classes: instance set X = R d label set Y = { 1, 1}. Training
More informationNearest Neighbors Methods for Support Vector Machines
Nearest Neighbors Methods for Support Vector Machines A. J. Quiroz, Dpto. de Matemáticas. Universidad de Los Andes joint work with María González-Lima, Universidad Simón Boĺıvar and Sergio A. Camelo, Universidad
More informationCSC 411 Lecture 17: Support Vector Machine
CSC 411 Lecture 17: Support Vector Machine Ethan Fetaya, James Lucas and Emad Andrews University of Toronto CSC411 Lec17 1 / 1 Today Max-margin classification SVM Hard SVM Duality Soft SVM CSC411 Lec17
More informationSupport Vector Machines
Support Vector Machines Reading: Ben-Hur & Weston, A User s Guide to Support Vector Machines (linked from class web page) Notation Assume a binary classification problem. Instances are represented by vector
More informationHomework 3. Convex Optimization /36-725
Homework 3 Convex Optimization 10-725/36-725 Due Friday October 14 at 5:30pm submitted to Christoph Dann in Gates 8013 (Remember to a submit separate writeup for each problem, with your name at the top)
More informationSupport Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012
Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2
More informationECS171: Machine Learning
ECS171: Machine Learning Lecture 6: Training versus Testing (LFD 2.1) Cho-Jui Hsieh UC Davis Jan 29, 2018 Preamble to the theory Training versus testing Out-of-sample error (generalization error): What
More informationPerceptron Revisited: Linear Separators. Support Vector Machines
Support Vector Machines Perceptron Revisited: Linear Separators Binary classification can be viewed as the task of separating classes in feature space: w T x + b > 0 w T x + b = 0 w T x + b < 0 Department
More informationPart of the slides are adapted from Ziko Kolter
Part of the slides are adapted from Ziko Kolter OUTLINE 1 Supervised learning: classification........................................................ 2 2 Non-linear regression/classification, overfitting,
More information1. Implement AdaBoost with boosting stumps and apply the algorithm to the. Solution:
Mehryar Mohri Foundations of Machine Learning Courant Institute of Mathematical Sciences Homework assignment 3 October 31, 2016 Due: A. November 11, 2016; B. November 22, 2016 A. Boosting 1. Implement
More informationMachine Learning And Applications: Supervised Learning-SVM
Machine Learning And Applications: Supervised Learning-SVM Raphaël Bournhonesque École Normale Supérieure de Lyon, Lyon, France raphael.bournhonesque@ens-lyon.fr 1 Supervised vs unsupervised learning Machine
More informationSupport Vector Machine for Classification and Regression
Support Vector Machine for Classification and Regression Ahlame Douzal AMA-LIG, Université Joseph Fourier Master 2R - MOSIG (2013) November 25, 2013 Loss function, Separating Hyperplanes, Canonical Hyperplan
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationVC Dimension Review. The purpose of this document is to review VC dimension and PAC learning for infinite hypothesis spaces.
VC Dimension Review The purpose of this document is to review VC dimension and PAC learning for infinite hypothesis spaces. Previously, in discussing PAC learning, we were trying to answer questions about
More informationDownload the following data sets from the UC Irvine ML repository:
Mehryar Mohri Introduction to Machine Learning Courant Institute of Mathematical Sciences Homework assignment 2 Solution (written by Andres Muñoz) Perceptron Algorithm Download the following data sets
More informationGeneralization Bounds in Machine Learning. Presented by: Afshin Rostamizadeh
Generalization Bounds in Machine Learning Presented by: Afshin Rostamizadeh Outline Introduction to generalization bounds. Examples: VC-bounds Covering Number bounds Rademacher bounds Stability bounds
More informationName (NetID): (1 Point)
CS446: Machine Learning (D) Spring 2017 March 16 th, 2017 This is a closed book exam. Everything you need in order to solve the problems is supplied in the body of this exam. This exam booklet contains
More informationOutline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22
Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems
More informationComputational and Statistical Learning theory
Computational and Statistical Learning theory Problem set 2 Due: January 31st Email solutions to : karthik at ttic dot edu Notation : Input space : X Label space : Y = {±1} Sample : (x 1, y 1,..., (x n,
More informationSequential Minimal Optimization (SMO)
Data Science and Machine Intelligence Lab National Chiao Tung University May, 07 The SMO algorithm was proposed by John C. Platt in 998 and became the fastest quadratic programming optimization algorithm,
More informationSupport Vector Machine (continued)
Support Vector Machine continued) Overlapping class distribution: In practice the class-conditional distributions may overlap, so that the training data points are no longer linearly separable. We need
More informationLecture 10: Support Vector Machine and Large Margin Classifier
Lecture 10: Support Vector Machine and Large Margin Classifier Applied Multivariate Analysis Math 570, Fall 2014 Xingye Qiao Department of Mathematical Sciences Binghamton University E-mail: qiao@math.binghamton.edu
More informationAbout this class. Maximizing the Margin. Maximum margin classifiers. Picture of large and small margin hyperplanes
About this class Maximum margin classifiers SVMs: geometric derivation of the primal problem Statement of the dual problem The kernel trick SVMs as the solution to a regularization problem Maximizing the
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationPattern Recognition 2018 Support Vector Machines
Pattern Recognition 2018 Support Vector Machines Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Pattern Recognition 1 / 48 Support Vector Machines Ad Feelders ( Universiteit Utrecht
More informationEstimating the accuracy of a hypothesis Setting. Assume a binary classification setting
Estimating the accuracy of a hypothesis Setting Assume a binary classification setting Assume input/output pairs (x, y) are sampled from an unknown probability distribution D = p(x, y) Train a binary classifier
More informationNeural Networks. Prof. Dr. Rudolf Kruse. Computational Intelligence Group Faculty for Computer Science
Neural Networks Prof. Dr. Rudolf Kruse Computational Intelligence Group Faculty for Computer Science kruse@iws.cs.uni-magdeburg.de Rudolf Kruse Neural Networks 1 Supervised Learning / Support Vector Machines
More informationSupport Vector Machine via Nonlinear Rescaling Method
Manuscript Click here to download Manuscript: svm-nrm_3.tex Support Vector Machine via Nonlinear Rescaling Method Roman Polyak Department of SEOR and Department of Mathematical Sciences George Mason University
More informationIndirect Rule Learning: Support Vector Machines. Donglin Zeng, Department of Biostatistics, University of North Carolina
Indirect Rule Learning: Support Vector Machines Indirect learning: loss optimization It doesn t estimate the prediction rule f (x) directly, since most loss functions do not have explicit optimizers. Indirection
More informationThe sample complexity of agnostic learning with deterministic labels
The sample complexity of agnostic learning with deterministic labels Shai Ben-David Cheriton School of Computer Science University of Waterloo Waterloo, ON, N2L 3G CANADA shai@uwaterloo.ca Ruth Urner College
More informationGeneralization theory
Generalization theory Chapter 4 T.P. Runarsson (tpr@hi.is) and S. Sigurdsson (sven@hi.is) Introduction Suppose you are given the empirical observations, (x 1, y 1 ),..., (x l, y l ) (X Y) l. Consider the
More informationSupport Vector Machines. Machine Learning Fall 2017
Support Vector Machines Machine Learning Fall 2017 1 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost 2 Where are we? Learning algorithms Decision Trees Perceptron AdaBoost Produce
More informationCS6375: Machine Learning Gautam Kunapuli. Support Vector Machines
Gautam Kunapuli Example: Text Categorization Example: Develop a model to classify news stories into various categories based on their content. sports politics Use the bag-of-words representation for this
More informationSupport Vector Machines
Support Vector Machines Ryan M. Rifkin Google, Inc. 2008 Plan Regularization derivation of SVMs Geometric derivation of SVMs Optimality, Duality and Large Scale SVMs The Regularization Setting (Again)
More informationSupport Vector Machines.
Support Vector Machines www.cs.wisc.edu/~dpage 1 Goals for the lecture you should understand the following concepts the margin slack variables the linear support vector machine nonlinear SVMs the kernel
More information18.9 SUPPORT VECTOR MACHINES
744 Chapter 8. Learning from Examples is the fact that each regression problem will be easier to solve, because it involves only the examples with nonzero weight the examples whose kernels overlap the
More information1 Boosting. COS 511: Foundations of Machine Learning. Rob Schapire Lecture # 10 Scribe: John H. White, IV 3/6/2003
COS 511: Foundations of Machine Learning Rob Schapire Lecture # 10 Scribe: John H. White, IV 3/6/003 1 Boosting Theorem 1 With probablity 1, f co (H), and θ >0 then Pr D yf (x) 0] Pr S yf (x) θ]+o 1 (m)ln(
More informationLecture 18: Kernels Risk and Loss Support Vector Regression. Aykut Erdem December 2016 Hacettepe University
Lecture 18: Kernels Risk and Loss Support Vector Regression Aykut Erdem December 2016 Hacettepe University Administrative We will have a make-up lecture on next Saturday December 24, 2016 Presentations
More informationWarm up: risk prediction with logistic regression
Warm up: risk prediction with logistic regression Boss gives you a bunch of data on loans defaulting or not: {(x i,y i )} n i= x i 2 R d, y i 2 {, } You model the data as: P (Y = y x, w) = + exp( yw T
More informationLinear Support Vector Machine. Classification. Linear SVM. Huiping Cao. Huiping Cao, Slide 1/26
Huiping Cao, Slide 1/26 Classification Linear SVM Huiping Cao linear hyperplane (decision boundary) that will separate the data Huiping Cao, Slide 2/26 Support Vector Machines rt Vector Find a linear Machines
More informationMachine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015
Machine Learning 10-701, Fall 2015 VC Dimension and Model Complexity Eric Xing Lecture 16, November 3, 2015 Reading: Chap. 7 T.M book, and outline material Eric Xing @ CMU, 2006-2015 1 Last time: PAC and
More informationStatistical and Computational Learning Theory
Statistical and Computational Learning Theory Fundamental Question: Predict Error Rates Given: Find: The space H of hypotheses The number and distribution of the training examples S The complexity of the
More informationComputational Learning Theory. CS534 - Machine Learning
Computational Learning Theory CS534 Machine Learning Introduction Computational learning theory Provides a theoretical analysis of learning Shows when a learning algorithm can be expected to succeed Shows
More informationMachine Learning. Support Vector Machines. Manfred Huber
Machine Learning Support Vector Machines Manfred Huber 2015 1 Support Vector Machines Both logistic regression and linear discriminant analysis learn a linear discriminant function to separate the data
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction
More informationSupport Vector Machines
Two SVM tutorials linked in class website (please, read both): High-level presentation with applications (Hearst 1998) Detailed tutorial (Burges 1998) Support Vector Machines Machine Learning 10701/15781
More informationL5 Support Vector Classification
L5 Support Vector Classification Support Vector Machine Problem definition Geometrical picture Optimization problem Optimization Problem Hard margin Convexity Dual problem Soft margin problem Alexander
More informationCOS 511: Theoretical Machine Learning. Lecturer: Rob Schapire Lecture #5 Scribe: Allen(Zhelun) Wu February 19, ). Then: Pr[err D (h A ) > ɛ] δ
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #5 Scribe: Allen(Zhelun) Wu February 19, 018 Review Theorem (Occam s Razor). Say algorithm A finds a hypothesis h A H consistent with
More informationLinear Classifiers. Michael Collins. January 18, 2012
Linear Classifiers Michael Collins January 18, 2012 Today s Lecture Binary classification problems Linear classifiers The perceptron algorithm Classification Problems: An Example Goal: build a system that
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationComputational Learning Theory. Definitions
Computational Learning Theory Computational learning theory is interested in theoretical analyses of the following issues. What is needed to learn effectively? Sample complexity. How many examples? Computational
More informationComputational Learning Theory: Shattering and VC Dimensions. Machine Learning. Spring The slides are mainly from Vivek Srikumar
Computational Learning Theory: Shattering and VC Dimensions Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Computational Learning Theory The Theory of Generalization
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Support Vector Machine (SVM) Hamid R. Rabiee Hadi Asheri, Jafar Muhammadi, Nima Pourdamghani Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Introduction
More informationDoes Unlabeled Data Help?
Does Unlabeled Data Help? Worst-case Analysis of the Sample Complexity of Semi-supervised Learning. Ben-David, Lu and Pal; COLT, 2008. Presentation by Ashish Rastogi Courant Machine Learning Seminar. Outline
More informationCSE 417T: Introduction to Machine Learning. Lecture 11: Review. Henry Chai 10/02/18
CSE 417T: Introduction to Machine Learning Lecture 11: Review Henry Chai 10/02/18 Unknown Target Function!: # % Training data Formal Setup & = ( ), + ),, ( -, + - Learning Algorithm 2 Hypothesis Set H
More informationMachine Learning: Homework 5
0-60 Machine Learning: Homework 5 Due 5:0 p.m. Thursday, March, 06 TAs: Travis Dick and Han Zhao Instructions Late homework policy: Homework is worth full credit if submitted before the due date, half
More informationComputational Learning Theory: Probably Approximately Correct (PAC) Learning. Machine Learning. Spring The slides are mainly from Vivek Srikumar
Computational Learning Theory: Probably Approximately Correct (PAC) Learning Machine Learning Spring 2018 The slides are mainly from Vivek Srikumar 1 This lecture: Computational Learning Theory The Theory
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)
More informationMachine Learning A Geometric Approach
Machine Learning A Geometric Approach CIML book Chap 7.7 Linear Classification: Support Vector Machines (SVM) Professor Liang Huang some slides from Alex Smola (CMU) Linear Separator Ham Spam From Perceptron
More information