CIS 520: Machine Learning Oct 09, Kernel Methods

CIS 520: Machine Learning, Oct 09, 2017: Kernel Methods

Lecturer: Shivani Agarwal

Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the material discussed in the lecture (and vice versa).

Outline
- Nonlinear models via basis functions
- Closer look at the SVM dual: kernel functions, kernel SVM
- RKHSs and the Representer Theorem
- Kernel logistic regression
- Kernel ridge regression

1  Nonlinear Models via Basis Functions

Let X = R^d. We have seen methods for learning linear models of the form h(x) = sign(w⊤x + b) for binary classification (such as logistic regression and SVMs), and f(x) = w⊤x + b for regression (such as linear least squares regression and SVR). What if we want to learn a nonlinear model? What would be a simple way to achieve this using the methods we have seen so far?

One way to achieve this is to map instances x ∈ R^d to some new feature vectors φ(x) ∈ R^n via some nonlinear feature mapping φ : R^d → R^n, and then to learn a linear model in this transformed space. For example, if one maps instances x ∈ R^d to n = (1 + 2d + C(d,2))-dimensional feature vectors

    φ(x) = (1, x_1, …, x_d, x_1^2, …, x_d^2, x_1 x_2, …, x_{d−1} x_d)⊤,

then learning a linear model in the transformed space is equivalent to learning a quadratic model in the original instance space. In general, one can choose any basis functions φ_1, …, φ_n : X → R, and learn a linear model over these: w⊤φ(x) + b, where w ∈ R^n (in fact, one can do this for X ≠ R^d as well). For example, in least squares regression applied to a training sample S = ((x_1, y_1), …, (x_m, y_m)) ∈ (R^d × R)^m, one would simply replace the matrix X ∈ R^{m×d} with the design matrix Φ ∈ R^{m×n}, where Φ_{ij} = φ_j(x_i).

What is a potential difficulty in doing this? If n is large (e.g. as would be the case if the feature mapping φ corresponded to a high-degree polynomial), then the above approach can be computationally expensive. In this lecture we look at a technique that allows one to implement the above idea efficiently for many algorithms. We start by taking a closer look at the SVM dual, which we derived in the last lecture.

2  Closer Look at the SVM Dual: Kernel Functions, Kernel SVM

Recall the form of the dual we derived for the (soft-margin) linear SVM:

    max_α   Σ_{i=1}^m α_i − (1/2) Σ_{i=1}^m Σ_{j=1}^m α_i α_j y_i y_j (x_i⊤x_j)      (1)
    subject to   Σ_{i=1}^m α_i y_i = 0                                               (2)
                 0 ≤ α_i ≤ C,   i = 1, …, m.                                         (3)

If we implement this on feature vectors φ(x_i) ∈ R^n in place of x_i ∈ R^d, we get the following optimization problem:

    max_α   Σ_{i=1}^m α_i − (1/2) Σ_{i=1}^m Σ_{j=1}^m α_i α_j y_i y_j (φ(x_i)⊤φ(x_j))   (4)
    subject to   Σ_{i=1}^m α_i y_i = 0                                                   (5)
                 0 ≤ α_i ≤ C,   i = 1, …, m.                                             (6)

This involves computing dot products between vectors φ(x_i), φ(x_j) in R^n. Similarly, using the learned model to make predictions on a new test point x ∈ R^d also involves computing dot products between vectors in R^n:

    h(x) = sign( Σ_{i ∈ SV} α_i y_i (φ(x_i)⊤φ(x)) + b ).

For example, as we saw above, one can learn a quadratic classifier in X = R^2 by learning a linear classifier in φ(R^2) ⊆ R^6, where

    φ((x_1, x_2)⊤) = (1, x_1, x_2, x_1^2, x_2^2, x_1 x_2)⊤;

clearly, a straightforward approach to learning an SVM classifier in this space (and applying it to a new test point) will involve computing dot products in R^6 (more generally, when learning a degree-q polynomial classifier in R^d, such a straightforward approach will involve computing dot products in R^n for n = O(d^q)).
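As a concrete illustration of learning with an explicit basis expansion, here is a minimal numpy sketch (our own toy example, not part of the notes): it builds the design matrix Φ with Φ_ij = φ_j(x_i) for the quadratic basis in R^2 and solves an ordinary least squares problem in the transformed space.

```python
import numpy as np

# Minimal sketch (our own toy example): least squares regression with an
# explicit quadratic basis expansion in R^2,
#   phi(x) = (1, x1, x2, x1^2, x2^2, x1*x2),
# i.e. a linear model in the transformed space = a quadratic model in x.

def phi(X):
    """Map each row x = (x1, x2) to the quadratic feature vector."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1**2, x2**2, x1 * x2])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 1]  # quadratic target

Phi = phi(X)                                  # design matrix, Phi_ij = phi_j(x_i)
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # ordinary least squares over Phi

print(np.max(np.abs(Phi @ w - y)))            # essentially zero: exact quadratic fit
```

Since the synthetic target is itself quadratic, the recovered weights match the generating coefficients exactly; for a degree-q expansion in R^d, however, the number of columns of Φ grows as O(d^q), which is precisely the cost discussed next.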
Now, consider replacing the dot products φ(x)⊤φ(x′) in the above example with K(x, x′), where for x, x′ ∈ R^2,

    K(x, x′) = (x⊤x′ + 1)^2.

It can be verified (exercise!) that K(x, x′) = φ_K(x)⊤φ_K(x′), where

    φ_K((x_1, x_2)⊤) = (1, x_1^2, x_2^2, √2 x_1, √2 x_2, √2 x_1 x_2)⊤.

Thus, using K(x, x′) above instead of φ(x)⊤φ(x′) implicitly computes dot products in R^6, with explicit computation required only in R^2!

In fact, one can use any symmetric, positive semidefinite kernel function K : X × X → R (also called a Mercer kernel function) in the SVM algorithm directly, even if the feature space implemented by the kernel function cannot be described explicitly. Any such kernel function yields a convex dual problem; if K is positive definite, then K also corresponds to inner products in some inner product space V (i.e. K(x, x′) = ⟨φ(x), φ(x′)⟩ for some φ : X → V).

For Euclidean instance spaces X = R^d, examples of commonly used kernel functions include the polynomial kernel K(x, x′) = (x⊤x′ + 1)^q, which results in learning a degree-q polynomial threshold classifier, and the Gaussian kernel, also known as the radial basis function (RBF) kernel, K(x, x′) = exp(−‖x − x′‖_2^2 / (2σ^2)) (where σ > 0 is a parameter of the kernel), which effectively implements dot products in an infinite-dimensional inner product space; in both cases, evaluating the kernel K(x, x′) at any two points x, x′ requires only O(d) computation time. Kernel functions can also be used for non-vectorial data (X ≠ R^d); for example, kernel functions are often used to implicitly embed instance spaces containing strings, trees, etc. into an inner product space, and to implicitly learn a linear classifier in this space. Intuitively, it is helpful to think of kernel functions as capturing some sort of similarity between pairs of instances in X.

To summarize, given a training sample S = ((x_1, y_1), …, (x_m, y_m)) ∈ (X × {±1})^m, in order to learn a kernel SVM classifier using a kernel function K : X × X → R, one simply solves the kernel SVM dual given by

    max_α   Σ_{i=1}^m α_i − (1/2) Σ_{i=1}^m Σ_{j=1}^m α_i α_j y_i y_j K(x_i, x_j)     (7)
    subject to   Σ_{i=1}^m α_i y_i = 0                                                (8)
                 0 ≤ α_i ≤ C,   i = 1, …, m,                                          (9)

and then predicts the label of a new instance x ∈ X according to

    h(x) = sign( Σ_{i ∈ SV} α_i y_i K(x_i, x) + b ),

where

    b = (1/|SV|) Σ_{i ∈ SV} ( y_i − Σ_{j ∈ SV} α_j y_j K(x_i, x_j) ).
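The identity behind the kernel trick can be checked numerically. The following sketch (the identity is the exercise stated in the notes; the code is ours) verifies that K(x, x′) = (x⊤x′ + 1)^2 equals φ_K(x)⊤φ_K(x′) for the explicit 6-dimensional feature map above, on random points in R^2.

```python
import numpy as np

# Numeric check (code is ours) of the kernel-trick identity:
# K(x, x') = (x.x' + 1)^2 equals phi_K(x).phi_K(x') for the explicit
# 6-dimensional feature map phi_K(x) = (1, x1^2, x2^2, √2 x1, √2 x2, √2 x1 x2).

def K(x, xp):
    return (x @ xp + 1.0) ** 2

def phi_K(x):
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, x1**2, x2**2, s * x1, s * x2, s * x1 * x2])

rng = np.random.default_rng(1)
for _ in range(100):
    x, xp = rng.normal(size=2), rng.normal(size=2)
    assert np.isclose(K(x, xp), phi_K(x) @ phi_K(xp))
print("kernel trick identity verified on 100 random pairs")
```

Evaluating K costs one dot product in R^2, while the right-hand side requires forming both 6-dimensional vectors first; the gap widens rapidly with d and q.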
3  RKHSs and the Representer Theorem

Let K : X × X → R be a symmetric positive definite kernel function. Let

    F_K^0 = { f : X → R : f(x) = Σ_{i=1}^r α_i K(x_i, x) for some r ∈ Z_+, α_i ∈ R, x_i ∈ X }.

For f, g ∈ F_K^0 with f(x) = Σ_{i=1}^r α_i K(x_i, x) and g(x) = Σ_{j=1}^s β_j K(x′_j, x), define

    ⟨f, g⟩_K = Σ_{i=1}^r Σ_{j=1}^s α_i β_j K(x_i, x′_j)     (10)
    ‖f‖_K = √⟨f, f⟩_K.                                      (11)

Let F_K be the completion of F_K^0 under the metric induced by the above norm.¹ Then F_K is called the reproducing kernel Hilbert space (RKHS) associated with K.²

Note that the SVM classifier learned using kernel K is of the form h(x) = sign(f(x) + b), where f(x) = Σ_{i ∈ SV} α_i y_i K(x_i, x), i.e. where f ∈ F_K. In fact, consider the following optimization problem:

    min_{f ∈ F_K, b ∈ R}   (1/m) Σ_{i=1}^m (1 − y_i (f(x_i) + b))_+ + λ ‖f‖_K^2.

It turns out that the above SVM solution (with C = 1/(2λm)) is a solution to this problem, i.e. the kernel SVM solution minimizes the RKHS-norm-regularized hinge loss over all functions of the form f(x) + b for f ∈ F_K, b ∈ R. More generally, we have the following result:

Theorem (Representer Theorem). Let K : X × X → R be a positive definite kernel function. Let Y ⊆ R. Let S = ((x_1, y_1), …, (x_m, y_m)) ∈ (X × Y)^m. Let L : R^m × Y^m → R. Let Ω : R_+ → R_+ be a monotonically increasing function. Then for λ > 0, there is a solution to the optimization problem

    min_{f ∈ F_K, b ∈ R}   L( (f(x_1) + b, …, f(x_m) + b), (y_1, …, y_m) ) + λ Ω(‖f‖_K^2)

of the form

    f(x) = Σ_{i=1}^m α_i K(x_i, x)   for some α_1, …, α_m ∈ R.

If Ω is strictly increasing, then all solutions have this form.

The above result tells us that even if F_K is an infinite-dimensional space, any optimization problem resulting from minimizing a loss over a finite training sample, regularized by some increasing function of the RKHS-norm, is effectively a finite-dimensional optimization problem; moreover, the solution to this problem can be written as a kernel expansion over the training points. In particular, minimizing any other loss over F_K (regularized by the RKHS-norm) will also yield a solution of this form!
Exercise: Show that linear functions f : R^d → R of the form f(x) = w⊤x form an RKHS with the linear kernel K : R^d × R^d → R given by K(x, x′) = x⊤x′, and with ‖f‖_K^2 = ‖w‖_2^2.

¹ The metric induced by the norm ‖·‖_K is given by d_K(f, g) = ‖f − g‖_K. The completion of F_K^0 is simply F_K^0 together with any limit points of Cauchy sequences in F_K^0 under this metric.
² The name "reproducing kernel Hilbert space" comes from the following reproducing property: for any x ∈ X, define K_x : X → R as K_x(x′) = K(x, x′); then for any f ∈ F_K, we have ⟨f, K_x⟩_K = f(x).
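To make these definitions concrete, here is a small numeric illustration (our own toy example, with a Gaussian kernel and made-up points, not from the notes) of the inner product (10) and of the reproducing property for functions in F_K^0.

```python
import numpy as np

# Numeric illustration (our own toy example, Gaussian kernel) of the RKHS
# inner product (10) and the reproducing property <f, K_x>_K = f(x).

def K(u, v, sigma=1.0):
    return np.exp(-np.sum((u - v) ** 2) / (2 * sigma ** 2))

def inner_K(alpha, pts_a, beta, pts_b):
    """<f, g>_K per (10) for f = sum_i alpha_i K(a_i, .), g = sum_j beta_j K(b_j, .)."""
    return sum(ai * bj * K(a, b)
               for ai, a in zip(alpha, pts_a)
               for bj, b in zip(beta, pts_b))

rng = np.random.default_rng(2)
pts = rng.normal(size=(5, 3))       # expansion points x_1, ..., x_r in R^3
alpha = rng.normal(size=5)          # coefficients of f = sum_i alpha_i K(x_i, .)
f = lambda x: sum(ai * K(a, x) for ai, a in zip(alpha, pts))

x = rng.normal(size=3)
lhs = inner_K(alpha, pts, [1.0], [x])        # <f, K_x>_K: K_x has one point, coeff 1
print(np.isclose(lhs, f(x)))                 # reproducing property: True
print(inner_K(alpha, pts, alpha, pts) >= 0)  # ||f||_K^2 >= 0 since K is PSD: True
```

On F_K^0 the reproducing property is immediate from definition (10), as the computation above shows; the content of the RKHS construction is that it extends to the completion F_K.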
4  Kernel Logistic Regression

Given a training sample S ∈ (X × {±1})^m and kernel function K : X × X → R, the kernel logistic regression classifier is given by the solution to the following optimization problem:

    min_{f ∈ F_K, b ∈ R}   (1/m) Σ_{i=1}^m ln(1 + e^{−y_i (f(x_i) + b)}) + λ ‖f‖_K^2.

Since we know from the Representer Theorem that the solution has the form f(x) = Σ_{i=1}^m α_i K(x_i, x), we can write the above as an optimization problem over α, b:

    min_{α ∈ R^m, b ∈ R}   (1/m) Σ_{i=1}^m ln(1 + e^{−y_i (Σ_{j=1}^m α_j K(x_j, x_i) + b)}) + λ Σ_{i=1}^m Σ_{j=1}^m α_i α_j K(x_i, x_j).

This is of a similar form as in standard logistic regression, with m basis functions φ_j(x) = K(x_j, x) for j ∈ [m] (and w ↦ α)! In particular, define K ∈ R^{m×m} as K_{ij} = K(x_i, x_j) (this is often called the Gram matrix), and let k_i denote the i-th column of this matrix. Then we can write the above as simply

    min_{α ∈ R^m, b ∈ R}   (1/m) Σ_{i=1}^m ln(1 + e^{−y_i (α⊤k_i + b)}) + λ α⊤Kα,

which is similar to the form for standard linear logistic regression (with feature vectors k_i), except for the regularizer being α⊤Kα rather than ‖α‖_2^2, and can be solved similarly as before, using similar numerical optimization methods. We note that, unlike for SVMs, here in general the solution has α_i ≠ 0 for all i ∈ [m]. A variant of logistic regression called the import vector machine (IVM) adopts a greedy approach to find a subset IV ⊆ [m] such that the function

    f(x) + b = Σ_{i ∈ IV} α_i K(x_i, x) + b

gives good performance. Compared to SVMs, IVMs can provide more natural class probability estimates, as well as more natural extensions to multiclass classification.
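The optimization over (α, b) can be carried out with plain gradient descent. The following is a minimal sketch (our own implementation and toy data, not code from the notes; the kernel width, learning rate, and iteration count are arbitrary choices):

```python
import numpy as np

# Minimal sketch (our own implementation and toy data): kernel logistic
# regression trained by gradient descent on
#   (1/m) sum_i ln(1 + exp(-y_i ((G alpha)_i + b))) + lam * alpha^T G alpha,
# where G is the Gram matrix of an RBF kernel.

def rbf_gram(X, Z, sigma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0, 1.0, -1.0)  # nonlinear labels

G = rbf_gram(X, X)
m, lam, lr = len(y), 1e-3, 0.2
alpha, b = np.zeros(m), 0.0
for _ in range(3000):
    z = y * (G @ alpha + b)
    p = 0.5 * (1.0 - np.tanh(z / 2.0))        # numerically stable sigmoid(-z)
    grad_alpha = -(G @ (y * p)) / m + 2 * lam * (G @ alpha)
    grad_b = -(y * p).mean()
    alpha -= lr * grad_alpha
    b -= lr * grad_b

pred = np.sign(G @ alpha + b)
print("training accuracy:", (pred == y).mean())
```

Prediction on a new point x uses f(x) + b = Σ_j α_j K(x_j, x) + b, i.e. one kernel evaluation per training point; and, as noted above, α here is in general fully dense, unlike the sparse SVM dual solution.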
5  Kernel Ridge Regression

Given a training sample S ∈ (X × R)^m and kernel function K : X × X → R, consider first a kernel ridge regression formulation for learning a function f ∈ F_K:

    min_{f ∈ F_K}   (1/m) Σ_{i=1}^m (y_i − f(x_i))^2 + λ ‖f‖_K^2.

Again, since we know from the Representer Theorem that the solution has the form f(x) = Σ_{i=1}^m α_i K(x_i, x), we can write the above as an optimization problem over α:

    min_{α ∈ R^m}   (1/m) Σ_{i=1}^m (y_i − Σ_{j=1}^m α_j K(x_j, x_i))^2 + λ Σ_{i=1}^m Σ_{j=1}^m α_i α_j K(x_i, x_j),

or in matrix notation,

    min_{α ∈ R^m}   (1/m) Σ_{i=1}^m (y_i − α⊤k_i)^2 + λ α⊤Kα.

Again, this is of the same form as standard linear ridge regression, with feature vectors k_i and with regularizer α⊤Kα rather than ‖α‖_2^2. If K is positive definite, in which case the Gram matrix K is invertible, then setting the gradient of the objective above w.r.t. α to zero can be seen to yield

    α = (K + λm I_m)^{−1} y,

where as before I_m is the m × m identity matrix and y = (y_1, …, y_m)⊤ ∈ R^m.

Exercise: Show that if X = R^d and one wants to explicitly include a bias term b in the linear ridge regression solution which is not included in the regularization, then defining

    X̃ = [ x_1⊤ 1 ; … ; x_m⊤ 1 ] ∈ R^{m×(d+1)},   w̃ = [ w ; b ],   L = [ I_d 0 ; 0⊤ 0 ],

one gets the solution w̃ = (X̃⊤X̃ + λm L)^{−1} X̃⊤y. How would you extend this to learning a function of the form f(x) + b for f ∈ F_K, b ∈ R in the kernel ridge regression setting?
More informationStefanos Zafeiriou, Anastasios Tefas, and Ioannis Pitas
GENDER DETERMINATION USING A SUPPORT VECTOR MACHINE VARIANT Stefanos Zafeiriou, Anastasios Tefas, and Ioannis Pitas Artificial Intelligence and Information Analysis Lab/Department of Informatics, Aristotle
More information9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee
9.520: Class 20 Bayesian Interpretations Tomaso Poggio and Sayan Mukherjee Plan Bayesian interpretation of Regularization Bayesian interpretation of the regularizer Bayesian interpretation of quadratic
More informationKernel methods and the exponential family
Kernel methods and the exponential family Stéphane Canu 1 and Alex J. Smola 2 1 PSI  FRE CNRS 2645 INSA de Rouen, France St Etienne du Rouvray, France Stephane.Canu@insarouen.fr 2 Statistical Machine
More informationLecture 4. 1 Learning NonLinear Classifiers. 2 The Kernel Trick. CS621 Theory Gems September 27, 2012
CS62 Theory Gems September 27, 22 Lecture 4 Lecturer: Aleksander Mądry Scribes: Alhussein Fawzi Learning NonLinear Classifiers In the previous lectures, we have focused on finding linear classifiers,
More informationFinitedimensional spaces. C n is the space of ntuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product
Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a prehilbert space, or a unitary space) if there is a mapping (, )
More informationRobust KernelBased Regression
Robust KernelBased Regression Budi Santosa Department of Industrial Engineering Sepuluh Nopember Institute of Technology Kampus ITS Surabaya Surabaya 60111,Indonesia Theodore B. Trafalis School of Industrial
More informationMachine Learning. VC Dimension and Model Complexity. Eric Xing , Fall 2015
Machine Learning 10701, Fall 2015 VC Dimension and Model Complexity Eric Xing Lecture 16, November 3, 2015 Reading: Chap. 7 T.M book, and outline material Eric Xing @ CMU, 20062015 1 Last time: PAC and
More informationCSE3210 Machine Learning: Basic Principles
CSE3210 Machine Learning: Basic Principles Lecture 3: Regression I slides by Markus Heinonen Department of Computer Science Aalto University, School of Science Autumn (Period I) 2017 1 / 48 In a nutshell
More informationSupport Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab
Support Vector Machines Machine Learning Series Jerry Jeychandra Bloh Lab Outline Main goal: To understand how support vector achines (SVMs) perfor optial classification for labelled data sets, also a
More information1 Review of Winnow Algorithm
COS 511: Theoretical Machine Learning Lecturer: Rob Schapire Lecture # 17 Scribe: Xingyuan Fang, Ethan April 9th, 2013 1 Review of Winnow Algorithm We have studied Winnow algorithm in Algorithm 1. Algorithm
More informationStreamSVM Linear SVMs and Logistic Regression When Data Does Not Fit In Memory
StreamSVM Linear SVMs and Logistic Regression When Data Does Not Fit In Memory S.V. N. (vishy) Vishwanathan Purdue University and Microsoft vishy@purdue.edu October 9, 2012 S.V. N. Vishwanathan (Purdue,
More informationMLCC 2017 Regularization Networks I: Linear Models
MLCC 2017 Regularization Networks I: Linear Models Lorenzo Rosasco UNIGEMITIIT June 27, 2017 About this class We introduce a class of learning algorithms based on Tikhonov regularization We study computational
More informationDATA MINING AND MACHINE LEARNING
DATA MINING AND MACHINE LEARNING Lecture 5: Regularization and loss functions Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Loss functions Loss functions for regression problems
More informationSupport vector machines Lecture 4
Support vector machines Lecture 4 David Sontag New York University Slides adapted from Luke Zettlemoyer, Vibhav Gogate, and Carlos Guestrin Q: What does the Perceptron mistake bound tell us? Theorem: The
More informationSupport Vector Machines
CS229 Lecture notes Andrew Ng Part V Support Vector Machines This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe is indeed the best)
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More information