10-701/15-781 Recitation: Kernels


1 10-701/15-781 Recitation: Kernels. Manojit Nandi, February 27, 2014

2 Outline
- Mathematical Theory
  - Banach Spaces and Hilbert Spaces
  - Kernels
  - Commonly Used Kernels
- Kernel Theory
  - One Weird Kernel Trick
  - Representer Theorem
  - Do You Want to Build a Snowman Kernel? (It doesn't have to be a Snowman Kernel.)
  - Operations that Preserve/Create Kernels

3 Outline (continued)
- Current Research in Kernel Theory
  - Flaxman Test
  - Fastfood
- References


5 IMPORTANT: The Kernels used in Support Vector Machines are different from the Kernels used in Kernel Regression. To avoid confusion, I'll refer to the Support Vector Machine kernels as Mercer Kernels and the Kernel Regression kernels as Smoothing Kernels.

6 Recall: A square matrix A ∈ R^(N×N) is positive semi-definite if u^T A u ≥ 0 for all vectors u ∈ R^N. For a kernel function K : X × X → R defined over a set X with |X| = n, the Gram matrix G is the n×n matrix with entries G_ij = K(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩.
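
To make this concrete, here is a minimal numerical sketch (not from the slides; `rbf_kernel` is a helper we define for illustration) that builds a Gram matrix for a small sample and checks the two defining properties:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Illustrative kernel: K(x, z) = exp(-gamma * ||x - z||^2).
    return np.exp(-gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # five points in R^3

# Gram matrix G_ij = K(x_i, x_j)
G = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])

# A valid kernel's Gram matrix is symmetric and positive semi-definite:
# symmetric entries, and all eigenvalues >= 0 (up to floating-point error).
print(np.allclose(G, G.T))                    # True
print(np.linalg.eigvalsh(G).min() >= -1e-10)  # True
```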

7 Mathematical Theory

8 Banach Spaces and Hilbert Spaces
A Banach Space is a complete vector space V equipped with a norm. Examples:
- ℓ_p^m spaces: R^m equipped with the norm ‖x‖_p = (Σ_{i=1}^m |x_i|^p)^(1/p)
- Function spaces: ‖f‖_p := (∫_X |f(x)|^p dx)^(1/p)

9 A Hilbert Space is a complete vector space V equipped with an inner product ⟨·,·⟩. The inner product induces a norm (‖x‖ = √⟨x, x⟩), so Hilbert Spaces are a special case of Banach Spaces. Example: Euclidean space with the standard inner product: for x, y ∈ R^n, ⟨x, y⟩ = Σ_{i=1}^n x_i y_i = x^T y. Because kernel functions are inner products in some higher-dimensional space, each kernel function corresponds to some Hilbert Space H.

10 Kernels
A Mercer Kernel is a function K defined over a set X such that K : X × X → R. The Mercer Kernel represents an inner product in some higher-dimensional space. A Mercer Kernel is symmetric and positive semi-definite. By Mercer's theorem, we know that any symmetric positive semi-definite kernel represents the inner product in some higher-dimensional Hilbert Space H:
K symmetric and positive semi-definite ⟺ K(x, x′) = ⟨φ(x), φ(x′)⟩ for some φ(·).

11 Why Symmetric? Inner products are symmetric by definition, so if the kernel function represents an inner product in some Hilbert Space, then the kernel function must be symmetric as well: K(x, z) = ⟨φ(x), φ(z)⟩ = ⟨φ(z), φ(x)⟩ = K(z, x).

12 Why Positive Semi-definite? In order to be positive semi-definite, the Gram Matrix G ∈ R^(n×n) must satisfy u^T G u ≥ 0 for all u ∈ R^n. Using functional analysis, one can show that u^T G u corresponds to ⟨h_u, h_u⟩ for some h_u in some Hilbert Space H. Because ⟨h_u, h_u⟩ ≥ 0 by the definition of an inner product, and ⟨h_u, h_u⟩ = u^T G u, it follows that u^T G u ≥ 0.

13 Why are Kernels useful? Let x = (x_1, x_2) and z = (z_1, z_2). Then
K(x, z) = ⟨x, z⟩² = (x_1 z_1 + x_2 z_2)² = x_1² z_1² + 2 x_1 z_1 x_2 z_2 + x_2² z_2² = ⟨(x_1², √2 x_1 x_2, x_2²), (z_1², √2 z_1 z_2, z_2²)⟩ = ⟨φ(x), φ(z)⟩,
where φ(x) = (x_1², √2 x_1 x_2, x_2²).
What if x = (x_1, x_2, x_3, x_4), z = (z_1, z_2, z_3, z_4), and K(x, z) = ⟨x, z⟩^50? In this case, φ(x) and φ(z) are 23,426-dimensional vectors (from combinatorics, the number of ways to place 50 unlabeled balls into 4 labeled boxes is C(53, 3) = 23,426).
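
A quick numerical check of this identity (a sketch we add here; note the √2 on the cross term, which is easy to lose):

```python
import numpy as np

def phi(x):
    # Explicit feature map for K(x, z) = <x, z>^2 on R^2:
    # phi(x) = (x1^2, sqrt(2) * x1 * x2, x2^2).
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

lhs = (x @ z) ** 2     # kernel computed in R^2
rhs = phi(x) @ phi(z)  # inner product computed in R^3
print(np.isclose(lhs, rhs))  # True
```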

14 We can either: 1. calculate ⟨x, z⟩ (the inner product of two four-dimensional vectors) and raise this value (a single float) to the 50th power, OR 2. map x and z to φ(x) and φ(z), and calculate the inner product of two 23,426-dimensional vectors. Which is faster?
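
The kernel evaluation, of course. Here is a sketch of the gap, using degree 2 so the explicit map is still feasible to materialize (for degree 50 in R^4 it would already be a 23,426-dimensional vector per point):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 1000
x, z = rng.normal(size=d), rng.normal(size=d)

# Option 1: one O(d) dot product, then a scalar power.
k_direct = (x @ z) ** 2

# Option 2: materialize all d^2 pairwise products x_i * x_j
# (a 1,000,000-dimensional feature vector here), then take the inner product.
phi_x = np.outer(x, x).ravel()
phi_z = np.outer(z, z).ravel()
k_explicit = phi_x @ phi_z

print(np.isclose(k_direct, k_explicit))  # True; option 1 is vastly cheaper
```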

15 Commonly Used Kernels
Some commonly used kernels are (one-line sketches follow the list):
- Linear Kernel: K(x, z) = ⟨x, z⟩
- Polynomial Kernel: K(x, z) = ⟨x, z⟩^d, where d is the degree of the polynomial
- Gaussian RBF: K(x, z) = exp(−‖x − z‖² / (2σ²)) = exp(−γ‖x − z‖²)
- Sigmoid Kernel: K(x, z) = tanh(α x^T z + β)
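
For reference, one-line implementations of these four kernels (a sketch; the parameter defaults are ours, chosen only for illustration):

```python
import numpy as np

linear  = lambda x, z: x @ z
poly    = lambda x, z, d=3: (x @ z) ** d
rbf     = lambda x, z, gamma=0.5: np.exp(-gamma * np.sum((x - z) ** 2))
sigmoid = lambda x, z, alpha=0.1, beta=0.0: np.tanh(alpha * (x @ z) + beta)
```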

16 Other Kernels: 1. Laplacian Kernel 2. ANOVA Kernel 3. Circular Kernel 4. Spherical Kernel 5. Wave Kernel 6. Power Kernel 7. Log Kernel 8. B-Spline Kernel 9. Bessel Kernel 10. Cauchy Kernel 11. χ² Kernel 12. Wavelet Kernel 13. Bayesian Kernel 14. Histogram Kernel. As you can see, there are a lot of kernels.

17 The Gaussian RBF Kernel is infinite-dimensional. In class, Barnabás mentioned that the Gaussian RBF Kernel corresponds to an infinite-dimensional vector space. From the Moore-Aronszajn theorem, we know there is a unique Hilbert space of functions for which the Gaussian RBF is a reproducing kernel. One can show that this Hilbert Space has an infinite basis, and so the Gaussian RBF Kernel corresponds to an infinite-dimensional vector space. A YouTube video of Abu-Mostafa explaining this in more detail is included in the references at the end of this presentation.

18 Kernel Theory

19 One Weird Kernel Trick
The kernel trick is one of the reasons kernel methods are so popular in machine learning. With the kernel trick, we can substitute any dot product with a kernel function. This means any linear dot product in the formulation of a machine learning algorithm can be replaced with a non-linear kernel function. The non-linear kernel function may allow us to find separation or structure in the data that was not present under the linear dot product.
Kernel Trick: ⟨x_i, x_j⟩ can be replaced with K(x_i, x_j) for any valid kernel function K. Furthermore, we can swap any kernel function for any other kernel function.
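
For example, any expression built from dot products can be kernelized. The squared distance between two points in feature space expands into three inner products, so it can be computed without ever constructing φ (a sketch; `rbf` is the helper defined above):

```python
import numpy as np

def rbf(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def feature_space_sq_dist(x, z, k):
    # ||phi(x) - phi(z)||^2 = <phi(x),phi(x)> - 2<phi(x),phi(z)> + <phi(z),phi(z)>
    #                       = k(x, x) - 2 k(x, z) + k(z, z)
    return k(x, x) - 2 * k(x, z) + k(z, z)

x, z = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(feature_space_sq_dist(x, z, rbf))
```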

20 Some examples of algorithms that take advantage of the kernel trick (a usage sketch follows the list):
- Support Vector Machines (the poster child for kernel methods)
- Kernel Principal Component Analysis
- Kernel Independent Component Analysis
- Kernel K-Means
- Kernel Gaussian Process Regression
- Kernel Deep Learning
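
In practice, libraries expose the kernel as a plug-in argument. A minimal sketch with scikit-learn (assuming it is installed; the dataset and settings are ours): the same SVM separates concentric circles with an RBF kernel but not with a linear one.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in R^2.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kern in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kern).fit(X, y)
    print(kern, clf.score(X, y))  # rbf should be near 1.0; linear near chance
```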

21 Representer Theorem
Theorem. Let X be a non-empty set and k a positive-definite real-valued kernel on X × X with corresponding reproducing kernel Hilbert space H_k. Given a training sample (x_1, y_1), ..., (x_n, y_n) ∈ X × R, a strictly monotonically increasing real-valued function g : [0, ∞) → R, and an arbitrary empirical loss function E : (X × R²)^n → R, any f̂ satisfying
f̂ = argmin_{f ∈ H_k} E[(x_1, y_1, f(x_1)), ..., (x_n, y_n, f(x_n))] + g(‖f‖)
can be represented in the form
f̂(·) = Σ_{i=1}^n α_i k(·, x_i), where α_i ∈ R for all i.

22 Example: Let H_k be some RKHS function space with kernel k(·,·). Let (x_1, y_1), ..., (x_n, y_n) be a training input-output set; our task is to find the f that minimizes the following regularized empirical loss:
f̂ = argmin_{f ∈ H_k} (Σ_{i=1}^n f(x_i)^6) Σ_{i=1}^n [sin(x_i^{y_i f(x_i)})^{25} + |y_i − f(x_i)|^{249}] + exp(‖f‖_F)

23 f̂ = argmin_{f ∈ H_k} (Σ_{i=1}^n f(x_i)^6) Σ_{i=1}^n [sin(x_i^{y_i f(x_i)})^{25} + |y_i − f(x_i)|^{249}] + exp(‖f‖_F)
Well, exp(·) is a strictly monotonically increasing function, so exp(‖f‖_F) fits the g(‖f‖) part. (Σ_{i=1}^n f(x_i)^6) Σ_{i=1}^n [sin(x_i^{y_i f(x_i)})^{25} + |y_i − f(x_i)|^{249}] is some arbitrary empirical loss function of the (x_i, y_i, f(x_i)). So this optimization can be expressed as
f̂ = argmin_{f ∈ H_k} E[(x_1, y_1, f(x_1)), ..., (x_n, y_n, f(x_n))] + g(‖f‖).
Therefore, by the Representer Theorem, f̂(·) = Σ_{i=1}^n α_i K(x_i, ·).
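
For a friendlier loss, the α's even have a closed form. With squared loss and penalty g(‖f‖) = λ‖f‖², the minimizer is kernel ridge regression, and solving for the representer coefficients gives α = (G + λI)^(−1) y. A sketch under those assumptions (the data and parameters are ours):

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    return np.exp(-gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)

# Representer theorem: f_hat(.) = sum_i alpha_i k(., x_i).
# For squared loss + ridge penalty, alpha = (G + lam * I)^{-1} y.
G = np.array([[rbf(xi, xj) for xj in X] for xi in X])
lam = 0.1
alpha = np.linalg.solve(G + lam * np.eye(len(X)), y)

def f_hat(x_new):
    # Prediction is exactly the finite kernel expansion the theorem guarantees.
    return sum(a * rbf(x_new, xi) for a, xi in zip(alpha, X))

print(f_hat(np.array([1.0])))  # should be near sin(1.0) ~ 0.84
```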

24 Do You Want to Build a Snowman Kernel? It doesn't have to be a Snowman Kernel.

25 Operations that Preserve/Create Kernels
Let k_1, ..., k_m be valid kernel functions defined over some set X with |X| = n, and let α_1, ..., α_m be non-negative coefficients. The following are valid kernels (see the sketch after this list):
- K(x, z) = Σ_{i=1}^m α_i k_i(x, z) (closed under non-negative linear combination)
- K(x, z) = Π_{i=1}^m k_i(x, z) (closed under multiplication)
- K(x, z) = k_1(f(x), f(z)) for any function f : X → X
- K(x, z) = g(x) g(z) for any function g : X → R
- K(x, z) = x^T A^T A z for any matrix A ∈ R^(m×n)
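
These closure rules compose mechanically, so they are easy to mirror in code. A sketch of higher-order helpers (the names are ours, not from the slides) that build new kernels from old ones:

```python
import numpy as np

def k_sum(kernels, alphas):
    # Non-negative linear combination of kernels.
    return lambda x, z: sum(a * k(x, z) for a, k in zip(alphas, kernels))

def k_prod(kernels):
    # Product of kernels.
    return lambda x, z: np.prod([k(x, z) for k in kernels])

def k_warp(k1, f):
    # k1(f(x), f(z)) for any f: X -> X.
    return lambda x, z: k1(f(x), f(z))

def k_outer(g):
    # g(x) * g(z) for any g: X -> R.
    return lambda x, z: g(x) * g(z)

lin = lambda x, z: x @ z
rbf = lambda x, z: np.exp(-np.sum((x - z) ** 2))
k_new = k_sum([lin, rbf], [0.5, 2.0])  # still a valid kernel
```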

26 Proof: K(x, z) = Σ_{i=1}^m α_i k_i(x, z) is a valid kernel.
Symmetry: K(z, x) = Σ_{i=1}^m α_i k_i(z, x). Because each k_i is a kernel function, it is symmetric, so k_i(z, x) = k_i(x, z). Therefore K(z, x) = Σ_{i=1}^m α_i k_i(z, x) = Σ_{i=1}^m α_i k_i(x, z) = K(x, z).
Positive semi-definiteness: Let u ∈ R^n be arbitrary. The Gram matrix of K, denoted G, has entries G_{jℓ} = K(x_j, x_ℓ) = Σ_{i=1}^m α_i k_i(x_j, x_ℓ), so G = α_1 G_1 + ... + α_m G_m. Now u^T G u = u^T (α_1 G_1 + ... + α_m G_m) u = α_1 u^T G_1 u + ... + α_m u^T G_m u = Σ_{i=1}^m α_i u^T G_i u. Each u^T G_i u ≥ 0 (each k_i is a valid kernel) and each α_i ≥ 0, so u^T G u ≥ 0.

27 Proof: K(x, z) = x^T A^T A z for any matrix A ∈ R^(m×n) is a valid kernel.
For this proof, we show that K(x, z) is an inner product on some Hilbert Space. Let φ(x) = Ax. Then ⟨φ(x), φ(z)⟩ = φ(x)^T φ(z) = (Ax)^T (Az) = x^T A^T A z = K(x, z). Therefore, K(x, z) is an inner product on some Hilbert Space.
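
A short numerical confirmation of the identity in this proof (a sketch with arbitrary random inputs):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 3))                    # any A in R^{m x n}
x, z = rng.normal(size=3), rng.normal(size=3)

# phi(x) = Ax, so x^T A^T A z = <Ax, Az>.
print(np.isclose(x @ A.T @ A @ z, (A @ x) @ (A @ z)))  # True
```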

28 Just because it is a valid kernel does not mean it is a good kernel.


30 Some Exercises
Prove the following are not valid kernels (a numerical sanity check for exercise 2 follows below):
1. K(x, z) = k_1(x, z) − k_2(x, z) for k_1, k_2 valid kernels
2. K(x, z) = exp(γ‖x − z‖²) for some γ > 0
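
A numerical experiment is not a proof, but it can point you toward a counterexample: if the Gram matrix of a candidate function has a negative eigenvalue, the function cannot be a valid kernel. A sketch for exercise 2 (`bad_k` is our name for the candidate):

```python
import numpy as np

def bad_k(x, z, gamma=1.0):
    # Candidate from exercise 2: note the positive exponent.
    return np.exp(gamma * np.sum((x - z) ** 2))

rng = np.random.default_rng(3)
X = rng.normal(size=(6, 2))
G = np.array([[bad_k(xi, xj) for xj in X] for xi in X])

# Diagonal entries are 1 while off-diagonals exceed 1, so some 2x2 principal
# minor is negative and G must have a negative eigenvalue.
print(np.linalg.eigvalsh(G).min())  # negative => not positive semi-definite
```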

31 Current Research in Kernel Theory

32 Flaxman Test
Correlates of homicide: New space/time interaction tests for spatiotemporal point processes.
Goal: Develop a statistical test for the strength of interaction between time and space for different types of homicides.
Result: Using Reproducing Kernel Hilbert spaces, one can project the space-time data into a higher-dimensional space and test for a significant interaction between the kernelized distances in space and time using the Hilbert-Schmidt Independence Criterion.
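
The slides don't give the estimator, but for reference a standard (biased) empirical HSIC is tr(K H L H)/(n−1)², with H the centering matrix; larger values indicate stronger dependence. A sketch under those assumptions:

```python
import numpy as np

def gram(X, gamma=1.0):
    # Pairwise RBF Gram matrix for a sample of shape (n, d).
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

def hsic(X, Y, gamma=1.0):
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K, L = gram(X, gamma), gram(Y, gamma)
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(4)
t = rng.uniform(size=(100, 1))
print(hsic(t, np.sin(4 * t)))               # dependent: noticeably larger
print(hsic(t, rng.uniform(size=(100, 1))))  # independent: near zero
```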

33 Fastfood
Fastfood: Approximating Kernel Expansions in Loglinear Time.
Problem: Kernel methods do not scale well to large datasets, especially at prediction time, because the algorithm has to compute the kernel distance between the new vector and all of the data in the training set.
Solution: Using advanced mathematical black magic, dense random Gaussian matrices can be approximated using Hadamard matrices and diagonal Gaussian matrices:
V = (1/(σ√d)) S H G Π H B

34 Here Π ∈ {0, 1}^(d×d) is a permutation matrix, H is a Hadamard matrix, S is a random scaling matrix with elements along the diagonal, B has random ±1 entries along the diagonal, and G has random Gaussian entries along the diagonal. This algorithm performs 100x faster than the previous leading algorithm (Random Kitchen Sinks) and uses 1000x less memory.
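
A schematic sketch of one Fastfood block, materializing every matrix explicitly (which defeats the loglinear cost; real implementations apply H via the fast Walsh-Hadamard transform and never form it). Two assumptions of ours: the scaling matrix S is simplified to Gaussian entries here, where the paper draws it from a chi-based law, and the cos/sin featurization is the usual random-Fourier-features recipe rather than anything specified on the slide.

```python
import numpy as np
from scipy.linalg import hadamard

d, sigma = 8, 1.0  # d must be a power of two
rng = np.random.default_rng(5)

H = hadamard(d).astype(float)                 # Hadamard matrix
B = np.diag(rng.choice([-1.0, 1.0], size=d))  # random +/-1 diagonal
P = np.eye(d)[rng.permutation(d)]             # permutation matrix (Pi)
G = np.diag(rng.normal(size=d))               # Gaussian diagonal
S = np.diag(rng.normal(size=d))               # scaling diagonal (simplified)

V = (S @ H @ G @ P @ H @ B) / (sigma * np.sqrt(d))

x = rng.normal(size=d)
features = np.concatenate([np.cos(V @ x), np.sin(V @ x)])  # RFF-style features
print(features.shape)  # (16,)
```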

35 References
- Representer Theorem and Kernel Methods
- Predicting Structured Data by Alex Smola
- Kernel Machines
- String and Tree Kernels
- Why Gaussian Kernel is Infinite (the Abu-Mostafa YouTube video mentioned on slide 17)

36 Questions?
