10-701/ Recitation : Kernels
1 10-701/ Recitation: Kernels. Manojit Nandi. February 27, 2014
2 Outline
- Mathematical Theory: Banach Spaces and Hilbert Spaces
- Kernels: Commonly Used Kernels
- Kernel Theory: One Weird Kernel Trick; Representer Theorem; Do You Want to Build a Snowman Kernel? (It doesn't have to be a Snowman Kernel); Operations that Preserve/Create Kernels
3 Outline (continued): Current Research in Kernel Theory (Flaxman Test; Fastfood); References

Manojit Nandi, Carnegie Mellon University, Machine Learning Department
5 IMPORTANT: The kernels used in Support Vector Machines are different from the kernels used in kernel regression. To avoid confusion, I'll refer to the Support Vector Machine kernels as Mercer Kernels and the kernel regression kernels as Smoothing Kernels.
6 Recall: A square matrix $A \in \mathbb{R}^{N \times N}$ is positive semi-definite if $u^T A u \geq 0$ for all vectors $u \in \mathbb{R}^N$. For a kernel function $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ defined over a set $\mathcal{X}$ with $|\mathcal{X}| = n$, the Gram matrix $G$ is the $n \times n$ matrix with $G_{ij} = K(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$.
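As a numerical sketch of these definitions (the points and the kernel below are chosen arbitrarily for illustration), one can build a Gram matrix and check both symmetry and positive semi-definiteness:

```python
import numpy as np

# Build the Gram matrix of the squared dot-product kernel K(x, z) = <x, z>^2
# on a few random points and confirm it is symmetric and positive
# semi-definite (all eigenvalues >= 0, up to floating-point error).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))        # 5 arbitrary points in R^3

G = (X @ X.T) ** 2                     # G_ij = K(x_i, x_j)
eigvals = np.linalg.eigvalsh(G)        # symmetric matrix -> real eigenvalues

assert np.allclose(G, G.T)             # symmetry
assert eigvals.min() > -1e-9           # PSD up to numerical tolerance
print(eigvals)
```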
7 Mathematical Theory
8 Banach Spaces and Hilbert Spaces: A Banach space is a complete vector space $V$ equipped with a norm. Examples: the $\ell_p^m$ spaces, $\mathbb{R}^m$ equipped with the norm $\|x\|_p = \left(\sum_{i=1}^m |x_i|^p\right)^{1/p}$; function spaces, with $\|f\|_p := \left(\int_{\mathcal{X}} |f(x)|^p \, dx\right)^{1/p}$.
9 A Hilbert space is a complete vector space $V$ equipped with an inner product $\langle \cdot, \cdot \rangle$. The inner product induces a norm ($\|x\| = \sqrt{\langle x, x \rangle}$), so Hilbert spaces are a special case of Banach spaces. Example: Euclidean space with the standard inner product, $x, y \in \mathbb{R}^n$; $\langle x, y \rangle = \sum_{i=1}^n x_i y_i = x^T y$. Because kernel functions are inner products in some higher-dimensional space, each kernel function corresponds to some Hilbert space $H$.
10 Kernels: A Mercer kernel is a function $K : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ defined over a set $\mathcal{X}$. The Mercer kernel represents an inner product in some higher-dimensional space. A Mercer kernel is symmetric and positive semi-definite. By Mercer's theorem, any symmetric positive semi-definite kernel represents the inner product in some higher-dimensional Hilbert space $H$: a kernel function is symmetric and positive semi-definite if and only if $K(x, x') = \langle \phi(x), \phi(x') \rangle$ for some $\phi(\cdot)$.
11 Why symmetric? Inner products are symmetric by definition, so if the kernel function represents an inner product in some Hilbert space, the kernel function must be symmetric as well: $K(x, z) = \langle \phi(x), \phi(z) \rangle = \langle \phi(z), \phi(x) \rangle = K(z, x)$.
12 Why positive semi-definite? To be positive semi-definite, the Gram matrix $G \in \mathbb{R}^{n \times n}$ must satisfy $u^T G u \geq 0$ for all $u \in \mathbb{R}^n$. Using functional analysis, one can show that $u^T G u$ equals $\langle h_u, h_u \rangle$ for some $h_u$ in the Hilbert space $H$. Since $\langle h_u, h_u \rangle \geq 0$ by the definition of an inner product, it follows that $u^T G u \geq 0$.
13 Why are kernels useful? Let $x = (x_1, x_2)$ and $z = (z_1, z_2)$. Then
$$K(x, z) = \langle x, z \rangle^2 = (x_1 z_1 + x_2 z_2)^2 = x_1^2 z_1^2 + 2 x_1 z_1 x_2 z_2 + x_2^2 z_2^2 = \langle (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2), (z_1^2, \sqrt{2}\, z_1 z_2, z_2^2) \rangle = \langle \phi(x), \phi(z) \rangle,$$
where $\phi(x) = (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$. What if $x = (x_1, x_2, x_3, x_4)$, $z = (z_1, z_2, z_3, z_4)$, and $K(x, z) = \langle x, z \rangle^{50}$? In this case, $\phi(x)$ and $\phi(z)$ are $\binom{53}{3} = 23{,}426$-dimensional vectors: from combinatorics, placing 50 unlabeled balls into 4 labeled boxes gives $\binom{53}{3}$ monomials.
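A quick numerical check of the identity above in the 2-D case (values of $x$ and $z$ chosen arbitrarily): the kernel value and the explicit feature-map inner product agree.

```python
import numpy as np

# Verify K(x, z) = <x, z>^2 equals <phi(x), phi(z)> with
# phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
def phi(x):
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_direct = np.dot(x, z) ** 2        # (1*3 + 2*(-1))^2 = 1
k_mapped = np.dot(phi(x), phi(z))   # same value via the explicit map

assert np.isclose(k_direct, k_mapped)
print(k_direct)  # 1.0
```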
14 We can either: 1. calculate $\langle x, z \rangle$ (the inner product of two four-dimensional vectors) and raise this value (a single float) to the 50th power, OR 2. map $x$ and $z$ to $\phi(x)$ and $\phi(z)$ and calculate the inner product of two 23,426-dimensional vectors. Which is faster?
15 Commonly Used Kernels. Some commonly used kernels are:
- Linear kernel: $K(x, z) = \langle x, z \rangle$
- Polynomial kernel: $K(x, z) = \langle x, z \rangle^d$, where $d$ is the degree of the polynomial
- Gaussian RBF: $K(x, z) = \exp(-\frac{1}{2\sigma^2} \|x - z\|^2) = \exp(-\gamma \|x - z\|^2)$
- Sigmoid kernel: $K(x, z) = \tanh(\alpha x^T z + \beta)$
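The four kernels above can be sketched in a few lines; the hyperparameter values below ($d$, $\gamma$, $\alpha$, $\beta$) are arbitrary choices for the demo, not values from the slides.

```python
import numpy as np

# Sketch implementations of the four commonly used kernels.
def linear(x, z):
    return x @ z

def polynomial(x, z, d=3):
    return (x @ z) ** d

def gaussian_rbf(x, z, gamma=0.5):
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sigmoid(x, z, alpha=0.1, beta=0.0):
    return np.tanh(alpha * (x @ z) + beta)

x, z = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(linear(x, z), polynomial(x, z), gaussian_rbf(x, z), sigmoid(x, z))
```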
16 Other kernels:
1. Laplacian kernel
2. ANOVA kernel
3. Circular kernel
4. Spherical kernel
5. Wave kernel
6. Power kernel
7. Log kernel
8. B-spline kernel
9. Bessel kernel
10. Cauchy kernel
11. $\chi^2$ kernel
12. Wavelet kernel
13. Bayesian kernel
14. Histogram kernel
As you can see, there are a lot of kernels.
17 The Gaussian RBF kernel is infinite-dimensional. In class, Barnabas mentioned that the Gaussian RBF kernel corresponds to an infinite-dimensional vector space. From the Moore-Aronszajn theorem, we know there is a unique Hilbert space of functions for which the Gaussian RBF is the reproducing kernel. One can show that this Hilbert space has an infinite basis, so the Gaussian RBF kernel corresponds to an infinite-dimensional vector space. A YouTube video of Abu-Mostafa explaining this in more detail is included in the references at the end of this presentation.
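One standard way to see this (a sketch of the argument, not taken from the slides): factor the RBF kernel and Taylor-expand the exponential of the inner product.

```latex
\exp\!\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)
  = \exp\!\left(-\frac{\|x\|^2}{2\sigma^2}\right)
    \exp\!\left(-\frac{\|z\|^2}{2\sigma^2}\right)
    \exp\!\left(\frac{\langle x, z\rangle}{\sigma^2}\right)
  = C(x)\,C(z)\,\sum_{k=0}^{\infty} \frac{\langle x, z\rangle^{k}}{\sigma^{2k}\,k!}
```

Each term $\langle x, z\rangle^{k}$ is a degree-$k$ polynomial kernel, so the feature map $\phi$ must stack (appropriately scaled) polynomial features of every degree $k = 0, 1, 2, \ldots$, which forces the feature space to be infinite-dimensional.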
18 Kernel Theory
19 One Weird Kernel Trick: The kernel trick is one of the reasons kernel methods are so popular in machine learning. With the kernel trick, we can substitute any dot product with a kernel function: any linear dot product in the formulation of a machine learning algorithm can be replaced with a non-linear kernel function. The non-linear kernel function may allow us to find separation or structure in the data that was not present under the linear dot product. Kernel trick: $\langle x_i, x_j \rangle$ can be replaced with $K(x_i, x_j)$ for any valid kernel function $K$. Furthermore, we can swap any kernel function for any other kernel function.
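A minimal sketch of the kernel trick in action (an illustration of my own, not from the slides): a kernelized perceptron never forms the weight vector $w = \sum_i \alpha_i y_i \phi(x_i)$; every dot product $\langle w, \phi(x) \rangle$ is computed as $\sum_i \alpha_i y_i K(x_i, x)$.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * np.sum((a - b) ** 2))

def kernel_perceptron(X, y, kernel, epochs=20):
    alpha = np.zeros(len(X))                  # one dual coefficient per point
    for _ in range(epochs):
        for j in range(len(X)):
            score = sum(alpha[i] * y[i] * kernel(X[i], X[j])
                        for i in range(len(X)))
            if np.sign(score or 1.0) != y[j]:  # mistake-driven update
                alpha[j] += 1.0
    return alpha

def predict(alpha, X, y, kernel, x):
    return np.sign(sum(alpha[i] * y[i] * kernel(X[i], x) for i in range(len(X))))

# XOR-like data: not linearly separable, but separable with the RBF kernel.
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([1, 1, -1, -1])
alpha = kernel_perceptron(X, y, rbf)
preds = [float(predict(alpha, X, y, rbf, x)) for x in X]
print(preds)  # [1.0, 1.0, -1.0, -1.0]
```

Swapping `rbf` for any other valid kernel changes the hypothesis space without touching the algorithm, which is exactly the point of the trick.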
20 Some examples of algorithms that take advantage of the kernel trick:
- Support Vector Machines (the poster child for kernel methods)
- Kernel Principal Component Analysis
- Kernel Independent Component Analysis
- Kernel K-Means
- Kernel Gaussian Process Regression
- Kernel Deep Learning
21 Representer Theorem. Theorem: Let $\mathcal{X}$ be a non-empty set and $k$ a positive-definite real-valued kernel on $\mathcal{X} \times \mathcal{X}$ with corresponding reproducing kernel Hilbert space $H_k$. Given a training sample $[(x_1, y_1), \ldots, (x_n, y_n)] \subset \mathcal{X} \times \mathbb{R}$, a strictly monotonically increasing real-valued function $g : [0, \infty) \to \mathbb{R}$, and an arbitrary empirical loss function $E : (\mathcal{X} \times \mathbb{R}^2)^n \to \mathbb{R}$, any minimizer
$$\hat{f} = \operatorname*{argmin}_{f \in H_k} \; E[(x_1, y_1, f(x_1)), \ldots, (x_n, y_n, f(x_n))] + g(\|f\|)$$
can be represented in the form $f^*(\cdot) = \sum_{i=1}^n \alpha_i k(\cdot, x_i)$, where $\alpha_i \in \mathbb{R}$ for all $i$.
22 Example: Let $H_k$ be some RKHS with kernel $k(\cdot, \cdot)$. Let $[(x_1, y_1), \ldots, (x_n, y_n)]$ be a training input-output set, and suppose our task is to find the $f$ that minimizes the following regularized objective:
$$\hat{f} = \operatorname*{argmin}_{f \in H_k} \left( \sum_{i=1}^n f(x_i)^6 \right) \sum_{i=1}^n \left[ \sin\!\left( x_i^{\,y_i f(x_i)} \right)^{25} + |y_i - f(x_i)|^{249} \right] + \exp(\|f\|_F)$$
23 Recall the objective
$$\hat{f} = \operatorname*{argmin}_{f \in H_k} \left( \sum_{i=1}^n f(x_i)^6 \right) \sum_{i=1}^n \left[ \sin\!\left( x_i^{\,y_i f(x_i)} \right)^{25} + |y_i - f(x_i)|^{249} \right] + \exp(\|f\|_F)$$
Well, $\exp(\cdot)$ is a strictly monotonically increasing function, so $\exp(\|f\|_F)$ fits the $g(\|f\|)$ part. And $\left( \sum_{i=1}^n f(x_i)^6 \right) \sum_{i=1}^n \left[ \sin\!\left( x_i^{\,y_i f(x_i)} \right)^{25} + |y_i - f(x_i)|^{249} \right]$ is some arbitrary empirical loss function of the $[x_i, y_i, f(x_i)]$. So this optimization can be expressed as
$$\hat{f} = \operatorname*{argmin}_{f \in H_k} \; E[(x_1, y_1, f(x_1)), \ldots, (x_n, y_n, f(x_n))] + g(\|f\|)$$
Therefore, by the Representer Theorem, $f^*(\cdot) = \sum_{i=1}^n \alpha_i k(x_i, \cdot)$.
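A more standard use of the Representer Theorem (my own illustration, not from the slides) is kernel ridge regression: with squared loss and $g(\|f\|) = \lambda \|f\|^2$, the dual coefficients have the closed form $\alpha = (K + \lambda I)^{-1} y$, and predictions expand $f^*$ in the training kernels exactly as the theorem promises.

```python
import numpy as np

# Kernel ridge regression: by the Representer Theorem, the minimizer of
# sum_i (y_i - f(x_i))^2 + lam * ||f||^2 is f*(.) = sum_i alpha_i K(x_i, .),
# where alpha solves the linear system (K + lam*I) alpha = y.
def rbf_gram(X, Z, gamma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))              # toy 1-D inputs
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)  # noisy sine targets

lam = 0.1
K = rbf_gram(X, X)
alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)

# Predict at new points by expanding f* in the training kernels.
X_test = np.array([[0.0], [1.5]])
preds = rbf_gram(X_test, X) @ alpha
print(preds)  # should be close to sin(0) and sin(1.5)
```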
24 Do You Want to Build a Snowman Kernel? It doesn't have to be a Snowman Kernel.
25 Operations that Preserve/Create Kernels: Let $k_1, \ldots, k_m$ be valid kernel functions defined over some set $\mathcal{X}$ with $|\mathcal{X}| = n$, and let $\alpha_1, \ldots, \alpha_m$ be non-negative coefficients. The following are valid kernels:
- $K(x, z) = \sum_{i=1}^m \alpha_i k_i(x, z)$ (closed under non-negative linear combination)
- $K(x, z) = \prod_{i=1}^m k_i(x, z)$ (closed under multiplication)
- $K(x, z) = k_1(f(x), f(z))$ for any function $f : \mathcal{X} \to \mathcal{X}$
- $K(x, z) = g(x)\, g(z)$ for any function $g : \mathcal{X} \to \mathbb{R}$
- $K(x, z) = x^T A^T A z$ for any matrix $A \in \mathbb{R}^{m \times n}$
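The first two closure rules can be spot-checked numerically (points and coefficients below are arbitrary): a non-negative combination of Gram matrices, and their elementwise (Schur) product, remain symmetric and positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 2))

G1 = X @ X.T                                             # linear kernel
G2 = np.exp(-((X[:, None] - X[None, :]) ** 2).sum(-1))   # Gaussian RBF

combo = 2.0 * G1 + 0.5 * G2   # non-negative linear combination of kernels
prod = G1 * G2                # product of kernels = elementwise Gram product

for G in (combo, prod):
    assert np.allclose(G, G.T)                  # still symmetric
    assert np.linalg.eigvalsh(G).min() > -1e-9  # still PSD (up to tolerance)
print("closure checks passed")
```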
26 Proof: $K(x, z) = \sum_{i=1}^m \alpha_i k_i(x, z)$ is a valid kernel.
Symmetry: $K(z, x) = \sum_{i=1}^m \alpha_i k_i(z, x)$. Because each $k_i$ is a kernel function, it is symmetric, so $k_i(z, x) = k_i(x, z)$. Therefore $K(z, x) = \sum_{i=1}^m \alpha_i k_i(z, x) = \sum_{i=1}^m \alpha_i k_i(x, z) = K(x, z)$.
Positive semi-definiteness: Let $u \in \mathbb{R}^n$ be arbitrary. The Gram matrix of $K$, denoted $G$, satisfies $G_{ij} = K(x_i, x_j) = \sum_{l=1}^m \alpha_l k_l(x_i, x_j)$, so $G = \alpha_1 G_1 + \cdots + \alpha_m G_m$. Now $u^T G u = u^T(\alpha_1 G_1 + \cdots + \alpha_m G_m) u = \alpha_1 u^T G_1 u + \cdots + \alpha_m u^T G_m u = \sum_{l=1}^m \alpha_l u^T G_l u$. Each $u^T G_l u \geq 0$ and each $\alpha_l \geq 0$, so $\alpha_l u^T G_l u \geq 0$ and hence $u^T G u \geq 0$.
27 Proof: $K(x, z) = x^T A^T A z$ for any matrix $A \in \mathbb{R}^{m \times n}$ is a valid kernel. For this proof, we show that $K(x, z)$ is an inner product in some Hilbert space. Let $\phi(x) = Ax$. Then $\langle \phi(x), \phi(z) \rangle = \phi(x)^T \phi(z) = (Ax)^T (Az) = x^T A^T A z = K(x, z)$. Therefore $K(x, z)$ is an inner product in some Hilbert space.
28 Just because it is a valid kernel does not mean it is a good kernel.
30 Some Exercises. Prove the following are not valid kernels:
1. $K(x, z) = k_1(x, z) - k_2(x, z)$ for valid kernels $k_1, k_2$
2. $K(x, z) = \exp(\gamma \|x - z\|^2)$ for some $\gamma > 0$
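As a hint for exercise 1 (an illustration of my own; the choice of kernels and points is arbitrary): it suffices to exhibit one Gram matrix of $k_1 - k_2$ with a negative eigenvalue.

```python
import numpy as np

# Take k1 = linear kernel and k2 = 2 * linear kernel (valid, since
# non-negative scalings of kernels are valid). Then K(x, z) = -<x, z>,
# whose Gram matrix on any single nonzero point is negative definite,
# so K cannot be a valid kernel.
x = np.array([1.0])
G = np.array([[-np.dot(x, x)]])      # 1x1 Gram matrix of k1 - k2 at {x}
assert np.linalg.eigvalsh(G).min() < 0
print(np.linalg.eigvalsh(G))         # [-1.]
```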
31 Current Research in Kernel Theory
32 Flaxman Test: "Correlates of homicide: New space/time interaction tests for spatiotemporal point processes." Goal: develop a statistical test for the strength of interaction between time and space for different types of homicides. Result: using reproducing kernel Hilbert spaces, one can project the space-time data into a higher-dimensional space and test for a significant interaction between kernelized distances in space and time using the Hilbert-Schmidt Independence Criterion.
33 Fastfood: "Fastfood: Approximating Kernel Expansions in Loglinear Time." Problem: kernel methods do not scale well to large datasets, especially at prediction time, because the algorithm must compute the kernel distance between the new vector and all of the data in the training set. Solution: using advanced mathematical black magic, dense random Gaussian matrices can be approximated using Hadamard matrices and diagonal Gaussian matrices:
$$V = \frac{1}{\sigma \sqrt{d}} \, S H G \Pi H B$$
34 Here $\Pi \in \{0, 1\}^{n \times n}$ is a permutation matrix, $H$ is a Hadamard matrix, $S$ is a random scaling matrix with elements along the diagonal, $B$ has random $\pm 1$ entries along the diagonal, and $G$ has random Gaussian entries. This algorithm performs 100x faster than the previous leading algorithm (Random Kitchen Sinks) and uses 1000x less memory.
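For context, a sketch of the slower dense-Gaussian construction that Fastfood accelerates (Random Kitchen Sinks style random features; this is an illustration, not the Hadamard-based Fastfood construction itself): random cosine features whose dot products approximate the Gaussian RBF kernel.

```python
import numpy as np

# Random Fourier features: z(x) = sqrt(2/D) * cos(Vx + b) with Gaussian V
# gives z(x).z(y) ~ exp(-||x - y||^2 / (2 sigma^2)). Fastfood replaces the
# dense Gaussian matrix V with the structured product S H G Pi H B, cutting
# the matrix-vector multiply from O(Dd) to O(D log d).
rng = np.random.default_rng(3)
d, D, sigma = 5, 4096, 1.0

V = rng.standard_normal((D, d)) / sigma      # dense Gaussian random matrix
b = rng.uniform(0.0, 2.0 * np.pi, D)         # random phases

def features(x):
    return np.sqrt(2.0 / D) * np.cos(V @ x + b)

x, y = rng.standard_normal(d), rng.standard_normal(d)
approx = features(x) @ features(y)
exact = np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))
print(abs(approx - exact))  # small Monte Carlo error, shrinking as D grows
```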
35 References
- Representer Theorem and Kernel Methods
- Predicting Structured Data, by Alex Smola
- Kernel Machines
- String and Tree Kernels
- Why the Gaussian Kernel is Infinite-Dimensional (Abu-Mostafa, YouTube)
36 Questions?
More informationSupport Vector Machine & Its Applications
Support Vector Machine & Its Applications A portion (1/3) of the slides are taken from Prof. Andrew Moore s SVM tutorial at http://www.cs.cmu.edu/~awm/tutorials Mingyue Tan The University of British Columbia
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationSupport Vector Machine
Support Vector Machine Fabrice Rossi SAMM Université Paris 1 Panthéon Sorbonne 2018 Outline Linear Support Vector Machine Kernelized SVM Kernels 2 From ERM to RLM Empirical Risk Minimization in the binary
More informationKernels for Multi task Learning
Kernels for Multi task Learning Charles A Micchelli Department of Mathematics and Statistics State University of New York, The University at Albany 1400 Washington Avenue, Albany, NY, 12222, USA Massimiliano
More informationReproducing Kernel Hilbert Spaces
Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 12, 2007 About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert
More informationLecture 4 February 2
4-1 EECS 281B / STAT 241B: Advanced Topics in Statistical Learning Spring 29 Lecture 4 February 2 Lecturer: Martin Wainwright Scribe: Luqman Hodgkinson Note: These lecture notes are still rough, and have
More informationFunctional Analysis Review
Outline 9.520: Statistical Learning Theory and Applications February 8, 2010 Outline 1 2 3 4 Vector Space Outline A vector space is a set V with binary operations +: V V V and : R V V such that for all
More informationA Kernel Between Sets of Vectors
A Kernel Between Sets of Vectors Risi Kondor Tony Jebara Columbia University, New York, USA. 1 A Kernel between Sets of Vectors In SVM, Gassian Processes, Kernel PCA, kernel K de nes feature map Φ : X
More informationStat542 (F11) Statistical Learning. First consider the scenario where the two classes of points are separable.
Linear SVM (separable case) First consider the scenario where the two classes of points are separable. It s desirable to have the width (called margin) between the two dashed lines to be large, i.e., have
More informationReproducing Kernel Hilbert Spaces
Reproducing Kernel Hilbert Spaces Lorenzo Rosasco 9.520 Class 03 February 11, 2009 About this class Goal To introduce a particularly useful family of hypothesis spaces called Reproducing Kernel Hilbert
More informationKernel methods for comparing distributions, measuring dependence
Kernel methods for comparing distributions, measuring dependence Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Principal component analysis Given a set of M centered observations
More informationMachine Learning. Lecture 6: Support Vector Machine. Feng Li.
Machine Learning Lecture 6: Support Vector Machine Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Warm Up 2 / 80 Warm Up (Contd.)
More informationLecture 10: A brief introduction to Support Vector Machine
Lecture 10: A brief introduction to Support Vector Machine Advanced Applied Multivariate Analysis STAT 2221, Fall 2013 Sungkyu Jung Department of Statistics, University of Pittsburgh Xingye Qiao Department
More informationDeviations from linear separability. Kernel methods. Basis expansion for quadratic boundaries. Adding new features Systematic deviation
Deviations from linear separability Kernel methods CSE 250B Noise Find a separator that minimizes a convex loss function related to the number of mistakes. e.g. SVM, logistic regression. Systematic deviation
More informationRKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets Class 22, 2004 Tomaso Poggio and Sayan Mukherjee
RKHS, Mercer s theorem, Unbounded domains, Frames and Wavelets 9.520 Class 22, 2004 Tomaso Poggio and Sayan Mukherjee About this class Goal To introduce an alternate perspective of RKHS via integral operators
More informationJoint distribution optimal transportation for domain adaptation
Joint distribution optimal transportation for domain adaptation Changhuang Wan Mechanical and Aerospace Engineering Department The Ohio State University March 8 th, 2018 Joint distribution optimal transportation
More informationSupport'Vector'Machines. Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan
Support'Vector'Machines Machine(Learning(Spring(2018 March(5(2018 Kasthuri Kannan kasthuri.kannan@nyumc.org Overview Support Vector Machines for Classification Linear Discrimination Nonlinear Discrimination
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationSupport Vector Machines
Two SVM tutorials linked in class website (please, read both): High-level presentation with applications (Hearst 1998) Detailed tutorial (Burges 1998) Support Vector Machines Machine Learning 10701/15781
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA
More informationKernel methods CSE 250B
Kernel methods CSE 250B Deviations from linear separability Noise Find a separator that minimizes a convex loss function related to the number of mistakes. e.g. SVM, logistic regression. Deviations from
More information1. Mathematical Foundations of Machine Learning
1. Mathematical Foundations of Machine Learning 1.1 Basic Concepts Definition of Learning Definition [Mitchell (1997)] A computer program is said to learn from experience E with respect to some class of
More informationCS4495/6495 Introduction to Computer Vision. 8C-L3 Support Vector Machines
CS4495/6495 Introduction to Computer Vision 8C-L3 Support Vector Machines Discriminative classifiers Discriminative classifiers find a division (surface) in feature space that separates the classes Several
More information