Representer theorem and kernel examples


CS281B/Stat241B (Spring 2008) Statistical Learning Theory
Lecture: 8
Lecturer: Peter Bartlett
Scribe: Howard Lei

1 Representer Theorem

Recall that the SVM optimization problem can be expressed as follows:

$$J(f^*) = \min_{f \in H} J(f), \qquad J(f) = \frac{C}{n} \sum_{i=1}^{n} \mathrm{hingeloss}(f(x_i), y_i) + \|f\|_H^2,$$

where $H$ is a Reproducing Kernel Hilbert Space (RKHS).

Theorem 1.1. Fix a kernel $k$, and let $H$ be the corresponding RKHS. Then, for a function $L: \mathbb{R}^n \to \mathbb{R}$ and a non-decreasing $\Omega: \mathbb{R} \to \mathbb{R}$, if the optimization problem can be expressed as

$$J(f^*) = \min_{f \in H} J(f) = \min_{f \in H} \left[ L(f(x_1), \ldots, f(x_n)) + \Omega(\|f\|_H^2) \right],$$

then there is a solution of the form

$$f^* = \sum_{i=1}^{n} \alpha_i k(x_i, \cdot).$$

Furthermore, if $\Omega$ is strictly increasing, then all solutions have this form.

This shows that to solve the SVM optimization problem, we only need to solve for the $\alpha_i$, which agrees with the solution obtained via the Lagrangian formulation of the problem. Furthermore, the solution lies in the span of the kernel functions $k(x_i, \cdot)$.

Proof. Suppose we project $f$ onto the subspace $\mathrm{span}\{k(x_i, \cdot) : 1 \le i \le n\}$, obtaining $f_s$ (the component along the subspace) and $f_\perp$ (the component perpendicular to the subspace). We have

$$f = f_s + f_\perp, \qquad \|f\|_H^2 = \|f_s\|_H^2 + \|f_\perp\|_H^2 \ge \|f_s\|_H^2.$$

Since $\Omega$ is non-decreasing, $\Omega(\|f\|_H^2) \ge \Omega(\|f_s\|_H^2)$, implying that $\Omega(\cdot)$ is minimized if $f$ lies in the subspace. Furthermore, since the kernel $k$ has the reproducing property, we have

$$f(x_i) = \langle f, k(x_i, \cdot) \rangle = \langle f_s, k(x_i, \cdot) \rangle + \langle f_\perp, k(x_i, \cdot) \rangle = \langle f_s, k(x_i, \cdot) \rangle = f_s(x_i),$$

implying that

$$L(f(x_1), \ldots, f(x_n)) = L(f_s(x_1), \ldots, f_s(x_n)).$$

Hence $L(\cdot)$ depends only on the component of $f$ lying in the subspace $\mathrm{span}\{k(x_i, \cdot) : 1 \le i \le n\}$, and $\Omega(\cdot)$ is minimized if $f$ lies in that subspace. Hence $J(f)$ is minimized if $f$ lies in that subspace, and we can express the minimizer as

$$f^*(\cdot) = \sum_{i=1}^{n} \alpha_i k(x_i, \cdot).$$

Note that if $\Omega(\cdot)$ is strictly increasing, then $\|f_\perp\|_H$ must be zero for $f$ to be a minimizer of $J(f)$: otherwise replacing $f$ by $f_s$ would leave $L$ unchanged and strictly decrease $\Omega$. This implies that $f^*$ must lie in the subspace $\mathrm{span}\{k(x_i, \cdot) : 1 \le i \le n\}$.

2 Constructing Kernels

In this section, we discuss ways to construct new kernels from previously defined kernels. Suppose $k_1$ and $k_2$ are valid (symmetric, positive definite) kernels on $X$, with feature maps $\Phi_1$ and $\Phi_2$. Then the following are valid kernels:

1. $k(u, v) = \alpha k_1(u, v) + \beta k_2(u, v)$, for $\alpha, \beta \ge 0$.
Since $\alpha k_1(u, v) = \langle \sqrt{\alpha}\,\Phi_1(u), \sqrt{\alpha}\,\Phi_1(v) \rangle$ and $\beta k_2(u, v) = \langle \sqrt{\beta}\,\Phi_2(u), \sqrt{\beta}\,\Phi_2(v) \rangle$, we have

$$k(u, v) = \alpha k_1(u, v) + \beta k_2(u, v) \tag{1}$$
$$= \langle \sqrt{\alpha}\,\Phi_1(u), \sqrt{\alpha}\,\Phi_1(v) \rangle + \langle \sqrt{\beta}\,\Phi_2(u), \sqrt{\beta}\,\Phi_2(v) \rangle \tag{2}$$
$$= \langle [\sqrt{\alpha}\,\Phi_1(u) \;\; \sqrt{\beta}\,\Phi_2(u)], [\sqrt{\alpha}\,\Phi_1(v) \;\; \sqrt{\beta}\,\Phi_2(v)] \rangle, \tag{3}$$

and we see that $k(u, v)$ can be expressed as an inner product.

2. $k(u, v) = k_1(u, v)\, k_2(u, v)$.
Note that the Gram matrix $K$ for $k$ is the Hadamard product (or element-by-element product) of $K_1$ and $K_2$ ($K = K_1 \circ K_2$). Suppose that $K_1$ and $K_2$ are covariance matrices of $(X_1, \ldots, X_n)$ and $(Y_1, \ldots, Y_n)$ respectively, where the two vectors are zero-mean and independent of each other. Then $K$ is simply the covariance matrix of $(X_1 Y_1, \ldots, X_n Y_n)$, implying that it is symmetric and positive semi-definite.

3. $k(u, v) = k_1(f(u), f(v))$, where $f: X \to X$.
Since $f$ maps the domain into itself, $k$ is simply a different kernel on the same domain:

$$k(u, v) = k_1(f(u), f(v)) = \langle \Phi_1(f(u)), \Phi_1(f(v)) \rangle = \langle \Phi_f(u), \Phi_f(v) \rangle, \qquad \Phi_f = \Phi_1 \circ f.$$

4. $k(u, v) = g(u)\, g(v)$, for $g: X \to \mathbb{R}$.
We can express the Gram matrix $K$ as the outer product $\gamma\gamma^\top$ of the vector $\gamma = [g(x_1), \ldots, g(x_n)]^\top$. Hence $K$ is symmetric and positive semi-definite with rank 1. (It is positive semi-definite because the only non-zero eigenvalue of $\gamma\gamma^\top$ equals the trace of $\gamma\gamma^\top$, which is the trace of $\gamma^\top\gamma$, which is simply $\gamma^\top\gamma \ge 0$.)

5. $k(u, v) = f(k_1(u, v))$, where $f$ is a polynomial with positive coefficients.
Since each polynomial term is a product of kernels with a positive coefficient, the proof follows by applying 1 and 2.

6. $k(u, v) = \exp(k_1(u, v))$.
Since

$$\exp(x) = \lim_{i \to \infty} \left( 1 + x + \cdots + \frac{x^i}{i!} \right),$$

the proof follows from 5 and the fact that

$$k(u, v) = \lim_{i \to \infty} k_i(u, v),$$

where $k_i$ is the kernel obtained from the degree-$i$ truncation of the series.

7. $k(u, v) = \exp\left( -\frac{\|u - v\|^2}{\sigma^2} \right)$.
We have

$$k(u, v) = \exp\left( -\frac{\|u - v\|^2}{\sigma^2} \right) = \exp\left( -\frac{\|u\|^2 - 2u^\top v + \|v\|^2}{\sigma^2} \right) \tag{4}$$
$$= \exp\left( -\frac{\|u\|^2}{\sigma^2} \right) \exp\left( -\frac{\|v\|^2}{\sigma^2} \right) \exp\left( \frac{2u^\top v}{\sigma^2} \right) \tag{5}$$
$$= \left( g(u)\, g(v) \right) \exp(k_1(u, v)), \tag{6}$$

with $g(u) = \exp(-\|u\|^2/\sigma^2)$ and $k_1(u, v) = 2u^\top v/\sigma^2$. Here $g(u)g(v)$ is a kernel according to 4, and $\exp(k_1(u, v))$ is a kernel according to 6. According to 2, the product of two kernels is a valid kernel.

Note that the Gaussian kernel is translation-invariant: $k(u, v)$ can be expressed as $f(u - v) = f(x)$ with $x = u - v$.

Example: Translation-invariant kernels

Consider a function $f: [-\pi, \pi] \to \mathbb{R}$, and suppose that $f$ is continuous and even (i.e. $f(x) = f(-x)$). Then we can express $f$ via the Fourier expansion as

$$f(x) = \sum_{n=0}^{\infty} a_n \cos(nx).$$

Suppose in addition that $a_n \ge 0$. If we let $x$ be the difference of $u$ and $v$, then, using $\cos(n(u - v)) = \cos(nu)\cos(nv) + \sin(nu)\sin(nv)$, we have

$$f(x) = f(u - v) = a_0 + \sum_{n=1}^{\infty} a_n \left( \sin(nu)\sin(nv) + \cos(nu)\cos(nv) \right) \tag{7}$$
$$= \sum_{i=0}^{\infty} \lambda_i \Psi_i(u) \Psi_i(v), \tag{8}$$

where $\{\Psi_i\} = \{\sin(nu) : n \ge 1\} \cup \{\cos(nu) : n \ge 0\}$ and the $\lambda_i$ are the corresponding coefficients $a_n$. We see that $f(u - v)$ is a valid kernel that is translation invariant. This example shows that we can choose the kernel by choosing the $a_i$ coefficients, which is equivalent to choosing a filter.

Example: Bag-of-words kernel

Suppose that $\Phi_w(d)$ is the number of times word $w$ appears in document $d$. If we want to classify documents by their word counts, we can use the kernel $k(d_1, d_2) = \langle \Phi(d_1), \Phi(d_2) \rangle$. (In practice, these counts are weighted to take into account the relative frequency of different words.)

Example: Marginalized kernel

Given the probability distribution $p(x, h)$ (and hence $p(h \mid x)$) and a kernel $k((x, h), (x', h'))$ defined for $(x, h)$ pairs, we can obtain a kernel on only the $x$'s as follows:

$$k_m(x, x') = \sum_{h, h'} k((x, h), (x', h'))\, p(h \mid x)\, p(h' \mid x').$$

Exercise: Prove that this is a valid kernel!

Example: Convolution kernel (or string kernel)

Define $a_i$ to be a letter of the alphabet $\Sigma$, $s = (s_1, \ldots, s_l)$ to be a string of letters, and $\Sigma^*$ to be the space of all possible letter sequences. We say that $s$ has $a = (a_1, \ldots, a_n)$ as a subsequence if there exists a sequence of indices $I = (i_1, \ldots, i_n)$, where $i_1 < i_2 < \cdots < i_n$, with $s_{i_j} = a_j$ for $j = 1, \ldots, n$. Define the length of the sequence of indices $(i_1, \ldots, i_n)$ forming the subsequence as

$$l(I) = i_n - i_1 + 1.$$

For simplicity, we use the notation $s[I] = a$. Define, for fixed $n$, the feature map for a particular sequence $a$ and string $s$:

$$\Phi_a(s) = \sum_{I :\, s[I] = a} \lambda^{l(I)},$$

where $\lambda \in (0, 1)$. To compare two strings $s$ and $s'$, we can use the following kernel:

$$k(s, s') = \sum_{a \in \Sigma^n} \Phi_a(s)\, \Phi_a(s').$$
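The feature map above can be made concrete with a brute-force Python sketch that enumerates every increasing index sequence of length $n$. It assumes the span convention $l(I) = i_n - i_1 + 1$, uses 0-based indices, and requires $n \ge 1$; the function names `subsequence_features` and `subsequence_kernel` are illustrative and do not come from the notes. The enumeration grows combinatorially with the string length, so this is only meant for short strings.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(s, n, lam):
    """Map each length-n subsequence a of s to Phi_a(s) = sum over I of lam**l(I)."""
    features = defaultdict(float)
    # combinations() yields strictly increasing index tuples i_1 < ... < i_n.
    for idx in combinations(range(len(s)), n):
        a = "".join(s[i] for i in idx)
        span = idx[-1] - idx[0] + 1  # assumed convention: l(I) = i_n - i_1 + 1
        features[a] += lam ** span
    return features


def subsequence_kernel(s, t, n, lam):
    """k(s, t) = sum over subsequences a of Phi_a(s) * Phi_a(t)."""
    fs = subsequence_features(s, n, lam)
    ft = subsequence_features(t, n, lam)
    # Only subsequences present in both strings contribute to the sum.
    return sum(value * ft[a] for a, value in fs.items() if a in ft)


print(subsequence_kernel("cat", "car", 2, 0.5))  # 0.0625
```

With $n = 2$ and $\lambda = 0.5$, the strings "cat" and "car" share only the subsequence "ca", which is contiguous in both, so the kernel value is $\lambda^2 \cdot \lambda^2 = 0.0625$.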

We can also derive the above kernel via convolution. Define the following kernel:

$$k_0((s, i), (s', i')) = 1[s(i) = s'(i')].$$

Set

$$k_n((s, i), (s', i')) = k_0((s, i), (s', i')) \cdot (h \star k_{n-1})((s, i), (s', i')),$$

where $h(i - j) = 1[i - j > 0]\, \lambda^{i - j}$, and $\star$ is the convolution operator. Then

$$(h \star k_{n-1})((s, i), (s', i')) = \sum_{j, j'} h(i - j)\, h(i' - j')\, k_{n-1}((s, j), (s', j'))$$

and

$$k(s, s') = \sum_{i, i'} k_n((s, i), (s', i')).$$
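The recursion can be sketched in Python as a direct dynamic program over position pairs. This is an illustration under stated assumptions rather than the notes' own algorithm: indices are 0-based, `order` is the number of times the recursion is applied, and the name `convolution_kernel` is hypothetical. With $k_0$ as the base case, applying the recursion `order` times scores matching index chains of length `order + 1`, each weighted by $\lambda^{(i_{\text{last}} - i_{\text{first}}) + (i'_{\text{last}} - i'_{\text{first}})}$. If one reads the subsequence length as $l(I) = i_n - i_1 + 1$, the result equals the subsequence kernel on subsequences of length `order + 1` divided by $\lambda^2$.

```python
def convolution_kernel(s, t, order, lam):
    """Sum over (i, i') of k_order((s, i), (t, i')) from the convolution recursion."""
    # Base case: k_0((s, i), (t, i')) = 1 if the letters match, else 0.
    k0 = [[1.0 if a == b else 0.0 for b in t] for a in s]
    k = k0
    for _ in range(order):
        nxt = [[0.0] * len(t) for _ in s]
        for i in range(len(s)):
            for ip in range(len(t)):
                if k0[i][ip] == 0.0:
                    continue  # k_n carries the factor k_0, so mismatches stay 0
                total = 0.0
                # h(i - j) is non-zero only for j < i, and likewise for j' < i'.
                for j in range(i):
                    for jp in range(ip):
                        total += lam ** ((i - j) + (ip - jp)) * k[j][jp]
                nxt[i][ip] = total
        k = nxt
    return sum(sum(row) for row in k)


print(convolution_kernel("cat", "cat", 1, 0.5))  # 0.5625
```

For "cat" against itself with one recursion step and $\lambda = 0.5$, the contributing chains are c-a and a-t (weight $\lambda^2$ each) and c-t (weight $\lambda^4$), giving $0.25 + 0.25 + 0.0625 = 0.5625$. The quadruple loop costs $O(|s|^2 |s'|^2)$ per step; it is kept naive here for readability.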

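Returning to the Gaussian kernel in construction rule 7, its factorization can be checked numerically with a short Python sketch. It assumes the parameterization $\exp(-\|u - v\|^2/\sigma^2)$, with $g(u) = \exp(-\|u\|^2/\sigma^2)$ and $k_1(u, v) = 2u^\top v/\sigma^2$; the helper names and the random test points are illustrative. The quadratic-form check only samples random vectors $z$, so it is a spot check of positive semi-definiteness, not a proof.

```python
import math
import random


def gaussian_kernel(u, v, sigma):
    """k(u, v) = exp(-||u - v||^2 / sigma^2) (assumed parameterization)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / sigma ** 2)


def factored_gaussian(u, v, sigma):
    """The same kernel written as g(u) * g(v) * exp(k_1(u, v))."""
    def g(x):
        return math.exp(-sum(a * a for a in x) / sigma ** 2)

    k1 = 2.0 * sum(a * b for a, b in zip(u, v)) / sigma ** 2
    return g(u) * g(v) * math.exp(k1)


def quadratic_form(K, z):
    """z^T K z, which is non-negative for every z when K is positive semi-definite."""
    n = len(z)
    return sum(z[i] * K[i][j] * z[j] for i in range(n) for j in range(n))


random.seed(0)
points = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(6)]
sigma = 1.5

# Largest disagreement between the direct and the factored form.
max_gap = max(
    abs(gaussian_kernel(u, v, sigma) - factored_gaussian(u, v, sigma))
    for u in points
    for v in points
)

# Smallest sampled value of z^T K z for the Gram matrix on the test points.
K = [[gaussian_kernel(u, v, sigma) for v in points] for u in points]
min_form = min(
    quadratic_form(K, [random.gauss(0, 1) for _ in points]) for _ in range(200)
)

print(max_gap, min_form)
```

The two forms should agree up to floating-point rounding, and the sampled quadratic forms should all be non-negative.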

More information

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2018 CS 551, Fall

More information

Exercises * on Principal Component Analysis

Exercises * on Principal Component Analysis Exercises * on Principal Component Analysis Laurenz Wiskott Institut für Neuroinformatik Ruhr-Universität Bochum, Germany, EU 4 February 207 Contents Intuition 3. Problem statement..........................................

More information

SOLUTION KEY TO THE LINEAR ALGEBRA FINAL EXAM 1 2 ( 2) ( 1) c a = 1 0

SOLUTION KEY TO THE LINEAR ALGEBRA FINAL EXAM 1 2 ( 2) ( 1) c a = 1 0 SOLUTION KEY TO THE LINEAR ALGEBRA FINAL EXAM () We find a least squares solution to ( ) ( ) A x = y or 0 0 a b = c 4 0 0. 0 The normal equation is A T A x = A T y = y or 5 0 0 0 0 0 a b = 5 9. 0 0 4 7

More information

Strictly Positive Definite Functions on a Real Inner Product Space

Strictly Positive Definite Functions on a Real Inner Product Space Strictly Positive Definite Functions on a Real Inner Product Space Allan Pinkus Abstract. If ft) = a kt k converges for all t IR with all coefficients a k 0, then the function f< x, y >) is positive definite

More information

Insights into the Geometry of the Gaussian Kernel and an Application in Geometric Modeling

Insights into the Geometry of the Gaussian Kernel and an Application in Geometric Modeling Insights into the Geometry of the Gaussian Kernel and an Application in Geometric Modeling Master Thesis Michael Eigensatz Advisor: Joachim Giesen Professor: Mark Pauly Swiss Federal Institute of Technology

More information

(Kernels +) Support Vector Machines

(Kernels +) Support Vector Machines (Kernels +) Support Vector Machines Machine Learning Torsten Möller Reading Chapter 5 of Machine Learning An Algorithmic Perspective by Marsland Chapter 6+7 of Pattern Recognition and Machine Learning

More information

Causal Inference by Minimizing the Dual Norm of Bias. Nathan Kallus. Cornell University and Cornell Tech

Causal Inference by Minimizing the Dual Norm of Bias. Nathan Kallus. Cornell University and Cornell Tech Causal Inference by Minimizing the Dual Norm of Bias Nathan Kallus Cornell University and Cornell Tech www.nathankallus.com Matching Zoo It s a zoo of matching estimators for causal effects: PSM, NN, CM,

More information

Lecture Notes on the Gaussian Distribution

Lecture Notes on the Gaussian Distribution Lecture Notes on the Gaussian Distribution Hairong Qi The Gaussian distribution is also referred to as the normal distribution or the bell curve distribution for its bell-shaped density curve. There s

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Lecture 3 January 28

Lecture 3 January 28 EECS 28B / STAT 24B: Advanced Topics in Statistical LearningSpring 2009 Lecture 3 January 28 Lecturer: Pradeep Ravikumar Scribe: Timothy J. Wheeler Note: These lecture notes are still rough, and have only

More information

CS8803: Statistical Techniques in Robotics Byron Boots. Hilbert Space Embeddings

CS8803: Statistical Techniques in Robotics Byron Boots. Hilbert Space Embeddings CS8803: Statistical Techniques in Robotics Byron Boots Hilbert Space Embeddings 1 Motivation CS8803: STR Hilbert Space Embeddings 2 Overview Multinomial Distributions Marginal, Joint, Conditional Sum,

More information

3 Compact Operators, Generalized Inverse, Best- Approximate Solution

3 Compact Operators, Generalized Inverse, Best- Approximate Solution 3 Compact Operators, Generalized Inverse, Best- Approximate Solution As we have already heard in the lecture a mathematical problem is well - posed in the sense of Hadamard if the following properties

More information

Linear Algebra Lecture Notes-I

Linear Algebra Lecture Notes-I Linear Algebra Lecture Notes-I Vikas Bist Department of Mathematics Panjab University, Chandigarh-6004 email: bistvikas@gmail.com Last revised on February 9, 208 This text is based on the lectures delivered

More information

6.1 Composition of Functions

6.1 Composition of Functions 6. Composition of Functions SETTING THE STAGE Explore the concepts in this lesson in more detail using Exploration on page 579. Recall that composition was introduced as the result of substituting one

More information

Recall the convention that, for us, all vectors are column vectors.

Recall the convention that, for us, all vectors are column vectors. Some linear algebra Recall the convention that, for us, all vectors are column vectors. 1. Symmetric matrices Let A be a real matrix. Recall that a complex number λ is an eigenvalue of A if there exists

More information

Regularization in Reproducing Kernel Banach Spaces

Regularization in Reproducing Kernel Banach Spaces .... Regularization in Reproducing Kernel Banach Spaces Guohui Song School of Mathematical and Statistical Sciences Arizona State University Comp Math Seminar, September 16, 2010 Joint work with Dr. Fred

More information

9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee

9.520: Class 20. Bayesian Interpretations. Tomaso Poggio and Sayan Mukherjee 9.520: Class 20 Bayesian Interpretations Tomaso Poggio and Sayan Mukherjee Plan Bayesian interpretation of Regularization Bayesian interpretation of the regularizer Bayesian interpretation of quadratic

More information