Plan of Class 4. Radial Basis Functions with moving centers. Projection Pursuit Regression and ridge functions approximation. Principal Component Analysis: basic ideas


1 Plan of Class 4
Radial Basis Functions with moving centers
Multilayer Perceptrons
Projection Pursuit Regression and ridge functions approximation
Principal Component Analysis: basic ideas
Radial Basis Functions and dimensionality reduction
From Additive Splines to ridge functions approximation

2 Regularization approach
A smooth function that approximates the data set $D = \{(x_i, y_i) \in \mathbb{R}^d \times \mathbb{R}\}_{i=1}^N$ can be found by minimizing the functional
$$H[f] = \sum_{i=1}^N (y_i - f(x_i))^2 + \lambda \int_{\mathbb{R}^d} ds\, \frac{|\tilde{f}(s)|^2}{\tilde{G}(s)}$$
where $\tilde{G}$ is a positive, even function, decreasing to zero at infinity, and $\lambda$ is a small, positive number.

3 Main result
(Duchon, 1977; Meinguet, 1979; Wahba, 1977; Madych and Nelson, 1990; Poggio and Girosi, 1989; Girosi, 1992)
The function that minimizes the functional
$$H[f] = \sum_{i=1}^N (y_i - f(x_i))^2 + \lambda \int_{\mathbb{R}^d} ds\, \frac{|\tilde{f}(s)|^2}{\tilde{G}(s)}$$
has the form
$$f(x) = \sum_{i=1}^N c_i G(x - x_i) + \sum_{\alpha=1}^k d_\alpha \psi_\alpha(x)$$
where $G$ is conditionally positive definite of order $m$, $\tilde{G}$ is the Fourier transform of $G$, and $\{\psi_\alpha\}_{\alpha=1}^k$ is a basis in the space of polynomials of degree $m - 1$.

4 Computation of the coefficients
The coefficients are found by solving the linear system
$$(G + \lambda I)c + \Gamma^T d = y, \qquad \Gamma c = 0$$
where $I$ is the identity matrix, and we have defined
$$(y)_i = y_i, \quad (c)_i = c_i, \quad (d)_\alpha = d_\alpha, \quad (G)_{ij} = G(x_i - x_j), \quad (\Gamma)_{\alpha i} = \psi_\alpha(x_i).$$
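
To make the computation concrete, here is a minimal numpy sketch of the system above for a Gaussian basis function and a constant polynomial term (k = 1); the data set, the kernel width and the value of lambda are illustrative choices, not taken from the slides.

```python
import numpy as np

# Sketch: solve  (G + lam*I) c + Gamma^T d = y,  Gamma c = 0
# for a Gaussian G and a constant polynomial term (k = 1).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 2))                 # points x_i in R^2 (toy data)
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(50)

def gauss(r, sigma=0.5):
    return np.exp(-(r / sigma) ** 2)

# (G)_{ij} = G(x_i - x_j); G is radial here, so it depends on ||x_i - x_j||
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
G = gauss(D)

N, k = len(y), 1
Gamma = np.ones((k, N))                              # psi_1(x) = 1 (constant polynomial)
lam = 1e-3

# Block system  [[G + lam I, Gamma^T], [Gamma, 0]] [c; d] = [y; 0]
A = np.block([[G + lam * np.eye(N), Gamma.T],
              [Gamma, np.zeros((k, k))]])
b = np.concatenate([y, np.zeros(k)])
sol = np.linalg.solve(A, b)
c, d = sol[:N], sol[N:]

def f(x):
    """Evaluate f(x) = sum_i c_i G(x - x_i) + d_1."""
    r = np.linalg.norm(X - x, axis=1)
    return gauss(r) @ c + d[0]

print(f(np.array([0.2, -0.4])))
```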

5 Radial Basis Functions
If the function $\tilde{G}$ is radial ($\tilde{G} = \tilde{G}(\|s\|)$), the smoothness functional is rotationally invariant, that is
$$\Phi[f(x)] = \Phi[f(Rx)] \quad \forall R \in O(d)$$
where $O(d)$ is the group of rotations in $d$ dimensions. The regularization solution becomes the so-called Radial Basis Functions technique:
$$f(x) = \sum_{i=1}^N c_i G(\|x - x_i\|) + p(x)$$

6 Radial Basis Functions
Define $r = \|x\|$:
$G(x) = e^{-r^2}$ (Gaussian)
$G(x) = \sqrt{r^2 + c^2}$ (multiquadric)
$G(x) = \dfrac{1}{\sqrt{c^2 + r^2}}$ (inverse multiquadric)
$G(x) = r^{2n+1}$ (multivariate splines)
$G(x) = r^{2n} \ln r$ (multivariate splines)
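
For concreteness, the radial functions listed above can be written down directly in Python; this is just a sketch, and the default values of the parameters c and n are arbitrary.

```python
import numpy as np

# The radial functions of the slide above, as functions of r = ||x||.
def gaussian(r):
    return np.exp(-r**2)

def multiquadric(r, c=1.0):
    return np.sqrt(r**2 + c**2)

def inverse_multiquadric(r, c=1.0):
    return 1.0 / np.sqrt(c**2 + r**2)

def spline_odd(r, n=1):                 # multivariate splines, r^(2n+1)
    return r**(2*n + 1)

def spline_even(r, n=1):                # multivariate splines, r^(2n) ln r (taken to be 0 at r = 0)
    return np.where(r > 0, r**(2*n) * np.log(np.maximum(r, 1e-300)), 0.0)
```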

7 Some good properties of RBF
Very well motivated in the framework of regularization theory
The solution is unique and equivalent to solving a linear system
The amount of smoothness is tunable (with $\lambda$)
Radial Basis Functions are universal approximators
Huge amount of literature on this subject (interpretation in terms of "neural networks")
Biologically plausible
Simple interpretation in terms of a "smooth look-up table"
Similar to other non-parametric techniques, such as nearest neighbor and kernel regression

8 Some not-so-good properties of RBF
Computationally expensive ($O(N^3)$, where $N$ is the number of data points)
The linear system is often badly ill-conditioned
The same amount of smoothness is imposed on different regions of the domain

9 This function has different smoothness properties in different regions of its domain

10 Least Squares Regularization Networks
We look for an approximation to the regularization solution
$$f(x) = \sum_{i=1}^N c_i G(x - x_i)$$
of the form
$$f^*(x) = \sum_{\alpha=1}^n c_\alpha G(x - t_\alpha)$$
where $n \ll N$ and the vectors $t_\alpha$ are called centers.
(Broomhead and Lowe, 1988; Moody and Darken, 1989; Poggio and Girosi, 1989)

11 Least Squares Regularization Networks
$$f^*(x) = \sum_{\alpha=1}^n c_\alpha G(x - t_\alpha)$$
Suppose the centers $t_\alpha$ have been fixed. How do we find the coefficients $c_\alpha$?
$\Rightarrow$ Least Squares

12 Least Squares Regularization Networks
Define
$$E(c_1, \ldots, c_n) = \sum_{i=1}^N (y_i - f^*(x_i))^2$$
The least squares criterion is
$$\min_{c} E(c_1, \ldots, c_n)$$
The problem is convex and quadratic in the $c_\alpha$, and the minimizer is found by setting $\partial E / \partial c_\alpha = 0$.

13 Least Squares
Setting $\partial E / \partial c_\alpha = 0$ leads to
$$G^T G\, c = G^T y$$
where we have defined
$$(y)_i = y_i, \quad (c)_\alpha = c_\alpha, \quad (G)_{i\alpha} = G(x_i - t_\alpha).$$
Therefore $c = G^+ y$, where $G^+ = (G^T G)^{-1} G^T$ is the pseudoinverse of $G$.
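
A minimal numpy sketch of this pseudoinverse solution, with the centers taken (for now) as a random subset of the examples; the Gaussian kernel, the toy data and the value of n are illustrative assumptions.

```python
import numpy as np

# Least squares solution c = G^+ y for fixed centers t_alpha.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))                 # examples x_i
y = np.sin(3 * X[:, 0]) * X[:, 1]

n = 20
T = X[rng.choice(len(X), size=n, replace=False)]      # centers t_alpha (a subset of the examples)

def gauss(r, sigma=0.5):
    return np.exp(-(r / sigma) ** 2)

# (G)_{i alpha} = G(x_i - t_alpha): a rectangular N x n matrix
G = gauss(np.linalg.norm(X[:, None, :] - T[None, :, :], axis=-1))

# c = (G^T G)^{-1} G^T y; lstsq computes this pseudoinverse solution stably
c, *_ = np.linalg.lstsq(G, y, rcond=None)

f_star = lambda x: gauss(np.linalg.norm(T - x, axis=1)) @ c
print(f_star(np.array([0.1, 0.3])))
```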

14 Least Squares Regularization Networks
Given the centers $t_\alpha$ we know how to find the $c_\alpha$. How do we choose the $t_\alpha$?
1. as a subset of the examples
2. by a clustering algorithm (k-means, for example)
3. by least squares ("moving centers")

15 Centers as a subset of the examples
A fair technique. The subset is a random subset, which should reflect the distribution of the data. Not many theoretical results available.
Main problem: how many centers? Main answer: we don't know. Cross-validation techniques seem a reasonable choice.

16 Clustering in $\mathbb{R}^d$
Clustering a set of data points $\{x_i\}_{i=1}^N$ means finding a set of $m$ representative points $\{m_k\}_{k=1}^m$. Let $S_k$ be the Voronoi polytope with center $m_k$:
$$S_k = \{x : \|x - m_k\| < \|x - m_j\|\ \ \forall j \neq k\}$$
A set of representative points $M = \{m_k\}_{k=1}^m$ is optimal if it solves
$$\min_M \sum_{k=1}^m \sum_{x_i \in S_k} \|x_i - m_k\|$$

17 K-means algorithm
The k-means algorithm (MacQueen, 1967) is an iterative technique for finding optimal representative points (cluster centers):
$$m_k^{(t+1)} = \frac{1}{\# S_k^{(t)}} \sum_{x_i \in S_k^{(t)}} x_i$$
This algorithm is guaranteed to find a local minimum. Many variations of this algorithm have been proposed.
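
A bare-bones implementation of this update rule might look as follows; the initialization from random examples and the stopping rule are illustrative choices.

```python
import numpy as np

def kmeans(X, m, n_iter=100, seed=0):
    """Minimal k-means: assign points to the nearest center, then move each
    center to the mean of its Voronoi cell (the update on the slide)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)]
    for _ in range(n_iter):
        # assignment step: nearest center for each point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = np.argmin(dists, axis=1)
        # update step: m_k <- mean of the points in S_k (keep the old center if S_k is empty)
        new = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
                        for k in range(m)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

# e.g. the resulting centers can be used as the t_alpha of a regularization network:
# T, _ = kmeans(X, m=20)
```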

18-20 K-means algorithm (figures: cluster regions $S_1$, $S_2$, $S_3$)

21 Finding the centers by clustering
Very common. However, it makes sense only if the input data points are clustered. No theoretical results. Not clear that it is a good idea, especially for pattern classification cases.

22 Moving centers
Define
$$E(c_1, \ldots, c_n, t_1, \ldots, t_n) = \sum_{i=1}^N (y_i - f^*(x_i))^2$$
The least squares criterion is
$$\min_{c,\, t} E(c_1, \ldots, c_n, t_1, \ldots, t_n)$$
The problem is not convex and quadratic anymore: expect multiple local minima.
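
One simple (if expensive) way to attack this non-convex problem is to hand E, as a function of the packed parameters (c, t), to a generic optimizer; the sketch below uses scipy's L-BFGS-B with numerical gradients, and the Gaussian kernel, toy data and value of n are again illustrative assumptions. Because of the local minima, different initializations will generally give different solutions.

```python
import numpy as np
from scipy.optimize import minimize

# "Moving centers": minimize E over both the coefficients c and the centers t.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = np.sin(3 * X[:, 0]) * X[:, 1]
n, d = 10, X.shape[1]

def gauss(r, sigma=0.5):
    return np.exp(-(r / sigma) ** 2)

def unpack(p):
    return p[:n], p[n:].reshape(n, d)             # c of shape (n,), centers T of shape (n, d)

def E(p):
    c, T = unpack(p)
    G = gauss(np.linalg.norm(X[:, None, :] - T[None, :, :], axis=-1))
    return np.sum((y - G @ c) ** 2)

# start from centers taken among the examples and zero coefficients
p0 = np.concatenate([np.zeros(n), X[:n].ravel()])
res = minimize(E, p0, method="L-BFGS-B")          # non-convex: this finds only a local minimum
c_opt, T_opt = unpack(res.x)
```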

23 Moving centers
:-) Very flexible, in principle very powerful
:-) Some theoretical understanding
:-( Very expensive computationally due to the local minima problem
:-( Centers sometimes move in "weird" ways

24 Some new approximation schemes
Radial Basis Functions with moving centers is a particular case of a function approximation technique of the form
$$f(x) = \sum_{\alpha=1}^n c_\alpha H(x; p_\alpha)$$
where $\{p_\alpha\}_{\alpha=1}^n$ is a set of parameters, which can be estimated by least squares techniques. Radial Basis Functions corresponds to the choice
$$H(x; p_\alpha) = G(\|x - p_\alpha\|)$$

25 Neural Networks
Neural networks correspond to a different choice. We set $p_\alpha \equiv (w_\alpha, \theta_\alpha)$ and choose
$$H(x; p_\alpha) = \sigma(x \cdot w_\alpha + \theta_\alpha)$$
where $\sigma$ is a sigmoidal function. The resulting approximation scheme is
$$f(x) = \sum_{\alpha=1}^n c_\alpha\, \sigma(x \cdot w_\alpha + \theta_\alpha)$$
and it is called a Multilayer Perceptron with one layer of hidden units.
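
Written out in numpy, the scheme is just a few lines; the weights below are random and serve only to fix the shapes.

```python
import numpy as np

# f(x) = sum_alpha c_alpha * sigma(x . w_alpha + theta_alpha):
# a one-hidden-layer perceptron written out explicitly.
rng = np.random.default_rng(0)
d, n = 5, 8                                   # input dimension, number of hidden units
W = rng.standard_normal((n, d))               # rows are the vectors w_alpha
theta = rng.standard_normal(n)                # biases theta_alpha
c = rng.standard_normal(n)                    # output weights c_alpha

sigma = lambda u: 1.0 / (1.0 + np.exp(-u))    # sigmoidal activation

def f(x):
    return c @ sigma(W @ x + theta)

print(f(rng.standard_normal(d)))
```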

26 Examples of sigmoidal functions
$$\sigma(x) = \frac{1}{1 + e^{-\beta x}}, \qquad \sigma(x) = \tanh(\beta x)$$
Notice that when $\beta$ is very large the sigmoid approaches a step function.

27 One hidden layer Perceptron

28 Multilayer Perceptrons
:-) Approximation schemes of this type have been successful in a number of cases
:-( Interpretation of the approximation technique is not immediate
:-) Multilayer Perceptrons are "universal approximators"
:-) Some theoretical results available
:-( Computationally expensive, local minima
:-( Motivation of this technique?

29 Multilayer perceptrons are particular cases of the more general ridge function approximation techniques:
$$f(x) = \sum_{\alpha=1}^n h_\alpha(x \cdot w_\alpha)$$
in which we have chosen
$$h_\alpha(x) = c_\alpha\, h(x + \theta_\alpha)$$
for some fixed function $h$ (the sigmoid).

30 Projection Pursuit Regression
(Friedman and Stuetzle, 1981; Huber, 1986)
Look for an approximating function of the form
$$f(x) = \sum_{\alpha=1}^n h_\alpha(x \cdot w_\alpha)$$
where the $w_\alpha$ are unit vectors and the $h_\alpha$ are functions to be fitted to the data (often cubic splines, trigonometric polynomials or Radial Basis Functions). The underlying idea is that all the information lies in a few one-dimensional projections of the data. The unit vectors $w_\alpha$ are the "interesting projections" that have to be pursued.

31 PPR algorithm
Assume that the first $\alpha - 1$ terms of the expansion have been determined, and define the residuals
$$r_i^{(\alpha-1)} = y_i - \sum_{\beta=1}^{\alpha-1} h_\beta(x_i \cdot w_\beta)$$
Find the $\alpha$-th vector $w_\alpha$ and the $\alpha$-th function $h_\alpha$ that solve
$$\min_{w_\alpha,\, h_\alpha} \sum_{i=1}^N \left(r_i^{(\alpha-1)} - h_\alpha(x_i \cdot w_\alpha)\right)^2$$
Go back to the first step and iterate.
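
A crude sketch of this stagewise procedure is given below; it simplifies the inner minimization by searching over a random pool of unit directions and fitting each h_alpha as a cubic polynomial of the projection, which is only a stand-in for the splines or other smoothers mentioned above.

```python
import numpy as np

def ppr_fit(X, y, n_terms=3, n_dirs=200, deg=3, seed=0):
    """Crude projection pursuit regression sketch: at each stage pick, from a
    random pool of unit vectors, the direction whose polynomial fit to the
    current residuals reduces the squared error the most."""
    rng = np.random.default_rng(seed)
    r = y.copy()
    terms = []                                      # list of (w_alpha, polynomial coefficients)
    for _ in range(n_terms):
        dirs = rng.standard_normal((n_dirs, X.shape[1]))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
        best = None
        for w in dirs:
            z = X @ w                               # one-dimensional projection of the data
            coef = np.polyfit(z, r, deg)            # h_alpha: cubic in the projection
            err = np.sum((r - np.polyval(coef, z)) ** 2)
            if best is None or err < best[0]:
                best = (err, w, coef)
        _, w, coef = best
        r = r - np.polyval(coef, X @ w)             # update the residuals
        terms.append((w, coef))
    return terms

def ppr_predict(terms, X):
    return sum(np.polyval(coef, X @ w) for w, coef in terms)
```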

32 Motivation of PPR
Cluster analysis of a high-dimensional set of data points $D$: Diaconis and Freedman (1984) show that for most high-dimensional clouds, most one-dimensional projections are approximately normal. In cluster analysis normal means "uninteresting". Therefore there are few interesting projections, which can be found by maximizing a "projection index" that measures deviation from normality.

33 Principal Component Analysis

34 Let $\{x^\alpha\}_{\alpha=1}^N$ be a set of points in $\mathbb{R}^d$, with $d$ possibly very large. Every point is specified by $d$ numbers. Is there a more compact representation? It depends on the distribution of the data points in $\mathbb{R}^d$...

35-38 (Figures: scatter plots illustrating different distributions of data points; captions: "There are random points", "and random points")

39
Nr.    Feature
1      pupil to nose vertical distance
2      pupil to mouth vertical distance
3      pupil to chin vertical distance
4      nose width
5      mouth width
6      face width halfway between nose tip and eyes
7      face width at nose position
8-13   chin radii
14     mouth height
15     upper lip thickness
16     lower lip thickness
17     pupil to eyebrow separation
18     eyebrow thickness

40 Features 8 to 14

41 Principal Component Analysis
The data might lie (at least approximately) in a $k$-dimensional linear subspace of $\mathbb{R}^d$, with $k \ll d$:
$$x \approx \sum_{i=1}^k (x \cdot d_i)\, d_i$$
where $D = \{d_i\}_{i=1}^k$ is an orthonormal set of vectors:
$$d_i \cdot d_j = \delta_{ij}$$
Principal Component Analysis lets us find the best orthonormal set of vectors $\{d_i\}_{i=1}^k$, which are called the first $k$ principal components.

42 The orthonormal basis $D = \{d_i\}_{i=1}^k$ is found by solving the following least squares problem:
$$\min_D \sum_{\alpha=1}^N \Big\| x^\alpha - \sum_{i=1}^k (x^\alpha \cdot d_i)\, d_i \Big\|^2$$
which is equivalent to
$$\max_D \sum_{\alpha=1}^N \sum_{i=1}^k (x^\alpha \cdot d_i)^2 = N \max_D \sum_{i=1}^k d_i \cdot C d_i$$
where $C$ is the correlation matrix:
$$C_{ij} = \frac{1}{N} \sum_{\alpha=1}^N x_i^\alpha x_j^\alpha$$

43 Let us consider the case $k = 1$, and $d_1 = d$. The problem is now to solve
$$\max_{\|d\|=1} d \cdot C d$$
where $C$ is a symmetric matrix (the correlation matrix). Setting $C = R^T \Lambda R$ and $d' = R d$, with $R$ a rotation matrix and $\Lambda$ diagonal, we now have to solve
$$\max_{\|d'\|=1} d' \cdot \Lambda d'$$

44 Notice that, since $\|d'\| = 1$,
$$d' \cdot \Lambda d' \leq \lambda_{\max}$$
If we choose the vector $d'_{\max}$ such that
$$d'_{\max} = (0, 0, \ldots, 1, \ldots, 0),$$
with the 1 in the position corresponding to $\lambda_{\max}$, so that $\Lambda d'_{\max} = \lambda_{\max} d'_{\max}$, then
$$d'_{\max} \cdot \Lambda d'_{\max} = \lambda_{\max}$$
and $d'_{\max}$ maximizes the quadratic form $d' \cdot \Lambda d'$.

45 The first principal component, that is, the vector $d_{\max}$ that maximizes the quadratic form $d \cdot C d$, is therefore
$$d_{\max} = R^T d'_{\max}$$
and it can easily be seen that it satisfies the eigenvector equation
$$C d_{\max} = \lambda_{\max} d_{\max}$$
The first principal component is therefore the eigenvector of the correlation matrix corresponding to the maximum eigenvalue.

46 Case $k > 1$
It can be shown that the first $k$ principal components are the eigenvectors of the correlation matrix $C$ corresponding to the $k$ largest eigenvalues.

47 Try this at home
Show that the approximation error of the first $k$ principal components $\{d_i\}_{i=1}^k$,
$$E(k) = \sum_{\alpha=1}^N \Big\| x^\alpha - \sum_{i=1}^k (x^\alpha \cdot d_i)\, d_i \Big\|^2,$$
is given by
$$E(k) = N \sum_{i=k+1}^d \lambda_i$$
where the $\lambda_i$ are the eigenvalues of the correlation matrix $C$. Hint: when $k = d$ the error is zero, because the correlation matrix $C$ is symmetric and its eigenvectors form a complete basis.
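
The identity can also be checked numerically in a few lines; the correlated toy data below (and the centering, which follows the zero-mean convention used later) are arbitrary choices.

```python
import numpy as np

# Check: the k-term reconstruction error equals N times the sum of the
# trailing eigenvalues of the correlation matrix.
rng = np.random.default_rng(0)
N, d, k = 500, 6, 2
X = rng.standard_normal((N, d)) @ rng.standard_normal((d, d))   # correlated points x^alpha
X -= X.mean(axis=0)                                              # zero-mean data

C = X.T @ X / N                       # correlation matrix C_ij = (1/N) sum_alpha x_i^alpha x_j^alpha
lam, V = np.linalg.eigh(C)            # ascending eigenvalues, orthonormal eigenvectors (columns)
lam, V = lam[::-1], V[:, ::-1]        # sort so that lam[0] is the largest

D = V[:, :k]                          # first k principal components d_1, ..., d_k
E_k = np.sum((X - (X @ D) @ D.T) ** 2)
print(E_k, N * lam[k:].sum())         # the two numbers agree
```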

48 An example in 20 dimensions: $x \in \mathbb{R}^{20}$, $x_{10} = x_8 + x_{12}$, 500 points.

49 Correlation matrix (figure)
50 Correlation matrix (figure with axes X, Y, Z)
51 Eigenvalues of the correlation matrix (figure)
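
This example is easy to reproduce; the slides only specify the linear dependence and the 500 points, so drawing the remaining coordinates as i.i.d. standard normals is an assumption.

```python
import numpy as np

# 500 points in R^20 with x_10 = x_8 + x_12 (the other coordinates are an
# illustrative choice: i.i.d. standard normals).
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))
X[:, 9] = X[:, 7] + X[:, 11]          # 0-based indices: x_10 = x_8 + x_12

C = X.T @ X / len(X)
lam = np.sort(np.linalg.eigvalsh(C))[::-1]
print(lam)   # one eigenvalue is (near) zero: the data lie in a 19-dimensional subspace
```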

52 Another point of view on PCA
Let $\{x^\alpha\}_{\alpha=1}^N$ be a set of points in $\mathbb{R}^d$. Problem: find the unit vector $d$ for which the projection of the data on $d$ has the maximum variance. Assuming that the data points have zero mean, this means:
$$\max_{\|d\|=1} \frac{1}{N} \sum_{\alpha=1}^N (x^\alpha \cdot d)^2$$

53 This is formally the same as finding the first principal component. The second principal component is the projection of maximum variance in a subspace orthogonal to the first principal component... The $k$-th principal component is the projection of maximum variance in a subspace which is orthogonal to the subspace spanned by the first $k - 1$ principal components.

54 Extensions of Radial Basis Functions
Different variables can have different scales: $f(x, y) = y^2 \sin(100x)$
Different variables could have different units of measure: $f = f(x_1, x_2, x_3)$
Not all the variables are independent or relevant: $f(x, y, z, t) = g(x, y, z(x, y))$
Only some linear combinations of the variables are relevant: $f(x, y, z) = \sin(x + y + z)$

55 Extensions of regularization theory
A priori knowledge: the relevant variables are linear combinations of the original ones, $z = Wx$ for some (possibly rectangular) matrix $W$,
$$f(x) = g(Wx) = g(z),$$
and the function $g$ is smooth. The regularization functional is now
$$H[g] = \sum_{i=1}^N (y_i - g(z_i))^2 + \lambda \Phi[g]$$
where $z_i = W x_i$.

56 Extensions of regularization theory (continued)
The solution is
$$g(z) = \sum_{i=1}^N c_i G(z - z_i).$$
Therefore the solution for $f$ is
$$f(x) = g(Wx) = \sum_{i=1}^N c_i G(Wx - Wx_i)$$

57 If the matrix $W$ were known, the coefficients could be computed as in the radial case:
$$(G + \lambda I)c = y$$
where
$$(y)_i = y_i, \quad (c)_i = c_i, \quad (G)_{ij} = G(Wx_i - Wx_j),$$
and the same criticisms of the Regularization Networks technique apply, leading to Generalized Regularization Networks:
$$f^*(x) = \sum_{\alpha=1}^n c_\alpha G(Wx - Wt_\alpha)$$
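
As a sketch, with a known W the fit reduces to an ordinary regularization network on the transformed points z_i = W x_i; the Gaussian kernel, the particular W, and the value of lambda below are illustrative assumptions.

```python
import numpy as np

# With W known, solve (G + lam*I) c = y with (G)_{ij} = G(W x_i - W x_j).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(80, 4))
y = np.sin(2 * (X[:, 0] + 0.5 * X[:, 1]))      # depends on one linear combination only
W = np.array([[1.0, 0.5, 0.0, 0.0]])           # a (possibly rectangular) d' x d matrix

def gauss(r, sigma=0.5):
    return np.exp(-(r / sigma) ** 2)

Z = X @ W.T                                     # transformed points z_i = W x_i
G = gauss(np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1))
lam = 1e-3
c = np.linalg.solve(G + lam * np.eye(len(y)), y)

f = lambda x: gauss(np.linalg.norm(Z - (W @ x), axis=1)) @ c
print(f(np.array([0.1, 0.2, 0.9, -0.3])))
```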

58 Since $W$ is not known, it could be found by least squares. Define
$$E(c_1, \ldots, c_n, W) = \sum_{i=1}^N (y_i - f^*(x_i))^2$$
Then we can solve
$$\min_{c,\, W} E(c_1, \ldots, c_n, W)$$
The problem is not convex and quadratic anymore: expect multiple local minima.

59 From RBF to HyperBF
When the basis function is radial, the Generalized Regularization Network becomes
$$f(x) = \sum_{\alpha=1}^n c_\alpha G(\|x - t_\alpha\|_W),$$
that is, a non-radial basis function technique.

60 Least Squares
1. $\min_{c} E(c_1, \ldots, c_n)$
2. $\min_{c,\, t} E(c_1, \ldots, c_n, t_1, \ldots, t_n)$
3. $\min_{c,\, W} E(c_1, \ldots, c_n, W)$
4. $\min_{c,\, t,\, W} E(c_1, \ldots, c_n, t_1, \ldots, t_n, W)$

61 A non-radial Gaussian function

62 A non-radial multiquadric function

63 Additive models
An additive model has the form
$$f(x) = \sum_{\mu=1}^d f_\mu(x_\mu)$$
where
$$f_\mu(x_\mu) = \sum_{i=1}^N c_i^\mu G(x_\mu - x_i^\mu)$$
In other words,
$$f(x) = \sum_{\mu=1}^d \sum_{i=1}^N c_i^\mu G(x_\mu - x_i^\mu)$$

64 Extensions of Additive Models
If we have fewer centers than examples we obtain
$$f(x) = \sum_{\mu=1}^d f_\mu(x_\mu)$$
where
$$f_\mu(x_\mu) = \sum_{\alpha=1}^n c_\alpha^\mu G(x_\mu - t_\alpha^\mu)$$
In other words,
$$f(x) = \sum_{\mu=1}^d \sum_{\alpha=1}^n c_\alpha^\mu G(x_\mu - t_\alpha^\mu)$$

65 Extensions of Additive Models
If we now allow for an arbitrary linear transformation of the inputs, $x \rightarrow Wx$, where $W$ is a $d' \times d$ matrix, we obtain
$$f(x) = \sum_{\mu=1}^{d'} \sum_{\alpha=1}^n c_\alpha^\mu G(x \cdot w^\mu - t_\alpha^\mu)$$
where $w^\mu$ is the $\mu$-th row of the matrix $W$.

66 Extensions of Additive Models
The expression
$$f(x) = \sum_{\mu=1}^{d'} \sum_{\alpha=1}^n c_\alpha^\mu G(x \cdot w^\mu - t_\alpha^\mu)$$
can be written as
$$f(x) = \sum_{\mu=1}^{d'} h_\mu(x \cdot w^\mu)$$
where
$$h_\mu(y) = \sum_{\alpha=1}^n c_\alpha^\mu G(y - t_\alpha^\mu)$$
This form of approximation is called ridge approximation.

67 From the extension of additive models we can therefore justify an approximation technique of the form
$$f(x) = \sum_{\mu=1}^{d'} \sum_{\alpha=1}^n c_\alpha^\mu G(x \cdot w^\mu - t_\alpha^\mu)$$
Particular case: $n = 1$ (one center). Then we derive the following technique:
$$f(x) = \sum_{\mu=1}^{d'} c^\mu G(x \cdot w^\mu - t^\mu)$$
which is a Multilayer Perceptron with a Radial Basis Function $G$ instead of the sigmoid function. Notice that the sigmoid function is not a Radial Basis Function.

68 Regularization Networks (summary diagram)
Regularization networks (RN):
Radial Stabilizer -> RBF
Additive Stabilizer -> Additive Splines
Product Stabilizer -> Tensor Product Splines
Generalized Regularization Networks (GRN), obtained by allowing a movable metric and movable centers:
RBF -> HyperBF (if x = 1)
Additive Splines -> Ridge Approximation
Tensor Product Splines -> ...
