Nonlinear Dimensionality Reduction
1 Title
Hong Chang, Institute of Computing Technology, Chinese Academy of Sciences
Machine Learning Methods (Fall 2012)
2 Outline
1. Kernel PCA
2. Isomap
3. Locally Linear Embedding
4. Laplacian Eigenmap
3 Centering in Feature Space
Suppose we use a kernel function $\hat{k}(\cdot,\cdot)$ which induces a nonlinear feature map $\hat{\phi}$ from the input space $\mathcal{X}$ to some feature space $\mathcal{F}$. The images of the $N$ points in $\mathcal{F}$ are $\hat{\phi}(x^{(1)}), \ldots, \hat{\phi}(x^{(N)})$, which in general are not centered. The corresponding kernel matrix is
$$\hat{K} = [\hat{K}_{ij}]_{N \times N} = [\hat{k}(x^{(i)}, x^{(j)})]_{N \times N} = [\langle \hat{\phi}(x^{(i)}), \hat{\phi}(x^{(j)}) \rangle]_{N \times N}.$$
We want to translate the coordinate system of $\mathcal{F}$ so that the new origin is at the sample mean of the $N$ points, i.e.,
$$\phi(x^{(i)}) = \hat{\phi}(x^{(i)}) - \frac{1}{N} \sum_{j=1}^{N} \hat{\phi}(x^{(j)}).$$
4 Centering in Feature Space (2)
As a result, we also convert the kernel matrix $\hat{K}$ to $K$:
$$K = [K_{ij}]_{N \times N} = [k(x^{(i)}, x^{(j)})]_{N \times N} = [\langle \phi(x^{(i)}), \phi(x^{(j)}) \rangle]_{N \times N}.$$
Let
$$Z = [\phi(x^{(1)}), \ldots, \phi(x^{(N)})]^T, \quad \hat{Z} = [\hat{\phi}(x^{(1)}), \ldots, \hat{\phi}(x^{(N)})]^T, \quad H = I - \frac{1}{N} \mathbf{1}\mathbf{1}^T,$$
where $\mathbf{1}$ is a column vector of ones. We can write $Z = H\hat{Z}$. Hence,
$$K = ZZ^T = H\hat{Z}\hat{Z}^T H = H\hat{K}H.$$
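Since the centering is just the matrix product $H\hat{K}H$, it is a one-liner in practice. A minimal NumPy sketch (the function name is ours, not from the slides):

```python
import numpy as np

def center_kernel(K_hat):
    """Center a kernel matrix in feature space: K = H K_hat H,
    with H = I - (1/N) 1 1^T the centering matrix."""
    N = K_hat.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    return H @ K_hat @ H
```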
5 Eigenvalue Equation Based on Covariance Matrix
The covariance matrix of the $N$ centered points in $\mathcal{F}$ is given by
$$C = \frac{1}{N} \sum_{i=1}^{N} \phi(x^{(i)}) \phi(x^{(i)})^T. \tag{1}$$
If $\mathcal{F}$ is infinite-dimensional (e.g., $\mathcal{F}$ is a Hilbert space), we can think of $\phi(x^{(i)}) \phi(x^{(i)})^T$ as a linear operator on $\mathcal{F}$, mapping $z \mapsto \phi(x^{(i)}) \langle \phi(x^{(i)}), z \rangle$. To perform PCA in $\mathcal{F}$, we solve the following eigenvalue equation for the eigenvalues $\lambda_k$ and eigenvectors $v_k$ ($k = 1, \ldots, N$) of $C$:
$$Cv = \lambda v. \tag{2}$$
6 Eigenvalue Equation Based on Covariance Matrix (2)
Substituting (1) into (2) gives an equivalent form of (2):
$$\lambda v = \frac{1}{N} \sum_{i=1}^{N} \phi(x^{(i)}) \phi(x^{(i)})^T v = \frac{1}{N} \sum_{i=1}^{N} \langle \phi(x^{(i)}), v \rangle \, \phi(x^{(i)}).$$
If $\lambda \neq 0$, then we have the following dual eigenvector representation:
$$v = \sum_{i=1}^{N} \frac{\langle \phi(x^{(i)}), v \rangle}{\lambda N} \, \phi(x^{(i)}) = \sum_{i=1}^{N} \alpha_i \phi(x^{(i)}) \tag{3}$$
for some coefficients $\alpha_i$ ($i = 1, \ldots, N$). Thus, all eigenvector solutions $v$ with nonzero eigenvalues $\lambda \neq 0$ must lie in the span of $\phi(x^{(1)}), \ldots, \phi(x^{(N)})$. Equation (2) can then be written as the following set of equations:
$$\langle \phi(x^{(k)}), Cv \rangle = \lambda \langle \phi(x^{(k)}), v \rangle, \quad k = 1, \ldots, N. \tag{4}$$
7 Eigenvalue Equation Based on Kernel Matrix
Substituting (1) and (3) into (4), we have
$$\frac{1}{N} \sum_{j=1}^{N} K_{kj} \sum_{i=1}^{N} \alpha_i K_{ji} = \lambda \sum_{i=1}^{N} \alpha_i K_{ki}, \quad k = 1, \ldots, N,$$
or in matrix form:
$$K^2 \alpha = N \lambda K \alpha, \tag{5}$$
where $K$ is the kernel matrix (or Gram matrix) and $\alpha = (\alpha_1, \ldots, \alpha_N)^T$. If $K$ is invertible, (5) can be expressed as the following (dual) eigenvalue equation:
$$K\alpha = \xi \alpha, \tag{6}$$
where $\xi = N\lambda$.
8 Normalization of Eigenvectors
Let $\xi_1 \geq \cdots \geq \xi_N \geq 0$ denote the $N$ eigenvalues of $K$ and $\alpha_1, \ldots, \alpha_N$ be the corresponding eigenvectors. Suppose $\xi_p$ is the smallest nonzero eigenvalue for some $1 \leq p \leq N$. We normalize $\alpha_1, \ldots, \alpha_p$ such that
$$\langle v_k, v_k \rangle = 1, \quad k = 1, \ldots, p. \tag{7}$$
9 Normalization of Eigenvectors (2)
Substituting (3) into (7), we have
$$\sum_{i,j=1}^{N} \alpha_{ik} \alpha_{jk} K_{ij} = 1, \quad \langle \alpha_k, K\alpha_k \rangle = 1, \quad \langle \alpha_k, \xi_k \alpha_k \rangle = 1, \quad \langle \alpha_k, \alpha_k \rangle = \frac{1}{\xi_k},$$
for all $k = 1, \ldots, p$.
10 Normalization of Eigenvectors (3)
Suppose the eigenvectors obtained for (6) are such that $\|\alpha_k\| = 1$, $k = 1, \ldots, p$. Then we should modify (3) to
$$v_k = \frac{1}{\sqrt{\xi_k}} \sum_{i=1}^{N} \alpha_{ik} \phi(x^{(i)})$$
in order to satisfy (7).
11 Embedding of New Data Points
For any input $x$, the $k$th principal component $y_k$ of $\phi(x)$ is given by
$$y_k = \langle v_k, \phi(x) \rangle = \frac{1}{\sqrt{\xi_k}} \sum_{i=1}^{N} \alpha_{ik} \langle \phi(x^{(i)}), \phi(x) \rangle = \frac{1}{\sqrt{\xi_k}} \sum_{i=1}^{N} \alpha_{ik} \, k(x^{(i)}, x).$$
If $x = x^{(j)}$ for some $1 \leq j \leq N$, i.e., $x$ is one of the $N$ original points, then the $k$th principal component $y_{jk}$ of $\phi(x^{(j)})$ becomes
$$y_{jk} = \langle v_k, \phi(x^{(j)}) \rangle = \frac{1}{\sqrt{\xi_k}} \sum_{i=1}^{N} \alpha_{ik} K_{ij} = \frac{1}{\sqrt{\xi_k}} (K\alpha_k)_j = \frac{1}{\sqrt{\xi_k}} (\xi_k \alpha_k)_j = \sqrt{\xi_k} \, \alpha_{jk},$$
which is proportional to the expansion coefficient $\alpha_{jk}$.
12 Embedding of New Data Points (2)
Let $Y = [y_{jk}]_{N \times p}$. Then we can express $Y$ as
$$Y = [\alpha_1, \ldots, \alpha_p] \, \mathrm{diag}(\sqrt{\xi_1}, \ldots, \sqrt{\xi_p}).$$
Note that $K = YY^T$.
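Putting slides 3-12 together, here is a hedged end-to-end sketch of kernel PCA in NumPy: it centers the kernel matrix, solves $K\alpha = \xi\alpha$, scales the unit-norm eigenvectors as on slide 12, and projects a new point with the out-of-sample formula of slide 11. Function and variable names are illustrative, not from the slides, and the top $p$ eigenvalues are assumed positive.

```python
import numpy as np

def kernel_pca_fit(X, kernel, p):
    """Kernel PCA on the N rows of X; returns the N x p training
    embedding Y and a model for out-of-sample projection."""
    N = X.shape[0]
    K_hat = np.array([[kernel(a, b) for b in X] for a in X])
    H = np.eye(N) - np.ones((N, N)) / N
    K = H @ K_hat @ H                          # centered kernel matrix

    xi, A = np.linalg.eigh(K)                  # eigenvalues ascending
    xi, A = xi[::-1][:p], A[:, ::-1][:, :p]    # top p, ||alpha_k|| = 1

    Y = A * np.sqrt(xi)                        # slide 12: Y = [alpha] diag(sqrt(xi))
    return Y, (X, K_hat, A, xi)

def kernel_pca_project(x, model, kernel):
    """Slide 11: y_k = (1/sqrt(xi_k)) sum_i alpha_ik k(x_i, x),
    with the kernel centered consistently with the training sample."""
    X, K_hat, A, xi = model
    k_hat = np.array([kernel(x_tr, x) for x_tr in X])
    k = (k_hat - k_hat.mean()                  # subtract the new point's mean
         - K_hat.mean(axis=1) + K_hat.mean())  # and the training-sample means
    return (A.T @ k) / np.sqrt(xi)
```

As a sanity check, calling `kernel_pca_project` on a training point $x^{(j)}$ reproduces row $j$ of `Y`, matching the $\sqrt{\xi_k}\,\alpha_{jk}$ identity on slide 11 up to numerical error.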
13 Geodesic Distance
Euclidean distance in the high-dimensional input space cannot reflect the true low-dimensional geometry of the manifold.
14 Geodesic Distance (2)
The geodesic ("shortest path") distance should be used instead.
- Neighboring points: the input-space Euclidean distance provides a good approximation of the geodesic distance.
- Faraway points: the geodesic distance can be approximated by adding up a sequence of short hops between neighboring points, each based on Euclidean distance.
15 Isomap Algorithm
Isomap is a nonlinear dimensionality reduction (NLDR) method that is based on metric MDS but seeks to preserve the intrinsic geometry of the data as captured in the geodesic distances between data points. Three steps of the Isomap algorithm:
1. Construct the neighborhood graph
2. Compute the shortest paths
3. Construct the low-dimensional embedding
16 Isomap Algorithm (2)
Given the distance $d(i, j)$ between all point pairs among the $N$ points in $\mathcal{X}$:
1. Construct the neighborhood graph: Define a graph $G$ over all $N$ data points by connecting points $i$ and $j$ if their distance $d(i, j)$ is smaller than $\epsilon$ ($\epsilon$-Isomap) or if $i$ is one of the $K$ nearest neighbors of $j$ ($K$-Isomap). Set edge lengths equal to $d(i, j)$.
2. Compute the shortest paths: Initialize $d_G(i, j) = d(i, j)$ if $i$ and $j$ are linked by an edge and $d_G(i, j) = \infty$ otherwise. For each $k = 1, \ldots, N$, replace all entries $d_G(i, j)$ by $\min(d_G(i, j),\, d_G(i, k) + d_G(k, j))$. Then $D_G = [d_G(i, j)]$ contains the shortest-path distances between all point pairs in $G$.
3. Construct the low-dimensional embedding (by MDS). A sketch of all three steps follows below.
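The following is a minimal sketch of $K$-Isomap on a precomputed $N \times N$ distance matrix, using the Floyd-Warshall update exactly as stated above and classical MDS for step 3. It assumes the neighborhood graph comes out connected (otherwise infinite entries remain in the geodesic distances); names are ours.

```python
import numpy as np

def isomap(D, K, p):
    """Isomap sketch: K-nearest-neighbor graph, Floyd-Warshall shortest
    paths, then classical MDS on the geodesic distance matrix."""
    N = D.shape[0]

    # 1. Neighborhood graph: keep edges to the K nearest neighbors.
    G = np.full((N, N), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(N):
        nbrs = np.argsort(D[i])[1:K + 1]   # skip self at position 0
        G[i, nbrs] = D[i, nbrs]
        G[nbrs, i] = D[i, nbrs]            # i connects to j if either is a neighbor

    # 2. Shortest paths (Floyd-Warshall, O(N^3) as on slide 17).
    for k in range(N):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])

    # 3. Embedding by classical MDS on the geodesic distances.
    H = np.eye(N) - np.ones((N, N)) / N
    B = -0.5 * H @ (G ** 2) @ H            # double-centered squared distances
    lam, V = np.linalg.eigh(B)
    lam, V = lam[::-1][:p], V[:, ::-1][:, :p]
    return V * np.sqrt(np.maximum(lam, 0))
```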
17 Isomap Algorithm (3)
There are two bottlenecks in the Isomap algorithm:
- Shortest-path computation: Floyd's algorithm is $O(N^3)$; Dijkstra's algorithm (with Fibonacci heaps) is $O(KN^2 \log N)$, where $K$ is the neighborhood size.
- Eigendecomposition: $O(N^3)$.
18-19 Example: Face Images [figures omitted in transcription]
20 Intrinsic Dimensionality of Data Manifolds
In practice, some of the eigenvalues may be so close to zero that they can be ignored. As with PCA and MDS, the true (intrinsic) dimensionality of the data can be estimated from the decrease in error as the dimensionality of the low-dimensional space increases. For nonlinear manifolds, PCA and MDS tend to overestimate the intrinsic dimensionality. The intrinsic degrees of freedom provide a simple way to analyze and manipulate high-dimensional data.
21 Intrinsic Dimensionality of Data Manifolds (2)
[Figure: residual variance of PCA, MDS, and Isomap on four data sets: (A) face images; (B) Swiss roll data; (C) hand images; (D) handwritten "2"s.]
22 Global vs. Local Embedding Methods
Metric MDS and Isomap compute embeddings that seek to preserve straight-line (Euclidean) distances or geodesic distances between all pairs of points; hence they are global methods. Both locally linear embedding (LLE) and Laplacian eigenmap try to recover the global nonlinear structure from local geometric properties; they are local methods. Overlapping local neighborhoods, collectively analyzed, can provide information about the global geometry.
23 Computational Advantages of LLE
Like PCA and MDS, LLE is simple to implement and its optimization problems do not have local minima. Although only linear-algebraic methods are used, the constraint that points are reconstructed only from their neighbors, via locally linear fits, can result in highly nonlinear embeddings. Its main step involves a sparse eigenvalue problem that scales better to large, high-dimensional data sets.
24 Problem Setting
Let $\mathcal{X} = \{x^{(1)}, \ldots, x^{(N)}\}$ be a set of $N$ points in a high-dimensional input space $\mathbb{R}^D$. The $N$ data points are assumed to lie on or near a nonlinear manifold of intrinsic dimensionality $p < D$ (typically $p \ll D$). Provided that sufficient data are available by sampling well from the manifold, the goal of LLE is to find a low-dimensional embedding of $\mathcal{X}$ by mapping the $D$-dimensional data into a single global coordinate system in $\mathbb{R}^p$. Let us denote the set of $N$ points in the embedding space $\mathbb{R}^p$ by $\mathcal{Y} = \{y^{(1)}, \ldots, y^{(N)}\}$.
25 LLE Algorithm
1. For each data point $x^{(i)} \in \mathcal{X}$: find the set $N_i$ of $K$ nearest neighbors of $x^{(i)}$, and compute the reconstruction weights of the neighbors that minimize the error of reconstructing $x^{(i)}$.
2. Compute the low-dimensional embedding $\mathcal{Y}$ that best preserves the local geometry represented by the reconstruction weights.
26 Locally Linear Fitting
If the manifold is sampled sufficiently densely, then each point and its neighbors are expected to lie on or close to a locally linear patch of the manifold. The local geometry of a patch is characterized by the reconstruction weights with which a data point is reconstructed from its neighbors. Let $w_i$ denote the $K$-dimensional vector of local reconstruction weights for data point $x^{(i)}$. (One may also consider the full $N$-dimensional weight vector by constraining the terms $w_{ij}$ for $x^{(j)} \notin N_i$ to 0.)
27 Constrained Least Squares Problem
Optimality is achieved by minimizing the local reconstruction error function for each data point $x^{(i)}$:
$$E_i(w_i) = \Big\| x^{(i)} - \sum_{x^{(j)} \in N_i} w_{ij} x^{(j)} \Big\|^2,$$
which is the squared distance between $x^{(i)}$ and its reconstruction, subject to the constraints
$$\sum_{x^{(j)} \in N_i} w_{ij} = \mathbf{1}^T w_i = 1 \quad \text{and} \quad w_{ij} = 0 \ \text{for any} \ x^{(j)} \notin N_i.$$
This is a constrained least squares problem that can be solved using the classical method of Lagrange multipliers.
28 Constrained Least Squares Problem (2)
Using $\sum_j w_{ij} = 1$, the error function $E_i(w_i)$ can be rewritten as follows:
$$E_i(w_i) = \Big[ \sum_{x^{(j)} \in N_i} w_{ij} (x^{(i)} - x^{(j)}) \Big]^T \Big[ \sum_{x^{(j)} \in N_i} w_{ij} (x^{(i)} - x^{(j)}) \Big] = \sum_{x^{(j)}, x^{(k)} \in N_i} w_{ij} w_{ik} (x^{(i)} - x^{(j)})^T (x^{(i)} - x^{(k)}) = w_i^T G_i w_i,$$
where $G_i = [(x^{(i)} - x^{(j)})^T (x^{(i)} - x^{(k)})]_{K \times K}$ is the local Gram matrix for $x^{(i)}$. To minimize $E_i(w_i)$ subject to the constraint $\mathbf{1}^T w_i = 1$, we define the Lagrangian function with multiplier $\lambda$:
$$L(w_i, \lambda) = w_i^T G_i w_i + \lambda (1 - \mathbf{1}^T w_i).$$
29 Constrained Least Squares Problem (3)
The partial derivatives of $L(w_i, \lambda)$ w.r.t. $w_i$ and $\lambda$ are
$$\frac{\partial L}{\partial w_i} = 2 G_i w_i - \lambda \mathbf{1}, \qquad \frac{\partial L}{\partial \lambda} = 1 - \mathbf{1}^T w_i.$$
Setting the above equations to 0, we finally get (if $G_i^{-1}$ exists)
$$w_i = \frac{G_i^{-1} \mathbf{1}}{\mathbf{1}^T G_i^{-1} \mathbf{1}}.$$
30 A More Efficient Method
Instead of inverting $G_i$, a more efficient way is to solve the linear system of equations $G_i \hat{w}_i = \mathbf{1}$ for $\hat{w}_i$ and then compute $w_i$ as
$$w_i = \frac{\hat{w}_i}{\mathbf{1}^T \hat{w}_i},$$
so that the equality constraint $\mathbf{1}^T w_i = 1$ is satisfied. Based on the reconstruction weights computed for all $N$ data points, we form a weight matrix $W = [w_{ij}]_{N \times N}$ that will be used in the next step.
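A hedged sketch of this weight-fitting step in NumPy, following the linear-system recipe above. The small regularizer added to $G_i$ is our assumption (commonly used in practice when $K > D$, where $G_i$ is singular), not part of the slides.

```python
import numpy as np

def lle_weights(X, K):
    """Reconstruction weights for LLE (slides 26-30): for each point,
    solve G_i w_hat = 1 and rescale so the weights sum to one."""
    N = X.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(np.linalg.norm(X - X[i], axis=1))[1:K + 1]
        Z = X[i] - X[nbrs]                      # rows: x_i - x_j over neighbors j
        G = Z @ Z.T                             # local Gram matrix G_i
        G += 1e-9 * np.trace(G) * np.eye(K)     # regularizer (our assumption)
        w_hat = np.linalg.solve(G, np.ones(K))  # solve G_i w_hat = 1
        W[i, nbrs] = w_hat / w_hat.sum()        # enforce 1^T w_i = 1
    return W
```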
31 Low-Dimensional Embedding
Given the weight matrix $W$, the best low-dimensional embedding $\mathcal{Y}$ can be computed by minimizing the following error function w.r.t. $Y = [y^{(1)}, \ldots, y^{(N)}]^T \in \mathbb{R}^{N \times p}$:
$$J(Y) = \sum_{i=1}^{N} \Big\| y^{(i)} - \sum_{x^{(j)} \in N_i} w_{ij} y^{(j)} \Big\|^2.$$
Let $b_i$ be the $i$th column of the identity matrix $I$ and $w_i$ be the $i$th column of $W^T$ (i.e., $w_i$ is the weight vector for $x^{(i)}$).
32 Optimization
We can rewrite $J(Y)$ as
$$J(Y) = \sum_{i=1}^{N} \| Y^T b_i - Y^T w_i \|^2 = \sum_{i=1}^{N} \| Y^T (b_i - w_i) \|^2 = \| Y^T (I - W^T) \|_F^2 = \mathrm{Tr}[Y^T (I - W)^T (I - W) Y] = \mathrm{Tr}[Y^T M Y],$$
where $M = (I - W)^T (I - W)$ is a symmetric and positive semi-definite matrix (since $x^T M x \geq 0$ for all $x$). $M$ is sparse for reasonable choices of the neighborhood size $K$ (i.e., $K \ll N$).
33 Invariance to Translation, Rotation and Scaling
Note that the error function $J(Y)$ is invariant to translation, rotation and scaling of the vectors $y^{(i)}$ in the low-dimensional embedding $\mathcal{Y}$. To remove the translational degree of freedom, we require the vectors $y^{(i)}$ to have zero mean, i.e.,
$$\sum_{i=1}^{N} y^{(i)} = Y^T \mathbf{1} = 0.$$
To remove the degrees of freedom due to rotation and scaling, we constrain the vectors $y^{(i)}$ to have covariance matrix equal to the identity matrix, i.e.,
$$\frac{1}{N} \sum_{i=1}^{N} y^{(i)} (y^{(i)})^T = \frac{1}{N} Y^T Y = I.$$
34 Eigenvalue Problem
The optimization problem can thus be stated as
$$\min_Y \ \mathrm{Tr}(Y^T M Y) \quad \text{subject to} \quad Y^T \mathbf{1} = 0 \ \text{and} \ Y^T Y = N I.$$
If we express $Y$ as $[y_1, \ldots, y_p]$, then the optimization problem can also be expressed as
$$\min_Y \ \sum_{k=1}^{p} y_k^T M y_k \quad \text{subject to} \quad y_k^T \mathbf{1} = 0 \ \text{and} \ y_k^T y_k = N \ \text{for} \ k = 1, \ldots, p.$$
35 Eigenvalue Problem (2)
Thus the solution to the optimization problem can be obtained by solving the eigenvalue problem
$$M y = \lambda y$$
for the eigenvectors $y_k$ ($k = 1, \ldots, p$) that correspond to the $p$ smallest nonzero eigenvalues. The eigenvectors are normalized such that $y_k^T y_k = N$ for all $k = 1, \ldots, p$.
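Given the weight matrix $W$, the embedding step reduces to a symmetric eigenproblem. A minimal sketch (with a dense eigensolver for clarity; a sparse solver would exploit the sparsity of $M$ noted on slide 32). Since the rows of $W$ sum to one, $M\mathbf{1} = 0$, so the bottom (constant) eigenvector is discarded to satisfy $Y^T \mathbf{1} = 0$.

```python
import numpy as np

def lle_embedding(W, p):
    """Embedding step of LLE (slides 31-35): eigenvectors of
    M = (I - W)^T (I - W) for the p smallest nonzero eigenvalues,
    scaled so that y_k^T y_k = N."""
    N = W.shape[0]
    M = (np.eye(N) - W).T @ (np.eye(N) - W)
    lam, V = np.linalg.eigh(M)     # eigenvalues ascending
    Y = V[:, 1:p + 1]              # skip the constant (eigenvalue ~0) eigenvector
    return Y * np.sqrt(N)          # normalization y_k^T y_k = N
```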
36 Algorithm Overview
Let $x^{(1)}, \ldots, x^{(N)}$ be $N$ points in $\mathbb{R}^D$. Like Isomap, the Laplacian eigenmap algorithm first constructs a weighted graph with $N$ nodes representing the neighborhood relationships. It then computes an eigenmap based on the graph.
37 Edge Creation
An edge is created between nodes $i$ and $j$ if $x^{(i)}$ and $x^{(j)}$ are close to each other. Two possible criteria for edge creation:
- $\epsilon$-neighborhood: Nodes $i$ and $j$ are connected by an edge if $\|x^{(i)} - x^{(j)}\| < \epsilon$ for some $\epsilon \in \mathbb{R}^+$.
- $K$ nearest neighbors: Nodes $i$ and $j$ are connected by an edge if $x^{(i)}$ is among the $K$ nearest neighbors of $x^{(j)}$ or $x^{(j)}$ is among the $K$ nearest neighbors of $x^{(i)}$.
38 Edge Weighting
Two common variations for edge weighting:
- Heat kernel:
$$w_{ij} = \begin{cases} \exp\!\big(-\|x^{(i)} - x^{(j)}\|^2 / \sigma^2\big) & \text{if nodes } i \text{ and } j \text{ are connected} \\ 0 & \text{otherwise} \end{cases}$$
for some $\sigma^2 \in \mathbb{R}^+$.
- Binary weights:
$$w_{ij} = \begin{cases} 1 & \text{if nodes } i \text{ and } j \text{ are connected} \\ 0 & \text{otherwise.} \end{cases}$$
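A short sketch that combines the $\epsilon$-neighborhood rule of slide 37 with the heat-kernel weights above; parameter names are ours, not from the slides.

```python
import numpy as np

def heat_kernel_weights(X, eps, sigma2):
    """Adjacency weights for the Laplacian eigenmap graph:
    epsilon-neighborhood edges with w_ij = exp(-||x_i - x_j||^2 / sigma2)."""
    N = X.shape[0]
    W = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            d2 = np.sum((X[i] - X[j]) ** 2)
            if d2 < eps ** 2:                     # epsilon-neighborhood criterion
                W[i, j] = W[j, i] = np.exp(-d2 / sigma2)
    return W
```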
39 Construction of Eigenmap
If the graph constructed above is not connected, then the following procedure is applied to each connected component separately. We first consider the special case of finding a 1-dimensional embedding, and then generalize it to the general $p$-dimensional case for $p > 1$. Let $y = (y^{(1)}, \ldots, y^{(N)})^T$ denote the 1-dimensional embedding. The objective function for minimization is given by
$$\sum_{i,j=1}^{N} (y^{(i)} - y^{(j)})^2 \, w_{ij}.$$
40 Construction of Eigenmap (2)
We can rewrite the objective function as
$$\frac{1}{2} \sum_{i,j=1}^{N} (y^{(i)} - y^{(j)})^2 w_{ij} = \frac{1}{2} \Big( \sum_{i,j=1}^{N} (y^{(i)})^2 w_{ij} + \sum_{i,j=1}^{N} (y^{(j)})^2 w_{ij} \Big) - \sum_{i,j=1}^{N} y^{(i)} y^{(j)} w_{ij} = \sum_{i=1}^{N} (y^{(i)})^2 d_{ii} - \sum_{i,j=1}^{N} y^{(i)} y^{(j)} w_{ij} = y^T (D - W) y = y^T L y,$$
where $d_{ii} = \sum_j w_{ij}$, $D = \mathrm{diag}(d_{11}, \ldots, d_{NN})$, and $L = D - W$ is the graph Laplacian.
41 Scale Invariance
To remove the arbitrary scaling factor in the embedding, we enforce the constraint $y^T D y = 1$ (the larger $d_{ii}$ is, the more important the corresponding node $i$ is). The optimization problem can thus be restated as
$$\min_y \ y^T L y \quad \text{subject to} \quad y^T D y = 1, \qquad \text{or equivalently} \qquad \min_y \ \frac{y^T L y}{y^T D y}.$$
42 Generalized Eigenvalue Problem
This corresponds to solving the eigenvalue problem
$$(D^{-1} L) y = \lambda y,$$
or the corresponding generalized eigenvalue problem
$$L y = \lambda D y,$$
for the smallest eigenvalue $\lambda$ and the corresponding eigenvector $y$. Note that $\lambda = 0$ and $y = c\mathbf{1}$ for any $c \neq 0$ form a solution, since
$$c L \mathbf{1} = c (D - W) \mathbf{1} = 0 = 0 \cdot D \mathbf{1}.$$
43 Generalized Eigenvalue Problem (2)
To eliminate such trivial solutions, we modify the minimization problem to
$$\min_y \ y^T L y \quad \text{subject to} \quad y^T D y = 1, \ y^T D \mathbf{1} = 0.$$
Note that if $D = I$, then $y^T D \mathbf{1} = 0$ is equivalent to centering. Finally, we can conclude that the solution is the eigenvector $y$ of the generalized eigenvalue problem
$$L y = \lambda D y$$
corresponding to the smallest nonzero eigenvalue $\lambda$. Normalization of $y$ is performed such that $y^T D y = 1$.
44 Construction of Eigenmap for p > 1
Let the $p$-dimensional embedding be denoted by the $N \times p$ matrix $Y = [y_1, \ldots, y_p] = [y^{(1)}, \ldots, y^{(N)}]^T$. Note that $y^{(i)}$ is the $p$-dimensional representation of $x^{(i)}$ in the embedding space. The objective function for minimization is given by
$$\sum_{i,j=1}^{N} \| y^{(i)} - y^{(j)} \|^2 \, w_{ij}.$$
45 Construction of Eigenmap for p > 1 (2)
The minimization problem can be stated as
$$\min_Y \ \mathrm{Tr}(Y^T L Y) \quad \text{subject to} \quad Y^T D Y = I, \ Y^T D \mathbf{1} = 0.$$
Solution: the eigenvectors $y_k$ ($k = 1, \ldots, p$) of the generalized eigenvalue problem $L y = \lambda D y$ corresponding to the $p$ smallest nonzero eigenvalues give the solution to the optimization problem above. The eigenvectors are normalized such that $y_k^T D y_k = 1$ for all $k = 1, \ldots, p$. A sketch follows below.
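A minimal sketch of this computation using SciPy's generalized symmetric eigensolver, which already normalizes eigenvectors so that $y_k^T D y_k = 1$. It assumes a connected graph with no isolated nodes (so $D$ is positive definite and only the constant eigenvector has eigenvalue 0); the function name is ours.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenmap(W, p):
    """Laplacian eigenmap (slides 39-45): solve L y = lambda D y and keep
    the eigenvectors for the p smallest nonzero eigenvalues."""
    d = W.sum(axis=1)
    D = np.diag(d)
    L = D - W                    # graph Laplacian
    lam, V = eigh(L, D)          # generalized problem; eigenvalues ascending
    Y = V[:, 1:p + 1]            # drop the lambda = 0 (constant) eigenvector
    # eigh(a, b) returns V with V^T D V = I, i.e., y_k^T D y_k = 1 holds
    return Y
```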
46 And Beyond
- Robust versions of dimensionality reduction
- Out-of-sample extensions for LLE, Isomap, MDS
- A kernel view of embedding methods
- Probabilistic view
- Supervised and semi-supervised extensions
- Applications: super-resolution, image recognition, and many others
47 Main References
Kernel PCA: [SSM98]. Isomap: [Ten98] [TdL00] [BST+02]. Locally Linear Embedding: [RS00] [SR03]. Laplacian Eigenmap: [BN02] [BN03].
48 References
[BN02] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral techniques for embedding and clustering. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA, USA, 2002.
[BN03] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6), 2003.
[BST+02] M. Balasubramanian, E. L. Schwartz, J. B. Tenenbaum, V. de Silva, and J. C. Langford. The Isomap algorithm and topological stability. Science, 295(5552):7a, 2002.
[RS00] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2000.
[SR03] L. K. Saul and S. T. Roweis. Think globally, fit locally: unsupervised learning of low dimensional manifolds. Journal of Machine Learning Research, 4, 2003.
[SSM98] B. Schölkopf, A. J. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10, 1998.
[TdL00] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2000.
[Ten98] J. B. Tenenbaum. Mapping a manifold of perceptual observations. In M. I. Jordan, M. J. Kearns, and S. A. Solla, editors, Advances in Neural Information Processing Systems 10. MIT Press, 1998.
More information