Global (ISOMAP) versus Local (LLE) Methods in Nonlinear Dimensionality Reduction

1 Global (ISOMAP) versus Local (LLE) Methods in Nonlinear Dimensionality Reduction A presentation by Evan Ettinger on a Paper by Vin de Silva and Joshua B. Tenenbaum May 12, 2005

2 Outline Introduction The NLDR Problem Theoretical Claims ISOMAP Asymptotic Convergence Proofs Conformal ISOMAP Relaxing the Assumptions Landmark ISOMAP Faster and Scalable

3 The NLDR Problem Outline Introduction The NLDR Problem Theoretical Claims ISOMAP Asymptotic Convergence Proofs Conformal ISOMAP Relaxing the Assumptions Landmark ISOMAP Faster and Scalable

4 The NLDR Problem Nonlinear Dimensionality Reduction (NLDR) Let Y ⊂ R^d be a convex domain and let f : Y → R^D be a smooth embedding into a manifold M for some D > d. Hidden data {y_i} are generated randomly and then mapped using f to {x_i = f(y_i)}. We would like to recover Y and f based on a given set {x_i} of observed data in R^D. To make the problem well posed, assume f is an isometric embedding (i.e. f preserves infinitesimal angles and lengths).

5 The NLDR Problem Locally Linear Embedding (LLE) Compute a weight vector w_i that best reconstructs each x_i by a linear combination of its k-nn. Then compute the embedding in R^d that minimizes the reconstruction error of each y_i using w_i and its corresponding k-nn.
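The two steps above fit in a few lines of numpy. This is a minimal sketch, not the presenter's or the original authors' implementation; the regularization term reg is our own addition for numerical stability when k exceeds the intrinsic dimension.

```python
import numpy as np
from scipy.spatial.distance import cdist

def lle(X, k=10, d=2, reg=1e-3):
    """Sketch of LLE: reconstruction weights from k-NN, then a spectral embedding."""
    N = X.shape[0]
    D = cdist(X, X)
    np.fill_diagonal(D, np.inf)                # exclude each point from its own neighbor list
    W = np.zeros((N, N))
    for i in range(N):
        nbrs = np.argsort(D[i])[:k]            # indices of the k nearest neighbors of x_i
        Z = X[nbrs] - X[i]                     # neighbors centered on x_i
        C = Z @ Z.T                            # local Gram matrix
        C += reg * np.trace(C) * np.eye(k)     # regularize for numerical stability
        w = np.linalg.solve(C, np.ones(k))     # solve C w = 1
        W[i, nbrs] = w / w.sum()               # reconstruction weights sum to 1
    M = (np.eye(N) - W).T @ (np.eye(N) - W)    # embedding cost matrix
    vals, vecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    return vecs[:, 1:d + 1]                    # bottom d eigenvectors, skipping the constant one
```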

6 The NLDR Problem ISOMAP Idea The ISOMAP algorithm attempts to recover the original embedding of the hidden data {y_i}. A geodesic is the shortest path in M between two points x and y. The idea: approximate the pairwise geodesic distances in M between the {x_i}; given these pairwise geodesic distances, use MDS (described on the next slides) to find a d-dimensional embedding that preserves them.

7 The NLDR Problem Multidimensional Scaling (MDS) Input: a matrix δ^2_ij of dissimilarities (pairwise squared distances; for ISOMAP, squared geodesic distances), where δ^2_ij = (x_i - x_j)^T (x_i - x_j). Goal: find a set of points in R^d that well-approximates the dissimilarities δ^2_ij. Let A_ij = -(1/2) δ^2_ij, let H = I_n - (1/n) 1 1^T be the symmetric centering matrix, and construct B = HAH. Then B_ij = (x_i - x̄)^T (x_j - x̄), and B = HAH = (HX)(HX)^T, so B is real, symmetric and positive semi-definite. The eigendecomposition of B gives B = V Λ V^T.

8 The NLDR Problem MDS (cont'd) As in PCA, take the eigenvectors v_i of B = V Λ V^T with λ_i > 0 and order them from largest to smallest eigenvalue. The reconstructed points are then X̂ = V Λ^{1/2}, since B = X̂ X̂^T. To get a d-dimensional embedding, maximize the fraction of variance explained by keeping only the largest d eigenvalues and their corresponding eigenvectors: X̂_d = V_d Λ_d^{1/2}. NOTE: we are not restricted to using a Euclidean distance metric; however, when Euclidean distances are used, MDS is equivalent to PCA. For ISOMAP, the dissimilarity input to MDS will be the pairwise squared geodesic distances.
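As a concrete illustration of this recipe, here is a short classical-MDS sketch in numpy. The function name classical_mds and the clipping of tiny negative eigenvalues are our own choices, not part of the slides.

```python
import numpy as np

def classical_mds(sq_dists, d=2):
    """Classical MDS on a matrix of squared dissimilarities delta^2."""
    N = sq_dists.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N        # centering matrix H = I - (1/n) 1 1^T
    B = -0.5 * H @ sq_dists @ H                # B = H A H with A = -(1/2) delta^2
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:d]         # indices of the d largest eigenvalues
    lam = np.maximum(vals[order], 0.0)         # clip tiny negative eigenvalues
    return vecs[:, order] * np.sqrt(lam)       # X_hat = V_d Lambda_d^{1/2}
```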

9 The NLDR Problem ISOMAP in Detail 1. Create a graph G on {x_i} using either a k-NN rule or an ε-rule, where each edge e = (x_i, x_j) is weighted by the Euclidean distance between the two points (O(DN^2)). 2. Compute shortest paths in the graph between all pairs of data points to approximate the geodesic distances, and store them in a pairwise squared-distance matrix (O(N^3) using Floyd's algorithm, or O(kN^2 log N) using Dijkstra's algorithm with Fibonacci heaps). 3. Apply MDS to this matrix to find a d-dimensional embedding in Euclidean space for the reconstructed {y_i} (O(N^3)).
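Putting the three steps together, a compact sketch could look as follows. It reuses the classical_mds helper sketched above and uses scipy's Dijkstra-based shortest_path in place of Floyd's algorithm; this is an illustrative assumption, not the paper's reference code.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import shortest_path

def isomap(X, k=10, d=2):
    """Sketch of ISOMAP: k-NN graph, graph geodesics, then classical MDS."""
    D = cdist(X, X)                            # step 1: pairwise Euclidean distances, O(D N^2)
    N = D.shape[0]
    G = np.full((N, N), np.inf)                # inf marks "no edge"
    for i in range(N):
        nbrs = np.argsort(D[i])[1:k + 1]       # k nearest neighbors (skip the point itself)
        G[i, nbrs] = D[i, nbrs]
    G = np.minimum(G, G.T)                     # symmetrize the k-NN graph
    geo = shortest_path(G, method='D', directed=False)   # step 2: Dijkstra all-pairs geodesics
    if np.isinf(geo).any():
        raise ValueError("k-NN graph is disconnected; increase k")
    return classical_mds(geo ** 2, d)          # step 3: MDS on squared geodesic distances
```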

10 The NLDR Problem An Example: Twos

11 The NLDR Problem ISOMAP Theoretical Guarantees ISOMAP has provable convergence guarantees: provided {x_i} is sampled sufficiently densely, ISOMAP will closely approximate, with high probability, the original distances as measured in M. In other words, the geodesic distance approximations computed from the graph G can be made arbitrarily good. Let's examine these theoretical guarantees in more detail...

12 ISOMAP Asymptotic Convergence Proofs Outline Introduction The NLDR Problem Theoretical Claims ISOMAP Asymptotic Convergence Proofs Conformal ISOMAP Relaxing the Assumptions Landmark ISOMAP Faster and Scalable

13 ISOMAP Asymptotic Convergence Proofs Approximating geodesics with G It is not immediately obvious that G should give a good approximation to geodesic distances. Degenerate cases could lead to zig-zagging paths whose lengths significantly overestimate the true geodesic distances.

14 ISOMAP Asymptotic Convergence Proofs Theoretical Setup The convergence proof hinges on the idea that we can approximate geodesic distance in M by short Euclidean-distance hops. Define the following for two points x, y in M:
d_M(x, y) = inf_γ {length(γ)}
d_G(x, y) = min_P (‖x_0 - x_1‖ + ... + ‖x_{p-1} - x_p‖)
d_S(x, y) = min_P (d_M(x_0, x_1) + ... + d_M(x_{p-1}, x_p))
where γ varies over the set of smooth arcs connecting x to y in M, and P varies over all paths along the edges of G starting at data point x = x_0 and ending at y = x_p. We will show d_M ≈ d_S and d_S ≈ d_G, which together imply the desired result d_G ≈ d_M.

15 ISOMAP Asymptotic Convergence Proofs Theorem 1: Sampling Condition Theorem 1: Let ε, δ > 0 with 4δ < ε. Suppose G contains all edges e = (x, y) for which d_M(x, y) < ε. Furthermore, assume that for every point m in M there is a data point x_i with d_M(m, x_i) < δ (the δ-sampling condition). Then for all pairs of data points x, y we have:
d_M(x, y) ≤ d_S(x, y) ≤ (1 + 4δ/ε) d_M(x, y)

16 ISOMAP Asymptotic Convergence Proofs Proof of Theorem 1 Proof: we show d_M(x, y) ≤ d_S(x, y) ≤ (1 + 4δ/ε) d_M(x, y). The left-hand inequality follows directly from the triangle inequality. For the right-hand inequality, let γ be any piecewise-smooth arc connecting x to y with l = length(γ). If l ≤ ε - 2δ then x and y are connected by an edge in G, which we can use as our path.

17 ISOMAP Asymptotic Convergence Proofs Proof (cont'd) If l > ε - 2δ then we can write l = l_0 + l_1 + ... + l_1 + l_0, where l_1 = ε - 2δ and (ε - 2δ)/2 ≤ l_0 ≤ ε - 2δ. This splits the arc γ into a sequence of points γ_0 = x, γ_1, ..., γ_p = y. Each point γ_i lies within a distance δ of a sample data point x_i. Claim: the path x x_1 x_2 ... x_{p-1} y satisfies our requirements, since
d_M(x_i, x_{i+1}) ≤ d_M(x_i, γ_i) + d_M(γ_i, γ_{i+1}) + d_M(γ_{i+1}, x_{i+1}) ≤ δ + l_1 + δ = ε = l_1 ε/(ε - 2δ).

18 ISOMAP Asymptotic Convergence Proofs Proof (cont'd) Similarly, d_M(x, x_1) ≤ l_0 ε/(ε - 2δ) ≤ ε, and the same holds for d_M(x_{p-1}, y). Summing along the path,
d_M(x_0, x_1) + ... + d_M(x_{p-1}, x_p) ≤ l ε/(ε - 2δ) < l (1 + 4δ/ε).
The last inequality uses the fact that 1/(1 - t) < 1 + 2t for 0 < t < 1/2, applied with t = 2δ/ε. Finally, taking the infimum over all γ gives l = d_M(x, y). Thus d_S approximates d_M arbitrarily well, given both the graph-construction and the δ-sampling conditions.

19 ISOMAP Asymptotic Convergence Proofs d_S ≈ d_G We would now like to show the other approximate equality, d_S ≈ d_G. First, some definitions: 1. The minimum radius of curvature r_0 = r_0(M) is defined by 1/r_0 = max_{γ,t} ‖γ''(t)‖, where γ varies over all unit-speed geodesics in M and t ranges over the domain of γ. Intuitively, geodesics in M curl around less tightly than circles of radius r_0(M). 2. The minimum branch separation s_0 = s_0(M) is the largest positive number for which ‖x - y‖ < s_0 implies d_M(x, y) ≤ π r_0 for any x, y in M. Lemma: if γ is a geodesic in M connecting points x and y, and if l = length(γ) ≤ π r_0, then:
2 r_0 sin(l / (2 r_0)) ≤ ‖x - y‖ ≤ l

20 ISOMAP Asymptotic Convergence Proofs Notes on the Lemma We take this Lemma without proof, as it is somewhat technical and long. Using the fact that sin(t) ≥ t - t^3/6 for t ≥ 0, we can write down a weakened form of the Lemma: (1 - l^2/(24 r_0^2)) l ≤ ‖x - y‖ ≤ l. We can also write down an even weaker version, valid for l ≤ π r_0: (2/π) l ≤ ‖x - y‖ ≤ l. We can now show d_G ≈ d_S.
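As a quick numeric sanity check (our own addition, not part of the slides), the bounds above can be verified for a few arc lengths l ≤ π r_0:

```python
import numpy as np

r0 = 1.0                                       # any positive minimum radius of curvature
for l in np.linspace(1e-6, np.pi * r0, 7):     # the Lemma applies for l <= pi * r0
    chord = 2 * r0 * np.sin(l / (2 * r0))      # the Lemma's lower bound on ||x - y||
    assert (1 - l**2 / (24 * r0**2)) * l <= chord + 1e-12   # weakened form holds
    assert (2 / np.pi) * l <= chord + 1e-12                 # weakest form holds
    assert chord <= l + 1e-12                               # chord never exceeds arc length
```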

21 ISOMAP Asymptotic Convergence Proofs Theorem 2: Euclidean Hops ≈ Geodesic Hops Theorem 2: Let λ > 0 be given. Suppose data points x_i, x_{i+1} in M satisfy ‖x_i - x_{i+1}‖ < s_0 and ‖x_i - x_{i+1}‖ ≤ (2/π) r_0 √(24λ). Suppose also there is a geodesic arc of length l = d_M(x_i, x_{i+1}) connecting x_i to x_{i+1}. Then:
(1 - λ) l ≤ ‖x_i - x_{i+1}‖ ≤ l

22 ISOMAP Asymptotic Convergence Proofs Proof of Theorem 2 By the first assumption we can directly conclude l ≤ π r_0, which allows us to apply the Lemma. Its weakest form combined with the second assumption gives l ≤ (π/2) ‖x_i - x_{i+1}‖ ≤ r_0 √(24λ). Solving for λ gives 1 - λ ≤ 1 - l^2/(24 r_0^2). Applying the weakened form of the Lemma then gives the desired result. Combining Theorems 1 and 2 shows d_M ≈ d_G. This leads us to our main theorem...

23 ISOMAP Asymptotic Convergence Proofs Main Theorem Theorem: Let M be a compact submanifold of R^n and let {x_i} be a finite set of data points in M. We are given a graph G on {x_i} and positive real numbers λ_1, λ_2 < 1 and δ, ε > 0. Suppose: 1. G contains all edges (x_i, x_j) of length ‖x_i - x_j‖ ≤ ε. 2. The data set {x_i} satisfies the δ-sampling condition: for every point m in M there exists an x_i such that d_M(m, x_i) < δ. 3. M is geodesically convex: the shortest curve joining any two points on the surface is a geodesic curve. 4. ε < (2/π) r_0 √(24 λ_1), where r_0 is the minimum radius of curvature of M, 1/r_0 = max_{γ,t} ‖γ''(t)‖ with γ varying over all unit-speed geodesics in M. 5. ε < s_0, where s_0 is the minimum branch separation of M: the largest positive number for which ‖x - y‖ < s_0 implies d_M(x, y) ≤ π r_0. 6. δ < λ_2 ε / 4. Then the following holds for all pairs of data points x, y:
(1 - λ_1) d_M(x, y) ≤ d_G(x, y) ≤ (1 + λ_2) d_M(x, y)

24 ISOMAP Asymptotic Convergence Proofs Recap So, short Euclidean-distance hops along G approximate well the actual geodesic distances as measured in M. What were the main assumptions we made? The biggest one was the δ-sampling density condition. A probabilistic version of the Main Theorem can be shown in which each point x_i is drawn from a density function; the approximation bounds then hold with high probability. Here is a truncated version of that theorem: Asymptotic Convergence Theorem: Given λ_1, λ_2, µ > 0, then for a sampling density α sufficiently large,
1 - λ_1 ≤ d_G(x, y) / d_M(x, y) ≤ 1 + λ_2
holds with probability at least 1 - µ for any two data points x, y.

25 Relaxing the Assumptions Outline Introduction The NLDR Problem Theoretical Claims ISOMAP Asymptotic Convergence Proofs Conformal ISOMAP Relaxing the Assumptions Landmark ISOMAP Faster and Scalable

26 Relaxing the Assumptions ISOMAP Relaxation Recall the NLDR problem formulation: we want to recover f and the hidden data {y_i}, where x_i = f(y_i). In the original ISOMAP problem statement we considered only isometric embeddings f (i.e. f preserves infinitesimal lengths and angles). If f is isometric, then the shortest path between two points in Y maps to a geodesic in M, and ISOMAP relies on exactly this fact. A relaxation that allows a wider class of manifolds is to let f be a conformal mapping: f preserves only infinitesimal angles and makes no claims about preserving distances. An equivalent way to view a conformal mapping: at every point y in Y there is a smooth scalar function s(y) > 0 such that infinitesimal vectors at y get magnified in length by a factor s(y).

27 Relaxing the Assumptions Linear Isometry, Isometry, Conformal If f is a linear isometry f : R^d → R^D, then we can simply use PCA or MDS to recover the d significant dimensions (example: the Plane). If f is an isometric embedding f : Y → R^D, then, provided the data points are sufficiently dense and Y ⊂ R^d is a convex domain, we can use ISOMAP to recover the approximate original structure (example: the Swiss Roll). If f is a conformal embedding f : Y → R^D, then we must assume the data is uniformly dense in Y and Y ⊂ R^d is a convex domain, and we can then successfully use C-ISOMAP (example: the Fish Bowl).

28 Relaxing the Assumptions C-ISOMAP Idea behind C-ISOMAP: estimate not only geodesic distances but also the scalar function s(y). Let µ(i) be the mean distance from x_i to its k-nn. Each y_i and its k-nn occupy a d-dimensional disk of radius r, where r depends only on d and the sampling density. f maps this disk to approximately a d-dimensional disk on M of radius s(y_i) r, so µ(i) ≈ s(y_i) r. Hence µ(i) is a reasonable estimate of s(y_i), since it is off only by a constant factor (by the uniform density assumption).

29 Relaxing the Assumptions Altering ISOMAP to C-ISOMAP We replace each edge weight in G by ‖x_i - x_j‖ / √(µ(i) µ(j)); everything else stays the same. Resulting effect: regions of high density are magnified and regions of low density are shrunk. A convergence theorem similar to the one above can be shown for C-ISOMAP, assuming that Y is sampled uniformly from a bounded convex region.
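A minimal sketch of this reweighting step, where µ(i) is estimated as the mean distance to the k nearest neighbors. The helper name c_isomap_graph is ours; the resulting graph would be passed to the same shortest-paths and MDS steps as plain ISOMAP.

```python
import numpy as np
from scipy.spatial.distance import cdist

def c_isomap_graph(X, k=10):
    """Build the k-NN graph with C-ISOMAP edge weights ||x_i - x_j|| / sqrt(mu(i) mu(j))."""
    D = cdist(X, X)
    N = D.shape[0]
    knn = np.argsort(D, axis=1)[:, 1:k + 1]    # k nearest neighbors of every point
    mu = np.array([D[i, knn[i]].mean() for i in range(N)])   # mean k-NN distance, estimates s(y_i)
    G = np.full((N, N), np.inf)
    for i in range(N):
        for j in knn[i]:
            w = D[i, j] / np.sqrt(mu[i] * mu[j])   # rescaled edge weight
            G[i, j] = G[j, i] = min(G[i, j], w)
    return G                                   # feed to shortest paths + MDS as in plain ISOMAP
```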

30 Relaxing the Assumptions C-ISOMAP Examples Setup We compare LLE, ISOMAP, C-ISOMAP, and MDS on toy datasets. Conformal fishbowl: use stereographic projection to map points distributed uniformly in a disk in R^2 onto a sphere with the top removed (a construction sketched below). Uniform fishbowl: points distributed uniformly on the surface of the fishbowl. Offset fishbowl: same as the conformal fishbowl, but points are sampled in Y with a Gaussian offset from the center.
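For concreteness, here is one way the conformal fishbowl could be generated: uniform samples in the unit disk mapped onto the unit sphere by inverse stereographic projection. The unit radius and the projection from the north pole are our own choices; the paper's exact construction may differ.

```python
import numpy as np

def conformal_fishbowl(n=2000, seed=0):
    """Uniform samples in the unit disk, lifted to the lower hemisphere (a 'fishbowl')."""
    rng = np.random.default_rng(seed)
    r = np.sqrt(rng.uniform(0, 1, n))          # sqrt gives a uniform density over the disk
    theta = rng.uniform(0, 2 * np.pi, n)
    u, v = r * np.cos(theta), r * np.sin(theta)
    s = 1 + u**2 + v**2                        # inverse stereographic projection from the north pole
    x, y, z = 2 * u / s, 2 * v / s, (u**2 + v**2 - 1) / s
    return np.column_stack([u, v]), np.column_stack([x, y, z])   # hidden Y and observed X
```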

31 Relaxing the Assumptions

32 Relaxing the Assumptions Example 2: Face Images 2000 face images were randomly generated, varying in distance and left-right pose. Each image is a vector in a high-dimensional pixel space; the slide shows the four extreme cases. The mapping is conformal because a change in orientation at a long distance has a smaller effect on local pixel distances than the corresponding change at a shorter distance.

33 Relaxing the Assumptions Face Images Results C-ISOMAP separates the two intrinsic dimensions cleanly. ISOMAP narrows as faces get further away. LLE is highly distorted.

34 Faster and Scalable Outline Introduction The NLDR Problem Theoretical Claims ISOMAP Asymptotic Convergence Proofs Conformal ISOMAP Relaxing the Assumptions Landmark ISOMAP Faster and Scalable

35 Faster and Scalable Motivation for L-ISOMAP ISOMAP out of the box is not scalable. There are two bottlenecks: the all-pairs shortest paths, O(kN^2 log N), and the MDS eigenvalue calculation on a full N × N matrix, O(N^3). For contrast, LLE is limited by a sparse eigenvalue computation, O(dN^2). Landmark ISOMAP (L-ISOMAP) idea: use n ≪ N landmark points from {x_i} and compute the n × N matrix D_n of geodesic distances from each data point to the landmark points only. Then use a new procedure, Landmark MDS (LMDS), to find a Euclidean embedding of all the data; it uses a triangulation idea similar to GPS. Savings: L-ISOMAP has a shortest-paths cost of O(knN log N) and an LMDS eigenvalue problem of O(n^2 N).

36 Faster and Scalable LMDS Details 1. Designate a set of n landmark points. 2. Apply classical MDS to the n × n matrix of squared distances between the landmark points to find a d-dimensional embedding of these n points. Let L_k be the d × n matrix containing the embedded landmark points, built from the calculated eigenvectors v_i and eigenvalues λ_i:
L_k = [ √λ_1 v_1^T ; √λ_2 v_2^T ; ... ; √λ_d v_d^T ]
(one row √λ_i v_i^T per retained eigenpair).

37 Faster and Scalable LMDS Details (cont'd) 3. Apply distance-based triangulation to find a d-dimensional embedding of all N points. Let δ_1, ..., δ_n be the vectors of squared distances from each landmark to all the landmarks, and let δ_µ be their mean. Let δ_x be the vector of squared distances between a point x and the landmark points. Then the i-th component of the embedding vector y_x is:
y_x^i = (1 / (2 √λ_i)) v_i^T (δ_µ - δ_x)
It can be shown that this embedding of y_x is equivalent to projecting onto the first d principal components of the landmarks. 4. Finally, we can optionally run PCA to reorient our axes.
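The following numpy sketch combines steps 1-3. The names landmark_mds and landmark_idx are ours; the input is the n × N matrix of squared landmark-to-point distances described on the previous slide, together with the columns that correspond to the landmarks themselves.

```python
import numpy as np

def landmark_mds(D2_ln, landmark_idx, d=2):
    """Sketch of LMDS: classical MDS on the landmarks, then triangulate every point."""
    n = D2_ln.shape[0]
    D2_ll = D2_ln[:, landmark_idx]             # n x n landmark-to-landmark squared distances
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ D2_ll @ H                   # step 2: classical MDS on the landmarks
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:d]
    lam, V = vals[order], vecs[:, order]       # top d eigenpairs (lambda_i, v_i)
    delta_mu = D2_ll.mean(axis=1)              # mean squared-distance vector over landmarks
    pseudo = V / np.sqrt(lam)                  # columns v_i / sqrt(lambda_i)
    Y = 0.5 * pseudo.T @ (delta_mu[:, None] - D2_ln)   # step 3: y_x = (1/2) L_k^# (delta_mu - delta_x)
    return Y.T                                 # N x d embedding of all N points
```

As a sanity check, applying the triangulation formula to a landmark's own distance column reproduces that landmark's classical-MDS coordinates.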

38 Faster and Scalable Landmark Choices How many landmark points should we choose? d + 1 landmarks are enough for the triangulation to locate each point uniquely, but heuristics show that a few more are better for stability. Poorly distributed landmarks can lead to foreshortening: projection onto the d-dimensional subspace causes a shortening of distances. Good strategies are random selection, or the more expensive MinMax method, which adds each new landmark so as to maximize its minimum distance to the landmarks already chosen (sketched below). Either way, running L-ISOMAP in combination with cross-validation techniques would be useful for finding a stable embedding.
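A minimal sketch of the MinMax heuristic (our own version, assuming a precomputed N × N distance matrix D): after a random seed point, each new landmark is the point that maximizes its minimum distance to the landmarks chosen so far.

```python
import numpy as np

def minmax_landmarks(D, n_landmarks, seed=0):
    """Greedy MinMax (farthest-point) selection of landmark indices."""
    rng = np.random.default_rng(seed)
    N = D.shape[0]
    landmarks = [int(rng.integers(N))]         # start from a random point
    min_dist = D[landmarks[0]].copy()          # distance of every point to its nearest landmark
    for _ in range(n_landmarks - 1):
        nxt = int(np.argmax(min_dist))         # farthest point from the current landmark set
        landmarks.append(nxt)
        min_dist = np.minimum(min_dist, D[nxt])
    return np.array(landmarks)
```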

39 Faster and Scalable L-ISOMAP example

40 Recap and Questions Comparing the two methods:
Approach: LLE is local; ISOMAP is global.
Isometry: LLE handles it most of the time (up to a covariance distortion); ISOMAP, yes.
Conformal maps: LLE offers no guarantees, but sometimes works; ISOMAP handles them via C-ISOMAP.
Speed: LLE is quadratic in N; ISOMAP is cubic in N, but L-ISOMAP helps.
Open questions: How do LLE and L-ISOMAP compare in the quality of their output on real-world datasets? Can we develop a quantitative metric to evaluate them? How much improvement in classification tasks do NLDR techniques really give over traditional dimensionality reduction techniques? Is there some heuristic for choosing k? Could we use hierarchical clustering information to construct a better graph G? Lots of research potential...

41 References
V. de Silva and J. B. Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. Neural Information Processing Systems 15 (NIPS 2002).
J. A. Lee and M. Verleysen. How to project circular manifolds using geodesic distances? ESANN 2004, European Symposium on Artificial Neural Networks.
M. Bernstein, V. de Silva, J. Langford, and J. Tenenbaum. Graph approximations to geodesics on embedded manifolds. Technical report, Department of Psychology, Stanford University.
V. de Silva and J. B. Tenenbaum. Unsupervised learning of curved manifolds. Nonlinear Estimation and Classification.
J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, vol. 290.
V. de Silva and J. B. Tenenbaum. Sparse multidimensional scaling using landmark points. Available at: silva/public/publications.html
C. Williams. On a connection between kernel PCA and metric multidimensional scaling. Machine Learning.
