MANIFOLD LEARNING: A MACHINE LEARNING PERSPECTIVE. Sam Roweis. University of Toronto Department of Computer Science. [Google: Sam Toronto ]


1 MANIFOLD LEARNING: A MACHINE LEARNING PERSPECTIVE Sam Roweis University of Toronto Department of Computer Science [Google: Sam Toronto ] MSRI High-Dimensional Data Workshop December 10, 2004

2 Manifold Learning
Manifold learning means many things to many people. In machine learning, it generally refers to a class of unsupervised statistical problems:
Dimensionality reduction of a finite data set to preserve or highlight certain features of the original measurements.
Latent factor modeling of high-dimensional observations using only a small number of underlying causes.
Density estimation, based on a finite sample of points from a distribution over a high-dimensional space.
Mathematically, we assume x = f(y) + noise. We see samples of x generated by some unknown function f(·), an underlying distribution p(y), and some uncharacterized noise process, and we want to learn f(·) (or its inverse). The problem is ill-posed, so we typically make several strong assumptions, e.g. f is smooth, p(y) is uniform, and the noise is small.
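To make the generative assumption x = f(y) + noise concrete, here is a minimal sketch that draws a sample under the three assumptions listed above. The helix used for f, the function name `sample_manifold`, and the noise level are illustrative choices, not part of the talk.

```python
import numpy as np

def sample_manifold(n=500, noise_std=0.01, seed=0):
    """Draw n samples x = f(y) + noise from a 1-D latent y embedded in 3-D."""
    rng = np.random.default_rng(seed)
    # Assumption: p(y) is uniform on [0, 1].
    y = rng.uniform(0.0, 1.0, size=n)
    # Hypothetical smooth embedding f: two turns of a helix in R^3.
    t = 4.0 * np.pi * y
    fx = np.stack([np.cos(t), np.sin(t), y], axis=1)
    # Assumption: the noise is small (isotropic Gaussian here).
    x = fx + noise_std * rng.normal(size=fx.shape)
    return y, x

y, x = sample_manifold()
```

A manifold learning method only ever sees `x`; the latent `y` is kept here solely so that a recovered coordinate can be compared against it.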

3 Motivations for Manifold Learning
Most Inputs are Redundant
Data are points in a high-dimensional space. Coherent structure in the world generates strong correlations between components. Geometrically, observations lie on or near thin, connected, low-dimensional manifolds.
Many Processes are Nonlinear
We want to model the curved geometry of high-dimensional manifolds. Linearity can be a useful approximation in local domains, but it is too strong an assumption globally. Most interesting data has nonlinear structure.
Computational Savings
We need to vastly decrease the size of inputs while preserving important similarities and differences. This improves the efficiency of statistical algorithms and avoids the curse of dimensionality.

4 Dimensionality Reduction
Goal: find a set of low-dimensional coordinates y_n for each high-dimensional observation x_n in order to preserve some measure of the original structure.
Appeal: no assumptions about distributions; the data is just what we have in front of us.
Disadvantages: does not generalize to new data; does not explicitly reveal anything about the nature of the underlying process, its latent causes, or the structure of the manifold it induces in the observation space.
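One concrete choice of "measure of the original structure" is the set of pairwise Euclidean distances. The sketch below uses classical MDS to find coordinates y_n that reproduce those distances as well as possible in d dimensions. The function name `classical_mds` is illustrative, and the code assumes the data fit in memory as a dense array.

```python
import numpy as np

def classical_mds(X, d):
    """Embed the rows of X in d dimensions, preserving pairwise distances."""
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of observations.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Double centering turns squared distances into a Gram matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    # The top d eigenvectors, scaled by sqrt(eigenvalue), give the coordinates.
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:d]
    lam = np.clip(evals[idx], 0.0, None)
    return evecs[:, idx] * np.sqrt(lam)
```

Note that the output is a table of coordinates for the given points only, which illustrates the disadvantage named above: nothing in it says where a new observation should go.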

5 Dimensionality Reduction
Approach: optimize the low-dimensional coordinates directly, given some carefully designed objective function.
Common theme: how to convert local information into global information (e.g. overlapping local geometric constraints, geodesic distances on local graphs, preserving neighbour identities).
Typical setup: build a locally connected graph on the data sample; use local measurements to induce a global objective function; optimize this objective using an eigenvector method.
Examples of linear methods: SVD, PCA, classical MDS.
Examples of nonlinear methods: Kruskal MDS, Isomap, LLE, Laplacian Eigenmaps, and variants (Conformal Isomap, Hessian LLE, Semidefinite Embedding), Local MDS, Projection Pursuit, self-organizing maps, Stochastic Neighbour Embedding (SNE).
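The three-step setup (local graph, global objective, eigenvector solution) can be sketched with a simplified Laplacian Eigenmaps variant. This is an illustration under stated assumptions, not the published algorithm in full: it uses unweighted k-nearest-neighbour edges, assumes the resulting graph is connected, and the function name and defaults are hypothetical.

```python
import numpy as np

def laplacian_eigenmaps(X, n_neighbors=8, d=2):
    """Spectral embedding from a k-nearest-neighbour graph (dense, small n)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D2, np.inf)  # a point is not its own neighbour

    # Step 1: locally connected graph (symmetrised k-nearest neighbours).
    nbrs = np.argsort(D2, axis=1)[:, :n_neighbors]
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), n_neighbors)
    W[rows, nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)

    # Step 2: global objective sum_ij W_ij ||y_i - y_j||^2, encoded by L = D - W.
    deg = W.sum(axis=1)
    L = np.diag(deg) - W

    # Step 3: eigenvector method. Solve L v = lambda D v through the
    # symmetric matrix D^{-1/2} L D^{-1/2}, then map back with D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    evals, evecs = np.linalg.eigh(L_sym)
    # Skip the first eigenvector (eigenvalue 0, constant embedding).
    Y = d_inv_sqrt[:, None] * evecs[:, 1:d + 1]
    return Y, evals[1:d + 1]
```

Only neighbour relations enter the objective, yet the bottom eigenvectors vary smoothly over the whole graph, which is how local information becomes a global coordinate system.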

6 Latent factor models
Goal: build an explicit model (often probabilistic) of the embedding function f(·) that explains the data we saw.
Appeal: explicitly represents underlying causes, allows us to generalize off the data, and handles uncertainty and noise naturally.
Disadvantages: too many unknowns to build a full probabilistic model; in particular, there is a fundamental degeneracy between sampling in latent space and curvature of the manifold.

7 Latent factor models
Approach: make very strong assumptions and proceed from there using maximum likelihood learning (or approximations). Typical assumptions: uniform density in latent space, isometric embedding, bounded curvature of the manifold.
Examples of linear methods: probabilistic PCA, factor analysis, etc.
Examples of nonlinear methods: autoencoder neural networks, principal curves/surfaces, generative topographic mapping (GTM), independent components analysis (ICA), Kernel PCA.
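For the linear case, maximum likelihood learning has a closed form. The sketch below fits probabilistic PCA using the Tipping and Bishop solution: the noise variance is the mean of the discarded covariance eigenvalues, and the loading matrix comes from the leading eigenvectors. The function names are illustrative, and the code assumes q is smaller than the data dimension.

```python
import numpy as np

def fit_ppca(X, q):
    """Closed-form maximum likelihood fit of x = W y + mu + noise, y ~ N(0, I)."""
    mu = X.mean(axis=0)
    S = np.cov(X - mu, rowvar=False, bias=True)  # ML (biased) covariance
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Noise variance: average of the eigenvalues outside the latent subspace.
    sigma2 = evals[q:].mean()
    # Loadings: leading eigenvectors, scaled by the variance above the noise floor.
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return mu, W, sigma2

def posterior_mean(X, mu, W, sigma2):
    """E[y | x] under the fitted model, for each row of X (old or new)."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (X - mu).T).T
```

Because the model is explicit, `posterior_mean` applies to observations that were not in the training set, which is the generalization property that direct dimensionality reduction lacks.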

8 Global Coordination of Local Models
Locally simple (e.g. linear) models can be stitched together or aligned to form a global factor model of the entire data space.
Appeal: manifolds often look simple locally (e.g. almost linear, almost uniform data sampling). We can often train a simple model well if it is restricted to a small part of space. Combining models has a long history in statistics as mixture modeling.
Disadvantages: we need to specify what our goal is in coordination and then design new algorithms to achieve it.
Approaches:
Decoupled: train local models and align their internal coordinates later (Teh/Roweis, Brand, Verbeek).
Simultaneously fit local models in a way that encourages their agreement (Roweis, Saul, Hinton).

9 Other Issues in Manifold Learning
Out-of-sample extensions for many dimensionality reduction methods can be achieved with interpolation techniques such as the Nyström approximation.
Semi-supervised versions of many of these problems arise naturally if we are given class labels, partial observations of hidden causes, correspondence information, etc.
Recent clustering algorithms have used similar techniques and addressed related problems (e.g. spectral clustering, min cut).
Exploration of the fundamental link between spectral nonlinear dimensionality reduction algorithms and kernel methods.
The isolated sub-problem of estimating the underlying (co-)dimensionality of a manifold has received a great deal of attention.
Increased focus on computational speedups, e.g. landmark methods, efficient iterated eigensolvers, convex programming.
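As an illustration of the out-of-sample idea, the sketch below embeds training points with the top eigenvectors of a kernel matrix and then places new points with the Nyström formula y_k(x) = (1 / sqrt(lambda_k)) * sum_i v_ik K(x, x_i). It makes simplifying assumptions that are not from the talk: a Gaussian kernel with a hand-picked width `gamma`, no centering of the kernel matrix, and illustrative function names.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel between the rows of A and the rows of B."""
    d2 = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_spectral_embedding(X, d, gamma=1.0):
    """Top-d eigenpairs of the kernel matrix and the training embedding."""
    K = rbf_kernel(X, X, gamma)
    evals, evecs = np.linalg.eigh(K)
    idx = np.argsort(evals)[::-1][:d]
    lam, V = evals[idx], evecs[:, idx]
    return lam, V, V * np.sqrt(lam)

def nystrom_extend(X_new, X_train, lam, V, gamma=1.0):
    """Embed new points by interpolating the training eigenvectors."""
    K_new = rbf_kernel(X_new, X_train, gamma)
    return K_new @ V / np.sqrt(lam)
```

Applied to a training point, the formula returns that point's original embedding, so the extension is consistent with the coordinates already computed.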

10 Linear Projection Methods References Zoubin Ghahramani & Geoff Hinton, The EM algorithm for Mixtures of Factor Analyzers, U. Toronto Tech Report CRG-TR-96-1, A.J. Bell & T.J. Sejnowski, An information maximisation approach to blind separation and blind deconvolution, Neural Computation 7(6), David Mackay, Maximum Likelihood and Covariant Algorithms for ICA, unpublished, A. Hyvarinen, J. Karhunen, & E. Oja. Independent Component Analysis. Wiley, Sam Roweis, EM Algorithms for PCA and SPCA, NIPS 10, M.E. Tipping & C.M. Bishop. Probabilistic principal component analysis. Journal of the Royal Statistical Society, 61(3), pp. 611, A. Basilevsky. Statistical Factor Analysis and Related Methods. Wiley, New York, I. Borg & P. Groenen, Modern Multidimensional Scaling: Theory and Applications. Springer, T.F. Cox & M.A.A. Cox. Multidimensional Scaling. Chapman and Hall, 2001.

11 Alignment of Local Models References Michael E. Tipping & Christopher M. Bishop, Mixtures of Probabilistic Principal Component Analysers., Neural Computation 11(2), pp , Sam Roweis, Lawrence Saul & Geoff Hinton. Global Coordination of Local Linear Models. NIPS 14, pp , Yee Whye Teh & Sam T. Roweis, Automatic Alignment of Hidden Representations. NIPS 15, pp , M. Brand, Charting a manifold, NIPS 15, J. H. Ham, D. D. Lee & L. K. Saul Learning high dimensional correspondences from low dimensional manifolds., ICML Workshop on The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining, 2003 J.J. Verbeek, S.T. Roweis & N. Vlassis, Non-linear CCA and PCA by Alignment of Local Models. NIPS 16, 2004.

12 References Neural networks, and other nonparametric mappings Geoffrey Hinton & Sam T. Roweis, Stochastic Neighbor Embedding. NIPS 15, pp , G. E. Hinton, P. Dayan & M. Revow, Modeling the manifolds of handwritten digits. IEEE Transactions on Neural Networks, N. Kambhatla & T. Leen, Dimension reduction by local principal component analysis. Neural Computation, v.9, pp , C.M. Bishop, M. Svenson & C.K.I. Williams, GTM: The Generative Topographic Mapping, Neural Computation, 10(1), pp , H. Bourlard & Y. Kamp, Auto-association by multilayer perceptrons and singular value decomposition, Biological Cybernetics, Vol. 59, pp , 1988 K.I.Diamantaras & S.Y. Kung, Principal Component Neural Networks. John Wiley, R. Durbin & D. Willshaw, An Analogue Approach to the Travelling Salesman Problem Using an Elastic Net Method Nature, Vol. 326, pp , 1987 E. Erwin, K. Obermayer & K. Schulten, Self-organizing maps: ordering, convergence properties and energy functions Biological Cybernetics, 67(1), pp , 1992.

13 References Principal Curves and Projection Pursuit T.J. Hastie & W. Stuetzle. Principal curves. Journal of the American Statistical Association v.84, pp , P. Diaconis & D. Freedman, Asymptotics of graphical projection pursuit. Annals of Statistics v. 12, pp , J.H. Friedman, W. Stuetzle & A. Schroeder. Projection pursuit density estimation. Journal of the American Statistical Association v.79, pp , J.H. Friedman & J.W. Tukey. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers,c-23(9), pp , P.J. Huber. Projection pursuit. Annals of Statistics, 13(2), pp , 1985.

14 References Eigenvector Manifold Learning Algorithms S.T. Roweis & L.K. Saul, Nonlinear dimensionality reduction by locally linear embedding. Science, 290(22), pp , L. K. Saul & S. T. Roweis, Think globally, fit locally: unsupervised learning of low dimensional manifolds, Journal of Machine Learning Research, v. 4, pp , J. B. Tenenbaum, V. de Silva & J. C. Langford, A Global Geometric Framework for Nonlinear Dimensionality Reduction, Science 290(22), pp , J.B. Tenenbaum, Mapping a Manifold of Perceptual Observations, NIPS 10, M. Belkin & P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15(6), pp , M. Belkin & P. Niyogi. Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering, NIPS 14, pp , Y. Bengio, J. Paiement & P. Vincent. Out-of-sample extensions for LLE, Isomap, MDS, Eigenmaps and spectral clustering. NIPS 16, V. de Silva & J.B. Tenenbaum. Global versus local methods in nonlinear dimensionality reduction. NIPS 15, pp , D. L. Donoho & C. E. Grimes, Hessian Eigenmaps: new locally linear embedding techniques for high-dimensional data, Proceedings of the National Academy of Sciences, v. 100 pp , H. Zha and Z. Zhang, Isometric embedding and continuum Isomap, ICML pp , K. Q. Weinberger, F. Sha & L. K. Saul, Learning a kernel matrix for nonlinear dimensionality reduction, ICML, K. Q. Weinberger & L. K. Saul, Unsupervised learning of image manifolds by semidefinite programming, CVPR, 2004.

15 Spectral Clustering References Andrew Ng, Michael Jordan & Yair Weiss, On spectral clustering: analysis and an algorithm. NIPS 14, Marina Meila & Jianbo Shi. Learning segmentation by random walks., NIPS 12, pp , C. Fowlkes, S. Belongie, F. Chung & J. Malik. Spectral grouping using the Nystrom method. IEEE Trans. Pattern Analysis and Machine Intelligence, 26(2), Jianbo Shi & Jitendra Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), pp , R. Pless & I. Simon, Embedding images in non-flat spaces, Washington U., Tech. Rep. WU-CS-01-43, F.R.K. Chung. Spectral Graph Theory., American Mathematical Society, 1997.

16 Kernel Methods References M.A. Aizerman, E.M. Braverman & L.I. Rozoner. Theoretical foundations of the potential function method in pattern recognition learning. Automation and Remote Control v.25, pp ,1964. J. Ham, D.D. Lee, S. Mika & B. Scholkopf. A kernel view of dimensionality reduction of manifolds. ICML, C.K.I. Williams. On a Connection between Kernel PCA and Metric Multidimensional Scaling. NIPS 13, pp , C.K.I. Williams & M. Seeger. Using the Nystrom method to speed up kernel machines. NIPS 13, pp , B. Schoelkopf, A. Smola & K.-R. Mueller, Nonlinear Component Analysis as a Kernel Eigenvalue Problem, Neural Computation, 10(5), pp , B. Scholkopf. The kernel trick for distances. NIPS 13, pp , B. Scholkopf & A.Smola. Learning with Kernels, MIT Press, S. Mika, B. Scholkopf, A. Smola, K. Muller, M. Scholz & G. Ratsch. Kernel PCA and de-noising in feature spaces. NIPS 11, 1999.

17 Useful Overviews, etc. References Martin Law's Manifold Learning Resource Page lawhiu/manifold/ Chris Burges' review of dimensionality reduction cburges/tech reports/tr dimred.pdf General Mathematical/Statistical Background B.D. Ripley, Pattern Recognition and Neural Networks, Cambridge University Press, 1996. R.O. Duda & P.E. Hart. Pattern Classification and Scene Analysis. John Wiley, G.H. Golub & C.F. Van Loan. Matrix Computations. (3rd ed.) Johns Hopkins, 1996. R.A. Horn & C.R. Johnson. Matrix Analysis. Cambridge University Press, 1985. J.R. Magnus & H. Neudecker, Matrix Differential Calculus with Applications, Wiley, T. Hastie, R. Tibshirani & J. Friedman, The Elements of Statistical Learning, Springer-Verlag, 2001.


More information

Locality Preserving Projections

Locality Preserving Projections Locality Preserving Projections Xiaofei He Department of Computer Science The University of Chicago Chicago, IL 60637 xiaofei@cs.uchicago.edu Partha Niyogi Department of Computer Science The University

More information
