Nonlinear Data Transformation with Diffusion Map


1 Nonlinear Data Transformation with Diffusion Map
Peter Freeman, Ann Lee, Joey Richards*, Chad Schafer
Department of Statistics, Carnegie Mellon University
* now at U.C. Berkeley
[Figure: data embedded in diffusion coordinates; predict redshift and identify outliers (Richards, Freeman, Lee, Schafer 2009).]
Richards et al. (2009; ApJ 691, 32)

2 Data Transformation: Why Do It?
The Problem: Astronomical data that inhabit complex structures in (high-)dimensional spaces are difficult to analyze using standard statistical methods. For instance, we may want to:
Estimate photometric redshifts from galaxy colors
Estimate galaxy parameters (age, metallicity, etc.) from galaxy spectra
Classify supernovae using irregularly spaced photometric observations
The Solution: If these data possess a simpler underlying geometry in the original data space, we transform the data so as to capture and exploit that geometry. Usually (but not always), transforming the data effects dimensionality reduction, mitigating the curse of dimensionality. We seek to transform the data in such a way as to preserve relevant physical information whose variation is apparent in the original data.

3 Data Transformation: Example
These data inhabit a one-dimensional manifold in a two-dimensional space. Perhaps a physical parameter of interest (e.g., redshift) varies smoothly along the manifold. Note that these may be non-standard data (e.g., each data point may represent a vector of values, like a spectrum). We want to transform the data in such a way that we can employ simple statistics (e.g., linear regression) to accurately model the variation of that physical parameter.

4 The Classic Choice: PCA Principal components analysis will do a terrible job (at dimension reduction) in this instance because it is a linear transformer.

5 The Classic Choice: PCA Principal components analysis will do a terrible job (at dimension reduction) in this instance because it is a linear transformer. In PCA, high-dimensional data are projected onto hyperplanes. Physical information may not be well-preserved in the transformation.
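To make the failure mode concrete, here is a minimal sketch (not from the talk) that generates a noisy one-dimensional spiral like the one pictured and projects it onto its first principal component with scikit-learn; the data, the seed, and the correlation check are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# Noisy one-dimensional spiral embedded in two dimensions.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.5, 4 * np.pi, 1000))    # parameter along the manifold
X = np.column_stack([t * np.cos(t), t * np.sin(t)])
X += rng.normal(scale=0.1, size=X.shape)          # small observational noise

# PCA can only project onto a hyperplane (here, a single line through the data).
pc1 = PCA(n_components=1).fit_transform(X).ravel()

# If the projection preserved the manifold, pc1 would track t monotonically;
# for a spiral it does not, so the correlation with the true parameter is weak.
print("corr(PC1, t) =", np.corrcoef(pc1, t)[0, 1])
```

Because the projection folds distant parts of the curve on top of one another, no linear coordinate can recover the parameter varying along the spiral, which is exactly why a nonlinear map is wanted here.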

6 Nonlinear Data Transformation
There are many available methods for nonlinear data transformation which have yet to be widely applied to astronomical data:
Local linear embedding (LLE; see, e.g., Vanderplas & Connolly 2009)
Others: Laplacian eigenmaps, Hessian eigenmaps, LTSA
We apply the diffusion map (Coifman & Lafon 2006, Lafon & Lee 2006; see the diffusionmap R package).
The Idea: estimate the true distance between two data points via a fictive diffusion (i.e., Markov random walk) process.
The Advantage: the Euclidean distance between points x and y in the space of transformed data is approximately the diffusion distance between those points in the original data space. Thus variations of physical parameters along the original manifold are approximately preserved in the new data space.

7 Diffusion Map: Intuition
Pick a location, set up a kernel centered on that point, and map out the random walk.
[Figure: diffusion distances and the distribution over points after the first step (t = 1), the second step (t = 2), and the 25th step (t = 25), for a kernel centered on one point.]

8 Diffusion Map: The Math (Part I)
Define a similarity measure between two points x and y, e.g., the Euclidean distance:
s(x, y) = \sqrt{ \sum_{i=1}^{p} ( c_{x,i} - c_{y,i} )^2 }
Construct a weighted graph:
w(x, y) = \exp\left( -\frac{s(x, y)^2}{\epsilon} \right)
where \epsilon is a tuning parameter.
Row-normalize to compute one-step probabilities:
p_1(x, y) = w(x, y) / \sum_z w(x, z)
Use p_1(x, y) to populate the n x n matrix P of one-step probabilities between all n data points.
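The two formulas above translate directly into a few lines of code. The following is a minimal NumPy/SciPy sketch, not the diffusionmap R package used by the authors; the function name one_step_matrix and the bandwidth argument eps are illustrative.

```python
import numpy as np
from scipy.spatial.distance import cdist

def one_step_matrix(X, eps):
    """Kernel-weighted graph and row-normalized one-step probabilities.

    X   : (n, p) array of n data points in p dimensions
    eps : kernel bandwidth (the tuning parameter epsilon)
    """
    s = cdist(X, X, metric="euclidean")       # pairwise similarities s(x, y)
    W = np.exp(-s**2 / eps)                   # weighted graph w(x, y)
    P = W / W.sum(axis=1, keepdims=True)      # p_1(x, y): each row sums to one
    return P
```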

9 Diffusion Map: The Math (Part II)
The probability of stepping from x to y in t steps is given by P^t.
The diffusion distance between x and y at time t is
D_t^2(x, y) = \sum_{j \geq 1} \lambda_j^{2t} ( \psi_j(x) - \psi_j(y) )^2
where \lambda_j is the j-th largest eigenvalue of P and \psi_j is the j-th (right) eigenvector of P.
Retain the top m eigenmodes to create the diffusion map from R^p to R^m:
\Psi_t : x \mapsto [ \lambda_1^t \psi_1(x), \lambda_2^t \psi_2(x), ..., \lambda_m^t \psi_m(x) ]
so that
D_t^2(x, y) \approx \sum_{j=1}^{m} \lambda_j^{2t} ( \psi_j(x) - \psi_j(y) )^2 = \| \Psi_t(x) - \Psi_t(y) \|^2.
The tuning parameters \epsilon and m are determined by minimizing predictive risk (a topic I will skip over in the interests of time). The choice of t generally does not matter.
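Continuing that sketch, the diffusion coordinates follow from an eigendecomposition of P. This simplified version keeps the top m nontrivial eigenpairs and fixes t = 1 (since, as noted, the choice of t generally does not matter); it omits the stationary-distribution normalization of the eigenvectors used in the full treatment, so it is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np

def diffusion_map(P, m, t=1):
    """Map each point to its top-m diffusion coordinates.

    P : (n, n) row-stochastic one-step probability matrix
    m : number of eigenmodes to retain
    t : diffusion time (scales coordinate j by lambda_j**t)
    """
    eigvals, eigvecs = np.linalg.eig(P)        # right eigenvectors of P
    order = np.argsort(-eigvals.real)          # sort by decreasing eigenvalue
    lam = eigvals.real[order][1:m + 1]         # skip the trivial lambda = 1 mode
    psi = eigvecs.real[:, order][:, 1:m + 1]
    return psi * lam**t                        # columns: lambda_j^t * psi_j(x)
```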

10 The Spiral, Redux
[Figure: the first and second diffusion coordinate functions, and the first and second coordinate plots for the diffusion map applied to the spiral example.]

11 Application I
Spectroscopic redshift estimation and outlier detection using SDSS galaxy spectra.
Estimation via adaptive regression:
\hat{r}(\Psi_t) = \Psi_t \beta = \sum_{j=1}^{m} \beta_j \Psi_{t,j}(x) = \sum_{j=1}^{m} \beta_j \lambda_j^t \psi_j(x) = \sum_{j=1}^{m} \beta_j' \psi_j(x)
We see that the choice of the parameter t is unimportant: \lambda_j^t is simply absorbed into the regression coefficients.
[Figure: galaxies in diffusion coordinates, colored by true spectroscopic redshift; predict redshift and identify outliers (Richards, Freeman, Lee, Schafer 2009).]
Richards et al. (2009; ApJ 691, 32)
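A hedged sketch of the regression step: plain least squares of spectroscopic redshift on the diffusion coordinates, standing in for the adaptive regression of Richards et al. (2009). It reuses the illustrative one_step_matrix and diffusion_map helpers defined above; the function name and arguments are assumptions.

```python
import numpy as np

# X: (n, p) galaxy spectra (or colors); z: (n,) spectroscopic redshifts.
def fit_redshift_model(X, z, eps, m):
    Psi = diffusion_map(one_step_matrix(X, eps), m)   # (n, m) diffusion coordinates
    A = np.column_stack([np.ones(len(z)), Psi])       # design matrix with intercept
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)      # least-squares coefficients
    return beta, A @ beta                             # coefficients, fitted redshifts
```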

12 Application II
Estimating properties of SDSS galaxies (age, metallicity, etc.) using a subset of the Bruzual & Charlot (2003) dictionary of theoretical galaxy spectra. Selection of prototype spectra made through diffusion K-means.
[Figure: the dictionary spectra embedded in the first three diffusion coordinates, with axes \lambda_j \psi_j / (1 - \lambda_j) for j = 1, 2, 3; for estimating properties of galaxies (Richards, Freeman, Lee, Schafer 2009).]
Richards et al. (2009; MNRAS 399, 1044)

13 Application III
Photometric redshift estimation for SDSS Main Sample Galaxies (MSGs) using SCA.
Uses the Nyström Extension for quickly predicting photometric redshifts of test-set galaxies, given the diffusion coordinates of the training-set data.
Displays the effect of flux measurement error upon the predictions: attenuation bias.
[Figure: photometric redshift estimate versus spectroscopic redshift for randomly selected objects in the MSG validation set, with (\epsilon, m) = (0.05, 150); photometric redshift estimation (Freeman, Newman, Lee, Richards, Schafer).]
Freeman et al. (2009; MNRAS 398, 2012)

14 Application IV
Classifying SNe in the Supernova Photometric Classification Challenge (Kessler et al., arXiv).
[Figure: example supernova light curves in the g, r, i, and z bands.]
See talk by Joey Richards for more details!
Richards et al. (2010; in preparation)

15 Future Application
Transform observed light curves and theoretical light curves to a low-dimensional encoding space, where they may be compared using nonparametric density estimation.
[Figure: supernova light curves in data space, mapped to components 1-3 of an encoding space, and compared in distribution space against the physically possible distributions via a confidence/credible region.]

16 Diffusion Map: Challenges
Computational Challenge I: efficient construction of the weighted graph w.
Distance computation slow for high-dimensional data.
Graph may be sparse: can we short-circuit the distance computation?
Computational Challenge II: execution time and memory requirements for eigen-decomposition of the one-step probability matrix P.
SVD limited to approximately 10,000 x 10,000 matrices on typical desktop computers.
Slow: we only need the top n% of eigenvalues and eigenvectors, but typical SVD implementations compute all of them.
P may be sparse: efficient sparse SVD algorithms? Would the algorithm of Budavári et al. (2009; MNRAS 394, 1496) help?
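For Challenge II, one plausible route (an assumption, not something prescribed in the talk) is an iterative sparse eigensolver that returns only the leading eigenpairs. The sketch below uses scipy.sparse.linalg.eigs on a toy sparse transition matrix; the helper name and the path-graph example are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigs

def top_eigenmodes(P_sparse, m):
    """Leading m+1 eigenpairs of a sparse transition matrix P.

    The iterative (ARPACK) solver computes only the requested eigenpairs,
    so the full decomposition is never formed; the trivial eigenvalue 1
    is included here and can be dropped afterwards.
    """
    vals, vecs = eigs(P_sparse, k=m + 1, which="LM")   # largest-magnitude eigenvalues
    order = np.argsort(-vals.real)
    return vals.real[order], vecs.real[:, order]

# Toy example: the random walk on a sparse path graph with 2000 nodes.
n = 2000
W = sp.diags([np.ones(n - 1), np.ones(n - 1)], offsets=[1, -1], format="csr")
P = sp.diags(1.0 / np.asarray(W.sum(axis=1)).ravel()) @ W
lam, psi = top_eigenmodes(P, m=10)
```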

17 Diffusion Map: Challenges Computational Challenge III: efficient implementation of the Nyström Extension to apply training set results to far larger test sets. Predictions for 350,000 SDSS MSGs computed in 10 CPU hours...is this too slow in the era of LSST?

18 And One Statistical Challenge
Flux measurement error causes attenuation bias: predicted redshifts are pulled toward the sample mean.
Can attenuation bias be effectively mitigated? TBD. This is not diffusion map specific...
[Figure: photometric redshift estimate versus spectroscopic redshift, with sample bias and sample standard deviation, for the MSG validation set with (\epsilon, m) = (0.05, 150) and for the LRG validation set with (\epsilon, m) = (0.012, 200); 5\sigma outliers are removed prior to plotting, leaving 9740 and 9579 plotted points, respectively. Photometric redshift estimation using SCA (Freeman, Newman, Lee, Richards, Schafer 2009).]

19 And One Statistical Challenge...
The same attenuation bias appears in other photometric redshift estimators:
ANNz (Collister & Lahav 2004; PASP 116, 345)
[Figure: spectroscopic vs. photometric redshifts for ANNz applied to 10,000 galaxies randomly selected from the SDSS EDR, using five-band photometry; a committee of five 5:10:10:1 networks was trained on the training and validation sets, then applied to the evaluation set.]
kNN (Ball et al. 2008; ApJ 683, 12)
[Figure: photometric vs. spectroscopic redshift for the 82,672 SDSS DR5 main sample galaxies of the blind testing set (20% of the sample); z_phot is the mean photometric redshift from the PDF for each object, and \sigma is the RMS dispersion between z_phot and z_spec.]

20 Summary
Methods of nonlinear data transformation such as diffusion map can help make statistical analyses of complex (and perhaps high-dimensional) data tractable.
Analyses with diffusion map generally outperform (i.e., result in a lower predictive risk than) similar analyses with PCA, a linear technique.
Nonlinear techniques have great promise in the era of LSST, so long as certain computational challenges are overcome. We seek:
Optimal construction of weighted graphs
Optimal implementations of SVD (memory, execution time, sparsity)
Optimal implementation of the Nyström Extension
Regardless of whether the challenges are overcome, the accuracy of our results may be limited by measurement error.

21 Predictive Risk: an Algorithm
Pick tuning parameter values \epsilon and m. Transform the data into diffusion space.
Perform k-fold cross-validation on the transformed data:
Assign each datum to one of k groups.
Fit model (e.g., linear regression) to the data in k-1 groups (i.e., leave the data of the k-th group out of the fit).
Given the best-fit model, compute the estimate \hat{y}_i for all data in the k-th group.
Repeat the process until all k groups have been held out.
Assuming the L_2 (squared-error) loss function, our estimate of the predictive risk is generally
\hat{R}(\epsilon, m) = \frac{1}{n} \sum_{j=1}^{n} [ \hat{y}_j(\epsilon, m) - y_j ]^2
We vary \epsilon and m until the predictive risk estimate is minimized.
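A minimal sketch of this loop, reusing the illustrative helpers from earlier. For brevity the diffusion coordinates are computed once from all of the data, whereas a full treatment would recompute them for each fold (or extend them via the Nyström extension); a grid search over \epsilon and m would wrap this function.

```python
import numpy as np

def predictive_risk(X, y, eps, m, k=10, seed=0):
    """k-fold cross-validation estimate of the squared-error predictive risk."""
    n = len(y)
    groups = np.random.default_rng(seed).permutation(n) % k   # assign each datum to a group
    Psi = diffusion_map(one_step_matrix(X, eps), m)           # transform into diffusion space
    A = np.column_stack([np.ones(n), Psi])                    # design matrix with intercept
    yhat = np.empty(n)
    for g in range(k):
        held_out = groups == g
        beta, *_ = np.linalg.lstsq(A[~held_out], y[~held_out], rcond=None)
        yhat[held_out] = A[held_out] @ beta                   # predict the held-out group
    return np.mean((yhat - y) ** 2)                           # R_hat(eps, m)
```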

22 Nyström Extension
The basic idea: compute the similarity of a test-set datum to the training-set data, and use that similarity to determine the diffusion coordinate for that datum via interpolation, with no eigen-decomposition.
Mathematically:
\Psi' = W' \Psi \Lambda
where W' is the matrix of similarities between the test-set data and the training-set data, \Psi holds the training-set eigenvectors, and \Lambda is a diagonal matrix with entries 1/\lambda_i.
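A short sketch of that interpolation, reusing the kernel from the earlier one_step_matrix helper; row-normalizing the test-to-training similarities before applying the training eigenvectors is an assumed convention, not a detail given on the slide.

```python
import numpy as np
from scipy.spatial.distance import cdist

def nystrom_extend(X_test, X_train, psi, lam, eps):
    """Approximate diffusion coordinates of new points, with no eigen-decomposition.

    psi : (n_train, m) training-set eigenvectors
    lam : (m,) corresponding eigenvalues
    """
    W = np.exp(-cdist(X_test, X_train) ** 2 / eps)   # test-to-training similarities
    W /= W.sum(axis=1, keepdims=True)                # row-normalize (assumed convention)
    return W @ psi / lam                             # Psi' = W' Psi Lambda, Lambda = diag(1/lambda_i)
```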
