Nonlinear Data Transformation with Diffusion Map
Transcription
1 Nonlinear Data Transformation with Diffusion Map
Peter Freeman, Ann Lee, Joey Richards*, Chad Schafer
Department of Statistics, Carnegie Mellon University (* now at U.C. Berkeley)
[Figure: diffusion-space embedding of SDSS galaxy spectra, used to predict redshift and identify outliers (Richards, Freeman, Lee, Schafer); Richards et al. (2009; ApJ 691, 32).]
2 Data Transformation: Why Do It?
The Problem: Astronomical data that inhabit complex structures in (high-)dimensional spaces are difficult to analyze using standard statistical methods. For instance, we may want to:
- Estimate photometric redshifts from galaxy colors
- Estimate galaxy parameters (age, metallicity, etc.) from galaxy spectra
- Classify supernovae using irregularly spaced photometric observations
The Solution: If these data possess a simpler underlying geometry in the original data space, we transform the data so as to capture and exploit that geometry. Usually (but not always), transforming the data effects dimensionality reduction, mitigating the curse of dimensionality. We seek to transform the data in such a way as to preserve relevant physical information whose variation is apparent in the original data.
3 Data Transformation: Example
These data inhabit a one-dimensional manifold in a two-dimensional space. Perhaps a physical parameter of interest (e.g., redshift) varies smoothly along the manifold. Note that these may be non-standard data (e.g., each data point may represent a vector of values, like a spectrum). We want to transform the data in such a way that we can employ simple statistics (e.g., linear regression) to accurately model the variation of that physical parameter.
4-5 The Classic Choice: PCA
Principal components analysis will do a terrible job (at dimension reduction) in this instance because it is a linear transformation: in PCA, high-dimensional data are projected onto hyperplanes, so physical information may not be well preserved in the transformation.
6 Nonlinear Data Transformation
There are many available methods for nonlinear data transformation which have yet to be widely applied to astronomical data:
- Local linear embedding (LLE; see, e.g., Vanderplas & Connolly 2009)
- Others: Laplacian eigenmaps, Hessian eigenmaps, LTSA
We apply the diffusion map (Coifman & Lafon 2006, Lafon & Lee 2006; see the diffusionmap R package).
The Idea: estimate the true distance between two data points via a fictive diffusion (i.e., Markov random walk) process.
The Advantage: the Euclidean distance between points x and y in the space of transformed data is approximately the diffusion distance between those points in the original data space. Thus variations of physical parameters along the original manifold are approximately preserved in the new data space.
7 Diffusion Map: Intuition
Pick a location... set up a kernel centered on one point... and map out the random walk.
[Figure: diffusion distances and the distribution over points after the first step (t = 1), the second step (t = 2), and the 25th step (t = 25).]
8 Diffusion Map: The Math (Part I)
Define a similarity measure between two points x and y, e.g., the Euclidean distance:
s(x, y) = \sqrt{ \sum_{i=1}^{p} (c_{x,i} - c_{y,i})^2 }
Construct a weighted graph:
w(x, y) = \exp\left( -\frac{s(x, y)^2}{\epsilon} \right)
where \epsilon is a tuning parameter.
Row-normalize to compute one-step probabilities between all n data points:
p_1(x, y) = \frac{w(x, y)}{\sum_z w(x, z)}
Use p_1(x, y) to populate the n x n matrix P of one-step probabilities.
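This construction maps directly to a few lines of linear algebra. Below is a minimal numpy sketch, not the authors' code; `one_step_matrix`, `X`, and `eps` are placeholder names, and the data are assumed to sit in an n x p array:

```python
import numpy as np

def one_step_matrix(X, eps):
    """Row-normalized one-step transition matrix P built from the data X."""
    diff = X[:, None, :] - X[None, :, :]     # pairwise differences between points
    s2 = np.sum(diff ** 2, axis=-1)          # squared Euclidean distances s(x, y)^2
    W = np.exp(-s2 / eps)                    # kernel weights w(x, y) = exp(-s^2 / eps)
    P = W / W.sum(axis=1, keepdims=True)     # p_1(x, y) = w(x, y) / sum_z w(x, z)
    return P
```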
9 Diffusion Map: The Math (Part II)
The probability of stepping from x to y in t steps is given by P^t.
The diffusion distance between x and y at time t is
D_t^2(x, y) = \sum_{j} \lambda_j^{2t} \left( \psi_j(x) - \psi_j(y) \right)^2
where \lambda_j is the j-th largest eigenvalue of P and \psi_j is the j-th (right) eigenvector of P.
Retain the top m eigenmodes to create the diffusion map from R^p to R^m:
\Psi_t : x \mapsto \left[ \lambda_1^t \psi_1(x), \lambda_2^t \psi_2(x), \ldots, \lambda_m^t \psi_m(x) \right]
so that
D_t^2(x, y) \approx \sum_{j=1}^{m} \lambda_j^{2t} \left( \psi_j(x) - \psi_j(y) \right)^2 = \left\| \Psi_t(x) - \Psi_t(y) \right\|^2.
The tuning parameters \epsilon and m are determined by minimizing predictive risk (a topic I will skip over in the interest of time). The choice of t generally does not matter.
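Given P, the diffusion coordinates follow from a single eigen-decomposition. A hedged numpy sketch (placeholder names, not the authors' implementation; it drops the trivial eigenpair with \lambda = 1 and a constant eigenvector, the usual convention):

```python
import numpy as np

def diffusion_map(P, m, t=1):
    """m-dimensional diffusion map coordinates from the one-step matrix P."""
    eigvals, eigvecs = np.linalg.eig(P)        # right eigenpairs of P (P is not symmetric)
    order = np.argsort(-np.abs(eigvals))       # sort by decreasing magnitude
    eigvals = eigvals[order].real
    eigvecs = eigvecs[:, order].real
    lam = eigvals[1:m + 1]                     # skip the trivial eigenpair, keep the next m
    psi = eigvecs[:, 1:m + 1]
    return psi * lam ** t                      # rows: [lambda_1^t psi_1(x), ..., lambda_m^t psi_m(x)]
```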
10 The Spiral, Redux
[Figure: coordinate functions for the spiral example; the first and second diffusion coordinates are plotted as functions of position along the manifold.]
11 Application I
Spectroscopic redshift estimation and outlier detection using SDSS galaxy spectra.
Estimation via adaptive regression:
\hat{r}(\Psi_t) = \Psi_t \beta = \sum_{j=1}^{m} \beta_j \Psi_{t,j}(x) = \sum_{j=1}^{m} \beta_j \lambda_j^t \psi_j(x) = \sum_{j=1}^{m} \tilde{\beta}_j \psi_j(x)
We see that the choice of the parameter t is unimportant: the factor \lambda_j^t is simply absorbed into the regression coefficient.
[Figure: predicted versus true spectroscopic redshift in diffusion space, used to predict redshift and identify outliers (Richards, Freeman, Lee, Schafer); Richards et al. (2009; ApJ 691, 32).]
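The regression above is ordinary least squares in diffusion space. A minimal sketch, assuming `Psi` is the (n, m) matrix of diffusion coordinates and `z` the vector of spectroscopic redshifts (placeholder names, not the authors' pipeline):

```python
import numpy as np

def fit_redshift(Psi, z):
    """Least-squares fit of redshift z on the diffusion coordinates Psi (n x m)."""
    beta, *_ = np.linalg.lstsq(Psi, z, rcond=None)   # beta = argmin ||Psi beta - z||^2
    return beta

# Predicted redshifts for the training objects:
#   z_hat = Psi @ fit_redshift(Psi, z)
```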
12 Application II
Estimating properties of SDSS galaxies (age, metallicity, etc.) using a subset of the Bruzual & Charlot (2003) dictionary of theoretical galaxy spectra. Selection of prototype spectra made through diffusion K-means.
[Figure: the dictionary spectra embedded in the first three diffusion coordinates \lambda_j/(1-\lambda_j)\,\psi_j. For estimating properties of galaxies (Richards, Freeman, Lee, Schafer); Richards et al. (2009; MNRAS 399, 1044).]
13 Application III
Photometric redshift estimation for SDSS Main Sample Galaxies (MSG; Petrosian r < 17.77). The 16-dimensional colour data are mapped into diffusion space with (\epsilon, m) = (0.05, 150) after an outlier-removal step, a subset of galaxies is used to train the regression, and the Nyström extension is used to quickly predict photometric redshifts for the remaining galaxies given the diffusion coordinates of the training-set data.
[Figure: photometric redshift estimate versus spectroscopic redshift; displays the effect of flux measurement error upon the predictions, i.e., attenuation bias. Photometric redshift estimation using SCA (Freeman, Newman, Lee, Richards, Schafer); Freeman et al. (2009; MNRAS 398, 2012).]
14 Application IV
Classifying SNe in the Supernova Photometric Classification Challenge (Kessler et al. arXiv: ). See the talk by Joey Richards for more details!
[Figure: example supernova light curves in the g, r, i, and z bands.]
Richards et al. (2010; in preparation)
15 Future Application
Transform observed supernova light curves and theoretical light curves to a low-dimensional encoding space, where they may be compared using nonparametric density estimation.
[Figure: supernova light curves in data space, their images in a three-component encoding space, and a distribution space with a confidence/credible region over the physically possible distributions.]
16 Diffusion Map: Challenges
Computational Challenge I: efficient construction of the weighted graph w.
- Distance computation is slow for high-dimensional data.
- The graph may be sparse: can we short-circuit the distance computation?
Computational Challenge II: execution time and memory requirements for the eigen-decomposition of the one-step probability matrix P.
- SVD is limited to approximately 10,000 x 10,000 matrices on typical desktop computers.
- Slow: we only need the top n% of eigenvalues and eigenvectors, but typical SVD implementations compute all of them.
- P may be sparse: efficient sparse SVD algorithms (see the sketch below)? Would the algorithm of Budavári et al. (2009; MNRAS 394, 1496) help?
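One partial answer to Challenge II is an iterative solver that returns only the leading eigenpairs of a sparse matrix. A sketch using scipy's sparse Arnoldi routine, illustrative only and not the tool used in the papers cited above:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigs

def top_eigenpairs(P, k):
    """Leading k eigenpairs of a (possibly sparse) matrix P via an iterative
    Arnoldi solver, avoiding the full eigen-decomposition."""
    vals, vecs = eigs(csr_matrix(P), k=k, which='LM')   # k largest-magnitude eigenvalues only
    order = np.argsort(-np.abs(vals))
    return vals[order].real, vecs[:, order].real
```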
17 Diffusion Map: Challenges Computational Challenge III: efficient implementation of the Nyström Extension to apply training set results to far larger test sets. Predictions for 350,000 SDSS MSGs computed in 10 CPU hours...is this too slow in the era of LSST?
18 And One Statistical Challenge...
Flux measurement error causes attenuation bias.
[Figure: photometric redshift estimate versus spectroscopic redshift for randomly selected objects in the MSG validation set, with (\epsilon, m) = (0.05, 150), and in the LRG validation set, with (\epsilon, m) = (0.012, 200); 5\sigma outliers are removed prior to plotting, leaving 9740 and 9579 plotted points, respectively. Sample bias and sample standard deviation are also shown. Photometric redshift estimation using SCA (Freeman, Newman, Lee, Richards, Schafer 2009).]
Can attenuation bias be effectively mitigated? TBD. This is not diffusion map specific...
19 And One Statistical Challenge...
[Figure: the same comparison for two other photometric-redshift methods. ANNz (Collister & Lahav 2004; PASP 116, 345): spectroscopic vs. photometric redshifts for 10,000 galaxies randomly selected from the SDSS EDR, using a committee of five 5:10:10:1 networks trained on the training and validation sets and applied to the evaluation set. kNN (Ball et al. 2008; ApJ 683, 12): photometric vs. spectroscopic redshift for the 82,672 SDSS DR5 main sample galaxies of the blind testing set, where z_phot is the mean photometric redshift from the PDF for each object.]
20 Summary
- Methods of nonlinear data transformation such as diffusion map can help make statistical analyses of complex (and perhaps high-dimensional) data tractable.
- Analyses with diffusion map generally outperform (i.e., result in a lower predictive risk than) similar analyses with PCA, a linear technique.
- Nonlinear techniques have great promise in the era of LSST, so long as certain computational challenges are overcome. We seek:
  - Optimal construction of weighted graphs
  - Optimal implementations of SVD (memory, execution time, sparsity)
  - Optimal implementation of the Nyström Extension
- Regardless of whether the challenges are overcome, the accuracy of our results may be limited by measurement error.
21 Predictive Risk: an Algorithm
- Pick tuning parameter values \epsilon and m.
- Transform the data into diffusion space.
- Perform k-fold cross-validation on the transformed data:
  - Assign each datum to one of k groups.
  - Fit the model (e.g., linear regression) to the data in k-1 groups (i.e., leave the data of the k-th group out of the fit).
  - Given the best-fit model, compute the estimate \hat{y}_i for all data in the k-th group.
  - Repeat the process until all k groups have been held out.
- Assuming the L_2 (squared-error) loss function, our estimate of the predictive risk is
  \hat{R}(\epsilon, m) = \frac{1}{n} \sum_{j=1}^{n} \left[ \hat{y}_j(\epsilon, m) - Y_j \right]^2
- We vary \epsilon and m until the predictive risk estimate is minimized (a minimal sketch of this loop appears below).
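A minimal sketch of the inner cross-validation loop, assuming a linear-regression model on the diffusion coordinates `Psi` with response `y` (placeholder names; the random fold assignment is one of several reasonable choices, not necessarily the one used by the authors):

```python
import numpy as np

def predictive_risk(Psi, y, k=10, seed=0):
    """k-fold cross-validation estimate of the predictive risk of a linear
    regression of y on the diffusion coordinates Psi (squared-error loss)."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % k   # assign each datum to one of k groups
    y_hat = np.empty(n)
    for g in range(k):
        held_out = (folds == g)
        beta, *_ = np.linalg.lstsq(Psi[~held_out], y[~held_out], rcond=None)
        y_hat[held_out] = Psi[held_out] @ beta               # predict the held-out group
    return np.mean((y_hat - y) ** 2)                         # R_hat(eps, m)

# Outer loop (sketch): recompute Psi for each candidate (eps, m) and keep the
# pair that minimizes predictive_risk(Psi, y).
```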
22 Nyström Extension
The basic idea: compute the similarity of a test-set datum to the training-set data, and use that similarity to determine the diffusion coordinates of that datum via interpolation, with no new eigen-decomposition.
Mathematically:
\Psi' = W' \Psi \Lambda
where W' is the matrix of similarities between the test-set data and the training-set data, \Psi holds the training-set diffusion coordinates, and \Lambda is a diagonal matrix with entries 1/\lambda_i.
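A hedged numpy sketch of this interpolation, assuming the same Gaussian kernel as before; `nystrom_extend`, `X_new`, and `lam` are placeholder names, not the authors' implementation:

```python
import numpy as np

def nystrom_extend(X_new, X_train, Psi_train, lam, eps):
    """Diffusion coordinates for new points via the Nystrom extension.

    Psi_train : (n, m) training-set diffusion coordinates
    lam       : the m corresponding eigenvalues lambda_1, ..., lambda_m
    """
    diff = X_new[:, None, :] - X_train[None, :, :]
    W = np.exp(-np.sum(diff ** 2, axis=-1) / eps)    # similarities to the training points
    W = W / W.sum(axis=1, keepdims=True)             # row-normalize, as for P
    return (W @ Psi_train) / lam                     # Psi' = W' Psi Lambda, with Lambda_ii = 1/lambda_i
```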