Exploiting Sparse Non-Linear Structure in Astronomical Data

Ann B. Lee, Department of Statistics and Department of Machine Learning, Carnegie Mellon University.
Joint work with P. Freeman, C. Schafer, and J. Richards.

High-Dimensional Inference for Large Data Sets

Massive amounts of data are collected in astronomical surveys: typically high-dimensional (p), huge (n), and complex and noisy. The question is: how do we extract useful information and make reliable predictions and estimates? Target tasks include visualization, outlier detection, density estimation, and regression/classification, all under the curse of dimensionality (computational and statistical).

High-Dimensional Inference: Two Important Considerations

1. Find a good initial basis or set of attributes, e.g., construct an orthonormal (ON) basis. Dimensionality reduction by projection: $T_k : \mathbb{R}^p \to \mathbb{R}^k$, $x \mapsto T_k x = (x \cdot w_1, x \cdot w_2, \ldots, x \cdot w_k)$.

2. Find a parameterization and metric that reflect the underlying geometry of the data. Naturally occurring data often have a low intrinsic dimensionality. What are the underlying dimension and the underlying basis functions?
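
Below is a minimal sketch of the projection step, with the orthonormal basis $\{w_1, \ldots, w_k\}$ taken from PCA (one common choice; the slide does not fix a particular basis), on hypothetical data:

```python
import numpy as np

# Hypothetical data: n = 500 objects in p = 50 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))

Xc = X - X.mean(axis=0)                  # center the data
# Right singular vectors give an orthonormal basis ordered by explained variance.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 5
W = Vt[:k].T                             # p x k matrix whose columns are w_1, ..., w_k
T_k = Xc @ W                             # each row is (x.w_1, x.w_2, ..., x.w_k)
print(T_k.shape)                         # (500, 5)
```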

Principal Component Maps = Classical Multi-Dimensional Scaling (MDS)
(Elements of Statistical Learning, Hastie, Tibshirani, and Friedman, p. 488)

Minimize $\delta = \sum_{i,j} (d_{ij}^2 - \tilde d_{ij}^2)$, where $d_{ij}^2 = \|x_i - x_j\|^2$ is the squared distance in the original space and $\tilde d_{ij}$ the distance in the embedding.

How about more complex data structures?
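
A hedged sketch of classical MDS via double-centering, matching the criterion above (a standard construction, not code from the talk); for Euclidean input distances this recovers the principal component map, which is the equivalence the slide states:

```python
import numpy as np

def classical_mds(D2, k):
    """D2: n x n matrix of squared distances d_ij^2; returns an n x k embedding."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ D2 @ J                 # Gram matrix of the centered configuration
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]      # keep the k largest
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))
```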

Non-Linear Dimension Reduction and Data Parameterization via Spectral Connectivity Analysis
(Diffusion maps, Euclidean commute-time maps, and other metric-based random-walk formulations of spectral kernel methods)

Starting points: a set of objects and a similarity measure that makes sense locally (flexibility), e.g., $X = \{x_1, x_2, \ldots, x_n\}$ with $s(x_i, x_j) = \|x_i - x_j\|$.

Basic idea: integrate local information from overlapping neighborhoods into a global representation.
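
As a concrete (assumed) instance of a locally meaningful similarity, the sketch below converts pairwise distances into Gaussian affinities; the bandwidth eps is a tuning parameter not specified on the slide:

```python
import numpy as np

def gaussian_affinity(X, eps):
    """X: n x p data matrix; returns the n x n affinity W_ij = exp(-||x_i - x_j||^2 / eps)."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared pairwise distances
    return np.exp(-D2 / eps)                                  # close pairs -> affinity near 1
```

Affinities far beyond the bandwidth are effectively zero, which is what makes the measure local.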

Non-Linear Dimension Reduction and Data Parameterization via Spectral Connectivity Analysis

Create a distance d(x, y) that measures connectivity, i.e., how easily information flows from x to y (a Markov chain on your data). Find $\tilde x = f(x)$ and $\tilde y = f(y)$ so that $d(x, y) \approx \|\tilde x - \tilde y\|$, using only the first few components of $\tilde x$ and $\tilde y$ (eigenvectors of the Markov transition matrix).

The diffusion distance integrates all paths of length t connecting x and z, which makes it extremely robust to noise:
$D_t^2(x, z) = \|p_t(x, \cdot) - p_t(z, \cdot)\|_{1/\phi}^2$,
where $p_t(x, \cdot)$ is the t-step transition distribution from x and $\phi$ the stationary distribution.
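
A minimal diffusion-map sketch following this recipe (row-normalize the affinities into a Markov matrix, eigendecompose, keep the leading non-trivial coordinates); the symmetric-conjugate step is a standard numerical device, not something stated on the slide:

```python
import numpy as np

def diffusion_map(W, k=2, t=1):
    """W: symmetric n x n affinity matrix; returns n x k diffusion coordinates at scale t."""
    d = W.sum(axis=1)                      # degrees; P = diag(1/d) @ W is the Markov matrix
    S = W / np.sqrt(np.outer(d, d))        # symmetric conjugate of P, same eigenvalues
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]         # lambda_0 = 1 (trivial constant) comes first
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]       # right eigenvectors of P
    # Coordinates x -> (lambda_1^t psi_1(x), ..., lambda_k^t psi_k(x)), dropping psi_0.
    return (vals[1:k + 1] ** t) * psi[:, 1:k + 1]
```

Euclidean distance between these coordinates approximates the diffusion distance $D_t$, which is exactly the property $d(x, y) \approx \|\tilde x - \tilde y\|$ above.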

Example: hour-glass surface. [Figure courtesy of S. Lafon.]

Unraveling the Underlying Structure by SCA

[Figure: the hour-glass data mapped into its low-dimensional SCA parameterization; axes x and y.]

Use the underlying basis functions for visualization, clustering, regression, sampling strategies, quick searches in large databases, etc. Next: examples for astrophysical problems.

Redshift Estimation from SDSS Spectra [J. Richards et al., 2009]

[Figure: diffusion-map embedding of a sample of 2,793 SDSS spectra with SDSS redshift confidence CL > 0.99, color-coded by log(1+z); CV prediction risk versus number of basis functions for diffusion map vs. PCA.]

Adaptive regression using orthogonal eigenfunctions: $\hat r(x) = \sum_j \beta_j \psi_j(x)$.
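
The regression step reduces to ordinary least squares on the eigenfunction values; a sketch under the assumption that Psi holds the leading eigenfunctions evaluated at the training spectra (the number of basis functions J would be chosen by the CV risk shown in the figure):

```python
import numpy as np

def fit_eigenbasis_regression(Psi, z):
    """Psi: n x J eigenfunctions at the training points; z: responses, e.g. log(1+z)."""
    beta, *_ = np.linalg.lstsq(Psi, z, rcond=None)
    return beta                            # predict new objects with Psi_new @ beta
```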

Photometric Redshift Estimation [P. Freeman et al., 2009]

[Figure: photometric redshift estimate ẑ versus spectroscopic redshift z; top panel shows predictions for the MSG validation set.]

Train an orthogonal series model on 9,749 randomly selected objects; estimate redshifts and basis functions for another 34,989 galaxies using the Nyström extension. Adaptive regression using orthogonal eigenfunctions: $\hat r(x) = \sum_j \beta_j \psi_j(x)$.
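
A hedged sketch of the Nyström step for one new galaxy y: each eigenfunction of the Markov matrix extends by averaging its training values under the transition probabilities out of y (the standard Nyström formula; eps is the same assumed kernel bandwidth as before):

```python
import numpy as np

def nystrom_extend(y, X_train, Psi, lams, eps):
    """Extend eigenvectors Psi (n x J, eigenvalues lams) of the Markov matrix to a new point y."""
    w = np.exp(-((X_train - y) ** 2).sum(axis=1) / eps)  # kernel row k(y, x_i)
    p = w / w.sum()                                      # transition probabilities from y
    return (p @ Psi) / lams                              # psi_j(y) = (1/lambda_j) sum_i p_i psi_j(x_i)
```

New objects then receive coordinates, and hence redshift estimates, without recomputing the full eigendecomposition.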

Estimation of Star Formation History Using Galaxy Spectra [Cid Fernandes et al., 2004; J. Richards et al., 2009]

Population synthesis: model galaxy spectra as linear combinations of observable data from K simple stellar populations (SSPs). Cid Fernandes et al.: elements of the base should span the range of spectral properties observed in sample galaxies and provide enough resolution in age (t) and metallicity (Z) to address the desired questions. Sampling strategy: how do we choose a grid of t and Z?

[Figure: diffusion K-means for SSP spectra, K = 45; embedding in the diffusion coordinates $\lambda_j/(1-\lambda_j)\,\psi_j$, j = 1, 2, 3. Basis spectra for CF5 and diffusion K-means, colored by log t.]
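
Diffusion K-means here amounts to ordinary K-means run in the diffusion coordinates, where Euclidean distance matches diffusion distance; a sketch assuming coords comes from a diffusion map as in the earlier sketch:

```python
from sklearn.cluster import KMeans

def diffusion_kmeans(coords, K=45, seed=0):
    """coords: n x k diffusion-map coordinates of the SSP spectra."""
    km = KMeans(n_clusters=K, n_init=10, random_state=seed).fit(coords)
    return km.labels_, km.cluster_centers_   # centers act as representative points in diffusion space
```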

SUMMARY: Exploiting Non-Linear Sparse Structure in Astronomical Data

Natural data often have low-dimensional structure, and in complex settings linear methods may not be adequate. SCA learns the underlying non-linear geometry (basis functions), which can greatly improve data understanding, visualization, high-dimensional inference, and sampling strategies, and can guide prior beliefs.

Open questions: How useful are these techniques for astronomy? Are they viable for immense data sets (billions of objects)? Semi-supervised learning: with only a small set of labeled data, prediction can be improved by learning the geometry from a large amount of unlabeled data (guiding prior beliefs).