Some Statistical Aspects of Photometric Redshift Estimation


James Long, November 9, 2016


Outline
- Photometric Redshift Estimation Background
- Non-Negative Least Squares
- Non-Negative Matrix Factorization
- EAZY Results

Redshift
- $f_{obs}$ (red line) is the observed spectrum
- $f_{rf}$ (blue line) is the true, unobserved rest-frame spectrum
- $f_{rf}(\lambda) = f_{obs}(\lambda(1 + z))$
- goal: estimate $z$
- relatively easy if we observe $f_{obs}$
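
A quick numeric check of this relation (a toy sketch; the Gaussian "emission line" and all numbers are made up for illustration):

```python
import numpy as np

z = 1.5
lam_rf = np.linspace(1000.0, 5000.0, 2000)             # rest-frame wavelengths (Angstroms)
f_rf = np.exp(-0.5 * ((lam_rf - 1216.0) / 20.0) ** 2)  # toy emission line

# Redshifting stretches wavelengths by (1 + z): the observed spectrum has the
# same fluxes, indexed at lam_obs = lam_rf * (1 + z).
lam_obs = lam_rf * (1 + z)
f_obs = f_rf

# Check f_rf(lambda) = f_obs(lambda * (1 + z)) by evaluating f_obs via interpolation.
f_rf_check = np.interp(lam_rf * (1 + z), lam_obs, f_obs)
assert np.allclose(f_rf_check, f_rf)
```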

Photometric Data
- spectral data is very expensive to collect
- much more photometric data is available
- photometry is the spectrum observed at a small set of wavelengths
- photometric redshift estimation (photoz) is the process of estimating redshift from photometric data

Source: Schafer, "A Framework for Statistical Inference in Astrophysics." http://www.annualreviews.org/doi/abs/10.1146/annurev-statistics-022513-115538

Machine Learning Approach to Photoz
Idea:
- create a training set by taking spectroscopy on a subset of the data
- construct a classifier on the training data, then apply it to the unlabeled data

Example: collected photometry in 7 filters for 114 galaxies.

[Figure: transmission curves of the 7 filters (f300, f450, f606, f814, irimh, irimj, irimk) versus wavelength, 5000-20000 Angstroms.]

Spectroscopic Redshift and Photometry

[Figure: pairs plot of spectroscopic redshift $z$ against the f450w, f606w, irimj, and irimh photometry. Only showing 4 of 7 filters.]

Apply Random Forest

[Figure: left, spectroscopic redshift versus random forest estimated redshift; right, density of residual/(1+z) for the random forest versus guessing the mean.]

Caveats:
- ignored uncertainty in the photometry (the features)
- training sets are often from nearby objects, while the unlabeled data are far-away objects
- difficult to compute uncertainty in the estimate
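
To make this step concrete, here is a minimal scikit-learn sketch of the random forest fit; the arrays are random placeholders standing in for the 7-filter photometry and spectroscopic labels, and none of the settings are from the talk:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(114, 7))          # placeholder: photometry in 7 filters
z_spec = np.abs(rng.normal(size=114))  # placeholder: spectroscopic redshifts

X_train, X_test, z_train, z_test = train_test_split(X, z_spec, random_state=0)
rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X_train, z_train)
z_hat = rf.predict(X_test)

# The accuracy metric from the talk: residual scaled by (1 + z).
scaled_resid = (z_hat - z_test) / (1 + z_test)
print("median |residual/(1+z)|:", np.median(np.abs(scaled_resid)))
```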

Easy and Accurate Zphot from Yale (EAZY)
- developed by Brammer, van Dokkum, and Coppi
- synthesizes ideas from several earlier works
- Paper: http://adsabs.harvard.edu/abs/2008ApJ...686.1503B
- Code: https://github.com/gbrammer/eazy-photoz/

Simple Idea

Notation:
- determine a set of model spectra $T_i$ for $i = 1, \ldots, n_T$: actual spectra, or spectra generated by theoretical models
- let $T_{z,i}$ be spectrum $i$ redshifted to $z$
- let $T_{j,z,i}$ be spectrum $i$ redshifted to $z$ and convolved with filter $j$
- $F_j$ and $\sigma_j$ are the flux and flux error in band $j$

Optimization function:
$$\hat{z} = \operatorname*{argmin}_{z} \; \min_{1 \le i \le n_T} \; \sum_{j=1}^{J} \left( \frac{T_{j,z,i} - F_j}{\sigma_j} \right)^2$$
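
A minimal sketch of this single-template grid search, assuming the redshifted-and-convolved template fluxes have been precomputed into an array T[k, i, j] over a redshift grid (illustrative only, not EAZY's implementation):

```python
import numpy as np

def photoz_single_template(T, F, sigma, z_grid):
    """Grid-search photo-z, picking the single best template at each redshift.

    T      : (n_z, n_T, J) template fluxes, redshifted and filter-convolved
    F      : (J,) observed fluxes
    sigma  : (J,) flux errors
    z_grid : (n_z,) candidate redshifts
    """
    chi2 = (((T - F) / sigma) ** 2).sum(axis=2)  # (n_z, n_T)
    best_per_z = chi2.min(axis=1)                # best template at each z
    return z_grid[np.argmin(best_per_z)]
```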

Outline
- Photometric Redshift Estimation Background
- Non-Negative Least Squares
- Non-Negative Matrix Factorization
- EAZY Results

Generalizing the Template Set

Consider expanding the template set through linear combinations:
$$T_z = \sum_{i=1}^{n_T} \alpha_i T_{z,i} = \alpha^T \mathbf{T}_z, \quad \mathbf{T}_z = (T_{z,1}, \ldots, T_{z,n_T})^T$$

Astronomical theory says that $\alpha_i \ge 0$ for all $i$, so the optimization problem becomes
$$\hat{z} = \operatorname*{argmin}_{z} \; \min_{\alpha : \alpha \ge 0} \; \sum_{j=1}^{J} \left( \frac{\sum_{i=1}^{n_T} \alpha_i T_{j,z,i} - F_j}{\sigma_j} \right)^2.$$

The inner problem is known as non-negative least squares in statistics. Solve on a grid of $z$.
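
The same grid search with non-negative template combinations, reusing the hypothetical T[k, i, j] array from the sketch above (again illustrative, not EAZY's code):

```python
import numpy as np
from scipy.optimize import nnls

def photoz_nnls(T, F, sigma, z_grid):
    """Photo-z with a non-negative template combination fit at each redshift."""
    best_chi2, best_z = np.inf, None
    for k, z in enumerate(z_grid):
        A = (T[k] / sigma).T        # (J, n_T) weighted design matrix
        b = F / sigma               # (J,) weighted fluxes
        alpha, rnorm = nnls(A, b)   # alpha >= 0; rnorm = ||A alpha - b||_2
        if rnorm ** 2 < best_chi2:
            best_chi2, best_z = rnorm ** 2, z
    return best_z
```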

More Familiar Statistical Notation

The non-negative least squares (NNLS) estimate is
$$\hat{\beta} = \operatorname*{argmin}_{\beta : \beta \ge 0} \|Y - X\beta\|_2^2$$
where
- $X \in \mathbb{R}^{n \times p}$ is the design matrix
- $Y \in \mathbb{R}^n$ is the response

Without the $\beta \ge 0$ constraint the problem has the familiar least squares solution
$$\operatorname*{argmin}_{\beta} \|Y - X\beta\|_2^2 = (X^T X)^{-1} X^T Y.$$

Example Data

[Figure: scatter plot of y against x with three fitted lines: ordinary least squares (OLS), non-negative least squares (NNLS), and a naive (incorrect) NNLS estimate.]

Example Likelihood Surface

[Figure: likelihood surface over $(\hat{\beta}_0, \hat{\beta}_1)$ with the feasible set $\beta \ge 0$ shaded; the OLS, NNLS, and naive (incorrect) NNLS estimates are marked.]

Non-Negative Least Squares Algorithms
- the R package nnls and scipy.optimize.nnls use the active set method of Lawson and Hanson (1974 book "Solving Least Squares Problems")
- EAZY uses "Multiplicative Updates for Nonnegative Quadratic Programming" (Sha et al. 2007, Neural Computation)
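
For example (simulated data; the coefficients are made up), scipy's NNLS can be compared directly against the closed-form OLS solution:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, 1.0])            # nonnegative truth
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)     # (X^T X)^{-1} X^T Y
beta_nnls, _ = nnls(X, Y)                        # argmin ||Y - X beta||_2, beta >= 0

print("OLS: ", beta_ols)
print("NNLS:", beta_nnls)
```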

Outline
- Photometric Redshift Estimation Background
- Non-Negative Least Squares
- Non-Negative Matrix Factorization
- EAZY Results

Constructing the Template Set

The $T_i$ are the templates. They may be:
- observed data
  - advantages: real data that does not make physical assumptions
  - disadvantages: sometimes expensive to collect; little data available at high redshifts
- output from physical simulations
  - advantages / disadvantages reversed from observed data

Having a small set of $T_i$ is convenient because
- interpretation is easier
- computation is faster

Constructing the Template Set

[Figure: normalized flux (0.000-0.015) versus wavelength (1000-10000 Angstroms) for the templates $T_i$, $i = 1, \ldots, 259$.]

259 templates with lots of redundancy; we would like to reduce the set.

Dimension Reduction for Template Construction
- $X \in \mathbb{R}^{n \times p}$ are the templates
  - $n$ = number of templates
  - $p$ = number of wavelength bins for each template
  - $p$ is the dimension of the data
- assume: the row vectors $x_i$ lie (approximately) in some lower-dimensional subspace of $\mathbb{R}^p$
- finding and characterizing this subspace is called dimension reduction

We would like this subspace to be characterized as linear combinations of nonnegative basis vectors.

Dimension Reduction Example

Consider $\{(x_{i1}, x_{i2})\}_{i=1}^{n}$, the first two dimensions of the templates.

[Figure: scatter plot of $x_{i2}$ (0.000-0.012) against $x_{i1}$ (0.000-0.015).]

Message: the intrinsic dimension is near 1, so we can compress the two-dimensional data into 1 dimension.

Principal Components Analysis (PCA)

Idea:
- realign the axes so the most variation is on the first axis, the second most variation on the second axis, and so on
- ignore higher axes because there is minimal variation in those directions
- the principal components describe how the new axes map to the old axes

PCA is typically applied to a scaled version of $X$:
$$\tilde{X} = (X - \mathbf{1}\mu^T)S^{-1}$$
- remove the column means $\mu$
- scale the column variances to 1: $S$ is diagonal with $S_{jj}$ = standard deviation of column $j$ of $X$

PCA Math: Singular Value Decomposition

The singular value decomposition of $\tilde{X}$ (assuming $n > p$) is
$$\tilde{X} = U \Sigma V^T$$
where
- $U$ is $n \times p$ with $U^T U = I$: the data in the new coordinate system ($U$ is $n \times p$ in R's svd, $n \times n$ in theory)
- $V$ is $p \times p$ with $V^T V = I$: $V$ rotates the new coordinates to the old coordinates
- $\Sigma$ is $p \times p$ diagonal with $\Sigma_{jj} \ge \Sigma_{kk}$ for $j < k$: $\Sigma$ scales the new coordinates to the old coordinates ($\Sigma$ is $p \times p$ in R, $n \times p$ in theory)

Reconstructing the Data
- a $q \le p$ dimensional reconstruction of $\tilde{X}$ (in R notation) is $\tilde{X}_q = U[, 1{:}q]\,\Sigma[1{:}q, 1{:}q]\,V[, 1{:}q]^T$
- if the data lie (approximately) on a $q$-dimensional subspace then $\tilde{X} \approx \tilde{X}_q$
- undo the scaling to obtain an approximation of the original data: $X_q = \tilde{X}_q S + \mathbf{1}\mu^T$ and $X \approx X_q$
- the reduced template set is $V[, 1{:}q]$
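
A short numpy sketch of this scale / SVD / reconstruct pipeline (numpy's svd standing in for R's svd; the data are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 10, 3
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))  # correlated columns

# Scale: remove column means, set column standard deviations to 1.
mu = X.mean(axis=0)
s = X.std(axis=0)
X_tilde = (X - mu) / s

# full_matrices=False gives the n x p "R-style" U.
U, sig, Vt = np.linalg.svd(X_tilde, full_matrices=False)

# Rank-q reconstruction of X_tilde, then undo the scaling.
X_tilde_q = (U[:, :q] * sig[:q]) @ Vt[:q, :]
X_q = X_tilde_q * s + mu

print("relative error:", np.linalg.norm(X - X_q) / np.linalg.norm(X))
```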

Non-negative Matrix Factorization

Decomposition: when the data matrix $X$ is nonnegative we can decompose
$$X_{n \times p} \approx W_{n \times r} H_{r \times p}$$
where the rows of $H$ form the basis.

Algorithm: maximize
$$L(W, H) = \sum_{i=1}^{n} \sum_{j=1}^{p} \left[ x_{ij} \log (WH)_{ij} - (WH)_{ij} \right],$$
i.e., maximum likelihood under the model $x_{ij} \sim \text{Poisson}((WH)_{ij})$.

Source: The Elements of Statistical Learning, Hastie, Tibshirani, and Friedman, Chapter 14.6. Available: http://statweb.stanford.edu/~tibs/elemstatlearn/
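
As a hedged illustration (random placeholder data), scikit-learn's NMF with beta_loss='kullback-leibler' minimizes the generalized KL divergence, which differs from the negative of this Poisson log-likelihood only by terms constant in $W$ and $H$:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.poisson(lam=5.0, size=(100, 50)).astype(float)  # nonnegative data

# solver='mu' uses multiplicative updates, the classic NMF algorithm.
model = NMF(n_components=4, solver='mu', beta_loss='kullback-leibler',
            init='random', random_state=0, max_iter=500)
W = model.fit_transform(X)  # n x r
H = model.components_       # r x p; rows are the nonnegative basis
print(W.shape, H.shape)
```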

Non-negative Matrix Factorization (NMF)
- the NMF basis vectors often have more physical interpretation than the PCA basis, e.g., NMF spectral basis elements look like spectra
- optimizing the log likelihood is difficult for NMF
- there are identifiability issues with the model

Source: Donoho and Stodden, "When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?" https://web.stanford.edu/~vcs/papers/nmfcdp.pdf

MNIST Data Set

[Figure: 12 of the 4072 images of the digit four.]

- each image is $28 \times 28$ pixels
- vectorize image $i$ to $x_i \in \mathbb{R}^{784}$
- $X \in \mathbb{R}^{4072 \times 784}$
- apply PCA and NMF

MNIST Results: PCA versus NMF

[Figure: left, PCA basis (mean + 3 principal components); right, NMF basis (4 basis vectors).]

Outline
- Photometric Redshift Estimation Background
- Non-Negative Least Squares
- Non-Negative Matrix Factorization
- EAZY Results

Random Forest / EAZY Comparison

[Figure: spectroscopic redshift versus estimated redshift (both axes 0-6) for the random forest (left) and EAZY (right).]

Random Forest / EAZY Comparison

[Figure: density of residual/(1+z) for the random forest, guessing the mean, and EAZY.]

Other Challenges / Opportunities / Approaches
- accounting for template error (model misspecification)
- estimating templates from photometric data
- propagating uncertainty on the redshift to the next stage of analysis

Other work on photoz:
- Ball et al. (2008), ApJ: "Robust machine learning applied to astronomical data sets. III. Probabilistic photometric redshifts for galaxies and quasars in the SDSS and GALEX"
- Benitez (2000), ApJ: "Bayesian photometric redshift estimation"
- Carliles et al. (2010), ApJ: "Random forests for photometric redshifts"