Big Data Inference: Combining Hierarchical Bayes and Machine Learning to Improve Photometric Redshifts

ICHASC, Apr. 25, 2017

Josh Speagle (1; NSF Graduate Research Fellow), jspeagle@cfa.harvard.edu

In collaboration with: Boris Leistedt (2), Alexie Leauthaud (3), Daniel Eisenstein (1), Kevin Bundy (3, 4), Peter Capak (5), Ben Hoyle (6), Daniel Masters (5), Daniel Mortlock (7), and Hiranya Peiris (8, 9)

Affiliations: (1) Harvard University, (2) NYU, (3) UC Santa Cruz, (4) UC Observatories, (5) IPAC, (6) LMU Munich, (7) Imperial College London, (8) University College London, (9) Oskar Klein Centre for Cosmoparticle Physics

What are Photometric Redshifts?

Basic Concept

A spectrum (image: SDSS Skyserver) yields a spectroscopic redshift (spec-z).

Broadband filters (image: SDSS Skyserver) yield a spectral energy distribution (SED).

http://www.stsci.edu/~dcoe/bpz/intro.html

Why do we care?

Because we have to.

Many questions now require large samples of galaxies to answer: we are entering the Big Data era of astronomy.

Wide-field imaging surveys are much cheaper and faster than spectroscopic surveys, and they can also see fainter objects.

The ~100x increase in sample size and diversity makes up for photo-z uncertainties. (Detailed studies can rely on the ~1% spectroscopic subsample.)

Science Case: Precision cosmology

Using large samples of galaxies to pin down the dark energy equation of state, growth of large-scale structure, etc.

[Figure taken from http://kids.strw.leidenuniv.nl/goals.php]

Computing Photo-z's

Photo-z's: Statistically Speaking

A forward modeling problem: can we construct a model from parameters we care about that matches the observed SED?

Parameters: star formation rate, star formation history, redshift, stellar mass, AGN activity, dust content, metallicity.

Traditional Approach (e.g., COSMOS: Ilbert et al. 2009; Laigle et al. 2016)

The model photometry is built up one component at a time:

Galaxy: $\mathrm{Photometry} = F(a \mid T_{\rm gal})$

Emission lines: $\mathrm{Photometry} = F(a, b \mid T_{\rm gal}, T_{\rm em})$

Reddening: $\mathrm{Photometry} = F(a, b, c, E(B-V) \mid T_{\rm gal}, T_{\rm em}, T_{\rm dust})$

Redshift: $\mathrm{Photometry} = F(a, b, c, E(B-V), z \mid T_{\rm gal}, T_{\rm em}, T_{\rm dust})$

IGM (Lyman limit, Lyα): $\mathrm{Photometry} = F(a, b, c, E(B-V), z \mid T_{\rm gal}, T_{\rm em}, T_{\rm dust}, T_{\rm IGM})$

Photometry: $\mathrm{Photometry} = F(a, b, c, E(B-V), z \mid T_{\rm gal}, T_{\rm em}, T_{\rm dust}, T_{\rm IGM}, \mathrm{filters})$

[Figure: model SED at each step; flux (Jy; normalized) versus wavelength (Å).]

Photo-z's: Statistically Speaking

An inverse mapping problem: can we use machine learning to construct a mapping from color to redshift?

Parameters: star formation rate, star formation history, redshift, stellar mass, AGN activity, dust content, metallicity.

Machine Learning

[Figure: training set.]

[Figure: decision tree.]

Feature vector (observed color) $\sim P(\mathbf{C} \mid g)$; targets (redshift) $z_1, z_2, \ldots, z_8$.

Kernel density estimation: targets $z_1, \ldots, z_8$ with weights 1, 7, 3, 1, 0, 0, 0, 0 give $P(z \mid g)$.
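As a rough illustration of the last step, a weighted kernel density estimate turns the eight targets and their weights into a redshift PDF. The weights below are the ones on the slide; the redshift values and the Gaussian bandwidth are made up for the example.

```python
import math


def weighted_kde(z_grid, z_train, weights, bandwidth):
    """Gaussian KDE over training redshifts, weighted and normalized by the total weight."""
    total = float(sum(weights))
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * bandwidth)
    pdf = []
    for z in z_grid:
        dens = sum(w * norm * math.exp(-0.5 * ((z - zt) / bandwidth) ** 2)
                   for zt, w in zip(z_train, weights))
        pdf.append(dens / total)
    return pdf


z_train = [0.10, 0.25, 0.30, 0.45, 0.60, 0.80, 1.00, 1.20]  # hypothetical z_1 ... z_8
weights = [1, 7, 3, 1, 0, 0, 0, 0]                          # weights from the slide

dz = 0.01
z_grid = [k * dz for k in range(0, 201)]                    # 0 <= z <= 2
pdf = weighted_kde(z_grid, z_train, weights, bandwidth=0.05)

area = sum(pdf) * dz                     # close to 1 if the grid covers the kernels
z_peak = z_grid[pdf.index(max(pdf))]     # mode of P(z|g)
```

Targets with zero weight drop out entirely, so the PDF is shaped only by the training objects the learner associates with the target galaxy.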

Two Communities

Model-fitting approaches (color-redshift relation assumed): probabilistic; interpretable; sensitive to systematics; generally slow.

Machine-learning approaches (feature-redshift relation derived): flexible, data-driven; more robust to systematics; generally fast; difficult to interpret; difficult to derive PDFs.

What are we doing?

Splitting Photo-z's into Two Parts

Mapping from features to redshift, which can be done through model fitting and/or machine learning: $P(z \mid \mathbf{F})$ or $P(\mathbf{F} \mid z, \eta)$. For a training galaxy $h$: $P(z \mid \hat{\mathbf{F}}_h, \hat{\mathbf{C}}_h)$ and $P(\mathbf{F}_h \mid \hat{\mathbf{F}}_h, \hat{\mathbf{C}}_h)$.

Propagating uncertainties in features: $P(\mathbf{F} \mid \hat{\mathbf{F}}, \hat{\mathbf{C}})$. For a target galaxy $g$: $P(\mathbf{F}_g \mid \hat{\mathbf{F}}_g, \hat{\mathbf{C}}_g)$.

Here $\hat{\mathbf{F}}$ and $\hat{\mathbf{C}}$ denote the observed features and their covariance.

Feature Projection

$P(\mathbf{F}_h \mid \hat{\mathbf{F}}_h, \hat{\mathbf{C}}_h)$ and $P(\mathbf{F}_g \mid \hat{\mathbf{F}}_g, \hat{\mathbf{C}}_g)$
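One way to see what propagating both feature uncertainties implies, assuming independent normal errors on each galaxy: the difference between the two latent fluxes has a variance equal to the sum of the two variances. The Monte Carlo check below uses made-up fluxes and errors for a single band.

```python
import random
import statistics

rng = random.Random(1)

f_g_obs, sigma_g = 10.0, 0.6   # hypothetical observed flux and error, target galaxy g
f_h_obs, sigma_h = 9.0, 0.8    # hypothetical observed flux and error, training galaxy h

# Sample latent fluxes for g and h around their observed values and difference them.
diffs = [rng.gauss(f_g_obs, sigma_g) - rng.gauss(f_h_obs, sigma_h)
         for _ in range(20000)]

mc_mean = statistics.mean(diffs)
mc_var = statistics.variance(diffs)
expected_var = sigma_g ** 2 + sigma_h ** 2   # variances add for independent errors
```

This is why a comparison between a target and a training galaxy naturally carries both error budgets rather than only the target's.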

The Big Data Assumption

$$P(z \mid g) = \sum_h P(z \mid h)\, P(h \mid g)$$

Here $z$ is the redshift, $g$ the target galaxy, $h$ a training galaxy, $P(z \mid h)$ the redshift PDF, and $P(h \mid g)$ the photometric posterior.

Applying Bayes' theorem to the photometric posterior:

$$P(z \mid g) = \sum_h P(z \mid h)\, P(h \mid g) = \sum_h P(z \mid h)\, \frac{P(g \mid h)\, P(h)}{P(g)}$$

Including the band mask $b$, the flag $s_g = 1$ (galaxy is observed), and the selection effects $S_g$:

$$P(z \mid g) = \sum_h P(z \mid h)\, P(h \mid g, s_g = 1, S_g) = \sum_h P(z \mid h)\, \frac{P(g \mid h, b)\, P(h)}{P(g)}$$

1. What is this likelihood?
2. How do we compute it?
3. How do we deal with missing data?
4. How do we incorporate selection effects?
5. What is our prior?
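A minimal sketch of the sum over training galaxies, assuming each training galaxy's redshift PDF is tabulated on a shared grid. The log-likelihoods, priors, and PDFs below are invented numbers; normalizing the weights by their sum plays the role of dividing by $P(g)$.

```python
import math


def photometric_posterior(log_likes, priors):
    """P(h|g) proportional to P(g|h) P(h), normalized over the training set."""
    top = max(log_likes)  # subtract the maximum for numerical stability
    unnorm = [math.exp(ll - top) * p for ll, p in zip(log_likes, priors)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def redshift_pdf(z_pdfs, post):
    """P(z|g) = sum_h P(z|h) P(h|g), with every P(z|h) on the same redshift grid."""
    n_z = len(z_pdfs[0])
    return [sum(p_h * pdf_h[k] for p_h, pdf_h in zip(post, z_pdfs))
            for k in range(n_z)]


# Three hypothetical training galaxies with redshift PDFs on a 4-point grid.
z_pdfs = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 0.5, 0.5, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
log_likes = [-0.5, -1.0, -8.0]     # hypothetical ln P(g|h)
priors = [1 / 3, 1 / 3, 1 / 3]     # uniform P(h)

post = photometric_posterior(log_likes, priors)
pz = redshift_pdf(z_pdfs, post)
```

The poorly matching third galaxy contributes almost nothing, so `pz` is dominated by the redshift PDFs of the first two.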

Likelihood / Distance Metric

What metric(s) to use for $P(g \mid h)$, comparing a training galaxy with an observed galaxy?

[Figures: galaxy images (NGC 4414, NGC 5457); broadband filters (SDSS Skyserver).]

Assume the data are normally distributed: $P(\mathbf{F}_g \mid \hat{\mathbf{F}}_g, \hat{\mathbf{C}}_g) = \mathcal{N}(\mathbf{F}_g \mid \hat{\mathbf{F}}_g, \hat{\mathbf{C}}_g)$

Color space (traditional). Scale-free; requires magnitude priors to account for galaxy evolution.

$$\chi^2 = \sum_i \frac{(\hat{F}_{g,i} - s\,\hat{F}_{h,i})^2}{\sigma_{g,i}^2 + s^2 \sigma_{h,i}^2}$$

Magnitude space (new). Scale-dependent; requires good sampling in full magnitude space.

$$\chi^2 = \sum_i \frac{(\hat{F}_{g,i} - \hat{F}_{h,i})^2}{\sigma_{g,i}^2 + \sigma_{h,i}^2}$$
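The two metrics differ only in the scale factor `s`. The sketch below implements both for independent per-band errors. The `best_scale` helper is an assumption added for illustration: it iterates a weighted least-squares estimate of `s` with the variances held at the previous value, which is an approximation and not necessarily the exact minimizer of the color-space $\chi^2$.

```python
def chi2_magnitude(f_g, var_g, f_h, var_h):
    """Scale-dependent chi2: fluxes are compared directly."""
    return sum((fg - fh) ** 2 / (vg + vh)
               for fg, vg, fh, vh in zip(f_g, var_g, f_h, var_h))


def chi2_color(f_g, var_g, f_h, var_h, s):
    """Scale-free chi2 for a given scale factor s applied to the training galaxy."""
    return sum((fg - s * fh) ** 2 / (vg + s * s * vh)
               for fg, vg, fh, vh in zip(f_g, var_g, f_h, var_h))


def best_scale(f_g, var_g, f_h, var_h, n_iter=20):
    """Approximate scale: iterated weighted least squares (hypothetical helper)."""
    s = 1.0
    for _ in range(n_iter):
        w = [1.0 / (vg + s * s * vh) for vg, vh in zip(var_g, var_h)]
        s = (sum(wi * fg * fh for wi, fg, fh in zip(w, f_g, f_h)) /
             sum(wi * fh * fh for wi, fh in zip(w, f_h)))
    return s


# Same colors, but the observed galaxy is twice as bright as the training galaxy.
f_h, var_h = [1.0, 2.0, 3.0], [0.01, 0.01, 0.01]
f_g, var_g = [2.0, 4.0, 6.0], [0.01, 0.01, 0.01]

s_hat = best_scale(f_g, var_g, f_h, var_h)
chi2_c = chi2_color(f_g, var_g, f_h, var_h, s_hat)
chi2_m = chi2_magnitude(f_g, var_g, f_h, var_h)
```

In this example the color-space metric treats the two galaxies as a perfect match, while the magnitude-space metric penalizes the brightness difference, which is the trade-off named on the slide.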

Population: Mag v Color

The Problem: likelihood $P(h \mid g)$ of the galaxy over the photometric training set. [Figure: training data and galaxy; Cool et al. (2007).]

Machine Learning Approximation: likelihood $P(h \mid g)$. [Figure: training data and galaxy; Cool et al. (2007).]

Population: FRANKEN-Z

How Valid is Our Approximation?

Missing Data

How to deal with missing data? The likelihood becomes $P(g \mid h, b)$, where $b$ is the band mask shared by the training galaxy and the observed galaxy.

Now what? $-2 \ln \mathcal{L} = \;???$

Naïve Likelihood: Multivariate Normal

$$-2 \ln \mathcal{L}_n = \chi^2_n(\delta_n) + n \ln 2\pi + \ln \left| \hat{\mathbf{C}}_g + \hat{\mathbf{C}}_h \right|$$

Implications [Figures: SDSS Skyserver]

With 5 bands:

$$-2 \ln \mathcal{L}_5 = \chi^2_5 + 5 \ln 2\pi + \ln \left| \hat{\mathbf{C}}_g + \hat{\mathbf{C}}_h \right|$$

With 3 bands:

$$-2 \ln \mathcal{L}_3 = \chi^2_3 + 3 \ln 2\pi + \ln \left| \hat{\mathbf{C}}_g + \hat{\mathbf{C}}_h \right|$$
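The issue with the naive likelihood can be made concrete with a small sketch, assuming diagonal covariances (independent band errors) and a boolean band mask. All numbers are invented. Two comparisons with identical fluxes get different values of $-2 \ln \mathcal{L}$ purely because they use a different number of bands.

```python
import math


def neg2_lnlike(f_g, var_g, f_h, var_h, mask):
    """Naive -2 ln L for independent normal errors, over bands where mask is True.

    Returns (-2 ln L, chi2, number of bands used).
    """
    chi2, log_det, n = 0.0, 0.0, 0
    for fg, vg, fh, vh, use in zip(f_g, var_g, f_h, var_h, mask):
        if not use:
            continue
        var = vg + vh                 # diagonal entry of C_g + C_h
        chi2 += (fg - fh) ** 2 / var
        log_det += math.log(var)      # ln|C_g + C_h| for a diagonal covariance
        n += 1
    return chi2 + n * math.log(2.0 * math.pi) + log_det, chi2, n


f = [1.0, 2.0, 3.0, 2.5, 2.0]         # hypothetical fluxes, identical for g and h
var = [1.0] * 5

full = neg2_lnlike(f, var, f, var, [True] * 5)
partial = neg2_lnlike(f, var, f, var, [True, True, True, False, False])
offset = full[0] - partial[0]         # nonzero although chi2 is 0 in both cases
```

Because the normalization terms scale with the number of bands, naive likelihoods from different band masks are not directly comparable.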

Solutions

How to deal with missing data? $P(g \mid h, b)$. [Figures: SDSS Skyserver.] Treat $-2 \ln \mathcal{L}$ in terms of variables $X_n$ that are i.i.d. $\chi^2_n(\delta_n)$.

Implementation

$-2 \ln \mathcal{L} = \;???$: some transformation of a doubly non-central F-distribution.

Ignore the non-centrality and apply a first-order correction in $N_b$ to $\chi^2_n$, where

$$\chi^2 = \sum_i \frac{(\hat{F}_{g,i} - \hat{F}_{h,i})^2}{\sigma_{g,i}^2 + \sigma_{h,i}^2}$$

Missing Data: Searching for Neighbors

How to deal with missing data? $P(g \mid h, b)$. [Figure: training data and galaxy; SDSS Skyserver; Cool et al. (2007).]

Selection Effects

How to deal with selection effects? Replace $P(g \mid h)$, written as $P(\hat{\mathbf{F}}_g \mid g)$, with $P(\hat{\mathbf{F}}_g \mid g, s_g = 1, S_g)$, where $s_g$ is a binary selection flag (1 = in, 0 = out) and $S_g$ denotes the selection effect(s):

$$P(\hat{\mathbf{F}}_g \mid g, s_g = 1, S_g) = \frac{P(s_g = 1 \mid \hat{\mathbf{F}}_g, S_g)\, P(\hat{\mathbf{F}}_g \mid g)}{P(s_g = 1 \mid g, S_g)}$$

The numerator is the selection probability times the original PDF; the denominator is the marginalized selection probability.

Application: Signal-to-Noise Cut
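A one-band sketch of a signal-to-noise cut, assuming a normal noise model with known error and a hard threshold on observed flux over error. The flux, error, and threshold values are invented for illustration.

```python
import math


def normal_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def normal_cdf(x, mean, sigma):
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def selection_prob(f_obs, sigma, snr_cut):
    """P(s=1 | F_obs, S): 1 if the observed flux passes the S/N cut, else 0."""
    return 1.0 if f_obs / sigma >= snr_cut else 0.0


def marginal_selection_prob(f_true, sigma, snr_cut):
    """P(s=1 | g, S): chance that a noisy observation of f_true passes the cut."""
    return 1.0 - normal_cdf(snr_cut * sigma, f_true, sigma)


def selected_pdf(f_obs, f_true, sigma, snr_cut):
    """P(F_obs | g, s=1, S): original PDF times selection, renormalized."""
    return (selection_prob(f_obs, sigma, snr_cut) * normal_pdf(f_obs, f_true, sigma)
            / marginal_selection_prob(f_true, sigma, snr_cut))


# A hypothetical galaxy sitting exactly at the S/N = 5 threshold.
f_true, sigma, snr_cut = 5.0, 1.0, 5.0
p_sel = marginal_selection_prob(f_true, sigma, snr_cut)

dx = 0.001
area = sum(selected_pdf(5.0 + (k + 0.5) * dx, f_true, sigma, snr_cut)
           for k in range(10000)) * dx    # integrates the truncated PDF
```

Only half of the noisy realizations of this galaxy pass the cut, so the selected PDF is the upper half of the original normal, rescaled to integrate to one.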

Mixing and Matching Galaxies (based on Leistedt, Peiris, & Mortlock 2016)

How to deal with population mismatch?

Draw an assignment for each galaxy: $\tilde{\mathbf{p}}_g \sim \mathrm{Mult}(n = 1, \mathbf{p} = \mathbf{p}_g)$, with $N_h$ categories and probability vector $\mathbf{p}_g = P(h \mid g)$. For example, with $h = \{0, 1, 2\}$, the draw $\tilde{\mathbf{p}}_g = (0, 0, 1)$ means $h_g = 2$. [Figure: trinomial distribution, from Mathworks.]

Sum the draws into counts: $\mathbf{n} \sim \sum_g \tilde{\mathbf{p}}_g$.

Model the population $P(h)$ as $\mathrm{Dir}(\mathbf{w} \mid \boldsymbol{\alpha} = \mathbf{n} + 1)$, with population weights $\mathbf{w}$, concentration $\boldsymbol{\alpha}$, and counts $\mathbf{n}$. [Figure: low versus high concentration, courtesy of Wikipedia.]
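The two random draws on this slide can be sketched with the standard library alone. The Dirichlet draw uses the usual construction of normalizing independent Gamma variates; the counts are made-up numbers chosen to contrast low and high concentration.

```python
import random


def one_hot_draw(p_g, rng):
    """Draw from Mult(n=1, p=p_g): a one-hot vector selecting one training galaxy h."""
    h = rng.choices(range(len(p_g)), weights=p_g, k=1)[0]
    return [1 if i == h else 0 for i in range(len(p_g))]


def dirichlet_draw(alpha, rng):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_i, 1) variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [x / total for x in gammas]


rng = random.Random(0)

draw = one_hot_draw([0.0, 0.0, 1.0], rng)   # all probability on h = 2

counts_low = [1, 2, 3]                      # few objects: low concentration
counts_high = [100, 200, 300]               # many objects: high concentration
w_low = dirichlet_draw([n + 1 for n in counts_low], rng)
w_high = dirichlet_draw([n + 1 for n in counts_high], rng)
```

Repeated draws of `w_high` stay close to the count fractions, while draws of `w_low` scatter widely, which is the behavior the concentration figure illustrates.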

Hierarchical Modeling (based on Leistedt, Peiris, & Mortlock 2016)

How to deal with population mismatch? Target the hierarchical posterior $P(\mathbf{w}, \{\tilde{\mathbf{p}}_g\} \mid D, S)$, where $\mathbf{w}$ are the population weights (training set).

Gibbs sampling alternates between $P(\{\tilde{\mathbf{p}}_g\} \mid \mathbf{w}, S)$ and $P(\mathbf{w} \mid \{\tilde{\mathbf{p}}_g\}, S)$:

1. Sample hierarchical posteriors: $\tilde{\mathbf{p}}_g^{(i)} \sim \mathrm{Mult}(n = 1, \mathbf{p} = \mathbf{p}_g(\mathbf{w}^{(i-1)}))$
2. Compute counts: $\mathbf{n}^{(i)} \sim \sum_g \tilde{\mathbf{p}}_g^{(i)}$
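The sampling loop can be sketched as follows. This is an illustrative toy, not the FRANKEN-Z implementation: the mock data (two populations, a noisy binary "color") are invented, and the weight update `w ~ Dir(n + 1)` is assumed from the Dirichlet model for the population weights.

```python
import random


def gibbs_population_weights(likes, n_iter, rng):
    """Alternate between per-galaxy assignments and population weights.

    likes[g][h] holds P(g|h) for target galaxy g and training galaxy h.
    """
    n_h = len(likes[0])
    w = [1.0 / n_h] * n_h                       # start from uniform weights
    samples = []
    for _ in range(n_iter):
        counts = [0] * n_h
        for like_g in likes:
            # p_g given w: likelihood times the current population weights
            p_g = [l * wh for l, wh in zip(like_g, w)]
            h = rng.choices(range(n_h), weights=p_g, k=1)[0]   # Mult(n=1, p=p_g)
            counts[h] += 1                      # counts = sum of one-hot draws
        gammas = [rng.gammavariate(n + 1, 1.0) for n in counts]  # w ~ Dir(n + 1)
        total = sum(gammas)
        w = [x / total for x in gammas]
        samples.append(w)
    return samples


rng = random.Random(42)
n_gal, true_w0 = 2000, 0.8
likes = []
for _ in range(n_gal):
    t = 0 if rng.random() < true_w0 else 1      # true population of this galaxy
    c = t if rng.random() < 0.7 else 1 - t      # noisy binary "color"
    likes.append([0.7 if c == h else 0.3 for h in (0, 1)])

samples = gibbs_population_weights(likes, n_iter=150, rng=rng)
kept = samples[50:]                             # discard burn-in
w0_hier = sum(w[0] for w in kept) / len(kept)

# For comparison: stack per-galaxy posteriors under a fixed uniform prior.
w0_naive = sum(l[0] / (l[0] + l[1]) for l in likes) / n_gal
```

With these mock settings the uniform-prior stack is pulled toward 0.5, while the hierarchical estimate moves toward the true population fraction.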

Application to Mock Data

Mock setup: errors $\sim \mathcal{N}$, independent; all objects share the same PDF.

[Figure panels: counts from posterior (uniform prior); counts from hierarchical model.]

Tests on ~100k SDSS Galaxies

Next Steps

Robustness of Training Data/Priors

Hyper Suprime-Cam (HSC) SSP has ~380k objects taken from 11 surveys, with a wide variety of selection criteria and data quality/reliability. (Note: old plot from an early internal data release.)

[Figure, preliminary: results with a 1:1:1 prior versus a 10:5:1 prior, where the ratios are the relative weights of spec-z : g/prism-z : photo-z.]

Incorporating Physical Priors (Leistedt & Hogg 2016)

Moving a galaxy from one redshift to another is a smooth, physical process that is well understood. We want to incorporate this into our priors and predictions, using a Gaussian process (GP) with a physical kernel.

The GP can be used to augment the training data, and it makes imputing missing values straightforward.

Likelihood / Distance Metric

What metric(s) to use for $P(g \mid h)$? More sophisticated machine learning methods could be used to compute posterior samples (or possibly likelihoods) over complex domains.

[Figure: training galaxy and observed galaxy; NGC 4414, NGC 5457.]

Summary

Photometric redshifts (photo-z's) are an integral part of modern big data extragalactic science.

Large training datasets give new opportunities to develop Bayesian, data-driven photo-z's.

Taking advantage of these datasets requires dealing with real-world problems (e.g., biased training data) using a variety of statistical methods (e.g., hierarchical Bayes).

Early results look promising!

Code is available!

Although still under active development, the code, tutorials, and a rough draft of a paper are online at: github.com/joshspeagle/frankenz