ICHASC, Apr. 25 2017 Big Data Inference Combining Hierarchical Bayes and Machine Learning to Improve Photometric Redshifts Josh Speagle 1, jspeagle@cfa.harvard.edu In collaboration with: Boris Leistedt 2, Alexie Leauthaud 3, Daniel Eisenstein 1, Kevin Bundy 3,4, Peter Capak 5, Ben Hoyle 6, Daniel Masters 5, Daniel Mortlock 7, and Hiranya Peiris 8,9 NSF Graduate Research Fellow, 1 Harvard University, 2 NYU, 3 UC Santa Cruz, 4 UC Observatories, 5 IPAC, 6 LMU Munich, 7 Imperial College London, 8 University College London, 9 Oskar Klein Centre for Cosmoparticle Physics
What are Photometric Redshifts?
Basic Concept?
Basic Concept? SDSS Skyserver Spectrum
Basic Concept Spec-z SDSS Skyserver Spectrum
Basic Concept Spec-z SDSS Skyserver Spectrum
Basic Concept Spec-z SDSS Skyserver Spectrum
Basic Concept? SDSS Skyserver Broadband filters
Basic Concept Spectral Energy Distribution (SED)? SDSS Skyserver Broadband filters
Basic Concept Spectral Energy Distribution (SED) SDSS Skyserver Broadband filters
http://www.stsci.edu/~dcoe/bpz/intro.html
Why do we care?
Because we have to. Why do we care?
Because we have to. Why do we care? Many questions now require large samples of galaxies to answer now entering the Big Data era of astronomy.
Because we have to. Why do we care? Many questions now require large samples of galaxies to answer now entering the Big Data era of astronomy. Wide-field imaging surveys much cheaper and faster than spectroscopic surveys. Also can see fainter objects.
Because we have to. Why do we care? Many questions now require large samples of galaxies to answer now entering the Big Data era of astronomy. Wide-field imaging surveys much cheaper and faster than spectroscopic surveys. Also can see fainter objects. ~100x increase in sample size, diversity makes up for photo-z uncertainties. (Detailed studies can rely on ~1% spectroscopic subsample.)
Science Case Precision cosmology Using large samples of galaxies to pin down the dark energy equation of state, growth of large-scale structure, etc. Taken from http://kids.strw.leidenuniv.nl/goals.php.
Computing Photo-z s
Photo-z s: Statistically Speaking Star formation rate Star formation history Redshift Stellar mass AGN activity Dust content Metallicity
Photo-z s: Statistically Speaking A forward modeling problem: can we construct a model from parameters we care about that matches the observed SED? Star formation rate Star formation history Redshift Stellar mass AGN activity Dust content Metallicity
Flux (Jy; normalized) Photometry = F( a T gal ) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Galaxy Wavelength (A)
Flux (Jy; normalized) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Photometry = F( a, b T gal, T em ) Emission lines Galaxy Wavelength (A)
Flux (Jy; normalized) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Photometry = F( a, b, c, E(B V) T gal, T em, T dust ) Emission lines Galaxy Reddening Wavelength (A)
Flux (Jy; normalized) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Photometry = F( a, b, c, E(B V), z T gal, T em, T dust ) Emission lines Galaxy Reddening Redshift Wavelength (A)
Flux (Jy; normalized) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Photometry = F( a, b, c, E(B V), z T gal, T em, T dust, T IGM ) Emission lines Galaxy Reddening Redshift Lyman limit IGM Lyα Wavelength (A)
Flux (Jy; normalized) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Photometry = F( a, b, c, E(B V), z T gal, T em, T dust, T IGM, filters) Emission lines Galaxy Reddening Redshift Photometry IGM Wavelength (A)
Flux (Jy; normalized) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Photometry = F( a, b, c, E(B V), z T gal, T em, T dust, T IGM, filters) Emission lines Galaxy Reddening Redshift Photometry IGM Wavelength (A)
Photo-z s: Statistically Speaking A forward modeling problem: can we construct a model from parameters we care about that matches the observed SED? Star formation rate Star formation history Redshift Stellar mass AGN activity Dust content Metallicity
Photo-z s: Statistically Speaking An inverse mapping problem: can we use machine learning to construct a mapping from color to redshift? Star formation rate Star formation history Redshift Stellar mass AGN activity Dust content Metallicity
Machine Learning Training set
Machine Learning Decision Tree
Machine Learning
Machine Learning z 1 z 2 z 3 z 4 z 5 z 6 z 7 z 8 Target (redshift)
Machine Learning ~ P C g Feature vector (observed color) z 1 z 2 z 3 z 4 z 5 z 6 z 7 z 8 Target (redshift)
Machine Learning ~ P C g Feature vector (observed color) z 1 z 2 z 3 z 4 z 5 z 6 z 7 z 8 Target (redshift)
Machine Learning
Machine Learning z 1 z 2 z 3 z 4 z 5 z 6 z 7 z 8 1 7 3 1 0 0 0 0 Target Weight Kernel Density Estimation P z g
Two Communities Model-fitting approaches (color-redshift relation assumed) Machine-learning approaches (feature-redshift relation derived)
Two Communities Model-fitting approaches (color-redshift relation assumed) Machine-learning approaches (feature-redshift relation derived)
Two Communities Model-fitting approaches (color-redshift relation assumed) Probabilistic Interpretable Sensitive to systematics Generally slow Machine-learning approaches (feature-redshift relation derived)
Two Communities Model-fitting approaches (color-redshift relation assumed) Probabilistic Interpretable Sensitive to systematics Generally slow Flexible, data-driven More robust to systematics Generally fast Difficult to interpret Difficult to derive PDFs Machine-learning approaches (feature-redshift relation derived)
What are we doing?
Splitting Photo-z s into Two Parts Mapping from features to redshift. Can be done through model fitting and/or machine learning. P z F or P F z, η
Splitting Photo-z s into Two Parts Mapping from features to redshift. Can be done through model fitting and/or machine learning. P z F or P F z, η Propagating uncertainties in features. P F F, C
Splitting Photo-z s into Two Parts Mapping from features to redshift. Can be done through model fitting and/or machine learning. P z F h, C h P F h F h, C h Propagating uncertainties in features. P F g F g, C g
Splitting Photo-z s into Two Parts Mapping from features to redshift. Can be done through model fitting and/or machine learning. P z F h, C h P F h F h, C h Propagating uncertainties in features. P F g F g, C g
Splitting Photo-z s into Two Parts Mapping from features to redshift. Can be done through model fitting and/or machine learning. P z F h, C h P F h F h, C h Propagating uncertainties in features. P F g F g, C g
Feature Projection P F h F h, C h P F g F g, C g
The Big Data Assumption P z g = h P z h P h g
The Big Data Assumption redshift P z g = target galaxy g h redshift PDF P z h P h g training galaxy h photometric posterior
The Big Data Assumption P z g = P z h P h g h = h P z h P g h P h P g
The Big Data Assumption P z g = P z h P h g h = P z h 1. What is this likelihood? h P g h P h P g
The Big Data Assumption P z g = P z h P h g = h h P z h 1. What is this likelihood? 2. How do we compute it? P g h P h P g
The Big Data Assumption P z g = = h h P z h P z h P h g band mask P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data?
The Big Data Assumption P z g = h h P z h P h g, s g = 1, S g P z h P g h, b P h P g galaxy is observed selection effects 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects?
The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?
The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?
Likelihood / Distance Metric What metric(s) to use? P g h Training galaxy Observed galaxy
Likelihood / Distance Metric What metric(s) to use? P g h Training galaxy Observed galaxy NGC 4414 NGC 5457
Likelihood / Distance Metric What metric(s) to use? P g h Training galaxy SDSS Skyserver Observed galaxy Broadband filters
Likelihood / Distance Metric What metric(s) to use? P g h Training galaxy SDSS Skyserver Observed galaxy Assume data is normally distributed. Broadband filters P F g F g, C g N F g F g, C g
Likelihood / Distance Metric What metric(s) to use? P g h Color space (traditional) χ 2 = i F g,i s F h,i 2 σ 2 g,i + s 2 2 σ h,i
Likelihood / Distance Metric What metric(s) to use? P g h Color space (traditional) χ 2 = i F g,i s F h,i 2 σ 2 g,i + s 2 2 σ h,i
Likelihood / Distance Metric What metric(s) to use? P g h Color space (traditional) χ 2 = i F g,i s F h,i 2 σ 2 g,i + s 2 2 σ h,i Magnitude space ( new ) χ 2 = i F g,i F h,i 2 σ 2 2 g,i + σ h,i
Likelihood / Distance Metric What metric(s) to use? P g h Color space (traditional) Requires magnitude priors to account for galaxy evolution. χ 2 = i F g,i s F h,i 2 σ 2 g,i + s 2 2 σ h,i Scale-free Magnitude space ( new ) Requires good sampling in full magnitude space. χ 2 = i F g,i F h,i 2 σ 2 2 g,i + σ h,i Scale-dependent
Population: Mag v Color
The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?
The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?
The Problem Training data Galaxy Likelihood P h g Photometric training set Cool et al. (2007)
Machine Learning Approximation Training data Galaxy Likelihood P h g Cool et al. (2007)
Population: FRANKEN-Z
How Valid is Our Approximation?
The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?
The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?
Missing Data How to deal with missing data? Training galaxy P g h, b Observed galaxy Band mask
Now What? How to deal with missing data? P g h, b 2 ln L =???
Naïve Likelihood: Multivariate Normal How to deal with missing data? P g h, b 2 ln L n χ n 2 δ n +n ln 2π + ln C g + C h
Implications How to deal with missing data? P g h, b SDSS Skyserver
Implications How to deal with missing data? P g h, b SDSS Skyserver SDSS Skyserver
Implications How to deal with missing data? P g h, b SDSS Skyserver SDSS Skyserver 2 ln L 5 χ 5 2 + 5 ln 2π + ln C g + C h
Implications How to deal with missing data? P g h, b SDSS Skyserver SDSS Skyserver 2 ln L 3 χ 3 2 + 3 ln 2π + ln C g + C h
Solutions How to deal with missing data? P g h, b SDSS Skyserver SDSS Skyserver 2 ln L X n X n, X n, X i.i.d. n χ 2 n δ n
Implementation How to deal with missing data? P g h, b 2 ln L =??? some transformation of doubly-non-central F- distribution
Implementation How to deal with missing data? P g h, b χ 2 = i ignore non-centrality 2 ln L = χ n 2 N b F g,i F h,i 2 σ 2 2 g,i + σ h,i first-order correction
Missing Data: Searching for Neighbors How to deal with missing data? P g h, b SDSS Skyserver Training data Galaxy Cool et al. (2007)
The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?
The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?
Selection Effects How to deal with selection effects? P g h
Selection Effects How to deal with selection effects? P F g g
Selection Effects Selection effect(s) How to deal with selection effects? P F g g, s g = 1, S g Binary selection flag (1=in/0=out)
Selection Effects Selection effect(s) How to deal with selection effects? P F g g, s g = 1, S g Selection Probability P s g = 1 F g, S g P F g g P s g = 1 g, S g Marginalized Selection Probability Original PDF Binary selection flag (1=in/0=out)
Application: Signal-to-Noise Cut
Application: Signal-to-Noise Cut
Application: Signal-to-Noise Cut
Application: Signal-to-Noise Cut
The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?
The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?
Mixing and Matching Galaxies How to deal with population mismatch? Based on Leistedt, Peiris, & Mortlock (2016)
Mixing and Matching Galaxies How to deal with population mismatch? p g ~ Mult n = 1, p = p g N h categories with probability vector p g = P h g Trinomial distribution from Mathworks. Based on Leistedt, Peiris, & Mortlock (2016)
Mixing and Matching Galaxies How to deal with population mismatch? p g ~ Mult n = 1, p = p g Trinomial distribution from Mathworks. N h categories with probability vector p g = P h g h = {0,1,2} p g = 0,0,1 h g = 2 Based on Leistedt, Peiris, & Mortlock (2016)
Mixing and Matching Galaxies How to deal with population mismatch? p g ~ Mult n = 1, p = p g P h n ~ p g g g Based on Leistedt, Peiris, & Mortlock (2016)
Mixing and Matching Galaxies How to deal with population mismatch? p g ~ Mult n = 1, p = p g n ~ p g g g P h Dir w α = n + 1 Population weights Concentration Counts Based on Leistedt, Peiris, & Mortlock (2016)
Mixing and Matching Galaxies How to deal with population mismatch? p g ~ Mult n = 1, p = p g Low concentration Courtesy of Wikipedia. n ~ High concentration p g g g P h Dir w α = n + 1 Population weights Concentration Counts Based on Leistedt, Peiris, & Mortlock (2016)
Hierarchical Modeling Hierarchical posteriors How to deal with population mismatch? P w, p g D, S Population weights (training set) Based on Leistedt, Peiris, & Mortlock (2016)
Hierarchical Modeling Hierarchical posteriors How to deal with population mismatch? P w, p g D, S Gibbs sampling P p g w, S Population weights (training set) P w p g, S Based on Leistedt, Peiris, & Mortlock (2016)
Hierarchical Modeling Hierarchical posteriors How to deal with population mismatch? P w, p g D, S Gibbs sampling P p g w, S Population weights (training set) 1. Sample hierarchical posteriors: i p g ~ Mult n = 1, p = p g w i 1 P w p g, S Based on Leistedt, Peiris, & Mortlock (2016)
Hierarchical Modeling Hierarchical posteriors How to deal with population mismatch? P w, p g D, S Gibbs sampling P p g w, S P w p g, S Population weights (training set) 1. Sample hierarchical posteriors: i p g ~ Mult n = 1, p = p g w i 1 2. Compute counts: n i ~ g g p g i Based on Leistedt, Peiris, & Mortlock (2016)
Hierarchical Modeling Hierarchical posteriors How to deal with population mismatch? P w, p g D, S Gibbs sampling P p g w, S P w p g, S Population weights (training set) 1. Sample hierarchical posteriors: i p g ~ Mult n = 1, p = p g w i 1 2. Compute counts: n i ~ g g p g i 3. Sample weights: w i Dir w n i + 1 Based on Leistedt, Peiris, & Mortlock (2016)
Application to Mock Data Errors ~ N, independent All objects share same PDF Counts from posterior (uniform prior) Counts from hierarchical model
Application to Mock Data Errors ~ N, independent All objects share same PDF Counts from posterior (uniform prior) Counts from hierarchical model
Application to Mock Data
Tests on ~100k SDSS Galaxies
Next Steps
Robustness of Training Data/Priors Hyper Suprime-Cam (HSC) SSP has ~380k objects taken from 11 surveys. Wide variety of selection criteria, data quality/reliability. Note: old plot from early internal data release.
Robustness of Training Data/Priors 1:1:1 prior 10:5:1 prior Spec-z : g/prism-z : photo-z Relative weights Preliminary
Incorporating Physical Priors Moving a galaxy from one redshift to another is a smooth, physical process that is well-understood. Want to incorporate this into our priors/predictions. GP with physical kernel Leistedt & Hogg (2016)
Incorporating Physical Priors Can use to augment training data. Straightforward to impute missing values. Leistedt & Hogg (2016)
Likelihood / Distance Metric What metric(s) to use? P g h Training galaxy More sophisticated machine learning methods could be used to compute posterior samples (or possibly likelihoods) over complex domains. Observed galaxy NGC 4414 NGC 5457
Summary Photometric redshifts (photo-z s) are an integral part of modern big data extragalactic science. Large training datasets gives new opportunities to develop Bayesian, data-driven photo-z s. Taking advantage of these datasets requires dealing with real-world problems (e.g., biased training data) using a variety of statistical methods (e.g., hierarchical Bayes). Early results look promising!
Code is available! Although still under active development, code, tutorials, and rough draft of a paper are online at: github.com/joshspeagle/frankenz