Big Data Inference. Combining Hierarchical Bayes and Machine Learning to Improve Photometric Redshifts. Josh Speagle 1,

Similar documents
Inference of Galaxy Population Statistics Using Photometric Redshift Probability Distribution Functions

Modern Image Processing Techniques in Astronomical Sky Surveys

Science with large imaging surveys

arxiv: v1 [astro-ph.co] 2 Dec 2016

Probabilistic photometric redshifts in the era of Petascale Astronomy

Bayesian Modeling for Type Ia Supernova Data, Dust, and Distances

Hierarchical Bayesian Modeling

Data Analysis Methods

LePhare Download Install Syntax Examples Acknowledgement Le Phare

Lecture 11: SDSS Sources at Other Wavelengths: From X rays to radio. Astr 598: Astronomy with SDSS

The PRIsm MUlti-object Survey (PRIMUS)

SED- dependent Galactic Extinction Prescription for Euclid and Future Cosmological Surveys

The The largest assembly ESO high-redshift. Lidia Tasca & VUDS collaboration

Subaru/WFIRST Synergies for Cosmology and Galaxy Evolution

Durham Lightcones: Synthetic galaxy survey catalogues from GALFORM. Jo Woodward, Alex Merson, Peder Norberg, Carlton Baugh, John Helly

Dictionary Learning for photo-z estimation

Hierarchical Bayesian Modeling

The SDSS is Two Surveys

What Can We Learn from Galaxy Clustering 1: Why Galaxy Clustering is Useful for AGN Clustering. Alison Coil UCSD

SNAP Photometric Redshifts and Simulations

A cooperative approach among methods for photometric redshifts estimation

Refining Photometric Redshift Distributions with Cross-Correlations

Exploiting Sparse Non-Linear Structure in Astronomical Data

ADVANCED MACHINE LEARNING ADVANCED MACHINE LEARNING. Non-linear regression techniques Part - II

Age = Gyr dependent on the detail physics.

Some Statistical Aspects of Photometric Redshift Estimation

Self-Organizing-Map and Deep-Learning! application to photometric redshift in HSC

A Unified Theory of Photometric Redshifts. Tamás Budavári The Johns Hopkins University

Dark Energy. Cluster counts, weak lensing & Supernovae Ia all in one survey. Survey (DES)

Searching for z>6 Quasars with Subaru / Hyper Suprime-Cam Survey

Introduction. Chapter 1

Data-driven models of stars

HIERARCHICAL MODELING OF THE RESOLVE SURVEY

Towards a Bayesian model for Cyber Security

IN GALAXY GROUPS? WHERE IS THE CENTER OF MASS MATT GEORGE. 100 kpc 100 kpc UC BERKELEY

Treatment and analysis of data Applied statistics Lecture 4: Estimation

Surveys at z 1. Petchara Pattarakijwanich 20 February 2013

Photometric Redshifts for the NSLS

Lyman-α Cosmology with BOSS Julián Bautista University of Utah. Rencontres du Vietnam Cosmology 2015

Dusty star-forming galaxies at high redshift (part 5)

Quasar identification with narrow-band cosmological surveys

Neutron inverse kinetics via Gaussian Processes

SNLS supernovae photometric classification with machine learning

Hertzprung-Russel and colormagnitude. ASTR320 Wednesday January 31, 2018

Multistage Anslysis on Solar Spectral Analyses with Uncertainties in Atomic Physical Models

9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering

Statistical Tools and Techniques for Solar Astronomers

Introduction to Machine Learning Midterm Exam

Statistical Searches in Astrophysics and Cosmology

The magnitude system. ASTR320 Wednesday January 30, 2019

Cosmology on small scales: Emulating galaxy clustering and galaxy-galaxy lensing into the deeply nonlinear regime

Learning algorithms at the service of WISE survey

Bayesian inference using Gaia data. Coryn Bailer-Jones Max Planck Institute for Astronomy, Heidelberg

Gas Accretion & Outflows from Redshift z~1 Galaxies

Improved Astronomical Inferences via Nonparametric Density Estimation

arxiv:astro-ph/ v1 16 Apr 2004

Introduction to Machine Learning Midterm Exam Solutions

Supplementary Information for SNLS-03D3bb a super- Chandrasekhar mass Type Ia supernova

Three data analysis problems

Sparse Composite Models of Galaxy Spectra (or Anything) using L1 Norm Regularization. Benjamin Weiner Steward Observatory University of Arizona

arxiv: v1 [astro-ph.ga] 14 May 2015

Dust properties of galaxies at redshift z 5-6

From quasars to dark energy Adventures with the clustering of luminous red galaxies

Andy Casey Astrophysics; Statistics

Large-scale Collaborative Prediction Using a Nonparametric Random Effects Model

Chapter 7: From theory to observations

Recent Advances in Bayesian Inference Techniques

STA 4273H: Statistical Machine Learning

Type Ia Supernovae: Standardizable Candles and Crayons

Keck/Subaru Exchange Program Subaru Users Meeting January 20, 2011

Formation and growth of galaxies in the young Universe: progress & challenges

Subaru High-z Exploration of Low-Luminosity Quasars (SHELLQs)

EXONEST The Exoplanetary Explorer. Kevin H. Knuth and Ben Placek Department of Physics University at Albany (SUNY) Albany NY

The ALMA z=4 Survey (AR4S)

(Slides for Tue start here.)

20x increase from z = 0 to 2!

Approximate Bayesian computation: an application to weak-lensing peak counts

Type II Supernovae as Standardized Candles

IRS Spectroscopy of z~2 Galaxies

Cosmological Merger Rates

The Long Faint Tail of the High-Redshift Galaxy Population

MULTIPLE EXPOSURES IN LARGE SURVEYS

A Random Walk Through Astrometry

Two Statistical Problems in X-ray Astronomy

Day 5: Generative models, structured classification

Hierarchical Bayesian Modeling of Planet Populations

Astronomical image reduction using the Tractor

Stages of a Big Project ***

Gaussian Processes for Audio Feature Extraction

The Evolution of Massive Galaxies at 3 < z < 7 (The Hawaii 20 deg 2 Survey H2O)

Source plane reconstruction of the giant gravitational arc in Abell 2667: a condidate Wolf-Rayet galaxy at z 1

Bayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2016

Revealing new optically-emitting extragalactic Supernova Remnants

Classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Reionization signatures in gamma-ray spectra

Chapter 10: Unresolved Stellar Populations

Naïve Bayes classification

EUCLID Legacy with Spectroscopy

Large Scale Bayesian Inference

The evolution of the Galaxy Stellar Mass Function:

Transcription:

ICHASC, Apr. 25 2017 Big Data Inference Combining Hierarchical Bayes and Machine Learning to Improve Photometric Redshifts Josh Speagle 1, jspeagle@cfa.harvard.edu In collaboration with: Boris Leistedt 2, Alexie Leauthaud 3, Daniel Eisenstein 1, Kevin Bundy 3,4, Peter Capak 5, Ben Hoyle 6, Daniel Masters 5, Daniel Mortlock 7, and Hiranya Peiris 8,9 NSF Graduate Research Fellow, 1 Harvard University, 2 NYU, 3 UC Santa Cruz, 4 UC Observatories, 5 IPAC, 6 LMU Munich, 7 Imperial College London, 8 University College London, 9 Oskar Klein Centre for Cosmoparticle Physics

What are Photometric Redshifts?

Basic Concept?

Basic Concept? SDSS Skyserver Spectrum

Basic Concept Spec-z SDSS Skyserver Spectrum

Basic Concept Spec-z SDSS Skyserver Spectrum

Basic Concept Spec-z SDSS Skyserver Spectrum

Basic Concept? SDSS Skyserver Broadband filters

Basic Concept Spectral Energy Distribution (SED)? SDSS Skyserver Broadband filters

Basic Concept Spectral Energy Distribution (SED) SDSS Skyserver Broadband filters

http://www.stsci.edu/~dcoe/bpz/intro.html

Why do we care?

Because we have to. Why do we care?

Because we have to. Why do we care? Many questions now require large samples of galaxies to answer now entering the Big Data era of astronomy.

Because we have to. Why do we care? Many questions now require large samples of galaxies to answer now entering the Big Data era of astronomy. Wide-field imaging surveys much cheaper and faster than spectroscopic surveys. Also can see fainter objects.

Because we have to. Why do we care? Many questions now require large samples of galaxies to answer now entering the Big Data era of astronomy. Wide-field imaging surveys much cheaper and faster than spectroscopic surveys. Also can see fainter objects. ~100x increase in sample size, diversity makes up for photo-z uncertainties. (Detailed studies can rely on ~1% spectroscopic subsample.)

Science Case Precision cosmology Using large samples of galaxies to pin down the dark energy equation of state, growth of large-scale structure, etc. Taken from http://kids.strw.leidenuniv.nl/goals.php.

Computing Photo-z s

Photo-z s: Statistically Speaking Star formation rate Star formation history Redshift Stellar mass AGN activity Dust content Metallicity

Photo-z s: Statistically Speaking A forward modeling problem: can we construct a model from parameters we care about that matches the observed SED? Star formation rate Star formation history Redshift Stellar mass AGN activity Dust content Metallicity

Flux (Jy; normalized) Photometry = F( a T gal ) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Galaxy Wavelength (A)

Flux (Jy; normalized) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Photometry = F( a, b T gal, T em ) Emission lines Galaxy Wavelength (A)

Flux (Jy; normalized) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Photometry = F( a, b, c, E(B V) T gal, T em, T dust ) Emission lines Galaxy Reddening Wavelength (A)

Flux (Jy; normalized) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Photometry = F( a, b, c, E(B V), z T gal, T em, T dust ) Emission lines Galaxy Reddening Redshift Wavelength (A)

Flux (Jy; normalized) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Photometry = F( a, b, c, E(B V), z T gal, T em, T dust, T IGM ) Emission lines Galaxy Reddening Redshift Lyman limit IGM Lyα Wavelength (A)

Flux (Jy; normalized) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Photometry = F( a, b, c, E(B V), z T gal, T em, T dust, T IGM, filters) Emission lines Galaxy Reddening Redshift Photometry IGM Wavelength (A)

Flux (Jy; normalized) Traditional Approach e.g., COSMOS: Ilbert et al. (2009), Laigle et al. (2016) Photometry = F( a, b, c, E(B V), z T gal, T em, T dust, T IGM, filters) Emission lines Galaxy Reddening Redshift Photometry IGM Wavelength (A)

Photo-z s: Statistically Speaking A forward modeling problem: can we construct a model from parameters we care about that matches the observed SED? Star formation rate Star formation history Redshift Stellar mass AGN activity Dust content Metallicity

Photo-z s: Statistically Speaking An inverse mapping problem: can we use machine learning to construct a mapping from color to redshift? Star formation rate Star formation history Redshift Stellar mass AGN activity Dust content Metallicity

Machine Learning Training set

Machine Learning Decision Tree

Machine Learning

Machine Learning z 1 z 2 z 3 z 4 z 5 z 6 z 7 z 8 Target (redshift)

Machine Learning ~ P C g Feature vector (observed color) z 1 z 2 z 3 z 4 z 5 z 6 z 7 z 8 Target (redshift)

Machine Learning ~ P C g Feature vector (observed color) z 1 z 2 z 3 z 4 z 5 z 6 z 7 z 8 Target (redshift)

Machine Learning

Machine Learning z 1 z 2 z 3 z 4 z 5 z 6 z 7 z 8 1 7 3 1 0 0 0 0 Target Weight Kernel Density Estimation P z g

Two Communities Model-fitting approaches (color-redshift relation assumed) Machine-learning approaches (feature-redshift relation derived)

Two Communities Model-fitting approaches (color-redshift relation assumed) Machine-learning approaches (feature-redshift relation derived)

Two Communities Model-fitting approaches (color-redshift relation assumed) Probabilistic Interpretable Sensitive to systematics Generally slow Machine-learning approaches (feature-redshift relation derived)

Two Communities Model-fitting approaches (color-redshift relation assumed) Probabilistic Interpretable Sensitive to systematics Generally slow Flexible, data-driven More robust to systematics Generally fast Difficult to interpret Difficult to derive PDFs Machine-learning approaches (feature-redshift relation derived)

What are we doing?

Splitting Photo-z s into Two Parts Mapping from features to redshift. Can be done through model fitting and/or machine learning. P z F or P F z, η

Splitting Photo-z s into Two Parts Mapping from features to redshift. Can be done through model fitting and/or machine learning. P z F or P F z, η Propagating uncertainties in features. P F F, C

Splitting Photo-z s into Two Parts Mapping from features to redshift. Can be done through model fitting and/or machine learning. P z F h, C h P F h F h, C h Propagating uncertainties in features. P F g F g, C g

Splitting Photo-z s into Two Parts Mapping from features to redshift. Can be done through model fitting and/or machine learning. P z F h, C h P F h F h, C h Propagating uncertainties in features. P F g F g, C g

Splitting Photo-z s into Two Parts Mapping from features to redshift. Can be done through model fitting and/or machine learning. P z F h, C h P F h F h, C h Propagating uncertainties in features. P F g F g, C g

Feature Projection P F h F h, C h P F g F g, C g

The Big Data Assumption P z g = h P z h P h g

The Big Data Assumption redshift P z g = target galaxy g h redshift PDF P z h P h g training galaxy h photometric posterior

The Big Data Assumption P z g = P z h P h g h = h P z h P g h P h P g

The Big Data Assumption P z g = P z h P h g h = P z h 1. What is this likelihood? h P g h P h P g

The Big Data Assumption P z g = P z h P h g = h h P z h 1. What is this likelihood? 2. How do we compute it? P g h P h P g

The Big Data Assumption P z g = = h h P z h P z h P h g band mask P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data?

The Big Data Assumption P z g = h h P z h P h g, s g = 1, S g P z h P g h, b P h P g galaxy is observed selection effects 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects?

The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?

The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?

Likelihood / Distance Metric What metric(s) to use? P g h Training galaxy Observed galaxy

Likelihood / Distance Metric What metric(s) to use? P g h Training galaxy Observed galaxy NGC 4414 NGC 5457

Likelihood / Distance Metric What metric(s) to use? P g h Training galaxy SDSS Skyserver Observed galaxy Broadband filters

Likelihood / Distance Metric What metric(s) to use? P g h Training galaxy SDSS Skyserver Observed galaxy Assume data is normally distributed. Broadband filters P F g F g, C g N F g F g, C g

Likelihood / Distance Metric What metric(s) to use? P g h Color space (traditional) χ 2 = i F g,i s F h,i 2 σ 2 g,i + s 2 2 σ h,i

Likelihood / Distance Metric What metric(s) to use? P g h Color space (traditional) χ 2 = i F g,i s F h,i 2 σ 2 g,i + s 2 2 σ h,i

Likelihood / Distance Metric What metric(s) to use? P g h Color space (traditional) χ 2 = i F g,i s F h,i 2 σ 2 g,i + s 2 2 σ h,i Magnitude space ( new ) χ 2 = i F g,i F h,i 2 σ 2 2 g,i + σ h,i

Likelihood / Distance Metric What metric(s) to use? P g h Color space (traditional) Requires magnitude priors to account for galaxy evolution. χ 2 = i F g,i s F h,i 2 σ 2 g,i + s 2 2 σ h,i Scale-free Magnitude space ( new ) Requires good sampling in full magnitude space. χ 2 = i F g,i F h,i 2 σ 2 2 g,i + σ h,i Scale-dependent

Population: Mag v Color

The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?

The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?

The Problem Training data Galaxy Likelihood P h g Photometric training set Cool et al. (2007)

Machine Learning Approximation Training data Galaxy Likelihood P h g Cool et al. (2007)

Population: FRANKEN-Z

How Valid is Our Approximation?

The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?

The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?

Missing Data How to deal with missing data? Training galaxy P g h, b Observed galaxy Band mask

Now What? How to deal with missing data? P g h, b 2 ln L =???

Naïve Likelihood: Multivariate Normal How to deal with missing data? P g h, b 2 ln L n χ n 2 δ n +n ln 2π + ln C g + C h

Implications How to deal with missing data? P g h, b SDSS Skyserver

Implications How to deal with missing data? P g h, b SDSS Skyserver SDSS Skyserver

Implications How to deal with missing data? P g h, b SDSS Skyserver SDSS Skyserver 2 ln L 5 χ 5 2 + 5 ln 2π + ln C g + C h

Implications How to deal with missing data? P g h, b SDSS Skyserver SDSS Skyserver 2 ln L 3 χ 3 2 + 3 ln 2π + ln C g + C h

Solutions How to deal with missing data? P g h, b SDSS Skyserver SDSS Skyserver 2 ln L X n X n, X n, X i.i.d. n χ 2 n δ n

Implementation How to deal with missing data? P g h, b 2 ln L =??? some transformation of doubly-non-central F- distribution

Implementation How to deal with missing data? P g h, b χ 2 = i ignore non-centrality 2 ln L = χ n 2 N b F g,i F h,i 2 σ 2 2 g,i + σ h,i first-order correction

Missing Data: Searching for Neighbors How to deal with missing data? P g h, b SDSS Skyserver Training data Galaxy Cool et al. (2007)

The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?

The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?

Selection Effects How to deal with selection effects? P g h

Selection Effects How to deal with selection effects? P F g g

Selection Effects Selection effect(s) How to deal with selection effects? P F g g, s g = 1, S g Binary selection flag (1=in/0=out)

Selection Effects Selection effect(s) How to deal with selection effects? P F g g, s g = 1, S g Selection Probability P s g = 1 F g, S g P F g g P s g = 1 g, S g Marginalized Selection Probability Original PDF Binary selection flag (1=in/0=out)

Application: Signal-to-Noise Cut

Application: Signal-to-Noise Cut

Application: Signal-to-Noise Cut

Application: Signal-to-Noise Cut

The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?

The Big Data Assumption P z g = P z h P h g, s g = 1, S g h h P z h P g h, b P h P g 1. What is this likelihood? 2. How do we compute it? 3. How do we deal with missing data? 4. How do we incorporate selection effects? 5. What is our prior?

Mixing and Matching Galaxies How to deal with population mismatch? Based on Leistedt, Peiris, & Mortlock (2016)

Mixing and Matching Galaxies How to deal with population mismatch? p g ~ Mult n = 1, p = p g N h categories with probability vector p g = P h g Trinomial distribution from Mathworks. Based on Leistedt, Peiris, & Mortlock (2016)

Mixing and Matching Galaxies How to deal with population mismatch? p g ~ Mult n = 1, p = p g Trinomial distribution from Mathworks. N h categories with probability vector p g = P h g h = {0,1,2} p g = 0,0,1 h g = 2 Based on Leistedt, Peiris, & Mortlock (2016)

Mixing and Matching Galaxies How to deal with population mismatch? p g ~ Mult n = 1, p = p g P h n ~ p g g g Based on Leistedt, Peiris, & Mortlock (2016)

Mixing and Matching Galaxies How to deal with population mismatch? p g ~ Mult n = 1, p = p g n ~ p g g g P h Dir w α = n + 1 Population weights Concentration Counts Based on Leistedt, Peiris, & Mortlock (2016)

Mixing and Matching Galaxies How to deal with population mismatch? p g ~ Mult n = 1, p = p g Low concentration Courtesy of Wikipedia. n ~ High concentration p g g g P h Dir w α = n + 1 Population weights Concentration Counts Based on Leistedt, Peiris, & Mortlock (2016)

Hierarchical Modeling Hierarchical posteriors How to deal with population mismatch? P w, p g D, S Population weights (training set) Based on Leistedt, Peiris, & Mortlock (2016)

Hierarchical Modeling Hierarchical posteriors How to deal with population mismatch? P w, p g D, S Gibbs sampling P p g w, S Population weights (training set) P w p g, S Based on Leistedt, Peiris, & Mortlock (2016)

Hierarchical Modeling Hierarchical posteriors How to deal with population mismatch? P w, p g D, S Gibbs sampling P p g w, S Population weights (training set) 1. Sample hierarchical posteriors: i p g ~ Mult n = 1, p = p g w i 1 P w p g, S Based on Leistedt, Peiris, & Mortlock (2016)

Hierarchical Modeling Hierarchical posteriors How to deal with population mismatch? P w, p g D, S Gibbs sampling P p g w, S P w p g, S Population weights (training set) 1. Sample hierarchical posteriors: i p g ~ Mult n = 1, p = p g w i 1 2. Compute counts: n i ~ g g p g i Based on Leistedt, Peiris, & Mortlock (2016)

Hierarchical Modeling Hierarchical posteriors How to deal with population mismatch? P w, p g D, S Gibbs sampling P p g w, S P w p g, S Population weights (training set) 1. Sample hierarchical posteriors: i p g ~ Mult n = 1, p = p g w i 1 2. Compute counts: n i ~ g g p g i 3. Sample weights: w i Dir w n i + 1 Based on Leistedt, Peiris, & Mortlock (2016)

Application to Mock Data Errors ~ N, independent All objects share same PDF Counts from posterior (uniform prior) Counts from hierarchical model

Application to Mock Data Errors ~ N, independent All objects share same PDF Counts from posterior (uniform prior) Counts from hierarchical model

Application to Mock Data

Tests on ~100k SDSS Galaxies

Next Steps

Robustness of Training Data/Priors Hyper Suprime-Cam (HSC) SSP has ~380k objects taken from 11 surveys. Wide variety of selection criteria, data quality/reliability. Note: old plot from early internal data release.

Robustness of Training Data/Priors 1:1:1 prior 10:5:1 prior Spec-z : g/prism-z : photo-z Relative weights Preliminary

Incorporating Physical Priors Moving a galaxy from one redshift to another is a smooth, physical process that is well-understood. Want to incorporate this into our priors/predictions. GP with physical kernel Leistedt & Hogg (2016)

Incorporating Physical Priors Can use to augment training data. Straightforward to impute missing values. Leistedt & Hogg (2016)

Likelihood / Distance Metric What metric(s) to use? P g h Training galaxy More sophisticated machine learning methods could be used to compute posterior samples (or possibly likelihoods) over complex domains. Observed galaxy NGC 4414 NGC 5457

Summary Photometric redshifts (photo-z s) are an integral part of modern big data extragalactic science. Large training datasets gives new opportunities to develop Bayesian, data-driven photo-z s. Taking advantage of these datasets requires dealing with real-world problems (e.g., biased training data) using a variety of statistical methods (e.g., hierarchical Bayes). Early results look promising!

Code is available! Although still under active development, code, tutorials, and rough draft of a paper are online at: github.com/joshspeagle/frankenz