Probabilistic photometric redshifts in the era of Petascale Astronomy

Similar documents
METAPHOR Machine-learning Estimation Tool for Accurate Photometric Redshifts

Data Analysis Methods

The Dark Energy Survey Public Data Release 1

Subaru/WFIRST Synergies for Cosmology and Galaxy Evolution

Photometric Redshifts, DES, and DESpec

Dictionary Learning for photo-z estimation

Big Data Inference. Combining Hierarchical Bayes and Machine Learning to Improve Photometric Redshifts. Josh Speagle 1,

Cosmology of Photometrically- Classified Type Ia Supernovae

Deep Learning for Gravitational Wave Analysis Results with LIGO Data

New Opportunities in Petascale Astronomy. Robert J. Brunner University of Illinois

A cooperative approach among methods for photometric redshifts estimation

Dark Energy. Cluster counts, weak lensing & Supernovae Ia all in one survey. Survey (DES)

Quasar identification with narrow-band cosmological surveys

Evolution of galaxy clustering in the ALHAMBRA Survey

arxiv: v2 [astro-ph.co] 13 Jun 2016

SNAP Photometric Redshifts and Simulations

Data Release 5. Sky coverage of imaging data in the DR5

ECE 5424: Introduction to Machine Learning

LSST Cosmology and LSSTxCMB-S4 Synergies. Elisabeth Krause, Stanford

arxiv: v1 [astro-ph.ga] 12 Jul 2018

CSCI-567: Machine Learning (Spring 2019)

Robust Machine Learning Applied to Terascale Astronomical Datasets

Mining Digital Surveys for photo-z s and other things

Statistical Searches in Astrophysics and Cosmology

Astroinformatics in the data-driven Astronomy

Approximate Bayesian computation: an application to weak-lensing peak counts

From the Big Bang to Big Data. Ofer Lahav (UCL)

LePhare Download Install Syntax Examples Acknowledgement Le Phare

Refining Photometric Redshift Distributions with Cross-Correlations

Inference of Galaxy Population Statistics Using Photometric Redshift Probability Distribution Functions

MULTIPLE EXPOSURES IN LARGE SURVEYS

Science with large imaging surveys

VBM683 Machine Learning

(KAVLI IPMU) Cosmology with Subaru HSC survey. A large number of cosmological N-body simulations. Parameter space exploration and the Gaussian process

arxiv: v1 [astro-ph.ga] 3 Oct 2017

Some Statistical Aspects of Photometric Redshift Estimation

Kavli IPMU-Berkeley Symposium "Statistics, Physics and Astronomy" January , 2018 Lecture Hall, Kavli IPMU

Machine Astronomers to Discover the Unexpected in Astronomical Surveys. Ray Norris, Western Sydney University & CSIRO Astronomy & Space Science,

Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.

Large Synoptic Survey Telescope

The Large Synoptic Survey Telescope

Radial selec*on issues for primordial non- Gaussianity detec*on

Learning algorithms at the service of WISE survey

LSST, Euclid, and WFIRST

Are VISTA/4MOST surveys interesting for cosmology? Chris Blake (Swinburne)

Click Prediction and Preference Ranking of RSS Feeds

The rise of galaxy surveys and mocks (DESI progress and challenges) Shaun Cole Institute for Computational Cosmology, Durham University, UK

Self-Organizing-Map and Deep-Learning! application to photometric redshift in HSC

THE DARK ENERGY SURVEY: 3 YEARS OF SUPERNOVA

CS 6375 Machine Learning

ECE 5984: Introduction to Machine Learning

WISE as the cornerstone for all-sky photo-z samples

Neural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17

Photometric Redshifts with DAME

(Slides for Tue start here.)

Surprise Detection in Multivariate Astronomical Data Kirk Borne George Mason University

Gravitational Wave Astronomy s Next Frontier in Computation

Machine Learning and Deep Learning! Vincent Lepetit!

Data Exploration vis Local Two-Sample Testing

An Introduction to the Dark Energy Survey

The SDSS is Two Surveys

Ensembles. Léon Bottou COS 424 4/8/2010

Fast Hierarchical Clustering from the Baire Distance

Weak Gravitational Lensing. Gary Bernstein, University of Pennsylvania KICP Inaugural Symposium December 10, 2005

9. High-level processing (astronomical data analysis)

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

Gaussian Process Modeling in Cosmology: The Coyote Universe. Katrin Heitmann, ISR-1, LANL

Constraining Dark Energy: First Results from the SDSS-II Supernova Survey

arxiv: v1 [astro-ph.co] 15 Nov 2010

PRACTICAL ANALYTICS 7/19/2012. Tamás Budavári / The Johns Hopkins University

Photometric Redshifts for the NSLS

SED- dependent Galactic Extinction Prescription for Euclid and Future Cosmological Surveys

Supernovae photometric classification of SNLS data with supervised learning

Lectures in AstroStatistics: Topics in Machine Learning for Astronomers

Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University

The The largest assembly ESO high-redshift. Lidia Tasca & VUDS collaboration

Astronomy of the Next Decade: From Photons to Petabytes. R. Chris Smith AURA Observatory in Chile CTIO/Gemini/SOAR/LSST

Modern Image Processing Techniques in Astronomical Sky Surveys

arxiv: v1 [astro-ph] 12 Nov 2008

Photometric redshift analysis in the Dark Energy Survey Science Verification data

Weak Lensing: Status and Prospects

Pattern recognition. "To understand is to perceive patterns" Sir Isaiah Berlin, Russian philosopher

arxiv: v2 [astro-ph.co] 10 May 2016

Supernovae with Euclid

Administration. Registration Hw3 is out. Lecture Captioning (Extra-Credit) Scribing lectures. Questions. Due on Thursday 10/6

arxiv: v1 [astro-ph.im] 12 Jun 2014

Target Selection for future spectroscopic surveys (DESpec) Stephanie Jouvel, Filipe Abdalla, With DESpec target selection team.

Deep generative models for fast shower simulation in ATLAS

Modeling, detection and characterization of eccentric compact binary coalescence

The Merging Systems Identification (MeSsI) Algorithm.

Clustering, Proximity, and Balrog

arxiv: v2 [astro-ph] 21 Aug 2007

SDSS-IV and eboss Science. Hyunmi Song (KIAS)

A unified multi-wavelength model of galaxy formation. Carlton Baugh Institute for Computational Cosmology

Machine Learning Applications in Astronomy

N- body- spectro- photometric simula4ons for DES and DESpec

Énergie noire Formation des structures. N. Regnault C. Yèche

Supernovae Observations of the Expanding Universe. Kevin Twedt PHYS798G April 17, 2007

From DES to LSST. Transient Processing Goes from Hours to Seconds. Eric Morganson, NCSA LSST Time Domain Meeting Tucson, AZ May 22, 2017

SNLS supernovae photometric classification with machine learning

Transcription:

Probabilistic photometric redshifts in the era of Petascale Astronomy Matías Carrasco Kind NCSA/Department of Astronomy University of Illinois at Urbana-Champaign Tools for Astronomical Big Data March 9 th - 11 th, 2015 Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 1

The need of distances in cosmology Credit: SDSS Collaboration 3D Clustering of galaxies as a probe in cosmology, e.g., 2 point correlation function, power spectrum of the galaxy distribution, etc. Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 2

Photometry vs. Spectroscopy vs. Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 3

Big data problem It s happening! 300 millions galaxies up to z =1.5 5,000 squares degrees (1/8 sky) Data managment at NCSA DES specially designed to probe the origin of dark energy S/G class and photo-z needed 1 TB of data per day 2 years completed, 3 more to go Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 4

Big data problem It s happening! 300 millions galaxies up to z =1.5 5,000 squares degrees (1/8 sky) Data managment at NCSA DES specially designed to probe the origin of dark energy S/G class and photo-z needed 1 TB of data per day 2 years completed, 3 more to go Large Synoptic Survey Telescope 2020 first light Half of sky 30 TB of data nightly for 10 years!! NCSA involved in data analysis Big Data Challenge, 1B galaxies Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 4

Data science challenge Determine redshift using limited information. 8 points instead of thousands! Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 5

Motivation Photo-z Probability Density Functions needed Several methods/codes to compute photo-z Need for a meta-algorithm that combines multiple techniques PDF are good but for large datasets, storage and I/O will be an issue Machine Learning and statistical tools Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 6

Motivation Photo-z Probability Density Functions needed Several methods/codes to compute photo-z Need for a meta-algorithm that combines multiple techniques PDF are good but for large datasets, storage and I/O will be an issue Machine Learning and statistical tools Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 6

Motivation Photo-z Probability Density Functions needed Several methods/codes to compute photo-z Need for a meta-algorithm that combines multiple techniques PDF are good but for large datasets, storage and I/O will be an issue Machine Learning and statistical tools Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 6

Motivation Photo-z Probability Density Functions needed Several methods/codes to compute photo-z Need for a meta-algorithm that combines multiple techniques PDF are good but for large datasets, storage and I/O will be an issue Machine Learning and statistical tools Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 6

Motivation Photo-z Probability Density Functions needed Several methods/codes to compute photo-z Need for a meta-algorithm that combines multiple techniques PDF are good but for large datasets, storage and I/O will be an issue Machine Learning and statistical tools Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 6

Photo-z PDF estimation (in 5 min.) Carrasco Kind & Brunner 2013a (MNRAS, 432, 1483) Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 7

Photo-z PDF estimation: TPZ TPZ (Trees for Photo-Z) is a supervised machine learning code Prediction trees and random forest Incorporate measurements errors and deals with missing values Ancillary information: expected errors, attribute ranking and others Carrasco Kind & Brunner 2013a (MNRAS, 432, 1483) Application to the S/G http://lcdm.astro.illinois.edu/code/mlz.html Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 8

Photo-z PDF estimation: TPZ example Tree from DES training data with more than 7000 nodes colors represent depth only Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 9

Photo-z PDF estimation: Random forest Combine predictions from trees Trees are ideally uncorrelated and strong Bootstrapping and error sampling Random features at each node Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 10

zphot Photo-z PDF estimation: TPZ applications Carrasco Kind & Brunner 2014b (MNRAS, 441, 3550) z spec SDSS DR10 DEEP2 Carrasco Kind & Brunner 2014c (MNRAS, 442, 3380) CFHTLenS TPZ has been tested in several databases with remarkable results Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 11

Photo-z PDF estimation: SOM SOM(Self Organized Map) is a unsupervised machine learning algorithm Competitive learning to represent data conserving topology 2D maps and Random Atlas Carrasco Kind & Brunner 2014a (MNRAS, 438, 3409) Framework inherited from TPZ Application to the S/G Carrasco Kind & Brunner 2014a (MNRAS, 438, 3409) Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 12

SOM topologies Carrasco Kind & Brunner 2014a (MNRAS, 438, 3409) Different topologies can be used with or without periodic boundary conditions Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 13

Photo-z PDF estimation: BPZ BPZ (Benitez, 2000) is a Bayesian template fitting method to obtain PDFs Set of calibrated SED and filters Doesn t need training data Priors can be included Carrasco Kind & Brunner 2014c (MNRAS, 442, 3380) Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 14

Photo-z PDF combination + Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 + Probabilistic photo-zs in the era of Petascale Astronomy 15

Bayesian framework Carrasco Kind & Brunner 2014c (MNRAS, 442, 3380) Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 16

Bayesian framework Our approach Supervised method + Unsupervised method + Template fitting + Weighting scheme Carrasco Kind & Brunner 2014c (MNRAS, 442, 3380) photo-z PDF + Outliers Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 16

Photo-z PDF combination: Results scatter outliers bias N(z) Several combination methods Bayesian model averaging (BMA) and combination (BMC) are the best Same applies to S/G (Kim, Brunner & CK in prep.) I z Carrasco Kind & Brunner 2014c (MNRAS, 442, 3380) Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 17

Photo-z PDF combination: Outliers Naïve Bayes Classifier (same used for spam emails) to identify spam galaxies using information from multiple techniques Each feature provides information about these two classes, and can be combined to make a stronger classifier mag-r Npeak zconf TPZ-BPZ Carrasco Kind & Brunner 2014c (MNRAS, 442, 3380) Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 18

Photo-z PDF representation and storage Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 19

Photo-z PDF storage: Strategies Single Gaussian fit Multi-Gaussan fit Monte Carlo sampling Sparse representation techniques Reduce number of points while increasing accuracy Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 20

Photo-z PDF storage: Strategies Single Gaussian fit Multi-Gaussan fit Monte Carlo sampling Sparse representation techniques Reduce number of points while increasing accuracy Carrasco Kind & Brunner 2014b (MNRAS, 441, 3550) Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 20

Photo-z PDF storage: Sparse representation Use Gaussian and Voigt profiles as bases, need 2 Noriginal bases With only 10-20 bases achieve 99.9 % accuracy Use 32-bits integer per basis, compression Carrasco Kind & Brunner 2014b (MNRAS, 441, 3550) Store Multiple PDFs Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 21

Photo-z PDF storage: Sparse representation Use Gaussian and Voigt profiles as bases, need 2 Noriginal bases With only 10-20 bases achieve 99.9 % accuracy Use 32-bits integer per basis, compression Carrasco Kind & Brunner 2014b (MNRAS, 441, 3550) Store Multiple PDFs Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 21

Photo-z PDF storage: Sparse representation Use Gaussian and Voigt profiles as bases, need 2 Noriginal bases With only 10-20 bases achieve 99.9 % accuracy Use 32-bits integer per basis, compression Carrasco Kind & Brunner 2014b (MNRAS, 441, 3550) Store Multiple PDFs Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 21

Photo-z PDF storage: Sparse representation Use Gaussian and Voigt profiles as bases, need 2 Noriginal bases With only 10-20 bases achieve 99.9 % accuracy Use 32-bits integer per basis, compression Carrasco Kind & Brunner 2014b (MNRAS, 441, 3550) Store Multiple PDFs Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 21

By definition: N(z) and sparse representation N(z) = N k=1 z+ z/2 z z/2 P k(z)dz Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 22

By definition: pz k D δ k N(z) = N N(z) and sparse representation k=1 N(z) = N k=1 δ k z+ z/2 z z/2 Ddz z+ z/2 z z/2 P k(z)dz Using sparse representation, we represent each PDF pz k as: D is the dictionary, δ k is the sparse vector, then Only bases are integrated by precomputing: δ N = N k=1 δ k I D (z) = z+ z/2 z z/2 d jdz j = 1, 2,..., m Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 22

By definition: pz k D δ k N(z) = N N(z) and sparse representation k=1 N(z) = N k=1 δ k z+ z/2 z z/2 Ddz z+ z/2 z z/2 P k(z)dz Using sparse representation, we represent each PDF pz k as: by precomputing: δ N = N k=1 δ k D is the dictionary, δ k is the sparse vector, then I D (z) = z+ z/2 z z/2 d jdz Only bases are integrated N(z) is reduce to a simple dot product j = 1, 2,..., m N(z) = I D (z) δ N Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 22

Talk on Friday (ad) Machine Learning in DES Photo-z in DES early data Photo-z PDF in DESDM New tools to access these from DB Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 23

Conclusions Compute photo-z PDF Individual techniques (MLZ; arxiv:1303.7269, arxiv:1312.5753) Combine PDFs efficiently Better than individual, outliers identification (arxiv:1403.0044) PDF Sparse Representation 99.9% accuracy in P(z) and N(z) with 15 points (arxiv:1404.6442) Uses of these tools! Clustering, weak lensing, DES, DESDM, etc... Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 24

THANKS! Questions? Matias Carrasco Kind NCSA/UIUC mcarras2@ncsa.illinois.edu http://matias-ck.com/ https://github.com/mgckind Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 25

EXTRA SLIDES Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 26

Photo-z PDF storage: Results Carrasco Kind & Brunner 2014b (MNRAS, 441, 3550) For PDFs with less than 4 peaks 5-10 points should be sufficient Sparse representation gives more accurate and more compressed representation for N(z), 99.9% accuracy with 15 points (200 points originally) Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 27

Photo-z PDF storage: Dictionary Carrasco Kind & Brunner 2014b (MNRAS, 441, 3550) Combination of Gaussian and Voigt profiles Covering the whole redshift space, at each location we have several bases Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 28

Photo-z PDF estimation: Error and validation Out of Bag (cross-validation) data used to validate trees/maps Changes for every tree/map and is not used during training We can learn from the cross-validation data! Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 29

zphot Photo-z PDF estimation: Error and validation Out of Bag (cross-validation) data used to validate trees/maps Changes for every tree/map and is not used during training We can learn from the cross-validation data! z spec Carrasco Kind & Brunner 2014c (MNRAS, 442, 3380) Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 29

Photo-z PDF estimation: SOM 2D toy example Suppose 2D data distributed in a given space De-project the data in a 2D map Each cell will contain objects with similar properties Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 30

Photo-z PDF estimation: SOM 2D toy example Suppose 2D data distributed in a given space De-project the data in a 2D map Each cell will contain objects with similar properties Matias Carrasco Kind Tools for Astronomical Big Data, March 9-11 2015 Probabilistic photo-zs in the era of Petascale Astronomy 30