Photometric Redshifts with DAME

Photometric Redshifts with DAME. O. Laurino, R. D'Abrusco, M. Brescia, G. Longo & DAME Working Group. VO-Day... in Tour, Napoli, February 09-0, 200

The general astrophysical problem
Due to new instruments and new diagnostic tools, the information volume grows exponentially. Most data will never be seen by humans! Hence the need for data storage, network and database-related technologies, standards, etc.
Information complexity is also increasing greatly. Most knowledge hidden behind data complexity is lost. Most (all) empirical relationships known so far depend on 3 parameters: simple universe or rather human bias? Most data (and data constructs) cannot be comprehended by humans directly! Hence the need for data mining, KDD, data understanding technologies, hyperdimensional visualization, AI/machine-assisted discovery.

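Detect sources and measure their attributes (brightness, position, shapes, etc.).
p = {isophotal, Petrosian and aperture magnitudes, concentration indexes, shape parameters, etc.}
$$
\begin{aligned}
p^{1} &= \{RA^{1}, \delta^{1}, t, \{\lambda_1, \Delta\lambda_1, f_1^{1,1}, \Delta f_1^{1,1}, \ldots, f_1^{1,m}, \Delta f_1^{1,m}\}, \ldots, \{\lambda_n, \Delta\lambda_n, f_n^{1,1}, \Delta f_n^{1,1}, \ldots, f_n^{1,m}, \Delta f_n^{1,m}\}\} \\
p^{2} &= \{RA^{2}, \delta^{2}, t, \{\lambda_1, \Delta\lambda_1, f_1^{2,1}, \Delta f_1^{2,1}, \ldots, f_1^{2,m}, \Delta f_1^{2,m}\}, \ldots, \{\lambda_n, \Delta\lambda_n, f_n^{2,1}, \Delta f_n^{2,1}, \ldots, f_n^{2,m}, \Delta f_n^{2,m}\}\} \\
&\;\;\vdots \\
p^{N} &= \{RA^{N}, \delta^{N}, t, \{\lambda_1, \Delta\lambda_1, f_1^{N,1}, \Delta f_1^{N,1}, \ldots, f_1^{N,m}, \Delta f_1^{N,m}\}, \ldots\}
\end{aligned}
$$
$$ D = 3 + m \times n $$
[Figure: PARAMETER SPACE, with axes R.A., δ, t and λ]
From the Data Mining point of view, any observed (simulated) datum p defines a point (region) in a subset of $\mathbb{R}^N$:
$$ p \in \mathbb{R}^N, \quad N \gg 100 $$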

From data to knowledge: KDD (Knowledge Discovery in Databases)
Data gathering (e.g., from sensor networks, telescopes...).
Data farming: storage/archiving; indexing, searchability; data fusion, interoperability, ontologies, etc.
Data mining (or Knowledge Discovery in Databases): pattern or correlation search; clustering analysis, automated classification; outlier/anomaly searches; hyperdimensional visualization.
Data understanding: computer-aided understanding; KDD; etc.
New knowledge.
Related areas: database technologies; key mathematical issues; ongoing research.

Data Mining in the VO
A new Interest Group on Knowledge Discovery in Massive Data Sets was born inside the IVOA. The explosion of data available in the VO (the "data tsunami") can be effectively dealt with using data mining, especially when the time variable comes into play. But the VO currently lacks general-purpose tools for server-side manipulation of massive data sets.
DAME in the VO framework
To provide the VO with an extensible, integrated environment for data mining and exploration.
To support the VO standards and formats, especially for application interoperability (SAMP).
To abstract application deployment and execution, so as to provide the VO with an opaque general-purpose computing platform that takes advantage of modern technologies (e.g. Grid, Cloud, etc.).

What is DAME
DAME is a joint effort between University Federico II, INAF-OACN, and Caltech aimed at implementing, as a web application, a scientific gateway for data analysis, exploration, mining and visualization tools, on top of a virtualized distributed computing environment.
http://voneural.na.infn.it/ : technical and management info, documents, science cases, newsletter.
http://dame.na.infn.it/ : web application PROTOTYPE.

Photometric redshifts?
Multicolour photometry maps the physical parameters (luminosity L, redshift z, spectral type T) onto the observed fluxes f:
(z, L, T) → f = (u, g, r, i, z, h, j, k, ...)
If the relation can be inverted, then:
f = (u, g, r, i, z, h, j, k, ...) → (z, L, T)
The function can be approximated by regression in the photometric space. The accuracy of the photometric redshifts depends on two aspects: how the photometric filters cover the SED of the sources, and how absorption and emission features relate to the photometric filters.

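A science case: photometric redshifts (sample from SDSS galaxies)
Photometric redshifts are one of the main tools to investigate the spatial distribution of galaxies, i.e. to reconstruct the 3-D position of a very large number of sources using only their photometric properties.
Magnitudes are the Spectral Energy Distribution convolved with the band filters:
$$ m_U = -2.5 \log_{10} \frac{\int F(\lambda)\, S_U(\lambda)\, d\lambda}{\int S_U(\lambda)\, d\lambda} + c_U $$
$$ m_B = -2.5 \log_{10} \frac{\int F(\lambda)\, S_B(\lambda)\, d\lambda}{\int S_B(\lambda)\, d\lambda} + c_B $$
etc., where $F(\lambda)$ is the galaxy spectrum and $S_i(\lambda)$ is the photometric system.
Color indexes: $U - B = m_U - m_B$, $B - R = m_B - m_R$, etc.
[Figure: colour-colour diagram, U−B versus B−R, with Zspec label; the point moves as a function of z and morphological type]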
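The band-magnitude formula above can be tried out numerically. The following is only a minimal sketch under stated assumptions: the top-hat filters, the step-like toy spectrum and every function name are hypothetical illustrations, not SDSS filters or DAME code, and the zero points c are set to 0. It shows how a colour index moves as the same spectrum is shifted to higher redshift.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal integral of y(x) on a sampled grid."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def synthetic_mag(wave, flux, throughput, zero_point=0.0):
    """m = -2.5 log10( int F S dlambda / int S dlambda ) + c."""
    band_flux = trapezoid(flux * throughput, wave) / trapezoid(throughput, wave)
    return -2.5 * np.log10(band_flux) + zero_point

def top_hat(wave, lo, hi):
    """Hypothetical filter: throughput 1 inside [lo, hi], 0 elsewhere."""
    return ((wave >= lo) & (wave <= hi)).astype(float)

def toy_sed(rest_wave):
    """Hypothetical rest-frame spectrum with a step at 4000 Angstrom."""
    return np.where(rest_wave < 4000.0, 0.3, 1.0)

def colour(z, wave, band_1, band_2):
    """Colour index m_1 - m_2 of the toy spectrum observed at redshift z."""
    # Observed wavelength = rest wavelength * (1 + z); overall dimming is
    # ignored because it cancels in a magnitude difference.
    flux = toy_sed(wave / (1.0 + z))
    return synthetic_mag(wave, flux, band_1) - synthetic_mag(wave, flux, band_2)

wave = np.linspace(3000.0, 9000.0, 6001)   # wavelength grid in Angstrom
blue = top_hat(wave, 4000.0, 5000.0)       # hypothetical "B-like" band
red = top_hat(wave, 6000.0, 7000.0)        # hypothetical "R-like" band
colours = {z: colour(z, wave, blue, red) for z in (0.0, 0.1, 0.2, 0.4)}
```

In this toy setup the colour grows with redshift as the spectral step crosses the blue band, which is the kind of dependence a photometric redshift method tries to invert.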

Photometric redshifts: the Data Mining approach
Photometric redshifts are treated as a regression problem (i.e. function approximation) in a multidimensional parameter space, hence a DM problem: given input vectors $X = \{x_1, x_2, x_3, \ldots, x_N\}$ and target vectors $Y = \{y_1, y_2, y_3, \ldots, y_M\}$, with $M \ll N$, find
$$ \hat{f} : X \to Y \quad \text{such that} \quad \hat{Y} = \hat{f}(X) \ \text{is a good approximation of } Y. $$
BoK = Base of Knowledge. BoK (from Vobs; set of templates) → mapping function → knowledge (phot-z's). Knowledge always reflects the biases in the BoK.
Possible BoKs and the corresponding methods:
Observed spectroscopic redshifts: interpolative; uneven coverage of parameter space.
Synthetic colors from theoretical SEDs, or synthetic colors from observed SEDs: SED fitting. Unknown or oversimplified physics? Unjustified assumptions! Computationally intensive! Possibly accurate, but is it worth it?

Photometric Redshifts with Neural Networks
Spectroscopic observations are the most accurate method to determine redshifts, but they are time consuming.
Photometric sources often outnumber spectroscopic ones by up to 3 orders of magnitude (it may depend on the BoK).
If we build a reliable BoK with spectroscopic data, we can reproduce the functional mapping between photometric parameters and redshift.
Zphot accuracy is adequate for several astronomical applications.
Neural Networks advantages:
They are fast.
They scale much better than any other method.
They learn by examples (BoK, in this case SDSS) and adapt easily to new data.
They do not require a priori assumptions on the Spectral Energy Distributions of sources.
They may be applied to all classes of extragalactic sources.

MLP: the architecture
[Figure panels: biological neuron; McCulloch & Pitts artificial neuron model; neuron activation function]
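The artificial neuron named on this slide can be sketched in a few lines: a weighted sum of the inputs plus a bias, passed through an activation function. This is a minimal illustration, not DAME code. The function names, the weights and the choice of a sigmoid as the smooth activation are assumptions made for the example.

```python
import math

def step(a, threshold=0.0):
    """Hard threshold activation, as in the McCulloch & Pitts neuron."""
    return 1.0 if a >= threshold else 0.0

def sigmoid(a):
    """Smooth activation (an assumed choice; it is differentiable)."""
    return 1.0 / (1.0 + math.exp(-a))

def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    a = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(a)

# Hypothetical weights that make a threshold neuron behave like a logical AND.
and_gate = [neuron([x1, x2], [1.0, 1.0], -1.5, step)
            for x1 in (0, 1) for x2 in (0, 1)]

# The same neuron with a smooth activation returns a graded response.
smooth = neuron([1.0, 1.0], [1.0, 1.0], -1.5, sigmoid)
```

An MLP stacks layers of such neurons; a smooth activation is what makes gradient-based learning possible.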

MLP: Back Propagation learning algorithm
Feed-forward phase: the input is propagated through the network, with the neuron activation function calculated at each layer.
Backward phase: the activation function gradient is calculated and the learning error is back-propagated through the layers, in order to update the internal weights.
[Equation labels on the slide: output error; error tolerance threshold; weight updating law; learning rate; activation function]
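The slide's own equations are not available in this transcript. As a reference point, the textbook gradient-descent form of back propagation is sketched below. The notation is an assumption, not the slide's: $g$ is the activation function, $a_j = \sum_i w_{ij} z_i$ the input to node $j$, $z_j = g(a_j)$ its output, $y_k$ the network outputs, $t_k$ the targets, and $\eta$ the learning rate.

```latex
% Output error for one training pattern
E = \frac{1}{2} \sum_k (y_k - t_k)^2

% Error term of an output node k
\delta_k = \frac{\partial E}{\partial a_k} = (y_k - t_k)\, g'(a_k)

% Error term of a hidden node j, back-propagated from the next layer
\delta_j = g'(a_j) \sum_k w_{jk}\, \delta_k

% Weight updating law
w_{ij} \leftarrow w_{ij} - \eta\, \frac{\partial E}{\partial w_{ij}}
       = w_{ij} - \eta\, \delta_j\, z_i
```

Training stops when $E$ falls below the error tolerance threshold or when the maximum number of epochs is reached.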

Experiment flow-chart
[Flow-chart elements: BoK; dataset preparation; MLP configuration; TRAINING; decision "ERROR < ε? Nepochs reached?" (NO / YES); TRAIN; stored weights; RUN; catalogue; PLOT]

Dataset preparation (not provided in the prototype, but in the DAME Suite under testing, so TOPCAT can be used as an alternative)
Load data file
Inspect content
Select columns
Browse table
Split table
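The "select columns" and "split table" steps can be illustrated in plain Python. This is a sketch under assumptions, not how DAME or TOPCAT does it: the column names (`objid`, `u`, `g`, `r`, `zspec`), the toy values and the 80/20 split are all hypothetical.

```python
import random

def select_columns(rows, names):
    """Keep only the named columns of each row."""
    return [{name: row[name] for name in names} for row in rows]

def split_table(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows reproducibly and split them into train and test sets."""
    rng = random.Random(seed)
    shuffled = list(rows)          # copy, so the input table is left untouched
    rng.shuffle(shuffled)
    n_train = int(round(train_fraction * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]

# Toy table standing in for a loaded data file (made-up values).
table = [{"objid": i,
          "u": 20.0 + 0.01 * i,
          "g": 19.0 + 0.01 * i,
          "r": 18.5 + 0.01 * i,
          "zspec": 0.05 + 0.001 * i} for i in range(100)]

features = select_columns(table, ["u", "g", "r", "zspec"])
train_set, test_set = split_table(features)
```

Shuffling before the split keeps both subsets representative when the input file is sorted, for example by magnitude or position.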

Experiment configuration: MLP setup
NN topology (number of layers and nodes)
Learning rate
Nepochs
Error threshold
Type of training (batch/online)

Train the network
1. Submit training patterns
2. Run the feed-forward NN
3. Check if output error < threshold OR if Nepochs reached (if YES GOTO 7, else CONTINUE)
4. Back propagate the error gradient and adjust hidden node weights
5. Store hidden node weights (the MLP brain)
6. GOTO 1
7. Store final hidden node weights
8. STOP
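The training steps above can be sketched with NumPy as a batch-trained MLP with one hidden layer. This is an illustrative sketch, not the DAME implementation. The `tanh` hidden activation, the linear output, the mean squared error, the toy data and all function names are assumptions made for the example.

```python
import numpy as np

def train_mlp(x, t, n_hidden=8, learning_rate=0.1, n_epochs=5000,
              error_threshold=1e-3, seed=0):
    """Batch back propagation; stops on error < threshold or Nepochs reached."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, (x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.5, (n_hidden, 1))
    b2 = np.zeros(1)
    history = []
    for _ in range(n_epochs):                 # steps 1 and 6: loop over epochs
        h = np.tanh(x @ w1 + b1)              # step 2: feed-forward, hidden layer
        y = h @ w2 + b2                       # linear output node
        err = y - t
        mse = float(np.mean(err ** 2))
        history.append(mse)
        if mse < error_threshold:             # step 3: stopping check
            break
        delta_out = 2.0 * err / len(x)        # step 4: gradient at the output
        delta_hid = (delta_out @ w2.T) * (1.0 - h ** 2)   # tanh derivative
        w2 -= learning_rate * (h.T @ delta_out)           # step 5: new weights
        b2 -= learning_rate * delta_out.sum(axis=0)
        w1 -= learning_rate * (x.T @ delta_hid)
        b1 -= learning_rate * delta_hid.sum(axis=0)
    return (w1, b1, w2, b2), history          # step 7: final weights

def run_mlp(weights, x):
    """Feed-forward pass with stored weights."""
    w1, b1, w2, b2 = weights
    return np.tanh(x @ w1 + b1) @ w2 + b2

# Toy data: two made-up "colours" and a smooth made-up target.
data_rng = np.random.default_rng(1)
x = data_rng.uniform(0.0, 1.0, (200, 2))
t = (0.3 * x[:, 0] + 0.2 * x[:, 1] ** 2).reshape(-1, 1)
weights, history = train_mlp(x, t)
```

The returned `weights` play the role of the stored weights in the flow-chart, and `history` records the training error at each epoch.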

Test the network
Submit test set
Run the feed-forward NN
Compare output error
Evaluate generalization capability of the trained NN
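One simple way to compare output errors is to compute the RMS error on the training set and on the test set and look at the difference. The sketch below assumes this choice of statistic; the stand-in predictor, the numbers and the function names are hypothetical, and a trained network would take the place of `predict`.

```python
import math

def rms_error(predict, inputs, targets):
    """Root mean square difference between predictions and targets."""
    squares = [(predict(x) - t) ** 2 for x, t in zip(inputs, targets)]
    return math.sqrt(sum(squares) / len(squares))

def generalization_report(predict, train, test):
    """RMS error on both sets and their difference (test minus train)."""
    train_rms = rms_error(predict, *train)
    test_rms = rms_error(predict, *test)
    return {"train_rms": train_rms, "test_rms": test_rms,
            "gap": test_rms - train_rms}

def predict(colour):
    """Stand-in for a trained network: redshift as half the colour."""
    return 0.5 * colour

# Made-up (inputs, targets) pairs for the two sets.
train = ([0.2, 0.4, 0.6], [0.11, 0.19, 0.31])
test = ([0.3, 0.5], [0.18, 0.22])
report = generalization_report(predict, train, test)
```

A test error much larger than the training error suggests that the network has memorized the training patterns rather than learned the mapping.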

Run the network
Run the NN like a generic deterministic one-shot algorithm
Generate output catalogue
Plot and analyze output

Follow on-line tutorial

Use case II
A possible application of photometric redshifts is the selection of galaxies belonging to bound structures like clusters or groups (see, for example, [Capozzi et al. 2009]). Let's see how we can find out if a given observed "overdensity" of galaxies in a field is likely associated with a real structure. Photometric redshifts can be used as a valid alternative technique, or side by side with other methods based on the photometric properties of galaxies (for example, the "red-sequence" method [Gladders & Yee 2000]). We will consider a well-known rich cluster of galaxies (Abell 2255) and explore a field containing this cluster using SDSS photometry and our own photometric redshifts. You will use DAME, TOPCAT and Aladin, with which you should already be comfortable!
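The idea behind this use case can be sketched as follows: bin the photometric redshifts of the galaxies in the field, look for a peak, and select the galaxies in a slice around it. This is only an illustration with made-up numbers. The catalogue values, the slice centre `z_centre`, the slice half-width and the function names are hypothetical and are not measurements of Abell 2255.

```python
def zphot_histogram(catalogue, z_min, z_max, n_bins):
    """Count galaxies per photometric redshift bin."""
    width = (z_max - z_min) / n_bins
    counts = [0] * n_bins
    for galaxy in catalogue:
        k = int((galaxy["zphot"] - z_min) / width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

def in_redshift_slice(catalogue, z_centre, half_width):
    """Galaxies whose photometric redshift lies within the slice."""
    return [g for g in catalogue if abs(g["zphot"] - z_centre) <= half_width]

# Made-up photometric redshifts for galaxies in a field.
zphots = [0.02, 0.06, 0.078, 0.08, 0.081, 0.083, 0.085, 0.12, 0.16, 0.21, 0.31]
catalogue = [{"id": i, "zphot": z} for i, z in enumerate(zphots)]

counts = zphot_histogram(catalogue, 0.0, 0.4, 8)
peak_bin = counts.index(max(counts))          # bin with the most galaxies
members = in_redshift_slice(catalogue, z_centre=0.08, half_width=0.01)
```

A peak in the redshift histogram that coincides with the overdensity on the sky supports the presence of a real structure; the slice half-width should be chosen according to the expected photometric redshift errors.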

Tutorial Part II Follow the step-by-step instructions on the DAME prototype web page