Learning algorithms at the service of WISE survey

Similar documents
arxiv: v2 [astro-ph.ga] 23 Sep 2016

WISE as the cornerstone for all-sky photo-z samples

arxiv: v2 [astro-ph.ga] 11 Jul 2016

arxiv: v2 [astro-ph.im] 19 Jul 2017

Modern Image Processing Techniques in Astronomical Sky Surveys

arxiv: v1 [astro-ph.co] 5 Jul 2016

Data Release 5. Sky coverage of imaging data in the DR5

Quasar Selection from Combined Radio and Optical Surveys using Neural Networks

Introduction to SDSS -instruments, survey strategy, etc

Classifying Galaxy Morphology using Machine Learning

The SuperCOSMOS all-sky galaxy catalogue

STUDIES OF SELECTED VOIDS. SURFACE PHOTOMETRY OF FAINT GALAXIES IN THE DIRECTION OF IN HERCULES VOID

arxiv: v1 [astro-ph.co] 4 Aug 2014

Automated Classification of Galaxy Zoo Images CS229 Final Report

arxiv: v2 [astro-ph.co] 17 Dec 2013

Characterization of variable stars using the ASAS and SuperWASP databases

Tomographic local 2D analyses of the WISExSuperCOSMOS all-sky galaxy catalogue

JINA Observations, Now and in the Near Future

Astronomical imagers. ASTR320 Monday February 18, 2019

Surprise Detection in Multivariate Astronomical Data Kirk Borne George Mason University

The SKYGRID Project A Calibration Star Catalog for New Sensors. Stephen A. Gregory Boeing LTS. Tamara E. Payne Boeing LTS. John L. Africano Boeing LTS

VISTA HEMISPHERE SURVEY DATA RELEASE 1

arxiv:astro-ph/ v1 30 Aug 2001

Star-Galaxy Separation in the Era of Precision Cosmology

arxiv: v1 [astro-ph.im] 7 Dec 2018

The Millennium Galaxy Catalogue: the photometric accuracy, completeness and contamination of the 2dFGRS and SDSS-EDR/DR1 data sets

arxiv: v1 [astro-ph.co] 5 Jul 2016

SNLS supernovae photometric classification with machine learning

(Present and) Future Surveys for Metal-Poor Stars

Support Vector Machine. Industrial AI Lab.

Miscellaneous New Common Proper Motion Stars

Radio infrared correlation for galaxies: from today's instruments to SKA

On the calibration of WFCAM data from 2MASS

THE 2dF GALAXY REDSHIFT SURVEY: Preliminary Results

arxiv:astro-ph/ v1 5 Oct 2006

Bayesian Modeling for Type Ia Supernova Data, Dust, and Distances

An improved technique for increasing the accuracy of photometrically determined redshifts for blended galaxies

Selection of stars to calibrate Gaia

Galaxy Growth and Classification

SDSS Data Management and Photometric Quality Assessment

Red Magnitudes. Abstract : In which various sky survey red magnitudes are compared to Cousins R

Probabilistic photometric redshifts in the era of Petascale Astronomy

Big Data Inference. Combining Hierarchical Bayes and Machine Learning to Improve Photometric Redshifts. Josh Speagle 1,

Doing astronomy with SDSS from your armchair

Photometric classification of SNe Ia in the SuperNova Legacy Survey with supervised learning

Separating Stars and Galaxies Based on Color

arxiv: v1 [astro-ph] 12 Nov 2008

Incorporating detractors into SVM classification

AUTOMATIC MORPHOLOGICAL CLASSIFICATION OF GALAXIES. 1. Introduction

Dark Energy. Cluster counts, weak lensing & Supernovae Ia all in one survey. Survey (DES)

2 F. PASIAN ET AL. ology, both for ordinary spectra (Gulati et al., 1994; Vieira & Ponz, 1995) and for high-resolution objective prism data (von Hippe

Luminosity dependent covering factor of the dust torus around AGN viewed with AKARI and WISE

Astronomical image reduction using the Tractor

SED- dependent Galactic Extinction Prescription for Euclid and Future Cosmological Surveys

SDSS-IV MaStar: a Large, Comprehensive, and High Quality Empirical Stellar Library

Support Vector Machine. Industrial AI Lab. Prof. Seungchul Lee

Microlensing towards the Galactic Centre with OGLE

arxiv:astro-ph/ v2 2 Mar 2000

ROSAT Roentgen Satellite. Chandra X-ray Observatory

Yuji Shirasaki. National Astronomical Observatory of Japan. 2010/10/13 Shanghai, China

Notes: Infrared Processing and Analysis Center (IPAC). IPAC was founded to process IRAS data and has been expanded to other space observatories.

A Calibration Method for Wide Field Multicolor. Photometric System 1

TMT and Space-Based Survey Missions

Stellar Mass Estimates of Galaxies in Superclusters at z~1

Non-Bayesian Classifiers Part II: Linear Discriminants and Support Vector Machines

arxiv: v1 [astro-ph.ga] 14 May 2015

Machine Learning Applications in Astronomy

Astronomical "color"

The Plato Input Catalog (PIC)

SDSS-IV and eboss Science. Hyunmi Song (KIAS)

Data Analysis Methods

Lectures in AstroStatistics: Topics in Machine Learning for Astronomers

The Two Micron All-Sky Survey: Removing the Infrared Foreground

Random Forests: Finding Quasars

The Cornell Atlas of Spitzer Spectra (CASSIS) and recent advances in the extraction of complex sources

Automated classification of sloan digital sky survey (SDSS) stellar spectra using artificial neural networks

UNIVERSITY COLLEGE LONDON. PHAS : Palomar Sky Survey Prints: Virgo and Hercules Clusters

D4.2. First release of on-line science-oriented tutorials

Project: Hubble Diagrams

Introduction to the Sloan Survey

Introduction The Role of Astronomy p. 3 Astronomical Objects of Research p. 4 The Scale of the Universe p. 7 Spherical Astronomy Spherical

METAPHOR Machine-learning Estimation Tool for Accurate Photometric Redshifts

Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University

Studying galaxies with the Sloan Digital Sky Survey

WISE properties of planetary nebulae from the DSH catalogue

Searching for Needles in the Sloan Digital Haystack

Jeff Howbert Introduction to Machine Learning Winter

Design and implementation of the spectra reduction and analysis software for LAMOST telescope

Pattern Recognition and Machine Learning. Perceptrons and Support Vector machines

9. High-level processing (astronomical data analysis)

Nature and completeness of galaxies detected in the Two Micron All Sky Survey

The VVDS-Deep Survey: the growth of bright galaxies by minor mergers since z 1

arxiv:astro-ph/ v1 3 Aug 2004

The Sloan Digital Sky Survey

Excerpts from previous presentations. Lauren Nicholson CWRU Departments of Astronomy and Physics

Quasar identification with narrow-band cosmological surveys

Infrared Mass-to-Light Profile Throughout the Infall Region of the Coma Cluster

Basics of Photometry

Cosmology of Photometrically- Classified Type Ia Supernovae

Fundamental Astronomy

Transcription:

Katarzyna Ma lek 1,2,3, T. Krakowski 1, M. Bilicki 4,3, A. Pollo 1,5,3, A. Solarz 2,3, M. Krupa 5,3, A. Kurcz 5,3, W. Hellwing 6,3, J. Peacock 7, T. Jarrett 4 1 National Centre for Nuclear Research, ul. Hoża 69, 00-681 Warszawa, Poland 2 Nagoya University, Furo-cho, Chikusa-ku, 464-8602 Nagoya, Japan 3 Kepler Institute of Astronomy, University of Zielona Góra, ul. Lubuska 2, 65-265 Zielona Góra, Poland 4 Department of Astronomy, University of Cape Town, Rondebosch, South Africa 5 The Astronomical Observatory of the Jagiellonian University, ul. Orla 171, 30-244 Kraków, Poland 6 Institute for Computational Cosmology, Department of Physics, Durham University, Durham DH1 3LE, U.K 7 Institute of Astronomy, University of Edinburgh, Royal Observatory, Edinburgh EH9 3HJ, UK WISE at 5, 2015

1 MOTIVATION 2 DATA 3 SAMPLE SELECTION 4 TOOLS - Machine learning algorithms SVM the main concept 5 TRAINING SAMPLE parameters 6 RESULTS accuracy completness & purity Preliminary results for the final catalog 7 CONCLUSIONS

How to perform the ultimate classification of the WISE data? Wide-field Infrared Survey Explorer - the biggest Infrared all sky survey, over 1 million images covering the whole sky in 4 infrared wavelengths, map of the sky, PSC good resolution of the data, great opportunity to study different types of galaxies, quasars; huge impact for cosmology. No sufficient auxiliary data to classify all these sources.

PROBLEM TO SOLVE WISExSDSS DR10 1 700 000 objects

AIM to develop an automatic object classification in WISE into galaxy/quasar/star categories, with high completeness and purity, based on photometric properties of the sources; we focused on obtaining clean catalogs of stars, galaxies, and quasars, keeping at the same time the biggest possible sample of WISE sources to take advantage of the all-sky information

IDEA 1 match two all-sky catalogs: AllWISE and SuperCOSMOS; 2 use spectroscopic data (SDSS DR 10) to create calibration sets of three source types (galaxies, stars, and quasars); 3 use the calibration data as training sets for a machine learning algorithm to perform the final classification of the WISE x SuperCosmos data; 4 as a result we can obtain classification of 170 milion of WISE sources, with known purity and contamination distribution.

DATA

The second all-wise dataset (Cutri et al. 2013) availabe to download from http://irsa.ipac.caltech.edu/frontpage/, combining data from the cryo-genic and post-cryogenic survey phases. almost 750 million sources with S/N 5 in at least one of the bands, averaged 95% completeness in unconfused areas is W1: 17.1, W2: 15.7, W3: 11.5 and W4: 7.7 in Vega magnitudes,

The SuperCOSMOS Sky Survey (hereafter SCOS, Hambly et al.2001 a,b,c): digitized photographs in three bands (B, R, I), obtained via automated scanning of source plates from South: the United Kingdom Schmidt Telescope (UKST), and North: the Palomar Observatory Sky Survey-II (POSS-II). calibrated using: SDSS photometry in the relevant areas, 2MASS J band over the rest of the sky. publicly available from the SuperCOSMOS Science Archive (http://surveys.roe.ac.uk/ssa/), contains 1.9 bilion of sources (photometry, morphology and quality)

SAMPLE SELECTION

WISE: S/N ratios larger than 2 in the W1 and W2 bands, removal of obvious artefacts (cc flags[1,2] DPHO) = 603 million detection over the whole sky, rejection of the Galactic Plane ( b < 10 ) = 460 million of objects (83% of the sky available at b > 10 ), W 1 < 17 magnitude threshold for all-sky uniformity = 343 million sources at b > 10.

SCOS: 1 required detections in both the B and R bands, 2 flux cuts for uniformity: B < 21 mag and R < 19.5 mag, 3 galactic latitude cut ( b > 10 ), where blending and high extinction make the SCOS photometry unreliable.

FINAL SAMPLE: WISExSuperCOSMOS catalog 170 milion of sources

MACHINE LEARNING ALGORITHMS

The machine learning algorithms The machine learning algorithms are designed to infer patterns in the data. 1 Unsupervised algorithms identify groups in the data a priori; clustering algorithms; 2 Supervised algorithms are trained to recognize the pattern. Support vector Machines (SVM)

SVM - the main concept to calculate decision planes between a set of objects having different class memberships, which are defined by the Training Sample quantities that describe the properties of each class of objects, SVM searches for the optimal separating hyperplane between the different classes of objects by maximizing the margin between the classes closest points, the objects are classified based on their relative position in the N-dimensional parameter space to the separation boundary.

SVM to search for a hyperplane, SVM uses kernel function 2, and a soft-boundary SVM method called C-SVM: C - trade-off parameter between large margin of different classes of objects and mis-classifications. γ parameter determines the topology of the decision surface. Both parameters, C and γ, need to be tuned based on the Training Sample. 2 Gaussian radial basis kernel (RBK) function for this work.

SVM: practical point of view 1 manually classify the Training Sample, 2 for each object in this subset define a feature vector, 3 Train algorithm and optimize C and γ.

SVM: practical point of view 1 manually classify the Training Sample, 2 for each object in this subset define a feature vector, 3 Train algorithm and optimize C and γ.

SVM: practical point of view 1 manually classify the Training Sample, 2 for each object in this subset define a feature vector, 3 Train algorithm and optimize C and γ.

Training Sample

Training Sample - the heart of the method The WISExSCOS catalog includes z< 0.5 galaxies (Bilicki et al. 2014; Bilicki et al. 2015), as well as stars and higher-redshift quasars, the training sets require good-quality and high-reliability pre-classification using spectroscopic measurements, the best choice: spectroscopic sample from the Sloan Digital Sky Survey DR10 (SDDS DR10, Ahn et al. 2014) cross-matched with WISExSCOS catalog.

Training sample: WISExSCOSxSDSS SDSS DR10 3.4 million spectroscopic sources: 59% - galaxies, 26% - stars, and 15% - quasars. Pairing up these sources with our WISExSCOS flux-limited catalogue within 1 to obtain the Trainig Sample gave over 1.3 million common objects, of which: 68% were galaxies, 25% stars, and 7% of quasars.

Training sample: WISExSCOSxSDSS Figure 1 : Original SDSSxWISExSCOS cross-matched data. accuracy

Training sample: WISExSCOSxSDSS Figure 1 : Training set after oversampling. accuracy

Training sample: WISExSCOSxSDSS We divided the training sample into 5 bins: W1 < 13 mag, 13 mag W1 < 14 mag, 14 mag W1 < 15 mag, 15 mag W1 < 16 mag, and 16 mag W1 < 17 mag. For the training set we select 5000 galaxies, 5000 quasars, and 5000 stars. The rest of the WISExSCOSxSDSS catalog was used as independent test sets to check the stability of the clasisfier, As the parameter space we choose 5 parameters: W1 magnitude, W1-W2 colour, R-W1 colour, B-R colour and the w1mag13 differential aperture of the W1 magnitude.

digression: 5D vs 6D classifier (+ W3) The accuracy, completeness, and purity for both classifiers are very high, and the contamination levels hardly exceed 5%. The differences between the 5D and 6D classifiers are not significant. We decided not to use the W 3 passband, as the most of our sources have only W 3 flux upper limits estimated.

Parameters which can help us to understand the performance of the classifier

Parameters which can help us to understand the performance of the classifier Total Accuracy: TA = 1 10 10 i=1 The accuracy for a given iteration is defined as A i = A i (1) TG + TQ + TS TG + TQ + TS + FG + FQ + FS. (2) where: TG - true galaxies, TQ - quasars and Ts - stars from the training sample, properly classified as galaxies, quasars, and stars, respectively; and FG - false galaxies, being real quasars or stars misclassified as galaxies, with false quasars (FQ) and false stars (FS) defined in a similar manner.

Parameters which can help us to understand the performance of the classifier completeness (c): c g = purity (p, 1-contamination): p g = 1 TG TG + FGS + FGQ, (3) FSG + FQG TG + FSG + FQG, (4) where: FGS and FGQ determine galaxies mis-classified respectively as stars and quasars, and FSG, FQG define stars and quasars mis-classified as galaxies. Definitions for stars and quasars follow in an analogous way.

RESULTS

Figure 2 : Accuracy of the classifier vs W1 TS distribution

Figure 2 : Accuracy of the classifier vs b TS distribution

Figure 3 : Completness of the WISExSCOSxSDSS sample vs W1 mag

Figure 3 : Purity of the WISExSCOSxSDSS sample vs W1 mag

PRELIMINARY RESULTS: FOUR WISExSuperCOSMOS SUBSAMPLES

PRELIMINARY RESULTS FOR THE FINAL CATALOG North Galactic Pole Figure 4 : 504 144 objects: stars: 286 537 (56.84%), galaxies: 185 622 (36.82%), quasars: 31 985 (6.34%)

PRELIMINARY RESULTS FOR THE FINAL CATALOG South Galactic Pole Figure 5 : 1 146 531 objects: stars: 644 341 (56.20%), galaxies: 433 132 (37.78%), quasars: 69 058 (6.02%)

PRELIMINARY RESULTS FOR THE FINAL CATALOG crossing the Bulge (0 l 1 ) Figure 6 : 661 402 objects: stars: 553 358 (83.66%), galaxies: 94 759 (14.33%), quasars: 13 285 (2.00%)

PRELIMINARY RESULTS FOR THE FINAL CATALOG Galactic anti-center (179 l 180 ) Figure 7 : 327 367 objects: stars: 259 457 (79.25%), galaxies: 58 029 (17.23%), quasars: 9 881 (3.02%)

CONCLUSIONS

WISExSuperCOSMOS catalog (170 milion of sources), Training Sample: WISExSuperCOSMOSxSDSS catalog of galaxies, stars, and quasars with spectroscopic classification (1.3 milion of sources), C and γ tuning in 5 W1 bins, computation of the accuracy, purity, contaminations in function of W1 and b, run the final 5D classifier (W1, W1-W2, R-W1, B-R,w1mag13) against the 4 test subsamples of the WISExSCOS catalog, run the final classifier against the whole 170 milion of WISExSuperCOSMOS catalog (the results will be publish very soon). The aim is reached

THANK YOU FOR YOUR ATTENTION final catalog: almost ready, need additional 2 weeks of computation, more about our results: Krakowiak et al., in preparation, Kurcz et al., in preparation