VIALACTEA: the Milky Way as a Star Formation Engine

Similar documents
ViaLactea. The Milky Way as a Star Formation Engine. OACN Team. A. Mercurio. G. Riccio S. Cavuoti M. Brescia

Fragmentation in Hi-GAL clumps

protostelle, classe 0, caratterizzate da un'intensa emissione alle lunghezze della radiazione submillimetrica, che però diviene molto debole a λ<10 µm

An evolutionary sequence for high-mass stars formation

Global star formation in the Milky Way from the VIALACTEA Project

Filamentary Structures in the Galactic Plane Morphology, Physical conditions and relation with star formation

Rick Ebert & Joseph Mazzarella For the NED Team. Big Data Task Force NASA, Ames Research Center 2016 September 28-30

Photometric Redshifts with DAME

Astroinformatics in the data-driven Astronomy

Astronomical data mining (machine learning)

Mining Digital Surveys for photo-z s and other things

A cooperative approach among methods for photometric redshifts estimation

Star Formation & the FIR properties of Infrared Dark Clouds in the Hi-GAL Science Demonstration Field. Mark Thompson and the Hi-GAL Consortium

The Red MSX Source Survey. Massive Star Formation in the Milky Way. Stuart Lumsden University of Leeds

ATLASGAL: APEX Telescope Large Area Survey of the Galaxy

RAMPS: The Radio Ammonia Mid-Plane Survey. James Jackson Institute for Astrophysical Research Boston University

VOSA. A VO Spectral Energy Distribution Analyzer. Carlos Rodrigo Blanco 1,2 Amelia Bayo Arán 3 Enrique Solano 1,2 David Barrado y Navascués 1

The Planck Catalogue of Galactic Cold Clumps

D4.2. First release of on-line science-oriented tutorials

DAME: A Web Oriented Infrastructure For Scientific Data Mining And Exploration

The Grid in INAF. C. Vuerli 1,2, G. Taffoni 1,2, M. Sponza 2, and F. Pasian 1,2. 1. GRID.IT and DRACO. Mem. S.A.It. Suppl. Vol. 13, 158 c SAIt 2009

SOFIA follow-ups of high-mass clumps from the ATLASGAL galactic plane survey

DAME: A Web Oriented Infrastructure For Scientific Data

SPATIAL DATA MINING. Ms. S. Malathi, Lecturer in Computer Applications, KGiSL - IIM

R. Alvino, I. De Marino, M. Magrì, M. Oliviero, G. Severino and Th. Straus

Machine Learning Applications in Astronomy

ESASky, ESA s new open-science portal for ESA space astronomy missions

Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University

METAPHOR Machine-learning Estimation Tool for Accurate Photometric Redshifts

GENERALIZATION IN THE NEW GENERATION OF GIS. Dan Lee ESRI, Inc. 380 New York Street Redlands, CA USA Fax:

2 F. PASIAN ET AL. ology, both for ordinary spectra (Gulati et al., 1994; Vieira & Ponz, 1995) and for high-resolution objective prism data (von Hippe

Understanding the tracers of star formation

Predictive analysis on Multivariate, Time Series datasets using Shapelets

Reduced data products in the ESO Phase 3 archive (Status: 02 August 2017)

Object-based feature extraction of Google Earth Imagery for mapping termite mounds in Bahia, Brazil

Department of Astrophysical Sciences Peyton Hall Princeton, New Jersey Telephone: (609)

GOODS/FORS2 Final Data Release: Version 3.0

Science in the Virtual Observatory

Building a 4-D Weather Data Cube for the NextGen Initial Operating Capability (IOC)

Current Status of Chinese Virtual Observatory

Real Astronomy from Virtual Observatories

Intelligence and statistics for rapid and robust earthquake detection, association and location

Multi-wavelength analysis of Hickson Compact Groups of Galaxies.

within entire molecular cloud complexes

Analysis of Evolutionary Trends in Astronomical Literature using a Knowledge-Discovery System: Tétralogie

SCIENTIFIC CASES FOR RECEIVERS UNDER DEVELOPMENT (OR UNDER EVALUATION)

Anatomy of the S255-S257 complex - Triggered high-mass star formation

MATISSE. Multi-purpose Advanced Tool for Instruments for the Solar System Exploration. Angelo Zinzi 1,2, Maria Teresa Capria 3,1, Angelo Antonelli 1,2

Quasar Selection from Combined Radio and Optical Surveys using Neural Networks

Automatic detection of objective prism stellar spectra

Far-IR cooling in massive star-forming regions: a case study of G

Hydrological feature extraction from LiDAR. Grant Pearse 28 March 2017

Protoplanetary discs of isolated VLMOs discovered in the IPHAS survey

ESO Phase 3 Data Release Description. Data Collection ATLASGAL Release Number 1 Data Provider

ISO and AKARI: paving the road to Herschel

Gaia data in the CDS portal. Sébastien Derriere, Mark Allen, Thomas Boch & CDS team Observatoire de Strasbourg, France

Near-Infrared Imaging Observations of the Orion A-W Star Forming Region

Contemporary Data Collection and Spatial Information Management Techniques to support Good Land Policies

ArcGIS Enterprise: What s New. Philip Heede Shannon Kalisky Melanie Summers Sam Williamson

Stefano Cavuoti Supervisors: G. Longo 1, M. Brescia 1,2 (1) Department of Physics University Federico II Napoli (2) INAF National Institute of

Classification of handwritten digits using supervised locally linear embedding algorithm and support vector machine

DP Project Development Pvt. Ltd.

Galactic plane surveys: What have/will we learn(ed)? Henrik Beuther

Mid-infrared images of compact and ultracompact HII regions: W51 and W75N.

Euro-VO. F. Genova, Interoperability meeting, 9 November 2009

Access to massive catalogues in the Gaia archive: a new paradigm

High mass star formation in the Herschel era: highlights of the HOBYS key program

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

The Ṁass- loss of Red Supergiants

RLW paper titles:

Modern Image Processing Techniques in Astronomical Sky Surveys

Scientific Data Mining: Why is it Difficult? Sapphire: using data mining techniques to address the data overload problem

It is advisable to refer to the publisher s version if you intend to cite from the work.

Learning algorithms at the service of WISE survey

GOODS/VIMOS Spectroscopy: Data Release Version 2.0.1

LIGO Grid Applications Development Within GriPhyN

Pushing the Standards Edge: Collaborative Testbeds to Accelerate Standards Development and Implementation

Web components and documentation for programming on-line experiments CODE: DEL-033 VERSION: 01

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

The Herschel/PACS Point Source Catalog

Infrared Dark Clouds seen by Herschel and ALMA

Style-aware Mid-level Representation for Discovering Visual Connections in Space and Time

The Build-up of the Red Sequence in High Redshift Galaxy Clusters

Team. mining the Galactic Plane Goldmine S. Molinari INAF/IAPS, Rome & Herschel composite 346 < l < 356

SOFTWARE ARCHITECTURE DESIGN OF GIS WEB SERVICE AGGREGATION BASED ON SERVICE GROUP

EuroPlanet Integrated and Distributed Information Service (IDIS)

Spatial Analysis in CyberGIS

Splatalogue Quickstart Guide

Multi-instrument, multiwavelength. energy sources with the Virtual Observatory

for Effective Land Administration

Anomaly Detection for the CERN Large Hadron Collider injection magnets

Infall towards massive star forming regions

Artificial Neural Networks D B M G. Data Base and Data Mining Group of Politecnico di Torino. Elena Baralis. Politecnico di Torino

ANTARES: The Arizona-NOAO Temporal Analysis and Response to Events System

Radio Nebulae around Luminous Blue Variable Stars

2MHR. Protein structure classification is important because it organizes the protein structure universe that is independent of sequence similarity.

The INES Archive in the era of Virtual Observatories. E. Solano Spanish Virtual Observatory LAEX CAB / INTA-CSIC

IVOA Beijing Interoperability meeting. Introductory talk. F. Genova, Beijing Interop, Plenary session introduction 1

Low mass star formation. Mark Thompson (with contributions from Jennifer Hatchell, Derek Ward-Thompson, Jane Greaves, Larry Morgan...

What's in the brew? A study of the molecular environment of methanol masers and UCHII regions

Transcription:

Giuseppe Riccio 1, Ugo Becciani 2, Massimo Brescia 1, Robert Butora 3, Stefano Cavuoti 1, Alessandro Costa 2, Anna Maria Di Giorgio 4, Akos Hajnal 5, Gabor Hermann 5, Peter Kacsuk 5, Istvan Marton 5, Amata Mercurio 1, Sergio Molinari 4, Marco Molinaro 3, Riccardo Smareglia 3, Fabio Vitello 2 (1) INAF Astronomical Observatory of Capodimonte, Via Moiariello 16, I-80131 Napoli, Italy (2) INAF Astronomical Observatory of Catania, Via S. Sofia 78, I-95123 Catania, Italy (3) INAF Astronomical Observatory of Trieste, Via Tiepolo 11, I-34131 Trieste, Italy (4) INAF IAPS, Via del Fosso del Cavaliere 100, I-00133 Roma, Italy (5) MTA SZTAKI, Kende u. 13-17, Budapest, Hungary 1

The role of WP5 in the project 2

The role of WP5 in the project 3

Overview of the WP5 activities WP5 main tasks TASK 1: INAF Astronomical Observatory of Trieste Database and Virtual Observatory Infrastructure. This will ensure the integration and interoperability of all new data and tools products by enduring their compliance to VO standards. The usage and test of VO standards and tools can increase the scientific productivity and encourage the development of automatic pipelines to explore existing VO-compatible DB/archives. TASK 2: INAF Astronomical Observatory of Capodimonte, Napoli Data Mining Systems Data Mining System are intelligent integrated systems directly supporting scientific decision making and situation awareness by dynamically integrating, correlating, fusing and analysing extremely large volumes of disparate data resources and streams. All these systems are based on the machine learning paradigms (both supervised and unsupervised), enabling the self-adapting, generalization and automation capabilities to explore and mine data. 4

Overview of the WP5 activities WP5 main tasks TASK 3: INAF Astronomical Observatory of Catania 3D Visual Analytics Systems This task will implement a 3D-aided visual analytics environment allowing the astronomer to easily conduct research activities using methods for multidimensional data and information visualization real-time data interaction to carry out complex tasks for multi-criteria data/metadata queries for subsample selection and further analysis, or real-time control of data fitting to theoretical models TASK 4: MTA SZTAKI Science Gateway According to the needs of the astrophysics user community the Science Gateway with its features will be deployed. Workflows and portlets are also provided by this task with the support of the whole collaboration. 5

Science & Technology Most of the first period has been spent to find a common language among members How astronomers see astroinformaticians How astroinformaticians see astronomers with doubtful but promising results 6

Overview of the WP5 activities Overall system architecture SZTAKI INAF-OACN 7

The data flow and management system VLKB main interface (I/F): IVOA Table Access Protocol (TAP) service WP2 Task 1 WP5 Task 2 Data Mining WP5 Task 3 3D Visual validated DB ingestion processed content WP2 SED models WP3 Task 1 WP3 Task 1/4 http://palantir19.oats.inaf.it:8080/vlkb 8

The Science Gateways Two science gateways had been set up and operated: VIALACTEA Project Science Gateway v0 in Catania VIALACTEA Project Science Gateway v1 in Rome Both conform to STARnet alliance (common authentication, resource sharing, etc.) Connected to PBS clusters of Catania and Rome VIALACTEA Project Science Gateway v1 is based on the latest WS-PGRADE/gUSE release (series 3.7) Improvements of guse are continuously integrated Monitor and test: a new plan will be implemented Let s see a video showing some details about the Science Gateway infrastructure 9

WP1-task1 filamentary structure detection Task 1: Filamentary structure detection Production of column density maps of entire galactic plane Automated filament extraction workflow for Hi- GAL survey The filament extraction code was run on the column density maps covering the region between Galactic longitude 290 -- 320, with different threshold levels equal to 2.5, 3. and 3.5 times the local standard deviation of the minimum eigenvalue (Schisano et al., 2014) OACN Data Mining goal: To refine the edges; To extend filament regions. The right calculation of physical quantities related to filaments strongly depends on their dimensions, so the correct definition of edges is crucial. 10

Overview of the filament areas of interest Original image Intermediate mask TRAINING EXECUTION TRAINING + EXECUTION Traditional thresholding gradients Filament detection Filament optimization FilExSeC To combine different filament detection methods to improve the global performances OR Beam curve detection Structured Forest Filament Flux Calculation Spine Extraction Based on Z-S method SVM detection Fuzzy detection 11

FilExSeC algorithm FilExSeC (Filaments Extraction, Selection and Classification), a data mining tool to refine and optimize the detection of the edges of filamentary structures. The method consists in two main phases: Feature Extraction: a set of features depending by its neighbors is associated to each pixel of the input image Classification: image pixels are classified as filamentary or background, by using a supervised Machine Learning method, trained by these features A further phase, Feature Selection, finds the most relevant features. By reducing the initial data parameter space, it is possible indeed to improve the execution efficiency of the model, without affecting its performances. 12

FilExSeC pixel feature analysis Features extracted from each pixel and its neighbors Feature selection: Haralick type excluded (no information lost and improved computing time) 13

FilExSeC Filament Connections FilExSeC is able to connect, by means of NFPs, filaments that in the traditional method are tagged as disjointed objects. By joining filaments as a unique structure total mass and mass per length change, inducing a different physics of the filamentary structure. Detected Filaments 668 Confirmed Filaments 298 New Filaments 196 Extended Filaments 169 Joined Filaments 5 EXAMPLE: TEST tiles 1+2 of Hi-GAL l217-l224 A further analysis is required to verify the correctness of the reconstruction of interconnections between different filaments, to evaluate the contribution of FilExSeC to the knowledge of the physics of the filaments. Confirmed Filament Pixels New Filament Pixels Confirmed Background Undetected Filament Pixels 14

FilExSeC Example of Joined Filaments IAPS FilExSeC 1 2 3 4 5 6 Connection of 6 filaments identified by IAPS IAPS total number of pixels: 1852 Pixel added by FilExSeC : 858 New Total number of pixels: 2710 % NFP: +46.33% 15

WP2 task 1 Band merging Task 1: Compact Source Extraction and band-merging Hi-GAL Source extraction and photometry Band-merging with ancillary information (from near-ir to radio) 70µm 160µm 250µm 350µm 500µm A first result from OACN of a band-merged catalogue using a data-mining approach has been implemented for the Herschel bands The source extraction with CuTEx (Molinari et al., 2010a) has been run over the entire Galactic plane. The -71 < l < 67.5 portion of the HERSHEL/Hi-GAL photometry lists should be band-merged, filtered and complemented with distances and ancillary photometry : MIPSGAL, UKIDSS, WISE, MSX; ATLASGAL, BGPS Captures and maintains multiple counterpart associations; Topological quality flagging; Ingested into a VO-like database so that complex queries are possible; Interfaced with Visualization tools; Massively based on multi-threading parallelization. 16

The workflow for compact source analysis Compact Sources Band Merged Catalogue w/ Sources SED View Future improvements: User space for catalogue integration Data release solution A&A using LDAP and mixed data policy Catalogue Records Visual Analytics Validation 17

Q-FULLTREE processing flow Catalogue query Project DBMS Quality Rank Quality Fitness Absolute Quality Fitness Other methods SED fitting Quality flagging SED model query CSS list query Project DBMS Science Gateway 18

Q-FULLTREE Example 500 CSS 1 CL 500 350 = 0.91 CL 500 250 = 0.87 CL 350 250 = 0.89 NE TNE = 3 3 = 1 MS 1 = 2.67 MS 2 < MS 1 MS 2 = 1.79 500 CSS 2 CL 500 350 = 0.91 CL 500 250 = 0.87 CL 350 250 = 0.89 + CL 250 70 = 0.02 NE TNE = 4 6 = 2 3 20

WP2-T1 Q-FULLTREE infrastructure aspects 5 bands: on a bi-cpu 1.6GHz, 16 cores: FULLTREE 27 days Q-FULLTREE 3.3 hours On a quad-cpu 2.4GHz, 32 cores: FULLTREE 23 days Q-FULLTREE 1.3 hours On CT cluster (1 CPU 2.4 GHz, 12 cores): FULLTREE 29 days Q-FULLTREE 3,15 hours PERFORMANCES Worst gain in speedup when compared with single-thread FULLTREE: 200x (mostly higher) 21

WP2 T4 Evolutionary classification An evolutionary classification tool for ViaLactea, will catalogue clumps in terms of the evolutionary stage and mass regime of the ongoing star formation. There are two components that need to be developed at the foundation of the classification tool: 1. an evolutionary classification toolbox 2. a set of star-forming clumps in known stages of evolution to be used as a training/test-set for machine-learning algorithms...and adopt some kind of evolutionary scheme Data-mining approaches to source classification FOREIGN (Forming Region Exploring Intelligent Gated Network) Weak Gated Classification We know nothing about the sources evolutionary stage; Identify over-densities in the given parameter space (e.g., built on the evolutionary toolbox, plus any other available evidence); Data are then grouped into clusters: groups of data entries sharing common but a priori unknown correlations among parameter space features. Supervised Classification For a subsample of points, its category/class is well known; Need order of 10 3 objects to be used as a training set; Balanced population of classes in the training set. 22

WP2-T4 Evolutionary Classification (1 st ) Luminosity/mass diagram IR color distribution Mass/radius relationship Association with masers Association with radio continuum Outflow indicators YSO IR-color spatial association Others? ambiguity Quiescent / pre-stellar SF clumps source class prediction Knowledge Base Proto-stellar Hot molecular cores Ultra-compact HII regions Evolved HII regions Train set Validation set Test set Fuzzy/cross-entropy 23

WP2-T4 Evolutionary Classification (2 nd ) Luminosity/mass diagram IR color distribution Mass/radius relationship Association with masers Association with radio continuum Outflow indicators YSO IR-color spatial association Others? ambiguity Quiescent / pre-stellar SF clumps source class prediction Knowledge Base Proto-stellar Hot molecular cores Ultra-compact HII regions Evolved HII regions Train set Validation set Test set Fuzzy/cross-entropy 24

Conclusions The whole project has successfully passed the mid-term official EU commission review The initial inertia due to interaction problems between technology and science communities is going to be successfully overcome The data and computing infrastructures and visual analytics solutions started to host and integrate the planned scientific workflows, matching the expected capabilities The data mining paradigms are demonstrating their expected benefit to help the scientific problem solving automation as well as to manage the foreseen amount and complexity of data In other words The project at mid-term stage (April 2015) is respecting the initial goals, among which the WP5 expectation to release a useful resource for the wide scientific community, which will remain available also after the project closure (October 2016). 25