A SPEctra Clustering Tool for the exploration of large spectroscopic surveys. Philipp Schalldach (HU Berlin & TLS Tautenburg, Germany)

Working Group
- Helmut Meusinger (Tautenburg, Germany)
- Philipp Schalldach (Berlin & Tautenburg, Germany)
- Aick in der Au (Munich, Germany)
- Mike Newholm (London, UK)
- Jesko Schwarzer (Bonn, Germany)
- Frank Pertermann (Tautenburg, Germany)
- Jörg Brünecke (Leipzig, Germany)

Content
- Scientific aim of ASPECT
- General procedure
- Data pools
- SOM algorithm for clustering SDSS spectra
- Methods for the analysis of SOMs
- Application: Search for unusual objects
- Next steps
- Summary

Aim
Astronomy is a data-intensive science: the volume of accessible data grows by ~0.5 PB per year (Berriman & Groom 2011). ASPECT is a tool chain for the analysis of large data pools. Its main module uses an artificial neural network algorithm called Kohonen mapping, which projects high-dimensional input data onto a 2-dimensional grid (self-organising map = SOM). The resulting SOM allows the user to browse efficiently through huge data collections and to select certain types of objects (e.g. rare ones). ASPECT can compute SOMs for the spectra of up to one million objects (and more in the near future).

General Procedure
A neural network layer is adapted to the actual spectral data, controlled by a learn rate and a neighbourhood function. The result of this step is a table-like grid in which every cell (neuron) represents an object. Main feature: similar objects are located close to each other in the SOM. The resulting SOM can be analysed by eye or by blending in additional internal or external data.
In der Au, Meusinger, Schalldach, et al., 2012, Astron. & Astrophys. 547, A115 http://dx.doi.org/10.1051/0004-6361/201219958

General Procedure
Example: SOM for sine functions of different periodicity

General Procedure
Example: evolution of a SOM of > 1 million spectra

Data Pools
Main observables (so far): spectra from the Sloan Digital Sky Survey (SDSS) data releases DR4 to DR10
- The SDSS is a photometric and spectroscopic observation campaign
- conducted at Apache Point Observatory, New Mexico
- 2.5-meter telescope (120-megapixel camera, pair of spectrographs)
Other data have also been processed, either during development or during scientific evaluation, including but not limited to
- CoRoT light curves for the purpose of exoplanet search
- BATSE light curves for the purpose of identifying GRBs
- Different types of data from near-industrial research
So far, however, the best results have been achieved with SDSS spectroscopic data.

SDSS Spectra
SDSS spectra (as well as imaging data) are available from the SDSS Data Archive Server via several means, including search masks, SQL database interfaces and raw access to folder structures, mainly in FITS file format. ASPECT works with the raw spectral data points, which are extracted from the FITS files and folded by several factors to improve computational performance. Each spectrum, along with several parameters, is written into a single binary file that can be processed by the main component of the tool chain. The additional parameters (redshift, emission-line measurements, object type as determined by the SDSS pipeline) are not used in the computation itself but only in the subsequent analysis of the SOM. For identification, the SDSS convention of identifying a spectrum by its MJD, plate ID, and fiber ID is kept.
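The slide does not say exactly how the spectra are "folded". As a minimal sketch, assuming that folding means averaging groups of adjacent pixels by an integer factor, the reduction could look as follows. The function names and the key layout are hypothetical and only illustrate the idea of keeping MJD, plate ID and fiber ID as the identifier.

```python
import numpy as np

def fold_spectrum(flux, factor):
    """Reduce a 1-D spectrum by averaging groups of `factor` adjacent pixels.

    Assumption: 'folding' is plain block averaging; trailing pixels that
    do not fill a complete group are dropped.
    """
    flux = np.asarray(flux, dtype=float)
    n_keep = (flux.size // factor) * factor
    return flux[:n_keep].reshape(-1, factor).mean(axis=1)

def spectrum_key(mjd, plate, fiber):
    """Hypothetical identifier string built from MJD, plate ID and fiber ID."""
    return f"{mjd:05d}-{plate:04d}-{fiber:03d}"

flux = np.arange(10.0)            # stand-in for the flux values of one spectrum
folded = fold_spectrum(flux, 4)   # two groups of four pixels, last two dropped
key = spectrum_key(51630, 266, 1)
```

Folding by a factor f reduces the length of every input vector, and therefore the cost of every distance computation in the clustering, by roughly that factor.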

SDSS Spectra
Figure credits: Sloan Digital Sky Survey

SOMs
SOMs produce a similarity graph (usually in 2D) from high-dimensional input data. Key ingredients for topographically organised maps to emerge:
(1) broadcasting of the input to an n-dimensional map
(2) selection of a winner
(3) adaption of the n-dimensional map in the spatial neighbourhood of the winner
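Ingredients (1) and (2) can be sketched in a few lines, assuming the map is stored as a NumPy array of shape (rows, cols, n_pixels) and that the winner is the cell with the smallest Euclidean distance to the input. The array layout and the name find_winner are illustrative assumptions, not part of ASPECT.

```python
import numpy as np

def find_winner(weights, spectrum):
    """Return (row, col) of the cell whose weight vector is closest to `spectrum`.

    weights  : array of shape (rows, cols, n_pixels), one vector per cell
    spectrum : array of shape (n_pixels,)
    """
    # Broadcasting compares the input with every cell of the map at once.
    dist2 = np.sum((weights - spectrum) ** 2, axis=2)
    # The winner is the cell with the smallest squared Euclidean distance.
    return np.unravel_index(np.argmin(dist2), dist2.shape)

weights = np.zeros((2, 3, 4))
weights[1, 2] = [1.0, 2.0, 3.0, 4.0]
winner = find_winner(weights, np.array([1.0, 2.0, 3.0, 3.5]))
```

Using the squared distance avoids a square root and does not change which cell wins.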

SOM Algorithm for Clustering Spectra
- Initialise the network, for instance with the input spectra in randomised order.
- For each learning step: present the training data (in our case all input spectra to be clustered) in randomised order.
- For each input spectrum:
  - find the best-matching spectrum in the network, i.e. the one with the smallest error (Euclidean distance between spectra)
  - on collision (more than one input spectrum hits the same network cell), keep the input spectrum with the smaller error and store the other in a collision list
  - adapt the best-matching spectrum and its neighbourhood cells
- Repeat these steps for all spectra in the collision list (searching only in non-occupied cells).
- Change the learn rate and adaption radius for the next learning step.
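The steps above can be sketched as a small, self-contained training loop. This is a simplified illustration under stated assumptions, not the ASPECT implementation: the geometric decay of learn rate and radius, their start and end values, and all function names are hypothetical, and the toy input consists of sine curves of different periods rather than SDSS spectra.

```python
import numpy as np

def adapt(weights, gx, gy, cell, x, eta_t, sigma_t):
    """Pull every cell towards x, weighted by its grid distance to `cell`."""
    d = np.sqrt((gx - gx[cell]) ** 2 + (gy - gy[cell]) ** 2)
    h = np.exp(-d / (2.0 * sigma_t ** 2))
    weights += eta_t * h[:, None] * (x - weights)   # in-place update

def train_som(spectra, rows, cols, n_steps=10,
              eta=(0.25, 0.01), sigma=(2.0, 0.5), seed=0):
    rng = np.random.default_rng(seed)
    n, n_pix = spectra.shape
    n_cells = rows * cols
    if n > n_cells:
        raise ValueError("need at least one cell per spectrum")
    # Initialise the cells with the input spectra in randomised order
    # (repeated cyclically if there are more cells than spectra).
    weights = spectra[np.resize(rng.permutation(n), n_cells)].astype(float)
    gy, gx = np.divmod(np.arange(n_cells), cols)    # grid coordinates per cell
    cell_of = np.full(n, -1)
    for step in range(n_steps):
        t = step / max(n_steps - 1, 1)              # training phase, 0..1
        eta_t = eta[0] * (eta[1] / eta[0]) ** t     # assumed decay schedule
        sigma_t = sigma[0] * (sigma[1] / sigma[0]) ** t
        owner = np.full(n_cells, -1)                # spectrum occupying each cell
        owner_err = np.full(n_cells, np.inf)
        collisions = []
        for i in rng.permutation(n):
            err = np.sum((weights - spectra[i]) ** 2, axis=1)
            cell = int(np.argmin(err))
            if owner[cell] >= 0 and owner_err[cell] <= err[cell]:
                collisions.append(i)                # cell keeps its better match
                continue
            if owner[cell] >= 0:
                collisions.append(owner[cell])      # previous occupant loses
            owner[cell], owner_err[cell] = i, err[cell]
            adapt(weights, gx, gy, cell, spectra[i], eta_t, sigma_t)
        for i in collisions:
            err = np.sum((weights - spectra[i]) ** 2, axis=1)
            err[owner >= 0] = np.inf                # search only free cells
            cell = int(np.argmin(err))
            owner[cell] = i
            adapt(weights, gx, gy, cell, spectra[i], eta_t, sigma_t)
        occupied = np.flatnonzero(owner >= 0)
        cell_of[owner[occupied]] = occupied
    return weights.reshape(rows, cols, n_pix), cell_of

# Toy input: sine curves of different periods.
x = np.linspace(0.0, 2.0 * np.pi, 32)
spectra = np.array([np.sin(k * x) for k in np.linspace(0.5, 4.0, 20)])
weights, cell_of = train_som(spectra, rows=5, cols=5)
```

Because of the collision list, every input spectrum ends up in its own cell after each learning step, which matches the statement that every cell of the map represents one object.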

SOM Algorithm for Clustering Spectra
Neighbourhood function actually used: exp(-sqrt(dx² + dy²) / (2 sigma²)), where dx and dy are the grid offsets from the winner cell and sigma is the adaption radius.
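A short sketch of this neighbourhood function, evaluated on a whole grid around a given winner cell. It reads the slide's "2sigma²" as the denominator 2 sigma², which is an interpretation of the transcript; the function names are illustrative.

```python
import numpy as np

def neighbourhood(dx, dy, sigma):
    """exp(-sqrt(dx^2 + dy^2) / (2 sigma^2)), as quoted on the slide."""
    return np.exp(-np.sqrt(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2))

def neighbourhood_grid(rows, cols, winner, sigma):
    """Neighbourhood weight of every cell relative to the winner (row, col)."""
    r, c = np.indices((rows, cols))
    return neighbourhood(c - winner[1], r - winner[0], sigma)

h = neighbourhood_grid(5, 5, winner=(2, 2), sigma=1.0)
```

The weight is 1 at the winner itself and falls off with grid distance. Since the exponent uses the distance rather than its square, the fall-off is exponential in the distance and not Gaussian.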

SOM Algorithm for Clustering Spectra
Learn rate (eta) and radius (sigma) change over the training phase (t = 0..1).
Network adaption:
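The schedule and update formulas shown on the slide are not preserved in this transcript. As a rough sketch only, the following assumes a geometric interpolation between a start and an end value for eta and sigma, and the textbook Kohonen update w <- w + eta * h * (x - w). The start and end values and all names are hypothetical and may differ from what ASPECT uses.

```python
import numpy as np

def schedule(t, start, end):
    """Assumed schedule: geometric interpolation from `start` (t=0) to `end` (t=1)."""
    return start * (end / start) ** t

def adapt(weights, spectrum, h, eta):
    """Textbook Kohonen update: move each cell towards `spectrum`.

    weights : (rows, cols, n_pixels) map
    h       : (rows, cols) neighbourhood weights around the winner
    eta     : learn rate at the current training phase
    """
    return weights + eta * h[:, :, None] * (spectrum - weights)

eta_start = schedule(0.0, 0.25, 0.01)   # hypothetical start value
eta_end = schedule(1.0, 0.25, 0.01)     # hypothetical end value

weights = np.zeros((2, 2, 3))
h = np.array([[1.0, 0.5],
              [0.5, 0.0]])
updated = adapt(weights, np.array([2.0, 2.0, 2.0]), h, eta=0.5)
```

With eta = 0.5, the winner cell (h = 1) moves halfway towards the input, its neighbours (h = 0.5) a quarter of the way, and a cell with h = 0 stays unchanged.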

Spectral Icon Map
Icon map of SDSS spectra of low-redshift galaxies. Each pixel of the SOM corresponds to an object and can be represented by an iconised version of the spectrum. Inspecting the icon map is an efficient way to
- get an overview of the variety of spectral types
- select objects of a given type
- search for rare, unusual objects

Blending in Parameters
Aside from the icon map, other parameters such as redshift, photometric data, line indices etc. can be plotted onto the grid via color coding. This gives a general overview of the data pool in terms of its parameter space.
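A minimal sketch of such a parameter map, assuming that the SOM position of each object is available as a pair of row and column indices. The function name and the sample values are made up for illustration.

```python
import numpy as np

def parameter_map(cell_rows, cell_cols, values, shape):
    """Place one parameter value per object at its SOM cell.

    Cells without an object are set to NaN so that a plotting routine
    can leave them blank.
    """
    grid = np.full(shape, np.nan)
    grid[cell_rows, cell_cols] = values
    return grid

# Three objects on a 2 x 3 map, with made-up redshift values.
rows = np.array([0, 0, 1])
cols = np.array([0, 2, 1])
redshift = np.array([0.05, 0.12, 0.30])
zmap = parameter_map(rows, cols, redshift, shape=(2, 3))
```

The resulting 2-D array can be passed to any image plotting routine with a color map to obtain the color-coded view described above.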

Using Interfaces
The actual cells of the grid can be linked to other data pools or other means of computing as well. The key feature of similar spectra being located close to each other within the SOM ought to be preserved.
Figure panels: Icon map; Picture map

Using Interfaces
To get more detailed information on the objects in the SOM, an HTML interface links to the SDSS database and, through it, to other data archives (SIMBAD, NED) as well as to cross-references to publications (ADS).

Application: Search for Unusual Quasars
How to find rare, unusual objects in a SOM? Due to the intrinsic properties of the Kohonen mapping algorithm, ASPECT tends to cluster unusual spectra in one or several spots, depending on the overall composition of the data pool. These areas can be easily identified via visual inspection, parameter maps, or tracking of previously defined objects.

Application: Search for Unusual Quasars
Quasar = QSO (quasi-stellar object)
- Actively accreting supermassive black hole in the center of a galaxy
- Nearby matter (gas clouds, stars, planets, ...) within a distance of light-years is pulled towards the black hole
- Because of its angular momentum, the matter forms an accretion disc whose luminosity exceeds that of billions of stars
- Very distant (high redshift) but still visible because of the high luminosity
Schematic view of a quasar, Credits: NASA / CXC / M.Weiss

Application: Search for Unusual Quasars
Binning 100 000 quasars into redshift bins of width 0.1. About 80 Kohonen maps clustered by ASPECT. Visual analysis yields ~ 1000 unusual quasars of different types:
- strong unusual absorption lines
- weak emission lines
- strong iron emission
- very red spectra
- miscellaneous
as well as several extremely peculiar spectra that probably represent special evolutionary stages (e.g. very young quasars).
Meusinger, Schalldach, Scholz, et al., 2012, Astron. & Astrophys. 541, A77 http://dx.doi.org/10.1051/0004-6361/201118143
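The binning step can be illustrated with a short sketch that groups object indices by redshift bin of width 0.1, so that a separate SOM could then be computed per bin. The function name and the sample redshifts are hypothetical.

```python
import numpy as np

def redshift_bins(z, width=0.1):
    """Group object indices by redshift bin.

    Bin k nominally covers [k * width, (k + 1) * width). Values that lie
    exactly on a bin edge may fall on either side because of
    floating-point rounding.
    """
    z = np.asarray(z, dtype=float)
    bin_index = np.floor(z / width).astype(int)
    return {int(k): np.flatnonzero(bin_index == k) for k in np.unique(bin_index)}

z = np.array([0.52, 0.57, 1.23, 2.05, 1.28])   # made-up quasar redshifts
bins = redshift_bins(z)
```

Here `bins[12]` holds the indices of the two objects with redshifts between 1.2 and 1.3.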

Application: Search for Unusual Quasars
Composite spectra as compared to the composite spectrum of ordinary quasars:
a) Unusually red spectra
b) Unusual broad absorption lines
c) Weak emission lines

Current Work
What is being worked on:
- Preparation for SDSS DR12 (> 2 million spectra)
- Adaption of ASPECT to FPGAs (field-programmable gate arrays)
- Adaption of ASPECT to GPUs (crunching the numbers with graphics cards)
- Enriching complementary interfaces and other software tools for better navigation on huge Kohonen maps (> 1 million objects)
- Looking for other types of rare or difficult-to-identify objects, such as post-starburst galaxies or carbon stars, in order to find limiting constraints and model fitting parameters
- Using ASPECT for other types of data, such as light curves

Summary
- ASPECT is able to cluster voluminous samples of spectra by means of SOMs.
- We created a topological map of more than 600 000 spectra from the SDSS DR4 (In der Au et al. 2012) to illustrate the capability of ASPECT.
- A larger SOM for ~ 1 million spectra from the SDSS DR7 was computed; the analysis is in progress.
- A large number of smaller SOMs were computed for special object types and were successfully used in the search for unusual quasars (Meusinger et al. 2012; Meusinger & Balafkan 2014).
- Improvements to ASPECT will make it possible to compute a SOM for the > 2 million spectra from the SDSS DR12 (end of 2014).

Thank you all for your time and patience. Fin.