A SPEctra Clustering Tool for the exploration of large spectroscopic surveys Philipp Schalldach (HU Berlin & TLS Tautenburg, Germany)
Working Group Helmut Meusinger (Tautenburg, Germany) Philipp Schalldach (Berlin & Tautenburg, Germany) Aick in der Au (Munich, Germany) Mike Newholm (London, UK) Jesko Schwarzer (Bonn, Germany) Frank Pertermann (Tautenburg, Germany) Jörg Brünecke (Leipzig, Germany)
Content
- Scientific aim of ASPECT
- General procedure
- Data pools
- SOM algorithm for clustering SDSS spectra
- Methods for the analysis of SOMs
- Application: Search for unusual objects
- Next steps
- Summary
Aim
Astronomy is a data-intensive science, with accessible data growing by ~0.5 PB per year (Berriman & Groom 2011). ASPECT is a tool chain for the analysis of large data pools. Its main module uses an artificial neural network algorithm called Kohonen mapping, which projects high-dimensional input data onto a 2-dimensional grid (self-organising map, SOM). The resulting SOM allows the user to browse efficiently through huge data collections and to select certain types of objects (e.g. rare ones). ASPECT can compute SOMs for the spectra of up to one million objects (and more in the near future).
General Procedure
A neural network layer is adapted to the actual spectral data by means of a learning rate and a neighbourhood function. The result of this step is a table-like grid in which every cell (neuron) represents one object. Main feature: similar objects are located close to each other on the SOM. The resulting SOM can be analysed by eye or by blending in additional internal or external data. In der Au, Meusinger, Schalldach, et al., 2012, Astron. & Astrophys. 547, A115 http://dx.doi.org/10.1051/0004-6361/201219958
General Procedure
Example: SOM for sine functions of different periodicity
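The sine-function example works because a SOM only needs a distance between inputs. The sketch below is not the data used on the slide: it assumes 32 samples per curve and uses frequencies (inverse periods) 2, 2.2 and 5 chosen purely for illustration. It shows that the Euclidean distance, which the SOM uses to find the best-matching cell, is small for curves of similar periodicity and larger for dissimilar ones.

```python
import math

def sine_curve(freq, n_samples=32):
    """Sample sin(2*pi*freq*x) at n_samples equally spaced points x in [0, 1)."""
    return [math.sin(2 * math.pi * freq * k / n_samples) for k in range(n_samples)]

def euclidean(a, b):
    """Euclidean distance between two equally long sample vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy inputs: three sine curves with different frequencies.
curves = {f: sine_curve(f) for f in (2.0, 2.2, 5.0)}

print(euclidean(curves[2.0], curves[2.2]))  # smaller: similar periodicity
print(euclidean(curves[2.0], curves[5.0]))  # sqrt(32) ≈ 5.657: orthogonal curves
```

Fed into a SOM, curves like the first two would be expected to land in neighbouring cells, while the third would end up farther away.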
General Procedure
Example: evolution of a SOM of > 1 million spectra during training
Data Pools
Main observables (so far): spectra from the Sloan Digital Sky Survey (SDSS) data releases DR4 to DR10
- The SDSS is a photometric and spectroscopic survey
- conducted at Apache Point Observatory, New Mexico
- 2.5-meter telescope (120-megapixel camera, pair of spectrographs)
Other data have also been processed, either during development or within scientific evaluation, including but not limited to
- CoRoT light curves for exoplanet searches
- BATSE light curves for identifying GRBs
- different types of data from near-industrial research
So far, the best results have been achieved with SDSS spectroscopic data.
SDSS Spectra
SDSS spectra (as well as imaging data) are available from the SDSS Data Archive Server via several means, including search masks, SQL database interfaces and direct access to the folder structures, mainly as FITS files. ASPECT works with the raw spectral data points, which are extracted from the FITS files and folded by several factors to improve computational performance. Each spectrum, along with several parameters, is written into a single binary file that can be processed by the main component of the tool chain. The additional parameters (redshift, emission-line measurements, object type as determined by the SDSS pipeline) are not used in the computations, but in the subsequent analysis of the SOM. For identification, the SDSS convention of identifying a spectrum by its MJD, plate ID and fiber ID is retained.
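The preprocessing step can be sketched as follows. Two assumptions are made here: "folding by a factor" is read as averaging each group of that many adjacent flux points, and the binary record layout (little-endian MJD, plate ID, fiber ID, redshift, point count, then float32 fluxes) is hypothetical, not ASPECT's actual file format. The functions `fold`, `pack_record` and `unpack_record` are illustrative names.

```python
import struct

def fold(flux, factor):
    """Bin a flux array by averaging each group of `factor` consecutive points.
    Trailing points that do not fill a complete group are dropped."""
    n = len(flux) // factor
    return [sum(flux[i * factor:(i + 1) * factor]) / factor for i in range(n)]

# Hypothetical header: MJD, plate ID, fiber ID (int32), redshift (float64),
# number of flux points (int32); '<' means little-endian without padding.
HEADER = struct.Struct("<iiidi")

def pack_record(mjd, plate, fiber, z, flux):
    """Serialise one spectrum plus its identifying parameters."""
    header = HEADER.pack(mjd, plate, fiber, z, len(flux))
    return header + struct.pack(f"<{len(flux)}f", *flux)

def unpack_record(blob):
    """Inverse of pack_record: returns ((mjd, plate, fiber), z, flux)."""
    mjd, plate, fiber, z, n = HEADER.unpack_from(blob, 0)
    flux = list(struct.unpack_from(f"<{n}f", blob, HEADER.size))
    return (mjd, plate, fiber), z, flux

# Illustrative values only.
folded = fold([1, 2, 3, 4, 5, 6, 7], 2)          # [1.5, 3.5, 5.5]
record = pack_record(51630, 266, 3, 0.1, folded)
print(unpack_record(record))
```

Keeping the (MJD, plate, fiber) triple in every record is what later allows each SOM cell to be traced back to its SDSS spectrum.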
SDSS Spectra Credits: Sloan Digital Sky Survey
SOMs
SOMs produce a similarity graph (usually in 2D) from high-dimensional input data. Key ingredients for the emergence of topographically organised maps: (1) broadcasting of the input to an n-dimensional map, (2) selection of a winner, (3) adaptation of the n-dimensional map in the spatial neighbourhood of the winner.
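The three ingredients map directly onto one textbook Kohonen update. This is a minimal sketch, not ASPECT's code: it assumes a rectangular grid stored as a dict from (row, col) to a weight vector, squared Euclidean distance for the winner search, and the common Gaussian neighbourhood of the squared grid distance (ASPECT's own neighbourhood function is a different variant).

```python
import math

def kohonen_step(weights, x, eta, sigma):
    """One standard Kohonen update on a rectangular grid.

    weights: dict mapping (row, col) -> list of floats (one weight vector per cell)
    x:       input vector of the same length as the weight vectors
    eta:     learning rate; sigma: neighbourhood radius
    """
    # (1) Broadcast the input to every cell: squared Euclidean distance per cell.
    dist = {cell: sum((xi - wi) ** 2 for xi, wi in zip(x, w))
            for cell, w in weights.items()}
    # (2) Select the winner: the cell with the smallest distance.
    winner = min(dist, key=dist.get)
    # (3) Adapt the winner and its spatial neighbours (textbook Gaussian weight).
    for (r, c), w in weights.items():
        d2 = (r - winner[0]) ** 2 + (c - winner[1]) ** 2
        h = math.exp(-d2 / (2 * sigma ** 2))
        for i in range(len(w)):
            w[i] += eta * h * (x[i] - w[i])
    return winner

grid = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0]}
print(kohonen_step(grid, [0.9, 0.9], eta=0.5, sigma=1.0))  # (0, 1)
```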
SOM Algorithm for Clustering Spectra
- Initialise the network, for instance with input spectra taken in randomised order
- For each learning step, present the training data (in our case all input spectra to be clustered) in randomised order
- For each input spectrum:
  - find the best-matching spectrum in the network, i.e. the one with the smallest error (Euclidean distance between spectra)
  - on a collision (more than one input spectrum hits the same network cell), keep the input spectrum with the smaller error and store the other one in a collision list
  - adapt the best-matching spectrum and its neighbouring cells
- Repeat these steps for all spectra in the collision list, searching only among unoccupied cells
- Change the learning rate and the adaptation radius for the next learning step
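One learning step with collision handling, as read from the list above, can be sketched like this. It is an interpretation, not ASPECT's implementation: the error is taken as the squared Euclidean distance, the network is adapted on every presentation (including for spectra that later lose a collision), the neighbourhood is the slide's exp(-sqrt(dx²+dy²)/(2σ²)) read literally, and the grid must have at least as many cells as there are spectra. All function names are hypothetical.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two spectra on the same grid."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def adapt(weights, winner, x, eta, sigma):
    """Pull the winner and its neighbours towards x (slide's neighbourhood, literal)."""
    for (r, c), w in weights.items():
        d = math.sqrt((r - winner[0]) ** 2 + (c - winner[1]) ** 2)
        h = math.exp(-d / (2 * sigma ** 2))
        for i in range(len(w)):
            w[i] += eta * h * (x[i] - w[i])

def learning_step(weights, spectra, eta, sigma, rng):
    """One learning step after which every cell holds at most one spectrum.

    weights: dict (row, col) -> weight vector; spectra: list of vectors.
    Returns a dict mapping each occupied cell to the index of its spectrum.
    """
    if len(spectra) > len(weights):
        raise ValueError("need at least as many cells as spectra")
    assigned = {}                        # cells fixed in earlier passes
    pending = list(range(len(spectra)))
    rng.shuffle(pending)                 # randomised presentation order
    while pending:
        current = {}                     # cell -> (error, index) in this pass
        collisions = []
        for idx in pending:
            x = spectra[idx]
            free = [c for c in weights if c not in assigned]
            best = min(free, key=lambda c: sq_dist(x, weights[c]))
            err = sq_dist(x, weights[best])
            if best in current:          # collision: keep the smaller error
                other_err, other_idx = current[best]
                if err < other_err:
                    current[best] = (err, idx)
                    collisions.append(other_idx)
                else:
                    collisions.append(idx)
            else:
                current[best] = (err, idx)
            adapt(weights, best, x, eta, sigma)
        assigned.update({cell: i for cell, (_, i) in current.items()})
        pending = collisions             # re-match losers among free cells only
    return assigned
```

Each pass places at least one spectrum, so the loop terminates; the learning rate and radius would then be changed before the next call.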
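SOM Algorithm for Clustering Spectra
Neighbourhood function actually used:
$$h(dx, dy) = \exp\left(-\frac{\sqrt{dx^2 + dy^2}}{2\sigma^2}\right)$$
where dx and dy are the grid offsets from the winning cell and σ is the adaptation radius.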
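A direct transcription of the neighbourhood function is given below, reading "/2sigma²" as division by 2σ² (an assumption about the slide's notation). For comparison it also includes the common Gaussian form, which uses the squared distance; the function names are hypothetical.

```python
import math

def neighbourhood(dx, dy, sigma):
    """Slide's neighbourhood weight: exp(-sqrt(dx^2 + dy^2) / (2 * sigma^2))."""
    return math.exp(-math.sqrt(dx * dx + dy * dy) / (2 * sigma ** 2))

def gaussian_neighbourhood(dx, dy, sigma):
    """Textbook Gaussian for comparison: exp(-(dx^2 + dy^2) / (2 * sigma^2))."""
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma ** 2))

print(neighbourhood(3, 4, 1.0))           # exp(-2.5), about 0.08
print(gaussian_neighbourhood(3, 4, 1.0))  # exp(-12.5), about 4e-6
```

Because the exponent grows only linearly with distance, the slide's variant has much heavier tails than the Gaussian, so distant cells are still pulled noticeably towards each input.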
SOM Algorithm for Clustering Spectra
The learning rate (η) and the radius (σ) change over the training phase (t = 0..1); together they control the network adaptation.
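The slide does not state the exact schedules. A common choice, assumed here, is to shrink both quantities exponentially from a start value at t = 0 to an end value at t = 1 and to adapt each cell's weight vector w with the standard Kohonen rule $w \leftarrow w + \eta(t)\,h\,(x - w)$. The start and end values in the sketch are hypothetical.

```python
def schedule(t, start, end):
    """Exponential interpolation from `start` (t = 0) to `end` (t = 1)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return start * (end / start) ** t

# Hypothetical start/end values, not taken from ASPECT.
ETA_START, ETA_END = 0.25, 0.01
SIGMA_START, SIGMA_END = 10.0, 0.5

n_steps = 5
for step in range(n_steps):
    t = step / (n_steps - 1)             # normalised training time
    eta = schedule(t, ETA_START, ETA_END)
    sigma = schedule(t, SIGMA_START, SIGMA_END)
    print(f"t={t:.2f}  eta={eta:.4f}  sigma={sigma:.3f}")
```

A large early radius lets the map organise globally; the shrinking radius and learning rate then refine local detail.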
Spectral Icon Map
Icon map of SDSS spectra of low-redshift galaxies. Each pixel of the SOM corresponds to one object and can be represented by an iconised version of its spectrum. Inspecting the icon map is an efficient way to
- get an overview of the variety of spectral types
- select objects of a given type
- search for rare, unusual objects
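How a spectrum might be reduced to an icon is sketched below. The text-character rendering, the icon size and the function name `iconise` are illustrative assumptions; ASPECT renders image icons. The idea is the same: average the flux into a few wavelength bins and scale them to the icon height.

```python
def iconise(flux, width=8, height=4):
    """Render a spectrum as a tiny width x height character icon (hypothetical format)."""
    # Average the flux into `width` roughly equal wavelength bins.
    n = len(flux)
    bins = []
    for i in range(width):
        lo, hi = i * n // width, (i + 1) * n // width
        chunk = flux[lo:hi] or [flux[min(lo, n - 1)]]
        bins.append(sum(chunk) / len(chunk))
    # Scale the binned flux to integer levels 0 .. height-1.
    fmin, fmax = min(bins), max(bins)
    span = (fmax - fmin) or 1.0
    levels = [round((b - fmin) / span * (height - 1)) for b in bins]
    # Draw from the top row down: '#' where the binned flux reaches that level.
    return ["".join("#" if lv >= row else " " for lv in levels)
            for row in range(height - 1, -1, -1)]

for line in iconise([0, 0, 1, 1, 2, 2, 3, 3], width=4, height=4):
    print(line)
```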
Blending in Parameters
Aside from the icon map, other parameters such as redshift, photometric data, line indices etc. can be plotted into the grid via color coding. This provides a general overview of the data pool in terms of its parameter space.
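Blending in a parameter amounts to looking up each cell's object and mapping its value to a color class. In the sketch below the cell-to-object assignment, the bin edges and the color labels are hypothetical inputs, and cells without an object stay empty.

```python
import bisect

def parameter_map(assignment, values, n_rows, n_cols, edges, labels):
    """Color-code one parameter per SOM cell.

    assignment: dict (row, col) -> object id
    values:     dict object id -> parameter value (e.g. redshift)
    edges:      ascending bin edges; labels has len(edges) + 1 entries
    Cells without an object, or whose object has no value, are None.
    """
    grid = [[None] * n_cols for _ in range(n_rows)]
    for (r, c), obj in assignment.items():
        if obj in values:
            grid[r][c] = labels[bisect.bisect_right(edges, values[obj])]
    return grid

cells = {(0, 0): "a", (0, 1): "b", (1, 0): "c"}
redshifts = {"a": 0.05, "b": 0.25, "c": 0.6}
print(parameter_map(cells, redshifts, 2, 2, [0.1, 0.3], ["blue", "green", "red"]))
```

Because similar spectra sit close together, such a map typically shows coherent patches rather than noise.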
Using Interfaces
The cells of the grid can also be linked to other data pools or to other means of computation. The key feature, that similar spectra are located close to each other within the SOM, is preserved. Panels: icon map, picture map.
Using Interfaces
To obtain more detailed information on the objects in the SOM, an HTML interface links to the SDSS database and thereby also to other data archives (SIMBAD, NED) as well as to cross-references to publications (ADS).
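A linked HTML view of the grid can be generated from the SDSS identifiers kept with each spectrum. The sketch assumes a placeholder `BASE_URL` and query parameter names that are NOT the real SDSS URL scheme; only the idea of building one link per cell from the (MJD, plate, fiber) triple comes from the slides.

```python
from html import escape
from urllib.parse import urlencode

# Placeholder link target (hypothetical), not the real SDSS archive URL.
BASE_URL = "https://archive.example.org/spectrum"

def cell_link(mjd, plate, fiber):
    """HTML anchor for one SOM cell, identified by its (MJD, plate, fiber) triple."""
    query = urlencode({"mjd": mjd, "plate": plate, "fiber": fiber})
    label = f"{mjd}-{plate:04d}-{fiber:03d}"
    return f'<a href="{escape(BASE_URL + "?" + query)}">{escape(label)}</a>'

def som_table(assignment, n_rows, n_cols):
    """Render the SOM grid as an HTML table; empty cells stay blank."""
    rows = []
    for r in range(n_rows):
        cells = []
        for c in range(n_cols):
            obj = assignment.get((r, c))
            cells.append(f"<td>{cell_link(*obj) if obj else ''}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"

print(som_table({(0, 0): (51630, 266, 3)}, 1, 2))
```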
Application: Search for Unusual Quasars
How can rare, unusual objects be found in a SOM? Owing to the intrinsic properties of the Kohonen mapping algorithm, ASPECT tends to cluster unusual spectra at one or several spots of the map, depending on the overall composition of the data pool. These areas can easily be identified by visual inspection, parameter maps, or tracking of previously defined objects.
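Tracking previously defined objects can be reduced to a neighbourhood query: locate a known unusual object on the map and list the objects mapped close to it as candidates for inspection. The sketch assumes a dict from cell to object id and uses the Chebyshev (square) distance on the grid; both choices and the function name are illustrative.

```python
def neighbours_of(assignment, target, radius):
    """Objects mapped within `radius` cells (Chebyshev distance) of a known object."""
    cell_of = {obj: cell for cell, obj in assignment.items()}
    if target not in cell_of:
        raise KeyError(f"{target!r} is not on the map")
    r0, c0 = cell_of[target]
    return sorted(obj for (r, c), obj in assignment.items()
                  if obj != target and max(abs(r - r0), abs(c - c0)) <= radius)

som = {(0, 0): "A", (0, 1): "B", (1, 1): "D", (2, 2): "C"}
print(neighbours_of(som, "A", 1))  # ['B', 'D']
```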
Application: Search for Unusual Quasars
Quasar = QSO (quasi-stellar object): an actively accreting supermassive black hole in the centre of a galaxy. Nearby matter (gas clouds, stars, planets, ...) within distances of the order of light years is accreted by the black hole. Angular momentum leads to an accretion disc with a luminosity higher than that of billions of stars. Quasars are very far away (high redshifts) but still visible because of their high luminosity. Schematic view of a quasar, Credits: NASA / CXC / M.Weiss
Application: Search for Unusual Quasars
100 000 quasars were binned into redshift bins of width 0.1, and about 80 Kohonen maps were computed with ASPECT. Visual analysis yielded ~1000 unusual quasars of different types:
- strong unusual absorption lines
- weak emission lines
- strong iron emission
- very red spectra
- miscellaneous
as well as several extremely peculiar spectra that probably represent special evolutionary stages (e.g. very young quasars).
Meusinger, Schalldach, Scholz, et al., 2012, Astron. & Astrophys. 541, A77 http://dx.doi.org/10.1051/0004-6361/201118143
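The redshift binning step can be sketched as grouping (id, redshift) pairs into half-open bins [k·0.1, (k+1)·0.1). The object ids are hypothetical. The small tolerance is an assumption added to guard against floating-point results such as 0.3 / 0.1 = 2.9999999999999996, which would otherwise put z = 0.3 into the wrong bin; its side effect is that values within about 1e-10 below an edge are also moved up.

```python
import math
from collections import defaultdict

def redshift_bins(objects, width=0.1, tol=1e-9):
    """Group (object id, redshift) pairs into bins [k*width, (k+1)*width)."""
    bins = defaultdict(list)
    for obj, z in objects:
        k = math.floor(z / width + tol)   # tol absorbs floating-point round-off
        bins[k].append(obj)
    return dict(bins)

sample = [("q1", 0.05), ("q2", 0.15), ("q3", 0.3), ("q4", 0.31)]
for k, members in sorted(redshift_bins(sample).items()):
    print(f"{k * 0.1:.1f} <= z < {(k + 1) * 0.1:.1f}: {members}")
```

Each bin's spectra would then be clustered separately, so that objects are compared over a similar rest-frame wavelength range.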
Application: Search for Unusual Quasars
Composite spectra compared with the composite spectrum of ordinary quasars: a) unusually red spectra, b) unusual broad absorption lines, c) weak emission lines
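A composite spectrum can be formed by normalising each spectrum and taking the median flux at every wavelength. This is a minimal sketch under stated assumptions: all spectra are already shifted to the rest frame and resampled onto a common wavelength grid, normalisation uses the mean flux in a chosen window, and the median is used. The published composites may have been built differently (e.g. with another averaging scheme).

```python
from statistics import median

def composite(spectra, norm_slice):
    """Median composite of spectra sampled on a common rest-frame wavelength grid.
    Each spectrum is first divided by its mean flux inside `norm_slice`."""
    normalised = []
    for flux in spectra:
        window = flux[norm_slice]
        scale = sum(window) / len(window)
        normalised.append([f / scale for f in flux])
    # zip(*...) walks the wavelength grid column by column.
    return [median(column) for column in zip(*normalised)]

toy = [[1, 2, 3], [2, 4, 6], [1, 1, 1]]
print(composite(toy, slice(0, 1)))  # [1.0, 2.0, 3.0]
```

Dividing the composite of an unusual class by the composite of ordinary quasars would highlight, for example, a redder continuum or weaker emission lines.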
Current Work
What is being worked on:
- Preparation for SDSS DR12 (> 2 million spectra)
- Adaptation of ASPECT to FPGAs (field-programmable gate arrays)
- Adaptation of ASPECT to GPUs (number crunching on graphics cards)
- Extending complementary interfaces and other software tools for better navigation on huge Kohonen maps (> 1 million objects)
- Looking for other types of rare or hard-to-identify objects, such as post-starburst galaxies or carbon stars, in order to find limiting constraints and model-fitting parameters
- Using ASPECT for other types of data, such as light curves
Summary
ASPECT is able to cluster large samples of spectra by means of SOMs. We created a topological map of more than 600 000 spectra from the SDSS DR4 (In der Au et al. 2012) to illustrate the capability of ASPECT. A larger SOM of ~1 million spectra from the SDSS DR7 was computed; its analysis is in progress. A large number of smaller SOMs were computed for special object types and were successfully used in the search for unusual quasars (Meusinger et al. 2012; Meusinger & Balafkan 2014). Improvements to ASPECT will make it possible to compute a SOM of the > 2 million spectra from the SDSS DR12 (end of 2014).
Thank you all for your time and patience. Fin.