Three data analysis problems

Similar documents
The Redshift Evolution of Bulges and Disks of Spiral Galaxies in COSMOS

Quantifying Secular Evolution Through Structural Decomposition

Multi-wavelength Surveys for AGN & AGN Variability. Vicki Sarajedini University of Florida

Circumnuclear Gaseous Kinematics and Excitation of Four Local Radio Galaxies

Strong Lens Modeling (II): Statistical Methods

LeMMINGs the emerlin radio legacy survey of nearby galaxies Ranieri D. Baldi

Accounting for Calibration Uncertainty in Spectral Analysis. David A. van Dyk

Introduction. Chapter 1

Detecting the cold neutral gas in young radio galaxies. James Allison Triggering Mechanisms for AGN

Automated Classification of HETDEX Spectra. Ted von Hippel (U Texas, Siena, ERAU)

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Accounting for Calibration Uncertainty in Spectral Analysis. David A. van Dyk

PoS(extremesky2009)018

Chapter 10: Unresolved Stellar Populations

Bayesian Computation in Color-Magnitude Diagrams

GAMA-SIGMA: Exploring Galaxy Structure Through Modelling

arxiv: v1 [astro-ph.sr] 10 Jul 2015

Star Formation Indicators

Approximate Bayesian computation: an application to weak-lensing peak counts

Osservatorio Astronomico di Bologna, 27 Ottobre 2011

The star formation history of Virgo spiral galaxies

Galaxy morphology. some thoughts. Tasca Lidia Laboratoire d Astrophysique de Marseille

Review: Probabilistic Matrix Factorization. Probabilistic Matrix Factorization (PMF)

SUPPLEMENTARY INFORMATION

UNIVERSITY OF SOUTHAMPTON

Big Data Inference. Combining Hierarchical Bayes and Machine Learning to Improve Photometric Redshifts. Josh Speagle 1,

Surface Photometry Quantitative description of galaxy morphology. Hubble Sequence Qualitative description of galaxy morphology

Accounting for Calibration Uncertainty

ROSAT Roentgen Satellite. Chandra X-ray Observatory

PoS(ICRC2017)765. Towards a 3D analysis in Cherenkov γ-ray astronomy

Addendum: GLIMPSE Validation Report

Gas Masses and Gas Fractions: Applications of the Kennicutt- Schmidt Law at High Redshift

An Algorithm for Correcting CTE Loss in Spectrophotometry of Point Sources with the STIS CCD

Learning the hyper-parameters. Luca Martino

Computational statistics

Fitting Narrow Emission Lines in X-ray Spectra

Debate on the toroidal structures around hidden- vs non hidden-blr of AGNs

Revealing new optically-emitting extragalactic Supernova Remnants

Krista Lynne Smith M. Koss R.M. Mushotzky

A Unified Theory of Photometric Redshifts. Tamás Budavári The Johns Hopkins University

A New Method for Characterizing Unresolved Point Sources: applications to Fermi Gamma-Ray Data

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Introduction to Bayesian Statistics and Markov Chain Monte Carlo Estimation. EPSY 905: Multivariate Analysis Spring 2016 Lecture #10: April 6, 2016

Related Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM

Classifying Galaxy Morphology using Machine Learning

Part III: Circumstellar Properties of Intermediate-Age PMS Stars

Ionization, excitation, and abundance ratios in z~2-3 star-forming galaxies with KBSS-MOSFIRE

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

COSMOS-CLASH:The existence and universality of the FMR at z<2.5 and environmental effects

Luminosity dependent covering factor of the dust torus around AGN viewed with AKARI and WISE

X-ray observations of neutron stars and black holes in nearby galaxies

Markov Chain Monte Carlo

arxiv:astro-ph/ v2 18 Oct 2002

12. Physical Parameters from Stellar Spectra. Fundamental effective temperature calibrations Surface gravity indicators Chemical abundances

Bayesian Nonparametric Regression for Diabetes Deaths

Machine Learning. Probabilistic KNN.

Machine Learning, Fall 2009: Midterm

X-ray spectroscopy of nearby galactic nuclear regions

STA414/2104. Lecture 11: Gaussian Processes. Department of Statistics

Introduction to Bayesian methods in inverse problems

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

SEARCHING FOR NARROW EMISSION LINES IN X-RAY SPECTRA: COMPUTATION AND METHODS

Overlapping Astronomical Sources: Utilizing Spectral Information

GALAXIES. I. Morphologies and classification 2. Successes of Hubble scheme 3. Problems with Hubble scheme 4. Galaxies in other wavelengths

Subject CS1 Actuarial Statistics 1 Core Principles

Modern Image Processing Techniques in Astronomical Sky Surveys

Small-variance Asymptotics for Dirichlet Process Mixtures of SVMs

Gas and Stellar Dynamical Black Hole Mass Measurements:

New Results on the AGN Content of Galaxy Clusters

COMP90051 Statistical Machine Learning

Galaxy Morphology Refresher. Driver et al. 2006

C. Watson, E. Churchwell, R. Indebetouw, M. Meade, B. Babler, B. Whitney

Multi-wavelength analysis of Hickson Compact Groups of Galaxies.

Bayesian Deep Learning

Bayesian Modeling for Type Ia Supernova Data, Dust, and Distances

Age Dating A SSP. Quick quiz: please write down a 3 sentence explanation of why these plots look like they do.

Using the Fermi-LAT to Search for Indirect Signals from Dark Matter Annihilation

STA 4273H: Statistical Machine Learning

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA

Automatic Unsupervised Spectral Classification of all SDSS/DR7 Galaxies

Stochastic Modeling of Astronomical Time Series

Robust and accurate inference via a mixture of Gaussian and terrors

Multistage Anslysis on Solar Spectral Analyses with Uncertainties in Atomic Physical Models

Calibration Activities the MOS perspective

Radio Properties Of X-Ray Selected AGN

Inferring source polarisation properties from SKA observations. Joern Geisbuesch

Bayesian Estimation of log N log S

Experimental Design and Data Analysis for Biologists

Robust Bayesian Simple Linear Regression

Wrapped Gaussian processes: a short review and some new results

The AGN / host galaxy connection in nearby galaxies.

IRS Spectroscopy of z~2 Galaxies

LePhare Download Install Syntax Examples Acknowledgement Le Phare

arxiv: v1 [astro-ph.im] 20 Jan 2017

Pattern Recognition and Machine Learning

Machine Learning for Data Science (CS4786) Lecture 24

Tutorial on Approximate Bayesian Computation

Extended Chandra Multi-Wavelength Project (ChaMPx): Source Catalog and Applications

Prospects for Observations of Very Extended and Diffuse Sources with VERITAS. Josh Cardenzana

Introduction to AGN. General Characteristics History Components of AGN The AGN Zoo

Transcription:

Three data analysis problems Andreas Zezas University of Crete CfA

Two types of problems: Fitting Source Classification

Fitting: complex datasets

Fitting: complex datasets Maragoudakis et al. in prep.

Fitting: complex datasets Galaxy Spectral Engery Distribution (SED) 10 5 10 3 10 1 10 1 Fnu (mjy) 10 3 10 5 10 7 Best Model Optical Spectrum Photometric Flux 10 9 10 11 10 13 10 15 10 1 10 2 10 3 10 4 10 5 10 6 10 7 10 8 Wavelength (Å)

Fitting: complex datasets data and folded model Galaxy Spectral Engery Distribution (SED) 10 5 0.1 10 3 Iterative fitting may work, but it is inefficient and confidence intervals on parameters not reliable Fnu (mjy) 10 1 0.01 10 1 10 3 10 3 10 5 10 7 4 Best Model Optical Spectrum Photometric Flux How do we fit jointly the two datasets? 2 10 9 0 10 11 2 10 13 4 10 15 1 10 100 10 1 10 2 10 3 10 Energy 4 (kev) 10 5 10 6 10 7 10 8 Wavelength (Å) VERY common problem!

Problem 2 Model selection in 2D fits of images

A primer on galaxy morphology Three components: spheroidal R 1/ 4 I(R) = I e exp 7.67 1 Re exponential disk r I(R) = I0 exp rh and nuclear point source (PSF)

Use a generalized model R I(R) = I e exp k R e Fitting: The method n=4 : spheroidal n=1 : disk Add other (or alternative) models as needed Add blurring by PSF Do χ 2 fit (e.g. Peng et al., 2002) 1/ n 1

Fitting: The method Typical model tree I(R) = I e exp k R R e 1/ n 1 n=free n=4 n=1 n=4 n=4 n=4 PSF PSF PSF PSF PSF

Fitting: Discriminating between models Generally χ 2 works BUT: Combinations of different models may give similar χ 2 How to select the best model? Models not nested: cannot use standard methods Look at the residuals

Fitting: Discriminating between models

Fitting: Discriminating between models Excess variance 2 σ XS 2 = σ obj 2 σ sky Best fitting model among least χ 2 models the one that has the lowest exc. variance

Fitting: Examples Data Model Residuals Sérsic Model χ 2 ν σ 2 XS (1) (2) (3) Sérsic 1.107 1.722(0.120) Sérsic + psfagn 1.107 1.657(0.118) Sérsic + exdisk 1.107 1.770(0.121) Sérsic + psfagn + exdisk 1.106 1.472(0.113) Sérsic + psfagn Sérsic + exdisk Bonfini et al. in prep. Sérsic + psfagn + exdisk

Fitting: Problems Data Model Residuals However, method not ideal: It is not calibrated Sérsic Cannot give significance Fitting process computationally intensive Sérsic + psfagn Require an alternative, robust, fast, method Sérsic + exdisk Sérsic + psfagn + exdisk

Problem 3 Source Classification (a) Stars

Classifying stars Relative strength of lines discriminates between different types of stars Currently done by eye or by cross-correlation analysis

Classifying stars Would like to define a quantitative scheme based on strength of different lines.

Classifying stars Maravelias et al. in prep.

Classifying stars Not simple. Multi-parameter space Degeneracies in parts of the parameter space Sparse sampling Continuous distribution of parameters in training sample (cannot use clustering) Uncertainties and intrinsic variance in training sample

Problem 3 Source Classification (b) Galaxies

Classifying galaxies Ho et al. 1999

Classifying galaxies 1.5 1.0 (a) AGN LOG ([OIII]/Hβ) 0.5 0.0-0.5 HII LOG ([OIII]/Hβ) 1.5 1.0 0.5 0.0-0.5 (c) Seyfert LINER -1.0 Comp -1.0-2.0-1.5-1.0-0.5 0.0 0.5 LOG ([NII]/Hα) -2.0-1.5-1.0-0.5 0.0 LOG ([OI]/Hα) Seyfert HII LINER -1.0-0.5 0.0 0.5 LOG ([SII]/Hα) Kewley et al. 2001 Kewley et al. 2006

Classifying galaxies Basically an empirical scheme Multi-dimensional parameter space Sparse sampling - but now large training sample available Uncertainties and intrinsic variance in training sample LOG ([OIII]/Hβ) 1.5 1.0 0.5 0.0-0.5-1.0 (a) HII AGN Comp -2.0-1.5-1.0-0.5 0.0 0.5 LOG ([NII]/Hα) Use observations to define locus of different classes

Classifying galaxies Uncertainties in classification due to measurement errors uncertainties in diagnostic scheme Not always consistent results LOG ([OIII]/Hβ) 1.5 1.0 0.5 0.0-0.5 (a) HII AGN from different diagnostics -1.0 Comp Use ALL diagnostics together -2.0-1.5-1.0-0.5 0.0 0.5 LOG ([NII]/Hα) Maragoudakis et al in prep. Obtain classification with a confidence interval

PSfrag replacements Classification Problem similar to inverting 800 Hardness ratios 600 to spectral 400 200 0 parameters PSfrag replacements But more difficult We do not have well defined grid 1000 1.0 log(s/m) 0.5 Grid is not continuous Draws 0.0 0.0 0.5 1.0 1.5 log(m/h) log(m/h) 0.5 0.0 0.5 1.0 1.5 2.0 1.0 0.5 0.0 Table 1: Posterior Probabilities for the Grid of the Power-Law log(s/m) Model. The 95% posterior region is indicated in bold face. Figure 2: X-ray Colors Fitted by the Classical and Bayesian Methods. The top left panel shows the 0.250 0.500 0.125 0.250 0.075 0.125 0.050 0.075 0.025 0.050 0.010 0.025 point estimates of colors with marginal 1.75 2.00 error11.36% bars fitted 13.93% by the3.35% classical 1.00% method. 0.53% In the top 0.24% right panel, posterior draws of the X-ray colors 1.50 1.75 simulated 5.56% by the 13.70% Bayesian 5.99% method 2.34% are superimposed 1.70% 0.67% on the grids. The bottom panels show three-dimensional 1.25 1.50 1.80% graphical 7.76% summaries 5.61% for3.11% the posterior 2.82% draws. 1.56% Γ The 1.00 1.25 0.38% 2.71% 2.87% 2.26% 2.33% 1.58% large dot in the panels except the bottom left one represents the true values of X-ray colors. Γ N H 0.75 1.00 0.07% 0.54% 0.82% 0.75% 1.00% 0.81% 0.50 0.75 0.01% 0.09% 0.15% 0.18% 0.23% 0.17% N H Taeyoung Park s thesis C S and C H. The top right panel of Figure 2 shows the joint posterior draws of C S and C H resulting from the Gibbs sampler; a large dot in the diagram represents the true values of the X-ray colors. In the bottom row of Figure 2 presents the three-dimensional histogram of the draws to the left and the contour plot to the right. Because the Monte Carlo3 draws are superimposed on the grids of the power-law and thermal models in the color-color diagram, we can reversely infer the parameters of the models by computing posterior probabilities corresponding to each section split by the grids. Table 1 presents the normalized posterior probabilities of the X-ray colors in the grid of the power-law model. The 95% highest joint

Summary Model selection in multi-component 2D image fits Joint fits of datasets of different sizes Classification in multi-parameter space Definition of the locus of different source types based on sparse data with uncertainties Characterization of objects given uncertainties in classification scheme and measurement errors All are challenging problems related to very common data analysis tasks. Any volunteers?