Bayesian Estimation of log N log S


Bayesian Estimation of log N log S
Paul D. Baines, Department of Statistics, University of California, Davis
May 10th, 2013

Introduction: Project Goals
Develop a comprehensive method to infer (properties of) the distribution of source fluxes for a wide variety of source populations and, more generally, to infer luminosity functions for source populations. Collaborators: Irina Udaltsova (UC Davis), Andreas Zezas (University of Crete & CfA), Vinay Kashyap (CfA).

The Rationale for log N log S Fitting
Let N(>S) denote the number of sources detectable at sensitivity S; that is, N(>S) is the empirical survival function of the flux distribution. In simple settings we expect

    log10(N(>S)) = β_0 + β_1 log10(S).

Cosmology complicates the anticipated linearity somewhat, but in many cases the relationship is approximately linear. Primary Goal: estimate β_1, the power-law slope, while properly accounting for detector uncertainties and biases. Note: there is uncertainty on both axes (i.e., in both N and S).

Inferential Process
Inferring the log N log S relationship involves a few steps:
1. Collect raw image data.
2. Run a detection algorithm to extract sources from the image.
3. Produce a dataset describing the photon counts of all sources (with uncertainties, background, etc.).
4. Infer physical properties of the source population (e.g., the log N log S distribution) from this dataset.
Our analysis focuses on the final step, accounting for some (but not all) of the detector-induced uncertainties. Adding further layers to the analysis so as to start from the raw images is possible, but that is for a later time.

Probabilistic Connection
Standard log N log S approaches make it difficult to coherently incorporate detector effects and uncertainties. Probabilistic connection: under independent sampling, linearity on the log N log S scale is equivalent to the flux distribution being a Pareto distribution (this follows from the log-linearity of the Pareto survival function). This probabilistic representation of the flux distribution allows for more rigorous analysis by embedding it within a hierarchical model.

Beyond the Pareto
With a Pareto flux distribution we obtain a linear relationship on the log N log S scale. In general, with complete-data flux distribution G, we have

    S_i ~iid G,    H(log10(s)) := log10(1 − F_G(s)).    (1)

The function H is linear if and only if G is a Pareto distribution. Our framework allows flexible specification of the (parametrized) flux distribution. Since (piecewise) linearity has both theoretical and empirical support, a commonly used generalization is the broken power-law

    log10(1 − F_G(s)) = { α_0 − θ_0 log10(s),  s ≤ K
                          α_1 − θ_1 log10(s),  s > K,    (2)

subject to a continuity constraint at K.
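The Pareto/log-linear equivalence is easy to check numerically. A minimal sketch (all parameter values here are hypothetical, chosen only for illustration): draw fluxes from a Pareto via inverse-CDF sampling, then fit the slope of the empirical log10 N(>S) curve, which should come out near −θ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters for illustration (not from the slides).
s_min, theta, n = 1.0, 0.7, 200_000

# Pareto(theta, s_min): survival P(S > s) = (s / s_min)^(-theta) for s >= s_min,
# so inverse-CDF sampling gives s_min * U^(-1/theta) with U ~ Uniform(0, 1).
fluxes = s_min * (1.0 - rng.random(n)) ** (-1.0 / theta)

# Empirical log10 N(>S) on a grid of thresholds: linear in log10(S)
# with slope -theta when H in (1) is linear.
grid = np.logspace(0.1, 1.5, 20)
log_counts = np.log10([np.sum(fluxes > s) for s in grid])
slope = np.polyfit(np.log10(grid), log_counts, 1)[0]
print(round(slope, 2))  # close to -0.7
```

The fitted slope recovers −θ up to Monte Carlo noise, which is the empirical face of the if-and-only-if statement above.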

Broken Power-Law Modeling
The broken power-law in (2) can be represented as

    Y ~ [1 − (K/S_min)^(−θ_0)] X_0 + (K/S_min)^(−θ_0) X_1,

where X_0 ~ Truncated-Pareto(S_min, θ_0, K) and X_1 ~ Pareto(K, θ_1). The result is also an if-and-only-if result; i.e., any distribution whose log N log S relationship is a broken power law with M breakpoints can be represented as a mixture of M truncated Pareto distributions and one final (untruncated) Pareto distribution.
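The mixture representation suggests a direct sampler. A sketch with hypothetical values (s_min, K, θ_0, θ_1 are illustrative, not from the slides): draw the component label with weight (K/S_min)^(−θ_0) on the upper piece, then invert the truncated-Pareto or Pareto CDF; the fraction of draws above K should match the analytic break height.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical values for illustration.
s_min, K, th0, th1, n = 1.0, 10.0, 0.5, 1.2, 500_000

w = (K / s_min) ** (-th0)           # mixture weight on the upper Pareto piece
u = rng.random(n)
upper = rng.random(n) < w           # component label, independent of u

# Truncated-Pareto(s_min, th0, K) via inverse CDF on [s_min, K],
# and Pareto(K, th1) for the piece above the break.
fk = 1.0 - w                        # F(K) under the untruncated lower Pareto
lower_draw = s_min * (1.0 - u * fk) ** (-1.0 / th0)
upper_draw = K * (1.0 - u) ** (-1.0 / th1)
s = np.where(upper, upper_draw, lower_draw)

# Survival at the break should match the analytic height (K/s_min)^(-th0).
surv_at_K = np.mean(s > K)
print(round(surv_at_K, 3), round(w, 3))
```

The same construction extends to M breakpoints by stacking M truncated pieces plus one final Pareto, as stated above.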

Physically Motivated Fitting
The insight from the probabilistic setting reveals that the broken power-law model has a number of unphysical properties (to be expected). Notably, it requires an initial source population to have a sharp cut-off, before yielding to a secondary source population present only above the threshold. More physically realistic descriptions are also more natural statistically, e.g.,

    Y ~ Σ_{j=1}^{m} p_j X_j,    where X_j ~ Pareto(S_min, θ_j).

Note: the resulting log N log S plot will be curved!
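The promised curvature can be seen in simulation. A sketch with a hypothetical two-component mixture (weights and indices chosen only for illustration): near S_min the local log-log slope is close to the weighted average −Σ p_j θ_j, while far in the tail the shallowest component dominates and the slope approaches −min θ_j.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical two-component mixture of Paretos sharing S_min.
s_min, p, thetas, n = 1.0, np.array([0.5, 0.5]), np.array([0.6, 2.0]), 400_000

comp = rng.choice(2, size=n, p=p)
s = s_min * (1.0 - rng.random(n)) ** (-1.0 / thetas[comp])

def local_slope(lo, hi):
    # Slope of the empirical log10 N(>S) curve between two thresholds.
    return (np.log10(np.sum(s > hi)) - np.log10(np.sum(s > lo))) / (
        np.log10(hi) - np.log10(lo))

# The curve flattens: steep near s_min, slope -> -min(theta) far out.
print(round(local_slope(1.0, 2.0), 2), round(local_slope(20.0, 40.0), 2))
```

So a mixture of Paretos never produces the sharp kink of the broken power-law; it bends smoothly between the two limiting slopes.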

Observational Challenges
The previous discussion centered on the flux distribution. In practice:
- We only observe photon counts from each source, with intensity proportional to its flux.
- There is background contamination for all sources.
- Sensitivity varies across the detector.
- Some sources will not be observed, due to detector limitations.
- We do not know how many sources there actually are.
- Some sources extracted from the image may not actually be sources.
In this context, whether a source is observed is a function of its source count (intensity), which is unobserved for unobserved sources. This missing-data mechanism is non-ignorable and needs to be carefully accounted for in the analysis.
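The non-ignorability has a concrete consequence: fitting only the detected sources biases the slope estimate. A sketch under stated assumptions (the logistic detection curve and all parameter values are hypothetical, standing in for the incompleteness function g): simulate Pareto fluxes, thin them with a flux-dependent detection probability, and compare the standard Pareto MLE on the full versus detected samples.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical flux-dependent detection probability (logistic in log-flux):
# brighter sources are more likely to be detected.
s_min, theta, n = 1.0, 1.0, 200_000
s = s_min * (1.0 - rng.random(n)) ** (-1.0 / theta)
g = 1.0 / (1.0 + np.exp(-4.0 * (np.log10(s) - 0.3)))
detected = rng.random(n) < g

def pareto_mle(x):
    # MLE of the Pareto index with known s_min.
    return len(x) / np.sum(np.log(x / s_min))

full, naive = pareto_mle(s), pareto_mle(s[detected])
print(round(full, 2), round(naive, 2))
```

The detected-only estimate comes out well below the truth because bright sources are over-represented, making the tail look heavier than it is; this is exactly the bias the hierarchical model's incompleteness term g is there to remove.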

The Model (single power-law)
Standard power-law flux distribution:

    S_i | S_min, θ ~iid Pareto(θ, S_min),    i = 1, ..., N.

Source and background photon counts:

    Y_i^tot | S_i, L_i, E_i ~ Pois(λ(S_i, L_i, E_i) + k(B_i, L_i, E_i)),    i = 1, ..., N.

Incompleteness (missing-data indicators):

    I_i ~ Bernoulli(g(S_i, B_i, L_i, E_i)).

Prior distributions:

    N ~ NegBinom(α, β),    p(B_i, L_i, E_i),    S_min ~ Gamma(a_s, b_s),    θ ~ Gamma(a_θ, b_θ).

Observed data: Y_obs = {(Y_i^tot, B_i, L_i, E_i) : i ∈ I, |I| = n};
latent variables / missing data: Y_mis = {(Y_i^tot, S_i, B_i, L_i, E_i) : i ∉ I} ∪ {S_i : i ∈ I};
parameters: Θ = {N, θ, S_min}.

The Model (broken power-law)
Broken power-law flux distribution (known break-points C):

    S_i | S_min, θ ~iid Broken-Pareto(θ, S_min; C),    i = 1, ..., N.

Source and background photon counts:

    Y_i^tot | S_i, L_i, E_i ~ Pois(λ(S_i, L_i, E_i) + k(B_i, L_i, E_i)),    i = 1, ..., N.

Incompleteness (missing-data indicators):

    I_i ~ Bernoulli(g(S_i, B_i, L_i, E_i)).

Prior distributions:

    N ~ NegBinom(α, β),    p(B_i, L_i, E_i),    h(C) ~ N(m, V),    θ_j ~ Gamma(a_j, b_j),    j = 1, ..., M.

Observed data: Y_obs = {(Y_i^tot, B_i, L_i, E_i) : i ∈ I, |I| = n};
latent variables / missing data: Y_mis = {(Y_i^tot, S_i, B_i, L_i, E_i) : i ∉ I} ∪ {S_i : i ∈ I};
parameters: Θ = {N, θ, C}.

Posterior Inference
Inference about θ, N, and S is based on the observed-data posterior distribution. Care must be taken with the variable-dimension marginalization over the unobserved fluxes. Computation is performed by Gibbs sampling, which makes heavy use of the marginal probability of observing a source:

    π(θ, S_min) = ∫ g(s, B, L, E) p(s | S_min, θ) p(B, L, E) dB dL dE ds.

For the broken power-law this is 2m-dimensional, where m is the number of pieces. It can be pre-computed, but this requires a sufficiently dense grid and careful interpolation in higher dimensions.
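When π(θ, S_min) has no closed form it can be estimated by Monte Carlo. A minimal sketch (the flux-only incompleteness function here is a hypothetical toy; the slides' g also depends on B, L, E, which would add the p(B, L, E) draw to the loop): average g over draws from the flux prior, and check against the analytic answer available for a hard detection threshold.

```python
import numpy as np

rng = np.random.default_rng(4)

def pi_obs(theta, s_min, g, m=400_000):
    # Monte Carlo estimate of the marginal detection probability
    # pi(theta, s_min) = E[g(S)] with S ~ Pareto(theta, s_min).
    # (A full version would also draw (B, L, E) from p(B, L, E).)
    s = s_min * (1.0 - rng.random(m)) ** (-1.0 / theta)
    return np.mean(g(s))

# With a hard threshold g(s) = 1{s > c}, the answer is analytic:
# pi = P(S > c) = (c / s_min)^(-theta).
c = 3.0
est = pi_obs(0.8, 1.0, lambda s: (s > c).astype(float))
exact = c ** (-0.8)
print(round(est, 3), round(exact, 3))
```

Tabulating such estimates over a grid of (θ, S_min) values is the pre-computation step mentioned above; the grid must be dense enough for the interpolation used inside the Gibbs sampler.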

Model/Computational Notes
Things to note:
- The dimension of the missing data is unknown (care must be taken with conditioning).
- The incompleteness function g can take any form and is problem-specific.
- p(B_i, L_i, E_i) needs some care and can be tabulated or parametrized.
- Prior parameters can be science-based or weakly informative.
- For single power-law models computation is fast, and insensitive to the number of missing sources.
- Computation for the broken power-law model is slower.
- Generalized mixtures of Paretos (or other forms) require only minor modifications of the general scheme.

Paper Organization
Paper 1 (single-Pareto modeling only):
- Handling of incompleteness
- Handling of other detector effects (background, exposure maps, source location, etc.)
- Incorporation of prior information
- Probabilistic/Bayesian modeling
- Model checking via posterior predictive checks
Paper 2 (broken- and mixture-Pareto extensions):
- Broken-Pareto modeling for log N log S
- Mixture-Pareto modeling for log N log S
- Model selection and model checking

Validation Details
Parameter specifications as follows:
- N ~ NegBinom(α, β), where α = 200 (shape), β = 2 (scale)
- θ ~ Gamma(a, b), where a = 20 (shape), b = 1/20 (scale)
- S_i | θ ~ Pareto(θ, S_min), where S_min = 10^(−13)
- Y_i^src | S_i, L_i, E_i ~ Pois(λ(S_i, L_i, E_i))
- Y_i^bkg | S_i, L_i, E_i ~ Pois(k(L_i, E_i))
- λ = S_i E_i / γ, where the effective area E_i ∈ (1000, 100000) and the energy per photon γ = 1.6 × 10^(−9)
- k_i = z E_i, where z = 0.0005 is the background photon-count intensity per million seconds
- n_iter = 250,000, burn-in = 50,000
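The specifications above can be turned into a one-dataset simulator. A sketch under stated assumptions: the transcript's source intensity is read as λ = S·E/γ (flux times effective exposure, divided by energy per photon), and the NegBinom(shape, scale) prior is mapped to NumPy's (n, p) parametrization via p = 1/(1 + β); incompleteness and the L_i covariate are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(5)

# Validation settings from the slides (assumption: lambda = S * E / gamma).
alpha, beta = 200, 2                 # NegBinom shape / scale
s_min, gamma, z = 1e-13, 1.6e-9, 0.0005

N = rng.negative_binomial(alpha, 1.0 / (1.0 + beta))  # mean alpha*beta = 400
theta = rng.gamma(20, 1.0 / 20)                       # Gamma(20, scale 1/20)
S = s_min * (1.0 - rng.random(N)) ** (-1.0 / theta)   # Pareto fluxes
E = rng.uniform(1_000, 100_000, size=N)               # effective areas

lam = S * E / gamma                  # source photon intensity
k = z * E                            # background photon intensity
y_src = rng.poisson(lam)
y_bkg = rng.poisson(k)
y_tot = y_src + y_bkg
print(N, y_tot[:3])
```

Repeating this draw 1000 times and running the sampler on each replicate is what produces the actual-vs-nominal coverage plots that follow.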

Figure: Actual vs. nominal coverage (1000 datasets) for N, θ, and S_obs, with actual coverage plotted against nominal (0-100%) alongside the target line.

Figure: Actual vs. nominal coverage (196 datasets) for N, θ, S_min, and S_obs, with actual coverage plotted against nominal (0-100%) alongside the target line.

Sensitivity Studies
There are a lot of cogs in the full model, so we have performed various sensitivity studies to investigate the impact of different priors, misspecification of the incompleteness function, etc.

Sensitivity: θ Prior
Figure: (L) A weak prior and corresponding posterior for θ, (C) a moderately informative prior for θ, and (R) a strongly informative but incorrect prior for θ.

Sensitivity: N Prior Figure: Prior and posterior distributions for N and θ. Clockwise from upper-left (top row) these represent a weak prior for N, a moderately informative prior for N and a strongly informative but incorrect prior for N. The bottom row shows the corresponding prior and posterior distributions for θ.

Fixed vs. Estimated S_min
Figure: The grey regions give the posterior 95% credible intervals for θ in the fixed-S_min case, with the θ estimate as the center line. The crossed intervals show the posterior dispersion in both θ and S_min for varying priors on S_min.

Fixed S_min Results
Figure: Sensitivity of the estimate of θ to the choice of fixed S_min. Under the fixed-S_min scenarios the plots show: (top-left) bias of θ, (top-right) standard deviation of θ, (mid-left) posterior regions and 95% credible intervals for θ, (mid-right) the U-shaped MSE of θ.

Sensitivity to Misspecification of g Figure: Top row: Four different incompleteness functions, Rows 2-3: Corresponding prior and posterior distributions of N and θ. The second column corresponds to the correct incompleteness function.

Posterior Predictive Diagnostics
Figure: Bivariate posterior predictive scatterplots for the conditional model: (left) fitted S_min equal to the truth, (right) fitted S_min larger than the truth.

Application: CDF-N
- Chandra Deep Field North X-ray sources
- Subset of 225 sources within 8 arcmin
- Incompleteness function and priors in tabulated form
- Priors used are weakly informative
- S_min estimated from the data

Figure: The log N log S plot for the CDF-N data. Each line in the plot corresponds to a sample of fluxes for the complete source population from a single iteration of the MCMC scheme, with observed sources shown in grey and missing sources in red.

Figure: Posterior predictive plots for the single Pareto model fit to the CDF-N dataset.

CDF-N Results
The posterior predictive checks are passable but show a few issues (p-values from 0.03 to 0.41 for various features of the model fit). There are indications of a possible breakpoint (the Broken-Pareto model gives a better fit). Estimates: θ̂ = 0.68 (0.59, 0.78), consistent with other analyses; N̂ = 274 (256, 299), suggesting completeness of 75%-87.5%.

Paper One: Status & Future Work
- Draft available, almost complete (switching data analysis to CDF-S from CDF-N)
Paper Two:
- Various simulations completed, more to do...
- Selection of the number of pieces in the multiple power-law setting not investigated yet (can use Wong et al. (2013) as a guide)
- Estimation of normalizing constants is tricky; other ideas?
- Real data: CDF-N, SMC
Future work:
- False sources (allowing that observed sources might actually be background/artifacts)
- Field contamination (allowing a mixture with a source population with known parameters)
- Extension to non-Poisson regimes

References
Wong, R.K.W., Baines, P.D., Aue, A., Lee, T.C.M., and Kashyap, V.K. (2013). Automatic Estimation of Flux Distributions of Astrophysical Source Populations. http://arxiv.org/abs/1305.0979
Baines, P.D., Udaltsova, I.S., Zezas, A., and Kashyap, V.L. (2011). Bayesian Estimation of log N log S. Proc. of Statistical Challenges in Modern Astronomy V.
Zezas, A., et al. (2004). Chandra survey of the Bar region of the SMC. Revista Mexicana de Astronomía y Astrofísica (Serie de Conferencias), Vol. 20, IAU Colloquium 194, pp. 205-205.