The Challenges of Heteroscedastic Measurement Error in Astronomical Data

The Challenges of Heteroscedastic Measurement Error in Astronomical Data
Eric Feigelson, Penn State
Measurement Error Workshop, TAMU, April 2016

Outline
1. Measurement errors are ubiquitous in astronomy, but different from those in the social and biological sciences.
2. There is progress on measurement errors in bivariate linear regression & density estimation. [3 slides]
3. But there is little progress for correlation coefficients; multivariate regression, clustering & classification; time series analysis; spatial point processes; etc. [2 slides]
4. Measurement errors and left-censoring need to be treated in a unified fashion. [3 slides]
A Call to Arms: new methodology is urgently needed!!

Astronomers simultaneously measure observed quantities and their heteroscedastic error variances

[Figure panels: infrared image of stars with instrumental & astrophysical background (UKIDSS); visible spectrum of a distant galaxy (SDSS/BOSS); visible/UV images of a distant galaxy with nondetections (HST); X-ray image with Poisson background & sources (CXO)]

Astronomers are extremely careful & quantitative in their observations:
- Exposure time, detector noise, and calibration are known in advance.
- Scientifically interesting fluxes are compared to source-free background noise.

Science results from random studies:
- X-ray spectrum of a BL Lac object
- Radio observations of 3 T Tauri stars
- Structure of an interstellar cloud
- Gamma-ray search from a galaxy cluster
- Stellar atmospheres for planet search
- Metallicity of Galactic halo stars
- Dwarf galaxy structure & environment
- Gamma-ray structures near the Galactic Center

Heteroscedastic measurement errors (MEs) with "known" variances are ubiquitous in astronomical research, in both the observational data and astrophysical modeling.
- MEs can be present in all variables of a multivariate dataset.
- The same datasets can have nondetections (left-censored data points) when the signal does not exceed ~3x the ME.

Bibliometrics: ~9000 papers appear annually in the 4 principal journals (MNRAS, ApJ, AJ, A&A), of which ~70% have MEs. Full text is available from the ADS abstract service (adswww.harvard.edu). Electronic tables can be retrieved from the journal Web sites or from the Vizier archive service (vizier.u-strasbg.fr).

Statistical methods for datasets with heteroscedastic MEs as data inputs (not model outputs) are not well treated in standard books on ME methodology:
- Fuller, Measurement Error Models (2006)
- Carroll, Ruppert, Stefanski & Crainiceanu, Measurement Error in Nonlinear Models: A Modern Perspective (2006)
- Buonaccorsi, Measurement Error: Models, Methods, and Applications (2010)

Regression with heteroscedastic MEs

Astronomers frequently engage in regression to find best-fit parameters for either simple heuristic (e.g. linear, power law) or complicated nonlinear astrophysical models. For many decades the most common regression technique has been least squares weighted by the measurement errors, nicknamed "minimum chi-squared regression".

Is this procedure used in other fields?
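The equation shown on this slide did not survive the transcription. In its usual textbook form, the statistic being minimized is chi^2(theta) = sum_i [y_i - f(x_i; theta)]^2 / sigma_i^2, where sigma_i is the known measurement error on y_i. The sketch below illustrates this for a straight line, assuming errors in y only; the function name `min_chi2_line` and the toy data are hypothetical and not taken from the talk.

```python
import numpy as np

def min_chi2_line(x, y, sigma_y):
    """Fit y = a + b*x by minimizing chi^2 = sum(((y - a - b*x) / sigma_y)**2).

    Assumes errors in y only, with known per-point standard deviations sigma_y.
    """
    x, y, sigma_y = (np.asarray(v, dtype=float) for v in (x, y, sigma_y))
    # Dividing each row by sigma_i turns the weighted problem into ordinary least squares.
    design = np.column_stack([np.ones_like(x), x]) / sigma_y[:, None]
    target = y / sigma_y
    theta, *_ = np.linalg.lstsq(design, target, rcond=None)
    # Parameter covariance, valid only if sigma_y accounts for all of the scatter.
    cov = np.linalg.inv(design.T @ design)
    chi2 = float(np.sum((target - design @ theta) ** 2))
    return theta, cov, chi2

# Toy data lying exactly on y = 2 + 3x, with heteroscedastic error bars.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x
sigma_y = np.array([0.5, 1.0, 0.3, 2.0, 0.8])
theta, cov, chi2 = min_chi2_line(x, y, sigma_y)
print("intercept, slope:", theta, "chi2:", chi2)
```

The covariance returned here is only meaningful when the measurement errors explain the whole scatter about the line.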

Progress on linear regression

Problems with min-χ² regression arise when the total scatter is only partially due to MEs. This formulation is also weak at model selection, confidence intervals, etc.

Modified least squares procedures are widely used for the case of heteroscedastic MEs in both X and Y variables, and equation error in Y:
- FITEXY (Numerical Recipes, 1990s)
- BCES (Akritas & Bershady 1996)

These are similar to homoscedastic debiased slopes in errors-in-variables (EIV) regression.
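As an illustration of the debiasing idea, the sketch below computes the BCES(Y|X) point estimate: the ordinary least-squares moments are corrected by subtracting the summed measurement-error variances in X (and error covariances, if any). Only the slope and intercept are computed. The variance estimates of Akritas & Bershady, where the Y-error variances enter, are not reproduced. The function name `bces_y_on_x` and the simulated data are assumptions made for this example.

```python
import numpy as np

def bces_y_on_x(x, y, var_x, cov_xy=None):
    """BCES(Y|X) point estimate of (intercept, slope).

    var_x  : known measurement-error variance of each x_i
    cov_xy : known error covariance between x_i and y_i (defaults to zero)
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    var_x = np.asarray(var_x, dtype=float)
    cov_xy = np.zeros_like(x) if cov_xy is None else np.asarray(cov_xy, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    # Subtract the error contributions from the sample moments before taking the ratio.
    slope = (np.sum(dx * dy) - np.sum(cov_xy)) / (np.sum(dx**2) - np.sum(var_x))
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

# Simulated example (hypothetical): true line y = 1 + 2x with equation error
# and heteroscedastic measurement errors in both variables.
rng = np.random.default_rng(0)
n = 20000
xi = rng.normal(0.0, 1.0, n)                    # true x values
eta = 1.0 + 2.0 * xi + rng.normal(0.0, 0.5, n)  # true y values with equation error
sig_x = rng.uniform(0.2, 1.0, n)                # known per-point error in x
sig_y = rng.uniform(0.2, 1.0, n)                # known per-point error in y
x = xi + rng.normal(0.0, sig_x)
y = eta + rng.normal(0.0, sig_y)

ols_slope = np.polyfit(x, y, 1)[0]              # ignores errors in x, so it is attenuated
intercept, slope = bces_y_on_x(x, y, sig_x**2)
print("OLS slope:", ols_slope, "BCES slope:", slope, "BCES intercept:", intercept)
```

In this simulation the ordinary least-squares slope is biased toward zero by the errors in x, while the corrected estimate should land close to the true value of 2.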

Maximum likelihood approach of Kelly (2007)

B.C. Kelly, "Some Aspects of Measurement Error in Linear Regression of Astronomical Data", Astrophysical Journal, 665, 1489-1506 (2007)

A complicated likelihood is written with a bivariate linear relationship, heteroscedastic MEs in both variables, and equation error, all incorporated into a normal mixture model. The best fit is obtained with Bayesian inference using MCMC with a Gibbs sampler.

Kelly presents a flexible formulation that can be extended to nonlinear models, multivariate regression, censored and truncated data, and other problems arising in astronomical research. However, only the simpler applications are in use today.
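The structure described above can be summarized as a three-level hierarchical model. The block below is a sketch of that structure only. The symbol names are chosen here for illustration and may differ from the paper's notation, and priors on the parameters are omitted.

```latex
% Level 1: the true covariate follows a mixture of K normals
\xi_i \sim \sum_{k=1}^{K} \pi_k \, N(\mu_k, \tau_k^2)

% Level 2: linear relation with equation (intrinsic) error of variance \sigma^2
\eta_i \mid \xi_i \sim N(\alpha + \beta \xi_i, \; \sigma^2)

% Level 3: observed values with known heteroscedastic measurement errors
\begin{pmatrix} x_i \\ y_i \end{pmatrix} \Bigm| \xi_i, \eta_i
  \sim N_2\!\left( \begin{pmatrix} \xi_i \\ \eta_i \end{pmatrix},
  \begin{pmatrix} \sigma_{x,i}^2 & \sigma_{xy,i} \\ \sigma_{xy,i} & \sigma_{y,i}^2 \end{pmatrix} \right)
```

Here $(x_i, y_i)$ are the measured values, $(\xi_i, \eta_i)$ the unobserved true values, and the measurement-error covariance matrix of each point is treated as known.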

Statistics in R with heteroscedastic MEs

Statistical function                                    CRAN package
Pearson correlation coefficient                         boot, psych, RankAggreg, weights
Chi-squared & t test                                    weights
Nonparametric density estimation                        decon
Regressions: MARS, ICR, FDA, k-nn, ridge, quantile      caret
Clustering: Hierarchical, K-means, K-medoids            WeightedCluster, weightedkmeans
Classification: CART, SVM, SOM, neural net              caret, tree
Time series: Box-Pierce, Ljung-Box tests                WeightedPortTest

Are these methods appropriate for astronomical weighted data?
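To make concrete what a "weighted" statistic of this kind computes, here is a minimal sketch of a weighted Pearson correlation coefficient. It is written in Python rather than R and is not the implementation of any package listed above. The choice of inverse-variance weights, the function name `weighted_pearson`, and the toy data are all assumptions. The sketch does not settle whether such weighting is the right treatment of measurement errors, which is the question the slide raises.

```python
import numpy as np

def weighted_pearson(x, y, w):
    """Pearson correlation using weighted means, variances and covariance."""
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    w = w / w.sum()                      # normalize the weights to sum to 1
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    var_x = np.sum(w * (x - mx) ** 2)
    var_y = np.sum(w * (y - my) ** 2)
    return cov / np.sqrt(var_x * var_y)

# Toy data (hypothetical): errors in y only, weights assumed to be 1/sigma_y^2.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
sigma_y = np.array([0.2, 0.2, 0.5, 1.0, 2.0])
r_weighted = weighted_pearson(x, y, 1.0 / sigma_y**2)
r_unweighted = weighted_pearson(x, y, np.ones_like(x))
print("weighted r:", r_weighted, "unweighted r:", r_unweighted)
```

With equal weights the function reduces to the ordinary Pearson coefficient.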

How left-censored values are generated from astronomical data

Positions of 6 stars are known in advance from other studies. [Figure: image with the 6 star positions labeled 1-6]

Signal & noise for 6 stars:
Star   Signal   Noise (+/-)
1      31       4
2      97       4
3      60       7
4      31       10
5      7        5
6      22       26

Undetected fluxes replaced with 3σ upper limits:
Star   Flux
5      <15
6      <78
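The censoring rule in this example can be written out in a few lines. The sketch below uses the six signal and noise values from the slide and assumes a source counts as detected when its signal exceeds 3 times its noise. The function name `apply_upper_limits` is hypothetical.

```python
# Signal and noise values for the six stars, as listed on the slide.
signal = [31, 97, 60, 31, 7, 22]
noise = [4, 4, 7, 10, 5, 26]

def apply_upper_limits(signal, noise, k=3.0):
    """Return (star, status, value) per star, censoring at k * noise.

    A star is 'detected' if signal > k * noise; otherwise its flux is
    replaced by the upper limit k * noise (a left-censored value).
    """
    results = []
    for star, (s, n) in enumerate(zip(signal, noise), start=1):
        if s > k * n:
            results.append((star, "detected", float(s)))
        else:
            results.append((star, "upper_limit", k * n))
    return results

results = apply_upper_limits(signal, noise)
for star, status, value in results:
    prefix = "<" if status == "upper_limit" else ""
    print(f"star {star}: {prefix}{value:g}")
```

Note that the reported limit depends only on the noise level, so the measured signal of a nondetection (7 for star 5, 22 for star 6) is discarded.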

Measurement errors, S/N and left-censoring

Left-censored data are currently treated with survival analysis techniques, which assume that the censored value is exactly measured. But unlike biometrical and industrial reliability testing environments, where survival times are measured precisely, in astronomy the censored value is typically set at the 3σ upper limit, where σ is the known noise level.

A new statistical approach is needed that treats all observations (weighted detections and nondetections) in a self-consistent fashion.

Conclusion: A Call to Arms!!

Lots of methodology is needed for the astronomical case, where variances of measurement errors are easily obtained. Thousands of studies from all branches of astronomy are involved.
- Some of the needed methodology is probably straightforward, e.g. clarifying and presenting existing procedures with CRAN packages.
- Some of the needed methodology requires creativity, e.g. a unified signal/noise detection & nondetection treatment.

Astronomical journals are eager to publish methodology (AJ, MNRAS, A&C). Acceptance rates are high.

These issues can be discussed in detail at the 2016-17 SAMSI program in astrostatistics.