Censoring and truncation in astronomical surveys. Eric Feigelson (Astronomy & Astrophysics, Penn State)

Outline
1. Examples of astronomical surveys
2. One survey: truncation and spatial/luminosity distributions
3. Two or more surveys: censoring and luminosity distributions and relationships
4. A call to arms

Faint Images of the Radio Sky at Twenty-centimeters (FIRST): 811,000 radio sources

Infrared Astronomical Satellite (IRAS), early 1980s: 350,000 mid-infrared sources in an all-sky survey

Two-Micron All Sky Survey (2MASS), early 2000s: 300,000,000 stars and galaxies

Sloan Digital Sky Survey (SDSS), 2000s: 100,000,000 visual-band stars and galaxies plus 1,000,000 spectra

ROSAT All-Sky Survey (RASS): 100,000 X-ray sources plus diffuse emission

Energetic Gamma-Ray Experiment Telescope (EGRET) on the Compton Gamma-Ray Observatory, 1990s: 271 sources with E > 100 MeV

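Star counts: the first flux-limited surveys
For a uniform population of objects distributed randomly in transparent space, an object of luminosity L at distance D has flux S. The number N of objects brighter than S is proportional to the volume V within which such an object would still be seen:
$$S = \frac{L}{4\pi D^2}, \qquad V = \frac{4}{3}\pi D^3 \propto S^{-3/2}, \qquad \log N \sim \log V \sim -1.5 \log S$$
William Herschel (1785) used deviations from this prediction to infer that the Universe (now known to be our Milky Way Galaxy) is limited in extent (~1 kpc) and elongated in shape.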
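A minimal sketch of the log N ~ -1.5 log S prediction follows. It assumes standard candles (a single luminosity) scattered uniformly through a transparent Euclidean sphere. The function names and parameters are illustrative and do not come from the talk. Above the flux of a source at the sphere's edge the sample is complete, so the fitted slope should come out near -1.5.

```python
import numpy as np

def simulate_fluxes(n, L=1.0, R=1.0, seed=0):
    """Fluxes of n standard candles of luminosity L placed uniformly at random
    in a transparent Euclidean sphere of radius R."""
    rng = np.random.default_rng(seed)
    d = R * rng.random(n) ** (1.0 / 3.0)   # uniform in volume: P(D < d) = (d/R)^3
    return L / (4.0 * np.pi * d**2)

def lognlogs_slope(S, s_min, s_max, n_grid=20):
    """Least-squares slope of log10 N(>S) against log10 S on a grid from s_min to s_max."""
    grid = np.logspace(np.log10(s_min), np.log10(s_max), n_grid)
    S_sorted = np.sort(S)
    n_above = len(S_sorted) - np.searchsorted(S_sorted, grid, side="right")  # N(>S)
    slope, _ = np.polyfit(np.log10(grid), np.log10(n_above), 1)
    return slope

S = simulate_fluxes(200_000)
s_complete = 1.0 / (4.0 * np.pi)   # flux of a source at the edge of the sphere
print(lognlogs_slope(S, s_complete, 100.0 * s_complete))   # close to -1.5
```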

Star counts at different Galactic latitudes [figure: halo and disk components]. The Galaxy has multiple components with different spatial distributions, which produce strong deviations from the log N ~ -1.5 log S law. Bahcall & Soneira 1980, The universe at faint magnitudes. I - Models for the galaxy and the predicted star counts

But near the Galactic plane, space is very opaque because of interstellar dust. Thus Herschel's galaxy is ~20 times too small, which was not appreciated until the 1920s-30s. A big effort in the 1910s-50s addressed the fundamental equation of stellar statistics, which aims to establish simultaneously the distribution of stellar luminosities and their spatial distribution in the Galactic disk:
$$A(S,l,b) = \int_0^\infty D(r,l,b)\, \Phi\big(L, \mathrm{Abs}(r,l,b)\big)\, r^2\, dr$$
where
A = number of stars per deg^2 in a flux interval around S towards direction (l,b) (the star-count data);
D = number of stars per pc^3 at distance r from Earth towards (l,b);
Φ = normalized distribution of stellar luminosities (the LF), corrected for the spatially dependent absorption Abs(r,l,b).
This Fredholm integral equation of the first kind was solved by Taylor expansion, Fourier transform, and numerical integration. The effort essentially failed in the disk because of the patchy absorption, but it works reasonably well towards the Galactic poles (|b| = 90°). Trumpler & Weaver, Statistical Astronomy, 1953
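The same forward problem is often written in magnitudes, where the distance modulus turns the kernel into a shift: A(m) = ω ∫ D(r) Φ(m + 5 - 5 log10 r - a(r)) r^2 dr. Here a(r) is the absorption in magnitudes and ω is the solid angle. The sketch below only evaluates this forward model numerically. Its ingredients are assumed and hypothetical: a Gaussian LF, uniform density, and no dust. In that dust-free uniform case the counts should rise by a factor 10^0.6 per magnitude, which is the magnitude version of the -1.5 slope. Inverting the integral for D(r), the hard Fredholm problem described above, is not attempted.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule (written out to avoid NumPy version differences)."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def star_counts(m, density, lum_func, absorption, omega=1.0,
                r_min=1e-3, r_max=1e7, n=20_000):
    """Forward model A(m) = omega * Int D(r) Phi(M) r^2 dr with
    M = m + 5 - 5 log10 r - a(r); r in pc, a(r) in magnitudes."""
    ln_r = np.linspace(np.log(r_min), np.log(r_max), n)
    r = np.exp(ln_r)
    M = m + 5.0 - 5.0 * np.log10(r) - absorption(r)
    integrand = density(r) * lum_func(M) * r**3   # r^2 dr = r^3 d(ln r)
    return omega * trapezoid(integrand, ln_r)

# Hypothetical ingredients: Gaussian LF (M0 = 5, sigma = 1), uniform density, no dust
gauss_lf = lambda M: np.exp(-0.5 * (M - 5.0) ** 2) / np.sqrt(2.0 * np.pi)
uniform = lambda r: np.full_like(r, 0.1)   # 0.1 stars per pc^3
no_dust = lambda r: np.zeros_like(r)

A10 = star_counts(10.0, uniform, gauss_lf, no_dust)
A11 = star_counts(11.0, uniform, gauss_lf, no_dust)
print(np.log10(A11 / A10))   # ~0.6: counts rise by 10^0.6 per magnitude
```

You can pass a nonzero absorption, for example `lambda r: 0.001 * r` (1 mag/kpc, a made-up value). The counts then flatten at faint magnitudes, which is the effect that defeated the disk analyses.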

Stellar populations & kinematics in the Galaxy
New catalogs with 4-6 dimensions of phase space:
(α, δ): location in the sky
π: parallax, giving distance
(μ_α, μ_δ): proper motion across the sky
V_r: radial velocity
50,000 stars have all 6 values measured, 100,000 have 5 values, and 30M have 4 values (Hipparcos, UCAC2, Gen-Cop, RAVE, …).
Fascinating structure: thin disk, thick disk and halo components of the Galaxy; tracing star formation around the Sun; halo streams and cannibalized galaxies.
Very little sophisticated statistical work (e.g., mixture models, wavelet analysis) has been performed on these catalogs.
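Mixture models are one of the techniques named above. As a hedged illustration only, the sketch below fits a two-component, one-dimensional Gaussian mixture to synthetic velocities using EM. The "thin disk" and "thick disk" parameters are invented for the example. A real analysis would use all measured phase-space coordinates together with their errors.

```python
import numpy as np

def normal_pdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def em_mixture_1d(v, k=2, n_iter=1000):
    """Fit a k-component 1-D Gaussian mixture to velocities v by EM.
    Returns (weights, means, sds) sorted by increasing dispersion."""
    v = np.asarray(v, float)
    w = np.full(k, 1.0 / k)
    mu = np.full(k, v.mean())
    sd = v.std() * np.linspace(0.5, 1.5, k)   # start components with different widths
    for _ in range(n_iter):
        # E-step: responsibility of each component for each star, shape (n, k)
        dens = w * normal_pdf(v[:, None], mu, sd)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted weights, means and dispersions
        nk = resp.sum(axis=0)
        w = nk / len(v)
        mu = (resp * v[:, None]).sum(axis=0) / nk
        sd = np.sqrt((resp * (v[:, None] - mu) ** 2).sum(axis=0) / nk)
    order = np.argsort(sd)
    return w[order], mu[order], sd[order]

# Hypothetical kinematics: 90% "thin disk" (sd 20 km/s), 10% "thick disk" (sd 50 km/s, lag 30 km/s)
rng = np.random.default_rng(1)
v = np.concatenate([rng.normal(0.0, 20.0, 18_000), rng.normal(-30.0, 50.0, 2_000)])
weights, means, sds = em_mixture_1d(v)
print(weights, means, sds)
```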

Model of Milky Way cannibalizing the Sagittarius galaxy

Extragalactic surveys: The simple case of nearby galaxies

A recent SDSS study: a volume-limited survey of 28,089 galaxies with d < 150 Mpc [figure panels: low luminosity, high luminosity]. Blanton et al. 2005, The Properties and Luminosity Function of Extremely Low Luminosity Galaxies

[Figure: distributions of concentration, surface brightness, color and luminosity]

Normal galaxy luminosity function
Parametric model: the Schechter function (a generalized Pareto distribution), $dN/dL \propto \Phi(L) \propto L^{-\alpha} e^{-L/L_*}$
Nonparametric methods: two are used here (1/V_max and stepwise maximum likelihood).
(Many technical issues concern correction for missing low-surface-brightness galaxies, K-corrections and evolution corrections to luminosity, double Schechter fits, etc.)
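The slide writes the power law as L^{-α}. Much of the literature instead writes (L/L*)^α with α ≈ -1, so the sign of α depends on the convention. This sketch assumes the latter form, φ(L) dL = φ* (L/L*)^α e^{-L/L*} dL/L*. It checks the closed-form luminosity density φ* L* Γ(α + 2) against direct numerical integration. The parameter values are hypothetical. The total number density, φ* Γ(α + 1), is finite only for α > -1.

```python
import math
import numpy as np

def schechter(L, phi_star, L_star, alpha):
    """Schechter LF: phi(L) = phi_star * (L/L_star)**alpha * exp(-L/L_star) / L_star."""
    x = L / L_star
    return phi_star * x**alpha * np.exp(-x) / L_star

def luminosity_density(phi_star, L_star, alpha):
    """Closed form of Int L phi(L) dL = phi_star * L_star * Gamma(alpha + 2), for alpha > -2."""
    return phi_star * L_star * math.gamma(alpha + 2.0)

phi_star, L_star, alpha = 0.01, 1.0, -1.2   # hypothetical values, arbitrary units
lnL = np.linspace(np.log(1e-8), np.log(50.0), 20_000)
L = np.exp(lnL)
y = L**2 * schechter(L, phi_star, L_star, alpha)   # L*phi(L) dL = L^2 phi(L) d(ln L)
numeric = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(lnL)))
print(numeric, luminosity_density(phi_star, L_star, alpha))   # should agree closely
```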

Nonparametric estimators of LFs
1. Classic estimator: $\Phi(L) = N(L)/V$, where N is the number of stars/galaxies/AGNs in the surveyed volume V. In a flux-limited survey this is a biased estimator, because faint objects are detectable in only part of V.
2. Schmidt estimator (>700 citations): $\Phi(L) = \sum_i 1/V_{\max}(L_i)$, where $V_{\max}$ is the maximum volume within which object i, with its observed luminosity, could have been seen given the survey's sensitivity limit. An unbiased estimator, but with high variance.
Schmidt 1968, Space Distribution and Luminosity Functions of Quasi-Stellar Radio Sources
Felten 1976, On Schmidt's V_m estimator and other estimators of luminosity function
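The sketch below implements the Schmidt 1/V_max estimator under simplifying assumptions: Euclidean space, a full-sky survey bounded by a sphere of radius R, a single flux limit S_lim, and a made-up LF that is flat in log L. It also shows why the classic N/V estimator is biased, since faint objects are visible only in a small part of V.

```python
import numpy as np

def vmax_weights(L, S_lim, R):
    """1/V_max for each detected object in a Euclidean, full-sky survey
    that is flux-limited at S_lim and bounded by a sphere of radius R."""
    d_max = np.minimum(np.sqrt(L / (4.0 * np.pi * S_lim)), R)
    return 1.0 / (4.0 / 3.0 * np.pi * d_max**3)

rng = np.random.default_rng(2)
N, R = 200_000, 100.0
V = 4.0 / 3.0 * np.pi * R**3
L = 10.0 ** rng.uniform(0.0, 2.0, N)        # hypothetical LF: flat in log L, 1 <= L <= 100
d = R * rng.random(N) ** (1.0 / 3.0)        # uniform in the sphere
S_lim = 1.0 / (4.0 * np.pi * 10.0**2)       # an L = 1 object is detectable out to d = 10
seen = L / (4.0 * np.pi * d**2) >= S_lim    # flux-limited selection

w = vmax_weights(L[seen], S_lim, R)
true_n = N / V
print(w.sum() / true_n)            # 1/V_max total density: close to 1
print(seen.sum() / V / true_n)     # classic N/V: far below 1

# Binned LF per unit log L: sum of 1/V_max in each bin divided by the bin width
edges = np.linspace(0.0, 2.0, 5)
phi_binned = np.histogram(np.log10(L[seen]), bins=edges, weights=w)[0] / np.diff(edges)
```

The few faintest objects carry very large weights 1/V_max. That is the source of the estimator's high variance.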

Lynden-Bell-Woodroofe maximum-likelihood estimator (>100 citations): a recursion relation in which C_k is the number of stars/galaxies/AGNs in the k-th rectangle of the luminosity-distance diagram.
Lynden-Bell 1971, A method of allowing for known observational selection in small samples applied to 3CR quasars
Woodroofe 1985, Estimation of a distribution function with truncated data
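The sketch below shows the product-limit form of this estimator for luminosities truncated by a flux limit, in the spirit of Woodroofe (1985). It is written as a survival-analysis estimator with left truncation, not in Lynden-Bell's C_k notation, and the correspondence between the two ordering conventions is assumed rather than derived here. Each object j is in the sample only because its log-luminosity x_j exceeds its limit t_j, so the risk set at x_k counts objects with t_j ≤ x_k ≤ x_j. Luminosity and distance are assumed independent (no evolution). The simulation parameters are hypothetical.

```python
import numpy as np

def lbw_survival(x, t, x0):
    """Product-limit estimate of P(X > x | X > x0) for left-truncated data,
    evaluated just after each observed x > x0 (assumes no tied x values).
    x: observed log-luminosities; t: log-luminosity limit at each object's
    distance (object j is in the sample only because x[j] >= t[j])."""
    x = np.asarray(x, float)
    t = np.asarray(t, float)
    xs = np.sort(x[x > x0])
    # Risk set at x_k: #{j: t_j <= x_k <= x_j} = #{t_j <= x_k} - #{x_j < x_k},
    # because every object with x_j < x_k also has t_j <= x_j < x_k.
    at_risk = (np.searchsorted(np.sort(t), xs, side="right")
               - np.searchsorted(np.sort(x), xs, side="left"))
    return xs, np.cumprod(1.0 - 1.0 / at_risk)

# Hypothetical sample: log L ~ N(0, 1); the limit t depends only on distance
rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, 40_000)
T = rng.uniform(-3.0, 1.0, 40_000)
keep = X >= T                                # truncation by the flux limit
xs, S = lbw_survival(X[keep], T[keep], x0=0.0)
S_at_1 = S[np.searchsorted(xs, 1.0, side="right") - 1]
print(S_at_1)                                # true P(X > 1 | X > 0) is about 0.317
naive = np.mean(X[keep][X[keep] > 0.0] > 1.0)
print(naive)                                 # ignores truncation: biased high
```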

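Stepwise maximum-likelihood estimator (>500 citations): the LF is represented as a step function with value φ_k in each luminosity bin. Setting ∂ℓ/∂φ_k = 0 for the log-likelihood ℓ gives the estimator.
Efstathiou et al. 1988, Analysis of a complete galaxy redshift survey. II - The field-galaxy luminosity function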
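The sketch below is a simplified version of the stepwise maximum-likelihood idea. Maximizing the conditional likelihood Π_i φ(L_i) / ∫_visible φ over the step heights φ_k gives the fixed point φ_k = n_k / Σ_i [Δ_k H_ik / Σ_j φ_j Δ_j H_ij]. Here H_ik is the fraction of bin k visible at object i's distance and Δ_k is the bin width. The sketch assumes φ is flat within each bin when computing H_ik. The overall normalization is lost, as in the original method, so the result is scaled to mean 1. The simulated sample is hypothetical. Each object is compared only with the part of the LF visible at its own distance, so the spatial density cancels out.

```python
import numpy as np

def swml(logL, logL_min, edges, n_iter=300):
    """Stepwise ML luminosity function in bins of log L (fixed-point iteration).
    logL: observed log-luminosities; logL_min: detection limit for each object
    at its distance; edges: bin edges in log L. Returns phi_k scaled to mean 1."""
    lo, hi = edges[:-1], edges[1:]
    width = hi - lo
    n_k = np.histogram(logL, bins=edges)[0].astype(float)
    # H[i, k]: fraction of bin k brighter than object i's limit
    H = np.clip((hi - logL_min[:, None]) / width, 0.0, 1.0)
    phi = np.ones(len(width))
    for _ in range(n_iter):
        denom = H @ (phi * width)   # integral of phi over the visible range, per object
        phi = n_k / (width * (H / denom[:, None]).sum(axis=0))
    return phi / phi.mean()

rng = np.random.default_rng(4)
N, R = 200_000, 100.0
logL = rng.uniform(0.0, 1.0, N)               # hypothetical LF: flat in log L
d = R * rng.random(N) ** (1.0 / 3.0)          # uniform in a sphere
logL_min = 1.0 + 2.0 * np.log10(d / R)        # flux limit: log L = 1 is visible out to d = R
seen = logL >= logL_min
phi = swml(logL[seen], logL_min[seen], np.linspace(0.0, 1.0, 5))
print(phi)   # roughly flat, like the input LF
```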

Takeuchi et al. (2000) apply these LF estimators to Monte Carlo simulations of small galaxy catalogs (N = 100 and 1000). All perform well when the spatial distribution is homogeneous, but for spatially clustered distributions the 1/V_max estimator is badly affected. There has been no analytical evaluation of the mathematical properties of these estimators since Felten's (1976) study of 1/V_max.

Extragalactic surveys: the difficult case of quasar evolution
AGN exhibit huge cosmic evolution: there were hundreds of times more luminous quasars when the Universe was half its current age than there are today.

Here is the log N-log S distribution of radio sources in the sky. The expected log N ~ -1.5 log S dependence of a uniform distribution has been removed, so deviations from a horizontal line indicate cosmic evolution of the volume density and/or the LF. The radio sources arise from two main classes, AGN and star-forming galaxies, which show different evolutionary behavior. There is an enormous 1960s-90s literature on quasar evolution and a 1990s-2000s literature on the evolution of star formation (the 'Madau plot'). Condon et al. 2002, Radio sources and star formation in the local universe
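The "horizontal line" baseline can be checked numerically. For a uniform, non-evolving population in Euclidean space, N(>S) ∝ S^{-1.5}, so dN/dS ∝ S^{-2.5} and S^{2.5} dN/dS is constant. The sketch assumes hypothetical standard candles in a unit sphere and uses only fluxes where that sample is complete.

```python
import numpy as np

def euclidean_normalized_counts(S, edges, area=1.0):
    """Differential counts dN/dS per unit area, multiplied by S^2.5 at each bin's
    geometric centre. A uniform, non-evolving Euclidean population gives a flat result."""
    counts, _ = np.histogram(S, bins=edges)
    centers = np.sqrt(edges[:-1] * edges[1:])
    dN_dS = counts / np.diff(edges) / area
    return centers, dN_dS * centers**2.5

rng = np.random.default_rng(5)
d = rng.random(1_000_000) ** (1.0 / 3.0)   # unit sphere, uniform in volume
S = 1.0 / (4.0 * np.pi * d**2)             # standard candles, L = 1
s0 = 1.0 / (4.0 * np.pi)                   # complete above this flux
edges = s0 * np.logspace(0.0, 2.0, 11)
centers, norm_counts = euclidean_normalized_counts(S, edges)
print(norm_counts / norm_counts.mean())    # all close to 1
```

An evolving population would make this curve rise or fall, which is how the radio-source plot reveals evolution.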

Two or more flux-limited surveys
1. A flux-limited survey gives a catalog of sources in one waveband.
2. Astronomers observe this sample in another waveband, but only some objects are detected.
3. Questions:
   What is the LF in the second waveband? (density estimation)
   Are these LFs the same in different subsamples? (k-sample test)
   What is the relationship between the first and second luminosities? (correlation & regression)
4. Generalize to a multivariate database with many properties measured for the initial sample. Nondetections appear in many columns.
In astronomical parlance, the nondetections give upper limits. In statistical parlance, they are left-censored data points. These questions have arisen in hundreds of studies.

An example of astronomical censored data Heckman et al. 1989 A millimeter-wave survey of CO emission in Seyfert galaxies

Several solutions to the censored LF problem were considered by astronomers during 1975-85, leading to the rediscovery of Efron's redistribute-to-the-right algorithm for the maximum-likelihood Kaplan-Meier product-limit estimator for randomly censored data. Schmitt, Feigelson & Nelson, and Isobe et al. introduced astronomers to existing methods of survival analysis for censored data.
Schmitt 1985, Statistical analysis of astronomical data containing upper bounds
Feigelson & Nelson 1985, Statistical methods for astronomical data with upper limits. I - Univariate distributions
Isobe et al. 1986, Statistical methods for astronomical data with upper limits. II - Correlation and regression
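The sketch below shows one way to apply the Kaplan-Meier estimator to upper limits. It uses a standard trick: negating the data turns left censoring (X ≤ u) into ordinary right censoring (-X ≥ -u). The function names and the five-point example are hypothetical. In the example, the upper limit at 3 has its probability mass redistributed onto the smaller detected values, 2 and 1, which mirrors the redistribute-to-the-right idea applied to the flipped data.

```python
import numpy as np

def kaplan_meier(y, event):
    """Product-limit estimate of P(Y > y) at each distinct event value, for
    right-censored data (event=False means 'the true value is larger than y')."""
    y = np.asarray(y, float)
    event = np.asarray(event, bool)
    times = np.unique(y[event])
    surv, s = [], 1.0
    for t in times:
        n_at_risk = np.sum(y >= t)        # censored ties stay in the risk set
        d = np.sum((y == t) & event)
        s *= 1.0 - d / n_at_risk
        surv.append(s)
    return times, np.array(surv)

def km_upper_limits(x, detected):
    """Kaplan-Meier for data with upper limits, by sign flip.
    Returns detected values in decreasing order and the estimate of P(X < x)."""
    t, s = kaplan_meier(-np.asarray(x, float), detected)
    return -t, s

x_vals, p_below = km_upper_limits([5, 4, 3, 2, 1], [True, True, False, True, True])
print(x_vals, p_below)   # [5. 4. 2. 1.] [0.8 0.6 0.3 0. ]
```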

Example of univariate survival analysis (Kaplan-Meier) Feigelson & Nelson 1986

Test of bivariate correlation/regression for simulated flux-limited surveys (Isobe, Feigelson & Nelson 1986): the line fitted to detections only is biased, while the Buckley-James line including nondetections is unbiased.
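The bias of the detections-only line is easy to reproduce. The sketch simulates a hypothetical linear relation with Gaussian scatter, in which y goes undetected below a fixed limit. It then compares an ordinary least-squares line through the detections with a censored-Gaussian maximum-likelihood fit that uses the upper limits. This is not the Buckley-James method shown on the slide. It is a simpler, swapped-in parametric alternative that assumes normal residuals, the same assumption as ASURV's EM-algorithm regression, and it is fitted here with SciPy's general-purpose optimizer.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def censored_gaussian_line(x, y, detected):
    """Maximum-likelihood fit of y = a + b*x with Gaussian scatter, where
    detected=False means y is an upper limit (true value <= y)."""
    def neg_log_like(p):
        a, b, log_s = p
        z = (y - a - b * x) / np.exp(log_s)
        ll = np.sum(norm.logpdf(z[detected]) - log_s)   # detections: density
        ll += np.sum(norm.logcdf(z[~detected]))          # limits: P(true y <= limit)
        return -ll
    b0, a0 = np.polyfit(x, y, 1)                         # crude starting point
    s0 = np.std(y - (a0 + b0 * x))
    res = minimize(neg_log_like, x0=[a0, b0, np.log(s0)],
                   method="Nelder-Mead", options={"maxiter": 4000})
    return res.x[0], res.x[1]

rng = np.random.default_rng(6)
x = rng.uniform(0.0, 4.0, 2_000)
y_true = 1.0 + 1.0 * x + rng.normal(0.0, 1.0, x.size)   # hypothetical relation, slope 1
detected = y_true >= 2.5                                  # fixed detection limit in y
y = np.where(detected, y_true, 2.5)                       # non-detections reported as limits

b_det = np.polyfit(x[detected], y[detected], 1)[0]
a_ml, b_ml = censored_gaussian_line(x, y, detected)
print(b_det, b_ml)   # detections-only slope is too shallow; ML slope near 1
```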

ASURV code
Stand-alone code implementing survival methods for astronomy:
   Kaplan-Meier estimator for the luminosity function
   Gehan, logrank, Peto-Peto and Peto-Prentice two-sample tests
   Cox regression for bivariate correlation
   Generalized Kendall's τ and Spearman's ρ coefficients, including censoring in both variables (Brown, Hollander & Korwar; Akritas)
   Linear regression with Gaussian residuals (EM algorithm)
   Linear regression with Kaplan-Meier residuals (Buckley-James)
   Linear regression with censoring in both variables (Campbell; Schmitt)
LaValley, Isobe & Feigelson 1992, ASURV software report
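Here is a sketch of one of the listed two-sample tests: Gehan's generalized Wilcoxon test for data with upper limits, using the permutation variance of the rank-like scores. It is not ASURV's code. For upper limits, object i is definitely larger than object j only if i is a detection lying above j's value, whether j is a detection or a limit. Each object's score is (number definitely below it) minus (number definitely above it). The function name and toy data are hypothetical.

```python
import numpy as np

def gehan_test(x1, det1, x2, det2):
    """Gehan two-sample test for data with upper limits (det=False).
    Returns (U, Z); Z is approximately standard normal under the null hypothesis."""
    x = np.concatenate([x1, x2]).astype(float)
    det = np.concatenate([det1, det2]).astype(bool)
    # gt[i, j]: object i is definitely larger than object j
    gt = det[:, None] & (x[:, None] > x[None, :])
    h = gt.sum(axis=1) - gt.sum(axis=0)   # (#definitely below) - (#definitely above)
    n1, n = len(x1), len(x)
    U = h[:n1].sum()                      # equals the sum of pairwise scores between samples
    var = n1 * (n - n1) * np.sum(h.astype(float) ** 2) / (n * (n - 1))
    return U, U / np.sqrt(var)

U, Z = gehan_test([1, 2, 3], [True] * 3, [4, 5, 6], [True] * 3)
print(U, Z)   # -9 and about -1.96
```

With upper limits present, pairs whose order is ambiguous simply contribute zero, so the test uses only the orderings the data can actually establish.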

Methodological work needed for astronomical censored data
1. Investigate the best methods for astronomical censoring patterns: Type I censoring in flux, $F = L/4\pi d^2$, gives nonrandom censoring in L.
2. Develop multivariate analysis for censoring in all variables: clustering, regression/PCA, MANOVA, MARS, neural nets, etc. Shapley et al. 2001

3. Calculate nonlinear regression and goodness of fit.
4. Treat simultaneous censoring and truncation:

Astronomical survey | AIDS epidemic
Goal: characterizing LFs and relationships in stars/galaxies/AGNs | Goals: characterizing infected populations and mortality from AIDS
Full population of stars/galaxies/AGNs | Full population of infected individuals
Truncated sample from a flux-limited survey | Truncated sample of symptomatic individuals
Censored measurements of other properties | Censored measurements of survival

5. Treat simultaneous censoring and measurement error. Unlike biometrical and industrial reliability-testing environments, where survival times are measured precisely, in astronomy the censored value is typically set at the 3σ upper limit, where σ is the known noise level. A new statistical approach is needed that treats all observations (detected or not) with measurement errors. Fisher's fiducial distributions?

A call to arms: astrostatistical research for astronomical surveys
   Develop the mathematics of nonparametric estimators for luminosity functions in flux-limited surveys, including applications to complex situations such as spatial inhomogeneities and cosmic evolution.
   Perform state-of-the-art multivariate/spatial analysis of Galactic stellar populations using the Hipparcos/UCAC2/… databases.
   Extend survival methods to truncated, multiply censored, measurement-error, multivariate databases.