Introduction to astrostatistics

Similar documents
The Challenges of Heteroscedastic Measurement Error in Astronomical Data

arxiv:astro-ph/ v2 17 May 1998

Statistical Searches in Astrophysics and Cosmology

Machine Learning Applications in Astronomy

Astrostatistics: Past, Present & Future

Astrostatistics: Past, Present and Future

Statistical Applications in the Astronomy Literature II Jogesh Babu. Center for Astrostatistics PennState University, USA

Censoring and truncation in astronomical surveys. Eric Feigelson (Astronomy & Astrophysics, Penn State)

Real Astronomy from Virtual Observatories

Using Gamma Ray Bursts to Estimate Luminosity Distances. Shanel Deal

Multi-wavelength Astronomy

D4.2. First release of on-line science-oriented tutorials

Eric Howell University of Western Australia

Searching for gravitational waves. with LIGO detectors

Rick Ebert & Joseph Mazzarella For the NED Team. Big Data Task Force NASA, Ames Research Center 2016 September 28-30

Doing astronomy with SDSS from your armchair

AN INTRODUCTIONTO MODERN ASTROPHYSICS

Data challenges of the Virtual Observatory in Time Domain Astronomy

Data Analysis Methods

Part two of a year-long introduction to astrophysics:

Gamma-Ray Astronomy. Astro 129: Chapter 1a

Astroinformatics in the data-driven Astronomy

New Methods of Mixture Model Cluster Analysis

The Large Synoptic Survey Telescope

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

Measurement Error and Linear Regression of Astronomical Data. Brandon Kelly Penn State Summer School in Astrostatistics, June 2007

Classification and Energetics of Cosmological Gamma-Ray Bursts

Black Holes and Active Galactic Nuclei

Improved Astronomical Inferences via Nonparametric Density Estimation

New Multiscale Methods for 2D and 3D Astronomical Data Set

Cosmological Gamma-Ray Bursts. Edward Fenimore, NIS-2 Richard Epstein, NIS-2 Cheng Ho, NIS-2 Johannes Intzand, NIS-2

Major Option C1 Astrophysics. C1 Astrophysics

Cherenkov Telescope Array ELINA LINDFORS, TUORLA OBSERVATORY ON BEHALF OF CTA CONSORTIUM, TAUP

Gamma-ray Astronomy Missions, and their Use of a Global Telescope Network

9.1 Years of All-Sky Hard X-ray Monitoring with BATSE

Interpretation of Early Bursts

Expected detection rate of gravitational wave estimated by Short Gamma-Ray Bursts

RLW paper titles:

Construction and Preliminary Application of the Variability Luminosity Estimator

G. Jogesh Babu and S. George Djorgovski

Science in the Virtual Observatory

arxiv: v1 [astro-ph.co] 3 Apr 2019

Review of Lecture 15 3/17/10. Lecture 15: Dark Matter and the Cosmic Web (plus Gamma Ray Bursts) Prof. Tom Megeath

Learning Objectives: Chapter 13, Part 1: Lower Main Sequence Stars. AST 2010: Chapter 13. AST 2010 Descriptive Astronomy

ROTSE: THE SEARCH FOR SHORT PERIOD VARIABLE STARS

Surprise Detection in Multivariate Astronomical Data Kirk Borne George Mason University

arxiv:astro-ph/ v2 18 Oct 2002

Lobster X-ray Telescope Science. Julian Osborne

Classification and Energetics of Cosmological Gamma-Ray Bursts

Data Center. b ss. Ulisses Barres de Almeida Astrophysicist / CBPF! CBPF

Fast Hierarchical Clustering from the Baire Distance

Surprise Detection in Science Data Streams Kirk Borne Dept of Computational & Data Sciences George Mason University

Lectures in AstroStatistics: Topics in Machine Learning for Astronomers

Detectors for 20 kev 10 MeV

Outline Challenges of Massive Data Combining approaches Application: Event Detection for Astronomical Data Conclusion. Abstract

Four ways of doing cosmography with gravitational waves

Hierarchical Bayesian Modeling

X-ray flashes and X-ray rich Gamma Ray Bursts

Data-driven models of stars

Binary Black Holes, Gravitational Waves, & Numerical Relativity Part 1

There are three main ways to derive q 0 :

Lecture 20 High-Energy Astronomy. HEA intro X-ray astrophysics a very brief run through. Swift & GRBs 6.4 kev Fe line and the Kerr metric

Gamma-ray Bursts. Chapter 4

Foundations of Astrophysics

GAMMA-RAY BURSTS. mentor: prof.dr. Andrej Čadež

Active Galaxies and Galactic Structure Lecture 22 April 18th

arxiv:astro-ph/ v1 17 Dec 2003

Skyalert: Real-time Astronomy for You and Your Robots

Determining the GRB (Redshift, Luminosity)-Distribution Using Burst Variability

Two Space High Energy Astrophysics Missions of China: POLAR & HXMT

Statistics in Astronomy (II) applications. Shen, Shiyin

GRB Spectra and their Evolution: - prompt GRB spectra in the γ-regime

Chapter 0 Introduction X-RAY BINARIES

Future Opportunities for Collaborations: Exoplanet Astronomers & Statisticians

ESASky, ESA s new open-science portal for ESA space astronomy missions

X-ray Astronomy F R O M V - R O CKETS TO AT HENA MISSION. Thanassis Akylas

Unsupervised clustering of COMBO-17 galaxy photometry

Scientific Results from the Virgin Islands Center for Space Science at Etelman Observatory

GLAST E/PO Program Status

Editorial comment: research and teaching at UT

Linear regression issues in astronomy. G. Jogesh Babu Center for Astrostatistics

ASTR 2310: Chapter 6

Science of Compact X-Ray and Gamma-ray Objects: MAXI and GLAST

Physics of the hot evolving Universe

Summer School in Statistics for Astronomers & Physicists June 5-10, 2005

Lecture notes assembled by

A SPEctra Clustering Tool for the exploration of large spectroscopic surveys. Philipp Schalldach (HU Berlin & TLS Tautenburg, Germany)

Fully Bayesian Analysis of Low-Count Astronomical Images

ESAC VOSPEC SCIENCE TUTORIAL

On The Nature of High Energy Correlations

Special Relativity. Principles of Special Relativity: 1. The laws of physics are the same for all inertial observers.

Rest-frame properties of gamma-ray bursts observed by the Fermi Gamma-Ray Burst Monitor

Cosmology The Road Map

Gravitational-Wave Data Analysis: Lecture 2

Gravitational Wave Burst Searches

Modern Image Processing Techniques in Astronomical Sky Surveys

Hierarchical Bayesian Modeling of Planet Populations

The Swift GRB MIDEX Mission, A Multi-frequency, Rapid-Response Space Observatory

Department of Statistics Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA citizenship: U.S.A.

Supernovae Observations of the Accelerating Universe. K Twedt Department of Physics, University of Maryland, College Park, MD, 20740, USA

Transcription:

Introduction to astrostatistics Eric Feigelson (Astro & Astrophys) & G. Jogesh Babu (Stat) Center for Astrostatistics Penn State University

Astronomy & statistics: A glorious history Hipparchus (4th c. BC): Average via midrange of observations Galileo (1572): Average via mean of observations Halley (1693): Foundations of actuarial science Legendre (1805): Cometary orbits via least squares regression Gauss (1809): Normal distribution of errors in planetary orbits Quetelet (1835): Statistics applied to human affairs But the fields diverged in the late 19-20th centuries, astronomy astrophysics (EM, QM, GR) statistics social sciences & industries

Do we need statistics in astronomy today? Are the stars/galaxies/sources we are studying an unbiased sample of the vast underlying population? When should these objects be divided into 2/3/ classes? What is the intrinsic relationship between two properties of a class (especially with confounding variables)? Can we answer such questions in the presence of observations with measurement errors & flux limits?

Do we need statistics in astronomy today? Are the stars/galaxies/sources we are studying an unbiased sample of the vast underlying population? Sampling When should these objects be divided into 2/3/ classes? Multivariate classification What is the intrinsic relationship between two properties of a class (especially with confounding variables)? Multivariate regression Can we answer such questions in the presence of observations with measurement errors & flux limits? Censoring, truncation & measurement errors

When is a blip in a spectrum, image or time series a real signal? Statistical inference How do we model the vast range of variable objects (extrasolar planets, BH accretion, GRBs, )? Time series analysis How do we model the 2/4/6-dimensional points representing galaxies in the Universe or photons in a detector? Spatial point processes & image processing How do we model continuous structures (CMB fluctuations, interstellar/intergalactic media)? Density estimation, regression

How often do astronomers need statistics? (a bibliometric measure) Of ~15,000 refereed papers annually: 1% have `statistics in title or keywords 5% have `statistics in abstract 10% treat variable objects 5-10% (est) analyze data tables 5-10% (est) fit parametric models

The state of astrostatistics today The typical astronomical study uses: Fourier transform for temporal analysis (Fourier 1807) Least squares regression (Legendre 1805) & minimum χ 2 (Pearson 1901) Kolmogorov-Smirnov goodness-of-fit test (Kolmogorov, 1933) Principal components analysis for tables (Hotelling 1936) Even traditional methods are often misused: Six unweighted bivariate least squares fits are used interchangeably in H o studies with wrong confidence intervals Feigelson & Babu ApJ 1992 Likelihood ratio test (F test) usage typically inconsistent with asymptotic statistical theory Protassov et al. ApJ 2002

Statistical methodology is the weak link in astronomical research Telescopes Optics Detectors Data reduction AIPS, IRAF, IDL, computation Science products Images, spectra, lightcurves, catalogs Dissemination Journals, ADS, archives, CDS, VO Astrophysical interpretation Atom/nucl/rad/grav physics Statistical analysis χ 2, K-S test, FFT

But astrostatistics is undergoing a resurgence Growing interest in some modern methods -- wavelets, MCMC, neural nets but not in a full context of capabilities & alternatives Conferences & books: Statistical Challenges in Modern Astronomy at Penn State (1991, 1996, 2001, 2006) Data analysis meetings & monographs (e.g. by Fionn Murtagh & Jean-Luc Starck) Astrostat sessions as AAS and JSM/ISI meetings Emerging astro-stat collaborations: Harvard/Smithsonian (David van Dyk, Chandra scientists, students) CMU/Pitt = PiCA (Larry Wasserman, Chris Genovese, Bob Nichol, ) NASA-ARC/Stanford (Jeffrey Scargle, David Donoho) Berger/Jeffreys/Loredo/Connors Efron/Petrosian, Stark/GONG,

A new imperative: Virtual Observatory Huge, uniform, multivariate databases are emerging from specialized survey projects & telescopes: 10 9 -object catalogs from USNO, 2MASS & SDSS opt/ir surveys 10 6 - galaxy redshift catalogs from 2dF & SDSS 10 5 -source radio/infrared/x-ray catalogs 10 3-4 -samples of well-characterized stars & galaxies with dozens of measured properties Many on-line collections of 10 2-10 6 images & spectra Planned Large-aperture Synoptic Survey Telescope will generate ~10 Pby The Virtual Observatory is an international effort underway to federate these distributed on-line astronomical databases. Powerful statistical tools are needed to derive scientific insights from extracted VO datasets

Types of astronomical datasets (and their statistical challenges) Continuous univariate data Examples: Spectra, lightcurves, instrumental time series Issues: Unevenly-spaced observations, heteroscedastic errors, Gaussian/Poissonian signals, periodic/ aperiodic/explosive variations Bivariate imaging data Examples: IR, visible, X-ray images Issues: Gaussian/Poissonian signals, distorting optics, hierarchical structures

Multivariate data Examples: Catalogs & science studies (rows: stars/galaxies/sources, cols: properties,) Issues: Heteroscedastic errors, truncated & censored data, non-gaussian populations Combinations Interferometric data (radio, IR, gamma-ray) using Fourier transforms to give 3-dim spectro-imaging Interferometric gravitational wave observatories seeking weak periodic & single event signals SDSS obtains 10 5 spectra of galaxies & quasars LSST will obtain 10 10 lightcurves

Multivariate classification of gamma ray bursts GRBs are extremely rapid (1-100s), powerful (L~10 51 erg), explosive events occurring at random times in normal galaxies. They probably arise from a subclass of supernova explosions or colliding binary neutron stars which produces a collimated relativistic fireball. Current detectors can see ~1/day. ~2500 papers have been written on GRBs with $10 8-10 9 spent on their detection & study. We consider here only the limited question of whether the bulk properties of the GRB itself suggest multiple classes of events. Properties include: location in the sky, arrival time, duration, fluence, and spectral hardness.

Dendrogram from nonparametric agglomerative hierarchical clustering Must choose sample, variables, unit-free (e.g. log), metric (e.g. Euclidean), merging procedure (e.g. Ward s criterion) & cluster level. The 3 cluster model is validated with MANOVA tests (e.g. Wilk s Λ and Pillai s trace) at P<<0.0001 level. Mukherjee, Feigelson, Babu, Murtagh, Raftery, Fraley ApJ 1998

Parametric clustering using maximum-likelihood estimator (assuming multivariate normal) using EM Algorithm. Validated using Bayesian Information Criterion: BIC = 2L p logn Mukherjee et al. ApJ 1998 The three cluster model has been confirmed by several subsequent studies. The BATSE Team believes the intermediate class is due in part to biases produced by the on-board burst triggering algorithm. Hakkila et al. ApJ 2003

Some methodological challenges for astrostatistics in the 2000s Simultaneous treatment of measurement errors and censoring (esp. multivariate) Statistical inference and visualization with very-large-n datasets too large for computer memories A user-friendly cookbook for construction of likelihoods & Bayesian computation of astronomical problems Links between astrophysical theory and wavelet coefficients (spatial & temporal) Rich families of time series models to treat accretion and explosive phenomena

Structural challenges for astrostatistics Cross-training of astronomers & statisticians New curriculum, summer workshops Effective statistical consulting Enthusiasm for astro-stat collaborative research Recognition within communities & agencies More funding (astrostat gets <0.1% of astro+stat) Implementation software StatCodes Web metasite (astrostatistics.psu.edu/statcodes) Standardized in R & Inference (www.r-project.org) Inreach & outreach Center for Astrostatistics is designed to further these goals