Data Analysis Methods

Similar documents
Improved Astronomical Inferences via Nonparametric Density Estimation

Large Scale Bayesian Inference

Exploiting Sparse Non-Linear Structure in Astronomical Data

Statistical Searches in Astrophysics and Cosmology

(Slides for Tue start here.)

Probabilistic photometric redshifts in the era of Petascale Astronomy

Inference of Galaxy Population Statistics Using Photometric Redshift Probability Distribution Functions

Addressing the Challenges of Luminosity Function Estimation via Likelihood-Free Inference

Hierarchical Bayesian Modeling

Big Data Inference. Combining Hierarchical Bayes and Machine Learning to Improve Photometric Redshifts. Josh Speagle 1,

Kavli IPMU-Berkeley Symposium "Statistics, Physics and Astronomy" January , 2018 Lecture Hall, Kavli IPMU

Some Statistical Aspects of Photometric Redshift Estimation

PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY

Lecture: Gaussian Process Regression. STAT 6474 Instructor: Hongxiao Zhu

PROBABILISTIC PROGRAMMING: BAYESIAN MODELLING MADE EASY. Arto Klami

Using training sets and SVD to separate global 21-cm signal from foreground and instrument systematics

Hierarchical Bayesian Modeling

Refining Photometric Redshift Distributions with Cross-Correlations

Bayes via forward simulation. Approximate Bayesian Computation

Lectures in AstroStatistics: Topics in Machine Learning for Astronomers

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Science with large imaging surveys

Uncertainty Quantification for Machine Learning and Statistical Models

Bayesian Inference and MCMC

Development of Stochastic Artificial Neural Networks for Hydrological Prediction

Approximate Bayesian computation: an application to weak-lensing peak counts

Astronomical image reduction using the Tractor

Constraints on the Inflation Model

Data Exploration vis Local Two-Sample Testing

Dictionary Learning for photo-z estimation

Gaussian Process Modeling in Cosmology: The Coyote Universe. Katrin Heitmann, ISR-1, LANL

STA 4273H: Statistical Machine Learning

Approximate Bayesian Computation for the Stellar Initial Mass Function

arxiv: v1 [astro-ph.co] 3 Apr 2019

STA 4273H: Sta-s-cal Machine Learning

From the Big Bang to Big Data. Ofer Lahav (UCL)

Fast Likelihood-Free Inference via Bayesian Optimization

Multimodal Nested Sampling

The connection of dropout and Bayesian statistics

Measuring Neutrino Masses and Dark Energy

Approximate Bayesian Computation and Particle Filters

Department of Statistics Carnegie Mellon University 5000 Forbes Avenue Pittsburgh, PA citizenship: U.S.A.

Bayesian Modeling for Type Ia Supernova Data, Dust, and Distances

ITP, Universität Heidelberg Jul Weak Lensing of SNe. Marra, Quartin & Amendola ( ) Quartin, Marra & Amendola ( ) Miguel Quartin

Constraining Dark Energy: First Results from the SDSS-II Supernova Survey

STA 4273H: Statistical Machine Learning

Outline Challenges of Massive Data Combining approaches Application: Event Detection for Astronomical Data Conclusion. Abstract

Cosmology of Photometrically- Classified Type Ia Supernovae

Bayesian Regression Linear and Logistic Regression

Introduction. Chapter 1

Density Estimation. Seungjin Choi

Weak Lensing: Status and Prospects

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 13: SEQUENTIAL DATA

A novel determination of the local dark matter density. Riccardo Catena. Institut für Theoretische Physik, Heidelberg

Abstract. Statistics for Twenty-first Century Astrometry. Bayesian Analysis and Astronomy. Advantages of Bayesian Methods

STA414/2104 Statistical Methods for Machine Learning II

Nonparametric Bayesian Methods (Gaussian Processes)

An Introduction to the Dark Energy Survey

Machine Learning. Probabilistic KNN.

The Impact of Positional Uncertainty on Gamma-Ray Burst Environment Studies

Computer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo

Parameter Estimation. William H. Jefferys University of Texas at Austin Parameter Estimation 7/26/05 1

Hierarchical Bayesian Modeling of Planet Populations

Statistical Learning Reading Assignments

Cosmology and dark energy with WFIRST HLS + WFIRST/LSST synergies

The Galaxy Dark Matter Connection

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

STAT 518 Intro Student Presentation

Probabilistic Machine Learning

Astrostatistics: Past, Present and Future

Replicated Softmax: an Undirected Topic Model. Stephen Turner

Astrostatistics: Past, Present & Future

SNLS supernovae photometric classification with machine learning

A Statistical Input Pruning Method for Artificial Neural Networks Used in Environmental Modelling

Spatial inference for astronomical data sets with INLA

The local dark matter halo density. Riccardo Catena. Institut für Theoretische Physik, Heidelberg

CosmoSIS a statistical framework for precision cosmology. Elise Jennings

Future Opportunities for Collaborations: Exoplanet Astronomers & Statisticians

INTRODUCTION TO PATTERN RECOGNITION

Bayesian Estimation of log N log S

STUDY OF THE LARGE-SCALE STRUCTURE OF THE UNIVERSE USING GALAXY CLUSTERS

Uncertainty quantification for Wavefield Reconstruction Inversion

Introduction to astrostatistics

STA 4273H: Statistical Machine Learning

Introduction to CosmoMC

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01

Dynamic System Identification using HDMR-Bayesian Technique

Discovery of Primeval Large-Scale Structures with Forming Clusters at Redshift z=5.7

Learning algorithms at the service of WISE survey

Nonlinear Data Transformation with Diffusion Map

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Approximate Bayesian Computation for Astrostatistics

Astronomical Time Series Analysis for the Synoptic Survey Era. Joseph W. Richards. UC Berkeley Department of Astronomy Department of Statistics

LSST, Euclid, and WFIRST

arxiv:astro-ph/ v1 14 Sep 2005

LSST Cosmology and LSSTxCMB-S4 Synergies. Elisabeth Krause, Stanford

Monte Carlo Techniques for Regressing Random Variables. Dan Pemstein. Using V-Dem the Right Way

The Challenges of Heteroscedastic Measurement Error in Astronomical Data

New Opportunities in Petascale Astronomy. Robert J. Brunner University of Illinois

Data-Intensive Statistical Challenges in Astrophysics

Transcription:

Data Analysis Methods Successes, Opportunities, and Challenges Chad M. Schafer Department of Statistics & Data Science Carnegie Mellon University cschafer@cmu.edu

The Astro/Data Science Community LSST Informatics and Statistics Science Collaboration Over 75 Members International Astrostatistics Association Over 400 Members American Stat. Assoc. Astrostatistics Interest Group Over 50 Members Individuals motivated by advancing science with novel data analysis tools

Success Story: MCMC in Cosmology Markov Chain Monte Carlo (MCMC) - Class of methods for parameter estimation, approximation of posterior Algorithms date to mid-20th Century, but development took off in Statistics during 1990s By the 2000s, MCMC playing key role in cosmology, including WMAP analysis. (Spergel 2013) Focus on development of resources for MCMC in cosmology (e.g., CosmoMC)

Success Story: Redshift Estimation Estimation of Redshifts using Photometry Crucial challenge faced by photometric surveys Range of methods proposed and attempted, using training sets built from spectroscopic samples. Graham, et al. (2017): Photometric redshifts are an essential part of every cosmological science goal of the LSST.

Success Stories What do these successes have in common? Smaller gap between existing, advanced methods and the application - MCMC naturally addressed challenges of estimation of cosmological parameters with CMB - Photometric redshift estimation can be viewed as a standard supervised learning problem In both cases the methodology is crucial to the science

The Potential Theme: Making full use of data from large surveys Sample size alone does not quantify information content More data reduces variance, better modelling reduces bias Specific areas for potential gains: - Information preservation when representing raw data Improved propagation of errors Lessening bias from incorrect distributional assumptions Joint analyses of data sets

Example: Filament Catalog Chen, et al. (2015): Density ridges in galaxy map lead to finding filamentary structure SDSS filament catalog: https://sites.google.com/site/yenchicr/catalogue Plans to create for LSST

Example: Filament Catalog Example of complex data representation Nonparametric - Does not make restrictive assumptions Emphasis on error quantification Broad Potential: Create catalogs of rich scientific information, avoiding overcompression

Example: Model Misspecification Bias Long (2017): Improving period estimation from light curves through better adjustment for measurement errors Models for light curves misspecified, but period estimates remain accurate Broad Potential: Accept that models are incorrect, bias can be reduced through error quantification

Example: Variational Inference Reiger, et al. (2015): Multiparameter, hierarchical generative model for pixel intensity from a star/galaxy Variational inference approximates posterior when using models of such complexity

Example: Variational Inference Reiger, et al. (2015): Multiparameter, hierarchical generative model for pixel intensity from a star/galaxy Variational inference approximates posterior when using models of such complexity Broad Potential: Variational inference could enable more realistic model fits

Example: Light Curve Classification Castro, et al. (2017): Classification of Variable Stars Handling Observational Gaps and Noise Bootstrap sampling of models fit to light curves in order to assess stability of features Probabilistic classification based on feature uncertainty Broad Potential: Quantification of uncertainty crucial in classification

Example: Likelihood Free Inference Hahn, et al. (2017): Approximate Bayesian computation in large-scale structure. Complexity of LSS makes impossible writing likeilhood function of sufficient accuracy. ABC offers approach which circumvents likelihood Broad Potential: Errors in likelihood are magnified with large samples, esp. comparing simulations and data

The Examples What do the examples have in common? The shift from the variance-dominated era to the bias-dominated era requires careful thought about methods, not just the scaling of existing methods. The gap from method to application is larger, and requires - adaptation of existing methods - deep knowledge of application - expertise on both data science and astronomy sides

Barriers to Progress Complexities Communication Computations Compensation

Why am I here? Working to close the gap Increased role in this discussion Data analysis tools as a crucial community resource Start of work towards a community white paper Massive, rich data sets are an huge opportunity, but reaching full potential requires avoiding overcompression and adequate modelling

References Castro, et al. (2017) Astronomical Journal, Vol. 155 Chen, et al. (2017) MNRAS, Vol. 466 Graham, et al. (2017) arxiv 1706.09507 Hahn, et al. (2017) MNRAS, Vol. 469 Long (2017) Elec. J. of Statistics, Vol. 11 Reiger, et al. (2015) Proceedings of ICML