Approximate Bayesian Computation for Astrostatistics


Approximate Bayesian Computation for Astrostatistics
Jessi Cisewski, Department of Statistics, Yale University
SAMSI Undergraduate Workshop, October 24, 2016

Our goals:
- Introduction to Bayesian methods: likelihoods, priors, posteriors
- Learn our ABCs... Approximate Bayesian Computation
(Image from http://livingstontownship.org/lycsblog/?p=241)

Suppose we randomly select a sample of n women and measure their heights x_1, x_2, x_3, ..., x_n. Let's assume the heights follow a Normal distribution, X ~ N(µ, σ):

f(x; µ, σ) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))

Stellar Initial Mass Function

Probability density function (pdf): the function f(x; µ, σ), with µ fixed and x variable:

f(x; µ, σ) = (1/√(2πσ²)) exp(−(x − µ)²/(2σ²))

Each observation is a draw from this density. [Figure: the pdf plotted as a density over height, centered at µ.]

Likelihood: the same function f(x; µ, σ), now with µ variable and x fixed. [Figure: the likelihood plotted as a function of µ, peaked near the observed data.]

Consider a random sample of size n (assuming independence and a known σ): X_1, ..., X_n ~ N(3, 2). The joint likelihood is

f(x_1, ..., x_n; µ, σ) = ∏_{i=1}^n (1/√(2πσ²)) exp(−(x_i − µ)²/(2σ²))

[Figure: normalized likelihoods for n = 50, 100, 250, 500; the likelihood concentrates around µ = 3 as n grows.]

Likelihood Principle: all of the information in a sample is contained in the likelihood function (a density or distribution function). The data are modeled by a likelihood function. How do we infer µ, or any unknown parameter(s) θ?

Maximum likelihood estimation: the parameter value θ̂ that maximizes the likelihood,

θ̂ = argmax_θ f(x_1, ..., x_n; θ)

For the Normal model,

max_µ f(x_1, ..., x_n; µ, σ) = max_µ ∏_{i=1}^n (1/√(2πσ²)) exp(−(x_i − µ)²/(2σ²))

Hence µ̂ = (1/n) ∑_{i=1}^n x_i = x̄. [Figure: the likelihood as a function of µ, with the MLE marked.]
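The closed-form result µ̂ = x̄ is easy to check numerically. A minimal sketch (not part of the slides; sample size and grid are my own choices) maximizes the Normal log-likelihood with known σ = 2 over a grid:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=100)  # sample with known sigma = 2

def log_likelihood(mu, x, sigma=2.0):
    # log of prod_i (2*pi*sigma^2)^(-1/2) exp(-(x_i - mu)^2 / (2 sigma^2))
    return (-0.5 * np.sum((x - mu) ** 2) / sigma**2
            - x.size * 0.5 * np.log(2 * np.pi * sigma**2))

mu_grid = np.linspace(1.0, 5.0, 4001)
ll = np.array([log_likelihood(m, x) for m in mu_grid])
mu_hat = mu_grid[np.argmax(ll)]

print(mu_hat, x.mean())  # the grid maximizer agrees with x-bar
```

The agreement with the sample mean is exact up to the grid resolution, which is the point of the closed-form derivation above.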

Bayesian framework. Suppose θ is our parameter(s) of interest.
Classical or frequentist methods for inference:
- consider θ to be fixed and unknown
- evaluate the performance of methods by repeated sampling (consider all possible data sets)
Bayesian methods:
- consider θ to be random
- only consider the observed data set and prior information

Posterior distribution:

π(θ | x) = f(x | θ) π(θ) / f(x) = f(x | θ) π(θ) / ∫_Θ f(x | θ) π(θ) dθ

where f(x | θ) is the likelihood of the data x and π(θ) is the prior. The prior distribution allows you to easily incorporate your beliefs about the parameter(s) of interest. The posterior is a distribution on the parameter space given the observed data.

Data: y_1, ..., y_4 ~ N(µ = 3, σ = 2), ȳ = 1.278
Prior: N(µ_0 = −3, σ_0 = 5)
Posterior: N(µ_1 = 1.114, σ_1 = 0.981)
[Figure: likelihood, prior, and posterior densities for µ.]

Data: y_1, ..., y_4 ~ N(µ = 3, σ = 2), ȳ = 1.278
Prior: N(µ_0 = −3, σ_0 = 1)
Posterior: N(µ_1 = −0.861, σ_1 = 0.707)
[Figure: likelihood, prior, and posterior densities for µ; the tighter prior pulls the posterior toward the prior mean.]

Data: y_1, ..., y_200 ~ N(µ = 3, σ = 2), ȳ = 2.999
Prior: N(µ_0 = −3, σ_0 = 1)
Posterior: N(µ_1 = 2.881, σ_1 = 0.140)
[Figure: likelihood, prior, and posterior densities for µ; with n = 200 the data dominate the prior.]
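The three examples are the standard Normal-Normal conjugate update. A short sketch reproduces the slides' numbers; note it only does so with prior mean µ_0 = −3, so the minus sign appears to have been lost in the transcription:

```python
import math

def normal_posterior(ybar, n, sigma, mu0, sigma0):
    """Conjugate update for a Normal mean with known sigma:
    posterior precision = 1/sigma0^2 + n/sigma^2, and the posterior
    mean is the precision-weighted average of mu0 and ybar."""
    prec = 1.0 / sigma0**2 + n / sigma**2
    mu1 = (mu0 / sigma0**2 + n * ybar / sigma**2) / prec
    return mu1, math.sqrt(1.0 / prec)

# Reproduce the three slides (to rounding of ybar), with mu0 = -3:
print(normal_posterior(1.278, 4, 2, -3, 5))    # ~ (1.114, 0.981)
print(normal_posterior(1.278, 4, 2, -3, 1))    # ~ (-0.861, 0.707)
print(normal_posterior(2.999, 200, 2, -3, 1))  # ~ (2.881, 0.140)
```

The third call shows the large-n behavior: the posterior mean moves to the sample mean and the posterior standard deviation shrinks like 1/√n.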

Prior distribution:
- The prior distribution allows you to easily incorporate your beliefs about the parameter(s) of interest.
- If one has a specific prior in mind, it fits nicely into the definition of the posterior.
- But how do you go from prior information to a prior distribution? And what if you don't actually have prior information?
- And what if you don't know the likelihood??

Approximate Bayesian Computation: a likelihood-free approach to approximating π(θ | x_obs), i.e. f(x_obs | θ) is never specified. It proceeds via simulation of the forward process.

Why might we not know f(x_obs | θ)?
1. The physical model is too complex to write as a closed-form function of θ
2. Strong dependency in the data
3. Observational limitations

ABC in astronomy: Ishida et al. (2015); Akeret et al. (2015); Weyant et al. (2013); Schafer and Freeman (2012); Cameron and Pettitt (2012)


Basic ABC algorithm. For the observed data x_obs and prior π(θ):
1. Sample θ_prop from the prior π(θ)
2. Generate x_prop from the forward process f(x | θ_prop)
3. Accept θ_prop if x_obs = x_prop
4. Return to step 1
Introduced in Pritchard et al. (1999) (population genetics).

Step 3: accept θ_prop if x_obs = x_prop. Waiting for proposals such that x_obs = x_prop would be computationally prohibitive. Instead, accept proposals with ρ(x_obs, x_prop) ≤ ε for some distance ρ and tolerance threshold ε. When x is high-dimensional, ε would have to be made too large to keep the acceptance probability reasonable; instead, reduce the dimension by comparing summaries S(x_prop) and S(x_obs).

Gaussian illustration:
- Data x_obs: 25 iid draws from Normal(µ, 1)
- Summary statistic: S(x) = x̄
- Distance function: ρ(S(x_prop), S(x_obs)) = |x̄_prop − x̄_obs|
- Tolerances: ε = 0.50 and 0.10
- Prior: π(µ) = Normal(0, 10)
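This illustration fits in a few lines of code. A sketch of the rejection sampler (seed, batching, and the reading of Normal(0, 10) as standard deviation 10 are my own choices; the summary, distance, tolerance, and prior follow the slide):

```python
import numpy as np

rng = np.random.default_rng(1)
x_obs = rng.normal(loc=0.0, scale=1.0, size=25)  # "observed" data, true mu = 0
s_obs = x_obs.mean()                             # summary statistic S(x) = x-bar

def abc_rejection(s_obs, eps, n_keep=1000, n=25, batch=10_000):
    """Rejection ABC: draw mu from the prior (taken as N(0, sd = 10)),
    forward-simulate n draws from N(mu, 1), and keep mu whenever
    |S(x_prop) - S(x_obs)| <= eps."""
    kept = np.empty(0)
    while kept.size < n_keep:
        mu = rng.normal(0.0, 10.0, size=batch)                    # proposals from the prior
        xbar = rng.normal(mu, 1.0, size=(n, batch)).mean(axis=0)  # simulate and summarize
        kept = np.concatenate([kept, mu[np.abs(xbar - s_obs) <= eps]])
    return kept[:n_keep]

abc_post = abc_rejection(s_obs, eps=0.10)
print(abc_post.mean(), abc_post.std())  # compare against the true posterior for mu
```

With the diffuse prior, the accepted values concentrate near x̄_obs, and shrinking ε from 0.50 to 0.10 tightens the ABC posterior toward the true posterior, which is what the next slide's figure shows.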

Gaussian illustration: ABC posteriors for µ.
[Figure: two panels (tolerance 0.5 and tolerance 0.1; N = 1000, n = 25) comparing the true posterior, the ABC posterior, and the input value; the smaller tolerance gives an ABC posterior closer to the true posterior.]

How to pick a tolerance ε? Instead of restarting the ABC algorithm with a smaller tolerance, decrease the tolerance sequentially and use the already-sampled particle system as a proposal distribution, rather than drawing from the prior distribution. A particle system consists of (1) the retained sampled values and (2) their importance weights. Beaumont et al. (2009); Moral et al. (2011); Bonassi and West (2004)

Gaussian illustration: sequential ABC posteriors (N = 1000, n = 25).
[Figure: true posterior, ABC posterior, and input value for µ.]
Tolerance sequence ε_1:10: 1.00, 0.75, 0.53, 0.38, 0.27, 0.19, 0.15, 0.11, 0.08, 0.06
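The sequential scheme can be sketched as an ABC population Monte Carlo sampler in the spirit of Beaumont et al. (2009). This is an illustrative sketch, not the slides' implementation: the tolerance schedule, Gaussian perturbation kernel, and particle count are my own choices.

```python
import numpy as np

rng = np.random.default_rng(2)
x_obs = rng.normal(0.0, 1.0, size=25)  # observed data, true mu = 0
s_obs = x_obs.mean()

def simulate_summary(mu, n=25):
    # forward process: n draws from N(mu, 1), summarized by the sample mean
    return rng.normal(mu, 1.0, size=n).mean()

def abc_pmc(s_obs, eps_seq, n_particles=300, prior_sd=10.0):
    # Step 1: plain rejection ABC at the loosest tolerance
    init = []
    while len(init) < n_particles:
        mu = rng.normal(0.0, prior_sd)
        if abs(simulate_summary(mu) - s_obs) <= eps_seq[0]:
            init.append(mu)
    particles = np.array(init)
    weights = np.full(n_particles, 1.0 / n_particles)
    # Later steps: propose from the perturbed particle cloud, then reweight
    for eps in eps_seq[1:]:
        tau = np.sqrt(2.0 * np.cov(particles, aweights=weights))  # kernel scale
        new_p, new_w = np.empty(n_particles), np.empty(n_particles)
        for j in range(n_particles):
            while True:
                mu_prop = rng.normal(rng.choice(particles, p=weights), tau)
                if abs(simulate_summary(mu_prop) - s_obs) <= eps:
                    break
            prior = np.exp(-0.5 * (mu_prop / prior_sd) ** 2)  # prior density, up to a constant
            kernel = np.sum(weights * np.exp(-0.5 * ((mu_prop - particles) / tau) ** 2))
            new_p[j], new_w[j] = mu_prop, prior / kernel       # importance weight
        particles, weights = new_p, new_w / new_w.sum()
    return particles, weights

eps_seq = [1.0, 0.75, 0.5, 0.35, 0.25]  # illustrative shrinking tolerances
particles, weights = abc_pmc(s_obs, eps_seq)
print(np.sum(weights * particles))       # weighted ABC posterior mean
```

Each tolerance step reuses the previous particle system as the proposal, which is why the whole run is far cheaper than restarting rejection ABC at the final ε.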

Stellar Initial Mass Function

Stellar Initial Mass Function (IMF): the distribution of star masses after a star formation event within a specified volume of space. Molecular cloud → protostars → stars. (Image adapted from http://www.astro.ljmu.ac.uk)

Broken power-law (Kroupa, 2001):

Φ(M) ∝ M^(−α_i), M_1i ≤ M ≤ M_2i

- α_1 = 0.3 for 0.01 ≤ M/M_Sun ≤ 0.08 [sub-stellar]
- α_2 = 1.3 for 0.08 ≤ M/M_Sun ≤ 0.50
- α_3 = 2.3 for 0.50 ≤ M/M_Sun ≤ M_max

Many other models, e.g. Salpeter (1955); Chabrier (2003). 1 M_Sun = 1 solar mass (the mass of our Sun).

ABC for the Stellar Initial Mass Function.
[Figure: star-forming region (adapted from http://www.astro.ljmu.ac.uk) alongside a histogram of stellar masses from 0 to 3 M_Sun.]

IMF likelihood. Start with a power-law distribution: each star's mass is independently drawn from a power law with density

f(m) = [(1 − α) / (M_max^(1−α) − M_min^(1−α))] m^(−α), m ∈ (M_min, M_max)

Then the likelihood is

L(α | m_1:n_tot) = [(1 − α) / (M_max^(1−α) − M_min^(1−α))]^(n_tot) ∏_{i=1}^{n_tot} m_i^(−α)

where n_tot is the total number of stars in the cluster.
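The truncated power law above is easy to simulate by inverse-CDF sampling, and the likelihood can be checked numerically. A sketch (the parameter values M_min = 2, M_max = 60, α = 2.35 are taken from the later simulation study; the function names are my own):

```python
import numpy as np

def sample_powerlaw(n, alpha, m_min, m_max, rng):
    """Inverse-CDF draws from f(m) ∝ m^(-alpha) on (m_min, m_max), alpha != 1:
    invert F(m) = (m^(1-alpha) - m_min^(1-alpha)) / (m_max^(1-alpha) - m_min^(1-alpha))."""
    u = rng.uniform(size=n)
    a = 1.0 - alpha
    return (m_min**a + u * (m_max**a - m_min**a)) ** (1.0 / a)

def log_likelihood(alpha, m, m_min=2.0, m_max=60.0):
    """Log of the likelihood above:
    n * log((1-alpha)/(m_max^(1-alpha) - m_min^(1-alpha))) - alpha * sum(log m_i)."""
    a = 1.0 - alpha
    return m.size * np.log(a / (m_max**a - m_min**a)) - alpha * np.sum(np.log(m))

rng = np.random.default_rng(3)
m = sample_powerlaw(5000, alpha=2.35, m_min=2.0, m_max=60.0, rng=rng)

# The log-likelihood should peak near the generating slope alpha = 2.35
grid = np.linspace(1.5, 3.5, 201)
alpha_hat = grid[np.argmax([log_likelihood(a, m) for a in grid])]
print(alpha_hat)
```

With complete, error-free data the likelihood is tractable; the point of the next slides is that aging, completeness, and measurement error make it progressively harder to write down, which is what motivates ABC here.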

Observational limitations: aging. The lifecycle of a star depends on its mass; more massive stars die faster. For a cluster of age τ Myr, we only observe stars with masses below T_age ≈ τ^(−2/5) · 10^(8/5). If the age is 30 Myr, the aging cutoff is T_age ≈ 10 M_Sun. Then the likelihood becomes

L(α | m_1:n_obs, n_tot) = [(1 − α) / (T_age^(1−α) − M_min^(1−α))]^(n_obs) [∏_{i=1}^{n_obs} m_i^(−α)] · P(M > T_age)^(n_tot − n_obs)

where n_tot = number of stars in the cluster and n_obs = number of stars observed in the cluster. (Image: http://scioly.org)

Observational limitations: completeness. The completeness function gives the probability of observing a particular star given its mass:

P(observing star | m) = 0 for m < C_min; (m − C_min)/(C_max − C_min) for m ∈ [C_min, C_max]; 1 for m > C_max

It depends on the flux limit, stellar crowding, etc. (Image: NASA, J. Trauger (JPL), J. Westphal (Caltech))

Observational limitations: measurement error. Incorporating log-normal measurement error gives our final likelihood:

L(α | m_1:n_obs, n_tot) =
[ P(M > T_age) + ∫_{C_min}^{C_max} (1 − (M − C_min)/(C_max − C_min)) · ((1 − α)/(M_max^(1−α) − M_min^(1−α))) M^(−α) dM ]^(n_tot − n_obs)
× ∏_{i=1}^{n_obs} ∫_{M_min}^{T_age} (2πσ²)^(−1/2) m_i^(−1) exp(−(log m_i − log M)²/(2σ²)) · [I{M > C_max} + ((M − C_min)/(C_max − C_min)) I{C_min ≤ M ≤ C_max}] · ((1 − α)/(M_max^(1−α) − M_min^(1−α))) M^(−α) dM

The first factor accounts for the unobserved stars (those that have died or were missed through incompleteness); each observed-star factor integrates the log-normal error density against the completeness-weighted power law.

IMF with aging, completeness, and error.
[Figure: density of the true IMF (masses 0-60 M_Sun) alongside the observed mass function (masses roughly 2-16 M_Sun).] Sample size = 1000 stars, [C_min, C_max] = [2, 4], σ = 0.25.

Simulation study: forward model.
- Draw from f(m) = [(1 − α)/(60^(1−α) − 2^(1−α))] m^(−α), m ∈ (2, 60)
- Aged 30 Myr
- Observational completeness: P(obs | m) = 0 for m < 2; (m − 2)/2 for m ∈ [2, 4]; 1 for m > 4
- Uncertainty: log M = log m + 0.25 η, with η ~ N(0, 1)
- Prior: α ~ U[0, 6]

Simulation study: summary statistics. We want to account for the following with our summary statistics and distance functions:

1. Shape of the observed mass function:

ρ_1(m_sim, m_obs) = [ ∫ ( f̂_log,m_sim(x) − f̂_log,m_obs(x) )² dx ]^(1/2)

where f̂_log is a density estimate of the log masses.

2. Number of stars observed:

ρ_2(m_sim, m_obs) = |1 − n_sim/n_obs|

where m_sim = masses of the stars simulated from the forward model, m_obs = masses of observed stars, n_sim = number of stars simulated from the forward model, and n_obs = number of observed stars.
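The two distances can be sketched directly: ρ_1 as an L2 distance between Gaussian kernel density estimates of the log masses, ρ_2 as the count mismatch. The bandwidth, grid, and demo data below are illustrative choices, not the slides' values:

```python
import numpy as np

def rho1(m_sim, m_obs, bw=0.3, lo=-1.0, hi=3.0, n_grid=400):
    """L2 distance between Gaussian KDEs of the log masses
    (captures the shape of the observed mass function)."""
    grid = np.linspace(lo, hi, n_grid)
    def kde(logm):
        z = (grid[:, None] - logm[None, :]) / bw
        return np.exp(-0.5 * z**2).sum(axis=1) / (logm.size * bw * np.sqrt(2 * np.pi))
    diff = kde(np.log(m_sim)) - kde(np.log(m_obs))
    return np.sqrt(np.sum(diff**2) * (grid[1] - grid[0]))  # Riemann-sum integral

def rho2(m_sim, m_obs):
    """Mismatch in the number of stars observed."""
    return abs(1.0 - m_sim.size / m_obs.size)

rng = np.random.default_rng(5)
m_obs = np.exp(rng.normal(1.0, 0.5, size=500))  # stand-in "observed" masses
m_sim = np.exp(rng.normal(1.2, 0.5, size=450))  # stand-in simulated masses
print(rho1(m_sim, m_obs), rho2(m_sim, m_obs))
```

Both distances are zero when the simulated and observed samples coincide and grow as the shapes or counts diverge, which is the property the ABC acceptance step needs.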

Simulation study setup:
1. Draw n = 10³ stars
2. IMF slope α = 2.35, with M_min = 2 and M_max = 60
3. N = 10³ particles
4. T = 30 sequential time steps
[Figure: density of the observed mass function, masses roughly 0-15 M_Sun.]

[Figure: kernel density estimates of the IMF (N = 1000, bandwidth 0.4971) and of the observed mass function (N = 510, bandwidth 0.5625).] Sample size = 1000 stars, [C_min, C_max] = [2, 4], σ = 0.25.

Simulation study results (same setup: n = 10³ stars, α = 2.35 with M_min = 2 and M_max = 60, N = 10³ particles, T = 30 sequential time steps).
[Figure: (A) the observed mass function with the posterior median and a 95% credible band on log10(f_M) vs. log10(M); (B) the sequence of ABC posteriors, from the initial through intermediate to the final ABC posterior, closing in on the true posterior around the input value of α.]

Summary:
- ABC can be a useful tool when data are too complex to define a reasonable likelihood, e.g. the stellar initial mass function.
- Selection of good summary statistics is crucial for the ABC posterior to be meaningful.

THANK YOU!

Bibliography

Akeret, J., Refregier, A., Amara, A., Seehars, S., and Hasner, C. (2015), Approximate Bayesian computation for forward modeling in cosmology, Journal of Cosmology and Astroparticle Physics, 2015, 043.
Bate, M. R. (2012), Stellar, brown dwarf and multiple star properties from a radiation hydrodynamical simulation of star cluster formation, Monthly Notices of the Royal Astronomical Society, 419, 3115-3146.
Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., and Robert, C. P. (2009), Adaptive approximate Bayesian computation, Biometrika, 96, 983-990.
Bonassi, F. V. and West, M. (2004), Sequential Monte Carlo with Adaptive Weights for Approximate Bayesian Computation, Bayesian Analysis, 1, 1-19.
Cameron, E. and Pettitt, A. N. (2012), Approximate Bayesian Computation for Astronomical Model Analysis: A Case Study in Galaxy Demographics and Morphological Transformation at High Redshift, Monthly Notices of the Royal Astronomical Society, 425, 44-65.
Chabrier, G. (2003), The galactic disk mass function: Reconciliation of the Hubble Space Telescope and nearby determinations, The Astrophysical Journal Letters, 586, L133.
Ishida, E. E. O., Vitenti, S. D. P., Penna-Lima, M., Cisewski, J., de Souza, R. S., Trindade, A. M. M., Cameron, E., and Busti, V. C. (2015), cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation, Astronomy and Computing, 13, 1-11.
Kroupa, P. (2001), On the variation of the initial mass function, Monthly Notices of the Royal Astronomical Society, 322, 231-246.
Moral, P. D., Doucet, A., and Jasra, A. (2011), An adaptive sequential Monte Carlo method for approximate Bayesian computation, Statistics and Computing, 22, 1009-1020.
Pritchard, J. K., Seielstad, M. T., and Perez-Lezaun, A. (1999), Population Growth of Human Y Chromosomes: A Study of Y Chromosome Microsatellites, Molecular Biology and Evolution, 16, 1791-1798.
Salpeter, E. E. (1955), The luminosity function and stellar evolution, The Astrophysical Journal, 121, 161.
Schafer, C. M. and Freeman, P. E. (2012), Statistical Challenges in Modern Astronomy V, Springer, chap. 1, Lecture Notes in Statistics, pp. 3-19.
Weyant, A., Schafer, C., and Wood-Vasey, W. M. (2013), Likelihood-free cosmological inference with type Ia supernovae: approximate Bayesian computation for a complete treatment of uncertainty, The Astrophysical Journal, 764, 116.