Approximate Bayesian Computation for Astrostatistics
Jessi Cisewski, Department of Statistics, Yale University
SAMSI Undergraduate Workshop, October 24, 2016
Our goals
- Introduction to Bayesian methods: likelihoods, priors, posteriors
- Learn our ABCs... Approximate Bayesian Computation
Image from http://livingstontownship.org/lycsblog/?p=241
Suppose we randomly select a sample of $n$ women and measure their heights: $x_1, x_2, x_3, \dots, x_n$. Let's assume the heights follow a Normal distribution, $X \sim N(\mu, \sigma)$, with density
$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
Stellar Initial Mass Function
Probability density function (pdf): the function $f(x; \mu, \sigma)$, with $\mu$ fixed and $x$ variable:
$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
[Figure: density of heights centered at $\mu$; each observation is a draw from this distribution]
Likelihood: the same function $f(x; \mu, \sigma)$, but with $\mu$ variable and $x$ fixed.
[Figure: likelihood as a function of $\mu$]
Consider a random sample of size $n$ (assuming independence, and a known $\sigma$): $X_1, \dots, X_n \sim N(3, 2)$. Then
$$f(x_1, \dots, x_n; \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$$
[Figure: normalized likelihoods for $n$ = 50, 100, 250, 500, concentrating around $\mu$ as $n$ grows]
Likelihood Principle: all of the information in a sample is contained in the likelihood function, a density or distribution function. The data are modeled by a likelihood function.
How do we infer $\mu$, or any unknown parameter(s) $\theta$?
Maximum likelihood estimation: the parameter value $\hat{\theta}$ that maximizes the likelihood,
$$\hat{\theta} = \arg\max_{\theta} f(x_1, \dots, x_n; \theta)$$
[Figure: likelihood curve with the MLE marked at its peak]
For the Normal model,
$$\max_{\mu} f(x_1, \dots, x_n; \mu, \sigma) = \max_{\mu} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}},$$
and hence $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$.
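As a quick numerical check of the slide above, a grid search over $\mu$ recovers the sample mean as the maximizer of the Normal log-likelihood. This is a minimal sketch; the simulated sample, grid, and seed are illustrative choices, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=500)  # sample from N(mu = 3, sigma = 2)

def log_likelihood(mu, x, sigma=2.0):
    """Log-likelihood of an iid Normal(mu, sigma) sample."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

# Grid search for the maximizer
grid = np.linspace(1.0, 5.0, 4001)
ll = np.array([log_likelihood(mu, x) for mu in grid])
mu_hat = grid[np.argmax(ll)]

print(mu_hat, x.mean())  # the grid maximizer agrees with the sample mean
```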
Bayesian framework
Suppose $\theta$ is our parameter(s) of interest.
- Classical or Frequentist methods for inference consider $\theta$ to be fixed and unknown; the performance of methods is evaluated by repeated sampling, considering all possible data sets.
- Bayesian methods consider $\theta$ to be random, and use only the observed data set and prior information.
Posterior distribution
$$\pi(\theta \mid x) = \frac{\overbrace{f(x \mid \theta)}^{\text{likelihood}}\,\overbrace{\pi(\theta)}^{\text{prior}}}{f(x)} = \frac{f(x \mid \theta)\,\pi(\theta)}{\int_{\Theta} f(x \mid \theta)\,\pi(\theta)\, d\theta} \propto f(x \mid \theta)\,\pi(\theta)$$
- The prior distribution allows you to easily incorporate your beliefs about the parameter(s) of interest.
- The posterior is a distribution on the parameter space given the observed data.
Data: $y_1, \dots, y_4 \sim N(\mu = 3, \sigma = 2)$, $\bar{y} = 1.278$
Prior: $N(\mu_0 = -3, \sigma_0 = 5)$
Posterior: $N(\mu_1 = 1.114, \sigma_1 = 0.981)$
[Figure: likelihood, prior, and posterior densities for $\mu$]
Data: $y_1, \dots, y_4 \sim N(\mu = 3, \sigma = 2)$, $\bar{y} = 1.278$
Prior: $N(\mu_0 = -3, \sigma_0 = 1)$
Posterior: $N(\mu_1 = -0.861, \sigma_1 = 0.707)$
[Figure: likelihood, prior, and posterior densities for $\mu$]
Data: $y_1, \dots, y_{200} \sim N(\mu = 3, \sigma = 2)$, $\bar{y} = 2.999$
Prior: $N(\mu_0 = -3, \sigma_0 = 1)$
Posterior: $N(\mu_1 = 2.881, \sigma_1 = 0.140)$
[Figure: likelihood, prior, and posterior densities for $\mu$]
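The posteriors in these three examples follow from the standard Normal-Normal conjugate update with known $\sigma$. A minimal sketch reproducing the stated values (note: the extracted slides show prior means of 3, but the stated posteriors are only reproduced with $\mu_0 = -3$, suggesting minus signs were lost in extraction):

```python
import numpy as np

def normal_posterior(ybar, n, sigma, mu0, sigma0):
    """Posterior N(mu1, sigma1) for a Normal mean with known sigma
    and a Normal(mu0, sigma0) prior (standard conjugate update)."""
    prec = 1 / sigma0**2 + n / sigma**2                    # posterior precision
    mu1 = (mu0 / sigma0**2 + n * ybar / sigma**2) / prec   # precision-weighted mean
    sigma1 = prec ** -0.5
    return mu1, sigma1

# Values from the slides: n = 4 or 200, sigma = 2, prior mean -3
print(normal_posterior(1.278, 4, 2, -3, 5))    # ~ (1.114, 0.981)
print(normal_posterior(1.278, 4, 2, -3, 1))    # ~ (-0.861, 0.707)
print(normal_posterior(2.999, 200, 2, -3, 1))  # ~ (2.881, 0.140)
```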
Prior distribution
- The prior distribution allows you to easily incorporate your beliefs about the parameter(s) of interest.
- If one has a specific prior in mind, then it fits nicely into the definition of the posterior.
- But how do you go from prior information to a prior distribution?
- And what if you don't actually have prior information?
- And what if you don't know the likelihood??
Approximate Bayesian Computation
A likelihood-free approach to approximating $\pi(\theta \mid x_{obs})$, i.e. $f(x_{obs} \mid \theta)$ is not specified. Proceeds via simulation of the forward process.
Why would we not know $f(x_{obs} \mid \theta)$?
1. Physical model too complex to write as a closed-form function of $\theta$
2. Strong dependency in data
3. Observational limitations
ABC in astronomy: Ishida et al. (2015); Akeret et al. (2015); Weyant et al. (2013); Schafer and Freeman (2012); Cameron and Pettitt (2012)
Basic ABC algorithm
For the observed data $x_{obs}$ and prior $\pi(\theta)$:
1. Sample $\theta_{prop}$ from the prior $\pi(\theta)$
2. Generate $x_{prop}$ from the forward process $f(x \mid \theta_{prop})$
3. Accept $\theta_{prop}$ if $x_{obs} = x_{prop}$
4. Return to step 1
Introduced in Pritchard et al. (1999) (population genetics)
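The four steps above can be sketched directly in Python. The toy model below (a Binomial success count with a Uniform prior) is an illustrative assumption, not from the slides; it is chosen because the data are discrete, so the exact match in step 3 is actually feasible.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy discrete model: x_obs = number of successes in 10 Bernoulli(theta) trials.
x_obs = 7
n_trials = 10

accepted = []
while len(accepted) < 2000:
    theta_prop = rng.uniform(0, 1)                # 1. sample from the prior
    x_prop = rng.binomial(n_trials, theta_prop)   # 2. simulate the forward process
    if x_prop == x_obs:                           # 3. accept if the data match exactly
        accepted.append(theta_prop)               # 4. return to step 1

# With a Uniform(0,1) prior the exact posterior is Beta(8, 4), mean 8/12.
print(np.mean(accepted))
```

Accepted values are exact draws from the posterior here; the next slide explains why exact matching must be relaxed for continuous or high-dimensional data.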
Step 3: Accept $\theta_{prop}$ if $x_{obs} = x_{prop}$
- Waiting for proposals such that $x_{obs} = x_{prop}$ would be computationally prohibitive. Instead, accept proposals with $\rho(x_{obs}, x_{prop}) \le \epsilon$ for some distance $\rho$ and some tolerance threshold $\epsilon$.
- When $x$ is high-dimensional, $\epsilon$ would have to be made too large in order to keep the acceptance probability reasonable. Instead, reduce the dimension by comparing summaries $S(x_{prop})$ and $S(x_{obs})$.
Gaussian illustration
- Data: $x_{obs}$ consists of 25 iid draws from Normal($\mu$, 1)
- Summary statistic: $S(x) = \bar{x}$
- Distance function: $\rho(S(x_{prop}), S(x_{obs})) = |\bar{x}_{prop} - \bar{x}_{obs}|$
- Tolerance: $\epsilon$ = 0.50 and 0.10
- Prior: $\pi(\mu)$ = Normal(0, 10)
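A sketch of this illustration as rejection ABC. The seed and the true $\mu = 0$ are arbitrary choices for the demonstration, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(42)
mu_true = 0.0
x_obs = rng.normal(mu_true, 1.0, size=25)   # 25 iid draws from Normal(mu, 1)

def abc_rejection(x_obs, eps, n_keep=1000):
    """Rejection ABC with summary S(x) = xbar and distance |xbar_prop - xbar_obs|."""
    s_obs = x_obs.mean()
    kept = []
    while len(kept) < n_keep:
        mu_prop = rng.normal(0.0, 10.0)               # prior N(0, 10)
        x_prop = rng.normal(mu_prop, 1.0, size=25)    # forward process
        if abs(x_prop.mean() - s_obs) <= eps:
            kept.append(mu_prop)
    return np.array(kept)

for eps in (0.50, 0.10):
    post = abc_rejection(x_obs, eps)
    print(eps, post.mean(), post.std())
# Shrinking eps tightens the ABC posterior toward the true posterior,
# whose sd here is roughly 1/sqrt(25) = 0.2.
```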
Gaussian illustration: posteriors for $\mu$
[Figure: true posterior, ABC posterior, and input value for tolerances 0.5 and 0.1 ($N$ = 1000, $n$ = 25)]
How to pick a tolerance, $\epsilon$?
Instead of restarting the ABC algorithm with a smaller tolerance $\epsilon$, decrease the tolerance sequentially and use the already-sampled particle system as a proposal distribution, rather than drawing from the prior distribution.
Particle system: (1) the retained sampled values, (2) importance weights.
Beaumont et al. (2009); Moral et al. (2011); Bonassi and West (2004)
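A minimal sequential sketch in the spirit of ABC population Monte Carlo (Beaumont et al., 2009), reusing the Gaussian illustration. The tolerance sequence and the kernel-scale choice (twice the weighted particle variance) are illustrative assumptions, not prescriptions from the slides.

```python
import numpy as np

rng = np.random.default_rng(7)
x_obs = rng.normal(0.0, 1.0, size=25)
s_obs = x_obs.mean()

def npdf(x, m, s):
    """Normal density, used for the prior and the perturbation kernel."""
    return np.exp(-(x - m) ** 2 / (2 * s**2)) / np.sqrt(2 * np.pi * s**2)

def dist(mu):
    """Distance between simulated and observed summary (sample mean)."""
    return abs(rng.normal(mu, 1.0, size=25).mean() - s_obs)

def abc_pmc(eps_seq, n_part=500):
    """Sequential ABC: each step proposes from the weighted previous
    particles perturbed by a Gaussian kernel, with importance weights
    prior(mu) / kernel-mixture(mu)."""
    # First step: plain rejection ABC from the prior N(0, 10).
    parts = []
    while len(parts) < n_part:
        mu = rng.normal(0.0, 10.0)
        if dist(mu) <= eps_seq[0]:
            parts.append(mu)
    parts = np.array(parts)
    wts = np.full(n_part, 1.0 / n_part)
    for eps in eps_seq[1:]:
        tau = np.sqrt(2.0 * np.cov(parts, aweights=wts))  # kernel scale
        new_parts = np.empty(n_part)
        new_wts = np.empty(n_part)
        kept = 0
        while kept < n_part:
            mu = rng.normal(rng.choice(parts, p=wts), tau)  # resample + perturb
            if dist(mu) <= eps:
                new_parts[kept] = mu
                new_wts[kept] = npdf(mu, 0.0, 10.0) / np.sum(wts * npdf(mu, parts, tau))
                kept += 1
        parts, wts = new_parts, new_wts / new_wts.sum()
    return parts, wts

parts, wts = abc_pmc([1.0, 0.5, 0.25, 0.10])
print(np.sum(wts * parts))  # weighted posterior mean, close to x_obs.mean()
```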
Gaussian illustration: sequential posteriors
[Figure: true posterior, ABC posterior, and input value ($N$ = 1000, $n$ = 25)]
Tolerance sequence $\epsilon_{1:10}$: 1.00, 0.75, 0.53, 0.38, 0.27, 0.19, 0.15, 0.11, 0.08, 0.06
Stellar Initial Mass Function
Stellar Initial Mass Function: the distribution of star masses after a star formation event within a specified volume of space.
Molecular cloud → Protostars → Stars
Image: adapted from http://www.astro.ljmu.ac.uk
Broken power law (Kroupa, 2001): $\Phi(M) \propto M^{-\alpha_i}$ for $M_{1i} \le M \le M_{2i}$, with
- $\alpha_1 = 0.3$ for $0.01 \le M/M_{Sun} < 0.08$ [sub-stellar]
- $\alpha_2 = 1.3$ for $0.08 \le M/M_{Sun} < 0.50$
- $\alpha_3 = 2.3$ for $0.50 \le M/M_{Sun} \le M_{max}$
Many other models, e.g. Salpeter (1955); Chabrier (2003).
(1 $M_{Sun}$ = 1 solar mass, the mass of our Sun)
ABC for the Stellar Initial Mass Function
[Figure: histogram of stellar masses; image (left) adapted from http://www.astro.ljmu.ac.uk]
IMF likelihood
Start with a power-law distribution: each star's mass is independently drawn from a power law with density
$$f(m) = \frac{1-\alpha}{M_{max}^{1-\alpha} - M_{min}^{1-\alpha}}\, m^{-\alpha}, \quad m \in (M_{min}, M_{max})$$
Then the likelihood is
$$L(\alpha \mid m_{1:n_{tot}}) = \left(\frac{1-\alpha}{M_{max}^{1-\alpha} - M_{min}^{1-\alpha}}\right)^{n_{tot}} \prod_{i=1}^{n_{tot}} m_i^{-\alpha}$$
where $n_{tot}$ = total number of stars in the cluster.
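This truncated power law can be simulated by inverse-CDF sampling, and the likelihood above maximized numerically. A sketch (sample size, grid, and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_powerlaw(n, alpha, m_min, m_max):
    """Inverse-CDF draws from f(m) ∝ m^(-alpha) on (m_min, m_max);
    valid for alpha != 1."""
    a = 1.0 - alpha
    u = rng.uniform(size=n)
    return (m_min**a + u * (m_max**a - m_min**a)) ** (1.0 / a)

def log_likelihood(alpha, m, m_min, m_max):
    """Log of the truncated power-law likelihood L(alpha | m_1:n)."""
    a = 1.0 - alpha
    return m.size * np.log(a / (m_max**a - m_min**a)) - alpha * np.log(m).sum()

m = sample_powerlaw(100_000, 2.35, 2.0, 60.0)
grid = np.linspace(1.5, 3.5, 401)
alpha_hat = grid[np.argmax([log_likelihood(al, m, 2.0, 60.0) for al in grid])]
print(alpha_hat)  # recovers the input slope, ~2.35
```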
Observational limitations: aging
The lifecycle of a star depends on its mass; more massive stars die faster. For a cluster of age $\tau$ Myr, we only observe stars with masses below $T_{age} \approx \tau^{-2/5}\, 10^{8/5}$. If age = 30 Myr, the aging cutoff is $T_{age} \approx 10\, M_{Sun}$.
Then the likelihood is
$$L(\alpha \mid m_{1:n_{obs}}, n_{tot}) = \left(\frac{1-\alpha}{T_{age}^{1-\alpha} - M_{min}^{1-\alpha}}\right)^{n_{obs}} \left(\prod_{i=1}^{n_{obs}} m_i^{-\alpha}\right) P(M > T_{age})^{n_{tot} - n_{obs}}$$
where $n_{tot}$ = # of stars in the cluster and $n_{obs}$ = # of stars observed in the cluster.
Image: http://scioly.org
Observational limitations: completeness
Completeness function (the probability of observing a particular star given its mass):
$$P(\text{observing star} \mid m) = \begin{cases} 0, & m < C_{min} \\ \dfrac{m - C_{min}}{C_{max} - C_{min}}, & m \in [C_{min}, C_{max}] \\ 1, & m > C_{max} \end{cases}$$
Depends on the flux limit, stellar crowding, etc.
Image: NASA, J. Trauger (JPL), J. Westphal (Caltech)
Observational limitations: measurement error
Incorporating log-normal measurement error gives our final likelihood:
$$L(\alpha \mid m_{1:n_{obs}}, n_{tot}) = \left( P(M > T_{age}) + \int \frac{1-\alpha}{M_{max}^{1-\alpha} - M_{min}^{1-\alpha}}\, m^{-\alpha} \left(1 - \frac{m - C_{min}}{C_{max} - C_{min}}\right) dm \right)^{n_{tot} - n_{obs}}$$
$$\times \prod_{i=1}^{n_{obs}} \left\{ \int_{C_{min}}^{T_{age}} (2\pi\sigma^2)^{-1/2}\, m_i^{-1}\, e^{-\frac{1}{2\sigma^2}(\log(m_i) - \log(M))^2} \left( I\{M > C_{max}\} + \frac{M - C_{min}}{C_{max} - C_{min}}\, I\{C_{min} \le M \le C_{max}\} \right) \frac{1-\alpha}{M_{max}^{1-\alpha} - M_{min}^{1-\alpha}}\, M^{-\alpha}\, dM \right\}$$
IMF with aging, completeness, and measurement error
[Figure: mass function before (left) and after (right) applying the observational effects]
Sample size = 1000 stars, $[C_{min}, C_{max}] = [2, 4]$, $\sigma = 0.25$
Simulation Study: forward model
Draw from $f(m) = \dfrac{1-\alpha}{60^{1-\alpha} - 2^{1-\alpha}}\, m^{-\alpha}$, $m \in (2, 60)$, aged 30 Myr.
Observational completeness:
$$P(\text{obs} \mid m) = \begin{cases} 0, & m < 2 \\ \dfrac{m-2}{2}, & m \in [2, 4] \\ 1, & m > 4 \end{cases}$$
Uncertainty: $\log M = \log m + 0.25\eta$ (with $\eta \sim N(0, 1)$)
Prior: $\alpha \sim U[0, 6]$
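A sketch of this forward process. The aging cutoff of 10 $M_{Sun}$ corresponds to the 30 Myr age from the aging slide; the function and parameter names are mine, and the defaults mirror the simulation-study settings.

```python
import numpy as np

rng = np.random.default_rng(11)

def forward_model(alpha, n=1000, m_min=2.0, m_max=60.0,
                  t_age=10.0, c_min=2.0, c_max=4.0, sigma=0.25):
    """Sketch of the simulation-study forward process: power-law masses,
    aging cutoff, linear completeness, log-normal measurement error."""
    # 1. True masses from f(m) ∝ m^(-alpha) on (m_min, m_max)
    a = 1.0 - alpha
    u = rng.uniform(size=n)
    m = (m_min**a + u * (m_max**a - m_min**a)) ** (1.0 / a)
    # 2. Aging: stars above the cutoff mass have already died
    m = m[m < t_age]
    # 3. Completeness: P(obs | m) ramps linearly over [c_min, c_max]
    p_obs = np.clip((m - c_min) / (c_max - c_min), 0.0, 1.0)
    m = m[rng.uniform(size=m.size) < p_obs]
    # 4. Log-normal measurement error: log M = log m + sigma * eta
    return m * np.exp(sigma * rng.standard_normal(m.size))

m_obs = forward_model(alpha=2.35)
print(m_obs.size, m_obs.min(), m_obs.max())
```

Each call returns an "observed" cluster of roughly half the simulated stars, consistent with the observed mass functions shown on the following slides.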
Simulation Study: summary statistics
We want to account for the following with our summary statistics and distance functions:
1. Shape of the observed mass function:
$$\rho_1(m_{sim}, m_{obs}) = \left[ \int \left\{ \hat{f}_{\log m_{sim}}(x) - \hat{f}_{\log m_{obs}}(x) \right\}^2 dx \right]^{1/2}$$
2. Number of stars observed:
$$\rho_2(m_{sim}, m_{obs}) = \left| 1 - n_{sim}/n_{obs} \right|$$
where $m_{sim}$ = masses of the stars simulated from the forward model, $m_{obs}$ = masses of the observed stars, $n_{sim}$ = number of stars simulated from the forward model, and $n_{obs}$ = number of observed stars.
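Both distances can be sketched directly: $\rho_1$ as an L2 distance between Gaussian kernel density estimates of the log-masses on a fixed grid, and $\rho_2$ on the counts. The bandwidth rule, grid, and test samples below are illustrative assumptions.

```python
import numpy as np

def rho1(m_sim, m_obs, grid=np.linspace(-1.0, 3.0, 400)):
    """L2 distance between Gaussian-kernel density estimates
    of the log-masses (a sketch of the shape distance rho_1)."""
    def kde(logm, x):
        h = 1.06 * logm.std() * logm.size ** (-0.2)   # rule-of-thumb bandwidth
        k = np.exp(-0.5 * ((x[:, None] - logm) / h) ** 2)
        return k.mean(axis=1) / (h * np.sqrt(2 * np.pi))
    diff = kde(np.log(m_sim), grid) - kde(np.log(m_obs), grid)
    dx = grid[1] - grid[0]
    return np.sqrt(np.sum(diff**2) * dx)

def rho2(m_sim, m_obs):
    """Distance on the number of stars observed."""
    return abs(1.0 - m_sim.size / m_obs.size)

rng = np.random.default_rng(5)
a = rng.lognormal(1.0, 0.3, 500)
b = rng.lognormal(1.0, 0.3, 480)   # same shape as a
c = rng.lognormal(1.5, 0.3, 500)   # shifted shape
print(rho1(a, b), rho1(a, c))      # similar samples give a smaller rho1
print(rho2(a, b))                  # |1 - 500/480|
```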
Simulation Study
1. Draw $n = 10^3$ stars
2. IMF slope $\alpha = 2.35$ with $M_{min} = 2$ and $M_{max} = 60$
3. $N = 10^3$ particles
4. $T = 30$ sequential time steps
[Figure: density of the observed mass function]
[Figure: true IMF density (left, $N$ = 1000) and observed mass function (right, $N$ = 510)]
Sample size = 1000 stars, $[C_{min}, C_{max}] = [2, 4]$, $\sigma = 0.25$
Simulation Study results
(Setup: $n = 10^3$ stars, IMF slope $\alpha = 2.35$ with $M_{min} = 2$ and $M_{max} = 60$, $N = 10^3$ particles, $T = 30$ sequential time steps)
[Figure: (A) observed mass function with posterior median and 95% credible band; (B) initial, intermediate, and final ABC posteriors compared with the true posterior and input value]
Summary
- ABC can be a useful tool when data are too complex to define a reasonable likelihood, e.g. the stellar initial mass function.
- Selection of good summary statistics is crucial for the ABC posterior to be meaningful.
THANK YOU!
Bibliography
- Akeret, J., Refregier, A., Amara, A., Seehars, S., and Hasner, C. (2015), "Approximate Bayesian computation for forward modeling in cosmology," Journal of Cosmology and Astroparticle Physics, 2015, 043.
- Bate, M. R. (2012), "Stellar, brown dwarf and multiple star properties from a radiation hydrodynamical simulation of star cluster formation," Monthly Notices of the Royal Astronomical Society, 419, 3115–3146.
- Beaumont, M. A., Cornuet, J.-M., Marin, J.-M., and Robert, C. P. (2009), "Adaptive approximate Bayesian computation," Biometrika, 96, 983–990.
- Bonassi, F. V. and West, M. (2004), "Sequential Monte Carlo with Adaptive Weights for Approximate Bayesian Computation," Bayesian Analysis, 1, 1–19.
- Cameron, E. and Pettitt, A. N. (2012), "Approximate Bayesian Computation for Astronomical Model Analysis: A Case Study in Galaxy Demographics and Morphological Transformation at High Redshift," Monthly Notices of the Royal Astronomical Society, 425, 44–65.
- Chabrier, G. (2003), "The galactic disk mass function: Reconciliation of the Hubble Space Telescope and nearby determinations," The Astrophysical Journal Letters, 586, L133.
- Ishida, E. E. O., Vitenti, S. D. P., Penna-Lima, M., Cisewski, J., de Souza, R. S., Trindade, A. M. M., Cameron, E., and Busti, V. C. (2015), "cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation," Astronomy and Computing, 13, 1–11.
- Kroupa, P. (2001), "On the variation of the initial mass function," Monthly Notices of the Royal Astronomical Society, 322, 231–246.
- Moral, P. D., Doucet, A., and Jasra, A. (2011), "An adaptive sequential Monte Carlo method for approximate Bayesian computation," Statistics and Computing, 22, 1009–1020.
- Pritchard, J. K., Seielstad, M. T., and Perez-Lezaun, A. (1999), "Population Growth of Human Y Chromosomes: A Study of Y Chromosome Microsatellites," Molecular Biology and Evolution, 16, 1791–1798.
- Salpeter, E. E. (1955), "The luminosity function and stellar evolution," The Astrophysical Journal, 121, 161.
- Schafer, C. M. and Freeman, P. E. (2012), "Statistical Challenges in Modern Astronomy V," Springer, Lecture Notes in Statistics, chap. 1, pp. 3–19.
- Weyant, A., Schafer, C., and Wood-Vasey, W. M. (2013), "Likelihood-free cosmological inference with type Ia supernovae: approximate Bayesian computation for a complete treatment of uncertainty," The Astrophysical Journal, 764, 116.