Ages of stellar populations from color-magnitude diagrams. Paul Baines, Department of Statistics, Harvard University. September 30, 2008


Context & Example Welcome! Today we will look at using hierarchical Bayesian modeling to make inferences about the properties of stars, most notably the ages and masses of groups of stars, complete with a brief dummies' (statisticians') guide to the astronomy behind it.

Isochrones For Dummies/Statisticians Given the mass, age and metallicity of a star, we know what its ideal observation should be, i.e., where it should lie on the color-magnitude diagram (CMD). The tables of these ideal observations are called isochrone tables. Why are they only ideal colours/magnitudes?

Observational Error Alas, as with every experiment, there are observational errors and biases caused by the instruments.
1. These are relatively well understood and can be considered Gaussian with known standard deviation.
2. Importantly, we can characterize the standard deviation as a function of the observed data, i.e., given $Y_i = (Y_{iB}, Y_{iV}, Y_{iI})^T$ we have $\sigma_i = \sigma(Y_i)$.
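
As an illustration of such a mapping, here is a minimal sketch that interpolates a per-band error table to obtain $\sigma_i = \sigma(Y_i)$; the table values are made up, and the real mapping would come from the instrument calibration.

```python
import numpy as np

# Hypothetical calibration table: photometric error vs. observed magnitude, per band.
mag_grid = np.array([18.0, 20.0, 22.0, 24.0])
sigma_grid = {"B": [0.01, 0.02, 0.05, 0.12],
              "V": [0.01, 0.02, 0.04, 0.10],
              "I": [0.02, 0.03, 0.06, 0.15]}

def sigma_of_y(y):
    """Map observed magnitudes y = (B, V, I) to their known standard deviations."""
    return np.array([np.interp(y[k], mag_grid, sigma_grid[band])
                     for k, band in enumerate("BVI")])

print(sigma_of_y(np.array([21.0, 20.5, 19.8])))
```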

The Observed Data We observe (depending on the experiment) p different colours/magnitudes for n stars. Although it is equally straightforward to model colours (U-B, B-V, etc.) and magnitudes (B, V, etc.), we will stick with magnitudes. The (known) standard deviations in each band are also recorded for each observation. We also record that we observed these n stars in the dataset and that we didn't observe any others!

The Likelihood I
$$
y_i = \begin{pmatrix} B_i / \sigma_i^{(B)} \\ V_i / \sigma_i^{(V)} \\ I_i / \sigma_i^{(I)} \end{pmatrix} \,\Bigg|\, A_i, M_i, Z \;\sim\; N\!\left(f_i, R\right), \qquad i = 1, \ldots, n \tag{1}
$$
where
$$
f_i = \begin{pmatrix} f_B(A_i, M_i, Z) / \sigma_{Bi} \\ f_V(A_i, M_i, Z) / \sigma_{Vi} \\ f_I(A_i, M_i, Z) / \sigma_{Ii} \end{pmatrix}, \qquad
R = \begin{pmatrix} 1 & \rho^{(BV)} & \rho^{(BI)} \\ \rho^{(BV)} & 1 & \rho^{(VI)} \\ \rho^{(BI)} & \rho^{(VI)} & 1 \end{pmatrix}.
$$
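
As an illustration of evaluating the likelihood in (1), here is a minimal Python/NumPy sketch for a single star. The function isochrone_magnitudes and all numerical values are hypothetical stand-ins (the real mapping comes from the isochrone tables), not the author's code.

```python
import numpy as np
from scipy.stats import multivariate_normal

def star_loglik(y_scaled, age, mass, Z, sigma, R, isochrone_magnitudes):
    """Log-likelihood of one star's error-scaled magnitudes y_scaled = y / sigma.

    isochrone_magnitudes(age, mass, Z) is assumed to return the ideal
    (B, V, I) magnitudes via an isochrone table lookup/interpolation.
    """
    ideal = isochrone_magnitudes(age, mass, Z)   # ideal (B, V, I) magnitudes
    f_i = ideal / sigma                          # scale each band by its known sigma
    return multivariate_normal.logpdf(y_scaled, mean=f_i, cov=R)

# Toy usage with made-up numbers and a flat "isochrone":
fake_iso = lambda age, mass, Z: np.array([5.1, 4.8, 4.5])
sigma = np.array([0.05, 0.04, 0.06])
R = np.eye(3)
y_scaled = fake_iso(9.2, 1.0, 4) / sigma + np.random.normal(size=3)
print(star_loglik(y_scaled, age=9.2, mass=1.0, Z=4, sigma=sigma, R=R,
                  isochrone_magnitudes=fake_iso))
```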

The Likelihood II Let $S_i = 1$ if star $i$ is observed, $S_i = 0$ otherwise. Then
$$
S_i \mid Y_i \sim \text{Bernoulli}\left(p(Y_i)\right) \tag{2}
$$
where $p(Y_i)$ is the probability that a star of the given magnitudes is observed (provided by astronomers). Note: we can also take $S_i = (S_{iB}, S_{iV}, S_{iI})^T$ and allow some stars to be observed in only a subset of the bands.

The Parameters

Mass Before we have any data, the prior distributions of mass and age are not independent: we know a priori that old stars cannot have large masses, and likewise for very young stars. Hence, we specify the prior on mass conditional on age:
$$
p\left(M_i \mid A_i, M_{\min}, M_{\max}(A_i), \alpha\right) \;\propto\; \frac{1}{M_i^{\alpha}}\, \mathbf{1}\left\{M_i \in [M_{\min}, M_{\max}(A_i)]\right\} \tag{3}
$$
i.e., $M_i \mid A_i, M_{\min}, M_{\max}(A_i), \alpha \sim$ Truncated-Pareto.
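
A minimal sketch of drawing from this truncated-Pareto prior by inverting its CDF on $[M_{\min}, M_{\max}(A_i)]$; the helper m_max_of_age and the numbers are hypothetical.

```python
import numpy as np

def sample_truncated_pareto(alpha, m_min, m_max, size=1, rng=None):
    """Draw masses with density proportional to m**(-alpha) on [m_min, m_max].

    Uses inverse-CDF sampling; assumes alpha != 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=size)
    a, b = m_min ** (1.0 - alpha), m_max ** (1.0 - alpha)
    return (a + u * (b - a)) ** (1.0 / (1.0 - alpha))

# Hypothetical age-dependent upper mass limit (older clusters have a smaller turnoff mass).
m_max_of_age = lambda log10_age: 10.0 ** (0.4 * (10.0 - log10_age))
masses = sample_truncated_pareto(alpha=2.5, m_min=0.8, m_max=m_max_of_age(9.2), size=5)
print(masses)
```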

Age For age we assume the following hierarchical structure:
$$
A_i \mid \mu_A, \sigma_A^2 \overset{\text{iid}}{\sim} N\left(\mu_A, \sigma_A^2\right) \tag{4}
$$
where $A_i = \log_{10}(\text{Age}_i)$, with $\mu_A$ and $\sigma_A^2$ hyperparameters...

Metallicity Denoted by $Z_i$. Assumed to be known and common to all stars, i.e., $Z_i = Z = 4$.

Hyperparameters Next, we model the hyperparameters with the simple conjugate form:
$$
\mu_A \mid \sigma_A^2 \sim N\!\left(\mu_0, \frac{\sigma_A^2}{\kappa_0}\right), \qquad \sigma_A^2 \sim \text{Inv-}\chi^2\left(\nu_0, \sigma_0^2\right) \tag{5}
$$
where $\mu_0$, $\kappa_0$, $\nu_0$ and $\sigma_0^2$ are fixed by the user to represent prior knowledge (or lack thereof).
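
Because this prior is conjugate, the Gibbs update for $(\mu_A, \sigma_A^2)$ given the current ages has a closed form. Below is a minimal sketch of that standard Normal / scaled-inverse-$\chi^2$ draw; it illustrates the conjugacy and is not taken from the author's implementation.

```python
import numpy as np

def draw_mu_sigma2(ages, mu0, kappa0, nu0, sigma0_sq, rng=None):
    """One Gibbs draw of (mu_A, sigma_A^2) from the conjugate
    Normal / scaled-inverse-chi^2 posterior given the current ages."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(ages)
    abar = np.mean(ages)
    s_sq = np.var(ages, ddof=1) if n > 1 else 0.0

    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * abar) / kappa_n
    nu_n = nu0 + n
    nu_sigma_n = (nu0 * sigma0_sq + (n - 1) * s_sq
                  + kappa0 * n / kappa_n * (abar - mu0) ** 2)

    sigma2 = nu_sigma_n / rng.chisquare(nu_n)          # sigma_A^2 | ages
    mu = rng.normal(mu_n, np.sqrt(sigma2 / kappa_n))   # mu_A | sigma_A^2, ages
    return mu, sigma2
```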

Correlation We assume a uniform prior over the space of positive-definite correlation matrices. This isn't quite uniform on each of $\rho^{(BV)}$, $\rho^{(BI)}$ and $\rho^{(VI)}$, but it is very close.

Incompleteness Unfortunately, some dimmer stars may not be fully observed. This censoring can bias conclusions about the stellar cluster parameters. Since magnitudes are functions of photon arrivals, the censoring is stochastic.

Putting it all together
$$
\begin{aligned}
S_{ij} \mid Y_i &\sim \text{Bernoulli}\left(p(Y_i)\right), \qquad i = 1, \ldots, n, n+1, \ldots, n + n_{\text{mis}}, \quad j \in \{B, V, I\} \\
y_i \mid A_i, M_i, Z &\sim N\left(f_i, R\right), \qquad i = 1, \ldots, n, n+1, \ldots, n + n_{\text{mis}} \\
M_i \mid A_i, M_{\min}, \alpha &\sim \text{Truncated-Pareto}\left(\alpha - 1, M_{\min}, M_{\max}(A_i)\right) \\
A_i \mid \mu_A, \sigma_A^2 &\overset{\text{iid}}{\sim} N\left(\mu_A, \sigma_A^2\right) \\
\mu_A \mid \sigma_A^2 &\sim N\!\left(\mu_0, \frac{\sigma_A^2}{\kappa_0}\right), \qquad \sigma_A^2 \sim \text{Inv-}\chi^2\left(\nu_0, \sigma_0^2\right) \\
p(R) &\propto \mathbf{1}\{R \text{ p.d.}\}
\end{aligned}
$$
where $y_i$ is the error-scaled magnitude vector from (1).

Observed-Data Posterior The product of the densities on the previous slide gives us the complete-data posterior. Alas, we don't observe all the stars, and $n_{\text{mis}}$ is an unknown parameter. For now, let's just condition on $n_{\text{mis}}$. We have:
$$
\begin{aligned}
W_{\text{obs}} &= \left\{ n,\; y_{[1:n]} = (y_1, \ldots, y_n),\; S = (1, \ldots, 1, 0, \ldots, 0) \right\} \\
W_{\text{mis}} &= \left\{ m,\; Y_{[(n+1):(n+m)]},\; M_{[(n+1):(n+m)]},\; A_{[(n+1):(n+m)]} \right\} \\
\Theta &= \left\{ M_{[1:n]},\; A_{[1:n]},\; \mu_A,\; \sigma_A^2,\; R \right\}
\end{aligned}
$$
where $X_{a:b}$ denotes the vector $(X_a, X_{a+1}, \ldots, X_b)$.

Observed-Data Posterior We want $p(\Theta \mid W_{\text{obs}})$ but so far we have $p(\Theta, W_{\text{mis}} \mid W_{\text{obs}})$. So, we integrate out the missing data:
$$
p\left(\Theta \mid W_{\text{obs}}\right) = \int p\left(\Theta, W_{\text{mis}} \mid W_{\text{obs}}\right) dW_{\text{mis}} \tag{6}
$$
In practice, this integration is done by sampling from $p(\Theta, W_{\text{mis}} \mid W_{\text{obs}})$ and retaining only the samples of $\Theta$.

Observed-Data Posterior We form a Gibbs sampler to sample from $p(\Theta, W_{\text{mis}} \mid W_{\text{obs}})$. Given a current state of our Markov chain, $\Theta = \Theta^{(t)}$ and $W_{\text{mis}} = W_{\text{mis}}^{(t)}$:
1. Draw $\Theta^{(t+1)}$ from $p\left(\Theta \mid W_{\text{mis}}^{(t)}, W_{\text{obs}}\right)$ (as before)
2. Draw $W_{\text{mis}}^{(t+1)}$ from $p\left(W_{\text{mis}} \mid \Theta^{(t+1)}, W_{\text{obs}}\right)$ (new)
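
A minimal sketch of this two-block sweep, assuming hypothetical helpers draw_theta and draw_w_mis for the two conditional draws; it shows only the control flow, not the actual conditionals.

```python
def gibbs_sampler(w_obs, theta0, w_mis0, n_iter, draw_theta, draw_w_mis):
    """Alternate the two conditional draws and keep only the theta samples,
    which (after burn-in) approximates integrating W_mis out as in (6)."""
    theta, w_mis = theta0, w_mis0
    theta_samples = []
    for t in range(n_iter):
        theta = draw_theta(w_mis, w_obs)    # step 1: Theta | W_mis, W_obs
        w_mis = draw_w_mis(theta, w_obs)    # step 2: W_mis | Theta, W_obs
        theta_samples.append(theta)
    return theta_samples
```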

Sampling W mis At each iteration of the Gibbs sampler we need to draw the missing data from the appropriate distribution. In other words, given masses, ages, and metallicities for the $n_{\text{mis}}$ missing stars, we need to find $Y_i$'s that are consistent with them:
$$
p_i\left(Y_i \mid Y_{[1:n]}, M, A, \mu_A, \sigma_A^2\right) \;\propto\; \left[1 - \pi(Y_i)\right] \exp\left\{ -\tfrac{1}{2}\left(Y_i - f(Y_i; M_i, A_i, Z)\right)^T R^{-1} \left(Y_i - f(Y_i; M_i, A_i, Z)\right) \right\} \tag{7}
$$
for $i = n+1, \ldots, n+m$.
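
Since the density in (7) has no standard form, one natural option is a Metropolis step for each missing star's magnitudes. The sketch below assumes a random-walk proposal and hypothetical helpers pi_obs (the detection probability $\pi$) and iso_f (the error-scaled isochrone prediction $f$); it is an illustration, not necessarily the sampler used in the talk.

```python
import numpy as np

def mh_draw_missing_y(y_curr, mass, age, Z, R_inv, pi_obs, iso_f,
                      step=0.05, rng=None):
    """One random-walk Metropolis update for a missing star's (scaled) magnitudes,
    targeting the density in (7):
    [1 - pi_obs(y)] * exp(-0.5 * (y - iso_f(y, ...))' R^{-1} (y - iso_f(y, ...)))."""
    rng = np.random.default_rng() if rng is None else rng

    def log_target(y):
        resid = y - iso_f(y, mass, age, Z)
        return np.log(1.0 - pi_obs(y)) - 0.5 * resid @ R_inv @ resid

    y_prop = y_curr + step * rng.normal(size=y_curr.shape)
    if np.log(rng.uniform()) < log_target(y_prop) - log_target(y_curr):
        return y_prop
    return y_curr
```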

Sampling W mis Once we have sampled a new set of $Y_{\text{mis}}$, we need to sample the standard deviation of the Gaussian error for those stars. Here we assume this is a deterministic mapping: $\sigma_i = \sigma(Y_i)$.

The Algorithm Sampling from the posterior Some notes:
1. We have our model; what does our posterior look like?
2. Ugly. There is no chance of working with it analytically: MCMC!
3. It is going to have to be a Gibbs sampler. How do we break it up?
4. Mass and age are going to be extremely highly correlated (i.e., sample them jointly).
5. There is no analytic simplification for the terms in $M_i$, $A_i$ because of $f$.
6. The posterior is high-dimensional and multi-modal, so we also use parallel tempering.

The Algorithm Parallel Tempering A brief overview of parallel tempering: the framework involves sampling $N$ chains, with the $i$th chain targeting
$$
p_i(\theta) = p(\theta \mid y)^{1/t_i} \propto \exp\left\{ -\frac{H(\theta)}{t_i} \right\}, \qquad H(\theta) = -\log p(\theta \mid y) \tag{9}
$$
As $t_i$ increases, the target distributions become flatter.
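
A minimal sketch of the parallel-tempering idea for a generic log-posterior log_post: each chain targets the flattened density in (9), and adjacent chains occasionally swap states. Temperatures, step sizes, and the swap scheme here are illustrative choices, not the settings used in the talk.

```python
import numpy as np

def parallel_tempering(log_post, theta_init, temps, n_iter, step=0.1, rng=None):
    """Run one tempered chain per temperature; propose within-chain moves on the
    flattened target log_post(theta)/t, then attempt adjacent-chain swaps."""
    rng = np.random.default_rng() if rng is None else rng
    chains = [np.array(theta_init, dtype=float) for _ in temps]
    cold_samples = []
    for _ in range(n_iter):
        # Within-chain random-walk Metropolis at each temperature.
        for k, t in enumerate(temps):
            prop = chains[k] + step * np.sqrt(t) * rng.normal(size=chains[k].shape)
            if np.log(rng.uniform()) < (log_post(prop) - log_post(chains[k])) / t:
                chains[k] = prop
        # Attempt a swap between a random pair of adjacent temperatures.
        k = rng.integers(len(temps) - 1)
        log_r = (log_post(chains[k + 1]) - log_post(chains[k])) * (1 / temps[k] - 1 / temps[k + 1])
        if np.log(rng.uniform()) < log_r:
            chains[k], chains[k + 1] = chains[k + 1].copy(), chains[k].copy()
        cold_samples.append(chains[0].copy())   # the t = 1 chain targets the posterior
    return np.array(cold_samples)
```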

Does it work? Simulation Results We simulate 100 datasets from the model with $n = 100$: $\mu_A = 9.2$, $\sigma_A^2 = 0.012$, $M_{\min} = 0.8$, $\alpha = 2.5$, $R = I$, and $(\sigma_{Bi}, \sigma_{Vi}, \sigma_{Ii}) \in (0.03, 0.12)$.

[Figure: mu_age posterior medians vs. truth; Median(mu_age) plotted against True(mu_age), both axes spanning roughly 8.9 to 9.5.]

Does it work? Nominal posterior probability (Post_p, %) vs. actual coverage (%):
Post_p    0.5   1.0   2.5   5.0   25.0   50.0
m_cover   0.6   1.2   2.8   6.0   25.0   49.1
a_cover   0.4   1.1   2.8   6.3   25.1   50.4
mu_age    3.0   3.0   5.0   6.0   30.0   55.0
ss_age    0.0   0.0   3.0   4.0   26.0   47.0

Does it work? (continued)
Post_p   50.0   75.0   95.0   97.5   99.0   99.5
m_cover  49.1   74.1   94.4   96.6   98.5   99.2
a_cover  50.4   75.3   93.7   97.2   99.0   99.3
mu_age   55.0   81.0   97.0   99.0  100.0  100.0
ss_age   47.0   71.0   94.0   97.0  100.0  100.0
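
One plausible reading of these tables is that each column is a nominal posterior quantile and each entry is the percentage of simulated datasets in which the true value falls below that quantile. Under that assumption, here is a minimal sketch of the calculation; the array names are hypothetical.

```python
import numpy as np

def quantile_calibration(posterior_draws, truths, probs):
    """posterior_draws: (n_datasets, n_draws) posterior samples of one parameter.
    truths: (n_datasets,) true values used to simulate each dataset.
    probs: nominal posterior quantiles, e.g. [0.005, 0.025, 0.5, 0.975].
    Returns, for each prob, the fraction of datasets in which the true value
    falls below the corresponding posterior quantile (should match prob)."""
    actual = []
    for p in probs:
        q = np.quantile(posterior_draws, p, axis=1)
        actual.append(np.mean(truths <= q))
    return np.array(actual)

# e.g. quantile_calibration(mu_age_draws, mu_age_true, probs=[0.005, 0.25, 0.5, 0.95])
```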

[Figure: Nominal vs. actual coverage (0 to 100) for the targets m_i, a_i, mu_age and ss_age.]

Future Work & Conclusions Some important things still need to be built into the model before it is fit for purpose:
Extinction/Absorption: a shift in the observed data.
Multi-Cluster Models: allow for multiple stellar clusters.