
Astronomical Data Analysis I
Dr Martin Hendry, Dept of Physics and Astronomy, University of Glasgow, UK
10 lectures, beginning October 2006

Bayes' theorem:

p(x \mid y, I) = \frac{p(y \mid x, I)\, p(x \mid I)}{p(y \mid I)}

4. Monte Carlo Methods

In many data analysis problems it is useful to create mock datasets in order to test models and explore possible systematic errors, e.g. mock galaxy catalogues (from Cole et al. 1998). We need methods for generating random variables, i.e. samples of numbers which behave as if they are drawn from some particular pdf (e.g. uniform, Gaussian, Poisson). We call these Monte Carlo methods.

4.1 Uniform random numbers

Generating uniform random numbers, drawn from the pdf U[0,1], is fairly easy: any scientific calculator has a RAN function. Better examples of U[0,1] random number generators can be found in Numerical Recipes (http://www.numerical-recipes.com/).

In what sense are they better? Algorithms only generate pseudo-random numbers: very long (deterministic) sequences of numbers which are approximately random, i.e. with no discernible pattern. The better the RNG, the better it approximates U[0,1].
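As a minimal sketch of what "pseudo-random" means in practice, here is a simple linear congruential generator in Python (an illustrative toy, not one of the Numerical Recipes routines; the constants are commonly quoted values and are an assumption here):

```python
def lcg(seed, n, a=1664525, c=1013904223, m=2**32):
    """Toy linear congruential generator: n pseudo-random floats in [0, 1).

    The sequence x -> (a*x + c) mod m is fully deterministic and eventually
    repeats, but for a good choice of (a, c, m) it approximates U[0,1].
    """
    x = seed
    out = []
    for _ in range(n):
        x = (a * x + c) % m
        out.append(x / m)
    return out

print(lcg(seed=42, n=5))   # same seed -> same 'random' sequence every time
```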

We can test pseudo-random numbers for randomness in several ways:

(a) Histogram of sampled values. We can use hypothesis tests to see if the sample is consistent with the pdf we are trying to model, e.g. a chi-squared test applied to the numbers in each histogram bin:

\chi^2 = \sum_{i=1}^{n_{\rm bin}} \frac{\left(n_i^{\rm obs} - n_i^{\rm pred}\right)^2}{\sigma_i^2}    (4.1)

We assume the bin number counts are subject to Poisson fluctuations, so that \sigma_i^2 = n_i^{\rm pred}. Note: the number of degrees of freedom is n_{\rm bin} - 1, since we know the total sample size.
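A sketch of the histogram test of eq. (4.1), assuming uniform samples and equal-width bins (numpy is used here purely for convenience):

```python
import numpy as np

def chi_squared_uniformity(x, n_bin=20):
    """Eq. (4.1) for samples x that are supposed to be drawn from U[0,1].

    sigma_i^2 is set to the predicted (Poisson) count per bin;
    degrees of freedom = n_bin - 1, since the total sample size is fixed.
    """
    n_obs, _ = np.histogram(x, bins=n_bin, range=(0.0, 1.0))
    n_pred = len(x) / n_bin                       # expected count per bin
    chi2 = np.sum((n_obs - n_pred) ** 2 / n_pred)
    return chi2, n_bin - 1

rng = np.random.default_rng(1)
chi2, dof = chi_squared_uniformity(rng.random(10000))
print(chi2, dof)   # chi2 comparable to dof => consistent with U[0,1]
```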

(b) Correlations between neighbouring pseudo-random numbers. Sequential patterns in the sampled values would show up as structure in the phase portraits: scatterplots of the i-th value against the (i+1)-th value, etc.

We can compute the auto-correlation function

\rho(j) = \frac{\left\langle (x_i - \bar{x})(x_{i+j} - \bar{x}) \right\rangle}{\sqrt{\left\langle (x_i - \bar{x})^2 \right\rangle \left\langle (x_{i+j} - \bar{x})^2 \right\rangle}}    (4.2)

where j is known as the lag. If the sequence is uniformly random, we expect \rho(j) = 1 for j = 0 and \rho(j) = 0 otherwise.
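A sketch of the lag-j autocorrelation of eq. (4.2), applied to a supposedly uniform sequence (the estimator below uses the global sample mean; it is an illustrative implementation, not code from the notes):

```python
import numpy as np

def autocorrelation(x, j):
    """Sample estimate of rho(j) in eq. (4.2) for the sequence x at lag j."""
    x = np.asarray(x, dtype=float)
    xbar = x.mean()
    a = x[:len(x) - j] - xbar    # x_i
    b = x[j:] - xbar             # x_{i+j}
    return np.sum(a * b) / np.sqrt(np.sum(a**2) * np.sum(b**2))

rng = np.random.default_rng(2)
u = rng.random(5000)
print([round(autocorrelation(u, j), 3) for j in range(4)])
# expect roughly [1.0, 0.0, 0.0, 0.0] for a good generator
```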

4.2 Variable transformations

Generating random numbers from other pdfs can be done by transforming random numbers drawn from simpler pdfs. The procedure is similar to changing variables in integration. Suppose, e.g., x ~ p(x), and let y = y(x) be monotonic. Then the probability of a number between y and y + dy equals the probability of a number between x and x + dx:

p(y)\,dy = p(x)\,dx    (4.3)

so that

p(y) = p(x(y)) \left| \frac{dx}{dy} \right|    (4.4)

where the modulus appears because probability must be positive.

We can extend the expression in eq. (4.4) to the case where y(x) is not monotonic, by summing over the branches x_i(y):

p(y)\,dy = \sum_i p(x_i)\,dx_i ,  so that  p(y) = \sum_i p(x_i(y)) \left| \frac{dx_i}{dy} \right|    (4.5)

Example 1

Suppose we have x ~ U[0,1], i.e.

p(x) = 1 for 0 < x < 1, and 0 otherwise.    (4.6)

Define

y = a + (b - a)\,x    (4.7)

so that

\frac{dy}{dx} = b - a    (4.8)

i.e.

p(y) = \frac{1}{b - a} for a < y < b, and 0 otherwise,    (4.9)

or y ~ U[a, b].
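A one-line illustration of eqs. (4.7)-(4.9), assuming x comes from any U[0,1] generator:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.random(100000)               # x ~ U[0,1]
a, b = -2.0, 5.0
y = a + (b - a) * x                  # eq. (4.7): y ~ U[a, b]
print(y.min(), y.max(), y.mean())    # mean should be close to (a + b)/2 = 1.5
```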

Example 2

Numerical Recipes provides a program to turn x ~ U[0,1] into y ~ N[0,1], a normal pdf with mean zero and standard deviation unity. Suppose we want z ~ N[\mu, \sigma]. We define

z = \mu + \sigma y    (4.10)

so that

\frac{dz}{dy} = \sigma    (4.11)

Now

p(y) = \frac{1}{\sqrt{2\pi}} \exp\left(-\tfrac{1}{2} y^2\right)    (4.12)

so

p(z) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\tfrac{1}{2}\left(\frac{z - \mu}{\sigma}\right)^2\right]    (4.13)

The variable transformation formula is also the basis for the error propagation formulae we use in the lab. See example sheet 3 for more on this.
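The notes rely on a Numerical Recipes routine to turn U[0,1] deviates into N[0,1] deviates; as a stand-in, the sketch below uses the Box-Muller transform (an assumed substitute, not the routine referred to above) and then applies eq. (4.10):

```python
import numpy as np

def normal_samples(mu, sigma, n, rng=None):
    """Draw n samples z ~ N[mu, sigma] starting from uniform deviates.

    Box-Muller: u1, u2 ~ U[0,1] -> y = sqrt(-2 ln u1) cos(2 pi u2) ~ N[0,1];
    then z = mu + sigma * y, as in eq. (4.10).
    """
    rng = rng or np.random.default_rng()
    u1 = 1.0 - rng.random(n)         # shift to (0,1] so log(u1) is finite
    u2 = rng.random(n)
    y = np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)
    return mu + sigma * y

z = normal_samples(mu=10.0, sigma=2.0, n=100000, rng=np.random.default_rng(4))
print(z.mean(), z.std())             # should be close to 10 and 2
```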

4.3 Probability integral transform

One particular variable transformation merits special attention. Suppose we can compute (and invert) the cdf P(x) of the desired random variable. Then:

1) Sample a random variable y ~ U[0,1].
2) Compute x such that y = P(x), i.e. x = P^{-1}(y).
3) Then x ~ p(x), i.e. x is drawn from the pdf corresponding to the cdf P(x).

Example
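As a minimal sketch of the probability integral transform, take an exponential target pdf (an illustrative choice, not necessarily the example on the original slide): its cdf P(x) = 1 - exp(-lambda x) inverts in closed form.

```python
import numpy as np

def sample_exponential(lam, n, rng=None):
    """Probability integral transform for p(x) = lam * exp(-lam * x), x > 0."""
    rng = rng or np.random.default_rng()
    y = rng.random(n)                  # 1) sample y ~ U[0,1]
    return -np.log(1.0 - y) / lam      # 2) x = P^{-1}(y); 3) then x ~ p(x)

x = sample_exponential(lam=0.5, n=100000, rng=np.random.default_rng(5))
print(x.mean())                        # should be close to 1/lam = 2
```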

4.4 Rejection sampling

Suppose we want to sample from some pdf p_1(x), and we know a function p_2(x), which we have an easy way to sample from, such that p_1(x) < p_2(x) everywhere. Then:

1) Sample x_1 from p_2(x).
2) Sample y ~ U[0, p_2(x_1)].
3) If y < p_1(x_1), ACCEPT; otherwise REJECT.

The set of accepted values {x_i} is a sample from p_1(x).

The method can be very slow if the region between p_2(x) and p_1(x) is too large. Ideally we want to find a p_2(x) that is (a) easy to sample from, and (b) close to p_1(x).
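A sketch of the accept/reject loop for an illustrative target p_1(x) = 2x on [0,1], with a flat envelope of height 2 (these choices are assumptions made for the example, not taken from the notes):

```python
import numpy as np

def rejection_sample(n, rng=None):
    """Rejection sampling for the target p1(x) = 2x on [0,1].

    Envelope: a flat function of height 2 on [0,1] (>= p1 everywhere),
    sampled by drawing x1 ~ U[0,1] and then y ~ U[0, 2].
    """
    rng = rng or np.random.default_rng()
    accepted = []
    while len(accepted) < n:
        x1 = rng.random()              # 1) sample x1 under the envelope
        y = rng.uniform(0.0, 2.0)      # 2) sample y ~ U[0, p2(x1)]
        if y < 2.0 * x1:               # 3) accept if y < p1(x1), else reject
            accepted.append(x1)
    return np.array(accepted)

samples = rejection_sample(10000, rng=np.random.default_rng(6))
print(samples.mean())                  # mean of p1(x) = 2x on [0,1] is 2/3
```

Roughly half the proposals are rejected here; a tighter envelope would waste fewer draws.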

4.5 Markov Chain Monte Carlo

This is a very powerful, relatively new (at least in astronomy!) method for sampling from pdfs, which can be complicated and/or of high dimension. MCMC is widely used, e.g. in cosmology, to determine the maximum likelihood model for CMBR data.

(Figure: angular power spectrum of CMBR temperature fluctuations; the ML cosmological model depends on 7 different parameters.)

Consider a 2-D example (e.g. a bivariate normal distribution), where the likelihood function depends on two parameters, a and b. Suppose we are trying to find the maximum of L(a,b):

1) Start off at some randomly chosen value (a_1, b_1).
2) Compute L(a_1, b_1) and the gradient ( ∂L/∂a , ∂L/∂b ) at (a_1, b_1).
3) Move in the direction of steepest positive gradient, i.e. the direction in which L is increasing fastest.
4) Repeat from step 2 until (a_n, b_n) converges on the maximum of the likelihood.

This is OK for finding the maximum, but not for generating a sample from L(a,b), or for determining errors on the ML parameter estimates.
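For concreteness, a steepest-ascent sketch on an assumed uncorrelated bivariate Gaussian likelihood (the step size, stopping rule and the use of ln L rather than L are all illustrative choices):

```python
import numpy as np

def log_L(a, b):
    """Illustrative log-likelihood: uncorrelated Gaussian centred on (1, -2)."""
    return -0.5 * ((a - 1.0) ** 2 / 0.5 ** 2 + (b + 2.0) ** 2 / 1.5 ** 2)

def grad_log_L(a, b):
    """Analytic gradient (d lnL/da, d lnL/db); uphill in ln L is uphill in L."""
    return np.array([-(a - 1.0) / 0.5 ** 2, -(b + 2.0) / 1.5 ** 2])

a, b = 4.0, 3.0                        # 1) randomly chosen starting point
step = 0.1
for _ in range(1000):                  # 4) repeat until converged
    g = grad_log_L(a, b)               # 2) gradient at the current point
    if np.hypot(g[0], g[1]) < 1e-6:
        break
    a, b = a + step * g[0], b + step * g[1]   # 3) move uphill
print(a, b)                            # converges near the maximum at (1, -2)
```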

MCMC provides a simple prescription, the Metropolis algorithm, for generating random samples of points from L(a,b):

1. Sample a random initial point P_1 = (a_1, b_1).
2. Centre a new pdf, Q, called the proposal density, on P_1.
3. Sample a tentative new point P' = (a', b') from Q.
4. Compute the ratio

R = \frac{L(a', b')}{L(a_1, b_1)}

5. If R > 1, the new point P' is uphill from P_1. We accept P' as the next point in our chain, i.e. P_2 = P'.

6. If R < 1, P' is downhill from P_1. In this case we may reject P' as our next point; in fact, we accept P' with probability R. How do we do this?
(a) Generate a random number x ~ U[0,1].
(b) If x < R, accept P' and set P_2 = P'.
(c) If x > R, reject P' and set P_2 = P_1.

The acceptance probability depends only on the previous point: this is what makes the sequence a Markov chain.
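A compact Metropolis sketch for the same kind of illustrative Gaussian likelihood (the Gaussian proposal density, its width, and the burn-in length are assumptions made for the example):

```python
import numpy as np

def log_L(a, b):
    """Illustrative log-likelihood: uncorrelated Gaussian centred on (1, -2)."""
    return -0.5 * ((a - 1.0) ** 2 / 0.5 ** 2 + (b + 2.0) ** 2 / 1.5 ** 2)

def metropolis(n_steps, start=(0.0, 0.0), width=0.5, rng=None):
    """Metropolis algorithm with a Gaussian proposal Q of the given width."""
    rng = rng or np.random.default_rng()
    chain = [np.array(start)]                                 # 1) initial point P1
    for _ in range(n_steps):
        current = chain[-1]
        proposal = current + width * rng.standard_normal(2)   # 2)-3) P' drawn from Q
        R = np.exp(log_L(*proposal) - log_L(*current))        # 4) R = L(P') / L(P)
        if rng.random() < R:                                  # 5)-6) accept with prob min(R, 1)
            chain.append(proposal)
        else:
            chain.append(current)                             # rejected: repeat the old point
    return np.array(chain)

chain = metropolis(20000, rng=np.random.default_rng(7))
a_samples = chain[5000:, 0]            # drop burn-in; the a column samples the marginal of a
print(a_samples.mean(), a_samples.std())   # mean and error bar on a (about 1.0 and 0.5 here)
```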

So the Metropolis algorithm generally (but not always) moves uphill, towards the peak of the likelihood function.

Remarkable facts: the sequence of points {P_1, P_2, P_3, P_4, P_5, ...} represents a sample from the likelihood function L(a,b), and the sequence for each coordinate, e.g. {a_1, a_2, a_3, a_4, a_5, ...}, samples the marginalised likelihood of a. We can therefore make a histogram of {a_1, a_2, a_3, ..., a_n} and use it to compute the mean and variance of a, i.e. to attach an error bar to a.

Why is this so useful? Suppose our likelihood function were a 1-D Gaussian: we could estimate its mean and variance quite well from a histogram of, e.g., 1000 samples. But what if our problem is, e.g., 7-dimensional? Normal (brute-force) sampling would need of order (1000)^7 samples! MCMC provides a short-cut: to compute each new point in our Markov chain we need to evaluate the likelihood function, but the computational cost does not grow so dramatically as we increase the number of dimensions of the problem. This lets us tackle problems that would be impossible by normal sampling.

Example: CMBR constraints from the WMAP 3-year data. Angular power spectrum of CMBR temperature fluctuations; the ML cosmological model depends on 7 different parameters. (Possible Honours Lab project!)