Monte Carlo Simulations

Monte Carlo Simulations

What are Monte Carlo simulations and why does one need them?
Pseudo-random number generators
Creating a realization of a general PDF
The bootstrap approach
A real-life example: LOFAR simulations

Random Number generators

Computer-generated random numbers are conceptually a contradictory notion: a computer generates random numbers from a fixed recipe set by the programmer, so how can a fixed formula generate an arbitrarily large set of completely random numbers? We will not resolve this complicated issue here, but from this opening you can probably see that there are no truly random number generators produced by a computer (an example will follow momentarily); their proper name is actually pseudo-random number generators.

Putting the philosophical issues aside, the question has a practical side: one has to be extra careful with random number generators. The history of computer data analysis offers many examples of badly written random number generators that led to a great many wrong conclusions; the most infamous of these routines is the one called RANDU, which was widespread on IBM mainframe computers. Despite all that has been said, random number generators, provided they are well tested, constitute one of the main tools scientists have at their disposal to analyse and model data. A typical random number generator is a function or subroutine RAN(seed) that requires the user to provide an initial number, called the seed, from which the routine generates a random number; the seed for the next call is produced automatically by the previous step.

Almost all supplied random number generators fall under the name of linear congruential generators (Lehmer 1948), which create a sequence of integers from the following simple recipe:

I_{j+1} = (a I_j + c) mod m

Pros: this algorithm is fast and requires very few operations per call. Cons: it is not free of sequential correlation on successive calls. Of the routine RANDU (IBM Corp.) it was said: "We guarantee that each number is random individually, but we don't guarantee that more than one of them is random."
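
To make the recipe concrete, here is a minimal sketch of such a generator in Python. The function name lcg is ours, and the constants a, c and m are the "quick and dirty" values quoted in Numerical Recipes; this is an illustration, not a generator recommended for serious work.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield pseudo-random floats in [0, 1) from the linear congruential
    recurrence I_{j+1} = (a*I_j + c) mod m."""
    i = seed
    while True:
        i = (a * i + c) % m
        yield i / m

# The same seed always reproduces the same sequence: "pseudo"-random.
gen = lcg(seed=42)
print([next(gen) for _ in range(5)])
```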

The Transformation method

We know that if y = y(x), then the probability densities are related by

p(y) = p(x) |dx/dy|

We know how to generate a uniform random number x, for which the probability of lying between x and x + dx is

p(x) dx = dx if 0 <= x <= 1, and 0 otherwise.

Therefore, to draw y from a desired density f(y) (= p(y)) we need to solve the differential equation

dx/dy = f(y)

The solution is y(x) = F^{-1}(x), where F(y) = ∫ f(y) dy is the cumulative distribution of f. This method is used to generate normal, exponential and other types of distributions for which the inverse is well defined and easy to obtain.

The Transformation method: exponential deviates

As a simple example, consider the case of an exponential distribution

p(y) dy = e^{-y} dy,  y >= 0

From the previous relation we can produce a realization of this distribution from a uniform deviate x through the transformation

y(x) = F^{-1}(x) = -ln x
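
A minimal sketch of this transformation in Python, using only the standard library (the function name is ours):

```python
import math
import random

def exponential_deviate():
    """Draw y with density p(y) = e^{-y}, y >= 0, by the inverse transform
    y = -ln(x).  random.random() returns values in [0, 1), so we use
    1 - random.random(), which lies in (0, 1] and avoids log(0)."""
    x = 1.0 - random.random()
    return -math.log(x)

# Sanity check: the mean of the e^{-y} distribution is 1.
samples = [exponential_deviate() for _ in range(100_000)]
print(sum(samples) / len(samples))
```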

The Transformation method: Gaussian deviates

An important application of the transformation method is the Box-Muller method for generating random deviates with a Gaussian (normal) distribution

p(y) dy = (1/√(2π)) e^{-y^2/2} dy

The method makes use of the fact that it is possible to find a transformation that generates the two-dimensional Gaussian distribution

p(y_1, y_2) dy_1 dy_2 = (1/(2π)) exp(-(y_1^2 + y_2^2)/2) dy_1 dy_2

from two uniform deviates x_1 and x_2:

y_1 = √(-2 ln x_1) cos(2π x_2)
y_2 = √(-2 ln x_1) sin(2π x_2)
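
A minimal sketch of the Box-Muller recipe in Python (function name ours; standard library only):

```python
import math
import random

def box_muller():
    """Return two independent standard Gaussian deviates built from
    two uniform deviates x1 and x2."""
    x1 = 1.0 - random.random()          # in (0, 1], avoids log(0)
    x2 = random.random()
    r = math.sqrt(-2.0 * math.log(x1))
    return r * math.cos(2.0 * math.pi * x2), r * math.sin(2.0 * math.pi * x2)

# Sanity check: the mean should be near 0 and the variance near 1.
ys = [y for _ in range(50_000) for y in box_muller()]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)
print(mean, var)
```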

Acceptance-rejection method

To sample a density p(x), choose a more tractable comparison function f(x) that satisfies f(x) >= p(x) everywhere. Generate a random deviate x from f(x), then generate a second (uniform) deviate to decide whether to accept or reject that x, accepting with probability p(x)/f(x). The ratio of accepted to rejected points equals the ratio of the area under p to the area between p and f. This method is very useful for generating gamma, Poisson and binomial deviates.
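
A minimal sketch of the method in Python, with a purely illustrative target density p(x) = (3/4)(1 - x^2) on [-1, 1] and the flat comparison function f(x) = 3/4 (both chosen by us for simplicity):

```python
import random

def rejection_sample(n):
    """Sample p(x) = (3/4)(1 - x^2) on [-1, 1] by acceptance-rejection.
    The comparison function f(x) = 3/4 satisfies f(x) >= p(x) everywhere,
    so a uniform x is accepted with probability p(x)/f(x) = 1 - x^2."""
    samples = []
    while len(samples) < n:
        x = random.uniform(-1.0, 1.0)        # deviate drawn from f
        if random.random() < 1.0 - x * x:    # accept with prob p(x)/f(x)
            samples.append(x)
    return samples

# Sanity check: the target density is symmetric, so the mean is near 0.
print(sum(rejection_sample(100_000)) / 100_000)
```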

Bootstrap

The bootstrap is a name generically applied to statistical resampling schemes that allow the uncertainty in the data to be assessed from the data themselves; in other words, pulling yourself up by your bootstraps. Given n observations z_i, i = 1, ..., n and a calculated statistic S, e.g. the mean, what is the uncertainty in S? The procedure:
1. Draw n values z*_i from the original data with replacement.
2. Calculate the statistic S from the bootstrapped sample.
3. Repeat L times to build up a distribution that measures the uncertainty in S.
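
A minimal sketch of this procedure in Python (the function name and the data values are ours, purely for illustration):

```python
import random
import statistics

def bootstrap_error(data, statistic=statistics.mean, n_boot=1000):
    """Estimate the uncertainty in a statistic S: draw n values from the
    data with replacement, recompute S, and repeat n_boot (= L) times;
    the spread of the recomputed values measures the uncertainty."""
    n = len(data)
    boot_stats = [statistic([random.choice(data) for _ in range(n)])
                  for _ in range(n_boot)]
    return statistics.stdev(boot_stats)

# Made-up observations; estimate the uncertainty in their mean.
z = [2.3, 2.9, 2.1, 2.6, 2.4, 2.8, 2.2, 2.5]
print(bootstrap_error(z))
```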

Bootstrap Assumptions

1. Your sample is a valid representative of the population.
2. The bootstrap samples with replacement from the sample, and each sub-sample is independent and identically distributed (i.i.d.). In other words, the sub-samples are assumed to come from the same distribution as the population, with each sample drawn independently of the others.

Bootstrap: Applications

Here are some typical statistical problems that the bootstrap method can solve (a sketch of problem 2 follows the list):
1. Suppose you have a sample that is so small that you are unsure of the theoretical distribution of the parent population. How could you estimate the variance of the sample mean?
2. You have two samples from unknown distributions; call them X and Y. You want to know the distribution of the ratio Z = X/Y and to derive some useful statistics (such as the mean and standard deviation) from that distribution.
3. You have two samples A and B and you want to test whether they come from the same population.
4. You have a regression model y = αx + β and you want to obtain confidence intervals for the parameters α and β.
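
For instance, problem 2 can be attacked by repeatedly pairing values resampled from X and Y; this sketch (function name and data ours) builds up the bootstrapped distribution of Z = X/Y:

```python
import random
import statistics

def bootstrap_ratio(x, y, n_boot=10_000):
    """Approximate the distribution of Z = X/Y by pairing one value drawn
    (with replacement) from each sample, then summarize it."""
    zs = [random.choice(x) / random.choice(y) for _ in range(n_boot)]
    return statistics.mean(zs), statistics.stdev(zs)

# Made-up samples, purely for illustration.
x = [4.2, 3.9, 4.5, 4.1, 4.4, 3.8]
y = [2.0, 2.2, 1.9, 2.1, 2.3, 2.0]
print(bootstrap_ratio(x, y))   # mean and standard deviation of Z
```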

Jackknife analysis

For a given statistic, one often wants to calculate the bias and the error, with both as small as possible. One practical way of doing so, in the absence of knowledge of the underlying distribution, is from the data themselves, through the so-called jackknife analysis. The basic idea is to calculate the statistic repeatedly, each time excluding one (or more) data points from the estimation. Jackknife analysis is related, albeit less general, to the bootstrap method discussed earlier. Its main advantage, however, is its simplicity.

Jackknife analysis

Here we will prove rigorously that the jackknife works in a particular case. Assume we are after a statistic s, estimated from n data points as E(s_n). We assume the estimator is biased, although asymptotically unbiased; for example, suppose the bias has the expansion

E(s_n) - s = Σ_{i=1}^∞ a_i / n^i

From the data we can form n samples of n - 1 observations each, leaving out one point at a time. Let s_{n-1,AV} denote the average of the statistic over these n samples, and define the new statistic

s'_n = n s_n - (n - 1) s_{n-1,AV} = s_n + (n - 1)(s_n - s_{n-1,AV})

which is less biased than s_n, since the leading a_1/n term cancels:

E(s'_n) - s = -a_2/n^2 + O(n^{-3})
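
A minimal sketch of the leave-one-out recipe in Python, combining the estimates as s'_n = n s_n - (n-1) s_{n-1,AV} and adding the standard jackknife error estimate (function name and data ours):

```python
import math
import statistics

def jackknife(data, statistic=statistics.mean):
    """Leave-one-out jackknife: recompute the statistic n times, each time
    excluding one point, then form the bias-corrected combination and the
    usual jackknife error estimate."""
    n = len(data)
    s_n = statistic(data)
    loo = [statistic(data[:i] + data[i + 1:]) for i in range(n)]  # n leave-one-out estimates
    s_av = sum(loo) / n                                           # s_{n-1,AV}
    s_corrected = n * s_n - (n - 1) * s_av
    err = math.sqrt((n - 1) / n * sum((s - s_av) ** 2 for s in loo))
    return s_corrected, err

data = [2.3, 1.9, 2.8, 2.5, 2.1, 2.6]   # made-up numbers for illustration
print(jackknife(data))
```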

The LOFAR telescope

[Figure: schematic of a LOFAR antenna]

LOFAR test station images

[Figure: timeline of cosmic history versus redshift, from the Dark Ages (neutral IGM, z ~ 10^3, t ~ 3x10^5 yr) through the first UV sources and the Epoch of Reionization (z ~ 10, t ~ 5x10^8 yr) to protogalaxies, galaxies and clusters today (ionized IGM, z = 0, t ~ 13x10^9 yr), with LOFAR foregrounds such as Galactic synchrotron emission and faint radio-loud quasars]

Creating a dataset for LOFAR