Statistical Methods for Data Analysis


Statistical Methods for Data Analysis Random number generators Luca Lista INFN Napoli

Pseudo-random generators
Requirement: simulate random processes with a computer, e.g. radiation interaction with matter, cosmic rays, particle interaction generators. But also: finance, videogames, 3D graphics, ...
Problem: generate random (or "almost random") variables with a computer, but computers are deterministic!

Pseudo-random numbers
Definition: deterministic numeric sequences whose behavior is not easily predictable with simple analytic expressions.
(Re)producible with an algorithm based on mathematical formulae.
Statistical behavior similar to real random sequences.

Example from chaos transition
Let's fix an initial value $x_0$ and define the sequence by recursion: $x_{n+1} = \lambda x_n (1 - x_n)$.
Depending on $\lambda$, the sequence will have different possible behaviors.
If the sequence converges, then for $n \to \infty$ the limit $x$ solves the equation $x = \lambda x (1 - x)$, that is $x = (\lambda - 1)/\lambda$ or $x = 0$.

Stable behavior
For sufficiently small $\lambda$, the sequence starting from $x_0 = 0.5$ converges.
[Figure: values of $x_n$ for $n > 200$]

Bifurcation
For $\lambda > 3$ the sequence does not converge, but oscillates between two values $x_a$ and $x_b$: $x_a = \lambda x_b (1 - x_b)$, $x_b = \lambda x_a (1 - x_a)$.
[Figure: values of $x_n$ for $n > 200$]

Bifurcation II, III, ...
The bifurcation repeats as $\lambda$ grows: sequences of 4, 8, 16, ... repeating values.
[Figure: values of $x_n$ for $n > 200$]

Chaotic behavior
For even larger $\lambda$ the sequence is unpredictable. For instance, for $\lambda = 4$ the values densely fill the interval [0, 1].
[Figure: values of $x_n$ for $200 < n < 100000$]
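The behaviors described on the previous slides can be reproduced with a few lines of C. This is only a minimal sketch: the control parameter is written lambda, and the function names logistic_step, logistic_iterate and print_orbit are hypothetical helpers introduced here, not part of the original slides.

```c
#include <assert.h>
#include <stdio.h>

/* One step of the logistic map: x -> lambda * x * (1 - x). */
double logistic_step(double lambda, double x) {
  return lambda * x * (1.0 - x);
}

/* Iterate the map n times starting from x0 and return x_n. */
double logistic_iterate(double lambda, double x0, int n) {
  assert(n >= 0);
  double x = x0;
  for (int i = 0; i < n; ++i) x = logistic_step(lambda, x);
  return x;
}

/* Discard the first 200 iterations, then print `count` further values. */
void print_orbit(double lambda, double x0, int count) {
  double x = logistic_iterate(lambda, x0, 200);
  printf("lambda = %.2f:", lambda);
  for (int i = 0; i < count; ++i) {
    x = logistic_step(lambda, x);
    printf(" %.4f", x);
  }
  printf("\n");
}
```

Calling print_orbit(2.5, 0.5, 4) should show a single repeated value close to 0.6, and print_orbit(3.2, 0.5, 4) two alternating values. For lambda = 4 a start value other than exactly 0.5 is needed (for example 0.3), because 0.5 maps to 1 and then to 0, where the sequence stays.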

[Figure: Transition to chaos]

[Figure: Another complete view]

Properties of Random Numbers
A good random sequence $\{x_1, x_2, \ldots, x_n, \ldots\}$ should be made of elements that are independent and identically distributed (i.i.d.):
$P(x_i) = P(x_j), \ \forall\, i, j$
$P(x_n \mid x_{n-1}) = P(x_n), \ \forall\, n$

(Pseudo-)random generators
The standard C function drand48 is based on sequences of 48-bit integer numbers. The sequence is defined as:
$x_{n+1} = (a x_n + c) \bmod m$
where:
$m = 2^{48}$
$a = 25214903917$ = 5DEECE66D (hex)
$c = 11$ = B (hex)
See "man drand48" for further information. Those numbers give a uniform distribution.

Pseudo-random generators
To convert into a floating-point number, just divide the integer by $2^{48}$. The result is uniformly distributed from 0 (included) to 1 (excluded), with precision $1/2^{48}$.
drand48, lrand48 and mrand48 return numbers with different precision, all taken from the high-order bits of the main 48-bit integer sequence: drand48 returns a double in [0, 1), lrand48 a non-negative long integer in $[0, 2^{31})$, and mrand48 a signed long integer in $[-2^{31}, 2^{31})$.
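The recursion and the conversion to floating point can be written out directly. The sketch below uses the constants quoted on the slide; the names lcg48_next and lcg48_to_double are hypothetical, and seeding conventions (which differ between library implementations) are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define LCG48_A    UINT64_C(0x5DEECE66D)      /* multiplier a = 25214903917 */
#define LCG48_C    UINT64_C(0xB)              /* increment  c = 11 */
#define LCG48_MASK ((UINT64_C(1) << 48) - 1)  /* x mod 2^48 == x & MASK */

/* Advance the state: x_{n+1} = (a * x_n + c) mod 2^48.
   The 64-bit product may wrap around, but wrapping is modulo 2^64 and
   2^48 divides 2^64, so the low 48 bits are still correct. */
uint64_t lcg48_next(uint64_t x) {
  return (LCG48_A * x + LCG48_C) & LCG48_MASK;
}

/* Map a 48-bit state to a double in [0, 1) by dividing by 2^48. */
double lcg48_to_double(uint64_t x) {
  assert(x <= LCG48_MASK);
  return (double)x / 281474976710656.0;  /* 2^48 */
}
```

Starting from any 48-bit state x, repeated calls x = lcg48_next(x) followed by lcg48_to_double(x) give the uniform sequence described above.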

Random generators in ROOT
TRandom (low period: $10^{9}$)
TRandom1 (RANLUX, F. James)
TRandom2 (period: $10^{26}$)
TRandom3 (period: $2^{19937} - 1$)
ROOT::Math generators: GSL based, relatively new. See dedicated slides.

Probability distribution
Within precision, the distribution is uniform (flat).
[Figure: $\delta n / \delta r$ as a function of r, with r = drand48()]

Non-uniform sequences
In order to obtain a Gaussian distribution, average many numbers drawn from any distribution with finite variance (central limit theorem):

    double r = 0;
    for (int i = 0; i < n; i++) r += drand48();
    r /= n;

The average of n uniform numbers has mean 1/2 and variance 1/(12n). Works, but inefficient!

[Figure: distribution of $\frac{1}{n} \sum_{i=1}^{n} r_i$]

[Figure: Comparison with true Gaussians]

Generate a known PDF
Given a PDF: $f(x) = \frac{dP}{dx}$
Its cumulative distribution is defined as: $F(x) = \int_{-\infty}^{x} f(x')\, dx'$

Inverting the cumulative
If the inverse of the cumulative distribution is known (or easily computable numerically), a variable defined as $x = F^{-1}(r)$ is distributed according to the PDF $f(x)$ if r is uniformly distributed between 0 and 1.

Demonstration
As $r = F(x)$, then $dr = \frac{dF}{dx}\, dx = f(x)\, dx$, hence:
$\frac{dP}{dx} = \frac{dP}{dr} \frac{dr}{dx} = f(x) \frac{dP}{dr}$
If r has a uniform distribution, then $dP/dr = 1$, hence $dP/dx = f(x)$.

Example
Exponential distribution: $f(x) = \frac{dP}{dx} = \lambda e^{-\lambda x}$, $x \ge 0$
Normalization: $\int_0^{\infty} f(x)\, dx = \left[ -e^{-\lambda x} \right]_0^{\infty} = 1$
Cumulative: $F(x) = \int_0^{x} f(x')\, dx' = 1 - e^{-\lambda x} = r$, hence $e^{-\lambda x} = 1 - r$ and
$x = F^{-1}(r) = -\frac{1}{\lambda} \log(1 - r)$
Since $1 - r$ and $r$ both have a uniform distribution between 0 and 1, $x = -\frac{1}{\lambda} \log(r)$ can be used as well.

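Generate uniformly over a sphere
Generate $\theta$ and $\phi$. Factorize the PDF:
$dP = \frac{d\Omega}{4\pi} = \frac{\sin\theta\, d\theta}{2} \cdot \frac{d\phi}{2\pi}$
so that $\phi$ is uniform between 0 and $2\pi$, while $\theta$ follows from inverting the cumulative $F(\theta) = (1 - \cos\theta)/2$, that is $\cos\theta$ is uniform between -1 and 1.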

Generating Gaussian numbers
The Gaussian cumulative is not easily invertible (erf).
Solution: generate simultaneously two independent Gaussian numbers. The radial cumulative of a two-dimensional standard Gaussian, $F(\rho) = 1 - e^{-\rho^2/2}$, is easy to invert; this gives the Box-Muller transformation:

    const double pi = 3.14159265358979323846;
    double r   = sqrt(-2 * log(1 - drand48()));  /* 1 - u is in (0, 1]: log never sees 0 */
    double phi = 2 * pi * drand48();
    double y1 = r * cos(phi), y2 = r * sin(phi);

Other, faster alternatives are available (e.g. Ziggurat).

Hit or miss Monte Carlo
Reproduce a generic distribution:
1. Extract x flat from a to b
2. Compute f = f(x)
3. Extract r flat from 0 to m, where $m \ge \max f(x)$
4. If r > f repeat the extraction, if r < f accept x
In this way, the density of the accepted x is proportional to f(x).
May be inefficient if the function is very peaked!
Finding the maximum of f may be slow in many dimensions.
[Figure: f(x) between a and b, below the horizontal line at height m; points under the curve are hits, points above it are misses]
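The four steps above can be sketched as a small generic sampler. Everything named here is an assumption made for illustration: the function-pointer type shape_fn, the function hit_or_miss, and the example shape f(x) = x on [0, 1].

```c
#define _XOPEN_SOURCE 600  /* exposes drand48 under strict -std=c99/c11 */
#include <assert.h>
#include <stdlib.h>

typedef double (*shape_fn)(double);

/* Return x in [a, b) with density proportional to f(x).
   m must satisfy m >= f(x) >= 0 everywhere on [a, b]. */
double hit_or_miss(shape_fn f, double a, double b, double m) {
  assert(b > a && m > 0.0);
  for (;;) {
    double x = a + (b - a) * drand48();  /* step 1: flat in [a, b) */
    double r = m * drand48();            /* step 3: flat in [0, m) */
    if (r < f(x)) return x;              /* steps 2 and 4: accept a hit */
  }
}

/* Example shape, chosen only for illustration: f(x) = x. */
double linear_shape(double x) { return x; }
```

For instance, hit_or_miss(linear_shape, 0.0, 1.0, 1.0) returns numbers with density 2x on [0, 1], whose mean is 2/3; half of the extractions are rejected on average.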

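Example: compute an integral

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Integrand (sin x / x)^2, with its limit value 1 at x = 0. */
    double f(double x) { return x == 0 ? 1 : pow(sin(x) / x, 2); }

    int main(void) {
      const double a = 0, b = 3.141592654, m = 1;
      const int hit = 10000;  /* number of accepted points */
      int tot = 0;            /* total number of extractions */
      for (int i = 0; i < hit; ++i) {
        double ff, r;         /* declared here so the while condition sees them */
        do {
          double x = a + (b - a) * drand48();
          ff = f(x);
          r = drand48() * m;
          ++tot;
        } while (r > ff);
      }
      double ratio = (double)hit / tot;
      double error = sqrt(ratio * (1 - ratio) / tot);
      double area = (b - a) * m * ratio;
      printf("area = %g +- %g\n", area, (b - a) * m * error);
      return 0;
    }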

Importance sampling
The same method can be repeated in different regions:
1. Extract x in one of the regions (1), (2), or (3), chosen with probability proportional to the areas of their bounding rectangles
2. Apply hit-or-miss in the randomly chosen region
The density is still proportional to f(x), but a smaller number of extractions is sufficient (and the program runs faster!).
Variation: use hit-or-miss within an envelope PDF whose cumulative is easily invertible.
[Figure: f(x) covered by three rectangles (1), (2), (3) of different heights, with edges $a_0$, $a_1$, $a_2$, $a_3$]

Exercise
Generate x according to the following distribution (0 < x):

Estimate the error on MC integral
MC can also be a means to estimate integrals. Accepting n out of N extractions, the binomial distribution can be applied:
$\sigma_n^2 = N \varepsilon (1 - \varepsilon)$
where $\hat{\varepsilon} = n/N$ is the best estimate of $\varepsilon$. The error on the estimate of $\varepsilon$ is:
$\sigma_\varepsilon^2 = \sigma_n^2 / N^2 = \varepsilon (1 - \varepsilon) / N$
$\sigma_\varepsilon = \sqrt{\varepsilon (1 - \varepsilon) / N}$

Multi-dimensional integral estimates
The same Monte Carlo technique can be applied to multi-dimensional integral estimates, extracting independently the n coordinates $(x_1, \ldots, x_n)$.
The error is always proportional to $1/\sqrt{N}$, where N is the number of extractions, regardless of the dimension n. This is an advantage w.r.t. standard numerical integration.
Difficulties:
Finding the maximum of f numerically may be slow in many dimensions.
Partitioning the integration range (importance sampling) may be non-trivial to do automatically.

References
Logistic map, bifurcation and chaos: http://en.wikipedia.org/wiki/logistic_map
PDG, review of random numbers and Monte Carlo: http://pdg.lbl.gov/2001/monterpp.pdf
GENBOD, phase space generator: F. James, Monte Carlo Phase Space, CERN 68-15 (1968)