Lecture 20. Randomness and Monte Carlo. J. Chaudhry. Department of Mathematics and Statistics, University of New Mexico


J. Chaudhry (UNM) CS 357

What we'll do:
- Random number generators
- Monte Carlo integration
- Monte Carlo simulation
Our testing and our methods often rely on randomness and the impact of finite precision:
- Random input
- Random perturbations
- Random sequences

Randomness
Randomness ≈ unpredictability. One view: a sequence is random if it has no shorter description.
Physical processes, such as flipping a coin or tossing dice, are deterministic given enough information about the governing equations and initial conditions. But even for deterministic systems, sensitivity to the initial conditions can render the behavior practically unpredictable, so we need random simulation methods.
http://dilbert.com/strips/comic/2001-10-25/
http://www.xkcd.com/221/

Quasi-Random Sequences
For some applications, reasonably uniform coverage of the sample space is more important than randomness:
- true random samples often exhibit clumping
- perfectly uniform sampling uses a uniform grid, but does not scale well to high dimensions
- quasi-random sequences attempt randomness while maintaining coverage

Quasi-Random Sequences
Quasi-random sequences are not random, but give a random appearance. By design, the points avoid each other, resulting in no clumping.

Randomness is easy, right?
In May 2008, Debian announced a vulnerability in OpenSSL: the seeding process of the OpenSSL pseudo-random number generator had been compromised for about 2 years.
- seeding was based on the process ID (this is not entropy!)
- only 32,767 possible keys
- all SSL and SSH keys from 9/2006 to 5/2008 were regenerated
- all certificates were recertified
Cryptographically secure pseudorandom number generators (CSPRNGs) are necessary for some applications; other applications rely less on true randomness.

Repeatability
With unpredictability, true randomness is not repeatable... but lack of repeatability makes testing and debugging difficult. So we want repeatability, but also independence of the trials:
>> rand('seed',1234)
>> rand(10,1)

Pseudorandom Numbers
Computer algorithms for random number generation are deterministic... but may have long periodicity (a long time until an apparent pattern emerges). Such sequences are labeled pseudorandom. Pseudorandom sequences are predictable and reproducible (this is mostly good).

Random Number Generators
Properties of a good random number generator:
- Random pattern: passes statistical tests of randomness
- Long period: long time before repeating
- Efficiency: executes rapidly and with low storage
- Repeatability: the same sequence is generated from the same initial state
- Portability: the same sequences are generated on different architectures

Random Number Generators
Early attempts relied on complexity to ensure randomness. The midsquare method squares each member of a sequence and takes the middle portion of the result as the next member... simple methods with a statistical basis are preferable.

Linear Congruential Generators
Congruential random number generators are of the form
x_k = (a x_{k-1} + c) mod m
where a and c are integers given as input and x_0 is called the seed. The integer m is the largest integer representable (e.g. 2^31 - 1). Quality depends on a and c; the period will be at most m.
Example: let a = 13, c = 0, m = 31, and x_0 = 1. This gives the sequence 1, 13, 14, 27, 10, 6, ..., which is a permutation of the integers 1, ..., 30, so the period is m - 1.
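
The LCG recurrence above is easy to try directly. Here is a short Python sketch (Python rather than the lecture's Matlab) that reproduces the a = 13, c = 0, m = 31 example:

```python
def lcg(a, c, m, seed):
    """Linear congruential generator: x_k = (a*x_{k-1} + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

# The slide's example: a = 13, c = 0, m = 31, seed x_0 = 1.
gen = lcg(13, 0, 31, 1)
seq = [next(gen) for _ in range(6)]   # 13, 14, 27, 10, 6, 16, ...
```

Running the generator for 30 steps visits all of 1, ..., 30 and returns to the seed, confirming the period m - 1 = 30.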

History
From C. Moler, NCM: IBM's Scientific Subroutine Package (SSP) was used on mainframes in the 1960s. Its random generator RND used a = 65539, c = 0, and m = 2^31.
- arithmetic mod 2^31 is done quickly with 32-bit words
- multiplication by a = 2^16 + 3 can be done quickly with a shift and a short add
But notice that (mod m)
x_{k+2} = 6 x_{k+1} - 9 x_k
... a strong correlation among three successive integers.

History
From C. Moler, NCM: Matlab used a = 7^5, c = 0, and m = 2^31 - 1 for a while; the period is m - 1. This is no longer sufficient.

What's used?
Two popular methods:
1. Method of Marsaglia (period ≈ 2^1430):
   - initialize x_0, ..., x_3 and c to random values given a seed
   - let s = 2111111111 x_{n-4} + 1492 x_{n-3} + 1776 x_{n-2} + 5115 x_{n-1} + c
   - compute x_n = s mod 2^32
   - set c = floor(s / 2^32)
2. rand() in Unix uses a = 1103515245, c = 12345, m = 2^31.
In general, the digits in random numbers are not themselves random... some patterns recur much more often.
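
A Python sketch of Marsaglia's multiply-with-carry recurrence above (the caller supplies the four initial values and the carry here; this illustrative seeding is an assumption, not the generator's official seeding procedure):

```python
def mwc(x0_to_x3, carry, n):
    """Marsaglia multiply-with-carry generator (sketch of the recurrence):
    s   = 2111111111*x_{n-4} + 1492*x_{n-3} + 1776*x_{n-2} + 5115*x_{n-1} + c
    x_n = s mod 2^32,  c = floor(s / 2^32)."""
    x = list(x0_to_x3)
    c = carry
    out = []
    for _ in range(n):
        s = 2111111111 * x[-4] + 1492 * x[-3] + 1776 * x[-2] + 5115 * x[-1] + c
        x_new = s % 2**32
        c = s // 2**32          # the carry for the next step
        x = x[1:] + [x_new]     # slide the four-term window forward
        out.append(x_new)
    return out
```

Given the same four seed values and carry, the output stream is fully reproducible, as a pseudorandom generator should be.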

Linear Congruential Generators
- sensitive to a and c
- be careful with supplied random functions on your system
- period is at most m
- division is necessary if generating floating-point values in [0, 1)

Fibonacci
Lagged Fibonacci generators produce floating-point random numbers directly using differences, sums, or products. A typical subtractive generator is
x_k = x_{k-17} - x_{k-5}
with lags of 17 and 5.
- lags must be chosen very carefully
- negative results need fixing
- more storage is needed than for congruential generators
- no division is needed
- very, very good statistical properties
- long periods, since repetition of a value does not imply a period
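
A minimal Python sketch of the subtractive generator x_k = x_{k-17} - x_{k-5}, wrapping negative results back into [0, 1). The lag table here is seeded with arbitrary illustrative values; a real implementation would seed it from another generator:

```python
def lagged_fib(n, lag_table):
    """Subtractive lagged Fibonacci generator: x_k = x_{k-17} - x_{k-5} (mod 1).
    lag_table: 17 floats in [0, 1) giving the previous values x_{k-17}, ..., x_{k-1}."""
    x = list(lag_table)
    assert len(x) == 17
    out = []
    for _ in range(n):
        v = x[-17] - x[-5]
        if v < 0.0:            # "negative results need fixing": wrap into [0, 1)
            v += 1.0
        x = x[1:] + [v]
        out.append(v)
    return out

# Illustrative seeding: fractional parts of multiples of the golden ratio.
table = [(0.6180339887 * i) % 1.0 for i in range(1, 18)]
stream = lagged_fib(1000, table)
```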

Discrete random variables
A random variable x takes values x_0, x_1, ..., x_n with probabilities p_0, p_1, ..., p_n, where sum_{i=0}^n p_i = 1.
Example: throwing a die (1-based index) has values x_1 = 1, x_2 = 2, ..., x_6 = 6, each with probability p_i = 1/6.

Expected value and variance
Expected value (average value of the variable):
E[x] = sum_j x_j p_j
Variance (variation from the average):
sigma^2[x] = E[(x - E[x])^2] = E[x^2] - (E[x])^2
Throwing a die: expected value E[x] = (1 + 2 + ... + 6)/6 = 3.5; variance sigma^2[x] = 35/12 ≈ 2.92.

Estimated E[x]
To estimate the expected value, choose a set of random values based on the probability distribution and average the results:
E[x] ≈ (1/N) sum_{j=1}^N x_j
Bigger N gives better estimates. Throwing a die:
- 3 rolls: 3, 1, 6 gives E[x] ≈ (3 + 1 + 6)/3 = 3.33
- 9 rolls: 3, 1, 6, 2, 5, 3, 4, 6, 2 gives E[x] ≈ (3 + 1 + 6 + 2 + 5 + 3 + 4 + 6 + 2)/9 ≈ 3.56
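
The die-rolling estimate is easy to replicate. A Python sketch, using a fixed seed for repeatability as discussed earlier (the function name and seed are illustrative):

```python
import random

def estimate_die_mean(n_rolls, seed=1234):
    """Estimate E[x] for a fair die by averaging n_rolls simulated rolls."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls
```

With a handful of rolls the estimate is noisy (3.33 for the three rolls above); with many rolls it settles near the true value 3.5.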

Law of large numbers
Taking N to infinity, the error between the estimate and the expected value is statistically zero; that is, the estimate converges to the correct value:
P( E[x] = lim_{N->infinity} (1/N) sum_{i=1}^N x_i ) = 1
This is the basic idea behind Monte Carlo estimation.

Continuous random variable and density function
A continuous random variable x takes values x in [a, b], with a probability density function rho(x) satisfying int_a^b rho(x) dx = 1. The probability that the variable has a value in (c, d) ⊆ [a, b] is int_c^d rho(x) dx.

Uniformly distributed random variable
rho(x) is constant, and int_a^b rho(x) dx = 1 means rho(x) = 1/(b - a).

Continuous extensions to expectation and variance
Expected value:
E[x] = int_a^b x rho(x) dx
E[g(x)] = int_a^b g(x) rho(x) dx
Variance:
sigma^2[x] = int_a^b (x - E[x])^2 rho(x) dx = E[x^2] - (E[x])^2
sigma^2[g(x)] = int_a^b (g(x) - E[g(x)])^2 rho(x) dx = E[g(x)^2] - (E[g(x)])^2
Estimating the expected value:
E[g(x)] ≈ (1/N) sum_{i=1}^N g(x_i)
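
The last formula is directly usable as code. A Python sketch estimating E[g(x)] for x uniform on (a, b), checked against g(x) = x^2 on (0, 1), whose exact expectation is int_0^1 x^2 dx = 1/3 (function names and seed are illustrative):

```python
import random

def estimate_expectation(g, a, b, n, seed=0):
    """Monte Carlo estimate of E[g(x)] for x uniform on (a, b):
    average g at n uniformly sampled points."""
    rng = random.Random(seed)
    return sum(g(a + (b - a) * rng.random()) for _ in range(n)) / n

est = estimate_expectation(lambda x: x * x, 0.0, 1.0, 100000)  # exact value: 1/3
```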

Multidimensional extensions
Difficult domains (complex geometries). Expected value:
E[g(x)] = int_D g(x) rho(x) dA, for x in the domain D

Sampling over intervals
Almost all systems provide a uniform random number generator on (0, 1) (Matlab: rand). If we need a uniform distribution over (a, b), we transform x_k on (0, 1) by (b - a) x_k + a.
Matlab: generate 100 uniformly distributed values in (a, b):
r = a + (b-a).*rand(100,1)
Uniformly distributed integers: randi. Random permutations: randperm.
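
The same (b - a) x_k + a transformation in Python (an analogue of the Matlab one-liner; the function name and seed are illustrative):

```python
import random

def uniform_on(a, b, n, seed=42):
    """n uniform samples on (a, b), scaling uniform (0, 1) draws: a + (b-a)*x."""
    rng = random.Random(seed)
    return [a + (b - a) * rng.random() for _ in range(n)]

r = uniform_on(2.0, 5.0, 100)
```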

Non-uniform distributions
Sampling non-uniform distributions is much more difficult. If the cumulative distribution function is invertible, we can generate the non-uniform sample from the uniform one. Consider the probability density function
rho(t) = lambda e^{-lambda t}, t > 0
giving the cumulative distribution function
F(x) = int_0^x lambda e^{-lambda t} dt = [-e^{-lambda t}]_0^x = 1 - e^{-lambda x}
Inverting the CDF gives
x_k = -log(1 - y_k) / lambda
where y_k is uniform in (0, 1). ... not so easy in general.
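
The exponential example as a Python sketch, sampling x_k = -log(1 - y_k)/lambda and checking the sample mean against the exact mean 1/lambda (names and seed are illustrative):

```python
import math
import random

def exponential_samples(lam, n, seed=7):
    """Inverse-transform sampling: F(x) = 1 - exp(-lam*x) inverts to
    x = -log(1 - y)/lam for y uniform in (0, 1)."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

samples = exponential_samples(2.0, 100000)   # exact mean is 1/lambda = 0.5
```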

Monte Carlo
Central to PageRank (and many, many other applications in finance, science, informatics, etc.) is that we randomly process something; what we want to know is what is likely to happen on average. What would happen if we had an infinite number of samples? Let's take a look at an integral (a discrete limit in a sense).

(Deterministic) numerical integration
Split the domain into a set of fixed segments and sum function values times the segment sizes (Riemann!).

Monte Carlo Integration
For a random sequence x_1, ..., x_n we have
int_0^1 f(x) dx ≈ (1/n) sum_{i=1}^n f(x_i)
n=100;
x=rand(n,1);
a=f(x);
s=sum(a)/n;

Monte Carlo over random domains
For a random sequence x_1, ..., x_n in Omega,
I = int_Omega f(x) dx ≈ Q_n = V (1/n) sum_{i=1}^n f(x_i)
where V is the volume V = int_Omega dx. The law of large numbers guarantees that Q_n -> I as n -> infinity. Works in any dimension! In one dimension:
int_a^b f(x) dx ≈ (b - a) (1/n) sum_{i=1}^n f(x_i)
where x_1, ..., x_n are uniformly distributed in (a, b).

2d example: computing π
Use the unit square [0, 1]^2 with a quarter circle:
f(x, y) = 1 if (x, y) is in the circle, 0 otherwise
A_{quarter circle} = int_0^1 int_0^1 f(x, y) dx dy = π/4

2d example: computing π
Estimate the area of the quarter circle by randomly evaluating f(x, y):
A_{quarter circle} ≈ (1/N) sum_{i=1}^N f(x_i, y_i)

2d example: computing π
By definition A_{quarter circle} = π/4, so
π ≈ (4/N) sum_{i=1}^N f(x_i, y_i)

2d example: computing π (algorithm)
input N
call rand in 2d
for i = 1:N
    sum = sum + f(x_i, y_i)
end
sum = 4*sum/N
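
The pseudocode above, as runnable Python (function name and seed are illustrative):

```python
import random

def estimate_pi(n, seed=2024):
    """Monte Carlo estimate of pi: sample (x, y) uniformly in [0, 1]^2 and
    count the fraction landing inside the quarter circle x^2 + y^2 <= 1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n
```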

2d example: computing π (algorithm)
The expected value of the error is O(1/sqrt(N)):
- convergence does not depend on dimension
- deterministic integration is very hard in higher dimensions
- deterministic integration is very hard for complicated domains
- the trapezoid rule in dimension d is O(1/N^{2/d})

Error in Monte Carlo Estimation
The distribution of an average is close to normal, even when the distribution from which the average is computed is not. What? Let x_1, ..., x_n be independent random variables from any PDF, each with expected value mu and standard deviation sigma, and consider the average A_n = (x_1 + ... + x_n)/n. The expected value of A_n is mu and its standard deviation is sigma/sqrt(n); this follows from the Central Limit Theorem. In short: the sample mean has an error of sigma/sqrt(n).
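
The sigma/sqrt(n) behavior can be checked empirically. A Python sketch averaging uniform (0, 1) variables, for which sigma = 1/sqrt(12) ≈ 0.2887 (function names, trial count, and seed are illustrative):

```python
import random
import statistics

def std_of_average(n, trials=1000, seed=1):
    """Empirical standard deviation of A_n = (x_1 + ... + x_n)/n over many
    trials, with each x_i uniform on (0, 1); CLT predicts sigma/sqrt(n)."""
    rng = random.Random(seed)
    means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]
    return statistics.pstdev(means)
```

std_of_average(100) comes out near 0.2887/10 ≈ 0.029, and quadrupling n roughly halves it.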

Stochastic Simulation
Stochastic simulation mimics physical behavior through random simulation... also known as the Monte Carlo method (there is no single MC method!). Useful for:
- nondeterministic processes
- deterministic models that are too complicated to model
- deterministic models with high dimensionality

Stochastic Simulation
Two requirements for MC:
- knowing which probability distributions are needed
- generating sufficient random numbers
The probability distribution depends on the problem (theoretical or empirical evidence), and it can be approximated well by simulating a large number of trials.

Stochastic Simulation: The Monty Hall Problem
Question: You are on a game show. There are three doors you have the option of opening. There is a prize behind one of the doors, and there are goats behind the other two. You select one of the doors. One of the remaining two doors is opened to reveal a goat. You are then given the option of keeping your door or switching to the other unopened door. Should you switch? Determine the probability that you win the prize if you switch.
Answer: Yes, always switch. The probability of success is 2/3, which can be shown using conditional probability.

Stochastic Simulation: The Monty Hall Problem
Solution via random simulation:
input N
wins = 0
for i = 1:N
    generate a random permutation of goats and prize
    you randomly choose a door
    if your door has a goat, the host opens the other door with a goat
    else if your door has the prize, the host randomly opens one of the doors with a goat
    you switch your choice
    if the switched door has the prize
        wins = wins + 1
end
probability of success = wins/N
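
The pseudocode above, as a runnable Python simulation (door labels, function name, and seed are illustrative):

```python
import random

def monty_hall(n_trials, switch=True, seed=99):
    """Simulate the Monty Hall game and return the empirical win probability.
    The host always opens a goat door that the player did not choose."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        prize = rng.randrange(3)
        choice = rng.randrange(3)
        # Goat doors the player did not pick; if the player picked the prize
        # there are two such doors and the host opens one of them at random.
        goat_doors = [d for d in range(3) if d != choice and d != prize]
        opened = rng.choice(goat_doors)
        if switch:
            choice = next(d for d in range(3) if d != choice and d != opened)
        wins += (choice == prize)
    return wins / n_trials
```

Switching wins about 2/3 of the time and staying about 1/3, matching the conditional-probability answer above.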