A Review of Basic Monte Carlo Methods

Julian Haft
May 9, 2014

Introduction

One of the most powerful techniques in statistical analysis developed in the past century is undoubtedly that of Monte Carlo simulation. First conceived by scientists working on the nuclear bomb in the 1940s, the theory of Monte Carlo simulation emerged simultaneously with the first computers capable of producing such simulations. In the seventy-odd years since, Monte Carlo methods have become ubiquitous in fields as diverse as finance, chemistry, physics, biology and economics. This paper serves to elucidate the theoretical foundations of Monte Carlo methods while demonstrating some basic applications.

Motivation

Frequently the properties of a statistical model cannot easily be derived through statistical theory alone, due to complexity or gaps in understanding, necessitating the use of empirical methods. How do we empirically study a theoretical model? We have to find an actual instance of it that is conducive to study; however, this is usually an impossible proposition. So we use Monte Carlo methods to generate data indistinguishable from data collected from an actual phenomenon adhering to the specifications of our model. Monte Carlo methods are thus uniquely suited to empirically studying the properties of theoretical models, and this is how they are most often used (though there is a plethora of novel applications, such as the decryption of ciphers and machine-learning models).

Heuristic explanation

Monte Carlo methods are based on a rather simple analogy between probability and volume. The idea is encapsulated in the following method for calculating the area of the unit disk. Let X and Y be i.i.d. U[-1,1]. Generate a sample of size N from the pair (X, Y). Eliminate all (x_i, y_i) in the sample not satisfying x_i² + y_i² ≤ 1. Take the ratio of the number of remaining values over the total number of values generated. That ratio should be approximately the ratio of the area of the unit disk to that of the box [-1,1]², and it is also an approximation of the probability that an observed value of (X, Y) lies within the unit disk.

The above procedure relies heavily on the idea that we can simulate uniform random variables, or equivalently that we can simulate U[0,1] (as all uniform distributions can be represented as scalings and translations of U[0,1]). Unsurprisingly, we can in fact simulate U[0,1] perfectly well according to any reasonable set of tests for randomness.
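A minimal sketch of this procedure in Python (assuming NumPy; the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Draw N i.i.d. pairs (X, Y) with X, Y ~ U[-1, 1].
x = rng.uniform(-1.0, 1.0, size=N)
y = rng.uniform(-1.0, 1.0, size=N)

# Keep only the points inside the unit disk and take the ratio.
inside = x**2 + y**2 <= 1.0
ratio = inside.mean()      # approximates the area ratio, pi/4

print(4.0 * ratio)         # approximates pi, the area of the unit disk
```

Multiplying the accepted ratio by the area of the box, 4, recovers an estimate of the area of the unit disk, that is, of π.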

General Monte Carlo methods are primarily predicated on the possibility of drawing countably many random variables from a known probability distribution. Surprisingly, this boils down to being able to simulate U[0,1], as almost any distribution can be simulated from a uniform distribution. In the following section we outline methods for simulating random variables from a uniform distribution, beginning with the simple inverse probability transform and ending with the extraordinarily powerful accept-reject method.

Simulating random variables

It will become apparent in the exposition of methods that almost all Monte Carlo methods require us to be able to simulate indefinitely large samples from an arbitrarily distributed random variable. In this section we demonstrate that we can in fact satisfy this requirement for almost any conceivable distribution. The first method exploits a connection between cumulative distribution functions and the uniform distribution on the unit interval.

Inverse Transform

Let X be a continuous random variable with pdf f and cdf F, so that

    F(x) = ∫_{-∞}^x f(t) dt.

Let U ∼ U[0,1]. Then, assuming F has an inverse, we have

    P(F⁻¹(U) ≤ x) = P(U ≤ F(x)) = F(x),

so F⁻¹(U) ∼ X. For the general case, where we do not know whether F has an inverse, we use the right continuity and monotonicity of a continuous cdf to define a generalized inverse:

    F⁻¹(u) = inf{t : F(t) ≥ u},  0 < u < 1.

Clearly F⁻¹(U) ∼ X for all continuous X. However, since F⁻¹ may not have a closed form, this method is often useless.

Example

As an example of this method, consider X ∼ β(1/2, 1/2), a special case of the β distribution, for which

    F(x) = (2/π) arcsin(√x),  so  F⁻¹(u) = sin²(πu/2).

We drew 10⁴ samples from U[0,1], applied F⁻¹ to each of them, and then plotted a histogram of our data with forty bars, superimposing the pdf of X on top of it.

[Figure 1: Simulation results. Histogram of the transformed sample with the β(1/2, 1/2) density superimposed.]

The treated data clearly has the desired distribution. However, we cannot use this method to simulate a general β distribution, as its cdf generally does not have a closed form and therefore does not have a closed-form inverse. Thus we need additional methods before we can be satisfied that the Monte Carlo methods provided rely on reasonable assumptions.
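A sketch of this experiment in Python (assuming NumPy; the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)

# Inverse transform for the arcsine law beta(1/2, 1/2):
# F^{-1}(u) = sin^2(pi * u / 2).
u = rng.uniform(size=10_000)
x = np.sin(np.pi * u / 2.0) ** 2

# x should now look like a sample from beta(1/2, 1/2); a histogram of x
# can be compared against the density 1 / (pi * sqrt(t * (1 - t))).
```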

Accept-Reject

The following method is far more general and has a much more intuitive core, though it takes a bit of effort to expose this intuition. The procedure below is not the actual method, but it is what the accept-reject method approximates.

Suppose we wish to simulate a random variable X with known pdf f. Let (X, U) be the pairing of X with a random variable U satisfying U | X = x ∼ U[0, f(x)]. The joint density is then

    p(x, u) = p(u | x) f(x) = (1/f(x)) · f(x) = 1,  for 0 ≤ u ≤ f(x),

and integrating out u recovers the marginal density of X:

    p(x) = ∫₀^{f(x)} p(x, u) du = f(x).

Now this result is completely unexciting, since in order to get it we would have to be able to simulate X. Nevertheless, consider what samples from (X, U) actually are, namely points under the graph of the density of X. The idea of the accept-reject method is to generate samples from (Y, U), where Y is a random variable with a pdf g whose support contains the support of f and which differs from f by at most a multiplicative factor M (that is, f(x) ≤ M g(x) for all x), and U ∼ U[0,1]. From this sample we then throw out all pairs (Y_i, U_i) for which M g(Y_i) U_i is not less than f(Y_i), i.e., we eliminate all pairs where

    U_i ≥ f(Y_i) / (M g(Y_i)).

Though we will inevitably have thrown out some pairs lying under the pdf of X, the remaining points (Y_i, M g(Y_i) U_i) all lie under the graph of f, so the marginal distribution of Y over the accepted pairs is that of X. Thus the set of accepted Y's in the sample is indistinguishable from a sample from X.

The mathematical justification for this method can be found in the following calculation, which proves that

    P(Y ≤ y | U ≤ f(Y)/(M g(Y))) = P(X ≤ y).

(Since the proof is quite arduous and not terribly enlightening, unless you are interested in exploiting the relationship between joint and conditional probabilities, it may be skipped.)

We must compute

    P(Y ≤ y | U ≤ f(Y)/(M g(Y))) = P(Y ≤ y, U ≤ f(Y)/(M g(Y))) / P(U ≤ f(Y)/(M g(Y))).

First we solve for the denominator:

    P(U ≤ f(Y)/(M g(Y))) = ∫ P(U ≤ f(y)/(M g(y)) | Y = y) g(y) dy
                         = ∫ (f(y)/(M g(y))) g(y) dy = 1/M.

Next we calculate

    P(U ≤ f(Y)/(M g(Y)) | Y ≤ y) = (1/G(y)) ∫_{-∞}^y (f(t)/(M g(t))) g(t) dt = F(y)/(M G(y)).

Using this we evaluate the numerator:

    P(Y ≤ y, U ≤ f(Y)/(M g(Y))) = P(U ≤ f(Y)/(M g(Y)) | Y ≤ y) P(Y ≤ y)
                                = (F(y)/(M G(y))) · G(y) = F(y)/M.

So

    P(Y ≤ y | U ≤ f(Y)/(M g(Y))) = (F(y)/M) / (1/M) = F(y).

In the above calculation we incidentally computed the probability that a pair is accepted; here we reproduce it more explicitly:

    P(U ≤ f(Y)/(M g(Y))) = ∫ P(U ≤ f(Y)/(M g(Y)) | Y = y) g(y) dy
                         = ∫ (f(y)/(M g(y))) g(y) dy = 1/M.

Limitations

The accept-reject method is far more generally applicable than the inverse transform, as it only requires knowing f and being able to identify a probability distribution for Y that is appropriate for use in the algorithm. Unfortunately, identifying such a Y can be difficult, and if M is large then generating a sample of approximately size N from X can be computationally intensive, as we would have to generate a sample of size M·N from (Y, U).

Example

As an example, suppose we want to simulate X ∼ β(2, 3), and let f be the pdf of X. First we calculate M = max_{x ∈ [0,1]} f(x) = 16/9. Then we take Y ∼ U[0,1] and U ∼ U[0,1] and generate 10⁵ pairs (Y, U); since the acceptance probability is 1/M = 9/16, we expect to keep 56,250 pairs after filtering out the ones which do not satisfy U ≤ 9 f(Y)/16. In the simulation I ran, 56,464 were accepted. The histogram below shows that the accepted values are distributed β(2, 3).

[Figure 2: Simulation results. Histogram of the accepted values with the β(2, 3) density superimposed.]
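A sketch of this experiment in Python (the density f(x) = 12x(1-x)² of β(2, 3) and the bound M = 16/9 are as above; the seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 16.0 / 9.0                  # M = max of f on [0, 1]

def f(x):
    # pdf of beta(2, 3)
    return 12.0 * x * (1.0 - x) ** 2

# Proposal Y ~ U[0, 1], so g = 1 on [0, 1], and U ~ U[0, 1].
y = rng.uniform(size=100_000)
u = rng.uniform(size=100_000)

# Accept exactly the pairs with U <= f(Y) / (M g(Y)) = 9 f(Y) / 16.
accepted = y[u <= f(y) / M]
print(accepted.size)            # roughly 56,250 on average, since 1/M = 9/16
```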
Discrete RVs

Suppose X is a discrete random variable whose range contains a countably infinite or finite number of values x_1, x_2, ..., x_k, .... Suppose f is the pmf of X and let

    F(x) = P(X ≤ x) = Σ_{t ≤ x} f(t).

We can simulate X by sampling U ∼ U[0,1] and, for each draw u, taking Y equal to the least x_j such that u ≤ F(x_j). We clearly have Y ∼ X.
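A minimal sketch in Python (the three-point pmf here is hypothetical, chosen only to illustrate the method):

```python
import numpy as np

rng = np.random.default_rng(3)

values = np.array([1, 2, 3])        # x_1 < x_2 < x_3
pmf = np.array([0.2, 0.5, 0.3])     # f(x_j)
cdf = np.cumsum(pmf)                # F(x_j)

# For each u, return the least x_j with u <= F(x_j).
u = rng.uniform(size=10_000)
sample = values[np.searchsorted(cdf, u)]
```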
We have demonstrated three methods of generating random variables from other random variables. Our claim is that these methods allow us to simulate a random variable with almost any distribution.

Monte Carlo Integration

Let X be a continuous random variable with pdf f that takes values in χ, and let g be a function on χ. One of the central techniques in Monte Carlo simulation is the approximation of the following class of integrals:

    E_f[g(X)] = ∫_χ f(x) g(x) dx.

The method is as follows: generate a sample (X_1, ..., X_n) from X and calculate the sum

    ĝ_n = (1/n) Σ_{i=1}^n g(X_i).

The strong law of large numbers ensures that our estimator will converge to E_f[g(X)] as n approaches infinity. The variance of ĝ_n can be expressed as

    var(ĝ_n) = (1/n) ∫_χ (g(x) - E_f[g(X)])² f(x) dx.

Of course, we can use Monte Carlo integration to estimate var(ĝ_n) itself, by calculating

    σ_n = (1/n²) Σ_{i=1}^n (g(X_i) - ĝ_n)².

Let 1_C be the characteristic function of the segment of the unit disk lying in the first quadrant of R². The method we presented in the introduction to approximate the area of this segment is equivalent to using Monte Carlo integration to approximate E_f[1_C(X)], where f is the pdf of X ∼ (U[0,1], U[0,1]).

The utility of these estimators lies in their general applicability. However, there are various conditions that can make the method very computationally expensive. We go over one in the following example.

Example

Let X ∼ N(0, 1), let 1_{|X|>2.6} be the indicator function of the inequality in the subscript, and let f be the pdf of X. Note that the expected value

    E[1_{|X|>2.6}] = ∫ 1_{|X|>2.6}(x) f(x) dx

is equivalent to 2 ∫_{2.6}^∞ f(x) dx, the probability that |X| > 2.6. The exact numerical value of the integral is 0.009322. Generating a sample of size 10⁵ from X yielded 950 values in the support of g = 1_{|X|>2.6}, giving

    ĝ_{10⁵} = 950/10⁵ = 0.0095.

This is a rather poor estimate for such a gigantic sample. Intuitively, this is because the only values of interest to us are extremely rare values of X, and therefore we have to simulate X many times to get even one such value.
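A sketch of this naive estimate in Python (seed and sample size arbitrary; the exact value 0.009322 is quoted above):

```python
import numpy as np

rng = np.random.default_rng(4)

x = rng.standard_normal(100_000)
indicator = np.abs(x) > 2.6      # 1_{|X| > 2.6}

g_hat = indicator.mean()         # naive Monte Carlo estimate of P(|X| > 2.6)
print(g_hat)                     # compare with the exact value 0.009322
```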

Importance sampling

The shortcoming illustrated in the example above can be overcome by using a method called importance sampling. Let X be a random variable with pdf f which takes values in χ, let Y be another random variable with pdf q which takes values in Ω, and let h be a function defined on χ. Suppose that f, q and h satisfy the condition

    support(f) ∩ support(h) ⊆ support(q).

Then we can change the underlying distribution by instead estimating the integral on the right-hand side of the equality

    E_f[h(X)] = ∫_χ h(x) (f(x)/q(x)) q(x) dx = E_q[h(X) f(X)/q(X)].

Thus, given a sample (Y_1, ..., Y_n) from Y, we can use the following estimator for the integral E_f[h(X)]:

    v = (1/n) Σ_{i=1}^n (f(Y_i)/q(Y_i)) h(Y_i).

In order to see the utility of the method, let us return to the problem we examined in the last example.

Example

Suppose X ∼ N(0, 1) and h = 1_{X>2.6}. Let Y follow an exponential distribution truncated at 2.6, and let q be the pdf of Y, so q(x) = exp(-(x - 2.6)) for x > 2.6. Now let us generate a sample of size 10⁴ from Y, which we can do by generating Z ∼ Exp(1) and setting Y = Z + 2.6. We then use the sample to evaluate

    E_f[1_{X>2.6}] ≈ (1/n) Σ_{i=1}^n f(Y_i)/q(Y_i)

(the factor h(Y_i) is identically 1 here, since every Y_i exceeds 2.6). Using this procedure I calculated E_f[1_{X>2.6}] ≈ 0.0046815, which is quite close to the actual value of 0.00466119. Using the symmetry of the normal distribution we get E_f[1_{|X|>2.6}] ≈ 0.009363. Thus, using importance sampling, we have produced a much better estimate using a tenth of the sample size.

Limitations

Despite its benefits, importance sampling has many limitations. While the estimator using any compatible q will converge in the limit to the desired integral, some choices of q may so drastically diminish the efficiency of the estimator that, for any N, an estimator computed from a sample of size N may deviate from the actual value by an unacceptable margin. An importance-sampling estimator derived from a q for which f/q is unbounded suffers from this problem: its variance is unbounded, because there will be values of Y for which the associated weight f(Y)/q(Y) is arbitrarily large. In order to get an acceptable estimate it is therefore necessary to choose a q for which there exists an m such that f ≤ m·q. Thus the criterion for choosing an acceptable q is essentially the same as for choosing a g in the accept-reject method. The central limit theorem tells us that the fluctuations √n(v - E_f[h(X)]) are asymptotically Gaussian when f/q is bounded [2].
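A sketch of this importance-sampling estimate in Python (assuming NumPy; the seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000

# Proposal: Y = Z + 2.6 with Z ~ Exp(1), so q(y) = exp(-(y - 2.6)) for y > 2.6.
y = rng.exponential(scale=1.0, size=n) + 2.6

# Target density f is the standard normal pdf; h(Y_i) = 1 for every draw.
f = np.exp(-y**2 / 2.0) / np.sqrt(2.0 * np.pi)
q = np.exp(-(y - 2.6))

estimate = np.mean(f / q)        # estimates P(X > 2.6) for X ~ N(0, 1)
print(estimate, 2.0 * estimate)  # doubled: estimates P(|X| > 2.6)
```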
Conclusion

There are several additional central methods that fall under the heading of Monte Carlo methods; however, they require mathematical knowledge well beyond the prerequisites for this class. One of these methods, optimization, can be briefly summarized as calculating the maximum of a bounded function h on a bounded domain by taking a sample from the uniform distribution on the domain and taking the maximum value in the image of the sample under h (a minimal sketch follows below). However, this is a terribly slow and inefficient method, and gains in efficiency require an understanding of multivariable calculus. Thus the technique of optimization could not be satisfactorily developed in this paper.
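A minimal sketch of this crude optimizer in Python (the function h below is a hypothetical bounded function on [0, 1], used only for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)

def h(x):
    # hypothetical bounded function on the bounded domain [0, 1]
    return np.sin(5.0 * x) * (1.0 - x)

x = rng.uniform(size=100_000)    # uniform sample on the domain
print(h(x).max())                # crude Monte Carlo estimate of max h
```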
The results presented above represent only a small subset of the techniques used in Monte Carlo experiments, but they should provide an adequate understanding of how most basic experiments work and of the central idea of Monte Carlo simulation.

References

[1] Paul Glasserman. Monte Carlo Methods in Financial Engineering. Springer Science + Business Media, New York, 2004.

[2] W.S. Kendall, J.M. Marin, and C.P. Robert. Confidence bands for Brownian motion and applications to Monte Carlo simulation. Statistics and Computing, 17(1):1-10, March 2007.

[3] Christian P. Robert and George Casella. Introducing Monte Carlo Methods with R. Use R! Springer, New York, 2010.

[4] Christian P. Robert and George Casella. Monte Carlo Statistical Methods. Springer Texts in Statistics. Springer, New York, 2nd edition, 2010.