Stochastic Simulation: Variance reduction methods
Bo Friis Nielsen
Applied Mathematics and Computer Science, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
Email: bfni@dtu.dk
Variance reduction methods
To obtain better estimates with the same resources: exploit analytical knowledge and/or correlation.
Methods:
- Antithetic variables
- Control variates
- Stratified sampling
- Importance sampling
Bo Friis Nielsen, 6/6 2018, 02443 lecture 7
Case: Monte Carlo evaluation of an integral
Consider the integral ∫_0^1 e^x dx. We can interpret this integral as an expectation: E(e^U) = ∫_0^1 e^x dx = θ, where U ~ U(0,1).
To estimate the integral: sample the random variable e^U and take the average,
X_i = e^(U_i),  X̄ = (1/n) Σ_{i=1}^n X_i
This is the crude Monte Carlo estimator, crude because we use no refinements whatsoever.
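The crude estimator can be sketched in a few lines of Python (a minimal illustration; the function name and the sample size are arbitrary choices, not part of the lecture):

```python
import math
import random

def crude_mc(n, seed=1):
    """Crude Monte Carlo estimate of the integral of e^x over [0, 1]."""
    rng = random.Random(seed)
    samples = [math.exp(rng.random()) for _ in range(n)]
    return sum(samples) / n

# The true value is e - 1 ≈ 1.7183.
theta_hat = crude_mc(100_000)
```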
Analytical considerations
It is straightforward to calculate the integral in this case:
∫_0^1 e^x dx = e − 1 ≈ 1.72
For the estimator X based on one observation:
E(X) = e − 1
V(X) = E(X²) − E(X)²
E(X²) = ∫_0^1 (e^x)² dx = (1/2)(e² − 1)
V(X) = (1/2)(e² − 1) − (e − 1)² ≈ 0.2420
Antithetic variables
General idea: exploit correlation. If the estimator is positively correlated with U_i (a monotone function): use 1 − U_i as well,
Y_i = (e^(U_i) + e^(1 − U_i))/2 = (e^(U_i) + e/e^(U_i))/2,  Ȳ = (1/n) Σ_{i=1}^n Y_i
The computational effort of calculating Ȳ should be similar to the effort needed to compute X̄. By the latter expression for Y_i we can generate the same number of Y's as X's.
Antithetic variables - analytical
We can analyse the example analytically due to its simplicity: E(Ȳ) = E(X̄) = θ. To calculate V(Ȳ) we start with V(Y_i):
V(Y_i) = (1/4)V(e^(U_i)) + (1/4)V(e^(1−U_i)) + 2·(1/4)·Cov(e^(U_i), e^(1−U_i)) = (1/2)V(e^(U_i)) + (1/2)Cov(e^(U_i), e^(1−U_i))
Cov(e^(U_i), e^(1−U_i)) = E(e^(U_i) e^(1−U_i)) − E(e^(U_i)) E(e^(1−U_i)) = e − (e − 1)² = 3e − e² − 1 ≈ −0.2342
V(Y_i) = (1/2)(0.2420 − 0.2342) ≈ 0.0039
Comparison: Crude method vs. antithetic
Crude method: V(X_i) = (1/2)(e² − 1) − (e − 1)² ≈ 0.2420
Antithetic method: V(Y_i) = (1/2)(0.2420 − 0.2342) ≈ 0.0039
I.e. a reduction by 98%, almost for free. The variance on X̄ and Ȳ will scale with 1/n, n being the number of samples. Going from the crude to the antithetic method reduces the variance as much as increasing the number of samples by a factor of 50.
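The comparison can be checked empirically with a small sketch (function name and sample size are illustrative choices; e^(1−u) is computed as e/e^u, so each antithetic pair still costs one exponential evaluation per uniform draw):

```python
import math
import random

def antithetic_mc(n, seed=1):
    """Antithetic estimate of the integral of e^x over [0, 1].

    Returns the sample mean and sample variance of Y_i = (e^U + e^(1-U))/2.
    """
    rng = random.Random(seed)
    ys = []
    for _ in range(n):
        u = rng.random()
        eu = math.exp(u)
        ys.append((eu + math.e / eu) / 2)  # e^(1-u) = e / e^u
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean, var

mean, var = antithetic_mc(100_000)
# Theoretical values: E(Y_i) = e - 1 ≈ 1.7183, V(Y_i) ≈ 0.0039.
```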
Antithetic variables in more complex models
If X = h(U_1, ..., U_n) where h is monotone in each of its coordinates, then we can use antithetic variables to reduce the variance, because with
Y = h(1 − U_1, ..., 1 − U_n)
we have Cov(X, Y) ≤ 0 and therefore V((X + Y)/2) ≤ (1/2)V(X).
Antithetic variables in the queue simulation
Can you devise the queueing model of yesterday so that the number of rejections is a monotone function of the underlying U_i's?
Yes: make sure that we always use either U_i or 1 − U_i, so that a large U_i implies customers arriving quickly and remaining long.
Control variates
Use of covariates:
Z = X + c(Y − µ_y),  E(Y) = µ_y (known)
V(Z) = V(X) + c²V(Y) + 2c·Cov(X, Y)
We can minimize V(Z) by choosing
c = −Cov(X, Y)/V(Y)
to get
V(Z) = V(X) − Cov(X, Y)²/V(Y)
Example
Use U as a control variate:
Z_i = X_i + c(U_i − 1/2),  X_i = e^(U_i)
The optimal value of c can be found from
Cov(X, Y) = Cov(U, e^U) = E(U e^U) − E(U) E(e^U) ≈ 0.14086
so c = −0.14086/(1/12) ≈ −1.69. In practice we would not know this covariance, but estimate it empirically.
V(Z) = V(e^U) − Cov(e^U, U)²/V(U) ≈ 0.0039
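The slide notes that in practice the covariance is estimated empirically; a minimal sketch of that approach (function name and sample size are illustrative):

```python
import math
import random

def control_variate_mc(n, seed=1):
    """Estimate the integral of e^x over [0, 1] using U as a control variate,
    with the coefficient c estimated from the same sample."""
    rng = random.Random(seed)
    us = [rng.random() for _ in range(n)]
    xs = [math.exp(u) for u in us]
    mx = sum(xs) / n
    mu = sum(us) / n
    cov = sum((x - mx) * (u - mu) for x, u in zip(xs, us)) / (n - 1)
    var_u = sum((u - mu) ** 2 for u in us) / (n - 1)
    c = -cov / var_u  # empirical estimate of the optimal coefficient
    zs = [x + c * (u - 0.5) for x, u in zip(xs, us)]  # E(U) = 1/2 is known
    return sum(zs) / n

theta_hat = control_variate_mc(100_000)
```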
Stratified sampling
This is a general survey technique: we sample in predetermined areas, using knowledge of the structure of the sampling space:
W_i = (1/10) [ e^(U_{i,1}/10) + e^(1/10 + U_{i,2}/10) + ... + e^(9/10 + U_{i,10}/10) ]
What is an appropriate number of strata? (In this case there is a simple answer; for complex problems not so.)
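The estimator W_i above can be sketched as follows (each W uses one uniform per stratum, so n stratified samples cost as many uniforms as 10·n crude samples; names and sizes are illustrative):

```python
import math
import random

def stratified_mc(n, strata=10, seed=1):
    """Stratified estimate of the integral of e^x over [0, 1]:
    one uniform in each stratum [j/k, (j+1)/k), averaged over n replications."""
    rng = random.Random(seed)
    k = strata
    ws = []
    for _ in range(n):
        w = sum(math.exp((j + rng.random()) / k) for j in range(k)) / k
        ws.append(w)
    return sum(ws) / n

theta_hat = stratified_mc(10_000)
```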
Importance sampling
Suppose we want to evaluate
θ = E(h(X)) = ∫ h(x) f(x) dx
For g(x) > 0 whenever f(x) > 0 this is equivalent to
θ = ∫ (h(x) f(x)/g(x)) g(x) dx = E( h(Y) f(Y)/g(Y) )
where Y is distributed according to g(y). This is an efficient estimator of θ if we have chosen g such that the variance of h(Y)f(Y)/g(Y) is small. Such a g will lead to more Y's where h(y) is large. More important regions will be sampled more often.
Re-using the random numbers
We want to compare two different queueing systems. We can estimate the rejection rate of system i = 1, 2 by
θ_i = E(g_i(U_1, ..., U_n))
and then rate the two systems according to θ̂_2 − θ̂_1. But typically g_1(·) and g_2(·) are positively correlated: long service times imply many rejections.
Then a more efficient estimator is based on
θ_2 − θ_1 = E(g_2(U_1, ..., U_n) − g_1(U_1, ..., U_n))
This amounts to letting the two systems run with the same input sequence of random numbers, i.e. the same arrival and service time for each customer. With some program flows, this is easily obtained by re-setting the seed of the RNG. When this is not sufficient, you must store the sequence of arrival and service times so they can be re-used.
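The idea can be sketched schematically. Here `run_system` is a hypothetical stand-in for the queue simulation (its "rejection count" is made up purely for illustration); the point is that one stored uniform sequence drives both systems:

```python
import random

def run_system(us, service_factor):
    """Hypothetical stand-in for a queue simulation: a toy 'rejection count'
    that increases with service_factor (purely illustrative, not a real queue)."""
    return sum(1 for u in us if u < 0.1 * service_factor)

SEED = 1234
rng = random.Random(SEED)
us = [rng.random() for _ in range(10_000)]  # stored common input sequence

# The same input sequence drives both systems (common random numbers),
# so the difference estimate is not blurred by independent sampling noise.
diff = run_system(us, service_factor=2.0) - run_system(us, service_factor=1.0)
```

Re-setting the seed and drawing afresh before each run would achieve the same effect when the two programs consume the uniforms in the same order.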
Exercise 5: Variance reduction methods
1. Estimate the integral ∫_0^1 e^x dx by simulation (the crude Monte Carlo estimator). Use, e.g., an estimator based on 100 samples and present the result as the point estimate and a confidence interval.
2. Estimate the integral ∫_0^1 e^x dx using antithetic variables, with comparable computer resources.
3. Estimate the integral ∫_0^1 e^x dx using a control variate, with comparable computer resources.
4. Estimate the integral ∫_0^1 e^x dx using stratified sampling, with comparable computer resources.
5. Use control variates to reduce the variance of the estimator in exercise 4 (Poisson arrivals).