Introduction to (randomized) quasi-Monte Carlo
Pierre L'Ecuyer, MCQMC Conference, Stanford University, August 2016
Program
- Monte Carlo, quasi-Monte Carlo, randomized quasi-Monte Carlo
- QMC point sets and randomizations
- Error and variance bounds, convergence rates
- Transforming the integrand to make it more QMC-friendly (smoother, smaller effective dimension, etc.)
- Numerical illustrations
- RQMC for Markov chains
Focus on ideas, insight, and examples.
Example: A stochastic activity network
Gives precedence relations between activities. Activity k has random duration Y_k (also the length of arc k) with known cumulative distribution function (cdf) F_k(y) := P[Y_k <= y].
Project duration T = (random) length of the longest path from source to sink.
May want to estimate E[T], P[T > x], a quantile, the density of T, etc.
[Figure: activity network with arcs Y_0, ..., Y_12 from the source (node 0) to the sink (node 8).]
Monte Carlo (simulation)
Algorithm: Monte Carlo to estimate E[T]
for i = 0, ..., n-1 do
  for k = 0, ..., 12 do
    Generate U_k ~ U(0,1) and let Y_k = F_k^{-1}(U_k)
  Compute X_i = T = h(Y_0, ..., Y_12) = f(U_0, ..., U_12)
Estimate E[T] = integral of f(u) over (0,1)^13 by X-bar_n = (1/n) sum_{i=0}^{n-1} X_i, etc.
Can also compute a confidence interval on E[T], a histogram to estimate the distribution of T, etc.
Numerical illustration from Elmaghraby (1977): Y_k ~ N(mu_k, sigma_k^2) for k = 0, 1, 3, 10, 11, and Y_k ~ Expon(1/mu_k) otherwise.
mu_0, ..., mu_12: 13.0, 5.5, 7.0, 5.2, 16.5, 14.7, 10.3, 6.0, 4.0, 20.0, 3.2, 3.2, ...
We may pay a penalty if T > 90, for example.
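A minimal Python sketch of the algorithm above, generating each duration by inversion. The small network here (arc means, which arcs are normal, the normal standard deviations, and the path list) is a made-up illustration, not the 13-arc Elmaghraby network from the slides.

```python
# Monte Carlo estimate of E[T] for a small (hypothetical) activity network.
import math
import random
from statistics import NormalDist

def inv_cdf_expon(u, mean):
    """Inversion for an exponential with the given mean."""
    return -mean * math.log1p(-u)

def simulate_T(rng, mu, normal_arcs, paths):
    """One realization of the project duration T (longest-path length)."""
    y = []
    for k, m in enumerate(mu):
        u = rng.random()
        if k in normal_arcs:                 # Y_k ~ N(m, (m/4)^2), illustrative
            y.append(NormalDist(m, m / 4).inv_cdf(u))
        else:                                # Y_k ~ Expon with mean m
            y.append(inv_cdf_expon(u, m))
    return max(sum(y[k] for k in path) for path in paths)

rng = random.Random(12345)
mu = [13.0, 5.5, 7.0, 5.2, 16.5]             # hypothetical arc means
normal_arcs = {0, 3}                         # hypothetical choice of normal arcs
paths = [(0, 1, 4), (0, 2, 4), (3, 4)]       # hypothetical source-to-sink paths
n = 10000
xs = [simulate_T(rng, mu, normal_arcs, paths) for _ in range(n)]
mean_T = sum(xs) / n
print(mean_T)
```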
Naive idea: replace each Y_k by its expectation. This gives T = 48.2.
Results of a simulation experiment: the histogram of the values of T gives more information than a confidence interval on E[T] or P[T > x]. Values range from 14.4 to 268.6; 11.57% exceed x = 90.
[Figure: histogram of T, marking the mean = 64.2, the deterministic value T = 48.2, the threshold x = 90, and the 0.99 quantile.]
Sample path of hurricane Sandy for the next 5 days
"As Forecasts Go, You Can Bet on Monte Carlo" (WSJ, The Numbers): from Super Bowls to hurricanes, this simulation method helps predict them all. Monte Carlo simulations helped give emergency workers advance warning that Hurricane Sandy would make landfall.
Monte Carlo to estimate an expectation
Want to estimate mu = E[X] where X = f(U) = f(U_0, ..., U_{s-1}), and the U_j are i.i.d. U(0,1) random numbers. We have
mu = E[X] = integral of f(u) over [0,1)^s.
Monte Carlo estimator: X-bar_n = (1/n) sum_{i=0}^{n-1} X_i, where X_i = f(U_i) and U_0, ..., U_{n-1} are i.i.d. uniform over [0,1)^s.
We have E[X-bar_n] = mu and Var[X-bar_n] = sigma^2/n = Var[X]/n.
Convergence
Theorem. Suppose sigma^2 < infinity. When n -> infinity:
(i) Strong law of large numbers: lim_{n->infinity} mu-hat_n = mu with probability 1.
(ii) Central limit theorem (CLT): sqrt(n) (mu-hat_n - mu) / S_n => N(0,1), where S_n^2 = (1/(n-1)) sum_{i=0}^{n-1} (X_i - X-bar_n)^2.
Confidence interval at level 1 - alpha (we want Phi(x) = 1 - alpha/2): (mu-hat_n +/- z_{alpha/2} S_n / sqrt(n)), where z_{alpha/2} = Phi^{-1}(1 - alpha/2). Example: z_{alpha/2} ~ 1.96 for alpha = 0.05.
The width of the confidence interval is asymptotically proportional to sigma/sqrt(n), so it converges as O(n^{-1/2}). Relative error: sigma/(mu sqrt(n)). For one more decimal digit of accuracy, we must multiply n by 100.
Warning: if the X_i have an asymmetric law, these confidence intervals can have very bad coverage (convergence to the normal can be very slow).
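A short sketch of this confidence-interval computation; the exponential sample is an arbitrary stand-in for the simulation output X_1, ..., X_n.

```python
# 95% confidence interval for E[X] from i.i.d. replicates, using the
# normal quantile z_{alpha/2} = Phi^{-1}(1 - alpha/2).
import math
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(42)
xs = [rng.expovariate(1.0) for _ in range(10000)]   # example data, true mean 1

alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)             # ~1.96
m = mean(xs)
half_width = z * stdev(xs) / math.sqrt(len(xs))
ci = (m - half_width, m + half_width)
print(ci)
```

Note that this sample is exponential, hence asymmetric: the warning above applies, and the CLT-based interval is only asymptotically valid.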
Alternative estimator of P[T > x] = E[I(T > x)] for the SAN.
Naive estimator: generate T and compute X = I[T > x]. Repeat n times and average.
[Figure: the activity network again.]
Conditional Monte Carlo estimator of P[T > x].
Generate the Y_j's only for the 8 arcs that do not belong to the cut L = {4, 5, 6, 8, 9}, and replace I[T > x] by its conditional expectation given those Y_j's,
X_e = P[T > x | {Y_j, j not in L}].
This makes the integrand continuous in the U_j's.
To compute X_e: for each l in L, say from a_l to b_l, compute the length alpha_l of the longest path from the source to a_l, and the length beta_l of the longest path from b_l to the sink. The longest path that passes through link l does not exceed x iff alpha_l + Y_l + beta_l <= x, which occurs with probability P[Y_l <= x - alpha_l - beta_l] = F_l[x - alpha_l - beta_l].
Since the Y_l are independent, we obtain
X_e = 1 - prod_{l in L} F_l[x - alpha_l - beta_l].
Can be faster to compute than X, and always has less variance.
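A sketch of the final product formula. In the real estimator, alpha_l and beta_l come from longest-path computations on the simulated non-cut arcs; here they are made-up inputs, and all cut arcs are taken exponential for illustration.

```python
# Conditional Monte Carlo: X_e = 1 - prod_l F_l[x - alpha_l - beta_l].
import math

def F_expon(y, mean):
    """Exponential cdf with the given mean; 0 for non-positive arguments."""
    return 1.0 - math.exp(-y / mean) if y > 0 else 0.0

def cmc_estimate(x, cut):
    """cut: list of (alpha_l, beta_l, mean_l) for the arcs l in L."""
    prod = 1.0
    for alpha, beta, mean_l in cut:
        prod *= F_expon(x - alpha - beta, mean_l)
    return 1.0 - prod  # P[T > x | Y_j, j not in L]

cut = [(30.0, 25.0, 16.5), (20.0, 40.0, 14.7), (35.0, 30.0, 6.0)]  # hypothetical
xe = cmc_estimate(90.0, cut)
print(xe)
```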
Example: pricing a financial derivative.
The market price of some asset (e.g., one share of a stock) evolves in time as a stochastic process {S(t), t >= 0} with (supposedly) known probability law (estimated from data).
A financial contract gives its owner the net payoff g(S(t_1), ..., S(t_d)) at time T = t_d, where g: R^d -> R, and 0 <= t_1 < ... < t_d are fixed observation times.
Under a no-arbitrage assumption, the present value (fair price) of the contract at time 0, when S(0) = s_0, can be written as
v(s_0, T) = E*[e^{-rT} g(S(t_1), ..., S(t_d))],
where E* is under a risk-neutral measure and e^{-rT} is the discount factor.
This expectation can be written as an integral over [0,1)^s and estimated by the average of n i.i.d. replicates of X = e^{-rT} g(S(t_1), ..., S(t_d)).
A simple model for S: geometric Brownian motion (GBM):
S(t) = s_0 e^{(r - sigma^2/2) t + sigma B(t)},
where r is the interest rate, sigma is the volatility, and B(.) is a standard Brownian motion: for any t_2 > t_1 >= 0, B(t_2) - B(t_1) ~ N(0, t_2 - t_1), and the increments over disjoint intervals are independent.
Algorithm: option pricing under the GBM model
for i = 0, ..., n-1 do
  Let t_0 = 0 and B(t_0) = 0
  for j = 1, ..., d do
    Generate U_j ~ U(0,1) and let Z_j = Phi^{-1}(U_j)
    Let B(t_j) = B(t_{j-1}) + sqrt(t_j - t_{j-1}) Z_j
    Let S(t_j) = s_0 exp[(r - sigma^2/2) t_j + sigma B(t_j)]
  Compute X_i = e^{-rT} g(S(t_1), ..., S(t_d))
Return X-bar_n = (1/n) sum_{i=0}^{n-1} X_i, estimator of v(s_0, T).
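The algorithm above can be sketched in Python for the Asian call payoff g = max(0, average - K), using the parameters of the numerical illustration later on the slides (d = 12, T = 1, K = 100, s_0 = 100, r = 0.05, sigma = 0.5); the sample size and seed are arbitrary.

```python
# Monte Carlo pricing of an Asian call under GBM, by inversion.
import math
import random
from statistics import NormalDist

def price_asian_call(n, d=12, T=1.0, K=100.0, s0=100.0, r=0.05, sigma=0.5, seed=1):
    rng = random.Random(seed)
    norm = NormalDist()
    disc = math.exp(-r * T)
    total = 0.0
    for _ in range(n):
        B, s_sum = 0.0, 0.0
        for j in range(1, d + 1):
            z = norm.inv_cdf(rng.random())        # Z_j = Phi^{-1}(U_j)
            B += math.sqrt(T / d) * z             # Brownian increment
            t = j * T / d
            s_sum += s0 * math.exp((r - sigma ** 2 / 2) * t + sigma * B)
        total += disc * max(0.0, s_sum / d - K)   # discounted payoff
    return total / n

price = price_asian_call(20000)
print(price)
```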
Example of contract: discretely-monitored Asian call option:
g(S(t_1), ..., S(t_d)) = max(0, (1/d) sum_{j=1}^d S(t_j) - K).
Option price written as an integral over the unit hypercube: let Z_j = Phi^{-1}(U_j), where the U_j are i.i.d. U(0,1). Here we have s = d and
v(s_0, T) = integral over [0,1)^s of e^{-rT} max(0, (1/s) sum_{j=1}^s s_0 exp[(r - sigma^2/2) t_j + sigma sum_{i=1}^j sqrt(t_i - t_{i-1}) Phi^{-1}(u_i)] - K) du_1 ... du_s
= integral over [0,1)^s of f(u_1, ..., u_s) du_1 ... du_s.
Numerical illustration: Asian option with d = 12, T = 1 (one year), t_j = j/12 for j = 0, ..., 12, K = 100, s_0 = 100, r = 0.05, sigma = 0.5.
We performed n = 10^6 independent simulation runs. In 53.47% of cases, the payoff is 0.
[Figure: histogram of the 46.53% positive payoff values.]
Reducing the variance by changing f
If we replace the arithmetic average by a geometric average in the payoff, we obtain
C = e^{-rT} max(0, prod_{j=1}^d (S(t_j))^{1/d} - K),
whose expectation nu = E[C] has a closed-form formula.
When estimating the mean E[X] = v(s_0, T), we can then use C as a control variate (CV): replace the estimator X by the corrected version X_c = X - beta (C - nu) for some well-chosen constant beta. The optimal beta is beta* = Cov[C, X] / Var[C].
Using a CV makes the integrand f smoother. It can provide a huge variance reduction, e.g., by a factor of over a million in some examples.
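A sketch of this control variate, with two simplifications relative to the slides (both assumptions for brevity): beta* is estimated from a pilot sample rather than known, and nu is replaced by the pilot mean of C instead of the closed-form formula.

```python
# Control variate: geometric-average Asian payoff C corrects the
# arithmetic-average payoff X.
import math
import random
from statistics import NormalDist, mean, variance

def sample_X_C(rng, d=12, T=1.0, K=100.0, s0=100.0, r=0.05, sigma=0.5):
    """One pair (X, C): discounted arithmetic and geometric Asian payoffs."""
    norm = NormalDist()
    B = arith = glog = 0.0
    for j in range(1, d + 1):
        B += math.sqrt(T / d) * norm.inv_cdf(rng.random())
        t = j * T / d
        s = s0 * math.exp((r - sigma ** 2 / 2) * t + sigma * B)
        arith += s / d
        glog += math.log(s) / d
    disc = math.exp(-r * T)
    return disc * max(0.0, arith - K), disc * max(0.0, math.exp(glog) - K)

rng = random.Random(7)
pilot = [sample_X_C(rng) for _ in range(5000)]      # pilot: estimate beta and nu
xs = [p[0] for p in pilot]
cs = [p[1] for p in pilot]
mx, mc = mean(xs), mean(cs)
cov = sum((x - mx) * (c - mc) for x, c in pilot) / (len(pilot) - 1)
beta = cov / variance(cs)                           # beta* = Cov[C, X] / Var[C]
nu = mc                                             # stand-in for the closed form

fresh = [sample_X_C(rng) for _ in range(5000)]      # independent estimation run
plain = [x for x, _ in fresh]
corrected = [x - beta * (c - nu) for x, c in fresh]
print(mean(plain), mean(corrected))
print(variance(corrected) / variance(plain))        # empirical variance ratio
```

The arithmetic and geometric payoffs are very highly correlated, so the empirical variance ratio is tiny.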
Quasi-Monte Carlo (QMC)
Replace the independent random points U_i by a set of deterministic points P_n = {u_0, ..., u_{n-1}} that cover [0,1)^s more evenly. Estimate
mu = integral of f(u) over [0,1)^s by mu-hat_n = (1/n) sum_{i=0}^{n-1} f(u_i).
Integration error: E_n = mu-hat_n - mu.
P_n is called a highly-uniform point set or low-discrepancy point set if some measure of discrepancy between the empirical distribution of P_n and the uniform distribution converges to 0 faster than O(n^{-1/2}) (the typical rate for independent random points).
Main construction methods: lattice rules and digital nets (Korobov, Hammersley, Halton, Sobol', Faure, Niederreiter, etc.).
Simple case: one dimension (s = 1)
Obvious solutions:
P_n = Z_n/n = {0, 1/n, ..., (n-1)/n} (left Riemann sum), which gives mu-hat_n = (1/n) sum_{i=0}^{n-1} f(i/n), and E_n = O(n^{-1}) if f' is bounded,
or P_n = {1/(2n), 3/(2n), ..., (2n-1)/(2n)} (midpoint rule), for which E_n = O(n^{-2}) if f'' is bounded.
If we allow different weights on the f(u_i), we have the trapezoidal rule:
(1/n) [ (f(0) + f(1))/2 + sum_{i=1}^{n-1} f(i/n) ],
for which E_n = O(n^{-2}) if f'' is bounded, or Simpson's rule,
[ f(0) + 4 f(1/n) + 2 f(2/n) + ... + 2 f((n-2)/n) + 4 f((n-1)/n) + f(1) ] / (3n),
which gives E_n = O(n^{-4}) if f^{(4)} is bounded, etc.
Here, for QMC and RQMC, we restrict ourselves to equal-weight rules. For the RQMC points that we will examine, one can prove that equal weights are optimal.
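These one-dimensional rates can be checked numerically; a small sketch on the smooth test integrand f(u) = e^u over [0,1) (an arbitrary choice), whose exact integral is e - 1:

```python
# Comparing the O(n^-1) left Riemann sum with the O(n^-2) midpoint and
# trapezoidal rules on a smooth integrand.
import math

def left_riemann(f, n):
    return sum(f(i / n) for i in range(n)) / n

def midpoint(f, n):
    return sum(f((2 * i + 1) / (2 * n)) for i in range(n)) / n

def trapezoid(f, n):
    inner = sum(f(i / n) for i in range(1, n))
    return ((f(0.0) + f(1.0)) / 2 + inner) / n

exact = math.e - 1.0
n = 100
errs = {rule.__name__: abs(rule(math.exp, n) - exact)
        for rule in (left_riemann, midpoint, trapezoid)}
print(errs)
```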
Simplistic solution for s > 1: rectangular grid
P_n = {(i_1/d, ..., i_s/d) such that 0 <= i_j < d for each j}, where n = d^s.
This is the midpoint rule in s dimensions. It quickly becomes impractical when s increases. Moreover, each one-dimensional projection has only d distinct points, each two-dimensional projection has only d^2 distinct points, etc.
Lattice rules (Korobov, Sloan, etc.)
Integration lattice:
L_s = { v = sum_{j=1}^s z_j v_j such that each z_j in Z },
where v_1, ..., v_s in R^s are linearly independent over R and where L_s contains Z^s.
Lattice rule: take P_n = {u_0, ..., u_{n-1}} = L_s intersected with [0,1)^s.
Lattice rule of rank 1: u_i = i v_1 mod 1 for i = 0, ..., n-1, where n v_1 = a = (a_1, ..., a_s) in {0, 1, ..., n-1}^s.
Korobov rule: a = (1, a, a^2 mod n, ...).
For any u subset of {1, ..., s}, the projection L_s(u) of L_s is also a lattice.
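A rank-1 Korobov rule is a few lines of code; a sketch using the example parameters (n = 101, a = 12) from the next slide:

```python
# Rank-1 (Korobov) lattice rule: u_i = i * v1 mod 1 with v1 = a/n.
def korobov_points(n, a, s):
    """Points of the Korobov rule with generating vector (1, a, a^2 mod n, ...)."""
    gen = [pow(a, j, n) for j in range(s)]
    return [[(i * g % n) / n for g in gen] for i in range(n)]

pts = korobov_points(101, 12, 2)
print(pts[:3])
```

Since gcd(a, n) = 1 here, each one-dimensional projection is exactly {0, 1/n, ..., (n-1)/n}.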
Example: lattice with s = 2, n = 101, v_1 = (1, 12)/n
P_n = {u_i = i v_1 mod 1 : i = 0, ..., n-1} = {(0, 0), (1/101, 12/101), (2/101, 24/101), ...}.
[Figure: the 101 lattice points.]
Here, each one-dimensional projection is {0, 1/n, ..., (n-1)/n}.
Another example: s = 2, n = 1021, v_1 = (1, 90)/n
P_n = {u_i = i v_1 mod 1 : i = 0, ..., n-1} = {(i/1021, (90i/1021) mod 1) : i = 0, ..., 1020}.
[Figure: the 1021 lattice points.]
A bad lattice: s = 2, n = 101, v_1 = (1, 51)/n
[Figure: the 101 lattice points, poorly distributed.]
Good uniformity in one dimension, but not in two!
Digital net in base b (Niederreiter)
Gives n = b^k points. For i = 0, ..., b^k - 1 and j = 1, ..., s, write the digits of i in base b as
i = a_{i,0} + a_{i,1} b + ... + a_{i,k-1} b^{k-1},
then let
(u_{i,j,1}, ..., u_{i,j,w})^T = C_j (a_{i,0}, ..., a_{i,k-1})^T mod b,
u_{i,j} = sum_{l=1}^w u_{i,j,l} b^{-l},  u_i = (u_{i,1}, ..., u_{i,s}),
where the generating matrices C_j are w x k with elements in Z_b. In practice, w and k are finite, but there is no limit.
Digital sequence: infinite sequence. Can stop at n = b^k for any k.
Can also multiply in some ring R, with bijections between Z_b and R.
Each one-dimensional projection truncated to its first k digits is Z_n/n = {0, 1/n, ..., (n-1)/n}. Each C_j defines a permutation of Z_n/n.
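A minimal sketch of this construction in base 2, with the generating matrices given as lists of bit rows; with C_1 equal to the identity, the output coordinate is the radical inverse of i, i.e., the van der Corput sequence.

```python
# Digital net in base 2: multiply the digit vector of i by each
# generating matrix C_j, mod 2.
def digital_net_point(i, C, k):
    """C: list of generating matrices; each matrix is a list of k rows,
    each row a list of k bits. Returns the s coordinates of point i."""
    a = [(i >> l) & 1 for l in range(k)]          # digits a_{i,0..k-1} of i
    point = []
    for Cj in C:
        u = 0.0
        for l, row in enumerate(Cj):              # output digit u_{i,j,l+1}
            bit = sum(r * d for r, d in zip(row, a)) % 2
            u += bit * 2.0 ** (-(l + 1))
        point.append(u)
    return point

k = 3
identity = [[1 if r == c else 0 for c in range(k)] for r in range(k)]
pts = [digital_net_point(i, [identity], k) for i in range(2 ** k)]
print([p[0] for p in pts])
```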
Small example: Hammersley point set in two dimensions
Let n = 2^8 = 256 and s = 2. The first coordinate of point i is i/n, and the second is obtained by reversing the k = 8 bits of i (its radical inverse in base 2).
[Table: i with u_{i,1} and u_{i,2} written in binary.]
Right side: van der Corput sequence in base 2.
[Figure: Hammersley point set, n = 2^8 = 256, s = 2.]
In general, we can take n = 2^k points. If we partition [0,1)^2 into rectangles of sizes 2^{-k_1} by 2^{-k_2} where k_1 + k_2 <= k, each rectangle will contain exactly the same number of points. We say that the points are equidistributed for this partition.
For a digital net in base b in s dimensions, each C_j defines a permutation of {0, 1, ..., b^k - 1}; each coordinate of point i is its permuted index divided by b^k.
Can also have s = infinity and/or n = infinity (infinite sequence of points).
Suppose we divide axis j into b^{q_j} equal parts, for each j. This determines a partition of [0,1)^s into b^{q_1+...+q_s} rectangles of equal sizes. If each rectangle contains exactly the same number of points, we say that the point set P_n is (q_1, ..., q_s)-equidistributed in base b.
This occurs iff the matrix formed by the first q_1 rows of C_1, the first q_2 rows of C_2, ..., and the first q_s rows of C_s is of full rank (mod b). To verify equidistribution, we can construct these matrices and compute their rank.
P_n is a (t, k, s)-net iff it is (q_1, ..., q_s)-equidistributed whenever q_1 + ... + q_s = k - t. This is possible for t = 0 only if b >= s - 1.
t-value of a net: the smallest t for which it is a (t, k, s)-net.
An infinite sequence {u_0, u_1, ...} in [0,1)^s is a (t, s)-sequence in base b if for all k > 0 and nu >= 0,
Q(k, nu) = {u_i : i = nu b^k, ..., (nu + 1) b^k - 1}
is a (t, k, s)-net in base b. This is possible for t = 0 only if b >= s.
Sobol' nets and sequences
Sobol' (1967) proposed a digital net in base b = 2 where each C_j is upper triangular, with ones on the diagonal and bits v_{j,c,l} above it. Column c of C_j is represented by an odd integer
m_{j,c} = sum_{l=1}^c v_{j,c,l} 2^{c-l} = v_{j,c,1} 2^{c-1} + ... + v_{j,c,c} < 2^c.
The integers m_{j,c} are selected as follows.
For each j, we choose a primitive polynomial over F_2,
f_j(z) = z^{d_j} + a_{j,1} z^{d_j - 1} + ... + a_{j,d_j},
and we choose d_j integers m_{j,0}, ..., m_{j,d_j - 1} (the first d_j columns). Then m_{j,d_j}, m_{j,d_j + 1}, ... are determined by the recurrence
m_{j,c} = 2 a_{j,1} m_{j,c-1} XOR ... XOR 2^{d_j - 1} a_{j,d_j - 1} m_{j,c-d_j+1} XOR 2^{d_j} m_{j,c-d_j} XOR m_{j,c-d_j}.
Proposition. If the polynomials f_j(z) are all distinct, we obtain a (t, s)-sequence with t <= d_1 + ... + d_s - s.
Sobol' suggests listing all primitive polynomials over F_2 in increasing order of degree, starting with f_0(z) = 1 (which gives C_0 = I), and taking f_j(z) as the (j+1)-th polynomial in the list.
There are many ways of selecting the first m_{j,c}'s, which are called the direction numbers. They can be selected to minimize some discrepancy (or figure of merit). The values proposed by Sobol' give an (l, ..., l)-equidistribution only for l = 1 and l = 2 (only the first two bits). For n = 2^k fixed, we can gain one dimension as for the Faure sequence. Joe and Kuo (2008) tabulated direction numbers giving the best t-values for the two-dimensional projections, for given s and k.
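The direction-number recurrence above can be sketched directly with integer XOR. For example, dimension 2 of the Sobol' sequence uses the primitive polynomial f(z) = z + 1 (d = 1) with initial value m_1 = 1:

```python
# Sobol' direction-number recurrence:
# m_c = 2 a_1 m_{c-1} XOR ... XOR 2^{d-1} a_{d-1} m_{c-d+1}
#       XOR 2^d m_{c-d} XOR m_{c-d}.
def sobol_m(d, a, m_init, count):
    """a: coefficients a_1, ..., a_{d-1} of the primitive polynomial;
    m_init: the d chosen initial odd integers m_1, ..., m_d."""
    m = list(m_init)
    while len(m) < count:
        c = len(m)
        x = m[c - d] ^ (m[c - d] << d)      # 2^d m_{c-d} XOR m_{c-d}
        for q in range(1, d):
            if a[q - 1]:
                x ^= m[c - q] << q          # 2^q a_q m_{c-q}
        m.append(x)
    return m

print(sobol_m(1, [], [1], 5))
```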
Other constructions
- Faure nets and sequences
- Niederreiter-Xing point sets and sequences
- Polynomial lattice rules (a special case of digital nets)
- Halton sequence
- Etc.
Worst-case error bounds
Koksma-Hlawka-type inequalities (Koksma, Hlawka, Hickernell, etc.):
|mu-hat_n - mu| <= V(f) D(P_n)
for all f in some Hilbert space or Banach space H, where V(f) = ||f - mu||_H is the variation of f, and D(P_n) is the discrepancy of P_n.
Lattice rules: for certain Hilbert spaces of smooth periodic functions f with square-integrable partial derivatives of order up to alpha, D(P_n) = O(n^{-alpha+epsilon}) for arbitrarily small epsilon.
Digital nets: for the classical Koksma-Hlawka inequality for QMC, f must have finite variation in the sense of Hardy and Krause (which implies no discontinuity not aligned with the axes). Popular constructions achieve D(P_n) = O(n^{-1} (ln n)^s) = O(n^{-1+epsilon}) for arbitrarily small epsilon. More recent constructions offer better rates for smooth functions.
These bounds are conservative and too hard to compute in practice.
Randomized quasi-Monte Carlo (RQMC)
mu-hat_{n,rqmc} = (1/n) sum_{i=0}^{n-1} f(U_i),
with P_n = {U_0, ..., U_{n-1}} in (0,1)^s an RQMC point set:
(i) each point U_i has the uniform distribution over (0,1)^s;
(ii) P_n as a whole is a low-discrepancy point set.
E[mu-hat_{n,rqmc}] = mu (unbiased), and
Var[mu-hat_{n,rqmc}] = Var[f(U_i)]/n + (2/n^2) sum_{i<j} Cov[f(U_i), f(U_j)].
We want to make the last sum as negative as possible.
Weaker attempts to do the same: antithetic variates (n = 2), Latin hypercube sampling (LHS), stratification, ...
Variance estimation: compute m independent realizations X_1, ..., X_m of mu-hat_{n,rqmc}, then estimate mu and Var[mu-hat_{n,rqmc}] by their sample mean X-bar_m and sample variance S_m^2. This can be used to compute a confidence interval.
Temptation: assume that X-bar_m has the normal distribution. Beware: this is usually wrong unless m -> infinity.
Stratification of the unit hypercube
Partition axis j into k_j >= 1 equal parts, for j = 1, ..., s. Draw n = k_1 ... k_s random points, one per box, independently.
Example: s = 2, k_1 = 12, k_2 = 8, n = 12 x 8 = 96.
[Figure: one random point in each of the 96 boxes.]
[Figure: same stratification with s = 2, k_1 = 24, k_2 = 16, n = 384.]
Stratified estimator: X_{s,n} = (1/n) sum_{j=0}^{n-1} f(U_j).
The crude MC variance with n points can be decomposed as
Var[X-bar_n] = Var[X_{s,n}] + (1/n^2) sum_{j=0}^{n-1} (mu_j - mu)^2,
where mu_j is the mean over box j. The more the mu_j differ, the more the variance is reduced.
If f is continuous and bounded, and all k_j are equal, then Var[X_{s,n}] = O(n^{-1-2/s}).
For large s, this is not practical. For small s, it is not really better than the midpoint rule with a grid when f is smooth. But it can still be applied to a few important random variables. Also, it gives an unbiased estimator, and the variance can be estimated by replicating m >= 2 times.
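A sketch of stratification over a k x k grid in [0,1)^2, one uniform point per box; the test integrand f(u_1, u_2) = u_1 + u_2 (exact integral 1) is an arbitrary choice for illustration.

```python
# Stratified sampling: one uniform point in each of the k*k boxes.
import random

def stratified_estimate(f, k, rng):
    n = k * k
    total = 0.0
    for i in range(k):
        for j in range(k):
            u1 = (i + rng.random()) / k       # uniform in box (i, j)
            u2 = (j + rng.random()) / k
            total += f(u1, u2)
    return total / n

rng = random.Random(3)
est = stratified_estimate(lambda u1, u2: u1 + u2, 16, rng)
print(est)
```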
Randomly-shifted lattice
Example: lattice with s = 2, n = 101, v_1 = (1, 12)/101.
[Figures: the lattice points, a uniform random shift U, and the shifted points, all taken modulo 1.]
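A sketch of this randomization with m independent shifts, which also illustrates the variance estimation described earlier; the test integrand f(u) = u_1 u_2 (exact integral 1/4) is an arbitrary choice.

```python
# Randomly shifted lattice rule (s = 2, n = 101, v1 = (1, 12)/101),
# replicated with m independent uniform shifts.
import random
from statistics import mean, variance

def shifted_lattice_estimate(f, n, a, rng):
    shift = [rng.random(), rng.random()]
    total = 0.0
    for i in range(n):
        u1 = (i / n + shift[0]) % 1.0         # u_i = (i v1 + U) mod 1
        u2 = ((i * a % n) / n + shift[1]) % 1.0
        total += f(u1, u2)
    return total / n

rng = random.Random(2016)
reps = [shifted_lattice_estimate(lambda u1, u2: u1 * u2, 101, 12, rng)
        for _ in range(100)]
print(mean(reps), variance(reps))
```

The m replicates give an unbiased estimate of mu, and their sample variance estimates Var[mu-hat_{n,rqmc}].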
Random digital shift for digital nets
Equidistribution in digital boxes is lost with a random shift modulo 1, but can be kept with a random digital shift in base b. In base 2: generate U ~ U(0,1)^s and XOR it bitwise with each u_i, coordinate by coordinate.
Each point then has the U(0,1)^s distribution, and the equidistribution (e.g., for k_1 = 3, k_2 = 5) is preserved, because the same bits are flipped in the same positions for all points.
[Figure: a digital shift in base 2 permutes the dyadic boxes of the net; with k_1 = k_2 = 4, the red and green squares are exchanged according to the first 4 bits of U.]
Random digital shift in base b
We have u_{i,j} = sum_{l=1}^w u_{i,j,l} b^{-l}. Let U = (U_1, ..., U_s) ~ U[0,1)^s, where U_j = sum_{l=1}^w U_{j,l} b^{-l}. We replace each u_{i,j} by
U~_{i,j} = sum_{l=1}^w [(u_{i,j,l} + U_{j,l}) mod b] b^{-l}.
Proposition. The shifted point set is (q_1, ..., q_s)-equidistributed in base b iff P_n is. For w = infinity, each shifted point U~_i has the uniform distribution over (0,1)^s.
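In base 2 the digit-wise addition mod b is just XOR on the bit expansions, so a sketch with w = 30 retained bits fits in a few lines; note that applying the same shift twice returns the original point, since XOR is an involution.

```python
# Random digital shift in base 2 via integer XOR on w-bit expansions.
import random

W = 30  # bits retained

def digital_shift(u, shift_bits):
    bits = int(u * (1 << W))                  # w-bit expansion of u
    return (bits ^ shift_bits) / (1 << W)

rng = random.Random(1)
shift_bits = rng.getrandbits(W)
u = 0.375                                     # 0.011 in binary
v = digital_shift(u, shift_bits)
w = digital_shift(v, shift_bits)              # undoes the shift
print(u, v, w)
```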
Other permutations that preserve equidistribution and may help reduce the variance further:
Linear matrix scrambling (Matoušek; Hickernell and Hong; Tezuka; Owen): left-multiply each matrix C_j by a random w x w matrix M_j, non-singular and lower triangular, mod b. Several variants exist. We then apply a random digital shift in base b to obtain the uniform distribution for each point (unbiasedness).
Nested uniform scrambling (Owen 1995): more costly, but provably reduces the variance to O(n^{-3} (log n)^s) when f is sufficiently smooth!
Asian option example
T = 1 (year), t_j = j/d, K = 100, s_0 = 100, r = 0.05, sigma = 0.5, s = d = 2.
Lattice: Korobov with a from an old table + random shift. Sobol': left matrix scramble + random digital shift. Variance estimated from m = 1000 independent randomizations. VRF = (MC variance) / (n Var[X_{s,n}]).
[Table: X-bar_m, n S_m^2, and VRF for stratification, lattice, and Sobol' points at several values of n.]
Same example with s = d = 12. Lattice: Korobov + random shift. Sobol': left matrix scramble + random digital shift. Variance estimated from m = 1000 independent randomizations.
[Table: X-bar_m, n S_m^2, and VRF for lattice and Sobol' points at several values of n.]
Variance for randomly-shifted lattice rules
Suppose f has the Fourier expansion
f(u) = sum_{h in Z^s} f^(h) e^{2 pi sqrt(-1) h^T u}.
For a randomly shifted lattice, the exact variance is always
Var[mu-hat_{n,rqmc}] = sum_{0 != h in L_s*} |f^(h)|^2,
where L_s* = {h in R^s : h^T v in Z for all v in L_s}, a subset of Z^s, is the dual lattice. From the viewpoint of variance reduction, an optimal lattice for f minimizes Var[mu-hat_{n,rqmc}].
Let alpha > 0 be an even integer. If f has square-integrable mixed partial derivatives up to order alpha/2, and the periodic continuation of its derivatives up to order alpha/2 - 1 is continuous across the unit-cube boundaries, then
|f^(h)|^2 = O((max(1, |h_1|) ... max(1, |h_s|))^{-alpha}).
Moreover, there is a vector v_1 = v_1(n) such that
P_alpha := sum_{0 != h in L_s*} (max(1, |h_1|) ... max(1, |h_s|))^{-alpha} = O(n^{-alpha+epsilon}).
This P_alpha has been proposed long ago as a figure of merit, often with alpha = 2. It is the variance for a worst-case f having |f^(h)|^2 = (max(1, |h_1|) ... max(1, |h_s|))^{-alpha}. A larger alpha means a smoother f and a faster convergence rate.
For an even integer alpha, this worst-case f is
f*(u) = sum_{u subset of {1,...,s}} prod_{j in u} [(2 pi)^{alpha/2} / (alpha/2)!] B_{alpha/2}(u_j),
where B_{alpha/2} is the Bernoulli polynomial of degree alpha/2. In particular, B_1(u) = u - 1/2 and B_2(u) = u^2 - u + 1/6. It is easy to compute P_alpha and to search for good lattices in this case!
However: this worst-case function is not necessarily representative of what happens in applications. Also, the hidden factor in the O grows quickly with s, so this result is not very useful for large s. To get a bound that is uniform in s, the Fourier coefficients must decrease faster with the dimension and the size of the vectors h; that is, f must be smoother in its high-dimensional projections. This is typically what happens in applications for which RQMC is effective!
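A sketch of such a search, using the classical closed form for alpha = 2 over a rank-1 lattice, P_2 = -1 + (1/n) sum_i prod_j [1 + 2 pi^2 B_2(u_{ij})] with B_2(u) = u^2 - u + 1/6, and an exhaustive scan over the Korobov parameter a (feasible here since n = 101 is tiny):

```python
# Computing P_2 for rank-1 Korobov lattices and searching for the best a.
import math

def P2_korobov(n, a, s=2):
    total = 0.0
    for i in range(n):
        prod = 1.0
        for uij in [(i % n) / n, (i * a % n) / n][:s]:
            prod *= 1.0 + 2.0 * math.pi ** 2 * (uij * uij - uij + 1.0 / 6.0)
        total += prod
    return total / n - 1.0

n = 101
best_a = min(range(1, n), key=lambda a: P2_korobov(n, a))
print(best_a, P2_korobov(n, best_a), P2_korobov(n, 51))
```

The bad lattice a = 51 seen earlier scores much worse than the best a found by the scan.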
Baker's (or tent) transformation
Purpose: make the periodic continuation of f continuous. If f(0) != f(1), define f~ by f~(u) = f~(1 - u) = f(2u) for 0 <= u <= 1/2. This f~ has the same integral as f, and f~(0) = f~(1) = f(0).
For smooth f, this can reduce the variance to O(n^{-4+epsilon}) (Hickernell 2002). The resulting f~ is symmetric with respect to u = 1/2. In practice, we transform the points U_i instead of f.
One-dimensional case: random shift followed by the baker's transformation. Along each coordinate, stretch everything by a factor of 2 and fold. This is the same as replacing U_j by min[2U_j, 2(1 − U_j)]. It gives locally antithetic points in intervals of size 2/n, which implies that linear pieces over these intervals are integrated exactly. Intuition: when f is smooth, it is well approximated by a piecewise-linear function, which is integrated exactly, so the error is small.
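The shift-then-fold recipe above is easy to try numerically. A sketch, assuming a made-up smooth test integrand f(u) = e^u (exact integral e − 1); the sample sizes are arbitrary:

```python
import math
import random

def baker(u):
    # tent transformation: stretch by 2 and fold back into [0, 1]
    return 2.0 * u if u < 0.5 else 2.0 * (1.0 - u)

def rqmc_estimate(f, n, rng):
    # randomly shifted 1-D lattice {i/n}, then baker's transformation
    shift = rng.random()
    return sum(f(baker((i / n + shift) % 1.0)) for i in range(n)) / n

def mc_estimate(f, n, rng):
    return sum(f(rng.random()) for _ in range(n)) / n

rng = random.Random(123)
f = math.exp
n, m = 64, 200  # n points per estimate, m independent randomizations
rqmc = [rqmc_estimate(f, n, rng) for _ in range(m)]
mc = [mc_estimate(f, n, rng) for _ in range(m)]
```

Both estimators are unbiased; the shifted-and-folded lattice has a far smaller variance across the m randomizations, consistent with the locally antithetic argument.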
ANOVA decomposition. The Fourier expansion has too many terms to handle. As a cruder expansion, we can write f(u) = f(u_1, …, u_s) as
f(u) = Σ_{u ⊆ {1,…,s}} f_u(u) = μ + Σ_{i=1}^s f_{{i}}(u_i) + Σ_{i<j} f_{{i,j}}(u_i, u_j) + ⋯,
where
f_u(u) = ∫_{[0,1)^{|ū|}} f(u) dū − Σ_{v ⊊ u} f_v(u_v),
and the Monte Carlo variance decomposes as σ² = Σ_{u ⊆ {1,…,s}} σ_u², where σ_u² = Var[f_u(U)].
The σ_u²'s can be estimated by MC or RQMC. Heuristic intuition: make sure the projections P_n(u) are very uniform for the important subsets u (i.e., those with larger σ_u²).
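One standard way to estimate a σ_u² by MC is the pick-freeze (Sobol') identity: σ_{{1}}² = Var[E[f(U) | U_1]] = Cov[f(U_1, U_2), f(U_1, U_2′)], where U_2′ is an independent copy of U_2. A sketch on a made-up 2-D integrand (not one from the talk), for which the exact value is easy to derive by hand:

```python
import random

def f(u1, u2):
    # hypothetical test function: f_{1}(u1) = (3/2)u1 - 3/4,
    # so sigma_{1}^2 = (3/2)^2 / 12 = 0.1875 exactly
    return u1 + u2 + u1 * u2

rng = random.Random(7)
n = 200_000
prod_sum = a_sum = b_sum = 0.0
for _ in range(n):
    u1, u2, u2p = rng.random(), rng.random(), rng.random()
    a, b = f(u1, u2), f(u1, u2p)  # share u1, "freeze" it; redraw u2
    prod_sum += a * b
    a_sum += a
    b_sum += b
# sample covariance estimate of sigma_{1}^2
sigma2_1 = prod_sum / n - (a_sum / n) * (b_sum / n)
```

The same scheme with other frozen subsets u gives the closed variance of each projection, which is what the heuristic above needs in order to decide which projections of P_n must be most uniform.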
Weighted P_{γ,α} with projection-dependent weights γ_u. Denote by u(h) = u(h_1, …, h_s) the set of indices j for which h_j ≠ 0. Then
P_{γ,α} = Σ_{0≠h∈L_s^*} γ_{u(h)} (max(1, |h_1|) ⋯ max(1, |h_s|))^{−α}.
For α/2 integer > 0, with u_i = (u_{i,1}, …, u_{i,s}) = i v_1 mod 1,
P_{γ,α} = Σ_{∅≠u⊆{1,…,s}} γ_u (1/n) Σ_{i=0}^{n−1} Π_{j∈u} [ −(−4π²)^{α/2} / α! ] B_α(u_{i,j}),
and the corresponding variation is
V_γ²(f) = Σ_{∅≠u⊆{1,…,s}} (1 / (γ_u (4π²)^{α|u|/2})) ∫_{[0,1]^{|u|}} | ∂^{α|u|/2} f_u(u) / Π_{j∈u} ∂u_j^{α/2} |² du,
for f : [0,1)^s → ℝ smooth enough. Then
Var[μ̂_{n,RQMC}] = Σ_{u⊆{1,…,s}} Var[μ̂_{n,RQMC}(f_u)] ≤ V_γ²(f) P_{γ,α}.
P_{γ,α} with α = 2 and properly chosen weights γ is a good practical choice of figure of merit. Simple choices of weights: order-dependent or product weights. Lattice Builder: software to search for good lattices with arbitrary n, s, weights, etc. See my web page.
ANOVA variances for the estimator of P[T > x] in the stochastic activity network. [Figure: % of total variance for each cardinality of u, for x = 64 and x = 100, with and without CMC.]
Variance of the estimator of P[T > x] for the SAN. [Figure: variance vs n for MC, Sobol', and lattice (P_2) + baker, with x = 64.] The variance decreases roughly as O(n^{−1.2}). For E[T], we observe O(n^{−1.4}).
Variance of the estimator of P[T > x] with CMC. [Figure: variance vs n for MC, Sobol', and lattice (P_2) + baker, with CMC and x = 64.]
Histograms. [Figure: histograms of a single MC draw, the MC estimator, and the RQMC estimator, for x = 100.]
Histograms with CMC. [Figure: histograms of a single CMC draw, the MC estimator, and the RQMC estimator, with CMC and x = 100.]
Effective dimension (Caflisch, Morokoff, and Owen 1997). A function f has effective dimension d in proportion ρ in the superposition sense if
Σ_{|u| ≤ d} σ_u² ≥ ρσ².
It has effective dimension d in the truncation sense if
Σ_{u ⊆ {1,…,d}} σ_u² ≥ ρσ².
High-dimensional functions with low effective dimension are frequent. One may change f to make this happen.
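Given (estimated) ANOVA components σ_u², both notions reduce to finding the smallest d satisfying the inequality above. A sketch; the variance components in the usage example are made-up numbers:

```python
def effective_dim_superposition(sigma2, rho):
    """Smallest d with sum over |u| <= d of sigma_u^2 >= rho * sigma^2.
    sigma2 maps frozenset(u) -> ANOVA component sigma_u^2."""
    total = sum(sigma2.values())
    acc, d = 0.0, 0
    while acc < rho * total:
        d += 1
        acc += sum(v for u, v in sigma2.items() if len(u) == d)
    return d

def effective_dim_truncation(sigma2, rho):
    """Smallest d with sum over u subset of {1,...,d} of sigma_u^2 >= rho * sigma^2."""
    total = sum(sigma2.values())
    d = 0
    while True:
        d += 1
        acc = sum(v for u, v in sigma2.items() if u <= set(range(1, d + 1)))
        if acc >= rho * total:
            return d

# hypothetical components: almost all variance in one-dimensional projections
sigma2 = {frozenset({1}): 0.50, frozenset({2}): 0.45, frozenset({1, 2}): 0.05}
```

For these numbers the superposition effective dimension is 1 at ρ = 0.9 even though the nominal dimension is 2, illustrating how the two notions can differ.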
Example: function of a multinormal vector. Let μ = E[f(U)] = E[g(Y)] where Y = (Y_1, …, Y_s) ∼ N(0, Σ). For example, if the payoff of a financial derivative is a function of the values taken by a c-dimensional geometric Brownian motion (GBM) at d observation times 0 < t_1 < ⋯ < t_d = T, then we have s = cd. To generate Y: decompose Σ = AAᵀ, generate Z = (Z_1, …, Z_s) ∼ N(0, I) where the (independent) Z_j's are generated by inversion, Z_j = Φ^{−1}(U_j), and return Y = AZ. Choice of A? Cholesky factorization: A is lower triangular.
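The inversion-plus-factorization recipe can be sketched with only the standard library; the 2×2 covariance matrix below is a made-up example:

```python
import math
from statistics import NormalDist

def cholesky(sigma):
    """Lower-triangular A with A A^T = sigma (classic Cholesky recursion)."""
    n = len(sigma)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(A[i][k] * A[j][k] for k in range(j))
            if i == j:
                A[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                A[i][j] = (sigma[i][j] - s) / A[j][j]
    return A

def multinormal_from_uniforms(u, A):
    # Z_j = Phi^{-1}(U_j) by inversion, then Y = A Z
    nd = NormalDist()
    z = [nd.inv_cdf(uj) for uj in u]
    return [sum(A[i][k] * z[k] for k in range(len(z))) for i in range(len(z))]

sigma = [[4.0, 2.0], [2.0, 3.0]]  # hypothetical covariance matrix
A = cholesky(sigma)
y = multinormal_from_uniforms([0.5, 0.5], A)  # both uniforms at the median
```

Inversion (rather than, say, Box-Muller) matters for RQMC: it maps each point of the low-discrepancy set monotonely to the normal sample, preserving the structure of the point set.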
Principal component decomposition (PCA) (Ackworth et al. 1998): A = PD^{1/2}, where D = diag(λ_1, …, λ_s) contains the eigenvalues of Σ in decreasing order and the columns of P are the corresponding unit-length eigenvectors. With this A, Z_1 accounts for the maximum amount of variance of Y, then Z_2 for the maximum amount of variance conditional on Z_1, etc.
Function of a Brownian motion (or other Lévy process): the payoff depends on a c-dimensional Brownian motion {X(t), t ≥ 0} observed at times 0 = t_0 < t_1 < ⋯ < t_d = T.
Sequential (or random walk) method: generate X(t_1), then X(t_2) − X(t_1), then X(t_3) − X(t_2), etc.
Bridge sampling (Moskowitz and Caflisch 1996): suppose d = 2^m. Generate X(t_d), then X(t_{d/2}) conditional on (X(0), X(t_d)), then X(t_{d/4}) conditional on (X(0), X(t_{d/2})), and so on. The first few N(0,1) random variables already sketch the path trajectory.
Each of these methods corresponds to some matrix A, and the choice has a large impact on the ANOVA decomposition of f.
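Bridge sampling can be sketched as follows for a one-dimensional Brownian motion on an equally spaced grid (a sketch under those assumptions; conditionally on the two endpoints, the midpoint is normal with the linearly interpolated mean and variance (t_r − t_m)(t_m − t_l)/(t_r − t_l)):

```python
import math

def brownian_bridge_path(d, T, normals):
    """Sample X at times k*T/d, k = 1..d (d a power of 2), consuming the
    standard normals in bridge order: endpoint first, then midpoints."""
    assert d & (d - 1) == 0 and d > 0
    t = [k * T / d for k in range(d + 1)]
    X = [0.0] * (d + 1)
    it = iter(normals)
    X[d] = math.sqrt(t[d]) * next(it)  # X(T) ~ N(0, T)
    step = d
    while step > 1:
        half = step // 2
        for left in range(0, d, step):
            right, mid = left + step, left + half
            tl, tm, tr = t[left], t[mid], t[right]
            mean = ((tr - tm) * X[left] + (tm - tl) * X[right]) / (tr - tl)
            sd = math.sqrt((tr - tm) * (tm - tl) / (tr - tl))
            X[mid] = mean + sd * next(it)
        step = half
    return X[1:]

# the first normal alone fixes the endpoint; later ones only refine details
path = brownian_bridge_path(4, 1.0, [1.0, 0.0, 0.0, 0.0])
```

Setting all but the first normal to zero yields the straight line to the endpoint, which is exactly the "first few variables sketch the trajectory" effect that concentrates variance on the early coordinates.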
Example: pricing an Asian basket option. We have c assets and d observation times. We want to estimate E[f(U)], where the net discounted payoff is
f(U) = e^{−rT} max(0, (1/(cd)) Σ_{i=1}^c Σ_{j=1}^d S_i(t_j) − K),
and S_i(t_j) is the price of asset i at time t_j. Suppose (S_1(t), …, S_c(t)) obeys a geometric Brownian motion. Then f(U) = g(Y) where Y = (Y_1, …, Y_s) ∼ N(0, Σ) and s = cd.
Even with a Cholesky decomposition of Σ, the two-dimensional projections often account for more than 99% of the variance: low effective dimension in the superposition sense. With PCA or bridge sampling, we get low effective dimension in the truncation sense. In realistic examples, the first two coordinates Z_1 and Z_2 often account for more than 99.99% of the variance!
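The payoff above is a direct transcription of the formula; a minimal sketch (the price matrix in the usage example is made up):

```python
import math

def asian_basket_payoff(S, K, r, T):
    """Net discounted payoff of the Asian basket option.
    S[i][j] = price of asset i at observation time t_j."""
    c, d = len(S), len(S[0])
    avg = sum(sum(row) for row in S) / (c * d)
    return math.exp(-r * T) * max(0.0, avg - K)

# two assets, two observation times; average price is 100
payoff = asian_basket_payoff([[100.0, 110.0], [90.0, 100.0]], K=95.0, r=0.0, T=1.0)
```

The kink of max(0, ·) at the strike is the reason the integrand is not smooth; combined with its unboundedness, this is why the theoretical rates quoted earlier need not apply directly, even though RQMC still helps in practice.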
Numerical experiment with c = 10 and d = 25. This gives a 250-dimensional integration problem. Let ρ_{i,j} = 0.4 for all i ≠ j, T = 1, σ_i = (i − 1)/9 for all i, r = 0.04, S(0) = 100, and K = 100 (Imai and Tan 2002).
Variance reduction factors for Cholesky (left) and PCA (right) (experiment from 2003): [Table: Korobov lattice rules (three values of n; a = 5693, a = 944, …) with shift and with shift + baker; Sobol' nets with n = 2^14, 2^16, 2^18, with shift and with LMS + shift. The numerical entries were lost in transcription.]
Note: the payoff function is not smooth and also unbounded!
ANOVA variances for the ordinary Asian option. [Figure: % of total variance for each cardinality of u, for s = 3, 6, 12 under sequential, Brownian bridge (BB), and PCA sampling; S(0) = 100, K = 100, r = 0.05, σ = 0.5.]
Total variance per coordinate for the Asian option. [Figure: % of total variance carried by each of coordinates 1 to 6 under sequential, BB, and PCA sampling; s = 6, S(0) = 100, K = 100, r = 0.05, σ = 0.5.]
Variance with good lattice rules and Sobol' points. [Figure: variance vs n for MC, Sobol', and lattice (P_2) + baker; Asian option with PCA, s = 12, S(0) = 100, K = 100, r = 0.05, σ = 0.5.]
Asian option on a single asset, with a control variate. Let c = 1, S(0) = 100, r = ln(1.09), σ_i = 0.2, T = 120/365, and t_j = D_1/365 + (T − D_1/365)(j − 1)/(d − 1) for j = 1, …, d. We estimated the optimal CV coefficient by pilot runs, for MC and for each combination of sampling scheme, RQMC method, and n. [Table: d, D_1, K, μ, σ², and the VRF of the CV; the numerical entries were lost in transcription.]
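The "optimal CV coefficient by pilot runs" step is just a regression: β* = Cov(X, C)/Var(C), where X is the payoff and C the control (e.g., the geometric-average option, whose expectation is known in closed form). A sketch with made-up pilot data:

```python
from statistics import mean

def cv_coefficient(xs, cs):
    """Optimal control-variate coefficient beta* = Cov(X, C) / Var(C),
    estimated from pilot runs (xs, cs)."""
    mx, mc = mean(xs), mean(cs)
    cov = sum((x - mx) * (c - mc) for x, c in zip(xs, cs)) / (len(xs) - 1)
    var = sum((c - mc) ** 2 for c in cs) / (len(cs) - 1)
    return cov / var

def cv_adjusted_mean(xs, cs, c_expect, beta):
    # X_cv = mean(X) - beta * (mean(C) - E[C]); unbiased for any fixed beta
    return mean(xs) - beta * (mean(cs) - c_expect)

# hypothetical pilot data where X = 2C + 1 exactly, so beta* = 2
cs = [1.0, 2.0, 3.0, 4.0]
xs = [2.0 * c + 1.0 for c in cs]
beta = cv_coefficient(xs, cs)
est = cv_adjusted_mean(xs, cs, c_expect=2.5, beta=beta)
```

Estimating β separately for each (sampling scheme, RQMC method, n) combination, as the slide does, matters because the correlation between X and C under RQMC differs from the one under MC.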
VRFs (per run) for RQMC vs MC, with sequential sampling (SEQ, left), bridge sampling (BBS, middle), and PCA (right). [Table: VRFs for Korobov + shift (Kor+S), Korobov + shift + baker (Kor+S+B), and Sobol' + digital shift (Sob+DS), with and without the CV, for several (d, D_1, K); the numerical entries were lost in transcription.]
For d = 10, Sobol' with PCA combined with the CV reduces the variance approximately by a factor of […], without increasing the CPU time. For d = 120, PCA is slower than SEQ by a factor of 2 or 3, but worth it.
Array-RQMC for Markov chains. Setting: a Markov chain with state space X ⊆ ℝ^l evolves as
X_0 = x_0, X_j = ϕ_j(X_{j−1}, U_j), j ≥ 1,
where the U_j are i.i.d. uniform over (0,1)^d. We want to estimate μ = E[Y] where Y = Σ_{j=1}^τ g_j(X_j).
Ordinary MC: n i.i.d. realizations of Y; each requires τd uniforms.
Array-RQMC (L'Ecuyer, Lécot, Tuffin, et al. 2004, 2006, 2008, etc.): simulate an array (or population) of n chains in parallel. Goal: small discrepancy between the empirical distribution of the states S_{n,j} = {X_{0,j}, …, X_{n−1,j}} and the theoretical distribution of X_j, at each step j. At each step, use an RQMC point set to advance all the chains by one step.
Some RQMC insight: to simplify, suppose X_j ∼ U(0,1)^l. We estimate
μ_j = E[g_j(X_j)] = E[g_j(ϕ_j(X_{j−1}, U))] = ∫_{[0,1)^{l+d}} g_j(ϕ_j(x, u)) dx du
by
μ̂_{arqmc,j,n} = (1/n) Σ_{i=0}^{n−1} g_j(X_{i,j}) = (1/n) Σ_{i=0}^{n−1} g_j(ϕ_j(X_{i,j−1}, U_{i,j})).
This is (roughly) RQMC with the point set Q_n = {(X_{i,j−1}, U_{i,j}), 0 ≤ i < n}. We want Q_n to have low discrepancy (LD) over [0,1)^{l+d}.
We do not choose the X_{i,j−1}'s in Q_n: they come from the simulation. We select an LD point set
Q̃_n = {(w_0, U_{0,j}), …, (w_{n−1}, U_{n−1,j})},
where the w_i ∈ [0,1)^l are fixed and each U_{i,j} ∼ U(0,1)^d. Permute the states X_{i,j−1} so that X_{π_j(i),j−1} is close to w_i for each i (low discrepancy between the two sets), and compute X_{i,j} = ϕ_j(X_{π_j(i),j−1}, U_{i,j}) for each i. Example: if l = 1, we can take w_i = (i + 0.5)/n and just sort the states. For l > 1, there are various ways to define the matching (multivariate sorts).
Array-RQMC algorithm:
X_{i,0} ← x_0 (or X_{i,0} ← x_{i,0}) for i = 0, …, n − 1;
for j = 1, 2, …, τ do
  compute the permutation π_j of the states (for the matching);
  randomize afresh {U_{0,j}, …, U_{n−1,j}} in Q̃_n;
  X_{i,j} = ϕ_j(X_{π_j(i),j−1}, U_{i,j}) for i = 0, …, n − 1;
  μ̂_{arqmc,j,n} = Ȳ_{n,j} = (1/n) Σ_{i=0}^{n−1} g_j(X_{i,j});
Estimate μ by Ȳ_n = μ̂_{arqmc,n} = Σ_{j=1}^τ μ̂_{arqmc,j,n}.
Proposition: (i) The average Ȳ_n is an unbiased estimator of μ. (ii) The empirical variance of m independent realizations gives an unbiased estimator of Var[Ȳ_n].
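For l = 1 the matching is just a sort, so one replicate of the algorithm fits in a few lines. A sketch on a toy chain of my own choosing (not from the talk): X_j = (X_{j−1} + U_j) mod 1 with g(x) = x, for which each X_j is uniform and μ = τ/2 exactly:

```python
import random

def array_rqmc(phi, g, x0, n, tau, rng):
    """One Array-RQMC replicate for a chain with a 1-D state (l = 1):
    sort the states at each step, then advance chain i with the point
    U_{i,j} = ((i + 0.5)/n + shift_j) mod 1 (a randomly shifted 1-D lattice)."""
    states = [x0] * n
    total = 0.0
    for _ in range(tau):
        states.sort()                  # match sorted states to w_i = (i + 0.5)/n
        shift = rng.random()           # fresh randomization at each step
        states = [phi(states[i], ((i + 0.5) / n + shift) % 1.0)
                  for i in range(n)]
        total += sum(g(x) for x in states) / n
    return total                       # estimates mu = E[sum_{j=1}^tau g(X_j)]

phi = lambda x, u: (x + u) % 1.0       # hypothetical toy transition
g = lambda x: x
est = array_rqmc(phi, g, 0.0, n=128, tau=5, rng=random.Random(42))
```

At each step the n chains share a single stratified point set instead of n independent uniforms, which is what keeps the empirical distribution of the states close to the true distribution of X_j.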
More informationMonte Carlo Method for Numerical Integration based on Sobol s Sequences
Monte Carlo Method for Numerical Integration based on Sobol s Sequences Ivan Dimov and Rayna Georgieva Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, 1113 Sofia,
More informationConstruction of Equidistributed Generators Based on Linear Recurrences Modulo 2
Construction of Equidistributed Generators Based on Linear Recurrences Modulo 2 Pierre L Ecuyer and François Panneton Département d informatique et de recherche opérationnelle Université de Montréal C.P.
More informationUniform Random Number Generators
JHU 553.633/433: Monte Carlo Methods J. C. Spall 25 September 2017 CHAPTER 2 RANDOM NUMBER GENERATION Motivation and criteria for generators Linear generators (e.g., linear congruential generators) Multiple
More informationApplicability of Quasi-Monte Carlo for lattice systems
Applicability of Quasi-Monte Carlo for lattice systems Andreas Ammon 1,2, Tobias Hartung 1,2,Karl Jansen 2, Hernan Leovey 3, Andreas Griewank 3, Michael Müller-Preussker 1 1 Humboldt-University Berlin,
More informationAsymptotic Statistics-III. Changliang Zou
Asymptotic Statistics-III Changliang Zou The multivariate central limit theorem Theorem (Multivariate CLT for iid case) Let X i be iid random p-vectors with mean µ and and covariance matrix Σ. Then n (
More informationQMC tutorial 1. Quasi-Monte Carlo. Art B. Owen Stanford University. A tutorial reflecting personal views on what is important in QMC.
QMC tutorial 1 Quasi-Monte Carlo Art B. Owen Stanford University A tutorial reflecting personal views on what is important in QMC. MCQMC 2016 will be at Stanford, August 14-19 mcqmc2016.stanford.edu QMC
More informationDigital lattice rules
Digital lattice rules Multivariate integration and discrepancy estimates A thesis presented to The University of New South Wales in fulfilment of the thesis requirement for the degree of Doctor of Philosophy
More informationANOVA decomposition of convex piecewise linear functions
ANOVA decomposition of convex piecewise linear functions W. Römisch Abstract Piecewise linear convex functions arise as integrands in stochastic programs. They are Lipschitz continuous on their domain,
More informationSTRONG TRACTABILITY OF MULTIVARIATE INTEGRATION USING QUASI MONTE CARLO ALGORITHMS
MATHEMATICS OF COMPUTATION Volume 72, Number 242, Pages 823 838 S 0025-5718(02)01440-0 Article electronically published on June 25, 2002 STRONG TRACTABILITY OF MULTIVARIATE INTEGRATION USING QUASI MONTE
More informationOptimal Randomized Algorithms for Integration on Function Spaces with underlying ANOVA decomposition
Optimal Randomized on Function Spaces with underlying ANOVA decomposition Michael Gnewuch 1 University of Kaiserslautern, Germany October 16, 2013 Based on Joint Work with Jan Baldeaux (UTS Sydney) & Josef
More informationQUASI-RANDOM NUMBER GENERATION IN THE VIEW OF PARALLEL IMPLEMENTATION. 1. Introduction
Trends in Mathematics Information Center for Mathematical Sciences Volume 6, Number 2, December, 2003, Pages 155 162 QUASI-RANDOM NUMBER GENERATION IN THE VIEW OF PARALLEL IMPLEMENTATION CHI-OK HWANG Abstract.
More informationModel Counting for Logical Theories
Model Counting for Logical Theories Wednesday Dmitry Chistikov Rayna Dimitrova Department of Computer Science University of Oxford, UK Max Planck Institute for Software Systems (MPI-SWS) Kaiserslautern
More informationSolutions of the Financial Risk Management Examination
Solutions of the Financial Risk Management Examination Thierry Roncalli January 9 th 03 Remark The first five questions are corrected in TR-GDR and in the document of exercise solutions, which is available
More information6.1 Moment Generating and Characteristic Functions
Chapter 6 Limit Theorems The power statistics can mostly be seen when there is a large collection of data points and we are interested in understanding the macro state of the system, e.g., the average,
More informationBayesian Methods with Monte Carlo Markov Chains II
Bayesian Methods with Monte Carlo Markov Chains II Henry Horng-Shing Lu Institute of Statistics National Chiao Tung University hslu@stat.nctu.edu.tw http://tigpbp.iis.sinica.edu.tw/courses.htm 1 Part 3
More informationLecture 2: Linear Algebra Review
EE 227A: Convex Optimization and Applications January 19 Lecture 2: Linear Algebra Review Lecturer: Mert Pilanci Reading assignment: Appendix C of BV. Sections 2-6 of the web textbook 1 2.1 Vectors 2.1.1
More informationComplexity of two and multi-stage stochastic programming problems
Complexity of two and multi-stage stochastic programming problems A. Shapiro School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA The concept
More informationToday: Fundamentals of Monte Carlo
Today: Fundamentals of Monte Carlo What is Monte Carlo? Named at Los Alamos in 1940 s after the casino. Any method which uses (pseudo)random numbers as an essential part of the algorithm. Stochastic -
More informationLimiting Distributions
Limiting Distributions We introduce the mode of convergence for a sequence of random variables, and discuss the convergence in probability and in distribution. The concept of convergence leads us to the
More informationSelected Exercises on Expectations and Some Probability Inequalities
Selected Exercises on Expectations and Some Probability Inequalities # If E(X 2 ) = and E X a > 0, then P( X λa) ( λ) 2 a 2 for 0 < λ
More informationMultivariate Time Series: VAR(p) Processes and Models
Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with
More informationAPPLIED MATHEMATICS REPORT AMR04/16 FINITE-ORDER WEIGHTS IMPLY TRACTABILITY OF MULTIVARIATE INTEGRATION. I.H. Sloan, X. Wang and H.
APPLIED MATHEMATICS REPORT AMR04/16 FINITE-ORDER WEIGHTS IMPLY TRACTABILITY OF MULTIVARIATE INTEGRATION I.H. Sloan, X. Wang and H. Wozniakowski Published in Journal of Complexity, Volume 20, Number 1,
More informationThe rotor-router mechanism for quasirandom walk and aggregation. Jim Propp (U. Mass. Lowell) July 10, 2008
The rotor-router mechanism for quasirandom walk and aggregation Jim Propp (U. Mass. Lowell) July 10, 2008 (based on articles in progress with Ander Holroyd and Lionel Levine; with thanks also to Hal Canary,
More informationThe strictly 1/2-stable example
The strictly 1/2-stable example 1 Direct approach: building a Lévy pure jump process on R Bert Fristedt provided key mathematical facts for this example. A pure jump Lévy process X is a Lévy process such
More informationSTAT232B Importance and Sequential Importance Sampling
STAT232B Importance and Sequential Importance Sampling Gianfranco Doretto Andrea Vedaldi June 7, 2004 1 Monte Carlo Integration Goal: computing the following integral µ = h(x)π(x) dx χ Standard numerical
More informationSYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions
SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More informationStatistical Data Analysis
DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the
More informationToday: Fundamentals of Monte Carlo
Today: Fundamentals of Monte Carlo What is Monte Carlo? Named at Los Alamos in 940 s after the casino. Any method which uses (pseudo)random numbers as an essential part of the algorithm. Stochastic - not
More informationTest Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics
Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests
More informationLattice rules and sequences for non periodic smooth integrands
Lattice rules and sequences for non periodic smooth integrands dirk.nuyens@cs.kuleuven.be Department of Computer Science KU Leuven, Belgium Joint work with Josef Dick (UNSW) and Friedrich Pillichshammer
More informationFlorida State University Libraries
Florida State University Libraries Electronic Theses, Treatises and Dissertations The Graduate School 29 Monte Carlo and Quasi-Monte Carlo Methods in Financial Derivative Pricing Ahmet Göncü Follow this
More information6 Markov Chain Monte Carlo (MCMC)
6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution
More informationEco517 Fall 2004 C. Sims MIDTERM EXAM
Eco517 Fall 2004 C. Sims MIDTERM EXAM Answer all four questions. Each is worth 23 points. Do not devote disproportionate time to any one question unless you have answered all the others. (1) We are considering
More informationThe Multivariate Normal Distribution. In this case according to our theorem
The Multivariate Normal Distribution Defn: Z R 1 N(0, 1) iff f Z (z) = 1 2π e z2 /2. Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) T with the Z i independent and each Z i N(0, 1). In this
More informationThe Error Analysis through Dual Net and Walsh Function in Quasi-Monte Carlo Method
The Error Analysis through Dual Net and Walsh Function in Quasi-Monte Carlo Method Siyuan Deng Department of Applied Mathematics Illinois Institute of Technology Apr 25, 2013 The Basic Concept of Quasi-Monte
More informationStat 516, Homework 1
Stat 516, Homework 1 Due date: October 7 1. Consider an urn with n distinct balls numbered 1,..., n. We sample balls from the urn with replacement. Let N be the number of draws until we encounter a ball
More informationA GENERAL THEORY FOR ORTHOGONAL ARRAY BASED LATIN HYPERCUBE SAMPLING
Statistica Sinica 26 (2016), 761-777 doi:http://dx.doi.org/10.5705/ss.202015.0029 A GENERAL THEORY FOR ORTHOGONAL ARRAY BASED LATIN HYPERCUBE SAMPLING Mingyao Ai, Xiangshun Kong and Kang Li Peking University
More information