NEW FRONTIERS IN APPLIED PROBABILITY

Size: px

Start display at page:

Download "NEW FRONTIERS IN APPLIED PROBABILITY"

Dominic Price
5 years ago
Views:

1 J. Appl. Prob. Spec. Vol. 48A, (2011) Applied Probability Trust 2011 NEW FRONTIERS IN APPLIED PROBABILITY A Festschrift for SØREN ASMUSSEN Edited by P. GLYNN, T. MIKOSCH and T. ROLSKI Part 4. Simulation A COMPARISON OF CROSS-ENTROPY AND VARIANCE MINIMIZATION STRATEGIES JOSHUA C. C. CHAN, Australian National Uniersity Research School of Economics, Australian National Uniersity, Canberra, ACT 0200, Australia. PETER W. GLYNN, Stanford Uniersity Department of Management Science and Engineering, Huang Engineering Center, Stanford Uniersity, 475 Via Ortega, Stanford, CA , USA. DIRK P. KROESE, Uniersity of Queensland Department of Mathematics, Uniersity of Queensland, St Lucia, Brisbane, QLD 4072, Australia. address: kroese@maths.uq.edu.au APPLIED PROBABILITY TRUST AUGUST 2011

2 A COMPARISON OF CROSS-ENTROPY AND VARIANCE MINIMIZATION STRATEGIES BY JOSHUA C. C. CHAN, PETER W. GLYNN AND DIRK P. KROESE Abstract The ariance minimization (VM) and cross-entropy (CE) methods are two ersatile adaptie importance sampling procedures that hae been successfully applied to a wide ariety of difficult rare-eent estimation problems. We compare these two methods ia arious examples where the optimal VM and CE importance densities can be obtained analytically. We find that in the cases studied both VM and CE methods prescribe the same importance sampling parameters, suggesting that the criterion of minimizing the CE distance is ery close, if not asymptotically identical, to minimizing the ariance of the associated importance sampling estimator. Keywords: Variance minimization; cross entropy; importance sampling; rare-eent simulation; likelihood ratio degeneracy 2010 Mathematics Subject Classification: Primary 65C05 Secondary 65C60 1. Introduction In this article we compare two adaptie importance sampling procedures, namely the ariance minimization (VM) and cross-entropy (CE) methods [11], [16, pp ], in the context of rare-eent simulation. Both algorithms aim to find an importance density that is optimal in a well-defined sense, though the optimality criteria are different. Under the VM method, the optimal importance density is the one whose associated estimator has minimum ariance within a gien parametric family. Although this minimum ariance criterion is obiously desirable, in practice the minimization problem inoles application of time consuming numerical techniques. Instead of directly minimizing the ariance of the estimator, the CE method seeks to locate the importance density that is closest in Kullback Leibler diergence or CE distance to the zero-ariance importance density: the conditional density gien the rare eent. The main adantage of the CE method is that the optimization problem required to obtain the optimal density often admits closed-form solutions. To compare these two related but distinct algorithms, we consider arious explicit examples where the optimal VM and CE importance densities can be obtained analytically. In all the examples considered, we find that the optimal VM and CE importance densities are asymptotically identical. Although whether this result holds in general or not is an open question, it suggests that the VM and CE criteria are ery similar, at least asymptotically. Put differently, the importance density that is the closest in CE distance to the zero-ariance importance density is also the one whose associated estimator has the minimum asymptotic ariance. The significance of this obseration is that since CE estimators are typically easier to obtain, this practical adaptie importance sampling strategy is also optimal in the sense that it gies the Applied Probability Trust

3 184 J. C. C. CHAN ET AL. minimum ariance importance sampling estimator. Furthermore, in situations where the VM or CE optimization problem does not admit closed-form solutions, the optimal parameters need to be estimated ia a multileel procedure. We analyze how the ariability in the estimates affects the performance of the associated importance sampling estimator. The rest of this article is organized as follows. In Section 2 we first introduce some background material, and then discuss the classic VM and CE methods as well as two ariants proposed recently. We consider in Section 3 the sum of independent but not necessarily identical random ariables in the exponential families where the number of parameters is sent to infinity. We show in this scenario that the optimal CE parameters coincide with those suggested by large deiation theory. It is followed by three case studies: we consider the example of the sum of exponential random ariables in Section 4, and the cases for Pareto and Weibull random ariables in Sections 5 and 6, respectiely. 2. Adaptie importance sampling ia VM and CE methods We first introduce some standard notation and efficiency measures in the context of rare-eent simulation. We write a(t) b(t) to indicate that lim t a(t)/b(t) = 1, and X i = f, i = 1,...,n, to indicate that X 1,...,X n are independent and identically distributed () according to the density or distribution f. An unbiased estimator Z(γ) for l(γ ) is said to be logarithmically efficient, weakly efficient,orasymptotically optimal if log E Z(γ) 2 lim = 2. γ log l(γ ) This condition is equialent to the requirement that lim γ E Z(γ) 2 /l(γ ) 2 ε = 0 for eery ε > 0. The estimator is said to be strongly efficient or hae bounded relatie error if sup γ 0 E Z(γ) 2 /l(γ ) 2 <. It is readily obsered that bounded relatie error implies asymptotic optimality. These notions of efficiency are standard in the literature; see, for example, [1, pp ] and [13]. We are interested in estimating the probability of the form l = P(S(X) >γ)= 1(S(x) >γ)f(x) dx, where S is some real-alued performance function, X is a ector of random ariables with probability density function (PDF) f, and γ is a sufficiently large constant such that l is small. Consider estimating l ia the importance sampling estimator l IS = 1 N N 1(S(X i )>γ) f(x i) g(x i ), where X i = g, i = 1,...,N, for some importance sampling PDF g for which g(x) = 0 implies that 1(S(x) >γ)f(x) = 0 for all x. Although the estimator l IS is consistent and unbiased for any such g, its performance depends critically on the choice of g. Hence, we wish to choose g so that the associated estimator is optimal in a well-defined sense. To this end, consider a parametric family F ={f(x; )} indexed by a parameter ector that contains the nominal (original) density f. Thus, we can write f(x) = f(x; u) for some parameter ector u. For any gien, the general term of the associated importance sampling estimator is Z() = W(X; u, ) 1(S(X) > γ), where W(x; u, ) is the likelihood ratio defined as W(x; u, ) = f(x; u)/f (x; ). Now, we wish to choose so that the associated importance

4 A comparison of CE and VM strategies 185 sampling estimator has minimum ariance within the parametric family F. The minimizer m is referred to as the optimal VM parameter ector. For any unbiased estimator l of l,wehae ar l = E l 2 l 2. Therefore, m can be written as m = argmin E Z() 2 = argmin E u Z() = argmin log E u Z(), (2.1) where the expectation operators E u and E are taken with respect to the densities f( ; u) and f( ; ), respectiely. A related approach to locating a good importance density inoles the Kullback Leibler diergence,orce distance. To motiate the method, first note that the zeroariance importance density for estimating l is simply g (x) = l 1 f(x; u) 1(S(x) >γ) the conditional density gien the rare eent. Obiously, g cannot be used directly in practice as it inoles the unknown constant l. Neertheless, this proides a practical criterion to locate a good importance density. Specifically, if we choose the density within F that is the closest to g in the CE distance, then intuitiely the associated estimator should hae reasonable performance. Let ce denote the minimizer, which we refer to as the optimal CE parameter ector. It can be shown [16, pp ] that soling the CE minimization problem is equialent to finding ce = argmax f(x; u) 1(S(x) >γ)log f(x; ) dx. (2.2) Although the optimal CE and VM parameter ectors can be obtained analytically for a few specific cases, in general the optimization problems in (2.1) and (2.2) are difficult to sole. Thus, in practice, we often need to estimate m or ce ia a multileel procedure, which we shall call multileel VM or CE (see [11] for a more thorough discussion). Recent research has shown that in certain high-dimensional cases the estimates for m and ce obtained from the multileel procedures are not accurate, and, as a consequence, the associated estimators perform poorly [8], [9], [14], [15]. A recent ariant, called the screening method, introduced in [15], aims to reduce the dimension of the likelihood ratio, and is shown to perform better than the multileel VM and CE methods in arious high-dimensional estimation problems. To motiate the method, partition the parameter ector u into two subsets: u = (u 0, u 1 ), where the occurrence of the rare eent {S(X) >γ} is substantially affected by u 1 but not by u 0. The ector u 1 is referred to as the bottleneck parameter. Now consider the parametric family F 0 ={f(x; ṽ)} indexed by ṽ, where ṽ = (u 0, 1 ) and u 0 is fixed therefore, F 0 is in fact indexed by 1. The screening method proceeds in the same way as the multileel CE and VM methods, but instead of twisting the whole parameter ector u, we only twist the bottleneck parameter u 1. Since F 0 F, the ariance of the importance sampling estimator associated with m is at least as small as the ariance of the estimator associated with ṽ m simply by definition. Paradoxically, howeer, the empirical findings in [15] suggest otherwise for the situation where the parameters are estimated ia a multileel procedure. A possible explanation is that the parameter ector obtained ia the multileel procedure, say, m,t, is not an accurate estimate for m. By reducing the dimension of the likelihood ratio ia the screening method, we can estimate ṽ m the optimal VM parameter ector within F 0 more accurately. As a result, the importance density f(x; ṽ m,t ), where ṽ m,t denotes the parameter ector obtained ia the screening method, is closer to g compared to f(x; m,t ), and, thus, the estimator associated with the former density has a smaller ariance than that of the latter. Another improed ariant proposed by Chan and Kroese [8] aims to estimate ce in one step so as to circument likelihood degeneracy of the estimation procedure. Specifically, instead of

5 186 J. C. C. CHAN ET AL. the multileel procedure in the classic CE method, they proposed estimating ce by finding ce = argmax M log f(x j ; ), j=1 where X 1,...,X M are draws from g. They demonstrated that the improed CE method does not only gie substantial improement oer the traditional approach but also works well in high-dimensional estimation problems. Generating draws from g, howeer, might not be triial in general, but with the adent of Marko chain Monte Carlo (MCMC) methods, this problem is well studied and a ariety of techniques are aailable. In Sections 4 6 we consider arious concrete examples where we can derie asymptotic expressions for m and ce, and we show that they are identical asymptotically. We then compute the asymptotic ariances of the associated estimators and inestigate how they are affected by the estimation errors introduced in the multileel VM and CE approaches. 3. Sum of independent nonidentical random ariables in the exponential families In this section we consider the rare-eent regime where the number of random ariables n approaches infinity. To set the stage, suppose that X 1,X 2,... is a sequence of independent but not necessarily identical random ariables where each X j belongs to a one-parameter exponential family parameterized by the mean; that is, the density of each X j is gien by f j (x; u j ) = e xθ(u j ) ζ(θ(u j )) h j (x). (3.1) Let µ n = n E X i and σ 2 n = n ar X i. We are interested in estimating the probabilities l n = P(S n >nb) as n, where S n = X 1 + +X n and lim n (nb µ n )/σ n =. We will show that the optimal CE parameters coincide with those suggested by large deiation theory. More specifically, the CE method suggests twisting the means of the random ariables such that their sum is equal to the threshold nb. Proposition 3.1. Let X 1,X 2,...be a sequence of independent random ariables such that X j belongs to a one-parameter exponential family parameterized by the mean with PDF gien in (3.1). Consider estimating l n = P(S n >nb)as n ia the CE method with importance density of the form n f i (x i ; i ). Suppose that lim n (nb µ n )/σ n =, where µ n = n E X i and σ 2 n = n ar X i. Then the optimal CE parameters ce,1, ce,2,...satisfy n ce,i nb as n. Proof. First note that to estimate l n, the optimal CE parameters ce,i,i= 1,...,n, are gien in [17, p. 320]: ce,i = E[X i S n >nb]. Therefore, n ce,i = E[S n S n >nb]. By the central limit theorem, (S n µ n )/σ n is asymptotically N(0, 1) distributed as n. Therefore, E[S n S n >nb] µ n + ϕ((nb µ n)/σ n ) 1 ((nb µ n )/σ n ) σ n µ n + nb µ n σ n = nb σ n as n, where ϕ( ) and ( ) are respectiely the PDF and cumulatie distribution function (CDF) of the standard normal distribution hence, the desired result.

6 A comparison of CE and VM strategies 187 In what follows, we consider a different rare-eent regime where we keep the number of random ariables n fixed and let γ go to infinity. 4. Sum of exponential random ariables Consider the estimation of l = P(X 1 + +X n >γ)ia importance sampling, where X i = Exp(1), i = 1...,n; that is, X i has PDF f(x)= e x,x 0. Note that n 1 l = e γ k=0 γ k k! = Ɣ(n, γ ) Ɣ(n), (4.1) where Ɣ(n) = (n 1)! and Ɣ(n, γ ) is the alue of the (upper) incomplete gamma function at (n, γ ). Suppose that we generate X i = Exp( 1 ), i = 1,...,n,with PDF f(x; 1 ) = 1 exp( 1 x). It follows that the general term in the importance sampling estimator is Z() = 1(X 1 + +X n > γ )W(X; 1, 1 ), where the likelihood ratio is gien by ( ( W(x; 1, 1 ) = n exp 1 1 ) n x i ). We first derie the asymptotic expressions for the optimal VM and CE parameters and show that they are the same. We note that Rubinstein and Kroese [16, p. 69] proed the special case for n = 1. Proposition 4.1. Let X i = Exp(1), i = 1,...,n. To estimate l = P(X 1 + +X n >γ)ia importance sampling, suppose that we generate X i identically from the Exp( 1 ) distribution. Then the optimal VM and CE parameters are asymptotically the same. In fact, we hae m γ/nand ce γ/n. Proof. To obtain the optimal VM parameter, we first derie an asymptotic expression for the second moment of the importance sampling estimator Z(): E 1/ Z() 2 = E 1 Z() = xi >γ ( ( n exp 1 1 ) n n x i) e x i dx = n ( 2 1 ) n P(Y 1 + +Y n >γ). Here Y i = Exp(2 1/), i = 1,...,n, for > 2 1. Therefore, we hae E 1/ Z() 2 e 2γ γ n 1 (n 1)! n e γ/ 2 1/ as γ. To obtain m, we differentiate log E 1/ Z() 2 with respect to and sole the equation (with the constraint that > 2 1 ): d d log E 1/ Z() 2 n γ = 0.

7 188 J. C. C. CHAN ET AL. It follows that m = γ + γ 2 nγ + (n + 1) 2 /4 2n + n + 1 4n + O(γ 1 ) γ n. To compute the optimal CE parameter, we first note that the exponential distribution is a member of the exponential family. Therefore, ce = E[X 1 X 1 + +X n >γ]= 1 E[Y Y>γ], n where Y = D Gamma(n, 1). Direct computation shows (see also [16, p. 78]) that E[Y Y > γ ]=Ɣ(n + 1, γ )/ Ɣ(n, γ ). Since we hae Ɣ(n + 1,γ)= nɣ(n, γ ) + γ n e γ and Ɣ(n, γ ) = γ n 1 e γ (1 + O(γ 1 )), E[Y Y>γ]=n + γ n e γ Ɣ(n, γ ) = n + γ(1 + O(γ 1 )). It follows that ce = γ/n+ 1 + O(γ 1 ) γ/nas γ. Therefore, by Proposition 4.1, the optimal VM parameter is asymptotically identical to that gien by the CE program when γ. We show in the next proposition that either m or ce in fact gies an asymptotically optimal importance sampling estimator for l. In addition, by the definition of m, no other importance sampling estimators obtained by generating X i identically from Exp( 1 ) can be strongly efficient. In what follows, we also inestigate how the estimation error in obtaining ce affects the relatie error of the associated importance sampling estimator. Proposition 4.2. Under the same assumptions as in Proposition 4.1, if we set = γ/n+ h for some constant h then E 1/ Z() 2 ( l 2 = en (n 1)! 2n n γ 1 + 2n 1 + n ) 2γ 4γ 2 (2n2 h 2 2nh + 3n 2) + O(γ 3 ) (4.2) as γ. In particular, the optimal VM/CE parameter gies an asymptotically optimal estimator. Proof. Let = γ/n+ h. By a similar computation as in Proposition 4.1 we hae E 1/ Z() 2 = = n Ɣ(n, (2 1/)γ ) (2 1/) n Ɣ(n) ( n e 2γ e γ/ (2 1/)(n 1)! γ n n 1 2 1/ γ 1 + O(γ 2 ) ). (4.3) Substituting = γ/n+h and using the expression for l in (4.1) gies (4.2). The final statement in the proposition follows by setting = γ/n m. It is worth noting that in (4.2) only the coefficient of the third order term 1/γ 3 inoles h, and that the magnitude of h does not affect the asymptotic efficiency of the importance sampling estimator. Howeer, when the dimension of the estimation problem n is large, h might hae a

8 A comparison of CE and VM strategies 189 substantial impact on the ariance of the importance sampling estimator for any finite γ. This explains why the multileel CE estimator generally works well in problems with light-tailed random ariables, but sometimes breaks down when the dimension of the problem becomes large; see, e.g. [8]. We now inestigate what happens when the importance sampling parameter is obtained ia a random procedure, such as in the multileel VM or CE method. Let us denote the random parameter thus obtained by V, which is independent of the {Z i } used in the importance sampling estimator. In the CE procedure the reference parameter V is obtained as V = Nk=1 1(S k >γ)w k (w)s k n N k=1 1(S k >γ)w k (w), (4.4) where W k (w) = W(X k ; 1, 1/w) is the kth likelihood ratio corresponding to a reference parameter w obtained in the penultimate iteration, and S k = Gamma(n, 1/w), k = 1,...,N. The parameter w is usually random as well for example when obtained ia a CE procedure. Suppose, howeer, that w is some arbitrarily fixed reference parameter. The asymptotic distribution of V as a function of w is gien in the next proposition. Proposition 4.3. Under the same assumptions as in Proposition 4.1, the CE reference parameter V gien in (4.4) is asymptotically normal as N with mean ce and ariance σγ,w 2 /N. Furthermore, we hae ( σγ,w ) 2 w n (n 1)! n (2 1/w) γ n+3 e γ/w as γ. In particular, σ 2 γ,γ/n γ 3 (n 1)! (1 1/n)2 e n 2n n. Proof. First note that V gien in (4.4) is a ratio estimator. By the delta method [1, pp ], the asymptotic distribution is normal with mean and ariance σγ,w 2 /N, with µ = E 1/w 1(S > γ )W (w)s n E 1/w 1(S > γ )W (w) = E 1 1(S > γ )S n E 1 1(S > γ ) = ce σγ,w 2 = ar(a) 2µ co(a, B) + µ2 ar(b) l 2, where A = 1(S > γ )W (w)s, B = 1(S > γ )W (w), and S is Gamma(n, 1/w) distributed. The second moment of B is gien in (4.3) with w substituted for. The expectation of A is simply E 1/w A = E 1 1(S > γ )S = l ce. The second moment of A is E 1/w A 2 = 1(s > γ )w n e (1 1/w)s s 2 1 Ɣ(n) sn 1 e s ds n(n + 1) Ɣ(n + 2,(2 1/w)γ ) = (2 1/w) n+2 Ɣ(n + 2) γ 2 E w B 2.

9 190 J. C. C. CHAN ET AL. Moreoer, E 1/w AB = E 1 1(S > γ )W (w)s γ E w B 2. It follows, after some algebra, that, for n>1, ( σγ,w ) 2 w n (n 1)! n (2 1/w) γ n+3 e γ/w. This completes the proof. Note that the asymptotic ariance of the CE reference parameter V is cubic in γ when we set w = γ/n m. Therefore, een though ce gies an asymptotically optimal estimator, when γ is sufficiently large, the estimation error in obtaining ce in the multileel CE procedure might be so substantial that it renders the resulting importance sampling estimator unreliable. 5. Sum of Pareto random ariables We now consider estimating the tail probability of the sum of heay-tailed random ariables. Specifically, we wish to estimate l = P(X 1 + +X n >γ)ia importance sampling, where X i = Pareto(1, 1), i = 1,..., n. Since the Pareto distribution is subexponential [1, pp ], we hae l n/(1 + γ)as γ. To estimate l ia importance sampling, we consider the Pareto(α, 1) family indexed by α > 0 with PDF f(x; α) = α(1 + x) (α+1),x 0. Now suppose that we generate X i = Pareto(α, 1), i = 1,...,n. The general term of the likelihood ratio is W(x; 1,α)= n (1 + x i ) 2 α(1 + x i ) (α+1) = α n and the corresponding importance sampling estimator is n (1 + x i ) (1 α), Z(α) = 1(X 1 + +X n > γ )W(X; 1,α). In the following proposition we show that the optimal VM and CE parameters are identical. In fact, we show that α m n/ log(1 + γ), which gies the minimum ariance estimator within the class of importance sampling estimators obtained by generating X i = Pareto(α, 1) for i = 1,...,n. Compare this with the suggestions in [3] and [10]. Proposition 5.1. Let X i = Pareto(1, 1), i = 1,...,n. Suppose that we wish to estimate l = P(X X n >γ)ia importance sampling by generating X i = Pareto(α, 1). Then the optimal VM and CE parameters for α are asymptotically the same. In fact, we hae α m n/ log(1 + γ). Proof. Note that the optimal CE parameter for α is gien in [4]: α ce = (1 + log(1 + γ)/n) 1 n/ log(1 + γ).to compute the optimal VM parameter, we first derie the second moment of Z(α) with respect to the Pareto(α, 1) distribution: E α Z(α) 2 = E 1 Z(α) = α n xi >γ n (1 + x i ) (1 α) (1 + x i ) 2 dx = (2α α 2 ) n P(Y 1 + +Y n >γ). Here Y i = Pareto(2 α, 1), i = 1,..., n, proided that α < 2. Hence, E α Z(α) 2 (2α α 2 ) n n(1 + γ) (2 α). (5.1)

10 A comparison of CE and VM strategies 191 By a computation similar to that in Proposition 4.1 we hae α m = (2/n)log(1 + γ)+ 2 (4/n 2 ) log 2 (1 + γ)+ 4 + O(log 2 (1 + γ)) (2/n)log(1 + γ) n log(1 + γ). Again, the optimal VM parameter is asymptotically identical to that gien by the CE program as γ. We next inestigate how the choice of the parameter α affects the growth rate of E α Z(α) 2 /l 2. As a corollary to Proposition 5.1, we show that α = α ce n/ log(1 + γ)gies an importance sampling estimator that is asymptotically optimal. We note that Asmussen and Kroese [2] proided a conditional Monte Carlo estimator that has bounded relatie error for the case of the sum of Pareto random ariables. In addition, by utilizing a technique based on Lyapunotype inequalities first introduced in [5], Blanchet and Li [6] were able to derie an importance sampling estimator that achiees bounded relatie error for general subexponential distributions. Proposition 5.2. Under the same assumptions as in Proposition 5.1, if we set α = n/ log(1 + γ)+ h for some constant h such that 0 <α<2, then E α Z(α) 2 ( l 2 en n γ h 2n h(2 h) + log(1 + γ) n 2 ) n log 2 as γ. (5.2) (1 + γ) In particular, the optimal VM/CE parameter gies an asymptotically optimal estimator. Proof. By (5.1) and the fact that l n/(1 + γ),wehae E α Z(α) 2 l 2 1 n(2α α 2 ) n (1 + γ)α. Hence, if we set α = n/ log(1 + γ) + h then (5.2) follows. As a result, for α = α ce n/ log(1 + γ),we hae E α Z(α) 2 ( l 2 en 2n n log(1 + γ) n 2 ) n log 2 as γ. (1 + γ) This completes the proof. It is of interest to note that in contrast to the light-tailed case, the estimation error h does increase the asymptotic ariance of the importance sampling estimator. Therefore, the problem of suboptimal VM and CE reference parameters is expected to be more seere in the heay-tailed case. 6. Sum of Weibull random ariables Consider the same estimation problem as in the last section, but now X i = Weib(β, 1), i = 1,...,n, for 0 <β<1; that is, X i has PDF f(x; β) = βx β 1 e xβ. We wish to estimate the tail probability l ia importance sampling by tilting the scale parameter. That is, we locate the importance density within the parametric family Weib(β, θ) with PDF f(x; β, θ) = θβx β 1 e θxβ,x 0, indexed by θ>0while keeping β fixed. It follows that the general term

11 192 J. C. C. CHAN ET AL. of the importance sampling estimator is with likelihood ratio Z(θ) = 1(X 1 + +X n > γ )W(X; 1,θ), ( W(x; 1,θ)= θ n exp (1 θ) Again, for this sum of Weibull random ariables case, the optimal VM and CE parameters coincide asymptotically. Kroese and Rubinstein [12] proed that, for the n = 2 case, the optimal VM parameter is asymptotically θ m 2/γ β. They conjectured that, for general n, θ m n/γ β, and the squared relatie error of the resulting estimator increases proportionally to γ nβ. The following propositions proe these two conjectures. Proposition 6.1. Let X i = Weib(β, 1), i = 1,..., n, with 0 < β < 1. Suppose that we wish to estimate l = P(X X n >γ)ia importance sampling by generating X i = Weib(β, θ). Then the optimal VM and CE parameters for θ are asymptotically identical. In fact, we hae θ m n/γ β. Proof. First note that the optimal CE parameter for θ is gien in [4] and [16]: θ ce = n/(n + γ β ) n/γ β. Next we compute the optimal VM parameter as follows: E θ Z(θ) 2 = E 1 Z(θ) ( = θ n exp (1 θ) xi >γ n x β i n x β i ) n = θ n (2 θ) n P(Y 1 + +Y n >γ). ). βx β 1 i e xβ i dx Here Y i = Weib(β, 2 θ), proided that θ < 2. Since the Weib(β, θ) distribution is subexponential for β<1, we hae E θ Z(θ) 2 n θ n (2 θ) n P(Y n 1 >γ)= θ n (2 θ) n e (2 θ)γβ as γ. (6.1) By a similar computation as in Proposition 4.1, it can be shown that θ m = n γ β n2 γ 2β + O(γ (β+1) ) n γ β as γ. Therefore, the optimal CE and VM parameters for θ are identical asymptotically. The choice of θ m is to be compared with the suggestion in [10] to take θ = b/γ β, where b>0is some arbitrary constant. Since the Weib(β, 1) distribution is subexponential for β<1, it follows that l ne γ β. Therefore, using the expression in (6.1), it can be shown that if we choose θ such that θγ β = c for some constant c, the associated importance sampling estimator is asymptotically optimal. In particular, the choice θ = θ ce θ m gies an asymptotically optimal importance sampling estimator, which also has the minimum asymptotic ariance within the class of importance sampling estimators with importance densities under which X i = Weib(β, θ), i = 1,...,n.

12 A comparison of CE and VM strategies 193 Proposition 6.2. Under the same assumptions as in Proposition 6.1, if we set θ = n/γ β then E θ Z(θ) 2 l 2 en 2 n n n+1 γ nβ, i.e. the optimal VM/CE parameter gies an asymptotically optimal estimator. Proof. Since the Weib(β, 1) distribution is subexponential for β<1, we hae l ne βγ. It follows from (6.1) that E θ Z(θ) 2 l 2 n 1 θ n (2 θ) n e θγβ as γ. The desired result follows by letting θ = n/γ β. 7. Concluding remarks and future research We compared the VM and CE methods through arious concrete examples and we found that in the three examples considered the optimal VM and CE parameters are asymptotically identical. It would be of considerable interest to determine under what conditions this is the case. Since CE estimators are typically easy to obtain, this would proide a practical approach to locate the importance sampling estimator with the minimum ariance within a gien parametric class. Moreoer, it is worthwhile to study further the impact of CE parameter estimation on the quality of the associated importance sampling estimator. Acknowledgements Joshua Chan and Dirk Kroese would like to acknowledge financial support from the Australian Research Council through Discoery Grants DP and DP , respectiely. References [1] Asmussen, S. and Glynn, P. W. (2007). Stochastic Simulation: Algorithms and Analysis. Springer, New York. [2] Asmussen, S. and Kroese, D. P. (2006). Improed algorithms for rare eent simulation with heay tails. Ad. Appl. Prob. 38, [3] Asmussen, S., Binswanger, K. and Højgaard, B. (2000). Rare eents simulation for heay-tailed distributions. Bernoulli 6, [4] Asmussen, S., Kroese, D. P. and Rubinstein, R. Y. (2005). Heay tails, importance sampling and cross-entropy. Stoch. Models 21, [5] Blanchet, J. and Glynn, P. (2008). Efficient rare-eent simulation for the maximum of heay-tailed random walks. Ann. Appl. Prob. 18, [6] Blanchet, J. and Li, C. (2011). Efficient rare eent simulation for heay-tailed compound sums. To appear in ACM Trans. Model. Comput. Simul. [7] Casella, G. and Berger, R. L. (2001). Statistical Inference, 2nd edn. Duxbury Press. [8] Chan, J. C. C. and Kroese, D. P. (2011). Improed cross-entropy method for estimation. Submitted. [9] Chan, J. C. C. and Kroese, D. P. (2009). Rare-eent probability estimation with conditional Monte Carlo. Ann. Operat. Res., 19pp. [10] Juneja, S. and Shahabuddin, P. (2002). Simulating heay tailed processes using delayed hazard rate twisting. ACM Trans. Model. Comput. Simul. 12, [11] Kroese, D. P. (2011). The cross-entropy method. In Wiley Encyclopedia of Operations Research and Management Science. [12] Kroese, D. P. and Rubinstein, R. Y. (2004). The transform likelihood ratio method for rare eent simulation with heay tails. Queueing Systems 46, [13] L Ecuyer, P., Blanchet, J. H., Tuffin, B. and Glynn, P. W. (2010). Asymptotic robustness of estimators in rare-eent simulation. ACM Trans. Model. Comput. Simul. 20, [14] Rubinstein, R. Y. (2009). The Gibbs cloner for combinatorial optimization, counting and sampling. Methodology Comput. Appl. Prob. 11,

13 194 J. C. C. CHAN ET AL. [15] Rubinstein, R. Y. and Glynn, P. W. (2009). How to deal with the curse of dimensionality of likelihood ratios in Monte Carlo simulation. Stoch. Models 25, [16] Rubinstein, R. Y. and Kroese, D. P. (2004). The Cross-Entropy Method. Springer, New York. [17] Rubinstein, R. Y. and Kroese, D. P. (2007). Simulation and the Monte Carlo Method, 2nd edn. John Wiley, New York. JOSHUA C. C. CHAN, Australian National Uniersity Research School of Economics, Australian National Uniersity, Canberra, ACT 0200, Australia. PETER W. GLYNN, Stanford Uniersity Department of Management Science and Engineering, Huang Engineering Center, Stanford Uniersity, 475 Via Ortega, Stanford, CA , USA. DIRK P. KROESE, Uniersity of Queensland Department of Mathematics, Uniersity of Queensland, St Lucia, Brisbane, QLD 4072, Australia. address:

Provided for non-commercial research and educational use only. Not for reproduction, distribution or commercial use.

Provided for non-commercial research and educational use only. Not for reproduction, distribution or commercial use. Proided for non-commercial research and educational use only. Not for reproduction, distribution or commercial use. This chapter was originally published in the book Handbook of Statistics. The copy attached