Yixi Shi and Jose Blanchet. IEOR Department, Columbia University, New York, NY 10027, USA.


Proceedings of the 2011 Winter Simulation Conference
S. Jain, R. R. Creasey, J. Himmelspach, K. P. White, and M. Fu, eds.

EFFICIENT RARE EVENT SIMULATION FOR HEAVY-TAILED SYSTEMS VIA CROSS ENTROPY

Jose Blanchet
IEOR Department, Columbia University, New York, NY 10027, USA

Yixi Shi
IEOR Department, Columbia University, New York, NY 10027, USA

978-1-4577-2109-0/11/$26.00 ©2011 IEEE

ABSTRACT

The cross entropy method is a popular technique that has been used in the context of rare event simulation in order to obtain a good selection (in the sense of variance performance tested empirically) of an importance sampling distribution. This iterative method requires the selection of a suitable parametric family to start with. The selection of the parametric family is very important for the successful application of the method. Two properties must be enforced in such a selection. First, subsequent updates of the parameters in the iterations must be easily computable and, second, the parametric family should be powerful enough to approximate, in some sense, the zero-variance importance sampling distribution. We obtain parametric families for which these two properties are satisfied for a large class of heavy-tailed systems including Pareto and Weibull tails. Our estimators are shown to be strongly efficient in these settings.

1 INTRODUCTION

Tail probabilities of sums of heavy-tailed increments are a fundamental problem in the applied probability field. A large number of applications boil down to these building blocks. In this paper we focus our attention on the tail probabilities of a finite sum of heavy-tailed random variables, and we propose a method to improve variance reduction of an existing class of estimators with proved efficiency. Let $S_m = X_1 + X_2 + \dots + X_m$ be a sum of independent and identically distributed (i.i.d.) random variables, with $S_0 = 0$, where the $X_n$'s are suitably heavy-tailed. The primary interest is the design of efficient estimators for the tail probability of the sum, $u(b) = P(S_m > b)$.

The basic intuition behind the construction of efficient importance sampling estimators is that one should mimic the behavior of the zero variance change of measure, which coincides with the conditional distribution

$$P(S \in \cdot \mid S_m > b) \qquad (1)$$

(see for example, Asmussen and Glynn 2008). Therefore, the behavior of the heavy-tailed random walk conditional on the rare event becomes the target to be tracked by paths generated under the importance sampling distribution. It is well known from the theory of heavy-tailed large deviations that this target is characterized by the so-called principle of big jump, which states that as $b \to \infty$ the rare event occurs due to the contribution of a single large increment of size $\Omega(b)$. (For non-negative $f$ and $g$ we adopt the notations (1) $f(b) = O(g(b))$ if $f(b) \le c\,g(b)$ for some $c > 0$, (2) $f(b) = \Omega(g(b))$ if $f(b) \ge c\,g(b)$, and (3) $f(b) = o(g(b))$ as $b \to \infty$ if $f(b)/g(b) \to 0$ as $b \to \infty$.) On the other hand, paths with more than one jump of order $\Omega(b)$ shall not be neglected in the construction of the importance sampler, because of an observation pointed out by Binswanger and Hojgaard (1997) that the second moment of the estimator for heavy-tailed large deviation probabilities is very sensitive to the likelihood ratio of these paths (see also Example 1 in Section 2).
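The principle of big jump described above can be probed with crude Monte Carlo. The following is a minimal sketch, not part of the paper's method: it assumes the illustrative tail $P(X > x) = (1+x)^{-\alpha}$ with $\alpha = 1/2$ (the same tail used later in the numerical section), and the function names `pareto_tail`, `sample_increment` and `big_jump_check` are hypothetical. It reports the crude estimate of $P(S_m > b)$, the fraction of exceedances in which the largest single increment is above $b/2$, and the ratio of the estimate to $m P(X > b)$.

```python
import random

def pareto_tail(x, alpha):
    """Assumed illustrative tail: P(X > x) = (1 + x)^(-alpha) for x >= 0."""
    return (1.0 + x) ** (-alpha)

def sample_increment(rng, alpha):
    """Inverse-transform draw from the tail above; u lies in (0, 1]."""
    u = 1.0 - rng.random()
    return u ** (-1.0 / alpha) - 1.0

def big_jump_check(m=4, b=1.0e4, alpha=0.5, n=100_000, seed=1):
    rng = random.Random(seed)
    hits = 0         # paths with S_m > b
    one_big = 0      # among them, paths whose largest increment exceeds b/2
    for _ in range(n):
        xs = [sample_increment(rng, alpha) for _ in range(m)]
        if sum(xs) > b:
            hits += 1
            if max(xs) > b / 2:
                one_big += 1
    p_hat = hits / n
    frac = one_big / max(hits, 1)
    ratio = p_hat / (m * pareto_tail(b, alpha))
    return p_hat, frac, ratio

p_hat, frac_single, ratio = big_jump_check()
print(p_hat, frac_single, ratio)
```

The level $b = 10^4$ is kept moderate on purpose: crude Monte Carlo needs on the order of $1/P(S_m > b)$ samples per exceedance, which is exactly the cost that importance sampling is meant to avoid for larger $b$.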

Guided by these observations, it is natural to suggest a mixture based sampler for the increments as the candidate importance sampler. Recently several state-dependent importance sampling estimators based on such mixtures (Dupuis, Leder, and Wang 2006 and Blanchet and Liu 2011) have been developed and shown to be strongly efficient, which means that the number of samples needed to achieve a fixed relative precision is bounded as $b \to \infty$. In simple words, one samples the next increment from different regions of its support with different probabilities. We shall delay the specific form of the mixture to the next section.

Since the zero variance change of measure (1), optimal among all possible sampling distributions, involves the unknown quantity of interest $u(b)$ and is therefore infeasible, the search for a globally optimal sampling distribution is a futile attempt. But if one restricts the optimization to a specific parametric family of samplers, there is hope that an improved change of measure within that family can be obtained. One powerful tool that exactly fits into this setting is Cross Entropy (CE) minimization (see for example, Rubinstein and Kroese 2004 and Kroese, Rubinstein, and Glynn 2010). Instead of directly minimizing the variance of the estimator, the CE method minimizes the cross-entropy discrepancy between two densities. The main advantage of the CE method is that, if the parametric family is well chosen, the optimization problem often admits closed-form solutions, as opposed to the variance minimization (VM) method (we refer readers to Chan, Glynn, and Kroese 2011 for an in-depth comparison between these two methods). The successful application of the CE method is closely tied to the quality of the selected parametric family of densities to start with. Two properties must be enforced in such a selection.
First, the parametric family should be powerful enough to approximate, in some sense, the zero-variance importance sampling distribution and, second, subsequent updates of the parameters in the iterations must be easily computable. We shall focus on elaborating these properties for the mixture family of our choice in this paper and demonstrate empirically the performance of this approach applied to the mixture family.

We noticed that in existing works, the application of the CE method to estimating tail probabilities of sums of heavy-tailed random variables has been restricted to importance sampling densities that do not capture the principle of big jump; for example, Chan, Glynn, and Kroese (2011) and Blanchet, Chan, and Kroese (2010) considered importance sampling densities obtained by tilting the scale parameters of the Weibull and log-normal increment distributions, respectively. As expected, the corresponding estimators are asymptotically efficient in a weak sense, as opposed to the strong efficiency criterion that our proposed family satisfies (see Theorem 1 below). The contribution of this paper is to justify the applicability of the CE method to a parametric family of densities that captures the large deviations behavior of the heavy-tailed sum, and for which the resulting estimator is strongly efficient.

The rest of the paper is organized as follows. In Section 2 we introduce the assumptions for the heavy-tailed increments, and put forward the parametric family of importance sampling densities to work on. Section 3 justifies the preservation of strong efficiency when switching among members of the same parametric mixture family. In Section 4 the CE method is reviewed and we discuss how it can be applied to the mixture family under consideration, after which the iterative equations are derived in closed form. Finally, in Section 5 we test the performance of our approach on two examples and give further discussion.
2 ASSUMPTIONS, NOTATIONS AND PARAMETRIC FAMILY OF IS DISTRIBUTIONS

2.1 Heavy-tailed Increment Distributions

Families of heavy-tailed distributions used in practice include regularly varying (Pareto-type tails), Weibull and log-normal. Our two sets of assumptions, discussed next, encompass virtually all models used in practice. We assume the increment distribution satisfies either of the following two Assumptions.

Assumption 1. $F$ has a regularly varying right tail with index $\alpha > 1$, i.e., $\bar F(x) = 1 - F(x) = L(x)\,x^{-\alpha}$, where $L$ is a slowly varying function at infinity, that is, $\lim_{x \to \infty} L(xt)/L(x) = 1$ for every $t > 0$.

Assumption 2. Let $\Lambda(x) = -\log \bar F(x)$ be the integrated hazard function of the increments and $\lambda$ its derivative, the hazard function. There exists $b_0$ such that for all $x > b_0$ the following conditions hold.
(2a) $\lim_{x \to \infty} x\lambda(x) = \infty$.
(2b) There exists $\beta_0 \in (0,1)$ such that $\frac{d}{dx}\log\Lambda(x) = \lambda(x)/\Lambda(x) \le \beta_0 x^{-1}$ for $x \ge b_0$.
(2c) $\Lambda$ is concave for all $x \ge b_0$; equivalently, $\lambda$ is assumed to be non-increasing for $x \ge b_0$.

We remark that under Assumption 2, the increment distribution $F$ is essentially assumed to possess a tail at least as heavy as some Weibull distribution with shape parameter $\beta_0 < 1$. Note that under these Assumptions, adopted from Blanchet and Liu (2011), the increments $X_i$ are subexponential, which means that $P(S_m > b) \sim m\,P(X_i > b)$ as $b \to \infty$ (see Lemma 6 of Blanchet and Liu 2011).

2.2 Parametric Family of IS Distributions

A state-dependent importance sampler (SDIS) is designed to sample the increments of the system from a distribution that depends on the current status of the system being simulated. We consider a mixture based SDIS. Let us denote by $p_k = (p_{k,0}, \dots, p_{k,K})$ the vector of mixture probabilities applied to the $k$th increment, $k = 1, 2, \dots, m-1$, where $K + 2$ is the number of mixture components, determined by the heaviness of the tail (the lighter the tail is, the larger $K$ is). We consider the following family of mixture based densities, parameterized by the mixing probabilities

$$p = \{p_1, p_2, \dots, p_{m-1}\} = \{p_{1,0}, p_{1,1}, \dots, p_{1,K}, \dots, p_{m-1,0}, p_{m-1,1}, \dots, p_{m-1,K}\},$$

where $K \ge 0$, from which we sample the $k$th increment of the heavy-tailed system:

$$h_k(x; S_{k-1} = s) = p_{k,0}\, f_0(x \mid s) + \sum_{j=1}^{K} p_{k,j}\, f_j(x \mid s) + \Big(1 - \sum_{j=0}^{K} p_{k,j}\Big) f_*(x \mid s),$$

where $f_*$ and $f_j$ for $j = 0, 1, \dots, K$ are properly normalized density functions, which have disjoint supports and depend on the current position of the system, $S_{k-1} = s$. One can think of the mixture as a mechanism to control the magnitude of the increments based on evaluations of the current status of the system, and therefore it is a natural choice in order to induce the principle of big jump in the sampled paths. The two prevalent specifications are from Dupuis, Leder, and Wang (2006) and Blanchet and Liu (2011).
The former works for random walks with increments of regularly varying-type tails that satisfy Assumption 1, in which case a mixture of two components is used, i.e., $K = 0$. In particular,

$$h_k(x \mid s) = \Big[\, p_k\, \frac{I(x > a(b-s))}{\bar F(a(b-s))} + (1 - p_k)\, \frac{I(x \le a(b-s))}{F(a(b-s))} \Big] f(x),$$

where $a \in (0,1)$ is necessary for analytical reasons and is typically set to be close to 1. For increments that have distributions covered by Assumption 2, for example Weibull, estimators based on two mixtures might fail to achieve bounded relative error. As discussed in the previous section, this is because the contribution of the rogue paths (i.e., paths with multiple jumps of order $\Omega(b)$) to the relative variance of the estimator grows increasingly pronounced. Consider the following example.

Example 1. Suppose we are interested in estimating $P(X_1 + X_2 > b)$, where $X_1, X_2$ are i.i.d. Weibull with parameter $\beta \in (0,1)$, i.e., $P(X_i > t) = \bar F(t) = \exp(-t^{\beta})$. Note that $P(X_1 + X_2 > b) \sim P(X_1 > b) + P(X_2 > b)$ due to the properties of subexponential distributions. A two-mixture sampler leads to the following importance sampling strategy: sample the increments

$$(Y_1, Y_2) = \begin{cases} (X_1,\; X_2 \mid X_2 > b - X_1) & \text{w.p. } 1/2, \\ (X_1 \mid X_1 > b - X_2,\; X_2) & \text{w.p. } 1/2. \end{cases}$$

The corresponding IS estimator is therefore

$$\hat\mu_b = \frac{f_{X_1}(y_1)\, f_{X_2}(y_2)}{f_{Y_1,Y_2}(y_1,y_2)} = \frac{2\,\bar F(b - y_1)\,\bar F(b - y_2)}{\bar F(b - y_1) + \bar F(b - y_2)}\; I(y_1 + y_2 > b).$$

It is not hard to see that for some choices of $\beta < 1$, the relative error is unbounded as $b \to \infty$. The second moment of $\hat\mu_b$ under the sampling distribution equals the first moment of $\hat\mu_b$ under the original distribution, so that

$$\frac{E[\hat\mu_b^2]}{P(X_1 + X_2 > b)^2} = \frac{E^{P}[\hat\mu_b]}{P(X_1 + X_2 > b)^2}.$$

In particular, consider the path $(y_1, y_2) = (b/2, b/2)$. Its contribution to this ratio is of order

$$\frac{1}{P(X_1 + X_2 > b)^2}\, \frac{f_{X_1}(b/2)\, f_{X_2}(b/2)}{f_{Y_1,Y_2}(b/2, b/2)}\, f_{X_1}(b/2)\, f_{X_2}(b/2) = \frac{\bar F(b/2)^2\, f_{X_1}(b/2)^2}{P(X_1 + X_2 > b)^2\, \bar F(b/2)} \approx \exp\big(-3(b/2)^{\beta} + 2b^{\beta}\big),$$

up to factors polynomial in $b$, which grows rapidly as $b \to \infty$ if, e.g., $\beta = 2/3$: the exponent equals $b^{\beta}(2 - 3 \cdot 2^{-\beta})$, which is positive whenever $2^{\beta} > 3/2$.

As the previous example illustrates, more mixtures are needed for the increments covered by Assumption 2 to absorb the impact of such rogue paths on the second moment of the estimator. Following this observation, Blanchet and Liu (2011) proposed a multi-point mixture family, which is general enough to cover all the increment types that satisfy Assumption 1 and Assumption 2. The support of the mixture based densities is defined in terms of the hazard function of the increments, and the number of mixtures used depends on the tail heaviness of the increments, which is expressed in terms of the concavity of the hazard function of the increment distribution. More mixtures are needed when the tails are not as heavy as regularly varying, for example Weibull. More precisely, let $\Lambda(x) = -\log \bar F(x)$ be the integrated hazard function of the increments. Given $a_*, a_{**} > 0$, let

$$f_0(x \mid s) = \frac{f(x)\, I\big(x \le b - s - \Lambda^{-1}(\Lambda(b-s) - a_*)\big)}{P\big(X \le b - s - \Lambda^{-1}(\Lambda(b-s) - a_*)\big)},$$

and

$$f_*(x \mid s) = \frac{f(x)\, I\big(x > b - s - \Lambda^{-1}(\Lambda(b-s) - a_{**})\big)}{P\big(X > b - s - \Lambda^{-1}(\Lambda(b-s) - a_{**})\big)}.$$

The densities $f_j$ are defined by a set of cut-off points $c_j = a_j(b - s)$ for $j = 1, 2, \dots, K-1$, where $0 < a_1 < a_2 < \dots < a_{K-1} < 1$ is a sequence satisfying, for given $\beta_0 \in (0,1)$ and a positive constant $\sigma_1$,

$$a_j^{\beta_0} + (1 - a_{j+1})^{\beta_0} \ge 1 + \sigma_2 \quad \text{and} \quad a_{j+1} - a_j \le \sigma_1/2$$

for each $1 \le j \le K-2$ and some $\sigma_2 > 0$, together with $a_{K-1} \ge 1 - \sigma_1$ and $a_1 \le \sigma_1$.
Set $c_0 = b - s - \Lambda^{-1}(\Lambda(b-s) - a_*)$ and $c_K = b - s - \Lambda^{-1}(\Lambda(b-s) - a_{**})$. We define

$$f_j(x) = \begin{cases} \dfrac{f(x)\, I(x \in (c_{j-1}, c_j])}{P(X \in (c_{j-1}, c_j])}, & 1 \le j \le K-1, \\[2ex] \dfrac{f(b - s - x)\, I(x \in (c_{K-1}, c_K])}{P(X \in [\,b - s - c_K,\; b - s - c_{K-1}))}, & j = K, \end{cases}$$

for $j = 1, 2, \dots, K$. Note that the two specifications of the mixtures by Dupuis, Leder, and Wang (2006) and Blanchet and Liu (2011) have the same spirit when the increments are regularly varying (see equation (14) in Blanchet and Liu 2011). Blanchet and Liu (2011) also showed that this mixture based distribution converges in total variation to the zero-variance distribution in a certain random walk problem, as $b \to \infty$.

In what follows, we shall work with a more general form of the mixture, given as follows:

$$h_k(x; S_{k-1} = s) = \Big[ \sum_{j=0}^{K} p_{k,j}\, I(x \in A_j(s))\, w_j(s,x) + \Big(1 - \sum_{j=0}^{K} p_{k,j}\Big) I(x \in A_*(s))\, w_*(s,x) \Big] f(x),$$

where $A_*(s) = \big(\bigcup_{j=0}^{K} A_j(s)\big)^c$, and $w_j(s,x), w_*(s,x) > 0$ are normalized so that $E\,w_j(s,X) = E\,w_*(s,X) = 1$ on the corresponding sets. Note that the mixture family specified by Dupuis, Leder, and Wang (2006) corresponds to setting $K = 0$ and

$$w_0(s,x) = \frac{I(x > a(b-s))}{\bar F(a(b-s))},$$

with $w_*$ defined analogously on the complementary set $\{x \le a(b-s)\}$; and the one proposed by Blanchet and Liu (2011) corresponds to setting

$$w_j(s,x) = \frac{I(A_j(s))}{P(A_j(s))} = \frac{I(x \in (c_{j-1}, c_j])}{P(X \in (c_{j-1}, c_j])},$$

for $j = 0, 1, \dots, K-1$, with $c_{-1} = -\infty$ by a slight abuse of notation, and

$$w_K(s,x) = \frac{f(b - s - x)\, I(x \in (c_{K-1}, c_K])}{f(x)\, P(X \in [\,b - s - c_K,\; b - s - c_{K-1}))}.$$

If we write the joint density of the increments under the original measure as $f(x) = f(x_1) f(x_2) \cdots f(x_m)$, where $x = (x_1, \dots, x_m)$, we can express the joint importance sampling density for the mixture based SDIS as

$$h(x; p) = \prod_{k=1}^{m-1} \Big[ \sum_{j=0}^{K} p_{k,j}\, I(x_k \in A_j(s_{k-1}))\, w_j(s_{k-1}, x_k) + \Big(1 - \sum_{j=0}^{K} p_{k,j}\Big) I(x_k \in A_*(s_{k-1}))\, w_*(s_{k-1}, x_k) \Big] \times \Big[ \frac{I(s_{m-1} < b)}{P(X_m > b - s_{m-1})} + I(s_{m-1} \ge b) \Big] f(x),$$

on the set where $x_m > b - s_{m-1}$ whenever $s_{m-1} < b$.

3 STRONG EFFICIENCY OF THE FAMILY UNDER CONSIDERATION

The following Theorem highlights the main reason leading to the strong efficiency of the mixture family. The proof, which relates to the techniques studied in Dupuis, Leder, and Wang (2006) and Blanchet and Liu (2011), is given in Blanchet and Shi (2011).

Theorem 1. Let $P$ and $P^{p}$ be the original probability measure and the one induced by the mixture family with mixing probability vector $p$. If there exists an $\varepsilon > 0$ such that $p > \varepsilon \mathbf{1}$ for all $b > 0$, where $\mathbf{1}$ is a vector of ones of dimension $(m-1)(K+1)$, then

$$\frac{dP}{dP^{p}}\, \frac{I(S_m > b)}{P(S_m > b)} = O(1), \quad \text{as } b \to \infty.$$
The result enables us to comfortably switch to different choices of mixing probabilities within the same parametric family without violating the strong efficiency property of the final estimator, which lays the ground for the applicability of the CE method to be introduced shortly.
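To make the role of the mixing probabilities concrete, the following is a minimal sketch of the two-component sampler of Section 2.2 for a regularly varying increment, run with two different probability vectors. It is an illustration under stated assumptions rather than the authors' implementation: the tail $P(X > x) = (1+x)^{-1/2}$, the threshold fraction $a = 0.9$, and all function names (`tail`, `draw_above`, `draw_below`, `sdis_path`, `sdis_estimate`) are choices made here. Steps taken after the level $b$ has been crossed are treated as draws from the original density, which is also an assumption of the sketch.

```python
import math
import random

ALPHA = 0.5  # assumed tail index of the illustrative increment law

def tail(x):
    """P(X > x) = (1 + x)^(-ALPHA) for x > 0, and 1 otherwise."""
    return (1.0 + x) ** (-ALPHA) if x > 0.0 else 1.0

def draw_above(rng, c):
    """Draw X conditional on X > c (c >= 0) by inverse transform."""
    u = 1.0 - rng.random()                    # u in (0, 1]
    return (1.0 + c) * u ** (-1.0 / ALPHA) - 1.0

def draw_below(rng, c):
    """Draw X conditional on X <= c (c > 0) by inverse transform."""
    v = rng.random() * (1.0 - tail(c))        # v in [0, F(c))
    return (1.0 - v) ** (-1.0 / ALPHA) - 1.0

def sdis_path(rng, m, b, p, a=0.9):
    """One replication; returns the likelihood ratio f/h, which is the
    estimator because the last step always forces S_m > b."""
    s, lr = 0.0, 1.0
    for k in range(m - 1):
        if s >= b:
            # Level already crossed: remaining increments are non-negative
            # draws from f (ratio 1), so nothing further changes.
            break
        c = a * (b - s)
        if rng.random() < p[k]:               # "big jump" component
            x = draw_above(rng, c)
            lr *= tail(c) / p[k]
        else:                                 # "regular" component
            x = draw_below(rng, c)
            lr *= (1.0 - tail(c)) / (1.0 - p[k])
        s += x
    if s < b:
        lr *= tail(b - s)                     # last increment drawn given X_m > b - s
    return lr

def sdis_estimate(m, b, p, n, seed=0):
    rng = random.Random(seed)
    vals = [sdis_path(rng, m, b, p) for _ in range(n)]
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)
    return mean, math.sqrt(var / n)

m, b = 4, 1.0e6
est_a, se_a = sdis_estimate(m, b, [0.9 / m] * (m - 1), n=20_000)
est_b, se_b = sdis_estimate(m, b, [0.5] * (m - 1), n=20_000)
print(est_a, se_a, est_b, se_b)
```

Both probability vectors are bounded away from zero, so in the spirit of Theorem 1 both runs should give usable estimates of $P(S_m > b)$; what changes with the choice of $p$ is the size of the standard error, which is the quantity the CE step of the next section tries to reduce.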

4 CROSS ENTROPY METHOD AND THE ITERATIVE EQUATIONS FOR THE MIXTURE FAMILY

4.1 Review of Cross-Entropy Method

If we restrict our search for an importance sampler to this particular parametric class, the optimal choice of the vector $p$ can be obtained by minimizing the so-called Kullback-Leibler divergence, or cross-entropy distance.

Definition 1. The Kullback-Leibler cross-entropy between two densities $g$ and $h$ is given by

$$D(g,h) = \int g(x) \log\frac{g(x)}{h(x)}\,dx = \int g(x)\log g(x)\,dx - \int g(x)\log h(x)\,dx.$$

If we fix $g$ to be the optimal importance sampling density $g^*(x) \propto \varphi(S(x);b)\,f(x)$, where $\varphi(S(x);b)$ is the performance measure of the system (for example, $S(X) = \sum_{j=1}^{m} X_j$ and $\varphi(S(x);b) = I(S(x) > b)$), then our search for the optimal mixture is the output of the following parametric optimization problem:

$$\min_{p} D(g^*, h(\cdot;p)) \iff \max_{p} D(p) = \max_{p} E_f\big[\varphi(S(X);b)\log h(X;p)\big] = \max_{p} E_{\tilde p}\Big[\varphi(S(X);b)\,\frac{f(X)}{h(X;\tilde p)}\,\log h(X;p)\Big], \qquad (2)$$

where $E_f$ and $E_{\tilde p}$ denote expectation under the original density and under $h(\cdot;\tilde p)$, respectively, and $f(X)/h(X;\tilde p)$ is the likelihood ratio between the original measure and the measure induced by the mixture based density with some fixed parameter $\tilde p$. Recall that $X = (X_1, \dots, X_m)$. In particular,

$$\frac{f(X)}{h(X;\tilde p)} = \prod_{k=1}^{m-1} \Big[ \sum_{j=0}^{K} \frac{I(X_k \in A_j(S_{k-1}))}{\tilde p_{k,j}\, w_j(S_{k-1}, X_k)} + \frac{I(X_k \in A_*(S_{k-1}))}{\big(1 - \sum_{j=0}^{K} \tilde p_{k,j}\big)\, w_*(S_{k-1}, X_k)} \Big] \times \big[ I(S_{m-1} < b)\,P(X_m > b - S_{m-1}) + I(S_{m-1} \ge b) \big]. \qquad (3)$$

In most cases the expectation in (2) is analytically inaccessible. Rubinstein and Kroese (2004) suggested a recursive method based on the following stochastic counterpart of (2):

$$\max_{p} \hat D(p) = \max_{p} \frac{1}{N}\sum_{i=1}^{N} \varphi(S(X_i);b)\,\frac{f(X_i)}{h(X_i;\tilde p)}\,\log h(X_i;p). \qquad (4)$$

Cross Entropy (CE) Algorithm (Rubinstein and Kroese 2004)

1. Choose an initial vector of mixing probabilities $p^0$. Set $T = 1$.
2. Generate a random sample $X_1, \dots, X_N$ from the joint density $h(\cdot; p^{T-1})$.
3. Solve the stochastic optimization program (4). Denote the solution by $p^T$, i.e., $p^T = \arg\max_{p} \frac{1}{N}\sum_{i=1}^{N} \varphi(S(X_i);b)\,\frac{f(X_i)}{h(X_i; p^{T-1})}\,\log h(X_i;p)$.
4. Stop if convergence is reached; otherwise, set $T = T + 1$ and go to Step 2.

It is very convenient to embed the CE algorithm in the main SDIS algorithm to further reduce variance. Let $M$ be the total simulation budget, and $\tau$ be the number of recursions in the CE algorithm until convergence of $p$. If $\tau N < M$, then the SDIS with the CE algorithm add-on corresponds to generating $\tau$ batches of independent samples from the mixture based importance sampling density parameterized by $p^T$, for $T = 0, 1, \dots, \tau - 1$, and one batch of size $M - \tau N$ of independent samples from the importance density with the optimal CE probability vector. Depending on the size of $M - \tau N$, the final estimator can be obtained by averaging either the last batch of $M - \tau N$ samples, or the entire $M$ samples from the different batches. In either case we are able to achieve variance reduction while maintaining the strong efficiency property. Even for the case where $\tau N \ge M$, the improved cross-entropy after each iteration typically will reduce the variance of the future samples over those from previous iterations, since each iteration gives us a parameterized density closer to the zero-variance importance density.

4.2 Iterative Equations for the Mixture IS Family

We now proceed to characterize the solution to (4). In the case where we are interested in the tail probability of the sum $P(S_m > b)$, we have $\varphi(S(X);b) = I(S_m > b)$. Note that $\hat D$ is concave and differentiable with respect to the components of $p$; therefore the solution to (4) is directly given by the first order optimality condition:

$$\sum_{i=1}^{N} I(S_m(i) > b)\,\frac{f(X(i))}{h(X(i);\tilde p)}\,\nabla_{p} \log h(X(i), p) = 0. \qquad (5)$$

The product structure of the likelihood function is particularly useful because the sensitivity of the likelihood function to the mixing probabilities can be localized. Indeed, a few lines of elementary algebra give

$$\frac{d \log h(X, p)}{d p_{k,l}} = \frac{I(X_k \in A_l(S_{k-1}))\, w_l(S_{k-1}, X_k) - I(X_k \in A_*(S_{k-1}))\, w_*(S_{k-1}, X_k)}{\sum_{j=0}^{K} p_{k,j}\, I(X_k \in A_j(S_{k-1}))\, w_j(S_{k-1}, X_k) + \big(1 - \sum_{j=0}^{K} p_{k,j}\big)\, I(X_k \in A_*(S_{k-1}))\, w_*(S_{k-1}, X_k)} = \frac{I(X_k \in A_l(S_{k-1}))}{p_{k,l}} - \frac{I(X_k \in A_*(S_{k-1}))}{1 - \sum_{j=0}^{K} p_{k,j}}.$$

We denote

$$W_{X_{-l}}(i; \tilde p) = \prod_{k=1,\, k \ne l}^{m-1} \frac{f(X_k(i))}{h_k(X_k(i); \tilde p_k)} \times \big[ I(S_{m-1}(i) < b)\,P(X_m(i) > b - S_{m-1}(i)) + I(S_{m-1}(i) \ge b) \big],$$

where $p_k = \{p_{k,0}, \dots, p_{k,K}\}$ and $\tilde p_k = \{\tilde p_{k,0}, \dots, \tilde p_{k,K}\}$.
And further let

$$\Theta_{l,j} = \frac{\displaystyle\sum_{i=1}^{N} W_{X_{-l}}(i;\tilde p)\, \frac{I(X_l(i) \in A_j(S_{l-1}(i)))}{\tilde p_{l,j}\, w_j(S_{l-1}(i), X_l(i))}}{\displaystyle\sum_{i=1}^{N} W_{X_{-l}}(i;\tilde p)\, \frac{I(X_l(i) \in A_*(S_{l-1}(i)))}{\big(1 - \sum_{j'=0}^{K} \tilde p_{l,j'}\big)\, w_*(S_{l-1}(i), X_l(i))}}.$$

The first order optimality condition (5) therefore yields the following solution to the stochastic optimization problem (4), which we shall call the optimal CE mixing probability vector:

$$p_{l,j} = \frac{\Theta_{l,j}}{1 + \sum_{k=0}^{K} \Theta_{l,k}}, \qquad (6)$$

for $j = 0, 1, \dots, K$ and $l = 1, 2, \dots, m-1$. It does not take long to realize that the previous expression has the following equivalent form:

$$p_{l,j} = \frac{\sum_{i=1}^{N} I(S_m(i) > b)\, W(X(i); \tilde p)\, I(X_l(i) \in A_j(S_{l-1}(i)))}{\sum_{i=1}^{N} I(S_m(i) > b)\, W(X(i); \tilde p)}, \qquad (7)$$

for $j = 0, 1, \dots, K$ and $l = 1, 2, \dots, m-1$, where $W(\cdot\,; \tilde p) = f(\cdot)/h(\cdot\,; \tilde p)$ is given by (3). It is worth pointing out that (7) is computationally advantageous over (6), because it avoids dividing by zero in computing $\Theta_{l,j}$, especially when the number of pilot runs is small. Note that the sampling of the $m$th increment ensures $S_m(i) > b$. Moreover, the expression (7) entails a nice interpretation: the optimal mixing probability is the proportion of the contribution to the likelihood function from the $j$th band of the $l$th increment.

For completeness we also include the explicit iteration equations for the cases where the increments satisfy Assumption 1 and Assumption 2, respectively. We write, for ease of exposition,

$$W_m(i) = I(S_{m-1}(i) < b)\,P(X_m(i) > b - S_{m-1}(i)) + I(S_{m-1}(i) \ge b).$$

For regularly varying increments, the solution for the $T$th iteration of the recursive algorithm can be written as

$$p_k^{T} = \frac{\sum_{i=1}^{N} I\big(S_m(i) > b;\; X_k(i) > a(b - s_{k-1}(i))\big)\, \prod_{k'=1}^{m-1} R_{k'}^{T-1}(i)\; W_m(i)}{\sum_{i=1}^{N} I(S_m(i) > b)\, \prod_{k'=1}^{m-1} R_{k'}^{T-1}(i)\; W_m(i)},$$

where

$$R_{k}^{T-1}(i) = \frac{P(X > a(b - s_{k-1}))}{p_k^{T-1}}\, I(X_k > a(b - s_{k-1})) + \frac{P(X \le a(b - s_{k-1}))}{1 - p_k^{T-1}}\, I(X_k \le a(b - s_{k-1}))$$

is evaluated along the $i$th sample path.

For increment distributions that satisfy Assumption 2, the likelihood function becomes

$$W(X^{T-1}; p^{T-1}) = W_m \prod_{k=1}^{m-1} \frac{f(X_k^{T-1})}{h_k(X_k^{T-1}, p^{T-1})},$$

with

$$\frac{f(x)}{h_k(x, p^{T-1})} = \frac{P(X \le c_0)}{p_{k,0}^{T-1}}\, I(x \le c_0) + \frac{P(X > c_K)}{1 - \sum_{j=0}^{K} p_{k,j}^{T-1}}\, I(x > c_K) + \sum_{j=1}^{K-1} \frac{P(X \in (c_{j-1}, c_j])}{p_{k,j}^{T-1}}\, I(x \in (c_{j-1}, c_j]) + \frac{f(x)\, P(X \in [\,b - s - c_K,\; b - s - c_{K-1}))}{p_{k,K}^{T-1}\, f(b - s - x)}\, I(x \in (c_{K-1}, c_K]),$$

where the $c_j$ are the cut-off points of the bands, evaluated at the current position $s = s_{k-1}^{T-1}$ of the path generated in iteration $T-1$, and we have explicitly written out the iteration count. Note that at the beginning of iteration $T$, the only part that depends on the unknown parameters in the stochastic program (4) is $\log h(X(i), p^{T})$, and hence $\nabla \log h(X(i), p^{T})$ in the optimality condition (5); the likelihood $W(\cdot\,; p^{T-1})$ is a function of the probability vector passed from the $(T-1)$st iteration as well as the samples generated from the IS density specified by that probability vector. In that regard, at the beginning of the $T$th iteration, all the ingredients in the expression above are available.
The iteration equation for the probability vector at iteration $T$ is therefore given by

$$p_{k,j}^{T} = \frac{\sum_{i=1}^{N} I\big(S_m^{T-1}(i) > b\big)\, W\big(X^{T-1}(i); p^{T-1}\big)\, I\big(x_k^{T-1}(i) \in (c_{j-1}, c_j]\big)}{\sum_{i=1}^{N} I\big(S_m^{T-1}(i) > b\big)\, W\big(X^{T-1}(i); p^{T-1}\big)},$$

where $c_{-1} = -\infty$ with a slight abuse of notation. Note that the iterative equations given so far reveal the ease of implementation of the CE subroutine: one only needs to keep $K + 2$ buckets, indicating whether the $k$th increment falls into the $j$th band, $j = 1, 2, \dots, K + 2$, and aggregate the likelihood function for each bucket. The computational cost is of the same order as a vanilla SDIS iteration without the CE routine.
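The bucket bookkeeping described above can be sketched for the simplest case, the two-component mixture with regularly varying increments. This is a hedged illustration, not the authors' code: the tail $P(X > x) = (1+x)^{-1/2}$, the constant $a = 0.9$, the pilot size, the number of iterations, and the names `pilot_path`, `ce_update` and `ce_iterate` are all assumptions made here. Steps taken after the level has been crossed are assumed to fall in no band, so they add nothing to the numerators.

```python
import random

ALPHA, A = 0.5, 0.9   # assumed tail index and threshold fraction

def tail(x):
    """P(X > x) = (1 + x)^(-ALPHA) for x > 0, and 1 otherwise."""
    return (1.0 + x) ** (-ALPHA) if x > 0.0 else 1.0

def pilot_path(rng, m, b, p):
    """Simulate one path under the mixture with probabilities p.
    Returns (W, jumps): the likelihood ratio f/h and the list of steps
    whose increment was drawn from the big-jump band."""
    s, w, jumps = 0.0, 1.0, []
    for k in range(m - 1):
        if s >= b:
            break                      # later steps are plain draws from f
        c = A * (b - s)
        if rng.random() < p[k]:
            u = 1.0 - rng.random()     # u in (0, 1]
            x = (1.0 + c) * u ** (-1.0 / ALPHA) - 1.0
            w *= tail(c) / p[k]
            jumps.append(k)
        else:
            v = rng.random() * (1.0 - tail(c))
            x = (1.0 - v) ** (-1.0 / ALPHA) - 1.0
            w *= (1.0 - tail(c)) / (1.0 - p[k])
        s += x
    if s < b:
        w *= tail(b - s)               # the factor W_m for the forced last step
    return w, jumps

def ce_update(rng, m, b, p, n):
    """One CE step: weighted share of each step's big-jump bucket."""
    num = [0.0] * (m - 1)              # one bucket per step k
    den = 0.0
    for _ in range(n):
        w, jumps = pilot_path(rng, m, b, p)
        den += w                       # every path ends with S_m > b
        for k in jumps:
            num[k] += w
    return [x / den for x in num]

def ce_iterate(m, b, p0, n=5000, iters=3, seed=0):
    rng = random.Random(seed)
    p = list(p0)
    for _ in range(iters):
        p = ce_update(rng, m, b, p, n)
    return p

p_ce = ce_iterate(4, 1.0e6, [0.9 / 4] * 3)
print(p_ce)
```

Each update needs only one pass over the pilot sample and one accumulator per step, which is the sense in which the CE routine costs about as much as a plain SDIS iteration. A production version would also guard against an updated probability hitting 0 or 1, which this sketch does not do.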

Remark 1. One might consider further guiding the parametric family of samplers using large deviations ideas. For example, in the regularly varying case, one can force the probabilities to have the following structure:

$$p_k = \frac{m - k + 1}{m - k}\, p_{k-1}, \quad \text{for } k = 2, \dots, m-1,$$

which is equivalent to

$$p_k = \frac{m - 1}{m - k}\, p_1, \quad \text{for } k = 1, 2, \dots, m-1.$$

This choice reflects the intuition that the chance for the $k$th increment to be a large one is roughly proportional to the inverse of the remaining steps to go. Note that this particular structure is very close to the optimal mixture found by Dupuis, Leder, and Wang (2006) using a dynamic programming argument. However, due to the global dependence on the first probability parameter $p_1$, it is not difficult to see that the CE iteration equations will involve a root finding procedure, which could increase the computational cost significantly.

5 NUMERICAL EXAMPLES

5.1 Example 1: Regularly Varying Increments

We illustrate the empirical performance of the SDIS with CE routine (SDIS-CE) by considering two examples. In the first example, the increments are regularly varying with index $\alpha = 1/2$; in particular, the $X_n$'s have tail distribution $P(X_i > b) = (1 + b)^{-1/2}$. Following Dupuis, Leder, and Wang (2006), given the parameters of the model, a given number of increments $m$ and a tail parameter $b$, we estimate $P(S_m > b)$ and the standard deviation of the estimator as follows. We simulate 20000 replications of our estimator. The estimates are obtained based on averages of the replications. This is the output of a single run. Then we produce 500 independent runs. The results displayed are the averages of the outputs of these runs.

We run the experiments with two different sets of input mixing probabilities. In the first case, which we shall later refer to as the standard choice, we consider the heuristic choice $p_k = \theta/m$ where $\theta = 0.9$. For the second set of inputs we use the optimal choice of the probabilities obtained by Dupuis, Leder, and Wang (2006), i.e.,

$$p_k = \frac{a^{-\alpha/2}}{(m - k)\, a^{-\alpha/2} + 1},$$

which we call the DLW selection. In both cases we select $a = 0.9$.
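As a small worked illustration of the two input choices, the sketch below evaluates them for $m = 4$, $\alpha = 1/2$ and $a = 0.9$. It assumes that the DLW selection has the form $p_k = a^{-\alpha/2} / ((m-k)\,a^{-\alpha/2} + 1)$, which is how the expression is read here; the helper names `standard_choice`, `dlw_choice` and `first_jump_probs` are hypothetical. The last helper converts step-wise mixing probabilities into the probability that step $k$ is the first one to draw from the big-jump component, assuming that every earlier step drew from the regular component.

```python
def standard_choice(m, theta=0.9):
    """Heuristic input: p_k = theta / m for k = 1, ..., m-1."""
    return [theta / m for _ in range(1, m)]

def dlw_choice(m, alpha, a):
    """Assumed DLW form: p_k = r / ((m - k) r + 1), with r = a^(-alpha/2)."""
    r = a ** (-alpha / 2.0)
    return [r / ((m - k) * r + 1.0) for k in range(1, m)]

def first_jump_probs(p):
    """q_k = p_k * prod_{i<k} (1 - p_i); the final entry is the chance
    that none of the first m-1 steps uses the big-jump component."""
    out, stay = [], 1.0
    for pk in p:
        out.append(stay * pk)
        stay *= 1.0 - pk
    out.append(stay)
    return out

m, alpha, a = 4, 0.5, 0.9
p_std = standard_choice(m)
p_dlw = dlw_choice(m, alpha, a)
q_std = first_jump_probs(p_std)
q_dlw = first_jump_probs(p_dlw)
print(p_std, p_dlw)
print(q_std, q_dlw)
```

Under the assumed form, the DLW selection spreads the first big jump evenly over steps $1, \dots, m-1$, whereas the standard choice front-loads it and leaves a larger share for the forced final step.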
The results of the experiment are reported in Table 1 and Table 2. From the results of Table 1 we observe that even for a reasonable choice of mixing probabilities based on large deviations intuition, the CE algorithm produces a smaller relative error. On the other hand, it is outperformed by the optimal choice of the probabilities obtained in Dupuis, Leder, and Wang (2006), as can be seen in Table 2. One should keep in mind, however, that in many applications the structure of the problem does not allow for such analytical solutions easily. We also point out that the optimal solution from Dupuis, Leder, and Wang (2006) hinges on the assumption that $b$ is sufficiently large for large deviations asymptotics to be valid. For a smaller exceedance level $b$, we might expect a better performance using the CE routine, which is underpinned by the results shown in Table 3.

We have mentioned in the previous section that since the recursive CE algorithm is carried out on the pilot sample, it neglects the fact that the increments are simulated in a sequential manner, but rather treats them in an independent way. We averaged the output CE optimal probability vector over the experiments; the nearly identical mixing probabilities in Table 4 are in line with the expected behavior of the method, namely that each increment has probability roughly 1/4 of causing the rare event.

Table 1: Performance of the SDIS-CE estimator compared to the SDIS algorithm without the CE procedure, where the input mixing probabilities are set to $p_k = 0.9/m$ for $k = 1, 2, \dots, m-1$.

m    b        Quantity              Standard     CE Method
4    1e+06    Average Estimate      3.999E-03    4.000E-03
              Average Std. Error    3.148E-05    1.395E-05
              Avg.SE/Avg.Est (%)    0.787%       0.349%
4    1e+12    Average Estimate      3.999E-06    4.000E-06
              Average Std. Error    3.151E-08    1.403E-08
              Avg.SE/Avg.Est (%)    0.788%       0.351%
4    1e+18    Average Estimate      4.000E-09    4.000E-09
              Average Std. Error    3.153E-11    1.393E-11
              Avg.SE/Avg.Est (%)    0.788%       0.348%
25   1e+06    Average Estimate      2.503E-02    2.498E-02
              Average Std. Error    1.525E-03    3.404E-04
              Avg.SE/Avg.Est (%)    6.094%       1.363%
25   1e+12    Average Estimate      2.496E-05    2.499E-05
              Average Std. Error    1.518E-06    3.458E-07
              Avg.SE/Avg.Est (%)    6.082%       1.384%
25   1e+18    Average Estimate      2.496E-08    2.502E-08
              Average Std. Error    1.524E-09    3.409E-10
              Avg.SE/Avg.Est (%)    6.103%       1.363%

Table 2: Performance of the SDIS-CE estimator compared to the SDIS without the CE procedure, where the input mixing probabilities are set to the optimal choice obtained in Dupuis, Leder, and Wang (2006).

m    b        Quantity              DLW          CE Method
4    1e+06    Average Estimate      4.000E-03    4.000E-03
              Average Std. Error    5.660E-06    1.374E-05
              Avg.SE/Avg.Est (%)    0.141%       0.344%
4    1e+12    Average Estimate      4.000E-06    4.000E-06
              Average Std. Error    5.683E-09    1.382E-08
              Avg.SE/Avg.Est (%)    0.142%       0.346%
4    1e+18    Average Estimate      4.000E-09    4.001E-09
              Average Std. Error    5.691E-12    1.373E-11
              Avg.SE/Avg.Est (%)    0.142%       0.343%
25   1e+06    Average Estimate      2.499E-02    2.500E-02
              Average Std. Error    3.925E-05    1.555E-04
              Avg.SE/Avg.Est (%)    0.157%       0.622%
25   1e+12    Average Estimate      2.500E-05    2.500E-05
              Average Std. Error    4.032E-08    1.567E-07
              Avg.SE/Avg.Est (%)    0.161%       0.627%
25   1e+18    Average Estimate      2.500E-08    2.500E-08
              Average Std. Error    4.027E-11    1.568E-10
              Avg.SE/Avg.Est (%)    0.161%       0.627%

Table 3: Comparison of performance between (1) SDIS using CE optimal mixing probabilities and (2) analytical optimal mixing probabilities from Dupuis, Leder and Wang (2006), m = 2.

b     Quantity              DLW          CE Method
5     Average Estimate      6.999E-01    6.999E-01
      Average Std. Error    1.110E-03    5.742E-04
      Avg.SE/Avg.Est %      0.159%       0.082%
20    Average Estimate      4.166E-01    4.166E-01
      Average Std. Error    4.727E-04    4.410E-04
      Avg.SE/Avg.Est %      0.113%       0.106%

Table 4: Average optimal CE mixing probabilities, m = 4, $b = 10^6$.

$i$      1        2        3
$p_i$    0.248    0.253    0.251

5.2 Example 2: Weibull Increments

We now proceed to the second example, where the increments are assumed to have the following Weibull-type distribution: $P(X > t) = e^{-2\sqrt{t+1}}$ for $t \ge -1$. This corresponds to the case considered by Blanchet and Liu (2011), where the authors use a 5-point mixture specified by the cut-off points $c_0 = 0.1\sqrt{b-s}$, $c_1 = 0.1(b-s)$, $c_2 = 0.5(b-s)$, $c_3 = 0.9(b-s)$ and $c_4 = (b-s) - 0.1\sqrt{b-s}$. Since the number of cut-off points increases relative to the previous mixture sampler, we increase the pilot sample size to 5000; all the other algorithmic parameters (number of runs and number of replications per run) remain the same. The results of the experiments are summarized in Table 5.

Table 5: Performance of the SDIS-CE estimator compared to SDIS without CE procedure in the case of Weibull-type increments, m = 4. We used $p_{i,j} = 1/((K+2)m)$, for $j = 0, 1, \ldots, K$ and $i = 1, 2, \ldots, m-1$, as the standard choice of the mixing probabilities.

b      Quantity               Standard     CE Method
150    Avg. Est.              7.977E-11    7.966E-11
       Avg. Std. Err.         2.580E-12    7.642E-13
       Avg. SE/Avg. Est. %    3.235%       0.959%
450    Avg. Est.              1.371E-18    1.372E-18
       Avg. Std. Err.         4.835E-20    1.071E-20
       Avg. SE/Avg. Est. %    3.526%       0.781%
750    Avg. Est.              6.086E-24    6.069E-24
       Avg. Std. Err.         2.209E-25    3.185E-26
       Avg. SE/Avg. Est. %    3.630%       0.525%

REFERENCES

Asmussen, S., and P. Glynn. 2008. Stochastic Simulation: Algorithms and Analysis. New York, NY, USA: Springer-Verlag.
Binswanger, S. A. K., and B. Hojgaard. 1997. Rare events simulation for heavy-tailed distributions. Bernoulli 6:303-322.

Blanchet, J., J. C. Chan, and D. Kroese. 2010. Asymptotics and fast simulation for tail probabilities of the maximum and minimum of sums of lognormals. Working paper.
Blanchet, J., and J. Liu. 2011. Efficient Simulation and Conditional Functional Limit Theorems for Ruinous Heavy-tailed Random Walks. Forthcoming.
Blanchet, J., and Y. Shi. 2011. Efficient Rare Event Simulation for Heavy-tailed Systems via Cross Entropy. Working paper.
Chan, J. C. C., P. W. Glynn, and D. P. Kroese. 2011. A Comparison of Cross-Entropy and Variance Minimization Strategies. Journal of Applied Probability 48.
Dupuis, P., K. Leder, and H. Wang. 2006. Importance sampling for sums of random variables with regularly varying tails. ACM TOMACS 17:Article 14.
Kroese, D. P., R. Y. Rubinstein, and P. W. Glynn. 2010. The Cross-Entropy Method for Estimation. In Handbook of Statistics, edited by V. Govindaraju and C. R. Rao, Volume 31. Elsevier.
Rubinstein, R. Y., and D. P. Kroese. 2004. The Cross-Entropy Method. New York, NY: Springer.

AUTHOR BIOGRAPHIES

JOSE BLANCHET is a faculty member of the IEOR Department at Columbia University. Jose holds a Ph.D. in Management Science and Engineering from Stanford University. Prior to joining Columbia he was a faculty member in the Statistics Department at Harvard University. Jose is a recipient of the 2009 Best Publication Award given by the INFORMS Applied Probability Society and of the 2010 Erlang Prize. He also received a PECASE award given by NSF in 2010. He worked as an analyst at Protego Financial Advisors, a leading investment bank in Mexico. He has research interests in applied probability and Monte Carlo methods. He serves on the editorial boards of Advances in Applied Probability, Journal of Applied Probability, Mathematics of Operations Research and QUESTA.

YIXI SHI is a PhD candidate in the Department of Industrial Engineering and Operations Research, School of Engineering and Applied Science at Columbia University. He holds a B.Sc. in Actuarial Science from the University of Hong Kong.
His email address is ys2347@columbia.edu.