Uniformly Efficient Importance Sampling for the Tail Distribution of Sums of Random Variables

Uniformly Efficient Importance Sampling for the Tail Distribution of Sums of Random Variables

P. Glasserman (Columbia University) and S.K. Juneja (Tata Institute of Fundamental Research)

February 2006

Abstract

Successful efficient rare event simulation typically involves using importance sampling tailored to a specific rare event. However, in applications one may be interested in simultaneous estimation of many probabilities or even an entire distribution. In this paper, we address this issue in a simple but fundamental setting. Specifically, we consider the problem of efficient estimation of the probabilities $P(S_n \ge na)$ for large $n$, for all $a$ lying in an interval $A$, where $S_n$ denotes the sum of $n$ independent, identically distributed light-tailed random variables. Importance sampling based on exponential twisting is known to produce asymptotically efficient estimates when $A$ contains a single point. We show, however, that this procedure fails to be asymptotically efficient throughout $A$ when $A$ contains more than one point. We analyze the best performance that can be achieved using a discrete mixture of exponentially twisted distributions and then present a method using a continuous mixture. We show that a continuous mixture of exponentially twisted probabilities and a discrete mixture with a sufficiently large number of components produce asymptotically efficient estimates for all $a \in A$ simultaneously.

1 Introduction

The use of importance sampling for efficient rare event simulation has been studied extensively (see, e.g., Bucklew [3], Heidelberger [7], and Juneja and Shahabuddin [8] for surveys). Application areas for rare event simulation include communications networks where information loss and large delays are important rare events of interest (as in Chang et al.
[4]), insurance risk where the probability of ruin is a critical performance measure (e.g., Asmussen [1]), credit risk models in finance where large losses are a primary concern (e.g., Glasserman and Li [6]), and reliability systems where system failure is a rare event of utmost importance (e.g., Shahabuddin [11]). Successful applications of importance sampling for rare event simulation typically focus on the probability of a single rare event. As a way of demonstrating the effectiveness of an importance sampling technique, the probability of interest is often embedded in a sequence of probabilities decreasing to zero. The importance sampling technique is said to be asymptotically efficient or asymptotically optimal if the second moment of the associated estimator decreases at the fastest possible rate as the sequence of probabilities approaches zero.

Importance sampling based on exponential twisting produces asymptotically efficient estimates of rare event probabilities in a wide range of problems. As in the setting we consider here, asymptotic efficiency typically requires that the twisting parameter be tailored to a specific event. But in applications, one is often interested in estimating many probabilities or an entire distribution; and estimating a distribution is often a step toward estimating functions of the distribution, such as quantiles or tail conditional expectations. These problems motivate our investigation. We consider the simple but fundamental setting of tail probabilities associated with a random walk. Exponential twisting produces asymptotically efficient estimates for a single point in the tail; we show, however, that the standard approach fails to produce asymptotically efficient estimates for multiple points simultaneously. We develop and analyze modifications that rectify this. The relatively simple context we consider here provides a convenient setting in which to identify problems and solutions that may apply more generally.

Specifically, we consider the random walk $S_n = \sum_{i=1}^n X_i$, where $(X_i : i \ge 1)$ are independent, identically distributed (iid) random variables with mean $\mu$. We focus on the problem of simultaneous efficient estimation by simulation of the probabilities $P(S_n/n \ge a)$ for large $n$, for all $a > \mu$ lying in an interval $A$. Certain regularity conditions are imposed on this interval. We refer to this problem as that of efficient estimation of the tail distribution curve. A standard technique for estimating $P(S_n/n \ge a)$, $a > \mu$, uses importance sampling by applying an exponential twist to the $X_i$. It is well-known that if the twisting parameter is correctly tailored to $a$, this method produces asymptotically efficient estimates. (See, e.g., Sadowsky and Bucklew [10].)
However, we show in Section 2.3 that the exponentially twisted distribution that achieves asymptotic efficiency in estimating $P(S_n/n \ge a_1)$ fails to be asymptotically efficient in estimating $P(S_n/n \ge a_2)$ for $a_2 \ne a_1$. In particular, it incurs an exponentially increasing computational penalty as $n \to \infty$. This motivates the need for the alternative procedures that we develop in this paper. Within the class of exponentially twisted distributions, we first identify the one that estimates the tail distribution curve with asymptotically minimal computational overhead. We then extend this analysis to find the best mixture of $k < \infty$ exponentially twisted distributions. However, even with this distribution asymptotic efficiency is not achieved. We note that this shortcoming may be remedied if $k$ is selected to be an increasing function of $n$. We further propose an importance sampling distribution that is a continuous mixture of exponentially twisted distributions. We show that such a mixture also estimates the tail probability distribution curve asymptotically efficiently for all $a \in A$ simultaneously. Furthermore, we identify the mixing probability density function that ensures that all points along the curve are estimated with roughly equal precision.

The rest of this paper is organized as follows: In Section 2 we review the basics of importance sampling and introduce some notions of efficiency relevant to our analysis. In Section 3, we discuss the performance of importance sampling in simultaneously estimating the tail probability distribution curve using discrete mixtures of appropriate exponentially twisted distributions. The analysis is then extended to continuous mixtures in Section 4. Concluding remarks are given in Section 5.

2 Naive Estimation and Importance Sampling

2.1 Naive Estimation of the Tail Distribution Curve

Under naive simulation, $m$ iid samples $((X_{i1}, X_{i2}, \ldots, X_{in}) : 1 \le i \le m)$ of $(X_1, X_2, \ldots, X_n)$ are generated using the original probability measure $P$. Let $S_n^i = \sum_{j=1}^n X_{ij}$. Then, for each $a \in A$,
\[ \frac{1}{m} \sum_{i=1}^{m} I(S_n^i \ge na) \]
provides an unbiased estimator of $P(S_n/n \ge a)$, where $I(\cdot)$ denotes the indicator function of the event in parentheses.

2.2 Importance Sampling

We restrict attention to probability measures $P^*$ for which $P$ is absolutely continuous with respect to $P^*$ when both measures are restricted to the $\sigma$-algebra generated by $(X_1, X_2, \ldots, X_n)$. Let $L$ denote the likelihood ratio of the restriction of $P$ to the restriction of $P^*$. Then,
\[ P(S_n/n \ge a) = E_{P^*}[L\, I(S_n/n \ge a)] \quad (1) \]
for each $a \in A$, where the subscript affixed to $E$ denotes the probability measure used in determining the expectation. If, under $P$ and $P^*$, $(X_1, X_2, \ldots, X_n)$ have joint densities $f$ and $f^*$, respectively, then
\[ L = \frac{f(X_1, X_2, \ldots, X_n)}{f^*(X_1, X_2, \ldots, X_n)}. \]
Clearly, $L$ depends on $n$, though we do not make this dependence explicit in our notation. Under importance sampling with probability $P^*$, $m$ iid samples $((X_{i1}, X_{i2}, \ldots, X_{in}) : 1 \le i \le m)$ of $(X_1, X_2, \ldots, X_n)$ are generated using $P^*$. Let $(L_i : 1 \le i \le m)$ denote the associated samples of $L$. For each $a \in A$, we compute the estimate
\[ \frac{1}{m} \sum_{i=1}^{m} L_i\, I(S_n^i \ge na). \quad (2) \]
In light of (1), (2) provides an unbiased estimator of $P(S_n/n \ge a)$.

The probabilities $P(S_n/n \ge a)$ decrease exponentially in $n$ if the $X_i$ are light-tailed. More precisely, if the moment generating function of the $X_i$ is finite in a neighborhood of the origin, then (as in, e.g., [5], p. 34)
\[ \lim_n \frac{1}{n} \log P(S_n/n \ge a) = -\Lambda^*(a), \quad (3) \]
where $\Lambda^*$ denotes the large deviations rate function associated with the sequence $(S_n/n : n \ge 1)$, to be defined in Section 2.3, and $a$ belongs to the interior of $\{x : \Lambda^*(x) < \infty\}$. Also note that for any $P^*$,
\[ E_{P^*}[L^2 I(S_n/n \ge a)] \ge (E_{P^*}[L\, I(S_n/n \ge a)])^2 = P(S_n/n \ge a)^2 \]
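To make the setup of Section 2.1 concrete, here is a minimal sketch of the naive estimator (not from the paper). It assumes standard Gaussian $X_i$ purely for illustration; the function name and parameters are our own.

```python
import numpy as np

def naive_tail_estimates(n, m, a_grid, rng=None):
    """Naive Monte Carlo estimate of P(S_n/n >= a) for each a in a_grid.

    Illustrative sketch: X_i are taken standard Gaussian; any sampler
    for the X_i could be substituted.
    """
    rng = np.random.default_rng(rng)
    # m iid copies of the sample mean S_n / n
    means = rng.standard_normal((m, n)).mean(axis=1)
    # one indicator average per threshold a (the tail distribution curve)
    return np.array([(means >= a).mean() for a in a_grid])

estimates = naive_tail_estimates(n=50, m=10000, a_grid=[0.1, 0.2, 0.3])
```

Because the events $\{S_n/n \ge a\}$ are nested in $a$, the estimated curve is automatically non-increasing; the difficulty the paper addresses is that for large $n$ the naive indicator average requires a sample size growing exponentially in $n$ to resolve the tiny probabilities.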

and hence
\[ \liminf_n \frac{1}{n} \log E_{P^*}[L^2 I(S_n/n \ge a)] \ge -2\Lambda^*(a). \quad (4) \]
An importance sampling measure $P^*$ is said to be asymptotically efficient for $P(S_n/n \ge a)$ if equality holds in (4) with the liminf replaced by an ordinary limit. Furthermore, we say that the relative variance of the estimate (under a probability measure $P^*$) grows polynomially at rate $p > 0$ if
\[ 0 < \liminf_n \frac{1}{n^p} \frac{E_{P^*}[L^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} \le \limsup_n \frac{1}{n^p} \frac{E_{P^*}[L^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} < \infty. \]
It is easily seen that if this holds for some $p > 0$, then $P^*$ is asymptotically efficient for $P(S_n/n \ge a)$. The relative variance of the estimate grows exponentially if the ratio
\[ \frac{E_{P^*}[L^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} \]
grows at least at an exponential rate with $n$. We say that the relative variance of the estimate of the tail distribution curve over $A$ grows polynomially at rate $p > 0$ if
\[ 0 < \sup_{a \in A} \liminf_n \frac{1}{n^p} \frac{E_{P^*}[L^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} \le \sup_{a \in A} \limsup_n \frac{1}{n^p} \frac{E_{P^*}[L^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} < \infty. \]
It is said to grow at most at the polynomial rate $p > 0$ if the last inequality holds.

2.3 Asymptotically Efficient Exponentially Twisted Distribution

We first review the asymptotically efficient exponentially twisted importance sampling distribution for estimating a single probability $P(S_n/n \ge a)$, $a > \mu$. We also show that under this measure the relative variance of the estimate grows polynomially at rate $1/2$. Some notation and assumptions are needed for this. Let $X_i$ have distribution function $F$ and log-moment generating function
\[ \Lambda(\theta) = \log \int \exp(\theta x)\, dF(x) \]
under the original probability measure $P$. We assume that the domain $\Theta = \{\theta : \Lambda(\theta) < \infty\}$ includes zero in its interior $\Theta^o$. Further, we assume that $A \subset H^o$, where $H = \{\Lambda'(\theta) : \theta \in \Theta\}$. Let $F_\theta$ denote the distribution function obtained by exponentially twisting $F$ by $\theta \in \Theta$, i.e.,
\[ dF_\theta(x) = \exp(\theta x - \Lambda(\theta))\, dF(x). \]

Let $P_\theta$ denote the probability measure under which $(X_i : i \ge 1)$ are iid with distribution $F_\theta$. The restrictions of $P$ and $P_\theta$ to the $\sigma$-algebra generated by $X_1, \ldots, X_n$ are mutually absolutely continuous. Let $L_\theta$ denote the Radon-Nikodym derivative of the restriction of $P$ to the restriction of $P_\theta$. Then,
\[ L_\theta = \exp(-\theta S_n + n\Lambda(\theta)) \quad \text{a.s.} \quad (5) \]
We recall some definitions associated with the large deviations rate of $S_n$; see Dembo and Zeitouni [5] for background. The rate function for $(S_n/n : n \ge 1)$ is defined by
\[ \Lambda^*(a) = \sup_\theta (\theta a - \Lambda(\theta)). \]
For each $a \in H^o$, there is, by definition, a $\theta(a) \in \Theta$ for which $\Lambda'(\theta(a)) = a$. It is easy to see that
\[ \theta(a)a - \Lambda(\theta(a)) = \sup_\theta (\theta a - \Lambda(\theta)) = \Lambda^*(a). \quad (6) \]
Moreover, through differentiation, it can be seen that $\Lambda^{*\prime}(a) = \theta(a)$. By differentiating the equation $\Lambda'(\theta(a)) = a$, it follows that $\theta'(a) = 1/\Lambda''(\theta(a)) = \Lambda^{*\prime\prime}(a)$; as in Section 2.2 of Dembo and Zeitouni [5], we have $\Lambda''(\theta) > 0$ when $\Lambda'(\theta) \in H^o$. (Furthermore, they note that $\Lambda(\theta)$ is $C^\infty$ in $\Theta^o$ and $\Lambda^*(x)$ is $C^\infty$ in $H^o$.)

It is well known (as in Sadowsky and Bucklew [10]) that $P_{\theta(a)}$ asymptotically efficiently estimates $P(S_n/n \ge a)$, i.e.,
\[ \lim_n \frac{\log E_{P_{\theta(a)}}[L_{\theta(a)}^2 I(S_n/n \ge a)]}{\log P(S_n/n \ge a)} = 2. \quad (7) \]
Theorem 1 sharpens this result and also shows what happens when we use a twisting parameter $\theta$ different from $\theta(a)$ in estimating $P(S_n/n \ge a)$. The theorem applies the Bahadur-Rao [2] approximation
\[ P(S_n/n \ge a) \sim \frac{\exp[-n\Lambda^*(a)]}{\sqrt{2\pi\Lambda''(\theta(a))n}\,\theta(a)}, \quad (8) \]
which is valid when the $X_i$ have a non-lattice distribution. In this paper we restrict our discussion to $X_i$ having a non-lattice distribution.

Theorem 1 For $a > \mu$ and $t$ in $H^o$,
\[ E_{P_{\theta(t)}}[L_{\theta(t)}^2 I(S_n/n \ge a)] \sim \frac{\exp[n(\theta(t)(t-a) - \Lambda^*(t) - \Lambda^*(a))]}{\sqrt{2\pi\Lambda''(\theta(a))n}\,(\theta(t) + \theta(a))}. \quad (9) \]
Thus, with $t = a$ the relative variance grows polynomially at rate $1/2$, but with $t \ne a$ the relative variance grows exponentially and $P_{\theta(t)}$ fails to be asymptotically efficient for $P(S_n/n \ge a)$.
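As an illustration of the single-point twist reviewed above (a sketch of ours, not the authors' code), consider standard Gaussian $X_i$, for which $\Lambda(\theta) = \theta^2/2$, $\theta(a) = a$, and the twisted distribution $F_{\theta(a)}$ is simply $N(a, 1)$:

```python
import numpy as np

def twisted_tail_estimate(n, m, a, rng=None):
    """Importance sampling estimate of P(S_n/n >= a) via the twist theta(a).

    Sketch for standard Gaussian X_i: Lambda(theta) = theta^2/2, so
    theta(a) = a and sampling under P_theta(a) means X_i ~ N(a, 1).
    """
    rng = np.random.default_rng(rng)
    theta = a                                  # theta(a) = Lambda*'(a)
    x = rng.normal(loc=a, scale=1.0, size=(m, n))   # draw under P_theta(a)
    s = x.sum(axis=1)
    # likelihood ratio (5): L_theta = exp(-theta*S_n + n*Lambda(theta))
    lr = np.exp(-theta * s + n * theta**2 / 2)
    return (lr * (s >= n * a)).mean()

p_hat = twisted_tail_estimate(n=100, m=5000, a=0.5)
```

For $n = 100$ and $a = 0.5$ the target probability is $P(Z \ge 5) \approx 2.9 \times 10^{-7}$, far below what naive simulation with $m = 5000$ could resolve, while the twisted estimator recovers it with modest relative error, consistent with the rate-$1/2$ growth in Theorem 1.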

Proof of Theorem 1: Once we establish (9), the polynomial rate of growth of the relative variance when $t = a$ follows from dividing (9) by the square of (8). When $t \ne a$, the strict convexity of $\Lambda^*$ (see Dembo and Zeitouni [5], Section 2.2) implies that
\[ \Lambda^*(a) > \Lambda^*(t) + (a - t)\Lambda^{*\prime}(t). \]
Since $\theta(t) = \Lambda^{*\prime}(t)$, dividing (9) by the square of (8) shows that the relative variance grows exponentially. To establish (9), we use (5) and (6) to write
\[ L_{\theta(t)} = \exp[-\theta(t)(S_n - na) + n\theta(t)(t - a) - n\Lambda^*(t)]. \quad (10) \]
Also note that
\[ E_{P_{\theta(t)}}[L_{\theta(t)}^2 I(S_n/n \ge a)] = E_P[L_{\theta(t)} I(S_n/n \ge a)] = E_{P_{\theta(a)}}[L_{\theta(t)} L_{\theta(a)} I(S_n/n \ge a)]. \]
Using (5) to replace $L_{\theta(a)}$ and (10) to replace $L_{\theta(t)}$, we get
\[ E_{P_{\theta(t)}}[L_{\theta(t)}^2 I(S_n/n \ge a)] = \exp[n(\theta(t)(t-a) - \Lambda^*(t) - \Lambda^*(a))]\, E_{P_{\theta(a)}}\big[\exp[-(\theta(t)+\theta(a))(S_n - na)]\, I(S_n/n \ge a)\big]. \]
Using exactly the same arguments as in Bahadur and Rao [2] or Dembo and Zeitouni [5], Section 3.7.4, it follows that
\[ E_{P_{\theta(a)}}\big[\exp[-(\theta(t)+\theta(a))(S_n - na)]\, I(S_n/n \ge a)\big] \sim \frac{1}{\sqrt{2\pi\Lambda''(\theta(a))n}\,(\theta(t)+\theta(a))}. \]

3 Finite Mixtures of Exponentially Twisted Distributions

We have seen in the previous section that a single exponential change of measure cannot estimate multiple points along the tail distribution curve with asymptotic efficiency. In this section, we identify the twisting parameter that is minimax optimal in the sense that it minimizes the worst case relative variance over an interval $A = [a_1, a_2] \subset H^o$, $a_1 > \mu$. We then consider an optimal mixture of $k < \infty$ exponentially twisted distributions. We show that if $k \to \infty$ as $n \to \infty$, then the tail distribution curve is asymptotically efficiently estimated over $A$. Furthermore, if $k$ is $\Theta(\sqrt{n/\log n})$, the relative variance of the tail probability distribution curve grows polynomially over $A$.

Some definitions are useful to our analysis. From Theorem 1, it follows that for $a \in H^o$ and $a > \mu$,
\[ \lim_n -\frac{1}{n} \log E_{P_{\theta(t)}}[L_{\theta(t)}^2 I(S_n/n \ge a)] = H(t, a), \]
where
\[ H(t, a) = \Lambda^*(a) + \Lambda^*(t) + \Lambda^{*\prime}(t)(a - t). \]
A non-negative function $f(x)$ is said to be $\Theta(g(x))$, where $g$ is another non-negative function, if there exist positive constants $K_1$ and $K_2$ such that $K_1 g(x) \le f(x) \le K_2 g(x)$ for all $x$ sufficiently large.

The asymptotic rate of the relative variance of the estimator under $P_{\theta(t)}$ may be defined as
\[ \lim_n \frac{1}{n} \log \left( \frac{E_{P_{\theta(t)}}[L_{\theta(t)}^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} \right). \]
From Theorem 1, this equals
\[ J(t, a) = \Lambda^*(a) - (a - t)\Lambda^{*\prime}(t) - \Lambda^*(t) = 2\Lambda^*(a) - H(t, a). \]
This may be interpreted as the non-negative exponential penalty incurred for twisting with parameter $\theta(t)$ in estimating $P(S_n/n \ge a)$. It equals zero at $t = a$ and is positive otherwise. The penalty $J(t, a)$ has the following geometric interpretation: it equals the vertical distance between the rate function $\Lambda^*$ and the tangent drawn to this function at $t$, both evaluated at $a$.

3.1 The Minimax Optimal Twist

A formulation of the problem of minimizing the worst case asymptotic relative variance is to find $t \in H^o$ minimizing
\[ \sup_{a \in A} \lim_n \frac{1}{n} \log \left( \frac{E_{P_{\theta(t)}}[L_{\theta(t)}^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} \right) = \sup_{a \in A} J(t, a). \quad (11) \]

Proposition 1 For $A = [a_1, a_2] \subset H^o$, $a_1 > \mu$, the asymptotic worst case relative variance (11) is minimized over $t$ by $t^*$:
\[ \Lambda^{*\prime}(t^*) = \frac{\Lambda^*(a_2) - \Lambda^*(a_1)}{a_2 - a_1}. \quad (12) \]

Proof: The optimization problem reduces to
\[ \inf_{t \in H^o} \sup_{a \in [a_1, a_2]} (\Lambda^*(a) - \Lambda^{*\prime}(t)(a - t) - \Lambda^*(t)). \]
Due to the convexity of $\Lambda^*$, for any $t$, $\sup_{a \in [a_1, a_2]}(\Lambda^*(a) - \Lambda^{*\prime}(t)(a - t) - \Lambda^*(t))$ is achieved at $a_1$ or $a_2$. Thus, the problem reduces to
\[ \inf_{t \in H^o} \max\{\Lambda^*(a_1) - \Lambda^{*\prime}(t)(a_1 - t) - \Lambda^*(t),\ \Lambda^*(a_2) - \Lambda^{*\prime}(t)(a_2 - t) - \Lambda^*(t)\}. \]
If $t < a_1$ then both functions inside the max are decreasing in $t$, so increasing $t$ reduces the maximum. If $t > a_2$ then both functions inside the max are increasing in $t$, so decreasing $t$ reduces the maximum. It therefore suffices to consider $t \in [a_1, a_2]$. At all such $t$, the first function is increasing and the second is decreasing, so the maximum is minimized where they are equal, which is the point $t^*$ in (12).
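Since $\Lambda^{*\prime}$ is strictly increasing on $H^o$, the characterization (12) can be solved by simple bisection. The following sketch (our own helper, not from the paper) does this; in the standard Gaussian case, $\Lambda^*(a) = a^2/2$ and $\Lambda^{*\prime}(t) = t$, so $t^*$ is just the midpoint of $[a_1, a_2]$.

```python
def minimax_twist(lstar_prime, a1, a2, lstar, tol=1e-10):
    """Solve Lambda*'(t*) = (Lambda*(a2) - Lambda*(a1)) / (a2 - a1).

    lstar / lstar_prime: the rate function Lambda* and its derivative
    (assumed strictly convex / strictly increasing on [a1, a2]).
    t* lies in [a1, a2] by the mean value theorem, so bisection suffices.
    """
    target = (lstar(a2) - lstar(a1)) / (a2 - a1)
    lo, hi = a1, a2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lstar_prime(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Standard Gaussian case: Lambda*(a) = a^2/2, so t* = (a1 + a2) / 2
t_star = minimax_twist(lambda t: t, 1.0, 2.0, lambda a: a * a / 2)
```

The same routine applies to any light-tailed model once $\Lambda^*$ and $\Lambda^{*\prime}$ are available in closed form or numerically.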

3.2 Finite Mixtures: Minimax Objective

We now consider a probability measure $\tilde P$ that is a non-negative mixture of $k$ exponentially twisted distributions, so that
\[ \tilde P(\cdot) = \sum_{i=1}^{k} p_i P_{\theta(t_i)}(\cdot), \]
where $p_i > 0$, $\sum_{i=1}^{k} p_i = 1$, $t_1 < t_2 < \cdots < t_k$, and each $t_i \in H^o$. Our objective is to find the optimal twisting parameters for a fixed $k$. We then examine the asymptotic relative variance as $k$ increases. Mixtures of exponentially twisted distributions have been used in many applications, including Sadowsky and Bucklew [10]. For a fixed $k$, we formulate the problem of selecting a mixture of $k$ exponentially twisted distributions to minimize the worst-case asymptotic relative variance over $A = [a_1, a_2] \subset H^o$ as follows:
\[ \inf_{(t_1, \ldots, t_k) \in H^o} \sup_{a \in A} \lim_n \frac{1}{n} \log \left( \frac{E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} \right), \quad (13) \]
where $\tilde L$ is the likelihood ratio of $P$ with respect to $\tilde P$, given by
\[ \tilde L = \frac{1}{\sum_{i=1}^{k} p_i \exp[\theta(t_i) S_n - n\Lambda(\theta(t_i))]}, \]
and the existence of the limit in (13) is guaranteed by Lemma 1. Recall that $H(t, a) = \Lambda^*(a) + \Lambda^*(t) + \Lambda^{*\prime}(t)(a - t)$.

Lemma 1 For $(a, t_1, \ldots, t_k) \in H^o$, $a > \mu$, there exists a constant $c$ such that
\[ E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)] \le c \exp\Big(-n \max_{i=1,\ldots,k} H(t_i, a)\Big), \quad (14) \]
for all sufficiently large $n$. Furthermore,
\[ \lim_n -\frac{1}{n} \log E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)] = \max_{i=1,\ldots,k} H(t_i, a). \quad (15) \]

Recall that $J(t, a) = 2\Lambda^*(a) - H(t, a)$. In view of (15), our optimization problem reduces to
\[ \inf_{(t_1, \ldots, t_k) \in H^o} \sup_{a \in A} \min_{i=1,\ldots,k} J(t_i, a). \quad (16) \]

Proof of Lemma 1: We first show (14). We have
\[ E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)] = E_P[\tilde L\, I(S_n/n \ge a)] \quad (17) \]
and
\[ \tilde L \le \min_{i=1,\ldots,k} (1/p_i) \exp[-\theta(t_i) S_n + n\Lambda(\theta(t_i))], \]
so
\[ E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)] \le \min_{i=1,\ldots,k} (1/p_i) \exp[-\theta(t_i) na + n\Lambda(\theta(t_i))]\, P(S_n/n \ge a). \]

From (8) we get
\[ P(S_n/n \ge a) \le \text{constant} \cdot \exp[-n\Lambda^*(a)], \]
for all sufficiently large $n$. Recalling that $\theta(t_i) = \Lambda^{*\prime}(t_i)$ and $\Lambda^*(t) = \theta(t)t - \Lambda(\theta(t))$, we therefore get
\[ E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)] \le \text{constant} \cdot \min_{i=1,\ldots,k} (1/p_i) \exp[-nH(t_i, a)], \quad (18) \]
for all sufficiently large $n$. This proves the upper bound in the lemma because the minimum is achieved by the largest exponent $H(t_i, a)$, $i = 1, \ldots, k$, for all sufficiently large $n$. To see (15), from (18) it follows that
\[ \limsup_n \frac{1}{n} \log E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)] \le -\max_{i=1,\ldots,k} H(t_i, a). \]
To get a lower bound, choose $\epsilon > 0$ and write
\[ E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)] = E_{P_{\theta(a+\epsilon)}}[\tilde L\, L_{\theta(a+\epsilon)} I(S_n/n \ge a)] \ge E_{P_{\theta(a+\epsilon)}}[\tilde L\, L_{\theta(a+\epsilon)} I(a + 2\epsilon \ge S_n/n \ge a)]. \]
This last expression is, in turn, bounded below by
\[ M_{n,\epsilon} = \exp[-\theta(a+\epsilon)(a+2\epsilon)n + n\Lambda(\theta(a+\epsilon))] \Big( \sum_{i=1}^{k} p_i \exp(\theta_i (a+2\epsilon) n - n\Lambda(\theta_i)) \Big)^{-1} P_{\theta(a+\epsilon)}(a \le S_n/n \le a + 2\epsilon), \]
where we have written $\theta_i$ for $\theta(t_i)$. Because the $X_i$ have mean $a + \epsilon$ under $P_{\theta(a+\epsilon)}$,
\[ P_{\theta(a+\epsilon)}(a \le S_n/n \le a + 2\epsilon) \to 1. \]
Thus,
\[ \liminf_n \frac{1}{n} \log M_{n,\epsilon} \ge -\theta(a+\epsilon)(a+2\epsilon) + \Lambda(\theta(a+\epsilon)) - \max_{i=1,\ldots,k} \{\theta_i(a+2\epsilon) - \Lambda(\theta_i)\}, \]
from which follows
\[ \liminf_n \frac{1}{n} \log E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)] \ge -\theta(a+\epsilon)(a+2\epsilon) + \Lambda(\theta(a+\epsilon)) - \max_{i=1,\ldots,k} \{\theta_i(a+2\epsilon) - \Lambda(\theta_i)\}. \]
Because $\epsilon > 0$ is arbitrary, we also have
\[ \liminf_n \frac{1}{n} \log E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)] \ge -\theta(a)a + \Lambda(\theta(a)) - \max_{i=1,\ldots,k} \{\theta_i a - \Lambda(\theta_i)\} = -\max_{i=1,\ldots,k} H(t_i, a). \]

3.3 Finite Mixtures: Optimal Parameters

We now turn to the solution of (16), the problem of finding the optimal twisting parameters. We first develop some simple results related to convex functions that are useful for this. Consider a function $f$ that is strictly convex on an interval that includes $[y, z]$ in its interior. Consider the problem of finding the maximum vertical distance between the graph of $f$ and the line joining the points $(y, f(y))$ and $(z, f(z))$; i.e.,
\[ g(y, z) = \max_{t \in [y, z]} \Big( f(z) - \frac{f(z) - f(y)}{z - y}(z - t) - f(t) \Big). \]
At the optimal point $t^*$, we have
\[ f'(t^*) = \frac{f(z) - f(y)}{z - y} \quad (19) \]
and
\[ g(y, z) = f(z) - (z - t^*) f'(t^*) - f(t^*). \quad (20) \]
By substituting for $f'(t^*)$, it is also easily seen that
\[ g(y, z) = f(y) + (t^* - y) f'(t^*) - f(t^*). \quad (21) \]
Also note that $g(y, z)$ is a continuous function of $(y, z)$; it increases in $z$ and decreases in $y$. The following lemma is useful in arriving at optimal parameters of finite mixtures. Its content is intuitively obvious: given a strictly convex function on an interval, the interval can be partitioned into $k$ subintervals such that the maximum distance between the function and its piecewise linear interpolation between the endpoints of the subintervals is the same over all subintervals.

Lemma 2 Suppose that $f$ is strictly convex on $[y, z]$. Then there exist points $y = b_1 < b_2 < \cdots < b_{k+1} = z$ satisfying
\[ g(b_i, b_{i+1}) = g(b_{i+1}, b_{i+2}), \quad i = 1, 2, \ldots, k - 1. \quad (22) \]

Note that whenever the existence of $(b_1, b_2, \ldots, b_{k+1})$ as in Lemma 2 is established, the common value of $g(b_i, b_{i+1})$ for $i = 1, 2, \ldots, k$ is determined by $b_{k+1}$ if $b_1$ is held fixed. In the proof, $J_k(b_{k+1})$ denotes this value.

Proof of Lemma 2: We prove this by induction. First consider $k = 2$. Here $g(b_1, b) - g(b, b_3)$ is a continuous increasing function of $b$. It is negative at $b = b_1$ and positive at $b = b_3$. Thus, there exists $b_2 \in (b_1, b_3)$ such that $g(b_1, b_2) = g(b_2, b_3) = J_2(b_3)$. Furthermore, as $b_3$ increases, $g(b_1, b) - g(b, b_3)$ decreases, so $b_2$ and hence $J_2(b_3)$ increase continuously with $b_3$.
To proceed by induction, assume that for each $b_k > y$, there exist points $y = b_1 < b_2 < \cdots < b_{k-1} < b_k$ such that $g(b_i, b_{i+1}) = g(b_{i+1}, b_{i+2}) = J_{k-1}(b_k)$ for $i = 1, 2, \ldots, k - 2$. Furthermore, each $b_i$ and $J_{k-1}(b_k)$ are increasing functions of $b_k$. Now consider the function $J_{k-1}(b) - g(b, b_{k+1})$ as a function of $b$. This function is negative at $b = b_1$, positive at $b = b_{k+1}$, and it increases continuously with $b$. So again there exists a value $b_k$ so that $J_{k-1}(b_k) = g(b_k, b_{k+1})$. Set this

value equal to $J_k(b_{k+1})$. Again, it may be similarly seen that $b_k$, and hence all $b_i$, $i = 2, 3, \ldots, k$, and $J_k(b_{k+1})$ increase with $b_{k+1}$.

Given the points $(b_1, b_2, \ldots, b_{k+1})$ as in Lemma 2, we may define $t_i$ by
\[ f'(t_i) = \frac{f(b_{i+1}) - f(b_i)}{b_{i+1} - b_i}, \quad i = 1, 2, \ldots, k. \quad (23) \]
Then, from (20) and (21), it is easy to see that (22) amounts to
\[ f(t_{i+1}) - f(t_i) = f'(t_i)[b_{i+1} - t_i] + f'(t_{i+1})[t_{i+1} - b_{i+1}], \quad i = 1, 2, \ldots, k - 1. \]
In Proposition 2, we use these observations and Lemma 2 to identify the solution of (16), i.e., the optimal twisting parameters.

Proposition 2 Suppose that $A = [a_1, a_2] \subset H^o$, $a_1 > \mu$. Let the points $a_1 = b_1 < b_2 < \cdots < b_{k+1} = a_2$ and $t_1, \ldots, t_k$ satisfy
\[ \Lambda^{*\prime}(t_i) = \frac{\Lambda^*(b_{i+1}) - \Lambda^*(b_i)}{b_{i+1} - b_i}, \quad i = 1, \ldots, k, \quad (24) \]
and
\[ \Lambda^*(t_{i+1}) - \Lambda^*(t_i) = \Lambda^{*\prime}(t_i)[b_{i+1} - t_i] + \Lambda^{*\prime}(t_{i+1})[t_{i+1} - b_{i+1}], \quad i = 1, \ldots, k - 1, \quad (25) \]
as in Lemma 2 and (22). These $t_1, \ldots, t_k$ solve (16).

Proof: From the strict convexity of $\Lambda^*$ in $H^o$, it follows that the tangent lines at $t_1, \ldots, t_k$ partition the interval $[a_1, a_2]$ into $k$ subintervals such that throughout the $i$th subinterval, $\Lambda^*$ is closer to the tangent line at $t_i$ than it is to the tangent line at any other $t_j$. The endpoints of the $i$th subinterval, $i = 2, \ldots, k - 1$, are the points at which the tangent line at $t_i$ crosses the tangent lines at $t_{i-1}$ and $t_{i+1}$. These are the points $b_2, \ldots, b_k$ in (25). The left endpoint of the first subinterval is simply $a_1 = b_1$ and the right endpoint of the $k$th subinterval is simply $a_2 = b_{k+1}$. Thus, for all $a \in [b_i, b_{i+1}]$,
\[ \min_{j=1,\ldots,k} J(t_j, a) = J(t_i, a). \]
Furthermore, each $J(t_i, \cdot)$ is a convex function, so
\[ \sup_{b_i \le a \le b_{i+1}} J(t_i, a) = \max\{J(t_i, b_i), J(t_i, b_{i+1})\}, \quad i = 1, \ldots, k. \]
Viewing $b_2, \ldots, b_k$ as functions of $t_1, \ldots, t_k$ (through (25)), the problem in (16) thus becomes one of minimizing
\[ \max\{J(t_1, b_1), J(t_1, b_2), J(t_2, b_2), J(t_2, b_3), \ldots, J(t_k, b_k), J(t_k, b_{k+1})\} \]
over $t_1, \ldots, t_k$.
In fact, (25) says that $J(t_i, b_{i+1}) = J(t_{i+1}, b_{i+1})$, $i = 1, \ldots, k - 1$, so we may simplify this to minimizing
\[ \max\{J(t_1, b_1), J(t_2, b_2), \ldots, J(t_k, b_k), J(t_k, b_{k+1})\}. \quad (26) \]

[Figure 1: Illustration of the points $t_i$ and $b_i$.]

Each $b_i$, $i = 2, \ldots, k$, is an increasing function of $t_{i-1}$ and $t_i$. Each $J(t_i, b_i) \equiv J(t_i, b_i(t_{i-1}, t_i))$ is easily seen to be an increasing function of $t_i$ and a decreasing function of $t_{i-1}$, $i = 2, \ldots, k$. Similarly, $J(t_1, b_1)$ is an increasing function of $t_1$ and $J(t_k, b_{k+1})$ is a decreasing function of $t_k$. It follows that if all values inside the max in (26) are equal, then $t_1, \ldots, t_k$ is optimal: no change in $t_1, \ldots, t_k$ can simultaneously reduce all values inside the max. All values in (26) are equal if $J(t_i, b_i) = J(t_{i+1}, b_{i+1})$, $i = 1, \ldots, k - 1$, and $J(t_k, b_k) = J(t_k, b_{k+1})$. Simple algebra now shows that this is equivalent to (24).

The proof of Proposition 2 leads to a simple algorithm for finding the optimal points $t_1, \ldots, t_k$ to use in a mixture of $k$ exponentially twisted distributions. The key observation is that it suffices to find the optimal value of (26), which (as noted in the proof) is attained when all values inside the maximum in (26) are equal. Start with a guess $\epsilon > 0$ for this value and set $b_1 = a_1$. Now proceed recursively, for $i = 1, \ldots, k$: given $b_i$, find $t_i$ by solving the equation
\[ \Lambda^*(b_i) - \Lambda^*(t_i) - \Lambda^{*\prime}(t_i)[b_i - t_i] = \epsilon, \]
and then find $b_{i+1}$ by solving
\[ \Lambda^*(b_{i+1}) - \Lambda^*(t_i) - \Lambda^{*\prime}(t_i)[b_{i+1} - t_i] = \epsilon. \]
This is straightforward, because the left side of the first equation is increasing in the unknown $t_i$ and the left side of the second equation is increasing in the unknown $b_{i+1}$. If at some step there is no solution in $[a_1, a_2]$, the procedure stops, reduces $\epsilon$ and starts over at $b_1$. If a solution is found at every step but $b_{k+1} < a_2$, then $\epsilon$ is increased and the procedure starts over at $b_1$. Thus, one may apply a bisection search over $\epsilon$ to find the optimal $\epsilon$ and, simultaneously, the optimal $t_i$ and $b_i$. The exact optimum has $b_{k+1} = a_2$; in practice, one must specify some error tolerance for $|b_{k+1} - a_2|$.
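The bisection procedure just described can be sketched as follows (illustrative code of ours, not from the paper; function and variable names are assumptions). In the standard Gaussian case, $\Lambda^*(a) = a^2/2$ and the penalty is $J(t, a) = (a - t)^2/2$, so the optimal $t_i$ come out equally spaced.

```python
def optimal_mixture_points(lstar, lstar_prime, a1, a2, k):
    """Bisection over the common penalty eps to find t_1 < ... < t_k.

    lstar, lstar_prime: the rate function Lambda* and its derivative.
    J(t, a) = lstar(a) - lstar(t) - lstar_prime(t)*(a - t) is the
    exponential penalty; at the optimum all max terms in (26) equal eps.
    """
    def J(t, a):
        return lstar(a) - lstar(t) - lstar_prime(t) * (a - t)

    def root(f, lo, hi):
        # f is increasing with f(lo) <= 0; plain bisection for the root
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
        return 0.5 * (lo + hi)

    cap = a2 + (a2 - a1)              # generous upper bracket for the roots

    def sweep(eps):
        # forward recursion: b_1 = a1, then t_i, b_{i+1}, ...
        b, ts = a1, []
        for _ in range(k):
            t = root(lambda u: J(u, b) - eps, b, cap)
            b = root(lambda u: J(t, u) - eps, t, cap)
            ts.append(t)
        return ts, b                  # b is the final endpoint b_{k+1}

    lo, hi = 0.0, J(a1, a2)
    while sweep(hi)[1] < a2:          # ensure the initial hi overshoots a2
        hi *= 2
    for _ in range(60):               # bisect eps until b_{k+1} ~ a2
        eps = 0.5 * (lo + hi)
        lo, hi = (eps, hi) if sweep(eps)[1] < a2 else (lo, eps)
    return sweep(0.5 * (lo + hi))[0]
```

The inner root-finding relies exactly on the monotonicity noted above; the outer loop is the bisection search over $\epsilon$.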

3.4 Increasing Number of Components

Again consider $A = [a_1, a_2] \subset H^o$ and let $J_k$ denote the minimal value in (16), the minimax penalty incurred in estimating $P(S_n/n \ge a)$ for all $a \in A$ using a mixture of $k$ exponentially twisted distributions. We examine how $J_k$ decreases with $k$. Recall that for $a \in H^o$, $0 < \Lambda''(\theta(a)) < \infty$. Since $\Lambda^{*\prime\prime}(a) = 1/\Lambda''(\theta(a))$, it follows that $0 < \Lambda^{*\prime\prime}(a) < \infty$. Since $A \subset H^o$ is a compact set, we have $\inf_{a \in A} \Lambda^{*\prime\prime}(a) > 0$ and $\sup_{a \in A} \Lambda^{*\prime\prime}(a) < \infty$.

Proposition 3 For $A = [a_1, a_2] \subset H^o$, $a_1 > \mu$, there exist constants $d_1, d_2$ such that, for all $k = 1, 2, \ldots$,
\[ \frac{d_1}{k^2} \le J_k \le \frac{d_2}{k^2}. \]

Proof: Let $c_- = \inf_{a \in A} \Lambda^{*\prime\prime}(a)$ and $c_+ = \sup_{a \in A} \Lambda^{*\prime\prime}(a)$. For any interval $[b_i, b_{i+1}] \subset A$ and point $t_i$ satisfying (24), twice integrating the bounds on the second derivative of $\Lambda^*$ yields
\[ \frac{c_-}{2}(t_i - b_i)^2 \le \Lambda^*(b_i) - \Lambda^*(t_i) - \Lambda^{*\prime}(t_i)[b_i - t_i] \le \frac{c_+}{2}(t_i - b_i)^2 \]
and
\[ \frac{c_-}{2}(t_i - b_{i+1})^2 \le \Lambda^*(b_{i+1}) - \Lambda^*(t_i) - \Lambda^{*\prime}(t_i)[b_{i+1} - t_i] \le \frac{c_+}{2}(t_i - b_{i+1})^2. \]
To construct an upper bound on $J_k$, we may take $b_1, \ldots, b_{k+1}$ to be equally spaced, so that $b_{i+1} - b_i = (a_2 - a_1)/k$, and choose $t_i$ to satisfy (24). Then
\[ J_k \le \max_{i=1,\ldots,k} \sup_{b_i \le a \le b_{i+1}} \{\Lambda^*(a) - \Lambda^*(t_i) - \Lambda^{*\prime}(t_i)[a - t_i]\}. \]
As in the proof of Proposition 1, the supremum over the $i$th subinterval is attained at the endpoints, so
\[ J_k \le \max_{i=1,\ldots,k} \frac{c_+}{2} \max\{(t_i - b_i)^2, (t_i - b_{i+1})^2\} \le \frac{c_+ (a_2 - a_1)^2}{2k^2}. \]
To get a lower bound on $J_k$, observe that for any $b_1 < \cdots < b_{k+1}$ with $a_1 = b_1$ and $b_{k+1} = a_2$ (including the optimal values), at least one subinterval $[b_i, b_{i+1}]$ must have length greater than or equal to $(a_2 - a_1)/k$. Fix such a subinterval and let $t_i$ satisfy (24). Then
\[ J_k \ge \max\{\Lambda^*(b_i) - \Lambda^*(t_i) - \Lambda^{*\prime}(t_i)[b_i - t_i],\ \Lambda^*(b_{i+1}) - \Lambda^*(t_i) - \Lambda^{*\prime}(t_i)[b_{i+1} - t_i]\}, \]
so
\[ J_k \ge \max\Big\{\frac{c_-}{2}(t_i - b_{i+1})^2,\ \frac{c_-}{2}(t_i - b_i)^2\Big\} \ge \frac{c_-}{8}(b_{i+1} - b_i)^2 \ge \frac{c_-}{8k^2}(a_2 - a_1)^2. \]

The following result shows that by increasing the number of components $k$ in the mixture together with $n$, we can recover asymptotic efficiency. Here $\tilde P$ is defined from $k$ components, as before, using the optimal twisting parameters $\theta(t_i)$, $i = 1, \ldots, k$.

Proposition 4 For $A = [a_1, a_2] \subset H^o$, $a_1 > \mu$: if $k \to \infty$ as $n \to \infty$, then $\tilde P$ is asymptotically efficient for $P(S_n/n \ge a)$, for every $a \in A$. If $k = \Theta(\sqrt{n/\log n})$, then the relative variance of the estimated tail distribution curve over $A$ is polynomially bounded, i.e.,
\[ \sup_{a \in A} \limsup_n \frac{1}{n^p} \frac{E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} < \infty \quad (27) \]
holds for some $p > 1$. If $k = \Theta(\sqrt{n})$, then (27) holds with $p = 1$.

Proof: From Lemma 1 and (8), we know that
\[ E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)] \le \text{constant} \cdot \exp\Big(-n \max_{i=1,\ldots,k} H(t_i, a)\Big) \]
and
\[ P(S_n/n \ge a) \ge \frac{\text{constant}}{\sqrt{n}} \exp(-n\Lambda^*(a)), \]
for all sufficiently large $n$. Thus,
\[ \frac{E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} \le n \cdot \text{constant} \cdot \exp\Big(n\Big[2\Lambda^*(a) - \max_{i=1,\ldots,k} H(t_i, a)\Big]\Big) \le n \cdot \text{constant} \cdot \exp(nJ_k) \le n \cdot \text{constant} \cdot \exp(n d_2 / k^2), \quad (28) \]
for all sufficiently large $n$. If $k \to \infty$, then
\[ \lim_n \frac{1}{n} \log \left( \frac{E_{\tilde P}[\tilde L^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} \right) = 0. \]
From this, asymptotic efficiency as an estimator of $P(S_n/n \ge a)$ follows. If $k = \Theta(\sqrt{n/\log n})$, (28) implies (27) for some $p > 1$. If $k = \Theta(\sqrt{n})$, (28) implies (27) with $p = 1$.

Bounds of the same order of magnitude as those in Proposition 3 (but with different constants) hold if we use equally spaced points $t_i$ rather than the optimal points. Thus, the number of components $k$ needed to achieve asymptotic efficiency in that case is of the same order of magnitude as with optimally chosen points. This suggests that choosing the $t_i$ optimally is most relevant when $k$ is relatively small.

4 Continuous Mixture of Exponentially Twisted Distributions

We now show that a continuous mixture of $P_{\theta(t)}$ for $t \in A$ simultaneously asymptotically efficiently estimates each $P(S_n/n \ge a)$ for $a \in A$. In Theorem 2, we allow $A$ to be any interval in $H^o$ of values greater than $\mu$. Thus, it is allowed to have the form $[a_1, \infty)$ as long as it is a subset of $H^o$ and $a_1 > \mu$.

4.1 Polynomially Bounded Relative Variance

Let $g$ be a probability density function with support $A$. Consider the probability measure $P_g$, where for any set $\mathcal{A}$,
\[ P_g(\mathcal{A}) = \int_{a_1}^{a_2} P_{\theta(t)}(\mathcal{A})\, g(t)\, dt. \]
(Here $a_2$ may equal $\infty$.) This measure may be used to estimate $P(S_n/n \ge a)$ as follows: First a sample $T$ is generated using the density $g$. Then, the exponentially twisted distribution $P_{\theta(T)}$ is used to generate the sample $(X_1, X_2, \ldots, X_n)$, and hence the output $L_g(S_n) I(S_n/n \ge a)$, where
\[ L_g(S_n) = \Big( \int_{a_1}^{a_2} \exp[\theta(t) S_n - n\Lambda(\theta(t))]\, g(t)\, dt \Big)^{-1}. \]

Theorem 2 For each $a \in A^o$, when $g$ is a positive continuous function on $A$,
\[ \limsup_n \frac{1}{n} \frac{E_{P_g}[L_g(S_n)^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} = \frac{\theta(a)\Lambda''(\theta(a))}{g(a)} = \frac{\Lambda^{*\prime}(a)}{\Lambda^{*\prime\prime}(a)\, g(a)}, \]
so that the relative variance of the associated importance sampling estimator grows polynomially at most at rate $p = 1$.

Proof: On the event $\{S_n \ge na\}$,
\[ L_g(S_n) \le \Big( \int_{a_1}^{a_2} \exp[\theta(t) na - n\Lambda(\theta(t))]\, g(t)\, dt \Big)^{-1}. \]
Since $\Lambda^*(t) = \theta(t)t - \Lambda(\theta(t))$ and $\theta(t) = \Lambda^{*\prime}(t)$, this upper bound equals
\[ \exp(-n\Lambda^*(a)) \Big( \int_{a_1}^{a_2} \exp(-n(\Lambda^*(a) - \Lambda^{*\prime}(t)(a - t) - \Lambda^*(t)))\, g(t)\, dt \Big)^{-1}. \]
Note that $\Lambda^*(a) - \Lambda^{*\prime}(t)(a - t) - \Lambda^*(t)$ is minimized at $t = a$. Using Laplace's approximation (see, e.g., Olver [9], pp. 80-81),
\[ \int_{a_1}^{a_2} \exp(-n(\Lambda^*(a) - \Lambda^{*\prime}(t)(a - t) - \Lambda^*(t)))\, g(t)\, dt \sim \sqrt{\frac{2\pi}{n\Lambda''(\theta_a)}}\, g(a) \quad (29) \]
if $a \in (a_1, a_2)$. Thus, for $\epsilon > 0$ and $n$ large enough, on $\{S_n \ge na\}$,
\[ L_g(S_n) \le \exp(-n\Lambda^*(a)) \sqrt{\frac{n\Lambda''(\theta(a))}{2\pi}}\, \frac{1 + \epsilon}{g(a)}. \quad (30) \]
Now
\[ E_{P_g}[L_g(S_n)^2 I(S_n/n \ge a)] = E_P[L_g(S_n) I(S_n/n \ge a)]. \]
Hence, for $n$ large enough,
\[ E_{P_g}[L_g(S_n)^2 I(S_n/n \ge a)] \le \exp(-n\Lambda^*(a)) \sqrt{\frac{n\Lambda''(\theta_a)}{2\pi}}\, \frac{1 + \epsilon}{g(a)}\, P(S_n/n \ge a). \]
Using the sharp asymptotic for $P(S_n/n \ge a)$ given in (8), and noting that $\epsilon$ is arbitrary, it follows that
\[ \limsup_n \frac{1}{n} \frac{E_{P_g}[L_g(S_n)^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} \le \frac{\theta_a \Lambda''(\theta_a)}{g(a)}. \]
Since $\theta_a = \Lambda^{*\prime}(a)$ and $\Lambda''(\theta_a) = 1/\Lambda^{*\prime\prime}(a)$, the result follows.
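A sketch of the continuous-mixture estimator (ours, not from the paper), again for standard Gaussian $X_i$ ($\theta(t) = t$, $\Lambda(\theta) = \theta^2/2$) and with $g$ taken uniform on $[a_1, a_2]$ for simplicity; the mixing integral in $L_g$ is evaluated by simple quadrature with the largest exponent factored out for numerical stability.

```python
import numpy as np

def continuous_mixture_estimates(n, m, a1, a2, a_grid, rng=None):
    """Continuous mixture of exponential twists for standard Gaussian X_i.

    Illustrative only: g is uniform on [a1, a2] (Theorem 2 only requires
    g positive and continuous); theta(t) = t and Lambda(theta) = theta^2/2.
    One simulation run estimates the whole curve a -> P(S_n/n >= a).
    """
    rng = np.random.default_rng(rng)
    ts = rng.uniform(a1, a2, size=m)                       # T ~ g
    s = rng.standard_normal((m, n)).sum(axis=1) + n * ts   # S_n under P_theta(T)

    # L_g(S_n) = 1 / int_{a1}^{a2} exp(t*S_n - n*t^2/2) g(t) dt,
    # approximated on a fine grid; subtract the max exponent per sample
    grid = np.linspace(a1, a2, 801)
    weight = (grid[1] - grid[0]) / (a2 - a1)               # g(t) dt, g uniform
    expo = np.outer(s, grid) - n * grid**2 / 2
    mx = expo.max(axis=1)
    log_int = mx + np.log(np.exp(expo - mx[:, None]).sum(axis=1) * weight)
    lg = np.exp(-log_int)
    return np.array([(lg * (s >= n * a)).mean() for a in a_grid])
```

The same $m$ samples and the same weights $L_g(S_n^i)$ are reused for every threshold $a$, which is exactly what makes the continuous mixture attractive for estimating the whole tail distribution curve.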

Remark 1 In Theorem 2, if $a$ is on the boundary of $[a_1, a_2]$, the analysis remains unchanged except that in (29) the asymptotic has to be divided by 2 to get the correct value (see, e.g., Olver [9], p. 81). Hence in this setting
\[ \limsup_n \frac{1}{n} \frac{E_{P_g}[L_g(S_n)^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} \le \frac{2\Lambda^{*\prime}(a)}{\Lambda^{*\prime\prime}(a)\, g(a)}. \]
In particular, it follows that if $A = [a_1, a_2] \subset H^o$, $a_1 > \mu$,
\[ \sup_{a \in A} \limsup_n \frac{1}{n} \frac{E_{P_g}[L_g(S_n)^2 I(S_n/n \ge a)]}{P(S_n/n \ge a)^2} < \infty. \]

4.2 Choice of Density Function

From an implementation viewpoint, it may be important to select $g$ so as to estimate all $P(S_n/n \ge a)$, $a \in A$, with the same asymptotic relative variance. If this holds, we may continue to generate samples until the sample relative variance at one point becomes small, and this ensures that the relative variance at other points is not much different. Theorem 2 suggests that this may be achieved by selecting $g(a)$ proportional to $\Lambda^{*\prime}(a)/\Lambda^{*\prime\prime}(a)$, or equivalently $\theta(a)/\theta'(a)$. Suppose that $A = [a_1, a_2] \subset H^o$, $a_1 > \mu$. In this case
\[ \int_{a_1}^{a_2} \frac{\theta(a)}{\theta'(a)}\, da < \infty, \quad (31) \]
so such a $g$ is feasible. We now evaluate such a $g$ for some simple cases:

Example 1 If $X_i$ has a Bernoulli distribution with mean $p$, then its log-moment generating function is $\Lambda(\theta) = \log(\exp(\theta)p + (1 - p))$ and
\[ \Lambda'(\theta) = \frac{\exp(\theta)p}{\exp(\theta)p + (1 - p)}. \]
It is easily seen that
\[ \theta(a) = \log\Big[\frac{a}{1 - a} \cdot \frac{1 - p}{p}\Big] \]
and $\theta'(a) = 1/a + 1/(1 - a)$. Thus, $g(a)$ is proportional to
\[ \Big(\frac{1}{a} + \frac{1}{1 - a}\Big)^{-1} \log\Big[\frac{a}{1 - a} \cdot \frac{1 - p}{p}\Big]. \]
It can be seen that this function is maximized at the solution of
\[ \frac{a(1 - p)}{(1 - a)p} = \exp\Big[\frac{1}{2a - 1}\Big] \]
that exceeds $p$.
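As a small numerical illustration of this choice (a hypothetical helper of ours, not from the paper), the following normalizes $g(a) \propto \theta(a)/\theta'(a)$ on a grid; in the Gaussian case treated in Example 2 below it reduces to $g(a) \propto a$.

```python
import numpy as np

def equal_precision_density(theta, theta_prime, a1, a2, npts=1001):
    """Normalized mixing density g(a) proportional to theta(a)/theta'(a).

    theta, theta_prime: vectorized maps a -> theta(a) and a -> theta'(a).
    Returns (grid, g) with g normalized to integrate to 1 over [a1, a2]
    by the trapezoidal rule.
    """
    grid = np.linspace(a1, a2, npts)
    w = theta(grid) / theta_prime(grid)
    norm = ((w[:-1] + w[1:]) / 2 * np.diff(grid)).sum()
    return grid, w / norm

# Gaussian case: theta(a) = a, theta'(a) = 1, so g(a) = a / int_{a1}^{a2} a da
grid, g = equal_precision_density(lambda a: a, lambda a: np.ones_like(a), 1.0, 2.0)
```

For the Bernoulli or Gamma examples, one would instead pass the $\theta(a)$ and $\theta'(a)$ given in the text; only positivity of $\theta/\theta'$ on $[a_1, a_2]$ is needed for the normalization to make sense.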

Example 2 If $X_i$ has a Gaussian distribution with mean zero and variance 1, then its log-moment generating function is $\Lambda(\theta) = \theta^2/2$ and $\Lambda'(\theta) = \theta$. Also, $\theta(a) = a$ and $\theta'(a) = 1$, so $g(a)$ is proportional to $a$. This suggests that more mass should be given to larger values of $a$ in the interval $[a_1, a_2]$.

Example 3 If $X_i$ has a Gamma distribution with log-moment generating function
$$\Lambda(\theta) = -\alpha \log(1 - \theta\beta),$$
then its mean equals $\alpha\beta$. Now
$$\Lambda'(\theta) = \frac{\alpha\beta}{1 - \theta\beta},$$
so that
$$\theta(a) = \frac{1}{\beta}\left(1 - \frac{\alpha\beta}{a}\right) \quad \text{and} \quad \theta'(a) = \frac{\alpha}{a^2}.$$
Then $g(a)$ is proportional to
$$a\left(\frac{a}{\alpha\beta} - 1\right).$$
Recall that $a_1 > \alpha\beta$.

5 Concluding Remarks

In this paper, we have considered the problem of simultaneously estimating the probabilities of multiple rare events. In the setting of tail probabilities associated with a random walk, we have shown that the standard importance sampling estimator that yields asymptotically efficient estimates for one point of the distribution fails to do so at all other points. To address this problem, we have examined mixtures of exponentially twisted distributions. We have identified the optimal finite mixture and shown that asymptotic efficiency is achieved uniformly with either a continuous mixture or a finite mixture with an increasing number of components. Although our analysis is restricted to the random walk setting, we expect that similar techniques could prove useful in other rare event simulation problems. Similar ideas should also prove useful in going beyond the estimation of multiple probabilities to the estimation of functions of tail distributions, such as quantiles or tail conditional expectations.

Acknowledgements: This work is supported, in part, by NSF grants DMI and DMS.

References

[1] Asmussen, S. Ruin Probabilities. World Scientific Publishing Co. Ltd., London.

[2] Bahadur, R. R., and R. R. Rao. On deviations of the sample mean. Ann. Math. Statist. 31.

[3] Bucklew, J. A. Introduction to Rare Event Simulation. Springer-Verlag, New York.

[4] Chang, C. S., P. Heidelberger, S. Juneja, and P. Shahabuddin. Effective bandwidth and fast simulation of ATM intree networks. Performance Evaluation 20.

[5] Dembo, A., and O. Zeitouni. Large Deviations Techniques and Applications, Second Edition. Springer, New York, NY.

[6] Glasserman, P., and J. Li. Importance sampling for portfolio credit risk. Management Science 51.

[7] Heidelberger, P. Fast simulation of rare events in queueing and reliability models. ACM Transactions on Modeling and Computer Simulation 5, 1.

[8] Juneja, S., and P. Shahabuddin. Rare event simulation techniques. To appear in Handbook on Simulation, S. Henderson and B. Nelson, eds.

[9] Olver, F. W. J. Introduction to Asymptotics and Special Functions. Academic Press, New York.

[10] Sadowsky, J. S., and J. A. Bucklew. On large deviation theory and asymptotically efficient Monte Carlo estimation. IEEE Transactions on Information Theory 36, 3.

[11] Shahabuddin, P. Importance sampling for the simulation of highly reliable Markovian systems. Management Science 40.


Verifying Regularity Conditions for Logit-Normal GLMM Verifying Regularity Conditions for Logit-Normal GLMM Yun Ju Sung Charles J. Geyer January 10, 2006 In this note we verify the conditions of the theorems in Sung and Geyer (submitted) for the Logit-Normal

More information

Reduced-load equivalence for queues with Gaussian input

Reduced-load equivalence for queues with Gaussian input Reduced-load equivalence for queues with Gaussian input A. B. Dieker CWI P.O. Box 94079 1090 GB Amsterdam, the Netherlands and University of Twente Faculty of Mathematical Sciences P.O. Box 17 7500 AE

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information

REAL VARIABLES: PROBLEM SET 1. = x limsup E k

REAL VARIABLES: PROBLEM SET 1. = x limsup E k REAL VARIABLES: PROBLEM SET 1 BEN ELDER 1. Problem 1.1a First let s prove that limsup E k consists of those points which belong to infinitely many E k. From equation 1.1: limsup E k = E k For limsup E

More information

Robust Network Codes for Unicast Connections: A Case Study

Robust Network Codes for Unicast Connections: A Case Study Robust Network Codes for Unicast Connections: A Case Study Salim Y. El Rouayheb, Alex Sprintson, and Costas Georghiades Department of Electrical and Computer Engineering Texas A&M University College Station,

More information

Krzysztof Burdzy University of Washington. = X(Y (t)), t 0}

Krzysztof Burdzy University of Washington. = X(Y (t)), t 0} VARIATION OF ITERATED BROWNIAN MOTION Krzysztof Burdzy University of Washington 1. Introduction and main results. Suppose that X 1, X 2 and Y are independent standard Brownian motions starting from 0 and

More information

1: PROBABILITY REVIEW

1: PROBABILITY REVIEW 1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following

More information

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers.

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers. Chapter 3 Duality in Banach Space Modern optimization theory largely centers around the interplay of a normed vector space and its corresponding dual. The notion of duality is important for the following

More information

h(x) lim H(x) = lim Since h is nondecreasing then h(x) 0 for all x, and if h is discontinuous at a point x then H(x) > 0. Denote

h(x) lim H(x) = lim Since h is nondecreasing then h(x) 0 for all x, and if h is discontinuous at a point x then H(x) > 0. Denote Real Variables, Fall 4 Problem set 4 Solution suggestions Exercise. Let f be of bounded variation on [a, b]. Show that for each c (a, b), lim x c f(x) and lim x c f(x) exist. Prove that a monotone function

More information

Notes on uniform convergence

Notes on uniform convergence Notes on uniform convergence Erik Wahlén erik.wahlen@math.lu.se January 17, 2012 1 Numerical sequences We begin by recalling some properties of numerical sequences. By a numerical sequence we simply mean

More information