© 2018 Society for Industrial and Applied Mathematics


SIAM J. OPTIM. Vol. 28, No. 1. © 2018 Society for Industrial and Applied Mathematics

ON SAMPLING RATES IN SIMULATION-BASED RECURSIONS

RAGHU PASUPATHY, PETER GLYNN, SOUMYADIP GHOSH, AND FATEMEH S. HASHEMI

Abstract. We consider the context of simulation-based recursions, that is, recursions that involve quantities needing to be estimated using a stochastic simulation. Examples include stochastic adaptations of fixed-point and gradient descent recursions obtained by replacing function and derivative values appearing within the recursion by their Monte Carlo counterparts. The primary motivating settings are simulation optimization and stochastic root finding problems, where the minimum point and the zero of a function are sought, respectively, with only Monte Carlo estimates of the functions appearing within the problem. We ask how much Monte Carlo sampling needs to be performed within simulation-based recursions in order that the resulting iterates remain consistent and, more importantly, efficient, where "efficient" implies convergence at the fastest possible rate. Answering these questions involves trading off two types of error inherent in the iterates: the deterministic error due to recursion and the stochastic error due to sampling. As we demonstrate through a characterization of the relationship between sample sizing and convergence rates, efficiency and consistency are intimately coupled with the speed of the underlying recursion, with faster recursions yielding a wider regime of optimal sampling rates. The implications of our results for practical implementation are immediate since they provide specific guidance on optimal simulation expenditure within a variety of stochastic recursions.

Key words. simulation-based recursions, machine learning, stochastic optimization, stochastic gradient

AMS subject classifications. 90CXX, 62LXX, 93E35, 68Q32

1. Introduction.
Received by the editors January 6, 2014; accepted for publication (in revised form) October 10, 2017; published electronically January 9, 2018. Funding: The work of the first author was supported by Office of Naval Research contracts N and N and National Science Foundation grant CMMI. He is also grateful for the financial and logistics support provided by IBM Research, Yorktown Heights, NY, where he spent his sabbatical year. Department of Statistics, Purdue University, West Lafayette, IN (pasupath@purdue.edu). Department of Management Science and Engineering, Stanford University, Stanford, CA (glynn@stanford.edu). T.J. Watson IBM Research, Yorktown Heights, NY (ghosh@us.ibm.com). The Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA (fatemeh.s.hashemi@gmail.com).

We consider the question of sampling within algorithmic recursions that involve quantities needing to be estimated using a stochastic simulation. The prototypical example setting is simulation optimization (SO) [17, 26], where an optimization problem is to be solved using only a stochastic simulation capable of providing estimates of the objective function and constraints at a requested point. Another closely related example setting is the stochastic root finding problem (SRFP) [25, 28, 27], where the zero of a vector function is sought, with only simulation-based estimates of the function involved. SO problems and SRFPs, instead of stipulating that the functions involved in the problem statement be known exactly or in analytic form, allow implicit representation of functions through a stochastic simulation, thereby facilitating virtually any level of complexity. Such flexibility has resulted in adoption across widespread application contexts. A few examples are logistics [18, 19, 3], healthcare [1, 13, 11], epidemiology [14], and vehicular-traffic

systems [24]. A popular and reasonable solution paradigm for solving SO problems and SRFPs is to simply mimic what a solution algorithm might do within a deterministic context, after estimating any needed function and derivative values using the available stochastic simulation. An example serves to illustrate such a technique best. Consider the basic quasi-Newton recursion

(1)  $x_{k+1} = x_k - \alpha_k \tilde H_f^{-1}(x_k)\, \tilde\nabla f(x_k)$,

used to find a local minimum of a twice-differentiable real-valued function $f: \mathbb{R}^d \to \mathbb{R}$, where $\tilde H_f(x)$ and $\tilde\nabla f(x)$ are (deterministic) approximations of the true Hessian $H_f(x)$ and gradient $\nabla f(x)$ of the function $f$ at the point $x$. (We emphasize that $\tilde H_f(x)$ and $\tilde\nabla f(x)$ as they appear in (1) are deterministic and could be, for example, approximations obtained through appropriate finite-differencing of the function $f$ at a set of points around $x$.) Suppose that the context in consideration is such that only noisy simulation-based estimates of $f$ are available, implying that the recursion in (1) is not implementable as written. A reasonable adaptation of (1) might instead be the recursion

(2)  $X_{k+1} = X_k - \hat\alpha_k \hat H_f^{-1}(m_k, X_k)\, \hat\nabla f(m_k, X_k)$,

where $\hat\nabla f(m, x)$, $x \in \mathbb{R}^d$, and $\hat H_f(m, x)$, $x \in \mathbb{R}^d$, are simulation estimators of $\nabla f(x)$, $x \in \mathbb{R}^d$, and $H_f(x)$, $x \in \mathbb{R}^d$, constructed using estimated function values, and the step-length $\hat\alpha_k$ estimates the step-length $\alpha_k$ appearing in the deterministic recursion (1). The simulation effort $m_k$ in (2) is general and might represent the number of simulation replications in the case of terminating simulations or the simulation run length in the case of nonterminating simulations [21]. While the recursion in (2) is intuitively appealing, important questions arise within its context. Since the exact function value $f(x)$ at any point $x$ is unknown and needs to be estimated using stochastic sampling, one might ask how much sampling $m_k$ should be performed during each iteration.
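To make the sampling question concrete, the following is a minimal runnable sketch of a recursion in the spirit of (2), assuming an illustrative objective $f(x) = \tfrac{1}{2}\|x\|^2$ whose gradient and Hessian are observed through synthetic Gaussian noise; the function names, noise model, and constants are our own illustrative choices, not from the paper:

```python
import numpy as np

def mc_gradient(x, m, rng):
    """Sample-mean estimate of the gradient of f(x) = 0.5*||x||^2
    (the true gradient is x) from m noisy observations."""
    return x + rng.standard_normal((m, x.size)).mean(axis=0)

def mc_hessian(x, m, rng):
    """Sample-mean estimate of the Hessian (the true Hessian is the identity)."""
    d = x.size
    return np.eye(d) + rng.standard_normal((m, d, d)).mean(axis=0)

def sampled_quasi_newton(x0, n_iters=15, m0=32, c=1.5, alpha_step=1.0, seed=0):
    """Recursion in the spirit of (2): X_{k+1} = X_k - alpha * H^{-1} g,
    with sample sizes m_k grown geometrically across iterations."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    m = m0
    for _ in range(n_iters):
        g = mc_gradient(x, m, rng)
        H = mc_hessian(x, m, rng)
        x = x - alpha_step * np.linalg.solve(H, g)
        m = int(np.ceil(c * m))  # Geometric(c) sample-size growth
    return x

x_final = sampled_quasi_newton([5.0, -3.0])
print(np.linalg.norm(x_final))
```

With too little sampling (say, a constant $m_k$), the same recursion tends to stall at a noise floor; how fast $m_k$ must grow is precisely the trade-off the paper quantifies.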
Inadequate sampling can cause nonconvergence of (2) due to repeated mis-steps from which iterates in (2) might fail to recover. Such nonconvergence can be avoided through increased sampling, that is, using large $m_k$ values; however, such increased sampling translates to an increase in computational complexity and an associated decreased convergence rate. The questions we answer in this paper pertain to the (simulation) sampling effort expended within recursions such as (2). Our interest is a generalized version of (2) that we call sampling controlled stochastic recursion (SCSR), which will be defined more rigorously in section 3. Within the context of SCSR, we ask the following questions.

Q.1 What sampling rates in SCSR ensure that the resulting iterates are strongly consistent, that is, converge to the correct solution with probability one?

Q.2 What is the convergence rate of the iterates resulting from SCSR, expressed as a function of the sample sizes and the speed of the underlying deterministic recursion?

Q.3 With reference to Q.2, are there specific SCSR recursions that guarantee a canonical rate, that is, the fastest achievable convergence speed under generic sampling?

Q.4 What do the answers to Q.1–Q.3 imply for practical implementation?

Questions such as what we ask in this paper have recently been considered [15, 10, 29] but usually within a specific algorithmic context. (An exception is [7], which

broadly treats the complexity trade-offs stemming from estimation, approximation, and optimization errors within large-scale learning problems.) In [15], for instance, the behavior of the stochastic gradient descent recursion

(3)  $x_{k+1} = x_k - \alpha_k g_k$

is considered for optimizing a smooth function $f$, where $\alpha_k$ is the step size used during the $k$th iteration and $g_k$ is an estimate of the gradient $\nabla f(x_k)$. Importantly, $g_k$ is assumed to be estimated such that the error in the estimate $e_k = g_k - \nabla f(x_k)$ satisfies $\mathbb{E}[\|e_k\|^2] \le B_k$, where $B_k$ is a per-iteration bound that can be seen to be related to the notion of sample size in this paper. The results in [15] detail the functional relationship between the convergence rate of the sequence $\{x_k\}$ in (3) and the chosen sequence $\{B_k\}$. Like in [15], the recursion considered in [10] is again (3), but [10] considers the question more directly, proposing a dynamic sampling scheme akin to that in [29] that is a result of balancing the variance and the squared bias of the gradient estimate at each step. One of the main results in [10] states that when sample sizes grow geometrically across iterations, the resulting iterates in (3) exhibit the fastest achievable convergence rate, something that will be reaffirmed for SCSR recursions considered in this paper. As already noted, we consider the questions Q.1–Q.4 within a recursive context (SCSR) that is more general than (3) or (2). Our aim is to characterize the relationship between the errors due to recursion and sampling that naturally arise in SCSR, and their implication for SO and SRFP algorithms. We will demonstrate through our answers that these errors are inextricably linked and fully characterizable. Furthermore, we will show that such characterization naturally leads to sampling regimes which, when combined with a deterministic recursion of a specified speed, result in specific SCSR convergence rates.
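The geometric-batching prescription attributed above to [10] can be tried out in a few lines; the objective, noise model, and constants below are illustrative stand-ins, not taken from [10] or [15]:

```python
import numpy as np

def batched_sgd(x0, n_iters=15, step=0.5, m0=4, c=1.5, seed=1):
    """Recursion (3) with g_k a batch mean of m_k noisy gradient observations,
    so the error bound B_k = E||e_k||^2 is proportional to 1/m_k and shrinks
    geometrically when m_k grows geometrically.
    Illustrative objective: f(x) = 0.5*||x||^2, so grad f(x) = x; noise ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    m = m0
    errors = []
    for _ in range(n_iters):
        g = x + rng.standard_normal((int(m), x.size)).mean(axis=0)
        x = x - step * g
        m *= c  # geometric batch growth
        errors.append(float(np.linalg.norm(x)))
    return errors

errors = batched_sgd([10.0, 10.0])
print(errors[-1])
```

The per-iteration errors shrink roughly geometrically here because both the deterministic contraction and the noise bound $B_k$ do; with a fixed batch size the error would instead flatten out at a noise-determined level.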
The implication for implementation seems clear: given the choice of the deterministic recursive structure in use, our error characterization suggests sampling rates that should be employed in order to enjoy the best achievable SCSR convergence rates.

1.1. Summary and insight from main results. The results we present are broadly divided into those concerning the strong consistency of SCSR iterates and those pertaining to SCSR's efficiency as defined from the standpoint of the total amount of simulation effort. Insight on consistency appears in the form of Theorem 5.2, which relates the estimator quality in SCSR with the minimum sampling rate that will guarantee almost sure convergence. Theorem 5.2 is deliberately generic in that it makes only mild assumptions about the speed of the recursion in use within SCSR and about the simulation estimator quality. Theorem 5.2 also guarantees convergence (to zero) of the mean absolute deviation (or $L_1$ convergence) of SCSR's iterates to a solution. Several theorems and associated corollaries are devoted to efficiency issues surrounding SCSR. Of these, the most important characterize the convergence rate of SCSR as a function of the sampling rate and the speed of recursion in use. Specifically, as summarized in Figure 1, these results characterize the sampling regimes resulting in predominantly sampling error ("too little sampling") versus those resulting in predominantly recursion error ("too much sampling"), along with identifying the convergence rates for all recursion-sampling combinations. Furthermore, and as illustrated using the shaded region in Figure 1, the same results identify those recursion-sampling combinations yielding the optimal rate, that is, the highest achievable convergence rates with the given simulation estimator at hand. As

it turns out, and as implied by these results, recursions that utilize more structural information afford a wider range of sampling rates that produce the optimal rate. For instance, they imply that recursions such as (2) will achieve the optimal rate if the sampling rate is either geometric, or superexponential up to a certain threshold; sampling rates falling outside this regime yield subcanonical convergence rates for SCSR. (The notions of optimal rates, sampling rates, and recursion rates will be defined rigorously in short order.) The corresponding regime when using a linearly converging recursion such as a fixed-point recursion is narrower and limited to a small band of geometric sampling rates. Interestingly, our results show that sublinearly converging recursions are incapable of yielding optimal rates for SCSR, that is, the sampling regime that produces optimal rates when a sublinearly converging recursion is in use is empty. We also present a result (Theorem 6.10) that provides a complexity bound on the mean absolute error of the SCSR iterates under more restrictive assumptions on the behavior of the recursion in use.

1.2. Paper organization. The rest of the paper is organized as follows. In the ensuing section, we introduce much of the standing notation and conventions used throughout the paper. This is followed by section 3, where we present a rigorous problem statement, and by section 4, where we present specific nontrivial examples of SCSR recursions. Sections 5 and 6 contain the main results of the paper. We provide concluding remarks in section 7, with a brief commentary on implementation and the use of stochastic sample sizes.

2. Notation and convention. We will adopt the following notation throughout the paper. For more details, especially on the convergence of sequences of random variables, see [5].

(i) If $x \in \mathbb{R}^d$ is a vector, then its components are denoted through $x \equiv (x^{(1)}, x^{(2)}, \ldots, x^{(d)})$.
(ii) We use $e_i \in \mathbb{R}^d$ to denote a unit vector whose $i$th component is 1 and whose every other component is 0, that is, $e_i^{(i)} = 1$ and $e_i^{(j)} = 0$ for $j \neq i$.

(iii) For a sequence of random variables $\{Z_n\}$, we say $Z_n \xrightarrow{p} Z$ if $\{Z_n\}$ converges to $Z$ in probability; we say $Z_n \xrightarrow{d} Z$ to mean that $\{Z_n\}$ converges to $Z$ in distribution; we say $Z_n \xrightarrow{L_p} Z$ if $\mathbb{E}[\|Z_n - Z\|^p] \to 0$; and finally, we say $Z_n \xrightarrow{wp1} Z$ to mean that $\{Z_n\}$ converges to $Z$ with probability one. When $Z_n \xrightarrow{wp1} z$, where $z$ is a constant, we will say that $Z_n$ is strongly consistent with respect to $z$.

(iv) $\mathbb{Z}_+$ denotes the set of positive integers.

(v) $B_r(\bar x) \equiv \{x : \|x - \bar x\| \le r\}$ denotes the $d$-dimensional Euclidean ball centered on $\bar x$ and having radius $r$.

(vi) $\mathrm{dist}(x, B) = \inf\{\|x - y\| : y \in B\}$ denotes the Euclidean distance between a point $x \in \mathbb{R}^d$ and a set $B \subset \mathbb{R}^d$.

(vii) $\mathrm{diam}(B) = \sup\{\|x - y\| : x, y \in B\}$ denotes the diameter of the set $B \subset \mathbb{R}^d$.

(viii) For a sequence of real numbers $\{a_n\}$, we say $a_n = o(1)$ if $\lim_n a_n = 0$ and $a_n = O(1)$ if $\{a_n\}$ is bounded, i.e., there exists $c \in (0, \infty)$ with $|a_n| < c$ for large enough $n$. We say that $a_n = \Theta(1)$ if $0 < \liminf a_n \le \limsup a_n < \infty$. For positive-valued sequences $\{a_n\}, \{b_n\}$, we say $a_n = O(b_n)$ if $a_n / b_n = O(1)$ as $n \to \infty$; we say $a_n = \Theta(b_n)$ if $a_n / b_n = \Theta(1)$ as $n \to \infty$.

(ix) For a sequence of positive-valued random variables $\{A_n\}$, we say $A_n = o_p(1)$ if $A_n \xrightarrow{p} 0$ as $n \to \infty$; and we say $A_n = O_p(1)$ if $\{A_n\}$ is stochastically bounded, that is, for given $\epsilon > 0$ there exists $c(\epsilon) \in (0, \infty)$ with $P(A_n < c(\epsilon)) > 1 - \epsilon$ for large enough $n$. If $\{B_n\}$ is another sequence of positive-valued random variables, we say $A_n = O_p(B_n)$ if $A_n / B_n = O_p(1)$ as $n \to \infty$; we say $A_n = o_p(B_n)$ if $A_n / B_n = o_p(1)$ as $n \to \infty$. Also, when we say $A_n \le O_p(b_n)$, we mean that $A_n \le B_n$, where $\{B_n\}$ is a random sequence that satisfies $B_n = O_p(b_n)$.

(x) For two sequences of real numbers $\{a_n\}, \{b_n\}$, we say $a_n \sim b_n$ if $\lim_n a_n / b_n = 1$.

Also, the following notions will help our exposition and will be used heavily.

Definition 2.1 (growth rate of a sequence). A sequence $\{m_k\}$ is said to exhibit Polynomial$(\lambda_p, p)$ growth if $m_k = \lambda_p k^p$, $k = 1, 2, \ldots$, for some $\lambda_p, p \in (0, \infty)$; it is said to exhibit Geometric$(c)$ growth if $m_{k+1} = c\, m_k$, $k = 0, 1, 2, \ldots$, for some $c \in (1, \infty)$; and it is said to exhibit SupExponential$(\lambda_t, t)$ growth if $m_{k+1} = \lambda_t m_k^t$, $k = 0, 1, 2, \ldots$, for some $\lambda_t \in (0, \infty)$, $t \in (1, \infty)$.

Definition 2.2 (a sequence increasing faster than another). Let $\{m_k\}$ and $\{\tilde m_k\}$ be two positive-valued increasing sequences that tend to infinity. Then $\{m_k\}$ is said to increase faster than $\{\tilde m_k\}$ if $m_{k+1}/m_k \ge \tilde m_{k+1}/\tilde m_k$ for large enough $k$. In such a case, $\{\tilde m_k\}$ is also said to increase slower than $\{m_k\}$.

According to Definitions 2.1 and 2.2, it can be seen that any sequence that is growing as SupExponential$(\lambda_t, t)$ is faster than any other sequence that is growing as Geometric$(c)$; likewise, any sequence growing as Geometric$(c)$ is faster than any other sequence growing as Polynomial$(\lambda_p, p)$.

3. Problem setting and assumptions. The general context that we consider is that of unconstrained sampling-controlled stochastic recursions (SCSR), defined through the following recursion:

(SCSR)  $X_{k+1} = X_k + H_k(m_k, X_k)$, $k = 0, 1, 2, \ldots$,

where $X_k \in \mathbb{R}^d$ for all $k$. The deterministic analogue (DA) of SCSR is

(DA)  $x_{k+1} = x_k + h_k(x_k)$, $k = 0, 1, 2, \ldots$.

The random function $H_k(m, x)$, $x \in \mathbb{R}^d$, called the simulation estimator, should be interpreted as estimating the corresponding deterministic quantity $h_k(x)$ at the point of interest $x$, after expending $m$ amount of simulation effort. We emphasize that the objects $h_k(\cdot)$ and $H_k(m, \cdot)$ appearing in (DA) and (SCSR) can be iteration-dependent functions. Two illustrative examples are presented in section 4.

3.1. Assumptions. The following two assumptions are standing assumptions that will be invoked in several of the important results of the paper. Further assumptions will be made as and when required.

Assumption 3.1.
The recursion (DA) exhibits global convergence to a unique point $x^*$; that is, the sequence $\{x_k\}$ of iterates generated by (DA) when started with any initial point $x_0$ satisfies $\lim_k x_k = x^*$.

Assumption 3.2. Denote by $\mathcal{F}_k = \sigma\{X_0, H_0(m_0, X_0), X_1, H_1(m_1, X_1), \ldots, X_k, H_k(m_k, X_k)\}$ the filtration generated by the history sequence after $k$ iterations. Then the simulation estimator $H_k(m_k, X_k)$ satisfies, for $k \ge 1$, with probability one,

(4)  $\mathbb{E}\left[\, m_k^{\alpha} \left\| H_k(m_k, X_k) - h_k(X_k) \right\| \mid \mathcal{F}_{k-1} \right] \le \kappa_0 + \kappa_1 \|X_k\|$

for some $\alpha > 0$, and where $\kappa_0, \kappa_1$ are some positive constants. We will refer to the constant $\alpha$ as the convergence rate associated with the simulation estimator.

Assumption 3.1 assumes convergence of the deterministic recursion (DA)'s iterates starting from any initial point $x_0$. Such an assumption is needed if we were to expect

stochastic iterations in (SCSR) to converge to the correct solution in any reasonable sense. We view the deterministic recursion (DA) to be the limiting form of (SCSR), obtained, for example, if the estimator $H_k(m, x)$ at hand is a perfect estimator of $h_k(x)$, constructed using a hypothetical infinite sample. Assumption 3.2 is a statement about the behavior of the simulation estimator $H_k(m, x)$, $x \in \mathbb{R}^d$, and is analogous to standard assumptions in the literature on stochastic approximation and machine learning, e.g., Assumption A3 in [6] and Assumption 4.3(b),(c) in [8]. In order to develop convergent algorithms for the context we consider in this paper, some sort of restriction on the extent to which a simulation estimator can mislead an algorithm is necessary. Assumption 3.2 is a formal codification of such a restriction; it implies that the error in the estimator $H_k(m_k, X_k)$, conditional on the history of the observed random variables up to iteration $k$, decays with rate $\alpha$. Furthermore, the manner of such decay can depend on the current iterate $X_k$. Assumption 3.2 subsumes typical stochastic optimization contexts where the mean squared error of the simulation estimator (with respect to the true objective function value) at any point is bounded by an affine function of the squared $L_2$-norm of the true gradient at the point, assuming that the gradient function is Lipschitz.

3.2. Work and efficiency. In the analysis considered throughout this paper, computational effort calculations are limited to simulation effort. Therefore, the total work done through $k$ iterations of SCSR is given by $W_k = \sum_{i=1}^{k} m_i$. Our assessment of any sampling strategy will be based on how fast the error $E_k = \|X_k - x^*\|$ in the $k$th iterate of SCSR (stochastically) converges to zero as a function of the total work $W_k$.
This will usually be achieved by first identifying the convergence rate of $E_k$ with respect to the iteration number $k$ and then translating this rate with respect to the total work $W_k$. Under mild conditions, we will demonstrate that $E_k$ cannot converge to zero faster than $W_k^{-\alpha}$ (in a certain rigorous sense), where $\alpha$ is defined through Assumption 3.2. This makes intuitive sense because it seems reasonable to expect that a stochastic recursion's quality is at most as good as the quality of the estimator at hand. We will then deem those recursions having error sequences $\{E_k\}$ that achieve the convergence rate $W_k^{-\alpha}$ as being efficient. The convergence rate of $E_k$ with respect to the iteration number $k$ alone is of little significance.

4. Examples. In this section, we illustrate SCSR using two popular recursions occurring within the context of SO and SRFPs. For each example, we show the explicit form of the SCSR and the DA recursions through their corresponding functions $H_k(m, \cdot)$ and $h_k(\cdot)$. We also identify the estimator convergence rate $\alpha$ in each case.

4.1. Sampling controlled gradient method with fixed step. Consider the context of solving an unconstrained optimization problem using the gradient method [9, section 9.3], usually written as

(5)  $x_{k+1} = x_k + t\,(-\nabla f(x_k))$, $k = 0, 1, \ldots$,

where $f: \mathbb{R}^d \to \mathbb{R}$ is the real-valued function being optimized, $\nabla f: \mathbb{R}^d \to \mathbb{R}^d$ is its gradient function, and $t > 0$ is an appropriately chosen constant. (Instead of a fixed stepsize $t$ in (5), one might use a diminishing stepsize sequence $\{t_k\}$ chosen to satisfy

$t_k \to 0$, $\sum_{k=1}^{\infty} t_k = \infty$ [4, Chapter 1].) Owing to its simplicity, the recursion in (5) has recently become popular in large-scale SO contexts [8]. Let us now suppose that the gradient function $g(\cdot) \equiv \nabla f(\cdot)$ in (5) is unobservable, but we have access to i.i.d. observations $G_i(x)$, $i = 1, 2, \ldots$, satisfying $\mathbb{E}[G_i(x)] = g(x)$ for any $x \in \mathbb{R}^d$. The sampling controlled version of the gradient method then takes the form

(6)  $X_{k+1} = X_k + t\left( - m_k^{-1} \sum_{i=1}^{m_k} G_i(X_k) \right)$, $k = 0, 1, \ldots$,

thus implying the (SCSR) and (DA) recursive objects $H_k(m, x) \equiv t\,(-m^{-1}\sum_{i=1}^{m} G_i(x))$ and $h_k(x) \equiv -t\,\nabla f(x)$ for all $x \in \mathbb{R}^d$. Using standard arguments [23, Theorem ], it can be shown that when $f$ is strongly convex and differentiable with a gradient that satisfies $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y \in \mathbb{R}^d$, $L < \infty$, and the step size $t \le L^{-1}$, the iterates in (5) exhibit linear convergence to a zero of $\nabla f$. Furthermore, elementary probabilistic arguments show that Assumption 3.2 is satisfied with rate constant $\alpha = 1/2$.

4.2. Sampling controlled Kiefer–Wolfowitz iteration. Let us consider unconstrained simulation optimization on a differentiable function $f: \mathbb{R}^d \to \mathbb{R}$ that is estimated using $F(m, x) = m^{-1} \sum_{i=1}^{m} F_i(x)$, where $F_i(x)$, $x \in \mathbb{R}^d$, $i = 1, 2, \ldots$, are i.i.d. copies of an unbiased function estimator $F(x)$, $x \in \mathbb{R}^d$, of $f$. Assume that we do not have direct stochastic observations of the gradient function $\nabla f(x)$, so that the current context differs from that in section 4.1. (This context has recently been called the zeroth order [16] for the reason that only function estimates are available.) We thus choose the SCSR iteration to be a modified Kiefer–Wolfowitz [20] iteration constructed using a finite difference approximation of the stochastic function observations.
Specifically, recalling the notation $G \equiv (G^{(1)}, G^{(2)}, \ldots, G^{(d)})$, suppose

(7)  $X_{k+1} = X_k - t\,G(m_k, X_k)$, $k = 0, 1, \ldots$,

where

(8)  $G^{(i)}(m_k, X_k) = \dfrac{F(m_k, X_k + s_k^{(i)} e_i) - F(m_k, X_k - s_k^{(i)} e_i)}{2\, s_k^{(i)}}$

estimates the $i$th partial derivative of $f$ at $X_k$, $s_k \equiv (s_k^{(1)}, s_k^{(2)}, \ldots, s_k^{(d)})$ is the vector of finite-difference steps, and $t$ is an appropriately chosen constant. Assume, for simplicity, that the function observations generated at $X_k - s_k^{(i)} e_i$ are independent of those generated at $X_k + s_k^{(i)} e_i$. In the notation of (SCSR) and (DA), the simulation estimator $H_k(m, x) \equiv -t\,G(m, x)$ and $h_k(x) \equiv -t\,\nabla f(x)$ for all $x \in \mathbb{R}^d$, assuming that $s_k$ is chosen so that $s_k^{(i)} \to 0$ and $m_k s_k^{(i)} \to \infty$, $i = 1, 2, \ldots, d$. Furthermore, if $s_k$ is chosen as $s_k^{(i)} = c\,m_k^{-1/6}$ and $f$ has a bounded third derivative, then Assumption 3.2 is satisfied with $\alpha = 1/3$ [2, Proposition 1.1]. Also, the deterministic recursion (DA) corresponding to (7) is the same as that in section 4.1, and the iteration complexity discussed there applies here as well.

Remark 4.1. In (8), derivative estimators with faster convergence rates can be constructed by estimating higher order derivatives of $f$. For instance, by observing $G^{(i)}(m, x + u_j)$, $j = 1, 2, \ldots, n$, at $n$ strategically located design points $x + u_1, x + u_2, \ldots, x + u_n$, the error $\mathbb{E}[\|H_k(m, x) - h_k(x)\|] = O(m^{-n/(2n+1)})$; that is, the error in the estimator can be made arbitrarily close to the Monte Carlo canonical rate $O(m^{-1/2})$ [2, Chapter VII, section 1a].

5. Consistency. In this section, we present a result that clarifies the conditions on the sampling rates to ensure that the iterates produced by SCSR exhibit almost sure convergence to the solution $x^*$. We will rely on the following elegant result that appears in a slightly more specific form as Lemma 11 on page 50 of [30].

Lemma 5.1. Let $\{V_k\}$ be a sequence of nonnegative random variables, where $\mathbb{E}[V_0] < \infty$, and let $\{r_k\}$ and $\{q_k\}$ be deterministic scalar sequences such that $\mathbb{E}[V_{k+1} \mid V_0, V_1, \ldots, V_k] \le (1 - r_k)V_k + q_k$ almost surely for $k \ge k_0$, where $k_0 \ge 0$ is fixed, $0 \le r_k \le 1$, $q_k \ge 0$, $\sum_{k=0}^{\infty} r_k = \infty$, $\sum_{k=0}^{\infty} q_k < \infty$, and $\lim_k r_k^{-1} q_k = 0$. Then $\lim_k V_k = 0$ almost surely and $\lim_k \mathbb{E}[V_k] = 0$.

We now state the main consistency result for (SCSR) when the corresponding deterministic DA recursion exhibits Sub-Linear$(s)$ or Linear$(l)$ convergence.

Theorem 5.2. Let Assumptions 3.1 and 3.2 hold. Let the sample size sequence $\{m_k\}$ satisfy $m_k^{-1} = O(k^{-(1/\alpha) - \delta})$ for some $\delta > 0$. (The constant $\alpha$ is the convergence rate of the simulation estimator appearing in Assumption 3.2.)

(i) Suppose the recursion (DA) guarantees a Sub-Linear$(s)$ decrease at each $k$, that is, for every $x$, $k$, and some $s \in (0, 1)$,

(9)  $\|x + h_k(x) - x^*\| \le (1 - s\,k^{-1})\, \|x - x^*\|$.

Then $X_k \xrightarrow{wp1} x^*$ and $\mathbb{E}[\|X_k - x^*\|] \to 0$.

(ii) Suppose the recursion (DA) guarantees a Linear$(l)$ decrease at each $k$, that is, for every $x$, $k$, and some $l \in (0, 1)$, the recursion (DA) satisfies

(10)  $\|x + h_k(x) - x^*\| \le l\, \|x - x^*\|$.

Then $X_k \xrightarrow{wp1} x^*$ and $\mathbb{E}[\|X_k - x^*\|] \to 0$.

Proof. Let us first prove the assertion in (i). Using (SCSR) and recalling the unique solution $x^*$ to the recursion (DA), we can write

(11)  $X_{k+1} - x^* = X_k + h_k(X_k) - x^* + H_k(m_k, X_k) - h_k(X_k)$, $k = 0, 1, 2, \ldots$.

Denoting $E_k = \|X_k - x^*\|$, (11) gives

(12)  $E_{k+1} \le (1 - s\,k^{-1})\, E_k + \|H_k(m_k, X_k) - h_k(X_k)\|$, $k = 0, 1, 2, \ldots$.
Now conditioning on $\mathcal{F}_{k-1}$ and then taking expectation on both sides of (12), we get

(13)
$\mathbb{E}[E_{k+1} \mid \mathcal{F}_{k-1}] \le (1 - s\,k^{-1})\, E_k + \mathbb{E}[\|H_k(m_k, X_k) - h_k(X_k)\| \mid \mathcal{F}_{k-1}]$
$\le (1 - s\,k^{-1})\, E_k + \kappa_0 m_k^{-\alpha} + \kappa_1 \|X_k\|\, m_k^{-\alpha}$
$\le (1 - s\,k^{-1} + \kappa_1 m_k^{-\alpha})\, E_k + (\kappa_0 + \kappa_1 \|x^*\|)\, m_k^{-\alpha}$
$= \left(1 - (s\,k^{-1} - \kappa_1 m_k^{-\alpha})\right) E_k + (\kappa_0 + \kappa_1 \|x^*\|)\, m_k^{-\alpha}$.

If the sequence $\{m_k\}$ is chosen so that $m_k^{-1} = O(k^{-(1/\alpha) - \delta})$ for some $\delta > 0$ (as has been postulated by the theorem), then for any given $\epsilon > 0$, we see that $\kappa_1 m_k^{-\alpha} < \epsilon\, k^{-1}$ for large enough $k$. Therefore, after integrating out the random variables $H_i(m_i, X_i)$, $i = 0, 1, \ldots, k-1$, in (13), we can write for any given $\epsilon \in (0, s)$ and large enough $k$ that

(14)  $\mathbb{E}[E_{k+1} \mid E_0, E_1, \ldots, E_k] \le \left(1 - \frac{s - \epsilon}{k}\right) E_k + (\kappa_0 + \kappa_1 \|x^*\|)\, m_k^{-\alpha}$.

Now, if we apply Lemma 5.1 to (14) with $r_k \equiv (s - \epsilon)\, k^{-1}$ and $q_k \equiv \beta m_k^{-\alpha}$ for $\beta = \kappa_0 + \kappa_1 \|x^*\|$, then $\sum_{k=1}^{\infty} r_k = \sum_{k=1}^{\infty} (s - \epsilon)\, k^{-1} = \infty$, $\sum_{k=1}^{\infty} q_k = \sum_{k=1}^{\infty} \beta m_k^{-\alpha} = O(\sum_{k=1}^{\infty} k^{-1 - \alpha\delta}) < \infty$, and $\limsup_k r_k^{-1} q_k \le \limsup_k \beta (s - \epsilon)^{-1} k^{-\alpha\delta} = 0$. We thus see that the postulates of Lemma 5.1 hold, implying that $E_k \xrightarrow{wp1} 0$ and $\mathbb{E}[E_k] \to 0$.

Next, suppose the recursion (DA) exhibits Linear$(l)$ convergence. The inequality analogous to (14) is then

(15)  $\mathbb{E}[E_{k+1} \mid E_0, E_1, \ldots, E_k] \le \left(1 - (1 - l - \kappa_1 m_k^{-\alpha})\right) E_k + (\kappa_0 + \kappa_1 \|x^*\|)\, m_k^{-\alpha}$.

Since $m_k \to \infty$, we see that for any given $\epsilon \in (0, 1 - l)$ and large enough $k$,

(16)  $\mathbb{E}[E_{k+1} \mid E_0, E_1, \ldots, E_k] \le \left(1 - (1 - l - \epsilon)\right) E_k + (\kappa_0 + \kappa_1 \|x^*\|)\, m_k^{-\alpha}$.

Now, apply Lemma 5.1 to (16) with $r_k \equiv 1 - l - \epsilon$ and $q_k \equiv \beta m_k^{-\alpha}$ for $\beta = \kappa_0 + \kappa_1 \|x^*\|$. If the sequence $\{m_k\}$ is chosen so that $m_k^{-1} = O(k^{-(1/\alpha) - \delta})$ for some $\delta > 0$, then $\sum_{k=1}^{\infty} r_k = \sum_{k=1}^{\infty} (1 - l - \epsilon) = \infty$, $\sum_{k=1}^{\infty} q_k = \sum_{k=1}^{\infty} \beta m_k^{-\alpha} = O(\sum_{k=1}^{\infty} k^{-1 - \alpha\delta}) < \infty$, and $\limsup_k r_k^{-1} q_k \le \limsup_k \beta (1 - l - \epsilon)^{-1} k^{-1 - \alpha\delta} = 0$. We thus see that the postulates of Lemma 5.1 hold, implying that $E_k \xrightarrow{wp1} 0$ and $\mathbb{E}[E_k] \to 0$.

It is important to note that the assumed decrease condition, (9) or (10), is on the (hypothetical) deterministic recursion (DA) and not the stochastic recursion (SCSR). The motivating setting here is unconstrained convex minimization, where a decrease such as (9) or (10) can usually be guaranteed. The theorem can be relaxed to more general settings where the decrease condition (10) holds only when $X_k$ is close enough to $x^*$, but as we show later when we characterize convergence rates, we will still need a weak decrease condition such as (9) to hold for all $X_k$. For this reason, part (i) in Theorem 5.2 should be seen as the main result on the strong consistency of SCSR.
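Theorem 5.2's sample-size stipulation can be checked numerically on a one-dimensional caricature: a Linear$(l)$ DA map $h_k(x) = -(1-l)(x - x^*)$ perturbed by noise whose size decays like $m_k^{-\alpha}$, emulating a sample-mean estimator with $\alpha = 1/2$. The setup and constants below are our own illustrative choices, not the paper's:

```python
import numpy as np

def scsr_linear_da(x_star=3.0, l=0.5, alpha=0.5, delta=0.5, n_iters=200, seed=2):
    """SCSR with the Linear(l) DA map h(x) = -(1 - l)(x - x_star) and an
    estimator error of size m_k^{-alpha} (the std of a sample mean when
    alpha = 1/2).  Sample sizes m_k = ceil(k^{1/alpha + delta}) satisfy the
    stipulation m_k^{-1} = O(k^{-(1/alpha)-delta}) of Theorem 5.2."""
    rng = np.random.default_rng(seed)
    x = 0.0
    for k in range(1, n_iters + 1):
        m = int(np.ceil(k ** (1.0 / alpha + delta)))
        h_est = -(1 - l) * (x - x_star) + rng.standard_normal() / np.sqrt(m)
        x = x + h_est
    return abs(x - x_star)

final_error = scsr_linear_da()
print(final_error)
```

Replacing the growth rule with, say, a constant $m_k$ leaves a persistent noise floor of size roughly $m^{-\alpha}/(1-l)$, consistent with the role the stipulation plays in the proof.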
The stipulation $m_k^{-1} = O(k^{-(1/\alpha) - \delta})$ for some $\delta > 0$ in Theorem 5.2 amounts to a weak stipulation on the sample size increase rate for guaranteeing strong consistency and $L_1$ convergence. That the minimum stipulated sample size increase depends on the quality (as encoded by the convergence rate $\alpha$) of the simulation estimator is to be expected. However, part (ii) of Theorem 5.2 implies that the minimum stipulated sample size increase does not depend on the speed of the underlying deterministic recursion as long as it exceeds a sublinear rate. So, when a linear decrease (10) as in part (ii) of Theorem 5.2 is ensured, the sample size stipulation $m_k^{-1} = O(k^{-(1/\alpha) - \delta})$ needed for strong consistency remains the same. This, as we shall see in greater detail in ensuing sections, is because sampling error dominates the error due to recursion and is hence decisive in determining whether the iterates converge.

6. Convergence rates and efficiency. In this section, we present results that shed light on the convergence rate and the efficiency of SCSR under different sampling

and recursion contexts. Specifically, we derive the convergence rates associated with using various combinations of sample size increases (polynomial, geometric, superexponential) and the speed of convergence of the DA recursion (sublinear, linear, superlinear). This information is then used to identify what sample size growth rates may be best, that is, efficient, for various combinations of recursive structures and simulation estimators. (See Figure 6.1 for a concise and intuitive summary of the results in this section.) In what follows, convergence rates are first expressed as a function of the iteration number $k$ and the various constants associated with sampling and recursion. These obtained rates are then related to the total work done through $k$ iterations of SCSR, given by $W_k = \sum_{i=1}^{k} m_i$, in order to obtain a sense of the efficiency. As we show next, the quantity $W_k^{-\alpha}$ is a stochastic lower bound on the error $E_k$ in the SCSR iterates; thus, loosely speaking, $\alpha$ is an upper bound on the convergence rate of the error in SCSR iterates. It is in this sense that we say SCSR's iterates are efficient whenever they attain the rate $W_k^{-\alpha}$.

Theorem 6.1. Let the postulates of Theorem 5.2 hold with a nondecreasing sample size sequence $\{m_k\}$, and let the recursion (DA) satisfy postulate (i) in Theorem 5.2. Furthermore, suppose there exist $\delta, \delta', \epsilon' > 0$ and a set $B_\delta(x^*)$ such that, for large enough $k$,

(17)  $\inf_{\{(x,u)\,:\, x \in B_\delta(x^*);\, \|u\| = 1\}} P\!\left( m^{\alpha} \left( H_k(m, x) - h_k(x) \right)^T u \ge \delta' \right) > \epsilon'$.

Then the recursion SCSR cannot converge faster than $W_k^{-\alpha}$; that is, there exists $\tilde\epsilon > 0$ such that for any sequence of sample sizes $\{m_k\}$, $\liminf_k P(W_k^{\alpha} E_k > \bar\delta) > \tilde\epsilon$, where $\bar\delta := \min(\delta, \delta')$.

Proof. Since the postulates of Theorem 5.2 are satisfied, we are guaranteed that $X_k \xrightarrow{wp1} x^*$ and hence that $X_k \xrightarrow{p} x^*$. For proving the theorem, we will show that for large enough $k$, $P(m_k^{\alpha} E_k \ge \bar\delta) > \tilde\epsilon$, where $\tilde\epsilon > 0$. Since $W_k = \sum_{j=1}^{k} m_j \ge m_k$, the assertion of Theorem 6.1 will then hold. Recall that $\bar\delta = \min(\delta, \delta')$, where $\delta'$ is the constant appearing in (17).
Since $X_k \xrightarrow{p} x^*$, for any fixed $\epsilon > 0$ and large enough $k$, we have

(18)  $P(X_k \in B_{\bar\delta}(x^*)) \ge 1 - \epsilon$.

Denoting $U_k(X_k) := X_k + h_k(X_k) - x^*$, we can write for large enough $k$

(19)  $P(m_{k+1}^{\alpha} E_{k+1} \ge \bar\delta) \ge P(m_k^{\alpha} E_{k+1} \ge \bar\delta) = P(A_1) + P(A_2)$,

where the first inequality uses the nondecreasing nature of $\{m_k\}$, and the events $A_1$ and $A_2$ in (19) are defined as follows:

(20)  $A_1 := \left( m_k^{\alpha} E_{k+1} \ge \bar\delta \right) \cap \left( U_k(X_k) \neq 0 \right)$;  $A_2 := \left( m_k^{\alpha} E_{k+1} \ge \bar\delta \right) \cap \left( U_k(X_k) = 0 \right)$.

We also define the following two other events:

(21)  $C_1 := \left( m_k^{\alpha} (H_k(m_k, X_k) - h_k(X_k))^T U_k(X_k) \ge \bar\delta\, \|U_k(X_k)\| \right) \cap \left( U_k(X_k) \neq 0 \right)$;  $C_2 := \left( m_k^{\alpha} \|H_k(m_k, X_k) - h_k(X_k)\| \ge \bar\delta \right) \cap \left( U_k(X_k) = 0 \right)$.

Since $E_{k+1} = \|X_k + H_k(m_k, X_k) - x^*\|$, we notice that

(22)  $E_{k+1}^2 \ge 2\,(H_k(m_k, X_k) - h_k(X_k))^T U_k(X_k) + \|H_k(m_k, X_k) - h_k(X_k)\|^2$.

Due to (22) and the Cauchy–Schwarz inequality [5], we see that $C_1 \subseteq A_1$; due to (22), we also see that $C_2 \subseteq A_2$. Hence

(23)  $P(A_1) \ge P(C_1)$ and $P(A_2) \ge P(C_2)$.

Define $R_k := \{x : U_k(x) = 0\}$. Then, due to the assumption in (17), we see that for any $x \in B_{\bar\delta}(x^*) \cap R_k^c$,

(24)  $P(C_1 \mid X_k = x) > \epsilon'$.

And, since the Cauchy–Schwarz inequality [5] implies that $m_k^{\alpha}\, \|H_k(m_k, X_k) - h_k(X_k)\| \ge m_k^{\alpha}\, (H_k(m_k, X_k) - h_k(X_k))^T u$ for any unit vector $u$, we again see from the assumption in (17) that for any $x \in B_{\bar\delta}(x^*) \cap R_k$,

(25)  $P(C_2 \mid X_k = x) > \epsilon'$.

Next, letting $F_{X_k}$ denote the distribution function of $X_k$, we write

(26)  $P(C_1) = \int P(C_1 \mid X_k = x)\, dF_{X_k}(x) \ge \int P(C_1 \mid X_k = x)\, I\{x \in B_{\bar\delta}(x^*) \cap R_k^c\}\, dF_{X_k}(x) \ge \epsilon'\, P(X_k \in B_{\bar\delta}(x^*) \cap R_k^c)$,

where the second inequality in (26) follows from (24). Similarly,

(27)  $P(C_2) \ge \epsilon'\, P(X_k \in B_{\bar\delta}(x^*) \cap R_k)$.

Combining (26), (27), and (18), we see that for large enough $k$,

(28)  $P(C_1) + P(C_2) \ge \epsilon' (1 - \epsilon)$.

Finally, from (28), (23), and (19), we conclude that for large enough $k$,

(29)  $P(m_k^{\alpha} E_k \ge \bar\delta) > \tilde\epsilon := \epsilon' (1 - \epsilon)$,

and the theorem is proved.

Theorem 6.1 is important in that it provides a benchmark for efficiency. Specifically, Theorem 6.1 implies that sampling and recursion choices that result in errors achieving the rate $W_k^{-\alpha}$ are efficient. Theorem 6.1 relies on the assumption in (17), which puts an upper bound on the quality (convergence rate) of the simulation estimator $H_k(m, x)$; the reader will recall that Assumption 3.2 puts a lower bound on the quality of $H_k(m, x)$. The condition in (17) is weak, especially since $H_k(m, x)$, being a simulation estimator, is routinely governed by a central limit theorem (CLT) [5] of the form $m^{\alpha} (H_k(m, x) - h_k(x)) \xrightarrow{d} N(0, \Sigma(x))$ as $m \to \infty$, where $N(0, \Sigma(x))$ is a normal random variable with zero mean and covariance $\Sigma(x)$. We emphasize that Theorem 6.1 only says that $\alpha$ is an upper bound for the convergence rate of SCSR and says nothing about whether this rate is in fact achievable. We will now work toward a general lower bound on the sampling rates that achieve efficiency.
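The efficiency benchmark of Theorem 6.1 can be probed with a small experiment that tracks the normalized error $W_k^{\alpha} E_k$ under polynomial versus geometric sample-size growth, using a one-dimensional Linear$(l)$ DA map with synthetic noise; all constants are illustrative choices of ours:

```python
import numpy as np

def wk_alpha_error(growth, n_iters=40, l=0.5, alpha=0.5, x_star=1.0, seed=0):
    """Return W_k^alpha * E_k after n_iters SCSR steps on the Linear(l) DA map
    h(x) = -(1 - l)(x - x_star), with estimator noise of size m_k^{-alpha}."""
    rng = np.random.default_rng(seed)
    x, W = 0.0, 0
    for k in range(1, n_iters + 1):
        m = k ** 2 if growth == "poly" else 2 ** k  # Polynomial vs Geometric(2)
        W += m
        x += -(1 - l) * (x - x_star) + rng.standard_normal() / np.sqrt(m)
    return W ** alpha * abs(x - x_star)

# Average over independent sample paths: with polynomial sample sizes, the
# normalized error W_k^alpha E_k drifts upward (it is not O_p(1)), while with
# geometric sample sizes it stays stochastically bounded, anticipating the
# message of Theorem 6.3 below.
poly = float(np.mean([wk_alpha_error("poly", seed=s) for s in range(50)]))
geom = float(np.mean([wk_alpha_error("geom", seed=s) for s in range(50)]))
print(poly, geom)
```

The averaging over seeds matters: the claim is about stochastic boundedness of $W_k^{\alpha} E_k$, not about any single sample path.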
We will need the following lemma for proving such a lower bound.

Lemma 6.2. Let {a_k} be any positive-valued sequence. Then
(i) a_k = Θ(Σ_{j=1}^{k} a_j) if {a_k} is Geometric(c) or faster;
(ii) a_k = o(Σ_{j=1}^{k} a_j) if {a_k} is Polynomial(λ_p, p) or slower.

Proof of (i). If {a_k} is Geometric(c) or faster, we know that a_{k+1}/a_k ≥ c > 1 for k large enough. Hence, for some k_0 and all k ≥ j ≥ k_0, a_j/a_k ≤ c^{−(k−j)}. This implies that for k ≥ k_0,

(30) a_k^{−1} Σ_{j=1}^{k} a_j = a_k^{−1} Σ_{j=1}^{k_0} a_j + Σ_{j=k_0+1}^{k} a_j/a_k ≤ a_k^{−1} Σ_{j=1}^{k_0} a_j + Σ_{j=k_0+1}^{k} c^{−(k−j)} ≤ a_k^{−1} Σ_{j=1}^{k_0} a_j + 1/(1 − c^{−1}).

Using (30) and since a_k ≤ Σ_{j=1}^{k} a_j, we conclude that the assertion holds.

Proof of (ii). Let p > 0 be such that {a_k} is Polynomial(λ_p, p) or slower. We then know that for some k_0 > 0 and all k ≥ j ≥ k_0, a_j/a_k ≥ j^p/k^p. This implies that

(31) a_k^{−1} Σ_{j=1}^{k} a_j = a_k^{−1} Σ_{j=1}^{k_0} a_j + Σ_{j=k_0+1}^{k} a_j/a_k ≥ a_k^{−1} Σ_{j=1}^{k_0} a_j + k^{−p} Σ_{j=k_0+1}^{k} j^p.

Now notice that the term k^{−p} Σ_{j=k_0+1}^{k} j^p appearing on the right-hand side of (31) diverges as k → ∞ to conclude that the assertion in (ii) holds.

We are now ready to present a lower bound on the rate at which sample sizes should be increased in order to ensure optimal convergence rates.

Theorem 6.3. Let the postulates of Theorem 6.1 hold.
(i) Suppose m_k = o(W_k). Then the sequence of solutions {X_k} is such that W_k^{α} ‖E_k‖ is not O_p(1); that is, there exist ɛ̃ > 0 and a subsequence {k_n} such that P(W_{k_n}^{α} ‖E_{k_n}‖ ≥ n) > ɛ̃.
(ii) If {m_k} grows as Polynomial(λ_p, p), then W_k^{α} ‖E_k‖ is not O_p(1).

Proof of (i). The postulates of Theorem 6.1 hold, and hence we know from (29) in the proof of Theorem 6.1 that there exists K_1(δ, ɛ) such that for k ≥ K_1(δ, ɛ),

(32) P(m_k^{α} ‖E_k‖ ≥ δ) ≥ (1 − ɛ) ɛ,

where ɛ, δ are positive constants that satisfy the assumption in (17). Since m_k = o(W_k) and α > 0, we see that m_k^{α}/W_k^{α} = o(1) as k → ∞. Therefore, for any n > 0, there exists K_2(n) such that for k ≥ K_2(n),

(33) δ ≥ n m_k^{α}/W_k^{α}.

Combining (32) and (33), we see that for any n > 0, if k ≥ max(K_1(δ, ɛ), K_2(n)), then

(34) P(m_k^{α} ‖E_k‖ ≥ n m_k^{α}/W_k^{α}) ≥ (1 − ɛ) ɛ,

and hence, for k ≥ max(K_1(δ, ɛ), K_2(n)),

(35) P(W_k^{α} ‖E_k‖ ≥ n) ≥ (1 − ɛ) ɛ.

Proof of (ii). The assertion is seen to be true from the assertion in (i) upon noticing that if {m_k} grows as Polynomial(λ_p, p), then m_k = o(W_k).

Theorem 6.3 is important since its assertions imply that for SCSR to have any chance of efficiency, sample sizes should be increased at least geometrically. This is irrespective of the speed of the recursion DA. Of course, since this is only a lower bound, increasing the sample size at least geometrically does not guarantee efficiency, which, as we shall see, depends on the speed of the DA recursion. Before we present such an efficiency result for linearly converging DA recursions, we need two more lemmas.

Lemma 6.4. Let {a_j(k)}, 1 ≤ j ≤ k, be a triangular array of positive-valued real numbers. Assume that the following hold.
(i) There exist k̃ and β > 1 such that a_{j+1}(k)/a_j(k) ≥ β for all j ∈ [k̃, k − 1].
(ii) lim sup_{k→∞} a_j(k)/a_k(k) = l_j < ∞ for each j ∈ [1, k̃ − 1].
Then S_k = Σ_{j=1}^{k} a_j(k) = O(a_k(k)).

Proof. We have, for k large enough and any ɛ ∈ (0, ∞),

(36) S_k/a_k(k) = Σ_{j=1}^{k̃−1} a_j(k)/a_k(k) + Σ_{j=k̃}^{k} a_j(k)/a_k(k) ≤ Σ_{j=1}^{k̃−1} (l_j + ɛ) + Σ_{j=k̃}^{k} β^{−(k−j)} ≤ Σ_{j=1}^{k̃−1} (l_j + ɛ) + 1/(1 − β^{−1}),

where the inequalities follow from assumptions (i) and (ii). Since β > 1, k̃ < ∞, and l_j < ∞ for each j, the right-hand side of (36) is finite, and the assertion holds.

Lemma 6.5. Let {S_n} be a nonnegative sequence of random variables, N_0 a well-defined random variable, and {a_n}, {b_n} positive-valued deterministic sequences.
(i) If E[S_n] = O(a_n), then S_n = O_p(a_n).
(ii) If S_n ≤ O_p(b_n) for n ≥ N_0, then S_n = O_p(b_n).

Proof. Suppose the first assertion is false. Then there exist ɛ > 0 and a subsequence {n_j} such that P(S_{n_j}/a_{n_j} ≥ j) ≥ ɛ for all j ≥ 1. This, however, implies that E[S_{n_j}/a_{n_j}] ≥ j ɛ for all j ≥ 1, contradicting the postulate E[S_n] = O(a_n). The first assertion of the lemma is thus proved. For proving the second assertion, we first note that the postulate S_n ≤ O_p(b_n) for n ≥ N_0 means that S_n ≤ B_n for n ≥ N_0, where {B_n} is a sequence of random variables satisfying B_n = O_p(b_n).
Now, since B_n = O_p(b_n), given ɛ > 0 we can choose b(ɛ) and n_1(ɛ) so that P(B_n/b_n ≥ b(ɛ)) ≤ ɛ/2 for all n ≥ n_1(ɛ). Also, since N_0 is a well-defined random variable, we can find n_2(ɛ) such that for all n ≥ n_2(ɛ), P(N_0 > n) ≤ ɛ/2. We can then write, for n ≥ max(n_1(ɛ), n_2(ɛ)),

(37) P(S_n/b_n ≥ b(ɛ)) ≤ P({B_n/b_n ≥ b(ɛ)} ∩ {N_0 ≤ n}) + P({S_n/b_n ≥ b(ɛ)} ∩ {N_0 > n}) ≤ P(B_n/b_n ≥ b(ɛ)) + P(N_0 > n) ≤ ɛ/2 + ɛ/2 = ɛ,

thus proving the assertion in (ii).
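The dichotomy in Lemma 6.2, which drives the geometric lower bound of Theorem 6.3, can be sanity-checked numerically. The sketch below is illustrative, with arbitrarily chosen constants: for a Geometric(c) sequence the last term is a nonvanishing fraction of the partial sum, while for a Polynomial(λ_p, p) sequence that fraction tends to zero.

```python
# Numerical illustration of Lemma 6.2 (arbitrary illustrative constants):
# geometric growth => a_k = Theta(sum_{j<=k} a_j); polynomial growth =>
# a_k = o(sum_{j<=k} a_j).

def last_term_fraction(seq):
    # returns a_k / sum_{j=1}^k a_j for the last index k of seq
    partial = 0.0
    frac = 0.0
    for a in seq:
        partial += a
        frac = a / partial
    return frac

K = 60
geometric = [1.5 ** k for k in range(1, K + 1)]        # Geometric(1.5)
polynomial = [float(k ** 2) for k in range(1, K + 1)]  # Polynomial(1, 2)

print(round(last_term_fraction(geometric), 4))   # stays near 1 - 1/c = 1/3
print(round(last_term_fraction(polynomial), 4))  # roughly (p + 1)/k, tending to 0
```

Increasing K leaves the geometric fraction essentially unchanged while driving the polynomial fraction to zero, matching assertions (i) and (ii) of the lemma.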

We are now ready to prove the main result on the convergence rate and efficiency of SCSR when the DA recursion exhibits linear convergence. Theorem 6.6 presents the convergence rate in terms of the iteration number k first, and then in terms of the total simulation work W_k.

Theorem 6.6 (linearly converging DA). Let Assumptions 3.1 and 3.2 hold. Also, suppose the following two assumptions hold.
A.1. The deterministic recursion (DA) exhibits Linear(l) convergence in a neighborhood around x*; that is, there exists a neighborhood V of x* such that whenever x ∈ V, and for all k,
‖x + h_k(x) − x*‖ ≤ l ‖x − x*‖, l ∈ (0, 1).
A.2. For all x and k,
‖x + h_k(x) − x*‖ ≤ (1 − s) ‖x − x*‖, s ∈ (0, 1).
Then, recalling that E_k := ‖X_k − x*‖, as k → ∞, the following hold:
(i)
E_k = O_p(k^{−pα}) if {m_k} grows as Polynomial(λ_p, p), pα > 1;
E_k = O_p(c^{−kα}) if {m_k} grows as Geometric(c) with c ∈ (1, l^{−1/α});
E_k = O_p(l^k) if {m_k} grows as Geometric(c) with c ≥ l^{−1/α};
E_k = O_p(l^k) if {m_k} grows as SupExponential(λ_t, t).
(ii)
W_k^{αp/(p+1)} E_k = O_p(1) if {m_k} grows as Polynomial(λ_p, p), pα > 1;
W_k^{α} E_k = O_p(1) if {m_k} grows as Geometric(c) with c ∈ (1, l^{−1/α});
W_k^{α log_{c^α}(l^{−1})} E_k = O_p(1) if {m_k} grows as Geometric(c) with c ≥ l^{−1/α};
(log W_k)^{log_t(1/l)} E_k = O_p(1) if {m_k} grows as SupExponential(λ_t, t).

Proof. First we see that Assumptions 3.1 and 3.2 and A.2 hold, and that all sample size sequences {m_k} considered in (i) and (ii) satisfy m_k^{−α} = O(k^{−(1+δ)}) for some δ > 0. We thus see that the postulates of Theorem 5.2 hold, implying that E_k = ‖X_k − x*‖ → 0 wp1. Therefore, excluding a set of measure zero, for any given δ̃ > 0 there exists a well-defined random variable K_0 = K_0(δ̃) such that ‖X_k − x*‖ ≤ δ̃ for k ≥ K_0. Now choose δ̃ ≤ 1 such that the ball B_{δ̃}(x*) ⊆ V, where V is the neighborhood appearing in Assumption A.1. Since X_{k+1} = X_k + H_k(m_k, X_k), we can write

X_{K_0+j+1} − x* = X_{K_0+j} − x* + h_{K_0+j}(X_{K_0+j}) + H_{K_0+j}(m_{K_0+j}, X_{K_0+j}) − h_{K_0+j}(X_{K_0+j}),

and hence

(38) ‖X_{K_0+j+1} − x*‖ ≤ l ‖X_{K_0+j} − x*‖ + ‖H_{K_0+j}(m_{K_0+j}, X_{K_0+j}) − h_{K_0+j}(X_{K_0+j})‖.

Recursing (38) backward and recalling the notation E_k := ‖X_k − x*‖, we have, for j ≥ 0,

(39) E_{K_0+j+1} ≤ l^{j+1} E_{K_0} + Σ_{i=0}^{j} l^{j−i} ‖H_{K_0+i}(m_{K_0+i}, X_{K_0+i}) − h_{K_0+i}(X_{K_0+i})‖
 ≤ l^{j+1} + Σ_{i=K_0}^{K_0+j} l^{K_0+j−i} ‖H_i(m_i, X_i) − h_i(X_i)‖
 ≤ l^{j+1} + Σ_{i=1}^{K_0+j} l^{K_0+j−i} ‖H_i(m_i, X_i) − h_i(X_i)‖,

where the second inequality above follows from the definition of K_0 (and δ̃ ≤ 1) after relabeling the summation index i → K_0 + i, and the third inequality follows from the addition of some positive terms to the summation. Relabeling k = K_0 + j in (39) and denoting ζ_j := ‖H_j(m_j, X_j) − h_j(X_j)‖, we can write, for k ≥ K_0,

(40) E_{k+1} ≤ l^{k−K_0+1} + Σ_{j=1}^{k} l^{k−j} ζ_j.

Recalling the filtration F_{k−1} generated by the history sequence, we notice that

(41) E[Σ_{j=1}^{k} l^{k−j} ζ_j] = Σ_{j=1}^{k} l^{k−j} E[ζ_j] = Σ_{j=1}^{k} l^{k−j} E[E[ζ_j | F_{j−1}]] ≤ Σ_{j=1}^{k} l^{k−j} E[m_j^{−α} (κ_0 + κ_1 ‖X_j‖)] ≤ Σ_{j=1}^{k} l^{k−j} m_j^{−α} (κ_0 + κ_1 ‖x*‖ + κ_1 E[E_j]),

where the first inequality in (41) is due to Assumption 3.2. Due to Theorem 5.2, we know that for a given ɛ > 0, there exists j_0(ɛ) such that for all j ≥ j_0(ɛ), E[E_j] ≤ ɛ. We use this in (41) and write

(42) E[Σ_{j=1}^{k} l^{k−j} ζ_j] ≤ Σ_{j=1}^{j_0(ɛ)} l^{k−j} m_j^{−α} (κ_0 + κ_1 ‖x*‖ + κ_1 E[E_j]) + Σ_{j=j_0(ɛ)+1}^{k} l^{k−j} m_j^{−α} (κ_0 + κ_1 ‖x*‖ + κ_1 ɛ)
 ≤ l^{k+1} Σ_{j=1}^{j_0(ɛ)} l^{−j−1} m_j^{−α} (κ_0 + κ_1 ‖x*‖ + κ_1 E[E_j]) + (κ_0 + κ_1 ‖x*‖ + κ_1 ɛ) Σ_{j=1}^{k} l^{k−j} m_j^{−α}.

Since j_0(ɛ) is finite and E[E_j] < ∞ for all j ≤ j_0(ɛ), the inequality in (42) implies that

(43) E[Σ_{j=1}^{k} l^{k−j} ζ_j] = O(l^{k+1} + Σ_{j=1}^{k} l^{k−j} m_j^{−α}).

From part (i) of Lemma 6.5, we know that if a positive random sequence {S_k} satisfies E[S_k] = O(a_k), where {a_k} is a positive-valued deterministic sequence, then S_k/a_k = O_p(1). Therefore, we see from (43), after setting S_k to be Σ_{j=1}^{k} l^{k−j} ζ_j and a_k to be l^{k+1} + Σ_{j=1}^{k} l^{k−j} m_j^{−α}, that

(44) Σ_{j=1}^{k} l^{k−j} ζ_j = O_p(l^{k+1} + Σ_{j=1}^{k} l^{k−j} m_j^{−α}).

Use (40) and (44) to write, for k ≥ K_0,

(45) E_{k+1} ≤ l^{k+1} l^{−K_0} + O_p(l^{k+1} + Σ_{j=1}^{k} l^{k−j} m_j^{−α}).

Now use part (ii) of Lemma 6.5 on (45) to conclude that

(46) E_{k+1} = O_p(l^{k+1} + Σ_{j=1}^{k} l^{k−j} m_j^{−α}).

We will now show that the first equality in assertion (i) of Theorem 6.6 holds by showing that the two assumptions of Lemma 6.4 hold for the term Σ_{j=1}^{k} l^{k−j} m_j^{−α} appearing in (46). Set the summand to a_j(k) := l^{k−j} m_j^{−α}, and since m_j = λ_p j^p, we have a_{j+1}(k)/a_j(k) = (1/l) (j/(j+1))^{pα}. Choosing β such that β > 1 and lβ < 1, and setting k̃ = max(1, ⌈((lβ)^{−1/(pα)} − 1)^{−1}⌉), we see that the first assumption of Lemma 6.4 is satisfied. The second assumption of Lemma 6.4 is also satisfied since for any fixed j > 0, lim sup_{k→∞} a_j(k)/a_k(k) = lim sup_{k→∞} l^{k−j} (k/j)^{pα} = 0.

To prove the second and third equalities in assertion (i) of Theorem 6.6, suppose {m_k} grows as Geometric(c) with c < l^{−1/α}, that is, c^{−α} > l. Then, noticing that m_k = m_0 c^k, we write

(47) Σ_{j=1}^{k} l^{k−j} m_j^{−α} = m_0^{−α} l^k Σ_{j=1}^{k} (c^{−α}/l)^j = m_0^{−α} l^k (c^{−α}/l) ((c^{−α}/l)^k − 1)/((c^{−α}/l) − 1) = Θ(c^{−kα})

and use (47) in (46). If {m_k} grows as Geometric(c) with c > l^{−1/α}, that is, c^{−α} < l, then notice that (47) becomes

(48) Σ_{j=1}^{k} l^{k−j} m_j^{−α} = m_0^{−α} l^k Σ_{j=1}^{k} (c^{−α}/l)^j = m_0^{−α} l^k (c^{−α}/l) (1 − (c^{−α}/l)^k)/(1 − (c^{−α}/l)) = Θ(l^k).

Now use (48) in (46). To see that the fourth equality in assertion (i) of Theorem 6.6 holds, we notice that a sample size sequence {m_k} that grows as SupExponential(λ_t, t) is faster (as defined in Definition 2.2) than a sample size sequence that grows as Geometric(c).

Proof of (ii). To prove the assertion in (ii), we notice that since W_k = Σ_{j=1}^{k} m_j, we have

(49) W_k = Θ(k^{p+1}) if {m_k} grows as Polynomial(λ_p, p);
 W_k = Θ(c^k) if {m_k} grows as Geometric(c);
 W_k = Θ((λ_t^{1/(t−1)} m_0)^{t^k}) if {m_k} grows as SupExponential(λ_t, t).

Now use (49) in assertion (i) to obtain the assertion in (ii).

Theorem 6.6 provides various insights about the behavior of the error in SCSR iterates. For instance, the error structures detailed in (i) of Theorem 6.6 suggest two well-defined sampling regimes where only one of the two error types, sampling error or recursion error, is dominant. Specifically, note that E_k = O_p(k^{−pα}) when the sampling rate is Polynomial(λ_p, p). This implies that when DA exhibits Linear(l) convergence, polynomial sampling is "too little" in the sense that SCSR's convergence rate is dictated purely by sampling error, since the constant l corresponding to DA's convergence is absent in the expression for E_k. The corresponding reduction in efficiency can be seen in (ii), where E_k is shown to converge as O_p(W_k^{−αp/(1+p)}). (Recall that efficiency amounts to {E_k} achieving a convergence rate O_p(W_k^{−α}).) The case that is diametrically opposite to polynomial sampling is superexponential sampling, where the sampling is "too much" in the sense that the convergence rate E_k = O_p(l^k) is dominated by recursion error. There is a corresponding reduction in efficiency, as can be seen in the expression provided in (ii) of Theorem 6.6.
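The two-regime behavior of the geometric-series calculation in (47) and (48) is easy to verify numerically. The sketch below is illustrative only, with arbitrarily chosen constants l = 0.5, α = 1, m_0 = 1 (so the threshold is l^{−1/α} = 2); it evaluates the deterministic bound from (46) and checks which error term dominates on each side of the threshold.

```python
# Illustrative check of (47)-(48): b_k = l^{k+1} + sum_{j=1}^k l^{k-j} m_j^{-alpha}
# with m_j = m_0 * c^j.  Constants are arbitrary: l = 0.5, alpha = 1, m_0 = 1,
# giving the threshold l^{-1/alpha} = 2.
l, alpha, m0 = 0.5, 1.0, 1.0

def bound(k, c):
    s = sum(l ** (k - j) * (m0 * c ** j) ** (-alpha) for j in range(1, k + 1))
    return l ** (k + 1) + s

# c below the threshold: b_k / c^{-alpha k} settles to a constant, i.e., the
# sampling-error term Theta(c^{-alpha k}) dominates.
print([round(bound(k, 1.5) / 1.5 ** (-alpha * k), 3) for k in (20, 30, 40)])

# c above the threshold: b_k / l^k settles to a constant, i.e., the recursion
# error Theta(l^k) dominates.
print([round(bound(k, 4.0) / l ** k, 3) for k in (20, 30, 40)])
```

The first ratio stabilizes for c = 1.5 < 2, and the second stabilizes for c = 4 > 2, matching the Θ(c^{−kα}) and Θ(l^k) conclusions of (47) and (48), respectively.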
The assertion (ii) in Theorem 6.6 also implies that the only sampling regime that achieves efficiency for linearly converging DA recursions is a Geometric(c) sampling rate with c ∈ (1, l^{−1/α}). Values of c on or above the threshold l^{−1/α} result in too much sampling, in the sense of a dominating recursion error and a corresponding reduction in efficiency, as quantified in (i) and (ii) of Theorem 6.6. Before we state a result that is analogous to Theorem 6.6 for the context of superlinearly converging DA recursions, we state and prove a lemma that will be useful.

Lemma 6.7. Suppose {a_k} is a positive-valued sequence satisfying a_k^{1/q^k} → 1 as k → ∞, where q > 1. If Λ ∈ (0, 1) is a well-defined random variable, then a_k Λ^{q^k} →p 0.

Proof. Since Λ ∈ (0, 1), for any given ɛ ∈ (0, 1) there exists δ_1(ɛ) > 0 such that P(Λ > (1 + δ_1(ɛ))^{−1}) ≤ ɛ. Also, since a_k^{1/q^k} → 1 as k → ∞, we can find N_1 such that a_k^{1/q^k} ≤ (1 − δ_1^2(ɛ))^{−1} for all k ≥ N_1. Therefore, we see that for any given ɛ > 0, there exists δ_1(ɛ) > 0 such that for k ≥ N_1,

(50) P(a_k^{1/q^k} Λ > (1 − δ_1^2(ɛ))^{−1} (1 + δ_1(ɛ))^{−1}) ≤ P(Λ > (1 + δ_1(ɛ))^{−1}) ≤ ɛ.

Also notice that ρ(ɛ) := (1 − δ_1^2(ɛ))^{−1} (1 + δ_1(ɛ))^{−1} < 1 for δ_1(ɛ) chosen small enough, so that, for any given δ_0 > 0, we can choose N_2 such that ρ(ɛ)^{q^k} ≤ δ_0 for k ≥ N_2. Using this observation with (50), we see that, for any given δ_0 > 0 and ɛ > 0, P(a_k Λ^{q^k} > δ_0) ≤ ɛ for k ≥ max(N_1, N_2), and the assertion of the lemma holds.

Theorem 6.8 (superlinearly converging DA). Let Assumption 3.1, Assumption A.2 from Theorem 6.6, and the following assumption on superlinear decrease hold.
A.3. The deterministic recursion (DA) exhibits SuperLinear(λ_q, q) convergence in a neighborhood around x*; that is, there exist a neighborhood V of x* and constants λ_q > 0, q > 1 such that whenever x ∈ V, and for all k,
‖x + h_k(x) − x*‖ ≤ λ_q ‖x − x*‖^q.
Also, suppose the simulation estimator H_k(m_k, X_k) satisfies, for all k ≥ 0, n ≥ 0, with probability one,

(51) E[m_k^{αn} ‖H_k(m_k, X_k) − h_k(X_k)‖^n | F_{k−1}] ≤ κ_0^n + κ_1^n ‖X_k‖,

for some α > 0 and where κ_0 and κ_1 > 0 are constants. Then, as k → ∞, the following hold:
(i)
E_k = O_p(k^{−αp}) if {m_k} grows as Polynomial(λ_p, p), pα > 1;
E_k = O_p(c^{−αk}) if {m_k} grows as Geometric(c);
E_k = O_p(c_1^{−αt^k}) if {m_k} grows as SupExponential(λ_t, t), t ∈ (1, q);
E_k = O_p(Λ^{q^k} + c_2^{−αq^k}) if {m_k} grows as SupExponential(λ_t, t), t ≥ q,
where c_1 = m_0 λ_t^{1/(t−1)}, c_2 = κ^{−1/α} (2λ_q^{1/q})^{−q/(α(q−1))} m_0, κ = max(κ_0, κ_1), and Λ is a random variable that satisfies Λ ∈ (0, 1).
(ii)
W_k^{αp/(p+1)} E_k = O_p(1) if {m_k} grows as Polynomial(λ_p, p), pα > 1;
W_k^{α} E_k = O_p(1) if {m_k} grows as Geometric(c);
W_k^{α} E_k = O_p(1) if {m_k} grows as SupExponential(λ_t, t), t ∈ (1, q);
(Λ^{q^{k_W}} + c_2^{−αq^{k_W}})^{−1} E_k = O_p(1) if {m_k} grows as SupExponential(λ_t, t), t ≥ q,
where k_W := log_t(log_{c_1} W_k).

Proof. Repeating the arguments leading to (38) in the proof of Theorem 6.6, we write

(52) E_{K_0+j+1} ≤ λ_q E_{K_0+j}^q + ζ_{K_0+j},

where ζ_{K_0+j} = ‖h_{K_0+j}(X_{K_0+j}) − H_{K_0+j}(m_{K_0+j}, X_{K_0+j})‖ and, as in the proof of Theorem 6.6, K_0 is a random variable such that, except for a set of measure zero, ‖X_k − x*‖ ≤ δ̃ for k ≥ K_0; the constant δ̃ is chosen such that the ball B_{δ̃}(x*) ⊆ V and δ̃ < (2λ_q)^{−1/(q−1)}, where the set V is the neighborhood appearing in A.3.

Denote s(n) := 1 + q + ⋯ + q^{n−1} = (q^n − 1)/(q − 1), n ≥ 1, and recurse (52) to obtain, for j ≥ 0,

(53) E_{K_0+j+1} ≤ λ_q E_{K_0+j}^q + ζ_{K_0+j}
 ≤ λ_q (λ_q E_{K_0+j−1}^q + ζ_{K_0+j−1})^q + ζ_{K_0+j}
 ≤ 2^{q−1} λ_q^{1+q} E_{K_0+j−1}^{q^2} + 2^{q−1} λ_q ζ_{K_0+j−1}^q + ζ_{K_0+j}
 ≤ ⋯
 ≤ 2^{s(j+1)−1} λ_q^{s(j+1)} E_{K_0}^{q^{j+1}} + Σ_{i=0}^{j} ζ_{K_0+i}^{q^{j−i}} λ_q^{s(j−i)} 2^{s(j−i+1)−1}
 ≤ 2^{s(j+1)−1} λ_q^{s(j+1)} E_{K_0}^{q^{j+1}} + Σ_{i=1}^{K_0+j} ζ_i^{q^{K_0+j−i}} λ_q^{s(K_0+j−i)} 2^{s(K_0+j−i+1)−1},

where the second-to-last inequality in (53) follows by induction using (a + b)^q ≤ 2^{q−1}(a^q + b^q), and the last inequality is obtained after relabeling the summation index i → K_0 + i and adding some positive terms to the right-hand side of (53). Now relabel k = K_0 + j in (53), notice that E_{K_0} ≤ δ̃ by the definition of K_0, and use λ_q^{s(n−1)} 2^{s(n)−1} ≤ (2λ_q^{1/q})^{s(n)} (which holds since λ_q may be taken to be at least one without loss of generality) to get, for k ≥ K_0,

(54) E_{k+1} ≤ 2^{s(k−K_0+1)−1} λ_q^{s(k−K_0+1)} δ̃^{q^{k−K_0+1}} + Σ_{j=1}^{k} ζ_j^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)}.

Now, we see from (51) (after taking expectation with respect to F_{j−1}) that

(55) E[ζ_j^{q^{k−j}}] ≤ κ^{q^{k−j}} m_j^{−αq^{k−j}} (1 + E[‖X_j‖]) ≤ κ^{q^{k−j}} m_j^{−αq^{k−j}} (1 + ‖x*‖ + E[E_j]),

where κ = max(κ_0, κ_1). As in the proof of Theorem 6.6, due to Theorem 5.2, for given ɛ > 0 there exists j_0(ɛ) such that for j ≥ j_0(ɛ), E[E_j] ≤ ɛ. This and (55) imply that

(56) E[Σ_{j=1}^{k} ζ_j^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)}] ≤ Σ_{j=1}^{j_0(ɛ)} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}} (1 + ‖x*‖ + E[E_j]) + Σ_{j=j_0(ɛ)+1}^{k} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}} (1 + ‖x*‖ + ɛ).

Since E[E_j] < ∞ for j ≤ j_0(ɛ), we have ē := max(max{E[E_j] : j = 1, 2, ..., j_0(ɛ)}, ɛ) < ∞. The inequality in (56) then implies that

(57) E[Σ_{j=1}^{k} ζ_j^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)}] ≤ (1 + ‖x*‖ + ē) Σ_{j=1}^{k} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}}.

Again, as in the proof of Theorem 6.6, we know from part (i) of Lemma 6.5 that if a positive random sequence {S_n} satisfies E[S_n] = O(a_n), where {a_n} is a deterministic positive-valued sequence, then S_n = O_p(a_n). Therefore, we see from (57) that

(58) Σ_{j=1}^{k} ζ_j^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} = O_p(Σ_{j=1}^{k} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}}).

Use (58) and (54) to write, for k ≥ K_0,

(59) E_{k+1} ≤ 2^{s(k−K_0+1)−1} λ_q^{s(k−K_0+1)} δ̃^{q^{k−K_0+1}} + O_p(Σ_{j=1}^{k} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}})
 ≤ (2λ_q)^{−1/(q−1)} (Λ(K_0))^{q^{k+1}} + O_p(Σ_{j=1}^{k} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}}),

where, after some algebra, the random variable Λ(K_0) in (59) can be seen to be

(60) Λ(K_0) = ((2λ_q)^{1/(q−1)} δ̃)^{q^{−K_0}}.

(The random variable Λ(K_0) ∈ (0, 1) because δ̃ has been chosen so that δ̃ < (2λ_q)^{−1/(q−1)}.)

Proof of (i). In what follows, the assertions in (i) will be proved using conclusions from three parts named Part A, Part B, and Part C. In Part A, we will analyze the behavior of the summation Σ_{j=1}^{k} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}} appearing in (59) when the sample size sequence {m_k} is Polynomial(λ_p, p), Geometric(c), or SupExponential(λ_t, t) with t ∈ (1, q). In Part B, we will analyze the behavior of the same summation when the sample size sequence {m_k} is SupExponential(λ_t, t) with t ≥ q. In Part C, we will analyze the


More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

Comparison of Modern Stochastic Optimization Algorithms

Comparison of Modern Stochastic Optimization Algorithms Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,

More information

Existence and Uniqueness

Existence and Uniqueness Chapter 3 Existence and Uniqueness An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect

More information

Estimating Gaussian Mixture Densities with EM A Tutorial

Estimating Gaussian Mixture Densities with EM A Tutorial Estimating Gaussian Mixture Densities with EM A Tutorial Carlo Tomasi Due University Expectation Maximization (EM) [4, 3, 6] is a numerical algorithm for the maximization of functions of several variables

More information

R-Linear Convergence of Limited Memory Steepest Descent

R-Linear Convergence of Limited Memory Steepest Descent R-Linear Convergence of Limited Memory Steepest Descent Fran E. Curtis and Wei Guo Department of Industrial and Systems Engineering, Lehigh University, USA COR@L Technical Report 16T-010 R-Linear Convergence

More information

Set, functions and Euclidean space. Seungjin Han

Set, functions and Euclidean space. Seungjin Han Set, functions and Euclidean space Seungjin Han September, 2018 1 Some Basics LOGIC A is necessary for B : If B holds, then A holds. B A A B is the contraposition of B A. A is sufficient for B: If A holds,

More information

Chapter 8 Gradient Methods

Chapter 8 Gradient Methods Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point

More information

Numerical Sequences and Series

Numerical Sequences and Series Numerical Sequences and Series Written by Men-Gen Tsai email: b89902089@ntu.edu.tw. Prove that the convergence of {s n } implies convergence of { s n }. Is the converse true? Solution: Since {s n } is

More information

Introduction: The Perceptron

Introduction: The Perceptron Introduction: The Perceptron Haim Sompolinsy, MIT October 4, 203 Perceptron Architecture The simplest type of perceptron has a single layer of weights connecting the inputs and output. Formally, the perceptron

More information

Doubly Indexed Infinite Series

Doubly Indexed Infinite Series The Islamic University of Gaza Deanery of Higher studies Faculty of Science Department of Mathematics Doubly Indexed Infinite Series Presented By Ahed Khaleel Abu ALees Supervisor Professor Eissa D. Habil

More information

Newton-like method with diagonal correction for distributed optimization

Newton-like method with diagonal correction for distributed optimization Newton-lie method with diagonal correction for distributed optimization Dragana Bajović Dušan Jaovetić Nataša Krejić Nataša Krlec Jerinić August 15, 2015 Abstract We consider distributed optimization problems

More information

Sub-Sampled Newton Methods

Sub-Sampled Newton Methods Sub-Sampled Newton Methods F. Roosta-Khorasani and M. W. Mahoney ICSI and Dept of Statistics, UC Berkeley February 2016 F. Roosta-Khorasani and M. W. Mahoney (UCB) Sub-Sampled Newton Methods Feb 2016 1

More information

Inequality Constraints

Inequality Constraints Chapter 2 Inequality Constraints 2.1 Optimality Conditions Early in multivariate calculus we learn the significance of differentiability in finding minimizers. In this section we begin our study of the

More information

Comparison of Orlicz Lorentz Spaces

Comparison of Orlicz Lorentz Spaces Comparison of Orlicz Lorentz Spaces S.J. Montgomery-Smith* Department of Mathematics, University of Missouri, Columbia, MO 65211. I dedicate this paper to my Mother and Father, who as well as introducing

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim

More information

Suppose R is an ordered ring with positive elements P.

Suppose R is an ordered ring with positive elements P. 1. The real numbers. 1.1. Ordered rings. Definition 1.1. By an ordered commutative ring with unity we mean an ordered sextuple (R, +, 0,, 1, P ) such that (R, +, 0,, 1) is a commutative ring with unity

More information

NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM

NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM JIAYI GUO AND A.S. LEWIS Abstract. The popular BFGS quasi-newton minimization algorithm under reasonable conditions converges globally on smooth

More information

A METHOD FOR FAST GENERATION OF BIVARIATE POISSON RANDOM VECTORS. Kaeyoung Shin Raghu Pasupathy

A METHOD FOR FAST GENERATION OF BIVARIATE POISSON RANDOM VECTORS. Kaeyoung Shin Raghu Pasupathy Proceedings of the 27 Winter Simulation Conference S. G. Henderson, B. Biller, M.-H. Hsieh, J. Shortle, J. D. Tew, and R. R. Barton, eds. A METHOD FOR FAST GENERATION OF BIVARIATE POISSON RANDOM VECTORS

More information

arxiv:cs/ v1 [cs.cc] 16 Aug 2006

arxiv:cs/ v1 [cs.cc] 16 Aug 2006 On Polynomial Time Computable Numbers arxiv:cs/0608067v [cs.cc] 6 Aug 2006 Matsui, Tetsushi Abstract It will be shown that the polynomial time computable numbers form a field, and especially an algebraically

More information

ABSTRACT 1. INTRODUCTION

ABSTRACT 1. INTRODUCTION A DIAGONAL-AUGMENTED QUASI-NEWTON METHOD WITH APPLICATION TO FACTORIZATION MACHINES Aryan Mohtari and Amir Ingber Department of Electrical and Systems Engineering, University of Pennsylvania, PA, USA Big-data

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

The complexity of recursive constraint satisfaction problems.

The complexity of recursive constraint satisfaction problems. The complexity of recursive constraint satisfaction problems. Victor W. Marek Department of Computer Science University of Kentucky Lexington, KY 40506, USA marek@cs.uky.edu Jeffrey B. Remmel Department

More information

Harmonic Analysis. 1. Hermite Polynomials in Dimension One. Recall that if L 2 ([0 2π] ), then we can write as

Harmonic Analysis. 1. Hermite Polynomials in Dimension One. Recall that if L 2 ([0 2π] ), then we can write as Harmonic Analysis Recall that if L 2 ([0 2π] ), then we can write as () Z e ˆ (3.) F:L where the convergence takes place in L 2 ([0 2π] ) and ˆ is the th Fourier coefficient of ; that is, ˆ : (2π) [02π]

More information

MATH4406 Assignment 5

MATH4406 Assignment 5 MATH4406 Assignment 5 Patrick Laub (ID: 42051392) October 7, 2014 1 The machine replacement model 1.1 Real-world motivation Consider the machine to be the entire world. Over time the creator has running

More information

Sub-Sampled Newton Methods for Machine Learning. Jorge Nocedal

Sub-Sampled Newton Methods for Machine Learning. Jorge Nocedal Sub-Sampled Newton Methods for Machine Learning Jorge Nocedal Northwestern University Goldman Lecture, Sept 2016 1 Collaborators Raghu Bollapragada Northwestern University Richard Byrd University of Colorado

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance

The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance Marina Meilă University of Washington Department of Statistics Box 354322 Seattle, WA

More information

LECTURE 10: REVIEW OF POWER SERIES. 1. Motivation

LECTURE 10: REVIEW OF POWER SERIES. 1. Motivation LECTURE 10: REVIEW OF POWER SERIES By definition, a power series centered at x 0 is a series of the form where a 0, a 1,... and x 0 are constants. For convenience, we shall mostly be concerned with the

More information

Exact and Inexact Subsampled Newton Methods for Optimization

Exact and Inexact Subsampled Newton Methods for Optimization Exact and Inexact Subsampled Newton Methods for Optimization Raghu Bollapragada Richard Byrd Jorge Nocedal September 27, 2016 Abstract The paper studies the solution of stochastic optimization problems

More information

NORMS ON SPACE OF MATRICES

NORMS ON SPACE OF MATRICES NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system

More information

Monte Carlo Integration I [RC] Chapter 3

Monte Carlo Integration I [RC] Chapter 3 Aula 3. Monte Carlo Integration I 0 Monte Carlo Integration I [RC] Chapter 3 Anatoli Iambartsev IME-USP Aula 3. Monte Carlo Integration I 1 There is no exact definition of the Monte Carlo methods. In the

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

Lecture 17 Brownian motion as a Markov process

Lecture 17 Brownian motion as a Markov process Lecture 17: Brownian motion as a Markov process 1 of 14 Course: Theory of Probability II Term: Spring 2015 Instructor: Gordan Zitkovic Lecture 17 Brownian motion as a Markov process Brownian motion is

More information

Consider the context of selecting an optimal system from among a finite set of competing systems, based

Consider the context of selecting an optimal system from among a finite set of competing systems, based INFORMS Journal on Computing Vol. 25, No. 3, Summer 23, pp. 527 542 ISSN 9-9856 print) ISSN 526-5528 online) http://dx.doi.org/.287/ijoc.2.59 23 INFORMS Optimal Sampling Laws for Stochastically Constrained

More information

Connections between spectral properties of asymptotic mappings and solutions to wireless network problems

Connections between spectral properties of asymptotic mappings and solutions to wireless network problems 1 Connections between spectral properties of asymptotic mappings and solutions to wireless network problems R. L. G. Cavalcante, Member, IEEE, Qi Liao, Member, IEEE, and S. Stańczak, Senior Member, IEEE

More information

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study

More information

means is a subset of. So we say A B for sets A and B if x A we have x B holds. BY CONTRAST, a S means that a is a member of S.

means is a subset of. So we say A B for sets A and B if x A we have x B holds. BY CONTRAST, a S means that a is a member of S. 1 Notation For those unfamiliar, we have := means equal by definition, N := {0, 1,... } or {1, 2,... } depending on context. (i.e. N is the set or collection of counting numbers.) In addition, means for

More information

Theorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers

Theorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers Page 1 Theorems Wednesday, May 9, 2018 12:53 AM Theorem 1.11: Greatest-Lower-Bound Property Suppose is an ordered set with the least-upper-bound property Suppose, and is bounded below be the set of lower

More information

Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming

Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming MATHEMATICS OF OPERATIONS RESEARCH Vol. 37, No. 1, February 2012, pp. 66 94 ISSN 0364-765X (print) ISSN 1526-5471 (online) http://dx.doi.org/10.1287/moor.1110.0532 2012 INFORMS Q-Learning and Enhanced

More information

Stochastically Constrained Simulation Optimization On. Integer-Ordered Spaces: The cgr-spline Algorithm

Stochastically Constrained Simulation Optimization On. Integer-Ordered Spaces: The cgr-spline Algorithm Stochastically Constrained Simulation Optimization On Integer-Ordered Spaces: The cgr-spline Algorithm Kalyani Nagaraj Raghu Pasupathy Department of Statistics, Purdue University, West Lafayette, IN 47907,

More information

Lecture 12 Unconstrained Optimization (contd.) Constrained Optimization. October 15, 2008

Lecture 12 Unconstrained Optimization (contd.) Constrained Optimization. October 15, 2008 Lecture 12 Unconstrained Optimization (contd.) Constrained Optimization October 15, 2008 Outline Lecture 11 Gradient descent algorithm Improvement to result in Lec 11 At what rate will it converge? Constrained

More information

DS-GA 1002 Lecture notes 2 Fall Random variables

DS-GA 1002 Lecture notes 2 Fall Random variables DS-GA 12 Lecture notes 2 Fall 216 1 Introduction Random variables Random variables are a fundamental tool in probabilistic modeling. They allow us to model numerical quantities that are uncertain: the

More information

Approximation of Minimal Functions by Extreme Functions

Approximation of Minimal Functions by Extreme Functions Approximation of Minimal Functions by Extreme Functions Teresa M. Lebair and Amitabh Basu August 14, 2017 Abstract In a recent paper, Basu, Hildebrand, and Molinaro established that the set of continuous

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

On Solving Large-Scale Finite Minimax Problems. using Exponential Smoothing

On Solving Large-Scale Finite Minimax Problems. using Exponential Smoothing On Solving Large-Scale Finite Minimax Problems using Exponential Smoothing E. Y. Pee and J. O. Royset This paper focuses on finite minimax problems with many functions, and their solution by means of exponential

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

A Distributed Newton Method for Network Utility Maximization, II: Convergence

A Distributed Newton Method for Network Utility Maximization, II: Convergence A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility

More information