© 2018 Society for Industrial and Applied Mathematics


SIAM J. OPTIM. Vol. 28, No. 1. © 2018 Society for Industrial and Applied Mathematics

ON SAMPLING RATES IN SIMULATION-BASED RECURSIONS

RAGHU PASUPATHY, PETER GLYNN, SOUMYADIP GHOSH, AND FATEMEH S. HASHEMI

Abstract. We consider the context of simulation-based recursions, that is, recursions that involve quantities needing to be estimated using a stochastic simulation. Examples include stochastic adaptations of fixed-point and gradient descent recursions obtained by replacing function and derivative values appearing within the recursion by their Monte Carlo counterparts. The primary motivating settings are simulation optimization and stochastic root finding problems, where the minimum point and the zero of a function are sought, respectively, with only Monte Carlo estimates of the functions appearing within the problem. We ask how much Monte Carlo sampling needs to be performed within simulation-based recursions in order that the resulting iterates remain consistent and, more importantly, efficient, where "efficient" implies convergence at the fastest possible rate. Answering these questions involves trading off two types of error inherent in the iterates: the deterministic error due to recursion and the stochastic error due to sampling. As we demonstrate through a characterization of the relationship between sample sizing and convergence rates, efficiency and consistency are intimately coupled with the speed of the underlying recursion, with faster recursions yielding a wider regime of optimal sampling rates. The implications of our results for practical implementation are immediate since they provide specific guidance on optimal simulation expenditure within a variety of stochastic recursions.

Key words. simulation-based recursions, machine learning, stochastic optimization, stochastic gradient

AMS subject classifications. 90CXX, 62LXX, 93E35, 68Q32

1. Introduction.
Received by the editors January 6, 2014; accepted for publication (in revised form) October 10, 2017; published electronically January 9, 2018. Funding: The work of the first author was supported by Office of Naval Research contracts N and N and National Science Foundation grant CMMI. He is also grateful for the financial and logistics support provided by IBM Research, Yorktown Heights, NY, where he spent his sabbatical year. Department of Statistics, Purdue University, West Lafayette, IN (pasupath@purdue.edu). Department of Management Science and Engineering, Stanford University, Stanford, CA (glynn@stanford.edu). T.J. Watson IBM Research, Yorktown Heights, NY (ghosh@us.ibm.com). The Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA (fatemeh.s.hashemi@gmail.com).

We consider the question of sampling within algorithmic recursions that involve quantities needing to be estimated using a stochastic simulation. The prototypical example setting is simulation optimization (SO) [17, 26], where an optimization problem is to be solved using only a stochastic simulation capable of providing estimates of the objective function and constraints at a requested point. Another closely related example setting is the stochastic root finding problem (SRFP) [25, 28, 27], where the zero of a vector function is sought, with only simulation-based estimates of the function involved. SO problems and SRFPs, instead of stipulating that the functions involved in the problem statement be known exactly or in analytic form, allow implicit representation of functions through a stochastic simulation, thereby facilitating virtually any level of complexity. Such flexibility has resulted in adoption across widespread application contexts. A few examples are logistics [18, 19, 3], healthcare [1, 13, 11], epidemiology [14], and vehicular-traffic

systems [24]. A popular and reasonable solution paradigm for solving SO problems and SRFPs is to simply mimic what a solution algorithm might do within a deterministic context, after estimating any needed function and derivative values using the available stochastic simulation. An example serves to illustrate such a technique best. Consider the basic quasi-Newton recursion

(1)  $x_{k+1} = x_k - \alpha_k \tilde H_f^{-1}(x_k)\, \tilde\nabla f(x_k)$,

used to find a local minimum of a twice-differentiable real-valued function $f: \mathbb{R}^d \to \mathbb{R}$, where $\tilde H_f(x)$ and $\tilde\nabla f(x)$ are (deterministic) approximations of the true Hessian $H_f(x)$ and gradient $\nabla f(x)$ of the function $f$ at the point $x$. (We emphasize that $\tilde H_f(x)$ and $\tilde\nabla f(x)$ as they appear in (1) are deterministic and could be, for example, approximations obtained through appropriate finite-differencing of the function $f$ at a set of points around $x$.) Suppose that the context in consideration is such that only noisy simulation-based estimates of $f$ are available, implying that the recursion in (1) is not implementable as written. A reasonable adaptation of (1) might instead be the recursion

(2)  $X_{k+1} = X_k - \hat\alpha_k \hat H_f^{-1}(m_k, X_k)\, \hat\nabla f(m_k, X_k)$,

where $\hat\nabla f(m, x)$, $x \in \mathbb{R}^d$, and $\hat H_f(m, x)$, $x \in \mathbb{R}^d$, are simulation estimators of $\nabla f(x)$, $x \in \mathbb{R}^d$, and $H_f(x)$, $x \in \mathbb{R}^d$, constructed using estimated function values, and the step-length $\hat\alpha_k$ estimates the step-length $\alpha_k$ appearing in the deterministic recursion (1). The simulation effort $m_k$ in (2) is general and might represent the number of simulation replications in the case of terminating simulations or the simulation run length in the case of nonterminating simulations [21]. While the recursion in (2) is intuitively appealing, important questions arise within its context. Since the exact function value $f(x)$ at any point $x$ is unknown and needs to be estimated using stochastic sampling, one might ask how much sampling $m_k$ should be performed during each iteration.
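To make the sampling question concrete, the following is a minimal runnable sketch of a recursion in the spirit of (2), assuming an illustrative objective $f(x) = \tfrac{1}{2}\|x\|^2$ whose gradient and Hessian are observed through synthetic Gaussian noise; the function names, noise model, and constants are our own illustrative choices, not from the paper:

```python
import numpy as np

def mc_gradient(x, m, rng):
    """Sample-mean estimate of the gradient of f(x) = 0.5*||x||^2
    (the true gradient is x) from m noisy observations."""
    return x + rng.standard_normal((m, x.size)).mean(axis=0)

def mc_hessian(x, m, rng):
    """Sample-mean estimate of the Hessian (the true Hessian is the identity)."""
    d = x.size
    return np.eye(d) + rng.standard_normal((m, d, d)).mean(axis=0)

def sampled_quasi_newton(x0, n_iters=15, m0=32, c=1.5, alpha_step=1.0, seed=0):
    """Recursion in the spirit of (2): X_{k+1} = X_k - alpha * H^{-1} g,
    with sample sizes m_k grown geometrically across iterations."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    m = m0
    for _ in range(n_iters):
        g = mc_gradient(x, m, rng)
        H = mc_hessian(x, m, rng)
        x = x - alpha_step * np.linalg.solve(H, g)
        m = int(np.ceil(c * m))  # Geometric(c) sample-size growth
    return x

x_final = sampled_quasi_newton([5.0, -3.0])
print(np.linalg.norm(x_final))
```

With too little sampling (say, a constant $m_k$), the same recursion tends to stall at a noise floor; how fast $m_k$ must grow is precisely the trade-off the paper quantifies.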
Inadequate sampling can cause nonconvergence of (2) due to repeated mis-steps from which iterates in (2) might fail to recover. Such nonconvergence can be avoided through increased sampling, that is, using large $m_k$ values; however, such increased sampling translates to an increase in computational complexity and an associated decreased convergence rate. The questions we answer in this paper pertain to the (simulation) sampling effort expended within recursions such as (2). Our interest is a generalized version of (2) that we call sampling controlled stochastic recursion (SCSR), which will be defined more rigorously in section 3. Within the context of SCSR, we ask the following questions.

Q.1 What sampling rates in SCSR ensure that the resulting iterates are strongly consistent, that is, converge to the correct solution with probability one?

Q.2 What is the convergence rate of the iterates resulting from SCSR, expressed as a function of the sample sizes and the speed of the underlying deterministic recursion?

Q.3 With reference to Q.2, are there specific SCSR recursions that guarantee a canonical rate, that is, the fastest achievable convergence speed under generic sampling?

Q.4 What do the answers to Q.1–Q.3 imply for practical implementation?

Questions such as what we ask in this paper have recently been considered [15, 10, 29] but usually within a specific algorithmic context. (An exception is [7], which

broadly treats the complexity trade-offs stemming from estimation, approximation, and optimization errors within large-scale learning problems.) In [15], for instance, the behavior of the stochastic gradient descent recursion

(3)  $x_{k+1} = x_k - \alpha_k g_k$

is considered for optimizing a smooth function $f$, where $\alpha_k$ is the step size used during the $k$th iteration and $g_k$ is an estimate of the gradient $\nabla f(x_k)$. Importantly, $g_k$ is assumed to be estimated such that the error in the estimate $e_k = g_k - \nabla f(x_k)$ satisfies $\mathbb{E}[\|e_k\|^2] \le B_k$, where $B_k$ is a per-iteration bound that can be seen to be related to the notion of sample size in this paper. The results in [15] detail the functional relationship between the convergence rate of the sequence $\{x_k\}$ in (3) and the chosen sequence $\{B_k\}$. Like in [15], the recursion considered in [10] is again (3), but [10] considers the question more directly, proposing a dynamic sampling scheme akin to that in [29] that is a result of balancing the variance and the squared bias of the gradient estimate at each step. One of the main results in [10] states that when sample sizes grow geometrically across iterations, the resulting iterates in (3) exhibit the fastest achievable convergence rate, something that will be reaffirmed for SCSR recursions considered in this paper. As already noted, we consider the questions Q.1–Q.4 within a recursive context (SCSR) that is more general than (3) or (2). Our aim is to characterize the relationship between the errors due to recursion and sampling that naturally arise in SCSR, and their implication for SO and SRFP algorithms. We will demonstrate through our answers that these errors are inextricably linked and fully characterizable. Furthermore, we will show that such characterization naturally leads to sampling regimes which, when combined with a deterministic recursion of a specified speed, result in specific SCSR convergence rates.
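The geometric-batching prescription attributed above to [10] can be tried out in a few lines; the objective, noise model, and constants below are illustrative stand-ins, not taken from [10] or [15]:

```python
import numpy as np

def batched_sgd(x0, n_iters=15, step=0.5, m0=4, c=1.5, seed=1):
    """Recursion (3) with g_k a batch mean of m_k noisy gradient observations,
    so the error bound B_k = E||e_k||^2 is proportional to 1/m_k and shrinks
    geometrically when m_k grows geometrically.
    Illustrative objective: f(x) = 0.5*||x||^2, so grad f(x) = x; noise ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    m = m0
    errors = []
    for _ in range(n_iters):
        g = x + rng.standard_normal((int(m), x.size)).mean(axis=0)
        x = x - step * g
        m *= c  # geometric batch growth
        errors.append(float(np.linalg.norm(x)))
    return errors

errors = batched_sgd([10.0, 10.0])
print(errors[-1])
```

The per-iteration errors shrink roughly geometrically here because both the deterministic contraction and the noise bound $B_k$ do; with a fixed batch size the error would instead flatten out at a noise-determined level.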
The implication for implementation seems clear: given the choice of the deterministic recursive structure in use, our error characterization suggests sampling rates that should be employed in order to enjoy the best achievable SCSR convergence rates.

1.1. Summary and insight from main results. The results we present are broadly divided into those concerning the strong consistency of SCSR iterates and those pertaining to SCSR's efficiency as defined from the standpoint of the total amount of simulation effort. Insight on consistency appears in the form of Theorem 5.2, which relates the estimator quality in SCSR with the minimum sampling rate that will guarantee almost sure convergence. Theorem 5.2 is deliberately generic in that it makes only mild assumptions about the speed of the recursion in use within SCSR and about the simulation estimator quality. Theorem 5.2 also guarantees convergence (to zero) of the mean absolute deviation (or $L_1$ convergence) of SCSR's iterates to a solution. Several theorems and associated corollaries are devoted to efficiency issues surrounding SCSR. Of these, the most important characterize the convergence rate of SCSR as a function of the sampling rate and the speed of recursion in use. Specifically, as summarized in Figure 1, these results characterize the sampling regimes resulting in predominantly sampling error ("too little sampling") versus those resulting in predominantly recursion error ("too much sampling"), along with identifying the convergence rates for all recursion-sampling combinations. Furthermore, and as illustrated using the shaded region in Figure 1, the same results identify those recursion-sampling combinations yielding the optimal rate, that is, the highest achievable convergence rates with the given simulation estimator at hand. As

it turns out, and as implied by these results, recursions that utilize more structural information afford a wider range of sampling rates that produce the optimal rate. For instance, they imply that recursions such as (2) will achieve the optimal rate if the sampling rate is either geometric, or superexponential up to a certain threshold; sampling rates falling outside this regime yield subcanonical convergence rates for SCSR. (The notions of optimal rates, sampling rates, and recursion rates will be defined rigorously in short order.) The corresponding regime when using a linearly converging recursion such as a fixed-point recursion is narrower and limited to a small band of geometric sampling rates. Interestingly, our results show that sublinearly converging recursions are incapable of yielding optimal rates for SCSR, that is, the sampling regime that produces optimal rates when a sublinearly converging recursion is in use is empty. We also present a result (Theorem 6.10) that provides a complexity bound on the mean absolute error of the SCSR iterates under more restrictive assumptions on the behavior of the recursion in use.

1.2. Paper organization. The rest of the paper is organized as follows. In the ensuing section, we introduce much of the standing notation and conventions used throughout the paper. This is followed by section 3, where we present a rigorous problem statement, and by section 4, where we present specific nontrivial examples of SCSR recursions. Sections 5 and 6 contain the main results of the paper. We provide concluding remarks in section 7, with a brief commentary on implementation and the use of stochastic sample sizes.

2. Notation and convention. We will adopt the following notation throughout the paper. For more details, especially on the convergence of sequences of random variables, see [5].

(i) If $x \in \mathbb{R}^d$ is a vector, then its components are denoted through $x \equiv (x^{(1)}, x^{(2)}, \ldots, x^{(d)})$.
(ii) We use $e_i \in \mathbb{R}^d$ to denote a unit vector whose $i$th component is 1 and whose every other component is 0, that is, $e_i^{(i)} = 1$ and $e_i^{(j)} = 0$ for $j \neq i$.

(iii) For a sequence of random variables $\{Z_n\}$, we say $Z_n \xrightarrow{p} Z$ if $\{Z_n\}$ converges to $Z$ in probability; we say $Z_n \xrightarrow{d} Z$ to mean that $\{Z_n\}$ converges to $Z$ in distribution; we say $Z_n \xrightarrow{L_p} Z$ if $\mathbb{E}[\|Z_n - Z\|^p] \to 0$; and finally, we say $Z_n \xrightarrow{wp1} Z$ to mean that $\{Z_n\}$ converges to $Z$ with probability one. When $Z_n \xrightarrow{wp1} z$, where $z$ is a constant, we will say that $Z_n$ is strongly consistent with respect to $z$.

(iv) $\mathbb{Z}_+$ denotes the set of positive integers.

(v) $B_r(\bar x) \equiv \{x : \|x - \bar x\| \le r\}$ denotes the $d$-dimensional Euclidean ball centered on $\bar x$ and having radius $r$.

(vi) $\mathrm{dist}(x, B) = \inf\{\|x - y\| : y \in B\}$ denotes the Euclidean distance between a point $x \in \mathbb{R}^d$ and a set $B \subset \mathbb{R}^d$.

(vii) $\mathrm{diam}(B) = \sup\{\|x - y\| : x, y \in B\}$ denotes the diameter of the set $B \subset \mathbb{R}^d$.

(viii) For a sequence of real numbers $\{a_n\}$, we say $a_n = o(1)$ if $\lim_n a_n = 0$ and $a_n = O(1)$ if $\{a_n\}$ is bounded, i.e., there exists $c \in (0, \infty)$ with $|a_n| < c$ for large enough $n$. We say that $a_n = \Theta(1)$ if $0 < \liminf a_n \le \limsup a_n < \infty$. For positive-valued sequences $\{a_n\}, \{b_n\}$, we say $a_n = O(b_n)$ if $a_n / b_n = O(1)$ as $n \to \infty$; we say $a_n = \Theta(b_n)$ if $a_n / b_n = \Theta(1)$ as $n \to \infty$.

(ix) For a sequence of positive-valued random variables $\{A_n\}$, we say $A_n = o_p(1)$ if $A_n \xrightarrow{p} 0$ as $n \to \infty$; and we say $A_n = O_p(1)$ if $\{A_n\}$ is stochastically bounded, that is, for given $\epsilon > 0$ there exists $c(\epsilon) \in (0, \infty)$ with $P(A_n < c(\epsilon)) > 1 - \epsilon$ for large enough $n$. If $\{B_n\}$ is another sequence of positive-valued random variables, we say $A_n = O_p(B_n)$ if $A_n / B_n = O_p(1)$ as $n \to \infty$; we say $A_n = o_p(B_n)$ if $A_n / B_n = o_p(1)$ as $n \to \infty$. Also, when we say $A_n \le O_p(b_n)$, we mean that $A_n \le B_n$, where $\{B_n\}$ is a random sequence that satisfies $B_n = O_p(b_n)$.

(x) For two sequences of real numbers $\{a_n\}, \{b_n\}$, we say $a_n \sim b_n$ if $\lim_n a_n / b_n = 1$.

Also, the following notions will help our exposition and will be used heavily.

Definition 2.1 (growth rate of a sequence). A sequence $\{m_k\}$ is said to exhibit Polynomial$(\lambda_p, p)$ growth if $m_k = \lambda_p k^p$, $k = 1, 2, \ldots$, for some $\lambda_p, p \in (0, \infty)$; it is said to exhibit Geometric$(c)$ growth if $m_{k+1} = c\, m_k$, $k = 0, 1, 2, \ldots$, for some $c \in (1, \infty)$; and it is said to exhibit SupExponential$(\lambda_t, t)$ growth if $m_{k+1} = \lambda_t m_k^t$, $k = 0, 1, 2, \ldots$, for some $\lambda_t \in (0, \infty)$, $t \in (1, \infty)$.

Definition 2.2 (a sequence increasing faster than another). Let $\{m_k\}$ and $\{\tilde m_k\}$ be two positive-valued increasing sequences that tend to infinity. Then $\{m_k\}$ is said to increase faster than $\{\tilde m_k\}$ if $m_{k+1}/m_k \ge \tilde m_{k+1}/\tilde m_k$ for large enough $k$. In such a case, $\{\tilde m_k\}$ is also said to increase slower than $\{m_k\}$.

According to Definitions 2.1 and 2.2, it can be seen that any sequence that is growing as SupExponential$(\lambda_t, t)$ is faster than any other sequence that is growing as Geometric$(c)$; likewise, any sequence growing as Geometric$(c)$ is faster than any other sequence growing as Polynomial$(\lambda_p, p)$.

3. Problem setting and assumptions. The general context that we consider is that of unconstrained sampling-controlled stochastic recursions (SCSR), defined through the following recursion:

(SCSR)  $X_{k+1} = X_k + H_k(m_k, X_k)$, $k = 0, 1, 2, \ldots$,

where $X_k \in \mathbb{R}^d$ for all $k$. The deterministic analogue (DA) of SCSR is

(DA)  $x_{k+1} = x_k + h_k(x_k)$, $k = 0, 1, 2, \ldots$.

The random function $H_k(m, x)$, $x \in \mathbb{R}^d$, called the simulation estimator, should be interpreted as estimating the corresponding deterministic quantity $h_k(x)$ at the point of interest $x$, after expending $m$ amount of simulation effort. We emphasize that the objects $h_k(\cdot)$ and $H_k(m, \cdot)$ appearing in (DA) and (SCSR) can be iteration-dependent functions. Two illustrative examples are presented in section 4.

3.1. Assumptions. The following two assumptions are standing assumptions that will be invoked in several of the important results of the paper. Further assumptions will be made as and when required.

Assumption 3.1.
The recursion (DA) exhibits global convergence to a unique point $x^*$; that is, the sequence $\{x_k\}$ of iterates generated by (DA) when started with any initial point $x_0$ satisfies $\lim_k x_k = x^*$.

Assumption 3.2. Denote by $\mathcal{F}_k = \sigma\{X_0, H_0(m_0, X_0), X_1, H_1(m_1, X_1), \ldots, X_k, H_k(m_k, X_k)\}$ the filtration generated by the history sequence after $k$ iterations. Then the simulation estimator $H_k(m_k, X_k)$ satisfies, for $k \ge 1$, with probability one,

(4)  $\mathbb{E}\left[\, m_k^{\alpha} \left\| H_k(m_k, X_k) - h_k(X_k) \right\| \mid \mathcal{F}_{k-1} \right] \le \kappa_0 + \kappa_1 \|X_k\|$

for some $\alpha > 0$, and where $\kappa_0, \kappa_1$ are some positive constants. We will refer to the constant $\alpha$ as the convergence rate associated with the simulation estimator.

Assumption 3.1 assumes convergence of the deterministic recursion (DA)'s iterates starting from any initial point $x_0$. Such an assumption is needed if we were to expect

stochastic iterations in (SCSR) to converge to the correct solution in any reasonable sense. We view the deterministic recursion (DA) to be the limiting form of (SCSR), obtained, for example, if the estimator $H_k(m, x)$ at hand is a perfect estimator of $h_k(x)$, constructed using a hypothetical infinite sample. Assumption 3.2 is a statement about the behavior of the simulation estimator $H_k(m, x)$, $x \in \mathbb{R}^d$, and is analogous to standard assumptions in the literature on stochastic approximation and machine learning, e.g., Assumption A3 in [6] and Assumption 4.3(b),(c) in [8]. In order to develop convergent algorithms for the context we consider in this paper, some sort of restriction on the extent to which a simulation estimator can mislead an algorithm is necessary. Assumption 3.2 is a formal codification of such a restriction; it implies that the error in the estimator $H_k(m_k, X_k)$, conditional on the history of the observed random variables up to iteration $k$, decays with rate $\alpha$. Furthermore, the manner of such decay can depend on the current iterate $X_k$. Assumption 3.2 subsumes typical stochastic optimization contexts where the mean squared error of the simulation estimator (with respect to the true objective function value) at any point is bounded by an affine function of the squared $L_2$-norm of the true gradient at the point, assuming that the gradient function is Lipschitz.

3.2. Work and efficiency. In the analysis considered throughout this paper, computational effort calculations are limited to simulation effort. Therefore, the total work done through $k$ iterations of SCSR is given by $W_k = \sum_{i=1}^{k} m_i$. Our assessment of any sampling strategy will be based on how fast the error $E_k = \|X_k - x^*\|$ in the $k$th iterate of SCSR (stochastically) converges to zero as a function of the total work $W_k$.
This will usually be achieved by first identifying the convergence rate of $E_k$ with respect to the iteration number $k$ and then translating this rate with respect to the total work $W_k$. Under mild conditions, we will demonstrate that $E_k$ cannot converge to zero faster than $W_k^{-\alpha}$ (in a certain rigorous sense), where $\alpha$ is defined through Assumption 3.2. This makes intuitive sense because it seems reasonable to expect that a stochastic recursion's quality is at most as good as the quality of the estimator at hand. We will then deem those recursions having error sequences $\{E_k\}$ that achieve the convergence rate $W_k^{-\alpha}$ as being efficient. The convergence rate of $E_k$ with respect to the iteration number $k$ alone is of little significance.

4. Examples. In this section, we illustrate SCSR using two popular recursions occurring within the context of SO and SRFPs. For each example, we show the explicit form of the SCSR and the DA recursions through their corresponding functions $H_k(m, \cdot)$ and $h_k(\cdot)$. We also identify the estimator convergence rate $\alpha$ in each case.

4.1. Sampling controlled gradient method with fixed step. Consider the context of solving an unconstrained optimization problem using the gradient method [9, section 9.3], usually written as

(5)  $x_{k+1} = x_k + t\,(-\nabla f(x_k))$, $k = 0, 1, \ldots$,

where $f: \mathbb{R}^d \to \mathbb{R}$ is the real-valued function being optimized, $\nabla f: \mathbb{R}^d \to \mathbb{R}^d$ is its gradient function, and $t > 0$ is an appropriately chosen constant. (Instead of a fixed stepsize $t$ in (5), one might use a diminishing stepsize sequence $\{t_k\}$ chosen to satisfy

$t_k \to 0$, $\sum_{k=1}^{\infty} t_k = \infty$ [4, Chapter 1].) Owing to its simplicity, the recursion in (5) has recently become popular in large-scale SO contexts [8]. Let us now suppose that the gradient function $g(\cdot) \equiv \nabla f(\cdot)$ in (5) is unobservable, but we have access to i.i.d. observations $G_i(x)$, $i = 1, 2, \ldots$, satisfying $\mathbb{E}[G_i(x)] = g(x)$ for any $x \in \mathbb{R}^d$. The sampling controlled version of the gradient method then takes the form

(6)  $X_{k+1} = X_k + t\left( - m_k^{-1} \sum_{i=1}^{m_k} G_i(X_k) \right)$, $k = 0, 1, \ldots$,

thus implying the (SCSR) and (DA) recursive objects $H_k(m, x) \equiv t\,(-m^{-1}\sum_{i=1}^{m} G_i(x))$ and $h_k(x) \equiv -t\,\nabla f(x)$ for all $x \in \mathbb{R}^d$. Using standard arguments [23, Theorem ], it can be shown that when $f$ is strongly convex and differentiable with a gradient that satisfies $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y \in \mathbb{R}^d$, $L < \infty$, and the step size $t \le L^{-1}$, the iterates in (5) exhibit linear convergence to a zero of $\nabla f$. Furthermore, elementary probabilistic arguments show that Assumption 3.2 is satisfied with rate constant $\alpha = 1/2$.

4.2. Sampling controlled Kiefer–Wolfowitz iteration. Let us consider unconstrained simulation optimization on a differentiable function $f: \mathbb{R}^d \to \mathbb{R}$ that is estimated using $F(m, x) = m^{-1} \sum_{i=1}^{m} F_i(x)$, where $F_i(x)$, $x \in \mathbb{R}^d$, $i = 1, 2, \ldots$, are i.i.d. copies of an unbiased function estimator $F(x)$, $x \in \mathbb{R}^d$, of $f$. Assume that we do not have direct stochastic observations of the gradient function $\nabla f(x)$, so that the current context differs from that in section 4.1. (This context has recently been called the zeroth order [16] for the reason that only function estimates are available.) We thus choose the SCSR iteration to be a modified Kiefer–Wolfowitz [20] iteration constructed using a finite difference approximation of the stochastic function observations.
Specifically, recalling the notation $G \equiv (G^{(1)}, G^{(2)}, \ldots, G^{(d)})$, suppose

(7)  $X_{k+1} = X_k - t\,G(m_k, X_k)$, $k = 0, 1, \ldots$,

where

(8)  $G^{(i)}(m_k, X_k) = \dfrac{F(m_k, X_k + s_k^{(i)} e_i) - F(m_k, X_k - s_k^{(i)} e_i)}{2\, s_k^{(i)}}$

estimates the $i$th partial derivative of $f$ at $X_k$, $s_k \equiv (s_k^{(1)}, s_k^{(2)}, \ldots, s_k^{(d)})$ is the vector of finite-difference steps, and $t$ is an appropriately chosen constant. Assume, for simplicity, that the function observations generated at $X_k - s_k^{(i)} e_i$ are independent of those generated at $X_k + s_k^{(i)} e_i$. In the notation of (SCSR) and (DA), the simulation estimator $H_k(m, x) \equiv -t\,G(m, x)$ and $h_k(x) \equiv -t\,\nabla f(x)$ for all $x \in \mathbb{R}^d$, assuming that $s_k$ is chosen so that $s_k^{(i)} \to 0$ and $m_k s_k^{(i)} \to \infty$, $i = 1, 2, \ldots, d$. Furthermore, if $s_k$ is chosen as $s_k^{(i)} = c\,m_k^{-1/6}$ and $f$ has a bounded third derivative, then Assumption 3.2 is satisfied with $\alpha = 1/3$ [2, Proposition 1.1]. Also, the deterministic recursion (DA) corresponding to (7) is the same as that in section 4.1, and the iteration complexity discussed there applies here as well.

Remark 4.1. In (8), derivative estimators with faster convergence rates can be constructed by estimating higher order derivatives of $f$. For instance, by observing $G^{(i)}(m, x + u_j)$, $j = 1, 2, \ldots, n$, at $n$ strategically located design points $x + u_1, x + u_2, \ldots, x + u_n$, the error $\mathbb{E}[\|H_k(m, x) - h_k(x)\|] = O(m^{-n/(2n+1)})$; that is, the error in the estimator can be made arbitrarily close to the Monte Carlo canonical rate $O(m^{-1/2})$ [2, Chapter VII, section 1a].

5. Consistency. In this section, we present a result that clarifies the conditions on the sampling rates to ensure that the iterates produced by SCSR exhibit almost sure convergence to the solution $x^*$. We will rely on the following elegant result that appears in a slightly more specific form as Lemma 11 on page 50 of [30].

Lemma 5.1. Let $\{V_k\}$ be a sequence of nonnegative random variables, where $\mathbb{E}[V_0] < \infty$, and let $\{r_k\}$ and $\{q_k\}$ be deterministic scalar sequences such that $\mathbb{E}[V_{k+1} \mid V_0, V_1, \ldots, V_k] \le (1 - r_k)V_k + q_k$ almost surely for $k \ge k_0$, where $k_0 \ge 0$ is fixed, $0 \le r_k \le 1$, $q_k \ge 0$, $\sum_{k=0}^{\infty} r_k = \infty$, $\sum_{k=0}^{\infty} q_k < \infty$, and $\lim_k r_k^{-1} q_k = 0$. Then $\lim_k V_k = 0$ almost surely and $\lim_k \mathbb{E}[V_k] = 0$.

We now state the main consistency result for (SCSR) when the corresponding deterministic DA recursion exhibits Sub-Linear$(s)$ or Linear$(l)$ convergence.

Theorem 5.2. Let Assumptions 3.1 and 3.2 hold. Let the sample size sequence $\{m_k\}$ satisfy $m_k^{-1} = O(k^{-(1/\alpha) - \delta})$ for some $\delta > 0$. (The constant $\alpha$ is the convergence rate of the simulation estimator appearing in Assumption 3.2.)

(i) Suppose the recursion (DA) guarantees a Sub-Linear$(s)$ decrease at each $k$, that is, for every $x$, $k$, and some $s \in (0, 1)$,

(9)  $\|x + h_k(x) - x^*\| \le (1 - s\,k^{-1})\, \|x - x^*\|$.

Then $X_k \xrightarrow{wp1} x^*$ and $\mathbb{E}[\|X_k - x^*\|] \to 0$.

(ii) Suppose the recursion (DA) guarantees a Linear$(l)$ decrease at each $k$, that is, for every $x$, $k$, and some $l \in (0, 1)$, the recursion (DA) satisfies

(10)  $\|x + h_k(x) - x^*\| \le l\, \|x - x^*\|$.

Then $X_k \xrightarrow{wp1} x^*$ and $\mathbb{E}[\|X_k - x^*\|] \to 0$.

Proof. Let us first prove the assertion in (i). Using (SCSR) and recalling the unique solution $x^*$ to the recursion (DA), we can write

(11)  $X_{k+1} - x^* = X_k + h_k(X_k) - x^* + H_k(m_k, X_k) - h_k(X_k)$, $k = 0, 1, 2, \ldots$.

Denoting $E_k = \|X_k - x^*\|$, (11) gives

(12)  $E_{k+1} \le (1 - s\,k^{-1})\, E_k + \|H_k(m_k, X_k) - h_k(X_k)\|$, $k = 0, 1, 2, \ldots$.
Now conditioning on $\mathcal{F}_{k-1}$ and then taking expectation on both sides of (12), we get

(13)
$\mathbb{E}[E_{k+1} \mid \mathcal{F}_{k-1}] \le (1 - s\,k^{-1})\, E_k + \mathbb{E}[\|H_k(m_k, X_k) - h_k(X_k)\| \mid \mathcal{F}_{k-1}]$
$\le (1 - s\,k^{-1})\, E_k + \kappa_0 m_k^{-\alpha} + \kappa_1 \|X_k\|\, m_k^{-\alpha}$
$\le (1 - s\,k^{-1} + \kappa_1 m_k^{-\alpha})\, E_k + (\kappa_0 + \kappa_1 \|x^*\|)\, m_k^{-\alpha}$
$= \left(1 - (s\,k^{-1} - \kappa_1 m_k^{-\alpha})\right) E_k + (\kappa_0 + \kappa_1 \|x^*\|)\, m_k^{-\alpha}$.

If the sequence $\{m_k\}$ is chosen so that $m_k^{-1} = O(k^{-(1/\alpha) - \delta})$ for some $\delta > 0$ (as has been postulated by the theorem), then for any given $\epsilon > 0$, we see that $\kappa_1 m_k^{-\alpha} < \epsilon\, k^{-1}$ for large enough $k$. Therefore, after integrating out the random variables $H_i(m_i, X_i)$, $i = 0, 1, \ldots, k-1$, in (13), we can write for any given $\epsilon \in (0, s)$ and large enough $k$ that

(14)  $\mathbb{E}[E_{k+1} \mid E_0, E_1, \ldots, E_k] \le \left(1 - \frac{s - \epsilon}{k}\right) E_k + (\kappa_0 + \kappa_1 \|x^*\|)\, m_k^{-\alpha}$.

Now, if we apply Lemma 5.1 to (14) with $r_k \equiv (s - \epsilon)\, k^{-1}$ and $q_k \equiv \beta m_k^{-\alpha}$ for $\beta = \kappa_0 + \kappa_1 \|x^*\|$, then $\sum_{k=1}^{\infty} r_k = \sum_{k=1}^{\infty} (s - \epsilon)\, k^{-1} = \infty$, $\sum_{k=1}^{\infty} q_k = \sum_{k=1}^{\infty} \beta m_k^{-\alpha} = O(\sum_{k=1}^{\infty} k^{-1 - \alpha\delta}) < \infty$, and $\limsup_k r_k^{-1} q_k \le \limsup_k \beta (s - \epsilon)^{-1} k^{-\alpha\delta} = 0$. We thus see that the postulates of Lemma 5.1 hold, implying that $E_k \xrightarrow{wp1} 0$ and $\mathbb{E}[E_k] \to 0$.

Next, suppose the recursion (DA) exhibits Linear$(l)$ convergence. The inequality analogous to (14) is then

(15)  $\mathbb{E}[E_{k+1} \mid E_0, E_1, \ldots, E_k] \le \left(1 - (1 - l - \kappa_1 m_k^{-\alpha})\right) E_k + (\kappa_0 + \kappa_1 \|x^*\|)\, m_k^{-\alpha}$.

Since $m_k \to \infty$, we see that for any given $\epsilon \in (0, 1 - l)$ and large enough $k$,

(16)  $\mathbb{E}[E_{k+1} \mid E_0, E_1, \ldots, E_k] \le \left(1 - (1 - l - \epsilon)\right) E_k + (\kappa_0 + \kappa_1 \|x^*\|)\, m_k^{-\alpha}$.

Now, apply Lemma 5.1 to (16) with $r_k \equiv 1 - l - \epsilon$ and $q_k \equiv \beta m_k^{-\alpha}$ for $\beta = \kappa_0 + \kappa_1 \|x^*\|$. If the sequence $\{m_k\}$ is chosen so that $m_k^{-1} = O(k^{-(1/\alpha) - \delta})$ for some $\delta > 0$, then $\sum_{k=1}^{\infty} r_k = \sum_{k=1}^{\infty} (1 - l - \epsilon) = \infty$, $\sum_{k=1}^{\infty} q_k = \sum_{k=1}^{\infty} \beta m_k^{-\alpha} = O(\sum_{k=1}^{\infty} k^{-1 - \alpha\delta}) < \infty$, and $\limsup_k r_k^{-1} q_k \le \limsup_k \beta (1 - l - \epsilon)^{-1} k^{-1 - \alpha\delta} = 0$. We thus see that the postulates of Lemma 5.1 hold, implying that $E_k \xrightarrow{wp1} 0$ and $\mathbb{E}[E_k] \to 0$.

It is important to note that the assumed decrease condition, (9) or (10), is on the (hypothetical) deterministic recursion (DA) and not the stochastic recursion (SCSR). The motivating setting here is unconstrained convex minimization, where a decrease such as (9) or (10) can usually be guaranteed. The theorem can be relaxed to more general settings where the decrease condition (10) holds only when $X_k$ is close enough to $x^*$, but as we show later when we characterize convergence rates, we will still need a weak decrease condition such as (9) to hold for all $X_k$. For this reason, part (i) in Theorem 5.2 should be seen as the main result on the strong consistency of SCSR.
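Theorem 5.2's sample-size stipulation can be checked numerically on a one-dimensional caricature: a Linear$(l)$ DA map $h_k(x) = -(1-l)(x - x^*)$ perturbed by noise whose size decays like $m_k^{-\alpha}$, emulating a sample-mean estimator with $\alpha = 1/2$. The setup and constants below are our own illustrative choices, not the paper's:

```python
import numpy as np

def scsr_linear_da(x_star=3.0, l=0.5, alpha=0.5, delta=0.5, n_iters=200, seed=2):
    """SCSR with the Linear(l) DA map h(x) = -(1 - l)(x - x_star) and an
    estimator error of size m_k^{-alpha} (the std of a sample mean when
    alpha = 1/2).  Sample sizes m_k = ceil(k^{1/alpha + delta}) satisfy the
    stipulation m_k^{-1} = O(k^{-(1/alpha)-delta}) of Theorem 5.2."""
    rng = np.random.default_rng(seed)
    x = 0.0
    for k in range(1, n_iters + 1):
        m = int(np.ceil(k ** (1.0 / alpha + delta)))
        h_est = -(1 - l) * (x - x_star) + rng.standard_normal() / np.sqrt(m)
        x = x + h_est
    return abs(x - x_star)

final_error = scsr_linear_da()
print(final_error)
```

Replacing the growth rule with, say, a constant $m_k$ leaves a persistent noise floor of size roughly $m^{-\alpha}/(1-l)$, consistent with the role the stipulation plays in the proof.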
The stipulation $m_k^{-1} = O(k^{-(1/\alpha) - \delta})$ for some $\delta > 0$ in Theorem 5.2 amounts to a weak stipulation on the sample size increase rate for guaranteeing strong consistency and $L_1$ convergence. That the minimum stipulated sample size increase depends on the quality (as encoded by the convergence rate $\alpha$) of the simulation estimator is to be expected. However, part (ii) of Theorem 5.2 implies that the minimum stipulated sample size increase does not depend on the speed of the underlying deterministic recursion as long as it exceeds a sublinear rate. So, when a linear decrease (10) as in part (ii) of Theorem 5.2 is ensured, the sample size stipulation $m_k^{-1} = O(k^{-(1/\alpha) - \delta})$ needed for strong consistency remains the same. This, as we shall see in greater detail in ensuing sections, is because sampling error dominates the error due to recursion and is hence decisive in determining whether the iterates converge.

6. Convergence rates and efficiency. In this section, we present results that shed light on the convergence rate and the efficiency of SCSR under different sampling

and recursion contexts. Specifically, we derive the convergence rates associated with using various combinations of sample size increases (polynomial, geometric, superexponential) and the speed of convergence of the DA recursion (sublinear, linear, superlinear). This information is then used to identify what sample size growth rates may be best, that is, efficient, for various combinations of recursive structures and simulation estimators. (See Figure 6.1 for a concise and intuitive summary of the results in this section.) In what follows, convergence rates are first expressed as a function of the iteration number $k$ and the various constants associated with sampling and recursion. These obtained rates are then related to the total work done through $k$ iterations of SCSR, given by $W_k = \sum_{i=1}^{k} m_i$, in order to obtain a sense of the efficiency. As we show next, the quantity $W_k^{-\alpha}$ is a stochastic lower bound on the error $E_k$ in the SCSR iterates; thus, loosely speaking, $\alpha$ is an upper bound on the convergence rate of the error in SCSR iterates. It is in this sense that we say SCSR's iterates are efficient whenever they attain the rate $W_k^{-\alpha}$.

Theorem 6.1. Let the postulates of Theorem 5.2 hold with a nondecreasing sample size sequence $\{m_k\}$, and let the recursion (DA) satisfy postulate (i) in Theorem 5.2. Furthermore, suppose there exist $\delta, \delta', \epsilon' > 0$ and a set $B_\delta(x^*)$ such that, for large enough $k$,

(17)  $\inf_{\{(x,u)\,:\, x \in B_\delta(x^*);\, \|u\| = 1\}} P\!\left( m^{\alpha} \left( H_k(m, x) - h_k(x) \right)^T u \ge \delta' \right) > \epsilon'$.

Then the recursion SCSR cannot converge faster than $W_k^{-\alpha}$; that is, there exists $\tilde\epsilon > 0$ such that for any sequence of sample sizes $\{m_k\}$, $\liminf_k P(W_k^{\alpha} E_k > \bar\delta) > \tilde\epsilon$, where $\bar\delta := \min(\delta, \delta')$.

Proof. Since the postulates of Theorem 5.2 are satisfied, we are guaranteed that $X_k \xrightarrow{wp1} x^*$ and hence that $X_k \xrightarrow{p} x^*$. For proving the theorem, we will show that for large enough $k$, $P(m_k^{\alpha} E_k \ge \bar\delta) > \tilde\epsilon$, where $\tilde\epsilon > 0$. Since $W_k = \sum_{j=1}^{k} m_j \ge m_k$, the assertion of Theorem 6.1 will then hold. Recall that $\bar\delta = \min(\delta, \delta')$, where $\delta'$ is the constant appearing in (17).
Since $X_k \xrightarrow{p} x^*$, for any fixed $\epsilon > 0$ and large enough $k$, we have

(18)  $P(X_k \in B_{\bar\delta}(x^*)) \ge 1 - \epsilon$.

Denoting $U_k(X_k) := X_k + h_k(X_k) - x^*$, we can write for large enough $k$

(19)  $P(m_{k+1}^{\alpha} E_{k+1} \ge \bar\delta) \ge P(m_k^{\alpha} E_{k+1} \ge \bar\delta) = P(A_1) + P(A_2)$,

where the first inequality uses the nondecreasing nature of $\{m_k\}$, and the events $A_1$ and $A_2$ in (19) are defined as follows:

(20)  $A_1 := \left( m_k^{\alpha} E_{k+1} \ge \bar\delta \right) \cap \left( U_k(X_k) \neq 0 \right)$;  $A_2 := \left( m_k^{\alpha} E_{k+1} \ge \bar\delta \right) \cap \left( U_k(X_k) = 0 \right)$.

We also define the following two other events:

(21)  $C_1 := \left( m_k^{\alpha} (H_k(m_k, X_k) - h_k(X_k))^T U_k(X_k) \ge \bar\delta\, \|U_k(X_k)\| \right) \cap \left( U_k(X_k) \neq 0 \right)$;  $C_2 := \left( m_k^{\alpha} \|H_k(m_k, X_k) - h_k(X_k)\| \ge \bar\delta \right) \cap \left( U_k(X_k) = 0 \right)$.

Since $E_{k+1} = \|X_k + H_k(m_k, X_k) - x^*\|$, we notice that

(22)  $E_{k+1}^2 \ge 2\,(H_k(m_k, X_k) - h_k(X_k))^T U_k(X_k) + \|H_k(m_k, X_k) - h_k(X_k)\|^2$.

Due to (22) and the Cauchy–Schwarz inequality [5], we see that $C_1 \subseteq A_1$; due to (22), we also see that $C_2 \subseteq A_2$. Hence

(23)  $P(A_1) \ge P(C_1)$ and $P(A_2) \ge P(C_2)$.

Define $R_k := \{x : U_k(x) = 0\}$. Then, due to the assumption in (17), we see that for any $x \in B_{\bar\delta}(x^*) \cap R_k^c$,

(24)  $P(C_1 \mid X_k = x) > \epsilon'$.

And, since the Cauchy–Schwarz inequality [5] implies that $m_k^{\alpha}\, \|H_k(m_k, X_k) - h_k(X_k)\| \ge m_k^{\alpha}\, (H_k(m_k, X_k) - h_k(X_k))^T u$ for any unit vector $u$, we again see from the assumption in (17) that for any $x \in B_{\bar\delta}(x^*) \cap R_k$,

(25)  $P(C_2 \mid X_k = x) > \epsilon'$.

Next, letting $F_{X_k}$ denote the distribution function of $X_k$, we write

(26)  $P(C_1) = \int P(C_1 \mid X_k = x)\, dF_{X_k}(x) \ge \int P(C_1 \mid X_k = x)\, I\{x \in B_{\bar\delta}(x^*) \cap R_k^c\}\, dF_{X_k}(x) \ge \epsilon'\, P(X_k \in B_{\bar\delta}(x^*) \cap R_k^c)$,

where the second inequality in (26) follows from (24). Similarly,

(27)  $P(C_2) \ge \epsilon'\, P(X_k \in B_{\bar\delta}(x^*) \cap R_k)$.

Combining (26), (27), and (18), we see that for large enough $k$,

(28)  $P(C_1) + P(C_2) \ge \epsilon' (1 - \epsilon)$.

Finally, from (28), (23), and (19), we conclude that for large enough $k$,

(29)  $P(m_k^{\alpha} E_k \ge \bar\delta) > \tilde\epsilon := \epsilon' (1 - \epsilon)$,

and the theorem is proved.

Theorem 6.1 is important in that it provides a benchmark for efficiency. Specifically, Theorem 6.1 implies that sampling and recursion choices that result in errors achieving the rate $W_k^{-\alpha}$ are efficient. Theorem 6.1 relies on the assumption in (17), which puts an upper bound on the quality (convergence rate) of the simulation estimator $H_k(m, x)$; the reader will recall that Assumption 3.2 puts a lower bound on the quality of $H_k(m, x)$. The condition in (17) is weak, especially since $H_k(m, x)$, being a simulation estimator, is routinely governed by a central limit theorem (CLT) [5] of the form $m^{\alpha} (H_k(m, x) - h_k(x)) \xrightarrow{d} N(0, \Sigma(x))$ as $m \to \infty$, where $N(0, \Sigma(x))$ is a normal random variable with zero mean and covariance $\Sigma(x)$. We emphasize that Theorem 6.1 only says that $\alpha$ is an upper bound for the convergence rate of SCSR and says nothing about whether this rate is in fact achievable. We will now work toward a general lower bound on the sampling rates that achieve efficiency.
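The efficiency benchmark of Theorem 6.1 can be probed with a small experiment that tracks the normalized error $W_k^{\alpha} E_k$ under polynomial versus geometric sample-size growth, using a one-dimensional Linear$(l)$ DA map with synthetic noise; all constants are illustrative choices of ours:

```python
import numpy as np

def wk_alpha_error(growth, n_iters=40, l=0.5, alpha=0.5, x_star=1.0, seed=0):
    """Return W_k^alpha * E_k after n_iters SCSR steps on the Linear(l) DA map
    h(x) = -(1 - l)(x - x_star), with estimator noise of size m_k^{-alpha}."""
    rng = np.random.default_rng(seed)
    x, W = 0.0, 0
    for k in range(1, n_iters + 1):
        m = k ** 2 if growth == "poly" else 2 ** k  # Polynomial vs Geometric(2)
        W += m
        x += -(1 - l) * (x - x_star) + rng.standard_normal() / np.sqrt(m)
    return W ** alpha * abs(x - x_star)

# Average over independent sample paths: with polynomial sample sizes, the
# normalized error W_k^alpha E_k drifts upward (it is not O_p(1)), while with
# geometric sample sizes it stays stochastically bounded, anticipating the
# message of Theorem 6.3 below.
poly = float(np.mean([wk_alpha_error("poly", seed=s) for s in range(50)]))
geom = float(np.mean([wk_alpha_error("geom", seed=s) for s in range(50)]))
print(poly, geom)
```

The averaging over seeds matters: the claim is about stochastic boundedness of $W_k^{\alpha} E_k$, not about any single sample path.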
We will need the following lemma for proving such a lower bound.

Lemma 6.2. Let {a_k} be any positive-valued sequence. Then
(i) a_k = Θ(Σ_{j=1}^{k} a_j) if {a_k} is Geometric(c) or faster;
(ii) a_k = o(Σ_{j=1}^{k} a_j) if {a_k} is Polynomial(λ_p, p) or slower.

Proof of (i). If {a_k} is Geometric(c) or faster, we know that a_{k+1}/a_k ≥ c > 1 for k large enough. Hence, for some k_0 and all k ≥ j ≥ k_0, a_j/a_k ≤ c^{−(k−j)}. This implies that for k ≥ k_0,

(30) a_k^{−1} Σ_{j=1}^{k} a_j = a_k^{−1} Σ_{j=1}^{k_0} a_j + Σ_{j=k_0+1}^{k} a_j/a_k ≤ a_k^{−1} Σ_{j=1}^{k_0} a_j + Σ_{j=k_0+1}^{k} c^{−(k−j)} ≤ a_k^{−1} Σ_{j=1}^{k_0} a_j + 1/(1 − c^{−1}).

Using (30) and since a_k ≤ Σ_{j=1}^{k} a_j, we conclude that the assertion holds.

Proof of (ii). Let p > 0 be such that {a_k} is Polynomial(λ_p, p) or slower. We then know that for some k_0 > 0 and all k ≥ j ≥ k_0, a_j/a_k ≥ j^p/k^p. This implies that

(31) a_k^{−1} Σ_{j=1}^{k} a_j = a_k^{−1} Σ_{j=1}^{k_0} a_j + Σ_{j=k_0+1}^{k} a_j/a_k ≥ a_k^{−1} Σ_{j=1}^{k_0} a_j + k^{−p} Σ_{j=k_0+1}^{k} j^p.

Now notice that the term k^{−p} Σ_{j=k_0+1}^{k} j^p appearing on the right-hand side of (31) diverges as k → ∞ to conclude that the assertion in (ii) holds.

We are now ready to present a lower bound on the rate at which sample sizes should be increased in order to ensure optimal convergence rates.

Theorem 6.3. Let the postulates of Theorem 6.1 hold.
(i) Suppose m_k = o(W_k). Then the sequence of solutions {X_k} is such that W_k^{α} ‖E_k‖ is not O_p(1); that is, there exist ɛ̃ > 0 and a subsequence {k_n} such that P(W_{k_n}^{α} ‖E_{k_n}‖ ≥ n) > ɛ̃.
(ii) If {m_k} grows as Polynomial(λ_p, p), then W_k^{α} ‖E_k‖ is not O_p(1).

Proof of (i). The postulates of Theorem 6.1 hold, and hence we know from (29) in the proof of Theorem 6.1 that there exists K_1(δ, ɛ) such that for k ≥ K_1(δ, ɛ),

(32) P(m_k^{α} ‖E_k‖ ≥ δ) ≥ (1 − ɛ) ɛ,

where ɛ, δ are positive constants that satisfy the assumption in (17). Since m_k = o(W_k) and α > 0, we see that m_k^{α}/W_k^{α} = o(1) as k → ∞. Therefore, for any n > 0, there exists K_2(n) such that for k ≥ K_2(n),

(33) δ ≥ n m_k^{α}/W_k^{α}.

Combining (32) and (33), we see that for any n > 0, if k ≥ max(K_1(δ, ɛ), K_2(n)), then

(34) P(m_k^{α} ‖E_k‖ ≥ n m_k^{α}/W_k^{α}) ≥ (1 − ɛ) ɛ,

and hence, for k ≥ max(K_1(δ, ɛ), K_2(n)),

(35) P(W_k^{α} ‖E_k‖ ≥ n) ≥ (1 − ɛ) ɛ.

Proof of (ii). The assertion is seen to be true from the assertion in (i) upon noticing that if {m_k} grows as Polynomial(λ_p, p), then m_k = o(W_k).

Theorem 6.3 is important since its assertions imply that for SCSR to have any chance of efficiency, sample sizes should be increased at least geometrically. This is irrespective of the speed of the recursion DA. Of course, since this is only a lower bound, increasing the sample size at least geometrically does not guarantee efficiency, which, as we shall see, depends on the speed of the DA recursion. Before we present such an efficiency result for linearly converging DA recursions, we need two more lemmas.

Lemma 6.4. Let {a_j(k)}, 1 ≤ j ≤ k, be a triangular array of positive-valued real numbers. Assume that the following hold.
(i) There exist k̃ and β > 1 such that a_{j+1}(k)/a_j(k) ≥ β for all j ∈ [k̃, k − 1].
(ii) lim sup_{k→∞} a_j(k)/a_k(k) = l_j < ∞ for each j ∈ [1, k̃ − 1].
Then S_k = Σ_{j=1}^{k} a_j(k) = O(a_k(k)).

Proof. We have, for k large enough and any ɛ ∈ (0, ∞),

(36) S_k/a_k(k) = Σ_{j=1}^{k̃−1} a_j(k)/a_k(k) + Σ_{j=k̃}^{k} a_j(k)/a_k(k) ≤ Σ_{j=1}^{k̃−1} (l_j + ɛ) + Σ_{j=k̃}^{k} β^{−(k−j)} ≤ Σ_{j=1}^{k̃−1} (l_j + ɛ) + 1/(1 − β^{−1}),

where the inequalities follow from assumptions (i) and (ii). Since β > 1, k̃ < ∞, and l_j < ∞ for each j, the right-hand side of (36) is finite, and the assertion holds.

Lemma 6.5. Let {S_n} be a nonnegative sequence of random variables, N_0 a well-defined random variable, and {a_n}, {b_n} positive-valued deterministic sequences.
(i) If E[S_n] = O(a_n), then S_n = O_p(a_n).
(ii) If S_n ≤ O_p(b_n) for n ≥ N_0, then S_n = O_p(b_n).

Proof. Suppose the first assertion is false. Then there exist ɛ > 0 and a subsequence {n_j} such that P(S_{n_j}/a_{n_j} ≥ j) ≥ ɛ for all j ≥ 1. This, however, implies that E[S_{n_j}/a_{n_j}] ≥ j ɛ for all j ≥ 1, contradicting the postulate E[S_n] = O(a_n). The first assertion of the lemma is thus proved. For proving the second assertion, we first note that the postulate S_n ≤ O_p(b_n) for n ≥ N_0 means that S_n ≤ B_n for n ≥ N_0, where {B_n} is a sequence of random variables satisfying B_n = O_p(b_n).
Now, since B_n = O_p(b_n), given ɛ > 0 we can choose b(ɛ) and n_1(ɛ) so that P(B_n/b_n ≥ b(ɛ)) ≤ ɛ/2 for all n ≥ n_1(ɛ). Also, since N_0 is a well-defined random variable, we can find n_2(ɛ) such that for all n ≥ n_2(ɛ), P(N_0 > n) ≤ ɛ/2. We can then write, for n ≥ max(n_1(ɛ), n_2(ɛ)),

(37) P(S_n/b_n ≥ b(ɛ)) ≤ P({B_n/b_n ≥ b(ɛ)} ∩ {N_0 ≤ n}) + P({S_n/b_n ≥ b(ɛ)} ∩ {N_0 > n}) ≤ P(B_n/b_n ≥ b(ɛ)) + P(N_0 > n) ≤ ɛ/2 + ɛ/2 = ɛ,

thus proving the assertion in (ii).
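The dichotomy in Lemma 6.2, which drives the geometric lower bound of Theorem 6.3, can be sanity-checked numerically. The sketch below is illustrative, with arbitrarily chosen constants: for a Geometric(c) sequence the last term is a nonvanishing fraction of the partial sum, while for a Polynomial(λ_p, p) sequence that fraction tends to zero.

```python
# Numerical illustration of Lemma 6.2 (arbitrary illustrative constants):
# geometric growth => a_k = Theta(sum_{j<=k} a_j); polynomial growth =>
# a_k = o(sum_{j<=k} a_j).

def last_term_fraction(seq):
    # returns a_k / sum_{j=1}^k a_j for the last index k of seq
    partial = 0.0
    frac = 0.0
    for a in seq:
        partial += a
        frac = a / partial
    return frac

K = 60
geometric = [1.5 ** k for k in range(1, K + 1)]        # Geometric(1.5)
polynomial = [float(k ** 2) for k in range(1, K + 1)]  # Polynomial(1, 2)

print(round(last_term_fraction(geometric), 4))   # stays near 1 - 1/c = 1/3
print(round(last_term_fraction(polynomial), 4))  # roughly (p + 1)/k, tending to 0
```

Increasing K leaves the geometric fraction essentially unchanged while driving the polynomial fraction to zero, matching assertions (i) and (ii) of the lemma.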

We are now ready to prove the main result on the convergence rate and efficiency of SCSR when the DA recursion exhibits linear convergence. Theorem 6.6 presents the convergence rate in terms of the iteration number k first, and then in terms of the total simulation work W_k.

Theorem 6.6 (linearly converging DA). Let Assumptions 3.1 and 3.2 hold. Also, suppose the following two assumptions hold.
A.1. The deterministic recursion (DA) exhibits Linear(l) convergence in a neighborhood around x*; that is, there exists a neighborhood V of x* such that whenever x ∈ V, and for all k,
‖x + h_k(x) − x*‖ ≤ l ‖x − x*‖, l ∈ (0, 1).
A.2. For all x and k,
‖x + h_k(x) − x*‖ ≤ (1 − s) ‖x − x*‖, s ∈ (0, 1).
Then, recalling that E_k := ‖X_k − x*‖, as k → ∞, the following hold:
(i)
E_k = O_p(k^{−pα}) if {m_k} grows as Polynomial(λ_p, p), pα > 1;
E_k = O_p(c^{−kα}) if {m_k} grows as Geometric(c) with c ∈ (1, l^{−1/α});
E_k = O_p(l^k) if {m_k} grows as Geometric(c) with c ≥ l^{−1/α};
E_k = O_p(l^k) if {m_k} grows as SupExponential(λ_t, t).
(ii)
W_k^{αp/(p+1)} E_k = O_p(1) if {m_k} grows as Polynomial(λ_p, p), pα > 1;
W_k^{α} E_k = O_p(1) if {m_k} grows as Geometric(c) with c ∈ (1, l^{−1/α});
W_k^{α log_{c^α}(l^{−1})} E_k = O_p(1) if {m_k} grows as Geometric(c) with c ≥ l^{−1/α};
(log W_k)^{log_t(1/l)} E_k = O_p(1) if {m_k} grows as SupExponential(λ_t, t).

Proof. First we see that Assumptions 3.1 and 3.2 and A.2 hold, and that all sample size sequences {m_k} considered in (i) and (ii) satisfy m_k^{−α} = O(k^{−(1+δ)}) for some δ > 0. We thus see that the postulates of Theorem 5.2 hold, implying that E_k = ‖X_k − x*‖ → 0 wp1. Therefore, excluding a set of measure zero, for any given δ̃ > 0 there exists a well-defined random variable K_0 = K_0(δ̃) such that ‖X_k − x*‖ ≤ δ̃ for k ≥ K_0. Now choose δ̃ ≤ 1 such that the ball B_{δ̃}(x*) ⊆ V, where V is the neighborhood appearing in Assumption A.1. Since X_{k+1} = X_k + H_k(m_k, X_k), we can write

X_{K_0+j+1} − x* = X_{K_0+j} − x* + h_{K_0+j}(X_{K_0+j}) + H_{K_0+j}(m_{K_0+j}, X_{K_0+j}) − h_{K_0+j}(X_{K_0+j}),

and hence

(38) ‖X_{K_0+j+1} − x*‖ ≤ l ‖X_{K_0+j} − x*‖ + ‖H_{K_0+j}(m_{K_0+j}, X_{K_0+j}) − h_{K_0+j}(X_{K_0+j})‖.

Recursing (38) backward and recalling the notation E_k := ‖X_k − x*‖, we have, for j ≥ 0,

(39) E_{K_0+j+1} ≤ l^{j+1} E_{K_0} + Σ_{i=0}^{j} l^{j−i} ‖H_{K_0+i}(m_{K_0+i}, X_{K_0+i}) − h_{K_0+i}(X_{K_0+i})‖
 ≤ l^{j+1} + Σ_{i=K_0}^{K_0+j} l^{K_0+j−i} ‖H_i(m_i, X_i) − h_i(X_i)‖
 ≤ l^{j+1} + Σ_{i=1}^{K_0+j} l^{K_0+j−i} ‖H_i(m_i, X_i) − h_i(X_i)‖,

where the second inequality above follows from the definition of K_0 (and δ̃ ≤ 1) after relabeling the summation index i → K_0 + i, and the third inequality follows from the addition of some positive terms to the summation. Relabeling k = K_0 + j in (39) and denoting ζ_j := ‖H_j(m_j, X_j) − h_j(X_j)‖, we can write, for k ≥ K_0,

(40) E_{k+1} ≤ l^{k−K_0+1} + Σ_{j=1}^{k} l^{k−j} ζ_j.

Recalling the filtration F_{k−1} generated by the history sequence, we notice that

(41) E[Σ_{j=1}^{k} l^{k−j} ζ_j] = Σ_{j=1}^{k} l^{k−j} E[ζ_j] = Σ_{j=1}^{k} l^{k−j} E[E[ζ_j | F_{j−1}]] ≤ Σ_{j=1}^{k} l^{k−j} E[m_j^{−α} (κ_0 + κ_1 ‖X_j‖)] ≤ Σ_{j=1}^{k} l^{k−j} m_j^{−α} (κ_0 + κ_1 ‖x*‖ + κ_1 E[E_j]),

where the first inequality in (41) is due to Assumption 3.2. Due to Theorem 5.2, we know that for a given ɛ > 0, there exists j_0(ɛ) such that for all j ≥ j_0(ɛ), E[E_j] ≤ ɛ. We use this in (41) and write

(42) E[Σ_{j=1}^{k} l^{k−j} ζ_j] ≤ Σ_{j=1}^{j_0(ɛ)} l^{k−j} m_j^{−α} (κ_0 + κ_1 ‖x*‖ + κ_1 E[E_j]) + Σ_{j=j_0(ɛ)+1}^{k} l^{k−j} m_j^{−α} (κ_0 + κ_1 ‖x*‖ + κ_1 ɛ)
 ≤ l^{k+1} Σ_{j=1}^{j_0(ɛ)} l^{−j−1} m_j^{−α} (κ_0 + κ_1 ‖x*‖ + κ_1 E[E_j]) + (κ_0 + κ_1 ‖x*‖ + κ_1 ɛ) Σ_{j=1}^{k} l^{k−j} m_j^{−α}.

Since j_0(ɛ) is finite and E[E_j] < ∞ for all j ≤ j_0(ɛ), the inequality in (42) implies that

(43) E[Σ_{j=1}^{k} l^{k−j} ζ_j] = O(l^{k+1} + Σ_{j=1}^{k} l^{k−j} m_j^{−α}).

From part (i) of Lemma 6.5, we know that if a positive random sequence {S_k} satisfies E[S_k] = O(a_k), where {a_k} is a positive-valued deterministic sequence, then S_k/a_k = O_p(1). Therefore, we see from (43), after setting S_k to be Σ_{j=1}^{k} l^{k−j} ζ_j and a_k to be l^{k+1} + Σ_{j=1}^{k} l^{k−j} m_j^{−α}, that

(44) Σ_{j=1}^{k} l^{k−j} ζ_j = O_p(l^{k+1} + Σ_{j=1}^{k} l^{k−j} m_j^{−α}).

Use (40) and (44) to write, for k ≥ K_0,

(45) E_{k+1} ≤ l^{k+1} l^{−K_0} + O_p(l^{k+1} + Σ_{j=1}^{k} l^{k−j} m_j^{−α}).

Now use part (ii) of Lemma 6.5 on (45) to conclude that

(46) E_{k+1} = O_p(l^{k+1} + Σ_{j=1}^{k} l^{k−j} m_j^{−α}).

We will now show that the first equality in assertion (i) of Theorem 6.6 holds by showing that the two assumptions of Lemma 6.4 hold for the term Σ_{j=1}^{k} l^{k−j} m_j^{−α} appearing in (46). Set the summand to a_j(k) := l^{k−j} m_j^{−α}, and since m_j = λ_p j^p, we have a_{j+1}(k)/a_j(k) = (1/l) (j/(j+1))^{pα}. Choosing β such that β > 1 and lβ < 1, and setting k̃ = max(1, ⌈((lβ)^{−1/(pα)} − 1)^{−1}⌉), we see that the first assumption of Lemma 6.4 is satisfied. The second assumption of Lemma 6.4 is also satisfied since for any fixed j > 0, lim sup_{k→∞} a_j(k)/a_k(k) = lim sup_{k→∞} l^{k−j} (k/j)^{pα} = 0.

To prove the second and third equalities in assertion (i) of Theorem 6.6, suppose {m_k} grows as Geometric(c) with c < l^{−1/α}, that is, c^{−α} > l. Then, noticing that m_k = m_0 c^k, we write

(47) Σ_{j=1}^{k} l^{k−j} m_j^{−α} = m_0^{−α} l^k Σ_{j=1}^{k} (c^{−α}/l)^j = m_0^{−α} l^k (c^{−α}/l) ((c^{−α}/l)^k − 1)/((c^{−α}/l) − 1) = Θ(c^{−kα})

and use (47) in (46). If {m_k} grows as Geometric(c) with c > l^{−1/α}, that is, c^{−α} < l, then notice that (47) becomes

(48) Σ_{j=1}^{k} l^{k−j} m_j^{−α} = m_0^{−α} l^k Σ_{j=1}^{k} (c^{−α}/l)^j = m_0^{−α} l^k (c^{−α}/l) (1 − (c^{−α}/l)^k)/(1 − (c^{−α}/l)) = Θ(l^k).

Now use (48) in (46). To see that the fourth equality in assertion (i) of Theorem 6.6 holds, we notice that a sample size sequence {m_k} that grows as SupExponential(λ_t, t) is faster (as defined in Definition 2.2) than a sample size sequence that grows as Geometric(c).

Proof of (ii). To prove the assertion in (ii), we notice that since W_k = Σ_{j=1}^{k} m_j, we have

(49) W_k = Θ(k^{p+1}) if {m_k} grows as Polynomial(λ_p, p);
 W_k = Θ(c^k) if {m_k} grows as Geometric(c);
 W_k = Θ((λ_t^{1/(t−1)} m_0)^{t^k}) if {m_k} grows as SupExponential(λ_t, t).

Now use (49) in assertion (i) to obtain the assertion in (ii).

Theorem 6.6 provides various insights about the behavior of the error in SCSR iterates. For instance, the error structures detailed in (i) of Theorem 6.6 suggest two well-defined sampling regimes where only one of the two error types, sampling error or recursion error, is dominant. Specifically, note that E_k = O_p(k^{−pα}) when the sampling rate is Polynomial(λ_p, p). This implies that when DA exhibits Linear(l) convergence, polynomial sampling is "too little" in the sense that SCSR's convergence rate is dictated purely by sampling error, since the constant l corresponding to DA's convergence is absent in the expression for E_k. The corresponding reduction in efficiency can be seen in (ii), where E_k is shown to converge as O_p(W_k^{−αp/(1+p)}). (Recall that efficiency amounts to {E_k} achieving a convergence rate O_p(W_k^{−α}).) The case that is diametrically opposite to polynomial sampling is superexponential sampling, where the sampling is "too much" in the sense that the convergence rate E_k = O_p(l^k) is dominated by recursion error. There is a corresponding reduction in efficiency, as can be seen in the expression provided in (ii) of Theorem 6.6.
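The two-regime behavior of the geometric-series calculation in (47) and (48) is easy to verify numerically. The sketch below is illustrative only, with arbitrarily chosen constants l = 0.5, α = 1, m_0 = 1 (so the threshold is l^{−1/α} = 2); it evaluates the deterministic bound from (46) and checks which error term dominates on each side of the threshold.

```python
# Illustrative check of (47)-(48): b_k = l^{k+1} + sum_{j=1}^k l^{k-j} m_j^{-alpha}
# with m_j = m_0 * c^j.  Constants are arbitrary: l = 0.5, alpha = 1, m_0 = 1,
# giving the threshold l^{-1/alpha} = 2.
l, alpha, m0 = 0.5, 1.0, 1.0

def bound(k, c):
    s = sum(l ** (k - j) * (m0 * c ** j) ** (-alpha) for j in range(1, k + 1))
    return l ** (k + 1) + s

# c below the threshold: b_k / c^{-alpha k} settles to a constant, i.e., the
# sampling-error term Theta(c^{-alpha k}) dominates.
print([round(bound(k, 1.5) / 1.5 ** (-alpha * k), 3) for k in (20, 30, 40)])

# c above the threshold: b_k / l^k settles to a constant, i.e., the recursion
# error Theta(l^k) dominates.
print([round(bound(k, 4.0) / l ** k, 3) for k in (20, 30, 40)])
```

The first ratio stabilizes for c = 1.5 < 2, and the second stabilizes for c = 4 > 2, matching the Θ(c^{−kα}) and Θ(l^k) conclusions of (47) and (48), respectively.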
The assertion (ii) in Theorem 6.6 also implies that the only sampling regime that achieves efficiency for linearly converging DA recursions is a Geometric(c) sampling rate with c ∈ (1, l^{−1/α}). Values of c on or above the threshold l^{−1/α} result in too much sampling, in the sense of a dominating recursion error and a corresponding reduction in efficiency, as quantified in (i) and (ii) of Theorem 6.6. Before we state a result that is analogous to Theorem 6.6 for the context of superlinearly converging DA recursions, we state and prove a lemma that will be useful.

Lemma 6.7. Suppose {a_k} is a positive-valued sequence satisfying a_k^{1/q^k} → 1 as k → ∞, where q > 1. If Λ ∈ (0, 1) is a well-defined random variable, then a_k Λ^{q^k} →p 0.

Proof. Since Λ ∈ (0, 1), for any given ɛ ∈ (0, 1) there exists δ_1(ɛ) > 0 such that P(Λ > (1 + δ_1(ɛ))^{−1}) ≤ ɛ. Also, since a_k^{1/q^k} → 1 as k → ∞, we can find N_1 such that a_k^{1/q^k} ≤ (1 − δ_1^2(ɛ))^{−1} for all k ≥ N_1. Therefore, we see that for any given ɛ > 0, there exists δ_1(ɛ) > 0 such that for k ≥ N_1,

(50) P(a_k^{1/q^k} Λ > (1 − δ_1^2(ɛ))^{−1} (1 + δ_1(ɛ))^{−1}) ≤ P(Λ > (1 + δ_1(ɛ))^{−1}) ≤ ɛ.

Also notice that ρ(ɛ) := (1 − δ_1^2(ɛ))^{−1} (1 + δ_1(ɛ))^{−1} < 1 for δ_1(ɛ) chosen small enough, so that, for any given δ_0 > 0, we can choose N_2 such that ρ(ɛ)^{q^k} ≤ δ_0 for k ≥ N_2. Using this observation with (50), we see that, for any given δ_0 > 0 and ɛ > 0, P(a_k Λ^{q^k} > δ_0) ≤ ɛ for k ≥ max(N_1, N_2), and the assertion of the lemma holds.

Theorem 6.8 (superlinearly converging DA). Let Assumption 3.1, Assumption A.2 from Theorem 6.6, and the following assumption on superlinear decrease hold.
A.3. The deterministic recursion (DA) exhibits SuperLinear(λ_q, q) convergence in a neighborhood around x*; that is, there exist a neighborhood V of x* and constants λ_q > 0, q > 1 such that whenever x ∈ V, and for all k,
‖x + h_k(x) − x*‖ ≤ λ_q ‖x − x*‖^q.
Also, suppose the simulation estimator H_k(m_k, X_k) satisfies, for all k ≥ 0, n ≥ 0, with probability one,

(51) E[m_k^{αn} ‖H_k(m_k, X_k) − h_k(X_k)‖^n | F_{k−1}] ≤ κ_0^n + κ_1^n ‖X_k‖,

for some α > 0 and where κ_0 and κ_1 > 0 are constants. Then, as k → ∞, the following hold:
(i)
E_k = O_p(k^{−αp}) if {m_k} grows as Polynomial(λ_p, p), pα > 1;
E_k = O_p(c^{−αk}) if {m_k} grows as Geometric(c);
E_k = O_p(c_1^{−αt^k}) if {m_k} grows as SupExponential(λ_t, t), t ∈ (1, q);
E_k = O_p(Λ^{q^k} + c_2^{−αq^k}) if {m_k} grows as SupExponential(λ_t, t), t ≥ q,
where c_1 = m_0 λ_t^{1/(t−1)}, c_2 = κ^{−1/α} (2λ_q^{1/q})^{−q/(α(q−1))} m_0, κ = max(κ_0, κ_1), and Λ is a random variable that satisfies Λ ∈ (0, 1).
(ii)
W_k^{αp/(p+1)} E_k = O_p(1) if {m_k} grows as Polynomial(λ_p, p), pα > 1;
W_k^{α} E_k = O_p(1) if {m_k} grows as Geometric(c);
W_k^{α} E_k = O_p(1) if {m_k} grows as SupExponential(λ_t, t), t ∈ (1, q);
(Λ^{q^{k_W}} + c_2^{−αq^{k_W}})^{−1} E_k = O_p(1) if {m_k} grows as SupExponential(λ_t, t), t ≥ q,
where k_W := log_t(log_{c_1} W_k).

Proof. Repeating the arguments leading to (38) in the proof of Theorem 6.6, we write

(52) E_{K_0+j+1} ≤ λ_q E_{K_0+j}^q + ζ_{K_0+j},

where ζ_{K_0+j} = ‖h_{K_0+j}(X_{K_0+j}) − H_{K_0+j}(m_{K_0+j}, X_{K_0+j})‖ and, as in the proof of Theorem 6.6, K_0 is a random variable such that, except for a set of measure zero, ‖X_k − x*‖ ≤ δ̃ for k ≥ K_0; the constant δ̃ is chosen such that the ball B_{δ̃}(x*) ⊆ V and δ̃ < (2λ_q)^{−1/(q−1)}, where the set V is the neighborhood appearing in A.3.

Denote s(n) := 1 + q + ⋯ + q^{n−1} = (q^n − 1)/(q − 1), n ≥ 1, and recurse (52) to obtain, for j ≥ 0,

(53) E_{K_0+j+1} ≤ λ_q E_{K_0+j}^q + ζ_{K_0+j}
 ≤ λ_q (λ_q E_{K_0+j−1}^q + ζ_{K_0+j−1})^q + ζ_{K_0+j}
 ≤ 2^{q−1} λ_q^{1+q} E_{K_0+j−1}^{q^2} + 2^{q−1} λ_q ζ_{K_0+j−1}^q + ζ_{K_0+j}
 ≤ ⋯
 ≤ 2^{s(j+1)−1} λ_q^{s(j+1)} E_{K_0}^{q^{j+1}} + Σ_{i=0}^{j} ζ_{K_0+i}^{q^{j−i}} λ_q^{s(j−i)} 2^{s(j−i+1)−1}
 ≤ 2^{s(j+1)−1} λ_q^{s(j+1)} E_{K_0}^{q^{j+1}} + Σ_{i=1}^{K_0+j} ζ_i^{q^{K_0+j−i}} λ_q^{s(K_0+j−i)} 2^{s(K_0+j−i+1)−1},

where the second-to-last inequality in (53) follows by induction using (a + b)^q ≤ 2^{q−1}(a^q + b^q), and the last inequality is obtained after relabeling the summation index i → K_0 + i and adding some positive terms to the right-hand side of (53). Now relabel k = K_0 + j in (53), notice that E_{K_0} ≤ δ̃ by the definition of K_0, and use λ_q^{s(n−1)} 2^{s(n)−1} ≤ (2λ_q^{1/q})^{s(n)} (which holds since λ_q may be taken to be at least one without loss of generality) to get, for k ≥ K_0,

(54) E_{k+1} ≤ 2^{s(k−K_0+1)−1} λ_q^{s(k−K_0+1)} δ̃^{q^{k−K_0+1}} + Σ_{j=1}^{k} ζ_j^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)}.

Now, we see from (51) (after taking expectation with respect to F_{j−1}) that

(55) E[ζ_j^{q^{k−j}}] ≤ κ^{q^{k−j}} m_j^{−αq^{k−j}} (1 + E[‖X_j‖]) ≤ κ^{q^{k−j}} m_j^{−αq^{k−j}} (1 + ‖x*‖ + E[E_j]),

where κ = max(κ_0, κ_1). As in the proof of Theorem 6.6, due to Theorem 5.2, for given ɛ > 0 there exists j_0(ɛ) such that for j ≥ j_0(ɛ), E[E_j] ≤ ɛ. This and (55) imply that

(56) E[Σ_{j=1}^{k} ζ_j^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)}] ≤ Σ_{j=1}^{j_0(ɛ)} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}} (1 + ‖x*‖ + E[E_j]) + Σ_{j=j_0(ɛ)+1}^{k} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}} (1 + ‖x*‖ + ɛ).

Since E[E_j] < ∞ for j ≤ j_0(ɛ), we have ē := max(max{E[E_j] : j = 1, 2, ..., j_0(ɛ)}, ɛ) < ∞. The inequality in (56) then implies that

(57) E[Σ_{j=1}^{k} ζ_j^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)}] ≤ (1 + ‖x*‖ + ē) Σ_{j=1}^{k} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}}.

Again, as in the proof of Theorem 6.6, we know from part (i) of Lemma 6.5 that if a positive random sequence {S_n} satisfies E[S_n] = O(a_n), where {a_n} is a deterministic positive-valued sequence, then S_n = O_p(a_n). Therefore, we see from (57) that

(58) Σ_{j=1}^{k} ζ_j^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} = O_p(Σ_{j=1}^{k} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}}).

Use (58) and (54) to write, for k ≥ K_0,

(59) E_{k+1} ≤ 2^{s(k−K_0+1)−1} λ_q^{s(k−K_0+1)} δ̃^{q^{k−K_0+1}} + O_p(Σ_{j=1}^{k} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}})
 ≤ (2λ_q)^{−1/(q−1)} (Λ(K_0))^{q^{k+1}} + O_p(Σ_{j=1}^{k} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}}),

where, after some algebra, the random variable Λ(K_0) in (59) can be seen to be

(60) Λ(K_0) = ((2λ_q)^{1/(q−1)} δ̃)^{q^{−K_0}}.

(The random variable Λ(K_0) ∈ (0, 1) because δ̃ has been chosen so that δ̃ < (2λ_q)^{−1/(q−1)}.)

Proof of (i). In what follows, the assertions in (i) will be proved using conclusions from three parts named Part A, Part B, and Part C. In Part A, we will analyze the behavior of the summation Σ_{j=1}^{k} κ^{q^{k−j}} (2λ_q^{1/q})^{s(k−j+1)} m_j^{−αq^{k−j}} appearing in (59) when the sample size sequence {m_k} is Polynomial(λ_p, p), Geometric(c), or SupExponential(λ_t, t) with t ∈ (1, q). In Part B, we will analyze the behavior of the same summation when the sample size sequence {m_k} is SupExponential(λ_t, t) with t ≥ q. In Part C, we will analyze the


More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

Comparison of Modern Stochastic Optimization Algorithms

Comparison of Modern Stochastic Optimization Algorithms Comparison of Modern Stochastic Optimization Algorithms George Papamakarios December 214 Abstract Gradient-based optimization methods are popular in machine learning applications. In large-scale problems,

More information

Existence and Uniqueness

Existence and Uniqueness Chapter 3 Existence and Uniqueness An intellect which at a certain moment would know all forces that set nature in motion, and all positions of all items of which nature is composed, if this intellect

More information

Estimating Gaussian Mixture Densities with EM A Tutorial

Estimating Gaussian Mixture Densities with EM A Tutorial Estimating Gaussian Mixture Densities with EM A Tutorial Carlo Tomasi Due University Expectation Maximization (EM) [4, 3, 6] is a numerical algorithm for the maximization of functions of several variables

More information

R-Linear Convergence of Limited Memory Steepest Descent

R-Linear Convergence of Limited Memory Steepest Descent R-Linear Convergence of Limited Memory Steepest Descent Fran E. Curtis and Wei Guo Department of Industrial and Systems Engineering, Lehigh University, USA COR@L Technical Report 16T-010 R-Linear Convergence

More information

Set, functions and Euclidean space. Seungjin Han

Set, functions and Euclidean space. Seungjin Han Set, functions and Euclidean space Seungjin Han September, 2018 1 Some Basics LOGIC A is necessary for B : If B holds, then A holds. B A A B is the contraposition of B A. A is sufficient for B: If A holds,

More information

Chapter 8 Gradient Methods

Chapter 8 Gradient Methods Chapter 8 Gradient Methods An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Introduction Recall that a level set of a function is the set of points satisfying for some constant. Thus, a point

More information

Numerical Sequences and Series

Numerical Sequences and Series Numerical Sequences and Series Written by Men-Gen Tsai email: b89902089@ntu.edu.tw. Prove that the convergence of {s n } implies convergence of { s n }. Is the converse true? Solution: Since {s n } is

More information

Introduction: The Perceptron

Introduction: The Perceptron Introduction: The Perceptron Haim Sompolinsy, MIT October 4, 203 Perceptron Architecture The simplest type of perceptron has a single layer of weights connecting the inputs and output. Formally, the perceptron

More information

Doubly Indexed Infinite Series

Doubly Indexed Infinite Series The Islamic University of Gaza Deanery of Higher studies Faculty of Science Department of Mathematics Doubly Indexed Infinite Series Presented By Ahed Khaleel Abu ALees Supervisor Professor Eissa D. Habil

More information

Newton-like method with diagonal correction for distributed optimization

Newton-like method with diagonal correction for distributed optimization Newton-lie method with diagonal correction for distributed optimization Dragana Bajović Dušan Jaovetić Nataša Krejić Nataša Krlec Jerinić August 15, 2015 Abstract We consider distributed optimization problems

More information

Sub-Sampled Newton Methods

Sub-Sampled Newton Methods Sub-Sampled Newton Methods F. Roosta-Khorasani and M. W. Mahoney ICSI and Dept of Statistics, UC Berkeley February 2016 F. Roosta-Khorasani and M. W. Mahoney (UCB) Sub-Sampled Newton Methods Feb 2016 1

More information

Inequality Constraints

Inequality Constraints Chapter 2 Inequality Constraints 2.1 Optimality Conditions Early in multivariate calculus we learn the significance of differentiability in finding minimizers. In this section we begin our study of the

More information

Comparison of Orlicz Lorentz Spaces

Comparison of Orlicz Lorentz Spaces Comparison of Orlicz Lorentz Spaces S.J. Montgomery-Smith* Department of Mathematics, University of Missouri, Columbia, MO 65211. I dedicate this paper to my Mother and Father, who as well as introducing

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES

AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES AN ELEMENTARY PROOF OF THE SPECTRAL RADIUS FORMULA FOR MATRICES JOEL A. TROPP Abstract. We present an elementary proof that the spectral radius of a matrix A may be obtained using the formula ρ(a) lim

More information

Suppose R is an ordered ring with positive elements P.

Suppose R is an ordered ring with positive elements P. 1. The real numbers. 1.1. Ordered rings. Definition 1.1. By an ordered commutative ring with unity we mean an ordered sextuple (R, +, 0,, 1, P ) such that (R, +, 0,, 1) is a commutative ring with unity

More information

NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM

NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM NONSMOOTH VARIANTS OF POWELL S BFGS CONVERGENCE THEOREM JIAYI GUO AND A.S. LEWIS Abstract. The popular BFGS quasi-newton minimization algorithm under reasonable conditions converges globally on smooth

More information

A METHOD FOR FAST GENERATION OF BIVARIATE POISSON RANDOM VECTORS. Kaeyoung Shin Raghu Pasupathy

A METHOD FOR FAST GENERATION OF BIVARIATE POISSON RANDOM VECTORS. Kaeyoung Shin Raghu Pasupathy Proceedings of the 27 Winter Simulation Conference S. G. Henderson, B. Biller, M.-H. Hsieh, J. Shortle, J. D. Tew, and R. R. Barton, eds. A METHOD FOR FAST GENERATION OF BIVARIATE POISSON RANDOM VECTORS

More information

arxiv:cs/ v1 [cs.cc] 16 Aug 2006

arxiv:cs/ v1 [cs.cc] 16 Aug 2006 On Polynomial Time Computable Numbers arxiv:cs/0608067v [cs.cc] 6 Aug 2006 Matsui, Tetsushi Abstract It will be shown that the polynomial time computable numbers form a field, and especially an algebraically

More information

ABSTRACT 1. INTRODUCTION

ABSTRACT 1. INTRODUCTION A DIAGONAL-AUGMENTED QUASI-NEWTON METHOD WITH APPLICATION TO FACTORIZATION MACHINES Aryan Mohtari and Amir Ingber Department of Electrical and Systems Engineering, University of Pennsylvania, PA, USA Big-data

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

The complexity of recursive constraint satisfaction problems.

The complexity of recursive constraint satisfaction problems. The complexity of recursive constraint satisfaction problems. Victor W. Marek Department of Computer Science University of Kentucky Lexington, KY 40506, USA marek@cs.uky.edu Jeffrey B. Remmel Department

More information

Harmonic Analysis. 1. Hermite Polynomials in Dimension One. Recall that if L 2 ([0 2π] ), then we can write as

Harmonic Analysis. 1. Hermite Polynomials in Dimension One. Recall that if L 2 ([0 2π] ), then we can write as Harmonic Analysis Recall that if L 2 ([0 2π] ), then we can write as () Z e ˆ (3.) F:L where the convergence takes place in L 2 ([0 2π] ) and ˆ is the th Fourier coefficient of ; that is, ˆ : (2π) [02π]

More information

MATH4406 Assignment 5

MATH4406 Assignment 5 MATH4406 Assignment 5 Patrick Laub (ID: 42051392) October 7, 2014 1 The machine replacement model 1.1 Real-world motivation Consider the machine to be the entire world. Over time the creator has running

More information

Sub-Sampled Newton Methods for Machine Learning. Jorge Nocedal

Sub-Sampled Newton Methods for Machine Learning. Jorge Nocedal Sub-Sampled Newton Methods for Machine Learning Jorge Nocedal Northwestern University Goldman Lecture, Sept 2016 1 Collaborators Raghu Bollapragada Northwestern University Richard Byrd University of Colorado

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance

The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance Marina Meilă University of Washington Department of Statistics Box 354322 Seattle, WA

More information

LECTURE 10: REVIEW OF POWER SERIES. 1. Motivation

LECTURE 10: REVIEW OF POWER SERIES. 1. Motivation LECTURE 10: REVIEW OF POWER SERIES By definition, a power series centered at x 0 is a series of the form where a 0, a 1,... and x 0 are constants. For convenience, we shall mostly be concerned with the

More information

Exact and Inexact Subsampled Newton Methods for Optimization

Exact and Inexact Subsampled Newton Methods for Optimization Exact and Inexact Subsampled Newton Methods for Optimization Raghu Bollapragada Richard Byrd Jorge Nocedal September 27, 2016 Abstract The paper studies the solution of stochastic optimization problems

More information

NORMS ON SPACE OF MATRICES

NORMS ON SPACE OF MATRICES NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system

More information

Monte Carlo Integration I [RC] Chapter 3

Monte Carlo Integration I [RC] Chapter 3 Aula 3. Monte Carlo Integration I 0 Monte Carlo Integration I [RC] Chapter 3 Anatoli Iambartsev IME-USP Aula 3. Monte Carlo Integration I 1 There is no exact definition of the Monte Carlo methods. In the

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

Lecture 17 Brownian motion as a Markov process

Lecture 17 Brownian motion as a Markov process Lecture 17: Brownian motion as a Markov process 1 of 14 Course: Theory of Probability II Term: Spring 2015 Instructor: Gordan Zitkovic Lecture 17 Brownian motion as a Markov process Brownian motion is

More information

Consider the context of selecting an optimal system from among a finite set of competing systems, based

Consider the context of selecting an optimal system from among a finite set of competing systems, based INFORMS Journal on Computing Vol. 25, No. 3, Summer 23, pp. 527 542 ISSN 9-9856 print) ISSN 526-5528 online) http://dx.doi.org/.287/ijoc.2.59 23 INFORMS Optimal Sampling Laws for Stochastically Constrained

More information

Connections between spectral properties of asymptotic mappings and solutions to wireless network problems

Connections between spectral properties of asymptotic mappings and solutions to wireless network problems 1 Connections between spectral properties of asymptotic mappings and solutions to wireless network problems R. L. G. Cavalcante, Member, IEEE, Qi Liao, Member, IEEE, and S. Stańczak, Senior Member, IEEE

More information

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study

More information

means is a subset of. So we say A B for sets A and B if x A we have x B holds. BY CONTRAST, a S means that a is a member of S.

means is a subset of. So we say A B for sets A and B if x A we have x B holds. BY CONTRAST, a S means that a is a member of S. 1 Notation For those unfamiliar, we have := means equal by definition, N := {0, 1,... } or {1, 2,... } depending on context. (i.e. N is the set or collection of counting numbers.) In addition, means for

More information

Theorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers

Theorems. Theorem 1.11: Greatest-Lower-Bound Property. Theorem 1.20: The Archimedean property of. Theorem 1.21: -th Root of Real Numbers Page 1 Theorems Wednesday, May 9, 2018 12:53 AM Theorem 1.11: Greatest-Lower-Bound Property Suppose is an ordered set with the least-upper-bound property Suppose, and is bounded below be the set of lower

More information

Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming

Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming MATHEMATICS OF OPERATIONS RESEARCH Vol. 37, No. 1, February 2012, pp. 66 94 ISSN 0364-765X (print) ISSN 1526-5471 (online) http://dx.doi.org/10.1287/moor.1110.0532 2012 INFORMS Q-Learning and Enhanced

More information

Stochastically Constrained Simulation Optimization On. Integer-Ordered Spaces: The cgr-spline Algorithm

Stochastically Constrained Simulation Optimization On. Integer-Ordered Spaces: The cgr-spline Algorithm Stochastically Constrained Simulation Optimization On Integer-Ordered Spaces: The cgr-spline Algorithm Kalyani Nagaraj Raghu Pasupathy Department of Statistics, Purdue University, West Lafayette, IN 47907,

More information

Lecture 12 Unconstrained Optimization (contd.) Constrained Optimization. October 15, 2008

Lecture 12 Unconstrained Optimization (contd.) Constrained Optimization. October 15, 2008 Lecture 12 Unconstrained Optimization (contd.) Constrained Optimization October 15, 2008 Outline Lecture 11 Gradient descent algorithm Improvement to result in Lec 11 At what rate will it converge? Constrained

More information

DS-GA 1002 Lecture notes 2 Fall Random variables

DS-GA 1002 Lecture notes 2 Fall Random variables DS-GA 12 Lecture notes 2 Fall 216 1 Introduction Random variables Random variables are a fundamental tool in probabilistic modeling. They allow us to model numerical quantities that are uncertain: the

More information

Approximation of Minimal Functions by Extreme Functions

Approximation of Minimal Functions by Extreme Functions Approximation of Minimal Functions by Extreme Functions Teresa M. Lebair and Amitabh Basu August 14, 2017 Abstract In a recent paper, Basu, Hildebrand, and Molinaro established that the set of continuous

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

On Solving Large-Scale Finite Minimax Problems. using Exponential Smoothing

On Solving Large-Scale Finite Minimax Problems. using Exponential Smoothing On Solving Large-Scale Finite Minimax Problems using Exponential Smoothing E. Y. Pee and J. O. Royset This paper focuses on finite minimax problems with many functions, and their solution by means of exponential

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

A Distributed Newton Method for Network Utility Maximization, II: Convergence

A Distributed Newton Method for Network Utility Maximization, II: Convergence A Distributed Newton Method for Network Utility Maximization, II: Convergence Ermin Wei, Asuman Ozdaglar, and Ali Jadbabaie October 31, 2012 Abstract The existing distributed algorithms for Network Utility

More information