Strong Points of Weak Convergence: A Study Using RPA Gradient Estimation for Automatic Learning

Felisa J. Vazquez-Abad*
Department of Computer Science and Operations Research
University of Montreal, Montreal, Quebec H3C 3J7
vazquez@iro.umontreal.ca

Revised, March 11, 1998

Abstract. In this paper we focus on the behavior of adaptive control schemes for automatic learning. Estimates of the sensitivities are used in a gradient-based stochastic approximation procedure in order to drive the process along the steepest descent trajectory in search of the optimum. The learning rates are kept constant for adaptability. For such procedures, convergence can be established in a weak sense. We consider a model problem of a flexible machine where the control parameter is a probability vector. We propose a new sensitivity estimator, generalizing the phantom Rare Perturbation Analysis (RPA) estimator to multi-valued decisions. From the basic properties of the estimators, we build several updating rules based on the weak convergence theory to ensure asymptotic optimality. We illustrate the predicted theoretical behavior with computer simulations. Finally, we compare the behavior of our proposed scheme with that of a regenerative one for which we can establish strong convergence. Our results show that weak convergence yields a dramatic improvement in the rate of convergence, in addition to the capability of adaptation, or tracking.

Keywords: Weak Convergence; Gradient Estimation; Rare Perturbation Analysis; Tracking; Automatic Control

* Supported in part by NSERC-Canada grant # WFA and FCAR-Quebec grant # 93-ER-

1 Introduction

The purpose of this paper is to illustrate the typical behavior of learning algorithms using stochastic approximations (SA). In particular, we compare the behavior of SA using constant gains with that using decreasing gains. It is well known, as we shall briefly review, that gradient-based SA can be used as a steepest descent algorithm in search of local minima. When we apply a gradient-based SA to the control parameter of a system that we wish to optimize, the result of the procedure is a (stochastic) adaptive control process. Under some conditions, SA with decreasing gain parameters converges strongly to local optima. The ODE method, introduced by Kushner [15] and used to show a.s. convergence by [2] and [23] among others, can also be used to establish weak convergence of SA algorithms with constant gain [16], [18], [34]. In addition, the weak convergence method establishes verifiable conditions that characterize the ensuing behavior of the control processes [31]. We show in a simple example what each of the different assumptions and properties means and discuss the implications when implementing the method in practice. While strong convergence results are naturally preferable and more reassuring in practice, it is impossible to show strong convergence for a learning algorithm that uses constant gains. If we settle for weak convergence, however, we may be able to track perturbations and adapt to changes in external conditions.

This paper presents explicit implementations of gradient estimation into SA, as well as different updating rules, all of which yield the same asymptotic result for SA under decreasing gains. Our focus is to study the short term behavior of the different implementations. In particular, we illustrate weak convergence of the control processes to an ODE by showing sample trajectories and comparing them with the corresponding ones under decreasing gains, which converge a.s. We also discuss the concept of consistency in the average, a condition on the gradient estimators used to show weak convergence with the ODE method. A subsection is devoted to illustrating time scales, a theoretical device used in the proofs that can also be used to accelerate convergence by choosing the right implementation. Finally, we present the typical behavior of our learning algorithms for tracking perturbations. We believe that the unique strength of the weak convergence approach lies precisely in the type of assumptions and requirements on the algorithms, which allow us to distinguish the behavior of slightly different implementations.

Consider a model where $\theta \in \mathbb{R}^d$ is a control parameter. Suppose that a performance function of the form $F(\theta)$ has been defined, but that we do not have a closed form expression for it in terms of our model. Instead, we are capable of measuring data directly from the physical system, if it evolves in time, or from controlled experiments. Examples of this problem are numerous in applications where the system is subject to uncertainty. A stochastic approximation procedure is a recursive algorithm of the form:

$$\theta_{n+1} = \theta_n + \epsilon_n Y_n \qquad (1)$$

that in some way adjusts the values of the control variable according to the observations $Y_n$ that, we hope, reflect a measure of the sensitivity of the performance as a function of $\theta$.
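To make recursion (1) concrete, here is a minimal sketch in Python of the procedure on a hypothetical scalar example; the quadratic performance function, its maximizer $\theta^* = 0.8$ and the Gaussian noise are illustrative assumptions, not the paper's model.

```python
# Robbins-Monro sketch of recursion (1) on a toy concave F with a
# unique maximum at theta* = 0.8; the noisy gradient measurements
# Y_n play the role of the observations in (1).
import random

def noisy_gradient(theta, theta_star=0.8, noise_sd=0.1):
    # unbiased measurement of dF/dtheta for F(t) = -(t - theta*)^2 / 2
    return -(theta - theta_star) + random.gauss(0.0, noise_sd)

theta = 0.0
for n in range(1, 10001):
    eps_n = 1.0 / n     # gains satisfy (2): sum = inf, sum of squares < inf
    theta += eps_n * noisy_gradient(theta)
print(theta)            # close to 0.8
```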

The work by Robbins and Monro [27] in the 1950's began a fruitful period for the development of control of stochastic systems. They established that such algorithms behave very much like their deterministic counterparts. Suppose that the gradient $\nabla_\theta F(\theta)$ is well defined and that the function $F(\theta)$ is convex, so that there is a unique maximum $\theta^*$. If we can produce a sequence of random variables $\{Y_n\}$ with $E\{Y_n\} = \nabla_\theta F(\theta_n)$ and uniformly bounded variance, then $\theta_n \to \theta^*$, provided that the gain parameters satisfy:

$$\epsilon_n > 0, \qquad \sum_n \epsilon_n = \infty, \qquad \sum_n \epsilon_n^2 < \infty \qquad (2)$$

The condition $\sum_n \epsilon_n^2 < \infty$ can be weakened (see [20]). Intuitively, if we can obtain direct measurements of $\nabla_\theta F(\theta)$ corrupted by noise, then (1) follows (in the average) the direction of increase of the performance function, while the changes in $\theta_n$ become smaller as we approach the optimum.

Until very recently, as we shall mention later on, there were no available methods for measuring derivatives directly from observations of a stochastic process. Kiefer and Wolfowitz [13] proposed the following approach. Use a sequence $c_n \to 0$ to evaluate

$$Y_n = \frac{f(\theta_n + c_n \hat e_k) - f(\theta_n - c_n \hat e_k)}{2 c_n}$$

where $\hat e_k$ is a vector with the $k$-th component set to unity and the remaining ones to zero, and $f(\cdot)$ is a sample or pathwise estimate of the function $F(\cdot)$. We assume that $E[f(\theta)] = F(\theta)$. Their estimators are strongly consistent for the partial derivative $F'_k(\theta)$ w.r.t. $\theta_k$, in the following sense: if we do not perform the updates, but rather only produce the measurements $Y_n(\theta)$, then $\lim_{n\to\infty} E[Y_n(\theta)] = F'_k(\theta)$ a.s. It is generally the case that good estimation is achieved as more samples are used in the construction of each estimate $Y_n$. This is commonly performed as follows. One simulates or observes the system for a period of time $T_n$ during which the control parameter is kept fixed at the value $\theta_n$. At the end of the estimation interval, the parameter is updated using (1). It is often necessary that $T_n \to \infty$ in order to achieve strong consistency of the estimators. Under these conditions, the procedure converges strongly to $\theta^*$. In [15] the basic Robbins-Monro procedure is generalized, considering problems subject to constraints and using a truncated version of (1).

In the context of stochastic optimization, much progress has been achieved in the past decade, with the increasing interest in constructing single path gradient estimators. Such are the Likelihood Ratio (LR) method of [26] and [9], the Perturbation Analysis methods (IPA and SPA) of [10], [8] and, more recently, the Rare Perturbation Analysis (RPA) of [4], [29], the Harmonic Gradient (HG) of [12], the Weak Derivative methods of [25] and the Simultaneous Perturbations (SPSA) of [28]. While many of these gradient estimators need observations of the system as well as detailed information about the system's dynamics, some of them can be applied to specific problems in a very robust manner. Notably, SPSA, HG and RPA can be used without assuming knowledge of the system's dynamics.
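A similar sketch of the Kiefer-Wolfowitz scheme just described, assuming a synthetic pathwise estimate $f(\theta) = F(\theta) + \text{noise}$ for a hypothetical concave $F$; in the paper, $f(\cdot)$ would be measured from the system rather than computed from a formula.

```python
# Kiefer-Wolfowitz finite differences with c_n -> 0; f() is a noisy
# sample of a hypothetical F(theta) = -(theta - 0.8)^2.
import random

def f(theta):
    return -(theta - 0.8) ** 2 + random.gauss(0.0, 0.05)

theta = 0.0
for n in range(1, 20001):
    c_n = n ** -0.25                                  # c_n -> 0 slowly
    eps_n = 1.0 / n
    Y_n = (f(theta + c_n) - f(theta - c_n)) / (2.0 * c_n)
    theta += eps_n * Y_n                              # update (1)
print(theta)                                          # close to 0.8
```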

In our examples we shall discuss one implementation of RPA that requires only observation of the states along a single trajectory. Most of the work has focused on showing strong convergence for simulation optimization of stochastic DES, that is, $\theta_n \to \theta^*$ a.s., as in [1], [5], [6], [7], [11], [21] and [25], among others. Their methods of proof refer to the general ODE method in [15], [23], [20], typically under the requirement (2).

Applications of automatic learning in telecommunications, flexible manufacturing systems (FMS), surveillance policies and dynamic resource allocation require algorithms capable of performing adaptive control under possibly changing environments. Such would be the case, for example, when the call patterns in a telecommunications network change, or when machines fail or their processing capabilities deteriorate in an FMS. The implementation of a learning scheme differs from the common procedure for simulation optimization in that the gain parameters do not decrease, but are kept constant, using:

$$\theta_{n+1} = \theta_n + \epsilon Y_n \qquad (3)$$

The requirement that the gain parameter (or "learning rate") does not decrease is essential for the algorithm to be able to adjust and track the optimal control when the underlying processes vary their behavior. While this is desirable for on-line optimization, strong convergence to the optimum cannot be achieved. An alternative approach has been proposed (see [16], [19], [17], [18], [31], [34]) in which convergence is established in a weak sense. Further, [24] has studied the stationary solution of such control processes.

We shall study the behavior of (3) when estimates of the sensitivities are used in a gradient-based stochastic approximation procedure. The weak convergence method establishes the conditions under which (3) drives the process along the steepest descent trajectory in search of the optimum. The interpretation of the results of the method is intuitively appealing, but differs considerably from statements such as $\theta_n \to \theta^*$ a.s. Instead of studying a sequence of random variables, the method focuses on the stochastic process defined by the time-varying control parameter. The notions of convergence of a process are, naturally, related to the topology introduced in the appropriate functional space. Roughly speaking, the method establishes the conditions under which the random trajectories approach, as $\epsilon \to 0$, the deterministic trajectories of a companion ODE, the asymptotes of which are its stable points. Therefore, establishing asymptotic optimality of the learning scheme requires identifying such stability points as the optimal control values of the original problem.
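The tracking ability that motivates (3) can be seen in a small sketch: when the optimum shifts, the constant-gain recursion keeps adapting while the decreasing-gain recursion has essentially frozen. The drifting target and the noise model are assumptions made for illustration only.

```python
# Constant gains (3) versus decreasing gains (1) under a change of
# environment: the target of the toy gradient jumps halfway through.
import random

def grad(theta, target):
    return -(theta - target) + random.gauss(0.0, 0.05)

th_const, th_decr = 0.0, 0.0
for n in range(1, 50001):
    target = 0.8 if n > 25000 else 0.5              # abrupt change of optimum
    th_const += 0.01 * grad(th_const, target)       # recursion (3)
    th_decr += (1.0 / n) * grad(th_decr, target)    # recursion (1)
print(th_const, th_decr)   # th_const tracks 0.8; th_decr lags near 0.5
```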

Our model example is the following. A single server queue has $V \geq 2$ possible settings or "speeds" under which the service distributions differ. The control parameter is the probability vector determining the fraction of customers served at each setting. The performance measure considers the minimization of the stationary average waiting time as well as the operation costs. Ours is an example of the canonical model in [5], [7] and [18], who implement IPA derivative estimators. In our model, however, the control variable is not scalar, the constraint set requires $\theta$ to satisfy the law of total probability, and IPA is not applicable.

The aim of this paper is to illustrate how the weak convergence methodology can be used in practice in order to construct learning algorithms. In so doing, we extend the notion of sensitivity estimation in terms of generalized gradients that take into account the Karush-Kuhn-Tucker (KKT) conditions for optimality. Although truncated algorithms are shown to converge in the limit ($\epsilon \to 0$ for (3) or $n \to \infty$ for (1)), the actual implementations are constructed with positive gain parameters. In our experience, truncations may yield bad behavior when one or more components of $\theta$ reach zero, which may be absorbing points of the procedure, even if far from the actual optimum. To solve this practical problem, we introduce the generalized gradients, which make truncations of (3) unnecessary. We construct a new estimator for such sensitivities, generalizing the phantom RPA method to the case of multi-valued decisions, and discuss two implementations of our estimators. The predicted theoretical behavior of the updating schemes is illustrated via computer simulations using the regenerative and non-reset versions of the estimators. Finally, we compare the behavior of our proposed scheme with that of a regenerative one for which we can establish strong convergence.

In Section 2, we present a brief review of the weak convergence method, emphasizing the ODE method. We state the main assumptions on the model and sensitivity estimators in the context of optimization of stochastic DES and the main results for strong and weak convergence. In Section 3 we present the model example. Section 4 deals with the construction of the "target" ODE that governs the limiting behavior of the learning scheme. Even if the performance measure is unknown in analytical form, the target ODE is constructed such that its stable points are local minima. In Section 5 we develop the gradient estimators and in Section 6 we describe the ensuing learning algorithms. Section 7 presents the empirical study of the behavior of the algorithms (3) and (1) under different implementations.

2 Weak Convergence Review

We shall briefly present the basic concepts and notation of the weak convergence theory for stochastic approximation. For details on the model and methods of proof, we refer to [16], [18], [31], [19]. We strongly recommend the authoritative reference [20].

2.1 Assumptions

We shall now state the assumptions for strong and weak convergence for a Discrete Event Driven System (DES) under control. Let $\theta \in \mathbb{R}^d$ be the control variable. We consider a stochastic process $\{\tilde\xi_t(\theta); t \geq 0\}$ whose evolution is determined by a sequence of "events", happening at random times $\{T_i; i \geq 0\}$.

We assume that $T_i$ is measurable with respect to the history or $\sigma$-algebra generated by the process, so that $\{T_i \leq t\} \in \mathcal F(t)$. Calling $\xi_i(\theta)$ the state of the embedded process plus the residual clocks at time $T_i$, the sequence $\{\xi_i(\theta)\}$ has a Markovian structure and is generally known as a Generalized Semi-Markov Process (see [8]). We shall use $\mathcal F_i(\theta)$ to denote the $\sigma$-algebra generated by $\{\xi_j(\theta); j \leq i\}$. We shall assume that for some compact set $\Theta$, the process $\{\xi_i(\theta)\}$ regenerates and has a unique, ergodic invariant measure $\tilde\mu_\theta(dx)$. The performance measure of interest is of the form:

$$F(\theta) = \int f(x)\, \tilde\mu_\theta(dx) = \lim_{M\to\infty} \frac{1}{M} \sum_{i=1}^{M} f[\xi_i(\theta)] \qquad (4)$$

We shall also assume that $F$ is continuously differentiable. Before proceeding, we introduce two useful definitions.

Definition: A sequence of measures $\{\mu_u(\cdot); u \in U\}$ defined on a common, complete, separable state space $S$ is said to be tight if for every $\delta > 0$ there exists a compact set $K_\delta \subset S$ such that $\mu_u(K_\delta) > 1 - \delta$ for all $u \in U$. Accordingly, we say that a sequence of random variables $\{\zeta_i; i = 1, 2, \dots\}$ defined on a common, complete, separable state space $S$ is tight if for every $\delta > 0$ there exists a compact set $K_\delta \subset S$ such that $P\{\zeta_i \notin K_\delta\} \leq \delta$ for all $i$. Tightness is the equivalent of compactness in the sense that if a set of measures is tight, then every sequence has a further weakly convergent subsequence, and the limit is a well defined probability measure (see [3]).

Definition: We say that a sequence of random variables $\{Y_n; n = 0, 1, \dots\}$ defined on a common, complete, separable state space $S$ is uniformly integrable if for every $\delta > 0$ there exists a constant $K < \infty$ such that:

$$\sup_n \int_{|y| > K} |y|\, P\{Y_n \in dy\} \leq \delta$$

Uniform integrability follows whenever $\mathrm{Var}(Y_n)$ is uniformly bounded in $n$ (see [3]).

A pathwise derivative estimator is an estimator constructed from the observations or measurements of the system over a finite horizon, which we shall call the estimation interval; we denote its length by $\tau_m$. Consider the model for a fixed value of the control at $\theta$, where no updates take place, but where we compute a pathwise derivative estimator using the observations contained in the $n$-th estimation interval, that is, the observations with indices $i$ satisfying:

$$\sum_{m=0}^{n} \tau_m(\theta) < i \leq \sum_{m=0}^{n+1} \tau_m(\theta)$$

thus obtaining a sequence of estimates. A detailed analysis of weak convergence for general random estimation intervals is in [31]. In the present work, we shall focus only on the commonly used intervals, namely a constant number $M$ of events of the process $\{\xi_i(\theta)\}$ or a constant number $M$ of regenerative cycles of the process $\{\xi_i(\theta)\}$.

Thus, both the estimation intervals as well as the estimators are functions of the past state values. In particular, the derivative estimator $Y_n(M, \theta)$ is constructed via sample averages of quantities that depend on previous state values. In other words, there exists an $\mathcal F_i(\theta)$-measurable process $Z_i(\theta)$ that accounts for the necessary bookkeeping. Call $\chi_i(\theta) = (\xi_i(\theta), Z_i(\theta))$ the enlarged state that describes the process and the derivative estimation. Notice that the $\sigma$-algebra generated by $\{\chi_j(\theta); j \leq i\}$ is $\mathcal F_i(\theta)$. Then the event $\{\tau_n(\theta) > i\} \in \mathcal F_i(\theta)$.

Assumption 1 Consider the fixed control process $\{\chi_i(\theta)\}$, for $\theta \in \Theta$. We assume that $P_\theta(\chi, B) = P\{\chi_{i+1}(\theta) \in B \mid \chi_i(\theta) = \chi\}$ is weakly continuous in $(\theta, \chi)$ and that $\{\chi_i(\theta)\}$ possesses a unique invariant, ergodic measure $\mu_\theta(d\chi)$. In addition, we assume that the set of measures $\{\mu_\theta(\cdot); \theta \in A\}$ is tight for every compact set $A \subset \Theta$.

Recall that we assumed that $\{\xi_i(\theta)\}$ has a unique, ergodic invariant measure. If $\{Z_i(\theta)\}$ are tight, then every sequence has a further weakly convergent subsequence. Since $Z_i(\theta)$ is a function of the states $\{\xi_j(\theta); j \leq i\}$ and these possess a unique limiting measure, it follows that the limiting measure of the process $\chi_i(\theta)$ is unique and uniquely determined by $\tilde\mu_\theta(\cdot)$.

Definition: We say that $Y_n(M, \theta)$ is a strongly consistent estimator of $G(\theta)$ if for any $n$:

$$\lim_{M\to\infty} Y_n(M, \theta) = G(\theta) \quad \text{a.s.}$$

We say that $Y_n(M, \theta)$ is consistent in the average sense for $G(\theta)$ if for any $M$:

$$\lim_{m\to\infty} E\left\{\frac{1}{m} \sum_{n=0}^{m-1} Y_n(M, \theta)\right\} = G(\theta)$$

We now proceed to the description of the model when (3) is used to update the values of $\theta$ at the end of the estimation intervals. A more general description can be found in [31], where a more complex system with asynchronous controllers is studied. In that reference, the updating intervals in a decentralized operation do not necessarily coincide with the estimation intervals of the processors that carry out the estimations. When the control variable changes this way, the process $\chi_i^\epsilon$ evolves according to the dynamics of the fixed control process at value $\theta_n$ over the $n$-th estimation interval of corresponding length $\tau_n$, at the end of which the estimator $Y_n^\epsilon(M)$ is obtained and $\theta_n$ is updated according to (3). Let $t_n = \sum_{m=0}^{n} \tau_m$; we denote $\tilde\theta_i = \theta_n$ for $i = t_n + 1, \dots, t_{n+1}$. This variable keeps the actual value of the control parameter as the process evolves. We denote by $\mathcal F_i^\epsilon$ the $\sigma$-algebra generated by $\{\chi_j^\epsilon, \tilde\theta_j; j \leq i\}$. From the Markovian structure, it follows that $P\{\chi_{i+1}^\epsilon \in B \mid \mathcal F_i^\epsilon\} = P\{\chi_{i+1}^\epsilon \in B \mid \chi_i^\epsilon, \tilde\theta_i\}$ and therefore the process $\{(\chi_i^\epsilon, \tilde\theta_i)\}$ is a Markov Decision Process (MDP) with general state space.

Assumption 2 The random variables $\{Y_n^\epsilon(M)\}$ and $\{\tau_n^\epsilon\}$ are uniformly integrable.

Assumption 3 The sequence $\{(\chi_i^\epsilon, \tilde\theta_i); i = 1, 2, \dots; \epsilon > 0\}$ is tight.

2.2 The Main Results

The main results on strong convergence have been treated in detail in [7] and [5], [21] when IPA derivative estimators are used. The following version of the result considers the most commonly used assumptions on the estimation. We shall apply this result in Section 7.

Theorem 1 Suppose that $\{Y_n(\theta)\}$ are independent and $E[Y_n(\theta) \mid \theta_n = \theta] = \nabla_\theta F(\theta) + \beta_n(\theta)$; that $\sup_n E|Y_n(\theta)|^2 < \infty$, and that $F(\theta)$ is locally convex and has a unique maximum $\theta^* \in \Theta$. Construct $Y_n$ as an estimator using the information available within the $n$-th estimation interval, where the control variable is kept at the value $\theta_n$. Update this value at the end of the estimation interval using (1). In order to ensure that $\theta_n \in \Theta$ with probability 1, use a truncation if necessary. Assume that (2) is satisfied and that $\sum_n \epsilon_n |\beta_n| < \infty$ a.s. Then $\theta_n \to \theta^*$ with probability 1.

This model is known as the "martingale difference noise" model for $Y_n$. In many cases, when the estimators are strongly consistent, their variance decreases as $1/M$. A common approach is to use increasing update intervals $M(n) \sim n$ and $\epsilon_n \sim 1/n$.

As mentioned before, weak convergence is established in the sense of convergence in distribution of the stochastic process defined by the time-varying control parameters. Following [31], we introduce the following control processes:

Definition: The ladder interpolation process $\vartheta^\epsilon(t)$ is defined by:

$$\vartheta^\epsilon(t) = \theta_n, \qquad t \in [n\epsilon, (n+1)\epsilon)$$

and the natural interpolation process by:

$$\tilde\vartheta^\epsilon(t) = \tilde\theta_i, \qquad t \in [i\epsilon, (i+1)\epsilon)$$

As we shall see, these processes reflect the common descriptions of learning algorithms, related to the iteration number (that is, in terms of the updates performed) and the actual time (in terms of the event scale of the process).

It is customary to present results of convergence as a function of the number of iterations performed. This would be related to the behavior of $\vartheta^\epsilon(\cdot)$. If an updating scheme follows a number $M$ of regenerative cycles to construct the estimators, then the length of the updating intervals depends on the control values themselves and it may be more realistic to study the behavior of $\tilde\vartheta^\epsilon(\cdot)$. Naturally, in the case of (1), where $\epsilon_n \to 0$ and $M(n) \to \infty$, the natural interpolation process would stretch the time scale of the corresponding ladder interpolation process. In applications of on-line control, we are interested in the behavior and capability of tracking in real time rather than in iteration number. In Section 7 we shall illustrate the various time scales of interest with computer simulations.

Assumption 4 Suppose that the functions:

$$G(\theta) = \lim_{m\to\infty} \frac{1}{m} \sum_{n=0}^{m-1} E[Y_n(M, \theta)], \qquad \bar\tau(\theta) = \lim_{m\to\infty} \frac{1}{m} \sum_{n=0}^{m-1} E[\tau_n(\theta)]$$

are continuous and $\inf_\theta \bar\tau(\theta) > 0$.

The following result summarizes the weak convergence approach and follows from [16], [18] and [31].

Theorem 2 Under Assumptions 1 to 4, if $Y_n^\epsilon$ is the estimator calculated within the interval $t_n < i \leq t_{n+1}$ and (3) is used to update $\theta_n$ (using a truncation to ensure $\theta_n \in \Theta$ if necessary), then $\vartheta^\epsilon(\cdot)$ converges in distribution to the deterministic solution of the ODE:

$$\frac{d\vartheta(t)}{dt} = G[\vartheta(t)] \qquad (5)$$

and $\tilde\vartheta^\epsilon(\cdot) \Rightarrow \tilde\vartheta(\cdot)$, where $\tilde\vartheta[\alpha(t)] = \vartheta(t)$ and $d\alpha(t)/dt = \bar\tau[\vartheta(t)]$. If, in addition, $\theta^*$ is the only stable point of (5), then $\lim_{t\to\infty} \vartheta(t) = \lim_{t\to\infty} \tilde\vartheta(t) = \theta^*$.

In general, $\bar\tau(\theta) > 1$; for example, $\bar\tau(\theta) = M$ or $M$ times the average number of customers in a cycle, in the case of finite horizon or regenerative estimation, respectively. That is, the time scale of $\tilde\vartheta(\cdot)$ is stretched, making its evolution slower. Recent research has considered the long term behavior of such SA procedures, analyzing the limiting stationary control process [24].

3 A Flexible Machine

We consider a system in which a machine processes items at different speeds $v_k$, $k = 1, \dots, V$. Items arrive at the machine following a renewal process $N(t)$. Let $\sigma_i(\theta)$ be the speed chosen by the machine for the $i$-th item processed. The service times of consecutive customers $\{S_i\}$ are independent random variables with $E[S_i \mid \sigma_i(\theta) = v_k] = \mu_k^{-1}$ and $E[S_i^2 \mid \sigma_i(\theta) = v_k] = \varsigma_k^2$.

The associated operating cost per unit time is $c_k$. We assume $\mu_k < \mu_j$ and $c_k < c_j$ if $k < j$; thus faster modes of operation are more costly. Although we may know which speeds are faster in the average, in practice we may not know the distributions of the consecutive service times. The problem is to select the speeds of operation of the machine that minimize the waiting time at the lowest cost. The simplest strategy is the randomized strategy, in which $P\{\sigma_i(\theta) = v_k\} = \theta_k$. We assume that $\lambda < \min_k \mu_k$, ensuring stability of the process for all $\theta \in \Theta = \{\theta \in [0,1]^V : \sum_{k=1}^{V} \theta_k = 1\}$. Therefore, the limiting probabilities exist, the process is ergodic, and the invariant measure is unique for each $\theta$.

Calling $\theta$ the vector $(\theta_k; k = 1, \dots, V)$, we seek to optimize the performance function $F(\theta) = W(\theta) + C(\theta)$, where $W(\theta)$ is the stationary average waiting time in the system and $C(\theta)$ is the mean stationary cost per item resulting from the use of the machine. To find the expression for $C(\theta) = \lim_{t\to\infty} C(t)/D(t)$, where $C(t)$ is the cumulative operation cost up to time $t$, we define $D(t)$ as the departure process, $\tau_N$ the time of the $N$-th service completion, and $T_k(N)$ as the total time the machine operates at speed $v_k$, having processed $N$ items. Conditioning on the speed chosen for each of the $N$ items, at time $\tau_N$ we have:

$$E[C(\tau_N)] = \sum_{k=1}^{V} c_k\, E[T_k(N)] = N \sum_{k=1}^{V} \frac{c_k\, \theta_k}{\mu_k}$$

Under stability of the process, the arrival rate is the same as the departure rate, so $\lim_{N\to\infty} (N/\tau_N) = \lim_{t\to\infty} (D(t)/t) = \lambda$ a.s., thus:

$$C(\theta) = \lim_{t\to\infty} \frac{E[C(t)]}{E[D(t)]} = \sum_{k=1}^{V} \frac{c_k\, \theta_k}{\mu_k} \qquad (6)$$

From the structure of the model, $F(\theta)$ is continuously differentiable. This problem may be stated as a minimization problem of the function $F(\theta)$ under the feasibility constraints:

Minimize $F(\theta) = W(\theta) + C(\theta)$ subject to $h(\theta) = \sum_{i=1}^{V} \theta_i = 1$ and $\theta_k \geq 0$.
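Under this reading ($W(\theta)$ the stationary mean waiting time, $C(\theta)$ the mean operating cost per item), the flexible machine is easy to simulate with Lindley's recursion. The sketch below borrows the parameters of the numerical example of Section 7 and is an illustration of the model only, not of the gradient estimators.

```python
# Simulation sketch of the flexible machine: Poisson arrivals,
# V = 2 speeds with uniform service laws, randomized speed choice
# P{sigma_i = v_k} = theta_k; returns a sample estimate of
# F(theta) = W(theta) + C(theta).
import random

lam = 0.028
speeds = [dict(a=33.0, b=38.0, c=5.0),     # v_1: slow and cheap
          dict(a=3.0,  b=7.0,  c=145.0)]   # v_2: fast and costly

def estimate_F(theta1, n_customers=200_000, seed=1):
    rng = random.Random(seed)
    W_sum, cost_sum, R = 0.0, 0.0, 0.0
    for _ in range(n_customers):
        A = rng.expovariate(lam)                 # interarrival time
        k = 0 if rng.random() < theta1 else 1    # randomized speed choice
        s = speeds[k]
        S = rng.uniform(s['a'], s['b'])          # service time at the chosen speed
        W = max(0.0, R - A)                      # Lindley's recursion: waiting time
        R = W + S                                # time in system, carried forward
        W_sum += W
        cost_sum += s['c'] * S                   # operating cost of this item
    return (W_sum + cost_sum) / n_customers      # W(theta) + C(theta)

print(estimate_F(0.812))   # roughly 367, cf. F(theta*) = 367.01 in Section 7
```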

4 Limiting ODE

In this section we shall build the "target" ODE and show that its stable points are local minima. Our arguments are general for optimization under randomized multi-valued decisions. Define $F'_k(\theta) = \partial F(\theta)/\partial \theta_k$. Letting $\theta^*$ be the vector of optimal values, the Karush-Kuhn-Tucker (KKT) theory tells us there must exist a real number $u$ such that:

1) $F'_k(\theta^*) - u\, h'_k(\theta^*) = 0$ if $\theta^*_k > 0$, and $\geq 0$ if $\theta^*_k = 0$, for each $k$;
2) $h(\theta^*) = 1$;
3) $\theta^*_k \geq 0$ for $k = 1, \dots, V$.

Since $h(\theta^*) = 1$, the first condition tells us that either $\theta^*_k = 0$ or $F'_k(\theta^*) = u$, which may be synthesized as $u = \sum_{k=1}^{V} \theta^*_k F'_k(\theta^*)$. Call $G_k$ the generalized gradient operator defined by:

$$G_k[F(\theta)] = \frac{\partial F(\theta)}{\partial \theta_k} - \sum_{j=1}^{V} \theta_j\, \frac{\partial F(\theta)}{\partial \theta_j} \qquad (7)$$

Then, a value of $\theta$ such that $\sum_{k=1}^{V} \theta_k = 1$ and

$$G_k[F(\theta)] = 0 \ \text{ if } \theta_k > 0, \qquad G_k[F(\theta)] \geq 0 \ \text{ if } \theta_k = 0, \qquad k = 1, \dots, V \qquad (8)$$

satisfies the KKT conditions. Under convexity of $F(\theta)$, if $\theta$ satisfies (8), then $\theta = \theta^*$. Consider now the following target ODE:

$$\frac{d\vartheta_k(t)}{dt} = -\vartheta_k(t)\, G_k[F(\vartheta(t))] \qquad (9)$$

and notice that no truncation is necessary: if $\vartheta(0) \in \Theta$, then for any $t$, $\vartheta(t) \in \Theta$, which follows by adding (9) over $k$ and using the definition (7) of the generalized gradient, together with $\vartheta(t) > 0$.

Lemma 1 If the starting point $\vartheta(0) \in \Theta$ is such that $\vartheta_k(0) > 0$ for each $k = 1, \dots, V$ and it is not a local maximum of $F(\theta)$, then the stable points of (9) are local minima. If the function $F(\theta)$ has a unique minimum, then (9) has an asymptote at $\theta^* = \arg\min_{\theta \in \Theta} F(\theta)$.

Proof: We shall verify two conditions, namely, that the cost is non-increasing along the trajectory of the ODE, and that the stable points of the ODE are KKT points. The first condition is easily verified, since

$$\frac{d}{dt} F[\vartheta(t)] = \sum_{k=1}^{V} F'_k[\vartheta(t)]\, \frac{d\vartheta_k(t)}{dt} = -\sum_{k=1}^{V} \vartheta_k(t)\, \left[ D_k(t) - \bar D(t) \right]^2 \leq 0 \qquad (10)$$

where we have used the fact that (10) is the negative of a variance, with $D_k(t) = F'_k[\vartheta(t)]$ and $\bar D(t) = \sum_{k=1}^{V} \vartheta_k(t)\, D_k(t)$. Since $F$ is bounded below by zero (costs are non-negative) and is non-increasing along the trajectory, $\lim_{t\to\infty} dF[\vartheta(t)]/dt = 0$ and $F[\vartheta(t)]$ converges to a value $\bar F$, possibly depending upon the initial value $\vartheta(0)$.

The second condition stems from the fact that, from (10), if $\bar\theta = \lim_{t\to\infty} \vartheta(t)$ is any limit of the ODE, then $dF[\bar\theta]/dt = 0$ and each term of the sum vanishes. This implies, for each component, that either $\bar\theta_k = 0$ or $\lim_{t\to\infty} D_k(t) = \lim_{t\to\infty} \bar D(t)$, which may be rewritten as $G_k[F(\bar\theta)] = 0$. In order to verify (8), it suffices to show now that if $\bar\theta_k = 0$ then the corresponding term satisfies $G_k[F(\bar\theta)] \geq 0$. By continuity of the generalized gradient, $G_k[F(\bar\theta)] = \lim_{t\to\infty} G_k[F(\vartheta(t))]$. Using (9), since the limit point is $\bar\theta_k = 0$, this component must decrease as time increases, thus $d\vartheta_k(t)/dt \leq 0$ for large enough $t$, implying that $\lim_{t\to\infty} G_k[F(\vartheta(t))] \geq 0$. The limiting ODE will therefore have a limit point that satisfies the KKT conditions and is a local minimum. If $F$ has a unique minimum, then the gradient driven process (9) is asymptotically optimal in the sense that the trajectories of $\vartheta(t)$ approach the optimal value $\theta^*$ as $t \to \infty$.
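A short numerical sketch of the target ODE (9): with a hypothetical smooth cost (a quadratic stand-in, not the queueing performance), an Euler discretization exhibits the two properties used above, namely that $\sum_k \vartheta_k(t) = 1$ is preserved without truncation and that the flow settles at a KKT point.

```python
# Euler integration of the target ODE (9) on the simplex for a
# stand-in cost F(th) = sum_k (th_k - target_k)^2 with minimizer
# 'target' on the simplex (an illustrative assumption).
V = 3
target = [0.2, 0.5, 0.3]

def grad_F(th):
    return [2.0 * (th[k] - target[k]) for k in range(V)]

th = [1.0 / V] * V
dt = 0.01
for _ in range(20_000):
    g = grad_F(th)
    gbar = sum(th[k] * g[k] for k in range(V))           # sum_j th_j F'_j
    G = [g[k] - gbar for k in range(V)]                  # generalized gradient (7)
    th = [th[k] - dt * th[k] * G[k] for k in range(V)]   # Euler step of (9)
print(th, sum(th))   # th near target; sum(th) stays 1 up to roundoff
```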

5 The RPA Gradient Estimators

We shall now propose two phantom RPA estimators, generalizing [4] and [29] to the case of multi-valued decisions. If we do not know the exact values of $\mu_k$, the gradient of $C(\theta)$ can be strongly consistently estimated via sample averages. Since $F(\theta) = W(\theta) + C(\theta)$, it suffices to build estimators of the generalized gradient in (9) for the waiting time. Call:

$$\bar W(\theta) = E\left[\sum_{i=1}^{N_{bp}(\theta)} W_i\right], \qquad \bar N(\theta) = E[N_{bp}] \qquad (11)$$

where $N_{bp}$ is the number of customers in one busy period of the process at control value $\theta$. Then $W(\theta) = \bar W(\theta)/\bar N(\theta)$.

5.1 Parallel Phantom Systems

Our model can be stated according to the framework of Section 2, as follows. Call $A_i$ the interarrival time between customers $i$ and $i+1$, and let $\{u_i\}$ be a sequence of i.i.d. uniform variates. The service of customer $i$ is

$$S_i(u_i, \sigma_i) = \sum_{k=1}^{V} G_k^{-1}(u_i)\, 1_{\{\sigma_i = v_k\}}$$

where $G_k(\cdot)$ is the service distribution of speed $v_k$. The discrete event process is described via Lindley's equations. If $R_i$ is the total time a customer remains in the system and $W_i$ its waiting time, then given $R_i$, Lindley's equations yield:

$$W_{i+1} = \max(0, R_i - A_i), \qquad R_{i+1} = S_{i+1}(u_{i+1}, \sigma_{i+1}) + W_{i+1}$$

The above equations hold regardless of the way we choose the speeds of the machine, since we are using the information on the decisions themselves. We shall now use the notation $\sigma_i(0)$ for the decisions of the nominal system, taken according to $P\{\sigma_i(0) = v_k\} = \theta_k$, independently of $\{(A_j, u_j); j \leq i\}$.

Fix the index $k$ to estimate a sensitivity, let $\Delta > 0$ be a small number, and let $\tilde\Delta \in \mathbb{R}^V$ be a vector with the value $\Delta$ as its $k$-th component. The other components satisfy $\tilde\Delta_l = -\Delta\, p_l$, $p_l > 0$, $l \neq k$, and shall be determined later, according to the two gradient estimators that we shall describe. Given $N$ customers, we define a parallel phantom system using the same sequence $(A_i, u_i)$ and assigning the decision $\tilde\sigma_i$ of the $i$-th customer as follows:

$$P\{\tilde\sigma_i = v_l \mid \sigma_i(0) = v_l\} = 1, \quad l \neq k$$
$$P\{\tilde\sigma_i = v_k \mid \sigma_i(0) = v_k\} = 1 - \Delta/\theta_k$$
$$P\{\tilde\sigma_i = v_l \mid \sigma_i(0) = v_k\} = \Delta\, p_l/\theta_k, \quad l \neq k$$

Then the decisions of the phantom system satisfy $P\{\tilde\sigma_i = v_l\} = \theta_l - \tilde\Delta_l$ for all $l = 1, \dots, V$, and $\tilde\sigma$ has a distribution according to $\tilde\theta = \theta - \tilde\Delta$. Clearly, the evolution of such a system can be evaluated in parallel to the nominal system using Lindley's equations. For any sequence $\sigma = \{\sigma_i\}$ of decisions, we define the cumulative waiting time over the first $M$ customers as $\varphi_M(\sigma)$. We shall be interested in two cases: $M$ a fixed, deterministic number, and $M = N(\sigma)$, the number of customers in one busy period. To distinguish the two cases we use the notation:

$$\varphi_M(\sigma) = \sum_{i=1}^{M} W_i, \qquad \varphi_N(\sigma) = \sum_{i=1}^{N(\sigma)} W_i \qquad (12)$$

The finite differences are defined by:

$$D_\Delta(M) = \varphi_M(\sigma(0)) - \varphi_M(\tilde\sigma), \qquad D_\Delta(N) = \varphi_N(\sigma(0)) - \varphi_N(\tilde\sigma) \qquad (13)$$

Given $M$ customers, the probability that customer $i$ is a phantom in the sequence $\{\tilde\sigma_i; i = 1, \dots, M\}$ is $P\{\tilde\sigma_i \neq \sigma_i(0)\} = P\{\tilde\sigma_i \neq v_k \mid \sigma_i(0) = v_k\}\, P\{\sigma_i(0) = v_k\} = \Delta$. Therefore,

$$E[D_\Delta(M)] = M\Delta\, (1-\Delta)^{M-1}\, E^{(1)}[D_\Delta(M)] + E\left[\sum_{m=2}^{M} \binom{M}{m} \Delta^m (1-\Delta)^{M-m}\, E^{(m)}[D_\Delta(M)]\right] \qquad (14)$$

where $E^{(m)}$ is the expectation w.r.t. $\tilde\sigma$, conditioning on having exactly $m$ phantoms. As in [29], we use the fact that $\sum_{m=2}^{M} \binom{M}{m} \Delta^m (1-\Delta)^{M-m} \leq M^2 \Delta^2$ to bound the second term.
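Because the nominal and phantom paths share the common random numbers $(A_i, u_i)$, differences such as $\varphi_M(\sigma(0)) - \varphi_M(\sigma(j))$, with $\sigma(j)$ the single-phantom sequence used in the RPA formulas below, can be computed with two parallel Lindley recursions. A sketch for $V = 2$ with uniform service laws (the parameters and the choice of $j$ are illustrative assumptions):

```python
# Parallel phantom evaluation: the phantom path differs from the
# nominal one only in the decision of customer j; both are driven
# by the same (A_i, u_i) through Lindley's recursion.
import random

params = [(33.0, 38.0), (3.0, 7.0)]          # U[a_k, b_k] for the two speeds

def service(k, u):
    a, b = params[k]
    return a + (b - a) * u                   # inverse-transform with common u

def cum_wait(A, u, sigma):
    W, R, total = 0.0, 0.0, 0.0
    for i in range(len(A)):
        W = max(0.0, R - A[i])               # Lindley's recursion
        R = W + service(sigma[i], u[i])
        total += W
    return total                             # phi_M(sigma)

rng = random.Random(7)
M, lam, theta1 = 500, 0.028, 0.6
A = [rng.expovariate(lam) for _ in range(M)]
u = [rng.random() for _ in range(M)]
nominal = [0 if rng.random() < theta1 else 1 for _ in range(M)]
j = next(i for i, k in enumerate(nominal) if k == 0)  # a customer served at v_1
phantom = list(nominal)
phantom[j] = 1                               # swap the decision of customer j
print(cum_wait(A, u, nominal) - cum_wait(A, u, phantom))
```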

If $M < \infty$ is a deterministic number, then $\varphi_M(\sigma) \leq M \sum_{i=1}^{M} S_i$ for any sequence of decisions, and since $E[S_i] \leq 1/\mu_V < \infty$, we can use dominated convergence to establish the phantom RPA formula for any finite $M$:

$$\lim_{\Delta\to 0} \frac{E[D_\Delta(M)]}{\Delta} = \frac{1}{\theta_k}\, E\left\{\sum_{j=1}^{M} I_k(j)\, \left[\varphi_M(\sigma(0)) - \varphi_M(\sigma(j))\right]\right\} \qquad (15)$$

where $I_k(j) = 1_{\{\sigma_j(0) = v_k\}}$ and we have used that all the sequences of phantom decisions that differ in only one component from $\sigma(0)$ are equiprobable. The sequence $\sigma(j)$ is defined by $\sigma_i(j) = \sigma_i(0)$, $i \neq j$, and $P\{\sigma_j(j) = v_l \mid \sigma_j(0) = v_k\} = p_l$.

Consider now the finite difference $D_\Delta(N)$ in (13). In order to use dominated convergence, we assume that the service distributions are stochastically dominated by a system of possibly random decisions $\bar\sigma$ such that $S_i(\sigma_i(0), u_i) \leq S_i(\bar\sigma_i, u_i)$ and $S_i(\tilde\sigma_i, u_i) \leq S_i(\bar\sigma_i, u_i)$. Such is the case if $G_l(s) \geq G_V(s)$, $l < V$, with $\bar\sigma_i \equiv v_V$. Then the finite difference $D_\Delta(N)$ converges as $\Delta \to 0$, provided that $E[N(\bar\sigma)^4] < \infty$. This follows from $\varphi_N(\sigma(0)) - \varphi_N(\tilde\sigma) \leq 2 \sum_{i=1}^{N(\bar\sigma)} W_i(\bar\sigma)$, since then we can use that $W_i(\bar\sigma)$ is bounded by the length of the busy period in the system with decisions $\bar\sigma$, obtaining a bound of order $E[N^2(\bar\sigma)]\, E[S_i(\bar\sigma)]\, \Delta \to 0$ as $\Delta \to 0$. The corresponding RPA formula becomes:

$$\lim_{\Delta\to 0} \frac{E[D_\Delta(N)]}{\Delta} = \frac{1}{\theta_k}\, E\left\{\sum_{j=1}^{N(0)} I_k(j)\, \left[\varphi_N(\sigma(0)) - \varphi_N(\sigma(j))\right]\right\} \qquad (16)$$

The sequences $\sigma(j)$ are defined as before.

5.2 The Swapping Phantoms

Our first estimator of the generalized gradient is based on the following observation. Even if the partial derivatives $F'_k(\theta)$ make sense mathematically, expressed as $\lim_{\Delta\to 0} [F(\theta) - F(\tilde\theta)]/\Delta$, where $\tilde\theta = \theta - \tilde\Delta$ and $\tilde\Delta_l = 0$, $l \neq k$, the law of total probability is not preserved, because $\sum_{i=1}^{V} \tilde\theta_i \neq 1$ and thus the finite difference for $\Delta > 0$ does not correspond to a physical process. An alternative is to look for appropriate directional derivatives. Consider $p_l = \theta_l/(1 - \theta_k)$. Simple algebraic manipulations show that

$$\lim_{\Delta\to 0} \frac{F(\theta) - F(\tilde\theta)}{\Delta} = \frac{1}{1 - \theta_k}\, G_k[F(\theta)] \qquad (17)$$

Therefore, the estimation problem can be stated in terms of estimating the generalized gradient. Phantom customers swap speeds compared to the corresponding nominal decisions, choosing the remaining ones in their original proportion. We then use (15) or (16) with $p_l = \theta_l/(1 - \theta_k)$. This approach helps us define the corresponding estimators to drive the process towards the limiting ODE (9). The practical problem is the required condition for domination. In our numerical examples, we used Poisson arrivals and uniformly distributed service distributions for each speed. In general, for an M/G/1 queue, Takacs' method (see [14]) gives a functional relationship between the moment generating function of the service time and that of the number of customers in a busy period.

If the $m$-th moment of the service distribution is bounded, so is $E[N^m]$, which follows from the above argument, although we omit the details of the proof. In our case, $E[S_i^4] < \infty$.

Proposition 1 Assume that the service times are dominated by $\{S_i(\bar\sigma)\}$, for some random sequence $\bar\sigma$, and that the dominating queueing process satisfies $E[N(\bar\sigma)^4] < \infty$. Then:

$$\theta_k\, G_k[W(\theta)] = \lim_{M\to\infty} (1 - \theta_k)\, E\left\{\sum_{j=1}^{M} I_k(j)\, \frac{\varphi_M(\sigma(0)) - \varphi_M(\sigma(j))}{M}\right\}$$

$$\theta_k\, G_k[\bar W(\theta)] = (1 - \theta_k)\, E\left\{\sum_{j=1}^{N(0)} I_k(j)\, \left[\varphi_N(\sigma(0)) - \varphi_N(\sigma(j))\right]\right\}$$

$$\theta_k\, G_k[\bar N(\theta)] = (1 - \theta_k)\, E\left\{\sum_{j=1}^{N(0)} I_k(j)\, \left[N(0) - N(j)\right]\right\}$$

Proof: The last two statements follow directly from the development of the phantom RPA formula, under the domination assumption. By construction, (15) shows that $(\theta_k M)^{-1} \sum_{j=1}^{M} I_k(j)[\varphi_M(\sigma(0)) - \varphi_M(\sigma(j))]$ is an unbiased estimator of $(1 - \theta_k)^{-1}\, G_k[E \sum_{i=1}^{M} W_i]/M$. The convergence as $M \to \infty$ of the r.h.s. follows from the uniform bound $E|\varphi_M(\sigma(0)) - \varphi_M(\sigma(j))| \leq 2\, E[\sum_{i=1}^{N_j(\bar\sigma)} W_i(\bar\sigma)] < \infty$, where $N_j(\bar\sigma)$ is the number of customers in the busy period of the dominating system to which customer $j$ belongs. This fact follows from the construction of the phantom systems: $W_i(0) = W_i(j)$ for all $i \leq j$. When the system finishes the busy period where $j$ belongs, both the nominal and the $j$-th phantom system with decisions $\{\sigma(j)\}$ finish. After this, their evolution is identical, since $j$ was the only decision that differed. We need only verify that the interchange between the expectation and the limit $M \to \infty$ is valid. Since the invariant measure is unique and ergodic, $G_k[W(\theta)] < \infty$, and the generalized gradient is a linear operator, this is established if

$$\frac{\partial}{\partial \theta_k}\, W(\theta) = \lim_{M\to\infty} \frac{1}{M}\, \frac{\partial}{\partial \theta_k}\, E\left[\sum_{i=1}^{M} W_i\right]$$

for every initial distribution of the process, which in turn follows from [32] for our model, where all the $n$-step transition probabilities of the process are polynomials in $(\theta_i; i = 1, \dots, V)$ and therefore continuously differentiable.

5.3 The Disappearing Phantoms

The requirement that the system be stable if we always choose the slowest speed may be too restrictive, especially when we do not know the service distributions explicitly. Naturally, if we knew the service distributions, we could evaluate the analytical solution and find the optimal setting without need for adaptive control. In the more general situation, we let the machine function and evaluate its own estimates to drive the control towards the optimal value.

In such cases, and when the service distributions as well as the input rate may vary, we need the algorithm to be able to track the optimal value. We propose now a more robust estimator that does not require assigning the services of the phantom customers as $S_i(u_i, v_l)$ with probability $p_l = \theta_l/(1 - \theta_k)$.

Consider the alternative model for the original process, where customers of class $k$ arrive according to a renewal process with rate $\lambda \theta_k$, requiring an amount of service that has distribution $G_k(\cdot)$. In this model, we can use $\tilde\theta = \theta - \tilde\Delta$ with $p_l = 0$, $l \neq k$, meaning that some of the customers, the phantom ones, are not allowed entrance to the machine and thus "disappear" from the system. The corresponding system will have a total incoming rate of $\tilde\lambda = \lambda(1 - \Delta)$. In this case, we obtain:

$$\bar G_k[F(\lambda, \theta)] = \lim_{\Delta\to 0} \frac{F(\lambda, \theta) - F(\tilde\lambda, \tilde\theta)}{\Delta} \qquad (18)$$

and therefore the generalized gradient can be expressed as:

$$G_k[F(\lambda, \theta)] = \bar G_k[F(\lambda, \theta)] - \sum_{l=1}^{V} \theta_l\, \bar G_l[F(\lambda, \theta)] \qquad (19)$$

In the case of the disappearing phantoms, Lindley's equations can be calculated in parallel to the nominal system using $S_j(j) \equiv S_j(\sigma_j(j), u_j) \equiv 0$.

Proposition 2 Assume that $E[N(0)^4] < \infty$. Then:

$$\theta_k\, \bar G_k[W(\theta)] = \lim_{M\to\infty} E\left\{\sum_{j=1}^{M} I_k(j)\, \frac{\varphi_M(\sigma(0)) - \varphi_M(\sigma(j))}{M}\right\}$$

$$\theta_k\, \bar G_k[\bar W(\theta)] = E\left\{\sum_{j=1}^{N(0)} I_k(j)\, \left[\varphi_N(\sigma(0)) - \varphi_N(\sigma(j))\right]\right\}$$

$$\theta_k\, \bar G_k[\bar N(\theta)] = E\left\{\sum_{j=1}^{N(0)} I_k(j)\, \left[N(0) - N(j)\right]\right\}$$

Proof: Since $S_j(j) = 0$, the nominal system dominates all of the possible phantom systems and $N(\tilde\sigma) \leq N(\sigma(0))$ a.s., which requires then the condition $E[N(\sigma(0))^4] < \infty$. Notice that in this case it is no longer required that $\lambda/\mu_V < 1$, but only that $\lambda < \mu(\theta) = (\sum_k \theta_k/\mu_k)^{-1}$. As before, the first statement follows from the a.s. convergence of the derivatives of the finite horizon averages to those of the stationary averages.
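For the disappearing phantoms, the phantom path of customer $j$ is obtained simply by setting $S_j := 0$, and the perturbations $d_i(j) = W_i(0) - W_i(j)$ vanish once the busy period containing $j$ ends. The sketch below illustrates this with one phantom; the service model and the phantom index are assumptions for the example.

```python
# Disappearing-phantom sketch: removing the service of customer j
# only shifts the waits of later customers within the same busy
# period, so sum_{i>j} d_i(j) is computed with one extra Lindley pass.
import random

def waits(A, S):
    W, R, out = 0.0, 0.0, []
    for a, s in zip(A, S):
        W = max(0.0, R - a)                  # Lindley's recursion
        R = W + s
        out.append(W)
    return out

rng = random.Random(3)
n, lam, theta1 = 2000, 0.028, 0.6
A = [rng.expovariate(lam) for _ in range(n)]
S = [rng.uniform(33.0, 38.0) if rng.random() < theta1 else rng.uniform(3.0, 7.0)
     for _ in range(n)]
W0 = waits(A, S)
j = 10                                       # the phantom customer (illustrative)
S_phantom = list(S)
S_phantom[j] = 0.0                           # the phantom "disappears"
Wj = waits(A, S_phantom)
d = [w0 - wj for w0, wj in zip(W0, Wj)]      # d_i(j); zero after the BP ends
print(sum(d[j + 1:]))                        # contribution of phantom j
```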

6 The Learning Algorithms

In this section we shall describe the actual algorithms for estimating the gradients in (18) via the disappearing phantoms. We present two methods based on the regenerative estimation approach, as well as the non-reset version of [29]. The development for the swapping phantoms is analogous and we omit the details. Let $C_k = c_k/\mu_k$ be the partial derivative of $C(\theta)$ w.r.t. $\theta_k$ and call $\Gamma_k(\theta) = \bar G_k[W(\theta)] + C_k$, so that, clearly, using (19),

$$G_k[F(\theta)] = \Gamma_k(\theta) - \sum_{l=1}^{V} \theta_l\, \Gamma_l(\theta) \qquad (20)$$

We shall focus on constructing the estimators of $\Gamma_k(\theta)$. To simplify notation, fix the index $k$ for which we evaluate the estimator of $\Gamma_k(\theta)$, and call:

$$\eta_W(\theta) = \bar G_k[\bar W(\theta)], \qquad \eta_N(\theta) = \bar G_k[\bar N(\theta)]$$

Taking the appropriate derivative of $W(\theta)$ from (11), we get

$$\bar G_k[W(\theta)] = \frac{\eta_W(\theta)\, \bar N(\theta) - \eta_N(\theta)\, \bar W(\theta)}{[\bar N(\theta)]^2}$$

6.1 Regenerative and Non-Reset Estimators

We shall first describe the ensuing estimators when a regenerative approach is chosen. Since the nominal system dominates the phantoms, we shall use $N_m(0)$ to denote the number of customers in the $m$-th busy period of the nominal system, and $N^{(m)}_\nu(j)$ for the $\nu$-th busy period of the $j$-th phantom system within the $m$-th nominal one. In order to obtain unbiased estimates of the numerator, we use the approach proposed in [5], [1]. We estimate $\Gamma_k(\theta)$ using $2M$ busy periods, estimating $\bar W(\theta)$ and $\eta_W(\theta)$ in one busy period of each pair, and $\bar N(\theta)$ and $\eta_N(\theta)$ in the other, so that the expectation of the product is the product of the expectations. The estimators of $\bar N(\theta)$ and $\bar W(\theta)$ in the $m$-th busy period, called $\hat N(m)$ and $\hat W(m)$ respectively, are unbiased using the sample averages. The estimators of $\eta_W(\theta)$ and $\eta_N(\theta)$ are given respectively by:

$$\hat\eta_W(m) = \frac{1}{\theta_k} \sum_{j=1}^{N_m(0)} I_k(j) \sum_{i=j+1}^{N_m(0)} d_i(j), \qquad \hat\eta_N(m) = \frac{1}{\theta_k} \sum_{j=1}^{N_m(0)} I_k(j)\, \left[N_m(0) - N^{(m)}_1(j)\right]$$

where $d_i(j) = W_i(0) - W_i(j) \leq S_j$ a.s. (see the Appendix), therefore:

$$\mathrm{Var}\{\hat\eta_W(m)\} \leq \frac{1}{\theta_k^2}\, E\left\{\theta_k\, N_m^2(0)\; E\left[\Big(\sum_{i=j+1}^{N_m(0)} d_i(j)\Big)^2 \,\Big|\; N_m(0)\right]\right\} \leq \frac{\bar S^2}{\theta_k}\, E[N^4(0)] \equiv \mathcal V(\theta) < \infty \qquad (21)$$

where $\bar S$ is an a.s. bound on the service times,

which is uniformly bounded for $\theta \in \Theta$; clearly, $\mathrm{Var}\{\hat\eta_N(m)\}$ is also uniformly bounded in $\theta$. Using Proposition 2 and independence between different busy periods, $E\{\hat\eta_N(2m+1)\, \hat W(2m)\} = \eta_N(\theta)\, \bar W(\theta)$ for every $m$.

As described, the $n$-th estimation interval considers $M$ pairs $(2m, 2m+1)$ of busy periods, from $m = nM+1$ up to $m = (n+1)M$. To simplify notation, for any function $f(m)$ call:

$$\sum_{(n)} f(m) = \sum_{m=nM+1}^{(n+1)M} f(m)$$

Method 1: The regenerative estimator $\hat\Gamma^{(1)}_n(M, k)$ is given by:

$$\hat\Gamma^{(1)}_n(M, k) = \frac{\sum_{(n)} \hat\eta_W(2m)\; \sum_{(n)} \hat N(2m+1)\; -\; \sum_{(n)} \hat\eta_N(2m+1)\; \sum_{(n)} \hat W(2m)}{\left[\sum_{(n)} \hat N(2m+1)\right]^2} + C_k \qquad (22)$$

Since the $\{\hat\Gamma^{(1)}_n(M, k)\}$ are i.i.d. and their variance is bounded, they are strongly consistent estimators of $\Gamma_k(\theta)$. However, they are not consistent in the average: for any $m$, $\frac{1}{m} \sum_{n=0}^{m-1} E[\hat\Gamma^{(1)}_n(M, k)] = E[\hat\Gamma^{(1)}_1(M, k)]$, but for any fixed $M$, the estimator $\hat\Gamma^{(1)}_1(M, k)$ is biased.

Method 2: As a variant, we consider another regenerative estimator $\hat\Gamma^{(2)}_n(M, k)$ similar to $\hat\Gamma^{(1)}_n(M, k)$, but for which the denominator is replaced by a cumulative version of it. We define

$$\hat N^{(n)}(M) = \frac{1}{(n+1)M} \sum_{m=0}^{(n+1)M} \hat N(2m+1)$$

and use now:

$$\hat\Gamma^{(2)}_n(M, k) = \frac{\frac{1}{M}\sum_{(n)} \hat\eta_W(2m)\; \hat N^{(n)}(M)\; -\; \frac{1}{M}\sum_{(n)} \hat\eta_N(2m+1)\; \frac{1}{M}\sum_{(n)} \hat W(2m)}{\left[\hat N^{(n)}(M)\right]^2} + C_k \qquad (23)$$

This estimator is strongly consistent for $\Gamma_k(\theta)$, and also consistent in the average, since for every $M$, $\hat N^{(n)}(M) \to \bar N(\theta)$ a.s. as $n \to \infty$. In practice, Method 1 has been preferred to Method 2 for stochastic optimization, mostly because the sequence of estimates is i.i.d. Both are strongly consistent, and that is the required property for (1) to converge towards the optimum. We shall discuss their behavior in Section 7.
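Assembling (22) from the per-busy-period statistics is mechanical; the sketch below shows only the ratio structure, with placeholder arrays standing in for measured values of $\hat\eta_W$, $\hat\eta_N$, $\hat N$, $\hat W$ and an assumed value of $C_k$.

```python
# Method 1: the regenerative ratio estimator (22) assembled from
# per-BP statistics; all numbers below are hypothetical placeholders.
def gamma_method1(etaW_even, N_odd, etaN_odd, W_even, C_k):
    num = sum(etaW_even) * sum(N_odd) - sum(etaN_odd) * sum(W_even)
    return num / sum(N_odd) ** 2 + C_k

print(gamma_method1(etaW_even=[1.2, 0.8, 1.1],   # eta_W-hat over even BPs
                    N_odd=[4.0, 6.0, 5.0],       # N-hat over odd BPs
                    etaN_odd=[0.3, 0.5, 0.4],    # eta_N-hat over odd BPs
                    W_even=[40.0, 55.0, 47.0],   # W-hat over even BPs
                    C_k=0.9))                    # assumed cost derivative
```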

The non-reset RPA estimator uses a finite horizon of fixed length $M$, without resetting the propagation of the perturbations to zero. It uses the information available over the service completions from customer $nM+1$ to customer $(n+1)M$. In this case, as we do not need the renewal theorem, we directly estimate $dW/d\theta$.

Method 3: Consider now:

$$\hat\Gamma^{(3)}_n(M, k) = \frac{1}{\theta_k M} \left( \sum_{j=\underline n}^{nM} I_k(j) \sum_{i=nM+1}^{\bar n} d_i(j)\; +\; \sum_{j=nM+1}^{(n+1)M} I_k(j) \sum_{i=j+1}^{(n+1)M} d_i(j) \right) + C_k \qquad (24)$$

where $\underline n$ (respectively, $\bar n$) is the index of the customer that starts (finishes) the nominal busy period where customer $nM+1$ belongs. The second term corresponds to the (unbiased) estimator of the gradient over the finite horizon. The first term accounts for the bookkeeping, and represents the propagation of the perturbations of the previous phantom systems into the current estimation interval $i \geq nM+1$. This estimator is both strongly consistent and (strongly) consistent in the average, provided that $E[N^4(0)] < \infty$. To see this, we use the fact that the double sum in the first term (or accumulator) has expectation bounded by $E\{\theta_k\, N(0)\, E[\sum_{i=1}^{N(0)} S_j(j) \mid N(0)]\} \leq \theta_k\, E[N^2(0)]/\mu_k < \infty$. Its variance is bounded by $\mathcal V(\theta)$. Let $N^\nu(M)$ be the number of busy periods totally contained within $[nM+1, (n+1)M]$; then (24) can be written as a sum over the $N^\nu(M)$ busy periods plus the corresponding terms in the first and last busy periods. Therefore, using independence and Wald's identity (conditioning on $N^\nu(M)$):

$$\mathrm{Var}[\hat\Gamma^{(3)}_n(M, k)] \leq \frac{1}{M}\, E\left[\frac{N^\nu(M)}{M}\right] \mathcal V(\theta) + \frac{2\, \mathcal V(\theta)}{M}$$

which implies strong convergence as $M \to \infty$. As for the consistency in the average, it suffices to remark that our estimator is additive, so that for any $m$,

$$\frac{1}{m} \sum_{n=0}^{m-1} \hat\Gamma^{(3)}_n(M, k) = \hat\Gamma^{(3)}_0(mM, k)$$

and therefore, for any $M$, as $m \to \infty$, from Proposition 2 and the fact that the initial accumulator converges to zero a.s. for any initial condition, this latter average converges to $\Gamma_k(\theta)$ a.s.

Finally, using any of the estimators $\hat\Gamma^{(e)}_n(M, k)$, $e = 1, 2, 3$, let:

$$Y^{(e)}_{n,k}(M, \theta) = -\theta_k \left[ \hat\Gamma^{(e)}_n(M, k) - \sum_{l=1}^{V} \theta_l\, \hat\Gamma^{(e)}_n(M, l) \right]$$

be the estimator of the drift in (9). Summarizing our previous results, it follows that $Y^{(e)}_{n,k}(M, \theta)$ is a strongly consistent estimator of $-\theta_k\, G_k[F(\theta)]$ for $e = 1, 2, 3$ and consistent in the average for $e = 2, 3$.

6.2 Iterative Gradient Search Algorithm

We are now ready to go back to where we started. In the following section we shall analyze the behavior of the stochastic approximation using Methods $e = 1, 2, 3$ and updating each component $\theta_k$ of the control variable as:

$$\theta_{n+1,k} = \theta_{n,k} + \epsilon\, Y^{(e)}_{n,k}(M, \theta_n) \qquad (25)$$

where, for Methods 1 and 2, $\tau_n$ is the total length of the busy periods labeled by $m = nM+1, \dots, (n+1)M$ and, for Method 3, $\tau_n = M$. We shall then compare the performance of the procedures with the more commonly used stochastic approximation using Method 1 and:

$$\theta_{n+1,k} = \theta_{n,k} + \epsilon_n\, Y^{(1)}_{n,k}(M(n), \theta_n) \qquad (26)$$

where $\epsilon_n \sim 1/n$ and $M(n) \sim n$. According to Theorem 1, $\theta_n \to \theta^*$ a.s.
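Putting the pieces together, here is a sketch of the constant-gain loop (25). Since the combination $Y_k = -\theta_k(\hat\Gamma_k - \sum_l \theta_l \hat\Gamma_l)$ sums to zero over $k$, the iterates stay on the simplex without truncation; the noisy $\hat\Gamma$ below is a hypothetical stand-in for the RPA estimators of Section 6.

```python
# Constant-gain learning loop (25) with a stand-in for the Gamma
# estimates; the drift pushes theta toward (0.812, 0.188) in this
# hypothetical model while sum(theta) remains exactly 1.
import random

eps = 5e-4
theta = [0.51, 0.49]

def Gamma_hat(th):
    true = [2.0 * (th[0] - 0.812), 2.0 * (th[1] - 0.188)]
    return [g + random.gauss(0.0, 0.5) for g in true]   # noisy estimates

for _ in range(200_000):
    G = Gamma_hat(theta)
    gbar = sum(t * g for t, g in zip(theta, G))
    Y = [-t * (g - gbar) for t, g in zip(theta, G)]     # sums to zero over k
    theta = [t + eps * y for t, y in zip(theta, Y)]     # recursion (25)
print(theta, sum(theta))   # hovers near (0.812, 0.188); sum stays 1
```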

In the framework of Section 2, we let $\{\xi_i(\theta)\}$ denote the process, $\xi_i(\theta)$ being the waiting time of customer number $i$. Our underlying probability space is defined by $\omega = \{\omega_i = (A_i, u_i, \sigma_i)\}$, so that $\xi_{i+1}(\theta) = \max[0,\, \xi_i(\theta) + S_i(u_i, \sigma_i) - A_i]$. In order to simplify notation, we focus on Methods 1 and 2 with $M = 1$, but the treatment for Method 3 and any $M$ is similar. The enlarged state is constructed using

$$Z_i(\theta) = \left( \Lambda_i(k),\; i - \underline i,\; (d_i(j, k);\, j = \underline i, \dots, i-1),\; k = 1, \dots, V \right)$$

where $\underline i$ is the index of the first customer in the busy period where customer $i$ belongs, $d_i(j, k) = [W_i(0) - W_i(j)]\, I_k(j)$, and the accumulator $\Lambda_i(k)$ is given by:

$$\Lambda_i(k) = \sum_{j=\underline i}^{i-1} I_k(j) \sum_{l=j+1}^{i} d_l(j, k) \qquad (27)$$

Assumptions 1 and 4 are verified for the process at fixed control value. We provide in the Appendix the recursions satisfied by $Z_i(\theta)$, which imply that $P\{\chi_{i+1}(\theta) \in B \mid \mathcal F_i(\theta)\} = P\{\chi_{i+1}(\theta) \in B \mid \chi_i(\theta)\}$, and this latter is a linear function of $\theta$, yielding weak continuity. Notice that the enlarged process also regenerates at $W_i = 0$, when the components of $Z_i$ are set to 0. Under stability of $\{\xi_i(\theta)\}$ for every $\theta \in \Theta$, the number of customers in, and the length of, the busy periods are finite a.s., implying that the invariant measure exists, is ergodic, and is tight for every compact set $A \subset \Theta$. Assumption 4 follows from $E[N^4(0)] < \infty$, as we have seen, with $G_k(\theta) = -\theta_k\, G_k[F(\theta)] + \beta(\theta, M)$ for each component $k$, and $\bar\tau(\theta) = E[N(0)]$ when $M = 1$. For Methods 2 and 3, $\beta(\theta, M) = 0$ and for Method 1, $\beta(\theta, M) \to 0$ as $M \to \infty$.

Assumptions 2 and 3 refer to the MDP model $\{(\chi_i^\epsilon, \tilde\theta_i)\}$. Clearly, when $W_{i+1} = 0$ and an update takes place, the value of $\tilde\theta_{i+1}$ is obtained using the current values of $\Lambda_i(k)$, $k = 1, \dots, V$. Then the future evolution of the state $\chi_j^\epsilon$ depends on the value $\tilde\theta_i$. Since $\sup_{\theta\in\Theta} \mathcal V(\theta) < \infty$, both $\mathrm{Var}\{\tau_n \mid \theta_n = \theta\}$ and $\mathrm{Var}\{Y^{(e)}_{n,k}(M, \theta) \mid \theta_n = \theta\}$ are uniformly bounded for $\theta \in \Theta$; therefore they are uniformly integrable, verifying Assumption 2. Finally, Assumption 3 requires $\{(\chi_i^\epsilon, \tilde\theta_i)\}$ to be tight. Tightness of $\tilde\theta_i$ follows from the boundedness $0 \leq \theta_k \leq 1$. Verifying tightness of $\chi_i^\epsilon$ presents a minor difficulty, which was also present when we coded the algorithms: the dimension of $Z_i$ is random and, in principle, unbounded, since it has as many components $\{d_i(j, k)\}$ as customers present in a busy period. The solution for the analysis of tightness is very similar to the practical solution. Use an array of fixed but "large" dimension $D$, and notice that $P\{\omega : N_m(0) > D\}$ can be made uniformly small for any $\theta \in \Theta$, since the latter is a compact set and the process is stable for all $\theta$. Given $\tilde\theta_i = \theta$ at the start of the estimation interval, the initial distribution is fixed and independent of $\epsilon$. Use now the bounds $d_i(j) \leq \bar S$ (the upper bound of our uniform service distributions), $i - \underline i \leq N_m(0)$ where $m$ is the current BP, $\Lambda_i(k) \leq N_m^2(0)\, \bar S$, and the fact that $W_i$ is a.s. bounded by the length of the current BP. Using our truncation argument, we can now choose constants $K_i$ sufficiently large so that $P\{\xi_i(\theta) > K_1,\ \Lambda_i > K_2,\ (i - \underline i) > K_3\} < \delta$. This implies tightness of $\{\chi_i^\epsilon \mid \tilde\theta_i = \theta\}$ over one regenerative cycle, where the bounds are independent of $\epsilon$.

7 Simulation Results

For the purposes of verifying our simulation results, we consider the case $V = 2$. Then $\theta = P\{\sigma_i = v_1\}$ becomes scalar, and the process regenerates if $\lambda < \mu(\theta)$. Our model is an M/G/1 queue with uniformly distributed service times for each speed. Let $S(k)$, $k = 1, 2$, be the service time of items processed at speed $v_k$; then $S(k) \sim U[a_k, b_k]$. We used $\lambda = 0.028$, $a_1 = 33$, $b_1 = 38$, $c_1 = 5$, $a_2 = 3$, $b_2 = 7$, $c_2 = 145$. Let $\bar C = c_1/\mu_1 - c_2/\mu_2$ be the derivative of $C(\theta)$. Using the Pollaczek-Khinchine formula:

$$\frac{dF}{d\theta} = \frac{\lambda\, (\varsigma_1^2 - \varsigma_2^2)}{2\, (1 - \rho)} + \frac{\lambda^2\, \left[\theta\, \varsigma_1^2 + (1-\theta)\, \varsigma_2^2\right]\, (\mu_1^{-1} - \mu_2^{-1})}{2\, (1 - \rho)^2} + \bar C$$

where $\varsigma_k^2 = E[S^2(k)]$ and $\rho = \lambda\, [\theta/\mu_1 + (1-\theta)/\mu_2]$. Figure 1 shows the graph of the performance function, along with the waiting time and cost as functions of the control parameter. The dotted line plots $C(\theta)$, the dash-dotted line plots $W(\theta)$, and the solid line plots $F(\theta)$. Clearly, for certain combinations of costs $c_1$ and $c_2$, it will be more advantageous to always use one of the two possible speeds, so that $\theta^*$ will be either 0 or 1. In our example, $\theta^* = 0.812$ and $F(\theta^*) = 367.01$, but the method can also be used if $\theta^*$ lies at the boundaries. In all our simulations, the initial point was $\theta(0) = 0.51$. In all our graphs showing the behavior of $\vartheta^\epsilon(\cdot)$, the solid line indicates the evolution of $\vartheta(\cdot)$, given by (9), and the dotted line indicates the evolution of $\vartheta^\epsilon(\cdot)$.
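The closed form above can be checked directly. The brief sketch below recovers the reported optimum by a grid search; reading $W(\theta)$ as the Pollaczek-Khinchine mean waiting time and $C(\theta)$ as the per-item cost (6) reproduces $F(\theta^*) = 367.01$ at $\theta^* = 0.812$.

```python
# Closed-form evaluation of F(theta) for the V = 2 example of
# Section 7, using the Pollaczek-Khinchine formula for W(theta).
lam = 0.028
m1, s2_1, c1 = 35.5, (33**2 + 33*38 + 38**2) / 3.0, 5.0    # U[33, 38]
m2, s2_2, c2 = 5.0,  (3**2 + 3*7 + 7**2) / 3.0,    145.0   # U[3, 7]

def F(th):
    ES = th * m1 + (1.0 - th) * m2           # mean service time
    ES2 = th * s2_1 + (1.0 - th) * s2_2      # second moment of service
    rho = lam * ES                           # traffic intensity
    W = lam * ES2 / (2.0 * (1.0 - rho))      # P-K mean waiting time
    C = th * c1 * m1 + (1.0 - th) * c2 * m2  # mean operating cost per item (6)
    return W + C

best = min((F(t / 1000.0), t / 1000.0) for t in range(1, 1000))
print(best)   # approximately (367.01, 0.812)
```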

Figure 1: Behavior of the Cost Functions

7.1 Understanding Weak Convergence

In this section we show how the choice of $\epsilon$ changes both the convergence rate and the fluctuations of the control process associated with (25) around the limiting ODE. Our aim is to present in a visual manner what the convergence in distribution of a stochastic process to a limiting deterministic process means in practice. Theorem 2 predicts that, as $\epsilon \to 0$, the processes $\vartheta^\epsilon(\cdot)$ approach (in the weak topology) the solid curves in Figure 2, which are the solution of (9). We plotted the values $\tilde\theta_i$ against the number of iterations to show the processes $\vartheta^\epsilon(\cdot)$. The three plots are a sequence of trajectories of the ladder interpolation processes $\vartheta^\epsilon(\cdot)$ obtained using Method 3, with $M = 100$, going (from left to right) from $\epsilon = 5 \times 10^{-4}$ to $\epsilon = 5 \times 10^{-5}$ to $\epsilon = 5 \times 10^{-6}$. This is the only difference in the stochastic approximations.

Figure 2: Weak Convergence to the ODE

Notice, however, that the iteration numbers increase by a corresponding factor in the plots. Naturally, as one increases $\epsilon$ and thus the rate of convergence, one loses accuracy in the stochastic approximation.

7.2 Understanding Consistency in the Average

We show now how closely the different processes converge to the deterministic solution of the target ODE when (25) is used for the updates. Figure 3 shows the control processes when the regenerative estimators given in (22) (Method 1,


More information

Stochastic dominance with imprecise information

Stochastic dominance with imprecise information Stochastic dominance with imprecise information Ignacio Montes, Enrique Miranda, Susana Montes University of Oviedo, Dep. of Statistics and Operations Research. Abstract Stochastic dominance, which is

More information

Linear Regression and Its Applications

Linear Regression and Its Applications Linear Regression and Its Applications Predrag Radivojac October 13, 2014 Given a data set D = {(x i, y i )} n the objective is to learn the relationship between features and the target. We usually start

More information

N.G.Bean, D.A.Green and P.G.Taylor. University of Adelaide. Adelaide. Abstract. process of an MMPP/M/1 queue is not a MAP unless the queue is a

N.G.Bean, D.A.Green and P.G.Taylor. University of Adelaide. Adelaide. Abstract. process of an MMPP/M/1 queue is not a MAP unless the queue is a WHEN IS A MAP POISSON N.G.Bean, D.A.Green and P.G.Taylor Department of Applied Mathematics University of Adelaide Adelaide 55 Abstract In a recent paper, Olivier and Walrand (994) claimed that the departure

More information

Renewal theory and its applications

Renewal theory and its applications Renewal theory and its applications Stella Kapodistria and Jacques Resing September 11th, 212 ISP Definition of a Renewal process Renewal theory and its applications If we substitute the Exponentially

More information

Lecture 10: Semi-Markov Type Processes

Lecture 10: Semi-Markov Type Processes Lecture 1: Semi-Markov Type Processes 1. Semi-Markov processes (SMP) 1.1 Definition of SMP 1.2 Transition probabilities for SMP 1.3 Hitting times and semi-markov renewal equations 2. Processes with semi-markov

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Re-sampling and exchangeable arrays University Ave. November Revised January Summary

Re-sampling and exchangeable arrays University Ave. November Revised January Summary Re-sampling and exchangeable arrays Peter McCullagh Department of Statistics University of Chicago 5734 University Ave Chicago Il 60637 November 1997 Revised January 1999 Summary The non-parametric, or

More information

Garrett: `Bernstein's analytic continuation of complex powers' 2 Let f be a polynomial in x 1 ; : : : ; x n with real coecients. For complex s, let f

Garrett: `Bernstein's analytic continuation of complex powers' 2 Let f be a polynomial in x 1 ; : : : ; x n with real coecients. For complex s, let f 1 Bernstein's analytic continuation of complex powers c1995, Paul Garrett, garrettmath.umn.edu version January 27, 1998 Analytic continuation of distributions Statement of the theorems on analytic continuation

More information

Environment (E) IBP IBP IBP 2 N 2 N. server. System (S) Adapter (A) ACV

Environment (E) IBP IBP IBP 2 N 2 N. server. System (S) Adapter (A) ACV The Adaptive Cross Validation Method - applied to polling schemes Anders Svensson and Johan M Karlsson Department of Communication Systems Lund Institute of Technology P. O. Box 118, 22100 Lund, Sweden

More information

Multiplicative Multifractal Modeling of. Long-Range-Dependent (LRD) Trac in. Computer Communications Networks. Jianbo Gao and Izhak Rubin

Multiplicative Multifractal Modeling of. Long-Range-Dependent (LRD) Trac in. Computer Communications Networks. Jianbo Gao and Izhak Rubin Multiplicative Multifractal Modeling of Long-Range-Dependent (LRD) Trac in Computer Communications Networks Jianbo Gao and Izhak Rubin Electrical Engineering Department, University of California, Los Angeles

More information

Point Process Control

Point Process Control Point Process Control The following note is based on Chapters I, II and VII in Brémaud s book Point Processes and Queues (1981). 1 Basic Definitions Consider some probability space (Ω, F, P). A real-valued

More information

Fluid Heuristics, Lyapunov Bounds and E cient Importance Sampling for a Heavy-tailed G/G/1 Queue

Fluid Heuristics, Lyapunov Bounds and E cient Importance Sampling for a Heavy-tailed G/G/1 Queue Fluid Heuristics, Lyapunov Bounds and E cient Importance Sampling for a Heavy-tailed G/G/1 Queue J. Blanchet, P. Glynn, and J. C. Liu. September, 2007 Abstract We develop a strongly e cient rare-event

More information

Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals

Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals Functional Limit theorems for the quadratic variation of a continuous time random walk and for certain stochastic integrals Noèlia Viles Cuadros BCAM- Basque Center of Applied Mathematics with Prof. Enrico

More information

Vasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks

Vasil Khalidov & Miles Hansard. C.M. Bishop s PRML: Chapter 5; Neural Networks C.M. Bishop s PRML: Chapter 5; Neural Networks Introduction The aim is, as before, to find useful decompositions of the target variable; t(x) = y(x, w) + ɛ(x) (3.7) t(x n ) and x n are the observations,

More information

g(.) 1/ N 1/ N Decision Decision Device u u u u CP

g(.) 1/ N 1/ N Decision Decision Device u u u u CP Distributed Weak Signal Detection and Asymptotic Relative Eciency in Dependent Noise Hakan Delic Signal and Image Processing Laboratory (BUSI) Department of Electrical and Electronics Engineering Bogazici

More information

Notes on Measure Theory and Markov Processes

Notes on Measure Theory and Markov Processes Notes on Measure Theory and Markov Processes Diego Daruich March 28, 2014 1 Preliminaries 1.1 Motivation The objective of these notes will be to develop tools from measure theory and probability to allow

More information

X. Hu, R. Shonkwiler, and M.C. Spruill. School of Mathematics. Georgia Institute of Technology. Atlanta, GA 30332

X. Hu, R. Shonkwiler, and M.C. Spruill. School of Mathematics. Georgia Institute of Technology. Atlanta, GA 30332 Approximate Speedup by Independent Identical Processing. Hu, R. Shonkwiler, and M.C. Spruill School of Mathematics Georgia Institute of Technology Atlanta, GA 30332 Running head: Parallel iip Methods Mail

More information

On the static assignment to parallel servers

On the static assignment to parallel servers On the static assignment to parallel servers Ger Koole Vrije Universiteit Faculty of Mathematics and Computer Science De Boelelaan 1081a, 1081 HV Amsterdam The Netherlands Email: koole@cs.vu.nl, Url: www.cs.vu.nl/

More information

OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS

OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS Xiaofei Fan-Orzechowski Department of Applied Mathematics and Statistics State University of New York at Stony Brook Stony

More information

3.1 Basic properties of real numbers - continuation Inmum and supremum of a set of real numbers

3.1 Basic properties of real numbers - continuation Inmum and supremum of a set of real numbers Chapter 3 Real numbers The notion of real number was introduced in section 1.3 where the axiomatic denition of the set of all real numbers was done and some basic properties of the set of all real numbers

More information

Competing sources of variance reduction in parallel replica Monte Carlo, and optimization in the low temperature limit

Competing sources of variance reduction in parallel replica Monte Carlo, and optimization in the low temperature limit Competing sources of variance reduction in parallel replica Monte Carlo, and optimization in the low temperature limit Paul Dupuis Division of Applied Mathematics Brown University IPAM (J. Doll, M. Snarski,

More information

1 Introduction This work follows a paper by P. Shields [1] concerned with a problem of a relation between the entropy rate of a nite-valued stationary

1 Introduction This work follows a paper by P. Shields [1] concerned with a problem of a relation between the entropy rate of a nite-valued stationary Prexes and the Entropy Rate for Long-Range Sources Ioannis Kontoyiannis Information Systems Laboratory, Electrical Engineering, Stanford University. Yurii M. Suhov Statistical Laboratory, Pure Math. &

More information

STOCHASTIC DIFFERENTIAL EQUATIONS WITH EXTRA PROPERTIES H. JEROME KEISLER. Department of Mathematics. University of Wisconsin.

STOCHASTIC DIFFERENTIAL EQUATIONS WITH EXTRA PROPERTIES H. JEROME KEISLER. Department of Mathematics. University of Wisconsin. STOCHASTIC DIFFERENTIAL EQUATIONS WITH EXTRA PROPERTIES H. JEROME KEISLER Department of Mathematics University of Wisconsin Madison WI 5376 keisler@math.wisc.edu 1. Introduction The Loeb measure construction

More information

Time is discrete and indexed by t =0; 1;:::;T,whereT<1. An individual is interested in maximizing an objective function given by. tu(x t ;a t ); (0.

Time is discrete and indexed by t =0; 1;:::;T,whereT<1. An individual is interested in maximizing an objective function given by. tu(x t ;a t ); (0. Chapter 0 Discrete Time Dynamic Programming 0.1 The Finite Horizon Case Time is discrete and indexed by t =0; 1;:::;T,whereT

More information

/97/$10.00 (c) 1997 AACC

/97/$10.00 (c) 1997 AACC Optimal Random Perturbations for Stochastic Approximation using a Simultaneous Perturbation Gradient Approximation 1 PAYMAN SADEGH, and JAMES C. SPALL y y Dept. of Mathematical Modeling, Technical University

More information

Optimization Tutorial 1. Basic Gradient Descent

Optimization Tutorial 1. Basic Gradient Descent E0 270 Machine Learning Jan 16, 2015 Optimization Tutorial 1 Basic Gradient Descent Lecture by Harikrishna Narasimhan Note: This tutorial shall assume background in elementary calculus and linear algebra.

More information

2 light traffic derivatives for the GI/G/ queue. We shall see that our proof of analyticity is mainly based on some recursive formulas very similar to

2 light traffic derivatives for the GI/G/ queue. We shall see that our proof of analyticity is mainly based on some recursive formulas very similar to Analyticity of Single-Server Queues in Light Traffic Jian-Qiang Hu Manufacturing Engineering Department Boston University Cummington Street Boston, MA 0225 September 993; Revised May 99 Abstract Recently,

More information

NEW FRONTIERS IN APPLIED PROBABILITY

NEW FRONTIERS IN APPLIED PROBABILITY J. Appl. Prob. Spec. Vol. 48A, 209 213 (2011) Applied Probability Trust 2011 NEW FRONTIERS IN APPLIED PROBABILITY A Festschrift for SØREN ASMUSSEN Edited by P. GLYNN, T. MIKOSCH and T. ROLSKI Part 4. Simulation

More information

TOWARDS BETTER MULTI-CLASS PARAMETRIC-DECOMPOSITION APPROXIMATIONS FOR OPEN QUEUEING NETWORKS

TOWARDS BETTER MULTI-CLASS PARAMETRIC-DECOMPOSITION APPROXIMATIONS FOR OPEN QUEUEING NETWORKS TOWARDS BETTER MULTI-CLASS PARAMETRIC-DECOMPOSITION APPROXIMATIONS FOR OPEN QUEUEING NETWORKS by Ward Whitt AT&T Bell Laboratories Murray Hill, NJ 07974-0636 March 31, 199 Revision: November 9, 199 ABSTRACT

More information

PROOF OF TWO MATRIX THEOREMS VIA TRIANGULAR FACTORIZATIONS ROY MATHIAS

PROOF OF TWO MATRIX THEOREMS VIA TRIANGULAR FACTORIZATIONS ROY MATHIAS PROOF OF TWO MATRIX THEOREMS VIA TRIANGULAR FACTORIZATIONS ROY MATHIAS Abstract. We present elementary proofs of the Cauchy-Binet Theorem on determinants and of the fact that the eigenvalues of a matrix

More information

LECTURE 12 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT

LECTURE 12 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT MARCH 29, 26 LECTURE 2 UNIT ROOT, WEAK CONVERGENCE, FUNCTIONAL CLT (Davidson (2), Chapter 4; Phillips Lectures on Unit Roots, Cointegration and Nonstationarity; White (999), Chapter 7) Unit root processes

More information

Operations Research Letters. Instability of FIFO in a simple queueing system with arbitrarily low loads

Operations Research Letters. Instability of FIFO in a simple queueing system with arbitrarily low loads Operations Research Letters 37 (2009) 312 316 Contents lists available at ScienceDirect Operations Research Letters journal homepage: www.elsevier.com/locate/orl Instability of FIFO in a simple queueing

More information

IEOR 6711, HMWK 5, Professor Sigman

IEOR 6711, HMWK 5, Professor Sigman IEOR 6711, HMWK 5, Professor Sigman 1. Semi-Markov processes: Consider an irreducible positive recurrent discrete-time Markov chain {X n } with transition matrix P (P i,j ), i, j S, and finite state space.

More information

Markov processes Course note 2. Martingale problems, recurrence properties of discrete time chains.

Markov processes Course note 2. Martingale problems, recurrence properties of discrete time chains. Institute for Applied Mathematics WS17/18 Massimiliano Gubinelli Markov processes Course note 2. Martingale problems, recurrence properties of discrete time chains. [version 1, 2017.11.1] We introduce

More information

Notes on Iterated Expectations Stephen Morris February 2002

Notes on Iterated Expectations Stephen Morris February 2002 Notes on Iterated Expectations Stephen Morris February 2002 1. Introduction Consider the following sequence of numbers. Individual 1's expectation of random variable X; individual 2's expectation of individual

More information

Developing an Algorithm for LP Preamble to Section 3 (Simplex Method)

Developing an Algorithm for LP Preamble to Section 3 (Simplex Method) Moving from BFS to BFS Developing an Algorithm for LP Preamble to Section (Simplex Method) We consider LP given in standard form and let x 0 be a BFS. Let B ; B ; :::; B m be the columns of A corresponding

More information

Acknowledgements I wish to thank in a special way Prof. Salvatore Nicosia and Dr. Paolo Valigi whose help and advices have been crucial for this work.

Acknowledgements I wish to thank in a special way Prof. Salvatore Nicosia and Dr. Paolo Valigi whose help and advices have been crucial for this work. Universita degli Studi di Roma \Tor Vergata" Modeling and Control of Discrete Event Dynamic Systems (Modellazione e Controllo di Sistemi Dinamici a Eventi Discreti) Francesco Martinelli Tesi sottomessa

More information

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G.

In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and. Convergence of Indirect Adaptive. Andrew G. In Advances in Neural Information Processing Systems 6. J. D. Cowan, G. Tesauro and J. Alspector, (Eds.). Morgan Kaufmann Publishers, San Fancisco, CA. 1994. Convergence of Indirect Adaptive Asynchronous

More information

`First Come, First Served' can be unstable! Thomas I. Seidman. Department of Mathematics and Statistics. University of Maryland Baltimore County

`First Come, First Served' can be unstable! Thomas I. Seidman. Department of Mathematics and Statistics. University of Maryland Baltimore County revision2: 9/4/'93 `First Come, First Served' can be unstable! Thomas I. Seidman Department of Mathematics and Statistics University of Maryland Baltimore County Baltimore, MD 21228, USA e-mail: hseidman@math.umbc.edui

More information

Data analysis and stochastic modeling

Data analysis and stochastic modeling Data analysis and stochastic modeling Lecture 7 An introduction to queueing theory Guillaume Gravier guillaume.gravier@irisa.fr with a lot of help from Paul Jensen s course http://www.me.utexas.edu/ jensen/ormm/instruction/powerpoint/or_models_09/14_queuing.ppt

More information

Sensitivity Analysis for Discrete-Time Randomized Service Priority Queues

Sensitivity Analysis for Discrete-Time Randomized Service Priority Queues Sensitivity Analysis for Discrete-Time Randomized Service Priority Queues George Kesidis 1, Takis Konstantopoulos 2, Michael Zazanis 3 1. Elec. & Comp. Eng. Dept, University of Waterloo, Waterloo, ON,

More information

Q = (c) Assuming that Ricoh has been working continuously for 7 days, what is the probability that it will remain working at least 8 more days?

Q = (c) Assuming that Ricoh has been working continuously for 7 days, what is the probability that it will remain working at least 8 more days? IEOR 4106: Introduction to Operations Research: Stochastic Models Spring 2005, Professor Whitt, Second Midterm Exam Chapters 5-6 in Ross, Thursday, March 31, 11:00am-1:00pm Open Book: but only the Ross

More information

N.G. Dueld. M. Kelbert. Yu.M. Suhov. In this paper we consider a queueing network model under an arrivalsynchronization

N.G. Dueld. M. Kelbert. Yu.M. Suhov. In this paper we consider a queueing network model under an arrivalsynchronization THE BRANCHING DIFFUSION APPROIMATION FOR A MODEL OF A SYNCHRONIZED QUEUEING NETWORK N.G. Dueld AT&T Laboratories, Room 2C-323, 600 Mountain Avenue, Murray Hill, NJ 07974, USA. E-mail: duffield@research.att.com

More information

Completion Time in Dynamic PERT Networks 57 job are nished, as well as that the associated service station has processed the same activity of the prev

Completion Time in Dynamic PERT Networks 57 job are nished, as well as that the associated service station has processed the same activity of the prev Scientia Iranica, Vol. 14, No. 1, pp 56{63 c Sharif University of Technology, February 2007 Project Completion Time in Dynamic PERT Networks with Generating Projects A. Azaron 1 and M. Modarres In this

More information

7 Variance Reduction Techniques

7 Variance Reduction Techniques 7 Variance Reduction Techniques In a simulation study, we are interested in one or more performance measures for some stochastic model. For example, we want to determine the long-run average waiting time,

More information

Carnegie Mellon University Forbes Ave. Pittsburgh, PA 15213, USA. fmunos, leemon, V (x)ln + max. cost functional [3].

Carnegie Mellon University Forbes Ave. Pittsburgh, PA 15213, USA. fmunos, leemon, V (x)ln + max. cost functional [3]. Gradient Descent Approaches to Neural-Net-Based Solutions of the Hamilton-Jacobi-Bellman Equation Remi Munos, Leemon C. Baird and Andrew W. Moore Robotics Institute and Computer Science Department, Carnegie

More information

Solution: The process is a compound Poisson Process with E[N (t)] = λt/p by Wald's equation.

Solution: The process is a compound Poisson Process with E[N (t)] = λt/p by Wald's equation. Solutions Stochastic Processes and Simulation II, May 18, 217 Problem 1: Poisson Processes Let {N(t), t } be a homogeneous Poisson Process on (, ) with rate λ. Let {S i, i = 1, 2, } be the points of the

More information

Least Mean Square Algorithms With Markov Regime-Switching Limit

Least Mean Square Algorithms With Markov Regime-Switching Limit IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 50, NO. 5, MAY 2005 577 Least Mean Square Algorithms With Markov Regime-Switching Limit G. George Yin, Fellow, IEEE, and Vikram Krishnamurthy, Fellow, IEEE

More information

= w 2. w 1. B j. A j. C + j1j2

= w 2. w 1. B j. A j. C + j1j2 Local Minima and Plateaus in Multilayer Neural Networks Kenji Fukumizu and Shun-ichi Amari Brain Science Institute, RIKEN Hirosawa 2-, Wako, Saitama 35-098, Japan E-mail: ffuku, amarig@brain.riken.go.jp

More information

Stochastic Processes

Stochastic Processes Introduction and Techniques Lecture 4 in Financial Mathematics UiO-STK4510 Autumn 2015 Teacher: S. Ortiz-Latorre Stochastic Processes 1 Stochastic Processes De nition 1 Let (E; E) be a measurable space

More information

Notes on Time Series Modeling

Notes on Time Series Modeling Notes on Time Series Modeling Garey Ramey University of California, San Diego January 17 1 Stationary processes De nition A stochastic process is any set of random variables y t indexed by t T : fy t g

More information

Statistical Learning Theory

Statistical Learning Theory Statistical Learning Theory Fundamentals Miguel A. Veganzones Grupo Inteligencia Computacional Universidad del País Vasco (Grupo Inteligencia Vapnik Computacional Universidad del País Vasco) UPV/EHU 1

More information

Power Domains and Iterated Function. Systems. Abbas Edalat. Department of Computing. Imperial College of Science, Technology and Medicine

Power Domains and Iterated Function. Systems. Abbas Edalat. Department of Computing. Imperial College of Science, Technology and Medicine Power Domains and Iterated Function Systems Abbas Edalat Department of Computing Imperial College of Science, Technology and Medicine 180 Queen's Gate London SW7 2BZ UK Abstract We introduce the notion

More information

Lecture 2: Review of Prerequisites. Table of contents

Lecture 2: Review of Prerequisites. Table of contents Math 348 Fall 217 Lecture 2: Review of Prerequisites Disclaimer. As we have a textbook, this lecture note is for guidance and supplement only. It should not be relied on when preparing for exams. In this

More information

Balance properties of multi-dimensional words

Balance properties of multi-dimensional words Theoretical Computer Science 273 (2002) 197 224 www.elsevier.com/locate/tcs Balance properties of multi-dimensional words Valerie Berthe a;, Robert Tijdeman b a Institut de Mathematiques de Luminy, CNRS-UPR

More information

Recap. Probability, stochastic processes, Markov chains. ELEC-C7210 Modeling and analysis of communication networks

Recap. Probability, stochastic processes, Markov chains. ELEC-C7210 Modeling and analysis of communication networks Recap Probability, stochastic processes, Markov chains ELEC-C7210 Modeling and analysis of communication networks 1 Recap: Probability theory important distributions Discrete distributions Geometric distribution

More information

Numerical Solution of Hybrid Fuzzy Dierential Equation (IVP) by Improved Predictor-Corrector Method

Numerical Solution of Hybrid Fuzzy Dierential Equation (IVP) by Improved Predictor-Corrector Method Available online at http://ijim.srbiau.ac.ir Int. J. Industrial Mathematics Vol. 1, No. 2 (2009)147-161 Numerical Solution of Hybrid Fuzzy Dierential Equation (IVP) by Improved Predictor-Corrector Method

More information

STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY

STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY STATIC LECTURE 4: CONSTRAINED OPTIMIZATION II - KUHN TUCKER THEORY UNIVERSITY OF MARYLAND: ECON 600 1. Some Eamples 1 A general problem that arises countless times in economics takes the form: (Verbally):

More information

Fuzzy and Non-deterministic Automata Ji Mo ko January 29, 1998 Abstract An existence of an isomorphism between a category of fuzzy automata and a cate

Fuzzy and Non-deterministic Automata Ji Mo ko January 29, 1998 Abstract An existence of an isomorphism between a category of fuzzy automata and a cate University of Ostrava Institute for Research and Applications of Fuzzy Modeling Fuzzy and Non-deterministic Automata Ji Mo ko Research report No. 8 November 6, 1997 Submitted/to appear: { Supported by:

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.262 Discrete Stochastic Processes Midterm Quiz April 6, 2010 There are 5 questions, each with several parts.

More information

A Stable Finite Dierence Ansatz for Higher Order Dierentiation of Non-Exact. Data. Bob Anderssen and Frank de Hoog,

A Stable Finite Dierence Ansatz for Higher Order Dierentiation of Non-Exact. Data. Bob Anderssen and Frank de Hoog, A Stable Finite Dierence Ansatz for Higher Order Dierentiation of Non-Exact Data Bob Anderssen and Frank de Hoog, CSIRO Division of Mathematics and Statistics, GPO Box 1965, Canberra, ACT 2601, Australia

More information

A Simple Solution for the M/D/c Waiting Time Distribution

A Simple Solution for the M/D/c Waiting Time Distribution A Simple Solution for the M/D/c Waiting Time Distribution G.J.Franx, Universiteit van Amsterdam November 6, 998 Abstract A surprisingly simple and explicit expression for the waiting time distribution

More information

ECE 3511: Communications Networks Theory and Analysis. Fall Quarter Instructor: Prof. A. Bruce McDonald. Lecture Topic

ECE 3511: Communications Networks Theory and Analysis. Fall Quarter Instructor: Prof. A. Bruce McDonald. Lecture Topic ECE 3511: Communications Networks Theory and Analysis Fall Quarter 2002 Instructor: Prof. A. Bruce McDonald Lecture Topic Introductory Analysis of M/G/1 Queueing Systems Module Number One Steady-State

More information

Adaptive linear quadratic control using policy. iteration. Steven J. Bradtke. University of Massachusetts.

Adaptive linear quadratic control using policy. iteration. Steven J. Bradtke. University of Massachusetts. Adaptive linear quadratic control using policy iteration Steven J. Bradtke Computer Science Department University of Massachusetts Amherst, MA 01003 bradtke@cs.umass.edu B. Erik Ydstie Department of Chemical

More information

Ergodic Subgradient Descent

Ergodic Subgradient Descent Ergodic Subgradient Descent John Duchi, Alekh Agarwal, Mikael Johansson, Michael Jordan University of California, Berkeley and Royal Institute of Technology (KTH), Sweden Allerton Conference, September

More information

Program in Statistics & Operations Research. Princeton University. Princeton, NJ March 29, Abstract

Program in Statistics & Operations Research. Princeton University. Princeton, NJ March 29, Abstract An EM Approach to OD Matrix Estimation Robert J. Vanderbei James Iannone Program in Statistics & Operations Research Princeton University Princeton, NJ 08544 March 29, 994 Technical Report SOR-94-04 Abstract

More information

Fixed Term Employment Contracts. in an Equilibrium Search Model

Fixed Term Employment Contracts. in an Equilibrium Search Model Supplemental material for: Fixed Term Employment Contracts in an Equilibrium Search Model Fernando Alvarez University of Chicago and NBER Marcelo Veracierto Federal Reserve Bank of Chicago This document

More information

Markov decision processes and interval Markov chains: exploiting the connection

Markov decision processes and interval Markov chains: exploiting the connection Markov decision processes and interval Markov chains: exploiting the connection Mingmei Teo Supervisors: Prof. Nigel Bean, Dr Joshua Ross University of Adelaide July 10, 2013 Intervals and interval arithmetic

More information

Reinforcement Learning

Reinforcement Learning Reinforcement Learning March May, 2013 Schedule Update Introduction 03/13/2015 (10:15-12:15) Sala conferenze MDPs 03/18/2015 (10:15-12:15) Sala conferenze Solving MDPs 03/20/2015 (10:15-12:15) Aula Alpha

More information

Spurious Chaotic Solutions of Dierential. Equations. Sigitas Keras. September Department of Applied Mathematics and Theoretical Physics

Spurious Chaotic Solutions of Dierential. Equations. Sigitas Keras. September Department of Applied Mathematics and Theoretical Physics UNIVERSITY OF CAMBRIDGE Numerical Analysis Reports Spurious Chaotic Solutions of Dierential Equations Sigitas Keras DAMTP 994/NA6 September 994 Department of Applied Mathematics and Theoretical Physics

More information

Distributed Learning based on Entropy-Driven Game Dynamics

Distributed Learning based on Entropy-Driven Game Dynamics Distributed Learning based on Entropy-Driven Game Dynamics Bruno Gaujal joint work with Pierre Coucheney and Panayotis Mertikopoulos Inria Aug., 2014 Model Shared resource systems (network, processors)

More information

Lecture 5. 1 Chung-Fuchs Theorem. Tel Aviv University Spring 2011

Lecture 5. 1 Chung-Fuchs Theorem. Tel Aviv University Spring 2011 Random Walks and Brownian Motion Tel Aviv University Spring 20 Instructor: Ron Peled Lecture 5 Lecture date: Feb 28, 20 Scribe: Yishai Kohn In today's lecture we return to the Chung-Fuchs theorem regarding

More information

Asymptotics for Polling Models with Limited Service Policies

Asymptotics for Polling Models with Limited Service Policies Asymptotics for Polling Models with Limited Service Policies Woojin Chang School of Industrial and Systems Engineering Georgia Institute of Technology Atlanta, GA 30332-0205 USA Douglas G. Down Department

More information

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence: A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition

More information