Linear stochastic approximation driven by slowly varying Markov chains

Available online at www.sciencedirect.com

Systems & Control Letters

Linear stochastic approximation driven by slowly varying Markov chains

Vijay R. Konda, John N. Tsitsiklis

Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA

Received 5 June 2002; received in revised form 30 December 2002; accepted 23 February 2003

Abstract

We study a linear stochastic approximation algorithm that arises in the context of reinforcement learning. The algorithm employs a decreasing step-size, and is driven by Markov noise with time-varying statistics. We show that under suitable conditions, the algorithm can track the changes in the statistics of the Markov noise, as long as these changes are slower than the rate at which the step-size of the algorithm goes to zero.
© 2003 Elsevier B.V. All rights reserved.

Keywords: Stochastic approximation; Adaptive algorithms; Reinforcement learning

0. Introduction

The convergence of stochastic approximation algorithms driven by ergodic noise sequences has been extensively studied in the literature [1,5,9,10]. Typical convergence results show that the iterates converge to a stationary point of the vector field obtained by averaging the update direction with respect to the statistics of the driving noise.

In some applications, the statistics of the driving noise change with time. In such cases, the point to which the algorithm would converge if the noise were held stationary also changes with time. In this context, a meaningful objective is to have the stochastic approximation algorithm track this changing point closely, after an initial transient period. Algorithms of this type are often called adaptive, as they adapt themselves to the changing environment (for a textbook account of adaptive algorithms, see [1]).

The tracking ability of adaptive algorithms has been analyzed in several contexts [6,14]. For example, it is known that the usual constant step-size stochastic approximation algorithms can adapt to changes in the statistics of the driving noise that are slow relative to the step-size of the algorithm. However, the tracking ability of decreasing step-size stochastic approximation algorithms has not been studied before, because of the nature of the assumptions that would be required: for the changes in the statistics of the driving noise to remain slow relative to the decreasing step-size, these changes should become progressively slower. Such an assumption would be too restrictive and unnatural in applications where the noise is exogenously generated.

Corresponding author. E-mail addresses: konda@alum.mit.edu (V.R. Konda), jnt@mit.edu (J.N. Tsitsiklis).

In some contexts, however, the statistics of the driving noise depend only on a parameter that is deliberately changed by the user at a rate which is slower than the natural time scale of the stochastic approximation algorithm. In such cases, it becomes meaningful to study the tracking ability of decreasing step-size stochastic approximation. In this paper, we focus on linear iterations and establish that when the update direction and the statistics of the noise in the update direction depend on a slowly changing parameter, the algorithm can track such changes in a strong sense (to be explained later). To the best of our knowledge, this is the first result on the tracking ability of adaptive algorithms with decreasing step-sizes. Similar results are possible for methods involving nonlinear iterations under certain stability conditions (see, e.g., [4] for stability conditions in a single time scale setting with a simpler noise model), but this direction is not pursued here.

The rest of the paper is organized as follows. In Section 1, we motivate the linear algorithms considered within the context of reinforcement learning. In Section 2, we state the main result of this paper. The last three sections are devoted to the proof of this result.

1. Motivation

The motivation for the analysis of the linear iterations considered in this paper comes from simulation-based reinforcement learning methods for Markov decision problems [2,13], such as Temporal Difference (TD) learning [12], that are used to approximate the value function under a given fixed policy. More precisely, consider an ergodic finite-state Markov chain $\{X_k\}$, with transition probabilities $p(y|x)$. Let $g(x)$ be a one-stage reward function, and let $\bar g$ be its steady-state expectation. TD learning methods consider value functions that are linear in a prescribed set of basis functions $\varphi_i(x)$,
$$\hat V(x; r) = \sum_i r_i \varphi_i(x),$$
and adjust the vector $r$ of free parameters $r_i$ to obtain a sequence of approximations to the solution $V$ of the Poisson equation
$$V(x) = g(x) - \bar g + \sum_y p(y|x) V(y).$$
The parameters $\alpha$ and $r$, which denote the estimate of the average reward and the parameters of the approximate value function, are updated recursively as
$$\alpha_{k+1} = \alpha_k + \gamma_k \bigl( g(X_k) - \alpha_k \bigr),$$
$$r_{k+1} = r_k + \gamma_k \bigl( g(X_k) - \alpha_k + \varphi(X_{k+1})' r_k - \varphi(X_k)' r_k \bigr) Z_k,$$
where
$$Z_k = \sum_{l=0}^{k} \lambda^{k-l} \varphi(X_l)$$
is a so-called eligibility trace, and $\lambda \in [0,1)$ is a constant. Observe that the update equation for the vector $(\alpha_k, r_k)$ is of the form
$$r_{k+1} = r_k + \gamma_k \bigl( h(Y_{k+1}) - G(Y_{k+1}) r_k \bigr),$$
where $Y_{k+1} = (X_k, X_{k+1}, Z_k)$ is a Markov chain whose transition probabilities are not affected by the parameters being updated. However, in the context of optimization over a parametric family of Markov chains, both the basis functions and the transition probabilities of the Markov chain $\{X_k\}$ depend on a policy parameter $\theta$ that is constantly changing. Therefore, in this context, the update takes the form
$$r_{k+1} = r_k + \gamma_k \bigl( h_{\theta_k}(Y_{k+1}) - G_{\theta_k}(Y_{k+1}) r_k \bigr),$$
where $\theta_k$ denotes the value of $\theta$ at time $k$, and where $Y_{k+1}$ is generated from $Y_k$ using the transition probabilities corresponding to $\theta_k$. If $\theta_k$ is held constant at $\theta$, the algorithm would generally converge to some $\bar r(\theta)$. We would like to prove that even if $\theta_k$ moves slowly, then $r_k$ tracks $\bar r(\theta_k)$, in the sense that $r_k - \bar r(\theta_k)$ goes to zero. A result of this type, as developed in the next section, is necessary for analyzing the convergence properties of actor-critic algorithms, which combine temporal difference learning and slowly changing policies [7,8].
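For concreteness, the recursions above can be run as-is on a small synthetic chain. The following is a minimal simulation sketch, not the paper's implementation: the chain `P`, rewards `g`, basis matrix `phi`, and the function name `average_reward_td` are all hypothetical choices of ours.

```python
import numpy as np

def average_reward_td(P, g, phi, lam, gamma, num_steps, rng=None):
    """Sketch of the average-reward TD(lambda) recursions of Section 1.

    P     : (S, S) transition matrix of the chain {X_k}
    g     : (S,) one-stage rewards g(x)
    phi   : (S, m) basis functions; row x is phi(x)
    lam   : eligibility-trace parameter lambda in [0, 1)
    gamma : callable k -> step-size gamma_k
    """
    rng = np.random.default_rng() if rng is None else rng
    S, m = phi.shape
    alpha, r, z, x = 0.0, np.zeros(m), np.zeros(m), 0
    for k in range(num_steps):
        x_next = rng.choice(S, p=P[x])
        # temporal difference: g(X_k) - alpha_k + phi(X_{k+1})'r_k - phi(X_k)'r_k
        d = g[x] - alpha + phi[x_next] @ r - phi[x] @ r
        z = lam * z + phi[x]                # eligibility trace Z_k
        alpha += gamma(k) * (g[x] - alpha)  # average-reward estimate alpha_k
        r += gamma(k) * d * z               # value-function parameters r_k
        x = x_next
    return alpha, r

# hypothetical two-state example
P = np.array([[0.9, 0.1], [0.3, 0.7]])
g = np.array([1.0, 0.0])
phi = np.eye(2)
alpha, r = average_reward_td(P, g, phi, lam=0.5,
                             gamma=lambda k: 1.0 / (k + 1) ** 0.6,
                             num_steps=100_000)
print(alpha, r)  # alpha should approach the steady-state mean reward
```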

2. Main result

Consider a stochastic process $\{Y_k\}$ taking values in a Polish (complete, separable, metric) space $Y$, endowed with its Borel $\sigma$-field. Let $\{P_\theta(y, d\bar y);\ \theta \in \mathbb{R}^n\}$ be a parameterized family of transition kernels on $Y$. Consider the following update equations for a vector $r_k \in \mathbb{R}^m$ and a parameter $\theta_k \in \mathbb{R}^n$:
$$r_{k+1} = r_k + \gamma_k \bigl( h_{\theta_k}(Y_{k+1}) - G_{\theta_k}(Y_{k+1}) r_k \bigr) + \gamma_k \xi_{k+1} r_k, \qquad \theta_{k+1} = \theta_k + \beta_k H_{k+1}. \tag{1}$$
In the above iteration, $\{h_\theta, G_\theta : \theta \in \mathbb{R}^n\}$ is a parameterized family of $m$-vector-valued and $m \times m$-matrix-valued measurable functions on $Y$. Also, $H_k$ is a random process that drives the changes in the parameter $\theta_k$, which in turn affects the linear stochastic approximation updates of $r_k$. We now continue with our assumptions.

Assumption 1. The step-size sequence $\{\gamma_k\}$ is deterministic, non-increasing, and satisfies
$$\sum_k \gamma_k = \infty, \qquad \sum_k \gamma_k^2 < \infty.$$

Let $\mathcal{F}_k$ be the $\sigma$-field generated by $\{Y_l, H_l, r_l, \theta_l : l \le k\}$.

Assumption 2. For every measurable set $A \subset Y$,
$$P(Y_{k+1} \in A \mid \mathcal{F}_k) = P(Y_{k+1} \in A \mid Y_k, \theta_k) = P_{\theta_k}(Y_k, A).$$

For any measurable function $f$ on $Y$, let $P_\theta f$ denote the measurable function $y \mapsto \int P_\theta(y, d\bar y) f(\bar y)$. Also, for any vector $r$, let $|r|$ denote its Euclidean norm.

Assumption 3 (Existence and properties of solutions to the Poisson equation). For each $\theta$, there exist functions $\bar h(\theta) \in \mathbb{R}^m$, $\bar G(\theta) \in \mathbb{R}^{m \times m}$, $\hat h_\theta : Y \to \mathbb{R}^m$, and $\hat G_\theta : Y \to \mathbb{R}^{m \times m}$ that satisfy the following:

1. For each $y \in Y$,
$$\hat h_\theta(y) = h_\theta(y) - \bar h(\theta) + (P_\theta \hat h_\theta)(y),$$
$$\hat G_\theta(y) = G_\theta(y) - \bar G(\theta) + (P_\theta \hat G_\theta)(y).$$
2. For some constant $C$ and for all $\theta$, we have
$$\max\bigl(|\bar h(\theta)|, |\bar G(\theta)|\bigr) \le C.$$
3. For any $d > 0$, there exists $C_d > 0$ such that
$$\sup_k E\bigl[|f_{\theta_k}(Y_k)|^d\bigr] \le C_d,$$
where $f_\theta$ represents any of the functions $\hat h_\theta$, $h_\theta$, $\hat G_\theta$, $G_\theta$.
4. For some constant $C > 0$ and for all $\theta, \bar\theta \in \mathbb{R}^n$,
$$\max\bigl(|\bar h(\theta) - \bar h(\bar\theta)|, |\bar G(\theta) - \bar G(\bar\theta)|\bigr) \le C |\theta - \bar\theta|.$$
5. There exists a positive measurable function $C(\cdot)$ on $Y$ such that, for each $d > 0$,
$$\sup_k E\bigl[C(Y_k)^d\bigr] < \infty,$$
and
$$|(P_\theta f_\theta)(y) - (P_{\bar\theta} f_{\bar\theta})(y)| \le C(y) |\theta - \bar\theta|,$$
where $f_\theta$ is any of the functions $\hat h_\theta$ and $\hat G_\theta$.

Assumption 4 (Slowly changing environment). The random process $\{H_k\}$ satisfies $\sup_k E[|H_k|^d] < \infty$ for all $d > 0$. Furthermore, the sequence $\{\beta_k\}$ is deterministic and satisfies
$$\sum_k \left( \frac{\beta_k}{\gamma_k} \right)^d < \infty$$
for some $d > 0$.

Assumption 5. The sequence $\{\xi_k\}$ is an $m \times m$-matrix-valued $\mathcal{F}_k$-martingale difference with bounded moments, i.e.,
$$E[\xi_{k+1} \mid \mathcal{F}_k] = 0, \qquad \sup_k E[|\xi_k|^d] < \infty \quad \forall d > 0.$$

Assumption 6 (Uniform positive definiteness). There exists some $a > 0$ such that, for all $r \in \mathbb{R}^m$ and $\theta \in \mathbb{R}^n$,
$$r' \bar G(\theta) r \ge a |r|^2.$$

Our main result is the following.

Theorem 7. If Assumptions 1–6 are satisfied, then
$$\lim_k \bigl| r_k - \bar G(\theta_k)^{-1} \bar h(\theta_k) \bigr| = 0, \quad \text{w.p. 1}.$$

When $\theta_k$ is held fixed at some value $\theta$ for all $k$, our result states that $r_k$ converges to $\bar G(\theta)^{-1} \bar h(\theta)$, which is a special case of Theorem 17 on p. 239 of [1].
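As a sanity check on the statement of Theorem 7, the following self-contained sketch runs iteration (1) with synthetic data. It simplifies the setting of the paper: i.i.d. noise stands in for the Markov fluctuations of $h_{\theta_k}(Y_{k+1})$ and $G_{\theta_k}(Y_{k+1})$, the maps `h_bar` and `G_bar` are hypothetical Lipschitz choices satisfying Assumption 6, and the step-size exponents are one choice satisfying Assumptions 1 and 4.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3

def h_bar(theta):
    # hypothetical steady-state mean of h_theta(Y_k)
    return np.array([np.sin(theta), np.cos(theta), np.tanh(theta)])

def G_bar(theta):
    # hypothetical mean of G_theta(Y_k); symmetric part dominates a*I (Assumption 6)
    A = np.array([[2.0, 0.3, 0.0],
                  [0.3, 2.0, 0.1],
                  [0.0, 0.1, 2.0]])
    return A + 0.5 * np.tanh(theta) * np.eye(m)

r, theta = np.zeros(m), 0.0
for k in range(1, 200_000):
    gamma_k = 1.0 / k ** 0.6  # Assumption 1: sum gamma_k = inf, sum gamma_k^2 < inf
    beta_k = 1.0 / k          # Assumption 4: sum (beta_k/gamma_k)^d < inf for d > 2.5
    # i.i.d. noise standing in for the Markov fluctuations around the means
    h_k = h_bar(theta) + rng.normal(scale=0.5, size=m)
    G_k = G_bar(theta) + rng.normal(scale=0.1, size=(m, m))
    r = r + gamma_k * (h_k - G_k @ r)      # fast, decreasing-step iterate
    theta = theta + beta_k * rng.normal()  # slowly varying parameter (H_{k+1})

print(np.linalg.norm(r - np.linalg.solve(G_bar(theta), h_bar(theta))))
# the tracking error |r_k - G_bar(theta_k)^{-1} h_bar(theta_k)| should be small
```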

In fact, Assumptions 1 and 2 are the counterparts of Assumptions (A.1) and (A.2) of [1], and Assumption 3 is the counterpart of Assumptions (A.3) and (A.4) in [1]. Several sufficient conditions for this assumption are presented in [1] when the state space $Y$ of the process $Y_k$ is a subset of a Euclidean space. When $Y$ is Polish, these conditions can be generalized using the techniques of [11].

There are two differences between our assumptions and those in [1]. A minor difference is that [1] considers vector-valued processes $Y_k$, whereas we consider more general processes $Y_k$. Accordingly, the bounds in [1] are stated in terms of $|Y_k|$, whereas our bounds are stated in terms of some positive functions of $Y_k$. The second difference, which is the more significant one, is that $\theta_k$ is changing, albeit slowly. For this reason, we need to use a different proof technique. Our proof combines various techniques used in [1,3,4]. In the next section, we present an overview of the proof and the intuition behind it.

3. Overview of the proof

We note that the sequence $\hat s_k = \bar G(\theta_k) r_k - \bar h(\theta_k)$ satisfies the iteration
$$\hat s_{k+1} = \hat s_k - \gamma_k \bar G(\theta_{k+1}) \hat s_k + \gamma_k \bigl( \varepsilon^{(1)}_{k+1} + \varepsilon^{(2)}_{k+1} \bigr),$$
where
$$\varepsilon^{(1)}_{k+1} = \bar G(\theta_{k+1}) \bigl[ \bigl( h_{\theta_k}(Y_{k+1}) - \bar h(\theta_k) \bigr) - \bigl( G_{\theta_k}(Y_{k+1}) - \bar G(\theta_k) \bigr) r_k + \xi_{k+1} r_k \bigr],$$
$$\varepsilon^{(2)}_{k+1} = \frac{1}{\gamma_k} \bigl[ \bigl( \bar G(\theta_{k+1}) - \bar G(\theta_k) \bigr) r_k - \bigl( \bar h(\theta_{k+1}) - \bar h(\theta_k) \bigr) \bigr],$$
as can be verified by some simple algebra. Assumption 3 implies that the vector $h_\theta(Y_{k+1})$ and the matrix $G_\theta(Y_{k+1})$ have expected values $\bar h(\theta)$ and $\bar G(\theta)$, respectively, under the steady-state distribution of the time-homogeneous Markov chain $Y_k$ with transition kernel $P_\theta$. Therefore, we expect that the effect of the error term $\varepsilon^{(1)}_{k+1}$ can be averaged out in the long term. Similarly, since $\theta_k$ is changing very slowly with respect to the step-size $\gamma_k$, we expect that $\varepsilon^{(2)}_{k+1}$ goes to zero. The proof consists of showing that the terms $\varepsilon^{(i)}_{k+1}$, $i = 1, 2$, are inconsequential in the limit, and of observing that the sequence $\{\hat s_k\}$ converges to zero when these terms are absent. We formalize this intuition in the next two sections.

Note that the error terms are affine in $r_k$, and can therefore be very large if $r_k$ is large. A key step in the proof is to show boundedness of the iterates $r_k$, and this is the subject of the next section. The main result is then proved in the last section. The approach and techniques used here are inspired by more general techniques developed in [1,4].

4. Proof of boundedness

Note that the difference between two successive iterates at time $k$ is of the order of $\gamma_k$, which goes to zero as $k$ goes to infinity. Therefore, to study the asymptotic behavior of the sequence $\{r_k\}$, we need to focus on a subsequence $\{r_{k_j}\}$, where the sequence of non-negative integers $\{k_j\}$ is defined by
$$k_0 = 0, \qquad k_{j+1} = \min\Bigl\{ k > k_j : \sum_{l=k_j}^{k-1} \gamma_l \ge T \Bigr\}.$$
Here, $T$ is a positive constant that will be held fixed throughout the rest of the paper. The sequence $\{k_j\}$ is chosen so that any two successive elements are sufficiently far apart, resulting in a non-trivial difference between $r_{k_{j+1}}$ and $r_{k_j}$.
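The block construction is easy to compute explicitly; here is a small sketch (the helper name `block_boundaries` is ours) with the hypothetical step-sizes $\gamma_k = 1/(k+1)$ and $T = 1$:

```python
def block_boundaries(gammas, T):
    """k_0 = 0, k_{j+1} = min{k > k_j : sum_{l=k_j}^{k-1} gamma_l >= T}."""
    ks, acc = [0], 0.0
    for k, g in enumerate(gammas):
        acc += g
        if acc >= T:
            # block mass lies in [T, T + gamma_0) when gamma is non-increasing
            ks.append(k + 1)
            acc = 0.0
    return ks

print(block_boundaries([1.0 / (k + 1) for k in range(100)], 1.0))
# -> [0, 1, 4, 12, 34, 94]: blocks stretch out as the step-sizes decrease,
#    while the step-size mass of each block stays roughly equal to T
```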

To obtain a relation between $r_{k_{j+1}}$ and $r_{k_j}$, we define a sequence $\{\hat r_k\}$ by
$$\hat r_k = \frac{r_k}{\max(1, |r_{k_j}|)}, \qquad k_j \le k \le k_{j+1}.$$
Note that $|\hat r_{k_j}| \le 1$ for all $j$. Furthermore, $\hat r_k$ is $\mathcal{F}_k$-adapted and satisfies
$$\hat r_{k+1} = \hat r_k + \gamma_k \left( \frac{\bar h(\theta_k)}{\max(1, |r_{k_j}|)} - \bar G(\theta_k) \hat r_k \right) + \gamma_k \hat\varepsilon_{k+1}, \qquad k_j \le k < k_{j+1},$$
where, for $k_j \le k < k_{j+1}$, the term
$$\hat\varepsilon_{k+1} = \frac{h_{\theta_k}(Y_{k+1}) - \bar h(\theta_k)}{\max(1, |r_{k_j}|)} - \bigl( G_{\theta_k}(Y_{k+1}) - \bar G(\theta_k) \bigr) \hat r_k + \xi_{k+1} \hat r_k$$
can be viewed as perturbation noise. Similarly, for each $j$, define a sequence $\{\tilde r_k\}$ as follows:
$$\tilde r_{k_j} = \hat r_{k_j}, \qquad \tilde r_{k+1} = \tilde r_k + \gamma_k \left( \frac{\bar h(\theta_k)}{\max(1, |r_{k_j}|)} - \bar G(\theta_k) \tilde r_k \right), \qquad k_j \le k < k_{j+1}.$$
Note that the iteration satisfied by $\tilde r_k$ is the same as that of $\hat r_k$, except that the perturbation noise is not present in it. We will show that the perturbation noise is negligible, and that $\hat r_k$ tracks $\tilde r_k$, in the sense that
$$\lim_j \max_{k_j \le k \le k_{j+1}} |\hat r_k - \tilde r_k| = 0, \quad \text{w.p. 1}.$$

The next lemma provides bounds on the perturbation noise. It involves the stopping times $\tau_1^j(C)$, which are defined as follows: given some constant $C \ge 1$, let
$$\tau_1^j(C) = \min\{ k \ge k_j : |\hat r_k| \ge C \}.$$
The stopping time $\tau_1^j(C)$ is the first time the random sequence $\{\hat r_k\}$ exits a ball of radius $C$. We will often use the simpler notation $\tau_1$ if the value of $C$ is clear from the context. The following lemma derives bounds on the effect of the perturbation noise before time $\tau_1$.

Lemma 8. For any given $C > 0$, there exists a constant $C_1 > 0$ such that for all $j$, we have
$$E\left[ \max_{k_j \le k \le k_{j+1}} \Bigl| \sum_{l=k_j}^{(k \wedge \tau_1)-1} \gamma_l \hat\varepsilon_{l+1} \Bigr|^2 \right] \le C_1 \sum_{l=k_j}^{k_{j+1}-1} \gamma_l^2.$$

Proof. We only outline the proof, as this result is similar to Proposition 7 in [1, p. 228]. The main difference is that this proposition considers the sum $\sum_{l=0}^{k-1} \gamma_l \hat\varepsilon_{l+1}$, whereas we are interested in the sum $\sum_{l=k_j}^{(k \wedge \tau_1)-1} \gamma_l \hat\varepsilon_{l+1}$. The difference in the initial limit of the summation does not affect the proof. However, the difference in the final limit of the summation could affect the proof, because $\hat r_k I\{\tau_1 = k\}$ is not bounded. However, we note that the proof in [1] still goes through, as long as $\hat r_k I\{\tau_1 = k\}$ has bounded moments. To see that $\hat r_k I\{\tau_1 = k\}$ has bounded moments, we observe that it is affine in $\hat r_{k-1} I\{\tau_1 = k\}$, which is bounded, with the coefficients having bounded moments.

The outline of the rest of the proof is as follows. Consider a fixed $j$, and suppress the superscript $j$ to simplify notation. Note that the perturbation noise $\hat\varepsilon_{l+1}$ is of the form
$$F_{\theta_l}(\hat r_l, Y_{l+1}) - \bar F_{\theta_l}(\hat r_l) + \xi_{l+1} \hat r_l,$$
where $\bar F_\theta(r)$ is the steady-state expectation of $F_\theta(r, Y_l)$, and where $Y_l$ is a Markov chain with transition kernel $P_\theta$. Using Assumption 3, it is easy to see that for each $\theta, r$ there exists a solution $\hat F_\theta(r, y)$ to the Poisson equation
$$\hat F_\theta(r, y) = F_\theta(r, y) - \bar F_\theta(r) + \bigl(P_\theta \hat F_\theta(r, \cdot)\bigr)(y).$$
The perturbation noise can be expressed in terms of $\hat F$ as follows:
$$\hat\varepsilon_{l+1} = \xi_{l+1} \hat r_l + F_{\theta_l}(\hat r_l, Y_{l+1}) - \bar F_{\theta_l}(\hat r_l)$$
$$= \xi_{l+1} \hat r_l + \hat F_{\theta_l}(\hat r_l, Y_{l+1}) - \bigl(P_{\theta_l} \hat F_{\theta_l}(\hat r_l, \cdot)\bigr)(Y_{l+1})$$
$$= \xi_{l+1} \hat r_l + \bigl[ \hat F_{\theta_l}(\hat r_l, Y_{l+1}) - \bigl(P_{\theta_l} \hat F_{\theta_l}(\hat r_l, \cdot)\bigr)(Y_l) \bigr]$$
$$\quad + \bigl[ \bigl(P_{\theta_{l-1}} \hat F_{\theta_{l-1}}(\hat r_{l-1}, \cdot)\bigr)(Y_l) - \bigl(P_{\theta_l} \hat F_{\theta_l}(\hat r_l, \cdot)\bigr)(Y_{l+1}) \bigr]$$
$$\quad + \bigl[ \bigl(P_{\theta_l} \hat F_{\theta_l}(\hat r_l, \cdot)\bigr)(Y_l) - \bigl(P_{\theta_l} \hat F_{\theta_l}(\hat r_{l-1}, \cdot)\bigr)(Y_l) \bigr]$$
$$\quad + \bigl[ \bigl(P_{\theta_l} \hat F_{\theta_l}(\hat r_{l-1}, \cdot)\bigr)(Y_l) - \bigl(P_{\theta_{l-1}} \hat F_{\theta_{l-1}}(\hat r_{l-1}, \cdot)\bigr)(Y_l) \bigr].$$
To prove the lemma, it is sufficient to show that the desired inequality holds when $\hat\varepsilon_l$ is replaced by each of the terms on the right-hand side of the above equation. Indeed, the first term is a martingale difference with bounded second moment, and the second term is the summand of a telescoping series. The last two terms are of the order of $O(|\hat r_l - \hat r_{l-1}|)$ and $O(|\theta_{l+1} - \theta_l|)$, respectively. Using these observations, it is easily shown that each of these terms satisfies the desired inequality. □
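In the finite-state case, the Poisson equation used in the proof above (and in Assumption 3) can be solved in closed form, which may help build intuition. Here is a sketch, with our own helper name, via the fundamental matrix $(I - P + \mathbf{1}\pi')^{-1}$:

```python
import numpy as np

def poisson_solution(P, f):
    """For an ergodic finite chain P and a function f on its states, return
    (f_bar, f_hat) with f_hat = f - f_bar + P f_hat and pi' f_hat = 0."""
    S = P.shape[0]
    # stationary distribution pi: left eigenvector of P for eigenvalue 1
    w, V = np.linalg.eig(P.T)
    pi = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    pi = pi / pi.sum()
    f_bar = pi @ f
    # fundamental-matrix form: f_hat = (I - P + 1 pi')^{-1} (f - f_bar)
    A = np.eye(S) - P + np.outer(np.ones(S), pi)
    f_hat = np.linalg.solve(A, f - f_bar)
    return f_bar, f_hat

P = np.array([[0.9, 0.1], [0.2, 0.8]])
f = np.array([1.0, -1.0])
f_bar, f_hat = poisson_solution(P, f)
print(np.allclose(f_hat, f - f_bar + P @ f_hat))  # True: Poisson equation holds
```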

Lemma 8 indicates that as long as $\hat r_k$ is bounded, the perturbation noise remains negligible. In the next lemma, we prove that the sequence $\{\hat r_k\}$ closely approximates $\tilde r_k$.

Lemma 9. $\lim_j \max_{k_j \le k \le k_{j+1}} |\hat r_k - \tilde r_k| = 0$, w.p. 1.

Proof. Since $\bar G(\cdot)$ is bounded, there exists a constant $D$ such that
$$|\hat r_{k+1} - \tilde r_{k+1}| \le D \sum_{l=k_j}^{k} \gamma_l |\hat r_l - \tilde r_l| + \Bigl| \sum_{l=k_j}^{k} \gamma_l \hat\varepsilon_{l+1} \Bigr|$$
for every $k_j \le k < k_{j+1}$. Using the discrete Gronwall inequality,¹ it is easy to see that
$$\max_{k_j \le k \le k_{j+1}} |\hat r_k - \tilde r_k| \le e^{D\bar T} \max_{k_j \le k < k_{j+1}} \Bigl| \sum_{l=k_j}^{k} \gamma_l \hat\varepsilon_{l+1} \Bigr|,$$
where $\bar T = T + \gamma_0$. Therefore, Lemma 8 and the Chebyshev inequality imply that
$$P\Bigl( \max_{k_j \le k \le k_{j+1}} |\hat r_{k \wedge \tau_1} - \tilde r_{k \wedge \tau_1}| \ge \delta \Bigr) \le \frac{C_1}{\delta^2} \sum_{l=k_j}^{k_{j+1}-1} \gamma_l^2$$
for some $C_1 > 0$ that depends on the constant $C$ in the definition of the stopping time $\tau_1(C)$.

Consider the stopping time $\tau_2$ defined by
$$\tau_2 = \min\{ k \ge k_j : |\hat r_k - \tilde r_k| \ge \delta \},$$
which is the first time that the sequence $\{\hat r_k\}$ exits from a tube of radius $\delta$ around the sequence $\{\tilde r_k\}$. To prove the lemma, we need bounds on $P(\tau_2 \le k_{j+1})$, whereas we only have bounds on $P(\tau_2 \wedge \tau_1 \le k_{j+1})$. Therefore, we need to relate the stopping times $\tau_1$ and $\tau_2$. To do this, note that
$$\sup_j \max_{k_j \le k \le k_{j+1}} |\tilde r_k| \le \bar C < \infty$$
for some constant $\bar C \ge 1$, because $\bar h(\cdot)$ and $\bar G(\cdot)$ are bounded, and the function $\bar G(\cdot)$ satisfies Assumption 6. At time $k = \tau_1(\bar C + \delta)$, we have $|\hat r_k| \ge \bar C + \delta$ and $|\tilde r_k| \le \bar C$, which implies that $|\hat r_k - \tilde r_k| \ge \delta$, i.e.,
$$\tau_2 \le \tau_1(\bar C + \delta).$$
Therefore,
$$P\Bigl( \max_{k_j \le k \le k_{j+1}} |\hat r_k - \tilde r_k| \ge \delta \Bigr) = P(\tau_2 \le k_{j+1}) = P\bigl( \tau_2 \wedge \tau_1(\bar C + \delta) \le k_{j+1} \bigr) \le \frac{C_1}{\delta^2} \sum_{l=k_j}^{k_{j+1}-1} \gamma_l^2.$$
The result follows from the summability of the series $\sum_k \gamma_k^2$ and the Borel–Cantelli lemma. □

¹ For a non-negative sequence $\{\gamma_k\}$ and a constant $B \ge 0$, let $\{b_k\}$ be a sequence satisfying $b_0 = 0$ and
$$b_{k+1} \le \sum_{l=0}^{k} \gamma_l b_l + B.$$
Then, for every $k$, we have
$$b_{k+1} \le B \exp\Bigl( \sum_{l=0}^{k} \gamma_l \Bigr).$$

Now we are ready to prove boundedness, using the following simple results.

Lemma 10. Suppose that $\rho \in [0, 1)$ and that $\{a_j\}$, $\{\delta_j\}$ are non-negative sequences that satisfy $a_{j+1} \le \rho a_j + \delta_j$.
1. If $\sup_j \delta_j < \infty$, then $\sup_j a_j < \infty$.
2. If $\delta_j \to 0$, then $a_j \to 0$.

Lemma 11. If an $m \times m$ matrix $G$ satisfies
$$r' G r \ge a |r|^2 \qquad \forall r \in \mathbb{R}^m,$$
then, for all sufficiently small $\gamma > 0$,
$$|(I - \gamma G) r| \le (1 - \gamma a/2) |r| \qquad \forall r \in \mathbb{R}^m.$$
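Lemma 11 admits a quick numeric sanity check. Under our own reading of "sufficiently small", namely $\gamma \le a/\|G\|_2^2$, the claim follows from $|(I - \gamma G)r|^2 \le (1 - 2\gamma a + \gamma^2 \|G\|_2^2)|r|^2 \le (1 - \gamma a)|r|^2 \le (1 - \gamma a/2)^2 |r|^2$. The snippet below checks this on a random non-symmetric matrix whose symmetric part dominates $aI$:

```python
import numpy as np

rng = np.random.default_rng(1)
a, m = 1.0, 4

# G = (symmetric positive definite part >= a*I) + (skew-symmetric part),
# so r'Gr >= a|r|^2 for every r, as in Lemma 11 / Assumption 6
B = rng.normal(size=(m, m))
K = rng.normal(size=(m, m))
G = B @ B.T + a * np.eye(m) + 0.5 * (K - K.T)

gamma = a / np.linalg.norm(G, 2) ** 2  # our stand-in for "sufficiently small"
lhs = np.linalg.norm(np.eye(m) - gamma * G, 2)
print(lhs <= 1 - gamma * a / 2)        # expect True
```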

Lemma 12. $\sup_k |r_k| < \infty$, w.p. 1.

Proof. Since $\bar h(\cdot)$ is bounded, Assumption 6 and Lemma 11 imply the following: there exists a constant $C$ such that, for all sufficiently large $j$ and $k_j \le k < k_{j+1}$,
$$|\tilde r_{k+1}| \le (1 - \gamma_k a/2) |\tilde r_k| + \frac{C \gamma_k}{\max(1, |r_{k_j}|)}. \tag{2}$$
Using the inequality $1 - x \le e^{-x}$, we have
$$|\tilde r_{k_{j+1}}| \le e^{-(a/2) \sum_{l=k_j}^{k_{j+1}-1} \gamma_l} |\tilde r_{k_j}| + \frac{C}{\max(1, |r_{k_j}|)} \sum_{l=k_j}^{k_{j+1}-1} \gamma_l.$$
This, along with Lemma 9, implies
$$\frac{|r_{k_{j+1}}|}{\max(1, |r_{k_j}|)} \le e^{-aT/2} \frac{|r_{k_j}|}{\max(1, |r_{k_j}|)} + \frac{C \bar T}{\max(1, |r_{k_j}|)} + \delta_j,$$
where $\delta_j \to 0$, w.p. 1. Multiplying both sides by $\max(1, |r_{k_j}|)$ and using the fact that it is less than $1 + |r_{k_j}|$, we have
$$|r_{k_{j+1}}| \le (e^{-aT/2} + \delta_j) |r_{k_j}| + C\bar T + \delta_j.$$
Since $e^{-aT/2} < 1$ and $\delta_j \to 0$, w.p. 1, it follows from Lemma 10(1) that
$$\sup_j |r_{k_j}| < \infty, \quad \text{w.p. 1}.$$
Recall that $\tilde r_{k_j} = \hat r_{k_j}$ is bounded. Then, using Eq. (2), it follows that
$$\sup_j \max_{k_j \le k \le k_{j+1}} |\tilde r_k| < \infty, \quad \text{w.p. 1}.$$
The boundedness of $r_k$ now follows from the observation
$$\sup_k |r_k| = \sup_j \max_{k_j \le k \le k_{j+1}} \max(1, |r_{k_j}|) |\hat r_k| \le \sup_j \Bigl\{ (1 + |r_{k_j}|) \Bigl( \max_{k_j \le k \le k_{j+1}} |\tilde r_k| + \max_{k_j \le k \le k_{j+1}} |\hat r_k - \tilde r_k| \Bigr) \Bigr\},$$
the boundedness of $|r_{k_j}|$ and $\tilde r_k$, and Lemma 9. □

5. Proof of Theorem 7

To prove Theorem 7, we consider the sequence $\hat s_k = \bar G(\theta_k) r_k - \bar h(\theta_k)$. As noted in Section 3, this sequence evolves according to the iteration
$$\hat s_{k+1} = \hat s_k - \gamma_k \bar G(\theta_{k+1}) \hat s_k + \gamma_k \bigl( \varepsilon^{(1)}_{k+1} + \varepsilon^{(2)}_{k+1} \bigr),$$
where
$$\varepsilon^{(1)}_{k+1} = \bar G(\theta_{k+1}) \bigl[ \bigl( h_{\theta_k}(Y_{k+1}) - \bar h(\theta_k) \bigr) - \bigl( G_{\theta_k}(Y_{k+1}) - \bar G(\theta_k) \bigr) r_k + \xi_{k+1} r_k \bigr],$$
$$\varepsilon^{(2)}_{k+1} = \frac{1}{\gamma_k} \bigl[ \bigl( \bar G(\theta_{k+1}) - \bar G(\theta_k) \bigr) r_k - \bigl( \bar h(\theta_{k+1}) - \bar h(\theta_k) \bigr) \bigr].$$

Lemma 13. $\sum_k \gamma_k \varepsilon^{(1)}_{k+1}$ converges w.p. 1.

Proof. The proof relies on Assumptions 1, 3, and 5, and is omitted because it is very similar to the proof of Lemma 2 in [2, p. 224]. □

Lemma 14. $\lim_k \varepsilon^{(2)}_k = 0$, w.p. 1.

Proof. Using Assumption 3(4) and the update equation for $\theta_k$, we have
$$|\varepsilon^{(2)}_{k+1}| \le C \frac{\beta_k}{\gamma_k} |H_{k+1}| \bigl( |r_k| + 1 \bigr).$$
Since $\{H_k\}$ has bounded moments, the second part of Assumption 4 yields
$$\sum_k E\left[ \Bigl( \frac{\beta_k}{\gamma_k} |H_{k+1}| \Bigr)^d \right] < \infty$$
for some $d > 0$. Therefore, $(\beta_k/\gamma_k) |H_{k+1}|$ converges to zero, w.p. 1. The result follows from the boundedness of $\{r_k\}$. □

Recall the notation $k_j$ from the previous section. For each $j$, define $\tilde s_k$, for $k_j \le k \le k_{j+1}$, as follows:
$$\tilde s_{k+1} = \bigl( I - \gamma_k \bar G(\theta_{k+1}) \bigr) \tilde s_k, \qquad \tilde s_{k_j} = \hat s_{k_j}.$$

Lemma 15. $\lim_j \max_{k_j \le k \le k_{j+1}} |\hat s_k - \tilde s_k| = 0$, w.p. 1.

Proof. Since $\bar G(\cdot)$ is bounded, there exists some $C$ such that, for every $j$ and $k_j \le k < k_{j+1}$,
$$|\hat s_{k+1} - \tilde s_{k+1}| \le C \sum_{l=k_j}^{k} \gamma_l |\hat s_l - \tilde s_l| + \Bigl| \sum_{l=k_j}^{k} \gamma_l \varepsilon^{(1)}_{l+1} \Bigr| + \sum_{l=k_j}^{k} \gamma_l |\varepsilon^{(2)}_{l+1}|.$$
Using the discrete Gronwall inequality, it can be seen that
$$\max_{k_j \le k \le k_{j+1}} |\hat s_k - \tilde s_k| \le e^{C\bar T} \max_{k_j \le k < k_{j+1}} \Bigl| \sum_{l=k_j}^{k} \gamma_l \varepsilon^{(1)}_{l+1} \Bigr| + e^{C\bar T} \bar T \sup_{k \ge k_j} |\varepsilon^{(2)}_{k+1}|.$$
The result follows from the previous two lemmas. □

We are now ready to complete the proof of Theorem 7. Using Assumption 6 and Lemma 11, we have
$$|\tilde s_{k_{j+1}}| \le e^{-aT/2} |\hat s_{k_j}|.$$
Therefore, Lemma 15 implies that
$$|\hat s_{k_{j+1}}| \le e^{-aT/2} |\hat s_{k_j}| + \delta_j,$$
where $\delta_j \to 0$ w.p. 1. Using Lemma 10(2), it follows that $\bar G(\theta_{k_j}) r_{k_j} - \bar h(\theta_{k_j}) = \hat s_{k_j}$ converges to zero. Using arguments similar to the closing arguments in the proof of Lemma 12, we finally conclude that
$$\lim_k \bigl| \bar G(\theta_k) r_k - \bar h(\theta_k) \bigr| = 0, \quad \text{w.p. 1}.$$
Since Assumption 6 implies $\|\bar G(\theta)^{-1}\| \le 1/a$ (because $|\bar G(\theta) r|\,|r| \ge r' \bar G(\theta) r \ge a |r|^2$), this yields $\lim_k |r_k - \bar G(\theta_k)^{-1} \bar h(\theta_k)| = 0$, w.p. 1, which is the statement of Theorem 7. □

Acknowledgements

This research was partially supported by the National Science Foundation and by the AFOSR.

References

[1] A. Benveniste, M. Métivier, P. Priouret, Adaptive Algorithms and Stochastic Approximations, Springer, Berlin, 1990.
[2] D.P. Bertsekas, J.N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.
[3] V.S. Borkar, Stochastic approximation with two time scales, Systems Control Lett. 29 (1997) 291–294.
[4] V.S. Borkar, S.P. Meyn, The o.d.e. method for convergence of stochastic approximation and reinforcement learning, SIAM J. Control Optim. 38 (2000) 447–469.
[5] M. Duflo, Random Iterative Models, Springer, New York, 1997.
[6] E. Eweda, O. Macchi, Convergence of an adaptive linear estimation algorithm, IEEE Trans. Automat. Control AC-29 (1984).
[7] V.R. Konda, Actor-Critic Algorithms, Ph.D. Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 2002.
[8] V.R. Konda, J.N. Tsitsiklis, On actor-critic algorithms, SIAM J. Control Optim., to appear.
[9] H.J. Kushner, Asymptotic behavior of stochastic approximation and large deviations, IEEE Trans. Automat. Control AC-29 (1984).
[10] H.J. Kushner, G.G. Yin, Stochastic Approximation Algorithms and Applications, Springer, New York, 1997.
[11] S.P. Meyn, R.L. Tweedie, Markov Chains and Stochastic Stability, Springer, London, 1993.
[12] R.S. Sutton, Learning to predict by the methods of temporal differences, Mach. Learning 3 (1988) 9–44.
[13] R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
[14] B. Widrow, J. McCool, M.G. Larimore, C.R. Johnson, Stationary and non-stationary learning characteristics of the LMS adaptive filter, Proc. IEEE 64 (1976) 1151–1162.
