Quasi Stochastic Approximation American Control Conference San Francisco, June 2011

1 Quasi Stochastic Approximation. American Control Conference, San Francisco, June 2011. Sean P. Meyn, joint work with Darshan Shirodkar and Prashant Mehta. Coordinated Science Laboratory and the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, USA. Thanks to NSF & AFOSR.

2 Outline: 1. Background: Stochastic Approximation. 2. Background: Q-learning. 3. Quasi-Stochastic Approximation. 4. Conclusions.

3 Background: Stochastic Approximation. Robbins and Monro stochastic approximation. Setting: solve the equation h̄(ϑ) = 0, where h̄(ϑ) = E[h(ϑ, ζ)] and ζ is random. Robbins and Monro [5]: fixed-point iteration with noisy measurements, ϑ_{n+1} = ϑ_n + a_n h(ϑ_n, ζ_n), n ≥ 0, with ϑ_0 ∈ R^d given. Typical assumptions for convergence: step size a_n = (1 + n)^{-1}; the random sequence {ζ_n} is i.i.d., with each ζ_n distributed as ζ; global stability of the ODE (d/dt)θ = h̄(θ). Excellent recent reference: Borkar 2008 [2].

5 Background: Stochastic Approximation. Variance: linearization of stochastic approximation. Assuming convergence, write the error recursion as ϑ̃_{n+1} ≈ ϑ̃_n + a_n [ A ϑ̃_n + Z_n ], with ϑ̃_n = ϑ_n − ϑ*, Z_n = h(ϑ*, ζ_n) (zero mean), and A = ∂h̄(ϑ*). Central Limit Theorem: √n ϑ̃_n → N(0, Σ_ϑ) in distribution. This holds under mild conditions; the asymptotic covariance solves a Lyapunov equation, which requires Re(λ(A)) < −1/2 for every eigenvalue: (A + ½ I) Σ_ϑ + Σ_ϑ (A + ½ I)^T + Σ_Z = 0.
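A minimal numerical sketch of this covariance computation, assuming an illustrative pair (A, Σ_Z) that is not taken from the talk; the Lyapunov equation (A + ½I)Σ_ϑ + Σ_ϑ(A + ½I)^T + Σ_Z = 0 is solved by vectorization with Kronecker products.

import numpy as np

# Illustrative data (assumptions of this sketch); A must satisfy Re(lambda(A)) < -1/2.
A = np.array([[-2.0, 1.0],
              [0.0, -1.5]])
Sigma_Z = np.array([[1.0, 0.2],
                    [0.2, 0.5]])

n = A.shape[0]
A_half = A + 0.5 * np.eye(n)

# Column-major vectorization: vec(M S) = (I kron M) vec(S), vec(S M^T) = (M kron I) vec(S),
# so (A + I/2) S + S (A + I/2)^T = -Sigma_Z becomes a linear system in vec(S).
K = np.kron(np.eye(n), A_half) + np.kron(A_half, np.eye(n))
vec_S = np.linalg.solve(K, -Sigma_Z.flatten(order="F"))
Sigma_theta = vec_S.reshape((n, n), order="F")

# Residual should be numerically zero.
residual = A_half @ Sigma_theta + Sigma_theta @ A_half.T + Sigma_Z
print("Sigma_theta =\n", Sigma_theta)
print("residual norm =", np.linalg.norm(residual))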

8 Background: Stochastic Approximation. Stochastic Newton-Raphson: a corollary to the CLT. Question: what is the optimal matrix gain Γ_n in ϑ_{n+1} = ϑ_n + a_n Γ_n h(ϑ_n, ζ_n), n ≥ 0, ϑ_0 ∈ R^d given? Answer: stochastic Newton-Raphson, so that Γ_n → −A^{-1}. Proof: with this gain the linearization reduces to standard Monte Carlo averaging.
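A minimal sketch of the gain comparison on a scalar linear model h(ϑ, ζ) = A(ϑ − ϑ*) + ζ; the numerical values below are assumptions of the sketch, with A = −0.3 chosen so that the unit gain converges slowly while the Newton gain −A^{-1} recovers the Monte Carlo rate.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative scalar model (assumed): h(theta, zeta) = A*(theta - theta_star) + zeta.
A, theta_star, noise_std = -0.3, 2.0, 1.0
N, repeats = 20_000, 10

def run_sa(gain):
    # theta_{n+1} = theta_n + a_n * gain * h(theta_n, zeta_n),  with a_n = 1/(1+n)
    theta = 0.0
    for n in range(N):
        zeta = noise_std * rng.standard_normal()
        theta += (1.0 / (1.0 + n)) * gain * (A * (theta - theta_star) + zeta)
    return theta

for label, gain in [("unit gain       ", 1.0), ("Newton gain -1/A", -1.0 / A)]:
    errors = [run_sa(gain) - theta_star for _ in range(repeats)]
    print(label, " RMS error:", float(np.sqrt(np.mean(np.square(errors)))))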

11 Background: Stochastic Approximation. Example: root finding with h(ϑ, ζ) = 1 − tan(ϑ) + ζ, where ζ is a normal random variable, N(0, 9). Then h̄(ϑ) = 1 − tan(ϑ), so ϑ* = π/4. Estimates of ϑ* are computed using both an i.i.d. and a deterministic sequence in place of ζ: (i) ζ_n i.i.d. Gaussian; (ii) ζ deterministic, of the form Φ^{-1}(·). [Figure: estimate trajectories for the i.i.d. and deterministic sequences.] Analysis of the deterministic case requires a theory of quasi-stochastic approximation...
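A minimal sketch of the comparison; the deterministic sequence is taken here to be Φ^{-1} applied to an equidistributed Weyl sequence, which is an assumption of the sketch (the talk leaves the argument of Φ^{-1} unspecified), and a projection step is added to keep tan(ϑ) well defined.

import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
N, sigma = 50_000, 3.0                      # zeta ~ N(0, 9) in the example
inv_cdf = NormalDist(mu=0.0, sigma=sigma).inv_cdf

def robbins_monro(zetas, theta0=1.0):
    # theta_{n+1} = theta_n + a_n * (1 - tan(theta_n) + zeta_n),  a_n = 1/(1+n)
    theta = theta0
    for n, z in enumerate(zetas):
        theta += (1.0 / (1.0 + n)) * (1.0 - np.tan(theta) + z)
        theta = min(max(theta, -1.4), 1.4)  # projection (a safeguard added in this sketch)
    return theta

# (i) i.i.d. Gaussian noise
zeta_iid = sigma * rng.standard_normal(N)

# (ii) deterministic "quasi-noise": Phi^{-1} of an equidistributed Weyl sequence (assumed form)
golden = (np.sqrt(5.0) - 1.0) / 2.0
u = ((np.arange(N) + 0.5) * golden) % 1.0
zeta_det = np.array([inv_cdf(ui) for ui in u])

print("target theta* = pi/4   =", np.pi / 4)
print("i.i.d. estimate        =", robbins_monro(zeta_iid))
print("deterministic estimate =", robbins_monro(zeta_det))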

14 Background: Q-learning (M&M 2009). System: ẋ = f(x, u); cost: c(x, u). HJB equation for the discounted-cost optimal control problem: min_u { c(x, u) + f(x, u) · ∇h(x) } = γ h(x), where the term inside the braces defines the Q-function, Q(x, u) = c(x, u) + f(x, u) · ∇h(x). Fixed-point equation for the Q-function: writing Q̲(x) := min_u Q(x, u) = γ h(x), we obtain Q(x, u) = c(x, u) + f(x, u) · ∇h(x) = c(x, u) + γ^{-1} f(x, u) · ∇Q̲(x). Parameterized family of approximations {Q^ϑ(x, u) : ϑ ∈ R^d}, with Bellman error E^ϑ(x, u) = γ(Q^ϑ(x, u) − c(x, u)) − f(x, u) · ∇Q̲^ϑ(x). [Watkins & Dayan, Q-learning, 1992; Bertsekas & Tsitsiklis, NDP, 1996; Mehta & Meyn, CDC 2009.]

19 Background: Q-learning (M&M 2009). System: ẋ = f(x, u); cost: c(x, u). Parameterization {Q^ϑ(x, u) : ϑ ∈ R^d}; Bellman error E^ϑ(x, u) = γ(Q^ϑ(x, u) − c(x, u)) − f(x, u) · ∇Q̲^ϑ(x). Model-free form: along a state-input trajectory, E^ϑ(x(t), u(t)) = γ(Q^ϑ(x(t), u(t)) − c(x(t), u(t))) − (d/dt) Q̲^ϑ(x(t)), so the error can be evaluated without knowledge of f. Q-learning of M&M: find the zeros of h̄(ϑ), the gradient with respect to ϑ of the mean-square Bellman error E[(E^ϑ(x, u))²], where the expectation is taken with ζ = (x, u) in its ergodic steady state. Choose the input as a stable feedback plus a mixture of sinusoids, u(t) = k(x(t)) + ω(t). [Watkins & Dayan, Q-learning, 1992; Bertsekas & Tsitsiklis, NDP, 1996; Mehta & Meyn, CDC 2009.]

22 Background: Q-learning. Example: Q-learning for ẋ = −x³ + u, c(x, u) = ½(x² + u²). HJB equation: min_u [ c(x, u) + (−x³ + u) ∇h(x) ] = γ h(x). Basis: Q^ϑ(x, u) = c(x, u) + ϑ_1 x² + ϑ_2 xu. Control: u(t) = A[sin(t) + sin(πt) + sin(et)]. [Figure: estimated policies compared with the optimal policy, for a low-amplitude and a high-amplitude input.] Analysis requires a theory of quasi-stochastic approximation...
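A minimal sketch of this example run as a quasi-stochastic approximation, assuming the two-parameter basis written above and illustrative values for the discount factor, probing amplitude, step size, and horizon (none of these constants come from the talk). For simplicity the Bellman error is evaluated in its model-based form, with f known; the talk emphasizes that the model-free form can be used instead.

import numpy as np

# Illustrative constants (assumptions of this sketch).
gamma, A_amp, dt, T = 0.5, 1.0, 0.005, 1000.0

def f(x, u):                       # dynamics  dx/dt = -x^3 + u
    return -x**3 + u

# Q^theta(x, u) = c(x, u) + theta1*x^2 + theta2*x*u.  Minimizing over u gives u = -theta2*x,
# hence Qbar^theta(x) = (1/2 + theta1 - theta2^2/2)*x^2 and dQbar/dx = (1 + 2*theta1 - theta2^2)*x.
def bellman_error_and_grad(theta, x, u):
    t1, t2 = theta
    dQbar_dx = (1.0 + 2.0 * t1 - t2**2) * x
    err = gamma * (t1 * x**2 + t2 * x * u) - f(x, u) * dQbar_dx
    grad = np.array([gamma * x**2 - 2.0 * x * f(x, u),
                     gamma * x * u + 2.0 * t2 * x * f(x, u)])
    return err, grad

theta = np.zeros(2)
x, t = 0.0, 0.0
for _ in range(int(T / dt)):
    u = A_amp * (np.sin(t) + np.sin(np.pi * t) + np.sin(np.e * t))  # probing input from the slide
    err, grad = bellman_error_and_grad(theta, x, u)
    theta -= dt * (1.0 / (1.0 + t)) * err * grad   # QSA gradient flow on (1/2)*BellmanError^2
    x += dt * f(x, u)                              # Euler step of the state
    t += dt

print("theta =", theta, "  estimated feedback gain (u = -theta2 * x):", theta[1])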

27 Quasi-Stochastic Approximation: a continuous-time, deterministic version of SA, (d/dt) ϑ(t) = a(t) h(ϑ(t), ζ(t)). Assumptions. Ergodicity: ζ satisfies h̄(θ) = lim_{T→∞} (1/T) ∫_0^T h(θ, ζ(t)) dt, for all θ ∈ R^d. Decreasing gain: a(t) ↓ 0, with ∫_0^∞ a(t) dt = ∞ and ∫_0^∞ a(t)² dt < ∞; usually, take a(t) = (1 + t)^{-1}. Stable ODE: h̄ is globally Lipschitz, and (d/dt) ϑ = h̄(ϑ) is globally asymptotically stable, with a globally Lipschitz Lyapunov function.

30 Quasi-Stochastic Approximation: stability & convergence. Theorem: under the ergodicity, decreasing-gain, and stable-ODE assumptions above, ϑ(t) → ϑ* for any initial condition.

31 Quasi-Stochastic Approximation: variance, with a(t) = 1/(1 + t). Suppose the model is linear, h(θ, ζ) = Aθ + ζ, and each eigenvalue λ(A) satisfies Re(λ) < −1. Suppose also that ζ has zero mean: lim_{T→∞} (1/T) ∫_0^T ζ(t) dt = 0, at rate 1/T. Under these assumptions the rate of convergence is t^{-1}, and not t^{-1/2}. Theorem: for some constant σ̄ < ∞, limsup_{t→∞} t ‖ϑ(t) − ϑ*‖ ≤ σ̄.
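A minimal sketch of this rate in the scalar linear case; the values A = −2 (so Re(λ) < −1), ζ(t) = sin(t) + sin(πt), and the Euler step size are assumptions of the sketch. The quantity t·|ϑ(t) − ϑ*| is printed at several times and should remain bounded.

import math

# Illustrative linear model (assumed): h(theta, zeta) = A*theta + zeta(t), so theta* = 0.
A = -2.0
zeta = lambda t: math.sin(t) + math.sin(math.pi * t)   # zero time-average, at rate 1/T

dt, T = 0.01, 10_000.0
theta, t = 1.0, 0.0
checkpoints, results = [10.0, 100.0, 1_000.0, 9_999.0], []
for _ in range(int(T / dt)):
    theta += dt * (1.0 / (1.0 + t)) * (A * theta + zeta(t))   # QSA with a(t) = 1/(1+t)
    t += dt
    if checkpoints and t >= checkpoints[0]:
        results.append((checkpoints.pop(0), t * abs(theta)))

for tc, val in results:
    print("t = %7.0f    t * |theta(t) - theta*| = %.4f" % (tc, val))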

34 Quasi-Stochastic Approximation: Polyak and Juditsky. Polyak and Juditsky [4, 2] obtain the optimal variance for SA using a very simple approach: high gain, and averaging. Deterministic P&J: choose a high-gain step size a(t) = 1/(1 + t)^δ, with δ ∈ (0, 1): (d/dt) γ(t) = (1 + t)^{-δ} h(γ(t), ζ(t)). The output of this algorithm is then averaged: (d/dt) ϑ(t) = (1 + t)^{-1} ( −ϑ(t) + γ(t) ). Linear convergence under stability alone: (1) there is a finite constant σ̄ satisfying limsup_{t→∞} t ‖ϑ(t) − ϑ*‖ ≤ σ̄; (2) Question: is this σ̄ minimal?
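A minimal sketch of the deterministic Polyak-Juditsky scheme, again on the scalar linear model; here A = −0.3 is an assumption, chosen so that A is stable but does not satisfy Re(λ) < −1, and δ = 0.6. The averaged output ϑ(t) should still exhibit the 1/t rate.

import math

# Illustrative linear model (assumed): h(theta, zeta) = A*theta + zeta(t), so theta* = 0.
A, delta = -0.3, 0.6
zeta = lambda t: math.sin(t) + math.sin(math.pi * t)

dt, T = 0.01, 10_000.0
gamma_t, theta, t = 1.0, 1.0, 0.0
for _ in range(int(T / dt)):
    # High-gain stage:  d/dt gamma = (1+t)^(-delta) * h(gamma, zeta(t))
    gamma_t += dt * (A * gamma_t + zeta(t)) / (1.0 + t) ** delta
    # Averaging stage:  d/dt theta = ( -theta + gamma ) / (1+t)
    theta += dt * (-theta + gamma_t) / (1.0 + t)
    t += dt

print("high-gain estimate error |gamma(T)|:", abs(gamma_t))
print("averaged estimate error  |theta(T)|:", abs(theta), "   T * error:", T * abs(theta))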

39 Conclusions. Summary: QSA is the most natural approach to approximate dynamic programming for deterministic systems, such as Q-learning [3]. Stability is established using standard techniques; linear convergence is obtained under mild stability assumptions. Current research: (1) Open question: can we extend the approach of Polyak and Juditsky to obtain the optimal rate of convergence? First we must answer: what is the optimal value of σ̄? (2) Concentration on applications: further development of Q-learning, and applications to nonlinear filtering.

43 References
[1] D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, Cambridge, Mass., 1996.
[2] V. S. Borkar. Stochastic Approximation: A Dynamical Systems Viewpoint. Hindustan Book Agency and Cambridge University Press (jointly), Delhi, India and Cambridge, UK, 2008.
[3] P. G. Mehta and S. P. Meyn. Q-learning and Pontryagin's minimum principle. In Proc. of the 48th IEEE Conf. on Decision and Control, held jointly with the 28th Chinese Control Conference, Dec. 2009.
[4] B. T. Polyak and A. B. Juditsky. Acceleration of stochastic approximation by averaging. SIAM J. Control Optim., 30(4):838-855, 1992.
[5] H. Robbins and S. Monro. A stochastic approximation method. Annals of Mathematical Statistics, 22:400-407, 1951.
[6] C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3-4):279-292, 1992.
