TD Learning. Sean Meyn. Department of Electrical and Computer Engineering University of Illinois & the Coordinated Science Laboratory


1 TD Learning. Sean Meyn, Department of Electrical and Computer Engineering, University of Illinois & the Coordinated Science Laboratory. Joint work with I.-K. Cho (Illinois), S. Mannor (McGill), V. Tadic (Sheffield), S. Henderson (Cornell). NSF support: ECS and DARPA ITMANET.

2 Recollections. [Slide shows artwork: A. Rybko, 2006]

3 Recollections: Value Function Solidarity. Relative value function: $h(x) = \int_0^\infty E[c(Q(t;x)) - \eta]\,dt$, with $\eta = \int c(x)\,\pi(dx)$. Fluid value function: $J(x) = \int_0^\infty c(q(t;x))\,dt$. Solidarity: $\lim_{\|x\|\to\infty} J(x)/h(x) = 1$. [Artwork: A. Rybko, 2006]

4 Recollections: Lyapunov Functions, and Policy Translation. Relative value function: $h(x) = \int_0^\infty E[c(Q(t;x)) - \eta]\,dt$, with $\eta = \int c(x)\,\pi(dx)$. Fluid value function: $J(x) = \int_0^\infty c(q(t;x))\,dt$, with $\lim_{\|x\|\to\infty} J(x)/h(x) = 1$. Drift condition: $PV\,(x) := E[V(Q(k+1)) \mid Q(k) = x] \le V(x) - \varepsilon\|x\| + \eta$. [Figure: level sets of the value function in the $(x_1, x_2)$ plane for a two-station network with service rates $\mu_1, \mu_2$ and arrival rate $\alpha_1$]

5 Outline. Value function approximation; Performance evaluation; Performance improvement; Simulation; TD learning; Numerics; Conclusions.

6 Outline. Value function approximation; Performance evaluation; Performance improvement; Simulation; TD learning; Numerics; Conclusions.

7 Average cost. Ergodic Markov chain $X = \{X(t) : t = 0, 1, 2, \dots\}$ on a countable state space $\mathsf{X}$. Transition matrix $P(x,y)$; invariant probability measure $\pi$, $\pi P = \pi$; cost function $c \colon \mathsf{X} \to \mathbb{R}_+$ (near monotone). Finite average cost: $\eta = \pi(c) = \lim_{n\to\infty} \frac{1}{n} \sum_{t=0}^{n-1} c(X(t))$.
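This setup can be made concrete on a small chain. A minimal sketch (not from the talk; the truncated birth-death model, load, and cost are illustrative assumptions): build $P$ for a discrete-time single-server queue, solve $\pi P = \pi$, and evaluate $\eta = \pi(c)$.

```python
import numpy as np

# A truncated birth-death chain standing in for the single-server queue:
# arrival w.p. rho/(1+rho), departure w.p. 1/(1+rho) (illustrative model).
N, rho = 50, 0.8
P = np.zeros((N, N))
for x in range(N):
    P[x, min(x + 1, N - 1)] += rho / (1 + rho)   # arrival
    P[x, max(x - 1, 0)] += 1 / (1 + rho)         # departure, or idle at x = 0

# Invariant measure: the left eigenvector of P for eigenvalue 1, normalized
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmin(np.abs(w - 1))])
pi /= pi.sum()

c = np.arange(N, dtype=float)   # cost c(x) = x, the queue length
eta = pi @ c                    # average cost eta = pi(c)
print(f"eta = {eta:.4f}")       # close to rho/(1-rho) for large N
```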

8 Value functions. Discounted: $h(x) := E\big[\sum_{t=0}^{\infty} (1+\gamma)^{-t-1} c(X(t))\big]$. Relative: $h(x) := E\big[\sum_{t=0}^{\tau_0} (c(X(t)) - \eta)\big]$, with $X(0) = x \in \mathsf{X}$.

9 Value functions. Discounted: $h(x) := E\big[\sum_{t=0}^{\infty} (1+\gamma)^{-t-1} c(X(t))\big]$. Relative: $h(x) := E\big[\sum_{t=0}^{\tau_0} (c(X(t)) - \eta)\big]$, with $X(0) = x \in \mathsf{X}$. Dynamic programming equations: discounted, $[\gamma I - D]\,h = c$ with $D = P - I$; relative, $Dh = -c + \eta$.

10 Value function approximation. Discounted-cost value function: $h(x) = E\big[\sum_{t=0}^{\infty} (1+\gamma)^{-t-1} c(X(t))\big]$. Network model with $c(x)$ a norm on $\mathsf{X}$. Under general conditions, $h$ is approximated by the scaled cost: $\lim_{r\to\infty} \frac{1}{r}\, h(rx) = \frac{1}{\gamma}\, c(x)$.

11 Value function approximation. Relative value function: $h(x) = E\big[\sum_{t=0}^{\tau_0} (c(X(t)) - \eta)\big]$. $h$ solves Poisson's equation, $Dh = -c + \eta$.

12 Value function approximation. Value function approximated by the fluid model: relative value function $h(x) = E\big[\sum_{t=0}^{\tau_0} (c(X(t)) - \eta)\big]$; fluid value function $J(x) = \int_0^\infty c(q(t))\,dt$. Under general conditions, $\lim_{r\to\infty} \frac{1}{r^2}\, h(rx) = J(x)$.

13 Value function approximation. Value function approximated by the fluid model: $h(x) = E\big[\sum_{t=0}^{\tau_0} (c(X(t)) - \eta)\big]$, $J(x) = \int_0^\infty c(q(t))\,dt$. [Figure: workload model, level sets of $h$; the infeasible region R and the level set $\{h(x) = h_0\}$ in the workload plane]

14 Value function approximation: Simulation. Standard Monte Carlo: $\eta_k = k^{-1} \sum_{t=0}^{k-1} c(X(t))$. CLT variance: $\sigma^2 = \pi\big(h^2 - (Ph)^2\big)$, where $h$ solves Poisson's equation $Dh = -c + \eta$.

15 Value function approximation: Simulation. Standard Monte Carlo: $\eta_k = k^{-1} \sum_{t=0}^{k-1} c(X(t))$. CLT variance: $\sigma^2 = \pi\big(h^2 - (Ph)^2\big)$, where $h$ solves Poisson's equation $Dh = -c + \eta$. Example, simulating the single queue: CLT variance $= O\big((1-\rho)^{-4}\big)$.
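A sketch of the point of this slide, under the assumed embedded-chain model of the single queue from the example above: the spread of the standard Monte Carlo estimator across independent runs grows sharply as $\rho \to 1$, reflecting the $O((1-\rho)^{-4})$ CLT variance.

```python
import numpy as np

rng = np.random.default_rng(0)

def eta_hat(rho, k):
    """Standard Monte Carlo: eta_k = k^{-1} sum_{t<k} Q(t)."""
    p = rho / (1 + rho)           # arrival probability in the embedded chain
    q, total = 0, 0.0
    for _ in range(k):
        total += q
        if rng.random() < p:
            q += 1                # arrival
        elif q > 0:
            q -= 1                # departure
    return total / k

for rho in (0.5, 0.8, 0.9):
    runs = [eta_hat(rho, 20_000) for _ in range(20)]
    # spread across runs grows rapidly with rho, as the CLT variance predicts
    print(f"rho={rho}: mean={np.mean(runs):.2f}, std={np.std(runs):.2f}")
```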

16 Value function approximation: Simulation. For the single queue, $\eta_n = n^{-1} \sum_{t=0}^{n-1} Q(t)$ and $\lim_{n\to\infty} \frac{1}{n} \log P\{\eta_n \ge r\} = 0$ for $r > \eta$: the overshoot probability admits no standard large-deviations bound. Example, simulating the single queue: CLT variance $= O\big((1-\rho)^{-4}\big)$.

17 Value function approximation: Simulation. Control variates: $g \colon \mathsf{X} \to \mathbb{R}$ has known mean $\pi(g) = 0$. Smoothed estimator: $\eta_k(\vartheta) = k^{-1} \sum_{t=0}^{k-1} \big(c(X(t)) - \vartheta\, g(X(t))\big)$, $k \ge 1$. Perfect estimator: $g = h - Ph = -Dh = c - \eta$ gives zero CLT variance.

18 Steady state customer population. Smoothed estimator using the fluid value function: $g = h - Ph = -Dh$ with $h = J$. [Figure: estimates from the standard estimator vs. the smoothed estimator]
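A minimal sketch of this smoothed estimator for the single queue, assuming the same embedded chain as above and $c(x) = x$. The fluid value function is $J(x) = x^2/(2\delta)$ with drift magnitude $\delta = (1-\rho)/(1+\rho)$; $g = J - PJ$ is computed exactly, so $\pi(g) = 0$ holds automatically and the estimator stays unbiased.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.9
p = rho / (1 + rho)               # arrival probability
delta = (1 - rho) / (1 + rho)     # magnitude of the negative drift

def J(x):                         # fluid value function for c(x) = x
    return x * x / (2 * delta)

def g(x):                         # g = J - PJ = -DJ; pi(g) = 0 since pi P = pi
    PJ = p * J(x + 1) + (1 - p) * J(max(x - 1, 0))
    return J(x) - PJ

def eta_hat(k, theta):
    """Smoothed estimator eta_k(theta); theta = 0 recovers standard MC."""
    q, total = 0, 0.0
    for _ in range(k):
        total += q - theta * g(q)
        q = q + 1 if rng.random() < p else max(q - 1, 0)
    return total / k

std = [eta_hat(20_000, 0.0) for _ in range(20)]
cv = [eta_hat(20_000, 1.0) for _ in range(20)]
print(f"standard std: {np.std(std):.3f}, smoothed std: {np.std(cv):.3f}")
```

For $x \ge 1$ one finds $c(x) - g(x) = 1/(2\delta)$, a constant, so almost all of the variance is removed.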

19 Outline. Value function approximation; Performance evaluation; Performance improvement; Simulation; TD learning; Numerics; Conclusions.

20 Value function approximation. Linear parametrization: $h^\theta(x) := \theta^{\mathsf{T}} \psi(x) = \sum_{i=1}^{\ell_h} \theta_i \psi_i(x)$, $x \in \mathsf{X}$. How to find the best parameter $\theta^*$? How to define best?

21 Value function approximation: Hilbert space setting. Linear parametrization: $h^\theta(x) := \theta^{\mathsf{T}} \psi(x) = \sum_{i=1}^{\ell_h} \theta_i \psi_i(x)$, $x \in \mathsf{X}$. Minimize the mean square error $\|h^\theta - h\|_\pi^2$.

22 Value function approximation: Hilbert space setting. Linear parametrization: $h^\theta(x) := \theta^{\mathsf{T}} \psi(x) = \sum_{i=1}^{\ell_h} \theta_i \psi_i(x)$, $x \in \mathsf{X}$. Minimize the mean square error: $\|h^\theta - h\|_\pi^2 = E_\pi\big[(h^\theta(X(t)) - h(X(t)))^2\big]$, with gradient $\nabla_\theta \|h^\theta - h\|_\pi^2 = 2\, E_\pi\big[(h^\theta(X(t)) - h(X(t)))\, \psi(X(t))\big]$. Setting the gradient to zero gives the optimal parameter.

23 Value function approximation: Hilbert space setting. Linear parametrization: $h^\theta(x) := \theta^{\mathsf{T}} \psi(x) = \sum_{i=1}^{\ell_h} \theta_i \psi_i(x)$, $x \in \mathsf{X}$. Setting the gradient of the mean square error to zero gives the optimal parameter: $\theta^* = \Sigma^{-1} b$, where $b = E_\pi[h(X)\psi(X)]$ and $\Sigma = E_\pi[\psi(X)\psi(X)^{\mathsf{T}}]$.
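On a finite chain $\theta^*$ can be computed exactly for reference: solve Poisson's equation for $h$ directly and form $\Sigma^{-1} b$. A sketch, with `P`, `pi`, `c`, `eta` as in the finite-chain example above and a user-supplied basis matrix `psi` (all names are illustrative):

```python
import numpy as np

def optimal_theta(P, pi, c, eta, psi):
    """theta* = Sigma^{-1} b; psi is an (N, ell) matrix of basis functions."""
    N = len(pi)
    # Solve Poisson's equation D h = -c + eta, pinned down by h(0) = 0
    D = P - np.eye(N)
    A = np.vstack([D, np.eye(1, N)])
    h = np.linalg.lstsq(A, np.append(-c + eta, 0.0), rcond=None)[0]
    Sigma = psi.T @ (pi[:, None] * psi)   # E_pi[psi psi^T]
    b = psi.T @ (pi * h)                  # E_pi[h psi]
    return np.linalg.solve(Sigma, b)

# e.g. a two-element polynomial basis on the chain built earlier:
# psi = np.column_stack([np.arange(N), np.arange(N) ** 2]).astype(float)
# theta_star = optimal_theta(P, pi, c, eta, psi)
```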

24 Value function approximation: Hilbert space setting. Define, for two functions $f, g$ on $\mathsf{X}$: $\langle f, g \rangle_\pi = \pi(fg)$.

25 Value function approximation: Hilbert space setting. Define, for two functions $f, g$ on $\mathsf{X}$: $\langle f, g \rangle_\pi = \pi(fg)$. $L_2$ norm: $\|f - g\|_\pi^2 := \langle f - g,\, f - g\rangle_\pi = E_\pi\big[(f(X(0)) - g(X(0)))^2\big]$.

26 Value function approximation: Hilbert space setting. Define, for two functions $f, g$ on $\mathsf{X}$: $\langle f, g \rangle_\pi = \pi(fg)$. $L_2$ norm: $\|f - g\|_\pi^2 := \langle f - g,\, f - g\rangle_\pi = E_\pi\big[(f(X(0)) - g(X(0)))^2\big]$. Resolvent: $R_\gamma = \sum_{t=0}^\infty (1+\gamma)^{-(t+1)} P^t$.

27 Value function approximation: Hilbert space setting. Define, for two functions $f, g$ on $\mathsf{X}$: $\langle f, g \rangle_\pi = \pi(fg)$. Resolvent: $R_\gamma = \sum_{t=0}^\infty (1+\gamma)^{-(t+1)} P^t$. Then $b = E_\pi[h(X)\psi(X)] = \langle R_\gamma c,\, \psi \rangle_\pi$.

28 Value function approximation: Hilbert space setting. Define, for two functions $f, g$ on $\mathsf{X}$: $\langle f, g \rangle_\pi = \pi(fg)$. Resolvent: $R_\gamma = \sum_{t=0}^\infty (1+\gamma)^{-(t+1)} P^t$. Adjoint: $\langle R_\gamma f,\, g\rangle_\pi = \langle f,\, R_\gamma^\dagger g \rangle_\pi$ for all $f, g \in L_2(\pi)$.
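For a finite chain both operators are explicit matrices, so the adjoint relation can be checked numerically. A sketch (assumed finite-state setting): $R_\gamma = [(1+\gamma)I - P]^{-1}$ and $R_\gamma^\dagger = \Pi^{-1} R_\gamma^{\mathsf{T}} \Pi$ with $\Pi = \mathrm{diag}(\pi)$.

```python
import numpy as np

def resolvent(P, gamma):
    """R_gamma = sum_t (1+gamma)^{-(t+1)} P^t = [(1+gamma) I - P]^{-1}."""
    return np.linalg.inv((1 + gamma) * np.eye(P.shape[0]) - P)

def adjoint(R, pi):
    """L2(pi) adjoint: R^dagger = diag(pi)^{-1} R^T diag(pi)."""
    return R.T * pi[None, :] / pi[:, None]

# Verification on any ergodic P with invariant pi (e.g. the chain above):
# R = resolvent(P, 0.1); Rd = adjoint(R, pi)
# f, g = np.random.rand(len(pi)), np.random.rand(len(pi))
# assert np.isclose(pi @ ((R @ f) * g), pi @ (f * (Rd @ g)))
```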

29 Representing the adjoint. For the stationary process: $\langle R_\gamma f,\, g\rangle_\pi = E\big[\sum_{t=0}^\infty (1+\gamma)^{-t-1}\, P^t f\,(X(0))\; g(X(0))\big]$ (discounted cost).

30 Representing the adjoint. For the stationary process: $\langle R_\gamma f,\, g\rangle_\pi = E\big[\sum_{t=0}^\infty (1+\gamma)^{-t-1}\, P^t f\,(X(0))\; g(X(0))\big]$. Smoothing: $E[P^t f\,(X(0))\, g(X(0))] = E[f(X(t))\, g(X(0))] = E[f(X(0))\, g(X(-t))]$.

31 Representing the adjoint. For the stationary process: $\langle R_\gamma f,\, g\rangle_\pi = E\big[\sum_{t=0}^\infty (1+\gamma)^{-t-1}\, P^t f\,(X(0))\; g(X(0))\big]$. Smoothing: $E[P^t f\,(X(0))\, g(X(0))] = E[f(X(t))\, g(X(0))] = E[f(X(0))\, g(X(-t))]$. Hence $\langle R_\gamma f,\, g\rangle_\pi = \sum_{t=0}^\infty (1+\gamma)^{-t-1} E[f(X(0))\, g(X(-t))]$.

32 Representing the adjoint. For the stationary process: $\langle R_\gamma f,\, g\rangle_\pi = \sum_{t=0}^\infty (1+\gamma)^{-t-1} E[f(X(0))\, g(X(-t))] = \langle f,\, R_\gamma^\dagger g\rangle_\pi$.

33 Representing the adjoint. The adjoint is defined in terms of the time-reversed process: $R_\gamma^\dagger g\,(x) = \sum_{t=0}^\infty (1+\gamma)^{-t-1}\, E[\,g(X(-t)) \mid X(0) = x\,]$. Causal, computable!

34 TD Learning for discounted cost. Causal, computable: $\tfrac{1}{2}\,\nabla_\theta \|h^\theta - h\|_\pi^2 = E_\pi[(h^\theta(X(t)) - h(X(t)))\,\psi(X(t))] = \langle h^\theta - R_\gamma c,\, \psi\rangle_\pi = \langle R_\gamma (R_\gamma^{-1} h^\theta - c),\, \psi \rangle_\pi = \langle R_\gamma^{-1} h^\theta - c,\, R_\gamma^\dagger \psi \rangle_\pi$.

35 TD Learning for discounted cost. Causal, computable: $\tfrac{1}{2}\,\nabla_\theta \|h^\theta - h\|_\pi^2 = E_\pi[(h^\theta(X(t)) - h(X(t)))\,\psi(X(t))] = \langle h^\theta - R_\gamma c,\, \psi\rangle_\pi = \langle R_\gamma (R_\gamma^{-1} h^\theta - c),\, \psi \rangle_\pi = \langle R_\gamma^{-1} h^\theta - c,\, R_\gamma^\dagger \psi \rangle_\pi = -\langle d,\, P\phi \rangle_\pi$ (notation on the following slides).

36 TD Learning for discounted cost. Causal, computable: $E[d(k)\,\phi(k+1)] \approx -\tfrac{1}{2}\,\nabla_\theta \|h^\theta - h\|_\pi^2\,\big|_{\theta = \theta(k)}$, where $\tfrac{1}{2}\nabla_\theta \|h^\theta - h\|_\pi^2 = \langle h^\theta - R_\gamma c,\, \psi\rangle_\pi = \langle R_\gamma^{-1} h^\theta - c,\, R_\gamma^\dagger \psi \rangle_\pi = -\langle d,\, P\phi\rangle_\pi$ (notation on the next slide).

37 TD Learning for discounted cost. Causal, computable: $E[d(k)\,\phi(k+1)] \approx -\tfrac{1}{2}\,\nabla_\theta \|h^\theta - h\|_\pi^2\,\big|_{\theta = \theta(k)}$. (i) Temporal differences: $d(k) = -\big[(1+\gamma)\, h^{\theta(k)}(X(k)) - h^{\theta(k)}(X(k+1)) - c(X(k))\big]$. (ii) Eligibility vectors, the sequence of $\ell_h$-dimensional vectors $\phi(k) = \sum_{t=0}^{k} (1+\gamma)^{-t-1}\, \psi(X(k-t))$, $k \ge 1$.

38 Algorithms. TD(1) algorithm: $\theta(k+1) - \theta(k) = a_k\, d(k)\, \phi(k+1)$, $k \ge 0$, where $a_k$ is the stochastic approximation gain.

39 Algorithms. TD(1) algorithm: $\theta(k+1) - \theta(k) = a_k\, d(k)\, \phi(k+1)$, $k \ge 0$, where $a_k$ is the stochastic approximation gain. Consistent under mild conditions: $\theta(k) \to \theta^* = \Sigma^{-1} b$, with $b = E_\pi[h(X)\psi(X)]$, $\Sigma = E_\pi[\psi(X)\psi(X)^{\mathsf{T}}]$.
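A minimal sketch of the TD(1) recursion, with assumed primitives: `step` advances the chain one transition, `psi` evaluates the basis, `cost` is $c$, and the gain is $a_k = g_0/(k+1)$.

```python
import numpy as np

def td1(step, psi, cost, x0, gamma, n_iter, g0=1.0):
    """TD(1): theta(k+1) = theta(k) + a_k d(k) phi(k+1), a_k = g0/(k+1)."""
    x = x0
    theta = np.zeros(len(psi(x0)))
    phi = psi(x0) / (1.0 + gamma)          # eligibility vector phi(0)
    for k in range(n_iter):
        x_next = step(x)
        # temporal difference d(k) = -[(1+g) h(X(k)) - h(X(k+1)) - c(X(k))]
        d = cost(x) + theta @ psi(x_next) - (1 + gamma) * (theta @ psi(x))
        phi = (psi(x_next) + phi) / (1 + gamma)     # phi(k+1)
        theta += (g0 / (k + 1)) * d * phi
        x = x_next
    return theta
```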

40 Algorithms. Newton algorithm: $\theta(k+1) = \Big[\frac{1}{k} \sum_{i=1}^{k} \psi(X(i+1))\, \psi^{\mathsf{T}}(X(i+1))\Big]^{-1}\, \frac{1}{k} \sum_{i=1}^{k} c(X(i))\, \phi(i+1)$. Consistent under mild conditions: $\theta(k) \to \theta^* = \Sigma^{-1} b$, with $b = E_\pi[h(X)\psi(X)]$, $\Sigma = E_\pi[\psi(X)\psi(X)^{\mathsf{T}}]$.
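The Newton variant in sketch form, same assumed primitives as the TD(1) sketch above: accumulate sample averages estimating $\Sigma = E_\pi[\psi\psi^{\mathsf{T}}]$ and $b$, then solve once.

```python
import numpy as np

def newton_td(step, psi, cost, x0, gamma, n_iter):
    """One-shot least-squares estimate of theta* = Sigma^{-1} b."""
    x = x0
    ell = len(psi(x0))
    phi = psi(x0) / (1.0 + gamma)               # eligibility vector
    Sigma_hat = np.zeros((ell, ell))
    b_hat = np.zeros(ell)
    for _ in range(n_iter):
        Sigma_hat += np.outer(psi(x), psi(x))   # estimates E_pi[psi psi^T]
        b_hat += cost(x) * phi                  # estimates E_pi[c phi]
        x = step(x)
        phi = (psi(x) + phi) / (1 + gamma)
    return np.linalg.solve(Sigma_hat, b_hat)    # the 1/k factors cancel
```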

41 TD Learning for average cost. Potential matrix: $R_0\, g\,(x) = E_x\big[\sum_{t=0}^{\sigma_\alpha} g(X(t))\big]$, $x \in \mathsf{X}$, where $\sigma_\alpha$ is the hitting time to $\alpha \in \mathsf{X}$.

42 TD Learning for average cost. Potential matrix: $R_0\, g\,(x) = E_x\big[\sum_{t=0}^{\sigma_\alpha} g(X(t))\big]$, $x \in \mathsf{X}$, where $\sigma_\alpha$ is the hitting time to $\alpha \in \mathsf{X}$. $R_0 g$ solves Poisson's equation when $g = c - \pi(c)$.

43 TD Learning for average cost. Potential matrix: $R_0\, g\,(x) = E_x\big[\sum_{t=0}^{\sigma_\alpha} g(X(t))\big]$, $x \in \mathsf{X}$, where $\sigma_\alpha$ is the hitting time to $\alpha \in \mathsf{X}$. Adjoint: $R_0^\dagger\, g\,(x) = E\big[\sum_{\sigma_\alpha^{[0]} \le t \le 0} g(X(t)) \,\big|\, X(0) = x\big]$, $g \in L_2(\pi)$, where $\sigma_\alpha^{[k]} = \max\{\, t \le k : X(t) = \alpha \,\}$ is the backward hitting time. [CTCN 07]

44 Value function approximation: Other metrics. Bellman error: $E_\pi\big[\big(P h^\theta\,(X) - (1+\gamma)\, h^\theta(X) + c(X)\big)^2\big]$ (discounted cost).
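The Bellman error is directly computable on a finite chain. A sketch, assuming `P`, `pi`, `c` and a basis matrix `psi` as in the earlier finite-chain examples:

```python
import numpy as np

def bellman_error(P, pi, c, psi, theta, gamma):
    """E_pi[(P h_theta - (1+gamma) h_theta + c)^2] for h_theta = psi @ theta."""
    h = psi @ theta
    residual = P @ h - (1 + gamma) * h + c
    return pi @ residual**2
```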

45 Value function approximation: Other metrics. Bellman error: $E_\pi\big[\big(P h^\theta\,(X) - (1+\gamma)\, h^\theta(X) + c(X)\big)^2\big]$ (discounted cost). CLT error: $\sigma^2(\theta) = E_\pi\big[(h - h^\theta)^2 - (Ph - Ph^\theta)^2\big]$, where $h$ solves Poisson's equation. [Tadic & M. 03; Kim & Henderson 03; CTCN 07]

46 Value function approximation: Other metrics. Bellman error: $E_\pi\big[\big(P h^\theta\,(X) - (1+\gamma)\, h^\theta(X) + c(X)\big)^2\big]$ (discounted cost). CLT error: $\sigma^2(\theta) = E_\pi\big[(h - h^\theta)^2 - (Ph - Ph^\theta)^2\big]$, where $h$ solves Poisson's equation. In each case we can construct an adjoint equation to solve for the optimal parameter.

47 Outline. Value function approximation; Performance evaluation; Performance improvement; Simulation; TD learning; Numerics; Conclusions.

48 Once the optimal CV is identified. [Figure: two-station network with service rates $\mu_1, \dots, \mu_4$ and arrival rate $\alpha$] Smoothed estimator using the fluid value function: $g = h - Ph = -Dh$ with $h = J$. [Figure: standard estimator vs. smoothed estimator]

49 Policy Selection. [Figure: two-station network with service rates $\mu_1, \dots, \mu_4$ and arrival rate $\alpha$] Policy: serve buffer 1 if buffer 4 is empty, or if $W_2 \ge \beta_1 W_1$ and $m_2 Q_2 + m_3 Q_3 \le \bar{w}_2$. [Figure: performance as a function of the safety stock $\bar{w}_2$: (i) simulation using the smoothed estimator; (ii) simulation using the standard estimator]

50 Optimal speed scaling. Joint work with Adam Wierman and the students of ECE 555. "Intel's huge bet turns iffy." John Markoff and Steve Lohr, New York Times, September. "What matters most to the computer designers at Google is not speed, but power, low power, because data centers can consume as much electricity as a city." Dr. Eric Schmidt, CEO of Google. [1] O. S. Unsal and I. Koren, "System-level power-aware design techniques in real-time systems," Proc. IEEE, vol. 91, no. 7, 2003. [2] S. Irani and K. R. Pruhs, "Algorithmic problems in power management," SIGACT News, vol. 36, no. 2, 2005. [3] S. Kaxiras and M. Martonosi, Computer Architecture Techniques for Power-Efficiency. Morgan and Claypool, 2008.

51 Optimal speed scaling. Joint work with Adam Wierman and the students of ECE 555. Can we adjust the power delivered to a computer processor to optimize the delay/power tradeoff? [Figure: log(dynamic power) vs. log(frequency) for the Intel PXA, a TCP processor, and the Pentium M; references [1]-[3] as on the previous slide]

52 Optimal speed scaling - queueing model. Abstraction, a controlled queue: $Q(k+1) - Q(k) = -U(k) + A(k+1)$, with processing rate $U(k)$ and job arrivals $A(k+1)$. Minimize the average cost $\eta = \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} E[c(Q(k), U(k))]$, with a quadratic processing cost for the sake of example, $c(x,u) = x + u^2/2$.
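A sketch of this model (the policy, the Poisson arrival distribution, and the constraint $0 \le U(k) \le Q(k)$ are illustrative assumptions): simulate the controlled queue under a state-feedback policy and estimate the average cost.

```python
import numpy as np

rng = np.random.default_rng(2)

def average_cost(policy, alpha, n_steps, q0=0.0):
    """Simulate Q(k+1) = Q(k) - U(k) + A(k+1) under U(k) = policy(Q(k))."""
    q, total = q0, 0.0
    for _ in range(n_steps):
        u = min(policy(q), q)           # assumed constraint 0 <= U(k) <= Q(k)
        total += q + 0.5 * u**2         # c(x, u) = x + u^2 / 2
        q = q - u + rng.poisson(alpha)  # Poisson arrivals (assumed)
    return total / n_steps

# e.g. a square-root rule suggested by the fluid policy:
# print(average_cost(lambda q: np.sqrt(q), alpha=0.5, n_steps=100_000))
```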

53 Optimal speed scaling - fluid model. Abstraction, a controlled fluid model with processing rate $\zeta(t)$ and arrival rate $\alpha$. Minimize the total cost up to the draining time $T$: $J^*(x) = \min \int_0^T c(q(t), \zeta(t))\, dt$, $q(0) = x$, with closed form $J^*(x) = \alpha x + \tfrac{1}{3}\big[(2x + \alpha^2)^{3/2} - \alpha^3\big]$.
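The closed form can be checked against the fluid HJB equation $\min_{\zeta} [\,c(x,\zeta) + J'(x)(\alpha - \zeta)\,] = 0$, whose minimizer is $\zeta^* = J'(x)$ for the quadratic cost above. A small numerical sketch (the value of $\alpha$ is arbitrary):

```python
import numpy as np

alpha = 0.5   # arrival rate (arbitrary choice for the check)

def J(x):     # closed form from the slide
    return alpha * x + ((2 * x + alpha**2) ** 1.5 - alpha**3) / 3

def dJ(x):    # J'(x) = alpha + sqrt(alpha^2 + 2x)
    return alpha + np.sqrt(alpha**2 + 2 * x)

for x in (0.1, 1.0, 10.0):
    u = dJ(x)                                    # minimizing rate zeta*
    hjb = (x + 0.5 * u**2) + dJ(x) * (alpha - u)
    assert abs(hjb) < 1e-9                       # HJB residual vanishes
print("HJB check passed")
```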

54 Optimal speed scaling - value functions. [Figure: relative value function $h^*$ and fluid value function $J^*$ plotted against the queue length $x$] $J^*(x) = \min \int_0^T c(q(t), \zeta(t))\,dt$, $q(0) = x$.

55 Optimal speed scaling - policies. Stochastic optimal policy: $\phi^*(x) = \arg\min_{0 \le u \le x}\, \big[c(x,u) + P_u J^*\,(x)\big]$. [Figure: optimal input $u$ as a function of the queue length $x$] $J^*(x) = \min \int_0^T c(q(t), \zeta(t))\,dt$, $q(0) = x$.

56 Optimal speed scaling - TD Learning. [Figure: parameter estimates $\theta_1(t)$ and $\theta_2(t)$ for a two-element basis, plotted against $t$ up to $10^5$ iterations]

57 Outline. Value function approximation; Performance evaluation; Performance improvement; Simulation; TD learning; Numerics; Conclusions.

58 Conclusions. It is possible to apply the adjoint for algorithm synthesis in many settings. Many questions remain: variance of the Newton and SA recursions; parametrizations in networks; complexity, via workload relaxations (see CTCN, and Henderson and M. 2003); policy improvement and approximate DP [Veatch 2004; Mannor, Menache, and Shimkin 2005; Moallemi, Kumar, and Van Roy 2007].

59 Conclusions. It is possible to apply the adjoint for algorithm synthesis in many settings. Many questions remain: variance of the Newton and SA recursions; parametrizations in networks; complexity, via workload relaxations; policy improvement and approximate DP. See Chapter 11 of Control Techniques for Complex Networks, Cambridge, 2007.
