TD Learning. Sean Meyn. Department of Electrical and Computer Engineering, University of Illinois & the Coordinated Science Laboratory
1 TD Learning
Sean Meyn, Department of Electrical and Computer Engineering, University of Illinois & the Coordinated Science Laboratory
Joint work with I.-K. Cho (Illinois), S. Mannor (McGill), V. Tadic (Sheffield), S. Henderson (Cornell)
NSF support: ECS and DARPA ITMANET
2 Recollections [Slide art: A. Rybko, 2006]
3 Recollections: Value Function Solidarity
Relative value function: h(x) = \int_0^\infty E[c(Q(t;x)) - \eta] \, dt, where \eta = \int c(x) \, \pi(dx)
Fluid value function: J(x) = \int_0^\infty c(q(t;x)) \, dt
Solidarity: \lim_{|x| \to \infty} J(x)/h(x) = 1
[Slide art: A. Rybko, 2006]
4 Recollections: Lyapunov Functions and Policy Translation
Relative value function: h(x) = \int_0^\infty E[c(Q(t;x)) - \eta] \, dt, where \eta = \int c(x) \, \pi(dx)
Fluid value function: J(x) = \int_0^\infty c(q(t;x)) \, dt, with \lim_{|x| \to \infty} J(x)/h(x) = 1
Lyapunov drift: PV(x) := E[V(Q(k+1)) \mid Q(k) = x] \le V(x) - \varepsilon |x| + \eta
[Figure: two-station network with arrival rate \alpha_1 and service rates \mu_1, \mu_2; level sets of the value function in the (x_1, x_2) plane. Slide art: A. Rybko, 2006]
5 Outline
- Value function approximation
- Performance evaluation
- Performance improvement
- Simulation
- TD learning
- Numerics
- Conclusions
[Figure: workload space (w_1, w_2) with infeasible region R and level set {h(x) = h_0}; plots for \kappa = 1 and \kappa = 4]
6 Outline (repeat of slide 5)
7 Average cost
Ergodic Markov chain {X(t) : t = 0, 1, 2, ...} on a countable state space X
Transition matrix P(x, y); invariant probability measure \pi, \pi P = \pi
Cost function c : X \to R_+ (near monotone)
Finite average cost: \eta = \pi(c) = \lim_{n \to \infty} \frac{1}{n} \sum_{t=0}^{n-1} c(X(t))
8 Value functions
Discounted: h(x) := E\big[ \sum_{t=0}^{\infty} (1+\gamma)^{-(t+1)} c(X(t)) \big], X(0) = x
Relative: h(x) := E\big[ \sum_{t=0}^{\tau} (c(X(t)) - \eta) \big], X(0) = x
9 Value functions
Discounted: h(x) := E\big[ \sum_{t=0}^{\infty} (1+\gamma)^{-(t+1)} c(X(t)) \big], X(0) = x
Relative: h(x) := E\big[ \sum_{t=0}^{\tau} (c(X(t)) - \eta) \big], X(0) = x
Dynamic programming equations:
Discounted: [\gamma I - D] h = c, with D = P - I
Relative: D h = -c + \eta
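As a quick sanity check (not from the slides), the discounted dynamic programming equation can be verified numerically: on a small illustrative chain, the solution of [\gamma I - D] h = c coincides with the series definition h = \sum_t (1+\gamma)^{-(t+1)} P^t c. The chain and cost below are assumptions chosen only for illustration.

```python
import numpy as np

# A small ergodic Markov chain (3 states) -- illustrative, not from the talk.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
c = np.array([1.0, 2.0, 4.0])   # cost function
gamma = 0.1
I = np.eye(3)
D = P - I

# Discounted value function from the DP equation [gamma*I - D] h = c
h_dp = np.linalg.solve(gamma * I - D, c)

# Same function from the series h(x) = sum_t (1+gamma)^{-(t+1)} (P^t c)(x)
h_series = np.zeros(3)
Pt_c = c.copy()
for t in range(2000):
    h_series += (1 + gamma) ** (-(t + 1)) * Pt_c
    Pt_c = P @ Pt_c

print(np.max(np.abs(h_dp - h_series)))  # essentially zero: the two definitions agree
```

Note that [\gamma I - D] = (1+\gamma)I - P, whose inverse is exactly the resolvent R_\gamma that appears later in the talk.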
10 Value function approximation
Discounted-cost value function: h(x) = E\big[ \sum_{t=0}^{\infty} (1+\gamma)^{-(t+1)} c(X(t)) \big]
Network model with c(x) a norm on X
Under general conditions, approximated by the scaled cost: \lim_{r \to \infty} \frac{1}{r} h(rx) = \frac{1}{\gamma} c(x)
11 Value function approximation
Relative value function: h(x) = E\big[ \sum_{t=0}^{\tau} (c(X(t)) - \eta) \big]
h solves Poisson's equation: D h = -c + \eta
12 Value function approximation
Value function approximated by the fluid model:
Relative value function: h(x) = E\big[ \sum_{t=0}^{\tau} (c(X(t)) - \eta) \big]
Fluid value function: J(x) = \int_0^\infty c(q(t)) \, dt
Under general conditions, \lim_{r \to \infty} \frac{1}{r^2} h(rx) = J(x)
13 (cont.) [Figure: workload model; level sets {h(x) = h_0} of the value function, with infeasible region R and idleness \iota_2(k) > 0]
14 Value function approximation: Simulation
Standard Monte Carlo: \eta_k = k^{-1} \sum_{t=0}^{k-1} c(X(t))
CLT variance: \sigma^2 = \pi(h^2 - (Ph)^2), where h solves Poisson's equation D h = -c + \eta
15 (cont.) Example, simulating the single queue: CLT variance = O\big( 1/(1-\rho)^4 \big)
16 Value function approximation: Simulation
\eta_n = n^{-1} \sum_{t=0}^{n-1} Q(t), \qquad \lim_{n \to \infty} \frac{1}{n} \log P\{\eta_n \ge r\} = 0, \quad r > \eta
Example, simulating the single queue: CLT variance = O\big( 1/(1-\rho)^4 \big)
17 Value function approximation: Simulation
Control variates: g : X \to R with known mean \pi(g) = 0
Smoothed estimator: \eta_k(\vartheta) = k^{-1} \sum_{t=0}^{k-1} \big( c(X(t)) - \vartheta g(X(t)) \big), \quad k \ge 1
Perfect estimator: g = h - Ph = -Dh = c - \eta gives zero CLT variance
18 Steady-state customer population
Smoothed estimator using the fluid value function: g = h - Ph = -Dh, with h = J
[Figure: sample-average estimates, standard estimator vs. smoothed estimator]
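A hedged sketch of this smoothed estimator on a single-queue example (my construction, not the network from the slide): a reflected random walk with c(x) = x, load \rho = \alpha/\mu, and the fluid value function J(x) = x^2 / (2(\mu - \alpha)) supplying the control variate g = -DJ = J - PJ. All numerical values are illustrative assumptions.

```python
import numpy as np

# Reflected random walk model of a single queue (illustrative sketch).
# Arrivals w.p. alpha, services w.p. mu = 1 - alpha; cost c(x) = x.
rng = np.random.default_rng(0)
alpha, mu = 0.3, 0.7
rho = alpha / mu                   # steady-state mean is rho/(1-rho) = 0.75

def J(x):
    # Fluid value function for c(x) = x: J(x) = x^2 / (2(mu - alpha))
    return x * x / (2.0 * (mu - alpha))

def PJ(x):
    # One-step expectation of J from state x
    if x == 0:
        return alpha * J(1) + (1 - alpha) * J(0)
    return alpha * J(x + 1) + mu * J(x - 1)

n = 200_000
Q = 0
std_samples = np.empty(n)
sm_samples = np.empty(n)
for t in range(n):
    g = J(Q) - PJ(Q)               # control variate g = -DJ, with pi(g) = 0
    std_samples[t] = Q             # standard Monte Carlo summand c(Q(t))
    sm_samples[t] = Q - g          # smoothed summand c(Q(t)) - g(Q(t))
    U = rng.random()
    if Q == 0:
        Q += 1 if U < alpha else 0
    else:
        Q += 1 if U < alpha else -1

eta_std, eta_sm = std_samples.mean(), sm_samples.mean()
print(eta_std, eta_sm)             # both near 0.75; the smoothed summand has far lower variance
```

Away from the boundary, g(x) = x - 1/(2(\mu-\alpha)) here, so the smoothed summand is nearly constant, which is exactly the variance-reduction mechanism the slide illustrates.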
19 Outline (repeat of slide 5)
20 Value function approximation
Linear parametrization: h^\theta(x) := \theta^T \psi(x) = \sum_{i=1}^{\ell_h} \theta_i \psi_i(x), \quad x \in X
How to find the best parameter \theta? How to define "best"?
21 Hilbert space setting: minimize the mean-square error \|h^\theta - h\|_\pi^2
22 (cont.)
\|h^\theta - h\|_\pi^2 = E_\pi[(h^\theta(X(t)) - h(X(t)))^2]
\nabla_\theta \|h^\theta - h\|_\pi^2 = 2\, E_\pi[(h^\theta(X(t)) - h(X(t)))\, \psi(X(t))]
23 (cont.) Setting the gradient to zero gives the optimal parameter:
\theta^* = \Sigma^{-1} b, \qquad b = E_\pi[h(X)\psi(X)], \quad \Sigma = E_\pi[\psi(X)\psi(X)^T]
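A numerical sketch of the optimal-parameter formula (my illustration, not from the slides): on a small chain where \pi and h can be computed exactly, \theta^* = \Sigma^{-1} b is the \pi-weighted least-squares projection of h onto the span of the basis. The chain, cost, and two-element basis below are assumptions.

```python
import numpy as np

# Illustrative 3-state chain: compute theta* = Sigma^{-1} b exactly.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
c = np.array([1.0, 2.0, 4.0])
gamma = 0.1

# Invariant measure pi (left eigenvector of P for eigenvalue 1)
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# Discounted value function h = R_gamma c, from [(1+gamma)I - P] h = c
h = np.linalg.solve((1 + gamma) * np.eye(3) - P, c)

# Assumed basis: psi(x) = (1, c(x)) -- two features
psi = np.stack([np.ones(3), c], axis=1)          # shape (3, 2)

Sigma = (psi * pi[:, None]).T @ psi              # E_pi[psi psi^T]
b = (psi * pi[:, None]).T @ h                    # E_pi[h psi]
theta_star = np.linalg.solve(Sigma, b)
h_theta = psi @ theta_star                       # best approximation in span(psi)
print(theta_star, np.abs(h - h_theta).max())
```

The defining property is orthogonality: the residual h^\theta - h is orthogonal (in L^2(\pi)) to every basis function.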
24 Value function approximation: Hilbert space setting
Define, for two functions f, g on X: \langle f, g \rangle_\pi = \pi(fg)
25 (cont.) L^2 norm: \|f - g\|_\pi^2 := \langle f-g, f-g \rangle_\pi = E_\pi[(f(X(0)) - g(X(0)))^2]
26 (cont.) Resolvent: R_\gamma = \sum_{t=0}^{\infty} (1+\gamma)^{-(t+1)} P^t
27 (cont.) b = E_\pi[h(X)\psi(X)] = \langle R_\gamma c, \psi \rangle_\pi
28 (cont.) Adjoint: \langle R_\gamma f, g \rangle_\pi = \langle f, R_\gamma^\dagger g \rangle_\pi, \quad f, g \in L^2(\pi)
29 Representing the adjoint
For the stationary process (discounted cost):
\langle R_\gamma f, g \rangle_\pi = E\Big[ \Big( \sum_{t=0}^{\infty} (1+\gamma)^{-(t+1)} P^t f(X(0)) \Big) g(X(0)) \Big]
30 (cont.) Smoothing: E[P^t f(X(0))\, g(X(0))] = E[f(X(t))\, g(X(0))] = E[f(X(0))\, g(X(-t))]
31 (cont.) Hence \langle R_\gamma f, g \rangle_\pi = \sum_{t=0}^{\infty} (1+\gamma)^{-(t+1)} E[f(X(0))\, g(X(-t))]
32 (cont.) = \langle f, R_\gamma^\dagger g \rangle_\pi
33 The adjoint is defined in terms of the time-reversed process:
R_\gamma^\dagger g(x) = \sum_{t=0}^{\infty} (1+\gamma)^{-(t+1)} E[g(X(-t)) \mid X(0) = x]
Causal, hence computable!
34 TD Learning for discounted cost
\tfrac{1}{2} \nabla_\theta \|h^\theta - h\|_\pi^2 = E_\pi[(h^\theta(X(k)) - h(X(k)))\, \psi(X(k))]
= \langle h^\theta - R_\gamma c,\ \psi \rangle_\pi
= \langle R_\gamma (R_\gamma^{-1} h^\theta - c),\ \psi \rangle_\pi
= \langle R_\gamma^{-1} h^\theta - c,\ R_\gamma^\dagger \psi \rangle_\pi
35 (cont.) = -\langle d,\ \varphi \rangle_\pi, with notation d, \varphi defined on slide 37 -- causal, hence computable
36 (cont.) E[d(k)\varphi(k+1)] = -\tfrac{1}{2} \nabla_\theta \|h^\theta - h\|_\pi^2, evaluated at \theta = \theta(k)
37 (cont.) Notation:
(i) Temporal differences: d(k) = c(X(k)) + h^{\theta(k)}(X(k+1)) - (1+\gamma)\, h^{\theta(k)}(X(k))
(ii) Eligibility vectors: the \ell_h-dimensional sequence \varphi(k) = \sum_{t=0}^{k} (1+\gamma)^{-(t+1)}\, \psi(X(k-t)), \quad k \ge 1
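Not stated on the slides, but implied by the definition: the eligibility vector obeys the one-step recursion \varphi(k+1) = (1+\gamma)^{-1}\big( \varphi(k) + \psi(X(k+1)) \big), which is what makes it computable online. A small numerical check of this identity; the trajectory and indicator basis are arbitrary illustrations.

```python
import numpy as np

# Check: phi(k) = sum_{t=0}^{k} (1+gamma)^{-(t+1)} psi(X(k-t))
# satisfies the recursion phi(k+1) = (phi(k) + psi(X(k+1))) / (1 + gamma).
gamma = 0.1
rng = np.random.default_rng(1)
X = rng.integers(0, 3, size=50)                  # an arbitrary state trajectory
psi = np.eye(3)                                  # indicator basis psi_i(x) = 1{x = i}

def phi_direct(k):
    # Eligibility vector straight from the definition
    return sum((1 + gamma) ** (-(t + 1)) * psi[X[k - t]] for t in range(k + 1))

phi = psi[X[0]] / (1 + gamma)                    # phi(0)
for k in range(1, 50):
    phi = (phi + psi[X[k]]) / (1 + gamma)        # recursive one-step update
    assert np.allclose(phi, phi_direct(k))
print("recursion verified")
```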
38 Algorithms
TD(1) algorithm: \theta(k+1) - \theta(k) = a_k\, d(k)\, \varphi(k+1), \quad k \ge 0
a_k: stochastic approximation gain
39 (cont.) Consistent under mild conditions: \theta(k) \to \theta^* = \Sigma^{-1} b, where b = E_\pi[h(X)\psi(X)], \Sigma = E_\pi[\psi(X)\psi(X)^T]
40 Algorithms
Newton algorithm:
\theta(k+1) = \Big[ \frac{1}{k} \sum_{i=1}^{k} \psi(X(i+1))\, \psi^T(X(i+1)) \Big]^{-1} \Big[ \frac{1}{k} \sum_{i=1}^{k} c(X(i))\, \varphi(i+1) \Big]
Consistent under mild conditions: \theta(k) \to \theta^* = \Sigma^{-1} b, where b = E_\pi[h(X)\psi(X)], \Sigma = E_\pi[\psi(X)\psi(X)^T]
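A batch sketch of the Newton (least-squares) form, under the same illustrative assumptions as above: \Sigma and b are estimated by sample averages along one trajectory and inverted once at the end, rather than at every step.

```python
import numpy as np

# Batch "Newton" estimate on the illustrative chain:
# theta = [ (1/n) sum psi psi^T ]^{-1} [ (1/n) sum c(X(i)) phi(i+1) ]
rng = np.random.default_rng(3)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.4, 0.5]])
c = np.array([1.0, 2.0, 4.0])
gamma = 0.1
psi = np.eye(3)

n = 300_000
Sigma = np.zeros((3, 3))
b = np.zeros(3)
phi = np.zeros(3)
x = 0
for k in range(n):
    x_next = rng.choice(3, p=P[x])
    phi = (phi + psi[x_next]) / (1 + gamma)    # eligibility vector phi(k+1)
    Sigma += np.outer(psi[x_next], psi[x_next])
    b += c[x_next] * phi                       # sample average converging to E_pi[h psi]
    x = x_next

theta = np.linalg.solve(Sigma / n, b / n)
h_true = np.linalg.solve((1 + gamma) * np.eye(3) - P, c)
print(theta, h_true)
```

The eligibility average converges to b because E[c(X(k)) \varphi(k)] = \langle c, R_\gamma^\dagger \psi \rangle_\pi = \langle R_\gamma c, \psi \rangle_\pi, the adjoint identity from the earlier slides.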
41 TD Learning for average cost
Potential matrix: R_0\, g(x) = E_x\big[ \sum_{t=0}^{\sigma_\alpha} g(X(t)) \big], \quad x \in X, where \sigma_\alpha is the hitting time to \alpha \in X
42 (cont.) R_0\, g solves Poisson's equation when g = c - \pi(c)
43 (cont.) Adjoint: R_0^\dagger g(x) = E\big[ \sum_{t=\sigma_\alpha^{[0]}}^{0} g(X(t)) \,\big|\, X(0) = x \big], \quad g \in L^2(\pi)
Backward hitting time: \sigma_\alpha^{[k]} = \max\{ t \le k : X(t) = \alpha \} [CTCN 2007]
44 Value function approximation: Other metrics
Bellman error (discounted cost): E_\pi[(P h^\theta(X) - (1+\gamma) h^\theta(X) + c(X))^2]
45 (cont.) CLT error: \sigma^2(\theta) = E_\pi[(h - h^\theta)^2 - (Ph - P h^\theta)^2], where h solves Poisson's equation [Tadic & M. 2003; Kim & Henderson 2003; CTCN 2007]
46 (cont.) In each case one can construct an adjoint equation to solve for the optimal parameter
47 Outline (repeat of slide 5)
48 Once the optimal CV is identified
[Figure: two-station network with service rates \mu_1, \mu_2, \mu_3, \mu_4 and arrival rate \alpha_3]
Smoothed estimator using the fluid value function: g = h - Ph = -Dh, with h = J
[Figure: sample-average estimates, standard estimator vs. smoothed estimator]
49 Policy Selection
[Figure: two-station network with service rates \mu_1, \mu_2, \mu_3, \mu_4 and arrival rate \alpha_3]
Policy: serve buffer 1 if buffer 4 is empty, or if W_2 \le W_1 - \beta_1 and m_2 Q_2 + m_3 Q_3 \le w_2
[Figure: performance as a function of the safety-stock parameters w; simulation using the smoothed estimator vs. simulation using the standard estimator]
50 Optimal speed scaling
Joint work with Adam Wierman and the students of ECE 555
"Intel's huge bet turns iffy." John Markoff and Steve Lohr, New York Times, September.
"What matters most to the computer designers at Google is not speed, but power, low power, because data centers can consume as much electricity as a city." Dr. Eric Schmidt, CEO of Google
[1] O. S. Unsal and I. Koren, "System-level power-aware design techniques in real-time systems," Proc. IEEE, vol. 91, no. 7.
[2] S. Irani and K. R. Pruhs, "Algorithmic problems in power management," SIGACT News, vol. 36, no. 2.
[3] S. Kaxiras and M. Martonosi, Computer Architecture Techniques for Power-Efficiency. Morgan & Claypool, 2008.
51 Optimal speed scaling
Can we adjust processor power to optimize the delay/power tradeoff?
[Figure: log(dynamic power) vs. log(frequency), with data points for Intel PXA, TCP processing, and Pentium M]
References [1]-[3] as on slide 50.
52 Optimal speed scaling: queueing model
Abstraction: a controlled queue, Q(k+1) - Q(k) = -U(k) + A(k+1)
U(k): processing rate; A(k+1): job arrivals
Minimize the average cost, with a quadratic processing cost for the sake of example
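A minimal simulation sketch of this controlled queue under two heuristic policies. The quadratic cost c(x, u) = x + u^2, the Poisson arrivals, and both policies are assumptions of mine for illustration; neither is the optimal policy from the talk.

```python
import numpy as np

# Sketch of the speed-scaling queue: Q(k+1) = Q(k) - U(k) + A(k+1).
# Assumed cost c(x, u) = x + u^2 (a quadratic processing cost, as in the talk's example).
rng = np.random.default_rng(4)
alpha = 1.0                        # mean arrival rate (Poisson arrivals, an assumption)
n = 100_000

def run(policy):
    Q, total = 0.0, 0.0
    for k in range(n):
        U = min(Q, policy(Q))      # cannot process more work than is present
        total += Q + U * U         # accumulate cost c(Q(k), U(k))
        Q = Q - U + rng.poisson(alpha)
    return total / n               # average-cost estimate

eta_fixed = run(lambda q: 1.2)             # heuristic: constant processing rate
eta_scaled = run(lambda q: np.sqrt(q))     # heuristic: rate grows with backlog
print(eta_fixed, eta_scaled)
```

Scaling the rate with the backlog keeps the queue short while spending on speed only when needed, which is the delay/power tradeoff the speed-scaling problem formalizes.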
53 Optimal speed scaling: fluid model
Abstraction: a controlled fluid model with processing rate \zeta(t) and arrival rate \alpha
Minimize the total cost up to the draining time T_0:
J^*(x) = \min \int_0^{T_0} c(q(t), \zeta(t)) \, dt, \quad q(0) = x
= \alpha x + \frac{(2x + \alpha^2)^{3/2} - \alpha^3}{3}
54 Optimal speed scaling: value functions
[Figure: relative value function h and fluid value function J^* as functions of x]
J^*(x) = \min \int_0^{T_0} c(q(t), \zeta(t)) \, dt, \quad q(0) = x
55 Optimal speed scaling: policies
\phi^{J^*}(x) = \arg\min_{0 \le u \le x} \big[ c(x, u) + P_u J^*(x) \big]
[Figure: stochastic optimal policy, input u vs. queue length x]
J^*(x) = \min \int_0^{T_0} c(q(t), \zeta(t)) \, dt, \quad q(0) = x
56 Optimal speed scaling: TD Learning
[Figure: parameter estimates \theta_1(t), \theta_2(t) vs. t (up to 10^5), for a two-element basis]
57 Outline (repeat of slide 5)
58 Conclusions
It is possible to apply the adjoint for algorithm synthesis in many settings. Many questions remain!
- Variance of Newton and SA recursions
- Parametrizations in networks
- Complexity: workload relaxations (see CTCN and Henderson & M. 2003)
- Policy improvement, approximate DP: Veatch 2004; Mannor, Menache & Shimkin 2005; Moallemi, Kumar & Van Roy 2007
59 Conclusions
It is possible to apply the adjoint for algorithm synthesis in many settings. Many questions remain!
- Variance of Newton and SA recursions
- Parametrizations in networks
- Complexity: workload relaxations
- Policy improvement, approximate DP
See Chapter 11 of Control Techniques for Complex Networks, Cambridge, 2007.
More informationMonte Carlo Methods for Computation and Optimization (048715)
Technion Department of Electrical Engineering Monte Carlo Methods for Computation and Optimization (048715) Lecture Notes Prof. Nahum Shimkin Spring 2015 i PREFACE These lecture notes are intended for
More informationStability of the two queue system
Stability of the two queue system Iain M. MacPhee and Lisa J. Müller University of Durham Department of Mathematical Science Durham, DH1 3LE, UK (e-mail: i.m.macphee@durham.ac.uk, l.j.muller@durham.ac.uk)
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Mark Schmidt University of British Columbia Winter 2018 Last Time: Monte Carlo Methods If we want to approximate expectations of random functions, E[g(x)] = g(x)p(x) or E[g(x)]
More informationApproximate Dynamic Programming
Approximate Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course Approximate Dynamic Programming (a.k.a. Batch Reinforcement Learning) A.
More informationRare Event Simulation using Importance Sampling and Cross-Entropy p. 1
Rare Event Simulation using Importance Sampling and Cross-Entropy Caitlin James Supervisor: Phil Pollett Rare Event Simulation using Importance Sampling and Cross-Entropy p. 1 500 Simulation of a population
More informationON THE POLICY IMPROVEMENT ALGORITHM IN CONTINUOUS TIME
ON THE POLICY IMPROVEMENT ALGORITHM IN CONTINUOUS TIME SAUL D. JACKA AND ALEKSANDAR MIJATOVIĆ Abstract. We develop a general approach to the Policy Improvement Algorithm (PIA) for stochastic control problems
More informationECE7850 Lecture 7. Discrete Time Optimal Control and Dynamic Programming
ECE7850 Lecture 7 Discrete Time Optimal Control and Dynamic Programming Discrete Time Optimal control Problems Short Introduction to Dynamic Programming Connection to Stabilization Problems 1 DT nonlinear
More informationChapter 3 Balance equations, birth-death processes, continuous Markov Chains
Chapter 3 Balance equations, birth-death processes, continuous Markov Chains Ioannis Glaropoulos November 4, 2012 1 Exercise 3.2 Consider a birth-death process with 3 states, where the transition rate
More informationStein s Method for Steady-State Approximations: A Toolbox of Techniques
Stein s Method for Steady-State Approximations: A Toolbox of Techniques Anton Braverman Based on joint work with Jim Dai (Cornell), Itai Gurvich (Cornell), and Junfei Huang (CUHK). November 17, 2017 Outline
More informationState-dependent Limited Processor Sharing - Approximations and Optimal Control
State-dependent Limited Processor Sharing - Approximations and Optimal Control Varun Gupta University of Chicago Joint work with: Jiheng Zhang (HKUST) State-dependent Limited Processor Sharing Processor
More informationSequential Monte Carlo Methods for Bayesian Computation
Sequential Monte Carlo Methods for Bayesian Computation A. Doucet Kyoto Sept. 2012 A. Doucet (MLSS Sept. 2012) Sept. 2012 1 / 136 Motivating Example 1: Generic Bayesian Model Let X be a vector parameter
More informationStochastic Gradient Descent in Continuous Time
Stochastic Gradient Descent in Continuous Time Justin Sirignano University of Illinois at Urbana Champaign with Konstantinos Spiliopoulos (Boston University) 1 / 27 We consider a diffusion X t X = R m
More informationUNIVERSITY OF MANITOBA
Question Points Score INSTRUCTIONS TO STUDENTS: This is a 6 hour examination. No extra time will be given. No texts, notes, or other aids are permitted. There are no calculators, cellphones or electronic
More informationCourse Summary Math 211
Course Summary Math 211 table of contents I. Functions of several variables. II. R n. III. Derivatives. IV. Taylor s Theorem. V. Differential Geometry. VI. Applications. 1. Best affine approximations.
More informationIn search of sensitivity in network optimization
In search of sensitivity in network optimization Mike Chen, Charuhas Pandit, and Sean Meyn June 13, 2002 Abstract This paper concerns policy synthesis in large queuing networks. The results provide answers
More informationMarkov processes and queueing networks
Inria September 22, 2015 Outline Poisson processes Markov jump processes Some queueing networks The Poisson distribution (Siméon-Denis Poisson, 1781-1840) { } e λ λ n n! As prevalent as Gaussian distribution
More informationStochastic process. X, a series of random variables indexed by t
Stochastic process X, a series of random variables indexed by t X={X(t), t 0} is a continuous time stochastic process X={X(t), t=0,1, } is a discrete time stochastic process X(t) is the state at time t,
More informationminimize x subject to (x 2)(x 4) u,
Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for
More informationDynamic Programming Theorems
Dynamic Programming Theorems Prof. Lutz Hendricks Econ720 September 11, 2017 1 / 39 Dynamic Programming Theorems Useful theorems to characterize the solution to a DP problem. There is no reason to remember
More informationChapter 7. Markov chain background. 7.1 Finite state space
Chapter 7 Markov chain background A stochastic process is a family of random variables {X t } indexed by a varaible t which we will think of as time. Time can be discrete or continuous. We will only consider
More informationExercises Stochastic Performance Modelling. Hamilton Institute, Summer 2010
Exercises Stochastic Performance Modelling Hamilton Institute, Summer Instruction Exercise Let X be a non-negative random variable with E[X ]
More informationOutline. A Central Limit Theorem for Truncating Stochastic Algorithms
Outline A Central Limit Theorem for Truncating Stochastic Algorithms Jérôme Lelong http://cermics.enpc.fr/ lelong Tuesday September 5, 6 1 3 4 Jérôme Lelong (CERMICS) Tuesday September 5, 6 1 / 3 Jérôme
More informationOn the Ergodicity Properties of some Adaptive MCMC Algorithms
On the Ergodicity Properties of some Adaptive MCMC Algorithms Christophe Andrieu, Éric Moulines Abstract In this paper we study the ergodicity properties of some adaptive Monte Carlo Markov chain algorithms
More informationReinforcement Learning
Reinforcement Learning Dipendra Misra Cornell University dkm@cs.cornell.edu https://dipendramisra.wordpress.com/ Task Grasp the green cup. Output: Sequence of controller actions Setup from Lenz et. al.
More informationarxiv: v3 [cs.dm] 25 Jun 2016
Approximate optimality with bounded regret in dynamic matching models arxiv:1411.1044v3 [cs.dm] 25 Jun 2016 Ana Bušić Sean Meyn Abstract We consider a discrete-time bipartite matching model with random
More information