Average-cost temporal difference learning and adaptive control variates
1 Average-cost temporal difference learning and adaptive control variates. Sean Meyn, Department of ECE and the Coordinated Science Laboratory. Joint work with S. Mannor (McGill), V. Tadic (Sheffield), and S. Henderson (Cornell). Work supported in part by NSF grants.
2 Outline: Value function approximation; Performance evaluation; Performance improvement; Simulation; TD learning; Numerics; Conclusions. [Figure: level set $\{h(x) = h_0\}$ and infeasible region $R$ in workload coordinates $(w_1, w_2)$; curves for $\kappa = 1$ and $\kappa = 4$.]
3 References: Chapter 11 of Control Techniques for Complex Networks, Cambridge University Press, 2007; preprints available online. [Figure: network schematic with rate control, scheduling, congestion, routing, workload release, link outage, breakdown, demand, and re-work.]
4 Outline: Value function approximation; Performance evaluation; Performance improvement; Simulation; TD learning; Numerics; Conclusions.
5 Average cost. Ergodic Markov chain $\{X(t) : t = 0, 1, 2, \dots\}$ on a countable state space $X$, with transition matrix $P(x,y)$, invariant probability measure $\pi$ satisfying $\pi P = \pi$, and cost function $c \colon X \to \mathbb{R}_+$ (near monotone). Finite average cost: $\eta = \pi(c) = \lim_{n\to\infty} \tfrac{1}{n} \sum_{t=0}^{n-1} c(X(t))$.
6 Value functions. Discounted: $h(x) := \mathsf{E}\big[\sum_{t=0}^{\infty} (1+\gamma)^{-t-1}\, c(X(t))\big]$. Relative: $h(x) := \mathsf{E}\big[\sum_{t=0}^{\tau_0} (c(X(t)) - \eta)\big]$, with $X(0) = x \in X$ in each case.
7 Value functions: dynamic programming equations. Discounted: $[\gamma I - D]h = c$, where $D = P - I$. Relative: $Dh = -c + \eta$.
8 Value function approximation. Discounted-cost value function: $h(x) = \mathsf{E}\big[\sum_{t=0}^{\infty} (1+\gamma)^{-t-1}\, c(X(t))\big]$, for a network model with $c(x)$ a norm on $X$. Under general conditions it is approximated by the scaled cost: $\lim_{r\to\infty} \tfrac{1}{r}\, h(rx) = \tfrac{1}{\gamma}\, c(x)$,
9 and by the fluid value function: $h(x) \approx J_\gamma(x) = \int_0^\infty e^{-\gamma t}\, c(q(t))\, dt$, with $q(0) = x$.
10 Value function approximation. Relative value function: $h(x) = \mathsf{E}\big[\sum_{t=0}^{\tau_0} (c(X(t)) - \eta)\big]$; $h$ solves Poisson's equation, $Dh = -c + \eta$.
11 Value function approximation. The relative value function is approximated by the fluid value function $J(x) = \int_0^\infty c(q(t))\, dt$: under general conditions, $\lim_{r\to\infty} \tfrac{1}{r^2}\, h(rx) = J(x)$.
12 [Figure: workload model; level sets of $h$, with the level set $\{h(x) = h_0\}$ and the infeasible region $R$.]
13 Value function approximation: simulation. Standard Monte Carlo estimator: $\eta_k = k^{-1} \sum_{t=0}^{k-1} c(X(t))$. CLT variance: $\sigma^2 = \pi\big(h^2 - (Ph)^2\big)$, where $h$ solves Poisson's equation $Dh = -c + \eta$.
14 Example: simulating the single queue, the CLT variance is $O\big(\tfrac{1}{(1-\rho)^4}\big)$.
15 [Figure: histogram of $n^{-1}\sum_{t=1}^{n} Q(t) - \eta$ at $\rho = 0.9$, compared with the Gaussian density; see Ch. 11 of CTCN.]
16 Worse, the LDP rate function is null: with $\eta_n = n^{-1}\sum_{t=0}^{n-1} Q(t)$, $\lim_{n\to\infty} \tfrac{1}{n} \log \mathsf{P}\{\eta_n \ge r\} = 0$ for every $r > \eta$.
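The standard estimator is easy to reproduce numerically. The sketch below uses a hypothetical discrete-time reflected random walk as a stand-in for the single queue (the model and parameter values are illustrative assumptions, not from the talk): with up-probability $p = 1/3$ the load is $\rho = p/(1-p) = 1/2$ and the stationary mean queue length is $\rho/(1-\rho) = 1$.

```python
import random

# Reflected random walk model of a single queue: X(t+1) = max(X(t) + D(t+1), 0),
# with P(D = +1) = p (arrival) and P(D = -1) = 1 - p (service).
# Load rho = p / (1 - p); for rho < 1 the stationary mean is rho / (1 - rho).
def standard_estimator(p, k, seed=0):
    rng = random.Random(seed)
    x, total = 0, 0.0
    for _ in range(k):
        total += x                       # cost c(x) = x
        x = max(x + (1 if rng.random() < p else -1), 0)
    return total / k                     # eta_k = k^{-1} sum_t c(X(t))

eta_hat = standard_estimator(p=1/3, k=200_000)   # rho = 0.5, true mean 1.0
print(eta_hat)
```

Rerunning with loads closer to 1 shows the fluctuations of $\eta_k$ growing rapidly, consistent with the $O(1/(1-\rho)^4)$ variance on the slide.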
17 Value function approximation: simulation with control variates. Take $g \colon X \to \mathbb{R}$ with known mean $\pi(g) = 0$, and form the smoothed estimator $\eta_k(\vartheta) = k^{-1}\sum_{t=0}^{k-1} \big(c(X(t)) - \vartheta\, g(X(t))\big)$, $k \ge 1$. The perfect estimator uses $g = h - Ph = -Dh = c - \eta$, giving zero CLT variance.
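The perfect control variate can be exhibited exactly on a truncated chain, where Poisson's equation is a finite linear system. A minimal sketch, assuming a toy reflected-walk queue truncated at level $N$ (all parameter values are illustrative): with $g = h - Ph = c - \eta$, every per-sample term of the smoothed estimator equals $\eta$, so its sample variance vanishes up to floating-point error.

```python
import numpy as np

p, q, N = 1/3, 2/3, 60              # up/down probabilities and truncation level (toy values)

# Transition matrix of the reflected walk, truncated at N
P = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    P[x, max(x - 1, 0)] += q        # down step, reflected at 0
    P[x, min(x + 1, N)] += p        # up step, reflected at N
c = np.arange(N + 1.0)              # cost c(x) = x

# Stationary distribution: left Perron eigenvector, pi P = pi
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()
eta = pi @ c                        # true average cost

# Solve Poisson's equation Dh = -c + eta (D = P - I), normalized by h(0) = 0
A = np.vstack([P - np.eye(N + 1), np.eye(1, N + 1)])
rhs = np.concatenate([-c + eta, [0.0]])
h, *_ = np.linalg.lstsq(A, rhs, rcond=None)
g = h - P @ h                       # control variate: pi(g) = 0, and g = c - eta here

# Simulate; the smoothed per-sample terms c(X) - g(X) are constant
rng = np.random.default_rng(0)
x, std_terms, cv_terms = 0, [], []
for _ in range(20_000):
    std_terms.append(c[x])
    cv_terms.append(c[x] - g[x])    # equals eta for the perfect control variate
    x = min(x + 1, N) if rng.random() < p else max(x - 1, 0)

print(eta, np.var(std_terms), np.var(cv_terms))
```

In practice $h$ is unknown, which is exactly why the slides substitute the fluid value function as an approximation.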
18 Steady-state customer population. Smoothed estimator using the fluid value function: $g = h - Ph = -Dh$, with $h = \hat{J}$. [Figure: standard vs. smoothed estimator trajectories.]
19 Performance as a function of safety stocks $(w_1, w_2)$: (i) simulation using the smoothed estimator; (ii) simulation using the standard estimator. [Figure: estimated cost surfaces.]
20 Outline: Value function approximation; Performance evaluation; Performance improvement; Simulation; TD learning; Numerics; Conclusions.
21 Value function approximation. Linear parametrization: $h^\theta(x) := \theta^{\mathsf{T}} \psi(x) = \sum_{i=1}^{\ell_h} \theta_i \psi_i(x)$, $x \in X$. How to find the best parameter $\theta$? How to define "best"?
22 Value function approximation: Hilbert space setting. With the linear parametrization $h^\theta(x) := \theta^{\mathsf{T}}\psi(x)$, minimize the mean-square error $\|h^\theta - h\|_\pi^2 = \mathsf{E}_\pi\big[(h^\theta(X(t)) - h(X(t)))^2\big]$.
23 The gradient is $\nabla_\theta\|h^\theta - h\|_\pi^2 = 2\,\mathsf{E}_\pi\big[(h^\theta(X(t)) - h(X(t)))\,\psi(X(t))\big]$; setting the gradient to zero gives the optimal parameter:
24 $\theta^* = \Sigma^{-1} b$, where $b = \mathsf{E}_\pi[h(X)\psi(X)]$ and $\Sigma = \mathsf{E}_\pi[\psi(X)\psi(X)^{\mathsf{T}}]$.
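For a small truncated chain, $\theta^*$ can be computed exactly rather than learned. A sketch under assumed toy parameters (reflected-walk queue, basis $\psi(x) = (x, x^2)$, none of which come from the talk): the discounted value function is obtained by matrix inversion, and the resulting $\theta^* = \Sigma^{-1}b$ satisfies the orthogonality property $\mathsf{E}_\pi[(h - h^{\theta^*})\,\psi_i(X)] = 0$ that characterizes the projection.

```python
import numpy as np

p, q, N, gamma = 1/3, 2/3, 60, 0.05     # toy model parameters (assumptions)

# Truncated reflected-walk transition matrix and cost
P = np.zeros((N + 1, N + 1))
for x in range(N + 1):
    P[x, max(x - 1, 0)] += q
    P[x, min(x + 1, N)] += p
c = np.arange(N + 1.0)

# Discounted value function: h = [(1+gamma) I - P]^{-1} c (resolvent applied to c)
h = np.linalg.solve((1 + gamma) * np.eye(N + 1) - P, c)

# Stationary distribution, then the normal equations theta* = Sigma^{-1} b
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmin(np.abs(evals - 1))])
pi /= pi.sum()
Psi = np.column_stack([c, c**2])        # psi_1(x) = x, psi_2(x) = x^2
Sigma = Psi.T @ (pi[:, None] * Psi)     # E_pi[psi psi^T]
b = Psi.T @ (pi * h)                    # E_pi[h(X) psi(X)]
theta_star = np.linalg.solve(Sigma, b)
print(theta_star)
```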
25 Value function approximation: Hilbert space setting. For two functions $f, g$ on $X$, define the inner product $\langle f, g\rangle_\pi = \pi(fg)$,
26 and the $L_2$ norm $\|f - g\|_\pi^2 := \langle f - g,\ f - g\rangle_\pi = \mathsf{E}_\pi\big[(f(X(0)) - g(X(0)))^2\big]$.
27 Resolvent: $R_\gamma = \sum_{t=0}^{\infty} (1+\gamma)^{-(t+1)} P^t$.
28 In this notation, $b = \mathsf{E}_\pi[h(X)\psi(X)] = \langle R_\gamma c,\ \psi\rangle_\pi$.
29 Adjoint: $R_\gamma^\dagger$ is defined by $\langle R_\gamma f,\ g\rangle_\pi = \langle f,\ R_\gamma^\dagger g\rangle_\pi$ for all $f, g \in L_2(\pi)$.
30 Representing the adjoint. For the stationary process, $\langle R_\gamma f,\ g\rangle_\pi = \sum_{t=0}^{\infty} (1+\gamma)^{-t-1}\, \mathsf{E}\big[P^t f\,(X(0))\, g(X(0))\big]$.
31 By smoothing and stationarity, $\mathsf{E}[P^t f\,(X(0))\, g(X(0))] = \mathsf{E}[f(X(t))\, g(X(0))] = \mathsf{E}[f(X(0))\, g(X(-t))]$.
32 Hence $\langle R_\gamma f,\ g\rangle_\pi = \sum_{t=0}^{\infty} (1+\gamma)^{-t-1}\, \mathsf{E}[f(X(0))\, g(X(-t))]$,
33 which identifies the adjoint: $\langle R_\gamma f,\ g\rangle_\pi = \langle f,\ R_\gamma^\dagger g\rangle_\pi$.
34 The adjoint is thus defined in terms of the time-reversed process: $R_\gamma^\dagger g\,(x) = \sum_{t=0}^{\infty} (1+\gamma)^{-t-1}\, \mathsf{E}[g(X(-t)) \mid X(0) = x]$. Causal, hence computable!
35 TD learning for discounted cost: the gradient is causal, hence computable. With $h = R_\gamma c$,
$\tfrac{1}{2}\nabla_\theta \|h^\theta - h\|_\pi^2 = \mathsf{E}_\pi\big[(h^\theta(X(t)) - h(X(t)))\,\psi(X(t))\big] = \langle h^\theta - R_\gamma c,\ \psi\rangle_\pi = \langle R_\gamma(R_\gamma^{-1}h^\theta - c),\ \psi\rangle_\pi = \langle R_\gamma^{-1}h^\theta - c,\ R_\gamma^\dagger\psi\rangle_\pi$.
36 In steady state this last inner product is $-\mathsf{E}[d(k)\,\varphi(k+1)]$, so $\mathsf{E}[d(k)\,\varphi(k+1)] \approx -\tfrac{1}{2}\nabla_\theta\|h^\theta - h\|_\pi^2$ at $\theta = \theta(k)$, where:
37 (i) the temporal differences are defined by $d(k) = -\big[(1+\gamma)\,h^{\theta(k)}(X(k)) - h^{\theta(k)}(X(k+1)) - c(X(k))\big]$; (ii) the eligibility vectors are the sequence of $\ell_h$-dimensional vectors $\varphi(k) = \sum_{t=0}^{k} (1+\gamma)^{-t-1}\,\psi(X(k-t))$, $k \ge 1$.
38 Algorithms: TD(1). $\theta(k+1) - \theta(k) = a_k\, d(k)\, \varphi(k+1)$, $k \ge 0$, with $a_k$ a stochastic-approximation gain.
39 Consistent under mild conditions: $\theta(k) \to \theta^* = \Sigma^{-1} b$, with $b = \mathsf{E}_\pi[h(X)\psi(X)]$ and $\Sigma = \mathsf{E}_\pi[\psi(X)\psi(X)^{\mathsf{T}}]$.
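A minimal TD(1) sketch, assuming a toy reflected-walk queue, basis $\psi(x) = (x, x^2)$, and an arbitrary gain sequence (none of these choices come from the talk). The eligibility vector is maintained by the recursion $\varphi(k+1) = (1+\gamma)^{-1}(\varphi(k) + \psi(X(k+1)))$, which telescopes the defining sum.

```python
import numpy as np

def td1(p=1/3, gamma=0.05, k_max=50_000, seed=1):
    rng = np.random.default_rng(seed)
    psi = lambda x: np.array([x, x * x], dtype=float)
    theta = np.zeros(2)
    x = 0
    phi = psi(x) / (1 + gamma)                    # phi(0)
    for k in range(k_max):
        x_next = x + 1 if rng.random() < p else max(x - 1, 0)
        # temporal difference: d(k) = -[(1+gamma) h(X(k)) - h(X(k+1)) - c(X(k))]
        d = -((1 + gamma) * theta @ psi(x) - theta @ psi(x_next) - x)
        phi = (phi + psi(x_next)) / (1 + gamma)   # eligibility-vector recursion
        theta += d * phi / (k + 100)              # a_k: a toy decreasing gain
        x = x_next
    return theta

theta = td1()
print(theta)
```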
40 Algorithms: Newton algorithm. $\theta(k+1) = \Big(\tfrac{1}{k}\sum_{i=1}^{k} \psi(X(i+1))\,\psi^{\mathsf{T}}(X(i+1))\Big)^{-1} \Big(\tfrac{1}{k}\sum_{i=1}^{k} c(X(i))\,\varphi(i+1)\Big)$. Consistent under mild conditions: $\theta(k) \to \theta^* = \Sigma^{-1} b$, with $b = \mathsf{E}_\pi[h(X)\psi(X)]$ and $\Sigma = \mathsf{E}_\pi[\psi(X)\psi(X)^{\mathsf{T}}]$.
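The Newton form admits a direct batch implementation: accumulate estimates of $\Sigma = \mathsf{E}_\pi[\psi\psi^{\mathsf{T}}]$ and $b = \mathsf{E}[c(X(k))\,\varphi(k)]$ along one sample path, then solve. A sketch under an assumed toy model (reflected-walk queue, quadratic basis; illustrative, not from the talk):

```python
import numpy as np

def newton_td(p=1/3, gamma=0.05, k_max=100_000, seed=2):
    rng = np.random.default_rng(seed)
    psi = lambda x: np.array([x, x * x], dtype=float)
    Sigma = np.zeros((2, 2))
    b = np.zeros(2)
    x, phi = 0, np.zeros(2)
    for _ in range(k_max):
        phi = (phi + psi(x)) / (1 + gamma)    # eligibility vector phi(k)
        Sigma += np.outer(psi(x), psi(x))     # accumulates E[psi psi^T]
        b += x * phi                          # c(x) = x; accumulates E[c(X(k)) phi(k)]
        x = x + 1 if rng.random() < p else max(x - 1, 0)
    return np.linalg.solve(Sigma, b)          # the 1/k normalizations cancel

theta = newton_td()
print(theta)
```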
41 TD learning for average cost. Potential matrix: $R_0 g\,(x) = \mathsf{E}_x\big[\sum_{t=0}^{\sigma_\alpha} g(X(t))\big]$, $x \in X$, where $\sigma_\alpha$ is the hitting time to a fixed state $\alpha \in X$.
42 $R_0 g$ solves Poisson's equation when $g = c - \pi(c)$.
43 Adjoint: $R_0^\dagger g\,(x) = \mathsf{E}\big[\sum_{t=\sigma_\alpha^{[0]}}^{0} g(X(t)) \,\big|\, X(0) = x\big]$, $g \in L_2(\pi)$, where $\sigma_\alpha^{[k]} = \max\{t \le k : X(t) = \alpha\}$ is the backward hitting time.
44 Value function approximation: other metrics. Bellman error (discounted cost): $\mathsf{E}_\pi\big[(Ph^\theta(X) - (1+\gamma)h^\theta(X) + c(X))^2\big]$.
45 CLT error: $\sigma^2(\theta) = \mathsf{E}_\pi\big[(h - h^\theta)^2 - (Ph - Ph^\theta)^2\big]$, where $h$ solves Poisson's equation.
46 In each case one can construct an adjoint equation to solve for the optimal parameter.
47 Outline: Value function approximation; Performance evaluation; Performance improvement; Simulation; TD learning; Numerics; Conclusions.
48 Simulating the M/M/1 queue. $X(t+1) = [X(t) + \Delta(t+1)]_+$, $X(0) = x \in \mathbb{Z}_+$. The increment process $\Delta$ is i.i.d.: $\Delta(t) = -1$ with probability $p = (\delta + n)/(n+1)$, and $\Delta(t) = n$ with probability $1 - p$, so that Mean $= -\delta$ and Variance $= O(n)$. Two special cases with $\delta = \tfrac{1}{10}$: 1. the M/M/1 queue, $n = 1$; 2. a bursty queue, $n = 4$.
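The stated mean is a one-line calculation, $\mathsf{E}[\Delta] = -1 \cdot p + n(1-p) = -\delta$ when $p = (\delta+n)/(n+1)$, which the following sketch checks by simulation (sample size and seed are arbitrary choices):

```python
import random

# Increment distribution from the slide: Delta = -1 w.p. p = (delta + n)/(n + 1),
# Delta = n w.p. 1 - p, so E[Delta] = -delta and Var(Delta) = O(n).
def sample_increment(delta, n, rng):
    p = (delta + n) / (n + 1)
    return -1 if rng.random() < p else n

rng = random.Random(0)
delta, n = 0.1, 4                      # the bursty case: delta = 1/10, n = 4
m = sum(sample_increment(delta, n, rng) for _ in range(500_000)) / 500_000
print(m)                               # should be close to -delta = -0.1
```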
49 Simulating the M/M/1 queue. Smoothed estimator: $\tfrac{1}{N}\sum_{t=0}^{N-1}\big(X(t) - g^{\theta_t}(X(t))\big)$, with $g^\theta = h^\theta - Ph^\theta$ and $h^\theta(x) = \theta_1 x + \theta_2 x^2$. [Figure: trajectories of $\theta_1, \theta_2$ and their limits $\theta_1^*, \theta_2^*$ for $\rho = 0.8$ and $\rho = 0.9$.]
50 Simulating the M/M/1 queue. State weighting for variance reduction: $\|f - g\|^2_{\pi,\Omega} := \mathsf{E}\big[(f(X) - g(X))^2/\Omega(X)\big]$, with $h^\theta(x) = \theta_1 x + \theta_2 x^2$ as before.
51 Quartic weighting, $\Omega(x) = [1 + (\mu - \alpha)x]^4$, results in much lower variance. [Figure: trajectories of $\theta_1, \theta_2$ under the weighted norm.]
52 Outline: Value function approximation; Performance evaluation; Performance improvement; Simulation; TD learning; Numerics; Conclusions.
53 Conclusions. Value function approximation is as fundamental as the Kalman filter. Many questions remain: variance of the Newton and stochastic-approximation recursions; parametrizations in networks; policy improvement and approximate dynamic programming; other applications, such as MCMC (very recent work by Van Roy et al.).
Elements of Lecture II Hamid R. Rabiee with thanks to Ali Jalali Overview Reading Assignment Chapter 9 of textbook Further Resources MIT Open Course Ware S. Karlin and H. M. Taylor, A First Course in Stochastic
More informationConcentration inequalities for Feynman-Kac particle models. P. Del Moral. INRIA Bordeaux & IMB & CMAP X. Journées MAS 2012, SMAI Clermond-Ferrand
Concentration inequalities for Feynman-Kac particle models P. Del Moral INRIA Bordeaux & IMB & CMAP X Journées MAS 2012, SMAI Clermond-Ferrand Some hyper-refs Feynman-Kac formulae, Genealogical & Interacting
More informationStatistics 150: Spring 2007
Statistics 150: Spring 2007 April 23, 2008 0-1 1 Limiting Probabilities If the discrete-time Markov chain with transition probabilities p ij is irreducible and positive recurrent; then the limiting probabilities
More informationSmoothers: Types and Benchmarks
Smoothers: Types and Benchmarks Patrick N. Raanes Oxford University, NERSC 8th International EnKF Workshop May 27, 2013 Chris Farmer, Irene Moroz Laurent Bertino NERSC Geir Evensen Abstract Talk builds
More informationPositive Harris Recurrence and Diffusion Scale Analysis of a Push Pull Queueing Network. Haifa Statistics Seminar May 5, 2008
Positive Harris Recurrence and Diffusion Scale Analysis of a Push Pull Queueing Network Yoni Nazarathy Gideon Weiss Haifa Statistics Seminar May 5, 2008 1 Outline 1 Preview of Results 2 Introduction Queueing
More informationIntroduction to Markov Chain Monte Carlo & Gibbs Sampling
Introduction to Markov Chain Monte Carlo & Gibbs Sampling Prof. Nicholas Zabaras Sibley School of Mechanical and Aerospace Engineering 101 Frank H. T. Rhodes Hall Ithaca, NY 14853-3801 Email: zabaras@cornell.edu
More informationUniqueness of random gradient states
p. 1 Uniqueness of random gradient states Codina Cotar (Based on joint work with Christof Külske) p. 2 Model Interface transition region that separates different phases p. 2 Model Interface transition
More informationIntroduction to Rare Event Simulation
Introduction to Rare Event Simulation Brown University: Summer School on Rare Event Simulation Jose Blanchet Columbia University. Department of Statistics, Department of IEOR. Blanchet (Columbia) 1 / 31
More informationIEOR 6711, HMWK 5, Professor Sigman
IEOR 6711, HMWK 5, Professor Sigman 1. Semi-Markov processes: Consider an irreducible positive recurrent discrete-time Markov chain {X n } with transition matrix P (P i,j ), i, j S, and finite state space.
More informationLectures on Stochastic Stability. Sergey FOSS. Heriot-Watt University. Lecture 4. Coupling and Harris Processes
Lectures on Stochastic Stability Sergey FOSS Heriot-Watt University Lecture 4 Coupling and Harris Processes 1 A simple example Consider a Markov chain X n in a countable state space S with transition probabilities
More informationGradient-based Adaptive Stochastic Search
1 / 41 Gradient-based Adaptive Stochastic Search Enlu Zhou H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology November 5, 2014 Outline 2 / 41 1 Introduction
More informationQuality of Real-Time Streaming in Wireless Cellular Networks : Stochastic Modeling and Analysis
Quality of Real-Time Streaming in Wireless Cellular Networs : Stochastic Modeling and Analysis B. Blaszczyszyn, M. Jovanovic and M. K. Karray Based on paper [1] WiOpt/WiVid Mai 16th, 2014 Outline Introduction
More informationCondensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C.
Condensed Table of Contents for Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control by J. C. Spall John Wiley and Sons, Inc., 2003 Preface... xiii 1. Stochastic Search
More informationApproximate Bayesian Computation and Particle Filters
Approximate Bayesian Computation and Particle Filters Dennis Prangle Reading University 5th February 2014 Introduction Talk is mostly a literature review A few comments on my own ongoing research See Jasra
More informationRiemann Manifold Methods in Bayesian Statistics
Ricardo Ehlers ehlers@icmc.usp.br Applied Maths and Stats University of São Paulo, Brazil Working Group in Statistical Learning University College Dublin September 2015 Bayesian inference is based on Bayes
More informationSequential Monte Carlo Methods
University of Pennsylvania Bradley Visitor Lectures October 23, 2017 Introduction Unfortunately, standard MCMC can be inaccurate, especially in medium and large-scale DSGE models: disentangling importance
More informationGaussian, Markov and stationary processes
Gaussian, Markov and stationary processes Gonzalo Mateos Dept. of ECE and Goergen Institute for Data Science University of Rochester gmateosb@ece.rochester.edu http://www.ece.rochester.edu/~gmateosb/ November
More informationName of the Student: Problems on Discrete & Continuous R.Vs
Engineering Mathematics 05 SUBJECT NAME : Probability & Random Process SUBJECT CODE : MA6 MATERIAL NAME : University Questions MATERIAL CODE : JM08AM004 REGULATION : R008 UPDATED ON : Nov-Dec 04 (Scan
More information