Risk-Sensitive and Average Optimality in Markov Decision Processes


Risk-Sensitive and Average Optimality in Markov Decision Processes

Karel Sladký

Abstract. This contribution is devoted to risk-sensitive optimality criteria in finite-state Markov decision processes. At first, we rederive necessary and sufficient conditions for average optimality of (classical) risk-neutral unichain models. This approach is then extended to the risk-sensitive case, i.e., when the expectation of the stream of one-stage costs (or rewards) generated by a Markov chain is evaluated by an exponential utility function. We restrict ourselves to irreducible or unichain Markov models where risk-sensitive average optimality is independent of the starting state. As we show, this problem is closely related to the solution of (nonlinear) Poissonian equations and their connections with nonnegative matrices.

Keywords: dynamic programming, stochastic models, risk analysis and management.

JEL classification: C44, C61, C63
AMS classification: 90C40, 60J10, 93E20

1 Notation and Preliminaries

In this note, we consider unichain Markov decision processes with finite state space and compact action spaces, where the stream of costs generated by the Markov process is evaluated by an exponential utility function (so-called risk-sensitive models) with a given risk-sensitivity coefficient.

To this end, let us consider an exponential utility function, say $\bar u_\gamma(\cdot)$, i.e. a separable utility function with constant risk sensitivity $\gamma \in \mathbb{R}$. For $\gamma > 0$ (the risk-averse case) $\bar u_\gamma(\cdot)$ is convex; if $\gamma < 0$ (the risk-seeking case) $\bar u_\gamma(\cdot)$ is concave; finally, if $\gamma = 0$ (the risk-neutral case) $\bar u_\gamma(\cdot)$ is linear. Observe that the exponential utility function $\bar u_\gamma(\cdot)$ is separable, and multiplicative if the risk sensitivity $\gamma \ne 0$ but additive for $\gamma = 0$. In particular, $u_\gamma(\xi_1 + \xi_2) = u_\gamma(\xi_1)\, u_\gamma(\xi_2)$ if $\gamma \ne 0$, whereas $\bar u_\gamma(\xi_1 + \xi_2) = \xi_1 + \xi_2$ for $\gamma = 0$. The utility assigned to the (random) outcome $\xi$ is given by

\[
\bar u_\gamma(\xi) := \begin{cases} (\operatorname{sign} \gamma)\, \exp(\gamma\xi), & \text{if } \gamma \ne 0, \\ \xi, & \text{for } \gamma = 0. \end{cases} \tag{1}
\]

For what follows let $u_\gamma(\xi) := \exp(\gamma\xi)$, hence $\bar u_\gamma(\xi) = (\operatorname{sign}\gamma)\, u_\gamma(\xi)$. Obviously $\bar u_\gamma(\cdot)$ is continuous and strictly increasing. Then for the corresponding certainty equivalent, say $Z_\gamma(\xi)$, since $\bar u_\gamma(Z_\gamma(\xi)) = \mathsf E\,[\bar u_\gamma(\xi)]$ ($\mathsf E$ is reserved for expectation), we immediately get

\[
Z_\gamma(\xi) = \begin{cases} \frac{1}{\gamma}\,\ln\{\mathsf E\, u_\gamma(\xi)\}, & \text{if } \gamma \ne 0, \\ \mathsf E\,[\xi], & \text{for } \gamma = 0. \end{cases} \tag{2}
\]

In what follows, we consider a Markov decision chain $X = \{X_n,\ n = 0, 1, \dots\}$ with finite state space $I = \{1, 2, \dots, N\}$ and a compact set $A_i$ of possible decisions (actions) in state $i \in I$. Supposing that in state $i \in I$ action $a \in A_i$ is selected, state $j$ is reached in the next transition with a given probability $p_{ij}(a)$, and the one-stage transition cost $c_{ij}$ is accrued to this transition.

Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 4, Praha 8, Czech Republic, sladky@utia.cas.cz
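As a quick numerical companion to (1) and (2), the following minimal sketch (ours, not part of the paper; the function name and the two-point cost distribution are invented) evaluates the certainty equivalent $Z_\gamma(\xi)$ directly:

```python
import numpy as np

def certainty_equivalent(outcomes, probs, gamma):
    """Certainty equivalent Z_gamma(xi) of a discrete random outcome xi,
    cf. eq. (2): (1/gamma) ln E[exp(gamma xi)] for gamma != 0, E[xi] otherwise."""
    outcomes = np.asarray(outcomes, dtype=float)
    probs = np.asarray(probs, dtype=float)
    if gamma == 0.0:
        return float(outcomes @ probs)
    return float(np.log(probs @ np.exp(gamma * outcomes)) / gamma)

# Invented two-point random cost: 0 or 10, each with probability 1/2.
xi, p = [0.0, 10.0], [0.5, 0.5]
for g in [-0.5, -0.01, 0.0, 0.01, 0.5]:
    print(f"gamma = {g:+.2f}:  Z_gamma(xi) = {certainty_equivalent(xi, p, g):.4f}")
# For gamma > 0 the certainty equivalent exceeds E[xi] = 5 (risk aversion
# towards costs), for gamma < 0 it falls below, and gamma -> 0 recovers E[xi].
```

By Jensen's inequality the output exceeds $\mathsf E[\xi]$ for $\gamma > 0$ and lies below it for $\gamma < 0$, illustrating the risk-averse and risk-seeking cases.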

A (Markovian) policy controlling the decision process is given by a sequence of decisions (actions) taken at every time point. In particular, a policy controlling the process, $\pi = (f^0, f^1, \dots)$, with $\pi^k = (f^k, f^{k+1}, \dots)$ for $k = 1, 2, \dots$ (hence also $\pi = (f^0, f^1, \dots, f^{k-1}, \pi^k)$), is identified with a sequence of decision vectors $\{f^n,\ n = 0, 1, \dots\}$, where $f^n \in F \equiv A_1 \times \dots \times A_N$ for every $n = 0, 1, 2, \dots$, and $f^n_i \in A_i$ is the decision (or action) taken at the $n$th transition if the chain $X$ is in state $i$. A policy which selects at all times the same decision rule, i.e. $\pi \sim (f)$, is called stationary; then $X$ is a homogeneous Markov chain with transition probability matrix $P(f)$ whose $ij$-th element equals $p_{ij}(f_i)$.

Let $\xi_n^\alpha = \sum_{k=0}^{n-1} \alpha^k c_{X_k,X_{k+1}}$ with $\alpha \in (0,1)$, resp. $\xi_n = \sum_{k=0}^{n-1} c_{X_k,X_{k+1}}$, be the stream of $\alpha$-discounted, resp. undiscounted, transition costs received in the $n$ next transitions of the considered Markov chain $X$. Similarly, let $\xi_{(m,n)}$ be reserved for the total (random) cost obtained from the $m$th up to the $n$th transition (obviously, $\xi_n = c_{X_0,X_1} + \xi_{(1,n)}$). Moreover, if the risk sensitivity $\gamma \ne 0$, then $\bar u_\gamma(\xi_n^\alpha) = (\operatorname{sign}\gamma)\, u_\gamma(\xi_n^\alpha)$, resp. $\bar u_\gamma(\xi_n) = (\operatorname{sign}\gamma)\, u_\gamma(\xi_n)$, is the (random) utility assigned to $\xi_n^\alpha$, resp. to $\xi_n$. Observe that $\xi^\alpha := \lim_{n\to\infty}\xi_n^\alpha$ is well defined, hence $u_\gamma(\xi^\alpha) = \exp(\gamma \sum_{k=0}^{\infty} \alpha^k c_{X_k,X_{k+1}})$.

In the overwhelming literature on stochastic dynamic programming, attention was mostly paid to the risk-neutral case, i.e. $\gamma = 0$. The following results and techniques, adapted from [8] and [12], will be useful for the derivation of necessary and sufficient average optimality conditions and for their further extension to risk-sensitive models.

Introducing for arbitrary $g, w_j \in \mathbb{R}$ ($i, j \in I$) the discrepancy function

\[
\varphi_{i,j}(w, g) = c_{i,j} - w_i + w_j - g \tag{3}
\]

we can easily verify the following identities:

\[
\xi_n^\alpha = \frac{1-\alpha^n}{1-\alpha}\, g + w_{X_0} - \alpha^n w_{X_n} + \sum_{k=0}^{n-1} \alpha^k \big[\varphi_{X_k,X_{k+1}}(w, g) - (1-\alpha)\, w_{X_{k+1}}\big] \tag{4}
\]

\[
\xi_n = n g + w_{X_0} - w_{X_n} + \sum_{k=0}^{n-1} \varphi_{X_k,X_{k+1}}(w, g). \tag{5}
\]

If the process starts in state $i$ and policy $\pi = (f^n)$ is followed, then for the expected $\alpha$-discounted, resp. undiscounted, total cost $V_i^\pi(\alpha, n) := \mathsf E_i^\pi\, \xi_n^\alpha$, $V_i^\pi(n) := \mathsf E_i^\pi\, \xi_n$ we immediately get by (4), (5)

\[
V_i^\pi(\alpha, n) = \frac{1-\alpha^n}{1-\alpha}\, g + w_i + \mathsf E_i^\pi \Big\{ \sum_{k=0}^{n-1} \alpha^k \big[\varphi_{X_k,X_{k+1}}(w, g) - (1-\alpha)\, w_{X_{k+1}}\big] - \alpha^n w_{X_n} \Big\} \tag{6}
\]

\[
V_i^\pi(n) = n g + w_i + \mathsf E_i^\pi \Big\{ \sum_{k=0}^{n-1} \varphi_{X_k,X_{k+1}}(w, g) - w_{X_n} \Big\}. \tag{7}
\]

Observe that

\[
\mathsf E_i^\pi \sum_{k=0}^{n-1} \varphi_{X_k,X_{k+1}}(w, g) = \sum_{j\in I} p_{ij}(f^0_i) \Big\{ \varphi_{i,j}(w, g) + \mathsf E_j^{\pi^1} \sum_{k=1}^{n-1} \varphi_{X_k,X_{k+1}}(w, g) \Big\}. \tag{8}
\]

It is well known from the dynamic programming literature (cf. e.g. [1, 6, 9, 10, 16]) that if there exists a state $i_0 \in I$ that is accessible from any state $i \in I$ for every $f \in F$ (∗), then:

(i) For every $f \in F$ the resulting transition probability matrix $P(f)$ is unichain (i.e. $P(f)$ has no two disjoint closed sets).

(ii) There exist a decision vector $\hat f \in F$ along with numbers $\hat w_i$, $i \in I$ (unique up to an additive constant), and $\hat g$, being a solution of the set of (nonlinear) equations

\[
\hat w_i + \hat g = \min_{a \in A_i} \sum_{j\in I} p_{ij}(a)\,[c_{i,j} + \hat w_j] = \sum_{j\in I} p_{ij}(\hat f_i)\,[c_{i,j} + \hat w_j], \tag{9}
\]

\[
\varphi_i(f, \hat f) := \sum_{j\in I} p_{ij}(f_i)\,[c_{i,j} + \hat w_j] - \hat w_i - \hat g \ge 0 \quad\text{with}\quad \varphi_i(\hat f, \hat f) = 0. \tag{10}
\]
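Equation (9) is the classical (risk-neutral) average-cost optimality equation. As a minimal illustration (ours, not the paper's; the two-state data are invented, and relative value iteration is just one standard way to solve (9)), the following sketch computes $\hat g$, $\hat w$ and the minimising decision vector $\hat f$:

```python
import numpy as np

# Invented two-state, two-action unichain model: P[a][i, j] = p_ij(a),
# C[i, j] = c_ij (as in the paper, costs depend on the transition (i, j)).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.3, 0.7], [0.6, 0.4]])]
C = np.array([[1.0, 4.0], [3.0, 2.0]])

def relative_value_iteration(P, C, iters=1000):
    """Solve w_i + g = min_a sum_j p_ij(a) (c_ij + w_j), cf. eq. (9)."""
    N = C.shape[0]
    w = np.zeros(N)
    for _ in range(iters):
        # Expected one-stage cost plus continuation value, per state and action.
        Tw = np.array([[P[a][i] @ (C[i] + w) for a in range(len(P))]
                       for i in range(N)])
        w_new = Tw.min(axis=1)
        g = w_new[0] - w[0]       # gain estimate via the reference state 0
        w = w_new - w_new[0]      # keep w_0 = 0: w is unique up to a constant
    return g, w, Tw.argmin(axis=1)

g_hat, w_hat, f_hat = relative_value_iteration(P, C)
print("g-hat:", round(g_hat, 5), " w-hat:", np.round(w_hat, 5),
      " minimising decisions f-hat:", f_hat)
```

Pinning $w_0 = 0$ exploits the fact, stated above, that the $\hat w_i$ are unique only up to an additive constant.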

From (6), (7), (10) we immediately get for $V_i^\pi(\alpha) := \lim_{n\to\infty} V_i^\pi(\alpha, n)$

\[
V_i^{\hat\pi}(\alpha) = \frac{1}{1-\alpha}\,\hat g + \hat w_i - (1-\alpha)\, \mathsf E_i^{\hat\pi} \sum_{k=0}^{\infty} \alpha^k \hat w_{X_{k+1}}, \qquad V_i^{\hat\pi}(n) = n\hat g + \hat w_i - \mathsf E_i^{\hat\pi}\, \hat w_{X_n},
\]

hence for the stationary policy $\hat\pi \sim (\hat f)$ and an arbitrary policy $\pi = (f^n)$

\[
\lim_{n\to\infty} \frac{1}{n}\, V_i^{\hat\pi}(n) = \lim_{\alpha\to 1}\,(1-\alpha)\, V_i^{\hat\pi}(\alpha) = \hat g \tag{11}
\]

\[
\lim_{n\to\infty} \frac{1}{n}\, V_i^{\pi}(n) = \hat g = \lim_{\alpha\to 1}\,(1-\alpha)\, V_i^{\pi}(\alpha) \quad\text{if and only if}\quad \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} \mathsf E_i^\pi\, \varphi_{X_k}(f^k, \hat f) = 0. \tag{12}
\]

2 Risk-Sensitive Optimality

On inserting the discrepancy function given by (3) into the exponential function $u_\gamma(\cdot)$, by (4) we get for the stream of discounted costs

\[
u_\gamma(\xi_n^\alpha) = e^{\gamma \sum_{k=0}^{n-1}\alpha^k c_{X_k,X_{k+1}}} = e^{\gamma\left[\frac{1-\alpha^n}{1-\alpha}\, g + w_{X_0} - \alpha^n w_{X_n}\right]}\cdot e^{\gamma \sum_{k=0}^{n-1} \alpha^k [\varphi_{X_k,X_{k+1}}(w,g) - (1-\alpha)\, w_{X_{k+1}}]} \tag{13}
\]

and for $U_i^\pi(\gamma, \alpha, n) := \mathsf E_i^\pi\, u_\gamma(\xi_n^\alpha)$ we have

\[
U_i^\pi(\gamma, \alpha, n) = e^{\gamma\left[\frac{1-\alpha^n}{1-\alpha}\, g + w_i\right]}\, \mathsf E_i^\pi\Big\{ e^{\gamma \sum_{k=0}^{n-1} \alpha^k [\varphi_{X_k,X_{k+1}}(w,g) - (1-\alpha)\, w_{X_{k+1}}] - \gamma\alpha^n w_{X_n}} \Big\}. \tag{14}
\]

Observe that the $w_i$'s are bounded, i.e. $|w_{X_k}| \le K$ for some $K \ge 0$. Hence it holds that

\[
e^{-|\gamma| K} \le e^{-\gamma(1-\alpha)\sum_{k=1}^{\infty}\alpha^{k-1} w_{X_k}} \le e^{|\gamma| K} \tag{15}
\]

and for $n$ tending to infinity we immediately get from (14), for $U_i^\pi(\gamma, \alpha) := \lim_{n\to\infty} U_i^\pi(\gamma, \alpha, n)$,

\[
U_i^\pi(\gamma, \alpha) = e^{\gamma\left[\frac{1}{1-\alpha}\, g + w_i\right]}\, \mathsf E_i^\pi\, e^{\gamma \sum_{k=0}^{\infty} \alpha^k [\varphi_{X_k,X_{k+1}}(w,g) - (1-\alpha)\, w_{X_{k+1}}]}. \tag{16}
\]

Similarly, for undiscounted models we get by (3), (5)

\[
U_i^\pi(\gamma, n) = e^{\gamma[n g + w_i]}\, \mathsf E_i^\pi\, e^{\gamma\left[\sum_{k=0}^{n-1}\varphi_{X_k,X_{k+1}}(w,g) - w_{X_n}\right]}. \tag{17}
\]

Now observe that

\[
\mathsf E_i^\pi\, e^{\gamma \sum_{k=0}^{n-1}\varphi_{X_k,X_{k+1}}(w,g)} = \sum_{j\in I} p_{ij}(f^0_i)\, e^{\gamma[c_{i,j} - w_i + w_j - g]}\; \mathsf E_j^{\pi^1}\, e^{\gamma \sum_{k=1}^{n-1}\varphi_{X_k,X_{k+1}}(w,g)} \tag{18}
\]

\[
\mathsf E^{\pi^m}\big\{ e^{\gamma \sum_{k=m}^{n-1}\varphi_{X_k,X_{k+1}}(w,g)} \,\big|\, X_m = j \big\} = \sum_{l\in I} p_{jl}(f^m_j)\, e^{\gamma[c_{j,l} - w_j + w_l - g]}\; \mathsf E_l^{\pi^{m+1}}\, e^{\gamma \sum_{k=m+1}^{n-1}\varphi_{X_k,X_{k+1}}(w,g)}. \tag{19}
\]

Employing (18), the following facts can be easily verified by (16), (17).
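The recursions (18), (19) factorise the expected exponential utility one transition at a time. For a stationary policy $\pi \sim (f)$ they imply that $\mathsf E_i^\pi\, e^{\gamma \xi_n} = (Q(f)^n \mathbf 1)_i$, where $Q(f)$ is the nonnegative matrix with entries $p_{ij}(f_i)\, e^{\gamma c_{ij}}$ (introduced formally in Section 3 below) and $\mathbf 1$ is the vector of ones. The sketch below (ours; invented data) cross-checks this exact formula against a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.1
P = np.array([[0.9, 0.1], [0.2, 0.8]])   # P(f) of a fixed stationary policy
C = np.array([[1.0, 4.0], [3.0, 2.0]])   # one-stage transition costs c_ij
Q = P * np.exp(gamma * C)                # entries p_ij(f_i) exp(gamma c_ij)

n, i0 = 15, 0
# Exact value of E_i exp(gamma xi_n), obtained by iterating the one-step
# factorisation (18)-(19) n times:
exact = (np.linalg.matrix_power(Q, n) @ np.ones(2))[i0]

# Monte Carlo estimate of the same expectation.
samples = []
for _ in range(20000):
    i, total = i0, 0.0
    for _ in range(n):
        j = rng.choice(2, p=P[i])
        total += C[i, j]
        i = j
    samples.append(np.exp(gamma * total))
print("exact:", round(exact, 5), " Monte Carlo:", round(float(np.mean(samples)), 5))
```

The two numbers agree up to sampling error, which is the multiplicative analogue of evaluating expected additive costs through powers of $P(f)$.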

4 Result. (i) Let for a given stationary policy π (f) there exist g(f), w j (f) s such that p ij (f i ) e γ φi,j(w(f),g(f)) =, i I. (20) Then p ij (f i ) e γ[ci,j wi(f)+wj(f) g(f)] = Ui π (γ, α) = e γ( α g(f)+wi(f)) E π γ ( α) i e α k w Xk (f) k= (2) U π i (γ, n) = e γ(ng(f)+w i(f)) E π i e γ w Xn (f) (22) (ii) If it is possible to select g = g and w j = w j s, resp. g = ĝ and w j = ŵ j s, such that for any f F, all i I and some f F, resp. ˆf F, respectively p ij (f i ) e γ φi,j(w,g ) with p ij (f i ) e γ φi,j(ŵ,ĝ) with p ij (f i ) e γ φ i,j(w,g ) = (23) p ij ( ˆf i ) e γ φi,j(ŵ,ĝ) = (24) then e γ( α ĝ+ŵ i) e γ K U ˆπ i (γ, α) e γ(nĝ+ŵi) e γ K U ˆπ i (γ, n) U π i (γ, α) e γ( α g +w i ) e γ K (25) U π i (γ, n)) e γ(ng +w i ) e γ K. (26) Result 2. Let (cf. (2)) Zi π(γ, α) = γ ln U i π(γ, α), Zπ i (γ, n) = γ ln U i π (γ, n). Then by (25), (26) for stationary policies ˆπ ( ˆf), π (f ), and by (6), (7) for an arbitrary policy π = (f n ) n Z i ˆπ (γ, n) = ( α)zi ˆπ (γ, α) = ĝ, α n Zπ i (γ, n) = g, resp. n ln[e π i e γ 3 Poissonian Equations n Zπ i (γ, n) = ĝ, n Zπ i (γ, n) = α ( α)z π i (γ, α) = g (27) if and only if φ Xk,X k+ (w,g ) ] = 0, resp. n ln[e π i e γ φ Xk,X k+ (ŵ,ĝ) ] = 0. (28) The system of equations (20) for the considered stationary policy π (f) and the nonlinear systems of equations (23), (24) for finding stationary policy with maximal/minimal value of g(f) can be also written as e γ[g(f)+w i(f)] = e γ[g +w i ] = max e γ[ĝ+ŵ i] = min p ij (f i ) e γ[c i,j+w j (f)] (i I) (29) p ij (f i ) e γ[ci,j+w j ] (i I) (30) p ij (f i ) e γ[c i,j+ŵ j ] (i I) (3) respectively, for the values g(f), ĝ, g, w i (f), wi, ŵ i (i =,..., N); obviously, these values depend on the selected risk sensitivity γ. Eqs. (30), (3) can be called the γ-average reward/cost optimality equation. In particular, if γ 0 using the Taylor expansion by (29), resp. (3), we have g(f) + w i (f) = p ij (f i ) [c i,j + w j (f)], resp. ĝ + ŵ i = min that well corresponds to (9). p ij (f i ) [c i,j + ŵ j ]

On introducing new variables $v_i(f) := e^{\gamma w_i(f)}$, $\rho(f) := e^{\gamma g(f)}$, and on replacing the transition probabilities $p_{ij}(f_i)$ by the general nonnegative numbers $q_{ij}(f_i) := p_{ij}(f_i)\, e^{\gamma c_{ij}}$, (29) can alternatively be written as the following set of equations

\[
\rho(f)\, v_i(f) = \sum_{j\in I} q_{ij}(f_i)\, v_j(f) \qquad (i\in I) \tag{32}
\]

and (30), (31) can be rewritten as the following sets of nonlinear equations (here $\hat v_i := e^{\gamma \hat w_i}$, $v^*_i := e^{\gamma w^*_i}$, $\hat\rho := e^{\gamma\hat g}$, $\rho^* := e^{\gamma g^*}$)

\[
\rho^*\, v^*_i = \max_{f_i\in A_i} \sum_{j\in I} q_{ij}(f_i)\, v^*_j, \qquad \hat\rho\, \hat v_i = \min_{f_i\in A_i} \sum_{j\in I} q_{ij}(f_i)\, \hat v_j \qquad (i\in I) \tag{33}
\]

called the $\gamma$-average reward/cost optimality equations in multiplicative form.

For what follows it is convenient to consider (32), (33) in matrix form. To this end, we introduce the $N\times N$ matrix $Q(f) = [q_{ij}(f_i)]$ with spectral radius (Perron eigenvalue) $\rho(f)$, along with its right Perron eigenvector $v(f) = [v_i(f)]$; hence (cf. [5]) $\rho(f)\, v(f) = Q(f)\, v(f)$. Similarly, for $v(f^*) = v^*$, $v(\hat f) = \hat v$, (33) can be written in matrix form as

\[
\rho^*\, v^* = \max_{f\in F} Q(f)\, v^*, \qquad \hat\rho\,\hat v = \min_{f\in F} Q(f)\, \hat v. \tag{34}
\]

Recall that the vectorial maximum and minimum in (34) should be considered componentwise and that $\hat v$, $v^*$ are unique up to a multiplicative constant. Furthermore, if the transition probability matrix $P(f)$ is irreducible, then $Q(f)$ is also irreducible and the right Perron eigenvector $v(f)$ can be selected strictly positive. To extend this assertion to unichain models, in contrast to condition (∗) for the risk-neutral case, it is necessary to assume the existence of a state $i_0 \in I$, accessible from any state $i \in I$ for every $f \in F$, that belongs to the basic class of $Q(f)$, i.e. to the irreducible class with spectral radius equal to the Perron eigenvalue of $Q(f)$. (∗∗) If condition (∗∗) is fulfilled, it can be shown (cf. [15]) that in (30), (31) and (33) the eigenvectors $v(f)$, $\hat v$, $v^*$ can be selected strictly positive and that $\rho^*$, resp. $\hat\rho$, is the maximal, resp. minimal, Perron eigenvalue of the matrix family $\{Q(f),\ f\in F\}$. So we have arrived at

Result 3. A sufficient condition for the solvability of the $\gamma$-average reward/cost optimality equations is the existence of a state $i_0 \in I$ fulfilling condition (∗∗) (trivially fulfilled for irreducible models). In particular, for unichain models condition (∗∗) is fulfilled if the risk-sensitivity coefficient $\gamma$ is sufficiently close to zero (cf. [13, 14, 15]). Finding the solution of (34) can be performed by policy or value iteration; details can be found e.g. in [2, 3, 7, 14, 15].

Finally, we rewrite the optimality condition (28) of Result 2 in terms of $Q(f)$, $\hat v$, $\hat\rho$. To this end, first observe that

\[
\mathsf E^{\pi^k}\big\{ e^{\gamma\,\varphi_{X_k,X_{k+1}}(\hat w,\,\hat g)} \,\big|\, X_k = i_k \big\} = \sum_{i_{k+1}\in I} p_{i_k,i_{k+1}}(f^k_{i_k})\, e^{\gamma[c_{i_k,i_{k+1}} + \hat w_{i_{k+1}} - \hat w_{i_k} - \hat g]} = \sum_{i_{k+1}\in I} q_{i_k,i_{k+1}}(f^k_{i_k})\, \hat v_{i_{k+1}}\, \hat v^{-1}_{i_k}\, \hat\rho^{-1}. \tag{35}
\]

So equation (28) can also be written in matrix form as

\[
\lim_{n\to\infty} \frac1n \ln\Big\{ \hat V^{-1}\Big[\prod_{k=0}^{n-1} Q(f^k)\Big]\hat V\,\hat\rho^{-n} \Big\} = \lim_{n\to\infty}\frac1n \ln\Big\{ \hat V^{-1}\Big[\prod_{k=0}^{n-1} Q(f^k)\,\hat\rho^{-1}\Big]\hat V \Big\} = 0 \tag{36}
\]

where the diagonal matrix $\hat V = \operatorname{diag}\,[\hat v_i]$ and $0$ is reserved for a null matrix (the logarithm and the limit being taken entrywise).
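The following sketch (ours; invented two-state data, and only a minimal variant of the value iteration analysed in [2]) implements the value-iteration route to the minimising equation in (34): iterate $v \leftarrow \min_f Q(f)\, v$ componentwise and renormalise. For the strictly positive toy matrices below the normalising factor settles quickly at $\hat\rho$, the iterate at $\hat v$, and the componentwise minimisers give $\hat f$.

```python
import numpy as np

gamma = 0.3
# Invented two-state, two-action data; Q[a] has entries p_ij(a) exp(gamma c_ij).
P = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.3, 0.7], [0.6, 0.4]])]
C = np.array([[1.0, 4.0], [3.0, 2.0]])
Q = [Pa * np.exp(gamma * C) for Pa in P]

def risk_sensitive_value_iteration(Q, iters=1000):
    """Iterate v <- min_f Q(f) v componentwise, cf. the second equation in (34);
    the normalising factor tends to rho-hat and the iterate to v-hat."""
    N = Q[0].shape[0]
    v = np.ones(N)
    for _ in range(iters):
        candidates = np.stack([Qa @ v for Qa in Q])  # one row per action
        v_new = candidates.min(axis=0)
        rho = v_new[0]            # normalise so that v-hat_0 = 1
        v = v_new / rho
    return rho, v, candidates.argmin(axis=0)

rho_hat, v_hat, f_hat = risk_sensitive_value_iteration(Q)
print("rho-hat:", round(rho_hat, 5),
      " g-hat = ln(rho-hat)/gamma:", round(np.log(rho_hat) / gamma, 5),
      " v-hat:", np.round(v_hat, 5), " f-hat:", f_hat)
```

Taking the componentwise minimum over actions is legitimate here because row $i$ of $Q(f)$ depends only on the decision $f_i$ taken in state $i$.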

4 Conclusions

In this note, necessary and sufficient optimality conditions for discrete-time Markov decision chains are obtained, along with equations for average optimal policies, both for risk-neutral and risk-sensitive models. Our analysis is restricted to unichain models, and for the risk-sensitive case some additional assumptions are made. For multichain models it is necessary to find a suitable partition of the state space into nested classes that retain some properties of the unichain model. Some results in this direction can be found in [11, 15, 17, 18].

Acknowledgements

This research was supported by the Czech Science Foundation under Grants P402/11/0150 and P402/10/0956.

References

[1] Bertsekas, D. P.: Dynamic Programming and Optimal Control. Volume 2. Third Edition. Athena Scientific, Belmont, Mass.
[2] Cavazos-Cadena, R. and Montes-de-Oca, R.: The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space. Mathematics of Operations Research 28 (2003).
[3] Cavazos-Cadena, R.: Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space. Mathematical Methods of Operations Research 57 (2003).
[4] Cavazos-Cadena, R. and Hernández-Hernández, D.: A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains. Annals of Applied Probability 15 (2005).
[5] Gantmakher, F. R.: The Theory of Matrices. Chelsea, London 1959.
[6] Howard, R. A.: Dynamic Programming and Markov Processes. MIT Press, Cambridge, Mass.
[7] Howard, R. A. and Matheson, J.: Risk-sensitive Markov decision processes. Management Science 23 (1972).
[8] Mandl, P.: On the variance in controlled Markov chains. Kybernetika 7 (1971), 1-12.
[9] Puterman, M. L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York 1994.
[10] Ross, S. M.: Introduction to Stochastic Dynamic Programming. Academic Press, New York 1983.
[11] Rothblum, U. G. and Whittle, P.: Growth optimality for branching Markov decision chains. Mathematics of Operations Research 7 (1982).
[12] Sladký, K.: On the set of optimal controls for Markov chains with rewards. Kybernetika 10 (1974).
[13] Sladký, K.: On dynamic programming recursions for multiplicative Markov decision chains. Mathematical Programming Study 6 (1976).
[14] Sladký, K.: Bounds on discrete dynamic programming recursions I. Kybernetika 16 (1980).
[15] Sladký, K.: Growth rates and average optimality in risk-sensitive Markov decision chains. Kybernetika 44 (2008).
[16] Tijms, H. C.: A First Course in Stochastic Models. Wiley, Chichester.
[17] Whittle, P.: Optimization Over Time: Dynamic Programming and Stochastic Control. Volume II, Chapter 35. Wiley, Chichester 1983.
[18] Zijm, W. H. M.: Nonnegative Matrices in Dynamic Programming. Mathematical Centre Tract, Amsterdam.
