Risk-Sensitive and Average Optimality in Markov Decision Processes
Karel Sladký

Abstract. This contribution is devoted to risk-sensitive optimality criteria in finite state Markov decision processes. At first, we rederive necessary and sufficient conditions for average optimality of (classical) risk-neutral unichain models. This approach is then extended to the risk-sensitive case, i.e., when the expectation of the stream of one-stage costs (or rewards) generated by a Markov chain is evaluated by an exponential utility function. We restrict ourselves to irreducible or unichain Markov models where risk-sensitive average optimality is independent of the starting state. As we show, this problem is closely related to the solution of (nonlinear) Poissonian equations and their connections with nonnegative matrices.

Keywords: dynamic programming, stochastic models, risk analysis and management.

JEL classification: C44, C61, C63
AMS classification: 90C40, 60J10, 93E20

1 Notation and Preliminaries

In this note, we consider unichain Markov decision processes with finite state space and compact action spaces, where the stream of costs generated by the Markov process is evaluated by an exponential utility function (so-called risk-sensitive models) with a given risk-sensitivity coefficient. To this end, let us consider an exponential utility function, say $\bar u_\gamma(\cdot)$, i.e. a separable utility function with constant risk sensitivity $\gamma \in \mathbb{R}$. For $\gamma > 0$ (the risk-averse case) $\bar u_\gamma(\cdot)$ is convex; if $\gamma < 0$ (the risk-seeking case) $\bar u_\gamma(\cdot)$ is concave; finally, if $\gamma = 0$ (the risk-neutral case) $\bar u_\gamma(\cdot)$ is linear.

Observe that the exponential utility function $\bar u_\gamma(\cdot)$ is separable and multiplicative if the risk sensitivity $\gamma \ne 0$ and additive for $\gamma = 0$. In particular, we have $u_\gamma(\xi_1 + \xi_2) = u_\gamma(\xi_1)\, u_\gamma(\xi_2)$ if $\gamma \ne 0$ and $\bar u_\gamma(\xi_1 + \xi_2) = \xi_1 + \xi_2$ for $\gamma = 0$. Then the utility assigned to the (random) outcome $\xi$ is given by

$$\bar u_\gamma(\xi) := \begin{cases} (\operatorname{sign}\gamma)\,\exp(\gamma\xi), & \text{if } \gamma \ne 0, \\ \xi, & \text{for } \gamma = 0. \end{cases} \qquad (1)$$
For what follows let $u_\gamma(\xi) := \exp(\gamma\xi)$, hence $\bar u_\gamma(\xi) = (\operatorname{sign}\gamma)\, u_\gamma(\xi)$. Obviously $\bar u_\gamma(\cdot)$ is continuous and strictly increasing. Then for the corresponding certainty equivalent, say $Z_\gamma(\xi)$, since $\bar u_\gamma(Z_\gamma(\xi)) = \mathbb{E}\,[\bar u_\gamma(\xi)]$ ($\mathbb{E}$ is reserved for expectation), we immediately get

$$Z_\gamma(\xi) = \begin{cases} \frac{1}{\gamma}\ln\{\mathbb{E}\, u_\gamma(\xi)\}, & \text{if } \gamma \ne 0, \\ \mathbb{E}\,[\xi], & \text{for } \gamma = 0. \end{cases} \qquad (2)$$

In what follows, we consider a Markov decision chain $X = \{X_n,\ n = 0, 1, \dots\}$ with finite state space $I = \{1, 2, \dots, N\}$ and a compact set $A_i$ of possible decisions (actions) in state $i \in I$. Supposing that in state $i \in I$ action $a \in A_i$ is selected, then state $j$ is reached in the next transition with a given probability $p_{ij}(a)$ and the one-stage transition cost $c_{ij}$ is accrued to this transition.

Institute of Information Theory and Automation, Academy of Sciences of the Czech Republic, Pod Vodárenskou věží 4, Praha 8, Czech Republic, sladky@utia.cas.cz
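As a quick numerical illustration of the exponential utility $u_\gamma$ and the certainty equivalent $Z_\gamma$ defined above, the following sketch (all distributions and numbers are illustrative choices of ours, not taken from the paper) estimates $Z_\gamma(\xi)$ by Monte Carlo and checks the multiplicativity of $u_\gamma$:

```python
import numpy as np

def utility(xi, gamma):
    """u_gamma(xi) = exp(gamma * xi); the signed utility is (sign gamma) * u_gamma(xi)."""
    return xi if gamma == 0 else np.exp(gamma * xi)

def certainty_equivalent(samples, gamma):
    """Z_gamma(xi) = (1/gamma) ln E[u_gamma(xi)] for gamma != 0, and E[xi] for gamma = 0."""
    samples = np.asarray(samples, dtype=float)
    if gamma == 0:
        return samples.mean()
    return np.log(np.mean(np.exp(gamma * samples))) / gamma

# Multiplicativity: u_gamma(xi1 + xi2) = u_gamma(xi1) * u_gamma(xi2) for gamma != 0.
assert np.isclose(utility(1.5, 0.7), utility(1.2, 0.7) * utility(0.3, 0.7))

# Illustrative random cost: for Normal(mu, sigma^2) one has Z_gamma = mu + gamma*sigma^2/2,
# so a risk-averse evaluation (gamma > 0) of a random cost exceeds the risk-neutral one.
rng = np.random.default_rng(0)
xi = rng.normal(loc=1.0, scale=0.5, size=200_000)
z_seeking, z_neutral, z_averse = (certainty_equivalent(xi, g) for g in (-1.0, 0.0, 1.0))
assert z_seeking < z_neutral < z_averse
```

The ordering of the three certainty equivalents makes the roles of $\gamma < 0$, $\gamma = 0$ and $\gamma > 0$ concrete: a risk-averse decision maker charges a premium for cost variability, a risk-seeking one discounts it.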
A (Markovian) policy controlling the decision process is given by a sequence of decisions (actions) taken at every time point. In particular, a policy controlling the process, $\pi = (f^0, f^1, \dots)$, with $\pi^k = (f^k, f^{k+1}, \dots)$ for $k = 1, 2, \dots$ (hence also $\pi = (f^0, f^1, \dots, f^{k-1}, \pi^k)$), is identified by a sequence of decision vectors $\{f^n,\ n = 0, 1, \dots\}$ where $f^n \in F \equiv A_1 \times \dots \times A_N$ for every $n = 0, 1, 2, \dots$, and $f^n_i \in A_i$ is the decision (or action) taken at the $n$th transition if the chain $X$ is in state $i$. A policy which selects at all times the same decision vector, i.e. $\pi \sim (f)$, is called stationary; then $X$ is a homogeneous Markov chain with transition probability matrix $P(f)$ whose $ij$th element equals $p_{ij}(f_i)$.

Let $\xi^\alpha_n = \sum_{k=0}^{n-1} \alpha^k c_{X_k, X_{k+1}}$ with $\alpha \in (0,1)$, resp. $\xi_n = \sum_{k=0}^{n-1} c_{X_k, X_{k+1}}$, be the stream of $\alpha$-discounted, resp. undiscounted, transition costs received in the $n$ next transitions of the considered Markov chain $X$. Similarly, let $\xi^{(m,n)}$ be reserved for the total (random) cost obtained from the $m$th up to the $n$th transition (obviously, $\xi_n = c_{X_0, X_1} + \xi^{(1,n)}$). Moreover, if the risk sensitivity $\gamma \ne 0$, then $\bar u_\gamma(\xi^\alpha_n) = (\operatorname{sign}\gamma)\, u_\gamma(\xi^\alpha_n)$, resp. $\bar u_\gamma(\xi_n) = (\operatorname{sign}\gamma)\, u_\gamma(\xi_n)$, is the (random) utility assigned to $\xi^\alpha_n$, resp. to $\xi_n$. Observe that $\xi^\alpha := \lim_{n\to\infty} \xi^\alpha_n$ is well defined, hence $u_\gamma(\xi^\alpha) = \exp(\gamma \sum_{k=0}^{\infty} \alpha^k c_{X_k, X_{k+1}})$.

In the overwhelming literature on stochastic dynamic programming, attention has mostly been paid to the risk-neutral case, i.e. $\gamma = 0$. The following results and techniques, adapted from [8] and [12], will be useful for the derivation of necessary and sufficient average optimality conditions and for their further extension to risk-sensitive models. Introducing for arbitrary $g, w_j \in \mathbb{R}$ ($i, j \in I$) the discrepancy function

$$\varphi_{i,j}(w, g) = c_{i,j} - w_i + w_j - g \qquad (3)$$

we can easily verify the following identities:

$$\xi^\alpha_n = \frac{1-\alpha^n}{1-\alpha}\, g + w_{X_0} - \alpha^n w_{X_n} + \sum_{k=0}^{n-1} \alpha^k \big[\varphi_{X_k, X_{k+1}}(w, g) - (1-\alpha)\, w_{X_{k+1}}\big] \qquad (4)$$

$$\xi_n = n g + w_{X_0} - w_{X_n} + \sum_{k=0}^{n-1} \varphi_{X_k, X_{k+1}}(w, g). \qquad (5)$$
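Identities (4) and (5) hold pathwise, for arbitrary constants $g$ and $w_j$, not only in expectation. The sketch below (a hypothetical two-state chain under a fixed stationary policy; all numbers are illustrative) simulates one trajectory and verifies (5) exactly along it:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-state chain under a fixed stationary policy:
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])           # transition probabilities p_ij
C = np.array([[1.0, 3.0],
              [2.0, 0.5]])           # one-stage transition costs c_ij

g, w = 1.4, np.array([0.2, -0.6])    # arbitrary constants; (5) needs no special choice

def phi(i, j):
    """Discrepancy function phi_ij(w, g) = c_ij - w_i + w_j - g, cf. (3)."""
    return C[i, j] - w[i] + w[j] - g

n, x = 50, 0
path = [x]
for _ in range(n):
    x = rng.choice(2, p=P[x])        # sample next state from row x of P
    path.append(x)

xi_n = sum(C[path[k], path[k + 1]] for k in range(n))   # undiscounted cost stream
rhs = n * g + w[path[0]] - w[path[n]] + sum(phi(path[k], path[k + 1]) for k in range(n))
assert np.isclose(xi_n, rhs)         # identity (5) holds along every sample path
```

Since the identity is algebraic (the $w$ terms telescope), the assertion succeeds for any seed and any choice of $g$ and $w$.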
If the process starts in state $i$ and policy $\pi = (f^n)$ is followed, then for the expected $\alpha$-discounted, resp. undiscounted, total cost $V^\pi_i(\alpha, n) := \mathbb{E}^\pi_i\, \xi^\alpha_n$, $V^\pi_i(n) := \mathbb{E}^\pi_i\, \xi_n$ we immediately get by (4), (5)

$$V^\pi_i(\alpha, n) = \frac{1-\alpha^n}{1-\alpha}\, g + w_i + \mathbb{E}^\pi_i \Big\{ \sum_{k=0}^{n-1} \alpha^k \big[\varphi_{X_k, X_{k+1}}(w, g) - (1-\alpha) w_{X_{k+1}}\big] - \alpha^n w_{X_n} \Big\} \qquad (6)$$

$$V^\pi_i(n) = n g + w_i + \mathbb{E}^\pi_i \Big\{ \sum_{k=0}^{n-1} \varphi_{X_k, X_{k+1}}(w, g) - w_{X_n} \Big\}. \qquad (7)$$

Observe that

$$\mathbb{E}^\pi_i \sum_{k=0}^{n-1} \varphi_{X_k, X_{k+1}}(w, g) = \sum_{j \in I} p_{ij}(f^0_i) \Big\{ \varphi_{i,j}(w, g) + \mathbb{E}^{\pi^1}_j \sum_{k=1}^{n-1} \varphi_{X_k, X_{k+1}}(w, g) \Big\}. \qquad (8)$$

It is well known from the dynamic programming literature (cf. e.g. [1, 6, 9, 10, 16]) that if there exists a state $i_0 \in I$ that is accessible from any state $i \in I$ for every $f \in F$ (condition ($*$)), then:

(i) For every $f \in F$ the resulting transition probability matrix $P(f)$ is unichain (i.e. $P(f)$ has no two disjoint closed sets).

(ii) There exists a decision vector $\hat f \in F$ along with numbers $\hat w_i$, $i \in I$ (unique up to an additive constant), and $\hat g$ being the solution of the set of (nonlinear) equations

$$\hat w_i + \hat g = \min_{a \in A_i} \sum_{j \in I} p_{ij}(a)\,[c_{i,j} + \hat w_j] = \sum_{j \in I} p_{ij}(\hat f_i)\,[c_{i,j} + \hat w_j], \qquad (9)$$

$$\varphi_i(f, \hat f) := \sum_{j \in I} p_{ij}(f_i)\,[c_{i,j} + \hat w_j] - \hat w_i - \hat g \ \ge\ 0 \quad \text{with } \varphi_i(\hat f, \hat f) = 0. \qquad (10)$$
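For a small example, the optimality equation (9) can be solved numerically by relative value iteration, a standard scheme from the risk-neutral dynamic programming literature (not a method specific to this paper). The sketch below uses a hypothetical 2-state, 2-action cost-minimization model with made-up data:

```python
import numpy as np

# Hypothetical 2-state, 2-action cost-minimization MDP (all numbers illustrative).
P = [np.array([[0.9, 0.1], [0.4, 0.6]]),    # P[a][i, j] = p_ij(a)
     np.array([[0.2, 0.8], [0.7, 0.3]])]
C = [np.array([[1.0, 4.0], [3.0, 1.0]]),    # C[a][i, j] = c_ij when action a is used
     np.array([[2.0, 0.5], [0.5, 2.0]])]

def bellman(V):
    """One dynamic-programming step: min over a of sum_j p_ij(a) (c_ij + V_j)."""
    return np.min([(P[a] * (C[a] + V)).sum(axis=1) for a in (0, 1)], axis=0)

# Relative value iteration: subtract the first component each step so the iterates
# stay bounded; the subtracted increment converges to the optimal average cost g-hat
# in unichain aperiodic models (both transition matrices here are strictly positive).
V = np.zeros(2)
for _ in range(2000):
    TV = bellman(V)
    g_hat, V = TV[0], TV - TV[0]

w_hat = V   # relative values w_i, unique up to an additive constant
# The pair (g_hat, w_hat) solves the average-cost optimality equation (9):
assert np.allclose(w_hat + g_hat, bellman(w_hat))
```

At the fixed point the assertion is exact: adding any constant to all $w_i$ shifts both sides of (9) equally, which is why the normalization $w_0 = 0$ is harmless.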
From (6), (7), (10) we immediately get for $V^\pi_i(\alpha) := \lim_{n\to\infty} V^\pi_i(\alpha, n)$

$$V^{\hat\pi}_i(\alpha) = \frac{1}{1-\alpha}\, \hat g + \hat w_i - (1-\alpha)\, \mathbb{E}^{\hat\pi}_i \sum_{k=0}^{\infty} \alpha^k \hat w_{X_{k+1}}, \qquad V^{\hat\pi}_i(n) = n \hat g + \hat w_i - \mathbb{E}^{\hat\pi}_i\, \hat w_{X_n},$$

hence for the stationary policy $\hat\pi \sim (\hat f)$ and an arbitrary policy $\pi = (f^n)$

$$\lim_{n\to\infty} \frac{1}{n}\, V^{\hat\pi}_i(n) = \lim_{\alpha \to 1^-} (1-\alpha)\, V^{\hat\pi}_i(\alpha) = \hat g \qquad (11)$$

$$\lim_{n\to\infty} \frac{1}{n}\, V^{\pi}_i(n) = \hat g = \lim_{\alpha \to 1^-} (1-\alpha)\, V^{\pi}_i(\alpha) \quad \text{if and only if} \quad \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} \mathbb{E}^\pi_i\, \varphi_{X_k}(f^k, \hat f) = 0. \qquad (12)$$

2 Risk-Sensitive Optimality

On inserting the discrepancy function given by (3) into the exponential function $u_\gamma(\cdot)$, by (4) we get for the stream of discounted costs

$$u_\gamma(\xi^\alpha_n) = e^{\gamma \sum_{k=0}^{n-1} \alpha^k c_{X_k, X_{k+1}}} = e^{\gamma \big[\frac{1-\alpha^n}{1-\alpha} g + w_{X_0} - \alpha^n w_{X_n}\big]} \cdot e^{\gamma \sum_{k=0}^{n-1} \alpha^k \big[\varphi_{X_k, X_{k+1}}(w, g) - (1-\alpha)\, w_{X_{k+1}}\big]} \qquad (13)$$

and for $U^\pi_i(\gamma, \alpha, n) := \mathbb{E}^\pi_i\, u_\gamma(\xi^\alpha_n)$ we have

$$U^\pi_i(\gamma, \alpha, n) = e^{\gamma \big[\frac{1-\alpha^n}{1-\alpha} g + w_i\big]}\; \mathbb{E}^\pi_i\, e^{\gamma \big\{\sum_{k=0}^{n-1} \alpha^k [\varphi_{X_k, X_{k+1}}(w, g) - (1-\alpha) w_{X_{k+1}}] - \alpha^n w_{X_n}\big\}}. \qquad (14)$$

Observe that the $w_i$'s are bounded, i.e. $|w_{X_k}| \le K$ for some $K \ge 0$. Hence it holds that

$$e^{-|\gamma| K} \ \le\ e^{\gamma (1-\alpha) \sum_{k=0}^{\infty} \alpha^k w_{X_{k+1}}} \ \le\ e^{|\gamma| K} \qquad (15)$$

and, letting $n$ tend to infinity in (14), we immediately get for $U^\pi_i(\gamma, \alpha) := \lim_{n\to\infty} U^\pi_i(\gamma, \alpha, n)$

$$U^\pi_i(\gamma, \alpha) = e^{\gamma \big[\frac{1}{1-\alpha} g + w_i\big]}\; \mathbb{E}^\pi_i\, e^{\gamma \sum_{k=0}^{\infty} \alpha^k \big[\varphi_{X_k, X_{k+1}}(w, g) - (1-\alpha)\, w_{X_{k+1}}\big]}. \qquad (16)$$

Similarly, for undiscounted models we get by (3), (4)

$$U^\pi_i(\gamma, n) = e^{\gamma [n g + w_i]}\; \mathbb{E}^\pi_i\, e^{\gamma \big[\sum_{k=0}^{n-1} \varphi_{X_k, X_{k+1}}(w, g) - w_{X_n}\big]}. \qquad (17)$$

Now observe that

$$\mathbb{E}^\pi_i\, e^{\gamma \sum_{k=0}^{n-1} \varphi_{X_k, X_{k+1}}(w, g)} = \sum_{j \in I} p_{ij}(f^0_i)\, e^{\gamma [c_{i,j} - w_i + w_j - g]}\; \mathbb{E}^{\pi^1}_j\, e^{\gamma \sum_{k=1}^{n-1} \varphi_{X_k, X_{k+1}}(w, g)} \qquad (18)$$

$$\mathbb{E}^{\pi^m}_j \big\{ e^{\gamma \sum_{k=m}^{n-1} \varphi_{X_k, X_{k+1}}(w, g)} \,\big|\, X_m = j \big\} = \sum_{l \in I} p_{jl}(f^m_j)\, e^{\gamma [c_{j,l} - w_j + w_l - g]}\; \mathbb{E}^{\pi^{m+1}}_l\, e^{\gamma \sum_{k=m+1}^{n-1} \varphi_{X_k, X_{k+1}}(w, g)}. \qquad (19)$$

Employing (18), (19), the following facts can be easily verified by (16), (17).
Result 1. (i) Let for a given stationary policy $\pi \sim (f)$ there exist $g(f)$, $w_j(f)$'s such that

$$\sum_{j \in I} p_{ij}(f_i)\, e^{\gamma\, \varphi_{i,j}(w(f),\, g(f))} = \sum_{j \in I} p_{ij}(f_i)\, e^{\gamma [c_{i,j} - w_i(f) + w_j(f) - g(f)]} = 1, \qquad i \in I. \qquad (20)$$

Then

$$U^\pi_i(\gamma, \alpha) = e^{\gamma \big(\frac{1}{1-\alpha} g(f) + w_i(f)\big)}\; \mathbb{E}^\pi_i\, e^{-\gamma (1-\alpha) \sum_{k=1}^{\infty} \alpha^{k-1} w_{X_k}(f)} \qquad (21)$$

$$U^\pi_i(\gamma, n) = e^{\gamma (n g(f) + w_i(f))}\; \mathbb{E}^\pi_i\, e^{-\gamma\, w_{X_n}(f)}. \qquad (22)$$

(ii) If it is possible to select $g = g^*$ and $w_j = w^*_j$'s, resp. $g = \hat g$ and $w_j = \hat w_j$'s, such that for any $f \in F$, all $i \in I$ and some $f^* \in F$, resp. $\hat f \in F$, respectively

$$\sum_{j \in I} p_{ij}(f_i)\, e^{\gamma \varphi_{i,j}(w^*,\, g^*)} \le 1 \quad \text{with} \quad \sum_{j \in I} p_{ij}(f^*_i)\, e^{\gamma \varphi_{i,j}(w^*,\, g^*)} = 1 \qquad (23)$$

$$\sum_{j \in I} p_{ij}(f_i)\, e^{\gamma \varphi_{i,j}(\hat w,\, \hat g)} \ge 1 \quad \text{with} \quad \sum_{j \in I} p_{ij}(\hat f_i)\, e^{\gamma \varphi_{i,j}(\hat w,\, \hat g)} = 1 \qquad (24)$$

then

$$e^{\gamma \big(\frac{1}{1-\alpha} \hat g + \hat w_i\big)}\, e^{-|\gamma| K} \ \le\ U^{\hat\pi}_i(\gamma, \alpha) \ \le\ U^\pi_i(\gamma, \alpha) \ \le\ e^{\gamma \big(\frac{1}{1-\alpha} g^* + w^*_i\big)}\, e^{|\gamma| K} \qquad (25)$$

$$e^{\gamma (n \hat g + \hat w_i)}\, e^{-|\gamma| K} \ \le\ U^{\hat\pi}_i(\gamma, n) \ \le\ U^\pi_i(\gamma, n) \ \le\ e^{\gamma (n g^* + w^*_i)}\, e^{|\gamma| K}. \qquad (26)$$

Result 2. Let (cf. (2)) $Z^\pi_i(\gamma, \alpha) = \frac{1}{\gamma} \ln U^\pi_i(\gamma, \alpha)$, $Z^\pi_i(\gamma, n) = \frac{1}{\gamma} \ln U^\pi_i(\gamma, n)$. Then by (25), (26) for the stationary policies $\hat\pi \sim (\hat f)$, $\pi^* \sim (f^*)$, and by (16), (17) for an arbitrary policy $\pi = (f^n)$,

$$\lim_{n\to\infty} \frac{1}{n} Z^{\hat\pi}_i(\gamma, n) = \lim_{\alpha \to 1^-} (1-\alpha) Z^{\hat\pi}_i(\gamma, \alpha) = \hat g, \qquad \lim_{n\to\infty} \frac{1}{n} Z^{\pi^*}_i(\gamma, n) = \lim_{\alpha \to 1^-} (1-\alpha) Z^{\pi^*}_i(\gamma, \alpha) = g^*, \qquad (27)$$

and for an arbitrary policy $\pi$, $\lim_{n\to\infty} \frac{1}{n} Z^\pi_i(\gamma, n) = g^*$, resp. $\lim_{n\to\infty} \frac{1}{n} Z^\pi_i(\gamma, n) = \hat g$, if and only if

$$\lim_{n\to\infty} \frac{1}{n} \ln \big[\mathbb{E}^\pi_i\, e^{\gamma \sum_{k=0}^{n-1} \varphi_{X_k, X_{k+1}}(w^*,\, g^*)}\big] = 0, \quad \text{resp.} \quad \lim_{n\to\infty} \frac{1}{n} \ln \big[\mathbb{E}^\pi_i\, e^{\gamma \sum_{k=0}^{n-1} \varphi_{X_k, X_{k+1}}(\hat w,\, \hat g)}\big] = 0. \qquad (28)$$

3 Poissonian Equations

The system of equations (20) for the considered stationary policy $\pi \sim (f)$, and the nonlinear systems of equations (23), (24) for finding a stationary policy with maximal/minimal value of $g(f)$, can also be written as

$$e^{\gamma [g(f) + w_i(f)]} = \sum_{j \in I} p_{ij}(f_i)\, e^{\gamma [c_{i,j} + w_j(f)]} \qquad (i \in I) \qquad (29)$$

$$e^{\gamma [g^* + w^*_i]} = \max_{f \in F} \sum_{j \in I} p_{ij}(f_i)\, e^{\gamma [c_{i,j} + w^*_j]} \qquad (i \in I) \qquad (30)$$

$$e^{\gamma [\hat g + \hat w_i]} = \min_{f \in F} \sum_{j \in I} p_{ij}(f_i)\, e^{\gamma [c_{i,j} + \hat w_j]} \qquad (i \in I) \qquad (31)$$

respectively, for the values $g(f)$, $\hat g$, $g^*$, $w_i(f)$, $w^*_i$, $\hat w_i$ ($i = 1, \dots, N$); obviously, these values depend on the selected risk sensitivity $\gamma$. Eqs. (30), (31) can be called the $\gamma$-average reward/cost optimality equations. In particular, if $\gamma \to 0$, using the Taylor expansion in (29), resp. (31), we have

$$g(f) + w_i(f) = \sum_{j \in I} p_{ij}(f_i)\, [c_{i,j} + w_j(f)], \quad \text{resp.} \quad \hat g + \hat w_i = \min_{a \in A_i} \sum_{j \in I} p_{ij}(a)\, [c_{i,j} + \hat w_j],$$

which well corresponds to (9).
On introducing new variables $v_i(f) := e^{\gamma w_i(f)}$, $\rho(f) := e^{\gamma g(f)}$, and on replacing the transition probabilities $p_{ij}(f_i)$'s by general nonnegative numbers defined by $q_{ij}(f_i) := p_{ij}(f_i)\, e^{\gamma c_{ij}}$, (29) can alternatively be written as the following set of equations

$$\rho(f)\, v_i(f) = \sum_{j \in I} q_{ij}(f_i)\, v_j(f) \qquad (i \in I) \qquad (32)$$

and (30), (31) can be rewritten as the following sets of nonlinear equations (here $\hat v_i := e^{\gamma \hat w_i}$, $v^*_i := e^{\gamma w^*_i}$, $\hat\rho := e^{\gamma \hat g}$, $\rho^* := e^{\gamma g^*}$)

$$\rho^* v^*_i = \max_{f \in F} \sum_{j \in I} q_{ij}(f_i)\, v^*_j, \qquad \hat\rho\, \hat v_i = \min_{f \in F} \sum_{j \in I} q_{ij}(f_i)\, \hat v_j \qquad (i \in I) \qquad (33)$$

called the $\gamma$-average reward/cost optimality equations in multiplicative form.

For what follows it is convenient to consider (32), (33) in matrix form. To this end, we introduce the $N \times N$ matrix $Q(f) = [q_{ij}(f_i)]$ with spectral radius (Perron eigenvalue) $\rho(f)$ along with its right Perron eigenvector $v(f) = [v_i(f)]$, hence (cf. [5]) $\rho(f)\, v(f) = Q(f)\, v(f)$. Similarly, for $v(f^*) = v^*$, $v(\hat f) = \hat v$, (33) can be written in matrix form as

$$\rho^* v^* = \max_{f \in F} Q(f)\, v^*, \qquad \hat\rho\, \hat v = \min_{f \in F} Q(f)\, \hat v. \qquad (34)$$

Recall that the vectorial maximum and minimum in (34) should be considered componentwise and that $\hat v$, $v^*$ are unique up to a multiplicative constant. Furthermore, if the transition probability matrix $P(f)$ is irreducible, then $Q(f)$ is also irreducible and the right Perron eigenvector $v(f)$ can be selected strictly positive. To extend this assertion to unichain models, in contrast to condition ($*$) for the risk-neutral case, it is necessary to assume the existence of a state $i_0 \in I$, accessible from any state $i \in I$ for every $f \in F$, that belongs to the basic class of $Q(f)$ (condition ($**$)). If condition ($**$) is fulfilled, it can be shown (cf. [15]) that in (30), (31) and (33) the eigenvectors $v(f)$, $\hat v$, $v^*$ can be selected strictly positive and $\rho^*$, resp. $\hat\rho$, is the maximal, resp. minimal, Perron eigenvalue of the matrix family $\{Q(f),\ f \in F\}$. So we have arrived at

Result 3. A sufficient condition for the existence of solutions to the $\gamma$-average reward/cost optimality equations is the existence of a state $i_0 \in I$ fulfilling condition ($**$) (trivially fulfilled for irreducible models).
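For a fixed stationary policy, (32) says that $\rho(f)$ is the Perron eigenvalue of $Q(f)$ and hence $g(f) = \gamma^{-1} \ln \rho(f)$. A minimal sketch (hypothetical two-state policy, illustrative $\gamma$ and costs) computing the pair $(\rho(f), v(f))$ by plain power iteration:

```python
import numpy as np

gamma = 0.5                               # illustrative risk-sensitivity coefficient
P = np.array([[0.7, 0.3], [0.4, 0.6]])    # transition matrix of a fixed stationary policy
C = np.array([[1.0, 3.0], [2.0, 0.5]])    # one-stage costs c_ij

Q = P * np.exp(gamma * C)                 # q_ij = p_ij * exp(gamma * c_ij), cf. Section 3

# Power iteration: for an irreducible nonnegative Q the normalized iterates tend to the
# right Perron eigenvector v(f), and the normalization factor to the Perron root rho(f).
v = np.ones(2)
for _ in range(500):
    u = Q @ v
    rho, v = u.max(), u / u.max()

g = np.log(rho) / gamma                   # risk-sensitive average cost g(f) = ln(rho)/gamma

assert np.allclose(Q @ v, rho * v)        # eq. (32): rho(f) v(f) = Q(f) v(f)
assert np.all(v > 0)                      # the Perron eigenvector is strictly positive
```

Since $Q$ here is strictly positive, the Perron root is simple and the iteration converges geometrically; for merely unichain $P(f)$ this is where condition ($**$) earns its keep.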
In particular, for unichain models condition ($**$) is fulfilled if the risk-sensitivity coefficient $\gamma$ is sufficiently close to zero (cf. [3, 4, 15]). (Here the basic class of $Q(f)$ is the irreducible class whose spectral radius equals the Perron eigenvalue of $Q(f)$.) Finding the solution of (34) can be performed by policy or value iteration; details can be found e.g. in [2, 3, 7, 14, 15].

Finally, we rewrite the optimality condition (28) of Result 2 in terms of $Q(f)$, $\hat v$, $\hat\rho$. To this end, first observe that

$$\mathbb{E}^\pi_{i_0}\, e^{\gamma \sum_{k=0}^{n-1} \varphi_{X_k, X_{k+1}}(\hat w, \hat g)} = \sum_{i_1, \dots, i_n \in I}\ \prod_{k=0}^{n-1} p_{i_k, i_{k+1}}(f^k_{i_k})\, e^{\gamma [c_{i_k, i_{k+1}} + \hat w_{i_{k+1}} - \hat w_{i_k} - \hat g]} = \sum_{i_1, \dots, i_n \in I}\ \prod_{k=0}^{n-1} q_{i_k, i_{k+1}}(f^k_{i_k})\, \frac{\hat v_{i_{k+1}}}{\hat v_{i_k}\, \hat\rho}. \qquad (35)$$

So equation (28) can also be written in matrix form as

$$\lim_{n\to\infty} \frac{1}{n} \ln \Big\{ \hat V^{-1} \Big[\prod_{k=0}^{n-1} Q(f^k)\Big] \hat V\, \hat\rho^{-n} \Big\} = \lim_{n\to\infty} \frac{1}{n} \ln \Big\{ \prod_{k=0}^{n-1} \Big[\hat V^{-1} Q(f^k)\, \hat\rho^{-1}\, \hat V\Big] \Big\} = \mathbf{0} \qquad (36)$$

(with the logarithm taken elementwise), where the diagonal matrix $\hat V = \operatorname{diag}[\hat v_i]$ and $\mathbf{0}$ is reserved for a null matrix.
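The minimization problem in (34) can be attacked by value iteration in multiplicative form: iterate $v \leftarrow \min_f Q(f) v$ with renormalization, the min/max componentwise ratios bracketing $\hat\rho$ in the spirit of Collatz-Wielandt bounds. A sketch on a hypothetical two-state, two-action model (all numbers illustrative; the strictly positive matrices guarantee convergence here):

```python
import numpy as np

gamma = 0.5
P = [np.array([[0.9, 0.1], [0.4, 0.6]]),   # p_ij(a) for actions a = 0, 1
     np.array([[0.2, 0.8], [0.7, 0.3]])]
C = [np.array([[1.0, 4.0], [3.0, 1.0]]),
     np.array([[2.0, 0.5], [0.5, 2.0]])]
Q = [P[a] * np.exp(gamma * C[a]) for a in (0, 1)]   # q_ij(a) = p_ij(a) exp(gamma c_ij)

def T(v):
    """Multiplicative DP step: (Tv)_i = min over a of sum_j q_ij(a) v_j, cf. (33)."""
    return np.min([Q[a] @ v for a in (0, 1)], axis=0)

v = np.ones(2)
for _ in range(2000):
    Tv = T(v)
    ratios = Tv / v            # componentwise ratios bracket the minimal Perron root
    v = Tv / Tv.max()          # renormalize so the iterates stay bounded

rho_hat = ratios.max()
g_hat = np.log(rho_hat) / gamma   # gamma-average optimal cost, rho-hat = exp(gamma g-hat)

assert np.allclose(T(v), rho_hat * v)      # v solves rho-hat v = min_f Q(f) v, cf. (34)
assert ratios.max() - ratios.min() < 1e-6  # the bounds have converged
```

Once the ratios coincide, the pair $(\hat\rho, v)$ satisfies the multiplicative optimality equation, and the minimizing action in each state yields $\hat f$.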
4 Conclusions

In this note, necessary and sufficient optimality conditions for discrete-time Markov decision chains were obtained, along with equations for average optimal policies, both for risk-neutral and risk-sensitive models. Our analysis is restricted to unichain models, and for the risk-sensitive case some additional assumptions are made. For multichain models it is necessary to find a suitable partition of the state space into nested classes that retain some properties of the unichain model. Some results in this direction can be found in [11, 15, 17, 18].

Acknowledgements

This research was supported by the Czech Science Foundation under Grants P402/11/0150 and P402/10/0956.

References

[1] Bertsekas, D. P.: Dynamic Programming and Optimal Control. Volume 2, Third Edition. Athena Scientific, Belmont, Mass.
[2] Cavazos-Cadena, R. and Montes-de-Oca, R.: The value iteration algorithm in risk-sensitive average Markov decision chains with finite state space. Mathematics of Operations Research 28 (2003).
[3] Cavazos-Cadena, R.: Solution to the risk-sensitive average cost optimality equation in a class of Markov decision processes with finite state space. Mathematical Methods of Operations Research 57 (2003).
[4] Cavazos-Cadena, R. and Hernández-Hernández, D.: A characterization of the optimal risk-sensitive average cost in finite controlled Markov chains. Annals of Applied Probability 15 (2005).
[5] Gantmakher, F. R.: The Theory of Matrices. Chelsea, London 1959.
[6] Howard, R. A.: Dynamic Programming and Markov Processes. MIT Press, Cambridge, Mass.
[7] Howard, R. A. and Matheson, J.: Risk-sensitive Markov decision processes. Management Science 23 (1972).
[8] Mandl, P.: On the variance in controlled Markov chains. Kybernetika 7 (1971), 1-12.
[9] Puterman, M. L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York 1994.
[10] Ross, S. M.: Introduction to Stochastic Dynamic Programming. Academic Press, New York 1983.
[11] Rothblum, U. G. and Whittle, P.: Growth optimality for branching Markov decision chains. Mathematics of Operations Research 7 (1982).
[12] Sladký, K.: On the set of optimal controls for Markov chains with rewards. Kybernetika 10 (1974).
[13] Sladký, K.: On dynamic programming recursions for multiplicative Markov decision chains. Mathematical Programming Study 6 (1976).
[14] Sladký, K.: Bounds on discrete dynamic programming recursions I. Kybernetika 16 (1980).
[15] Sladký, K.: Growth rates and average optimality in risk-sensitive Markov decision chains. Kybernetika 44 (2008).
[16] Tijms, H. C.: A First Course in Stochastic Models. Wiley, Chichester.
[17] Whittle, P.: Optimization Over Time: Dynamic Programming and Stochastic Control. Volume II, Chapter 35. Wiley, Chichester 1983.
[18] Zijm, W. H. M.: Nonnegative Matrices in Dynamic Programming. Mathematical Centre Tract, Amsterdam.
More informationInternational Competition in Mathematics for Universtiy Students in Plovdiv, Bulgaria 1994
International Competition in Mathematics for Universtiy Students in Plovdiv, Bulgaria 1994 1 PROBLEMS AND SOLUTIONS First day July 29, 1994 Problem 1. 13 points a Let A be a n n, n 2, symmetric, invertible
More informationAnalysis of an Infinite-Server Queue with Markovian Arrival Streams
Analysis of an Infinite-Server Queue with Markovian Arrival Streams Guidance Professor Associate Professor Assistant Professor Masao FUKUSHIMA Tetsuya TAKINE Nobuo YAMASHITA Hiroyuki MASUYAMA 1999 Graduate
More informationCover Page. The handle holds various files of this Leiden University dissertation
Cover Page The handle http://hdl.handle.net/1887/39637 holds various files of this Leiden University dissertation Author: Smit, Laurens Title: Steady-state analysis of large scale systems : the successive
More informationMATH4406 Assignment 5
MATH4406 Assignment 5 Patrick Laub (ID: 42051392) October 7, 2014 1 The machine replacement model 1.1 Real-world motivation Consider the machine to be the entire world. Over time the creator has running
More informationNP-hardness results for linear algebraic problems with interval data
1 NP-hardness results for linear algebraic problems with interval data Dedicated to my father, Mr. Robert Rohn, in memoriam J. Rohn a a Faculty of Mathematics and Physics, Charles University, Malostranské
More informationConvex Optimization CMU-10725
Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state
More informationNeural, Parallel, and Scientific Computations 18 (2010) A STOCHASTIC APPROACH TO THE BONUS-MALUS SYSTEM 1. INTRODUCTION
Neural, Parallel, and Scientific Computations 18 (2010) 283-290 A STOCHASTIC APPROACH TO THE BONUS-MALUS SYSTEM KUMER PIAL DAS Department of Mathematics, Lamar University, Beaumont, Texas 77710, USA. ABSTRACT.
More informationStochastic Design Criteria in Linear Models
AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 211 223 Stochastic Design Criteria in Linear Models Alexander Zaigraev N. Copernicus University, Toruń, Poland Abstract: Within the framework
More informationValue Iteration and Action ɛ-approximation of Optimal Policies in Discounted Markov Decision Processes
Value Iteration and Action ɛ-approximation of Optimal Policies in Discounted Markov Decision Processes RAÚL MONTES-DE-OCA Departamento de Matemáticas Universidad Autónoma Metropolitana-Iztapalapa San Rafael
More informationOPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS
OPTIMALITY OF RANDOMIZED TRUNK RESERVATION FOR A PROBLEM WITH MULTIPLE CONSTRAINTS Xiaofei Fan-Orzechowski Department of Applied Mathematics and Statistics State University of New York at Stony Brook Stony
More informationRegular finite Markov chains with interval probabilities
5th International Symposium on Imprecise Probability: Theories and Applications, Prague, Czech Republic, 2007 Regular finite Markov chains with interval probabilities Damjan Škulj Faculty of Social Sciences
More informationMarkov Decision Processes and Solving Finite Problems. February 8, 2017
Markov Decision Processes and Solving Finite Problems February 8, 2017 Overview of Upcoming Lectures Feb 8: Markov decision processes, value iteration, policy iteration Feb 13: Policy gradients Feb 15:
More informationStochastic convexity in dynamic programming
Economic Theory 22, 447 455 (2003) Stochastic convexity in dynamic programming Alp E. Atakan Department of Economics, Columbia University New York, NY 10027, USA (e-mail: aea15@columbia.edu) Received:
More informationMarkov Chains CK eqns Classes Hitting times Rec./trans. Strong Markov Stat. distr. Reversibility * Markov Chains
Markov Chains A random process X is a family {X t : t T } of random variables indexed by some set T. When T = {0, 1, 2,... } one speaks about a discrete-time process, for T = R or T = [0, ) one has a continuous-time
More informationMarkov Chains. As part of Interdisciplinary Mathematical Modeling, By Warren Weckesser Copyright c 2006.
Markov Chains As part of Interdisciplinary Mathematical Modeling, By Warren Weckesser Copyright c 2006 1 Introduction A (finite) Markov chain is a process with a finite number of states (or outcomes, or
More informationA SECOND ORDER STOCHASTIC DOMINANCE PORTFOLIO EFFICIENCY MEASURE
K Y B E R N E I K A V O L U M E 4 4 ( 2 0 0 8 ), N U M B E R 2, P A G E S 2 4 3 2 5 8 A SECOND ORDER SOCHASIC DOMINANCE PORFOLIO EFFICIENCY MEASURE Miloš Kopa and Petr Chovanec In this paper, we introduce
More informationNotes on Linear Algebra and Matrix Theory
Massimo Franceschet featuring Enrico Bozzo Scalar product The scalar product (a.k.a. dot product or inner product) of two real vectors x = (x 1,..., x n ) and y = (y 1,..., y n ) is not a vector but a
More informationMATH36001 Perron Frobenius Theory 2015
MATH361 Perron Frobenius Theory 215 In addition to saying something useful, the Perron Frobenius theory is elegant. It is a testament to the fact that beautiful mathematics eventually tends to be useful,
More informationSIMILAR MARKOV CHAINS
SIMILAR MARKOV CHAINS by Phil Pollett The University of Queensland MAIN REFERENCES Convergence of Markov transition probabilities and their spectral properties 1. Vere-Jones, D. Geometric ergodicity in
More informationA NEW EFFECTIVE PRECONDITIONED METHOD FOR L-MATRICES
Journal of Mathematical Sciences: Advances and Applications Volume, Number 2, 2008, Pages 3-322 A NEW EFFECTIVE PRECONDITIONED METHOD FOR L-MATRICES Department of Mathematics Taiyuan Normal University
More informationTotal Expected Discounted Reward MDPs: Existence of Optimal Policies
Total Expected Discounted Reward MDPs: Existence of Optimal Policies Eugene A. Feinberg Department of Applied Mathematics and Statistics State University of New York at Stony Brook Stony Brook, NY 11794-3600
More informationKey words. Feedback shift registers, Markov chains, stochastic matrices, rapid mixing
MIXING PROPERTIES OF TRIANGULAR FEEDBACK SHIFT REGISTERS BERND SCHOMBURG Abstract. The purpose of this note is to show that Markov chains induced by non-singular triangular feedback shift registers and
More informationAffine iterations on nonnegative vectors
Affine iterations on nonnegative vectors V. Blondel L. Ninove P. Van Dooren CESAME Université catholique de Louvain Av. G. Lemaître 4 B-348 Louvain-la-Neuve Belgium Introduction In this paper we consider
More informationAffine Monotonic and Risk-Sensitive Models in Dynamic Programming
REPORT LIDS-3204, JUNE 2016 (REVISED, NOVEMBER 2017) 1 Affine Monotonic and Risk-Sensitive Models in Dynamic Programming Dimitri P. Bertsekas arxiv:1608.01393v2 [math.oc] 28 Nov 2017 Abstract In this paper
More informationFoundations of Matrix Analysis
1 Foundations of Matrix Analysis In this chapter we recall the basic elements of linear algebra which will be employed in the remainder of the text For most of the proofs as well as for the details, the
More informationE-Companion to The Evolution of Beliefs over Signed Social Networks
OPERATIONS RESEARCH INFORMS E-Companion to The Evolution of Beliefs over Signed Social Networks Guodong Shi Research School of Engineering, CECS, The Australian National University, Canberra ACT 000, Australia
More informationINEQUALITY FOR VARIANCES OF THE DISCOUNTED RE- WARDS
Applied Probability Trust (5 October 29) INEQUALITY FOR VARIANCES OF THE DISCOUNTED RE- WARDS EUGENE A. FEINBERG, Stony Brook University JUN FEI, Stony Brook University Abstract We consider the following
More informationON THE POLICY IMPROVEMENT ALGORITHM IN CONTINUOUS TIME
ON THE POLICY IMPROVEMENT ALGORITHM IN CONTINUOUS TIME SAUL D. JACKA AND ALEKSANDAR MIJATOVIĆ Abstract. We develop a general approach to the Policy Improvement Algorithm (PIA) for stochastic control problems
More informationCentral-limit approach to risk-aware Markov decision processes
Central-limit approach to risk-aware Markov decision processes Jia Yuan Yu Concordia University November 27, 2015 Joint work with Pengqian Yu and Huan Xu. Inventory Management 1 1 Look at current inventory
More informationMarkov Decision Processes and Dynamic Programming
Markov Decision Processes and Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) ENS Cachan - Master 2 MVA SequeL INRIA Lille MVA-RL Course How to model an RL problem The Markov Decision Process
More informationBasis Construction from Power Series Expansions of Value Functions
Basis Construction from Power Series Expansions of Value Functions Sridhar Mahadevan Department of Computer Science University of Massachusetts Amherst, MA 3 mahadeva@cs.umass.edu Bo Liu Department of
More informationKernels of Directed Graph Laplacians. J. S. Caughman and J.J.P. Veerman
Kernels of Directed Graph Laplacians J. S. Caughman and J.J.P. Veerman Department of Mathematics and Statistics Portland State University PO Box 751, Portland, OR 97207. caughman@pdx.edu, veerman@pdx.edu
More informationAsymptotics of minimax stochastic programs
Asymptotics of minimax stochastic programs Alexander Shapiro Abstract. We discuss in this paper asymptotics of the sample average approximation (SAA) of the optimal value of a minimax stochastic programming
More informationAn iterative procedure for constructing subsolutions of discrete-time optimal control problems
An iterative procedure for constructing subsolutions of discrete-time optimal control problems Markus Fischer version of November, 2011 Abstract An iterative procedure for constructing subsolutions of
More information