Non-Irreducible Controlled Markov Chains with Exponential Average Cost.


Agustin Brau-Rojas, Departamento de Matemáticas, Universidad de Sonora. Emmanuel Fernández-Gaucherand, Dept. of Electrical & Computer Eng. and Computer Science, University of Cincinnati.

Abstract. We study discrete-time controlled Markov chains with finite state and action spaces. The performance of control policies is measured by the exponential average cost (EAC), a risk-sensitive version of the standard average cost which models risk-sensitivity by means of an exponential (dis)utility function (so that a constant risk-sensitivity coefficient is assumed). The main result is the characterization of the EAC corresponding to an arbitrary stationary deterministic policy in terms of the spectral radii of suitable irreducible matrices. This result generalizes a well-known theorem of Howard and Matheson that deals with the particular case in which the transition probability matrix induced by the control policy is primitive. The following consequences are obtained from the mentioned characterization. It is shown that, when a stationary deterministic policy determines only one class of recurrent states, the corresponding EAC converges to the risk-null average cost as the risk-sensitivity coefficient γ goes to zero. However, it is also shown that for large risk-sensitivity, fundamental differences arise between the two models. In particular, the limiting values of the EAC as γ goes to infinity are determined. Further insight into and illustration of the behavior of the EAC are provided by means of simple examples. Finally, we include a proof of the existence of solutions to the associated optimality equation when a simultaneous Doeblin condition is satisfied and the risk-sensitivity coefficient is small enough.
The just mentioned proof is significantly simpler than that recently provided in [5, 6, 7], and unlike that one, it relies entirely on elementary results of the Perron-Frobenius theory of non-negative matrices, which is the approach in Howard and Matheson's seminal paper [4].

1. Introduction.

We study controlled Markov chains (CMC's) with a risk-sensitive average cost optimality criterion as introduced by Howard and Matheson [4]; see also [5, 6, 0,, 6]. This criterion incorporates risk-sensitivity into the decision process by means of an exponential (dis)utility function U_γ(x) = sgn(γ) e^{γx}, where γ ≠ 0 and sgn(γ) denotes the sign of γ; see [8, 20]. We deal with both cases γ > 0 and γ < 0, which represent risk-aversion and risk-proneness, respectively. For brevity, we will refer to Howard and Matheson's criterion as the exponential average cost (EAC), after the (dis)utility function employed to define it.

The paper is organized as follows. In Section 2, the model and some general notation and terminology are introduced. Sections 3 and 4 are devoted to a comprehensive discussion of the EAC corresponding to a fixed (but otherwise arbitrary) stationary deterministic policy f. In Section 3, we restrict attention to the case in which the transition probability matrix P_f induced by f is irreducible. First, for arbitrary γ ≠ 0, the EAC is characterized as the unique solution of a Poisson equation and is expressed in terms of the spectral radius of the so-called disutility matrix P̃_f(γ); this result constitutes an extension of a theorem of Howard and Matheson [4], which considers only the case γ > 0 and assumes that P is primitive; see also [7, 0]. Then the impact of both small and large risk-sensitivity on the EAC under the irreducibility assumption is studied. We show that (a) the risk-sensitive model approaches the risk-null one as the risk-sensitivity coefficient γ goes to zero, and (b) as γ goes to ∞ (−∞), the EAC converges to the worst (best) arithmetical

average of the costs in the cycles determined by the transition matrix.

Section 4 deals with the EAC for an arbitrary (not necessarily irreducible) P_f. The main result of that section, Theorem 4, characterizes the EAC in terms of the spectral radii of suitable submatrices of P̃_f(γ), thus generalizing Howard and Matheson's characterization. Theorem 4 is employed to compute the EAC in two simple examples which illustrate two peculiar features of risk-sensitivity when the initial state is transient: (a) the EAC is not affected by the probability of entering an irreducible closed class but only by the EAC on that class, and (b) the EAC may depend on the cost structure at the transient states.

Section 5 is concerned with the existence of solutions to the exponential average optimality equation (EAOE) which, as is known, yield the optimal exponential average cost and an optimal stationary deterministic policy; see [4, 6]. First, by using Howard and Matheson's policy improvement argument [4], we show that if P_f is irreducible for every f ∈ Π_SD, then for arbitrary γ ≠ 0 the EAOE has a solution. Then we show that if the mentioned recurrence assumption is relaxed to a simultaneous Doeblin condition, the existence of solutions to the EAOE can be assured only for γ small enough. We must note that both results in this last section, as well as the EAC's characterization as a solution of a Poisson equation (Theorem 1) for irreducible P, were recently obtained by Cavazos-Cadena and Fernández-Gaucherand [5, 6, 7]. However, the proofs we present here, unlike those in [5, 6, 7], are based entirely on elementary results of the Perron-Frobenius theory of non-negative matrices, and are much less involved than those provided in the cited references.

2. Description of the Model.

Consider the standard framework for a CMC, specified by ⟨X, A, {A(i) : i ∈ X}, P, c⟩, where: a) X, the state space, is a discrete set. For ease of notation we will take X = {1, 2, ..., N}.
b) A, the action or control space, is a finite set. c) A(i), the set of admissible actions at state i, is a subset of A. The set of admissible state-action pairs is defined as K := {(i, a) : i ∈ X, a ∈ A(i)}. d) P = {P(· | i, a) : (i, a) ∈ K} is a transition probability on K × 2^X, where 2^X is the family of all subsets of X. For brevity, we will sometimes write P(j | i, a) or P_ij(a) instead of P({j} | i, a). e) c : K → R is the one-stage cost function. We will assume that c is bounded and, without loss of generality for our purposes as we will see later, also non-negative; that is, 0 ≤ c(i, a) ≤ K for some constant K ∈ (0, ∞). To avoid trivial situations we will also assume that c is not identically zero.

The tuple ⟨X, A, {A(i) : i ∈ X}, P, c⟩ represents a stochastic dynamic system observed at times (or epochs) t ∈ N_0 := {0, 1, 2, ...}. The evolution of the system is as follows. Let X_t denote the state at time t ∈ N_0, and A_t the action chosen at that time. If at decision epoch t the system is in state X_t = i ∈ X, and the control A_t = a ∈ A(i) is chosen, then (i) a cost c(i, a) is incurred, and (ii) the system moves to a new state X_{t+1} according to the probability distribution P(· | i, a). Once the transition into the new state has occurred, a new action is chosen, and the process is repeated. We will take the stochastic processes (X_t) and (A_t) as given by the coordinate functions defined on (X × A)^∞ in the usual way; for more details, see [1, 2, 9]. For simplicity we will often denote C_t := c(X_t, A_t).

Let Π denote the set of all admissible (possibly randomized and history dependent) policies (see [1]), and F the set of admissible decision functions, i.e., functions f from X to A such that f(i) ∈ A(i) for all i ∈ X. We will distinguish two subclasses of policies: the class Π_MD of Markovian deterministic policies and the class Π_SD of stationary deterministic policies [1]. The stationary deterministic policy determined by f ∈ F will be denoted by f^∞.
For each policy π ∈ Π and each initial state i ∈ X we define in the usual way a probability measure P_i^π on Ω := (X × A)^∞ [1, 7, 9], and denote the corresponding expectation operator by E_i^π. When π = f^∞ we

write P_i^f and E_i^f for P_i^π and E_i^π respectively, and the transition probability matrix induced by that policy is denoted by P_f, that is, P_f(i, j) := P(j | i, f(i)). The (risk-null) average cost due to a policy π ∈ Π and initial state i ∈ X will be denoted by

φ_π(i) := lim sup_{n→∞} (1/n) E_i^π [ Σ_{t=0}^{n−1} C_t ].   (1)

The certainty equivalent of a random variable Z with respect to the (dis)utility function U_γ is defined as

E(γ, Z) := U_γ^{−1}( E[ U_γ(Z) ] ) = (1/γ) log( E[ e^{γZ} ] ).

Heuristically, a decision maker with utility function U_γ is indifferent between the random (thus uncertain) cost Z and the (certain) cost E(γ, Z). The risk-sensitive average cost or exponential average cost (EAC) is defined as the (long-run) average of the certainty equivalents of the finite horizon costs. In other words, it is obtained by replacing the expectation operator in the definition of the risk-null average cost by the certainty equivalent operator. Thus, the EAC corresponding to a risk-sensitivity coefficient γ, a policy π, and an initial state i is given by

J_π(γ, i) := lim sup_{n→∞} (1/n) U_γ^{−1}( E_i^π [ U_γ( Σ_{t=0}^{n−1} C_t ) ] )
          = lim sup_{n→∞} (1/(nγ)) log( E_i^π [ exp( γ Σ_{t=0}^{n−1} C_t ) ] ).   (2)

Despite the fact that the EAC is not an expected utility criterion [9], it has been widely accepted in the literature as the exponential utility version of the standard (risk-null) average cost; see [4] for a discussion of alternative definitions. The optimal control problem associated to the EAC is, of course, to compute the optimal value function

J*(γ, i) := inf_{π∈Π} J_π(γ, i),   (3)

and to find policies at which the optimal values are attained, that is, to find π* ∈ Π such that

J*(γ, i) = J_{π*}(γ, i).   (4)

Remark 1. Observe that no generality is gained by dropping the assumption of non-negative costs. Indeed, suppose that −K ≤ c(i, a) ≤ K and define c̃ := c + K, C̃_t := C_t + K.
Then 0 ≤ c̃(i, a) ≤ 2K and

lim sup_{n→∞} (1/(nγ)) log( E_i^π [ exp( γ Σ_{t=0}^{n−1} C̃_t ) ] ) = lim sup_{n→∞} (1/(nγ)) log( e^{γnK} E_i^π [ exp( γ Σ_{t=0}^{n−1} C_t ) ] )
   = K + lim sup_{n→∞} (1/(nγ)) log( E_i^π [ exp( γ Σ_{t=0}^{n−1} C_t ) ] ),

where C̃_t is defined similarly to C_t.
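The certainty equivalent defined above is easy to experiment with numerically. The following sketch (not part of the paper; it assumes Python with numpy) evaluates E(γ, Z) = (1/γ) log E[e^{γZ}] for a simple discrete cost Z, illustrating that risk-aversion (γ > 0) pushes the certainty equivalent above the mean, while risk-proneness (γ < 0) pushes it below.

```python
import numpy as np

def certainty_equivalent(gamma, values, probs):
    # E(gamma, Z) = (1/gamma) * log E[exp(gamma * Z)] for a discrete Z
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return np.log(probs @ np.exp(gamma * values)) / gamma

# a random cost Z taking the values 0 or 10 with equal probability; E[Z] = 5
vals, p = [0.0, 10.0], [0.5, 0.5]
ce_averse = certainty_equivalent(2.0, vals, p)    # risk-averse: above the mean
ce_prone = certainty_equivalent(-2.0, vals, p)    # risk-prone: below the mean
ce_small = certainty_equivalent(1e-6, vals, p)    # close to the mean
```

As γ → 0 a Taylor expansion gives E(γ, Z) ≈ E[Z] + (γ/2) Var(Z), which is why the small-γ value hugs the mean.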

Notation and preliminary results. The following notation and terminology of probability theory, most of which is rather standard, will be used in the sequel. For i, j ∈ X, we write i → j and say that i leads to j (or j is accessible from i) when P^n(i, j) = P(X_n = j | X_0 = i) > 0 for some n ≥ 1; observe that with this definition it may happen that i does not lead to i, in which case we say that i is an irrelevant state. Similarly, for C ⊂ X, i → C ("the class C is accessible from i") means that i → j for some j ∈ C. As usual, i ↔ j, which is read "i and j communicate," means that i → j and j → i. A class of states C ⊂ X is called self-communicating (SC) if: a) i → j for every i, j ∈ C, and b) there does not exist a class C′ properly containing C such that (a) holds for C′. If C is a SC class, then we will denote Q_C := ( P(i, j) )_{i,j∈C} and Q̃_C := ( P̃(i, j) )_{i,j∈C} = ( e^{γc(i)} P(i, j) )_{i,j∈C}. Note that if C is SC, then both Q_C and Q̃_C are irreducible.

Now we recall some standard notation and basic definitions and facts about the theory of non-negative matrices that we will use in the sequel. Vectors in R^N and real N × N matrices will be called positive (non-negative) if all of their components are positive (non-negative). Let A denote an N × N non-negative matrix. The spectral radius of A will be denoted by ρ(A); recall that

ρ(A) := max { |λ| : λ is an eigenvalue of A } = lim_{n→∞} ||A^n||^{1/n},   (5)

where ||A|| = max{ Σ_{j=1}^N |A(i, j)| : 1 ≤ i ≤ N }; see [3]. The well-known Perron-Frobenius Theorem (cf. [3]) establishes that, when A is irreducible, (a) ρ(A) is positive, (b) ρ(A) is an algebraically simple eigenvalue of A, and (c) there exist both positive right and positive left eigenvectors corresponding to ρ(A). Moreover, ρ(A) is the unique positive eigenvalue of A having such eigenvectors (see [3]). The asymptotic behavior of the iterates A^n = (A^n(i, j)) shown in the following proposition is sometimes also included as part of the Perron-Frobenius Theorem; see for example [8].

Proposition 1.
Let A = [A(i, j)]_{i,j=1}^N be a non-negative irreducible matrix and φ = (φ_1, φ_2, ..., φ_N) any positive vector. Then

lim_{n→∞} ( Σ_{j=1}^N A^n(i, j) φ_j )^{1/n} = ρ(A)   (6)

for every i ∈ {1, ..., N}.

The following two monotonicity properties of irreducible matrices will be needed in the proofs below; for a proof, see [2].

Proposition 2. Let A and B be non-negative N × N irreducible matrices such that A ≥ B ≥ 0 and A ≠ B. Then ρ(A) > ρ(B).

Proposition 3. Let A be a non-negative irreducible matrix, x a positive vector, and α ∈ (0, ∞) such that Ax ≤ αx and Ax ≠ αx. Then ρ(A) < α.
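Equation (5) and the monotonicity in Proposition 2 are easy to check numerically. The sketch below (illustrative only, not from the paper; it uses numpy) compares the eigenvalue-based spectral radius with the norm-limit estimate ||A^n||^{1/n} for a strictly positive, hence irreducible, matrix.

```python
import numpy as np

def spectral_radius(A):
    return max(abs(np.linalg.eigvals(A)))

rng = np.random.default_rng(0)
A = rng.random((4, 4)) + 0.1      # strictly positive, hence irreducible
B = A.copy()
B[0, 1] *= 0.5                    # A >= B >= 0 with A != B, as in Proposition 2

# (5): rho(A) = lim_n ||A^n||^(1/n), with || . || the max-row-sum norm
n = 60
norm_est = np.max(np.linalg.matrix_power(A, n).sum(axis=1)) ** (1.0 / n)
```

The norm estimate converges like ρ(A) · C^{1/n} for a fixed constant C, so moderate n already gives a few percent of accuracy.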

Proposition 4. If P is a non-negative N × N matrix, then either P is irreducible, or by a permutation similarity it can be brought into the so-called irreducible normal form: a block lower-triangular matrix whose diagonal blocks Q_1, ..., Q_m are each either irreducible or equal to the 1 × 1 zero matrix (0). Moreover, ρ(P) = max{ ρ(Q_i) : Q_i is irreducible }; cf. [3].

In the next two sections we study the EAC corresponding to a (fixed) stationary deterministic policy f^∞ as a function of γ, the risk-sensitivity coefficient. Consequently, throughout both sections the dependence on f is omitted and we consider a simplified model, known as a Markov cost chain (MCC), whose elements are the state space X = {1, ..., N}, a stochastic matrix P = (P_ij)_{i,j=1}^N, and a non-null cost vector c = (c(1), ..., c(N)) with non-negative components. Then, the exponential average cost for the MCC is given by

J(γ, i) := lim sup_{n→∞} (1/n) E_i( γ, Σ_{t=0}^{n−1} c(X_t) )
        = lim sup_{n→∞} (1/(nγ)) log( E_i [ exp( γ Σ_{t=0}^{n−1} c(X_t) ) ] ),   (7)

where E_i and E_i(γ, ·) are respectively the expectation and the certainty equivalent operators induced by the transition probability matrix P and the initial state i ∈ X. If we denote the disutility incurred by the cost chain up to time n by

U_n(γ, i) := E_i [ sgn(γ) exp( γ Σ_{t=0}^{n−1} c(X_t) ) ],

for n = 1, 2, ..., and U_0(γ, j) ≡ sgn(γ), then it is not hard to see that the following recursion formula holds true:

U_{n+1}(γ, i) = Σ_{j=1}^N P_ij e^{γc(i)} U_n(γ, j),   n = 0, 1, ...;

see [3, 4]. Now, if we define the disutility matrix P̃(γ) by P̃(γ) := ( P(i, j) e^{γc(i)} ) and denote U_n := (U_n(γ, 1), ..., U_n(γ, N))^τ (where v^τ denotes the transpose of a vector v), then the above recursion formula can be written in vector form as U_{n+1} = P̃ U_n, n = 0, 1, ..., where U_0 := sgn(γ)(1, ..., 1)^τ. Thus U_n = P̃^n U_0, n = 0, 1, ..., and substituting this expression for U_n in (7) we get

J(γ, i) = lim sup_{n→∞} (1/(nγ)) log Σ_{j=1}^N P̃^n(i, j).   (8)

3. The exponential average cost: the irreducible case.
Throughout this section, we restrict attention to MCC's ⟨X, P, c⟩ for which the transition matrix P is irreducible. In Theorem 1 below, the EAC is

characterized as the unique solution of a Poisson equation and its value is expressed in terms of the spectral radius ρ(P̃) of the disutility matrix P̃. The results in Theorem 1 are an extension of those established by Howard and Matheson in [4], where P is required to be primitive; that is, aperiodicity is assumed there besides irreducibility. We follow closely Howard and Matheson's arguments based on the Perron-Frobenius theory of non-negative matrices, thus showing that aperiodicity is not essential to employ such a tool in the present problem.

Theorem 1. If P is irreducible, then for every i ∈ X

J(γ, i) = lim_{n→∞} (1/(nγ)) log Σ_{j=1}^N P̃^n(i, j) = (1/γ) log ρ(γ) =: J(γ),   (9)

where ρ(γ) is the spectral radius of the (irreducible) disutility matrix P̃(γ). Moreover, for each γ ≠ 0 there exists H(γ, ·) : X → R such that (J(γ), H(γ, ·)) is the unique solution of the Poisson equation

e^{γ[J(γ)+H(γ,i)]} = e^{γc(i)} Σ_{j=1}^N P_ij e^{γH(γ,j)},   i ∈ X,   (10)

with J(γ) > 0 and H(γ, N) = 0.

Proof: Set H_1 := {u : X → R : u(N) = 1 and u(i) > 0 for all i ∈ X}, H_2 := {v : X → R : v(N) = 0}, and let I_γ denote the interval (0, 1) if γ < 0, and (1, ∞) if γ > 0. Then the mapping T : I_γ × H_1 → (0, ∞) × H_2 defined by T(x, u) = ( (1/γ) log x, (1/γ) log u ) is bijective. Moreover, if (x, u) ∈ I_γ × H_1 and T(x, u) = (y, v), then (y, v) satisfies equation (10) if and only if (x, u) satisfies the equation xu = P̃(γ)u (we abuse notation by indistinctly considering u as a function or as a vector in R^N), i.e., if and only if x is the spectral radius of P̃(γ) and u the corresponding positive eigenvector with u(N) = 1. Let ρ(γ) > 0 denote the spectral radius of P̃(γ) and w the positive eigenvector with w_N = 1. First, let us check that ρ(γ) ∈ I_γ. To that end, note that P̃(γ) ≠ P because c is non-null; also, if γ > (<) 0 then P̃(γ) ≥ (≤) P. Thus, by Proposition 2, ρ(γ) > (<) ρ(P) = 1 and our claim is proved. Therefore, by the Perron-Frobenius Theorem, (ρ(γ), w) is the unique couple in I_γ × H_1 for which ρ(γ)w = P̃(γ)w.
Consequently, it follows from the above considerations that if we take (J(γ), H(γ, ·)) := T( ρ(γ), w ) = ( (1/γ) log ρ(γ), (1/γ) log w ), then (J(γ), H(γ, ·)) is the unique solution of the Poisson equation (10). Finally, by Proposition 1, the irreducibility of P̃(γ) implies that for every positive vector φ = (φ_1, ..., φ_N) the limit relationship

lim_{n→∞} (1/n) log Σ_{j=1}^N P̃(γ)^n(i, j) φ_j = log ρ(P̃(γ))   (11)

holds true for every i ∈ X. Thus, (9) follows directly from (11) by taking φ = (1, 1, ..., 1)^τ.

We can observe that there is a clear resemblance between (10) and the value equation for the risk-neutral average cost

φ + h(i) = c(i) + Σ_{j=1}^N P_ij h(j),   (12)
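As a numerical illustration of Theorem 1 (a sketch with our own toy data, not from the paper): for a small irreducible chain we build the disutility matrix P̃(γ), compare (1/γ) log ρ(P̃(γ)) with the finite-horizon quantity in (9), and check that (J(γ), H(γ, ·)) obtained from the Perron eigenvector satisfies the Poisson equation (10).

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.3, 0.7, 0.0]])          # irreducible stochastic matrix
c = np.array([1.0, 2.0, 4.0])            # non-null, non-negative costs
gamma = 0.7

Pt = np.exp(gamma * c)[:, None] * P      # disutility matrix P~(gamma)
vals, vecs = np.linalg.eig(Pt)
k = np.argmax(vals.real)                 # Perron root: largest (real) eigenvalue
rho = vals[k].real
w = np.abs(vecs[:, k].real)              # positive Perron eigenvector
J = np.log(rho) / gamma                  # Theorem 1: J(gamma) = (1/gamma) log rho

# finite-horizon approximation of (9): (1/(n*gamma)) log sum_j P~^n(i, j)
n = 150
J_finite = np.log(np.linalg.matrix_power(Pt, n).sum(axis=1)) / (n * gamma)

# Poisson equation (10) with H(gamma, i) = (1/gamma) log(w_i / w_N)
H = np.log(w / w[-1]) / gamma
lhs = np.exp(gamma * (J + H))
rhs = np.exp(gamma * c) * (P @ np.exp(gamma * H))
```

Since ρw = P̃w, both sides of (10) reduce to ρ w_i / w_N, so lhs and rhs agree to machine precision.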

where φ and h(·) are respectively the (risk-neutral) average cost and the relative value function [1, 3, 2, 9]. Moreover, as the following theorem shows, when P is irreducible the risk-sensitive model converges to the risk-null model as the risk-sensitivity coefficient decreases to zero. In particular, if we define J(0, i) := φ for all i ∈ X, then J(·, i) turns out to be continuous at γ = 0 for every i ∈ X. This result was predicted in [4], but a proof was not provided.

Theorem 2. If P is irreducible, then

lim_{γ→0} J(γ) = φ   and   lim_{γ→0} H(γ, i) = h(i).   (13)

Proof: For each γ ≠ 0, let ρ(γ) denote the spectral radius of P̃(γ) and w(γ) = (w_1(γ), ..., w_N(γ)) a corresponding eigenvector such that w_N(γ) = 1. Since the entries of P̃(γ) are analytic functions of γ, and ρ(γ) is an eigenvalue of multiplicity one, ρ and the w_i are also analytic functions of γ; see [5, Ch. II]. In particular, lim_{γ→0} ρ(γ) = ρ(0) = 1 and lim_{γ→0} w_i(γ) = w_i(0) = 1, i ∈ X. Differentiating both sides of the eigenvalue equations

ρ(γ) w_i(γ) = Σ_{j=1}^N P_ij e^{γc(i)} w_j(γ),   i ∈ X,

with respect to γ we obtain

ρ′(γ) w_i(γ) + ρ(γ) w_i′(γ) = Σ_{j=1}^N P_ij [ w_j′(γ) + w_j(γ) c(i) ] e^{γc(i)},

and letting γ → 0 yields

lim_{γ→0} ρ′(γ) + lim_{γ→0} w_i′(γ) = Σ_{j=1}^N P_ij lim_{γ→0} w_j′(γ) + c(i).   (14)

Since w_N′(γ) ≡ 0 and the solution (φ, h) with h(N) = 0 of the value equation (12) is unique [1], we deduce from (14) that

φ = lim_{γ→0} ρ′(γ)   and   h(i) = lim_{γ→0} w_i′(γ).

Finally, the theorem follows by observing that, from L'Hôpital's rule and Theorem 1,

lim_{γ→0} J(γ) = lim_{γ→0} (log ρ(γ))/γ = lim_{γ→0} ρ′(γ)/ρ(γ) = lim_{γ→0} ρ′(γ) = φ,

and

lim_{γ→0} H(γ, i) = lim_{γ→0} (1/γ) log w_i(γ) = lim_{γ→0} w_i′(γ)/w_i(γ) = lim_{γ→0} w_i′(γ) = h(i).

Remark 2. Since J(γ) = (1/γ) log ρ(γ) is analytic in R \ {0} and lim_{γ→0} γJ(γ) = 0 (by the continuity of J at zero), J is in fact analytic at zero.
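A quick numerical check of Theorem 2 (a sketch using numpy; the chain below is our own arbitrary example): the risk-null average cost φ = π·c, with π the stationary distribution, should be recovered from J(γ) as γ → 0, with J(γ) above φ for γ > 0 and below it for γ < 0.

```python
import numpy as np

P = np.array([[0.2, 0.8, 0.0],
              [0.0, 0.1, 0.9],
              [0.5, 0.5, 0.0]])   # irreducible stochastic matrix
c = np.array([1.0, 3.0, 2.0])

# risk-null average cost phi = pi . c, pi the stationary distribution of P
evals, evecs = np.linalg.eig(P.T)
pi = np.abs(evecs[:, np.argmax(evals.real)].real)
pi = pi / pi.sum()
phi = pi @ c

def J(gamma):
    # Theorem 1: J(gamma) = (1/gamma) * log rho(P~(gamma))
    Pt = np.exp(gamma * c)[:, None] * P
    return np.log(max(abs(np.linalg.eigvals(Pt)))) / gamma
```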

To end this section, we analyze the behavior of the EAC for large risk-sensitivity when the probability transition matrix P is irreducible. We will show that for (infinitely) large risk-aversion, the EAC is given by the worst average cost that can occur (with positive probability) in the long run. Heuristically, we might say that the attitude toward risk of a decision maker with infinitely large risk-aversion is as pessimistic as can be. A similar result is proved as well for (infinitely) large risk-proneness.

We will use the following notation and definitions. A finite directed P-path (or simply a P-path) from state i to state i_k is a finite ordered sequence of states {i, i_1, ..., i_k}, k ≥ 1, such that P(i, i_1) P(i_1, i_2) ⋯ P(i_{k−2}, i_{k−1}) P(i_{k−1}, i_k) > 0. The length of a P-path {i, i_1, ..., i_k} is the number k of states following the initial state i. Thus, a P-path of length k from i to j exists if and only if P^k(i, j) > 0. A P-cycle at state i, which we will denote by Γ_i, is a P-path which begins and ends at i, and such that i occurs exactly twice in the path. A P-cycle of length 1 is called a P-loop. If Γ_i is a P-cycle, then A(Γ_i) will denote the arithmetic average of the costs in the cycle. That is, if Γ_i is a P-loop then A(Γ_i) = c(i), and if Γ_i = {i, i_1, ..., i_k, i} with k ≥ 1, then

A(Γ_i) = (1/(k + 1)) ( c(i) + c(i_1) + ⋯ + c(i_k) ).

Also, we denote P̃(Γ_i) = P(i, i) e^{γc(i)} if Γ_i is a P-loop, and

P̃(Γ_i) = P(i, i_1) P(i_1, i_2) ⋯ P(i_k, i) e^{γ(c(i)+c(i_1)+⋯+c(i_k))}

if Γ_i = {i, i_1, ..., i_k, i} with k ≥ 1.

Theorem 3. If P is irreducible then

lim_{γ→∞} J(γ) = max { A(Γ_i) : Γ_i is a P-cycle }.   (15)

Proof: Let us denote by α the right-hand side of (15). First, observe that if Γ_i is any P-cycle of length k (k ≥ 1), then the positive number P̃(Γ_i) is one of the summands in the (i, i)-entry of P̃^k.
Thus,

ρ( P̃(γ) )^k = ρ( P̃(γ)^k ) ≥ P̃(Γ_i),

which yields

(1/γ) log ρ( P̃(γ) ) ≥ (1/(kγ)) log P̃(Γ_i) = (1/(kγ)) log p + A(Γ_i),

where 0 < p ≤ 1 is the product of the transition probabilities along the cycle. Therefore,

lim inf_{γ→∞} J(γ) ≥ A(Γ_i),

and consequently lim inf_{γ→∞} J(γ) ≥ α, since Γ_i was an arbitrary cycle.

To prove the converse inequality, we will first determine an upper bound in terms of α for the entries of the matrix P̃^n(γ). To that end, observe that any P-path {i_0, i_1, ..., i_n} of length n > N must contain at least one P-cycle. If that cycle is of length r, say {k_0, k_1, ..., k_{r−1}, k_0}, then we have

e^{γ[ c(i_0)+c(i_1)+⋯+c(i_{n−1}) ]} ≤ e^{γ[ c(i_0)+c(j_1)+⋯+c(j_{n−r−1}) ]} e^{γrα},

where {i_0, j_1, ..., j_{n−r}} is the P-path from i_0 to j_{n−r} (= i_n) of length n − r obtained after removing {k_0, k_1, ..., k_{r−1}} from the original P-path. By applying the previous procedure as many times as necessary, we see that there must exist a path {i_0, k_1, ..., k_s, i_n} with s < N, such that

e^{γ[ c(i_0)+c(i_1)+⋯+c(i_{n−1}) ]} ≤ e^{γ[ c(i_0)+c(k_1)+⋯+c(k_s) ]} e^{γ(n−s−1)α}.   (16)

Moreover, for each i ∈ X, c(i) ≤ Nα, because i is contained in at least one P-cycle (recall that P is irreducible). Taking into account this last observation, inequality (16) yields

e^{γ[ c(i_0)+c(i_1)+⋯+c(i_{n−1}) ]} ≤ e^{γ(s+1)Nα} e^{γ(n−s−1)α} ≤ e^{γN²α} e^{γnα} = e^{γ(n+N²)α}.   (17)

Hence, it follows from (17) that, for arbitrary n > N and i, j ∈ {1, ..., N}, we have

P̃^n(i, j) ≤ P^n(i, j) e^{γ(n+N²)α},

and we have accomplished our first step. This inequality is in fact rather rough, yet it will be sufficient for our purposes. From it, it follows immediately that ||P̃^n|| ≤ e^{γ(n+N²)α} ||P^n|| = e^{γ(n+N²)α} for every n > N. Thus,

ρ(P̃)^n = ρ(P̃^n) ≤ ||P̃^n|| ≤ e^{γ(n+N²)α},   n > N,

and applying the function (1/(nγ)) log(·) to both extremes of this inequality yields

(1/γ) log ρ(P̃) ≤ ((n + N²)/n) α,   n > N.

Letting n → ∞, we conclude that J(γ) ≤ α for every γ > 0, and therefore lim sup_{γ→∞} J(γ) ≤ α. The proof of the theorem is complete.

Remark 3. It is not always necessary to consider the set of all the P-cycles when taking the maximum in (15). If, for example, P(i, i) > 0 for every i ∈ X, then it is sufficient to consider the P-loops:

lim_{γ→∞} J(γ) = max { c(1), ..., c(N) }.

As opposed to what happens with large risk-aversion, the attitude toward risk of a decision maker with large risk-proneness may be described as highly optimistic. The rigorous statement, whose proof is similar to that of Theorem 3, is given below.

Lemma 1. If P is irreducible then

lim_{γ→−∞} J(γ) = min { A(Γ_i) : Γ_i is a P-cycle }.   (18)
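Remark 3 is easy to visualize numerically. In the sketch below (illustrative only, with our own toy chain) every state has a self-loop, so by Theorem 3 and Lemma 1 the EAC tends to max c for large γ and to min c for large negative γ.

```python
import numpy as np

P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.50, 0.00, 0.50]])   # irreducible, with P(i, i) > 0 for all i
c = np.array([1.0, 4.0, 2.0])

def J(gamma):
    # Theorem 1: J(gamma) = (1/gamma) * log rho(P~(gamma))
    Pt = np.exp(gamma * c)[:, None] * P
    return np.log(max(abs(np.linalg.eigvals(Pt)))) / gamma

J_hi = J(40.0)     # near max(c) = 4, by Remark 3
J_lo = J(-40.0)    # near min(c) = 1, by Lemma 1
```

The residual gap is of order |log P(i, i)| / |γ|, which is why the limits are approached only at the rate 1/γ.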

4. Exponential average cost: the general (non-irreducible) case.

The main result of this section, Theorem 4, provides a representation of the EAC in terms of the spectral radii of suitable submatrices of P̃, for an arbitrary (not necessarily irreducible) transition probability matrix P. Then, by means of two simple examples, the effect of risk-sensitivity when P is not irreducible is illustrated. Finally, we study the behavior of the EAC for small risk-sensitivity in the present general case. In particular, convergence of the EAC to the risk-neutral average cost is seen to hold under a recurrence assumption less restrictive than irreducibility. Before proceeding to the main result, we prove an auxiliary result which is important by itself.

Lemma 2. For i, j ∈ X, i → j implies sgn(γ) J(γ, i) ≥ sgn(γ) J(γ, j).

Proof: Since equality trivially holds for i = j, let us assume that i ≠ j, and write S_n := Σ_{t=0}^{n−1} c(X_t). Let us consider first the case γ > 0. Take r such that P^r(i, j) > 0. Then for n > r we have

E_i[exp(γS_n)] ≥ P^r(i, j) E_i[exp(γS_n) | X_r = j]
             ≥ P^r(i, j) E_i[ exp( γ Σ_{t=r}^{n−1} c(X_t) ) | X_r = j ]
             = P^r(i, j) E_j[exp(γS_{n−r})].

Thus,

(1/(nγ)) log( E_i[exp(γS_n)] ) ≥ (1/(nγ)) log( P^r(i, j) E_j[exp(γS_{n−r})] )
   = (1/(nγ)) log P^r(i, j) + (1/(nγ)) log E_j[exp(γS_{n−r})],   (19)

and the claim follows by taking lim sup as n → ∞ in the extremes of (19), since lim_{n→∞} (1/(nγ)) log P^r(i, j) = 0. For the case γ < 0, if we take r and n as above then, taking into account that c(·) ≤ K, we have

E_i[exp(γS_n)] ≥ P^r(i, j) E_i[exp(γS_n) | X_r = j]
             ≥ P^r(i, j) e^{γrK} E_i[ exp( γ Σ_{t=r}^{n−1} c(X_t) ) | X_r = j ]
             = P^r(i, j) e^{γrK} E_j[exp(γS_{n−r})].

Since lim_{n→∞} (1/(nγ)) log( P^r(i, j) e^{γrK} ) = 0 as well, we can proceed similarly as in the case γ > 0 (now with the inequality reversed after dividing by nγ < 0) to obtain J(γ, i) ≤ J(γ, j), and the proof is complete.

Remark 4. (a) It follows from Lemma 2 that J(γ, i) = J(γ, j) when i ↔ j; that is, the EAC is constant within a self-communicating class. (b) Note that the arguments in the proof of Lemma 2 are valid even if the state space X is infinite.
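The monotonicity in Lemma 2 can be observed through the finite-horizon formula (8). The sketch below (not from the paper; our own toy chain) uses a model in which state 3 is transient and leads to two absorbing states; up to the finite-horizon error, sgn(γ) J(γ, 3) dominates sgn(γ) J(γ, j) for j = 1, 2.

```python
import numpy as np

# state 3 is transient and leads to the absorbing states 1 and 2
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.5, 0.5, 0.0]])
c = np.array([1.0, 3.0, 5.0])

def J_finite(gamma, n=300):
    # finite-horizon EAC (1/(n*gamma)) * log sum_j P~^n(i, j), cf. (8)
    Pt = np.exp(gamma * c)[:, None] * P
    return np.log(np.linalg.matrix_power(Pt, n).sum(axis=1)) / (n * gamma)

Jp = J_finite(0.5)     # roughly (1, 3, 3): state 3 inherits the worst class
Jm = J_finite(-0.5)    # roughly (1, 3, 1): state 3 inherits the best class
```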
Theorem 4. For a Markov cost chain with arbitrary transition probability matrix P, the EAC for i ∈ X is given by

J(γ, i) = (1/γ) log max { ρ(Q̃_C) : C is SC and i → C }.   (20)

Proof: First, note that for γ < 0, equation (20) can be written as

J(γ, i) = min { (1/γ) log ρ(Q̃_C) : C is SC and i → C }.

For brevity of writing, denote by α(γ, i) the right-hand side of (20). First, if i ∈ C and C is a closed SC class, then (20) holds because J(γ, i) = (1/γ) log ρ(Q̃_C) = α(γ, i). Next, if i ∈ C and C is a non-closed (transient) SC class, then for γ > (<) 0 we have

J(γ, i) = lim sup_{n→∞} (1/(nγ)) log Σ_{j=1}^N P̃^n(i, j) ≥ (≤) lim_{n→∞} (1/(nγ)) log Σ_{j∈C} (Q̃_C)^n(i, j) = (1/γ) log ρ(Q̃_C).

Finally, if C is SC, i → C and i ∉ C, then taking j ∈ C, by Lemma 2 we have

J(γ, i) ≥ (≤) J(γ, j) ≥ (≤) (1/γ) log ρ(Q̃_C).

Thus, we have proved that sgn(γ) J(γ, i) ≥ sgn(γ) α(γ, i) for all i ∈ X. To obtain the opposite inequality, define L(i) := {i} ∪ {j : i → j} and P̃_[i] := ( P̃(k, l) )_{k,l∈L(i)}. Then, for γ > (<) 0,

lim sup_{n→∞} (1/(nγ)) log Σ_{j=1}^N P̃^n(i, j) = lim sup_{n→∞} (1/(nγ)) log Σ_{j∈L(i)} P̃_[i]^n(i, j) ≤ (≥) lim_{n→∞} (1/(nγ)) log ||P̃_[i]^n|| = (1/γ) log ρ(P̃_[i]) = α(γ, i).

Since ρ(P̃_[i]) = max{ ρ(Q̃_C) : C is SC and i → C } by Proposition 4, the proof is complete.

Remark 5. Two remarkable differences between the risk-sensitive and the risk-neutral models become apparent from Theorem 4: (1) the EAC when beginning at a transient state is not a typical average of the EACs over the closed classes that are accessible from the initial state; and (2) the EAC may depend on the cost structure at the transient states.

Remark 6. Notice that the proofs of the last two results are still valid if we substitute lim sup by lim inf wherever the former appears. We deduce from that observation that, for the finite state space model, the limit exists in the definition of the EAC (2) and in (8).

Next, we give an example that demonstrates how characteristic (1) in the above remark may cause J(·, x) not to be continuous at zero, i.e., that (13) does not hold in general. The example considers a model with a transient state x which leads to more than one closed class.

Example 1. Consider the cost process with state space X = {1, 2, 3}, transition probability matrix

P = [ 1 0 0 ; 0 1 0 ; p 1−p 0 ],

and cost vector c = (1, 3, 5). In this case we have that

P̃ = [ e^{γc(1)} 0 0 ; 0 e^{γc(2)} 0 ; p e^{γc(3)} (1−p) e^{γc(3)} 0 ],

and state 3 leads to the classes C_1 = {1} and C_2 = {2}. Thus, from Theorem 4 we see that

J(γ, 3) = max{c(1), c(2)} = 3 if γ > 0,   J(γ, 3) = min{c(1), c(2)} = 1 if γ < 0,

so that

lim_{γ→0+} J(γ, 3) = 3 ≠ 1 = lim_{γ→0−} J(γ, 3).

On the other hand, it is easy to check that φ(3) = p c(1) + (1 − p) c(2) = 3 − 2p. Thus, J(·, 3) is neither left nor right continuous at zero.

Remark 7. Note that J(γ, 3) has the same value for all 0 < p < 1. In other words, as opposed to what happens with the risk-null criterion, given that 1 and 2 are accessible from 3, J(γ, 3) does not depend on the probability of entering each of the classes.

It is a well-known fact that if a Markov cost chain has a unichain structure, i.e., only one irreducible closed class and possibly some transient states, then the standard (risk-neutral) average cost does not depend on the initial state. The following example shows how feature (2) in Remark 5 may cause the previously mentioned property to fail in general for the EAC, when the risk-sensitivity coefficient is large.

Example 2. Consider the cost process with state space X = {1, 2}, transition matrix

P = [ 1 0 ; 1−p p ],

and cost vector c = (c(1), c(2)) such that c(1) < c(2). In this case we have Q̃_{C_1} = (e^{γc(1)}) and Q̃_{C_2} = (p e^{γc(2)}), corresponding to the self-communicating classes C_1 = {1} and C_2 = {2}. Thus, from Theorem 4, J(γ, 1) = c(1) for every γ, and J(γ, 2) = max{ c(1), c(2) + (log p)/γ } for γ > 0, while J(γ, 2) = min{ c(1), c(2) + (log p)/γ } for γ < 0; that is,

J(γ, 2) = c(1) if γ ≤ log p / (c(1) − c(2)),   J(γ, 2) = c(2) + (log p)/γ if γ > log p / (c(1) − c(2)).

Thus, if γ > log p / (c(1) − c(2)), then the EAC does depend on the initial state. On the other hand, we can observe that the EAC is better behaved for small values of γ: J(γ, 1) = J(γ, 2) for γ ≤ log p / (c(1) − c(2)).
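Theorem 4 suggests a direct algorithm: enumerate the self-communicating classes, keep those accessible from i, and take the sgn(γ)-extremal value of (1/γ) log ρ(Q̃_C). The sketch below (an illustrative implementation, not the authors' code; the name eac is ours) does exactly that and reproduces Example 1, including the independence from p.

```python
import numpy as np

def eac(P, c, gamma):
    # Theorem 4 (sketch): J(gamma, i) is the sgn(gamma)-extremal value of
    # (1/gamma) * log rho(Q~_C) over the SC classes C accessible from i
    N = len(c)
    reach = (P > 0).astype(int)
    closure = reach.copy()                     # i -> j in at least one step
    for _ in range(N):
        closure = np.minimum(1, closure + closure @ reach)
    classes = []                               # self-communicating classes
    for i in range(N):
        if closure[i, i]:                      # i is not an irrelevant state
            C = tuple(j for j in range(N) if closure[i, j] and closure[j, i])
            if C not in classes:
                classes.append(C)
    J = np.empty(N)
    for i in range(N):
        vals = []
        for C in classes:
            if closure[i, list(C)].max() == 0:     # C not accessible from i
                continue
            Qt = np.exp(gamma * c[list(C)])[:, None] * P[np.ix_(list(C), list(C))]
            vals.append(np.log(max(abs(np.linalg.eigvals(Qt)))) / gamma)
        J[i] = max(vals) if gamma > 0 else min(vals)
    return J

# Example 1: the value from the transient state 3 does not depend on p
p = 0.4
P = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [p, 1.0 - p, 0.0]])
c = np.array([1.0, 3.0, 5.0])
```

For γ > 0 this returns (1, 3, 3) and for γ < 0 it returns (1, 3, 1), whatever the value of 0 < p < 1, in agreement with Example 1.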

The following corollary to Theorem 4 shows that for γ close enough to zero, the EAC behaves similarly to the risk-neutral average cost in that it is completely determined by the long-run behavior of the underlying stochastic process; that is, its value is not influenced by the cost structure at the transient states.

Corollary 1. There exists γ_0 > 0 such that for every transient state i and γ ∈ (−γ_0, γ_0), γ ≠ 0,

J(γ, i) = (1/γ) log max { ρ(Q̃_C) : i → C and C is SC and closed }.

Proof: Let C be a non-closed (transient) SC class. For γ > 0, we have Q̃_C ≤ e^{γK} Q_C; thus ρ(Q̃_C) ≤ e^{γK} ρ(Q_C) and consequently

(1/γ) log ρ(Q̃_C) ≤ K + (1/γ) log ρ(Q_C).

Taking into account that ρ(Q_C) < 1, which follows from Proposition 2 and the fact that Q_C is irreducible and strictly substochastic, we have, for γ > 0,

lim_{γ→0+} (1/γ) log ρ(Q̃_C) ≤ K + lim_{γ→0+} (1/γ) log ρ(Q_C) = −∞.

Now, for γ < 0, Q̃_C ≤ Q_C; thus ρ(Q̃_C) ≤ ρ(Q_C) and consequently

(1/γ) log ρ(Q̃_C) ≥ (1/γ) log ρ(Q_C).

Then, similarly to the previous case,

lim_{γ→0−} (1/γ) log ρ(Q̃_C) ≥ lim_{γ→0−} (1/γ) log ρ(Q_C) = +∞.

Since there is only a finite number of SC classes, the claim follows from Theorem 4.

Corollary 2. If the probability transition matrix induces only one closed SC class C then, as in the irreducible case, the EAC converges to the risk-neutral average cost when γ goes to zero.

Proof: On one hand, we know that the value φ of the risk-neutral average cost does not depend on the initial state; see for example [1]. On the other hand, by the previous corollary, for γ small enough, J(γ, i) = (1/γ) log ρ(Q̃_C) for every i ∈ X. Therefore, by Theorem 2,

lim_{γ→0} J(γ, i) = lim_{γ→0} (1/γ) log ρ(Q̃_C) = φ   for every i ∈ X.

5. Existence of solutions to the optimality equation.

Similarly to the optimal control problem for the risk-neutral average cost, the risk-sensitive optimal value function is, under appropriate conditions, given by the solutions to the functional equation

sgn(γ) α w(i) = min_{a∈A(i)} { sgn(γ) e^{γc(i,a)} Σ_{j∈X} P_ij(a) w(j) },   i ∈ X,   (21)

which we will call the exponential average optimality equation (EAOE) corresponding to γ, or the γ-EAOE. More precisely, if α > 0 and the function w : X → [K_1, K_2], with K_1 > 0, satisfy (21), then J*(γ, i) = (1/γ) log α for every i ∈ X. Furthermore, if for each i, f*(i) attains the minimum on the right-hand side of (21), then the policy π* = (f*, f*, ...) ∈ Π_SD is optimal; see [4, 0,, 8].

In this section, we show that, under a simultaneous Doeblin condition and for small enough γ, solutions to the γ-EAOE exist. This result was already proved by Cavazos-Cadena and Fernández-Gaucherand [5]; however, the proof we present here (a) is significantly simpler than that in [5], and (b) as with the developments in previous sections of this paper, it relies entirely on elementary results of Perron-Frobenius theory for non-negative matrices (similarly to the approach in the seminal paper of Howard and Matheson [4]). First, we prove the existence claim for the case in which the transition matrix P_f is irreducible for every decision function f. Although this result extends that of Howard and Matheson [4] in that aperiodicity of P_f is not required, the proof essentially follows the policy improvement algorithm devised by these authors.

Lemma 3. If a finite state CMC is irreducible, i.e., P_f = ( P_ij(f(i)) ) is irreducible for every f ∈ F, then for each γ ≠ 0 the γ-EAOE has a solution (α, w).

Proof: Take f ∈ F such that J_f = min{ J_g : g ∈ F }. Let ρ(f) and w_f respectively be the spectral radius and corresponding positive eigenvector of P̃_f, so that J_f = (1/γ) log ρ(f) and ρ(f) w_f = P̃_f w_f. We claim that (ρ(f), w_f) is a solution of the optimality equation. Assume to the contrary that there exists g ∈ F such that

sgn(γ) ρ(f) w_f ≥ sgn(γ) P̃_g w_f   and   ρ(f) w_f ≠ P̃_g w_f.

Then, taking into account that w_f is a positive vector, it follows from Proposition 3 that sgn(γ) ρ(f) > sgn(γ) ρ(g). This last inequality contradicts the way we chose f, since J_f ≤ J_g implies sgn(γ) ρ(f) ≤ sgn(γ) ρ(g).
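Lemma 3 can be checked by brute force on a tiny model (a sketch over our own toy data, not from the paper): enumerate all decision functions, pick the one minimizing sgn(γ) ρ(P̃_f), and verify that its (ρ, w) satisfies the γ-EAOE (21).

```python
import itertools
import numpy as np

# a tiny CMC: 2 states, 2 actions, every induced P_f strictly positive
Pa = {0: np.array([[0.2, 0.8], [0.6, 0.4]]),
      1: np.array([[0.7, 0.3], [0.1, 0.9]])}
ca = np.array([[1.0, 2.0],     # one-stage costs c(i, a), rows indexed by i
               [3.0, 0.5]])
gamma = 0.8

def rho_w(f):
    # spectral radius and Perron eigenvector of the disutility matrix P~_f
    Pf = np.vstack([Pa[f[i]][i] for i in range(2)])
    cf = np.array([ca[i, f[i]] for i in range(2)])
    Pt = np.exp(gamma * cf)[:, None] * Pf
    vals, vecs = np.linalg.eig(Pt)
    k = np.argmax(vals.real)
    return vals[k].real, np.abs(vecs[:, k].real)

# policy improvement endpoint: the policy minimizing rho (here gamma > 0)
best = min(itertools.product([0, 1], repeat=2), key=lambda f: rho_w(f)[0])
rho, w = rho_w(best)

# gamma-EAOE (21) for gamma > 0:
# rho * w(i) = min_a exp(gamma * c(i, a)) * sum_j P_ij(a) * w(j)
rhs = np.array([min(np.exp(gamma * ca[i, a]) * (Pa[a][i] @ w) for a in (0, 1))
                for i in range(2)])
```

By the argument in the proof of Lemma 3, no other decision function can improve on the minimizer, so the minimum on the right-hand side is attained by the policy's own actions and rho * w equals rhs.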
Remark 8 (a) It is clear that under the irreducibility assumption of Lemma 3, a stationary deterministic policy exists which is optimal within Π_SD, because that class of policies is finite. What the lemma guarantees, via the verification theorem cited at the beginning of this section, is that such a policy will be optimal within the whole class Π as well. (b) The smoothness (with respect to γ) of the EAC for a model as in the lemma above has simple yet remarkable implications for the variation of the optimal policies with respect to γ. First, consider arbitrary f, g ∈ Π_SD. If φ_f > φ_g then, from the continuity of the value functions J_f(·) and J_g(·) at γ = 0, we obtain that J_f(γ) > J_g(γ) for every γ in a neighborhood of zero. Now, if φ_f = φ_g, then the analytic character of the value functions implies that for some γ_0 > 0, either J_f(γ) − J_g(γ) > 0 for every 0 < |γ| < γ_0, or J_f(γ) − J_g(γ) = 0 for every |γ| < γ_0. It follows at once from the previous observations that there exist γ_0 > 0 and decision functions f*, g* ∈ F (possibly equal) such that f* is γ-average optimal for every γ ∈ (0, γ_0) and g* is γ-average optimal for every γ ∈ (−γ_0, 0). Moreover, f* and g* are risk-null average optimal. In particular, if there is only one risk-null average optimal policy (an unlikely scenario), that policy must also be γ-average optimal for every γ ∈ (−γ_0, γ_0). Lemma 4 If a finite state CMC satisfies a simultaneous Doeblin condition, i.e., if there exists a state i_0 such that i → i_0 under P_f for every f in the set F of decision functions and every i ∈ X, then the exponential average optimality equation has a solution whenever |γ| is small enough.
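The simultaneous Doeblin condition in Lemma 4 is a purely graph-theoretic requirement on the supports of the transition matrices, so it can be checked mechanically. The sketch below (hypothetical matrices and function names of our own) finds every state i_0 that is accessible from all states under every induced matrix P_f.

```python
import numpy as np

def reaches(P, target):
    """States from which `target` is accessible under transition matrix P,
    found by backward search on the directed graph of positive entries."""
    n = P.shape[0]
    reached = {target}
    changed = True
    while changed:
        changed = False
        for i in range(n):
            if i not in reached and any(P[i, j] > 0 for j in reached):
                reached.add(i)
                changed = True
    return reached

def simultaneous_doeblin(matrices):
    """States i0 with i -> i0 for every i and every induced matrix."""
    n = matrices[0].shape[0]
    return [i0 for i0 in range(n)
            if all(len(reaches(P, i0)) == n for P in matrices)]

# Hypothetical two-decision-function model on three states; state 2 is
# absorbing under both matrices and accessible from everywhere.
P_f = np.array([[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 0.0, 1.0]])
P_g = np.array([[0.5, 0.0, 0.5], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]])
print(simultaneous_doeblin([P_f, P_g]))  # [2]
```

Here state 2 plays the role of i_0 in the lemma: it lies in the recurrent class C(f) of every decision function, which is exactly the observation that starts the proof.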

Proof: Let us denote by C(f) and T(f), respectively, the class of recurrent states and the class of transient states corresponding to the transition probability matrix P_f induced by f. Additionally, denote Q_f := (P_f(i, j))_{i,j ∈ C(f)} and, as usual, by Q̃_f the corresponding disutility matrix. Observe that i_0 ∈ C(f) for every f ∈ F. Since F is finite, Corollary 1 guarantees the existence of γ_0 > 0 such that if |γ| < γ_0, then ρ(Q̃_f) = ρ(P̃_f) =: ρ(f) and J_f(γ, i) = (1/γ) log ρ(Q̃_f) =: J_f, for every i ∈ X and f ∈ F. As in Lemma 3, take f* ∈ F such that J_{f*} = min{J_g : g ∈ F} and denote by w_{f*} a nonnegative eigenvector of P̃_{f*} corresponding to ρ(f*). To prove that (ρ(f*), w_{f*}) is a solution of the optimality equation exactly as we did in Lemma 3, all we need to check is that w_{f*} is in fact a positive vector. To that end, relabel X so that C(f*) = {1, ..., k} and T(f*) = {k+1, ..., N}. Since ρ(f*) is also the spectral radius of Q̃_{f*}, which is irreducible, and (w_{f*}(1), ..., w_{f*}(k)) is a corresponding nonnegative eigenvector, that eigenvector must be positive, that is, we have w_{f*}(i) > 0 for i ∈ C(f*). Consider now i ∈ T(f*) and a positive integer n such that P^n_{f*}(i, i_0) > 0. From the eigenvalue equation P̃^n_{f*} w_{f*} = ρ(f*)^n w_{f*} we obtain the equality

ρ(f*)^n w_{f*}(i) = Σ_{j ∈ C(f*)} P̃^n_{f*}(i, j) w_{f*}(j) + Σ_{j ∈ T(f*)} P̃^n_{f*}(i, j) w_{f*}(j).

Now, P̃^n_{f*}(i, i_0) > 0 and w_{f*}(i_0) > 0 imply that the first sum on the right-hand side of the above equality is positive and, consequently, w_{f*}(i) > 0. Thus, as we noted before, we can now proceed as in Lemma 3 to complete the proof of the present lemma.

Remark 9 Observe that the remarks to Lemma 3 are still valid in the context of Lemma 4.

References

[1] A. Arapostathis, V. S. Borkar, E. Fernández-Gaucherand, M. K. Ghosh, and S. I. Marcus. Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control and Optimization, 31(2):282-344, March 1993.
[2] A. Berman and R. Plemmons. Nonnegative Matrices in the Mathematical Sciences.
Academic Press, New York, 1979.
[3] D. P. Bertsekas. Dynamic Programming: Deterministic and Stochastic Models. Prentice Hall, Englewood Cliffs, N.J., 1987.
[4] A. Brau and E. Fernández-Gaucherand. Controlled Markov chains with risk-sensitive exponential average cost criterion. In Proceedings of the 36th IEEE Conference on Decision and Control, San Diego, CA, 1997.
[5] R. Cavazos-Cadena and E. Fernández-Gaucherand. Controlled Markov chains with risk-sensitive average cost criterion: A counter-example and necessary conditions for optimal solutions under strong recurrence assumptions. Submitted for publication, 1998.
[6] R. Cavazos-Cadena and E. Fernández-Gaucherand. Controlled Markov chains with risk-sensitive criteria: Average cost, optimality equations, and optimal solutions. Mathematical Methods of Operations Research, 49:299-324, 1999.

[7] R. Cavazos-Cadena and E. Fernández-Gaucherand. Risk-sensitive optimal control in communicating average Markov decision chains. In M. Dror, P. L'Ecuyer, and F. Szidarovszky, editors, Modeling Uncertainty: An Examination of Stochastic Theory, Methods and Applications. Kluwer Academic Publishers, 2002.
[8] A. Dembo and O. Zeitouni. Large Deviations Techniques and Applications. Jones and Bartlett, Boston, MA, 1993.
[9] E. A. Feinberg. Controlled Markov processes with arbitrary numerical criteria. Theory of Probability and its Applications, 27, 1982.
[10] W. H. Fleming and D. Hernández-Hernández. Risk-sensitive control of finite state machines on an infinite horizon I. SIAM Journal on Control and Optimization, 35(5):1790-1810, September 1997.
[11] D. Hernández-Hernández and S. I. Marcus. Risk sensitive control of Markov processes in countable state space. Systems and Control Letters, 1997 (to appear).
[12] O. Hernández-Lerma and J. B. Lasserre. Discrete-Time Markov Control Processes. Springer, New York, 1996.
[13] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1982.
[14] R. A. Howard and J. E. Matheson. Risk-sensitive Markov decision processes. Management Science, 18(7):356-369, March 1972.
[15] T. Kato. A Short Introduction to Perturbation Theory for Linear Operators. Springer-Verlag, New York, 1982.
[16] S. I. Marcus, E. Fernández-Gaucherand, D. Hernández-Hernández, S. Coraluppi, and P. Fard. Risk sensitive Markov decision processes. In C. Byrnes, B. Datta, D. Gilliam, and C. Martin, editors, Systems and Control in the Twenty-First Century, Progress in Systems and Control, Birkhäuser, 1997.
[17] J. Neveu. Mathematical Foundations of the Calculus of Probability. Holden-Day, Inc., San Francisco, Cal., 1965.
[18] J. W. Pratt. Risk aversion in the small and in the large. Econometrica, 32(1):122-136, January-April 1964.
[19] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, 1994.
[20] P. Whittle. Risk-sensitive Optimal Control. John Wiley & Sons, Chichester, 1990.


More information

Auxiliary signal design for failure detection in uncertain systems

Auxiliary signal design for failure detection in uncertain systems Auxiliary signal design for failure detection in uncertain systems R. Nikoukhah, S. L. Campbell and F. Delebecque Abstract An auxiliary signal is an input signal that enhances the identifiability of a

More information

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past.

Lecture 5. If we interpret the index n 0 as time, then a Markov chain simply requires that the future depends only on the present and not on the past. 1 Markov chain: definition Lecture 5 Definition 1.1 Markov chain] A sequence of random variables (X n ) n 0 taking values in a measurable state space (S, S) is called a (discrete time) Markov chain, if

More information

Simplex Algorithm for Countable-state Discounted Markov Decision Processes

Simplex Algorithm for Countable-state Discounted Markov Decision Processes Simplex Algorithm for Countable-state Discounted Markov Decision Processes Ilbin Lee Marina A. Epelman H. Edwin Romeijn Robert L. Smith November 16, 2014 Abstract We consider discounted Markov Decision

More information

arxiv:quant-ph/ v1 22 Aug 2005

arxiv:quant-ph/ v1 22 Aug 2005 Conditions for separability in generalized Laplacian matrices and nonnegative matrices as density matrices arxiv:quant-ph/58163v1 22 Aug 25 Abstract Chai Wah Wu IBM Research Division, Thomas J. Watson

More information

Point Process Control

Point Process Control Point Process Control The following note is based on Chapters I, II and VII in Brémaud s book Point Processes and Queues (1981). 1 Basic Definitions Consider some probability space (Ω, F, P). A real-valued

More information

Part III. 10 Topological Space Basics. Topological Spaces

Part III. 10 Topological Space Basics. Topological Spaces Part III 10 Topological Space Basics Topological Spaces Using the metric space results above as motivation we will axiomatize the notion of being an open set to more general settings. Definition 10.1.

More information

Numerical Analysis: Solving Systems of Linear Equations

Numerical Analysis: Solving Systems of Linear Equations Numerical Analysis: Solving Systems of Linear Equations Mirko Navara http://cmpfelkcvutcz/ navara/ Center for Machine Perception, Department of Cybernetics, FEE, CTU Karlovo náměstí, building G, office

More information

Applications of Controlled Invariance to the l 1 Optimal Control Problem

Applications of Controlled Invariance to the l 1 Optimal Control Problem Applications of Controlled Invariance to the l 1 Optimal Control Problem Carlos E.T. Dórea and Jean-Claude Hennet LAAS-CNRS 7, Ave. du Colonel Roche, 31077 Toulouse Cédex 4, FRANCE Phone : (+33) 61 33

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

P i [B k ] = lim. n=1 p(n) ii <. n=1. V i :=

P i [B k ] = lim. n=1 p(n) ii <. n=1. V i := 2.7. Recurrence and transience Consider a Markov chain {X n : n N 0 } on state space E with transition matrix P. Definition 2.7.1. A state i E is called recurrent if P i [X n = i for infinitely many n]

More information

On the asymptotic dynamics of 2D positive systems

On the asymptotic dynamics of 2D positive systems On the asymptotic dynamics of 2D positive systems Ettore Fornasini and Maria Elena Valcher Dipartimento di Elettronica ed Informatica, Univ. di Padova via Gradenigo 6a, 353 Padova, ITALY e-mail: meme@paola.dei.unipd.it

More information

FIXED POINT ITERATIONS

FIXED POINT ITERATIONS FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in

More information

STOCHASTIC PROCESSES Basic notions

STOCHASTIC PROCESSES Basic notions J. Virtamo 38.3143 Queueing Theory / Stochastic processes 1 STOCHASTIC PROCESSES Basic notions Often the systems we consider evolve in time and we are interested in their dynamic behaviour, usually involving

More information

On Distributed Coordination of Mobile Agents with Changing Nearest Neighbors

On Distributed Coordination of Mobile Agents with Changing Nearest Neighbors On Distributed Coordination of Mobile Agents with Changing Nearest Neighbors Ali Jadbabaie Department of Electrical and Systems Engineering University of Pennsylvania Philadelphia, PA 19104 jadbabai@seas.upenn.edu

More information

Connections between spectral properties of asymptotic mappings and solutions to wireless network problems

Connections between spectral properties of asymptotic mappings and solutions to wireless network problems 1 Connections between spectral properties of asymptotic mappings and solutions to wireless network problems R. L. G. Cavalcante, Member, IEEE, Qi Liao, Member, IEEE, and S. Stańczak, Senior Member, IEEE

More information

Matrix analytic methods. Lecture 1: Structured Markov chains and their stationary distribution

Matrix analytic methods. Lecture 1: Structured Markov chains and their stationary distribution 1/29 Matrix analytic methods Lecture 1: Structured Markov chains and their stationary distribution Sophie Hautphenne and David Stanford (with thanks to Guy Latouche, U. Brussels and Peter Taylor, U. Melbourne

More information

INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING

INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING INTRODUCTION TO MARKOV CHAINS AND MARKOV CHAIN MIXING ERIC SHANG Abstract. This paper provides an introduction to Markov chains and their basic classifications and interesting properties. After establishing

More information

ELA

ELA SUBDOMINANT EIGENVALUES FOR STOCHASTIC MATRICES WITH GIVEN COLUMN SUMS STEVE KIRKLAND Abstract For any stochastic matrix A of order n, denote its eigenvalues as λ 1 (A),,λ n(a), ordered so that 1 = λ 1

More information

Linearizing Symmetric Matrix Polynomials via Fiedler pencils with Repetition

Linearizing Symmetric Matrix Polynomials via Fiedler pencils with Repetition Linearizing Symmetric Matrix Polynomials via Fiedler pencils with Repetition Kyle Curlett Maribel Bueno Cachadina, Advisor March, 2012 Department of Mathematics Abstract Strong linearizations of a matrix

More information

Prioritized Sweeping Converges to the Optimal Value Function

Prioritized Sweeping Converges to the Optimal Value Function Technical Report DCS-TR-631 Prioritized Sweeping Converges to the Optimal Value Function Lihong Li and Michael L. Littman {lihong,mlittman}@cs.rutgers.edu RL 3 Laboratory Department of Computer Science

More information

Markov Decision Processes and Dynamic Programming

Markov Decision Processes and Dynamic Programming Markov Decision Processes and Dynamic Programming A. LAZARIC (SequeL Team @INRIA-Lille) Ecole Centrale - Option DAD SequeL INRIA Lille EC-RL Course In This Lecture A. LAZARIC Markov Decision Processes

More information

Convex Optimization CMU-10725

Convex Optimization CMU-10725 Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state

More information

Boolean Inner-Product Spaces and Boolean Matrices

Boolean Inner-Product Spaces and Boolean Matrices Boolean Inner-Product Spaces and Boolean Matrices Stan Gudder Department of Mathematics, University of Denver, Denver CO 80208 Frédéric Latrémolière Department of Mathematics, University of Denver, Denver

More information

Series Expansions in Queues with Server

Series Expansions in Queues with Server Series Expansions in Queues with Server Vacation Fazia Rahmoune and Djamil Aïssani Abstract This paper provides series expansions of the stationary distribution of finite Markov chains. The work presented

More information

arxiv: v2 [cs.ds] 27 Nov 2014

arxiv: v2 [cs.ds] 27 Nov 2014 Single machine scheduling problems with uncertain parameters and the OWA criterion arxiv:1405.5371v2 [cs.ds] 27 Nov 2014 Adam Kasperski Institute of Industrial Engineering and Management, Wroc law University

More information

Introduction to Machine Learning CMU-10701

Introduction to Machine Learning CMU-10701 Introduction to Machine Learning CMU-10701 Markov Chain Monte Carlo Methods Barnabás Póczos & Aarti Singh Contents Markov Chain Monte Carlo Methods Goal & Motivation Sampling Rejection Importance Markov

More information