Zero-Sum Average Semi-Markov Games: Fixed Point Solutions of the Shapley Equation


Oscar Vega-Amaya
Departamento de Matemáticas
Universidad de Sonora

May 2002

Abstract. This paper deals with zero-sum average semi-Markov games with Borel state and action spaces, and unbounded payoffs and mean holding times. A solution of the Shapley equation is obtained via the Banach Fixed Point Theorem, assuming that the model satisfies a Lyapunov-like condition and a growth hypothesis on the payoff function and the mean holding time, besides standard continuity and compactness requirements.

Key words. Zero-sum semi-Markov games, average payoff criterion, Lyapunov conditions, fixed-point approach.

AMS subject classification. 90D10, 90D20, 93E05.

1 Introduction

Several recent papers have used variants of a Lyapunov-like condition to solve average payoff optimization problems for Markovian systems with unbounded payoff and Borel state and action spaces (see, e.g., [9], [13], [14] for Markov models; [15], [20], [28] for semi-Markov models; [11], [16], [23] for zero-sum Markov games; and [17] for zero-sum semi-Markov games). The key property used in all these papers is that the imposed Lyapunov condition yields the so-called weighted geometric ergodicity (WGE) property, which is a generalization of the standard uniform geometric ergodicity in Markov chain theory (see [10], [12] and [21] for a detailed discussion of these concepts). Roughly speaking, in these papers the WGE property is combined, explicitly or implicitly, either with the vanishing discount factor approach or with some variant of the policy iteration algorithm to prove the main results. This is the first main difference with the present paper: in spite of imposing a similar stability condition, we use instead a fixed-point approach which does not rely, at least explicitly, on the WGE property.

This research was supported by CONACyT (México) under Grant E.

The fixed-point approach allows us to obtain the Shapley equation directly, which in turn yields the existence of a stationary optimal strategy pair or saddle point; see Theorem 4.7(a) and (b). In contrast, the approaches followed in [11], [16], [23] first show the existence of a stationary saddle point and then establish the Shapley equation. On the other hand, [20], [15], [17] resort to auxiliary models related to the original one; more precisely, [20] uses the so-called Schweitzer data transformation [26], while the analysis in [15] and [17] relies on certain perturbed models.

A second key difference concerns the times between two consecutive decision epochs. In contrast with discrete-time Markov control processes and Markov games, the decision epochs in semi-Markov control processes are random; thus it is necessary to ensure that such processes experience only finitely many transitions in each finite time period. This is usually done by assuming that the mean holding time function is bounded below by a positive constant, even in the discrete state space case (see, e.g., [2], [5], [19], [24] and their references). In particular, this condition plays a crucial role in the approaches followed in [28], [15], [17] and [20]; in fact, the latter three references also assume that the mean holding time function is bounded above by a constant, while in the present paper it is only assumed that this function is positive.

It is important to mention that, as a by-product, the fixed-point approach yields a minimax characterization of a certain solution of the Shapley equation (Theorem 4.7(c)) which, seemingly, has not been previously discussed in the literature dealing with zero-sum stochastic games. We should also mention that the fixed-point approach has been used in several early papers (see, e.g., [7], [12], [18], [25]) but under much stronger ergodicity conditions, which, in particular, exclude the case of unbounded payoffs. The variant of the Lyapunov condition we consider here was recently introduced in [27] for Markov control processes and used in [8] to study minimax problems. In fact, the present paper extends to zero-sum semi-Markov games the results of the two latter references. For brief surveys of the existing literature on stochastic games with finite or denumerable state space the reader can consult [1], [3], [6], [7] and [19].

The remainder of the paper is organized as follows. The semi-Markov game model and the (ratio) expected average payoff criterion are introduced in Sections 2 and 3, respectively. The assumptions and main results are stated in Section 4. The proofs of all results are given in Sections 5 and 6.

2 The Game Model

Throughout the paper we shall use the following notation. Given a Borel space $S$ (that is, a Borel subset of a complete separable metric space), $\mathcal{B}(S)$ denotes its Borel $\sigma$-algebra, and measurability always means measurability with respect to $\mathcal{B}(S)$. The class of all probability measures on $S$ is denoted by $\mathcal{P}(S)$. Given two Borel spaces $S$ and $S'$, a stochastic kernel $\varphi(\cdot \mid \cdot)$ on $S$ given $S'$ is a function such that $\varphi(\cdot \mid s')$ is in $\mathcal{P}(S)$ for each $s' \in S'$, and $\varphi(B \mid \cdot)$ is a measurable function on $S'$ for each $B \in \mathcal{B}(S)$.

Moreover, $\mathbb{R}_+ := [0, +\infty)$ stands for the set of nonnegative real numbers, and $\mathbb{N}$ (resp. $\mathbb{N}_0$) denotes the set of positive (resp. nonnegative) integers.

The semi-Markov game model. This paper is concerned with a zero-sum semi-Markov game modeled by

$$(X, A, B, \mathbb{K}_A, \mathbb{K}_B, Q, F, r)$$

where $X$ is the state space, and the sets $A$ and $B$ are the control spaces for players 1 and 2, respectively. It is assumed that all these sets are Borel spaces. The constraint sets $\mathbb{K}_A$ and $\mathbb{K}_B$ are Borel subsets of $X \times A$ and $X \times B$, respectively. Thus, for each $x \in X$, the $x$-sections

$$A(x) := \{a \in A : (x,a) \in \mathbb{K}_A\}, \qquad B(x) := \{b \in B : (x,b) \in \mathbb{K}_B\}$$

stand for the sets of admissible actions or controls for players 1 and 2, respectively. Now let

$$\mathbb{K} := \{(x,a,b) : x \in X,\ a \in A(x),\ b \in B(x)\},$$

which, by [22], is a Borel subset of $X \times A \times B$. The transition law $Q(\cdot \mid \cdot)$ of the system is a stochastic kernel on $X$ given $\mathbb{K}$. For each $(x,a,b,y) \in \mathbb{K} \times X$, $F(\cdot \mid x,a,b,y)$ is a distribution function on $\mathbb{R}_+$, and $F(t \mid \cdot)$ is a measurable function on $\mathbb{K} \times X$ for each $t \in \mathbb{R}_+$. Finally, the payoff $r$ is a measurable function on $\mathbb{K} \times \mathbb{R}_+$.

The game is played over an infinite horizon as follows: at time $t = 0$ the game is observed in some state $x_0 = x$ and the players independently choose controls $a_0 = a \in A(x_0)$ and $b_0 = b \in B(x_0)$. Then the system remains in state $x_0 = x$ for a nonnegative random time $\delta_1$ and player 1 receives the amount $r(x,a,b,\delta_1)$ from player 2. At time $\delta_1$ the system jumps to a new state $x_1 = x'$ according to the probability measure $Q(\cdot \mid x,a,b)$. The distribution of the random variable $\delta_1$, given that the system has jumped into state $x'$, is $F(\cdot \mid x,a,b,x')$; that is,

$$F(t \mid x,a,b,x') = \Pr\left[\delta_1 \le t \mid x_0 = x,\ a_0 = a,\ b_0 = b,\ x_1 = x'\right], \qquad t \in \mathbb{R}_+.$$

Thus, given that $x_0 = x$, $a_0 = a$ and $b_0 = b$, the distribution of $\delta_1$ is

$$G(t \mid x,a,b) := \int_X F(t \mid x,a,b,y)\, Q(dy \mid x,a,b), \qquad t \in \mathbb{R}_+,\ (x,a,b) \in \mathbb{K},$$

and it is called the holding time distribution. Immediately after the transition occurs, the players again choose controls, say $a_1 = a' \in A(x')$ and $b_1 = b' \in B(x')$, and the above process is repeated over and over again.
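To fix ideas, the following minimal sketch simulates the play just described for a hypothetical finite model. Everything concrete in it (the two-state space, the exponential holding times, the payoff function `r`, the helper `sample_holding_time`) is an illustrative assumption, not part of the paper's model, and the uniform random action choices merely stand in for arbitrary strategies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite model: 2 states, 2 actions per player.
n_states, n_a, n_b = 2, 2, 2
# Q[x, a, b] is a probability vector over next states: the transition law.
Q = rng.dirichlet(np.ones(n_states), size=(n_states, n_a, n_b))

def sample_holding_time(x, a, b, y):
    # Illustrative F(. | x,a,b,y): exponential with a transition-dependent rate.
    return rng.exponential(1.0 / (1.0 + 0.5 * (x + y)))

def r(x, a, b, t):
    # Illustrative payoff received by player 1 during the holding time.
    return (a - b) * t

x, total_reward, total_time = 0, 0.0, 0.0
for n in range(10_000):
    a = rng.integers(n_a)                    # player 1 picks a in A(x)
    b = rng.integers(n_b)                    # player 2 picks b in B(x)
    y = rng.choice(n_states, p=Q[x, a, b])   # x_{n+1} ~ Q(. | x, a, b)
    delta = sample_holding_time(x, a, b, y)  # delta_{n+1} ~ F(. | x, a, b, y)
    total_reward += r(x, a, b, delta)
    total_time += delta                      # T_{n+1} = T_n + delta_{n+1}
    x = y

print(total_reward / total_time)  # empirical ratio-average payoff, cf. (5)-(6) below
```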

This procedure yields a stochastic process $\{(x_n, a_n, b_n, \delta_{n+1})\}$ where, for each $n \in \mathbb{N}_0$, $x_n$ is the state of the system, $a_n$ and $b_n$ are the control variables for players 1 and 2, respectively, and $\delta_{n+1}$ is the holding time at state $x_n$. The goal of player 1 (player 2, resp.) is to maximize (minimize, resp.) his/her flow of rewards (costs, resp.) $r(x_0,a_0,b_0,\delta_1), r(x_1,a_1,b_1,\delta_2), \dots$ over an infinite horizon using the expected average reward (cost) criterion defined by (5) below.

The functions on $\mathbb{K}$ given by

$$\tau(x,a,b) := \int_0^{+\infty} t\, G(dt \mid x,a,b) \qquad (1)$$

$$R(x,a,b) := \int_0^{+\infty} r(x,a,b,t)\, G(dt \mid x,a,b) \qquad (2)$$

are called the mean holding time and the mean payoff, respectively. (A concrete instance is worked out in the sketch below.)

Strategies. Let $H_0 := X$ and $H_n := \mathbb{K} \times \mathbb{R}_+ \times H_{n-1}$ for $n \in \mathbb{N}$. Then, for each $n \in \mathbb{N}_0$, a generic element of $H_n$ is denoted by

$$h_n := (x_0, a_0, b_0, \delta_1, \dots, x_{n-1}, a_{n-1}, b_{n-1}, \delta_n, x_n),$$

which can be thought of as the history of the game up to the time of the $n$th transition

$$T_n := T_{n-1} + \delta_n, \qquad n \in \mathbb{N}, \qquad (3)$$

where $T_0 := 0$. Thus a strategy for player 1 is a sequence $\pi^1 = \{\pi^1_n\}$ of stochastic kernels $\pi^1_n$ on $A$ given $H_n$ satisfying the constraint

$$\pi^1_n(A(x_n) \mid h_n) = 1 \qquad \forall h_n \in H_n,\ n \in \mathbb{N}_0.$$

The class of all strategies for player 1 is denoted by $\Pi^1$. For each $x \in X$, let $\mathbb{A}(x) := \mathcal{P}(A(x))$ and denote by $\Phi^1$ the class of all stochastic kernels $\varphi^1$ on $A$ given $X$ such that $\varphi^1(\cdot \mid x) \in \mathbb{A}(x)$ for all $x \in X$. A strategy $\pi^1$ is called stationary if

$$\pi^1_n(\cdot \mid h_n) = \varphi^1(\cdot \mid x_n) \qquad \forall h_n \in H_n,\ n \in \mathbb{N}_0,$$

for some stochastic kernel $\varphi^1$ in $\Phi^1$. Following a standard convention, $\Phi^1$ is identified with the class of stationary strategies for player 1. The sets of strategies $\Pi^2$ and $\Phi^2$ for player 2 are defined in a similar way, writing $B(x)$ and $\mathbb{B}(x)$ instead of $A(x)$ and $\mathbb{A}(x)$, respectively.
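Before proceeding, here is a concrete instance of the mean quantities (1)-(2): if $G(\cdot \mid x,a,b)$ is exponential with rate $\beta$ and $r(x,a,b,t) = c\,t$ for a constant $c$, then $\tau = 1/\beta$ and $R = c/\beta$. The following hedged check, with illustrative values of $\beta$ and $c$ only, evaluates the two integrals numerically:

```python
import numpy as np
from scipy.integrate import quad

beta, c = 2.0, 1.5                      # illustrative rate and payoff slope
g = lambda t: beta * np.exp(-beta * t)  # density of G(. | x, a, b)

tau, _ = quad(lambda t: t * g(t), 0, np.inf)      # mean holding time (1)
R, _ = quad(lambda t: c * t * g(t), 0, np.inf)    # mean payoff (2)

print(tau, 1 / beta)  # both ~0.5
print(R, c / beta)    # both ~0.75
```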

Let $(\Omega, \mathcal{F})$ be the (canonical) measurable space consisting of the sample space $\Omega := (\mathbb{K} \times \mathbb{R}_+)^{\infty}$ and its product $\sigma$-algebra. Then, for each strategy pair $(\pi^1, \pi^2) \in \Pi^1 \times \Pi^2$ and each initial state $x \in X$, there exists a probability measure $P^{\pi^1,\pi^2}_x$ defined on $(\Omega, \mathcal{F})$ which governs the evolution of the stochastic process $\{(x_n, a_n, b_n, \delta_{n+1})\}$. The expectation operator with respect to the probability measure $P^{\pi^1,\pi^2}_x$ is denoted by $E^{\pi^1,\pi^2}_x$.

Throughout the paper we shall use the following notation: for a measurable function $u$ on $\mathbb{K}$ and a stationary strategy pair $(\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2$, let

$$u_{\varphi^1,\varphi^2}(x) := \int_{B(x)} \int_{A(x)} u(x,a,b)\, \varphi^1(da \mid x)\, \varphi^2(db \mid x), \qquad x \in X. \qquad (4)$$

Thus, in particular, we shall write

$$R_{\varphi^1,\varphi^2}(x) := \int_{B(x)} \int_{A(x)} R(x,a,b)\, \varphi^1(da \mid x)\, \varphi^2(db \mid x),$$

$$\tau_{\varphi^1,\varphi^2}(x) := \int_{B(x)} \int_{A(x)} \tau(x,a,b)\, \varphi^1(da \mid x)\, \varphi^2(db \mid x),$$

and, similarly,

$$Q_{\varphi^1,\varphi^2}(\cdot \mid x) := \int_{B(x)} \int_{A(x)} Q(\cdot \mid x,a,b)\, \varphi^1(da \mid x)\, \varphi^2(db \mid x),$$

for all $x \in X$.

If the players use a stationary strategy pair, say $(\varphi^1, \varphi^2)$, then the state process $\{x_n\}$ is a Markov chain with transition probability $Q_{\varphi^1,\varphi^2}(\cdot \mid \cdot)$. In this case, the $n$-step transition probability is denoted by $Q^n_{\varphi^1,\varphi^2}(\cdot \mid \cdot)$ for each $n \in \mathbb{N}_0$, where $Q^0_{\varphi^1,\varphi^2}(\cdot \mid x)$ is the Dirac measure at $x$. Thus, for each $u \in B_W(X)$ (this space is defined in Section 4),

$$Q^n_{\varphi^1,\varphi^2} u(x) := \int_X u(y)\, Q^n_{\varphi^1,\varphi^2}(dy \mid x) = E^{\varphi^1,\varphi^2}_x u(x_n), \qquad x \in X,\ n \in \mathbb{N}_0.$$

3 The expected average payoff criterion

The (ratio) expected average payoff (EAP) for the strategy pair $(\pi^1, \pi^2) \in \Pi^1 \times \Pi^2$, given the initial state $x_0 = x$, is defined as

$$J(\pi^1, \pi^2, x) := \liminf_{n \to \infty} \frac{E^{\pi^1,\pi^2}_x \sum_{k=0}^{n-1} r(x_k, a_k, b_k, \delta_{k+1})}{E^{\pi^1,\pi^2}_x T_n}. \qquad (5)$$

It is easy to verify, using properties of conditional expectation, that

$$E^{\pi^1,\pi^2}_x \delta_{k+1} = E^{\pi^1,\pi^2}_x \tau(x_k, a_k, b_k)$$

and also that

$$E^{\pi^1,\pi^2}_x r(x_k, a_k, b_k, \delta_{k+1}) = E^{\pi^1,\pi^2}_x R(x_k, a_k, b_k),$$

for all $x \in X$, $(\pi^1,\pi^2) \in \Pi^1 \times \Pi^2$, $k \in \mathbb{N}_0$. Thus, (5) can be rewritten as

$$J(\pi^1, \pi^2, x) = \liminf_{n \to \infty} \frac{E^{\pi^1,\pi^2}_x \sum_{k=0}^{n-1} R(x_k, a_k, b_k)}{E^{\pi^1,\pi^2}_x \sum_{k=0}^{n-1} \tau(x_k, a_k, b_k)}. \qquad (6)$$

Now consider the following functions on $X$:

$$L(x) := \sup_{\pi^1 \in \Pi^1} \inf_{\pi^2 \in \Pi^2} J(\pi^1, \pi^2, x) \quad \text{and} \quad U(x) := \inf_{\pi^2 \in \Pi^2} \sup_{\pi^1 \in \Pi^1} J(\pi^1, \pi^2, x), \qquad (7)$$

which are called the lower value and the upper value of the game, respectively, for the ratio EAP criterion. In general, $L(\cdot) \le U(\cdot)$, but if $L(\cdot) = U(\cdot)$ holds, the common function is called the value of the game and is denoted by $V(\cdot)$.

If the game has a value $V(\cdot)$, a strategy $\pi^1_* \in \Pi^1$ is said to be expected average payoff (EAP-) optimal for player 1 if

$$\inf_{\pi^2 \in \Pi^2} J(\pi^1_*, \pi^2, x) = V(x) \qquad \forall x \in X.$$

Similarly, $\pi^2_* \in \Pi^2$ is said to be expected average payoff (EAP-) optimal for player 2 if

$$\sup_{\pi^1 \in \Pi^1} J(\pi^1, \pi^2_*, x) = V(x) \qquad \forall x \in X.$$

If $\pi^i_*$ is EAP-optimal for player $i$ ($i = 1, 2$), then $(\pi^1_*, \pi^2_*)$ is called an EAP-optimal pair or saddle point. Note that $(\pi^1_*, \pi^2_*)$ is EAP-optimal if and only if

$$J(\pi^1, \pi^2_*, x) \le J(\pi^1_*, \pi^2_*, x) \le J(\pi^1_*, \pi^2, x) \qquad \forall x \in X,\ (\pi^1, \pi^2) \in \Pi^1 \times \Pi^2.$$

4 Assumptions and main results

The first condition imposed on the model, Assumption 4.1 below, ensures that the system is regular, which means that it experiences only finitely many jumps or transitions in each finite period of time. Usually, the regularity property is obtained by assuming that the mean holding time $\tau$ is bounded below by a positive constant (see, e.g., [2], [5], [15], [17], [18], [19], [20], [24], [26], [28] and their references). In the present paper it is only assumed that the mean holding time is a positive function.

Assumption 4.1. (Regularity condition) $\tau(x,a,b) > 0$ for all $(x,a,b) \in \mathbb{K}$.

The second hypothesis imposes a growth condition on both the mean holding time and the mean payoff.

Assumption 4.2. There exists a measurable function $W(\cdot)$ on $X$, bounded below by a constant $\theta > 0$, such that

$$\max\{\tau(x,a,b),\ |R(x,a,b)|\} \le K\, W(x) \qquad \forall (x,a,b) \in \mathbb{K},$$

for a fixed positive constant $K$.

To state the third set of hypotheses, as well as several of its consequences, some notation is required. For a measurable function $u(\cdot)$ on $X$, define the weighted norm with respect to $W$ ($W$-norm, for short) as

$$\|u\|_W := \sup_{x \in X} \frac{|u(x)|}{W(x)},$$

and denote by $B_W(X)$ the Banach space of all measurable functions with finite $W$-norm. Moreover, for a measure $\gamma(\cdot)$ on $X$ and a measurable function $u$, let

$$\gamma(u) := \int_X u(x)\, \gamma(dx),$$

whenever the integral is well defined.

Assumption 4.3. (Lyapunov condition) There exist a non-trivial measure $\nu(\cdot)$ on $X$, a nonnegative measurable function $S(\cdot)$ on $\mathbb{K}$ and a positive constant $\lambda < 1$ such that:

(a) $\nu(W) < \infty$;

(b) $Q(B \mid x,a,b) \ge \nu(B)\, S(x,a,b)$ for all $B \in \mathcal{B}(X)$, $(x,a,b) \in \mathbb{K}$;

(c) $\int_X W(y)\, Q(dy \mid x,a,b) \le \lambda W(x) + S(x,a,b)\, \nu(W)$ for all $(x,a,b) \in \mathbb{K}$;

(d) $\nu(S_{\varphi^1,\varphi^2}) > 0$ for all $(\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2$.

As mentioned in the Introduction, Assumption 4.3 allows us to use a fixed-point approach. More precisely, we consider the kernel

$$\widehat{Q}(\cdot \mid x,a,b) := Q(\cdot \mid x,a,b) - \nu(\cdot)\, S(x,a,b), \qquad (x,a,b) \in \mathbb{K}, \qquad (8)$$

which, under Assumption 4.3, is nonnegative. The point here is that Assumption 4.3(c) can be expressed equivalently as

$$\int_X W(y)\, \widehat{Q}(dy \mid x,a,b) \le \lambda W(x) \qquad \forall (x,a,b) \in \mathbb{K}, \qquad (9)$$

which, roughly speaking, means that $\widehat{Q}(\cdot \mid \cdot)$ satisfies a certain contraction property. This contraction property is precisely what we shall exploit to prove our main results (Theorems 4.5 and 4.7 below).
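For a finite state space, the objects in Assumption 4.3 and the kernel (8) are easy to exhibit. The sketch below is a toy model with one action pair, $W \equiv 1$, and the standard Doeblin-type choice $\nu(y) := \min_x Q(y \mid x)$, $S \equiv 1$; all of this is illustrative, not the paper's construction. It builds $\widehat{Q}$ and checks the contraction factor in (9):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
Q = rng.dirichlet(np.ones(n), size=n)  # transition law; one action pair for brevity

# Minorization as in 4.3(b): Q(y|x) >= nu(y) * S(x), with S = 1 and
# nu the componentwise minimum over states (a Doeblin-type choice).
nu = Q.min(axis=0)
S = np.ones(n)
Q_hat = Q - np.outer(S, nu)            # the nonnegative kernel (8)
assert (Q_hat >= -1e-12).all()

W = np.ones(n)                         # bounded W; theta = 1
lam = (Q_hat @ W / W).max()            # sup_x (1/W(x)) int W dQ_hat, cf. (9)
print(lam)                             # < 1 whenever nu is non-trivial
```

Here $\lambda = 1 - \nu(X)$, so the contraction sharpens as the common part $\nu$ of the transition probabilities grows.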

Assumption 4.3 was first used in [27], though it is actually a simplified version of the Lyapunov condition introduced in [9]. Specifically, besides the conditions in Assumption 4.3, [9] assumes the existence of a common irreducibility measure for the transition laws induced by the stationary strategies and also that the inequality in Assumption 4.3(d) holds uniformly, that is,

$$\inf_{\varphi^1,\varphi^2} \nu(S_{\varphi^1,\varphi^2}) > 0.$$

However, as shown in [27, Thm. 3.3] (see Proposition 4.4 below), the latter condition is not required, while the irreducibility condition is redundant. On the other hand, several other papers have used Lyapunov conditions similar to Assumption 4.3 (see, e.g., [13], [14], [15], [16], [17], [23]) but with some important differences, which seemingly preclude the fixed-point approach. For instance, the latter four papers suppose, instead of the conditions in Assumption 4.3, that

$$\int_X W(y)\, Q(dy \mid x,a,b) \le \lambda W(x) + b\, I_C(x) \qquad \forall (x,a,b) \in \mathbb{K},$$

where $C$ is a Borel subset of $X$, $b$ is a positive constant, $\lambda \in (0,1)$ and $W(\cdot)$ is bounded on $C$, and also that

$$Q_{\varphi^1,\varphi^2}(B \mid x) \ge \delta\, I_C(x)\, \nu_{\varphi^1,\varphi^2}(B)$$

for all $x \in X$, $B \in \mathcal{B}(X)$, $(\varphi^1,\varphi^2) \in \Phi^1 \times \Phi^2$, where each $\nu_{\varphi^1,\varphi^2}(\cdot)$ is a probability measure concentrated on $C$ and $\delta$ is a positive constant. A quick glance at the latter conditions shows that they do not lead to a contraction property as in (9), so the fixed-point approach is not applicable, at least in the way we use it here.

Finally, it is convenient to point out again that, in spite of imposing conditions similar to Assumption 4.3, the approaches followed in all the papers cited so far rely on the WGE property mentioned in the Introduction, the only exceptions being [27] and [8].

The next proposition states some important consequences of Assumptions 4.2 and 4.3, which are proved in [27] using fixed-point arguments as well.

Proposition 4.4. Suppose that Assumption 4.3 holds. Then, for each stationary strategy pair $(\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2$, the following hold:

(a) The transition law $Q_{\varphi^1,\varphi^2}(\cdot \mid \cdot)$ is positive Harris recurrent. Thus, in particular, there exists a unique invariant probability measure $\mu_{\varphi^1,\varphi^2}(\cdot)$, that is,

$$\mu_{\varphi^1,\varphi^2}(\cdot) = \int_X Q_{\varphi^1,\varphi^2}(\cdot \mid x)\, \mu_{\varphi^1,\varphi^2}(dx).$$

Moreover, $\nu$ is an irreducibility measure for $Q_{\varphi^1,\varphi^2}(\cdot \mid \cdot)$.

(b) $\mu_{\varphi^1,\varphi^2}(W)$ is finite; in fact, the bounds

$$\theta \le \mu_{\varphi^1,\varphi^2}(W) \le \frac{\nu(W)}{(1-\lambda)\,\nu(X)} \qquad (10)$$

hold.
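In a finite toy model, the invariant probability measure of Proposition 4.4 can be computed directly as the left Perron eigenvector of the transition matrix induced by a fixed stationary pair. The sketch below (illustrative data throughout) does this and then evaluates the ratio $\mu(R_{\varphi^1,\varphi^2})/\mu(\tau_{\varphi^1,\varphi^2})$, which appears as $\rho(\varphi^1,\varphi^2)$ in (11) just below:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
P = rng.dirichlet(np.ones(n), size=n)    # Q_{phi1,phi2}: chain under a fixed stationary pair
R_bar = rng.normal(size=n)               # R_{phi1,phi2}(x): mean one-step payoff
tau_bar = rng.uniform(0.5, 2.0, size=n)  # tau_{phi1,phi2}(x): positive mean holding times

# Unique invariant probability measure: left eigenvector for eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
mu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
mu /= mu.sum()
assert np.allclose(mu @ P, mu)           # invariance, as in Proposition 4.4(a)

rho = (mu @ R_bar) / (mu @ tau_bar)      # the constant rho(phi1, phi2) of (11)
print(mu, rho)
```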

Next observe that, under Assumptions 4.1-4.3, by Proposition 4.4 the constants

$$\rho(\varphi^1, \varphi^2) := \frac{\mu_{\varphi^1,\varphi^2}(R_{\varphi^1,\varphi^2})}{\mu_{\varphi^1,\varphi^2}(\tau_{\varphi^1,\varphi^2})}, \qquad (\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2, \qquad (11)$$

are finite. Then, for each $(\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2$, define on $B_W(X)$ the operator

$$L_{\varphi^1,\varphi^2} u(x) := \overline{R}_{\varphi^1,\varphi^2}(x) + \int_X u(y)\, Q_{\varphi^1,\varphi^2}(dy \mid x), \qquad x \in X, \qquad (12)$$

where

$$\overline{R}_{\varphi^1,\varphi^2}(\cdot) := R_{\varphi^1,\varphi^2}(\cdot) - \rho(\varphi^1, \varphi^2)\, \tau_{\varphi^1,\varphi^2}(\cdot). \qquad (13)$$

Theorem 4.5. Suppose that Assumptions 4.1, 4.2 and 4.3 hold. Then for each stationary strategy pair $(\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2$:

(a) There exists a unique function $h_{\varphi^1,\varphi^2} \in B_W(X)$, with $\nu(h_{\varphi^1,\varphi^2}) = 0$, that satisfies the (semi-Markov) Poisson equation

$$h_{\varphi^1,\varphi^2}(x) = L_{\varphi^1,\varphi^2} h_{\varphi^1,\varphi^2}(x) = \overline{R}_{\varphi^1,\varphi^2}(x) + \int_X h_{\varphi^1,\varphi^2}(y)\, Q_{\varphi^1,\varphi^2}(dy \mid x), \qquad x \in X;$$

(b) Moreover, $J(\varphi^1, \varphi^2, \cdot) = \rho(\varphi^1, \varphi^2)$.

Now we impose some compactness/continuity conditions on the model to ensure the existence of measurable minimizers/maximizers; note that this can be done in several settings (see, e.g., [10, Thm. 3.5, p. 28] or [8, Lemma 3.5]). Here, for simplicity, we consider the following one.

Assumption 4.6. (Compactness/continuity conditions) For each $(x,a,b) \in \mathbb{K}$:

(a) $A(x)$ and $B(x)$ are non-empty compact subsets;

(b) $R(x, \cdot, b)$ is upper semicontinuous on $A(x)$, and $R(x, a, \cdot)$ is lower semicontinuous on $B(x)$;

(c) $\tau(x, \cdot, b)$ and $\tau(x, a, \cdot)$ are continuous on $A(x)$ and $B(x)$, respectively;

(d) $S(x, \cdot, b)$ and $S(x, a, \cdot)$ are continuous on $A(x)$ and $B(x)$, respectively;

(e) For each bounded measurable function $v$ on $X$, the functions $\int_X v(y)\, Q(dy \mid x, \cdot, b)$ and $\int_X v(y)\, Q(dy \mid x, a, \cdot)$ are continuous on $A(x)$ and $B(x)$, respectively;

(f) The functions $\int_X W(y)\, Q(dy \mid x, \cdot, b)$ and $\int_X W(y)\, Q(dy \mid x, a, \cdot)$ are continuous on $A(x)$ and $B(x)$, respectively.

10 W (y)q(dy x,, b) and are continuous on A(x) and B(x), respectively. W (y)q(dy x, a, ) Theorem 4.7. Suppose that Assumptions 4.1, 4.2, 4.3 and 4.6 hold. Then: (a) There exists a unique function h B W () with ν(h ) = 0, a stationary strategy pair (ϕ 1, ϕ 2 ) Φ 1 Φ 2 and a constant ρ which satisfy the Shapley equation } h (x) = min {R ϕ 1,ϕ 2(x) ϕ 2 Φ ρ τ ϕ 1,ϕ 2(x) + h (y)q ϕ 1 2,ϕ 2(dy x) x, } = max {R ϕ 1,ϕ 2 (x) ϕ 1 Φ ρ τ ϕ 1,ϕ 2 (x) + h (y)q ϕ 1,ϕ 2(dy x) 1 = R ϕ 1,ϕ 2 (x) ρ τ ϕ 1,ϕ 2 (x) + h (y)q ϕ 1,ϕ 2 (dy x). (b) The constant ρ is the value of the game and (ϕ 1, ϕ 2 ) is an EAP-optimal stationary strategy pair. That is, J(ϕ 1, ϕ2, ) = ρ and Hence, by Theorem 4.5, J(π 1, ϕ 2, ) ρ J(ϕ 1, π2, ) (π 1, π 2 ) Π 1 Π 2. (c) Moreover, h ( ) = h ϕ 1,ϕ 2 ( ). ρ = ρ(ϕ 1, ϕ 2 ) = max min ϕ 2 Φ ρ(ϕ1, ϕ 2 ) = min 2 ϕ 1 Φ 1 ϕ 2 Φ 2 max ϕ 1 Φ ρ(ϕ1, ϕ 2 ), (14) 1 h ( ) = h ϕ 1,ϕ 2 ( ) = min h ( ) = max h ( ), (15) ϕ 2 F 2 ϕ 1,ϕ2 ϕ 1 Φ 1 ϕ 1,ϕ 2 where F i stands for the class of all stationary EAP-optimal strategies for player i (i = 1, 2). It is worth mentioning that, to the best of our knowledge, the minimax characterization of the solution h ( ) of the Shapley equation given in (15) has been discussed in any of the previous paper dealing with zero-sum stochastic games, even for the case of discrete state space. 10

5 Proof of Theorem 4.5

The proofs of the results in Section 4 require several preliminary results. The first ones are collected in the next lemma, which we state without proof because they follow directly from Assumptions 4.1, 4.2 and 4.3.

Lemma 5.1. Suppose that Assumption 4.3 holds. Then:

(a) For each function $u$ in $B_W(X)$,

$$\lim_{n \to \infty} \frac{1}{n}\, E^{\pi^1,\pi^2}_x u(x_n) = 0 \qquad \forall x \in X,\ (\pi^1, \pi^2) \in \Pi^1 \times \Pi^2;$$

(b) For each stationary strategy pair $(\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2$, it holds that

$$\mu_{\varphi^1,\varphi^2}(S_{\varphi^1,\varphi^2}) \ge \frac{(1-\lambda)\,\theta}{\nu(W)} > 0;$$

(c) If, in addition, Assumptions 4.1 and 4.2 hold, then

$$\frac{\mu_{\varphi^1,\varphi^2}(S_{\varphi^1,\varphi^2})}{\mu_{\varphi^1,\varphi^2}(\tau_{\varphi^1,\varphi^2})} \ge \frac{1-\lambda}{K\,\nu(W)} > 0.$$

The following lemma concerns the existence of solutions to the Poisson equation; in addition to being interesting in itself, it plays a key role in our development. In fact, its proof exhibits the way we take advantage of the contraction property (9).

Lemma 5.2. Suppose that Assumptions 4.2 and 4.3 hold and let $(\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2$ be fixed but arbitrary. Then, for each function $v$ in $B_W(X)$ there exists a unique function $h_v$ in $B_W(X)$, with $\nu(h_v) = 0$, which satisfies the Poisson equation

$$h_v(x) = v(x) - \mu_{\varphi^1,\varphi^2}(v) + \int_X h_v(y)\, Q_{\varphi^1,\varphi^2}(dy \mid x), \qquad x \in X. \qquad (16)$$

Thus, from Lemma 5.1(a),

$$\mu_{\varphi^1,\varphi^2}(v) = \lim_{n \to \infty} \frac{1}{n}\, E^{\varphi^1,\varphi^2}_x \sum_{k=0}^{n-1} v(x_k) \qquad \forall x \in X. \qquad (17)$$

Proof of Lemma 5.2. Fix a function $v \in B_W(X)$, and write $\mu(\cdot) := \mu_{\varphi^1,\varphi^2}(\cdot)$, $S(\cdot) := S_{\varphi^1,\varphi^2}(\cdot)$, $Q(\cdot \mid \cdot) := Q_{\varphi^1,\varphi^2}(\cdot \mid \cdot)$ and $\widehat{Q}(\cdot \mid x) := Q(\cdot \mid x) - \nu(\cdot)\, S(x)$. Next, define

$$T u(x) := v(x) - \mu(v) + \int_X u(y)\, \widehat{Q}(dy \mid x), \qquad x \in X,\ u \in B_W(X).$$

By Assumption 4.3(c), it is clear that $T$ maps $B_W(X)$ into itself. Moreover, for any functions $u, w \in B_W(X)$, it holds that

$$|T u(x) - T w(x)| \le \int_X |u(y) - w(y)|\, \widehat{Q}(dy \mid x) \le \|u - w\|_W \int_X W(y)\, \widehat{Q}(dy \mid x) \le \|u - w\|_W\, \lambda W(x)$$

for all $x \in X$. Hence,

$$\|T u - T w\|_W \le \lambda\, \|u - w\|_W.$$

That is, $T$ is a contraction operator from $B_W(X)$ into itself with modulus $\lambda$. Then, by the Banach Fixed Point Theorem, there exists a unique function $h_v \in B_W(X)$ that satisfies the equation

$$h_v(x) = v(x) - \mu(v) + \int_X h_v(y)\, \widehat{Q}(dy \mid x) = v(x) - \mu(v) + \int_X h_v(y)\, Q(dy \mid x) - \nu(h_v)\, S(x), \qquad x \in X.$$

Now, integrating both sides of the last equation with respect to the invariant probability measure $\mu(\cdot)$ yields $\nu(h_v)\, \mu(S) = 0$, which, by Lemma 5.1(b), implies that $\nu(h_v) = 0$. Therefore, $h_v$ satisfies the Poisson equation

$$h_v(x) = v(x) - \mu(v) + \int_X h_v(y)\, Q(dy \mid x), \qquad x \in X,$$

which proves (16). Finally, property (17) is obtained by iterating the Poisson equation and using Lemma 5.1(a).
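The Banach iteration in the proof can be run verbatim on a finite toy model. The sketch below (illustrative data; $W \equiv 1$ and the Doeblin-type minorization used earlier) iterates $T$ and confirms that the fixed point solves the Poisson equation (16) and satisfies $\nu(h_v) = 0$ automatically, exactly as the integration argument above predicts:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
Q = rng.dirichlet(np.ones(n), size=n)  # Q_{phi1,phi2} for a fixed stationary pair
nu = Q.min(axis=0); S = np.ones(n)     # minorization pair, as in Assumption 4.3(b)
Q_hat = Q - np.outer(S, nu)            # kernel (8)

eigvals, eigvecs = np.linalg.eig(Q.T)  # invariant measure mu of Q
mu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))]); mu /= mu.sum()

v = rng.normal(size=n)                 # the charge v of Lemma 5.2
h = np.zeros(n)
for _ in range(500):                   # Banach iteration h <- T h, modulus 1 - sum(nu)
    h = v - mu @ v + Q_hat @ h

# Poisson equation (16) holds and nu(h) = 0, without imposing either explicitly:
print(np.allclose(h, v - mu @ v + Q @ h), nu @ h)
```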

Now we proceed to prove Theorem 4.5.

Proof of Theorem 4.5. Let $(\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2$ be fixed but arbitrary. Since the function

$$v(\cdot) := \overline{R}_{\varphi^1,\varphi^2}(\cdot) = R_{\varphi^1,\varphi^2}(\cdot) - \rho(\varphi^1, \varphi^2)\, \tau_{\varphi^1,\varphi^2}(\cdot)$$

is in $B_W(X)$, and since $\mu_{\varphi^1,\varphi^2}(\overline{R}_{\varphi^1,\varphi^2}) = 0$ by the definition (11) of $\rho(\varphi^1,\varphi^2)$, by Lemma 5.2 there exists a unique function $h_{\varphi^1,\varphi^2} \in B_W(X)$ with $\nu(h_{\varphi^1,\varphi^2}) = 0$ that satisfies the Poisson equation

$$h_{\varphi^1,\varphi^2}(x) = \overline{R}_{\varphi^1,\varphi^2}(x) + \int_X h_{\varphi^1,\varphi^2}(y)\, Q_{\varphi^1,\varphi^2}(dy \mid x), \qquad x \in X.$$

This proves part (a) of the theorem. Next, to prove part (b), first note that iterating the last equation yields

$$h_{\varphi^1,\varphi^2}(x) = E^{\varphi^1,\varphi^2}_x \left[ \sum_{k=0}^{n-1} R_{\varphi^1,\varphi^2}(x_k) - \rho(\varphi^1,\varphi^2) \sum_{k=0}^{n-1} \tau_{\varphi^1,\varphi^2}(x_k) \right] + \int_X h_{\varphi^1,\varphi^2}(y)\, Q^n_{\varphi^1,\varphi^2}(dy \mid x) \qquad (18)$$

for all $n \in \mathbb{N}$ and $x \in X$. Moreover, by Assumptions 4.1 and 4.2, applying Lemma 5.2 with $v(\cdot) := \tau_{\varphi^1,\varphi^2}(\cdot)$ we obtain

$$\mu_{\varphi^1,\varphi^2}(\tau_{\varphi^1,\varphi^2}) = \lim_{n \to \infty} \frac{1}{n}\, E^{\varphi^1,\varphi^2}_x \sum_{k=0}^{n-1} \tau_{\varphi^1,\varphi^2}(x_k) > 0 \qquad \forall x \in X,$$

which, combined with (18) and Lemma 5.1(a), implies that

$$\rho(\varphi^1, \varphi^2) = \lim_{n \to \infty} \frac{E^{\varphi^1,\varphi^2}_x \sum_{k=0}^{n-1} R_{\varphi^1,\varphi^2}(x_k)}{E^{\varphi^1,\varphi^2}_x \sum_{k=0}^{n-1} \tau_{\varphi^1,\varphi^2}(x_k)} \qquad \forall x \in X. \quad \square$$

6 Proof of Theorem 4.7

Define the constants

$$\rho_l := \sup_{\varphi^1 \in \Phi^1} \inf_{\varphi^2 \in \Phi^2} \rho(\varphi^1, \varphi^2) \quad \text{and} \quad \rho_u := \inf_{\varphi^2 \in \Phi^2} \sup_{\varphi^1 \in \Phi^1} \rho(\varphi^1, \varphi^2).$$

We show in the next lemma that these constants are finite. Observe that this trivially holds if one assumes that the mean holding time function is bounded below by a positive constant.

Lemma 6.1. Suppose that Assumptions 4.1, 4.2, 4.3 and 4.6 hold. Then $\rho_l$ and $\rho_u$ are finite.

Proof of Lemma 6.1. Let $\varphi^1$ be a fixed but arbitrary stationary strategy for player 1 and consider the Markov (one-player) model $\mathcal{M} = (X, \mathbb{K}_B, \widetilde{Q}, \widetilde{\tau})$, where $X$ and $\mathbb{K}_B$ are as above, and the transition law and the one-step cost function are defined as

$$\widetilde{Q}(\cdot \mid x, b) := \int_{A(x)} Q(\cdot \mid x, a, b)\, \varphi^1(da \mid x), \qquad \widetilde{\tau}(x, b) := \int_{A(x)} \tau(x, a, b)\, \varphi^1(da \mid x)$$

for all $(x, b) \in \mathbb{K}_B$, respectively. Then, following the notation (4), for all $x \in X$ and $\varphi^2 \in \Phi^2$, define

$$\widetilde{Q}_{\varphi^2}(\cdot \mid x) := \int_{B(x)} \widetilde{Q}(\cdot \mid x, b)\, \varphi^2(db \mid x), \qquad \widetilde{\tau}_{\varphi^2}(x) := \int_{B(x)} \widetilde{\tau}(x, b)\, \varphi^2(db \mid x).$$

Note that $\widetilde{Q}_{\varphi^2}(\cdot \mid \cdot) = Q_{\varphi^1,\varphi^2}(\cdot \mid \cdot)$ and $\widetilde{\tau}_{\varphi^2}(\cdot) = \tau_{\varphi^1,\varphi^2}(\cdot)$ for all $\varphi^2 \in \Phi^2$. The Markov model $\mathcal{M}$ satisfies all the conditions in [27, Thm. 3.6]; hence, in particular, there exists a stationary policy $\varphi^2_+ \in \Phi^2$ such that

$$\mu_{\varphi^1,\varphi^2_+}(\tau_{\varphi^1,\varphi^2_+}) = \mu_{\varphi^1,\varphi^2_+}(\widetilde{\tau}_{\varphi^2_+}) = \inf_{\varphi^2 \in \Phi^2} \mu_{\varphi^1,\varphi^2}(\widetilde{\tau}_{\varphi^2}).$$

Then, by Assumption 4.1, it holds that $\mu_{\varphi^1,\varphi^2_+}(\tau_{\varphi^1,\varphi^2_+}) > 0$. Next observe that

$$\rho(\varphi^1, \varphi^2) \ge -\frac{\mu_{\varphi^1,\varphi^2}(|R_{\varphi^1,\varphi^2}|)}{\mu_{\varphi^1,\varphi^2}(\tau_{\varphi^1,\varphi^2})} \ge -K\,\frac{\mu_{\varphi^1,\varphi^2}(W)}{\mu_{\varphi^1,\varphi^2_+}(\tau_{\varphi^1,\varphi^2_+})} \ge -\frac{K k}{\mu_{\varphi^1,\varphi^2_+}(\tau_{\varphi^1,\varphi^2_+})},$$

where the last inequality follows from (10) with $k := \nu(W)\,[(1-\lambda)\,\nu(X)]^{-1}$. Hence,

$$-\infty < -\frac{K k}{\mu_{\varphi^1,\varphi^2_+}(\tau_{\varphi^1,\varphi^2_+})} \le \inf_{\varphi^2 \in \Phi^2} \rho(\varphi^1, \varphi^2) \le \rho(\varphi^1, \varphi^2) \qquad \forall \varphi^1 \in \Phi^1,\ \varphi^2 \in \Phi^2. \qquad (19)$$

Now fix $\varphi^2 \in \Phi^2$ and proceed as above to get a stationary strategy $\varphi^1_+ \in \Phi^1$ such that

$$\mu_{\varphi^1_+,\varphi^2}(\tau_{\varphi^1_+,\varphi^2}) = \inf_{\varphi^1 \in \Phi^1} \mu_{\varphi^1,\varphi^2}(\tau_{\varphi^1,\varphi^2}) > 0.$$

Then,

$$\rho(\varphi^1, \varphi^2) \le \frac{\mu_{\varphi^1,\varphi^2}(|R_{\varphi^1,\varphi^2}|)}{\mu_{\varphi^1,\varphi^2}(\tau_{\varphi^1,\varphi^2})} \le \frac{K k}{\mu_{\varphi^1_+,\varphi^2}(\tau_{\varphi^1_+,\varphi^2})} < +\infty.$$

Hence,

$$\sup_{\varphi^1 \in \Phi^1} \rho(\varphi^1, \varphi^2) \le \frac{K k}{\mu_{\varphi^1_+,\varphi^2}(\tau_{\varphi^1_+,\varphi^2})}. \qquad (20)$$

Therefore, by (19)-(20),

$$-\infty < \rho_l = \sup_{\varphi^1 \in \Phi^1} \inf_{\varphi^2 \in \Phi^2} \rho(\varphi^1, \varphi^2) \le \rho_u = \inf_{\varphi^2 \in \Phi^2} \sup_{\varphi^1 \in \Phi^1} \rho(\varphi^1, \varphi^2) < +\infty,$$

which proves the desired result. $\square$

For the proof of Theorem 4.7, introduce the following operators: for each $u \in B_W(X)$ define

$$L^l u(x,a,b) := R^l(x,a,b) + \int_X u(y)\, \widehat{Q}(dy \mid x,a,b), \qquad (x,a,b) \in \mathbb{K}, \qquad (21)$$

where

$$R^l(x,a,b) := R(x,a,b) - \rho_l\, \tau(x,a,b), \qquad (x,a,b) \in \mathbb{K}. \qquad (22)$$

Thus, following the notation (4), for each strategy pair $(\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2$ define the operators

$$L^l_{\varphi^1,\varphi^2} u(\cdot) := R^l_{\varphi^1,\varphi^2}(\cdot) + \int_X u(y)\, \widehat{Q}_{\varphi^1,\varphi^2}(dy \mid \cdot), \qquad (23)$$

$$L^* u(x) := \sup_{\varphi^1 \in \mathbb{A}(x)} \inf_{\varphi^2 \in \mathbb{B}(x)} L^l_{\varphi^1,\varphi^2} u(x), \qquad x \in X, \qquad (24)$$

for each $u \in B_W(X)$.
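For a finite model the operator $L^*$ in (24) can be evaluated exactly: at each state, the sup-inf over mixed actions is the value of a finite matrix game, computable by linear programming. The sketch below iterates $u \mapsto L^* u$ to its fixed point; it is only illustrative: the model data are random, $\rho_l$ is replaced by a stand-in constant `rho` (the paper identifies the right constant as $\rho^* = \rho_l = \rho_u$ in Lemma 6.4), and `matrix_game_value` is a hypothetical helper, not part of the paper.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes)."""
    na, nb = M.shape
    c = np.r_[np.zeros(na), -1.0]            # variables (p, v); minimize -v
    A_ub = np.c_[-M.T, np.ones(nb)]          # v <= p^T M[:, b] for every column b
    A_eq = np.r_[np.ones(na), 0.0][None, :]  # p is a probability vector
    bounds = [(0, None)] * na + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(nb),
                  A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    return -res.fun

rng = np.random.default_rng(5)
n, na, nb = 3, 2, 2
Q = rng.dirichlet(np.ones(n), size=(n, na, nb))  # transition law
R = rng.normal(size=(n, na, nb))                 # mean payoff
tau = rng.uniform(0.5, 2.0, size=(n, na, nb))    # mean holding time
nu = Q.min(axis=(0, 1, 2))                       # minorization over all (x, a, b)
Q_hat = Q - nu                                   # kernel (8) with S = 1 (broadcasts over last axis)
rho = 0.3                                        # stand-in for rho_l = rho_u = rho*

h = np.zeros(n)
for _ in range(300):                             # Banach iteration h <- L* h
    h = np.array([matrix_game_value(R[x] - rho * tau[x] + Q_hat[x] @ h)
                  for x in range(n)])
print(h)                                         # fixed point of (25)
```

Lemma 6.2 below supplies the measurable maximizers/minimizers that this finite computation produces for free as optimal mixed actions at each state.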

The results in the next lemma are a combination of well-known measurable selection theorems [22] and Fan's Minimax Theorem [4]. The proof is omitted since it is the same as the proofs of Lemma 6.5 in [11] and Lemmas 2, 3 and 4 in [23].

Lemma 6.2. Suppose that Assumptions 4.1, 4.2, 4.3 and 4.6 hold and let $u$ be a fixed function in $B_W(X)$. Then:

(a) For each $x \in X$, the sets $\mathbb{A}(x)$ and $\mathbb{B}(x)$ are compact with respect to the weak convergence of measures;

(b) For each $x \in X$ and $(\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2$, the mappings

$$\varphi^1 \mapsto L^l_{\varphi^1,\varphi^2} u(x), \qquad \varphi^2 \mapsto L^l_{\varphi^1,\varphi^2} u(x)$$

are upper semicontinuous and lower semicontinuous on $\mathbb{A}(x)$ and $\mathbb{B}(x)$, respectively, with respect to the weak convergence of measures;

(c) Moreover, there exists a stationary strategy pair $(\varphi^1_u, \varphi^2_u) \in \Phi^1 \times \Phi^2$ such that

$$L^* u(\cdot) = L^l_{\varphi^1_u,\varphi^2_u} u(\cdot) = \max_{\varphi^1 \in \Phi^1} L^l_{\varphi^1,\varphi^2_u} u(\cdot) = \min_{\varphi^2 \in \Phi^2} L^l_{\varphi^1_u,\varphi^2} u(\cdot).$$

Hence, $L^* u(\cdot)$ is in $B_W(X)$.

The proof of Theorem 4.7 follows the same scheme as that of Lemma 5.2. We first show, in Lemma 6.3 below, that $L^*$ is a contraction operator from $B_W(X)$ into itself with modulus $\lambda$; hence, by the Banach Fixed Point Theorem, there exists a unique function $h^*$ in $B_W(X)$ such that

$$h^*(\cdot) = L^* h^*(\cdot) = \sup_{\varphi^1 \in \mathbb{A}(\cdot)} \inf_{\varphi^2 \in \mathbb{B}(\cdot)} L^l_{\varphi^1,\varphi^2} h^*(\cdot). \qquad (25)$$

As a second step, in Lemma 6.4, we prove that $\rho^* := \rho_l = \rho_u$ and $\nu(h^*) \le 0$. Once the latter is done, we show in Lemma 6.5 that $\nu(h^*) = 0$. Then, (25) becomes

$$h^*(x) = \sup_{\varphi^1 \in \mathbb{A}(x)} \inf_{\varphi^2 \in \mathbb{B}(x)} \left[ R_{\varphi^1,\varphi^2}(x) - \rho^*\, \tau_{\varphi^1,\varphi^2}(x) + \int_X h^*(y)\, Q_{\varphi^1,\varphi^2}(dy \mid x) \right]$$

for all $x \in X$. Hence, Lemma 6.2 yields a stationary strategy pair $(\varphi^1_*, \varphi^2_*) \in \Phi^1 \times \Phi^2$ satisfying Theorem 4.7(a).

Lemma 6.3. Suppose that the assumptions of Theorem 4.7 hold. Then $L^*$ in (24) is a contraction operator from $B_W(X)$ into itself with modulus $\lambda$. Thus, by the Banach Fixed Point Theorem and Lemma 6.2, there exist a unique function $h^*$ in $B_W(X)$ and a stationary strategy pair $(\varphi^1_*, \varphi^2_*) \in \Phi^1 \times \Phi^2$ such that

$$h^*(\cdot) = L^* h^*(\cdot) = L^l_{\varphi^1_*,\varphi^2_*} h^*(\cdot) \qquad (26)$$

$$= \min_{\varphi^2 \in \mathbb{B}(\cdot)} L^l_{\varphi^1_*,\varphi^2} h^*(\cdot) = \max_{\varphi^1 \in \mathbb{A}(\cdot)} L^l_{\varphi^1,\varphi^2_*} h^*(\cdot). \qquad (27)$$

Proof of Lemma 6.3. By Lemma 6.2 it only remains to prove that $L^*$ is a contraction operator from $B_W(X)$ into itself with modulus $\lambda$. To prove this, consider arbitrary functions $u, v$ in $B_W(X)$ and observe, by Assumption 4.3(b) and (9), that

$$\left| L^l_{\varphi^1,\varphi^2} u(\cdot) - L^l_{\varphi^1,\varphi^2} v(\cdot) \right| \le \|u - v\|_W \int_X W(y)\, \widehat{Q}_{\varphi^1,\varphi^2}(dy \mid \cdot) \le \|u - v\|_W\, \lambda W(\cdot)$$

for all $(\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2$. This implies that

$$L^l_{\varphi^1,\varphi^2} v(\cdot) - \|u - v\|_W\, \lambda W(\cdot) \le L^l_{\varphi^1,\varphi^2} u(\cdot) \le L^l_{\varphi^1,\varphi^2} v(\cdot) + \|u - v\|_W\, \lambda W(\cdot) \qquad \forall (\varphi^1, \varphi^2) \in \Phi^1 \times \Phi^2.$$

Thus, the latter inequality together with Lemma 6.2 implies

$$\inf_{\varphi^2 \in \mathbb{B}(x)} L^l_{\varphi^1,\varphi^2} u(\cdot) \le \inf_{\varphi^2 \in \mathbb{B}(x)} L^l_{\varphi^1,\varphi^2} v(\cdot) + \|u - v\|_W\, \lambda W(\cdot) \qquad \forall \varphi^1 \in \Phi^1,$$

which, using Lemma 6.2 again, yields

$$L^* u(\cdot) \le L^* v(\cdot) + \|u - v\|_W\, \lambda W(\cdot).$$

Similarly, interchanging the roles of $u$ and $v$, it also holds that

$$L^* v(\cdot) \le L^* u(\cdot) + \|u - v\|_W\, \lambda W(\cdot).$$

Therefore,

$$\|L^* u - L^* v\|_W \le \lambda\, \|u - v\|_W.$$

That is, $L^*$ is a contraction operator from $B_W(X)$ into itself with modulus $\lambda$. Now, the Banach Fixed Point Theorem together with Lemma 6.2 ensures the existence of a unique function $h^* \in B_W(X)$ and a stationary strategy pair $(\varphi^1_*, \varphi^2_*) \in \Phi^1 \times \Phi^2$ satisfying (26)-(27). $\square$

Lemma 6.4. Suppose that the assumptions of Theorem 4.7 hold and let $h^*$ be as in Lemma 6.3. Then

$$\nu(h^*) \le 0 \quad \text{and} \quad \rho_l = \rho_u.$$

Proof of Lemma 6.4. Let $(\varphi^1_*, \varphi^2_*)$ be as in Lemma 6.3. Then,

$$h^*(x) = \min_{\varphi^2 \in \mathbb{B}(x)} \left[ R^l_{\varphi^1_*,\varphi^2}(x) + \int_X h^*(y)\, \widehat{Q}_{\varphi^1_*,\varphi^2}(dy \mid x) \right] \qquad (28)$$

$$\le R^l_{\varphi^1_*,\varphi^2}(x) + \int_X h^*(y)\, \widehat{Q}_{\varphi^1_*,\varphi^2}(dy \mid x) = R^l_{\varphi^1_*,\varphi^2}(x) + \int_X h^*(y)\, Q_{\varphi^1_*,\varphi^2}(dy \mid x) - \nu(h^*)\, S_{\varphi^1_*,\varphi^2}(x)$$

for all $x \in X$, $\varphi^2 \in \Phi^2$. Then, integrating with respect to the invariant probability measure $\mu_{\varphi^1_*,\varphi^2}$ yields

$$0 \le \mu_{\varphi^1_*,\varphi^2}(R^l_{\varphi^1_*,\varphi^2}) - \nu(h^*)\, \mu_{\varphi^1_*,\varphi^2}(S_{\varphi^1_*,\varphi^2}) \qquad \forall \varphi^2 \in \Phi^2,$$

which implies that

$$\nu(h^*)\, \mu_{\varphi^1_*,\varphi^2}(S_{\varphi^1_*,\varphi^2}) \le \mu_{\varphi^1_*,\varphi^2}(R_{\varphi^1_*,\varphi^2}) - \rho_l\, \mu_{\varphi^1_*,\varphi^2}(\tau_{\varphi^1_*,\varphi^2}) = \mu_{\varphi^1_*,\varphi^2}(\tau_{\varphi^1_*,\varphi^2}) \left[ \rho(\varphi^1_*, \varphi^2) - \rho_l \right]$$

for all $\varphi^2 \in \Phi^2$. Now, taking the infimum over $\Phi^2$, we obtain

$$\inf_{\varphi^2 \in \Phi^2} \left[ \nu(h^*)\, \frac{\mu_{\varphi^1_*,\varphi^2}(S_{\varphi^1_*,\varphi^2})}{\mu_{\varphi^1_*,\varphi^2}(\tau_{\varphi^1_*,\varphi^2})} \right] \le \inf_{\varphi^2 \in \Phi^2} \rho(\varphi^1_*, \varphi^2) - \rho_l \le 0,$$

which, by Assumption 4.1 and Lemma 5.1(b), implies that $\nu(h^*) \le 0$; indeed, if $\nu(h^*)$ were positive, the left-hand side would be bounded below by $\nu(h^*)$ times a positive constant, by Lemma 5.1(c). This inequality combined with (27) implies

$$h^*(x) = \max_{\varphi^1 \in \mathbb{A}(x)} \left[ R^l_{\varphi^1,\varphi^2_*}(x) + \int_X h^*(y)\, \widehat{Q}_{\varphi^1,\varphi^2_*}(dy \mid x) \right]$$

$$\ge \max_{\varphi^1 \in \mathbb{A}(x)} \left[ R^l_{\varphi^1,\varphi^2_*}(x) + \int_X h^*(y)\, Q_{\varphi^1,\varphi^2_*}(dy \mid x) \right] \ge R^l_{\varphi^1,\varphi^2_*}(x) + \int_X h^*(y)\, Q_{\varphi^1,\varphi^2_*}(dy \mid x)$$

for all $x \in X$, $\varphi^1 \in \Phi^1$. Now, integrating both sides of the latter inequality with respect to the invariant probability measure $\mu_{\varphi^1,\varphi^2_*}$, we see that

$$0 \ge \mu_{\varphi^1,\varphi^2_*}(R^l_{\varphi^1,\varphi^2_*}) = \mu_{\varphi^1,\varphi^2_*}(R_{\varphi^1,\varphi^2_*}) - \rho_l\, \mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*}) \qquad \forall \varphi^1 \in \Phi^1,$$

which implies that

$$\rho_l \ge \rho(\varphi^1, \varphi^2_*) = \frac{\mu_{\varphi^1,\varphi^2_*}(R_{\varphi^1,\varphi^2_*})}{\mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*})} \qquad \forall \varphi^1 \in \Phi^1.$$

Hence,

$$\rho_l \ge \sup_{\varphi^1 \in \Phi^1} \rho(\varphi^1, \varphi^2_*) \ge \inf_{\varphi^2 \in \Phi^2} \sup_{\varphi^1 \in \Phi^1} \rho(\varphi^1, \varphi^2) = \rho_u.$$

Since $\rho_l \le \rho_u$ always holds, therefore $\rho_l = \rho_u$. $\square$

Lemma 6.5. Suppose that the assumptions of Theorem 4.7 hold and let $h^*$ be as in Lemma 6.3. Then $\nu(h^*) = 0$.

Proof of Lemma 6.5. Let $(\varphi^1_*, \varphi^2_*)$ be as in Lemma 6.3 and put $\rho^* := \rho_l = \rho_u$. By (27), we have

$$h^*(x) = \max_{\varphi^1 \in \mathbb{A}(x)} \left[ R_{\varphi^1,\varphi^2_*}(x) - \rho^*\, \tau_{\varphi^1,\varphi^2_*}(x) + \int_X h^*(y)\, \widehat{Q}_{\varphi^1,\varphi^2_*}(dy \mid x) \right]$$

$$\ge R_{\varphi^1,\varphi^2_*}(x) - \rho^*\, \tau_{\varphi^1,\varphi^2_*}(x) + \int_X h^*(y)\, \widehat{Q}_{\varphi^1,\varphi^2_*}(dy \mid x)$$

for all $x \in X$, $\varphi^1 \in \Phi^1$. As above, integrating both sides of the latter inequality with respect to the invariant probability measure $\mu_{\varphi^1,\varphi^2_*}$ we obtain

$$\nu(h^*)\, \mu_{\varphi^1,\varphi^2_*}(S_{\varphi^1,\varphi^2_*}) \ge \mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*}) \left[ \rho(\varphi^1, \varphi^2_*) - \rho^* \right]$$

$$= \mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*}) \left[ \rho(\varphi^1, \varphi^2_*) - \inf_{\varphi^2 \in \Phi^2} \sup_{\varphi^1 \in \Phi^1} \rho(\varphi^1, \varphi^2) \right]$$

$$\ge \mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*}) \left[ \rho(\varphi^1, \varphi^2_*) - \sup_{\varphi^1 \in \Phi^1} \rho(\varphi^1, \varphi^2_*) \right].$$

Then,

$$\nu(h^*)\, \frac{\mu_{\varphi^1,\varphi^2_*}(S_{\varphi^1,\varphi^2_*})}{\mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*})} \ge \rho(\varphi^1, \varphi^2_*) - \sup_{\varphi^1 \in \Phi^1} \rho(\varphi^1, \varphi^2_*) \qquad \forall \varphi^1 \in \Phi^1.$$

Taking the supremum over $\varphi^1 \in \Phi^1$ yields

$$\sup_{\varphi^1 \in \Phi^1} \left[ \nu(h^*)\, \frac{\mu_{\varphi^1,\varphi^2_*}(S_{\varphi^1,\varphi^2_*})}{\mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*})} \right] \ge 0.$$

This inequality implies that $\nu(h^*) \ge 0$. Hence, by Lemma 6.4, $\nu(h^*) = 0$. $\square$

Finally, we are ready for the proof of Theorem 4.7.

Proof of Theorem 4.7. Let $h^*$ and $(\varphi^1_*, \varphi^2_*)$ be as in Lemma 6.3. First note that the proof of part (a) is given by Lemmas 6.3, 6.4 and 6.5. Part (b) follows using standard dynamic programming arguments, while the first statement in part (c) is exactly Lemma 6.4. Thus, it only remains to prove the equalities in (15). To do this, first recall that $\mathbb{F}^i$ denotes the class of all stationary EAP-optimal strategies for player $i$ ($i = 1, 2$), which is nonempty because of part (b). Now, define the following operators on $B_W(X)$:

$$M u(x) := \max_{\varphi^1 \in \mathbb{A}(x)} \left[ R_{\varphi^1,\varphi^2_*}(x) - \rho^*\, \tau_{\varphi^1,\varphi^2_*}(x) + \int_X u(y)\, \widehat{Q}_{\varphi^1,\varphi^2_*}(dy \mid x) \right],$$

$$N u(x) := \min_{\varphi^2 \in \mathbb{B}(x)} \left[ R_{\varphi^1_*,\varphi^2}(x) - \rho^*\, \tau_{\varphi^1_*,\varphi^2}(x) + \int_X u(y)\, \widehat{Q}_{\varphi^1_*,\varphi^2}(dy \mid x) \right]$$

for all $x \in X$. Proceeding as above, it is easy to check that $M$ and $N$ are well-defined $\lambda$-contraction operators from $B_W(X)$ into itself. In fact, by part (a), $h^*$ is the fixed point of both operators; that is,

$$h^*(\cdot) = M h^*(\cdot) = N h^*(\cdot).$$

Next choose an arbitrary strategy $\varphi^1_0$ in $\mathbb{F}^1$ and note that $\rho^* = \rho(\varphi^1_0, \varphi^2_*)$. Then, by Theorem 4.5, there exists a unique function $h_{\varphi^1_0,\varphi^2_*}$ in $B_W(X)$, with $\nu(h_{\varphi^1_0,\varphi^2_*}) = 0$, which satisfies

$$h_{\varphi^1_0,\varphi^2_*}(x) = R_{\varphi^1_0,\varphi^2_*}(x) - \rho^*\, \tau_{\varphi^1_0,\varphi^2_*}(x) + \int_X h_{\varphi^1_0,\varphi^2_*}(y)\, \widehat{Q}_{\varphi^1_0,\varphi^2_*}(dy \mid x), \qquad x \in X$$

(since $\nu(h_{\varphi^1_0,\varphi^2_*}) = 0$, the kernels $Q$ and $\widehat{Q}$ may be used interchangeably here). Next, observe that

$$h_{\varphi^1_0,\varphi^2_*}(\cdot) \le M h_{\varphi^1_0,\varphi^2_*}(\cdot),$$

which implies that

$$h_{\varphi^1_0,\varphi^2_*}(\cdot) \le M^n h_{\varphi^1_0,\varphi^2_*}(\cdot) \qquad \forall n \in \mathbb{N}.$$

Now, since $M$ is a contraction (so that $M^n h_{\varphi^1_0,\varphi^2_*} \to h^*$ in the $W$-norm) and $h^*$ is its fixed point, we have

$$h_{\varphi^1_0,\varphi^2_*}(\cdot) \le h^*(\cdot).$$

Hence, since $h^*(\cdot) = h_{\varphi^1_*,\varphi^2_*}(\cdot)$ and the strategy $\varphi^1_0$ was chosen arbitrarily in $\mathbb{F}^1$, we have

$$\max_{\varphi^1 \in \mathbb{F}^1} h_{\varphi^1,\varphi^2_*}(\cdot) = h^*(\cdot).$$

Similar arguments, but using the operator $N$ instead of $M$, show that

$$h^*(\cdot) = \min_{\varphi^2 \in \mathbb{F}^2} h_{\varphi^1_*,\varphi^2}(\cdot). \quad \square$$

Acknowledgment. The author thanks Prof. Onésimo Hernández-Lerma for his valuable comments on an early version of this work.

References

[1] E. Altman, A. Hordijk and F. M. Spieksma, Contraction conditions for average and α-discount optimality in countable state Markov games with unbounded rewards, Math. Oper. Res. 22 (1997).

[2] S. Bhatnagar and V. S. Borkar, A convex analytic framework for ergodic control of semi-Markov processes, Math. Oper. Res. 20 (1995).

[3] V. S. Borkar and M. K. Ghosh, Denumerable stochastic games with limiting average payoff, J. Optim. Theory Appl. 76 (1993).

[4] K. Fan, Minimax theorems, Proc. Nat. Acad. Sci. USA 39 (1953).

[5] A. Federgruen, P. J. Schweitzer and H. C. Tijms, Denumerable undiscounted semi-Markov decision processes with unbounded rewards, Math. Oper. Res. 8 (1983).

[6] J. Filar and K. Vrieze, Competitive Markov Decision Processes, Springer-Verlag, New York, 1997.

[7] M. K. Ghosh and A. Bagchi, Stochastic games with average payoff criterion, Appl. Math. Optim. 38 (1998).

[8] J. I. González-Trejo, O. Hernández-Lerma and L. F. Hoyos-Reyes, Minimax control of discrete-time stochastic systems, SIAM J. Control Optim., to appear.

[9] E. Gordienko and O. Hernández-Lerma, Average cost Markov control processes with weighted norms: existence of canonical policies, Appl. Math. (Warsaw) 23 (1995).

[10] O. Hernández-Lerma and J. B. Lasserre, Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York, 1999.

[11] O. Hernández-Lerma and J. B. Lasserre, Zero-sum stochastic games in Borel spaces: average payoff criteria, SIAM J. Control Optim. 39 (2001).

[12] O. Hernández-Lerma, R. Montes-de-Oca and R. Cavazos-Cadena, Recurrence conditions for MDPs with Borel state space, Ann. Oper. Res. 28 (1991).

[13] O. Hernández-Lerma and O. Vega-Amaya, Infinite-horizon Markov control processes with undiscounted cost criteria: from average to overtaking optimality, Appl. Math. (Warsaw) 25 (1998).

[14] O. Hernández-Lerma, O. Vega-Amaya and G. Carrasco, Sample-path optimality and variance-minimization of average cost Markov control processes, SIAM J. Control Optim. 38 (1999).

[15] A. Jaśkiewicz, An approximation approach to ergodic semi-Markov control processes, Math. Methods Oper. Res. 54 (2001).

[16] A. Jaśkiewicz and A. S. Nowak, On the optimality equation for zero-sum ergodic stochastic games, Math. Methods Oper. Res. 54 (2001).

[17] A. Jaśkiewicz, Zero-sum semi-Markov games, SIAM J. Control Optim., to appear.

[18] M. Kurano, Average optimal adaptive policies in semi-Markov decision processes including an unknown parameter, J. Oper. Res. Soc. Japan 28 (1985).

[19] A. K. Lal and S. Sinha, Zero-sum two-person semi-Markov games, J. Appl. Prob. 29 (1992).

[20] F. Luque-Vásquez and O. Hernández-Lerma, Semi-Markov control models with average costs, Appl. Math. (Warsaw) 26 (1999).

[21] S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London, 1993.

[22] A. S. Nowak, Measurable selection theorems for minimax stochastic optimization problems, SIAM J. Control Optim. 23 (1985).

[23] A. S. Nowak, Optimal strategies in a class of zero-sum ergodic stochastic games, Math. Methods Oper. Res. 50 (1999).

[24] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, New York, 1994.

[25] U. Rieder, Average optimality in Markov games with general state space, Proc. 3rd Conf. on Approx. Theory and Optim. (1995), Puebla, México.

[26] P. J. Schweitzer, Iterative solutions of functional equations of undiscounted Markov renewal programming, J. Math. Anal. Appl. (1971).

[27] O. Vega-Amaya, The average cost optimality equation: a fixed point approach, Reporte de Investigación No. 4 (2001), Departamento de Matemáticas, Universidad de Sonora, México. (Available in: tedi/reportes.)

[28] O. Vega-Amaya and F. Luque-Vásquez, Sample-path average cost optimality for semi-Markov control processes on Borel spaces: unbounded costs and mean holding times, Appl. Math. (Warsaw) 27 (2000).


More information

On intermediate value theorem in ordered Banach spaces for noncompact and discontinuous mappings

On intermediate value theorem in ordered Banach spaces for noncompact and discontinuous mappings Int. J. Nonlinear Anal. Appl. 7 (2016) No. 1, 295-300 ISSN: 2008-6822 (electronic) http://dx.doi.org/10.22075/ijnaa.2015.341 On intermediate value theorem in ordered Banach spaces for noncompact and discontinuous

More information

Abstract Dynamic Programming

Abstract Dynamic Programming Abstract Dynamic Programming Dimitri P. Bertsekas Department of Electrical Engineering and Computer Science Massachusetts Institute of Technology Overview of the Research Monograph Abstract Dynamic Programming"

More information

Invariant measures for iterated function systems

Invariant measures for iterated function systems ANNALES POLONICI MATHEMATICI LXXV.1(2000) Invariant measures for iterated function systems by Tomasz Szarek (Katowice and Rzeszów) Abstract. A new criterion for the existence of an invariant distribution

More information

A FIXED POINT THEOREM FOR GENERALIZED NONEXPANSIVE MULTIVALUED MAPPINGS

A FIXED POINT THEOREM FOR GENERALIZED NONEXPANSIVE MULTIVALUED MAPPINGS Fixed Point Theory, (0), No., 4-46 http://www.math.ubbcluj.ro/ nodeacj/sfptcj.html A FIXED POINT THEOREM FOR GENERALIZED NONEXPANSIVE MULTIVALUED MAPPINGS A. ABKAR AND M. ESLAMIAN Department of Mathematics,

More information

A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE

A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE Journal of Applied Analysis Vol. 6, No. 1 (2000), pp. 139 148 A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE A. W. A. TAHA Received

More information

Central limit theorems for ergodic continuous-time Markov chains with applications to single birth processes

Central limit theorems for ergodic continuous-time Markov chains with applications to single birth processes Front. Math. China 215, 1(4): 933 947 DOI 1.17/s11464-15-488-5 Central limit theorems for ergodic continuous-time Markov chains with applications to single birth processes Yuanyuan LIU 1, Yuhui ZHANG 2

More information

Exercise Solutions to Functional Analysis

Exercise Solutions to Functional Analysis Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n

More information

LYAPUNOV STABILITY OF CLOSED SETS IN IMPULSIVE SEMIDYNAMICAL SYSTEMS

LYAPUNOV STABILITY OF CLOSED SETS IN IMPULSIVE SEMIDYNAMICAL SYSTEMS Electronic Journal of Differential Equations, Vol. 2010(2010, No. 78, pp. 1 18. ISSN: 1072-6691. URL: http://ejde.math.txstate.edu or http://ejde.math.unt.edu ftp ejde.math.txstate.edu LYAPUNOV STABILITY

More information

Uniform turnpike theorems for finite Markov decision processes

Uniform turnpike theorems for finite Markov decision processes MATHEMATICS OF OPERATIONS RESEARCH Vol. 00, No. 0, Xxxxx 0000, pp. 000 000 issn 0364-765X eissn 1526-5471 00 0000 0001 INFORMS doi 10.1287/xxxx.0000.0000 c 0000 INFORMS Authors are encouraged to submit

More information

Solution existence of variational inequalities with pseudomonotone operators in the sense of Brézis

Solution existence of variational inequalities with pseudomonotone operators in the sense of Brézis Solution existence of variational inequalities with pseudomonotone operators in the sense of Brézis B. T. Kien, M.-M. Wong, N. C. Wong and J. C. Yao Communicated by F. Giannessi This research was partially

More information

The Equivalence of Ergodicity and Weak Mixing for Infinitely Divisible Processes1

The Equivalence of Ergodicity and Weak Mixing for Infinitely Divisible Processes1 Journal of Theoretical Probability. Vol. 10, No. 1, 1997 The Equivalence of Ergodicity and Weak Mixing for Infinitely Divisible Processes1 Jan Rosinski2 and Tomasz Zak Received June 20, 1995: revised September

More information

AW -Convergence and Well-Posedness of Non Convex Functions

AW -Convergence and Well-Posedness of Non Convex Functions Journal of Convex Analysis Volume 10 (2003), No. 2, 351 364 AW -Convergence Well-Posedness of Non Convex Functions Silvia Villa DIMA, Università di Genova, Via Dodecaneso 35, 16146 Genova, Italy villa@dima.unige.it

More information

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi

Real Analysis Math 131AH Rudin, Chapter #1. Dominique Abdi Real Analysis Math 3AH Rudin, Chapter # Dominique Abdi.. If r is rational (r 0) and x is irrational, prove that r + x and rx are irrational. Solution. Assume the contrary, that r+x and rx are rational.

More information

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem 56 Chapter 7 Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem Recall that C(X) is not a normed linear space when X is not compact. On the other hand we could use semi

More information

A monotonic property of the optimal admission control to an M/M/1 queue under periodic observations with average cost criterion

A monotonic property of the optimal admission control to an M/M/1 queue under periodic observations with average cost criterion A monotonic property of the optimal admission control to an M/M/1 queue under periodic observations with average cost criterion Cao, Jianhua; Nyberg, Christian Published in: Seventeenth Nordic Teletraffic

More information

ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS

ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS J. OPERATOR THEORY 44(2000), 243 254 c Copyright by Theta, 2000 ADJOINTS, ABSOLUTE VALUES AND POLAR DECOMPOSITIONS DOUGLAS BRIDGES, FRED RICHMAN and PETER SCHUSTER Communicated by William B. Arveson Abstract.

More information

SOLUTION OF AN INITIAL-VALUE PROBLEM FOR PARABOLIC EQUATIONS VIA MONOTONE OPERATOR METHODS

SOLUTION OF AN INITIAL-VALUE PROBLEM FOR PARABOLIC EQUATIONS VIA MONOTONE OPERATOR METHODS Electronic Journal of Differential Equations, Vol. 214 (214), No. 225, pp. 1 1. ISSN: 172-6691. URL: http://ejde.math.txstate.edu or http://ejde.math.unt.edu ftp ejde.math.txstate.edu SOLUTION OF AN INITIAL-VALUE

More information

ALUR DUAL RENORMINGS OF BANACH SPACES SEBASTIÁN LAJARA

ALUR DUAL RENORMINGS OF BANACH SPACES SEBASTIÁN LAJARA ALUR DUAL RENORMINGS OF BANACH SPACES SEBASTIÁN LAJARA ABSTRACT. We give a covering type characterization for the class of dual Banach spaces with an equivalent ALUR dual norm. Let K be a closed convex

More information

SMSTC (2007/08) Probability.

SMSTC (2007/08) Probability. SMSTC (27/8) Probability www.smstc.ac.uk Contents 12 Markov chains in continuous time 12 1 12.1 Markov property and the Kolmogorov equations.................... 12 2 12.1.1 Finite state space.................................

More information

An Iterative Procedure for Solving the Riccati Equation A 2 R RA 1 = A 3 + RA 4 R. M.THAMBAN NAIR (I.I.T. Madras)

An Iterative Procedure for Solving the Riccati Equation A 2 R RA 1 = A 3 + RA 4 R. M.THAMBAN NAIR (I.I.T. Madras) An Iterative Procedure for Solving the Riccati Equation A 2 R RA 1 = A 3 + RA 4 R M.THAMBAN NAIR (I.I.T. Madras) Abstract Let X 1 and X 2 be complex Banach spaces, and let A 1 BL(X 1 ), A 2 BL(X 2 ), A

More information

ON THE EXISTENCE OF THREE SOLUTIONS FOR QUASILINEAR ELLIPTIC PROBLEM. Paweł Goncerz

ON THE EXISTENCE OF THREE SOLUTIONS FOR QUASILINEAR ELLIPTIC PROBLEM. Paweł Goncerz Opuscula Mathematica Vol. 32 No. 3 2012 http://dx.doi.org/10.7494/opmath.2012.32.3.473 ON THE EXISTENCE OF THREE SOLUTIONS FOR QUASILINEAR ELLIPTIC PROBLEM Paweł Goncerz Abstract. We consider a quasilinear

More information

WEAK LOWER SEMI-CONTINUITY OF THE OPTIMAL VALUE FUNCTION AND APPLICATIONS TO WORST-CASE ROBUST OPTIMAL CONTROL PROBLEMS

WEAK LOWER SEMI-CONTINUITY OF THE OPTIMAL VALUE FUNCTION AND APPLICATIONS TO WORST-CASE ROBUST OPTIMAL CONTROL PROBLEMS WEAK LOWER SEMI-CONTINUITY OF THE OPTIMAL VALUE FUNCTION AND APPLICATIONS TO WORST-CASE ROBUST OPTIMAL CONTROL PROBLEMS ROLAND HERZOG AND FRANK SCHMIDT Abstract. Sufficient conditions ensuring weak lower

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

Domination of semigroups associated with sectorial forms

Domination of semigroups associated with sectorial forms Domination of semigroups associated with sectorial forms Amir Manavi, Hendrik Vogt, and Jürgen Voigt Fachrichtung Mathematik, Technische Universität Dresden, D-0106 Dresden, Germany Abstract Let τ be a

More information

Journal of Complexity. New general convergence theory for iterative processes and its applications to Newton Kantorovich type theorems

Journal of Complexity. New general convergence theory for iterative processes and its applications to Newton Kantorovich type theorems Journal of Complexity 26 (2010) 3 42 Contents lists available at ScienceDirect Journal of Complexity journal homepage: www.elsevier.com/locate/jco New general convergence theory for iterative processes

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo September 6, 2011 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

Operator approach to stochastic games with varying stage duration

Operator approach to stochastic games with varying stage duration Operator approach to stochastic games with varying stage duration G.Vigeral (with S. Sorin) CEREMADE Universite Paris Dauphine 4 December 2015, Stochastic methods in Game theory 1 G.Vigeral (with S. Sorin)

More information

A generalization of Dobrushin coefficient

A generalization of Dobrushin coefficient A generalization of Dobrushin coefficient Ü µ ŒÆ.êÆ ÆÆ 202.5 . Introduction and main results We generalize the well-known Dobrushin coefficient δ in total variation to weighted total variation δ V, which

More information

Markov Chains and Stochastic Sampling

Markov Chains and Stochastic Sampling Part I Markov Chains and Stochastic Sampling 1 Markov Chains and Random Walks on Graphs 1.1 Structure of Finite Markov Chains We shall only consider Markov chains with a finite, but usually very large,

More information

On Robust Arm-Acquiring Bandit Problems

On Robust Arm-Acquiring Bandit Problems On Robust Arm-Acquiring Bandit Problems Shiqing Yu Faculty Mentor: Xiang Yu July 20, 2014 Abstract In the classical multi-armed bandit problem, at each stage, the player has to choose one from N given

More information

at time t, in dimension d. The index i varies in a countable set I. We call configuration the family, denoted generically by Φ: U (x i (t) x j (t))

at time t, in dimension d. The index i varies in a countable set I. We call configuration the family, denoted generically by Φ: U (x i (t) x j (t)) Notations In this chapter we investigate infinite systems of interacting particles subject to Newtonian dynamics Each particle is characterized by its position an velocity x i t, v i t R d R d at time

More information

WHY SATURATED PROBABILITY SPACES ARE NECESSARY

WHY SATURATED PROBABILITY SPACES ARE NECESSARY WHY SATURATED PROBABILITY SPACES ARE NECESSARY H. JEROME KEISLER AND YENENG SUN Abstract. An atomless probability space (Ω, A, P ) is said to have the saturation property for a probability measure µ on

More information

CONVERGENCE OF HYBRID FIXED POINT FOR A PAIR OF NONLINEAR MAPPINGS IN BANACH SPACES

CONVERGENCE OF HYBRID FIXED POINT FOR A PAIR OF NONLINEAR MAPPINGS IN BANACH SPACES International Journal of Analysis and Applications ISSN 2291-8639 Volume 8, Number 1 2015), 69-78 http://www.etamaths.com CONVERGENCE OF HYBRID FIXED POINT FOR A PAIR OF NONLINEAR MAPPINGS IN BANACH SPACES

More information

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is

More information