Zero-sum semi-Markov games in Borel spaces: discounted and average payoff


Zero-sum semi-Markov games in Borel spaces: discounted and average payoff

Fernando Luque-Vásquez
Departamento de Matemáticas
Universidad de Sonora, México

May 2002

Abstract

We study two-person zero-sum semi-Markov games in Borel spaces with a possibly unbounded payoff function, under discounted and average payoff criteria. Conditions are given for the existence of the value of the game, for the existence of optimal strategies for both players, and for a characterization of the discounted optimal stationary strategies.

Mathematics Subject Classification: 91A15, 91A25, 90C40.

Key Words: Zero-sum semi-Markov games, Borel spaces, discounted payoff, average payoff, Shapley equation.

1 Introduction

This paper deals with two-person zero-sum semi-Markov games with Borel state and action spaces and a possibly unbounded payoff function, under discounted and average payoff criteria. Under suitable conditions on the transition law, the payoff function and the distribution of the transition times, we show the existence of the value of the game and the existence of optimal pairs of stationary strategies for both the discounted and the average criteria. We also obtain a characterization of the pairs of stationary strategies that are discounted optimal.

In the discounted case, our approach is based on contraction conditions. Our assumptions ensure that the minimax operator is a contraction map with respect to a weighted norm, and we show that its (unique) fixed point is the value of the game. Fan's Minimax Theorem [3] and measurable selection theorems are used to prove the existence of optimal stationary strategies. This approach has been used by several authors for Markov games (for example, [1], [13], [16]) and for semi-Markov games in [10], which considers a countable state space and a bounded payoff function.

For the analysis of the average criterion we suppose, besides the usual continuity/compactness requirements, that the payoff function satisfies a growth condition, and also that the embedded
Markov chains have suitable stability properties. A key step is the use of a data transformation that associates a Markov game with our original semi-Markov game. This transformation has been used by several authors (for example, [4], [5], [10] and [11]).

Markov stochastic games with discounted and average payoff have been widely studied (for example, [1], [8], [12], [13], [15], [16] and [18]) but, to the best of our knowledge, the only paper that studies zero-sum semi-Markov stochastic games is [10]. Our main results generalize to the semi-Markov context some results in [10], [13], [15] and [16].

The remainder of the paper is organized as follows. In Section 2 the semi-Markov game model is described. Next, in Section 3, the discounted and the average criteria are introduced. In Section 4 we introduce the assumptions and present our main results, Theorems 4.3 and 4.4, which are proved in Sections 5 and 6, respectively.

Terminology and notation. Given a Borel space X, i.e. a Borel subset of a complete and separable metric space, we denote by B(X) its Borel σ-algebra. P(X) denotes the family of probability measures on X endowed with the weak topology. If X and Y are Borel spaces, we denote by P(X|Y) the family of transition probabilities (or stochastic kernels) from Y to X. For a transition probability f ∈ P(X|Y), we write its values as f(y)(B) or f(B|y) for B ∈ B(X) and y ∈ Y. If X = Y, then f is called a Markov transition probability on X.

2 The semi-Markov game

A semi-Markov game model is defined by the collection

    (X, A, B, K_A, K_B, Q, F, r),   (1)

where X is the state space, and A and B are the action spaces for players 1 and 2, respectively. These spaces are assumed to be Borel spaces, whereas K_A ∈ B(X × A) and K_B ∈ B(X × B) are the constraint sets. For each x ∈ X, the x-section

    A(x) := {a ∈ A : (x, a) ∈ K_A}

represents the set of admissible actions for player 1 in state x. Similarly, the x-section

    B(x) := {b ∈ B : (x, b) ∈ K_B}

denotes the set of admissible actions for player 2 in state x. Let

    K := {(x, a, b) : x ∈ X, a ∈ A(x), b ∈ B(x)},

which is a Borel subset of X × A × B (see [14]). Moreover, Q(· | x, a, b) is a stochastic kernel on X given K called the transition law, and F(· | x, a, b) is a probability distribution function on R_+ := [0, ∞) given K called the transition time distribution. Finally, r is a real-valued measurable function on K called the payoff function; it represents the reward for player 1 and the cost for player 2.

The game is played as follows: if x is the state of the game at some decision (or transition) epoch and the players independently choose actions a ∈ A(x) and b ∈ B(x), then player 1 receives an immediate reward r(x, a, b), player 2 incurs a cost r(x, a, b), and the system moves to a new state according to the probability measure Q(· | x, a, b). The time until the transition occurs is a random variable with distribution function F(· | x, a, b).
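To fix ideas, the dynamics just described are easy to simulate in a finite model. The sketch below is our illustration only (the paper works with general Borel spaces): the arrays r and Q and the sampler for F are hypothetical stand-ins for the model data in (1).

import numpy as np

rng = np.random.default_rng(0)

def play_step(x, a, b, r, Q, sample_sojourn):
    """One decision epoch: player 1 receives r(x,a,b) (player 2 pays it),
    the state moves according to Q(.|x,a,b), and the transition time delta
    is drawn from F(.|x,a,b), here represented by a sampler."""
    reward = r[x, a, b]                        # immediate payoff r(x, a, b)
    y = rng.choice(Q.shape[-1], p=Q[x, a, b])  # next state ~ Q(.|x, a, b)
    delta = sample_sojourn(x, a, b)            # sojourn time ~ F(.|x, a, b)
    return reward, y, delta

For instance, with exponential transition times one could take sample_sojourn = lambda x, a, b: rng.exponential(tau[x, a, b]) for an array tau of mean sojourn times.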

Let H_0 := X and H_n := (K × R_+) × H_{n−1} for n = 1, 2, .... For each n, an element

    h_n = (x_0, a_0, b_0, δ_1, ..., x_{n−1}, a_{n−1}, b_{n−1}, δ_n, x_n)

of H_n represents the history of the game up to the nth decision epoch. A strategy π for player 1 is a sequence π = {π_n : n = 0, 1, ...} of stochastic kernels π_n ∈ P(A|H_n) such that

    π_n(A(x_n) | h_n) = 1 for all h_n ∈ H_n.

We denote by Π the family of all strategies for player 1. A strategy π = {π_n} is called a Markov strategy if π_n ∈ P(A|X) for each n = 0, 1, ..., that is, if each π_n depends only on the current state x_n of the system. The set of all Markov strategies for player 1 is denoted by Π_M. Let Φ_1 denote the class of all transition probabilities f ∈ P(A|X) such that f(x) ∈ P(A(x)). A Markov strategy π = {π_n} is said to be a stationary strategy if there exists f ∈ Φ_1 such that π_n = f for each n = 0, 1, .... In this case the strategy is identified with f, and the set of all stationary strategies for player 1 with Φ_1. The sets Γ, Γ_M and Φ_2 of all strategies, all Markov strategies and all stationary strategies for player 2 are defined similarly.

Let (Ω, F) be the canonical measurable space consisting of the sample space Ω := (X × A × B × R_+)^∞ and its product σ-algebra F. For each pair of strategies (π, γ) ∈ Π × Γ and each initial state x, there exist a unique probability measure P_x^{πγ} and a stochastic process {(x_n, a_n, b_n, δ_{n+1}) : n = 0, 1, ...}, where x_n, a_n and b_n represent the state and the actions of players 1 and 2, respectively, at the nth decision epoch, whereas δ_n represents the time between the (n−1)th and the nth decision epochs. E_x^{πγ} denotes the expectation operator with respect to P_x^{πγ}.

3 Optimality criteria

Definition 3.1. For α > 0, x ∈ X and (π, γ) ∈ Π × Γ, the infinite-horizon total expected α-discounted payoff is

    V(x, π, γ) := E_x^{πγ} Σ_{k=0}^∞ e^{−αT_k} r(x_k, a_k, b_k),   (2)

where T_0 := 0 and T_n := T_{n−1} + δ_n for n = 1, 2, ....

Definition 3.2. For x ∈ X and (π, γ) ∈ Π × Γ, the long-run expected average payoff is defined as

    J(x, π, γ) := lim inf_{n→∞} [E_x^{πγ} Σ_{k=0}^{n−1} r(x_k, a_k, b_k)] / [E_x^{πγ}(T_n)].   (3)

To define our optimality criteria we need the following concepts. The functions on X given by

    L_α(x) := sup_{π∈Π} inf_{γ∈Γ} V(x, π, γ) and U_α(x) := inf_{γ∈Γ} sup_{π∈Π} V(x, π, γ)   (4)
are called the lower value and the upper value, respectively, of the (expected) α-discounted payoff game. Clearly, L_α(·) ≤ U_α(·) in general; if L_α(x) = U_α(x) for all x ∈ X, then the common value is called the value of the semi-Markov game and is denoted by V_α(x).

Definition 3.3. (a) A strategy π* ∈ Π is said to be α-optimal for player 1 if

    U_α(x) ≤ V(x, π*, γ) for all γ ∈ Γ, x ∈ X.

(b) A strategy γ* ∈ Γ is said to be α-optimal for player 2 if

    V(x, π, γ*) ≤ L_α(x) for all π ∈ Π, x ∈ X.

(c) A pair (π*, γ*) ∈ Π × Γ is said to be an α-optimal pair of strategies if, for all x ∈ X,

    U_α(x) = inf_{γ∈Γ} V(x, π*, γ) and L_α(x) = sup_{π∈Π} V(x, π, γ*).

For the expected average semi-Markov game, the lower value L(·), the upper value U(·), the value V(·) and optimal strategies are defined similarly. We note that the existence of an optimal strategy for either player 1 or player 2 implies that the game has a value.

Remark 3.4. Let

    β_α(x, a, b) := ∫_0^∞ e^{−αt} F(dt | x, a, b)   (5)

and

    τ(x, a, b) := ∫_0^∞ t F(dt | x, a, b).

Then, using properties of the conditional expectation, we can write

    V(x, π, γ) = E_x^{πγ} [r(x_0, a_0, b_0) + Σ_{n=1}^∞ (Π_{k=0}^{n−1} β_α(x_k, a_k, b_k)) r(x_n, a_n, b_n)]   (6)

and

    J(x, π, γ) = lim inf_{n→∞} [E_x^{πγ} Σ_{k=0}^{n−1} r(x_k, a_k, b_k)] / [E_x^{πγ} Σ_{k=0}^{n−1} τ(x_k, a_k, b_k)].

4 Assumptions and main results

The problem we are concerned with is to show the existence of optimal strategies for players 1 and 2, which requires imposing suitable assumptions on the semi-Markov game model. In particular, the first one is a regularity condition ensuring that infinitely many transitions cannot occur in a finite time interval.
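As a concrete illustration of the quantities in Remark 3.4 (our example, not part of the paper): if the transition times are exponentially distributed, F(t | x, a, b) = 1 − e^{−λ(x,a,b)t} with rates λ(x, a, b) > 0, then the integrals in (5) have closed forms:

    \beta_\alpha(x,a,b) \;=\; \int_0^\infty e^{-\alpha t}\,\lambda e^{-\lambda t}\,dt \;=\; \frac{\lambda}{\lambda+\alpha},
    \qquad
    \tau(x,a,b) \;=\; \int_0^\infty t\,\lambda e^{-\lambda t}\,dt \;=\; \frac{1}{\lambda},
    \qquad \lambda := \lambda(x,a,b).

If the rates are bounded above on K, say λ ≤ λ_max, then Assumption 1 below holds with any θ > 0 and ε = e^{−θλ_max}, and one gets sup_K β_α = λ_max/(λ_max + α) < 1 and inf_K τ = 1/λ_max > 0, in agreement with Lemma 4.1 below.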

Assumption 1 (A1). There exist θ > 0 and ε > 0 such that

    F(θ | x, a, b) ≤ 1 − ε for all (x, a, b) ∈ K.

An important consequence of this assumption is the following.

Lemma 4.1. If A1 holds, then

    ρ_α := sup_{(x,a,b)∈K} β_α(x, a, b) < 1   (7)

and

    inf_{(x,a,b)∈K} τ(x, a, b) > 0.   (8)

Proof. Let (x, a, b) ∈ K be fixed. Integrating by parts in (5), we have

    β_α(x, a, b) = α ∫_0^∞ e^{−αt} F(t | x, a, b) dt
                = α [∫_0^θ e^{−αt} F(t | x, a, b) dt + ∫_θ^∞ e^{−αt} F(t | x, a, b) dt]
                ≤ (1 − ε)(1 − e^{−αθ}) + e^{−αθ}
                = 1 − ε + ε e^{−αθ} < 1.

Since (x, a, b) ∈ K was arbitrary, we get (7). On the other hand,

    τ(x, a, b) ≥ ∫_θ^∞ t F(dt | x, a, b) ≥ θε,

which yields (8).

Our second assumption is a combination of standard continuity and compactness requirements, together with a growth condition on the payoff function.

Assumption 2 (A2). (a) For each x ∈ X, the sets A(x) and B(x) are compact.

(b) For each (x, a, b) ∈ K, r(x, ·, b) is upper semicontinuous on A(x), and r(x, a, ·) is lower semicontinuous on B(x).

(c) For each (x, a, b) ∈ K and each bounded measurable function v on X, the functions

    a ↦ ∫ v(y) Q(dy | x, a, b) and b ↦ ∫ v(y) Q(dy | x, a, b)

are continuous on A(x) and B(x), respectively.

(d) There exist a measurable function w : X → [1, ∞) and a positive constant m such that

    |r(x, a, b)| ≤ m w(x) for all (x, a, b) ∈ K.

In addition, part (c) holds when v is replaced with w.
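The bound 1 − ε + ε e^{−αθ} from the proof of Lemma 4.1 is easy to check numerically. The following sketch (our illustration, with hypothetical parameter values) uses the exponential example above, for which β_α = λ/(λ + α) in closed form and A1 holds with ε = e^{−λθ}:

import math

alpha, lam, theta = 0.5, 2.0, 0.3
eps = math.exp(-lam * theta)                      # A1: F(theta) = 1 - e^{-lam*theta} <= 1 - eps
beta = lam / (lam + alpha)                        # closed form of (5) for exponential F
bound = 1 - eps + eps * math.exp(-alpha * theta)  # bound from the proof of Lemma 4.1
assert beta <= bound < 1                          # beta_alpha <= 1 - eps + eps*e^{-alpha*theta} < 1
print(f"beta_alpha = {beta:.4f}, bound = {bound:.4f}")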

Remark 4.2. By Lemma 1.11 in [13], if Assumption 2(a) holds, then the multifunctions A : X → 2^{P(A)} and B : X → 2^{P(B)} defined by A(x) := P(A(x)) and B(x) := P(B(x)) are measurable compact-valued multifunctions.

Assumption 3 (A3). (a) There exists a positive constant η, with ηρ_α < 1, such that for all (x, a, b) ∈ K,

    ∫ w(y) Q(dy | x, a, b) ≤ η w(x),

where w is the function in A2(d).

(b) For each t ≥ 0, F(t | x, a, b) is continuous on K.

We now introduce the following notation: for any given function h : K → R, x ∈ X, and probability measures μ ∈ A(x) and λ ∈ B(x), we write

    h(x, μ, λ) := ∫_{B(x)} ∫_{A(x)} h(x, a, b) μ(da) λ(db),

whenever the integrals are well defined. In particular,

    r(x, μ, λ) := ∫_{B(x)} ∫_{A(x)} r(x, a, b) μ(da) λ(db),

    β_α(x, μ, λ) := ∫_{B(x)} ∫_{A(x)} β_α(x, a, b) μ(da) λ(db),

    τ(x, μ, λ) := ∫_{B(x)} ∫_{A(x)} τ(x, a, b) μ(da) λ(db),

    Q(D | x, μ, λ) := ∫_{B(x)} ∫_{A(x)} Q(D | x, a, b) μ(da) λ(db), D ∈ B(X).

B_w(X) denotes the linear space of measurable functions u on X with finite w-norm, which is defined as

    ||u||_w := sup_{x∈X} |u(x)| / w(x).

For u ∈ B_w(X) and (x, a, b) ∈ K, we write

    H(u, x, a, b) := r(x, a, b) + β_α(x, a, b) ∫_X u(y) Q(dy | x, a, b).

For each function u ∈ B_w(X), let

    T_α u(x) := sup_{μ∈A(x)} inf_{λ∈B(x)} H(u, x, μ, λ),   (9)

which defines a function T_α u in B_w(X) (see Lemma 5.1 below). We call T_α the Shapley operator, and a function v ∈ B_w(X) is said to be a solution to the
Shapley equation if T_α v(x) = v(x) for all x ∈ X. In the proof of Lemma 5.1 we show that if Assumptions 1, 2 and 3 hold, then for each μ ∈ A(x), H(u, x, μ, ·) is l.s.c. on B(x), and for each λ ∈ B(x), H(u, x, ·, λ) is u.s.c. on A(x). Thus, by Theorem A.2.3 in [2], the supremum and the infimum in (9) are indeed attained. Hence we can write

    T_α u(x) := max_{μ∈A(x)} min_{λ∈B(x)} H(u, x, μ, λ).   (10)

We are now ready to state our first main result.

Theorem 4.3. If A1-A3 hold, then:

(a) The α-discounted semi-Markov game has a value V_α(·), which is the unique function in B_w(X) that satisfies the Shapley equation V_α(x) = T_α V_α(x); furthermore, there exists a pair of α-optimal stationary strategies.

(b) A pair of stationary strategies (f, g) ∈ Φ_1 × Φ_2 is α-optimal if and only if V(·, f, g) is a solution to the Shapley equation.

To ensure the existence of the value and of optimal strategies for both players under the average criterion, we introduce the following conditions.

Assumption 4 (A4). (a) There exists a positive constant M such that τ(x, a, b) ≤ M for all (x, a, b) ∈ K.

(b) The function τ is continuous on K.

The next two assumptions are used to guarantee that the state process is ergodic in a suitable sense.

Assumption 5 (A5). There exist a probability measure ν ∈ P(X), a positive number α < 1 and a measurable function φ : K → [0, 1] for which the following holds for all (x, a, b) ∈ K and D ∈ B(X):

(a) Q(D | x, a, b) ≥ φ(x, a, b) ν(D).

(b) ∫ w(y) Q(dy | x, a, b) ≤ α w(x) + φ(x, a, b) ||ν||_w, where w is the function defined in A2(d) and ||ν||_w := ∫ w(y) ν(dy).

(c) ∫ φ(x, f(x), g(x)) ν(dx) > 0 for each f ∈ Φ_1, g ∈ Φ_2.

Assumption 6 (A6). There is a constant b > 0 such that for each f ∈ Φ_1 and g ∈ Φ_2,

    ∫ φ(x, f(x), g(x)) ν(dx) > b,

with ν and φ as in Assumption 5.

We are now ready to state our second main result.

Theorem 4.4. Suppose that A1, A2 and A4-A6 hold. Then the average semi-Markov game has a constant value V(x) = ξ for all x ∈ X, and there exists a pair of average optimal stationary strategies.
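Theorem 4.3 also suggests a computational scheme: since T_α turns out to be a contraction on B_w(X) (Lemma 5.2 below), successive approximations u_{n+1} = T_α u_n converge to V_α, with ||u_n − V_α||_w ≤ (ηρ_α)^n ||u_0 − V_α||_w by the standard Banach fixed-point estimate. The sketch below is our illustration for a finite model, not part of the paper: nS, nA, nB, r, Q and beta are hypothetical, w ≡ 1, and each matrix game in (10) is solved by linear programming.

import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of the zero-sum matrix game M (row player maximizes):
    maximize v subject to sum_i p_i M[i, j] >= v for all j, p in the simplex."""
    m, n = M.shape
    c = np.r_[np.zeros(m), -1.0]                  # variables (p, v); minimize -v
    A_ub = np.c_[-M.T, np.ones(n)]                # v - sum_i p_i M[i, j] <= 0
    A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)  # sum_i p_i = 1
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1]

rng = np.random.default_rng(0)
nS, nA, nB = 4, 3, 2
r = rng.normal(size=(nS, nA, nB))                 # payoff r(x, a, b)
Q = rng.random(size=(nS, nA, nB, nS))
Q /= Q.sum(axis=-1, keepdims=True)                # transition law Q(.|x, a, b)
beta = 0.9 * np.ones((nS, nA, nB))                # beta_alpha(x, a, b); here rho_alpha = 0.9

u = np.zeros(nS)
for _ in range(500):                              # value iteration with the Shapley operator (10)
    H = r + beta * np.einsum('xaby,y->xab', Q, u)
    u_new = np.array([matrix_game_value(H[x]) for x in range(nS)])
    if np.max(np.abs(u_new - u)) < 1e-8:
        break
    u = u_new
print("V_alpha =", u)

In this toy example the contraction modulus is ηρ_α = 0.9 (η = 1 because w ≡ 1), so roughly log(1/ε)/log(1/0.9) iterations suffice for accuracy ε.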

5 Proof of Theorem 4.3

First we prove some preliminary results.

Lemma 5.1. If A1-A3 hold, then for each u ∈ B_w(X), the function T_α u is in B_w(X), and

    T_α u(x) = min_{λ∈B(x)} max_{μ∈A(x)} H(u, x, μ, λ).   (11)

Moreover, there exist stationary strategies f ∈ Φ_1 and g ∈ Φ_2 such that

    T_α u(x) = H(u, x, f(x), g(x)) = max_{μ∈A(x)} H(u, x, μ, g(x)) = min_{λ∈B(x)} H(u, x, f(x), λ).   (12)

Proof. By Lemma 4.1 and A3(a), for u ∈ B_w(X) and (x, a, b) ∈ K we have

    |H(u, x, a, b)| ≤ m w(x) + ρ_α ||u||_w η w(x),   (13)

which, since T_α u is measurable, implies that T_α u is in B_w(X). On the other hand, by A2 and (13), the function x ↦ H(u, x, a, b) is in B_w(X) and H(u, x, ·, b) is u.s.c. on A(x). Then, for fixed λ ∈ B(x), by Fatou's Lemma, the function

    a ↦ ∫_{B(x)} H(u, x, a, b) λ(db)

is u.s.c. and bounded on A(x). Thus, since convergence in A(x) is weak convergence of probability measures, by a standard result on weak convergence (see [2]), the function μ ↦ H(u, x, μ, λ) is u.s.c. on A(x). Similarly, H(u, x, μ, ·) is l.s.c. on B(x). Moreover, H(u, x, μ, λ) is concave in μ and convex in λ. Thus, by Fan's Minimax Theorem [3] we obtain (11). The existence of stationary strategies f ∈ Φ_1 and g ∈ Φ_2 satisfying (12) follows from (11) and well-known measurable selection theorems (see for instance Lemma 4.3 in [13]).

We use below the following notation: for a pair of stationary strategies (f, g) ∈ Φ_1 × Φ_2, we define the operator T_fg on B_w(X) by

    T_fg u(x) := H(u, x, f(x), g(x)).

It is clear (see the proof of Lemma 5.1) that T_fg u belongs to B_w(X) for each u ∈ B_w(X).

Lemma 5.2. If A1-A3 hold, then T_α and T_fg are both contraction operators on B_w(X) with modulus ηρ_α < 1.

Proof. First we note that both operators are monotone; that is, if u, v ∈ B_w(X) and u(·) ≤ v(·), then for all x ∈ X,

    T_fg u(x) ≤ T_fg v(x).

Similarly, T_α u(x) ≤ T_α v(x) for all x ∈ X. Also, using (7) and A3, it is easy to see that for k ≥ 0,

    T_fg(u + kw)(x) ≤ T_fg u(x) + ρ_α η k w(x) for all x ∈ X,

and

    T_α(u + kw)(x) ≤ T_α u(x) + ρ_α η k w(x) for all x ∈ X.   (14)

Now, for u, v ∈ B_w(X), by (14), the monotonicity of T_α and the fact that u ≤ v + w ||u − v||_w, it follows that

    T_α u(x) ≤ T_α v(x) + ρ_α η ||u − v||_w w(x) for all x ∈ X,

so that

    T_α u(x) − T_α v(x) ≤ ρ_α η ||u − v||_w w(x) for all x ∈ X.   (15)

Interchanging u and v we obtain

    T_α v(x) − T_α u(x) ≤ ρ_α η ||u − v||_w w(x) for all x ∈ X,   (16)

and combining (15) and (16) we get

    |T_α u(x) − T_α v(x)| ≤ ρ_α η ||u − v||_w w(x) for all x ∈ X,

i.e.,

    ||T_α u − T_α v||_w ≤ ρ_α η ||u − v||_w.

Hence, T_α is a contraction operator with modulus ρ_α η. The same arguments prove that T_fg is a contraction operator with the same modulus ρ_α η.

Remark 5.3. Since T_α and T_fg are contraction operators on B_w(X), by Banach's Fixed Point Theorem there exist unique functions v* and v_fg in B_w(X) such that T_α v*(x) = v*(x) and T_fg v_fg(x) = v_fg(x) for all x ∈ X.

Lemma 5.4. For a pair of stationary strategies (f, g) ∈ Φ_1 × Φ_2, the function V(·, f, g) is the unique fixed point of T_fg in B_w(X).

Proof. We have to show that V(x, f, g) = T_fg V(x, f, g) for all x ∈ X. By (6),

    V(x, f, g) = E_x^{fg} {r(x_0, a_0, b_0) + Σ_{n=1}^∞ Π_{k=0}^{n−1} β_α(x_k, a_k, b_k) r(x_n, a_n, b_n)}
        = r(x, f(x), g(x)) + E_x^{fg} {Σ_{n=1}^∞ Π_{k=0}^{n−1} β_α(x_k, a_k, b_k) r(x_n, a_n, b_n)}
        = r(x, f(x), g(x)) + E_x^{fg} {β_α(x_0, a_0, b_0) E_x^{fg} [r(x_1, a_1, b_1) + Σ_{n=2}^∞ Π_{k=1}^{n−1} β_α(x_k, a_k, b_k) r(x_n, a_n, b_n) | h_1]}
        = r(x, f(x), g(x)) + E_x^{fg} {β_α(x_0, a_0, b_0) V(x_1, f, g)}
        = T_fg V(x, f, g).

Thus, V(·, f, g) is the fixed point of T_fg.

Lemma 5.5. Suppose that A1-A3 hold, and let π and γ be arbitrary strategies for players 1 and 2, respectively. Then for each x ∈ X and n = 0, 1, ...:

(a) E_x^{πγ} w(x_n) ≤ η^n w(x);

(b) E_x^{πγ} |r(x_n, a_n, b_n)| ≤ m η^n w(x);

(c) lim_{n→∞} E_x^{πγ} [Π_{k=0}^{n−1} β_α(x_k, a_k, b_k) u(x_n)] = 0 for each u ∈ B_w(X).

Proof. For n = 0, (a) and (b) are trivially satisfied. Now, if n ≥ 1 then, by A3(a),

    E_x^{πγ} [w(x_n) | h_{n−1}, a_{n−1}, b_{n−1}] = ∫ w(y) Q(dy | x_{n−1}, a_{n−1}, b_{n−1}) ≤ η w(x_{n−1}).

Hence E_x^{πγ} w(x_n) ≤ η E_x^{πγ} w(x_{n−1}), which by iteration yields (a). Part (b) follows immediately from (a) and the growth condition A2(d). To prove (c), observe that Lemma 4.1 and (a) yield

    |E_x^{πγ} [Π_{k=0}^{n−1} β_α(x_k, a_k, b_k) u(x_n)]| ≤ ρ_α^n E_x^{πγ} |u(x_n)| ≤ ρ_α^n ||u||_w E_x^{πγ} w(x_n) ≤ (ρ_α η)^n ||u||_w w(x).

This yields (c), since ρ_α η < 1.

Proof of Theorem 4.3. (a) Let v* be the unique fixed point of T_α in B_w(X); that is, by (10),

    v*(x) = T_α v*(x) = max_{μ∈A(x)} min_{λ∈B(x)} H(v*, x, μ, λ).

By Lemma 5.1 there exists a pair of stationary strategies (f*, g*) ∈ Φ_1 × Φ_2 such that

    v*(x) = H(v*, x, f*(x), g*(x))   (17)
        = min_{λ∈B(x)} H(v*, x, f*(x), λ)
        = max_{μ∈A(x)} H(v*, x, μ, g*(x)).

We will next prove that v* is the value of the semi-Markov game and that (f*, g*) is an α-optimal pair of strategies. The first equality in (17) implies that v* is the fixed point of T_{f*g*} in B_w(X) (see Remark 5.3). Thus, by Lemma 5.4, v*(·) = V(·, f*, g*), so it is enough to show that for arbitrary π ∈ Π and γ ∈ Γ,

    V(x, f*, γ) ≥ V(x, f*, g*) ≥ V(x, π, g*) for all x ∈ X.   (18)

We prove the second inequality in (18); a similar argument applies to the first. By (6) we have

    V(x, π, g*) = E_x^{πg*} {r(x_0, a_0, b_0) + Σ_{n=1}^∞ Π_{k=0}^{n−1} β_α(x_k, a_k, b_k) r(x_n, a_n, b_n)}.

From properties of the conditional expectation we have, for n ≥ 1 and h_n ∈ H_n,

    E_x^{πg*} [Π_{k=0}^n β_α(x_k, a_k, b_k) V(x_{n+1}, f*, g*) | h_n]
        = Π_{k=0}^{n−1} β_α(x_k, a_k, b_k) E_x^{πg*} [β_α(x_n, a_n, b_n) V(x_{n+1}, f*, g*) | h_n]
        = Π_{k=0}^{n−1} β_α(x_k, a_k, b_k) {H(V(·, f*, g*), x_n, π_n(h_n), g*(x_n)) − r(x_n, π_n(h_n), g*(x_n))}
        ≤ Π_{k=0}^{n−1} β_α(x_k, a_k, b_k) [V(x_n, f*, g*) − r(x_n, π_n(h_n), g*(x_n))],

where the inequality holds because, by (17), H(V(·, f*, g*), x_n, μ, g*(x_n)) ≤ V(x_n, f*, g*) for every μ ∈ A(x_n). Equivalently, for n ≥ 1,

    Π_{k=0}^{n−1} β_α(x_k, a_k, b_k) V(x_n, f*, g*)
        ≥ E_x^{πg*} [Π_{k=0}^n β_α(x_k, a_k, b_k) V(x_{n+1}, f*, g*) | h_n] + Π_{k=0}^{n−1} β_α(x_k, a_k, b_k) r(x_n, π_n(h_n), g*(x_n)).

We also have

    V(x_0, f*, g*) − E_x^{πg*} [β_α(x_0, a_0, b_0) V(x_1, f*, g*)] ≥ r(x_0, π_0(x_0), g*(x_0)).

Now, taking expectations and summing over n = 0, 1, ..., N, we obtain

    V(x, f*, g*) ≥ E_x^{πg*} [Π_{k=0}^N β_α(x_k, a_k, b_k) V(x_{N+1}, f*, g*)]
        + E_x^{πg*} [r(x_0, a_0, b_0) + Σ_{n=1}^N Π_{k=0}^{n−1} β_α(x_k, a_k, b_k) r(x_n, a_n, b_n)].

Finally, letting N → ∞, by (6) and Lemma 5.5(c) we obtain the required result.

(b) (⇒) Suppose that (f, g) ∈ Φ_1 × Φ_2 is a pair of α-optimal stationary strategies. Then for all x ∈ X, π ∈ Π and γ ∈ Γ,

    V(x, f, γ) ≥ V(x, f, g) ≥ V(x, π, g).   (19)

Fix x ∈ X and, for an arbitrary λ ∈ B(x), define γ̂ = {γ̂_n} as follows: γ̂_0 := λ and γ̂_n := g for n = 1, 2, .... Then, by the first inequality in (19),

    V(x, f, g) ≤ V(x, f, γ̂) = ∫_{B(x)} ∫_{A(x)} [r(x, a, b) + β_α(x, a, b) ∫ V(y, f, g) Q(dy | x, a, b)] f(x)(da) λ(db).

It follows that

    V(x, f, g) ≤ H(V(·, f, g), x, f(x), λ),

from which we get

    V(x, f, g) ≤ T_α V(x, f, g).

Similarly, we can prove

    V(x, f, g) ≥ T_α V(x, f, g),

and combining the last two inequalities we get the desired result.

(⇐) The proof of this part is contained in the proof of part (a).

6 Proof of Theorem 4.4

The proof uses a data transformation introduced in [17] and applied by several authors to semi-Markov control models (see e.g. [4], [5], [11]) and to semi-Markov games [10].

Definition 6.1. Let τ̄ be a real number such that 0 < τ̄ < inf_K τ(x, a, b) (see Lemma 4.1). Define the function r̂ : K → R and the stochastic kernel Q̂ on X given K by

    r̂(x, a, b) := r(x, a, b) / τ(x, a, b)   (20)

and

    Q̂(B | x, a, b) := (τ̄ / τ(x, a, b)) Q(B | x, a, b) + (1 − τ̄ / τ(x, a, b)) δ_x(B),   (21)

where δ_x(·) is the Dirac measure concentrated at x. We refer to (X, A, B, K_A, K_B, Q̂, r̂) as the Markov game model associated to the semi-Markov game (1).

We now state some lemmas which essentially show that the associated game model satisfies Assumptions A2, A5 and A6. We omit the proofs of Lemmas 6.2, 6.3 and 6.4 because they are similar to the proofs of Lemmas 5.2, 5.3 and 5.4 in [11], respectively.

Lemma 6.2. If A1, A2 and A4 hold, then parts (b), (c) and (d) of A2 are satisfied when r and Q are replaced by r̂ and Q̂, respectively.

Lemma 6.3. Suppose that A1, A4 and A5 hold, and let φ̂(x, a, b) := [τ̄ / τ(x, a, b)] φ(x, a, b). Then for all (x, a, b) ∈ K and D ∈ B(X):

(a) Q̂(D | x, a, b) ≥ φ̂(x, a, b) ν(D).

(b) ∫ w(y) Q̂(dy | x, a, b) ≤ α̂ w(x) + φ̂(x, a, b) ||ν||_w, where 0 < α̂ < 1 is the constant α̂ := 1 − (τ̄/M)(1 − α).

(c) ∫ φ̂(x, f(x), g(x)) ν(dx) > 0 for each f ∈ Φ_1, g ∈ Φ_2.
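In a finite model the transformation (20)-(21) is a few lines of code. The sketch below (ours, with the same hypothetical array layout as in the earlier sketches) makes the construction explicit:

import numpy as np

def transform(r, Q, tau, tau_bar):
    """Data transformation (20)-(21): r_hat = r/tau and
    Q_hat = (tau_bar/tau) Q + (1 - tau_bar/tau) delta_x, where
    0 < tau_bar < tau.min() as in Definition 6.1."""
    assert 0 < tau_bar < tau.min()
    r_hat = r / tau                                  # (20)
    c = (tau_bar / tau)[..., None]                   # weight tau_bar/tau(x,a,b)
    eye = np.eye(Q.shape[-1])                        # Dirac kernels delta_x
    Q_hat = c * Q + (1 - c) * eye[:, None, None, :]  # (21)
    return r_hat, Q_hat

A quick sanity check: Q_hat.sum(axis=-1) is identically 1, so Q̂ is again a stochastic kernel, and states with long mean sojourn times τ(x, a, b) acquire a correspondingly large self-loop probability 1 − τ̄/τ(x, a, b).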

Lemma 6.4. If A1, A4 and A6 hold, then for each f ∈ Φ_1 and g ∈ Φ_2,

    ∫ φ̂(x, f(x), g(x)) ν(dx) > τ̄ b / M.

Remark 6.5. If A1, A4 and A5 hold, then for each pair (f, g) ∈ Φ_1 × Φ_2:

(a) The Markov chain with transition probability Q̂(· | x, f(x), g(x)) is ν-irreducible and positive Harris recurrent; in particular, it admits a unique invariant probability measure q̂(f, g)(·).

(b) If in addition A6 holds, then the Markov chain {x_n} is w-geometrically ergodic; that is, there exist positive constants ζ < 1 and L such that

    |∫ u(y) Q̂^n(dy | x, f(x), g(x)) − ∫ u(y) q̂(f, g)(dy)| ≤ w(x) ||u||_w L ζ^n   (22)

for every u ∈ B_w(X), x ∈ X and n = 1, 2, ..., where Q̂^n denotes the n-step Markov transition probability.

(c) For x ∈ X, π ∈ Π, γ ∈ Γ and u ∈ B_w(X),

    lim_{n→∞} (1/n) E_x^{πγ} u(x_n) = 0.   (23)

Part (a) follows from Lemma 3.3 in [19]. Part (b) comes from (a) and Lemmas 3.3 and 3.4 in [6], while part (c) follows from Lemma 4.3(c) in [11].

Lemma 6.6. If Assumptions A1, A2 and A4-A6 hold, then there exist a real number ξ, probability measures μ_x ∈ P(A(x)) and λ_x ∈ P(B(x)) for each x ∈ X, and functions h*, h_* ∈ B_w(X) such that

    ξ + h*(x) ≥ max_{μ∈A(x)} {r̂(x, μ, λ_x) + ∫ h*(y) Q̂(dy | x, μ, λ_x)}   (24)

and

    ξ + h_*(x) ≤ min_{λ∈B(x)} {r̂(x, μ_x, λ) + ∫ h_*(y) Q̂(dy | x, μ_x, λ)}.   (25)

The proof of Lemma 6.6 is contained in the proof of Theorem 3 in [15]. The hypotheses C5 and C6 used in [15] are somewhat different from our A5 and A6, but the key tool used to obtain (24) and (25) is the w-geometric ergodicity (22).

For u ∈ B_w(X) and (x, a, b) ∈ K, define

    G(u, x, a, b) := r(x, a, b) − ξ τ(x, a, b) + ∫ u(y) Q(dy | x, a, b),

where ξ is the constant in Lemma 6.6. For u ∈ B_w(X), let

    T u(x) := sup_{μ∈A(x)} inf_{λ∈B(x)} G(u, x, μ, λ).   (26)

Then, using arguments similar to those in the proof of Lemma 5.1, we can show the following.

Lemma 6.7. Suppose that the assumptions of Theorem 4.4 are satisfied. Then for each u ∈ B_w(X), the function Tu is in B_w(X), the supremum and the infimum in (26) are attained, and

    T u(x) = max_{μ∈A(x)} min_{λ∈B(x)} G(u, x, μ, λ) = min_{λ∈B(x)} max_{μ∈A(x)} G(u, x, μ, λ).

Moreover, there exist stationary strategies f ∈ Φ_1 and g ∈ Φ_2 such that for all x ∈ X,

    T u(x) = G(u, x, f(x), g(x)) = max_{μ∈A(x)} G(u, x, μ, g(x)) = min_{λ∈B(x)} G(u, x, f(x), λ).

Proof of Theorem 4.4. Fix x ∈ X. From (24) it follows that for each a ∈ A(x),

    ξ + h*(x) ≥ r̂(x, a, λ_x) + ∫ h*(y) Q̂(dy | x, a, λ_x) ≥ inf_{b∈B(x)} {r̂(x, a, b) + ∫ h*(y) Q̂(dy | x, a, b)}.

Then, by A2(a) and Lemma 6.2, for each a ∈ A(x) there exists b_a ∈ B(x) such that

    ξ + h*(x) ≥ r̂(x, a, b_a) + ∫ h*(y) Q̂(dy | x, a, b_a).

Applying the transformation (20)-(21) we obtain

    τ̄ h*(x) ≥ r(x, a, b_a) − ξ τ(x, a, b_a) + ∫ τ̄ h*(y) Q(dy | x, a, b_a).   (27)

Since (27) holds for each a ∈ A(x), setting ω*(·) := τ̄ h*(·) we get

    ω*(x) ≥ min_{b∈B(x)} max_{a∈A(x)} {r(x, a, b) − ξ τ(x, a, b) + ∫ ω*(y) Q(dy | x, a, b)},

which in turn yields

    ω*(x) ≥ min_{λ∈B(x)} max_{μ∈A(x)} {r(x, μ, λ) − ξ τ(x, μ, λ) + ∫ ω*(y) Q(dy | x, μ, λ)}.   (28)

Thus, by Lemma 6.7, there is g* ∈ Φ_2 such that

    ω*(x) ≥ max_{μ∈A(x)} {r(x, μ, g*(x)) − ξ τ(x, μ, g*(x)) + ∫ ω*(y) Q(dy | x, μ, g*(x))}.   (29)

Then, by standard dynamic programming arguments based on (29) and (23) (see [1], [4], [5], [7], [8], [9]), we have ξ ≥ J(x, π, g*) for each π ∈ Π. Thus we get

    ξ ≥ sup_{π∈Π} J(x, π, g*) ≥ U(x).   (30)

Similarly, from (25) we can show the existence of a stationary strategy f* ∈ Φ_1 such that

    ξ ≤ inf_{γ∈Γ} J(x, f*, γ) ≤ L(x).   (31)

Combining (30) and (31), we conclude that the game has a value ξ, which is independent of the initial state x, and that (f*, g*) ∈ Φ_1 × Φ_2 is an average optimal pair of stationary strategies.

References

[1] E. Altman, A. Hordijk and F.M. Spieksma, Contraction conditions for average and α-discounted optimality in countable state Markov games with unbounded rewards. Math. Oper. Res. 22 (1997).

[2] R.B. Ash and C.A. Doléans-Dade, Probability and Measure Theory. Academic Press, New York.

[3] K. Fan, Minimax theorems. Proc. Nat. Acad. Sci. USA 39 (1953).

[4] A. Federgruen, P.J. Schweitzer and H.C. Tijms, Denumerable undiscounted semi-Markov decision processes with unbounded rewards. Math. Oper. Res. 8 (1983).

[5] A. Federgruen and H.C. Tijms, The optimality equation in average cost denumerable state semi-Markov decision problems. Recurrence conditions and algorithms. J. Appl. Probab. 15 (1978).

[6] E. Gordienko and O. Hernández-Lerma, Average cost Markov control processes with weighted norms: existence of canonical policies. Appl. Math. (Warsaw) 23 (1995).

[7] O. Hernández-Lerma and J.B. Lasserre, Discrete-Time Markov Control Processes: Basic Optimality Criteria. Springer-Verlag, New York.

[8] O. Hernández-Lerma and J.B. Lasserre, Zero-sum stochastic games in Borel spaces: average payoff criteria. SIAM J. Control Optim. 39 (2001).

[9] O. Hernández-Lerma and O. Vega-Amaya, Infinite-horizon Markov control processes with undiscounted cost criteria: from average to overtaking optimality. Appl. Math. (Warsaw) 25 (1998).

[10] A.K. Lal and S. Sinha, Zero-sum two-person semi-Markov games. J. Appl. Prob. 29 (1992).

[11] F. Luque-Vásquez and O. Hernández-Lerma, Semi-Markov control models with average cost. Appl. Math. (Warsaw) 26 (1999).

[12] A. Maitra and T. Parthasarathy, On stochastic games. J. Optim. Theory Appl. 5 (1970).

[13] A.S. Nowak, On zero-sum stochastic games with general state space I. Probab. Math. Statist. 4 (1984).

[14] A.S. Nowak, Measurable selection theorems for minimax stochastic optimization problems. SIAM J. Control Optim. 23 (1985).

[15] A.S. Nowak, Optimal strategies in a class of zero-sum ergodic stochastic games. Math. Meth. Oper. Res. 50 (1999).

[16] F. Ramírez-Reyes, Existence of optimal strategies for zero-sum stochastic games with discounted payoff. Morfismos 5 (2001).

[17] P.J. Schweitzer, Iterative solution of the functional equations of undiscounted Markov renewal programming. J. Math. Anal. Appl. 34 (1971).

[18] L.I. Sennott, Zero-sum stochastic games with unbounded costs: discounted and average cost cases. Z. Oper. Res. 39 (1994).

[19] O. Vega-Amaya, The average cost optimality equation: a fixed point approach. Syst. Control Lett., to appear.


More information

Lecture 5: The Bellman Equation

Lecture 5: The Bellman Equation Lecture 5: The Bellman Equation Florian Scheuer 1 Plan Prove properties of the Bellman equation (In particular, existence and uniqueness of solution) Use this to prove properties of the solution Think

More information

Zdzis law Brzeźniak and Tomasz Zastawniak

Zdzis law Brzeźniak and Tomasz Zastawniak Basic Stochastic Processes by Zdzis law Brzeźniak and Tomasz Zastawniak Springer-Verlag, London 1999 Corrections in the 2nd printing Version: 21 May 2005 Page and line numbers refer to the 2nd printing

More information

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true

for all subintervals I J. If the same is true for the dyadic subintervals I D J only, we will write ϕ BMO d (J). In fact, the following is true 3 ohn Nirenberg inequality, Part I A function ϕ L () belongs to the space BMO() if sup ϕ(s) ϕ I I I < for all subintervals I If the same is true for the dyadic subintervals I D only, we will write ϕ BMO

More information

WHY SATURATED PROBABILITY SPACES ARE NECESSARY

WHY SATURATED PROBABILITY SPACES ARE NECESSARY WHY SATURATED PROBABILITY SPACES ARE NECESSARY H. JEROME KEISLER AND YENENG SUN Abstract. An atomless probability space (Ω, A, P ) is said to have the saturation property for a probability measure µ on

More information

Notes on Measure, Probability and Stochastic Processes. João Lopes Dias

Notes on Measure, Probability and Stochastic Processes. João Lopes Dias Notes on Measure, Probability and Stochastic Processes João Lopes Dias Departamento de Matemática, ISEG, Universidade de Lisboa, Rua do Quelhas 6, 1200-781 Lisboa, Portugal E-mail address: jldias@iseg.ulisboa.pt

More information

AN EFFECTIVE METRIC ON C(H, K) WITH NORMAL STRUCTURE. Mona Nabiei (Received 23 June, 2015)

AN EFFECTIVE METRIC ON C(H, K) WITH NORMAL STRUCTURE. Mona Nabiei (Received 23 June, 2015) NEW ZEALAND JOURNAL OF MATHEMATICS Volume 46 (2016), 53-64 AN EFFECTIVE METRIC ON C(H, K) WITH NORMAL STRUCTURE Mona Nabiei (Received 23 June, 2015) Abstract. This study first defines a new metric with

More information

FENCHEL DUALITY, FITZPATRICK FUNCTIONS AND MAXIMAL MONOTONICITY S. SIMONS AND C. ZĂLINESCU

FENCHEL DUALITY, FITZPATRICK FUNCTIONS AND MAXIMAL MONOTONICITY S. SIMONS AND C. ZĂLINESCU FENCHEL DUALITY, FITZPATRICK FUNCTIONS AND MAXIMAL MONOTONICITY S. SIMONS AND C. ZĂLINESCU This paper is dedicated to Simon Fitzpatrick, in recognition of his amazing insights ABSTRACT. We show in this

More information

Lecture notes for Analysis of Algorithms : Markov decision processes

Lecture notes for Analysis of Algorithms : Markov decision processes Lecture notes for Analysis of Algorithms : Markov decision processes Lecturer: Thomas Dueholm Hansen June 6, 013 Abstract We give an introduction to infinite-horizon Markov decision processes (MDPs) with

More information

Proofs for Large Sample Properties of Generalized Method of Moments Estimators

Proofs for Large Sample Properties of Generalized Method of Moments Estimators Proofs for Large Sample Properties of Generalized Method of Moments Estimators Lars Peter Hansen University of Chicago March 8, 2012 1 Introduction Econometrica did not publish many of the proofs in my

More information

HOPF S DECOMPOSITION AND RECURRENT SEMIGROUPS. Josef Teichmann

HOPF S DECOMPOSITION AND RECURRENT SEMIGROUPS. Josef Teichmann HOPF S DECOMPOSITION AND RECURRENT SEMIGROUPS Josef Teichmann Abstract. Some results of ergodic theory are generalized in the setting of Banach lattices, namely Hopf s maximal ergodic inequality and the

More information

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction

ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE. Sangho Kum and Gue Myung Lee. 1. Introduction J. Korean Math. Soc. 38 (2001), No. 3, pp. 683 695 ON GAP FUNCTIONS OF VARIATIONAL INEQUALITY IN A BANACH SPACE Sangho Kum and Gue Myung Lee Abstract. In this paper we are concerned with theoretical properties

More information

REAL VARIABLES: PROBLEM SET 1. = x limsup E k

REAL VARIABLES: PROBLEM SET 1. = x limsup E k REAL VARIABLES: PROBLEM SET 1 BEN ELDER 1. Problem 1.1a First let s prove that limsup E k consists of those points which belong to infinitely many E k. From equation 1.1: limsup E k = E k For limsup E

More information

Could Nash equilibria exist if the payoff functions are not quasi-concave?

Could Nash equilibria exist if the payoff functions are not quasi-concave? Could Nash equilibria exist if the payoff functions are not quasi-concave? (Very preliminary version) Bich philippe Abstract In a recent but well known paper (see [11]), Reny has proved the existence of

More information