Zero-Sum Average Semi-Markov Games: Fixed Point Solutions of the Shapley Equation
Oscar Vega-Amaya
Departamento de Matemáticas, Universidad de Sonora
May 2002

Abstract. This paper deals with zero-sum average semi-Markov games with Borel state and action spaces and with unbounded payoffs and mean holding times. A solution of the Shapley equation is obtained via the Banach Fixed Point Theorem, assuming that the model satisfies a Lyapunov-like condition and a growth hypothesis on the payoff function and the mean holding time, besides standard continuity and compactness requirements.

Key words. Zero-sum semi-Markov games, average payoff criterion, Lyapunov conditions, fixed-point approach.

AMS subject classification. 90D10, 90D20, 93E05.

1 Introduction

Several recent papers have used variants of a Lyapunov-like condition to solve average payoff optimization problems for Markovian systems with unbounded payoffs and Borel state and action spaces (see, e.g., [9], [13], [14] for Markov models; [15], [20], [28] for semi-Markov models; [11], [16], [23] for zero-sum Markov games; and [17] for zero-sum semi-Markov games). The key property used in all these papers is that the imposed Lyapunov condition yields the so-called weighted geometric ergodicity (WGE) property, a generalization of standard uniform geometric ergodicity in Markov chain theory (see [10], [12] and [21] for a detailed discussion of these concepts). Roughly speaking, in these papers the WGE property is combined, explicitly or implicitly, either with the vanishing discount factor approach or with some variant of the policy iteration algorithm to prove the main results. These facts mark the first main difference with the present paper: in spite of imposing a similar stability condition, we use instead a fixed-point approach which does not rely, at least explicitly, on the WGE property.

This research was supported by CONACyT (México) under Grant E.
The fixed-point approach allows us to obtain the Shapley equation directly, which in turn yields the existence of a stationary optimal strategy pair or saddle point (see Theorem 4.7(a) and (b)). In contrast, the approaches followed in [11], [16], [23] first show the existence of a stationary saddle point and then establish the Shapley equation. On the other hand, [20], [15] and [17] resort to auxiliary models related to the original one; more precisely, [20] uses the so-called Schweitzer data transformation [26], while the analysis in [15] and [17] relies on certain perturbed models.

A second key difference concerns the times between two consecutive decision epochs. In contrast with discrete-time Markov control processes and Markov games, the decision epochs in semi-Markov control processes are random; thus it is necessary to ensure that such processes experience only finitely many transitions in each finite time period. This is usually done by assuming that the mean holding time function is bounded below by a positive constant, even in the discrete state space case (see, e.g., [2], [5], [19], [24] and their references). In particular, this condition plays a crucial role in the approaches followed in [28], [15], [17] and [20]; in fact, the three latter references also assume that the mean holding time function is bounded above by a constant, while in the present paper it is only assumed that this function is positive.

It is important to mention that, as a by-product, the fixed-point approach yields a minimax characterization of a certain solution of the Shapley equation (Theorem 4.7(c)) which, seemingly, has not been previously discussed in the literature on zero-sum stochastic games. We should also mention that the fixed-point approach has been used in several early papers (see, e.g., [7], [12], [18], [25]), but under much stronger ergodicity conditions which, in particular, exclude the case of unbounded payoffs.
The variant of the Lyapunov condition we consider here was recently introduced in [27] for Markov control processes and used in [8] to study minimax problems. In fact, the present paper extends to zero-sum semi-Markov games the results of the two latter references. For brief surveys of the existing literature on stochastic games with finite or denumerable state space the reader can consult [1], [3], [6], [7] and [19].

The remainder of the paper is organized as follows. The semi-Markov game model and the (ratio) expected average payoff criterion are introduced in Sections 2 and 3, respectively. The assumptions and main results are stated in Section 4. The proofs of all results are given in Sections 5 and 6.

2 The Game Model

Throughout the paper we shall use the following notation. Given a Borel space $S$ (that is, a Borel subset of a complete separable metric space), $\mathcal{B}(S)$ denotes its Borel $\sigma$-algebra, and measurability always means measurability with respect to $\mathcal{B}(S)$. The class of all probability measures on $S$ is denoted by $\mathbb{P}(S)$. Given two Borel spaces $S$ and $S'$, a stochastic kernel $\varphi(\cdot|\cdot)$ on $S$ given $S'$ is a function such that $\varphi(\cdot|s)$ is in $\mathbb{P}(S)$ for each $s \in S'$, and $\varphi(B|\cdot)$ is a measurable
function on $S'$ for each $B \in \mathcal{B}(S)$. Moreover, $\mathbb{R}_+$ stands for the set of nonnegative real numbers and $\mathbb{N}$ (resp. $\mathbb{N}_0$) denotes the set of positive (resp. nonnegative) integers.

The semi-Markov game model. This paper is concerned with a zero-sum semi-Markov game modeled by

$$(X, A, B, K_A, K_B, Q, F, r),$$

where $X$ is the state space and the sets $A$ and $B$ are the control spaces for players 1 and 2, respectively. It is assumed that all these sets are Borel spaces. The constraint sets $K_A$ and $K_B$ are Borel subsets of $X \times A$ and $X \times B$, respectively. Thus, for each $x \in X$, the $x$-sections

$$A(x) := \{a \in A : (x,a) \in K_A\}, \qquad B(x) := \{b \in B : (x,b) \in K_B\}$$

stand for the sets of admissible actions or controls for players 1 and 2, respectively. Now let

$$K := \{(x,a,b) : x \in X,\ a \in A(x),\ b \in B(x)\},$$

which, by [22], is a Borel subset of $X \times A \times B$. The transition law $Q(\cdot|\cdot)$ of the system is a stochastic kernel on $X$ given $K$. For each $(x,a,b,y) \in K \times X$, $F(\cdot|x,a,b,y)$ is a distribution function on $\mathbb{R}_+ := [0,+\infty)$, and $F(t|\cdot)$ is a measurable function on $K \times X$ for each $t \in \mathbb{R}_+$. Finally, the payoff $r$ is a measurable function on $K \times \mathbb{R}_+$.

The game is played over an infinite horizon as follows: at time $t = 0$ the game is observed in some state $x_0 = x$ and the players independently choose controls $a_0 = a \in A(x_0)$ and $b_0 = b \in B(x_0)$. Then the system remains in state $x_0 = x$ for a nonnegative random time $\delta_1$ and player 1 receives the amount $r(x,a,b,\delta_1)$ from player 2. At time $\delta_1$ the system jumps to a new state $x_1 = x'$ according to the probability measure $Q(\cdot|x,a,b)$. The distribution of the random variable $\delta_1$, given that the system has jumped into state $x'$, is $F(\cdot|x,a,b,x')$; that is,

$$F(t|x,a,b,x') = \Pr\left[\delta_1 \le t \mid x_0 = x,\ a_0 = a,\ b_0 = b,\ x_1 = x'\right], \qquad t \in \mathbb{R}_+.$$

Thus, given that $x_0 = x$, $a_0 = a$ and $b_0 = b$, the distribution of $\delta_1$ is

$$G(t|x,a,b) := \int_X F(t|x,a,b,y)\,Q(dy|x,a,b), \qquad t \in \mathbb{R}_+,\ (x,a,b) \in K,$$

and it is called the holding time distribution.
Immediately after the transition occurs, the players again choose controls, say $a_1 = a' \in A(x')$ and $b_1 = b' \in B(x')$, and the above process is repeated over and over again.
This procedure yields a stochastic process $\{(x_n, a_n, b_n, \delta_{n+1})\}$ where, for each $n \in \mathbb{N}_0$, $x_n$ is the state of the system, $a_n$ and $b_n$ are the control variables for players 1 and 2, respectively, and $\delta_{n+1}$ is the holding time at state $x_n$. The goal of player 1 (resp. player 2) is to maximize (resp. minimize) his/her flow of rewards (resp. costs)

$$r(x_0,a_0,b_0,\delta_1),\ r(x_1,a_1,b_1,\delta_2),\ \dots$$

over an infinite horizon using the expected average reward (cost) criterion defined by (5) below. The functions on $K$ given by

$$\tau(x,a,b) := \int_0^{+\infty} t\,G(dt|x,a,b) \tag{1}$$

$$R(x,a,b) := \int_0^{+\infty} r(x,a,b,t)\,G(dt|x,a,b) \tag{2}$$

are called the mean holding time and the mean payoff, respectively.

Strategies. Let $H_0 := X$ and $H_n := K \times \mathbb{R}_+ \times H_{n-1}$ for $n \in \mathbb{N}$. Then, for each $n \in \mathbb{N}_0$, a generic element of $H_n$ is denoted by

$$h_n := (x_0, a_0, b_0, \delta_1, \dots, x_{n-1}, a_{n-1}, b_{n-1}, \delta_n, x_n),$$

which can be thought of as the history of the game up to the time of the $n$th transition

$$T_n := T_{n-1} + \delta_n, \qquad n \in \mathbb{N}, \tag{3}$$

where $T_0 := 0$. Thus a strategy for player 1 is a sequence $\pi^1 = \{\pi^1_n\}$ of stochastic kernels $\pi^1_n$ on $A$ given $H_n$ satisfying the constraint

$$\pi^1_n(A(x_n)|h_n) = 1 \qquad \forall\, h_n \in H_n,\ n \in \mathbb{N}_0.$$

The class of all strategies for player 1 is denoted by $\Pi^1$. For each $x \in X$, let $\mathbb{A}(x) := \mathbb{P}(A(x))$ and denote by $\Phi^1$ the class of all stochastic kernels $\varphi^1$ on $A$ given $X$ such that $\varphi^1(\cdot|x) \in \mathbb{A}(x)$ for all $x \in X$. A strategy $\pi^1$ is called stationary if

$$\pi^1_n(\cdot|h_n) = \varphi^1(\cdot|x_n) \qquad \forall\, h_n \in H_n,\ n \in \mathbb{N}_0,$$

for some stochastic kernel $\varphi^1 \in \Phi^1$. Following a standard convention, $\Phi^1$ is identified with the class of stationary strategies for player 1. The sets of strategies $\Pi^2$ and $\Phi^2$ for player 2 are defined in a similar way, writing $B(x)$ and $\mathbb{B}(x)$ instead of $A(x)$ and $\mathbb{A}(x)$, respectively.
Let $(\Omega, \mathcal{F})$ be the (canonical) measurable space consisting of the sample space $\Omega := (K \times \mathbb{R}_+)^{\infty}$ and its product $\sigma$-algebra. Then, for each strategy pair $(\pi^1,\pi^2) \in \Pi^1 \times \Pi^2$ and each initial state $x \in X$, there exists a probability measure $P^{\pi^1,\pi^2}_x$ defined on $(\Omega,\mathcal{F})$ which governs the evolution of the stochastic process $\{(x_n,a_n,b_n,\delta_{n+1})\}$. The expectation operator with respect to the probability measure $P^{\pi^1,\pi^2}_x$ is denoted by $E^{\pi^1,\pi^2}_x$.

Throughout the paper we shall use the following notation: for a measurable function $u$ on $K$ and a stationary strategy pair $(\varphi^1,\varphi^2) \in \Phi^1 \times \Phi^2$, let

$$u_{\varphi^1,\varphi^2}(x) := \int_{B(x)} \int_{A(x)} u(x,a,b)\,\varphi^1(da|x)\,\varphi^2(db|x), \qquad x \in X. \tag{4}$$

Thus, in particular, we shall write

$$R_{\varphi^1,\varphi^2}(x) := \int_{B(x)} \int_{A(x)} R(x,a,b)\,\varphi^1(da|x)\,\varphi^2(db|x),$$

$$\tau_{\varphi^1,\varphi^2}(x) := \int_{B(x)} \int_{A(x)} \tau(x,a,b)\,\varphi^1(da|x)\,\varphi^2(db|x),$$

and, similarly,

$$Q_{\varphi^1,\varphi^2}(\cdot|x) := \int_{B(x)} \int_{A(x)} Q(\cdot|x,a,b)\,\varphi^1(da|x)\,\varphi^2(db|x)$$

for all $x \in X$.

If the players use a stationary strategy pair, say $(\varphi^1,\varphi^2)$, then the state process $\{x_n\}$ is a Markov chain with transition probability $Q_{\varphi^1,\varphi^2}(\cdot|\cdot)$. In this case, the $n$-step transition probability is denoted by $Q^n_{\varphi^1,\varphi^2}(\cdot|\cdot)$ for each $n \in \mathbb{N}_0$, where $Q^0_{\varphi^1,\varphi^2}(\cdot|x)$ is the Dirac measure at $x$. Thus, for each $u \in B_W(X)$,

$$Q^n_{\varphi^1,\varphi^2}u(x) := \int_X u(y)\,Q^n_{\varphi^1,\varphi^2}(dy|x) = E^{\varphi^1,\varphi^2}_x u(x_n), \qquad x \in X,\ n \in \mathbb{N}_0.$$

3 The expected average payoff criterion

The (ratio) expected average payoff (EAP) for the strategy pair $(\pi^1,\pi^2) \in \Pi^1 \times \Pi^2$, given the initial state $x_0 = x$, is defined as

$$J(\pi^1,\pi^2,x) := \liminf_{n\to\infty} \frac{E^{\pi^1,\pi^2}_x \sum_{k=0}^{n-1} r(x_k,a_k,b_k,\delta_{k+1})}{E^{\pi^1,\pi^2}_x T_n}. \tag{5}$$

It is easy to verify, using properties of conditional expectation, that
$$E^{\pi^1,\pi^2}_x \delta_{k+1} = E^{\pi^1,\pi^2}_x \tau(x_k,a_k,b_k)$$

and also that

$$E^{\pi^1,\pi^2}_x r(x_k,a_k,b_k,\delta_{k+1}) = E^{\pi^1,\pi^2}_x R(x_k,a_k,b_k)$$

for all $x \in X$, $(\pi^1,\pi^2) \in \Pi^1\times\Pi^2$, $k \in \mathbb{N}_0$; both identities follow from the tower property of conditional expectation, conditioning on $(x_k,a_k,b_k)$ and using (1)-(2). Thus, (5) can be rewritten as

$$J(\pi^1,\pi^2,x) = \liminf_{n\to\infty} \frac{E^{\pi^1,\pi^2}_x \sum_{k=0}^{n-1} R(x_k,a_k,b_k)}{E^{\pi^1,\pi^2}_x \sum_{k=0}^{n-1} \tau(x_k,a_k,b_k)}. \tag{6}$$

Now consider the following functions on $X$ defined as

$$L(x) := \sup_{\pi^1\in\Pi^1}\ \inf_{\pi^2\in\Pi^2} J(\pi^1,\pi^2,x) \quad\text{and}\quad U(x) := \inf_{\pi^2\in\Pi^2}\ \sup_{\pi^1\in\Pi^1} J(\pi^1,\pi^2,x), \tag{7}$$

which are called the lower value and the upper value of the game, respectively, for the ratio EAP criterion. In general, $L(\cdot) \le U(\cdot)$; if $L(\cdot) = U(\cdot)$, the common function is called the value of the game and is denoted by $V(\cdot)$.

If the game has a value $V(\cdot)$, a strategy $\pi^1_* \in \Pi^1$ is said to be expected average payoff (EAP-) optimal for player 1 if

$$\inf_{\pi^2\in\Pi^2} J(\pi^1_*,\pi^2,x) = V(x) \qquad \forall\, x \in X.$$

Similarly, $\pi^2_* \in \Pi^2$ is said to be EAP-optimal for player 2 if

$$\sup_{\pi^1\in\Pi^1} J(\pi^1,\pi^2_*,x) = V(x) \qquad \forall\, x \in X.$$

If $\pi^i_*$ is EAP-optimal for player $i$ ($i = 1,2$), then $(\pi^1_*,\pi^2_*)$ is called an EAP-optimal pair or saddle point. Note that $(\pi^1_*,\pi^2_*)$ is EAP-optimal if and only if

$$J(\pi^1,\pi^2_*,x) \le J(\pi^1_*,\pi^2_*,x) \le J(\pi^1_*,\pi^2,x) \qquad \forall\, x \in X,\ (\pi^1,\pi^2) \in \Pi^1\times\Pi^2.$$

4 Assumptions and main results

The first condition imposed on the model, Assumption 4.1 below, ensures that the system is regular, which means that it experiences finitely many jumps or transitions over each finite period of time. Usually, the regularity property is obtained by assuming that the mean holding time $\tau$ is bounded below by a positive constant (see, e.g., [2], [5], [15], [17], [18], [19], [20], [24], [26], [28] and their references). In the present paper it is only assumed that the mean holding time is a positive function.
Assumption 4.1 (Regularity condition). $\tau(x,a,b) > 0$ for all $(x,a,b) \in K$.

The second hypothesis imposes a growth condition on both the mean holding time and the mean payoff.

Assumption 4.2. There exists a measurable function $W(\cdot)$ on $X$, bounded below by a constant $\theta > 0$, such that

$$\max\{\tau(x,a,b),\ |R(x,a,b)|\} \le K W(x) \qquad \forall\,(x,a,b) \in K,$$

for a fixed positive constant $K$.

To state the third set of hypotheses, as well as several of its consequences, some notation is required. For a measurable function $u(\cdot)$ on $X$, define the weighted norm with respect to $W$ ($W$-norm, for short) as

$$\|u\|_W := \sup_{x\in X} \frac{|u(x)|}{W(x)},$$

and denote by $B_W(X)$ the Banach space of all measurable functions with finite $W$-norm. Moreover, for a measure $\gamma(\cdot)$ on $X$, let

$$\gamma(u) := \int_X u(x)\,\gamma(dx),$$

whenever the integral is well defined.

Assumption 4.3 (Lyapunov condition). There exist a non-trivial measure $\nu(\cdot)$ on $X$, a nonnegative measurable function $S(\cdot)$ on $K$ and a positive constant $\lambda < 1$ such that:

(a) $\nu(W) < \infty$;

(b) $Q(B|x,a,b) \ge \nu(B)S(x,a,b)$ for all $B \in \mathcal{B}(X)$, $(x,a,b) \in K$;

(c) $\int_X W(y)\,Q(dy|x,a,b) \le \lambda W(x) + S(x,a,b)\nu(W)$ for all $(x,a,b) \in K$;

(d) $\nu(S_{\varphi^1,\varphi^2}) > 0$ for all $(\varphi^1,\varphi^2) \in \Phi^1\times\Phi^2$.

As we mentioned in the Introduction, Assumption 4.3 allows us to use a fixed-point approach. More precisely, we consider the kernel

$$\widetilde{Q}(\cdot|x,a,b) := Q(\cdot|x,a,b) - \nu(\cdot)S(x,a,b), \qquad (x,a,b) \in K, \tag{8}$$

which, under Assumption 4.3, is nonnegative. The point here is that Assumption 4.3(c) can be expressed equivalently as

$$\int_X W(y)\,\widetilde{Q}(dy|x,a,b) \le \lambda W(x) \qquad \forall\,(x,a,b) \in K, \tag{9}$$

which, roughly speaking, means that $\widetilde{Q}(\cdot|\cdot)$ satisfies a certain contraction property. This contraction property is precisely what we shall exploit to prove our main results (Theorems 4.5 and 4.7 below).
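Indeed, the equivalence of Assumption 4.3(c) and (9) is immediate from the definition (8); in sketch,

$$\int_X W(y)\,\widetilde{Q}(dy|x,a,b) = \int_X W(y)\,Q(dy|x,a,b) - \nu(W)S(x,a,b) \le \lambda W(x),$$

where the inequality is a rearrangement of 4.3(c), while the nonnegativity of $\widetilde{Q}$ is just 4.3(b).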
Assumption 4.3 was first used in [27], though it is actually a simplified version of the Lyapunov condition introduced in [9]. Specifically, besides the conditions in Assumption 4.3, [9] assumes the existence of a common irreducibility measure for the transition laws induced by the stationary strategies, and also that the inequality in Assumption 4.3(d) holds uniformly, that is,

$$\inf_{\varphi^1,\varphi^2} \nu(S_{\varphi^1,\varphi^2}) > 0.$$

However, as shown in [27, Thm. 3.3] (see Proposition 4.4 below), the latter condition is not required, while the irreducibility condition is redundant. On the other hand, several other papers have used Lyapunov conditions similar to Assumption 4.3 (see, e.g., [13], [14], [15], [16], [17], [23]) but with some important differences, which seemingly preclude the fixed-point approach. For instance, the latter four papers suppose, instead of the conditions in Assumption 4.3, that

$$\int_X W(y)\,Q(dy|x,a,b) \le \lambda W(x) + b\,I_C(x) \qquad \forall\,(x,a,b) \in K,$$

where $C$ is a Borel subset of $X$, $b$ is a positive constant, $\lambda \in (0,1)$ and $W(\cdot)$ is bounded on $C$, and also that

$$Q_{\varphi^1,\varphi^2}(B|x) \ge \delta\, I_C(x)\,\nu_{\varphi^1,\varphi^2}(B)$$

for all $x \in X$, $B \in \mathcal{B}(X)$, $(\varphi^1,\varphi^2) \in \Phi^1\times\Phi^2$, where each $\nu_{\varphi^1,\varphi^2}(\cdot)$ is a probability measure concentrated on $C$ and $\delta$ is a positive constant. A quick glance at the latter conditions shows that they do not lead to a contraction property as in (9), so the fixed-point approach is not applicable, at least in the way we use it here.

Finally, it is convenient to point out again that, in spite of imposing conditions similar to Assumption 4.3, the approaches followed in all the papers cited so far rely on the WGE property mentioned in the Introduction, with the only exceptions of [27] and [8]. The next proposition states some important consequences of Assumptions 4.2 and 4.3, which are proved in [27] using fixed-point arguments as well.

Proposition 4.4. Suppose that Assumption 4.3 holds.
Then, for each stationary strategy pair $(\varphi^1,\varphi^2) \in \Phi^1\times\Phi^2$, the following hold:

(a) The transition law $Q_{\varphi^1,\varphi^2}(\cdot|\cdot)$ is positive Harris recurrent. Thus, in particular, there exists a unique invariant probability measure $\mu_{\varphi^1,\varphi^2}(\cdot)$, that is,

$$\mu_{\varphi^1,\varphi^2}(\cdot) = \int_X Q_{\varphi^1,\varphi^2}(\cdot|x)\,\mu_{\varphi^1,\varphi^2}(dx).$$

Moreover, $\nu$ is an irreducibility measure for $Q_{\varphi^1,\varphi^2}(\cdot|\cdot)$.

(b) $\mu_{\varphi^1,\varphi^2}(W)$ is finite; in fact, the bounds

$$\theta \le \mu_{\varphi^1,\varphi^2}(W) \le \frac{\nu(W)}{(1-\lambda)\nu(X)} \tag{10}$$

hold.
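For orientation, the upper bound in (10) can be checked formally from Assumption 4.3 (a sketch, assuming $\mu := \mu_{\varphi^1,\varphi^2}(W) < \infty$ so that terms may be rearranged): integrating 4.3(c) against the invariant measure $\mu_{\varphi^1,\varphi^2}$ and using $1 = Q(X|x,a,b) \ge \nu(X)S(x,a,b)$ (from 4.3(b)) gives

$$\mu_{\varphi^1,\varphi^2}(W) \le \lambda\,\mu_{\varphi^1,\varphi^2}(W) + \nu(W)\,\mu_{\varphi^1,\varphi^2}(S_{\varphi^1,\varphi^2}) \le \lambda\,\mu_{\varphi^1,\varphi^2}(W) + \frac{\nu(W)}{\nu(X)},$$

so $\mu_{\varphi^1,\varphi^2}(W) \le \nu(W)/[(1-\lambda)\nu(X)]$; the lower bound holds simply because $W \ge \theta$.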
Next observe that, under Assumptions 4.1, 4.2 and 4.3, by Proposition 4.4 the constants

$$\rho(\varphi^1,\varphi^2) := \frac{\mu_{\varphi^1,\varphi^2}(R_{\varphi^1,\varphi^2})}{\mu_{\varphi^1,\varphi^2}(\tau_{\varphi^1,\varphi^2})}, \qquad (\varphi^1,\varphi^2) \in \Phi^1\times\Phi^2, \tag{11}$$

are finite. Then, for each $(\varphi^1,\varphi^2) \in \Phi^1\times\Phi^2$, define on $B_W(X)$ the operator

$$L_{\varphi^1,\varphi^2}u(x) := \overline{R}_{\varphi^1,\varphi^2}(x) + \int_X u(y)\,Q_{\varphi^1,\varphi^2}(dy|x), \qquad x \in X, \tag{12}$$

where

$$\overline{R}_{\varphi^1,\varphi^2}(\cdot) := R_{\varphi^1,\varphi^2}(\cdot) - \rho(\varphi^1,\varphi^2)\,\tau_{\varphi^1,\varphi^2}(\cdot). \tag{13}$$

Theorem 4.5. Suppose that Assumptions 4.1, 4.2 and 4.3 hold. Then for each stationary strategy pair $(\varphi^1,\varphi^2) \in \Phi^1\times\Phi^2$:

(a) There exists a unique function $h_{\varphi^1,\varphi^2} \in B_W(X)$, with $\nu(h_{\varphi^1,\varphi^2}) = 0$, that satisfies the (semi-Markov) Poisson equation

$$h_{\varphi^1,\varphi^2}(x) = L_{\varphi^1,\varphi^2}h_{\varphi^1,\varphi^2}(x) = \overline{R}_{\varphi^1,\varphi^2}(x) + \int_X h_{\varphi^1,\varphi^2}(y)\,Q_{\varphi^1,\varphi^2}(dy|x), \qquad x \in X;$$

(b) Moreover, $J(\varphi^1,\varphi^2,\cdot) = \rho(\varphi^1,\varphi^2)$.

Now we impose some compactness/continuity conditions on the model to ensure the existence of measurable minimizers/maximizers; notice that this can be done in several settings (see, e.g., [10, Thm. 3.5, p. 28] or [8, Lemma 3.5]). Here, for simplicity, we consider the following one.

Assumption 4.6 (Compactness/continuity conditions). For each $(x,a,b) \in K$:

(a) $A(x)$ and $B(x)$ are non-empty compact sets;

(b) $R(x,\cdot,b)$ is upper semicontinuous on $A(x)$, and $R(x,a,\cdot)$ is lower semicontinuous on $B(x)$;

(c) $\tau(x,\cdot,b)$ and $\tau(x,a,\cdot)$ are continuous on $A(x)$ and $B(x)$, respectively;

(d) $S(x,\cdot,b)$ and $S(x,a,\cdot)$ are continuous on $A(x)$ and $B(x)$, respectively;

(e) For each bounded measurable function $v$ on $X$, the functions $\int_X v(y)\,Q(dy|x,\cdot,b)$ and $\int_X v(y)\,Q(dy|x,a,\cdot)$ are continuous on $A(x)$ and $B(x)$, respectively;

(f) The functions
10 W (y)q(dy x,, b) and are continuous on A(x) and B(x), respectively. W (y)q(dy x, a, ) Theorem 4.7. Suppose that Assumptions 4.1, 4.2, 4.3 and 4.6 hold. Then: (a) There exists a unique function h B W () with ν(h ) = 0, a stationary strategy pair (ϕ 1, ϕ 2 ) Φ 1 Φ 2 and a constant ρ which satisfy the Shapley equation } h (x) = min {R ϕ 1,ϕ 2(x) ϕ 2 Φ ρ τ ϕ 1,ϕ 2(x) + h (y)q ϕ 1 2,ϕ 2(dy x) x, } = max {R ϕ 1,ϕ 2 (x) ϕ 1 Φ ρ τ ϕ 1,ϕ 2 (x) + h (y)q ϕ 1,ϕ 2(dy x) 1 = R ϕ 1,ϕ 2 (x) ρ τ ϕ 1,ϕ 2 (x) + h (y)q ϕ 1,ϕ 2 (dy x). (b) The constant ρ is the value of the game and (ϕ 1, ϕ 2 ) is an EAP-optimal stationary strategy pair. That is, J(ϕ 1, ϕ2, ) = ρ and Hence, by Theorem 4.5, J(π 1, ϕ 2, ) ρ J(ϕ 1, π2, ) (π 1, π 2 ) Π 1 Π 2. (c) Moreover, h ( ) = h ϕ 1,ϕ 2 ( ). ρ = ρ(ϕ 1, ϕ 2 ) = max min ϕ 2 Φ ρ(ϕ1, ϕ 2 ) = min 2 ϕ 1 Φ 1 ϕ 2 Φ 2 max ϕ 1 Φ ρ(ϕ1, ϕ 2 ), (14) 1 h ( ) = h ϕ 1,ϕ 2 ( ) = min h ( ) = max h ( ), (15) ϕ 2 F 2 ϕ 1,ϕ2 ϕ 1 Φ 1 ϕ 1,ϕ 2 where F i stands for the class of all stationary EAP-optimal strategies for player i (i = 1, 2). It is worth mentioning that, to the best of our knowledge, the minimax characterization of the solution h ( ) of the Shapley equation given in (15) has been discussed in any of the previous paper dealing with zero-sum stochastic games, even for the case of discrete state space. 10
5 Proof of Theorem 4.5

The proofs of the results in Section 4 require several preliminary results. The first ones are collected in the next lemma, which we state without proof because they follow directly from Assumptions 4.1, 4.2 and 4.3.

Lemma 5.1. Suppose that Assumption 4.3 holds. Then:

(a) For each function $u$ in $B_W(X)$,

$$\lim_{n\to\infty} \frac{1}{n}\,E^{\pi^1,\pi^2}_x u(x_n) = 0 \qquad \forall\, x \in X,\ (\pi^1,\pi^2) \in \Pi^1\times\Pi^2;$$

(b) For each stationary strategy pair $(\varphi^1,\varphi^2) \in \Phi^1\times\Phi^2$, it holds that

$$\mu_{\varphi^1,\varphi^2}(S_{\varphi^1,\varphi^2}) \ge \frac{(1-\lambda)\theta}{\nu(W)} > 0;$$

(c) If in addition Assumptions 4.1 and 4.2 hold, then

$$\frac{\mu_{\varphi^1,\varphi^2}(S_{\varphi^1,\varphi^2})}{\mu_{\varphi^1,\varphi^2}(\tau_{\varphi^1,\varphi^2})} \ge \frac{1-\lambda}{K\nu(W)} > 0.$$

The following lemma concerns the existence of solutions to the Poisson equation; in addition to being interesting in itself, it plays a key role in our development. In fact, its proof exhibits the way we take advantage of the contraction property (9).

Lemma 5.2. Suppose that Assumptions 4.2 and 4.3 hold and let $(\varphi^1,\varphi^2) \in \Phi^1\times\Phi^2$ be fixed but arbitrary. Then, for each function $v$ in $B_W(X)$ there exists a unique function $h_v$ in $B_W(X)$, with $\nu(h_v) = 0$, which satisfies the Poisson equation

$$h_v(x) = v(x) - \mu_{\varphi^1,\varphi^2}(v) + \int_X h_v(y)\,Q_{\varphi^1,\varphi^2}(dy|x), \qquad x \in X. \tag{16}$$

Thus, from Lemma 5.1(a),

$$\mu_{\varphi^1,\varphi^2}(v) = \lim_{n\to\infty} \frac{1}{n}\,E^{\varphi^1,\varphi^2}_x \sum_{k=0}^{n-1} v(x_k) \qquad \forall\, x \in X. \tag{17}$$

Proof of Lemma 5.2. Fix a function $v \in B_W(X)$, and write $\mu(\cdot) := \mu_{\varphi^1,\varphi^2}(\cdot)$, $S(\cdot) := S_{\varphi^1,\varphi^2}(\cdot)$, $Q(\cdot|\cdot) := Q_{\varphi^1,\varphi^2}(\cdot|\cdot)$ and $\widetilde{Q}(\cdot|\cdot) := Q(\cdot|\cdot) - \nu(\cdot)S(\cdot)$. Next, define

$$Tu(x) := v(x) - \mu(v) + \int_X u(y)\,\widetilde{Q}(dy|x), \qquad x \in X,\ u \in B_W(X).$$

By Assumption 4.3(c), it is clear that $T$ maps $B_W(X)$ into itself. Moreover, for any functions $u, w \in B_W(X)$, it holds that
$$|Tu(x) - Tw(x)| \le \int_X |u(y)-w(y)|\,\widetilde{Q}(dy|x) \le \|u-w\|_W \int_X W(y)\,\widetilde{Q}(dy|x) \le \|u-w\|_W\,\lambda W(x)$$

for all $x \in X$. Hence,

$$\|Tu - Tw\|_W \le \lambda\,\|u-w\|_W.$$

That is, $T$ is a contraction operator from $B_W(X)$ into itself with modulus $\lambda$. Then, by the Banach Fixed Point Theorem, there exists a unique function $h_v \in B_W(X)$ that satisfies the equation

$$h_v(x) = v(x) - \mu(v) + \int_X h_v(y)\,\widetilde{Q}(dy|x) = v(x) - \mu(v) + \int_X h_v(y)\,Q(dy|x) - \nu(h_v)S(x), \qquad x \in X.$$

Now, integrating both sides of the last equation with respect to the invariant probability measure $\mu(\cdot)$ yields $\nu(h_v)\mu(S) = 0$, which, by Lemma 5.1(b), implies that $\nu(h_v) = 0$. Therefore, $h_v$ satisfies the Poisson equation

$$h_v(x) = v(x) - \mu(v) + \int_X h_v(y)\,Q(dy|x), \qquad x \in X,$$

which proves (16). Finally, property (17) is obtained by iterating the Poisson equation and using Lemma 5.1(a).

Now we proceed to prove Theorem 4.5.

Proof of Theorem 4.5. Let $(\varphi^1,\varphi^2) \in \Phi^1\times\Phi^2$ be fixed but arbitrary. Since the function

$$v(\cdot) := \overline{R}_{\varphi^1,\varphi^2}(\cdot) = R_{\varphi^1,\varphi^2}(\cdot) - \rho(\varphi^1,\varphi^2)\,\tau_{\varphi^1,\varphi^2}(\cdot)$$

is in $B_W(X)$, by Lemma 5.2 there exists a unique function $h_{\varphi^1,\varphi^2} \in B_W(X)$ with $\nu(h_{\varphi^1,\varphi^2}) = 0$ that satisfies the Poisson equation

$$h_{\varphi^1,\varphi^2}(x) = \overline{R}_{\varphi^1,\varphi^2}(x) + \int_X h_{\varphi^1,\varphi^2}(y)\,Q_{\varphi^1,\varphi^2}(dy|x), \qquad x \in X.$$
This proves part (a) of the theorem. Next, to prove part (b), first note that iteration of the last equation yields

$$h_{\varphi^1,\varphi^2}(x) = E^{\varphi^1,\varphi^2}_x\Big[\sum_{k=0}^{n-1} R_{\varphi^1,\varphi^2}(x_k) - \rho(\varphi^1,\varphi^2)\sum_{k=0}^{n-1}\tau_{\varphi^1,\varphi^2}(x_k)\Big] + \int_X h_{\varphi^1,\varphi^2}(y)\,Q^n_{\varphi^1,\varphi^2}(dy|x) \tag{18}$$

for all $n \in \mathbb{N}$ and $x \in X$. Moreover, by Assumptions 4.1 and 4.2, applying Lemma 5.2 with $v(\cdot) := \tau_{\varphi^1,\varphi^2}(\cdot)$ we obtain

$$\mu_{\varphi^1,\varphi^2}(\tau_{\varphi^1,\varphi^2}) = \lim_{n\to\infty}\frac{1}{n}\,E^{\varphi^1,\varphi^2}_x\sum_{k=0}^{n-1}\tau_{\varphi^1,\varphi^2}(x_k) > 0 \qquad \forall\,x\in X,$$

which, combined with (18) and Lemma 5.1(a), implies that

$$\rho(\varphi^1,\varphi^2) = \lim_{n\to\infty}\frac{E^{\varphi^1,\varphi^2}_x\sum_{k=0}^{n-1}R_{\varphi^1,\varphi^2}(x_k)}{E^{\varphi^1,\varphi^2}_x\sum_{k=0}^{n-1}\tau_{\varphi^1,\varphi^2}(x_k)} \qquad \forall\,x\in X.$$

6 Proof of Theorem 4.7

Define the constants

$$\rho_l := \sup_{\varphi^1\in\Phi^1}\ \inf_{\varphi^2\in\Phi^2}\rho(\varphi^1,\varphi^2) \quad\text{and}\quad \rho_u := \inf_{\varphi^2\in\Phi^2}\ \sup_{\varphi^1\in\Phi^1}\rho(\varphi^1,\varphi^2).$$

We show in the next lemma that these constants are finite. Observe that this trivially holds if one assumes that the mean holding time function is bounded below by a positive constant.

Lemma 6.1. Suppose that Assumptions 4.1, 4.2, 4.3 and 4.6 hold. Then $\rho_l < \infty$ and $\rho_u < \infty$.

Proof of Lemma 6.1. Let $\varphi^1$ be a fixed but arbitrary stationary strategy for player 1 and consider the Markov (one-player) model

$$\mathcal{M} = (X, K_B, \widehat{Q}, \widehat{\tau}),$$

where $X$ and $K_B$ are as above, and the transition law and one-step cost function are defined as
$$\widehat{Q}(\cdot|x,b) := \int_{A(x)} Q(\cdot|x,a,b)\,\varphi^1(da|x), \qquad \widehat{\tau}(x,b) := \int_{A(x)} \tau(x,a,b)\,\varphi^1(da|x)$$

for all $(x,b) \in K_B$, respectively. Thus, following the notation (4), for all $x \in X$ and $\varphi^2 \in \Phi^2$ define

$$\widehat{Q}_{\varphi^2}(\cdot|x) := \int_{B(x)} \widehat{Q}(\cdot|x,b)\,\varphi^2(db|x), \qquad \widehat{\tau}_{\varphi^2}(x) := \int_{B(x)} \widehat{\tau}(x,b)\,\varphi^2(db|x).$$

Note that $\widehat{Q}_{\varphi^2}(\cdot|\cdot) = Q_{\varphi^1,\varphi^2}(\cdot|\cdot)$ and $\widehat{\tau}_{\varphi^2}(\cdot) = \tau_{\varphi^1,\varphi^2}(\cdot)$ for all $\varphi^2 \in \Phi^2$. The Markov model $\mathcal{M}$ satisfies all the conditions in [27, Thm. 3.6]; hence, in particular, there exists a stationary strategy $\varphi^2_+ \in \Phi^2$ such that

$$\mu_{\varphi^1,\varphi^2_+}(\tau_{\varphi^1,\varphi^2_+}) = \mu_{\varphi^1,\varphi^2_+}(\widehat{\tau}_{\varphi^2_+}) = \inf_{\varphi^2\in\Phi^2}\ \mu_{\varphi^1,\varphi^2}(\widehat{\tau}_{\varphi^2}).$$

Then, by Assumption 4.1, it holds that $\mu_{\varphi^1,\varphi^2_+}(\tau_{\varphi^1,\varphi^2_+}) > 0$. Next observe that

$$\rho(\varphi^1,\varphi^2) \le \frac{\mu_{\varphi^1,\varphi^2}(|R_{\varphi^1,\varphi^2}|)}{\mu_{\varphi^1,\varphi^2}(\tau_{\varphi^1,\varphi^2})} \le \frac{K\,\mu_{\varphi^1,\varphi^2}(W)}{\mu_{\varphi^1,\varphi^2_+}(\tau_{\varphi^1,\varphi^2_+})} \le \frac{Kk}{\mu_{\varphi^1,\varphi^2_+}(\tau_{\varphi^1,\varphi^2_+})},$$

where the last inequality follows from (10) with $k := \nu(W)[(1-\lambda)\nu(X)]^{-1}$. Hence,

$$\inf_{\varphi^2\in\Phi^2}\rho(\varphi^1,\varphi^2) \le \rho(\varphi^1,\varphi^2_+) \le \frac{Kk}{\mu_{\varphi^1,\varphi^2_+}(\tau_{\varphi^1,\varphi^2_+})} < +\infty \qquad \forall\,\varphi^1\in\Phi^1. \tag{19}$$

Now fix $\varphi^2 \in \Phi^2$ and proceed as above to get a stationary strategy $\varphi^1_+ \in \Phi^1$ such that

$$\mu_{\varphi^1_+,\varphi^2}(\tau_{\varphi^1_+,\varphi^2}) = \inf_{\varphi^1\in\Phi^1}\mu_{\varphi^1,\varphi^2}(\tau_{\varphi^1,\varphi^2}) > 0.$$
Then,

$$\rho(\varphi^1,\varphi^2) \le \frac{\mu_{\varphi^1,\varphi^2}(|R_{\varphi^1,\varphi^2}|)}{\mu_{\varphi^1,\varphi^2}(\tau_{\varphi^1,\varphi^2})} \le \frac{Kk}{\mu_{\varphi^1_+,\varphi^2}(\tau_{\varphi^1_+,\varphi^2})} < +\infty.$$

Hence,

$$\sup_{\varphi^1\in\Phi^1}\rho(\varphi^1,\varphi^2) \le \frac{Kk}{\mu_{\varphi^1_+,\varphi^2}(\tau_{\varphi^1_+,\varphi^2})}. \tag{20}$$

Therefore, by (19)-(20),

$$\rho_l = \sup_{\varphi^1\in\Phi^1}\ \inf_{\varphi^2\in\Phi^2}\rho(\varphi^1,\varphi^2) < +\infty \quad\text{and}\quad \rho_u = \inf_{\varphi^2\in\Phi^2}\ \sup_{\varphi^1\in\Phi^1}\rho(\varphi^1,\varphi^2) < +\infty,$$

which proves the desired result.

For the proof of Theorem 4.7 we introduce the following operators: for each $u \in B_W(X)$ define

$$L^l u(x,a,b) := R^l(x,a,b) + \int_X u(y)\,\widetilde{Q}(dy|x,a,b), \qquad (x,a,b) \in K, \tag{21}$$

where

$$R^l(x,a,b) := R(x,a,b) - \rho_l\,\tau(x,a,b), \qquad (x,a,b) \in K. \tag{22}$$

Thus, following the notation (4), for each strategy pair $(\varphi^1,\varphi^2) \in \Phi^1\times\Phi^2$ define the operators

$$L^l_{\varphi^1,\varphi^2}u(\cdot) := R^l_{\varphi^1,\varphi^2}(\cdot) + \int_X u(y)\,\widetilde{Q}_{\varphi^1,\varphi^2}(dy|\cdot), \tag{23}$$

$$Lu(x) := \sup_{\varphi^1\in\mathbb{A}(x)}\ \inf_{\varphi^2\in\mathbb{B}(x)} L^l_{\varphi^1,\varphi^2}u(x), \tag{24}$$

for each $u \in B_W(X)$.

The results in the next lemma are a combination of a well-known measurable selection theorem [22] and Fan's Minimax Theorem [4]. The proof is omitted since it is the same as the proofs of Lemma 6.5 in [11] and Lemmas 2, 3 and 4 in [23].

Lemma 6.2. Suppose that Assumptions 4.1, 4.2, 4.3 and 4.6 hold and let $u$ be a fixed function in $B_W(X)$. Then:

(a) For each $x \in X$, the sets $\mathbb{A}(x)$ and $\mathbb{B}(x)$ are compact with respect to the weak convergence of measures;

(b) For each $x \in X$, the mappings
$$\varphi^1 \mapsto L^l_{\varphi^1,\varphi^2}u(x), \qquad \varphi^2 \mapsto L^l_{\varphi^1,\varphi^2}u(x)$$

are upper semicontinuous and lower semicontinuous on $\mathbb{A}(x)$ and $\mathbb{B}(x)$, respectively, with respect to the weak convergence of measures;

(c) Moreover, there exists a stationary strategy pair $(\varphi^1_u,\varphi^2_u) \in \Phi^1\times\Phi^2$ such that

$$Lu(\cdot) = L^l_{\varphi^1_u,\varphi^2_u}u(\cdot) = \max_{\varphi^1\in\Phi^1} L^l_{\varphi^1,\varphi^2_u}u(\cdot) = \min_{\varphi^2\in\Phi^2} L^l_{\varphi^1_u,\varphi^2}u(\cdot).$$

Hence, $Lu(\cdot)$ is in $B_W(X)$.

The proof of Theorem 4.7 follows the same scheme as that of Lemma 5.2. We first show, in Lemma 6.3 below, that $L$ is a contraction operator from $B_W(X)$ into itself with modulus $\lambda$; hence, by the Banach Fixed Point Theorem, there exists a unique function $h^*$ in $B_W(X)$ such that

$$h^*(x) = Lh^*(x) = \sup_{\varphi^1\in\mathbb{A}(x)}\ \inf_{\varphi^2\in\mathbb{B}(x)} L^l_{\varphi^1,\varphi^2}h^*(x). \tag{25}$$

As a second step, in Lemma 6.4, we prove that $\rho^* := \rho_l = \rho_u$ and $\nu(h^*) \le 0$. Once the latter is done, we show in Lemma 6.5 that $\nu(h^*) = 0$. Then, (25) becomes

$$h^*(x) = \sup_{\varphi^1\in\mathbb{A}(x)}\ \inf_{\varphi^2\in\mathbb{B}(x)}\Big[R_{\varphi^1,\varphi^2}(x) - \rho^*\tau_{\varphi^1,\varphi^2}(x) + \int_X h^*(y)\,Q_{\varphi^1,\varphi^2}(dy|x)\Big]$$

for all $x \in X$. Hence, Lemma 6.2 yields a stationary strategy pair $(\varphi^1_*,\varphi^2_*) \in \Phi^1\times\Phi^2$ satisfying Theorem 4.7(a).

Lemma 6.3. Suppose that the assumptions of Theorem 4.7 hold. Then $L$ in (24) is a contraction operator from $B_W(X)$ into itself with modulus $\lambda$. Thus, by the Banach Fixed Point Theorem and Lemma 6.2, there exist a unique function $h^*$ in $B_W(X)$ and a stationary strategy pair $(\varphi^1_*,\varphi^2_*) \in \Phi^1\times\Phi^2$ such that
$$h^*(\cdot) = Lh^*(\cdot) = L^l_{\varphi^1_*,\varphi^2_*}h^*(\cdot) \tag{26}$$

$$= \min_{\varphi^2\in\mathbb{B}(x)} L^l_{\varphi^1_*,\varphi^2}h^*(\cdot) = \max_{\varphi^1\in\mathbb{A}(x)} L^l_{\varphi^1,\varphi^2_*}h^*(\cdot). \tag{27}$$

Proof of Lemma 6.3. By Lemma 6.2 it only remains to prove that $L$ is a contraction operator from $B_W(X)$ into itself with modulus $\lambda$. To prove this, consider arbitrary functions $u, v$ in $B_W(X)$ and observe, by Assumption 4.3(b) and (9), that

$$\big|L^l_{\varphi^1,\varphi^2}u(\cdot) - L^l_{\varphi^1,\varphi^2}v(\cdot)\big| \le \|u-v\|_W \int_X W(y)\,\widetilde{Q}_{\varphi^1,\varphi^2}(dy|\cdot) \le \|u-v\|_W\,\lambda W(\cdot)$$

for all $(\varphi^1,\varphi^2) \in \Phi^1\times\Phi^2$. This implies that

$$L^l_{\varphi^1,\varphi^2}u(\cdot) \le L^l_{\varphi^1,\varphi^2}v(\cdot) + \|u-v\|_W\,\lambda W(\cdot) \qquad \forall\,(\varphi^1,\varphi^2)\in\Phi^1\times\Phi^2.$$

Thus, the latter inequality together with Lemma 6.2 implies

$$\inf_{\varphi^2\in\mathbb{B}(x)} L^l_{\varphi^1,\varphi^2}u(\cdot) \le \inf_{\varphi^2\in\mathbb{B}(x)} L^l_{\varphi^1,\varphi^2}v(\cdot) + \|u-v\|_W\,\lambda W(\cdot) \qquad \forall\,\varphi^1\in\Phi^1,$$

which, using Lemma 6.2 again, yields

$$Lu(\cdot) \le Lv(\cdot) + \|u-v\|_W\,\lambda W(\cdot).$$

Similarly, interchanging the roles of $u$ and $v$, it also holds that

$$Lv(\cdot) \le Lu(\cdot) + \|u-v\|_W\,\lambda W(\cdot).$$

Therefore,

$$\|Lu - Lv\|_W \le \lambda\,\|u-v\|_W.$$

That is, $L$ is a contraction operator from $B_W(X)$ into itself with modulus $\lambda$. Now, the Banach Fixed Point Theorem together with Lemma 6.2 ensures the existence of a unique function $h^* \in B_W(X)$ and a stationary strategy pair $(\varphi^1_*,\varphi^2_*) \in \Phi^1\times\Phi^2$ satisfying (26)-(27).

Lemma 6.4. Suppose that the assumptions of Theorem 4.7 hold and let $h^*$ be as in Lemma 6.3. Then
$\nu(h^*) \le 0$ and $\rho_l = \rho_u$.

Proof of Lemma 6.4. Let $(\varphi^1_*,\varphi^2_*)$ be as in Lemma 6.3. Then,

$$h^*(x) = \min_{\varphi^2\in\mathbb{B}(x)}\Big[R^l_{\varphi^1_*,\varphi^2}(x) + \int_X h^*(y)\,\widetilde{Q}_{\varphi^1_*,\varphi^2}(dy|x)\Big] \tag{28}$$

$$\le R^l_{\varphi^1_*,\varphi^2}(x) + \int_X h^*(y)\,\widetilde{Q}_{\varphi^1_*,\varphi^2}(dy|x) = R^l_{\varphi^1_*,\varphi^2}(x) + \int_X h^*(y)\,Q_{\varphi^1_*,\varphi^2}(dy|x) - \nu(h^*)S_{\varphi^1_*,\varphi^2}(x)$$

for all $x \in X$, $\varphi^2 \in \Phi^2$. Then, an integration with respect to the invariant probability measure $\mu_{\varphi^1_*,\varphi^2}$ yields

$$0 \le \mu_{\varphi^1_*,\varphi^2}(R^l_{\varphi^1_*,\varphi^2}) - \nu(h^*)\mu_{\varphi^1_*,\varphi^2}(S_{\varphi^1_*,\varphi^2}) \qquad \forall\,\varphi^2\in\Phi^2,$$

which implies that

$$\nu(h^*)\mu_{\varphi^1_*,\varphi^2}(S_{\varphi^1_*,\varphi^2}) \le \mu_{\varphi^1_*,\varphi^2}(R_{\varphi^1_*,\varphi^2}) - \rho_l\,\mu_{\varphi^1_*,\varphi^2}(\tau_{\varphi^1_*,\varphi^2}) = \mu_{\varphi^1_*,\varphi^2}(\tau_{\varphi^1_*,\varphi^2})\big[\rho(\varphi^1_*,\varphi^2) - \rho_l\big]$$

for all $\varphi^2\in\Phi^2$. Now, taking the infimum over $\Phi^2$, we obtain

$$\nu(h^*)\,\inf_{\varphi^2\in\Phi^2}\frac{\mu_{\varphi^1_*,\varphi^2}(S_{\varphi^1_*,\varphi^2})}{\mu_{\varphi^1_*,\varphi^2}(\tau_{\varphi^1_*,\varphi^2})} \le \inf_{\varphi^2\in\Phi^2}\rho(\varphi^1_*,\varphi^2) - \rho_l \le 0,$$

which, by Assumption 4.1 and Lemma 5.1(c), implies that $\nu(h^*) \le 0$. This inequality combined with (27) implies

$$h^*(x) = \max_{\varphi^1\in\mathbb{A}(x)}\Big[R^l_{\varphi^1,\varphi^2_*}(x) + \int_X h^*(y)\,\widetilde{Q}_{\varphi^1,\varphi^2_*}(dy|x)\Big]$$

$$\ge \max_{\varphi^1\in\mathbb{A}(x)}\Big[R^l_{\varphi^1,\varphi^2_*}(x) + \int_X h^*(y)\,Q_{\varphi^1,\varphi^2_*}(dy|x)\Big] \ge R^l_{\varphi^1,\varphi^2_*}(x) + \int_X h^*(y)\,Q_{\varphi^1,\varphi^2_*}(dy|x)$$
for all $x \in X$, $\varphi^1 \in \Phi^1$. Now, integrating both sides of the latter inequality with respect to the invariant probability measure $\mu_{\varphi^1,\varphi^2_*}$, we see that

$$0 \ge \mu_{\varphi^1,\varphi^2_*}(R^l_{\varphi^1,\varphi^2_*}) = \mu_{\varphi^1,\varphi^2_*}(R_{\varphi^1,\varphi^2_*}) - \rho_l\,\mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*}) \qquad \forall\,\varphi^1\in\Phi^1,$$

which implies that

$$\rho_l \ge \rho(\varphi^1,\varphi^2_*) = \frac{\mu_{\varphi^1,\varphi^2_*}(R_{\varphi^1,\varphi^2_*})}{\mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*})} \qquad \forall\,\varphi^1\in\Phi^1.$$

Hence,

$$\rho_l \ge \sup_{\varphi^1\in\Phi^1}\rho(\varphi^1,\varphi^2_*) \ge \inf_{\varphi^2\in\Phi^2}\ \sup_{\varphi^1\in\Phi^1}\rho(\varphi^1,\varphi^2) = \rho_u.$$

Therefore, since $\rho_l \le \rho_u$ always holds, $\rho_l = \rho_u$.

Lemma 6.5. Suppose that the assumptions of Theorem 4.7 hold and let $h^*$ be as in Lemma 6.3. Then $\nu(h^*) = 0$.

Proof of Lemma 6.5. Let $(\varphi^1_*,\varphi^2_*)$ be as in Lemma 6.3 and put $\rho^* := \rho_l = \rho_u$. By (27), we have

$$h^*(x) = \max_{\varphi^1\in\mathbb{A}(x)}\Big[R_{\varphi^1,\varphi^2_*}(x) - \rho^*\tau_{\varphi^1,\varphi^2_*}(x) + \int_X h^*(y)\,\widetilde{Q}_{\varphi^1,\varphi^2_*}(dy|x)\Big]$$

$$\ge R_{\varphi^1,\varphi^2_*}(x) - \rho^*\tau_{\varphi^1,\varphi^2_*}(x) + \int_X h^*(y)\,\widetilde{Q}_{\varphi^1,\varphi^2_*}(dy|x)$$

for all $x \in X$, $\varphi^1 \in \Phi^1$. As above, integrating both sides of the latter inequality with respect to the invariant probability measure $\mu_{\varphi^1,\varphi^2_*}$, we obtain

$$\nu(h^*)\mu_{\varphi^1,\varphi^2_*}(S_{\varphi^1,\varphi^2_*}) \ge \mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*})\big[\rho(\varphi^1,\varphi^2_*) - \rho^*\big]$$

$$= \mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*})\Big[\rho(\varphi^1,\varphi^2_*) - \inf_{\varphi^2\in\Phi^2}\ \sup_{\varphi^1\in\Phi^1}\rho(\varphi^1,\varphi^2)\Big] \ge \mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*})\Big[\rho(\varphi^1,\varphi^2_*) - \sup_{\varphi^1\in\Phi^1}\rho(\varphi^1,\varphi^2_*)\Big],$$
Then,

$$\frac{\nu(h^*)\mu_{\varphi^1,\varphi^2_*}(S_{\varphi^1,\varphi^2_*})}{\mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*})} \ge \rho(\varphi^1,\varphi^2_*) - \sup_{\varphi^1\in\Phi^1}\rho(\varphi^1,\varphi^2_*) \qquad \forall\,\varphi^1\in\Phi^1,$$

and hence

$$\sup_{\varphi^1\in\Phi^1}\Big[\frac{\nu(h^*)\mu_{\varphi^1,\varphi^2_*}(S_{\varphi^1,\varphi^2_*})}{\mu_{\varphi^1,\varphi^2_*}(\tau_{\varphi^1,\varphi^2_*})}\Big] \ge 0.$$

This inequality implies that $\nu(h^*) \ge 0$. Hence, by Lemma 6.4, $\nu(h^*) = 0$.

Finally, we are ready for the proof of Theorem 4.7.

Proof of Theorem 4.7. Let $h^*$ and $(\varphi^1_*,\varphi^2_*)$ be as in Lemma 6.3. First note that the proof of part (a) is given through Lemmas 6.3, 6.4 and 6.5. Part (b) follows using standard dynamic programming arguments, while the first statement in part (c) is exactly Lemma 6.4. Thus, it only remains to prove the equalities in (15). To do this, first recall that $F^i$ denotes the class of all stationary EAP-optimal strategies for player $i$ ($i = 1,2$), which is nonempty because of part (b). Now, define the following operators on $B_W(X)$:

$$Mu(x) := \max_{\varphi^1\in\mathbb{A}(x)}\Big[R_{\varphi^1,\varphi^2_*}(x) - \rho^*\tau_{\varphi^1,\varphi^2_*}(x) + \int_X u(y)\,\widetilde{Q}_{\varphi^1,\varphi^2_*}(dy|x)\Big],$$

$$Nu(x) := \min_{\varphi^2\in\mathbb{B}(x)}\Big[R_{\varphi^1_*,\varphi^2}(x) - \rho^*\tau_{\varphi^1_*,\varphi^2}(x) + \int_X u(y)\,\widetilde{Q}_{\varphi^1_*,\varphi^2}(dy|x)\Big]$$

for all $x \in X$. Proceeding as above, it is easy to check that $M$ and $N$ are well defined and that they are $\lambda$-contraction operators from $B_W(X)$ into itself. In fact, from part (a), $h^*$ is the fixed point of both operators; that is,

$$h^*(\cdot) = Mh^*(\cdot) = Nh^*(\cdot).$$

Next choose an arbitrary strategy $\varphi^1_0$ in $F^1$ and note that $\rho^* = \rho(\varphi^1_0,\varphi^2_*)$. Then, by Theorem 4.5, there exists a unique function $h_{\varphi^1_0,\varphi^2_*} \in B_W(X)$, with $\nu(h_{\varphi^1_0,\varphi^2_*}) = 0$, which satisfies

$$h_{\varphi^1_0,\varphi^2_*}(x) = R_{\varphi^1_0,\varphi^2_*}(x) - \rho^*\tau_{\varphi^1_0,\varphi^2_*}(x) + \int_X h_{\varphi^1_0,\varphi^2_*}(y)\,Q_{\varphi^1_0,\varphi^2_*}(dy|x), \qquad x \in X.$$

Next, observe that (since $\nu(h_{\varphi^1_0,\varphi^2_*}) = 0$)

$$h_{\varphi^1_0,\varphi^2_*}(\cdot) \le Mh_{\varphi^1_0,\varphi^2_*}(\cdot),$$
which implies that

$$h_{\varphi^1_0,\varphi^2_*}(\cdot) \le M^n h_{\varphi^1_0,\varphi^2_*}(\cdot) \qquad \forall\, n \in \mathbb{N}.$$

Now, since $M$ is a contraction and $h^*$ is its fixed point, we have

$$h_{\varphi^1_0,\varphi^2_*}(\cdot) \le h^*(\cdot).$$

Hence, since $h^*(\cdot) = h_{\varphi^1_*,\varphi^2_*}(\cdot)$ and the strategy $\varphi^1_0$ was chosen arbitrarily in $F^1$, we have

$$\max_{\varphi^1\in F^1} h_{\varphi^1,\varphi^2_*}(\cdot) = h^*(\cdot).$$

Similar arguments, but using the operator $N$ instead of $M$, show that

$$h^*(\cdot) = \min_{\varphi^2\in F^2} h_{\varphi^1_*,\varphi^2}(\cdot).$$

Acknowledgment. The author thanks Prof. Onésimo Hernández-Lerma for his valuable comments on an early version of this work.

References

[1] E. Altman, A. Hordijk and F. M. Spieksma, Contraction conditions for average and α-discount optimality in countable state Markov games with unbounded rewards, Math. Oper. Res. 22 (1997).

[2] S. Bhatnagar and V. S. Borkar, A convex analytic framework for ergodic control of semi-Markov processes, Math. Oper. Res. 20 (1995).

[3] V. S. Borkar and M. K. Ghosh, Denumerable stochastic games with limiting average payoff, J. Optim. Theory Appl. 76 (1993).

[4] K. Fan, Minimax theorems, Proc. Nat. Acad. Sci. USA 39 (1953).

[5] A. Federgruen, P. J. Schweitzer and H. C. Tijms, Denumerable undiscounted semi-Markov decision processes with unbounded rewards, Math. Oper. Res. 8 (1983).

[6] J. Filar and K. Vrieze, Competitive Markov Decision Processes, Springer-Verlag, New York.

[7] M. K. Ghosh and A. Bagchi, Stochastic games with average payoff criterion, Appl. Math. Optim. 38 (1998).

[8] J. I. González-Trejo, O. Hernández-Lerma and L. F. Hoyos-Reyes, Minimax control of discrete-time stochastic systems, SIAM J. Control Optim., to appear.
[9] E. Gordienko and O. Hernández-Lerma, Average cost Markov control processes with weighted norms: existence of canonical policies, Appl. Math. (Warsaw) 23 (1995).

[10] O. Hernández-Lerma and J. B. Lasserre, Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York.

[11] O. Hernández-Lerma and J. B. Lasserre, Zero-sum stochastic games in Borel spaces: average payoff criteria, SIAM J. Control Optim. 39 (2001).

[12] O. Hernández-Lerma, R. Montes-de-Oca and R. Cavazos-Cadena, Recurrence conditions for MDPs with Borel state space, Ann. Oper. Res. 28 (1991).

[13] O. Hernández-Lerma and O. Vega-Amaya, Infinite-horizon Markov control processes with undiscounted cost criteria: from average to overtaking optimality, Appl. Math. (Warsaw) 25 (1998).

[14] O. Hernández-Lerma, O. Vega-Amaya and G. Carrasco, Sample-path optimality and variance-minimization of average cost Markov control processes, SIAM J. Control Optim. 38 (1999).

[15] A. Jaśkiewicz, An approximation approach to ergodic semi-Markov control processes, Math. Methods Oper. Res. 54 (2001).

[16] A. Jaśkiewicz and A. S. Nowak, On the optimality equation for zero-sum ergodic stochastic games, Math. Methods Oper. Res. 54 (2001).

[17] A. Jaśkiewicz, Zero-sum semi-Markov games, SIAM J. Control Optim., to appear.

[18] M. Kurano, Average optimal adaptive policies in semi-Markov decision processes including an unknown parameter, J. Oper. Res. Soc. Japan 28 (1985).

[19] A. K. Lal and S. Sinha, Zero-sum two-person semi-Markov games, J. Appl. Prob. 29 (1992).

[20] F. Luque-Vásquez and O. Hernández-Lerma, Semi-Markov models with average costs, Appl. Math. (Warsaw) 26 (1999).

[21] S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, London.

[22] A. S. Nowak, Measurable selection theorems for minimax stochastic optimization problems, SIAM J. Control Optim. 23 (1985).

[23] A. S. Nowak, Optimal strategies in a class of zero-sum ergodic stochastic games, Math. Methods Oper. Res. 50 (1999).
[24] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, New York.

[25] U. Rieder, Average optimality in Markov games with general state space, Proc. 3rd Conf. on Approx. Theory and Optim. (1995), Puebla, México.

[26] P. J. Schweitzer, Iterative solutions of functional equations of undiscounted Markov renewal programming, J. Math. Anal. Appl. (1971).

[27] O. Vega-Amaya, The average cost optimality equation: a fixed point approach, Reporte de Investigación No. 4 (2001), Departamento de Matemáticas, Universidad de Sonora, México. (Available in: tedi/reportes).

[28] O. Vega-Amaya and F. Luque-Vásquez, Sample-path average cost optimality for semi-Markov control processes on Borel spaces: unbounded costs and mean holding times, Appl. Math. (Warsaw) 27 (2000).
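As a numerical aside on the fixed-point machinery used in the proof of Theorem 4.7: the Banach contraction argument (iterate the operator, converge geometrically to the unique fixed point) can be illustrated on a toy problem. The sketch below is not the paper's model: the two-state game, the payoff matrices, the discounted (λ-contractive) Shapley-type operator, and the action-independent transition law are all invented for illustration. Making the transitions independent of the actions keeps the matrix-game value computable by a pure max–min (each payoff matrix is chosen to have a pure saddle point), which stands in for the general value operator.

```python
import numpy as np

# Toy zero-sum game: 2 states, 2 actions per player (all data invented).
# R[x] is the payoff matrix in state x; Q[x, y] is the probability of
# jumping from x to y, deliberately chosen independent of the actions.
R = [np.array([[3.0, 5.0], [2.0, 1.0]]),
     np.array([[0.0, 2.0], [-1.0, 4.0]])]
Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])
lam = 0.9  # contraction modulus (a discount factor here)

def shapley_operator(u):
    """(Tu)(x) = val( R[x] + lam * sum_y Q[x, y] u(y) )."""
    cont = lam * Q @ u  # continuation value, one number per state
    # With action-independent transitions, val(R + c) = val(R) + c, and
    # these R's have pure saddle points, so max of row-minima suffices.
    return np.array([np.max(np.min(R[x], axis=1)) + cont[x] for x in range(2)])

# Banach fixed-point iteration: ||T^n u - h*|| <= lam**n * ||u - h*||.
u = np.zeros(2)
for _ in range(200):
    u = shapley_operator(u)

residual = np.max(np.abs(shapley_operator(u) - u))
print(u, residual)  # residual is numerically zero: u is the fixed point
```

Because the transition law is action-independent, the fixed point also solves the linear system $h = v + \lambda Q h$ with $v(x) = \operatorname{val}(R[x])$, which gives an independent check of the iteration. In the paper's setting the same convergence mechanism applies to $M$ and $N$, but the value operator and the contraction come from the weighted norm on $B_{W}(X)$ rather than from a discount factor.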