Research Article A Necessary Condition for Nash Equilibrium in Two-Person Zero-Sum Constrained Stochastic Games


Game Theory, Volume 2013, 5 pages

Hyeong Soo Chang

Department of Computer Science and Engineering, Sogang University, Seoul, Republic of Korea

Correspondence should be addressed to Hyeong Soo Chang; hschang@sogang.ac.kr

Received 15 September 2013; Accepted 19 November 2013

Academic Editor: Peijun Guo

Copyright © 2013 Hyeong Soo Chang. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

We provide a necessary condition that a constrained Nash-equilibrium (CNE) policy pair satisfies in two-person zero-sum constrained stochastic discounted-payoff games and discuss a general method of approximating CNE based on the condition.

1. Introduction

Altman and Shwartz [1] established a sufficient condition for the existence of a stationary Markovian constrained Nash equilibrium (CNE) policy pair in a general model of finite two-person zero-sum constrained stochastic games, and Alvarez-Mena and Hernández-Lerma [2] extended the result to infinite state and action spaces. Even though a few computational studies exist for average-payoff models with additional simplifying assumptions (see, e.g., [3-5]), there seems to be no work providing a meaningful necessary condition for CNE or any general approximation scheme for CNE within the general discounting-cost model.

This brief paper establishes a necessary condition that a CNE policy pair satisfies by a novel characterization of the set of all feasible policies of one player when the other player's policy is fixed. This is done by identifying feasible mixed actions of one player at a current state when the expected total discounted constraint cost from each reachable next state is given by a value function defined over the state space.
The necessary condition provides a general method of testing whether a given policy pair is a CNE policy pair and can induce a general approximation scheme for CNE.

2. Preliminaries

Consider a two-person zero-sum Markov game (MG) [6] $M = (X, A, B, P, C)$, where $X$ is a finite state set, and $A(x)$ and $B(x)$ are nonempty finite pure-action sets for the minimizer and the maximizer, respectively, at $x$ in $X$, with $A = \bigcup_{x \in X} A(x)$ and $B = \bigcup_{x \in X} B(x)$. We denote the mixed action sets at $x$ in $X$ over $A(x)$ and $B(x)$ by $D[A(x)]$ and $D[B(x)]$, respectively, with $D[A] = \bigcup_{x \in X} D[A(x)]$ and $D[B] = \bigcup_{x \in X} D[B(x)]$. Once $f$ in $D[A(x)]$ and $g$ in $D[B(x)]$ are simultaneously taken at $x$ in $X$ by the minimizer and the maximizer, respectively (with complete knowledge of the state but without knowing each other's current action), $x$ makes a transition to a next state $y$ with probability $P_{xy}^{fg}$ given as

$$P_{xy}^{fg} = \sum_{a \in A(x)} \sum_{b \in B(x)} f(a)\, g(b)\, p(y \mid x, a, b). \tag{1}$$

Here $f(a)$ denotes the probability of selecting $a$, similarly for $g(b)$, and $p(y \mid x, a, b)$ denotes the probability of moving from $x$ to $y$ under $a$ and $b$. Then the minimizer obtains an expected cost of $C(x, f, g)$ given by

$$C(x, f, g) = \sum_{y \in X} \sum_{a \in A(x)} \sum_{b \in B(x)} c(x, y, a, b)\, p(y \mid x, a, b)\, f(a)\, g(b), \tag{2}$$

where $c(x, y, a, b)$ in $\mathbb{R}$ is a payoff to the minimizer (the negative of this is incurred by the maximizer).

We define a stationary Markovian policy $\pi$ of the minimizer as a function $\pi : X \to D[A]$ with $\pi(x) \in D[A(x)]$ for all $x \in X$ and denote by $\Pi$ the set of all such policies. A policy $\phi$ is similarly defined for the maximizer with $D[B]$, and we denote by $\Phi$ the set of all such policies. Define the objective value of $\pi$ in $\Pi$ and $\phi$ in $\Phi$ with an initial state $x$ in $X$ as

$$V(\pi, \phi)(x) = E\left[ \sum_{t=0}^{\infty} \gamma^t C(X_t, \pi(X_t), \phi(X_t)) \,\middle|\, X_0 = x \right], \tag{3}$$

where $X_t$ is a random variable denoting the state at time $t$ under $\pi$ and $\phi$, and $\gamma \in (0, 1)$ is a fixed discounting factor. We let $V_\delta(\pi, \phi) = \sum_{x \in X} \delta(x) V(\pi, \phi)(x)$ for a given initial state distribution $\delta$ over $X$.

The MG $M$ is associated with constraint functions $\kappa_i$, $i = 1, 2$, defined over $X$ and constraint-cost functions $d_i$, $i = 1, 2$, where $d_i(x, y, a, b) \in \mathbb{R}$, $x, y \in X$, $a \in A(x)$, and $b \in B(x)$, is a constraint cost paid by the minimizer if $i = 1$ and by the maximizer if $i = 2$. (For simplicity, we consider the model in [1, 2] with only one side constraint for each player.)

A policy $\pi$ in $\Pi$ is called $\delta$-feasible with respect to $\phi$ in $\Phi$ if the pair of $\pi$ and $\phi$ satisfies the constraint inequality $\sum_{x \in X} \delta(x) J_1(\pi, \phi)(x) \le \sum_{x \in X} \delta(x) \kappa_1(x)$, where $J_1$ is defined with $\beta_1 \in (0, 1)$ such that

$$J_1(\pi, \phi)(x) = E\left[ \sum_{t=0}^{\infty} \beta_1^t D_1(X_t, \pi(X_t), \phi(X_t)) \,\middle|\, X_0 = x \right]. \tag{4}$$

The expected constraint cost $D_1(x, f, g)$ is given by

$$D_1(x, f, g) = \sum_{y \in X} \sum_{a \in A(x)} \sum_{b \in B(x)} d_1(x, y, a, b)\, p(y \mid x, a, b)\, f(a)\, g(b). \tag{5}$$

Similarly, $\phi$ in $\Phi$ is $\delta$-feasible with respect to $\pi$ in $\Pi$ if $\sum_{x \in X} \delta(x) J_2(\pi, \phi)(x) \le \sum_{x \in X} \delta(x) \kappa_2(x)$, where $J_2$ is defined with $D_2$ and $\beta_2 \in (0, 1)$. We say that $\pi$ is feasible with respect to $\phi$ if, for all $x$ in $X$, $J_1(\pi, \phi)(x) \le \kappa_1(x)$, and $\phi$ is feasible with respect to $\pi$ if, for all $x$ in $X$, $J_2(\pi, \phi)(x) \le \kappa_2(x)$. Note that if $\pi$ is feasible with respect to $\phi$, then $\pi$ is $\delta$-feasible with respect to $\phi$ for any $\delta$. Let

$$\Pi_\phi = \Big\{ \pi \in \Pi : \sum_{x \in X} \delta(x) J_1(\pi, \phi)(x) \le \sum_{x \in X} \delta(x) \kappa_1(x) \Big\}, \quad \phi \in \Phi,$$
$$\Phi_\pi = \Big\{ \phi \in \Phi : \sum_{x \in X} \delta(x) J_2(\pi, \phi)(x) \le \sum_{x \in X} \delta(x) \kappa_2(x) \Big\}, \quad \pi \in \Pi. \tag{6}$$

That is, $\Pi_\phi$ is the set of all $\delta$-feasible policies in $\Pi$ with respect to $\phi$ (when the maximizer's policy is fixed by $\phi$); similarly, $\Phi_\pi$ is obtained when the minimizer's policy is fixed by $\pi$.
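As an illustration of the definitions above, the following Python sketch evaluates the discounted constraint cost of a fixed policy pair and checks the $\delta$-feasibility inequality. The toy game data, the nested-list layout (`p[x][a][b][y]`, `w[x][a][b][y]`), and all function names are assumptions made for this illustration only; they do not come from the paper. The value is approximated by repeatedly applying the one-step backup $u \leftarrow D_1 + \beta_1 P u$, which converges because $\beta_1 < 1$.

```python
def transition(x, f, g, p):
    """Row P_{xy}^{fg} over all next states y, as in (1)."""
    return [sum(f[a] * g[b] * p[x][a][b][y]
                for a in range(len(f)) for b in range(len(g)))
            for y in range(len(p))]

def one_step_cost(x, f, g, p, w):
    """Expected one-step cost, the common form of C(x,f,g) and D_i(x,f,g)."""
    return sum(w[x][a][b][y] * p[x][a][b][y] * f[a] * g[b]
               for a in range(len(f)) for b in range(len(g))
               for y in range(len(p)))

def discounted_value(pi, phi, p, w, beta, iters=2000):
    """Approximate the expected total discounted cost by fixed-point iteration."""
    u = [0.0] * len(p)
    for _ in range(iters):
        u = [one_step_cost(x, pi[x], phi[x], p, w)
             + beta * sum(q * u[y]
                          for y, q in enumerate(transition(x, pi[x], phi[x], p)))
             for x in range(len(p))]
    return u

def is_delta_feasible(J, kappa, delta):
    """Check sum_x delta(x) J(x) <= sum_x delta(x) kappa(x)."""
    return (sum(d * j for d, j in zip(delta, J))
            <= sum(d * k for d, k in zip(delta, kappa)))

# Hypothetical toy game: 2 states, 2 actions per player, uniform transitions.
p = [[[[0.5, 0.5] for b in range(2)] for a in range(2)] for x in range(2)]
# Hypothetical constraint cost d_1: 1 for minimizer action 0, 3 for action 1.
d1 = [[[[1.0 if a == 0 else 3.0] * 2 for b in range(2)]
       for a in range(2)] for x in range(2)]

phi = [[0.5, 0.5], [0.5, 0.5]]       # maximizer mixes uniformly
pi_low = [[1.0, 0.0], [1.0, 0.0]]    # minimizer always plays action 0
pi_high = [[0.0, 1.0], [0.0, 1.0]]   # minimizer always plays action 1
kappa1, delta, beta1 = [20.0, 20.0], [0.5, 0.5], 0.9

J_low = discounted_value(pi_low, phi, p, d1, beta1)
J_high = discounted_value(pi_high, phi, p, d1, beta1)
feasible_low = is_delta_feasible(J_low, kappa1, delta)
feasible_high = is_delta_feasible(J_high, kappa1, delta)
```

In this toy instance the per-step constraint cost is constant under each pure policy, so the discounted totals are $1/(1-\beta_1)$ and $3/(1-\beta_1)$; only the first satisfies the bound of 20.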
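The game $M$ has a constrained Nash equilibrium (CNE) if there exists a pair of $\pi^*$ in $\Pi$ and $\phi^*$ in $\Phi$ such that $\pi^*$ is $\delta$-feasible with respect to $\phi^*$ and $\phi^*$ is $\delta$-feasible with respect to $\pi^*$, and

$$V_\delta(\pi^*, \phi) \le V_\delta(\pi^*, \phi^*) \le V_\delta(\pi, \phi^*) \quad \forall \pi \in \Pi_{\phi^*},\ \forall \phi \in \Phi_{\pi^*}. \tag{7}$$

3. A Necessary Condition for CNE

Let $\mathcal{B}(X)$ be the set of all real-valued functions on $X$. Given $\phi$ in $\Phi$ and $V$ in $\mathcal{B}(X)$, define the $\phi$-constrained feasible mixed action set with $V$ for the minimizer: for all $x$ in $X$,

$$D[A(x)]_{\phi,V} = \Big\{ f \in D[A(x)] : D_1(x, f, \phi(x)) + \beta_1 \sum_{y \in X} P_{xy}^{f\phi(x)} V(y) \le V(x) + (1 - \beta_1) \min_{x' \in X} \big( \kappa_1(x') - V(x') \big) \Big\}. \tag{8}$$

We further define the $\pi$-constrained feasible mixed action set with $V$ for the maximizer for $\pi$ in $\Pi$ and $V$ in $\mathcal{B}(X)$: for all $x$ in $X$,

$$D[B(x)]_{\pi,V} = \Big\{ g \in D[B(x)] : D_2(x, \pi(x), g) + \beta_2 \sum_{y \in X} P_{xy}^{\pi(x)g} V(y) \le V(x) + (1 - \beta_2) \min_{x' \in X} \big( \kappa_2(x') - V(x') \big) \Big\}. \tag{9}$$

Lemma 1. Given $\phi$ in $\Phi$ and $V$ in $\mathcal{B}(X)$, suppose that $D[A(x)]_{\phi,V} \ne \emptyset$ for all $x$ in $X$. Then any $\pi$ in $\Pi$ such that $\pi(x) \in D[A(x)]_{\phi,V}$ for all $x$ in $X$ is feasible with respect to $\phi$. Similarly, given $\pi$ in $\Pi$ and $V$ in $\mathcal{B}(X)$, if $D[B(x)]_{\pi,V} \ne \emptyset$ for all $x$ in $X$, then any $\phi$ in $\Phi$ such that $\phi(x) \in D[B(x)]_{\pi,V}$ for all $x$ in $X$ is feasible with respect to $\pi$.

Proof. Consider an operator $T_{\pi,\phi} : \mathcal{B}(X) \to \mathcal{B}(X)$ given as

$$T_{\pi,\phi}(V)(x) = D_1(x, \pi(x), \phi(x)) + \beta_1 \sum_{y \in X} P_{xy}^{\pi(x)\phi(x)} V(y), \quad x \in X,\ V \in \mathcal{B}(X). \tag{10}$$

Then, for $\tau$ in $\mathbb{R}$ and $u$ in $\mathcal{B}(X)$, if $T_{\pi,\phi}(u)(x) \le u(x) + \tau$ for all $x$ in $X$, then, for all $x \in X$, $J_1(\pi, \phi)(x) \le u(x) + \tau (1 - \beta_1)^{-1}$, and $\lim_{n \to \infty} (T_{\pi,\phi})^n(u)(x) = J_1(\pi, \phi)(x)$ [6].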

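Because $\pi(x)$ is in $D[A(x)]_{\phi,V}$ for all $x$ in $X$, $T_{\pi,\phi}(V)(x) \le V(x) + (1 - \beta_1) \min_{x' \in X} (\kappa_1(x') - V(x'))$ for all $x$ in $X$. With $\tau = (1 - \beta_1) \min_{x' \in X} (\kappa_1(x') - V(x'))$, we have that, for all $x$ in $X$,

$$J_1(\pi, \phi)(x) \le V(x) + \min_{x' \in X} \big( \kappa_1(x') - V(x') \big) \le \kappa_1(x). \tag{11}$$

The second statement follows by symmetric reasoning.

Some notable choices of $V$ in $\mathcal{B}(X)$ for $D[A(x)]_{\phi,V}$ and $D[B(x)]_{\pi,V}$ are $\kappa_1$, $\kappa_2$, $J_1(\pi, \phi)$, and $J_2(\pi, \phi)$ for some pair of $\pi$ and $\phi$, and the zero function such that $V(x) = 0$ for all $x$ in $X$. In particular, if $V(x) = 0$ for all $x \in X$, then $D[A(x)]_{\phi,V} = \{ f \in D[A(x)] : D_1(x, f, \phi(x)) \le (1 - \beta_1) \min_{x' \in X} \kappa_1(x') \}$.

We now let

$$\Pi_{\phi,V} = \{ \pi \in \Pi : \pi(x) \in D[A(x)]_{\phi,V},\ \forall x \in X \}, \quad \phi \in \Phi,\ V \in \mathcal{B}(X),$$
$$\Phi_{\pi,V} = \{ \phi \in \Phi : \phi(x) \in D[B(x)]_{\pi,V},\ \forall x \in X \}, \quad \pi \in \Pi,\ V \in \mathcal{B}(X). \tag{12}$$

If $D[A(x)]_{\phi,V} = \emptyset$ for some $x$ in $X$, then $\Pi_{\phi,V} = \emptyset$, and similarly $\Phi_{\pi,V} = \emptyset$ if $D[B(x)]_{\pi,V} = \emptyset$ for some $x$ in $X$. The following theorem characterizes the set of all policies of one player that are feasible with respect to a fixed policy of the other player.

Theorem 2. Given $\phi$ in $\Phi$, $\pi$ in $\Pi$ is feasible with respect to $\phi$ if and only if $\pi$ is in $\bigcup_{V \in \mathcal{B}(X)} \Pi_{\phi,V}$. Furthermore, given $\pi \in \Pi$, $\phi$ in $\Phi$ is feasible with respect to $\pi$ if and only if $\phi$ is in $\bigcup_{V \in \mathcal{B}(X)} \Phi_{\pi,V}$.

Proof. We prove only the first part of the statement, for the minimizer. Suppose that $\pi$ is feasible with respect to $\phi$. Then, by setting $V = J_1(\pi, \phi)$, we have $(1 - \beta_1) \min_{x' \in X} (\kappa_1(x') - J_1(\pi, \phi)(x')) \ge 0$. It follows that

$$D_1(x, \pi(x), \phi(x)) + \beta_1 \sum_{y \in X} P_{xy}^{\pi(x)\phi(x)} J_1(\pi, \phi)(y) = J_1(\pi, \phi)(x) \le J_1(\pi, \phi)(x) + (1 - \beta_1) \min_{x' \in X} \big( \kappa_1(x') - J_1(\pi, \phi)(x') \big), \tag{13}$$

and this implies that there exists $V$ in $\mathcal{B}(X)$ such that $\pi(x)$ is in $D[A(x)]_{\phi,V}$ for all $x$ in $X$.

For the other direction, if $\pi$ is in $\bigcup_{V \in \mathcal{B}(X)} \Pi_{\phi,V}$, then, for some $V \in \mathcal{B}(X)$, $\pi(x)$ is in $D[A(x)]_{\phi,V}$ for all $x$ in $X$. By Lemma 1, $\pi$ is feasible with respect to $\phi$.

Because a policy of one player that is feasible with respect to a policy of the other player is also $\delta$-feasible with respect to that policy, the following necessary condition satisfied by a CNE policy pair is immediate.

Corollary 3.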
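If a pair of $\pi^*$ in $\Pi$ and $\phi^*$ in $\Phi$ is a CNE pair for a given $\delta$, then the pair satisfies the following saddle-point inequality:

$$V_\delta(\pi^*, \phi) \le V_\delta(\pi^*, \phi^*) \le V_\delta(\pi, \phi^*), \quad \forall \pi \in \bigcup_{V \in \mathcal{B}(X)} \Pi_{\phi^*,V},\ \forall \phi \in \bigcup_{V \in \mathcal{B}(X)} \Phi_{\pi^*,V}. \tag{14}$$

Given a pair of $\pi$ in $\Pi$ and $\phi$ in $\Phi$, if we find some $V, u$ in $\mathcal{B}(X)$ such that the pair does not satisfy the saddle-point inequality over nonempty $\Pi_{\phi,V}$ and $\Phi_{\pi,u}$, then the pair is not a CNE pair.

4. An Example of Approximation Scheme for CNE

We now provide an example of a general approximation scheme for CNE based on the necessary condition. Basically, we fix some $u, V$ in $\mathcal{B}(X)$ and try to find an equilibrium policy pair that satisfies the saddle-point inequalities over the subsets of the feasible policy spaces induced by the selected $u$, $V$, and the pair itself; such a pair is an approximate CNE pair.

We start by selecting arbitrary $V_1, V_2$ in $\mathcal{B}(X)$ and define the feasible mixed joint-action sets induced by $V_1, V_2$: for all $x \in X$,

$$D^{V_1,V_2}(x) = \Big\{ (f, g) \in D[A(x)] \times D[B(x)] : D_i(x, f, g) + \beta_i \sum_{y \in X} P_{xy}^{fg} V_i(y) \le V_i(x) + (1 - \beta_i) \min_{x' \in X} \big( \kappa_i(x') - V_i(x') \big),\ i = 1, 2 \Big\}. \tag{15}$$

Let

$$\Delta^{V_1,V_2}(x) = \{ (a, b) \in A(x) \times B(x) : (f_a, g_b) \in D^{V_1,V_2}(x) \}, \tag{16}$$

where $f_a$ denotes the mixed action with $f_a(a) = 1$ and $g_b$ the mixed action with $g_b(b) = 1$. Assume that $\Delta^{V_1,V_2}(x) \ne \emptyset$ for all $x$ in $X$. We then obtain a pair of nonempty $\tilde{A}(x) \subseteq A(x)$ and $\tilde{B}(x) \subseteq B(x)$ such that $\tilde{A}(x) \times \tilde{B}(x) \subseteq \Delta^{V_1,V_2}(x)$. Let

$$A_{\phi,V}(x) = \{ a \in A(x) : f_a \in D[A(x)]_{\phi,V} \}, \quad x \in X,\ V \in \mathcal{B}(X),\ \phi \in \Phi, \tag{17}$$

and similarly

$$B_{\pi,V}(x) = \{ b \in B(x) : g_b \in D[B(x)]_{\pi,V} \}, \quad x \in X,\ V \in \mathcal{B}(X),\ \pi \in \Pi. \tag{18}$$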
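The pure-action set in (17) can be computed by testing each pure action against the inequality that defines the $\phi$-constrained feasible mixed action set in (8). The Python sketch below does this for the minimizer on a hypothetical toy game; the data, the nested-list layout (`p[x][a][b][y]`, `d[x][a][b][y]`), the function names, and the small numerical tolerance are all assumptions for illustration, not part of the paper.

```python
def backup(x, f, g, p, d, beta, V):
    """D_i(x,f,g) + beta * sum_y P_{xy}^{fg} V(y), the left side of (8)."""
    total = 0.0
    for a, fa in enumerate(f):
        for b, gb in enumerate(g):
            for y, pxy in enumerate(p[x][a][b]):
                total += fa * gb * pxy * (d[x][a][b][y] + beta * V[y])
    return total

def feasible_pure_actions(x, phi, p, d1, beta1, kappa1, V, tol=1e-12):
    """Pure actions a whose point mass f_a satisfies the inequality in (8)."""
    # Right side of (8): V(x) + (1 - beta_1) * min_{x'} (kappa_1(x') - V(x')).
    slack = (1.0 - beta1) * min(k - v for k, v in zip(kappa1, V))
    n_actions = len(p[x])
    result = []
    for a in range(n_actions):
        f_a = [1.0 if i == a else 0.0 for i in range(n_actions)]
        if backup(x, f_a, phi[x], p, d1, beta1, V) <= V[x] + slack + tol:
            result.append(a)
    return result

# Hypothetical toy game: 2 states, 2 actions per player, uniform transitions.
p = [[[[0.5, 0.5] for b in range(2)] for a in range(2)] for x in range(2)]
# Hypothetical constraint cost d_1: 1 for minimizer action 0, 3 for action 1.
d1 = [[[[1.0 if a == 0 else 3.0] * 2 for b in range(2)]
       for a in range(2)] for x in range(2)]
phi = [[0.5, 0.5], [0.5, 0.5]]
kappa1, beta1 = [20.0, 20.0], 0.9
V_zero = [0.0, 0.0]  # the zero function, one of the choices of V noted in Section 3

A_feasible = [feasible_pure_actions(x, phi, p, d1, beta1, kappa1, V_zero)
              for x in range(2)]
```

With $V = 0$ the test reduces to $D_1(x, f_a, \phi(x)) \le (1 - \beta_1) \min_{x'} \kappa_1(x')$, which is 2 in this toy instance, so only action 0 (cost 1) passes at each state.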

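Here $A_{\phi,V}(x)$ is the $\phi$-constrained feasible pure-action set with $V$ at $x$ for the minimizer, and $B_{\pi,V}(x)$ is the corresponding set for the maximizer. For any subset $A'(x) \subseteq A(x)$, $x \in X$, with $A' = \bigcup_{x \in X} A'(x)$, we denote by $D[A'(x)]$ the set of all probability distributions over $A(x)$ that assign zero probability to the actions in $A(x) \setminus A'(x)$. If $A'(x) = \emptyset$, then $D[A'(x)] = \emptyset$. The notation $D[B'(x)]$ for $B'(x) \subseteq B(x)$ is defined similarly. (Note that $D[A_{\phi,V}(x)] \subseteq D[A(x)]_{\phi,V} \subseteq D[A(x)]$ for all $x$ in $X$ in general.) We further let

$$\Pi[A'] = \{ \pi \in \Pi : \pi(x) \in D[A'(x)],\ \forall x \in X \}, \quad \Phi[B'] = \{ \phi \in \Phi : \phi(x) \in D[B'(x)],\ \forall x \in X \}. \tag{19}$$

If $D[A'(x)] = \emptyset$ for some $x$ in $X$, then $\Pi[A'] = \emptyset$, and similarly $\Phi[B'] = \emptyset$ if $D[B'(x)] = \emptyset$ for some $x$ in $X$. By construction, the following result is immediate. For all $x \in X$,

$$\tilde{A}(x) \subseteq \bigcap_{\phi \in \Phi[\tilde{B}]} A_{\phi,V_1}(x), \quad \tilde{B}(x) \subseteq \bigcap_{\pi \in \Pi[\tilde{A}]} B_{\pi,V_2}(x), \tag{20}$$

which further implies that, for any $(\tilde{\pi}, \tilde{\phi})$ such that $\tilde{\pi}(x) \in D[\tilde{A}(x)]$ for all $x$ in $X$ and $\tilde{\phi}(x) \in D[\tilde{B}(x)]$ for all $x$ in $X$, $\tilde{\pi}$ is feasible with respect to $\tilde{\phi}$ and $\tilde{\phi}$ is feasible with respect to $\tilde{\pi}$.

Consider now the unconstrained game $M^{V_1,V_2} = (X, \tilde{A}, \tilde{B}, P, C)$, where $P$ and $C$ are evaluated only at $f$ in $D[\tilde{A}(x)]$ and $g$ in $D[\tilde{B}(x)]$ for all $x$ in $X$, and denote the set of all NE policy pairs of $M^{V_1,V_2}$ by $NE(M^{V_1,V_2})$. The above two results then imply that any $(\tilde{\pi}, \tilde{\phi}) \in NE(M^{V_1,V_2})$ is a local CNE for $M$. In other words, for any $\delta$, $\tilde{\pi}$ is $\delta$-feasible with respect to $\tilde{\phi}$ and $\tilde{\phi}$ is $\delta$-feasible with respect to $\tilde{\pi}$, $\Pi[\tilde{A}] \subseteq \Pi_{\tilde{\phi}}$ and $\Phi[\tilde{B}] \subseteq \Phi_{\tilde{\pi}}$, and

$$V_\delta(\tilde{\pi}, \phi) \le V_\delta(\tilde{\pi}, \tilde{\phi}) \le V_\delta(\pi, \tilde{\phi}), \quad \forall \pi \in \Pi[\tilde{A}],\ \forall \phi \in \Phi[\tilde{B}]. \tag{21}$$

That is, the local CNE pair $(\tilde{\pi}, \tilde{\phi})$ satisfies all of the conditions of CNE except that the saddle-point inequality is satisfied only locally, over subsets of $\Pi_{\tilde{\phi}}$ and $\Phi_{\tilde{\pi}}$. In fact, related solution concepts for games that are resistant to local deviations, called local NE, have already been established in economics (see, e.g., [7]).

Projecting $\Delta^{V_1,V_2}(x)$ into the two sets $\tilde{A}(x)$ and $\tilde{B}(x)$ turns out to be equivalent to obtaining a complete bipartite subgraph, or biclique, from a (bipartite) graph.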
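The projection step can be sketched as a biclique search: treat the minimizer's and maximizer's pure actions at a state as the two sides of a bipartite graph, with an edge for each feasible joint pure action, and look for a product subset contained in the edge set. The Python sketch below uses brute-force enumeration and picks a product subset with the most joint actions; both the selection rule and the function name are assumptions made for illustration, since the text does not prescribe a particular rule, and the enumeration is exponential in the number of minimizer actions.

```python
from itertools import combinations

def largest_biclique(joint, actions_a, actions_b):
    """Return (A_sub, B_sub) with A_sub x B_sub contained in `joint`.

    Brute force over nonempty subsets of `actions_a`; for each subset the
    largest compatible set of maximizer actions is the set of common
    neighbours. Ties keep the first subset found in sorted order.
    """
    best = (frozenset(), frozenset())
    for k in range(1, len(actions_a) + 1):
        for subset in combinations(sorted(actions_a), k):
            common = frozenset(b for b in actions_b
                               if all((a, b) in joint for a in subset))
            if len(subset) * len(common) > len(best[0]) * len(best[1]):
                best = (frozenset(subset), common)
    return best

# Joint pure-action set of the form reported in the paper's example:
# only the pairs (a, a) and (a, b) are feasible.
joint_set = {("a", "a"), ("a", "b")}
A_sub, B_sub = largest_biclique(joint_set, {"a", "b"}, {"a", "b"})
```

For this joint set the search returns the minimizer set {a} and the maximizer set {a, b}, matching the sets stated in the example of this section.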
The problem of finding a biclique in a given bipartite graph is well studied in the graph theory literature (see, e.g., [8]).

Another issue is how to set $V_1$ and $V_2$ such that $\Delta^{V_1,V_2}(x)$ is nonempty for all $x$ in $X$. If there exists a pure-policy pair of $\pi$ in $\Pi$ and $\phi$ in $\Phi$ such that, for all $x$ in $X$, $\pi(x)(a) = 1$ for some $a \in A(x)$ and $\phi(x)(b) = 1$ for some $b \in B(x)$, and one policy is feasible with respect to the other policy, then by setting $V_i = J_i(\pi, \phi)$, $i = 1, 2$, we have $(\pi(x), \phi(x)) \in D^{V_1,V_2}(x)$ for all $x$ in $X$, making $\Delta^{V_1,V_2}(x) \ne \emptyset$ for all $x$ in $X$. We can put the following feasibility assumption on $M$ to assure the existence of such a pure-policy pair: for all $x$ in $X$,

$$\min\{ \kappa_1(x), \kappa_2(x) \} \ge \max\Big\{ \inf_{\pi \in \Pi, \phi \in \Phi} J_1(\pi, \phi)(x),\ \inf_{\pi \in \Pi, \phi \in \Phi} J_2(\pi, \phi)(x) \Big\}. \tag{22}$$

In other words, by this assumption there exists at least one pure-policy pair such that one policy is feasible with respect to the other policy. The existence comes from the fact that there exists a pure-policy pair $(\pi, \phi)$ that achieves $\inf_{\pi',\phi'} J_1(\pi', \phi')(x)$ for all $x$ in $X$ in solving this Markov decision process problem [9], and likewise for $\inf_{\pi',\phi'} J_2(\pi', \phi')(x)$ for all $x$ in $X$.

To illustrate how the above method works, we consider the following simple MG $M = (X, A, B, P, C)$, where $X = \{1, 2\}$, $A(x) = B(x) = \{a, b\}$, $x \in X$, and $p$, $c$, $d_1$, and $d_2$ are given in matrix form, respectively: $p^{aa} = [ ]$, $p^{ab} = [ ]$, $p^{ba} = [ ]$, and $p^{bb} = [ ]$; $c^{aa} = c^{ab} = [ ]$, $c^{ba} = c^{bb} = [ ]$; and $d_1^{aa} = d_1^{ab} = [ ]$, $d_1^{ba} = d_1^{bb} = [ ]$, $d_2^{aa} = d_2^{ab} = [ ]$, and $d_2^{ba} = d_2^{bb} = [ ]$. In this matrix form, for example, the $(i, j)$th entries of $p^{ab}$ and $c^{ab}$ refer to $p(j \mid i, a, b)$ and $c(i, j, a, b)$, respectively. The other parameters are given such that $\gamma = \beta_1 = 0.9$, $\beta_2 = 0.95$, $\kappa_1(1) = \kappa_2(1) = 40$, and $\kappa_1(2) = \kappa_2(2) = 50$.

We first observe that, for the pure-policy pair $(\pi, \phi) = ((a, a), (a, a))$, one policy is feasible with respect to the other policy. The notation $\pi = (\pi(1), \pi(2))$ here refers to $\pi(1) \in D[A(1)]$ and $\pi(2) \in D[A(2)]$.
For simplicity, we write $\pi(1)$ as $a$ for the distribution concentrated on the action $a$. (Note that we can obtain another such pure-policy pair $((a, a), (b, b))$, which achieves $\inf_{\pi \in \Pi, \phi \in \Phi} J_1(\pi, \phi)(x) = 2$, $x \in X$.) Now, by setting $V_i = J_i(\pi, \phi)$, $i = 1, 2$, we first obtain $\Delta^{V_1,V_2}(x) = \{(a, a), (a, b)\}$ for all $x \in X$, thereby having the sets $\tilde{A}(x) = \{a\}$ and $\tilde{B}(x) = \{a, b\}$ for all $x \in X$. We solve the unconstrained two-person zero-sum MG $M^{V_1,V_2}$, obtaining a pure NE policy pair $(\tilde{\pi}, \tilde{\phi}) = ((a, a), (a, a))$ for $M^{V_1,V_2}$, which is then a local CNE for $M$.

5. Concluding Remark

For simplicity, the model of the present note deals with only one side constraint for each player. Generalizing it to multiple side constraints per player would make the definitions more complex, but the ideas would be the same as in the one-side-constraint case.

Acknowledgment

The author is with the Department of Computer Science and Engineering at Sogang University, Seoul, Republic of Korea,

References

[1] E. Altman and A. Shwartz, "Constrained Markov games: Nash equilibria," in Annals of the International Society of Dynamic Games, V. Gaitsgory, J. Filar, and K. Mizukami, Eds., vol. 5, Birkhäuser, Boston, Mass, USA.
[2] J. Alvarez-Mena and O. Hernández-Lerma, "Existence of Nash equilibria for constrained stochastic games," Mathematical Methods of Operations Research, vol. 63, no. 2.
[3] E. Altman, K. Avrachenkov, R. Marquez, and G. Miller, "Zero-sum constrained stochastic games with independent state processes," Mathematical Methods of Operations Research, vol. 62, no. 3.
[4] E. Altman, K. Avrachenkov, N. Bonneau, M. Debbah, R. El-Azouzi, and D. S. Menasche, "Constrained cost-coupled stochastic games with independent state processes," Operations Research Letters, vol. 36, no. 2, 2008.
[5] N. Shimkin, "Stochastic games with average cost constraints," in Annals of the International Society of Dynamic Games, T. Başar and A. Haurie, Eds., vol. 1 of Advances in Dynamic Games and Applications, Birkhäuser, Boston, Mass, USA.
[6] J. Filar and K. Vrieze, Competitive Markov Decision Processes, Springer, New York, NY, USA, 1996.
[7] C. Alós-Ferrer and A. B. Ania, "Local equilibria in economic games," Economics Letters, vol. 70, no. 2.
[8] D. S. Hochbaum, "Approximating clique and biclique problems," Journal of Algorithms, vol. 29, no. 1.
[9] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, New York, NY, USA, 1994.
