Oblivious Equilibrium for Large-Scale Stochastic Games with Unbounded Costs

Proceedings of the 47th IEEE Conference on Decision and Control, Cancun, Mexico, Dec. 9-11, 2008

Oblivious Equilibrium for Large-Scale Stochastic Games with Unbounded Costs

Sachin Adlakha, Ramesh Johari, Gabriel Weintraub, and Andrea Goldsmith

Abstract — We study stochastic dynamic games with a large number of players, where players are coupled via their cost functions. A standard solution concept for stochastic games is Markov perfect equilibrium (MPE). In MPE, each player's strategy is a function of its own state as well as the state of the other players. This makes MPE computationally prohibitive as the number of players becomes large. An approximate solution concept called oblivious equilibrium (OE) was introduced in [1], where each player's decision depends only on its own state and the long-run average state of the other players. This makes OE computationally more tractable than MPE. It was shown in [1] that, under a set of assumptions, as the number of players becomes large, OE closely approximates MPE. In this paper we relax those assumptions and generalize that result to cases where the cost functions are unbounded. Furthermore, we show that under this relaxed set of assumptions, the OE approximation result can be applied to large population linear quadratic Gaussian (LQG) games.

I. INTRODUCTION

In this paper, we study stochastic games with a large number of players. Such games are used to model complex dynamical systems such as wireless networks [3], [4], industry dynamics with many firms [5], etc. In such games, the players typically have competing objectives. A common equilibrium notion for such games is Markov perfect equilibrium. In MPE, each player minimizes its individual cost by choosing a strategy that is a function of the current state of all the players. MPE suffers from at least two drawbacks. First, as an equilibrium concept, it is implausible in settings where many agents interact with each other; MPE requires the agents to be aware of the state evolution of all other agents.
Second, MPE can be computationally intractable. MPE is typically obtained numerically using dynamic programming (DP) algorithms [7]. Thus, as the number of players increases, the cost of computing the strategy and maintaining the state of all the players becomes prohibitively large [8]. Several techniques have been proposed in the literature to deal with the complexity of large scale systems [9], [10], [15]. Recently, a scheme for approximating MPE for such large scale games was proposed in [1], via a solution concept called oblivious equilibrium. In oblivious equilibrium, a player optimizes given only the long-run average statistics of other players, rather than the entire instantaneous vector of its competitors' states. OE resolves both of the difficulties raised above: in OE, a player is reacting to far simpler aggregate statistics of the behavior of other players. Further, OE computation is significantly simpler than MPE computation, since each player only needs to solve a one-dimensional dynamic program.

S. Adlakha and A. Goldsmith are with the Department of Electrical Engineering, Stanford University. adlakha@stanford.edu, andrea@wsl.stanford.edu. R. Johari is with the Department of Management Science and Engineering, Stanford University. ramesh.johari@stanford.edu. G. Weintraub is with the Columbia Business School, Columbia University. gweintraub@columbia.edu

Under what conditions will OE approximate MPE? When there are a large number of players in a game, an individual player can optimize based on the average behavior of its competitors provided (a) the actual behavior of the competitors is close to the average behavior, and (b) any deviation from the average behavior has small impact on the cost of the individual player. It is intuitive to believe that as the number of players in a game becomes large, the actual behavior would approach its mean by a law of large numbers effect. To measure the impact of any deviation from the average behavior, the authors in [1] defined a light-tail condition.
Informally, this condition implies that a small perturbation in the instantaneous state of the competitors has a small effect on the cost of a player. It is reasonable to expect that under such a condition, if players make decisions based only on the long-run average, they should achieve near-optimal performance. Indeed, it is established in [1] that under a reasonable set of technical conditions including the light-tail condition, OE is a good approximation to MPE for industry dynamic models with many firms; formally, this is called the asymptotic Markov equilibrium (AME) property. As presented in [1], the main approximation result is tailored to the class of firm competition models presented there. In [6], using the methods of [1], the authors isolated a set of parsimonious assumptions for a general class of stochastic games, under which OE is a good approximation to MPE. They also study the case of non-uniform players with heterogeneous cost functions. However, the results in [1], [6] are based on the assumption that the cost functions are uniformly bounded over states and actions. This is a restrictive assumption; for example, typical cost functions used in decentralized control are unbounded (e.g., quadratic cost). In this paper, we remove this boundedness assumption on the cost functions; this change necessitates an alternate approach to establish that OE is a good approximation to MPE. We show a general result under this assumption, then apply our result to a class of games including games with quadratic cost and linear dynamics. The latter result is a generalization of a similar result derived by [2]. The rest of the paper is organized as follows. In Section II, we outline our model of stochastic game, describe our

notation and define the oblivious equilibrium. In Section III, we introduce the asymptotic Markov equilibrium (AME) property, formally define the light-tail condition and introduce our assumptions on the cost function. Section IV details the proof of the main theorem. In Section V, we show that linear-quadratic games can be reduced to a special case of our model. Section VI concludes the paper.

II. MODEL, DEFINITIONS AND NOTATION

We consider an m-player stochastic game evolving over discrete time periods with an infinite horizon. The discrete time periods are indexed by non-negative integers $t \in \mathbb{N}$. The state of a player i at time t is denoted by $x_{i,t} \in X$, where X is a discrete subset of $\mathbb{R}$. We assume that the state evolution of a player i depends only on its own current state and the action it takes. This can be represented by a conditional probability mass function (pmf)

$$P(x_{i,t+1} = x' \mid x_{i,t}, a_{i,t}) = h(x' \mid x_{i,t}, a_{i,t}), \qquad (1)$$

where $a_{i,t}$ is the action taken by the player i at time t. We denote the set of actions available to a player by A; we assume this is a discrete subset of Euclidean space $\mathbb{R}$. The single period cost to a player i is given as $c(x_{i,t}, a_{i,t}, x_{-i,t})$. Here $x_{-i,t}$ is the state of all players except player i at time t. Note that the cost to player i does not depend on the actions taken by other players. Furthermore, we assume that the cost function is independent of the identity of other players. That is, it only depends on the current state $x_{i,t}$ of player i, the total number m of players at any time, and the fraction $f^{(m)}_{-i,t}(y)$, which is the fraction of players, excluding player i, that have their state as y. In other words, we can write the cost function as $c(x_{i,t}, a_{i,t}, f^{(m)}_{-i,t}, m)$, where $f^{(m)}_{-i,t}(y)$ can be expressed as

$$f^{(m)}_{-i,t}(y) = \frac{1}{m-1} \sum_{j \neq i} 1\{x_{j,t} = y\}. \qquad (2)$$

From equation (1) and the definition of the cost function, we note that the players are coupled via their cost functions only. Each player i chooses an action $a_{i,t} = \mu^{(m)}_i(x_{i,t}, f^{(m)}_{-i,t})$ to minimize its expected present value.
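To make the objects above concrete, the following sketch samples one transition from a transition pmf h and computes the empirical distribution $f^{(m)}_{-i,t}$ of equation (2). The state space, the particular pmf, and all numerical parameters are illustrative assumptions, not part of the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
states = np.arange(5)          # toy state space X = {0,...,4} (an assumption)

def h(x, a):
    """Toy transition pmf h(x' | x, a): mass concentrated near x + a, clipped to X."""
    target = int(np.clip(x + a, 0, len(states) - 1))
    p = np.full(len(states), 0.05)
    p[target] += 1.0 - p.sum()     # remaining mass on the drift target
    return p

def empirical_dist(xs, i):
    """f^{(m)}_{-i,t}(y): fraction of players j != i with x_{j,t} = y, as in (2)."""
    others = np.delete(xs, i)
    return np.bincount(others, minlength=len(states)) / len(others)

xs = rng.integers(0, len(states), size=8)       # states of m = 8 players
f = empirical_dist(xs, i=0)                     # aggregate seen by player 0
xs_next0 = rng.choice(states, p=h(xs[0], a=1))  # one transition for player 0
```

Each player's next state depends only on its own state and action, as in (1); the coupling to the other players enters solely through `f`.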
Note that the policy $\mu^{(m)}_i$ depends on the total number of players m because of the underlying dependence of the cost function on m. Let $\mu^{(m)}$ be the vector of policies of all players, and $\mu^{(m)}_{-i}$ be the vector of policies of all players except player i. We define $V(x, f, m \mid \mu^{(m)}_i, \mu^{(m)}_{-i})$ to be the expected net present value for player i with current state x, if the current aggregate state of players other than i is f, given that i follows the policy $\mu^{(m)}_i$ and the policy vector of players other than i is given by $\mu^{(m)}_{-i}$. In particular, we have

$$V(x, f, m \mid \mu^{(m)}_i, \mu^{(m)}_{-i}) = E\Big[\sum_{t=0}^{\infty} \beta^t c(x_{i,t}, a_{i,t}, f^{(m)}_{-i,t}, m) \,\Big|\, x_{i,0} = x,\ f^{(m)}_{-i,0} = f;\ \mu^{(m)}_i, \mu^{(m)}_{-i}\Big], \qquad (3)$$

where $0 < \beta < 1$ is the discount factor. Note that the random variables $x_{i,t}, f^{(m)}_{-i,t}$ depend on the policy vector $\mu^{(m)}$ and the state evolution function h. We focus on symmetric Markov perfect equilibrium, where all players use the same policy $\mu^{(m)}$. We thus drop the subscript i in the policy of a player i. Let M be the set of all policies available to a player. Note that this set also depends on the total number of players m.

Definition 1 (Markov Perfect Equilibrium): The vector of policies $\mu^{(m)}$ is a Markov perfect equilibrium if for all i, x, and f we have

$$\inf_{\mu \in M} V(x, f, m \mid \mu, \mu^{(m)}_{-i}) = V(x, f, m \mid \mu^{(m)}, \mu^{(m)}_{-i}).$$

As the number of players becomes large, the MPE becomes computationally intractable. This is because the set of all policies grows exponentially in the number of players. However, if the coupling between the players is weak, it is possible that the players can choose their optimal action based solely on their own state and the average state of the other players. We expect that as the number of players becomes large, the changes in the players' states average out, such that the state vector $f^{(m)}_{-i,t}$ is well approximated by its long run average. Thus, each player can find its optimal policy based solely on its own state and the long-run average aggregate state of the other players.
We therefore restrict attention to policies that are only a function of the player's own state, and an underlying constant aggregate distribution of the competitors. Such strategies are referred to as oblivious strategies, since they do not take into account the complete state of the competitors at any time. Let us denote by $\tilde\mu^{(m)}_i$ an oblivious policy of a player i; we let $\tilde M$ denote the set of all oblivious policies available to a player. This set also depends on the number of players m. Note that if all players use oblivious strategies, their states evolve as independent Markov chains. We make the following assumption regarding the Markov chain of each player playing an oblivious policy.

Assumption 1: The Markov chain associated with the state evolution of each player i playing an oblivious policy $\tilde\mu^{(m)}$ is positive recurrent, and reaches a stationary distribution $q^{(m)}$.

The stationary distribution depends on the number of players m because the oblivious policy depends on m. Let $\tilde\mu^{(m)}$ be the vector of oblivious policies for all players, $\tilde\mu^{(m)}_i$ be the oblivious policy for a player i, and $\tilde\mu^{(m)}_{-i}$ be the vector of oblivious policies of all players except the player i. For simplification of analysis, we assume that the initial state of a player i is sampled from the stationary distribution $q^{(m)}$ of its state Markov chain; without this assumption,

the OE approximation holds only after sufficient mixing of the individual players' state evolution Markov chains. Given $\tilde\mu^{(m)}$, for a particular player i, the long-run average aggregate state of its competitors is denoted by $\bar f^{(m)}$, and is defined as

$$\bar f^{(m)}(y) = E_{q^{(m)}}\big[f^{(m)}_{-i,t}(y)\big]. \qquad (4)$$

Note that $\bar f^{(m)}$ is completely determined by the state evolution function h and the oblivious policy $\tilde\mu^{(m)}$. As with the case of symmetric MPE defined above, we assume that players use the same oblivious policy, denoted by $\tilde\mu^{(m)}$. We thus drop the subscript i in the oblivious policy of a player i. We define the oblivious value function $\tilde V(x, m \mid \tilde\mu^{(m)}, \tilde\mu^{(m)}_{-i})$ to be the expected net present value for a player i with current state x, if player i follows the oblivious policy $\tilde\mu^{(m)}$, and players other than i follow the oblivious policy vector $\tilde\mu^{(m)}_{-i}$. Specifically, we have

$$\tilde V(x, m \mid \tilde\mu^{(m)}, \tilde\mu^{(m)}_{-i}) = E\Big[\sum_{t=0}^{\infty} \beta^t c\big(x_{i,t}, a_{i,t}, \bar f^{(m)}, m\big) \,\Big|\, x_{i,0} = x;\ \tilde\mu^{(m)}, \tilde\mu^{(m)}_{-i}\Big]. \qquad (5)$$

Note that the expectation does not depend explicitly on the policies used by players other than i; this dependence only enters through the long-run average aggregate state $\bar f^{(m)}$. In particular, the state evolution is only due to the policy of player i. Using the oblivious value function, we define oblivious equilibrium as follows.

Definition 2 (Oblivious Equilibrium): The vector of policies $\tilde\mu^{(m)}$ represents an oblivious equilibrium if for all i, we have

$$\inf_{\mu \in \tilde M} \tilde V(x, m \mid \mu, \tilde\mu^{(m)}_{-i}) = \tilde V(x, m \mid \tilde\mu^{(m)}, \tilde\mu^{(m)}_{-i}), \quad \forall x.$$

In this paper, we do not show the existence of Markov perfect equilibrium or of oblivious equilibrium. We assume that both equilibrium points exist for the stochastic game under consideration [14].

Assumption 2: Markov perfect equilibrium and oblivious equilibrium exist for the stochastic game under consideration.

III. ASYMPTOTIC MARKOV EQUILIBRIUM AND THE LIGHT TAIL CONDITION

As mentioned before, we would like to approximate MPE using OE. To formalize the notion under which OE approximates MPE, we define the asymptotic Markov equilibrium (AME) property.
Intuitively, this property says that an oblivious policy is approximately optimal even when compared against Markov policies. Formally, the AME property ensures that, as the number of players in the game becomes large, the approximation error between the expected net present value obtained by following the oblivious policy $\tilde\mu^{(m)}$ and that obtained by instead following the optimal non-oblivious policy goes to zero for each state x of the player.

Definition 3 (Asymptotic Markov Equilibrium): We say that a sequence of oblivious policies $\tilde\mu^{(m)}$ possesses the asymptotic Markov equilibrium (AME) property if for all x and i, we have

$$\lim_{m \to \infty} E\Big[V(x, f, m \mid \tilde\mu^{(m)}, \tilde\mu^{(m)}_{-i}) - \inf_{\mu \in M} V(x, f, m \mid \mu, \tilde\mu^{(m)}_{-i})\Big] = 0.$$

Notice that the expectation here is over f, which denotes the aggregate state of all players other than i. MPE requires the error to be zero for all (x, f), rather than in expectation; of course, in general, it will not be possible to find a single oblivious policy that satisfies the AME property for any f. In particular, in OE, actions taken by a player will perform poorly if the other players' state is far from the long-run average aggregate state. Thus, AME implies that the OE policy performs nearly as well as the non-oblivious best policy for those aggregate states of other players that occur with high probability. In order to establish the AME property, we make some assumptions on the cost functions. For notational convenience, we drop the subscripts i, t whenever it does not lead to any ambiguity. Note that in oblivious equilibrium, we replace the actual distribution of the opponents' state $f^{(m)}_{-i,t}$ by its mean distribution $\bar f^{(m)}$. In order to measure the distance between the actual distribution of the state and its mean, we define a notion of norm on a distribution.

Definition 4 (1-g Norm): Given a function $g : X \to [0, \infty)$, we define the 1-g norm of the distribution f as

$$\|f\|_{1,g} = \sum_{y} |f(y)|\, g(y). \qquad (6)$$

The 1-g norm is a weighted norm where g is the weight function. Note that this function depends on the actual form of the cost function.
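The 1-g norm of Definition 4 is straightforward to compute. A minimal sketch follows, with an illustrative polynomial weight chosen in the spirit of Section V; the specific weight and distributions are assumptions for demonstration only.

```python
import numpy as np

def norm_1g(f, g):
    """Weighted 1-g norm: ||f||_{1,g} = sum_y |f(y)| g(y), as in (6)."""
    return float(np.sum(np.abs(f) * g))

ys = np.arange(6)                              # toy state space (assumption)
g = np.maximum(1.0, ys.astype(float)) ** 2     # illustrative weight function
f1 = np.array([0.5, 0.2, 0.1, 0.1, 0.05, 0.05])
f2 = np.array([0.4, 0.3, 0.1, 0.1, 0.05, 0.05])

# distance between two aggregate distributions under the weight g
d = norm_1g(f1 - f2, g)
```

Because g grows with the state, disagreement at large states is penalized more heavily, which is exactly what the light-tail condition below controls.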
For a given function g, the distance between the actual distribution $f^{(m)}_{-i,t}$ and its mean $\bar f^{(m)}$ is given by $\|f^{(m)}_{-i,t} - \bar f^{(m)}\|_{1,g}$. We now formally define the light tail condition.

Assumption 3 (Light Tail): Given any $\epsilon > 0$, there exists a state value $z > 0$, such that

$$E\big[g(\tilde U^{(m)})\, 1\{\tilde U^{(m)} > z\}\big] \le \epsilon, \quad \forall m, \qquad (7)$$

$$E\big[g(\tilde U^{(m)})^2\, 1\{\tilde U^{(m)} > z\}\big] \le \epsilon^2, \quad \forall m. \qquad (8)$$

Here $\tilde U^{(m)}$ is a random variable distributed according to $\bar f^{(m)}$. As mentioned before, g is a weight function for the state. Thus, the light tail assumption requires that the weighted tail probability of the competitors' states goes to zero uniformly over m. Also, note that only the second condition needs to be checked; a straightforward application of Jensen's inequality then shows the first condition must hold. However, we retain both inequalities for clarity. Furthermore, if $\sup_y g(y) < \infty$, then the light tail condition is just an assumption on the tail of the mean distribution $\bar f^{(m)}$. In order that the AME property holds, we would like the cost functions to be close to each other when the actual

distribution $f^{(m)}_{-i,t}$ is close to its mean distribution $\bar f^{(m)}$. For this to hold, we impose a uniformity condition on the growth of the cost function. Specifically, we make the following assumption on the cost function.

Assumption 4: There exists a constant C such that, for all $x, a, f_1, f_2$ and m, we have

$$\big|c(x, a, f_1, m) - c(x, a, f_2, m)\big| \le C\, c(x, a, f_1, m)\, \|f_1 - f_2\|_{1,g} + H\big(\|f_1 - f_2\|_{1,g}\big).$$

Note that here $H(\cdot)$ denotes a function that is linear in powers of its argument up to the second power; i.e., H cannot include any terms of order higher than $\|f_1 - f_2\|^2_{1,g}$. Note that the function g in Definition 4 must be chosen such that Assumptions 3 and 4 are simultaneously satisfied. As we will show in Section V, the linear quadratic Gaussian (LQG) tracking problem discussed in [2] is a special case where the difference in the cost functions can be expressed in the form given above. Furthermore, we will show that for the LQG problem, the function g is a polynomial in the state. The next assumption is on the set of policies M. We will restrict attention to sequences $\mu^{(m)}, \tilde\mu^{(m)}_{-i}$ that satisfy the following assumption.

Assumption 5: Given a sequence $\mu^{(m)}, \tilde\mu^{(m)}_{-i}$, we assume that:

$$\sup_m E\Big[\sum_{t=0}^{\infty} \beta^t c\big(x_{i,t}, a_{i,t}, \bar f^{(m)}, m\big)^2 \,\Big|\, x_{i,0} = x,\ f^{(m)}_{-i,0} = f;\ \mu^{(m)}, \tilde\mu^{(m)}_{-i}\Big] < \infty.$$

In contrast to earlier work on oblivious equilibrium [1], [6], our assumptions are significantly different. Specifically, we do not assume that the cost functions are uniformly bounded in state x, distribution f, or actions a, as was done in the previous work. This lack of a uniform bound on the cost function necessitates a significantly different proof technique. However, some of the ideas in the proofs are borrowed from [6].

IV. ASYMPTOTIC RESULTS FOR OBLIVIOUS EQUILIBRIUM

In this section, we prove the AME property using a series of technical lemmas. Assumptions 1-5 are kept throughout the remainder of this section. The first lemma shows that under the 1-g norm, the variance of the distribution $f^{(m)}_{-i,t}$ goes to zero as the number of players becomes large.
Lemma 1: Under the light-tail assumption, and if all the players use the oblivious policy $\tilde\mu^{(m)}$, we have

$$E\big[\|f^{(m)}_{-i,t} - \bar f^{(m)}\|^2_{1,g}\big] \to 0 \quad \text{as } m \to \infty.$$

Proof: We can write $\|f^{(m)}_{-i,t} - \bar f^{(m)}\|_{1,g} = \sum_y |f^{(m)}_{-i,t}(y) - \bar f^{(m)}(y)|\, g(y)$. Now, let a small $\epsilon > 0$ be given, and let z be such that the light tail conditions in equations (7) and (8) are satisfied for the given $\epsilon$. Then,

$$\|f^{(m)}_{-i,t} - \bar f^{(m)}\|_{1,g} \le z \max(1, g(z)) \sum_{y \le z} \big|f^{(m)}_{-i,t}(y) - \bar f^{(m)}(y)\big| + \sum_{y > z} g(y)\, f^{(m)}_{-i,t}(y) + \sum_{y > z} g(y)\, \bar f^{(m)}(y).$$

Using the inequality $(a + b + c)^2 \le 4a^2 + 4b^2 + 4c^2$, we get

$$E\big[\|f^{(m)}_{-i,t} - \bar f^{(m)}\|^2_{1,g}\big] \le \underbrace{4 z^2 \max(1, g(z))^2\, E\Big[\Big(\sum_{y \le z} \big|f^{(m)}_{-i,t}(y) - \bar f^{(m)}(y)\big|\Big)^2\Big]}_{A_z(m)} + \underbrace{4\, E\Big[\Big(\sum_{y > z} g(y)\, f^{(m)}_{-i,t}(y)\Big)^2\Big]}_{B_z(m)} + \underbrace{4 \Big(\sum_{y > z} g(y)\, \bar f^{(m)}(y)\Big)^2}_{C_z(m)}. \qquad (9)$$

Note that the term in parentheses in $C_z(m)$ is independent of t and hence a constant. By the light tail assumption, for sufficiently large z, we have $C_z(m) \le 4\epsilon^2$ for all m. Let us now consider the term $A_z(m)$. For each y, we have

$$E\big[|f^{(m)}_{-i,t}(y) - \bar f^{(m)}(y)|^2\big] = \frac{1}{m-1} \mathrm{Var}\big(1\{x_{j,t} = y\}\big) \le \frac{1}{4(m-1)} \to 0 \quad \text{as } m \to \infty.$$

Here the random variable $1\{x_{j,t} = y\}$ is a Bernoulli random variable with $E[1\{x_{j,t} = y\}] = q^{(m)}(y)$ and $\mathrm{Var}(1\{x_{j,t} = y\}) = q^{(m)}(y)(1 - q^{(m)}(y)) \le 1/4$. Thus, by Cauchy-Schwarz over the at most z terms of the inner sum, for each $y \le z$ there exists an $m_y$ such that for $m \ge m_y$ we have

$$E\big[|f^{(m)}_{-i,t}(y) - \bar f^{(m)}(y)|^2\big] \le \frac{\epsilon^2}{4 z^4 \max(1, g(z))^2}.$$

Define $m_a = \max_{y \le z} \{m_y\}$. Then, for $m > m_a$ we have $A_z(m) \le \epsilon^2$. Let us now consider the term $B_z(m)$. Using the definition of $f^{(m)}_{-i,t}$, we have

$$B_z(m) = \frac{4}{(m-1)^2}\, E\Big[\Big(\sum_{y > z} \sum_{j \neq i} g(y)\, 1\{x_{j,t} = y\}\Big)^2\Big],$$

where the interchange of summation is justified since for any given m, the summations are finite. Let us denote

$W^{(m)}_j = \sum_{y > z} g(y)\, 1\{x_{j,t} = y\}$. We can thus write $B_z(m)$ as

$$B_z(m) = \frac{4}{(m-1)^2}\, E\Big[\Big(\sum_{j \neq i} W^{(m)}_j\Big)^2\Big] = \frac{4}{(m-1)^2} \Big[\mathrm{Var}\Big(\sum_{j \neq i} W^{(m)}_j\Big) + \Big(E\Big[\sum_{j \neq i} W^{(m)}_j\Big]\Big)^2\Big]. \qquad (10)$$

From equation (4), we know that $E[1\{x_{j,t} = y\}] = \bar f^{(m)}(y)$. From the light tail condition, for the chosen z, we have $E[W^{(m)}_j] \le \epsilon$ for all m. Thus,

$$\Big(E\Big[\sum_{j \neq i} W^{(m)}_j\Big]\Big)^2 \le (m-1)^2 \epsilon^2. \qquad (11)$$

Let us now consider the first term in equation (10). We have

$$\mathrm{Var}\Big(\sum_{j \neq i} W^{(m)}_j\Big) = (m-1)\, \mathrm{Var}\big(W^{(m)}_j\big) \le (m-1)\, E\big[(W^{(m)}_j)^2\big], \qquad (12)$$

where the first equality follows from the fact that the opponents' states are independent, since all the opponents use the oblivious policy $\tilde\mu^{(m)}$ and hence evolve independently. Note that $W^{(m)}_j$, as defined, is a random variable that takes the value $g(y)$, for $y > z$, with probability $\bar f^{(m)}(y)$ (and the value 0 otherwise). Thus,

$$E\big[(W^{(m)}_j)^2\big] = \sum_{y > z} g(y)^2\, \bar f^{(m)}(y) \le \epsilon^2,$$

where the last inequality follows from the light tail condition (8). Substituting the above equation in equation (12), we get

$$\mathrm{Var}\Big(\sum_{j \neq i} W^{(m)}_j\Big) \le (m-1)\epsilon^2 \le 2(m-1)^2 \epsilon^2. \qquad (13)$$

Substituting equations (11) and (13) in equation (10), we get that for $m > m_b$,

$$B_z(m) \le \frac{4}{(m-1)^2}\big[2(m-1)^2\epsilon^2 + (m-1)^2 \epsilon^2\big] = 12\epsilon^2.$$

From the bounds on $A_z(m)$ and $C_z(m)$ and the above equation, we get that for sufficiently large z and for $m > \max\{m_a, m_b\}$, we have

$$E\big[\|f^{(m)}_{-i,t} - \bar f^{(m)}\|^2_{1,g}\big] \le 17\epsilon^2.$$

Since $\epsilon$ is arbitrary, this proves the lemma. ∎

Lemma 2: For all x and $\mu^{(m)}, \tilde\mu^{(m)}_{-i}$ satisfying Assumptions 1-5, we have

$$\lim_{m \to \infty} E\Big[\sum_{t=0}^{\infty} \beta^t \big|c(x_{i,t}, a_{i,t}, f^{(m)}_{-i,t}, m) - c(x_{i,t}, a_{i,t}, \bar f^{(m)}, m)\big| \,\Big|\, x_{i,0} = x;\ \mu^{(m)}, \tilde\mu^{(m)}_{-i}\Big] = 0.$$

Proof: Let us define $\Delta^{(m)}_{i,t} = |c(x_{i,t}, a_{i,t}, f^{(m)}_{-i,t}, m) - c(x_{i,t}, a_{i,t}, \bar f^{(m)}, m)|$. Also denote $\bar c_{m,t} = c(x_{i,t}, a_{i,t}, \bar f^{(m)}, m)$ and $F_{m,t} = \|f^{(m)}_{-i,t} - \bar f^{(m)}\|_{1,g}$. Using Assumption 4 on the cost function, we have

$$E\Big[\sum_t \beta^t \Delta^{(m)}_{i,t} \,\Big|\, \mu^{(m)}, \tilde\mu^{(m)}_{-i}\Big] \le C\, E\Big[\sum_t \beta^t \bar c_{m,t} F_{m,t} \,\Big|\, \mu^{(m)}, \tilde\mu^{(m)}_{-i}\Big] + E\Big[\sum_t \beta^t H(F_{m,t})\Big] = C \sum_t \beta^t E\big[\bar c_{m,t} F_{m,t} \mid \mu^{(m)}, \tilde\mu^{(m)}_{-i}\big] + \sum_t \beta^t E\big[H(F_{m,t})\big],$$

where the last equality follows from the monotone convergence theorem. Let us denote the first term of the above equation as $T_1$ and the second term as $T_2$. Using the Cauchy-Schwarz inequality, we get

$$T_1 \le C \sum_t \beta^t E\big[\bar c^2_{m,t}\big]^{1/2} E\big[F^2_{m,t}\big]^{1/2} \le C \Big(\sum_t \beta^t E\big[\bar c^2_{m,t}\big]\Big)^{1/2} \Big(\sum_t \beta^t E\big[F^2_{m,t}\big]\Big)^{1/2},$$

where the last inequality is due to Hölder's inequality.
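Lemma 1's concentration effect can be checked numerically: with the opponents' states drawn i.i.d. from a light-tailed stationary distribution, the expected squared 1-g distance between the empirical distribution and its mean shrinks as the number of players grows. A minimal Monte Carlo sketch, where the state space, weight function, and stationary distribution are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
ys = np.arange(10)
g = (1.0 + ys) ** 2                    # illustrative polynomial weight (assumption)
q = np.exp(-ys); q /= q.sum()          # a light-tailed stationary distribution (assumption)

def mean_sq_dist(m, trials=2000):
    """Estimate E ||f^{(m)}_{-i,t} - fbar||^2_{1,g} with m-1 iid opponents drawn from q."""
    vals = []
    for _ in range(trials):
        draws = rng.choice(ys, size=m - 1, p=q)
        f = np.bincount(draws, minlength=len(ys)) / (m - 1)   # empirical distribution
        vals.append(np.sum(np.abs(f - q) * g) ** 2)           # squared 1-g distance
    return float(np.mean(vals))

small, large = mean_sq_dist(10), mean_sq_dist(1000)
```

Consistent with the lemma, the estimate for m = 1000 comes out far below the one for m = 10, reflecting the roughly 1/(m-1) scaling of the variance terms in the proof.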
Note that we have dropped the policy vector in the conditioning field for notational compactness. By Assumption 5, the first factor in the above equation is bounded. Also, note that $E[F^2_{m,t}]$ is independent of t, and hence

$$\sum_t \beta^t E\big[F^2_{m,t}\big] = \frac{E\big[F^2_{m,t}\big]}{1 - \beta}.$$

Substituting the above equation in the second factor of $T_1$, and also in $T_2$, and using Lemma 1, we get the desired result. ∎

Theorem 3 (Main Theorem): Consider a sequence of oblivious equilibrium policies $\tilde\mu^{(m)}$ that satisfies Assumption 5, with $\mu^{(m)}$ equal to either the oblivious or non-oblivious best response to $\tilde\mu^{(m)}_{-i}$. Then the AME property holds. That is, for all i, x, we have

$$\lim_{m \to \infty} E\Big[V(x, f, m \mid \tilde\mu^{(m)}, \tilde\mu^{(m)}_{-i}) - \inf_{\mu \in M} V(x, f, m \mid \mu, \tilde\mu^{(m)}_{-i})\Big] = 0.$$

Proof: The proof of the theorem is similar to the one given in [1] and is omitted due to space constraints.
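Because the oblivious value function freezes the aggregate at its long-run average, computing an oblivious best response reduces to a one-dimensional dynamic program over a player's own state, which is the computational advantage of OE noted in the introduction. The following value-iteration sketch illustrates this under toy dynamics and a toy tracking cost; every specific choice (state space, pmf, cost, discount factor) is an assumption for illustration, not the paper's model.

```python
import numpy as np

n, beta = 5, 0.9
states = np.arange(n)                 # toy state space (assumption)
actions = np.arange(-1, 2)            # toy action set {-1, 0, 1} (assumption)
fbar = np.ones(n) / n                 # frozen long-run aggregate (assumption)

def h(x, a):
    """Toy transition pmf: mass concentrated near x + a, clipped to the state space."""
    t = int(np.clip(x + a, 0, n - 1))
    p = np.full(n, 0.05); p[t] += 1.0 - p.sum()
    return p

def cost(x, a, fbar):
    """Illustrative cost: track the aggregate mean, pay quadratically for actions."""
    return (x - fbar @ states) ** 2 + a ** 2

# standard value iteration on the one-dimensional (own-state) Bellman equation
V = np.zeros(n)
for _ in range(500):
    Q = np.array([[cost(x, a, fbar) + beta * h(x, a) @ V for a in actions]
                  for x in states])
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
policy = actions[Q.argmin(axis=1)]    # oblivious policy: a function of x alone
```

The DP here is over n states regardless of the number of players; a Markov policy would instead require a value function over the joint state of all m players.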

V. LINEAR QUADRATIC SYSTEMS

In this section, we consider linear quadratic (LQ) games with many players. We show that these games are a special case of the model considered in the previous section. We keep Assumptions 1, 2 and 5 throughout the remainder of this section. Our objective in this section is to prove that Assumptions 3 and 4 hold for the LQ model, and thus the AME property holds for the LQ model. Compared to [2], we consider a discrete time version of the LQ games. Furthermore, for simplicity we assume that all players have the same form of cost function, i.e., they are uniform. This assumption can be relaxed, as we discuss in Section VI. The state evolution of a player $1 \le i \le m$ is given as

$$x_{i,t+1} = A x_{i,t} + B a_{i,t} + w_{i,t}, \qquad (14)$$

where $x_{i,t} \in \mathbb{Z}$ and $a_{i,t} \in \mathbb{Z}$. Here $A, B > 0$. The noise process $w_{i,t}$ is assumed to be independent across time as well as players. We also assume that the noise process has zero mean and all its moments are finite. We assume that the single period cost function for a player i is separable in its action. That is,

$$c\big(x_{i,t}, a_{i,t}, f^{(m)}_{-i,t}\big) = c_1\big(x_{i,t}, f^{(m)}_{-i,t}\big) + c_2(a_{i,t}),$$

where we assume that $c_2(a_{i,t})$ is a quadratic function of $a_{i,t}$. Note that the cost function does not depend upon the number of players playing the game. Here $f^{(m)}_{-i,t}$ is the distribution of players over the state space. Let us define the kth moment of $f^{(m)}_{-i,t}$ as $\gamma_k(f^{(m)}_{-i,t}) = \sum_y f^{(m)}_{-i,t}(y)\, y^k$. We assume that $c_1$ is a jointly strictly convex, quadratic function of x and $\gamma_k$, $k = 1, \ldots, K$. Also, we have

$$\inf_{x, f} c_1(x, f) \ge \epsilon_0 > 0.$$

This model is a generalization of the LQG games considered in [2], where the cost function depends only on the first moment $\gamma_1$ of the distribution $f^{(m)}_{-i,t}$. Furthermore, that result was derived for a specific form of cost function. Each player chooses an action $a_{i,t} = \mu^{(m)}(x_{i,t}, f^{(m)}_{-i,t})$ to minimize its total expected cost.
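Under a linear policy, the scalar dynamics (14) give a closed loop $x_{t+1} = (A - BL)x_t + Bl_0 + w_t$; when $|A - BL| < 1$ the chain is stable, and for $l_0 = 0$ with unit-variance noise the stationary second moment is $\sigma^2 / (1 - (A - BL)^2)$. A short simulation sketch follows; the constants are illustrative assumptions, and real-valued Gaussian states are used for simplicity even though the paper's LQ state space is discrete.

```python
import numpy as np

rng = np.random.default_rng(2)
A, B = 1.0, 0.5                      # illustrative dynamics constants (assumption)
L, l0 = 1.2, 0.0                     # a linear policy a = -L x + l0 (assumption)
phi = A - B * L                      # closed-loop gain; need |phi| < 1 for stability
assert abs(phi) < 1

# simulate x_{t+1} = A x + B a + w with a = -L x + l0 and zero-mean unit-variance noise
x, xs = 0.0, []
for t in range(20000):
    a = -L * x + l0
    x = A * x + B * a + rng.normal(0.0, 1.0)
    xs.append(x)
xs = np.array(xs[1000:])             # discard burn-in toward stationarity
m2 = xs.var()                        # estimate of the stationary second moment
```

The empirical `m2` should land near the theoretical value $1/(1 - \phi^2)$, illustrating why finite noise moments give finite stationary moments in Lemma 5 below.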
The expected net present cost for a player i is given as

$$V(x, f \mid \mu^{(m)}, \mu^{(m)}_{-i}) = E\Big[\sum_{t=0}^{\infty} \beta^t c\big(x_{i,t}, a_{i,t}, f^{(m)}_{-i,t}\big) \,\Big|\, x_{i,0} = x,\ f^{(m)}_{-i,0} = f;\ \mu^{(m)}, \mu^{(m)}_{-i}\Big].$$

We assume that each player chooses its policy $\mu^{(m)}$ to make its expected net present value $V(x, f \mid \mu^{(m)}, \mu^{(m)}_{-i})$ finite. In oblivious equilibrium, each player's policy is only a function of its current state and the average aggregate distribution of its competitors. The next lemma shows that for the LQ model described above, the oblivious equilibrium is independent of the number of players m.

Lemma 4: Under Assumption 2, for the LQ model described above, there exists an oblivious equilibrium that is independent of the number of players m playing the game.

Proof: Let $(\hat\mu, \hat f)$ be an oblivious equilibrium for the LQ model with $m_1$ players playing the game. This means that $\hat\mu$ is an optimal policy under the cost function $c(x, a, \hat f)$, and that $\hat f$ is the stationary distribution obtained from the dynamic equation $x_{i,t+1} = A x_{i,t} + B \hat\mu(x_{i,t}, \hat f) + w_{i,t}$. Now let us assume that the number of players changes to $m_2$. Consider a player i and assume that every one of its $m_2 - 1$ opponents uses the policy $\hat\mu$. Then $\bar f = \hat f$, and since the cost function does not depend upon the total number of players, the optimal policy for player i is $\hat\mu$. Since the dynamics are independent of the number of players, the stationary distribution for player i is $\hat f$. Thus, $(\hat\mu, \hat f)$ is an oblivious equilibrium for the game with $m_2$ players. ∎

To prove that there is no optimality loss if a player uses the oblivious equilibrium policy, we first define the weight function g for the LQ model.

Definition 5 (LQ 1-g norm): For the LQ model described above, let $g(y) = y^K$. With this weight function, we define $\|f\|_{1,g} = \sum_y |f(y)|\, y^K$.

The following lemma verifies the light-tail condition for the LQ model.

Lemma 5: For the LQ model, the light tail condition holds for the weight function $g(y) = y^K$.

Proof: Note that $\bar f = q$, where $q(y)$ is the stationary probability of the Markov chain to be in state y, and is independent of the number of players m.
For linear systems with quadratic cost, we know that the optimal policy is a linear function of the state [11]. That is, $\mu(x) = -Lx + l_0$, where $L > 0$ and $l_0$ are constants that may depend on $\bar f$. The closed loop system thus evolves as $x_{t+1} = (A - BL)x_t + B l_0 + w_t$. From Assumption 1, we know that the closed loop system is stable; thus we have $|A - BL| < 1$. We use the Foster-Lyapunov stability criterion [12] to establish the light tail condition. Using the Lyapunov function $V(x) = x^2$ and the fact that the noise process has finite moments, we can show that $E_q[x^2]$ is finite. We then use induction to show that the stationary distribution has its first 2K moments finite. Specifically, assume that $E_q[x^{2j}]$ is finite for $j = 1, \ldots, p$, where $p < K$. Then, using the Lyapunov function $V(x) = x^{2(p+1)}$ and the fact that the noise process has finite moments, we can show that $E_q[x^{2(p+1)}]$ is finite. Thus, the stationary distribution has its first 2K moments finite [13]. So for any given $\epsilon$, there exists a state z, such that $\sum_{y > z} y^{2K}\, q(y) \le \epsilon^2$. Hence the lemma is proved. ∎

The next lemma relates the absolute difference in the cost function to the 1-g norm difference between the actual distribution $f^{(m)}_{-i,t}$ and its average $\bar f$.

Lemma 6: For the LQ model described above, there exist

constants $M_1, M_2$ and $M_3$ such that

$$\big|c(x_{i,t}, a_{i,t}, f^{(m)}_{-i,t}) - c(x_{i,t}, a_{i,t}, \bar f)\big| \le M_1\, c(x_{i,t}, a_{i,t}, \bar f)\, \|f^{(m)}_{-i,t} - \bar f\|_{1,g} + M_2\, \|f^{(m)}_{-i,t} - \bar f\|_{1,g} + M_3\, \|f^{(m)}_{-i,t} - \bar f\|^2_{1,g}.$$

Proof: For simplicity of notation, we drop the subscripts i, t. Also denote $\Gamma(f) = (\gamma_1(f), \ldots, \gamma_K(f))$, where $\gamma_j(f)$ is the jth moment of the distribution f. Then, we have

$$\Delta c = c(x, a, \Gamma(f)) - c(x, a, \Gamma(\bar f)) = c_1(x, \Gamma(f)) - c_1(x, \Gamma(\bar f)) = \int_{\alpha=0}^{1} \sum_{j=1}^{K} \frac{\partial c_1}{\partial \gamma_j}\big(x, \Gamma(\bar f + \alpha(f - \bar f))\big)\, \big(\gamma_j(f) - \gamma_j(\bar f)\big)\, d\alpha, \qquad (15)$$

where we have used the fundamental theorem of calculus. Consider the moment differences in the above equation. We have

$$\big|\gamma_j(f) - \gamma_j(\bar f)\big| = \Big|\sum_y \big(f(y) - \bar f(y)\big)\, y^j\Big| \le \sum_y \big|f(y) - \bar f(y)\big|\, y^j \le K_1 \sum_y \big|f(y) - \bar f(y)\big|\, y^K = K_1\, \|f - \bar f\|_{1,g},$$

where the last inequality holds for some constant $K_1$, since $y \in \mathbb{Z}$. Substituting the above bound in equation (15), we get

$$|\Delta c| \le K_1\, \|f - \bar f\|_{1,g} \sum_{j=1}^{K} \Big|\int_{\alpha=0}^{1} \frac{\partial c_1}{\partial \gamma_j}\big(x, \Gamma(\bar f + \alpha(f - \bar f))\big)\, d\alpha\Big|, \qquad (16)$$

where we use the fact that $\|f - \bar f\|_{1,g}$ is independent of $\alpha$, and we have also changed the order of the integral and the summation. Define a vector $z = (\gamma_1, \ldots, \gamma_K, x, 1)^T$ and note that $c_1$ is a strictly convex quadratic function of z. Thus, we can write $c_1(x, \gamma_1, \ldots, \gamma_K) = z^T Q z$ for some positive definite symmetric matrix Q. Note that $\gamma_j(\bar f + \alpha(f - \bar f)) = \bar\gamma_j + \alpha(\gamma_j - \bar\gamma_j)$. Since $c_1$ is a quadratic function of the $\gamma_j$, the partial derivative of $c_1$ with respect to $\gamma_j$ is a linear function of the $\gamma_j$ and consequently a linear function of $\alpha$. Thus, we have

$$\frac{\partial c_1}{\partial \gamma_j} = \sum_{s=1}^{K} (Q_{sj} + Q_{js})\, z_s + Q_{K+1,j}\, x + Q_{j,K+2} = Q_{K+1,j}\, x + \sum_{s=1}^{K} (Q_{sj} + Q_{js})\big(\bar\gamma_s + \alpha(\gamma_s - \bar\gamma_s)\big) + Q_{j,K+2}, \qquad (17)$$

where we have used the fact that $z_s = \bar\gamma_s + \alpha(\gamma_s - \bar\gamma_s)$ for $s = 1, \ldots, K$. Now,

$$\big|\gamma_s - \bar\gamma_s\big| = \Big|\sum_y \big(f(y) - \bar f(y)\big)\, y^s\Big| \le \sum_y \big|f(y) - \bar f(y)\big|\, y^K = \|f - \bar f\|_{1,g}.$$

Substituting the above equation in equation (17), and integrating with respect to $\alpha$, we get

$$\Big|\int_{\alpha=0}^{1} \frac{\partial c_1}{\partial \gamma_j}\, d\alpha\Big| \le \big|D_j^T \bar z\big| + K_2\, \|f - \bar f\|_{1,g},$$

for some constant $K_2$. Here we use the notation $\bar z = (\bar\gamma_1, \ldots, \bar\gamma_K, x, 1)^T$, and $D_j$ is a vector of length $K+2$ with coefficients from equation (17). Substituting the above equation in equation (16), we get

$$|\Delta c| \le K_1\, \|f - \bar f\|_{1,g}\, \big|D^T \bar z\big| + K_1 K_2 K\, \|f - \bar f\|^2_{1,g}, \qquad (18)$$

where $D = \sum_{j=1}^{K} D_j$ (the absolute values can be absorbed into the constants). Now consider $D^T \bar z$. We show that there exists a constant $\delta > 0$ such that $\delta\, D^T \bar z \le \bar z^T Q \bar z + \epsilon_0$.
Define $w = (\delta/2)\, Q^{-1/2} D$. Then, we have

$$\bar z^T Q \bar z + \epsilon_0 - \delta\, D^T \bar z = \bar z^T Q \bar z - 2 w^T Q^{1/2} \bar z + \epsilon_0 + w^T w - w^T w = \big(Q^{1/2} \bar z - w\big)^T \big(Q^{1/2} \bar z - w\big) + \epsilon_0 - w^T w \ge \epsilon_0 - w^T w,$$

where the last inequality follows since the first term is a quadratic form. So we need to ensure that $\epsilon_0 - w^T w \ge 0$, which implies that

$$\frac{\delta^2}{4}\, D^T Q^{-1} D \le \epsilon_0 \iff \delta^2 \le \frac{4 \epsilon_0}{D^T Q^{-1} D}.$$

If we choose $K_3 > 1/\delta$, then we have $D^T \bar z \le K_3\, \bar z^T Q \bar z + K_3 \epsilon_0$. Next we show that there exists a $K_4$ such that $\bar z^T Q \bar z \le K_4\, c_1(x, \bar\gamma_1, \ldots, \bar\gamma_K)$. We have

$$\bar z^T Q \bar z \le \lambda_{\max}(Q)\, \bar z^T \bar z \le K_4\, \lambda_{\min}(Q)\, \bar z^T \bar z,$$

where the last inequality is true for a suitable choice of $K_4$. Note that Q is a positive definite matrix, so $\lambda_{\min}(Q) > 0$. Since $K_4\, \lambda_{\min}(Q)\, \bar z^T \bar z \le K_4\, \bar z^T Q \bar z = K_4\, c_1(x, \bar\gamma_1, \ldots, \bar\gamma_K)$, we have that $\bar z^T Q \bar z \le K_4\, c_1(x, \bar\gamma_1, \ldots, \bar\gamma_K)$ for some value of $K_4$. Now,

$$D^T \bar z \le K_3\, \bar z^T Q \bar z + K_3 \epsilon_0 \le K_3 K_4\, c_1(x, \bar\gamma_1, \ldots, \bar\gamma_K) + K_3 \epsilon_0. \qquad (19)$$

Substituting the above equation in equation (18), we prove the lemma. ∎

We restrict attention to sequences $\mu^{(m)}, \tilde\mu^{(m)}_{-i}$ that satisfy Assumption 5. We conjecture that this assumption would be satisfied for the LQ model if the noise process has finite fourth moment. Under this assumption, the AME property holds for linear systems with quadratic costs.

VI. CONCLUSIONS AND DISCUSSION

In this paper, we studied stochastic dynamic games with many players, where the players are coupled via their cost functions. Similar to [1], we showed that for a certain class of stochastic games, we can approximate MPE by a computationally simpler concept called OE. Previous work done in this area [1], [6], where the AME property was established for profit maximization, assumed a uniform bound on the cost function. In this paper, we extended the notion of OE to a class of games where the cost function is unbounded. The lack of a uniform bound necessitated a new proof technique. We showed that games with linear dynamics and quadratic costs are a special case of the model considered. This generalizes a similar result derived in [2]. In the development throughout the paper, we have assumed that all players are uniform, i.e., they have the same form of cost function c.
The results of this paper can be easily extended to the case where the players are non-uniform and their cost functions are drawn from a finite set of types. The reader is referred to [6], where a similar development was done, albeit for a different set of assumptions. However, the same technique can be used here to extend our model; the details are omitted due to space constraints. As mentioned before, the concept of oblivious equilibrium was first introduced in [1], where it was used to establish the AME property for industry dynamic models. Common single period profit functions used in those models are bounded over states. A typical example is a profit function arising from price competition among firms that face a logit demand system generated by consumers with bounded income. For such functions, the AME property can be established using a slightly different approach. Specifically, Assumptions 4 and 5 can be replaced by a uniform bound on the cost function, as well as assuming that the cost functions are Gateaux differentiable with respect to $f^{(m)}$. For such models, it is possible to show that

$$\big|\log c(x, a, f_1, m) - \log c(x, a, f_2, m)\big| \le \|f_1 - f_2\|_{1,g}, \quad \text{where } g(y) = \sup_{x, a, f^{(m)}, m} \Big|\frac{\partial \log c(x, a, f^{(m)}, m)}{\partial f^{(m)}(y)}\Big|.$$

The statement of Lemma 2 then follows by a similar argument as given in Lemma 4 of [6]. As a part of our future work, we hope to develop a model that unifies these two approaches.

REFERENCES

[1] G. Y. Weintraub, C. L. Benkard, and B. Van Roy, "Markov Perfect Industry Dynamics with Many Firms," accepted for publication, Econometrica.
[2] M. Huang, P. E. Caines, and R. P. Malhamé, "Large-Population Cost-Coupled LQG Problems With Nonuniform Agents: Individual-Mass Behavior and Decentralized ε-Nash Equilibria," IEEE Transactions on Automatic Control, Vol. 52, No. 9, 2007.
[3] M. Huang, P. E. Caines, and R. P. Malhamé, "Uplink Power Adjustment in Wireless Communication Systems: A Stochastic Control Analysis," IEEE Transactions on Automatic Control, Vol. 49, No. 10, October 2004.
[4] S. Ulukus and R. D. Yates, "Stochastic Power Control for Cellular Radio Systems," IEEE Transactions on Communications, Vol. 46, No. 6, 1998.
[5] R. Ericson and A. Pakes, "Markov-perfect industry dynamics: A framework for empirical work," Review of Economic Studies, 62(1), pp. 53-82, 1995.
[6] V. Abhishek, S. Adlakha, R. Johari, and G. Weintraub, "Oblivious Equilibrium for General Stochastic Games with Many Players," Allerton Conference on Communication, Control and Computing, 2007.
[7] A. Pakes and P. McGuire, "Computing Markov-Perfect Nash Equilibria: Numerical Implications of a Dynamic Differentiated Product Model," RAND Journal of Economics, 25(4), 1994.
[8] A. Pakes and P. McGuire, "Stochastic Algorithms, Symmetric Markov Perfect Equilibrium, and the 'Curse' of Dimensionality," Econometrica, 69(5), 2001.
[9] A. C. Antoulas and D. C. Sorensen, "Approximation of Large-Scale Dynamical Systems: An Overview," Int. J. Appl. Math. Comput. Sci., Vol. 11, No. 5, 2001.
[10] D. D. Šiljak, Decentralized Control of Complex Systems, Academic Press, 1991.
[11] D. P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific.
[12] S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability, Springer-Verlag, 1993.
[13] J. G. Dai and S. P. Meyn, "Stability and Convergence of Moments for Multiclass Queueing Networks via Fluid Limit Models," IEEE Transactions on Automatic Control, Vol. 40, No. 11, 1995.
[14] M. Huang, R. P. Malhamé, and P. E. Caines, "Nash Equilibria for Large-Population Linear Stochastic Systems of Weakly Coupled Agents," in Analysis, Control and Optimization of Complex Dynamical Systems, pp. 215-252, 2005.
[15] J. M. Lasry and P. L. Lions, "Mean Field Games," Japanese Journal of Mathematics, 2, pp. 229-260, 2007.

More information

CELLULAR automaton (CA) is a discrete model consists

CELLULAR automaton (CA) is a discrete model consists 1622 IEEE TRANSACTIONS ON CYBERNETICS, VOL. 45, NO. 8, AUGUST 2015 Irregular Cellular Learning Automata Mehdi Esnaashari and Mohammad Reza Mebodi Abstract Cellular learning automaton CLA) is a recentl

More information

Basics of Game Theory

Basics of Game Theory Basics of Game Theory Giacomo Bacci and Luca Sanguinetti Department of Information Engineering University of Pisa, Pisa, Italy {giacomo.bacci,luca.sanguinetti}@iet.unipi.it April - May, 2010 G. Bacci and

More information

Mean-Field optimization problems and non-anticipative optimal transport. Beatrice Acciaio

Mean-Field optimization problems and non-anticipative optimal transport. Beatrice Acciaio Mean-Field optimization problems and non-anticipative optimal transport Beatrice Acciaio London School of Economics based on ongoing projects with J. Backhoff, R. Carmona and P. Wang Robust Methods in

More information

On the Unique D1 Equilibrium in the Stackelberg Model with Asymmetric Information Janssen, M.C.W.; Maasland, E.

On the Unique D1 Equilibrium in the Stackelberg Model with Asymmetric Information Janssen, M.C.W.; Maasland, E. Tilburg University On the Unique D1 Equilibrium in the Stackelberg Model with Asymmetric Information Janssen, M.C.W.; Maasland, E. Publication date: 1997 Link to publication General rights Copyright and

More information

Dynamic Discrete Choice Structural Models in Empirical IO

Dynamic Discrete Choice Structural Models in Empirical IO Dynamic Discrete Choice Structural Models in Empirical IO Lecture 4: Euler Equations and Finite Dependence in Dynamic Discrete Choice Models Victor Aguirregabiria (University of Toronto) Carlos III, Madrid

More information

COMPUTATIONAL METHODS AND ALGORITHMS Vol. I - Numerical Methods for Ordinary Differential Equations and Dynamic Systems - E.A.

COMPUTATIONAL METHODS AND ALGORITHMS Vol. I - Numerical Methods for Ordinary Differential Equations and Dynamic Systems - E.A. COMPUTATIOAL METHODS AD ALGORITHMS Vol. I umerical Methods for Ordinar Differential Equations and Dnamic Sstems E.A. oviov UMERICAL METHODS FOR ORDIARY DIFFERETIAL EQUATIOS AD DYAMIC SYSTEMS E.A. oviov

More information

Lecture notes for Analysis of Algorithms : Markov decision processes

Lecture notes for Analysis of Algorithms : Markov decision processes Lecture notes for Analysis of Algorithms : Markov decision processes Lecturer: Thomas Dueholm Hansen June 6, 013 Abstract We give an introduction to infinite-horizon Markov decision processes (MDPs) with

More information

Supermodular Games. Ichiro Obara. February 6, 2012 UCLA. Obara (UCLA) Supermodular Games February 6, / 21

Supermodular Games. Ichiro Obara. February 6, 2012 UCLA. Obara (UCLA) Supermodular Games February 6, / 21 Supermodular Games Ichiro Obara UCLA February 6, 2012 Obara (UCLA) Supermodular Games February 6, 2012 1 / 21 We study a class of strategic games called supermodular game, which is useful in many applications

More information