Decoupling Coupled Constraints Through Utility Design
Na Li and Jason R. Marden

Abstract: The central goal in multiagent systems is to design local control laws for the individual agents to ensure that the emergent global behavior is desirable with respect to a given system-level objective. In many systems this control design is further complicated by coupled constraints on the agents' behavior. This paper seeks to address the design of such algorithms using the field of game theory. In particular, is it possible to design local agent utility functions for an unconstrained game such that all resulting pure Nash equilibria (i) optimize the given system-level objective and (ii) satisfy the given coupled constraint? Such developments would greatly simplify the control design by eliminating the need to explicitly consider the constraint. Unfortunately, we illustrate that the standard game-theoretic framework is not suitable for this design objective. However, we demonstrate that by adding an additional state variable to the game environment, i.e., moving towards a special class of Markov games termed state based games, we can satisfy these performance criteria through utility design. The key to this realization is incorporating exterior penalty functions and barrier functions into the design of the agents' utility functions.

I. INTRODUCTION

The central goal in multiagent systems is to derive desirable collective behaviors through the design of individual agent control algorithms [1]-[7]. In many systems, the desirable collective behavior must also satisfy a given coupled constraint on the agents' actions [2]-[7]. One example is the problem of TCP control, where the users' sending rates need to satisfy link capacity constraints [4].
An alternative example is the problem of power control in MIMO interference systems, where the users' sending rates need to satisfy quality-of-service constraints [5], [6].

(This research was supported by AFOSR grant #FA. Na Li is a graduate student of Control and Dynamical Systems, California Institute of Technology, Pasadena, CA 91125, USA; nali@caltech.edu. J. R. Marden is with the Department of Electrical, Computer & Energy Engineering, University of Colorado, Boulder, CO 80309, USA; jason.marden@colorado.edu. Corresponding author.)

Regardless of the specific application domain, these coupled constraints bring additional complexity to the control algorithm design. There are two main research directions aimed at designing distributed control algorithms to satisfy performance criteria involving coupled constraints. The first direction seeks to design algorithms that ensure the coupled constraint is always satisfied, e.g., the well-studied consensus algorithm [1], [3], [8], [9]. While theoretically appealing, such algorithms lack robustness to environmental uncertainties, noisy measurements, and inconsistent clock rates amongst the agents. The second direction seeks to design algorithms that ensure the limiting behavior satisfies the coupled constraints, e.g., dual decomposition [4], [10]-[13] and subgradient methods [14], [15]. Such algorithms often require a two-time-scale solution approach by introducing pricing terms to help coordinate behavior. Depending on the application domain, these two-time-scale approaches may be prohibitive either because of the informational dependence of the pricing terms or the rigidity of the update algorithm.^1

Another key problem associated with the above algorithms is the informational dependence of the agents' control policies. From a design perspective, it is important to have a methodology for control design which guarantees that the resulting agent control policies are within an admissible space, e.g., control policies that depend only on local information. However, when using the aforementioned algorithms, the informational dependence of the resulting agents' control policies is a byproduct of the algorithm and the structure of the system-level objective function. Hence, the resulting control policies need not be admissible. Recently, the approaches in [14] and [15] seek to address this problem by adding such informational constraints to the resulting control policies.
The derived results in [14], [15] hold for a special class of optimization problems with separable objective functions and coupled feasibility constraints.^2 In contrast, we focus on an alternative class of optimization problems with non-separable objective functions and linear coupled constraints. While our solution approach is fundamentally different from the approaches in [14], [15], the approaches are similar in spirit in that informational limitations are overcome by introducing local estimation terms for the agents. Our results offer certain advantages over the results in [14], [15] by (i) providing automatic robustness guarantees and (ii) providing algorithms which ensure that the coupled constraints are always satisfied.^3

This paper focuses on using game theory for the design of agent control policies. A game-theoretic approach to cooperative control involves the following two steps: (i) Game design: define the agents as self-interested decision makers in a game-theoretic environment. (ii) Agent decision rules: define a distributed learning algorithm, i.e., a distributed control law, for each agent to coordinate group behavior. The goal is to do both steps in conjunction with one another to ensure that the emergent global behavior is desirable with respect to a given system-level objective [16], [17]. One of the central appeals of game theory for the design and control of multiagent systems is that it provides a hierarchical decomposition between the distribution of the optimization problem (game design) and the specific distributed control laws (agent decision rules) [18]. For example, if the game is designed as a potential game, then a system designer can appeal to a wide class of distributed learning algorithms which guarantee convergence to a (pure) Nash equilibrium under a variety of informational dependencies [19]-[23]. This decomposition provides the system designer with tremendous flexibility to meet the objectives and constraints inherent to a broad class of multiagent systems.

^1 For example, in [11] the authors introduce a two-time-scale algorithm for a network coding problem where the pricing terms are set, agents reach an equilibrium, and then the pricing terms are updated. This type of algorithm is difficult to implement since convergence rates for reaching an equilibrium are uncharacterized; therefore, it is hard to determine when to adapt the pricing terms. In alternative settings, such as [4], the agents can reach an equilibrium in one time step; hence, implementing such an algorithm is not necessarily prohibitive. However, the fact that agents can equilibrate in one time step is a result of the special structure of the objective function and in general does not hold for other system-level objective functions.
Furthermore, the framework of potential games is inherently robust. That is, even if the agents' control policies are heterogeneous, stemming from delays or corruption in information, varying degrees of accessible information, and variations in clock rates amongst the agents, behavior will still converge to a pure Nash equilibrium in such settings. This theme is encapsulated in [22], which proves that virtually any reasonable decision making process will converge to a pure Nash equilibrium in potential games.

The first part of this work focuses on the following question: is it possible to design a game such that (i) the agents' utility functions depend only on local information, (ii) all resulting equilibria satisfy the desired performance criterion, and (iii) the resulting game has an underlying structure that can be exploited by distributed algorithms? Our first result proves that the framework of strategic form games is not rich enough to meet these objectives. We demonstrate this impossibility result on the well-studied consensus problem, which possesses a convex system-level objective function and linear coupled constraints. Ultimately, the desired locality in each agent's utility function limits the degree to which we can achieve these objectives.

Our second result establishes a systematic methodology for game design that satisfies the above objectives by considering a broader class of games than that of strategic form games. Specifically, we consider a simplified class of Markov games [24], termed state based games, which introduces an underlying state space into the game-theoretic environment [25]. Here, local state variables are introduced as a mechanism for each agent to estimate constraint violations using only available information. The novelty of our approach stems from integrating classical optimization techniques, in particular exterior penalty methods and barrier function methods, into the design of the agents' cost functions, as shown in Sections IV-B and IV-C respectively. Both methodologies ensure that all three objectives are satisfied (Theorems 2 and 3).

^2 The paper [14] considers a distributed optimization problem where each agent $i \in \{1, \ldots, n\}$ is endowed with a local objective function $f_i(\cdot)$ and a local constraint set $X_i$. The considered system-level objective is of the form $\min_{x \in \cap_{i=1}^n X_i} \sum_{i=1}^n f_i(x)$, where $f_i : \mathbb{R}^n \to \mathbb{R}$ is a convex function and $X_i \subseteq \mathbb{R}^n$ is a closed convex set. The algorithm in [15] also focuses on distributed optimization problems with the same separable objective functions but with coupled constraints of the form $g(x) \le 0$.

^3 The algorithm presented in [15] does not ensure that the coupled constraints are always satisfied for the considered class of optimization problems with linear coupled constraints of the form $Ax \le b$. The results contained in this paper (Section IV-C) provide such algorithms.
The core difference between the two approaches is that barrier functions can also be used to ensure that the constraint is satisfied dynamically in addition to asymptotically. The developed methodologies, whether incorporating the exterior penalty method or the barrier function method, ensure that the resulting game fits into a class of games termed state based potential games [25]. A state based potential game, which represents an extension of potential games to the class of state based games, possesses an underlying structure that can be exploited in distributed learning. Our last result proves that the learning algorithm gradient play [26]-[28] asymptotically converges to a set of equilibria, termed stationary state Nash equilibria, in state based potential games. Accordingly, the two-step design approach achieves our desired performance criteria.
II. PRELIMINARIES

A. Multiagent Systems: Problem Definition

We consider a multiagent system consisting of $n$ agents denoted by the set $N \triangleq \{1, \ldots, n\}$. Each agent $i \in N$ is endowed with a set of possible decisions (or values) denoted by $V_i$, which we assume is a convex subset of $\mathbb{R}$, i.e., $V_i \subseteq \mathbb{R}$.^4 We denote a joint decision by the tuple $v \triangleq (v_1, \ldots, v_n) \in V \triangleq \prod_{i \in N} V_i$, where $V$ is referred to as the set of joint decisions. There is a global objective of the form $\phi : V \to \mathbb{R}$ that a system designer seeks to minimize. We assume throughout that $\phi$ is differentiable and convex unless otherwise noted. Lastly, we focus on multiagent systems with an inherent coupled constraint represented by a set of linear inequalities $\{\sum_{i=1}^n A_i^k v_i - C^k \le 0\}_{k \in M \triangleq \{1, \ldots, m\}}$. More formally, the optimization problem takes on the form

$$\min_{v_i \in V_i, \, i \in N} \ \phi(v_1, v_2, \ldots, v_n) \quad \text{s.t.} \ \sum_{i=1}^n A_i^k v_i - C^k \le 0, \ \forall k \in M. \quad (1)$$

We do not explicitly highlight the equality case $\sum_{i=1}^n A_i^k v_i - C^k = 0$, since it can be handled by the two inequalities $\sum_{i=1}^n A_i^k v_i - C^k \le 0$ and $-(\sum_{i=1}^n A_i^k v_i - C^k) \le 0$.

The focus of this paper is different from that of direct optimization, as our goal is to establish an interaction framework where each decision maker $i \in N$ makes a decision independently in response to local information. These informational dependencies are modeled by an undirected and connected communication graph $G = (N, E)$ with nodes $N$ and edges $E \subseteq N \times N$, where the neighbor set of agent $i$ is $N_i \triangleq \{j \in N : (i, j) \in E\}$.^5 The interaction framework produces a sequence of decisions $v(0), v(1), v(2), \ldots$, where at each iteration $t \in \{0, 1, \ldots\}$ the decision of each agent $i$ is chosen independently according to a local control law of the form

$$v_i(t) = F_i(\{\text{information about agent } j \text{ at stage } t\}_{j \in N_i}) \quad (2)$$

which designates how each agent processes available information to formulate a decision at each iteration.
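As a concrete instance of this setup, the sketch below encodes a small version of problem (1) and checks feasibility of a candidate joint decision. The three-agent objective, the matrix $A$, and the vector $C$ are all invented for illustration; nothing here comes from the paper itself.

```python
import numpy as np

# Hypothetical instance of problem (1): n = 3 agents, m = 2 linear coupled
# constraints of the form sum_i A[k, i] * v[i] - C[k] <= 0 (values invented).
A = np.array([[1.0, 1.0, 1.0],     # constraint 1: v1 + v2 + v3 <= 6
              [1.0, -1.0, 0.0]])   # constraint 2: v1 - v2 <= 1
C = np.array([6.0, 1.0])

def phi(v):
    """A convex, differentiable system-level objective (illustrative choice)."""
    return float(np.sum((v - np.array([1.0, 2.0, 3.0])) ** 2))

def feasible(v, tol=1e-9):
    """Check every coupled constraint sum_i A[k, i] * v[i] - C[k] <= 0."""
    return bool(np.all(A @ v - C <= tol))

v = np.array([1.0, 2.0, 3.0])
print(phi(v), feasible(v))   # 0.0 True (this minimizer happens to be feasible)
```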
The goal in this setting is to design the local controllers $\{F_i(\cdot)\}_{i \in N}$ within the desired informational constraints such that the collective behavior converges to a joint decision $v$ that solves the optimization problem in (1). Note that these informational constraints often limit the applicability of established distributed optimization approaches.

^4 The results in this paper also hold for values of higher dimension, i.e., $V_i \subseteq \mathbb{R}^{d_i}$ where $d_i > 1$ and $d_i$ may differ from $d_j$ for $i \neq j$. For ease of exposition, we focus purely on the case $d_i = 1$ for all $i \in N$.

^5 By convention we let $i \in N_i$ for each agent $i \in N$.

The main objective of this paper is to develop a methodology for decoupling coupled constraints through local estimation terms. Hence, we assume throughout that it is possible to design local agent control policies of the form (2) for solving the class of optimization problems in (1) without the coupled constraints. Accordingly, we assume that for each agent $i \in N$ there exists a function $M_i : \prod_{j \in N_i} V_j \to \mathbb{R}$ such that for any value profiles $v, v' \in V$ with $v_{-i} = v'_{-i}$ we have

$$\phi(v) - \phi(v') = M_i(\{v_j\}_{j \in N_i}) - M_i(\{v'_j\}_{j \in N_i}). \quad (3)$$

Here $v_{-i}$ denotes the value profile of agents other than agent $i$, i.e., $v_{-i} = (v_1, \ldots, v_{i-1}, v_{i+1}, \ldots, v_n)$. There are several examples of systems with such objective functions, including consensus [1], [3], [8], [9], power control [10], network coding [11], [29], sensor coverage [30], and network routing [4], [31], among many more.^6 Furthermore, recent work in [32] focuses on how to achieve this for more general objective functions by introducing auxiliary state variables.

B. Background: Finite Strategic Form Games

This paper focuses on using game theory to derive control policies of the form (2). Game theory provides a rich analytical framework for analyzing distributed systems where the decision makers respond to local and heterogeneous objective functions. The core game-theoretic framework is that of strategic form games, which consists of an agent set $N \triangleq \{1, \ldots, n\}$. Each agent $i \in N$ has an action set $A_i$ and a cost function $J_i : A \to \mathbb{R}$, where $A := \prod_{i \in N} A_i$ denotes the joint action set. For an action profile $a = (a_1, \ldots, a_n)$, let $a_{-i}$ denote the action profile of agents other than agent $i$, i.e., $a_{-i} = (a_1, \ldots, a_{i-1}, a_{i+1}, \ldots, a_n)$. We will frequently represent a game by the tuple $G = \{N, \{A_i\}_{i \in N}, \{J_i\}_{i \in N}\}$.
Pure Nash equilibria represent one of the most well-known equilibrium concepts for such distributed systems. An action profile $a^* \in A$ is called a pure Nash equilibrium if for all $i \in N$,

$$J_i(a_i^*, a_{-i}^*) = \min_{a_i \in A_i} J_i(a_i, a_{-i}^*).$$

^6 Consider the consensus problem introduced in Section III-A. The objective function for consensus is of the form $\frac{1}{2} \sum_{i \in N} \sum_{j \in N_i} \|v_i - v_j\|_2^2$. It is straightforward to verify that this objective function satisfies (3) with $M_i(\{v_j\}_{j \in N_i}) = \sum_{j \in N_i} \|v_i - v_j\|_2^2$.
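The decomposition property (3) is easy to verify numerically for the consensus objective of footnote 6. In the sketch below the path graph and the values are invented for illustration; `M` implements the local function $M_i$ from the footnote, with the convention $i \in N_i$.

```python
import numpy as np

# Path graph 1-2-3-4 (0-indexed); by the footnote-5 convention, i is in N_i.
N = {0: {0, 1}, 1: {0, 1, 2}, 2: {1, 2, 3}, 3: {2, 3}}

def phi(v):
    """Consensus objective: (1/2) * sum_{i in N} sum_{j in N_i} (v_i - v_j)^2."""
    return 0.5 * sum((v[i] - v[j]) ** 2 for i in N for j in N[i])

def M(i, v):
    """Local function from footnote 6: M_i = sum_{j in N_i} (v_i - v_j)^2."""
    return sum((v[i] - v[j]) ** 2 for j in N[i])

v = np.array([0.0, 20.0, 0.0, 0.0])
w = v.copy()
w[1] = 7.0                            # a unilateral deviation by agent 1
# Condition (3): the global change in phi equals the local change in M_1.
print(np.isclose(phi(v) - phi(w), M(1, v) - M(1, w)))   # True
```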
Our motivation for studying pure Nash equilibria derives from the fact that for many game structures, e.g., potential games, there is a wide array of distributed learning algorithms which guarantee convergence to a (pure) Nash equilibrium under a variety of informational dependencies [19]-[23].

III. AN IMPOSSIBILITY RESULT FOR GAME DESIGN

We focus on game theory as a tool for obtaining distributed solutions to the optimization problem (1). Accordingly, we formulate the optimization problem as a game where the agent set is $N$, the action set of each agent is $V_i$, and each agent is assigned a cost function of the form $J_i : V \to \mathbb{R}$. This section focuses on the feasibility of developing a methodology for game design that achieves the following three objectives:

(O-1): Each agent's cost function relies solely on local information and is of the form $J_i : \prod_{j \in N_i} V_j \to \mathbb{R}$. Furthermore, the cost functions should possess a degree of scalability with regard to the size of the system and the topology of the communication graph.

(O-2): All Nash equilibria of the resulting game $G$ represent solutions to the optimization problem in (1).

(O-3): The resulting game $G$ possesses an underlying structure, e.g., that of potential games [19]-[23], that can be exploited by distributed learning algorithms.

Accomplishing these objectives would ensure that the agents' control policies resulting from the designed game plus a suitable learning algorithm would be of the local form in (2). However, in this section we demonstrate that achieving these objectives using the framework of strategic form games is impossible in general. We demonstrate this impossibility result on the well-studied consensus problem.

A. Definition: Average Consensus

We now present the average consensus problem in [1]. Consider a set of agents $N = \{1, \ldots, n\}$ where each agent $i \in N$ has an initial value $v_i(0) \in \mathbb{R}$.
The goal of the consensus problem is to establish a set of local control laws of the form (2) such that each agent's value converges to the average of the initial values, i.e., $\lim_{t \to \infty} v_i(t) = \bar{v} = \frac{1}{n} \sum_{i \in N} v_i(0)$. Given that the communication graph $G$ is connected, this can be formalized as the following optimization problem:

$$\min_v \ \frac{1}{2} \sum_{i \in N} \sum_{j \in N_i} \|v_i - v_j\|_2^2 \quad \text{s.t.} \ \sum_{i \in N} v_i = \sum_{i \in N} v_i(0). \quad (4)$$

Furthermore, the control laws $\{F_i(\cdot)\}_{i \in N}$ should provide the desired asymptotic guarantees for any initial value profile $v(0)$ and communication graph $G$. This implies that the underlying control design must be invariant to these parameters. In the case of average consensus, this also implies that the underlying control design must be invariant to the agents' indices. We refer to such control designs as scalable.

B. Average Consensus: A Game Theoretic Formulation

Consider a game-theoretic model for the consensus problem where the agent set is $N$ and the action set of each agent is $\mathbb{R}$, i.e., $A_i = \mathbb{R}$ for all agents $i \in N$. To facilitate the design of scalable agent control policies, we focus on the design of agent cost functions of the form

$$J_i(v) \triangleq J(\{v_j, v_j(0)\}_{j \in N_i}) \quad (5)$$

where the function $J(\cdot)$ is invariant to the specific indices assigned to agents. Notice that this design of $J(\cdot)$ leads to a well-defined game irrespective of the agent set $N$, the initial value profile $v(0)$, or the structure of the communication graph $\{N_i\}_{i \in N}$. The following proposition demonstrates that it is impossible to design $J(\cdot)$ such that, for any game induced by an initial value profile and communication graph, all resulting Nash equilibria solve the optimization problem in (4).

Proposition 1: There does not exist a single $J(\cdot)$ such that for any game induced by a connected communication graph $G$, an initial value profile $v(0) \in V$, and agents' cost functions of the form (5), the Nash equilibria of the induced game represent solutions to the optimization problem in (4).

Proof: Suppose that there exists a single $J(\cdot)$ that satisfies the proposition. We will now construct a counterexample to show this is impossible. Consider two average consensus problems with a communication graph and initial values (either 0 or 20) as highlighted in Figure 1.
For example, in setting (a) we have $N = \{1, 2, 3, 4\}$, $E = \{\{1, 2\}, \{1, 3\}, \{3, 4\}\}$, $v_2(0) = 20$, and $v_1(0) = v_3(0) = v_4(0) = 0$. Define a cost function for each agent of the form (5) for both settings (a) and (b) using this single $J(\cdot)$.

Fig. 1. Two consensus problems for the proof of Proposition 1.

Suppose $(v_1^*, v_2^*, v_3^*, v_4^*)$ represents a Nash equilibrium for setting (a). This implies that for any agent $i \in N$ we have

$$J(\{v_i^*, v_i(0)\}, \{v_j^*, v_j(0)\}_{j \in N_i \setminus i}) \le J(\{v_i, v_i(0)\}, \{v_j^*, v_j(0)\}_{j \in N_i \setminus i}) \quad (6)$$

for any $v_i \in V_i$. Furthermore, by assumption we know that $v_i^* = 5$ for all agents $i = 1, 2, 3, 4$. We now claim that $(v_1^*, v_2^*, v_3^*, v_4^*, v_5^*) = (5, 5, 5, 5, 5)$ must also be a Nash equilibrium for setting (b). By writing down the Nash equilibrium condition in (6) for setting (b), it is straightforward to see that agents 1, 2, 3, 4 in setting (a) have the same cost functions as agents 1, 2, 3, 4 in setting (b). This results from the fact that the agents' cost functions are invariant to the specific agent indices. Therefore, since $(v_1^*, v_2^*, v_3^*, v_4^*) = (5, 5, 5, 5)$ is a Nash equilibrium for (a), no member of this group has a unilateral incentive to deviate from $v_i^* = 5$ in setting (b). Furthermore, agent 5 in setting (b) has the same cost function as agent 3 in setting (a), again because the agents' cost functions are invariant to the specific agent indices. Therefore, agent 5 also does not have a unilateral incentive to deviate from $v_5^* = 5$ in setting (b), following the same logic as above. Therefore $(v_1^*, v_2^*, v_3^*, v_4^*, v_5^*) = (5, 5, 5, 5, 5)$ is a Nash equilibrium of (b). The impossibility comes from the fact that 5 does not represent the average of the initial values in setting (b).

IV. STATE BASED GAME DESIGNS

The previous section demonstrates that the framework of strategic form games is not suitable for handling coupled constraints through the design of agent cost functions. In this section, we
focus on achieving objectives (O-1)-(O-3) using the framework of state based games [25], which represents a simple yet tractable class of Markov games [24]. In such games, each agent $i \in N$ is assigned a local state-based cost function of the form $J_i : \prod_{j \in N_i} X_j \times \prod_{j \in N_i} A_j \to \mathbb{R}$, where $X_j$ is a set of local state variables for agent $j$ and $A_j$ is a set of actions for agent $j$. We present two separate designs in this section that achieve objectives (O-1)-(O-3) by utilizing these local state variables as the agents' local estimates of the coupled constraint violations. The first design employs exterior penalty functions in the design of the agents' cost functions. The second design employs barrier functions in the design of the agents' cost functions. The core difference between the two approaches is that barrier functions can be used to ensure that the constraint is satisfied dynamically in addition to asymptotically.

A. Preliminaries: State Based Games

State based games represent an extension of finite strategic form games in which an underlying state space is introduced to the game-theoretic environment. The class of state based games considered in this paper consists of the following elements: (i) an agent set $N$, (ii) a state space $X$, (iii) state-dependent action sets of the form $A_i(x)$ for each agent $i \in N$ and state $x \in X$, (iv) state-dependent cost functions of the form $J_i : X \times A \to \mathbb{R}$ for each agent $i \in N$, where $A \triangleq \prod_{i \in N} A_i$ and $A_i \triangleq \cup_{x \in X} A_i(x)$, and (v) a deterministic state transition function $f : X \times A \to X$. Lastly, we focus on state based games where for any state $x \in X$ there exists a null action $0 \in A(x)$ such that $x = f(x, 0)$. We denote a state based game by the tuple $G = \{N, \{A_i\}_{i \in N}, \{J_i\}_{i \in N}, X, f\}$.

In this paper we focus on the following class of games, termed state based potential games [25], which represent an extension of potential games to the framework of state based games.
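The defining elements of a state based game can be captured in a few lines of code. The sketch below is a hypothetical two-agent example, not taken from the paper: a deterministic transition $f$ over a simple numeric state, together with a check of the null-action property $x = f(x, 0)$.

```python
# Minimal state based game skeleton (hypothetical example, not from the paper).
# State: a tuple of agent values; action: a tuple of increments per agent.

def f(x, a):
    """Deterministic state transition: each agent shifts its value by its action."""
    return tuple(xi + ai for xi, ai in zip(x, a))

x = (2.0, 5.0)
null = (0.0, 0.0)
assert f(x, null) == x       # the null action leaves the state unchanged: x = f(x, 0)
print(f(x, (1.0, -2.0)))     # a non-null action moves the state: (3.0, 3.0)
```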
Definition 1 (State Based Potential Game): A game $G$ is a state based potential game if there exists a potential function $\Phi : X \times A \to \mathbb{R}$ that satisfies the following two properties for every state $x \in X$:

(D-1): For any agent $i \in N$, action profile $a \in A(x)$, and action $a_i' \in A_i(x)$,

$$J_i(x, a_i', a_{-i}) - J_i(x, a) = \Phi(x, a_i', a_{-i}) - \Phi(x, a).$$
(D-2): For any action profile $a \in A(x)$ and ensuing state $\tilde{x} = f(x, a)$, the potential function satisfies $\Phi(\tilde{x}, 0) = \Phi(x, a)$.

The first condition states that each agent's cost function is aligned with the potential function in the same fashion as in potential games [19]. The second condition concerns the evolution of the potential function along the state trajectory.^7 We focus on the class of state based potential games since dynamics can be derived which converge to the following class of equilibria (Section V):

Definition 2 (Stationary State Nash Equilibrium): A state action pair $[x^*, a^*]$ is a stationary state Nash equilibrium if

(D-1): For any agent $i \in N$ we have $a_i^* \in \arg\min_{a_i \in A_i(x^*)} J_i(x^*, a_i, a_{-i}^*)$.

(D-2): The state is a fixed point of the state transition function, i.e., $x^* = f(x^*, a^*)$.

Note that a (strict) stationary state Nash equilibrium represents a fixed point of a better reply process [23]. Proposition 3 in the appendix proves that a stationary state Nash equilibrium is guaranteed to exist in any state based potential game.

B. Design Using Exterior Penalty Functions

Our first design integrates exterior penalty functions into the agents' cost functions.^8 We now introduce the specifics of our designed game. Table I in the appendix summarizes all necessary notation.

State Space: The starting point of our design is an underlying state space $X$ where each state $x \in X$ is defined as a tuple $x = (v, e)$, where $v = (v_1, \ldots, v_n) \in \mathbb{R}^n$ is the profile of values and

^7 The definition of state based potential games differs slightly from the definition in [25] since we focus on state-dependent action sets and games where there exist null actions.

^8 Exterior penalty functions are widely used in optimization for transforming a constrained optimization problem into an equivalent uncoupled constrained optimization problem where the constraints are translated to penalties that are directly integrated into the objective functions.
More specifically, the coupled constrained optimization problem (1) is transformed into an uncoupled constrained optimization problem $\min_{v \in V} \phi(v) + \mu \alpha(v)$, where $\alpha(v)$ is a penalty function associated with the constraints $\{\sum_i A_i^k v_i - C^k \le 0\}_{k \in M}$ and $\mu > 0$ is a tradeoff parameter. The value of this transformation centers on the fact that solving this uncoupled constrained optimization problem is easier than solving the optimization problem in (1). For each $\mu > 0$ the solution to the uncoupled constrained optimization problem may represent an infeasible solution to the coupled constrained optimization problem in (1); however, as $\mu \to \infty$, the limit of the solutions to the uncoupled constrained optimization problem is an optimal solution to the optimization problem in (1). See Theorem in [33] for more details.
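The transformation described in this footnote can be illustrated on a toy one-dimensional problem, entirely invented for illustration: minimize $(v-2)^2$ subject to $v - 1 \le 0$, with quadratic exterior penalty $\alpha(v) = [\max(0, v-1)]^2$. For each finite $\mu$ the penalized minimizer $(2+\mu)/(1+\mu)$ is infeasible, but it approaches the constrained optimum $v = 1$ as $\mu$ grows.

```python
def penalized_solution(mu, steps=20000):
    """Gradient descent on phi(v) + mu * alpha(v) for the toy problem
    phi(v) = (v - 2)^2, alpha(v) = max(0, v - 1)^2 (exterior penalty)."""
    lr = 0.4 / (1.0 + mu)          # step size shrinks as the penalty stiffens
    v = 0.0
    for _ in range(steps):
        grad = 2.0 * (v - 2.0)
        if v > 1.0:                # penalty is active only outside the feasible set
            grad += 2.0 * mu * (v - 1.0)
        v -= lr * grad
    return v

for mu in (1.0, 10.0, 1000.0):
    print(mu, penalized_solution(mu))   # minimizers approach v* = 1 from outside
```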
$e = \{e_i^k\}_{i \in N, k \in M}$ is the profile of estimation terms. The term $e_i^k$ represents agent $i$'s estimate of the $k$-th constraint value $\sum_{i=1}^n A_i^k v_i - C^k$.

Actions: Each agent $i$ is assigned a state-dependent action set $A_i(x)$ that permits the agent to change its value and constraint estimates through communication with neighboring agents. Specifically, an action $a_i$ is defined as a tuple $a_i = (\hat{v}_i, \{\hat{e}_i^1, \ldots, \hat{e}_i^m\})$, where $\hat{v}_i$ indicates a change in the agent's value and $\hat{e}_i^k$ indicates a change in the agent's estimate of the $k$-th constraint. Here, the change in estimation terms for agent $i$ is represented by a tuple $\hat{e}_i^k = \{\hat{e}_{i \to j}^k\}_{j \in N_i}$. The term $\hat{e}_{i \to j}^k$ indicates the estimation value that agent $i$ exchanges (or passes) to agent $j \in N_i$ regarding the $k$-th constraint.

State Dynamics: For notational simplicity let $\hat{e}_{i,\text{in}}^k \triangleq \sum_{j \in N_i} \hat{e}_{j \to i}^k$ and $\hat{e}_{i,\text{out}}^k \triangleq \sum_{j \in N_i} \hat{e}_{i \to j}^k$ denote the total estimation passed to and from agent $i$ regarding the $k$-th constraint, respectively. The state transition function $f(x, a)$ takes on the following form: for any state $x = (v, e)$ and action $a = (\hat{v}, \hat{e})$, the ensuing state $\tilde{x} = (\tilde{v}, \tilde{e}) = f(x, a)$ is

$$\tilde{v}_i = v_i + \hat{v}_i, \qquad \tilde{e}_i^k = e_i^k + A_i^k \hat{v}_i + \hat{e}_{i,\text{in}}^k - \hat{e}_{i,\text{out}}^k, \quad k \in M. \quad (7)$$

Let $v(0)$ be the initial value profile and define the initial estimation terms $e(0)$ to satisfy $\sum_{i \in N} e_i^k(0) = \sum_{i \in N} A_i^k v_i(0) - C^k$ for all constraints $k \in M$.^9 Let $x(0) = (v(0), e(0))$ be the initial state. It is straightforward to show that for any sequence of action profiles $a(0), a(1), \ldots$, the resulting state trajectory $x(t) = (v(t), e(t)) = f(x(t-1), a(t-1))$ satisfies

$$\sum_{i \in N} e_i^k(t) = \sum_{i \in N} A_i^k v_i(t) - C^k \quad (8)$$

for all $t \ge 1$ and constraints $k \in M$. Therefore

$$\sum_{i \in N} e_i^k(t) \le 0 \iff \sum_{i \in N} A_i^k v_i(t) - C^k \le 0 \quad (9)$$

for any constraint $k \in M$. Hence, the estimation terms encode information regarding whether the constraints are violated.

Action Sets: The optimization problem in (1) requires that $v_i \in V_i$ for all agents $i \in N$.
Hence, the state-dependent action set of agent $i$ for a given state $x = (v, e)$ is defined as

^9 Note that there are several ways to define the initial estimation terms to satisfy this equality.
$$A_i(x) \triangleq \{(\hat{v}_i, \hat{e}_i) : v_i + \hat{v}_i \in V_i\}.$$

The null action $0$ takes on the form

$$\hat{v}_i = 0, \ \forall i \in N; \qquad \hat{e}_{i \to j}^k = 0, \ \forall i \in N, \ j \in N_i, \ k \in M. \quad (10)$$

Agent Cost Functions: We define the state-based cost function for each agent $i \in N$ as a summation of two terms relating to the objective $\phi$ and the constraints $Av \le C$, respectively:

$$J_i(x, a) = J_i^\phi(x, a) + \mu J_i^c(x, a) \quad (11)$$

where $\mu > 0$ is a trade-off parameter. For any state $x \in X$ and admissible action profile $a \in \prod_{i \in N} A_i(x)$, the components take on the form

$$J_i^\phi(x, a) = M_i(\{\tilde{v}_j\}_{j \in N_i}) \quad (12)$$

$$J_i^c(x, a) = \sum_{j \in N_i} \sum_{k=1}^m [\max(0, \tilde{e}_j^k)]^2 \quad (13)$$

where $\tilde{x} = (\tilde{v}, \tilde{e}) = f(x, a)$ represents the ensuing state and $M_i(\cdot)$ is defined in (3). The following two theorems demonstrate two analytical properties of the designed game. The first establishes that the designed state based game is a state based potential game. We present the proofs of both theorems in the appendix.

Theorem 1: Model the constrained optimization problem in (1) as a state based game $G$ with a fixed trade-off parameter $\mu > 0$ as depicted in Section IV-B. The state based game is a state based potential game with potential function

$$\Phi(x, a) = \phi(\tilde{v}) + \mu \sum_{i \in N} \sum_{k=1}^m [\max(0, \tilde{e}_i^k)]^2 \quad (14)$$

where $\tilde{x} = (\tilde{v}, \tilde{e}) = f(x, a)$ represents the ensuing state.

The next theorem provides a complete characterization of the set of stationary state Nash equilibria in the designed state based game.

Theorem 2: Model the constrained optimization problem in (1) as a state based game $G$ with a fixed trade-off parameter $\mu > 0$ as depicted in Section IV-B. Suppose the objective function $\phi : V \to \mathbb{R}$ is convex and differentiable and the communication graph $G$ is undirected and connected. A state action pair $[x, a] = [(v, e), (\hat{v}, \hat{e})]$ is a stationary state Nash equilibrium of the game $G$ if and only if the following four conditions are satisfied:
(i) The value profile $v$ is an optimal point of the uncoupled constrained optimization problem

$$\min_{v \in V} \ \phi(v) + \frac{\mu}{n} \sum_{k \in M} \Big[\max\Big(0, \sum_{i \in N} A_i^k v_i - C^k\Big)\Big]^2. \quad (15)$$

(ii) The estimation profile $e$ satisfies

$$\max(0, e_i^k) = \frac{1}{n} \max\Big(0, \sum_{i \in N} A_i^k v_i - C^k\Big), \quad \forall i \in N, \ k \in M.$$

(iii) The change in value profile satisfies $\hat{v}_i = 0$ for all agents $i \in N$.

(iv) The net change in estimation profile is 0, i.e., $\hat{e}_{i,\text{in}}^k - \hat{e}_{i,\text{out}}^k = 0$ for all agents $i \in N$ and constraints $k \in M$.

This characterization proves the equivalence between the stationary state Nash equilibria of the designed game and the solutions to the uncoupled constrained optimization problem in (15). Therefore, as $\mu \to \infty$, all equilibria of our designed game are solutions to the coupled constrained optimization problem in (1) [33]. Note that this equivalence is surprising given the fact that we impose a restriction on the underlying decision-making architecture. In particular, we impose a distributed decision-making architecture where each decision-making entity only has access to local information. Recall that such local information is encoded by the communication graph $G$, which is undirected and connected.

C. Design Using Barrier Functions

In this section we introduce our second design, which integrates barrier functions, as opposed to exterior penalty functions, into the agents' cost functions.^10 Both design approaches focus on solving the constrained optimization problem through a sequence of subproblems which generates a trajectory of value profiles $v(1), v(2), \ldots$, that converges to the optimal solution. The key difference between the two approaches lies in the feasibility of the intermediate solutions, as

^10 Much like exterior penalty functions, barrier functions are widely used in optimization for transforming a constrained optimization problem into an equivalent, yet easier to solve, optimization problem where the constraints are translated to penalties that are directly integrated into the objective functions.
More specifically, the constrained optimization problem in (1) is transformed into the following problem: $\min_{v \in V} \phi(v) + \mu B(v)$ subject to the constraints $\{\sum_{i=1}^{n} A_i^k v_i - C^k < 0\}_{k \in M}$, where $B(v)$ is a barrier function associated with the constraints $\{\sum_{i=1}^{n} A_i^k v_i - C^k < 0\}_{k \in M}$ and $\mu > 0$ is a given constant or trade-off parameter. The barrier function is added to the objective function and ultimately prohibits the generated points from leaving the feasible region. Hence, this method generates a feasible point for each $\mu$. When $\mu \to 0$, the limit of those feasible points is an optimal solution to the original problem. See Theorem in [33] for more details.
barrier functions can be used to ensure that the intermediate solutions are in fact feasible. We now introduce the specifics of our designed game.

State Space, Actions, State Dynamics: These three parts are identical to those in Section IV-B.

Admissible Action Sets: Let $x = (v, e)$ represent a feasible state where the value profile $v$ satisfies $Av < C$ and $v_i \in V_i$ for each agent $i \in N$, and the estimation profile $e$ satisfies $e_i^k < 0$ for each $i \in N$ and $k \in M$. Define the admissible action set for each agent $i \in N$ as
$$\mathcal{A}_i(x) \triangleq \Big\{(\hat{v}_i, \hat{e}_i) : v_i + \hat{v}_i \in V_i, \; e_i^k + A_i^k \hat{v}_i - \hat{e}_{i,\text{out}}^k < 0, \; \hat{e}_{i \to j}^k < 0, \; \forall k \in M\Big\}. \tag{16}$$
According to the state transition rule in (7), it is straightforward to verify that for any admissible action profile $a \in \mathcal{A}(x)$, the ensuing state $[\tilde{v}, \tilde{e}] = f(x, a)$ is also feasible. Accordingly, if the initial state $x(0)$ is feasible then all subsequent states $x(1), x(2), \ldots$, generated according to the process $x(t+1) = f(x(t), a(t))$ with $a(t) \in \mathcal{A}(x(t))$ for each time $t \in \{0, 1, \ldots\}$, are also feasible. This implies that the coupled constraint will always be satisfied since for each time $t > 0$ the value profile satisfies $Av(t) < C$ and $v_i(t) \in V_i$ for each agent $i \in N$, and the estimation profile satisfies $e_i^k(t) < 0$ for each $i \in N$ and $k \in M$. Furthermore, note that each agent $i$ can evaluate the admissible action set $\mathcal{A}_i(x)$ using only local information.

Agent Cost Functions: Each agent $i \in N$ is assigned a cost function of the form (11) where the sole difference is the cost function associated with the estimation terms, which now takes on the form
$$J_i^{c}(x, a) = -\sum_{j \in N_i} \sum_{k=1}^{m} \log\big(-\tilde{e}_j^k\big) \tag{17}$$
where $\tilde{x} = (\tilde{v}, \tilde{e}) = f(x, a)$ represents the ensuing state. Here, the cost function $J_i^{c}(\cdot)$ utilizes a barrier function, as opposed to a penalty function, which penalizes values close to the constraint boundary.
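To see the feasibility-preservation argument in action, the sketch below uses a toy instance of our own (a single constraint, a complete three-agent graph, made-up numbers, and $V_i = \mathbb{R}$; the additive transition form follows the agent-3 example in Section VI-A, with the constraint index suppressed). It draws actions consistent with (16) and checks that the ensuing estimation state stays strictly negative.

```python
import random

# Feasibility preservation under (16): start from a feasible estimation state
# (every e_i < 0), pick strictly negative transfers ehat_{i->j} < 0 and value
# changes vhat_i with e_i + A_i*vhat_i - ehat_out < 0, then apply the assumed
# additive transition  e~_i = e_i + A_i*vhat_i - ehat_out + ehat_in.
random.seed(0)
n = 3
A = [1.0, 2.0, 0.5]                          # hypothetical constraint coefficients
e = [-0.5, -1.0, -0.2]                       # feasible estimation state: e_i < 0

ehat = {(i, j): -random.uniform(0.0, 0.3)    # admissible transfers: strictly < 0
        for i in range(n) for j in range(n) if i != j}

vhat = []
for i in range(n):
    out = sum(ehat[(i, j)] for j in range(n) if j != i)
    vhat.append((out - e[i]) / A[i] - 0.05)  # makes e_i + A_i*vhat_i - out = -0.05*A_i < 0

e_next = []
for i in range(n):
    out = sum(ehat[(i, j)] for j in range(n) if j != i)
    inc = sum(ehat[(j, i)] for j in range(n) if j != i)
    e_next.append(e[i] + A[i] * vhat[i] - out + inc)

print([round(x, 3) for x in e_next])         # every entry remains strictly negative
assert all(x < 0 for x in e_next)
```

Since the incoming transfers are themselves negative, the ensuing estimates can only move further inside the feasible region, which is exactly the mechanism the admissible set (16) exploits.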
The following theorem demonstrates that the game design using barrier functions possesses the same analytical properties as the designed game using exterior penalty functions. We omit the proof, as it is virtually identical to the proofs of Theorems 1 and 2.

Theorem 3: Model the constrained optimization problem in (1) as a state based game $G$ with a fixed trade-off parameter $\mu > 0$ as depicted in Section IV-C. The state based game is a state based potential game with potential function
$$\Phi(x, a) = \phi(\tilde{v}) - \mu \sum_{i \in N} \sum_{k=1}^{m} \log\big(-\tilde{e}_i^k\big)$$
where $\tilde{x} = (\tilde{v}, \tilde{e}) = f(x, a)$ represents the ensuing state. Furthermore, suppose the objective function $\phi : V \to \mathbb{R}$ is convex and differentiable and the communication graph $G$ is undirected and connected. A state action pair $[x, a] = [(v, e), (\hat{v}, \hat{e})]$ is a stationary state Nash equilibrium if and only if the following four conditions are satisfied:

(i) The value profile $v$ is an optimal point of the following optimization problem:
$$\min_{v \in V} \; \phi(v) - n\mu \sum_{k \in M} \log\Big(C^k - \sum_{i \in N} A_i^k v_i\Big) \quad \text{s.t.} \; \sum_{i=1}^{n} A_i^k v_i - C^k < 0, \; \forall k \in M. \tag{18}$$

(ii) The estimation profile $e$ satisfies
$$e_i^k = \frac{1}{n}\Big(\sum_{i \in N} A_i^k v_i - C^k\Big), \quad \forall i \in N, k \in M.$$

(iii) The change in value profile satisfies $\hat{v}_i = 0$ for all agents $i \in N$.

(iv) The net change in estimation profile is $0$, i.e., $\hat{e}_{i,\text{in}}^k - \hat{e}_{i,\text{out}}^k = 0$ for all agents $i \in N$ and constraints $k \in M$.

As $\mu \to 0$ all equilibria of our designed game are solutions to the constrained optimization problem in (1) [33].

V. GRADIENT PLAY

In this section we prove that the learning algorithm gradient play [26]-[28] converges to a stationary state Nash equilibrium in any state based potential game. Since the designed games, whether with exterior penalty functions or barrier functions, are state based potential games, gradient play can be utilized to design control laws of the form (2).

A. Gradient Play for State Based Potential Games

Suppose that $\mathcal{A}_i(x)$ is a closed convex set for all $i \in N$ and $x \in X$. Let $x(t)$ represent the state at time $t$. According to the learning algorithm gradient play, each agent $i \in N$ selects an action $a_i(t) \in \mathcal{A}_i(x(t))$ according to
$$a_i(t) = \Big[-\epsilon \frac{\partial J_i(x(t), a)}{\partial a_i}\Big|_{a=0}\Big]^{+} \tag{19}$$
where $[\cdot]^{+}$ represents the projection onto the closed convex set $\mathcal{A}_i(x(t))$ and $\epsilon$ is the step size, a positive constant.
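A minimal sketch of the projected update (19) on a toy state based game of our own construction (not one of the paper's designs): the state lives in $\mathbb{R}^2$ with transition $x(t+1) = x(t) + a(t)$, the potential is $\Phi(x, a) = \phi(x + a)$ with $\phi(y) = (y_1 - 1)^2 + (y_2 + 2)^2$, each agent controls one coordinate, and the admissible action sets are the interval $[-0.1, 0.1]$. The potential evaluated at the null action decreases monotonically, as Theorem 4 below predicts.

```python
# Gradient play (19) on a hypothetical quadratic state based potential game.
def clip(z, lo, hi):                     # projection onto a closed interval
    return max(lo, min(hi, z))

target = [1.0, -2.0]
x = [5.0, 5.0]
eps = 0.4                                # phi has L = 2, so eps = 0.4 < 2/L = 1
phi = lambda y: sum((y[i] - target[i]) ** 2 for i in range(2))

values = [phi(x)]                        # track Phi(x(t), 0) along the trajectory
for _ in range(200):
    grad = [2 * (x[i] - target[i]) for i in range(2)]   # dPhi/da_i at a = 0
    a = [clip(-eps * grad[i], -0.1, 0.1) for i in range(2)]
    x = [x[i] + a[i] for i in range(2)]  # state transition x(t+1) = x(t) + a(t)
    values.append(phi(x))

# monotone decrease of the potential, and convergence to the optimum
assert all(v2 <= v1 + 1e-12 for v1, v2 in zip(values, values[1:]))
print(round(x[0], 3), round(x[1], 3))
```

The clipping step plays the role of the projection $[\cdot]^{+}$; with the small admissible sets the early iterates move at the maximum allowed rate, after which the unconstrained gradient step takes over.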
Theorem 4: Let $G$ be a state based potential game with a potential function $\Phi(x, a)$ that satisfies:

(i) $\Phi(x, a)$ is convex, differentiable, and bounded below.
(ii) There exists a constant $L$ such that for any $x \in X$ and any $a, a' \in \mathcal{A}(x)$, $\|\nabla_a \Phi(x, a) - \nabla_a \Phi(x, a')\|_2 \le L \|a - a'\|_2$, where $\nabla_a \Phi(x, a) \triangleq \big(\frac{\partial \Phi}{\partial a_1}, \ldots, \frac{\partial \Phi}{\partial a_n}\big)$.^{11}
(iii) $\mathcal{A}_i(x)$ is a closed convex set for all $i \in N$ and $x \in X$.

If the step size is smaller than $2/L$, then the state action pair $[x(t), a(t)]$ of the gradient play process in (19) asymptotically converges to a stationary state Nash equilibrium of the form $[x, 0]$.

Proof: We will prove that the potential function $\Phi(x, a) = \Phi(\tilde{x}, 0)$, where $\tilde{x} \triangleq f(x, a)$, is monotonically decreasing during the gradient play process provided that the step size is sufficiently small. The gradient play process in (19) can be expressed using the state based potential function as
$$a_i(t) = \Big[-\epsilon \frac{\partial J_i(x(t), a)}{\partial a_i}\Big|_{a=0}\Big]^{+} = \Big[-\epsilon \frac{\partial \Phi(x(t), a)}{\partial a_i}\Big|_{a=0}\Big]^{+}. \tag{20}$$
Let $x(t+1) = f(x(t), a(t))$. By Definition 1, we know that $\Phi(x(t+1), 0) = \Phi(x(t), a(t))$. Accordingly, we will focus on analyzing the sequence $\Phi(x(t), 0)$, which satisfies
$$\Phi(x(t+1), 0) - \Phi(x(t), 0) = \Phi(x(t), a(t)) - \Phi(x(t), 0) \le a(t)^T \nabla_a \Phi(x(t), a)\big|_{a=0} + \frac{L}{2}\|a(t)\|_2^2$$
where the inequality is based on Proposition A.24 in [34]. By the Projection Theorem [34], we know that
$$\Big(-\epsilon \nabla_a \Phi(x(t), a)\big|_{a=0} - a(t)\Big)^T \big(0 - a(t)\big) \le 0$$
which is equivalent to
$$a(t)^T \nabla_a \Phi(x(t), a)\big|_{a=0} \le -\frac{1}{\epsilon}\, a(t)^T a(t).$$
If the step size is smaller than $\frac{2}{L}$, we have that
$$\Phi(x(t+1), 0) - \Phi(x(t), 0) \le \Big(\frac{L}{2} - \frac{1}{\epsilon}\Big)\|a(t)\|_2^2 \le 0$$
and equality holds if and only if $a(t) = 0$. Therefore, $\Phi(x(t), 0)$ is monotonically decreasing along the trajectory $x(t)$. Since $\Phi(x(t), 0)$ is bounded below, $\Phi(x(t), 0)$ keeps decreasing until it

^{11} This says that $\nabla_a \Phi(x, a)$ is Lipschitz continuous in the variable $a$.
reaches a fixed point, which means $a(t) = 0$. By Lemma 1, we know that $[x(t), a(t)]$ converges to the stationary state Nash equilibrium $[x, 0]$.

The rate of convergence of gradient play depends on the structure of the potential function $\Phi$, the state transition function $f$, and the step size $\epsilon$. A larger $\epsilon$ generally leads to faster convergence but can also lead to instability. The bound on the step size $\epsilon$ in Theorem 4 is conservative, as a larger step size can usually be used without losing stability. Furthermore, the same proof techniques can be used to prove convergence even with heterogeneous step sizes among agents. This implies that agents can update synchronously or asynchronously without altering the asymptotic guarantees.

B. Gradient Play for Designed Game Using Exterior Penalty Functions

The state based game design using penalty functions depicted in Section IV-B is a state based potential game with a potential function that is convex and differentiable. Accordingly, we can implement the gradient play algorithm to reach a stationary state Nash equilibrium provided that $\phi$ is bounded below and $V_i$ is a closed convex set for all $i \in N$. The gradient play algorithm can be described as follows: at each time $t \ge 0$ each agent $i$ selects an action $a_i(t) \triangleq (\hat{v}_i(t), \hat{e}_i(t))$ given the state $x(t) = (v(t), e(t))$ according to
$$\hat{v}_i(t) = \Big[-\epsilon \frac{\partial J_i(x(t), a)}{\partial \hat{v}_i}\Big|_{a=0}\Big]^{+} = \Big[-\epsilon \Big(\frac{\partial \phi(v)}{\partial v_i}\Big|_{v=v(t)} + 2\mu \sum_{k \in M} A_i^k \max\big(0, e_i^k(t)\big)\Big)\Big]^{+} \tag{21}$$
$$\hat{e}_{i \to j}^k(t) = -\epsilon \frac{\partial J_i(x(t), a)}{\partial \hat{e}_{i \to j}^k}\Big|_{a=0} = \epsilon\, 2\mu \Big(\max\big(0, e_i^k(t)\big) - \max\big(0, e_j^k(t)\big)\Big) \tag{22}$$
where $[\cdot]^{+}$ represents the projection onto the closed convex set $\mathcal{A}_i^{\hat{v}}(x(t)) \triangleq \{\hat{v}_i : v_i(t) + \hat{v}_i \in V_i\}$.

C. Gradient Play for Designed Game Using Barrier Functions

The state based game design using barrier functions depicted in Section IV-C is a state based potential game with a potential function that is convex and differentiable. Accordingly, we can implement the gradient play algorithm to reach a stationary state Nash equilibrium.
However, in order to guarantee that the coupled constraints $\{\sum_i A_i^k v_i - C^k < 0\}_{k \in M}$ are satisfied at each time step,
we need to adjust the gradient play algorithm in (19) as follows: at each time $t \ge 0$ each agent $i$ selects an action $a_i(t) \triangleq (\hat{v}_i(t), \hat{e}_i(t))$ given the state $x(t) = (v(t), e(t))$ according to
$$\hat{v}_i = -\beta(t)\,\epsilon \frac{\partial J_i(x(t), a)}{\partial \hat{v}_i}\Big|_{a=0} = -\beta(t)\,\epsilon \Big(\frac{\partial \phi(v)}{\partial v_i}\Big|_{v=v(t)} - \mu \sum_{k \in M} A_i^k \frac{1}{e_i^k}\Big) \tag{23}$$
$$\hat{e}_{i \to j}^k = -\beta(t)\,\epsilon \frac{\partial J_i(x(t), a)}{\partial \hat{e}_{i \to j}^k}\Big|_{a=0} = -\beta(t)\,\epsilon\,\mu \Big(\frac{1}{e_i^k} - \frac{1}{e_j^k}\Big)$$
where $\epsilon$ is the step size, a positive constant, and $\beta(t) = \big(\frac{1}{2}\big)^{k(t)}$ where $k(t)$ is the smallest nonnegative integer $k$ for which
$$\Big(-\big(\tfrac{1}{2}\big)^{k} \epsilon \frac{\partial J_i(x(t), a)}{\partial \hat{v}_i}\Big|_{a=0}, \; -\big(\tfrac{1}{2}\big)^{k} \epsilon \frac{\partial J_i(x(t), a)}{\partial \hat{e}_{i \to j}}\Big|_{a=0}\Big) \in \mathcal{A}_i(x(t)). \tag{24}$$

VI. ILLUSTRATIONS

In this section we illustrate our results on two separate cooperative control problems where the desired global behavior must satisfy a given coupled constraint.

A. Consensus

Consider the consensus problem described in Section III-A. Recall that it was impossible to design agent cost functions of the form (5) which ensured that all resulting Nash equilibria solved the optimization problem in (4). However, the state based game design using exterior penalty functions (Section IV-B) accomplishes this task within the framework of state based games. Note that the state based game design using barrier functions (Section IV-C) is not applicable in this setting since the constraint is an equality constraint.

1) A Simple Example: We will now demonstrate the state based game design using exterior penalty functions on the consensus problem highlighted in Figure 1. First, we express the constraint $\sum_i v_i = \sum_i v_i(0)$ as two inequality constraints $\sum_i v_i \le \sum_i v_i(0)$ and $-\sum_i v_i \le -\sum_i v_i(0)$. Each agent $i$ has a local state variable of the form $x_i \triangleq (v_i, e_i^1, e_i^2)$ where $e_i^1$ and $e_i^2$ are the estimation terms associated with the two inequality constraints. Agent $i$'s action $a_i$ is
$a_i \triangleq (\hat{v}_i, \hat{e}_i^1, \hat{e}_i^2)$ where $\hat{e}_i^k = \{\hat{e}_{i \to j}^k\}_{j \in N_i}$ for $k = 1, 2$. The state transition rule and local cost function are defined in (7) and (11) where $M_i(\{v_j\}_{j \in N_i}) \triangleq \sum_{j \in N_i} \|v_i - v_j\|_2^2$ satisfies (3).

For concreteness, consider agent 3 in the setting of Figure 1.a as an example. A state of agent 3 is of the form $x_3 \triangleq (v_3, e_3^1, e_3^2)$. An action of agent 3 is of the form $a_3 \triangleq (\hat{v}_3, \hat{e}_{3 \to 1}^1, \hat{e}_{3 \to 4}^1, \hat{e}_{3 \to 1}^2, \hat{e}_{3 \to 4}^2)$. The state transition rule is of the form $[\tilde{v}, \tilde{e}] = f([v, e], [\hat{v}, \hat{e}])$ where
$$\tilde{v}_3 = v_3 + \hat{v}_3,$$
$$\tilde{e}_3^1 = e_3^1 + \hat{v}_3 - \hat{e}_{3 \to 1}^1 - \hat{e}_{3 \to 4}^1 + \hat{e}_{1 \to 3}^1 + \hat{e}_{4 \to 3}^1,$$
$$\tilde{e}_3^2 = e_3^2 - \hat{v}_3 - \hat{e}_{3 \to 1}^2 - \hat{e}_{3 \to 4}^2 + \hat{e}_{1 \to 3}^2 + \hat{e}_{4 \to 3}^2.$$
The local cost function of agent 3 is of the form
$$J_3([v, e], [\hat{v}, \hat{e}]) = \|\tilde{v}_3 - \tilde{v}_1\|^2 + \|\tilde{v}_3 - \tilde{v}_4\|^2 + \mu \sum_{k=1,2} \Big(\big[\max\big(0, \tilde{e}_3^k\big)\big]^2 + \big[\max\big(0, \tilde{e}_1^k\big)\big]^2 + \big[\max\big(0, \tilde{e}_4^k\big)\big]^2\Big).$$
Notice that the above design methodology does not depend on the initial value profile or the communication graph, which was a requirement stated in Section III-B. Figure 2 shows the simulation results for the two different settings obtained by applying the state based game design using exterior penalty functions (Section IV-B) and the gradient play algorithm (Section V-B). In the simulation, we set $\mu = 1$ and a fixed step size $\epsilon$. The simulation demonstrates that using the state based game design we can achieve average consensus for both settings. Furthermore, the agents achieve the correct estimates of the coupled constraints.

2) Analytical Properties of Pairwise Decomposable Objective Functions: The simulation results in the previous section highlight that the control design obtained using the state based game design with exterior penalty functions, in conjunction with the gradient play learning algorithm, yields the exact solutions to the consensus problem in (4). However, Theorem 2 suggests that in order to attain such guarantees we need to let $\mu \to \infty$.
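A miniature version of this experiment can be reproduced with the updates (21)-(22). The sketch below uses a hypothetical three-agent path graph with initial values $(0, 3, 9)$ and the equality constraint split into two inequalities as above; the parameters $\mu$, $\epsilon$ and the iteration count are illustrative choices of ours, not the paper's, and $V_i = \mathbb{R}$ so the projection in (21) is trivial.

```python
# Penalty-based design (Section IV-B) with gradient play (21)-(22) on a
# 3-agent path graph; the average of v(0) = (0, 3, 9) is 4.
mu, eps, T = 1.0, 0.02, 20000
nbrs = {0: [1], 1: [0, 2], 2: [1]}
v = [0.0, 3.0, 9.0]
c = sum(v)                                    # conserved quantity: 12
A = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]     # A[k][i] for the two inequalities
C = [c, -c]
# local estimates, initialized so that sum_i e_i^k = sum_i A_i^k v_i - C^k
e = [[A[k][i] * v[i] - C[k] / 3.0 for i in range(3)] for k in range(2)]

for _ in range(T):
    # update (21), with d(phi)/d(v_i) for phi(v) = sum_i sum_{j in N_i} (v_i - v_j)^2
    vhat = [-eps * (4.0 * sum(v[i] - v[j] for j in nbrs[i])
                    + 2.0 * mu * sum(A[k][i] * max(0.0, e[k][i]) for k in range(2)))
            for i in range(3)]
    # update (22): pairwise estimate exchanges along communication links
    ehat = {(k, i, j): eps * 2.0 * mu * (max(0.0, e[k][i]) - max(0.0, e[k][j]))
            for k in range(2) for i in range(3) for j in nbrs[i]}
    for k in range(2):                        # state transition for the estimates
        for i in range(3):
            out = sum(ehat[(k, i, j)] for j in nbrs[i])
            inc = sum(ehat[(k, j, i)] for j in nbrs[i])
            e[k][i] += A[k][i] * vhat[i] - out + inc
    v = [v[i] + vhat[i] for i in range(3)]

print([round(x, 3) for x in v])               # all three values approach the average 4.0
```

Consistent with Proposition 2 below, the fixed $\mu = 1$ suffices here because the consensus objective is pairwise decomposable and the constraint is a single equality.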
The following proposition demonstrates that if the objective function has a special pairwise decomposable structure, such as the objective function in the consensus problem, we can achieve the optimal solution of the original optimization problem in (1) with any fixed $\mu > 0$. We present the proof of the following proposition in the appendix.

Proposition 2: Consider the state based game depicted in Section IV-B with a fixed trade-off parameter $\mu > 0$. Suppose the following conditions are satisfied:
Fig. 2. Convergence of gradient play for the state based game design using exterior penalty functions for consensus problems (a) and (b) in Figure 1, respectively. The left figures show the convergence of the values $v(t)$, demonstrating that all the agents asymptotically reach the average consensus point. The right figures show the convergence of the first estimation terms $e_i^1(t)$ for all agents $i \in N$. Recall that $e_i^1(t) = 0$ for all $i \in N$ implies that the coupled constraint is satisfied.

1) The value set for any agent $i \in N$ is $V_i = \mathbb{R}$.
2) The objective function $\phi$ is convex, differentiable, and pairwise decomposable, i.e.,
$$\phi(v) = \sum_{i \in N} \sum_{j \in N_i} d_{ij}(v_j - v_i) \tag{25}$$
where $d_{ij}(v_j - v_i) = d_{ji}(v_i - v_j)$ for any two agents $i, j \in N$.
3) The constraint is a single equality constraint $\sum_{i \in N} A_i v_i = c$ where $\sum_{i \in N} A_i \neq 0$.
4) The communication graph is undirected and connected.

The state action pair $[x, a] = [(v, e), (\hat{v}, \hat{e})]$ is a stationary state Nash equilibrium if and only if the following four conditions are satisfied:

(i) The value profile $v$ is an optimal point of the constrained optimization problem (1).
(ii) The estimation profile $e$ satisfies $e_i^k = 0$ for all agents $i \in N$ and constraints $k = 1, 2$.^{12}
(iii) The change in value profile $\hat{v}$ satisfies $\hat{v}_i = 0$ for all agents $i \in N$.
(iv) The net change in estimation profile $\hat{e}$ is $0$, i.e., $\hat{e}_{i,\text{in}}^k - \hat{e}_{i,\text{out}}^k = 0$ for all agents $i \in N$ and constraints $k \in M$.

3) A More Complex Example: We simulated a two-dimensional average consensus problem with 61 agents, and the results are presented in Figure 3.^{13} In the simulation, we set $\mu = 1$ and step size $\epsilon = 0.2$. The figure on the left shows the initial conditions, i.e., the agents, the communication graph, the initial values, and the average of the initial values. The figure on the right illustrates the convergence of the dynamics attained by applying the state based game design proposed in Section IV-B in conjunction with the gradient play dynamics in Section V-B.

Figure 4 demonstrates the evolution of $\Phi(x(t), a(t))$, as defined in (14), during the gradient play learning process. Note that $\Phi(x(t), a(t))$ converges to $0$ very quickly. Moreover, the evolution of the individual components of the potential function $\Phi(\cdot)$ which pertain to the value and estimation terms respectively, $\Phi^{v}(x(t), a(t)) = \phi(\tilde{v})$ and $\Phi^{e}(x(t), a(t)) = \mu \sum_{i \in N} \sum_{k=1}^{2} \big[\max\big(0, \tilde{e}_i^k\big)\big]^2$, are also plotted in Figure 4. Since the agents are initially dispersed, the magnitude of $\Phi(x(t), a(t))$ is primarily caused by the value component $\Phi^{v}(x(t), a(t))$. Gradually the agents' values move closer together and the estimation cost $\Phi^{e}(x(t), a(t))$ becomes the main contributor to $\Phi(\cdot)$. This plot demonstrates that agents are usually responding to inaccurate estimates of the value profile; however, these inaccuracies do not impact the limiting behavior of the dynamics.

B. Power Control

Consider an optical channel problem where there is a set of channels $N = \{1, \ldots, n\}$ that are transmitted across an optical fiber by wavelength-multiplexing [10].
The system level objective is to minimize the total power while ensuring that the optical signal-to-noise ratio (OSNR) is larger than a specified threshold. For any channel $i \in N$, let $v_i$ be the optical power at the input, $p_i$ be the optical power at the output, and $n_i^{\text{in}}$ and $n_i^{\text{out}}$ be the noise at the input and output. The OSNR of channel $i \in N$ at the receiver is defined as
$$\text{OSNR}_i = \frac{p_i}{n_i^{\text{out}}} = \frac{\Gamma_{ii} v_i}{n_i^{\text{in}} + \sum_{j \neq i} \Gamma_{ij} v_j}$$
where

^{12} Note that a single equality constraint is represented by two inequality constraints.
^{13} All results contained in this paper immediately extend to multi-dimensional optimization problems, i.e., $d_i \ge 1$.
Fig. 3. The left figure shows the communication graph: green nodes represent agents and their positions represent their initial values $v_i(0)$, which are two dimensional. Blue lines represent the communication links and the red node is the (weighted) average. The right figure demonstrates that the dynamics converge. The XY plane highlights the value of each agent while the Z axis shows the time step. Each curve is a trajectory of one agent's value $v_i(t)$.

Fig. 4. Dynamics of $\Phi(x(t), a(t))$, $\Phi^{v}(x(t), a(t))$, and $\Phi^{e}(x(t), a(t))$ for the two-dimensional average consensus problem.

$\Gamma = [\Gamma_{ij}]$ is a system matrix. We consider a power control problem for this channel, which is defined as:
$$\min_{v_i \ge 0} \; \sum_{i \in N} v_i \quad \text{s.t.} \quad \frac{\Gamma_{ii} v_i}{n_i^{\text{in}} + \sum_{j \neq i} \Gamma_{ij} v_j} \ge r_i, \; \forall i \in N \tag{26}$$
where $r_i$ is the threshold for channel $i$'s OSNR. It is straightforward to observe that the constraints can be transformed into the following linear constraints:
$$r_i n_i^{\text{in}} + r_i \sum_{j \neq i} \Gamma_{ij} v_j - \Gamma_{ii} v_i \le 0, \quad \forall i \in N. \tag{27}$$
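The equivalence between the OSNR constraints in (26) and the linear constraints (27) is easy to check numerically. The instance below ($\Gamma$, $r$, $n^{\text{in}}$) is made up for illustration and is not the paper's four-channel setup.

```python
# Check OSNR_i >= r_i  <=>  r_i*n_i + r_i*sum_{j!=i} Gamma_ij*v_j - Gamma_ii*v_i <= 0
# on a hypothetical 2-channel instance (the interference term is positive, so
# multiplying through by it preserves the inequality direction).
n_in = [0.5, 0.5]
r = [2.0, 3.0]
Gamma = [[4.0, 0.3],
         [0.2, 5.0]]

def osnr(v, i):
    interference = n_in[i] + sum(Gamma[i][j] * v[j] for j in range(2) if j != i)
    return Gamma[i][i] * v[i] / interference

def linear_form(v, i):          # left-hand side of the linear constraint (27)
    return (r[i] * n_in[i]
            + r[i] * sum(Gamma[i][j] * v[j] for j in range(2) if j != i)
            - Gamma[i][i] * v[i])

for v in ([1.0, 1.0], [0.1, 0.2], [2.0, 0.05]):
    for i in range(2):
        assert (osnr(v, i) >= r[i]) == (linear_form(v, i) <= 0.0)
print("equivalence holds on all sampled points")
```

The linear form makes the feasible region a polyhedron, which is what allows the linear programming comparison reported below.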
Moreover, the objective function satisfies (3) with $M_i(v_i) = v_i$. Therefore, we can use the state based game designs depicted in Section IV-B and Section IV-C to solve this channel problem. We consider the following setup with $n = 4$ channels, input noise $n_i^{\text{in}} = 0.5$, OSNR thresholds $r = (4, 5, 3, 2)$, and a given system matrix $\Gamma$ and communication graph.

Figure 5 shows the simulation of the power control problem using the state based game design incorporating exterior penalty functions and the learning algorithm gradient play. We set $\mu = 50$ and a fixed step size $\epsilon$. Figure 5 illustrates that the value profile $v$ converges to a point close to the optimal result, which was obtained using linear programming.

Fig. 5. Power control game design: trajectory of each agent's power level $v_i$ using the gradient play algorithm with the state based game design using exterior penalty functions.

Figure 6 shows the simulation of the power control problem using the state based game design incorporating barrier functions and the learning algorithm gradient play. We set $\mu = 0.05$ and a fixed step size $\epsilon$. Figure 6 illustrates that the value profile $v$ converges to $(v_1, \ldots, v_4) = (14.30, 19.77, 12.98, 6.68)$, which is close to the optimal result. Furthermore, note that the constraints are always satisfied when using the gradient play algorithm for this setting.
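The role of the backtracking factor $\beta(t)$ in (24), namely preserving strict feasibility at every time step, can be sketched on a toy scalar barrier problem of our own (not the power control setup): minimize $v$ subject to $1 - v < 0$, with barrier objective $F(v) = v - \mu \log(v - 1)$. The step size is deliberately large so that the backtracking actually engages; the check is on feasibility, not on convergence.

```python
# Backtracking rule (24) in miniature: scale the gradient step by beta = (1/2)^k
# for the smallest k that keeps the iterate strictly feasible (v > 1).
mu, eps = 0.01, 0.3
v = 5.0                                    # strictly feasible start
triggered = 0                              # counts step halvings
for _ in range(200):
    step = eps * (1.0 - mu / (v - 1.0))    # eps * F'(v)
    beta = 1.0
    while v - beta * step <= 1.0:          # halve until the candidate move is feasible
        beta *= 0.5
        triggered += 1
    v -= beta * step
    assert v > 1.0                         # the constraint holds at every time step

print("still feasible:", v > 1.0, "| backtracking engaged:", triggered > 0)
```

With this deliberately aggressive step size the iterate chatters near the boundary rather than settling, yet it never leaves the feasible region, which mirrors the observation above that the constraints are always satisfied along the barrier-based trajectory.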
More informationDecentralized Quasi-Newton Methods
1 Decentralized Quasi-Newton Methods Mark Eisen, Aryan Mokhtari, and Alejandro Ribeiro Abstract We introduce the decentralized Broyden-Fletcher- Goldfarb-Shanno (D-BFGS) method as a variation of the BFGS
More informationA Novel Iterative Convex Approximation Method
1 A Novel Iterative Convex Approximation Method Yang Yang and Marius Pesavento Abstract In this paper, we propose a novel iterative convex approximation algorithm to efficiently compute stationary points
More informationDefinitions and Proofs
Giving Advice vs. Making Decisions: Transparency, Information, and Delegation Online Appendix A Definitions and Proofs A. The Informational Environment The set of states of nature is denoted by = [, ],
More informationConsensus-Based Distributed Optimization with Malicious Nodes
Consensus-Based Distributed Optimization with Malicious Nodes Shreyas Sundaram Bahman Gharesifard Abstract We investigate the vulnerabilities of consensusbased distributed optimization protocols to nodes
More informationNash Equilibria for Combined Flow Control and Routing in Networks: Asymptotic Behavior for a Large Number of Users
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 47, NO. 6, JUNE 2002 917 Nash Equilibria for Combined Flow Control and Routing in Networks: Asymptotic Behavior for a Large Number of Users Eitan Altman, Senior
More informationCS 781 Lecture 9 March 10, 2011 Topics: Local Search and Optimization Metropolis Algorithm Greedy Optimization Hopfield Networks Max Cut Problem Nash
CS 781 Lecture 9 March 10, 2011 Topics: Local Search and Optimization Metropolis Algorithm Greedy Optimization Hopfield Networks Max Cut Problem Nash Equilibrium Price of Stability Coping With NP-Hardness
More informationPrimal/Dual Decomposition Methods
Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients
More informationA Price-Based Approach for Controlling Networked Distributed Energy Resources
A Price-Based Approach for Controlling Networked Distributed Energy Resources Alejandro D. Domínguez-García (joint work with Bahman Gharesifard and Tamer Başar) Coordinated Science Laboratory Department
More informationUses of duality. Geoff Gordon & Ryan Tibshirani Optimization /
Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear
More informationExponential Moving Average Based Multiagent Reinforcement Learning Algorithms
Exponential Moving Average Based Multiagent Reinforcement Learning Algorithms Mostafa D. Awheda Department of Systems and Computer Engineering Carleton University Ottawa, Canada KS 5B6 Email: mawheda@sce.carleton.ca
More informationOn Scalable Coding in the Presence of Decoder Side Information
On Scalable Coding in the Presence of Decoder Side Information Emrah Akyol, Urbashi Mitra Dep. of Electrical Eng. USC, CA, US Email: {eakyol, ubli}@usc.edu Ertem Tuncel Dep. of Electrical Eng. UC Riverside,
More information5 Handling Constraints
5 Handling Constraints Engineering design optimization problems are very rarely unconstrained. Moreover, the constraints that appear in these problems are typically nonlinear. This motivates our interest
More informationDecentralized Stabilization of Heterogeneous Linear Multi-Agent Systems
1 Decentralized Stabilization of Heterogeneous Linear Multi-Agent Systems Mauro Franceschelli, Andrea Gasparri, Alessandro Giua, and Giovanni Ulivi Abstract In this paper the formation stabilization problem
More informationGame Theoretic Approach to Power Control in Cellular CDMA
Game Theoretic Approach to Power Control in Cellular CDMA Sarma Gunturi Texas Instruments(India) Bangalore - 56 7, INDIA Email : gssarma@ticom Fernando Paganini Electrical Engineering Department University
More informationarxiv: v1 [cs.dc] 4 Oct 2018
Distributed Reconfiguration of Maximal Independent Sets Keren Censor-Hillel 1 and Mikael Rabie 2 1 Department of Computer Science, Technion, Israel, ckeren@cs.technion.ac.il 2 Aalto University, Helsinki,
More informationPrioritized Sweeping Converges to the Optimal Value Function
Technical Report DCS-TR-631 Prioritized Sweeping Converges to the Optimal Value Function Lihong Li and Michael L. Littman {lihong,mlittman}@cs.rutgers.edu RL 3 Laboratory Department of Computer Science
More information1 Lattices and Tarski s Theorem
MS&E 336 Lecture 8: Supermodular games Ramesh Johari April 30, 2007 In this lecture, we develop the theory of supermodular games; key references are the papers of Topkis [7], Vives [8], and Milgrom and
More informationAsymptotics, asynchrony, and asymmetry in distributed consensus
DANCES Seminar 1 / Asymptotics, asynchrony, and asymmetry in distributed consensus Anand D. Information Theory and Applications Center University of California, San Diego 9 March 011 Joint work with Alex
More informationSelf-stabilizing uncoupled dynamics
Self-stabilizing uncoupled dynamics Aaron D. Jaggard 1, Neil Lutz 2, Michael Schapira 3, and Rebecca N. Wright 4 1 U.S. Naval Research Laboratory, Washington, DC 20375, USA. aaron.jaggard@nrl.navy.mil
More informationOptimization and Stability of TCP/IP with Delay-Sensitive Utility Functions
Optimization and Stability of TCP/IP with Delay-Sensitive Utility Functions Thesis by John Pongsajapan In Partial Fulfillment of the Requirements for the Degree of Master of Science California Institute
More informationAN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN
AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN A Thesis Presented to The Academic Faculty by Bryan Larish In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy
More informationThe Multi-Agent Rendezvous Problem - The Asynchronous Case
43rd IEEE Conference on Decision and Control December 14-17, 2004 Atlantis, Paradise Island, Bahamas WeB03.3 The Multi-Agent Rendezvous Problem - The Asynchronous Case J. Lin and A.S. Morse Yale University
More informationTradable Permits for System-Optimized Networks. Anna Nagurney Isenberg School of Management University of Massachusetts Amherst, MA 01003
Tradable Permits for System-Optimized Networks Anna Nagurney Isenberg School of Management University of Massachusetts Amherst, MA 01003 c 2002 Introduction In this lecture, I return to the policy mechanism
More information1520 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 49, NO. 9, SEPTEMBER Reza Olfati-Saber, Member, IEEE, and Richard M. Murray, Member, IEEE
1520 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 49, NO. 9, SEPTEMBER 2004 Consensus Problems in Networks of Agents With Switching Topology and Time-Delays Reza Olfati-Saber, Member, IEEE, and Richard
More informationLinear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space
Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) Contents 1 Vector Spaces 1 1.1 The Formal Denition of a Vector Space.................................. 1 1.2 Subspaces...................................................
More informationDistributed Learning based on Entropy-Driven Game Dynamics
Distributed Learning based on Entropy-Driven Game Dynamics Bruno Gaujal joint work with Pierre Coucheney and Panayotis Mertikopoulos Inria Aug., 2014 Model Shared resource systems (network, processors)
More informationOn Backward Product of Stochastic Matrices
On Backward Product of Stochastic Matrices Behrouz Touri and Angelia Nedić 1 Abstract We study the ergodicity of backward product of stochastic and doubly stochastic matrices by introducing the concept
More informationInternet Monetization
Internet Monetization March May, 2013 Discrete time Finite A decision process (MDP) is reward process with decisions. It models an environment in which all states are and time is divided into stages. Definition
More informationStructural and Multidisciplinary Optimization. P. Duysinx and P. Tossings
Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be
More informationMicroeconomic Algorithms for Flow Control in Virtual Circuit Networks (Subset in Infocom 1989)
Microeconomic Algorithms for Flow Control in Virtual Circuit Networks (Subset in Infocom 1989) September 13th, 1995 Donald Ferguson*,** Christos Nikolaou* Yechiam Yemini** *IBM T.J. Watson Research Center
More informationSupplementary appendix to the paper Hierarchical cheap talk Not for publication
Supplementary appendix to the paper Hierarchical cheap talk Not for publication Attila Ambrus, Eduardo M. Azevedo, and Yuichiro Kamada December 3, 011 1 Monotonicity of the set of pure-strategy equilibria
More informationResearch on Consistency Problem of Network Multi-agent Car System with State Predictor
International Core Journal of Engineering Vol. No.9 06 ISSN: 44-895 Research on Consistency Problem of Network Multi-agent Car System with State Predictor Yue Duan a, Linli Zhou b and Yue Wu c Institute
More informationOn the Approximate Linear Programming Approach for Network Revenue Management Problems
On the Approximate Linear Programming Approach for Network Revenue Management Problems Chaoxu Tong School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853,
More informationThis Dissertation. entitled. Event-triggered distributed algorithms for network optimization. Pu Wan
This Dissertation entitled Event-triggered distributed algorithms for network optimization typeset with nddiss2ε v3.0 (2005/07/27) on November 30, 2009 for Pu Wan This L A TEX2ε classfile conforms to the
More informationQ-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming
MATHEMATICS OF OPERATIONS RESEARCH Vol. 37, No. 1, February 2012, pp. 66 94 ISSN 0364-765X (print) ISSN 1526-5471 (online) http://dx.doi.org/10.1287/moor.1110.0532 2012 INFORMS Q-Learning and Enhanced
More informationExact and Approximate Equilibria for Optimal Group Network Formation
Noname manuscript No. will be inserted by the editor) Exact and Approximate Equilibria for Optimal Group Network Formation Elliot Anshelevich Bugra Caskurlu Received: December 2009 / Accepted: Abstract
More informationIterative Encoder-Controller Design for Feedback Control Over Noisy Channels
IEEE TRANSACTIONS ON AUTOMATIC CONTROL 1 Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels Lei Bao, Member, IEEE, Mikael Skoglund, Senior Member, IEEE, and Karl Henrik Johansson,
More informationRobust Power Control of Optical Networks with Time-Delays. Nemanja Stefanovic
Robust Power Control of Optical Networks with Time-Delays by Nemanja Stefanovic A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of Electrical
More informationPenalty and Barrier Methods. So we again build on our unconstrained algorithms, but in a different way.
AMSC 607 / CMSC 878o Advanced Numerical Optimization Fall 2008 UNIT 3: Constrained Optimization PART 3: Penalty and Barrier Methods Dianne P. O Leary c 2008 Reference: N&S Chapter 16 Penalty and Barrier
More informationCoevolutionary Modeling in Networks 1/39
Coevolutionary Modeling in Networks Jeff S. Shamma joint work with Ibrahim Al-Shyoukh & Georgios Chasparis & IMA Workshop on Analysis and Control of Network Dynamics October 19 23, 2015 Jeff S. Shamma
More informationA Review of Linear Programming
A Review of Linear Programming Instructor: Farid Alizadeh IEOR 4600y Spring 2001 February 14, 2001 1 Overview In this note we review the basic properties of linear programming including the primal simplex
More informationMulti-Agent Learning with Policy Prediction
Multi-Agent Learning with Policy Prediction Chongjie Zhang Computer Science Department University of Massachusetts Amherst, MA 3 USA chongjie@cs.umass.edu Victor Lesser Computer Science Department University
More informationDEPARTMENT OF STATISTICS AND OPERATIONS RESEARCH OPERATIONS RESEARCH DETERMINISTIC QUALIFYING EXAMINATION. Part I: Short Questions
DEPARTMENT OF STATISTICS AND OPERATIONS RESEARCH OPERATIONS RESEARCH DETERMINISTIC QUALIFYING EXAMINATION Part I: Short Questions August 12, 2008 9:00 am - 12 pm General Instructions This examination is
More informationInteger Programming ISE 418. Lecture 8. Dr. Ted Ralphs
Integer Programming ISE 418 Lecture 8 Dr. Ted Ralphs ISE 418 Lecture 8 1 Reading for This Lecture Wolsey Chapter 2 Nemhauser and Wolsey Sections II.3.1, II.3.6, II.4.1, II.4.2, II.5.4 Duality for Mixed-Integer
More informationA Decentralized Approach to Multi-agent Planning in the Presence of Constraints and Uncertainty
2011 IEEE International Conference on Robotics and Automation Shanghai International Conference Center May 9-13, 2011, Shanghai, China A Decentralized Approach to Multi-agent Planning in the Presence of
More informationZeno-free, distributed event-triggered communication and control for multi-agent average consensus
Zeno-free, distributed event-triggered communication and control for multi-agent average consensus Cameron Nowzari Jorge Cortés Abstract This paper studies a distributed event-triggered communication and
More information