A Distributed Regularized Jacobi-type ADMM-Method for Generalized Nash Equilibrium Problems in Hilbert Spaces


Eike Börgens    Christian Kanzow

University of Würzburg, Institute of Mathematics, Campus Hubland Nord, Emil-Fischer-Str. 30, Würzburg, Germany; {eike.boergens,kanzow}@mathematik.uni-wuerzburg.de

March 14, 2018

Abstract

We consider the generalized Nash equilibrium problem in a Hilbert space setting. The joint constraints are eliminated by an augmented Lagrangian-type approach, and we present a fully distributed version by using ideas from alternating direction methods of multipliers (ADMM methods). Convergence follows, under a cocoercivity condition, from the fact that this method can be interpreted as a suitable splitting approach in our Hilbert space endowed with a modified scalar product. This observation also leads to a second algorithmic approach, which yields convergence under a Lipschitz assumption and monotonicity. Numerical results are presented for some examples arising in both finite- and infinite-dimensional Hilbert spaces.

1 Introduction

We consider the generalized Nash equilibrium problem (GNEP) with N players, where the ν-th player's optimization problem is given by

  $\min_{x_\nu \in X_\nu} \; \theta_\nu(x_\nu, x_{-\nu}) + \varphi_\nu(x_\nu) \quad \text{s.t.} \quad \sum_{\mu=1}^N A_\mu x_\mu = b$   (1)

or, more generally,

  $\min_{x_\nu \in X_\nu} \; \theta_\nu(x_\nu, x_{-\nu}) + \varphi_\nu(x_\nu) \quad \text{s.t.} \quad \sum_{\mu=1}^N B_\mu x_\mu - b \in C$   (2)

for all ν = 1, ..., N. Here, $H_\nu$ and $K$ are given Hilbert spaces, $\varphi_\nu : H_\nu \to \mathbb{R}$ are proper, convex, and lower semi-continuous functions, and $\theta_\nu : H_1 \times \dots \times H_N \to \mathbb{R}$ are continuously

Fréchet-differentiable with $\theta_\nu(\cdot, x_{-\nu})$ being convex for any fixed $x_{-\nu}$, $X_\nu \subseteq H_\nu$ are nonempty, closed, and convex sets, $C \subseteq K$ is a nonempty, closed, convex cone, $A_\nu, B_\nu \in L(H_\nu, K)$, and $b \in K$.

Following standard notation in Nash games, we write $x = (x_\nu, x_{-\nu})$, where $x_{-\nu}$ subsumes all the remaining block components $x_\mu$ with $\mu \neq \nu$. This notation is used to emphasize the particular role played by the block component $x_\nu$ within the entire vector $x$ and does not mean that the components of $x$ are re-ordered. In particular, we therefore have $x = (x_\nu, x_{-\nu}) = (x_1, \dots, x_N)$ and, similarly, $(y_\nu, x_{-\nu}) = (x_1, \dots, x_{\nu-1}, y_\nu, x_{\nu+1}, \dots, x_N)$.

We assume that the generalized Nash equilibrium problem (1) has a nonempty feasible set. Since we have explicit constraints $X_\nu$, there is no loss of generality in having $\theta_\nu$ and $\varphi_\nu$ real-valued for all ν = 1, ..., N. Moreover, we do not assume the operators $A_\nu$ to be injective or surjective, a condition that is often used for ADMM-type methods (ADMM = Alternating Direction Method of Multipliers) in the finite-dimensional context, where the matrices $A_\nu$ are assumed to have full rank.

For the sake of notational simplicity, we use the abbreviations
  $H := H_1 \times \dots \times H_N$,  $X := X_1 \times \dots \times X_N \subseteq H$,  $x := (x_1, \dots, x_N) \in H$,
  $Ax := \sum_{\nu=1}^N A_\nu x_\nu$,  $Bx := \sum_{\nu=1}^N B_\nu x_\nu$,  $\varphi(x) := \sum_{\nu=1}^N \varphi_\nu(x_\nu)$,  $F := \{x \in X \mid Ax = b\}$.
Canonically, H becomes a Hilbert space with the scalar product $\langle x, y \rangle := \langle x_1, y_1 \rangle + \dots + \langle x_N, y_N \rangle$; the scalar product in the space $H \times K$ is defined analogously. The symbol $\|\cdot\|$ always denotes the norm induced by the corresponding scalar product (in $H_\nu$, H, K, or $H \times K$); even though it should be clear from the context, we want to put an index on all norms and scalar products except the one in $H \times K$ (and operator norms), in order not to inflate the length of the notation.

There exist many approaches, mainly in finite-dimensional spaces, for the numerical solution of GNEPs, and the interested reader is referred to the survey papers [11, 14] for more details. Here, we are in a Hilbert space setting and present a fully distributed augmented Lagrangian-type method for the generalized Nash equilibrium problems (1) and (2). The idea of this method was influenced by the alternating direction methods of multipliers, see [3, 4, 9, 17, 18, 19, 20], and the recent Lagrangian methods for GNEPs, see [22, 23]. In these latter methods a standard NEP has to be solved in every step, whereas our approach requires only the solution of much simpler optimization problems at each iteration.

The papers that are probably the closest to our approach are [6, 30]. Both articles consider splitting-/ADMM-type methods; in [6], the authors focus on standard NEPs and show afterwards how to solve certain GNEPs under a cocoercivity assumption. However, their method for GNEPs requires a projection step in the full space and is therefore not a distributed approach. On the other hand, the algorithm considered in [30] (for finite-dimensional problems) is fully distributed, but it uses a strong monotonicity and Lipschitz assumption. Somewhat related is the decomposition scheme in [13] (in finite-dimensional spaces), where the convergence theory is carried out for the class of potential games and where relatively strong assumptions for the (non-augmented) joint constraints are necessary.

Here we introduce two fully distributed methods for the solution of the GNEPs from (1) and (2). The first one turns out to be equivalent to a standard splitting method in our

Hilbert space endowed with a different scalar product, and is therefore convergent under a cocoercivity condition. This equivalence observation then leads to a second method which, at some extra computational cost, turns out to be convergent under a Lipschitz and monotonicity assumption; hence no cocoercivity or strong monotonicity of a certain (pseudo-gradient) mapping is required.

This paper is organized as follows. Section 2 contains some basic definitions and preliminary results. Section 3 provides some theoretical background for GNEPs of the form (1). Our first ADMM-type method is stated and analyzed in Section 4. A modification of this approach is investigated in Section 5 and requires only a Lipschitz and monotonicity assumption to guarantee convergence. Section 6 then shows how to extend our approaches to GNEPs with more general constraints as in (2). Some applications and numerical results for problems resulting from finite- and infinite-dimensional GNEPs are presented in Section 7. We close with some final remarks in Section 8.

2 Preliminaries

We first give some basic definitions from set-valued and variational analysis and then recall two splitting-type methods for the solution of operator inclusions, which will play a central role in our convergence analysis. For more details, we refer the reader to the excellent monograph [1].

2.1 Basic Definitions

The distance function in a Hilbert space H of a point $w \in H$ to a nonempty, closed, and convex set $C \subseteq H$ is given by $\operatorname{dist}_C(w) := \inf_{v \in C} \|w - v\|_H$, and $\operatorname{dist}_C^2(w) := \inf_{v \in C} \|w - v\|_H^2$ is the squared distance function. In the case that C is a nonempty, closed, and convex set, this function is Fréchet-differentiable, cf. [1, Cor.].

Given a set-valued operator $T : H \rightrightarrows H$, the domain of T is defined by $\operatorname{dom} T := \{x \in H \mid T(x) \neq \emptyset\}$, the graph of T is given by $\operatorname{graph}(T) := \{(x, u) \in H \times H \mid u \in T(x)\}$, and the inverse of T is defined through its graph by $\operatorname{graph}(T^{-1}) := \{(u, x) \in H \times H \mid u \in T(x)\}$, so that $u \in T(x)$ if and only if $x \in T^{-1}(u)$. The set of all points x satisfying $0 \in T(x)$ is denoted by $\operatorname{zer}(T)$.

The set-valued operator T is called monotone if
  $\langle u - v, x - y \rangle \geq 0$ for all $(x, u), (y, v) \in \operatorname{graph} T$,
and maximally monotone if it is monotone and its graph is not properly contained in the graph of another monotone operator. A set-valued operator T is called strongly monotone if there is a constant ρ > 0 such that
  $\langle u - v, x - y \rangle \geq \rho \|x - y\|^2$ for all $(x, u), (y, v) \in \operatorname{graph} T$.

The subdifferential of a convex function $f : H \to (-\infty, +\infty]$ at a point $\bar x$ is defined by
  $\partial f(\bar x) := \{ s \in H \mid f(x) \geq f(\bar x) + \langle s, x - \bar x \rangle \ \text{for all } x \in H \},$
and is known to be maximally monotone.

A single-valued operator $T : H \to H$ is called α-cocoercive for some α > 0 if
  $\langle T(x) - T(y), x - y \rangle \geq \alpha \|T(x) - T(y)\|^2$ for all $x, y \in H$.
From the Cauchy-Schwarz inequality, we see that every α-cocoercive operator is 1/α-Lipschitz continuous. If T is strongly monotone with modulus ρ and Lipschitz continuous with constant L, then T is also $\rho/L^2$-cocoercive. On the other hand, cocoercivity does not imply strong monotonicity. This observation will play an important role in our analysis, since a certain operator to be defined later will never be strongly monotone but will be cocoercive under suitable conditions.

We also recall that the normal cone is defined by
  $N_X(x) := \{ s \in H \mid \langle s, y - x \rangle \leq 0 \ \text{for all } y \in X \}$ if $x \in X$, and $N_X(x) := \emptyset$ if $x \notin X$.

The Hilbert space adjoint of a linear operator $A : X \to K$ will be denoted by $A^*$. We call $A \in L(H)$ self-adjoint if $A = A^*$. The following result summarizes some properties of self-adjoint and strongly monotone operators.

Lemma 2.1. Let H be a Hilbert space with scalar product $\langle \cdot, \cdot \rangle$, and let $Q \in L(H)$ be self-adjoint and strongly monotone. Then the following statements hold:
a) $\langle Q^2 x, x \rangle \leq \|Q\| \langle Qx, x \rangle$ for all $x \in H$.
b) The inverse $Q^{-1}$ exists and is also self-adjoint and strongly monotone.
c) $\|x\|^2 \leq \|Q^{-1}\| \, \|x\|_Q^2$, where $\|\cdot\|_Q$ denotes the norm induced by the scalar product $\langle x, y \rangle_Q := \langle Qx, y \rangle$.

Proof. a) Let $B := Q / \|Q\|$. Then we have $\langle Bx, x \rangle \geq 0$ and $\langle (I - B)x, x \rangle = \|x\|^2 - \langle Bx, x \rangle \geq 0$ by the Cauchy-Schwarz inequality, and both B and I − B are self-adjoint. Since $B - B^2 = B(I - B)B + (I - B)B(I - B)$, we therefore obtain
  $\langle (B - B^2)x, x \rangle = \langle B(I - B)Bx, x \rangle + \langle (I - B)B(I - B)x, x \rangle = \langle (I - B)Bx, Bx \rangle + \langle B(I - B)x, (I - B)x \rangle \geq 0.$
Hence $\langle B^2 x, x \rangle \leq \langle Bx, x \rangle$ for all $x \in H$. Thus a) follows from the definition of B.
b) We only need to show that $Q^{-1}$ is strongly monotone. To this end, take an arbitrary $y \in H$. Applying a) with $x := Qy$, we obtain
  $\|x\|^2 = \langle x, x \rangle = \langle Qy, Qy \rangle = \langle Q^2 y, y \rangle \leq \|Q\| \langle Qy, y \rangle = \|Q\| \langle x, Q^{-1} x \rangle;$
hence $Q^{-1}$ is strongly monotone with modulus $1/\|Q\|$.
c) Applying a) to $x := Q^{-1} y$ and using b) yields
  $\|x\|^2 = \langle x, x \rangle = \langle Q^{-1} y, Q^{-1} y \rangle = \langle Q^{-2} y, y \rangle \leq \|Q^{-1}\| \langle Q^{-1} y, y \rangle = \|Q^{-1}\| \langle x, Qx \rangle = \|Q^{-1}\| \langle x, x \rangle_Q = \|Q^{-1}\| \|x\|_Q^2.$
Noting that this holds for all $x \in H$ gives the desired result.

Note that statement a) of Lemma 2.1 already holds for monotone operators Q.
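As a quick finite-dimensional sanity check (not part of the paper's argument), the following NumPy snippet verifies statements a) and c) of Lemma 2.1 for a random symmetric positive definite matrix playing the role of Q.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random self-adjoint (symmetric) and strongly monotone (positive definite) Q.
n = 6
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)
Q_norm = np.linalg.norm(Q, 2)        # operator norm ||Q||
Q_inv = np.linalg.inv(Q)

x = rng.standard_normal(n)

# a) <Q^2 x, x> <= ||Q|| <Qx, x>
lhs_a = x @ (Q @ Q @ x)
rhs_a = Q_norm * (x @ (Q @ x))

# c) ||x||^2 <= ||Q^{-1}|| * ||x||_Q^2, with ||x||_Q^2 = <Qx, x>
lhs_c = x @ x
rhs_c = np.linalg.norm(Q_inv, 2) * (x @ (Q @ x))

print(lhs_a <= rhs_a + 1e-10, lhs_c <= rhs_c + 1e-10)   # both True
```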

2.2 Splitting-type Methods

To analyze the convergence properties of the two algorithms that will be presented in subsequent sections, we recall the definition and the corresponding properties of two basic splitting-type methods for problems of the type
  $0 \in Ax + Bx$
for suitable operators $A : H \rightrightarrows H$ and $B : H \to H$ (note that A is allowed to be multi-valued, whereas B is single-valued).

The idea of the forward-backward method for solving the above problem is based on the reformulation (with some β > 0)

  $0 \in Ax + Bx \iff -Bx \in Ax \iff (I - \beta B)x \in (I + \beta A)x \iff x \in (I + \beta A)^{-1}(I - \beta B)x.$   (3)

Assuming A to be maximally monotone implies that the last inclusion is actually an equation, so we obtain the characterization $x = (I + \beta A)^{-1}(I - \beta B)x$. This characterization motivates the corresponding fixed point iteration $x^{k+1} := (I + \beta A)^{-1}(I - \beta B)x^k$, which can be written equivalently as

  $y^k := x^k - \beta B x^k, \qquad x^{k+1} := (I + \beta A)^{-1} y^k.$   (4)

The computations of $y^k$ and $x^{k+1}$ are called the forward and the backward step, respectively, which motivates the name forward-backward method. Provided that A is maximally monotone, B is α-cocoercive for some α > 0, the parameter β is taken such that β ∈ (0, 2α), and assuming that $\operatorname{zer}(A + B) \neq \emptyset$, one can show that a sequence $\{x^k\}$ generated by the forward-backward scheme (4) converges weakly to a point in $\operatorname{zer}(A + B)$; see [1, Thm.] for more details and further convergence properties.

The second splitting method that we recall here is due to Tseng [28], see also [1]. For the application we have in mind, we only restate a simplified version of that method. The main advantage of Tseng's approach is that it replaces the cocoercivity assumption for B by a simpler condition. More precisely, assume that B is Lipschitz continuous with constant 1/α, and let β ∈ (0, α), hence β/α < 1. Then it is easy to see that $I - \beta B$ is strongly monotone and, therefore, a bijection. Consequently, we can further rewrite (3) as follows:

  $0 \in Ax + Bx \iff x = (I + \beta A)^{-1}(I - \beta B)x \iff (I - \beta B)x = (I - \beta B)(I + \beta A)^{-1}(I - \beta B)x \iff x = \big( (I - \beta B)(I + \beta A)^{-1}(I - \beta B) + \beta B \big) x.$

This motivates the fixed point iteration $x^{k+1} = \big( (I - \beta B)(I + \beta A)^{-1}(I - \beta B) + \beta B \big) x^k$, which can be rewritten as

  $y^k := x^k - \beta B x^k, \qquad z^k := (I + \beta A)^{-1} y^k, \qquad x^{k+1} := z^k + \beta \big( B x^k - B z^k \big).$

This justifies the alternative name forward-backward-forward splitting method for Tseng's approach. Provided that A is maximally monotone, B is monotone and 1/α-Lipschitz continuous for some α > 0, A + B is maximally monotone, the parameter β is taken such that β ∈ (0, α), and assuming that $\operatorname{zer}(A + B) \neq \emptyset$, one can show that the sequences $\{x^k\}$ and $\{z^k\}$ generated by this forward-backward-forward scheme converge weakly to a point in $\operatorname{zer}(A + B)$; see [1, Thm.] for more details and further convergence properties.
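Both schemes are easy to prototype in finite dimensions. The following NumPy sketch (an illustration only, with made-up data) applies them to the toy inclusion $0 \in N_{\mathbb{R}^n_+}(x) + (Mx + q)$, where the resolvent of the normal cone is the projection onto the nonnegative orthant and the symmetric positive semidefinite M makes B both Lipschitz and cocoercive.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
R = rng.standard_normal((n, n))
M = R @ R.T                      # symmetric PSD: B is monotone, Lipschitz,
q = rng.standard_normal(n)       # and (1/lambda_max)-cocoercive
B = lambda x: M @ x + q
resolvent = lambda y: np.maximum(y, 0.0)   # (I + beta*A)^{-1} = Proj onto R^n_+

L = np.linalg.norm(M, 2)         # Lipschitz constant of B
alpha = 1.0 / L                  # cocoercivity modulus (symmetric PSD case)

def forward_backward(x, beta, iters=3000):
    for _ in range(iters):
        x = resolvent(x - beta * B(x))     # forward step, then backward step
    return x

def forward_backward_forward(x, beta, iters=3000):   # Tseng's method
    for _ in range(iters):
        y = x - beta * B(x)                # forward
        z = resolvent(y)                   # backward
        x = z + beta * (B(x) - B(z))       # second forward (correction) step
    return x

x_fb  = forward_backward(np.zeros(n), beta=1.5 * alpha)          # beta in (0, 2*alpha)
x_fbf = forward_backward_forward(np.zeros(n), beta=0.9 * alpha)  # beta in (0, alpha)
print("FB  solution:", np.round(x_fb, 4))
print("FBF solution:", np.round(x_fbf, 4))
```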

3 Generalized Nash Equilibrium Problems

In this work, we consider a special kind of solution of problem (1). To this end, let
  $\Psi(x, y) := \sum_{\nu=1}^N \big( \theta_\nu(x_\nu, x_{-\nu}) + \varphi_\nu(x_\nu) - \theta_\nu(y_\nu, x_{-\nu}) - \varphi_\nu(y_\nu) \big)$
be the Nikaido-Isoda function of the GNEP from (1). Then $x \in H$ is called a normalized equilibrium or a variational equilibrium of (1) if $\sup_{y \in F} \Psi(x, y) = 0$. Following [11], for example, it is not difficult to see that every variational equilibrium is a generalized Nash equilibrium of the GNEP.

We also introduce the pseudo-gradient $P_\theta : H \to H$ of the functions $\theta_\nu$ from (1) by

  $P_\theta(x) := \big( \nabla_{x_1} \theta_1(x_1, x_{-1}), \dots, \nabla_{x_N} \theta_N(x_N, x_{-N}) \big).$   (5)

Further note that the definition of ϕ yields
  $\partial\varphi(x) = \partial\varphi_1(x_1) \times \dots \times \partial\varphi_N(x_N).$
If all $\varphi_\nu$ are differentiable, we notice that $\partial\varphi = \nabla\varphi = P_\varphi$. This notation allows us to extend a known result from finite-dimensional GNEPs (see, e.g., [11]) to our Hilbert space setting.

Theorem 3.1. Under the given convexity and smoothness assumptions on $\theta_\nu$ and $\varphi_\nu$, $x^*$ is a variational equilibrium of (1) if and only if
  $0 \in \partial\varphi(x^*) + P_\theta(x^*) + N_F(x^*).$

Proof. By definition, $x^*$ is a variational equilibrium of (1) if and only if $\sup_{y \in F} \Psi(x^*, y) = 0$ or, equivalently, if $\Psi(x^*, y) \leq 0$ for all $y \in F$. In turn, this means that $x^*$ solves the problem

  $\min_{y \in F} \; \sum_{\nu=1}^N \big( \theta_\nu(y_\nu, x^*_{-\nu}) + \varphi_\nu(y_\nu) \big).$   (6)

Since the objective function is convex as a mapping of y, it follows, using the notation introduced before, that (6) is equivalent to $0 \in \partial\varphi(x^*) + P_\theta(x^*) + N_F(x^*)$.

Note that the previous result remains true for more general convex sets X, not necessarily given as in our framework from Section 1. Under certain regularity conditions we can characterize the normal cone $N_F(x^*)$. This leads to a particular notion of a KKT point for which we use the following terminology.

Definition 3.2. A pair $(x, \mu) \in X \times K$ is called a variational KKT point of (1) if it satisfies the following KKT-type conditions:
  $0 \in \partial\varphi(x) + P_\theta(x) + A^*\mu + N_X(x) \quad \text{and} \quad 0 = b - Ax.$

Note that a variational KKT point has to be feasible with respect to the abstract constraints X, whereas it exploits the existence of a multiplier for the equality constraints. This setting is useful for our ADMM-type method, where only the linear constraints are penalized, whereas the abstract constraints remain unchanged. The following result clarifies the relation between variational equilibria and variational KKT points of problem (1).

Theorem 3.3. The following statements hold:
a) If $(x^*, \mu^*) \in H \times K$ is a variational KKT pair of (1), then $x^*$ is a variational equilibrium.
b) Conversely, assume that $A \in L(H, K)$ has closed range, and that $\operatorname{int} X \cap \{x \in H \mid Ax = b\} \neq \emptyset$. If $x^* \in F$ is a variational equilibrium of (1), then there exists a multiplier $\mu^*$ such that $(x^*, \mu^*) \in H \times K$ is a variational KKT pair of (1).

Proof. Recall that $F = \{x \in X \mid Ax = b\} = X \cap Y$ with $Y := \{x \mid Ax = b\}$ being the preimage of b under A. Since A is continuous, it follows that Y is also closed and convex. By assumption, X is a closed, convex set. Moreover, an easy calculation shows that
  $N_Y(x) = \operatorname{Ker}(A)^\perp = \overline{\operatorname{Range}(A^*)} \supseteq \operatorname{Range}(A^*)$
for any $x \in Y$, where we used a lemma related to the Banach closed range theorem, cf. [8, Thm. 8.11]. Since the inclusion $N_X(x) + N_Y(x) \subseteq N_F(x)$ always holds for any $x \in F$, statement a) follows from
  $N_X(x^*) + A^*\mu^* \subseteq N_X(x^*) + \operatorname{Range}(A^*) \subseteq N_X(x^*) + N_Y(x^*) \subseteq N_F(x^*)$
together with Theorem 3.1. Statement b) can be verified similarly by noting that, under the given assumption, the equality $N_X(x^*) + N_Y(x^*) = N_F(x^*)$ holds, cf. [1, Prop.].

In our subsequent algorithm for the solution of the GNEP from (1), we will compute a variational KKT point. Theorem 3.3 shows that this always yields a variational equilibrium, and that this approach is actually equivalent to finding a variational equilibrium under a certain regularity condition.

In order to rewrite the variational KKT conditions in a more compact form, let us further introduce the notation
  $W := X_1 \times \dots \times X_N \times K, \qquad w := (x_1, \dots, x_N, \mu), \qquad \psi(w) := \varphi(x),$
where the last expression is just a formal re-definition of the mapping ϕ, with the only difference being that ψ is viewed as a function of all variables w, whereas ϕ depends only on x. Hence

  $\partial\psi(w) = \partial\varphi(x) \times \{0\},$   (7)

where the corresponding subdifferentials are taken with respect to w and x, respectively. Moreover, we define the pseudo-gradient as a mapping of the whole vector $w = (x, \mu)$ by

  $\tilde P_\theta(w) := \big( P_\theta(x), 0 \big).$   (8)

Finally, let us define $G : H \times K \to H \times K$ by

  $G(w) := \Big( A_1^*\mu, \; \dots, \; A_N^*\mu, \; b - \sum_{\nu=1}^N A_\nu x_\nu \Big).$   (9)

The particular structure of G immediately yields the following result.

Lemma 3.4. The mapping G as defined in (9) satisfies $\langle G(w) - G(\bar w), w - \bar w \rangle = 0$ for all $w, \bar w \in W$; in particular, G is a continuous monotone operator.

The above notation gives a compact representation of the variational KKT conditions.

Lemma 3.5. The vector pair $w^* = (x^*, \mu^*) \in X \times K$ is a variational KKT point of (1) if and only if $w^* \in W^*$, where
  $W^* := \{ w \in W \mid 0 \in \partial\psi(w) + \tilde P_\theta(w) + G(w) + N_W(w) \}.$

Proof. The proof follows from the previous definitions, taking into account that, due to the Cartesian structure of W, we have $N_W(w) = N_{X_1}(x_1) \times \dots \times N_{X_N}(x_N) \times N_K(\mu)$ and $N_K(\mu) = \{0\}$ since K is the entire space.

Let us define the multifunctions

  $T(w) := T_1(w) + T_2(w) \quad \text{with} \quad T_1(w) := \tilde P_\theta(w), \quad T_2(w) := \partial\psi(w) + G(w) + N_W(w).$   (10)

The domains of T and $T_2$ are obviously given by the nonempty set W, while the domain of $T_1 = \tilde P_\theta$ is the whole space $H \times K$. Then the set $W^*$ from Lemma 3.5 can be expressed as $W^* = \{w \in W \mid 0 \in T_1(w) + T_2(w)\}$. This indicates that the set-valued mappings $T_1$ and $T_2$ play a central role in our analysis. Their most important properties are formulated in the following result.

Proposition 3.6. Suppose that the pseudo-gradient $\tilde P_\theta : H \times K \to H \times K$ defined in (8) is a monotone mapping. Then the three set-valued functions T, $T_1$, and $T_2$ defined in (10) are maximally monotone.

Proof. Using the fact that $\tilde P_\theta$ is monotone, continuous, and single-valued, it is maximally monotone by [1, Cor.]. Since G is a continuous monotone function in view of Lemma 3.4 and $\operatorname{dom} G = H \times K$, it follows that $B := G + N_W$ is a maximally monotone operator, see, e.g., [1, Cor. 25.5]. Since the domain of the operator $A := \partial\psi$ is also equal to the entire space $H \times K$ (see [1, Cor. combined with Prop.]) and $\operatorname{dom}(B) \neq \emptyset$, it follows again from [1, Cor. 25.5] that $T_2 = \partial\psi + G + N_W = A + B$ is also maximally monotone. Similarly, it follows that T is maximally monotone by $\operatorname{dom}(\tilde P_\theta) = H \times K$ and $\operatorname{dom} T_2 \neq \emptyset$.

The monotonicity of $\tilde P_\theta$ or, equivalently, of the pseudo-gradient $P_\theta$, as required in the assumption of Proposition 3.6, is a standard condition used in the context of GNEPs and typically represents a minimal assumption on the given GNEP in order to prove convergence of suitable methods to a solution of the GNEP.
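For a concrete picture of the operator G from (9), the following NumPy sketch builds it for random finite-dimensional data and verifies the identity of Lemma 3.4 numerically; dimensions and data are made up for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 3, 4                                  # players and dimension of K
n = [2, 3, 2]                                # dimensions of H_1, ..., H_N
A = [rng.standard_normal((m, k)) for k in n]
b = rng.standard_normal(m)

def G(x_blocks, mu):
    """G(w) = (A_1^T mu, ..., A_N^T mu, b - sum_nu A_nu x_nu), cf. (9)."""
    grads = [A_nu.T @ mu for A_nu in A]
    residual = b - sum(A_nu @ x_nu for A_nu, x_nu in zip(A, x_blocks))
    return grads, residual

def inner(dx_blocks, dmu, ex_blocks, emu):
    return sum(d @ e for d, e in zip(dx_blocks, ex_blocks)) + dmu @ emu

# Lemma 3.4: <G(w1) - G(w2), w1 - w2> = 0 for any two points w1, w2.
x1 = [rng.standard_normal(k) for k in n]; mu1 = rng.standard_normal(m)
x2 = [rng.standard_normal(k) for k in n]; mu2 = rng.standard_normal(m)
g1x, g1mu = G(x1, mu1)
g2x, g2mu = G(x2, mu2)
val = inner([u - v for u, v in zip(g1x, g2x)], g1mu - g2mu,
            [u - v for u, v in zip(x1, x2)], mu1 - mu2)
print(abs(val) < 1e-10)   # True: the monotonicity inequality holds with equality
```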

4 Regularized Jacobi-type ADMM Method

The following regularized Jacobi-type method for the solution of (1) will be investigated in this section. Its basic idea is the following: the joint constraints get augmented in order to obtain a separable structure in the remaining constraints. We then use the ADMM idea and view the resulting optimization problems of each player as minimization problems in the variables $x_\nu$ alone. Note that we also use a linearization of the smooth part $\theta_\nu$ in our subproblems, which might simplify the solution of the resulting subproblems significantly (whereas the possibly nonsmooth term $\varphi_\nu$ remains unchanged). Finally, a proximal term is added to improve the convergence properties and to make the overall method well-defined.

Algorithm 4.1 (Regularized linearized Jacobi-type ADMM Method).
(S.0) Choose a starting point $(x^0, \mu^0) \in X \times K$, parameters β, γ > 0, and set k := 0.
(S.1) If a suitable termination criterion is satisfied: STOP.
(S.2) For ν = 1, ..., N, compute

  $x^{k+1}_\nu := \arg\min_{x_\nu \in X_\nu} \Big\{ \varphi_\nu(x_\nu) + \langle \nabla_{x_\nu}\theta_\nu(x^k_\nu, x^k_{-\nu}), x_\nu - x^k_\nu \rangle_{H_\nu} + \langle \mu^k, A_\nu x_\nu \rangle_K + \frac{\beta}{2} \Big( \big\| A_\nu x_\nu + \sum_{\mu\neq\nu} A_\mu x^k_\mu - b \big\|_K^2 + \gamma \|x_\nu - x^k_\nu\|_{H_\nu}^2 \Big) \Big\}.$   (11)

(S.3) Define

  $\mu^{k+1} := \mu^k + \beta \Big( \sum_{\mu=1}^N A_\mu x^{k+1}_\mu - b \Big).$   (12)

(S.4) Set k ← k + 1, and go to (S.1).

Throughout our convergence analysis, we assume implicitly that Algorithm 4.1 generates an infinite number of iterates. We further note that all subproblems (11) are strongly convex for all ν and all iterations k. Hence $x^{k+1} := (x^{k+1}_1, \dots, x^{k+1}_N)$ is uniquely defined. Note that this is due to the quadratic regularization term, which does not occur in standard ADMM methods for two (or more) components.

The main computational overhead in Algorithm 4.1 comes from the solution of the optimization subproblems in (S.2). However, in contrast to augmented Lagrangian-type methods [22, 23], these subproblems are only optimization problems and not Nash equilibrium problems themselves. Moreover, the subproblems occurring in (S.2) can typically be solved in an efficient way, sometimes even analytically; a small finite-dimensional illustration is sketched below.
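To make the structure of the subproblems concrete, here is a minimal NumPy sketch of Algorithm 4.1 for a toy quadratic game with $\varphi_\nu \equiv 0$, $X_\nu = \mathbb{R}^{n_\nu}$, and $\theta_\nu(x) = \tfrac12\|x_\nu\|^2 + c_\nu^\top x_\nu$, so that every subproblem (11) reduces to a small linear system. All data are made up for illustration; this is not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
N, m = 3, 4
n = [2, 3, 2]
A = [rng.standard_normal((m, k)) for k in n]
c = [rng.standard_normal(k) for k in n]
b = rng.standard_normal(m)

beta = 0.5                         # P_theta is 1-cocoercive here, so beta in (0, 2)
gamma = 1.0 / beta**2 + (N - 1) * max(np.linalg.norm(Ai, 2)**2 for Ai in A)  # Thm 4.8 via Lemma 4.3

x = [np.zeros(k) for k in n]
mu = np.zeros(m)

for _ in range(3000):
    x_new = []
    for nu in range(N):                                   # Jacobi sweep, step (S.2)
        g = x[nu] + c[nu]                                 # gradient of theta_nu at x^k
        r = sum(A[mv] @ x[mv] for mv in range(N) if mv != nu) - b
        lhs = beta * (A[nu].T @ A[nu]) + beta * gamma * np.eye(n[nu])
        rhs = beta * gamma * x[nu] - g - A[nu].T @ mu - beta * (A[nu].T @ r)
        x_new.append(np.linalg.solve(lhs, rhs))           # closed-form solution of (11)
    mu = mu + beta * (sum(A[nu] @ x_new[nu] for nu in range(N)) - b)   # step (S.3), eq. (12)
    x = x_new

print("constraint residual:", np.linalg.norm(sum(A[nu] @ x[nu] for nu in range(N)) - b))
```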

We next investigate the convergence properties of Algorithm 4.1. The main idea of our analysis is to interpret Algorithm 4.1, after a simple linear transformation, as a forward-backward splitting method applied to a suitable inclusion problem in an appropriate Hilbert space. To this end, let us introduce the linear operator $M \in L(H)$ by

  $Mx := \Big( \sum_{\mu\neq 1} A_1^* A_\mu x_\mu, \; \dots, \; \sum_{\mu\neq N} A_N^* A_\mu x_\mu \Big), \quad \text{i.e.} \quad (Mx)_\nu = A_\nu^* \sum_{\mu\neq\nu} A_\mu x_\mu.$   (13)

It is not difficult to see that M is self-adjoint. Furthermore, define $Q_{\beta,\gamma} \in L(H \times K)$ by

  $Q_{\beta,\gamma}(x, \mu) := \big( \beta^2(\gamma x - Mx), \; \mu \big),$   (14)

where β, γ denote the constants from Algorithm 4.1. The following simple remark plays a crucial role in our subsequent convergence analysis.

Remark 4.2. Since M from (13) is self-adjoint, it follows that $Q_{\beta,\gamma}$ from (14) is also self-adjoint. Moreover, for all γ > 0 sufficiently large (say γ > ‖M‖), $Q_{\beta,\gamma}$ is also strongly monotone. This implies that $Q_{\beta,\gamma}$ is both injective and surjective. Hence the inverse of $Q_{\beta,\gamma} \in L(H \times K)$ exists and is also a linear, continuous, and self-adjoint operator.

We first estimate the norm of the operator M, since this will be required later.

Lemma 4.3. It holds that $\|M\| \leq (N-1) \max_{\nu=1,\dots,N} \{ \|A_\nu\|^2 \}$.

Proof. The definition of M yields

  $|\langle Mx, x \rangle_H| = \Big| \sum_{\nu=1}^N \Big\langle A_\nu^* \sum_{\mu\neq\nu} A_\mu x_\mu, x_\nu \Big\rangle_{H_\nu} \Big| = \Big| \sum_{\nu=1}^N \sum_{\mu\neq\nu} \langle A_\mu x_\mu, A_\nu x_\nu \rangle_K \Big| \leq \sum_{\nu=1}^N \sum_{\mu\neq\nu} \tfrac{1}{2} \big( \|A_\nu x_\nu\|_K^2 + \|A_\mu x_\mu\|_K^2 \big) = (N-1) \sum_{\nu=1}^N \|A_\nu x_\nu\|_K^2 \leq (N-1) \sum_{\nu=1}^N \|A_\nu\|^2 \|x_\nu\|_{H_\nu}^2 \leq (N-1) \max_{\nu=1,\dots,N} \{ \|A_\nu\|^2 \} \, \|x\|_H^2,$

where subscripts for the underlying operator norms are omitted for the sake of notational convenience. Since M is self-adjoint, we have $\|M\| = \sup_{x \neq 0} |\langle Mx, x \rangle_H| / \|x\|_H^2$, cf. [7, Thm.]. We therefore obtain the desired estimate.
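In finite dimensions the bound of Lemma 4.3 is easy to check numerically. The following sketch assembles M from (13) for random matrices $A_\nu$ (made-up data, illustration only) and compares its spectral norm with the bound.

```python
import numpy as np

rng = np.random.default_rng(4)
N, m = 4, 5
n = [3, 2, 4, 3]
A = [rng.standard_normal((m, k)) for k in n]

# Block (nu, mu) of M is A_nu^T A_mu for mu != nu, and 0 for mu = nu.
blocks = [[A[nu].T @ A[mu] if mu != nu else np.zeros((n[nu], n[mu]))
           for mu in range(N)] for nu in range(N)]
M = np.block(blocks)

norm_M = np.linalg.norm(M, 2)
bound = (N - 1) * max(np.linalg.norm(Ai, 2)**2 for Ai in A)
print(f"||M|| = {norm_M:.3f} <= {bound:.3f} = (N-1) max ||A_nu||^2: {norm_M <= bound}")
```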

The next result gives a suitable reformulation of the optimality conditions for the subproblems from (11) and (12).

Lemma 4.4. The vector $w^{k+1} = (x^{k+1}, \mu^{k+1})$ computed in (11) and (12) is characterized by the inclusion

  $\big( I - \beta Q_{\beta,\gamma}^{-1} \tilde P_\theta \big)(w^k) \in \big( I + \beta Q_{\beta,\gamma}^{-1}(\partial\psi + G + N_W) \big)(w^{k+1}).$   (15)

Proof. Using the optimality conditions for the programs (11), it follows that $x^{k+1}_\nu$ solves these programs if and only if $x_\nu = x^{k+1}_\nu$ satisfies the optimality conditions

  $0 \in \partial_{x_\nu} \Big( \varphi_\nu(x_\nu) + \langle \nabla_{x_\nu}\theta_\nu(x^k_\nu, x^k_{-\nu}), x_\nu - x^k_\nu \rangle_{H_\nu} + \langle \mu^k, A_\nu x_\nu \rangle_K + \frac{\beta}{2} \Big( \big\| A_\nu x_\nu + \sum_{\mu\neq\nu} A_\mu x^k_\mu - b \big\|_K^2 + \gamma \|x_\nu - x^k_\nu\|_{H_\nu}^2 \Big) \Big) + N_{X_\nu}(x_\nu)$

for all ν = 1, ..., N. This is equivalent to saying that there exist elements $g_\nu \in \partial\varphi_\nu(x^{k+1}_\nu)$ such that the vector

  $-\Big( g_\nu + \nabla_{x_\nu}\theta_\nu(x^k_\nu, x^k_{-\nu}) + A_\nu^*\mu^k + \beta A_\nu^* \Big( A_\nu x^{k+1}_\nu + \sum_{\mu\neq\nu} A_\mu x^k_\mu - b \Big) + \beta\gamma(x^{k+1}_\nu - x^k_\nu) \Big)$

belongs to the normal cone $N_{X_\nu}(x^{k+1}_\nu)$ for all ν = 1, ..., N. By definition of the normal cone, this can be rewritten as

  $\Big\langle g_\nu + \nabla_{x_\nu}\theta_\nu(x^k_\nu, x^k_{-\nu}) + A_\nu^*\mu^k + \beta A_\nu^* \Big( A_\nu x^{k+1}_\nu + \sum_{\mu\neq\nu} A_\mu x^k_\mu - b \Big) + \beta\gamma(x^{k+1}_\nu - x^k_\nu), \; x_\nu - x^{k+1}_\nu \Big\rangle_{H_\nu} \geq 0$

for all $x_\nu \in X_\nu$ and all ν = 1, ..., N. Using $\mu^{k+1} = \mu^k + \beta\big( \sum_{\mu=1}^N A_\mu x^{k+1}_\mu - b \big)$, cf. (12), the last inequality is equivalent to

  $\Big\langle g_\nu + \nabla_{x_\nu}\theta_\nu(x^k_\nu, x^k_{-\nu}) + A_\nu^*\mu^{k+1} + \beta A_\nu^* \sum_{\mu\neq\nu} A_\mu(x^k_\mu - x^{k+1}_\mu) + \beta\gamma(x^{k+1}_\nu - x^k_\nu), \; x_\nu - x^{k+1}_\nu \Big\rangle_{H_\nu} \geq 0$

for all $x_\nu \in X_\nu$ and all ν = 1, ..., N. Exploiting the definition of M in (13), the Cartesian product structure of the set X, recalling that $P_\theta(x^k) = \big( \nabla_{x_1}\theta_1(x^k), \dots, \nabla_{x_N}\theta_N(x^k) \big)$, and setting $g = (g_1, \dots, g_N)$, this can be rewritten more compactly as

  $\big\langle g + P_\theta(x^k) + A^*\mu^{k+1} + \beta M(x^k - x^{k+1}) + \beta\gamma(x^{k+1} - x^k), \; x - x^{k+1} \big\rangle_H \geq 0 \quad \text{for all } x \in X.$

Since

  $\Big\langle \frac{1}{\beta}(\mu^{k+1} - \mu^k) + b - \sum_{\nu=1}^N A_\nu x^{k+1}_\nu, \; \mu - \mu^{k+1} \Big\rangle_K = 0 \quad \text{for all } \mu \in K$

in view of (12), setting $\bar g := (g, 0)$, the previous two formulas are equivalent to

  $\big\langle \bar g + \tilde P_\theta(w^k) + G(w^{k+1}) + \tfrac{1}{\beta} Q_{\beta,\gamma}(w^{k+1} - w^k), \; w - w^{k+1} \big\rangle \geq 0 \quad \text{for all } w \in W.$

Using the definition of the normal cone $N_W$, we can express this as

  $0 \in \bar g + \tilde P_\theta(w^k) + G(w^{k+1}) + \tfrac{1}{\beta} Q_{\beta,\gamma}(w^{k+1} - w^k) + N_W(w^{k+1}).$

By definition, we have $g_\nu \in \partial\varphi_\nu(x^{k+1}_\nu)$ for all ν = 1, ..., N, hence the last inclusion is equivalent to

  $0 \in \partial\psi(w^{k+1}) + \tilde P_\theta(w^k) + G(w^{k+1}) + \tfrac{1}{\beta} Q_{\beta,\gamma}(w^{k+1} - w^k) + N_W(w^{k+1})$
  $\iff \tfrac{1}{\beta} Q_{\beta,\gamma} w^k - \tilde P_\theta(w^k) \in \partial\psi(w^{k+1}) + G(w^{k+1}) + \tfrac{1}{\beta} Q_{\beta,\gamma} w^{k+1} + N_W(w^{k+1}),$

which can be rewritten as

  $\big( I - \beta Q_{\beta,\gamma}^{-1} \tilde P_\theta \big)(w^k) \in \big( I + \beta Q_{\beta,\gamma}^{-1}(\partial\psi + G + N_W) \big)(w^{k+1}).$

This completes the proof.

Based on the previous results and assuming that $Q_{\beta,\gamma}$ is strongly monotone, cf. Remark 4.2, we obtain the following alternative procedure for the computation of $w^{k+1}$ from Algorithm 4.1: using the operators $T_1$ and $T_2$ from (10), we can rewrite formula (15) as

  $w^{k+1} \in \big( I + \beta Q_{\beta,\gamma}^{-1} T_2 \big)^{-1} \big( I - \beta Q_{\beta,\gamma}^{-1} T_1 \big)(w^k),$

which almost looks like a forward-backward method. In fact, if the operator $A := Q_{\beta,\gamma}^{-1} T_2$ were maximally monotone, we could rewrite this inclusion as an equation and would obtain the forward-backward iteration

  $w^{k+1} = \big( I + \beta Q_{\beta,\gamma}^{-1} T_2 \big)^{-1} \big( I - \beta Q_{\beta,\gamma}^{-1} T_1 \big)(w^k),$   (16)

which is known to converge provided that the second operator $B := Q_{\beta,\gamma}^{-1} T_1$ has a suitable cocoercivity property. Unfortunately, though the operators $T_1$ and $T_2$ themselves are maximally monotone in view of Proposition 3.6, this statement does not hold, in general, for the operators $A = Q_{\beta,\gamma}^{-1} T_2$ and $B = Q_{\beta,\gamma}^{-1} T_1$. But this problem can be solved easily by using a suitable weighted scalar product.

Proposition 4.5. Assume that $Q_{\beta,\gamma}$ is self-adjoint and strongly monotone. Let $T_1$ and $T_2$ be the operators defined in (10). Furthermore, consider the Hilbert space $H \times K$ endowed with the scalar product $\langle w, z \rangle_{Q_{\beta,\gamma}} := \langle Q_{\beta,\gamma} w, z \rangle$. Then the following statements hold for this scalar product:
a) The operator $B := Q_{\beta,\gamma}^{-1} T_1$ is maximally monotone and single-valued.
b) The operator $A := Q_{\beta,\gamma}^{-1} T_2$ is maximally monotone.
c) The operator $A + B$ is maximally monotone.

Proof. Once we verify statement b), part a) follows in the same way. To this end, recall that $T_2$ is maximally monotone with respect to the scalar product $\langle \cdot, \cdot \rangle$ in view of Proposition 3.6. Hence $\beta T_2$ is also maximally monotone. Since $Q_{\beta,\gamma}^{-1}$ is self-adjoint and strongly monotone, it follows from [1, Prop.] that $\beta Q_{\beta,\gamma}^{-1} T_2$ is maximally monotone in the Hilbert space $H \times K$ endowed with the scalar product $\langle \cdot, \cdot \rangle_{Q_{\beta,\gamma}}$. Statement c) follows directly from [1, Cor. 25.5] since $\operatorname{dom} B = H \times K$.

Proposition 4.5 implies that the sequence $\{w^k\}$ generated by Algorithm 4.1 can be equivalently represented by the forward-backward scheme from (16), which is known to yield weak convergence under suitable assumptions. However, since we have maximally monotone operators only with respect to the weighted scalar product introduced in Proposition 4.5, we obtain weak and strong convergence with respect to this scalar product and its induced norm only. But this scalar product is introduced here only for theoretical reasons; one is typically interested in corresponding convergence results in terms of the given Hilbert space, endowed with the original scalar product. The following result states that both convergence properties coincide in our particular case. Its simple proof can be found, e.g., in [3].

Lemma 4.6. Consider the Hilbert space $H \times K$ with the usual scalar product, and let $\langle \cdot, \cdot \rangle_{Q_{\beta,\gamma}}$ be defined as in Proposition 4.5 for $Q_{\beta,\gamma}$ self-adjoint and strongly monotone. Then the corresponding induced norms $\|\cdot\|$ and $\|\cdot\|_{Q_{\beta,\gamma}}$ are equivalent. Moreover, weak convergence with respect to $\langle \cdot, \cdot \rangle$ is equivalent to weak convergence with respect to $\langle \cdot, \cdot \rangle_{Q_{\beta,\gamma}}$.

In order to verify convergence, it remains to show that the (forward) operator $B = Q_{\beta,\gamma}^{-1} T_1$ is cocoercive for some modulus α > 0. Then, in principle, the known convergence properties of the forward-backward splitting method can be applied for any choice of the step size β from the interval (0, 2α). However, in our case, the operator $B = Q_{\beta,\gamma}^{-1} T_1$ itself depends on β via the linear operator $Q_{\beta,\gamma}$. This is actually the reason for not simply writing Q for this operator; this dependence is crucial and therefore causes some additional problems. The following result first discusses the cocoercivity of the operator B which, obviously, depends on corresponding properties of the mapping $\tilde P_\theta$.

Lemma 4.7. Suppose that $\tilde P_\theta$ is α-cocoercive in $H \times K$ endowed with the scalar product $\langle \cdot, \cdot \rangle$, and $Q_{\beta,\gamma} \in L(H \times K)$ is a strongly monotone, self-adjoint operator. Then $Q_{\beta,\gamma}^{-1} \tilde P_\theta$ is $\alpha/\|Q_{\beta,\gamma}^{-1}\|$-cocoercive in $H \times K$ endowed with the scalar product $\langle \cdot, \cdot \rangle_{Q_{\beta,\gamma}}$.

Proof. First recall that, for any continuous linear operator Q, the inequality $\langle Qw, w \rangle \leq \|Q\| \|w\|^2$ holds for all w. We therefore obtain

  $\|Q_{\beta,\gamma}^{-1} \tilde P_\theta(w) - Q_{\beta,\gamma}^{-1} \tilde P_\theta(v)\|_{Q_{\beta,\gamma}}^2 = \big\langle Q_{\beta,\gamma}\big( Q_{\beta,\gamma}^{-1} \tilde P_\theta(w) - Q_{\beta,\gamma}^{-1} \tilde P_\theta(v) \big), \; Q_{\beta,\gamma}^{-1} \tilde P_\theta(w) - Q_{\beta,\gamma}^{-1} \tilde P_\theta(v) \big\rangle = \big\langle \tilde P_\theta(w) - \tilde P_\theta(v), \; Q_{\beta,\gamma}^{-1}\big( \tilde P_\theta(w) - \tilde P_\theta(v) \big) \big\rangle \leq \|Q_{\beta,\gamma}^{-1}\| \, \|\tilde P_\theta(w) - \tilde P_\theta(v)\|^2.$

Hence, the assumed α-cocoercivity of $\tilde P_\theta$ yields

  $\big\langle Q_{\beta,\gamma}^{-1} \tilde P_\theta(w) - Q_{\beta,\gamma}^{-1} \tilde P_\theta(v), \; w - v \big\rangle_{Q_{\beta,\gamma}} = \big\langle \tilde P_\theta(w) - \tilde P_\theta(v), \; w - v \big\rangle \geq \alpha \|\tilde P_\theta(w) - \tilde P_\theta(v)\|^2 \geq \frac{\alpha}{\|Q_{\beta,\gamma}^{-1}\|} \, \|Q_{\beta,\gamma}^{-1} \tilde P_\theta(w) - Q_{\beta,\gamma}^{-1} \tilde P_\theta(v)\|_{Q_{\beta,\gamma}}^2,$

and this completes the proof.

Note that the cocoercivity assumption on the operator $\tilde P_\theta$ is very natural, and follows immediately from the corresponding cocoercivity of the mapping $P_\theta$. In particular, this

assumption holds if $P_\theta$ is strongly monotone and Lipschitz continuous. On the other hand, it is important to note that the latter condition does not imply that $\tilde P_\theta$ is strongly monotone. In fact, the operator $\tilde P_\theta$ is never strongly monotone due to its last component being zero.

We finally need to address the problem that the operator B depends on β, which causes some issues regarding the choice of this step size. However, these are resolved in the proof of the following main convergence result for Algorithm 4.1.

Theorem 4.8. Suppose that the operator $P_\theta$ from (5) is α-cocoercive in H endowed with the usual scalar product, and that there is at least one variational KKT point for the Nash equilibrium problem (1). Furthermore, assume that the parameters β, γ are chosen such that
  $\beta \in (0, 2\alpha), \qquad \gamma \geq \frac{1}{\beta^2} + \|M\|.$
Then the following statements hold:
a) The operator $Q_{\beta,\gamma}$ from (14) is self-adjoint and strongly monotone.
b) The sequence $w^k = (x^k, \mu^k)$ generated by Algorithm 4.1 converges weakly to a variational KKT pair $w^* = (x^*, \mu^*)$.
c) The sequence $P_\theta(x^k)$ converges strongly to the unique value $P_\theta(x^*)$.
d) If $P_\theta$ is strongly monotone, the sequence $\{x^k\}$ converges strongly to $x^*$.

Proof. a) Since γ > ‖M‖, the statement follows from Remark 4.2.
b) First notice that $\tilde P_\theta : H \times K \to H \times K$ as defined in (8) is also α-cocoercive whenever $P_\theta : H \to H$ as defined in (5) is α-cocoercive. Since $\gamma \geq \frac{1}{\beta^2} + \|M\|$, we have
  $\beta^2 \|(\gamma I - M)x\|_H \geq \beta^2 \big( \gamma \|x\|_H - \|Mx\|_H \big) \geq \beta^2 (\gamma - \|M\|) \|x\|_H \geq \|x\|_H \quad \text{for all } x \in H.$
This yields

  $\frac{1}{\|Q_{\beta,\gamma}^{-1}\|^2} = \frac{1}{\sup_{w \in H\times K \setminus \{0\}} \frac{\|Q_{\beta,\gamma}^{-1} w\|^2}{\|w\|^2}} = \inf_{w \in H\times K \setminus \{0\}} \frac{\|w\|^2}{\|Q_{\beta,\gamma}^{-1} w\|^2} = \inf_{v \in H\times K \setminus \{0\}} \frac{\|Q_{\beta,\gamma} v\|^2}{\|v\|^2} = \inf_{(x,\mu) \in H\times K \setminus \{0\}} \frac{\|\beta^2(\gamma I - M)x\|_H^2 + \|\mu\|_K^2}{\|x\|_H^2 + \|\mu\|_K^2} \geq \inf_{(x,\mu) \in H\times K \setminus \{0\}} \frac{\|x\|_H^2 + \|\mu\|_K^2}{\|x\|_H^2 + \|\mu\|_K^2} = 1,$   (17)

where we substituted $v = Q_{\beta,\gamma}^{-1} w$. Hence, the choice of γ guarantees that $\|Q_{\beta,\gamma}^{-1}\| \leq 1$. In view of Lemma 4.7, it therefore follows that $Q_{\beta,\gamma}^{-1} \tilde P_\theta$ is α-cocoercive in $H \times K$ endowed with the scalar product $\langle \cdot, \cdot \rangle_{Q_{\beta,\gamma}}$. Statement b) therefore follows from standard convergence properties of the forward-backward splitting method, cf. [1, Prop.], together with Lemma 4.6.
c) Since $\tilde P_\theta = (P_\theta, 0)$, this statement also follows from standard results, cf. [1, Prop.].
d) This statement follows directly from c) and the observation that $\|P_\theta(x^k) - P_\theta(x^*)\|_H \geq \rho \|x^k - x^*\|_H$ for some ρ > 0, which is a consequence of the strong monotonicity and the Cauchy-Schwarz inequality.

In practice, the constant $\gamma \geq \frac{1}{\beta^2} + \|M\|$ from Theorem 4.8 might be large, but there also exist examples where this constant is just of moderate size; see Section 7 for an instance.

5 Modified Regularized Jacobi-type ADMM Method

Motivated by the fact that Algorithm 4.1 turned out to be a forward-backward splitting method in a suitable Hilbert space, it is natural to apply Tseng's method, as outlined in Section 2.2, to our setting in order to weaken the cocoercivity assumption. This means that we have to correct the outcome of Algorithm 4.1 in a suitable way. The details are given in the following algorithm.

Algorithm 5.1 (Modified Regularized linearized Jacobi-type ADMM Method).
(S.0) Choose a starting point $(x^0, \mu^0) \in X \times K$, parameters β, γ > 0, and set k := 0.
(S.1) If a suitable termination criterion is satisfied: STOP.
(S.2) For ν = 1, ..., N, compute

  $\hat x^k_\nu := \arg\min_{x_\nu \in X_\nu} \Big\{ \varphi_\nu(x_\nu) + \langle \nabla_{x_\nu}\theta_\nu(x^k_\nu, x^k_{-\nu}), x_\nu - x^k_\nu \rangle_{H_\nu} + \langle \mu^k, A_\nu x_\nu \rangle_K + \frac{\beta}{2} \Big( \big\| A_\nu x_\nu + \sum_{\mu\neq\nu} A_\mu x^k_\mu - b \big\|_K^2 + \gamma \|x_\nu - x^k_\nu\|_{H_\nu}^2 \Big) \Big\}.$   (18)

(S.3) Define

  $\mu^{k+1} := \mu^k + \beta \Big( \sum_{\nu=1}^N A_\nu \hat x^k_\nu - b \Big).$   (19)

(S.4) Let $\hat x^k := (\hat x^k_1, \dots, \hat x^k_N)$, and compute

  $x^{k+1} := \hat x^k + (\beta\gamma I - \beta M)^{-1} \big( P_\theta(x^k) - P_\theta(\hat x^k) \big).$   (20)

(S.5) Set k ← k + 1, and go to (S.1).

We will show that this algorithm has essentially the same convergence properties as the previous one, but without the cocoercivity assumption. The price we have to pay is the solution of the linear operator equation in (20) (with a constant operator, independent of k). In the finite-dimensional case, this means that we have to solve a linear system of equations, but with the same matrix at all iterations, so only one matrix decomposition is required, which can then be reused in all iterations; a small sketch of this correction step is given below.
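The following finite-dimensional sketch (made-up data, illustration only) shows how the correction step (20) can reuse a single Cholesky factorization of $\beta\gamma I - \beta M$, which is symmetric positive definite whenever γ > ‖M‖. The map `P_theta` below is an illustrative monotone, Lipschitz stand-in, not the pseudo-gradient of any particular game.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(5)
N, m = 3, 4
n = [2, 2, 3]
A = [rng.standard_normal((m, k)) for k in n]
M = np.block([[A[i].T @ A[j] if i != j else np.zeros((n[i], n[j]))
               for j in range(N)] for i in range(N)])

beta = 0.5
gamma = 1.0 / beta**2 + np.linalg.norm(M, 2) + 1.0   # gamma >= 1/beta^2 + ||M||

# Factor (beta*gamma*I - beta*M) once, before the iteration starts.
corr_factor = cho_factor(beta * gamma * np.eye(sum(n)) - beta * M)

def P_theta(x):
    """Illustrative monotone, Lipschitz pseudo-gradient (stand-in only)."""
    return x + 0.1 * np.sin(x)

def correction_step(x_k, x_hat):
    """x^{k+1} = x_hat + (beta*gamma*I - beta*M)^{-1} (P(x_k) - P(x_hat)), cf. (20)."""
    return x_hat + cho_solve(corr_factor, P_theta(x_k) - P_theta(x_hat))

x_k   = rng.standard_normal(sum(n))
x_hat = rng.standard_normal(sum(n))   # stands for the output of step (S.2)
print(correction_step(x_k, x_hat))
```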

Let us define the two vectors

  $\hat w^{k+1} := \begin{pmatrix} \hat x^k \\ \mu^{k+1} \end{pmatrix} \quad \text{and} \quad w^{k+1} := \begin{pmatrix} x^{k+1} \\ \mu^{k+1} \end{pmatrix}.$   (21)

Then we see that $\hat w^{k+1}$ corresponds exactly to one iteration of Algorithm 4.1 and can therefore be viewed as a forward-backward method in a Hilbert space with a suitably modified scalar product. In order to verify convergence of Algorithm 5.1, we will see that this method may be viewed as a Tseng-type splitting method in the same (modified) Hilbert space. To this end, note that we can indeed compute $w^{k+1}$ from the formula

  $w^{k+1} := \hat w^{k+1} + \beta \big( Q_{\beta,\gamma}^{-1} \tilde P_\theta(w^k) - Q_{\beta,\gamma}^{-1} \tilde P_\theta(\hat w^{k+1}) \big).$   (22)

To use the general convergence theory for Tseng's method, we apply his result to the modified Hilbert space. In particular, this requires Lipschitz continuity. This task is taken care of in the following result.

Lemma 5.2. Suppose that $\tilde P_\theta$ is 1/α-Lipschitz continuous in $H \times K$ endowed with the scalar product $\langle \cdot, \cdot \rangle$, and that $Q_{\beta,\gamma} \in L(H \times K)$ is a strongly monotone, self-adjoint operator. Then $Q_{\beta,\gamma}^{-1} \tilde P_\theta$ is $\|Q_{\beta,\gamma}^{-1}\|/\alpha$-Lipschitz continuous in $H \times K$ endowed with the scalar product $\langle \cdot, \cdot \rangle_{Q_{\beta,\gamma}}$.

Proof. Applying Lemma 2.1 c) with $Q := Q_{\beta,\gamma}$ implies

  $\|Q_{\beta,\gamma}^{-1} \tilde P_\theta(w) - Q_{\beta,\gamma}^{-1} \tilde P_\theta(v)\|_{Q_{\beta,\gamma}}^2 = \big\langle Q_{\beta,\gamma}\big( Q_{\beta,\gamma}^{-1} \tilde P_\theta(w) - Q_{\beta,\gamma}^{-1} \tilde P_\theta(v) \big), \; Q_{\beta,\gamma}^{-1} \tilde P_\theta(w) - Q_{\beta,\gamma}^{-1} \tilde P_\theta(v) \big\rangle = \big\langle \tilde P_\theta(w) - \tilde P_\theta(v), \; Q_{\beta,\gamma}^{-1}\big( \tilde P_\theta(w) - \tilde P_\theta(v) \big) \big\rangle \leq \|Q_{\beta,\gamma}^{-1}\| \, \|\tilde P_\theta(w) - \tilde P_\theta(v)\|^2 \leq \frac{\|Q_{\beta,\gamma}^{-1}\|}{\alpha^2} \|w - v\|^2 \leq \frac{\|Q_{\beta,\gamma}^{-1}\|^2}{\alpha^2} \|w - v\|_{Q_{\beta,\gamma}}^2.$

This completes the proof.

As in the case where $\tilde P_\theta$ needs to be cocoercive, we again have the problem that the operator $B = Q_{\beta,\gamma}^{-1} \tilde P_\theta$ itself depends on β. But, as in Section 4, we can resolve this problem by selecting γ accordingly.

Theorem 5.3. Let $Q_{\beta,\gamma} \in L(H \times K)$ be as defined in (14), and suppose that $P_\theta$ from (5) is 1/α-Lipschitz continuous and monotone in H endowed with the usual scalar product, and that there is at least one variational KKT point for the Nash equilibrium problem (1). Choose $\beta \in (0, \alpha)$ and take $\gamma \geq \frac{1}{\beta^2} + \|M\|$. Then the following statements hold:
a) $Q_{\beta,\gamma}$ is self-adjoint and strongly monotone.
b) The sequences $w^k = (x^k, \mu^k)$ and $\hat w^{k+1} = (\hat x^k, \mu^{k+1})$ generated by Algorithm 5.1 converge weakly to a variational KKT pair $(x^*, \mu^*)$.

Proof. Since γ > ‖M‖, assertion a) follows from Remark 4.2. To verify statement b), first note that $\tilde P_\theta : H \times K \to H \times K$ is also 1/α-Lipschitz continuous (since $P_\theta : H \to H$ is 1/α-Lipschitz continuous with respect to the given norm). Using $\gamma \geq \frac{1}{\beta^2} + \|M\|$, we notice that (17) is valid; thus we have $\|Q_{\beta,\gamma}^{-1}\| \leq 1$. Lemma 5.2 therefore yields that $Q_{\beta,\gamma}^{-1} \tilde P_\theta$ is 1/α-Lipschitz continuous in $H \times K$ endowed with the scalar product $\langle \cdot, \cdot \rangle_{Q_{\beta,\gamma}}$. From Proposition 4.5, we see that $Q_{\beta,\gamma}^{-1} T$, $Q_{\beta,\gamma}^{-1} T_1$, $Q_{\beta,\gamma}^{-1} T_2$, where T, $T_1$, $T_2$ are defined as in (10), are maximally monotone in $(H \times K, \langle \cdot, \cdot \rangle_{Q_{\beta,\gamma}})$. Further, (22) together with Lemma 4.4 and (16) shows that (18)-(20) from Algorithm 5.1 can be interpreted as a forward-backward-forward splitting in $H \times K$ endowed with the scalar product $\langle \cdot, \cdot \rangle_{Q_{\beta,\gamma}}$. Thus the standard convergence result for Tseng's splitting method (from, e.g., [1, Thm.]) together with Lemma 4.6 yields statement b).

6 Generalization to GNEPs with Conic Constraints

So far, we only considered GNEPs of the form (1) with joint equality constraints (linear both because of convexity reasons and in order to have a separable structure). We now want to extend the previous results to the more general class of GNEPs defined by (2) with some nonempty, closed, and convex cone C. In particular, this scenario allows us to take into account linear inequalities, a situation that will actually occur in our application in Section 7. The main idea is to transform the GNEP from (2) with conic constraints to a GNEP with linear equality constraints, and then to extend our previous results to this reformulated problem.

To this end, note that the variational KKT conditions of (2) are given by

  $0 \in \nabla_{x_\nu}\theta_\nu(x^*) + \partial\varphi_\nu(x^*_\nu) + B_\nu^*\lambda^* + N_{X_\nu}(x^*_\nu), \qquad \lambda^* \in N_C\Big( \sum_{\mu=1}^N B_\mu x^*_\mu - b \Big)$

for all ν = 1, ..., N, see [2, Ch. 3.1]. Since C is a convex cone, the latter condition is equivalent to

  $\sum_{\mu=1}^N B_\mu x^*_\mu - b \in C, \qquad \lambda^* \in C^\circ, \qquad \Big\langle \lambda^*, \sum_{\mu=1}^N B_\mu x^*_\mu - b \Big\rangle = 0,$

where $C^\circ := \{\lambda \in K \mid \langle \lambda, s \rangle \leq 0 \ \text{for all } s \in C\}$ denotes the polar cone of C. Under suitable regularity assumptions, these KKT conditions are necessary and sufficient optimality conditions.

Next, we want to rewrite problem (2) as an equality-constrained GNEP. There are different ways to do so, and in this section, we present two such reformulations that are probably the most natural ones. The first reformulation uses the optimization problems

  $\min_{x_\nu \in X_\nu, \, s_\nu \in C} \; \theta_\nu(x_\nu, x_{-\nu}) + \varphi_\nu(x_\nu) \quad \text{s.t.} \quad \sum_{\mu=1}^N B_\mu x_\mu - b - \sum_{\mu=1}^N s_\mu = 0$   (23)

for all players ν = 1, ..., N. The second reformulation uses a GNEP where the first N − 1 players ν deal with the optimization problems

  $\min_{x_\nu \in X_\nu} \; \theta_\nu(x_\nu, x_{-\nu}) + \varphi_\nu(x_\nu) \quad \text{s.t.} \quad \sum_{\mu=1}^N B_\mu x_\mu - b - s = 0,$   (24a)

whereas the minimization problem of the final player ν = N is given by

  $\min_{x_N \in X_N, \, s \in C} \; \theta_N(x_N, x_{-N}) + \varphi_N(x_N) \quad \text{s.t.} \quad \sum_{\mu=1}^N B_\mu x_\mu - b - s = 0.$   (24b)

Hence, the first reformulation (23) tries to reformulate the conic constraints as an equality constraint by splitting the conic condition among all players, whereas the second reformulation (24) uses a slack variable s only for the last player. Note that the splitting used in the first approach is not unique in general. The precise relation between the conically constrained GNEP and these two formulations is discussed in the next two results.

Proposition 6.1. The following statements are equivalent:
a) $x^* = (x^*_1, \dots, x^*_N)$ is a generalized Nash equilibrium of (2).
b) $(x^*, s^*) = (x^*_1, \dots, x^*_N, s^*_1, \dots, s^*_N)$ is a generalized Nash equilibrium of (23) for some $s^*_\nu \in C$, ν = 1, ..., N.
c) $(x^*, s^*) = (x^*_1, \dots, x^*_N, s^*)$ is a generalized Nash equilibrium of (24) for some $s^* \in C$.

Proof. Note that the objective functions of all three GNEPs are identical and independent of s or $s_\nu$. Hence, the statement follows by noting that a feasible point of one GNEP yields a feasible point of the other GNEPs and vice versa.
a) ⟹ c): Suppose that $x^*$ is a generalized Nash equilibrium of (2). Then set $s^* := \sum_{\mu=1}^N B_\mu x^*_\mu - b \in C$. It follows that $(x^*, s^*)$ is feasible for (24).
c) ⟹ b): Suppose that $(x^*, s^*)$ is a Nash equilibrium of (24). Setting $s^*_\nu = 0$ for all ν = 1, ..., N − 1 and $s^*_N = s^*$ then gives a feasible point of (23).
b) ⟹ a): Since C is a convex cone, it follows that $\sum_{\mu=1}^N s^*_\mu = N \big( \sum_{\mu=1}^N s^*_\mu / N \big) \in C$, from which statement a) follows.

The previous result states that the reformulated GNEPs have the same solutions (in the sense of a generalized Nash equilibrium) as the original GNEP from (2). The following result shows that the corresponding sets of variational KKT points are also the same, which is of particular interest for our methods since they compute variational KKT points. To this end, recall the close relationship between variational KKT points and variational equilibria from Theorem 3.1.

Proposition 6.2. The following statements are equivalent:
a) $(x^*, \lambda^*)$ is a variational KKT pair of (2).
b) $((x^*, s^*_1, \dots, s^*_N), \lambda^*)$ is a variational KKT pair of (23) for some $s^*_\nu \in C$, ν = 1, ..., N.
c) $((x^*, s^*), \lambda^*)$ is a variational KKT pair of (24) for some $s^* \in C$.

Proof. a) ⟹ c): Let $(x^*, \lambda^*)$ be a variational KKT pair of (2), and define $s^* := \sum_{\mu=1}^N B_\mu x^*_\mu - b$. Then

  $0 \in \nabla_{x_\nu}\theta_\nu(x^*) + \partial\varphi_\nu(x^*_\nu) + B_\nu^*\lambda^* + N_{X_\nu}(x^*_\nu) \quad \text{and} \quad \lambda^* \in N_C\Big( \sum_{\mu=1}^N B_\mu x^*_\mu - b \Big) = N_C(s^*)$

for all ν = 1, ..., N. This can be rewritten as

  $0 \in \begin{pmatrix} P_\theta(x^*) + \partial\varphi(x^*) \\ 0 \end{pmatrix} + \begin{pmatrix} B^* \\ -I \end{pmatrix} \lambda^* + N_{X \times C}(x^*, s^*) \quad \text{and} \quad \sum_{\mu=1}^N B_\mu x^*_\mu - b - s^* = 0,$

which are exactly the variational KKT conditions of (24).

c) ⟹ b): Suppose that $((x^*, s^*), \lambda^*)$ is a variational KKT pair of (24) and set $s^*_1 := \dots := s^*_N := \frac{1}{N} s^*$. Then it is easy to see that $((x^*, s^*_1, \dots, s^*_N), \lambda^*)$ is a variational KKT pair of (23).
b) ⟹ a): Let $((x^*, s^*_1, \dots, s^*_N), \lambda^*)$ be a KKT pair of (23), i.e., it satisfies

  $0 \in \begin{pmatrix} P_\theta(x^*) + \partial\varphi(x^*) \\ 0 \\ \vdots \\ 0 \end{pmatrix} + \begin{pmatrix} B^* \\ -I \\ \vdots \\ -I \end{pmatrix} \lambda^* + N_{X \times C \times \dots \times C}(x^*, s^*_1, \dots, s^*_N)$   (25)

and $\sum_{\mu=1}^N B_\mu x^*_\mu - b - \sum_{\mu=1}^N s^*_\mu = 0$. Using (25) and $N_{X \times C \times \dots \times C}(x^*, s^*_1, \dots, s^*_N) = N_X(x^*) \times N_C(s^*_1) \times \dots \times N_C(s^*_N)$, we therefore obtain $\lambda^* \in N_C(s^*_\nu)$ for all ν = 1, ..., N. Hence
  $\lambda^* \in \bigcap_{\nu=1}^N N_C(s^*_\nu) \subseteq N_C\Big( \sum_{\nu=1}^N s^*_\nu \Big) = N_C\Big( \sum_{\nu=1}^N B_\nu x^*_\nu - b \Big).$
We therefore have
  $0 \in P_\theta(x^*) + \partial\varphi(x^*) + B^*\lambda^* + N_X(x^*) \quad \text{and} \quad \lambda^* \in N_C\Big( \sum_{\mu=1}^N B_\mu x^*_\mu - b \Big),$
which are exactly the variational KKT conditions of (2).

The previous results allow us to apply our ADMM-type algorithms to the more general case of GNEPs with conic constraints. Formally, the corresponding objective functions, and therefore also the resulting mapping $P_\theta$, need to be regarded as functions in (x, s). Note that the latter is never strongly monotone, even if $P_\theta(x)$ is strongly monotone as a function in x alone. Fortunately, we only require a cocoercivity assumption, and the cocoercivity of $P_\theta$ as a function in x immediately implies the same property of the operator $P_\theta$ as a function in (x, s).

The other problem that requires some discussion is that the resulting optimization problems that need to be solved in each iteration are now minimization problems in the variables $(x_\nu, s_\nu)$ or $(x_N, s)$, respectively. Following the idea of Rockafellar [27], however, it turns out that the minimization with respect to $s_\nu$ (or s) can be carried out exactly, so that we have to solve, once again, minimization problems in $x_\nu$ alone.

We first illustrate this for the subproblems arising from the formulation in (23). The resulting optimization problems have to compute, for each player ν = 1, ..., N, the solution pair

  $(x^{k+1}_\nu, s^{k+1}_\nu) := \arg\min_{x_\nu \in X_\nu, \, s_\nu \in C} \Big\{ \varphi_\nu(x_\nu) + \langle \nabla_{x_\nu}\theta_\nu(x^k_\nu, x^k_{-\nu}), x_\nu - x^k_\nu \rangle_{H_\nu} + \langle \lambda^k, B_\nu x_\nu - s_\nu \rangle_K + \frac{\beta}{2} \big\| B_\nu x_\nu - s_\nu + \sum_{\mu\neq\nu} (B_\mu x^k_\mu - s^k_\mu) - b \big\|_K^2 + \frac{\beta\gamma}{2} \|x_\nu - x^k_\nu\|_{H_\nu}^2 + \frac{\beta\gamma}{2} \|s_\nu - s^k_\nu\|_K^2 \Big\}$

  $= \arg\min_{x_\nu \in X_\nu, \, s_\nu \in C} \Big\{ \varphi_\nu(x_\nu) + \langle \nabla_{x_\nu}\theta_\nu(x^k_\nu, x^k_{-\nu}), x_\nu - x^k_\nu \rangle_{H_\nu} + \frac{\beta\gamma}{2} \|x_\nu - x^k_\nu\|_{H_\nu}^2 + \frac{\beta\gamma}{2(\gamma+1)} \Big\| B_\nu x_\nu + \sum_{\mu\neq\nu} B_\mu x^k_\mu - b - \sum_{\mu=1}^N s^k_\mu + \frac{\lambda^k}{\beta} \Big\|_K^2 + \frac{\beta(\gamma+1)}{2} \Big\| s_\nu - \frac{1}{\gamma+1} \Big( B_\nu x_\nu + \sum_{\mu\neq\nu} B_\mu x^k_\mu - b - \sum_{\mu\neq\nu} s^k_\mu + \frac{\lambda^k}{\beta} + \gamma s^k_\nu \Big) \Big\|_K^2 \Big\},$

where the second equality follows after some elementary (though lengthy) algebraic calculations. This shows that

  $s^{k+1}_\nu := s^{k+1}_\nu(x_\nu) = \operatorname{Proj}_C\Big( \frac{1}{\gamma+1} \Big( \frac{\lambda^k}{\beta} + B_\nu x_\nu + \sum_{\mu\neq\nu} B_\mu x^k_\mu - b - \sum_{\mu\neq\nu} s^k_\mu + \gamma s^k_\nu \Big) \Big).$

Using the squared distance function, we therefore obtain $x^{k+1}_\nu$ from

  $x^{k+1}_\nu = \arg\min_{x_\nu \in X_\nu} \Big\{ \varphi_\nu(x_\nu) + \langle \nabla_{x_\nu}\theta_\nu(x^k_\nu, x^k_{-\nu}), x_\nu - x^k_\nu \rangle_{H_\nu} + \frac{\beta\gamma}{2} \|x_\nu - x^k_\nu\|_{H_\nu}^2 + \frac{\beta\gamma}{2(\gamma+1)} \Big\| B_\nu x_\nu + \sum_{\mu\neq\nu} B_\mu x^k_\mu - b - \sum_{\mu=1}^N s^k_\mu + \frac{\lambda^k}{\beta} \Big\|_K^2 + \frac{\beta(\gamma+1)}{2} \operatorname{dist}_C^2\Big( \frac{1}{\gamma+1} \Big( B_\nu x_\nu + \sum_{\mu\neq\nu} B_\mu x^k_\mu - b - \sum_{\mu\neq\nu} s^k_\mu + \frac{\lambda^k}{\beta} + \gamma s^k_\nu \Big) \Big) \Big\}$

for all ν = 1, ..., N. The corresponding multiplier update becomes

  $\lambda^{k+1} = \lambda^k + \beta \Big( \sum_{\nu=1}^N B_\nu x^{k+1}_\nu - b - \sum_{\nu=1}^N s^{k+1}_\nu \Big).$

For the reformulation in (24), the resulting $x_\nu$-subproblems for ν = 1, ..., N − 1 stay essentially the same as in Algorithms 4.1 or 5.1. The only difference arises for the last player, who has to solve the subproblem

  $(x^{k+1}_N, s^{k+1}) := \arg\min_{x_N \in X_N, \, s \in C} \Big\{ \varphi_N(x_N) + \langle \nabla_{x_N}\theta_N(x^k_N, x^k_{-N}), x_N - x^k_N \rangle_{H_N} + \langle \lambda^k, B_N x_N - s \rangle_K + \frac{\beta}{2} \Big( \big\| B_N x_N + \sum_{\mu\neq N} B_\mu x^k_\mu - b - s \big\|_K^2 + \gamma \|x_N - x^k_N\|_{H_N}^2 + \gamma \|s - s^k\|_K^2 \Big) \Big\}.$

Following the same technique as before yields the following updating formulas:

  $x^{k+1}_N := \arg\min_{x_N \in X_N} \Big\{ \varphi_N(x_N) + \langle \nabla_{x_N}\theta_N(x^k_N, x^k_{-N}), x_N - x^k_N \rangle_{H_N} + \frac{\beta\gamma}{2} \|x_N - x^k_N\|_{H_N}^2 + \frac{\beta\gamma}{2(\gamma+1)} \Big\| B_N x_N + \sum_{\mu\neq N} B_\mu x^k_\mu - b - s^k + \frac{\lambda^k}{\beta} \Big\|_K^2 + \frac{\beta(\gamma+1)}{2} \operatorname{dist}_C^2\Big( \frac{1}{\gamma+1} \Big( B_N x_N + \sum_{\mu\neq N} B_\mu x^k_\mu - b + \frac{\lambda^k}{\beta} + \gamma s^k \Big) \Big) \Big\},$

  $s^{k+1} := \operatorname{Proj}_C\Big( \frac{1}{\gamma+1} \Big( \frac{\lambda^k}{\beta} + B_N x^{k+1}_N + \sum_{\mu\neq N} B_\mu x^k_\mu - b + \gamma s^k \Big) \Big),$

  $\lambda^{k+1} := \lambda^k + \beta \Big( \sum_{\nu=1}^N B_\nu x^{k+1}_\nu - b - s^{k+1} \Big).$

Note that the projections onto C are often easy to compute; e.g., in finite dimensions, we often have C as a Cartesian product of intervals like $(-\infty, 0]$, where the projection is simply given by $\operatorname{Proj}_C(y) = \min\{0, y\}$ (componentwise). A similar observation holds for the distance function, since a corresponding computation of the projection onto C immediately yields the distance; a small sketch is given below.
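For the common finite-dimensional case $C = (-\infty, 0]^m$ mentioned above, the projection and the squared distance can be coded in a few lines (a minimal NumPy sketch):

```python
import numpy as np

def proj_C(y):
    """Componentwise projection onto the nonpositive orthant C = (-inf, 0]^m."""
    return np.minimum(y, 0.0)

def dist2_C(y):
    """Squared distance to C; the projection immediately yields the distance."""
    return np.sum((y - proj_C(y))**2)

y = np.array([1.5, -0.3, 0.0, 2.0])
print(proj_C(y))        # [ 0.  -0.3  0.   0. ]
print(dist2_C(y))       # 1.5**2 + 2.0**2 = 6.25
```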

Remark 6.3. In order to obtain convergence of Algorithm 4.1, we need to estimate the constant $\gamma \geq 1/\beta^2 + \|M\|$, where M is defined in (13). Please be aware that, in this section, we changed the operators $B_\nu$ from problem (2) to either an operator $A_\nu(x_\nu, s_\nu) = B_\nu x_\nu - s_\nu$ in the case where we introduced N slack variables, or $A_\nu x_\nu = B_\nu x_\nu$ for ν = 1, ..., N − 1 and $A_N(x_N, s) = B_N x_N - s$ in the case where we introduced one slack variable. Recall that the operator M is defined for the equality-constrained problem, hence we need to use the modified operators $A_\nu$ to compute the operator norm ‖M‖. To do so, we see that

  $\|A_\nu\|^2 = \sup_{(u_\nu, s_\nu) \neq 0} \frac{\|B_\nu u_\nu - s_\nu\|_K^2}{\|(u_\nu, s_\nu)\|_{H_\nu \times K}^2} \leq \sup_{(u_\nu, s_\nu) \neq 0} \frac{\big( \|B_\nu u_\nu\|_K + \|s_\nu\|_K \big)^2}{\|u_\nu\|_{H_\nu}^2 + \|s_\nu\|_K^2} \leq \|B_\nu\|^2 + 1$

in the first case, or $\|A_\nu\| = \|B_\nu\|$ for ν = 1, ..., N − 1 and $\|A_N\|^2 \leq \|B_N\|^2 + 1$ in the second case. Together with Lemma 4.3, we obtain $\|M\| \leq (N-1) \max_{\nu=1,\dots,N} \{ \|B_\nu\|^2 + 1 \}$.

7 Applications and Numerical Results

This section presents some examples for which we can apply Algorithms 4.1 and 5.1. We begin with two infinite-dimensional problems and then consider some finite-dimensional ones from the existing literature.

7.1 Elliptic Optimal Control GNEPs

Here we discuss a class of examples that is closely related to previously used ones in [21, 22, 23], where different solution methods are considered. Our aim is to verify that this example satisfies the requirements which guarantee convergence of our solution methods. In order to describe the example, we slightly change our notation in this section to be consistent with the standard notation used in the optimal control setting. The players' strategies $x_\nu \in X_\nu$ are now called the controls and denoted by $u_\nu \in L^2(\Omega)$, Ω being a suitable domain in $\mathbb{R}^d$. The so-called state variable $y \in H^1_0(\Omega)$ is the solution of an elliptic partial differential equation that depends on the players' strategies $u = (u_1, \dots, u_N) \in L^2(\Omega)^N$. We then consider the optimal control generalized Nash problem

  $\min_{u_\nu \in L^2(\Omega)} \; \frac{1}{2} \|y(u_\nu, u_{-\nu}) - y^d_\nu\|_{L^2(\Omega)}^2 + \frac{\alpha_\nu}{2} \|u_\nu\|_{L^2(\Omega)}^2$
  $\text{s.t.} \quad -\Delta y = \sum_{\nu=1}^N u_\nu \ \text{in } \Omega, \quad y = 0 \ \text{on } \partial\Omega,$   (26a)
  $\quad\quad\; u_\nu(x) \in [u^a_\nu(x), u^b_\nu(x)] \ \text{f.a.a. } x \in \Omega, \quad \sum_{\nu=1}^N u_\nu(x) \leq \psi(x) \ \text{f.a.a. } x \in \Omega$   (26b)

with a sufficiently smooth domain Ω. Hence we have a tracking-type objective function for each player ν = 1, ..., N, pointwise lower and upper bounds on the controls $u_\nu$, and an additional upper bound on the sum of controls $\sum_{\nu=1}^N u_\nu$. Using the control-to-state map

  $S : L^2(\Omega)^N \to H^1_0(\Omega) \cap C(\overline{\Omega}), \quad u \mapsto y, \quad Su = \sum_{\nu=1}^N S_\nu u_\nu,$

where the last expression uses the linearity of the solution mapping S, we can rewrite (26) as

  $\min_{u_\nu \in L^2(\Omega)} \; \frac{1}{2} \Big\| \sum_{\mu=1}^N S_\mu u_\mu - y^d_\nu \Big\|_{L^2(\Omega)}^2 + \frac{\alpha_\nu}{2} \|u_\nu\|_{L^2(\Omega)}^2 \quad \text{s.t.} \quad u_\nu \in [u^a_\nu, u^b_\nu], \quad \sum_{\nu=1}^N u_\nu \leq \psi$   (27)

and therefore obtain a GNEP of the form (2) by taking
  $\theta_\nu(u) = \frac{1}{2} \Big\| \sum_{\mu=1}^N S_\mu u_\mu - y^d_\nu \Big\|_{L^2(\Omega)}^2, \qquad \varphi_\nu(u_\nu) = \frac{\alpha_\nu}{2} \|u_\nu\|_{L^2(\Omega)}^2,$
  $X_\nu = \big\{ u_\nu \in L^2(\Omega) \mid u_\nu(x) \in [u^a_\nu(x), u^b_\nu(x)] \ \text{f.a.a. } x \in \Omega \big\}, \qquad A_\nu = \operatorname{Id}, \quad b = \psi.$

Our aim is to show that the resulting operator $P_\theta$ is α-cocoercive for a suitable α > 0. To this end, note that the subsequent analysis makes use of the Gelfand triple $H^1_0(\Omega) \hookrightarrow L^2(\Omega) = L^2(\Omega)^* \hookrightarrow H^1_0(\Omega)^*$, where we identify the Hilbert space $L^2(\Omega)$ with its dual. Moreover, recall that Su and $S_\nu u_\nu$ are elements of $H^1_0(\Omega)$, but we will often view them as elements of the space $L^2(\Omega)$. Formally, this means that we often consider the mappings $i \circ S$ and $i \circ S_\nu$, where $i := i_{H^1_0 \to L^2}$ denotes the canonical embedding of $H^1_0(\Omega)$ into $L^2(\Omega)$. For notational convenience, we follow the standard convention and omit writing the mapping i everywhere.

Proposition 7.1. The operator $P_\theta$ is α-cocoercive with $\alpha := \frac{1}{N} \min_{\nu=1,\dots,N} \big\{ \frac{1}{\|S_\nu\|^2} \big\}$.

Proof. By an elementary calculation, see [22, Lem. 6.2], we obtain
  $\langle P_\theta(u) - P_\theta(v), u - v \rangle_{L^2(\Omega)^N} = \|Su - Sv\|_{L^2(\Omega)}^2.$
Since $\|S_\nu^* w\| \leq \|S_\nu\| \|w\|$, and thus $\|w\| \geq \|S_\nu^* w\| / \|S_\nu\|$, we therefore obtain

  $\langle P_\theta(u) - P_\theta(v), u - v \rangle_{L^2(\Omega)^N} = \|Su - Sv\|_{L^2(\Omega)}^2 = \frac{1}{N} \sum_{\nu=1}^N \big\| (Su - y^d_\nu) - (Sv - y^d_\nu) \big\|_{L^2(\Omega)}^2 \geq \frac{1}{N} \sum_{\nu=1}^N \frac{1}{\|S_\nu\|^2} \big\| S_\nu^*(Su - y^d_\nu) - S_\nu^*(Sv - y^d_\nu) \big\|_{L^2(\Omega)}^2 = \frac{1}{N} \sum_{\nu=1}^N \frac{1}{\|S_\nu\|^2} \big\| [P_\theta(u)]_\nu - [P_\theta(v)]_\nu \big\|_{L^2(\Omega)}^2 \geq \frac{1}{N} \min_{\nu=1,\dots,N} \Big\{ \frac{1}{\|S_\nu\|^2} \Big\} \|P_\theta(u) - P_\theta(v)\|_{L^2(\Omega)^N}^2,$

which is precisely the statement.

In view of the previous result, we need a way to estimate the size of $\|S_\nu\|$.

Proposition 7.2. Suppose that the domain $\Omega \subseteq \mathbb{R}^d$ is contained in a cube with edges of length c > 0. Then the solution operator $S_\nu : L^2(\Omega) \to H^1_0(\Omega)$ of

  $\langle \nabla y_\nu, \nabla v \rangle_{L^2(\Omega)} = \langle u_\nu, v \rangle_{L^2(\Omega)} \quad \text{for all } v \in H^1_0(\Omega)$   (28)

satisfies $\|S_\nu\| \leq c^2$.

Proof. We omit the index ν in this proof. The definition of c together with the Poincaré inequality from, e.g., [5, Thm. 1.5] implies that $\|u\|_{L^2(\Omega)} \leq c \|\nabla u\|_{L^2(\Omega)}$ for all $u \in H^1_0(\Omega)$. We therefore obtain
  $\frac{1}{c^2} \|Su\|_{L^2(\Omega)}^2 \leq \|\nabla Su\|_{L^2(\Omega)}^2 = \langle \nabla Su, \nabla Su \rangle_{L^2(\Omega)} = \langle u, Su \rangle_{L^2(\Omega)} \leq \|u\|_{L^2(\Omega)} \|Su\|_{L^2(\Omega)},$
from which we get $\|Su\|_{L^2(\Omega)} \leq c^2 \|u\|_{L^2(\Omega)}$ and hence
  $\|S\| = \sup_{u \neq 0} \frac{\|Su\|_{L^2(\Omega)}}{\|u\|_{L^2(\Omega)}} \leq c^2,$
which is precisely our assertion.

Summarizing the previous results, we finally obtain the α-cocoercivity of the operator $P_\theta$ with a computable constant α.

Theorem 7.3. Suppose that $\Omega \subseteq \mathbb{R}^d$ is contained in a cube with a side length of c > 0, and let $S_\nu : L^2(\Omega) \to H^1_0(\Omega)$ be the solution operator of (28). Then the operator $P_\theta$ from (5) is α-cocoercive with $\alpha := \frac{1}{N c^4}$.

Proof. The statement follows immediately from Propositions 7.1 and 7.2.

Using N = 4 and Ω = (0, 1)² as in our numerical example below, we can choose c = 1 and then obtain the α-cocoercivity of $P_\theta$ with α = 1/4. We estimated the norm of the operator M defined in (13) for conic constraints in Remark 6.3. This operator norm is necessary to estimate the constant $\gamma \geq 1/\beta^2 + \|M\|$ from Algorithm 4.1. By Remark 6.3, we thus obtain $\|M\| \leq 2(N-1)$. In our case of a game with N = 4 players and Ω = (0, 1)², we obtain $\|M\| \leq 6$. Thus, we have all constants that are required to run Algorithm 4.1.

We implemented the elliptic optimal control GNEP presented above in MATLAB, with N = 4, Ω = (0, 1)², $\alpha_\nu = 1$ for all ν = 1, ..., 4,
  $y^d_\nu(x) = 10^3 \max\big( 0, \tfrac{1}{4} - 4 \max(|x_1 - z^1_\nu|, |x_2 - z^2_\nu|) \big)^2,$
where $z^1 = (0.25, 0.75, 0.25, 0.75)$ and $z^2 = (0.25, 0.25, 0.75, 0.75)$, and the box constraints $[u^a_\nu(x), u^b_\nu(x)] = [-5, 5]$ for all ν = 1, ..., N. The parameters were chosen to be β = 1/4 and γ = 1/β² + 6, as discussed above.
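As a rough numerical cross-check of these constants, the following Python sketch (not the MATLAB implementation used for the experiments) discretizes the Poisson problem on Ω = (0, 1)² with a standard 5-point finite-difference stencil, estimates the norm of the discrete solution operator (which should respect the bound of Proposition 7.2 with c = 1), and prints the resulting parameter choice.

```python
import numpy as np
from scipy.sparse import identity, kron, diags
from scipy.sparse.linalg import splu

n = 40                                   # interior grid points per direction
h = 1.0 / (n + 1)
T = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n)) / h**2
L = kron(identity(n), T) + kron(T, identity(n))    # discrete -Laplacian on (0,1)^2

# Discrete solution operator S_h = L^{-1}; estimate ||S_h|| by power iteration.
lu = splu(L.tocsc())
v = np.ones(n * n)
for _ in range(200):
    v = lu.solve(v)
    v /= np.linalg.norm(v)
S_norm = v @ lu.solve(v)                 # Rayleigh quotient, approx. ||S_h||

c, N = 1.0, 4                            # Omega = (0,1)^2 fits into a unit cube
print(f"||S_h|| ~ {S_norm:.4f} <= c^2 = {c**2}")   # cf. Proposition 7.2
beta = 0.25
gamma = 1.0 / beta**2 + 2 * (N - 1)      # 1/beta^2 + bound on ||M|| from Remark 6.3
print(f"beta = {beta}, gamma = {gamma}")  # gamma = 16 + 6 = 22
```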

We present results for the two choices

  $\psi_1(x) := 3 \cos\Big( 5 \sqrt{(x_1 - 0.5)^2 + (x_2 - 0.5)^2} \Big)$

and

  $\psi_2(x) := 3 \cos\Big( 5 \sqrt{(x_1 - 0.5)^2 + (x_2 - 0.5)^2} \Big) + 2.5.$

Recall that in Section 6 we presented two different ways to rewrite problem (2) as an equality-constrained problem: either by inserting N slack variables, see (23), or by using a single slack variable, see (24). We call these two different approaches A.(23) and A.(24). The numerical results for different discretizations are given in Tables 1 and 2. The numerical behaviour is quite different for the two instances: using ψ₁, the number of iterations stays essentially constant for finer discretizations, whereas for ψ₂, the iteration numbers are increasing. A possible explanation can be seen from the solution, especially by looking at the pictures for the corresponding Lagrange multipliers. Figure 1 indicates that the Lagrange multiplier for ψ₁ is in L², whereas Figure 2 allows the interpretation that, for ψ₂, the optimal multiplier seems to belong to a measure space only.

Table 1: Number of iterations (Its) of Algorithm 4.1 for ψ₁ until the KKT conditions are satisfied with the prescribed accuracy, for the approaches A.(23) and A.(24).

Figure 1: The controls u₁, ..., u₄, their sum, and the Lagrange multiplier for the problem with ψ₁.

Table 2: Number of iterations (Its) of Algorithm 4.1 for ψ₂ until the KKT conditions are satisfied with the prescribed accuracy, for the approaches A.(23) and A.(24).


Constrained Optimization and Lagrangian Duality CIS 520: Machine Learning Oct 02, 2017 Constrained Optimization and Lagrangian Duality Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may

More information

Dedicated to Michel Théra in honor of his 70th birthday

Dedicated to Michel Théra in honor of his 70th birthday VARIATIONAL GEOMETRIC APPROACH TO GENERALIZED DIFFERENTIAL AND CONJUGATE CALCULI IN CONVEX ANALYSIS B. S. MORDUKHOVICH 1, N. M. NAM 2, R. B. RECTOR 3 and T. TRAN 4. Dedicated to Michel Théra in honor of

More information

Lecture 7 Monotonicity. September 21, 2008

Lecture 7 Monotonicity. September 21, 2008 Lecture 7 Monotonicity September 21, 2008 Outline Introduce several monotonicity properties of vector functions Are satisfied immediately by gradient maps of convex functions In a sense, role of monotonicity

More information

Lecture 7: Semidefinite programming

Lecture 7: Semidefinite programming CS 766/QIC 820 Theory of Quantum Information (Fall 2011) Lecture 7: Semidefinite programming This lecture is on semidefinite programming, which is a powerful technique from both an analytic and computational

More information

ON THE UNIQUENESS PROPERTY FOR PRODUCTS OF SYMMETRIC INVARIANT PROBABILITY MEASURES

ON THE UNIQUENESS PROPERTY FOR PRODUCTS OF SYMMETRIC INVARIANT PROBABILITY MEASURES Georgian Mathematical Journal Volume 9 (2002), Number 1, 75 82 ON THE UNIQUENESS PROPERTY FOR PRODUCTS OF SYMMETRIC INVARIANT PROBABILITY MEASURES A. KHARAZISHVILI Abstract. Two symmetric invariant probability

More information

Near-Potential Games: Geometry and Dynamics

Near-Potential Games: Geometry and Dynamics Near-Potential Games: Geometry and Dynamics Ozan Candogan, Asuman Ozdaglar and Pablo A. Parrilo January 29, 2012 Abstract Potential games are a special class of games for which many adaptive user dynamics

More information

minimize x subject to (x 2)(x 4) u,

minimize x subject to (x 2)(x 4) u, Math 6366/6367: Optimization and Variational Methods Sample Preliminary Exam Questions 1. Suppose that f : [, L] R is a C 2 -function with f () on (, L) and that you have explicit formulae for

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

A convergence result for an Outer Approximation Scheme

A convergence result for an Outer Approximation Scheme A convergence result for an Outer Approximation Scheme R. S. Burachik Engenharia de Sistemas e Computação, COPPE-UFRJ, CP 68511, Rio de Janeiro, RJ, CEP 21941-972, Brazil regi@cos.ufrj.br J. O. Lopes Departamento

More information

Notes on Ordered Sets

Notes on Ordered Sets Notes on Ordered Sets Mariusz Wodzicki September 10, 2013 1 Vocabulary 1.1 Definitions Definition 1.1 A binary relation on a set S is said to be a partial order if it is reflexive, x x, weakly antisymmetric,

More information

Douglas-Rachford splitting for nonconvex feasibility problems

Douglas-Rachford splitting for nonconvex feasibility problems Douglas-Rachford splitting for nonconvex feasibility problems Guoyin Li Ting Kei Pong Jan 3, 015 Abstract We adapt the Douglas-Rachford DR) splitting method to solve nonconvex feasibility problems by studying

More information

Implications of the Constant Rank Constraint Qualification

Implications of the Constant Rank Constraint Qualification Mathematical Programming manuscript No. (will be inserted by the editor) Implications of the Constant Rank Constraint Qualification Shu Lu Received: date / Accepted: date Abstract This paper investigates

More information

COMPUTATION OF GENERALIZED NASH EQUILIBRIA WITH

COMPUTATION OF GENERALIZED NASH EQUILIBRIA WITH COMPUTATION OF GENERALIZED NASH EQUILIBRIA WITH THE GNE PACKAGE C. Dutang 1,2 1 ISFA - Lyon, 2 AXA GRM - Paris, 1/19 12/08/2011 user! 2011 OUTLINE 1 GAP FUNCTION MINIZATION APPROACH 2 FIXED-POINT APPROACH

More information

REGULAR LAGRANGE MULTIPLIERS FOR CONTROL PROBLEMS WITH MIXED POINTWISE CONTROL-STATE CONSTRAINTS

REGULAR LAGRANGE MULTIPLIERS FOR CONTROL PROBLEMS WITH MIXED POINTWISE CONTROL-STATE CONSTRAINTS REGULAR LAGRANGE MULTIPLIERS FOR CONTROL PROBLEMS WITH MIXED POINTWISE CONTROL-STATE CONSTRAINTS fredi tröltzsch 1 Abstract. A class of quadratic optimization problems in Hilbert spaces is considered,

More information

Proximal-like contraction methods for monotone variational inequalities in a unified framework

Proximal-like contraction methods for monotone variational inequalities in a unified framework Proximal-like contraction methods for monotone variational inequalities in a unified framework Bingsheng He 1 Li-Zhi Liao 2 Xiang Wang Department of Mathematics, Nanjing University, Nanjing, 210093, China

More information

PDE Constrained Optimization selected Proofs

PDE Constrained Optimization selected Proofs PDE Constrained Optimization selected Proofs Jeff Snider jeff@snider.com George Mason University Department of Mathematical Sciences April, 214 Outline 1 Prelim 2 Thms 3.9 3.11 3 Thm 3.12 4 Thm 3.13 5

More information

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms

Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Coordinate Update Algorithm Short Course Proximal Operators and Algorithms Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 36 Why proximal? Newton s method: for C 2 -smooth, unconstrained problems allow

More information

Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches

Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Patrick L. Combettes joint work with J.-C. Pesquet) Laboratoire Jacques-Louis Lions Faculté de Mathématiques

More information

Maximal Monotone Inclusions and Fitzpatrick Functions

Maximal Monotone Inclusions and Fitzpatrick Functions JOTA manuscript No. (will be inserted by the editor) Maximal Monotone Inclusions and Fitzpatrick Functions J. M. Borwein J. Dutta Communicated by Michel Thera. Abstract In this paper, we study maximal

More information

Lecture 9 Monotone VIs/CPs Properties of cones and some existence results. October 6, 2008

Lecture 9 Monotone VIs/CPs Properties of cones and some existence results. October 6, 2008 Lecture 9 Monotone VIs/CPs Properties of cones and some existence results October 6, 2008 Outline Properties of cones Existence results for monotone CPs/VIs Polyhedrality of solution sets Game theory:

More information

An inexact subgradient algorithm for Equilibrium Problems

An inexact subgradient algorithm for Equilibrium Problems Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,

More information

Your first day at work MATH 806 (Fall 2015)

Your first day at work MATH 806 (Fall 2015) Your first day at work MATH 806 (Fall 2015) 1. Let X be a set (with no particular algebraic structure). A function d : X X R is called a metric on X (and then X is called a metric space) when d satisfies

More information

LECTURE 1: SOURCES OF ERRORS MATHEMATICAL TOOLS A PRIORI ERROR ESTIMATES. Sergey Korotov,

LECTURE 1: SOURCES OF ERRORS MATHEMATICAL TOOLS A PRIORI ERROR ESTIMATES. Sergey Korotov, LECTURE 1: SOURCES OF ERRORS MATHEMATICAL TOOLS A PRIORI ERROR ESTIMATES Sergey Korotov, Institute of Mathematics Helsinki University of Technology, Finland Academy of Finland 1 Main Problem in Mathematical

More information

arxiv: v2 [cs.gt] 1 Dec 2018

arxiv: v2 [cs.gt] 1 Dec 2018 Exact Penalization of Generalized Nash Equilibrium Problems Qin Ba Jong-Shi Pang Original November 18, 2018 arxiv:1811.10674v2 [cs.gt] 1 Dec 2018 Abstract This paper presents an exact penalization theory

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

On invariant graph subspaces

On invariant graph subspaces On invariant graph subspaces Konstantin A. Makarov, Stephan Schmitz and Albrecht Seelmann Abstract. In this paper we discuss the problem of decomposition for unbounded 2 2 operator matrices by a pair of

More information

GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, Dedicated to Franco Giannessi and Diethard Pallaschke with great respect

GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, Dedicated to Franco Giannessi and Diethard Pallaschke with great respect GEOMETRIC APPROACH TO CONVEX SUBDIFFERENTIAL CALCULUS October 10, 2018 BORIS S. MORDUKHOVICH 1 and NGUYEN MAU NAM 2 Dedicated to Franco Giannessi and Diethard Pallaschke with great respect Abstract. In

More information

Banach Spaces II: Elementary Banach Space Theory

Banach Spaces II: Elementary Banach Space Theory BS II c Gabriel Nagy Banach Spaces II: Elementary Banach Space Theory Notes from the Functional Analysis Course (Fall 07 - Spring 08) In this section we introduce Banach spaces and examine some of their

More information

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44 Convex Optimization Newton s method ENSAE: Optimisation 1/44 Unconstrained minimization minimize f(x) f convex, twice continuously differentiable (hence dom f open) we assume optimal value p = inf x f(x)

More information

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods

Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Coordinate Update Algorithm Short Course Subgradients and Subgradient Methods Instructor: Wotao Yin (UCLA Math) Summer 2016 1 / 30 Notation f : H R { } is a closed proper convex function domf := {x R n

More information

arxiv: v2 [math.oc] 28 Jan 2019

arxiv: v2 [math.oc] 28 Jan 2019 On the Equivalence of Inexact Proximal ALM and ADMM for a Class of Convex Composite Programming Liang Chen Xudong Li Defeng Sun and Kim-Chuan Toh March 28, 2018; Revised on Jan 28, 2019 arxiv:1803.10803v2

More information

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems Optimization Methods and Software ISSN: 1055-6788 (Print) 1029-4937 (Online) Journal homepage: http://www.tandfonline.com/loi/goms20 An accelerated non-euclidean hybrid proximal extragradient-type algorithm

More information

On Semicontinuity of Convex-valued Multifunctions and Cesari s Property (Q)

On Semicontinuity of Convex-valued Multifunctions and Cesari s Property (Q) On Semicontinuity of Convex-valued Multifunctions and Cesari s Property (Q) Andreas Löhne May 2, 2005 (last update: November 22, 2005) Abstract We investigate two types of semicontinuity for set-valued

More information

Approaching monotone inclusion problems via second order dynamical systems with linear and anisotropic damping

Approaching monotone inclusion problems via second order dynamical systems with linear and anisotropic damping March 0, 206 3:4 WSPC Proceedings - 9in x 6in secondorderanisotropicdamping206030 page Approaching monotone inclusion problems via second order dynamical systems with linear and anisotropic damping Radu

More information

Optimality Conditions for Nonsmooth Convex Optimization

Optimality Conditions for Nonsmooth Convex Optimization Optimality Conditions for Nonsmooth Convex Optimization Sangkyun Lee Oct 22, 2014 Let us consider a convex function f : R n R, where R is the extended real field, R := R {, + }, which is proper (f never

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

The resolvent average of monotone operators: dominant and recessive properties

The resolvent average of monotone operators: dominant and recessive properties The resolvent average of monotone operators: dominant and recessive properties Sedi Bartz, Heinz H. Bauschke, Sarah M. Moffat, and Xianfu Wang September 30, 2015 (first revision) December 22, 2015 (second

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

Research Article Modified Halfspace-Relaxation Projection Methods for Solving the Split Feasibility Problem

Research Article Modified Halfspace-Relaxation Projection Methods for Solving the Split Feasibility Problem Advances in Operations Research Volume 01, Article ID 483479, 17 pages doi:10.1155/01/483479 Research Article Modified Halfspace-Relaxation Projection Methods for Solving the Split Feasibility Problem

More information

1 Differentiable manifolds and smooth maps

1 Differentiable manifolds and smooth maps 1 Differentiable manifolds and smooth maps Last updated: April 14, 2011. 1.1 Examples and definitions Roughly, manifolds are sets where one can introduce coordinates. An n-dimensional manifold is a set

More information

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING

ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING ACCELERATED FIRST-ORDER PRIMAL-DUAL PROXIMAL METHODS FOR LINEARLY CONSTRAINED COMPOSITE CONVEX PROGRAMMING YANGYANG XU Abstract. Motivated by big data applications, first-order methods have been extremely

More information

Maximal monotone operators are selfdual vector fields and vice-versa

Maximal monotone operators are selfdual vector fields and vice-versa Maximal monotone operators are selfdual vector fields and vice-versa Nassif Ghoussoub Department of Mathematics, University of British Columbia, Vancouver BC Canada V6T 1Z2 nassif@math.ubc.ca February

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

Continuous Sets and Non-Attaining Functionals in Reflexive Banach Spaces

Continuous Sets and Non-Attaining Functionals in Reflexive Banach Spaces Laboratoire d Arithmétique, Calcul formel et d Optimisation UMR CNRS 6090 Continuous Sets and Non-Attaining Functionals in Reflexive Banach Spaces Emil Ernst Michel Théra Rapport de recherche n 2004-04

More information

FIXED POINT ITERATIONS

FIXED POINT ITERATIONS FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in

More information

Math 273a: Optimization Overview of First-Order Optimization Algorithms

Math 273a: Optimization Overview of First-Order Optimization Algorithms Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization

More information

CONVERGENCE THEORY. G. ALLAIRE CMAP, Ecole Polytechnique. 1. Maximum principle. 2. Oscillating test function. 3. Two-scale convergence

CONVERGENCE THEORY. G. ALLAIRE CMAP, Ecole Polytechnique. 1. Maximum principle. 2. Oscillating test function. 3. Two-scale convergence 1 CONVERGENCE THEOR G. ALLAIRE CMAP, Ecole Polytechnique 1. Maximum principle 2. Oscillating test function 3. Two-scale convergence 4. Application to homogenization 5. General theory H-convergence) 6.

More information

Expanding the reach of optimal methods

Expanding the reach of optimal methods Expanding the reach of optimal methods Dmitriy Drusvyatskiy Mathematics, University of Washington Joint work with C. Kempton (UW), M. Fazel (UW), A.S. Lewis (Cornell), and S. Roy (UW) BURKAPALOOZA! WCOM

More information

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS

ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS ON THE GLOBAL AND LINEAR CONVERGENCE OF THE GENERALIZED ALTERNATING DIRECTION METHOD OF MULTIPLIERS WEI DENG AND WOTAO YIN Abstract. The formulation min x,y f(x) + g(y) subject to Ax + By = b arises in

More information

Convergence rate estimates for the gradient differential inclusion

Convergence rate estimates for the gradient differential inclusion Convergence rate estimates for the gradient differential inclusion Osman Güler November 23 Abstract Let f : H R { } be a proper, lower semi continuous, convex function in a Hilbert space H. The gradient

More information

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE

LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE LECTURE 25: REVIEW/EPILOGUE LECTURE OUTLINE CONVEX ANALYSIS AND DUALITY Basic concepts of convex analysis Basic concepts of convex optimization Geometric duality framework - MC/MC Constrained optimization

More information

A generalized forward-backward method for solving split equality quasi inclusion problems in Banach spaces

A generalized forward-backward method for solving split equality quasi inclusion problems in Banach spaces Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 4890 4900 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa A generalized forward-backward

More information

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization

Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä. New Proximal Bundle Method for Nonsmooth DC Optimization Kaisa Joki Adil M. Bagirov Napsu Karmitsa Marko M. Mäkelä New Proximal Bundle Method for Nonsmooth DC Optimization TUCS Technical Report No 1130, February 2015 New Proximal Bundle Method for Nonsmooth

More information

Basic Properties of Metric and Normed Spaces

Basic Properties of Metric and Normed Spaces Basic Properties of Metric and Normed Spaces Computational and Metric Geometry Instructor: Yury Makarychev The second part of this course is about metric geometry. We will study metric spaces, low distortion

More information

On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean

On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean Renato D.C. Monteiro B. F. Svaiter March 17, 2009 Abstract In this paper we analyze the iteration-complexity

More information

Analysis Comprehensive Exam Questions Fall 2008

Analysis Comprehensive Exam Questions Fall 2008 Analysis Comprehensive xam Questions Fall 28. (a) Let R be measurable with finite Lebesgue measure. Suppose that {f n } n N is a bounded sequence in L 2 () and there exists a function f such that f n (x)

More information

Robust error estimates for regularization and discretization of bang-bang control problems

Robust error estimates for regularization and discretization of bang-bang control problems Robust error estimates for regularization and discretization of bang-bang control problems Daniel Wachsmuth September 2, 205 Abstract We investigate the simultaneous regularization and discretization of

More information

Linear Programming Redux

Linear Programming Redux Linear Programming Redux Jim Bremer May 12, 2008 The purpose of these notes is to review the basics of linear programming and the simplex method in a clear, concise, and comprehensive way. The book contains

More information

Compact operators on Banach spaces

Compact operators on Banach spaces Compact operators on Banach spaces Jordan Bell jordan.bell@gmail.com Department of Mathematics, University of Toronto November 12, 2017 1 Introduction In this note I prove several things about compact

More information

Penalty and Barrier Methods General classical constrained minimization problem minimize f(x) subject to g(x) 0 h(x) =0 Penalty methods are motivated by the desire to use unconstrained optimization techniques

More information

Topological properties of Z p and Q p and Euclidean models

Topological properties of Z p and Q p and Euclidean models Topological properties of Z p and Q p and Euclidean models Samuel Trautwein, Esther Röder, Giorgio Barozzi November 3, 20 Topology of Q p vs Topology of R Both R and Q p are normed fields and complete

More information

Finite Convergence for Feasible Solution Sequence of Variational Inequality Problems

Finite Convergence for Feasible Solution Sequence of Variational Inequality Problems Mathematical and Computational Applications Article Finite Convergence for Feasible Solution Sequence of Variational Inequality Problems Wenling Zhao *, Ruyu Wang and Hongxiang Zhang School of Science,

More information

ON A HYBRID PROXIMAL POINT ALGORITHM IN BANACH SPACES

ON A HYBRID PROXIMAL POINT ALGORITHM IN BANACH SPACES U.P.B. Sci. Bull., Series A, Vol. 80, Iss. 3, 2018 ISSN 1223-7027 ON A HYBRID PROXIMAL POINT ALGORITHM IN BANACH SPACES Vahid Dadashi 1 In this paper, we introduce a hybrid projection algorithm for a countable

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Estimates for probabilities of independent events and infinite series

Estimates for probabilities of independent events and infinite series Estimates for probabilities of independent events and infinite series Jürgen Grahl and Shahar evo September 9, 06 arxiv:609.0894v [math.pr] 8 Sep 06 Abstract This paper deals with finite or infinite sequences

More information

Spring 2017 CO 250 Course Notes TABLE OF CONTENTS. richardwu.ca. CO 250 Course Notes. Introduction to Optimization

Spring 2017 CO 250 Course Notes TABLE OF CONTENTS. richardwu.ca. CO 250 Course Notes. Introduction to Optimization Spring 2017 CO 250 Course Notes TABLE OF CONTENTS richardwu.ca CO 250 Course Notes Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4, 2018 Table

More information

MAT 578 FUNCTIONAL ANALYSIS EXERCISES

MAT 578 FUNCTIONAL ANALYSIS EXERCISES MAT 578 FUNCTIONAL ANALYSIS EXERCISES JOHN QUIGG Exercise 1. Prove that if A is bounded in a topological vector space, then for every neighborhood V of 0 there exists c > 0 such that tv A for all t > c.

More information

Identifying Active Constraints via Partial Smoothness and Prox-Regularity

Identifying Active Constraints via Partial Smoothness and Prox-Regularity Journal of Convex Analysis Volume 11 (2004), No. 2, 251 266 Identifying Active Constraints via Partial Smoothness and Prox-Regularity W. L. Hare Department of Mathematics, Simon Fraser University, Burnaby,

More information

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION

I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION I P IANO : I NERTIAL P ROXIMAL A LGORITHM FOR N ON -C ONVEX O PTIMIZATION Peter Ochs University of Freiburg Germany 17.01.2017 joint work with: Thomas Brox and Thomas Pock c 2017 Peter Ochs ipiano c 1

More information

Kernel Methods. Machine Learning A W VO

Kernel Methods. Machine Learning A W VO Kernel Methods Machine Learning A 708.063 07W VO Outline 1. Dual representation 2. The kernel concept 3. Properties of kernels 4. Examples of kernel machines Kernel PCA Support vector regression (Relevance

More information

An introduction to some aspects of functional analysis

An introduction to some aspects of functional analysis An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms

More information

Unbounded operators on Hilbert spaces

Unbounded operators on Hilbert spaces Chapter 1 Unbounded operators on Hilbert spaces Definition 1.1. Let H 1, H 2 be Hilbert spaces and T : dom(t ) H 2 be a densely defined linear operator, i.e. dom(t ) is a dense linear subspace of H 1.

More information

Convex Optimization and Modeling

Convex Optimization and Modeling Convex Optimization and Modeling Duality Theory and Optimality Conditions 5th lecture, 12.05.2010 Jun.-Prof. Matthias Hein Program of today/next lecture Lagrangian and duality: the Lagrangian the dual

More information

Analysis II: The Implicit and Inverse Function Theorems

Analysis II: The Implicit and Inverse Function Theorems Analysis II: The Implicit and Inverse Function Theorems Jesse Ratzkin November 17, 2009 Let f : R n R m be C 1. When is the zero set Z = {x R n : f(x) = 0} the graph of another function? When is Z nicely

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

On Unitary Relations between Kre n Spaces

On Unitary Relations between Kre n Spaces RUDI WIETSMA On Unitary Relations between Kre n Spaces PROCEEDINGS OF THE UNIVERSITY OF VAASA WORKING PAPERS 2 MATHEMATICS 1 VAASA 2011 III Publisher Date of publication Vaasan yliopisto August 2011 Author(s)

More information