On the complexity of the hybrid proximal extragradient method for the iterates and the ergodic mean

Renato D.C. Monteiro    B. F. Svaiter

March 17, 2009

Abstract

In this paper we analyze the iteration-complexity of the hybrid proximal extragradient (HPE) method for finding a zero of a maximal monotone operator recently proposed by Solodov and Svaiter. One of the key points of our analysis is the use of a new termination criterion based on the $\varepsilon$-enlargement of a maximal monotone operator. The advantage of using this termination criterion is that its definition does not depend on the boundedness of the domain of the operator. We then show that Korpelevich's extragradient method for solving monotone variational inequalities falls within the framework of the HPE method. As a consequence, using the complexity analysis of the HPE method, we obtain new complexity bounds for Korpelevich's extragradient method which do not require the feasible set to be bounded, as assumed in a recent paper by Nemirovski. Another feature of our analysis is that the derived iteration-complexity bounds are proportional to the distance of the initial point to the solution set. Using the framework of the HPE method, we also study the complexity of a variant of a Newton-type extragradient algorithm proposed by Solodov and Svaiter for finding a zero of a smooth monotone function with Lipschitz continuous Jacobian.

1 Introduction

A broad class of optimization, saddle point, equilibrium and variational inequality problems can be posed as the monotone inclusion (MI) problem, namely: finding $x$ such that $0 \in T(x)$, where $T$ is a maximal monotone point-to-set operator. The proximal point method, proposed by Rockafellar [9], is a classical iterative scheme for solving the MI problem which generates a sequence $\{x_k\}$ according to
$$x_k = (\lambda_k T + I)^{-1}(x_{k-1}).$$
It has been used as a generic framework for the design and analysis of several implementable algorithms. The classical inexact version of the proximal point method allows for the presence of a sequence of summable errors in the above iteration, i.e.:
$$\|x_k - (\lambda_k T + I)^{-1}(x_{k-1})\| \le e_k, \qquad \sum_{k=1}^{\infty} e_k < \infty.$$

School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA (monteiro@isye.gatech.edu). The work of this author was partially supported by NSF Grant CCF and ONR Grant ONR N.

IMPA, Instituto de Matematica Pura e Aplicada, Rio de Janeiro, RJ, Brasil (benar@impa.br). The work of this author was partially supported by FAPERJ grants E-26/ /2006 and E-26/ /2008 and CNPq grants / and /.
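For a concrete picture of the exact iteration above, here is a minimal sketch assuming $T$ is the monotone affine operator $T(x) = Ax + b$ with $A + A^T$ positive semidefinite, so each resolvent evaluation is a single linear solve; the function name, the constant stepsize and the example data are ours, for illustration only.

```python
import numpy as np

def proximal_point(A, b, x0, lam=1.0, n_iters=50):
    """Exact proximal point iterations x_k = (lam*T + I)^{-1}(x_{k-1}), with a
    constant stepsize lam, for the monotone affine operator T(x) = A x + b
    (monotone when A + A^T is positive semidefinite).  Each resolvent step
    solves the linear system (lam*A + I) x_k = x_{k-1} - lam*b."""
    x = np.asarray(x0, dtype=float)
    M = lam * np.asarray(A, dtype=float) + np.eye(len(x))
    for _ in range(n_iters):
        x = np.linalg.solve(M, x - lam * b)
    return x

# Tiny usage example: a monotone operator that is not a gradient.
A = np.array([[1.0, 2.0], [-2.0, 1.0]])
b = np.array([1.0, -1.0])
x_star = proximal_point(A, b, x0=np.zeros(2))
print(np.linalg.norm(A @ x_star + b))   # close to 0, i.e. 0 is (nearly) in T(x_star)
```

The inexact versions discussed next replace this exact resolvent evaluation by an approximate one, either with summable absolute errors or, as in the HPE method, with a relative error criterion.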

2 Convergence results under the above error condition have been establish in [9] and have been used in the convergence analysis of other methods that can be recast in the above framework. New inexact versions of the proximal point method, with relative error criteria were proposed by Solodov and Svaiter [10, 11, 13, 12]. In this paper we will analyze the iteration-complexity of one of these inexact versions of the proximal point method introduced in [10], namely the hybrid proximal extragradient (HPE) method. One of the key points of our analysis is the use of a new termination criteria based on the ε-enlargement of T introduced in [1]. More specifically, given ε > 0, the algorithm terminates whenever it finds a point ȳ and a pair ( v, ε) such that v T ε (ȳ), max{ v, ε} ε. (1) For each x, T ε (x) is is an outer approximation of T (x) which coincides with T (x) when ε = 0. Hence, for ε = 0 the above termination criteria reduces to the condition that 0 T (x). The ε- enlargement of maximal monotone operators is a generalization of the ε-subgradient enlargement of the subdifferential of a convex function. The advantages of using this termination criterion it that it does not require boundedness of the domain of T. Another feature of our analysis it that the derived iteration complexity bounds are proportional to the distance of the initial point to the solution set. Results of this kind are know for minimization of convex functions but, to the best our knowledge, are new in the context of monotone variational inequalities (see for example [6]). We then establish that Korpelevich s extragradient method for solving variational inequalities is a special case of the HPE. This allows us to obtain an O(d 0 /ε) iteration complexity for termination criteria (1), where d 0 is the distance of the initial iterate to the solution set. Since, together with every iterate ȳ k, the method also generates a pair ( v k, ε k ) so that (1) can be checked, there is no need to assume the feasible set to be bounded or to estimate d 0. We also translate (and sometimes strengthen) the above complexity results to the context of monotone variational inequalities with linear operators and/or bounded feasible sets, and monotone complementarity problems. It should be noted that, prior to this work, Nemirovski [5] and Nesterov [7] have proposed and/or study the complexity of algorithms for solving the monotone variational inequality problems. More specifically, Nemirovski [5] study the complexity of Korpelevich s extragradient method under the assumption that the feasible set is bounded and an upper bound of its diameter is known. Nesterov [7] proposed a new dual extrapolation algorithm for solving variational inequalities whose termination depends on the guess of a ball centered at the initial iterate. Using also the framework of the HPE method, we study the complexity of a variant of a Newtontype extragradient algorithm proposed in [10] for finding a zero of a smooth monotone function with Lipschitz continuous Jacobian. 2 The ε-enlargement Since our complexity analysis is based on the ε-enlargement of a monotone operator, in this section we give its definition and review some of its properties. We also derive new results for the ε-enlargement of Lipschitz continuous monotone operators. A point-to-set operator T : R n R n is a relation T R n R n and T (x) = {v R n (x, v) T }. 2

3 Alternatively, one can consider T as a multi-valued function of R n into the the family (R n ) = 2 (Rn ) of subsets of R n. Regardless of the approach, it is usual to identify T with its graph, An operator T : R n R n is monotone if Gr(T ) = {(x, v) R n R n v T (x)}. v ṽ, x x 0, (x, v), ( x, ṽ) T and T is maximal monotone if it is monotone and maximal in the family of monotone operators with respect to the partial order of inclusion, i.e., S : R n R n monotone and Gr(S) Gr(T ) implies that S = T. In [1], Burachik, Iusem and Svaiter introduced the ε-enlargement of maximal monotone operators. Here we extend this concept to a generic point-to-set operator in R n. Given T : R n R n and a scalar ε, define T ε : R n R n as T ε (x) = {v R n x x, v ṽ ε, x R n, ṽ T ( x)}, x R n. (2) We now state a few useful properties of the operator T ε that will be needed in our presentation. Proposition 1. Let T, T : R n R n. Then, a) if ε 1 ε 2, then T ε 1 (x) T ε 2 (x) for every x R n ; b) T ε (x) + (T ) ε (x) (T + T ) ε+ε (x) for every x R n and ε, ε R; c) T is monotone if, and only if, T T 0 ; d) T is maximal monotone if, and only if, T = T 0 ; e) if T is maximal monotone, {(x k, v k, ε k )} R n R n R + converges to ( x, v, ε), and v k T ε k(x k ) for every k, then v T ε ( x). Proof. Statements (a), (b), (c), and (d) follow directly from definition (2) and the definition of (maximal) monotonicity. For a proof of statement (e), see [3]. We now make two remarks about Proposition 1. If T is a monotone operator and ε 0, it follows from a) and d), that T ε (x) T (x) for every x R n, and hence that T ε is really an enlargement of T. Moreover, if T is maximal monotone, then e) says that T and T ε coincide when ε = 0. The ε-enlargement of monotone operators is a generalization of the ε-subdifferential of convex functions. Recall that for a proper function f : R n R and scalar ε 0, the ε-subdifferential of f if the operator ε f : R n R n defined as ε f(x) = {v f(y) f(x) + y x, v ε, y R n }, x R n. When ε = 0, the operator ε f is simply denoted by f and is referred to as the subdifferential of f. The operator f is trivially monotone if f is proper. If f is a proper lower semi-continuous convex function, then f is maximal monotone [8]. The following result lists some useful properties about the ε-subdifferential of a proper convex function. 3

4 Proposition 2. Let f : R n R be a proper convex function. Then, a) ε f(x) ( f) ε (x) for any ε 0 and x R n ; b) ε f(x) = {v f(x) + f (v) x, v + ε} for any ε 0 and x R n ; c) if v f(x) and f(y) <, then v ε f(y), where ε := f(y) [f(x) + y x, v ]. Note that, due to the definition of T ε, the verification of the inclusion v T ε (x) requires checking an infinite number of inequalities. This verification is feasible only for specially-structured instances of operators T. However, it is possible to compute points in the graph of T ε using the following weak transportation formula [2]. Theorem 3 ([2, Theorem 2.3]). Suppose that T : R n R n is maximal monotone. Let x i, v i R n and ε i, α i R +, for i = 1,..., k, be such that v i T ε i (x i ), i = 1,..., k, α i = 1, and define x = α i x i, v = Then, the following statements hold: a) ε 0 and v T ε ( x); α i v i, ε = α i ε i + α i x i x, v i v. (3) b) if, in addition, T = f for some proper lower semi-continuous convex function f and v i εi f(x i ) for i = 1,..., k, then v ε f( x). Whenever necessary, we will identify a map F : Ω R n R n with the point-to-set operator F : R n R n, { {F (x)}, x Ω ; F (x) =, otherwise. The following result is an immediate consequence of Theorem 3. Corollary 4. Let a monotone map F : Ω R n R n, points x 1,..., x k R n and nonnegative scalars α 1,..., α k such that k α i = 1 be given. Define x := α i x i, F := α i F (x i ), ε := α i x i x, F (x i ) F. (4) Then, ε 0 and F F ε ( x). Proof. First use Zorn s Lemma to conclude that there exist a maximal monotone T : R n R n which extends F, that is F T. To end the proof, apply Theorem 3 to T and use the assumption that is extends F. 4
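Because the ergodic constructions of Sections 4 and 5 repeatedly instantiate formula (3), a small numerical sketch may be useful. The function below is our own transcription of (3) in Python/NumPy (the name and array conventions are ours); it merely aggregates given triples and does not verify the inclusions $v_i \in T^{\varepsilon_i}(x_i)$, which are assumed.

```python
import numpy as np

def transport(xs, vs, eps, alphas):
    """Weak transportation formula (3): given pairs v_i in T^{eps_i}(x_i) and
    convex weights alpha_i, returns (x_bar, v_bar, eps_bar) such that
    eps_bar >= 0 and v_bar in T^{eps_bar}(x_bar) (Theorem 3)."""
    xs, vs = np.atleast_2d(xs).astype(float), np.atleast_2d(vs).astype(float)
    eps, alphas = np.asarray(eps, float), np.asarray(alphas, float)
    x_bar = alphas @ xs
    v_bar = alphas @ vs
    eps_bar = alphas @ eps + sum(
        a * np.dot(x - x_bar, v - v_bar) for a, x, v in zip(alphas, xs, vs))
    return x_bar, v_bar, float(eps_bar)
```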

5 Definition 1. For a constant L 0, the map F : Ω R m R n is said to be L-Lipschitz continuous if F (x) F ( x) L x x, x, x X. (5) Now we are ready to prove that, for a monotone Lipschitz continuous map F defined on the whole space R n, the distance between any vector in F ε (x) and F (x) is proportional to ε. Proposition 5. If F : R n R n is monotone and L-Lipschitz continuous on R n, then for every x R n, ε 0, and v F ε (x): F (x) v 2 Lε. Proof. Let v F ε (x) be given. Then, for any x R n, we have F (x) v, x x = F ( x) v, x x F ( x) F (x), x x ε F ( x) F (x) x x ε L x x 2, where the first inequality follows from the definition of F ε and the Cauchy-Schwarz inequality and the second one from (5). Specializing this inequality for where p = v F (x), we obtain p 2 Lε. x = x + 1 2L p, Corollary 6. Let a monotone map F : R n R n, points x 1,..., x k R n and nonnegative scalars α 1,..., α k such that k α i = 1 be given and define x, F and ε as in Corollary 4. Then, ε 0 and F ( x) F 2 ε L. (6) Proof. By Corollary 4, we have F F ε ( x), which together with Proposition 5 imply that (6) holds. We observe that when F is an affine (monotone) map, the left hand side of (6) is zero in view of (4). Hence, in this case, the right hand side of (6) is a poor estimate of the error F ( x) F. We will now develop a better estimate of this error which depends on a certain constant which measures the nonlinearity of a monotone map F. Definition 2. For a monotone map F : Ω R n R n, let Nonl(F ) be the infimum of all L 0 such that there exist an L-Lipschitz map G an an affine map A such that with G and A monotone. F = G + A, Clearly, if F is a monotone affine map, then Nonl(F ) = 0. Note also that if F is monotone and L-Lipschitz, then Nonl(F ) L. We note however that Nonl(F ) can be much smaller than L for many relevant instances. For example, if F = G + µa, where µ 0, A is a monotone affine map and the map G is monotone and L-Lipschitz then we have Nonl(F ) L. Hence, in the latter case, Nonl(F ) is bounded by a constant which does not depend on µ while the Lipschitz constant of F converges to as µ if A is not constant. 5
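The bound of Corollary 6 (and, implicitly, Proposition 5) is easy to check numerically for any monotone Lipschitz map defined on all of $\mathbb{R}^n$. Below is a small sketch with an example map of our choosing, the componentwise $\tanh$, which is the gradient of a smooth convex function and hence monotone and 1-Lipschitz; the function name and data are ours.

```python
import numpy as np

def ergodic_gap_check(F, xs, alphas, L):
    """Numerical illustration of Corollary 6: with x_bar, F_bar, eps built as
    in (4) for a monotone L-Lipschitz F, we must have
    ||F(x_bar) - F_bar|| <= 2*sqrt(eps*L)."""
    xs = np.atleast_2d(xs).astype(float)
    alphas = np.asarray(alphas, float)
    Fs = np.array([F(x) for x in xs])
    x_bar = alphas @ xs
    F_bar = alphas @ Fs
    eps = sum(a * np.dot(x - x_bar, Fx - F_bar) for a, x, Fx in zip(alphas, xs, Fs))
    return np.linalg.norm(F(x_bar) - F_bar), 2.0 * np.sqrt(eps * L)

F = lambda x: np.tanh(x)   # monotone and 1-Lipschitz on R^n
lhs, rhs = ergodic_gap_check(F, xs=[[1, 0], [0, 1], [2, 2]], alphas=[0.3, 0.3, 0.4], L=1.0)
print(lhs <= rhs, lhs, rhs)   # True; for an affine F the left-hand side is exactly 0
```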

6 Proposition 7. Let a monotone map F : R n R n, points x 1,..., x k R n and nonnegative scalars α 1,..., α k such that k α i = 1 be given and define x as in (3) and F and ε as in Corollary 4. Then, ε 0 and F ( x) F 2 ε N F. (7) where N F := Nonl(F, R n ). Proof. Suppose that F = G+A where G is an L-Lipschitz monotone map and A is an affine monotone map. Define Ḡ = k α ig(x i ), ε g := k α i G(x i ) Ḡ, x i x, ā = k α ia(x i ), ε a := k α i A(x i ) ā, x i x. Since G is a monotone map, it follows from Corollary 6 that ε g 0 and G( x) Ḡ 2 ε g L. Moreover, since A is affine and monotone, we conclude that ā = A( x) and ε a 0. Also, noting that F = Ḡ + ā and ε = εa + ε g ε g, we conclude that F ( x) F = [G( x) + A( x)] [ Ḡ + ā] = G( x) Ḡ 2 ε g L 2 εl. Bound (7) now follows by noting that N F is the infimum of all L for G and A as above. 3 Approximate solutions of the variational inequality problem In this section, we introduce two notions of approximate solutions of variational inequalities problem. We then discuss how the monotone VI problem can be viewed as a special instance of the monotone inclusion problem and interpret these two notions of approximate solutions for the VI problem in terms of criteria (1). We assume throughout this section that Ω R n, F : Ω R n is a continuous monotone map and X Ω is a non-empty closed convex set. The (monotone) variational inequality (VI) problem with respect to the pair (F, X), denoted by V IP (F, X), consists of finding x such that x X, x x, F (x ) 0, x X. (9) It is well know that, under the above assumptions, condition (9) is equivalent to x X, x x, F (x) 0, x X. (10) We will now introduce two notions of approximate solutions of the V IP (F, X) which are essentially relaxations of the characterizations (9) and (10) of an exact solution of VI problem. Definition 3. Given x X, the pair (r, ε) R n R + is a weak residual of x for V IP (F, X) if and (r, ε) is a strong residual of x for V IP (F, X) if x x, F (x) r ε, x X, (11) x x, F ( x) r ε, x X. (12) (8) 6
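When $X$ is simple enough that the supremum over $X$ is available in closed form, the smallest $\varepsilon$ making a given $r$ a strong residual (condition (12)) is directly computable. A minimal sketch for a box-shaped $X$, which is our illustrative choice and not an assumption of the paper:

```python
import numpy as np

def strong_residual_eps(x_bar, F_x_bar, r, lo, hi):
    """Smallest eps >= 0 such that (r, eps) is a strong residual (12) of x_bar
    for VIP(F, X) with the box X = {x : lo <= x <= hi}:
    eps = max(0, sup_{x in X} <x - x_bar, F(x_bar) - r>)."""
    g = np.asarray(F_x_bar, float) - np.asarray(r, float)
    x_sup = np.where(g > 0, hi, lo)              # maximizer of <x, g> over the box
    return max(0.0, float(np.dot(x_sup - np.asarray(x_bar, float), g)))
```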

7 If (r, ε) is a strong residual of x, the it is also a weak residual of x. Note that the solutions of V IP (F, X) are exactly the points in X with weak/strong residuals (0, 0). More generally, the size of the residual of a point measures its quality as an approximate solution of V IP (F, X). The next definition formalize this observation. Definition 4. A point x X is a (ρ, ε)-weak (resp., (ρ, ε)-strong) solution of V IP (F, X) if there exists r R n such that r ρ and (r, ε) is a weak (resp., strong) residual of x for V IP (F, X). A (0, ε)-weak (resp., (0, ε)-strong) solution of V IP (F, X) is simply called an ε-weak (resp., ε- strong) solution of V IP (F, X). Note that an (ρ, ε)-strong solution is also a (ρ, ε)-weak solution. The following result shows that, when X is bounded, every approximate solution of V IP (F, X) with weak (resp., strong) residual (r, ε) is an ε -weak (resp., ε -strong) solution of V IP (F, X) for some ε. Recall that the diameter D X of a set X R is defined as D X := sup{ x 1 x 2 : x 1, x 2 X}. (13) Proposition 8. Assume that X has finite diameter D X. For any x X, if (r, ε) R n R + is a weak (resp., strong) residual of x for V IP (F, X), then x is an ε -weak (resp., ε -strong) solution of V IP (F, X), where ε := ε + sup r, x x ε + r D X. x X Proof. This result follows from Definition 3, Definition 4, the definition of D X and the Cauchy- Schwarz inequality. We will now characterize the above notions of approximate solutions for the V IP (F, X) in terms of ε-enlargements of certain maximal monotone operators. Recall that the normal cone operator of X is the point-to-set map N X : R n R n, {, x / X, N X (x) = {v R n (14), y x, v 0, y X}, x X. From the above definition, it follows that (9) is equivalent to F (x ) N(x ), and hence to the monotone inclusion problem 0 (F + N X )(x ). (15) Note that the assumption on F and X guarantees maximal monotonicity of F +N X (see for example Proposition of [4]). It turns out that approximate solutions of V IP (F, X) with weak, or strong, residual (r, ε) are related with certain approximate solutions of problem (15) as described by the following result. Proposition 9. Let x X and pair (r, ε) R n R + be given. Then, the following equivalences hold: a) (r, ε) is a weak residual of x for V IP (F, X) if, and only if, r (F + N X ) ε ( x); b) (r, ε) is a strong residual of x for V IP (F, X) if, and only if, r (F + N ε X )( x). 7

8 Proof. a) Using definition (2), we have that r (F + N X ) ε ( x) is equivalent to x x, F (x) + q r ε, x R n, q N X (x). Taking the infimum of the left hand side for q N X (x) and using the fact that the domain of N X is X and the assumption that x X, we conclude that this condition is equivalent to (11), i.e., to (r, ε) being a weak residual of x for V IP (F, X). b) This equivalence follows from a) with r = r F ( x) and F 0. The following result follows immediately from Proposition 9. Corollary 10. For any x X and ε 0, the following equivalences hold: a) x is an ε-weak solution of V IP (F, X) if, and only if, 0 (F + N X ) ε ( x); b) x is an ε-strong solution of V IP (F, X) if, and only if, 0 (F + N ε X ) ( x). We will now provide a different characterization for x to be an approximate solution of V IP (F, X) with strong residual (r, ε) for the case when the feasible set is a closed convex cone. We will first review a few well-known concepts. The indicator function of X is the function δ X : R n R defined as { 0, x X, δ X (x) =, otherwise. The normal cone operator N X of X can be expressed in terms of δ X as N X = δ X. Direct use of definition (2) and the definition of ε-subdifferential shows that for f = δ X, inclusion on Proposition 2(a) holds as equality, i.e.: (N X ) ε = ε δ X. (16) We now state the following technical result with characterizes membership in (N X ) ε in terms of certain ε-complementarity conditions. Lemma 11. If K is a nonempty closed convex cone and K is its dual cone, i.e. then, for every x K, we have K = {v R n x, v 0, x K}, q (N K ) ε (x) q K, x, q ε. Proof. Since (N K ) ε = ε δ X, it follows from Proposition 2(b) that the condition q (N X ) ε (x) is equivalent to (δ K ) ( q) = δ K (x) + (δ K ) ( q) x, q + ε, where the first equality follows from the assumption that x K. To end the proof, it suffices to use the fact that (δ K ) = δ K. When X = K is a closed convex cone, the following result shows that the condition that x be an approximate solution of V IP (F, K) with strong residual (r, ε) is equivalent to the pair ( x, q) satisfy the ε-complementarity condition, where q is a vector that is close to F ( x) in the sense that F ( x) q = r. 8

Proposition 12. Assume that $X = K$, where $K$ is a nonempty closed convex cone, and that conditions A.2 and A.3 hold. Then, $(r, \varepsilon)$ is a strong residual of $\bar x \in K$ for $VIP(F, K)$ if, and only if, the vector $q := F(\bar x) - r$ satisfies the conditions $q \in K^*$ and $\langle \bar x, q \rangle \le \varepsilon$.

Proof. This result follows as an immediate consequence of Proposition 9(b) and Lemma 11.

4 The hybrid proximal extragradient method

Throughout this section, we assume that $T : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is a maximal monotone operator. The monotone inclusion problem for $T$ is: find $x$ such that
$$0 \in T(x). \qquad (17)$$
We also assume throughout this section that this problem has a solution, that is, $T^{-1}(0) \neq \emptyset$. In this section we study the iteration-complexity of the hybrid proximal extragradient method introduced in [10] for solving the above problem. We start by stating the hybrid proximal extragradient method (HPE method from now on).

Hybrid Proximal Extragradient (HPE) Method:

0) Let $x_0 \in \mathbb{R}^n$ and $0 \le \sigma < 1$ be given and set $k = 1$.

1) Choose $\lambda_k > 0$ and find $y_k, v_k \in \mathbb{R}^n$, $\sigma_k \in [0, \sigma]$ and $\varepsilon_k \ge 0$ such that
$$v_k \in T^{\varepsilon_k}(y_k), \qquad \|\lambda_k v_k + y_k - x_{k-1}\|^2 + 2\lambda_k \varepsilon_k \le \sigma_k^2 \|y_k - x_{k-1}\|^2. \qquad (18)$$

2) Define $x_k = x_{k-1} - \lambda_k v_k$, set $k \leftarrow k + 1$ and go to step 1.

end

We now make several remarks about the HPE method. First, the HPE method does not specify how to choose $\lambda_k$ and how to find $y_k$, $v_k$ and $\varepsilon_k$ as in (18). The particular choice of $\lambda_k$ and the algorithm used to compute $y_k$, $v_k$ and $\varepsilon_k$ will depend on the particular implementation of the method and the properties of the operator $T$. Second, if $y := (\lambda_k T + I)^{-1} x_{k-1}$ is the exact proximal point iterate, or equivalently
$$v \in T(y), \qquad (19)$$
$$\lambda_k v + y - x_{k-1} = 0, \qquad (20)$$
for some $v \in \mathbb{R}^n$, then $(y_k, v_k) = (y, v)$ and $\varepsilon_k = 0$ satisfy (18). Therefore, the error criterion (18) relaxes the inclusion (19) to $v \in T^{\varepsilon}(y)$ and relaxes equation (20) by allowing a small error relative to $\|y_k - x_{k-1}\|$. Third, note also that, due to step 2 of the HPE method and the error criterion in (18), we have
$$\|\lambda_k v_k + y_k - x_{k-1}\| = \|y_k - x_k\|, \qquad \|y_k - x_k\|^2 + 2\lambda_k \varepsilon_k \le \sigma_k^2 \|y_k - x_{k-1}\|^2. \qquad (21)$$

Before establishing the iteration-complexity results for the HPE method, we need some technical results.

Lemma 13. For every $k \in \mathbb{N}$,
$$(1 - \sigma) \|y_k - x_{k-1}\| \le \lambda_k \|v_k\| \le (1 + \sigma) \|y_k - x_{k-1}\|. \qquad (22)$$
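Since the HPE method leaves the choice of $\lambda_k$ and the computation of $(y_k, v_k, \varepsilon_k)$ abstract, any concrete code is necessarily parameterized by an inner oracle. The sketch below fixes only the outer loop; `inner_solver`, `lam_rule`, the tolerance and the runtime validation of (18) are placeholders we introduce for illustration, not part of the paper.

```python
import numpy as np

def hpe(x0, inner_solver, lam_rule, sigma=0.9, tol=1e-6, max_iter=1000):
    """Outer loop of the HPE method.  inner_solver(x_prev, lam) must return a
    triple (y, v, eps) with v in T^eps(y) satisfying the relative error
    criterion (18); how this is done depends on the instance of T."""
    x = np.asarray(x0, float)
    for k in range(1, max_iter + 1):
        lam = lam_rule(k)
        y, v, eps = inner_solver(x, lam)
        # relative error criterion (18):
        # ||lam*v + y - x||^2 + 2*lam*eps <= sigma^2 * ||y - x||^2
        lhs = np.linalg.norm(lam * v + y - x) ** 2 + 2.0 * lam * eps
        if lhs > sigma ** 2 * np.linalg.norm(y - x) ** 2 + 1e-12:
            raise ValueError("inner solver violated criterion (18)")
        if max(np.linalg.norm(v), eps) <= tol:     # termination criterion (1)
            return y, v, eps
        x = x - lam * v                            # step 2: x_k = x_{k-1} - lam_k v_k
    return y, v, eps
```

Sections 5 and 6 describe two such inner oracles: a projection step (Korpelevich's extragradient) and a regularized Newton step (the NPE method).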

10 Proof. In view of (18) and the triangle inequality for norms, we have λ k v k y k x k 1 λ k v k + y k x k 1 σ k y k x k 1, k N, which clearly implies (22). Lemma 14. The following statements hold: a) for any x R n and i N, x x i 1 2 = x x i 2 + 2λ i y i x, v i + y i x i 1 2 x i y i 2 ; (23) b) for any x T 1 (0) and i N, x x i 1 2 x x i 2 + (1 σ 2 i ) y i x i 1 2 ; c) for any x T 1 (0), the sequence { x x k } is non-increasing and x x 0 2 (1 σk 2 ) y k x k 1 2 (1 σ 2 ) y k x k 1 2. (24) k=1 Proof. To prove a), let x R n and i N. Then, x x i 1 2 = x x i x x i, x i x i 1 + x i x i 1 2 = x x i x y i, x i x i y i x i, x i x i 1 + x i x i 1 2 k=1 = x x i x y i, x i x i 1 + y i x i 1 2 y i x i 2. Statement a) now follows by noting that x i x i 1 = λ i v i, in view of step 2 of the HPE method. If 0 T (x ) then, since v i T ε i (y i ), the definition of T ε k implies that y i x, v i = y i x, v i 0 ε i. Using statement a) with x = x, the above inequality and (18), we have x x i 1 2 x x i 2 2λ i ε i + y i x i 1 2 x i y i 2 x x i 2 + (1 σ 2 i ) y i x i 1 2, which proves b). Statement c) follows immediately from b) and the assumptions 0 σ i σ < 1 (see steps 0 and 1 of the HPE method). Lemma 15. Let d 0 be the distance of x 0 to T 1 (0). For every α R and every k, there exists an i k such that ( ) ( ) v i d 0 (1 + σ) λ α 2 i (1 σ) k, ε i d2 0 σ2 λ α 1 i j=1 λα 2(1 σ 2 ) k. (25) j j=1 λα j 10

11 Proof. Define, for each k N, τ k := max { 2εk λ 1 α k σ 2, vk 2λ } 2 α k (1 + σ) 2. Then, in view of the assumption that σ k σ for all k N and relations (18) and (22), we have j=1 { τ k λ α k = max 2εk λ k σ 2, λ2 k v k 2 } (1 + σ) 2 y k x k 1 2. Letting x T 1 (0) be such that d 0 = x 0 x, the latter inequality together with (24) then imply that τ j λ α j y j x j 1 2 x 0 x 2 d 2 0 (1 σ 2 = ) (1 σ 2 ), and hence that j=1 ( ) min τ j λ α j j=1,...,k j=1 d 2 0 (1 σ 2 ). The conclusion of the proposition now follows immediately from the latter inequality and the definition of τ k. Theorem 16. Let d 0 be the distance of x 0 to T 1 (0). The following statements hold: a) if λ := inf λ k > 0, then for every k N there exists i k such that ( ) v i d σ λ 1 1 σ k j=1 λ d σ j λ k 1 σ ε i σ2 d (1 σ 2 ) k λ i σ 2 d 2 0 2(1 σ 2 )λk, b) for every k N, there exists an index i k such that ( ) v i d σ 1 1 σ k, ε i j=1 λ2 j σ 2 d 2 0 λ i 2(1 σ 2 ), (26) k j=1 λ2 j c) if k=1 λ2 k = then the sequences {y k} and {x k } converge to some point in T 1 (0). Proof. Statements a) and b) follow from Lemma 15 with α equal 1 and 2, respectively. To prove c), first note that if v i = 0 and ε i = 0 for some i then x i = y i T 1 (0) and x k = x i for all k i. We may then assume that a i := max{ v i, ε i } > 0, i. 11

12 The assumption that λ2 i = + imply that lim k max j=1,...,k λ j k λ2 i = 0. Therefore, using also (26) we conclude that lim k min a i = 0.,...k Therefore, there exist a subsequence {a i } i K which converges to 0. Lemma 14(c) implies that {x k } is bounded and lim y k x k 1 = 0, (27) k Hence, {y k } is also bounded and this implies that there exists a subsequence {y i } i K, with K K, which converges to some y. Since lim i K a i = 0, using Proposition 1(e) we conclude that y T 1 (0). Now use (27) to conclude that y is an accumulation point of {x k }. Since x k y is non-increasing in view of Lemma 14(c), we conclude that lim k x k y = 0. In view of (27), the sequence {y k } also converges to y. Both Lemma 15 and Theorem 16 estimate the quality of the best among the iterates y 1,..., y k. We will refer to these estimates as the pointwise complexity bounds for the HPE algorithm. The sequence of ergodic means {ȳ k } associated with {y k }, is defined as ȳ k := 1 λ i y i, where Λ k := Λ k λ i (28) The next result, which is a straightforward application of the transportation formula, shows that the ergodic iterate is related with the ε-enlargement of T, even when ε i 0. Thus, it provides provides a computable residual pair for ȳ k. Lemma 17. For every k N, define Then v k := 1 λ i v i, ε k := 1 λ i (ε i + y i ȳ k, v i v k ). (29) Λ k Λ k ε k 0 v k T ε k (ȳ k ) Proof. The inequality ε k 0 and the inclusion v k T ε k(ȳ k ) follow from Theorem 3 and the inclusions v i T ε i (y i ). The following result gives alternative expressions for the residual pair ( v k, ε k ), which will be used for obtaining bounds on its size. Proposition 18. For any k we have v k = 1 Λ k (x 0 x k ), (30) ε k = 1 2Λ k [ 2 ȳk x 0, x k x 0 x k x β k ] (31) 12

13 where β k := ) (2λ i ε i + x i y i 2 y i x i (32) Proof. The definitions of Λ k and v k in (28) and (29) and the update rule in step 2 of the HEP method imply that x k = x 0 λ i v i = x 0 Λ k v k, from which (30) follows. Direct use of the definition of ȳ k yields λ i y i ȳ k, v i v k = = λ i y i ȳ k, v i λ i y i ȳ k, v k λ i y i ȳ k, v i λ i (y i ȳ k ), v k = λ i y i ȳ k, v i. Adding (23) with x = ȳ k from i = 1 to k, we have ȳ k x 0 2 = ȳ k x k 2 + ( 2λi y i ȳ k, v i y i x i 1 2 x i y i 2). Combining the above two equations with the definitions of ε k in (29) and β k in (32) we obtain ε k = 1 2Λ k [ ȳk x 0 2 ȳ k x k 2 + β k ]. Relation (31) now follow from the above equation and the identity ȳ k x k 2 = ȳ k x ȳ k x 0, x 0 x k + x 0 x k 2. Finally, the inequality in (32) is due to (21) and the assumption 0 σ i σ < 1 (see steps 0 and 1 of the HPE method). The next result provides estimates on the quality measure of the ergodic mean ȳ k. It essentially shows that the quantities v k and ε k appearing in (29) are O(1/Λ k ). Theorem 19. For every k N, let Λ k, ȳ k, v k and ε k be as (28), (29). Then, for every k N, we have v k 2d 0, ε k 2θ kd 2 0, (33) Λ k Λ k where d 0 is the distance of x 0 to T 1 (0), θ k := 1 + σ τ k (1 σ 2 ), τ λ i k = max 1. (34),...,k Λ k 13

14 Proof. Let x T 1 (0) be such that x 0 x = d 0. Using Lemma 14(c), we have x k x d 0, and hence x k x 0 x k x + x x 0 2d 0, (35) for every k N. This, together with (30), yields the first bound in (33). Defining x k = 1 Λ k λ i x i, and noting (21), (24), (28), (34) and (35), we have 1 x k x 0 ȳ k x k 2 1 Λ k Λ k λ i (x i x 0 ) 1 Λ k λ i y i x i 2 τ k λ i x i x 0 2d 0, (36) y i x i 2 σ 2 τ k y i x i 1 2 σ2 τ k d 2 0 (1 σ 2 ), (37) where the first inequalities in the above two relations are due to the convexity of and 2, respectively. The above two relations together with (34) and the triangular inequality for norms yield τk ȳ k x 0 ȳ k x k + x k x 0 σd 0 1 σ 2 + 2d 0 = (1 + θ k ) d 0. This relation, expressions (31), (32) and (35), and the Cauchy-Schwarz inequality then imply ε k 1 2Λ k [ xk x ȳ k x 0 x k x 0 ] 1 2Λ k max{ t 2 + 2(1 + θ k )d 0 t : t [0, 2d 0 ]}. The desired bound for ε k in (33) now follows from the fact that the above maximum is achieved at t = 2d 0. 5 Korpelevich s extragradient method for solving the monotone VIP Our main goal in this section is to establish the complexity analysis of Korpelevich s extragradient method for solving the monotone VI problem over an unbounded feasible set. First we state Korpelevich s extragradient algorithm and show that it can be interpreted as a particular case of the HPE method. This allows us to use the results of Section 4 to obtain iteration complexity bounds for Korpelevich s extragradient method. In Subsection 5.1, we obtain further results specialized to the case when one or more of the following assumptions are made on the VI problem: X is bounded, X is a cone or F is linear. Throughout this section, unless otherwise explicitly mentioned, we assume that the set X R n and the map F : Ω R n R n satisfy the following assumptions: A.1) X Ω is a nonempty closed convex set; A.2) F is monotone and L-Lipschitz continuous ( on Ω); 14

A.3) the set of solutions of $VIP(F, X)$ is nonempty.

We start by stating Korpelevich's extragradient algorithm. The notation $P_X$ denotes the projection operator onto the set $X$.

Korpelevich's extragradient Algorithm:

0) Let $x_0 \in X$ and $0 < \sigma < 1$ be given and set $\lambda = \sigma/L$ and $k = 1$.

1) Compute
$$y_k = P_X(x_{k-1} - \lambda F(x_{k-1})), \qquad x_k = P_X(x_{k-1} - \lambda F(y_k)). \qquad (38)$$

2) Set $k \leftarrow k + 1$ and go to step 1.

end

We observe that assumptions A.1 and A.2 imply that the operator $T = F + N_X$ is maximal monotone (see for example Proposition of [4]). Also, recall that solving $VIP(F, X)$ is equivalent to solving the monotone inclusion problem $0 \in T(x)$, where $T = F + N_X$ (see the discussion in the paragraph following Proposition 8). We will now show that Korpelevich's extragradient algorithm for solving $VIP(F, X)$ can be viewed as a particular case of the HPE method for solving the monotone inclusion problem $0 \in T(x)$, and this will allow us to obtain iteration-complexity bounds for Korpelevich's extragradient algorithm without assuming boundedness of the feasible set $X$.

Theorem 20. Let $\{y_k\}$ and $\{x_k\}$ be the sequences generated by Korpelevich's extragradient algorithm and, for each $k \in \mathbb{N}$, define
$$q_k = \frac{1}{\lambda}\,[x_{k-1} - \lambda F(y_k) - x_k], \qquad \varepsilon_k = \langle q_k, x_k - y_k \rangle, \qquad v_k = F(y_k) + q_k. \qquad (39)$$
Then:

a) $x_k = x_{k-1} - \lambda v_k$;

b) $q_k \in \partial_{\varepsilon_k} \delta_X(y_k)$ and $v_k \in (F + N_X^{\varepsilon_k})(y_k) \subset (F + N_X)^{\varepsilon_k}(y_k)$;

c) $\|\lambda v_k + y_k - x_{k-1}\|^2 + 2\lambda \varepsilon_k \le \sigma^2 \|y_k - x_{k-1}\|^2$.

As a consequence of the above statements, it follows that Korpelevich's algorithm is a special case of the HPE method.

Proof. Statement a) follows immediately from the definitions of $q_k$ and $v_k$ in (39). Recall that the projection map $P_X$ has the property that
$$z - P_X(z) \in N_X(P_X(z)), \qquad \forall z \in \mathbb{R}^n. \qquad (40)$$
Using this fact together with the definitions of $x_k$ and $q_k$ in (38) and (39), respectively, it follows that
$$q_k \in N_X(x_k) = \partial \delta_X(x_k). \qquad (41)$$
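For concreteness, here is a NumPy sketch of iteration (38) together with the residual triple (39) from Theorem 20; the map `F`, the projector `proj` and the stopping tolerance are caller-supplied, and the stopping test is the termination criterion (1) for $T = F + N_X$. The function name and default values are ours.

```python
import numpy as np

def korpelevich(F, proj, x0, L, sigma=0.9, tol=1e-6, max_iter=100000):
    """Korpelevich's extragradient algorithm (38) with stepsize lam = sigma/L,
    returning the last iterate y_k together with the residual pair (v_k, eps_k)
    of (39), so that v_k lies in (F + N_X^{eps_k})(y_k) (Theorem 20(b))."""
    lam = sigma / L
    x = np.asarray(x0, float)
    for k in range(1, max_iter + 1):
        y = proj(x - lam * F(x))
        x_next = proj(x - lam * F(y))
        q = (x - lam * F(y) - x_next) / lam      # q_k in (39)
        eps = float(np.dot(q, x_next - y))       # eps_k in (39)
        v = F(y) + q                             # v_k in (39)
        x = x_next
        if max(np.linalg.norm(v), eps) <= tol:   # termination criterion (1)
            break
    return y, v, eps

# Usage sketch on a VIP over the nonnegative orthant (projection is a clip):
# y, v, eps = korpelevich(F, lambda z: np.maximum(z, 0.0), x0, L)
```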

16 The first inclusion in statement b) follows from (41) and Proposition 2(c). Using this inclusion, the definition of v k, and identity (16), we obtain v k = F (y k ) + q k F (y k ) + εk δ X (y k ) = F (y k ) + (N X ) ε k (y k ) F 0 (y k ) + (N X ) ε k (y k ) (F + N X ) ε k (y k ), where the two last inclusions follows from the monotonicity of F and statement c) and statement b) (with ε = 0) of Proposition 1. To prove statement c), define p k = 1 λ [x k 1 λf (x k 1 ) y k ]. (42) Using this definition, the definition of y k in (38) and (40), we conclude that p k N X (y k ). This fact, together with (39), gives the estimate This, together with a), imply that ε k = q k p k, x k y k + p k, x k y k q k p k, x k y k λv k + y k x k λε k = y k x k 2 + 2λε k x k y k 2 + 2λ q k p k, x k y k = λ(q k p k ) + x k y k 2 λ 2 q k p k 2 λ(q k p k ) + x k y k 2 = λ(f (x k 1 ) F (y k )) 2, where the last equality follows from the definition of q k and p k in (39) and (42). Statement c) follows from the previous inequality, the assumption that λ = σ/l and our global assumption that F is L-Lipschitz continuous on X, i.e., (5) holds. The following result establishes the global convergence rate of Korpelevich s extragradient algorithm in terms of the criteria (1) with T = F + N X. Theorem 21. Let {y k } and {x k } be the sequences generated by Korpelevich s extragradient algorithm and let {v k } and {ε k } be the sequences given by (39). For every k N, define v k = 1 k v i, ȳ k = 1 k y i, (43) ε k = 1 k [ε i + y i ȳ k, v i v k ]. (44) Then, for every k N, the following statements hold: a) y k is an approximate solution of V IP (F, X) with strong residual (v k, ε k ), and there exists i k such that v i Ld 0 (1 + σ) σ k(1 σ), ε σld 2 0 i 2(1 σ 2 )k ; 16

17 b) ȳ k is an approximate solution of V IP (F, X) with weak residual ( v k, ε k ), and v k 2Ld 0 kσ, ε k 2Ld2 0 θ k kσ, (45) where d 0 is the distance of x 0 to the solution set of V IP (F, X) and θ k := 1 + σ k(1 σ 2 ). (46) Proof. a) By Theorem 20(b), we have that v k (F + N ε k X )(y k). Hence, by Proposition 9(b), we conclude that y k is an approximate solution of V IP (F, X) with strong residual (v k, ε k ). Also, Theorem 20 implies that Korpelevich s extragradient algorithm is a special case of the general HPE method of Section 4, where λ k = σ/l for all k N. Hence, the remaining claim in (a) follows from Theorem 16(a) with λ = σ/l. b) By Theorem 20(b) and Lemma 17 with T = F + N X, we conclude that v k (F + N X ) ε k(ȳ k ). In view of Proposition 9(a), this implies that ȳ k is an approximate solution of V IP (F, X) with weak residual ( v k, ε k ). The bounds in (45) follow from Theorem 19 with T = F + N X and λ k = σ/l, and the fact that Λ k, τ k and θ k defined in (28) and (34) are, in this case, equal to kλ/l, 1/k and θ k, respectively, where θ k is given by (46). Observe that the derived bounds obtained in b) are asymptotically better than the ones obtained in a). Indeed, while the bounds for ε k and ε k are O(1/k), the ones for v k and v k are O(1/ k) and O(1/k) respectively. However, it should be emphasized that a) describes the quality of the strong residual of some point among the iterates y 1,..., y k while b) describes the quality of the weak residual of ȳ k. The following result, which is an immediate consequence of Theorem 21, presents iterationcomplexity bounds for Korpelevich s extragradient algorithm to obtain (ρ, ε)-weak and strong solutions of V IP (F, X). For simplicity, we ignore the dependence of these bounds on the parameter σ and other universal constants and express them only in terms of L, d 0 and the tolerances ρ and ε. Corollary 22. Consider the sequence {y k } generated by Korpelevich s extragradient algorithm and the sequence {ȳ k } defined as in (43). Then, for every pair of positive scalars (ρ, ε), the following statements hold: a) there exists an index ( [ Ld 2 i = O max 0 ε, L2 d 2 ]) 0 ρ 2 such that the iterate y i is an (ρ, ε)-strong solution of V IP (F, X); b) there exists an index ( [ Ld 2 k 0 = O max 0 ε, Ld ]) 0 ρ such that, for any k k 0, the point ȳ k is a (ρ, ε)-weak solution of V IP (F, X); (47) (48) 17
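The ergodic quantities (43)-(44) are computable directly from the stored history of the iterates; below is a minimal sketch (our naming), which is just the transportation formula of Theorem 3 specialized to uniform weights.

```python
import numpy as np

def ergodic_residual(ys, vs, eps_list):
    """Ergodic iterate and weak residual (43)-(44): returns (y_bar, v_bar, eps_bar)
    with v_bar in (F + N_X)^{eps_bar}(y_bar) by Lemma 17 / Theorem 21(b)."""
    ys, vs = np.asarray(ys, float), np.asarray(vs, float)
    eps_list = np.asarray(eps_list, float)
    y_bar, v_bar = ys.mean(axis=0), vs.mean(axis=0)
    eps_bar = eps_list.mean() + np.mean(
        [np.dot(y - y_bar, v - v_bar) for y, v in zip(ys, vs)])
    return y_bar, v_bar, float(eps_bar)
```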

18 5.1 Specialized complexity results for Korpelevich s extragradient method In this section we will obtain additional complexity results assuming that F is defined in all R n, or that the feasible set is bounded or is a cone. We first establish the following preliminary result. Lemma 23. Let {y k } and {x k } be the sequences generated by Korpelevich s extragradient algorithm and let {v k }, {q k } and {ε k } be the sequences given by (39). For every k N, define ε k = 1 k Then, for every k N, we have: F k = 1 k F (y i ), q k = 1 k y i ȳ k, F (y i ) F k, ε k = 1 k F k F ε k(ȳ k ), q i, (49) [ε i + y i ȳ k, q i q k ]. (50) q k (N X ) ε k (ȳ k ), v k = F k + q k, (51) ε k = ε k + ε k, ε k, ε k 0. (52) Proof. Applying Theorem 3 with T = F, x i = y i, v i = F (x i ), ε i = 0 and α i = 1/k for i = 1,..., k, we conclude that F k F ε k(ȳ k ) and ε k 0. Also, it follows from Theorem 20(b) and Theorem 3 with T = N X, x i = y i, v i = q i and α i = 1/k, for i = 1,..., k, that q k (N X ) ε k (ȳ k ) and ε k 0. The identity v k = F k + q k follows from (43), (49) and the fact that v i = F (y i ) + q i, i = 1,..., k. The other identity ε k = ε k + ε k now follows from (50) and the fact that v k = F k + q k and v i = F (y i ) + q i, i = 1,..., k. The following result shows that when F is defined in all R n and N F < L, the iteration complexity for the ergodic point ȳ k to be a (ρ, ε)-strong solution is better than the iteration complexity for the best of the iterates among y 1,..., y k to be a (ρ, ε)-strong solution. Moreover, when F is affine, it is shown that the dependence on the tolerance ρ of the first complexity is O(1/ρ) while that of the second complexity is O(1/ρ 2 ). Theorem 24 (F defined in all R n ). In addition to Assumptions (A.1)-(A.3), assume that Ω = R n. Let {y k } and {x k } be the sequences generated by Korpelevich s extragradient algorithm and let {ȳ k }, { v k }, {ε k } and { q k} be the sequences defined as in (43), (44) and (49). For every k N, define Then, the following statements hold: ˆv k := F (ȳ k ) + q k. (53) a) for every k N, ȳ k is an approximate solution of V IP (F, X) with strong residual (ˆv k, ε k ) and the following bounds on ˆv k and ε k hold: ε k 2Ld2 0 θ k kσ, ˆv k d 0 8 θk LN F kσ + 2Ld 0 kσ, (54) where N F := Nonl(F, R n ), θ k is defined in (46) and d 0 is the distance of x 0 to the solution set of V IP (F, X). 18

19 b) for every pair of positive scalars (ρ, ε), there exists an index ( [ Ld 2 k 0 = O max 0 ε, Ld 0 ρ + d2 0 LN ]) F ρ 2 such that, for any k k 0, the point ȳ k is a (ρ, ε)-strong solution of V IP (F, X). c) if F is also affine, then, for every pair of positive scalars (ρ, ε), there exists an index ( [ Ld 2 k 0 = O max 0 ε, Ld ]) 0 ρ such that, for any k k 0, the point ȳ k is a (ρ, ε)-strong solution of V IP (F, X). (55) (56) Proof. a) First note that (53) and (51) imply that ˆv k = F (ȳ k ) + q k (F + N ε k X )(ȳ k), from which we conclude that ȳ k is an approximate solution of V IP (F, X) with strong residual (ˆv k, ε k ), in view of Proposition 9(b). The first bound in (54) follows immediately from the second bound in (45) and the fact that ε k ε k in view of (52). Now, (51), (53), Proposition 7 with x i = y i and the triangle inequality for norms yield ˆv k v k + ˆv k v k = v k + F (ȳ k ) F k v k + 2 ε k N F. The second bound in (54) now follows from the above inequality and the bounds in (45). Statement (b) follows from the bounds in (54), the definition of (ρ, ε)-strong solution and some straightforward arguments. In many important instances of F (e.g., see the discussion after Definition 2), the constant N F = Nonl(F, R n ) is much smaller than L, and hence bound (55) can be much smaller than (47) on Corollary 22. In the following result, we consider the situation where X = K is a closed convex cone and V IP (F, K) becomes equivalent to the following monotone complementarity problem: 10 = F (y) s, y, s = 0, (y, s) K K. Using Proposition 12, we can translate the conclusions of Proposition 22(a) and Theorem 24 to the context of the above problem as follows. Corollary 25 (Monotone complementarity problems). In addition to Assumptions (A.1)-(A.3), assume that X = K. where K is a nonempty closed convex cone. Consider the sequences {x k } and {y k } generated by Korpelevich s extragradient algorithm applied to V IP (F, K) and the sequences {q k }, {ȳ k } and { q k } determined according to (39), (43) and (49), respectively. Then, for any pair of positive scalars (ρ, ε), the following statements hold: a) there exists an index such that the pair (y, s) = (y k, q k ) satisfies ( [ Ld 2 k = O max 0 ε, L2 d 2 ]) 0 ρ 2 F (y) s ρ, y, s ε, (y, s) K K ; (57) 19

20 b) if Ω = R n and F : R n R n is monotone and L-Lipschitz continuous on R n, then there exists an index ( [ Ld 2 k 0 = O max 0 ε, Ld 0 ρ + d2 0 LN ]) F ρ 2, (58) where N F := Nonl(F, R n ), such that, for any k k 0, the pair (y, s) = (ȳ k, q k ) satisfies (57). In particular, if F is affine, then the iteration-complexity bound (58) reduces to the one in (56). The following result derives alternative convergence rate bounds for Korpelevich s extragradient algorithm in the case where the feasible set X of V IP (F, X) is bounded. Theorem 26 (Bounded feasible sets). Assume that conditions A.1-A.3 hold and that the set X has finite diameter D X <. Consider the sequences {x k } and {y k } generated by Korpelevich s extragradient algorithm applied to V IP (F, X) and the sequences {ȳ k } and { ε k } determined according to (43) and (44), respectively. Then, for every k N, the following statements hold: a) ȳ k is an ε k -weak solution of V IP (F, X), or equivalently: where max x X F (x), ȳ k x ε k ε k := 2Ld 0 kσ ( DX + d 0 θk ) ; b) if Ω = R n and F : R n R n is monotone and L-Lipschitz continuous on R n, then ȳ k is an ˆε k -strong solution of V IP (F, X), or equivalently: where max x X F (y k), ȳ k x ε k ˆε k := d 0D X 8 θk LN F kσ + 2Ld 0(D X + θ k d 0 ) kσ (59) and N F := Nonl(F, R n ). Proof. Both statements follow immediately from Proposition 8, Theorem 21(b) and Theorem 24(a). The following result, which is an immediate consequence of Theorem 26, presents iterationcomplexity bounds for Korpelevich s extragradient algorithm to obtain ε-weak and strong solutions of V IP (F, X). We again ignore the dependence of these bounds on the parameter σ and other universal constants and express them only in terms of L, d 0 and the tolerance ε. Corollary 27. Assume that conditions A.1-A.3 hold and that the set X has finite diameter D X. Consider the sequence {y k } generated by Korpelevich s extragradient algorithm applied to V IP (F, X) and the sequence {ȳ k } determined according to (43). Then, for every ε > 0, the following statements hold: 20

21 a) there exists an index ( ) LDX d 0 k 0 = O ε such that, for any k k 0, the point ȳ k is an ε-weak solution of V IP (F, X); (60) b) if Ω = R n and F : R n R n is monotone and L-Lipschitz continuous on R n, then there exists an index ( k 0 LDX d 0 = O + LN F D 2 ) X d2 0 ε ε 2, where N F := Nonl(F, R n ), such that, for any k k 0, the point ȳ k is an ε-strong solution of V IP (F, X). It is interesting to compare the iteration-complexity bound obtained in Corollary 27(a) for finding an ε-weak solution of V IP (F, X) with the corresponding ones obtained in Nemirovski [5] and Nesterov [7]. Indeed, their analysis both yield O(DX 2 L/ε) iteration complexity bound to obtain an ε-weak solution of V IP (F, X). Hence, in contrast to their bounds, our bound O(d 0 D X L/ε) is proportional to d 0 and formaly shows for the first time that Korpelevich s extragradient method benefits from warm-start. 6 Hybrid proximal extragradient for smooth monotone equations In this section we consider the problem of solving where F : R n R n satisfies the following conditions: B.1) F is monotone; B.2) F is differentiable and its derivative F is L 1 -Lipschitz continuous. F (x) = 0, (61) Note that this problem is a special case of V IP (F, X) where X = R n. Newton Method applied to problem (61), under assumption B.2, has excellent local convergence properties, provided the starting point is close to a solution x and F is non-singular at x. Global well-definedness of Newton s method requires F to be non-singular everywhere on R n. However, this additional assumption does not guarantee the global convergence of Newton method. Solodov and Svaiter [10] proposed the use of Newton method for approximately solving the proximal subproblem at each iteration of the HPE method for problem (61). In this section, we will analyze the complexity of a variant of this method. Specifically, we will consider the following special case of the HPE method. 21

Newton Proximal Extragradient (NPE) Method:

0) Let $x_0 \in \mathbb{R}^n$ and $0 < \sigma_l < \sigma_u < 1$ be given and set $k = 1$.

1) If $F(x_{k-1}) = 0$, STOP. Otherwise,

2) Compute $\lambda_k \in \mathbb{R}$ and $s_k \in \mathbb{R}^n$ satisfying
$$(\lambda_k F'(x_{k-1}) + I)\, s_k = -\lambda_k F(x_{k-1}), \qquad (62)$$
$$\frac{2}{L_1}\,\sigma_l \le \lambda_k \|s_k\| \le \frac{2}{L_1}\,\sigma_u, \qquad (63)$$

3) Define $y_k = x_{k-1} + s_k$ and $x_k = x_{k-1} - \lambda_k F(y_k)$, set $k \leftarrow k + 1$ and go to step 1.

end

We will assume that $F(x_k) \neq 0$ for $k = 0, 1, \ldots$. The complexity analysis will not be affected by this assumption, because any assertion about the results after $k$ iterations remains valid after adding the alternative "or the algorithm finds a zero of $F$". For practical computations, given $\lambda_k$, the step $s_k$ shall be computed by solving the linear equation
$$(F'(x_{k-1}) + \lambda_k^{-1} I)\, s = -F(x_{k-1}).$$
Note that the direction $s_k$ in step 2 of the NPE method is the Newton direction with respect to the proximal point equation $\lambda_k F(x) + x - x_{k-1} = 0$ at the point $x_{k-1}$. Define, for each $k$,
$$\sigma_k := \frac{L_1}{2}\, \lambda_k \|s_k\|. \qquad (64)$$
We need the following well-known result about differentiable maps with Lipschitz continuous derivative.

Lemma 28. Suppose that $G : \mathbb{R}^n \to \mathbb{R}^n$ is differentiable and its derivative $G'$ is $L$-Lipschitz continuous. Then, for every $x, s \in \mathbb{R}^n$, we have
$$\|G(x + s) - G(x) - G'(x)s\| \le \frac{L}{2}\, \|s\|^2.$$

We will now establish that the NPE method can be viewed as a special case of the HPE method.

Lemma 29. For each $k$, $\sigma_l \le \sigma_k \le \sigma_u$ and
$$\|\lambda_k F(y_k) + y_k - x_{k-1}\| \le \sigma_k \|y_k - x_{k-1}\|. \qquad (65)$$
As a consequence, the NPE method is a special case of the HPE method stated in Section 4 with $\varepsilon_k = 0$, $v_k = F(y_k)$ and $\sigma = \sigma_u$.

Proof. The bounds on $\sigma_k$ follow directly from (63). Define for each $k$
$$G_k(x) := \lambda_k F(x) + x - x_{k-1}.$$

23 Then, G k is λ kl 1 -Lipschitz continuous and in view of (62) Therefore, using Lemma 28 and (64) we have G k (x k 1)s k + G k (x k 1 ) = 0. G k (y k ) = G k (y k ) [G(x k 1 ) + G k (x k 1)s k ] λ kl 1 2 s k 2 = σ k s k, which, due to the fact that s k = y k x k 1, proves (65). Now we shall prove that the NPE method is a special case of the HPE method. Using inequality (65) and the fact that σ k σ u, ε k = 0 and v k = F (y k ) and F = F 0 = F ε k, we conclude that condition (18) is satisfied with T = F. To end the proof, note that x k = x k 1 λ k v k. We have seen in Theorems 16 and 19 that the performance of the HPE method depends on the sums λ i and λ 2 i. We now give lower bounds for these quantities in the context of the NPE method. Lemma 30. The sequence {λ k } of the NPE method satisfies 4β L λ 2 i d 2 0, (66) where d 0 is the distance of x 0 to F 1 (0) and β is the minimum value of the function (1 t 2 )t 2 over the interval [σ l, σ u ], i.e.: β := min{(1 σ 2 l )σ2 l, (1 σ2 u)σ 2 u}. (67) As a consequence, for any k, λ i 2 β L 1 d 0 k 3/2, Proof. Using the definition of y i and (64) we have λ 2 i 4β L 2 1 d2 0 k 2. (68) y i x i 1 2 = 1 λ 2 (λ 2 i s i ) 2 = 1 i λ 2 i ( ) 2 2 σ i. L 1 Since the NPE method is a particular case of the HPE method (Lemma 29), we can combine the above equation with the first inequality in (24) to conclude that d L 2 1 (1 σi 2 )σi 2 1. λ 2 i To end the proof of (66) use the inclusion σ i [σ l, σ u ] and the definition of β. The inequalities in (68) now follow by noting that the minimum value of the functions k t i and k t2 i subject to the condition that k t 2 i C and t i > 0 for i = 1,..., k are k 3/2 / C and k 2 /C, respectively. The next result gives the pointwise iteration complexity bound for the NPE method. 23

24 Proposition 31. Consider the sequence {y k } generated by the NPE method. Then, the following statements hold: a) for every k 1, there exists an index i k such that F (y i ) where β is the constant defined in Lemma 30. b) for any scalar ε > 0, there exists an index i = O such that the iterate y i satisfies F (y i ) ε. 1 + σu 1 σ u L 1 d 2 0 2k β, ( L1 d 2 ) 0 ε Proof. By Lemma 29, the NPE method is a special case of the HPE method where σ = σ u and the sequences {v k } and {ε k } are given by v k = F (y k ) and ε k = 0 for every k. Hence, it follows from Theorem 16(b) and Lemma 30 that, for every k N, there exists i k such that F (y i ) = v i (1 + σ u ) (1 σ u ) d σu L 1 d 2 0 ( k ) 1/2 1 σ u 2k β, λ2 i i.e., (a) holds. The proof of (b) follows immediately from (a). We now state a convergence result for the NPE method about the ergodic pair (ȳ k, v k ) defined as in (28) and (29). Proposition 32. Let {λ k }, {y k } and {x k } be the sequences generated by the NPE method and consider the sequences {v k } and {ε k } defined as v k = F (y k ) and ε k = 0 for every k 1. Then, the sequences {ȳ k }, { v k } and { ε k } defined according to (28) and (29) satisfy the following conditions: a) for every k 1, ε k 0 and v k F ε k(ȳ k ), or equivalently, x ȳ k, F (x) v k ε k for all x R n ; b) for every k 1, we have v k L 1d 2 0 k 3/2 β, ε k θl 1d 3 0 k 3/2 β, (69) where θ := 1 + σ u 1 σ 2 u. Proof. This result follows as an immediate consequence of Theorem 19 with σ = σ u and Lemmas 29 and

25 Our goal in the remaining part of this section will be to discuss the work involved in finding the pair (s k, λ k ) in the step 1 of the NPE method. First note that, due to the monotonicity of F, F (x) is positive semidefinite and for every x R n and λ > 0, the system has a unique solution s R n, which we denote by s(λ; x). Clearly, (λf (x) + I)s = λf (x), (70) s(λ; x) = (F (x) + λ 1 I)F (x). Observe that, given x R n, step 1 of the NPE method can be rephrased as the problem of computing λ > 0 such that 2σ l /L 1 λ s(λ; x) 2σ u /L 1. (71) Computation of the scalar λ will done by performing a logarithmic type bisection scheme on λ that which be discussed shortly. Each iteration of this bisection scheme requires computation of s(λ; x) and evaluation of the quantity λ s(λ; x) to check whether λ satisfies the aforementioned condition. Without any use of previous information, computation of s(λ; x) requires O(n 3 ) arithmetic operations for each λ. Hence, if the aforementioned bisection scheme requires k x evaluations of s( ; x), then the total cost of the bisection scheme will be O(n 3 k x ) arithmetic operations. However, it is possible to do better than that by using the following steps. First, compute a Hessenberg factorization of F (x), i.e., factor F (x) as F (x) = QHQ T where Q is an orthonormal matrix and H is a upper Hessenberg matrix (namely, H ij = 0 for every j < i). Then, s = s(λ; x) can be computed by solving the system (H +λ 1 I) s = Q T F (x) for s and and letting s = Q T s. Clearly, the first step of the modified bisection scheme can be performed in O(n 3 ) arithmetic operations while subsequent ones in O(n 2 ) arithmetic operations. Hence, the modified bisection scheme can be carried out in O(n 3 + n 2 k x ) arithmetic operations. Given a fixed x R n, our goal now will be to describe the aforementioned bisection scheme to compute λ and to estimate its number of iterations k x. Proposition 33. For any x R n, the mapping λ s(λ; x) is continuous on (0, ) and where F (x) is the operator norm of F (x). λ F (x) λ F s(λ; x) λ F (x), λ > 0, (72) (x) + 1 Proof. Continuity s(λ) for λ (0, ) follows from the fact that F (x) is positive semidefinte. To simplify the proof, let s = s(λ; x). Using (70), triangle inequality and the definition of operator norm, we conclude that λ F (x) λf (x)s + s ( λf (x) + 1) s, and hence that the first inequality in (72) holds. To prove the second inequality, we multiply both sides of (70) by s, use the Cauchy-Schwarz inequality and the positive definiteness of F (x) to obtain s 2 λ s, F (x)s + s 2 = λ s, F (x) λ s F (x), and hence that the second inequality in (72) holds. 25
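To make the scheme just described concrete, here is a sketch, under our own naming, of the inner computation of step 2 of the NPE method: a single Hessenberg factorization of $F'(x)$ is computed up front, each trial value of $\lambda$ then costs one Hessenberg solve (done below with a dense solve for brevity; a specialized back-substitution would give the stated $O(n^2)$ cost per trial), and $\lambda$ is adjusted until $\lambda\|s(\lambda;x)\|$ lands in the acceptance window (71). The bracketing/doubling strategy and the default constants are illustrative choices, not the paper's.

```python
import numpy as np
from scipy.linalg import hessenberg

def make_step_evaluator(JFx, Fx):
    """Factor F'(x) = Q H Q^T once (O(n^3)); afterwards each evaluation of
    s(lam; x) from (70) only needs the solve (H + I/lam) s~ = -Q^T F(x),
    followed by s = Q s~."""
    H, Q = hessenberg(np.asarray(JFx, float), calc_q=True)
    g = Q.T @ np.asarray(Fx, float)
    n = len(g)
    def s_of_lam(lam):
        s_tilde = np.linalg.solve(H + np.eye(n) / lam, -g)
        return Q @ s_tilde
    return s_of_lam

def find_lambda(JFx, Fx, L1, sigma_l=0.1, sigma_u=0.9, lam=1.0, max_iter=60):
    """Search for lam such that lam*||s(lam; x)|| lies in the window (71),
    i.e. in [2*sigma_l/L1, 2*sigma_u/L1]: double/halve until a bracket is
    found, then bisect on the geometric mean."""
    s_of = make_step_evaluator(JFx, Fx)
    lo = hi = None
    for _ in range(max_iter):
        s = s_of(lam)
        t = lam * np.linalg.norm(s)
        if t < 2.0 * sigma_l / L1:          # step too short: increase lam
            lo = lam
            lam = np.sqrt(lam * hi) if hi is not None else 2.0 * lam
        elif t > 2.0 * sigma_u / L1:        # step too long: decrease lam
            hi = lam
            lam = np.sqrt(lam * lo) if lo is not None else 0.5 * lam
        else:
            return lam, s                   # (lam_k, s_k) accepted by (62)-(63)
    return lam, s

# Once (lam_k, s_k) is accepted, step 3 of the NPE method reads
#   y_k = x + s_k;  x_next = x - lam_k * F(y_k).
```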


More information

An inexact subgradient algorithm for Equilibrium Problems

An inexact subgradient algorithm for Equilibrium Problems Volume 30, N. 1, pp. 91 107, 2011 Copyright 2011 SBMAC ISSN 0101-8205 www.scielo.br/cam An inexact subgradient algorithm for Equilibrium Problems PAULO SANTOS 1 and SUSANA SCHEIMBERG 2 1 DM, UFPI, Teresina,

More information

Iteration-complexity of first-order penalty methods for convex programming

Iteration-complexity of first-order penalty methods for convex programming Iteration-complexity of first-order penalty methods for convex programming Guanghui Lan Renato D.C. Monteiro July 24, 2008 Abstract This paper considers a special but broad class of convex programing CP)

More information

arxiv: v1 [math.fa] 16 Jun 2011

arxiv: v1 [math.fa] 16 Jun 2011 arxiv:1106.3342v1 [math.fa] 16 Jun 2011 Gauge functions for convex cones B. F. Svaiter August 20, 2018 Abstract We analyze a class of sublinear functionals which characterize the interior and the exterior

More information

FIXED POINTS IN THE FAMILY OF CONVEX REPRESENTATIONS OF A MAXIMAL MONOTONE OPERATOR

FIXED POINTS IN THE FAMILY OF CONVEX REPRESENTATIONS OF A MAXIMAL MONOTONE OPERATOR PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 00, Number 0, Pages 000 000 S 0002-9939(XX)0000-0 FIXED POINTS IN THE FAMILY OF CONVEX REPRESENTATIONS OF A MAXIMAL MONOTONE OPERATOR B. F. SVAITER

More information

1 Introduction and preliminaries

1 Introduction and preliminaries Proximal Methods for a Class of Relaxed Nonlinear Variational Inclusions Abdellatif Moudafi Université des Antilles et de la Guyane, Grimaag B.P. 7209, 97275 Schoelcher, Martinique abdellatif.moudafi@martinique.univ-ag.fr

More information

Convex Optimization Notes

Convex Optimization Notes Convex Optimization Notes Jonathan Siegel January 2017 1 Convex Analysis This section is devoted to the study of convex functions f : B R {+ } and convex sets U B, for B a Banach space. The case of B =

More information

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Mathematical Programming manuscript No. (will be inserted by the editor) Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato D. C. Monteiro

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

A projection-type method for generalized variational inequalities with dual solutions

A projection-type method for generalized variational inequalities with dual solutions Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 4812 4821 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa A projection-type method

More information

Convergence Analysis of Perturbed Feasible Descent Methods 1

Convergence Analysis of Perturbed Feasible Descent Methods 1 JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS Vol. 93. No 2. pp. 337-353. MAY 1997 Convergence Analysis of Perturbed Feasible Descent Methods 1 M. V. SOLODOV 2 Communicated by Z. Q. Luo Abstract. We

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

Maximal Monotone Operators with a Unique Extension to the Bidual

Maximal Monotone Operators with a Unique Extension to the Bidual Journal of Convex Analysis Volume 16 (2009), No. 2, 409 421 Maximal Monotone Operators with a Unique Extension to the Bidual M. Marques Alves IMPA, Estrada Dona Castorina 110, 22460-320 Rio de Janeiro,

More information

arxiv: v1 [math.oc] 21 Apr 2016

arxiv: v1 [math.oc] 21 Apr 2016 Accelerated Douglas Rachford methods for the solution of convex-concave saddle-point problems Kristian Bredies Hongpeng Sun April, 06 arxiv:604.068v [math.oc] Apr 06 Abstract We study acceleration and

More information

On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities

On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities On the Iteration Complexity of Some Projection Methods for Monotone Linear Variational Inequalities Caihua Chen Xiaoling Fu Bingsheng He Xiaoming Yuan January 13, 2015 Abstract. Projection type methods

More information

On well definedness of the Central Path

On well definedness of the Central Path On well definedness of the Central Path L.M.Graña Drummond B. F. Svaiter IMPA-Instituto de Matemática Pura e Aplicada Estrada Dona Castorina 110, Jardim Botânico, Rio de Janeiro-RJ CEP 22460-320 Brasil

More information

Extensions of Korpelevich s Extragradient Method for the Variational Inequality Problem in Euclidean Space

Extensions of Korpelevich s Extragradient Method for the Variational Inequality Problem in Euclidean Space Extensions of Korpelevich s Extragradient Method for the Variational Inequality Problem in Euclidean Space Yair Censor 1,AvivGibali 2 andsimeonreich 2 1 Department of Mathematics, University of Haifa,

More information

ε-enlargements of maximal monotone operators: theory and applications

ε-enlargements of maximal monotone operators: theory and applications ε-enlargements of maximal monotone operators: theory and applications Regina S. Burachik, Claudia A. Sagastizábal and B. F. Svaiter October 14, 2004 Abstract Given a maximal monotone operator T, we consider

More information

An adaptive accelerated first-order method for convex optimization

An adaptive accelerated first-order method for convex optimization An adaptive accelerated first-order method for convex optimization Renato D.C Monteiro Camilo Ortiz Benar F. Svaiter July 3, 22 (Revised: May 4, 24) Abstract This paper presents a new accelerated variant

More information

for all u C, where F : X X, X is a Banach space with its dual X and C X

for all u C, where F : X X, X is a Banach space with its dual X and C X ROMAI J., 6, 1(2010), 41 45 PROXIMAL POINT METHODS FOR VARIATIONAL INEQUALITIES INVOLVING REGULAR MAPPINGS Corina L. Chiriac Department of Mathematics, Bioterra University, Bucharest, Romania corinalchiriac@yahoo.com

More information

ENLARGEMENT OF MONOTONE OPERATORS WITH APPLICATIONS TO VARIATIONAL INEQUALITIES. Abstract

ENLARGEMENT OF MONOTONE OPERATORS WITH APPLICATIONS TO VARIATIONAL INEQUALITIES. Abstract ENLARGEMENT OF MONOTONE OPERATORS WITH APPLICATIONS TO VARIATIONAL INEQUALITIES Regina S. Burachik* Departamento de Matemática Pontíficia Universidade Católica de Rio de Janeiro Rua Marques de São Vicente,

More information

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study

More information

On the iterate convergence of descent methods for convex optimization

On the iterate convergence of descent methods for convex optimization On the iterate convergence of descent methods for convex optimization Clovis C. Gonzaga March 1, 2014 Abstract We study the iterate convergence of strong descent algorithms applied to convex functions.

More information

A doubly stabilized bundle method for nonsmooth convex optimization

A doubly stabilized bundle method for nonsmooth convex optimization Mathematical Programming manuscript No. (will be inserted by the editor) A doubly stabilized bundle method for nonsmooth convex optimization Welington de Oliveira Mikhail Solodov Received: date / Accepted:

More information

Journal of Convex Analysis Vol. 14, No. 2, March 2007 AN EXPLICIT DESCENT METHOD FOR BILEVEL CONVEX OPTIMIZATION. Mikhail Solodov. September 12, 2005

Journal of Convex Analysis Vol. 14, No. 2, March 2007 AN EXPLICIT DESCENT METHOD FOR BILEVEL CONVEX OPTIMIZATION. Mikhail Solodov. September 12, 2005 Journal of Convex Analysis Vol. 14, No. 2, March 2007 AN EXPLICIT DESCENT METHOD FOR BILEVEL CONVEX OPTIMIZATION Mikhail Solodov September 12, 2005 ABSTRACT We consider the problem of minimizing a smooth

More information

Optimization Theory. A Concise Introduction. Jiongmin Yong

Optimization Theory. A Concise Introduction. Jiongmin Yong October 11, 017 16:5 ws-book9x6 Book Title Optimization Theory 017-08-Lecture Notes page 1 1 Optimization Theory A Concise Introduction Jiongmin Yong Optimization Theory 017-08-Lecture Notes page Optimization

More information

arxiv: v1 [math.na] 25 Sep 2012

arxiv: v1 [math.na] 25 Sep 2012 Kantorovich s Theorem on Newton s Method arxiv:1209.5704v1 [math.na] 25 Sep 2012 O. P. Ferreira B. F. Svaiter March 09, 2007 Abstract In this work we present a simplifyed proof of Kantorovich s Theorem

More information

Subdifferential representation of convex functions: refinements and applications

Subdifferential representation of convex functions: refinements and applications Subdifferential representation of convex functions: refinements and applications Joël Benoist & Aris Daniilidis Abstract Every lower semicontinuous convex function can be represented through its subdifferential

More information

Chapter 2 Convex Analysis

Chapter 2 Convex Analysis Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,

More information

Error bounds for proximal point subproblems and associated inexact proximal point algorithms

Error bounds for proximal point subproblems and associated inexact proximal point algorithms Error bounds for proximal point subproblems and associated inexact proximal point algorithms M. V. Solodov B. F. Svaiter Instituto de Matemática Pura e Aplicada, Estrada Dona Castorina 110, Jardim Botânico,

More information

Maximal Monotone Inclusions and Fitzpatrick Functions

Maximal Monotone Inclusions and Fitzpatrick Functions JOTA manuscript No. (will be inserted by the editor) Maximal Monotone Inclusions and Fitzpatrick Functions J. M. Borwein J. Dutta Communicated by Michel Thera. Abstract In this paper, we study maximal

More information

Merit functions and error bounds for generalized variational inequalities

Merit functions and error bounds for generalized variational inequalities J. Math. Anal. Appl. 287 2003) 405 414 www.elsevier.com/locate/jmaa Merit functions and error bounds for generalized variational inequalities M.V. Solodov 1 Instituto de Matemática Pura e Aplicada, Estrada

More information

An inexact strategy for the projected gradient algorithm in vector optimization problems on variable ordered spaces

An inexact strategy for the projected gradient algorithm in vector optimization problems on variable ordered spaces An inexact strategy for the projected gradient algorithm in vector optimization problems on variable ordered spaces J.Y. Bello-Cruz G. Bouza Allende November, 018 Abstract The variable order structures

More information

Strong Convergence Theorem by a Hybrid Extragradient-like Approximation Method for Variational Inequalities and Fixed Point Problems

Strong Convergence Theorem by a Hybrid Extragradient-like Approximation Method for Variational Inequalities and Fixed Point Problems Strong Convergence Theorem by a Hybrid Extragradient-like Approximation Method for Variational Inequalities and Fixed Point Problems Lu-Chuan Ceng 1, Nicolas Hadjisavvas 2 and Ngai-Ching Wong 3 Abstract.

More information

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS

CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS CONVERGENCE PROPERTIES OF COMBINED RELAXATION METHODS Igor V. Konnov Department of Applied Mathematics, Kazan University Kazan 420008, Russia Preprint, March 2002 ISBN 951-42-6687-0 AMS classification:

More information

Pacific Journal of Optimization (Vol. 2, No. 3, September 2006) ABSTRACT

Pacific Journal of Optimization (Vol. 2, No. 3, September 2006) ABSTRACT Pacific Journal of Optimization Vol., No. 3, September 006) PRIMAL ERROR BOUNDS BASED ON THE AUGMENTED LAGRANGIAN AND LAGRANGIAN RELAXATION ALGORITHMS A. F. Izmailov and M. V. Solodov ABSTRACT For a given

More information

Introduction to Real Analysis Alternative Chapter 1

Introduction to Real Analysis Alternative Chapter 1 Christopher Heil Introduction to Real Analysis Alternative Chapter 1 A Primer on Norms and Banach Spaces Last Updated: March 10, 2018 c 2018 by Christopher Heil Chapter 1 A Primer on Norms and Banach Spaces

More information

A double projection method for solving variational inequalities without monotonicity

A double projection method for solving variational inequalities without monotonicity A double projection method for solving variational inequalities without monotonicity Minglu Ye Yiran He Accepted by Computational Optimization and Applications, DOI: 10.1007/s10589-014-9659-7,Apr 05, 2014

More information

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16

Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 XVI - 1 Contraction Methods for Convex Optimization and Monotone Variational Inequalities No.16 A slightly changed ADMM for convex optimization with three separable operators Bingsheng He Department of

More information

ON A HYBRID PROXIMAL POINT ALGORITHM IN BANACH SPACES

ON A HYBRID PROXIMAL POINT ALGORITHM IN BANACH SPACES U.P.B. Sci. Bull., Series A, Vol. 80, Iss. 3, 2018 ISSN 1223-7027 ON A HYBRID PROXIMAL POINT ALGORITHM IN BANACH SPACES Vahid Dadashi 1 In this paper, we introduce a hybrid projection algorithm for a countable

More information

MATHS 730 FC Lecture Notes March 5, Introduction

MATHS 730 FC Lecture Notes March 5, Introduction 1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45

Division of the Humanities and Social Sciences. Supergradients. KC Border Fall 2001 v ::15.45 Division of the Humanities and Social Sciences Supergradients KC Border Fall 2001 1 The supergradient of a concave function There is a useful way to characterize the concavity of differentiable functions.

More information

WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE

WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE Fixed Point Theory, Volume 6, No. 1, 2005, 59-69 http://www.math.ubbcluj.ro/ nodeacj/sfptcj.htm WEAK CONVERGENCE OF RESOLVENTS OF MAXIMAL MONOTONE OPERATORS AND MOSCO CONVERGENCE YASUNORI KIMURA Department

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005

University of Houston, Department of Mathematics Numerical Analysis, Fall 2005 3 Numerical Solution of Nonlinear Equations and Systems 3.1 Fixed point iteration Reamrk 3.1 Problem Given a function F : lr n lr n, compute x lr n such that ( ) F(x ) = 0. In this chapter, we consider

More information

Chapter 1 Preliminaries

Chapter 1 Preliminaries Chapter 1 Preliminaries 1.1 Conventions and Notations Throughout the book we use the following notations for standard sets of numbers: N the set {1, 2,...} of natural numbers Z the set of integers Q the

More information

On proximal-like methods for equilibrium programming

On proximal-like methods for equilibrium programming On proximal-lie methods for equilibrium programming Nils Langenberg Department of Mathematics, University of Trier 54286 Trier, Germany, langenberg@uni-trier.de Abstract In [?] Flam and Antipin discussed

More information

Iteration-complexity of first-order augmented Lagrangian methods for convex programming

Iteration-complexity of first-order augmented Lagrangian methods for convex programming Math. Program., Ser. A 016 155:511 547 DOI 10.1007/s10107-015-0861-x FULL LENGTH PAPER Iteration-complexity of first-order augmented Lagrangian methods for convex programming Guanghui Lan Renato D. C.

More information

Helly's Theorem and its Equivalences via Convex Analysis

Helly's Theorem and its Equivalences via Convex Analysis Portland State University PDXScholar University Honors Theses University Honors College 2014 Helly's Theorem and its Equivalences via Convex Analysis Adam Robinson Portland State University Let us know

More information

On the convergence properties of the projected gradient method for convex optimization

On the convergence properties of the projected gradient method for convex optimization Computational and Applied Mathematics Vol. 22, N. 1, pp. 37 52, 2003 Copyright 2003 SBMAC On the convergence properties of the projected gradient method for convex optimization A. N. IUSEM* Instituto de

More information

An introduction to Mathematical Theory of Control

An introduction to Mathematical Theory of Control An introduction to Mathematical Theory of Control Vasile Staicu University of Aveiro UNICA, May 2018 Vasile Staicu (University of Aveiro) An introduction to Mathematical Theory of Control UNICA, May 2018

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

INERTIAL ACCELERATED ALGORITHMS FOR SOLVING SPLIT FEASIBILITY PROBLEMS. Yazheng Dang. Jie Sun. Honglei Xu

INERTIAL ACCELERATED ALGORITHMS FOR SOLVING SPLIT FEASIBILITY PROBLEMS. Yazheng Dang. Jie Sun. Honglei Xu Manuscript submitted to AIMS Journals Volume X, Number 0X, XX 200X doi:10.3934/xx.xx.xx.xx pp. X XX INERTIAL ACCELERATED ALGORITHMS FOR SOLVING SPLIT FEASIBILITY PROBLEMS Yazheng Dang School of Management

More information

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions

A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions A Unified Analysis of Nonconvex Optimization Duality and Penalty Methods with General Augmenting Functions Angelia Nedić and Asuman Ozdaglar April 16, 2006 Abstract In this paper, we study a unifying framework

More information

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization

Some Properties of the Augmented Lagrangian in Cone Constrained Optimization MATHEMATICS OF OPERATIONS RESEARCH Vol. 29, No. 3, August 2004, pp. 479 491 issn 0364-765X eissn 1526-5471 04 2903 0479 informs doi 10.1287/moor.1040.0103 2004 INFORMS Some Properties of the Augmented

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

A strongly polynomial algorithm for linear systems having a binary solution

A strongly polynomial algorithm for linear systems having a binary solution A strongly polynomial algorithm for linear systems having a binary solution Sergei Chubanov Institute of Information Systems at the University of Siegen, Germany e-mail: sergei.chubanov@uni-siegen.de 7th

More information

Epiconvergence and ε-subgradients of Convex Functions

Epiconvergence and ε-subgradients of Convex Functions Journal of Convex Analysis Volume 1 (1994), No.1, 87 100 Epiconvergence and ε-subgradients of Convex Functions Andrei Verona Department of Mathematics, California State University Los Angeles, CA 90032,

More information

Forcing strong convergence of proximal point iterations in a Hilbert space

Forcing strong convergence of proximal point iterations in a Hilbert space Math. Program., Ser. A 87: 189 202 (2000) Springer-Verlag 2000 Digital Object Identifier (DOI) 10.1007/s101079900113 M.V. Solodov B.F. Svaiter Forcing strong convergence of proximal point iterations in

More information

Translative Sets and Functions and their Applications to Risk Measure Theory and Nonlinear Separation

Translative Sets and Functions and their Applications to Risk Measure Theory and Nonlinear Separation Translative Sets and Functions and their Applications to Risk Measure Theory and Nonlinear Separation Andreas H. Hamel Abstract Recently defined concepts such as nonlinear separation functionals due to

More information

Lecture 7: Semidefinite programming

Lecture 7: Semidefinite programming CS 766/QIC 820 Theory of Quantum Information (Fall 2011) Lecture 7: Semidefinite programming This lecture is on semidefinite programming, which is a powerful technique from both an analytic and computational

More information

Convex analysis and profit/cost/support functions

Convex analysis and profit/cost/support functions Division of the Humanities and Social Sciences Convex analysis and profit/cost/support functions KC Border October 2004 Revised January 2009 Let A be a subset of R m Convex analysts may give one of two

More information

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological

More information

Convex Functions and Optimization

Convex Functions and Optimization Chapter 5 Convex Functions and Optimization 5.1 Convex Functions Our next topic is that of convex functions. Again, we will concentrate on the context of a map f : R n R although the situation can be generalized

More information

Maximal monotone operators, convex functions and a special family of enlargements.

Maximal monotone operators, convex functions and a special family of enlargements. Maximal monotone operators, convex functions and a special family of enlargements. Regina Sandra Burachik Engenharia de Sistemas e Computação, COPPE UFRJ, CP 68511, Rio de Janeiro RJ, 21945 970, Brazil.

More information

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9

MAT 570 REAL ANALYSIS LECTURE NOTES. Contents. 1. Sets Functions Countability Axiom of choice Equivalence relations 9 MAT 570 REAL ANALYSIS LECTURE NOTES PROFESSOR: JOHN QUIGG SEMESTER: FALL 204 Contents. Sets 2 2. Functions 5 3. Countability 7 4. Axiom of choice 8 5. Equivalence relations 9 6. Real numbers 9 7. Extended

More information

Optimality Conditions for Nonsmooth Convex Optimization

Optimality Conditions for Nonsmooth Convex Optimization Optimality Conditions for Nonsmooth Convex Optimization Sangkyun Lee Oct 22, 2014 Let us consider a convex function f : R n R, where R is the extended real field, R := R {, + }, which is proper (f never

More information

An introduction to some aspects of functional analysis

An introduction to some aspects of functional analysis An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms

More information

Mathematics for Economists

Mathematics for Economists Mathematics for Economists Victor Filipe Sao Paulo School of Economics FGV Metric Spaces: Basic Definitions Victor Filipe (EESP/FGV) Mathematics for Economists Jan.-Feb. 2017 1 / 34 Definitions and Examples

More information

On the Existence and Convergence of the Central Path for Convex Programming and Some Duality Results

On the Existence and Convergence of the Central Path for Convex Programming and Some Duality Results Computational Optimization and Applications, 10, 51 77 (1998) c 1998 Kluwer Academic Publishers, Boston. Manufactured in The Netherlands. On the Existence and Convergence of the Central Path for Convex

More information

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem 56 Chapter 7 Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem Recall that C(X) is not a normed linear space when X is not compact. On the other hand we could use semi

More information

Math 328 Course Notes

Math 328 Course Notes Math 328 Course Notes Ian Robertson March 3, 2006 3 Properties of C[0, 1]: Sup-norm and Completeness In this chapter we are going to examine the vector space of all continuous functions defined on the

More information

On convergence rate of the Douglas-Rachford operator splitting method

On convergence rate of the Douglas-Rachford operator splitting method On convergence rate of the Douglas-Rachford operator splitting method Bingsheng He and Xiaoming Yuan 2 Abstract. This note provides a simple proof on a O(/k) convergence rate for the Douglas- Rachford

More information

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product

Finite-dimensional spaces. C n is the space of n-tuples x = (x 1,..., x n ) of complex numbers. It is a Hilbert space with the inner product Chapter 4 Hilbert Spaces 4.1 Inner Product Spaces Inner Product Space. A complex vector space E is called an inner product space (or a pre-hilbert space, or a unitary space) if there is a mapping (, )

More information

1 Introduction We consider the problem nd x 2 H such that 0 2 T (x); (1.1) where H is a real Hilbert space, and T () is a maximal monotone operator (o

1 Introduction We consider the problem nd x 2 H such that 0 2 T (x); (1.1) where H is a real Hilbert space, and T () is a maximal monotone operator (o Journal of Convex Analysis Volume 6 (1999), No. 1, pp. xx-xx. cheldermann Verlag A HYBRID PROJECTION{PROXIMAL POINT ALGORITHM M. V. Solodov y and B. F. Svaiter y January 27, 1997 (Revised August 24, 1998)

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

Fixed points in the family of convex representations of a maximal monotone operator

Fixed points in the family of convex representations of a maximal monotone operator arxiv:0802.1347v2 [math.fa] 12 Feb 2008 Fixed points in the family of convex representations of a maximal monotone operator published on: Proc. Amer. Math. Soc. 131 (2003) 3851 3859. B. F. Svaiter IMPA

More information