Acceleration Method for Convex Optimization over the Fixed Point Set of a Nonexpansive Mapping


Acceleration Method for Convex Optimization over the Fixed Point Set of a Nonexpansive Mapping

Hideaki Iiduka

Received: date / Accepted: date

Abstract  The existing algorithms for solving the convex minimization problem over the fixed point set of a nonexpansive mapping on a Hilbert space are based on algorithmic methods, such as the steepest descent method and conjugate gradient methods, for finding a minimizer of the objective function over the whole space, and they attach importance to minimizing the objective function as quickly as possible. Meanwhile, it is of practical importance to devise algorithms which converge in the fixed point set quickly, because the fixed point set is the set of constraint conditions that must be satisfied in the problem. This paper proposes an algorithm which not only minimizes the objective function quickly but also converges in the fixed point set much faster than the existing algorithms, and it proves that the algorithm with diminishing step-size sequences strongly converges to the solution to the convex minimization problem. We also analyze the proposed algorithm with each of the Fletcher-Reeves, Polak-Ribière-Polyak, Hestenes-Stiefel, and Dai-Yuan formulas used in the conventional conjugate gradient methods, and show that there is an inconvenient possibility that these algorithms may not converge to the solution to the convex minimization problem. We numerically compare the proposed algorithm with the existing algorithms and show its effectiveness and fast convergence.

Keywords  convex optimization · fixed point set · nonexpansive mapping · conjugate gradient method · three-term conjugate gradient method · fixed point optimization algorithm

This work was supported by the Japan Society for the Promotion of Science through a Grant-in-Aid for Young Scientists (B) (23760077), and in part by the Japan Society for the Promotion of Science through a Grant-in-Aid for Scientific Research (C) (22540175).

H. Iiduka
Department of Computer Science, Meiji University, 1-1-1 Higashimita, Tama-ku, Kawasaki-shi, Kanagawa 214-8571, Japan
E-mail: iiduka@cs.meiji.ac.jp

Mathematics Subject Classification (2000)  47H07 · 47H09 · 65K05 · 65K10 · 90C25 · 90C30 · 90C52

1 Introduction

This paper discusses the following convex optimization problem over the fixed point set of a nonexpansive mapping [31]: given a convex, continuously Fréchet differentiable functional f on a real Hilbert space H and a nonexpansive mapping N from H into itself which has a fixed point (i.e., Fix(N) := {x ∈ H : N(x) = x} ≠ ∅),

    minimize f(x) subject to x ∈ Fix(N).    (1)

Problem (1) includes practical problems such as signal recovery [8], beamforming [27], and bandwidth allocation [17, 19]. In particular, it plays an important role when the constraint set composed of the absolute set and the subsidiary sets is not feasible [17, 19]. When we consider an optimization problem, including the problem in [17, 19], it would be reasonable to deal with a constraint set in the problem as a subset [9, Section I, Framework 2], [31, Definition 4.1] of the absolute set with the elements closest to the subsidiary sets in terms of the norm. Here, we formulate a compromise solution to the problem by using the minimizer of the objective function over this subset. Since the subset can be expressed as the fixed point set of a certain nonexpansive mapping [31, Proposition 4.2], the minimization problem over the subset can be formulated as Problem (1).

We shall review the existing algorithms, called fixed point optimization algorithms, for solving Problem (1) when the gradient of f, denoted by ∇f : H → H, is strongly monotone and Lipschitz continuous. The first algorithm developed for solving Problem (1) is the hybrid steepest descent method (HSDM) [31, 32]:

    x_0 ∈ H,  d_0^f := -∇f(x_0),
    x_{n+1} := N(x_n + μα_n d_n^f),    (2)
    d_{n+1}^f := -∇f(x_{n+1})

for each n ∈ ℕ, where μ > 0 and (α_n)_{n∈ℕ} is a sequence with lim_{n→∞} α_n = 0 and Σ_{n=0}^∞ α_n = ∞. HSDM strongly converges to the unique solution to Problem (1) [32, Theorem 2.15, Remark 2.17 (a)]. Reference [8] proposed an effective algorithm, called the block-iterative surrogate constraint splitting method, to accelerate HSDM. The method in [8] converges strongly to the solution to Problem (1) without using diminishing sequences.
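To make the HSDM update (2) concrete, the following is a minimal finite-dimensional sketch in Python/NumPy; the argument names and the step-size choice α_n := 1/(n+1) are illustrative assumptions rather than prescriptions from this paper.

import numpy as np

def hsdm(grad_f, N, x0, mu, num_iter=1000):
    # Hybrid steepest descent method (2):
    # d_0^f := -grad f(x_0), x_{n+1} := N(x_n + mu*alpha_n*d_n^f), d_{n+1}^f := -grad f(x_{n+1}).
    x = np.asarray(x0, dtype=float)
    d = -grad_f(x)
    for n in range(num_iter):
        alpha = 1.0 / (n + 1)  # diminishing step sizes: alpha_n -> 0 and sum alpha_n = infinity
        x = N(x + mu * alpha * d)
        d = -grad_f(x)
    return x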

The conjugate gradient methods [25, Chapter 5] and three-term conjugate gradient methods [7, 23, 34-36] are the most popular methods that can accelerate the steepest descent method (i.e., x_{n+1} := x_n - α_n ∇f(x_n)) for large-scale unconstrained optimization problems. The search directions of the conjugate gradient method and three-term conjugate gradient method are as follows: for each n ∈ ℕ,

    d_{n+1}^f := -∇f(x_{n+1}) + δ_n^{(1)} d_n^f,    (3)
    d_{n+1}^f := -∇f(x_{n+1}) + δ_n^{(1)} d_n^f - δ_n^{(2)} z_n,    (4)

where (δ_n^{(i)})_{n∈ℕ} ⊂ [0, ∞) (i = 1, 2) and z_n ∈ H (n ∈ ℕ) is an arbitrary point. In general, the conjugate gradient method (i.e., x_{n+1} := x_n + α_n d_n^f with d_n^f defined by Equation (3)) does not generate a descent search direction (see Footnote 1), which means that it does not always decrease f at each iteration. We need to set δ_n^{(1)} appropriately to ensure that (d_n^f)_{n∈ℕ} defined by Equation (3) is a descent search direction. Meanwhile, the three-term conjugate gradient method (i.e., x_{n+1} := x_n + α_n d_n^f with d_n^f defined by Equation (4)) generates a descent search direction [23, Subsection 2.1] without depending on the choice of δ_n^{(1)} (see Footnote 2 for the well-known formulas of δ_n^{(1)}). This is because the third term -δ_n^{(2)} z_n in Equation (4) plays a role in generating the descent search direction (see [23, Subsection 2.1] and Subsection 3.1).

On the basis of such acceleration methods for the steepest descent method, references [20] and [15] presented algorithms that respectively combine Equation (2) with Equation (3) and with Equation (4) to solve Problem (1). The algorithm with Equations (2) and (3) and the algorithm with Equations (2) and (4) are referred to here as the hybrid conjugate gradient method (HCGM) and the hybrid three-term conjugate gradient method (HTCGM), respectively. HCGM and HTCGM converge strongly to the solution to Problem (1) when lim_{n→∞} δ_n^{(i)} = 0 (i = 1, 2) and (z_n)_{n∈ℕ} is bounded [20, Theorem 4.1], [15, Theorem 7]. Here, we remark that the conjugate gradient methods with the well-known formulas, such as the Fletcher-Reeves (FR), Polak-Ribière-Polyak (PRP), Hestenes-Stiefel (HS), and Dai-Yuan (DY) formulas (Footnote 2), can solve unconstrained optimization problems without assuming lim_{n→∞} δ_n = 0. To distinguish between the conventional conjugate gradient directions with the four formulas and the directions defined by Equations (3) and (4) with lim_{n→∞} δ_n^{(i)} = 0, we call the latter conjugate gradient-like directions. The numerical examples in [15, 20] show that HCGM and HTCGM with slowly diminishing sequences δ_n^{(i)} (e.g., δ_n^{(i)} := 1/(n+1)^{0.01} (i = 1, 2)) converge to the solution faster than HSDM, and that, in particular, HTCGM converges fastest.

The main advantage of fixed point optimization algorithms with conjugate gradient-like directions, such as HCGM and HTCGM, is that they decrease f much faster than HSDM with the steepest descent direction. Meanwhile, the rates of convergence of the distance d(x_n, Fix(N)) := inf_{x∈Fix(N)} ‖x_n - x‖ to 0 are the same for all three algorithms, because each of these algorithms iterates as x_{n+1} := N(x_n + μα_n d_n^f) (n ∈ ℕ).

Footnote 1: (d_n^f)_{n∈ℕ} is referred to as a descent search direction if ⟨d_n^f, ∇f(x_n)⟩ < 0 for all n ∈ ℕ.
Footnote 2: These are defined as follows: δ_n^{FR} := ‖∇f(x_{n+1})‖² / ‖∇f(x_n)‖², δ_n^{PRP} := v_n / ‖∇f(x_n)‖², δ_n^{HS} := v_n / u_n, δ_n^{DY} := ‖∇f(x_{n+1})‖² / u_n, where u_n := ⟨d_n^f, ∇f(x_{n+1}) - ∇f(x_n)⟩ and v_n := ⟨∇f(x_{n+1}), ∇f(x_{n+1}) - ∇f(x_n)⟩.

Here, we shall discuss Problem (1) when Fix(N) is the set of all minimizers of a convex, continuously Fréchet differentiable functional g over H, and see that x_{n+1} := N(x_n + μα_n d_n^f) (n ∈ ℕ) is based on the steepest descent method to minimize g over H. Suppose that ∇g : H → H is Lipschitz continuous with a constant l > 0 and define N_g : H → H by N_g := I - α∇g, where α ∈ (0, 2/l] and I : H → H stands for the identity mapping. Accordingly, N_g satisfies the nonexpansivity condition and Fix(N_g) = Argmin_{x∈H} g(x) (see, e.g., [14, Proposition 2.3]). Therefore, Equation (2) with N_g := I - α∇g is as follows:

    y_n := x_n + μα_n d_n^f,
    x_{n+1} := N_g(x_n + μα_n d_n^f) = [I - α∇g](y_n) = y_n + α[-∇g(y_n)] = y_n + α[(N_g(y_n) - y_n)/α].    (5)

Hence, Algorithm (5) has the steepest descent direction

    d_{n+1}^{N_g} := -∇g(y_n) = (N_g(y_n) - y_n)/α,

which implies that it converges slowly in the constraint set Fix(N_g). From such a viewpoint, one can expect that an algorithm with the three-term conjugate gradient direction

    d_{n+1}^{N_g} := (N_g(y_n) - y_n)/α + β_n^{(1)} d_n^{N_g} + β_n^{(2)} w_n,    (6)

where β_n^{(i)} ∈ ℝ (i = 1, 2) and w_n ∈ H, would converge in the constraint set faster than Algorithm (5).
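The identity behind Equation (5), namely that N_g := I - α∇g is nonexpansive with Fix(N_g) = Argmin_{x∈H} g(x) whenever ∇g is l-Lipschitz and α ∈ (0, 2/l], can be checked numerically on a small quadratic. The sketch below (Python/NumPy; the matrix Q and vector b are illustrative data, not taken from this paper) iterates N_g and confirms that its fixed point is the unconstrained minimizer.

import numpy as np

# g(x) = (1/2) x^T Q x - b^T x, so grad g(x) = Q x - b and grad g is l-Lipschitz with l = ||Q||_2.
Q = np.diag([1.0, 2.0, 3.0])
b = np.array([1.0, 1.0, 1.0])
grad_g = lambda x: Q @ x - b
alpha = 1.0 / np.linalg.norm(Q, 2)      # alpha in (0, 2/l]
N_g = lambda x: x - alpha * grad_g(x)   # N_g := I - alpha * grad g

x = np.zeros(3)
for _ in range(2000):                   # fixed point iteration of N_g = steepest descent on g
    x = N_g(x)
print(np.allclose(x, np.linalg.solve(Q, b)))   # True: Fix(N_g) = Argmin g = {Q^{-1} b}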

In this paper, we present an algorithm with both Direction (4), to accelerate the objective function minimization, and Direction (6), to accelerate the search for a fixed point of a nonexpansive mapping. We also present its convergence analysis.

This paper is organized as follows. Section 2 gives mathematical preliminaries. Section 3 presents the fixed point optimization algorithm with Directions (4) and (6) to accelerate the existing algorithms and proves that the algorithm with lim_{n→∞} δ_n^{(i)} = 0 and lim_{n→∞} β_n^{(i)} = 0 (i = 1, 2) strongly converges to the solution to Problem 3.1. It also proves that HCGM with each of the FR, PRP, HS, and DY formulas (i.e., the algorithm with Equations (2) and (3) when δ_n^{(1)} is defined by one of δ_n^{FR}, δ_n^{PRP}, δ_n^{HS}, and δ_n^{DY}) does not satisfy lim_{n→∞} δ_n^{(1)} = 0 when the unique minimizer of the objective function over the whole space is not in the fixed point set of a nonexpansive mapping. This implies that there is an inconvenient possibility that HCGMs with the four formulas may not converge strongly to the unique minimizer of the objective function over the fixed point set when that minimizer is not equal to the unique minimizer of the objective function over the whole space. Section 4 provides numerical comparisons of the proposed algorithm with the existing fixed point optimization algorithms and shows its effectiveness. It also describes examples in which HCGMs with the four formulas do not converge to the solution. Section 5 concludes the paper by summarizing its key points and mentions future subjects for development of the proposed algorithm.

2 Mathematical Preliminaries

Let H be a real Hilbert space with inner product ⟨·, ·⟩ and induced norm ‖·‖, and let ℕ be the set of zero and all positive integers; i.e., ℕ := {0, 1, 2, ...}. We denote the identity mapping on H by I; i.e., I(x) := x for all x ∈ H.

2.1 Convexity, monotonicity, and nonexpansivity

A function f : H → ℝ is said to be convex if, for any x, y ∈ H and for any λ ∈ [0, 1], f(λx + (1-λ)y) ≤ λf(x) + (1-λ)f(y). In particular, a convex function f : H → ℝ is said to be strongly convex with c > 0 (c-strongly convex) if f(λx + (1-λ)y) ≤ λf(x) + (1-λ)f(y) - (c/2)λ(1-λ)‖x - y‖² for all x, y ∈ H and for all λ ∈ [0, 1].

A : H → H is referred to as a monotone operator if ⟨x - y, A(x) - A(y)⟩ ≥ 0 for all x, y ∈ H. A : H → H is said to be strongly monotone with c > 0 (c-strongly monotone) if ⟨x - y, A(x) - A(y)⟩ ≥ c‖x - y‖² for all x, y ∈ H. Let f : H → ℝ be a Fréchet differentiable function. If f is convex (resp. c-strongly convex), ∇f is monotone (resp. c-strongly monotone) [4, Example 22.3].

A mapping A : H → H is said to be Lipschitz continuous with L > 0 (L-Lipschitz continuous) if ‖A(x) - A(y)‖ ≤ L‖x - y‖ for all x, y ∈ H. When N : H → H is 1-Lipschitz continuous, N is referred to as a nonexpansive mapping [3, 12, 13, 29]. In particular, N is said to be firmly nonexpansive if ‖N(x) - N(y)‖² ≤ ⟨x - y, N(x) - N(y)⟩ for all x, y ∈ H. The Cauchy-Schwarz inequality guarantees that any firmly nonexpansive mapping satisfies the nonexpansivity condition. We denote the fixed point set of N : H → H by Fix(N) := {x ∈ H : N(x) = x}. Fix(N) is closed and convex when N is nonexpansive [13, Proposition 5.3].

Given a nonempty, closed convex set C (⊂ H), the mapping that assigns every point x ∈ H to its unique nearest point in C is called the metric projection onto C and is denoted by P_C; i.e., P_C(x) ∈ C and ‖x - P_C(x)‖ = inf_{y∈C} ‖x - y‖. The metric projection P_C satisfies the firm nonexpansivity condition with Fix(P_C) = C [3, Facts 1.5], [28, Theorem 2.4-1 (ii)], [4, Proposition 4.8, Equation (4.8)]. Some closed convex sets C, for example a linear variety, a closed ball, a closed cone, or a closed polytope, are simple in the sense that the explicit form of P_C is known, which implies that P_C can be computed within a finite number of arithmetic operations [4, Subchapter 28.3], [30].

The following lemmas will be used to prove the main theorem.

Lemma 2.1 (Lemma 3.1 in [31])  Suppose that f : H → ℝ is c-strongly convex and Fréchet differentiable, ∇f : H → H is L-Lipschitz continuous, and μ ∈ (0, 2c/L²). Define T : H → H by T(x) := x - μα∇f(x) (x ∈ H), where α ∈ [0, 1]. Then, for all x, y ∈ H,

    ‖T(x) - T(y)‖ ≤ (1 - τα)‖x - y‖,

where τ := 1 - √(1 - μ(2c - μL²)) ∈ (0, 1].

Lemma 2.2 (Theorems 3.7 and 3.9 in [1])  Suppose that N_1 : H → H is firmly nonexpansive and N_2 : H → H is nonexpansive with Fix(N_1) ∩ Fix(N_2) ≠ ∅, and (x_n)_{n∈ℕ} (⊂ H) is bounded. Then, lim_{n→∞} ‖x_n - N_1(N_2(x_n))‖ = 0 if and only if lim_{n→∞} ‖x_n - N_1(x_n)‖ = 0 and lim_{n→∞} ‖x_n - N_2(x_n)‖ = 0.

2.2 Monotone variational inequality

The variational inequality problem [11, 22] for a monotone operator A : H → H over a closed convex set C (⊂ H) is to find a point in

    VI(C, A) := {x* ∈ C : ⟨x - x*, A(x*)⟩ ≥ 0 for all x ∈ C}.

Suppose that f : H → ℝ is c-strongly convex and Fréchet differentiable, and ∇f : H → H is L-Lipschitz continuous. Then, VI(C, ∇f) can be characterized as the set of all minimizers of f over C, which coincides with the fixed point set of P_C(I - α∇f) [6, Subsection 8.3], [11, Proposition 2.1], [33, Theorem 46.C (1) and (2)]:

    VI(C, ∇f) = Argmin_{x∈C} f(x) := {x* ∈ C : f(x*) = min_{x∈C} f(x)}
              = Fix(P_C(I - α∇f)) := {x* ∈ C : P_C(x* - α∇f(x*)) = x*},

where α is an arbitrary positive real number. Since P_C(I - α̂∇f) is a contraction mapping when α̂ ∈ (0, 2c/L²), P_C(I - α̂∇f) has a unique fixed point [12, Theorem 2.1]. Therefore, the solution to the variational inequality consists of one point.
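As a small sanity check of the characterization VI(C, ∇f) = Fix(P_C(I - α∇f)), the sketch below (Python/NumPy, with illustrative data) implements the metric projection onto a closed ball and iterates P_C(I - α∇f) for f(x) = (1/2)‖x - p‖²; the iterate converges to the minimizer of f over the ball.

import numpy as np

def proj_ball(x, center, radius):
    # Metric projection onto the closed ball {y : ||y - center|| <= radius}.
    d = x - center
    nrm = np.linalg.norm(d)
    return x if nrm <= radius else center + (radius / nrm) * d

p = np.array([3.0, 0.0])
grad_f = lambda x: x - p        # f(x) = (1/2)||x - p||^2 is 1-strongly convex, grad f is 1-Lipschitz
alpha = 1.0                     # alpha in (0, 2c/L^2) = (0, 2)

x = np.array([5.0, 5.0])
for _ in range(50):             # fixed point iteration of P_C(I - alpha * grad f)
    x = proj_ball(x - alpha * grad_f(x), np.zeros(2), 1.0)
print(x)                        # approx [1, 0], the minimizer of f over the unit ball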

3 Optimization over the Fixed Point Set

This section discusses the following problem:

Problem 3.1  Under the assumptions that

(A1) N : H → H is a nonexpansive mapping with Fix(N) ≠ ∅,
(A2) K (⊂ H) is a nonempty, bounded, closed convex set onto which the metric projection is computable, and Fix(N) ⊂ K (see Footnote 3),
(A3) f : H → ℝ is c-strongly convex and Fréchet differentiable, and ∇f : H → H is L-Lipschitz continuous,

    minimize f(x) subject to x ∈ Fix(N).

Footnote 3: For example, when there is a bound on Fix(N), we can choose K as a closed ball with a large radius containing Fix(N). The metric projection onto such a K is easily computed (see also Subsection 2.1). See the final paragraph in Subsection 3.1 for a discussion of Problem 3.1 when a bound on Fix(N) either does not exist or is not known.

From the closedness and convexity of Fix(N) and the discussion in Subsection 2.2, we get the following:

Proposition 3.1  The existence and uniqueness of the solution to Problem 3.1 is guaranteed.

3.1 Acceleration method for the convex optimization problem over the fixed point set

We present the following algorithm for solving Problem 3.1:

Algorithm 3.1
Step 0. Take (α_n)_{n∈ℕ}, (β_n^{(i)})_{n∈ℕ}, (δ_n^{(i)})_{n∈ℕ} ⊂ (0, 1] (i = 1, 2), γ ∈ (0, 1], and μ > 0, choose x_0 ∈ H arbitrarily, and let d_0^f := -∇f(x_0), y_0 := x_0 + μα_0 d_0^f, d_0^N := N(y_0) - y_0, and n := 0.
Step 1. Given x_n, d_n^f ∈ H, compute y_n ∈ H as
    y_n := P_K(x_n + μα_n d_n^f).
Compute d_{n+1}^N ∈ H as
    d_{n+1}^N := N(y_n) - y_n + β_n^{(1)} d_n^N + β_n^{(2)} w_n,    (7)
where w_n ∈ H is an arbitrary point.
Step 2. Compute x_{n+1} ∈ H as
    x_{n+1} := P_K(y_n + γ d_{n+1}^N)
and update d_{n+1}^f ∈ H by
    d_{n+1}^f := -∇f(x_{n+1}) + δ_n^{(1)} d_n^f - δ_n^{(2)} z_n,    (8)
where z_n ∈ H is an arbitrary point. Put n := n + 1, and go to Step 1.
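A compact sketch of Algorithm 3.1 in Python/NumPy is given below. The choices w_n := N(y_n) - y_n and z_n := ∇f(x_{n+1}) follow Footnote 6, while the step-size sequences α_n := 1/√(n+1), β_n^{(i)} := 1/(n+1), and δ_n^{(i)} := 1/(n+1)^{0.01} are one illustrative choice satisfying conditions (i)-(v) of Theorem 3.1 below; μ ∈ (0, 2c/L²) and the mappings grad_f, N, and proj_K must be supplied by the user.

import numpy as np

def algorithm_3_1(grad_f, N, proj_K, x0, mu, gamma=1.0, num_iter=1000):
    # Sketch of Algorithm 3.1 with w_n := N(y_n) - y_n and z_n := grad f(x_{n+1}).
    x = np.asarray(x0, dtype=float)
    df = -grad_f(x)                    # d_0^f
    y = x + mu * df                    # Step 0 with alpha_0 = 1
    dN = N(y) - y                      # d_0^N
    for n in range(num_iter):
        alpha = 1.0 / np.sqrt(n + 1)       # conditions (i)-(iii)
        beta = 1.0 / (n + 1)               # condition (iv): beta_n = alpha_n^2
        delta = 1.0 / (n + 1) ** 0.01      # condition (v): slowly diminishing
        y = proj_K(x + mu * alpha * df)    # Step 1
        w = N(y) - y
        dN = w + beta * dN + beta * w      # direction (7) with w_n = N(y_n) - y_n
        x = proj_K(y + gamma * dN)         # Step 2
        g = grad_f(x)
        df = -g + delta * df - delta * g   # direction (8) with z_n = grad f(x_{n+1})
    return x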

In unconstrained optimization problems, it is desirable to use iterative methods which generate descent search directions, because such methods strictly decrease the objective function at each iteration. Generally, it is not guaranteed that the conjugate gradient method defined by x_{n+1} := x_n + α_n d_n^f and Equation (8) with δ_n^{(2)} z_n := 0 generates a descent search direction (see Footnote 4). The three-term conjugate gradient method defined by x_{n+1} := x_n + α_n d_n^f and Equation (8) with δ_n^{(2)} z_n ≠ 0 generates a descent search direction without depending on the choices of δ_n^{(1)} and α_n [23, Subsection 2.1] (see Footnote 5). Therefore, it would be useful in Problem 3.1 to use an accelerated algorithm with Direction (8) when δ_n^{(2)} z_n ≠ 0. On the other hand, the discussion on Equation (5) shows that N(y_n) - y_n can be regarded as the steepest descent direction at y_n of a certain convex function whose minimizers are fixed points of N. Accordingly, Direction (7) is the three-term conjugate gradient direction for finding a fixed point of N. Hence, one can expect that an algorithm with Direction (7) when β_n^{(2)} w_n ≠ 0 would converge in Fix(N) quickly (see also Section 1).

Let us compare Algorithm 3.1 with the existing algorithms, such as HSDM [32, Theorem 2.15, Remark 2.17 (a)], HCGM [20, Algorithm 3.4], and HTCGM [15, Algorithm 6], for solving Problem 3.1. HTCGM is as follows (see also Equations (2) and (4)): x_0 ∈ H, d_0^f := -∇f(x_0), and

    x_{n+1} := N(x_n + μα_n d_n^f),
    d_{n+1}^f := -∇f(x_{n+1}) + δ_n^{(1)} d_n^f - δ_n^{(2)} z_n    (n ∈ ℕ).    (9)

Algorithm (9) with δ_n^{(i)} := 0 (i = 1, 2, n ∈ ℕ) coincides with HSDM, and Algorithm (9) with δ_n^{(2)} := 0 (n ∈ ℕ) coincides with HCGM. Hence, the existing algorithms can be expressed as Algorithm (9). Algorithm 3.1 with K := H, γ := 1, and β_n^{(i)} := 0 (i = 1, 2) has x_{n+1} = y_n + d_{n+1}^N = N(y_n) = N(x_n + μα_n d_n^f), which means that Algorithm 3.1 in this case coincides with Algorithm (9). Algorithm 3.1 uses d_{n+1}^N := N(y_n) - y_n + β_n^{(1)} d_n^N + β_n^{(2)} w_n to converge in Fix(N) faster than Algorithm (9), as discussed in Section 1.

The following theorem constitutes the convergence analysis of Algorithm 3.1. The proof of the theorem is given in Subsection 3.3.

Theorem 3.1  Suppose that (I) μ ∈ (0, 2c/L²), (II) (w_n)_{n∈ℕ} and (z_n)_{n∈ℕ} are bounded (see Footnote 6), and (III) (α_n)_{n∈ℕ}, (β_n^{(i)})_{n∈ℕ}, and (δ_n^{(i)})_{n∈ℕ} (i = 1, 2) are sequences in (0, 1] satisfying (i) lim_{n→∞} α_n = 0, (ii) Σ_{n=0}^∞ α_n = ∞, (iii) Σ_{n=0}^∞ |α_{n+1} - α_n| < ∞, (iv) β_n^{(i)} ≤ α_n² (i = 1, 2, n ∈ ℕ), and (v) lim_{n→∞} δ_n^{(i)} = 0 (i = 1, 2). Then, (x_n)_{n∈ℕ} in Algorithm 3.1 strongly converges to the unique solution to Problem 3.1.

Footnote 4: The conjugate gradient method with the DY formula (i.e., δ_n^{(1)} := δ_n^{DY}) generates a descent search direction under the Wolfe conditions [10]. Whether or not the conjugate gradient methods generate descent search directions depends on the choices of δ_n^{(1)} and α_n.
Footnote 5: Reference [23, Subsection 2.1] showed that x_{n+1} := x_n + α_n d_n^f and d_{n+1}^f := -∇f(x_{n+1}) + δ_n^{(1)} d_n^f - δ_n^{(2)} z_n, where α_n, δ_n^{(1)} (> 0) are arbitrary, z_n (∈ ℝ^N) is any vector, and δ_n^{(2)} := δ_n^{(1)} ⟨∇f(x_{n+1}), d_n⟩ / ⟨∇f(x_{n+1}), z_n⟩, satisfy ⟨d_n^f, ∇f(x_n)⟩ = -‖∇f(x_n)‖² (n ∈ ℕ).
Footnote 6: We can choose, for example, w_n := N(y_n) - y_n and z_n := ∇f(x_{n+1}) (n ∈ ℕ) by referring to [35] and [15, Section 3]. Lemma 3.1 ensures that they are bounded.
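For comparison, the existing scheme (9) admits an equally short sketch; with δ_n^{(1)} = δ_n^{(2)} = 0 it reduces to HSDM, and with δ_n^{(2)} = 0 to HCGM. As before, the step-size choices and z_n := ∇f(x_{n+1}) are illustrative assumptions.

import numpy as np

def htcgm(grad_f, N, x0, mu, num_iter=1000):
    # Sketch of HTCGM (9) with z_n := grad f(x_{n+1}) (Footnote 6).
    x = np.asarray(x0, dtype=float)
    df = -grad_f(x)
    for n in range(num_iter):
        alpha = 1.0 / np.sqrt(n + 1)
        delta = 1.0 / (n + 1) ** 0.01      # set delta = 0 here to recover HSDM
        x = N(x + mu * alpha * df)
        g = grad_f(x)
        df = -g + delta * df - delta * g   # drop the last term to recover HCGM
    return x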

Let us compare Theorem 3.1 with the previously reported results in [32, Theorem 2.15, Remark 2.17 (a)], [20, Theorem 4.1], and [15, Theorem 7]. HSDM (i.e., Algorithm (9) with δ_n^{(i)} := 0 (i = 1, 2)) with Conditions (I), (i), (ii), and (iii) in Theorem 3.1 converges strongly to the solution to Problem 3.1 [32, Theorem 2.15, Remark 2.17 (a)]. HCGM (i.e., Algorithm (9) with δ_n^{(2)} := 0) with the conditions in Theorem 3.1 converges strongly to the solution if (∇f(x_n))_{n∈ℕ} is bounded [20, Theorem 4.1]. Theorem 7 in [15] guarantees that, if (∇f(x_n))_{n∈ℕ} is bounded, then HTCGM with the conditions in Theorem 3.1 converges strongly to the solution. The results in [20, Theorem 4.1] and [15, Theorem 7] and the proof of Theorem 3.1 lead us to strong convergence of Algorithm 3.1 to the solution without assuming the boundedness of K, provided that (∇f(x_n))_{n∈ℕ} and (N(y_n) - y_n)_{n∈ℕ} are bounded. However, it would be difficult to verify in advance whether (∇f(x_n))_{n∈ℕ} and (N(y_n) - y_n)_{n∈ℕ} are bounded. Hence, we assume the existence of a bounded K satisfying Fix(N) ⊂ K in place of the boundedness of (∇f(x_n))_{n∈ℕ} and (N(y_n) - y_n)_{n∈ℕ} (see Footnote 3 for the choice of K).

Let us consider the case where a bound on Fix(N) either does not exist or is not known. In this case, we cannot choose a bounded K satisfying Fix(N) ⊂ K (see Footnote 7). Even in this case, we can execute Algorithm 3.1 with K = H. However, we then need to verify the boundedness of (∇f(x_n))_{n∈ℕ} and (N(y_n) - y_n)_{n∈ℕ} to guarantee that Algorithm 3.1 converges to the solution (see the above paragraph). When we try to apply HCGM and HTCGM to this case, we also need to verify whether or not (∇f(x_n))_{n∈ℕ} is bounded [20, Theorem 4.1], [15, Theorem 7]. Meanwhile, we can apply HSDM to this case without any problem [32, Theorem 2.15, Remark 2.17 (a)]. Therefore, when a bound on Fix(N) either does not exist or is not known, we should execute HSDM. However, HSDM converges slowly. Hence, it would be desirable to execute HSDM, HCGM, HTCGM, and Algorithm 3.1 and verify whether the convergent point of HSDM, which is the minimizer of f over Fix(N), is equal to the convergent points of HCGM, HTCGM, and Algorithm 3.1.

3.2 Analysis of Algorithm 3.1 with the conventional formulas of conjugate gradient directions

In this subsection, we analyze Algorithm 3.1 when δ_n^{(1)} is one of the well-known formulas, such as the Fletcher-Reeves (FR), Dai-Yuan (DY), Polak-Ribière-Polyak (PRP), and Hestenes-Stiefel (HS) formulas, that are used to solve large-scale unconstrained optimization problems.

Footnote 7: Given a halfspace S := {x ∈ H : ⟨a, x⟩ ≤ b}, where a (≠ 0) ∈ H and b ∈ ℝ, N(x) := P_S(x) = x - [max{0, ⟨a, x⟩ - b}/‖a‖²] a (x ∈ H) is nonexpansive with Fix(N) = Fix(P_S) = S [3, p. 406], [4, Subchapter 28.3]. However, we cannot define a bounded K satisfying Fix(N) = S ⊂ K.
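The halfspace projection in Footnote 7 is a one-liner; a brief sketch with a worked example (Python/NumPy, illustrative data):

import numpy as np

def proj_halfspace(x, a, b):
    # P_S(x) = x - [max{0, <a, x> - b} / ||a||^2] a projects onto S := {x : <a, x> <= b}.
    return x - (max(0.0, float(np.dot(a, x)) - b) / float(np.dot(a, a))) * a

print(proj_halfspace(np.array([3.0, 2.0]), np.array([1.0, 0.0]), 1.0))  # [1. 2.], on the boundary of S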

Let us define the FR, DY, PRP, and HS formulas, which can be applied to constrained optimization problems, as follows: for each n ∈ ℕ,

    δ_n^{FR} := ‖∇f(x_{n+1})‖² / ‖∇f(x_n)‖²  if ‖∇f(x_n)‖ > 0,  and δ_n^{FR} := 0 otherwise,
    δ_n^{DY} := ‖∇f(x_{n+1})‖² / ⟨d_n^f, ∇f(x_{n+1}) - (1 + η)∇f(x_n)⟩  if u_n ≠ 0,  and δ_n^{DY} := 0 otherwise,
    δ_n^{PRP} := ⟨∇f(x_{n+1}), ∇f(x_{n+1}) - (1 + κ)∇f(x_n)⟩ / ‖∇f(x_n)‖²  if ‖∇f(x_n)‖ > 0,  and δ_n^{PRP} := 0 otherwise,
    δ_n^{HS} := ⟨∇f(x_{n+1}), ∇f(x_{n+1}) - (1 + κ)∇f(x_n)⟩ / ⟨d_n^f, ∇f(x_{n+1}) - (1 + η)∇f(x_n)⟩  if u_n ≠ 0,  and δ_n^{HS} := 0 otherwise,    (10)

where η, κ ≥ 0 and u_n := ⟨d_n^f, ∇f(x_{n+1}) - (1 + η)∇f(x_n)⟩ (n ∈ ℕ). For simplicity, we assume that δ_n^{(2)} := 0 (n ∈ ℕ), i.e., d_n^f (n ∈ ℕ) in Algorithm 3.1 is defined by the conventional conjugate gradient direction:

    d_{n+1}^f := -∇f(x_{n+1}) + δ_n d_n^f    (n ∈ ℕ),

where δ_n ∈ ℝ is defined as one of Formulas (10). The following proposition is satisfied for Algorithm 3.1 with the conventional FR, DY, PRP, and HS Formulas (10):

Proposition 3.2  Suppose that Conditions (I), (II), and (i)-(iv) in Theorem 3.1 are satisfied. Then, the following hold:
(i) If lim_{n→∞} δ_n^{FR} = 0, then the unique minimizer of f over H belongs to Fix(N) (i.e., Argmin_{x∈H} f(x) ⊂ Fix(N)). In this case, (x_n)_{n∈ℕ} in Algorithm 3.1 strongly converges to the unique minimizer of f over H.
(ii) If lim_{n→∞} δ_n^{DY} = 0 and if η ≠ 0, then the unique minimizer of f over H belongs to Fix(N). In this case, (x_n)_{n∈ℕ} in Algorithm 3.1 strongly converges to the unique minimizer of f over H.
(iii) If lim_{n→∞} δ_n^{PRP} = 0 and if κ ≠ 0, then the unique minimizer of f over H belongs to Fix(N). In this case, (x_n)_{n∈ℕ} in Algorithm 3.1 strongly converges to the unique minimizer of f over H.
(iv) If lim_{n→∞} δ_n^{HS} = 0 and if η, κ ≠ 0, then the unique minimizer of f over H belongs to Fix(N). In this case, (x_n)_{n∈ℕ} in Algorithm 3.1 strongly converges to the unique minimizer of f over H.

Let us discuss Proposition 3.2 for Algorithm 3.1 with γ := 1, β_n^{(i)} := 0 (i = 1, 2), and δ_n := δ_n^{(1)} defined by one of Formulas (10), i.e.,

    x_{n+1} := P_K[N(P_K(x_n + μα_n d_n^f))],
    d_{n+1}^f := -∇f(x_{n+1}) + δ_n d_n^f    (n ∈ ℕ).    (11)
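For reference, the four coefficients in (10) can be computed as follows (Python/NumPy sketch; η = κ = 0 gives the classical FR, DY, PRP, and HS formulas):

import numpy as np

def delta_formulas(g_new, g_old, d_old, eta=0.0, kappa=0.0):
    # Generalized FR, DY, PRP, and HS coefficients of (10); zero is returned when a denominator vanishes.
    u = float(np.dot(d_old, g_new - (1.0 + eta) * g_old))
    v = float(np.dot(g_new, g_new - (1.0 + kappa) * g_old))
    g_new_sq = float(np.dot(g_new, g_new))
    g_old_sq = float(np.dot(g_old, g_old))
    fr = g_new_sq / g_old_sq if g_old_sq > 0 else 0.0
    dy = g_new_sq / u if u != 0 else 0.0
    prp = v / g_old_sq if g_old_sq > 0 else 0.0
    hs = v / u if u != 0 else 0.0
    return fr, dy, prp, hs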

Convex Optimization over Fixed Point Set 11 Proposition 3.2 says that, in the case of η, κ 0, if Algorithm 11) with δ n defined by one of Formulas 10) satisfies lim n δ n = 0, then the unique minimizer of f over H is always in FixN), and the algorithm strongly converges to the unique minimizer of f over H belonging to FixN). Proposition 3.2 describes that Algorithm 11) satisfies lim n δ n 0 in the case where η, κ 0 and the unique minimizer of f over H is not in FixN). According to Theorem 3.1 or [20, Theorem 4.1]), Algorithm 11) in this case might not converge to the unique minimizer of f over FixN) that is not equal to the unique minimizer of f over H. To guarantee that Algorithm 11) converges in this case to the unique minimizer of f over FixN), for example, we need to reset δ n := 0 when n exceeds a certain number of iterations. Algorithm 11) with the steepest descent direction δ n := 0), i.e., x n+1 := P K [NP K x n µα n fx n )))] n N) HSDM [32]), strongly converges to the unique minimizer of f over FixN) [32, Theorem 2.15, Remark 2.17 a)], however, it converges slowly. The above observation suggests that Algorithm 11) with each of the conventional formulas would not be an efficient way to solve constrained convex optimization problems. Numerical examples in Section 4 show that the algorithms with the conventional formulas do not always converge to the unique minimizer of f over FixN). Meanwhile, Algorithm 11) with lim n δ n = 0 e.g., δ n := 1/n + 1) a a > 0)) always converges to the unique minimizer of f over FixN) [20, Theorem 4.1], which means there is no need to verify whether the unique minimizer of f over H is in FixN) or not in advance, and converges faster than HSDM see [20] for details on the fast convergence of Algorithm 11) with lim n δ n = 0). In the case that the unique minimizer of f over H is not in FixN) and lim n δn DY = 0 or lim n δn HS = 0 and κ 0), we get η = 0 from Proposition 3.2 ii) and iv). Hence, Inequality 14) see the proof of Proposition 3.2 ii) and iv) and Remark 3.1) imply that f x n+1 ) f x n ) for large enough n. Since fx n ) tends to be smaller at the unique minimizer of f over FixN), Algorithm 11) with δ n := δn DY or δn HS ) will not converge to the unique minimizer of f over FixN) when the unique minimizer of f over H is not in FixN). Proof of Proposition 3.2. Let x H be the unique minimizer of f over H. i) The boundedness of K ensures that x n ) n N is bounded. The Lipschitz continuity of f guarantees that fx n ) fx ) Lx n x for all n N, which implies that fx n )) n N is bounded. Hence, B 1 > 0 exists such that fx n ) B 1 for all n N. 12) Assume that x / FixN). We then can choose ε 1, ε 2 > 0 such that d x, FixN)) := inf {x y : y FixN)} ε 1 + ε 2.

12 Hideaki Iiduka Theorem 3.1 and lim n δn FR = 0 guarantee that x n ) n N strongly converges to the unique minimizer of f over FixN), denoted by x FixN). Hence, for ε 1 > 0, there exists N 0 N such that, for all n N 0, d x n, FixN)) := inf {x n y : y FixN)} x n x ε 1. Fix n N 0 arbitrarily. Then, yn) FixN) exists such that x n yn) = dx n, FixN)). Hence, ε 2 d x, FixN)) ε 1 d x, FixN)) d x n, FixN)) = inf {x y : y FixN)} x n yn) x yn) x n yn) x x n. Since the c-strong monotonicity of f implies f) 1 is 1/c-Lipschitz continuous, we find that, for all x H, f) 1 0) f) 1 x) 1/c)x, and hence, ε 2 x x n = f) 1 0) f) 1 fx n )) 1 c fx n). Therefore, we have fx n ) cε 2 =: B 2 for all n N 0. 13) Inequalities 12) and 13) ensure that, for all n N 0, δn FR = fx n+1) 2 fx n ) 2 B2 B 1 ) 2 > 0. This contradicts lim n δn FR = 0. Hence, we find that {x } = Argmin x H fx) FixN). Moreover, since x is the solution to Problem 3.1, Theorem 3.1 guarantees that x n ) n N strongly converges to x. ii) Assume that x / FixN). Proposition 3.2 i) guarantees that Inequalities 12) and 13) hold for all n N 0. Since lim n δn DY = 0, there exists N 1 N such that δn DY 1/2 for all n N 1. Put B := max{b 1, d f N 1 } <. Then, d f N 1 2B. Suppose that d f m 2B for some m N 1. From d f n+1 := fx n+1) + δn DY d f n n N), we find that d f m+1 fx m+1 ) + δ DY m d f m B + 1 2B) = 2B. 2 Induction shows that d f n 2B for all n N 1. Hence, the boundedness of d f n) n N and fx n )) n N imply that, for all n N 1, u n d f n, fx n+1 ) 1 + η) fx n ) d f n fx n+1 ) 1 + η) fx n ) d f n fx n+1 ) + 1 + η) fx n )) 22 + η)bb 1.

Convex Optimization over Fixed Point Set 13 If u n := d f n, fx n+1 ) 1 + η) fx n ) = 0 for all n max{n 0, N 1 }, then δn DY = fx n+1) 2 B2 2 > 0, u n 22 + η)bb 1 which implies lim n δn DY > 0. Therefore, we find that u n = 0 for all n max{n 0, N 1 }, i.e., δn DY = 0 for all n max{n 0, N 1 }. This implies that d f n+1 = fx n+1) for all n max{n 0, N 1 }. From u n = 0 for all n N 2 := max{n 0, N 1 }+1, we have d f n, fx n+1 ) = 1+η) d f n, fx n ), which means fx n ), fx n+1 ) = 1 + η) fx n ) 2, and hence, fx n+1 ) 1 + η) fx n ) 1 + η) n N 2 fx N2 ) 1 + η) n N 2 B 2. 14) In the case of η > 0, the right hand side of the above inequality diverges when n diverges. This contradicts the boundedness property of fx n )) n N. Accordingly, {x } = Argmin x H fx) FixN). Theorem 3.1 guarantees that x n ) n N strongly converges to x. iii) Assume that x / FixN). Put v n := fx n+1 ), fx n+1 ) 1 + κ) fx n ) n N). From lim n δn PRP = lim n v n / fx n ) 2 ) = 0 and Inequalities 12) and 13), we have lim n v n = 0. Moreover, the strong convergence of x n ) n N to x FixN) Theorem 3.1) and the continuity of f ensure ) 0 = lim v n = lim fx n+1 ) 2 1 + κ) fx n+1 ), fx n ) n n = fx ) 2 1 + κ) fx ) 2 = κ fx ) 2, which implies from fx ) = 0 that κ = 0. Therefore, assuming κ 0 and lim n δn PRP = 0 implies that {x } = Argmin x H fx) FixN). Theorem 3.1 guarantees that x n ) n N strongly converges to x. iv) Assume that x / FixN). A similar discussion to that of the proof of Proposition 3.2 ii) leads us to Inequalities 12) and 13) and the boundedness of d f n) n N and u n ) n N. The strong convergence of x n ) n N to x FixN) Theorem 3.1), the continuity of f, the boundedness of d f n) n N, and lim n δn HS = 0 imply that 0 lim sup d f n+1 + fx ) = lim sup fx n+1 ) + δn HS d f n + fx ) n n lim sup fx ) fx n+1 ) + δn HS d f ) n 0, n which implies that lim n d f n+1 + fx ) = 0. Meanwhile, we have that, for all n N, u n η fx ) 2 = d f n, fx n+1 ) 1 + η) fx n ) + fx ), fx ) 1 + η) fx ) = d f n, fx n+1 ) + fx ), fx ) 1 + η) d f n, fx n ) + fx ), fx ) ).

14 Hideaki Iiduka So, the triangle inequality ensures that, for all n N, u n η fx ) 2 d f n, fx n+1 ) + fx ), fx ) Moreover, we find that, for all n N, d f n, fx n+1 ) + fx ), fx ) + 1 + η) d f n, fx n ) + fx ), fx ). = d f n + fx ), fx n+1 ) fx ), fx n+1 ) + fx ), fx ) = d f n + fx ), fx n+1 ) + fx ), fx ) fx n+1 ) d f n + fx ), fx n+1 ) + fx ), fx ) fx n+1 ), which means that d f n, fx n+1 ) + fx ), fx ) d f n + fx ) fx n+1 ) + fx ) fx ) fx n+1 ). We also have that, for all n N, d f n, fx n ) + fx ), fx ) = d f n + fx ), fx n ) fx ), fx n ) + fx ), fx ) = d f n + fx ), fx n ) + fx ), fx ) fx n ) d f n + fx ), fx n ) + fx ), fx ) fx n ) d f n + fx ) fx n ) + fx ) fx ) fx n ). From lim n d f n + fx ) = 0, lim n fx n ) fx ) = 0, and the boundedness of fx n )) n N, we have that lim n d f n, fx n+1 ) + fx ), fx ) = 0 and lim n d f n, fx n ) + fx ), fx ) = 0. Therefore, lim n u n η fx ) 2 = 0, i.e., lim u n = η fx ) 2. 15) n In the case of η > 0, we find from Equation 15) and fx ) = 0 that lim n u n > 0. Therefore, we have from lim n δn HS = lim n v n /u n ) = 0 that lim n v n = 0. A discussion similar to the proof of Proposition 3.2 iii) leads us to κ = 0, which is a contradiction. Hence, we find that {x } = Argmin x H fx) FixN). Remark 3.1 Consider the case in which the minimizer of f over H is not in FixN), lim n δn HS = 0, and κ 0. Then, Proposition 3.2 iv) leads to η = 0. In the case of η = 0, Equation 15) implies that lim n u n = 0. From lim n δn HS = lim n v n /u n ) = 0, we have the following cases: A) v n = ou n ), or B) u n = 0 for large enough n i.e., δn HS = 0 for large enough n). We have that lim n v n = κ fx ) 2 0 because κ 0 and x is the minimizer of f over FixN) when the minimizer of f over H is not in

Convex Optimization over Fixed Point Set 15 FixN) see the proof of Proposition 3.2 iii)). This implies that Case A) does not hold. In Case B), from a discussion similar to Proposition 3.2 ii) see Inequality 14)) and η = 0, we find that fx n+1 ) fx n ) for large enough n. 3.3 Proof of Theorem 3.1 We first prove the boundedness of x n ) n N, y n ) n N, d N n ) n N, and d f n) n N. Lemma 3.1 Suppose that Conditions II), i), iv), and v) in Theorem 3.1 are satisfied. Then, x n ) n N, y n ) n N, d N n ) n N, and d f n) n N in Algorithm 3.1 are bounded. Proof The boundedness of K and the definitions of x n ) n N and y n ) n N guarantee the boundedness of x n ) n N and y n ) n N. The nonexpansivity of N guarantees that Ny n ) y y n y for all y FixN). Thus, the boundedness of y n ) n N implies that Ny n )) n N is bounded, i.e., Ny n ) y n ) n N is bounded. Moreover, the Lipschitz continuity of f ensures that fx n ) fx) Lx n x for all x H. Hence, the boundedness of x n ) n N means that fx n )) n N is bounded. We shall prove that d N n ) n N is bounded. Since lim n β n i) = 0 i = 1, 2) from Conditions i) and iv), there exists n 0 N such that β n 1) 1/3 and β n 2) 1 for all n n 0. Condition II) ensures that K 1 := max{sup{ny n ) y n : n N}, sup{w n : n N}} < and K 2 := max{k 1, d N n 0 } <. Obviously, d N n 0 3K 2. We assume that d N n 3K 2 for some n n 0. The definition of d N n ) n N guarantees that d N n+1 Ny n ) y n + β 1) n d N n + β 2) n w n K 2 + 1 3 3K 2) + K 2 = 3K 2. Induction shows that d N n 3K 2 for all n n 0 ; i.e., d N n ) n N is bounded. Next, we shall prove that d f n) n N is bounded. The boundedness of fx n )) n N and Condition II) imply that K 3 := max{sup{ fx n ): n N}, sup{z n : n N}} <. Condition v) ensures the existence of n 1 N such that δ n 1) 1/3 and δ n 2) 1 for all n n 1. Put K 4 := max{k 3, d f n 1 } <. Then, d f n 1 3K 4. Suppose that d f n 3K 4 for some n n 1. The definition of d f n) n N means that fx n+1 ) + δ n 1) d f n + δ n 2) z n K 4 + 1 3 3K 4) + K 4 = 3K 4. d f n+1 Induction shows that d f n 3K 4 for all n n 1 ; i.e., d f n) n N is bounded. Next, we prove the following: Lemma 3.2 Suppose that the assumptions in Theorem 3.1 are satisfied. Then, i) lim n x n+1 P K ˆNx n )) = 0, where ˆN := 1 γ)i + γn;

16 Hideaki Iiduka ii) lim n x n+1 x n = 0 and lim n x n ˆNx n ) = 0; iii) lim sup n x x n, fx ) 0, where x FixN) is the solution to Problem 3.1. Proof i) x n+1 n N) in Algorithm 3.1 can be rewritten as follows: x n+1 = P K yn + γd N ) n+1 = PK y n + γ = P K 1 γ)i + γn) y n ) + γ ) =: P K ˆNyn ) + γt n, Ny n ) y n + β 1) n d N n + β 2) n w n )) β 1) n d N n + β 2) n w n )) where ˆN := 1 γ)i + γn and t n := β n 1) d N n + β n 2) w n n N). Since N is nonexpansive, ˆN := 1 γ)i + γn satisfies the nonexpansivity condition with FixN) = Fix ˆN). The nonexpansivity of P K guarantees that, for all n N, ) ) PK x n+1 P K ˆNxn ) = ˆN yn ) + γt n P K ˆNxn )) ) ˆN yn ) + γt n ˆNx n ) ˆN y n ) ˆNx n ) + γ t n, which from the nonexpansivity of ˆN means that, for all n N, x n+1 P K ˆNxn )) yn x n + γ t n. Since y n := P K x n + µα n d f n) and x n = P K x n ) from x n+1 = P K ˆNy n ) + γt n ) K n N)), the nonexpansivity of P K ensures that y n x n = P K xn + µα n d f n) PK x n ) x n + µα n d f n) xn = µα n d f n. Moreover, from Condition iv) and αn 2 α n 1 n N) we have t n = β n 1) d N n + β n 2) w n β 1) n d N n + β n 2) w n αn 2 d N n α n d N n + w n ). + α 2 n w n Therefore, we find that, for all n N, x n+1 P K ˆNxn )) µαn d f n + γαn d N n + wn ) 2K 5 α n, where K 5 := max{sup{µd f n: n N}, sup{γd N n + w n ): n N}} <. Hence, Condition i) implies that lim n x n+1 P K ˆNx n )) = 0. ii) Let τ := 1 1 µ 2c µl 2 ) 0, 1] see also Lemma 2.1). Put s n := x n +µα n d f n) x n 1 +µα n 1 d f n 1 ) n 1), M 1 := sup{2µ s n, fx n 1 ) : n 1}, M 2 := sup{ s n, d f n 1 /τ : n 1}, M 3 := sup{ s n, z n 1 /τ : n 1}, M 4 := sup{ s n, d f n 2 /τ : n 2}, and M 5 := sup{ s n, z n 2 /τ : n 2}.

Convex Optimization over Fixed Point Set 17 Lemma 3.1 ensures that M 6 := max{m i : i = 1, 2,..., 5} <. The definition of d f n) n N means that, for all n 2, s n 2 = x n + µα n d f ) ) n x n 1 + µα n 1 d f 2 n 1 )) = x n + µα n fx n ) + δ 1) n 1 df n 1 δ2) n 1 z n 1 x n 1 µα n fx n 1 )) µα n fx n 1 ) µα n 1 d f n 1 2, which implies that s n 2 = x n µα n fx n )) x n 1 µα n fx n 1 )) + µ α n δ 1) n 1 df n 1 α nδ 2) n 1 z n 1 α n fx n 1 ) α n 1 dn 1) f 2. From the inequality, x + y 2 x 2 + 2 x + y, y x, y H), we find that, for all n 2, s n 2 x n µα n fx n )) x n 1 µα n fx n 1 )) 2 + 2µ α n δ 1) n 1 df n 1 α nδ 2) n 1 z n 1 α n fx n 1 ) α n 1 d f n 1, s n. Moreover, Lemma 2.1 guarantees that, for all n 2, x n µα n fx n )) x n 1 µα n fx n 1 )) 2 1 τα n ) 2 x n x n 1 2 1 τα n )x n x n 1 2. The definition of d f n) n N means that, for all n 2, 2µ α n δ 1) n 1 df n 1 α nδ 2) n 1 z n 1 α n fx n 1 ) α n 1 d f n 1, s n α n δ 1) n 1 df n 1 α nδ 2) n 1 z n 1 α n fx n 1 ) = 2µ α n 1 fx n 1 ) + δ 1) n 2 df n 2 δ2) n 2 z n 2 ), s n = 2µα n 1 α n ) s n, fx n 1 ) + 2µα n δ 1) n 1 s n, d f n 1 + 2µα n δ 2) n 1 s n, z n 1 + 2µα n 1 δ 1) n 2 s n, d f n 2 + 2µα n 1 δ 2) n 2 s n, z n 2 2µ α n 1 α n s n, fx n 1 ) + 2µα n δ 1) n 1 s n, d f n 1 + 2µαn δ 2) n 1 s n, z n 1 + 2µα n 1 δ 1) n 2 s n, d f n 2 + 2µαn 1 δ 2) n 2 s n, z n 2.

18 Hideaki Iiduka The definition of M 6 leads one to deduce that, for all n 2, 2µ α n δ 1) n 1 df n 1 α nδ 2) n 1 z n 1 α n fx n 1 ) α n 1 d f n 1, s n M 6 α n 1 α n + 2µα n δ 1) n 1 τm 6) + 2µα n δ 2) n 1 τm 6) + 2µα n 1 δ 1) n 2 τm 6) + 2µα n 1 δ 2) n 2 τm 6). From α n 1 α n + α n α n 1 n N), the right hand side of the above inequality means that M 6 α n 1 α n + 2µα n δ 1) n 1 τm 6) + 2µα n δ 2) n 1 τm 6) + 2µα n 1 δ 1) n 2 τm 6) + 2µα n 1 δ 2) n 2 τm 6) M 6 α n 1 α n + 2µα n δ 1) n 1 τm 6) + 2µα n δ 2) n 1 τm 6) + 2µ {α n + α n α n 1 } δ 1) n 2 τm 6) + 2µ {α n + α n α n 1 } δ 2) n 2 τm 6) ) = M 6 + 2µδ 1) n 2 τm 6) + 2µδ 2) n 2 τm 6) α n 1 α n + 2µM 6 τα n δ 1) n 1 + 2µM 6τα n δ 2) n 1 + 2µM 6τα n δ 1) n 2 + 2µM 6τα n δ 2) n 2. Therefore, combining all the above inequalities means that, for all n 2, s n 2 1 τα n )x n x n 1 2 + M 7 α n 1 α n + 2µM 6 τα n δ 1) n 1 + 2µM 6τα n δ 2) n 1 + 2µM 6τα n δ 1) n 2 + 2µM 6τα n δ 2) n 2, 16) where M 7 := sup{m 6 + 2µδ 1) n 2 τm 6) + 2µδ 2) n 2 τm 6): n 2} <. Put r n := ˆNy n ) + γt n ) ˆNy n 1 ) + γt n 1 ) n N) and M 8 := max{sup{2γr n d N n +w n ): n N: n 1}, sup{2γr n d N n 1+w n 1 ): n N: n 2}} <. From x n+1 = P K ˆNy n ) + γt n ) n N) and the nonexpasivity of P K, we have that, for all n 1, ) ) x n+1 x n 2 2 = P K ˆN yn ) + γt n P K ˆN yn 1 ) + γt n 1 ) ) 2 ˆN yn ) + γt n ˆN yn 1 ) + γt n 1 = rn 2. So, from the inequality, x + y 2 x 2 + 2 x + y, y x, y H), we find that, for all n 1, x n+1 x n 2 r n 2 = ˆN y n ) ˆN y n 1 ) + γ t n t n 1 ) ˆN y n ) ˆN y n 1 ) 2 + 2γ r n, t n t n 1. 2

Convex Optimization over Fixed Point Set 19 The nonexpasivity of ˆN and PK that and the Cauchy-Schwarz inequality mean x n+1 x n 2 y n y n 1 2 + 2γ r n t n t n 1 = P K xn + µα n d f n ) PK x n 1 + µα n 1 d f n 1) 2 + 2γ rn t n t n 1 x n + µα n d f ) n x n 1 + µα n 1 dn 1) f 2 + 2γ rn t n t n 1 s n 2 + 2γ r n {t n + t n 1 }, where y n := P K x n + µα n d f n) and s n := x n + µα n d f n) x n 1 + µα n 1 d f n 1 ) n 1). From t n := β n 1) d N n + β 2) w n n N) and Condition iv), we find that x n+1 x n 2 { s n 2 + 2γr n β 1) n d N n s n 2 + 2γr n { α 2 n d N n s n 2 + M 8 α 2 n + α 2 n 1). n + β 2) n w n + β 1) + w n ) + αn 1 2 + β 2) dn 1 N + w n 1 )} n 1 d N n 1 n 1 w n 1 Moreover, from α n 1 α n + α n α n 1 n N), we have that, for all n 1, } x n+1 x n 2 s n 2 + M 8 τ τα nα n + M 8 α n 1 α n + α n α n 1 ) s n 2 + M 8 τ τα n α n + α n 1 ) + M 8 α n α n 1. Accordingly, Inequality 16) guarantees that, for all n 2, x n+1 x n 2 1 τα n )x n x n 1 2 + M 7 + M 8 ) α n 1 α n + 2µM 6 τα n δ 1) n 1 + 2µM 6 τα n δ 2) n 1 + 2µM 6τα n δ 1) n 2 + 2µM 6τα n δ 2) n 2 + M 8 τ τα n α n + α n 1 ). On the other hand, Conditions i) and v) guarantee that, for all ε > 0, there exists m 0 N such that M 8 /τ)α n ε/10, M 8 /τ)α n 1 ε/10, µm 6 δ i) n 1 ε/10, and µm 6 δ i) n 2 ε/10 i = 1, 2) for all n m 0. Therefore, we find that, for all n m 0, x n+1 x n 2 1 τα n )x n x n 1 2 + M 7 + M 8 ) α n 1 α n + τα n ε = 1 τα n )x n x n 1 2 + M 7 + M 8 ) α n α n 1 + 1 1 τα n )) ε.

20 Hideaki Iiduka Hence, for all m, n m 0, x n+m+1 x n+m 2 1 τα n+m )x n+m x n+m 1 2 + M 7 + M 8 ) α n+m α n+m 1 + ε 1 1 τα n+m )) 1 τα n+m ) { 1 τα n+m 1 )x n+m 1 x n+m 2 2 + M 7 + M 8 ) α n+m 1 α n+m 2 + ε 1 1 τα n+m 1 )) } + M 7 + M 8 ) α n+m α n+m 1 + ε 1 1 τα n+m )) 1 τα n+m )1 τα n+m 1 )x n+m 1 x n+m 2 2 + M 7 + M 8 ) α n+m α n+m 1 + α n+m 1 α n+m 2 ) + ε 1 1 τα n+m )1 τα n+m 1 )) n+m 1 k=m + ε 1 τα k+1 )x m+1 x m 2 + M 7 + M 8 ) 1 n+m 1 k=m 1 τα k+1 ) ). n+m 1 k=m α k+1 α k Since k=m 1 τα k+1) = 0 from Condition ii), we find that, for every m m 0, lim sup x n+1 x n 2 = lim sup x n+m+1 x n+m 2 n n 1 τα k+1 )x m+1 x m 2 + M 7 + M 8 ) α k+1 α k k=m + ε 1 M 7 + M 8 ) ) 1 τα k+1 ) k=m α k+1 α k + ε. k=m k=m Moreover, since lim m k=m α k+1 α k = 0 from Condition iii), we find that lim sup n x n+1 x n 2 ε for all ε > 0. The arbitrary property of ε ensures that lim x n+1 x n = 0. n From x n P K ˆNx n )) x n x n+1 +x n+1 P K ˆNx n )), lim n x n+1 P K ˆNx n )) = 0, and lim n x n+1 x n = 0, we find that lim x n P K ˆNxn )) = 0. n

Convex Optimization over Fixed Point Set 21 Therefore, the firm nonexpansivity of P K, the nonexpansivity of ˆN, FixN) = Fix ˆN) K = FixP K ), and Lemma 2.2 ensure that lim x n ˆN x n ) = 0. 17) n iii) Suppose that x FixN) is the unique solution to Problem 3.1. Choose a subsequence, x ni ) i N, of x n ) n N such that lim sup x x n, fx ) = lim x x ni, fx ). n i The boundedness of x ni ) i N guarantees the existence of a subsequence, x nij ) j N, of x ni ) i N and a point, x H, such that x nij ) j N weakly converges to x. From the closedness of K and x n ) n N K, we find that x K. We may assume without loss of generality that x ni ) i N weakly converges to x. We shall prove that x K is a fixed point of N. Assume that x ˆNx ). Opial s condition 8, Equation 17), and the nonexpansivity of ˆN produce a contradiction: lim inf i = lim inf i x n i x < lim inf x ni ˆNx ) i x ni ˆNx ni ) + ˆNx ni ) ˆNx ) = lim inf ˆNx ni ) ˆNx ) lim inf i x n i x. i Accordingly, we find that x Fix ˆN) = FixN). Since x FixN) is the solution to Problem 3.1, x x, fx ) 0 holds see Subsection 2.2). Therefore, lim sup x x n, fx ) = lim x x ni, fx ) = x x, fx ) 0. n i This completes the proof. Regarding Lemma 3.2ii), we can make the following remark. Remark 3.2 From x n ˆNx n ) = x n 1 γ)x n γnx n ) = γx n Nx n ) n N), Lemma 3.2ii) guarantees that lim n x n Nx n ) = 0. Let us see whether x n Nx n )) n N in Algorithm 3.1 monotonically decreases or not. For simplicity, we assume that x n + µα n d f n, y n + γd N n+1 K n N) and γ := 1, i.e., y n := x n + µα n d f n, x n+1 := y n + d N n+1 n N). The definition of d N n ) n N means that, for all n N, x n+1 N x n+1 ) = y n + d N n+1) N xn+1 ) = N y n ) + β n 1) d N n + β n 2) w n N x n+1 ) N y n ) N x n+1 ) + β n 1) d N n + β n 2) w n, 8 Suppose that x n ) n N H) weakly converges to ˆx H and x ˆx. Then, the following condition, called Opial s condition [26], is satisfied: lim inf n x n ˆx < lim inf n x n x. In the above situation, Opial s condition leads to lim inf i x ni x < lim inf i x ni ˆNx ).

22 Hideaki Iiduka which from the nonexpansivity of N implies that, for all n N, x n+1 N x n+1 ) y n x n+1 + β n 1) d N n + β n 2) w n. From the definition of y n and the triangle inequality, we also have that, for all n N, y n x n+1 = x n + µα n d f n x n+1 x n x n+1 + µα n d f n x n N x n ) + N x n ) x n+1 + µα n d f n. Since the the triangle inequality and the nonexpansivity of N guarantee that, for all n N, N x n ) x n+1 = N x n ) y n + dn+1) N = N x n ) N y n ) β n 1) d N n β n 2) w n N x n ) N y n ) + β n 1) d N n + β n 2) w n x n y n + β n 1) d N n + β n 2) w n = µα n d f n + β 1) n d N n + β 2) n w n, we find that, for all n N, ) x n+1 N x n+1 ) x n N x n ) + 2 µα n d f n + β n 1) d N n + β n 2) w n. This implies that x n Nx n )) n N does not monotonically decrease. However, for large enough n, µα n d f n + β n 1) d N n + β n 2) w n 0 by Conditions i) and iv) in Theorem 3.1. Therefore, we can see that x n Nx n )) n N will monotonically decrease for large enough n. Such a trend is also observed in the numerical examples in Section 4. Figures 5 and 7 show that x n Nx n )) n 10 in Algorithm 3.1 does not monotonically decrease, x n Nx n )) n>10 in Algorithm 3.1 monotonically decreases, and Algorithm 3.1 converges in FixN) faster than the existing algorithms. See Section 4 for the details about the numerical examples. We can prove Theorem 3.1 by using Lemmas 3.1 and 3.2. Proof of Theorem 3.1. Conditions II), i) and v), and Lemmas 3.1 and 3.2 guarantee that, for all ε > 0, there exists m 1 N such that, for all n m 1, µ τ x x n, fx ) ε µδ 2) n 1 10, µδ 1) n 1 τ x n x, d f n 1 ε 10, x x n, z n 1 ε τ 10, γα n d N n + w n ) ˆNyn ) τ ˆNx ε ) + γt n 10, µ 2 α n d f τ n, δ 1) n 1 df n 1 δ2) n 1 z n 1 fx ) ε 10. 18)

Convex Optimization over Fixed Point Set 23 The nonexpansivity of P K and the definition of d f n) n N imply that, for all n m 1, y n x 2 = P K xn + µα n d f ) n PK x ) 2 x n + µα n d f n x 2 ) = x n + µα n fx n ) + δ 1) n 1 df n 1 δ2) n 1 z n 1 x 2 = x n µα n fx n )) x µα n fx )) + µα n δ 1) n 1 df n 1 δ2) n 1 z n 1 fx )) 2. Accordingly, from the inequality, x + y 2 x 2 + 2 x + y, y x, y H) and Lemma 2.1, we have that, for all n m 1, y n x 2 x n µα n fx n )) x µα n fx )) 2 + 2µα n x n + µα n d f n x, δ 1) n 1 df n 1 δ2) n 1 z n 1 fx ) 1 τα n ) x n x 2 + 2µα n x n + µα n d f n x, δ 1) n 1 df n 1 δ2) n 1 z n 1 fx ) { = 1 τα n ) x n x 2 µ + 2τα n τ x x n, fx ) + µδ1) n 1 τ + µδ2) n 1 x x n, z n 1 + µ2 α n τ τ Therefore, Inequality 18) guarantees that, for all n m 1, x n x, d f n 1 d f n, δ 1) n 1 df n 1 δ2) n 1 z n 1 fx ) }. y n x 2 1 τα n ) x n x 2 + 4 5 ετα n. 19) Also, from the nonexpansivity of P K and the inequality, x + y 2 x 2 + 2 x + y, y x, y H), we have that, for all n m 1, ) ) x n+1 x 2 = P K ˆN yn ) + γt n P K ˆN x 2 ) ˆN y n ) ˆNx 2 ) + γt n ˆN y n ) ˆNx ) 2 + 2γ t n, ˆNy n ) ˆNx ) + γt n. Moreover, the nonexpasivity of ˆN and the Cauchy-Schwarz inequality mean that, for all n m 1, x n+1 x 2 y n x 2 + 2γ β n 1) d N n + β n 2) w n, ˆNy n ) ˆNx ) + γt n y n x 2 + 2γ β n 1) d N n + β n 2) w n ) ˆNyn ) ˆNx ) + γt n.

Condition (iv) leads one to deduce that, for all n ≥ m_1,

    ‖x_{n+1} - x*‖² ≤ ‖y_n - x*‖² + 2γα_n²(‖d_n^N‖ + ‖w_n‖)‖N̂(y_n) - N̂(x*) + γt_n‖
                   = ‖y_n - x*‖² + 2τα_n · (γα_n/τ)(‖d_n^N‖ + ‖w_n‖)‖N̂(y_n) - N̂(x*) + γt_n‖.

Hence, Inequalities (19) and (18) imply that, for all n ≥ m_1,

    ‖x_{n+1} - x*‖² ≤ (1 - τα_n)‖x_n - x*‖² + (4/5)ετα_n + 2τα_n(ε/10)
                   = (1 - τα_n)‖x_n - x*‖² + ετα_n
                   = (1 - τα_n)‖x_n - x*‖² + ε(1 - (1 - τα_n)).

Induction thus gives, for all n ≥ m_1,

    ‖x_{n+1} - x*‖² ≤ Π_{k=m_1}^{n}(1 - τα_k)‖x_{m_1} - x*‖² + ε(1 - Π_{k=m_1}^{n}(1 - τα_k)).

Since Π_{k=m_1}^{∞}(1 - τα_{k+1}) = 0 from Condition (ii), we find that lim sup_{n→∞} ‖x_{n+1} - x*‖² ≤ ε. The arbitrariness of ε ensures that lim sup_{n→∞} ‖x_{n+1} - x*‖² ≤ 0; i.e., lim_{n→∞} ‖x_{n+1} - x*‖² = 0. This means that (x_n)_{n∈ℕ} in Algorithm 3.1 strongly converges to the unique solution to Problem 3.1.

4 Numerical Examples

This section provides numerical comparisons of the existing algorithms (HSDM, HCGM, and HTCGM) with Algorithm 3.1 for the following problem:

Problem 4.1  Minimize f(x) := (1/2)⟨x, Qx⟩ + ⟨b, x⟩ subject to x ∈ Fix(N), where Q ∈ ℝ^{S×S} (S = 1000, 5000) is positive definite, b ∈ ℝ^S, and N : ℝ^S → ℝ^S is nonexpansive with Fix(N) ≠ ∅.

HSDM, HCGM, and HTCGM used in the experiment were as follows: x_0 ∈ ℝ^S, d_0^f := -∇f(x_0), and

    x_{n+1} := N(x_n + (10^{-4}/(n+1)) d_n^f),

where the directions in HSDM, HCGM, and HTCGM are, respectively,

    d_{n+1}^f := -∇f(x_{n+1}),
    d_{n+1}^f := -∇f(x_{n+1}) + (1/(n+1)^{0.01}) d_n^f,    (20)
    d_{n+1}^f := -∇f(x_{n+1}) + (1/(n+1)^{0.01}) d_n^f - (1/(n+1)^{0.01}) ∇f(x_{n+1}).    (21)

It is guaranteed that the above HSDM, HCGM, and HTCGM converge to the unique solution to Problem 4.1 [15, Theorem 7]. The directions in Algorithm 3.1 used in the experiment are given by d_0^f := -∇f(x_0), d_0^N := N(x_0 + 10^{-4} d_0^f) - (x_0 + 10^{-4} d_0^f), and

    d_{n+1}^N := N(y_n) - y_n + (1/(n+1)) d_n^N + (1/(n+1)) (N(y_n) - y_n),    (22)
    d_{n+1}^f := -∇f(x_{n+1}) + (1/(n+1)^{0.01}) d_n^f - (1/(n+1)^{0.01}) ∇f(x_{n+1}),    (23)

where y_n := P_K(x_n + (10^{-4}/(n+1)) d_n^f), x_{n+1} := P_K(y_n + d_{n+1}^N), and K (⊂ ℝ^S) is a closed ball with a large radius. Theorem 3.1 guarantees that Algorithm 3.1 with the above directions converges to the unique solution to Problem 4.1.

We also applied HCGM with each of the FR, PRP, HS, and DY formulas (Algorithm (11) with δ_n defined by one of Formulas (10)) to Problem 4.1 and verified whether HCGMs with the FR, PRP, HS, and DY formulas converge to the solution to Problem 4.1. We chose five random initial points and executed HSDM, HCGM, HTCGM, Algorithm 3.1, and HCGMs with the FR, PRP, HS, and DY formulas from each initial point. The following graphs plot the mean values of the five executions. The computer used in the experiment had an Intel Boxed Core i7 i7-870 2.93 GHz 8 M CPU and 8 GB of memory. The language was MATLAB 7.9.

4.1 Constraint set in Problem 4.1 is the intersection of two balls

Suppose that b := 0 ∈ ℝ^S, Q ∈ ℝ^{S×S} (S = 1000, 5000) is a diagonal matrix with eigenvalues 1, 2, ..., S, C_1 := {x ∈ ℝ^S : ‖x‖² ≤ 4}, and C_2 := {x ∈ ℝ^S : ‖x - (2, 0, 0, ..., 0)^T‖² ≤ 1}. Define N : ℝ^S → ℝ^S by N := P_{C_1} P_{C_2}. Then, N is nonexpansive because P_{C_1} and P_{C_2} are nonexpansive. Moreover, Fix(N) = C_1 ∩ C_2. Note that the exact solution to Problem 4.1 in this case is x* := (1, 0, 0, ..., 0)^T ∈ ℝ^S. To see whether or not the algorithms used in the experiment converge to the solution, we employed the following function: for each n ∈ ℕ,

    D_n := ‖x_n - x*‖²,

where x_n is the nth approximation to the solution. The convergence of (D_n)_{n∈ℕ} to 0 implies that the algorithms converge to the solution to Problem 4.1.

Figure 1 describes the behavior of D_n for HSDM, HCGM, HTCGM, and Algorithm 3.1 (Proposed) when S = 1000. This figure shows that (D_n)_{n∈ℕ} generated by Algorithm 3.1 converges to 0 faster than the (D_n)_{n∈ℕ} generated by the existing algorithms, which means that Algorithm 3.1 converges fastest to the solution. The CPU time to compute x_2000 satisfying D_2000 < 10^{-6} in Algorithm 3.1 is about 8.1 s, while HSDM, HCGM, and HTCGM still satisfy D_n > 10^{-2} when the CPU time is about 8.1 s. In particular, Algorithm 3.1 converges to the solution faster than the best conventional algorithm, HTCGM, which employs the three-term conjugate gradient-like direction. HTCGM has the direction d_{n+1}^N := N(y_n) - y_n, whereas Algorithm 3.1 has the direction in Equation (22), which makes it converge in Fix(N) quickly. It is considered that this difference between HTCGM and Algorithm 3.1 leads to the fast convergence of Algorithm 3.1. Figure 2 plots the behavior of D_n for HSDM, HCGM, HTCGM, and Algorithm 3.1 (Proposed) when S = 5000 and shows that Algorithm 3.1 converges fastest, as can be seen in Figure 1.

Let us apply HCGMs employing the conventional FR, PRP, HS, and DY formulas with lim_{n→∞} δ_n ≠ 0 to Problem 4.1 in the above cases and see whether they converge to the solution. Unfortunately, it is not guaranteed that they converge to the solution, because δ_n defined by one of Formulas (10) satisfies lim_{n→∞} δ_n ≠ 0 when κ, η ≠ 0 and the unique minimizer of f over ℝ^S satisfying ∇f(x) = Qx = 0 (i.e., x = Q^{-1} 0 = 0) is not in Fix(N).

[Figure 1: Behavior of D_n := ‖x_n - x*‖² for HSDM, HCGM, HTCGM, and Algorithm 3.1 (Proposed) when S = 1000 and {x*} = Argmin_{x∈C_1∩C_2} f(x). CPU times to compute x_500 in HSDM, HCGM, HTCGM, and Algorithm 3.1 are, respectively, 0.5048 s, 0.9446 s, 1.2596 s, and 2.0045 s.]

[Figure 2: Behavior of D_n := ‖x_n - x*‖² for HSDM, HCGM, HTCGM, and Algorithm 3.1 (Proposed) when S = 5000 and {x*} = Argmin_{x∈C_1∩C_2} f(x). CPU times to compute x_500 in HSDM, HCGM, HTCGM, and Algorithm 3.1 are, respectively, 8.5586 s, 15.2463 s, 21.1717 s, and 33.4314 s.]
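For readers who want to reproduce the flavor of the Subsection 4.1 experiment, the following condensed sketch (in Python/NumPy; the original experiments were run in MATLAB) sets up Problem 4.1 with S = 1000, N := P_{C_1} P_{C_2}, and Algorithm 3.1 with the directions (22) and (23); the radius of K, the iteration count, and the random initial point are illustrative assumptions, not the exact experimental setup.

import numpy as np

S = 1000
q = np.arange(1, S + 1, dtype=float)          # Q = diag(1, ..., S), b = 0
grad_f = lambda x: q * x                      # grad f(x) = Qx

def proj_ball(x, center, radius):
    d = x - center
    nrm = np.linalg.norm(d)
    return x if nrm <= radius else center + (radius / nrm) * d

c2_center = np.zeros(S); c2_center[0] = 2.0
N = lambda x: proj_ball(proj_ball(x, c2_center, 1.0), np.zeros(S), 2.0)   # N := P_{C_1} P_{C_2}
proj_K = lambda x: proj_ball(x, np.zeros(S), 100.0)                       # K: large closed ball containing Fix(N)
x_star = np.zeros(S); x_star[0] = 1.0                                     # known solution of Problem 4.1 in this case

x = np.random.rand(S)
df = -grad_f(x)
y = x + 1e-4 * df
dN = N(y) - y
for n in range(10000):
    y = proj_K(x + (1e-4 / (n + 1)) * df)
    w = N(y) - y
    dN = w + (1.0 / (n + 1)) * dN + (1.0 / (n + 1)) * w        # direction (22)
    x = proj_K(y + dN)
    g = grad_f(x)
    df = -g + (1.0 / (n + 1) ** 0.01) * (df - g)               # direction (23)
print(np.linalg.norm(x - x_star) ** 2)                         # D_n, the squared distance to the solution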

26 Hideaki Iiduka where x n is the nth approximation to the solution. The convergence of D n ) n N s to 0 implies that the algorithms converge to the solution to Problem 4.1. Figure 1 describes the behaviors of D n for HSDM, HCGM, HTCGM, and Algorithm 3.1 Proposed) when S = 1000. This figure shows that D n ) n N generated by Algorithm 3.1 converges to 0 faster than D n ) n N s generated by the existing algorithms, which means that Algorithm 3.1 converges fastest to the solution. The CPU time to compute x 2000 satisfying D 2000 < 10 6 in Algorithm 3.1 is about 8.1 s, while HSDM, HCGM, and HTCGM satisfy D n > 10 2 when the CPU time is about 8.1 s. In particular, Algorithm 3.1 converges to the solution faster than the best conventional HTCGM employing the three-term conjugate gradient-like direction. HTCGM has the direction, d N n+1 := Ny n ) y n, whereas Algorithm 3.1 has the direction in Equation 22) to converge in FixN) quickly. It is considered that this difference between HTCGM and Algorithm 3.1 leads us to the fast convergence of Algorithm 3.1. Figure 2 plots the behaviors of D n for HSDM, HCGM, HTCGM, and Algorithm 3.1 Proposed) when S = 5000 and shows that Algorithm 3.1 converges fastest, as can be seen in Figure 1. Let us apply HCGMs employing the conventional FR, PRP, HS, and DY formulas with lim n δ n 0 to Problem 4.1 in the above cases and see whether they converge to the solution. Unfortunately, it is not guaranteed that they converge to the solution because δ n defined by one of Formulas 10) satisfies lim n δ n 0 when κ, η 0 and the unique minimizer of f over R S satisfying fx ) = Qx = 0 i.e., x = Q 1 0 = 0) is not 10 5 10 5 10 0 10 0 Dn 10-5 HSDM HCGM HTCGM Proposed Dn 10-5 HSDM HCGM HTCGM Proposed 10-10 10-10 10-15 0 2000 4000 6000 8000 10000 Number of iterations 10-15 0 2000 4000 6000 8000 10000 Number of iterations Fig. 1 Behavior of D n := x n x 2 for HSDM, HCGM, HTCGM, and Algorithm 3.1 Proposed) when S = 1000 and {x } = Argmin x C1 C 2 fx) CPU times to compute x 500 in HSDM, HCGM, HTCGM, and Algorithm 3.1 are, respectively, 0.5048 s, 0.9446 s, 1.2596 s, and 2.0045 s.) Fig. 2 Behavior of D n := x n x 2 for HSDM, HCGM, HTCGM, and Algorithm 3.1 Proposed) when S = 5000 and {x } = Argmin x C1 C 2 fx) CPU times to compute x 500 in HSDM, HCGM, HTCGM, and Algorithm 3.1 are, respectively, 8.5586 s, 15.2463 s, 21.1717 s, and 33.4314 s.)