Iterative Hard Thresholding Methods for $\ell_0$ Regularized Convex Cone Programming
arXiv [math.OC], v2, 2 Nov 2012

Zhaosong Lu

October 30, 2012

Abstract

In this paper we consider $\ell_0$ regularized convex cone programming problems. In particular, we first propose an iterative hard thresholding (IHT) method and its variant for solving $\ell_0$ regularized box constrained convex programming. We show that the sequence generated by these methods converges to a local minimizer. Also, we establish the iteration complexity of the IHT method for finding an $\epsilon$-local-optimal solution. We then propose a method for solving $\ell_0$ regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation and establish its iteration complexity for finding an $\epsilon$-approximate local minimizer. Finally, we propose a variant of this method in which the associated penalty parameter is dynamically updated, and show that every accumulation point is a local minimizer of the problem.

Key words: sparse approximation, iterative hard thresholding method, $\ell_0$ regularization, box constrained convex programming, convex cone programming

1 Introduction

Sparse approximations have gained a great deal of popularity in numerous areas over the last decade. For example, in compressed sensing a large sparse signal is decoded by finding a sparse solution to a system of linear equalities and/or inequalities. The particular interest of this paper is to find a sparse approximation to a convex cone programming problem of the form

$$\min \ f(x) \quad \text{s.t.} \quad Ax - b \in \mathcal{K}^*, \quad l \le x \le u \tag{1}$$

Department of Mathematics, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada (zhaosong@sfu.ca). This work was supported in part by an NSERC Discovery Grant. Part of this work was conducted during the author's sabbatical leave in the Department of Industrial and Systems Engineering at Texas A&M University. The author would like to thank them for hosting his visit.
for some $l \in \mathbb{R}^n_-$, $u \in \mathbb{R}^n_+$, $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$, where $\mathcal{K}^*$ denotes the dual cone of a closed convex cone $\mathcal{K} \subseteq \mathbb{R}^m$, i.e., $\mathcal{K}^* = \{s \in \mathbb{R}^m : s^T x \ge 0, \ \forall x \in \mathcal{K}\}$, and $\mathbb{R}^n_- = \{x \in \mathbb{R}^n : x_i \le 0, \ 1 \le i \le n\}$, $\mathbb{R}^n_+ = \{x \in \mathbb{R}^n : x_i \ge 0, \ 1 \le i \le n\}$. A sparse solution to (1) can be sought by solving the following $\ell_0$ regularized convex cone programming problem:

$$\min \ f(x) + \lambda \|x\|_0 \quad \text{s.t.} \quad Ax - b \in \mathcal{K}^*, \quad l \le x \le u \tag{2}$$

for some $\lambda > 0$, where $\|x\|_0$ denotes the cardinality (number of nonzero entries) of $x$. One special case of (2), the $\ell_0$ regularized unconstrained least squares problem, has been well studied in the literature (e.g., [13, 10]), and several methods have been developed for solving it. For example, iterative hard thresholding (IHT) methods [6, 2, 3] and matching pursuit algorithms [11, 14] were proposed to solve this type of problem. Recently, Lu and Zhang [10] proposed a penalty decomposition method for solving a more general class of $\ell_0$ minimization problems. As shown by the extensive experiments in [2, 3], the IHT method performs very well in finding sparse solutions to unconstrained least squares problems. In addition, methods of a similar type [5, 8] have been successfully applied to find low rank solutions in the context of matrix completion. Inspired by these works, in this paper we study IHT methods for solving the $\ell_0$ regularized convex cone programming problem (2). In particular, we first propose an IHT method and its variant for solving $\ell_0$ regularized box constrained convex programming. We show that the sequence generated by these methods converges to a local minimizer. Also, we establish the iteration complexity of the IHT method for finding an $\epsilon$-local-optimal solution. We then propose a method for solving $\ell_0$ regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation and establish its iteration complexity for finding an $\epsilon$-approximate local minimizer of the problem.
We also propose a variant of the method in which the associated penalty parameter is dynamically updated, and show that every accumulation point is a local minimizer of the problem.

The outline of this paper is as follows. In Subsection 1.1 we introduce some notation that is used throughout the paper. In Section 2 we present some technical results on a projected gradient method for convex programming. In Section 3 we propose IHT methods for solving $\ell_0$ regularized box constrained convex programming and study their convergence. In Section 4 we develop IHT methods for solving $\ell_0$ regularized convex cone programming and study their convergence. Finally, in Section 5 we present some concluding remarks.

1.1 Notation

Given a nonempty closed convex set $\Omega \subseteq \mathbb{R}^n$ and an arbitrary point $x \in \Omega$, $N_\Omega(x)$ denotes the normal cone of $\Omega$ at $x$. In addition, $d_\Omega(y)$ denotes the Euclidean distance between $y \in \mathbb{R}^n$ and $\Omega$. All norms used in the paper are the Euclidean norm, denoted by $\|\cdot\|$. We use $U(r)$ to denote the ball centered at the origin with radius $r \ge 0$, that is, $U(r) := \{x \in \mathbb{R}^n : \|x\| \le r\}$.
2 Technical preliminaries

In this section we present some technical results on a projected gradient method for convex programming that will be used subsequently. Consider the convex programming problem

$$\phi^* := \min_{x \in X} \phi(x), \tag{3}$$

where $X \subseteq \mathbb{R}^n$ is a closed convex set and $\phi : X \to \mathbb{R}$ is a smooth convex function whose gradient is Lipschitz continuous with constant $L_\phi > 0$. Assume that the set of optimal solutions of (3), denoted by $X^*$, is nonempty. Let $L \ge L_\phi$ be arbitrarily given. The projected gradient of $\phi$ at any $x \in X$ with respect to $X$ is defined as

$$g(x) := L[x - \Pi_X(x - \nabla\phi(x)/L)], \tag{4}$$

where $\Pi_X(\cdot)$ is the projection map onto $X$ (see, for example, [12]). The following properties of the projected gradient are essentially shown in Proposition 3 and Lemma 4 of [9] (see also [12]).

Lemma 2.1 Let $x \in X$ be given and define $x^+ := \Pi_X(x - \nabla\phi(x)/L)$. Then, for any given $\epsilon \ge 0$, the following statements hold:

a) $\|g(x)\| \le \epsilon$ if and only if $\nabla\phi(x) \in -N_X(x^+) + U(\epsilon)$.

b) $\|g(x)\| \le \epsilon$ implies that $\nabla\phi(x^+) \in -N_X(x^+) + U(2\epsilon)$.

c) $\phi(x^+) \le \phi(x) - \|g(x)\|^2/(2L)$.

d) $\phi(x) - \phi(x^*) \ge \|g(x)\|^2/(2L)$, where $x^* \in \operatorname{Argmin}\{\phi(y) : y \in X\}$.

We next study a projected gradient method for solving (3).

Projected gradient method for (3):

Choose an arbitrary $x^0 \in X$. Set $k = 0$.

1) Solve the subproblem
$$x^{k+1} = \operatorname*{argmin}_{x \in X} \left\{\phi(x^k) + \nabla\phi(x^k)^T(x - x^k) + \frac{L}{2}\|x - x^k\|^2\right\}. \tag{5}$$

2) Set $k \leftarrow k + 1$ and go to step 1).

end

Some properties of the above projected gradient method are established in the following two theorems, which will be used in subsequent sections of this paper.
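Before turning to those theorems, note that the subproblem (5) has a closed form: its minimizer is exactly the projection of the gradient step, $x^{k+1} = \Pi_X(x^k - \nabla\phi(x^k)/L)$. The sketch below (our own illustrative names and data, not the author's code) instantiates this for a box $X$ and a simple quadratic $\phi$.

```python
import numpy as np

def projected_gradient(grad, project, x0, L, n_iters=500):
    """Projected gradient method: x_{k+1} = Pi_X(x_k - grad(x_k)/L).

    This is exactly subproblem (5): the minimizer of the quadratic model
    over X is the projection of the unconstrained gradient step onto X.
    """
    x = x0.copy()
    for _ in range(n_iters):
        x = project(x - grad(x) / L)
    return x

# Example: minimize phi(x) = 0.5*||x - c||^2 over the box X = [0, 1]^3.
c = np.array([-0.5, 0.3, 1.7])
grad = lambda x: x - c                    # gradient of phi; L_phi = 1
project = lambda x: np.clip(x, 0.0, 1.0)  # projection onto the box
x_star = projected_gradient(grad, project, np.zeros(3), L=1.0)
# The minimizer is the projection of c onto the box: [0, 0.3, 1].
```

Here convergence is immediate because $\phi$ is a separable quadratic; for general smooth convex $\phi$ the theorems below quantify the rate.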
Theorem 2.2 Let $\{x^k\}$ be generated by the above projected gradient method. Then the following statements hold:

(i) For every $k \ge 0$ and $l \ge 1$,
$$\phi(x^{k+l}) - \phi^* \ \le\ \frac{L}{2l}\|x^k - x^*\|^2. \tag{6}$$

(ii) $\{x^k\}$ converges to some optimal solution $x^*$ of (3).

Proof. (i) Since the objective function of (5) is strongly convex with modulus $L$, it follows that for every $x \in X$,
$$\phi(x^k) + \nabla\phi(x^k)^T(x - x^k) + \frac{L}{2}\|x - x^k\|^2 \ \ge\ \phi(x^k) + \nabla\phi(x^k)^T(x^{k+1} - x^k) + \frac{L}{2}\|x^{k+1} - x^k\|^2 + \frac{L}{2}\|x - x^{k+1}\|^2.$$
By the convexity of $\phi$, the Lipschitz continuity of $\nabla\phi$ and $L \ge L_\phi$, we have
$$\phi(x) \ \ge\ \phi(x^k) + \nabla\phi(x^k)^T(x - x^k), \qquad \phi(x^{k+1}) \ \le\ \phi(x^k) + \nabla\phi(x^k)^T(x^{k+1} - x^k) + \frac{L}{2}\|x^{k+1} - x^k\|^2,$$
which together with the above inequality imply that
$$\phi(x) + \frac{L}{2}\|x - x^k\|^2 \ \ge\ \phi(x^{k+1}) + \frac{L}{2}\|x - x^{k+1}\|^2, \quad \forall x \in X. \tag{7}$$
Letting $x = x^k$ in (7), we obtain that $\phi(x^k) - \phi(x^{k+1}) \ge L\|x^{k+1} - x^k\|^2/2$. Hence, $\{\phi(x^k)\}$ is decreasing. Letting $x = x^* \in X^*$ in (7), we have
$$\phi(x^{k+1}) - \phi^* \ \le\ \frac{L}{2}\left(\|x^k - x^*\|^2 - \|x^{k+1} - x^*\|^2\right), \quad \forall k \ge 0.$$
Using this inequality and the monotonicity of $\{\phi(x^k)\}$, we obtain that
$$l\,\big(\phi(x^{k+l}) - \phi^*\big) \ \le\ \sum_{i=k}^{k+l-1}\big[\phi(x^{i+1}) - \phi^*\big] \ \le\ \frac{L}{2}\left(\|x^k - x^*\|^2 - \|x^{k+l} - x^*\|^2\right), \tag{8}$$
which immediately yields (6).

(ii) It follows from (8) that
$$\|x^{k+l} - x^*\| \ \le\ \|x^k - x^*\|, \quad \forall k \ge 0,\ l \ge 1. \tag{9}$$
Hence, $\|x^k - x^*\| \le \|x^0 - x^*\|$ for every $k$, which implies that $\{x^k\}$ is bounded. Then there exists a subsequence $K$ such that $\{x^k\}_K \to \hat{x}^* \in X$. It can be seen from (6) that $\{\phi(x^k)\}_K \to \phi^*$. Hence, $\phi(\hat{x}^*) = \lim_{k \in K} \phi(x^k) = \phi^*$, which implies that $\hat{x}^* \in X^*$. Since (9) holds for any $x^* \in X^*$, we also have $\|x^{k+l} - \hat{x}^*\| \le \|x^k - \hat{x}^*\|$ for every $k \ge 0$ and $l \ge 1$. This together with the fact $\{x^k\}_K \to \hat{x}^*$ implies that $\{x^k\} \to \hat{x}^*$, and hence statement (ii) holds.
Theorem 2.3 Suppose that $\phi$ is strongly convex with modulus $\sigma > 0$. Let $\{x^k\}$ be generated by the above projected gradient method. Then, for any given $\epsilon > 0$, the following statements hold:

(i) $\phi(x^k) - \phi^* \le \epsilon$ whenever
$$k \ \ge\ 2\lceil L/\sigma \rceil \left\lceil \log \frac{\phi(x^0) - \phi^*}{\epsilon} \right\rceil.$$

(ii) $\phi(x^k) - \phi^* < \epsilon$ whenever
$$k \ \ge\ 2\lceil L/\sigma \rceil \left\lceil \log \frac{\phi(x^0) - \phi^*}{\epsilon} \right\rceil + 1.$$

Proof. (i) Let $M = \lceil L/\sigma \rceil$. It follows from Theorem 2.2 and the strong convexity of $\phi$ that
$$\phi(x^{k+l}) - \phi^* \ \le\ \frac{L}{2l}\|x^k - x^*\|^2 \ \le\ \frac{L}{\sigma l}\big(\phi(x^k) - \phi^*\big),$$
where $x^*$ is the optimal solution of (3). Hence, we have
$$\phi(x^{k+2M}) - \phi^* \ \le\ \frac{L}{2\sigma M}\big(\phi(x^k) - \phi^*\big) \ \le\ \frac{1}{2}\big(\phi(x^k) - \phi^*\big),$$
which implies that
$$\phi(x^{2jM}) - \phi^* \ \le\ \frac{1}{2^j}\big(\phi(x^0) - \phi^*\big).$$
Let $K = \lceil \log((\phi(x^0) - \phi^*)/\epsilon) \rceil$. Hence, when $k \ge 2KM$, we have
$$\phi(x^k) - \phi^* \ \le\ \phi(x^{2KM}) - \phi^* \ \le\ \frac{1}{2^K}\big(\phi(x^0) - \phi^*\big) \ \le\ \epsilon,$$
which immediately implies that statement (i) holds.

(ii) Let $K$ and $M$ be defined as above. If $\phi(x^{2KM}) = \phi^*$, then by the monotonicity of $\{\phi(x^k)\}$ we have $\phi(x^k) = \phi^*$ for $k > 2KM$, and hence the conclusion holds. We now suppose that $\phi(x^{2KM}) > \phi^*$. It implies that $g(x^{2KM}) \ne 0$, where $g$ is defined in (4). Using this relation, Lemma 2.1 (c) and statement (i), we obtain that $\phi(x^{2KM+1}) < \phi(x^{2KM}) \le \phi^* + \epsilon$, which together with the monotonicity of $\{\phi(x^k)\}$ implies that the conclusion holds.

Finally, we consider the convex programming problem
$$f^* := \min\{f(x) : Ax - b \in \mathcal{K}^*,\ x \in X\}, \tag{10}$$
for some $A \in \mathbb{R}^{m\times n}$ and $b \in \mathbb{R}^m$, where $f : X \to \mathbb{R}$ is a smooth convex function whose gradient is Lipschitz continuous with constant $L_f > 0$, $X \subseteq \mathbb{R}^n$ is a closed convex set, and $\mathcal{K}^*$ is the dual cone of a closed convex cone $\mathcal{K}$.
The Lagrangian dual function associated with (10) is given by
$$d(\mu) := \inf\{f(x) + \mu^T(Ax - b) : x \in X\}, \quad \forall \mu \in -\mathcal{K}.$$
Assume that there exists a Lagrange multiplier for (10), that is, a vector $\mu^* \in -\mathcal{K}$ such that $d(\mu^*) = f^*$. Under this assumption, the following results are established in Corollary 2 and Proposition 10 of [9], respectively.

Lemma 2.4 Let $\mu^*$ be a Lagrange multiplier for (10). There holds:
$$f(x) - f^* \ \ge\ -\|\mu^*\|\, d_{\mathcal{K}^*}(Ax - b), \quad \forall x \in X.$$

Lemma 2.5 Let $\rho > 0$ be given and $L_\rho = L_f + \rho\|A\|^2$. Consider the problem
$$\Phi_\rho^* := \min_{x \in X}\left\{\Phi_\rho(x) := f(x) + \frac{\rho}{2}\big[d_{\mathcal{K}^*}(Ax - b)\big]^2\right\}. \tag{11}$$
If $x \in X$ is a $\xi$-approximate solution of (11), i.e., $\Phi_\rho(x) - \Phi_\rho^* \le \xi$, then the pair $(x^+, \mu)$ defined as
$$x^+ := \Pi_X(x - \nabla\Phi_\rho(x)/L_\rho), \qquad \mu := \rho\big[Ax^+ - b - \Pi_{\mathcal{K}^*}(Ax^+ - b)\big]$$
is in $X \times (-\mathcal{K})$ and satisfies $\mu^T \Pi_{\mathcal{K}^*}(Ax^+ - b) = 0$ and the relations
$$d_{\mathcal{K}^*}(Ax^+ - b) \ \le\ \frac{1}{\rho}\|\mu^*\| + \sqrt{\frac{\xi}{\rho}}, \qquad \nabla f(x^+) + A^T\mu \ \in\ -N_X(x^+) + U\big(2\sqrt{2L_\rho\xi}\big),$$
where $\mu^*$ is an arbitrary Lagrange multiplier for (10).

3 $\ell_0$ regularized box constrained convex programming

In this section we consider a special case of (2), namely the $\ell_0$ regularized box constrained convex programming problem
$$F^* := \min\ \{F(x) := f(x) + \lambda\|x\|_0 \ :\ l \le x \le u\} \tag{12}$$
for some $\lambda > 0$, $l \in \mathbb{R}^n_-$ and $u \in \mathbb{R}^n_+$. Recently, Blumensath and Davies [2, 3] proposed an iterative hard thresholding (IHT) method for solving the special case of (12) with $f(x) = \|Ax - b\|^2$, $l_i = -\infty$ and $u_i = \infty$ for all $i$. Our aim is to extend their IHT method to solve (12) and study its convergence. In addition, we establish its iteration complexity for finding an $\epsilon$-local-optimal solution of (12). Finally, we propose a variant of the IHT method in which only a local Lipschitz constant of $\nabla f$ is used.
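The basic step the method repeats is simple: a gradient step, a projection onto the box, and a per-coordinate hard-threshold test that zeroes any coordinate whose contribution does not justify the sparsity price $2\lambda/L$. The sketch below (our own illustrative names and data, not the author's code) implements one such step for (12).

```python
import numpy as np

def iht_step(x, grad_f, L, lam, l, u):
    """One IHT step for problem (12).

    For each coordinate i, keep the projected gradient step
    [Pi_B(s_L(x))]_i when it decreases the quadratic model by more
    than the sparsity price 2*lam/L; otherwise set the coordinate to 0.
    """
    s = x - grad_f(x) / L        # s_L(x), the unconstrained gradient step
    p = np.clip(s, l, u)         # Pi_B(s_L(x)), projection onto the box B
    gain = s**2 - (p - s)**2     # per-coordinate model decrease (times 2/L)
    return np.where(gain > 2.0 * lam / L, p, 0.0)

# Example: f(x) = 0.5*||x - c||^2, box B = [-1, 1]^4, starting from x = 0.
c = np.array([2.0, 0.05, -0.8, 0.0])
grad_f = lambda x: x - c
x1 = iht_step(np.zeros(4), grad_f, L=1.0, lam=0.1, l=-1.0, u=1.0)
# Large components survive (clipped to the box); tiny ones are zeroed.
```

The componentwise test used here is exactly the structure of the closed-form subproblem solution stated later in the section.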
Throughout this section we assume that $f$ is a smooth convex function on $B$ whose gradient is Lipschitz continuous with constant $L_f > 0$, and also that $f$ is bounded below on the set $B$, where
$$B := \{x \in \mathbb{R}^n : l \le x \le u\}. \tag{13}$$
We now present an IHT method for solving problem (12).

Iterative hard thresholding method for (12):

Choose an arbitrary $x^0 \in B$. Set $k = 0$.

1) Solve the subproblem
$$x^{k+1} \in \operatorname*{Argmin}_{x \in B}\left\{f(x^k) + \nabla f(x^k)^T(x - x^k) + \frac{L}{2}\|x - x^k\|^2 + \lambda\|x\|_0\right\}. \tag{14}$$

2) Set $k \leftarrow k + 1$ and go to step 1).

end

Remark. The subproblem (14) has a closed form solution given in (21).

In what follows, we study the convergence of the above IHT method for (12). Before proceeding, we introduce some notation that will be used subsequently. Define
$$B_I := \{x \in B : x_I = 0\}, \quad \forall I \subseteq \{1,\ldots,n\}, \tag{15}$$
$$\Pi_B(x) := \operatorname{argmin}\{\|y - x\| : y \in B\}, \quad \forall x \in \mathbb{R}^n, \qquad s_L(x) := x - \frac{1}{L}\nabla f(x), \quad \forall x \in B, \tag{16}$$
$$I(x) := \{i : x_i = 0\}, \quad \forall x \in \mathbb{R}^n \tag{17}$$
for some constant $L > L_f$. The following lemma establishes some properties of the operators $s_L(\cdot)$ and $\Pi_B(s_L(\cdot))$, which will be used subsequently.

Lemma 3.1 For any $x, y \in \mathbb{R}^n$, there hold:

(1) $\left| [s_L(x)]_i^2 - [s_L(y)]_i^2 \right| \le 4\left(\|x - y\| + |[s_L(y)]_i|\right)\|x - y\|$;

(2) $\left| [\Pi_B(s_L(x)) - s_L(x)]_i^2 - [\Pi_B(s_L(y)) - s_L(y)]_i^2 \right| \le 4\left(\|x - y\| + |[\Pi_B(s_L(y)) - s_L(y)]_i|\right)\|x - y\|$.

Proof. (1) We observe that
$$\|s_L(x) - s_L(y)\| = \left\|x - y - \frac{1}{L}\big(\nabla f(x) - \nabla f(y)\big)\right\| \ \le\ \|x - y\| + \frac{1}{L}\|\nabla f(x) - \nabla f(y)\| \ \le\ \left(1 + \frac{L_f}{L}\right)\|x - y\| \ \le\ 2\|x - y\|. \tag{18}$$
It follows from (18) that
$$\left|[s_L(x)]_i^2 - [s_L(y)]_i^2\right| = \left|[s_L(x)]_i + [s_L(y)]_i\right|\cdot\left|[s_L(x)]_i - [s_L(y)]_i\right| \ \le\ \left(\left|[s_L(x)]_i - [s_L(y)]_i\right| + 2\left|[s_L(y)]_i\right|\right)\left|[s_L(x)]_i - [s_L(y)]_i\right| \ \le\ 4\big(\|x - y\| + |[s_L(y)]_i|\big)\|x - y\|.$$

(2) It can be shown that
$$\|\Pi_B(x) - x + y - \Pi_B(y)\| \ \le\ \|x - y\|.$$
Using this inequality and (18), we then have
$$\left|[\Pi_B(s_L(x)) - s_L(x)]_i^2 - [\Pi_B(s_L(y)) - s_L(y)]_i^2\right| \ \le\ \left(\|s_L(x) - s_L(y)\| + 2\left|[\Pi_B(s_L(y)) - s_L(y)]_i\right|\right)\|s_L(x) - s_L(y)\| \ \le\ 4\big(\|x - y\| + |[\Pi_B(s_L(y)) - s_L(y)]_i|\big)\|x - y\|.$$

The following lemma shows that for the sequence $\{x^k\}$, the magnitude of any nonzero component $x^k_i$ cannot be too small for $k \ge 1$.

Lemma 3.2 Let $\{x^k\}$ be generated by the above IHT method. Then, for all $k \ge 0$,
$$|x^{k+1}_j| \ \ge\ \delta := \min_{i \notin I_0} \delta_i \ >\ 0, \quad \text{if } x^{k+1}_j \ne 0, \tag{19}$$
where $I_0 = \{i : l_i = u_i = 0\}$ and
$$\delta_i = \begin{cases} \min\big(u_i, \sqrt{2\lambda/L}\big), & \text{if } l_i = 0, \\ \min\big(-l_i, \sqrt{2\lambda/L}\big), & \text{if } u_i = 0, \\ \min\big(-l_i, u_i, \sqrt{2\lambda/L}\big), & \text{otherwise}, \end{cases} \quad \forall i \notin I_0. \tag{20}$$

Proof. One can observe from (14) that for $i = 1, \ldots, n$,
$$x^{k+1}_i = \begin{cases} [\Pi_B(s_L(x^k))]_i, & \text{if } [s_L(x^k)]_i^2 - [\Pi_B(s_L(x^k)) - s_L(x^k)]_i^2 > \frac{2\lambda}{L}, \\ 0, & \text{if } [s_L(x^k)]_i^2 - [\Pi_B(s_L(x^k)) - s_L(x^k)]_i^2 < \frac{2\lambda}{L}, \\ [\Pi_B(s_L(x^k))]_i \text{ or } 0, & \text{otherwise} \end{cases} \tag{21}$$
(see, for example, [10]). Suppose that $j$ is an index such that $x^{k+1}_j \ne 0$. Clearly, $j \notin I_0$, where $I_0$ is defined above. It follows from (21) that
$$x^{k+1}_j = [\Pi_B(s_L(x^k))]_j \ne 0, \qquad [s_L(x^k)]_j^2 - [\Pi_B(s_L(x^k)) - s_L(x^k)]_j^2 \ \ge\ \frac{2\lambda}{L}. \tag{22}$$
The second relation of (22) implies that $|[s_L(x^k)]_j| \ge \sqrt{2\lambda/L}$. In addition, by the first relation of (22) and the definition of $\Pi_B$, we have
$$x^{k+1}_j = [\Pi_B(s_L(x^k))]_j = \min\big(\max([s_L(x^k)]_j, l_j),\, u_j\big) \ \ne\ 0. \tag{23}$$
Recall that $j \notin I_0$. We next show that $|x^{k+1}_j| \ge \delta_j$ by considering three separate cases: i) $l_j = 0$; ii) $u_j = 0$; and iii) $l_j u_j \ne 0$. For case i), it follows from (23) that $x^{k+1}_j = \min([s_L(x^k)]_j, u_j) > 0$. This together with the relation $|[s_L(x^k)]_j| \ge \sqrt{2\lambda/L}$ and the definition of $\delta_j$ implies that $x^{k+1}_j \ge \delta_j$. By similar arguments, one can show that $|x^{k+1}_j| \ge \delta_j$ also holds for the other two cases. Then it is easy to see that the conclusion of this lemma holds.

We next establish that the sequence $\{x^k\}$ converges to a local minimizer of (12), and moreover, $F(x^k)$ converges to a local minimum value of (12).

Theorem 3.3 Let $\{x^k\}$ be generated by the above IHT method. Then, $x^k$ converges to a local minimizer $x^*$ of problem (12) and moreover, $I(x^k) \to I(x^*)$, $\|x^k\|_0 \to \|x^*\|_0$ and $F(x^k) \to F(x^*)$.

Proof. Since $\nabla f$ is Lipschitz continuous with constant $L_f$, we have
$$f(x^{k+1}) \ \le\ f(x^k) + \nabla f(x^k)^T(x^{k+1} - x^k) + \frac{L_f}{2}\|x^{k+1} - x^k\|^2.$$
Using this inequality, the fact that $L > L_f$, and (14), we obtain that
$$F(x^{k+1}) = f(x^{k+1}) + \lambda\|x^{k+1}\|_0 \ \le\ \underbrace{f(x^k) + \nabla f(x^k)^T(x^{k+1} - x^k) + \frac{L_f}{2}\|x^{k+1} - x^k\|^2 + \lambda\|x^{k+1}\|_0}_{a} \ \le\ \underbrace{f(x^k) + \nabla f(x^k)^T(x^{k+1} - x^k) + \frac{L}{2}\|x^{k+1} - x^k\|^2 + \lambda\|x^{k+1}\|_0}_{b} \ \le\ f(x^k) + \lambda\|x^k\|_0 = F(x^k),$$
where the last inequality follows from (14). The above inequality implies that $\{F(x^k)\}$ is nonincreasing and moreover,
$$F(x^k) - F(x^{k+1}) \ \ge\ b - a \ =\ \frac{L - L_f}{2}\|x^{k+1} - x^k\|^2. \tag{24}$$
By assumption, $f$ is bounded below on $B$. It then follows that $\{F(x^k)\}$ is bounded below. Hence, $\{F(x^k)\}$ converges to a finite value as $k \to \infty$, which together with (24) implies that
$$\lim_{k \to \infty} \|x^{k+1} - x^k\| = 0. \tag{25}$$
Let $I^k = I(x^k)$, where $I(\cdot)$ is defined in (17). In view of (19), we observe that
$$\|x^{k+1} - x^k\| \ \ge\ \delta \quad \text{if } I^k \ne I^{k+1}. \tag{26}$$
This together with (25) implies that $I^k$ does not change when $k$ is sufficiently large. Hence, there exist some $K \ge 0$ and $I \subseteq \{1,\ldots,n\}$ such that $I^k = I$ for all $k \ge K$. Then one can observe from (14) that
$$x^{k+1} = \operatorname*{argmin}_{x \in B_I}\left\{f(x^k) + \nabla f(x^k)^T(x - x^k) + \frac{L}{2}\|x - x^k\|^2\right\}, \quad \forall k > K,$$
where $B_I$ is defined in (15). It follows from Theorem 2.2 that $x^k \to x^*$, where
$$x^* \in \operatorname{Argmin}\{f(x) : x \in B_I\}. \tag{27}$$
It is not hard to see from (27) that $x^*$ is a local minimizer of (12). In addition, we know from (19) that $|x^k_i| \ge \delta$ for $k > K$ and $i \notin I$. It yields $|x^*_i| \ge \delta$ for $i \notin I$ and $x^*_i = 0$ for $i \in I$. Hence, $I(x^k) = I(x^*) = I$ for all $k > K$, which clearly implies that $\|x^k\|_0 = \|x^*\|_0$ for every $k > K$. By continuity of $f$, we have $f(x^k) \to f(x^*)$. It then follows that
$$F(x^k) = f(x^k) + \lambda\|x^k\|_0 \ \to\ f(x^*) + \lambda\|x^*\|_0 = F(x^*).$$

As shown in Theorem 3.3, $x^k \to x^*$ for some local minimizer $x^*$ of (12) and $F(x^k) \to F(x^*)$. Our next aim is to establish the iteration complexity of the IHT method for finding an $\epsilon$-local-optimal solution $x_\epsilon \in B$ of (12) satisfying $F(x_\epsilon) \le F(x^*) + \epsilon$ and $I(x_\epsilon) = I(x^*)$. Before proceeding, we define
$$\alpha = \min_{I \subseteq \{1,\ldots,n\}}\left\{\min_i \left| [s_L(x^*)]_i^2 - [\Pi_B(s_L(x^*)) - s_L(x^*)]_i^2 - \frac{2\lambda}{L}\right| : x^* \in \operatorname{Argmin}\{f(x) : x \in B_I\}\right\}, \tag{28}$$
$$\beta = \max_{I \subseteq \{1,\ldots,n\}}\left\{\max_i\ \big|[s_L(x^*)]_i\big| + \big|[\Pi_B(s_L(x^*)) - s_L(x^*)]_i\big| : x^* \in \operatorname{Argmin}\{f(x) : x \in B_I\}\right\}. \tag{29}$$

Theorem 3.4 Assume that $f$ is a smooth strongly convex function with modulus $\sigma > 0$. Suppose that $L > L_f$ is chosen such that $\alpha > 0$. Let $\{x^k\}$ be generated by the above IHT method, $I^k = I(x^k)$ for all $k$, $x^* = \lim_{k \to \infty} x^k$, and $F^* = F(x^*)$. Then, for any given $\epsilon > 0$, the following statements hold:

(i) The number of changes of $I^k$ is at most $\dfrac{2(F(x^0) - F^*)}{(L - L_f)\delta^2}$.

(ii) The total number of iterations by the IHT method for finding an $\epsilon$-local-optimal solution $x_\epsilon \in B$ satisfying $I(x_\epsilon) = I(x^*)$ and $F(x_\epsilon) \le F^* + \epsilon$ is at most $2\lceil L/\sigma \rceil \log\dfrac{\theta}{\epsilon}$, where
$$\theta = \big(F(x^0) - F^*\big)2^{\frac{\omega+3}{2}}, \qquad \omega = \max_t\left\{(d - 2c)t - ct^2 : 0 \le t \le \frac{2(F(x^0) - F^*)}{(L - L_f)\delta^2}\right\}, \tag{30}$$
$$c = \frac{(L - L_f)\delta^2}{2(F(x^0) - F^*)}, \qquad \gamma = \sigma\big(\sqrt{2\alpha + \beta^2} - \beta\big)^2/32, \tag{31}$$
$$d = 2\log\big(F(x^0) - F^*\big) + 4 - 2\log\gamma + c.$$
Proof. (i) As shown in Theorem 3.3, $I^k$ changes only a finite number of times. Assume that $I^k$ changes only at $k = n_1 + 1, \ldots, n_J + 1$, that is,
$$I^{n_{j-1}+1} = \cdots = I^{n_j} \ \ne\ I^{n_j+1} = \cdots = I^{n_{j+1}}, \quad j = 1, \ldots, J-1, \tag{32}$$
where $n_0 = 0$. We next bound $J$, i.e., the total number of changes of $I^k$. In view of (26) and (32), one can observe that
$$\|x^{n_j+1} - x^{n_j}\| \ \ge\ \delta, \quad j = 1, \ldots, J,$$
which together with (24) implies that
$$F(x^{n_j}) - F(x^{n_j+1}) \ \ge\ \frac{1}{2}(L - L_f)\delta^2, \quad j = 1, \ldots, J. \tag{33}$$
Summing up these inequalities and using the monotonicity of $\{F(x^k)\}$, we have
$$\frac{1}{2}(L - L_f)\delta^2 J \ \le\ F(x^{n_1}) - F(x^{n_J+1}) \ \le\ F(x^0) - F^*, \tag{34}$$
and hence
$$J \ \le\ \frac{2(F(x^0) - F^*)}{(L - L_f)\delta^2}. \tag{35}$$

(ii) Let $n_j$ be defined as above for $j = 1, \ldots, J$. We first show that
$$n_j - n_{j-1} \ \le\ 2 + 2\lceil L/\sigma \rceil\left[\log\Big(F(x^0) - (j-1)(L - L_f)\delta^2/2 - F^*\Big) - \log\gamma\right], \quad j = 1, \ldots, J, \tag{36}$$
where $F^*$ and $\gamma$ are defined in Theorem 3.4 and (31), respectively. Indeed, one can observe from (14) that
$$x^{k+1} = \operatorname*{argmin}_{x \in B}\left\{f(x^k) + \nabla f(x^k)^T(x - x^k) + \frac{L}{2}\|x - x^k\|^2 : x_{I^{k+1}} = 0\right\}.$$
Therefore, for $j = 1, \ldots, J$ and $k = n_{j-1}, \ldots, n_j - 1$,
$$x^{k+1} = \operatorname*{argmin}_{x \in B}\left\{f(x^k) + \nabla f(x^k)^T(x - x^k) + \frac{L}{2}\|x - x^k\|^2 : x_{I^{n_j}} = 0\right\}.$$
We arbitrarily choose $1 \le j \le J$. Let $\bar{x}^*$ (depending on $j$) denote the optimal solution of
$$\min_{x \in B}\{f(x) : x_{I^{n_j}} = 0\}. \tag{37}$$
Also, it follows from (33) and the monotonicity of $\{F(x^k)\}$ that
$$F(x^{n_j+1}) \ \le\ F(x^0) - \frac{j}{2}(L - L_f)\delta^2, \quad j = 1, \ldots, J. \tag{38}$$
Using these relations and the fact that $F(\bar{x}^*) \ge F^*$, we have
$$f(x^{n_{j-1}+1}) - f(\bar{x}^*) = F(x^{n_{j-1}+1}) - \lambda\|x^{n_{j-1}+1}\|_0 - F(\bar{x}^*) + \lambda\|\bar{x}^*\|_0 \ \le\ F(x^0) - \frac{j-1}{2}(L - L_f)\delta^2 - F^*. \tag{39}$$
Suppose for a contradiction that (36) does not hold for some $1 \le j \le J$. Then
$$n_j - n_{j-1} \ >\ 2 + 2\lceil L/\sigma \rceil\left[\log\Big(F(x^0) - (j-1)(L - L_f)\delta^2/2 - F^*\Big) - \log\gamma\right].$$
This inequality and (39) yield
$$n_j - n_{j-1} \ >\ 2 + 2\lceil L/\sigma \rceil \log\frac{f(x^{n_{j-1}+1}) - f(\bar{x}^*)}{\gamma}.$$
Using the strong convexity of $f$ and applying Theorem 2.3 (ii) to (37) with $\epsilon = \gamma$, we obtain that
$$\frac{\sigma}{2}\|x^{n_j} - \bar{x}^*\|^2 \ \le\ f(x^{n_j}) - f(\bar{x}^*) \ <\ \frac{\sigma}{32}\big(\sqrt{2\alpha + \beta^2} - \beta\big)^2.$$
It implies that
$$\|x^{n_j} - \bar{x}^*\| \ <\ \frac{\sqrt{2\alpha + \beta^2} - \beta}{4}. \tag{40}$$
Using (40), Lemma 3.1 and the definition of $\beta$, we have
$$\left|[s_L(x^{n_j})]_i^2 - [s_L(\bar{x}^*)]_i^2 - [\Pi_B(s_L(x^{n_j})) - s_L(x^{n_j})]_i^2 + [\Pi_B(s_L(\bar{x}^*)) - s_L(\bar{x}^*)]_i^2\right| \ \le\ 4\big(2\|x^{n_j} - \bar{x}^*\| + \beta\big)\|x^{n_j} - \bar{x}^*\| \ <\ \alpha, \tag{41}$$
where the last inequality is due to (40). Let
$$I = \left\{i : [s_L(\bar{x}^*)]_i^2 - [\Pi_B(s_L(\bar{x}^*)) - s_L(\bar{x}^*)]_i^2 < \frac{2\lambda}{L}\right\}$$
and let $\bar{I} = \{1,\ldots,n\} \setminus I$. Since $\alpha > 0$, we know that
$$[s_L(\bar{x}^*)]_i^2 - [\Pi_B(s_L(\bar{x}^*)) - s_L(\bar{x}^*)]_i^2 \ >\ \frac{2\lambda}{L}, \quad \forall i \in \bar{I}.$$
It then follows from (41) and the definition of $\alpha$ that
$$[s_L(x^{n_j})]_i^2 - [\Pi_B(s_L(x^{n_j})) - s_L(x^{n_j})]_i^2 < \frac{2\lambda}{L}, \ \forall i \in I, \qquad [s_L(x^{n_j})]_i^2 - [\Pi_B(s_L(x^{n_j})) - s_L(x^{n_j})]_i^2 > \frac{2\lambda}{L}, \ \forall i \in \bar{I}.$$
Observe that $[\Pi_B(s_L(x^{n_j}))]_i \ne 0$ for all $i \in \bar{I}$. This fact together with (21) implies that
$$x^{n_j+1}_i = 0, \ \forall i \in I \quad \text{and} \quad x^{n_j+1}_i \ne 0, \ \forall i \in \bar{I}.$$
By a similar argument, one can show that $x^{n_j}_i = 0$ for all $i \in I$ and $x^{n_j}_i \ne 0$ for all $i \in \bar{I}$. Hence, $I^{n_j} = I^{n_j+1} = I$, which contradicts (32). We thus conclude that (36) holds.

Let $N_\epsilon$ denote the total number of iterations needed by the IHT method to find an $\epsilon$-local-optimal solution $x_\epsilon \in B$ satisfying $I(x_\epsilon) = I(x^*)$ and $F(x_\epsilon) \le F^* + \epsilon$. We next establish an upper bound on $N_\epsilon$. Summing up the inequality (36) for $j = 1, \ldots, J$, we obtain that
$$n_J \ \le\ \sum_{j=1}^{J}\left\{2 + 2\lceil L/\sigma \rceil\left[\log\Big(F(x^0) - \frac{j-1}{2}(L - L_f)\delta^2 - F^*\Big) - \log\gamma\right]\right\}.$$
Using this inequality, (34), and the facts that $L \ge \sigma$ and $\log(1 - t) \le -t$ for all $t \in (0, 1)$, we have
$$n_J \ \le\ \sum_{j=1}^{J}\left[2 + 2\lceil L/\sigma \rceil\left(\log\big(F(x^0) - F^*\big) - \frac{(L - L_f)\delta^2}{2(F(x^0) - F^*)}(j-1) - \log\gamma + 1\right)\right] \ \le\ \lceil L/\sigma \rceil\Big(\underbrace{2\log\big(F(x^0) - F^*\big) + 4 - 2\log\gamma + c}_{d}\Big)J - \underbrace{\frac{(L - L_f)\delta^2}{2(F(x^0) - F^*)}}_{c}\,J^2. \tag{42}$$
By the definition of $n_J$, we observe that after $n_J + 1$ iterations, the IHT method becomes the projected gradient method applied to the problem
$$x^* = \operatorname*{argmin}_{x \in B}\{f(x) : x_{I^{n_J+1}} = 0\}.$$
In addition, we know from Theorem 3.3 that $I(x^k) = I(x^*)$ for all $k > n_J$. Hence, $f(x^k) - f(x^*) = F(x^k) - F^*$ when $k > n_J$. Using these facts and Theorem 2.3 (ii), we have
$$N_\epsilon \ \le\ n_J + 1 + 2\lceil L/\sigma \rceil \log\frac{F(x^{n_J+1}) - F^*}{\epsilon}.$$
Using this inequality, (38), (42) and the facts that $L \ge \sigma$ and $\log(1 - t) \le -t$ for all $t \in (0, 1)$, we obtain that
$$N_\epsilon \ \le\ n_J + 1 + 2\lceil L/\sigma \rceil\left[\log\Big(F(x^0) - \frac{J}{2}(L - L_f)\delta^2 - F^*\Big) + 1 - \log\epsilon\right] \ \le\ \lceil L/\sigma \rceil\left[(d - 2c)J - cJ^2 + 2\log\big(F(x^0) - F^*\big) + 3 - 2\log\epsilon\right],$$
which together with (35) and (30) implies that
$$N_\epsilon \ \le\ 2\lceil L/\sigma \rceil \log\frac{\theta}{\epsilon}.$$

The iteration complexity given in Theorem 3.4 is based on the assumption that $f$ is strongly convex on $B$. We next consider the case where $B$ is bounded and $f$ is convex but not strongly convex. We will establish the iteration complexity of finding an $\epsilon$-local-optimal solution of (12) by the IHT method applied to a perturbation of (12), obtained by adding a small strongly convex regularization term to $f$. Consider a perturbation of (12) in the form of
$$F_\nu^* := \min_{x \in B}\{F_\nu(x) := f_\nu(x) + \lambda\|x\|_0\}, \tag{43}$$
where $\nu > 0$ and
$$f_\nu(x) := f(x) + \frac{\nu}{2}\|x\|^2.$$
One can easily see that $f_\nu$ is strongly convex on $B$ with modulus $\nu$ and moreover $\nabla f_\nu$ is Lipschitz continuous with constant $L_\nu$, where
$$L_\nu = L_f + \nu. \tag{44}$$
We next establish the iteration complexity of finding an $\epsilon$-local-optimal solution of (12) by the IHT method applied to (43). Given any $L > 0$, let $s_L$, $\alpha$ and $\beta$ be defined according to (16), (28) and (29), respectively, with $f$ replaced by $f_\nu$, and let $\delta$ be defined in (19).

Theorem 3.5 Suppose that $B$ is bounded and $f$ is convex but not strongly convex. Let $\epsilon > 0$ be arbitrarily given, $D = \max\{\|x\| : x \in B\}$, $\nu = \epsilon/D^2$, and let $L > L_\nu$ be chosen such that $\alpha > 0$. Let $\{x^k\}$ be generated by the IHT method applied to (43), and let $x^* = \lim_{k \to \infty} x^k$, $F_\nu^* = F_\nu(x^*)$ and $F^* = \min\{F(x) : x \in B_{I^*}\}$, where $I^* = \{i : x^*_i = 0\}$. Then the total number of iterations by the IHT method for finding an $\epsilon$-local-optimal solution $x_\epsilon \in B$ satisfying $F(x_\epsilon) \le F^* + \epsilon$ is at most $2\left\lceil \frac{D^2 L}{\epsilon}\right\rceil \log\frac{2\theta}{\epsilon}$, where
$$\theta = \big(F_\nu(x^0) - F_\nu^*\big)2^{\frac{\omega+3}{2}}, \qquad \omega = \max_t\left\{(d - 2c)t - ct^2 : 0 \le t \le \frac{2(F_\nu(x^0) - F_\nu^*)}{(L - L_\nu)\delta^2}\right\},$$
$$c = \frac{(L - L_\nu)\delta^2}{2(F_\nu(x^0) - F_\nu^*)}, \qquad \gamma = \nu\big(\sqrt{2\alpha + \beta^2} - \beta\big)^2/32, \qquad d = 2\log\big(F_\nu(x^0) - F_\nu^*\big) + 4 - 2\log\gamma + c.$$

Proof. By Theorem 3.4 (ii), we see that the IHT method applied to (43) finds an $\epsilon/2$-local-optimal solution $x_\epsilon \in B$ of (43) satisfying $I(x_\epsilon) = I(x^*)$ and $F_\nu(x_\epsilon) \le F_\nu^* + \epsilon/2$ within $2\lceil L/\nu \rceil \log\frac{2\theta}{\epsilon}$ iterations. From the proof of Theorem 3.3, we observe that
$$f_\nu(x^*) = \min\{f_\nu(x) : x \in B_{I^*}\}.$$
Hence, we have
$$F_\nu^* = F_\nu(x^*) \ \le\ \min_{x \in B_{I^*}} F(x) + \frac{\nu D^2}{2} \ \le\ F^* + \frac{\epsilon}{2}.$$
In addition, we observe that $F(x_\epsilon) \le F_\nu(x_\epsilon)$. Hence, it follows that
$$F(x_\epsilon) \ \le\ F_\nu(x_\epsilon) \ \le\ F_\nu^* + \frac{\epsilon}{2} \ \le\ F^* + \epsilon.$$
Note that $F^*$ is a local optimal value of (12). Hence, $x_\epsilon$ is an $\epsilon$-local-optimal solution of (12). The conclusion of this theorem then follows from (44) and $\nu = \epsilon/D^2$.

For the above IHT method, a fixed $L$ is used throughout all iterations, which may be too conservative. To improve its practical performance, we can use a local $L$ that is updated dynamically. The resulting variant of the method is presented as follows.

A variant of the IHT method for (12):

Let $0 < L_{\min} < L_{\max}$, $\tau > 1$ and $\eta > 0$ be given. Choose an arbitrary $x^0 \in B$ and set $k = 0$.

1) Choose $L^0_k \in [L_{\min}, L_{\max}]$ arbitrarily. Set $L_k = L^0_k$.

1a) Solve the subproblem
$$x^{k+1} \in \operatorname*{Argmin}_{x \in B}\left\{f(x^k) + \nabla f(x^k)^T(x - x^k) + \frac{L_k}{2}\|x - x^k\|^2 + \lambda\|x\|_0\right\}. \tag{45}$$

1b) If
$$F(x^k) - F(x^{k+1}) \ \ge\ \frac{\eta}{2}\|x^{k+1} - x^k\|^2 \tag{46}$$
is satisfied, then go to step 2).

1c) Set $L_k \leftarrow \tau L_k$ and go to step 1a).

2) Set $k \leftarrow k + 1$ and go to step 1).

end

Remark. $L^0_k$ can be chosen by a scheme similar to that used in [1, 4], that is,
$$L^0_k = \max\left\{L_{\min},\ \min\left\{L_{\max},\ \frac{\Delta f^T \Delta x}{\|\Delta x\|^2}\right\}\right\},$$
where $\Delta x = x^k - x^{k-1}$ and $\Delta f = \nabla f(x^k) - \nabla f(x^{k-1})$.

At each iteration, the IHT method solves a single subproblem in step 1). In contrast, its variant needs to solve a sequence of subproblems. We next show that for each outer iteration, the number of inner iterations is finite.
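One outer iteration of this variant can be sketched as follows (an illustrative implementation with our own names and toy data, not the author's code): try the current estimate $L_k$, and inflate it by $\tau$ until the descent test (46) accepts the step.

```python
import numpy as np

def iht_variant_step(x, f, grad_f, lam, l, u, L0, tau=2.0, eta=1e-3):
    """One outer iteration of the variant IHT method for (12).

    Starting from L = L0, repeat the hard-thresholding step (45) and
    increase L by the factor tau until the descent test (46) holds.
    Termination is guaranteed once L >= L_f + eta (cf. Theorem 3.6).
    """
    def F(z):
        return f(z) + lam * np.count_nonzero(z)

    L = L0
    while True:
        s = x - grad_f(x) / L             # gradient step s_L(x)
        p = np.clip(s, l, u)              # projection onto the box B
        x_next = np.where(s**2 - (p - s)**2 > 2.0 * lam / L, p, 0.0)
        # Descent test (46): accept once the decrease matches the step.
        if F(x) - F(x_next) >= 0.5 * eta * np.dot(x_next - x, x_next - x):
            return x_next, L
        L *= tau                          # step 1c): inflate L and retry

# Example: f(x) = 0.5*||x - c||^2 (so L_f = 1), box B = [-1, 1]^2.
c = np.array([2.0, 0.05])
f = lambda x: 0.5 * np.sum((x - c)**2)
grad_f = lambda x: x - c
x1, L_used = iht_variant_step(np.zeros(2), f, grad_f, lam=0.1, l=-1.0, u=1.0, L0=1.0)
```

In this run the initial $L_0 = 1$ already passes the test, so no inflation is needed; the large coordinate is kept (clipped to the box) and the small one is thresholded to zero.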
Theorem 3.6 For each $k \ge 0$, the inner termination criterion (46) is satisfied after at most
$$\left\lceil \frac{\log(L_f + \eta) - \log(L_{\min})}{\log\tau} \right\rceil + 2$$
inner iterations.

Proof. Let $\bar{L}_k$ denote the final value of $L_k$ at the $k$th outer iteration. By (45) and arguments similar to those used to derive (24), one can show that
$$F(x^k) - F(x^{k+1}) \ \ge\ \frac{\bar{L}_k - L_f}{2}\|x^{k+1} - x^k\|^2.$$
Hence, (46) holds whenever $\bar{L}_k \ge L_f + \eta$, which together with the definition of $\bar{L}_k$ implies that $\bar{L}_k/\tau < L_f + \eta$, that is, $\bar{L}_k < \tau(L_f + \eta)$. Let $n_k$ denote the number of inner iterations at the $k$th outer iteration. Then we have
$$L_{\min}\tau^{n_k - 1} \ \le\ L^0_k \tau^{n_k - 1} = \bar{L}_k \ <\ \tau(L_f + \eta).$$
Hence, $n_k \le \left\lceil \frac{\log(L_f + \eta) - \log(L_{\min})}{\log\tau} \right\rceil + 2$ and the conclusion holds.

We next establish that the sequence $\{x^k\}$ generated by the above variant of the IHT method converges to a local minimizer of (12) and moreover, $F(x^k)$ converges to a local minimum value of (12).

Theorem 3.7 Let $\{x^k\}$ be generated by the above variant of the IHT method. Then, $x^k$ converges to a local minimizer $x^*$ of problem (12), and moreover, $I(x^k) \to I(x^*)$, $\|x^k\|_0 \to \|x^*\|_0$ and $F(x^k) \to F(x^*)$.

Proof. Let $\bar{L}_k$ denote the final value of $L_k$ at the $k$th outer iteration. From the proof of Theorem 3.6, we know that $\bar{L}_k \in [L_{\min}, \tau(L_f + \eta))$. Using this fact and an argument similar to that used to prove (19), one can obtain that
$$|x^{k+1}_j| \ \ge\ \delta := \min_{i \notin I_0}\delta_i \ >\ 0, \quad \text{if } x^{k+1}_j \ne 0,$$
where $I_0 = \{i : l_i = u_i = 0\}$ and $\delta_i$ is defined according to (20) with $L$ replaced by $\tau(L_f + \eta)$ for all $i \notin I_0$. It implies that $\|x^{k+1} - x^k\| \ge \delta$ if $I(x^k) \ne I(x^{k+1})$. The conclusion then follows from this inequality and arguments similar to those used in the proof of Theorem 3.3.

4 $\ell_0$ regularized convex cone programming

In this section we consider the $\ell_0$ regularized convex cone programming problem (2) and propose IHT methods for solving it. In particular, we apply the IHT method proposed in Section
3 to a quadratic penalty relaxation of (2) and establish the iteration complexity for finding an $\epsilon$-approximate local minimizer of (2). We also propose a variant of the method in which the associated penalty parameter is dynamically updated, and show that every accumulation point is a local minimizer of (2).

Let $B$ be defined in (13). We assume that $f$ is a smooth convex function on $B$, $\nabla f$ is Lipschitz continuous with constant $L_f$, and $f$ is bounded below on $B$. In addition, we make the following assumption throughout this section.

Assumption 1 For each $I \subseteq \{1,\ldots,n\}$, there exists a Lagrange multiplier for
$$f_I^* = \min\{f(x) : Ax - b \in \mathcal{K}^*,\ x \in B_I\}, \tag{47}$$
provided that (47) is feasible; that is, there exists $\mu^* \in -\mathcal{K}$ such that $f_I^* = d_I(\mu^*)$, where
$$d_I(\mu) := \inf\{f(x) + \mu^T(Ax - b) : x \in B_I\}, \quad \forall \mu \in -\mathcal{K}.$$

Let $x^*$ be a point in $B$, and let $I^* = \{i : x^*_i = 0\}$. One can observe that $x^*$ is a local minimizer of (2) if and only if $x^*$ is a minimizer of (47) with $I = I^*$. Then, in view of Assumption 1, we see that $x^*$ is a local minimizer of (2) if and only if $x^* \in B$ and there exists $\mu^* \in -\mathcal{K}$ such that
$$Ax^* - b \in \mathcal{K}^*, \qquad (\mu^*)^T(Ax^* - b) = 0, \qquad \nabla f(x^*) + A^T\mu^* \in -N_{B_{I^*}}(x^*). \tag{48}$$
Based on the above observation, we can define an approximate local minimizer of (2) as one that nearly satisfies (48).

Definition 1 Let $x^*$ be a point in $B$, and let $I^* = \{i : x^*_i = 0\}$. $x^*$ is an $\epsilon$-approximate local minimizer of (2) if there exists $\mu^* \in -\mathcal{K}$ such that
$$d_{\mathcal{K}^*}(Ax^* - b) \le \epsilon, \qquad (\mu^*)^T\Pi_{\mathcal{K}^*}(Ax^* - b) = 0, \qquad \nabla f(x^*) + A^T\mu^* \in -N_{B_{I^*}}(x^*) + U(\epsilon).$$

In what follows, we propose an IHT method for finding an approximate local minimizer of (2). In particular, we apply the IHT method or its variant to a quadratic penalty relaxation of (2), which is in the form of
$$\Psi_\rho^* := \min_{x \in B}\{\Psi_\rho(x) := \Phi_\rho(x) + \lambda\|x\|_0\}, \tag{49}$$
where
$$\Phi_\rho(x) := f(x) + \frac{\rho}{2}\big[d_{\mathcal{K}^*}(Ax - b)\big]^2. \tag{50}$$
It is not hard to show that the function $\Phi_\rho$ is convex and differentiable and moreover $\nabla\Phi_\rho$ is Lipschitz continuous with constant
$$L_\rho = L_f + \rho\|A\|^2 \tag{51}$$
(see, for example, Proposition 8 and Corollary 9 of [9]). Therefore, problem (49) can be suitably solved by the IHT method or its variant proposed in Section 3.

Under the assumption that $f$ is strongly convex on $B$, we next establish the iteration complexity of the IHT method applied to (49) for finding an approximate local minimizer of (2). Given any $L > 0$, let $s_L$, $\alpha$ and $\beta$ be defined according to (16), (28) and (29), respectively, with $f$ replaced by $\Phi_\rho$, and let $\delta$ be defined in (19).

Theorem 4.1 Assume that $f$ is a smooth strongly convex function with modulus $\sigma > 0$. Given any $\epsilon > 0$, let
$$\rho = \frac{t}{\epsilon} + \frac{1}{\sqrt{8}\,\|A\|} \tag{52}$$
for any $t \ge \max_{I \subseteq \{1,\ldots,n\}} \min_{\mu \in \Lambda_I} \|\mu\|$, where $\Lambda_I$ is the set of Lagrange multipliers of (47). Let $L > L_\rho$ be chosen such that $\alpha > 0$. Let $\{x^k\}$ be generated by the IHT method applied to (49), and let $x^* = \lim_{k \to \infty} x^k$ and $\Psi_\rho^* = \Psi_\rho(x^*)$. Then the IHT method finds an $\epsilon$-approximate local minimizer of (2) in at most
$$N := 2\left\lceil \frac{L_\rho}{\sigma}\right\rceil \log\frac{8L_\rho\theta}{\epsilon^2}$$
iterations, where
$$\theta = \big(\Psi_\rho(x^0) - \Psi_\rho^*\big)2^{\frac{\omega+3}{2}}, \qquad \omega = \max_t\left\{(d - 2c)t - ct^2 : 0 \le t \le \frac{2(\Psi_\rho(x^0) - \Psi_\rho^*)}{(L - L_\rho)\delta^2}\right\},$$
$$c = \frac{(L - L_\rho)\delta^2}{2(\Psi_\rho(x^0) - \Psi_\rho^*)}, \qquad \gamma = \sigma\big(\sqrt{2\alpha + \beta^2} - \beta\big)^2/32, \qquad d = 2\log\big(\Psi_\rho(x^0) - \Psi_\rho^*\big) + 4 - 2\log\gamma + c.$$

Proof. We know from Theorem 3.3 that $x^k \to x^*$ for some local minimizer $x^*$ of (49), $I(x^k) \to I(x^*)$ and $\Psi_\rho(x^k) \to \Psi_\rho(x^*) = \Psi_\rho^*$. By Theorem 3.4, after at most $N$ iterations, the IHT method generates $\bar{x} \in B$ such that $I(\bar{x}) = I(x^*)$ and $\Psi_\rho(\bar{x}) - \Psi_\rho(x^*) \le \xi := \epsilon^2/(8L_\rho)$. It then follows that $\Phi_\rho(\bar{x}) - \Phi_\rho(x^*) \le \xi$. Since $x^*$ is a local minimizer of (49), we observe that
$$x^* = \operatorname*{arg\,min}_{x \in B_{I^*}} \Phi_\rho(x), \tag{53}$$
where $I^* = I(x^*)$. Hence, $\bar{x}$ is a $\xi$-approximate solution of (53). Let $\mu^* \in \operatorname{Argmin}\{\|\mu\| : \mu \in \Lambda_{I^*}\}$, where $\Lambda_{I^*}$ is the set of Lagrange multipliers of (47) with $I = I^*$.
In view of Lemma 2.5, we see that the pair $(\bar{x}^+, \mu)$ defined as
$$\bar{x}^+ := \Pi_{B_{I^*}}\big(\bar{x} - \nabla\Phi_\rho(\bar{x})/L_\rho\big), \qquad \mu := \rho\big[A\bar{x}^+ - b - \Pi_{\mathcal{K}^*}(A\bar{x}^+ - b)\big]$$
satisfies
$$\nabla f(\bar{x}^+) + A^T\mu \ \in\ -N_{B_{I^*}}(\bar{x}^+) + U\big(2\sqrt{2L_\rho\xi}\big) \ =\ -N_{B_{I^*}}(\bar{x}^+) + U(\epsilon),$$
$$d_{\mathcal{K}^*}(A\bar{x}^+ - b) \ \le\ \frac{1}{\rho}\|\mu^*\| + \sqrt{\frac{\xi}{\rho}} \ \le\ \frac{1}{\rho}\left(\|\mu^*\| + \frac{\epsilon}{\sqrt{8}\,\|A\|}\right) \ \le\ \epsilon,$$
where the last inequality is due to (52) and the assumption $t \ge \|\mu^*\|$. Hence, $\bar{x}^+$ is an $\epsilon$-approximate local minimizer of (2).
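The penalty function $\Phi_\rho$ of (50) is easy to evaluate and differentiate once the projection onto $\mathcal{K}^*$ is available: with $y = Ax - b$ and residual $r = y - \Pi_{\mathcal{K}^*}(y)$, one has $\frac{1}{2}\nabla[d_{\mathcal{K}^*}(Ax-b)]^2 = A^T r$. The sketch below (our own illustrative data, not the author's code) takes $\mathcal{K} = \mathcal{K}^* = \mathbb{R}^m_+$, for which the projection is componentwise clipping.

```python
import numpy as np

# Illustration only: take K = K* = R^m_+, so Pi_{K*}(y) = max(y, 0) and
# d_{K*}(y) = ||y - Pi_{K*}(y)||. A, b, rho are made-up data.
A = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([0.2, -0.3])
rho = 10.0

def proj_cone(y):
    return np.maximum(y, 0.0)   # projection onto K* = R^m_+

def phi_rho(x, grad_f, f):
    """Quadratic penalty Phi_rho(x) = f(x) + (rho/2)*d_{K*}(Ax - b)^2
    and its gradient, grad f(x) + rho * A^T (y - Pi_{K*}(y))."""
    y = A @ x - b
    r = y - proj_cone(y)        # residual realizing the distance d_{K*}(y)
    val = f(x) + 0.5 * rho * np.dot(r, r)
    grad = grad_f(x) + rho * (A.T @ r)
    return val, grad

# Evaluate at the origin with f(x) = 0.5*||x||^2.
f = lambda x: 0.5 * np.dot(x, x)
grad_f = lambda x: x
v, g = phi_rho(np.zeros(2), grad_f, f)
```

Only the negative part of $Ax - b$ is penalized, which is exactly the behavior the distance-to-cone term encodes for this choice of $\mathcal{K}^*$.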
We next consider finding an $\epsilon$-approximate local minimizer of (2) for the case where $B$ is bounded and $f$ is convex but not strongly convex. In particular, we apply the IHT method or its variant to a quadratic penalty relaxation of a perturbation of (2), obtained by adding a small strongly convex regularization term to $f$. Consider a perturbation of (2) in the form of
$$\min_{x \in B}\left\{f(x) + \frac{\nu}{2}\|x\|^2 + \lambda\|x\|_0 : Ax - b \in \mathcal{K}^*\right\}. \tag{54}$$
The associated quadratic penalty problem for (54) is given by
$$\Psi_{\rho,\nu}^* := \min_{x \in B}\{\Psi_{\rho,\nu}(x) := \Phi_{\rho,\nu}(x) + \lambda\|x\|_0\}, \tag{55}$$
where
$$\Phi_{\rho,\nu}(x) := f(x) + \frac{\nu}{2}\|x\|^2 + \frac{\rho}{2}\big[d_{\mathcal{K}^*}(Ax - b)\big]^2.$$
One can easily see that $\Phi_{\rho,\nu}$ is strongly convex on $B$ with modulus $\nu$ and moreover $\nabla\Phi_{\rho,\nu}$ is Lipschitz continuous with constant $L_{\rho,\nu} := L_f + \rho\|A\|^2 + \nu$. Clearly, the IHT method or its variant can be suitably applied to (55). We next establish the iteration complexity of the IHT method applied to (55) for finding an approximate local minimizer of (2). Given any $L > 0$, let $s_L$, $\alpha$ and $\beta$ be defined according to (16), (28) and (29), respectively, with $f$ replaced by $\Phi_{\rho,\nu}$, and let $\delta$ be defined in (19).

Theorem 4.2 Suppose that $B$ is bounded and $f$ is convex but not strongly convex. Let $\epsilon > 0$ be arbitrarily given, $D = \max\{\|x\| : x \in B\}$,
$$\rho = \frac{\left(\sqrt{D} + \sqrt{D + 16t + \frac{2\sqrt{2}\,\epsilon}{\|A\|}}\right)^2}{16\,\epsilon}, \qquad \nu = \frac{\epsilon}{2D} \tag{56}$$
for any $t \ge \max_{I \subseteq \{1,\ldots,n\}}\min_{\mu \in \Lambda_I}\|\mu\|$, where $\Lambda_I$ is the set of Lagrange multipliers of (47). Let $L > L_{\rho,\nu}$ be chosen such that $\alpha > 0$. Let $\{x^k\}$ be generated by the IHT method applied to (55), and let $x^* = \lim_{k \to \infty} x^k$ and $\Psi_{\rho,\nu}^* = \Psi_{\rho,\nu}(x^*)$. Then the IHT method finds an $\epsilon$-approximate local minimizer of (2) in at most
$$N := 2\left\lceil \frac{2DL_{\rho,\nu}}{\epsilon}\right\rceil \log\frac{32L_{\rho,\nu}\theta}{\epsilon^2}$$
iterations, where
$$\theta = \big(\Psi_{\rho,\nu}(x^0) - \Psi_{\rho,\nu}^*\big)2^{\frac{\omega+3}{2}}, \qquad \omega = \max_t\left\{(d - 2c)t - ct^2 : 0 \le t \le \frac{2(\Psi_{\rho,\nu}(x^0) - \Psi_{\rho,\nu}^*)}{(L - L_{\rho,\nu})\delta^2}\right\},$$
$$c = \frac{(L - L_{\rho,\nu})\delta^2}{2(\Psi_{\rho,\nu}(x^0) - \Psi_{\rho,\nu}^*)}, \qquad \gamma = \nu\big(\sqrt{2\alpha + \beta^2} - \beta\big)^2/32, \qquad d = 2\log\big(\Psi_{\rho,\nu}(x^0) - \Psi_{\rho,\nu}^*\big) + 4 - 2\log\gamma + c.$$
Proof. From Theorem 3.3, we know that $x^k \to x^*$ for some local minimizer $x^*$ of (55), $I(x^k) \to I(x^*)$ and $\Psi_{\rho,\nu}(x^k) \to \Psi_{\rho,\nu}(x^*) = \Psi^*_{\rho,\nu}$. By Theorem 3.4, after at most $N$ iterations, the IHT method applied to (55) generates $\tilde x \in B$ such that $I(\tilde x) = I(x^*)$ and $\Psi_{\rho,\nu}(\tilde x)-\Psi_{\rho,\nu}(x^*) \le \xi := \epsilon^2/(32L_{\rho,\nu})$. It then follows that $\Phi_{\rho,\nu}(\tilde x)-\Phi_{\rho,\nu}(x^*) \le \xi$. Since $x^*$ is a local minimizer of (55), we see that
$$x^* = \arg\min_{x\in B_{I^*}} \Phi_{\rho,\nu}(x), \tag{57}$$
where $I^* = I(x^*)$. Hence, $\tilde x$ is a $\xi$-approximate solution of (57). In view of Lemma 2.5, we see that the pair $(\tilde x^+,\mu)$ defined as $\tilde x^+ := \Pi_{B_{I^*}}(\tilde x-\nabla\Phi_{\rho,\nu}(\tilde x)/L_{\rho,\nu})$ and $\mu := \rho[A\tilde x^+-b-\Pi_K(A\tilde x^+-b)]$ satisfies
$$\nabla f(\tilde x^+)+\nu\tilde x^++A^T\mu \in N_{B_{I^*}}(\tilde x^+)+\mathcal{U}(2\sqrt{2L_{\rho,\nu}\xi}) = N_{B_{I^*}}(\tilde x^+)+\mathcal{U}(\epsilon/2),$$
which together with the fact that $\|\nu\tilde x^+\| \le \nu D \le \epsilon/2$ implies that
$$\nabla f(\tilde x^+)+A^T\mu \in -\nu\tilde x^++N_{B_{I^*}}(\tilde x^+)+\mathcal{U}(\epsilon/2) \subseteq N_{B_{I^*}}(\tilde x^+)+\mathcal{U}(\epsilon).$$
In addition, it follows from Lemma 2.1(c) that $\Phi_{\rho,\nu}(\tilde x^+) \le \Phi_{\rho,\nu}(\tilde x)$, and hence $\Phi_{\rho,\nu}(\tilde x^+)-\Phi_{\rho,\nu}(x^*) \le \Phi_{\rho,\nu}(\tilde x)-\Phi_{\rho,\nu}(x^*) \le \xi$. Let $\Phi^*_\rho = \min\{\Phi_\rho(x) : x\in B_{I^*}\}$, where $\Phi_\rho$ is defined in (50). Notice that $\Phi_{\rho,\nu}(x^*) \le \Phi^*_\rho+\nu D^2/2$. It then follows that
$$\Phi_\rho(\tilde x^+)-\Phi^*_\rho \le \Phi_{\rho,\nu}(\tilde x^+)-\Phi_{\rho,\nu}(x^*)+\frac{\nu D^2}{2} \le \xi+\frac{\epsilon D}{4} \le \frac{\epsilon^2}{32\rho\|A\|^2}+\frac{\epsilon D}{4}.$$
Let $\mu^* \in \operatorname{Arg\,min}\{\|\mu\| : \mu\in\Lambda_I\}$, where $\Lambda_I$ is the set of Lagrange multipliers of (47) with $I = I^*$. In view of Lemma 2.5 and the assumption $t \ge \hat t \ge \|\mu^*\|$, we obtain that
$$d_K(A\tilde x^+-b) \le \frac{1}{\rho}\|\mu^*\|+\frac{\epsilon}{32\rho\|A\|}+\frac{\epsilon D}{4\rho} \le \frac{1}{\rho}\left(t+\frac{\epsilon}{32\|A\|}\right)+\frac{\epsilon D}{4\rho} = \epsilon,$$
where the last equality is due to (56). Hence, $\tilde x^+$ is an $\epsilon$-approximate local minimizer of (2).

For the above method, a fixed penalty parameter $\rho$ is used throughout all iterations, which may be too conservative. To improve its practical performance, we can update $\rho$ dynamically. The resulting variant of the method is presented as follows. Before proceeding, we define the projected gradient of $\Phi_\rho$ at $x\in B_I$ with respect to $B_I$ as
$$g(x;\rho,I) = L_\rho\left[x-\Pi_{B_I}\left(x-\frac{1}{L_\rho}\nabla\Phi_\rho(x)\right)\right], \tag{58}$$
where $I\subseteq\{1,\dots,n\}$, and $\Phi_\rho$ and $L_\rho$ are defined in (50) and (51), respectively.
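The penalty gradient and the projected gradient in (58) can be computed directly once the cone is fixed. The sketch below is an illustration only, specialized to $K = \mathbb{R}^m_+$ (so $y - \Pi_K(y)$ is the negative part of $y$) and assuming $0 \in B$ so that $\Pi_{B_I}$ reduces to zeroing the coordinates outside $I$ and clipping the rest; the helper names `penalty_grad` and `projected_gradient` are ours.

```python
import numpy as np

def penalty_grad(x, f_grad, A, b, rho):
    """Gradient of Phi_rho(x) = f(x) + (rho/2) d_K(Ax - b)^2 for K = R^m_+:
    with r = Ax - b, the penalty term contributes rho * A^T (r - max(r, 0))."""
    r = A @ x - b
    return f_grad(x) + rho * A.T @ (r - np.maximum(r, 0.0))

def projected_gradient(x, f_grad, A, b, rho, L_rho, I, lower, upper):
    """Projected gradient g(x; rho, I) of (58): fixed-point residual of one
    projected-gradient step on B_I = {x in B : x_i = 0 for i not in I}.
    Assumes 0 lies in the box [lower, upper] so Pi_{B_I} is zero-then-clip."""
    g = penalty_grad(x, f_grad, A, b, rho)
    step = x - g / L_rho
    step[~I] = 0.0                              # coordinates outside I are fixed at 0
    return L_rho * (x - np.clip(step, lower, upper))
```

The norm of this residual is exactly the quantity tested against $\min\{1, L_{\rho_k}\}\epsilon_k$ in the stopping criterion of the variant below.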
A variant of the IHT method for (2):
Let $\{\epsilon_k\}$ be a positive decreasing sequence. Let $\rho_0 > 0$, $\tau > 1$, and $t > 2\max_{I\subseteq\{1,\dots,n\}}\min_{\mu\in\Lambda_I}\|\mu\|$, where $\Lambda_I$ is the set of Lagrange multipliers of (47). Choose an arbitrary $x^0\in B$. Set $k = 0$.
1) Start from $x^{k-1}$ and apply the IHT method or its variant to problem (49) with $\rho = \rho_k$ until finding some $x^k\in B$ such that
$$d_K(Ax^k-b) \le \frac{t}{\rho_k}, \qquad \|g(x^k;\rho_k,I_k)\| \le \min\{1,L_{\rho_k}\}\epsilon_k, \tag{59}$$
where $I_k = I(x^k)$.
2) Set $\rho_{k+1} := \tau\rho_k$.
3) Set $k \leftarrow k+1$ and go to step 1).
end

The following theorem shows that $x^k$ satisfying (59) can be found within a finite number of iterations by the IHT method or its variant applied to problem (49) with $\rho = \rho_k$. Without loss of generality, we consider the IHT method or its variant applied to problem (49) with any given $\rho > 0$.

Theorem 4.3 Let $x^0\in B$ be an arbitrary point and the sequence $\{x^l\}$ be generated by the IHT method or its variant applied to problem (49). Then the following statements hold:
(i) $\lim_{l\to\infty}\|g(x^l;\rho,I^l)\| = 0$, where $I^l = I(x^l)$ for all $l$.
(ii) $\lim_{l\to\infty} d_K(Ax^l-b) \le 2\hat t/\rho$, where $\hat t := \max_{I\subseteq\{1,\dots,n\}}\min_{\mu\in\Lambda_I}\|\mu\|$ and $\Lambda_I$ is the set of Lagrange multipliers of (47).

Proof. (i) It follows from Theorems 3.3 and 3.7 that $x^l \to x^*$ for some local minimizer $x^*$ of (49), and moreover $\Phi_\rho(x^l) \to \Phi_\rho(x^*)$ and $I^l \to I^*$, where $I^l = I(x^l)$ and $I^* = I(x^*)$. We also know that
$$x^* \in \operatorname{Arg\,min}_{x\in B_{I^*}} \Phi_\rho(x).$$
It then follows from Lemma 2.1(d) that
$$\Phi_\rho(x^l)-\Phi_\rho(x^*) \ge \frac{1}{2L_\rho}\|g(x^l;\rho,I^*)\|^2$$
for all sufficiently large $l$. Using this inequality and $\Phi_\rho(x^l) \to \Phi_\rho(x^*)$, we thus have $g(x^l;\rho,I^*) \to 0$. Since $I^l = I^*$ for all sufficiently large $l$, we also have $g(x^l;\rho,I^l) \to 0$.
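The outer loop of this variant can be sketched as follows. This is a schematic under our own assumptions, not the paper's implementation: `inner_iht`, `dK` and `pg_norm` are hypothetical caller-supplied callables (the inner IHT solver, the cone distance $d_K(Ax-b)$, and $\|g(\cdot;\rho,I)\|$), and the defaults for $\rho_0$, $\tau$, $t$ and $\{\epsilon_k\}$ are placeholders.

```python
def variant_iht(inner_iht, dK, pg_norm, x0, rho0=1.0, tau=10.0, t=1.0,
                eps=(1e-1, 1e-2, 1e-3, 1e-4)):
    """Outer loop of the variant IHT method: for each k, run the inner
    solver until the two stopping tests in (59) hold, then increase the
    penalty parameter rho by the factor tau (step 2)."""
    x, rho = x0, rho0
    for eps_k in eps:
        # stopping test (59) for the current rho and tolerance eps_k
        stop = lambda z: dK(z) <= t / rho and pg_norm(z, rho) <= eps_k
        x = inner_iht(x, rho, stop)     # step 1): inner IHT run until stop(x)
        rho *= tau                      # step 2): rho_{k+1} = tau * rho_k
    return x
```

As $\rho_k \to \infty$, the feasibility violation allowed by (59) shrinks like $t/\rho_k$, which is what drives accumulation points to feasibility in Theorem 4.4.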
(ii) Let $f^*_I$ be defined in (47). Applying Lemma 2.4 to problem (47), we know that
$$f(x^l) \ge f^*_{I^l}-\hat t\, d_K(Ax^l-b), \quad \forall l, \tag{60}$$
where $\hat t$ is defined above. Let $x^*$ and $I^*$ be defined in the proof of statement (i). We observe that $f^*_{I^*} \le \Phi_\rho(x^*)$. Using this relation and (60), we have that for sufficiently large $l$,
$$\Phi_\rho(x^l)-\Phi_\rho(x^*) = f(x^l)+\frac{\rho}{2}[d_K(Ax^l-b)]^2-\Phi_\rho(x^*) \ge f(x^l)-f^*_{I^*}+\frac{\rho}{2}[d_K(Ax^l-b)]^2$$
$$= f(x^l)-f^*_{I^l}+\frac{\rho}{2}[d_K(Ax^l-b)]^2 \ge -\hat t\, d_K(Ax^l-b)+\frac{\rho}{2}[d_K(Ax^l-b)]^2,$$
which implies that
$$d_K(Ax^l-b) \le \frac{\hat t}{\rho}+\sqrt{\frac{\hat t^2}{\rho^2}+\frac{2\left[\Phi_\rho(x^l)-\Phi_\rho(x^*)\right]}{\rho}}.$$
This inequality together with the fact $\lim_{l\to\infty}\Phi_\rho(x^l) = \Phi_\rho(x^*)$ yields statement (ii).

Remark. From Theorem 4.3, we can see that the inner iterations of the above method terminate in a finite number of steps.

We next establish convergence of the outer iterations of the above variant of the IHT method for (2). In particular, we show that every accumulation point of $\{x^k\}$ is a local minimizer of (2).

Theorem 4.4 Let $\{x^k\}$ be the sequence generated by the above variant of the IHT method for solving (2). Then any accumulation point of $\{x^k\}$ is a local minimizer of (2).

Proof. Let $\tilde x^k = \Pi_{B_{I_k}}(x^k-\nabla\Phi_{\rho_k}(x^k)/L_{\rho_k})$. Since $\{x^k\}$ satisfies (59), it follows from Lemma 2.1(a) that
$$\nabla\Phi_{\rho_k}(x^k) \in N_{B_{I_k}}(\tilde x^k)+\mathcal{U}(\epsilon_k), \tag{61}$$
where $I_k = I(x^k)$. Let $x^*$ be any accumulation point of $\{x^k\}$. Then there exists a subsequence $\mathcal{K}$ such that $\{x^k\}_{\mathcal{K}} \to x^*$. By passing to a subsequence if necessary, we can assume that $I_k = I$ for all $k\in\mathcal{K}$. Let $\mu^k = \rho_k[Ax^k-b-\Pi_K(Ax^k-b)]$. We clearly see that
$$(\mu^k)^T\Pi_K(Ax^k-b) = 0. \tag{62}$$
Using (61) and the definitions of $\Phi_\rho$ and $\mu^k$, we have
$$\nabla f(x^k)+A^T\mu^k \in N_{B_I}(\tilde x^k)+\mathcal{U}(\epsilon_k), \quad \forall k\in\mathcal{K}. \tag{63}$$
By (58), (59) and the definition of $\tilde x^k$, one can observe that
$$\|x^k-\tilde x^k\| = \frac{1}{L_{\rho_k}}\|g(x^k;\rho_k,I_k)\| \le \epsilon_k. \tag{64}$$
In addition, notice that $\|\mu^k\| = \rho_k d_K(Ax^k-b)$, which together with (59) implies that $\|\mu^k\| \le t$ for all $k$. Hence, $\{\mu^k\}$ is bounded. By passing to a subsequence if necessary, we can assume that $\{\mu^k\}_{\mathcal{K}} \to \mu^*$. Using (64) and upon taking limits on both sides of (62) and (63) as $k\in\mathcal{K}\to\infty$, we have
$$(\mu^*)^T\Pi_K(Ax^*-b) = 0, \qquad \nabla f(x^*)+A^T\mu^* \in N_{B_I}(x^*).$$
In addition, since $x^k_{\bar I} = 0$ for $k\in\mathcal{K}$, we know that $x^*_{\bar I} = 0$. Also, it follows from (59) that $d_K(Ax^*-b) = 0$, which implies that $Ax^*-b\in K$. These relations yield that
$$x^* \in \operatorname{Arg\,min}_{x\in B_I}\{f(x) : Ax-b\in K\},$$
and hence $x^*$ is a local minimizer of (2).

5 Concluding remarks

In this paper we studied iterative hard thresholding (IHT) methods for solving $l_0$ regularized convex cone programming problems. In particular, we first proposed an IHT method and its variant for solving $l_0$ regularized box-constrained convex programming, and showed that the sequence generated by these methods converges to a local minimizer. We also established the iteration complexity of the IHT method for finding an $\epsilon$-local-optimal solution. We then proposed a method for solving $l_0$ regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation and established its iteration complexity for finding an $\epsilon$-approximate local minimizer. Finally, we proposed a variant of this method in which the associated penalty parameter is dynamically updated, and showed that every accumulation point is a local minimizer of the problem.

Some of the methods studied in this paper can be extended to solve certain $l_0$ regularized nonconvex optimization problems. For example, the IHT method and its variant can be applied to problem (12) in which $f$ is nonconvex and $\nabla f$ is Lipschitz continuous. In addition, a numerical study of the IHT methods will be presented in the working paper [7].
Finally, it would be interesting to extend the methods of this paper to solve rank minimization problems and to compare them with the methods studied in [5, 8]. This is left as future research.

Acknowledgment

The author would like to thank Ting Kei Pong for proofreading and for suggestions that substantially improved the presentation of the paper.
References

[1] J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMA J. Numer. Anal., 8(1):141-148, 1988.

[2] T. Blumensath and M. E. Davies. Iterative thresholding for sparse approximations. J. Fourier Anal. Appl., 14(5-6):629-654, 2008.

[3] T. Blumensath and M. E. Davies. Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal., 27(3):265-274, 2009.

[4] E. G. Birgin, J. M. Martínez, and M. Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim., 10(4):1196-1211, 2000.

[5] J. Cai, E. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM J. Optim., 20(4):1956-1982, 2010.

[6] K. K. Herrity, A. C. Gilbert, and J. A. Tropp. Sparse approximation via iterative thresholding. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2006.

[7] J. Huang, S. Liu, and Z. Lu. Sparse approximation via nonconvex regularizers. Working paper, Department of Statistics, Texas A&M University.

[8] P. Jain, R. Meka, and I. Dhillon. Guaranteed rank minimization via singular value projection. In Neural Information Processing Systems, 2010.

[9] G. Lan and R. D. C. Monteiro. Iteration-complexity of first-order penalty methods for convex programming. To appear in Math. Program.

[10] Z. Lu and Y. Zhang. Sparse approximation via penalty decomposition methods. Manuscript, Department of Mathematics, Simon Fraser University, February 2012.

[11] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process., 41(12):3397-3415, 1993.

[12] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Massachusetts, 2004.

[13] M. Nikolova. Description of the minimizers of least squares regularized with l_0 norm. Report, CMLA - CNRS ENS Cachan, France.

[14] J. A. Tropp. Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inform. Theory, 50(10):2231-2242, 2004.
Complexity of gradient descent for multiobjective optimization J. Fliege A. I. F. Vaz L. N. Vicente July 18, 2018 Abstract A number of first-order methods have been proposed for smooth multiobjective optimization
More information1. Introduction. We analyze a trust region version of Newton s method for the optimization problem
SIAM J. OPTIM. Vol. 9, No. 4, pp. 1100 1127 c 1999 Society for Industrial and Applied Mathematics NEWTON S METHOD FOR LARGE BOUND-CONSTRAINED OPTIMIZATION PROBLEMS CHIH-JEN LIN AND JORGE J. MORÉ To John
More informationGeneralized Uniformly Optimal Methods for Nonlinear Programming
Generalized Uniformly Optimal Methods for Nonlinear Programming Saeed Ghadimi Guanghui Lan Hongchao Zhang Janumary 14, 2017 Abstract In this paper, we present a generic framewor to extend existing uniformly
More informationInexact Newton Methods and Nonlinear Constrained Optimization
Inexact Newton Methods and Nonlinear Constrained Optimization Frank E. Curtis EPSRC Symposium Capstone Conference Warwick Mathematics Institute July 2, 2009 Outline PDE-Constrained Optimization Newton
More informationAn Inexact Newton Method for Optimization
New York University Brown Applied Mathematics Seminar, February 10, 2009 Brief biography New York State College of William and Mary (B.S.) Northwestern University (M.S. & Ph.D.) Courant Institute (Postdoc)
More informationAn accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems
An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems O. Kolossoski R. D. C. Monteiro September 18, 2015 (Revised: September 28, 2016) Abstract
More informationLeast squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova
Least squares regularized or constrained by L0: relationship between their global minimizers Mila Nikolova CMLA, CNRS, ENS Cachan, Université Paris-Saclay, France nikolova@cmla.ens-cachan.fr SIAM Minisymposium
More informationPDE-Constrained and Nonsmooth Optimization
Frank E. Curtis October 1, 2009 Outline PDE-Constrained Optimization Introduction Newton s method Inexactness Results Summary and future work Nonsmooth Optimization Sequential quadratic programming (SQP)
More informationLecture 7 Monotonicity. September 21, 2008
Lecture 7 Monotonicity September 21, 2008 Outline Introduce several monotonicity properties of vector functions Are satisfied immediately by gradient maps of convex functions In a sense, role of monotonicity
More informationA globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications
A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth
More informationGENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS. Jong Kyu Kim, Salahuddin, and Won Hee Lim
Korean J. Math. 25 (2017), No. 4, pp. 469 481 https://doi.org/10.11568/kjm.2017.25.4.469 GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS Jong Kyu Kim, Salahuddin, and Won Hee Lim Abstract. In this
More informationPart 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)
Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective
More informationNewton s Method. Ryan Tibshirani Convex Optimization /36-725
Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x
More informationAn accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems
Optimization Methods and Software ISSN: 1055-6788 (Print) 1029-4937 (Online) Journal homepage: http://www.tandfonline.com/loi/goms20 An accelerated non-euclidean hybrid proximal extragradient-type algorithm
More information