Iterative Hard Thresholding Methods for $\ell_0$ Regularized Convex Cone Programming


arXiv v2 [math.OC] 2 Nov 2012

Zhaosong Lu

October 30, 2012

Abstract

In this paper we consider $\ell_0$ regularized convex cone programming problems. In particular, we first propose an iterative hard thresholding (IHT) method and its variant for solving $\ell_0$ regularized box constrained convex programming. We show that the sequence generated by these methods converges to a local minimizer. Also, we establish the iteration complexity of the IHT method for finding an $\epsilon$-local-optimal solution. We then propose a method for solving $\ell_0$ regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation, and establish its iteration complexity for finding an $\epsilon$-approximate local minimizer. Finally, we propose a variant of this method in which the associated penalty parameter is dynamically updated, and show that every accumulation point is a local minimizer of the problem.

Key words: sparse approximation, iterative hard thresholding method, $\ell_0$ regularization, box constrained convex programming, convex cone programming

1 Introduction

Sparse approximations have gained a great deal of popularity in numerous areas over the last decade. For example, in compressed sensing, a large sparse signal is decoded by finding a sparse solution to a system of linear equalities and/or inequalities. The particular interest of this paper is to find a sparse approximate solution to a convex cone programming problem in the form of
$$\min\ f(x) \quad \text{s.t.}\ \ Ax - b \in K^*,\ \ l \le x \le u, \tag{1}$$

(Department of Mathematics, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada; zhaosong@sfu.ca. This work was supported in part by an NSERC Discovery Grant. Part of this work was conducted during the author's sabbatical leave in the Department of Industrial and Systems Engineering at Texas A&M University; the author would like to thank them for hosting his visit.)

for some $l \in \bar{\mathbb{R}}^n_-$, $u \in \bar{\mathbb{R}}^n_+$, $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, where $K^*$ denotes the dual cone of a closed convex cone $K \subseteq \mathbb{R}^m$, i.e., $K^* = \{s \in \mathbb{R}^m : s^T x \ge 0,\ \forall x \in K\}$, and $\bar{\mathbb{R}}^n_- = \{x : -\infty \le x_i \le 0,\ 1 \le i \le n\}$ and $\bar{\mathbb{R}}^n_+ = \{x : 0 \le x_i \le \infty,\ 1 \le i \le n\}$. A sparse solution to (1) can be sought by solving the following $\ell_0$ regularized convex cone programming problem:
$$\min\ f(x) + \lambda\|x\|_0 \quad \text{s.t.}\ \ Ax - b \in K^*,\ \ l \le x \le u, \tag{2}$$
for some $\lambda > 0$, where $\|x\|_0$ denotes the cardinality of $x$. One special case of (2), the $\ell_0$-regularized unconstrained least squares problem, has been well studied in the literature (e.g., [13, 10]), and several methods have been developed for solving it. For example, iterative hard thresholding (IHT) methods [6, 2, 3] and matching pursuit algorithms [11, 14] were proposed to solve this type of problem. Recently, Lu and Zhang [10] proposed a penalty decomposition method for solving a more general class of $\ell_0$ minimization problems. As shown by the extensive experiments in [2, 3], the IHT method performs very well in finding sparse solutions to unconstrained least squares problems. In addition, methods of a similar type [5, 8] were successfully applied to find low-rank solutions in the context of matrix completion.

Inspired by these works, in this paper we study IHT methods for solving the $\ell_0$ regularized convex cone programming problem (2). In particular, we first propose an IHT method and its variant for solving $\ell_0$ regularized box constrained convex programming. We show that the sequence generated by these methods converges to a local minimizer. Also, we establish the iteration complexity of the IHT method for finding an $\epsilon$-local-optimal solution. We then propose a method for solving $\ell_0$ regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation, and establish its iteration complexity for finding an $\epsilon$-approximate local minimizer of the problem.
We also propose a variant of the method in which the associated penalty parameter is dynamically updated, and show that every accumulation point is a local minimizer of the problem.

The outline of this paper is as follows. In Subsection 1.1 we introduce some notation that is used throughout the paper. In Section 2 we present some technical results on a projected gradient method for convex programming. In Section 3 we propose IHT methods for solving $\ell_0$ regularized box constrained convex programming and study their convergence. In Section 4 we develop IHT methods for solving $\ell_0$ regularized convex cone programming and study their convergence. Finally, in Section 5 we present some concluding remarks.

1.1 Notation

Given a nonempty closed convex set $\Omega \subseteq \mathbb{R}^n$ and a point $x \in \Omega$, $N_\Omega(x)$ denotes the normal cone of $\Omega$ at $x$. In addition, $d_\Omega(y)$ denotes the Euclidean distance between $y \in \mathbb{R}^n$ and $\Omega$. All norms used in the paper are the Euclidean norm, denoted by $\|\cdot\|$. We use $U(r)$ to denote the ball centered at the origin with radius $r \ge 0$, that is, $U(r) := \{x \in \mathbb{R}^n : \|x\| \le r\}$.

2 Technical preliminaries

In this section we present some technical results on a projected gradient method for convex programming that will be used subsequently in this paper.

Consider the convex programming problem
$$\varphi^* := \min_{x \in X}\varphi(x), \tag{3}$$
where $X \subseteq \mathbb{R}^n$ is a closed convex set and $\varphi : X \to \mathbb{R}$ is a smooth convex function whose gradient is Lipschitz continuous with constant $L_\varphi > 0$. Assume that the set of optimal solutions of (3), denoted by $X^*$, is nonempty. Let $L \ge L_\varphi$ be arbitrarily given. The projected gradient of $\varphi$ at any $x \in X$ with respect to $X$ is defined as
$$g(x) := L\left[x - \Pi_X\big(x - \nabla\varphi(x)/L\big)\right], \tag{4}$$
where $\Pi_X(\cdot)$ is the projection map onto $X$ (see, for example, [12]).

The following properties of the projected gradient are essentially shown in Proposition 3 and Lemma 4 of [9] (see also [12]).

Lemma 2.1 Let $x \in X$ be given and define $x^+ := \Pi_X(x - \nabla\varphi(x)/L)$. Then, for any given $\epsilon \ge 0$, the following statements hold:

(a) $\|g(x)\| \le \epsilon$ if and only if $-\nabla\varphi(x) \in N_X(x^+) + U(\epsilon)$;

(b) $\|g(x)\| \le \epsilon$ implies that $-\nabla\varphi(x^+) \in N_X(x^+) + U(2\epsilon)$;

(c) $\varphi(x^+) \le \varphi(x) - \|g(x)\|^2/(2L)$;

(d) $\varphi(x) - \varphi(x^*) \ge \|g(x)\|^2/(2L)$, where $x^* \in \operatorname{Argmin}\{\varphi(y) : y \in X\}$.

We next study a projected gradient method for solving (3).

Projected gradient method for (3):

Choose an arbitrary $x^0 \in X$. Set $k = 0$.

1) Solve the subproblem
$$x^{k+1} = \arg\min_{x \in X}\left\{\varphi(x^k) + \nabla\varphi(x^k)^T(x - x^k) + \frac{L}{2}\|x - x^k\|^2\right\}. \tag{5}$$

2) Set $k \leftarrow k+1$ and go to step 1).

end

Some properties of the above projected gradient method are established in the following two theorems, which will be used in subsequent sections of this paper.
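The subproblem (5) is minimized exactly by a projected gradient step, $x^{k+1} = \Pi_X(x^k - \nabla\varphi(x^k)/L)$. A minimal sketch, assuming $X$ is a box so that $\Pi_X$ is a componentwise clip; the quadratic objective and all names below are illustrative, not from the paper.

```python
import numpy as np

def projected_gradient(grad, L, x0, lo, hi, iters=200):
    """Iterate x_{k+1} = Pi_X(x_k - grad(x_k)/L), the minimizer of (5)."""
    x = x0.copy()
    for _ in range(iters):
        x = np.clip(x - grad(x) / L, lo, hi)  # clip = projection onto the box X
    return x

# Example: phi(x) = 0.5*||x - c||^2, whose gradient is x - c and L_phi = 1.
c = np.array([2.0, -3.0, 0.5])
x_star = projected_gradient(lambda x: x - c, L=1.0, x0=np.zeros(3),
                            lo=-1.0, hi=1.0)
# The minimizer over the box [-1, 1]^3 is the center c clipped to the box.
print(x_star)  # -> [ 1.  -1.   0.5]
```

Here a single step already lands on the fixed point, since the example objective is separable and $L = L_\varphi$.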

Theorem 2.2 Let $\{x^k\}$ be generated by the above projected gradient method. Then the following statements hold:

(i) For every $k \ge 0$ and $l \ge 1$,
$$\varphi(x^{k+l}) - \varphi^* \ \le\ \frac{L}{2l}\|x^k - x^*\|^2. \tag{6}$$

(ii) $\{x^k\}$ converges to some optimal solution $x^*$ of (3).

Proof. (i) Since the objective function of (5) is strongly convex with modulus $L$, it follows that for every $x \in X$,
$$\varphi(x^k) + \nabla\varphi(x^k)^T(x - x^k) + \frac{L}{2}\|x - x^k\|^2 \ \ge\ \varphi(x^k) + \nabla\varphi(x^k)^T(x^{k+1} - x^k) + \frac{L}{2}\|x^{k+1} - x^k\|^2 + \frac{L}{2}\|x - x^{k+1}\|^2.$$
By the convexity of $\varphi$, the Lipschitz continuity of $\nabla\varphi$, and $L \ge L_\varphi$, we have
$$\varphi(x) \ \ge\ \varphi(x^k) + \nabla\varphi(x^k)^T(x - x^k), \qquad \varphi(x^{k+1}) \ \le\ \varphi(x^k) + \nabla\varphi(x^k)^T(x^{k+1} - x^k) + \frac{L}{2}\|x^{k+1} - x^k\|^2,$$
which together with the above inequality imply that
$$\varphi(x) + \frac{L}{2}\|x - x^k\|^2 \ \ge\ \varphi(x^{k+1}) + \frac{L}{2}\|x - x^{k+1}\|^2, \quad \forall x \in X. \tag{7}$$
Letting $x = x^k$ in (7), we obtain that $\varphi(x^k) - \varphi(x^{k+1}) \ge L\|x^{k+1} - x^k\|^2/2$. Hence, $\{\varphi(x^k)\}$ is decreasing. Letting $x = x^* \in X^*$ in (7), we have
$$\varphi(x^{k+1}) - \varphi^* \ \le\ \frac{L}{2}\left(\|x^k - x^*\|^2 - \|x^{k+1} - x^*\|^2\right), \quad \forall k \ge 0.$$
Using this inequality and the monotonicity of $\{\varphi(x^k)\}$, we obtain that
$$l\big(\varphi(x^{k+l}) - \varphi^*\big) \ \le\ \sum_{i=k}^{k+l-1}\left[\varphi(x^{i+1}) - \varphi^*\right] \ \le\ \frac{L}{2}\left(\|x^k - x^*\|^2 - \|x^{k+l} - x^*\|^2\right), \tag{8}$$
which immediately yields (6).

(ii) It follows from (8) that
$$\|x^{k+l} - x^*\| \ \le\ \|x^k - x^*\|, \quad \forall k \ge 0,\ l \ge 1. \tag{9}$$
Hence, $\|x^k - x^*\| \le \|x^0 - x^*\|$ for every $k$, which implies that $\{x^k\}$ is bounded. Then there exists a subsequence $K$ such that $\{x^k\}_K \to \hat{x}^* \in X$. It can be seen from (6) that $\{\varphi(x^k)\}_K \to \varphi^*$. Hence, $\varphi(\hat{x}^*) = \lim_{k \in K}\varphi(x^k) = \varphi^*$, which implies that $\hat{x}^* \in X^*$. Since (9) holds for any $x^* \in X^*$, we also have $\|x^{k+l} - \hat{x}^*\| \le \|x^k - \hat{x}^*\|$ for every $k \ge 0$ and $l \ge 1$. This together with the fact $\{x^k\}_K \to \hat{x}^*$ implies that $\{x^k\} \to \hat{x}^*$, and hence statement (ii) holds.

Theorem 2.3 Suppose that $\varphi$ is strongly convex with modulus $\sigma > 0$. Let $\{x^k\}$ be generated by the above projected gradient method. Then, for any given $\epsilon > 0$, the following statements hold:

(i) $\varphi(x^k) - \varphi^* \le \epsilon$ whenever
$$k \ \ge\ 2\left\lceil\frac{L}{\sigma}\right\rceil\log\frac{\varphi(x^0) - \varphi^*}{\epsilon};$$

(ii) $\varphi(x^k) - \varphi^* < \epsilon$ whenever
$$k \ \ge\ 2\left\lceil\frac{L}{\sigma}\right\rceil\log\frac{\varphi(x^0) - \varphi^*}{\epsilon} + 1.$$

Proof. (i) Let $M = \lceil L/\sigma\rceil$. It follows from Theorem 2.2 and the strong convexity of $\varphi$ that
$$\varphi(x^{k+l}) - \varphi^* \ \le\ \frac{L}{2l}\|x^k - x^*\|^2 \ \le\ \frac{L}{\sigma l}\big(\varphi(x^k) - \varphi^*\big),$$
where $x^*$ is the optimal solution of (3). Hence, we have
$$\varphi(x^{k+2M}) - \varphi^* \ \le\ \frac{L}{2\sigma M}\big(\varphi(x^k) - \varphi^*\big) \ \le\ \frac{1}{2}\big(\varphi(x^k) - \varphi^*\big),$$
which implies that
$$\varphi(x^{2jM}) - \varphi^* \ \le\ \frac{1}{2^j}\big(\varphi(x^0) - \varphi^*\big).$$
Let $K = \lceil\log((\varphi(x^0) - \varphi^*)/\epsilon)\rceil$. Hence, when $k \ge 2KM$, we have
$$\varphi(x^k) - \varphi^* \ \le\ \varphi(x^{2KM}) - \varphi^* \ \le\ \frac{1}{2^K}\big(\varphi(x^0) - \varphi^*\big) \ \le\ \epsilon,$$
which immediately implies that statement (i) holds.

(ii) Let $K$ and $M$ be defined as above. If $\varphi(x^{2KM}) = \varphi^*$, then by the monotonicity of $\{\varphi(x^k)\}$ we have $\varphi(x^k) = \varphi^*$ for $k > 2KM$, and hence the conclusion holds. Now suppose that $\varphi(x^{2KM}) > \varphi^*$. This implies that $g(x^{2KM}) \ne 0$, where $g$ is defined in (4). Using this relation, Lemma 2.1(c) and statement (i), we obtain that $\varphi(x^{2KM+1}) < \varphi(x^{2KM}) \le \varphi^* + \epsilon$, which together with the monotonicity of $\{\varphi(x^k)\}$ implies that the conclusion holds.

Finally, we consider the convex programming problem
$$f^* := \min\{f(x) : Ax - b \in K^*,\ x \in X\} \tag{10}$$
for some $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$, where $f : X \to \mathbb{R}$ is a smooth convex function whose gradient is Lipschitz continuous with constant $L_f > 0$, $X \subseteq \mathbb{R}^n$ is a closed convex set, and $K^*$ is the dual cone of a closed convex cone $K$.
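Theorem 2.3 predicts that, for a strongly convex $\varphi$, the optimality gap is at least halved every $2\lceil L/\sigma\rceil$ projected-gradient steps. A small numerical check of this halving, under stated assumptions; the quadratic, the box, and all constants below are illustrative test data.

```python
import numpy as np

Q = np.diag([4.0, 1.0])          # Hessian of phi: L = 4, sigma = 1
c = np.array([3.0, -2.0])

def phi(x):
    return 0.5 * x @ Q @ x - c @ x

L, sigma = 4.0, 1.0
lo, hi = -1.0, 1.0
x = np.array([1.0, 1.0])
# The unconstrained minimizer Q^{-1} c = [0.75, -2] clips to x* = [0.75, -1]
# on the box [-1, 1]^2 (the problem is separable, so clipping is exact here).
phi_star = phi(np.array([0.75, -1.0]))

M = int(np.ceil(L / sigma))
gaps = []
for k in range(10 * 2 * M):
    if k % (2 * M) == 0:
        gaps.append(phi(x) - phi_star)   # record the gap every 2M steps
    x = np.clip(x - (Q @ x - c) / L, lo, hi)

# Each recorded gap should be at most half the previous one.
print(all(g2 <= 0.5 * g1 + 1e-12 for g1, g2 in zip(gaps, gaps[1:])))  # -> True
```

In this tiny example the iterates actually reach $x^*$ exactly after a few steps, so the later gaps are zero, which still satisfies the halving bound.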

The Lagrangian dual function associated with (10) is given by
$$d(\mu) := \inf\{f(x) + \mu^T(Ax - b) : x \in X\}, \quad \forall\mu \in -K.$$
Assume that there exists a Lagrange multiplier for (10), that is, a vector $\mu^* \in -K$ such that $d(\mu^*) = f^*$. Under this assumption, the following results are established in Corollary 2 and Proposition 10 of [9], respectively.

Lemma 2.4 Let $\mu^*$ be a Lagrange multiplier for (10). Then
$$f(x) \ \ge\ f^* - \|\mu^*\|\,d_{K^*}(Ax - b), \quad \forall x \in X.$$

Lemma 2.5 Let $\rho > 0$ be given and $L_\rho = L_f + \rho\|A\|^2$. Consider the problem
$$\Phi_\rho^* := \min_{x \in X}\left\{\Phi_\rho(x) := f(x) + \frac{\rho}{2}\left[d_{K^*}(Ax - b)\right]^2\right\}. \tag{11}$$
If $x \in X$ is a $\xi$-approximate solution of (11), i.e., $\Phi_\rho(x) - \Phi_\rho^* \le \xi$, then the pair $(x^+, \mu)$ defined as
$$x^+ := \Pi_X\big(x - \nabla\Phi_\rho(x)/L_\rho\big), \qquad \mu := \rho\left[Ax^+ - b - \Pi_{K^*}(Ax^+ - b)\right]$$
is in $X \times (-K)$ and satisfies $\mu^T\Pi_{K^*}(Ax^+ - b) = 0$ and the relations
$$d_{K^*}(Ax^+ - b) \ \le\ \frac{1}{\rho}\|\mu^*\| + \sqrt{\frac{2\xi}{\rho}}, \qquad -\left[\nabla f(x^+) + A^T\mu\right] \in N_X(x^+) + U\big(2\sqrt{2L_\rho\xi}\big),$$
where $\mu^*$ is an arbitrary Lagrange multiplier for (10).

3 $\ell_0$ regularized box constrained convex programming

In this section we consider a special case of (2), the $\ell_0$ regularized box constrained convex programming problem
$$F^* := \min\left\{F(x) := f(x) + \lambda\|x\|_0 : l \le x \le u\right\} \tag{12}$$
for some $\lambda > 0$, $l \in \bar{\mathbb{R}}^n_-$ and $u \in \bar{\mathbb{R}}^n_+$. Recently, Blumensath and Davies [2, 3] proposed an iterative hard thresholding (IHT) method for solving the special case of (12) with $f(x) = \|Ax - b\|^2$, $l_i = -\infty$ and $u_i = \infty$ for all $i$. Our aim is to extend their IHT method to solve (12) and study its convergence. In addition, we establish its iteration complexity for finding an $\epsilon$-local-optimal solution of (12). Finally, we propose a variant of the IHT method in which only a local Lipschitz constant of $\nabla f$ is used.
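The key computational step of the IHT methods below is a subproblem of the form (14), which separates over coordinates and admits the closed-form solution (21): compare, per coordinate, $[s_L(x)]_i^2 - [\Pi_B(s_L(x)) - s_L(x)]_i^2$ against the $\ell_0$ charge $2\lambda/L$. A minimal sketch, assuming the box $B$ is given by bounds `lo`, `hi`; the function names and test data are illustrative, not from the paper.

```python
import numpy as np

def hard_threshold_step(x, grad_f, L, lam, lo, hi):
    """One IHT step: keep the projected gradient step per coordinate only
    when its model decrease exceeds the l0 penalty 2*lam/L, else set 0."""
    s = x - grad_f(x) / L                 # s_L(x) as in (16)
    p = np.clip(s, lo, hi)                # Pi_B(s_L(x)) for the box B
    gain = s**2 - (p - s)**2              # per-coordinate threshold quantity
    return np.where(gain > 2.0 * lam / L, p, 0.0)

# Example: f(x) = 0.5*||x - c||^2 with one large and one tiny component of c.
c = np.array([5.0, 0.1])
x1 = hard_threshold_step(np.zeros(2), lambda x: x - c, L=1.0, lam=0.5,
                         lo=np.array([0.0, 0.0]), hi=np.array([4.0, 4.0]))
print(x1)  # the large coordinate survives (clipped to 4), the tiny one is zeroed
```

On this data the first coordinate has gain $25 - 1 = 24 > 2\lambda/L = 1$ and is kept at its clipped value $4$, while the second has gain $0.01 < 1$ and is hard-thresholded to zero.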

Throughout this section we assume that $f$ is a smooth convex function on $B$ whose gradient is Lipschitz continuous with constant $L_f > 0$, and that $f$ is bounded below on the set $B$, where
$$B := \{x \in \mathbb{R}^n : l \le x \le u\}. \tag{13}$$

We now present an IHT method for solving problem (12).

Iterative hard thresholding method for (12):

Choose an arbitrary $x^0 \in B$. Set $k = 0$.

1) Solve the subproblem
$$x^{k+1} \in \operatorname{Argmin}_{x \in B}\left\{f(x^k) + \nabla f(x^k)^T(x - x^k) + \frac{L}{2}\|x - x^k\|^2 + \lambda\|x\|_0\right\}. \tag{14}$$

2) Set $k \leftarrow k+1$ and go to step 1).

end

Remark. The subproblem (14) has a closed-form solution, given in (21) below.

In what follows, we study the convergence of the above IHT method for (12). Before proceeding, we introduce some notation that will be used subsequently. Define
$$B_I := \{x \in B : x_I = 0\}, \quad \forall I \subseteq \{1,\dots,n\}, \tag{15}$$
$$\Pi_B(x) := \arg\min\{\|y - x\| : y \in B\},\ \ \forall x \in \mathbb{R}^n, \qquad s_L(x) := x - \frac{1}{L}\nabla f(x),\ \ \forall x \in B, \tag{16}$$
$$I(x) := \{i : x_i = 0\}, \quad \forall x \in \mathbb{R}^n, \tag{17}$$
for some constant $L > L_f$.

The following lemma establishes some properties of the operators $s_L(\cdot)$ and $\Pi_B(s_L(\cdot))$, which will be used subsequently.

Lemma 3.1 For any $x, y \in \mathbb{R}^n$, there hold:

(1) $\left|[s_L(x)]_i^2 - [s_L(y)]_i^2\right| \le 4\big(\|x - y\| + |[s_L(y)]_i|\big)\|x - y\|$;

(2) $\left|[\Pi_B(s_L(x)) - s_L(x)]_i^2 - [\Pi_B(s_L(y)) - s_L(y)]_i^2\right| \le 4\big(\|x - y\| + |[\Pi_B(s_L(y)) - s_L(y)]_i|\big)\|x - y\|$.

Proof. (1) We observe that
$$\|s_L(x) - s_L(y)\| = \left\|x - y - \frac{1}{L}\big(\nabla f(x) - \nabla f(y)\big)\right\| \ \le\ \|x - y\| + \frac{1}{L}\|\nabla f(x) - \nabla f(y)\| \ \le\ \left(1 + \frac{L_f}{L}\right)\|x - y\| \ \le\ 2\|x - y\|. \tag{18}$$

It follows from (18) that
$$\left|[s_L(x)]_i^2 - [s_L(y)]_i^2\right| = \left|[s_L(x)]_i + [s_L(y)]_i\right|\cdot\left|[s_L(x)]_i - [s_L(y)]_i\right| \ \le\ \big(|[s_L(x)]_i - [s_L(y)]_i| + 2|[s_L(y)]_i|\big)\,\left|[s_L(x)]_i - [s_L(y)]_i\right| \ \le\ 4\big(\|x - y\| + |[s_L(y)]_i|\big)\|x - y\|.$$

(2) It can be shown that
$$\|\Pi_B(x) - x + y - \Pi_B(y)\| \ \le\ \|x - y\|.$$
Using this inequality and (18), we then have
$$\left|[\Pi_B(s_L(x)) - s_L(x)]_i^2 - [\Pi_B(s_L(y)) - s_L(y)]_i^2\right| \ \le\ \big(\left|[\Pi_B(s_L(x)) - s_L(x)]_i - [\Pi_B(s_L(y)) - s_L(y)]_i\right| + 2\left|[\Pi_B(s_L(y)) - s_L(y)]_i\right|\big)\cdot\left|[\Pi_B(s_L(x)) - s_L(x)]_i - [\Pi_B(s_L(y)) - s_L(y)]_i\right| \ \le\ \big(\|s_L(x) - s_L(y)\| + 2|[\Pi_B(s_L(y)) - s_L(y)]_i|\big)\,\|s_L(x) - s_L(y)\| \ \le\ 4\big(\|x - y\| + |[\Pi_B(s_L(y)) - s_L(y)]_i|\big)\|x - y\|.$$

The following lemma shows that for the sequence $\{x^k\}$, the magnitude of any nonzero component $x_i^k$ cannot be too small for $k \ge 1$.

Lemma 3.2 Let $\{x^k\}$ be generated by the above IHT method. Then, for all $k \ge 0$,
$$x_j^{k+1} \ne 0 \ \Longrightarrow\ |x_j^{k+1}| \ \ge\ \delta := \min_{i \notin I_0}\delta_i > 0, \tag{19}$$
where $I_0 = \{i : l_i = u_i = 0\}$ and
$$\delta_i = \begin{cases} \min\big(u_i, \sqrt{2\lambda/L}\big), & \text{if } l_i = 0,\\ \min\big(-l_i, \sqrt{2\lambda/L}\big), & \text{if } u_i = 0,\\ \min\big(-l_i, u_i, \sqrt{2\lambda/L}\big), & \text{otherwise}, \end{cases} \quad \forall i \notin I_0. \tag{20}$$

Proof. One can observe from (14) that for $i = 1,\dots,n$,
$$x_i^{k+1} = \begin{cases} [\Pi_B(s_L(x^k))]_i, & \text{if } [s_L(x^k)]_i^2 - [\Pi_B(s_L(x^k)) - s_L(x^k)]_i^2 > \frac{2\lambda}{L},\\ 0, & \text{if } [s_L(x^k)]_i^2 - [\Pi_B(s_L(x^k)) - s_L(x^k)]_i^2 < \frac{2\lambda}{L},\\ [\Pi_B(s_L(x^k))]_i \text{ or } 0, & \text{otherwise} \end{cases} \tag{21}$$
(see, for example, [10]). Suppose that $j$ is an index such that $x_j^{k+1} \ne 0$. Clearly, $j \notin I_0$, where $I_0$ is defined above. It follows from (21) that
$$x_j^{k+1} = [\Pi_B(s_L(x^k))]_j \ne 0, \qquad [s_L(x^k)]_j^2 - [\Pi_B(s_L(x^k)) - s_L(x^k)]_j^2 \ \ge\ \frac{2\lambda}{L}. \tag{22}$$

The second relation of (22) implies that $|[s_L(x^k)]_j| \ge \sqrt{2\lambda/L}$. In addition, by the first relation of (22) and the definition of $\Pi_B$, we have
$$x_j^{k+1} = [\Pi_B(s_L(x^k))]_j = \min\big(\max([s_L(x^k)]_j, l_j), u_j\big) \ne 0. \tag{23}$$
Recall that $j \notin I_0$. We next show that $|x_j^{k+1}| \ge \delta_j$ by considering three separate cases: i) $l_j = 0$; ii) $u_j = 0$; and iii) $l_j \ne 0 \ne u_j$. For case i), it follows from (23) that $x_j^{k+1} = \min([s_L(x^k)]_j, u_j) > 0$. This together with the relation $|[s_L(x^k)]_j| \ge \sqrt{2\lambda/L}$ and the definition of $\delta_j$ implies that $x_j^{k+1} \ge \delta_j$. By similar arguments, one can show that $|x_j^{k+1}| \ge \delta_j$ also holds for the other two cases. It is then easy to see that the conclusion of this lemma holds.

We next establish that the sequence $\{x^k\}$ converges to a local minimizer of (12), and moreover, $F(x^k)$ converges to a local minimum value of (12).

Theorem 3.3 Let $\{x^k\}$ be generated by the above IHT method. Then $\{x^k\}$ converges to a local minimizer $x^*$ of problem (12), and moreover, $I(x^k) \to I(x^*)$, $\|x^k\|_0 \to \|x^*\|_0$ and $F(x^k) \to F(x^*)$.

Proof. Since $\nabla f$ is Lipschitz continuous with constant $L_f$, we have
$$f(x^{k+1}) \ \le\ f(x^k) + \nabla f(x^k)^T(x^{k+1} - x^k) + \frac{L_f}{2}\|x^{k+1} - x^k\|^2.$$
Using this inequality, the fact that $L > L_f$, and (14), we obtain that
$$F(x^{k+1}) = f(x^{k+1}) + \lambda\|x^{k+1}\|_0 \ \le\ f(x^k) + \nabla f(x^k)^T(x^{k+1} - x^k) + \frac{L_f}{2}\|x^{k+1} - x^k\|^2 + \lambda\|x^{k+1}\|_0 \ \le\ f(x^k) + \nabla f(x^k)^T(x^{k+1} - x^k) + \frac{L}{2}\|x^{k+1} - x^k\|^2 + \lambda\|x^{k+1}\|_0 \ \le\ f(x^k) + \lambda\|x^k\|_0 = F(x^k),$$
where the last inequality follows from (14). The above inequality implies that $\{F(x^k)\}$ is nonincreasing, and moreover, comparing the two middle bounds,
$$F(x^k) - F(x^{k+1}) \ \ge\ \frac{L - L_f}{2}\|x^{k+1} - x^k\|^2. \tag{24}$$
By assumption, $f$ is bounded below on $B$. It then follows that $\{F(x^k)\}$ is bounded below. Hence, $\{F(x^k)\}$ converges to a finite value as $k \to \infty$, which together with (24) implies that
$$\lim_{k \to \infty}\|x^{k+1} - x^k\| = 0. \tag{25}$$

Let $I_k = I(x^k)$, where $I(\cdot)$ is defined in (17). In view of (19), we observe that
$$\|x^{k+1} - x^k\| \ \ge\ \delta \quad \text{if } I_k \ne I_{k+1}. \tag{26}$$
This together with (25) implies that $I_k$ does not change when $k$ is sufficiently large. Hence, there exist some $K \ge 0$ and $I \subseteq \{1,\dots,n\}$ such that $I_k = I$ for all $k \ge K$. Then one can observe from (14) that
$$x^{k+1} = \arg\min_{x \in B_I}\left\{f(x^k) + \nabla f(x^k)^T(x - x^k) + \frac{L}{2}\|x - x^k\|^2\right\}, \quad \forall k > K,$$
where $B_I$ is defined in (15). It follows from Theorem 2.2 that $x^k \to x^*$, where
$$x^* \in \operatorname{Argmin}\{f(x) : x \in B_I\}. \tag{27}$$
It is not hard to see from (27) that $x^*$ is a local minimizer of (12). In addition, we know from (19) that $|x_i^k| \ge \delta$ for $k > K$ and $i \notin I$. It yields $|x_i^*| \ge \delta$ for $i \notin I$ and $x_i^* = 0$ for $i \in I$. Hence, $I(x^k) = I(x^*) = I$ for all $k > K$, which clearly implies that $\|x^k\|_0 = \|x^*\|_0$ for every $k > K$. By continuity of $f$, we have $f(x^k) \to f(x^*)$. It then follows that
$$F(x^k) = f(x^k) + \lambda\|x^k\|_0 \ \to\ f(x^*) + \lambda\|x^*\|_0 = F(x^*).$$

As shown in Theorem 3.3, $x^k \to x^*$ for some local minimizer $x^*$ of (12) and $F(x^k) \to F(x^*)$. Our next aim is to establish the iteration complexity of the IHT method for finding an $\epsilon$-local-optimal solution $x_\epsilon \in B$ of (12) satisfying $F(x_\epsilon) \le F(x^*) + \epsilon$ and $I(x_\epsilon) = I(x^*)$. Before proceeding, we define
$$\alpha := \min_{I \subseteq \{1,\dots,n\}}\left\{\min_i\left|[s_L(x^*)]_i^2 - [\Pi_B(s_L(x^*)) - s_L(x^*)]_i^2 - \frac{2\lambda}{L}\right| : x^* \in \operatorname{Argmin}\{f(x) : x \in B_I\}\right\}, \tag{28}$$
$$\beta := \max_{I \subseteq \{1,\dots,n\}}\left\{\max_i\Big(|[s_L(x^*)]_i| + |[\Pi_B(s_L(x^*)) - s_L(x^*)]_i|\Big) : x^* \in \operatorname{Argmin}\{f(x) : x \in B_I\}\right\}. \tag{29}$$

Theorem 3.4 Assume that $f$ is a smooth strongly convex function with modulus $\sigma > 0$. Suppose that $L > L_f$ is chosen such that $\alpha > 0$. Let $\{x^k\}$ be generated by the above IHT method, $I_k = I(x^k)$ for all $k$, $x^* = \lim_{k \to \infty}x^k$, and $F^* = F(x^*)$. Then, for any given $\epsilon > 0$, the following statements hold:

(i) The number of changes of $I_k$ is at most $\dfrac{2(F(x^0) - F^*)}{(L - L_f)\delta^2}$.

(ii) The total number of iterations of the IHT method for finding an $\epsilon$-local-optimal solution $x_\epsilon \in B$ satisfying $I(x_\epsilon) = I(x^*)$ and $F(x_\epsilon) \le F^* + \epsilon$ is at most $2\left\lceil\frac{L}{\sigma}\right\rceil\log\frac{\theta}{\epsilon}$, where
$$\theta := \big(F(x^0) - F^*\big)\,2^{\frac{\omega+3}{2}}, \qquad \omega := \max_t\left\{(d - 2c)t - ct^2 : 0 \le t \le \frac{2(F(x^0) - F^*)}{(L - L_f)\delta^2}\right\}, \tag{30}$$
$$c := \frac{(L - L_f)\delta^2}{2(F(x^0) - F^*)}, \qquad \gamma := \frac{\sigma\big(\sqrt{2\alpha + \beta^2} - \beta\big)^2}{32}, \qquad d := 2\log(F(x^0) - F^*) + 4 - 2\log\gamma + c. \tag{31}$$

Proof. (i) As shown in Theorem 3.3, $I_k$ changes only a finite number of times. Assume that $I_k$ changes only at $k = n_1 + 1, \dots, n_J + 1$, that is,
$$I_{n_{j-1}+1} = \cdots = I_{n_j} \ \ne\ I_{n_j+1} = \cdots = I_{n_{j+1}}, \quad j = 1,\dots,J-1, \tag{32}$$
where $n_0 = 0$. We next bound $J$, i.e., the total number of changes of $I_k$. In view of (26) and (32), one can observe that
$$\|x^{n_j+1} - x^{n_j}\| \ \ge\ \delta, \quad j = 1,\dots,J,$$
which together with (24) implies that
$$F(x^{n_j}) - F(x^{n_j+1}) \ \ge\ \frac{1}{2}(L - L_f)\delta^2, \quad j = 1,\dots,J. \tag{33}$$
Summing up these inequalities and using the monotonicity of $\{F(x^k)\}$, we have
$$\frac{1}{2}(L - L_f)\delta^2 J \ \le\ F(x^{n_1}) - F(x^{n_J+1}) \ \le\ F(x^0) - F^*, \tag{34}$$
and hence
$$J \ \le\ \frac{2(F(x^0) - F^*)}{(L - L_f)\delta^2}. \tag{35}$$

(ii) Let $n_j$ be defined as above for $j = 1,\dots,J$. We first show that
$$n_j - n_{j-1} \ \le\ 2 + 2\left\lceil\frac{L}{\sigma}\right\rceil\left[\log\Big(F(x^0) - \frac{(j-1)(L - L_f)\delta^2}{2} - F^*\Big) - \log\gamma\right], \quad j = 1,\dots,J, \tag{36}$$
where $F^*$ and $\gamma$ are defined above and in (31), respectively. Indeed, one can observe from (14) that
$$x^{k+1} = \arg\min_{x \in B}\left\{f(x^k) + \nabla f(x^k)^T(x - x^k) + \frac{L}{2}\|x - x^k\|^2 : x_{I_{k+1}} = 0\right\}.$$
Therefore, for $j = 1,\dots,J$ and $k = n_{j-1},\dots,n_j - 1$,
$$x^{k+1} = \arg\min_{x \in B}\left\{f(x^k) + \nabla f(x^k)^T(x - x^k) + \frac{L}{2}\|x - x^k\|^2 : x_{I_{n_j}} = 0\right\}.$$
We arbitrarily choose $1 \le j \le J$. Let $\bar{x}^*$ (depending on $j$) denote the optimal solution of
$$\min_{x \in B}\{f(x) : x_{I_{n_j}} = 0\}. \tag{37}$$
Also, it follows from (33) and the monotonicity of $\{F(x^k)\}$ that
$$F(x^{n_j+1}) \ \le\ F(x^0) - \frac{j}{2}(L - L_f)\delta^2, \quad j = 1,\dots,J. \tag{38}$$

Using these relations and the fact that $F(\bar{x}^*) \ge F^*$, we have
$$f(x^{n_{j-1}+1}) - f(\bar{x}^*) \ =\ F(x^{n_{j-1}+1}) - \lambda\|x^{n_{j-1}+1}\|_0 - F(\bar{x}^*) + \lambda\|\bar{x}^*\|_0 \ \le\ F(x^0) - \frac{j-1}{2}(L - L_f)\delta^2 - F^*. \tag{39}$$
Suppose for a contradiction that (36) does not hold for some $1 \le j \le J$. Then we have
$$n_j - n_{j-1} \ >\ 2 + 2\left\lceil\frac{L}{\sigma}\right\rceil\left[\log\Big(F(x^0) - \frac{(j-1)(L - L_f)\delta^2}{2} - F^*\Big) - \log\gamma\right].$$
This inequality and (39) yield
$$n_j - n_{j-1} \ >\ 2 + 2\left\lceil\frac{L}{\sigma}\right\rceil\log\frac{f(x^{n_{j-1}+1}) - f(\bar{x}^*)}{\gamma}.$$
Using the strong convexity of $f$ and applying Theorem 2.3(ii) to (37) with $\epsilon = \gamma$, we obtain that
$$\frac{\sigma}{2}\|x^{n_j} - \bar{x}^*\|^2 \ \le\ f(x^{n_j}) - f(\bar{x}^*) \ <\ \gamma \ =\ \frac{\sigma\big(\sqrt{2\alpha + \beta^2} - \beta\big)^2}{32}.$$
It implies that
$$\|x^{n_j} - \bar{x}^*\| \ <\ \frac{\sqrt{2\alpha + \beta^2} - \beta}{4}. \tag{40}$$
Using (40), Lemma 3.1 and the definition of $\beta$, we have
$$\left|[s_L(x^{n_j})]_i^2 - [s_L(\bar{x}^*)]_i^2 - [\Pi_B(s_L(x^{n_j})) - s_L(x^{n_j})]_i^2 + [\Pi_B(s_L(\bar{x}^*)) - s_L(\bar{x}^*)]_i^2\right| \ \le\ \left|[s_L(x^{n_j})]_i^2 - [s_L(\bar{x}^*)]_i^2\right| + \left|[\Pi_B(s_L(x^{n_j})) - s_L(x^{n_j})]_i^2 - [\Pi_B(s_L(\bar{x}^*)) - s_L(\bar{x}^*)]_i^2\right| \ \le\ 4\big(2\|x^{n_j} - \bar{x}^*\| + \beta\big)\|x^{n_j} - \bar{x}^*\| \ <\ \alpha, \tag{41}$$
where the last inequality is due to (40). Let
$$I = \left\{i : [s_L(\bar{x}^*)]_i^2 - [\Pi_B(s_L(\bar{x}^*)) - s_L(\bar{x}^*)]_i^2 < \frac{2\lambda}{L}\right\}$$
and let $\bar{I} = \{1,\dots,n\}\setminus I$. Since $\alpha > 0$, we know that
$$[s_L(\bar{x}^*)]_i^2 - [\Pi_B(s_L(\bar{x}^*)) - s_L(\bar{x}^*)]_i^2 \ >\ \frac{2\lambda}{L}, \quad \forall i \in \bar{I}.$$
It then follows from (41) and the definition of $\alpha$ that
$$[s_L(x^{n_j})]_i^2 - [\Pi_B(s_L(x^{n_j})) - s_L(x^{n_j})]_i^2 \ <\ \frac{2\lambda}{L},\ \ \forall i \in I, \qquad [s_L(x^{n_j})]_i^2 - [\Pi_B(s_L(x^{n_j})) - s_L(x^{n_j})]_i^2 \ >\ \frac{2\lambda}{L},\ \ \forall i \in \bar{I}.$$
Observe that $[\Pi_B(s_L(x^{n_j}))]_i \ne 0$ for all $i \in \bar{I}$. This fact together with (21) implies that
$$x_i^{n_j+1} = 0,\ \ \forall i \in I \qquad \text{and} \qquad x_i^{n_j+1} \ne 0,\ \ \forall i \in \bar{I}.$$

By a similar argument, one can show that $x_i^{n_j} = 0$ for all $i \in I$ and $x_i^{n_j} \ne 0$ for all $i \in \bar{I}$. Hence, $I_{n_j} = I_{n_j+1} = I$, which contradicts (32). We thus conclude that (36) holds.

Let $N_\epsilon$ denote the total number of iterations needed by the IHT method to find an $\epsilon$-local-optimal solution $x_\epsilon \in B$ satisfying $I(x_\epsilon) = I(x^*)$ and $F(x_\epsilon) \le F^* + \epsilon$. We next establish an upper bound on $N_\epsilon$. Summing up the inequalities (36) for $j = 1,\dots,J$, we obtain that
$$n_J \ \le\ \sum_{j=1}^{J}\left\{2 + 2\left\lceil\frac{L}{\sigma}\right\rceil\left[\log\Big(F(x^0) - \frac{j-1}{2}(L - L_f)\delta^2 - F^*\Big) - \log\gamma\right]\right\}.$$
Using this inequality, (34), and the facts that $L \ge \sigma$ and $\log(1 - t) \le -t$ for all $t \in (0,1)$, we have
$$n_J \ \le\ \sum_{j=1}^{J}\left[2 + 2\left\lceil\frac{L}{\sigma}\right\rceil\Big(\log(F(x^0) - F^*) - \frac{(L - L_f)\delta^2}{2(F(x^0) - F^*)}(j-1) - \log\gamma\Big)\right] \ \le\ \left\lceil\frac{L}{\sigma}\right\rceil\big(dJ - cJ^2\big), \tag{42}$$
where $c$ and $d$ are given in (31). By the definition of $n_J$, we observe that after $n_J + 1$ iterations, the IHT method reduces to the projected gradient method applied to the problem
$$x^* = \arg\min_{x \in B}\{f(x) : x_{I_{n_J+1}} = 0\}.$$
In addition, we know from Theorem 3.3 that $I(x^k) = I(x^*)$ for all $k > n_J$. Hence, $f(x^k) - f(x^*) = F(x^k) - F^*$ when $k > n_J$. Using these facts and Theorem 2.3(ii), we have
$$N_\epsilon \ \le\ n_J + 1 + 2\left\lceil\frac{L}{\sigma}\right\rceil\log\frac{F(x^{n_J+1}) - F^*}{\epsilon}.$$
Using this inequality, (38), (42) and the facts that $F(x^{n_J+1}) \ge F^*$, $L \ge \sigma$ and $\log(1 - t) \le -t$ for all $t \in (0,1)$, we obtain that
$$N_\epsilon \ \le\ n_J + 1 + 2\left\lceil\frac{L}{\sigma}\right\rceil\left[\log\Big(F(x^0) - \frac{J}{2}(L - L_f)\delta^2 - F^*\Big) - \log\epsilon\right] \ \le\ n_J + \left\lceil\frac{L}{\sigma}\right\rceil\left[2\log(F(x^0) - F^*) - \frac{(L - L_f)\delta^2}{F(x^0) - F^*}J + 3 - 2\log\epsilon\right] \ \le\ \left\lceil\frac{L}{\sigma}\right\rceil\left[(d - 2c)J - cJ^2 + 2\log(F(x^0) - F^*) + 3 - 2\log\epsilon\right],$$

which together with (35) and (30) implies that
$$N_\epsilon \ \le\ 2\left\lceil\frac{L}{\sigma}\right\rceil\log\frac{\theta}{\epsilon}.$$

The iteration complexity given in Theorem 3.4 relies on the assumption that $f$ is strongly convex on $B$. We next consider the case where $B$ is bounded and $f$ is convex but not strongly convex. We will establish the iteration complexity of finding an $\epsilon$-local-optimal solution of (12) by the IHT method applied to a perturbation of (12), obtained by adding a small strongly convex regularization term to $f$.

Consider a perturbation of (12) in the form of
$$F_\nu^* := \min_{x \in B}\left\{F_\nu(x) := f_\nu(x) + \lambda\|x\|_0\right\}, \tag{43}$$
where $\nu > 0$ and
$$f_\nu(x) := f(x) + \frac{\nu}{2}\|x\|^2.$$
One can easily see that $f_\nu$ is strongly convex on $B$ with modulus $\nu$, and moreover $\nabla f_\nu$ is Lipschitz continuous with constant
$$L_\nu = L_f + \nu. \tag{44}$$

We next establish the iteration complexity of finding an $\epsilon$-local-optimal solution of (12) by the IHT method applied to (43). Given any $L > 0$, let $s_L$, $\alpha$ and $\beta$ be defined according to (16), (28) and (29), respectively, with $f$ replaced by $f_\nu$, and let $\delta$ be defined in (19).

Theorem 3.5 Suppose that $B$ is bounded and $f$ is convex but not strongly convex. Let $\epsilon > 0$ be arbitrarily given, $D = \max\{\|x\| : x \in B\}$, $\nu = \epsilon/D^2$, and let $L > L_\nu$ be chosen such that $\alpha > 0$. Let $\{x^k\}$ be generated by the IHT method applied to (43), and let $x^* = \lim_{k \to \infty}x^k$, $F_\nu^* = F_\nu(x^*)$ and $F^* = \min\{F(x) : x \in B_{I^*}\}$, where $I^* = \{i : x_i^* = 0\}$. Then the total number of iterations of the IHT method for finding an $\epsilon$-local-optimal solution $x_\epsilon \in B$ satisfying $F(x_\epsilon) \le F^* + \epsilon$ is at most $2\left\lceil\frac{D^2 L_f}{\epsilon} + 1\right\rceil\log\frac{2\theta}{\epsilon}$, where
$$\theta = \big(F_\nu(x^0) - F_\nu^*\big)\,2^{\frac{\omega+3}{2}}, \qquad \omega = \max_t\left\{(d - 2c)t - ct^2 : 0 \le t \le \frac{2(F_\nu(x^0) - F_\nu^*)}{(L - L_\nu)\delta^2}\right\},$$
$$c = \frac{(L - L_\nu)\delta^2}{2(F_\nu(x^0) - F_\nu^*)}, \qquad \gamma = \frac{\nu\big(\sqrt{2\alpha + \beta^2} - \beta\big)^2}{32}, \qquad d = 2\log(F_\nu(x^0) - F_\nu^*) + 4 - 2\log\gamma + c.$$

Proof. By Theorem 3.4(ii), the IHT method applied to (43) finds an $\epsilon/2$-local-optimal solution $x_\epsilon \in B$ of (43) satisfying $I(x_\epsilon) = I(x^*)$ and $F_\nu(x_\epsilon) \le F_\nu^* + \epsilon/2$ within $2\lceil L_\nu/\nu\rceil\log\frac{2\theta}{\epsilon}$ iterations. From the proof of Theorem 3.3, we observe that
$$F_\nu(x^*) = \min\{F_\nu(x) : x \in B_{I^*}\}.$$

Hence, we have
$$F_\nu^* = F_\nu(x^*) \ \le\ \min_{x \in B_{I^*}}F(x) + \frac{\nu D^2}{2} \ =\ F^* + \frac{\epsilon}{2}.$$
In addition, we observe that $F(x_\epsilon) \le F_\nu(x_\epsilon)$. Hence, it follows that
$$F(x_\epsilon) \ \le\ F_\nu(x_\epsilon) \ \le\ F_\nu^* + \frac{\epsilon}{2} \ \le\ F^* + \epsilon.$$
Note that $F^*$ is a local optimal value of (12). Hence, $x_\epsilon$ is an $\epsilon$-local-optimal solution of (12). The conclusion of this theorem then follows from (44) and $\nu = \epsilon/D^2$.

In the above IHT method, a fixed $L$ is used throughout all iterations, which may be too conservative. To improve its practical performance, we can use a local $L$ that is updated dynamically. The resulting variant of the method is presented as follows.

A variant of the IHT method for (12):

Let $0 < L_{\min} < L_{\max}$, $\tau > 1$ and $\eta > 0$ be given. Choose an arbitrary $x^0 \in B$ and set $k = 0$.

1) Choose $L_k^0 \in [L_{\min}, L_{\max}]$ arbitrarily. Set $L_k = L_k^0$.

1a) Solve the subproblem
$$x^{k+1} \in \operatorname{Argmin}_{x \in B}\left\{f(x^k) + \nabla f(x^k)^T(x - x^k) + \frac{L_k}{2}\|x - x^k\|^2 + \lambda\|x\|_0\right\}. \tag{45}$$

1b) If
$$F(x^k) - F(x^{k+1}) \ \ge\ \frac{\eta}{2}\|x^{k+1} - x^k\|^2 \tag{46}$$
is satisfied, then go to step 2).

1c) Set $L_k \leftarrow \tau L_k$ and go to step 1a).

2) Set $k \leftarrow k+1$ and go to step 1).

end

Remark. $L_k^0$ can be chosen by a scheme similar to that used in [1, 4], that is,
$$L_k^0 = \max\left\{L_{\min}, \min\left\{L_{\max}, \frac{\Delta f^T\Delta x}{\|\Delta x\|^2}\right\}\right\},$$
where $\Delta x = x^k - x^{k-1}$ and $\Delta f = \nabla f(x^k) - \nabla f(x^{k-1})$.

At each iteration, the IHT method solves a single subproblem in step 1). In contrast, its variant needs to solve a sequence of subproblems. We next show that, for each outer iteration, the number of inner iterations is finite.
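The variant above can be sketched as follows: at each outer iteration, try a trial $L_k$, accept the hard-thresholded point only if the decrease test (46) holds, and otherwise multiply $L_k$ by $\tau$ (step 1c). For simplicity the sketch keeps $L_k^0$ fixed rather than using the Barzilai-Borwein-style rule of the remark; all function names and test data are illustrative.

```python
import numpy as np

def iht_step(x, grad_f, L, lam, lo, hi):
    """Closed-form solution (21) of the subproblem for a given L."""
    s = x - grad_f(x) / L
    p = np.clip(s, lo, hi)
    return np.where(s**2 - (p - s)**2 > 2.0 * lam / L, p, 0.0)

def iht_variant(f, grad_f, x0, lam, lo, hi, L0=1.0, tau=2.0, eta=1e-3,
                iters=100):
    F = lambda x: f(x) + lam * np.count_nonzero(x)
    x = x0
    for _ in range(iters):
        L = L0                             # trial value L_k^0
        while True:
            x_new = iht_step(x, grad_f, L, lam, lo, hi)
            if F(x) - F(x_new) >= 0.5 * eta * np.sum((x_new - x)**2):
                break                      # decrease test (46) satisfied
            L *= tau                       # step 1c): increase L_k and retry
        x = x_new
    return x

# Example: f(x) = 0.5*||x - c||^2 over the box [0, 4]^2.
c = np.array([5.0, 0.1])
x = iht_variant(lambda x: 0.5 * np.sum((x - c)**2), lambda x: x - c,
                np.zeros(2), lam=0.5, lo=0.0, hi=4.0)
print(x)  # -> [4. 0.]
```

Theorem 3.6 below guarantees the inner `while` loop terminates, since (46) must hold once $L_k \ge L_f + \eta$.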

Theorem 3.6 For each $k \ge 0$, the inner termination criterion (46) is satisfied after at most
$$\left\lceil\frac{\log(L_f + \eta) - \log(L_{\min})}{\log\tau}\right\rceil + 2$$
inner iterations.

Proof. Let $\bar{L}_k$ denote the final value of $L_k$ at the $k$th outer iteration. By (45) and arguments similar to those used for deriving (24), one can show that
$$F(x^k) - F(x^{k+1}) \ \ge\ \frac{\bar{L}_k - L_f}{2}\|x^{k+1} - x^k\|^2.$$
Hence, (46) holds whenever $\bar{L}_k \ge L_f + \eta$, which together with the definition of $\bar{L}_k$ implies that $\bar{L}_k/\tau < L_f + \eta$, that is, $\bar{L}_k < \tau(L_f + \eta)$. Let $n_k$ denote the number of inner iterations for the $k$th outer iteration. Then we have
$$L_{\min}\,\tau^{n_k - 1} \ \le\ L_k^0\,\tau^{n_k - 1} = \bar{L}_k \ <\ \tau(L_f + \eta).$$
Hence, $n_k \le \left\lceil\frac{\log(L_f + \eta) - \log(L_{\min})}{\log\tau}\right\rceil + 2$, and the conclusion holds.

We next establish that the sequence $\{x^k\}$ generated by the above variant of the IHT method converges to a local minimizer of (12) and, moreover, $F(x^k)$ converges to a local minimum value of (12).

Theorem 3.7 Let $\{x^k\}$ be generated by the above variant of the IHT method. Then $\{x^k\}$ converges to a local minimizer $x^*$ of problem (12), and moreover, $I(x^k) \to I(x^*)$, $\|x^k\|_0 \to \|x^*\|_0$ and $F(x^k) \to F(x^*)$.

Proof. Let $\bar{L}_k$ denote the final value of $L_k$ at the $k$th outer iteration. From the proof of Theorem 3.6, we know that $\bar{L}_k \in [L_{\min}, \tau(L_f + \eta))$. Using this fact and an argument similar to that used to prove (19), one can obtain that
$$x_j^{k+1} \ne 0 \ \Longrightarrow\ |x_j^{k+1}| \ \ge\ \bar{\delta} := \min_{i \notin I_0}\bar{\delta}_i > 0,$$
where $I_0 = \{i : l_i = u_i = 0\}$ and $\bar{\delta}_i$ is defined according to (20) with $L$ replaced by $\tau(L_f + \eta)$ for all $i \notin I_0$. It implies that $\|x^{k+1} - x^k\| \ge \bar{\delta}$ if $I(x^k) \ne I(x^{k+1})$. The conclusion then follows from this inequality and arguments similar to those used in the proof of Theorem 3.3.

4 $\ell_0$ regularized convex cone programming

In this section we consider the $\ell_0$ regularized convex cone programming problem (2) and propose IHT methods for solving it. In particular, we apply the IHT method proposed in Section

3 to a quadratic penalty relaxation of (2) and establish the iteration complexity for finding an $\epsilon$-approximate local minimizer of (2). We also propose a variant of the method in which the associated penalty parameter is dynamically updated, and show that every accumulation point is a local minimizer of (2).

Let $B$ be defined in (13). We assume that $f$ is a smooth convex function on $B$, $\nabla f$ is Lipschitz continuous with constant $L_f$, and $f$ is bounded below on $B$. In addition, we make the following assumption throughout this section.

Assumption 1 For each $I \subseteq \{1,\dots,n\}$, there exists a Lagrange multiplier for
$$f_I^* = \min\{f(x) : Ax - b \in K^*,\ x \in B_I\}, \tag{47}$$
provided that (47) is feasible; that is, there exists $\mu^* \in -K$ such that $f_I^* = d_I(\mu^*)$, where
$$d_I(\mu) := \inf\{f(x) + \mu^T(Ax - b) : x \in B_I\}, \quad \forall\mu \in -K.$$

Let $x^*$ be a point in $B$, and let $I^* = \{i : x_i^* = 0\}$. One can observe that $x^*$ is a local minimizer of (2) if and only if $x^*$ is a minimizer of (47) with $I = I^*$. Then, in view of Assumption 1, we see that $x^*$ is a local minimizer of (2) if and only if $x^* \in B$ and there exists $\mu^* \in -K$ such that
$$Ax^* - b \in K^*, \qquad (\mu^*)^T(Ax^* - b) = 0, \qquad -\left[\nabla f(x^*) + A^T\mu^*\right] \in N_{B_{I^*}}(x^*). \tag{48}$$
Based on the above observation, we can define an approximate local minimizer of (2) to be one that nearly satisfies (48).

Definition 1 Let $x^*$ be a point in $B$, and let $I^* = \{i : x_i^* = 0\}$. We say that $x^*$ is an $\epsilon$-approximate local minimizer of (2) if there exists $\mu^* \in -K$ such that
$$d_{K^*}(Ax^* - b) \le \epsilon, \qquad (\mu^*)^T\Pi_{K^*}(Ax^* - b) = 0, \qquad -\left[\nabla f(x^*) + A^T\mu^*\right] \in N_{B_{I^*}}(x^*) + U(\epsilon).$$

In what follows, we propose an IHT method for finding an approximate local minimizer of (2). In particular, we apply the IHT method or its variant to a quadratic penalty relaxation of (2), which is in the form of
$$\Psi_\rho^* := \min_{x \in B}\left\{\Psi_\rho(x) := \Phi_\rho(x) + \lambda\|x\|_0\right\}, \tag{49}$$
where
$$\Phi_\rho(x) := f(x) + \frac{\rho}{2}\left[d_{K^*}(Ax - b)\right]^2. \tag{50}$$
It is not hard to show that the function $\Phi_\rho$ is convex and differentiable, and moreover $\nabla\Phi_\rho$ is Lipschitz continuous with constant
$$L_\rho = L_f + \rho\|A\|^2 \tag{51}$$

(see, for example, Proposition 8 and Corollary 9 of [9]). Therefore, problem (49) can be suitably solved by the IHT method or its variant proposed in Section 3.

Under the assumption that $f$ is strongly convex on $B$, we next establish the iteration complexity of the IHT method applied to (49) for finding an approximate local minimizer of (2). Given any $L > 0$, let $s_L$, $\alpha$ and $\beta$ be defined according to (16), (28) and (29), respectively, with $f$ replaced by $\Phi_\rho$, and let $\delta$ be defined in (19).

Theorem 4.1 Assume that $f$ is a smooth strongly convex function with modulus $\sigma > 0$. Given any $\epsilon > 0$, let
$$\rho \ =\ \frac{t}{\epsilon} + \frac{1}{2\|A\|} \tag{52}$$
for any $t \ge \max_{I \subseteq \{1,\dots,n\}}\min_{\mu \in \Lambda_I}\|\mu\|$, where $\Lambda_I$ is the set of Lagrange multipliers of (47). Let $L > L_\rho$ be chosen such that $\alpha > 0$. Let $\{x^k\}$ be generated by the IHT method applied to (49), and let $x^* = \lim_{k \to \infty}x^k$ and $\Psi_\rho^* = \Psi_\rho(x^*)$. Then the IHT method finds an $\epsilon$-approximate local minimizer of (2) in at most
$$N := 2\left\lceil\frac{L_\rho}{\sigma}\right\rceil\log\frac{8L_\rho\theta}{\epsilon^2}$$
iterations, where
$$\theta = \big(\Psi_\rho(x^0) - \Psi_\rho^*\big)\,2^{\frac{\omega+3}{2}}, \qquad \omega = \max_t\left\{(d - 2c)t - ct^2 : 0 \le t \le \frac{2(\Psi_\rho(x^0) - \Psi_\rho^*)}{(L - L_\rho)\delta^2}\right\},$$
$$c = \frac{(L - L_\rho)\delta^2}{2(\Psi_\rho(x^0) - \Psi_\rho^*)}, \qquad \gamma = \frac{\sigma\big(\sqrt{2\alpha + \beta^2} - \beta\big)^2}{32}, \qquad d = 2\log(\Psi_\rho(x^0) - \Psi_\rho^*) + 4 - 2\log\gamma + c.$$

Proof. We know from Theorem 3.3 that $x^k \to x^*$ for some local minimizer $x^*$ of (49), $I(x^k) \to I(x^*)$ and $\Psi_\rho(x^k) \to \Psi_\rho(x^*) = \Psi_\rho^*$. By Theorem 3.4, after at most $N$ iterations the IHT method generates $\tilde{x} \in B$ such that $I(\tilde{x}) = I(x^*)$ and $\Psi_\rho(\tilde{x}) - \Psi_\rho(x^*) \le \xi := \epsilon^2/(8L_\rho)$. It then follows that $\Phi_\rho(\tilde{x}) - \Phi_\rho(x^*) \le \xi$. Since $x^*$ is a local minimizer of (49), we observe that
$$x^* = \arg\min_{x \in B_{I^*}}\Phi_\rho(x), \tag{53}$$
where $I^* = I(x^*)$. Hence, $\tilde{x}$ is a $\xi$-approximate solution of (53). Let $\mu^* \in \operatorname{Argmin}\{\|\mu\| : \mu \in \Lambda_{I^*}\}$, where $\Lambda_{I^*}$ is the set of Lagrange multipliers of (47) with $I = I^*$.
In view of Lemma 2.5, we see that the pair $(\tilde{x}^+, \mu)$ defined as $\tilde{x}^+ := \Pi_{B_{I^*}}(\tilde{x} - \nabla\Phi_\rho(\tilde{x})/L_\rho)$ and $\mu := \rho[A\tilde{x}^+ - b - \Pi_{K^*}(A\tilde{x}^+ - b)]$ satisfies
$$-\left[\nabla f(\tilde{x}^+) + A^T\mu\right] \in N_{B_{I^*}}(\tilde{x}^+) + U\big(2\sqrt{2L_\rho\xi}\big) = N_{B_{I^*}}(\tilde{x}^+) + U(\epsilon),$$
$$d_{K^*}(A\tilde{x}^+ - b) \ \le\ \frac{1}{\rho}\|\mu^*\| + \sqrt{\frac{2\xi}{\rho}} \ \le\ \frac{1}{\rho}\left(\|\mu^*\| + \frac{\epsilon}{2\|A\|}\right) \ \le\ \epsilon,$$
where the last inequality is due to (52) and the assumption $t \ge \|\mu^*\|$. Hence, $\tilde{x}^+$ is an $\epsilon$-approximate local minimizer of (2).

We next consider finding an $\epsilon$-approximate local minimizer of (2) for the case where $B$ is bounded and $f$ is convex but not strongly convex. In particular, we apply the IHT method or its variant to a quadratic penalty relaxation of a perturbation of (2), obtained by adding a small strongly convex regularization term to $f$. Consider a perturbation of (2) in the form of
$$\min_{x \in B}\left\{f(x) + \frac{\nu}{2}\|x\|^2 + \lambda\|x\|_0 : Ax - b \in K^*\right\}. \tag{54}$$
The associated quadratic penalty problem for (54) is given by
$$\Psi_{\rho,\nu}^* := \min_{x \in B}\left\{\Psi_{\rho,\nu}(x) := \Phi_{\rho,\nu}(x) + \lambda\|x\|_0\right\}, \tag{55}$$
where
$$\Phi_{\rho,\nu}(x) := f(x) + \frac{\nu}{2}\|x\|^2 + \frac{\rho}{2}\left[d_{K^*}(Ax - b)\right]^2.$$
One can easily see that $\Phi_{\rho,\nu}$ is strongly convex on $B$ with modulus $\nu$, and moreover $\nabla\Phi_{\rho,\nu}$ is Lipschitz continuous with constant $L_{\rho,\nu} := L_f + \rho\|A\|^2 + \nu$. Clearly, the IHT method or its variant can be suitably applied to (55).

We next establish the iteration complexity of the IHT method applied to (55) for finding an approximate local minimizer of (2). Given any $L > 0$, let $s_L$, $\alpha$ and $\beta$ be defined according to (16), (28) and (29), respectively, with $f$ replaced by $\Phi_{\rho,\nu}$, and let $\delta$ be defined in (19).

Theorem 4.2 Suppose that $B$ is bounded and $f$ is convex but not strongly convex. Let $\epsilon > 0$ be arbitrarily given, $D = \max\{\|x\| : x \in B\}$, and
$$\rho \ =\ \frac{2(t + D)}{\epsilon} + \frac{1}{2\|A\|}, \qquad \nu \ =\ \frac{\epsilon}{2D} \tag{56}$$
for any $t \ge \max_{I \subseteq \{1,\dots,n\}}\min_{\mu \in \Lambda_I}\|\mu\|$, where $\Lambda_I$ is the set of Lagrange multipliers of (47). Let $L > L_{\rho,\nu}$ be chosen such that $\alpha > 0$. Let $\{x^k\}$ be generated by the IHT method applied to (55), and let $x^* = \lim_{k \to \infty}x^k$ and $\Psi_{\rho,\nu}^* = \Psi_{\rho,\nu}(x^*)$. Then the IHT method finds an $\epsilon$-approximate local minimizer of (2) in at most
$$N := 2\left\lceil\frac{2DL_{\rho,\nu}}{\epsilon}\right\rceil\log\frac{32L_{\rho,\nu}\theta}{\epsilon^2}$$
iterations, where
$$\theta = \big(\Psi_{\rho,\nu}(x^0) - \Psi_{\rho,\nu}^*\big)\,2^{\frac{\omega+3}{2}}, \qquad \omega = \max_t\left\{(d - 2c)t - ct^2 : 0 \le t \le \frac{2(\Psi_{\rho,\nu}(x^0) - \Psi_{\rho,\nu}^*)}{(L - L_{\rho,\nu})\delta^2}\right\},$$
$$c = \frac{(L - L_{\rho,\nu})\delta^2}{2(\Psi_{\rho,\nu}(x^0) - \Psi_{\rho,\nu}^*)}, \qquad \gamma = \frac{\nu\big(\sqrt{2\alpha + \beta^2} - \beta\big)^2}{32}, \qquad d = 2\log(\Psi_{\rho,\nu}(x^0) - \Psi_{\rho,\nu}^*) + 4 - 2\log\gamma + c.$$

Proof. From Theorem 3.3, we know that x^k → x* for some local minimizer x* of (55), I(x^k) → I(x*) and Ψ_{ρ,ν}(x^k) → Ψ_{ρ,ν}(x*) = Ψ*_{ρ,ν}. By Theorem 3.4, after at most N iterations, the IHT method applied to (55) generates x̃ ∈ B such that I(x̃) = I(x*) and Ψ_{ρ,ν}(x̃) − Ψ_{ρ,ν}(x*) ≤ ξ := ε²/(32L_{ρ,ν}). It then follows that Φ_{ρ,ν}(x̃) − Φ_{ρ,ν}(x*) ≤ ξ. Since x* is a local minimizer of (55), we see that

    x* = arg min_{x∈B_I} Φ_{ρ,ν}(x),    (57)

where I = I(x*). Hence, x̃ is a ξ-approximate solution of (57). In view of Lemma 2.5, we see that the pair (x̃⁺, μ̃) defined as x̃⁺ := Π_{B_I}(x̃ − ∇Φ_{ρ,ν}(x̃)/L_{ρ,ν}) and μ̃ := ρ[Ax̃⁺ − b − Π_K(Ax̃⁺ − b)] satisfies

    ∇f(x̃⁺) + νx̃⁺ + Aᵀμ̃ ∈ −N_{B_I}(x̃⁺) + U(2√(2L_{ρ,ν} ξ)) = −N_{B_I}(x̃⁺) + U(ε/2),

which together with the fact that ‖νx̃⁺‖ ≤ νD ≤ ε/2 implies that

    ∇f(x̃⁺) + Aᵀμ̃ ∈ −N_{B_I}(x̃⁺) − νx̃⁺ + U(ε/2) ⊆ −N_{B_I}(x̃⁺) + U(ε).

In addition, it follows from Lemma 2.1 (c) that Φ_{ρ,ν}(x̃⁺) ≤ Φ_{ρ,ν}(x̃), and hence

    Φ_{ρ,ν}(x̃⁺) − Φ_{ρ,ν}(x*) ≤ Φ_{ρ,ν}(x̃) − Φ_{ρ,ν}(x*) ≤ ξ.

Let Φ*_ρ = min{Φ_ρ(x) : x ∈ B_I}, where Φ_ρ is defined in (50). Notice that Φ_{ρ,ν}(x*) ≤ Φ*_ρ + νD²/2. It then follows that

    Φ_ρ(x̃⁺) − Φ*_ρ ≤ Φ_{ρ,ν}(x̃⁺) − Φ_{ρ,ν}(x*) + νD²/2 ≤ ξ + εD/4 ≤ ε²/(32ρ‖A‖²) + εD/4.

Let μ̄ ∈ Arg min{‖μ‖ : μ ∈ Λ_I}, where Λ_I is the set of Lagrange multipliers of (47) with I = I(x*). In view of Lemma 2.5 and the assumption t ≥ t̂ ≥ ‖μ̄‖, we obtain that

    d_K(Ax̃⁺ − b) ≤ (1/ρ)‖μ̄‖ + ε²/(32ρ²‖A‖²) + εD/(4ρ) ≤ (1/ρ)(t + ε/(32‖A‖)) + εD/(4ρ) ≤ ε,

where the last inequality is due to (56). Hence, x̃⁺ is an ε-approximate local minimizer of (2).

For the above method, a fixed penalty parameter ρ is used throughout all iterations, which may be too conservative. To improve its practical performance, we can update ρ dynamically. The resulting variant of the method is presented below. Before proceeding, we define the projected gradient of Φ_ρ at x ∈ B_I with respect to B_I as

    g(x; ρ, I) = L_ρ [x − Π_{B_I}(x − ∇Φ_ρ(x)/L_ρ)],    (58)

where I ⊆ {1,…,n}, and Φ_ρ and L_ρ are defined in (50) and (51), respectively.
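For intuition on the inner solver used throughout this section, the basic IHT step for an l₀-regularized problem over a box can be sketched as follows. This is a minimal sketch under the assumption that the box B is separable (B = [lo, hi] componentwise), so the hard-thresholding subproblem splits across coordinates; Φ stands for whichever penalty function (Φ_ρ or Φ_{ρ,ν}) is being minimized, and its gradient is passed in.

```python
import numpy as np

def iht_step(x, grad, L, lam, lo, hi):
    """One iterative hard thresholding step for min Phi(x) + lam*||x||_0 over the
    box [lo, hi]: with y = x - grad/L, solve coordinatewise
        min_{z_i in [lo_i, hi_i]}  (L/2)*(z_i - y_i)^2 + lam*[z_i != 0].
    (Sketch assuming a separable box; grad = grad Phi(x), L = Lipschitz constant.)
    """
    y = x - grad / L
    p = np.clip(y, lo, hi)                                 # box projection of the gradient step
    cost_keep = 0.5 * L * (p - y) ** 2 + lam * (p != 0)    # objective value if coordinate stays at p
    cost_zero = np.where((lo <= 0) & (0 <= hi),            # zeroing is allowed only if 0 lies in the box
                         0.5 * L * y ** 2, np.inf)
    return np.where(cost_zero < cost_keep, 0.0, p)         # hard-threshold coordinatewise
```

Coordinates whose quadratic decrease does not pay for the l₀ charge λ are set exactly to zero, which is why the support I(x^k) stabilizes after finitely many iterations.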

A variant of the IHT method for (2):

Let {ε_k} be a positive decreasing sequence. Let ρ₀ > 0, τ > 1, and t > max_{I⊆{1,…,n}} min_{μ∈Λ_I} ‖μ‖, where Λ_I is the set of Lagrange multipliers of (47). Choose an arbitrary x⁰ ∈ B. Set k = 0.

1) Start from x^{k−1} and apply the IHT method or its variant to problem (49) with ρ = ρ_k until finding some x^k ∈ B such that

    d_K(Ax^k − b) ≤ t/ρ_k,    ‖g(x^k; ρ_k, I_k)‖ ≤ min{1, L_{ρ_k}} ε_k,    (59)

where I_k = I(x^k).

2) Set ρ_{k+1} := τρ_k.

3) Set k ← k + 1 and go to step 1).

end

The following theorem shows that x^k satisfying (59) can be found within a finite number of iterations by the IHT method or its variant applied to problem (49) with ρ = ρ_k. Without loss of generality, we consider the IHT method or its variant applied to problem (49) with any given ρ > 0.

Theorem 4.3 Let x⁰ ∈ B be an arbitrary point and let the sequence {x^l} be generated by the IHT method or its variant applied to problem (49). Then the following statements hold:

(i) lim_{l→∞} ‖g(x^l; ρ, I_l)‖ = 0, where I_l = I(x^l) for all l.

(ii) lim sup_{l→∞} d_K(Ax^l − b) ≤ t̂/ρ, where t̂ := max_{I⊆{1,…,n}} min_{μ∈Λ_I} ‖μ‖ and Λ_I is the set of Lagrange multipliers of (47).

Proof. (i) It follows from Theorems 3.3 and 3.7 that x^l → x* for some local minimizer x* of (49); moreover, Φ_ρ(x^l) → Φ_ρ(x*) and I_l → I*, where I_l = I(x^l) and I* = I(x*). We also know that x* ∈ Arg min_{x∈B_{I*}} Φ_ρ(x). It then follows from Lemma 2.1 (d) that

    Φ_ρ(x^l) − Φ_ρ(x*) ≥ (1/(2L_ρ)) ‖g(x^l; ρ, I*)‖²

for all sufficiently large l. Using this inequality and Φ_ρ(x^l) → Φ_ρ(x*), we thus have g(x^l; ρ, I*) → 0. Since I_l = I* for all sufficiently large l, we also have g(x^l; ρ, I_l) → 0.
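The outer loop above can be illustrated on a toy instance. The following is a sketch under stated assumptions, not the paper's method: the problem min x² s.t. x ≥ 1 (so K = ℝ₊, A = 1, b = 1, Lagrange multiplier μ* = 2, hence any t > 2 works) is chosen so that the inner penalty subproblem min_x x² + (ρ/2) max(1 − x, 0)² has a closed-form solution; in the actual method the inner problem is solved by the IHT method or its variant until the tests in (59) hold.

```python
def dynamic_penalty_outer_loop(rho0=1.0, tau=4.0, t=2.1, n_outer=12):
    """Outer loop of the variant: solve the penalty subproblem for rho_k, check
    the feasibility test in (59), then set rho_{k+1} = tau*rho_k.
    Toy problem (an assumption): min x^2 s.t. x >= 1, with multiplier mu* = 2.
    """
    x, rho = 0.0, rho0
    for _ in range(n_outer):
        x = rho / (2.0 + rho)            # closed-form minimizer of x^2 + (rho/2)*max(1-x,0)^2
        d_K = max(1.0 - x, 0.0)          # distance of Ax - b to the cone K = R_+
        assert d_K <= t / rho            # first termination test in (59); holds since t > mu* = 2
        rho *= tau                       # step 2): increase the penalty parameter
    return x                             # approaches the true solution x* = 1
```

Each outer pass tightens feasibility at rate O(1/ρ_k), consistent with Theorem 4.3 (ii), while the iterates converge to the constrained minimizer.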

(ii) Let f*_I be defined in (47). Applying Lemma 2.4 to problem (47), we know that

    f(x^l) ≥ f*_{I_l} − t̂ d_K(Ax^l − b)  for all l,    (60)

where t̂ is defined above. Let x* and I* be defined as in the proof of statement (i). We observe that f*_{I*} ≥ Φ_ρ(x*). Using this relation and (60), we have that for sufficiently large l,

    Φ_ρ(x^l) − Φ_ρ(x*) = f(x^l) + (ρ/2)[d_K(Ax^l − b)]² − Φ_ρ(x*)
                       ≥ f(x^l) − f*_{I*} + (ρ/2)[d_K(Ax^l − b)]²
                       = f(x^l) − f*_{I_l} + (ρ/2)[d_K(Ax^l − b)]²
                       ≥ −t̂ d_K(Ax^l − b) + (ρ/2)[d_K(Ax^l − b)]²,

which implies that

    d_K(Ax^l − b) ≤ t̂/ρ + √(2(Φ_ρ(x^l) − Φ_ρ(x*))/ρ).

This inequality, together with the fact that lim_{l→∞} Φ_ρ(x^l) = Φ_ρ(x*), yields statement (ii).

Remark. From Theorem 4.3, we see that the inner loop of the above method terminates finitely.

We next establish convergence of the outer iterations of the above variant of the IHT method for (2). In particular, we show that every accumulation point of {x^k} is a local minimizer of (2).

Theorem 4.4 Let {x^k} be the sequence generated by the above variant of the IHT method for solving (2). Then any accumulation point of {x^k} is a local minimizer of (2).

Proof. Let x̄^k = Π_{B_{I_k}}(x^k − ∇Φ_{ρ_k}(x^k)/L_{ρ_k}). Since {x^k} satisfies (59), it follows from Lemma 2.1 (a) that

    ∇Φ_{ρ_k}(x^k) ∈ −N_{B_{I_k}}(x̄^k) + U(ε_k),    (61)

where I_k = I(x^k). Let x* be any accumulation point of {x^k}. Then there exists a subsequence K such that {x^k}_{k∈K} → x*. By passing to a subsequence if necessary, we can assume that I_k = I for all k ∈ K. Let μ^k = ρ_k[Ax^k − b − Π_K(Ax^k − b)]. We clearly see that

    (μ^k)ᵀ Π_K(Ax^k − b) = 0.    (62)

Using (61) and the definitions of Φ_ρ and μ^k, we have

    ∇f(x^k) + Aᵀμ^k ∈ −N_{B_I}(x̄^k) + U(ε_k)  for all k ∈ K.    (63)

By (58), (59) and the definition of x̄^k, one can observe that

    ‖x̄^k − x^k‖ = (1/L_{ρ_k}) ‖g(x^k; ρ_k, I_k)‖ ≤ ε_k.    (64)

In addition, notice that ‖μ^k‖ = ρ_k d_K(Ax^k − b), which together with (59) implies that ‖μ^k‖ ≤ t for all k. Hence, {μ^k} is bounded. By passing to a subsequence if necessary, we can assume that {μ^k}_{k∈K} → μ*. Using (64) and taking limits on both sides of (62) and (63) as k ∈ K → ∞, we have

    (μ*)ᵀ Π_K(Ax* − b) = 0,    ∇f(x*) + Aᵀμ* ∈ −N_{B_I}(x*).

In addition, since x^k_i = 0 for all i ∉ I and k ∈ K, we know that x*_i = 0 for all i ∉ I. Also, it follows from (59) that d_K(Ax* − b) = 0, which implies that Ax* − b ∈ K. These relations yield that

    x* ∈ Arg min_{x∈B_I} {f(x) : Ax − b ∈ K},

and hence x* is a local minimizer of (2).

5 Concluding remarks

In this paper we studied iterative hard thresholding (IHT) methods for solving l₀ regularized convex cone programming problems. In particular, we first proposed an IHT method and its variant for solving l₀ regularized box constrained convex programming, and showed that the sequence generated by these methods converges to a local minimizer. We also established the iteration complexity of the IHT method for finding an ε-local-optimal solution. We then proposed a method for solving l₀ regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation, and established its iteration complexity for finding an ε-approximate local minimizer. Finally, we proposed a variant of this method in which the associated penalty parameter is dynamically updated, and showed that every accumulation point is a local minimizer of the problem.

Some of the methods studied in this paper can be extended to solve certain l₀ regularized nonconvex optimization problems. For example, the IHT method and its variant can be applied to problem (12) in which f is nonconvex and ∇f is Lipschitz continuous. In addition, a numerical study of the IHT methods will be presented in the working paper [7].
Finally, it would be interesting to extend the methods of this paper to solve rank minimization problems and compare them with the methods studied in [5, 8]. This is left as future research.

Acknowledgment. The author would like to thank Ting Kei Pong for proofreading the paper and for suggestions that substantially improved its presentation.

References

[1] J. Barzilai and J. M. Borwein. Two-point step size gradient methods. IMA J. Numer. Anal., 8:141–148, 1988.

[2] T. Blumensath and M. E. Davies. Iterative thresholding for sparse approximations. J. Fourier Anal. Appl., 14:629–654, 2008.

[3] T. Blumensath and M. E. Davies. Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal., 27(3):265–274, 2009.

[4] E. G. Birgin, J. M. Martínez, and M. Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim., 10(4):1196–1211, 2000.

[5] J. Cai, E. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM J. Optim., 20:1956–1982, 2010.

[6] K. K. Herrity, A. C. Gilbert, and J. A. Tropp. Sparse approximation via iterative thresholding. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2006.

[7] J. Huang, S. Liu, and Z. Lu. Sparse approximation via nonconvex regularizers. Working paper, Department of Statistics, Texas A&M University.

[8] P. Jain, R. Meka, and I. Dhillon. Guaranteed rank minimization via singular value projection. In Advances in Neural Information Processing Systems, 2010.

[9] G. Lan and R. D. C. Monteiro. Iteration-complexity of first-order penalty methods for convex programming. To appear in Math. Program.

[10] Z. Lu and Y. Zhang. Sparse approximation via penalty decomposition methods. Manuscript, Department of Mathematics, Simon Fraser University, February 2012.

[11] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process., 41(12):3397–3415, 1993.

[12] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, Massachusetts, 2004.

[13] M. Nikolova. Description of the minimizers of least squares regularized with l₀ norm. Report HAL, CMLA – CNRS ENS Cachan, France.

[14] J. A. Tropp. Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inform. Theory, 50(10):2231–2242, 2004.


More information

Minimizing the Difference of L 1 and L 2 Norms with Applications

Minimizing the Difference of L 1 and L 2 Norms with Applications 1/36 Minimizing the Difference of L 1 and L 2 Norms with Department of Mathematical Sciences University of Texas Dallas May 31, 2017 Partially supported by NSF DMS 1522786 2/36 Outline 1 A nonconvex approach:

More information

Technische Universität Dresden Herausgeber: Der Rektor

Technische Universität Dresden Herausgeber: Der Rektor Als Manuskript gedruckt Technische Universität Dresden Herausgeber: Der Rektor The Gradient of the Squared Residual as Error Bound an Application to Karush-Kuhn-Tucker Systems Andreas Fischer MATH-NM-13-2002

More information

Introduction to Alternating Direction Method of Multipliers

Introduction to Alternating Direction Method of Multipliers Introduction to Alternating Direction Method of Multipliers Yale Chang Machine Learning Group Meeting September 29, 2016 Yale Chang (Machine Learning Group Meeting) Introduction to Alternating Direction

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

MS&E 318 (CME 338) Large-Scale Numerical Optimization

MS&E 318 (CME 338) Large-Scale Numerical Optimization Stanford University, Management Science & Engineering (and ICME) MS&E 318 (CME 338) Large-Scale Numerical Optimization 1 Origins Instructor: Michael Saunders Spring 2015 Notes 9: Augmented Lagrangian Methods

More information

Research Note. A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization

Research Note. A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization Iranian Journal of Operations Research Vol. 4, No. 1, 2013, pp. 88-107 Research Note A New Infeasible Interior-Point Algorithm with Full Nesterov-Todd Step for Semi-Definite Optimization B. Kheirfam We

More information

Projection methods to solve SDP

Projection methods to solve SDP Projection methods to solve SDP Franz Rendl http://www.math.uni-klu.ac.at Alpen-Adria-Universität Klagenfurt Austria F. Rendl, Oberwolfach Seminar, May 2010 p.1/32 Overview Augmented Primal-Dual Method

More information

GENERALIZED second-order cone complementarity

GENERALIZED second-order cone complementarity Stochastic Generalized Complementarity Problems in Second-Order Cone: Box-Constrained Minimization Reformulation and Solving Methods Mei-Ju Luo and Yan Zhang Abstract In this paper, we reformulate the

More information

Lagrange Relaxation and Duality

Lagrange Relaxation and Duality Lagrange Relaxation and Duality As we have already known, constrained optimization problems are harder to solve than unconstrained problems. By relaxation we can solve a more difficult problem by a simpler

More information

Efficient Methods for Stochastic Composite Optimization

Efficient Methods for Stochastic Composite Optimization Efficient Methods for Stochastic Composite Optimization Guanghui Lan School of Industrial and Systems Engineering Georgia Institute of Technology, Atlanta, GA 3033-005 Email: glan@isye.gatech.edu June

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

A projection-type method for generalized variational inequalities with dual solutions

A projection-type method for generalized variational inequalities with dual solutions Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 4812 4821 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa A projection-type method

More information

Step lengths in BFGS method for monotone gradients

Step lengths in BFGS method for monotone gradients Noname manuscript No. (will be inserted by the editor) Step lengths in BFGS method for monotone gradients Yunda Dong Received: date / Accepted: date Abstract In this paper, we consider how to directly

More information

of Orthogonal Matching Pursuit

of Orthogonal Matching Pursuit A Sharp Restricted Isometry Constant Bound of Orthogonal Matching Pursuit Qun Mo arxiv:50.0708v [cs.it] 8 Jan 205 Abstract We shall show that if the restricted isometry constant (RIC) δ s+ (A) of the measurement

More information

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties

One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties One Mirror Descent Algorithm for Convex Constrained Optimization Problems with Non-Standard Growth Properties Fedor S. Stonyakin 1 and Alexander A. Titov 1 V. I. Vernadsky Crimean Federal University, Simferopol,

More information

A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE

A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE Journal of Applied Analysis Vol. 6, No. 1 (2000), pp. 139 148 A CHARACTERIZATION OF STRICT LOCAL MINIMIZERS OF ORDER ONE FOR STATIC MINMAX PROBLEMS IN THE PARAMETRIC CONSTRAINT CASE A. W. A. TAHA Received

More information

6. Proximal gradient method

6. Proximal gradient method L. Vandenberghe EE236C (Spring 2013-14) 6. Proximal gradient method motivation proximal mapping proximal gradient method with fixed step size proximal gradient method with line search 6-1 Proximal mapping

More information

Complexity of gradient descent for multiobjective optimization

Complexity of gradient descent for multiobjective optimization Complexity of gradient descent for multiobjective optimization J. Fliege A. I. F. Vaz L. N. Vicente July 18, 2018 Abstract A number of first-order methods have been proposed for smooth multiobjective optimization

More information

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem

1. Introduction. We analyze a trust region version of Newton s method for the optimization problem SIAM J. OPTIM. Vol. 9, No. 4, pp. 1100 1127 c 1999 Society for Industrial and Applied Mathematics NEWTON S METHOD FOR LARGE BOUND-CONSTRAINED OPTIMIZATION PROBLEMS CHIH-JEN LIN AND JORGE J. MORÉ To John

More information

Generalized Uniformly Optimal Methods for Nonlinear Programming

Generalized Uniformly Optimal Methods for Nonlinear Programming Generalized Uniformly Optimal Methods for Nonlinear Programming Saeed Ghadimi Guanghui Lan Hongchao Zhang Janumary 14, 2017 Abstract In this paper, we present a generic framewor to extend existing uniformly

More information

Inexact Newton Methods and Nonlinear Constrained Optimization

Inexact Newton Methods and Nonlinear Constrained Optimization Inexact Newton Methods and Nonlinear Constrained Optimization Frank E. Curtis EPSRC Symposium Capstone Conference Warwick Mathematics Institute July 2, 2009 Outline PDE-Constrained Optimization Newton

More information

An Inexact Newton Method for Optimization

An Inexact Newton Method for Optimization New York University Brown Applied Mathematics Seminar, February 10, 2009 Brief biography New York State College of William and Mary (B.S.) Northwestern University (M.S. & Ph.D.) Courant Institute (Postdoc)

More information

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex-concave saddle-point problems O. Kolossoski R. D. C. Monteiro September 18, 2015 (Revised: September 28, 2016) Abstract

More information

Least squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova

Least squares regularized or constrained by L0: relationship between their global minimizers. Mila Nikolova Least squares regularized or constrained by L0: relationship between their global minimizers Mila Nikolova CMLA, CNRS, ENS Cachan, Université Paris-Saclay, France nikolova@cmla.ens-cachan.fr SIAM Minisymposium

More information

PDE-Constrained and Nonsmooth Optimization

PDE-Constrained and Nonsmooth Optimization Frank E. Curtis October 1, 2009 Outline PDE-Constrained Optimization Introduction Newton s method Inexactness Results Summary and future work Nonsmooth Optimization Sequential quadratic programming (SQP)

More information

Lecture 7 Monotonicity. September 21, 2008

Lecture 7 Monotonicity. September 21, 2008 Lecture 7 Monotonicity September 21, 2008 Outline Introduce several monotonicity properties of vector functions Are satisfied immediately by gradient maps of convex functions In a sense, role of monotonicity

More information

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications

A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications A globally and R-linearly convergent hybrid HS and PRP method and its inexact version with applications Weijun Zhou 28 October 20 Abstract A hybrid HS and PRP type conjugate gradient method for smooth

More information

GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS. Jong Kyu Kim, Salahuddin, and Won Hee Lim

GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS. Jong Kyu Kim, Salahuddin, and Won Hee Lim Korean J. Math. 25 (2017), No. 4, pp. 469 481 https://doi.org/10.11568/kjm.2017.25.4.469 GENERAL NONCONVEX SPLIT VARIATIONAL INEQUALITY PROBLEMS Jong Kyu Kim, Salahuddin, and Won Hee Lim Abstract. In this

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

Newton s Method. Ryan Tibshirani Convex Optimization /36-725

Newton s Method. Ryan Tibshirani Convex Optimization /36-725 Newton s Method Ryan Tibshirani Convex Optimization 10-725/36-725 1 Last time: dual correspondences Given a function f : R n R, we define its conjugate f : R n R, Properties and examples: f (y) = max x

More information

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems

An accelerated non-euclidean hybrid proximal extragradient-type algorithm for convex concave saddle-point problems Optimization Methods and Software ISSN: 1055-6788 (Print) 1029-4937 (Online) Journal homepage: http://www.tandfonline.com/loi/goms20 An accelerated non-euclidean hybrid proximal extragradient-type algorithm

More information