Iterative Hard Thresholding Methods for l 0 Regularized Convex Cone Programming

Size: px

Start display at page:

Download "Iterative Hard Thresholding Methods for l 0 Regularized Convex Cone Programming"

Adrian Gordon
5 years ago
Views:

1 Iterative Hard Thresholding Methods for l 0 Regularized Convex Cone Programming arxiv: v2 [math.oc] 2 Nov 2012 Zhaosong Lu October 30, 2012 Abstract In this paper we consider l 0 regularized convex cone programming problems. In particular, we first propose an iterative hard thresholding (IHT) method and its variant for solving l 0 regularized box constrained convex programming. We show that the sequence generated by these methods converges to a local minimizer. Also, we establish the iteration complexity of the IHT method for finding an ǫ-local-optimal solution. We then propose a method for solving l 0 regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation and establish its iteration complexity for finding an ǫ-approximate local minimizer. Finally, we propose a variant of this method in which the associated penalty parameter is dynamically updated, and show that every accumulation point is a local minimizer of the problem. Key words: Sparse approximation, iterative hard thresholding method, l 0 regularization, box constrained convex programming, convex cone programming 1 Introduction Sparse approximations have over the last decade gained a great deal of popularity in numerous areas. For example, in compressed sensing, a large sparse signal is decoded by finding a sparse solution to a system of linear equalities and/or inequalities. Our particular interest of this paper is to find a sparse approximation to a convex cone programming problem in the form of min f(x) s.t. Ax b K, (1) l x u Department of Mathematics, Simon Fraser University, Burnaby, BC, V5A 1S6, Canada. ( zhaosong@sfu.ca). This work was supported in part by NSERC Discovery Grant. Part of this work was conduct during the author s sabbatical leave in Department of Industrial and Systems Engineering at Texas A&M University. The author would like to thank them for hosting his visit. 1

2 for some l R n, u R n +, A R m n and b R m, where K denotes the dual cone of a closed convex cone K R m, i.e., K = {s R m : s T x 0, x K}, and R n = {x : x i 0, 1 i n} and R n + = {x R n : 0 x i, 1 i n}. A sparse solution to (1) can be sought by solving the following l 0 regularized convex cone programming problem: min f(x)+λ x 0 s.t. Ax b K, l x u (2) for some λ > 0, where x 0 denotes the cardinality of x. Onespecial case of (2), that is, the l 0 - regularized unconstrained least squares problem, has been well studied in the literature (e.g., [13, 10]), and some methods were developed for solving it. For example, the iterative hard thresholding (IHT) methods [6, 2, 3] and matching pursuit algorithms [11, 14] were proposed to solve this type of problems. Recently, Lu and Zhang [10] proposed a penalty decomposition method for solving a more general class of l 0 minimization problems. As shown by the extensive experiments in [2, 3], the IHT method performs very well in finding a sparse solution to unconstrained least squares problems. In addition, the similar type of methods [5, 8] were successfully applied to find low rank solutions in the context of matrix completion. Inspired by these works, in this paper we study IHT methods for solving l 0 regularized convex cone programming problem (2). In particular, we first propose an IHT method and its variant for solving l 0 regularized box constrained convex programming. We show that the sequence generated by these methods converges to a local minimizer. Also, we establish the iteration complexity of the IHT method for finding an ǫ-local-optimal solution. We then propose a method for solving l 0 regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation and establish its iteration complexity for finding an ǫ-approximate local minimizer of the problem. We also propose a variant of the method in which the associated penalty parameter is dynamically updated, and show that every accumulation point is a local minimizer of the problem. The outline of this paper is as follows. In Subsection 1.1 we introduce some notations that are used in the paper. In Section 2 we present some technical results about a projected gradient method for convex programming. In Section 3 we propose IHT methods for solving l 0 regularized box constrained convex programming and study their convergence. In section 4 we develop IHT methods for solving l 0 regularized convex cone programming and study their convergence. Finally, in Section 5 we present some concluding remarks. 1.1 Notation Given a nonempty closed convex Ω R n and an arbitrary point x Ω, N Ω (x) denotes the normal cone of Ω at x. In addition, d Ω (y) denotes the Euclidean distance between y R n and Ω. All norms used in the paper are Euclidean norm denoted by. We use U(r) to denote a ball centered at the origin with a radius r 0, that is, U(r) := {x R n : x r}. 2

3 2 Technical preliminaries In this section we present some technical results about a projected gradient method for convex programming that will be subsequently used in this paper. Consider the convex programming problem φ := minφ(x), (3) x X where X R n is a closed convex set and φ : X R is a smooth convex function whose gradient is Lipschitz continuous with constant L φ > 0. Assume that the set of optimal solutions of (3), denoted by X, is nonempty. Let L L φ be arbitrarily given. A projected gradient of φ at any x X with respect to X is defined as g(x) := L[x Π X (x φ(x)/l)], (4) where Π X ( ) is the projection map onto X (see, for example, [12]). The following properties of the projected gradient are essentially shown in Proposition 3 and Lemma 4 of [9] (see also [12]). Lemma 2.1 Let x X be given and define x + := Π X (x φ(x)/l). Then, for any given ǫ 0, the following statements hold: a) g(x) ǫ if and only if φ(x) N X (x + )+U(ǫ). b) g(x) ǫ implies that φ(x + ) N X (x + )+U(2ǫ). c) φ(x + ) φ(x) g(x) 2 /(2L). d) φ(x) φ(x ) g(x) 2 /(2L), where x Argmin{φ(y) : y X}. We next study a projected gradient method for solving (3). Projected gradient method for (3): Choose an arbitrary x 0 X. Set k = 0. end 1) Solve the subproblem 2) Set k k +1 and go to step 1). x k+1 = argmin x X {φ(xk )+ φ(x k ) T (x x k )+ L 2 x xk 2 }. (5) Some properties of the above projected gradient method are established in the following two theorems, which will be used in the subsequent sections of this paper. 3

4 Theorem 2.2 Let {x k } be generated by the above projected gradient method. Then the following statements hold: (i) For every k 0 and l 1, (ii) {x k } converges to some optimal solution x of (3). φ(x k+l ) φ L 2l xk x 2. (6) Proof. (i) Since the objective function of (5) is strongly convex with modulus L, it follows that for every x X, φ(x k )+ φ(x k ) T (x x k )+ L 2 x xk 2 φ(x k )+ φ(x k ) T (x k+1 x k )+ L 2 xk+1 x k 2 + L 2 x xk+1 2. By the convexity of φ, Lipschitz continuity of φ and L L φ, we have φ(x) φ(x k )+ φ(x k ) T (x x k ), φ(x k+1 ) φ(x k )+ φ(x k ) T (x k+1 x k )+ L 2 xk+1 x k 2, which together with the above inequality imply that φ(x)+ L 2 x xk 2 φ(x k+1 )+ L 2 x xk+1 2, x X. (7) Letting x = x k in (7), we obtain that φ(x k ) φ(x k+1 ) L x k+1 x k 2 /2. Hence, {φ(x k )} is decreasing. Letting x = x X in (7), we have φ(x k+1 ) φ L 2 ( x k x 2 x k+1 x 2), k 0. Using this inequality and the monotonicity of {φ(x k )}, we obtain that l(φ(x k+l ) φ ) which immediately yields (6). (ii) It follows from (8) that [φ(x i+1 ) φ ] L ( x k x 2 x k+l x 2), (8) 2 k+l 1 i=k x k+l x x k x, k 0,l 1. (9) Hence, x k x x 0 x for every k. It implies that {x k } is bounded. Then, there exists a subsequence K such that {x k } K ˆx X. It can be seen from (6) that {φ(x k )} K φ. Hence, φ(ˆx ) = lim k K φ(x k ) = φ, which implies that ˆx X. Since (9) holds for any x X, we also have x k+l ˆx x k ˆx for every k 0 and l 1. This together with the fact {x k } K ˆx implies that {x k } ˆx and hence statement (ii) holds. 4

5 Theorem 2.3 Suppose that φ is strongly convex with modulus σ > 0. Let {x k } be generated by the above projected gradient method. Then, for any given ǫ > 0, the following statements hold: (i) φ(x k ) φ ǫ whenever (ii) φ(x k ) φ < ǫ whenever k 2 L/σ log φ(x0 ) φ. ǫ k 2 L/σ log φ(x0 ) φ +1. ǫ Proof. (i) Let M = L/σ. It follows from Theorem 2.2 that φ(x k+l ) φ L 2l xk x 2 L σl (φ(xk ) φ ), where x is the optimal solution of (3). Hence, we have which implies that φ(x k+2m ) φ L 2σM (φ(xk ) φ ) 1 2 (φ(xk ) φ ), φ(x 2jM ) φ 1 2 j(φ(x0 ) φ ). Let K = log((φ(x 0 ) φ )/ǫ). Hence, when k 2KM, we have φ(x k ) φ φ(x 2KM ) φ 1 2 K(φ(x0 ) φ ) ǫ, which immediately implies that statement (i) holds. (ii) Let K and M be defined as above. If φ(x 2KM ) = φ, by monotonicity of {φ(x k )} we have φ(x k ) = φ when k > 2KM, and hence the conclusion holds. We now suppose that φ(x 2KM ) > φ. It implies that g(x 2KM ) 0, where g is defined in (4). Using this relation, Lemma 2.1 (c) and statement (i), we obtain that φ(x 2KM+1 ) < φ(x 2KM ) ǫ, which together with the montonicity of {φ(x k )} implies that the conclusion holds. Finally, we consider the convex programming problem f := min{f(x) : Ax b K,x X}, (10) for some A R m n and b R m, where f : X R is a smooth convex function whose gradient is Lipschitz continuous gradient with constant L f > 0, X R n is a closed convex set, and K is the dual cone of a closed convex cone K. 5

6 The Lagrangian dual function associated with (10) is given by d(µ) := inf{f(x)+µ T (Ax b) : x X}, µ K. Assume that there exists a Lagrange multiplier for (10), that is, a vector µ K such that d(µ ) = f. Under this assumption, the following results are established in Corollary 2 and Proposition 10 of [9], respectively. Lemma 2.4 Let µ be a Lagrange multiplier for (10). There holds: f(x) f µ d K (Ax b), x X. Lemma 2.5 Let ρ > 0 be given and L ρ = L f +ρ A 2. Consider the problem Φ ρ := min x X {Φ ρ(x) := f(x)+ ρ 2 [d K (Ax b)]2 }. (11) If x X is a ξ-approximate solution of (11), i.e., Φ ρ (x) Φ ρ ξ, then the pair (x +,µ) defined as x + := Π X (x Φ ρ (x)/l ρ ), µ := ρ[ax + b Π K (Ax + b)] is in X ( K) and satisfies µ T Π K (Ax + b) = 0 and the relations d K (Ax + b) 1 ρ µ ξ +, ρ f(x + )+A T µ N X (x + )+U(2 2L ρ ξ), where µ is an arbitrary Lagrange multiplier for (10). 3 l 0 regularized box constrained convex programming In this section we consider a special case of (2), that is, l 0 regularized box constrained convex programming problem in the form of: F := min F(x) := f(x)+λ x 0 s.t. l x u (12) for some λ > 0, l R n and u R n +. Recently, Blumensath and Davies [2, 3] proposed an iterative hard thresholding (IHT) method for solving a special case of (12) with f(x) = Ax b 2, l i = and u i = for all i. Our aim is to extend their IHT method to solve (12) and study its convergence. In addition, we establish its iteration complexity for finding an ǫ-local-optimal solution of (12). Finally, we propose a variant of the IHT method in which only local Lipschitz constant of f is used. 6

7 Throughout this section we assume that f is a smoothconvex function in B whose gradient is Lipschitz continuous with constant L f > 0, and also that f is bounded below on the set B, where B := {x R n : l x u}. (13) We now present an IHT method for solving problem (12). Iterative hard thresholding method for (12): Choose an arbitrary x 0 B. Set k = 0. end 1) Solve the subproblem x k+1 Argmin x B {f(xk )+ f(x k ) T (x x k )+ L 2 x xk 2 +λ x 0 }. (14) 2) Set k k +1 and go to step 1). Remark. The subproblem (14) has a closed form solution given in (21). In what follows, we study the convergence of the above IHT method for (12). Before proceeding, we introduce some notations that will be used subsequently. Define B I := {x B : x I = 0}, I {1,...,n}, (15) Π B (x) := argmin{ y x : y B}, x R n, s L (x) := x 1 f(x), x B, (16) L I(x) := {i : x i = 0}, x R n (17) for some constant L > L f. The following lemma establishes some properties of the operators s L ( ) and Π B (s L ( )), which will be used subsequently. Lemma 3.1 For any x, y R n, there hold: (1) [s L (x)] 2 i [s L (y)] 2 i 4( x y + [s L (y)] i ) x y ; (2) [Π B (s L (x)) s L (x)] 2 i [Π B (s L (y)) s L (y)] 2 i 4( x y + [Π B (s L (y)) s L (y)] i ) x y. Proof. (1) We observe that s L (x) s L (y) = x y 1 L ( f(x) f(y)) x y + 1 L f(x) f(y), (1+ L f ) x y 2 x y. (18) L 7

8 It follows from (18) that [s L (x)] 2 i [s L(y)] 2 i = [s L(x)] i +[s L (y)] i [s L (x)] i [s L (y)] i, (2) It can be shown that Using this inequality and (18), we then have ( [s L (x)] i [s L (y)] i +2 [s L (y)] i ) [s L (x)] i [s L (y)] i, 4( x y + [s L (y)] i ) x y. Π B (x) x+y Π B (y) x y. [Π B (s L (x)) s L (x)] 2 i [Π B (s L (y)) s L (y)] 2 i ( [Π B (s L (x)) s L (x)] i [Π B (s L (y)) s L (y)] i +2 Π B (s L (y)) s L (y)] i ) [Π B (s L (x)) s L (x)] i [Π B (s L (y)) s L (y)] i, ( s L (x) s L (y) +2 [Π B (s L (y)) s L (y)] i ) s L (x) s L (y), 4( x y + [Π B (s L (y)) s L (y)] i ) x y. The following lemma shows that for the sequence {x k }, the magnitude of any nonzero component x k i cannot be too small for k 1. Lemma 3.2 Let {x k } be generated by the above IHT method. Then, for all k 0, x k+1 j δ := minδ i > 0, if x k+1 j 0, (19) i/ I 0 where I 0 = {i : l i = u i = 0} and min(u i, 2λ/L), if l i = 0, δ i = min( l i, 2λ/L), if u i = 0, min( l i,u i, 2λ/L), otherwise, i I 0. (20) Proof. One can observe from (14) that for i = 1,...,n, [Π B (s L (x k ))] i, if [s L (x k )] 2 i [Π B(s L (x k )) s L (x k )] 2 i > 2λ, L x k+1 i = 0, if [s L (x k )] 2 i [Π B(s L (x k )) s L (x k )] 2 i < 2λ, L [Π B (s L (x k ))] i or 0, otherwise (21) (see, for example, [10]). Suppose that j is an index such that x k+1 j I 0 is define above. It follows from (21) that 0. Clearly, j / I 0, where x k+1 j = [Π B (s L (x k ))] j 0, [s L (x k )] 2 j [Π B (s L (x k )) s L (x k )] 2 j 2λ L. (22) 8

9 The second relation of (22) implies that [s L (x k )] j 2λ/L. In addition, by the first relation of (22) and the definition of Π B, we have x k+1 j = [Π B (s L (x k ))] j = min(max([s L (x k )] j,l j ),u j ) 0. (23) Recall that j / I 0. We next show that x k+1 j δ j by considering three separate cases: i) l j = 0; ii) u j = 0; and iii) l j u j 0. For case i), it follows from (23) that [s L (x k )] j 0 = min([s L (x k )] j,u j ). This together with the relation [s L (x k )] j 2λ/L and the definition of δ j implies that x k+1 j δ j. By the similar arguments, we can show that δ j also holds for the other two cases. Then, it is easy to see that the conclusion of this lemma holds. and x k+1 j x k+1 j We next establish that the sequence {x k } converges to a local minimizer of (12), and moreover, F(x k ) converges to a local minimum value of (12). Theorem 3.3 Let {x k } be generated by the above IHT method. Then, x k converges to a local minimizer x of problem (12) and moreover, I(x k ) I(x ), x k 0 x 0 and F(x k ) F(x ). Proof. Since f is Lipschitz continuous with constant L f, we have f(x k+1 ) f(x k )+ f(x k ) T (x x k )+ L f 2 xk+1 x k 2. Using this inequality, the fact that L > L f, and (14), we obtain that F(x k+1 ) = f(x k+1 )+λ x k+1 0 a {}}{ f(x k )+ f(x k ) T (x k+1 x k )+ L f 2 xk+1 x k 2 +λ x k+1 0, f(x k )+ f(x k ) T (x k+1 x k )+ L 2 xk+1 x k 2 +λ x k+1 0 }{{} b f(x k )+λ x k 0 = F(x k ), where the last inequality follows from (14). The above inequality implies that {F(x k )} is nonincreasing and moreover, F(x k ) F(x k+1 ) b a = L L f 2 x k+1 x k 2. (24) By the assumption, we know that f is bounded below in B. It then follows that {F(x k )} is bounded below. Hence, {F(x k )} converges to a finite value as k, which together with (24) implies that lim k xk+1 x k = 0. (25) 9

10 Let I k = I(x k ), where I( ) is defined in (17). In view of (19), we observe that x k+1 x k δ if I k I k+1. (26) This together with (25) implies that I k does not change when k is sufficient large. Hence, there exist some K 0 and I {1,...,n} such that I k = I for all k K. Then one can observe from (14) that x k+1 = argmin x B I {f(x k )+ f(x k ) T (x x k )+ L 2 x xk 2 }, k > K, where B I is defined in (15). It follows from Lemma 2.2 that x k x, where x Argmin{f(x) : x B I }. (27) It is not hard to see from (27) that x is a local minimizer of (12). In addition, we know from (19) that x k i δ for k > K and i / I. It yields x i δ for i / I and x i = 0 for i I. Hence, I(x k ) = I(x ) = I for all k > K, which clearly implies that x k 0 = x 0 for every k > K. By continuity of f, we have f(x k ) f(x ). It then follows that F(x k ) = f(x k )+λ x k 0 f(x )+λ x 0 = F(x ). AsshowninTheorem3.3, x k x forsomelocalminimizer x of (12)andF(x k ) F(x ). Our next aim is to establish the iteration complexity of the IHT method for finding an ǫ-localoptimal solution x ǫ B of (12) satisfying F(x ǫ ) F(x ) + ǫ and I(x ǫ ) = I(x ). Before proceeding, we define { α = β = min I {1,...,n} max I {1,...,n} min i [s L(x )] 2 i [Π B(s L (x )) s L (x )] 2 i 2λ } L : x Argmin{f(x) : x B I }(28), { } max [s L (x )] i + Π B (s L (x )) s L (x )] i : x Argmin{f(x) : x B I }. (29) i Theorem 3.4 Assume that f is a smooth strongly convex function with modulus σ > 0. Suppose that L > L f is chosen such that α > 0. Let {x k } be generated by the above IHT method, I k = I(x k ) for all k, x = lim k x k, and F = F(x ). Then, for any given ǫ > 0, the following statements hold: (i) The number changes of I k is at most 2(F(x 0 ) F ) (L L f. )δ 2 (ii) The total number of iterations by the IHT method for finding an ǫ-local-optimal solution x ǫ B satisfying I(x ǫ ) = I(x ) and F(x ǫ ) F +ǫ is at most 2 L/σ log θ, where ǫ { } θ = (F(x 0 ) F )2 ω+3 2, ω = max (d 2c)t ct 2 : 0 t 2(F(x 0 ) F ) (L L f, (30) )δ 2 t c = (L L f)δ 2 2(F(x 0 ) F ), γ = σ( 2α+β 2 β) 2 /32, (31) d = 2log(F(x 0 ) F )+4 2logγ +c. 10

11 Proof. (i) As shown in Theorem 3.3, I k only changes for a finite number of times. Assume that I k only changes at k = n 1 +1,...,n J +1, that is, I nj 1 +1 = = I nj I nj +1 = = I nj+1, j = 1,...,J 1, (32) where n 0 = 0. We next bound J, i.e., the total number of changes of I k. In view of (26) and (32), one can observe that x n j+1 x n j δ, j = 1,...,J, which together with (24) implies that F(x n j ) F(x n j+1 ) 1 2 (L L f)δ 2, j = 1,...,J. (33) Summing up these inequalities and using the monotonicity of {F(x k )}, we have and hence 1 2 (L L f)δ 2 J F(x n 1 ) F(x n J+1 ) F(x 0 ) F, (34) J 2(F(x 0 ) F ) (L L f )δ 2 (ii) Let n j be defined as above for j = 1,...,J. We first show that. (35) n j n j L/σ log ( F(x 0 ) (j 1)(L L f )δ 2 /2 F ) logγ, j = 1,...,J, (36) where F and γ are defined in (12) and (31), respectively. Indeed, one can observe from (14) that x k+1 = argmin x B {f(xk )+ f(x k ) T (x x k )+ L 2 x xk 2 : x Ik+1 = 0}. Therefore, for j = 1,...,J and k = n j 1,...,n j 1, x k+1 = argmin x B {f(xk )+ f(x k ) T (x x k )+ L 2 x xk 2 : x Inj = 0}. We arbitrarily choose 1 j J. Let x (depending on j) denote the optimal solution of One can observe that min {f(x) : x I nj = 0}. (37) x B x 0 x n j Also, it follows from (33) and the monotonicity of {F(x k )} that F(x n j+1 ) F(x 0 ) j 2 (L L f)δ 2, j = 1,...,J. (38) 11

12 Using these relations and the fact that F( x ) F, we have f(x n j 1+1 ) f( x ) = F(x n j 1+1 ) λ x n j F( x )+λ x 0, F(x 0 ) j 1 2 (L L f)δ 2 F. (39) Suppose for a contradiction that (36) does not hold for some 1 j J. Hence, we have n j n j 1 > 2+2 L/σ log ( F(x 0 ) (j 1)(L L f )δ 2 /2 F ) logγ. This inequality and (39) yields n j n j 1 > 2+2 L/σ log f(xnj 1+1 ) f( x ). γ Using the strong convexity of f and applying Theorem 2.3 (ii) to (37) with ǫ = γ, we obtain that σ 2 xn j x 2 f(x n j ) f( x ) < σ 32 ( 2α+β 2 β) 2. It implies that x n j x 2α+β2 β <. (40) 4 Using (40), Lemma 3.1 and the definition of β, we have [s L (x n j )] 2 i [s L( x )] 2 i [Π B(s L (x n j )) s L (x n j )] 2 i +[Π B(s L ( x )) s L ( x )] 2 i [s L (x n j )] 2 i [s L( x )] 2 i + [Π B(s L (x n j )) s L (x n j )] 2 i [Π B(s L ( x )) s L ( x )] 2 i 4(2 x n j x +β) x n j x < α, (41) where the last inequality is due to (40). Let { I = i : [s L ( x )] 2 i [Π B(s L ( x )) s L ( x )] 2 i < 2λ } L and let Ī = {1,...,n}\I. Since α > 0, we know that [s L ( x )] 2 i [Π B(s L ( x )) s L ( x )] 2 i > 2λ L, i Ī. It then follows from (41) and the definition of α that [s L (x n j )] 2 i [Π B(s L (x n j )) s L (x n j )] 2 i < 2λ L, i I, [s L (x n j )] 2 i [Π B (s L (x n j )) s L (x n j )] 2 i > 2λ L, i Ī. Observe that [Π B (s L (x n j ))] i 0 for all i Ī. This fact together with (21) implies that x n j+1 i = 0, i I and x n j+1 i 0, i Ī. 12

13 By a similar argument, one can show that x n j i = 0, i I and x n j i 0, i Ī. Hence, I nj = I nj +1 = I, which is a contradiction to (32). We thus conclude that (36) holds. Let N ǫ denote the total number of iterations for finding an ǫ-local-optimal solution x ǫ B by the IHT method satisfying I(x ǫ ) = I(x ) and F(x ǫ ) F +ǫ. We next establish an upper bound for N ǫ. Summing up the inequality (36) for j = 1,...,J, we obtain that n J J j=1 { 2+2 L/σ log(f(x 0 ) j 1 } 2 (L L f)δ 2 F ) logγ. Using this inequality, (34), and the facts that L σ and log(1 t) t for all t (0,1), we have J [ ( n J 2+2 L/σ log(f(x 0 ) j 1 )] 2 (L L f)δ 2 F ) logγ +1, j=1 J [ 2+2 L/σ (log(f(x 0 ) F ) (L L )] f)δ 2 2(F(x 0 ) F (j 1) logγ +1, ) j=1 L/σ (2log(F(x 0 ) F )+4 2logγ + (L L ) f)δ 2 2(F(x 0 ) F J (L L f)δ 2 ) 2(F(x }{{} 0 ) F ) }{{} d c By the definition of n J, we observe that after n J +1 iterations, the IHT method becomes the projected gradient method applied to the problem x = argmin x B {f(x) : x I nj +1 = 0}. J 2 (42). In addition, we know from Theorem 3.3 that I(x k ) = I(x ) for all k > n J. Hence, f(x k ) f(x ) = F(x k ) F when k > n J. Using these facts and Theorem 2.3 (ii), we have N ǫ n J +1+2 L/σ log F(xnJ+1 ) F. ǫ Using this inequality, (38), (42) and the facts that F F, L σ and log(1 t) t for all t (0,1), we obtain that ( N ǫ n J +1+2 L/σ log(f(x 0 ) J ) 2 (L L f)δ 2 F )+1 logǫ, ( n J + L/σ 2log(F(x 0 ) F ) (L L ) f)δ 2 J F(x 0 ) F +3 2logǫ L/σ [ (d 2c)J cj 2 +2log(F(x 0 ) F )+3 2logǫ ], 13

14 which together with (35) and (30) implies that N ǫ 2 L/σ log θ ǫ. The iteration complexity given in Theorem 3.4 is based on the assumption that f is strongly convex in B. We next consider a case where B is bounded and f is convex but not strongly convex. We will establish the iteration complexity of finding an ǫ-local-optimal solution of (12) by the IHT method applied to a perturbation of (12) obtained from adding a small strongly convex regularization term to f. Consider a perturbation of (12) in the form of where ν > 0 and F ν := min x B {F ν(x) := f ν (x)+λ x 0 }, (43) f ν (x) := f(x)+ ν 2 x 2. One can easily see that f ν is strongly convex in B with modulus ν and moreover f ν is Lipschitz continuous with constant L ν, where L ν = L f +ν. (44) We next establish the iteration complexity of finding an ǫ-local-optimal solution of (12) by the IHT method applied to (43). Given any L > 0, let s L, α and β be defined according to (16), (28) and (29), respectively, by replacing f by f ν, and let δ be defined in (19). Theorem 3.5 Suppose that B is bounded and f is convex but not strongly convex. Let ǫ > 0 be arbitrarily given, D = max{ x : x B}, ν = ǫ/d 2, and L > L ν be chosen such that α > 0. Let {x k } be generated by the IHT method applied to (43), and let x = lim k x k, Fν = F ν(x ) and F = min{f(x) : x B I }, where I = {i : x i = 0}. Then, the total number of iterations by the IHT method for finding an ǫ-local-optimal solution x ǫ B satisfying F(x ǫ ) F +ǫ is D at most 2 2 L f +1 log 2θ, where ǫ ǫ { } θ = (F ν (x 0 ) Fν )2ω+3 2, ω = max (d 2c)t ct 2 : 0 t 2(Fν(x 0 ) Fν) (L L ν)δ, 2 t c = (L Lν)δ2 2(F ν(x 0 ) F ν ), γ = ν( 2α+β 2 β) 2 /32, d = 2log(F ν (x 0 ) F ν )+4 2logγ +c. Proof. By Theorem 3.4 (ii), we see that the IHT method applied to (43) finds an ǫ/2- local-optimal solution x ǫ B of (43) satisfying I(x ǫ ) = I(x ) and F ν (x ǫ ) Fν +ǫ/2 within 2 L ν /ν log 2θ iterations. From the proof of Theorem 3.3, we observe that ǫ F ν (x ) = min{f ν (x) : x B I }. 14

15 Hence, we have Fν = F ν (x ) min f(x)+ νd2 x B I 2 F + ǫ 2. In addition, we observe that F(x ǫ ) F ν (x ǫ ). Hence, it follows that F(x ǫ ) F ν (x ǫ ) F ν + ǫ 2 F +ǫ. Note that F is a local optimal value of (12). Hence, x ǫ is an ǫ-local-optimal solution of (12). The conclusion of this theorem then follows from (44) and ν = ǫ/d 2. For the above IHT method, a fixed L is used through all iterations, which may be too conservative. To improve its practical performance, we can use local L that is update dynamically. The resulting variant of the method is presented as follows. A variant of IHT method for (12): Let 0 < L min < L max, τ > 1 and η > 0 be given. Choose an arbitrary x 0 B and set k = 0. 1) Choose L 0 k [L min,l max ] arbitrarily. Set L k = L 0 k. 1a) Solve the subproblem x k+1 Argmin x B {f(xk )+ f(x k ) T (x x k )+ L k 2 x xk 2 +λ x 0 }. (45) 1b) If F(x k ) F(x k+1 ) η 2 xk+1 x k 2 (46) is satisfied, then go to step 2). 1c) Set L k τl k and go to step 1a). 2) Set k k +1 and go to step 1). end Remark. L 0 k can be chosen by the similar scheme as used in [1, 4], that is, { }} L 0 k {L = max min,min L max, ft x, x 2 where x = x k x k 1 and f = f(x k ) f(x k 1 ). At each iteration, the IHT method solves a single subproblem in step 1). Nevertheless, its variant needs to solve a sequence of subproblems. We next show that for each outer iteration, its number of inner iterations is finite. 15

16 Theorem 3.6 For each k 0, the inner termination criterion (46) is satisfied after at most log(lf +η) log(l min ) +2 inner iterations. logτ Proof. Let L k denote the final value of L k at the kth outer iteration. By (45) and the similar arguments as for deriving (24), one can show that F(x k ) F(x k+1 ) L k L f 2 x k+1 x k 2. Hence, (46) holds whenever L k L f +η, which together with the definition of L k implies that L k /τ < L f +η, that is, L k < τ(l f +η). Let n k denote the number of inner iterations for the kth outer iteration. Then, we have L min τ nk 1 L 0 k τn k 1 = L k < τ(l f +η). log(lf +η) log(l Hence, n k min ) +2 and the conclusion holds. logτ We next establish that the sequence {x k } generated by the above variant of IHT method converges to a local minimizer of (12) and moreover, F(x k ) converges to a local minimum value of (12). Theorem 3.7 Let {x k } be generated by the above variant of IHT method. Then, x k converges to a local minimizer x of problem (12), and moreover, I(x k ) I(x ), x k 0 x 0 and F(x k ) F(x ). Proof. Let L k denote the final value of L k at the kth outer iteration. From the proof of Theorem 3.6, we know that L k [L min,τ(l f +η)). Using this fact and a similar argument as used to prove (19), one can obtain that x k+1 i δ := min δi > 0, if x k+1 j 0, i/ I 0 where I 0 = {i : l i = u i = 0} and δ i is defined according to (20) by replacing L by τ(l f +η) for all i I 0. It implies that x k+1 x k δ if I(x k ) I(x k+1 ). The conclusion then follows from this inequality and the similar arguments as used in the proof of Theorem l 0 -regularized convex cone programming In this section we consider l 0 -regularized convex cone programming problem (2) and propose IHT methods for solving it. In particular, we apply the IHT method proposed in Section 16

17 3 to a quadratic penalty relaxation of (2) and establish the iteration complexity for finding an ǫ-approximate local minimizer of (2). We also propose a variant of the method in which the associated penalty parameter is dynamically updated, and show that every accumulation point is a local minimizer of (2). Let B be defined in (13). We assume that f is a smooth convex function in B, f is Lipschitz continuous with constant L f and that f is bounded below on B. In addition, we make the following assumption throughout this section. Assumption 1 For each I {1,...,n}, there exists a Lagrange multiplier for f I = min{f(x) : Ax b K,x B I }, (47) provided that (47) is feasible, that is, there exists µ K such that f I = d I(µ ), where d I (µ) := inf{f(x)+µ T (Ax b) : x B I }, µ K. Let x be a point in B, and let I = {i : x i = 0}. One can observe that x is a local minimizer of (2) if and only if x is a minimizer of (47) with I = I. Then, in view of Assumption 1, we see that x is a local minimizer of (2) if and only if x B and there exists µ K such that Ax b K, (µ ) T (Ax b) = 0, (48) f(x )+A T µ N BI (x ). Based on the above observation, we can define an approximate local minimizer of (2) to be the one that nearly satisfies (48). Definition 1 Let x be a point in B, and let I = {i : x i = 0}. x is an ǫ-approximate local minimizer of (2) if there exists µ K such that d K (Ax b) ǫ, (µ ) T Π K (Ax b) = 0, f(x )+A T µ N BI (x )+U(ǫ). In what follows, we propose an IHT method for finding an approximate local minimizer of (2). In particular, we apply the IHT method or its variant to a quadratic penalty relaxation of (2) which is in the form of where Ψ ρ := min x B {Ψ ρ(x) := Φ ρ (x)+λ x 0 }, (49) Φ ρ (x) := f(x)+ ρ 2 [d K (Ax b)]2 (50) It is not hard to show that the function Φ ρ is convex differentiable and moreover Φ ρ is Lipschitz continuous with constant L ρ = L f +ρ A 2 (51) 17

18 (see, for example, Proposition 8 and Corollary 9 of [9]). Therefore, problem (49) can be suitably solved by the IHT method or its variant proposed in Section 3. Under the assumption that f is strongly convex in B, we next establish the iteration complexity of the IHT method applied to (49) for finding an approximate local minimizer of (2). Given any L > 0, let s L, α and β be defined according to (16), (28) and (29), respectively, by replacing f by Φ ρ, and let δ be defined in (19). Theorem 4.1 Assume that f is a smooth strongly convex function with modulus σ > 0. Given any ǫ > 0, let ρ = t ǫ A (52) for any t max min µ, where Λ I is the set of Lagrange multipliers of (47). Let L > L ρ I {1,...,n} µ Λ I be chosen such that α > 0. Let {x k } be generated by the IHT method applied to (49), and let x = lim k x k and Ψ ρ = Ψ ρ(x ). Then the IHT method finds an ǫ-approximate local minimizer of (2) in at most Lρ N := 2 log 8L ρθ σ ǫ 2 iterations, where { } θ = (Ψ ρ (x 0 ) Ψ ρ )2ω+3 2, ω = max (d 2c)t ct 2 2(Ψρ(x : 0 t 0 ) Ψ ρ ) (L L ρ)δ, 2 t c = (L Lρ)δ2 2(Ψ ρ(x 0 ) Ψ ρ ), γ = σ( 2α+β 2 β) 2 /32, d = 2log(Ψ ρ (x 0 ) Ψ ρ)+4 2logγ +c. Proof. We know from Theorem 3.3 that x k x for some local minimizer x of (49), I(x k ) I(x ) and Ψ ρ (x k ) Ψ ρ (x ) = Ψ ρ. By Theorem 3.4, after at most N iterations, the IHT method generates x B such at I( x) = I(x ) and Ψ ρ ( x) Ψ ρ (x ) ξ := ǫ 2 /(8L ρ ). It then follows that Φ ρ ( x) Φ ρ (x ) ξ. Since x is a local minimizer of (49), we observe that x = arg min Φ ρ (x), (53) x B I where I = I(x ). Hence, x is a ξ-approximate solution of (53). Let µ Argmin{ µ : µ Λ I }, where Λ I is the set of Lagrange multipliers of (47) with I = I. In view of Lemma 2.5, we see that the pair ( x +,µ) defined as x + := Π BI ( x Φ ρ ( x)/l ρ ) and µ := ρ[a x + b Π K (A x + b)] satisfies f( x + )+A T µ N BI ( x + )+U(2 2L ρ ξ) = N BI ( x + )+U(ǫ), ( ) d K (A x + b) 1 ρ µ ξ + 1 µ + ǫ ρ ρ 8 A ǫ, where the last inequality is due to (52) and the assumption t ˆt µ. Hence, x + is an ǫ-approximate local minimizer of (2). 18

19 We next consider finding an ǫ-approximate local minimizer of (2) for the case where B is bounded and f is convex but not strongly convex. In particular, we apply the IHT method or its variant to a quadratic penalty relaxation of a perturbation of (2) obtained from adding a small strongly convex regularization term to f. Consider a perturbation of (2) in the form of where min {f(x)+ ν x B 2 x 2 +λ x 0 : Ax b K }. (54) The associated quadratic penalty problem for (54) is given by Ψ ρ,ν := min x B {Ψ ρ,ν(x) := Φ ρ,ν (x)+λ x 0 }, (55) Φ ρ,ν (x) := f(x)+ ν 2 x 2 + ρ 2 [d K (Ax b)]2. One can easily see that Φ ρ,ν is strongly convex in B with modulus ν and moreover Φ ρ,ν is Lipschitz continuous with constant L ρ,ν := L f +ρ A 2 +ν. Clearly, the IHT method or its variant can be suitably applied to (55). We next establish the iteration complexity of the IHT method applied to (55) for finding an approximate local minimizer of (2). Given any L > 0, let s L, α and β be defined according to (16), (28) and (29), respectively, by replacing f by Φ ρ,ν, and let δ be defined in (19). Theorem 4.2 Suppose that B is bounded and f is convex but not strongly convex. Let ǫ > 0 be arbitrarily given, D = max{ x : x B}, ( D ) 2 + D +16t+ 2 2ǫ A ρ =, ν = ǫ (56) 16ǫ 2D for any t max min µ, where Λ I is the set of Lagrange multipliers of (47). Let L > L ρ,ν I {1,...,n} µ Λ I be chosen such that α > 0. Let {x k } be generated by the IHT method applied to (55), and let x = lim k x k and Ψ ρ,ν = Ψ ρ,ν(x ). Then the IHT method finds an ǫ-approximate local minimizer of (2) in at most iterations, where N := 2 2DLρ,ν ǫ log 32L ρ,νθ ǫ 2 { θ = (Ψ ρ,ν (x 0 ) Ψ ρ,ν )2ω+3 2, ω = max (d 2c)t ct 2 : 0 t t c = (L Lρ,ν)δ2 2(Ψ ρ,ν(x 0 ) Ψ ρ,ν ), γ = ν( 2α ρ,ν +β 2 ρ,ν β ρ,ν ) 2 /32, d = 2log(Ψ ρ,ν (x 0 ) Ψ ρ,ν)+4 2logγ +c. } 2(Ψρ,ν(x 0 ) Ψ ρ,ν ) (L L ρ,ν)δ, 2 19

20 Proof. From Theorem 3.3, we know that x k x for some local minimizer x of (55), I(x k ) I(x ) and Ψ ρ,ν (x k ) Ψ ρ,ν (x ) = Ψ ρ,ν. By Theorem 3.4, after at most N iterations, theiht methodapplied to (55) generates x B such at I( x) = I(x ) andψ ρ,ν ( x) Ψ ρ,ν (x ) ξ := ǫ 2 /(32L ρ,ν ). It then follows that Φ ρ,ν ( x) Φ ρ,ν (x ) ξ. Since x is a local minimizer of (55), we see that x = arg min Φ ρ,ν (x), (57) x B I where I = I(x ). Hence, x is a ξ-approximate solution of (57). In view of Lemma 2.5, we see that the pair ( x +,µ) defined as x + := Π BI ( x Φ ρ,ν ( x)/l ρ,ν ) and µ := ρ[a x + b Π K (A x + b)] satisfies f( x + )+ν x + +A T µ N BI ( x + )+U(2 2L ρ,ν ξ) = N BI ( x + )+U(ǫ/2), which together with the fact that ν x + νd ǫ/2 implies that f( x + )+A T µ ν x + N BI ( x + )+U(ǫ/2) N BI ( x + )+U(ǫ). In addition, it follows from Lemma 2.1 (c) that Φ ρ,ν ( x + ) Φ ρ,ν ( x), and hence Φ ρ,ν ( x + ) Φ ρ,ν (x ) Φ ρ,ν ( x) Φ ρ,ν (x ) ξ. Let Φ ρ = min{φ ρ(x) : x B I }, where Φ ρ is defined in (50). Notice that Φ ρ,ν (x ) Φ ρ + νd 2 /2. It then follows that Φ ρ ( x + ) Φ ρ Φ ρ,ν( x + ) Φ ρ,ν (x )+ νd2 2 ξ + ǫd 4 ǫ 2 32ρ A 2 + ǫd 4. Let µ Argmin{ µ : µ Λ I }, where Λ I is the set of Lagrange multipliers of (47) with I = I. In view of Lemma 2.5 and the assumption t ˆt µ, we obtain that d K (A x + b) 1 ǫ ρ µ ρ 2 A + ǫd 2 4ρ 1 ( ǫ ǫd t+ )+ ρ 32 A 4ρ = ǫ, where the last inequality is due to (56). Hence, x + is an ǫ-approximate local minimizer of (2). For the above method, the fixed penalty parameter ρ is used through all iterations, which may be too conservative. To improve its practical performance, we can update ρ dynamically. The resulting variant of the method is presented as follows. Before proceeding, we define the projected gradient of Φ ρ at x B I with respect to B I as g(x;ρ,i) = L ρ [x Π BI (x 1 L ρ Φ ρ (x))], (58) where I {1,...,n}, and Φ ρ and L ρ are defined in (50) and (51), respectively. 20

21 A variant of IHT method for (2): Let {ǫ k } be a positive decreasing sequence. Let ρ 0 > 0, τ > 1, t > max min µ, where Λ I I {1,...,n} µ Λ I is the set of Lagrange multipliers of (47). Choose an arbitrary x 0 B. Set k = 0. end 1) Start from x k 1 and apply the IHT method or its variant to problem (49) with ρ = ρ k until finding some x k B such that where I k = I(x k ). 2) Set ρ k+1 := τρ k. 3) Set k k +1 and go to step 1). d K (Ax k b) t ρ k, g(x k ;ρ k,i k ) min{1,l ρk }ǫ k, (59) The following theorem shows that x k satisfying (59) can be found within a finite number of iterations by the IHT method or its variant applied to problem (49) with ρ = ρ k. Without loss of generality, we consider the IHT method or its variant applied to problem (49) with any given ρ > 0. Theorem 4.3 Let x 0 B be an arbitrary point and the sequence {x l } be generated by the IHT method or its variant applied to problem (49). Then, the following statements hold: (i) lim l g(x l ;ρ,i l ) = 0, where I l = I(x l ) for all l. min µ and Λ I is the set of Lagrange multi- µ Λ I (ii) lim d K (Ax l b) ˆt, where ˆt := max l ρ pliers of (47). I {1,...,n} Proof. (i) It follows from Theorems 3.3 and 3.7 that x l x for some local minimizer x of (49) and moreover, Φ ρ (x l ) Φ ρ (x ) and I l I, where I l = I(x l ) and I = I(x ). We also know that x Arg min Φ ρ (x). x B I It then follows from Lemma 2.1 (d) that Φ ρ (x l ) Φ ρ (x ) 1 2L ρ g(x l ;ρ,i ) 2, l N. Using this inequality and Φ ρ (x l ) Φ ρ (x ), we thus have g(x l ;ρ,i ) 0. Since I l = I for l N, we also have g(x l ;ρ,i l ) 0. 21

22 (ii) Let f I be defined in (47). Applying Lemma 2.4 to problem (47), we know that f(x l ) f I(l) ˆtd K (Ax l b), l, (60) where ˆt is defined above. Let x and I be defined in the proof of statement (i). We observe that f I Φ ρ(x ). Using this relation and (60), we have that for sufficiently large l, Φ ρ (x l ) Φ ρ (x ) = f(x l )+ ρ 2 [d K (Ax l b)] 2 Φ ρ (x ) f(x l ) f I + ρ 2 [d K (Ax l b)] 2 which implies that = f(x l ) f I(l) + ρ 2 [d K (Ax l b)] 2 ˆtd K (Ax l b)+ ρ 2 [d K (Ax l b)] 2, d K (Ax l b) ˆt ρ + Φ ρ (x l ) Φ ρ (x ). ρ This inequality together with the fact lim l Φ ρ (x l ) = Φ ρ (x ) yields statement (ii). Remark. From Theorem 4.3, we can see that the inner iterations of the above method terminates finitely. We next establish convergence of the outer iterations of the above variant of the IHT method for (2). In particular, we show that every accumulation point of {x k } is a local minimizer of (2). Theorem 4.4 Let {x k } be the sequence generated by the above variant of the IHT method for solving (2). Then any accumulation point of {x k } is a local minimizer of (2). Proof. Let x k = Π BIk (x k 1 L ρk Φ ρk (x k )). Since {x k } satisfies (59), it follows from Lemma 2.1 (a) that Φ ρk (x k ) N BIk ( x k )+U(ǫ k ), (61) where I k = I(x k ). Let x be any accumulation point of {x k }. Then there exists a subsequence K such that {x k } K x. By passing to a subsequence if necessary, we can assume that I k = I for all k K. Let µ k = ρ k [Ax k b Π K (Ax k b)]. We clearly see that Using (61) and the definitions of Φ ρ and µ k, we have (µ k ) T Π K (Ax k b) = 0. (62) f(x k )+A T µ k N BI ( x k )+U(ǫ k ), k K. (63) 22

23 By (58), (59) and the definition of x k, one can observe that x k x k = 1 L ρk g(x k ;ρ k,i k ) ǫ k. (64) Inaddition, noticethat µ k = ρ k d K (Ax k b), which together with(59)implies that µ k t for all k. Hence, {µ k } is bounded. By passing to a subsequence if necessary, we can assume that {µ k } K µ. Using (64) and upon taking limits on both sides of (62) and (63) as k K, we have (µ ) T Π K (Ax b) = 0, f(x )+A T µ N BI (x ) In addition, since x k I = 0 for k K, we know that x I = 0. Also, it follows from (59) that d K (Ax b) = 0, which implies that Ax b K. These relations yield that and hence, x is a local minimizer of (2). 5 Concluding remarks x Argmin x B I {f(x) : Ax b K }, In this paper we studied iterative hard thresholding (IHT) methods for solving l 0 regularized convex cone programming problems. In particular, we first proposed an IHT method and its variant for solving l 0 regularized box constrained convex programming. We showed that the sequence generated by these methods converges to a local minimizer. Also, we established the iteration complexity of the IHT method for finding an ǫ-local-optimal solution. We then proposed a method for solving l 0 regularized convex cone programming by applying the IHT method to its quadratic penalty relaxation and established its iteration complexity for finding an ǫ-approximate local minimizer. Finally, we proposed a variant of this method in which the associated penalty parameter is dynamically updated, and showed that every accumulation point is a local minimizer of the problem. Some of the methods studied in this paper can be extended to solve some l 0 regularized nonconvex optimization problems. For example, the IHT method and its variant can be applied to problem (12) in which f is nonconvex and f is Lipschitz continuous. In addition, the numerical study of the IHT methods will be presented in the working paper [7]. Finally, it would be interesting to extend the methods of this paper to solve rank minimization problems and compare them with the methods studied in [5, 8]. This is left as a future research. Acknowledgment The author would like to thank Ting Kei Pong for proofreading and suggestions which substantially improves the presentation of the paper. 23

24 References [1] J. Barzilai and J.M. Borwein. Two point step size gradient methods. IMA J. Numer. Anal., 8: , [2] T. Blumensath and M. E. Davies. Iterative thresholding for sparse approximations. J. FOURIER ANAL. APPL., 14: , [3] T. Blumensath and M. E. Davies. Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal., 27(3): , [4] E. G. Birgin, J. M. Martínez, and M. Raydan. Nonmonotone spectral projected gradient methods on convex sets. SIAM J. Optim., 4: , [5] J. Cai, E. Candès, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM J. Optim., 20: , [6] K. K. Herrity, A. C. Gilbert, and J. A. Tropp. Sparse approximation via iterative thresholding. IEEE International Conference on Acoustics, Speech and Signal Processing, [7] J. Huang, S. Liu, and Z. Lu. Sparse approximation via nonconvex regularizers. Working paper, Department of Statistics, Texas A&M University, [8] P. Jain, R. Meka, and I. Dhillon. Guaranteed rank minimization via singular value projection. Neural Information Processing Systems, , [9] G. Lan and R. D. C. Monteiro. Iteration-complexity of first-order penalty methods for convex programming. To appear in Math. Prog.. [10] Z. Lu and Y. Zhang. Sparse approximation via penalty decomposition methods. Manuscript, Department of Mathematics, Simon Fraser University, Februray [11] S. Mallat and Z. Zhang. Matching pursuits with time-frequency dictionaries. IEEE T. Image Process., 41(12): , [12] Y. Nesterov. Introductory Lectures on Convex Programming: a basic course. Kluwer Academic Publishers, Massachusetts, [13] M. Nikolova. Description of the minimizers of least squares regularized with l 0 norm. Report HAL , CMLA - CNRS ENS Cachan, France, October [14] J. A. Tropp. Greed is good: algorithmic results for sparse approximation. IEEE T. Inform. Theory, 50(10): ,

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming

Iterative Reweighted Minimization Methods for l p Regularized Unconstrained Nonlinear Programming Zhaosong Lu October 5, 2012 (Revised: June 3, 2013; September 17, 2013) Abstract In this paper we study