arxiv: v2 [math.oc] 26 May 2018


A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems

Tianxiang Liu · Ting Kei Pong · Akiko Takeda

Abstract We consider a class of nonconvex nonsmooth optimization problems whose objective is the sum of a smooth function and a finite number of nonnegative proper closed possibly nonsmooth functions (whose proximal mappings are easy to compute), some of which are further composed with linear maps. This kind of problem arises naturally in various applications when different regularizers are introduced for inducing simultaneous structures in the solutions. Solving these problems, however, can be challenging because of the coupled nonsmooth functions: the corresponding proximal mapping can be hard to compute, so that standard first-order methods such as the proximal gradient algorithm cannot be applied efficiently. In this paper, we propose a successive difference-of-convex approximation method for solving this kind of problem. In this algorithm, we approximate the nonsmooth functions by their Moreau envelopes in each iteration. Making use of the simple observation that Moreau envelopes of nonnegative proper closed functions are continuous difference-of-convex functions, we can then approximately minimize the approximation function by first-order methods with suitable majorization techniques. These first-order methods can be implemented efficiently thanks to the fact that the proximal mapping of each nonsmooth function is easy to compute. Under suitable assumptions, we prove that the sequence generated by our method is bounded and any accumulation point is a stationary point of the objective. We also discuss how our method can be applied to concrete applications such as nonconvex fused regularized optimization problems and simultaneously structured matrix optimization problems, and illustrate the performance numerically for these two specific applications.

Keywords Moreau envelope · difference-of-convex approximation · proximal mapping · simultaneous structures

Ting Kei Pong is supported in part by Hong Kong Research Grants Council PolyU53085/6p. Akiko Takeda is supported by Grant-in-Aid for Scientific Research (C), 5K0003.

T. Liu
Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong. tiskyliu@polyu.edu.hk

T. K. Pong
Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong. tk.pong@polyu.edu.hk

A. Takeda
Department of Creative Informatics, Graduate School of Information Science and Technology, the University of Tokyo, Tokyo, Japan. takeda@mist.i.u-tokyo.ac.jp
RIKEN Center for Advanced Intelligence Project, -4-, Nihonbashi, Chuo-ku, Tokyo , Japan. akiko.takeda@riken.jp

1 Introduction

In this paper, we consider the following possibly nonconvex nonsmooth optimization problem:

$$\min_x\ F(x) := f(x) + P_0(x) + \sum_{i=1}^m P_i(\mathcal{A}_i x), \tag{1}$$

with the objective satisfying the following assumptions (see the next section for notation and definitions):

A1. $f:\mathbb{R}^n\to\mathbb{R}$ is an $L$-smooth function, i.e., there exists a constant $L>0$ so that $\|\nabla f(x)-\nabla f(v)\|\le L\|x-v\|$ for any $x, v\in\mathbb{R}^n$.
A2. $\mathcal{A}_i:\mathbb{R}^n\to\mathbb{R}^{n_i}$, $i=1,\ldots,m$, are linear mappings and $P_i:\mathbb{R}^{n_i}\to\mathbb{R}_+\cup\{\infty\}$, $i=0,\ldots,m$, are proper closed functions (where $n_0:=n$). The functions $P_i$, $i=0,\ldots,m$, are continuous in their respective domains, and $\operatorname{dom}P_0\cap\bigcap_{i=1}^m\mathcal{A}_i^{-1}(\operatorname{dom}P_i)\neq\emptyset$. Moreover, the proximal mapping of $\gamma P_i$ is easy to compute for every $\gamma>0$ and for each $i=0,\ldots,m$. The sets $\operatorname{dom}P_i$, $i=1,\ldots,m$, are closed.
A3. The function $f+P_0$ is level-bounded, i.e., for each $r\in\mathbb{R}$, the set $\{x\in\mathbb{R}^n : f(x)+P_0(x)\le r\}$ is bounded.

Problem (1) arises in many contemporary applications such as structured low rank matrix recovery problems (see, for example, [8]), nonconvex fused regularized optimization problems (see, for example, [2] and Example 2 in Section 4) and simultaneously structured matrix optimization problems (see, for example, [23] and Example 5 in Section 4). In these applications, the $P_i$'s are used for inducing desirable structures in the solutions, and they are typically functions whose proximal mappings are easy to compute.
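As a concrete check of Assumption A1, consider the least-squares loss $f(x)=\frac12\|Ax-b\|^2$ that appears in several of the applications above: its gradient $\nabla f(x)=A^T(Ax-b)$ is Lipschitz continuous with constant $L=\sigma_{\max}(A)^2$. The NumPy sketch below computes this $L$ and spot-checks the inequality in A1 on random pairs of points. The helper names (`least_squares_grad`, `lipschitz_constant`, `check_assumption_A1`) are hypothetical and are not part of the paper's method.

```python
import numpy as np


def least_squares_grad(A, b, x):
    """Gradient of f(x) = 0.5 * ||Ax - b||^2."""
    return A.T @ (A @ x - b)


def lipschitz_constant(A):
    """L = sigma_max(A)^2, a Lipschitz constant of the gradient above."""
    return np.linalg.norm(A, 2) ** 2


def check_assumption_A1(A, b, trials=100, seed=0):
    """Largest observed ratio ||grad f(x) - grad f(v)|| / (L * ||x - v||)."""
    rng = np.random.default_rng(seed)
    L = lipschitz_constant(A)
    worst = 0.0
    for _ in range(trials):
        x = rng.standard_normal(A.shape[1])
        v = rng.standard_normal(A.shape[1])
        diff = np.linalg.norm(least_squares_grad(A, b, x) - least_squares_grad(A, b, v))
        worst = max(worst, diff / (L * np.linalg.norm(x - v)))
    return worst


A = np.random.default_rng(1).standard_normal((30, 10))
b = np.ones(30)
ratio = check_assumption_A1(A, b)
```

A ratio of at most 1 (up to rounding) is consistent with A1. For a general smooth $f$, random sampling like this can only suggest a Lipschitz constant. It cannot certify one.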
If only one such function appears in (1), i.e., $m=0$, then standard first-order methods such as the proximal gradient algorithm or its variants can be applied to solve (1) efficiently, because these algorithms only require the computation of $\nabla f$ and the proximal mapping of $\gamma P_0$ ($\gamma>0$) in each iteration. However, in all the aforementioned applications, there is always more than one such structure-inducing function in (1) (i.e., $m\ge1$) and the $\mathcal{A}_i$'s might not always be identity mappings. Then the proximal gradient algorithm and its variants cannot be applied efficiently, because the proximal mapping of $P_0(x)+\sum_{i=1}^m P_i(\mathcal{A}_ix)$ can be hard to compute in general.

When the function $f$ and the $P_i$'s are all convex functions, one alternative approach for solving (1) is the alternating direction method of multipliers (ADMM); see, for example, [9, 0]. This method can be applied to (1) by suitably introducing slack variables that transform the problem into a linearly constrained problem, and each iteration only requires computing the proximal mappings of $f$ and the $\gamma P_i$'s, as well as an update of an auxiliary (dual) variable. However, it is known that the ADMM does not necessarily converge if the $P_i$'s are nonconvex and $m\ge1$; see, for example, [3, Example 7]. In the case when the $P_i$'s are nonconvex but globally Lipschitz for $i=0,\ldots,m$, and $\mathcal{A}_i$ is the identity mapping for all $i$, a new method for solving (1) was introduced in a series of works [32, 33]. Their method is based on the so-called proximal average of the $P_i$'s, and each iteration involves only the computations of $\nabla f$ and the proximal mappings of the $\gamma P_i$'s. However, it was only shown that any accumulation point of the sequence generated by their method is a stationary point of a certain smooth approximation of (1). Moreover, their method was designed for the case when the $P_i$'s are globally Lipschitz, and its convergence behavior is unknown when some non-Lipschitz functions, such as the $\ell_p$ quasi-norm or the indicator function of some closed set (such as the set of all $k$-sparse vectors), are present in (1). In this paper, we propose a new method for solving (1) that is ready to take advantage of the ease of proximal mapping computations and has a convergence guarantee under suitable assumptions, without imposing convexity or global Lipschitz continuity on the $P_i$'s.
We call our method the successive difference-of-convex approximation method (SDCAM). In this method, we construct an approximation to the objective of (1) in each iteration using the Moreau envelopes $e_{\lambda_{i,t}}P_i$, $i=1,\ldots,m$, where $t$ is the iteration counter and $\{\lambda_{i,t}\}$ are nonincreasing positive sequences satisfying $\lim_{t\to\infty}\lambda_{i,t}=0$. A suitable approximate stationary point of this approximation function is then taken to be the next iterate $x^{t+1}$ of our algorithm. The point $x^{t+1}$ can be found efficiently because the Moreau envelopes involved are continuous difference-of-convex functions, even though they are nonsmooth in general due to the possible nonconvexity of the $P_i$'s. Thus, one can incorporate majorization techniques into standard first-order methods such as the proximal gradient algorithm for finding $x^{t+1}$ in each iteration. Moreover, when such first-order methods are applied, the main computational cost per inner iteration typically comes only from computing $\nabla f$ and the proximal mappings of $\gamma P_i$, $i=0,\ldots,m$, $\gamma>0$, which are inexpensive in many applications. This suggests that the SDCAM can be applied efficiently for solving (1). More details of this algorithm are discussed in Section 3. There we also prove that, under suitable assumptions, the generated sequence $\{x^t\}$ is bounded and any accumulation point is a stationary point of (1).

The rest of the paper is organized as follows. In Section 2, we introduce notation and some preliminary results. Our SDCAM is presented and its convergence is analyzed under suitable assumptions in Section 3. In Section 4, we discuss how our method can be applied to various kinds of structured optimization problems. These include some nonconvex fused regularized optimization problems, some simultaneously sparse and low rank matrix optimization problems, and the low rank nearest correlation matrix problem. We also perform numerical experiments on some of these applications to demonstrate the efficiency of our algorithm in Section 5. Finally, we present some concluding remarks in Section 6.

2 Notation and preliminaries

In this paper, vectors and matrices are represented in bold lower case letters and upper case letters, respectively. The inner product of two vectors $a$ and $b\in\mathbb{R}^n$ is denoted by $a^Tb$ or $b^Ta$. We use $\|a\|_0$, $\|a\|_1$ and $\|a\|$ to denote the number of nonzero entries, the $\ell_1$ norm and the $\ell_2$ norm of $a$, respectively. Moreover, we use $\operatorname{Diag}(a)$ to denote the diagonal matrix whose diagonal is $a$. For two matrices $A$ and $B\in\mathbb{R}^{m\times n}$, their Hadamard (entrywise) product is denoted by $A\circ B$. We also use $\|A\|_*$ and $\|A\|_F$ to denote the nuclear norm and the Fröbenius norm of $A$, respectively. We let $\operatorname{vec}(A)\in\mathbb{R}^{mn}$ denote the vectorization of $A$, which is obtained by stacking the columns of $A$ on top of one another. Furthermore, we use $\sigma_{\max}(A)$ to denote the largest singular value of $A$. The space of symmetric $n\times n$ matrices is denoted by $\mathcal{S}^n$. For a matrix $X\in\mathcal{S}^n$, we use $\operatorname{diag}(X)\in\mathbb{R}^n$ to denote its diagonal and $\lambda_{\max}(X)$ to denote its largest eigenvalue. We write $X\succeq0$ if $X$ is positive semidefinite. For a linear operator $\mathcal{A}$, we let $\mathcal{A}^*$ denote its adjoint.

A function $h:\mathbb{R}^n\to\mathbb{R}\cup\{\infty\}$ is said to be proper if $\operatorname{dom}h:=\{x : h(x)<\infty\}\neq\emptyset$. Such a function is said to be closed if it is lower semicontinuous. Following [25, Definition 8.3], for a proper function $h$, the limiting and horizon subdifferentials at $x\in\operatorname{dom}h$ are defined respectively as

$$\partial h(x)=\left\{u : \exists\,u^t\to u,\ x^t\xrightarrow{h}x \text{ with } u^t\in\hat\partial h(x^t)\text{ for each }t\right\},$$
$$\partial^\infty h(x)=\left\{u : \exists\,\alpha_t\downarrow0,\ \alpha_tu^t\to u,\ x^t\xrightarrow{h}x \text{ with } u^t\in\hat\partial h(x^t)\text{ for each }t\right\},$$

where

$$\hat\partial h(w):=\left\{u : \liminf_{y\to w,\,y\neq w}\frac{h(y)-h(w)-u^T(y-w)}{\|y-w\|}\ge0\right\},$$

and the notation $x^t\xrightarrow{h}x$ means $x^t\to x$ and $h(x^t)\to h(x)$. We also define $\partial h(x)=\partial^\infty h(x):=\emptyset$ when $x\notin\operatorname{dom}h$. It is easy to show that at any $x\in\operatorname{dom}h$, the limiting and horizon subdifferentials have the following robustness property:

$$\begin{aligned}
&\left\{u : \exists\,u^t\to u,\ x^t\xrightarrow{h}x \text{ with } u^t\in\partial h(x^t)\text{ for each }t\right\}\subseteq\partial h(x),\\
&\left\{u : \exists\,\alpha_t\downarrow0,\ \alpha_tu^t\to u,\ x^t\xrightarrow{h}x \text{ with } u^t\in\partial h(x^t)\text{ for each }t\right\}\subseteq\partial^\infty h(x).
\end{aligned}\tag{2}$$

The limiting subdifferential at $x$ reduces to $\{\nabla h(x)\}$ if $h$ is continuously differentiable at $x$ [25, Exercise 8.8(b)], and it reduces to the convex subdifferential if $h$ is proper convex [25, Proposition 8.2]. For a proper closed function $h$ with $\inf h>-\infty$, we will also need its Moreau envelope for any given $\lambda>0$, which is defined as

$$e_\lambda h(x):=\inf_y\left\{\frac{1}{2\lambda}\|x-y\|^2+h(y)\right\}.$$

This function is finite everywhere [25, Theorem.25]. It is not hard to see that

$$e_\lambda h(x)\le h(x)\tag{3}$$

for all $x$. The infimum in the definition of the Moreau envelope is attained at the so-called proximal mapping of $\lambda h$ at $x$, which is defined as

$$\operatorname{prox}_{\lambda h}(x):=\operatorname*{Argmin}_{u\in\mathbb{R}^n}\left\{\frac{1}{2\lambda}\|u-x\|^2+h(u)\right\}.$$

This set is always nonempty because $h$ is proper closed and bounded below [25, Theorem.25]. Let $\zeta_\lambda\in\operatorname{prox}_{\lambda h}(x)$. Then we have from [25, Theorem 0.] and [25, Exercise 8.8(c)] that

$$\frac{1}{\lambda}(x-\zeta_\lambda)\in\partial h(\zeta_\lambda).\tag{4}$$

Furthermore, we have the following simple lemma, which should be well known. We provide a short proof for self-containedness.

Lemma 1 Let $h$ be a proper closed function with $\inf h>-\infty$ and let $\bar x\in\operatorname{dom}h$. Suppose that $x^t\to\bar x$ and $\lambda_t\downarrow0$, and pick any $\zeta^t\in\operatorname{prox}_{\lambda_th}(x^t)$ for each $t$. Then it holds that $\zeta^t\in\operatorname{dom}h$ for all $t$ and $\zeta^t\to\bar x$.

Proof Under the assumptions, we have the following inequality:

$$\frac{1}{2\lambda_t}\|x^t-\zeta^t\|^2+\inf h\le\frac{1}{2\lambda_t}\|x^t-\zeta^t\|^2+h(\zeta^t)=e_{\lambda_t}h(x^t)\le\frac{1}{2\lambda_t}\|x^t-\bar x\|^2+h(\bar x).$$

Hence, we have $\zeta^t\in\operatorname{dom}h$ for all $t$ and

$$\|\zeta^t-\bar x\|\le\|\zeta^t-x^t\|+\|x^t-\bar x\|\le\sqrt{2\lambda_t\big(h(\bar x)-\inf h\big)+\|x^t-\bar x\|^2}+\|x^t-\bar x\|\to0$$

as $t\to\infty$.

Finally, recall that for a nonempty closed set $C$, the indicator function is defined as

$$\delta_C(x)=\begin{cases}0 & \text{if } x\in C,\\ \infty & \text{else.}\end{cases}$$

We define the (limiting) normal cone at any $x\in C$ as $N_C(x):=\partial\delta_C(x)$. We let $\operatorname{dist}(x,C):=\inf_{y\in C}\|x-y\|$. The set of points in the nonempty closed set $C$ that are closest to a given $x$ is denoted by $\operatorname{proj}_C(x)$. One can observe that $\operatorname{proj}_C=\operatorname{prox}_{\delta_C}$. For a nonempty closed set $C$, the set $\operatorname{proj}_C(x)$ is always nonempty at any given $x$, and it is a singleton when $C$ is in addition convex.

3 Solution method for nonconvex nonsmooth optimization problems

3.1 Successive difference-of-convex approximation method

In this paper, we consider problem (1) and assume that its objective satisfies the assumptions A1, A2 and A3 in Section 1. We will discuss some concrete applications of (1) in more detail in Section 4. In this section, we present an algorithm for solving (1). Notice that (1) is in general a nonsmooth nonconvex optimization problem. The nonsmooth nonconvex function $P_0+\sum_{i=1}^mP_i\circ\mathcal{A}_i$ can be complicated in practice, and handling it directly can be challenging. The proximal mappings of $\gamma P_i$, $i=0,\ldots,m$, are easy to compute. However, the proximal mapping of $P_0+\sum_{i=1}^mP_i\circ\mathcal{A}_i$ may be hard to evaluate. Hence, the classical proximal gradient algorithm and its variants cannot be adapted directly and efficiently for solving (1).

In this paper, we suitably adapt a smoothing scheme for solving the above nonconvex nonsmooth problem. In this approach, in each iteration, we approximately minimize the auxiliary function

$$F_\lambda(x):=f(x)+P_0(x)+\sum_{i=1}^me_{\lambda_i}P_i(\mathcal{A}_ix)\tag{5}$$

and then update $x$ and $\lambda=(\lambda_1,\ldots,\lambda_m)$, where $e_{\lambda_i}P_i$ is the Moreau envelope of $P_i$. When $P_i$, $i=1,\ldots,m$, are all convex functions, the corresponding functions $e_{\lambda_i}P_i$ are Lipschitz differentiable [3, Proposition 2.29]. Hence, the function $F_\lambda$ becomes the sum of a nonsmooth function $P_0$ and a smooth function. It can then be minimized efficiently using, for example, the proximal gradient algorithm and its variants. This smoothing strategy has been widely used in the literature for convex problems; see [20], and also [4] for a software package for convex optimization problems based on smoothing techniques. However, in our setting, $P_i$ is not necessarily convex. Thus, the corresponding Moreau envelope $e_{\lambda_i}P_i$ is not necessarily smooth, and at first glance it is unclear whether $F_\lambda$ can be minimized efficiently.

The key ingredient in our approach (where $P_i$ is possibly nonconvex) is the simple observation that for any nonnegative proper closed function $P$ and any $\mu>0$,

$$e_\mu P(u)=\frac{1}{2\mu}\|u\|^2-\underbrace{\sup_{y\in\operatorname{dom}P}\left\{\frac1\mu u^Ty-\frac{1}{2\mu}\|y\|^2-P(y)\right\}}_{D_{\mu,P}(u)}.\tag{6}$$

Such a decomposition has been noted in [2] when $P=\delta_C$ for some nonempty closed set $C$, and in [7, Proposition 3] for the general case. Then $D_{\mu,P}$ is convex and continuous, since it is a supremum of affine functions and is finite-valued. Moreover, using the definitions of $e_\mu P(u)$ and $\operatorname{prox}_{\mu P}(u)$ together with (6), we see that the supremum in $D_{\mu,P}(u)$ is attained at any point in $\operatorname{prox}_{\mu P}(u)$. Let $y^*\in\operatorname{prox}_{\mu P}(\mathcal{A}x)$.
Then $y^*\in\operatorname{dom}P$ and we have for any $w$ that

$$\begin{aligned}
D_{\mu,P}(w)-D_{\mu,P}(\mathcal{A}x)&=\sup_{y\in\operatorname{dom}P}\left\{\frac1\mu w^Ty-\frac{1}{2\mu}\|y\|^2-P(y)\right\}-\sup_{y\in\operatorname{dom}P}\left\{\frac1\mu(\mathcal{A}x)^Ty-\frac{1}{2\mu}\|y\|^2-P(y)\right\}\\
&\ge\frac1\mu w^Ty^*-\frac{1}{2\mu}\|y^*\|^2-P(y^*)-\left(\frac1\mu(\mathcal{A}x)^Ty^*-\frac{1}{2\mu}\|y^*\|^2-P(y^*)\right)=\frac1\mu(y^*)^T(w-\mathcal{A}x).
\end{aligned}$$

This implies $\frac1\mu\operatorname{prox}_{\mu P}(\mathcal{A}x)\subseteq\partial D_{\mu,P}(\mathcal{A}x)$, from which we further deduce that

$$\frac1\mu\mathcal{A}^*\operatorname{prox}_{\mu P}(\mathcal{A}x)\subseteq\mathcal{A}^*\partial D_{\mu,P}(\mathcal{A}x)=\partial(D_{\mu,P}\circ\mathcal{A})(x),\tag{7}$$

where the last equality follows from [24, Theorem 23.9] because $D_{\mu,P}$ is convex and continuous. Thus, (5) is the sum of three functions:
- a smooth function $f$;
- a nonsmooth nonconvex function $P_0$ whose proximal mapping is easy to compute;
- a continuous difference-of-convex function. A subgradient of its subtracted convex part $\sum_{i=1}^mD_{\lambda_i,P_i}\circ\mathcal{A}_i$ is easy to compute, thanks to (7) and Assumption A2.

Proximal gradient methods with majorization techniques can then be suitably applied to minimizing (5). For instance, one can apply the NPG$_{\rm major}$ described in the appendix. Specifically, one can apply NPG$_{\rm major}$ with

$$h(x)=f(x)+\sum_{i=1}^m\frac{1}{2\lambda_i}\|\mathcal{A}_ix\|^2,\qquad P(x)=P_0(x),\qquad g(x)=\sum_{i=1}^mD_{\lambda_i,P_i}(\mathcal{A}_ix).$$

It is routine to check that this choice of $h$, $P$ and $g$ satisfies the assumptions required in the appendix. Moreover, $F_\lambda$ is level-bounded because $f+P_0$ is level-bounded by assumption, and $e_{\lambda_i}P_i$ is nonnegative for each $i=1,\ldots,m$ since $P_i$ is nonnegative. Finally, $F_\lambda$ is continuous in its domain because $P_0$ is. Hence, all assumptions required in the appendix for applying NPG$_{\rm major}$ are satisfied. The method can therefore be applied to minimizing $F_\lambda$, initialized at any point $x^0\in\operatorname{dom}P_0$.

Algorithm 1 below gives our method for solving (1) together with its update rules. We call this method the successive difference-of-convex approximation method (SDCAM).

Algorithm 1 The SDCAM for (1)

Step 0. Pick $m+1$ sequences of positive numbers $\{\epsilon_t\}$ and $\{\lambda_{i,t}\}$, $i=1,\ldots,m$, with $\epsilon_t\downarrow0$ and $\lambda_{i,t}\downarrow0$, and write $\lambda_t:=(\lambda_{1,t},\ldots,\lambda_{m,t})$. Pick an $x^{\rm feas}\in\operatorname{dom}P_0\cap\bigcap_{i=1}^m\mathcal{A}_i^{-1}(\operatorname{dom}P_i)$ and an $x^0\in\operatorname{dom}P_0$. Set $t=0$.
Step 1. If $F_{\lambda_t}(x^t)\le F_{\lambda_t}(x^{\rm feas})$, set $x^{t,0}=x^t$. Else, set $x^{t,0}=x^{\rm feas}$.
Step 2. Approximately minimize $F_{\lambda_t}(x)$, starting at $x^{t,0}$ and terminating at $x^{t,l_t}$ when

$$\begin{aligned}
&\operatorname{dist}\Big(0,\ \nabla f(x^{t,l_t})+\partial P_0(x^{t,l_t+1})+\sum_{i=1}^m\frac{1}{\lambda_{i,t}}\mathcal{A}_i^*\big[\mathcal{A}_ix^{t,l_t}-\operatorname{prox}_{\lambda_{i,t}P_i}(\mathcal{A}_ix^{t,l_t})\big]\Big)\le\epsilon_t,\\
&\|x^{t,l_t+1}-x^{t,l_t}\|\le\epsilon_t,\qquad F_{\lambda_t}(x^{t,l_t})\le F_{\lambda_t}(x^{t,0}).
\end{aligned}\tag{8}$$

Step 3. Update $x^{t+1}=x^{t,l_t}$ and $t=t+1$. Go to Step 1.

We would like to point out that Step 1 in SDCAM is crucial in our convergence analysis. This strategy was also used in the penalty decomposition method in [5]. As we shall see in the proof of Theorem 2 below, it ensures that (2) can be applied at an accumulation point of $\{x^t\}$.
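To make Algorithm 1 concrete, here is a minimal NumPy sketch. It is not the authors' implementation and makes several simplifying assumptions.

- The inner solver replaces NPG$_{\rm major}$ (whose nonmonotone line search is specified in the appendix) with proximal gradient steps $x\leftarrow\operatorname{prox}_{\beta^{-1}P_0}\big(x-\beta^{-1}(\nabla f(x)+\sum_i\lambda_i^{-1}\mathcal{A}_i^*[\mathcal{A}_ix-\zeta_i])\big)$, with $\zeta_i\in\operatorname{prox}_{\lambda_iP_i}(\mathcal{A}_ix)$.
- The step parameter $\beta$ is fixed to a Lipschitz constant of $\nabla h$. Each step is then a majorize-minimize step, so the last condition in (8) holds automatically.
- The first two conditions in (8) are tested through $\beta\|x^{t,l+1}-x^{t,l}\|$ and $\|x^{t,l+1}-x^{t,l}\|$, because $-\beta(x^{t,l+1}-x^{t,l})$ belongs to the set in the first condition.
- The geometric schedules $\lambda_{i,t}=\lambda_0\theta^t$ and $\epsilon_t=\epsilon_0\theta^t$ are illustrative choices.
- The function names (`sdcam`, `inner_solver`, `F_lambda`, `proj_sparse`) are hypothetical.
- The toy instance is also hypothetical: $P_0$ is the indicator of a box, and $P_1$ is the indicator of $\{z:\|z\|_0\le k\}$ composed with the difference matrix $D$.

```python
import numpy as np


def F_lambda(x, lams, f, P0, blocks):
    """F_lambda(x) = f(x) + P0(x) + sum_i e_{lam_i} P_i(A_i x), cf. (5).

    Each envelope is evaluated with one element z of the proximal mapping:
    e_lam P(u) = ||u - z||^2 / (2 lam) + P(z).
    """
    val = f(x) + P0(x)
    for (A, P, prox), lam in zip(blocks, lams):
        u = A @ x
        z = prox(u, lam)
        val += np.sum((u - z) ** 2) / (2.0 * lam) + P(z)
    return val


def inner_solver(x, lams, eps, grad_f, L_f, prox_P0, blocks, max_inner):
    """Fixed-step proximal gradient with majorization (stand-in for NPG_major).

    beta is a Lipschitz constant of grad h, h(x) = f(x) + sum_i ||A_i x||^2/(2 lam_i),
    so every step is a majorize-minimize step and F_lambda never increases.
    """
    beta = L_f + sum(np.linalg.norm(A, 2) ** 2 / lam
                     for (A, _, _), lam in zip(blocks, lams))
    for _ in range(max_inner):
        g = grad_f(x)
        for (A, _, prox), lam in zip(blocks, lams):
            u = A @ x
            g = g + A.T @ (u - prox(u, lam)) / lam  # zeta_i in prox_{lam P_i}(A_i x)
        x_new = prox_P0(x - g / beta, 1.0 / beta)
        step = np.linalg.norm(x_new - x)
        # -beta * (x_new - x) lies in the set in the first condition of (8)
        if beta * step <= eps and step <= eps:
            return x  # this is x^{t, l_t}
        x = x_new
    return x


def sdcam(f, grad_f, L_f, P0, prox_P0, blocks, x0, x_feas,
          lam0=1.0, shrink=0.5, eps0=1e-2, outer_iters=15, max_inner=1000):
    """Algorithm 1 with lam_t = lam0 * shrink**t and eps_t = eps0 * shrink**t."""
    lams, eps, x = [lam0] * len(blocks), eps0, x0
    lams_used = list(lams)
    for _ in range(outer_iters):
        # Step 1: warm start from x^t only if it is no worse than x^feas
        feasible_val = F_lambda(x_feas, lams, f, P0, blocks)
        start = x if F_lambda(x, lams, f, P0, blocks) <= feasible_val else x_feas
        # Steps 2-3: approximately minimize F_lambda, then shrink lambda and eps
        x = inner_solver(start, lams, eps, grad_f, L_f, prox_P0, blocks, max_inner)
        lams_used = list(lams)
        lams = [shrink * lam for lam in lams]
        eps *= shrink
    return x, lams_used


# Hypothetical toy instance: box constraint as P0, and P1 = indicator of
# {z : ||z||_0 <= k} composed with the difference matrix D (at most k jumps).
def proj_sparse(v, k):
    """One element of the projection onto {z : ||z||_0 <= k} (hard thresholding)."""
    z = np.zeros_like(v)
    if k > 0:
        idx = np.argsort(np.abs(v))[-k:]
        z[idx] = v[idx]
    return z


rng = np.random.default_rng(0)
n, k, tau = 40, 3, 5.0
b = np.repeat([1.0, -2.0, 0.5, 3.0], 10) + 0.1 * rng.standard_normal(n)
D = np.diff(np.eye(n), axis=0)  # (Dx)_i = x_{i+1} - x_i
f = lambda x: 0.5 * np.sum((x - b) ** 2)
grad_f = lambda x: x - b
P0 = lambda x: 0.0 if np.all(np.abs(x) <= tau) else np.inf
prox_P0 = lambda v, gamma: np.clip(v, -tau, tau)
P1 = lambda z: 0.0 if np.count_nonzero(z) <= k else np.inf
prox_P1 = lambda v, gamma: proj_sparse(v, k)
blocks = [(D, P1, prox_P1)]
x_feas = np.zeros(n)  # D @ 0 = 0 is k-sparse and 0 lies in the box
x_hat, lam_hat = sdcam(f, grad_f, 1.0, P0, prox_P0, blocks, x_feas, x_feas)
```

In this toy instance, Step 1 and the monotone inner loop give $F_\lambda(\hat x)\le F_\lambda(x^{\rm feas})=f(0)$ for the last $\lambda$ used. Since $F_\lambda(\hat x)\ge\frac{1}{2\lambda}\operatorname{dist}^2(D\hat x,\{z:\|z\|_0\le k\})$, it follows that $\operatorname{dist}^2(D\hat x,\{z:\|z\|_0\le k\})\le2\lambda f(0)$. This is the same mechanism used in the proof of Theorem 2(i).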

3.2 Theoretical guarantee for global convergence

In this section, we first discuss how $F_{\lambda_t}$ can be approximately minimized so that (8) is satisfied at the $t$-th iteration, and we comment on the computational complexity. Then we prove the convergence of the SDCAM under suitable assumptions.

As discussed above, $F_{\lambda_t}$ can be minimized by the NPG$_{\rm major}$ outlined in the appendix. Moreover, due to (7), one can choose $\zeta^{t,l}=\sum_{i=1}^m\frac{1}{\lambda_{i,t}}\mathcal{A}_i^*\zeta_i^{t,l}$ in the algorithm with

$$\zeta_i^{t,l}\in\operatorname{prox}_{\lambda_{i,t}P_i}(\mathcal{A}_ix^{t,l})\tag{9}$$

for each $i=1,\ldots,m$ and $l\ge0$, so that $\sum_{i=1}^m\frac{1}{\lambda_{i,t}}\mathcal{A}_i^*\zeta_i^{t,l}$ lies in the subdifferential of $\sum_{i=1}^m(D_{\lambda_{i,t},P_i}\circ\mathcal{A}_i)$ at $x^{t,l}$. Using this special version of NPG$_{\rm major}$, we can show that the termination criterion (8) is satisfied after finitely many inner iterations.

Theorem 1 Suppose that NPG$_{\rm major}$ is applied with $\zeta^{t,l}=\sum_{i=1}^m\frac{1}{\lambda_{i,t}}\mathcal{A}_i^*\zeta_i^{t,l}$, where the $\zeta_i^{t,l}$ are chosen as in (9), to minimize $F_{\lambda_t}$ in the $t$-th iteration of SDCAM. Then the criterion (8) is satisfied after finitely many inner iterations.

Proof According to the convergence properties of NPG$_{\rm major}$, one obtains a sequence $\{x^{t,l}\}_{l\ge0}$ satisfying
1. $\lim_{l\to\infty}\|x^{t,l+1}-x^{t,l}\|=0$ (Proposition 2 in the appendix), and $F_{\lambda_t}(x^{t,l})\le F_{\lambda_t}(x^{t,0})$ for all $l$ (thanks to (46)); and
2. for any $l\ge0$ (see (45)),

$$x^{t,l+1}\in\operatorname*{Argmin}_x\left\{\Big(\nabla f(x^{t,l})+\sum_{i=1}^m\frac{1}{\lambda_{i,t}}\omega_i^{t,l}\Big)^Tx+\frac{L_{t,l}}{2}\|x-x^{t,l}\|^2+P_0(x)\right\},\tag{10}$$

where $\omega_i^{t,l}:=\mathcal{A}_i^*[\mathcal{A}_ix^{t,l}-\zeta_i^{t,l}]$. Here, the sequence $\{L_{t,l}\}_{l\ge0}$ can be shown to be bounded; see Proposition 1 in the appendix.

Using [25, Exercise 8.8(c)], the condition (10) implies

$$0\in\nabla f(x^{t,l})+\sum_{i=1}^m\frac{1}{\lambda_{i,t}}\mathcal{A}_i^*[\mathcal{A}_ix^{t,l}-\zeta_i^{t,l}]+L_{t,l}(x^{t,l+1}-x^{t,l})+\partial P_0(x^{t,l+1}),$$

that is,

$$-L_{t,l}(x^{t,l+1}-x^{t,l})\in\nabla f(x^{t,l})+\sum_{i=1}^m\frac{1}{\lambda_{i,t}}\mathcal{A}_i^*[\mathcal{A}_ix^{t,l}-\zeta_i^{t,l}]+\partial P_0(x^{t,l+1}).$$

Since $\lim_{l\to\infty}\|x^{t,l+1}-x^{t,l}\|=0$ and $\{L_{t,l}\}_{l\ge0}$ is bounded, it follows that (8) holds with $l_t=l$ when $l$ is sufficiently large.

Remark 1 (Computational complexity) Suppose that NPG$_{\rm major}$ is applied to minimizing $F_{\lambda_t}$ in each iteration of SDCAM, with the $\zeta^{t,l}$ chosen as in Theorem 1. Then one has to repeatedly solve subproblems of the form (10) for various values of $\lambda_t$ and $\beta>0$ (in place of $L_{t,l}$). These computations are easy under the assumption that the proximal mapping of $\gamma P_i$, $i=0,\ldots,m$, $\gamma>0$, is easy to compute. Indeed, the subproblems can be rewritten as

$$x^{t,l+1}\in\operatorname{prox}_{\beta^{-1}P_0}\left(x^{t,l}-\frac1\beta\Big(\nabla f(x^{t,l})+\sum_{i=1}^m\frac{1}{\lambda_{i,t}}\mathcal{A}_i^*[\mathcal{A}_ix^{t,l}-\zeta_i^{t,l}]\Big)\right),\tag{11}$$

where $\zeta_i^{t,l}\in\operatorname{prox}_{\lambda_{i,t}P_i}(\mathcal{A}_ix^{t,l})$.

We now state and prove our convergence result for SDCAM. We will comment on (12) in Remark 2 below before proving the theorem.

Theorem 2 (Convergence of SDCAM) Let $\{x^t\}$ be the sequence generated by SDCAM for solving (1). Then $\{x^t\}$ is bounded. Let $x^*$ be an accumulation point of this sequence. Then we have the following results.
(i) It holds that $x^*\in\operatorname{dom}P_0\cap\bigcap_{i=1}^m\mathcal{A}_i^{-1}(\operatorname{dom}P_i)$.
(ii) Suppose that

$$y_0+\sum_{i=1}^m\mathcal{A}_i^*y_i=0,\quad y_0\in\partial^\infty P_0(x^*),\quad y_i\in\partial^\infty P_i(\mathcal{A}_ix^*)\ \text{for } i=1,\ldots,m\quad\Longrightarrow\quad y_i=0\ \text{for } i=0,\ldots,m.\tag{12}$$

Then $x^*$ is a stationary point of (1), i.e.,

$$0\in\nabla f(x^*)+\partial P_0(x^*)+\sum_{i=1}^m\mathcal{A}_i^*\partial P_i(\mathcal{A}_ix^*).\tag{13}$$

Remark 2 (Comments on condition (12))
(i) Condition (12) is a classical constraint qualification for nonconvex nonsmooth optimization problems; see [25, Corollary 0.9]. It is satisfied, for example, when $\mathcal{A}_i$ equals the identity map for all $i$ and all but one $P_i$ are locally Lipschitz, so that $\partial^\infty P_i(x^*)=\{0\}$ for all but one $P_i$; see [25, Exercise 0.0].
(ii) Under (12), it can be shown using [25, Theorem 0.], [25, Proposition 0.5] and [25, Theorem 0.6] that any local minimizer of (1) satisfies (13).

Proof Using the nonnegativity of $P_i$, the last criterion in (8) and the definitions of $F_\lambda$ and $x^{t,0}$, we see that for all $t\ge1$,

$$f(x^t)+P_0(x^t)\le F_{\lambda_{t-1}}(x^t)\le F_{\lambda_{t-1}}(x^{\rm feas})\le F(x^{\rm feas})=:F_{\rm feas},\tag{14}$$

where the last inequality follows from the definitions of $F$ and $F_\lambda$ and from (3). From this, one immediately concludes that $\{x^t\}$ is bounded because $f+P_0$ is level-bounded.

Next, let $x^*$ be an accumulation point of $\{x^t\}$. Then there exists a subsequence $\{x^t\}_{t\in I}$ such that $\lim_{t\in I}x^t=x^*$. Using this, (14), and the lower semicontinuity of $f+P_0$, we further see that

$$f(x^*)+P_0(x^*)\le\liminf_{t\in I}\big(f(x^t)+P_0(x^t)\big)\le F_{\rm feas}<\infty.$$

This shows that $x^*\in\operatorname{dom}P_0$. On the other hand, since $P_i$ is nonnegative, we have

$$0\le\frac12\operatorname{dist}^2(\mathcal{A}_ix,\operatorname{dom}P_i)=\inf_{y\in\operatorname{dom}P_i}\frac12\|\mathcal{A}_ix-y\|^2\le\inf_{y\in\operatorname{dom}P_i}\left\{\frac12\|\mathcal{A}_ix-y\|^2+\lambda_{i,t}P_i(y)\right\}=\lambda_{i,t}\,e_{\lambda_{i,t}}P_i(\mathcal{A}_ix)$$

for all $x$, all $t$ and each $i=1,\ldots,m$. We combine this with the definition of $F_\lambda$ and with the finiteness of $l:=\inf\{f+P_0\}$, which follows from the level-boundedness of $f+P_0$. This gives, for each $i=1,\ldots,m$,

$$l+\frac{1}{2\lambda_{i,t-1}}\operatorname{dist}^2(\mathcal{A}_ix^t,\operatorname{dom}P_i)\le l+e_{\lambda_{i,t-1}}P_i(\mathcal{A}_ix^t)\le F_{\lambda_{t-1}}(x^t)\le F_{\rm feas},$$

where the last inequality follows from (14). Since $\lambda_{i,t}\downarrow0$, we conclude that $\operatorname{dist}(\mathcal{A}_ix^*,\operatorname{dom}P_i)=\lim_{t\in I}\operatorname{dist}(\mathcal{A}_ix^t,\operatorname{dom}P_i)=0$, and hence $\mathcal{A}_ix^*\in\operatorname{dom}P_i$ because $\operatorname{dom}P_i$ is closed.

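We now prove (13) under (12). For notational simplicity, let $y^{t+1}:=x^{t,l_t+1}$. Then $\lim_{t\in I}y^t=x^*$ thanks to the second relation in (8). Moreover, from the first relation in (8), we see that there exist $\xi^t$ with $\|\xi^t\|\le\epsilon_{t-1}$, $\eta^t\in\partial P_0(y^t)$ and $\zeta_i^t\in\operatorname{prox}_{\lambda_{i,t-1}P_i}(\mathcal{A}_ix^t)$ for each $i=1,\ldots,m$ so that

$$\xi^t=\nabla f(x^t)+\eta^t+\sum_{i=1}^m\frac{1}{\lambda_{i,t-1}}\mathcal{A}_i^*(\mathcal{A}_ix^t-\zeta_i^t).\tag{15}$$

Define

$$r^t:=\|\eta^t\|+\sum_{i=1}^m\Big\|\frac{1}{\lambda_{i,t-1}}\mathcal{A}_i^*(\mathcal{A}_ix^t-\zeta_i^t)\Big\|.$$

We claim that $\{r^t\}_{t\in I}$ is bounded. Suppose to the contrary that $\{r^t\}_{t\in I}$ is unbounded, and assume without loss of generality that $\lim_{t\in I}r^t=\infty$ and $\inf_{t\in I}r^t>0$. Then the sequences $\{\eta^t/r^t\}_{t\in I}$ and $\{\frac{1}{\lambda_{i,t-1}r^t}\mathcal{A}_i^*(\mathcal{A}_ix^t-\zeta_i^t)\}_{t\in I}$, $i=1,\ldots,m$, are bounded. Without loss of generality, we may assume

$$\lim_{t\in I}\frac{\eta^t}{r^t}=\eta^*\quad\text{and}\quad\lim_{t\in I}\frac{1}{\lambda_{i,t-1}r^t}\mathcal{A}_i^*(\mathcal{A}_ix^t-\zeta_i^t)=\chi_i^*\tag{16}$$

for some $\eta^*$ and $\chi_i^*$, $i=1,\ldots,m$. Notice that

$$1=\lim_{t\in I}\frac{\|\eta^t\|+\sum_{i=1}^m\|\frac{1}{\lambda_{i,t-1}}\mathcal{A}_i^*(\mathcal{A}_ix^t-\zeta_i^t)\|}{r^t}=\|\eta^*\|+\sum_{i=1}^m\|\chi_i^*\|.\tag{17}$$

In addition, we divide both sides of (15) by $r^t$ and pass to the limit along $t\in I$, noting that $\xi^t\to0$ and that $\{\nabla f(x^t)\}_{t\in I}$ is bounded. This gives

$$0=\eta^*+\sum_{i=1}^m\chi_i^*.\tag{18}$$

On the other hand, since $\eta^t\in\partial P_0(y^t)$ and $\lim_{t\in I}r^t=\infty$, we have from (16), the continuity of $P_0$ in its domain and (2) that

$$\eta^*\in\partial^\infty P_0(x^*).\tag{19}$$

Next, we prove that $\chi_i^*\in\mathcal{A}_i^*\partial^\infty P_i(\mathcal{A}_ix^*)$ for $i=1,\ldots,m$. To proceed, we define for each $i=1,\ldots,m$

$$w_i^t:=\frac{\mathcal{A}_ix^t-\zeta_i^t}{\lambda_{i,t-1}r^t}$$

and claim that $\{w_i^t\}_{t\in I}$ is bounded for all $i=1,\ldots,m$. For an arbitrarily fixed $i\in\{1,\ldots,m\}$, suppose to the contrary that $\{w_i^t\}_{t\in I}$ is unbounded, and assume without loss of generality that $\lim_{t\in I}\|w_i^t\|=\infty$ and that

$$\lim_{t\in I}\frac{\mathcal{A}_ix^t-\zeta_i^t}{\|w_i^t\|\lambda_{i,t-1}r^t}=\psi_i\tag{20}$$

for some $\psi_i$ with unit norm. Then from the second equation in (16), we have

$$\|\psi_i\|=1\quad\text{and}\quad\mathcal{A}_i^*\psi_i=0.\tag{21}$$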

In addition, we observe from (20) that

$$\psi_i=\lim_{t\in I}\frac{\mathcal{A}_ix^t-\zeta_i^t}{\|w_i^t\|\lambda_{i,t-1}r^t}\in\left\{\lim_{t\in I}\frac{u^t}{\|w_i^t\|r^t} : u^t\in\partial P_i(\zeta_i^t)\text{ for each }t\right\}\subseteq\partial^\infty P_i(\mathcal{A}_ix^*),$$

where the first inclusion follows from (4). The second inclusion follows from three facts: Lemma 1 (so that $\lim_{t\in I}\zeta_i^t=\mathcal{A}_ix^*$ and $\{\zeta_i^t\}\subseteq\operatorname{dom}P_i$), the continuity of $P_i$ in its domain, and (2). We also have $0\in\partial^\infty P_0(x^*)$ and $0\in\partial^\infty P_j(\mathcal{A}_jx^*)$ for $j=1,\ldots,m$, which follow from (i) and [25, Corollary 8.0]. These facts together with (21) contradict (12). Consequently, $\{w_i^t\}_{t\in I}$ is bounded for all $i=1,\ldots,m$. Then, without loss of generality, we assume that $\lim_{t\in I}\frac{\mathcal{A}_ix^t-\zeta_i^t}{\lambda_{i,t-1}r^t}$ exists for all $i=1,\ldots,m$. For each $i=1,\ldots,m$, we then observe from (16) that

$$\chi_i^*=\mathcal{A}_i^*\lim_{t\in I}\frac{\mathcal{A}_ix^t-\zeta_i^t}{\lambda_{i,t-1}r^t}\in\mathcal{A}_i^*\left\{\lim_{t\in I}\frac{u^t}{r^t} : u^t\in\partial P_i(\zeta_i^t)\text{ for each }t\right\}\subseteq\mathcal{A}_i^*\partial^\infty P_i(\mathcal{A}_ix^*),$$

where the first inclusion follows from (4). The second inclusion again follows from Lemma 1 (so that $\lim_{t\in I}\zeta_i^t=\mathcal{A}_ix^*$ and $\{\zeta_i^t\}\subseteq\operatorname{dom}P_i$ for each $i=1,\ldots,m$), the continuity of $P_i$ in its domain, and (2). These together with (17), (18) and (19) contradict (12). Consequently, $\{r^t\}_{t\in I}$ is bounded.

Since $\{r^t\}_{t\in I}$ is bounded, we may assume without loss of generality that

$$\lim_{t\in I}\eta^t=\eta^*\quad\text{and}\quad\lim_{t\in I}\frac{1}{\lambda_{i,t-1}}\mathcal{A}_i^*(\mathcal{A}_ix^t-\zeta_i^t)=\chi_i^*\tag{22}$$

for some $\eta^*$ and $\chi_i^*$, $i=1,\ldots,m$. Then we have from (2) and the continuity of $P_0$ in its domain that

$$\eta^*\in\partial P_0(x^*).\tag{23}$$

Next, we prove that $\chi_i^*\in\mathcal{A}_i^*\partial P_i(\mathcal{A}_ix^*)$ for $i=1,\ldots,m$. To proceed, we define for each $i=1,\ldots,m$

$$\nu_i^t:=\frac{\mathcal{A}_ix^t-\zeta_i^t}{\lambda_{i,t-1}}$$

and claim that $\{\nu_i^t\}_{t\in I}$ is bounded for all $i=1,\ldots,m$. For an arbitrarily fixed $i\in\{1,\ldots,m\}$, suppose to the contrary that $\{\nu_i^t\}_{t\in I}$ is unbounded, and assume without loss of generality that $\lim_{t\in I}\|\nu_i^t\|=\infty$ and that

$$\lim_{t\in I}\frac{\mathcal{A}_ix^t-\zeta_i^t}{\|\nu_i^t\|\lambda_{i,t-1}}=\phi_i\tag{24}$$

for some $\phi_i$ with unit norm. Notice from the second equation of (22) that

$$\|\phi_i\|=1\quad\text{and}\quad\mathcal{A}_i^*\phi_i=0.\tag{25}$$

In addition, we observe from (24) that

$$\phi_i=\lim_{t\in I}\frac{\mathcal{A}_ix^t-\zeta_i^t}{\|\nu_i^t\|\lambda_{i,t-1}}\in\left\{\lim_{t\in I}\frac{u^t}{\|\nu_i^t\|} : u^t\in\partial P_i(\zeta_i^t)\text{ for each }t\right\}\subseteq\partial^\infty P_i(\mathcal{A}_ix^*),$$

where the first inclusion follows from (4). The second inclusion follows from Lemma 1 (so that $\lim_{t\in I}\zeta_i^t=\mathcal{A}_ix^*$ and $\{\zeta_i^t\}\subseteq\operatorname{dom}P_i$), the continuity of $P_i$ in its domain, and (2). We also have $0\in\partial^\infty P_0(x^*)$ and $0\in\partial^\infty P_j(\mathcal{A}_jx^*)$ for $j=1,\ldots,m$, which follow from (i) and [25, Corollary 8.0]. These facts together with (25) contradict (12). Consequently, $\{\nu_i^t\}_{t\in I}$ is bounded for all $i=1,\ldots,m$. Then, without loss of generality, we assume that $\lim_{t\in I}\frac{\mathcal{A}_ix^t-\zeta_i^t}{\lambda_{i,t-1}}$ exists for all $i=1,\ldots,m$. Therefore, for each $i=1,\ldots,m$, we obtain from (22) that

$$\chi_i^*=\mathcal{A}_i^*\lim_{t\in I}\frac{\mathcal{A}_ix^t-\zeta_i^t}{\lambda_{i,t-1}}\in\mathcal{A}_i^*\left\{\lim_{t\in I}u^t : u^t\in\partial P_i(\zeta_i^t)\text{ for each }t\right\}\subseteq\mathcal{A}_i^*\partial P_i(\mathcal{A}_ix^*),\tag{26}$$

where the first inclusion follows from (4). The second inclusion follows from Lemma 1 (so that $\lim_{t\in I}\zeta_i^t=\mathcal{A}_ix^*$ and $\{\zeta_i^t\}\subseteq\operatorname{dom}P_i$ for each $i=1,\ldots,m$), the continuity of $P_i$ in its domain, and (2). Passing to the limit in (15) along $t\in I$ and invoking (22), (23) and (26), we see that

$$0=\nabla f(x^*)+\eta^*+\sum_{i=1}^m\chi_i^*\in\nabla f(x^*)+\partial P_0(x^*)+\sum_{i=1}^m\mathcal{A}_i^*\partial P_i(\mathcal{A}_ix^*).$$

This completes the proof.

Remark 3 Suppose that, instead of (8), one can guarantee $F_{\lambda_t}(x^{t,l_t})\le\inf F_{\lambda_t}+\epsilon_t$. Then any accumulation point of the sequence $\{x^t\}$ generated by SDCAM is a global minimizer of (1). To see this, recall from [25, Theorem.25] that $e_{\lambda_{i,t}}P_i(\mathcal{A}_ix)\uparrow P_i(\mathcal{A}_ix)$ for each $i$ and all $x$. Recall also from the discussion on [25, Page 244] that $\{(e_{\lambda_{i,t}}P_i)\circ\mathcal{A}_i\}$ epiconverges to $P_i\circ\mathcal{A}_i$ for each $i$. Using these together with [25, Theorem 7.46], we further see that $\{F_{\lambda_t}\}$ epiconverges to $F$. Now, in view of [25, Theorem 7.3(b)], we conclude that any accumulation point of the sequence $\{x^t\}$ generated by SDCAM is a global minimizer of $F$.

4 Applications to structured optimization problems

4.1 Problems involving sparsity

Consider the following $\ell_0$-constrained optimization problem discussed in [30]:

$$\min_x\ f(x)\quad\text{subject to}\quad\|x\|_0\le k,\ x\in C,\tag{27}$$

where $f$ is as in (1) and $C$ is a nonempty closed set.
This model includes many important application problems as special cases, such as sparse principal component analysis, sparse portfolio selection and sparse nonnegative linear regression. These applications typically involve a closed set $C$ whose projection is easy to compute. For instance, sparse principal component analysis [27] uses $f(x)=-x^TVx$, defined with a covariance matrix $V\in\mathcal{S}^n$, and $C=\{x : \|x\|=1\}$. As another example, sparse nonnegative linear regression [26] uses $f(x)=\frac12\|Ax-b\|^2$, defined with $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^m$, and $C=\{x : x\ge0\}$. For these two examples, the direct projection onto $C\cap\{x : \|x\|_0\le k\}$ is easy to compute, and the proximal gradient algorithm can then be applied to solving (27). We next discuss a specific example where the direct projection onto $C\cap\{x : \|x\|_0\le k\}$ might not be easy to compute, and describe how our SDCAM can be applied.

Example 1 (Sparse portfolio problem) Given a basket of investable assets, the Markowitz model [9] seeks the optimal asset allocation of the portfolio. It minimizes the estimated variance subject to an expected return above a specified level. More recently, [6] added the $\ell_1$-norm to the classical Markowitz model to obtain sparse portfolios. Since then, various types of sparse regularizers, such as the $\ell_p$-norm ($0<p<1$), have been incorporated into the Markowitz model (e.g., [8]). The sparse portfolio selection problem we consider here takes the following form:

$$\min_x\ f(x):=\frac12x^TQx\quad\text{subject to}\quad\|x\|_0\le k,\ x\ge0,\ e^Tx=1,\ r^Tx=r_0,\tag{28}$$

where
- $Q\in\mathcal{S}^n$ is the estimated covariance matrix of the portfolio;
- $r\in\mathbb{R}^n$ is the estimated mean return vector of the investable assets;
- $r_0\in\mathbb{R}$ is a specific return level;
- $e$ is the vector of all ones.

The constraint $x\ge0$ is known as the no-short-sale constraint, and model (28) is the formulation of the shorting-prohibited sparse Markowitz model. We assume here that the feasible set of (28) is nonempty. Notice that the feasible set of (28) is compact, and hence (28) has a solution. Let $x^*$ be a solution of (28) and $\tau\ge\max_ix_i^*$. Define $\Omega:=\{x : \|x\|_0\le k,\ 0\le x\le\tau e\}$ and $S:=\{x : e^Tx=1,\ r^Tx=r_0\}$. Then (28) can be rewritten in the form of (1) (with the same optimal value) as follows:

$$\min_x\ f(x)+\underbrace{\delta_\Omega(x)}_{P_0(x)}+\underbrace{\delta_S(x)}_{P_1(x)},\tag{29}$$

in which $f+P_0$ is level-bounded.
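Therefore, we can apply the SDCAM in Section 3 to (29). In each subproblem of SDCAM, we can use NPG$_{\rm major}$ to minimize $F_{\lambda_t}$ as described in Theorem 1. The method involves computing two projections, $\operatorname{proj}_\Omega$ and $\operatorname{proj}_S$, which are easy to compute. Indeed, we have $\max\{\min\{H_k(y),\tau\},0\}\in\operatorname{proj}_\Omega(y)$, where the max and min are taken entrywise and $H_k(v)$ keeps any $k$ largest entries of $v$ and sets the rest to zero.³

³ To see this, recall from [5, Proposition 3.] that an element $\zeta$ of $\operatorname{proj}_\Omega(y)$ can be obtained as

$$\zeta_i=\begin{cases}\tilde\zeta_i & \text{if } i\in I,\\ 0 & \text{otherwise,}\end{cases}$$

where $\tilde\zeta_i=\operatorname{argmin}\{\frac12(\zeta_i-y_i)^2 : 0\le\zeta_i\le\tau\}=\max\{\min\{y_i,\tau\},0\}$, and $I$ is an index set of size $k$ corresponding to the $k$ largest values of

$$\left\{\tfrac12y_i^2-\tfrac12(\tilde\zeta_i-y_i)^2\right\}_{i=1}^n=\left\{\tfrac12y_i^2-\tfrac12\big(\min\{\max\{y_i-\tau,0\},y_i\}\big)^2\right\}_{i=1}^n.$$

Since the function $t\mapsto\frac12t^2-\frac12\big(\min\{\max\{t-\tau,0\},t\}\big)^2$ is nondecreasing, we can let $I$ correspond to any $k$ largest entries of $y$.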
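The two projections needed for (29) are short to implement. The NumPy sketch below computes the element $\max\{\min\{H_k(y),\tau\},0\}$ of $\operatorname{proj}_\Omega(y)$. Here $H_k$ keeps $k$ largest entries by value, as in the footnote, not by magnitude. The sketch then compares the resulting distance to $y$ against a brute-force enumeration of supports, which is feasible only for a small $n$. For $\operatorname{proj}_S$ it uses the standard projection onto an affine set $\{x: Bx=c\}$, namely $y\mapsto y-B^T(BB^T)^{-1}(By-c)$ with $B=[e\ \ r]^T$ and $c=(1,r_0)$. This formula assumes that $e$ and $r$ are linearly independent. The function names are hypothetical.

```python
import itertools

import numpy as np


def proj_Omega(y, k, tau):
    """Element of the projection onto Omega = {x : ||x||_0 <= k, 0 <= x <= tau}."""
    z = np.zeros_like(y)
    if k > 0:
        idx = np.argsort(y)[-k:]   # H_k: any k largest ENTRIES (not magnitudes)
        z[idx] = y[idx]
    return np.clip(z, 0.0, tau)    # entrywise max{min{., tau}, 0}


def proj_S(y, r, r0):
    """Projection onto S = {x : e^T x = 1, r^T x = r0}.

    Assumes e and r are linearly independent, so that B B^T is invertible.
    """
    B = np.vstack([np.ones_like(r), r])
    c = np.array([1.0, r0])
    return y - B.T @ np.linalg.solve(B @ B.T, B @ y - c)


def brute_force_dist_Omega(y, k, tau):
    """dist(y, Omega) by enumerating all supports of size k (small n only)."""
    best = np.inf
    for support in itertools.combinations(range(y.size), k):
        s = list(support)
        z = np.zeros_like(y)
        z[s] = np.clip(y[s], 0.0, tau)   # best point in Omega with this support
        best = min(best, np.linalg.norm(y - z))
    return best


rng = np.random.default_rng(0)
y = 2.0 * rng.standard_normal(6)
z = proj_Omega(y, 2, 1.5)
gap = np.linalg.norm(y - z) - brute_force_dist_Omega(y, 2, 1.5)
```

By the footnote argument, `gap` should be zero up to rounding. Enumerating supports of size exactly $k$ is enough, because clipping a negative entry to $0$ already covers smaller supports.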

In statistics, the $\ell_1$-norm regularizer has been used for inducing sparsity in variable selection problems; see Lasso [28], which is an application of the $\ell_1$ penalty to linear regression. A more general model of Lasso, the generalized Lasso [29], has been proposed as

$$\min_x\ \frac12\|Ax-b\|^2+c\|Dx\|_1,$$

where
- $A\in\mathbb{R}^{m\times n}$ is a matrix of predictors;
- $b\in\mathbb{R}^m$ is a response vector;
- $c\ge0$ is a tuning parameter;
- $D\in\mathbb{R}^{d\times n}$ is a specified penalty matrix.

The term $\|Dx\|_1$ can enforce certain structural sparsity on the coefficients in the solution. For example, with an appropriate $D$, $\|Dx\|_1$ can express $\sum_{i=2}^n|x_i-x_{i-1}|$, which penalizes the absolute differences in adjacent coordinates of $x$. This specific $D$ leads to the so-called fused Lasso. A variant of this type of regularizer (the anisotropic total variation regularizer) is also used in image processing for minimizing the horizontal and/or vertical differences between pixels. Some other applications that require a non-identity matrix $D$ in the generalized Lasso were discussed in [29]. In the next example, we discuss how our SDCAM can be applied to some nonconvex variants of the generalized Lasso problem.

Example 2 (Nonconvex fused regularized problem) Similarly as in [2], we consider the following nonconvex fused regularized problem:

$$\min_x\ \frac12\|Ax-b\|^2+c_1\phi_1(x)+c_2\phi_2(Dx),\tag{30}$$

where
- $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^m$;
- $D$ is the matrix with $Dx=(x_2-x_1,\ldots,x_n-x_{n-1})^T$;
- $c_1>0$ and $c_2>0$ are regularization parameters;
- $\phi_1(x)=\sum_{i=1}^n\varphi_i(|x_i|)$ and $\phi_2$ are nonconvex sparsity-inducing regularizers, with each $\varphi_i:\mathbb{R}_+\to\mathbb{R}_+$ closed and nondecreasing, and $\phi_2:\mathbb{R}^{n-1}\to\mathbb{R}_+$ closed and level-bounded.

Note that (30) can be rewritten in the form

$$\min_x\ g(\tilde Ax-\tilde b)+\Psi(x),\tag{31}$$

in which

$$\tilde A=\begin{pmatrix}A\\ D\end{pmatrix},\qquad\tilde b=\begin{pmatrix}b\\ 0\end{pmatrix},\qquad g(y)=\frac12\|y_1\|^2+c_2\phi_2(y_2)\ \text{ with } y:=(y_1,y_2)\in\mathbb{R}^m\times\mathbb{R}^{n-1},$$

and $\Psi(x)=c_1\sum_{i=1}^n\varphi_i(|x_i|)$. It is routine to check that $g$ and $\Psi$ satisfy [4, Assumption 2]. Hence, according to [4, Theorem 2.], we know that (31), and hence (30), has at least one solution.
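The difference matrix $D$ of Example 2 and the stacked data $(\tilde A,\tilde b)$ of (31) are easy to form explicitly. The NumPy sketch below builds them and also gives matrix-free versions of $D$ and $D^T$. It then checks numerically that the objectives of (30) and (31) coincide at a random point. Only for this check, it takes $\varphi_i(s)=s^{1/2}$ and $\phi_2=\|\cdot\|_{1/2}^{1/2}$. This choice of regularizers is an assumption for illustration, and the function names are hypothetical.

```python
import numpy as np


def diff_matrix(n):
    """D in R^{(n-1) x n} with (Dx)_i = x_{i+1} - x_i."""
    return np.diff(np.eye(n), axis=0)


def apply_D(x):
    """Matrix-free D x."""
    return np.diff(x)


def apply_Dt(y):
    """Matrix-free adjoint D^T y (requires len(y) >= 1)."""
    return np.concatenate(([-y[0]], y[:-1] - y[1:], [y[-1]]))


def lp_p(v, p=0.5):
    """||v||_p^p, assumed here for both phi_1 and phi_2 (illustration only)."""
    return np.sum(np.abs(v) ** p)


def objective_30(x, A, b, c1, c2):
    """0.5 ||Ax - b||^2 + c1 phi_1(x) + c2 phi_2(Dx)."""
    return 0.5 * np.sum((A @ x - b) ** 2) + c1 * lp_p(x) + c2 * lp_p(apply_D(x))


def objective_31(x, A, b, c1, c2):
    """g(A_tilde x - b_tilde) + Psi(x) with the stacked data of (31)."""
    n, m = x.size, A.shape[0]
    A_tilde = np.vstack([A, diff_matrix(n)])
    b_tilde = np.concatenate([b, np.zeros(n - 1)])
    y = A_tilde @ x - b_tilde
    g = 0.5 * np.sum(y[:m] ** 2) + c2 * lp_p(y[m:])
    return g + c1 * lp_p(x)  # Psi(x) = c1 * sum_i phi_i(|x_i|)
```

For large $n$, `apply_D` and `apply_Dt` avoid storing the $(n-1)\times n$ matrix. This matters when $D$ plays the role of $\mathcal{A}_1$ in the proximal gradient steps of SDCAM.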
Notice that we can directly apply the SDCAM in Section 3 to (30) when $\phi_1$ is level-bounded, e.g., $\phi_1(x)=\|x\|_p^p$: in this case we set $f(x)=\frac12\|Ax-b\|^2$, $P_0=c_1\phi_1$ and $P_1=c_2\phi_2$ with $A_1=D$. When NPG_major is applied as described in Theorem 1 for solving the corresponding subproblems, it involves computing the proximal mappings $\mathrm{prox}_{\mu\phi_1}$ and $\mathrm{prox}_{\mu\phi_2}$ for $\mu>0$. These are easy to compute for many well-known nonconvex sparse regularizers; see [2]. On the other hand, when $\phi_1$ is not level-bounded, let $x^*$ be a solution of (30) and $\tau\ge\max_i|x_i^*|$. We define $\Omega:=\{x:\max_i|x_i|\le\tau\}$ and rewrite (30) in the form of (1) (with the same optimal value) as follows:
$$\min_x\ \underbrace{\frac12\|Ax-b\|^2}_{f(x)}+\underbrace{c_1\sum_{i=1}^n\varphi_i(|x_i|)+\delta_\Omega(x)}_{P_0(x)}+\underbrace{c_2\phi_2(Dx)}_{P_1(A_1x)}. \tag{32}$$
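For (32), $P_0$ is separable, so $\mathrm{prox}_{\mu P_0}$ reduces to $n$ scalar problems involving $\psi_i(x_i)=c_1\varphi_i(|x_i|)+\delta_{[-\tau,\tau]}(x_i)$. As an illustration only, the sketch below takes the capped-$\ell_1$ choice $\varphi_i(t)=\min\{t,\nu\}$ with a hypothetical cap $\nu>0$, and writes $\lambda=\mu c_1$. On the piece $|x|\le\min\{\nu,\tau\}$ the scalar objective is $\frac12(x-y)^2+\lambda|x|$, and on the pieces $\nu\le|x|\le\tau$ it is $\frac12(x-y)^2+\lambda\nu$; each is convex there, so its minimizer is a clipped soft or plain threshold, and the global minimizer is the best of these candidates.

```python
import numpy as np


def prox_capped_l1_box(y, lam, nu, tau):
    """Elementwise argmin_x 0.5*(x - y)^2 + lam*min(|x|, nu) subject to |x| <= tau.

    Hypothetical instance of prox_{mu*psi_i} with phi_i(t) = min(t, nu) and lam = mu*c_1.
    """
    y = np.asarray(y, dtype=float)
    a = min(nu, tau)
    soft = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)
    cands = [np.clip(soft, -a, a)]  # piece |x| <= min(nu, tau): penalty lam*|x|
    if nu < tau:
        cands.append(np.clip(y, nu, tau))    # piece nu <= x <= tau: constant penalty lam*nu
        cands.append(np.clip(y, -tau, -nu))  # piece -tau <= x <= -nu
    cands = np.stack(cands)
    vals = 0.5 * (cands - y) ** 2 + lam * np.minimum(np.abs(cands), nu)
    best = np.expand_dims(np.argmin(vals, axis=0), 0)  # index of the best candidate
    return np.take_along_axis(cands, best, axis=0)[0]


y = np.array([-2.0, -0.7, 0.1, 0.5, 1.8])
print(prox_capped_l1_box(y, lam=0.3, nu=0.6, tau=1.5))
```

Enumerating the convex pieces keeps the routine exact for this particular regularizer; other choices of $\varphi_i$ (SCAD, MCP, $\ell_p$) would need their own case analysis.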

Then $f+P_0$ is level-bounded and hence the SDCAM in Section 3 can be applied. When NPG_major is applied to the subproblems of SDCAM as described in Theorem 1, it involves computing the proximal mappings $\mathrm{prox}_{\mu P_0}$ and $\mathrm{prox}_{\mu\phi_2}$ for $\mu>0$. Note that $\mathrm{prox}_{\mu P_0}$ can be obtained from $\mathrm{prox}_{\mu\psi_i}$ with $\psi_i(x_i):=c_1\varphi_i(|x_i|)+\delta_{[-\tau,\tau]}(x_i)$, $i=1,\ldots,n$, which can be computed efficiently for various nonconvex sparse regularizers such as SCAD, MCP, the $\ell_p$ penalty and capped-$\ell_1$ (see [2]). Finally, $\mathrm{prox}_{\mu\phi_2}$ is also easy to compute for many of these regularizers.

4.2 Problems with rank constraints

Our algorithm can also be applied to rank-constrained nonconvex nonsmooth matrix optimization problems. We discuss some concrete examples below. For notational simplicity, from now on we let $\Xi_k:=\{X:\mathrm{rank}(X)\le k\}$ for a given integer $k$. Note that if $P=\delta_{\Xi_k}$, then
$$e_\lambda P(X)=\frac{1}{2\lambda}\mathrm{dist}^2(X,\Xi_k)=\frac{1}{2\lambda}\left(\|X\|_F^2-\|X\|_{k,2}^2\right),$$
where $\|X\|_{k,2}^2$ denotes the sum of squares of the $k$ largest singular values of $X$. The function $X\mapsto\|X\|_F^2-\|X\|_{k,2}^2$ is a rank-related variant of the so-called k-sparsity functions [] because the relation $\mathrm{rank}(X)\le k$ can be equivalently expressed as $\|X\|_F^2-\|X\|_{k,2}^2=0$. A variant of this function was used in [30] as a penalty function for inducing sparsity. It is interesting to note that this function arises naturally from the Moreau envelope of the indicator function of $\Xi_k$.

Example 3 (Matrix completion) The problem of recovering a low-rank data matrix $M\in\mathbb{R}^{m\times n}$ from a sampling of its entries is known as the matrix completion problem [7]. This problem can be formulated as
$$\min_X\ \mathrm{rank}(X)\quad\text{subject to}\quad P_\Omega(X)=P_\Omega(M),$$
where $\Omega$ is the index set of known entries of $M$, and $P_\Omega$ is the sampling map defined as
$$[P_\Omega(Y)]_{ij}=\begin{cases}Y_{ij}&\text{if }(i,j)\in\Omega,\\ 0&\text{otherwise.}\end{cases}$$
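The envelope formula for $\delta_{\Xi_k}$ in Section 4.2 can be verified numerically. By the Eckart-Young theorem, a truncated SVD is a nearest matrix of rank at most $k$, so $\mathrm{dist}^2(X,\Xi_k)$ is the sum of squares of the discarded singular values. The NumPy sketch below (function names hypothetical) computes $e_\lambda\delta_{\Xi_k}(X)$ both ways.

```python
import numpy as np


def proj_rank(W, k):
    """One element of proj_{Xi_k}(W): keep the k largest singular values (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = s.copy()
    s[k:] = 0.0  # singular values are returned in descending order
    return (U * s) @ Vt


def envelope_rank_indicator(X, k, lam):
    """e_lam delta_{Xi_k}(X) = (||X||_F^2 - ||X||_{k,2}^2) / (2*lam)."""
    s = np.linalg.svd(X, compute_uv=False)
    return (np.sum(s ** 2) - np.sum(s[:k] ** 2)) / (2.0 * lam)


rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
lam, k = 0.3, 2
via_dist = np.linalg.norm(X - proj_rank(X, k), "fro") ** 2 / (2.0 * lam)
print(np.isclose(via_dist, envelope_rank_indicator(X, k, lam)))
```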
When the entries of the data matrix are noisy, one can consider the following variants of the above model:
$$\min_X\ \|P_\Omega(X)-P_\Omega(M)\|_F^2\quad\text{subject to}\quad\mathrm{rank}(X)\le k,$$
or
$$\min_X\ \|P_\Omega(X)-P_\Omega(M)\|_F^2+\mu\,\mathrm{rank}(X),$$
where $\mu>0$ is a tuning parameter and $k$ is a positive integer. Since these problems are nonconvex in general, some popular convex relaxation approaches have been proposed, in which the rank function is replaced by the nuclear norm [22]. The convex relaxations can be shown to be equivalent to the original nonconvex problems under suitable conditions [7].
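The reformulations below (Examples 3 and 4) use projections onto rank-constrained sets whose singular values, or eigenvalues, are additionally capped at $\tau$. A minimal NumPy sketch of the recipes stated there, assuming a dense SVD or eigendecomposition is affordable; the function names are hypothetical. `proj_bounded_rank` returns $U\mathrm{Diag}(\min\{H_k(\sigma),\tau e\})V^\top$ and `proj_bounded_psd_rank` returns $U\mathrm{Diag}(\max\{\min\{\tilde H_k(\lambda),\tau\},0\})U^\top$ for symmetric input.

```python
import numpy as np


def proj_bounded_rank(W, k, tau):
    """Element of proj onto {X : rank(X) <= k, sigma_max(X) <= tau}."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    z = np.minimum(s, tau)  # cap singular values at tau
    z[k:] = 0.0             # s is descending, so this is H_k
    return (U * z) @ Vt


def proj_bounded_psd_rank(W, k, tau):
    """Element of proj onto {X psd : rank(X) <= k, lambda_max(X) <= tau}, W symmetric."""
    lam, U = np.linalg.eigh(W)  # eigenvalues in ascending order
    z = np.zeros_like(lam)
    if k > 0:
        idx = np.argsort(lam)[-k:]  # H~_k: any k largest (signed) eigenvalues
        z[idx] = np.clip(lam[idx], 0.0, tau)
    return (U * z) @ U.T


rng = np.random.default_rng(7)
W = rng.normal(size=(6, 5))
print(np.linalg.svd(proj_bounded_rank(W, 2, 1.0), compute_uv=False))
```

Only the $k$ leading singular triplets (or eigenpairs) enter the result, so a partial decomposition would suffice in principle.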

Here we consider the following variant of the matrix completion problem:
$$\min_X\ \frac12\|P_\Omega(X)-P_\Omega(M)\|_F^2\quad\text{subject to}\quad P_\Theta(X)=P_\Theta(M),\ \mathrm{rank}(X)\le k, \tag{33}$$
where $\Omega$ is an index set corresponding to possibly noisy known entries of $M$, and $\Theta$ is another index set corresponding to noiseless known entries of $M$. Suppose that (33) has a solution $X^*$, and take $\tau\ge\max\{\max_{i,j}|X^*_{ij}|,\sigma_{\max}(X^*)\}$. Let $S:=\{X:P_\Theta(X)=P_\Theta(M)\}$, $\tilde S:=\{X\in S:\max_{i,j}|X_{ij}|\le\tau\}$ and $\tilde\Xi_k:=\{X\in\Xi_k:\sigma_{\max}(X)\le\tau\}$. Then (33) can be rewritten in the form of (1) (with the same optimal value) in the following two ways:
$$\min_X\ \underbrace{\frac12\|P_\Omega(X-M)\|_F^2}_{f(X)}+\underbrace{\delta_S(X)}_{P_1(X)}+\underbrace{\delta_{\tilde\Xi_k}(X)}_{P_0(X)}, \tag{34}$$
$$\min_X\ \underbrace{\frac12\|P_\Omega(X-M)\|_F^2}_{f(X)}+\underbrace{\delta_{\tilde S}(X)}_{P_0(X)}+\underbrace{\delta_{\Xi_k}(X)}_{P_1(X)}. \tag{35}$$
Note that in both cases, $f+P_0$ is level-bounded and hence the SDCAM in Section 3 can be applied. Suppose that SDCAM is applied to (34). Then, when NPG_major is applied as described in Theorem 1 for solving the subproblems, it requires computing $\mathrm{proj}_S$ and $\mathrm{proj}_{\tilde\Xi_k}$. Both of these are easy to compute. In particular, let $U\mathrm{Diag}(\sigma)V^\top$ be a singular value decomposition of $W$. Then an element $Y\in\mathrm{proj}_{\tilde\Xi_k}(W)$ can be computed as $Y=U\mathrm{Diag}(\zeta^*)V^\top$ with $\zeta^*=\min\{H_k(\sigma),\tau e\}$, where $e$ is the vector of all ones, the minimum is taken componentwise, and $H_k(v)$ is the hard thresholding operator that keeps any $k$ largest entries of $v$ in magnitude and sets the rest to zero.⁴ On the other hand, when applying SDCAM to (35) with NPG_major as described in Theorem 1 applied to the subproblems, one needs to compute $\mathrm{proj}_{\tilde S}$ and $\mathrm{proj}_{\Xi_k}$. Again, both of these are easy to compute. In particular, let $U\mathrm{Diag}(\sigma)V^\top$ be a singular value decomposition of $W$. Then an element $Y\in\mathrm{proj}_{\Xi_k}(W)$ can be computed as $Y=U\mathrm{Diag}(H_k(\sigma))V^\top$.

⁴ To see this, recall from [6, Corollary 2.3] and [5, Proposition 3.] that an element $Y\in\mathrm{proj}_{\tilde\Xi_k}(W)$ can be computed as $Y=U\mathrm{Diag}(\zeta^*)V^\top$, where $\zeta^*_i=\tilde\zeta_i$ if $i\in I$ and $\zeta^*_i=0$ otherwise, with $\tilde\zeta_i=\arg\min\{\frac12(\zeta_i-\sigma_i)^2:|\zeta_i|\le\tau\}=\min\{\sigma_i,\tau\}$, and $I$ is an index set of size $k$ corresponding to the $k$ largest values of $\{\frac12\sigma_i^2-\frac12(\tilde\zeta_i-\sigma_i)^2\}_{i=1}^n=\{\frac12\sigma_i^2-\frac12(\max\{0,\sigma_i-\tau\})^2\}_{i=1}^n$. Since $t\mapsto\frac12t^2-\frac12(\max\{0,t-\tau\})^2$ is nondecreasing for nonnegative $t$, we can take $I$ to correspond to any $k$ largest singular values.

Example 4 (Nearest low-rank correlation matrix) Finding the nearest low-rank correlation matrix has important applications in finance; see [5, ]. The problem

is often formulated as
$$\min_{X\in\mathcal{S}^n}\ \frac12\|H\circ(X-M)\|_F^2\quad\text{subject to}\quad\mathrm{diag}(X)=e,\ X\succeq0,\ \mathrm{rank}(X)\le k, \tag{36}$$
where $\mathcal{S}^n$ is the space of $n\times n$ symmetric matrices, $H$ is a given nonnegative weight matrix, $M$ is a given symmetric matrix, $e$ is the vector of all ones, and $k$ is a positive integer. In [], the constraint $\mathrm{rank}(X)\le k$ was rewritten equivalently as requiring the sum of the $n-k$ smallest eigenvalues to equal zero. A penalty approach was then adopted to handle this latter equality constraint. In the following, we describe how to solve (36) by the SDCAM in Section 3.

Notice that for any $X\in\mathcal{S}^n$ satisfying $\mathrm{diag}(X)=e$ and $X\succeq0$, we have $X\preceq nI$. Thus, the feasible set of (36) is compact and hence (36) has a solution. Let $X^*$ be a solution of (36) and $\tau\ge\max\{\max_{i,j}|X^*_{ij}|,\lambda_{\max}(X^*)\}$. Define
$$S:=\{X\in\mathcal{S}^n:\mathrm{diag}(X)=e\},\quad \Pi_k:=\{X\succeq0:\mathrm{rank}(X)\le k\},$$
$$\tilde S:=\{X\in S:\max_{i,j}|X_{ij}|\le\tau\},\quad \tilde\Pi_k:=\{X\in\Pi_k:\lambda_{\max}(X)\le\tau\}.$$
Then (36) can be rewritten in the form of (1) (with the same optimal value) in the following two ways:
$$\min_{X\in\mathcal{S}^n}\ \underbrace{\frac12\|H\circ(X-M)\|_F^2}_{f(X)}+\underbrace{\delta_S(X)}_{P_1(X)}+\underbrace{\delta_{\tilde\Pi_k}(X)}_{P_0(X)}, \tag{37}$$
$$\min_{X\in\mathcal{S}^n}\ \underbrace{\frac12\|H\circ(X-M)\|_F^2}_{f(X)}+\underbrace{\delta_{\tilde S}(X)}_{P_0(X)}+\underbrace{\delta_{\Pi_k}(X)}_{P_1(X)}. \tag{38}$$
Notice that in both cases, $f+P_0$ is level-bounded and hence we can apply the SDCAM in Section 3. We first look at (37). When NPG_major as described in Theorem 1 is applied to the subproblems, one has to compute $\mathrm{proj}_S$ and $\mathrm{proj}_{\tilde\Pi_k}$. Both projections can be easily computed. In particular, let $U\mathrm{Diag}(\lambda)U^\top$ be an eigenvalue decomposition of $W\in\mathcal{S}^n$. Then an element $Y\in\mathrm{proj}_{\tilde\Pi_k}(W)$ can be computed as $Y=U\mathrm{Diag}(\zeta^*)U^\top$ with $\zeta^*=\max\{\min\{\tilde H_k(\lambda),\tau\},0\}$, where the maximum and minimum are taken componentwise and $\tilde H_k(v)$ keeps any $k$ largest entries of $v$ and sets the rest to zero.⁵ We next turn to (38). In this case, in each NPG_major iteration, one has to compute $\mathrm{proj}_{\tilde S}$ and $\mathrm{proj}_{\Pi_k}$. Again, both projections can be easily computed. In particular, let $U\mathrm{Diag}(\lambda)U^\top$ be an eigenvalue decomposition of $W\in\mathcal{S}^n$. Then an element $Y\in\mathrm{proj}_{\Pi_k}(W)$ can be computed as $Y=U\mathrm{Diag}(\max\{\tilde H_k(\lambda),0\})U^\top$.

⁵ To see this, recall from [6, Proposition 2.8] and [5, Proposition 3.] that an element $Y\in\mathrm{proj}_{\tilde\Pi_k}(W)$ can be computed as $Y=U\mathrm{Diag}(\zeta^*)U^\top$, where $\zeta^*_i=\tilde\zeta_i$ if $i\in I$ and $\zeta^*_i=0$ otherwise, with $\tilde\zeta_i=\arg\min\{\frac12(\zeta_i-\lambda_i)^2:0\le\zeta_i\le\tau\}=\max\{\min\{\lambda_i,\tau\},0\}$, and $I$ is an index set of size $k$ corresponding to the $k$ largest values of $\{\frac12\lambda_i^2-\frac12(\tilde\zeta_i-\lambda_i)^2\}_{i=1}^n=\{\frac12\lambda_i^2-\frac12(\min\{\max\{\lambda_i-\tau,0\},\lambda_i\})^2\}_{i=1}^n$. Since the function $t\mapsto\frac12t^2-\frac12(\min\{\max\{t-\tau,0\},t\})^2$ is nondecreasing, we can let $I$ correspond to any $k$ largest entries of $\lambda$.

Example 5 (Simultaneously sparse and low-rank matrix optimization problem) The following problem was considered in [23]:
$$\min_X\ f(X)+\gamma\|\mathrm{vec}(X)\|_1+\tau\|X\|_*,$$
where $f$ is as in (1), and $\gamma$ and $\tau$ are positive numbers. This problem aims at finding solutions that are both sparse and low-rank, and can be applied to identifying clusters in social networks; see [23, Section 6.2]. This model relaxes and penalizes the sparsity index $\|\mathrm{vec}(X)\|_0$ and the low-rank index $\mathrm{rank}(X)$ by two convex functions, $\|\mathrm{vec}(X)\|_1$ and $\|X\|_*$, respectively. Here, we consider the following variant that explicitly incorporates the sparsity and rank constraints:
$$\min_X\ f(X)\quad\text{subject to}\quad\|\mathrm{vec}(X)\|_0\le s,\ \mathrm{rank}(X)\le k. \tag{39}$$
Suppose that (39) has a solution $X^*$, and let $\tau\ge\max\{\max_{i,j}|X^*_{ij}|,\sigma_{\max}(X^*)\}$. Define $S:=\{X:\|\mathrm{vec}(X)\|_0\le s\}$, $\tilde S:=\{X\in S:\max_{i,j}|X_{ij}|\le\tau\}$ and $\tilde\Xi_k:=\{X\in\Xi_k:\sigma_{\max}(X)\le\tau\}$. Then (39) can be rewritten in the form of (1) (with the same optimal value) in the following two ways:
$$\min_X\ f(X)+\underbrace{\delta_S(X)}_{P_1(X)}+\underbrace{\delta_{\tilde\Xi_k}(X)}_{P_0(X)}, \tag{40}$$
$$\min_X\ f(X)+\underbrace{\delta_{\tilde S}(X)}_{P_0(X)}+\underbrace{\delta_{\Xi_k}(X)}_{P_1(X)}. \tag{41}$$
Note that in both cases, $f+P_0$ is level-bounded and hence the SDCAM in Section 3 can be applied. When NPG_major as described in Theorem 1 is applied to the corresponding subproblems, one has to compute $\mathrm{proj}_S$ and $\mathrm{proj}_{\tilde\Xi_k}$ for (40), and $\mathrm{proj}_{\tilde S}$ and $\mathrm{proj}_{\Xi_k}$ for (41). All these projections can be computed efficiently; see Examples 1 and 3.

5 Numerical experiments

In this section, we apply our SDCAM in Section 3, with subproblems solved by NPG_major as described in Theorem 1, to an instance of Example 2 and an instance of Example 5: the nonconvex fused regularized problem and the simultaneously sparse and low-rank matrix optimization problem. All numerical experiments are performed in Matlab R2016a on a 64-bit PC with an Intel(R) Core(TM) i CPU (3.4GHz) and 32GB of RAM.

5.1 Nonconvex fused regularized problem: comparison against a solution method based on smoothing

We consider the following special instance of the nonconvex fused regularized problem:
$$\min_x\ \frac12\|x-b\|^2+c_1\|x\|_1+c_2\|Dx\|_p^p, \tag{42}$$
where $c_1>0$, $c_2>0$, $p=0.5$, $Dx=(x_2-x_1,\ldots,x_n-x_{n-1})^\top$, and $b\in\mathbb{R}^n$ is the noisy measurement of a piecewise constant sparse signal. Notice that the function $\|\cdot\|_1$ is level-bounded, so we can directly apply SDCAM as described in Example 2 and solve the subproblems by NPG_major. On the other hand, a commonly used technique for handling optimization problems involving $\ell_p$ penalty functions ($0<p<1$) is smoothing. Thus, in our experiments below, we compare SDCAM with a method based on smoothing, the smoothing nonmonotone proximal gradient method (snpg), for solving (42). In snpg, we approximately solve the following sequence of subproblems by NPG (this is NPG_major applied to (44) with $g=0$):
$$\min_x\ \underbrace{\frac12\|x-b\|^2+c_2\sum_{i=1}^{n-1}\left((Dx)_i^2+\lambda_t^2\right)^{p/2}}_{f_t(x)}+\underbrace{c_1\|x\|_1}_{Q(x)},$$
where $\lambda_t>0$ is the smoothing parameter. The approximate stationary point of $f_t+Q$ obtained is then used as the initialization for minimizing $f_{t+1}+Q$.

Data generation: We first randomly generate a piecewise constant signal $x\in\mathbb{R}^n$ using the following Matlab code:

    J = randperm(10); I = sort(J(1:6), 'ascend'); x = zeros(n,1);
    for i = 1:r  % r (at most 6) is the number of nonzero blocks; its value is not given here
        if randn > 0
            x(n*I(i)/10 - 3*n/50 - randi(3) : n*I(i)/10) = randi(3);
        else
            x(n*I(i)/10 - 3*n/50 - randi(3) : n*I(i)/10) = -randi(3);
        end
    end

Then we let $b=x+\sigma\xi$, where $\sigma>0$ is a noise factor and $\xi$ has i.i.d. standard Gaussian entries. In our experiments, motivated by [2], we choose $c_1=c_2=\sigma\sqrt{n}/40$. We shall see in Figure 1 that this choice leads to reasonable recovery results. We also set $\sigma=0.1$ and $n=2000,4000,6000,8000,\ldots$

Parameter setting: In SDCAM, we set $\lambda_t=1/10^{t+1}$ and $x_{\rm feas}$ to be the vector of all ones.
In NPG_major for solving the subproblems, we set $M=4$, $L_{\max}=10^8$, $L_{\min}=10^{-8}$, $\tau=2$, $c=10^{-4}$, $L^0_{t,0}=1$ and, for $l\ge1$,
$$L^0_{t,l}=\max\left\{\min\left\{\frac{s_l^\top y_l}{\|s_l\|^2},L_{\max}\right\},L_{\min}\right\}$$
(which is the inverse of the so-called Barzilai-Borwein stepsize), where $s_l=x^{t,l}-x^{t,l-1}$ and $y_l=\nabla h(x^{t,l})-\nabla h(x^{t,l-1})$. We initialize NPG_major at $x_{\rm feas}$ and terminate it when the number of iterations exceeds 10000 or
$$\frac{\|x^{t,l}-x^{t,l-1}\|}{\max\{\|x^{t,l}\|,1\}}<\frac{\epsilon_t}{L_{t,l}}\quad\text{or}\quad\frac{|F_{\lambda_t}(x^{t,l})-F_{\lambda_t}(x^{t,l-1})|}{\max\{1,|F_{\lambda_t}(x^{t,l})|\}}<10^{-12},$$
where $\epsilon_0=10^{-5}$ and $\epsilon_t=\max\{\epsilon_{t-1}/1.5,10^{-6}\}$. On the other hand, in snpg, we also let $\lambda_t=1/10^{t+1}$ and solve the subproblems using NPG (i.e., NPG_major applied to (44) with $g=0$) with the same settings as described above, except that $F_{\lambda_t}$ above is replaced by $f_t+Q$ and, for $l\ge1$,
$$L^0_{t,l}=\begin{cases}\max\left\{\min\left\{\dfrac{s_l^\top y_l}{\|s_l\|^2},L_{\max}\right\},L_{\min}\right\}&\text{if }s_l^\top y_l>10^{-12},\\[6pt]\max\left\{\min\left\{L_{t,l-1}/2,L_{\max}\right\},L_{\min}\right\}&\text{otherwise.}\end{cases}$$
Finally, we terminate SDCAM when $\lambda_t<10^{-9}$. For a fair comparison, we consider two different termination criteria for snpg: $\lambda_t<10^{-7}$ (snpg_7) and $\lambda_t<10^{-8}$ (snpg_8).

Numerical results: In Table 1, we compare SDCAM, snpg_7 and snpg_8 in terms of the number of iterations (iter), CPU time (CPU) and the terminating function values (fval), averaged over 10 randomly generated instances. One can see that the terminating function values are comparable, and that SDCAM is in general faster than snpg_8 and slower than snpg_7. Moreover, SDCAM slightly outperforms both snpg variants in terms of function values when the dimension is relatively low ($n\le4000$). To illustrate the ability to recover the original signal, we also plot the original signal, the noisy measurement $b$ and the signals recovered by SDCAM and snpg_8 for a random instance with $n=2000$ in Figure 1.

Table 1: Results for SDCAM, snpg_7 and snpg_8 for solving (42) (columns: n; iter, CPU and fval for each of SDCAM, snpg_7 and snpg_8).

To illustrate intuitively the approximations used in SDCAM and snpg, we plot the function $f(x)=|x|^{1/2}$ (dashed line), its Moreau envelope and its smoothing function in Figure 2.
One can see that the Moreau envelope replaces $f$ by a quadratic function near its nonsmooth point. The envelope is a lower approximation of $f$, while the smoothing function is an upper approximation of $f$.
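The lower/upper relationship can be reproduced numerically. The sketch below approximates the Moreau envelope $e_\lambda f(x)=\min_y\{|y|^{1/2}+(y-x)^2/(2\lambda)\}$ by minimizing over a fine grid, and takes $(x^2+\lambda^2)^{p/2}$ with $p=1/2$, the term appearing in $f_t$ for snpg, as the smoothing function. The value $\lambda=0.1$ and the grid are arbitrary choices for illustration, and the function names are hypothetical.

```python
import numpy as np


def f(x):
    """f(x) = |x|^{1/2}."""
    return np.sqrt(np.abs(x))


def moreau_envelope_on_grid(x, lam, grid):
    """Approximate e_lam f(x) = min_y f(y) + (y - x)^2 / (2 lam) by a minimum over `grid`."""
    x = np.atleast_1d(x)
    vals = f(grid)[None, :] + (grid[None, :] - x[:, None]) ** 2 / (2.0 * lam)
    return vals.min(axis=1)


def smoothing(x, lam, p=0.5):
    """Smoothing of |x|^p of the type used in snpg: (x^2 + lam^2)^{p/2}."""
    return (x ** 2 + lam ** 2) ** (p / 2.0)


lam = 0.1
grid = np.linspace(-2.0, 2.0, 8001)
xs = grid[::100]  # evaluation points lie on the grid, so y = x is always a candidate
env = moreau_envelope_on_grid(xs, lam, grid)
print(np.all(env <= f(xs)), np.all(f(xs) <= smoothing(xs, lam)))
```

Near $x=0$ the minimizing $y$ is $0$, so the envelope equals the quadratic $x^2/(2\lambda)$ there, which is the behaviour visible in Figure 2.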

Fig. 1: Recovery comparison for noisy signal.

Fig. 2: $|x|^{1/2}$ with its Moreau envelope and smoothing function.

5.2 Simultaneously sparse and low-rank matrix optimization problem: which constraint should be modeled by $P_1$?

We consider the following special instance of the simultaneously sparse and low-rank matrix optimization problem:
$$\min_X\ \frac12\|X-M\|_F^2\quad\text{subject to}\quad\|\mathrm{vec}(X)\|_0\le s,\ \mathrm{rank}(X)\le k, \tag{43}$$
where $M\in\mathbb{R}^{m\times n}$ is a given noisy matrix, and $s$ and $k$ are positive integers. Note that $f(X):=\frac12\|X-M\|_F^2$ is level-bounded. Therefore, (43) has at least one solution. Then, as discussed in Example 5, we can apply SDCAM to (43) in two different ways by considering, respectively, the two formulations (40) and (41):⁶ the indicator function $\delta_{\|\cdot\|_0\le s}$ is approximated by its Moreau envelope in (40), and the indicator function $\delta_{\mathrm{rank}(\cdot)\le k}$ is approximated by its Moreau envelope in (41). We call the method based on (40) SDCAM_r and the method based on (41) SDCAM_s. In the following experiments, we compare these two methods.

Data generation: We first randomly generate $M_1\in\mathbb{R}^{m\times k}$ and $M_2\in\mathbb{R}^{k\times n}$ with i.i.d. standard Gaussian entries. Then we set $m/10$ random rows of $M_1$ to zero and let $M$ be $M_1M_2$ plus $\sigma$ times a matrix with i.i.d. standard Gaussian entries, where $\sigma>0$ is a noise factor. We fix $n=500$, $k=10$ and $s=mn/10$, and we experiment with $\sigma=0.005,0.01,0.02$ and $m=1000,2000,3000$ below.

⁶ We would like to point out that we are indeed using $\Xi_k$ in place of $\tilde\Xi_k$ in (40) and using $S$ in place of $\tilde S$ in (41) in our experiments below. Notice that A3 is still satisfied because $f$ is level-bounded.
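The two variants differ only in which indicator is replaced by its Moreau envelope. In SDCAM_r this is the sparsity indicator, whose envelope can be evaluated from sorted entries, analogously to the rank case in Section 4.2: $e_\lambda\delta_S(X)=\frac{1}{2\lambda}\big(\|X\|_F^2-\text{sum of the } s \text{ largest } X_{ij}^2\big)$. The NumPy sketch below (hypothetical names) checks this against hard thresholding, and also checks numerically that $\frac{1}{2\lambda}\|X\|_F^2-e_\lambda\delta_S(X)$ is convex, i.e. that the envelope is a difference of convex functions.

```python
import numpy as np


def proj_sparse(X, s):
    """One element of proj onto {X : ||vec(X)||_0 <= s}: keep the s largest |entries|."""
    v = X.ravel()
    Z = np.zeros_like(v)
    if s > 0:
        idx = np.argsort(np.abs(v))[-s:]
        Z[idx] = v[idx]
    return Z.reshape(X.shape)


def envelope_sparse_indicator(X, s, lam):
    """e_lam delta_S(X) = (||X||_F^2 - sum of the s largest X_ij^2) / (2 lam)."""
    sq = np.sort(X.ravel() ** 2)[::-1]
    return (sq.sum() - sq[:s].sum()) / (2.0 * lam)


def convex_part(X, s, lam):
    """||X||_F^2 / (2 lam) - e_lam delta_S(X): a supremum of affine functions, hence convex."""
    return np.sum(X ** 2) / (2.0 * lam) - envelope_sparse_indicator(X, s, lam)


rng = np.random.default_rng(11)
X = rng.normal(size=(4, 5))
d2 = np.sum((X - proj_sparse(X, 6)) ** 2)
print(np.isclose(d2 / (2 * 0.4), envelope_sparse_indicator(X, 6, 0.4)))
```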

Parameter setting: In both SDCAM_r and SDCAM_s, we set $\lambda_t=1/10^{t+1}$ and $X_{\rm feas}=0$. In NPG_major for solving the subproblems, we use the same parameter settings as in Section 5.1. We initialize both algorithms at $X_{\rm feas}$ and terminate them when $\mathrm{dist}(X^t,S)\le10^{-6}\|X^t\|_F$ and $\mathrm{dist}(X^t,\Xi_k)\le10^{-6}\|X^t\|_F$, respectively.

Numerical results: In Table 2, we compare SDCAM_r and SDCAM_s in terms of the number of iterations (iter), CPU time (CPU) and the feasibility violation (vio) at termination (i.e., $\mathrm{dist}(X^t,S)$ and $\mathrm{dist}(X^t,\Xi_k)$, respectively), averaged over 10 randomly generated instances. One can see that SDCAM_r takes fewer iterations and less time. An intuitive explanation could be that the rank constraint is harder to approximate via subgradients than the sparsity constraint. Thus SDCAM_r, which keeps all its iterates in the rank constraint and only approximately satisfies the sparsity constraint as the algorithm progresses, ends up converging more quickly.

Table 2: Comparison of SDCAM_r and SDCAM_s for solving (43) (columns: σ; m; iter, CPU and vio for each of SDCAM_r and SDCAM_s).

6 Conclusions

In this paper, we propose a successive difference-of-convex approximation method for solving (1). The key idea of this method is to approximate the nonsmooth functions in the objective of (1) by their Moreau envelopes. The approximation function can then be minimized by various proximal gradient methods with majorization techniques, such as NPG_major in the appendix, thanks to (6). We prove that the sequence generated by our method is bounded and that any accumulation point is a stationary point of (1) under suitable conditions. We also discuss how to apply our method to concrete applications and conduct numerical experiments to illustrate its efficiency.


More information

Robustly Stable Signal Recovery in Compressed Sensing with Structured Matrix Perturbation

Robustly Stable Signal Recovery in Compressed Sensing with Structured Matrix Perturbation Robustly Stable Signal Recovery in Compressed Sensing with Structured Matri Perturbation Zai Yang, Cishen Zhang, and Lihua Xie, Fellow, IEEE arxiv:.7v [cs.it] 4 Mar Abstract The sparse signal recovery

More information

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss

An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss An efficient ADMM algorithm for high dimensional precision matrix estimation via penalized quadratic loss arxiv:1811.04545v1 [stat.co] 12 Nov 2018 Cheng Wang School of Mathematical Sciences, Shanghai Jiao

More information

ON DISCRETE HESSIAN MATRIX AND CONVEX EXTENSIBILITY

ON DISCRETE HESSIAN MATRIX AND CONVEX EXTENSIBILITY Journal of the Operations Research Society of Japan Vol. 55, No. 1, March 2012, pp. 48 62 c The Operations Research Society of Japan ON DISCRETE HESSIAN MATRIX AND CONVEX EXTENSIBILITY Satoko Moriguchi

More information

Convex Optimization Problems. Prof. Daniel P. Palomar

Convex Optimization Problems. Prof. Daniel P. Palomar Conve Optimization Problems Prof. Daniel P. Palomar The Hong Kong University of Science and Technology (HKUST) MAFS6010R- Portfolio Optimization with R MSc in Financial Mathematics Fall 2018-19, HKUST,

More information

Network Newton. Aryan Mokhtari, Qing Ling and Alejandro Ribeiro. University of Pennsylvania, University of Science and Technology (China)

Network Newton. Aryan Mokhtari, Qing Ling and Alejandro Ribeiro. University of Pennsylvania, University of Science and Technology (China) Network Newton Aryan Mokhtari, Qing Ling and Alejandro Ribeiro University of Pennsylvania, University of Science and Technology (China) aryanm@seas.upenn.edu, qingling@mail.ustc.edu.cn, aribeiro@seas.upenn.edu

More information

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming

Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Mathematical Programming manuscript No. (will be inserted by the editor) Primal-dual first-order methods with O(1/ǫ) iteration-complexity for cone programming Guanghui Lan Zhaosong Lu Renato D. C. Monteiro

More information

Math 273a: Optimization Lagrange Duality

Math 273a: Optimization Lagrange Duality Math 273a: Optimization Lagrange Duality Instructor: Wotao Yin Department of Mathematics, UCLA Winter 2015 online discussions on piazza.com Gradient descent / forward Euler assume function f is proper

More information

On the acceleration of the double smoothing technique for unconstrained convex optimization problems

On the acceleration of the double smoothing technique for unconstrained convex optimization problems On the acceleration of the double smoothing technique for unconstrained convex optimization problems Radu Ioan Boţ Christopher Hendrich October 10, 01 Abstract. In this article we investigate the possibilities

More information

Proximal methods. S. Villa. October 7, 2014

Proximal methods. S. Villa. October 7, 2014 Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem

More information

Local strong convexity and local Lipschitz continuity of the gradient of convex functions

Local strong convexity and local Lipschitz continuity of the gradient of convex functions Local strong convexity and local Lipschitz continuity of the gradient of convex functions R. Goebel and R.T. Rockafellar May 23, 2007 Abstract. Given a pair of convex conjugate functions f and f, we investigate

More information

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization /

Uses of duality. Geoff Gordon & Ryan Tibshirani Optimization / Uses of duality Geoff Gordon & Ryan Tibshirani Optimization 10-725 / 36-725 1 Remember conjugate functions Given f : R n R, the function is called its conjugate f (y) = max x R n yt x f(x) Conjugates appear

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley A. d Aspremont, INFORMS, Denver,

More information

CSC 576: Variants of Sparse Learning

CSC 576: Variants of Sparse Learning CSC 576: Variants of Sparse Learning Ji Liu Department of Computer Science, University of Rochester October 27, 205 Introduction Our previous note basically suggests using l norm to enforce sparsity in

More information

Math 273a: Optimization Subgradient Methods

Math 273a: Optimization Subgradient Methods Math 273a: Optimization Subgradient Methods Instructor: Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com Nonsmooth convex function Recall: For ˉx R n, f(ˉx) := {g R

More information

Lecture 7 Monotonicity. September 21, 2008

Lecture 7 Monotonicity. September 21, 2008 Lecture 7 Monotonicity September 21, 2008 Outline Introduce several monotonicity properties of vector functions Are satisfied immediately by gradient maps of convex functions In a sense, role of monotonicity

More information

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles

Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Convex Optimization on Large-Scale Domains Given by Linear Minimization Oracles Arkadi Nemirovski H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology Joint research

More information

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 9. Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 1 Chapter 9 Alternating Direction Method of Multipliers Shiqian Ma, MAT-258A: Numerical Optimization 2 Separable convex optimization a special case is min f(x)

More information

Optimality Conditions for Constrained Optimization

Optimality Conditions for Constrained Optimization 72 CHAPTER 7 Optimality Conditions for Constrained Optimization 1. First Order Conditions In this section we consider first order optimality conditions for the constrained problem P : minimize f 0 (x)

More information

Exact penalty decomposition method for zero-norm minimization based on MPEC formulation 1

Exact penalty decomposition method for zero-norm minimization based on MPEC formulation 1 Exact penalty decomposition method for zero-norm minimization based on MPEC formulation Shujun Bi, Xiaolan Liu and Shaohua Pan November, 2 (First revised July 5, 22) (Second revised March 2, 23) (Final

More information

Successive Concave Sparsity Approximation for Compressed Sensing

Successive Concave Sparsity Approximation for Compressed Sensing Successive Concave Sparsity Approimation for Compressed Sensing Mohammadreza Malek-Mohammadi, Ali Koochakzadeh, Massoud Babaie-Zadeh, Senior Member, IEEE, Magnus Jansson, Senior Member, IEEE, and Cristian

More information

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference

ECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring

More information

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions

A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions A Geometric Framework for Nonconvex Optimization Duality using Augmented Lagrangian Functions Angelia Nedić and Asuman Ozdaglar April 15, 2006 Abstract We provide a unifying geometric framework for the

More information

arxiv: v1 [math.oc] 16 Nov 2015

arxiv: v1 [math.oc] 16 Nov 2015 Conve programming with fast proimal and linear operators arxiv:1511.04815v1 [math.oc] 16 Nov 2015 Matt Wytock, Po-Wei Wang, J. Zico Kolter November 17, 2015 Abstract We present Epsilon, a system for general

More information

Probabilistic Graphical Models

Probabilistic Graphical Models School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 5, 06 Reading: See class website Eric Xing @ CMU, 005-06

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

Primal/Dual Decomposition Methods

Primal/Dual Decomposition Methods Primal/Dual Decomposition Methods Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2018-19, HKUST, Hong Kong Outline of Lecture Subgradients

More information

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization

Frank-Wolfe Method. Ryan Tibshirani Convex Optimization Frank-Wolfe Method Ryan Tibshirani Convex Optimization 10-725 Last time: ADMM For the problem min x,z f(x) + g(z) subject to Ax + Bz = c we form augmented Lagrangian (scaled form): L ρ (x, z, w) = f(x)

More information

Conditional Gradient (Frank-Wolfe) Method

Conditional Gradient (Frank-Wolfe) Method Conditional Gradient (Frank-Wolfe) Method Lecturer: Aarti Singh Co-instructor: Pradeep Ravikumar Convex Optimization 10-725/36-725 1 Outline Today: Conditional gradient method Convergence analysis Properties

More information

Chapter 2 Convex Analysis

Chapter 2 Convex Analysis Chapter 2 Convex Analysis The theory of nonsmooth analysis is based on convex analysis. Thus, we start this chapter by giving basic concepts and results of convexity (for further readings see also [202,

More information

A derivative-free nonmonotone line search and its application to the spectral residual method

A derivative-free nonmonotone line search and its application to the spectral residual method IMA Journal of Numerical Analysis (2009) 29, 814 825 doi:10.1093/imanum/drn019 Advance Access publication on November 14, 2008 A derivative-free nonmonotone line search and its application to the spectral

More information

A Unified Approach to Proximal Algorithms using Bregman Distance

A Unified Approach to Proximal Algorithms using Bregman Distance A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department

More information

Convex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE

Convex Analysis Notes. Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE Convex Analysis Notes Lecturer: Adrian Lewis, Cornell ORIE Scribe: Kevin Kircher, Cornell MAE These are notes from ORIE 6328, Convex Analysis, as taught by Prof. Adrian Lewis at Cornell University in the

More information

COMPLEXITY OF A QUADRATIC PENALTY ACCELERATED INEXACT PROXIMAL POINT METHOD FOR SOLVING LINEARLY CONSTRAINED NONCONVEX COMPOSITE PROGRAMS

COMPLEXITY OF A QUADRATIC PENALTY ACCELERATED INEXACT PROXIMAL POINT METHOD FOR SOLVING LINEARLY CONSTRAINED NONCONVEX COMPOSITE PROGRAMS COMPLEXITY OF A QUADRATIC PENALTY ACCELERATED INEXACT PROXIMAL POINT METHOD FOR SOLVING LINEARLY CONSTRAINED NONCONVEX COMPOSITE PROGRAMS WEIWEI KONG, JEFFERSON G. MELO, AND RENATO D.C. MONTEIRO Abstract.

More information

Convex Functions. Pontus Giselsson

Convex Functions. Pontus Giselsson Convex Functions Pontus Giselsson 1 Today s lecture lower semicontinuity, closure, convex hull convexity preserving operations precomposition with affine mapping infimal convolution image function supremum

More information

HW #1 SOLUTIONS. g(x) sin 1 x

HW #1 SOLUTIONS. g(x) sin 1 x HW #1 SOLUTIONS 1. Let f : [a, b] R and let v : I R, where I is an interval containing f([a, b]). Suppose that f is continuous at [a, b]. Suppose that lim s f() v(s) = v(f()). Using the ɛ δ definition

More information

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS

ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS MATHEMATICS OF OPERATIONS RESEARCH Vol. 28, No. 4, November 2003, pp. 677 692 Printed in U.S.A. ON A CLASS OF NONSMOOTH COMPOSITE FUNCTIONS ALEXANDER SHAPIRO We discuss in this paper a class of nonsmooth

More information

First-order methods for structured nonsmooth optimization

First-order methods for structured nonsmooth optimization First-order methods for structured nonsmooth optimization Sangwoon Yun Department of Mathematics Education Sungkyunkwan University Oct 19, 2016 Center for Mathematical Analysis & Computation, Yonsei University

More information

Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery

Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery Multi-stage convex relaxation approach for low-rank structured PSD matrix recovery Department of Mathematics & Risk Management Institute National University of Singapore (Based on a joint work with Shujun

More information

arxiv: v2 [math.oc] 22 Jun 2017

arxiv: v2 [math.oc] 22 Jun 2017 A Proximal Difference-of-convex Algorithm with Extrapolation Bo Wen Xiaojun Chen Ting Kei Pong arxiv:1612.06265v2 [math.oc] 22 Jun 2017 June 17, 2017 Abstract We consider a class of difference-of-convex

More information

Research Reports on Mathematical and Computing Sciences

Research Reports on Mathematical and Computing Sciences ISSN 1342-284 Research Reports on Mathematical and Computing Sciences Exploiting Sparsity in Linear and Nonlinear Matrix Inequalities via Positive Semidefinite Matrix Completion Sunyoung Kim, Masakazu

More information

Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems

Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems Lagrangian-Conic Relaxations, Part I: A Unified Framework and Its Applications to Quadratic Optimization Problems Naohiko Arima, Sunyoung Kim, Masakazu Kojima, and Kim-Chuan Toh Abstract. In Part I of

More information

Lecture 9: SVD, Low Rank Approximation

Lecture 9: SVD, Low Rank Approximation CSE 521: Design and Analysis of Algorithms I Spring 2016 Lecture 9: SVD, Low Rank Approimation Lecturer: Shayan Oveis Gharan April 25th Scribe: Koosha Khalvati Disclaimer: hese notes have not been subjected

More information

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint

Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with Ronny Luss Optimization and

More information

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING

AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING AN AUGMENTED LAGRANGIAN AFFINE SCALING METHOD FOR NONLINEAR PROGRAMMING XIAO WANG AND HONGCHAO ZHANG Abstract. In this paper, we propose an Augmented Lagrangian Affine Scaling (ALAS) algorithm for general

More information

Optimization methods

Optimization methods Lecture notes 3 February 8, 016 1 Introduction Optimization methods In these notes we provide an overview of a selection of optimization methods. We focus on methods which rely on first-order information,

More information

LECTURE 7. Least Squares and Variants. Optimization Models EE 127 / EE 227AT. Outline. Least Squares. Notes. Notes. Notes. Notes.

LECTURE 7. Least Squares and Variants. Optimization Models EE 127 / EE 227AT. Outline. Least Squares. Notes. Notes. Notes. Notes. Optimization Models EE 127 / EE 227AT Laurent El Ghaoui EECS department UC Berkeley Spring 2015 Sp 15 1 / 23 LECTURE 7 Least Squares and Variants If others would but reflect on mathematical truths as deeply

More information

Lecture 24 November 27

Lecture 24 November 27 EE 381V: Large Scale Optimization Fall 01 Lecture 4 November 7 Lecturer: Caramanis & Sanghavi Scribe: Jahshan Bhatti and Ken Pesyna 4.1 Mirror Descent Earlier, we motivated mirror descent as a way to improve

More information

S 1/2 Regularization Methods and Fixed Point Algorithms for Affine Rank Minimization Problems

S 1/2 Regularization Methods and Fixed Point Algorithms for Affine Rank Minimization Problems S 1/2 Regularization Methods and Fixed Point Algorithms for Affine Rank Minimization Problems Dingtao Peng Naihua Xiu and Jian Yu Abstract The affine rank minimization problem is to minimize the rank of

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods

An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and its Implications to Second-Order Methods Renato D.C. Monteiro B. F. Svaiter May 10, 011 Revised: May 4, 01) Abstract This

More information

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence:

On Acceleration with Noise-Corrupted Gradients. + m k 1 (x). By the definition of Bregman divergence: A Omitted Proofs from Section 3 Proof of Lemma 3 Let m x) = a i On Acceleration with Noise-Corrupted Gradients fxi ), u x i D ψ u, x 0 ) denote the function under the minimum in the lower bound By Proposition

More information