arxiv: v2 [math.oc] 26 May 2018


A successive difference-of-convex approximation method for a class of nonconvex nonsmooth optimization problems

Tianxiang Liu · Ting Kei Pong · Akiko Takeda

Ting Kei Pong is supported in part by Hong Kong Research Grants Council PolyU53085/6p. Akiko Takeda is supported by Grant-in-Aid for Scientific Research (C), 5K0003.

T. Liu, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong. tiskyliu@polyu.edu.hk
T. K. Pong, Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong. tk.pong@polyu.edu.hk
A. Takeda, Department of Creative Informatics, Graduate School of Information Science and Technology, the University of Tokyo, Tokyo, Japan. takeda@mist.i.u-tokyo.ac.jp; RIKEN Center for Advanced Intelligence Project, -4-, Nihonbashi, Chuo-ku, Tokyo, Japan. akiko.takeda@riken.jp

Abstract We consider a class of nonconvex nonsmooth optimization problems whose objective is the sum of a smooth function and a finite number of nonnegative proper closed possibly nonsmooth functions (whose proximal mappings are easy to compute), some of which are further composed with linear maps. Problems of this kind arise naturally in various applications when different regularizers are introduced for inducing simultaneous structures in the solutions. Solving these problems, however, can be challenging because of the coupled nonsmooth functions: the corresponding proximal mapping can be hard to compute, so that standard first-order methods such as the proximal gradient algorithm cannot be applied efficiently. In this paper, we propose a successive difference-of-convex approximation method for solving problems of this kind. In this algorithm, we approximate the nonsmooth functions by their Moreau envelopes in each iteration. Making use of the simple observation that Moreau envelopes of nonnegative proper closed functions are continuous difference-of-convex functions, we can then approximately minimize the approximation function by first-order methods with suitable majorization techniques. These first-order methods can be implemented efficiently thanks to the fact that the proximal mapping of each nonsmooth function is easy to compute. Under suitable assumptions, we prove that the sequence generated by

our method is bounded and any accumulation point is a stationary point of the objective. We also discuss how our method can be applied to concrete applications such as nonconvex fused regularized optimization problems and simultaneously structured matrix optimization problems, and illustrate the performance numerically for these two specific applications.

Keywords Moreau envelope · difference-of-convex approximation · proximal mapping · simultaneous structures

1 Introduction

In this paper, we consider the following possibly nonconvex nonsmooth optimization problem:

$$\min_{x\in\mathbb{R}^n}\ F(x) := f(x) + P_0(x) + \sum_{i=1}^m P_i(A_i x), \tag{1}$$

with the objective satisfying the following assumptions (see the next section for notation and definitions):

A1. $f:\mathbb{R}^n\to\mathbb{R}$ is an $L$-smooth function, i.e., there exists a constant $L>0$ so that $\|\nabla f(x)-\nabla f(v)\|\le L\|x-v\|$ for any $x, v\in\mathbb{R}^n$.
A2. $A_i:\mathbb{R}^n\to\mathbb{R}^{n_i}$, $i=1,\ldots,m$, are linear mappings and $P_i:\mathbb{R}^{n_i}\to\mathbb{R}_+\cup\{\infty\}$, $i=0,\ldots,m$, are proper closed functions. The functions $P_i$, $i=0,\ldots,m$, are continuous in their respective domains, and $\mathrm{dom}\,P_0\cap\bigcap_{i=1}^m A_i^{-1}(\mathrm{dom}\,P_i)\neq\emptyset$. Moreover, the proximal mapping of $\gamma P_i$ is easy to compute for every $\gamma>0$ and for each $i=0,\ldots,m$. The sets $\mathrm{dom}\,P_i$, $i=1,\ldots,m$, are closed.
A3. The function $f+P_0$ is level-bounded, i.e., for each $r\in\mathbb{R}$, the set $\{x\in\mathbb{R}^n : f(x)+P_0(x)\le r\}$ is bounded.

Problem (1) arises in many contemporary applications such as structured low rank matrix recovery problems (see, for example, [8]), nonconvex fused regularized optimization problems (see, for example, [2] and Example 2 in Section 4) and simultaneously structured matrix optimization problems (see, for example, [23] and Example 5 in Section 4). In these applications, the $P_i$'s are used for inducing desirable structures in the solutions and they are typically functions whose proximal mappings are easy to compute.
If only one such function appears in (1), i.e., $m=0$, then some standard first-order methods such as the proximal gradient algorithm or its variants can be applied to solving (1) efficiently, because these algorithms only require the computation of $\nabla f$ and the proximal mapping of $\gamma P_0$ ($\gamma>0$) in each iteration. However, in all the aforementioned applications, there is always more than one such structure-inducing function in (1) (i.e., $m\ge 1$) and the $A_i$'s might not always be identity mappings. Then the proximal gradient algorithm and its variants cannot be applied efficiently, because the proximal mapping of $P_0(x)+\sum_{i=1}^m P_i(A_i x)$ can be hard to compute in general.

When the function $f$ and the $P_i$'s are all convex functions, one alternative approach for solving (1) is the alternating direction method of multipliers (ADMM); see, for example, [9, 0]. This method can be applied to (1) by suitably introducing slack variables that transform the problem into a linearly constrained problem, and each iteration only requires computing the proximal mappings of $f$ and the $\gamma P_i$'s, as well as an update of an auxiliary (dual) variable. However, it is known that the ADMM does not necessarily converge if the $P_i$'s are nonconvex and $m\ge 1$; see, for example, [3, Example 7].

In the case when the $P_i$'s are nonconvex but globally Lipschitz for $i=0,\ldots,m$, and $A_i$ is the identity mapping for all $i$, a new method for solving (1) was introduced in a series of works [32, 33]. Their method is based on the so-called proximal average of the $P_i$'s, and each iteration involves only the computations of $\nabla f$ and the proximal mappings of the $\gamma P_i$'s. However, it was only shown that any accumulation point of the sequence generated by their method is a stationary point of a certain smooth approximation of (1). Moreover, their method was designed for the case when the $P_i$'s are globally Lipschitz, and the convergence behavior of their method is unknown when some non-Lipschitz functions such as the $\ell_p$ quasi-norm or the indicator function of some closed sets (such as the set of all $k$-sparse vectors) are present in (1).

In this paper, we propose a new method for solving (1) that takes advantage of the ease of proximal mapping computations and has a convergence guarantee under suitable assumptions, without imposing convexity or global Lipschitz continuity on the $P_i$'s.
We call our method the successive difference-of-convex approximation method (SDCAM). In this method, we construct an approximation to the objective of (1) in each iteration using the Moreau envelopes of the $\lambda_{i,t}P_i$, $i=1,\ldots,m$, where $t$ is the iteration count and $\{\lambda_{i,t}\}$ are nonincreasing positive sequences satisfying $\lim_{t\to\infty}\lambda_{i,t}=0$; a suitable approximate stationary point of this approximation function is then taken to be the next iterate $x^{t+1}$ of our algorithm. The point $x^{t+1}$ can be found efficiently by recalling that the Moreau envelopes involved, despite being nonsmooth in general due to the possible nonconvexity of the $P_i$'s, are continuous difference-of-convex functions. Thus, one can incorporate majorization techniques in some standard first-order methods such as the proximal gradient algorithm for finding $x^{t+1}$ in each iteration. Moreover, when such first-order methods are applied, the main computational cost per inner iteration typically only depends on the computations of $\nabla f$ and the proximal mappings of $\gamma P_i$, $i=0,\ldots,m$, $\gamma>0$, which are inexpensive in many applications. This suggests that the SDCAM can be applied efficiently for solving (1). More details of this algorithm will be discussed in Section 3, where we also prove that the sequence $\{x^t\}$ generated is bounded and any accumulation point is a stationary point of (1) under suitable assumptions.

The rest of the paper is organized as follows. In Section 2, we introduce notation and some preliminary results. Our SDCAM is presented and its convergence is analyzed under suitable assumptions in Section 3. We then discuss how our method can be applied to various kinds of structured optimization problems including some nonconvex fused regularized optimization problems, some simultaneously sparse and low rank matrix optimization problems, and the low rank nearest correlation matrix problem, in Section 4. We also perform numerical experiments on some

of these applications to demonstrate the efficiency of our algorithm in Section 5. Finally, we present some concluding remarks in Section 6.

2 Notation and preliminaries

In this paper, vectors and matrices are represented in bold lower case letters and upper case letters, respectively. The inner product of two vectors $a$ and $b\in\mathbb{R}^n$ is denoted by $a^\top b$ or $b^\top a$, and we use $\|a\|_0$, $\|a\|_1$ and $\|a\|$ to denote the number of nonzero entries, the $\ell_1$ norm and the $\ell_2$ norm of $a$, respectively. Moreover, we use $\mathrm{Diag}(a)$ to denote the diagonal matrix whose diagonal is $a$. For two matrices $A$ and $B\in\mathbb{R}^{m\times n}$, their Hadamard (entrywise) product is denoted by $A\circ B$. We also use $\|A\|_*$ and $\|A\|_F$ to denote the nuclear norm and the Frobenius norm of $A$, respectively, and let $\mathrm{vec}(A)\in\mathbb{R}^{mn}$ denote the vectorization of $A$, which is obtained by stacking the columns of $A$ on top of one another. Furthermore, we use $\sigma_{\max}(A)$ to denote the largest singular value of $A$. The space of symmetric $n\times n$ matrices is denoted by $\mathcal{S}^n$. For a matrix $X\in\mathcal{S}^n$, we use $\mathrm{diag}(X)\in\mathbb{R}^n$ to denote its diagonal and $\lambda_{\max}(X)$ to denote its largest eigenvalue. We write $X\succeq 0$ if $X$ is positive semidefinite. For a linear operator $\mathcal{A}$, we let $\mathcal{A}^*$ denote its adjoint.

A function $h:\mathbb{R}^n\to\mathbb{R}\cup\{\infty\}$ is said to be proper if $\mathrm{dom}\,h := \{x : h(x)<\infty\}\neq\emptyset$. Such a function is said to be closed if it is lower semicontinuous. Following [25, Definition 8.3], for a proper function $h$, the limiting and horizon subdifferentials at $x\in\mathrm{dom}\,h$ are defined respectively as

$$\partial h(x) = \left\{u : \exists\, u^t\to u,\ x^t\xrightarrow{h}x \text{ with } u^t\in\hat\partial h(x^t) \text{ for each } t\right\},$$
$$\partial^\infty h(x) = \left\{u : \exists\, \alpha_t\downarrow 0,\ \alpha_t u^t\to u,\ x^t\xrightarrow{h}x \text{ with } u^t\in\hat\partial h(x^t) \text{ for each } t\right\},$$

where

$$\hat\partial h(w) := \left\{u : \liminf_{y\to w,\ y\neq w}\frac{h(y)-h(w)-u^\top(y-w)}{\|y-w\|}\ge 0\right\},$$

and the notation $x^t\xrightarrow{h}x$ means $x^t\to x$ and $h(x^t)\to h(x)$. We also define $\partial h(x)=\partial^\infty h(x):=\emptyset$ when $x\notin\mathrm{dom}\,h$. It is easy to show that at any $x\in\mathrm{dom}\,h$, the limiting and horizon subdifferentials have the following robustness property:

$$\begin{aligned}
\left\{u : \exists\, u^t\to u,\ x^t\xrightarrow{h}x \text{ with } u^t\in\partial h(x^t) \text{ for each } t\right\} &\subseteq \partial h(x),\\
\left\{u : \exists\, \alpha_t\downarrow 0,\ \alpha_t u^t\to u,\ x^t\xrightarrow{h}x \text{ with } u^t\in\partial h(x^t) \text{ for each } t\right\} &\subseteq \partial^\infty h(x).
\end{aligned}\tag{2}$$

The limiting subdifferential at $x$ reduces to $\{\nabla h(x)\}$ if $h$ is continuously differentiable at $x$ [25, Exercise 8.8(b)], and reduces to the convex subdifferential if $h$ is proper convex [25, Proposition 8.2].

For a proper closed function $h$ with $\inf h>-\infty$, we will also need its Moreau envelope for any given $\lambda>0$, which is defined as

$$e_\lambda h(x) := \inf_y\left\{\frac{1}{2\lambda}\|x-y\|^2 + h(y)\right\}.$$

This function is finite everywhere [25, Theorem 1.25]. It is not hard to see that

$$e_\lambda h(x)\le h(x) \tag{3}$$

for all $x$. The infimum in the definition of the Moreau envelope is attained at the so-called proximal mapping of $\lambda h$ at $x$, which is defined as

$$\mathrm{prox}_{\lambda h}(x) := \mathop{\mathrm{Argmin}}_{u\in\mathbb{R}^n}\left\{\frac{1}{2\lambda}\|x-u\|^2 + h(u)\right\}.$$

This set is always nonempty because $h$ is proper closed and bounded below [25, Theorem 1.25]. Let $\zeta^\lambda\in\mathrm{prox}_{\lambda h}(x)$. Then we have from [25, Theorem 10.1] and [25, Exercise 8.8(c)] that

$$\frac{1}{\lambda}(x-\zeta^\lambda)\in\partial h(\zeta^\lambda). \tag{4}$$

Furthermore, we have the following simple lemma, which should be well known. We provide a short proof to keep the paper self-contained.

Lemma 1 Let $h$ be a proper closed function with $\inf h>-\infty$ and let $x^*\in\mathrm{dom}\,h$. Suppose that $x^t\to x^*$, $\lambda_t\downarrow 0$ and pick any $\zeta^t\in\mathrm{prox}_{\lambda_t h}(x^t)$ for each $t$. Then it holds that $\zeta^t\in\mathrm{dom}\,h$ for all $t$ and $\zeta^t\to x^*$.

Proof Under the assumptions, we have the following inequality:

$$\frac{1}{2\lambda_t}\|x^t-\zeta^t\|^2 + \inf h \le \frac{1}{2\lambda_t}\|x^t-\zeta^t\|^2 + h(\zeta^t) = e_{\lambda_t}h(x^t) \le \frac{1}{2\lambda_t}\|x^t-x^*\|^2 + h(x^*).$$

Hence, we have $\zeta^t\in\mathrm{dom}\,h$ for all $t$ and

$$\|\zeta^t-x^*\| \le \|\zeta^t-x^t\| + \|x^t-x^*\| \le \sqrt{2\lambda_t\,(h(x^*)-\inf h) + \|x^t-x^*\|^2} + \|x^t-x^*\| \to 0.$$

Finally, recall that for a nonempty closed set $C$, the indicator function is defined as

$$\delta_C(x) = \begin{cases} 0 & \text{if } x\in C,\\ \infty & \text{else.}\end{cases}$$

We define the (limiting) normal cone at any $x\in C$ as $N_C(x):=\partial\delta_C(x)$. We let $\mathrm{dist}(x,C):=\inf_{y\in C}\|x-y\|$. The set of points in the nonempty closed set $C$ that are closest to a given $x$ is denoted by $\mathrm{proj}_C(x)$. One can observe that $\mathrm{proj}_C=\mathrm{prox}_{\delta_C}$. The set $\mathrm{proj}_C(x)$ at a given $x$ is always nonempty for a nonempty closed set $C$, and is a singleton when $C$ is in addition convex.

3 Solution method for nonconvex nonsmooth optimization problems

3.1 Successive difference-of-convex approximation method

In this paper, we consider problem (1) and assume that its objective satisfies the assumptions A1, A2 and A3 in Section 1. We will discuss some concrete applications of (1) in more detail in Section 4. In this section, we present an algorithm for solving (1). Notice that (1) is in general a nonsmooth nonconvex optimization problem.
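The relation $\mathrm{proj}_C=\mathrm{prox}_{\delta_C}$ recalled at the end of Section 2 can be illustrated on a concrete nonconvex set. The following is a minimal sketch, assuming $C=\{y:\|y\|_0\le k\}$ (the set of $k$-sparse vectors); the function names are hypothetical and ties between entries of equal magnitude are broken by index. For this choice, $e_\lambda\delta_C(x)=\frac{1}{2\lambda}\mathrm{dist}^2(x,C)$.

```python
import numpy as np

def proj_sparse(x, k):
    """One element of proj_C(x) for C = {y : ||y||_0 <= k}: keep k entries of largest magnitude."""
    z = np.zeros_like(x)
    idx = np.argsort(-np.abs(x), kind="stable")[:k]
    z[idx] = x[idx]
    return z

def moreau_env_indicator(x, lam, k):
    """e_lam delta_C(x) = dist(x, C)^2 / (2 * lam), evaluated at a projection of x."""
    z = proj_sparse(x, k)
    return np.sum((x - z) ** 2) / (2.0 * lam)

x = np.array([3.0, -1.0, 0.5, 2.0])
z = proj_sparse(x, 2)                    # keeps the entries 3.0 and 2.0
env = moreau_env_indicator(x, 0.5, 2)    # ((-1)^2 + 0.5^2) / (2 * 0.5) = 1.25
```

For a point already in $C$ the envelope vanishes, which is consistent with (3) since $\delta_C$ is zero there.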
The nonsmooth nonconvex function $P_0+\sum_{i=1}^m P_i\circ A_i$ can be complicated in practice and handling it directly can be challenging. Indeed, although the proximal mappings of $\gamma P_i$, $i=0,\ldots,m$, are easy to compute, the proximal mapping of $P_0+\sum_{i=1}^m P_i\circ A_i$ may be hard to evaluate and hence the classical proximal gradient algorithm and its variants cannot be adapted directly and efficiently for solving (1).

In this paper, we suitably adapt a smoothing scheme for solving the above nonconvex nonsmooth problem. In this approach, in each iteration, we minimize the auxiliary function

$$F_\lambda(x) := f(x) + P_0(x) + \sum_{i=1}^m e_{\lambda_i}P_i(A_i x) \tag{5}$$

approximately and then update $x$ and $\lambda=(\lambda_1,\ldots,\lambda_m)$, where $e_{\lambda_i}P_i$ is the Moreau envelope of $P_i$. When $P_i$, $i=1,\ldots,m$, are all convex functions, the corresponding functions $e_{\lambda_i}P_i$ are Lipschitz differentiable [3, Proposition 2.29]. Hence, the function $F_\lambda$ becomes the sum of a nonsmooth function $P_0$ and a smooth function, and can be minimized efficiently using, for example, the proximal gradient algorithm and its variants. This smoothing strategy has been widely used in the literature for convex problems; see [20], and also [4] for a software package for convex optimization problems based on smoothing techniques. However, in our setting, $P_i$ is not necessarily convex. Thus, the corresponding Moreau envelope $e_{\lambda_i}P_i$ is not necessarily smooth and it is unclear at first glance whether $F_\lambda$ can be minimized efficiently.

The key ingredient in our approach (where $P_i$ is possibly nonconvex) is the simple observation that for any nonnegative proper closed function $P$ and any $\mu>0$,

$$e_\mu P(u) = \frac{1}{2\mu}\|u\|^2 - \underbrace{\sup_{y\in\mathrm{dom}\,P}\left\{\frac{1}{\mu}u^\top y - \frac{1}{2\mu}\|y\|^2 - P(y)\right\}}_{D_{\mu,P}(u)}. \tag{6}$$

Such a decomposition has been noted in [2] when $P=\delta_C$ for some nonempty closed set $C$, and in [7, Proposition 3] for the general case. Then $D_{\mu,P}$, as the supremum of affine functions and being finite-valued, is convex continuous. Moreover, using the definition of $e_\mu P(u)$, $\mathrm{prox}_{\mu P}(u)$ and (6), we see that the supremum in $D_{\mu,P}(u)$ is attained at any point in $\mathrm{prox}_{\mu P}(u)$. Let $y^*\in\mathrm{prox}_{\mu P}(\mathcal{A}x)$.
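Then $y^*\in\mathrm{dom}\,P$ and we have for any $w$ that

$$\begin{aligned}
D_{\mu,P}(w) - D_{\mu,P}(\mathcal{A}x) &= \sup_{y\in\mathrm{dom}\,P}\left\{\frac{1}{\mu}w^\top y - \frac{1}{2\mu}\|y\|^2 - P(y)\right\} - \sup_{y\in\mathrm{dom}\,P}\left\{\frac{1}{\mu}(\mathcal{A}x)^\top y - \frac{1}{2\mu}\|y\|^2 - P(y)\right\}\\
&\ge \frac{1}{\mu}w^\top y^* - \frac{1}{2\mu}\|y^*\|^2 - P(y^*) - \left(\frac{1}{\mu}(\mathcal{A}x)^\top y^* - \frac{1}{2\mu}\|y^*\|^2 - P(y^*)\right)\\
&= \frac{1}{\mu}(y^*)^\top(w-\mathcal{A}x).
\end{aligned}$$

This implies $\frac{1}{\mu}\mathrm{prox}_{\mu P}(\mathcal{A}x)\subseteq\partial D_{\mu,P}(\mathcal{A}x)$, from which we deduce further that

$$\frac{1}{\mu}\mathcal{A}^*\mathrm{prox}_{\mu P}(\mathcal{A}x) \subseteq \mathcal{A}^*\partial D_{\mu,P}(\mathcal{A}x) = \partial(D_{\mu,P}\circ\mathcal{A})(x), \tag{7}$$

where the last equality follows from [24, Theorem 23.9] because $D_{\mu,P}$ is convex continuous. Thus, (5) is the sum of a smooth function $f$, a nonsmooth nonconvex function $P_0$ whose proximal mapping is easy to compute, and a continuous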

difference-of-convex function such that a subgradient corresponding to its concave part is easy to compute, thanks to (7) and Assumption A2. Proximal gradient methods with majorization techniques can then be suitably applied to minimizing (5). For instance, one can apply the NPG$_{\rm major}$ described in the appendix. Specifically, one can apply NPG$_{\rm major}$ with

$$h(x) = f(x) + \sum_{i=1}^m\frac{1}{2\lambda_i}\|A_i x\|^2,\qquad P(x) = P_0(x),\qquad g(x) = \sum_{i=1}^m D_{\lambda_i,P_i}(A_i x).$$

It is routine to check that this choice of $h$, $P$ and $g$ satisfies the assumptions required in the appendix. Moreover, $F_\lambda$ is level-bounded because $f+P_0$ is level-bounded by assumption and $e_{\lambda_i}P_i$ is nonnegative for each $i=1,\ldots,m$ since the $P_i$ are nonnegative. Finally, $F_\lambda$ is continuous in its domain because $P_0$ is. Hence all assumptions required in the appendix for applying NPG$_{\rm major}$ are satisfied and the method can be applied to minimizing $F_\lambda$ by initializing at any point $x^0\in\mathrm{dom}\,P_0$.

We now describe our method for solving (1) with its update rules below in Algorithm 1. We call this method the successive difference-of-convex approximation method (SDCAM).

Algorithm 1 The SDCAM for (1)

Step 0. Pick $m+1$ sequences of positive numbers with $\epsilon_t\downarrow 0$ and $\lambda_{i,t}\downarrow 0$ for $i=1,\ldots,m$, an $x^{\rm feas}\in\mathrm{dom}\,P_0\cap\bigcap_{i=1}^m A_i^{-1}(\mathrm{dom}\,P_i)$, and an $x^0\in\mathrm{dom}\,P_0$. Set $t=0$.

Step 1. If $F_{\lambda_t}(x^t)\le F_{\lambda_t}(x^{\rm feas})$, set $x^{t,0}=x^t$. Else, set $x^{t,0}=x^{\rm feas}$.

Step 2. Approximately minimize $F_{\lambda_t}(x)$, starting at $x^{t,0}$, and terminating at $x^{t,l_t}$ when

$$\begin{aligned}
&\mathrm{dist}\left(0,\ \nabla f(x^{t,l_t}) + \partial P_0(x^{t,l_t+1}) + \sum_{i=1}^m\frac{1}{\lambda_{i,t}}A_i^*\left[A_i x^{t,l_t} - \mathrm{prox}_{\lambda_{i,t}P_i}(A_i x^{t,l_t})\right]\right)\le\epsilon_t,\\
&\|x^{t,l_t+1}-x^{t,l_t}\|\le\epsilon_t,\qquad F_{\lambda_t}(x^{t,l_t})\le F_{\lambda_t}(x^{t,0}).
\end{aligned}\tag{8}$$

Step 3. Update $x^{t+1}=x^{t,l_t}$ and $t=t+1$. Go to Step 1.

We would like to point out that Step 1 in SDCAM is crucial in our convergence analysis: this strategy was also used in the penalty decomposition method in [5]. As we shall see in the proof of Theorem 2 below, it ensures that (2) can be applied at an accumulation point of $\{x^t\}$.
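The steps of Algorithm 1 can be sketched on a small instance. The code below is only an illustration under several assumptions that are not taken from the paper: the test problem is $\min \frac12\|Ax-b\|^2$ subject to $\|x\|_0\le k$ and $|x_i|\le\tau$, with $P_0$ the indicator of the box, $P_1$ the indicator of the $k$-sparse set and $A_1=I$; the schedules $\lambda_t=\epsilon_t=2^{-t}$ are arbitrary; and the inner solver is a plain majorized proximal gradient step with the fixed constant $\beta=L+1/\lambda_t$ rather than the NPG$_{\rm major}$ of the appendix. The inner loop is also capped at `max_inner` iterations, so the stopping test may not be reached. All function names are hypothetical.

```python
import numpy as np

def proj_sparse(x, k):
    """One element of proj_C(x), C = {y : ||y||_0 <= k}: keep k entries of largest magnitude."""
    z = np.zeros_like(x)
    idx = np.argsort(-np.abs(x), kind="stable")[:k]
    z[idx] = x[idx]
    return z

def sdcam_sketch(A, b, k, tau, outer=12, max_inner=500):
    """Illustrative SDCAM loop for min 0.5*||Ax-b||^2 s.t. ||x||_0 <= k, |x_i| <= tau."""
    n = A.shape[1]
    L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of grad f

    def F_lam(x, lam):                             # F_lambda in (5) for x inside the box
        sq_dist = np.sum((x - proj_sparse(x, k)) ** 2)
        return 0.5 * np.sum((A @ x - b) ** 2) + sq_dist / (2.0 * lam)

    x_feas = np.zeros(n)                           # Step 0: feasible point
    x = x_feas.copy()                              # Step 0: x^0 in dom P_0
    for t in range(outer):
        lam = eps = 0.5 ** t                       # assumed schedules lambda_t, eps_t
        if F_lam(x, lam) > F_lam(x_feas, lam):     # Step 1: restart at x_feas if it is better
            x = x_feas.copy()
        beta = L + 1.0 / lam                       # fixed majorization constant (assumption)
        for _ in range(max_inner):                 # Step 2: majorized proximal gradient steps
            zeta = proj_sparse(x, k)               # zeta in prox_{lam P_1}(x)
            grad = A.T @ (A @ x - b) + (x - zeta) / lam
            x_next = np.clip(x - grad / beta, -tau, tau)   # prox of the box indicator
            if max(beta, 1.0) * np.linalg.norm(x_next - x) <= eps:
                break                              # sufficient for the first two tests in (8)
            x = x_next
        # Step 3: x now plays the role of x^{t+1} = x^{t,l_t}
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((15, 10))
b = rng.standard_normal(15)
x_out = sdcam_sketch(A, b, k=3, tau=5.0)
```

With the fixed constant $\beta\ge L+1/\lambda_t$, each inner step does not increase $F_{\lambda_t}$, so the third test in (8) holds automatically in this sketch; the choice of a fixed $\beta$ trades the adaptivity of a line search for simplicity.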
3.2 Theoretical guarantee for global convergence

In this section, we first discuss how $F_{\lambda_t}$ can be approximately minimized so that (8) is satisfied at the $t$-th iteration and comment on the computational complexity. Then we prove the convergence of the SDCAM under suitable assumptions.

As discussed above, $F_{\lambda_t}$ can be minimized by the NPG$_{\rm major}$ outlined in the appendix. Moreover, due to (7), one can choose

$$\zeta^{t,l} = \sum_{i=1}^m\frac{1}{\lambda_{i,t}}A_i^*\zeta_i^{t,l}\ \text{ in the algorithm with }\ \zeta_i^{t,l}\in\mathrm{prox}_{\lambda_{i,t}P_i}(A_i x^{t,l}) \tag{9}$$

for each $i=1,\ldots,m$ and $l\ge 0$ so that $\sum_{i=1}^m\frac{1}{\lambda_{i,t}}A_i^*\zeta_i^{t,l}$ lies in the subdifferential of $\sum_{i=1}^m(D_{\lambda_{i,t},P_i}\circ A_i)$ at $x^{t,l}$. Using this special version of NPG$_{\rm major}$, we can show that the termination criterion (8) is satisfied after finitely many inner iterations.

Theorem 1 Suppose that the NPG$_{\rm major}$ is applied with $\zeta^{t,l}=\sum_{i=1}^m\frac{1}{\lambda_{i,t}}A_i^*\zeta_i^{t,l}$, where $\zeta_i^{t,l}$ are chosen as in (9), to minimizing $F_{\lambda_t}$ in the $t$-th iteration of SDCAM. Then the criterion (8) is satisfied after finitely many inner iterations.

Proof According to the convergence properties of the NPG$_{\rm major}$, one obtains a sequence $\{x^{t,l}\}_{l\ge 0}$ satisfying

1. $\lim_{l\to\infty}\|x^{t,l+1}-x^{t,l}\|=0$ (Proposition 2 in the appendix), $F_{\lambda_t}(x^{t,l})\le F_{\lambda_t}(x^{t,0})$ for all $l$ (thanks to (46)); and
2. for any $l\ge 0$ (see (45)),

$$x^{t,l+1}\in\mathop{\mathrm{Argmin}}_x\left\{\left(\nabla f(x^{t,l}) + \sum_{i=1}^m\frac{1}{\lambda_{i,t}}\omega_i^{t,l}\right)^{\!\top}x + \frac{\bar L_{t,l}}{2}\|x-x^{t,l}\|^2 + P_0(x)\right\}, \tag{10}$$

where $\omega_i^{t,l}:=A_i^*[A_i x^{t,l}-\zeta_i^{t,l}]$. Here, the sequence $\{\bar L_{t,l}\}_{l\ge 0}$ can be shown to be bounded; see Proposition 1 in the appendix.

Using [25, Exercise 8.8(c)], the condition (10) implies

$$0\in\nabla f(x^{t,l}) + \sum_{i=1}^m\frac{1}{\lambda_{i,t}}A_i^*[A_i x^{t,l}-\zeta_i^{t,l}] + \bar L_{t,l}(x^{t,l+1}-x^{t,l}) + \partial P_0(x^{t,l+1}),$$

that is,

$$-\bar L_{t,l}(x^{t,l+1}-x^{t,l})\in\nabla f(x^{t,l}) + \sum_{i=1}^m\frac{1}{\lambda_{i,t}}A_i^*[A_i x^{t,l}-\zeta_i^{t,l}] + \partial P_0(x^{t,l+1}),$$

from which (8) can be seen to hold with $l_t=l$ when $l$ is sufficiently large because $\lim_{l\to\infty}\|x^{t,l+1}-x^{t,l}\|=0$ and $\{\bar L_{t,l}\}_{l\ge 0}$ is bounded.

Remark 1 (Computational complexity) Suppose that the NPG$_{\rm major}$ is applied to minimizing $F_{\lambda_t}$ in each iteration of SDCAM, with the $\zeta^{t,l}$ chosen as in Theorem 1. Then one has to repeatedly solve subproblems of the form (10) for various values of $\lambda_t$ and $\beta>0$ (in place of $\bar L_{t,l}$). These computations are easy under the assumption that the proximal mapping of $\gamma P_i$, $i=1,\ldots,m$, $\gamma>0$, is easy to compute. Indeed, the subproblems can be rewritten as

$$x^{t,l+1}\in\mathrm{prox}_{\frac{1}{\beta}P_0}\left(x^{t,l} - \frac{1}{\beta}\left(\nabla f(x^{t,l}) + \sum_{i=1}^m\frac{1}{\lambda_{i,t}}A_i^*[A_i x^{t,l}-\zeta_i^{t,l}]\right)\right), \tag{11}$$

where $\zeta_i^{t,l}\in\mathrm{prox}_{\lambda_{i,t}P_i}(A_i x^{t,l})$.

We now state and prove our convergence result for SDCAM. We will comment on (12) in Remark 2 below before proving the theorem.
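The update (11) in Remark 1 can be sketched as a single generic function. This is a minimal sketch with a hypothetical interface: the linear maps $A_i$ are assumed to be given as 2-D NumPy arrays (so that the adjoint is the transpose), and the proximal mappings are passed in as callables that return one element of the corresponding set.

```python
import numpy as np

def npg_subproblem_step(x, grad_f, prox_P0, maps, lams, beta):
    """One update of the form (11).

    grad_f(x)          : gradient of f at x
    prox_P0(v, gamma)  : an element of prox_{gamma P_0}(v)
    maps               : list of pairs (A_i, prox_i), where prox_i(v, lam) returns
                         an element of prox_{lam P_i}(v)   (hypothetical interface)
    lams               : list of smoothing parameters lambda_{i,t}
    beta               : positive constant used in place of bar L_{t,l}
    """
    g = grad_f(x)
    for (A_i, prox_i), lam in zip(maps, lams):
        Ax = A_i @ x
        zeta = prox_i(Ax, lam)                  # zeta_i in prox_{lam P_i}(A_i x)
        g = g + A_i.T @ (Ax - zeta) / lam       # (1/lam) A_i^*[A_i x - zeta_i]
    return prox_P0(x - g / beta, 1.0 / beta)

# Toy check: f = 0, P_0 = 0, P_1 = indicator of {0}, A_1 = I, lambda = 1, beta = 2,
# for which the update reduces to x - (1/2) x = x / 2.
x0 = np.array([2.0, -4.0])
x1 = npg_subproblem_step(
    x0,
    grad_f=lambda v: np.zeros_like(v),
    prox_P0=lambda v, gamma: v,
    maps=[(np.eye(2), lambda v, lam: np.zeros_like(v))],
    lams=[1.0],
    beta=2.0,
)
```

Passing the proximal mappings as callables keeps the step independent of the particular regularizers, which mirrors the fact that (11) only uses $\nabla f$ and the proximal mappings.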
Theorem 2 (Convergence of SDCAM) Let $\{x^t\}$ be the sequence generated by SDCAM for solving (1). Then $\{x^t\}$ is bounded. Let $x^*$ be an accumulation point of this sequence. Then we have the following results.

(i) It holds that $x^*\in\mathrm{dom}\,P_0\cap\bigcap_{i=1}^m A_i^{-1}(\mathrm{dom}\,P_i)$.

(ii) Suppose that

$$\left.\begin{array}{l} y_0 + \sum_{i=1}^m A_i^*y_i = 0 \ \text{ and }\ y_0\in\partial^\infty P_0(x^*),\\[2pt] y_i\in\partial^\infty P_i(A_i x^*) \text{ for } i=1,\ldots,m \end{array}\right\}\ \Longrightarrow\ y_i = 0 \text{ for } i=0,\ldots,m. \tag{12}$$

Then $x^*$ is a stationary point of (1), i.e.,

$$0\in\nabla f(x^*) + \partial P_0(x^*) + \sum_{i=1}^m A_i^*\partial P_i(A_i x^*). \tag{13}$$

Remark 2 (Comments on condition (12))

(i) Condition (12) is a classical constraint qualification for nonconvex nonsmooth optimization problems; see [25, Corollary 10.9]. It is satisfied, for example, when $A_i$ equals the identity map for all $i$, and all but one $P_i$ are locally Lipschitz so that $\partial^\infty P_i(x^*)=\{0\}$ for all but one $P_i$; see [25, Exercise 10.10].
(ii) Under (12), it can be shown using [25, Theorem 10.1], [25, Proposition 10.5] and [25, Theorem 10.6] that any local minimizer of (1) satisfies (13).

Proof Using the nonnegativity of $P_i$, the last criterion in (8) and the definitions of $F_\lambda$ and $x^{t,0}$, we see that for all $t\ge 1$,

$$f(x^t) + P_0(x^t) \le F_{\lambda_{t-1}}(x^t) \le F_{\lambda_{t-1}}(x^{\rm feas}) \le F(x^{\rm feas}) =: F^{\rm feas}, \tag{14}$$

where the last inequality follows from the definitions of $F$, $F_\lambda$ and (3). From this, one immediately concludes that $\{x^t\}$ is bounded because $f+P_0$ is level-bounded.

Next, let $x^*$ be an accumulation point of $\{x^t\}$. Then there exists a subsequence $\{x^t\}_{t\in I}$ so that $\lim_{t\in I}x^t=x^*$. Using this, (14), and the lower semicontinuity of $f+P_0$, we further see that

$$f(x^*) + P_0(x^*) \le \liminf_{t\in I}\ f(x^t) + P_0(x^t) \le F^{\rm feas} < \infty.$$

This shows that $x^*\in\mathrm{dom}\,P_0$. On the other hand, since $P_i$ is nonnegative, we have

$$0\le\frac12\mathrm{dist}^2(A_i x,\mathrm{dom}\,P_i) = \inf_{y\in\mathrm{dom}\,P_i}\frac12\|A_i x-y\|^2 \le \inf_{y\in\mathrm{dom}\,P_i}\left\{\frac12\|A_i x-y\|^2 + \lambda_{i,t-1}P_i(y)\right\} = \lambda_{i,t-1}\,e_{\lambda_{i,t-1}}P_i(A_i x)$$

for all $x$ and for each $i=1,\ldots,m$. Using this, the finiteness of $\ell:=\inf\{f+P_0\}$ (thanks to the level-boundedness of $f+P_0$), and the definition of $F_\lambda$, we have for each $i=1,\ldots,m$ that

$$\ell + \frac{1}{2\lambda_{i,t-1}}\mathrm{dist}^2(A_i x^t,\mathrm{dom}\,P_i) \le \ell + e_{\lambda_{i,t-1}}P_i(A_i x^t) \le F_{\lambda_{t-1}}(x^t) \le F^{\rm feas},$$

where the last inequality follows from (14). Since $\lambda_{i,t}\downarrow 0$, we conclude upon passing to the limit along $t\in I$ that $\mathrm{dist}^2(A_i x^*,\mathrm{dom}\,P_i)\le 0$ and hence $A_i x^*\in\mathrm{dom}\,P_i$ because $\mathrm{dom}\,P_i$ is closed.

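We now prove (13) under (12). For notational simplicity, let $y^{t+1}:=x^{t,l_t+1}$. Then $\lim_{t\in I}y^t=x^*$ thanks to the second relation in (8). Moreover, from the first relation in (8), we see that there exist $\xi^t$ with $\|\xi^t\|\le\epsilon_{t-1}$, $\eta^t\in\partial P_0(y^t)$ and $\zeta_i^t\in\mathrm{prox}_{\lambda_{i,t-1}P_i}(A_i x^t)$ for each $i=1,\ldots,m$ so that

$$\xi^t = \nabla f(x^t) + \eta^t + \sum_{i=1}^m\frac{1}{\lambda_{i,t-1}}A_i^*(A_i x^t-\zeta_i^t). \tag{15}$$

Define

$$r_t := \|\eta^t\| + \sum_{i=1}^m\left\|\frac{1}{\lambda_{i,t-1}}A_i^*(A_i x^t-\zeta_i^t)\right\|.$$

We claim that $\{r_t\}_{t\in I}$ is bounded. Suppose to the contrary that $\{r_t\}_{t\in I}$ is unbounded and we assume without loss of generality that $\lim_{t\in I}r_t=\infty$ and $\inf_{t\in I}r_t>0$. Then the sequences $\left\{\frac{\eta^t}{r_t}\right\}$ and $\left\{\frac{1}{\lambda_{i,t-1}r_t}A_i^*(A_i x^t-\zeta_i^t)\right\}$ for $i=1,\ldots,m$ are bounded. Without loss of generality, we may assume

$$\lim_{t\in I}\frac{\eta^t}{r_t} = \eta^*\quad\text{and}\quad\lim_{t\in I}A_i^*\left(\frac{A_i x^t-\zeta_i^t}{\lambda_{i,t-1}r_t}\right) = \chi_i^* \tag{16}$$

for some $\eta^*$ and $\chi_i^*$, $i=1,\ldots,m$. Notice that

$$1 = \lim_{t\in I}\frac{\|\eta^t\| + \sum_{i=1}^m\left\|\frac{1}{\lambda_{i,t-1}}A_i^*(A_i x^t-\zeta_i^t)\right\|}{r_t} = \|\eta^*\| + \sum_{i=1}^m\|\chi_i^*\|. \tag{17}$$

In addition, by dividing both sides of (15) by $r_t$ and passing to the limit along $t\in I$, we conclude that

$$0 = \eta^* + \sum_{i=1}^m\chi_i^*. \tag{18}$$

On the other hand, since $\eta^t\in\partial P_0(y^t)$ and $\lim_{t\in I}r_t=\infty$, we have from (16), the continuity of $P_0$ in its domain and (2) that

$$\eta^*\in\partial^\infty P_0(x^*). \tag{19}$$

Next, we prove that $\chi_i^*\in A_i^*\partial^\infty P_i(A_i x^*)$ for $i=1,\ldots,m$. To proceed, we define for each $i=1,\ldots,m$,

$$w_i^t := \frac{A_i x^t-\zeta_i^t}{\lambda_{i,t-1}r_t}$$

and claim that $\{w_i^t\}$ is bounded for all $i=1,\ldots,m$. For an arbitrarily fixed $i\in\{1,\ldots,m\}$, suppose to the contrary that $\{w_i^t\}$ is unbounded and we assume without loss of generality that $\lim_{t\in I}\|w_i^t\|=\infty$ and that

$$\lim_{t\in I}\frac{1}{\|w_i^t\|}\cdot\frac{A_i x^t-\zeta_i^t}{\lambda_{i,t-1}r_t} = \psi_i^* \tag{20}$$

for some $\psi_i^*$ with unit norm. Then from the second equation in (16), we have

$$\|\psi_i^*\| = 1\quad\text{and}\quad A_i^*\psi_i^* = 0. \tag{21}$$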

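In addition, we observe from (20) that

$$\psi_i^* = \lim_{t\in I}\frac{1}{\|w_i^t\|}\cdot\frac{A_i x^t-\zeta_i^t}{\lambda_{i,t-1}r_t} \in \left\{\lim_{t\in I}\frac{1}{\|w_i^t\|\,r_t}u^t : u^t\in\partial P_i(\zeta_i^t) \text{ for each } t\right\} \subseteq \partial^\infty P_i(A_i x^*),$$

where the first inclusion follows from (4) and the second inclusion follows from Lemma 1 (so that $\lim_{t\in I}\zeta_i^t=A_i x^*$ and $\{\zeta_i^t\}\subset\mathrm{dom}\,P_i$), the continuity of $P_i$ in its domain and (2). These together with the facts $0\in\partial^\infty P_0(x^*)$, $0\in\partial^\infty P_i(A_i x^*)$ ($i=1,\ldots,m$), which follow from (i) and [25, Corollary 8.10], and (21) contradict (12). Consequently, $\{w_i^t\}$ is bounded for all $i=1,\ldots,m$. Then, without loss of generality, we assume that $\lim_{t\in I}\frac{A_i x^t-\zeta_i^t}{\lambda_{i,t-1}r_t}$ exists for all $i=1,\ldots,m$. Then, for each $i=1,\ldots,m$, we observe from (16) that

$$\chi_i^* = A_i^*\lim_{t\in I}\frac{A_i x^t-\zeta_i^t}{\lambda_{i,t-1}r_t} \in A_i^*\left\{\lim_{t\in I}\frac{1}{r_t}u^t : u^t\in\partial P_i(\zeta_i^t) \text{ for each } t\right\} \subseteq A_i^*\partial^\infty P_i(A_i x^*),$$

where the first inclusion follows from (4) and the second inclusion follows from Lemma 1 (so that $\lim_{t\in I}\zeta_i^t=A_i x^*$ and $\{\zeta_i^t\}\subset\mathrm{dom}\,P_i$ for each $i=1,\ldots,m$), the continuity of $P_i$ in its domain and (2). These together with (17), (18) and (19) contradict (12). Consequently, $\{r_t\}_{t\in I}$ is bounded.

Since $\{r_t\}_{t\in I}$ is bounded, we may assume without loss of generality that

$$\lim_{t\in I}\eta^t = \eta^*\quad\text{and}\quad\lim_{t\in I}\frac{1}{\lambda_{i,t-1}}A_i^*(A_i x^t-\zeta_i^t) = \chi_i^* \tag{22}$$

for some $\eta^*$ and $\chi_i^*$, $i=1,\ldots,m$. Then we have from (2) and the continuity of $P_0$ in its domain that

$$\eta^*\in\partial P_0(x^*). \tag{23}$$

Next, we prove that $\chi_i^*\in A_i^*\partial P_i(A_i x^*)$ for $i=1,\ldots,m$. To proceed, we define for each $i=1,\ldots,m$,

$$\nu_i^t := \frac{A_i x^t-\zeta_i^t}{\lambda_{i,t-1}}$$

and claim that $\{\nu_i^t\}$ is bounded for all $i=1,\ldots,m$. For an arbitrarily fixed $i\in\{1,\ldots,m\}$, suppose to the contrary that $\{\nu_i^t\}$ is unbounded and we assume without loss of generality that $\lim_{t\in I}\|\nu_i^t\|=\infty$ and that

$$\lim_{t\in I}\frac{1}{\|\nu_i^t\|}\cdot\frac{A_i x^t-\zeta_i^t}{\lambda_{i,t-1}} = \phi_i^* \tag{24}$$

for some $\phi_i^*$ with unit norm. Notice from the second equation of (22) that

$$\|\phi_i^*\| = 1\quad\text{and}\quad A_i^*\phi_i^* = 0. \tag{25}$$

In addition, we observe from (24) that

$$\phi_i^* = \lim_{t\in I}\frac{1}{\|\nu_i^t\|}\cdot\frac{A_i x^t-\zeta_i^t}{\lambda_{i,t-1}} \in \left\{\lim_{t\in I}\frac{1}{\|\nu_i^t\|}u^t : u^t\in\partial P_i(\zeta_i^t) \text{ for each } t\right\} \subseteq \partial^\infty P_i(A_i x^*),$$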

where the first inclusion follows from (4) and the second inclusion follows from Lemma 1 (so that $\lim_{t\in I}\zeta_i^t=A_i x^*$ and $\{\zeta_i^t\}\subset\mathrm{dom}\,P_i$), the continuity of $P_i$ in its domain and (2). These together with the facts $0\in\partial^\infty P_0(x^*)$, $0\in\partial^\infty P_i(A_i x^*)$ ($i=1,\ldots,m$), which follow from (i) and [25, Corollary 8.10], and (25) contradict (12). Consequently, $\{\nu_i^t\}$ is bounded for all $i=1,\ldots,m$. Then, without loss of generality, we assume that $\lim_{t\in I}\frac{A_i x^t-\zeta_i^t}{\lambda_{i,t-1}}$ exists for all $i=1,\ldots,m$. Therefore, for each $i=1,\ldots,m$, we obtain from (22) that

$$\chi_i^* = A_i^*\lim_{t\in I}\frac{A_i x^t-\zeta_i^t}{\lambda_{i,t-1}} \in A_i^*\left\{\lim_{t\in I}u^t : u^t\in\partial P_i(\zeta_i^t) \text{ for each } t\right\} \subseteq A_i^*\partial P_i(A_i x^*), \tag{26}$$

where the first inclusion follows from (4) and the second inclusion follows from Lemma 1 (so that $\lim_{t\in I}\zeta_i^t=A_i x^*$ and $\{\zeta_i^t\}\subset\mathrm{dom}\,P_i$ for each $i=1,\ldots,m$), the continuity of $P_i$ in its domain and (2). Passing to the limit in (15) along $t\in I$ and invoking (22), (23) and (26), we see that

$$0 = \nabla f(x^*) + \eta^* + \sum_{i=1}^m\chi_i^* \in \nabla f(x^*) + \partial P_0(x^*) + \sum_{i=1}^m A_i^*\partial P_i(A_i x^*).$$

This completes the proof.

Remark 3 If, instead of (8), one can guarantee that $F_{\lambda_t}(x^{t,l_t})\le\inf F_{\lambda_t}+\epsilon_t$, then one can show that any accumulation point of the sequence $\{x^t\}$ generated by SDCAM is a global minimizer of (1). To see this, recall from [25, Theorem 1.25] that $e_{\lambda_{i,t}}P_i(A_i x)\uparrow P_i(A_i x)$ for each $i$ and all $x$, and from the discussion on [25, Page 244] that $\{(e_{\lambda_{i,t}}P_i)\circ A_i\}$ epi-converges to $P_i\circ A_i$ for each $i$. Using these together with [25, Theorem 7.46], we further see that $\{F_{\lambda_t}\}$ epi-converges to $F$. Now, in view of [25, Theorem 7.31(b)], we conclude that any accumulation point of the sequence $\{x^t\}$ generated by SDCAM is a global minimizer of $F$.

4 Applications to structured optimization problems

4.1 Problems involving sparsity

Consider the following $\ell_0$-constrained optimization problem discussed in [30]:

$$\min_x\ f(x)\quad\text{subject to}\quad\|x\|_0\le k,\ x\in C, \tag{27}$$

where $f$ is as in (1) and $C$ is a nonempty closed set.
This model includes many important application problems such as sparse principal component analysis, sparse portfolio selection and sparse nonnegative linear regression as special cases. These applications typically involve a closed set $C$ whose projection is easy to compute. For instance, we have $f(x)=-x^\top Vx$, defined with a covariance matrix $V\in\mathcal{S}^n$, and $C=\{x:\|x\|=1\}$ for sparse principal component analysis [27]. As another example, for sparse nonnegative linear regression [26], $f(x)=\frac12\|Ax-b\|^2$ defined with $A\in\mathbb{R}^{m\times n}$ and $b\in\mathbb{R}^m$, and $C=\{x:x\ge 0\}$ are used. For these two examples, the direct projection onto $C\cap\{x:\|x\|_0\le k\}$ is easy to compute, and the proximal gradient algorithm can then be applied to solving (27). We next discuss a specific example where the direct projection onto $C\cap\{x:\|x\|_0\le k\}$ might not be easy to compute, and describe how our SDCAM can be applied.

Example 1 (Sparse portfolio problem) Given a basket of investable assets, the Markowitz model [9] seeks to find the optimal asset allocation of the portfolio by minimizing the estimated variance with an expected return above a specified level. More recently, [6] added the $\ell_1$-norm to the classical Markowitz model to obtain sparse portfolios, and after that, various types of sparse regularizers such as the $\ell_p$-norm ($0<p<1$) were incorporated into the Markowitz model (e.g., [8]). The sparse portfolio selection problem we consider here takes the following form:

$$\min_x\ f(x) := \frac12 x^\top Qx\quad\text{subject to}\quad\|x\|_0\le k,\ x\ge 0,\ e^\top x=1,\ r^\top x=r_0, \tag{28}$$

where $Q\in\mathcal{S}^n$ is the estimated covariance matrix of the portfolio, $r\in\mathbb{R}^n$ is the estimated mean return vector of investable assets, $r_0\in\mathbb{R}$ is a specific return level, and $e$ is the vector of all ones. The constraint $x\ge 0$ is known as the non-shortsale constraint, and model (28) is the formulation of the shorting-prohibited sparse Markowitz model. We assume here that the feasible set of (28) is nonempty. Notice that the feasible set of (28) is compact and hence (28) has a solution. Let $\bar x$ be a solution of (28) and $\tau\ge\max_i\bar x_i$. Define $\Omega:=\{x:\|x\|_0\le k,\ 0\le x\le\tau\}$ and $S:=\{x:e^\top x=1,\ r^\top x=r_0\}$. Then (28) can be rewritten in the form of (1) (with the same optimal value) as follows

$$\min_x\ f(x) + \underbrace{\delta_\Omega(x)}_{P_0(x)} + \underbrace{\delta_S(x)}_{P_1(x)}, \tag{29}$$

in which $f+P_0$ is level-bounded.
Therefore, we can apply SDCAM in Section 3 to (29), and in each subproblem of SDCAM we can use NPG$_{\rm major}$ to minimize $F_{\lambda_t}$ as described in Theorem 1. The method involves computing two projections $\mathrm{proj}_\Omega$ and $\mathrm{proj}_S$, which are easy to compute. Indeed, we have $\max\{\min\{H_k(y),\tau\},0\}\in\mathrm{proj}_\Omega(y)$, where $H_k(v)$ keeps any $k$ largest entries of $v$ and sets the rest to zero.

To see this, recall from [5, Proposition 3.1] that an element $\zeta$ of $\mathrm{proj}_\Omega(y)$ can be obtained as

$$\zeta_i = \begin{cases}\tilde\zeta_i & \text{if } i\in I^*,\\ 0 & \text{otherwise,}\end{cases}$$

where $\tilde\zeta_i=\mathop{\mathrm{argmin}}\{\frac12(\zeta_i-y_i)^2 : 0\le\zeta_i\le\tau\}=\max\{\min\{y_i,\tau\},0\}$, and $I^*$ is an index set of size $k$ corresponding to the $k$ largest values of

$$\left\{\tfrac12 y_i^2 - \tfrac12(\tilde\zeta_i-y_i)^2\right\}_{i=1}^n = \left\{\tfrac12 y_i^2 - \tfrac12\left(\min\{\max\{y_i-\tau,0\},y_i\}\right)^2\right\}_{i=1}^n.$$

Since the function $t\mapsto\frac12 t^2-\frac12(\min\{\max\{t-\tau,0\},t\})^2$ is nondecreasing, we can let $I^*$ correspond to any $k$ largest entries of $y$.
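The two projections used in Example 1 can be sketched as follows. The function names are hypothetical; `proj_S` uses the standard formula for projecting onto an affine set and assumes that $e$ and $r$ are linearly independent. The brute-force routine is included only as a sanity check of the closed-form expression for $\mathrm{proj}_\Omega$ on small vectors.

```python
import itertools
import numpy as np

def proj_Omega(y, k, tau):
    """max{min{H_k(y), tau}, 0}: keep k largest entries of y, then clip to [0, tau]."""
    z = np.zeros_like(y)
    idx = np.argsort(-y, kind="stable")[:k]
    z[idx] = y[idx]
    return np.clip(z, 0.0, tau)

def proj_S(y, r, r0):
    """Projection onto S = {x : e^T x = 1, r^T x = r0}; assumes e and r are linearly independent."""
    B = np.vstack([np.ones_like(y), r])          # rows e^T and r^T
    c = np.array([1.0, r0])
    return y - B.T @ np.linalg.solve(B @ B.T, B @ y - c)

def dist2_Omega_bruteforce(y, k, tau):
    """Squared distance to Omega by enumerating all supports of size k (small n only)."""
    best = np.inf
    for support in itertools.combinations(range(len(y)), k):
        z = np.zeros_like(y)
        z[list(support)] = np.clip(y[list(support)], 0.0, tau)
        best = min(best, np.sum((y - z) ** 2))
    return best

y = np.array([-3.0, 0.5, 2.0])
p = proj_Omega(y, k=1, tau=1.0)                  # keeps the entry 2.0 and clips it to 1.0
```

Here `p` equals `[0, 0, 1]`, with squared distance $9+0.25+1=10.25$ to `y`, which matches the brute-force value.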

In statistics, the $\ell_1$-norm regularizer has been used for inducing sparsity in variable selection problems; see Lasso [28], which is an application of the $\ell_1$ penalty to linear regression. A more general model of Lasso, the generalized Lasso [29], has been proposed as

$$\min_x\ \frac12\|Ax-b\|^2 + c\|Dx\|_1,$$

where $A\in\mathbb{R}^{m\times n}$ is a matrix of predictors, $b\in\mathbb{R}^m$ is a response vector, $c\ge 0$ is a tuning parameter and $D\in\mathbb{R}^{d\times n}$ is a specified penalty matrix. The term $\|Dx\|_1$ can enforce certain structural sparsity on the coefficients in the solution. For example, with an appropriate $D$, $\|Dx\|_1$ can express $\sum_{i=2}^n|x_i-x_{i-1}|$, which penalizes the absolute differences in adjacent coordinates of $x$. This specific $D$ leads to the so-called fused Lasso. A variant of this type of regularizer (the anisotropic total variation regularizer) is also used in image processing for minimizing the horizontal and/or vertical differences between pixels. Some other applications which require a non-identity matrix $D$ in the generalized Lasso were discussed in [29]. In the next example, we discuss how our SDCAM can be applied to some nonconvex variants of the generalized Lasso problem.

Example 2 (Nonconvex fused regularized problem) Similarly to [2], we consider the following nonconvex fused regularized problem

$$\min_x\ \frac12\|Ax-b\|^2 + c_1\phi_1(x) + c_2\phi_2(Dx), \tag{30}$$

where $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$, $Dx=(x_2-x_1,\ldots,x_n-x_{n-1})^\top$, $c_1>0$ and $c_2>0$ are regularization parameters, $\phi_1(x)=\sum_{i=1}^n\varphi_i(|x_i|)$ and $\phi_2$ are nonconvex sparsity-inducing regularizers with $\varphi_i:\mathbb{R}_+\to\mathbb{R}_+$ being closed and nondecreasing, and $\phi_2:\mathbb{R}^{n-1}\to\mathbb{R}_+$ being closed and level-bounded. Note that (30) can be rewritten in the form of

$$\min_x\ g(\tilde Ax-\tilde b) + \Psi(x), \tag{31}$$

in which $\tilde A=\begin{pmatrix}A\\ D\end{pmatrix}$, $\tilde b=\begin{pmatrix}b\\ 0\end{pmatrix}$, $g(y)=\frac12\|y_1\|^2+c_2\phi_2(y_2)$ with $y:=(y_1,y_2)\in\mathbb{R}^m\times\mathbb{R}^{n-1}$, and $\Psi(x)=c_1\sum_{i=1}^n\varphi_i(|x_i|)$. It is routine to check that $g$ and $\Psi$ satisfy [4, Assumption 2]. Hence, according to [4, Theorem 2.1], we know that (31), and hence (30), has at least one solution.
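The difference matrix $D$ appearing in the fused Lasso and in (30) can be written down explicitly. The following is a minimal sketch with hypothetical function names; it builds the dense $(n-1)\times n$ matrix for clarity, whereas a sparse representation would be preferable for large $n$.

```python
import numpy as np

def difference_matrix(n):
    """D in R^{(n-1) x n} with (D x)_i = x_{i+1} - x_i."""
    return np.diff(np.eye(n), axis=0)

def fused_penalty(x):
    """||D x||_1 = sum_{i=2}^n |x_i - x_{i-1}|."""
    return np.sum(np.abs(difference_matrix(len(x)) @ x))

x = np.array([1.0, 3.0, 2.0, 2.0])
print(fused_penalty(x))   # |3-1| + |2-3| + |2-2| = 3.0
```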
Notice that we can directly apply the SDCAM in Section 3 to (30) when $\varphi_1$ is level-bounded, e.g., $\varphi_1(x) = \|x\|_p^p$: we set $f(x) = \tfrac12\|Ax - b\|^2$, $P_0 = c_1\varphi_1$ and $P_1 = c_2\varphi_2$ with $\mathcal{A}_1 = D$ in this case. When the NPG$_{\rm major}$ is applied as described in Theorem 1 for solving the corresponding subproblems, it involves computing the proximal mappings $\mathrm{prox}_{\mu\varphi_1}$ and $\mathrm{prox}_{\mu\varphi_2}$ for $\mu > 0$. These are easy to compute for many well-known nonconvex sparse regularizers; see [2]. Finally, in the case when $\varphi_1$ is not level-bounded, let $x^*$ be a solution of (30) and $\tau \ge \max_i |x^*_i|$. We define $\Omega := \{x : \max_i |x_i| \le \tau\}$ and rewrite (30) in the form of (1) (with the same optimal value) as follows:
$$\min_x\ \underbrace{\tfrac12\|Ax - b\|^2}_{f(x)} + \underbrace{c_1\sum_{i=1}^n \phi_i(|x_i|) + \delta_{\Omega}(x)}_{P_0(x)} + \underbrace{c_2\varphi_2(Dx)}_{P_1(\mathcal{A}_1 x)}. \qquad (32)$$

Then $f + P_0$ is level-bounded and hence the SDCAM in Section 3 can be applied. When the NPG$_{\rm major}$ is applied in the subproblem of SDCAM as described in Theorem 1, it involves computing the proximal mappings $\mathrm{prox}_{\mu P_0}$ and $\mathrm{prox}_{\mu\varphi_2}$ for $\mu > 0$. Note that $\mathrm{prox}_{\mu P_0}$ can be obtained from $\mathrm{prox}_{\mu\psi_i}$ with $\psi_i(x_i) := c_1\phi_i(|x_i|) + \delta_{|\cdot| \le \tau}(x_i)$, $i = 1, \ldots, n$, which can be efficiently computed for various nonconvex sparse regularizers such as SCAD, MCP, the $\ell_p$ penalty and Capped-$\ell_1$ (see [2]). Finally, the computation of $\mathrm{prox}_{\mu\varphi_2}$ is also easy for many of these regularizers.

4.2 Problems with rank constraints

Our algorithm can also be applied to rank-constrained nonconvex nonsmooth matrix optimization problems. We discuss some concrete examples below. For notational simplicity, from now on, we let $\Xi_k := \{X : \mathrm{rank}(X) \le k\}$ for a given integer $k$. Note that if $P = \delta_{\Xi_k}$, then
$$e_\lambda P(X) = \frac{1}{2\lambda}\,\mathrm{dist}^2(X, \Xi_k) = \frac{1}{2\lambda}\left(\|X\|_F^2 - \|X\|_{k,2}^2\right),$$
where $\|X\|_{k,2}^2$ denotes the sum of squares of the $k$ largest singular values of $X$. The function $X \mapsto \|X\|_F^2 - \|X\|_{k,2}^2$ is a rank-related variant of the so-called $k$-sparsity functions [] because the relation $\mathrm{rank}(X) \le k$ can be equivalently expressed as $\|X\|_F^2 - \|X\|_{k,2}^2 = 0$. A variant of this function was used in [30] as a penalty function for inducing sparsity. It is interesting to note that this function falls out naturally from the Moreau envelope of the indicator function of $\Xi_k$.

Example 3 (Matrix completion) The problem of recovering a low-rank data matrix $M \in \mathbb{R}^{m\times n}$ from a sampling of its entries is known as the matrix completion problem [7]. This problem can be formulated as
$$\min_X\ \mathrm{rank}(X) \quad \text{subject to} \quad P_\Omega(X) = P_\Omega(M),$$
where $\Omega$ is the index set of known entries of $M$, and $P_\Omega$ is the sampling map defined as
$$[P_\Omega(Y)]_{ij} = \begin{cases} Y_{ij} & \text{if } (i,j) \in \Omega,\\ 0 & \text{otherwise}.\end{cases}$$
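The Moreau-envelope identity stated at the beginning of Section 4.2 depends on $X$ only through its singular values: by the Eckart-Young theorem, the squared distance from $X$ to $\Xi_k$ is the sum of squares of the singular values beyond the $k$ largest. A minimal sketch of this computation follows, assuming the singular values have already been obtained elsewhere; the function name `rank_envelope` is hypothetical.

```python
def rank_envelope(sigma, k, lam):
    """Moreau envelope e_lam P(X) of P = indicator of {rank(X) <= k},
    evaluated from the singular values `sigma` of X.

    Uses (||X||_F^2 - ||X||_{k,2}^2) / (2*lam), where ||X||_{k,2}^2 is the
    sum of squares of the k largest singular values.
    """
    s = sorted((abs(v) for v in sigma), reverse=True)
    fro2 = sum(v * v for v in s)        # ||X||_F^2
    top2 = sum(v * v for v in s[:k])    # ||X||_{k,2}^2
    return (fro2 - top2) / (2.0 * lam)


# Singular values 3, 2, 1: the rank-2 envelope only sees the smallest one.
print(rank_envelope([3, 2, 1], k=2, lam=0.5))
# A matrix of rank at most k has envelope value zero.
print(rank_envelope([3, 2, 1], k=3, lam=0.5))
```

The envelope vanishes exactly when $\mathrm{rank}(X) \le k$, which is the equivalence noted in the text.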
When the entries of the data matrix are noisy, one can consider the following variants of the above model:
$$\min_X\ \|P_\Omega(X) - P_\Omega(M)\|_F^2 \quad \text{subject to} \quad \mathrm{rank}(X) \le k,$$
or
$$\min_X\ \|P_\Omega(X) - P_\Omega(M)\|_F^2 + \mu\,\mathrm{rank}(X),$$
where $\mu > 0$ is a tuning parameter and $k$ is a positive integer. Since these problems are nonconvex in general, some popular convex relaxation approaches have been proposed, where the rank function is replaced by the nuclear norm function [22]. The convex relaxations can be shown to be equivalent to the original nonconvex problems under suitable conditions [7].

Here we consider the following variation of the matrix completion problem:
$$\min_X\ \tfrac12\|P_\Omega(X) - P_\Omega(M)\|_F^2 \quad \text{subject to} \quad P_\Theta(X) = P_\Theta(M),\ \mathrm{rank}(X) \le k, \qquad (33)$$
where $\Omega$ is an index set corresponding to possibly noisy known entries of $M$, and $\Theta$ is another index set corresponding to noiseless known entries of $M$. Suppose that (33) has a solution $X^*$, and take $\tau \ge \max\{\max_{i,j}|X^*_{ij}|, \sigma_{\max}(X^*)\}$. Let $S := \{X : P_\Theta(X) = P_\Theta(M)\}$, $\tilde S := \{X \in S : \max_{i,j}|X_{ij}| \le \tau\}$ and $\tilde\Xi_k := \{X \in \Xi_k : \sigma_{\max}(X) \le \tau\}$. Then (33) can be rewritten in the form of (1) (with the same optimal value) in the following two ways:
$$\min_X\ \underbrace{\tfrac12\|P_\Omega(X - M)\|_F^2}_{f(X)} + \underbrace{\delta_S(X)}_{P_1(X)} + \underbrace{\delta_{\tilde\Xi_k}(X)}_{P_0(X)}, \qquad (34)$$
$$\min_X\ \underbrace{\tfrac12\|P_\Omega(X - M)\|_F^2}_{f(X)} + \underbrace{\delta_{\tilde S}(X)}_{P_0(X)} + \underbrace{\delta_{\Xi_k}(X)}_{P_1(X)}. \qquad (35)$$
Note that in both cases, $f + P_0$ is level-bounded and hence the SDCAM in Section 3 can be applied.

Suppose that SDCAM is applied to (34). Then when the NPG$_{\rm major}$ is applied as described in Theorem 1 for solving the subproblems, it requires computing $\mathrm{proj}_S$ and $\mathrm{proj}_{\tilde\Xi_k}$. Both of these are easy to compute. In particular, let $U\mathrm{Diag}(\sigma)V^\top$ be a singular value decomposition of $W$. Then an element $Y \in \mathrm{proj}_{\tilde\Xi_k}(W)$ can be computed as $Y = U\mathrm{Diag}(\zeta^*)V^\top$ with $\zeta^* = \min\{H_k(\sigma), \tau e\}$, where $e$ is the vector of all ones, the minimum is taken componentwise, and $H_k(v)$ is the hard thresholding operator that keeps any $k$ largest entries of $v$ in magnitude and sets the rest to zero (see footnote 4).

On the other hand, when applying SDCAM to (35) with the NPG$_{\rm major}$ as described in Theorem 1 applied to the subproblems, one needs to compute $\mathrm{proj}_{\tilde S}$ and $\mathrm{proj}_{\Xi_k}$. Again, both of these are easy to compute. In particular, let $U\mathrm{Diag}(\sigma)V^\top$ be a singular value decomposition of $W$. Then an element $Y \in \mathrm{proj}_{\Xi_k}(W)$ can be computed as $Y = U\mathrm{Diag}(H_k(\sigma))V^\top$.

Footnote 4: To see this, recall from [6, Corollary 2.3] and [5, Proposition 3.1] that an element $Y \in \mathrm{proj}_{\tilde\Xi_k}(W)$ can be computed as $Y = U\mathrm{Diag}(\zeta^*)V^\top$, where
$$\zeta^*_i = \begin{cases} \tilde\zeta_i & \text{if } i \in I^*,\\ 0 & \text{otherwise},\end{cases}$$
where $\tilde\zeta_i = \mathrm{argmin}\{\tfrac12(\zeta_i - \sigma_i)^2 : |\zeta_i| \le \tau\} = \min\{\sigma_i, \tau\}$, and $I^*$ is an index set of size $k$ corresponding to the $k$ largest values of
$$\left\{\tfrac12\sigma_i^2 - \tfrac12(\tilde\zeta_i - \sigma_i)^2\right\}_{i=1}^n = \left\{\tfrac12\sigma_i^2 - \tfrac12\left(\max\{0, \sigma_i - \tau\}\right)^2\right\}_{i=1}^n.$$
Since $t \mapsto \tfrac12 t^2 - \tfrac12(\max\{0, t - \tau\})^2$ is nondecreasing for nonnegative $t$, we can take $I^*$ to correspond to any $k$ largest singular values.

Example 4 (Nearest low-rank correlation matrix) Finding the nearest low-rank correlation matrix has important applications in finance; see [5, ]. The problem

is often formulated as
$$\min_{X \in \mathcal{S}^n}\ \tfrac12\|H \circ (X - M)\|_F^2 \quad \text{subject to} \quad \mathrm{diag}(X) = e,\ X \succeq 0,\ \mathrm{rank}(X) \le k, \qquad (36)$$
where $\mathcal{S}^n$ is the space of $n \times n$ symmetric matrices, $H$ is a given nonnegative weight matrix, $M$ is a given symmetric matrix, $e$ is the vector of all ones, and $k$ is a given integer. In [], the constraint $\mathrm{rank}(X) \le k$ was rewritten equivalently as requiring the sum of the $n - k$ smallest eigenvalues to equal zero. A penalty approach was then adopted to handle this latter equality constraint.

In the following, we describe how to solve (36) by the SDCAM in Section 3. Notice that for any $X \in \mathcal{S}^n$ satisfying $\mathrm{diag}(X) = e$ and $X \succeq 0$, we have $X \preceq nI$. Thus, the feasible set of (36) is compact and hence (36) has a solution. Let $X^*$ be a solution of (36) and $\tau \ge \max\{\max_{i,j}|X^*_{ij}|, \lambda_{\max}(X^*)\}$. Define
$$S := \{X \in \mathcal{S}^n : \mathrm{diag}(X) = e\}, \quad \Pi_k := \{X \succeq 0 : \mathrm{rank}(X) \le k\},$$
$$\tilde S := \{X \in S : \max_{i,j}|X_{ij}| \le \tau\}, \quad \tilde\Pi_k := \{X \in \Pi_k : \lambda_{\max}(X) \le \tau\}.$$
Then (36) can be rewritten in the form of (1) (with the same optimal value) in the following two ways:
$$\min_{X \in \mathcal{S}^n}\ \underbrace{\tfrac12\|H \circ (X - M)\|_F^2}_{f(X)} + \underbrace{\delta_S(X)}_{P_1(X)} + \underbrace{\delta_{\tilde\Pi_k}(X)}_{P_0(X)}, \qquad (37)$$
$$\min_{X \in \mathcal{S}^n}\ \underbrace{\tfrac12\|H \circ (X - M)\|_F^2}_{f(X)} + \underbrace{\delta_{\tilde S}(X)}_{P_0(X)} + \underbrace{\delta_{\Pi_k}(X)}_{P_1(X)}. \qquad (38)$$
Notice that in both cases, $f + P_0$ is level-bounded and hence we can apply the SDCAM in Section 3.

We first look at (37). When the NPG$_{\rm major}$ as described in Theorem 1 is applied to the subproblems, one has to compute $\mathrm{proj}_S$ and $\mathrm{proj}_{\tilde\Pi_k}$. Both projections can be easily computed. In particular, let $U\mathrm{Diag}(\lambda)U^\top$ be an eigenvalue decomposition of $W \in \mathcal{S}^n$. Then an element $Y \in \mathrm{proj}_{\tilde\Pi_k}(W)$ can be computed as $Y = U\mathrm{Diag}(\zeta^*)U^\top$ with $\zeta^* = \max\{\min\{\tilde H_k(\lambda), \tau\}, 0\}$, where $\tilde H_k(v)$ keeps any $k$ largest entries of $v$ and sets the rest to zero (see footnote 5).

Footnote 5: To see this, recall from [6, Proposition 2.8] and [5, Proposition 3.1] that an element $Y \in \mathrm{proj}_{\tilde\Pi_k}(W)$ can be computed as $Y = U\mathrm{Diag}(\zeta^*)U^\top$, where
$$\zeta^*_i = \begin{cases} \tilde\zeta_i & \text{if } i \in I^*,\\ 0 & \text{otherwise},\end{cases}$$
where $\tilde\zeta_i = \mathrm{argmin}\{\tfrac12(\zeta_i - \lambda_i)^2 : 0 \le \zeta_i \le \tau\} = \max\{\min\{\lambda_i, \tau\}, 0\}$, and $I^*$ is an index set of size $k$ corresponding to the $k$ largest values of
$$\left\{\tfrac12\lambda_i^2 - \tfrac12(\tilde\zeta_i - \lambda_i)^2\right\}_{i=1}^n = \left\{\tfrac12\lambda_i^2 - \tfrac12\left(\min\{\max\{\lambda_i - \tau, 0\}, \lambda_i\}\right)^2\right\}_{i=1}^n.$$
Since the function $t \mapsto \tfrac12 t^2 - \tfrac12(\min\{\max\{t - \tau, 0\}, t\})^2$ is nondecreasing, we can let $I^*$ correspond to any $k$ largest entries of $\lambda$.

We next turn to (38). In this case, in each NPG$_{\rm major}$ iteration, one has to compute $\mathrm{proj}_{\tilde S}$ and $\mathrm{proj}_{\Pi_k}$. Again, both projections can be easily computed. In

particular, let $U\mathrm{Diag}(\lambda)U^\top$ be an eigenvalue decomposition of $W \in \mathcal{S}^n$. Then an element $Y \in \mathrm{proj}_{\Pi_k}(W)$ can be computed as $Y = U\mathrm{Diag}(\max\{\tilde H_k(\lambda), 0\})U^\top$.

Example 5 (Simultaneously sparse and low rank matrix optimization problem) The following problem was considered in [23]:
$$\min_X\ f(X) + \gamma\|\mathrm{vec}(X)\|_1 + \tau\|X\|_*,$$
where $f$ is as in (1), and $\gamma$ and $\tau$ are positive numbers. This problem aims at finding solutions which are both sparse and low-rank, and can be applied to identifying clusters in social networks; see [23, Section 6.2]. This model relaxes and penalizes the sparsity index $\|\mathrm{vec}(X)\|_0$ and the low-rank index $\mathrm{rank}(X)$ by two convex functions $\|\mathrm{vec}(X)\|_1$ and $\|X\|_*$, respectively. Here, we consider the following variant that explicitly incorporates the sparsity and rank constraints:
$$\min_X\ f(X) \quad \text{subject to} \quad \|\mathrm{vec}(X)\|_0 \le s,\ \mathrm{rank}(X) \le k. \qquad (39)$$
Suppose that (39) has a solution $X^*$, and let $\tau \ge \max\{\max_{i,j}|X^*_{ij}|, \sigma_{\max}(X^*)\}$. Define $S := \{X : \|\mathrm{vec}(X)\|_0 \le s\}$, $\tilde S := \{X \in S : \max_{i,j}|X_{ij}| \le \tau\}$ and $\tilde\Xi_k := \{X \in \Xi_k : \sigma_{\max}(X) \le \tau\}$. Then (39) can be rewritten in the form of (1) (with the same optimal value) in the following two ways:
$$\min_X\ f(X) + \underbrace{\delta_S(X)}_{P_1(X)} + \underbrace{\delta_{\tilde\Xi_k}(X)}_{P_0(X)}, \qquad (40)$$
$$\min_X\ f(X) + \underbrace{\delta_{\tilde S}(X)}_{P_0(X)} + \underbrace{\delta_{\Xi_k}(X)}_{P_1(X)}. \qquad (41)$$
Note that in both cases, $f + P_0$ is level-bounded and hence the SDCAM in Section 3 can be applied. When the NPG$_{\rm major}$ as described in Theorem 1 is applied to the corresponding subproblems, one has to compute $\mathrm{proj}_S$ and $\mathrm{proj}_{\tilde\Xi_k}$ for (40), and $\mathrm{proj}_{\tilde S}$ and $\mathrm{proj}_{\Xi_k}$ for (41). All these projections can be computed efficiently; see Examples 1 and 3.

5 Numerical experiments

In this section, we apply our SDCAM in Section 3, with subproblems solved by NPG$_{\rm major}$ as described in Theorem 1, to an instance of Example 2 and an instance of Example 5: the nonconvex fused regularized problem and the simultaneously sparse and low rank matrix optimization problem. All numerical experiments are performed in Matlab R2016a on a 64-bit PC with an Intel(R) Core(TM) i CPU (3.4GHz) and 32GB of RAM.

5.1 Nonconvex fused regularized problem: comparison against a solution method based on smoothing

We consider the following special instance of the nonconvex fused regularized problem:
$$\min_x\ \tfrac12\|x - b\|^2 + c_1\|x\|_1 + c_2\|Dx\|_p^p, \qquad (42)$$
where $c_1 > 0$, $c_2 > 0$, $p = 0.5$, $Dx = (x_2 - x_1, \ldots, x_n - x_{n-1})$, and $b \in \mathbb{R}^n$ is the noisy measurement of a piecewise constant sparse signal. Notice that the function $x \mapsto \|x\|_1$ is level-bounded. We can therefore directly apply SDCAM as described in Example 2 and solve the subproblems by NPG$_{\rm major}$. On the other hand, a commonly used technique for handling optimization problems involving $\ell_p$ penalty functions ($0 < p < 1$) is smoothing. Thus, in our experiments below, we compare SDCAM with a method based on smoothing, the smoothing nonmonotone proximal gradient method (sNPG), for solving (42). In sNPG, we solve the following sequence of subproblems approximately by NPG (this is NPG$_{\rm major}$ applied to (44) when $g = 0$):
$$\min_x\ \underbrace{\tfrac12\|x - b\|^2 + c_2\sum_{i=1}^{n-1}\left((Dx)_i^2 + \lambda_t^2\right)^{p/2}}_{f_t(x)} + \underbrace{c_1\|x\|_1}_{Q(x)},$$
where $\lambda_t \downarrow 0$ is the smoothing parameter. The approximate stationary point of $f_t + Q$ obtained is then used as initialization for minimizing $f_{t+1} + Q$.

Data generation: We first randomly generate a piecewise constant signal $x \in \mathbb{R}^n$ using the following Matlab code:

    % r (the number of constant segments) is not defined in this snippet
    J = randperm(10); I = sort(J(1:6), 'ascend');
    x = zeros(n,1);
    for i = 1:r
        if randn > 0
            x(n*I(i)/10 - 3*n/50 - randi(3) : n*I(i)/10) = randi(3);
        else
            x(n*I(i)/10 - 3*n/50 - randi(3) : n*I(i)/10) = -randi(3);
        end
    end

Then we let $b = x + \sigma\xi$, where $\sigma > 0$ is a noise factor and $\xi$ has i.i.d. standard Gaussian entries. In our experiments, motivated by [2], we choose $c_1 = c_2 = \sigma n/40$. We shall see in Figure 1 that this choice leads to reasonable recovery results. We also set $\sigma = 0.1$ and $n = 2000, 4000, 6000, 8000, 10000$.

Parameter setting: In SDCAM, we set $\lambda_t = 1/10^{t+1}$ and $x^{\rm feas}$ to be the vector of all ones.
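To make the smoothing used by sNPG concrete, the sketch below evaluates the objective of (42) and its smoothed counterpart $f_t + Q$ for a given smoothing parameter. The function names are hypothetical. Because $(a + b)^{q} \le a^{q} + b^{q}$ for $q = p/2 \in (0, 1]$ and $a, b \ge 0$, the smoothed value lies between the original value and the original value plus $c_2 (n-1) \lambda^{p}$, which the final lines print.

```python
def differences(x):
    """(Dx)_i = x_{i+1} - x_i."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def fused_lp_objective(x, b, c1, c2, p=0.5):
    """0.5*||x - b||^2 + c1*||x||_1 + c2*sum_i |(Dx)_i|^p, as in (42)."""
    fit = 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
    l1 = sum(abs(v) for v in x)
    lp = sum(abs(d) ** p for d in differences(x))
    return fit + c1 * l1 + c2 * lp


def smoothed_objective(x, b, c1, c2, lam, p=0.5):
    """Same objective with |(Dx)_i|^p replaced by ((Dx)_i^2 + lam^2)^(p/2)."""
    fit = 0.5 * sum((xi - bi) ** 2 for xi, bi in zip(x, b))
    l1 = sum(abs(v) for v in x)
    smooth = sum((d * d + lam * lam) ** (p / 2.0) for d in differences(x))
    return fit + c1 * l1 + c2 * smooth


x = [0.0, 0.0, 1.0, 1.0]
b = [0.1, -0.1, 0.9, 1.2]
F = fused_lp_objective(x, b, 0.5, 0.5)
for lam in (1e-1, 1e-3):
    Fs = smoothed_objective(x, b, 0.5, 0.5, lam)
    print(lam, F, Fs, F + 0.5 * (len(x) - 1) * lam ** 0.5)
```

As the smoothing parameter decreases, the smoothed value approaches the original one from above.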
In the NPG$_{\rm major}$ for solving the subproblems, we set $M = 4$, $L_{\max} = 10^{8}$, $L_{\min} = 10^{-8}$, $\tau = 2$, $c = 10^{-4}$, $L^0_{t,0} = 1$ and, for $l \ge 1$,
$$L^0_{t,l} = \max\left\{\min\left\{\frac{(s^l)^\top y^l}{\|s^l\|^2}, L_{\max}\right\}, L_{\min}\right\}$$
(which is the inverse of the so-called Barzilai-Borwein stepsize), where $s^l = x^{t,l} - x^{t,l-1}$ and $y^l = \nabla h(x^{t,l}) - \nabla h(x^{t,l-1})$. We initialize NPG$_{\rm major}$ at $x^{\rm feas}$ and terminate it when the maximum number of iterations exceeds 10000 or
$$\frac{\|x^{t,l} - x^{t,l-1}\|}{\max(\|x^{t,l}\|, 1)} < \frac{\epsilon_t}{L_{t,l}} \quad \text{or} \quad \frac{|F_{\lambda_t}(x^{t,l}) - F_{\lambda_t}(x^{t,l-1})|}{\max\{1, |F_{\lambda_t}(x^{t,l})|\}} < 10^{-12},$$
where $\epsilon_0 = 10^{-5}$ and $\epsilon_t = \max\{\epsilon_{t-1}/1.5, 10^{-6}\}$. On the other hand, in sNPG, we also let $\lambda_t = 1/10^{t+1}$ and solve the subproblems using NPG (i.e., NPG$_{\rm major}$ applied to (44) with $g = 0$) with the same setting as described above, except that the $F_{\lambda_t}$ above is replaced by $f_t + Q$ and, for $l \ge 1$,
$$L^0_{t,l} = \begin{cases} \max\left\{\min\left\{\dfrac{(s^l)^\top y^l}{\|s^l\|^2}, L_{\max}\right\}, L_{\min}\right\} & \text{if } (s^l)^\top y^l > 10^{-12},\\[2mm] \max\left\{\min\left\{L_{t,l-1}/2, L_{\max}\right\}, L_{\min}\right\} & \text{otherwise}.\end{cases}$$
Finally, we terminate SDCAM when $\lambda_t < 10^{-9}$. For a fair comparison, we consider two different termination criteria for sNPG: $\lambda_t < 10^{-7}$ (sNPG$_7$) and $\lambda_t < 10^{-8}$ (sNPG$_8$).

Numerical results: In Table 1, we compare SDCAM, sNPG$_7$ and sNPG$_8$ in terms of the number of iterations (iter), CPU time (CPU) and the terminating function values (fval), averaged over 10 randomly generated instances. One can see that the terminating function values are comparable, and that SDCAM is in general faster than sNPG$_8$ and slower than sNPG$_7$. Moreover, SDCAM slightly outperforms both sNPG variants in terms of function values when the dimension is relatively low ($\le 4000$). To illustrate the ability to recover the original signal, we also plot the original signal, the noisy measurement $b$ and the signals recovered by SDCAM and sNPG$_8$ for a random instance with $n = 2000$ in Figure 1.

Table 1: Results for SDCAM, sNPG$_7$ and sNPG$_8$ for solving (42). Columns: $n$; iter, CPU and fval, each reported for SDCAM, sNPG$_7$ and sNPG$_8$.

To illustrate intuitively the approximation used in our SDCAM and sNPG, we plot the function $f(x) = |x|^{1/2}$ (in dashed lines), its Moreau envelope and its smoothing function in Figure 2.
One can see that the envelope smooths the original nonsmooth point by a quadratic function. It is a lower approximation of $f$, while the smoothing function is an upper approximation of $f$.
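The ordering described above can be reproduced numerically. The following is a rough sketch in which the Moreau envelope of $f(x) = |x|^{1/2}$ is approximated by a grid search over $y$, and the smoothing function is assumed to be $(x^2 + \lambda^2)^{1/4}$, matching the sNPG smoothing with $p = 1/2$; the grid radius and resolution are arbitrary illustrative choices.

```python
def f(x):
    """The nonsmooth function |x|^(1/2)."""
    return abs(x) ** 0.5


def moreau_envelope(x, lam, radius=4.0, steps=8001):
    """Grid approximation of inf_y { (x - y)^2 / (2*lam) + f(y) }.

    The candidate y = x is always included, so the returned value never
    exceeds f(x); the grid also contains y = 0 exactly.
    """
    best = f(x)
    for j in range(steps):
        y = -radius + 2.0 * radius * j / (steps - 1)
        best = min(best, (x - y) ** 2 / (2.0 * lam) + f(y))
    return best


def smoothing(x, lam):
    """Assumed smoothing function (x^2 + lam^2)^(1/4)."""
    return (x * x + lam * lam) ** 0.25


lam = 0.1
for x in (-1.0, -0.3, 0.0, 0.2, 1.0):
    print(x, moreau_envelope(x, lam), f(x), smoothing(x, lam))
```

Near the origin the candidate $y = 0$ gives the value $x^2/(2\lambda)$, which is the quadratic piece visible in the envelope.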

Fig. 1: Recovery comparison for noisy signal.

Fig. 2: $|x|^{1/2}$ with its Moreau envelope and smoothing function.

5.2 Simultaneously sparse and low rank matrix optimization problem: which constraint should be modeled by $P_1$?

We consider the following special instance of the simultaneously sparse and low rank matrix optimization problem:
$$\min_X\ \tfrac12\|X - M\|_F^2 \quad \text{subject to} \quad \|\mathrm{vec}(X)\|_0 \le s,\ \mathrm{rank}(X) \le k, \qquad (43)$$
where $M \in \mathbb{R}^{m\times n}$ is a given noisy matrix, and $s$ and $k$ are positive integers. Note that $f(X) := \tfrac12\|X - M\|_F^2$ is level-bounded. Therefore, (43) has at least one solution. Then, as discussed in Example 5, we can apply SDCAM to solving (43) in two different ways by considering, respectively, the two formulations in (40) and (41) (see footnote 6): the indicator function $\delta_{\|\cdot\|_0 \le s}(\cdot)$ is approximated by its Moreau envelope in (40), and the function $\delta_{\mathrm{rank}(\cdot) \le k}(\cdot)$ is approximated by its Moreau envelope in (41). We call the method based on (40) SDCAM$_r$ and the method based on (41) SDCAM$_s$. In the following experiments, we compare these two methods.

Data generation: We first randomly generate $M_1 \in \mathbb{R}^{m\times k}$ and $M_2 \in \mathbb{R}^{k\times n}$ to have i.i.d. standard Gaussian entries. Then we set $m/10$ random rows of $M_1$ to zero and let $M$ be $M_1 M_2$ plus $\sigma$ times a noise matrix with i.i.d. standard Gaussian entries, where $\sigma > 0$ is a noise factor. We fix $n = 500$, $k = 10$ and $s = mn/10$, and we experiment with $\sigma = 0.005, 0.01, 0.02$ and $m = 1000, 2000, 3000$ below.

Footnote 6: We would like to point out that we are indeed using $\Xi_k$ in place of $\tilde\Xi_k$ in (40) and using $S$ in place of $\tilde S$ in (41) in our experiments below. Notice that A3 is still satisfied because $f$ is level-bounded.
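The data generation just described can be sketched in a few lines. This is an illustrative Python version using only the standard library, not the Matlab script used for the reported experiments; the function name `generate_instance`, the seed argument and the list-of-lists storage are assumptions made for the example.

```python
import random


def generate_instance(m, n, k, sigma, seed=0):
    """Return (M, zero_rows): M = M1*M2 + sigma*noise with Gaussian factors,
    after zeroing m // 10 randomly chosen rows of M1."""
    rng = random.Random(seed)
    M1 = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(m)]
    M2 = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]

    # Zero out m/10 random rows of M1, which makes the same rows of M1*M2 vanish.
    zero_rows = set(rng.sample(range(m), m // 10))
    for i in zero_rows:
        M1[i] = [0.0] * k

    # M = M1 * M2 + sigma * (i.i.d. standard Gaussian noise).
    M = [[sum(M1[i][l] * M2[l][j] for l in range(k)) + sigma * rng.gauss(0.0, 1.0)
          for j in range(n)]
         for i in range(m)]
    return M, zero_rows


# Small sizes for illustration; the experiments use n = 500 and k = 10.
M, zero_rows = generate_instance(m=20, n=6, k=2, sigma=0.0)
print(len(M), len(M[0]), sorted(zero_rows))
```

With the noise factor set to zero, the rows indexed by `zero_rows` are exactly zero, so the noiseless matrix is both row-sparse and of rank at most $k$.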

Parameter setting: In both SDCAM$_r$ and SDCAM$_s$, we set $\lambda_t = 1/10^{t+1}$ and $X^{\rm feas} = 0$. In the NPG$_{\rm major}$ for solving the subproblems, we use the same parameter setting as in Section 5.1. We initialize both algorithms at $X^{\rm feas}$ and terminate them when $\mathrm{dist}(X^t, S) \le 10^{-6}\|X^t\|_F$ and $\mathrm{dist}(X^t, \Xi_k) \le 10^{-6}\|X^t\|_F$, respectively.

Numerical results: In Table 2, we compare SDCAM$_r$ and SDCAM$_s$ in terms of the number of iterations (iter), CPU time (CPU) and the feasibility violation (vio) (i.e., $\mathrm{dist}(X^t, S)$ and $\mathrm{dist}(X^t, \Xi_k)$, respectively) at termination, averaged over 10 randomly generated instances. One can see that SDCAM$_r$ takes fewer iterations and less time. An intuitive explanation could be that the rank constraint is more complicated to approximate via subgradients than the sparsity constraint. Thus, the algorithm SDCAM$_r$, which keeps all its iterates within the rank constraint and only approximately satisfies the sparsity constraint as the algorithm progresses, ends up converging more quickly.

Table 2: Comparison of SDCAM$_r$ and SDCAM$_s$ for solving (43). Columns: $\sigma$; $m$; iter, CPU and vio, each reported for SDCAM$_r$ and SDCAM$_s$.

6 Conclusions

In this paper, we propose a successive difference-of-convex approximation method for solving (1). The key idea of this method is to approximate the nonsmooth functions in the objective of (1) by their Moreau envelopes. The approximation function can then be minimized by various proximal gradient methods with majorization techniques such as NPG$_{\rm major}$ in the appendix, thanks to (6). We prove that the sequence generated by our method is bounded and any accumulation point is a stationary point of (1) under suitable conditions. We also discuss how to apply our method to concrete applications and conduct numerical experiments to illustrate its efficiency.
