Sparse Regularization via Convex Analysis

IEEE Transactions on Signal Processing, vol. 65, no. 17, September 2017 (preprint)

Sparse Regularization via Convex Analysis

Ivan Selesnick

Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, New York, USA (selesi@nyu.edu). This work was supported by NSF under grant CCF-1525398 and by ONR. Supplemental software (Matlab and Python) is available from the author or online.

Abstract: Sparse approximate solutions to linear equations are classically obtained via ℓ1 norm regularized least squares, but this method often underestimates the true solution. As an alternative to the ℓ1 norm, this paper proposes a class of non-convex penalty functions that maintain the convexity of the least squares cost function to be minimized, and avoid the systematic underestimation characteristic of ℓ1 norm regularization. The proposed penalty function is a multivariate generalization of the minimax-concave (MC) penalty. It is defined in terms of a new multivariate generalization of the Huber function, which in turn is defined via infimal convolution. The proposed sparse-regularized least squares cost function can be minimized by proximal algorithms comprising simple computations.

I. INTRODUCTION

Numerous signal and image processing techniques build upon sparse approximation [59]. A sparse approximate solution to a system of linear equations (y = Ax) can often be obtained via convex optimization. The usual technique is to minimize the regularized linear least squares cost function J: R^N → R,

    J(x) = (1/2) ||y − Ax||_2^2 + λ ||x||_1,   λ > 0.   (1)

The ℓ1 norm is classically used as a regularizer here, since among convex regularizers it induces sparsity most effectively [9]. But this formulation tends to underestimate high-amplitude components of x ∈ R^N. Non-convex sparsity-inducing regularizers are also widely used (leading to more accurate estimation of high-amplitude components), but then the cost function is generally non-convex and has extraneous suboptimal local minimizers [4].

This paper proposes a class of non-convex penalties for sparse-regularized linear least squares that generalizes the ℓ1 norm and maintains the convexity of the least squares cost function to be minimized. That is, we consider the cost function F: R^N → R,

    F(x) = (1/2) ||y − Ax||_2^2 + λ ψ_B(x),   λ > 0,   (2)

and we propose a new non-convex penalty ψ_B: R^N → R that makes F convex. The penalty ψ_B is parameterized by a matrix B, and the convexity of F depends on B being suitably prescribed. In fact, the choice of B will depend on A. The matrix (linear operator) A may be arbitrary (i.e., injective, surjective, both, or neither). In contrast to the ℓ1 norm, the new approach does not systematically underestimate high-amplitude components of sparse vectors. Since the proposed formulation is convex, the cost function has no suboptimal local minimizers.

The new class of non-convex penalties is defined using tools of convex analysis. In particular, infimal convolution is used to define a new multivariate generalization of the Huber function. In turn, the generalized Huber function is used to define the proposed non-convex penalty, which can be considered a multivariate generalization of the minimax-concave (MC) penalty. Even though the generalized MC (GMC) penalty is non-convex, it is easy to prescribe this penalty so as to maintain the convexity of the cost function to be minimized. The proposed convex cost functions can be minimized using proximal algorithms, comprising simple computations.
In particular, the minimization problem can be cast as a kind of saddle-point problem for which the forward-backward splitting algorithm is applicable. The main computational steps of the algorithm are the operators A, A T, and soft thresholding. The implementation is thus matrix-free in that it involves the operators A and A T, but does not access or modify the entries of A. Hence, the algorithm can leverage efficient implementations of A and its transpose. We remark that while the proposed GMC penalty is nonseparable, we do not advocate non-separability in and of itself as a desirable property of a sparsity-inducing penalty. But in many cases (depending on A), non-separability is simply a requirement of a non-convex penalty designed so as to maintain convexity of the cost function F to be minimized. If A T A is singular (and none of its eigenvectors are standard basis vectors), then a separable penalty that maintains the convexity of the cost function F must, in fact, be a convex penalty [55]. This leads us back to the l norm. Thus, to improve upon the l norm, the penalty must be non-separable. This paper is organized as follows. Section II sets notation and recalls definitions of convex analysis. Section III recalls the (scalar) Huber function, the (scalar) MC penalty, and how they arise in the formulation of threshold functions (instances of proximity operators). The subsequent sections generalize these concepts to the multivariate case. In Section IV, we define a multivariate version of the Huber function. In Section V, we define a multivariate version of the MC penalty. In Section VI, we show how to set the GMC penalty to maintain convexity of the least squares cost function. Section VII presents a proximal algorithm to minimize this type of cost function. Section VIII presents examples wherein the GMC penalty is used for signal denoising and approximation. Elements of this work were presented in Ref. [49]. A. Related work Many prior works have proposed non-convex penalties that strongly promote sparsity or describe algorithms for solving

2 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 7, PP , SEPTEMBER 7 (PREPRINT) the sparse-regularized linear least squares problem, e.g., [], [], [5], [6], [9], [5], [], [], [8], [9], [4], [47], [58], [64], [66]. However, most of these papers (i) use separable (additive) penalties or (ii) do not seek to maintain convexity of the cost function. Non-separable non-convex penalties are proposed in Refs. [6], [6], but they are not designed to maintain cost function convexity. The development of convexity-preserving non-convex penalties was pioneered by Blake, Zisserman, and Nikolova [7], [4] [44], and further developed in [6], [7], [], [], [6], [7], [45], [54], [56]. But these are separable penalties, and as such they are fundamentally limited. Specifically, if A T A is singular, then a separable penalty constrained to maintain cost function convexity can only improve on the l norm to a very limited extent [55]. Non-convex regularization that maintains cost function convexity was used in [5] in an iterative manner for non-convex optimization, to reduce the likelihood that an algorithm converges to suboptimal local minima. To overcome the fundamental limitation of separable nonconvex penalties, we proposed a bivariate non-separable nonconvex penalty that maintains the convexity of the cost function to be minimized [55]. But that penalty is useful for only a narrow class of linear inverse problems. To handle more general problems, we subsequently proposed a multivariate penalty formed by subtracting from the l norm a function comprising the composition of a linear operator and a separable nonlinear function [5]. Technically, this type of multivariate penalty is non-separable, but it still constitutes a rather narrow class of non-separable functions. Convex analysis tools (especially the Moreau envelope and the Fenchel conjugate) have recently been used in novel ways for sparse regularized least squares [], [57]. Among other aims, these papers seek the convex envelope of the l pseudonorm regularized least squares cost function, and derive alternate cost functions that share the same global minimizers but have fewer local minima. In these approaches, algorithms are less likely to converge to suboptimal local minima (the global minimizer might still be difficult to calculate). For the special case where A T A is diagonal, the proposed GMC penalty is closely related to the continuous exact l (CEL) penalty introduced in [57]. In [57] it is observed that if A T A is diagonal, then the global minimizers of the l regularized problem coincides with that of a convex function defined using the CEL penalty. Although the diagonal case is simpler than the non-diagonal case (a non-convex penalty can be readily constructed to maintain cost function convexity [54]), the connection to the l problem is enlightening. In other related work, we use convex analysis concepts (specifically, the Moreau envelope) for the problem of total variation (TV) denoising [5]. In particular, we prescribe a non-convex TV penalty that preserves the convexity of the TV denoising cost function to be minimized. The approach of Ref. [5] generalizes standard TV denoising so as to more accurately estimate jump discontinuities. II. NOTATION The l, l, and l norms of x R N are defined x = n x n, x = ( n x n ) /, and x = max n x n. If x Fig.. The Huber function. A R M N, then component n of Ax is denoted [Ax] n. If the matrix A B is positive semidefinite, we write B A. 
The matrix 2-norm of a matrix A is denoted ||A||_2; its value is the square root of the maximum eigenvalue of A^T A. We have ||Ax||_2 ≤ ||A||_2 ||x||_2 for all x ∈ R^N. If A has full row-rank (i.e., AA^T is invertible), then the pseudo-inverse of A is given by A^+ := A^T (AA^T)^(-1). We denote the transpose of the pseudo-inverse of A as A^{+T}, i.e., A^{+T} := (A^+)^T. If A has full row-rank, then A^{+T} = (AA^T)^(-1) A.

This work uses definitions and notation of convex analysis [4]. The infimal convolution of two functions f and g from R^N to R ∪ {+∞} is given by

    (f □ g)(x) = inf_{v ∈ R^N} { f(v) + g(x − v) }.   (3)

The Moreau envelope of the function f: R^N → R is given by

    f^M(x) = inf_{v ∈ R^N} { f(v) + (1/2) ||x − v||_2^2 }.   (4)

In the notation of infimal convolution, we have

    f^M = f □ (1/2)||·||_2^2.   (5)

The set of proper lower semicontinuous (lsc) convex functions from R^N to R ∪ {+∞} is denoted Γ_0(R^N). If the function f is defined as the composition f(x) = h(g(x)), then we write f = h ∘ g.

The soft threshold function soft: R → R with threshold parameter λ > 0 is defined as

    soft(y; λ) := 0,                    |y| ≤ λ
                  (|y| − λ) sign(y),    |y| ≥ λ.   (6)

III. SCALAR PENALTIES

We recall the definition of the Huber function [].

Definition 1. The Huber function s: R → R is defined as

    s(x) := (1/2) x^2,    |x| ≤ 1
            |x| − 1/2,    |x| ≥ 1,   (7)

as illustrated in Fig. 1.

Proposition 1. The Huber function can be written as

    s(x) = min_{v ∈ R} { |v| + (1/2)(x − v)^2 }.   (8)
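For illustration, the following minimal Python/NumPy sketch implements the soft threshold (6) and the Huber function (7), and checks Proposition 1 numerically by minimizing the expression in (8) over a dense grid of candidate v values. The function names and the grid-based check are choices made here for illustration, not part of the paper or its supplemental software.

```python
import numpy as np

def soft(y, lam):
    """Soft threshold (6), applied elementwise."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def huber(x):
    """Huber function (7), applied elementwise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 0.5 * x**2, np.abs(x) - 0.5)

# Numerical check of Proposition 1: s(x) = min_v { |v| + (1/2)(x - v)^2 }.
x = 1.7
v = np.linspace(-5.0, 5.0, 100001)              # dense grid of candidate v values
s_via_min = np.min(np.abs(v) + 0.5 * (x - v)**2)
print(huber(x), s_via_min)                      # both approximately 1.2
```

The minimum in the last computation is attained at v = x − 1, consistent with the discussion following Proposition 1.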

In the notation of infimal convolution, we have equivalently

    s = |·| □ (1/2)(·)^2.   (9)

And in the notation of the Moreau envelope, we have equivalently s = |·|^M. The Huber function is a standard example of the Moreau envelope. For example, see Ref. [46] and [].

We note here that, given x ∈ R, the minimum in (8) is achieved for v equal to 0, x − 1, or x + 1, i.e.,

    s(x) = min_{v ∈ {0, x−1, x+1}} { |v| + (1/2)(x − v)^2 }.   (10)

Consequently, the Huber function can be expressed as

    s(x) = min { (1/2) x^2,  |x − 1| + 1/2,  |x + 1| + 1/2 },   (11)

as illustrated in Fig. 2.

Fig. 2. The Huber function as the pointwise minimum of three functions.

We now consider the scalar penalty function illustrated in Fig. 3. This is the minimax-concave (MC) penalty [65]; see also [5], [6], [8].

Definition 2. The minimax-concave (MC) penalty function φ: R → R is defined as

    φ(x) := |x| − (1/2) x^2,    |x| ≤ 1
            1/2,                |x| ≥ 1,   (12)

as illustrated in Fig. 3.

Fig. 3. The MC penalty function.

The MC penalty can be expressed as

    φ(x) = |x| − s(x)   (13)

where s is the Huber function. This representation of the MC penalty will be used in Sec. V to generalize the MC penalty to the multivariate case.

A. Scaled functions

It will be convenient to define scaled versions of the Huber function and MC penalty.

Definition 3. Let b ∈ R. The scaled Huber function s_b: R → R is defined as

    s_b(x) := s(b^2 x) / b^2,   b ≠ 0.   (14)

For b = 0, the function is defined as

    s_0(x) := 0.   (15)

Hence, for b ≠ 0, the scaled Huber function is given by

    s_b(x) = (b^2/2) x^2,       |x| ≤ 1/b^2
             |x| − 1/(2 b^2),   |x| ≥ 1/b^2.   (16)

The scaled Huber function s_b is shown in Fig. 4 for several values of the scaling parameter b. Note that

    s_b(x) ≤ |x|,   x ∈ R,   (17)

    lim_{b → ∞} s_b(x) = |x|,   (18)

and

    lim_{b → 0} s_b(x) = 0.   (19)

Incidentally, we use b^2 in definition (14) rather than b, so as to parallel the generalized Huber function to be defined in Sec. IV.

Proposition 2. Let b ∈ R. The scaled Huber function can be written as

    s_b(x) = min_{v ∈ R} { |v| + (b^2/2)(x − v)^2 }.   (20)

In terms of infimal convolution, we have equivalently

    s_b = |·| □ (b^2/2)(·)^2.   (21)

Proof. For b ≠ 0, we have from (14) that

    s_b(x) = min_{v ∈ R} { |v| + (1/2)(b^2 x − v)^2 } / b^2
           = min_{v ∈ R} { |b^2 v| + (1/2)(b^2 x − b^2 v)^2 } / b^2
           = min_{v ∈ R} { |v| + (b^2/2)(x − v)^2 }.

For b = 0, the right-hand side of (20) equals 0 = s_0(x), so (20) holds for b = 0 as well.

Definition 4. Let b ∈ R. The scaled MC penalty function φ_b: R → R is defined as

    φ_b(x) := |x| − s_b(x)   (22)

where s_b is the scaled Huber function.

The scaled MC penalty φ_b is shown in Fig. 4 for several values of b. Note that φ_0(x) = |x|. For b ≠ 0,

    φ_b(x) = |x| − (b^2/2) x^2,   |x| ≤ 1/b^2
             1/(2 b^2),           |x| ≥ 1/b^2.   (23)
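The scaled functions are equally direct to evaluate. The following sketch (again an illustration with hypothetical function names, not the paper's supplemental code) implements the scaled Huber function (16) and the scaled MC penalty (23), treating b = 0 as the special cases s_0 = 0 and φ_0(x) = |x|.

```python
import numpy as np

def huber_scaled(x, b):
    """Scaled Huber function s_b, eqs. (14)-(16)."""
    x = np.asarray(x, dtype=float)
    if b == 0:
        return np.zeros_like(x)                      # s_0(x) := 0, eq. (15)
    return np.where(np.abs(x) <= 1.0 / b**2,
                    0.5 * b**2 * x**2,
                    np.abs(x) - 0.5 / b**2)

def mc_scaled(x, b):
    """Scaled MC penalty phi_b(x) = |x| - s_b(x), eqs. (22)-(23)."""
    x = np.asarray(x, dtype=float)
    return np.abs(x) - huber_scaled(x, b)

x = np.linspace(-3, 3, 7)
print(mc_scaled(x, 0.0))    # reduces to |x| when b = 0
print(mc_scaled(x, 1.0))    # saturates at 1/(2 b^2) = 0.5 for |x| >= 1
```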

Fig. 4. Scaled Huber function and scaled MC penalty for several values of the scaling parameter b.

B. Convexity condition

In the scalar case, the MC penalty corresponds to a type of threshold function. Specifically, the firm threshold function is the proximity operator of the MC penalty, provided that a particular convexity condition is satisfied. Here, we give the convexity condition for the scalar case. We will generalize this condition to the multivariate case in Sec. VI.

Proposition 3. Let λ > 0 and a ∈ R. Define f: R → R,

    f(x) = (1/2)(y − a x)^2 + λ φ_b(x)   (24)

where φ_b is the scaled MC penalty (22). If

    b^2 ≤ a^2 / λ,   (25)

then f is convex.

There are several ways to prove Proposition 3. In anticipation of the multivariate case, we use a technique in the following proof that we later use in the proof of Theorem 1 in Sec. VI.

Proof of Proposition 3. Using (22), we write f as

    f(x) = (1/2)(y − a x)^2 + λ |x| − λ s_b(x) = g(x) + λ |x|

where s_b is the scaled Huber function and g: R → R is given by

    g(x) = (1/2)(y − a x)^2 − λ s_b(x).   (26)

Since the sum of two convex functions is convex, it is sufficient to show g is convex. Using (20), we have

    g(x) = (1/2)(y − a x)^2 − λ min_{v ∈ R} { |v| + (b^2/2)(x − v)^2 }
         = max_{v ∈ R} { (1/2)(y − a x)^2 − λ |v| − (λ b^2/2)(x − v)^2 }
         = (1/2)(a^2 − λ b^2) x^2 + max_{v ∈ R} { (1/2)(y^2 − 2 a x y) − λ |v| − (λ b^2/2)(v^2 − 2 x v) }.

Note that the expression in the curly braces is affine (hence convex) in x. Since the pointwise maximum of a set of convex functions is itself convex, the second term is convex in x. Hence, g is convex if a^2 ≥ λ b^2.

The firm threshold function was defined by Gao and Bruce [9] as a generalization of hard and soft thresholding.

Definition 5. Let λ > 0 and µ > λ. The threshold function firm: R → R is defined as

    firm(y; λ, µ) := 0,                              |y| ≤ λ
                     µ (|y| − λ)/(µ − λ) sign(y),    λ ≤ |y| ≤ µ
                     y,                              |y| ≥ µ,   (27)

as illustrated in Fig. 5.

Fig. 5. Firm threshold function.

In contrast to the soft threshold function, the firm threshold function does not underestimate large-amplitude values, since it equals the identity for large values of its argument. As µ → λ or µ → ∞, the firm threshold function approaches the hard or soft threshold function, respectively.

We now state the correspondence between the MC penalty and the firm threshold function. When f in (24) is convex (i.e., b^2 ≤ a^2/λ), the minimizer of f is given by firm thresholding. This is noted in Refs. [5], [8], [64], [65].

Proposition 4. Let λ > 0, a > 0, b > 0, and b^2 ≤ a^2/λ. Let y ∈ R. Then the minimizer of f in (24) is given by firm thresholding, i.e.,

    x_opt = firm(y/a; λ/a^2, 1/b^2).   (28)

Hence, the minimizer of the scalar function f in (24) is easily obtained via firm thresholding. However, the situation in the multivariate case is more complicated. The aim of this

5 5 paper is to generalize this process to the multivariate case: to define a multivariate MC penalty generalizing (), to define a regularized least squares cost function generalizing (4), to generalize the convexity condition (5), and to provide a method to calculate a minimizer. IV. GENERALIZED HUBER FUNCTION In this section, we introduce a multivariate generalization of the Huber function. The basic idea is to generalize () which expresses the scalar Huber function as an infimal convolution. Definition 6. Let B R M N. We define the generalized Huber function S B : R N R as S B (x) := inf v + v R N B(x } v). (9) In the notation of infimal convolution, we have S B = B. () Proposition 5. The generalized Huber function S B is a proper lower semicontinuous convex function, and the infimal convolution is exact, i.e., S B (x) = min v + v R N B(x } v). () Proof. Set f = and g = B. Both f and g are convex; hence f g is convex by proposition. in [4]. Since f is coercive and g is bounded below, and f, g Γ (R N ), it follows that f g Γ (R N ) and the infimal convolution is exact (i.e., the infimum is achieved for some v) by Proposition.4 in [4]. Note that if C T C = B T B, then S B (x) = S C (x) for all x. That is, the generalized Huber function S B depends only on B T B, not on B itself. Therefore, without loss of generality, we may assume B has full row-rank. (If a given matrix B does not have full row-rank, then there is another matrix C with full row-rank such that C T C = B T B, yielding the same function S B.) As expected, the generalized Huber function reduces to the scalar Huber function. Proposition 6. If B is a scalar, i.e., B = b R, then the generalized Huber function reduces to the scalar Huber function, S b (x) = s b (x) for all x R. The generalized Huber function is separable (additive) when B T B is diagonal. Proposition 7. Let B R M N. If B T B is diagonal, then the generalized Huber function is separable (additive), comprising a sum of scalar Huber functions. Specifically, B T B = diag(α,..., α N) = S B (x) = n s αn (x n ). The utility of the generalized Huber function will be most apparent when B T B is a non-diagonal matrix. In this case, the generalized Huber function is non-separable, as illustrated in the following two examples. Example. For the matrix B =, () the generalized Huber function S B is shown in Fig. 6. As shown in the contour plot, the level sets of S B near the origin are ellipses. Example. For the matrix B = [.5] () the generalized Huber function S B is shown in Fig. 7. The level sets of S B are parallel lines because B is of rank. There is not a simple explicit formula for the generalized Huber function. But, using () we can derive several properties regarding the function. Proposition 8. Let B R M N. The generalized Huber function satisfies S B (x) x, x R N. (4) Proof. Using (), we have S B (x) = min v + v R N B(x } v) [ ] v + B(x v) = x. v=x Since S B is the minimum of a non-negative function, it also follows that S B (x) for all x. The following proposition accounts for the ellipses near the origin in the contour plot of the generalized Huber function in Fig. 6. (Further from the origin, the contours are not ellipsoidal.) Proposition 9. Let B R M N. The generalized Huber function satisfies S B (x) = Bx for all BT Bx. (5) Proof. From (), we have that S B (x) is the minimum value of g where g : R N R is given by Note that g(v) = v + B(x v). g() = Bx. Hence, it suffices to show that minimizes g if and only if B T Bx. 
Since g is convex, v = 0 minimizes g if and only if 0 ∈ ∂g(0), where ∂g is the subdifferential of g, given by

    ∂g(v) = sign(v) + B^T B(v − x)

where sign is the set-valued signum function,

    sign(t) := {1},       t > 0
               [−1, 1],   t = 0
               {−1},      t < 0.

6 6 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 7, PP , SEPTEMBER 7 (PREPRINT) Generalized Huber function S B (x) Generalized Huber function S B (x) x x x x Contours of S B Contours of S B Fig. 6. The generalized Huber function for the matrix B in (). It follows that minimizes g if and only if sign() B T Bx B T Bx [, ] N [ B T Bx ] [, ] for n =,..., N n B T Bx. Hence, the function S B coincides with B on a subset of its domain. Proposition. Let B R M N and set α = B. The generalized Huber function satisfies S B (x) S αi (x), x R N (6) = n s α (x n ). (7) Proof. Using (), we have S B (x) = min v + v R N B(x } v) min v + v R N B (x } v) = min v + v R N α (x v) } = min v + v R N α (x } v) = S αi (x). From Proposition 7 we have (7). Fig. 7. The generalized Huber function for the matrix B in (). The Moreau envelope is well studied in convex analysis [4]. Hence, it is useful to express the generalized Huber function S B in terms of a Moreau envelope, so we can draw on results in convex analysis to derive further properties of the generalized Huber function. Lemma. If B R N N is invertible, then the generalized Huber function S B can be expressed in terms of a Moreau envelope as S B = ( B ) M B. (8) Proof. Using (9), we have ( S B = ( ( B) = B)) B B ( ( = B ) ( ) ) B = ( B ) M B. Lemma. If B R M N has full row-rank, then the generalized Huber function S B can be expressed in terms of a Moreau envelope as S B = ( d B + ) M B (9) where d: R N R is the convex distance function d(x) = min x w (4) w null B

7 7 which represents the distance from the point x R N to the null space of B as measured by the l norm. Proof. Using (), we have S B (x) = min v + v R N B(x } v) = f(bx) where f : R M R is given by f(z) = min v R N v + z Bv = min u (null B) = min u (null B) } min u + w + w null B z B(u + } w) min u + w + w null B z } Bu } = min u (null B) d(u) + z Bu where d is the convex function given by (4). The fact that d is convex follows from Proposition 8.6 of [4] and Examples.6 and.7 of [8]. Since (null B) = range B T, f(z) = min d(u) + u range B T z } Bu = min d(b T v) + v R M z BBT v } = min d(b T (BB T ) v) + v R M z BBT (BB T ) v } = min d(b + v) + v R M z } v = ( d(b + ) ) M (z). Hence, S B (x) = ( d(b + ) ) M (Bx) which completes the proof. Note that (9) reduces to (8) when B is invertible. (Suppose B is invertible. Then null B = }; hence d(x) = x in (4). Additionally, B + = B.) Proposition. The generalized Huber function is differentiable. Proof. By Lemma, S B is the composition of a Moreau envelope of a convex function and a linear function. Additionally, by Proposition 5, S B Γ (R N ). By Proposition.9 in [4], it follows that S B is differentiable. The following result regards the gradient of the generalized Huber function. This result will be used in Sec. V to show the generalized MC penalty defined therein constitutes a valid penalty. Lemma. The gradient of the generalized Huber function S B : R N R satisfies S B (x) for all x R N. (4) Proof. Since S B is convex and differentiable, we have S B (v)+ [ S B (v) ] T (x v) SB (x), x R N, v R N. Using Proposition 8, it follows that S B (v) + [ S B (v) ] T (x v) x, x R N, v R N. Let x = (,...,, t,,..., ) where t is in position n. It follows that c(v) + [ S B (v) ] n t t, t R, v RN (4) where c(v) R does not depend on t. It follows from (4) that [ S B (v)] n. The generalized Huber function can be evaluated by taking the pointwise minimum of numerous simpler functions (comprising quadratics, absolute values, and linear functions). This generalizes the situation for the scalar Huber function, which can be evaluated as the pointwise minimum of three functions, as expressed in () and illustrated in Fig.. Unfortunately, evaluating the generalized Huber function on R N this way requires the evaluation of N simpler functions, which is not practical except for small N. In turn, the evaluation of the GMC penalty is also impractical. However, we do not need to explicitly evaluate these functions to utilize them for sparse regularization, as shown in Sec. VII. For this paper, we compute these functions on R only for the purpose of illustration (Figs. 6 and 7). V. GENERALIZED MC PENALTY In this section, we propose a multivariate generalization of the MC penalty (). The basic idea is to generalize () using the l norm and the generalized Huber function. Definition 7. Let B R M N. We define the generalized MC (GMC) penalty function ψ B : R N R as ψ B (x) := x S B (x) (4) where S B is the generalized Huber function (9). The GMC penalty reduces to a separable penalty when B T B is diagonal. Proposition. Let B R M N. If B T B is a diagonal matrix, then ψ B is separable (additive), comprising a sum of scalar MC penalties. Specifically, B T B = diag(α,..., α N) = ψ B (x) = n φ αn (x n ) where φ b is the scaled MC penalty (). If B T B =, then ψ B (x) = x. Proof. If B T B = diag(α,..., αn ), then by Proposition 7 we have ψ B (x) = x n = n s αn (x n ) x n s αn (x n ) which proves the result in light of definition (). 
The most interesting case (the case that motivates the GMC penalty) is the case where B^T B is a non-diagonal matrix. If B^T B is non-diagonal, then the GMC penalty is non-separable.

Example 3. For the two matrices B considered in the examples of Sec. IV, the GMC penalty is illustrated in Fig. 8 and Fig. 9, respectively.
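Since S_B has no simple explicit formula, surfaces and contours such as those in Figs. 6-9 can be produced by solving the inner minimization in the definition of S_B numerically. That inner problem, min_v ||v||_1 + (1/2)||B(x − v)||_2^2, is itself an ℓ1-regularized least squares problem, so for small examples a basic forward-backward (ISTA-type) loop suffices. The sketch below is one such illustration; the solver, step size, iteration count, and function names are choices made here, not specifications from the paper.

```python
import numpy as np

def soft(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def gen_huber(x, B, n_iter=500):
    """Generalized Huber function S_B(x) = min_v ||v||_1 + 0.5 ||B(x - v)||^2,
    evaluated by a forward-backward (ISTA-type) iteration on the inner problem."""
    BtB = B.T @ B
    mu = 1.0 / max(np.linalg.norm(BtB, 2), 1e-12)   # step size 1 / ||B^T B||_2
    v = np.zeros_like(x, dtype=float)
    for _ in range(n_iter):
        v = soft(v - mu * (BtB @ (v - x)), mu)
    return np.sum(np.abs(v)) + 0.5 * np.sum((B @ (x - v))**2)

def gmc_penalty(x, B, n_iter=500):
    """GMC penalty psi_B(x) = ||x||_1 - S_B(x)."""
    return np.sum(np.abs(x)) - gen_huber(x, B, n_iter)

B = np.array([[1.0, 0.5],
              [0.3, 1.2]])            # an arbitrary example; B^T B is non-diagonal
x = np.array([0.2, -1.5])
print(gen_huber(x, B), gmc_penalty(x, B))
# Sanity checks: 0 <= S_B(x) <= ||x||_1 (Proposition 8), hence 0 <= psi_B(x) <= ||x||_1.
```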

Fig. 8. The GMC penalty (surface and contours) for the matrix B in the first example of Sec. IV.

The following corollaries follow directly from Propositions 8 and 9.

Corollary 1. The generalized MC penalty satisfies

    ψ_B(x) ≤ ||x||_1   for all x ∈ R^N.   (44)

Corollary 2. Given B ∈ R^{M×N}, the generalized MC penalty satisfies

    ψ_B(x) = ||x||_1 − (1/2)||B x||_2^2   for all x such that ||B^T B x||_∞ ≤ 1.   (45)

The corollaries imply that around zero the generalized MC penalty approximates the ℓ1 norm (from below), i.e., ψ_B(x) ≈ ||x||_1 for x ≈ 0.

The generalized MC penalty has a basic property expected of a regularization function; namely, that large values are penalized more than (or the same as) small values. Specifically, if v, x ∈ R^N with |v_i| ≥ |x_i| and sign v_i = sign x_i for i = 1, ..., N, then ψ_B(v) ≥ ψ_B(x). That is, in any given quadrant, the function ψ_B(x) is a non-decreasing function of each |x_i|. This is formalized in the following proposition, and illustrated in Figs. 8 and 9. Basically, the gradient of ψ_B points away from the origin.

Proposition 13. Let x ∈ R^N with x_i ≠ 0. The generalized MC penalty ψ_B has the property that [∇ψ_B(x)]_i either has the same sign as x_i or is equal to zero.

Fig. 9. The GMC penalty (surface and contours) for the matrix B in the second example of Sec. IV.

Proof. Let x ∈ R^N with x_i ≠ 0. Then, from the definition of the GMC penalty,

    ∂ψ_B(x)/∂x_i = sign(x_i) − ∂S_B(x)/∂x_i.

By the gradient bound of Sec. IV, |∂S_B(x)/∂x_i| ≤ 1. Hence ∂ψ_B(x)/∂x_i ≥ 0 when x_i > 0, and ∂ψ_B(x)/∂x_i ≤ 0 when x_i < 0.

A penalty function not satisfying Proposition 13 would not be considered an effective sparsity-inducing regularizer.

VI. SPARSE REGULARIZATION

In this section, we consider how to set the GMC penalty to maintain the convexity of the regularized least squares cost function. To that end, condition (47) below generalizes the scalar convexity condition (25).

Theorem 1. Let y ∈ R^M, A ∈ R^{M×N}, and λ > 0. Define F: R^N → R as

    F(x) = (1/2)||y − A x||_2^2 + λ ψ_B(x)   (46)

where ψ_B: R^N → R is the generalized MC penalty defined in Sec. V. If

    B^T B ⪯ (1/λ) A^T A,   (47)

then F is a convex function.

9 9 Proof. Write F as F (x) = y Ax + λ ( x S B (x) ) = y Ax + λ x min λ v + λ v R N B(x } v) = max v R N y Ax + λ x λ v λ B(x } v) = max v R N xt( A T A λb T B ) x + λ x + g(x, v) } = xt( A T A λb T B ) x + λ x + max g(x, v) v R N where g is affine in x. The last term is convex as it is the pointwise maximum of a set of convex functions (Proposition 8.4 in [4]). Hence, F is convex if A T A λb T B is positive semidefinite. The convexity condition (47) is easily satisfied. Given A, we may simply set B = γ/λ A, γ. (48) Then B T B = (γ/λ)a T A which satisfies (47) when γ. The parameter γ controls the non-convexity of the penalty ψ B. If γ =, then B = and the penalty reduces to the l norm. If γ =, then (47) is satisfied with equality and the penalty is maximally non-convex. In practice, we use a nominal range of.5 γ.8. When A T A is diagonal, the proposed methodology reduces to element-wise firm thresholding. Proposition 4. Let y R M, A R M N, and λ >. If A T A is diagonal with positive diagonal entries and B is given by (48), then the minimizer of the cost function F in (46) is given by element-wise firm thresholding. Specifically, if then A T A = diag(α,..., α N ), (49) x opt n = firm([a T y] n /α n; λ/α n, λ/(γα n)) (5) when < γ, and when γ =. x opt n = soft([a T y] n /α n; λ/α n) (5) Proof. If A T A = diag(α,..., αn ), then y Ax = yt y x T A T y + xt A T Ax = yt y + ( xn [A T y] n + α nx n) n = n ( [A T y] n /α n α n x n ) + C where C does not depend on x. If B is given by (48), then B T B = (γ/λ) diag(α,..., α N). Using Proposition, we have ψ B (x) = n φ αn γ/λ (x n ). Hence, F in (46) is given by F (x) = [ ( [A T ) ] y] n /α n α n x n + λφαn γ/λ (x n ) +C n and so (5) follows from (8). VII. OPTIMIZATION ALGORITHM Even though the GMC penalty does not have a simple explicit formula, a global minimizer of the sparse-regularized cost function (46) can be readily calculated using proximal algorithms. It is not necessary to explicitly evaluate the GMC penalty or its gradient. To use proximal algorithms to minimize the cost function F in (46) when B satisfies (47), we rewrite it as a saddle-point problem: where (x opt, v opt ) = arg min max F (x, v) (5) x R N v R N F (x, v) = y Ax + λ x λ v λ B(x v) (5) If we use (48) with γ, then the saddle function is given by F (x, v) = y Ax + λ x λ v γ A(x v). (54) These saddle-point problems are instances of monotone inclusion problems. Hence, the solution can be obtained using the forward-backward (FB) algorithm for such a problems; see Theorem 5.8 of Ref. [4]. The FB algorithm involves only simple computational steps (soft-thresholding and the operators A and A T ). Proposition 5. Let λ > and γ <. Let y R N and A R M N. Then a saddle-point (x opt, v opt ) of F in (54) can be obtained by the iterative algorithm: Set ρ = max, γ/( γ)} A T A Set µ : < µ < /ρ For i =,,,... w (i) = x (i) µa T( A(x (i) + γ(v (i) x (i) )) y ) end u (i) = v (i) µγa T A(v (i) x (i) ) x (i+) = soft(w (i), µλ) v (i+) = soft(u (i), µλ) where i is the iteration counter. Proof. The point (x opt, v opt ) is a saddle-point of F if F (x opt, v opt ) where F is the subdifferential of F. From (54), we have x F (x, v) = A T (Ax y) γa T A(x v) + λ sign(x) v F (x, v) = γa T A(x v) λ sign(v). Hence, F if P (x, v) + Q(x, v) where [ [ ] [ ] ( γ)a P (x, v) = T A γa T A x A γa T A γa A] T T y v

10 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 7, PP , SEPTEMBER 7 (PREPRINT) Q(x, v) = [ ] λ sign(x). λ sign(v) Finding (x, v) such that P (x, v) + Q(x, v) is the problem of constructing a zero of a sum of operators. The operators P and Q are maximally monotone and P is single-valued and β-cocoercive with β > ; hence, the forward-backward algorithm (Theorem 5.8 in [4]) can be used. In the current notation, the forward-backward algorithm is [ ] [ ] w (i) x (i) u (i) = v (i) µp (x (i), v (i) ) [ ] x (i+) = J µq (w (i), u (i) ) v (i+) where J Q = (I +Q) is the resolvent of Q. The resolvent of the sign function is soft thresholding. The constant µ should be chosen < µ < β where P is β-cocoercive (Definition 4.4 in [4]), i.e., βp is firmly non-expansive. We now address the value β. By Corollary 4.(v) in [4], this condition is equivalent to P + P T βp T P. (55) We may write P using a Kronecker product, [ ] γ γ P = A T A. γ γ Then we have P + P T βp T P [ ] γ = A T A γ [ ] [ ] γ γ γ γ β (A T A) γ γ γ γ (([ ] [ ] [ ]) ) γ γ γ γ γ = β γ I γ γ γ γ N ( I ( I N β A T A )) ( I A T A ) where β = β β. Hence, (55) is satisfied if [ ] [ ] [ ] γ γ γ γ γ β γ γ γ γ γ and I N β A T A. These conditions are repsectively satisfied if and β / max, γ/( γ)} β / A T A. The FB algorithm requires that P be β-cocoercive with β > ; hence, γ = is precluded. If γ = in Proposition 5, then the algorithm reduces to the classic iterative shrinkage/thresholding algorithm (ISTA) [], [6]. The Douglas-Rachford algorithm (Theorem 5.6 in [4]) may also be used to find a saddle-point of F in (54) Noise free signal 5 Denoising [L norm] RMSE =.4 5 FFT of noisy signal Frequency Noisy signal, σ =. 5 Denoising [GMC penalty] RMSE =.49 5 Optimized Fourier coefficients o GMC x L norm Frequency Fig.. Denoising using the l norm and the proposed GMC penalty. The plot of optimized coefficients shows only the non-zero values. VIII. NUMERICAL EXAMPLES A. Denoising using frequency-domain sparsity This example illustrates the use of the GMC penalty for denoising [8]. Specifically, we consider the estimation of the discrete-time signal g(m) = cos(πf m) + sin(πf m), m =,..., M of length M = with frequencies f =. and f =.. This signal is sparse in the frequency domain, so we model the signal as g = Ax where A is an over-sampled inverse discrete Fourier transform and x C N is a sparse vector of Fourier coefficients with N M. Specifically, we define the matrix A C M N as A m,n = ( / N ) exp(j(π/n)mn), m =,..., M, n =,..., N with N = 56. The columns of A form a normalized tight frame, i.e., AA H = I where A H is the complex conjugate transpose of A. For the denoising experiment, we corrupt the signal with additive white Gaussian noise (AWGN) with standard deviation σ =., as illustrated in Fig.. In addition to the l norm and proposed GMC penalty, we use several other methods: debiasing the l norm solution [7], iterative p-shrinkage (IPS) [6], [64], and multivariate sparse regularization (MUSR) [5]. Debiasing the l norm solution is a two-step approach where the l -norm solution is used to estimate the support, then the identified non-zero values are reestimated by un-regularized least squares. The IPS algorithm is a type of iterative thresholding algorithm that performed particularly well in a detailed comparison of several algorithms [55]. MUSR regularization is a precursor of the GMC penalty,

11 Average RMSE Denoising via frequency domain sparsity L norm L + debias IPS MUSR GMC λ Fig.. Average RMSE for three denoising methods. i.e., a non-separable non-convex penalty designed to maintain cost function convexity, but with a simpler functional form. In this denoising experiment, we use realizations of the noise. Each method calls for a regularization parameter λ to be set. We vary λ from.5 to.5 (with increment.5) and evaluate the RMSE for each method, for each λ, and for each realization. For the GMC method we must also specify the matrix B, which we set using (48) with γ =.8. Since B H B is not diagonal, the GMC penalty is non-separable. The average RMSE as a function of λ for each method is shown in Fig.. The GMC compares favorably with the other methods, achieving the minimum average RMSE. The next bestperforming method is debiasing of the l -norm solution, which performs almost as well as GMC. Note that this debiasing method does not minimize an initially prescribed cost function, in contrast to the other methods. The IPS algorithm aims to minimize a (non-convex) cost function. Figure shows the l -norm and GMC solutions for a particular noise realization. The solutions shown in this figure were obtained using the value of λ that minimizes the average RMSE (λ =. and λ =., respectively). Comparing the l norm and GMC solutions, we observe: the GMC solution is more sparse in the frequency domain; and the l norm solution underestimates the coefficient amplitudes. Neither increasing nor decreasing the regularization parameter λ helps the l -norm solution here. A larger value of λ makes the l -norm solution sparser, but reduces the coefficient amplitudes. A smaller value of λ increases the coefficient amplitudes of the l -norm solution, but makes the solution less sparse and more noisy. Note that the purpose of this example is to compare the proposed GMC penalty with other sparse regularizers. We are not advocating it for frequency estimation per se. B. Denoising using time-frequency sparsity This example considers the denoising of a bat echolocation pulse, shown in Fig. (sampling period of 7 microseconds). The bat pulse can be modeled as sparse in the time- The bat echolocation pulse data is curtesy of Curtis Condon, Ken White, and Al Feng of the Beckman Center at the University of Illinois. Available online at True signal. Noisy signal.. Denoising using L norm. RMSE =.6. Denoising using GMC penalty. RMSE =.6. Frequency (khz) Frequency (khz) Frequency (khz) Frequency (khz) 6 4 Time frequency profile (db) Time frequency profile (db) 6 4 Time frequency profile (db) 6 4 Time frequency profile (db) 6 4 Fig.. Denoising a bat echolocation pulse using the l norm and GMC penalty. The GMC penalty results in fewer extraneous noise artifacts in the time-frequency representation. frequency domain. We use a short-time Fourier transform (STFT) with 75% overlapping segments (the transform is fourtimes overcomplete). We implement the STFT as a normalized tight frame, i.e., AA H = I. The bat pulse and its spectrogram are illustrated in Fig.. For the denoising experiment, we contaminate the pulse with AWGN (σ =.5). We perform denoising by estimating the STFT coefficients by minimizing the cost function F in (46) where A represents the inverse STFT operator. We set λ so as to minimize the rootmean-square error (RMSE). This leads to the values λ =. and λ =.5 for the l -norm and GMC penalties, respectively. For the GMC penalty, we set B as in (48) with γ =.7. 
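For reference, the following sketch shows how the forward-backward iteration of Sec. VII can be implemented for a generic problem of the form (46) with B set as in (48). Here A is taken to be a dense real matrix, and the step size, iteration count, synthetic data, and function names are illustrative choices rather than the paper's reference implementation; for the tight-frame operators used in these examples, A and its (conjugate) transpose would instead be applied as matrix-free transforms.

```python
import numpy as np

def soft(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def gmc_lsq(y, A, lam, gamma=0.8, n_iter=1000):
    """Minimize F(x) = 0.5 ||y - A x||^2 + lam * psi_B(x), with B = sqrt(gamma/lam) A,
    via the forward-backward saddle-point iteration of Sec. VII (requires 0 <= gamma < 1)."""
    assert 0.0 <= gamma < 1.0
    AtA = A.T @ A
    Aty = A.T @ y
    # Step size: 0 < mu < 2 / rho, with rho = max(1, gamma/(1 - gamma)) * ||A^T A||_2.
    rho = max(1.0, gamma / (1.0 - gamma)) * np.linalg.norm(AtA, 2)
    mu = 1.9 / rho
    x = np.zeros(A.shape[1])
    v = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = x - mu * (AtA @ (x + gamma * (v - x)) - Aty)   # forward step in x
        u = v - mu * gamma * (AtA @ (v - x))               # forward step in v
        x = soft(w, mu * lam)                              # backward (proximal) steps
        v = soft(u, mu * lam)
    return x

# Small synthetic usage example (not the paper's experiment):
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100); x_true[[5, 37, 80]] = [3.0, -2.0, 4.0]
y = A @ x_true + 0.5 * rng.standard_normal(50)
x_hat = gmc_lsq(y, A, lam=4.0, gamma=0.8)
print(np.count_nonzero(np.round(x_hat, 3)))                # sparse estimate
```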
Since B H B is not diagonal, the GMC penalty is non-separable. We

12 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 7, PP , SEPTEMBER 7 (PREPRINT) then estimate the bat pulse by computing the inverse STFT of the optimized coefficients. With λ individually set for each method, the resulting RMSE is about the same (.6). The optimized STFT coefficients (time-frequency representation) for each solution is shown in Fig.. We observe that the GMC solution has substantially fewer extraneous noise artifacts in the time-frequency representation, compared to the l norm solution. (The time-frequency representations in Fig. are shown in decibels with db being black and -5 db being white.) C. Sparsity-assisted signal smoothing This example uses the GMC penalty for sparsity-assisted signal smoothing (SASS) [5], [5]. The SASS method is suitable for the denoising of signals that are smooth for the exception of singularities. Here, we use SASS to denoise the biosensor data illustrated in Fig. (a), which exhibits jump discontinuities. This data was acquired using a whispering gallery mode (WGM) sensor designed to detect nano-particles with high sensitivity [], []. Nano-particles show up as jump discontinuities in the data. The SASS technique formulates the denoising problem as a sparse deconvolution problem. The cost function to be minimized has the form (46). The exact cost function, given by equation (4) in Ref. [5], depends on a prescribed lowpass filter and the order of the singularities the signal is assumed to posses. For the biosensor data shown in Fig., the singularities are of order K = since the first-order derivative of the signal exhibits impulses. In this example, we use a low-pass filter of order d = and cut-off frequency f c =. (these parameter values designate a low-pass filter as described in [5]). We set λ = and, for the GMC penalty, we set γ =.7. Solving the SASS problem using the l norm and GMC penalty yields the denoised signals shown in Figs. (b) and (c), respectively. The amplitudes of the jump discontinuities are indicated in the figure. It can be seen, especially in Fig. (d), that the GMC solution estimates the jump discontinuities more accurately than the l norm solution. The l norm solution tends to underestimate the amplitudes of the jump discontinuities. To reduce this tendency, a smaller value of λ could be used, but that tends to produce false discontinuities (false detections). IX. CONCLUSION In regards to the sparse-regularized linear least squares problem, this work bridges the convex (i.e., l norm) and the non-convex (e.g., l p norm with p < ) approaches, which are usually mutually exclusive and incompatible. Specifically, this work formulates the sparse-regularized linear least squares problem using a non-convex generalization of the l norm that preserves the convexity of the cost function to be minimized. The proposed method leads to optimization problems with no extraneous suboptimal local minima and allows the leveraging of globally convergent, computationally efficient, scalable convex optimization algorithms. The advantage compared to l norm regularization is (i) more accurate estimation of highamplitude components of sparse solutions or (ii) a higher level Biosensor data 5 5 (a).8 6. SASS using L norm 5 5 (b) 5 5 (c) data L norm GMC. SASS using GMC penalty (d) 6.5 Magnified view Fig.. Sparsity-assisted signal smoothing (SASS) using l -norm and GMC regularization, as applied to biosensor data. The GMC method more accurately estimates jump discontinuities. of sparsity in a sparse approximation problem. 
The sparse regularizer is expressed as the ℓ1 norm minus a smooth convex function defined via infimal convolution. In the scalar case, the method reduces to firm thresholding (a generalization of soft thresholding).

Several extensions of this method are of interest. For example, the idea may admit extension to more general convex regularizers such as total variation [48], the nuclear norm [], mixed norms [4], composite regularizers [], [], co-sparse regularization [4], and, more generally, atomic norms [4] and partly smooth regularizers [6]. Another extension of

13 interest is to problems where the data fidelity term is not quadratic (e.g., Poisson denoising [4]). REFERENCES [] M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo. An augmented Lagrangian approach to linear inverse problems with compound regularization. In Proc. IEEE Int. Conf. Image Processing (ICIP), pages , September. [] R. Ahmad and P. Schniter. Iteratively reweighted L approaches to sparse composite regularization. IEEE Trans. Comput. Imaging, (4): 5, 5. [] S. Arnold, M. Khoshima, I. Teraoka, S. Holler, and F. Vollmer. Shift of whispering-gallery modes in microspheres by protein adsorption. Opt. Lett, 8(4):7 74, February 5,. [4] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer,. [5] İ. Bayram. Penalty functions derived from monotone mappings. IEEE Signal Processing Letters, ():65 69, March 5. [6] I. Bayram. On the convergence of the iterative shrinkage/thresholding algorithm with a weakly convex penalty. IEEE Trans. Signal Process., 64(6):597 68, March 6. [7] A. Blake and A. Zisserman. Visual Reconstruction. MIT Press, 987. [8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 4. [9] A. Bruckstein, D. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 5():4 8, 9. [] E. J. Candes and Y. Plan. Matrix completion with noise. Proc. IEEE, 98(6):95 96, June. [] E. J. Candès, M. B. Wakin, and S. Boyd. Enhancing sparsity by reweighted l minimization. J. Fourier Anal. Appl., 4(5):877 95, December 8. [] M. Carlsson. On convexification/optimization of functionals including an l-misfit term. September 6. [] M. Castella and J.-C. Pesquet. Optimization of a Geman-McClure like criterion for sparse signal deconvolution. In IEEE Int. Workshop Comput. Adv. Multi-Sensor Adaptive Proc. (CAMSAP), pages 9, December 5. [4] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. Foundations of Computational mathematics, (6):85 849,. [5] R. Chartrand. Shrinkage mappings and their induced penalty functions. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), pages 6 9, May 4. [6] L. Chen and Y. Gu. The convergence guarantees of a non-convex approach for sparse recovery. IEEE Trans. Signal Process., 6(5): , August 4. [7] P.-Y. Chen and I. W. Selesnick. Group-sparse signal denoising: Nonconvex regularization, convex optimization. IEEE Trans. Signal Process., 6(): , July 4. [8] S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., (): 6, 998. [9] E. Chouzenoux, A. Jezierska, J. Pesquet, and H. Talbot. A majorizeminimize subspace approach for l l image regularization. SIAM J. Imag. Sci., 6():56 59,. [] P. L. Combettes. Perspective functions: Properties, constructions, and examples. Set-Valued and Variational Analysis, pages 8, 7. [] V. R. Dantham, S. Holler, V. Kolchenko, Z. Wan, and S. Arnold. Taking whispering gallery-mode single virus detection and sizing to the limit. Appl. Phys. Lett., (4),. [] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math, 57():4 457, 4. [] Y. Ding and I. W. Selesnick. Artifact-free wavelet denoising: Nonconvex sparse regularization, convex optimization. IEEE Signal Processing Letters, (9):64 68, September 5. [4] F.-X. Dupé, J. M. Fadili, and J.-L. Starck. 
A proximal iteration for deconvolving Poisson noisy images using sparse representations. IEEE Trans. Image Process., 8():, 9. [5] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc., 96(456):48 6,. [6] M. Figueiredo and R. Nowak. An EM algorithm for wavelet-based image restoration. IEEE Trans. Image Process., (8):96 96, August. [7] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process., (4): , December 7. [8] M. Fornasier and H. Rauhut. Iterative thresholding algorithms. J. of Appl. and Comp. Harm. Analysis, 5():87 8, 8. [9] H.-Y. Gao and A. G. Bruce. Waveshrink with firm shrinkage. Statistica Sinica, 7: , 997. [] G. Gasso, A. Rakotomamonjy, and S. Canu. Recovering sparse signals with a certain family of nonconvex penalties and DC programming. IEEE Trans. Signal Process., 57(): , December 9. [] A. Gholami and S. M. Hosseini. A general framework for sparsity-based denoising and inversion. IEEE Trans. Signal Process., 59():5 5, November. [] W. He, Y. Ding, Y. Zi, and I. W. Selesnick. Sparsity-based algorithm for detecting faults in rotating machines. Mechanical Systems and Signal Processing, 7-7:46 64, May 6. [] P. J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 5():7, 964. [4] M. Kowalski and B. Torrésani. Sparsity and persistence: mixed norms provide simple signal models with dependent coefficients. Signal, Image and Video Processing, ():5 64, 9. [5] A. Lanza, S. Morigi, I. Selesnick, and F. Sgallari. Nonconvex nonsmooth optimization via convex nonconvex majorization minimization. Numerische Mathematik, 6():4 8, 7. [6] A. Lanza, S. Morigi, and F. Sgallari. Convex image denoising via nonconvex regularization with parameter selection. J. Math. Imaging and Vision, 56():95, 6. [7] M. Malek-Mohammadi, C. R. Rojas, and B. Wahlberg. A class of nonconvex penalties preserving overall convexity in optimizationbased mean filtering. IEEE Trans. Signal Process., 64(4): , December 6. [8] Y. Marnissi, A. Benazza-Benyahia, E. Chouzenoux, and J.-C. Pesquet. Generalized multivariate exponential power prior for wavelet-based multichannel image restoration. In Proc. IEEE Int. Conf. Image Processing (ICIP),. [9] H. Mohimani, M. Babaie-Zadeh, and C. Jutten. A fast approach for overcomplete sparse decomposition based on smoothed l norm. IEEE Trans. Signal Process., 57():89, January 9. [4] S. Nam, M. E. Davies, M. Elad, and R. Gribonval. The cosparse analysis model and algorithms. J. of Appl. and Comp. Harm. Analysis, 4(): 56,. [4] M. Nikolova. Estimation of binary images by minimizing convex criteria. In Proc. IEEE Int. Conf. Image Processing (ICIP), pages 8 vol., 998. [4] M. Nikolova. Markovian reconstruction using a GNC approach. IEEE Trans. Image Process., 8(9):4, 999. [4] M. Nikolova. Energy minimization methods. In O. Scherzer, editor, Handbook of Mathematical Methods in Imaging, chapter 5, pages Springer,. [44] M. Nikolova, M. K. Ng, and C.-P. Tam. Fast nonconvex nonsmooth minimization methods for image restoration and reconstruction. IEEE Trans. Image Process., 9():7 88, December. [45] A. Parekh and I. W. Selesnick. Enhanced low-rank matrix approximation. IEEE Signal Processing Letters, (4):49 497, April 6. [46] N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, ():, 4. [47] J. Portilla and L. Mancera. 
L-based sparse approximation: two alternative methods and some applications. In Proceedings of SPIE, volume 67 (Wavelets XII), San Diego, CA, USA, 7. [48] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 6:59 68, 99. [49] I. Selesnick. Sparsity amplified. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), pages , March 7. [5] I. Selesnick. Sparsity-assisted signal smoothing (revisited). In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), pages , March 7. [5] I. Selesnick. Total variation denoising via the Moreau envelope. IEEE Signal Processing Letters, 4():6, February 7. [5] I. Selesnick and M. Farshchian. Sparse signal approximation via nonseparable regularization. IEEE Trans. Signal Process., 65():56 575, May 7. [5] I. W. Selesnick. Sparsity-assisted signal smoothing. In R. Balan et al., editors, Excursions in Harmonic Analysis, Volume 4, pages Birkhäuser Basel, 5.

14 4 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 7, PP , SEPTEMBER 7 (PREPRINT) [54] I. W. Selesnick and I. Bayram. Sparse signal estimation by maximally sparse convex optimization. IEEE Trans. Signal Process., 6(5):78 9, March 4. [55] I. W. Selesnick and I. Bayram. Enhanced sparsity by non-separable regularization. IEEE Trans. Signal Process., 64(9):98, May 6. [56] I. W. Selesnick, A. Parekh, and I. Bayram. Convex -D total variation denoising with non-convex regularization. IEEE Signal Processing Letters, ():4 44, February 5. [57] E. Soubies, L. Blanc-Féraud, and G. Aubert. A continuous exact l penalty (CEL) for least squares regularized problem. SIAM J. Imag. Sci., 8():67 69, 5. [58] C. Soussen, J. Idier, J. Duan, and D. Brie. Homotopy based algorithms for l -regularized least-squares. IEEE Trans. Signal Process., 6(): 6, July 5. [59] J.-L. Starck, F. Murtagh, and J. Fadili. Sparse image and signal processing: Wavelets and related geometric multiscale analysis. Cambridge University Press, 5. [6] M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. J. Machine Learning Research, : 44,. [6] S. Vaiter, C. Deledalle, J. Fadili, G. Peyré, and C. Dossal. The degrees of freedom of partly smooth regularizers. Annals of the Institute of Statistical Mathematics, pages 4, 6. [6] S. Voronin and R. Chartrand. A new generalized thresholding algorithm for inverse problems with sparsity constraints. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), pages 66 64, May. [6] D. P. Wipf, B. D. Rao, and S. Nagarajan. Latent variable Bayesian models for promoting sparsity. IEEE Trans. Inform. Theory, 57(9):66 655, September. [64] J. Woodworth and R. Chartrand. Compressed sensing recovery via nonconvex shrinkage penalties. Inverse Problems, (7): , July 6. [65] C.-H. Zhang. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, pages ,. [66] H. Zou and R. Li. One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist., 6(4):59 5, 8.


More information

EUSIPCO

EUSIPCO EUSIPCO 013 1569746769 SUBSET PURSUIT FOR ANALYSIS DICTIONARY LEARNING Ye Zhang 1,, Haolong Wang 1, Tenglong Yu 1, Wenwu Wang 1 Department of Electronic and Information Engineering, Nanchang University,

More information

On Optimal Frame Conditioners

On Optimal Frame Conditioners On Optimal Frame Conditioners Chae A. Clark Department of Mathematics University of Maryland, College Park Email: cclark18@math.umd.edu Kasso A. Okoudjou Department of Mathematics University of Maryland,

More information

A Unified Approach to Proximal Algorithms using Bregman Distance

A Unified Approach to Proximal Algorithms using Bregman Distance A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department

More information

LINEARIZED BREGMAN ITERATIONS FOR FRAME-BASED IMAGE DEBLURRING

LINEARIZED BREGMAN ITERATIONS FOR FRAME-BASED IMAGE DEBLURRING LINEARIZED BREGMAN ITERATIONS FOR FRAME-BASED IMAGE DEBLURRING JIAN-FENG CAI, STANLEY OSHER, AND ZUOWEI SHEN Abstract. Real images usually have sparse approximations under some tight frame systems derived

More information

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence

More information

Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise

Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise Minru Bai(x T) College of Mathematics and Econometrics Hunan University Joint work with Xiongjun Zhang, Qianqian Shao June 30,

More information

Sparse Solutions of an Undetermined Linear System

Sparse Solutions of an Undetermined Linear System 1 Sparse Solutions of an Undetermined Linear System Maddullah Almerdasy New York University Tandon School of Engineering arxiv:1702.07096v1 [math.oc] 23 Feb 2017 Abstract This work proposes a research

More information

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases 2558 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 48, NO 9, SEPTEMBER 2002 A Generalized Uncertainty Principle Sparse Representation in Pairs of Bases Michael Elad Alfred M Bruckstein Abstract An elementary

More information

Translation-Invariant Shrinkage/Thresholding of Group Sparse. Signals

Translation-Invariant Shrinkage/Thresholding of Group Sparse. Signals Translation-Invariant Shrinkage/Thresholding of Group Sparse Signals Po-Yu Chen and Ivan W. Selesnick Polytechnic Institute of New York University, 6 Metrotech Center, Brooklyn, NY 11201, USA Email: poyupaulchen@gmail.com,

More information

A Parallel Block-Coordinate Approach for Primal-Dual Splitting with Arbitrary Random Block Selection

A Parallel Block-Coordinate Approach for Primal-Dual Splitting with Arbitrary Random Block Selection EUSIPCO 2015 1/19 A Parallel Block-Coordinate Approach for Primal-Dual Splitting with Arbitrary Random Block Selection Jean-Christophe Pesquet Laboratoire d Informatique Gaspard Monge - CNRS Univ. Paris-Est

More information

An Homotopy Algorithm for the Lasso with Online Observations

An Homotopy Algorithm for the Lasso with Online Observations An Homotopy Algorithm for the Lasso with Online Observations Pierre J. Garrigues Department of EECS Redwood Center for Theoretical Neuroscience University of California Berkeley, CA 94720 garrigue@eecs.berkeley.edu

More information

PART II: Basic Theory of Half-quadratic Minimization

PART II: Basic Theory of Half-quadratic Minimization PART II: Basic Theory of Half-quadratic Minimization Ran He, Wei-Shi Zheng and Liang Wang 013-1-01 Outline History of half-quadratic (HQ) minimization Half-quadratic minimization HQ loss functions HQ in

More information

Math 273a: Optimization Overview of First-Order Optimization Algorithms

Math 273a: Optimization Overview of First-Order Optimization Algorithms Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization

More information

A New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction

A New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction A New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with H. Bauschke and J. Bolte Optimization

More information

Symmetric Wavelet Tight Frames with Two Generators

Symmetric Wavelet Tight Frames with Two Generators Symmetric Wavelet Tight Frames with Two Generators Ivan W. Selesnick Electrical and Computer Engineering Polytechnic University 6 Metrotech Center, Brooklyn, NY 11201, USA tel: 718 260-3416, fax: 718 260-3906

More information

DNNs for Sparse Coding and Dictionary Learning

DNNs for Sparse Coding and Dictionary Learning DNNs for Sparse Coding and Dictionary Learning Subhadip Mukherjee, Debabrata Mahapatra, and Chandra Sekhar Seelamantula Department of Electrical Engineering, Indian Institute of Science, Bangalore 5612,

More information

SCRIBERS: SOROOSH SHAFIEEZADEH-ABADEH, MICHAËL DEFFERRARD

SCRIBERS: SOROOSH SHAFIEEZADEH-ABADEH, MICHAËL DEFFERRARD EE-731: ADVANCED TOPICS IN DATA SCIENCES LABORATORY FOR INFORMATION AND INFERENCE SYSTEMS SPRING 2016 INSTRUCTOR: VOLKAN CEVHER SCRIBERS: SOROOSH SHAFIEEZADEH-ABADEH, MICHAËL DEFFERRARD STRUCTURED SPARSITY

More information

Wavelet Based Image Restoration Using Cross-Band Operators

Wavelet Based Image Restoration Using Cross-Band Operators 1 Wavelet Based Image Restoration Using Cross-Band Operators Erez Cohen Electrical Engineering Department Technion - Israel Institute of Technology Supervised by Prof. Israel Cohen 2 Layout Introduction

More information

COMPRESSED Sensing (CS) is a method to recover a

COMPRESSED Sensing (CS) is a method to recover a 1 Sample Complexity of Total Variation Minimization Sajad Daei, Farzan Haddadi, Arash Amini Abstract This work considers the use of Total Variation (TV) minimization in the recovery of a given gradient

More information

OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE

OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE 17th European Signal Processing Conference (EUSIPCO 009) Glasgow, Scotland, August 4-8, 009 OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE Abdourrahmane M. Atto 1, Dominique Pastor, Gregoire Mercier

More information

Sparse signal representation and the tunable Q-factor wavelet transform

Sparse signal representation and the tunable Q-factor wavelet transform Sparse signal representation and the tunable Q-factor wavelet transform Ivan Selesnick Polytechnic Institute of New York University Brooklyn, New York Introduction Problem: Decomposition of a signal into

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Sparse regression. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic

More information

Lecture Notes 9: Constrained Optimization

Lecture Notes 9: Constrained Optimization Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form

More information

c 2011 International Press Vol. 18, No. 1, pp , March DENNIS TREDE

c 2011 International Press Vol. 18, No. 1, pp , March DENNIS TREDE METHODS AND APPLICATIONS OF ANALYSIS. c 2011 International Press Vol. 18, No. 1, pp. 105 110, March 2011 007 EXACT SUPPORT RECOVERY FOR LINEAR INVERSE PROBLEMS WITH SPARSITY CONSTRAINTS DENNIS TREDE Abstract.

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

A Dykstra-like algorithm for two monotone operators

A Dykstra-like algorithm for two monotone operators A Dykstra-like algorithm for two monotone operators Heinz H. Bauschke and Patrick L. Combettes Abstract Dykstra s algorithm employs the projectors onto two closed convex sets in a Hilbert space to construct

More information

An iterative hard thresholding estimator for low rank matrix recovery

An iterative hard thresholding estimator for low rank matrix recovery An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical

More information

Convex Hodge Decomposition of Image Flows

Convex Hodge Decomposition of Image Flows Convex Hodge Decomposition of Image Flows Jing Yuan 1, Gabriele Steidl 2, Christoph Schnörr 1 1 Image and Pattern Analysis Group, Heidelberg Collaboratory for Image Processing, University of Heidelberg,

More information

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Gongguo Tang and Arye Nehorai Department of Electrical and Systems Engineering Washington University in St Louis

More information

Tractable Upper Bounds on the Restricted Isometry Constant

Tractable Upper Bounds on the Restricted Isometry Constant Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.

More information

Uniqueness Conditions for A Class of l 0 -Minimization Problems

Uniqueness Conditions for A Class of l 0 -Minimization Problems Uniqueness Conditions for A Class of l 0 -Minimization Problems Chunlei Xu and Yun-Bin Zhao October, 03, Revised January 04 Abstract. We consider a class of l 0 -minimization problems, which is to search

More information

ϕ ( ( u) i 2 ; T, a), (1.1)

ϕ ( ( u) i 2 ; T, a), (1.1) CONVEX NON-CONVEX IMAGE SEGMENTATION RAYMOND CHAN, ALESSANDRO LANZA, SERENA MORIGI, AND FIORELLA SGALLARI Abstract. A convex non-convex variational model is proposed for multiphase image segmentation.

More information

A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration

A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration E. Chouzenoux, A. Jezierska, J.-C. Pesquet and H. Talbot Université Paris-Est Lab. d Informatique Gaspard

More information

In collaboration with J.-C. Pesquet A. Repetti EC (UPE) IFPEN 16 Dec / 29

In collaboration with J.-C. Pesquet A. Repetti EC (UPE) IFPEN 16 Dec / 29 A Random block-coordinate primal-dual proximal algorithm with application to 3D mesh denoising Emilie CHOUZENOUX Laboratoire d Informatique Gaspard Monge - CNRS Univ. Paris-Est, France Horizon Maths 2014

More information

Bayesian Methods for Sparse Signal Recovery

Bayesian Methods for Sparse Signal Recovery Bayesian Methods for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Jason Palmer, Zhilin Zhang and Ritwik Giri Motivation Motivation Sparse Signal Recovery

More information

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

POISSON noise, also known as photon noise, is a basic

POISSON noise, also known as photon noise, is a basic IEEE SIGNAL PROCESSING LETTERS, VOL. N, NO. N, JUNE 2016 1 A fast and effective method for a Poisson denoising model with total variation Wei Wang and Chuanjiang He arxiv:1609.05035v1 [math.oc] 16 Sep

More information

Sparse signal representation and the tunable Q-factor wavelet transform

Sparse signal representation and the tunable Q-factor wavelet transform Sparse signal representation and the tunable Q-factor wavelet transform Ivan Selesnick Polytechnic Institute of New York University Brooklyn, New York Introduction Problem: Decomposition of a signal into

More information

Robust PCA. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng

Robust PCA. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng Robust PCA CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Robust PCA 1 / 52 Previously...

More information

Robust Sparse Recovery via Non-Convex Optimization

Robust Sparse Recovery via Non-Convex Optimization Robust Sparse Recovery via Non-Convex Optimization Laming Chen and Yuantao Gu Department of Electronic Engineering, Tsinghua University Homepage: http://gu.ee.tsinghua.edu.cn/ Email: gyt@tsinghua.edu.cn

More information

446 SCIENCE IN CHINA (Series F) Vol. 46 introduced in refs. [6, ]. Based on this inequality, we add normalization condition, symmetric conditions and

446 SCIENCE IN CHINA (Series F) Vol. 46 introduced in refs. [6, ]. Based on this inequality, we add normalization condition, symmetric conditions and Vol. 46 No. 6 SCIENCE IN CHINA (Series F) December 003 Construction for a class of smooth wavelet tight frames PENG Lizhong (Λ Π) & WANG Haihui (Ξ ) LMAM, School of Mathematical Sciences, Peking University,

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

Dual methods for the minimization of the total variation

Dual methods for the minimization of the total variation 1 / 30 Dual methods for the minimization of the total variation Rémy Abergel supervisor Lionel Moisan MAP5 - CNRS UMR 8145 Different Learning Seminar, LTCI Thursday 21st April 2016 2 / 30 Plan 1 Introduction

More information

EMPLOYING PHASE INFORMATION FOR AUDIO DENOISING. İlker Bayram. Istanbul Technical University, Istanbul, Turkey

EMPLOYING PHASE INFORMATION FOR AUDIO DENOISING. İlker Bayram. Istanbul Technical University, Istanbul, Turkey EMPLOYING PHASE INFORMATION FOR AUDIO DENOISING İlker Bayram Istanbul Technical University, Istanbul, Turkey ABSTRACT Spectral audio denoising methods usually make use of the magnitudes of a time-frequency

More information

Least Sparsity of p-norm based Optimization Problems with p > 1

Least Sparsity of p-norm based Optimization Problems with p > 1 Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from

More information

Least Squares with Examples in Signal Processing 1. 2 Overdetermined equations. 1 Notation. The sum of squares of x is denoted by x 2 2, i.e.

Least Squares with Examples in Signal Processing 1. 2 Overdetermined equations. 1 Notation. The sum of squares of x is denoted by x 2 2, i.e. Least Squares with Eamples in Signal Processing Ivan Selesnick March 7, 3 NYU-Poly These notes address (approimate) solutions to linear equations by least squares We deal with the easy case wherein the

More information

Signal Recovery, Uncertainty Relations, and Minkowski Dimension

Signal Recovery, Uncertainty Relations, and Minkowski Dimension Signal Recovery, Uncertainty Relations, and Minkowski Dimension Helmut Bőlcskei ETH Zurich December 2013 Joint work with C. Aubel, P. Kuppinger, G. Pope, E. Riegler, D. Stotz, and C. Studer Aim of this

More information

ROBUST BLIND SPIKES DECONVOLUTION. Yuejie Chi. Department of ECE and Department of BMI The Ohio State University, Columbus, Ohio 43210

ROBUST BLIND SPIKES DECONVOLUTION. Yuejie Chi. Department of ECE and Department of BMI The Ohio State University, Columbus, Ohio 43210 ROBUST BLIND SPIKES DECONVOLUTION Yuejie Chi Department of ECE and Department of BMI The Ohio State University, Columbus, Ohio 4 ABSTRACT Blind spikes deconvolution, or blind super-resolution, deals with

More information

Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches

Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Patrick L. Combettes joint work with J.-C. Pesquet) Laboratoire Jacques-Louis Lions Faculté de Mathématiques

More information

Motivation Sparse Signal Recovery is an interesting area with many potential applications. Methods developed for solving sparse signal recovery proble

Motivation Sparse Signal Recovery is an interesting area with many potential applications. Methods developed for solving sparse signal recovery proble Bayesian Methods for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Zhilin Zhang and Ritwik Giri Motivation Sparse Signal Recovery is an interesting

More information

Self-dual Smooth Approximations of Convex Functions via the Proximal Average

Self-dual Smooth Approximations of Convex Functions via the Proximal Average Chapter Self-dual Smooth Approximations of Convex Functions via the Proximal Average Heinz H. Bauschke, Sarah M. Moffat, and Xianfu Wang Abstract The proximal average of two convex functions has proven

More information

LEARNING OVERCOMPLETE SPARSIFYING TRANSFORMS FOR SIGNAL PROCESSING. Saiprasad Ravishankar and Yoram Bresler

LEARNING OVERCOMPLETE SPARSIFYING TRANSFORMS FOR SIGNAL PROCESSING. Saiprasad Ravishankar and Yoram Bresler LEARNING OVERCOMPLETE SPARSIFYING TRANSFORMS FOR SIGNAL PROCESSING Saiprasad Ravishankar and Yoram Bresler Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University

More information

Proximal methods. S. Villa. October 7, 2014

Proximal methods. S. Villa. October 7, 2014 Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem

More information

Strengthened Sobolev inequalities for a random subspace of functions

Strengthened Sobolev inequalities for a random subspace of functions Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)

More information

IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 9, SEPTEMBER

IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 9, SEPTEMBER IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 9, SEPTEMBER 2015 1239 Preconditioning for Underdetermined Linear Systems with Sparse Solutions Evaggelia Tsiligianni, StudentMember,IEEE, Lisimachos P. Kondi,

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Enhanced Compressive Sensing and More

Enhanced Compressive Sensing and More Enhanced Compressive Sensing and More Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Nonlinear Approximation Techniques Using L1 Texas A & M University

More information

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:

More information

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.

More information

Design of Projection Matrix for Compressive Sensing by Nonsmooth Optimization

Design of Projection Matrix for Compressive Sensing by Nonsmooth Optimization Design of Proection Matrix for Compressive Sensing by Nonsmooth Optimization W.-S. Lu T. Hinamoto Dept. of Electrical & Computer Engineering Graduate School of Engineering University of Victoria Hiroshima

More information

Sequential Unconstrained Minimization: A Survey

Sequential Unconstrained Minimization: A Survey Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.

More information

Sparsity Regularization

Sparsity Regularization Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation

More information

Signal Denoising with Wavelets

Signal Denoising with Wavelets Signal Denoising with Wavelets Selin Aviyente Department of Electrical and Computer Engineering Michigan State University March 30, 2010 Introduction Assume an additive noise model: x[n] = f [n] + w[n]

More information

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the

More information

EE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1

EE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1 EE 546, Univ of Washington, Spring 2012 6. Proximal mapping introduction review of conjugate functions proximal mapping Proximal mapping 6 1 Proximal mapping the proximal mapping (prox-operator) of a convex

More information

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles Or: the equation Ax = b, revisited University of California, Los Angeles Mahler Lecture Series Acquiring signals Many types of real-world signals (e.g. sound, images, video) can be viewed as an n-dimensional

More information

A New Estimate of Restricted Isometry Constants for Sparse Solutions

A New Estimate of Restricted Isometry Constants for Sparse Solutions A New Estimate of Restricted Isometry Constants for Sparse Solutions Ming-Jun Lai and Louis Y. Liu January 12, 211 Abstract We show that as long as the restricted isometry constant δ 2k < 1/2, there exist

More information

A generalized forward-backward method for solving split equality quasi inclusion problems in Banach spaces

A generalized forward-backward method for solving split equality quasi inclusion problems in Banach spaces Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 4890 4900 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa A generalized forward-backward

More information

Compressed Sensing and Neural Networks

Compressed Sensing and Neural Networks and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications

More information

Sparse linear models and denoising

Sparse linear models and denoising Lecture notes 4 February 22, 2016 Sparse linear models and denoising 1 Introduction 1.1 Definition and motivation Finding representations of signals that allow to process them more effectively is a central

More information