Sparse Regularization via Convex Analysis

IEEE Transactions on Signal Processing, vol. 65, no. 17, September 2017 (preprint)

Sparse Regularization via Convex Analysis

Ivan Selesnick

Department of Electrical and Computer Engineering, Tandon School of Engineering, New York University, New York, USA (selesi@nyu.edu). This work was supported by NSF under grant CCF-1525398 and by ONR. Supplemental software (Matlab and Python) is available from the author or online.

Abstract: Sparse approximate solutions to linear equations are classically obtained via ℓ1 norm regularized least squares, but this method often underestimates the true solution. As an alternative to the ℓ1 norm, this paper proposes a class of non-convex penalty functions that maintain the convexity of the least squares cost function to be minimized, and avoid the systematic underestimation characteristic of ℓ1 norm regularization. The proposed penalty function is a multivariate generalization of the minimax-concave (MC) penalty. It is defined in terms of a new multivariate generalization of the Huber function, which in turn is defined via infimal convolution. The proposed sparse-regularized least squares cost function can be minimized by proximal algorithms comprising simple computations.

I. INTRODUCTION

Numerous signal and image processing techniques build upon sparse approximation [59]. A sparse approximate solution to a system of linear equations (y = Ax) can often be obtained via convex optimization. The usual technique is to minimize the regularized linear least squares cost function J: R^N → R,

    J(x) = (1/2) ||y − Ax||_2^2 + λ ||x||_1,   λ > 0.   (1)

The ℓ1 norm is classically used as a regularizer here, since among convex regularizers it induces sparsity most effectively [9]. But this formulation tends to underestimate high-amplitude components of x ∈ R^N. Non-convex sparsity-inducing regularizers are also widely used (leading to more accurate estimation of high-amplitude components), but then the cost function is generally non-convex and has extraneous suboptimal local minimizers [4].

This paper proposes a class of non-convex penalties for sparse-regularized linear least squares that generalizes the ℓ1 norm and maintains the convexity of the least squares cost function to be minimized. That is, we consider the cost function F: R^N → R,

    F(x) = (1/2) ||y − Ax||_2^2 + λ ψ_B(x),   λ > 0,   (2)

and we propose a new non-convex penalty ψ_B: R^N → R that makes F convex. The penalty ψ_B is parameterized by a matrix B, and the convexity of F depends on B being suitably prescribed. In fact, the choice of B will depend on A. The matrix (linear operator) A may be arbitrary (i.e., injective, surjective, both, or neither). In contrast to the ℓ1 norm, the new approach does not systematically underestimate high-amplitude components of sparse vectors. Since the proposed formulation is convex, the cost function has no suboptimal local minimizers.

The new class of non-convex penalties is defined using tools of convex analysis. In particular, infimal convolution is used to define a new multivariate generalization of the Huber function. In turn, the generalized Huber function is used to define the proposed non-convex penalty, which can be considered a multivariate generalization of the minimax-concave (MC) penalty. Even though the generalized MC (GMC) penalty is non-convex, it is easy to prescribe this penalty so as to maintain the convexity of the cost function to be minimized. The proposed convex cost functions can be minimized using proximal algorithms, comprising simple computations.
In particular, the minimization problem can be cast as a kind of saddle-point problem for which the forward-backward splitting algorithm is applicable. The main computational steps of the algorithm are the operators A, A T, and soft thresholding. The implementation is thus matrix-free in that it involves the operators A and A T, but does not access or modify the entries of A. Hence, the algorithm can leverage efficient implementations of A and its transpose. We remark that while the proposed GMC penalty is nonseparable, we do not advocate non-separability in and of itself as a desirable property of a sparsity-inducing penalty. But in many cases (depending on A), non-separability is simply a requirement of a non-convex penalty designed so as to maintain convexity of the cost function F to be minimized. If A T A is singular (and none of its eigenvectors are standard basis vectors), then a separable penalty that maintains the convexity of the cost function F must, in fact, be a convex penalty [55]. This leads us back to the l norm. Thus, to improve upon the l norm, the penalty must be non-separable. This paper is organized as follows. Section II sets notation and recalls definitions of convex analysis. Section III recalls the (scalar) Huber function, the (scalar) MC penalty, and how they arise in the formulation of threshold functions (instances of proximity operators). The subsequent sections generalize these concepts to the multivariate case. In Section IV, we define a multivariate version of the Huber function. In Section V, we define a multivariate version of the MC penalty. In Section VI, we show how to set the GMC penalty to maintain convexity of the least squares cost function. Section VII presents a proximal algorithm to minimize this type of cost function. Section VIII presents examples wherein the GMC penalty is used for signal denoising and approximation. Elements of this work were presented in Ref. [49]. A. Related work Many prior works have proposed non-convex penalties that strongly promote sparsity or describe algorithms for solving

2 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 7, PP , SEPTEMBER 7 (PREPRINT) the sparse-regularized linear least squares problem, e.g., [], [], [5], [6], [9], [5], [], [], [8], [9], [4], [47], [58], [64], [66]. However, most of these papers (i) use separable (additive) penalties or (ii) do not seek to maintain convexity of the cost function. Non-separable non-convex penalties are proposed in Refs. [6], [6], but they are not designed to maintain cost function convexity. The development of convexity-preserving non-convex penalties was pioneered by Blake, Zisserman, and Nikolova [7], [4] [44], and further developed in [6], [7], [], [], [6], [7], [45], [54], [56]. But these are separable penalties, and as such they are fundamentally limited. Specifically, if A T A is singular, then a separable penalty constrained to maintain cost function convexity can only improve on the l norm to a very limited extent [55]. Non-convex regularization that maintains cost function convexity was used in [5] in an iterative manner for non-convex optimization, to reduce the likelihood that an algorithm converges to suboptimal local minima. To overcome the fundamental limitation of separable nonconvex penalties, we proposed a bivariate non-separable nonconvex penalty that maintains the convexity of the cost function to be minimized [55]. But that penalty is useful for only a narrow class of linear inverse problems. To handle more general problems, we subsequently proposed a multivariate penalty formed by subtracting from the l norm a function comprising the composition of a linear operator and a separable nonlinear function [5]. Technically, this type of multivariate penalty is non-separable, but it still constitutes a rather narrow class of non-separable functions. Convex analysis tools (especially the Moreau envelope and the Fenchel conjugate) have recently been used in novel ways for sparse regularized least squares [], [57]. Among other aims, these papers seek the convex envelope of the l pseudonorm regularized least squares cost function, and derive alternate cost functions that share the same global minimizers but have fewer local minima. In these approaches, algorithms are less likely to converge to suboptimal local minima (the global minimizer might still be difficult to calculate). For the special case where A T A is diagonal, the proposed GMC penalty is closely related to the continuous exact l (CEL) penalty introduced in [57]. In [57] it is observed that if A T A is diagonal, then the global minimizers of the l regularized problem coincides with that of a convex function defined using the CEL penalty. Although the diagonal case is simpler than the non-diagonal case (a non-convex penalty can be readily constructed to maintain cost function convexity [54]), the connection to the l problem is enlightening. In other related work, we use convex analysis concepts (specifically, the Moreau envelope) for the problem of total variation (TV) denoising [5]. In particular, we prescribe a non-convex TV penalty that preserves the convexity of the TV denoising cost function to be minimized. The approach of Ref. [5] generalizes standard TV denoising so as to more accurately estimate jump discontinuities. II. NOTATION The l, l, and l norms of x R N are defined x = n x n, x = ( n x n ) /, and x = max n x n. If x Fig.. The Huber function. A R M N, then component n of Ax is denoted [Ax] n. If the matrix A B is positive semidefinite, we write B A. 
The matrix 2-norm of a matrix A is denoted ||A||_2; its value is the square root of the maximum eigenvalue of A^T A. We have ||Ax||_2 ≤ ||A||_2 ||x||_2 for all x ∈ R^N. If A has full row-rank (i.e., AA^T is invertible), then the pseudo-inverse of A is given by A^+ := A^T (AA^T)^(-1). We denote the transpose of the pseudo-inverse of A as A^{+T}, i.e., A^{+T} := (A^+)^T. If A has full row-rank, then A^{+T} = (AA^T)^(-1) A.

This work uses definitions and notation of convex analysis [4]. The infimal convolution of two functions f and g from R^N to R ∪ {+∞} is given by

    (f □ g)(x) = inf_{v ∈ R^N} { f(v) + g(x − v) }.   (3)

The Moreau envelope of the function f: R^N → R is given by

    f^M(x) = inf_{v ∈ R^N} { f(v) + (1/2) ||x − v||_2^2 }.   (4)

In the notation of infimal convolution, we have

    f^M = f □ (1/2)||·||_2^2.   (5)

The set of proper lower semicontinuous (lsc) convex functions from R^N to R ∪ {+∞} is denoted Γ_0(R^N). If the function f is defined as the composition f(x) = h(g(x)), then we write f = h ∘ g.

The soft threshold function soft: R → R with threshold parameter λ > 0 is defined as

    soft(y; λ) := 0,                    |y| ≤ λ
                  (|y| − λ) sign(y),    |y| ≥ λ.   (6)

III. SCALAR PENALTIES

We recall the definition of the Huber function [].

Definition 1. The Huber function s: R → R is defined as

    s(x) := (1/2) x^2,    |x| ≤ 1
            |x| − 1/2,    |x| ≥ 1,   (7)

as illustrated in Fig. 1.

Proposition 1. The Huber function can be written as

    s(x) = min_{v ∈ R} { |v| + (1/2)(x − v)^2 }.   (8)
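For illustration, the following minimal Python/NumPy sketch implements the soft threshold (6) and the Huber function (7), and checks Proposition 1 numerically by minimizing the expression in (8) over a dense grid of candidate v values. The function names and the grid-based check are choices made here for illustration, not part of the paper or its supplemental software.

```python
import numpy as np

def soft(y, lam):
    """Soft threshold (6), applied elementwise."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def huber(x):
    """Huber function (7), applied elementwise."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= 1.0, 0.5 * x**2, np.abs(x) - 0.5)

# Numerical check of Proposition 1: s(x) = min_v { |v| + (1/2)(x - v)^2 }.
x = 1.7
v = np.linspace(-5.0, 5.0, 100001)              # dense grid of candidate v values
s_via_min = np.min(np.abs(v) + 0.5 * (x - v)**2)
print(huber(x), s_via_min)                      # both approximately 1.2
```

The minimum in the last computation is attained at v = x − 1, consistent with the discussion following Proposition 1.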

In the notation of infimal convolution, we have equivalently

    s = |·| □ (1/2)(·)^2.   (9)

And in the notation of the Moreau envelope, we have equivalently s = |·|^M. The Huber function is a standard example of the Moreau envelope. For example, see Ref. [46] and [].

We note here that, given x ∈ R, the minimum in (8) is achieved for v equal to 0, x − 1, or x + 1, i.e.,

    s(x) = min_{v ∈ {0, x−1, x+1}} { |v| + (1/2)(x − v)^2 }.   (10)

Consequently, the Huber function can be expressed as

    s(x) = min { (1/2) x^2,  |x − 1| + 1/2,  |x + 1| + 1/2 },   (11)

as illustrated in Fig. 2.

Fig. 2. The Huber function as the pointwise minimum of three functions.

We now consider the scalar penalty function illustrated in Fig. 3. This is the minimax-concave (MC) penalty [65]; see also [5], [6], [8].

Definition 2. The minimax-concave (MC) penalty function φ: R → R is defined as

    φ(x) := |x| − (1/2) x^2,    |x| ≤ 1
            1/2,                |x| ≥ 1,   (12)

as illustrated in Fig. 3.

Fig. 3. The MC penalty function.

The MC penalty can be expressed as

    φ(x) = |x| − s(x)   (13)

where s is the Huber function. This representation of the MC penalty will be used in Sec. V to generalize the MC penalty to the multivariate case.

A. Scaled functions

It will be convenient to define scaled versions of the Huber function and MC penalty.

Definition 3. Let b ∈ R. The scaled Huber function s_b: R → R is defined as

    s_b(x) := s(b^2 x) / b^2,   b ≠ 0.   (14)

For b = 0, the function is defined as

    s_0(x) := 0.   (15)

Hence, for b ≠ 0, the scaled Huber function is given by

    s_b(x) = (b^2/2) x^2,       |x| ≤ 1/b^2
             |x| − 1/(2 b^2),   |x| ≥ 1/b^2.   (16)

The scaled Huber function s_b is shown in Fig. 4 for several values of the scaling parameter b. Note that

    s_b(x) ≤ |x|,   x ∈ R,   (17)

    lim_{b → ∞} s_b(x) = |x|,   (18)

and

    lim_{b → 0} s_b(x) = 0.   (19)

Incidentally, we use b^2 in definition (14) rather than b, so as to parallel the generalized Huber function to be defined in Sec. IV.

Proposition 2. Let b ∈ R. The scaled Huber function can be written as

    s_b(x) = min_{v ∈ R} { |v| + (b^2/2)(x − v)^2 }.   (20)

In terms of infimal convolution, we have equivalently

    s_b = |·| □ (b^2/2)(·)^2.   (21)

Proof. For b ≠ 0, we have from (14) that

    s_b(x) = min_{v ∈ R} { |v| + (1/2)(b^2 x − v)^2 } / b^2
           = min_{v ∈ R} { |b^2 v| + (1/2)(b^2 x − b^2 v)^2 } / b^2
           = min_{v ∈ R} { |v| + (b^2/2)(x − v)^2 }.

For b = 0, the right-hand side of (20) equals 0 = s_0(x), so (20) holds for b = 0 as well.

Definition 4. Let b ∈ R. The scaled MC penalty function φ_b: R → R is defined as

    φ_b(x) := |x| − s_b(x)   (22)

where s_b is the scaled Huber function.

The scaled MC penalty φ_b is shown in Fig. 4 for several values of b. Note that φ_0(x) = |x|. For b ≠ 0,

    φ_b(x) = |x| − (b^2/2) x^2,   |x| ≤ 1/b^2
             1/(2 b^2),           |x| ≥ 1/b^2.   (23)
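The scaled functions are equally direct to evaluate. The following sketch (again an illustration with hypothetical function names, not the paper's supplemental code) implements the scaled Huber function (16) and the scaled MC penalty (23), treating b = 0 as the special cases s_0 = 0 and φ_0(x) = |x|.

```python
import numpy as np

def huber_scaled(x, b):
    """Scaled Huber function s_b, eqs. (14)-(16)."""
    x = np.asarray(x, dtype=float)
    if b == 0:
        return np.zeros_like(x)                      # s_0(x) := 0, eq. (15)
    return np.where(np.abs(x) <= 1.0 / b**2,
                    0.5 * b**2 * x**2,
                    np.abs(x) - 0.5 / b**2)

def mc_scaled(x, b):
    """Scaled MC penalty phi_b(x) = |x| - s_b(x), eqs. (22)-(23)."""
    x = np.asarray(x, dtype=float)
    return np.abs(x) - huber_scaled(x, b)

x = np.linspace(-3, 3, 7)
print(mc_scaled(x, 0.0))    # reduces to |x| when b = 0
print(mc_scaled(x, 1.0))    # saturates at 1/(2 b^2) = 0.5 for |x| >= 1
```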

Fig. 4. Scaled Huber function and scaled MC penalty for several values of the scaling parameter b.

B. Convexity condition

In the scalar case, the MC penalty corresponds to a type of threshold function. Specifically, the firm threshold function is the proximity operator of the MC penalty, provided that a particular convexity condition is satisfied. Here, we give the convexity condition for the scalar case. We will generalize this condition to the multivariate case in Sec. VI.

Proposition 3. Let λ > 0 and a ∈ R. Define f: R → R,

    f(x) = (1/2)(y − a x)^2 + λ φ_b(x)   (24)

where φ_b is the scaled MC penalty (22). If

    b^2 ≤ a^2 / λ,   (25)

then f is convex.

There are several ways to prove Proposition 3. In anticipation of the multivariate case, we use a technique in the following proof that we later use in the proof of Theorem 1 in Sec. VI.

Proof of Proposition 3. Using (22), we write f as

    f(x) = (1/2)(y − a x)^2 + λ |x| − λ s_b(x) = g(x) + λ |x|

where s_b is the scaled Huber function and g: R → R is given by

    g(x) = (1/2)(y − a x)^2 − λ s_b(x).   (26)

Since the sum of two convex functions is convex, it is sufficient to show g is convex. Using (20), we have

    g(x) = (1/2)(y − a x)^2 − λ min_{v ∈ R} { |v| + (b^2/2)(x − v)^2 }
         = max_{v ∈ R} { (1/2)(y − a x)^2 − λ |v| − (λ b^2/2)(x − v)^2 }
         = (1/2)(a^2 − λ b^2) x^2 + max_{v ∈ R} { (1/2)(y^2 − 2 a x y) − λ |v| − (λ b^2/2)(v^2 − 2 x v) }.

Note that the expression in the curly braces is affine (hence convex) in x. Since the pointwise maximum of a set of convex functions is itself convex, the second term is convex in x. Hence, g is convex if a^2 ≥ λ b^2.

The firm threshold function was defined by Gao and Bruce [9] as a generalization of hard and soft thresholding.

Definition 5. Let λ > 0 and µ > λ. The threshold function firm: R → R is defined as

    firm(y; λ, µ) := 0,                              |y| ≤ λ
                     µ (|y| − λ)/(µ − λ) sign(y),    λ ≤ |y| ≤ µ
                     y,                              |y| ≥ µ,   (27)

as illustrated in Fig. 5.

Fig. 5. Firm threshold function.

In contrast to the soft threshold function, the firm threshold function does not underestimate large-amplitude values, since it equals the identity for large values of its argument. As µ → λ or µ → ∞, the firm threshold function approaches the hard or soft threshold function, respectively.

We now state the correspondence between the MC penalty and the firm threshold function. When f in (24) is convex (i.e., b^2 ≤ a^2/λ), the minimizer of f is given by firm thresholding. This is noted in Refs. [5], [8], [64], [65].

Proposition 4. Let λ > 0, a > 0, b > 0, and b^2 ≤ a^2/λ. Let y ∈ R. Then the minimizer of f in (24) is given by firm thresholding, i.e.,

    x_opt = firm(y/a; λ/a^2, 1/b^2).   (28)

Hence, the minimizer of the scalar function f in (24) is easily obtained via firm thresholding. However, the situation in the multivariate case is more complicated. The aim of this

5 5 paper is to generalize this process to the multivariate case: to define a multivariate MC penalty generalizing (), to define a regularized least squares cost function generalizing (4), to generalize the convexity condition (5), and to provide a method to calculate a minimizer. IV. GENERALIZED HUBER FUNCTION In this section, we introduce a multivariate generalization of the Huber function. The basic idea is to generalize () which expresses the scalar Huber function as an infimal convolution. Definition 6. Let B R M N. We define the generalized Huber function S B : R N R as S B (x) := inf v + v R N B(x } v). (9) In the notation of infimal convolution, we have S B = B. () Proposition 5. The generalized Huber function S B is a proper lower semicontinuous convex function, and the infimal convolution is exact, i.e., S B (x) = min v + v R N B(x } v). () Proof. Set f = and g = B. Both f and g are convex; hence f g is convex by proposition. in [4]. Since f is coercive and g is bounded below, and f, g Γ (R N ), it follows that f g Γ (R N ) and the infimal convolution is exact (i.e., the infimum is achieved for some v) by Proposition.4 in [4]. Note that if C T C = B T B, then S B (x) = S C (x) for all x. That is, the generalized Huber function S B depends only on B T B, not on B itself. Therefore, without loss of generality, we may assume B has full row-rank. (If a given matrix B does not have full row-rank, then there is another matrix C with full row-rank such that C T C = B T B, yielding the same function S B.) As expected, the generalized Huber function reduces to the scalar Huber function. Proposition 6. If B is a scalar, i.e., B = b R, then the generalized Huber function reduces to the scalar Huber function, S b (x) = s b (x) for all x R. The generalized Huber function is separable (additive) when B T B is diagonal. Proposition 7. Let B R M N. If B T B is diagonal, then the generalized Huber function is separable (additive), comprising a sum of scalar Huber functions. Specifically, B T B = diag(α,..., α N) = S B (x) = n s αn (x n ). The utility of the generalized Huber function will be most apparent when B T B is a non-diagonal matrix. In this case, the generalized Huber function is non-separable, as illustrated in the following two examples. Example. For the matrix B =, () the generalized Huber function S B is shown in Fig. 6. As shown in the contour plot, the level sets of S B near the origin are ellipses. Example. For the matrix B = [.5] () the generalized Huber function S B is shown in Fig. 7. The level sets of S B are parallel lines because B is of rank. There is not a simple explicit formula for the generalized Huber function. But, using () we can derive several properties regarding the function. Proposition 8. Let B R M N. The generalized Huber function satisfies S B (x) x, x R N. (4) Proof. Using (), we have S B (x) = min v + v R N B(x } v) [ ] v + B(x v) = x. v=x Since S B is the minimum of a non-negative function, it also follows that S B (x) for all x. The following proposition accounts for the ellipses near the origin in the contour plot of the generalized Huber function in Fig. 6. (Further from the origin, the contours are not ellipsoidal.) Proposition 9. Let B R M N. The generalized Huber function satisfies S B (x) = Bx for all BT Bx. (5) Proof. From (), we have that S B (x) is the minimum value of g where g : R N R is given by Note that g(v) = v + B(x v). g() = Bx. Hence, it suffices to show that minimizes g if and only if B T Bx. 
Since g is convex, v = 0 minimizes g if and only if 0 ∈ ∂g(0), where ∂g is the subdifferential of g, given by

    ∂g(v) = sign(v) + B^T B(v − x)

where sign is the set-valued signum function,

    sign(t) := {1},       t > 0
               [−1, 1],   t = 0
               {−1},      t < 0.

6 6 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 7, PP , SEPTEMBER 7 (PREPRINT) Generalized Huber function S B (x) Generalized Huber function S B (x) x x x x Contours of S B Contours of S B Fig. 6. The generalized Huber function for the matrix B in (). It follows that minimizes g if and only if sign() B T Bx B T Bx [, ] N [ B T Bx ] [, ] for n =,..., N n B T Bx. Hence, the function S B coincides with B on a subset of its domain. Proposition. Let B R M N and set α = B. The generalized Huber function satisfies S B (x) S αi (x), x R N (6) = n s α (x n ). (7) Proof. Using (), we have S B (x) = min v + v R N B(x } v) min v + v R N B (x } v) = min v + v R N α (x v) } = min v + v R N α (x } v) = S αi (x). From Proposition 7 we have (7). Fig. 7. The generalized Huber function for the matrix B in (). The Moreau envelope is well studied in convex analysis [4]. Hence, it is useful to express the generalized Huber function S B in terms of a Moreau envelope, so we can draw on results in convex analysis to derive further properties of the generalized Huber function. Lemma. If B R N N is invertible, then the generalized Huber function S B can be expressed in terms of a Moreau envelope as S B = ( B ) M B. (8) Proof. Using (9), we have ( S B = ( ( B) = B)) B B ( ( = B ) ( ) ) B = ( B ) M B. Lemma. If B R M N has full row-rank, then the generalized Huber function S B can be expressed in terms of a Moreau envelope as S B = ( d B + ) M B (9) where d: R N R is the convex distance function d(x) = min x w (4) w null B

7 7 which represents the distance from the point x R N to the null space of B as measured by the l norm. Proof. Using (), we have S B (x) = min v + v R N B(x } v) = f(bx) where f : R M R is given by f(z) = min v R N v + z Bv = min u (null B) = min u (null B) } min u + w + w null B z B(u + } w) min u + w + w null B z } Bu } = min u (null B) d(u) + z Bu where d is the convex function given by (4). The fact that d is convex follows from Proposition 8.6 of [4] and Examples.6 and.7 of [8]. Since (null B) = range B T, f(z) = min d(u) + u range B T z } Bu = min d(b T v) + v R M z BBT v } = min d(b T (BB T ) v) + v R M z BBT (BB T ) v } = min d(b + v) + v R M z } v = ( d(b + ) ) M (z). Hence, S B (x) = ( d(b + ) ) M (Bx) which completes the proof. Note that (9) reduces to (8) when B is invertible. (Suppose B is invertible. Then null B = }; hence d(x) = x in (4). Additionally, B + = B.) Proposition. The generalized Huber function is differentiable. Proof. By Lemma, S B is the composition of a Moreau envelope of a convex function and a linear function. Additionally, by Proposition 5, S B Γ (R N ). By Proposition.9 in [4], it follows that S B is differentiable. The following result regards the gradient of the generalized Huber function. This result will be used in Sec. V to show the generalized MC penalty defined therein constitutes a valid penalty. Lemma. The gradient of the generalized Huber function S B : R N R satisfies S B (x) for all x R N. (4) Proof. Since S B is convex and differentiable, we have S B (v)+ [ S B (v) ] T (x v) SB (x), x R N, v R N. Using Proposition 8, it follows that S B (v) + [ S B (v) ] T (x v) x, x R N, v R N. Let x = (,...,, t,,..., ) where t is in position n. It follows that c(v) + [ S B (v) ] n t t, t R, v RN (4) where c(v) R does not depend on t. It follows from (4) that [ S B (v)] n. The generalized Huber function can be evaluated by taking the pointwise minimum of numerous simpler functions (comprising quadratics, absolute values, and linear functions). This generalizes the situation for the scalar Huber function, which can be evaluated as the pointwise minimum of three functions, as expressed in () and illustrated in Fig.. Unfortunately, evaluating the generalized Huber function on R N this way requires the evaluation of N simpler functions, which is not practical except for small N. In turn, the evaluation of the GMC penalty is also impractical. However, we do not need to explicitly evaluate these functions to utilize them for sparse regularization, as shown in Sec. VII. For this paper, we compute these functions on R only for the purpose of illustration (Figs. 6 and 7). V. GENERALIZED MC PENALTY In this section, we propose a multivariate generalization of the MC penalty (). The basic idea is to generalize () using the l norm and the generalized Huber function. Definition 7. Let B R M N. We define the generalized MC (GMC) penalty function ψ B : R N R as ψ B (x) := x S B (x) (4) where S B is the generalized Huber function (9). The GMC penalty reduces to a separable penalty when B T B is diagonal. Proposition. Let B R M N. If B T B is a diagonal matrix, then ψ B is separable (additive), comprising a sum of scalar MC penalties. Specifically, B T B = diag(α,..., α N) = ψ B (x) = n φ αn (x n ) where φ b is the scaled MC penalty (). If B T B =, then ψ B (x) = x. Proof. If B T B = diag(α,..., αn ), then by Proposition 7 we have ψ B (x) = x n = n s αn (x n ) x n s αn (x n ) which proves the result in light of definition (). 
The most interesting case (the case that motivates the GMC penalty) is the case where B^T B is a non-diagonal matrix. If B^T B is non-diagonal, then the GMC penalty is non-separable.

Example 3. For the two matrices B considered in the examples of Sec. IV, the GMC penalty is illustrated in Fig. 8 and Fig. 9, respectively.
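Since S_B has no simple explicit formula, surfaces and contours such as those in Figs. 6-9 can be produced by solving the inner minimization in the definition of S_B numerically. That inner problem, min_v ||v||_1 + (1/2)||B(x − v)||_2^2, is itself an ℓ1-regularized least squares problem, so for small examples a basic forward-backward (ISTA-type) loop suffices. The sketch below is one such illustration; the solver, step size, iteration count, and function names are choices made here, not specifications from the paper.

```python
import numpy as np

def soft(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def gen_huber(x, B, n_iter=500):
    """Generalized Huber function S_B(x) = min_v ||v||_1 + 0.5 ||B(x - v)||^2,
    evaluated by a forward-backward (ISTA-type) iteration on the inner problem."""
    BtB = B.T @ B
    mu = 1.0 / max(np.linalg.norm(BtB, 2), 1e-12)   # step size 1 / ||B^T B||_2
    v = np.zeros_like(x, dtype=float)
    for _ in range(n_iter):
        v = soft(v - mu * (BtB @ (v - x)), mu)
    return np.sum(np.abs(v)) + 0.5 * np.sum((B @ (x - v))**2)

def gmc_penalty(x, B, n_iter=500):
    """GMC penalty psi_B(x) = ||x||_1 - S_B(x)."""
    return np.sum(np.abs(x)) - gen_huber(x, B, n_iter)

B = np.array([[1.0, 0.5],
              [0.3, 1.2]])            # an arbitrary example; B^T B is non-diagonal
x = np.array([0.2, -1.5])
print(gen_huber(x, B), gmc_penalty(x, B))
# Sanity checks: 0 <= S_B(x) <= ||x||_1 (Proposition 8), hence 0 <= psi_B(x) <= ||x||_1.
```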

Fig. 8. The GMC penalty (surface and contours) for the matrix B in the first example of Sec. IV.

The following corollaries follow directly from Propositions 8 and 9.

Corollary 1. The generalized MC penalty satisfies

    ψ_B(x) ≤ ||x||_1   for all x ∈ R^N.   (44)

Corollary 2. Given B ∈ R^{M×N}, the generalized MC penalty satisfies

    ψ_B(x) = ||x||_1 − (1/2)||B x||_2^2   for all x such that ||B^T B x||_∞ ≤ 1.   (45)

The corollaries imply that around zero the generalized MC penalty approximates the ℓ1 norm (from below), i.e., ψ_B(x) ≈ ||x||_1 for x ≈ 0.

The generalized MC penalty has a basic property expected of a regularization function; namely, that large values are penalized more than (or the same as) small values. Specifically, if v, x ∈ R^N with |v_i| ≥ |x_i| and sign v_i = sign x_i for i = 1, ..., N, then ψ_B(v) ≥ ψ_B(x). That is, in any given quadrant, the function ψ_B(x) is a non-decreasing function of each |x_i|. This is formalized in the following proposition, and illustrated in Figs. 8 and 9. Basically, the gradient of ψ_B points away from the origin.

Proposition 13. Let x ∈ R^N with x_i ≠ 0. The generalized MC penalty ψ_B has the property that [∇ψ_B(x)]_i either has the same sign as x_i or is equal to zero.

Fig. 9. The GMC penalty (surface and contours) for the matrix B in the second example of Sec. IV.

Proof. Let x ∈ R^N with x_i ≠ 0. Then, from the definition of the GMC penalty,

    ∂ψ_B(x)/∂x_i = sign(x_i) − ∂S_B(x)/∂x_i.

By the gradient bound of Sec. IV, |∂S_B(x)/∂x_i| ≤ 1. Hence ∂ψ_B(x)/∂x_i ≥ 0 when x_i > 0, and ∂ψ_B(x)/∂x_i ≤ 0 when x_i < 0.

A penalty function not satisfying Proposition 13 would not be considered an effective sparsity-inducing regularizer.

VI. SPARSE REGULARIZATION

In this section, we consider how to set the GMC penalty to maintain the convexity of the regularized least squares cost function. To that end, condition (47) below generalizes the scalar convexity condition (25).

Theorem 1. Let y ∈ R^M, A ∈ R^{M×N}, and λ > 0. Define F: R^N → R as

    F(x) = (1/2)||y − A x||_2^2 + λ ψ_B(x)   (46)

where ψ_B: R^N → R is the generalized MC penalty defined in Sec. V. If

    B^T B ⪯ (1/λ) A^T A,   (47)

then F is a convex function.

9 9 Proof. Write F as F (x) = y Ax + λ ( x S B (x) ) = y Ax + λ x min λ v + λ v R N B(x } v) = max v R N y Ax + λ x λ v λ B(x } v) = max v R N xt( A T A λb T B ) x + λ x + g(x, v) } = xt( A T A λb T B ) x + λ x + max g(x, v) v R N where g is affine in x. The last term is convex as it is the pointwise maximum of a set of convex functions (Proposition 8.4 in [4]). Hence, F is convex if A T A λb T B is positive semidefinite. The convexity condition (47) is easily satisfied. Given A, we may simply set B = γ/λ A, γ. (48) Then B T B = (γ/λ)a T A which satisfies (47) when γ. The parameter γ controls the non-convexity of the penalty ψ B. If γ =, then B = and the penalty reduces to the l norm. If γ =, then (47) is satisfied with equality and the penalty is maximally non-convex. In practice, we use a nominal range of.5 γ.8. When A T A is diagonal, the proposed methodology reduces to element-wise firm thresholding. Proposition 4. Let y R M, A R M N, and λ >. If A T A is diagonal with positive diagonal entries and B is given by (48), then the minimizer of the cost function F in (46) is given by element-wise firm thresholding. Specifically, if then A T A = diag(α,..., α N ), (49) x opt n = firm([a T y] n /α n; λ/α n, λ/(γα n)) (5) when < γ, and when γ =. x opt n = soft([a T y] n /α n; λ/α n) (5) Proof. If A T A = diag(α,..., αn ), then y Ax = yt y x T A T y + xt A T Ax = yt y + ( xn [A T y] n + α nx n) n = n ( [A T y] n /α n α n x n ) + C where C does not depend on x. If B is given by (48), then B T B = (γ/λ) diag(α,..., α N). Using Proposition, we have ψ B (x) = n φ αn γ/λ (x n ). Hence, F in (46) is given by F (x) = [ ( [A T ) ] y] n /α n α n x n + λφαn γ/λ (x n ) +C n and so (5) follows from (8). VII. OPTIMIZATION ALGORITHM Even though the GMC penalty does not have a simple explicit formula, a global minimizer of the sparse-regularized cost function (46) can be readily calculated using proximal algorithms. It is not necessary to explicitly evaluate the GMC penalty or its gradient. To use proximal algorithms to minimize the cost function F in (46) when B satisfies (47), we rewrite it as a saddle-point problem: where (x opt, v opt ) = arg min max F (x, v) (5) x R N v R N F (x, v) = y Ax + λ x λ v λ B(x v) (5) If we use (48) with γ, then the saddle function is given by F (x, v) = y Ax + λ x λ v γ A(x v). (54) These saddle-point problems are instances of monotone inclusion problems. Hence, the solution can be obtained using the forward-backward (FB) algorithm for such a problems; see Theorem 5.8 of Ref. [4]. The FB algorithm involves only simple computational steps (soft-thresholding and the operators A and A T ). Proposition 5. Let λ > and γ <. Let y R N and A R M N. Then a saddle-point (x opt, v opt ) of F in (54) can be obtained by the iterative algorithm: Set ρ = max, γ/( γ)} A T A Set µ : < µ < /ρ For i =,,,... w (i) = x (i) µa T( A(x (i) + γ(v (i) x (i) )) y ) end u (i) = v (i) µγa T A(v (i) x (i) ) x (i+) = soft(w (i), µλ) v (i+) = soft(u (i), µλ) where i is the iteration counter. Proof. The point (x opt, v opt ) is a saddle-point of F if F (x opt, v opt ) where F is the subdifferential of F. From (54), we have x F (x, v) = A T (Ax y) γa T A(x v) + λ sign(x) v F (x, v) = γa T A(x v) λ sign(v). Hence, F if P (x, v) + Q(x, v) where [ [ ] [ ] ( γ)a P (x, v) = T A γa T A x A γa T A γa A] T T y v

10 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 7, PP , SEPTEMBER 7 (PREPRINT) Q(x, v) = [ ] λ sign(x). λ sign(v) Finding (x, v) such that P (x, v) + Q(x, v) is the problem of constructing a zero of a sum of operators. The operators P and Q are maximally monotone and P is single-valued and β-cocoercive with β > ; hence, the forward-backward algorithm (Theorem 5.8 in [4]) can be used. In the current notation, the forward-backward algorithm is [ ] [ ] w (i) x (i) u (i) = v (i) µp (x (i), v (i) ) [ ] x (i+) = J µq (w (i), u (i) ) v (i+) where J Q = (I +Q) is the resolvent of Q. The resolvent of the sign function is soft thresholding. The constant µ should be chosen < µ < β where P is β-cocoercive (Definition 4.4 in [4]), i.e., βp is firmly non-expansive. We now address the value β. By Corollary 4.(v) in [4], this condition is equivalent to P + P T βp T P. (55) We may write P using a Kronecker product, [ ] γ γ P = A T A. γ γ Then we have P + P T βp T P [ ] γ = A T A γ [ ] [ ] γ γ γ γ β (A T A) γ γ γ γ (([ ] [ ] [ ]) ) γ γ γ γ γ = β γ I γ γ γ γ N ( I ( I N β A T A )) ( I A T A ) where β = β β. Hence, (55) is satisfied if [ ] [ ] [ ] γ γ γ γ γ β γ γ γ γ γ and I N β A T A. These conditions are repsectively satisfied if and β / max, γ/( γ)} β / A T A. The FB algorithm requires that P be β-cocoercive with β > ; hence, γ = is precluded. If γ = in Proposition 5, then the algorithm reduces to the classic iterative shrinkage/thresholding algorithm (ISTA) [], [6]. The Douglas-Rachford algorithm (Theorem 5.6 in [4]) may also be used to find a saddle-point of F in (54) Noise free signal 5 Denoising [L norm] RMSE =.4 5 FFT of noisy signal Frequency Noisy signal, σ =. 5 Denoising [GMC penalty] RMSE =.49 5 Optimized Fourier coefficients o GMC x L norm Frequency Fig.. Denoising using the l norm and the proposed GMC penalty. The plot of optimized coefficients shows only the non-zero values. VIII. NUMERICAL EXAMPLES A. Denoising using frequency-domain sparsity This example illustrates the use of the GMC penalty for denoising [8]. Specifically, we consider the estimation of the discrete-time signal g(m) = cos(πf m) + sin(πf m), m =,..., M of length M = with frequencies f =. and f =.. This signal is sparse in the frequency domain, so we model the signal as g = Ax where A is an over-sampled inverse discrete Fourier transform and x C N is a sparse vector of Fourier coefficients with N M. Specifically, we define the matrix A C M N as A m,n = ( / N ) exp(j(π/n)mn), m =,..., M, n =,..., N with N = 56. The columns of A form a normalized tight frame, i.e., AA H = I where A H is the complex conjugate transpose of A. For the denoising experiment, we corrupt the signal with additive white Gaussian noise (AWGN) with standard deviation σ =., as illustrated in Fig.. In addition to the l norm and proposed GMC penalty, we use several other methods: debiasing the l norm solution [7], iterative p-shrinkage (IPS) [6], [64], and multivariate sparse regularization (MUSR) [5]. Debiasing the l norm solution is a two-step approach where the l -norm solution is used to estimate the support, then the identified non-zero values are reestimated by un-regularized least squares. The IPS algorithm is a type of iterative thresholding algorithm that performed particularly well in a detailed comparison of several algorithms [55]. MUSR regularization is a precursor of the GMC penalty,

11 Average RMSE Denoising via frequency domain sparsity L norm L + debias IPS MUSR GMC λ Fig.. Average RMSE for three denoising methods. i.e., a non-separable non-convex penalty designed to maintain cost function convexity, but with a simpler functional form. In this denoising experiment, we use realizations of the noise. Each method calls for a regularization parameter λ to be set. We vary λ from.5 to.5 (with increment.5) and evaluate the RMSE for each method, for each λ, and for each realization. For the GMC method we must also specify the matrix B, which we set using (48) with γ =.8. Since B H B is not diagonal, the GMC penalty is non-separable. The average RMSE as a function of λ for each method is shown in Fig.. The GMC compares favorably with the other methods, achieving the minimum average RMSE. The next bestperforming method is debiasing of the l -norm solution, which performs almost as well as GMC. Note that this debiasing method does not minimize an initially prescribed cost function, in contrast to the other methods. The IPS algorithm aims to minimize a (non-convex) cost function. Figure shows the l -norm and GMC solutions for a particular noise realization. The solutions shown in this figure were obtained using the value of λ that minimizes the average RMSE (λ =. and λ =., respectively). Comparing the l norm and GMC solutions, we observe: the GMC solution is more sparse in the frequency domain; and the l norm solution underestimates the coefficient amplitudes. Neither increasing nor decreasing the regularization parameter λ helps the l -norm solution here. A larger value of λ makes the l -norm solution sparser, but reduces the coefficient amplitudes. A smaller value of λ increases the coefficient amplitudes of the l -norm solution, but makes the solution less sparse and more noisy. Note that the purpose of this example is to compare the proposed GMC penalty with other sparse regularizers. We are not advocating it for frequency estimation per se. B. Denoising using time-frequency sparsity This example considers the denoising of a bat echolocation pulse, shown in Fig. (sampling period of 7 microseconds). The bat pulse can be modeled as sparse in the time- The bat echolocation pulse data is curtesy of Curtis Condon, Ken White, and Al Feng of the Beckman Center at the University of Illinois. Available online at True signal. Noisy signal.. Denoising using L norm. RMSE =.6. Denoising using GMC penalty. RMSE =.6. Frequency (khz) Frequency (khz) Frequency (khz) Frequency (khz) 6 4 Time frequency profile (db) Time frequency profile (db) 6 4 Time frequency profile (db) 6 4 Time frequency profile (db) 6 4 Fig.. Denoising a bat echolocation pulse using the l norm and GMC penalty. The GMC penalty results in fewer extraneous noise artifacts in the time-frequency representation. frequency domain. We use a short-time Fourier transform (STFT) with 75% overlapping segments (the transform is fourtimes overcomplete). We implement the STFT as a normalized tight frame, i.e., AA H = I. The bat pulse and its spectrogram are illustrated in Fig.. For the denoising experiment, we contaminate the pulse with AWGN (σ =.5). We perform denoising by estimating the STFT coefficients by minimizing the cost function F in (46) where A represents the inverse STFT operator. We set λ so as to minimize the rootmean-square error (RMSE). This leads to the values λ =. and λ =.5 for the l -norm and GMC penalties, respectively. For the GMC penalty, we set B as in (48) with γ =.7. 
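For reference, the following sketch shows how the forward-backward iteration of Sec. VII can be implemented for a generic problem of the form (46) with B set as in (48). Here A is taken to be a dense real matrix, and the step size, iteration count, synthetic data, and function names are illustrative choices rather than the paper's reference implementation; for the tight-frame operators used in these examples, A and its (conjugate) transpose would instead be applied as matrix-free transforms.

```python
import numpy as np

def soft(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def gmc_lsq(y, A, lam, gamma=0.8, n_iter=1000):
    """Minimize F(x) = 0.5 ||y - A x||^2 + lam * psi_B(x), with B = sqrt(gamma/lam) A,
    via the forward-backward saddle-point iteration of Sec. VII (requires 0 <= gamma < 1)."""
    assert 0.0 <= gamma < 1.0
    AtA = A.T @ A
    Aty = A.T @ y
    # Step size: 0 < mu < 2 / rho, with rho = max(1, gamma/(1 - gamma)) * ||A^T A||_2.
    rho = max(1.0, gamma / (1.0 - gamma)) * np.linalg.norm(AtA, 2)
    mu = 1.9 / rho
    x = np.zeros(A.shape[1])
    v = np.zeros(A.shape[1])
    for _ in range(n_iter):
        w = x - mu * (AtA @ (x + gamma * (v - x)) - Aty)   # forward step in x
        u = v - mu * gamma * (AtA @ (v - x))               # forward step in v
        x = soft(w, mu * lam)                              # backward (proximal) steps
        v = soft(u, mu * lam)
    return x

# Small synthetic usage example (not the paper's experiment):
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100); x_true[[5, 37, 80]] = [3.0, -2.0, 4.0]
y = A @ x_true + 0.5 * rng.standard_normal(50)
x_hat = gmc_lsq(y, A, lam=4.0, gamma=0.8)
print(np.count_nonzero(np.round(x_hat, 3)))                # sparse estimate
```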
Since B H B is not diagonal, the GMC penalty is non-separable. We

12 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 7, PP , SEPTEMBER 7 (PREPRINT) then estimate the bat pulse by computing the inverse STFT of the optimized coefficients. With λ individually set for each method, the resulting RMSE is about the same (.6). The optimized STFT coefficients (time-frequency representation) for each solution is shown in Fig.. We observe that the GMC solution has substantially fewer extraneous noise artifacts in the time-frequency representation, compared to the l norm solution. (The time-frequency representations in Fig. are shown in decibels with db being black and -5 db being white.) C. Sparsity-assisted signal smoothing This example uses the GMC penalty for sparsity-assisted signal smoothing (SASS) [5], [5]. The SASS method is suitable for the denoising of signals that are smooth for the exception of singularities. Here, we use SASS to denoise the biosensor data illustrated in Fig. (a), which exhibits jump discontinuities. This data was acquired using a whispering gallery mode (WGM) sensor designed to detect nano-particles with high sensitivity [], []. Nano-particles show up as jump discontinuities in the data. The SASS technique formulates the denoising problem as a sparse deconvolution problem. The cost function to be minimized has the form (46). The exact cost function, given by equation (4) in Ref. [5], depends on a prescribed lowpass filter and the order of the singularities the signal is assumed to posses. For the biosensor data shown in Fig., the singularities are of order K = since the first-order derivative of the signal exhibits impulses. In this example, we use a low-pass filter of order d = and cut-off frequency f c =. (these parameter values designate a low-pass filter as described in [5]). We set λ = and, for the GMC penalty, we set γ =.7. Solving the SASS problem using the l norm and GMC penalty yields the denoised signals shown in Figs. (b) and (c), respectively. The amplitudes of the jump discontinuities are indicated in the figure. It can be seen, especially in Fig. (d), that the GMC solution estimates the jump discontinuities more accurately than the l norm solution. The l norm solution tends to underestimate the amplitudes of the jump discontinuities. To reduce this tendency, a smaller value of λ could be used, but that tends to produce false discontinuities (false detections). IX. CONCLUSION In regards to the sparse-regularized linear least squares problem, this work bridges the convex (i.e., l norm) and the non-convex (e.g., l p norm with p < ) approaches, which are usually mutually exclusive and incompatible. Specifically, this work formulates the sparse-regularized linear least squares problem using a non-convex generalization of the l norm that preserves the convexity of the cost function to be minimized. The proposed method leads to optimization problems with no extraneous suboptimal local minima and allows the leveraging of globally convergent, computationally efficient, scalable convex optimization algorithms. The advantage compared to l norm regularization is (i) more accurate estimation of highamplitude components of sparse solutions or (ii) a higher level Biosensor data 5 5 (a).8 6. SASS using L norm 5 5 (b) 5 5 (c) data L norm GMC. SASS using GMC penalty (d) 6.5 Magnified view Fig.. Sparsity-assisted signal smoothing (SASS) using l -norm and GMC regularization, as applied to biosensor data. The GMC method more accurately estimates jump discontinuities. of sparsity in a sparse approximation problem. 
The sparse regularizer is expressed as the ℓ1 norm minus a smooth convex function defined via infimal convolution. In the scalar case, the method reduces to firm thresholding (a generalization of soft thresholding).

Several extensions of this method are of interest. For example, the idea may admit extension to more general convex regularizers such as total variation [48], the nuclear norm [], mixed norms [4], composite regularizers [], [], co-sparse regularization [4], and, more generally, atomic norms [4] and partly smooth regularizers [6]. Another extension of

13 interest is to problems where the data fidelity term is not quadratic (e.g., Poisson denoising [4]). REFERENCES [] M. V. Afonso, J. M. Bioucas-Dias, and M. A. T. Figueiredo. An augmented Lagrangian approach to linear inverse problems with compound regularization. In Proc. IEEE Int. Conf. Image Processing (ICIP), pages , September. [] R. Ahmad and P. Schniter. Iteratively reweighted L approaches to sparse composite regularization. IEEE Trans. Comput. Imaging, (4): 5, 5. [] S. Arnold, M. Khoshima, I. Teraoka, S. Holler, and F. Vollmer. Shift of whispering-gallery modes in microspheres by protein adsorption. Opt. Lett, 8(4):7 74, February 5,. [4] H. H. Bauschke and P. L. Combettes. Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer,. [5] İ. Bayram. Penalty functions derived from monotone mappings. IEEE Signal Processing Letters, ():65 69, March 5. [6] I. Bayram. On the convergence of the iterative shrinkage/thresholding algorithm with a weakly convex penalty. IEEE Trans. Signal Process., 64(6):597 68, March 6. [7] A. Blake and A. Zisserman. Visual Reconstruction. MIT Press, 987. [8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 4. [9] A. Bruckstein, D. Donoho, and M. Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 5():4 8, 9. [] E. J. Candes and Y. Plan. Matrix completion with noise. Proc. IEEE, 98(6):95 96, June. [] E. J. Candès, M. B. Wakin, and S. Boyd. Enhancing sparsity by reweighted l minimization. J. Fourier Anal. Appl., 4(5):877 95, December 8. [] M. Carlsson. On convexification/optimization of functionals including an l-misfit term. September 6. [] M. Castella and J.-C. Pesquet. Optimization of a Geman-McClure like criterion for sparse signal deconvolution. In IEEE Int. Workshop Comput. Adv. Multi-Sensor Adaptive Proc. (CAMSAP), pages 9, December 5. [4] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. S. Willsky. The convex geometry of linear inverse problems. Foundations of Computational mathematics, (6):85 849,. [5] R. Chartrand. Shrinkage mappings and their induced penalty functions. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), pages 6 9, May 4. [6] L. Chen and Y. Gu. The convergence guarantees of a non-convex approach for sparse recovery. IEEE Trans. Signal Process., 6(5): , August 4. [7] P.-Y. Chen and I. W. Selesnick. Group-sparse signal denoising: Nonconvex regularization, convex optimization. IEEE Trans. Signal Process., 6(): , July 4. [8] S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., (): 6, 998. [9] E. Chouzenoux, A. Jezierska, J. Pesquet, and H. Talbot. A majorizeminimize subspace approach for l l image regularization. SIAM J. Imag. Sci., 6():56 59,. [] P. L. Combettes. Perspective functions: Properties, constructions, and examples. Set-Valued and Variational Analysis, pages 8, 7. [] V. R. Dantham, S. Holler, V. Kolchenko, Z. Wan, and S. Arnold. Taking whispering gallery-mode single virus detection and sizing to the limit. Appl. Phys. Lett., (4),. [] I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math, 57():4 457, 4. [] Y. Ding and I. W. Selesnick. Artifact-free wavelet denoising: Nonconvex sparse regularization, convex optimization. IEEE Signal Processing Letters, (9):64 68, September 5. [4] F.-X. Dupé, J. M. Fadili, and J.-L. Starck. 
A proximal iteration for deconvolving Poisson noisy images using sparse representations. IEEE Trans. Image Process., 8():, 9. [5] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc., 96(456):48 6,. [6] M. Figueiredo and R. Nowak. An EM algorithm for wavelet-based image restoration. IEEE Trans. Image Process., (8):96 96, August. [7] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process., (4): , December 7. [8] M. Fornasier and H. Rauhut. Iterative thresholding algorithms. J. of Appl. and Comp. Harm. Analysis, 5():87 8, 8. [9] H.-Y. Gao and A. G. Bruce. Waveshrink with firm shrinkage. Statistica Sinica, 7: , 997. [] G. Gasso, A. Rakotomamonjy, and S. Canu. Recovering sparse signals with a certain family of nonconvex penalties and DC programming. IEEE Trans. Signal Process., 57(): , December 9. [] A. Gholami and S. M. Hosseini. A general framework for sparsity-based denoising and inversion. IEEE Trans. Signal Process., 59():5 5, November. [] W. He, Y. Ding, Y. Zi, and I. W. Selesnick. Sparsity-based algorithm for detecting faults in rotating machines. Mechanical Systems and Signal Processing, 7-7:46 64, May 6. [] P. J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 5():7, 964. [4] M. Kowalski and B. Torrésani. Sparsity and persistence: mixed norms provide simple signal models with dependent coefficients. Signal, Image and Video Processing, ():5 64, 9. [5] A. Lanza, S. Morigi, I. Selesnick, and F. Sgallari. Nonconvex nonsmooth optimization via convex nonconvex majorization minimization. Numerische Mathematik, 6():4 8, 7. [6] A. Lanza, S. Morigi, and F. Sgallari. Convex image denoising via nonconvex regularization with parameter selection. J. Math. Imaging and Vision, 56():95, 6. [7] M. Malek-Mohammadi, C. R. Rojas, and B. Wahlberg. A class of nonconvex penalties preserving overall convexity in optimizationbased mean filtering. IEEE Trans. Signal Process., 64(4): , December 6. [8] Y. Marnissi, A. Benazza-Benyahia, E. Chouzenoux, and J.-C. Pesquet. Generalized multivariate exponential power prior for wavelet-based multichannel image restoration. In Proc. IEEE Int. Conf. Image Processing (ICIP),. [9] H. Mohimani, M. Babaie-Zadeh, and C. Jutten. A fast approach for overcomplete sparse decomposition based on smoothed l norm. IEEE Trans. Signal Process., 57():89, January 9. [4] S. Nam, M. E. Davies, M. Elad, and R. Gribonval. The cosparse analysis model and algorithms. J. of Appl. and Comp. Harm. Analysis, 4(): 56,. [4] M. Nikolova. Estimation of binary images by minimizing convex criteria. In Proc. IEEE Int. Conf. Image Processing (ICIP), pages 8 vol., 998. [4] M. Nikolova. Markovian reconstruction using a GNC approach. IEEE Trans. Image Process., 8(9):4, 999. [4] M. Nikolova. Energy minimization methods. In O. Scherzer, editor, Handbook of Mathematical Methods in Imaging, chapter 5, pages Springer,. [44] M. Nikolova, M. K. Ng, and C.-P. Tam. Fast nonconvex nonsmooth minimization methods for image restoration and reconstruction. IEEE Trans. Image Process., 9():7 88, December. [45] A. Parekh and I. W. Selesnick. Enhanced low-rank matrix approximation. IEEE Signal Processing Letters, (4):49 497, April 6. [46] N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, ():, 4. [47] J. Portilla and L. Mancera. 
L-based sparse approximation: two alternative methods and some applications. In Proceedings of SPIE, volume 67 (Wavelets XII), San Diego, CA, USA, 7. [48] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 6:59 68, 99. [49] I. Selesnick. Sparsity amplified. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), pages , March 7. [5] I. Selesnick. Sparsity-assisted signal smoothing (revisited). In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), pages , March 7. [5] I. Selesnick. Total variation denoising via the Moreau envelope. IEEE Signal Processing Letters, 4():6, February 7. [5] I. Selesnick and M. Farshchian. Sparse signal approximation via nonseparable regularization. IEEE Trans. Signal Process., 65():56 575, May 7. [5] I. W. Selesnick. Sparsity-assisted signal smoothing. In R. Balan et al., editors, Excursions in Harmonic Analysis, Volume 4, pages Birkhäuser Basel, 5.

14 4 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 65, NO. 7, PP , SEPTEMBER 7 (PREPRINT) [54] I. W. Selesnick and I. Bayram. Sparse signal estimation by maximally sparse convex optimization. IEEE Trans. Signal Process., 6(5):78 9, March 4. [55] I. W. Selesnick and I. Bayram. Enhanced sparsity by non-separable regularization. IEEE Trans. Signal Process., 64(9):98, May 6. [56] I. W. Selesnick, A. Parekh, and I. Bayram. Convex -D total variation denoising with non-convex regularization. IEEE Signal Processing Letters, ():4 44, February 5. [57] E. Soubies, L. Blanc-Féraud, and G. Aubert. A continuous exact l penalty (CEL) for least squares regularized problem. SIAM J. Imag. Sci., 8():67 69, 5. [58] C. Soussen, J. Idier, J. Duan, and D. Brie. Homotopy based algorithms for l -regularized least-squares. IEEE Trans. Signal Process., 6(): 6, July 5. [59] J.-L. Starck, F. Murtagh, and J. Fadili. Sparse image and signal processing: Wavelets and related geometric multiscale analysis. Cambridge University Press, 5. [6] M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. J. Machine Learning Research, : 44,. [6] S. Vaiter, C. Deledalle, J. Fadili, G. Peyré, and C. Dossal. The degrees of freedom of partly smooth regularizers. Annals of the Institute of Statistical Mathematics, pages 4, 6. [6] S. Voronin and R. Chartrand. A new generalized thresholding algorithm for inverse problems with sparsity constraints. In Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), pages 66 64, May. [6] D. P. Wipf, B. D. Rao, and S. Nagarajan. Latent variable Bayesian models for promoting sparsity. IEEE Trans. Inform. Theory, 57(9):66 655, September. [64] J. Woodworth and R. Chartrand. Compressed sensing recovery via nonconvex shrinkage penalties. Inverse Problems, (7): , July 6. [65] C.-H. Zhang. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics, pages ,. [66] H. Zou and R. Li. One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist., 6(4):59 5, 8.


More information

EUSIPCO

EUSIPCO EUSIPCO 013 1569746769 SUBSET PURSUIT FOR ANALYSIS DICTIONARY LEARNING Ye Zhang 1,, Haolong Wang 1, Tenglong Yu 1, Wenwu Wang 1 Department of Electronic and Information Engineering, Nanchang University,

More information

On Optimal Frame Conditioners

On Optimal Frame Conditioners On Optimal Frame Conditioners Chae A. Clark Department of Mathematics University of Maryland, College Park Email: cclark18@math.umd.edu Kasso A. Okoudjou Department of Mathematics University of Maryland,

More information

A Unified Approach to Proximal Algorithms using Bregman Distance

A Unified Approach to Proximal Algorithms using Bregman Distance A Unified Approach to Proximal Algorithms using Bregman Distance Yi Zhou a,, Yingbin Liang a, Lixin Shen b a Department of Electrical Engineering and Computer Science, Syracuse University b Department

More information

LINEARIZED BREGMAN ITERATIONS FOR FRAME-BASED IMAGE DEBLURRING

LINEARIZED BREGMAN ITERATIONS FOR FRAME-BASED IMAGE DEBLURRING LINEARIZED BREGMAN ITERATIONS FOR FRAME-BASED IMAGE DEBLURRING JIAN-FENG CAI, STANLEY OSHER, AND ZUOWEI SHEN Abstract. Real images usually have sparse approximations under some tight frame systems derived

More information

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit

New Coherence and RIP Analysis for Weak. Orthogonal Matching Pursuit New Coherence and RIP Analysis for Wea 1 Orthogonal Matching Pursuit Mingrui Yang, Member, IEEE, and Fran de Hoog arxiv:1405.3354v1 [cs.it] 14 May 2014 Abstract In this paper we define a new coherence

More information

Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise

Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise Adaptive Corrected Procedure for TVL1 Image Deblurring under Impulsive Noise Minru Bai(x T) College of Mathematics and Econometrics Hunan University Joint work with Xiongjun Zhang, Qianqian Shao June 30,

More information

Sparse Solutions of an Undetermined Linear System

Sparse Solutions of an Undetermined Linear System 1 Sparse Solutions of an Undetermined Linear System Maddullah Almerdasy New York University Tandon School of Engineering arxiv:1702.07096v1 [math.oc] 23 Feb 2017 Abstract This work proposes a research

More information

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases

A Generalized Uncertainty Principle and Sparse Representation in Pairs of Bases 2558 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 48, NO 9, SEPTEMBER 2002 A Generalized Uncertainty Principle Sparse Representation in Pairs of Bases Michael Elad Alfred M Bruckstein Abstract An elementary

More information

Translation-Invariant Shrinkage/Thresholding of Group Sparse. Signals

Translation-Invariant Shrinkage/Thresholding of Group Sparse. Signals Translation-Invariant Shrinkage/Thresholding of Group Sparse Signals Po-Yu Chen and Ivan W. Selesnick Polytechnic Institute of New York University, 6 Metrotech Center, Brooklyn, NY 11201, USA Email: poyupaulchen@gmail.com,

More information

A Parallel Block-Coordinate Approach for Primal-Dual Splitting with Arbitrary Random Block Selection

A Parallel Block-Coordinate Approach for Primal-Dual Splitting with Arbitrary Random Block Selection EUSIPCO 2015 1/19 A Parallel Block-Coordinate Approach for Primal-Dual Splitting with Arbitrary Random Block Selection Jean-Christophe Pesquet Laboratoire d Informatique Gaspard Monge - CNRS Univ. Paris-Est

More information

An Homotopy Algorithm for the Lasso with Online Observations

An Homotopy Algorithm for the Lasso with Online Observations An Homotopy Algorithm for the Lasso with Online Observations Pierre J. Garrigues Department of EECS Redwood Center for Theoretical Neuroscience University of California Berkeley, CA 94720 garrigue@eecs.berkeley.edu

More information

PART II: Basic Theory of Half-quadratic Minimization

PART II: Basic Theory of Half-quadratic Minimization PART II: Basic Theory of Half-quadratic Minimization Ran He, Wei-Shi Zheng and Liang Wang 013-1-01 Outline History of half-quadratic (HQ) minimization Half-quadratic minimization HQ loss functions HQ in

More information

Math 273a: Optimization Overview of First-Order Optimization Algorithms

Math 273a: Optimization Overview of First-Order Optimization Algorithms Math 273a: Optimization Overview of First-Order Optimization Algorithms Wotao Yin Department of Mathematics, UCLA online discussions on piazza.com 1 / 9 Typical flow of numerical optimization Optimization

More information

A New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction

A New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction A New Look at First Order Methods Lifting the Lipschitz Gradient Continuity Restriction Marc Teboulle School of Mathematical Sciences Tel Aviv University Joint work with H. Bauschke and J. Bolte Optimization

More information

Symmetric Wavelet Tight Frames with Two Generators

Symmetric Wavelet Tight Frames with Two Generators Symmetric Wavelet Tight Frames with Two Generators Ivan W. Selesnick Electrical and Computer Engineering Polytechnic University 6 Metrotech Center, Brooklyn, NY 11201, USA tel: 718 260-3416, fax: 718 260-3906

More information

DNNs for Sparse Coding and Dictionary Learning

DNNs for Sparse Coding and Dictionary Learning DNNs for Sparse Coding and Dictionary Learning Subhadip Mukherjee, Debabrata Mahapatra, and Chandra Sekhar Seelamantula Department of Electrical Engineering, Indian Institute of Science, Bangalore 5612,

More information

SCRIBERS: SOROOSH SHAFIEEZADEH-ABADEH, MICHAËL DEFFERRARD

SCRIBERS: SOROOSH SHAFIEEZADEH-ABADEH, MICHAËL DEFFERRARD EE-731: ADVANCED TOPICS IN DATA SCIENCES LABORATORY FOR INFORMATION AND INFERENCE SYSTEMS SPRING 2016 INSTRUCTOR: VOLKAN CEVHER SCRIBERS: SOROOSH SHAFIEEZADEH-ABADEH, MICHAËL DEFFERRARD STRUCTURED SPARSITY

More information

Wavelet Based Image Restoration Using Cross-Band Operators

Wavelet Based Image Restoration Using Cross-Band Operators 1 Wavelet Based Image Restoration Using Cross-Band Operators Erez Cohen Electrical Engineering Department Technion - Israel Institute of Technology Supervised by Prof. Israel Cohen 2 Layout Introduction

More information

COMPRESSED Sensing (CS) is a method to recover a

COMPRESSED Sensing (CS) is a method to recover a 1 Sample Complexity of Total Variation Minimization Sajad Daei, Farzan Haddadi, Arash Amini Abstract This work considers the use of Total Variation (TV) minimization in the recovery of a given gradient

More information

OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE

OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE 17th European Signal Processing Conference (EUSIPCO 009) Glasgow, Scotland, August 4-8, 009 OPTIMAL SURE PARAMETERS FOR SIGMOIDAL WAVELET SHRINKAGE Abdourrahmane M. Atto 1, Dominique Pastor, Gregoire Mercier

More information

Sparse signal representation and the tunable Q-factor wavelet transform

Sparse signal representation and the tunable Q-factor wavelet transform Sparse signal representation and the tunable Q-factor wavelet transform Ivan Selesnick Polytechnic Institute of New York University Brooklyn, New York Introduction Problem: Decomposition of a signal into

More information

Robust Principal Component Analysis

Robust Principal Component Analysis ELE 538B: Mathematics of High-Dimensional Data Robust Principal Component Analysis Yuxin Chen Princeton University, Fall 2018 Disentangling sparse and low-rank matrices Suppose we are given a matrix M

More information

Sparse regression. Optimization-Based Data Analysis. Carlos Fernandez-Granda

Sparse regression. Optimization-Based Data Analysis.   Carlos Fernandez-Granda Sparse regression Optimization-Based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_spring16 Carlos Fernandez-Granda 3/28/2016 Regression Least-squares regression Example: Global warming Logistic

More information

Lecture Notes 9: Constrained Optimization

Lecture Notes 9: Constrained Optimization Optimization-based data analysis Fall 017 Lecture Notes 9: Constrained Optimization 1 Compressed sensing 1.1 Underdetermined linear inverse problems Linear inverse problems model measurements of the form

More information

c 2011 International Press Vol. 18, No. 1, pp , March DENNIS TREDE

c 2011 International Press Vol. 18, No. 1, pp , March DENNIS TREDE METHODS AND APPLICATIONS OF ANALYSIS. c 2011 International Press Vol. 18, No. 1, pp. 105 110, March 2011 007 EXACT SUPPORT RECOVERY FOR LINEAR INVERSE PROBLEMS WITH SPARSITY CONSTRAINTS DENNIS TREDE Abstract.

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2.

APPENDIX A. Background Mathematics. A.1 Linear Algebra. Vector algebra. Let x denote the n-dimensional column vector with components x 1 x 2. APPENDIX A Background Mathematics A. Linear Algebra A.. Vector algebra Let x denote the n-dimensional column vector with components 0 x x 2 B C @. A x n Definition 6 (scalar product). The scalar product

More information

A Dykstra-like algorithm for two monotone operators

A Dykstra-like algorithm for two monotone operators A Dykstra-like algorithm for two monotone operators Heinz H. Bauschke and Patrick L. Combettes Abstract Dykstra s algorithm employs the projectors onto two closed convex sets in a Hilbert space to construct

More information

An iterative hard thresholding estimator for low rank matrix recovery

An iterative hard thresholding estimator for low rank matrix recovery An iterative hard thresholding estimator for low rank matrix recovery Alexandra Carpentier - based on a joint work with Arlene K.Y. Kim Statistical Laboratory, Department of Pure Mathematics and Mathematical

More information

Convex Hodge Decomposition of Image Flows

Convex Hodge Decomposition of Image Flows Convex Hodge Decomposition of Image Flows Jing Yuan 1, Gabriele Steidl 2, Christoph Schnörr 1 1 Image and Pattern Analysis Group, Heidelberg Collaboratory for Image Processing, University of Heidelberg,

More information

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition

Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Robust Principal Component Analysis Based on Low-Rank and Block-Sparse Matrix Decomposition Gongguo Tang and Arye Nehorai Department of Electrical and Systems Engineering Washington University in St Louis

More information

Tractable Upper Bounds on the Restricted Isometry Constant

Tractable Upper Bounds on the Restricted Isometry Constant Tractable Upper Bounds on the Restricted Isometry Constant Alex d Aspremont, Francis Bach, Laurent El Ghaoui Princeton University, École Normale Supérieure, U.C. Berkeley. Support from NSF, DHS and Google.

More information

Uniqueness Conditions for A Class of l 0 -Minimization Problems

Uniqueness Conditions for A Class of l 0 -Minimization Problems Uniqueness Conditions for A Class of l 0 -Minimization Problems Chunlei Xu and Yun-Bin Zhao October, 03, Revised January 04 Abstract. We consider a class of l 0 -minimization problems, which is to search

More information

ϕ ( ( u) i 2 ; T, a), (1.1)

ϕ ( ( u) i 2 ; T, a), (1.1) CONVEX NON-CONVEX IMAGE SEGMENTATION RAYMOND CHAN, ALESSANDRO LANZA, SERENA MORIGI, AND FIORELLA SGALLARI Abstract. A convex non-convex variational model is proposed for multiphase image segmentation.

More information

A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration

A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration A memory gradient algorithm for l 2 -l 0 regularization with applications to image restoration E. Chouzenoux, A. Jezierska, J.-C. Pesquet and H. Talbot Université Paris-Est Lab. d Informatique Gaspard

More information

In collaboration with J.-C. Pesquet A. Repetti EC (UPE) IFPEN 16 Dec / 29

In collaboration with J.-C. Pesquet A. Repetti EC (UPE) IFPEN 16 Dec / 29 A Random block-coordinate primal-dual proximal algorithm with application to 3D mesh denoising Emilie CHOUZENOUX Laboratoire d Informatique Gaspard Monge - CNRS Univ. Paris-Est, France Horizon Maths 2014

More information

Bayesian Methods for Sparse Signal Recovery

Bayesian Methods for Sparse Signal Recovery Bayesian Methods for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Jason Palmer, Zhilin Zhang and Ritwik Giri Motivation Motivation Sparse Signal Recovery

More information

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem

Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Iterative Convex Optimization Algorithms; Part One: Using the Baillon Haddad Theorem Charles Byrne (Charles Byrne@uml.edu) http://faculty.uml.edu/cbyrne/cbyrne.html Department of Mathematical Sciences

More information

Dual and primal-dual methods

Dual and primal-dual methods ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method

More information

POISSON noise, also known as photon noise, is a basic

POISSON noise, also known as photon noise, is a basic IEEE SIGNAL PROCESSING LETTERS, VOL. N, NO. N, JUNE 2016 1 A fast and effective method for a Poisson denoising model with total variation Wei Wang and Chuanjiang He arxiv:1609.05035v1 [math.oc] 16 Sep

More information

Sparse signal representation and the tunable Q-factor wavelet transform

Sparse signal representation and the tunable Q-factor wavelet transform Sparse signal representation and the tunable Q-factor wavelet transform Ivan Selesnick Polytechnic Institute of New York University Brooklyn, New York Introduction Problem: Decomposition of a signal into

More information

Robust PCA. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng

Robust PCA. CS5240 Theoretical Foundations in Multimedia. Leow Wee Kheng Robust PCA CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Robust PCA 1 / 52 Previously...

More information

Robust Sparse Recovery via Non-Convex Optimization

Robust Sparse Recovery via Non-Convex Optimization Robust Sparse Recovery via Non-Convex Optimization Laming Chen and Yuantao Gu Department of Electronic Engineering, Tsinghua University Homepage: http://gu.ee.tsinghua.edu.cn/ Email: gyt@tsinghua.edu.cn

More information

446 SCIENCE IN CHINA (Series F) Vol. 46 introduced in refs. [6, ]. Based on this inequality, we add normalization condition, symmetric conditions and

446 SCIENCE IN CHINA (Series F) Vol. 46 introduced in refs. [6, ]. Based on this inequality, we add normalization condition, symmetric conditions and Vol. 46 No. 6 SCIENCE IN CHINA (Series F) December 003 Construction for a class of smooth wavelet tight frames PENG Lizhong (Λ Π) & WANG Haihui (Ξ ) LMAM, School of Mathematical Sciences, Peking University,

More information

A direct formulation for sparse PCA using semidefinite programming

A direct formulation for sparse PCA using semidefinite programming A direct formulation for sparse PCA using semidefinite programming A. d Aspremont, L. El Ghaoui, M. Jordan, G. Lanckriet ORFE, Princeton University & EECS, U.C. Berkeley Available online at www.princeton.edu/~aspremon

More information

Dual methods for the minimization of the total variation

Dual methods for the minimization of the total variation 1 / 30 Dual methods for the minimization of the total variation Rémy Abergel supervisor Lionel Moisan MAP5 - CNRS UMR 8145 Different Learning Seminar, LTCI Thursday 21st April 2016 2 / 30 Plan 1 Introduction

More information

EMPLOYING PHASE INFORMATION FOR AUDIO DENOISING. İlker Bayram. Istanbul Technical University, Istanbul, Turkey

EMPLOYING PHASE INFORMATION FOR AUDIO DENOISING. İlker Bayram. Istanbul Technical University, Istanbul, Turkey EMPLOYING PHASE INFORMATION FOR AUDIO DENOISING İlker Bayram Istanbul Technical University, Istanbul, Turkey ABSTRACT Spectral audio denoising methods usually make use of the magnitudes of a time-frequency

More information

Least Sparsity of p-norm based Optimization Problems with p > 1

Least Sparsity of p-norm based Optimization Problems with p > 1 Least Sparsity of p-norm based Optimization Problems with p > Jinglai Shen and Seyedahmad Mousavi Original version: July, 07; Revision: February, 08 Abstract Motivated by l p -optimization arising from

More information

Least Squares with Examples in Signal Processing 1. 2 Overdetermined equations. 1 Notation. The sum of squares of x is denoted by x 2 2, i.e.

Least Squares with Examples in Signal Processing 1. 2 Overdetermined equations. 1 Notation. The sum of squares of x is denoted by x 2 2, i.e. Least Squares with Eamples in Signal Processing Ivan Selesnick March 7, 3 NYU-Poly These notes address (approimate) solutions to linear equations by least squares We deal with the easy case wherein the

More information

Signal Recovery, Uncertainty Relations, and Minkowski Dimension

Signal Recovery, Uncertainty Relations, and Minkowski Dimension Signal Recovery, Uncertainty Relations, and Minkowski Dimension Helmut Bőlcskei ETH Zurich December 2013 Joint work with C. Aubel, P. Kuppinger, G. Pope, E. Riegler, D. Stotz, and C. Studer Aim of this

More information

ROBUST BLIND SPIKES DECONVOLUTION. Yuejie Chi. Department of ECE and Department of BMI The Ohio State University, Columbus, Ohio 43210

ROBUST BLIND SPIKES DECONVOLUTION. Yuejie Chi. Department of ECE and Department of BMI The Ohio State University, Columbus, Ohio 43210 ROBUST BLIND SPIKES DECONVOLUTION Yuejie Chi Department of ECE and Department of BMI The Ohio State University, Columbus, Ohio 4 ABSTRACT Blind spikes deconvolution, or blind super-resolution, deals with

More information

Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches

Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Splitting Techniques in the Face of Huge Problem Sizes: Block-Coordinate and Block-Iterative Approaches Patrick L. Combettes joint work with J.-C. Pesquet) Laboratoire Jacques-Louis Lions Faculté de Mathématiques

More information

Motivation Sparse Signal Recovery is an interesting area with many potential applications. Methods developed for solving sparse signal recovery proble

Motivation Sparse Signal Recovery is an interesting area with many potential applications. Methods developed for solving sparse signal recovery proble Bayesian Methods for Sparse Signal Recovery Bhaskar D Rao 1 University of California, San Diego 1 Thanks to David Wipf, Zhilin Zhang and Ritwik Giri Motivation Sparse Signal Recovery is an interesting

More information

Self-dual Smooth Approximations of Convex Functions via the Proximal Average

Self-dual Smooth Approximations of Convex Functions via the Proximal Average Chapter Self-dual Smooth Approximations of Convex Functions via the Proximal Average Heinz H. Bauschke, Sarah M. Moffat, and Xianfu Wang Abstract The proximal average of two convex functions has proven

More information

LEARNING OVERCOMPLETE SPARSIFYING TRANSFORMS FOR SIGNAL PROCESSING. Saiprasad Ravishankar and Yoram Bresler

LEARNING OVERCOMPLETE SPARSIFYING TRANSFORMS FOR SIGNAL PROCESSING. Saiprasad Ravishankar and Yoram Bresler LEARNING OVERCOMPLETE SPARSIFYING TRANSFORMS FOR SIGNAL PROCESSING Saiprasad Ravishankar and Yoram Bresler Department of Electrical and Computer Engineering and the Coordinated Science Laboratory, University

More information

Proximal methods. S. Villa. October 7, 2014

Proximal methods. S. Villa. October 7, 2014 Proximal methods S. Villa October 7, 2014 1 Review of the basics Often machine learning problems require the solution of minimization problems. For instance, the ERM algorithm requires to solve a problem

More information

Strengthened Sobolev inequalities for a random subspace of functions

Strengthened Sobolev inequalities for a random subspace of functions Strengthened Sobolev inequalities for a random subspace of functions Rachel Ward University of Texas at Austin April 2013 2 Discrete Sobolev inequalities Proposition (Sobolev inequality for discrete images)

More information

IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 9, SEPTEMBER

IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 9, SEPTEMBER IEEE SIGNAL PROCESSING LETTERS, VOL. 22, NO. 9, SEPTEMBER 2015 1239 Preconditioning for Underdetermined Linear Systems with Sparse Solutions Evaggelia Tsiligianni, StudentMember,IEEE, Lisimachos P. Kondi,

More information

Optimization and Optimal Control in Banach Spaces

Optimization and Optimal Control in Banach Spaces Optimization and Optimal Control in Banach Spaces Bernhard Schmitzer October 19, 2017 1 Convex non-smooth optimization with proximal operators Remark 1.1 (Motivation). Convex optimization: easier to solve,

More information

Enhanced Compressive Sensing and More

Enhanced Compressive Sensing and More Enhanced Compressive Sensing and More Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Nonlinear Approximation Techniques Using L1 Texas A & M University

More information

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation

A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation A Brief Overview of Practical Optimization Algorithms in the Context of Relaxation Zhouchen Lin Peking University April 22, 2018 Too Many Opt. Problems! Too Many Opt. Algorithms! Zero-th order algorithms:

More information

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization

A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization A Multilevel Proximal Algorithm for Large Scale Composite Convex Optimization Panos Parpas Department of Computing Imperial College London www.doc.ic.ac.uk/ pp500 p.parpas@imperial.ac.uk jointly with D.V.

More information

Design of Projection Matrix for Compressive Sensing by Nonsmooth Optimization

Design of Projection Matrix for Compressive Sensing by Nonsmooth Optimization Design of Proection Matrix for Compressive Sensing by Nonsmooth Optimization W.-S. Lu T. Hinamoto Dept. of Electrical & Computer Engineering Graduate School of Engineering University of Victoria Hiroshima

More information

Sequential Unconstrained Minimization: A Survey

Sequential Unconstrained Minimization: A Survey Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.

More information

Sparsity Regularization

Sparsity Regularization Sparsity Regularization Bangti Jin Course Inverse Problems & Imaging 1 / 41 Outline 1 Motivation: sparsity? 2 Mathematical preliminaries 3 l 1 solvers 2 / 41 problem setup finite-dimensional formulation

More information

Signal Denoising with Wavelets

Signal Denoising with Wavelets Signal Denoising with Wavelets Selin Aviyente Department of Electrical and Computer Engineering Michigan State University March 30, 2010 Introduction Assume an additive noise model: x[n] = f [n] + w[n]

More information

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN

PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION. A Thesis MELTEM APAYDIN PHASE RETRIEVAL OF SPARSE SIGNALS FROM MAGNITUDE INFORMATION A Thesis by MELTEM APAYDIN Submitted to the Office of Graduate and Professional Studies of Texas A&M University in partial fulfillment of the

More information

EE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1

EE 546, Univ of Washington, Spring Proximal mapping. introduction. review of conjugate functions. proximal mapping. Proximal mapping 6 1 EE 546, Univ of Washington, Spring 2012 6. Proximal mapping introduction review of conjugate functions proximal mapping Proximal mapping 6 1 Proximal mapping the proximal mapping (prox-operator) of a convex

More information

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles

Compressed sensing. Or: the equation Ax = b, revisited. Terence Tao. Mahler Lecture Series. University of California, Los Angeles Or: the equation Ax = b, revisited University of California, Los Angeles Mahler Lecture Series Acquiring signals Many types of real-world signals (e.g. sound, images, video) can be viewed as an n-dimensional

More information

A New Estimate of Restricted Isometry Constants for Sparse Solutions

A New Estimate of Restricted Isometry Constants for Sparse Solutions A New Estimate of Restricted Isometry Constants for Sparse Solutions Ming-Jun Lai and Louis Y. Liu January 12, 211 Abstract We show that as long as the restricted isometry constant δ 2k < 1/2, there exist

More information

A generalized forward-backward method for solving split equality quasi inclusion problems in Banach spaces

A generalized forward-backward method for solving split equality quasi inclusion problems in Banach spaces Available online at www.isr-publications.com/jnsa J. Nonlinear Sci. Appl., 10 (2017), 4890 4900 Research Article Journal Homepage: www.tjnsa.com - www.isr-publications.com/jnsa A generalized forward-backward

More information

Compressed Sensing and Neural Networks

Compressed Sensing and Neural Networks and Jan Vybíral (Charles University & Czech Technical University Prague, Czech Republic) NOMAD Summer Berlin, September 25-29, 2017 1 / 31 Outline Lasso & Introduction Notation Training the network Applications

More information

Sparse linear models and denoising

Sparse linear models and denoising Lecture notes 4 February 22, 2016 Sparse linear models and denoising 1 Introduction 1.1 Definition and motivation Finding representations of signals that allow to process them more effectively is a central

More information