Sparse Regularization via Convex Analysis
1 Sparse Regularization via Convex Analysis
Ivan Selesnick
Electrical and Computer Engineering, Tandon School of Engineering
New York University, Brooklyn, New York, USA
2 Convex or non-convex: Which is better for inverse problems?
Benefits of convex optimization:
1. Absence of suboptimal local minima
2. Continuity of the solution as a function of the input data
3. Algorithms guaranteed to converge to a global optimum
4. Regularization parameters are easier to set
But convex regularization tends to under-estimate signal values. Non-convex regularization often performs better!
Can we design non-convex sparsity-inducing penalties that maintain the convexity of the cost function to be minimized?
3 Convex function
[Figure: graph of a convex function]
4 Non-convex function
[Figure: graph of a non-convex function]
5 Goal
Goal: Find a sparse approximate solution to a linear system $y = Ax$. Minimize a cost function:
$J(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\|x\|_1$
or
$F(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\,\psi(x)$
Question: How to define $\psi$? Let us allow $\psi$ to be non-convex such that $F$ is convex. This is the Convex Non-Convex (CNC) approach.
6 Biosensor Signal
[Figure: biosensor data vs. time $n$]
7 Linear Filter
Given noisy data $y \in \mathbb{R}^N$, perform smoothing via
$\hat{x} = \arg\min_{x\in\mathbb{R}^N} \Big\{ \sum_{n} |y(n) - x(n)|^2 + \lambda \sum_{n} |x(n) - x(n-1)|^2 \Big\}$
which can be written as
$\hat{x} = \arg\min_{x\in\mathbb{R}^N} \big\{ \|y - x\|_2^2 + \lambda\|Dx\|_2^2 \big\}$
where $\|x\|_2^2 := \sum_n |x(n)|^2$ and $D$ is the first-order difference matrix
$D = \begin{bmatrix} -1 & 1 & & \\ & \ddots & \ddots & \\ & & -1 & 1 \end{bmatrix}$
Solution: $\hat{x} = (I + \lambda D^T D)^{-1} y$
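The closed-form solution above can be sketched in a few lines. This is a minimal illustration (the helper name `linear_filter` is hypothetical), assuming SciPy's sparse machinery; since $I + \lambda D^T D$ is tridiagonal, the solve is fast.

```python
import numpy as np
from scipy.sparse import eye, diags
from scipy.sparse.linalg import spsolve

def linear_filter(y, lam):
    """Quadratic smoothing: solve (I + lam*D'D) x = y, where D is the
    (N-1) x N first-order difference matrix."""
    N = len(y)
    D = diags([-np.ones(N - 1), np.ones(N - 1)], [0, 1], shape=(N - 1, N))
    A = eye(N) + lam * (D.T @ D)      # tridiagonal system matrix
    return spsolve(A.tocsc(), y)

# Example: smoothing a noisy ramp; larger lam gives a smoother result.
rng = np.random.default_rng(0)
y = np.linspace(0, 1, 200) + 0.1 * rng.standard_normal(200)
x = linear_filter(y, lam=25.0)
```

The returned `x` satisfies the normal equations $x + \lambda D^T D x = y$ exactly (up to solver precision).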
8 Biosensor Signal
[Figure: biosensor data and the linear-filter output vs. time $n$]
9 Total Variation Denoising (Nonlinear Filter)
Given noisy data $y \in \mathbb{R}^N$, perform smoothing via
$\hat{x} = \arg\min_{x\in\mathbb{R}^N} \Big\{ \tfrac12\sum_{n} |y(n) - x(n)|^2 + \lambda \sum_{n} |x(n) - x(n-1)| \Big\}$
which can be written as
$\hat{x} = \arg\min_{x\in\mathbb{R}^N} \big\{ \tfrac12\|y - x\|_2^2 + \lambda\|Dx\|_1 \big\}$
where $\|x\|_2^2 := \sum_n |x(n)|^2$, $\|x\|_1 := \sum_n |x(n)|$, and $D$ is the first-order difference matrix as before.
Solution? There is no closed-form solution. Use an iterative algorithm... but note the cost function is not differentiable!
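One standard iterative scheme (not necessarily the one used in the talk) is projected gradient descent on the dual problem, a Chambolle-style sketch; the function name `tv_denoise_1d` is hypothetical, and direct non-iterative 1-D solvers also exist (e.g. Condat's algorithm).

```python
import numpy as np

def tv_denoise_1d(y, lam, n_iter=5000):
    """1-D TV denoising, min_x 0.5||y-x||^2 + lam*||Dx||_1, via projected
    gradient on the dual: min_z 0.5||y - D'z||^2 s.t. |z_n| <= lam,
    with the primal solution recovered as x = y - D'z."""
    y = np.asarray(y, dtype=float)
    z = np.zeros(len(y) - 1)
    for _ in range(n_iter):
        x = y + np.diff(z, prepend=0, append=0)         # x = y - D'z
        z = np.clip(z + 0.25 * np.diff(x), -lam, lam)   # step 0.25 <= 1/||DD'||
    return y + np.diff(z, prepend=0, append=0)

# Example: denoise a noisy piecewise-constant signal.
rng = np.random.default_rng(0)
clean = np.repeat([0.0, 2.0, -1.0], 60)
y = clean + 0.3 * rng.standard_normal(clean.size)
x = tv_denoise_1d(y, lam=1.0)
```

The step size 0.25 is safe because the spectral norm of $DD^T$ is at most 4 for the first-order difference matrix.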
10 Biosensor Signal
[Figure: biosensor data and the total variation denoising output vs. time $n$]
11 Two Penalties
The $\ell_1$ norm induces sparsity, unlike the sum of squares.
[Figure: $|x|$ and $x^2$ vs. $x$]
12 Combine Quadratic and Sparse Regularization
$\arg\min_{u,v\in\mathbb{R}^N} \big\{ \tfrac12\|y - u - v\|_2^2 + \lambda_1\|Du\|_1 + \tfrac{\lambda_2}{2}\|Dv\|_2^2 \big\}, \qquad \hat{x} = \hat{u} + \hat{v}$
13 Combine Quadratic and Sparse Regularization
$\arg\min_{u,v\in\mathbb{R}^N} \big\{ \tfrac12\|y - u - v\|_2^2 + \lambda_1\|Du\|_1 + \tfrac{\lambda_2}{2}\|Dv\|_2^2 \big\}$
Solving for $v$ gives
$v = (I + \lambda_2 D^T D)^{-1}(y - u)$
$x = u + v = (I + \lambda_2 D^T D)^{-1}(y + \lambda_2 D^T D u)$
Substituting $v$ back into the cost function:
$J(u) = \tfrac{\lambda_2}{2}(y-u)^T D^T (I + \lambda_2 D D^T)^{-1} D (y-u) + \lambda_1\|Du\|_1$
or
$J(u) = \tfrac{\lambda_2}{2}\big\|R^{-1} D(y-u)\big\|_2^2 + \lambda_1\|Du\|_1, \qquad R R^T = I + \lambda_2 D D^T$
($R$ is a banded matrix.) Since the cost depends on $Du$, not $u$ directly, define $g = Du$. So we need to minimize
$F(g) = \tfrac{\lambda_2}{2}\big\|R^{-1}(Dy - g)\big\|_2^2 + \lambda_1\|g\|_1$
14 Biosensor Signal
[Figure: biosensor data and SIPS denoising output vs. time $n$]
15 Scalar case
$\hat{x} = \arg\min_x \big\{ \tfrac12(y - x)^2 + \lambda|x| \big\} = \operatorname{soft}(y, \lambda) := \operatorname{sign}(y)\max(|y| - \lambda,\, 0)$
[Figure: the soft-threshold function $\hat{x}$ vs. $y$, with threshold $\lambda$]
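The scalar solution above is soft thresholding; a minimal sketch, with a brute-force grid search as a sanity check (the grid and test values are illustrative, not from the slides):

```python
import numpy as np

def soft(y, lam):
    """Soft threshold: the exact minimizer of 0.5*(y-x)^2 + lam*|x|."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# Sanity check against brute-force minimization on a fine grid.
y, lam = 1.3, 0.5
grid = np.linspace(-3, 3, 120001)
cost = 0.5 * (y - grid) ** 2 + lam * np.abs(grid)
x_bf = grid[np.argmin(cost)]   # agrees with soft(y, lam) to grid resolution
```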
16 Non-convex scalar penalty functions (alternatives to the $\ell_1$ norm)
Log: $\phi_a(x) = \tfrac{1}{a}\log(1 + a|x|)$
Rat: $\phi_a(x) = \dfrac{|x|}{1 + a|x|/2}$
Exp: $\phi_a(x) = \tfrac{1}{a}\big(1 - e^{-a|x|}\big)$
MC: $\phi_a(x) = \begin{cases} |x| - \tfrac{a}{2}x^2, & |x| \le 1/a \\[2pt] \tfrac{1}{2a}, & |x| \ge 1/a \end{cases}$
The penalties are parameterized such that $\phi_a'(0^+) = 1$ and $\phi_a''(0^+) = -a$.
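The four penalties and their common normalization can be checked numerically; a small sketch (function names are mine), verifying $\phi_a(0)=0$, $\phi_a'(0^+)=1$, and $\phi_a''(0^+)=-a$ by finite differences:

```python
import numpy as np

# The four penalties from the slide, vectorized in x (assumes a > 0).
def phi_log(x, a): return np.log1p(a * np.abs(x)) / a
def phi_rat(x, a): return np.abs(x) / (1 + a * np.abs(x) / 2)
def phi_exp(x, a): return (1 - np.exp(-a * np.abs(x))) / a
def phi_mc(x, a):
    ax = np.abs(x)
    return np.where(ax <= 1 / a, ax - a * ax**2 / 2, 1 / (2 * a))

# All satisfy phi(0) = 0, phi'(0+) = 1, phi''(0+) = -a
# (verified by one-sided finite differences in the test below).
```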
17 Non-convex scalar penalty functions
[Figure: the Log, Rat, Exp, and MC penalty functions with $a = 1$, compared to $|x|$]
18 Logarithmic penalty
[Figure: the logarithmic penalty ($a = 1$) and its second derivative]
19 MC penalty
[Figure: the MC penalty ($a = 1$) and its second derivative]
20 $\ell_p$ penalty, $0 < p < 1$ (precluded)
[Figure: $|x|^p$ with $p = 0.5$ and its second derivative, which is unbounded near $x = 0$]
21 Scalar MC penalty
We consider henceforth only the minimax-concave (MC) penalty function
$\phi_a(x) = \begin{cases} |x| - \tfrac{a}{2}x^2, & |x| \le 1/a \\[2pt] \tfrac{1}{2a}, & |x| \ge 1/a \end{cases}$
[Figure: $\phi_a(x)$ vs. $x$; the penalty is constant for $|x| \ge 1/a$]
22 Scalar MC penalty
The parameter $a$ controls the non-convexity of $\phi_a$.
[Figure: the MC penalty $\phi_a(x)$ for several values of $a$ (e.g. $a = 0.25$, $0.5$, $1$); as $a \to 0$, $\phi_a(x) \to |x|$]
23 Scalar case using MC penalty
$\hat{x} = \arg\min_x \big\{ \tfrac12(y - x)^2 + \lambda\,\phi_a(x) \big\}$
[Figure: $\hat{x}$ as a function of $y$ (the firm threshold): $\hat{x} = 0$ for $|y| \le \lambda$, $\hat{x} = y$ for $|y| \ge 1/a$, linear in between]
$\hat{x}$ is a continuous function of $y$ when $a < 1/\lambda$.
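In the convex regime $a\lambda < 1$ the scalar minimizer is the firm threshold; a sketch (the helper name `firm` is mine), checked against brute-force grid minimization:

```python
import numpy as np

def firm(y, lam, a):
    """Minimizer of 0.5*(y-x)^2 + lam*phi_a(x) for the scalar MC penalty,
    assuming a*lam < 1 (the convex regime): the 'firm' threshold."""
    ay = np.abs(y)
    x = np.where(ay <= lam, 0.0,
        np.where(ay >= 1 / a, ay, (ay - lam) / (1 - a * lam)))
    return np.sign(y) * x

def phi_mc(x, a):
    ax = np.abs(x)
    return np.where(ax <= 1 / a, ax - a * ax**2 / 2, 1 / (2 * a))

# Brute-force check at a few points (a*lam = 0.5 < 1, so the cost is convex).
grid = np.linspace(-4, 4, 160001)
lam, a = 0.5, 1.0
for y0 in (0.3, 0.8, 2.5, -1.7):
    cost = 0.5 * (y0 - grid) ** 2 + lam * phi_mc(grid, a)
    # the grid minimizer agrees with firm(y0, lam, a) to grid resolution
```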
24 Scalar case using MC penalty
Consider $f(x) = \tfrac12(y - x)^2 + \lambda\phi_a(x)$. For what values of $a$ is $f$ a convex function?
$f(x) = \underbrace{\tfrac12 x^2 + \lambda\phi_a(x)}_{f_0(x)} + \underbrace{\tfrac12 y^2 - yx}_{\text{affine in } x}$
It is sufficient to consider the convexity of $f_0(x) = \tfrac12 x^2 + \lambda\phi_a(x)$.
25 Scalar case using MC penalty
$f_0(x) = \tfrac12 x^2 + \lambda\phi_a(x)$
[Figure: $\tfrac12 x^2$ and $\lambda\phi_a(x)$]
Is $f_0$ convex? $\phi_a$ is not differentiable, so we cannot simply check that the second derivative of $f_0$ is positive.
26 Scalar case using MC penalty
Let us write $\phi_a(x) = |x| - s_a(x)$.
[Figure: $\phi_a(x) = |x| - s_a(x)$]
We see the Huber function:
$s_a(x) = \begin{cases} \tfrac{a}{2}x^2, & |x| \le 1/a \\[2pt] |x| - \tfrac{1}{2a}, & |x| \ge 1/a \end{cases}$
27 Scalar case using MC penalty
Writing $\phi_a$ as $\phi_a(x) = |x| - s_a(x)$, we have
$f(x) = \tfrac12(y - x)^2 + \lambda\phi_a(x) = \tfrac12(y - x)^2 + \lambda\big[|x| - s_a(x)\big] = \underbrace{\tfrac12 x^2 - \lambda s_a(x)}_{g(x)} + \underbrace{\lambda|x| + \tfrac12 y^2 - yx}_{\text{convex in } x}$
$g$ convex $\Rightarrow$ $f$ convex.
Note that $g$ is differentiable, unlike $f$. Is $g$ convex? It depends on $a$ and $\lambda$.
28 Scalar case using MC penalty
$g(x) = \tfrac12 x^2 - \lambda s_a(x)$
[Figure: $\tfrac12 x^2$, $s_a(x)$, and $g(x) = \tfrac12 x^2 - \lambda s_a(x)$]
For $a = 1$, $\lambda = 0.8$: $g$ is convex.
29 Scalar case using MC penalty
$g(x) = \tfrac12 x^2 - \lambda s_a(x)$
[Figure: $\tfrac12 x^2$, $s_a(x)$, and $g(x) = \tfrac12 x^2 - \lambda s_a(x)$]
For $a = 1$, $\lambda = 1.3$: $g$ is not convex.
30 Scalar case using MC penalty
The Huber function is differentiable, but not twice differentiable.
[Figure: $s_a(x)$, $s_a'(x)$, and $s_a''(x)$]
31 Scalar case using MC penalty
$g(x) = \tfrac12 x^2 - \lambda s_a(x)$
When is $g$ convex? We cannot check the second derivative of $g$ because it is not twice differentiable (see the previous slide). How can we ensure $g$ (and hence $f$) is convex?
32 Huber function as an infimal convolution
The Huber function can be written as
$s_a(x) = \min_v \big\{ \tfrac{a}{2}(x - v)^2 + |v| \big\}$
As an infimal convolution:
$s_a(x) = \big( |\cdot| \,\square\, \tfrac{a}{2}(\cdot)^2 \big)(x)$
where the infimal convolution (Moreau-Yosida regularization) is defined as
$(f \,\square\, g)(x) := \min_v \big\{ f(v) + g(x - v) \big\}$
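The identity above is easy to verify numerically: evaluate the infimal convolution by minimizing over a fine grid of $v$ and compare with the closed-form Huber function (a sketch; the grid and test points are illustrative):

```python
import numpy as np

def huber(x, a):
    """Closed-form Huber function s_a(x)."""
    ax = np.abs(x)
    return np.where(ax <= 1 / a, a * ax**2 / 2, ax - 1 / (2 * a))

# s_a(x) = min_v { 0.5*a*(x - v)^2 + |v| }, evaluated by grid search over v.
a = 2.0
v = np.linspace(-5, 5, 200001)
for x0 in (-3.0, -0.2, 0.1, 1.5):
    inf_conv = np.min(0.5 * a * (x0 - v) ** 2 + np.abs(v))
    # matches huber(x0, a) to grid resolution
```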
33 Huber function as an infimal convolution
[Figure: $\tfrac{a}{2}x^2$, $|x|$, and the Huber function $\big(|\cdot| \,\square\, \tfrac{a}{2}(\cdot)^2\big)(x)$]
34 Epigraph
The epigraph of a function is the set comprising the points on and above its graph.
The epigraph of an infimal convolution is the sum of the epigraphs:
$\operatorname{epi}\{f \,\square\, g\} = \operatorname{epi}\{f\} + \operatorname{epi}\{g\}$
35 Scalar case using MC penalty
The Huber function can be written as $s_a(x) = \min_v \big\{ \tfrac{a}{2}(x - v)^2 + |v| \big\}$. When is $g$ convex?
$g(x) = \tfrac12 x^2 - \lambda s_a(x)$
$\;= \tfrac12 x^2 - \lambda \min_v \big\{ \tfrac{a}{2}(x - v)^2 + |v| \big\}$
$\;= \tfrac12 x^2 - \lambda \min_v \big\{ \tfrac{a}{2}(x^2 - 2xv + v^2) + |v| \big\}$
$\;= \tfrac12 x^2 - \tfrac{a\lambda}{2}x^2 + \lambda \max_v \big\{ \underbrace{\tfrac{a}{2}(2xv - v^2) - |v|}_{\text{affine in } x} \big\}$
$\;= \tfrac12(1 - a\lambda)x^2 + \text{convex function}$
(the pointwise maximum of functions affine in $x$ is convex in $x$).
The function $g$ is convex if $1 - a\lambda$ is non-negative, i.e., $a \le 1/\lambda$.
We do not need derivatives!
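The threshold $a\lambda \le 1$ can be seen numerically: second differences of $g$ on a grid are nonnegative exactly when $a\lambda \le 1$ (a sketch; the grid and parameter values are illustrative, matching the two earlier slides):

```python
import numpy as np

def g(x, a, lam):
    """g(x) = 0.5*x^2 - lam*s_a(x), with s_a the Huber function."""
    ax = np.abs(x)
    s = np.where(ax <= 1 / a, a * ax**2 / 2, ax - 1 / (2 * a))
    return 0.5 * x**2 - lam * s

# Convexity check via second differences on a fine grid.
x = np.linspace(-3, 3, 6001)
d2_cvx = np.diff(g(x, a=1.0, lam=0.8), 2)    # a*lam = 0.8: all nonnegative
d2_ncvx = np.diff(g(x, a=1.0, lam=1.3), 2)   # a*lam = 1.3: dips negative
```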
36 Scalar MC penalty
The MC penalty can be written as
$\phi_a(x) = |x| - s_a(x) = |x| - \underbrace{\min_v \big\{ \tfrac{a}{2}(x - v)^2 + |v| \big\}}_{\text{Huber function}} = |x| - \big(|\cdot| \,\square\, \tfrac{a}{2}(\cdot)^2\big)(x)$
[Figure: $\phi_a(x) = |x| - s_a(x)$]
37 Multivariate case
$F(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\,\psi(x)$
How can we set $\psi$ so that $F$ is convex and promotes sparsity of $x$? We can generalize the scalar case.
38 Separable penalty (precluded)
Conventional penalty: additive (separable)
$\phi(x) = \phi(x_1) + \phi(x_2)$
[Figure: surface and contours of $\phi(x_1) + \phi(x_2)$]
39 Generalized Huber function
Let $B \in \mathbb{R}^{M\times N}$. We define the generalized Huber function
$S_B(x) := \min_{v\in\mathbb{R}^N} \big\{ \tfrac12\|B(x - v)\|_2^2 + \|v\|_1 \big\}$
In the notation of infimal convolution, we have
$S_B(x) = \big( \|\cdot\|_1 \,\square\, \tfrac12\|B\,\cdot\|_2^2 \big)(x)$
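Although $S_B$ has no closed form in general, it can be evaluated numerically by solving the inner minimization, which is a standard smooth-plus-$\ell_1$ problem; a sketch using ISTA (helper names are mine, not from the talk). For $B = I$, $S_B$ separates into scalar Huber functions with $a = 1$, which makes a convenient check.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def S_B(x, B, n_iter=500):
    """Generalized Huber S_B(x) = min_v { 0.5*||B(x-v)||^2 + ||v||_1 },
    evaluated by ISTA on the inner problem (a sketch, not the fastest way)."""
    BtB = B.T @ B
    t = 1.0 / np.linalg.norm(BtB, 2)          # ISTA step: 1/||B'B||
    v = np.zeros_like(x)
    for _ in range(n_iter):
        v = soft(v - t * (BtB @ (v - x)), t)  # prox of l1 after a gradient step
    return 0.5 * np.sum((B @ (x - v)) ** 2) + np.sum(np.abs(v))

# For B = I the value is the sum of scalar Huber functions s_1(x_i).
x = np.array([-2.0, 0.3, 1.4])
val = S_B(x, np.eye(3))
```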
40 Example 1. Generalized Huber function
$S_B(x) := \min_v \big\{ \tfrac12\|B(x - v)\|_2^2 + \|v\|_1 \big\}$
[Figure: surface and contours of $S_B(x)$, $x \in \mathbb{R}^2$, for an example $B$]
41 Example 2. Generalized Huber function
[Figure: surface and contours of $S_B$ for another example $B$]
42 Example 3. Generalized Huber function
[Figure: surface and contours of $S_B$ for a diagonal $B$]
If $B$ is diagonal, then $S_B$ is separable.
43 Generalized Huber function
The generalized Huber function is differentiable. Its gradient is given by
$\nabla S_B(x) = B^T B\Big( x - \arg\min_{v\in\mathbb{R}^N}\big\{ \tfrac12\|B(x - v)\|_2^2 + \|v\|_1 \big\}\Big)$
Neither the generalized Huber function nor its gradient has a simple closed-form expression. But we will still be able to use them...
When $B = I$ we recover a well-known identity:
$\nabla S_I(x) = x - \arg\min_{v\in\mathbb{R}^N}\big\{ \tfrac12\|x - v\|_2^2 + \|v\|_1 \big\}$
44 Generalized MC penalty
We define the generalized MC (GMC) penalty
$\psi_B(x) := \|x\|_1 - S_B(x) = \|x\|_1 - \min_{v\in\mathbb{R}^N}\big\{ \tfrac12\|B(x - v)\|_2^2 + \|v\|_1 \big\}$
45 Example 1. Generalized MC penalty
[Figure: surfaces and contours of $\|x\|_1$, $S_B(x)$, and $\psi_B(x) = \|x\|_1 - S_B(x)$]
46 Example 2. Generalized MC penalty
[Figure: the same three surfaces and contours for a second example $B$]
47 Example 3. Generalized MC penalty
[Figure: the same three surfaces and contours for a third example $B$]
48 Convexity
Theorem. The function
$F(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\,\psi_B(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\big[\|x\|_1 - S_B(x)\big] = \tfrac12\|y - Ax\|_2^2 + \lambda\|x\|_1 - \lambda\min_{v\in\mathbb{R}^N}\big\{\tfrac12\|B(x - v)\|_2^2 + \|v\|_1\big\}$
is convex if
$B^T B \preceq \tfrac{1}{\lambda} A^T A$
even when $\psi_B$ is non-convex.
49 Convexity condition proof
Write $F$ as
$F(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\big[\|x\|_1 - S_B(x)\big] = \underbrace{\tfrac12\|Ax\|_2^2 - \lambda S_B(x)}_{G(x)} + \underbrace{\tfrac12\|y\|_2^2 - y^T Ax + \lambda\|x\|_1}_{\text{convex in } x}$
$G$ convex $\Rightarrow$ $F$ convex.
50 Convexity condition proof
Write $G$ as
$G(x) = \tfrac12\|Ax\|_2^2 - \lambda S_B(x)$
$\;= \tfrac12\|Ax\|_2^2 - \lambda\min_v\big\{\tfrac12\|B(x - v)\|_2^2 + \|v\|_1\big\}$
$\;= \tfrac12\|Ax\|_2^2 - \lambda\min_v\big\{\tfrac12\|Bx\|_2^2 - v^T B^T B x + \tfrac12\|Bv\|_2^2 + \|v\|_1\big\}$
$\;= \tfrac12\|Ax\|_2^2 - \tfrac{\lambda}{2}\|Bx\|_2^2 + \lambda\max_v\big\{\underbrace{v^T B^T B x - \tfrac12\|Bv\|_2^2 - \|v\|_1}_{\text{affine in } x}\big\}$
$\;= \tfrac12 x^T\big(A^T A - \lambda B^T B\big)x + \text{convex function}$
The function $G$ (hence $F$) is convex if $A^T A - \lambda B^T B$ is positive semidefinite, i.e., $B^T B \preceq \tfrac{1}{\lambda}A^T A$.
51 Convexity condition
A straightforward choice of $B$ satisfying $B^T B \preceq \tfrac{1}{\lambda}A^T A$ is
$B = \sqrt{\gamma/\lambda}\; A$
for some $\gamma$ with $0 \le \gamma \le 1$.
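This choice gives $B^T B = (\gamma/\lambda)A^T A$, so the condition holds for any $0 \le \gamma \le 1$; a quick numerical sanity check (the matrix sizes and parameter values are illustrative):

```python
import numpy as np

# With B = sqrt(gamma/lam)*A, B'B = (gamma/lam) A'A, so
# (1/lam) A'A - B'B = ((1 - gamma)/lam) A'A is positive semidefinite.
rng = np.random.default_rng(1)
A = rng.standard_normal((20, 50))
lam, gamma = 2.0, 0.8
B = np.sqrt(gamma / lam) * A
M = A.T @ A / lam - B.T @ B
min_eig = np.linalg.eigvalsh(M).min()   # >= 0 up to rounding
```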
52 Optimization Algorithm
We can use the forward-backward splitting (FBS) algorithm. Write
$F(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\psi_B(x) = \tfrac12\|y - Ax\|_2^2 + \lambda\|x\|_1 - \lambda S_B(x) = \underbrace{\tfrac12\|y - Ax\|_2^2 - \lambda S_B(x)}_{f_1(x):\ \text{convex, differentiable}} + \underbrace{\lambda\|x\|_1}_{f_2(x):\ \text{convex}}$
The FBS algorithm is given by:
$w^{(i)} = x^{(i)} - \mu\,\nabla f_1(x^{(i)})$
$x^{(i+1)} = \arg\min_x \big\{\tfrac12\|w^{(i)} - x\|_2^2 + \mu f_2(x)\big\} = \operatorname{prox}_{\mu f_2}(w^{(i)})$
53 Optimization Algorithm
Let $0 \le \gamma \le 1$ and $B = \sqrt{\gamma/\lambda}\,A$. Then a minimizer of the objective function $F$ is obtained by the iteration:
$v^{(i)} = \arg\min_{v\in\mathbb{R}^N}\big\{\tfrac{\gamma}{2}\|A(x^{(i)} - v)\|_2^2 + \lambda\|v\|_1\big\}$
$z^{(i)} = \gamma A(x^{(i)} - v^{(i)})$
$x^{(i+1)} = \arg\min_{x\in\mathbb{R}^N}\big\{\tfrac12\|y + z^{(i)} - Ax\|_2^2 + \lambda\|x\|_1\big\}$
Interpretation: an iteratively adjusted additive data perturbation of the $\ell_1$-norm regularized problem.
54 Biosensor Signal
[Figure: biosensor data and SIPS denoising output vs. time $n$]
55 Biosensor Signal
[Figure: biosensor data and SIPS denoising with the non-separable non-convex penalty vs. time $n$]
56 ECG Signal
[Figure: ECG data and the linear-filter output vs. time $n$]
57 ECG Signal
[Figure: ECG data and total variation denoising vs. time $n$]
58 ECG Signal
[Figure: ECG data and SIPS denoising vs. time $n$]
59 ECG Signal
[Figure: ECG data and SIPS denoising with the non-separable non-convex penalty vs. time $n$]
60 Generalizations (TV, nuclear norm, etc.)
Consider the function
$J(x) := \tfrac12\|Ax - b\|_2^2 + \lambda R(x) \qquad (1)$
where $R$ is a convex regularizer of the form $R(x) = \Phi(G(Lx))$, where $L$ is a linear operator, $G$ is possibly nonlinear, and $\Phi$ promotes sparsity.
For example, total variation regularization is expressed as $R(x) = \sum_i g_i(Lx)$ with $L = \begin{bmatrix} D_h \\ D_v \end{bmatrix}$ and
$g_i(Lx) = |(D_h x)_i| + |(D_v x)_i|$ (anisotropic TV) or $g_i(Lx) = \sqrt{(D_h x)_i^2 + (D_v x)_i^2}$ (isotropic TV)
61 Generalizations (TV, nuclear norm, etc.)
Suppose $R$ satisfies:
A1) $R(\cdot) = \Phi(G(L\,\cdot))$ is convex and bounded from below by zero;
A2) $\Phi(G(\cdot))$ is a proper, lower semicontinuous, and coercive function.
Consider the function with non-convex regularizer
$J_B(x) := \tfrac12\|Ax - b\|_2^2 + \lambda R_B(x) \qquad (2)$
with
$R_B(x) := R(x) - \min_{v\in\mathbb{R}^N}\big\{\tfrac12\|B(x - v)\|_2^2 + R(v)\big\}$
The function $J_B$ is convex (strongly convex) if $A^T A - \lambda B^T B \succeq 0$ ($\succ 0$).
Reference: A. Lanza, S. Morigi, I. Selesnick, and F. Sgallari. Sparsity-inducing Non-convex Non-separable Regularization for Convex Image Processing. 2017. Submitted.
62 Numerical Examples
1-D total variation denoising.
[Figure: clean signal; noisy signal; TV denoising ($\ell_1$ norm, best $\lambda$) with its RMSE; TV denoising (GMC penalty, best $\lambda$) with its RMSE]
Reference: I. Selesnick, A. Lanza, S. Morigi, and F. Sgallari. Total Variation Signal Denoising via Convex Analysis. 2017. Submitted.
63 Summary
We show how to construct non-convex regularizers that preserve the convexity of cost functions for sparse-regularized linear least squares.
The approach generalizes the $\ell_1$ norm. It can be used in conjunction with other convex non-smooth regularizers (TV, nuclear norm, etc.). The resulting regularizers are non-separable. Optimization is implementable using proximal algorithms (as for the $\ell_1$ norm).
64 References
A. Lanza, S. Morigi, I. Selesnick, and F. Sgallari. Sparsity-inducing Non-convex Non-separable Regularization for Convex Image Processing. 2017. Submitted.
I. Selesnick, A. Lanza, S. Morigi, and F. Sgallari. Total Variation Signal Denoising via Convex Analysis. 2017. Submitted.
A. Lanza, S. Morigi, I. Selesnick, and F. Sgallari. Nonconvex nonsmooth optimization via convex-nonconvex majorization-minimization. Numerische Mathematik, 136(2):343-381, 2017.
I. Selesnick. Sparse Regularization via Convex Analysis. IEEE Trans. Signal Process., 65(17):4481-4494, September 2017.
I. Selesnick. Total Variation Denoising via the Moreau Envelope. IEEE Signal Processing Letters, 24(2):216-220, February 2017.
A. Lanza, S. Morigi, and F. Sgallari. Convex Image Denoising via Non-convex Regularization with Parameter Selection. J. Math. Imaging and Vision, 56(2):195-220, 2016.
A. Lanza, S. Morigi, and F. Sgallari. Convex Image Denoising via Non-Convex Regularization. In Scale Space and Variational Methods in Computer Vision, 2015.
I. W. Selesnick, A. Parekh, and I. Bayram. Convex 1-D Total Variation Denoising with Non-convex Regularization. IEEE Signal Processing Letters, 22(2):141-144, February 2015.
65 Optimization Algorithm (saddle-point algorithm)
With
$J_B(x) = \tfrac12\|Ax - b\|_2^2 + \lambda R_B(x)$ and $R_B(x) = R(x) - \min_v\big\{\tfrac12\|B(x - v)\|_2^2 + R(v)\big\}$
we have
$J_B(x) = \tfrac12\|Ax - b\|_2^2 + \lambda R(x) - \lambda\min_v\big\{\tfrac12\|B(x - v)\|_2^2 + R(v)\big\}$
$\;= \tfrac12\|Ax - b\|_2^2 + \lambda R(x) + \lambda\max_v\big\{-\tfrac12\|B(x - v)\|_2^2 - R(v)\big\}$
and
$\hat{x} = \arg\min_x J_B(x) = \arg\min_x \Big\{\tfrac12\|Ax - b\|_2^2 + \lambda R(x) + \lambda\max_v\big\{-\tfrac12\|B(x - v)\|_2^2 - R(v)\big\}\Big\}$
or, as a saddle-point problem,
$(\hat{x}, \hat{v}) = \arg\min_x \max_v \big\{\tfrac12\|Ax - b\|_2^2 + \lambda R(x) - \tfrac{\lambda}{2}\|B(x - v)\|_2^2 - \lambda R(v)\big\}$
66 Optimization Algorithm (saddle-point algorithm)
This saddle-point problem is an instance of a monotone inclusion problem. Hence, the solution can be obtained using the forward-backward (FB) algorithm:
for $k = 0, 1, 2, \ldots$
$\quad w_k = x_k - \mu\big[A^T(Ax_k - b) + \lambda B^T B(v_k - x_k)\big]$
$\quad u_k = v_k - \mu\lambda B^T B(v_k - x_k)$
$\quad x_{k+1} = \arg\min_{x\in\mathbb{R}^n}\big\{R(x) + \tfrac{1}{2\mu\lambda}\|x - w_k\|_2^2\big\}$
$\quad v_{k+1} = \arg\min_{v\in\mathbb{R}^n}\big\{R(v) + \tfrac{1}{2\mu\lambda}\|v - u_k\|_2^2\big\}$
end
This algorithm reduces to ISTA for $B = 0$ and $R(x) = \|x\|_1$.
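The iteration above can be sketched for the simplest case: denoising ($A = I$) with $R = \ell_1$ norm and $B = \sqrt{\gamma/\lambda}\,I$, so both inner minimizations are soft thresholding. The helper names and the conservative step-size rule are my assumptions, not from the slides; with $\gamma = 0$ the penalty reduces to the $\ell_1$ norm and the iteration recovers plain soft thresholding.

```python
import numpy as np

def soft(x, t):
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def gmc_denoise(y, lam, gamma, n_iter=1000):
    """Sketch of the FB saddle-point iteration for A = I, R = l1 norm,
    B = sqrt(gamma/lam)*I, assuming 0 <= gamma < 1."""
    BtB = (gamma / lam) * np.eye(len(y))        # B'B for this choice of B
    mu = 1.9 / max(1.0, gamma / (1.0 - gamma))  # conservative step (||A'A|| = 1)
    x = np.zeros_like(y)
    v = np.zeros_like(y)
    for _ in range(n_iter):
        w = x - mu * ((x - y) + lam * (BtB @ (v - x)))
        u = v - mu * lam * (BtB @ (v - x))
        x = soft(w, mu * lam)                   # prox of lam*||.||_1, scaled by mu
        v = soft(u, mu * lam)
    return x

y = np.array([3.0, -0.5, 0.9, -2.2, 0.1])
x_l1 = gmc_denoise(y, lam=1.0, gamma=0.0)   # equals soft(y, 1.0)
x_gmc = gmc_denoise(y, lam=1.0, gamma=0.8)  # less bias on the large entries
```

With $\gamma = 0.8$ the result is the coordinate-wise firm threshold: large entries pass through unshrunk, small entries are set to zero, illustrating the reduced bias of the non-convex penalty.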
More informationShould penalized least squares regression be interpreted as Maximum A Posteriori estimation?
Should penalized least squares regression be interpreted as Maximum A Posteriori estimation? Rémi Gribonval To cite this version: Rémi Gribonval. Should penalized least squares regression be interpreted
More informationBayesian Models for Regularization in Optimization
Bayesian Models for Regularization in Optimization Aleksandr Aravkin, UBC Bradley Bell, UW Alessandro Chiuso, Padova Michael Friedlander, UBC Gianluigi Pilloneto, Padova Jim Burke, UW MOPTA, Lehigh University,
More informationMaster 2 MathBigData. 3 novembre CMAP - Ecole Polytechnique
Master 2 MathBigData S. Gaïffas 1 3 novembre 2014 1 CMAP - Ecole Polytechnique 1 Supervised learning recap Introduction Loss functions, linearity 2 Penalization Introduction Ridge Sparsity Lasso 3 Some
More informationAdaptive Primal Dual Optimization for Image Processing and Learning
Adaptive Primal Dual Optimization for Image Processing and Learning Tom Goldstein Rice University tag7@rice.edu Ernie Esser University of British Columbia eesser@eos.ubc.ca Richard Baraniuk Rice University
More informationDual and primal-dual methods
ELE 538B: Large-Scale Optimization for Data Science Dual and primal-dual methods Yuxin Chen Princeton University, Spring 2018 Outline Dual proximal gradient method Primal-dual proximal gradient method
More informationChapter 13. Convex and Concave. Josef Leydold Mathematical Methods WS 2018/19 13 Convex and Concave 1 / 44
Chapter 13 Convex and Concave Josef Leydold Mathematical Methods WS 2018/19 13 Convex and Concave 1 / 44 Monotone Function Function f is called monotonically increasing, if x 1 x 2 f (x 1 ) f (x 2 ) It
More informationNumerisches Rechnen. (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang. Institut für Geometrie und Praktische Mathematik RWTH Aachen
Numerisches Rechnen (für Informatiker) M. Grepl P. Esser & G. Welper & L. Zhang Institut für Geometrie und Praktische Mathematik RWTH Aachen Wintersemester 2011/12 IGPM, RWTH Aachen Numerisches Rechnen
More informationIn collaboration with J.-C. Pesquet A. Repetti EC (UPE) IFPEN 16 Dec / 29
A Random block-coordinate primal-dual proximal algorithm with application to 3D mesh denoising Emilie CHOUZENOUX Laboratoire d Informatique Gaspard Monge - CNRS Univ. Paris-Est, France Horizon Maths 2014
More informationSolution-driven Adaptive Total Variation Regularization
1/15 Solution-driven Adaptive Total Variation Regularization Frank Lenzen 1, Jan Lellmann 2, Florian Becker 1, Stefania Petra 1, Johannes Berger 1, Christoph Schnörr 1 1 Heidelberg Collaboratory for Image
More informationImage restoration: numerical optimisation
Image restoration: numerical optimisation Short and partial presentation Jean-François Giovannelli Groupe Signal Image Laboratoire de l Intégration du Matériau au Système Univ. Bordeaux CNRS BINP / 6 Context
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Low-rank matrix recovery via nonconvex optimization Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationConvex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version
Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com
More informationSequential convex programming,: value function and convergence
Sequential convex programming,: value function and convergence Edouard Pauwels joint work with Jérôme Bolte Journées MODE Toulouse March 23 2016 1 / 16 Introduction Local search methods for finite dimensional
More informationarxiv: v3 [math.oc] 29 Jun 2016
MOCCA: mirrored convex/concave optimization for nonconvex composite functions Rina Foygel Barber and Emil Y. Sidky arxiv:50.0884v3 [math.oc] 9 Jun 06 04.3.6 Abstract Many optimization problems arising
More informationBASICS OF CONVEX ANALYSIS
BASICS OF CONVEX ANALYSIS MARKUS GRASMAIR 1. Main Definitions We start with providing the central definitions of convex functions and convex sets. Definition 1. A function f : R n R + } is called convex,
More informationA function(al) f is convex if dom f is a convex set, and. f(θx + (1 θ)y) < θf(x) + (1 θ)f(y) f(x) = x 3
Convex functions The domain dom f of a functional f : R N R is the subset of R N where f is well-defined. A function(al) f is convex if dom f is a convex set, and f(θx + (1 θ)y) θf(x) + (1 θ)f(y) for all
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning First-Order Methods, L1-Regularization, Coordinate Descent Winter 2016 Some images from this lecture are taken from Google Image Search. Admin Room: We ll count final numbers
More informationNonlinear Diffusion. Journal Club Presentation. Xiaowei Zhou
1 / 41 Journal Club Presentation Xiaowei Zhou Department of Electronic and Computer Engineering The Hong Kong University of Science and Technology 2009-12-11 2 / 41 Outline 1 Motivation Diffusion process
More informationECE G: Special Topics in Signal Processing: Sparsity, Structure, and Inference
ECE 18-898G: Special Topics in Signal Processing: Sparsity, Structure, and Inference Sparse Recovery using L1 minimization - algorithms Yuejie Chi Department of Electrical and Computer Engineering Spring
More informationPrimal-dual subgradient methods for convex problems
Primal-dual subgradient methods for convex problems Yu. Nesterov March 2002, September 2005 (after revision) Abstract In this paper we present a new approach for constructing subgradient schemes for different
More informationA Probability Review
A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in
More informationTight Rates and Equivalence Results of Operator Splitting Schemes
Tight Rates and Equivalence Results of Operator Splitting Schemes Wotao Yin (UCLA Math) Workshop on Optimization for Modern Computing Joint w Damek Davis and Ming Yan UCLA CAM 14-51, 14-58, and 14-59 1
More informationSequential Unconstrained Minimization: A Survey
Sequential Unconstrained Minimization: A Survey Charles L. Byrne February 21, 2013 Abstract The problem is to minimize a function f : X (, ], over a non-empty subset C of X, where X is an arbitrary set.
More informationEnhanced Compressive Sensing and More
Enhanced Compressive Sensing and More Yin Zhang Department of Computational and Applied Mathematics Rice University, Houston, Texas, U.S.A. Nonlinear Approximation Techniques Using L1 Texas A & M University
More informationA user s guide to Lojasiewicz/KL inequalities
Other A user s guide to Lojasiewicz/KL inequalities Toulouse School of Economics, Université Toulouse I SLRA, Grenoble, 2015 Motivations behind KL f : R n R smooth ẋ(t) = f (x(t)) or x k+1 = x k λ k f
More informationMath (P)refresher Lecture 8: Unconstrained Optimization
Math (P)refresher Lecture 8: Unconstrained Optimization September 2006 Today s Topics : Quadratic Forms Definiteness of Quadratic Forms Maxima and Minima in R n First Order Conditions Second Order Conditions
More informationInverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 2007 Technische Universiteit Eindh ove n University of Technology
Inverse problems Total Variation Regularization Mark van Kraaij Casa seminar 23 May 27 Introduction Fredholm first kind integral equation of convolution type in one space dimension: g(x) = 1 k(x x )f(x
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationAUGMENTED LAGRANGIAN METHODS FOR CONVEX OPTIMIZATION
New Horizon of Mathematical Sciences RIKEN ithes - Tohoku AIMR - Tokyo IIS Joint Symposium April 28 2016 AUGMENTED LAGRANGIAN METHODS FOR CONVEX OPTIMIZATION T. Takeuchi Aihara Lab. Laboratories for Mathematics,
More informationQUADRATIC MAJORIZATION 1. INTRODUCTION
QUADRATIC MAJORIZATION JAN DE LEEUW 1. INTRODUCTION Majorization methods are used extensively to solve complicated multivariate optimizaton problems. We refer to de Leeuw [1994]; Heiser [1995]; Lange et
More informationNumerical Optimization
Numerical Optimization Unit 2: Multivariable optimization problems Che-Rung Lee Scribe: February 28, 2011 (UNIT 2) Numerical Optimization February 28, 2011 1 / 17 Partial derivative of a two variable function
More information