Recent developments on sparse representation. Zeng Tieyong, Department of Mathematics, Hong Kong Baptist University. Email: zeng@hkbu.edu.hk. Dec. 8, 2008
Outline 1. Introduction 2. MP shrinkage algorithm 3. TV dictionary model 4. TV wavelet shrinkage 5. Conclusions
Background. Research on sparse representation splits into two tasks. Dictionary learning task: given images, find a dictionary. Sparse representation task: given a dictionary, find a representation of any image; methods include Basis Pursuit, Matching Pursuit, and Orthogonal Matching Pursuit.
Mathematical framework. Let H be a Hilbert space and suppose the analyzed signal/image v ∈ H contains some noise: v = u + b, where u is the clean image. Dictionary: a subset D = {ψ_i, i ∈ I} of H. Atom: an element of the dictionary, usually normalized. Task of sparse representation: find a linear expansion that approximates v using as few atoms as possible.
Background. Various approaches have been proposed for the sparse representation problem, such as: Basis Pursuit — S. Chen, D. Donoho, and M. Saunders, Atomic Decomposition by Basis Pursuit, SIAM J. Sci. Comput., Vol. 20(1), pp. 33-61 (1998). Matching Pursuit — S. Mallat and Z. Zhang, Matching Pursuits with Time-Frequency Dictionaries, IEEE Trans. Signal Process., Vol. 41(12), pp. 3397-3415 (1993). Orthogonal Matching Pursuit — Y. Pati, R. Rezaiifar, and P. Krishnaprasad, Orthogonal Matching Pursuit: Recursive Function Approximation with Applications to Wavelet Decomposition, Proc. of 27th Asilomar Conf. on Signals, Systems and Computers, Los Alamitos (1993).
Basis Pursuit. This model considers: min_{(λ_i)_{i∈I}} ‖v − Σ_{i∈I} λ_i ψ_i‖² + α Σ_{i∈I} |λ_i|, where α is a regularization parameter. Advantages: tractable compared with ℓ0-norm minimization; easy to integrate into other variational models. Disadvantages: still a difficult optimization task; the tuning of the parameter α is not straightforward.
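The ℓ1-penalized objective above can be minimized by iterative soft thresholding (ISTA) — one standard solver among many, not the method of the talk. A minimal sketch, assuming a finite dictionary stored as the unit-norm columns of a matrix D:

```python
import numpy as np

def ista_basis_pursuit(v, D, alpha, n_iter=200):
    """Minimize ||v - D c||^2 + alpha * sum|c_i| by iterative
    soft thresholding (illustrative sketch)."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2     # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ c - v)      # gradient of the quadratic term
        z = c - grad / L                    # gradient step
        c = np.sign(z) * np.maximum(np.abs(z) - alpha / L, 0.0)  # prox of alpha|.|
    return c

# toy usage: with an orthonormal dictionary the solution is a soft threshold
D = np.eye(3)
v = np.array([3.0, 0.4, -2.0])
c = ista_basis_pursuit(v, D, alpha=1.0)     # -> [2.5, 0.0, -1.5]
```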
Matching Pursuit. MP approximates v by iteratively decomposing the n-th residual R^n v. Set R^0 v = v, n = 0. Iterate (loop in n): 1. find the best atom ψ_{γ_n} by γ_n = arg sup_{i∈I} |⟨R^n v, ψ_i⟩|; 2. sub-decompose: R^{n+1} v = R^n v − ⟨R^n v, ψ_{γ_n}⟩ ψ_{γ_n}. In applications, for M predefined, we can take the M first terms as the result u: u = Σ_{n=0}^{M−1} ⟨R^n v, ψ_{γ_n}⟩ ψ_{γ_n}. (1)
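The loop above can be sketched in a few lines of NumPy (an illustrative toy, assuming a finite dictionary stored as the unit-norm columns of a matrix D):

```python
import numpy as np

def matching_pursuit(v, D, M):
    """Plain Matching Pursuit: greedily decompose v over the columns
    of D (unit-norm atoms), keeping M terms."""
    residual = v.astype(float).copy()       # R^0 v = v
    approx = np.zeros_like(residual)
    for _ in range(M):
        corr = D.T @ residual               # <R^n v, psi_i> for every atom
        gamma = np.argmax(np.abs(corr))     # best-matching atom
        coeff = corr[gamma]
        approx += coeff * D[:, gamma]       # accumulate the new term
        residual -= coeff * D[:, gamma]     # R^{n+1} v
    return approx, residual

# toy usage: on an orthonormal dictionary, M = 2 keeps the 2 largest coefficients
D = np.eye(4)
v = np.array([3.0, 0.0, -2.0, 0.5])
u, r = matching_pursuit(v, D, M=2)
```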
Remarks on MP. Advantages: simple; correctly picks up atoms when a sparse solution exists; useful for compression in the noiseless case. Disadvantages: heavy computation; what about the noisy case?
Outline 1. Introduction 2. MP shrinkage algorithm 3. TV dictionary model 4. TV wavelet shrinkage 5. Conclusions
Wavelet shrinkage. Let D = (ψ_i)_{i∈I} ⊂ H be a wavelet basis. For the noisy image v, this method takes: u = Σ_{i∈I} θ_τ(⟨v, ψ_i⟩) ψ_i, (2) where τ is a fixed positive parameter and θ_τ is a shrinkage function. Typical examples: (soft thresholding) ρ_τ(t) = (|t| − τ) sgn(t) when |t| ≥ τ, and 0 otherwise; (3) (hard thresholding) h_τ(t) = t when |t| ≥ τ, and 0 otherwise. (4)
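Eqs. (3) and (4) translate directly into NumPy; a minimal sketch:

```python
import numpy as np

def soft_threshold(t, tau):
    # rho_tau(t) = (|t| - tau) * sgn(t) when |t| >= tau, else 0   -- Eq. (3)
    return np.sign(t) * np.maximum(np.abs(t) - tau, 0.0)

def hard_threshold(t, tau):
    # h_tau(t) = t when |t| >= tau, else 0                        -- Eq. (4)
    return np.where(np.abs(t) >= tau, t, 0.0)

c = np.array([-5.0, -1.0, 0.5, 3.0])
s = soft_threshold(c, 2.0)   # -> [-3., 0., 0., 1.]
h = hard_threshold(c, 2.0)   # -> [-5., 0., 0., 3.]
```

Note that soft thresholding shrinks the surviving coefficients toward zero by τ, while hard thresholding keeps them unchanged.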
Why shrinkage on a dictionary. One can observe that when D is a wavelet basis, MP is exactly wavelet shrinkage with the hard thresholding function. Is soft shrinkage better than hard shrinkage? Moreover, denote u_n := R^n v − b. As ψ_{γ_n} ∈ D is selected by MP, we have: ⟨R^n v, ψ_{γ_n}⟩ ψ_{γ_n} = ⟨u_n, ψ_{γ_n}⟩ ψ_{γ_n} + ⟨b, ψ_{γ_n}⟩ ψ_{γ_n}. It is inappropriate to crudely replace ⟨u_n, ψ_{γ_n}⟩ ψ_{γ_n} by ⟨R^n v, ψ_{γ_n}⟩ ψ_{γ_n}. We propose to shrink ⟨R^n v, ψ_{γ_n}⟩ at each iteration of MP.
General shrinkage functions. Definition 1. A function θ(·) : R → R is called a shrinkage function if and only if it satisfies: 1. θ(·) is nondecreasing, i.e., for all t, t′ ∈ R, t ≤ t′ ⇒ θ(t) ≤ θ(t′); 2. θ(·) is a shrinkage, i.e., for all t ∈ R, |θ(t)| ≤ |t|.
The MP shrinkage algorithm. Fix v ∈ H. Let R^0 v = v and let α ∈ (0, 1] be a predefined factor. Iterate for n ∈ N: find an atom ψ_{γ_n} ∈ D such that |⟨ψ_{γ_n}, R^n v⟩| ≥ α sup_{i∈I} |⟨R^n v, ψ_i⟩|; sub-decompose R^n v as R^n v = s_n ψ_{γ_n} + R^{n+1} v, (5) where s_n = θ(M_n) with M_n = ⟨R^n v, ψ_{γ_n}⟩. (6) Finally, take u = Σ_{n=0}^{+∞} s_n ψ_{γ_n}. A. θ = Id: usual MP; B. D a wavelet basis: wavelet shrinkage.
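The steps above can be sketched as follows (an illustrative toy with a finite dictionary as the unit-norm columns of a matrix D; the near-best atom within factor α is simply taken as the best one here):

```python
import numpy as np

def mp_shrinkage(v, D, theta, n_iter, alpha=1.0):
    """MP shrinkage sketch: at each step pick a near-best atom
    (within factor alpha of the best) and shrink its coefficient
    with the shrinkage function theta."""
    residual = v.astype(float).copy()          # R^0 v = v
    u = np.zeros_like(residual)
    for _ in range(n_iter):
        corr = D.T @ residual
        best = np.max(np.abs(corr))
        # first atom whose correlation reaches alpha * best
        gamma = np.argmax(np.abs(corr) >= alpha * best)
        s = theta(corr[gamma])                 # s_n = theta(M_n), Eq. (6)
        u += s * D[:, gamma]
        residual -= s * D[:, gamma]            # R^{n+1} v, Eq. (5)
    return u, residual

# toy usage with soft thresholding (tau = 2); theta = identity recovers plain MP
soft = lambda t, tau=2.0: np.sign(t) * max(abs(t) - tau, 0.0)
D = np.eye(3)
v = np.array([5.0, -3.0, 1.0])
u, r = mp_shrinkage(v, D, soft, n_iter=2)
```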
Convergence of MP shrinkage. Does MP shrinkage converge? Theorem 1. Let (ψ_i)_{i∈I} be a normed dictionary, v ∈ H and θ(·) a shrinkage function. The sequences defined in Eq. (6) satisfy: (R^n v)_{n∈N} converges. As a consequence, Σ_{n=0}^{+∞} s_n ψ_{γ_n} exists. We denote the limit of (R^n v)_{n∈N} by R^+ v, and we trivially have v = Σ_{n=0}^{+∞} s_n ψ_{γ_n} + R^+ v.
Bound on ℓ1 regularity. Interior threshold: τ₋ := inf_{t>0: θ(t)≠0} t; exterior threshold: τ₊ := sup_{t>0: θ(t)=0} t. θ is called a thresholding function iff τ₋ > 0. Theorem 2. Let (ψ_i)_{i∈I} be a normed dictionary, v ∈ H and θ(·) a thresholding function. The quantities defined in Eq. (6) satisfy: Σ_{n=0}^{+∞} |s_n| ≤ (‖v‖² − ‖R^+ v‖²)/τ₋ ≤ ‖v‖²/τ₋, (7) where τ₋ > 0 denotes the interior threshold.
Bound on the residual norm. Define the semi-norm on H: ‖u‖_D := sup_{i∈I} |⟨u, ψ_i⟩|, u ∈ H. Let V = Span(D), with V ⊕ V^⊥ = H. Theorem 3. Let (ψ_i)_{i∈I} be a normed dictionary, v ∈ H and θ(·) a shrinkage function. The limits of MP shrinkage satisfy ‖R^+ v − P_{V^⊥} v‖_D ≤ τ₊/α and ‖Σ_{n=0}^{+∞} s_n ψ_{γ_n} − P_V v‖_D ≤ τ₊/α, where τ₊ is the exterior threshold.
Experiments for MP/MP shrinkage. Figure 1: Basic filters used to construct the dictionary. Each filter is extended to the same size as the underlying noisy image by zero-padding and then translated over the plane. Left: DCT filters; right: nine letter filters.
Convergence: |s_n|. Figure 2: The quantity |s_n| (y-axis, log scale) as a function of the iteration number n (x-axis), for MP shrinkage with the soft thresholding function and τ = 0 (i.e. MP), τ = 10, 50 and 100.
Detection by the letter dictionary. Figure 3: Top-left: clean image; top-middle: noisy image with Gaussian noise of standard deviation 150; top-right: denoised image by ROF (λ = 1/300); bottom-left: wavelet soft shrinkage (τ = 400); bottom-middle: MP with the letter dictionary; bottom-right: MP shrinkage with the letter dictionary (soft shrinkage with τ = 400).
Outline 1. Introduction 2. MP shrinkage algorithm 3. TV dictionary model 4. TV wavelet shrinkage 5. Conclusions
Total variation dictionary model. The task of image denoising is to recover an ideal image u ∈ L²(Ω) from a noisy observation v = u + b, where Ω, a rectangle of R², is the image domain, v ∈ L²(Ω) is the noisy image and b ∈ L²(Ω) is Gaussian noise of standard deviation σ. We are interested in the following total variation dictionary model: (P): min TV(w) subject to |⟨w − v, ψ⟩| ≤ τ, ∀ψ ∈ D, for a finite dictionary D ⊂ L²(Ω), which is often symmetric, and a positive parameter τ associated with the noise level.
Remarks on (P). Advantages: recovers texture better than the ROF model; rather flexible. Key problem: how to design the dictionary.
KKT parameters. Suppose w is a solution of (P). By the Kuhn-Tucker theorem, there exist nonnegative Lagrangian parameters (λ_ψ)_{ψ∈D} such that: Σ_{ψ∈D} λ_ψ ψ = −div(∇w/|∇w|). (8)
Dual form. Let TV* denote the conjugate function of TV. Let f : H → [−∞, +∞] be a convex function. The Bregman distance associated with f for points p, q ∈ H is: B_f(p, q) = f(p) − f(q) − ⟨∂f(q), p − q⟩, where ∂f(q) is a subgradient of f at the point q. Theorem 1. The dual problem of (P) is: min_{(λ_ψ)_{ψ∈D} ≥ 0} B_{TV*}(Σ_{ψ∈D} λ_ψ ψ, −div(∇v/|∇v|)) + τ Σ_{ψ∈D} λ_ψ, (9) where B_{TV*} is the Bregman distance associated with TV*.
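For a differentiable convex f the Bregman distance above is easy to evaluate directly; a small sanity-check sketch (hypothetical helper names, not from the talk; for nonsmooth functions like TV one plugs in a subgradient instead of the gradient):

```python
import numpy as np

def bregman_distance(f, grad_f, p, q):
    # B_f(p, q) = f(p) - f(q) - <grad f(q), p - q>
    return f(p) - f(q) - np.dot(grad_f(q), p - q)

# example: for f(x) = ||x||^2, the Bregman distance is exactly ||p - q||^2
f = lambda x: np.dot(x, x)
grad_f = lambda x: 2.0 * x
p, q = np.array([1.0, 2.0]), np.array([0.0, 1.0])
d = bregman_distance(f, grad_f, p, q)   # -> 2.0, i.e. ||p - q||^2
```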
Ad-hoc dictionary. The sparsest case: the curvature of the ideal image. Figure 4: Left: curvature of the Lena image; right: curvature of letters.
Figure 5: Denoising by (P) with the ad-hoc dictionary and by ROF. Top: clean image; noisy image (σ = 20, PSNR = 22.11); middle: result of ROF (PSNR = 27.66); result of (P) with the ad-hoc dictionary (PSNR = 34.93); bottom: residues of ROF and of (P).
Figure 6: Image decomposition. Top: clean image; noisy image to decompose, obtained with 20% impulse noise; middle: cartoon part and noisy-texture part of the ROF model; bottom: letter part and noisy-background part of model (P).
Figure 7: Image denoising. Top: clean image; noisy image with σ = 20, PSNR = 22.08; middle: denoising result of ROF with PSNR = 24.56, and residue of ROF; bottom: denoising result of (P) with PSNR = 31.20, and residue of (P).
Outline 1. Introduction 2. MP shrinkage algorithm 3. TV dictionary model 4. TV wavelet shrinkage 5. Conclusions
Total variation based shrinkage model. Total variation model: min_{w∈H} ½‖f − w‖₂² + β TV(w). Wavelet shrinkage: min_{r∈H} ½‖f − r‖₂² + α Σ_{ϕ∈D} |⟨r, ϕ⟩|. TV wavelet shrinkage: min_{w,r∈H} E(r, w) := ½‖f − r − w‖₂² + α Σ_{ϕ∈D} |⟨r, ϕ⟩| + β TV(w).
Alternating minimization direction algorithm. 1. Initialize r₀. 2. Repeat until convergence: update w_n = arg min_w ½‖(f − r_{n−1}) − w‖₂² + β TV(w), (10) r_n = arg min_r ½‖(f − w_n) − r‖₂² + α Σ_{ϕ∈D} |⟨r, ϕ⟩|; (11) take u_n = w_n + r_n.
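A toy 1D sketch of the alternating scheme, under loud assumptions: the TV subproblem (10) is handled by gradient descent on a smoothed TV term (a crude stand-in for a proper ROF solver), and the basis B is assumed orthonormal so that subproblem (11) reduces to soft thresholding. All helper names are illustrative:

```python
import numpy as np

def tv_denoise_1d(g, beta, n_steps=300, eps=1e-2, lr=0.1):
    """Stand-in for Eq. (10): gradient descent on
    0.5*||g - w||^2 + beta * sum sqrt(dw^2 + eps^2)."""
    w = g.astype(float).copy()
    for _ in range(n_steps):
        d = np.diff(w)
        q = d / np.sqrt(d * d + eps * eps)          # d/dw of smoothed TV
        grad_tv = np.concatenate(([-q[0]], q[:-1] - q[1:], [q[-1]]))
        w -= lr * ((w - g) + beta * grad_tv)
    return w

def soft(t, tau):
    return np.sign(t) * np.maximum(np.abs(t) - tau, 0.0)

def tv_wavelet_shrinkage(f, B, alpha, beta, n_outer=10):
    """Alternate Eqs. (10) and (11); with orthonormal B, Eq. (11)
    is exactly a soft threshold of the analysis coefficients."""
    r = np.zeros_like(f)
    for _ in range(n_outer):
        w = tv_denoise_1d(f - r, beta)              # Eq. (10)
        r = B @ soft(B.T @ (f - w), alpha)          # Eq. (11)
    return w + r                                    # u_n = w_n + r_n

# toy usage: a noisy step edge, identity "basis" for illustration
f = np.concatenate([np.zeros(8), np.ones(8)]) + 0.1 * np.sin(np.arange(16))
u = tv_wavelet_shrinkage(f, np.eye(16), alpha=0.2, beta=0.3)
```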
Strong convergence. Theorem 2. We have: 1. the sequence (w_n, r_n) converges strongly to a global minimizer of E(r, w); 2. the sequence (u_n) converges strongly to a unique point, regardless of the initialization (r₀, w₀).
Experiments on Cameraman. Figure 8: Top-left: clean image; top-right: noisy image with Gaussian noise of standard deviation 20; bottom-left: wavelet soft shrinkage with α = 50, SNR = 11.89; bottom-middle: ROF, SNR = 13.61; bottom-right: new model with α = 50, β = 60, SNR = 15.73.
Conclusion 1. MP shrinkage algorithm 2. TV dictionary model 3. TV wavelet shrinkage