The Analysis Cosparse Model for Signals and Images

Raja Giryes, Computer Science Department, Technion
Sangnam Nam, CMI, Marseille, France
Michael Elad, Technion, Haifa, Israel
Remi Gribonval, INRIA, Rennes, France
Mike E. Davies, University of Edinburgh, Scotland

The research leading to these results has received funding from the European Research Council under the European Union's Seventh Framework Program, ERC Grant agreement no. 30649.
Outline
- The classical synthesis model versus the new analysis model.
- A recipe for generating algorithms for the new model.
- A new set of tools that provide a theory with uniform recovery guarantees.
- Some open problems.
Agenda
- Analysis and Synthesis - Introduction
- From Synthesis to Analysis
- Experimental Results
- Theoretical Guarantees
Problem Setup
Given the linear measurements
    y = M x_0 + e,    y ∈ R^m,
recover x_0 ∈ R^d from y.
- e is the noise term.
- M ∈ R^{m×d} is the measurement matrix (m < d).
- In the noiseless case e = 0.
Synthesis Sparsity Prior
- Given that x_0 is k-sparse, we can recover it stably from y under proper assumptions on M.
- The same holds if x_0 has a k-sparse representation under certain types of dictionaries D ∈ R^{d×n}:
    x_0 = D z_0,    ‖z_0‖_0 ≤ k.
Synthesis Sparsity Prior
[Figure: x_0 = D z_0 with ‖z_0‖_0 ≤ k; the zero and non-zero locations of z_0 are marked, here with k = 5.]
Synthesis Minimization Problem
The problem we aim at solving in synthesis is
    ẑ = argmin_z ‖z‖_0  s.t.  y = M D z,    x̂ = D ẑ.
The above can be approximated stably in polynomial time under proper assumptions on M and D.
Analysis Model
- Another way to model the sparsity of the signal is the analysis model [Elad, Milanfar and Rubinstein, 2007].
- It looks at the coefficients of Ω x_0.
- Ω ∈ R^{p×d} is a given transform, the analysis dictionary.
Analysis Model - Example
Assume x_0 ∈ R^{4×4} and Ω is the 2D finite-difference operator (p = 24).
[Figure: x_0 and Ω x_0 with zero and non-zero locations marked; Ω x_0 has 3 non-zeros and l = 21 zeros.]
What is the dimension of the subspace in which x_0 resides?
Cosparsity and Corank
We are interested in the zeros of Ω x_0.
- Cosparsity l: Ω x_0 contains l zeros.
- Cosupport Λ: the locations of the zeros.
- Corank r: the rank of Ω_Λ.
- If Ω is in general position then l = r.
- In general, x_0 resides in a subspace of dimension d − r.
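These quantities are easy to check numerically. Below is a minimal numpy sketch (an illustration of ours, not from the talk: the random Omega, the tolerance, and the helper names are arbitrary choices). It builds a vector orthogonal to the first two rows of Omega and reads off its cosparsity and corank.

```python
import numpy as np

def cosupport(Omega, x, tol=1e-10):
    """Indices of the rows of Omega orthogonal to x (the zeros of Omega @ x)."""
    return np.flatnonzero(np.abs(Omega @ x) <= tol)

def cosparsity_and_corank(Omega, x, tol=1e-10):
    """Cosparsity l = number of zeros in Omega @ x; corank r = rank of Omega_Lambda."""
    Lam = cosupport(Omega, x, tol)
    r = np.linalg.matrix_rank(Omega[Lam]) if Lam.size else 0
    return Lam.size, r

rng = np.random.default_rng(0)
Omega = rng.standard_normal((6, 4))     # p = 6, d = 4, rows in general position
_, _, Vt = np.linalg.svd(Omega[:2])
x = Vt[-1]                              # x lies in the null space of rows 0 and 1
l, r = cosparsity_and_corank(Omega, x)  # x then resides in a (d - r)-dim subspace
```

Since this Omega is in general position, l = r here, matching the slide.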
Synthesis versus Analysis
Synthesis:
- Sparsity k: k non-zeros.
- Synthesis dictionary D ∈ R^{d×n}.
- Sparse representation: x = D z, where z has (at most) k non-zeros.
- D_i is the i-th column of D.
Analysis:
- Cosparsity l: l zeros.
- Analysis dictionary Ω ∈ R^{p×d}.
- Cosparse representation: Ω x has (at least) l zeros, i.e. ‖Ω x‖_0 ≤ p − l.
- ω_i is the i-th row of Ω.
Synthesis versus Analysis
Synthesis:
- Support T ⊆ {1..n}: the indices of the non-zeros in z.
- x resides in a k-dimensional subspace spanned by D_T = {D_i, i ∈ T}.
Analysis:
- Cosupport Λ ⊆ {1..p}: the indices of the zeros in Ω x.
- x resides in a (d − r)-dimensional subspace, orthogonal to the subspace spanned by {ω_i, i ∈ Λ}.
Analysis Minimization Problem
- We work directly with x and not with its representation.
- This generates a different family of signals.
- The signals are characterized by their behavior and not by their building blocks.
The problem we aim at solving in analysis is
    x̂ = argmin_x ‖Ω x‖_0  s.t.  y = M x.
Phantom and Fourier Sampling
[Figure: the Shepp-Logan phantom x_0 and its measurements M x_0, taken along 12 sampled Fourier radial lines.]
Naïve Recovery
[Figure: naïve recovery, the ℓ2-minimization result with no prior.]
Shepp-Logan Derivatives
[Figure: Ω_2D-DIF x_0, the result of applying the 2D finite-difference operator to the phantom.]
Approximation Methods
The analysis problem is hard, just like the synthesis one, and thus approximation techniques are required.
ℓ1 Relaxation
For synthesis:
    ẑ = argmin_z ‖z‖_1  s.t.  y = M D z.
For analysis:
    x̂ = argmin_x ‖Ω x‖_1  s.t.  y = M x.
More Approximation Techniques
In synthesis we have many more approximation techniques:
- OMP
- ROMP [Needell and Vershynin, 2009]
- CoSaMP [Needell and Tropp, 2009]
- SP [Dai and Milenkovic, 2009]
- IHT [Blumensath and Davies, 2009]
- HTP [Foucart, 2010]
Can we convert them to the analysis framework?
Agenda
- Analysis and Synthesis - Introduction
- From Synthesis to Analysis  (novel part)
- Experimental Results
- Theoretical Guarantees
Today's Algorithms
- We present a general scheme for translating synthesis operations into analysis ones.
- This provides us with a recipe for converting the algorithms.
- The recipe is general and easy to use.
- We apply this recipe to CoSaMP, SP, IHT and HTP.
- The theoretical study of analysis techniques is more complicated.
(Co)Sparse Vector Addition
Synthesis: given two vectors z_1 and z_2 with supports T_1 and T_2 of sizes k_1 and k_2.
- Task: find the support of z_1 + z_2.
- Solution: supp(z_1 + z_2) ⊆ T_1 ∪ T_2.
- The maximal size of the support: |T_1 ∪ T_2| ≤ k_1 + k_2.
Analysis: given two vectors x_1 and x_2 with cosupports Λ_1 and Λ_2 of sizes l_1 and l_2.
- Task: find the cosupport of x_1 + x_2.
- Solution: cosupp(x_1 + x_2) ⊇ Λ_1 ∩ Λ_2.
- The minimal size of the cosupport: |Λ_1 ∩ Λ_2| ≥ l_1 + l_2 − p.
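The intersection rule for cosupports can be sanity-checked numerically. A small numpy sketch (our own illustration; Omega, the seeds, and the helpers are arbitrary): build two vectors whose cosupports share row 2 and verify that the shared zero survives in the sum.

```python
import numpy as np

rng = np.random.default_rng(1)
Omega = rng.standard_normal((8, 5))

def cosupport(x, tol=1e-10):
    """Indices of the zeros of Omega @ x, as a set."""
    return set(np.flatnonzero(np.abs(Omega @ x) <= tol))

def vector_with_cosupport(rows):
    """A vector orthogonal to the given rows of Omega (cosupport contains rows)."""
    _, _, Vt = np.linalg.svd(Omega[sorted(rows)])
    return Vt[-1]

x1 = vector_with_cosupport({0, 1, 2})   # cosupport contains {0, 1, 2}
x2 = vector_with_cosupport({2, 3, 4})   # cosupport contains {2, 3, 4}
common = cosupport(x1) & cosupport(x2)  # generically just {2}
```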
Orthogonal Projection
Synthesis: given a representation z and a support T.
- Task: orthogonal projection onto the subspace supported on T.
- Solution: keep the elements of z supported on T and zero the rest: z_T.
Analysis: given a vector x and a cosupport Λ.
- Task: orthogonal projection onto the subspace orthogonal to the one spanned by the rows of Ω_Λ.
- Solution: calculate the projection onto the subspace spanned by Ω_Λ and remove it from the vector:
    Q_Λ = I − Ω_Λ^† Ω_Λ.
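The analysis projection can be written with a pseudo-inverse, Q_Λ = I − Ω_Λ^† Ω_Λ. A minimal numpy sketch (the operator, the cosupport and the sizes are our own arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
Omega = rng.standard_normal((8, d))
Lam = [0, 3, 5]                          # a candidate cosupport
O_Lam = Omega[Lam]

# Q_Lam removes the component lying in the row span of Omega_Lam.
Q = np.eye(d) - np.linalg.pinv(O_Lam) @ O_Lam

x = rng.standard_normal(d)
x_proj = Q @ x                           # now Omega_Lam @ x_proj = 0
```

As expected for an orthogonal projection, Q is symmetric and idempotent.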
Objective-Aware Projection
Synthesis: given a vector z and a support T.
- Task: perform an objective-aware projection onto the subspace spanned by D_T.
- Solution: solve argmin_z ‖y − M D z‖_2 s.t. z_{T^C} = 0.
- This has a simple closed-form solution: (M D_T)^† y.
Analysis: given a vector x and a cosupport Λ.
- Task: perform an objective-aware projection onto the subspace orthogonal to the one spanned by Ω_Λ.
- Solution: solve argmin_x ‖y − M x‖_2 s.t. Ω_Λ x = 0.
- This also has a closed-form solution.
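One way to realize the analysis-side projection is to parametrize the feasible set {x : Ω_Λ x = 0} by a null-space basis and solve an ordinary least-squares problem in the reduced coordinates. A hedged numpy sketch (all names and dimensions are illustrative choices of ours; the cosupport rows are assumed independent):

```python
import numpy as np

rng = np.random.default_rng(3)
d, m, p = 6, 4, 9
Omega = rng.standard_normal((p, d))
M = rng.standard_normal((m, d))
y = rng.standard_normal(m)
Lam = [1, 4]                      # the given cosupport

# Columns of N span {x : Omega_Lam @ x = 0} (rows assumed independent here).
_, _, Vt = np.linalg.svd(Omega[Lam])
N = Vt[len(Lam):].T

# Solve min_z ||y - M N z||_2 and map back: x_hat = N z.
z, *_ = np.linalg.lstsq(M @ N, y, rcond=None)
x_hat = N @ z
```

The constraint holds exactly by construction, and the normal equations certify least-squares optimality over the feasible set.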
(Co)Support Selection
Synthesis: given a non-sparse vector z.
- Task: find the support T of the closest k-sparse vector.
- Solution: select the indices of the k largest elements in z: T = supp(z, k).
- This solution is optimal.
Analysis: given a non-cosparse vector x.
- Task: find the cosupport Λ of the closest l-cosparse vector.
- Solution: select the indices of the l smallest elements in Ω x: Λ = cosupp(Ω x, l).
- This solution is suboptimal.
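The (suboptimal) analysis selection rule is essentially a one-liner. A numpy sketch (sizes and names are our own):

```python
import numpy as np

def cosupp_hat(Omega, x, l):
    """Indices of the l smallest entries of |Omega @ x| (suboptimal in general)."""
    return np.sort(np.argsort(np.abs(Omega @ x))[:l])

rng = np.random.default_rng(4)
Omega = rng.standard_normal((8, 5))
x = rng.standard_normal(5)
Lam = cosupp_hat(Omega, x, 3)     # candidate cosupport of size 3
```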
Cosupport Selection
Task: find the cosupport of the closest l-cosparse vector to x:
    argmin_Λ ‖x − Q_Λ x‖_2.
In some cases efficient algorithms exist:
- Simple thresholding for the unitary case.
- Dynamic programming for the one-dimensional finite-difference operator Ω_1D-DIF [Han et al., 2004] and for the fused-Lasso operator Ω_FUS = [Ω_1D-DIF ; I] [Giryes et al., 2013].
Near-Optimal Cosupport Selection
- In general, the cosupport selection problem for an arbitrary Ω is NP-hard [Gribonval et al., 2013].
- A cosupport selection scheme Ŝ_l is near-optimal with a constant C_l if for any vector v ∈ R^d
    ‖v − Q_{Ŝ_l(v)} v‖_2^2 ≤ C_l · inf_{x is l-cosparse} ‖v − x‖_2^2.
- Reminder: Q_{Ŝ_l(v)} = I − Ω_{Ŝ_l(v)}^† Ω_{Ŝ_l(v)}.
Cosupport Selection - Open Problems
- Find new types of analysis operators with optimal or near-optimal cosupport selection schemes.
- Find an efficient global near-optimal cosupport selection scheme with a given constant C_l.
Subspace Pursuit (SP) [Dai and Milenkovic, 2009]
Initialize: r^0 = y, t = 0, T^0 = ∅. Output: x̂ = D ẑ^t.
Iterate:
1. Find new support elements: T_Δ = supp(D* M* r^t, k).
2. Support update: T̃ = T^t ∪ T_Δ.
3. Calculate a temporal representation: z_p = argmin_z ‖y − M D z‖_2 s.t. z_{T̃^C} = 0.
4. Estimate a k-sparse support: T^{t+1} = supp(z_p, k).
5. Calculate a new representation: ẑ^{t+1} = argmin_z ‖y − M D z‖_2 s.t. z_{(T^{t+1})^C} = 0.
6. Update the residual: r^{t+1} = y − M D ẑ^{t+1}.
Analysis SP (ASP) [Giryes and Elad, 2012]
Initialize: r^0 = y, t = 0, Λ^0 = {1..p}. Output: x̂ = x̂^t.
Iterate:
1. Find new cosupport elements: Λ_Δ = Ŝ_{al}(M* r^t).
2. Cosupport update: Λ̃ = Λ^t ∩ Λ_Δ.
3. Calculate a temporal solution: x_p = argmin_x ‖y − M x‖_2 s.t. Ω_Λ̃ x = 0.
4. Estimate an l-cosparse cosupport: Λ^{t+1} = Ŝ_l(x_p).
5. Compute a new solution: x̂^{t+1} = argmin_x ‖y − M x‖_2 s.t. Ω_{Λ^{t+1}} x = 0.
6. Update the residual: r^{t+1} = y − M x̂^{t+1}.
Compressive Sampling Matching Pursuit (CoSaMP) [Needell and Tropp, 2009]
Initialize: r^0 = y, t = 0, T^0 = ∅. Output: x̂ = D ẑ^t.
Iterate:
1. Find new support elements: T_Δ = supp(D* M* r^t, 2k).
2. Support update: T̃ = T^t ∪ T_Δ.
3. Calculate a temporal representation: z_p = argmin_z ‖y − M D z‖_2 s.t. z_{T̃^C} = 0.
4. Estimate a k-sparse support: T^{t+1} = supp(z_p, k).
5. Compute a new solution: ẑ^{t+1} = (z_p)_{T^{t+1}}.
6. Update the residual: r^{t+1} = y − M D ẑ^{t+1}.
Analysis CoSaMP (ACoSaMP) [Giryes and Elad, 2012]
Initialize: r^0 = y, t = 0, Λ^0 = {1..p}. Output: x̂ = x̂^t.
Iterate:
1. Find new cosupport elements: Λ_Δ = Ŝ_{al}(M* r^t).
2. Cosupport update: Λ̃ = Λ^t ∩ Λ_Δ.
3. Calculate a temporal solution: x_p = argmin_x ‖y − M x‖_2 s.t. Ω_Λ̃ x = 0.
4. Estimate an l-cosparse cosupport: Λ^{t+1} = Ŝ_l(x_p).
5. Compute a new solution: x̂^{t+1} = Q_{Λ^{t+1}} x_p.
6. Update the residual: r^{t+1} = y − M x̂^{t+1}.
AIHT and AHTP
Applying the same scheme to iterative hard thresholding (IHT) and hard thresholding pursuit (HTP) results in their analysis versions, AIHT and AHTP. [Giryes, Nam, Gribonval and Davies, 2011]
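To make the recipe concrete, here is a hedged numpy sketch of an AIHT-style iteration built from the blocks above: a gradient step, cosupport selection by the smallest analysis coefficients, and the analysis projection. It is a toy illustration only; we use a unitary Omega (for which this selection is optimal), the conservative step size 1/‖M‖_2^2, and arbitrary dimensions, not the tuned choices from the papers.

```python
import numpy as np

def aiht(y, M, Omega, l, n_iter=200):
    """Sketch of analysis IHT: gradient step, cosupport selection, projection."""
    step = 1.0 / np.linalg.norm(M, 2) ** 2
    x = np.zeros(M.shape[1])
    for _ in range(n_iter):
        g = x + step * M.T @ (y - M @ x)             # gradient step
        Lam = np.argsort(np.abs(Omega @ g))[:l]      # l smallest analysis coefficients
        O_Lam = Omega[Lam]
        x = g - np.linalg.pinv(O_Lam) @ (O_Lam @ g)  # x = Q_Lam g
    return x

rng = np.random.default_rng(5)
d, m, l = 20, 15, 17                                 # cosparsity l, so d - l = 3 active rows
Omega, _ = np.linalg.qr(rng.standard_normal((d, d))) # a unitary analysis operator
z = np.zeros(d)
z[:d - l] = rng.standard_normal(d - l)
x0 = Omega.T @ rng.permutation(z)                    # an l-cosparse signal
M = rng.standard_normal((m, d)) / np.sqrt(m)
y = M @ x0                                           # noiseless measurements
x_hat = aiht(y, M, Omega, l)
```

With this step size and an optimal selection the fidelity term is non-increasing, so the final iterate is l-cosparse and fits the data at least as well as the zero initialization.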
Algorithm Variations
Relaxed ACoSaMP (RACoSaMP) and relaxed ASP (RASP): instead of
    argmin_x ‖y − M x‖_2  s.t.  Ω_Λ x = 0
solve the relaxed problem
    argmin_x ‖y − M x‖_2^2 + λ ‖Ω_Λ x‖_2^2.
Easier to solve in high-dimensional problems. [Giryes and Elad, 2012]
Other Existing Analysis Methods
- Greedy analysis pursuit (GAP) [Nam, Davies, Elad and Gribonval, 2013]:
  - Selects the cosupport iteratively.
  - In each iteration removes elements from the cosupport in a greedy way.
- Simple thresholding.
- Orthogonal backward greedy (OBG), an algorithm for the case M = I [Rubinstein, Peleg and Elad, 2013].
Agenda
- Analysis and Synthesis - Introduction
- From Synthesis to Analysis
- Experimental Results  (novel part)
- Theoretical Guarantees
Noiseless Signal Recovery - Setup
- A synthetic noiseless compressed sensing experiment.
- Ω ∈ R^{p×d} is the transpose of a random tight frame.
- d = 200, p = 240 (reminder: M ∈ R^{m×d}).
- M is drawn from a random Gaussian ensemble.
- We draw a phase diagram over a grid of m/d versus (d − l)/m.
- We repeat each experiment 50 times.
- We select a = 1.
[Phase diagrams: ACoSaMP, ASP, GAP and ℓ1-minimization.]
High-Dimensional Image Recovery
- Ω is the 2D finite-differences operator.
- Recovering the Shepp-Logan phantom from few radial lines of its two-dimensional Fourier transform.
- 15/12 radial lines suffice for RACoSaMP/RASP - less than 5.77%/4.63% of the data.
- Performance similar to GAP, better than ℓ1.
High-Dimensional Image Recovery
[Figure: the Shepp-Logan phantom and the 12 sampled radial lines.]
High-Dimensional Image Stable Recovery
- Additive zero-mean white Gaussian noise with an SNR of 20.
- Naïve recovery: PSNR = 18dB.
- Using 22 radial lines for RASP and 25 radial lines for RACoSaMP: PSNR of 36dB.
- Same performance as GAPN.
High-Dimensional Stable Recovery
[Figure: naïve recovery versus RASP recovery.]
Agenda
- Analysis and Synthesis - Introduction
- From Synthesis to Analysis
- Experimental Results
- Theoretical Guarantees  (novel part)
Restricted Isometry Property (RIP)
We say that a matrix M satisfies the RIP [Candès and Tao, 2006] with a constant δ_k if for every k-sparse vector u it holds that
    (1 − δ_k) ‖u‖_2^2 ≤ ‖M u‖_2^2 ≤ (1 + δ_k) ‖u‖_2^2.
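The RIP constant of a given matrix cannot be computed efficiently, but sampling random k-sparse vectors gives a Monte-Carlo lower bound on δ_k. A small numpy sketch (dimensions and trial count are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(6)
m, d, k, trials = 100, 200, 4, 400
M = rng.standard_normal((m, d)) / np.sqrt(m)   # unit column norm in expectation

ratios = []
for _ in range(trials):
    u = np.zeros(d)
    T = rng.choice(d, size=k, replace=False)   # a random support
    u[T] = rng.standard_normal(k)
    ratios.append(np.linalg.norm(M @ u) ** 2 / np.linalg.norm(u) ** 2)

# Any observed deviation of ||Mu||^2 / ||u||^2 from 1 lower-bounds delta_k.
delta_hat = max(max(ratios) - 1.0, 1.0 - min(ratios))
```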
Algorithm Guarantees
Given that z_0 is a k-sparse vector and that M D satisfies the RIP with a sufficiently small constant δ_{bk} (b is a small algorithm-dependent integer), then after a constant number of iterations t*, SP, CoSaMP, IHT and HTP satisfy
    ‖ẑ^{t*} − z_0‖_2 ≤ c_1 ‖e‖_2,
where c_1 is a given constant (the constant of each algorithm is different).
[Needell and Tropp, 2009; Dai and Milenkovic, 2009; Blumensath and Davies, 2009; Foucart, 2010]
Ω-RIP
We say that a matrix M satisfies the Ω-RIP with a constant δ_l if for every l-cosparse vector v it holds that
    (1 − δ_l) ‖v‖_2^2 ≤ ‖M v‖_2^2 ≤ (1 + δ_l) ‖v‖_2^2.
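The same Monte-Carlo idea applies to the Ω-RIP, with the sparse vectors replaced by l-cosparse ones drawn from null spaces of random cosupports (a hedged sketch; the operator and all sizes are our own choices):

```python
import numpy as np

rng = np.random.default_rng(7)
d, p, m, l = 80, 120, 60, 76
Omega = rng.standard_normal((p, d))
M = rng.standard_normal((m, d)) / np.sqrt(m)

ratios = []
for _ in range(300):
    Lam = rng.choice(p, size=l, replace=False)        # a random cosupport
    r = np.linalg.matrix_rank(Omega[Lam])
    _, _, Vt = np.linalg.svd(Omega[Lam])
    N = Vt[r:].T                                      # basis of the (d - r)-dim subspace
    v = N @ rng.standard_normal(N.shape[1])           # an l-cosparse vector
    ratios.append(np.linalg.norm(M @ v) ** 2 / np.linalg.norm(v) ** 2)

delta_hat = max(max(ratios) - 1.0, 1.0 - min(ratios)) # lower bound on delta_l
```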
Near-Optimal Cosupport Selection - Reminder
A cosupport selection scheme Ŝ_l is near-optimal with a constant C_l if for any vector v ∈ R^d
    ‖v − Q_{Ŝ_l(v)} v‖_2^2 ≤ C_l · inf_{x is l-cosparse} ‖v − x‖_2^2,
where Q_{Ŝ_l(v)} = I − Ω_{Ŝ_l(v)}^† Ω_{Ŝ_l(v)}.
Reconstruction Guarantees
Theorem: apply either ACoSaMP or ASP with a = (2l − p)/l, obtaining x̂^t after t iterations. For C = max(C_l, C_{2l−p}) there exists a reference constant δ(C, M) greater than zero such that if δ_{4l−3p} < δ(C, M), then after a finite number of iterations t*
    ‖x̂^{t*} − x_0‖_2 ≤ c ‖e‖_2,
where c is a function of δ_{4l−3p}, C_l, C_{2l−p} and M.
[Giryes, Nam, Elad, Gribonval and Davies, 2013]
Special Cases - Optimal Projection
Given an optimal projection scheme, the condition of the theorem is simply
    δ_{4l−3p} ≤ 0.0156.
When Ω is unitary, the RIP and the Ω-RIP coincide and the condition becomes
    δ_{4k}(M Ω*) ≤ 0.0156.
AIHT and AHTP
- AIHT and AHTP have a theorem similar to that of ASP and ACoSaMP.
- Given an optimal projection scheme, AIHT and AHTP recover stably if δ_{2l−p} ≤ 1/3.
- In the unitary case: δ_{2k}(M Ω*) ≤ 1/3.
Ω-RIP Matrices
Theorem: for a fixed Ω ∈ R^{p×d}, if M ∈ R^{m×d} satisfies the concentration inequality
    P( |‖M z‖_2^2 − ‖z‖_2^2| ≥ ε ‖z‖_2^2 ) ≤ 2 e^{−C_M m ε^2}
for any z ∈ R^d, then for any constant δ_l > 0, M has the Ω-RIP with constant δ_l with probability exceeding 1 − e^{−t} if
    m ≥ (32 / (C_M δ_l^2)) · ( (p − l) log(9p / ((p − l) δ_l)) + t ).
[Blumensath and Davies, 2009; Giryes, Nam, Elad, Gribonval and Davies, 2013]
Linear Dependencies in Ω
- For the Ω-RIP, the number of measurements we need is m = O((p − l) log(p / (p − l))).
- If Ω is in general position then l < d, so if p = 2d we have m > d.
- If we allow linear dependencies between the rows of Ω then l can be greater than d, so if p = 2d we can have m < d.
- Conclusion: linear dependencies are permitted in Ω, and even encouraged.
Ω-RIP - Open Questions
- Johnson-Lindenstrauss matrices are also RIP and Ω-RIP matrices. Is there an advantage of the Ω-RIP over the RIP?
- Can we find a family of matrices that has the Ω-RIP but not the RIP?
- Can we get a guarantee in terms of d − r (r is the corank) instead of p − l?
- What is coherence in the analysis model?
Related Work
- D-RIP based recovery guarantees for ℓ1-minimization [Candès, Eldar, Needell and Randall, 2011; Liu, Mi and Li, 2012].
- ERC-like recovery conditions for GAP [Nam, Davies, Elad and Gribonval, 2013] and for ℓ1-minimization [Vaiter, Peyré, Dossal and Fadili, 2013].
- D-RIP based TV recovery guarantees [Needell and Ward, 2013].
- Thresholding denoising (M = I) performance for the case of Gaussian noise [Peleg and Elad, 2013].
Implications for the Synthesis Model
- Classically, in the synthesis model, linear dependencies between small groups of columns are not allowed.
- However, in the analysis model, since our target is the signal, linear dependencies are even encouraged.
- Does the same hold for synthesis?
Implications for the Synthesis Model
- Signal-space CoSaMP: signal recovery under the synthesis model also when the dictionary is highly coherent [Davenport, Needell and Wakin, 2013].
- Our analysis theoretical guarantees extend easily to synthesis [Giryes and Elad, 2013].
- The uniqueness and stability conditions for signal recovery and for representation recovery are not the same [Giryes and Elad, 2013].
Conclusion
- A new sparsity model, the analysis model: the zeros contain the information.
- A general recipe for converting standard synthesis techniques into analysis ones.
- New theory and tools.
- Linear dependencies are a good thing in the analysis dictionary.
- Implications for the synthesis model.
Reconstruction Guarantees (detailed)
Theorem: apply either ACoSaMP or ASP with a = (2l − p)/l, obtaining x̂^t after t iterations. If
    (C − 1) σ_M^2 / C < 1    (*)
and δ_{4l−3p} < δ(C, M), where C = max(C_l, C_{2l−p}) and δ(C, M) is a constant greater than zero whenever (*) is satisfied, then after a finite number of iterations t*
    ‖x̂^{t*} − x_0‖_2 ≤ c ‖e‖_2,
where c is a function of δ_{4l−3p}, C_l, C_{2l−p} and M.
[Giryes, Nam, Elad, Gribonval and Davies, 2013]
Iterative Hard Thresholding (IHT)
IHT [Blumensath and Davies, 2009] has two steps:
1. A gradient step with step size μ:
    z^{i+1} = ẑ^i + μ D* M* (y − M D ẑ^i).
2. Projection onto the closest k-sparse vector:
    ẑ^{i+1} = T_k(z^{i+1}),
where T_k is a hard-thresholding operator that keeps the k largest elements and zeros the rest.
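The two steps can be sketched in a few lines of numpy, with A playing the role of MD (toy sizes and the conservative step size 1/‖A‖_2^2 are our own choices, not those of the paper):

```python
import numpy as np

def iht(y, A, k, n_iter=200):
    """Iterative hard thresholding for y ~ A z with z k-sparse."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        g = z + step * A.T @ (y - A @ z)   # gradient step
        z = np.zeros_like(g)
        T = np.argsort(np.abs(g))[-k:]     # keep the k largest elements
        z[T] = g[T]
    return z

rng = np.random.default_rng(8)
n, m, k = 40, 25, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)
z0 = np.zeros(n)
z0[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
y = A @ z0                                 # noiseless measurements
z_hat = iht(y, A, k)
```

With this step size the fidelity term is non-increasing, so the output is k-sparse and fits the data at least as well as the zero initialization.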
Hard Thresholding Pursuit (HTP)
HTP [Foucart, 2010] is similar to IHT but differs in the projection step. Instead of projecting onto the closest k-sparse vector, it takes the support T^{i+1} of T_k(z^{i+1}) and minimizes the fidelity term:
    ẑ^{i+1} = argmin_z ‖y − M D z‖_2  s.t.  z_{(T^{i+1})^C} = 0,
with the closed form
    ẑ^{i+1}_{T^{i+1}} = (D*_{T^{i+1}} M* M D_{T^{i+1}})^{−1} D*_{T^{i+1}} M* y.
Changing Step Size
Instead of using a constant step size, one can choose in each iteration the step size μ that minimizes the fidelity term [Cevher, 2011]:
    μ^{i+1} = argmin_μ ‖y − M D ẑ^{i+1}‖_2^2.
AIHT and AHTP [Giryes, Nam, Gribonval and Davies, 2011]
1. Gradient step: x^{i+1} = x̂^i + μ M* (y − M x̂^i).
2. Cosupport selection: Λ^{i+1} = cosupp(Ω x^{i+1}, l).
3. Projection:
- For AIHT: x̂^{i+1} = Q_{Λ^{i+1}} x^{i+1}.
- For AHTP: x̂^{i+1} = argmin_x ‖y − M x‖_2  s.t.  Ω_{Λ^{i+1}} x = 0.
Reminder: Q_Λ = I − Ω_Λ^† Ω_Λ.
Algorithm Variations
- An optimal changing step size instead of a fixed one: μ^{i+1} = argmin_μ ‖y − M x̂^{i+1}‖_2^2.
- An approximated changing step size.
- Underestimating the cosparsity: use l̃ ≤ l.
Reconstruction Guarantees
Theorem: apply either AIHT or AHTP with a certain constant step size or an optimal step size, obtaining x̂^t after t iterations. If
    (C_l − 1) σ_M^2 / C_l < 1    (*)
and δ_{2l−p} < δ_1(C_l, M), where δ_1(C_l, M) is a constant greater than zero whenever (*) is satisfied, then after a finite number of iterations t*
    ‖x̂^{t*} − x_0‖_2 ≤ c_1 ‖e‖_2,
where c_1 is a function of δ_{2l−p}, C_l and M.
[Giryes, Nam, Elad, Gribonval and Davies, 2013]
Special Case - Unitary Transform
When Ω is unitary, the condition of the first theorem is simply
    δ_{2k}(M Ω*) ≤ 1/3.