Sparse Approximation of Signals with Highly Coherent Dictionaries
Bishnu P. Lamichhane and Laura Rebollo-Neira
b.p.lamichhane@aston.ac.uk, rebollol@aston.ac.uk
Support from EPSRC (EP/D062632/1) is acknowledged, http://www.epsrc.co.uk
New Trends and Directions in Harmonic Analysis, Approximation Theory, and Image Analysis, Inzell Summer School, Germany, Sept. 17-21, 2007
Outline
1. Sparse Approximation of a Signal
   Introduction
   Optimized Orthogonal Matching Pursuit
   Adaptive Biorthogonalization
   Swapping
   Controlling the Condition Number
2. Oblique Matching Pursuit
   Oblique Projections
   Structured Noise Filtering
3. Conclusion
1 Introduction
Sparse Approximation

The problem of non-linear approximation consists of representing a signal $f$ as a linear superposition of a minimal number of basis functions, called atoms, selected from a redundant set called a dictionary, $D = \{\alpha_i\}_{i \in J}$.

We want to minimize $\|x\|_0$ subject to $\sum_{i \in J} \alpha_i x_i = f$ or, for a given $\epsilon > 0$, subject to $\|\sum_{i \in J} \alpha_i x_i - f\| \le \epsilon$. (This is an NP-hard and highly non-linear problem.)

Matching pursuit methodologies yield a trade-off between optimality and computational complexity.

When working with a highly coherent dictionary, the subset selected by matching pursuit methods can have a large condition number. We want to control the condition number in a step-wise manner.
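2 Optimized Orthogonal Matching Pursuit

Given a signal $f \in H$, suppose we have selected a set of atoms $\{\alpha_{l_i}\}_{i=1}^{k-1}$ spanning the space $V_{k-1}$. The new index $l_k$ is the maximizer over all $n \in J \setminus J_{k-1}$ of

$|\langle \gamma_n, f \rangle|$ (OMP), or $\frac{|\langle \gamma_n, f \rangle|}{\|\gamma_n\|}$, $\|\gamma_n\| \neq 0$ (OOMP), with $\gamma_n = \alpha_n - P_{V_{k-1}} \alpha_n$,

where $J_{k-1} = \{l_1, \dots, l_{k-1}\}$.

The orthogonal projection of the signal $f$ onto $V_{k-1}$ is

$P_{V_{k-1}} f = \sum_{i=1}^{k-1} \alpha_{l_i} \langle \beta_i^{k-1}, f \rangle = \sum_{i=1}^{k-1} \beta_i^{k-1} \langle \alpha_{l_i}, f \rangle = \sum_{i=1}^{k-1} c_i^{k-1} \alpha_{l_i}$,

where the set $\{\alpha_{l_i}\}_{i=1}^{k-1}$ is biorthogonal to $\{\beta_i^{k-1}\}_{i=1}^{k-1}$, and $V_{k-1} = \mathrm{span}\{\beta_i^{k-1}\}_{i=1}^{k-1}$.

The OOMP criterion is based on minimizing the norm of the residual at each step:

$\|R_{k-1}\|^2 = \|R_k\|^2 + \frac{|\langle \gamma_n, f \rangle|^2}{\|\gamma_n\|^2}$.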
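The selection rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `oomp`, the tolerances, and the use of an explicitly orthonormalized basis `Q` (instead of the recursive biorthogonal functions) are assumptions made for brevity.

```python
import numpy as np

def oomp(D, f, k_max, tol=1e-10):
    """Greedy selection with the OOMP criterion |<gamma_n, f>| / ||gamma_n||.

    D: (m, N) array whose columns are the atoms; f: (m,) signal.
    Returns the selected indices and the coefficients of the projection of f.
    """
    selected = []
    Q = np.zeros((D.shape[0], 0))            # orthonormal basis of the selected span
    for _ in range(k_max):
        residual = f - Q @ (Q.T @ f)
        if np.linalg.norm(residual) <= tol:
            break
        G = D - Q @ (Q.T @ D)                # gamma_n = alpha_n - P_V alpha_n for all n
        norms = np.linalg.norm(G, axis=0)
        score = np.zeros(D.shape[1])
        ok = norms > 1e-12                   # skip atoms that already lie in the span
        score[ok] = np.abs(G[:, ok].T @ f) / norms[ok]
        n = int(np.argmax(score))
        if score[n] <= tol:                  # nothing left that reduces the residual
            break
        selected.append(n)
        Q = np.column_stack([Q, G[:, n] / norms[n]])
    if not selected:
        return selected, np.zeros(0)
    coeffs, *_ = np.linalg.lstsq(D[:, selected], f, rcond=None)
    return selected, coeffs

# Toy dictionary in R^3: e1, e2, e3 and the coherent atom (e1 + e2)/sqrt(2).
D = np.array([[1.0, 0.0, 0.0, 2 ** -0.5],
              [0.0, 1.0, 0.0, 2 ** -0.5],
              [0.0, 0.0, 1.0, 0.0]])
f = np.array([2.0, 0.0, 1.0])
indices, coeffs = oomp(D, f, k_max=3)
```

Replacing the score by `np.abs(G.T @ f)` (no division by the norm) would give the OMP variant of the same loop.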
3 Biorthogonalization
Recursive Biorthogonalization

We want to update the orthogonal projection when an atom is added to the set $\{\alpha_{l_i}\}_{i=1}^{k-1}$, and downdate it when an atom is deleted from the set $\{\alpha_{l_i}\}_{i=1}^{k}$.

Forward step: After selecting the $k$th atom we have $\{\alpha_{l_i}\}_{i=1}^{k}$. The biorthogonal functions $\{\beta_i^k\}_{i=1}^{k}$ can be constructed from $\{\beta_i^{k-1}\}_{i=1}^{k-1}$ using

$\beta_k^k = \frac{\gamma_{l_k}}{\|\gamma_{l_k}\|^2}$, $\gamma_{l_k} = \alpha_{l_k} - P_{V_{k-1}} \alpha_{l_k}$,

$\beta_i^k = \beta_i^{k-1} - \beta_k^k \langle \alpha_{l_k}, \beta_i^{k-1} \rangle$, $i = 1, \dots, k-1$.

Backward step: The biorthogonal set $\{\beta_i^k\}_{i=1}^{k}$ spanning $V_k$ is given and one atom, say $\alpha_{l_j}$, is to be removed from $V_k$. Denote by $V_{k \setminus j}$ the reduced subspace $V_{k \setminus j} = \mathrm{span}\{\alpha_{l_1}, \dots, \alpha_{l_{j-1}}, \alpha_{l_{j+1}}, \dots, \alpha_{l_k}\}$. The biorthogonal set is to be modified according to the equation

$\beta_i^{k \setminus j} = \beta_i^k - \beta_j^k \frac{\langle \beta_j^k, \beta_i^k \rangle}{\|\beta_j^k\|^2}$, $i = 1, \dots, j-1, j+1, \dots, k$.
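The forward and backward steps can be checked numerically. The sketch below assumes real vectors stored as matrix columns; the function names `add_atom` and `remove_atom` are illustrative and do not come from the slides.

```python
import numpy as np

def add_atom(A, B, a):
    """Forward step. A: selected atoms (columns), B: their duals, a: new atom."""
    gamma = a - A @ (B.T @ a)                # gamma = a - P_V a, with P_V = A B^T
    b_new = gamma / (gamma @ gamma)          # beta_k^k = gamma / ||gamma||^2
    B = B - np.outer(b_new, a @ B)           # beta_i^k = beta_i^{k-1} - beta_k^k <a, beta_i^{k-1}>
    return np.column_stack([A, a]), np.column_stack([B, b_new])

def remove_atom(A, B, j):
    """Backward step: delete column j and downdate the remaining duals."""
    b_j = B[:, j]
    B = B - np.outer(b_j, (b_j @ B) / (b_j @ b_j))
    keep = [i for i in range(A.shape[1]) if i != j]
    return A[:, keep], B[:, keep]

# Build a 3-atom set one atom at a time, then delete the middle atom.
rng = np.random.default_rng(1)
atoms = rng.standard_normal((5, 3))
A, B = np.zeros((5, 0)), np.zeros((5, 0))
for i in range(3):
    A, B = add_atom(A, B, atoms[:, i])
A_red, B_red = remove_atom(A, B, 1)
```

In both cases the resulting duals coincide with the columns of the transposed pseudo-inverse of the atom matrix, which is a convenient way to validate an implementation.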
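The orthogonal projector onto $V_k = V_{k-1} \oplus \mathrm{span}\{\alpha_{l_k}\}$, $P_{V_k}$, is then given by

$P_{V_k} = \sum_{i=1}^{k} \alpha_{l_i} \langle \beta_i^k, \cdot \rangle = \sum_{i=1}^{k} \beta_i^k \langle \alpha_{l_i}, \cdot \rangle$,

and the orthogonal projector onto $V_{k \setminus j}$ by

$P_{V_{k \setminus j}} = \sum_{i=1, i \neq j}^{k} \alpha_{l_i} \langle \beta_i^{k \setminus j}, \cdot \rangle = \sum_{i=1, i \neq j}^{k} \beta_i^{k \setminus j} \langle \alpha_{l_i}, \cdot \rangle$.

The coefficients are to be modified as follows. When including the atom $\alpha_{l_k}$:

$c_k^k = \langle \beta_k^k, f \rangle$,

$c_i^k = \langle \beta_i^k, f \rangle = c_i^{k-1} - c_k^k \langle \beta_i^{k-1}, \alpha_{l_k} \rangle$, $i = 1, \dots, k-1$,

and when deleting an atom $\alpha_{l_j}$:

$c_i^{k \setminus j} = \langle \beta_i^{k \setminus j}, f \rangle = c_i^k - \frac{\langle \beta_i^k, \beta_j^k \rangle c_j^k}{\|\beta_j^k\|^2}$, $i = 1, \dots, j-1, j+1, \dots, k$.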
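The coefficient update on inclusion follows in one line from the forward biorthogonalization step. A short derivation, assuming a real inner product so that $\langle a, b \rangle = \langle b, a \rangle$:

```latex
\begin{align*}
\beta_i^k &= \beta_i^{k-1} - \beta_k^k \,\langle \alpha_{l_k}, \beta_i^{k-1} \rangle, \\
c_i^k = \langle \beta_i^k, f \rangle
  &= \langle \beta_i^{k-1}, f \rangle
     - \langle \alpha_{l_k}, \beta_i^{k-1} \rangle \,\langle \beta_k^k, f \rangle \\
  &= c_i^{k-1} - c_k^k \,\langle \beta_i^{k-1}, \alpha_{l_k} \rangle,
  \qquad i = 1, \dots, k-1 .
\end{align*}
```

The deletion formula is obtained the same way by taking the inner product of the backward update with $f$.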
Swapping

Swapping is based on checking whether removing an already chosen atom and including a new one improves the approximation. Backward and forward steps are made easy with adaptive biorthogonalization.

Backward step: Downdate $P_{V_k} f$ to $P_{V_{k \setminus j}} f$. The index to be removed is the minimizer over all $n = 1, \dots, k$ of $\frac{|c_n^k|}{\|\beta_n^k\|}$.

Forward step: Update $P_{V_{k-1}} f$ to $P_{V_k} f$. The new index $l_k$ is the maximizer over all $n \in J \setminus J_{k-1}$ of

$|\langle \gamma_n, f \rangle|$ (OMP), or $\frac{|\langle \gamma_n, f \rangle|}{\|\gamma_n\|}$, $\|\gamma_n\| \neq 0$ (OOMP),

where $J_{k-1} = \{l_1, \dots, l_{k-1}\}$.
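The backward criterion measures how much the residual grows when an atom is deleted: removing atom $j$ increases the squared residual norm by $|c_j^k|^2 / \|\beta_j^k\|^2$. The sketch below illustrates this on random data. The helper names are hypothetical, and the duals are computed directly from the pseudo-inverse rather than recursively.

```python
import numpy as np

def removal_costs(A, f):
    """Return |c_n| / ||beta_n|| for every selected atom (column of A)."""
    B = np.linalg.pinv(A).T                  # dual vectors beta_n as columns
    c = B.T @ f                              # coefficients c_n = <beta_n, f>
    return np.abs(c) / np.linalg.norm(B, axis=0)

def residual_norm(A, f):
    """Norm of f minus its orthogonal projection onto the span of A."""
    x, *_ = np.linalg.lstsq(A, f, rcond=None)
    return np.linalg.norm(f - A @ x)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))
f = rng.standard_normal(6)

costs = removal_costs(A, f)
j_remove = int(np.argmin(costs))             # atom whose deletion hurts least

# Residual after deleting each atom in turn, for comparison.
after = [residual_norm(np.delete(A, j, axis=1), f) for j in range(3)]
```

A swap then consists of deleting `j_remove` and running one forward step; it is accepted only if the new residual is smaller than the old one.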
Theoretical Bound

OMP selects only the optimal atoms if $\|\Phi_{opt}^{\dagger} \Phi_{npt}\|_1 < 1$ (Tropp 2004). The bound is obtained from the inequality

$\frac{\|\Phi_{npt}^{*} R_n f\|_\infty}{\|\Phi_{opt}^{*} R_n f\|_\infty} < 1$,

where $\Phi_{opt}$ and $\Phi_{npt}$ collect the optimal and non-optimal atoms, and $R_n f = f - P_{V_n} f$. Using $\langle \phi, R_n f \rangle = \langle R_n \phi, f \rangle$, we get

$\frac{\|(R_n \Phi_{npt})^{*} f\|_\infty}{\|(R_n \Phi_{opt})^{*} f\|_\infty} < 1$,

which gives a better bound.

Figure 1: Signal-independent and signal-dependent check for OMP and OOMP
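Both checks are cheap to evaluate for a small dictionary. The sketch below assumes the norms are those of Tropp's exact recovery condition (induced 1-norm of the pseudo-inverse product, infinity norms in the ratio); the function names are illustrative only.

```python
import numpy as np

def exact_recovery_norm(D, opt):
    """Signal-independent check: || pinv(Phi_opt) Phi_npt ||_1 (max column sum)."""
    opt = list(opt)
    npt = [n for n in range(D.shape[1]) if n not in opt]
    M = np.linalg.pinv(D[:, opt]) @ D[:, npt]
    return np.abs(M).sum(axis=0).max()

def selection_ratio(D, opt, r):
    """Signal-dependent check for a given residual r."""
    opt = list(opt)
    npt = [n for n in range(D.shape[1]) if n not in opt]
    return np.abs(D[:, npt].T @ r).max() / np.abs(D[:, opt].T @ r).max()

# e1, e2 and the coherent atom (e1 + e2)/sqrt(2) in R^2.
D = np.array([[1.0, 0.0, 2 ** -0.5],
              [0.0, 1.0, 2 ** -0.5]])
bound = exact_recovery_norm(D, [0, 1])                       # exceeds 1: no guarantee
ratio = selection_ratio(D, [0, 1], np.array([1.0, 0.0]))     # below 1 for this residual
```

The toy case shows why the signal-dependent check can succeed when the signal-independent one fails: the bound is violated for the dictionary as a whole, yet for this particular residual the optimal atom still wins.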
More Coherent B-spline Dictionary

The theoretical bound does not apply at all.
Signals are generated by selecting atoms randomly and combining them linearly.
With swapping, the signal is exactly recovered.
Without swapping, only signals up to sparsity 80 are exactly recovered.

Figure 2: Exact signal recovery with and without the swapping operation (panels: Error, Number of Selected Atoms, Babel Function)
4 Condition Control
Controlling the Condition Number

Restrict the search space with $\|\gamma_n\| \ge \epsilon_1$ and stop when $\frac{|\langle \gamma_n, f \rangle|}{\|\gamma_n\|} \le \epsilon_2$, where $\gamma_n = \alpha_n - P_{V_k} \alpha_n$. Both depend on the dictionary and the signal.

If $\tilde{W}_k = [\beta_1^k, \dots, \beta_k^k]$, then $\frac{1}{\|\tilde{W}_k\|}$ gives the smallest singular value of the selected basis.

The search can be stopped if the condition number goes beyond some limit.

Optimal trade-off between the condition number and the approximation? This needs a cheap condition estimator.

An implementation of matching pursuit based on the incremental QR decomposition and the biorthogonal functions allows the use of the incremental norm estimator proposed by Duff and Voemel.

Assume $W_k = [\alpha_{l_1}, \dots, \alpha_{l_k}]$; then $\tilde{W}_k^T W_k = I_k$. Suppose that $W_k = Q_k R_k$ (QR decomposition of $W_k$). The condition number of the basis is

$\kappa(W_k) = \|\tilde{W}_k\| \|W_k\| = \|R_k^{-1}\| \|R_k\|$.
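The two identities on this slide are easy to confirm numerically. The sketch below uses exact (not estimated) norms; the function name `basis_condition` and the example matrix are assumptions for illustration.

```python
import numpy as np

def basis_condition(W):
    """Condition number from the R factor, and sigma_min from the dual basis."""
    _, R = np.linalg.qr(W)                              # reduced QR, R is k x k
    kappa = np.linalg.norm(R, 2) * np.linalg.norm(np.linalg.inv(R), 2)
    W_dual = np.linalg.pinv(W).T                        # columns are the beta_i
    sigma_min = 1.0 / np.linalg.norm(W_dual, 2)
    return kappa, sigma_min

# Two nearly parallel atoms in R^3 give a poorly conditioned basis.
W = np.array([[1.0, 1.0],
              [0.0, 0.1],
              [0.0, 0.0]])
kappa, sigma_min = basis_condition(W)
```

In a matching pursuit loop neither the inverse nor the pseudo-inverse would be formed explicitly; that is the role of the incremental estimator.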
Controlling the Condition Number

$W_k = [\alpha_{l_1}, \dots, \alpha_{l_k}] = Q_k R_k$.

Update the QR decomposition of $W_k$ to get the QR decomposition of $W_k^n = [\alpha_{l_1}, \dots, \alpha_{l_k}, \alpha_n]$. Hence $W_k^n = Q_k^n R_k^n$ with

$R_k^n = \begin{pmatrix} R_k & v_k^n \\ 0 & r_k^n \end{pmatrix}$.

Assume that $\sigma_k = \|R_k z\|$, with $\|z\| = 1$, is the estimate of the norm of the matrix $R_k$. The idea of the incremental norm estimator is to maximize the quantity $\|R_k^n \hat{z}\|$ with $\hat{z} = (s z, c)^T$ and $s^2 + c^2 = 1$. An analytical expression for the maximum of $\|R_k^n \hat{z}\|$ is easily obtained [Duff and Voemel].

$\|(R_k^n)^{-1}\| = 1/\sigma_k^n$, with $\sigma_k^n$ being the smallest singular value of the matrix $R_k^n$. The same incremental norm estimator can be used to compute the smallest singular value of $R_k^n$.
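The maximization over $(s, c)$ reduces to a 2 x 2 symmetric eigenvalue problem, because $\|R_k^n \hat{z}\|^2 = s^2 \sigma_k^2 + 2 s c \,(R_k z)^T v_k^n + c^2 (\|v_k^n\|^2 + (r_k^n)^2)$. The sketch below follows that idea; it is written in the spirit of the incremental estimator, not as a transcription of Duff and Voemel's algorithm, and the function name is hypothetical.

```python
import numpy as np

def incremental_norm_estimate(R):
    """Lower-bound estimate of ||R||_2 for upper-triangular R, column by column.

    Returns the estimate sigma and a unit vector z with ||R z|| = sigma.
    """
    k = R.shape[0]
    z = np.array([1.0])
    sigma = abs(R[0, 0])
    for j in range(1, k):
        v, r = R[:j, j], R[j, j]             # new column: v on top, r on the diagonal
        y = R[:j, :j] @ z                    # R_k z (recomputed here for clarity)
        # ||R_new (s z, c)||^2 = (s, c) M (s, c)^T with the 2 x 2 matrix M below
        M = np.array([[sigma ** 2, y @ v],
                      [y @ v, v @ v + r ** 2]])
        evals, evecs = np.linalg.eigh(M)     # eigenvalues in ascending order
        s, c = evecs[:, -1]                  # maximizer with s^2 + c^2 = 1
        z = np.append(s * z, c)
        sigma = np.sqrt(evals[-1])
    return sigma, z

rng = np.random.default_rng(0)
R = np.triu(rng.standard_normal((6, 6)))
sigma, z = incremental_norm_estimate(R)
```

A production version would carry $R_k z$ along instead of recomputing it, so that each added column costs only $O(k)$ operations. Applying the same idea to the inverse factor yields an estimate of the smallest singular value.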
Controlling the Condition Number

The smallest and largest singular values of $R_k$ and the corresponding right singular vectors can be cheaply computed by applying a few power iterations to $R_k$ and $R_k^{-1}$, initialized by some intelligent guess.

Let $J_k^\epsilon = \{n : \frac{|\langle \gamma_n, f \rangle|}{\|\gamma_n\|} \ge \epsilon_k\}$. Assume that $\kappa_k^n$ is the estimated condition number of the basis $[W_k, \alpha_n]$, $n \in J_k^\epsilon$. Select the atom with index

$l_{k+1} = \arg\min_{n \in J_k^\epsilon} \kappa_k^n$.

Alternative way: Assume that $D_k := \{\alpha_n, n \in J \setminus J_k\}$ is sorted in such a way that $\{\frac{|\langle \gamma_n, f \rangle|}{\|\gamma_n\|}, n \in J \setminus J_k\}$ is in descending order, and $\kappa_k$ is the condition number of the basis at the $k$th step. Then, starting from the first atom in $D_k$, estimate the condition number $\kappa_k^n$ of $[W_k, \alpha_n]$, $n \in J \setminus J_k$, until $\frac{\kappa_k^n}{\kappa_k}$ is less than some fixed number.

Swapping can be done to improve the approximation or the condition number of the basis.
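The first selection rule can be sketched as follows. For simplicity the sketch computes the exact condition number with `np.linalg.cond` where the slides call for a cheap estimate; the function name, the threshold `eps`, and the toy dictionary are assumptions.

```python
import numpy as np

def select_with_condition_control(D, selected, f, eps):
    """Among atoms with |<gamma_n, f>| / ||gamma_n|| >= eps, pick the one
    that gives the best-conditioned extended basis.

    selected: non-empty list of column indices already chosen.
    """
    W = D[:, selected]
    Q, _ = np.linalg.qr(W)
    G = D - Q @ (Q.T @ D)                    # gamma_n for every atom
    norms = np.linalg.norm(G, axis=0)
    best, best_kappa = None, np.inf
    for n in range(D.shape[1]):
        if n in selected or norms[n] < 1e-12:
            continue
        if abs(G[:, n] @ f) / norms[n] < eps:
            continue                         # not in the candidate set J_k^eps
        kappa = np.linalg.cond(np.column_stack([W, D[:, n]]))
        if kappa < best_kappa:
            best, best_kappa = n, kappa
    return best, best_kappa

# Atom 1 is almost parallel to the selected atom 0; atom 2 is orthogonal to it.
a = 1.01 ** -0.5
D = np.array([[1.0, a, 0.0, 0.0],
              [0.0, 0.1 * a, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
f = np.array([1.0, 1.0, 0.5])
best, kappa = select_with_condition_control(D, [0], f, eps=0.8)
```

Atoms 1 and 2 reduce the residual by the same amount here, so the pure OOMP criterion cannot distinguish them; the condition number does.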
Numerical Results

Given a set, find a subset with a given condition number threshold $a x^n$, $a \in \{1, 2\}$, $n \in \{1, 2, 3, 4, 5\}$, $x = 10$. (Old Alg. is based on SVD: Golub and Van Loan)

Dictionary      Alg.  x    2x   x^2  2x^2  x^3  2x^3  x^4  2x^4  x^5  2x^5
B-spline (143)  Old   37   41   49   52    82   93    109  114   122  125
B-spline (143)  New   34   39   48   50    75   87    105  110   119  122
Wavelet (635)   Old   152  163  182  189   202  207   216  219   226  229
Wavelet (635)   New   158  169  185  194   205  209   217  221   228  230

Figure 3: The ratios of the thresholds (thrs/cnd) and the computed condition numbers, B-spline dictionary (left), Wavelet dictionary (right)
Numerical Results for OOMP

We apply OOMP with and without condition control to approximate the modulated chirp function $f = \exp(x) \cos(k \pi x^2)$ for $k = 5, 10, 15, 20, 25, 30, 35, 40$ and $45$. The search is stopped when the condition number is greater than $10^6$.

Figure 4: Chirp function (left), error (middle) and number of selected atoms (right) versus $k$ for the approximation of the modulated chirp function
Numerical Results for Swapping

Signals are generated by selecting atoms randomly and combining them linearly.

Figure 5: Error and condition growth with swapping operation
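5 Oblique Matching Pursuit
Oblique Projection

We are given three subspaces $V$, $W$ and $W^\perp$ of $H$ with $H = W \oplus W^\perp = V + W^\perp$. The oblique projector $E_{VW^\perp} : H \to V$ is uniquely defined by the properties

$E_{VW^\perp} v = v$, $v \in V$, and $E_{VW^\perp} w = 0$, $w \in W^\perp$,

if $V \cap W^\perp = \{0\}$.

For $f \in H$:

Orthogonal projection: $P_{V_k} f = \arg\min_{g \in V_k} \|f - g\|$.

Oblique projection: $E_{V_k W^\perp} f = \arg\min_{g \in V_k} \|f - g\|_{P_W}$ with $\langle f, g \rangle_{P_W} = \langle f, P_W g \rangle$.

The following error bound holds:

$\|f - P_{V_k} f\| \le \|f - E_{V_k W^\perp} f\| \le \frac{1}{\cos(\theta)} \|f - P_{V_k} f\|$,

where $\theta$ is the angle between the spaces $V_k$ and $W$.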
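For finite-dimensional subspaces given by basis matrices, the oblique projector can be written as $V (P_W V)^{\dagger} P_W$, which is one standard way to realize the minimization in the $P_W$ semi-norm. The sketch below assumes that form; the function name and the small example are illustrative.

```python
import numpy as np

def oblique_projector(V, W_perp):
    """Matrix of the projector onto range(V) along range(W_perp).

    V, W_perp: arrays whose columns span V and W^perp, with trivial intersection.
    """
    m = V.shape[0]
    Qp, _ = np.linalg.qr(W_perp)
    P_W = np.eye(m) - Qp @ Qp.T              # orthogonal projector onto W
    return V @ np.linalg.pinv(P_W @ V) @ P_W

# V = span{e1, e2}, W^perp = span{(1, 0, 1)} in R^3.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
W_perp = np.array([[1.0], [0.0], [1.0]])
E = oblique_projector(V, W_perp)

f = np.array([3.0, 2.0, 1.0])                # f = (2, 2, 0) + 1 * (1, 0, 1)
g = E @ f                                    # component in V
```

In this example the signal splits as $f = g + h$ with $g = (2, 2, 0) \in V$ and $h = (1, 0, 1) \in W^\perp$, and the projector returns $g$.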
Oblique Matching Pursuit

If a signal $f \in H$ is corrupted with noise, and the noise is known to lie in $W^\perp$, so that $f = g + h$ with $g \in V$ and $h \in W^\perp$, then $g = E_{VW^\perp} f$.

In general, only a redundant set $D$ of atoms, called a dictionary, is known, which spans $V$. The signal can even be corrupted with other noise.

How to select atoms from the set $D$ to span $g$? Note the non-uniqueness of $V$ in $H = W \oplus W^\perp = V + W^\perp$.

Use OOMP on the modified dictionary $P_W D$ to select atoms and construct the oblique projector $E_{V_k W^\perp}$. We call this oblique matching pursuit.

An efficient implementation can be obtained, as before, by using biorthogonal functions, which now span the space $P_W V_k$.
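A minimal sketch of the procedure, under stated assumptions: the projector $P_W$ is available as a matrix, the OOMP loop uses an explicit orthonormal basis instead of recursive biorthogonalization, and the function name and toy data are hypothetical.

```python
import numpy as np

def oblique_matching_pursuit(D, P_W, f, k_max, tol=1e-10):
    """Run OOMP on the projected dictionary P_W D, then rebuild the signal
    from the original atoms with the same coefficients."""
    PD, Pf = P_W @ D, P_W @ f
    selected = []
    Q = np.zeros((D.shape[0], 0))            # orthonormal basis of span(P_W V_k)
    for _ in range(k_max):
        if np.linalg.norm(Pf - Q @ (Q.T @ Pf)) <= tol:
            break
        G = PD - Q @ (Q.T @ PD)
        norms = np.linalg.norm(G, axis=0)
        score = np.zeros(D.shape[1])
        ok = norms > 1e-12
        score[ok] = np.abs(G[:, ok].T @ Pf) / norms[ok]
        n = int(np.argmax(score))
        if score[n] <= tol:
            break
        selected.append(n)
        Q = np.column_stack([Q, G[:, n] / norms[n]])
    if not selected:
        return selected, np.zeros(D.shape[0])
    coeffs, *_ = np.linalg.lstsq(PD[:, selected], Pf, rcond=None)
    return selected, D[:, selected] @ coeffs  # oblique projection onto V_k

# R^4 with W^perp = span{e4}; each atom has a component along e4.
D = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, -0.5, 1.0]])
P_W = np.diag([1.0, 1.0, 1.0, 0.0])
g_true = 2.0 * D[:, 0] - D[:, 2]             # signal built from atoms 0 and 2
f = g_true + 3.0 * np.array([0.0, 0.0, 0.0, 1.0])   # add structured noise in W^perp
selected, g_rec = oblique_matching_pursuit(D, P_W, f, k_max=3)
```

Because $P_W f = P_W g$, the selection never sees the structured component, and the final coefficients applied to the unprojected atoms return the signal without it.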
Figure 6: Oblique matching pursuit applied to a signal corrupted with structured noise only, called background (left), and to a signal corrupted with structured and white noise, $\sigma = 0.001$ (right)

Note: The direct oblique projection (computed by T-SVD) blows up in the case of the signal with white noise.
6 Conclusion
Conclusion and Remark

The recursive biorthogonal approach yields an efficient implementation of the Matching Pursuit method.
Forward and backward basis selection can be combined to introduce a swapping operation based on the minimal residual condition.
The incremental condition estimator can be easily adapted within the recursive biorthogonal approach to deal with ill-conditioning.
Oblique Matching Pursuit (Weighted Matching Pursuit) can be used to filter structured noise.
Theoretical investigation of the swapping operation.
Generalizing the result to multiple selection.
Developing a domain decomposition algorithm.
Combining different algorithms for efficiency and flexibility.

Thank You