Signal Recovery, Uncertainty Relations, and Minkowski Dimension
Helmut Bölcskei, ETH Zurich, December 2013
Joint work with C. Aubel, P. Kuppinger, G. Pope, E. Riegler, D. Stotz, and C. Studer
Aim of this Talk

Develop a unified theory for a wide range of (sparse) signal recovery problems:
- Signal separation
- Super-resolution
- Inpainting
- De-clipping
- Removal of impulse noise or narrowband interference

Establish fundamental performance limits

Propose an information-theoretic formulation
Signal Separation

Decompose an image into a cartoon part and a textured part: observation = cartoon + texture
Image Restoration

Decompose the observation into the underlying signal and the scratches: observation = signal + scratches
Removing Clicks from a Vinyl Record Player

Decompose the recorded signal into the original audio signal and the clicks: recorded signal = original audio signal + clicks
Structural Specifics and Signal Model

z = Ax + Be

- Transform A sparsifies images, e.g., wavelet transform
- Error signal sparse in transform B:
  - texture: sparse in curvelet frame
  - scratches: sparse in ridgelet frame
  - clicks: sparse in identity basis
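As an illustrative sketch (not part of the talk), the model z = Ax + Be can be set up numerically with an orthonormal DCT-II matrix as the sparsifying transform A and the identity as B; the dimension, support sets, and coefficient values below are arbitrary choices.

```python
import numpy as np

m = 64  # signal length (arbitrary choice)

# Orthonormal DCT-II matrix: columns are the atoms of the sparsifying dictionary A.
A = np.array([[np.cos(np.pi * k * (2 * j + 1) / (2 * m)) for k in range(m)]
              for j in range(m)]) * np.sqrt(2.0 / m)
A[:, 0] /= np.sqrt(2.0)   # rescale the first atom so A has orthonormal columns
B = np.eye(m)             # clicks/impulses are sparse in the identity basis

# Sparse coefficient vectors: x synthesizes the clean signal, e the impulsive errors.
x = np.zeros(m); x[[1, 3, 7]] = [1.0, -0.5, 0.3]
e = np.zeros(m); e[[10, 40]] = [4.0, -3.0]

z = A @ x + B @ e         # the observed superposition z = Ax + Be
print(np.count_nonzero(x), np.count_nonzero(e))
```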
Super-Resolution

Downsampled image (by a factor of 9): comparison of linear interpolation and sparsity-exploiting reconstruction
Inpainting

Examples: damaged and restored portraits of W. Heisenberg, D. Gábor, and H. Minkowski
Signal Model for Super-Resolution and Inpainting

- Only a subset of the entries in y = Ax is available
- Taken into account by assuming that we observe

  z = Ax + Be = Ax + Ie

  and choosing e so that the missing entries of y are set to, e.g., 0
- Error signal e is sparse if few entries are missing
Recovery of Clipped Signals
Signal Model for Clipping

Instead of y = Ax we observe

z = Ax + [clip(Ax) − Ax]

where the error term clip(Ax) − Ax is sparse in B = I.

[Figure: clipped waveform, amplitude vs. sample index over 256 samples]

- Error signal is sparse if clipping is not too aggressive
- Support set of e is known
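A minimal sketch of the clipping model (not from the talk): a stand-in sinusoid plays the role of y = Ax, the clipping level tau and the length are arbitrary choices, and the key observations are that e = clip(y) − y is sparse when clipping is mild and that its support is exactly the set of clipped samples, hence known.

```python
import numpy as np

m, tau = 256, 0.95                          # length and clipping level (arbitrary)

n = np.arange(m)
y = np.sin(2 * np.pi * 3 * n / m)           # stand-in for the unclipped signal y = Ax
z = np.clip(y, -tau, tau)                   # what the clipped front-end delivers

e = z - y                                   # error signal: z = y + e, sparse in B = I
support = np.flatnonzero(np.abs(y) > tau)   # known support: exactly the clipped samples

print(len(support), m)                      # only the clipped samples carry errors
```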
Some Existing Approaches

The literature is rich, e.g.:

Signal separation:
- Morphological component analysis [Starck et al., 2004; Elad et al., 2005]
- Split-Bregman methods [Cai et al., 2009]
- Microlocal analysis [Donoho & Kutyniok, 2010]
- Convex demixing [McCoy & Tropp, 2013]

Super-resolution:
- Navier-Stokes [Bertalmio et al., 2001]
- Sparsity enhancing [Yang et al., 2008]
- Total variation minimization [Candès & Fernandez-Granda, 2013]
Some Existing Approaches Cont'd

Inpainting:
- Local transforms and separation [Dong et al., 2011]
- Total variation minimization [Chambolle, 2004]
- Morphological component analysis [Elad et al., 2005]
- Image colorization [Sapiro, 2005]
- Clustered sparsity [King et al., 2014]

De-clipping:
- Constrained matching pursuit [Adler et al., 2011]
General Problem Statement

Signal model: z = Ax + Be
- x, e sparse, may depend on each other
- A, B dictionaries (bases, incomplete sets, or frames)
- Redundancy can lead to sparser representations

Examples: overcomplete DFT, Gabor frames, curvelet or wavelet frames, ridgelets or shearlets

Want to recover x and/or e from z! Knowledge of x and/or e may be available (support set, sparsity level, full knowledge).
Formalizing the Problem

z = Ax + Be = [A B] [x; e] =: D [x; e]

- Requires solving an underdetermined linear system of equations
- What are the fundamental limits on extracting x and e from z?
- Could use the (1/2)(1 + 1/µ) threshold [Donoho & Elad, 2003; Gribonval & Nielsen, 2003] for general D
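The coherence µ of D and the resulting (1/2)(1 + 1/µ) threshold are easy to compute numerically. A sketch (my own example, not from the talk) for D = [I F] with a unitary m-point DFT, where the coherence is known to be 1/sqrt(m):

```python
import numpy as np

m = 64
I = np.eye(m)
F = np.fft.fft(np.eye(m)) / np.sqrt(m)   # unitary m-point DFT matrix

D = np.concatenate([I, F], axis=1)       # dictionary D = [A B]
D = D / np.linalg.norm(D, axis=0)        # normalize columns (already unit norm here)

# Coherence: largest |inner product| between distinct columns of D.
G = np.abs(D.conj().T @ D)
np.fill_diagonal(G, 0)
mu = G.max()

threshold = 0.5 * (1 + 1 / mu)           # (1/2)(1 + 1/mu) sparsity threshold
print(mu, threshold)                     # mu = 1/sqrt(m), so threshold = (1 + sqrt(m))/2
```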
Uniqueness

Assume there exist two pairs (x, e) and (x', e') such that

Ax + Be = Ax' + Be'   and hence   A(x − x') = B(e' − e)

The vectors (x − x') and (e' − e) represent the same signal s = A(x − x') = B(e' − e) in two different dictionaries A and B
Enter the Uncertainty Principle

Assume that
- x, x' are n_x-sparse, so x − x' is (2n_x)-sparse
- e, e' are n_e-sparse, so e' − e is (2n_e)-sparse

If n_x and n_e are small enough and A and B are sufficiently different, it may not be possible to satisfy s = A(x − x') = B(e' − e)
Uncertainty Relations for ONBs

[Donoho & Stark, 1989]: A = I_m, B = F_m (m-point DFT). If Ap = Bq, then

‖p‖₀ · ‖q‖₀ ≥ m

[Elad & Bruckstein, 2002]: A and B general ONBs with coherence µ = max_{i,j} |⟨a_i, b_j⟩|. If Ap = Bq, then

‖p‖₀ · ‖q‖₀ ≥ 1/µ²
Uncertainty Relation for General A, B

Theorem (Studer et al., 2011). Let
- A ∈ C^{m×n_a} be a dictionary with coherence µ_a,
- B ∈ C^{m×n_b} be a dictionary with coherence µ_b,
- D = [A B] have coherence µ, and
- Ap = Bq.

Then

‖p‖₀ · ‖q‖₀ ≥ [1 − µ_a(‖p‖₀ − 1)]⁺ [1 − µ_b(‖q‖₀ − 1)]⁺ / µ²
Recovery with BP if supp(e) is Known (e.g., De-clipping)

Theorem (Studer et al., 2011). Let z = Ax + Be, where E = supp(e) is known. Consider the convex program

(BP, E):  minimize ‖x̃‖₁  subject to  A x̃ ∈ {z} + R(B_E)

where R(B_E) is the range of the columns of B indexed by E. If

2‖x‖₀ ‖e‖₀ < [1 − µ_a(2‖x‖₀ − 1)]⁺ [1 − µ_b(‖e‖₀ − 1)]⁺ / µ²

then the unique solution of (BP, E) is given by x.

Extended to compressible signals and noisy measurements [Studer & Baraniuk, 2011]
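The program (BP, E) can be posed as a linear program. The sketch below (my own toy instance, assuming scipy is available) uses an orthonormal DCT-II as A and B = I, with sparsity levels chosen so that the sufficient condition above holds (µ_a = µ_b = 0 for ONBs, and µ ≤ sqrt(2/m) here, so the right-hand side exceeds 2‖x‖₀‖e‖₀ = 12):

```python
import numpy as np
from scipy.optimize import linprog

m = 32

# Orthonormal DCT-II as A (columns orthonormal); B = I models corruption in place.
A = np.array([[np.cos(np.pi * k * (2 * j + 1) / (2 * m)) for k in range(m)]
              for j in range(m)]) * np.sqrt(2.0 / m)
A[:, 0] /= np.sqrt(2.0)

x = np.zeros(m); x[[2, 9]] = [1.0, -2.0]       # 2-sparse signal coefficients
E = np.array([5, 17, 26])                      # known corruption support
e = np.zeros(m); e[E] = [3.0, -1.5, 2.0]       # 3-sparse error
z = A @ x + e

# (BP, E) as an LP in variables (xt, t, w):
#   min sum(t)  s.t.  xt - t <= 0,  -xt - t <= 0,  A xt + B_E w = z.
BE = np.eye(m)[:, E]
k = len(E)
c = np.concatenate([np.zeros(m), np.ones(m), np.zeros(k)])
A_ub = np.block([[np.eye(m), -np.eye(m), np.zeros((m, k))],
                 [-np.eye(m), -np.eye(m), np.zeros((m, k))]])
b_ub = np.zeros(2 * m)
A_eq = np.concatenate([A, np.zeros((m, m)), BE], axis=1)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=z,
              bounds=(None, None))
x_hat = res.x[:m]
print(np.max(np.abs(x_hat - x)))               # recovery error of (BP, E)
```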
Rethinking Transform Coding

Example: separate text from a picture
- Text is sparse in the identity basis
- Use wavelets or DCT to sparsify the image

A = wavelet basis: µ = 0.25;  A = DCT: µ ≈ 0.0039

The wavelet basis is more coherent with the identity, which yields worse separation performance
Analytical vs. Numerical Results

[Figure: 50% success-rate contours and analytical thresholds, plotted as error sparsity level vs. signal sparsity level, for the cases where supp(x) and/or supp(e) are known or unknown; A, B ∈ R^{64×80}, µ_a ≈ 0.126, µ_b ≈ 0.131, µ ≈ 0.132]
The Thresholds are Tight

A = I_m, B = F_m, Ap = Bq

- Recovery is unique if ‖p‖₀ + ‖q‖₀ < √m
- This is achieved by the Dirac comb: for m a perfect square, the spike train δ with √m equally spaced nonzero entries satisfies I δ = F δ (the comb is a fixed point of the unitary DFT), so ‖p‖₀ = ‖q‖₀ = √m

This behavior is fundamental and is known as the square-root bottleneck
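The Dirac-comb tightness example can be checked numerically. A small sketch (not from the talk) for m = 16: the comb with spikes every √m = 4 samples is a fixed point of the unitary DFT, so ‖p‖₀‖q‖₀ = m meets the Donoho-Stark relation with equality.

```python
import numpy as np

m = 16                                   # perfect square: sqrt(m) = 4
s = int(np.sqrt(m))

p = np.zeros(m)
p[::s] = 1.0                             # Dirac comb: spikes every sqrt(m) samples

F = np.fft.fft(np.eye(m)) / np.sqrt(m)   # unitary DFT matrix
q = F.conj().T @ p                       # solve I p = F q, i.e., q = F^H p

# The comb is a fixed point of the unitary DFT, so q equals p.
print(np.count_nonzero(p), np.count_nonzero(np.round(np.abs(q), 10)))
```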
Probabilistic Recovery Guarantees for BP

- Neither support set known [Kuppinger et al., 2011]
- One or both support sets known [Pope et al., 2011]

Recovery is possible with high probability even if

‖p‖₀ + ‖q‖₀ ∝ m / log n

Compare to ‖p‖₀ + ‖q‖₀ ∝ √m. This breaks the square-root bottleneck!
An Information-Theoretic Formulation

Sparsity for random signals: components of the signal are drawn i.i.d. from

(1 − ρ)δ₀ + ρ P_cont

where 0 ≤ ρ ≤ 1 is the mixture parameter

- For large dimensions, the fraction of nonzero components in the signal is given by ρ (LLN)
- General distributions via the Lebesgue decomposition: P = α P_disc + β P_cont + γ P_sing, with α + β + γ = 1
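The LLN statement is easy to see empirically. A quick sketch (my own, with arbitrary blocklength, seed, and ρ, and P_cont taken as a standard normal): the fraction of nonzero components concentrates at ρ.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 100_000, 0.1                    # blocklength and mixture parameter (arbitrary)

# Draw i.i.d. from (1 - rho) * delta_0 + rho * P_cont, with P_cont = N(0, 1).
mask = rng.random(n) < rho
x = np.where(mask, rng.standard_normal(n), 0.0)

frac = np.count_nonzero(x) / n           # LLN: concentrates at rho for large n
print(frac)
```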
Almost Lossless Signal Separation

Framework inspired by [Wu & Verdú, 2010]: z = Ax + Be

Existence of a measurable separator g such that, for general random sources x, e and sufficiently large blocklengths,

P[ g([A B][x; e]) ≠ [x; e] ] < ε

Almost lossless signal separation: we are interested in the structure of pairs A, B for which separation is possible. Concretely: fix B, look for suitable A
Setting

Source: [X₁ … X_{n−l} | E₁ … E_l]^T ∈ R^n, where the X-part has fraction 1 − λ and the E-part has fraction λ
- stochastic processes (X_i)_{i∈N} and (E_i)_{i∈N}, fraction parameter λ ∈ [0, 1]

Code of rate R = m/n (m = no. of measurements, n = no. of unknowns):
- measurement matrices A ∈ R^{m×(n−l)}, B ∈ R^{m×l}
- measurable separator g : R^m × R^{m×(n−l)} × R^{m×l} → R^n

R is ε-achievable if, for sufficiently large n (asymptotic analysis),

P[ g([A B][x; e]) ≠ [x; e] ] < ε
Minkowski Dimension

A suitable measure of complexity.

Covering number:

N_S(ε) := min{ k ∈ N : S ⊆ ∪_{i∈{1,…,k}} B_n(u_i, ε), u_i ∈ R^n }

(Lower) Minkowski dimension / box-counting dimension:

dim_B(S) := liminf_{ε→0} log N_S(ε) / log(1/ε)

For small ε: N_S(ε) ≈ ε^{−dim_B(S)}
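The box-counting dimension can be estimated numerically by counting occupied grid boxes at successively finer scales. A sketch (my own example, not from the talk): the set of 1-sparse vectors in [0,1]², i.e., the union of the two coordinate segments, has N_S(ε) ≈ 2/ε, so the estimate should come out close to 1.

```python
import numpy as np

# Dense samples of the 1-sparse vectors in [0,1]^2: the two coordinate segments.
t = np.linspace(0.0, 1.0, 200_000)
S = np.concatenate([np.stack([t, np.zeros_like(t)], axis=1),
                    np.stack([np.zeros_like(t), t], axis=1)])

# Count occupied boxes of side eps = 2^-k and fit log N(eps) against log(1/eps).
ks = np.arange(4, 11)
logN = []
for k in ks:
    eps = 2.0 ** (-k)
    boxes = np.unique(np.floor(S / eps), axis=0)
    logN.append(np.log(len(boxes)))
slope = np.polyfit(ks * np.log(2.0), logN, 1)[0]   # estimate of dim_B(S)
print(slope)
```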
Minkowski Dimension Compression Rate

Minkowski dimension compression rate:

R_B(ε) := limsup_{n→∞} a_n(ε)/n

where

a_n(ε) := inf{ dim_B(S) : S ⊆ R^n, P[[x; e] ∈ S] ≥ 1 − ε }

Among all approximate support sets: the smallest possible Minkowski dimension (per blocklength)
Main Result

Theorem. Let R > R_B(ε). Then, for every fixed full-rank matrix B ∈ R^{m×l} with m ≥ l and for Lebesgue-a.a. matrices A ∈ R^{m×(n−l)}, where m = ⌈Rn⌉, there exists a measurable separator g such that for sufficiently large n

P[ g([A B][x; e]) ≠ [x; e] ] < ε

- Simple and intuitive proof inspired by [Sauer et al., 1991]
- Almost all matrices A are incoherent to a given matrix B
A Probabilistic Null-Space Property

Proposition. Let S ⊆ R^n be such that dim_B(S) < m, and let B ∈ R^{m×l} be a full-rank matrix with m ≥ l. Then

{ u ∈ S \ {0} : [A B]u = 0 } = ∅   for Lebesgue-a.a. A ∈ R^{m×(n−l)}

- If A and B are ONBs, there is no (p, q) ≠ 0 such that Ap = Bq and ‖p‖₀ ‖q‖₀ < 1/µ²
- If dim_B(S) < m, then for a.a. A there is no (p, q) ≠ 0 such that Ap = Bq and u = (p, q) ∈ S

Geometric intuition: dim ker[A B] = n − m and dim_B(S) < m, so dim ker[A B] + dim_B(S) < n; generically, ker[A B] and S intersect only trivially
Back to Discrete-Continuous Mixtures

X_i i.i.d. (1 − ρ₁)P_d1 + ρ₁ P_c1
E_i i.i.d. (1 − ρ₂)P_d2 + ρ₂ P_c2

where 0 ≤ ρ_i ≤ 1; for P_d1 = P_d2 = δ₀ this is the sparse signal model. Fraction of X_i's = 1 − λ; fraction of E_i's = λ

An exact expression for R_B(ε) and a converse:

Theorem. For discrete-continuous mixtures, the optimal compression rate is

R_B(ε) = (1 − λ)ρ₁ + λρ₂

Optimal no. of measurements = no. of nonzero components
What it is and What it is not

The information-theoretic approach
- applies to general source distributions
- works for given B, for a.a. A
- yields an explicit expression for the optimal compression rate (i.e., the optimal no. of measurements) for sparse signals
- achieves linear scaling (got rid of the log n factor)
- is of asymptotic nature
- deals with the noiseless case
- provides existence results only for decoders
Thank you
"If you ask me anything I don't know, I'm not going to answer." (Y. Berra)