A Tutorial on Compressive Sensing
Simon Foucart
Drexel University / University of Georgia
CIMPA13: New Trends in Applied Harmonic Analysis
Mar del Plata, Argentina, 5-16 August 2013
This minicourse acts as an invitation to the elegant theory of Compressive Sensing. It aims to give a solid overview of the fundamental mathematical aspects. Its content is partly based on a recent textbook coauthored with Holger Rauhut, A Mathematical Introduction to Compressive Sensing (Birkhäuser, 2013).
Part 1: The Standard Problem and First Algorithms
The lecture introduces the question of sparse recovery, establishes its theoretical limits, and presents an algorithm achieving these limits in an idealized situation. In order to treat more realistic situations, other algorithms are necessary. Basis Pursuit (that is, $\ell_1$-minimization) and Orthogonal Matching Pursuit make their first appearance. Their success is proved using the concept of coherence of a matrix.
Keywords
Sparsity: essential.
Randomness: nothing better so far (measurement process).
Optimization: preferred, but competitive alternatives are available (reconstruction process).
The Standard Compressive Sensing Problem
$x$: unknown signal of interest in $\mathbb{K}^N$,
$y$: measurement vector in $\mathbb{K}^m$ with $m \ll N$,
$s$: sparsity of $x$, i.e., $s = \operatorname{card}\{ j \in \{1,\ldots,N\} : x_j \neq 0 \}$.
Find concrete sensing/recovery protocols, i.e., find
measurement matrices $A : x \in \mathbb{K}^N \mapsto y \in \mathbb{K}^m$,
reconstruction maps $\Delta : y \in \mathbb{K}^m \mapsto x \in \mathbb{K}^N$,
such that $\Delta(Ax) = x$ for any $s$-sparse vector $x \in \mathbb{K}^N$.
In realistic situations, two issues to consider:
Stability: $x$ not sparse but compressible,
Robustness: measurement error in $y = Ax + e$.
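To fix ideas, here is a minimal NumPy sketch of this measurement model; the dimensions N, m, s, the Gaussian measurement matrix, and the noise level are illustrative assumptions, not prescribed by the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, s = 100, 25, 5                    # ambient dimension, measurements, sparsity

# An s-sparse signal: s nonzero entries at random positions.
x = np.zeros(N)
x[rng.choice(N, size=s, replace=False)] = rng.standard_normal(s)

A = rng.standard_normal((m, N))         # a measurement matrix (random here)
e = 1e-3 * rng.standard_normal(m)       # small measurement error (robustness)
y = A @ x + e                           # the m << N linear measurements
```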
A Selection of Applications
Magnetic resonance imaging
[Figure: left, traditional MRI reconstruction; right, compressive sensing reconstruction (courtesy of M. Lustig and S. Vasanawala)]
Sampling theory
[Figure: time-domain signal with 16 samples]
Error correction
and many more...
$\ell_0$-Minimization
Since
\[ \|x\|_p^p := \sum_{j=1}^N |x_j|^p \;\xrightarrow{\,p \to 0\,}\; \sum_{j=1}^N 1_{\{x_j \neq 0\}}, \]
the notation $\|x\|_0$ [sic] has become usual for
\[ \|x\|_0 := \operatorname{card}(\operatorname{supp}(x)), \quad \text{where } \operatorname{supp}(x) := \{ j \in [N] : x_j \neq 0 \}. \]
For an $s$-sparse $x \in \mathbb{K}^N$, observe the equivalence of:
$x$ is the unique $s$-sparse solution of $Az = y$ with $y = Ax$;
$x$ can be reconstructed as the unique solution of
\[ (P_0) \qquad \underset{z \in \mathbb{K}^N}{\operatorname{minimize}} \; \|z\|_0 \quad \text{subject to } Az = y. \]
This is a combinatorial problem, NP-hard in general.
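To make the combinatorial nature of $(P_0)$ tangible, here is a brute-force sketch in NumPy that searches supports of increasing size; its cost grows like $\sum_k \binom{N}{k}$, which is exactly why the problem is intractable at scale (a didactic sketch, not a practical solver):

```python
import numpy as np
from itertools import combinations

def l0_minimize(A, y, tol=1e-10):
    """Brute-force (P0): find a solution of Az = y with minimal ||z||_0."""
    m, N = A.shape
    if np.linalg.norm(y) <= tol:                  # z = 0 already works
        return np.zeros(N)
    for k in range(1, N + 1):                     # supports of growing size
        for S in combinations(range(N), k):
            cols = list(S)
            zS, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
            if np.linalg.norm(A[:, cols] @ zS - y) <= tol:
                z = np.zeros(N)
                z[cols] = zS                      # first hit is ||.||_0-minimal
                return z
```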
Minimal Number of Measurements
Given $A \in \mathbb{K}^{m \times N}$, the following are equivalent:
1. Every $s$-sparse $x$ is the unique $s$-sparse solution of $Az = Ax$;
2. $\ker A \cap \{ z \in \mathbb{K}^N : \|z\|_0 \le 2s \} = \{0\}$;
3. For any $S \subseteq [N]$ with $\operatorname{card}(S) \le 2s$, the matrix $A_S$ is injective;
4. Every set of $2s$ columns of $A$ is linearly independent.
As a consequence, exact recovery of every $s$-sparse vector forces $m \ge 2s$. This can be achieved using partial Vandermonde matrices.
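The Vandermonde claim can be checked numerically for small sizes. The sketch below builds the $m = 2s$ lowest-frequency rows of the $N \times N$ DFT matrix (one concrete partial Vandermonde choice; the sizes are illustrative) and verifies condition 4 exhaustively:

```python
import numpy as np
from itertools import combinations

N, s = 12, 2
m = 2 * s
# Partial Vandermonde matrix: rows are powers 0..m-1 of the nodes e^{-i2pi k/N}.
A = np.exp(-2j * np.pi * np.outer(np.arange(m), np.arange(N)) / N)
# Condition 4: every set of 2s columns is linearly independent.
ok = all(np.linalg.matrix_rank(A[:, list(S)]) == m
         for S in combinations(range(N), m))
print(ok)  # True: distinct Vandermonde nodes give nonsingular square minors
```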
Exact $s$-sparse Recovery from $2s$ Fourier Measurements
Identify an $s$-sparse $x \in \mathbb{C}^N$ with a function $x$ on $\{0, 1, \ldots, N-1\}$ with support $S$, $\operatorname{card}(S) = s$.
Consider the $2s$ Fourier coefficients
\[ \hat{x}(j) = \sum_{k=0}^{N-1} x(k)\, e^{-i 2\pi jk/N}, \qquad 0 \le j \le 2s - 1. \]
Consider a trigonometric polynomial vanishing exactly on $S$, i.e.,
\[ p(t) := \prod_{k \in S} \bigl( 1 - e^{-i 2\pi k/N}\, e^{i 2\pi t/N} \bigr). \]
Since $p \cdot x \equiv 0$, discrete convolution gives
\[ 0 = (\hat{p} * \hat{x})(j) = \sum_{k=0}^{N-1} \hat{p}(k)\, \hat{x}(j - k), \qquad 0 \le j \le N - 1. \]
Note that $\hat{p}(0) = 1$ and that $\hat{p}(k) = 0$ for $k > s$. The equations $j = s, \ldots, 2s - 1$ then translate into a Toeplitz system with unknowns $\hat{p}(1), \ldots, \hat{p}(s)$. This determines $\hat{p}$, hence $p$, then $S$, and finally $x$.
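The whole procedure fits in a few lines of NumPy. The sketch below follows the steps on this slide (Toeplitz solve for $\hat{p}$, locating the zeros of $p$ on the grid, least squares for $x$), assuming the Toeplitz system is nonsingular, which holds for generic $s$-sparse $x$; the test data are mine:

```python
import numpy as np

rng = np.random.default_rng(1)
N, s = 64, 5

# Plant an s-sparse x (test data).
x = np.zeros(N, dtype=complex)
S_true = np.sort(rng.choice(N, size=s, replace=False))
x[S_true] = rng.standard_normal(s) + 1j * rng.standard_normal(s)

# The 2s Fourier measurements xhat(j), 0 <= j <= 2s-1.
F = np.exp(-2j * np.pi * np.outer(np.arange(2 * s), np.arange(N)) / N)
xhat = F @ x

# Toeplitz system from the equations j = s..2s-1, unknowns phat(1..s).
T = np.array([[xhat[j - k] for k in range(1, s + 1)] for j in range(s, 2 * s)])
phat = np.concatenate(([1.0], np.linalg.solve(T, -xhat[s:2 * s])))

# p(t) = sum_k phat(k) e^{i2pi kt/N}; its s zeros on the grid reveal S.
z = np.exp(2j * np.pi * np.arange(N) / N)
pvals = np.polyval(phat[::-1], z)
S = np.sort(np.argsort(np.abs(pvals))[:s])

# Finally, least squares on the recovered support gives x.
x_rec = np.zeros(N, dtype=complex)
x_rec[S] = np.linalg.lstsq(F[:, S], xhat, rcond=None)[0]
print(np.array_equal(S, S_true), np.allclose(x_rec, x))
```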
Optimization and Greedy Strategies
$\ell_1$-Minimization (Basis Pursuit)
Replace $(P_0)$ by
\[ (P_1) \qquad \underset{z \in \mathbb{K}^N}{\operatorname{minimize}} \; \|z\|_1 \quad \text{subject to } Az = y. \]
Geometric intuition.
Unique $\ell_1$-minimizers are at most $m$-sparse (when $\mathbb{K} = \mathbb{R}$).
Convex optimization program, hence solvable in practice.
In the real setting, recast as the linear optimization program
\[ \underset{c, z \in \mathbb{R}^N}{\operatorname{minimize}} \; \sum_{j=1}^N c_j \quad \text{subject to } Az = y \text{ and } -c_j \le z_j \le c_j. \]
In the complex setting, recast as a second-order cone program.
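In code, the linear-program recast goes through directly. Here is a sketch using scipy.optimize.linprog; the solver choice and the stacking of variables $w = [z; c]$ are my assumptions, one of several equivalent formulations:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve (P1): min ||z||_1 s.t. Az = y, via the LP recast (real case)."""
    m, N = A.shape
    # Variables w = [z; c] in R^{2N}; objective sum_j c_j.
    cost = np.concatenate([np.zeros(N), np.ones(N)])
    A_eq = np.hstack([A, np.zeros((m, N))])            # A z = y
    I = np.eye(N)
    A_ub = np.vstack([np.hstack([I, -I]),              #  z - c <= 0
                      np.hstack([-I, -I])])            # -z - c <= 0
    res = linprog(cost, A_ub=A_ub, b_ub=np.zeros(2 * N),
                  A_eq=A_eq, b_eq=y, bounds=(None, None))
    return res.x[:N]
```

For complex data, one would instead solve the second-order cone program mentioned above.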
Basis Pursuit $\Leftrightarrow$ Null Space Property
$\Delta_1(Ax) = x$ for every vector $x$ supported on $S$ if and only if
\[ (\mathrm{NSP}) \qquad \|u_S\|_1 < \|u_{\bar{S}}\|_1 \quad \text{for all } u \in \ker A \setminus \{0\}. \]
For real measurement matrices, the real and complex NSPs read
\[ \sum_{j \in S} |u_j| < \sum_{l \in \bar{S}} |u_l|, \qquad \text{all } u \in \ker_{\mathbb{R}} A \setminus \{0\}, \]
\[ \sum_{j \in S} \sqrt{v_j^2 + w_j^2} < \sum_{l \in \bar{S}} \sqrt{v_l^2 + w_l^2}, \qquad \text{all } (v, w) \in (\ker_{\mathbb{R}} A)^2 \setminus \{0\}. \]
Real and complex NSPs are in fact equivalent.
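For the record, the "if" direction of this equivalence is a three-line argument; the following LaTeX fragment reconstructs the standard proof (it is not spelled out on the slide):

```latex
% NSP for S implies that every x supported on S is the unique minimizer of (P_1):
% take z \neq x with Az = Ax, set u := x - z \in \ker A \setminus \{0\},
% and note u_{\bar S} = -z_{\bar S} because x vanishes off S; then
\[
\|x\|_1 \le \|(x - z)_S\|_1 + \|z_S\|_1
        = \|u_S\|_1 + \|z_S\|_1
        \overset{\mathrm{(NSP)}}{<} \|u_{\bar S}\|_1 + \|z_S\|_1
        = \|z_{\bar S}\|_1 + \|z_S\|_1 = \|z\|_1 ,
\]
% so x has strictly smaller l1-norm than any other solution of Az = y.
```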
Orthogonal Matching Pursuit
Starting with $S^0 = \emptyset$ and $x^0 = 0$, iterate:
\[ (\mathrm{OMP}_1) \qquad S^{n+1} = S^n \cup \{ j_{n+1} \}, \quad j_{n+1} := \underset{j \in [N]}{\operatorname{argmax}} \, \bigl| \bigl( A^*(y - A x^n) \bigr)_j \bigr|, \]
\[ (\mathrm{OMP}_2) \qquad x^{n+1} = \underset{z \in \mathbb{C}^N}{\operatorname{argmin}} \, \bigl\{ \|y - Az\|_2 : \operatorname{supp}(z) \subseteq S^{n+1} \bigr\}. \]
The norm of the residual decreases according to
\[ \|y - A x^{n+1}\|_2^2 \le \|y - A x^n\|_2^2 - \bigl| \bigl( A^*(y - A x^n) \bigr)_{j_{n+1}} \bigr|^2. \]
Every vector $x \neq 0$ supported on $S$, $\operatorname{card}(S) = s$, is recovered from $y = Ax$ after at most $s$ iterations of OMP if and only if $A_S$ is injective and
\[ (\mathrm{ERC}) \qquad \max_{j \in S} |(A^* r)_j| > \max_{l \notin S} |(A^* r)_l| \quad \text{for all } r \in \{ Az : \operatorname{supp}(z) \subseteq S \} \setminus \{0\}. \]
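A direct transcription of $(\mathrm{OMP}_1)$ and $(\mathrm{OMP}_2)$ into NumPy; stopping after a prescribed number of iterations $s$ is one common convention (stopping once the residual is small is another):

```python
import numpy as np

def omp(A, y, s):
    """Orthogonal Matching Pursuit: s rounds of (OMP1) and (OMP2)."""
    N = A.shape[1]
    S = []                                         # current support S^n
    residual = y.astype(complex)
    for _ in range(s):
        # (OMP1): index of the column most correlated with the residual.
        j = int(np.argmax(np.abs(A.conj().T @ residual)))
        S.append(j)
        # (OMP2): least-squares fit of y on the columns indexed by S^{n+1}.
        xS, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        residual = y - A[:, S] @ xS
    x = np.zeros(N, dtype=complex)
    x[S] = xS
    return x
```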
First Recovery Guarantees
Coherence
For a matrix with $\ell_2$-normalized columns $a_1, \ldots, a_N$, define
\[ \mu := \max_{i \neq j} |\langle a_i, a_j \rangle|. \]
As a rule, the smaller the coherence, the better. However, the Welch bound reads
\[ \mu \ge \sqrt{\frac{N - m}{m(N - 1)}}. \]
The Welch bound is achieved at, and only at, equiangular tight frames.
Deterministic matrices with coherence $\mu \le c/\sqrt{m}$ exist.
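Coherence is cheap to compute, which makes the Welch bound easy to observe numerically; a small sketch (the Gaussian test matrix and its dimensions are arbitrary choices):

```python
import numpy as np

def coherence(A):
    """mu = max_{i != j} |<a_i, a_j>| after l2-normalizing the columns."""
    B = A / np.linalg.norm(A, axis=0)
    G = np.abs(B.conj().T @ B)               # Gram matrix magnitudes
    np.fill_diagonal(G, 0.0)                 # ignore the diagonal <a_i, a_i> = 1
    return G.max()

m, N = 20, 80
A = np.random.default_rng(0).standard_normal((m, N))
welch = np.sqrt((N - m) / (m * (N - 1)))
print(coherence(A) >= welch)                 # True: the Welch bound always holds
```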
Recovery Conditions using Coherence
Every $s$-sparse $x \in \mathbb{C}^N$ is recovered from $y = Ax \in \mathbb{C}^m$ via at most $s$ iterations of OMP provided
\[ \mu < \frac{1}{2s - 1}. \]
Every $s$-sparse $x \in \mathbb{C}^N$ is recovered from $y = Ax \in \mathbb{C}^m$ via Basis Pursuit provided
\[ \mu < \frac{1}{2s - 1}. \]
In fact, the Exact Recovery Condition can be rephrased as $\|A_S^\dagger A_{\bar{S}}\|_{1 \to 1} < 1$, and this implies the Null Space Property.
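Inverting the condition $\mu < 1/(2s-1)$ gives the largest guaranteed sparsity level for a given matrix; a one-liner on top of the coherence function above (the helper name is mine):

```python
import numpy as np

def guaranteed_sparsity(mu):
    """Largest s with mu < 1/(2s - 1), i.e. s < (1/mu + 1)/2."""
    return int(np.ceil((1.0 / mu + 1.0) / 2.0)) - 1

print(guaranteed_sparsity(0.1))   # 5: every 5-sparse vector is recovered
```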
Summary
The coherence conditions for $s$-sparse recovery are of the type $\mu \le c/s$, while the Welch bound (for $N \ge 2m$, say) reads $\mu \ge c/\sqrt{m}$. Therefore, arguments based on coherence necessitate $m \ge c\, s^2$. This is far from the ideal linear scaling of $m$ in $s$... Next, we will introduce new tools to break this quadratic barrier (with random matrices).