Sparse Legendre expansions via ℓ1-minimization. Rachel Ward, Courant Institute, NYU. Joint work with Holger Rauhut, Hausdorff Center for Mathematics, Bonn, Germany. June 8, 2010
Outline Sparse recovery for bounded orthonormal systems Sparse recovery for (certain) unbounded orthonormal systems From compressive sensing to function approximation Applications
Problem set-up Consider the general problem: reconstruct an s-sparse vector x ∈ C^N or x ∈ R^N (or a compressible vector) from its vector of m measurements y = Ax. Suppose the measurements correspond to an underdetermined system: s < m < N, so that y = Ax has infinitely many solutions. We want a sparse solution. Preferably, we would also like a fast algorithm that performs the sparse reconstruction.
Algorithms for sparse recovery ℓ0-minimization: with ‖x‖_0 := |supp(x)|, solve min ‖x‖_0 subject to Ax = y, x ∈ C^N. Problem: ℓ0-minimization is NP-hard. ℓ1-minimization: min ‖x‖_1 = Σ_{j=1}^{N} |x_j| subject to Ax = y. This is the convex relaxation of the ℓ0-minimization problem and can be solved efficiently.
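The ℓ1 problem above can be solved as a linear program. A minimal sketch (not from the slides), assuming real-valued data and using SciPy's `linprog` with the standard lift min Σt subject to −t ≤ x ≤ t, Ax = y:

```python
# Basis pursuit (min ||x||_1 s.t. Ax = y) recast as a linear program.
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Variables (x, t) with -t <= x <= t; minimize sum(t)."""
    m, N = A.shape
    c = np.concatenate([np.zeros(N), np.ones(N)])   # objective: sum of t
    I = np.eye(N)
    A_ub = np.block([[I, -I], [-I, -I]])            # x - t <= 0, -x - t <= 0
    A_eq = np.hstack([A, np.zeros((m, N))])         # A x = y
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * N), A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * N + [(0, None)] * N)
    return res.x[:N]

rng = np.random.default_rng(0)
m, N, s = 50, 128, 5
A = rng.standard_normal((m, N)) / np.sqrt(m)        # Gaussian measurement matrix
x = np.zeros(N)
x[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
x_hat = basis_pursuit(A, A @ x)
print(np.max(np.abs(x_hat - x)))                    # small: recovery succeeds
```

With m well above the O(s ln(N/s)) threshold for Gaussian matrices, the recovery here is exact up to solver precision.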
A sufficient condition for sparse recovery The restricted isometry constant δ_s of a matrix A ∈ C^{m×N} is defined as the smallest δ_s such that (1 − δ_s)‖x‖_2^2 ≤ ‖Ax‖_2^2 ≤ (1 + δ_s)‖x‖_2^2 for all s-sparse x ∈ C^N. (Candès, Romberg, Tao 2005; Foucart/Lai 2009): If A ∈ R^{m×N} satisfies the RIP of order 2s with δ_{2s} ≤ .46..., then any vector x ∈ R^N may be approximated from y = Ax + ξ, ‖ξ‖_2 ≤ ε, by x̂ = argmin_z ‖z‖_1 subject to ‖Az − y‖_2 ≤ ε, up to error ‖x − x̂‖_2 ≤ C_1 ‖x − x_s‖_1/√s + C_2 ε.
Which matrices satisfy the restricted isometry property? If A ∈ R^{m×N} is a Gaussian or Bernoulli random matrix, m = O(s ln(N/s)) measurements ensure that A satisfies the restricted isometry property of order s with high probability. In contrast, all available deterministic constructions run into a quadratic bottleneck: they require m = O(s² log N) measurements. A compromise between completely random and deterministic: structured random matrices.
Structured random matrices: the random partial Fourier matrix Consider the discrete Fourier matrix F ∈ C^{N×N} with entries F_{l,k} = (1/√N) e^{2πilk/N}, l, k = 1, ..., N. Randomly subsample m < N rows of F to get the m × N random partial Fourier matrix. The measurements y_l = (Fx)_l are a random subsample of the DFT of x. Theorem (Candès, Romberg, Tao 2005; Rudelson, Vershynin 2008) If m = O(s ln^4 N), then with high probability, the m × N random partial Fourier matrix satisfies the restricted isometry property of order s.
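A small sketch (not from the slides) of constructing this matrix, using the sign convention F_{l,k} = e^{2πilk/N}/√N from the slide, under which Fx equals √N times NumPy's inverse FFT of x:

```python
# Build the m x N random partial Fourier matrix by subsampling rows
# of the (unitary) DFT matrix.
import numpy as np

rng = np.random.default_rng(1)
N, m = 64, 16
l = np.arange(N).reshape(-1, 1)
k = np.arange(N).reshape(1, -1)
F = np.exp(2j * np.pi * l * k / N) / np.sqrt(N)   # full N x N matrix

rows = rng.choice(N, size=m, replace=False)        # random row subsample
A = F[rows]                                        # m x N partial Fourier matrix

# Sanity check: with this sign convention, F x = sqrt(N) * ifft(x),
# so y = A x is a random subsample of the DFT of x.
x = rng.standard_normal(N)
assert np.allclose(A @ x, (np.sqrt(N) * np.fft.ifft(x))[rows])
print(A.shape)  # (16, 64)
```

In practice one never forms F explicitly; the products Ax and A*y are applied via the FFT in O(N log N) operations.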
The random partial Fourier matrix: Function approximation interpretation Random partial Fourier measurements correspond to samples of a trigonometric polynomial: y_l = (Fx)_l = g_x(t_l), t_l = 2πj_l/N, where g_x(t) := Σ_{k=0}^{N−1} x_k e^{ikt}. We say that g_x(t) is an s-sparse trigonometric polynomial of degree N−1 if the coefficient vector (x_k)_{k=0}^{N−1} is s-sparse. The RIP of the random partial Fourier matrix means that all s-sparse trigonometric polynomials of degree N−1 can be recovered from their values on a fixed subset of m = O(s log^4 N) points of the form t_l = 2πl/N.
Outline Sparse recovery for bounded orthonormal systems Sparse recovery for (certain) unbounded orthonormal systems From compressive sensing to function approximation Applications
Sparse recovery for more general orthonormal systems? We may consider the recovery of functions f(u) = Σ_{k=0}^{N−1} x_k ψ_k(u), u ∈ D ⊂ R^d, that are sparse relative to a function system ψ_0, ..., ψ_{N−1} : D → C that is orthonormal with respect to a measure ν on D, from a number m < N of point samples y_l = f(u_l) = Σ_{k=0}^{N−1} x_k ψ_k(u_l), with u_1, ..., u_m drawn i.i.d. from ν, where m is proportional to the sparsity level s. Whether sparse recovery is possible depends on the structure of the sampling matrix A_{l,k} = ψ_k(u_l) associated to the orthonormal system.
Sparse recovery for general bounded orthonormal systems Sampling matrix: A_{l,k} = ψ_k(u_l), l ∈ [m], k ∈ [N]. Theorem (Rauhut 09) Suppose that ‖ψ_k‖_∞ ≤ K for all k. Let A ∈ C^{m×N} be the sampling matrix associated to the bounded orthonormal system {ψ_j}_{j=0}^{N−1}, with sampling points u_1, ..., u_m ∈ D chosen i.i.d. from the orthogonalization measure ν. If m = O(K² s ln^4 N), then with probability exceeding 1 − N^{−log³ N}, the matrix (1/√m) A satisfies the RIP of order s. Examples of uniformly bounded orthonormal systems: the complex exponential basis (with uniform orthogonalization measure) and the Chebyshev polynomial basis, orthonormal on [−1, 1] w.r.t. the Chebyshev measure.
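An empirical illustration (my sketch, not from the slides) for the Chebyshev example: the system ψ_0 = 1, ψ_k(t) = √2 cos(k arccos t) is orthonormal w.r.t. the Chebyshev probability measure and uniformly bounded with K = √2, and for i.i.d. Chebyshev-distributed samples the normalized Gram matrix (1/m) A*A concentrates around the identity:

```python
# Sampling matrix for the Chebyshev polynomial system with
# Chebyshev-distributed sample points; check Gram concentration.
import numpy as np

rng = np.random.default_rng(2)
N, m = 16, 20000

# t = cos(pi * u), u ~ Uniform(0,1), has density 1/(pi*sqrt(1-t^2)).
u = np.cos(np.pi * rng.random(m))
k = np.arange(N)
A = np.sqrt(2) * np.cos(np.outer(np.arccos(u), k))  # psi_k(u_l), k >= 1
A[:, 0] = 1.0                                       # psi_0 = 1

G = (A.T @ A) / m                                   # empirical Gram matrix
print(np.max(np.abs(G - np.eye(N))))                # small for large m
```

This is only a law-of-large-numbers check on the Gram matrix; the RIP statement in the theorem is the much stronger uniform version over all s-sparse vectors.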
The Legendre polynomials The Legendre polynomials P_l are orthonormal on D = [−1, 1] with respect to the (normalized) Lebesgue measure. They are generated via Gram-Schmidt orthogonalization of the monomial system 1, t, t², ... They are used in schemes for solving several classical PDEs, such as Laplace's equation. Fast Legendre polynomial transform: requires only O(N log N) arithmetic operations.
The Legendre polynomials The Legendre polynomials satisfy ‖P_l‖_∞ = P_l(1) = √(2l+1), so K = sup_{0≤l≤N−1} ‖P_l‖_∞ = √(2N−1). From existing theory: an s-sparse polynomial of the form f(t) = Σ_{k=0}^{N−1} x_k P_k(t) may be recovered from its values at m = O(sK² ln^4 N) = O(sN ln^4 N) sampling points. But any polynomial of degree N−1 is already determined by N samples, so this guarantee is trivial!
Sparse recovery for Legendre orthonormal systems Theorem (Rauhut, W 10) Let N and s be given. Suppose that m = O(s log^4 N) sampling points t_1, ..., t_m are drawn i.i.d. from the Chebyshev measure dν = π^{−1}(1 − t²)^{−1/2} dt on [−1, 1]. Then with probability exceeding 1 − N^{−log³ N}, the following holds for all polynomials f(t) = Σ_{k=0}^{N−1} x_k P_k(t): Suppose that noisy sample values y = (f(t_1) + η_1, ..., f(t_m) + η_m) are observed, with ‖η‖_2 ≤ √m ε. Then the coefficient vector x = (x_0, x_1, ..., x_{N−1}) is approximated by x̂ = argmin_z ‖z‖_1 subject to ‖Az − y‖_2 ≤ √m ε, up to error ‖x − x̂‖_2 ≤ C_1 ‖x − x_s‖_1/√s + C_2 ε.
Numerical Example A sparse Legendre polynomial with sparsity s = 5, maximal degree N = 80, and m = 20 i.i.d. sampling points drawn from the Chebyshev measure. Reconstruction by ℓ1-minimization is exact! [Figure: the polynomial and its exact reconstruction on [−1, 1].]
Numerical Example The same sparse Legendre polynomial (black) subject to noisy random measurements, and the stable approximation obtained by ℓ1-minimization. [Figure: noisy samples and the reconstructed polynomial on [−1, 1].]
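A sketch in the spirit of this experiment (not the slides' actual code; I use a larger m than the slide's 20 so the demonstration succeeds robustly, and the normalization L_k = √(2k+1) P_k^cl of the classical Legendre polynomials, orthonormal w.r.t. dt/2):

```python
# Recover a sparse Legendre expansion from Chebyshev-distributed
# samples via l1-minimization (basis pursuit as a linear program).
import numpy as np
from scipy.optimize import linprog
from scipy.special import eval_legendre

rng = np.random.default_rng(3)
N, s, m = 80, 5, 50

t = np.cos(np.pi * rng.random(m))                      # Chebyshev-distributed points
k = np.arange(N)
A = np.sqrt(2 * k + 1) * eval_legendre(k, t[:, None])  # A_{l,k} = L_k(t_l)

x = np.zeros(N)                                        # random 5-sparse coefficients
x[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
y = A @ x                                              # noiseless samples f(t_l)

# min ||z||_1 s.t. A z = y, as an LP in variables (z, u), -u <= z <= u.
c = np.concatenate([np.zeros(N), np.ones(N)])
I = np.eye(N)
res = linprog(c, A_ub=np.block([[I, -I], [-I, -I]]), b_ub=np.zeros(2 * N),
              A_eq=np.hstack([A, np.zeros((m, N))]), b_eq=y,
              bounds=[(None, None)] * N + [(0, None)] * N)
x_hat = res.x[:N]
print(np.max(np.abs(x_hat - x)))                       # small: recovery succeeds
```

Note that the preconditioning in the proof does not change the estimator: multiplying the constraint Az = y by an invertible diagonal matrix leaves the feasible set, and hence the ℓ1 minimizer, unchanged.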
Key idea in proof: Premultiplication The Legendre polynomials grow like (1 − t²)^{−1/4} as they approach the endpoints; by a classical inequality of S. Bernstein, |P_n(t)| < 2π^{−1/2} (1 − t²)^{−1/4}, −1 ≤ t ≤ 1. So the functions Q_n(t) = √(π/2) (1 − t²)^{1/4} P_n(t) form a uniformly bounded system (with ‖Q_n‖_∞ ≤ √2). Moreover, the Q_n are orthonormal with respect to the Chebyshev measure dν = π^{−1}(1 − t²)^{−1/2} dt on [−1, 1]: ∫_{−1}^{1} Q_n(t) Q_m(t) π^{−1}(1 − t²)^{−1/2} dt = ∫_{−1}^{1} P_n(t) P_m(t) dt/2 = δ_{n,m}.
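This orthonormality is easy to verify numerically (my sketch, not from the slides), using the Gauss-Chebyshev rule ∫ g(t) π^{−1}(1−t²)^{−1/2} dt ≈ (1/M) Σ_j g(t_j) at the Chebyshev nodes t_j = cos((2j−1)π/(2M)):

```python
# Check that Q_n(t) = sqrt(pi/2) (1-t^2)^{1/4} L_n(t) is (approximately)
# orthonormal w.r.t. the Chebyshev measure, via Gauss-Chebyshev quadrature.
import numpy as np
from scipy.special import eval_legendre

n_max, M = 8, 1000
j = np.arange(1, M + 1)
t = np.cos((2 * j - 1) * np.pi / (2 * M))              # Chebyshev nodes

k = np.arange(n_max)
L = np.sqrt(2 * k + 1) * eval_legendre(k, t[:, None])  # orthonormal w.r.t. dt/2
Q = np.sqrt(np.pi / 2) * (1 - t[:, None] ** 2) ** 0.25 * L

G = (Q.T @ Q) / M                                      # quadrature Gram matrix
print(np.max(np.abs(G - np.eye(n_max))))               # close to 0
```

The quadrature is only approximate here (the integrand contains a √(1−t²) factor, which is not polynomial), but the Gram matrix is already close to the identity at moderate M.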
Outline Sparse recovery for bounded orthonormal systems Sparse recovery for (certain) unbounded orthonormal systems From compressive sensing to function approximation Applications
Universality of the Chebyshev measure Consider a probability measure ν(t) dt on [−1, 1]. (Szegő): If ν(t) satisfies a mild continuity condition, then the polynomials p_n^ν that are orthonormal w.r.t. ν still satisfy a uniform growth condition: |p_n^ν(t)| ≤ C_ν (1 − t²)^{−1/4} ν(t)^{−1/2}, so, as before, q_n^ν(t) = (1 − t²)^{1/4} ν(t)^{1/2} p_n^ν(t) forms a uniformly bounded system, orthonormal w.r.t. the Chebyshev measure π^{−1}(1 − t²)^{−1/2} dt.
From compressive sensing to function approximation Consider the weighted norm on continuous functions on [−1, 1], ‖f‖_{∞,w} = sup_{t∈[−1,1]} |f(t)| w(t), w(t) = (1 − t²)^{1/4} ν(t)^{1/2}, and the best s-term approximation error σ_{N,s}(f)_{∞,w} = inf{ ‖f − Σ_{k=0}^{N−1} x_k p_k^ν‖_{∞,w} : x ∈ R^N, ‖x‖_0 ≤ s }. Theorem (Rauhut and W, 10) Let N, m, s be given with m = O(s log^4 N). Then there exist sampling points t_1, ..., t_m (chosen according to the Chebyshev distribution) and an efficient reconstruction procedure (ℓ1-minimization) such that, for any continuous function f, the polynomial P of degree at most N−1 reconstructed from f(t_1), ..., f(t_m) satisfies ‖f − P‖_{∞,w} ≤ C_w √s σ_{N,s}(f)_{∞,w}.
Extension to spherical harmonics [Figure: Re(Y_0^0), Re(Y_1^0), Re(Y_2^0), Re(Y_2^1), Re(Y_3^0), Re(Y_3^1).] Y_l^m(φ, θ) = C_{l,m} P_l^m(cos θ) e^{imφ}, −l ≤ m ≤ l, l = 0, 1, ... The P_l^m are the associated Legendre functions. Fast spherical harmonic transform: requires only O(N (log N)²) arithmetic operations.
Extension to spherical harmonics The associated Legendre functions P_l^m may be written in terms of polynomials orthonormal with respect to the weights ν_α(x) = (1 − x²)^α, α = 0, 1, ..., n. Krasikov 08: (1 − x²)^{1/4} ν_α(x)^{1/2} |p_n^α(x)| ≤ C_α = O(α^{1/4}); from this one can derive... Corollary Let N, m, s be given with m = O(s N^{1/4} log^4 N). Then there exist sampling points t_1, ..., t_m (chosen according to the spherical Chebyshev distribution) and an efficient reconstruction procedure (ℓ1-minimization) such that any function on the sphere which is an s-sparse expansion in the first N spherical harmonic basis functions may be exactly recovered from these sampling points.
Outline Sparse recovery for bounded orthonormal systems Sparse recovery for (certain) unbounded orthonormal systems From compressive sensing to function approximation Applications
Application: Cosmic Microwave Background Radiation (CMB) map WMAP full-sky temperature anisotropy map (Source: http://map.gsfc.nasa.gov) T(θ, φ) = Σ_{l=0}^{∞} Σ_{m=−l}^{l} a_{l,m} Y_l^m(θ, φ), where the Y_l^m are the spherical harmonics. Red band: measurements are corrupted by interference with the galactic foreground signal.
CMB map is compressible in spherical harmonics [Figure: decay of the relative best s-term approximation error ‖a − a_s‖_2 / ‖a‖_2, s = 1, ..., n². Source: Lyman Page, http://ophelia.princeton.edu/] Consider the coefficient vector a = (a_{l,m}) in T(θ, φ) ≈ Σ_{l=0}^{n} Σ_{m=−l}^{l} a_{l,m} Y_l^m(θ, φ). This vector is predicted and observed to be compressible in the spherical harmonic basis.
CMB map is compressible in spherical harmonics (J. Starck et al., 08):¹ Propose full-sky CMB map inpainting from partial CMB measurements T(θ_j, φ_k) = m_{j,k}. Obtain the coefficients â = (â_{l,m}) by solving the ℓ1-minimization problem â = argmin Σ_{l=0}^{N} Σ_{m=−l}^{l} |z_{l,m}| s.t. Σ_{l=0}^{N} Σ_{m=−l}^{l} z_{l,m} Y_l^m(θ_j, φ_k) = m_{j,k}. ¹ Source: P. Abrial, Y. Moudden, J. Starck, J. Fadili, J. Delabrouille, and M. Nguyen. CMB data analysis and sparsity. Stat. Methodol., 2008.
Final remarks Our results provide some theoretical justification for the good inpainting results obtained for the CMB map. Open questions: 1. Sparse recovery in the spherical harmonic basis: can we get better recovery guarantees? 2. Can we generalize these results to functions sparse in bases of eigenfunctions of a more general class of PDEs? THANK YOU