The uniform uncertainty principle and compressed sensing
Harmonic analysis and related topics, Seville, December 5, 2008
Emmanuel Candès (Caltech), Terence Tao (UCLA)

Uncertainty principles

A basic principle in harmonic analysis is:

Uncertainty principle (informal): If a function $f : G \to \mathbb{C}$ on an abelian group $G$ is concentrated in a small set, then its Fourier transform $\hat f : \hat G \to \mathbb{C}$ must be spread out over a large set.

There are many results that rigorously capture this principle.

For instance, for the real line $G = \mathbb{R}$, with the standard Fourier transform
$$\hat f(\xi) = \int_{\mathbb{R}} f(x) e^{-2\pi i x \xi}\,dx,$$
we have

Heisenberg uncertainty principle: If $\|f\|_{L^2(\mathbb{R})} = \|\hat f\|_{L^2(\mathbb{R})} = 1$ and $x_0, \xi_0 \in \mathbb{R}$, then
$$\|(x - x_0) f\|_{L^2(\mathbb{R})} \, \|(\xi - \xi_0) \hat f\|_{L^2(\mathbb{R})} \ge \frac{1}{4\pi}.$$
(More succinctly: $(\Delta x)(\Delta \xi) \ge \frac{1}{4\pi}$.)

Proof: Normalise $x_0 = \xi_0 = 0$, use the obvious inequality $\int_{\mathbb{R}} |a x f(x) + i b f'(x)|^2\,dx \ge 0$, integrate by parts, and optimise in $a, b$.
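To fill in that sketch, here is one standard way to run the computation, using the closely related inequality $\int_{\mathbb{R}} |axf(x) + bf'(x)|^2\,dx \ge 0$ for real $a, b$ (stated here for Schwartz $f$, with the normalisation $\|f\|_{L^2} = 1$):

```latex
\[
0 \le \int_{\mathbb{R}} |a x f(x) + b f'(x)|^2\,dx
  = a^2 \|xf\|_{L^2}^2 + b^2 \|f'\|_{L^2}^2
    + ab \int_{\mathbb{R}} x \, \tfrac{d}{dx}|f(x)|^2\,dx.
\]
% Integration by parts turns the cross term into -ab\|f\|_{L^2}^2 = -ab,
% and Plancherel gives \|f'\|_{L^2}^2 = 4\pi^2 \|\xi \hat f\|_{L^2}^2, hence
\[
a^2 \|xf\|_{L^2}^2 \;-\; ab \;+\; 4\pi^2 b^2 \|\xi \hat f\|_{L^2}^2 \;\ge\; 0
\qquad \text{for all } a, b \in \mathbb{R}.
\]
% A quadratic form that is nonnegative for all (a, b) has nonpositive
% discriminant: 1 \le 16\pi^2 \|xf\|_{L^2}^2 \|\xi \hat f\|_{L^2}^2,
% i.e. \|xf\|_{L^2} \, \|\xi \hat f\|_{L^2} \ge 1/(4\pi).
```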

Equality is attained for centred Gaussians $f(x) = c e^{-\pi A x^2}$, $\hat f(\xi) = \frac{c}{\sqrt{A}} e^{-\pi \xi^2/A}$ when $x_0 = \xi_0 = 0$; this example can be translated and modulated to produce similar examples for other $x_0, \xi_0$.

What about finite abelian groups $G$, e.g. cyclic groups $G = \mathbb{Z}/N\mathbb{Z}$? The Pontryagin dual group $\hat G$ of characters $\xi : G \to \mathbb{R}/\mathbb{Z}$ has the same cardinality as $G$. For $f : G \to \mathbb{C}$, we define the Fourier transform $\hat f : \hat G \to \mathbb{C}$ as
$$\hat f(\xi) := \int_G f(x) e(-\xi \cdot x)\,dx$$
where $e(x) := e^{2\pi i x}$ and $dx = \frac{1}{|G|}\,d\#$ is normalised counting measure on $G$.

We have the inversion formula
$$f(x) = \sum_{\xi \in \hat G} \hat f(\xi) e(\xi \cdot x)$$
and the Plancherel formula $\|f\|_{L^2(G)} = \|\hat f\|_{\ell^2(\hat G)}$.
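As a quick sanity check of these normalisations (a minimal numerical sketch for $G = \mathbb{Z}/N\mathbb{Z}$, where $\xi \cdot x = x\xi/N$):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 12
x = np.arange(N)
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Fourier transform with normalised counting measure:
# fhat(xi) = (1/N) sum_x f(x) e(-x*xi/N)
fhat = np.exp(-2j * np.pi * np.outer(x, x) / N) @ f / N

# inversion formula: f(x) = sum_xi fhat(xi) e(x*xi/N)
assert np.allclose(np.exp(2j * np.pi * np.outer(x, x) / N) @ fhat, f)

# Plancherel: L^2(G) norm (normalised measure) = l^2(G-hat) norm (counting measure)
assert np.isclose(np.sum(np.abs(f) ** 2) / N, np.sum(np.abs(fhat) ** 2))
```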

The analogue of Gaussians for finite abelian groups are the indicator functions of subgroups. If $H \le G$ is a subgroup of $G$, define the orthogonal complement $H^\perp \le \hat G$ as
$$H^\perp := \{\xi \in \hat G : \xi \cdot x = 0 \text{ for all } x \in H\}.$$
We have the Poisson summation formula
$$\widehat{1_H} = \frac{|H|}{|G|} 1_{H^\perp}$$
(in particular, the Fourier transform of the constant function $1$ is a Dirac mass, and vice versa). From this and Plancherel we have the basic identity $|H| \cdot |H^\perp| = |G|$.
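For example, in $\mathbb{Z}/12\mathbb{Z}$ with $H = \{0, 3, 6, 9\}$ one can check Poisson summation directly (a small numerical sketch, continuing the conventions above):

```python
import numpy as np

N = 12
x = np.arange(N)
H = x[x % 3 == 0]                                   # subgroup {0, 3, 6, 9}
one_H = (x % 3 == 0).astype(complex)

fhat = np.exp(-2j * np.pi * np.outer(x, x) / N) @ one_H / N

# H-perp = {xi : x*xi = 0 mod N for all x in H} = {0, 4, 8}
H_perp = np.array([xi for xi in x if np.all((H * xi) % N == 0)])

# Poisson summation: the transform of 1_H is (|H|/|G|) 1_{H-perp}
assert np.allclose(fhat, (len(H) / N) * np.isin(x, H_perp))
# and the basic identity |H| |H-perp| = |G|
assert len(H) * len(H_perp) == N
```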

More generally, for finite abelian $G$ we have

Donoho-Stark uncertainty principle (1989): For any non-trivial $f : G \to \mathbb{C}$, we have
$$|\mathrm{supp}(f)| \cdot |\mathrm{supp}(\hat f)| \ge |G|.$$

Proof: Combine Plancherel's theorem with the Hölder inequality estimates
$$\|f\|_{L^1(G)} \le |\mathrm{supp}(f)|^{1/2} |G|^{-1/2} \|f\|_{L^2(G)}; \qquad \|\hat f\|_{\ell^2(\hat G)} \le |\mathrm{supp}(\hat f)|^{1/2} \|\hat f\|_{\ell^\infty(\hat G)}$$
and the Riemann-Lebesgue inequality $\|\hat f\|_{\ell^\infty(\hat G)} \le \|f\|_{L^1(G)}$.
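One can also confirm the inequality by brute force on random sparse signals (a sketch over $G = \mathbb{Z}/24\mathbb{Z}$; the $10^{-10}$ thresholds are just to classify numerical zeros):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 24
x = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(x, x) / N) / N    # normalised Fourier matrix

for _ in range(1000):
    f = np.zeros(N, dtype=complex)
    T = rng.choice(N, size=rng.integers(1, N + 1), replace=False)
    f[T] = rng.standard_normal(len(T)) + 1j * rng.standard_normal(len(T))
    supp_f = np.count_nonzero(np.abs(f) > 1e-10)
    supp_fhat = np.count_nonzero(np.abs(F @ f) > 1e-10)
    assert supp_f * supp_fhat >= N                   # Donoho-Stark
```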

One can show that equality is attained precisely for the indicators $1_H$ of subgroups, up to translation, modulation, and multiplication by constants.

One also has a slightly more quantitative variant:

Entropy uncertainty principle: If $\|f\|_{L^2(G)} = \|\hat f\|_{\ell^2(\hat G)} = 1$, then
$$\int_G |f(x)|^2 \log \frac{1}{|f(x)|}\,dx + \sum_{\xi \in \hat G} |\hat f(\xi)|^2 \log \frac{1}{|\hat f(\xi)|} \ge 0.$$

Proof: Differentiate (!) the Hausdorff-Young inequality $\|\hat f\|_{\ell^{p'}(\hat G)} \le \|f\|_{L^p(G)}$ at $p = 2$.
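Numerically, one can watch this inequality hold for generic unit-norm signals (a sketch of the statement as reconstructed above, with the convention $0 \log \frac{1}{0} = 0$ for vanishing entries):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 24
x = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(x, x) / N) / N

def entropy_term(weights, vals, measure=1.0):
    """sum of measure * weights * log(1/vals), with 0 log(1/0) = 0."""
    mask = vals > 1e-12
    return -measure * np.sum(weights[mask] * np.log(vals[mask]))

for _ in range(1000):
    f = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    f /= np.sqrt(np.sum(np.abs(f) ** 2) / N)         # ||f||_{L^2(G)} = 1
    fhat = F @ f
    total = (entropy_term(np.abs(f) ** 2, np.abs(f), measure=1 / N)
             + entropy_term(np.abs(fhat) ** 2, np.abs(fhat)))
    assert total >= -1e-9                            # entropy uncertainty principle
```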

From Jensen's inequality we have
$$\int_G |f(x)|^2 \log \frac{1}{|f(x)|}\,dx \le \log \left(\frac{|\mathrm{supp}(f)|}{|G|}\right)^{1/2}; \qquad \sum_{\xi \in \hat G} |\hat f(\xi)|^2 \log \frac{1}{|\hat f(\xi)|} \le \log |\mathrm{supp}(\hat f)|^{1/2},$$
and so the entropy uncertainty principle implies the Donoho-Stark uncertainty principle. Again, equality is attained for indicators of subgroups (up to translation, modulation, and scalar multiplication).

For arbitrary groups $G$ and arbitrary functions $f$, one cannot hope to do much better than the above uncertainty principles, due to counterexamples such as the subgroup indicators $f = 1_H$. On the other hand, for generic groups and functions, one expects to do a lot better. (For instance, for generic $f$ one has $\mathrm{supp}(f) = G$ and $\mathrm{supp}(\hat f) = \hat G$.) So one expects to obtain improved estimates by imposing additional hypotheses on $G$ or $f$.

For instance, for cyclic groups $G = \mathbb{Z}/p\mathbb{Z}$ of prime order, which have no non-trivial subgroups, we have

Uncertainty principle for $\mathbb{Z}/p\mathbb{Z}$ (T., 2005): If $f : \mathbb{Z}/p\mathbb{Z} \to \mathbb{C}$ is non-trivial, then
$$|\mathrm{supp}(f)| + |\mathrm{supp}(\hat f)| \ge p + 1.$$

This is equivalent to an old result of Chebotarev that all minors of the Fourier matrix $(e(x\xi/p))_{1 \le x, \xi \le p}$ are non-zero, which is proven by algebraic methods. The result is completely sharp: if $A, B$ are sets with $|A| + |B| \ge p + 1$, then there exists a function $f$ with $\mathrm{supp}(f) = A$ and $\mathrm{supp}(\hat f) = B$. Partial extensions to other groups (Meshulam, 2006).
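For small $p$ one can verify Chebotarev's theorem exhaustively (a numerical sketch; exact arithmetic would be needed to certify this for large $p$):

```python
import numpy as np
from itertools import combinations

p = 7
F = np.exp(2j * np.pi * np.outer(np.arange(p), np.arange(p)) / p)

# Chebotarev: every square submatrix of the p x p Fourier matrix
# has non-zero determinant when p is prime
min_det = min(
    abs(np.linalg.det(F[np.ix_(rows, cols)]))
    for k in range(1, p + 1)
    for rows in combinations(range(p), k)
    for cols in combinations(range(p), k)
)
print("smallest |minor|:", min_det)   # strictly positive
```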

This uncertainty principle has some amusing applications to arithmetic combinatorics; for instance, it implies the Cauchy-Davenport inequality (1813)
$$|A + B| \ge \min(|A| + |B| - 1, p)$$
for subsets $A, B$ of $\mathbb{Z}/p\mathbb{Z}$. (Proof: apply the uncertainty principle to functions of the form $f * g$, where $f$ is supported in $A$, $g$ is supported in $B$, and $\mathrm{supp}(\hat f)$, $\mathrm{supp}(\hat g)$ are chosen to have as small an intersection as possible.) Further applications of this type (Sun-Guo, 2008), (Guo, 2008).
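Cauchy-Davenport itself is easy to confirm exhaustively for a small prime (a brute-force sketch for $p = 7$):

```python
p = 7
# all non-empty subsets of Z/pZ, encoded as bitmasks
subsets = [{i for i in range(p) if mask >> i & 1} for mask in range(1, 2 ** p)]

for A in subsets:
    for B in subsets:
        sumset = {(a + b) % p for a in A for b in B}
        assert len(sumset) >= min(len(A) + len(B) - 1, p)   # Cauchy-Davenport
print("Cauchy-Davenport verified for all subsets of Z/7Z")
```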

The uncertainty principle for $\mathbb{Z}/p\mathbb{Z}$ has the following equivalent interpretation:

Uncertainty principle for $\mathbb{Z}/p\mathbb{Z}$: If $f : \mathbb{Z}/p\mathbb{Z} \to \mathbb{C}$ is $S$-sparse (i.e. $|\mathrm{supp}(f)| \le S$) and non-trivial, and a set of frequencies $\Omega \subset \mathbb{Z}/p\mathbb{Z}$ has cardinality at least $S$, then $\hat f$ does not vanish identically on $\Omega$.

From a signal processing perspective, this means that any $S$ Fourier coefficients of $f$ are sufficient to detect the non-triviality of an $S$-sparse signal. This is of course best possible.

It is crucial that $p$ is prime. For instance, if $N$ is a perfect square, then $\mathbb{Z}/N\mathbb{Z}$ contains a subgroup of size $\sqrt N$, and the indicator function of that subgroup (the Dirac comb) vanishes on $N - \sqrt N$ Fourier coefficients despite being only $\sqrt N$-sparse.

As a corollary of the uncertainty principle, if $f : \mathbb{Z}/p\mathbb{Z} \to \mathbb{C}$ is an unknown signal which is known to be $S$-sparse, and we measure $2S$ Fourier coefficients $(\hat f(\xi))_{\xi \in \Omega}$ of $f$, then this uniquely determines $f$; for if two $S$-sparse signals $f, g$ had the same Fourier coefficients on $\Omega$, then the $2S$-sparse difference $f - g$ would have trivial Fourier transform on $\Omega$, a contradiction.

This is a prototype of a compressed sensing result. Compressed sensing refers to the ability to reconstruct sparse (or compressed) signals using very few measurements, without knowing in advance the support of the signal. (Note that one normally needs all $p$ Fourier coefficients in order to recover a general signal; the point is that sparse signals have a much lower information entropy and thus are easier to recover than general signals.)

However, this result is unsatisfactory for several reasons:

- It is ineffective. It says that recovery of the $S$-sparse signal $f$ from $2S$ Fourier coefficients is possible (since $f$ is uniquely determined), but gives no efficient algorithm to actually locate this $f$.

- It is not robust. For instance, the result fails if $p$ is changed from a prime to a composite number. One can also show that the result is not stable with respect to small perturbations of $f$, even if one keeps $p$ prime.

It turns out that both of these problems can be solved if the frequency set $\Omega$ does more than merely detect the presence of a non-trivial sparse signal, but gives an accurate measurement of how large that signal is. This motivates:

Restricted Isometry Principle (RIP): A set of frequencies $\Omega \subset \mathbb{Z}/N\mathbb{Z}$ is said to obey the RIP with sparsity $S$ and error tolerance $\delta$ if one has
$$(1 - \delta) \frac{|\Omega|}{N} \|f\|_{L^2(\mathbb{Z}/N\mathbb{Z})}^2 \le \|\hat f\|_{\ell^2(\Omega)}^2 \le (1 + \delta) \frac{|\Omega|}{N} \|f\|_{L^2(\mathbb{Z}/N\mathbb{Z})}^2$$
for all $S$-sparse functions $f$.
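For small parameters one can compute the best tolerance $\delta$ exactly, scanning all supports of size $S$ and handling all signals on each support via singular values (a numerical sketch; the constant $c = |\Omega|/N^2$ arises because $\|f\|_{L^2(\mathbb{Z}/N\mathbb{Z})}^2 = \frac{1}{N}\sum_x |f(x)|^2$):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
N, S = 32, 2
x = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(x, x) / N) / N    # normalised Fourier matrix

Omega = rng.choice(N, size=12, replace=False)
A = F[Omega, :]                                     # the measured rows

# Best RIP constant: over every support T with |T| = S, the squared singular
# values of A restricted to T must lie in [(1 - delta) c, (1 + delta) c].
c = len(Omega) / N ** 2
delta = 0.0
for T in combinations(range(N), S):
    s = np.linalg.svd(A[:, list(T)], compute_uv=False)
    delta = max(delta, np.max(np.abs(s ** 2 / c - 1)))
print("RIP constant at sparsity", S, "is", delta)
```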

Note that the factor $|\Omega|/N$ is natural in view of Plancherel's theorem; the RIP asserts that $\Omega$ always captures its fair share of the energy of a sparse function. It implies that $\Omega$ detects the presence of non-trivial $S$-sparse functions, but is much stronger than this.

This principle is very useful in compressed sensing, e.g.

Theorem (Candès-Romberg-T., 2005): Suppose $\Omega \subset \mathbb{Z}/N\mathbb{Z}$ obeys the RIP with sparsity $4S$ and error tolerance $1/4$. Then any $S$-sparse signal $f$ is the unique solution $g : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ to the problem $\hat g|_\Omega = \hat f|_\Omega$ with minimal $L^1(G)$ norm. In particular, $f$ can be reconstructed from the Fourier measurements $\hat f|_\Omega$ by solving a convex optimisation problem.
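Here is a small end-to-end demonstration of this recovery, posing the $L^1$ minimisation as a linear program for a real-valued signal (a sketch: the theorem allows complex $g$, and the parameters $N = 64$, $S = 4$, $|\Omega| = 32$ are illustrative, not the theorem's):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)
N, S = 64, 4

# an unknown real S-sparse signal
f = np.zeros(N)
f[rng.choice(N, size=S, replace=False)] = rng.standard_normal(S)

# measure its Fourier coefficients on a random frequency set Omega
Omega = rng.choice(N, size=32, replace=False)
F_Omega = np.exp(-2j * np.pi * np.outer(Omega, np.arange(N)) / N) / N
b = F_Omega @ f

# min ||g||_1 s.t. ghat|_Omega = fhat|_Omega, as an LP in (g, t):
# minimise sum(t) subject to -t <= g <= t and the real and imaginary
# parts of the Fourier constraints.
m = len(Omega)
c = np.concatenate([np.zeros(N), np.ones(N)])
A_eq = np.block([[F_Omega.real, np.zeros((m, N))],
                 [F_Omega.imag, np.zeros((m, N))]])
b_eq = np.concatenate([b.real, b.imag])
I = np.eye(N)
A_ub = np.block([[I, -I], [-I, -I]])
b_ub = np.zeros(2 * N)
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(None, None)] * N + [(0, None)] * N, method="highs")
assert res.status == 0

g = res.x[:N]
print("max recovery error:", np.max(np.abs(g - f)))
```

In this regime the minimiser typically coincides with $f$ up to solver tolerance, matching the high-probability behaviour described below.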

Sketch of proof: If a function $g = f + h$ is distinct from $f$ but has the same Fourier coefficients as $f$ on $\Omega$, use the RIP to show that $h$ has a substantial presence outside the support of $f$ compared to its presence inside this support, and use this to show that $f + h$ must have a strictly larger $L^1(G)$ norm than $f$.

Similar arguments show that signal recovery using frequency sets that obey the RIP is robust with respect to noise or lack of perfect sparsity (e.g. if $f$ is merely $S$-compressible rather than $S$-sparse, i.e. small outside of a set of size $S$). There is now a vast literature on how to efficiently perform compressed sensing for various measurement models, many of which obey (or are assumed to obey) the RIP.

On the other hand, the RIP fails for many frequency sets. Consider for instance an $S$-sparse function $f : \mathbb{Z}/N\mathbb{Z} \to \mathbb{C}$ that is a bump function adapted to the interval $\{-S/2, \ldots, S/2\}$ in $\mathbb{Z}/N\mathbb{Z}$. Then $\hat f$ is concentrated in an interval of length about $N/S$ centred at the frequency origin. If $\Omega$ avoids this interval (or intersects it with too high or too low a density), then the RIP fails. Variants of this example show that a frequency set must be equidistributed in various senses if it is to obey the RIP.

Uniform uncertainty principle (Candès-T., 2006): A randomly chosen subset $\Omega$ of $\mathbb{Z}/N\mathbb{Z}$ of size $C S \log^6 N$ will obey the RIP with high probability ($1 - O(N^{-C})$). Informally, a randomly chosen set of size $O(S \log^6 N)$ will always capture its fair share of the energy of any $S$-sparse function; thus we have a sort of local Plancherel theorem for sparse functions that only requires a random subset of the frequencies.

This implies that robust compressed sensing is possible with an oversampling factor of $O(\log^6 N)$. This was improved to $O(\log^5 N)$ (Rudelson-Vershynin, 2008). In practice, numerics show that an oversampling factor of 4 or 5 is sufficient. A separate argument (Candès-Romberg-Tao, 2006) shows that (non-robust) compressed sensing is possible w.h.p. with an oversampling factor of just $O(\log N)$.

The method of proof is related to Bourgain's solution of the $\Lambda_p$ problem, which eventually reduced to understanding the behaviour of maximal exponential sums such as
$$\Lambda_p(\Omega) := \sup\Big\{ \Big\|\sum_{\xi \in \Omega} c_\xi e(x\xi/N)\Big\|_{L^p(\mathbb{Z}/N\mathbb{Z})} : \|c\|_{\ell^2(\Omega)} = 1 \Big\}$$
for randomly chosen sets $\Omega$. In particular it relies on a chaining argument used by Bourgain (and also, simultaneously, by Talagrand).

The chaining argument

For each individual $S$-sparse function $f$, and a random $\Omega$, it is not hard to show that the desired inequality
$$(1 - \delta) \frac{|\Omega|}{N} \|f\|_{L^2(\mathbb{Z}/N\mathbb{Z})}^2 \le \|\hat f\|_{\ell^2(\Omega)}^2 \le (1 + \delta) \frac{|\Omega|}{N} \|f\|_{L^2(\mathbb{Z}/N\mathbb{Z})}^2$$
holds with high probability; this is basically the law of large numbers (and is the reason why Monte Carlo integration works), and one can get very good estimates using the Chernoff inequality. The problem is that there are a lot of $S$-sparse functions in the world, and the total probability of error quickly adds up.
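The single-function concentration is easy to see empirically (a Monte Carlo sketch: one fixed sparse $f$, many random $\Omega$):

```python
import numpy as np

rng = np.random.default_rng(5)
N, S, m = 256, 5, 64
x = np.arange(N)

# one fixed S-sparse signal and its Fourier transform
f = np.zeros(N, dtype=complex)
f[rng.choice(N, size=S, replace=False)] = rng.standard_normal(S)
fhat = np.exp(-2j * np.pi * np.outer(x, x) / N) @ f / N
energy = np.sum(np.abs(f) ** 2) / N                  # ||f||_{L^2}^2

# the captured-energy ratio ||fhat||^2_{l^2(Omega)} / ((|Omega|/N) ||f||^2)
ratios = [
    np.sum(np.abs(fhat[rng.choice(N, size=m, replace=False)]) ** 2)
    / ((m / N) * energy)
    for _ in range(2000)
]
print("mean:", np.mean(ratios), "std:", np.std(ratios))  # mean near 1, small spread
```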

One can partially resolve this problem by discretisation: pick an $\varepsilon$, and cover the space $\Sigma$ of all $S$-sparse functions in some suitable metric (e.g. the $L^2$ metric) by an $\varepsilon$-net of functions. But it turns out that there are still too many functions in the net to control, and even after controlling these functions, the $S$-sparse functions in $\Sigma$ that are near the net, but not actually on it, are still not easy to handle.

The solution is to chain several nets together, or more precisely to chain together $2^{-n}$-nets $\mathcal{N}_n$ of $\Sigma$ for each $n = 1, 2, 3, \ldots$. Instead of controlling the functions $f_n$ in each net $\mathcal{N}_n$ separately, one instead controls the extent to which $f_n$ deviates from its parent $f_{n-1}$ in the next coarser net $\mathcal{N}_{n-1}$, defined as the nearest element of $\mathcal{N}_{n-1}$ to $f_n$. This deviation is much smaller than $f$ itself in practice, and is easier to control. After getting good control on all of these deviations, one can then control arbitrary functions $f$ by expressing $f$ as a telescoping series of differences $f_n - f_{n-1}$.
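Schematically, writing $f_n \in \mathcal{N}_n$ for the net point nearest to a given $f \in \Sigma$, the decomposition looks as follows (a sketch of the standard chaining bookkeeping, not the exact constants of the proof):

```latex
\[
f \;=\; f_1 + \sum_{n \ge 2} (f_n - f_{n-1}),
\qquad \|f - f_n\|_{L^2} \le 2^{-n},
\]
% so each link of the chain is short:
\[
\|f_n - f_{n-1}\|_{L^2} \;\le\; \|f_n - f\|_{L^2} + \|f - f_{n-1}\|_{L^2}
\;\le\; 3 \cdot 2^{-n},
\]
% and a union bound over each net level, with Chernoff-type bounds at scale
% 2^{-n}, controls the supremum over all of Sigma simultaneously.
```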