Solution Recovery via L1 Minimization: What Is Possible and Why?
Yin Zhang
Department of Computational and Applied Mathematics, Rice University, Houston, Texas, U.S.A.
Eighth US-Mexico Workshop on Optimization and its Applications, Huatulco, Mexico, January 8, 2007
(Research supported in part by NSF Grant DMS-0442065)
Outline
Don't we already know everything about LP? We introduce some recent results on using $\ell_1$-minimization (LP) to recover exact solutions to non-square systems under certain sparsity conditions.
- Example: missing data recovery
- Two recovery problems: under-determined systems (compressive sensing); over-determined systems (error correction)
- A recoverability result: a simple proof
- Algorithmic and computational issues
Application: Missing Data Recovery by LP (I)
[Figure: three panels of a 1000-sample signal: complete data, available data, recovered data.]
The signal was synthesized by a few Fourier components.
Application: Missing Data Recovery by LP (II)
[Figure: complete, available, and recovered images.]
75% of pixels were blacked out (becoming unknown).
Application: Missing Data Recovery by LP (III)
[Figure: complete, available, and recovered images.]
85% of pixels were blacked out (becoming unknown).
How are missing data recovered?
The data vector $f$ has a missing part $u$:
$$f := \begin{bmatrix} c \\ u \end{bmatrix}, \quad c \in \mathbb{R}^m, \; u \in \mathbb{R}^{n-m}.$$
Under a basis $\Phi$, $f$ has a representation $x$: $f = \Phi x$, or
$$\begin{bmatrix} B \\ A \end{bmatrix} x = \begin{bmatrix} c \\ u \end{bmatrix}.$$
Then $x$ may be recovered from the available data $c$ by solving the LP
$$\min_x \{\, \|x\|_1 : Bx = c \,\},$$
provided: (i) "sufficient" sparsity in $x$; (ii) "goodness" of $B$.
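A minimal sketch of this mechanism in Python. All specifics here are illustrative assumptions rather than the talk's setup: an orthonormal DCT basis plays the role of $\Phi$ (the slides' 1-D example used Fourier components), the dimensions and random seed are arbitrary, and the LP $\min\{\|x\|_1 : Bx = c\}$ is solved with scipy's `linprog` via the standard split $x = u - v$, $u, v \ge 0$.

```python
import numpy as np
from scipy.fft import idct
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, k, m = 64, 3, 32                       # signal length, # components, # known samples

# signal with a few frequency components: f = Phi x with x sparse
Phi = idct(np.eye(n), axis=0, norm="ortho")   # orthonormal inverse-DCT basis
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
f = Phi @ x_true

known = np.sort(rng.choice(n, m, replace=False))  # indices of available samples
B, c = Phi[known, :], f[known]                    # B = rows of Phi at known samples

# min ||x||_1  s.t.  Bx = c   (split x = u - v, u, v >= 0)
res = linprog(np.ones(2 * n), A_eq=np.hstack([B, -B]), b_eq=c,
              bounds=[(0, None)] * (2 * n), method="highs")
x_rec = res.x[:n] - res.x[n:]
f_rec = Phi @ x_rec

print(np.linalg.norm(f_rec - f))          # near zero: missing samples are filled in
```

With only half the samples available, the LP typically reproduces the full signal because $x$ is very sparse in the chosen basis.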
Under-determined: Compressive Sensing
Given $B \in \mathbb{R}^{m \times n}$, $m < n$, and an "under-sampled" vector $c$,
$$c = Bx.$$
Question: Is it possible to recover $x \in \mathbb{R}^n$ from $c \in \mathbb{R}^m$ in poly-time?
No, in general ($Bx = c$ is under-determined).
Yes, if $x$ is sufficiently sparse and $B$ is "good". (Donoho et al., Candes-Tao, ..., 2003-05)
Over-determined: Error Correction
Given $A^T \in \mathbb{R}^{n \times (n-m)}$ and a corrupted "over-sampled" vector $b$,
$$b = A^T y + h,$$
where the error $h$ is unknown.
Question: Is it possible to recover $y \in \mathbb{R}^{n-m}$ from $b \in \mathbb{R}^n$ when $h \neq 0$?
No, in general.
Yes, if $h$ is sufficiently sparse and $A$ is "good". (Candes-Tao, Rudelson-Vershynin, ..., 2005.)
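The decoding step $\min_y \|A^T y - b\|_1$ is itself an LP, which a short sketch can make concrete. The dimensions, seed, and use of scipy are assumptions for illustration; the $\ell_1$ residual is modeled with auxiliary variables $t \ge |A^T y - b|$.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, d, k = 100, 20, 8                      # n samples, d = n - m unknowns, k corruptions

At = rng.standard_normal((n, d))          # tall coding matrix A^T
y_true = rng.standard_normal(d)
h = np.zeros(n)
h[rng.choice(n, k, replace=False)] = 10 * rng.standard_normal(k)  # gross errors
b = At @ y_true + h                       # corrupted over-sampled observations

# min_y ||A^T y - b||_1 as an LP in (y, t): minimize sum(t), -t <= A^T y - b <= t
obj = np.concatenate([np.zeros(d), np.ones(n)])
A_ub = np.block([[At, -np.eye(n)], [-At, -np.eye(n)]])
b_ub = np.concatenate([b, -b])
res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * d + [(0, None)] * n, method="highs")
y_rec = res.x[:d]

print(np.linalg.norm(y_rec - y_true))     # essentially zero when h is sparse enough
```

Note that the error entries are large; what matters for exact decoding is the sparsity of $h$, not its magnitude.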
Equivalence
Let $A$ be $(n-m) \times n$ and $B$ be $m \times n$, both of full rank, and
$$\hat{x} = \arg\min \{\, \|x\|_1 : Bx = c \,\} \quad \text{(U1)}$$
$$\hat{y} = \arg\min \|A^T y - b\|_1 \quad \text{(O1)}$$
Lemma 1: (Candes-Tao 05, YZ 05) Problems (U1) and (O1) are equivalent if and only if
$$AB^T = 0, \quad c = Bb,$$
in the sense that
$$\hat{y} = (AA^T)^{-1} A (b - \hat{x}), \quad \hat{x} = b - A^T \hat{y}.$$
It is useful to be able to treat the two interchangeably.
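This correspondence can be checked numerically. The sketch below (dimensions, seed, and scipy usage are illustrative assumptions) builds $A$ as a basis of the null space of $B$ so that $AB^T = 0$, solves (U1), maps $\hat{x}$ to $\hat{y}$ via the lemma, and verifies both the identity $\hat{x} = b - A^T\hat{y}$ and that the two objective values agree.

```python
import numpy as np
from scipy.linalg import null_space
from scipy.optimize import linprog

rng = np.random.default_rng(2)
m, n, k = 30, 80, 4

B = rng.standard_normal((m, n))
A = null_space(B).T                        # (n-m) x n, so A B^T = 0 by construction

x_sparse = np.zeros(n)
x_sparse[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A.T @ rng.standard_normal(n - m) + x_sparse  # x_sparse plays the role of the error
c = B @ b                                  # c = Bb (= B x_sparse, since B A^T = 0)

# (U1): min ||x||_1 s.t. Bx = c   (split x = u - v, u, v >= 0)
res = linprog(np.ones(2 * n), A_eq=np.hstack([B, -B]), b_eq=c,
              bounds=[(0, None)] * (2 * n), method="highs")
xhat = res.x[:n] - res.x[n:]

# map to the (O1) solution via the lemma
yhat = np.linalg.solve(A @ A.T, A @ (b - xhat))

print(np.linalg.norm(b - A.T @ yhat - xhat))   # ~0: xhat = b - A^T yhat
print(abs(np.linalg.norm(xhat, 1) - np.linalg.norm(A.T @ yhat - b, 1)))  # same objective
```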
Geometry for error correction
Necessary & Sufficient Conditions
Recovery depends on the sparsity of $x$ and the "goodness" of $A$ or $B$. How sparse is sufficient? What is "good"? We start with necessary and sufficient conditions.
Proposition: (YZ 05) The following three conditions are equivalent, and each is necessary and sufficient for recovering any $k$-sparse $x$ (recall $BA^T = 0$):
(1) $\mathrm{range}(A^T)$ is strictly $k$-balanced.
(2) $\mathrm{range}(B^T)$ is strictly $k$-thick.
(3) $\mathrm{conv}\{\pm b_j : j = 1, \dots, n\} \subset \mathbb{R}^m$ is $k$-neighborly.
We will concentrate only on condition (1) on $\mathrm{range}(A^T)$.
Condition (1)
For $S \subseteq \{1, 2, \dots, n\}$ and $v \in \mathbb{R}^n$, we define $v_S \in \mathbb{R}^{|S|}$ to be the sub-vector of $v$ corresponding to $S$.
Definition: $k$-balancedness. A subspace $\mathcal{A} \subseteq \mathbb{R}^n$ is strictly $S$-balanced if
$$v \in \mathcal{A} \setminus \{0\} \;\Longrightarrow\; \|v_S\|_1 < \|v_{S^c}\|_1,$$
where $S^c$ is the complement of $S$. The subspace is called strictly $k$-balanced if it is strictly $S$-balanced for all $S$ such that $|S| \le k$.
Example
Consider the 1-D subspace in $\mathbb{R}^4$ spanned by
$$A^T = [0.5000 \;\; 0.6533 \;\; 0.5000 \;\; 0.2706]^T.$$
It is $\{i, 4\}$-balanced for $i = 1, 2, 3$. It is not $\{1, 2\}$-balanced, thus not 2-balanced. It is 1-balanced.
Consequently, for $b = A^T y + h$ ($h$ being the error),
$$y = \arg\min \|A^T y - b\|_1$$
for any $h$ with $\|h\|_0 = 1$, or $\|h\|_0 = 2$ and $h_4 \neq 0$ (whatever its size).
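For a 1-D subspace, strict $S$-balancedness reduces to checking $\|v_S\|_1 < \|v_{S^c}\|_1$ for the single spanning vector, so the example can be verified by enumeration. The helper name `balanced` is introduced here for illustration (0-based indices, so coordinate 4 of the slide is index 3).

```python
import numpy as np

# the spanning vector of the 1-D example subspace
v = np.array([0.5000, 0.6533, 0.5000, 0.2706])

def balanced(S):
    """Strict S-balancedness for span{v}: ||v_S||_1 < ||v_{S^c}||_1."""
    mask = np.zeros(len(v), dtype=bool)
    mask[list(S)] = True
    return np.abs(v[mask]).sum() < np.abs(v[~mask]).sum()

print([balanced([i]) for i in range(4)])   # all True: the subspace is 1-balanced
print([balanced([i, 3]) for i in range(3)])  # all True: {i,4}-balanced, i = 1,2,3
print(balanced([0, 1]))                    # False: not {1,2}-balanced, so not 2-balanced
```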
Sufficient Conditions
How is sparsity connected to $A$ (in fact, to $\mathrm{range}(A^T)$ or $B^T$)? Define
$$\gamma_p(A) := \frac{1}{2} \min \{\, \|v\|_1 : v \in \mathcal{R}(A^T), \; \|v\|_p = 1 \,\}.$$
Lemma:
$$k < [\gamma_2(A)]^2 \;\Longrightarrow\; \text{recoverability}, \qquad k < \gamma_\infty(A) \;\Longrightarrow\; \text{recoverability}.$$
The 1st condition is tight in order, but seems hard to compute. The 2nd condition is weaker, but poly-time computable ($n$ LPs).
Two Nonconvex Problems
The sufficient conditions require, for $p = 2, \infty$:
$$\gamma_p(A) := \frac{1}{2} \min \{\, \|v\|_1 : v \in \mathcal{R}(A^T), \; \|v\|_p = 1 \,\}.$$
For $p = \infty$:
$$2\gamma_\infty(A) = \min_x \{\, \|A^T x\|_1 : \|A^T x\|_\infty = 1 \,\} = \min_{1 \le i \le n} \min_x \{\, \|A^T x\|_1 : a_i^T x = 1, \; |a_j^T x| \le 1, \; j \neq i \,\}.$$
For $p = 2$ (assuming $AA^T = I$):
$$2\gamma_2(A) = \min_x \{\, \|A^T x\|_1 : \|x\|_2 = 1 \,\}.$$
We still don't know how hard (or easy) the above is.
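The $n$-LP decomposition for $\gamma_\infty$ can be sketched directly: each subproblem fixes which coordinate of $v = A^T x$ attains the $\infty$-norm (at value 1, by symmetry of $\pm v$) and is a standard $\ell_1$ LP. The function name `gamma_inf` and the use of scipy are illustrative assumptions; applying it to the 1-D example subspace from the earlier slide gives a value just under 1.5, consistent with that subspace being 1-balanced but not 2-balanced.

```python
import numpy as np
from scipy.optimize import linprog

def gamma_inf(A):
    """gamma_inf(A) = (1/2) min{ ||v||_1 : v in range(A^T), ||v||_inf = 1 },
    computed as n LPs, one per coordinate that may attain the max."""
    d, n = A.shape                         # A is (n-m) x n
    best = np.inf
    for i in range(n):
        # variables z = (x, t): minimize sum(t) with t >= |A^T x| elementwise,
        # (A^T x)_i = 1, and t_j <= 1 enforcing |(A^T x)_j| <= 1
        obj = np.concatenate([np.zeros(d), np.ones(n)])
        A_ub = np.block([[A.T, -np.eye(n)], [-A.T, -np.eye(n)]])
        b_ub = np.zeros(2 * n)
        A_eq = np.concatenate([A[:, i], np.zeros(n)])[None, :]
        res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(None, None)] * d + [(0, 1)] * n, method="highs")
        if res.success:                    # infeasible i's are skipped
            best = min(best, res.fun)
    return best / 2

A = np.array([[0.5000, 0.6533, 0.5000, 0.2706]])   # the 1-D example subspace
g = gamma_inf(A)
print(g)   # about 1.47: k = 1 < gamma_inf, so 1-sparse errors are recoverable
```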
Randomness Comes to the Rescue
Tight bounds for $\gamma_2(A)$ are already available for random matrices.
Lemma 2: (Kashin 77, Garnaev-Gluskin 84) With probability above $1 - e^{-c_1(n-m)}$, a random Gaussian matrix $A \in \mathbb{R}^{(n-m) \times n}$ satisfies
$$[\gamma_2(A)]^2 = \Theta\!\left( \frac{m}{\log(1 + n/m)} \right).$$
The constants involved are still unknown.
A Recoverability Result
Theorem 1: (Candes-Tao 05, ...) Let $c = Bx$. With probability $p > 1 - e^{-c_1(n-m)}$,
$$x = \arg\min \{\, \|x\|_1 : Bx = c \,\}$$
if $B$ is an $m \times n$ Gaussian random matrix and
$$k < \frac{c_2\, m}{\log(1 + n/m)} \quad \text{or} \quad m > c_3\, k \log(1 + n/k).$$
A simple proof now follows from Lemmas 1 and 2 (YZ 05).
To encode a sparse signal, the number of random measurements required grows almost linearly with the sparsity.
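The qualitative content of the theorem is easy to observe empirically. The sketch below (the constants $c_2, c_3$ are unspecified in the theorem, so the particular sparsity levels, dimensions, and seed here are assumptions chosen for illustration) runs one trial on each side of the threshold with a Gaussian $B$.

```python
import numpy as np
from scipy.optimize import linprog

def trial(m, n, k, seed):
    """One recovery attempt: k-sparse x, m x n Gaussian B, L1 decoding by LP."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((m, n))
    x = np.zeros(n)
    x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
    res = linprog(np.ones(2 * n), A_eq=np.hstack([B, -B]), b_eq=B @ x,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return np.linalg.norm(res.x[:n] - res.x[n:] - x) < 1e-6

m, n = 25, 50
print(trial(m, n, 3, 0))    # sparse enough: recovery typically succeeds
print(trial(m, n, 20, 0))   # too dense for m measurements: recovery typically fails
```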
Signs Help
Theorem 2: There exist "good" matrices $B$ such that $x$ can be recovered from (U1) for all $x \ge 0$ with sparsity level $k \le m/2$. In particular, (generalized) Vandermonde matrices (including partial Fourier matrices) are good.
- Candes-Romberg-Tao 05 (partial Fourier matrix)
- Donoho-Tanner 05 (using classical $k$-neighborliness results)
- YZ 05 (a simple proof)
Practical Significance of Compressive Sensing
A new paradigm in data acquisition?

          Sensor                          Receiver
Current:  full sample + compression       light decoding
CS:       mild sampling, no compression   heavy decoding

- Sample-size reduction: $O(n) \to O(k \log(n/k))$
- Computational load shifts from sensor to receiver
- Longer life for space telescopes, cheap sensors, ...
- Potential next-generation data-processing devices
- Potential applications in missing data recovery ...
Algorithmic and Computational Issues
Large-scale, dense LPs and SOCPs:
$$\min \{\, \|x\|_1 : Bx = c \,\}, \qquad \min \{\, \|x\|_1 : \|Bx - c\|_2 \le \epsilon \,\}.$$
- A $1024 \times 1024$ (2-D) image gives over $10^6$ variables.
- The "good" matrices are all dense.
- Fast solutions (real-time processing) are often required.
- Interior-point or simplex methods are impractical.
Research on low-storage, fast algorithms is needed (and will be reported elsewhere).
The End
Three technical reports on this subject are available from my website:
http://www.caam.rice.edu/~yzhang/reports/
Thank You All!