Sparse analysis, Lecture VII: Combining geometry and combinatorics, sparse matrices for sparse signal recovery
Anna C. Gilbert, Department of Mathematics, University of Michigan
Sparse signal recovery
[Figure: a k-sparse signal of length n is mapped to a measurement vector of length m = k log(n)]
Problem statement
Assume x has low complexity: x is k-sparse (with noise).
Construct a matrix $A: \mathbb{R}^n \to \mathbb{R}^m$, with m as small as possible, such that, given Ax for any signal $x \in \mathbb{R}^n$, we can quickly recover $\hat{x}$ with
$$\|x - \hat{x}\|_p \le C \min_{k\text{-sparse } y} \|x - y\|_q.$$
Parameters
- Number of measurements m
- Recovery time
- Approximation guarantee (norms, mixed)
- One matrix vs. distribution over matrices
- Explicit construction
- Universal matrix (for any basis, after measuring)
- Tolerance to measurement noise
Applications
- Data stream algorithms: $x_i$ = number of items with index i; can maintain Ax under increments to x and recover an approximation to x
- Efficient data sensing: digital/analog cameras, analog-to-digital converters
- Error-correcting codes: code $\{y \in \mathbb{R}^n : Ay = 0\}$; x = error vector, Ax = syndrome
Two approaches
Geometric: [Donoho 04], [Candes-Tao 04, 06], [Candes-Romberg-Tao 05], [Rudelson-Vershynin 06], [Cohen-Dahmen-DeVore 06], and many others...
- Dense recovery matrices (e.g., Gaussian, Fourier)
- Geometric recovery methods ($\ell_1$ minimization, LP): $\hat{x} = \operatorname{argmin}_z \|z\|_1$ s.t. $\Phi z = \Phi x$
- Uniform guarantee: one matrix A that works for all x
Combinatorial: [Gilbert-Guha-Indyk-Kotidis-Muthukrishnan-Strauss 02], [Charikar-Chen-Farach-Colton 02], [Cormode-Muthukrishnan 04], [Gilbert-Strauss-Tropp-Vershynin 06, 07]
- Sparse random matrices (typically)
- Combinatorial recovery methods or weak, greedy algorithms
- Per-instance guarantees, later uniform guarantees
Prior work: summary

Paper | A/E | Sketch length | Encode time | Column sparsity / update time | Decode time | Approx. error | Noise
[CCFC02, CM06] | E | k log^c n | n log^c n | log^c n | k log^c n | l2 ≤ C l2 |
 | E | k log n | n log n | log n | n log n | l2 ≤ C l2 |
[CM04] | E | k log^c n | n log^c n | log^c n | k log^c n | l1 ≤ C l1 |
 | E | k log n | n log n | log n | n log n | l1 ≤ C l1 |
[CRT06] | A | k log(n/k) | nk log(n/k) | k log(n/k) | LP | l2 ≤ (C/√k) l1 | Y
 | A | k log^c n | n log n | k log^c n | LP | l2 ≤ (C/√k) l1 | Y
[GSTV06] | A | k log^c n | n log^c n | log^c n | k log^c n | l1 ≤ C log n · l1 | Y
[GSTV07] | A | k log^c n | n log^c n | log^c n | k^2 log^c n | l2 ≤ (C/√k) l1 |
[NV07] | A | k log(n/k) | nk log(n/k) | k log(n/k) | nk^2 log^c n | l2 ≤ (C √(log n)/√k) l1 | Y
 | A | k log^c n | n log n | k log^c n | nk^2 log^c n | l2 ≤ (C √(log n)/√k) l1 | Y
[GLR08] (k "large") | A | k (log n)^{c log log log n} | kn^{1-a} | n^{1-a} | LP | l2 ≤ (C/√k) l1 |
This paper | A | k log(n/k) | n log(n/k) | log(n/k) | LP | l1 ≤ C l1 | Y

Paper | A/E | Sketch length | Encode time | Update time | Decode time | Approx. error | Noise
[DM08] | A | k log(n/k) | nk log(n/k) | k log(n/k) | nk log(n/k) · log D | l2 ≤ (C/√k) l1 | Y
[NT08] | A | k log(n/k) | nk log(n/k) | k log(n/k) | nk log(n/k) · log D | l2 ≤ (C/√k) l1 | Y
 | A | k log^c n | n log n | k log^c n | n log n · log D | l2 ≤ (C/√k) l1 | Y
[IR08] | A | k log(n/k) | n log(n/k) | log(n/k) | n log(n/k) | l1 ≤ C l1 | Y

Important contributions, flurry of activity.
Unify these techniques
Achieve the best of both worlds:
- LP decoding using sparse matrices
- combinatorial decoding (with augmented matrices)
- deterministic (explicit) constructions
What do the combinatorial and geometric approaches share? What makes them work?
Sparse matrices: expander graphs
Adjacency matrix A of a d-regular (k, ε) expander graph:
- bipartite graph G = (X, Y, E), |X| = n, |Y| = m
- for any S ⊆ X with |S| ≤ k, the neighbor set satisfies $|N(S)| \ge (1 - \epsilon) d |S|$
- Probabilistic construction: $d = O(\log(n/k)/\epsilon)$, $m = O(k \log(n/k)/\epsilon^2)$
- Deterministic construction: $d = 2^{O(\log^3(\log(n)/\epsilon))}$, $m = (k/\epsilon) \cdot 2^{O(\log^3(\log(n)/\epsilon))}$
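A minimal sketch (our Python illustration, not the paper's code) of the probabilistic construction: place d ones uniformly at random in each column, i.e., connect each left node to d random right nodes. The sizes below are assumptions for demonstration.

```python
import numpy as np

def random_sparse_matrix(n, m, d, seed=0):
    """Adjacency matrix of a random left-d-regular bipartite graph.

    With d = O(log(n/k)/eps) and m = O(k log(n/k)/eps^2), such a graph
    is a (k, eps)-expander with high probability.
    """
    rng = np.random.default_rng(seed)
    A = np.zeros((m, n), dtype=np.int8)
    for col in range(n):
        A[rng.choice(m, size=d, replace=False), col] = 1  # d distinct neighbors
    return A

A = random_sparse_matrix(n=256, m=64, d=8)
assert (A.sum(axis=0) == 8).all()  # every column has exactly d ones
```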
Bipartite graph and its adjacency matrix (each of the n = 8 left nodes has d = 3 neighbors among the m = 5 right nodes):
$$A = \begin{pmatrix} 1&0&1&1&0&1&1&0 \\ 0&1&0&1&1&1&0&1 \\ 0&1&1&0&0&1&1&1 \\ 1&0&0&1&1&0&0&1 \\ 1&1&1&0&1&0&1&0 \end{pmatrix}$$
[Figure: sparsity patterns of this measurement matrix and a larger example]
RIP(p)
A measurement matrix A satisfies the RIP(p, k, δ) property if, for any k-sparse vector x,
$$(1 - \delta)\|x\|_p \le \|Ax\|_p \le (1 + \delta)\|x\|_p.$$
RIP(p) and expanders
Theorem. (k, ε) expansion implies
$$(1 - 2\epsilon) d \|x\|_1 \le \|Ax\|_1 \le d \|x\|_1$$
for any k-sparse x. Get RIP(p) for $1 \le p \le 1 + 1/\log n$.
Theorem. RIP(1) + binary sparse matrix implies a (k, ε) expander with $\epsilon = \frac{1 - 1/(1+\delta)^2}{2}$.
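A quick empirical check of the first theorem (a sketch reusing random_sparse_matrix from above; the sizes and the number of trials are illustrative assumptions): for k-sparse x, the ratio $\|Ax\|_1 / (d\|x\|_1)$ should stay in $[1 - 2\epsilon,\ 1]$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d, k = 256, 64, 8, 4
A = random_sparse_matrix(n, m, d)
ratios = []
for _ in range(1000):
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    ratios.append(np.abs(A @ x).sum() / (d * np.abs(x).sum()))
# max <= 1 always (upper bound); min >= 1 - 2*eps if A is a (k, eps)-expander
print(min(ratios), max(ratios))
```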
Expansion ⇒ LP decoding
Theorem. Let Φ be the adjacency matrix of a (2k, ε) expander. Consider two vectors x, x′ such that Φx = Φx′ and $\|x'\|_1 \le \|x\|_1$. Then
$$\|x - x'\|_1 \le \frac{2}{1 - 2\alpha(\epsilon)} \|x - x_k\|_1$$
where $x_k$ is the optimal k-term representation for x and $\alpha(\epsilon) = 2\epsilon/(1 - 2\epsilon)$.
- Guarantees that the linear program recovers a good sparse approximation
- Robust to noisy measurements too
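The linear program itself is plain $\ell_1$ minimization under the sketch constraint. A minimal sketch with scipy (generic $\ell_1$ minimization via the standard $t \ge |z|$ split; a dense, small-scale illustration, not the paper's solver):

```python
import numpy as np
from scipy.optimize import linprog

def lp_decode(Phi, y):
    """Solve min ||z||_1 subject to Phi z = y."""
    m, n = Phi.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])   # minimize sum of t
    A_ub = np.block([[np.eye(n), -np.eye(n)],       # z - t <= 0
                     [-np.eye(n), -np.eye(n)]])     # -z - t <= 0
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([Phi, np.zeros((m, n))])       # Phi z = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * n + [(0, None)] * n)
    return res.x[:n]
```

Usage: with Φ the adjacency matrix of a (2k, ε)-expander and x k-sparse, lp_decode(Phi, Phi @ x) should return x up to solver tolerance (rescaling Φ does not change the argmin).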
RIP(1) vs. RIP(2)
- Any binary sparse matrix which satisfies RIP(2) must have $\Omega(k^2)$ rows [Chandar 07]
- A Gaussian random matrix G with m = O(k log(n/k)) rows (scaled) satisfies RIP(2) but not RIP(1):
$$x^T = (0 \ \dots\ 0\ 1\ 0\ \dots\ 0), \qquad y^T = (1/k\ \dots\ 1/k\ 0\ \dots\ 0)$$
$\|x\|_1 = \|y\|_1$ but $\|Gx\|_1 \approx \sqrt{k}\,\|Gy\|_1$
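The spike/flat counterexample is easy to see numerically; a small sketch (sizes are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, k = 1000, 200, 100
G = rng.standard_normal((m, n)) / np.sqrt(m)   # RIP(2)-style scaling

x = np.zeros(n); x[0] = 1.0                    # spike: ||x||_1 = 1
y = np.zeros(n); y[:k] = 1.0 / k               # flat:  ||y||_1 = 1

print(np.abs(G @ x).sum() / np.abs(G @ y).sum())  # ~ sqrt(k) = 10
```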
Expansion ⇒ RIP(1)
Theorem. (k, ε) expansion implies $(1 - 2\epsilon) d \|x\|_1 \le \|Ax\|_1 \le d \|x\|_1$ for any k-sparse x.
Proof (sketch). Take any k-sparse x and let S be the support of x.
- Upper bound: $\|Ax\|_1 \le d \|x\|_1$ holds for any x, since each coordinate appears in d measurements
- Lower bound: expansion forces most right neighbors to be unique; if all neighbors were unique, we would have $\|Ax\|_1 = d \|x\|_1$; the argument can be made robust
Generalization to RIP(p) is similar, but the upper bound is no longer trivial.
RIP(1) ⇒ LP decoding
$\ell_1$ uncertainty principle
Lemma. Let y satisfy Ay = 0 and let S be the set of the k largest coordinates of y. Then
$$\|y_S\|_1 \le \alpha(\epsilon) \|y\|_1.$$
LP guarantee
Theorem. Consider any two vectors u, v such that for y = u − v we have Ay = 0 and $\|v\|_1 \le \|u\|_1$. Let S be the set of the k largest entries of u. Then
$$\|y\|_1 \le \frac{2}{1 - 2\alpha(\epsilon)} \|u_{S^c}\|_1.$$
$\ell_1$ uncertainty principle
Proof (sketch). Let $S_0 = S, S_1, \dots$ be coordinate sets of size k, taken in decreasing order of magnitude, and let A′ be A restricted to the rows in N(S).
On the one hand, since $y_S$ is k-sparse,
$$\|A' y_S\|_1 = \|A y_S\|_1 \ge (1 - 2\epsilon) d \|y_S\|_1.$$
On the other hand, since Ay = 0,
$$0 = \|A' y\|_1 \ge \|A' y_S\|_1 - \sum_{l \ge 1} \sum_{(i,j) \in E[S_l : N(S)]} |y_i|
\ge (1 - 2\epsilon) d \|y_S\|_1 - \sum_{l \ge 1} |E[S_l : N(S)]| \cdot \tfrac{1}{k} \|y_{S_{l-1}}\|_1$$
$$\ge (1 - 2\epsilon) d \|y_S\|_1 - 2\epsilon d k \sum_{l \ge 1} \tfrac{1}{k} \|y_{S_{l-1}}\|_1
\ge (1 - 2\epsilon) d \|y_S\|_1 - 2\epsilon d \|y\|_1,$$
so $\|y_S\|_1 \le \frac{2\epsilon}{1 - 2\epsilon} \|y\|_1 = \alpha(\epsilon) \|y\|_1$.
Sparse matrices and combinatorial decoding
Combinatorial decoding: bit test. Row b of the bit-test matrix B holds bit b of each column index, so for a 1-sparse signal the product Bs spells out the location in binary (top row = MSB, bottom row = LSB):
$$B s = \begin{pmatrix} 0&0&0&0&1&1&1&1 \\ 0&0&1&1&0&0&1&1 \\ 0&1&0&1&0&1&0&1 \end{pmatrix} \begin{pmatrix} 0\\0\\1\\0\\0\\0\\0\\0 \end{pmatrix} = \begin{pmatrix} 0\\1\\0 \end{pmatrix}$$
bit-test matrix × signal = location in binary
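A minimal sketch of the bit tester (our helper, matching the slide's 3 × 8 example):

```python
import numpy as np

def bit_test_matrix(n):
    """Row b holds bit b (MSB first) of each column index 0..n-1."""
    bits = int(np.ceil(np.log2(n)))
    return np.array([[(j >> (bits - 1 - b)) & 1 for j in range(n)]
                     for b in range(bits)])

B = bit_test_matrix(8)
s = np.zeros(8); s[2] = 1
print(B @ s)   # [0. 1. 0.] -> index 2 in binary
```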
Augmented expander: isolation matrix with bit testing
Ψ is the adjacency matrix of a (k, 1/8)-expander with degree d and m rows (the isolation matrix). The augmented matrix A = Ψ ⊗ᵣ B₁ (row tensor product with the bit-testing matrix) has m log n rows.
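The row tensor product can be sketched in a few lines (our helper names: each expander row is multiplied elementwise with every bit-test row, splitting each bucket measurement into bit sub-measurements). We assume B₁ is the bit tester with an all-ones row on top, so each bucket also reports the sum of its entries, not just the location:

```python
import numpy as np

def row_tensor(Psi, B1):
    """Row tensor product: row (j, t) of the result is Psi[j] * B1[t]."""
    m, n = Psi.shape
    b, n2 = B1.shape
    assert n == n2
    return (Psi[:, None, :] * B1[None, :, :]).reshape(m * b, n)

# assumed form of B1: bucket-sum row stacked on the bit tester
# B1 = np.vstack([np.ones(n, dtype=int), bit_test_matrix(n)])
```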
Combinatorial decoding

Recover(A, y) {
  if y = 0 then return 0
  z = Reduce(A, y)
  w = Recover(A, y - Az)
  return z + w
}

Reduce(A, y) {
  for j = 1 ... m
    compute (i, val) such that if supp(φ_j) ∩ supp(x) = {i} then x_i = val
    if val ≠ 0 then votes[i] = votes[i] ∪ {val}
  w = 0
  for i = 1 ... n such that votes[i] ≠ ∅
    if votes[i] contains ≥ d/2 copies of val then w_i = val
  return w
}
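A runnable sketch of this decoder for exactly k-sparse, integer-valued signals (our simplification of the pseudocode above; it reuses random_sparse_matrix, bit_test_matrix, and row_tensor from the earlier sketches, and the bit-decoding rule inside decode_block is our assumption, not the paper's exact procedure):

```python
import numpy as np
from collections import Counter

def decode_block(block):
    """Guess (index, value) from one bucket's sub-measurements
    [sum, bit_0, ..., bit_{b-1}], assuming a single nonzero landed here."""
    total, bit_sums = block[0], block[1:]
    if total == 0 or not np.all((bit_sums == 0) | (bit_sums == total)):
        return None   # empty bucket, or an obviously colliding one
    bits = (bit_sums == total).astype(int)           # MSB first
    return int("".join(map(str, bits)), 2), total

def reduce_step(y_blocks, n, d):
    votes = Counter()
    for block in y_blocks:
        guess = decode_block(block)
        if guess is not None:
            votes[guess] += 1
    w = np.zeros(n)
    for (i, val), count in votes.items():
        if i < n and count > d // 2:                 # majority of the d buckets
            w[i] = val
    return w

def recover(Psi, B1, y, d, max_iter=30):
    m, n = Psi.shape
    b = B1.shape[0]
    A = row_tensor(Psi, B1)
    x_hat = np.zeros(n)
    for _ in range(max_iter):
        if not np.any(y):
            break
        w = reduce_step(y.reshape(m, b), n, d)
        x_hat += w
        y = y - A @ w
    return x_hat
```

Usage: with Psi = random_sparse_matrix(n, m, d), B1 = np.vstack([np.ones(n, dtype=int), bit_test_matrix(n)]), and a k-sparse integer x, recover(Psi, B1, row_tensor(Psi, B1) @ x, d) should return x for suitable parameters; the exact equality tests make this sketch appropriate for integer signals only.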
Combinatorial decoding: voting
[Figure: bit tests produce good votes and bad votes]
- Retain {index, val} only if the index has > d/2 votes for val
- d/2 + d/2 + d/2 = 3d/2 votes spread over three incorrect indices would violate expansion, so each set of d/2 incorrect votes gives at most 2 incorrect indices
- Hence the number of incorrect indices decreases by a factor of 2 in each iteration
Augmented expander ⇒ combinatorial decoding
Theorem. Let Ψ be a (k, 1/8)-expander and A = Ψ ⊗ᵣ B₁ with m log n rows. Then, for any k-sparse x, given Ax, we can recover x in time O(m log² n).
With an additional hash matrix and polylog(n) more rows in the structured matrix, we can approximately recover all x in time $O(k^2 \log^{O(1)} n)$ with the same error guarantees as LP decoding.
The expander is the central element in [Indyk 08], [Gilbert-Strauss-Tropp-Vershynin 06, 07].
Empirical results
[Figure: probability of exact recovery as a function of δ and ρ, for signed signals (left) and positive signals (right)]
- Performance comparable to dense LP decoding
- Image reconstruction (TV/LP wavelets), running times, and error bounds available in [Berinde, Indyk 08]
Summary: structural results
Geometric ↔ RIP(2) ↔ linear programming
Combinatorial ↔ RIP(1) ↔ weak greedy
More specifically:
- sparse binary RIP(1) matrix ⇔ expander
- explicit constructions with $m = k \cdot 2^{(\log \log n)^{O(1)}}$
- plus a 2nd hasher (for noise only) and a bit tester
- LP decoding (fast update time, sparse)
- combinatorial decoding (fast update time, fast recovery time, sparse)