Info-Greedy Sequential Adaptive Compressed Sensing

Size: px

Start display at page:

Download "Info-Greedy Sequential Adaptive Compressed Sensing"

Angelina Hodge
5 years ago
Views:

1 Info-Greedy Sequential Adaptive Compressed Sensing Yao Xie Joint work with Gabor Braun and Sebastian Pokutta Georgia Institute of Technology Presented at Allerton Conference 2014

2 Information sensing for big data I How to extract useful information from big data for statistical inference? I many measurements are made sequentially I example: real-time estimation of state of a large network, detect changes/anomaly quickly I we need to make minimum number of measurements

3 Sequential adaptive compressed sensing linear measurement: total resource constraint: y i = a i x + w i, i = 1,..., m. β i = a i 2 2 : power allocated m a i 2 2 P i=1 adaptiveness: measurements made sequentially, next measurement can be designed based on previous results compressive measurement how to acquire as much new information as possible?

4 Info-Greedy Sensing use conditional mutual information as metric incorporate distributional knowledge about x Info-Greedy optimal problem max ai I[x; a i x + w y j, a j, j < i] non-concave in general [PalomarVerdu2006]

5 Related work mutual information one-shot compressed sensing: [CarsonChenCalderbankCarin2012] maximize A I(x; Ax + w) subject to 1 m tr(aa ) 1 direct adaptive sensing: distilled sensing [HauptCastroNowak2011], multistage support estimate [WeiHero2013] adaptive compressive sensing for k-sparse signal: compressive binary search [DavenportCastro2012], CASS [MalloyNowak2012]

6 What s new in Info-Greedy sensing systematical look at sequential adaptive compressive sensing from information theory perspective new insights (Info-Greedy optimality, when adaptiveness help) into existing algorithms: bisection, CASS, Gaussian more general signal and noise model new algorithms: sparse measurement

7 Information theoretical bounds to obtain certain small recovery error x ˆx 2 2 ɛ sequentially acquire measurements to reconstruct a signal equivalent to learning the ɛ-ball that x is constrained in ˆxk ak ak+1 kx ˆxk 2 < " following numb

8 let F be a family of signals of interest F F a random variable with uniform distribution Lemma: general lower bound on the number of measurements Suppose that for some constant C > 0, H[y i a i, a j, y j, j i, M i] C for every round i. Then E[M] Moreover, for all t we have log F C. P[M < t] (Ct)/H(F ) and P[M = O(H(F ))] = 1 o(1).

9 k-sparse signal: windfarm example I n wind turbines on a windfarm I a sensor equipped on each turbine and senses xi {0, 1}, i = 1,..., n I I I a central hub measures total sum of a subset of sensor readings at each time suppose there are k turbines broken down: k n how do hub make measurements to quickly localize defective turbines

10 Bisection algorithm a1 a2 a3 (S0, 2) y1=2 (S1, 2), (S2, 3) y2=0, y3=2 (S4, 2), (S5, 1), (S6, 2)

11 k-sparse signal... Use results from information theoretic lower bounds: uniform amplitude signal Lemma: upper bound for bisection algorithm Let x R n, [x] i {0, 1} be a k-sparse signal. Then in the absence of noise, the bisection algorithm recovers x exactly with at most k log n measurements. Lemma: Info-Greedy optimality For k = 1 in the absence of noise the bisection algorithm is Info-Greedy optimal with respect to random support. When k > 1, the bisection algorithm is Info-Greedy optimal up to a factor of log k. can also be extended to non-uniform non-negative amplitude signals: CASS-type algorithms is Info-Greedy optimal

12 Low-rank Gaussian signals assume x N (0, Σ x ) I[x; y 1 ] = 1 ( ) a 2 ln 1 Σ x a 1 σ a 1 = arg max I[x; y 1 ] = leading eigenvector of Σ x a conditioned on y 1, random vector [x, y 2 ] is again Gaussian with updated mean and covariance matrix

13 Info-Greedy Sensing for low-rank Gaussian Recursive form Given: Σ x, σ 2, p: probability of error less than ε Repeat Until Σ x ε 2 /χ 2 n(p) β (χ 2 n(p)/ε 2 1/ Σ x )σ 2 a β leading eigenvector of Σ x µ µ + Σ x a(a Σ x a + σ 2 ) 1 (y a µ) Σ x Σ x Σ x a(a Σ x a + σ 2 ) 1 a Σ x

Uncertainty reduction After measuring in the direction of a unit norm eigenvector with eigenvalue λ, using power β, conditional covariance matrix Σ x Σ x βa ( βa Σ x βa + σ 2 ) 1 βa

14 Uncertainty reduction After measuring in the direction of a unit norm eigenvector with eigenvalue λ, using power β, conditional covariance matrix Σ x Σ x βa ( βa Σ x βa + σ 2 ) 1 βa Σ x = λσ2 βλ + σ 2 a a + Σ a x, Σ a x : component of Σ x in the orthogonal complement of a after measurement: eigenvalue of a: λ λσ 2 /(βλ + σ 2 ) uncertainty in direction a is reduced

15 Insight λ λσ 2 /(βλ + σ 2 ) Making repeated measurements is equivalent to allocating more power. after one measurement with β = 1: λ λσ2 λ + σ 2 = σ σ 2 /λ Suppose after k measurements: λσ 2 kλ+σ 2 with k + 1 measurement in that direction and β = 1 σ σ 2 kλ+σ 2 λσ 2 = λ(σ 2 ) 2 λσ 2 + kλσ 2 + (σ 2 ) 2 = λσ 2 (k + 1)λ + σ 2

16 Info-Greedy Sensing for low-rank Gaussian... based on above insights one-shot form: power allocation choose a 1, a 2,... to be eigenvectors of Σ x in a decreasing order of eigenvalues amplitude β 1, β 2,... to be (χ 2 n(p)/ε 2 1/λ i )σ 2 one-shot form: repeated measurements choose a 1, a 2,... to be eigenvectors of Σ x in a decreasing order of eigenvalues β i = 1 ith measurement repeats (χ 2 n(p)/ε 2 1/λ i )σ 2 times

17 Sequential and One-shot solutions are same for Gaussian? Sequential: max ai I[x; a i x + w y j, a j, j < i] One-shot: max A I[x; Ax] (solution: dominated eigenvector [CarsonChenCalderbankCarin2012]) Still, sequential may be beneficial Sparse Σ x R n n with v non-zero entries sparse power s method, O(t(n + v)) when knowledge of Σ x is imprecise, sequential measurement allows updating Σ x from data

18 Theoretical performance guarantee Low bound on m: white noise Info-Greedy Sensing computes a reconstruction ˆx with x ˆx 2 2 < ε with probability at least p with at most the following number of measurements with β i = 1, k i=1 λ i 0 { ( χ 2 max 0, n (p) ε 2 1 ) } σ 2. λ i Or with k measurements, and total power at most k i=1 λ i 0 { ( χ 2 max 0, n (p) ε 2 1 ) } σ 2. λ i

19 Error for White Noise 10 1 Info Greedy Random ε = Ordered Optimal m, White Noise

20 Recovery of power consumption vector Recovery power consumption data of 58 counties in California Gaussian is reasonable fit: data year 2006 to year 2011 Even knowledge learned from 5 years of limited data can make a difference! Probability Normal Probability Plot residual x x * /maxx i Info Greedy Random M

21 Low-rank Gaussian mixture model signals p(x) = C π c N (µ c, Σ c ) c=1 no analytic expression for conditional mutual information two approaches: gradient descent based approach, greedy approach adaptiveness makes a difference greedy gradient M = 20, σ = M = 20, σ = 0.01 greedy gradient I(X; Y 1,..., Y i ) I(X; Y i Y 1,..., Y i 1 ) number of measurements: i number of measurements: i

22 Handwritten digit recognition True = 2 Greedy = 6 Grad = 2 Figure : Comparison of true and recovered handwritten digit 2 by the greedy heuristic and the gradient descent algorithm, respectively. Method Random Greedy Gradient prob. false classification

23 Sparse measurement vector add additional constraint a 0 k 0 in the formulation can be solved using outer approximation mixed-integer linear program maximize a,r,z subject to z n i=1 r i k 0 a i r i, a i r i 0 z c, r i {0, 1}, i = 1,..., n a R n, z R, 10 1 Random Sparse measurement dimension Info Greedy non sparse Info Greedy Sparse

24 Summary Info-Greedy sequential adaptive compressed sensing max a i I[x; a i x + w y j, a j, j < i], i = 1, 2,... Info-Greedy optimality: bisection and CASS-type algorithm Gaussian signals and insights sparse measurements Future: towards online compressed sensing interplay of learning signal distribution and designing sequential measurements

Introduction to Compressed Sensing

Introduction to Compressed Sensing Alejandro Parada, Gonzalo Arce University of Delaware August 25, 2016 Motivation: Classical Sampling 1 Motivation: Classical Sampling Issues Some applications Radar Spectral