Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery

Compressibility of Infinite Sequences and its Interplay with Compressed Sensing Recovery Jorge F. Silva and Eduardo Pavez Department of Electrical Engineering Information and Decision Systems Group Universidad de Chile E-mails: josilva@ing.uchile.cl, epavez@ing.uchile.cl Abstract This work elaborates connections between notions of compressibility of infinite sequences, recently addressed by Amini et al. [1], and the performance of the compressed sensing CS) type of recovery algorithms from linear measurements in the under-sample scenario. In particular, in the asymptotic regime when the signal dimension goes to infinity, we established a new set of compressibility definitions over infinite sequences that guarantees arbitrary good performance in an l 1-noise to signal ratio l 1-NSR) sense with an arbitrary close to zero number of measurements per signal dimension. I. INTRODUCTION In the case of finite dimensions, compressed sensing CS) has established sufficient conditions over a set of lineal measurements [2], [3], for perfect recovery and near-optimal reconstruction under a notion of signal sparsity and compressibility [4], [5], [6], [7]. Remarkably, these results cover the under-sampled regime where the number of measurements is strictly less than the signal dimension, and consequently perfect recovery over all possible signals cannot be achieved. In a nutshell, CS captures signal structure by means of a notion of sparsity and compressibility), where restricted to these signal families, a way of doing perfect signal decoding and near optimal best k-term approximation) is achieved by means of an l 1 -minimization algorithm, which is implementable by linear programing [4]. The other critical ingredient in the CS theory is the role of random measurements that allow with very large probability) perfect recovery under an Ok logn/k)) number of measurements [3], with k being the sparsity of the signal and N the signal dimension. Moving on the countable infinity case i.e., the space of sequences), Amini et al. [1] have introduced definitions of l p -compressibility for deterministic and random infinity sequences. Remarkably, these definitions allow the signals to have an infinite l p -norm, as they are based on a relative norm concept the ratio between the norm of the best k-term approximation and the norm of the whole signal), reminiscent of an l p -signal to noise ratio l p -SNR) fidelity measure. This relative fidelity measure allows capturing a wider spectrum of signal collections, which was particularly relevant when the authors addressed the random case generated from an independent and identically distributed i.i.d.) process [1, Sec III, Th. 1]. In summary, based on their notion of l p -compressibility for deterministic and random sequences, they offered the basis for categorizing sequences in a way that could be of interest for CS performance recovery. In this work we move on in this direction, and provide a connection between l 1 -compressibility notions for deterministic sequences and their respective performance in a CS lineal measurement coding and l 1 -minimization decoding) setting. In particular, we revisit and propose new compressibility definitions that provide a concrete categorization of sequences over which the CS signal representation setting achieves an arbitrary close to zero l 1 -SNR distortion under arbitrary close to zero number of measurements per signal dimension. We show that these performances are obtained under different scenarios with compressibility notions that are stronger than original definitions of Amini et al. [1]. In summary, these results provide concrete connections between signal structure and guarantee performance for an infinitely countable CS setting, with focus on the zero measurement per signal dimension asymptotic regime. The rest of the paper is organized as follows: Section II introduces the basic elements of the CS theory needed in the work. Section III is the main content section that presents the new definitions and CS recovery results. Section IV explores the important case of l 1 -sequences, and finally Section V offers some directions for future work. II. COMPRESSED SENSING Let x R N be a finite dimensional vector, where the usual l p -norm is given by N ) 1 p x lp = x i p, p > 1. 1) n=1 We define the support of a x by supportx) {i : x i > 0} N. The signal x is said to be k-sparse if supportx) k, and the collection of k-sparse signals is denoted by Σ k. For an arbitrary signal x R N, x k denotes its best k-term approximation, i.e., x k retains the k-largest entries of x and zeroes out the rest of the coefficients 1. In this context the k-term approximation error of x is denoted by σ k x) p inf x Σk x x lp. CS puts attention on the problem of recovering sparse signals through linear measurements. The CS setting can be 1 Alternatively, x k arg min x Σk x x l1 for all k {1,..N}.

seen as a coding-decoding framework over the operational constraint of a finite set of linear measurements [8], [3], [4]. The encoding function is a linear operator φ : R N R m, that generates a measurement vector by y = φx R m. The main focus is on the under-sample regime, i.e., m < N, where the problem of recovering x from y is ill-posed. However, the main CS conjecture was that on constraining the signal domain to a collection of sparse signals, perfect reconstruction could be achieved when m < N. Furthermore, CS proposes a concrete and implementable by linear programing) decoding function : R m R N given by y) = arg min { x R N :y=φ x} x l1. 2) Remarkably, CS theory establishes sufficient conditions over φ and implicitly over the number of measurements m) in order that x = φx) when x Σ k for some k < N. Here, we revisit the results derived from the celebrated restricted isometry property RIP) [2], [4], which is the one we use in this work. Definition 1: [2, Def. 1.1] Given a matrix φ and k {1,..N}, the isometry constant δ k of φ is the smaller nonnegative number such that x Σ k 1 δ k ) x 2 l 2 φx 2 l 2 1 + δ k ) x 2 l 2. 3) The isometry definition in 3) is the key to obtaining the following recovery result: THEOREM 1: [2, Th. 1.2] If the measurement matrix φ has an isometry constant δ 2k < 2 1 for some k > 0, then x x l1 C 0 x x k l1 = C 0 σ k x) 1 4) where x is the solution of 2) and C 0 is a universal constant that is only a function of δ k. In this case, we say that φ satisfies the RIP property for k. Hence for a given level of approximation k, it is critical to characterize the minimum number of measurements m and the structure of the matrix φ that satisfy the RIP property of Theorem 1. The following result, in its original form stated in [5], shows that random measurements offer a solution to the problem of constructing matrixes that satisfy the RIP property with a near optimal relationship between m and k [8]. THEOREM 2: [3, Th. 5.2] Let φw) be a random matrix 2, w Ω mn, whose entries are driven by i.i.d realizations of a Gaussian distribution N 0, 1/m) or a binary variable with uniform distribution over {1/ m, 1/ m}. Then for any arbitrary number δ 0, 1) and k N, if m C 1 k log N k, φw) satisfies the condition in 3) with respect to δ and k with a probability over the space Ω mn at least equal to 1 2 C2 m, where C 1 and C 2 are universal constants that are only function of δ. In summary, Theorems 1 and 2 provide sufficient conditions over the number of measurements m of concrete random matrices the Gaussian and Bernoulli random matrices) to 2 This result can be generalized to random matrices satisfying a concentration inequality, which is not reported here for space considerations. See details on this observation in [3]. achieve perfect reconstruction for k-sparse signals, or near optimal k-term approximation errors for any arbitrary signal in R N. This can be obtained with high probability over the sample space of random matrixes, i.e., over Ω mn. III. COMPRESSIBILITY AND CS PERFORMANCE GUARANTEE IN THE ASYMPTOTIC REGIME Here we are interested in the role of CS coding-decoding for sequences x n ) R N and in the kind of compressibility notion in this domain that provides guaranteed performance in this context. We follow the approach proposed by Amini et al. [1], where the analysis is based on truncating the sequence to a finite dimension N and then taking the it as N. We begin revisiting the definitions introduced in [1] and include some new extensions. Let x = x n ) be a sequence in R N. We denote x N x i ) N i=1 the N-truncated version of x and x ni) N i=1 its N-order statistics, where x n1 x n2... x nn. Let ς p k, x N ) x n1 p +... + x nk p ) 1 p, k {1,.., N} 5) be the l p -norm concentrated in the k-most significant entries of x N, and let us define { } κ p r, x N ) min k {1,.., N} : ς pk, x N ) x N r, 6) lp for all r 0, 1]. Definition 2: Amini et. al[1, Def. 4]) A sequence x n ) R N is said to be l p -compressible, if r 0, 1), κ p r, x N ) = 0. 7) N Definition 3: x n ) is said to be strongly l p -compressible if r 0, 1), κ p r, x N ) log N N = 0. 8) Definition 4: x n ) is said to be τ-power dominated for l p for some τ 0, 1), if r 0, 1), κ p r, x N )) N>0 is ON τ ). 9) Definition 5: x n ) is said to be asymptotically sparse for l p, if r 0, 1), sup κ p r, x N ) <. 10) Note that all these definitions of compressibility are hierarchical, from the weakest in Definition 2 to the strongest in Definition 5. Finally in this countable infinite setting, we can naturally say that x n ) is k-sparse if supportx n ) k. Here we are interested in the recovery of x with a family of CS encoding-decoding pairs {φ mn N, N ) : N > 0} of different signal-lengths, where for a fixed dimension N > 0 the encoder is given by a linear operator: y N φ mn Nx N 11) from R N to R m N, and the decoding process x N N yn ) R N is the solution of the l 1 -minimization in 2).

Sequences in R N have an infinite norm in general, then as a performance metric or distortion measure) we consider the notion of noise to signal ratio NSR) in the l p -sense given by: D lp φmn N, N ); x N) x N x N lp x N [0, 1]. 12) lp On the other hand, as a cost measure for the pair φ mn N, N ) we consider the number of linear measurements per signal dimension given by: Rφ mn N, N ) m N /N 0, 1]. 13) A. Asymptotic CS Recovery Results From this point on, we will focus exclusively on l 1 -norm compressibility as is seen in Theorem 1 the right-hand side of 4)), it is the type of approximation condition over the signals that could provide guaranteed performance. We begin introducing an asymptotic definition of performance for the random CS scheme of Theorem 2. Definition 6: The pair d, c) [0, 1) [0, 1) is achievable for x n ) R N adopting the standard random CS scheme {φ mn Nw), N ) : N > 0} of Theorem 2, if there is a sequence m N ) such that sup Rφ mn N w), N ) c and P { w Ω m N N : D l1 φmn N w), N); x N ) d }) = 1, where P denotes the process distribution of the random matrix process {φ mn N w) : N 1}. We can state the following result which is a corollary of Theorem 1: LEMMA 1: Let x n ) be an arbitrary sequence and φ mn N, N ) be a CS encoding-decoding pair. If φ m N N satisfies the RIP of Theorem 1 for all k κ 1 r, x N ), then D l1 φmn N, N ); x N) C 0 1 r). 14) Proof: In particular, φ mn N satisfies the RIP with respect to k = κ 1 r, x N ), hence from 4), x x l1 x x k x N C l1 0 l1 x N C 0 1 r), 15) l1 this last inequality by definition of κ 1 r, x N ). THEOREM 3: Let x n ) be l 1 -strongly compressible, then d 0, 1) there is {φ mn Nw), N ) : N > 0} a random CS scheme 3, where P { w Ω m N N : D l1 φmn Nw), N); x N ) d }) = 1 and Rφ m N N, N ) = 0. In other words, for any strongly compressible signal and distortion level, there is a standard random CS scheme that achieves that distortion, under the asymptotic regime of zero 3 φ mn N w) is either the Gaussian or Bernoulli random matrix of Theorem 2. measurement per signal dimension, and with an arbitrary probability close to one. Hence, this result provides the operational interpretation of the notion of asymptotic compressibility stated in Definition 3. Proof of Theorem 3: Let d 0, 1) be an arbitrary number and r 0, 1) such that C 0 1 r) < d. Let us consider the sequence κ 1 r, x N )) and take m N ) κ 1 r, x N ) log N). In this context, Theorem 2 says that for an arbitrary number N > 1, φ mn Nw) satisfies the RIP property of Theorem 1 with respect to k κ 1 r, x N ) with probability at least equal to 1 2 C2 m N. Therefore from Lemma 1, P { w : D l1 φmn N w), N); x N) C 0 1 r) }). 16) Finally, the facts that m N and C 0 1 r) < d conclude the argument. It is worth noting that the proof of this result uses a CS scheme that depends on the signal and the level of distortion m N ) κ 1 r, x N ) log N) with r > 1 d/c 0 ), which is a kind of oracle result. The following result takes advantage of the universal nature 4 of the RIP property obtained for random matrices in Theorem 2, and offers an universal CS codingdecoding construction over a family of signals and distortion levels. PROPOSITION 1: Let us consider a random CS scheme {φ mn N w), N ) : N > 0} with m N N ρ log N with ρ 0, 1). Then for all x n ) τ-power dominated with τ < ρ and for any d 0, 1), P { w : D l1 φmn Nw), N ); x N) d }) = 1, 17) furthermore, we have that D l φmn 1 Nw), N ); x N) = 0, P almost surely. 18) Note that the universal scheme of Proposition 1 is zero measurement per signal dimension by construction. Proof of Proposition 1: Let d 0, 1) be an arbitrary number and x n ) an arbitrary signal τ-power dominated. By hypothesis as m N N ρ log N, considering any r > 1 d/c 0 ), we have that N 0 such that for all N N 0, m N κ 1 r, x N ) log N. Therefore from Lemma 1, N N 0 P { w Ω m N N : D l1 φmn Nw), N ); x N) d }), 19) then, taking the it on N, we obtain 17). Note that 19) is valid for all d > 0, which is equivalent to saying that D l1 φmn Nw), N ); xn ) = 0 in probability or measure) [9]. We can get the stronger almost-sure convergence in 18) from the Borel-Cantelli lemma [9] and from the fact that by the hypothesis we have that N 1 2 C3 m N <. Remark 1: i) Note that the construction of Proposition 1 shows that the distortion-measurement point d, 0), for d arbitrary close to zero, is achievable for the class of signals 4 Over the signal collection Σ k.

τ-power dominated K τ. ii) Furthermore, there is a universal scheme that achieves this result over the family τ<p K τ. iii) Finally, D l1 φmn Nw), N ); ) xn convergences to zero with probability one with respect to P, which is stronger than the convergence in measure stipulated in Definition 6. The universal zero distortion and zero measurement construction of Proposition 1 is also valid for the collection of asymptotically sparse signals of Definition 5, as they are τ- power dominated for any τ 0, 1). However, we can get a refined result if we restrict the analysis exclusively to this family of asymptotically sparse signals. PROPOSITION 2: Let {φ mn Nw), N ) : N > 0} be the usual random CS scheme of Theorem 3, and let K R N denote the set of asymptotically sparse signals. If we assume that 1/m N ) is o1/log N) and m N ) is on), then for any d > 0 and for all x n ) K, P { w : D l1 φmn N w), N); x N ) d }) = 1. 20) Thus this result shows sufficient conditions for a zero measurement universal CS scheme to achieve close to zero arbitrary distortion as N tends to infinity. It is important to note that the condition 1 1 m N ) is o log N ) is weaker than its universal counterpart in Proposition 1, obtained for the family of τ-power dominated signals. However, we pay the price that the almost-sure convergence in 18) is not necessarily holding, where from 20) D l1 φmn Nw), N ); xn) converges to zero in probability. Proof of Proposition 2: Let us fix an arbitrary r 0, 1). As sup κ 1 r, x N ) <, then by hypothesis N 0 such that N N 0, m N κ 1 r, x N ) log N. The rest of the argument follows the same steps as those in the proof of Proposition 1. Remark 2: To conclude this section, it is worth emphasizing that the compressibility in Definition 2, although it provides a nice interpretation of asymptotic energy compaction of the signal in the sense of best k-term approximation [1], does not grant any performance in the sense of NSR versus number of measurements per dimension for the operational context of CS analysis and synthesis studied in this work. As a result, stringent compressibility notions were needed to fill this gap, which motivated the inclusion of Definitions 3-5. IV. THE l 1 N) CASE Here we study the implications of the type of results presented in the previous section, for the special case of sequences in l 1 N). We begin by mentioning the following fact: LEMMA 2: If x n ) is l 1 N) then it is asymptotically sparse Definition 5). The proof is presented in the Appendix. As l 1 N) K, then, from Proposition 2, we have a universal CS scheme achieving zero distortion with zero measurement per dimension for this class of signals. Furthermore considering that <, Theorems 1 and 2 offer the following result concerning l 1 -absolute error in the asymptotic regime: THEOREM 4: Let {φ mn Nw), N ) : N > 0} be the standard CS random scheme. If 1/m N ) is o1/log N) and m N ) is on), then x n ) l 1 N) and for any d > 0, { P w : x N Nφ mn Nw) x N ) }) l1 d = 1. 21) Furthermore, if m N ) N ρ log N) for some ρ 0, 1), then x n ) l 1 N) x N Nφ mn Nw) x N ) l1 = 0, P almost surely. 22) The proof follows the same arguments as the proof of Propositions 1 and 2. For completeness it is presented here. Proof: We show first the result in 21). Let us consider r 0, 1). As sup κ 1 r, x N ) < by Lemma 2, then by the hypothesis N 0 r) such that N N 0 r), m N κ 1 r, x N ) log N. Then from 19) and the fact that x N l1 <, N N 0 r) { P x N Nφ mn Nw) x N ) } C l1 01 r), 23) At this point for any arbitrary d > 0, we can choose r > 0 sufficiently small such that C 0 1 r) < d, and, consequently, we obtain the result in Eq. 21) from 23) since by hypothesis m N goes to infinity. Concerning the result in Eq.22), note that this is obtained from 23), which is valid in this scenario as m N ) N ρ log N), and the Borel-Cantelli lemma, because N 1 2 C3 m N <. Finally, given that x n ) l2 <, we can obtain a version of Theorem 4 considering l 2 -norm as a fidelity measure. V. SUMMARY AND FUTURE WORK This work introduces new definitions of compressibility of infinite sequences. In this context, we elaborate a compressed sensing analysis-synthesis scenario, in which results are established that guarantee recovery performances in the sense of achieving a given number of measurements per signal dimension and an l 1 noise to signal ratio l 1 -NSR) distortion measure. In particular, we explore the zero-measurement and zero distortion regime, and their interplay with the previously mentioned notions of compressibility in the asymptotic regime when the signal dimension goes to infinity. There are a number of interesting directions to pursue in further research on this topic. Just to name a few key ones: i) the characterization of concrete sequences that satisfy the proposed notions of compressibility; ii) the extension of performance recovery for the important case of an l 2 - based distortion measure; iii) the extension of these results for random sequences, where the focus would be on average performances instead of the universal minimax) [8] performance criterion explored in this work.

ACKNOWLEDGMENT This material is based on work partially supported by Program U-Apoya, University of Chile and grants of CONICYT- Chile, Fondecyt 1110145. We thank Sandra Beckman for proofreading this work. A. Proof of Lemma 2 VI. APPENDICES As x n ) is l 1 N), then ɛ > 0, Nɛ) 1 such that n Nɛ) x n < ɛ. Let us define { k 1 r, N, x n )) min k {1,.., N} : ς 1k, x N ) r, 24) where k 1 r, N, x n )) κ p r, x N ) as x N l1. We have that for all N > 0, and for any k {1,.., N}, x N1 + + x Nk x 1 + + x k = 1 } n k x n, 25) where {x N1,.., x NN } is the order statistics of {x 1,.., x N }. Hence, for all ɛ > 0, if N k > Nɛ), x N1 + + x Nk x n) l1 1 ɛ/. Therefore, if ɛ < 1 r), sup k 1 r, N, x n )) Nɛ) < for all r 0, 1). REFERENCES [1] A. Amini, M. Unser, and F. Marvasti, Compressibility of deterministic and random infinity sequences, IEEE Transactions on Signal Processing, vol. 59, no. 11, pp. 5193 5201, November 2011. [2] E. J. Candes, The restricted isometry property and its applications for compressed sensing, C. R. Acad. Sci. Paris, vol. I 346, pp. 589 592, 2008. [3] R. G. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, A simple proof of the restricted isometry property for random matrices, Constructive Approximation, 2008. [4] E. J. Candes and T. Tao, Decoding by linear programing, IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203 4215, December 2005. [5] E. J. Candes, J. Romberg, and T. Tao, Robust uncertanty principle: exact signal reconstruction from highly imcomplete frequency information, IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489 509, 2006. [6], Stable signal recovery from incomplete and inaccurate measurements, Commun. on Pure and Appl. Math., vol. 59, pp. 1207 1223, 2006. [7] D. L. Donoho, Compressed sensing, IEEE Transactions on Information Theory, vol. 52, pp. 1289 1306, 2006. [8] A. Cohen, W. Dahmen, and R. DeVore, Compressed sensing and best k-term approximation, Journal of the American Mathematical Society, vol. 22, no. 1, pp. 211 231, July 2009. [9] L. Breiman, Probability. Addison-Wesley, 1968.