Robust Bayesian compressed sensing over finite fields: asymptotic performance analysis


arXiv v1 [cs.IT] 17 Jan 2014

Wenjie Li, Francesca Bassi, and Michel Kieffer

Abstract
This paper addresses the topic of robust Bayesian compressed sensing over finite fields. For stationary and ergodic sources, it provides asymptotic (in the size of the vector to estimate) necessary and sufficient conditions on the number of required measurements to achieve vanishing reconstruction error, in the presence of sensing and communication noise. In all considered cases, the necessary and sufficient conditions asymptotically coincide. Conditions on the sparsity of the sensing matrix are established in the presence of communication noise. Several previously published results are generalized and extended.

Index Terms: Compressed sensing, finite fields, MAP estimation, asymptotic performance analysis, stationary and ergodic sources.

W. Li and M. Kieffer are with the Laboratoire des Signaux et Systèmes, CNRS - Supélec - Univ Paris-Sud, Gif-sur-Yvette. M. Kieffer is partly supported by the Institut Universitaire de France. F. Bassi is with ESME-Sudria, Ivry-sur-Seine. This work was partly supported by the European Network of Excellence project NEWCOM#.

I. INTRODUCTION

Compressed sensing refers to the compression of a vector $\theta \in \mathbb{R}^N$, obtained by acquiring $M$ linear measurements, whose number can be significantly smaller than the size $N$ of the vector itself. If $\theta$ is $k$-sparse with respect to some known basis, it can be reconstructed exactly, almost surely, from the linear measurements using basis pursuit, for $M$ as small as $O(k \log(N))$ [1], [2]. The same result also holds for a compressible vector $\theta$ [3], with reconstruction quality matching the one allowed by direct observation of the $k$ largest coefficients of $\theta$ in the transform domain. The major feature of compressed sensing is that the linear coefficients do not need to be adaptive with respect to the signal to be acquired, but can actually be random, provided that appropriate conditions on the global measurement matrix are satisfied [2], [4]. Moreover, compressed sensing is robust to the presence of noise in the measurements [4], [5].

Bayesian compressed sensing [6] refers to the same problem, considered from the statistical inference perspective. In particular, the vector to be compressed is now understood as a statistical source $\Theta$, whose a priori distribution can induce sparsity or correlation between the symbols. This allows the reconstruction problem to be recast as an estimation problem, solvable using standard Bayesian techniques, e.g., Maximum A Posteriori (MAP) estimation. In practical implementations, estimation from the linear measurements can be achieved by exploiting statistical graphical models [7], e.g., using belief propagation [8], as done in [9] for deterministic measurement matrices and in [10] for random measurement matrices.

In this paper we address the topic of robust Bayesian compressed sensing over finite fields. The motivating example for considering this setting comes from the large and growing body of work devoted to data dissemination and collection in wireless sensor networks. Wireless sensor networks [11] are composed of autonomous nodes, with sensing capability of some physical phenomenon (e.g., temperature or pressure). In order

to ensure ease of deployment and robustness, the communication between the nodes might need to be performed in the absence of designated access points and of a hierarchical structure. At the network layer, dissemination of the measurements to all the nodes can be achieved using an asynchronous protocol based on random linear network coding (RLC) [12]. In this protocol, each node in the network broadcasts a packet evaluated as the linear combination of the local measurement and of the packets received from neighboring nodes. The linear coefficients are randomly chosen and are sent in each packet header. Upon an appropriate number of communication rounds, each node has collected enough linearly independent combinations of the network measurements, and can perform decoding by solving a system of linear equations. Due to the physical nature of the sensed phenomenon, and to the spatial distribution of the nodes in the network, correlation between the measurements at different nodes can be assumed, and exploited to perform decoding, as done in [13]-[17]. Recasting the problem in the Bayesian compressed sensing framework, the vector of the measurements at the nodes is interpreted as the compressible source $\Theta$, the network coding matrix as the sensing matrix, and the decoding at each node as the estimation operation.

Before transmission, all the measurements need to be quantized. Quantization can be performed after the network encoding operation, as done in [17], where reconstruction on the real field is performed via $\ell_1$-norm minimization, or it can be done prior to the network encoding operation. For the latter choice, which is the target of this work, each quantization index is represented by an element of a finite field, from which the network coding coefficients (i.e., the sensing coefficients in the compressed sensing framework) are chosen as well. This setting has been considered in [13], where exact MAP reconstruction is obtained by solving a mixed-integer quadratic program, and in [14]-[16], where approximate MAP estimation is obtained using variants of the belief propagation algorithm.

The performance analysis of compressed sensing over finite fields has been addressed

in [15], [18], and [19]. The work in [19] does not consider Bayesian priors, and assumes a known sparsity level of $\theta$. Ideal decoding via $\ell_0$-norm minimization is assumed, and necessary and sufficient conditions for exact recovery are derived as functions of the size of the vector, its sparsity level, the number of measurements, and the sparsity of the sensing matrix. Numerical results show that the necessary and sufficient conditions coincide as the size of $\theta$ asymptotically increases. A Bayesian setting is considered in [18] and [15]. In [18], a prior distribution induces sparsity on the realization of $\Theta$, whose elements are assumed statistically independent. Using the method of types [20], the error exponent with respect to exact reconstruction using $\ell_0$-norm minimization is derived in the absence of noise in the measurements, and the error exponent with respect to exact reconstruction using minimum-empirical-entropy decoding is derived for noisy measurements. In [15], specific correlation patterns (pairwise, cluster) between the elements of $\Theta$ are considered. Error exponents under MAP decoding are derived, only in the absence of noise on the measurements.

The contribution of this work can be summarized as follows. We assume a Bayesian setting and we consider MAP decoding. Inspired by the work in [19], we aim to derive necessary and sufficient conditions for almost surely exact recovery of $\theta$, as its size asymptotically increases. We consider three classes of prior distributions on the source vector: (i) the prior distribution is sparsity inducing, and the elements are statistically independent; (ii) the vector $\Theta$ is a Markov process; (iii) the vector $\Theta$ is a general ergodic process. To the best of our knowledge, no analysis has been previously performed for the latter source model, which is quite general. We consider both sparse and dense sensing matrices. We consider two kinds of noise: (a) the sensing noise, affecting the measurements prior to network coding (i.e., prior to random projection acquisition in the compressed sensing framework); (b) the communication noise, affecting the network coded packets (i.e., the random projections in the compressed sensing framework). To the best of our knowledge, no analysis has been previously performed in the presence of

both kinds of noise. Considering source model (i), our results for the noiseless setting are consistent with the ones presented in [19]; in addition, we formally prove the asymptotic convergence of the necessary and sufficient conditions, and extend the bounds on the sparsity factor of the sensing matrix in the presence of communication noise. The asymptotic analysis under MAP decoding, both for the noiseless case and in the presence of communication noise (b), is consistent with the results derived in [18], respectively under $\ell_0$-norm minimization decoding and under minimum-empirical-entropy decoding. Error exponents for MAP decoding of correlated sources in the noiseless setting are consistent with the ones presented in [15], and are here extended to the case of an arbitrary statistical structure, and to the presence of noise contamination both preceding and following the sensing operation.

The rest of the paper is organized as follows. Section II introduces the considered signal models in the context of data dissemination in a wireless sensor network. In Section III, we derive the necessary conditions for asymptotic almost surely exact recovery, both for the noiseless and noisy cases. Section IV describes the sufficient conditions and the error exponents under MAP decoding, for the noiseless case and in the presence of communication noise only. In Section V, sensing noise is also taken into account. Section VI concludes the paper.

II. SYSTEM MODEL AND PROBLEM SETUP

This section introduces the system model as well as various hypotheses on the sources and on the sensing and communication noises. In what follows, sans-serif font denotes random quantities while serif font denotes deterministic quantities. Matrices are in boldface upper-case letters. A length-$N$ vector is in boldface lower-case with a superscript $N$. Calligraphic font denotes sets, except $H$, which denotes the entropy rate. All logarithms are in base 2.

A. The source model

Consider a wireless sensor network consisting of a set of $N$ sensors. The target physical phenomenon (e.g., the temperature) at the $n$-th sensor is represented by the random variable $\Theta_n$, taking values in a finite field $\mathbb{F}_q$ of size $q$. Let $\theta^N$ be a realization of the random vector $\Theta^N = (\Theta_1, \dots, \Theta_N)$, taking values in $\mathbb{F}_q^N$. The vector $\Theta^N$ represents the source in the Bayesian compressed sensing framework. The probability mass function (pmf) associated with $\Theta^N$ is denoted by $p(\theta^N)$, rather than $p_{\Theta^N}(\theta^N)$, for the sake of simplicity. In general, the analytic form of $p(\theta^N)$ depends on the characteristics of the observed phenomenon and on the topology of the sensor network. Here we consider three different models, defined as follows.

SI: Sparse, Independent and identically distributed source. Each element of the source vector $\Theta^N$ is independent and identically distributed (iid) with pmf $p_\Theta(\cdot)$ and $p_\Theta(0) > 0.5$,

$p(\theta^N) = \prod_{n=1}^{N} p_\Theta(\theta_n)$. (1)

StM: Stationary Markov model. Let $\theta_n^{n+r-1} \in \mathbb{F}_q^r$ denote the sequence $(\theta_n, \dots, \theta_{n+r-1})$. This is the stationary $r$-th order Markov model, with $r \in \mathbb{N}^+$, $1 \le r \le N$, and transition probability $p(\theta_{n+r} \mid \theta_n^{n+r-1})$. The pmf of $\Theta^N$ may be written as

$p(\theta^N) = p(\theta_1^r) \prod_{n=1}^{N-r} p(\theta_{n+r} \mid \theta_n^{n+r-1})$. (2)

GSE: General Stationary and Ergodic model. This is the general case, without any further assumption apart from the stationarity and ergodicity of the source.
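The entropy rates used throughout the analysis are easy to evaluate numerically for the first two source models. The following sketch (illustrative only; the field size, pmf, and transition matrix are toy assumptions, not taken from the paper) computes $H(\Theta) = H(p_\Theta)$ for the SI model (1) and the conditional-entropy rate of a first-order StM model (2).

```python
# Minimal numerical sketch (not from the paper): entropy rates of the SI and
# first-order StM source models, for a toy field size q = 4.
# All names (p_theta, P_trans) are illustrative assumptions.
import numpy as np

def entropy(p):
    """Entropy in bits of a pmf given as a 1-D array."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

q = 4
# SI model (1): iid symbols with p_Theta(0) > 0.5; H(Theta) = H(p_Theta).
p_theta = np.array([0.7, 0.1, 0.1, 0.1])
H_SI = entropy(p_theta)

# StM model (2), r = 1: entropy rate = H(Theta_{n+1} | Theta_n) under the
# stationary distribution of the transition matrix P_trans[i, j] = p(j | i).
P_trans = np.full((q, q), 0.1) + np.eye(q) * 0.6
P_trans /= P_trans.sum(axis=1, keepdims=True)
evals, evecs = np.linalg.eig(P_trans.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()                      # stationary distribution
H_StM = float(sum(pi[i] * entropy(P_trans[i]) for i in range(q)))

print(f"H_SI  = {H_SI:.3f} bits/symbol")
print(f"H_StM = {H_StM:.3f} bits/symbol (<= log2 q = {np.log2(q):.1f})")
```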

B. The sensing model

The considered system model is shown in Figure 1. Let $x_n \in \mathbb{F}_q$ be the measurement of $\Theta_n$ obtained by the $n$-th sensor. The random vector $x^N = (x_1, x_2, \dots, x_N) \in \mathbb{F}_q^N$ is a copy of the source vector $\Theta^N$ corrupted by the sensing noise. The sensing noise models the effect of imperfect measurement acquisition at each sensor. It is described by the stationary transition probability $p_{x|\Theta}(x_n \mid \theta_n)$, $\forall n$. Remark that this implies that $x^N$ is stationary as long as $\Theta^N$ is stationary.

Figure 1. Block diagram of the network compressed sensing model: the source $\Theta^N$ is measured as $x^N$ (sensing noise), encoded via RLC into $y^M$ (with communication noise $u^M$), and decoded into $\hat\Theta^N$ at the sink.

The local measurement $x_n$ at node $n$ is used to compute a packet via RLC [12], which is then broadcast and received by the neighbours of $n$. Each node in the network can act as a sink, and attempt reconstruction of $\Theta^N$ after a number $M$ of linear combinations has been received. The effect of RLC at a sink node can be modeled as the multiplication of $x^N$ by a random matrix $A \in \mathbb{F}_q^{M \times N}$. We assume that some communication noise $u^M \in \mathbb{F}_q^M$ affects the received packets, modeling the effects of transmission. Each entry of $u^M$ is iid with pmf $p_u(\cdot)$. The sink node is assumed to have received $M$ packets, with the $i$-th packet carrying the coefficients $A_i$ and the result of the linear combination $y_i \in \mathbb{F}_q$, where $A_i$ is the $i$-th row of $A$ and $y_i = A_i x^N + u_i$, with all operations in $\mathbb{F}_q$. The vector $y^M = (y_1, y_2, \dots, y_M)^t \in \mathbb{F}_q^M$ can then be represented as

$y^M = A x^N + u^M$, (3)

where the network coding matrix $A$ plays the role of the random sensing matrix in the compressed sensing setup. According to the presence of the sensing and communication noises, one obtains four types of noise models, namely Without noise (W), noise in Communications only (C), noise in the Sensing process only (S), and noise in both Communications and the Sensing process (CS). These models are summarized in Table I.

Table I
CLASSIFICATION AND NOTATION BASED ON THE PRESENCE OF NOISE

                          Communication noise
                          absent    present
Sensing noise   absent    W         C
                present   S         CS

In general, the matrix $A$ is not necessarily of full rank, and it is assumed to be independent of $x^N$. Two different assumptions about the structure of $A$ are considered here: A1) the entries of $A$ are iid, uniformly distributed in $\mathbb{F}_q$; A2) the entries of $A$ are iid, and all non-zero elements of $\mathbb{F}_q$ are equiprobable. Both can be represented using the following model: for the entry $A_{ij}$ of $A$,

$\Pr(A_{ij} = a) = \begin{cases} 1 - \gamma & a = 0, \\ \gamma/(q-1) & a \in \mathbb{F}_q \setminus \{0\}, \end{cases}$ (4)

where $\gamma$ is the sparsity factor, $0 < \gamma < 1$, and $A$ is sparse if $\gamma < 0.5$. We only assume that

$0 < \gamma \le 1 - 1/q$. (5)

Notice that choosing $\gamma < 1 - 1/q$ corresponds to assumption A2), while choosing $\gamma = 1 - 1/q$ corresponds to assumption A1), since (4) then becomes the uniform distribution. In practice, sparse matrices are preferable. As the description of the sensing matrix is carried in the packet headers [21], [22], the network coding overhead may be large if $A$ is dense and $N$ is large. Moreover, as mentioned in [14], sparse matrices facilitate the convergence of the approximate belief propagation algorithm [8].

In practice, the structure of $A$ is strongly dependent on the structure of the network. For example, [15] assumes that only a subset of sensors $S_i$ participate in the $i$-th linear mixing. The content of the subsets $S_i$ depends on the location of each sensor and is designed to minimize communication costs.
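As an illustration of the sensing model (3)-(4), the following sketch generates a sparse sensing matrix with sparsity factor $\gamma$ and forms the noisy projections. It assumes a prime field size $q$, so that $\mathbb{F}_q$ can be realized as integer arithmetic modulo $q$; the function names and parameter values are illustrative, not taken from the paper.

```python
# Minimal sketch (illustrative, not the paper's code) of the sensing model
# (3)-(4) for a prime field size q, so that F_q is integers modulo q.
# gamma, N, M and the noise level are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)

def random_sensing_matrix(M, N, q, gamma):
    """Entries iid as in (4): 0 w.p. 1-gamma, each nonzero value w.p. gamma/(q-1)."""
    probs = np.concatenate(([1.0 - gamma], np.full(q - 1, gamma / (q - 1))))
    return rng.choice(q, size=(M, N), p=probs)

def measure(A, x, q, p_err=0.0):
    """y^M = A x^N + u^M over F_q, with iid communication noise u (nonzero w.p. p_err)."""
    noise_pmf = np.concatenate(([1.0 - p_err], np.full(q - 1, p_err / (q - 1))))
    u = rng.choice(q, size=A.shape[0], p=noise_pmf)
    return (A @ x + u) % q

q, N, M, gamma = 5, 20, 12, 0.3
x = rng.choice(q, size=N, p=[0.8] + [0.05] * (q - 1))   # sparse-ish source realisation
A = random_sensing_matrix(M, N, q, gamma)
y = measure(A, x, q, p_err=0.05)
print(A.shape, y[:5])
```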

In $A$, coefficients associated to nodes belonging to $S_i$ follow a uniform distribution, while the others are null. This model, however, is not considered here, since we aim at a general asymptotic analysis, independent of the topology of the network.

C. MAP Decoding

The sink node observes the realization $y^M$ and perfectly knows the realization $A$, e.g., from the packet headers, see [21] and [22]. The maximum a posteriori estimate $\hat\theta^N$ of the realization of $\Theta^N$ is evaluated as

$\hat\theta^N = \arg\max_{\theta^N \in \mathbb{F}_q^N} p(\theta^N \mid y^M, A)$, (6)

where the a posteriori pmf is

$p(\theta^N \mid y^M, A) \propto p(\theta^N, y^M, A) = \sum_{x^N \in \mathbb{F}_q^N} \sum_{u^M \in \mathbb{F}_q^M} p(\theta^N, x^N, u^M, y^M, A) = \sum_{x^N \in \mathbb{F}_q^N} \sum_{u^M \in \mathbb{F}_q^M} p(\theta^N)\, p(x^N \mid \theta^N)\, p(u^M)\, p(A)\, p(y^M \mid x^N, u^M, A)$. (7)

Note that the conditional pmf $p(y^M \mid x^N, u^M, A)$ is an indicator function, i.e.,

$p(y^M \mid x^N, u^M, A) = 1_{y^M = A x^N + u^M}$. (8)

An error event (decoding error) occurs when $\hat\theta^N \ne \theta^N$, with probability

$P_e = \Pr\{\hat\Theta^N \ne \Theta^N\}$. (9)

Our objective is to evaluate lower and upper bounds on (9) under MAP decoding, as functions of $M$, $N$, $q$, and $\gamma$, for the various source and noise models previously introduced. With these bounds, one can obtain necessary and sufficient conditions on the ratio $M/N$ for asymptotic (with $N \to \infty$) perfect recovery, i.e., to obtain

$\lim_{N \to \infty} P_e = 0$. (10)
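For very small $N$, the MAP rule (6)-(8) can be evaluated by exhaustive search. The sketch below is a toy illustration, specialized to the noiseless-sensing case $x^N = \Theta^N$, so that the posterior is proportional to $p(\theta^N)\, p_u(y^M - A\theta^N)$; all parameter values are assumptions. Practical decoders rely instead on belief propagation, as discussed in Section I.

```python
# Illustrative brute-force MAP decoder per (6)-(8), specialised to x^N = Theta^N
# (no sensing noise). Exhaustive search, so only usable for tiny N; all
# parameter values are toy assumptions, and q is taken prime.
import itertools
import numpy as np

def map_decode(y, A, q, p_theta, p_u):
    """Return argmax_theta p(theta^N) * prod_i p_u((y - A theta)_i mod q)."""
    M, N = A.shape
    best, best_score = None, -np.inf
    for theta in itertools.product(range(q), repeat=N):
        theta = np.array(theta)
        u = (y - A @ theta) % q                  # the unique noise vector, cf. (37)
        score = np.sum(np.log(p_theta[theta])) + np.sum(np.log(p_u[u]))
        if score > best_score:
            best, best_score = theta, score
    return best

q, N, M = 2, 8, 6
rng = np.random.default_rng(1)
p_theta = np.array([0.85, 0.15])                 # SI prior, p_Theta(0) > 0.5
p_u = np.array([0.95, 0.05])                     # communication noise pmf
theta = rng.choice(q, size=N, p=p_theta)
A = rng.integers(0, q, size=(M, N))              # uniform A (gamma = 1 - 1/q)
y = (A @ theta + rng.choice(q, size=M, p=p_u)) % q
print("true   :", theta)
print("decoded:", map_decode(y, A, q, p_theta, p_u))
```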

III. NECESSARY CONDITION FOR ASYMPTOTIC PERFECT RECOVERY

This section derives the necessary conditions for an asymptotically ($N \to \infty$) vanishing probability of decoding error. They only depend on the assumptions made on the sensing and communication noises. We directly analyze the CS case for the GSE source model; the results for this case can be easily adapted to the other cases. This work extends results obtained in [19] for the noiseless case (W). Two situations are considered, depending on the value of the entropy rate

$H(x) = \lim_{N \to \infty} \frac{1}{N} H(x^N)$. (11)

Proposition 1 (Necessary condition for the CS case). Assume the presence of both communication and sensing noises and that $H(x) > 0$. Consider some arbitrarily small $\delta \in \mathbb{R}^+$. For $N \to \infty$, the necessary conditions for $P_e < \delta$ are

$H(\Theta, x) - H(x) < 3\varepsilon + \delta \log q$, (12)

$H(p_u) < \log q$, (13)

and

$\frac{M}{N} > \frac{H(\Theta, x) - (5\varepsilon + 2\delta \log q)}{\log q - H(p_u)}$, (14)

where $\varepsilon \in \mathbb{R}^+$ is an arbitrarily small constant.

Corollary 1. Consider the same hypotheses as in Proposition 1 and assume now that $H(x) = 0$. Consider some arbitrarily small $\delta \in \mathbb{R}^+$. For $N \to \infty$, the necessary condition for $P_e < \delta$ is

$H(\Theta, x) < 3\varepsilon + \delta \log q$, (15)

where $\varepsilon \in \mathbb{R}^+$ is an arbitrarily small constant.

In Proposition 1, (12) implies that, for asymptotically exact recovery, the conditional pmf of $\Theta^N$ given $x^N$ should degenerate, almost surely, into a deterministic mapping. Condition (13) indicates that asymptotically exact recovery of non-deterministic sources is not possible in the case of uniformly distributed communication noise. Finally, (14) indicates that the minimum number of required measurements depends both on the sensing and communication noises and on the distribution of $\Theta^N$. In particular, for a given source with entropy rate $H(\Theta)$, the number of necessary measurements increases with the level of the sensing noise, determined by $H(x \mid \Theta)$. Similarly, the number of necessary measurements increases when the communication noise gets closer to uniformly distributed.

The following proof is inspired by the work in [19], but both communication noise and sensing noise are considered here.

Proof: From the problem setup, one has the Markov chain

$\Theta^N \to x^N \to (y^M, A) \to \hat\Theta^N$, (16)

from which one deduces that

$H(\Theta^N \mid x^N) \le H(\Theta^N \mid \hat\Theta^N)$, (17)

and

$H(x^N \mid y^M, A) \le H(\Theta^N \mid \hat\Theta^N)$. (18)

Applying Fano's inequality [23, Sec. 2.10], one gets

$H(\Theta^N \mid \hat\Theta^N) \le 1 + P_e \log(q^N - 1) < 1 + P_e N \log q$. (19)

A lower bound on $P_e$ is obtained by combining (17) and (19),

$P_e > \frac{H(\Theta^N, x^N) - H(x^N) - 1}{N \log q}$. (20)

Since $\Theta^N$ and $x^N$ are stationary and ergodic, for any $\varepsilon > 0$ there exists $N_0$ such that, for $N > N_0$,

$H(\Theta, x) - \varepsilon < \tfrac{1}{N} H(\Theta^N, x^N) < H(\Theta, x) + \varepsilon$, $\quad H(x) - \varepsilon < \tfrac{1}{N} H(x^N) < H(x) + \varepsilon$, $\quad N\varepsilon > 1$. (21)

Hence, for $N > N_0$, (20) can be rewritten as

$P_e > \frac{H(\Theta, x) - H(x) - 3\varepsilon}{\log q}$. (22)

For $P_e < \delta$, one deduces (12) from (22). For $\delta$ and $\varepsilon$ arbitrarily small, (12) imposes $H(\Theta, x) = H(x)$, meaning that $\Theta^N$ should be deterministic knowing $x^N$, almost surely.

From (18) and (19), one gets another lower bound on $P_e$,

$P_e > \frac{H(x^N \mid y^M, A) - 1}{N \log q}$. (23)

The conditional entropy $H(x^N \mid y^M, A)$ can be bounded as

$H(x^N \mid y^M, A) = H(x^N) - I(x^N; y^M, A) = H(x^N) - \left( I(x^N; A) + I(x^N; y^M \mid A) \right) \stackrel{(a)}{=} H(x^N) - \left( H(y^M \mid A) - H(y^M \mid A, x^N) \right) \stackrel{(b)}{\ge} H(x^N) - M \log q + H(y^M \mid A, x^N) \stackrel{(c)}{=} H(x^N) - M \log q + M H(p_u)$, (24)

where (a) follows from the assumption that $x^N$ and $A$ are independent, (b) comes from $H(y^M \mid A) \le H(y^M) \le \log |\mathbb{F}_q^M| = M \log q$, and (c) is because

$H(y^M \mid A, x^N) = H(Ax^N + u^M \mid A, x^N) = H(u^M) = M H(p_u)$. (25)

Using (23) and (24), a second necessary condition for $P_e < \delta$ is

$\frac{H(x^N) - M(\log q - H(p_u)) - 1}{N \log q} < \delta$. (26)

For $N > N_0$, using (21) in (26) yields

$\frac{H(x) - \frac{M}{N}(\log q - H(p_u)) - 2\varepsilon}{\log q} < \delta$. (27)

Now consider two cases. In the first case, the communication noise is assumed uniformly distributed, i.e.,

$H(p_u) = \log q$, (28)

and condition (27) becomes

$H(x) < \delta \log q + 2\varepsilon$. (29)

As $\delta$ can be made arbitrarily small, (29) imposes that, for uniform communication noise, an asymptotically vanishing probability of error is possible only if $H(x)$ is arbitrarily close to zero. For non-degenerate cases, i.e., $H(x) > 0$, one obtains the necessary condition (13). In the second case, a lower bound on the compression ratio $M/N$ is obtained immediately from (27),

$\frac{M}{N} > \frac{H(x) - (2\varepsilon + \delta \log q)}{\log q - H(p_u)}$. (30)

Condition (30) can be expressed in terms of the joint entropy rate $H(\Theta, x)$ by applying (12). One then gets (14), and Proposition 1 is proved.

Consider now $H(x) = 0$; then (27) holds for any value of $M/N$ and for any $H(p_u) \le \log q$, since the left-hand side of (27) is always negative. Hence, (12) is the only necessary condition in this case, and Corollary 1 is also proved.

With the results of the CS noise model, one may derive the necessary conditions for the other models. If no sensing noise is considered, i.e., $x^N = \Theta^N$, one has $H(\Theta \mid x) = H(x \mid \Theta) = 0$ and $H(\Theta, x) = H(\Theta)$. If the communication noise is absent, i.e., $u^M = 0$, then $H(u^M) = 0$. The necessary conditions for an asymptotically ($N \to \infty$) vanishing probability of decoding error in each case are listed in Table II.

Table II
NECESSARY CONDITIONS FOR ASYMPTOTIC PERFECT RECOVERY IN NOISELESS AND NOISY CASES

Case   Necessary condition ($H(x) > 0$)
W      $M/N > H(\Theta)/\log q$ (already obtained in [19])
C      $M/N > H(\Theta)/(\log q - H(p_u))$ and $H(p_u) < \log q$
S      $M/N > H(\Theta, x)/\log q$ and $H(\Theta \mid x) = 0$
CS     $M/N > H(\Theta, x)/(\log q - H(p_u))$ and $H(p_u) < \log q$ and $H(\Theta \mid x) = 0$
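The bounds of Table II are straightforward to evaluate numerically. The helper below (an illustration with toy entropy values, not code from the paper) returns the lower bound on $M/N$ for each case, with the $\varepsilon$ and $\delta$ slack terms neglected.

```python
# Small helper (illustrative, not from the paper) evaluating the necessary
# lower bounds on M/N of Table II, neglecting the epsilon/delta slack terms.
# The entropy values passed in are assumptions chosen for the example.
import math

def necessary_ratio(H_theta, H_x_given_theta, H_pu, q):
    """Return {case: lower bound on M/N} for the W, C, S, CS cases of Table II.
    For the S and CS cases Table II additionally requires H(Theta | x) = 0."""
    log_q = math.log2(q)
    H_joint = H_theta + H_x_given_theta          # H(Theta, x)
    out = {"W": H_theta / log_q, "S": H_joint / log_q}
    if H_pu < log_q:                             # condition H(p_u) < log q
        out["C"] = H_theta / (log_q - H_pu)
        out["CS"] = H_joint / (log_q - H_pu)
    return out

# Example: q = 4, H(Theta) = 1.2 bits, noiseless sensing, H(p_u) = 0.5 bit.
print(necessary_ratio(H_theta=1.2, H_x_given_theta=0.0, H_pu=0.5, q=4))
```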

IV. SUFFICIENT CONDITION IN ABSENCE OF SENSING NOISE

This section provides an upper bound on the error probability of the MAP estimation problem in the absence of sensing noise (the W and C cases). These two cases are considered simultaneously because their proofs are similar. When the channel noise vanishes, the C case boils down to the W case.

A. Upper Bound of the Error Probability

Proposition 2 (Upper bound of $P_e$, W and C cases). Under MAP decoding, the asymptotic ($N \to \infty$) probability of error in the absence of sensing noise can be upper bounded as

$P_e \le P_1(\alpha) + P_2(\alpha) + 2\varepsilon$, (31)

where $\varepsilon \in \mathbb{R}^+$ is an arbitrarily small constant, and $P_1(\alpha)$ and $P_2(\alpha)$ are defined as

$P_1(\alpha) = 2^{M(H(p_u) + \log(1-\gamma) + \varepsilon) + N(H_2(\alpha) + \alpha \log(q-1)) + \log(\alpha N)}$, (32)

and

$P_2(\alpha) = 2^{N(H(\Theta) + \varepsilon) + M\left(H(p_u) + \log\left(\frac{1}{q} + \left(1 - \frac{1}{q}\right)\left(1 - \frac{\gamma q}{q-1}\right)^{\lceil \alpha N \rceil}\right) + \varepsilon\right)}$, (33)

with $\alpha \in \mathbb{R}^+$ and $\alpha < 0.5$.

Proof: The proof consists of two parts. First we define the error event, and then we analyze the probability of error. Since no sensing noise is considered, we have $x^N = \Theta^N$ throughout this section. The a posteriori pmf (7) becomes

$p(\theta^N \mid y^M, A) \propto \sum_{u^M \in \mathbb{F}_q^M} p(\theta^N)\, p(u^M)\, p(A)\, 1_{y^M = A\theta^N + u^M}$. (34)

Suppose that $\theta^N$ (given but unknown) is the true state vector and consider that $A$ has been generated randomly. At the sink, $A$ and $y^M$ are known. With MAP decoding, the reconstruction $\hat\theta^N$ in (6) is

$\hat\theta^N = \arg\max_{\theta^N \in \mathbb{F}_q^N} \sum_{u^M \in \mathbb{F}_q^M} p(\theta^N)\, p(u^M)\, p(A)\, 1_{y^M = A\theta^N + u^M}$. (35)

A decoding error happens if there exists a vector $\varphi^N \in \mathbb{F}_q^N \setminus \{\theta^N\}$ such that

$\sum_{v^M \in \mathbb{F}_q^M} p(\varphi^N)\, p(v^M)\, 1_{y^M = A\varphi^N + v^M} \ge \sum_{u^M \in \mathbb{F}_q^M} p(\theta^N)\, p(u^M)\, 1_{y^M = A\theta^N + u^M}$. (36)

For fixed $y^M$, $A$, and $\theta^N$, there is exactly one vector $u^M$ such that $u^M = y^M - A\theta^N$. Hence the right-hand side of (36) can be represented as $p_{\Theta^N}(\theta^N)\, p_{u^M}(y^M - A\theta^N)$; the subscripts of the pmfs are introduced to avoid any ambiguity of notation. Then (36) is equivalent to

$p_{\Theta^N}(\varphi^N)\, p_{u^M}(y^M - A\varphi^N) \ge p_{\Theta^N}(\theta^N)\, p_{u^M}(y^M - A\theta^N)$. (37)

An alternative way to state the error event is the following: for a given realization $\Theta^N = \theta^N$, which implies the realization $u^M = y^M - A\theta^N$, there exists a pair $(\varphi^N, v^M) \in \mathbb{F}_q^N \times \mathbb{F}_q^M$ such that

$\varphi^N \ne \theta^N$, $\quad A\varphi^N + v^M = y^M = A\theta^N + u^M$, $\quad p(\varphi^N)\, p(v^M) \ge p(\theta^N)\, p(u^M)$. (38)

From conditions (38), one concludes that, in the C case, the MAP decoder is equivalent to the maximum-probability decoder of [24].

An upper bound on the error probability is now derived. For fixed $\theta^N$ and $u^M$, the conditional error probability is denoted by $\Pr\{\mathrm{error} \mid \theta^N, u^M\}$. The average error probability is

$P_e = \sum_{\theta^N \in \mathbb{F}_q^N} \sum_{u^M \in \mathbb{F}_q^M} p(\theta^N, u^M)\, \Pr\{\mathrm{error} \mid \theta^N, u^M\}$. (39)

Weak typicality is instrumental in the following proofs. The notations of [25, Definition 4.2] are extended to stationary and ergodic sources. For any positive real number $\varepsilon$ and some integer $N > 0$, the weakly typical set $A^N_{[\Theta]\varepsilon} \subset \mathbb{F}_q^N$ for a stationary and ergodic source $\Theta$ is the set of vectors $\theta^N \in \mathbb{F}_q^N$ satisfying

$\left| -\frac{1}{N} \log p(\theta^N) - H(\Theta) \right| \le \varepsilon$, (40)

where $H(\Theta)$ is the entropy rate of the source. Similarly, for the noise vector $u^M$, define

$A^M_{[u]\varepsilon} = \left\{ u^M \in \mathbb{F}_q^M : \left| -\frac{1}{M} \log p(u^M) - H(p_u) \right| \le \varepsilon \right\}$. (41)

Recall that the entries of $u^M$ are iid, so $H(u) = H(p_u)$. Thanks to the Shannon-McMillan-Breiman theorem [23, Sec. 16.8], the pmf of the general stationary and ergodic source converges. In other words, for any $\varepsilon > 0$, there exist $N_\varepsilon$ and $M_\varepsilon$ such that for all $N > N_\varepsilon$ and $M > M_\varepsilon$,

$\Pr\left\{ \left| -\frac{1}{N} \log p(\Theta^N) - H(\Theta) \right| \le \varepsilon \right\} \ge 1 - \varepsilon$, (42)

and

$\Pr\left\{ \left| -\frac{1}{M} \log p(u^M) - H(p_u) \right| \le \varepsilon \right\} \ge 1 - \varepsilon$. (43)

We can make $\varepsilon$ arbitrarily close to zero as $N \to \infty$ and $M \to \infty$. A sandwich proof of this theorem is given in [23, Sec. 16.8]. For the sparse and uncorrelated source defined in (1), $H(\Theta)$ is equal to $H(p_\Theta)$, the entropy of a single source symbol. The entropy rate of the StM source is the conditional entropy $H(\Theta_{n+r} \mid \Theta_n^{n+r-1})$.

From (42) and (43), one has $\Pr\{\Theta^N \in A^N_{[\Theta]\varepsilon}\} \ge 1 - \varepsilon$ and $\Pr\{u^M \in A^M_{[u]\varepsilon}\} \ge 1 - \varepsilon$ for $N > N_\varepsilon$ and $M > M_\varepsilon$. This implies that, for $N$ and $M$ sufficiently large, $\Theta^N$ and $u^M$ belong to the weakly typical sets $A^N_{[\Theta]\varepsilon}$ and $A^M_{[u]\varepsilon}$, almost surely. With respect to typicality, $\mathbb{F}_q^N \times \mathbb{F}_q^M$ can be divided into two parts. Define the sets $U$ and $U^c$ for the pair of vectors $(\theta^N, u^M)$, such that $U \cup U^c = \mathbb{F}_q^N \times \mathbb{F}_q^M$,

$U = \{ \theta^N \in \mathbb{F}_q^N, u^M \in \mathbb{F}_q^M : \theta^N \in A^N_{[\Theta]\varepsilon} \text{ and } u^M \in A^M_{[u]\varepsilon} \}$, (44)

$U^c = \{ \theta^N \in \mathbb{F}_q^N, u^M \in \mathbb{F}_q^M : \theta^N \notin A^N_{[\Theta]\varepsilon} \text{ or } u^M \notin A^M_{[u]\varepsilon} \}$. (45)

$U$ is the joint typical set for $(\theta^N, u^M)$, due to the independence of $\Theta^N$ and $u^M$. The error probability can be bounded as

$P_e = \sum_{(\theta^N, u^M) \in U} p(\theta^N)\, p(u^M)\, \Pr\{\mathrm{error} \mid \theta^N, u^M\} + \sum_{(\theta^N, u^M) \in U^c} p(\theta^N)\, p(u^M)\, \Pr\{\mathrm{error} \mid \theta^N, u^M\} \stackrel{(a)}{\le} \sum_{(\theta^N, u^M) \in U} p(\theta^N)\, p(u^M)\, \Pr\{\mathrm{error} \mid \theta^N, u^M\} + \sum_{(\theta^N, u^M) \in U^c} p(\theta^N)\, p(u^M) \stackrel{(b)}{\le} \sum_{(\theta^N, u^M) \in U} p(\theta^N)\, p(u^M)\, \Pr\{\mathrm{error} \mid \theta^N, u^M\} + 2\varepsilon$, (46)

where (a) comes from $\Pr\{\mathrm{error} \mid \theta^N, u^M\} \le 1$ and (b) follows from the fact that

$\sum_{(\theta^N, u^M) \in U^c} p(\theta^N)\, p(u^M) = 1 - \sum_{(\theta^N, u^M) \in U} p(\theta^N)\, p(u^M) = 1 - \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}} p(\theta^N) \sum_{u^M \in A^M_{[u]\varepsilon}} p(u^M) \le 1 - (1-\varepsilon)(1-\varepsilon) \le 2\varepsilon$. (47)

Since $A$ is generated randomly, define the random event

$E(\theta^N, u^M; \varphi^N, v^M) = \{ A\theta^N + u^M = A\varphi^N + v^M \}$, (48)

where $(\theta^N, u^M)$ is the realization of the environment state and $(\varphi^N, v^M)$ is the potential reconstruction result. Conditioned on $(\theta^N, u^M)$, $\Pr\{\mathrm{error} \mid \theta^N, u^M\}$ is in fact the probability of the union of the events $E(\theta^N, u^M; \varphi^N, v^M)$ over all parameter pairs $(\varphi^N, v^M) \in \mathbb{F}_q^N \times \mathbb{F}_q^M$ such that $\varphi^N \ne \theta^N$ and $p(\varphi^N)\, p(v^M) \ge p(\theta^N)\, p(u^M)$, see (38). The conditional error probability can then be rewritten as

$\Pr\{\mathrm{error} \mid \theta^N, u^M\} = \Pr\left\{ \bigcup_{\varphi^N \ne \theta^N,\ v^M :\ p(\varphi^N)p(v^M) \ge p(\theta^N)p(u^M)} E(\theta^N, u^M; \varphi^N, v^M) \right\}$. (49)

Introducing (49) in (46) and applying the union bound yields

$P_e \le \sum_{(\theta^N, u^M) \in U} \sum_{\varphi^N \ne \theta^N,\ v^M :\ p(\varphi^N)p(v^M) \ge p(\theta^N)p(u^M)} p(\theta^N)\, p(u^M)\, \Pr\{E(\theta^N, u^M; \varphi^N, v^M)\} + 2\varepsilon = \sum_{(\theta^N, u^M) \in U} \sum_{\varphi^N \ne \theta^N} \sum_{v^M \in \mathbb{F}_q^M} p(\theta^N)\, p(u^M)\, \Phi(\theta^N, u^M; \varphi^N, v^M)\, \Pr\{E(\theta^N, u^M; \varphi^N, v^M)\} + 2\varepsilon$, (50)

where

$\Phi(\theta^N, u^M; \varphi^N, v^M) = \begin{cases} 1 & \text{if } p(\varphi^N)\, p(v^M) \ge p(\theta^N)\, p(u^M), \\ 0 & \text{if } p(\varphi^N)\, p(v^M) < p(\theta^N)\, p(u^M). \end{cases}$ (51)

Now consider the following lemma.

Lemma 1 (Upper bound on $\Phi$). Consider some $s \in \mathbb{R}^+$ with $s \le 1$. For any $\theta^N, \varphi^N$ in $\mathbb{F}_q^N$ and $u^M, v^M$ in $\mathbb{F}_q^M$, the following inequality holds:

$\Phi(\theta^N, u^M; \varphi^N, v^M) \le \left( \frac{p(\varphi^N)\, p(v^M)}{p(\theta^N)\, p(u^M)} \right)^s$. (52)

Lemma 1 is part of Gallager's derivation of error exponents in [26, Sec. 5.6]. Introducing (52) with $s = 1$ into (50), one gets

$P_e \le \sum_{(\theta^N, u^M) \in U} \sum_{\varphi^N \in \mathbb{F}_q^N \setminus \{\theta^N\}} \sum_{v^M \in \mathbb{F}_q^M} p(\varphi^N)\, p(v^M)\, \Pr\{E(\theta^N, u^M; \varphi^N, v^M)\} + 2\varepsilon$. (53)

In (53),

$\Pr\{E(\theta^N, u^M; \varphi^N, v^M)\} = \Pr\{ A\mu^N = s^M \mid \mu^N \ne 0, s^M \}$, (54)

with $\mu^N = \varphi^N - \theta^N \in \mathbb{F}_q^N \setminus \{0\}$ and $s^M = v^M - u^M \in \mathbb{F}_q^M$. This probability depends on the sparsity of $\mu^N$ and of $s^M$; let $d_1 = \|\mu^N\|_0$ and $d_2 = \|s^M\|_0$. Both $d_1$ and $d_2$ are integers, with $1 \le d_1 \le N$ and $0 \le d_2 \le M$. Define the multivariate function

$f(d_1, d_2; \gamma, q, M) = \Pr\{ A\mu^N = s^M \mid \|\mu^N\|_0 = d_1, \|s^M\|_0 = d_2 \}$, (55)

where $\gamma$, $q$, and $M$ are the parameters of the random matrix $A$. The probability $\Pr\{A\mu^N = 0 \mid \mu^N \ne 0\} = f(d_1, 0; \gamma, q, M)$ has been evaluated in [27] and [19]. We provide a simple extension of this result to $d_2 \ne 0$.

Lemma 2 (Properties of $f(d_1, d_2; \gamma, q, M)$). The function $f(d_1, d_2; \gamma, q, M)$ defined in (55) is non-increasing in $d_2$ for a given $d_1$, and

$f(d_1, d_2; \gamma, q, M) \le f(d_1, 0; \gamma, q, M) = \left( \frac{1}{q} + \left(1 - \frac{1}{q}\right)\left(1 - \frac{\gamma q}{q-1}\right)^{d_1} \right)^M$. (56)

Moreover, $f(d_1, 0; \gamma, q, M)$ is non-increasing in $d_1$ and

$f(d_1, 0; \gamma, q, M) \le f(1, 0; \gamma, q, M) = (1 - \gamma)^M$. (57)

If $\gamma = 1 - 1/q$, which corresponds to a uniformly distributed sensing matrix,

$f(d_1, d_2; \gamma, q, M) = q^{-M}$ (58)

is constant. See Appendix A for the proof details.

Using Lemma 2, (53) can be expressed as

$P_e \stackrel{(a)}{\le} \sum_{d_1=1}^{N} \sum_{d_2=0}^{M} \sum_{(\theta^N, u^M) \in U}\ \sum_{\varphi^N : \|\varphi^N - \theta^N\|_0 = d_1}\ \sum_{v^M : \|u^M - v^M\|_0 = d_2} p(\varphi^N)\, p(v^M)\, f(d_1, d_2; \gamma, q, M) + 2\varepsilon$
$\stackrel{(b)}{\le} \sum_{d_1=1}^{N} \sum_{(\theta^N, u^M) \in U}\ \sum_{\varphi^N : \|\varphi^N - \theta^N\|_0 = d_1} p(\varphi^N)\, f(d_1, 0; \gamma, q, M) + 2\varepsilon$
$\stackrel{(c)}{\le} \sum_{d_1=1}^{\lceil \alpha N \rceil} \sum_{(\theta^N, u^M) \in U}\ \sum_{\varphi^N : \|\varphi^N - \theta^N\|_0 = d_1} p(\varphi^N)\, f(1, 0; \gamma, q, M) + \sum_{d_1=\lceil \alpha N \rceil}^{N} \sum_{(\theta^N, u^M) \in U}\ \sum_{\varphi^N : \|\varphi^N - \theta^N\|_0 = d_1} p(\varphi^N)\, f(\lceil \alpha N \rceil, 0; \gamma, q, M) + 2\varepsilon$, (59)

where (a) follows from the classification of $\varphi^N$ and $v^M$ according to the $\ell_0$ norm of their difference with $\theta^N$ and $u^M$ respectively, and (b) is obtained using the bound (56) and the fact that $\sum_{v^M \in \mathbb{F}_q^M} p(v^M) = 1$.

The splitting in (c) permits $f(d_1, 0; \gamma, q, M)$ to be bounded differently in the two regimes; this idea comes from [27] and is also meaningful here. The parameter $\alpha$ is a positive real number with $0 < \alpha < 0.5$; the way to choose $\alpha$ is discussed in Section IV-B. The two terms in (59), denoted by $P_{U1}(\alpha)$ and $P_{U2}(\alpha)$, are now considered separately.

For the first term $P_{U1}(\alpha)$, we have

$P_{U1}(\alpha) = f(1, 0; \gamma, q, M) \sum_{d_1=1}^{\lceil \alpha N \rceil} \sum_{(\theta^N, u^M) \in U}\ \sum_{\varphi^N : \|\varphi^N - \theta^N\|_0 = d_1} p(\varphi^N)$
$\stackrel{(a)}{=} (1-\gamma)^M \sum_{u^M \in A^M_{[u]\varepsilon}} \sum_{d_1=1}^{\lceil \alpha N \rceil} \sum_{\varphi^N \in \mathbb{F}_q^N} p(\varphi^N) \sum_{\theta^N \in A^N_{[\Theta]\varepsilon} : \|\theta^N - \varphi^N\|_0 = d_1} 1$
$\stackrel{(b)}{\le} (1-\gamma)^M \sum_{u^M \in A^M_{[u]\varepsilon}} \sum_{d_1=1}^{\lceil \alpha N \rceil} \sum_{\varphi^N \in \mathbb{F}_q^N} p(\varphi^N)\, \left| \{ \theta^N \in \mathbb{F}_q^N : \|\theta^N - \varphi^N\|_0 = d_1 \} \right|$
$\stackrel{(c)}{\le} (1-\gamma)^M \sum_{u^M \in A^M_{[u]\varepsilon}} \sum_{d_1=1}^{\lceil \alpha N \rceil} 2^{N H_2(d_1/N)} (q-1)^{d_1}$
$\stackrel{(d)}{\le} (1-\gamma)^M\, |A^M_{[u]\varepsilon}|\, \lceil \alpha N \rceil\, 2^{N H_2(\alpha)} (q-1)^{\lceil \alpha N \rceil}$
$\stackrel{(e)}{\le} 2^{M(H(p_u) + \log(1-\gamma) + \varepsilon) + N(H_2(\alpha) + \alpha \log(q-1)) + \log(\alpha N)} = P_1(\alpha)$, (60)

where (a) follows by changing the order of summation, and (b) is obtained by considering all $\theta^N \in \mathbb{F}_q^N$ and not only typical sequences. The bound (c) is obtained by noticing that

$\left| \{ \theta^N \in \mathbb{F}_q^N : \|\theta^N - \varphi^N\|_0 = d_1 \} \right| = \binom{N}{d_1} (q-1)^{d_1} \le 2^{N H_2(d_1/N)} (q-1)^{d_1}$, (61)

where $H_2(p)$ denotes the entropy of a Bernoulli-$p$ source, and that $\sum_{\varphi^N \in \mathbb{F}_q^N} p(\varphi^N) = 1$; (d) follows from the monotonicity of $H_2(d_1/N)$, which is increasing in $d_1$ as long as $d_1 \le \lceil \alpha N \rceil < N/2$; (e) comes from [23, Theorem 3.1.2], the upper bound on the size of $A^M_{[u]\varepsilon}$, i.e.,

$|A^M_{[u]\varepsilon}| \le 2^{M(H(p_u) + \varepsilon)}$ for $M > M_\varepsilon$. (62)

22 22 of A M [u]ε, i.e., A M [u]ε 2 MHp u)+ε), 62) for M > M ε. Similarly, for > ε, one has ow we turn to P U2 α), P U2 α) = a) d 1 = α A [Θ]ε θ,u M ) U ϕ F : ϕ θ 0 =d 1 θ,u M ) U ϕ F = A [Θ]ε A M [u]ε b) 2 HΘ)+ε). 63) p ϕ ) f α, 0; γ,, M) p ϕ ) f α, 0; γ,, M) 1 + ) α γ 1 )) M ) ) ) ) α 2 HΘ) M Hp u)+log γ ) +ε ε = P 2 α)64), where a) is by ignoring the constraint that ϕ θ 0 = d 1, and b) is by the upper bounds of and, as before. Equations 59), 60), and 64) complete the proof. A [Θ]ε A M [u]ε B. Sufficient Condition In this section, sufficient conditions for the W and C cases are derived to get a vanishing upper bound of error probability. Proposition 3 Sufficient condition, W and C cases). Assume the absence of sensing noise and consider a sensing matrix with sparsity factor γ. For some δ R + which may be taken arbitrary close to zero), there exists small positive real numbers ε, ξ, and integers δ, M ε such that > δ and M > M ε, if the following conditions hold

- the communication noise is not uniformly distributed, i.e.,
  $H(p_u) < \log q - \xi$, (65)
- the sparsity factor is lower bounded as
  $\gamma > 1 - 2^{-H(p_u) - \varepsilon}$, (66)
- the compression ratio $M/N$ satisfies
  $\frac{M}{N} > \frac{H(\Theta) + \varepsilon}{\log q - H(p_u) - \xi}$, (67)

then one has $P_e \le \delta$ using MAP decoding. As $N \to \infty$ and $M \to \infty$, $\varepsilon$ and $\xi$ can be chosen arbitrarily close to zero.

Proof: Both $P_1(\alpha)$ and $P_2(\alpha)$ need to be vanishing for increasing $N$ and $M$. The exponent of each term is considered in turn. Define, from (60),

$E_1^C = -\frac{M}{N}\left(H(p_u) + \log(1-\gamma) + \varepsilon\right) - H_2(\alpha) - \alpha \log(q-1) - \frac{\log(\alpha N)}{N}$, (68)

so that $P_1(\alpha) = 2^{-N E_1^C}$ and $\lim_{N \to \infty} 2^{-N E_1^C} = 0$ if $E_1^C > 0$. Thus, if $E_1^C > 0$, for any $\tau_1 \in \mathbb{R}^+$ arbitrarily small, there exists $N_{\tau_1}$ such that, for $N > N_{\tau_1}$, one has $P_1(\alpha) < \tau_1$.

Notice that if $H(p_u) + \log(1-\gamma) + \varepsilon \ge 0$, then $E_1^C$ is negative; thus one should first have

$H(p_u) + \log(1-\gamma) + \varepsilon < 0$, (69)

leading to (66). With this condition, $E_1^C > 0$ leads to

$\frac{M}{N} > \frac{H_2(\alpha) + \alpha \log(q-1) + \frac{\log(\alpha N)}{N}}{\log\frac{1}{1-\gamma} - H(p_u) - \varepsilon}$. (70)

Similarly, define from (64)

$E_2^C = -H(\Theta) - \varepsilon - \frac{M}{N}\left(H(p_u) + \log\left(\frac{1}{q} + \left(1-\frac{1}{q}\right)\left(1-\frac{\gamma q}{q-1}\right)^{\lceil \alpha N \rceil}\right) + \varepsilon\right)$, (71)

so that $P_2(\alpha) = 2^{-N E_2^C}$.

Again, if $E_2^C > 0$, for any $\tau_2 \in \mathbb{R}^+$ arbitrarily small, there exists $N_{\tau_2} \in \mathbb{N}^+$ such that, for $N > N_{\tau_2}$, one has $P_2(\alpha) < \tau_2$. Since $0 < \gamma \le 1 - 1/q$, one gets $0 \le 1 - \frac{\gamma q}{q-1} < 1$ and

$\lim_{N \to \infty} \left(1 - \frac{\gamma q}{q-1}\right)^{\lceil \alpha N \rceil} = 0$. (72)

Thus, for $\sigma \in \mathbb{R}^+$ arbitrarily small, there exists an $N_\sigma$ such that, for $N > N_\sigma$,

$\left(1 - \frac{1}{q}\right)\left(1 - \frac{\gamma q}{q-1}\right)^{\lceil \alpha N \rceil} < \frac{\sigma}{q}$. (73)

Hence $E_2^C$ in (71) can be lower bounded, for $N > N_\sigma$, by

$E_2^C > -H(\Theta) - \varepsilon + \frac{M}{N}\left(\log q - \log(1+\sigma) - H(p_u) - \varepsilon\right)$. (74)

If this lower bound is positive, then $E_2^C$ is positive. Again, if $H(p_u) - \log q + \log(1+\sigma) + \varepsilon \ge 0$, one obtains a negative lower bound for $E_2^C$ from (74). Thus, one deduces (65) in Proposition 3, with

$\xi = \log(1+\sigma) + \varepsilon$. (75)

Under (65), to get a positive lower bound in (74), one should have

$\frac{M}{N} > \frac{H(\Theta) + \varepsilon}{\log q - H(p_u) - \log(1+\sigma) - \varepsilon}$. (76)

From (76) and (75), with $\xi \to 0$ as $N \to \infty$, one gets (67) in Proposition 3. From (70) and (76), one obtains

$\frac{M}{N} > \max\left\{ \frac{H_2(\alpha) + \alpha \log(q-1) + \frac{\log(\alpha N)}{N}}{\log\frac{1}{1-\gamma} - H(p_u) - \varepsilon},\ \frac{H(\Theta) + \varepsilon}{\log q - H(p_u) - \xi} \right\}$. (77)

The value of $\alpha$ should be chosen such that the lower bound (77) on $M/N$ is minimal. One may compare (77) with the necessary condition (14). The second term of (77) is similar to (14), since both $\xi$ and $\varepsilon$ can be made arbitrarily close to 0 as $N \to \infty$. The best value of $\alpha$ thus has to be such that

$\frac{H_2(\alpha) + \alpha \log(q-1) + \frac{\log(\alpha N)}{N}}{\log\frac{1}{1-\gamma} - H(p_u) - \varepsilon} \le \frac{H(\Theta) + \varepsilon}{\log q - H(p_u) - \xi}$. (78)
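A simple way to apply the selection rule (78) numerically is a grid search over $\alpha \in (0, 0.5)$. The sketch below (not from the paper) does this in the limit $N \to \infty$, where the $\varepsilon$, $\xi$, and $\log(\alpha N)/N$ terms are neglected; the parameter values are toy assumptions.

```python
# Numerical sketch (not from the paper) of the alpha selection rule (78) and of
# the resulting sufficient bound (77), in the limit N -> infinity where the
# epsilon, xi and log(alpha N)/N terms are neglected. Toy parameter values.
import numpy as np

def H2(a):
    return -a * np.log2(a) - (1 - a) * np.log2(1 - a)

def sufficient_ratio(H_theta, H_pu, q, gamma, n_grid=10_000):
    """Largest alpha in (0, 0.5) satisfying (78), and the bound (77) on M/N."""
    log_q = np.log2(q)
    second = H_theta / (log_q - H_pu)                       # second term of (77)
    denom = np.log2(1.0 / (1.0 - gamma)) - H_pu             # must be > 0, cf. (66)
    if denom <= 0:
        raise ValueError("gamma too small for this noise level, see (66)")
    alphas = np.linspace(1e-4, 0.5 - 1e-4, n_grid)
    first = (H2(alphas) + alphas * np.log2(q - 1)) / denom  # first term of (77)
    feasible = alphas[first <= second]
    alpha = feasible.max() if feasible.size else alphas[0]
    return alpha, max((H2(alpha) + alpha * np.log2(q - 1)) / denom, second)

alpha, ratio = sufficient_ratio(H_theta=0.5, H_pu=0.3, q=4, gamma=0.4)
print(f"alpha = {alpha:.3f},  M/N > {ratio:.3f}")
```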

The function $H_2(\alpha) + \alpha \log(q-1)$ is increasing for $\alpha \in\ ]0, 0.5[$ and tends to 0 as $\alpha \to 0$. The term $\log(\alpha N)/N$ is also negligible for large $N$. Thus, there always exists some $\alpha$ satisfying (78). Since the speed of convergence of $\xi$ is affected by $\alpha$, we choose the largest $\alpha$ that satisfies (78). Finally, the sufficient condition (67) on $M/N$ is obtained.

From (31), one may conclude that

$P_e \le \tau_1 + \tau_2 + 2\varepsilon$. (79)

To ensure $P_e < \delta$, we should choose $\tau_1$, $\tau_2$, and $\varepsilon$ to satisfy $\tau_1 + \tau_2 + 2\varepsilon < \delta$. Then a proper value of $\sigma$, which depends on $\tau_2$ and $\varepsilon$, can be chosen. At last, $\xi$ is obtained from (75). With these parameters determined, if all three conditions in Proposition 3 hold, there exist integers $N_\varepsilon$, $N_{\tau_1}$, $N_{\tau_2}$, and $N_\sigma$ such that, for any

$N > N_\delta = \max\{N_\varepsilon, N_{\tau_1}, N_{\tau_2}, N_\sigma\}$ (80)

and $M > M_\varepsilon$, one has $P_e < \delta$.

C. Discussion and Numerical Results

In [18, Eq. (24)], considering a sparse and iid source, a uniformly distributed random matrix $A$, and the minimum-empirical-entropy decoder, the following error exponent is obtained for the C case:

$E_0^C = \min_{\hat p, \hat q} \left\{ D(\hat p \| p_\Theta) + \frac{M}{N} D(\hat q \| p_u) + \left[ \frac{M}{N} \log q - H(\hat p) - \frac{M}{N} H(\hat q) \right]^+ \right\}$, (81)

where the minimization is over pmfs $\hat p$ and $\hat q$ on $\mathbb{F}_q$, $D(\cdot\|\cdot)$ denotes the relative entropy between two distributions, and $[\cdot]^+ = \max\{0, \cdot\}$. In parallel, [12] proposed an approach to prove that the upper bound on the probability of decoding error $P_e$ under minimum-empirical-entropy decoding is equal to that of the maximum-probability decoder. As discussed in Section IV-A, in the W and C cases, the MAP decoder in the considered context is equivalent to the maximum-probability decoder.

As a consequence, (81) is also the error exponent of the MAP decoder in the considered context. A proof of (81) using the method of types needs to make some assumptions on the topology of the considered sensor network in order to specify the type of $\theta^N$. For correlated sources, one could extend (81) by considering Markov models and higher-order types, leading to cumbersome derivations.

From (81), provided that $E_0^C > 0$, $P_e$ tends to 0 as $N$ increases. $E_0^C$ cannot be negative, and $E_0^C = 0$ if and only if

$D(\hat p \| p_\Theta) = 0, \quad D(\hat q \| p_u) = 0, \quad \frac{M}{N} \log q - H(\hat p) - \frac{M}{N} H(\hat q) \le 0$. (82)

Thus, (82) implies that $\frac{M}{N} \log q - H(p_\Theta) - \frac{M}{N} H(p_u) \le 0$. Consequently, a necessary and sufficient condition for $E_0^C > 0$ is $\frac{M}{N} \log q - H(p_\Theta) - \frac{M}{N} H(p_u) > 0$, which is the same as (67) with $\gamma = 1 - 1/q$ (corresponding to $A$ uniformly distributed). The proof using weak typicality thus leads to the same results (in terms of sufficient conditions for an asymptotically vanishing $P_e$) as the technique in [18].

In the noiseless case, since $\gamma$ can be chosen arbitrarily small, the necessary condition in Proposition 1 and the sufficient condition in Proposition 3 asymptotically coincide. This confirms the numerical results obtained in [19]. In the C case, the difference between the two conditions comes from the constraint linking $\gamma$ and the entropy of the communication noise. In Section III, the structure of $A$ was not considered and no condition on $\gamma$ was obtained. The lower bound on $\gamma$ implies that $A$ should be dense enough to fight the noise. Since the communication noise is iid, for a given probability of having one entry of $u^M$ non-zero, i.e., $\Pr(u \ne 0)$, the entropy $H(p_u)$ is maximized when $p_u(a) = \Pr(u \ne 0)/(q-1)$ for any $a \in \mathbb{F}_q \setminus \{0\}$. This corresponds to the worst noise in terms of compression efficiency. Figure 2 represents the lower bound on $\gamma$ as a function of $\Pr(u \ne 0)$ ranging up to $10^{-1}$, for different values of $q$.

There is almost no requirement on $\gamma$ when $\Pr(u \ne 0)$ is small. For a given noise level, a larger size of the finite field requires a denser sensing matrix. Figure 3 shows the influence of the communication noise on the optimum compression ratio: the lower bound on $M/N$ is represented as a function of $H(\Theta)/\log q$, for different values of $q$ and different values of $\Pr(u \ne 0)$.

Figure 2. Lower bound on $\gamma$ to achieve the optimum compression ratio for $N \to \infty$, according to (66), as a function of the crossover probability $\Pr(u \ne 0)$, for several field sizes $q$.

Figure 3. Optimum asymptotic achievable compression ratio as a function of $H(\Theta)/\log q$, according to (67), for a crossover probability $\Pr(u \ne 0)$ equal to 0.01, 0.05, 0.1, and 0.2 respectively, and in the noiseless case, for $q = 2, 4, 16, 256$.
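The curve of Figure 2 is obtained directly from (66). The sketch below (illustrative; the $\varepsilon$ slack is dropped) evaluates the lower bound on $\gamma$ for the worst-case noise pmf, in which all non-zero noise values are equiprobable, for a few field sizes.

```python
# Sketch reproducing the quantity plotted in Figure 2 (illustrative, with the
# epsilon slack dropped): the lower bound (66) on gamma versus the crossover
# probability Pr(u != 0), for the worst-case noise pmf where all nonzero
# noise values are equiprobable.
import numpy as np

def H2(a):
    a = np.asarray(a, dtype=float)
    return np.where((a > 0) & (a < 1),
                    -a * np.log2(a) - (1 - a) * np.log2(1 - a), 0.0)

def gamma_lower_bound(p_cross, q):
    """gamma > 1 - 2^{-H(p_u)}, with H(p_u) = H2(p) + p*log2(q-1) (worst case)."""
    H_pu = H2(p_cross) + p_cross * np.log2(q - 1)
    return 1.0 - 2.0 ** (-H_pu)

p = np.logspace(-4, -1, 7)
for q in (2, 4, 16, 256):
    print(f"q = {q:3d}:", np.round(gamma_lower_bound(p, q), 4))
```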

V. SUFFICIENT CONDITION IN PRESENCE OF SENSING NOISE

This section performs an achievability study in the presence of sensing noise, by considering the conditional pmf $p_{x|\Theta}$. The communication noise $u^M$ is first neglected to simplify the problem (S case); the extension to the CS case is easily obtained from the S case. Assume that $\theta^N$ is the true state vector and that $x^N$ represents the measurements of the sensors. The sink receives $y^M = Ax^N$. The a posteriori pmf (7) can be written as

$p(\theta^N \mid y^M, A) \propto \sum_{z^N \in \mathbb{F}_q^N} p(\theta^N)\, p(z^N \mid \theta^N)\, 1_{y^M = Az^N}$. (83)

In the case of MAP estimation, an error occurs if there exists a vector $\varphi^N \in \mathbb{F}_q^N \setminus \{\theta^N\}$ such that

$\sum_{z^N \in \mathbb{F}_q^N} p(\theta^N, z^N)\, 1_{y^M = Az^N} \le \sum_{z^N \in \mathbb{F}_q^N} p(\varphi^N, z^N)\, 1_{y^M = Az^N}$. (84)

Here $\theta^N$ and $x^N$ are considered as fixed, but unknown. The decoder has knowledge of $A$ and $y^M = Ax^N$; thus an alternative way to express (84) is

$\sum_{z^N \in \mathbb{F}_q^N} p(\theta^N, z^N)\, 1_{Ax^N = Az^N} \le \sum_{z^N \in \mathbb{F}_q^N} p(\varphi^N, z^N)\, 1_{Ax^N = Az^N}$. (85)

A. Achievability Study

We begin with an extension of the basic weakly typical set introduced in Section IV-A. For any $\varepsilon > 0$ and $N \in \mathbb{N}^+$, based on $A^N_{[\Theta]\varepsilon}$ for $\theta^N$, one defines the weakly conditional typical set $A^N_{[x|\Theta]\varepsilon}(\theta^N)$ for $x^N$, which is conditionally distributed with respect to $p_{x|\Theta}$, with $\theta^N \in A^N_{[\Theta]\varepsilon}$,

$A^N_{[x|\Theta]\varepsilon}(\theta^N) = \left\{ x^N \in \mathbb{F}_q^N : \left| -\frac{1}{N} \log p(x^N \mid \theta^N) - H(x \mid \Theta) \right| \le \varepsilon \right\}$. (86)

Since $H(\Theta, x) = H(\Theta) + H(x \mid \Theta)$, if $\theta^N \in A^N_{[\Theta]\varepsilon}$ and $x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)$, then $(\theta^N, x^N) \in A^N_{[\Theta,x]2\varepsilon}$ by consistency, where $A^N_{[\Theta,x]2\varepsilon}$ denotes the weakly joint typical set, i.e., the set of pairs $(\theta^N, x^N) \in \mathbb{F}_q^N \times \mathbb{F}_q^N$ such that

$\left| -\frac{1}{N} \log p(\theta^N, x^N) - H(\Theta, x) \right| \le 2\varepsilon$. (87)

For any $\varepsilon > 0$ there exists an $N_\varepsilon$ such that, for all $N > N_\varepsilon$ and for any $\theta^N \in A^N_{[\Theta]\varepsilon}$, one has $\Pr\{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N) \mid \Theta^N = \theta^N\} \ge 1 - \varepsilon$ and $\Pr\{(\Theta^N, x^N) \in A^N_{[\Theta,x]2\varepsilon}\} \ge 1 - 2\varepsilon$. The cardinality of the set $A^N_{[\Theta,x]2\varepsilon}$ satisfies

$|A^N_{[\Theta,x]2\varepsilon}| \le 2^{N(H(\Theta,x) + 2\varepsilon)}$. (88)

One may have $\varepsilon$ arbitrarily close to zero as $N \to \infty$. Considering $A^N_{[\Theta,x]2\varepsilon}$, the estimation error probability is bounded by

$P_e = \sum_{(\theta^N, x^N) \in A^N_{[\Theta,x]2\varepsilon}} p(\theta^N, x^N)\, \Pr\{\mathrm{error} \mid \theta^N, x^N\} + \sum_{(\theta^N, x^N) \notin A^N_{[\Theta,x]2\varepsilon}} p(\theta^N, x^N)\, \Pr\{\mathrm{error} \mid \theta^N, x^N\} \le \sum_{(\theta^N, x^N) \in A^N_{[\Theta,x]2\varepsilon}} p(\theta^N, x^N)\, \Pr\{\mathrm{error} \mid \theta^N, x^N\} + 2\varepsilon$. (89)

Errors appear mainly due to a bad sensing matrix. Averaging over all $A \in \mathbb{F}_q^{M \times N}$, (89) becomes

$P_e \le \sum_{A \in \mathbb{F}_q^{M \times N}} p(A) \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}}\ \sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)} p(\theta^N, x^N)\, \Pr\{\mathrm{error} \mid \theta^N, x^N, A\} + 2\varepsilon$, (90)

where $p(A) = \Pr\{A = A\}$. The term $\Pr\{\mathrm{error} \mid \theta^N, x^N, A\}$ can be written as

$\Pr\{\mathrm{error} \mid \theta^N, x^N, A\} = \begin{cases} 1 & \text{if } \exists\, \varphi^N \in \mathbb{F}_q^N \setminus \{\theta^N\} \text{ such that (85) holds}, \\ 0 & \text{if, } \forall\, \varphi^N \in \mathbb{F}_q^N \setminus \{\theta^N\}, \text{ (85) does not hold}. \end{cases}$ (91)

Using again the idea of Lemma 1, the conditional error probability is bounded by

$\Pr\{\mathrm{error} \mid \theta^N, x^N, A\} \le \sum_{\varphi^N \in \mathbb{F}_q^N \setminus \{\theta^N\}} \frac{\sum_{z_1^N \in \mathbb{F}_q^N} p(\varphi^N, z_1^N)\, 1_{Ax^N = Az_1^N}}{\sum_{z_2^N \in \mathbb{F}_q^N} p(\theta^N, z_2^N)\, 1_{Ax^N = Az_2^N}}$. (92)

From (90) and (92), one gets

$P_e \le \sum_{A} p(A) \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}}\ \sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)} p(\theta^N, x^N) \sum_{\varphi^N \ne \theta^N} \frac{\sum_{z_1^N} p(\varphi^N, z_1^N)\, 1_{Ax^N = Az_1^N}}{\sum_{z_2^N} p(\theta^N, z_2^N)\, 1_{Ax^N = Az_2^N}} + 2\varepsilon$. (93)

Now, for some $\theta^N \in A^N_{[\Theta]\varepsilon}$, consider the direct image by $A$ of the conditional typical set $A^N_{[x|\Theta]\varepsilon}(\theta^N)$,

$Y_\varepsilon(A, \theta^N) = \{ y^M = Ax^N, \text{ for all } x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N) \}$. (94)

Lemma 3. For any real-valued function $h(x^N)$ with $x^N \in \mathbb{F}_q^N$, one has

$\sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)} h(x^N) = \sum_{y^M \in Y_\varepsilon(A, \theta^N)}\ \sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)} h(x^N)\, 1_{y^M = Ax^N}$. (95)

Proof: For a given $y^M \in Y_\varepsilon(A, \theta^N)$, consider the set

$X_\varepsilon(y^M, A, \theta^N) = \{ x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N) \text{ such that } y^M = Ax^N \}$. (96)

Then one has

$A^N_{[x|\Theta]\varepsilon}(\theta^N) = \bigcup_{y^M \in Y_\varepsilon(A, \theta^N)} X_\varepsilon(y^M, A, \theta^N)$, (97)

with $X_\varepsilon(y_i^M, A, \theta^N) \cap X_\varepsilon(y_j^M, A, \theta^N) = \emptyset$ for any $y_i^M \ne y_j^M$, since the multiplication by $A$ is a surjection from $A^N_{[x|\Theta]\varepsilon}(\theta^N)$ onto $Y_\varepsilon(A, \theta^N)$. So any sum over $x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)$ can be decomposed as

$\sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)} h(x^N) = \sum_{y^M \in Y_\varepsilon(A, \theta^N)}\ \sum_{x^N \in X_\varepsilon(y^M, A, \theta^N)} h(x^N) = \sum_{y^M \in Y_\varepsilon(A, \theta^N)}\ \sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)} h(x^N)\, 1_{y^M = Ax^N}$. (98)

Applying (95) to (93), one obtains

$P_e \le \sum_{A} p(A) \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}}\ \sum_{y^M \in Y_\varepsilon(A, \theta^N)}\ \sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)} p(\theta^N, x^N)\, 1_{y^M = Ax^N} \sum_{\varphi^N \ne \theta^N} \frac{\sum_{z_1^N} p(\varphi^N, z_1^N)\, 1_{y^M = Az_1^N}}{\sum_{z_2^N} p(\theta^N, z_2^N)\, 1_{y^M = Az_2^N}} + 2\varepsilon$
$\le \sum_{A} p(A) \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}}\ \sum_{y^M \in Y_\varepsilon(A, \theta^N)}\ \sum_{\varphi^N \ne \theta^N}\ \sum_{z_1^N \in \mathbb{F}_q^N} p(\varphi^N, z_1^N)\, 1_{y^M = Az_1^N} + 2\varepsilon$, (99)

since we have

$\frac{\sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)} p(\theta^N, x^N)\, 1_{y^M = Ax^N}}{\sum_{z_2^N \in \mathbb{F}_q^N} p(\theta^N, z_2^N)\, 1_{y^M = Az_2^N}} \le 1$. (100)

The bound (100) is tight because, for $N$ sufficiently large, the probability of the non-typical set vanishes. Recall that $y^M = Ax^N$, even though $x^N$ is not explicit in (99). As a vector $y^M$ may correspond to several $x^N$'s, (99) is further bounded by

$P_e \le \sum_{A} p(A) \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}}\ \sum_{\varphi^N \ne \theta^N}\ \sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)}\ \sum_{z_1^N \in \mathbb{F}_q^N} p(\varphi^N, z_1^N)\, 1_{Ax^N = Az_1^N} + 2\varepsilon$. (101)

Since

$\sum_{A} p(A)\, 1_{Ax^N = Az_1^N} = \Pr\{ Ax^N = Az_1^N \}$, (102)

one gets

$P_e \le \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}}\ \sum_{\varphi^N \ne \theta^N}\ \sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)}\ \sum_{z_1^N \in \mathbb{F}_q^N} p(\varphi^N, z_1^N)\, \Pr\{ Ax^N = Az_1^N \} + 2\varepsilon$. (103)

Suppose that $\|x^N - z_1^N\|_0 = d$. If $d = 0$, $\Pr\{Ax^N = Az_1^N\}$ equals 1. Otherwise, we can apply Lemma 2: without communication noise, $\Pr\{Ax^N = Az_1^N\} = f(d, 0; \gamma, q, M)$. Depending on whether $d$ is zero or not, the bound (103) is split as follows:

$P_e \le P_{A1} + P_{A2} + 2\varepsilon$, (104)

where

$P_{A1} = \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}}\ \sum_{z_1^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)}\ \sum_{\varphi^N \in \mathbb{F}_q^N \setminus \{\theta^N\}} p(\varphi^N, z_1^N)$, (105)

and

$P_{A2} = \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}}\ \sum_{\varphi^N \in \mathbb{F}_q^N \setminus \{\theta^N\}}\ \sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)}\ \sum_{z_1^N \in \mathbb{F}_q^N \setminus \{x^N\}} p(\varphi^N, z_1^N)\, \Pr\{ Ax^N = Az_1^N \}$. (106)

Lemma 4. A sufficient condition for $P_{A1} \le 2\varepsilon$ is that, for any pair of vectors $(\theta^N, \varphi^N) \in A^N_{[\Theta]\varepsilon} \times A^N_{[\Theta]\varepsilon}$ such that $\theta^N \ne \varphi^N$,

$A^N_{[x|\Theta]\varepsilon}(\theta^N) \cap A^N_{[x|\Theta]\varepsilon}(\varphi^N) = \emptyset$. (107)

Proof: Assume that (107) is satisfied. Changing the order of summation, (105) becomes

$P_{A1} = \sum_{\varphi^N \in \mathbb{F}_q^N} p(\varphi^N) \sum_{\theta^N \in A^N_{[\Theta]\varepsilon} \setminus \{\varphi^N\}}\ \sum_{z_1^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)} p(z_1^N \mid \varphi^N)$, (108)

which can be further decomposed as $P_{A1} = P_{A11} + P_{A12}$, with

$P_{A11} = \sum_{\varphi^N \in A^N_{[\Theta]\varepsilon}} p(\varphi^N) \sum_{\theta^N \in A^N_{[\Theta]\varepsilon} \setminus \{\varphi^N\}}\ \sum_{z_1^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)} p(z_1^N \mid \varphi^N) \stackrel{(a)}{\le} \sum_{\varphi^N \in A^N_{[\Theta]\varepsilon}} p(\varphi^N) \sum_{z_1^N \in \mathbb{F}_q^N \setminus A^N_{[x|\Theta]\varepsilon}(\varphi^N)} p(z_1^N \mid \varphi^N) \le \sum_{\varphi^N \in A^N_{[\Theta]\varepsilon}} p(\varphi^N)\, \varepsilon \le \varepsilon$, (109)

where (a) comes from the fact that, if (107) is satisfied, one has

$\bigcup_{\theta^N \in A^N_{[\Theta]\varepsilon} \setminus \{\varphi^N\}} A^N_{[x|\Theta]\varepsilon}(\theta^N) \subseteq \mathbb{F}_q^N \setminus A^N_{[x|\Theta]\varepsilon}(\varphi^N)$. (110)

On the other hand,

$P_{A12} = \sum_{\varphi^N \in \mathbb{F}_q^N \setminus A^N_{[\Theta]\varepsilon}} p(\varphi^N) \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}}\ \sum_{z_1^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)} p(z_1^N \mid \varphi^N) \le \sum_{\varphi^N \in \mathbb{F}_q^N \setminus A^N_{[\Theta]\varepsilon}} p(\varphi^N) \le \varepsilon$, (111)

since, for this part,

$\bigcup_{\theta^N \in A^N_{[\Theta]\varepsilon}} A^N_{[x|\Theta]\varepsilon}(\theta^N) \subseteq \mathbb{F}_q^N$. (112)

From (109) and (111), Lemma 4 is proved.

Now consider the term (106),

$P_{A2} = \sum_{d=1}^{N}\ \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}}\ \sum_{\varphi^N \ne \theta^N}\ \sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)}\ \sum_{z_1^N : \|x^N - z_1^N\|_0 = d} p(\varphi^N, z_1^N)\, f(d, 0; \gamma, q, M)$
$\le \sum_{d=1}^{\lceil \beta N \rceil}\ \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}}\ \sum_{\varphi^N \ne \theta^N}\ \sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)}\ \sum_{z_1^N : \|x^N - z_1^N\|_0 = d} p(\varphi^N, z_1^N)\, f(1, 0; \gamma, q, M) + \sum_{d=\lceil \beta N \rceil}^{N}\ \sum_{\theta^N \in A^N_{[\Theta]\varepsilon}}\ \sum_{\varphi^N \ne \theta^N}\ \sum_{x^N \in A^N_{[x|\Theta]\varepsilon}(\theta^N)}\ \sum_{z_1^N \in \mathbb{F}_q^N} p(\varphi^N, z_1^N)\, f(\lceil \beta N \rceil, 0; \gamma, q, M)$, (113)

which is similar to (59) in Section IV-A. For $N$ sufficiently large, the condition on $M/N$ ensuring that $P_{A2}$ tends to zero as $N \to \infty$ is

$\frac{M}{N} > \frac{H(\Theta, x) + \varepsilon}{\log q - \xi}$, (114)

for some $\xi \in \mathbb{R}^+$. Finally, Proposition 4 states the sufficient condition for reliable recovery in the S case.

Proposition 4 (Sufficient condition, S case). In the S case, fix an arbitrarily small positive real number $\delta$; there exist $\varepsilon \in \mathbb{R}^+$, $\xi \in \mathbb{R}^+$, $N_\delta \in \mathbb{N}^+$, and $M_\varepsilon \in \mathbb{N}^+$ such that,

for any $N > N_\delta$ and $M > M_\varepsilon$, one has $P_e < \delta$ under MAP decoding if (107) and (114) hold. One can make both $\varepsilon$ and $\xi$ arbitrarily close to 0 as $N \to \infty$.

Finally, the CS case, accounting for both communication and sensing noise, has to be considered.

Proposition 5 (Sufficient condition, CS case). Considering both communication noise and sensing noise, for $N$ and $M$ sufficiently large and positive $\varepsilon, \xi$ arbitrarily small, reliable recovery can be ensured under MAP decoding if
- the communication noise is not uniformly distributed, see (65),
- there is no overlap between any two different weakly conditional typical sets, i.e., $A^N_{[x|\Theta]\varepsilon}(\theta^N) \cap A^N_{[x|\Theta]\varepsilon}(\varphi^N) = \emptyset$ for any two typical but different $\theta^N$ and $\varphi^N$,
- the sparsity factor satisfies the constraint (66),
- the compression ratio $M/N$ is lower bounded by

$\frac{M}{N} > \frac{H(\Theta, x) + \varepsilon}{\log q - H(p_u) - \xi}$. (115)

The derivations are similar to those of Proposition 3 and Proposition 4.

B. Discussion and Numerical Results

When comparing the necessary condition in Proposition 1 and the sufficient condition in Proposition 5, an interesting fact is that $H(\Theta \mid x) = 0$ is a sufficient condition for (107). It implies that the value of $\theta^N$ is fixed almost surely, as long as $x^N$ is known. So, (107) helps to interpret (12), justifying the need for the conditional entropy $H(\Theta \mid x)$ to tend to zero as $N$ increases. This condition may be satisfied, since $|A^N_{[\Theta]\varepsilon}| \ll q^N$ as long as $H(\Theta) < \log q$. The entropy rate $H(\Theta)$ can be very small; Appendix B presents a possible situation where $H(\Theta) = 0$. Another implicit constraint resulting from (107) is

$\sum_{\theta^N \in A^N_{[\Theta]\varepsilon}} \left| A^N_{[x|\Theta]\varepsilon}(\theta^N) \right| \le q^N$, (116)

which means that

$H(\Theta, x) \le \log q$. (117)

Consider a communication noise with $\Pr(u \ne 0) = 0.1$ and the transition pmf

$p(x_n \mid \theta_n) = \begin{cases} 1 - \Pr(x \ne \Theta) & \text{if } x_n = \theta_n, \\ \frac{\Pr(x \ne \Theta)}{q-1} & \text{if } x_n \in \mathbb{F}_q \setminus \{\theta_n\}, \end{cases}$ (118)

where $\Pr(x \ne \Theta)$ denotes the probability of a sensing error. In Figure 4, the lower bound on $M/N$ is represented as a function of $H(\Theta)/\log q$, for different values of $q$ and different values of $\Pr(x \ne \Theta)$.

Figure 4. Optimum achievable compression ratio as a function of $H(\Theta)/\log q$, according to (115), for $\Pr(x \ne \Theta)$ equal to 0 (C case), 0.01, 0.05, and 0.1, respectively, when $\Pr(u \ne 0) = 0.1$, for $q = 2, 4, 16, 256$.
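The curves of Figure 4 follow from (115) with the symmetric sensing noise (118). The sketch below (illustrative; $\varepsilon$ and $\xi$ dropped, and a toy value assumed for $H(\Theta)$) evaluates the corresponding lower bound on $M/N$.

```python
# Illustrative evaluation of the sufficient bound (115) (epsilon and xi dropped)
# for the symmetric sensing-noise pmf (118) and a symmetric communication noise
# with Pr(u != 0) = 0.1, as in the setting of Figure 4. Toy entropy values.
import numpy as np

def H2(a):
    return 0.0 if a in (0.0, 1.0) else -a * np.log2(a) - (1 - a) * np.log2(1 - a)

def ratio_with_sensing_noise(H_theta, p_sense, p_comm, q):
    """M/N > (H(Theta) + H(x|Theta)) / (log q - H(p_u)), cf. (115) and (118)."""
    log_q = np.log2(q)
    H_x_given_theta = H2(p_sense) + p_sense * np.log2(q - 1)   # from (118)
    H_pu = H2(p_comm) + p_comm * np.log2(q - 1)
    return (H_theta + H_x_given_theta) / (log_q - H_pu)

for p_sense in (0.0, 0.01, 0.05, 0.1):
    r = ratio_with_sensing_noise(H_theta=1.0, p_sense=p_sense, p_comm=0.1, q=16)
    print(f"Pr(x != Theta) = {p_sense:4.2f}  ->  M/N > {r:.3f}")
```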

VI. CONCLUSIONS AND FUTURE WORK

In this paper we have considered robust Bayesian compressed sensing over finite fields under MAP decoding. Both asymptotically necessary and sufficient conditions on the compression ratio for reliable recovery have been obtained, and their convergence has been shown, even in the case of sparse sensing matrices. Several previous results have been generalized by considering a stationary and ergodic source model. Both communication noise and sensing noise have been taken into account. We have shown that the choice of the sparsity factor of the sensing matrix only depends on the communication noise. Since the necessary and sufficient conditions asymptotically converge, the MAP decoder achieves the optimum lower bound on the compression ratio, which can be expressed as a function of $H(\Theta, x)$, $H(p_u)$, and the alphabet size $q$.

In this paper, the sensing matrix was assumed to be perfectly known and without specific structure. In sensor network compressed sensing applications, the structure of the sensing matrix usually depends on the structure of the network. Evaluating the impact of these constraints on the compression efficiency will be the subject of future research. A first step in this direction was done in [15], which considered clustered sensors.

APPENDIX A
PROOF OF LEMMA 2

Proof: Let $A_i$ be the $i$-th row of $A$. As all entries of $A$ are independent,

$\Pr\{ A\mu^N = s^M \mid \mu^N \ne 0, s^M \} = \prod_{i=1}^{M} \Pr\{ A_i \mu^N = s_i \mid \mu^N \ne 0, s_i \}$. (119)

According to [27, Lemma 21], we have

$\Pr\{ A_i \mu^N = 0 \mid \|\mu^N\|_0 = d_1 \} = \frac{1}{q} + \left(1 - \frac{1}{q}\right)\left(1 - \frac{\gamma q}{q-1}\right)^{d_1}$, (120)

and

$\Pr\{ A_i \mu^N = a \mid \|\mu^N\|_0 = d_1 \} = \frac{1}{q} - \frac{1}{q}\left(1 - \frac{\gamma q}{q-1}\right)^{d_1}, \quad a \in \mathbb{F}_q \setminus \{0\}$. (121)

Since $d_2$ is the number of non-zero entries of $s^M$, combining (119), (120), and (121), one gets

$f(d_1, d_2; \gamma, q, M) = \left( \frac{1}{q} + \left(1 - \frac{1}{q}\right)\left(1 - \frac{\gamma q}{q-1}\right)^{d_1} \right)^{M - d_2} \left( \frac{1}{q} - \frac{1}{q}\left(1 - \frac{\gamma q}{q-1}\right)^{d_1} \right)^{d_2}$. (122)

The monotonicity of this function follows readily from its expression and from condition (5).
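The closed form (122) and the monotonicity statements of Lemma 2 are easy to check numerically. The following sketch (illustrative, with toy parameters) does so.

```python
# Direct numerical check (illustrative) of the closed form (122) for
# f(d1, d2; gamma, q, M) and of the monotonicity claims of Lemma 2.
import numpy as np

def f(d1, d2, gamma, q, M):
    """Pr{ A mu = s | ||mu||_0 = d1, ||s||_0 = d2 }, as given by (122)."""
    t = (1.0 - gamma * q / (q - 1.0)) ** d1
    p_zero = 1.0 / q + (1.0 - 1.0 / q) * t        # (120): Pr{A_i mu = 0}
    p_nonzero = 1.0 / q - t / q                   # (121): Pr{A_i mu = a}, a != 0
    return p_zero ** (M - d2) * p_nonzero ** d2

gamma, q, M = 0.3, 4, 10
vals = np.array([[f(d1, d2, gamma, q, M) for d2 in range(M + 1)] for d1 in (1, 2, 5, 20)])
assert np.all(np.diff(vals, axis=1) <= 1e-15)     # non-increasing in d2 (Lemma 2)
assert np.all(np.diff(vals[:, 0]) <= 1e-15)       # f(d1, 0) non-increasing in d1
print(f"f(1,0) = {vals[0,0]:.4e}  (should equal (1-gamma)^M = {(1-gamma)**M:.4e})")
```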

APPENDIX B
A POSSIBLE SITUATION FOR $H(\Theta) = 0$

Consider $N$ sensors uniformly deployed over a unit-radius disk. The physical quantities (in $\mathbb{R}$) collected by the sensors are denoted by $\Omega^N \in \mathbb{R}^N$. We assume that $\Omega^N \sim \mathcal{N}(0, \Sigma)$ with

$\Sigma = \begin{pmatrix} 1 & e^{-\lambda d_{1,2}^2} & \cdots & e^{-\lambda d_{1,N}^2} \\ e^{-\lambda d_{2,1}^2} & 1 & \cdots & e^{-\lambda d_{2,N}^2} \\ \vdots & \vdots & \ddots & \vdots \\ e^{-\lambda d_{N,1}^2} & \cdots & \cdots & 1 \end{pmatrix}$, (123)

where $\lambda$ is some constant and $d_{i,j}$ is the distance between sensors $i$ and $j$. The distance between two sensors is random, since the location of each sensor is random. The real-valued entries of $\Omega^N$ are quantized with a $q$-level scalar quantizer. We assume that $q = 2$, corresponding to the rule

$\Theta_i = \begin{cases} 0 & \text{if } \Omega_i < 0, \\ 1 & \text{if } \Omega_i \ge 0. \end{cases}$ (124)

With the above assumptions, we can prove the following lemma.

Lemma 5. The conditional entropy $H(\Theta_n \mid \Theta_1^{n-1})$ converges to zero as $n \to \infty$.

Proof: Let $j$ be the index of the sensor with the minimum distance to sensor $n$, among the $n-1$ neighbouring sensors, i.e.,

$j = \arg\min_{1 \le i \le n-1} d_{n,i}$. (125)

We have

$H(\Theta_n \mid \Theta_1^{n-1}) \le H(\Theta_n \mid \Theta_j)$. (126)

Denoting the minimum distance by $d(n) = d_{n,j}$, the covariance matrix of $(\Omega_n, \Omega_j)$ is

$\Sigma_n = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$, (127)

where $\rho = e^{-\lambda d(n)^2}$. For a pair of realizations $\omega_n$ and $\omega_j$, the joint probability density function writes

$g(\omega_n, \omega_j) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left( -\frac{\omega_n^2 + \omega_j^2 - 2\rho\,\omega_n\omega_j}{2(1-\rho^2)} \right)$. (128)

We easily obtain the probability that both $\Omega_n$ and $\Omega_j$ are negative,

$\Pr\{\Omega_n < 0 \text{ and } \Omega_j < 0\} = \int_{-\infty}^{0} \int_{-\infty}^{0} g(\omega_n, \omega_j)\, d\omega_n\, d\omega_j = \frac{1}{4} + \frac{1}{2\pi} \arctan\frac{\rho}{\sqrt{1-\rho^2}}$. (129)

Taking into account (124), (129) is exactly the probability of the pair $(\Theta_n, \Theta_j)$ being $(0, 0)$. Define

$\varepsilon(\rho) := \frac{1}{4} - \frac{1}{2\pi} \arctan\frac{\rho}{\sqrt{1-\rho^2}}$. (130)

After similar derivations, one obtains

$\Pr(\Theta_n = 0, \Theta_j = 0) = \Pr(\Theta_n = 1, \Theta_j = 1) = \frac{1}{2} - \varepsilon(\rho)$ (131)

and

$\Pr(\Theta_n = 0, \Theta_j = 1) = \Pr(\Theta_n = 1, \Theta_j = 0) = \varepsilon(\rho)$. (132)

The joint entropy is then

$H(\Theta_n, \Theta_j) = 1 + H_2(2\varepsilon(\rho))$. (133)

Meanwhile, $H(\Theta_j) = 1$, thanks to the 2-level quantizer applied to a zero-mean Gaussian variable. Obviously,

$H(\Theta_n \mid \Theta_j) = H_2(2\varepsilon(\rho)) = H_2\!\left(2\varepsilon\!\left(e^{-\lambda d(n)^2}\right)\right)$. (134)

This conditional entropy is increasing in $d(n)$. When the number of sensors increases, the disk becomes denser and the minimum distance $d(n)$ becomes smaller. Thus, $d(n)$ tends to 0 as $n \to \infty$, which implies that $H(\Theta_n \mid \Theta_j) \to 0$. According to (126), we conclude that $H(\Theta_n \mid \Theta_1^{n-1})$ also goes to zero as $n \to \infty$.

Applying the chain rule, the entropy rate writes

$H(\Theta) = \lim_{N \to \infty} \frac{1}{N}\left( H(\Theta_1) + \sum_{n=2}^{N} H(\Theta_n \mid \Theta_1^{n-1}) \right)$. (135)

By the Cesàro mean [23, Theorem 4.2.3], $H(\Theta) = 0$, since $H(\Theta_n \mid \Theta_1^{n-1}) \to 0$.
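The behaviour exploited in the proof of Lemma 5 can be illustrated numerically: as the inter-sensor distance $d$ shrinks, $\rho \to 1$, $\varepsilon(\rho)$ of (130) tends to 0, and the conditional entropy (134) vanishes. The sketch below uses toy values for $\lambda$ and $d$.

```python
# Numerical illustration (not from the paper) of Appendix B: as two sensors get
# closer, rho = exp(-lambda d^2) -> 1, eps(rho) of (130) -> 0, and the
# conditional entropy H(Theta_n | Theta_j) = H_2(2 eps(rho)) of (134) -> 0.
import numpy as np

def H2(a):
    return np.where((a > 0) & (a < 1), -a * np.log2(a) - (1 - a) * np.log2(1 - a), 0.0)

def eps(rho):
    """(130): eps(rho) = 1/4 - arctan(rho / sqrt(1 - rho^2)) / (2*pi)."""
    return 0.25 - np.arctan(rho / np.sqrt(1.0 - rho ** 2)) / (2.0 * np.pi)

lam = 1.0
d = np.array([1.0, 0.5, 0.2, 0.1, 0.01])      # inter-sensor distances (toy values)
rho = np.exp(-lam * d ** 2)
print(np.round(H2(2.0 * eps(rho)), 4))        # decreases towards 0 as d -> 0
```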


More information

Approximate Message Passing

Approximate Message Passing Approximate Message Passing Mohammad Emtiyaz Khan CS, UBC February 8, 2012 Abstract In this note, I summarize Sections 5.1 and 5.2 of Arian Maleki s PhD thesis. 1 Notation We denote scalars by small letters

More information

AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN

AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN A Thesis Presented to The Academic Faculty by Bryan Larish In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

More information

Lecture 11: Polar codes construction

Lecture 11: Polar codes construction 15-859: Information Theory and Applications in TCS CMU: Spring 2013 Lecturer: Venkatesan Guruswami Lecture 11: Polar codes construction February 26, 2013 Scribe: Dan Stahlke 1 Polar codes: recap of last

More information

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics

DS-GA 1002 Lecture notes 11 Fall Bayesian statistics DS-GA 100 Lecture notes 11 Fall 016 Bayesian statistics In the frequentist paradigm we model the data as realizations from a distribution that depends on deterministic parameters. In contrast, in Bayesian

More information

SPARSE signal representations have gained popularity in recent

SPARSE signal representations have gained popularity in recent 6958 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 10, OCTOBER 2011 Blind Compressed Sensing Sivan Gleichman and Yonina C. Eldar, Senior Member, IEEE Abstract The fundamental principle underlying

More information

Communication constraints and latency in Networked Control Systems

Communication constraints and latency in Networked Control Systems Communication constraints and latency in Networked Control Systems João P. Hespanha Center for Control Engineering and Computation University of California Santa Barbara In collaboration with Antonio Ortega

More information

Lecture 3: Channel Capacity

Lecture 3: Channel Capacity Lecture 3: Channel Capacity 1 Definitions Channel capacity is a measure of maximum information per channel usage one can get through a channel. This one of the fundamental concepts in information theory.

More information

Joint Write-Once-Memory and Error-Control Codes

Joint Write-Once-Memory and Error-Control Codes 1 Joint Write-Once-Memory and Error-Control Codes Xudong Ma Pattern Technology Lab LLC, U.S.A. Email: xma@ieee.org arxiv:1411.4617v1 [cs.it] 17 ov 2014 Abstract Write-Once-Memory (WOM) is a model for many

More information

MCMC Sampling for Bayesian Inference using L1-type Priors

MCMC Sampling for Bayesian Inference using L1-type Priors MÜNSTER MCMC Sampling for Bayesian Inference using L1-type Priors (what I do whenever the ill-posedness of EEG/MEG is just not frustrating enough!) AG Imaging Seminar Felix Lucka 26.06.2012 , MÜNSTER Sampling

More information

Upper Bounds on the Capacity of Binary Intermittent Communication

Upper Bounds on the Capacity of Binary Intermittent Communication Upper Bounds on the Capacity of Binary Intermittent Communication Mostafa Khoshnevisan and J. Nicholas Laneman Department of Electrical Engineering University of Notre Dame Notre Dame, Indiana 46556 Email:{mhoshne,

More information

Nearest Neighbor Decoding in MIMO Block-Fading Channels With Imperfect CSIR

Nearest Neighbor Decoding in MIMO Block-Fading Channels With Imperfect CSIR IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 58, NO. 3, MARCH 2012 1483 Nearest Neighbor Decoding in MIMO Block-Fading Channels With Imperfect CSIR A. Taufiq Asyhari, Student Member, IEEE, Albert Guillén

More information

Lecture 5 Channel Coding over Continuous Channels

Lecture 5 Channel Coding over Continuous Channels Lecture 5 Channel Coding over Continuous Channels I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw November 14, 2014 1 / 34 I-Hsiang Wang NIT Lecture 5 From

More information

Capacity of Memoryless Channels and Block-Fading Channels With Designable Cardinality-Constrained Channel State Feedback

Capacity of Memoryless Channels and Block-Fading Channels With Designable Cardinality-Constrained Channel State Feedback 2038 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 50, NO. 9, SEPTEMBER 2004 Capacity of Memoryless Channels and Block-Fading Channels With Designable Cardinality-Constrained Channel State Feedback Vincent

More information

Solutions to Homework Set #1 Sanov s Theorem, Rate distortion

Solutions to Homework Set #1 Sanov s Theorem, Rate distortion st Semester 00/ Solutions to Homework Set # Sanov s Theorem, Rate distortion. Sanov s theorem: Prove the simple version of Sanov s theorem for the binary random variables, i.e., let X,X,...,X n be a sequence

More information

On Scalable Coding in the Presence of Decoder Side Information

On Scalable Coding in the Presence of Decoder Side Information On Scalable Coding in the Presence of Decoder Side Information Emrah Akyol, Urbashi Mitra Dep. of Electrical Eng. USC, CA, US Email: {eakyol, ubli}@usc.edu Ertem Tuncel Dep. of Electrical Eng. UC Riverside,

More information

Distributed Stochastic Optimization in Networks with Low Informational Exchange

Distributed Stochastic Optimization in Networks with Low Informational Exchange Distributed Stochastic Optimization in Networs with Low Informational Exchange Wenjie Li and Mohamad Assaad, Senior Member, IEEE arxiv:80790v [csit] 30 Jul 08 Abstract We consider a distributed stochastic

More information

Codes for Partially Stuck-at Memory Cells

Codes for Partially Stuck-at Memory Cells 1 Codes for Partially Stuck-at Memory Cells Antonia Wachter-Zeh and Eitan Yaakobi Department of Computer Science Technion Israel Institute of Technology, Haifa, Israel Email: {antonia, yaakobi@cs.technion.ac.il

More information

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 Please submit the solutions on Gradescope. EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 1. Optimal codeword lengths. Although the codeword lengths of an optimal variable length code

More information

A proof of the existence of good nested lattices

A proof of the existence of good nested lattices A proof of the existence of good nested lattices Dinesh Krithivasan and S. Sandeep Pradhan July 24, 2007 1 Introduction We show the existence of a sequence of nested lattices (Λ (n) 1, Λ(n) ) with Λ (n)

More information

(Classical) Information Theory III: Noisy channel coding

(Classical) Information Theory III: Noisy channel coding (Classical) Information Theory III: Noisy channel coding Sibasish Ghosh The Institute of Mathematical Sciences CIT Campus, Taramani, Chennai 600 113, India. p. 1 Abstract What is the best possible way

More information

Channel Polarization and Blackwell Measures

Channel Polarization and Blackwell Measures Channel Polarization Blackwell Measures Maxim Raginsky Abstract The Blackwell measure of a binary-input channel (BIC is the distribution of the posterior probability of 0 under the uniform input distribution

More information

Training-Based Schemes are Suboptimal for High Rate Asynchronous Communication

Training-Based Schemes are Suboptimal for High Rate Asynchronous Communication Training-Based Schemes are Suboptimal for High Rate Asynchronous Communication The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation

More information

ECE 4400:693 - Information Theory

ECE 4400:693 - Information Theory ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential

More information

Research Collection. The relation between block length and reliability for a cascade of AWGN links. Conference Paper. ETH Library

Research Collection. The relation between block length and reliability for a cascade of AWGN links. Conference Paper. ETH Library Research Collection Conference Paper The relation between block length and reliability for a cascade of AWG links Authors: Subramanian, Ramanan Publication Date: 0 Permanent Link: https://doiorg/0399/ethz-a-00705046

More information

Information Theory. Lecture 10. Network Information Theory (CT15); a focus on channel capacity results

Information Theory. Lecture 10. Network Information Theory (CT15); a focus on channel capacity results Information Theory Lecture 10 Network Information Theory (CT15); a focus on channel capacity results The (two-user) multiple access channel (15.3) The (two-user) broadcast channel (15.6) The relay channel

More information

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1 Kraft s inequality An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if N 2 l i 1 Proof: Suppose that we have a tree code. Let l max = max{l 1,...,

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Reliable Computation over Multiple-Access Channels

Reliable Computation over Multiple-Access Channels Reliable Computation over Multiple-Access Channels Bobak Nazer and Michael Gastpar Dept. of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA, 94720-1770 {bobak,

More information

Chapter 2: Source coding

Chapter 2: Source coding Chapter 2: meghdadi@ensil.unilim.fr University of Limoges Chapter 2: Entropy of Markov Source Chapter 2: Entropy of Markov Source Markov model for information sources Given the present, the future is independent

More information

Cases Where Finding the Minimum Entropy Coloring of a Characteristic Graph is a Polynomial Time Problem

Cases Where Finding the Minimum Entropy Coloring of a Characteristic Graph is a Polynomial Time Problem Cases Where Finding the Minimum Entropy Coloring of a Characteristic Graph is a Polynomial Time Problem Soheil Feizi, Muriel Médard RLE at MIT Emails: {sfeizi,medard}@mit.edu Abstract In this paper, we

More information

Chapter 7. Error Control Coding. 7.1 Historical background. Mikael Olofsson 2005

Chapter 7. Error Control Coding. 7.1 Historical background. Mikael Olofsson 2005 Chapter 7 Error Control Coding Mikael Olofsson 2005 We have seen in Chapters 4 through 6 how digital modulation can be used to control error probabilities. This gives us a digital channel that in each

More information

Intermittent Communication

Intermittent Communication Intermittent Communication Mostafa Khoshnevisan, Student Member, IEEE, and J. Nicholas Laneman, Senior Member, IEEE arxiv:32.42v2 [cs.it] 7 Mar 207 Abstract We formulate a model for intermittent communication

More information

EE 4TM4: Digital Communications II. Channel Capacity

EE 4TM4: Digital Communications II. Channel Capacity EE 4TM4: Digital Communications II 1 Channel Capacity I. CHANNEL CODING THEOREM Definition 1: A rater is said to be achievable if there exists a sequence of(2 nr,n) codes such thatlim n P (n) e (C) = 0.

More information

STAT 200C: High-dimensional Statistics

STAT 200C: High-dimensional Statistics STAT 200C: High-dimensional Statistics Arash A. Amini May 30, 2018 1 / 57 Table of Contents 1 Sparse linear models Basis Pursuit and restricted null space property Sufficient conditions for RNS 2 / 57

More information

Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels

Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels IEEE TRANSACTIONS ON AUTOMATIC CONTROL 1 Iterative Encoder-Controller Design for Feedback Control Over Noisy Channels Lei Bao, Member, IEEE, Mikael Skoglund, Senior Member, IEEE, and Karl Henrik Johansson,

More information

Rigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems

Rigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems 1 Rigorous Dynamics and Consistent Estimation in Arbitrarily Conditioned Linear Systems Alyson K. Fletcher, Mojtaba Sahraee-Ardakan, Philip Schniter, and Sundeep Rangan Abstract arxiv:1706.06054v1 cs.it

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 AEP Asymptotic Equipartition Property AEP In information theory, the analog of

More information

An Extended Fano s Inequality for the Finite Blocklength Coding

An Extended Fano s Inequality for the Finite Blocklength Coding An Extended Fano s Inequality for the Finite Bloclength Coding Yunquan Dong, Pingyi Fan {dongyq8@mails,fpy@mail}.tsinghua.edu.cn Department of Electronic Engineering, Tsinghua University, Beijing, P.R.

More information

Practical Polar Code Construction Using Generalised Generator Matrices

Practical Polar Code Construction Using Generalised Generator Matrices Practical Polar Code Construction Using Generalised Generator Matrices Berksan Serbetci and Ali E. Pusane Department of Electrical and Electronics Engineering Bogazici University Istanbul, Turkey E-mail:

More information

Minimum Feedback Rates for Multi-Carrier Transmission With Correlated Frequency Selective Fading

Minimum Feedback Rates for Multi-Carrier Transmission With Correlated Frequency Selective Fading Minimum Feedback Rates for Multi-Carrier Transmission With Correlated Frequency Selective Fading Yakun Sun and Michael L. Honig Department of ECE orthwestern University Evanston, IL 60208 Abstract We consider

More information

Coding into a source: an inverse rate-distortion theorem

Coding into a source: an inverse rate-distortion theorem Coding into a source: an inverse rate-distortion theorem Anant Sahai joint work with: Mukul Agarwal Sanjoy K. Mitter Wireless Foundations Department of Electrical Engineering and Computer Sciences University

More information

Subset Universal Lossy Compression

Subset Universal Lossy Compression Subset Universal Lossy Compression Or Ordentlich Tel Aviv University ordent@eng.tau.ac.il Ofer Shayevitz Tel Aviv University ofersha@eng.tau.ac.il Abstract A lossy source code C with rate R for a discrete

More information

Network coding for multicast relation to compression and generalization of Slepian-Wolf

Network coding for multicast relation to compression and generalization of Slepian-Wolf Network coding for multicast relation to compression and generalization of Slepian-Wolf 1 Overview Review of Slepian-Wolf Distributed network compression Error exponents Source-channel separation issues

More information

Submodular Functions Properties Algorithms Machine Learning

Submodular Functions Properties Algorithms Machine Learning Submodular Functions Properties Algorithms Machine Learning Rémi Gilleron Inria Lille - Nord Europe & LIFL & Univ Lille Jan. 12 revised Aug. 14 Rémi Gilleron (Mostrare) Submodular Functions Jan. 12 revised

More information

Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels

Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels Nan Liu and Andrea Goldsmith Department of Electrical Engineering Stanford University, Stanford CA 94305 Email:

More information

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm

Probabilistic Graphical Models Homework 2: Due February 24, 2014 at 4 pm Probabilistic Graphical Models 10-708 Homework 2: Due February 24, 2014 at 4 pm Directions. This homework assignment covers the material presented in Lectures 4-8. You must complete all four problems to

More information

An introduction to basic information theory. Hampus Wessman

An introduction to basic information theory. Hampus Wessman An introduction to basic information theory Hampus Wessman Abstract We give a short and simple introduction to basic information theory, by stripping away all the non-essentials. Theoretical bounds on

More information

Source and Channel Coding for Correlated Sources Over Multiuser Channels

Source and Channel Coding for Correlated Sources Over Multiuser Channels Source and Channel Coding for Correlated Sources Over Multiuser Channels Deniz Gündüz, Elza Erkip, Andrea Goldsmith, H. Vincent Poor Abstract Source and channel coding over multiuser channels in which

More information

Design of MMSE Multiuser Detectors using Random Matrix Techniques

Design of MMSE Multiuser Detectors using Random Matrix Techniques Design of MMSE Multiuser Detectors using Random Matrix Techniques Linbo Li and Antonia M Tulino and Sergio Verdú Department of Electrical Engineering Princeton University Princeton, New Jersey 08544 Email:

More information

K User Interference Channel with Backhaul

K User Interference Channel with Backhaul 1 K User Interference Channel with Backhaul Cooperation: DoF vs. Backhaul Load Trade Off Borna Kananian,, Mohammad A. Maddah-Ali,, Babak H. Khalaj, Department of Electrical Engineering, Sharif University

More information

C.M. Liu Perceptual Signal Processing Lab College of Computer Science National Chiao-Tung University

C.M. Liu Perceptual Signal Processing Lab College of Computer Science National Chiao-Tung University Quantization C.M. Liu Perceptual Signal Processing Lab College of Computer Science National Chiao-Tung University http://www.csie.nctu.edu.tw/~cmliu/courses/compression/ Office: EC538 (03)5731877 cmliu@cs.nctu.edu.tw

More information

Lecture 8: Shannon s Noise Models

Lecture 8: Shannon s Noise Models Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007) Lecture 8: Shannon s Noise Models September 14, 2007 Lecturer: Atri Rudra Scribe: Sandipan Kundu& Atri Rudra Till now we have

More information

Support recovery in compressive sensing for estimation of Direction-Of-Arrival

Support recovery in compressive sensing for estimation of Direction-Of-Arrival Support recovery in compressive sensing for estimation of Direction-Of-Arrival Zhiyuan Weng and Xin Wang Department of Electrical and Computer Engineering Stony Brook University Abstract In the estimation

More information

Supremum of simple stochastic processes

Supremum of simple stochastic processes Subspace embeddings Daniel Hsu COMS 4772 1 Supremum of simple stochastic processes 2 Recap: JL lemma JL lemma. For any ε (0, 1/2), point set S R d of cardinality 16 ln n S = n, and k N such that k, there

More information

A Structured Construction of Optimal Measurement Matrix for Noiseless Compressed Sensing via Polarization of Analog Transmission

A Structured Construction of Optimal Measurement Matrix for Noiseless Compressed Sensing via Polarization of Analog Transmission Li and Kang: A Structured Construction of Optimal Measurement Matrix for Noiseless Compressed Sensing 1 A Structured Construction of Optimal Measurement Matrix for Noiseless Compressed Sensing via Polarization

More information

Bounds on the Maximum Likelihood Decoding Error Probability of Low Density Parity Check Codes

Bounds on the Maximum Likelihood Decoding Error Probability of Low Density Parity Check Codes Bounds on the Maximum ikelihood Decoding Error Probability of ow Density Parity Check Codes Gadi Miller and David Burshtein Dept. of Electrical Engineering Systems Tel-Aviv University Tel-Aviv 69978, Israel

More information

ON BEAMFORMING WITH FINITE RATE FEEDBACK IN MULTIPLE ANTENNA SYSTEMS

ON BEAMFORMING WITH FINITE RATE FEEDBACK IN MULTIPLE ANTENNA SYSTEMS ON BEAMFORMING WITH FINITE RATE FEEDBACK IN MULTIPLE ANTENNA SYSTEMS KRISHNA KIRAN MUKKAVILLI ASHUTOSH SABHARWAL ELZA ERKIP BEHNAAM AAZHANG Abstract In this paper, we study a multiple antenna system where

More information

Frans M.J. Willems. Authentication Based on Secret-Key Generation. Frans M.J. Willems. (joint work w. Tanya Ignatenko)

Frans M.J. Willems. Authentication Based on Secret-Key Generation. Frans M.J. Willems. (joint work w. Tanya Ignatenko) Eindhoven University of Technology IEEE EURASIP Spain Seminar on Signal Processing, Communication and Information Theory, Universidad Carlos III de Madrid, December 11, 2014 : Secret-Based Authentication

More information

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H.

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H. Problem sheet Ex. Verify that the function H(p,..., p n ) = k p k log p k satisfies all 8 axioms on H. Ex. (Not to be handed in). looking at the notes). List as many of the 8 axioms as you can, (without

More information

National University of Singapore Department of Electrical & Computer Engineering. Examination for

National University of Singapore Department of Electrical & Computer Engineering. Examination for National University of Singapore Department of Electrical & Computer Engineering Examination for EE5139R Information Theory for Communication Systems (Semester I, 2014/15) November/December 2014 Time Allowed:

More information

THIS paper constitutes a systematic study of the capacity

THIS paper constitutes a systematic study of the capacity 4 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 1, JANUARY 2002 Capacities of TimeVarying MultipleAccess Channels With Side Information Arnab Das Prakash Narayan, Fellow, IEEE Abstract We determine

More information

EE229B - Final Project. Capacity-Approaching Low-Density Parity-Check Codes

EE229B - Final Project. Capacity-Approaching Low-Density Parity-Check Codes EE229B - Final Project Capacity-Approaching Low-Density Parity-Check Codes Pierre Garrigues EECS department, UC Berkeley garrigue@eecs.berkeley.edu May 13, 2005 Abstract The class of low-density parity-check

More information

The Game of Normal Numbers

The Game of Normal Numbers The Game of Normal Numbers Ehud Lehrer September 4, 2003 Abstract We introduce a two-player game where at each period one player, say, Player 2, chooses a distribution and the other player, Player 1, a

More information

Asymptotic redundancy and prolixity

Asymptotic redundancy and prolixity Asymptotic redundancy and prolixity Yuval Dagan, Yuval Filmus, and Shay Moran April 6, 2017 Abstract Gallager (1978) considered the worst-case redundancy of Huffman codes as the maximum probability tends

More information

THE problem of lossless source coding with side information

THE problem of lossless source coding with side information IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 62, NO. 1, JANUARY 2014 269 Source Coding with Side Information at the Decoder and Uncertain Knowledge of the Correlation Elsa Dupraz, Aline Roumy, and Michel

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

An Achievable Error Exponent for the Mismatched Multiple-Access Channel

An Achievable Error Exponent for the Mismatched Multiple-Access Channel An Achievable Error Exponent for the Mismatched Multiple-Access Channel Jonathan Scarlett University of Cambridge jms265@camacuk Albert Guillén i Fàbregas ICREA & Universitat Pompeu Fabra University of

More information

Solving Corrupted Quadratic Equations, Provably

Solving Corrupted Quadratic Equations, Provably Solving Corrupted Quadratic Equations, Provably Yuejie Chi London Workshop on Sparse Signal Processing September 206 Acknowledgement Joint work with Yuanxin Li (OSU), Huishuai Zhuang (Syracuse) and Yingbin

More information

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER On the Performance of Sparse Recovery IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 11, NOVEMBER 2011 7255 On the Performance of Sparse Recovery Via `p-minimization (0 p 1) Meng Wang, Student Member, IEEE, Weiyu Xu, and Ao Tang, Senior

More information

Decentralized Stochastic Control with Partial Sharing Information Structures: A Common Information Approach

Decentralized Stochastic Control with Partial Sharing Information Structures: A Common Information Approach Decentralized Stochastic Control with Partial Sharing Information Structures: A Common Information Approach 1 Ashutosh Nayyar, Aditya Mahajan and Demosthenis Teneketzis Abstract A general model of decentralized

More information