The Empirical Distribution of Rate-Constrained Source Codes


The Empirical Distribution of Rate-Constrained Source Codes

Tsachy Weissman, Erik Ordentlich
HP Laboratories Palo Alto
HPL, December 8th, 2003*
tsachy@stanford.edu, eord@hpl.hp.com

Keywords: rate-distortion theory, sample converse, denoising, empirical distribution

Abstract: Let $\mathbf{X} = (X_1, X_2, \ldots)$ be a stationary ergodic finite-alphabet source, $X^n$ denote its first $n$ symbols, and $Y^n$ be the codeword assigned to $X^n$ by a lossy source code. The empirical $k$-th-order joint distribution $\hat{Q}_k[X^n, Y^n](x^k, y^k)$ is defined as the frequency of appearances of pairs of $k$-strings $(x^k, y^k)$ along the pair $(X^n, Y^n)$. Our main interest is in the sample behavior of this random distribution. Letting $I_k(Q)$ denote the mutual information $I(X^k; Y^k)$ when $(X^k, Y^k) \sim Q$, we show that for any sequence of lossy source codes of rate $R$,
  $\limsup_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n]) \le R + \frac{1}{k} H(X^k) - H(\mathbf{X})$  a.s.,
where $H(\mathbf{X})$ denotes the entropy rate of $\mathbf{X}$. This is shown to imply, for a large class of sources including all i.i.d. sources and all sources satisfying the Shannon lower bound with equality, that for any sequence of codes which is good in the sense of asymptotically attaining a point on the rate distortion curve,
  $\hat{Q}_k[X^n, Y^n] \stackrel{d}{\to} P_{X^k, \tilde{Y}^k}$  a.s.,
whenever $P_{X^k, \tilde{Y}^k}$ is the unique distribution attaining the minimum in the definition of the $k$-th-order rate distortion function. Further consequences of these results are explored. These include a simple proof of Kieffer's sample converse to lossy source coding, as well as pointwise performance bounds for compression-based denoisers.

* Internal Accession Date Only. Approved for External Publication.
Copyright Hewlett-Packard Company 2003

The Empirical Distribution of Rate-Constrained Source Codes

Tsachy Weissman    Erik Ordentlich

November 6, 2003

T. Weissman is with the Electrical Engineering Department, Stanford University, CA, USA (tsachy@stanford.edu). Part of this work was done while T. Weissman was visiting Hewlett-Packard Laboratories, Palo Alto, CA, USA. T. Weissman was partially supported by NSF grants DMS and CCR. E. Ordentlich is with Hewlett-Packard Laboratories, Palo Alto, CA, USA (eord@hpl.hp.com).

Abstract

Let $\mathbf{X} = (X_1, X_2, \ldots)$ be a stationary ergodic finite-alphabet source, $X^n$ denote its first $n$ symbols, and $Y^n$ be the codeword assigned to $X^n$ by a lossy source code. The empirical $k$-th-order joint distribution $\hat{Q}_k[X^n, Y^n](x^k, y^k)$ is defined as the frequency of appearances of pairs of $k$-strings $(x^k, y^k)$ along the pair $(X^n, Y^n)$. Our main interest is in the sample behavior of this random distribution. Letting $I_k(Q)$ denote the mutual information $I(X^k; Y^k)$ when $(X^k, Y^k) \sim Q$, we show that for any sequence of lossy source codes of rate $R$,
  $\limsup_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n]) \le R + \frac{1}{k} H(X^k) - H(\mathbf{X})$  a.s.,
where $H(\mathbf{X})$ denotes the entropy rate of $\mathbf{X}$. This is shown to imply, for a large class of sources including all i.i.d. sources and all sources satisfying the Shannon lower bound with equality, that for any sequence of codes which is good in the sense of asymptotically attaining a point on the rate distortion curve,
  $\hat{Q}_k[X^n, Y^n] \stackrel{d}{\to} P_{X^k, \tilde{Y}^k}$  a.s.,
whenever $P_{X^k, \tilde{Y}^k}$ is the unique distribution attaining the minimum in the definition of the $k$-th-order rate distortion function. Further consequences of these results are explored. These include a simple proof of Kieffer's sample converse to lossy source coding, as well as pointwise performance bounds for compression-based denoisers.

1 Introduction

Loosely speaking, a rate distortion code sequence is considered good for a given source if it attains a point on its rate distortion curve. The existence of good code sequences with empirical distributions close to those achieving the minimum mutual information in the definition of the rate distortion function is a consequence of the random coding argument at the heart of the achievability part of rate distortion theory. It turns out, however, in ways that we quantify in this work, that any good code sequence must have this property.

This behavior of the empirical distribution of good rate distortion codes is somewhat analogous to that of good channel codes, which was characterized by Shamai and Verdú in [SV97]. Defining the $k$-th-order empirical distribution of a channel code as the proportion of $k$-strings anywhere in the codebook equal to every given $k$-string, this empirical distribution was shown to converge to the capacity-achieving channel input distribution. The analogy is more than merely qualitative. For example, the growth rate of $k$ with $n$ at which this convergence was shown in [SV97] to break down will be seen to have an analogue in our setting. A slight difference between the problems is that in the setting of [SV97] the whole codebook of a good code was shown to be well-behaved, in the sense that the empirical distribution

is defined as an average over all codewords. In the source coding analogue, it is clear that no meaningful statement can be made about the empirical distribution obtained when averaging, with equal weights, over all codewords in the codebook. The reason is that any approximately good codebook will remain approximately good when appending to it a polynomial number of additional codebooks of the same size, so essentially any empirical distribution induced by the new codebook can be created while maintaining the goodness of the codebook. We therefore adopt in this work what seems to be a more meaningful analogue to the empirical distribution notion of [SV97], namely the empirical distribution of the codeword corresponding to the particular source realization, or the joint empirical distribution of the source realization and its associated codeword. We shall show several strong probabilistic senses in which the latter entities converge to the distributions attaining the minimum in the associated rate distortion function.

The basic phenomenon we quantify in this work is that a rate constraint on a sequence of codes imposes a severe limitation on the mutual information associated with its empirical distribution. This is independent of the notion of distortion, or of the goodness of a code. The described behavior of the empirical distribution of good codes is a consequence of it.

Our work consists of two main parts. The first part, Section 3, considers the distribution of the pair of $k$-tuples obtained by looking at the source-reconstruction pair $(X^n, Y^n)$ through a $k$-window at a uniformly sampled location. More specifically, we look at the distribution of $(X_I^{I+k-1}, Y_I^{I+k-1})$, where $I$ is chosen uniformly from $\{1, \ldots, n-k+1\}$, independently of everything else. We refer to this distribution as the $k$-th-order marginalization of the distribution of $(X^n, Y^n)$. In subsection A we show that the normalized mutual information between $X_I^{I+k-1}$ and $Y_I^{I+k-1}$ is essentially (up to an $o(1)$ term, independent of the particular code) upper bounded by $R + \frac{1}{k} H(X^k) - H(\mathbf{X})$, $R$ denoting the rate of the code and $H(\mathbf{X})$ the source entropy rate. This is seen in subsection B to imply the convergence in distribution of $(X_I^{I+k-1}, Y_I^{I+k-1})$ to the joint distribution attaining $R(X^k, D)$, the $k$-th-order rate distortion function, whenever it is unique and satisfies the relation $R(X^k, D) = R(\mathbf{X}, D) + \frac{1}{k} H(X^k) - H(\mathbf{X})$, with $R(\mathbf{X}, D)$ denoting the rate distortion function of the source. This includes all i.i.d. sources, as well as the large family of sources for which the Shannon lower bound holds with equality (cf. [Gra70, Gra71, BG98] and references therein). In subsection C we show that convergence in any reasonable sense cannot hold for any code sequence when $k$ increases with $n$ such that $k/n > R(\mathbf{X}, D)/H(\tilde{\mathbf{Y}})$, where $H(\tilde{\mathbf{Y}})$ is the entropy rate of the reconstruction process attaining the minimum mutual information defining the rate distortion function. In subsection D we apply the results to obtain performance bounds for compression-based denoising. We extend results from [Don02] by showing that any additive noise distribution induces a distortion measure such that if optimal lossy compression of the noisy signal is performed under it, at a distortion level matched to the level of the noise, then the marginalized joint distribution of the noisy source and the reconstruction converges to that of the noisy source and the underlying clean source. This is shown to lead to bounds on the performance of such compression-based denoising schemes.

Section 4 presents the primary part of our work, which consists of pointwise analogues to the results of Section 3.
More specifically, we look at properties of the random empirical $k$-th-order joint distribution $\hat{Q}_k[X^n, Y^n]$ induced by the source-reconstruction pair of $n$-tuples $(X^n, Y^n)$. The main result of subsection A (Theorem 7) asserts that

$R + \frac{1}{k} H(X^k) - H(\mathbf{X})$ is not only an upper bound on the normalized mutual information between $X_I^{I+k-1}$ and $Y_I^{I+k-1}$, as established in Section 3, but is essentially, with probability one (in the limit of large $n$), an upper bound on the normalized mutual information under $\hat{Q}_k[X^n, Y^n]$. This is seen in subsection B to imply the sample converse to lossy source coding of [Kie91], avoiding the use of the ergodic theorem of [Kie89]. In subsection C this is used to establish the almost sure convergence of $\hat{Q}_k$ to the joint distribution attaining the $k$-th-order rate distortion function of the source, under the conditions stipulated for the convergence in the setting of Section 3. In subsection D we apply these almost sure convergence results to derive a pointwise analogue of the performance bound for compression-based denoising derived in subsection 3.D. In subsection E we show that a simple post-processing derandomization scheme performed on the output of the previously analyzed compression-based denoisers results in essentially optimum denoising performance.

The empirical distribution of good lossy source codes was first considered by Kanlis, Khudanpur and Narayan in [KSP96]. For memoryless sources they showed that any good sequence of codebooks must have an exponentially non-negligible fraction of codewords which are typical with respect to the $R(D)$-achieving output distribution. It was later further shown in [Kan97] that this subset of typical codewords carries most of the probabilistic mass, in that the probability of a source word having a non-typical codeword is negligible. That work also sketched a proof of the fact that the source word and its reconstruction are, with high probability, jointly typical with respect to the joint distribution attaining $R(D)$. More recently, Donoho established distributional properties of good rate distortion codes for certain families of processes and applied them to performance analysis of compression-based denoisers [Don02]. The main innovation in the parts of the present work related to good source codes is in considering the pointwise behavior of any $k$-th-order joint empirical distribution for general stationary ergodic processes.

Other than Sections 3 and 4, which were detailed above, we introduce notation and conventions in Section 2, and summarize the paper with a few of its open directions in Section 5.

2 Notation and Conventions

$\mathcal{X}$ and $\mathcal{Y}$ will denote, respectively, the source and reconstruction alphabets, which we assume throughout to be finite. We shall also use the notation $x^n$ for $(x_1, \ldots, x_n)$ and $x_i^j = (x_i, \ldots, x_j)$. We denote the set of probability measures on a finite set $A$ by $\mathcal{M}(A)$. For $P_1, P_2 \in \mathcal{M}(A)$ we let $\|P_1 - P_2\| = \max_{a \in A} |P_1(a) - P_2(a)|$. $B(P, \delta)$ will denote the ball around $P$, i.e., $B(P, \delta) = \{P' : \|P' - P\| \le \delta\}$. For $\{P_n\}, P$ with $P_n, P \in \mathcal{M}(A)$, the notations $P_n \stackrel{d}{\to} P$, $P_n \Rightarrow P$, and $P_n \to P$ will all stand for $\lim_{n\to\infty} \|P_n - P\| = 0$.

For $Q \in \mathcal{M}(\mathcal{X} \times \mathcal{Y})$, $I(Q)$ will denote the mutual information $I(X; Y)$ when $(X, Y) \sim Q$. Similarly, for $Q \in \mathcal{M}(\mathcal{X}^k \times \mathcal{Y}^k)$, $I_k(Q)$ will stand for $I(X^k; Y^k)$ when $(X^k, Y^k) \sim Q$. For $P \in \mathcal{M}(A)$ and a stochastic matrix $W$ from $A$ into $B$, $P \circ W \in \mathcal{M}(A \times B)$ is the distribution on $(A, B)$ under which $\Pr(A = a, B = b) = P(a) W(b|a)$.
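To make the notation above concrete, the following small Python sketch (illustrative only; the function names and the example distribution are ours, not the paper's) computes $I(Q)$ for a joint distribution $Q$ given as a 2-D array, along with the distance $\|P_1 - P_2\|$ just defined.

```python
import numpy as np

def sup_dist(P1, P2):
    """The distance ||P1 - P2|| = max_a |P1(a) - P2(a)| used throughout."""
    return float(np.max(np.abs(np.asarray(P1) - np.asarray(P2))))

def mutual_information(Q):
    """I(Q): mutual information (in nats) of a joint pmf Q over X x Y,
    given as a 2-D array with Q[x, y] = Pr(X = x, Y = y)."""
    Q = np.asarray(Q, dtype=float)
    Px = Q.sum(axis=1, keepdims=True)   # marginal of X
    Py = Q.sum(axis=0, keepdims=True)   # marginal of Y
    mask = Q > 0
    return float(np.sum(Q[mask] * np.log(Q[mask] / (Px @ Py)[mask])))

if __name__ == "__main__":
    # Example (ours): X ~ Bernoulli(1/2) observed through a BSC(0.1).
    Q = np.array([[0.45, 0.05],
                  [0.05, 0.45]])
    print("I(Q) =", mutual_information(Q), "nats")
```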

$H_{P \circ W}(X|Y)$ and $H(W|P)$ will both denote $H(X|Y)$ when $(X, Y) \sim P \circ W$. $I(P; W)$ will stand for $I(X; Y)$ when $(X, Y) \sim P \circ W$. When dealing with expectations of functions, or with functionals of random variables, we shall sometimes subscript the distributions of the associated random variables. Thus, for example, for any $f : A \to \mathbb{R}$ and $P \in \mathcal{M}(A)$ we shall write $E_P f(A)$ for the expectation of $f(A)$ when $A$ is distributed according to $P$.

$\mathcal{M}_n(A)$ will denote the subset of $\mathcal{M}(A)$ consisting of $n$-th-order empirical types, i.e., $P \in \mathcal{M}_n(A)$ if and only if $P(a)$ is an integer multiple of $1/n$ for all $a \in A$ (equivalently, the set of $P \in \mathcal{M}(A)$ for which $T_n(P)$, defined below, is non-empty). For any $P \in \mathcal{M}_n(\mathcal{X})$, $\mathcal{C}_n(P)$ will denote the set of stochastic matrices (conditional distributions) from $\mathcal{X}$ into $\mathcal{Y}$ for which $P \circ W \in \mathcal{M}_n(\mathcal{X} \times \mathcal{Y})$. For $P \in \mathcal{M}_n(A)$ we let $T_n(P)$, or simply $T_P$, denote the associated type class, i.e., the set of sequences $u^n \in A^n$ with
  $\frac{1}{n} |\{1 \le i \le n : u_i = a\}| = P(a)$  for all $a \in A$.   (3)
For $\delta \ge 0$, $T^n_{[P]_\delta}$, or simply $T_{[P]_\delta}$, will denote the set of sequences $u^n \in A^n$ with
  $\left| \frac{1}{n} |\{1 \le i \le n : u_i = a\}| - P(a) \right| \le \delta$  for all $a \in A$.   (4)

For a random element, $P$ subscripted by the element will denote its distribution. Thus, for example, for the process $\mathbf{X} = (X_1, X_2, \ldots)$, $P_{X_1}$, $P_{X^n}$ and $P_{\mathbf{X}}$ will denote, respectively, the distribution of its first component, the distribution of its first $n$ components, and the distribution of the process itself. If, say, $X_0, Z_i^j$ are jointly distributed, then $P_{X_0 | z_i^j} \in \mathcal{M}(\mathcal{X})$ will denote the conditional distribution of $X_0$ given $Z_i^j = z_i^j$. This will also hold for the cases with $i = -\infty$ and/or $j = \infty$ by taking $P_{X_0 | z_i^j}$ to be a regular version of the conditional distribution of $X_0$ given $Z_i^j$, evaluated at $z_i^j$. For a sequence of random variables $\{X_n\}$, $X_n \stackrel{d}{\to} X$ will stand for $P_{X_n} \Rightarrow P_X$.

For $x^n \in \mathcal{X}^n$ and $y^n \in \mathcal{Y}^n$ we let $\hat{Q}_k[x^n] \in \mathcal{M}(\mathcal{X}^k)$, $\hat{Q}_k[y^n] \in \mathcal{M}(\mathcal{Y}^k)$, and $\hat{Q}_k[x^n, y^n] \in \mathcal{M}(\mathcal{X}^k \times \mathcal{Y}^k)$ denote the $k$-th-order empirical distributions defined by
  $\hat{Q}_k[x^n](u^k) = \frac{|\{0 \le i \le n-k : x_{i+1}^{i+k} = u^k\}|}{n-k+1}$,   (5)
  $\hat{Q}_k[y^n](v^k) = \frac{|\{0 \le i \le n-k : y_{i+1}^{i+k} = v^k\}|}{n-k+1}$,   (6)
and
  $\hat{Q}_k[x^n, y^n](u^k, v^k) = \frac{|\{0 \le i \le n-k : x_{i+1}^{i+k} = u^k, \ y_{i+1}^{i+k} = v^k\}|}{n-k+1}$.   (7)
$\hat{Q}_k[y^n | x^n]$ will denote the conditional distribution $P_{Y^k | X^k}$ when $(X^k, Y^k) \sim \hat{Q}_k[x^n, y^n]$. In accordance with the notation defined above, for example, $E_{\hat{Q}_k[x^n, y^n]} f(X^k, Y^k)$ will denote the expectation of $f(X^k, Y^k)$ when $(X^k, Y^k) \sim \hat{Q}_k[x^n, y^n]$.

Definition 1 A fixed-rate $n$-block code is a pair $(\mathcal{C}_n, \phi_n)$ where $\mathcal{C}_n \subseteq \mathcal{Y}^n$ is the codebook and $\phi_n : \mathcal{X}^n \to \mathcal{C}_n$. The rate of the block code is given by $\frac{1}{n} \log |\mathcal{C}_n|$. The rate of a sequence of block codes is defined by $\limsup_{n\to\infty} \frac{1}{n} \log |\mathcal{C}_n|$. $Y^n = Y^n(X^n) = \phi_n(X^n)$ will denote the reconstruction sequence when the $n$-block code is applied to the source sequence $X^n$.
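As an illustration of (5)-(7) and Definition 1 (a minimal sketch with our own naming, not the paper's code), the following Python fragment computes the $k$-th-order empirical joint distribution of a pair of sequences and the rate $\frac{1}{n}\log|\mathcal{C}_n|$ of a fixed-rate block code.

```python
import math
from collections import Counter

def empirical_joint(x, y, k):
    """hat{Q}_k[x^n, y^n]: frequency of each pair of k-strings occupying the
    same window of x^n and y^n, as in equation (7)."""
    n = len(x)
    assert len(y) == n and 1 <= k <= n
    counts = Counter((tuple(x[i:i + k]), tuple(y[i:i + k])) for i in range(n - k + 1))
    total = n - k + 1
    return {pair: c / total for pair, c in counts.items()}

def code_rate(codebook, n):
    """Rate (1/n) log |C_n| of a fixed-rate n-block code (Definition 1), in nats per symbol."""
    return math.log(len(codebook)) / n

if __name__ == "__main__":
    x = [0, 1, 1, 0, 1, 0, 0, 1]
    y = [0, 1, 0, 0, 1, 0, 0, 0]
    print(empirical_joint(x, y, 2))
    print("rate of a 16-codeword code of blocklength 8:", code_rate(range(16), 8), "nats/symbol")
```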

Definition 2 A variable-rate code for $n$-blocks is a triple $(\mathcal{C}_n, \phi_n, l_n)$ with $\mathcal{C}_n$ and $\phi_n$ as in Definition 1. The code operates by mapping a source $n$-tuple $X^n$ into $\mathcal{C}_n$ via $\phi_n$, and then encoding the corresponding member of the codebook (denoted $Y^n$ in Definition 1) using a uniquely decodable binary code. Letting $l_n(X^n)$ denote the associated length function, the rate of the code is defined by $E \frac{1}{n} l_n(X^n)$, and the rate of a sequence of codes for the source $\mathbf{X} = (X_1, X_2, \ldots)$ is defined by $\limsup_{n\to\infty} E \frac{1}{n} l_n(X^n)$.

Note that the notion of a code in the above definitions does not involve a distortion measure according to which the goodness of reconstruction is judged. We assume throughout a given single-letter loss function $\rho : \mathcal{X} \times \mathcal{Y} \to [0, \infty)$ satisfying
  for every $x \in \mathcal{X}$ there exists $y \in \mathcal{Y}$ s.t. $\rho(x, y) = 0$.   (8)
For $x^n \in \mathcal{X}^n$ and $y^n \in \mathcal{Y}^n$ we let $\rho_n(x^n, y^n) = \frac{1}{n} \sum_{i=1}^n \rho(x_i, y_i)$. We shall also let $\delta_{Y|X}$ denote a conditional distribution of $Y$ given $X$ with the property that for all $x \in \mathcal{X}$
  $\sum_{y \in \mathcal{Y}} \delta_{Y|X}(y|x) \rho(x, y) = 0$   (9)
(note that (8) implies the existence of at least one such conditional distribution). The rate distortion function associated with the random variable $X \in \mathcal{X}$ is defined by
  $R(X, D) = \min_{E\rho(X, Y) \le D} I(X; Y)$,   (10)
where the minimum is over all joint distributions of the pair $(X, Y) \in \mathcal{X} \times \mathcal{Y}$ consistent with the given distribution of $X$. Letting $\rho_n$ denote the normalized distortion measure between $n$-tuples induced by $\rho$,
  $\rho_n(x^n, y^n) = \frac{1}{n} \sum_{i=1}^n \rho(x_i, y_i)$,   (11)
the rate distortion function of $X^n \in \mathcal{X}^n$ is defined by
  $R(X^n, D) = \frac{1}{n} \min_{E\rho_n(X^n, Y^n) \le D} I(X^n; Y^n)$.   (12)
The rate distortion function of the stationary ergodic process $\mathbf{X} = (X_1, X_2, \ldots)$ is defined by
  $R(\mathbf{X}, D) = \lim_{k\to\infty} R(X^k, D)$.   (13)
Note that our assumptions on the distortion measure, together with the assumption of finite source alphabets, combined with its well-known convexity, imply that $R(\mathbf{X}, D)$, as a function of $D \in [0, \infty)$, is:
1. Non-negative valued and bounded.
2. Continuous.
3. Strictly decreasing in the range $0 \le D \le D^{\mathbf{X}}_{\max}$, where $D^{\mathbf{X}}_{\max} = \min\{D : R(\mathbf{X}, D) = 0\}$, and identically 0 for $D \ge D^{\mathbf{X}}_{\max}$,
properties that will be tacitly relied on in the proofs of some of the results.
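For a single random variable $X$, the minimization in (10) can be carried out numerically by the standard Blahut-Arimoto algorithm. The following Python sketch is a generic illustration under assumed inputs (a pmf `p_x`, a distortion matrix `rho`, and a Lagrange slope `s` chosen by the user); it is not part of the paper.

```python
import numpy as np

def blahut_arimoto(p_x, rho, s, n_iter=500):
    """Approximate a point (D, R(X, D)) on the rate-distortion curve of (10)
    via Blahut-Arimoto, parameterized by the Lagrange slope s >= 0.
    p_x : source pmf over X (1-D array); rho : |X| x |Y| distortion matrix."""
    p_x = np.asarray(p_x, float)
    rho = np.asarray(rho, float)
    q_y = np.full(rho.shape[1], 1.0 / rho.shape[1])    # initial output marginal
    for _ in range(n_iter):
        W = q_y[None, :] * np.exp(-s * rho)            # optimal test channel for current q_y
        W /= W.sum(axis=1, keepdims=True)
        q_y = p_x @ W                                  # updated output marginal
    joint = p_x[:, None] * W
    D = float(np.sum(joint * rho))
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(joint > 0, W / q_y[None, :], 1.0)
    R = float(np.sum(joint * np.log(ratio)))
    return D, R

if __name__ == "__main__":
    # Bernoulli(0.2) source with Hamming distortion; R(D) should come out close to h(0.2) - h(D).
    D, R = blahut_arimoto([0.8, 0.2], [[0.0, 1.0], [1.0, 0.0]], s=2.0)
    print("D ~", D, "  R(D) ~", R, "nats")
```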

The set $\{(R(\mathbf{X}, D), D) : 0 \le D \le D^{\mathbf{X}}_{\max}\}$ will be referred to as the rate distortion curve. The distortion rate function is the inverse of $R(\mathbf{X}, \cdot)$, which we formally define, to account for the trivial regions, as
  $D(\mathbf{X}, R) = \begin{cases} \text{the unique } D \text{ such that } R(\mathbf{X}, D) = R & \text{for } R \in (0, R(\mathbf{X}, 0)] \\ D^{\mathbf{X}}_{\max} & \text{for } R = 0 \\ 0 & \text{for } R > R(\mathbf{X}, 0). \end{cases}$   (14)

3 Distributional Properties of Rate-Constrained Codes

Throughout this section codes can be taken to mean either fixed- or variable-rate codes.

A  A Bound on the Mutual Information Induced by a Rate-Constrained Code

The following result and its proof are implicit in the proof of the converse of [DW03, Theorem 1].

Theorem 1 Let $\mathbf{X} = (X_1, X_2, \ldots)$ be stationary and $Y^n = Y^n(X^n)$ be the reconstruction of an $n$-block code of rate $R$. Let $J$ be uniformly distributed on $\{1, \ldots, n\}$ and independent of $\mathbf{X}$. Then
  $I(X_J; Y_J) \le R + H(X_1) - \frac{1}{n} H(X^n)$.   (15)
In particular, $I(X_J; Y_J) \le R$ when $\mathbf{X}$ is memoryless.

Proof:
  $H(Y^n) \ge I(X^n; Y^n) = H(X^n) - H(X^n | Y^n)$
  $\quad = H(X^n) - n H(X_1) + \sum_{i=1}^n [H(X_i) - H(X_i | X^{i-1}, Y^n)]$
  $\quad \ge H(X^n) - n H(X_1) + \sum_{i=1}^n [H(X_i) - H(X_i | Y_i)]$
  $\quad = H(X^n) - n H(X_1) + \sum_{i=1}^n I(X_i; Y_i)$
  $\quad \ge H(X^n) - n H(X_1) + n I(X_J; Y_J)$,   (16)
where the last inequality follows from the facts that, by stationarity, $X_J \stackrel{d}{=} X_i$ for all $i$, that $P_{Y_J | X_J} = \frac{1}{n} \sum_{i=1}^n P_{Y_i | X_i}$, and the convexity of the mutual information in the conditional distribution (cf. [CT91, Theorem 2.7.4]). The fact that $Y^n$ is the reconstruction of an $n$-block code of rate $R$ implies $H(Y^n) \le nR$, completing the proof when combined with (16).

Remark: It is readily verified that (15) remains true even when the requirement of a stationary $\mathbf{X}$ is relaxed to the requirement of equal first-order marginals. To see that this cannot be relaxed much further, note that when the source is an individual sequence the right side of (15) equals $R$. On the other hand, if this individual sequence is non-constant, one can take a code of rate 0 consisting of one codeword whose joint empirical distribution with the source sequence has positive mutual information, violating the bound.

Theorem 2 Let $\mathbf{X} = (X_1, X_2, \ldots)$ be stationary and $Y^n = Y^n(X^n)$ be the reconstruction of an $n$-block code of rate $R$. Let $k \le n$ and $J$ be uniformly distributed on $\{1, \ldots, n-k+1\}$, and independent of $\mathbf{X}$. Then
  $\frac{1}{k} I(X_J^{J+k-1}; Y_J^{J+k-1}) \le \frac{n}{n-k+1} \left[ R + \frac{1}{k} H(X^k) - \frac{1}{n} H(X^n) + \frac{2k}{n} \log |\mathcal{X}| \right]$.   (17)

Proof: Let, for $1 \le j \le k$, $S_j = \{1 \le i \le n-k+1 : i \equiv j \pmod{k}\}$. Note that $\frac{n-2k+1}{k} \le |S_j| \le \frac{n+1}{k}$. Let $P^j_{X^k, Y^k}$ be the distribution defined by
  $P^j_{X^k, Y^k}(x^k, y^k) = \frac{1}{|S_j|} \sum_{i \in S_j} \Pr\{X_i^{i+k-1} = x^k, Y_i^{i+k-1} = y^k\}$.   (18)
Note, in particular, that
  $P_{X_J^{J+k-1}, Y_J^{J+k-1}} = \sum_{j=1}^k \frac{|S_j|}{n-k+1} P^j_{X^k, Y^k}$.   (19)
Now
  $nR \ge H(Y^n) \ge H(Y_j^{j+k|S_j|-1}) \ge H(X_j^{j+k|S_j|-1}) - |S_j| H(X^k) + |S_j| I(P^j_{X^k, Y^k})$,   (20)
where the first inequality is due to the rate constraint of the code and the last one follows from the bound established in (16) with the assignment $n \leftarrow |S_j|$, $(X^n, Y^n) \leftarrow (X_j^{j+k|S_j|-1}, Y_j^{j+k|S_j|-1})$ viewed as $|S_j|$ super-symbols of length $k$, $(X_i, Y_i) \leftarrow (X_{j+(i-1)k}^{j+ik-1}, Y_{j+(i-1)k}^{j+ik-1})$. Rearranging terms,
  $I(P^j_{X^k, Y^k}) \le \frac{n}{|S_j|} R + H(X^k) - \frac{1}{|S_j|} H(X_j^{j+k|S_j|-1}) \le \frac{kn}{n-k+1} \left[ R + \frac{1}{k} H(X^k) - \frac{1}{n} H(X^n) + \frac{2k}{n} \log |\mathcal{X}| \right]$,   (21)
where in the second inequality we have used the fact that
  $H(X^n) \le H(X_j^{j+k|S_j|-1}) + H(X^{j-1}, X_{j+k|S_j|}^n) \le H(X_j^{j+k|S_j|-1}) + 2k \log |\mathcal{X}|$.
Inequality (17) now follows from (21), (19), the fact that $P_{X_J^{J+k-1}} = P^j_{X^k}$ (both equaling the distribution of a source $k$-tuple) for all $j$, and the convexity of the mutual information in the conditional distribution.

An immediate consequence of Theorem 2 is

Corollary 1 Let $\mathbf{X} = (X_1, X_2, \ldots)$ be stationary and $\{Y^n\}$ be a sequence of block codes of rate $R$. For $k \le n$ let $J_{n,k}$ be uniformly distributed on $\{1, \ldots, n-k+1\}$, and independent of $\mathbf{X}$. Consider the pair $(X^n, Y^n) = (X^n, Y^n(X^n))$ and denote
  $(\tilde{X}_{n,k}, \tilde{Y}_{n,k}) = (X_{J_{n,k}}^{J_{n,k}+k-1}, Y_{J_{n,k}}^{J_{n,k}+k-1})$.   (22)
Then
  $\limsup_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} I(\tilde{X}_{n,k}; \tilde{Y}_{n,k}) \le R$.   (23)

Proof: Theorem 2 implies
  $\limsup_{n\to\infty} \frac{1}{k} I(\tilde{X}_{n,k}; \tilde{Y}_{n,k}) \le R + \frac{1}{k} H(X^k) - H(\mathbf{X})$,   (24)
completing the proof by letting $k \to \infty$.

B  The Empirical Distribution of Good Codes

The converse in rate distortion theory (cf. [Ber71], [Gal68]) asserts that if $\mathbf{X}$ is stationary ergodic and $(R, D)$ is a point on its rate distortion curve, then any code sequence of rate $R$ must satisfy
  $\liminf_{n\to\infty} E\rho_n(X^n, Y^n(X^n)) \ge D$.   (25)
The direct part, on the other hand, guarantees the existence of codes that are good in the following sense.

Definition 3 Let $\mathbf{X}$ be stationary ergodic, $0 \le D \le D^{\mathbf{X}}_{\max}$, and $(R, D)$ be a point on its rate distortion curve. The sequence of codes with associated mappings $\{Y^n\}$ will be said to be good for the source $\mathbf{X}$ at rate $R$ (or at distortion level $D$) if it has rate $R$ and
  $\limsup_{n\to\infty} E\rho_n(X^n, Y^n(X^n)) \le D$.   (26)

Note that the converse to rate distortion coding implies that the limit supremum in the above definition is actually a limit, and the rate of the corresponding good code sequence is exactly $R$.

Theorem 3 Let $\{Y^n\}$ correspond to a good code sequence for the stationary ergodic source $\mathbf{X}$ (at distortion level $D$) and let $(\tilde{X}_{n,k}, \tilde{Y}_{n,k})$ be defined as in Corollary 1. Then:
1. For every $k$,
  $\lim_{n\to\infty} E\rho_k(\tilde{X}_{n,k}, \tilde{Y}_{n,k}) = D$,   (27)
and
  $R(X^k, D) \le \liminf_{n\to\infty} \frac{1}{k} I(\tilde{X}_{n,k}; \tilde{Y}_{n,k}) \le \limsup_{n\to\infty} \frac{1}{k} I(\tilde{X}_{n,k}; \tilde{Y}_{n,k}) \le R(\mathbf{X}, D) + \frac{1}{k} H(X^k) - H(\mathbf{X})$,   (28)
so, in particular,
  $\lim_{k\to\infty} \liminf_{n\to\infty} \frac{1}{k} I(\tilde{X}_{n,k}; \tilde{Y}_{n,k}) = \lim_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} I(\tilde{X}_{n,k}; \tilde{Y}_{n,k}) = R(\mathbf{X}, D)$.   (29)
2. If it also holds that $R(X^k, D) = R(\mathbf{X}, D) + \frac{1}{k} H(X^k) - H(\mathbf{X})$, then
  $\lim_{n\to\infty} \frac{1}{k} I(\tilde{X}_{n,k}; \tilde{Y}_{n,k}) = R(X^k, D)$.   (30)
3. If, additionally, $R(X^k, D)$ is uniquely achieved in distribution by the pair $(X^k, \tilde{Y}^k)$, then
  $(\tilde{X}_{n,k}, \tilde{Y}_{n,k}) \stackrel{d}{\to} (X^k, \tilde{Y}^k)$  as $n \to \infty$.   (31)

The condition in the second item clearly holds for any memoryless source. It holds, however, much beyond memoryless sources. Any source for which the Shannon lower bound holds with equality is readily seen to satisfy $R(X^k, D) = R(\mathbf{X}, D) + \frac{1}{k} H(X^k) - H(\mathbf{X})$ for all $k$. This is a rich family of sources that includes Markov processes, hidden Markov processes, auto-regressive sources, and, in the continuous alphabet setting, Gaussian processes; cf. [Gra70, Gra71, BG98, EM02, WM03].

The first part of Theorem 3 will be seen to follow from Theorem 2. Its last part will be a consequence of the following lemma, whose proof is given in Appendix A.

Lemma 1 Let $R(D)$ be uniquely achieved by the joint distribution $P_{X,Y}$. Let further $\{P^n_{X,Y}\}$ be a sequence of distributions on $\mathcal{X} \times \mathcal{Y}$ satisfying
  $P^n_X \stackrel{d}{\to} P_X$,   (32)
  $\lim_{n\to\infty} I(P^n_{X,Y}) = R(D)$,   (33)
and
  $\lim_{n\to\infty} E_{P^n_{X,Y}} \rho(X, Y) = D$.   (34)
Then
  $P^n_{X,Y} \stackrel{d}{\to} P_{X,Y}$.   (35)

Proof of Theorem 3: For any code sequence $\{Y^n\}$, an immediate consequence of the definition of $(\tilde{X}_{n,k}, \tilde{Y}_{n,k})$ is that
  $\lim_{n\to\infty} \left[ E\rho_k(\tilde{X}_{n,k}, \tilde{Y}_{n,k}) - E\rho_n(X^n, Y^n) \right] = 0$.   (36)
Combined with the fact that $\{Y^n\}$ is a good code, this implies
  $\lim_{n\to\infty} E\rho_k(\tilde{X}_{n,k}, \tilde{Y}_{n,k}) = \lim_{n\to\infty} E\rho_n(X^n, Y^n) = D$,   (37)
proving (27). To prove (28), note that since $\{Y^n\}$ is a sequence of codes for $\mathbf{X}$ at rate $R(\mathbf{X}, D)$, Theorem 2 implies
  $\limsup_{n\to\infty} \frac{1}{k} I(\tilde{X}_{n,k}; \tilde{Y}_{n,k}) \le R(\mathbf{X}, D) + \frac{1}{k} H(X^k) - H(\mathbf{X})$.   (38)
On the other hand, since $\tilde{X}_{n,k} \stackrel{d}{=} X^k$, it follows from the definition of $R(X^k, \cdot)$ that
  $\frac{1}{k} I(\tilde{X}_{n,k}; \tilde{Y}_{n,k}) \ge R(X^k, E\rho_k(\tilde{X}_{n,k}, \tilde{Y}_{n,k}))$,   (39)
implying
  $\liminf_{n\to\infty} \frac{1}{k} I(\tilde{X}_{n,k}; \tilde{Y}_{n,k}) \ge R(X^k, D)$   (40)
by (37) and the continuity of $R(X^k, \cdot)$. This completes the proof of (28). Displays (29) and (30) are immediate consequences. The convergence in (31) is a direct consequence of (30) and Lemma 1 applied with the assignment $X \leftarrow X^k$, $Y \leftarrow Y^k$ and $\rho \leftarrow \rho_k$.

Remark: Note that to conclude the convergence in (31) in the above proof a weaker version of Lemma 1 would have sufficed, one that assumes $P^n_X = P_X$ for all $n$ instead of $P^n_X \stackrel{d}{\to} P_X$ in (32). This is because, clearly, by construction of $\tilde{X}_{n,k}$, $\tilde{X}_{n,k} \stackrel{d}{=} X^k$ for all $n$ and $k$. The stronger form of Lemma 1 given will be instrumental in the derivation of a pointwise analogue to (31) in Section 4 (third item of Theorem 9).

C  A Lower Bound on the Convergence Rate

Assume a stationary ergodic source $\mathbf{X}$ that satisfies, for each $k$, $R(X^k, D) = R(\mathbf{X}, D) + \frac{1}{k} H(X^k) - H(\mathbf{X})$, and for which $R(X^k, D)$ is uniquely achieved by the pair $(X^k, \tilde{Y}^k)$. Theorem 3 implies that any good code sequence for $\mathbf{X}$ at rate $R(\mathbf{X}, D)$ must satisfy
  $(\tilde{X}_{n,k}, \tilde{Y}_{n,k}) \stackrel{d}{\to} (X^k, \tilde{Y}^k)$  as $n \to \infty$   (41)

(recall Corollary 1 for the definition of the pair $(\tilde{X}_{n,k}, \tilde{Y}_{n,k})$). This implies, in particular,
  $H(\tilde{Y}_{n,k}) \to H(\tilde{Y}^k)$  as $n \to \infty$.   (42)
It follows from (42) that for $k_n$ increasing slowly enough with $n$,
  $\frac{1}{k_n} \left[ H(\tilde{Y}_{n,k_n}) - H(\tilde{Y}^{k_n}) \right] \to 0$  as $n \to \infty$.   (43)
Display (43) is equivalent to
  $\frac{1}{k_n} H(\tilde{Y}_{n,k_n}) \to H(\tilde{\mathbf{Y}})$  as $n \to \infty$,   (44)
where we denote $\lim_{k\to\infty} \frac{1}{k} H(\tilde{Y}^k)$ by $H(\tilde{\mathbf{Y}})$, a limit which exists (see the footnote below) under our hypothesis that $R(X^k, D)$ is uniquely achieved by $(X^k, \tilde{Y}^k)$ for every $k$. The rate at which $k_n$ may be allowed to increase while maintaining the convergence in (44) may depend on the particular code sequence through which $(\tilde{X}_{n,k}, \tilde{Y}_{n,k})$ are defined. We now show, however, that if $k_n$ increases more rapidly than a certain fraction of $n$, then the convergence in (44) fails for any code sequence. To see this, note that for any $k \le n$,
  $H(\tilde{Y}_{n,k}) = H(Y_{J_{n,k}}^{J_{n,k}+k-1})$   (45)
  $\quad \le H(Y_{J_{n,k}}^{J_{n,k}+k-1}, J_{n,k})$   (46)
  $\quad \le H(Y^n, J_{n,k})$   (47)
  $\quad \le H(Y^n) + H(J_{n,k})$   (48)
  $\quad \le H(Y^n) + \log n$.   (49)
Now, since $Y^n$ is associated with a good code,
  $\limsup_{n\to\infty} \frac{1}{n} H(Y^n) \le R(\mathbf{X}, D)$.   (50)
Letting $k_n = \alpha n$, (49) and (50) imply
  $\limsup_{n\to\infty} \frac{1}{k_n} H(\tilde{Y}_{n,k_n}) \le \frac{R(\mathbf{X}, D)}{\alpha}$,   (51)
excluding the possibility that (44) holds whenever
  $\alpha > \frac{R(\mathbf{X}, D)}{H(\tilde{\mathbf{Y}})}$.   (52)
This is analogous to the situation in the empirical distribution of good channel codes, where it was shown in [SV97, Section III] for the memoryless case that convergence is excluded whenever $k_n = \alpha n$ for $\alpha > C/H$, where $H$ stands for the entropy of the capacity-achieving channel input distribution. To sum up, we see that convergence takes place whenever $k = O(1)$ (Theorem 3), and does not take place whenever $k_n = \alpha n$ for $\alpha$ satisfying (52). For $k$ growing with $n$ at the intermediate rates, convergence depends on the particular code sequence, and examples can be constructed both for convergence and for lack thereof, analogous to [SV97, Examples 4, 5].

Footnote: In particular, for a memoryless source $H(\tilde{\mathbf{Y}}) = H(\tilde{Y}_1)$. For sources for which the Shannon lower bound is tight, it is the entropy rate of the source which results in $\mathbf{X}$ when corrupted by the max-entropy white noise tailored to the corresponding distortion measure and level.

D  Applications to Compression-Based Denoising

The point of view that compression may facilitate denoising was put forth by Natarajan in [Nat95]. It is based on the intuition that the noise constitutes that part of a noisy signal which is least compressible. Thus, lossy compression of the noisy signal, under the right distortion measure and at the right distortion level, should lead to effective denoising. Compression-based denoising schemes have since been suggested and studied under various settings and assumptions (cf. [Nat93, Nat95, NKH98, CYV97, JY01, Don02, TRA02, Ris00, TPB99] and references therein). In this subsection we consider compression-based denoising when the clean source is corrupted by additive white noise. We will give a new performance bound for denoisers that optimally lossily compress the noisy signal under the distortion measure and level induced by the noise.

For simplicity, we restrict attention throughout this section to the case where the alphabets of the clean, noisy, and reconstructed source are all equal to the $M$-ary alphabet $A = \{0, 1, \ldots, M-1\}$. Addition and subtraction between elements of this alphabet should be understood modulo $M$ throughout. The results we develop below have analogues for cases where the alphabet is the real line. We consider the case of a stationary ergodic source $\mathbf{X}$ corrupted by additive white noise $\mathbf{N}$. That is, we assume the $N_i$ are i.i.d. $\sim N$, independent of $\mathbf{X}$, and that the noisy observation sequence $\mathbf{Z}$ is given by
  $Z_i = X_i + N_i$.   (53)
By the channel matrix we refer to the Toeplitz matrix whose, say, first row is $(\Pr(N = 0), \ldots, \Pr(N = M-1))$. We assume that $\Pr(N = a) > 0$ for all $a \in A$ and associate with it a difference distortion measure $\rho^N$ defined by
  $\rho^N(a) = \log \frac{1}{\Pr(N = a)}$.   (54)
We shall omit the superscript and let $\rho$ stand for $\rho^N$ throughout this section.

Fact 1  $\max\{H(X) : X \text{ is } A\text{-valued and } E\rho(X) \le H(N)\}$ is uniquely achieved in distribution by $N$.

Proof: $N$ is clearly in the feasible set by the definition of $\rho$. Now, for any $A$-valued $X$ in the feasible set with $X \not\stackrel{d}{=} N$,
  $H(N) \ge E\rho(X) = \sum_a P_X(a) \log \frac{1}{\Pr(N = a)} > H(X)$,
where the strict inequality follows from $D(P_X \| P_N) > 0$.
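The construction in (53)-(54) is easy to simulate. The following Python sketch (an assumed toy setup of our own, not the paper's) generates the modulo-$M$ noisy sequence and the noise-induced distortion measure $\rho^N$; the empirical distortion between $Z^n$ and $X^n$ then concentrates around $H(N)$, the matched distortion level.

```python
import numpy as np

def induced_distortion(noise_pmf):
    """rho^N(a) = log 1/Pr(N = a), as in (54), returned as a length-M array."""
    return -np.log(np.asarray(noise_pmf, float))

def corrupt(x, noise_pmf, rng=None):
    """Z_i = X_i + N_i (mod M) with i.i.d. noise N, as in (53)."""
    rng = np.random.default_rng() if rng is None else rng
    M = len(noise_pmf)
    n = rng.choice(M, size=len(x), p=noise_pmf)
    return (np.asarray(x) + n) % M

if __name__ == "__main__":
    M, n = 4, 100000
    noise_pmf = np.array([0.85, 0.05, 0.05, 0.05])
    rng = np.random.default_rng(0)
    x = rng.choice(M, size=n)               # a toy (here memoryless) clean source
    z = corrupt(x, noise_pmf, rng)
    rho = induced_distortion(noise_pmf)
    print("empirical distortion rho(Z - X):", rho[(z - x) % M].mean())
    print("H(N) =", float(-(noise_pmf * np.log(noise_pmf)).sum()))
```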

Theorem 4 Let $R(Z^k, \cdot)$ denote the $k$-th-order rate distortion function of $\mathbf{Z}$ under the distortion measure in (54). Then
  $R(Z^k, H(N)) = \frac{1}{k} H(Z^k) - H(N)$.   (55)
Furthermore, it is uniquely achieved in distribution by the pair $(Z^k, X^k)$ whenever the channel matrix is invertible.

Proof: Let $(Z^k, Y^k)$ achieve $R(Z^k, H(N))$. Then
  $k R(Z^k, H(N)) = I(Z^k; Y^k) = H(Z^k) - H(Z^k | Y^k) = H(Z^k) - H(Z^k - Y^k | Y^k) \ge H(Z^k) - H(Z^k - Y^k) \ge H(Z^k) - k H(N)$,   (56)
where the last inequality follows from the fact that $E\rho_k(Z^k, Y^k) \le H(N)$ (otherwise $(Z^k, Y^k)$ would not be an achiever of $R(Z^k, H(N))$) and Fact 1. The first part of the theorem follows since the pair $(Z^k, X^k)$ satisfies $E\rho_k(Z^k, X^k) = H(N)$ and is readily seen to satisfy all the inequalities in (56) with equality. For the uniqueness part, note that it follows from the chain of inequalities in (56) that if $(Z^k, Y^k)$ achieves $R(Z^k, H(N))$ then $Z^k$ is the output of the memoryless additive channel with noise components $\sim N$ whose input is $Y^k$. The invertibility of the channel matrix guarantees uniqueness of the distribution of the $Y^k$ satisfying this for any given output distribution (cf., e.g., [WOS+03]).

Let now $\Lambda : A \times A \to [0, \infty)$ be the loss function according to which denoising performance is measured. For $u^n, v^n \in A^n$ let $\Lambda_n(u^n, v^n) = \frac{1}{n} \sum_{i=1}^n \Lambda(u_i, v_i)$. Define $\Gamma : \mathcal{M}(A) \times \mathcal{M}(A) \to [0, \infty)$ by
  $\Gamma(P, Q) = \max_{(U, V) : U \sim P, V \sim Q} E\Lambda(U, V)$,   (57)
and $\Psi : \mathcal{M}(A) \to [0, \infty)$ by
  $\Psi(P) = \Gamma(P, P) = \max_{(U, V) : U \sim P, V \sim P} E\Lambda(U, V)$.   (58)
It will be convenient to state the following (implied by an elementary continuity argument) for future reference.

Fact 2 Let $\{R_n\}$, $\{Q_n\}$ be sequences in $\mathcal{M}(A)$ with $R_n \Rightarrow P$ and $Q_n \Rightarrow P$. Then $\lim_{n\to\infty} \Gamma(R_n, Q_n) = \Psi(P)$.

Example 1 In the binary case, $M = 2$, when $\Lambda$ denotes Hamming distortion, it is readily checked that, for $0 \le p \le 1$,
  $\Psi(\mathrm{Bernoulli}(p)) = 2 \min\{p, 1-p\} = 2\phi(p)$,   (59)
where
  $\phi(p) = \min\{p, 1-p\}$   (60)
is the well-known Bayesian envelope for this setting [Han57, Sam63, MF98]. The achieving distribution in this case (assuming, say, $p \le 1/2$) is
  $\Pr(U=0, V=0) = 1 - 2p$, $\Pr(U=0, V=1) = p$, $\Pr(U=1, V=0) = p$, $\Pr(U=1, V=1) = 0$.   (61)

Theorem 5 Let $\{Y^n\}$ correspond to a good code sequence for the noisy source $\mathbf{Z}$ at distortion level $H(N)$ under the difference distortion measure in (54), and assume that the channel matrix is invertible. Then
  $\limsup_{n\to\infty} E\Lambda_n(X^n, Y^n(Z^n)) \le E\Psi\!\left(P_{X_0 | Z_{-\infty}^{\infty}}\right)$.   (62)
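The quantity $\Gamma(P, Q)$ in (57) is the value of a small linear program over couplings of $P$ and $Q$ (a maximization over the transportation polytope). The following Python sketch (our own illustration using scipy, with hypothetical names; not the paper's code) evaluates it and reproduces the binary Hamming case of Example 1.

```python
import numpy as np
from scipy.optimize import linprog

def gamma(P, Q, Lambda):
    """Gamma(P, Q) = max E Lambda(U, V) over couplings with U ~ P, V ~ Q, as in (57)."""
    P, Q, Lambda = map(np.asarray, (P, Q, Lambda))
    m, k = len(P), len(Q)
    c = -Lambda.reshape(-1)                      # maximize <=> minimize the negative
    A_eq = np.zeros((m + k, m * k))
    for u in range(m):                           # row marginals must equal P
        A_eq[u, u * k:(u + 1) * k] = 1.0
    for v in range(k):                           # column marginals must equal Q
        A_eq[m + v, v::k] = 1.0
    b_eq = np.concatenate([P, Q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return -res.fun

def psi(P, Lambda):
    """Psi(P) = Gamma(P, P), as in (58)."""
    return gamma(P, P, Lambda)

if __name__ == "__main__":
    hamming = np.array([[0.0, 1.0], [1.0, 0.0]])
    for p in (0.1, 0.3, 0.5):
        # Should agree with 2 min(p, 1-p), per (59).
        print(p, psi([1 - p, p], hamming), 2 * min(p, 1 - p))
```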

Note, in particular, that for the case of a binary source corrupted by a BSC, the optimum distribution-dependent denoising performance is $E\phi(P_{X_0 | Z_{-\infty}^{\infty}})$ (cf. [WOS+03, Section 5]), so the right side of (62) is, by (59), precisely twice the expected fraction of errors made by the optimal denoiser in the limit of many observations. Note that this performance can be attained universally (i.e., with no a priori knowledge of the distribution of the noise-free source $\mathbf{X}$) by employing a universal rate distortion code on the noisy source, e.g., the fixed-distortion version of the Yang-Kieffer codes [YK96, Section III]. (This may conceptually motivate employing a universal lossy source code for denoising. Implementation of such a code, however, is too complex to be of practical value. It seems even less motivated in light of the universally optimal and practical scheme in [WOS+03].)

Proof of Theorem 5: Similarly as in Corollary 1, for $2k < n$ let $J_{n,k}$ be uniformly distributed on $\{k+1, \ldots, n-k\}$, and independent of $(\mathbf{X}, \mathbf{Z})$. Consider the triple $(X^n, Z^n, Y^n) = (X^n, Z^n, Y^n(Z^n))$ and denote
  $(X^{[n,k]}, Z^{[n,k]}, Y^{[n,k]}) = (X_{J_{n,k}-k}^{J_{n,k}+k}, Z_{J_{n,k}-k}^{J_{n,k}+k}, Y_{J_{n,k}-k}^{J_{n,k}+k})$,   (63)
a triple of $(2k+1)$-tuples whose components we index by $-k, \ldots, 0, \ldots, k$. Note first that
  $E\Lambda(X^{[n,k]}_0, Y^{[n,k]}_0) = \frac{1}{n-2k} \sum_{i=k+1}^{n-k} E\Lambda(X_i, Y_i)$,   (64)
so
  $E\Lambda_n(X^n, Y^n(Z^n)) \le \frac{2k \Lambda_{\max}}{n} + \frac{n-2k}{n} E\Lambda(X^{[n,k]}_0, Y^{[n,k]}_0)$ and $E\Lambda(X^{[n,k]}_0, Y^{[n,k]}_0) \le \frac{n}{n-2k} E\Lambda_n(X^n, Y^n(Z^n))$,   (65)
and, in particular,
  $\lim_{n\to\infty} \left[ E\Lambda_n(X^n, Y^n(Z^n)) - E\Lambda(X^{[n,k]}_0, Y^{[n,k]}_0) \right] = 0$.   (66)
Also, by the definition of $\Gamma$,
  $E\Lambda(X^{[n,k]}_0, Y^{[n,k]}_0) = \sum_{z \in A^{2k+1}} E\left[ \Lambda(X^{[n,k]}_0, Y^{[n,k]}_0) \mid Z^{[n,k]} = z \right] \Pr(Z^{[n,k]} = z) \le \sum_{z \in A^{2k+1}} \Gamma\left( P_{X^{[n,k]}_0 | Z^{[n,k]} = z}, P_{Y^{[n,k]}_0 | Z^{[n,k]} = z} \right) \Pr(Z^{[n,k]} = z)$.   (67)
Now, clearly, $(X^{[n,k]}, Z^{[n,k]}) \stackrel{d}{=} (X_{-k}^{k}, Z_{-k}^{k})$, so, in particular, for all $z \in A^{2k+1}$,
  $P_{X^{[n,k]}_0 | Z^{[n,k]} = z} = P_{X_0 | Z_{-k}^{k} = z}$.   (68)
On the other hand, from the combination of Theorem 3 and Theorem 4 it follows, since $\{Y^n\}$ corresponds to a good code sequence, that
  $(Y^{[n,k]}, Z^{[n,k]}) \stackrel{d}{\to} (X_{-k}^{k}, Z_{-k}^{k})$  as $n \to \infty$,   (69)
and therefore, in particular, for all $z \in A^{2k+1}$,
  $P_{Y^{[n,k]}_0 | Z^{[n,k]} = z} \to P_{X_0 | Z_{-k}^{k} = z}$  as $n \to \infty$.   (70)
The combination of (68), (70), and Fact 2 implies that, for all $z \in A^{2k+1}$,
  $\Gamma\left( P_{X^{[n,k]}_0 | Z^{[n,k]} = z}, P_{Y^{[n,k]}_0 | Z^{[n,k]} = z} \right) \to \Psi\left( P_{X_0 | Z_{-k}^{k} = z} \right)$  as $n \to \infty$.   (71)

Thus we obtain
  $\limsup_{n\to\infty} E\Lambda_n(X^n, Y^n(Z^n)) \stackrel{(a)}{=} \limsup_{n\to\infty} E\Lambda(X^{[n,k]}_0, Y^{[n,k]}_0) \stackrel{(b)}{\le} \sum_{z \in A^{2k+1}} \Psi\left( P_{X_0 | Z_{-k}^{k} = z} \right) \Pr(Z_{-k}^{k} = z) = E\Psi\left( P_{X_0 | Z_{-k}^{k}} \right)$,   (72)
where (a) follows from (66) and (b) from combining (67) with (71) and the fact that $Z^{[n,k]} \stackrel{d}{=} Z_{-k}^{k}$. The fact that $\lim_{k\to\infty} P_{X_0 | Z_{-k}^{k}} = P_{X_0 | Z_{-\infty}^{\infty}}$ a.s. (martingale convergence), the continuity of $\Psi$, and the bounded convergence theorem imply
  $\lim_{k\to\infty} E\Psi\left( P_{X_0 | Z_{-k}^{k}} \right) = E\Psi\left( P_{X_0 | Z_{-\infty}^{\infty}} \right)$,   (73)
which completes the proof when combined with (72).

4 Sample Properties of the Empirical Distribution

Throughout this section codes should be understood in the fixed-rate sense of Definition 1.

A  Pointwise Bounds on the Mutual Information Induced by a Rate-Constrained Code

The first result of this section is the following.

Theorem 6 Let $\{Y^n\}$ be a sequence of block codes with rate $R$. For any stationary ergodic $\mathbf{X}$,
  $\limsup_{n\to\infty} I(\hat{Q}_1[X^n, Y^n(X^n)]) \le R + H(X_1) - H(\mathbf{X})$  a.s.   (74)
In particular, when $\mathbf{X}$ is memoryless,
  $\limsup_{n\to\infty} I(\hat{Q}_1[X^n, Y^n(X^n)]) \le R$  a.s.   (75)

The following lemma is at the heart of the proof of Theorem 6.

Lemma 2 Let $Y^n$ be an $n$-block code of rate $R$. Then, for every $P \in \mathcal{M}_n(\mathcal{X})$ and $\eta > 0$,
  $\left| \{x^n \in T_P : I(\hat{Q}_1[x^n, Y^n(x^n)]) > R + \eta\} \right| \le e^{n[H(P) + \varepsilon_n - \eta]}$,   (76)
where $\varepsilon_n = |\mathcal{X}||\mathcal{Y}| \frac{1}{n} \log(n+1)$.

Lemma 2 quantifies a remarkable limitation that the rate of the code induces: for any $\eta > 0$, only an exponentially negligible fraction of the source sequences in any type can have an empirical joint distribution with their reconstruction whose mutual information exceeds the rate by more than $\eta$.

Proof of Lemma 2: For every $W \in \mathcal{C}_n(P)$ let
  $S(P, W) = \{x^n : (x^n, Y^n(x^n)) \in T_{P \circ W}\}$   (77)

and note that
  $|S(P, W)| \le e^{nR} e^{n H_{P \circ W}(X|Y)}$,   (78)
since there are no more than $e^{nR}$ different values of $Y^n(x^n)$ and, by [CK81, Lemma 2.3], no more than $e^{n H_{P \circ W}(X|Y)}$ different sequences in $S(P, W)$ can be mapped to the same value of $Y^n$. It follows that for every $W \in \mathcal{C}_n(P)$ with $I(P; W) > R + \eta$,
  $|S(P, W)| \le e^{n[H(P) - \eta]}$.   (79)
Now, by definition,
  $\{x^n \in T_P : I(\hat{Q}_1[x^n, Y^n(x^n)]) > R + \eta\} = \bigcup_{W \in \mathcal{C}_n(P) : I(P;W) > R + \eta} \{x^n \in T_P : (x^n, Y^n(x^n)) \in T_{P \circ W}\}$,   (80)
so
  $\left| \{x^n \in T_P : I(\hat{Q}_1[x^n, Y^n(x^n)]) > R + \eta\} \right| \le \sum_{W \in \mathcal{C}_n(P) : I(P;W) > R + \eta} |S(P, W)| \le |\mathcal{C}_n(P)| e^{n[H(P) - \eta]}$   (81)
  $\quad \le (n+1)^{|\mathcal{X}||\mathcal{Y}|} e^{n[H(P) - \eta]} = e^{n[H(P) + \varepsilon_n - \eta]}$,   (82)
where (81) follows from (79) and (82) from the definition of $\varepsilon_n$.

We shall also make use of the following converse to the AEP, the proof of which is deferred to Appendix B.

Lemma 3 Let $\mathbf{X}$ be stationary ergodic and $A_n \subseteq \mathcal{X}^n$ an arbitrary sequence satisfying
  $\limsup_{n\to\infty} \frac{1}{n} \log |A_n| < H(\mathbf{X})$.   (83)
Then, with probability one, $X^n \in A_n$ for only finitely many $n$.

Proof of Theorem 6: Fix an arbitrary $\eta > 0$ and $\xi > 0$ small enough so that
  $\max_{P \in B(P_{X_1}, \xi)} H(P) \le H(X_1) + \eta/2$   (84)
(the continuity of the entropy functional guarantees the existence of such a $\xi > 0$). Lemma 2 with the assignment $\eta \leftarrow H(X_1) - H(\mathbf{X}) + \eta$ guarantees that for every $n$ and $P \in \mathcal{M}_n(\mathcal{X})$,
  $\left| \{x^n \in T_P : I(\hat{Q}_1[x^n, Y^n(x^n)]) > R_n + H(X_1) - H(\mathbf{X}) + \eta\} \right| \le e^{n[H(P) + \varepsilon_n - H(X_1) + H(\mathbf{X}) - \eta]}$,   (85)
where $R_n$ denotes the rate of $Y^n$. Thus
  $\left| \{x^n \in T_{[P_{X_1}]_\xi} : I(\hat{Q}_1[x^n, Y^n(x^n)]) > R_n + H(X_1) - H(\mathbf{X}) + \eta\} \right|$   (86)
  $\quad \le \exp\left\{ n \left[ \max_{P \in B(P_{X_1}, \xi)} H(P) + 2\varepsilon_n - H(X_1) + H(\mathbf{X}) - \eta \right] \right\}$   (87)
  $\quad \le \exp\{n[2\varepsilon_n + H(\mathbf{X}) - \eta/2]\}$,   (88)

where (87) follows from (85) and from the obvious fact that $T_{[P_{X_1}]_\xi}$ is a union of fewer than $e^{n\varepsilon_n}$ strict types, and (88) follows from (84). Denoting now the set in (86) by $A_n$, i.e.,
  $A_n = \{x^n \in T_{[P_{X_1}]_\xi} : I(\hat{Q}_1[x^n, Y^n(x^n)]) > R_n + H(X_1) - H(\mathbf{X}) + \eta\}$,   (89)
we have
  $\{x^n : I(\hat{Q}_1[x^n, Y^n(x^n)]) > R_n + H(X_1) - H(\mathbf{X}) + \eta\} \subseteq T^c_{[P_{X_1}]_\xi} \cup A_n$.   (90)
But, by ergodicity, $X^n \in T^c_{[P_{X_1}]_\xi}$ for only finitely many $n$ with probability one. On the other hand, the fact that $|A_n| \le \exp\{n[2\varepsilon_n + H(\mathbf{X}) - \eta/2]\}$ (recall (88)) implies, by Lemma 3, that $X^n \in A_n$ for only finitely many $n$ with probability one. Thus we get
  $I(\hat{Q}_1[X^n, Y^n(X^n)]) \le R_n + H(X_1) - H(\mathbf{X}) + \eta$  eventually, a.s.   (91)
But, by hypothesis, $\{Y^n\}$ has rate $R$ (namely $\limsup_n R_n \le R$), so (91) implies
  $\limsup_{n\to\infty} I(\hat{Q}_1[X^n, Y^n(X^n)]) \le R + H(X_1) - H(\mathbf{X}) + \eta$  a.s.,   (92)
which completes the proof by the arbitrariness of $\eta > 0$.

Theorem 6 extends to higher-order empirical distributions as follows.

Theorem 7 Let $\{Y^n\}$ be a sequence of block codes with rate $R$. For any stationary ergodic $\mathbf{X}$ and $k \ge 1$,
  $\limsup_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n(X^n)]) \le R + \frac{1}{k} H(X^k) - H(\mathbf{X})$  a.s.   (93)
In particular, when $\mathbf{X}$ is memoryless,
  $\limsup_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n(X^n)]) \le R$  a.s.   (94)

Note that defining, for each $0 \le j \le k-1$, $\hat{Q}_{k,j}$ by
  $\hat{Q}_{k,j}[x^n, y^n](u^k, v^k) = \frac{k}{n} \left| \{i \ge 0 : ik+j+k \le n, \ x_{ik+j+1}^{ik+j+k} = u^k, \ y_{ik+j+1}^{ik+j+k} = v^k\} \right|$,   (95)
it follows from Theorem 6, applied to a $k$-th-order super-symbol, that when the source is ergodic in $k$-blocks,
  $\limsup_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_{k,j}[X^n, Y^n(X^n)]) \le R + \frac{1}{k} H(X^k) - H(\mathbf{X})$  a.s.   (96)
(Note that $\hat{Q}_{k,j}[x^n, y^n]$ may not be a bona fide distribution, but is very close to one for fixed $k$ and large $n$; when it is not, $I_k(\hat{Q}_{k,j}[x^n, y^n])$ should be understood as the mutual information under the normalized version of $\hat{Q}_{k,j}[x^n, y^n]$.) Then, since for fixed $k$ and large $n$, $\hat{Q}_k \approx \frac{1}{k} \sum_{j=0}^{k-1} \hat{Q}_{k,j}$ and, by the $k$-block ergodicity, $\hat{Q}_{k,j}[X^n] \to P_{X^k}$ for each $j$, it would follow from the continuity of $I_k$ and its convexity in the conditional distribution that, for large $n$, $I_k(\hat{Q}_k) \le \frac{1}{k} \sum_{j=0}^{k-1} I_k(\hat{Q}_{k,j}) + o(1)$, implying (93) when combined with (96). The proof that follows makes these continuity arguments precise and accommodates the possibility that the source is not ergodic in $k$-blocks.

Proof of Theorem 7: For each $0 \le j \le k-1$ denote the $k$-th-order super-source by $\mathbf{X}^{k,j} = \{X_{ik+j+1}^{ik+j+k}\}_{i \ge 0}$. A lemma of [Gal68] implies the existence of events $A_0, \ldots, A_{k-1}$ with the following properties:

1. $P(A_i \cap A_j) = 1/k$ if $i = j$, and $0$ otherwise.
2. For each $i, j \in \{0, \ldots, k-1\}$, $\mathbf{X}^{k,j}$ conditioned on $A_i$ is stationary and ergodic.
3. $\mathbf{X}^{k,j}$ conditioned on $A_i$ is equal in distribution to $\mathbf{X}^{k,j+1}$ conditioned on $A_{i+1}$, where the indices $i, j$ are to be taken modulo $k$.

Note that
  $\left\| \hat{Q}_k[x^n, y^n] - \frac{1}{k} \sum_{j=0}^{k-1} \hat{Q}_{k,j}[x^n, y^n] \right\| \le \delta_n$,   (97)
where $\{\delta_n\}$ is a deterministic sequence with $\delta_n \to 0$ as $n \to \infty$. Theorem 6 applied to $\mathbf{X}^{k,j}$ conditioned on $A_i$ gives
  $\limsup_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_{k,j}[X^n, Y^n(X^n)]) \le R + \frac{1}{k} H(X_{j+1}^{j+k} | A_i) - \frac{1}{k} \bar{H}(\mathbf{X}^{k,j} | A_i)$  a.s. on $A_i$,   (98)
where $H(X_{j+1}^{j+k} | A_i)$ and $\bar{H}(\mathbf{X}^{k,j} | A_i)$ denote, respectively, the entropy of the distribution of $X_{j+1}^{j+k}$ when conditioned on $A_i$ and the entropy rate (per $k$-block) of $\mathbf{X}^{k,j}$ when conditioned on $A_i$. Now, due to the fact that $\mathbf{X}^{k,j}$ conditioned on $A_i$ is stationary and ergodic, we have, for each $j$,
  $\lim_{n\to\infty} \hat{Q}_{k,j}[X^n] = P_{X^k | A_i}$  a.s. on $A_i$,   (99)
with $P_{X^k | A_i}$ denoting the distribution of the corresponding source $k$-block conditioned on $A_i$. Equation (99) implies, in turn, by a standard continuity argument (letting $\hat{Q}_{k,j}[y^n | x^n]$ be defined analogously to $\hat{Q}_k[y^n | x^n]$),
  $\lim_{n\to\infty} \left\| \hat{Q}_{k,j}[X^n, Y^n(X^n)] - P_{X^k | A_i} \circ \hat{Q}_{k,j}[Y^n(X^n) | X^n] \right\| = 0$  a.s. on $A_i$.   (100)
Combined with (97), (100) implies also
  $\lim_{n\to\infty} \left\| \hat{Q}_k[X^n, Y^n(X^n)] - P_{X^k | A_i} \circ \frac{1}{k} \sum_{j=0}^{k-1} \hat{Q}_{k,j}[Y^n(X^n) | X^n] \right\| = 0$  a.s. on $A_i$,   (101)
implying, in turn, by the continuity of the mutual information as a function of the joint distribution,
  $I_k(\hat{Q}_k[X^n, Y^n(X^n)]) - I_k\!\left( P_{X^k | A_i} \circ \frac{1}{k} \sum_{j=0}^{k-1} \hat{Q}_{k,j}[Y^n(X^n) | X^n] \right) \to 0$  a.s. on $A_i$.   (102)
On the other hand, by the convexity of the mutual information in the conditional distribution [CT91, Theorem 2.7.4], we have for all $k$, $n$ and all sample paths
  $I_k\!\left( P_{X^k | A_i} \circ \frac{1}{k} \sum_{j=0}^{k-1} \hat{Q}_{k,j}[Y^n(X^n) | X^n] \right) \le \frac{1}{k} \sum_{j=0}^{k-1} I_k\!\left( P_{X^k | A_i} \circ \hat{Q}_{k,j}[Y^n(X^n) | X^n] \right)$.   (103)
Using (99) and the continuity of the mutual information yet again gives, for all $j$,
  $I_k\!\left( P_{X^k | A_i} \circ \hat{Q}_{k,j}[Y^n(X^n) | X^n] \right) - I_k(\hat{Q}_{k,j}[X^n, Y^n(X^n)]) \to 0$  a.s. on $A_i$.   (104)

Consequently, a.s. on $A_i$,
  $\limsup_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n(X^n)]) \le \frac{1}{k} \sum_{j=0}^{k-1} \limsup_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_{k,j}[X^n, Y^n(X^n)])$   (105)
  $\quad \le R + \frac{1}{k} \sum_{j=0}^{k-1} \left[ \frac{1}{k} H(X_{j+1}^{j+k} | A_i) - \frac{1}{k} \bar{H}(\mathbf{X}^{k,j} | A_i) \right]$   (106)
  $\quad = R + \frac{1}{k} \sum_{j=0}^{k-1} \left[ \frac{1}{k} H(X^k | A_{(i-j) \bmod k}) - \frac{1}{k} \bar{H}(\mathbf{X}^{k,0} | A_{(i-j) \bmod k}) \right]$   (107)
  $\quad = R + \frac{1}{k} \sum_{j=0}^{k-1} \left[ \frac{1}{k} H(X^k | A_j) - \frac{1}{k} \bar{H}(\mathbf{X}^{k,0} | A_j) \right]$,   (108)
where (105) follows by combining (102)-(104), (106) follows from (98), and (107) follows from the third property of the events $A_i$ recalled above. Since (108) does not depend on $i$ we obtain
  $\limsup_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n(X^n)]) \le R + \frac{1}{k} \sum_{j=0}^{k-1} \left[ \frac{1}{k} H(X^k | A_j) - \frac{1}{k} \bar{H}(\mathbf{X}^{k,0} | A_j) \right]$  a.s.   (109)
Defining the $\{0, \ldots, k-1\}$-valued random variable $J$ by
  $J = i$ on $A_i$,   (110)
it follows from the first property of the sets $A_i$ recalled above that
  $\frac{1}{k} \sum_{j=0}^{k-1} H(X^k | A_j) = H(X^k | J) \le H(X^k)$   (111)
and that
  $\frac{1}{k} \sum_{j=0}^{k-1} \bar{H}(\mathbf{X}^{k,0} | A_j) = \bar{H}(\mathbf{X}^{k,0} | J) = \lim_{n\to\infty} \frac{k}{n} H(X^n | J) = \lim_{n\to\infty} \frac{k}{n} [H(X^n, J) - H(J)] = \lim_{n\to\infty} \frac{k}{n} H(X^n, J) \ge \lim_{n\to\infty} \frac{k}{n} H(X^n) = k H(\mathbf{X})$.   (112)
Combining (109) with (111) and (112) gives (93).

A direct consequence of (93) is the following.

Corollary 2 Let $\{Y^n\}$ be a sequence of block codes with rate $R$. For any stationary ergodic $\mathbf{X}$,
  $\limsup_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n(X^n)]) \le R$  a.s.   (113)
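As a numerical sanity check on the flavor of Theorem 6 and Corollary 2 (our own toy experiment, not a device used in the paper), the following Python sketch builds a simple fixed-rate block code, a random codebook applied to consecutive sub-blocks, runs it on an i.i.d. binary source, and compares $I(\hat{Q}_1[x^n, y^n])$ with the code rate $R$; by (75), the former should not noticeably exceed the latter for large $n$.

```python
import numpy as np

def mutual_information(Q):
    Q = np.asarray(Q, float)
    Px, Py = Q.sum(1, keepdims=True), Q.sum(0, keepdims=True)
    m = Q > 0
    return float(np.sum(Q[m] * np.log(Q[m] / (Px @ Py)[m])))

def toy_block_code(x, b, codebook):
    """Map each length-b sub-block of x to its nearest codeword in Hamming distance."""
    y = np.empty_like(x)
    for i in range(0, len(x) - b + 1, b):
        blk = x[i:i + b]
        dists = (codebook != blk).sum(axis=1)
        y[i:i + b] = codebook[dists.argmin()]
    return y

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, b, C = 200000, 8, 16
    R = np.log(C) / b                       # rate in nats per source symbol
    x = rng.integers(0, 2, size=n)          # i.i.d. Bernoulli(1/2) source
    codebook = rng.integers(0, 2, size=(C, b))
    y = toy_block_code(x, b, codebook)
    Q1 = np.zeros((2, 2))
    for a, c in zip(x, y):                  # first-order empirical joint distribution
        Q1[a, c] += 1
    Q1 /= Q1.sum()
    print("I(Q1_hat) =", mutual_information(Q1), "  rate R =", R)
```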

B  Sample Converse in Lossy Source Coding

One of the main results of [Kie91] is the following.

Theorem 8 (Theorem 1 in [Kie91]) Let $\{Y^n\}$ be a sequence of block codes with rate $R$. For any stationary ergodic $\mathbf{X}$,
  $\liminf_{n\to\infty} \rho_n(X^n, Y^n(X^n)) \ge D(\mathbf{X}, R)$  a.s.   (114)

The proof in [Kie91] was valid for general source and reconstruction alphabets, and was based on the powerful ergodic theorem in [Kie89]. We now show how Corollary 2 can be used to give a simple proof, valid in our finite-alphabet setting.

Proof of Theorem 8: Note first that, by a standard continuity argument and ergodicity, for each $k$,
  $\lim_{n\to\infty} \left\| \hat{Q}_k[X^n, Y^n(X^n)] - P_{X^k} \circ \hat{Q}_k[Y^n(X^n) | X^n] \right\| = 0$  a.s.,   (115)
implying, in turn, by the continuity of the mutual information as a function of the joint distribution,
  $I_k(\hat{Q}_k[X^n, Y^n(X^n)]) - I_k\!\left( P_{X^k} \circ \hat{Q}_k[Y^n(X^n) | X^n] \right) \to 0$  a.s.   (116)
Fix now $\varepsilon > 0$. By Corollary 2, with probability one there exists $k$ large enough that
  $\frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n(X^n)]) \le R + \varepsilon$  for all sufficiently large $n$,   (117)
implying, by (116), with probability one, the existence of $k$ large enough that
  $\frac{1}{k} I_k\!\left( P_{X^k} \circ \hat{Q}_k[Y^n(X^n) | X^n] \right) \le R + 2\varepsilon$  for all sufficiently large $n$.   (118)
By the definition of $D(X^k, \cdot)$ it follows that for all such $k$ and $n$,
  $E_{P_{X^k} \circ \hat{Q}_k[Y^n(X^n) | X^n]} \rho_k(X^k, Y^k) \ge D(X^k, R + 2\varepsilon) \ge D(\mathbf{X}, R + 2\varepsilon)$,   (119)
implying, by the continuity property (115), with probability one, the existence of $k$ large enough that
  $E_{\hat{Q}_k[X^n, Y^n(X^n)]} \rho_k(X^k, Y^k) \ge D(\mathbf{X}, R + 2\varepsilon) - \varepsilon$  for all sufficiently large $n$.   (120)
By the definition of $\hat{Q}_k$, it is readily verified that for any $k$ and all sample paths,
  $\left| E_{\hat{Q}_k[X^n, Y^n(X^n)]} \rho_k(X^k, Y^k) - \rho_n(X^n, Y^n(X^n)) \right| \le \tilde{\varepsilon}_n$,   (121)
where $\{\tilde{\varepsilon}_n\}$ is a deterministic sequence (depending on $k$) with $\tilde{\varepsilon}_n \to 0$ as $n \to \infty$. (In fact, we would have the pointwise equality $E_{\hat{Q}_k[x^n, y^n]} \rho_k(X^k, Y^k) = \rho_n(x^n, y^n)$ had $\hat{Q}_k$ been slightly modified from its definition in (7) to $\hat{Q}_k[x^n, y^n](u^k, v^k) = \frac{1}{n} |\{1 \le i \le n : x_i^{i+k-1} = u^k, y_i^{i+k-1} = v^k\}|$ with the indices understood modulo $n$.) Combined with (120), this implies
  $\liminf_{n\to\infty} \rho_n(X^n, Y^n(X^n)) \ge D(\mathbf{X}, R + 2\varepsilon) - \varepsilon$  a.s.,   (122)
completing the proof by the arbitrariness of $\varepsilon$ and the continuity of $D(\mathbf{X}, \cdot)$.

C  Sample Behavior of the Empirical Distribution of Good Codes

In the context of Theorem 8 we make the following sample analogue of Definition 3.

Definition 4 Let $\{Y^n\}$ be a sequence of block codes with rate $R$ and let $\mathbf{X}$ be a stationary ergodic source. $\{Y^n\}$ will be said to be a pointwise good sequence of codes for the stationary ergodic source $\mathbf{X}$ (in the strong sense) at rate $R$ if
  $\limsup_{n\to\infty} \rho_n(X^n, Y^n(X^n)) \le D(\mathbf{X}, R)$  a.s.   (123)

Note that Theorem 8 implies that the limit supremum in the above definition is actually a limit, the inequality in (123) actually holds with equality, and the rate of the corresponding good code sequence is exactly $R$. The bounded convergence theorem implies that a pointwise good sequence of codes is also good in the sense of Definition 3. The converse, however, is not true [Wei]. The existence of pointwise good code sequences is a known consequence of the existence of good code sequences [Kie91, Kie78]. In fact, there exist pointwise good code sequences that are universally good for all stationary and ergodic sources [Ziv72, Ziv80, YK96]. Henceforth the phrase "good codes" should be understood in the sense of Definition 4, even when omitting "pointwise".

The following is the pointwise version of Theorem 3.

Theorem 9 Let $\{Y^n\}$ correspond to a pointwise good code sequence for the stationary ergodic source $\mathbf{X}$ at rate $R(\mathbf{X}, D)$. Then:
1. For every $k$,
  $\lim_{n\to\infty} E_{\hat{Q}_k[X^n, Y^n(X^n)]} \rho_k(X^k, Y^k) = D$  a.s.,   (124)
and
  $R(X^k, D) \le \liminf_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n(X^n)]) \le \limsup_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n(X^n)]) \le R(\mathbf{X}, D) + \frac{1}{k} H(X^k) - H(\mathbf{X})$  a.s.,   (125)
so in particular
  $\lim_{k\to\infty} \liminf_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n(X^n)]) = \lim_{k\to\infty} \limsup_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n(X^n)]) = R(\mathbf{X}, D)$  a.s.   (126)
2. If it also holds that $R(X^k, D) = R(\mathbf{X}, D) + \frac{1}{k} H(X^k) - H(\mathbf{X})$, then
  $\lim_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n(X^n)]) = R(X^k, D)$  a.s.   (127)
3. If, additionally, $R(X^k, D)$ is uniquely achieved by the pair $(X^k, \tilde{Y}^k)$, then
  $\hat{Q}_k[X^n, Y^n(X^n)] \Rightarrow P_{X^k, \tilde{Y}^k}$  a.s.   (128)

Proof of Theorem 9: $\{Y^n\}$ being a good code sequence implies
  $\lim_{n\to\infty} \rho_n(X^n, Y^n(X^n)) = D$  a.s.,   (129)

implying (124) by (121). Reasoning similarly as in the proof of Theorem 8, it follows from (115) that
  $\lim_{n\to\infty} E_{P_{X^k} \circ \hat{Q}_k[Y^n(X^n) | X^n]} \rho_k(X^k, Y^k) = \lim_{n\to\infty} E_{\hat{Q}_k[X^n, Y^n(X^n)]} \rho_k(X^k, Y^k) = D$  a.s.,   (130)
implying
  $\liminf_{n\to\infty} \frac{1}{k} I_k(\hat{Q}_k[X^n, Y^n(X^n)]) \ge \liminf_{n\to\infty} \frac{1}{k} I_k\!\left( P_{X^k} \circ \hat{Q}_k[Y^n(X^n) | X^n] \right) \ge R(X^k, D)$  a.s.,   (131)
where the left inequality follows from (116) and the second one from (130) and the continuity of $R(X^k, \cdot)$. The upper bound in (125) follows from Theorem 7. Displays (126) and (127) are immediate consequences. The convergence in (128) follows directly from combining (127), (124), the fact that, by ergodicity,
  $\hat{Q}_k[X^n] \Rightarrow P_{X^k}$  a.s.,   (132)
and Lemma 1 applied to the $k$-th-order super-symbol.

D  Application to Compression-Based Denoising

Consider again the setting of subsection 3.D, where the stationary ergodic source $\mathbf{X}$ is corrupted by additive noise $\mathbf{N}$ of i.i.d. components. The following is the almost sure version of Theorem 5.

Theorem 10 Let $\{Y^n\}$ correspond to a good code sequence for the noisy source $\mathbf{Z}$ at distortion level $H(N)$ under the difference distortion measure in (54), and assume that the channel matrix is invertible. Then
  $\limsup_{n\to\infty} \Lambda_n(X^n, Y^n(Z^n)) \le E\Psi\!\left( P_{X_0 | Z_{-\infty}^{\infty}} \right)$  a.s.   (133)

Proof: The combination of Theorem 4 with the third item of Theorem 9 gives
  $\hat{Q}_k[Z^n, Y^n(Z^n)] \Rightarrow P_{Z^k, X^k}$  a.s.   (134)
On the other hand, joint stationarity and ergodicity of the pair $(\mathbf{X}, \mathbf{Z})$ implies
  $\hat{Q}_k[X^n, Z^n] \Rightarrow P_{X^k, Z^k}$  a.s.   (135)
Arguing analogously as in the proof of Theorem 5, this time for the sample empirical distribution $\hat{Q}_k[X^n, Z^n, Y^n]$ (we extend the notation by letting $\hat{Q}_k[x^n, z^n, y^n]$ stand for the $k$-th-order empirical distribution of the triple $(X^k, Z^k, Y^k)$ induced by $(x^n, z^n, y^n)$) instead of the distribution of the triple in (63), leads, taking say $k = 2m+1$, to
  $\limsup_{n\to\infty} \Lambda_n(X^n, Y^n(Z^n)) \le E\Psi\!\left( P_{X_0 | Z_{-m}^{m}} \right)$  a.s.,   (136)
implying (133) upon letting $m \to \infty$.

E  Derandomizing for Optimum Denoising

The reconstruction associated with a good code sequence has what seems to be a desirable property in the denoising setting of the previous subsection, namely, that the marginalized distribution, for any finite $k$, of the noisy source and reconstruction is essentially distributed like that of the noisy source and the underlying clean signal, as $n$ becomes large. Indeed, this property was enough to derive an upper bound on the denoising loss (Theorems 5 and 10), which, for example, for the binary case was seen to be within a factor of 2 of the optimum. We now point out, using the said property, that essentially optimum denoising can be attained by a simple post-processing procedure.

The procedure is to fix an $m$ and evaluate, for each $z_{-m}^m \in A^{2m+1}$ and $y \in A$,
  $\hat{Q}_{2m+1}[Z^n, Y^n(Z^n)](z_{-m}^m, y) = \sum_{y_{-m}^{-1}, y_1^m} \hat{Q}_{2m+1}[Z^n, Y^n(Z^n)](z_{-m}^m, y_{-m}^{-1} y y_1^m)$.
In practice, the computation of $\hat{Q}_{2m+1}[Z^n, Y^n(Z^n)](z_{-m}^m, y)$ can be done quite efficiently and sequentially by updating counts for the various $z_{-m}^m \in A^{2m+1}$ as they appear along the noisy sequence (cf. [WOS+03, Section 3]). Define now the $n$-block denoiser $\hat{X}^{[m],n}(Z^n)$ by letting the reconstruction symbol at location $m+1 \le i \le n-m$ be given by
  $\hat{X}_i = \arg\min_{\hat{x}} \sum_y \hat{Q}_{2m+1}[Z^n, Y^n(Z^n)](Z_{i-m}^{i+m}, y) \Lambda(y, \hat{x})$,   (137)
and be arbitrarily defined for $i$'s outside that range.

Note that $\hat{X}_i$ is nothing but the Bayes response to the conditional distribution of $Y_{m+1}$ given $Z^{2m+1} = Z_{i-m}^{i+m}$ induced by the joint distribution $\hat{Q}_{2m+1}[Z^n, Y^n(Z^n)]$ of $(Z^{2m+1}, Y^{2m+1})$. However, from the conclusion in (134) we know that this converges almost surely to the conditional distribution of $X_{m+1}$ given $Z^{2m+1} = Z_{i-m}^{i+m}$. Thus, $\hat{X}_i$ in (137) is a Bayes response to a conditional distribution that converges almost surely, as $n \to \infty$, to the true conditional distribution of $X_i$ given $Z_{i-m}^{i+m}$. It thus follows from continuity of the performance of Bayes responses (cf., e.g., [Han57, Equation 4], [WOS+03, Lemma 1]) and (135) that
  $\lim_{n\to\infty} \Lambda_n(X^n, \hat{X}^{[m],n}(Z^n)) = E\phi\!\left( P_{X_0 | Z_{-m}^{m}} \right)$  a.s.,   (138)
with $\phi$ denoting the Bayes envelope associated with the loss function $\Lambda$, defined, for $P \in \mathcal{M}(A)$, by
  $\phi(P) = \min_{\hat{x} \in A} \sum_{x \in A} \Lambda(x, \hat{x}) P(x)$.   (139)
Letting $m \to \infty$ in (138) gives
  $\lim_{m\to\infty} \lim_{n\to\infty} \Lambda_n(X^n, \hat{X}^{[m],n}(Z^n)) = E\phi\!\left( P_{X_0 | Z_{-\infty}^{\infty}} \right)$  a.s.,   (140)
where the right side is the asymptotic optimum distribution-dependent denoising performance (cf. [WOS+03, Section 5]). Note that $\hat{X}^{[m],n}$ can be chosen independently of a source, e.g., taking the Yang-Kieffer codes of [YK96] followed by the post-processing detailed in (137). Thus, (140) implies that the post-processing step leads to essentially asymptotically optimum denoising performance. Bounded convergence implies also
  $\lim_{m\to\infty} \lim_{n\to\infty} E\Lambda_n(X^n, \hat{X}^{[m],n}(Z^n)) = E\phi\!\left( P_{X_0 | Z_{-\infty}^{\infty}} \right)$,   (141)
implying, in turn, the existence of a deterministic sequence $\{m_n\}$ such that, for $\hat{X}^n = \hat{X}^{[m_n],n}$,
  $\lim_{n\to\infty} E\Lambda_n(X^n, \hat{X}^{[m_n],n}(Z^n)) = E\phi\!\left( P_{X_0 | Z_{-\infty}^{\infty}} \right)$.   (142)
The sequence $\{m_n\}$ for which (142) holds, however, may depend on the particular code sequence $\{Y^n\}$, as well as on the distribution of the active source $\mathbf{X}$.
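A compact Python rendition of the post-processing rule (137) is sketched below. This is our own illustrative implementation under assumed inputs (the reconstruction y would come from whatever good lossy code is applied to the noisy sequence); it is not the paper's code.

```python
import numpy as np
from collections import defaultdict

def derandomized_denoiser(z, y, m, M, Lambda):
    """Post-process a lossy reconstruction y of the noisy sequence z, as in (137):
    at each location i, output the Bayes response (under loss Lambda) to the
    empirical conditional distribution of the code's center symbol given the
    (2m+1)-window of z."""
    n = len(z)
    counts = defaultdict(lambda: np.zeros(M))
    for i in range(m, n - m):                     # counts for Q_{2m+1}[z, y](window, center of y)
        counts[tuple(z[i - m:i + m + 1])][y[i]] += 1
    Lambda = np.asarray(Lambda, float)
    x_hat = np.array(z)                           # arbitrary choice near the edges
    for i in range(m, n - m):
        q = counts[tuple(z[i - m:i + m + 1])]
        x_hat[i] = int(np.argmin(Lambda.T @ q))   # argmin_xhat sum_y q(y) Lambda(y, xhat)
    return x_hat

if __name__ == "__main__":
    # Tiny illustration with Hamming loss on a binary alphabet.
    z = np.array([0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0])
    y = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0])
    print(derandomized_denoiser(z, y, m=1, M=2, Lambda=[[0, 1], [1, 0]]))
```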

5 Conclusion and Open Questions

In this work we have seen that a rate constraint on a source code places a similar limitation on the mutual information associated with its $k$-th-order marginalization, for any finite $k$, as $n$ becomes large. This was shown to be the case both in the distributional sense of Section 3 and the pointwise sense of Section 4. This was also shown to imply, in various quantitative senses, that the empirical distribution of codes that are good, in the sense of asymptotically attaining a point on the rate distortion curve, becomes close to that attaining the minimum mutual information problem associated with the rate distortion function of the source. For a source corrupted by additive white noise this was shown to imply that, under the right distortion measure and at the right distortion level, the joint empirical distribution of the noisy source and the reconstruction associated with a good code tends to imitate the joint distribution of the noisy and noise-free source. This property led to performance bounds, both in expectation and pointwise, on the denoising performance of such codes. It was also seen to imply the existence of a simple post-processing procedure that, when applied to these compression-based denoisers, results in essentially optimum denoising performance.

One of the salient questions left open in the context of the third item of Theorem 3 (respectively, Theorem 9) is whether the convergence in distribution, in some appropriately defined sense, continues to hold in cases beyond those required in the second item. Another interesting question, in the context of the post-processing scheme of subsection 4.E, is whether there exists a universally good code sequence for the noisy source sequence (under the distortion measure and level associated with the given noise distribution) and a corresponding growth rate for $m$ under which (142) holds simultaneously for all stationary ergodic underlying noise-free sources.

Appendix

A  Proof of Lemma 1

We shall use the following elementary fact from analysis.

Fact 3 Let $f : \mathcal{D} \to \mathbb{R}$ be continuous and $\mathcal{D}$ be compact. Assume further that $\min_{z \in \mathcal{D}} f(z)$ is uniquely attained by some $z^* \in \mathcal{D}$. Let $z_n \in \mathcal{D}$ satisfy
  $\lim_{n\to\infty} f(z_n) = f(z^*) = \min_{z \in \mathcal{D}} f(z)$.
Then
  $\lim_{n\to\infty} z_n = z^*$.

Equipped with this, we prove Lemma 1 as follows. Let $P^n_{Y|X}$ denote the conditional distribution of $Y$ given $X$ induced by $P^n_{X,Y}$, and define
  $\hat{P}^n_{X,Y} = P_X \circ P^n_{Y|X}$.   (A.1)
It follows from (32) that
  $\| \hat{P}^n_{X,Y} - P^n_{X,Y} \| \to 0$,   (A.2)
and therefore also, by (34),
  $\lim_{n\to\infty} E_{\hat{P}^n_{X,Y}} \rho(X, Y) = D$.   (A.3)
Define further $\bar{P}^n_{X,Y}$ by
  $\bar{P}^n_{X,Y} = \alpha_n \hat{P}^n_{X,Y} + (1 - \alpha_n) P_X \circ \delta_{Y|X}$,   (A.4)
where
  $\alpha_n = \min\left\{ \frac{D}{E_{\hat{P}^n_{X,Y}} \rho(X, Y)}, \ 1 \right\}$.   (A.5)
By construction,
  $\bar{P}^n_X = P_X$ and $E_{\bar{P}^n_{X,Y}} \rho(X, Y) \le D$ for all $n$.   (A.6)
Also, (A.3) implies that $\alpha_n \to 1$, and hence $\| \bar{P}^n_{X,Y} - \hat{P}^n_{X,Y} \| \to 0$, implying, when combined with (A.2), that
  $\| \bar{P}^n_{X,Y} - P^n_{X,Y} \| \to 0$,   (A.7)
implying, in turn, by (33) and the uniform continuity of $I$,
  $\lim_{n\to\infty} I(\bar{P}^n_{X,Y}) = R(D)$.   (A.8)
Now, the set
  $\{ P'_{X,Y} : P'_X = P_X, \ E_{P'_{X,Y}} \rho(X, Y) \le D \}$
is compact, $I$ is continuous, and $R(D)$ is uniquely achieved by $P_{X,Y}$. So (A.6), (A.8), and Fact 3 imply
  $\bar{P}^n_{X,Y} \stackrel{d}{\to} P_{X,Y}$,   (A.9)
implying (35) by (A.7).

B  Proof of Lemma 3

Inequality (83) implies the existence of $\eta > 0$ and $n_0$ such that
  $|A_n| \le e^{n[H(\mathbf{X}) - 2\eta]}$ for all $n \ge n_0$.   (A.10)
Defining
  $G_{n,\eta} = \left\{ x^n : -\frac{1}{n} \log P_{X^n}(x^n) \le H(\mathbf{X}) + \eta \right\}$,   (A.11)


Lattices for Distributed Source Coding: Jointly Gaussian Sources and Reconstruction of a Linear Function Lattices for Distributed Source Coding: Jointly Gaussian Sources and Reconstruction of a Linear Function Dinesh Krithivasan and S. Sandeep Pradhan Department of Electrical Engineering and Computer Science,

More information

18.2 Continuous Alphabet (discrete-time, memoryless) Channel

18.2 Continuous Alphabet (discrete-time, memoryless) Channel 0-704: Information Processing and Learning Spring 0 Lecture 8: Gaussian channel, Parallel channels and Rate-distortion theory Lecturer: Aarti Singh Scribe: Danai Koutra Disclaimer: These notes have not

More information

Discrete Universal Filtering Through Incremental Parsing

Discrete Universal Filtering Through Incremental Parsing Discrete Universal Filtering Through Incremental Parsing Erik Ordentlich, Tsachy Weissman, Marcelo J. Weinberger, Anelia Somekh Baruch, Neri Merhav Abstract In the discrete filtering problem, a data sequence

More information

ECE 4400:693 - Information Theory

ECE 4400:693 - Information Theory ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential

More information

National University of Singapore Department of Electrical & Computer Engineering. Examination for

National University of Singapore Department of Electrical & Computer Engineering. Examination for National University of Singapore Department of Electrical & Computer Engineering Examination for EE5139R Information Theory for Communication Systems (Semester I, 2014/15) November/December 2014 Time Allowed:

More information

Information measures in simple coding problems

Information measures in simple coding problems Part I Information measures in simple coding problems in this web service in this web service Source coding and hypothesis testing; information measures A(discrete)source is a sequence {X i } i= of random

More information

Inequalities Relating Addition and Replacement Type Finite Sample Breakdown Points

Inequalities Relating Addition and Replacement Type Finite Sample Breakdown Points Inequalities Relating Addition and Replacement Type Finite Sample Breadown Points Robert Serfling Department of Mathematical Sciences University of Texas at Dallas Richardson, Texas 75083-0688, USA Email:

More information

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 Please submit the solutions on Gradescope. EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 1. Optimal codeword lengths. Although the codeword lengths of an optimal variable length code

More information

Pointwise Redundancy in Lossy Data Compression and Universal Lossy Data Compression

Pointwise Redundancy in Lossy Data Compression and Universal Lossy Data Compression Pointwise Redundancy in Lossy Data Compression and Universal Lossy Data Compression I. Kontoyiannis To appear, IEEE Transactions on Information Theory, Jan. 2000 Last revision, November 21, 1999 Abstract

More information

(each row defines a probability distribution). Given n-strings x X n, y Y n we can use the absence of memory in the channel to compute

(each row defines a probability distribution). Given n-strings x X n, y Y n we can use the absence of memory in the channel to compute ENEE 739C: Advanced Topics in Signal Processing: Coding Theory Instructor: Alexander Barg Lecture 6 (draft; 9/6/03. Error exponents for Discrete Memoryless Channels http://www.enee.umd.edu/ abarg/enee739c/course.html

More information

The Information Lost in Erasures Sergio Verdú, Fellow, IEEE, and Tsachy Weissman, Senior Member, IEEE

The Information Lost in Erasures Sergio Verdú, Fellow, IEEE, and Tsachy Weissman, Senior Member, IEEE 5030 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 11, NOVEMBER 2008 The Information Lost in Erasures Sergio Verdú, Fellow, IEEE, Tsachy Weissman, Senior Member, IEEE Abstract We consider sources

More information

Shannon s A Mathematical Theory of Communication

Shannon s A Mathematical Theory of Communication Shannon s A Mathematical Theory of Communication Emre Telatar EPFL Kanpur October 19, 2016 First published in two parts in the July and October 1948 issues of BSTJ. First published in two parts in the

More information

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Nevroz Şen, Fady Alajaji and Serdar Yüksel Department of Mathematics and Statistics Queen s University Kingston, ON K7L 3N6, Canada

More information

lossless, optimal compressor

lossless, optimal compressor 6. Variable-length Lossless Compression The principal engineering goal of compression is to represent a given sequence a, a 2,..., a n produced by a source as a sequence of bits of minimal possible length.

More information

ECE Information theory Final (Fall 2008)

ECE Information theory Final (Fall 2008) ECE 776 - Information theory Final (Fall 2008) Q.1. (1 point) Consider the following bursty transmission scheme for a Gaussian channel with noise power N and average power constraint P (i.e., 1/n X n i=1

More information

An Extended Fano s Inequality for the Finite Blocklength Coding

An Extended Fano s Inequality for the Finite Blocklength Coding An Extended Fano s Inequality for the Finite Bloclength Coding Yunquan Dong, Pingyi Fan {dongyq8@mails,fpy@mail}.tsinghua.edu.cn Department of Electronic Engineering, Tsinghua University, Beijing, P.R.

More information

Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels

Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels Superposition Encoding and Partial Decoding Is Optimal for a Class of Z-interference Channels Nan Liu and Andrea Goldsmith Department of Electrical Engineering Stanford University, Stanford CA 94305 Email:

More information

Lecture 2. Capacity of the Gaussian channel

Lecture 2. Capacity of the Gaussian channel Spring, 207 5237S, Wireless Communications II 2. Lecture 2 Capacity of the Gaussian channel Review on basic concepts in inf. theory ( Cover&Thomas: Elements of Inf. Theory, Tse&Viswanath: Appendix B) AWGN

More information

3F1: Signals and Systems INFORMATION THEORY Examples Paper Solutions

3F1: Signals and Systems INFORMATION THEORY Examples Paper Solutions Engineering Tripos Part IIA THIRD YEAR 3F: Signals and Systems INFORMATION THEORY Examples Paper Solutions. Let the joint probability mass function of two binary random variables X and Y be given in the

More information

Lecture 8: Shannon s Noise Models

Lecture 8: Shannon s Noise Models Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007) Lecture 8: Shannon s Noise Models September 14, 2007 Lecturer: Atri Rudra Scribe: Sandipan Kundu& Atri Rudra Till now we have

More information

MARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for

MARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for MARKOV CHAINS A finite state Markov chain is a sequence S 0,S 1,... of discrete cv s from a finite alphabet S where q 0 (s) is a pmf on S 0 and for n 1, Q(s s ) = Pr(S n =s S n 1 =s ) = Pr(S n =s S n 1

More information

THIS paper constitutes a systematic study of the capacity

THIS paper constitutes a systematic study of the capacity 4 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 1, JANUARY 2002 Capacities of TimeVarying MultipleAccess Channels With Side Information Arnab Das Prakash Narayan, Fellow, IEEE Abstract We determine

More information

Optimal Encoding Schemes for Several Classes of Discrete Degraded Broadcast Channels

Optimal Encoding Schemes for Several Classes of Discrete Degraded Broadcast Channels Optimal Encoding Schemes for Several Classes of Discrete Degraded Broadcast Channels Bike Xie, Student Member, IEEE and Richard D. Wesel, Senior Member, IEEE Abstract arxiv:0811.4162v4 [cs.it] 8 May 2009

More information

Homework 10 Solution

Homework 10 Solution CS 174: Combinatorics and Discrete Probability Fall 2012 Homewor 10 Solution Problem 1. (Exercise 10.6 from MU 8 points) The problem of counting the number of solutions to a napsac instance can be defined

More information

Second-Order Asymptotics in Information Theory

Second-Order Asymptotics in Information Theory Second-Order Asymptotics in Information Theory Vincent Y. F. Tan (vtan@nus.edu.sg) Dept. of ECE and Dept. of Mathematics National University of Singapore (NUS) National Taiwan University November 2015

More information

Subset Source Coding

Subset Source Coding Fifty-third Annual Allerton Conference Allerton House, UIUC, Illinois, USA September 29 - October 2, 205 Subset Source Coding Ebrahim MolavianJazi and Aylin Yener Wireless Communications and Networking

More information

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science Transmission of Information Spring 2006

MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science Transmission of Information Spring 2006 MASSACHUSETTS INSTITUTE OF TECHNOLOGY Department of Electrical Engineering and Computer Science 6.44 Transmission of Information Spring 2006 Homework 2 Solution name username April 4, 2006 Reading: Chapter

More information

Lecture 22: Final Review

Lecture 22: Final Review Lecture 22: Final Review Nuts and bolts Fundamental questions and limits Tools Practical algorithms Future topics Dr Yao Xie, ECE587, Information Theory, Duke University Basics Dr Yao Xie, ECE587, Information

More information

EECS 229A Spring 2007 * * (a) By stationarity and the chain rule for entropy, we have

EECS 229A Spring 2007 * * (a) By stationarity and the chain rule for entropy, we have EECS 229A Spring 2007 * * Solutions to Homework 3 1. Problem 4.11 on pg. 93 of the text. Stationary processes (a) By stationarity and the chain rule for entropy, we have H(X 0 ) + H(X n X 0 ) = H(X 0,

More information

Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information

Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information 1 Conditional entropy Let (Ω, F, P) be a probability space, let X be a RV taking values in some finite set A. In this lecture

More information

Entropy and Ergodic Theory Notes 21: The entropy rate of a stationary process

Entropy and Ergodic Theory Notes 21: The entropy rate of a stationary process Entropy and Ergodic Theory Notes 21: The entropy rate of a stationary process 1 Sources with memory In information theory, a stationary stochastic processes pξ n q npz taking values in some finite alphabet

More information

EE 4TM4: Digital Communications II. Channel Capacity

EE 4TM4: Digital Communications II. Channel Capacity EE 4TM4: Digital Communications II 1 Channel Capacity I. CHANNEL CODING THEOREM Definition 1: A rater is said to be achievable if there exists a sequence of(2 nr,n) codes such thatlim n P (n) e (C) = 0.

More information

EE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16

EE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16 EE539R: Problem Set 4 Assigned: 3/08/6, Due: 07/09/6. Cover and Thomas: Problem 3.5 Sets defined by probabilities: Define the set C n (t = {x n : P X n(x n 2 nt } (a We have = P X n(x n P X n(x n 2 nt

More information

Exercises with solutions (Set D)

Exercises with solutions (Set D) Exercises with solutions Set D. A fair die is rolled at the same time as a fair coin is tossed. Let A be the number on the upper surface of the die and let B describe the outcome of the coin toss, where

More information

Chapter I: Fundamental Information Theory

Chapter I: Fundamental Information Theory ECE-S622/T62 Notes Chapter I: Fundamental Information Theory Ruifeng Zhang Dept. of Electrical & Computer Eng. Drexel University. Information Source Information is the outcome of some physical processes.

More information

Communications Theory and Engineering

Communications Theory and Engineering Communications Theory and Engineering Master's Degree in Electronic Engineering Sapienza University of Rome A.A. 2018-2019 AEP Asymptotic Equipartition Property AEP In information theory, the analog of

More information

Lecture 4: Proof of Shannon s theorem and an explicit code

Lecture 4: Proof of Shannon s theorem and an explicit code CSE 533: Error-Correcting Codes (Autumn 006 Lecture 4: Proof of Shannon s theorem and an explicit code October 11, 006 Lecturer: Venkatesan Guruswami Scribe: Atri Rudra 1 Overview Last lecture we stated

More information

On Competitive Prediction and Its Relation to Rate-Distortion Theory

On Competitive Prediction and Its Relation to Rate-Distortion Theory IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 49, NO. 12, DECEMBER 2003 3185 On Competitive Prediction and Its Relation to Rate-Distortion Theory Tsachy Weissman, Member, IEEE, and Neri Merhav, Fellow,

More information

Notes 3: Stochastic channels and noisy coding theorem bound. 1 Model of information communication and noisy channel

Notes 3: Stochastic channels and noisy coding theorem bound. 1 Model of information communication and noisy channel Introduction to Coding Theory CMU: Spring 2010 Notes 3: Stochastic channels and noisy coding theorem bound January 2010 Lecturer: Venkatesan Guruswami Scribe: Venkatesan Guruswami We now turn to the basic

More information

Exercise Solutions to Functional Analysis

Exercise Solutions to Functional Analysis Exercise Solutions to Functional Analysis Note: References refer to M. Schechter, Principles of Functional Analysis Exersize that. Let φ,..., φ n be an orthonormal set in a Hilbert space H. Show n f n

More information

Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information

Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information 204 IEEE International Symposium on Information Theory Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information Omur Ozel, Kaya Tutuncuoglu 2, Sennur Ulukus, and Aylin Yener

More information

Lecture 11: Polar codes construction

Lecture 11: Polar codes construction 15-859: Information Theory and Applications in TCS CMU: Spring 2013 Lecturer: Venkatesan Guruswami Lecture 11: Polar codes construction February 26, 2013 Scribe: Dan Stahlke 1 Polar codes: recap of last

More information

Linear Codes, Target Function Classes, and Network Computing Capacity

Linear Codes, Target Function Classes, and Network Computing Capacity Linear Codes, Target Function Classes, and Network Computing Capacity Rathinakumar Appuswamy, Massimo Franceschetti, Nikhil Karamchandani, and Kenneth Zeger IEEE Transactions on Information Theory Submitted:

More information

Side-information Scalable Source Coding

Side-information Scalable Source Coding Side-information Scalable Source Coding Chao Tian, Member, IEEE, Suhas N. Diggavi, Member, IEEE Abstract The problem of side-information scalable (SI-scalable) source coding is considered in this work,

More information

Polar codes for the m-user MAC and matroids

Polar codes for the m-user MAC and matroids Research Collection Conference Paper Polar codes for the m-user MAC and matroids Author(s): Abbe, Emmanuel; Telatar, Emre Publication Date: 2010 Permanent Link: https://doi.org/10.3929/ethz-a-005997169

More information

x log x, which is strictly convex, and use Jensen s Inequality:

x log x, which is strictly convex, and use Jensen s Inequality: 2. Information measures: mutual information 2.1 Divergence: main inequality Theorem 2.1 (Information Inequality). D(P Q) 0 ; D(P Q) = 0 iff P = Q Proof. Let ϕ(x) x log x, which is strictly convex, and

More information

The Method of Types and Its Application to Information Hiding

The Method of Types and Its Application to Information Hiding The Method of Types and Its Application to Information Hiding Pierre Moulin University of Illinois at Urbana-Champaign www.ifp.uiuc.edu/ moulin/talks/eusipco05-slides.pdf EUSIPCO Antalya, September 7,

More information

EE376A - Information Theory Final, Monday March 14th 2016 Solutions. Please start answering each question on a new page of the answer booklet.

EE376A - Information Theory Final, Monday March 14th 2016 Solutions. Please start answering each question on a new page of the answer booklet. EE376A - Information Theory Final, Monday March 14th 216 Solutions Instructions: You have three hours, 3.3PM - 6.3PM The exam has 4 questions, totaling 12 points. Please start answering each question on

More information

INFORMATION THEORY AND STATISTICS

INFORMATION THEORY AND STATISTICS CHAPTER INFORMATION THEORY AND STATISTICS We now explore the relationship between information theory and statistics. We begin by describing the method of types, which is a powerful technique in large deviation

More information

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157

Lecture 6: Gaussian Channels. Copyright G. Caire (Sample Lectures) 157 Lecture 6: Gaussian Channels Copyright G. Caire (Sample Lectures) 157 Differential entropy (1) Definition 18. The (joint) differential entropy of a continuous random vector X n p X n(x) over R is: Z h(x

More information

Appendix B Information theory from first principles

Appendix B Information theory from first principles Appendix B Information theory from first principles This appendix discusses the information theory behind the capacity expressions used in the book. Section 8.3.4 is the only part of the book that supposes

More information

EE376A: Homeworks #4 Solutions Due on Thursday, February 22, 2018 Please submit on Gradescope. Start every question on a new page.

EE376A: Homeworks #4 Solutions Due on Thursday, February 22, 2018 Please submit on Gradescope. Start every question on a new page. EE376A: Homeworks #4 Solutions Due on Thursday, February 22, 28 Please submit on Gradescope. Start every question on a new page.. Maximum Differential Entropy (a) Show that among all distributions supported

More information

On Common Information and the Encoding of Sources that are Not Successively Refinable

On Common Information and the Encoding of Sources that are Not Successively Refinable On Common Information and the Encoding of Sources that are Not Successively Refinable Kumar Viswanatha, Emrah Akyol, Tejaswi Nanjundaswamy and Kenneth Rose ECE Department, University of California - Santa

More information

Network coding for multicast relation to compression and generalization of Slepian-Wolf

Network coding for multicast relation to compression and generalization of Slepian-Wolf Network coding for multicast relation to compression and generalization of Slepian-Wolf 1 Overview Review of Slepian-Wolf Distributed network compression Error exponents Source-channel separation issues

More information

(Classical) Information Theory III: Noisy channel coding

(Classical) Information Theory III: Noisy channel coding (Classical) Information Theory III: Noisy channel coding Sibasish Ghosh The Institute of Mathematical Sciences CIT Campus, Taramani, Chennai 600 113, India. p. 1 Abstract What is the best possible way

More information

Dispersion of the Gilbert-Elliott Channel

Dispersion of the Gilbert-Elliott Channel Dispersion of the Gilbert-Elliott Channel Yury Polyanskiy Email: ypolyans@princeton.edu H. Vincent Poor Email: poor@princeton.edu Sergio Verdú Email: verdu@princeton.edu Abstract Channel dispersion plays

More information

(Classical) Information Theory II: Source coding

(Classical) Information Theory II: Source coding (Classical) Information Theory II: Source coding Sibasish Ghosh The Institute of Mathematical Sciences CIT Campus, Taramani, Chennai 600 113, India. p. 1 Abstract The information content of a random variable

More information

Reliable Computation over Multiple-Access Channels

Reliable Computation over Multiple-Access Channels Reliable Computation over Multiple-Access Channels Bobak Nazer and Michael Gastpar Dept. of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA, 94720-1770 {bobak,

More information

Mismatched Multi-letter Successive Decoding for the Multiple-Access Channel

Mismatched Multi-letter Successive Decoding for the Multiple-Access Channel Mismatched Multi-letter Successive Decoding for the Multiple-Access Channel Jonathan Scarlett University of Cambridge jms265@cam.ac.uk Alfonso Martinez Universitat Pompeu Fabra alfonso.martinez@ieee.org

More information

Entropies & Information Theory

Entropies & Information Theory Entropies & Information Theory LECTURE I Nilanjana Datta University of Cambridge,U.K. See lecture notes on: http://www.qi.damtp.cam.ac.uk/node/223 Quantum Information Theory Born out of Classical Information

More information

Information Theory CHAPTER. 5.1 Introduction. 5.2 Entropy

Information Theory CHAPTER. 5.1 Introduction. 5.2 Entropy Haykin_ch05_pp3.fm Page 207 Monday, November 26, 202 2:44 PM CHAPTER 5 Information Theory 5. Introduction As mentioned in Chapter and reiterated along the way, the purpose of a communication system is

More information

Randomized Quantization and Optimal Design with a Marginal Constraint

Randomized Quantization and Optimal Design with a Marginal Constraint Randomized Quantization and Optimal Design with a Marginal Constraint Naci Saldi, Tamás Linder, Serdar Yüksel Department of Mathematics and Statistics, Queen s University, Kingston, ON, Canada Email: {nsaldi,linder,yuksel}@mast.queensu.ca

More information

Information Dimension

Information Dimension Information Dimension Mina Karzand Massachusetts Institute of Technology November 16, 2011 1 / 26 2 / 26 Let X would be a real-valued random variable. For m N, the m point uniform quantized version of

More information

Lecture 4 Channel Coding

Lecture 4 Channel Coding Capacity and the Weak Converse Lecture 4 Coding I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw October 15, 2014 1 / 16 I-Hsiang Wang NIT Lecture 4 Capacity

More information

Information Theory. Lecture 5 Entropy rate and Markov sources STEFAN HÖST

Information Theory. Lecture 5 Entropy rate and Markov sources STEFAN HÖST Information Theory Lecture 5 Entropy rate and Markov sources STEFAN HÖST Universal Source Coding Huffman coding is optimal, what is the problem? In the previous coding schemes (Huffman and Shannon-Fano)it

More information

An approach from classical information theory to lower bounds for smooth codes

An approach from classical information theory to lower bounds for smooth codes An approach from classical information theory to lower bounds for smooth codes Abstract Let C : {0, 1} n {0, 1} m be a code encoding an n-bit string into an m-bit string. Such a code is called a (q, c,

More information

Training-Based Schemes are Suboptimal for High Rate Asynchronous Communication

Training-Based Schemes are Suboptimal for High Rate Asynchronous Communication Training-Based Schemes are Suboptimal for High Rate Asynchronous Communication The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation

More information

Discrete Memoryless Channels with Memoryless Output Sequences

Discrete Memoryless Channels with Memoryless Output Sequences Discrete Memoryless Channels with Memoryless utput Sequences Marcelo S Pinho Department of Electronic Engineering Instituto Tecnologico de Aeronautica Sao Jose dos Campos, SP 12228-900, Brazil Email: mpinho@ieeeorg

More information

MATH 205C: STATIONARY PHASE LEMMA

MATH 205C: STATIONARY PHASE LEMMA MATH 205C: STATIONARY PHASE LEMMA For ω, consider an integral of the form I(ω) = e iωf(x) u(x) dx, where u Cc (R n ) complex valued, with support in a compact set K, and f C (R n ) real valued. Thus, I(ω)

More information

Lecture 8: Channel Capacity, Continuous Random Variables

Lecture 8: Channel Capacity, Continuous Random Variables EE376A/STATS376A Information Theory Lecture 8-02/0/208 Lecture 8: Channel Capacity, Continuous Random Variables Lecturer: Tsachy Weissman Scribe: Augustine Chemparathy, Adithya Ganesh, Philip Hwang Channel

More information

6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011

6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011 6196 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 57, NO. 9, SEPTEMBER 2011 On the Structure of Real-Time Encoding and Decoding Functions in a Multiterminal Communication System Ashutosh Nayyar, Student

More information

Secret Key and Private Key Constructions for Simple Multiterminal Source Models

Secret Key and Private Key Constructions for Simple Multiterminal Source Models Secret Key and Private Key Constructions for Simple Multiterminal Source Models arxiv:cs/05050v [csit] 3 Nov 005 Chunxuan Ye Department of Electrical and Computer Engineering and Institute for Systems

More information

Coding for Discrete Source

Coding for Discrete Source EGR 544 Communication Theory 3. Coding for Discrete Sources Z. Aliyazicioglu Electrical and Computer Engineering Department Cal Poly Pomona Coding for Discrete Source Coding Represent source data effectively

More information

Source Coding. Master Universitario en Ingeniería de Telecomunicación. I. Santamaría Universidad de Cantabria

Source Coding. Master Universitario en Ingeniería de Telecomunicación. I. Santamaría Universidad de Cantabria Source Coding Master Universitario en Ingeniería de Telecomunicación I. Santamaría Universidad de Cantabria Contents Introduction Asymptotic Equipartition Property Optimal Codes (Huffman Coding) Universal

More information