Chapter 6. Continuous Sources and Channels. Po-Ning Chen, Professor. Department of Communications Engineering. National Chiao Tung University


1 Chapter 6 Continuous Sources and Channels Po-Ning Chen, Professor Department of Communications Engineering National Chiao Tung University Hsin Chu, Taiwan 300, R.O.C.

2 Continuous Sources and Channels I: 6-1  Model: $\{X_t \in \mathcal{X},\ t \in \mathcal{I}\}$. Discrete sources: both $\mathcal{X}$ and $\mathcal{I}$ are discrete. Continuous sources: discrete-time continuous sources, where $\mathcal{X}$ is continuous and $\mathcal{I}$ is discrete; waveform sources, where both $\mathcal{X}$ and $\mathcal{I}$ are continuous.

3 Information Content of Continuous Sources I: 6-2  A straightforward extension of entropy from discrete sources to continuous sources. Entropy: $H(X) = -\sum_{x\in\mathcal{X}} P_X(x)\log P_X(x)$ nats.
Example 6.1 (extension of entropy to continuous sources) Given a source $X$ with source alphabet $[0,1)$ and uniform generic distribution, we can make it discrete by quantizing it into $m$ levels as $q_m(X) = i/m$ if $(i-1)/m \le X < i/m$, for $1 \le i \le m$. Then the resultant entropy of the quantized source is $H(q_m(X)) = -\sum_{i=1}^{m}\frac{1}{m}\log\frac{1}{m} = \log m$ nats. Since the entropy $H(q_m(X))$ of the quantized source is a lower bound to the entropy $H(X)$ of the uniformly distributed continuous source, $H(X) \ge \lim_{m\to\infty} H(q_m(X)) = \infty$.

4 Information Content of Continuous Sources I: 6-3  A quantization extension of entropy seems ineffective for continuous sources, because their entropies are all infinite.
Proof: For any continuous source $X$, there must exist a non-empty open interval on which the cumulative distribution function $F_X(\cdot)$ is strictly increasing. Now quantize the source into $m+1$ levels as follows: assign one level to the complement of this open interval, and assign $m$ levels to this open interval such that the probability mass on this interval, denoted by $a$, is equally distributed over these $m$ levels. (This is similar to equally partitioning the concerned domain of $F_X^{-1}(\cdot)$.) Then $H(X) \ge H(X') = -(1-a)\log(1-a) - a\log\frac{a}{m}$, where $X'$ represents the quantized version of $X$. The lower bound goes to infinity as $m$ tends to infinity.

5 Differential Entropy I: 6-4  Alternative extension of entropy to continuous sources (from its formula).
Definition 6.2 (differential entropy) The differential entropy (in nats) of a continuous source with generic probability density function (pdf) $p_X$ is defined as $h(X) = -\int_{\mathcal{X}} p_X(x)\log p_X(x)\,dx$.
The next example demonstrates the difference (in quantity) between the entropy and the differential entropy.
Example 6.3 A continuous source $X$ with source alphabet $[0,1)$ and pdf $f(x)=2x$ has differential entropy equal to $-\int_0^1 2x\log(2x)\,dx = -\big[x^2(\log(2x)-\tfrac12)\big]_0^1 = \tfrac12 - \log 2$ nats.
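
A minimal numerical check of Example 6.3 (my addition, not part of the slides), assuming NumPy and SciPy are available; it integrates $-f\log f$ directly and compares with the closed form $\tfrac12-\log 2$.

```python
import numpy as np
from scipy.integrate import quad

# f(x) = 2x on (0, 1); h(X) = -∫ f(x) log f(x) dx should equal 1/2 - log(2) nats.
h_numeric, _ = quad(lambda x: -2.0 * x * np.log(2.0 * x), 0.0, 1.0)
h_closed = 0.5 - np.log(2.0)
print(h_numeric, h_closed)   # both ≈ -0.19315 nats (differential entropy can be negative)
```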

6 Examples of Differential Entropy I: 6-5  Note that the differential entropy, unlike the entropy, can be negative.
Example 6.4 (differential entropy of continuous sources with uniform generic distribution) A continuous source $X$ with uniform generic distribution over $(a,b)$ has differential entropy $h(X) = \log(b-a)$ nats.
Example 6.5 (differential entropy of Gaussian sources) A continuous source $X$ with Gaussian generic distribution of mean $\mu$ and variance $\sigma^2$ has differential entropy
$h(X) = \int_{\mathbb{R}}\phi(x)\Big[\tfrac12\log(2\pi\sigma^2) + \frac{(x-\mu)^2}{2\sigma^2}\Big]dx = \tfrac12\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}E[(X-\mu)^2] = \tfrac12\log(2\pi\sigma^2) + \tfrac12 = \tfrac12\log(2\pi\sigma^2 e)$ nats,
where $\phi(x)$ is the pdf of the Gaussian distribution with mean $\mu$ and variance $\sigma^2$.

7 Properties of Differential Entropy I: 6-6  The extension of the AEP theorem from discrete cases to continuous cases is based not on number counting (which is always infinite for continuous sources), but on volume measuring.
Theorem 6.6 (AEP for continuous sources) Let $X_1,\ldots,X_n$ be a sequence of sources drawn i.i.d. according to the density $p_X(\cdot)$. Then $-\frac1n\log p_X(X_1,\ldots,X_n) \to E[-\log p_X(X)] = h(X)$ in probability.
Proof: The proof is an immediate result of the law of large numbers.
Definition 6.7 (typical set) For $\delta>0$ and any given $n$, define the typical set as $\mathcal{F}_n(\delta) = \big\{x^n\in\mathcal{X}^n : \big|-\frac1n\log p_X(x_1,\ldots,x_n) - h(X)\big| < \delta\big\}$.
Definition 6.8 (volume) The volume of a set $A$ is defined as $\mathrm{Vol}(A) = \int_A dx_1\cdots dx_n$.

8 Properties of Differential Entropy I: 6-7  Theorem 6.9 (Shannon-McMillan theorem for continuous sources)
1. For $n$ sufficiently large, $P_{X^n}\{\mathcal{F}_n^c(\delta)\} < \delta$.
2. $\mathrm{Vol}(\mathcal{F}_n(\delta)) \le e^{n(h(X)+\delta)}$ for all $n$.
3. $\mathrm{Vol}(\mathcal{F}_n(\delta)) \ge (1-\delta)\,e^{n(h(X)-\delta)}$ for $n$ sufficiently large.
Proof: The proof is an extension of the Shannon-McMillan theorem for discrete sources, and hence we omit it.
E.g., we can also derive a source coding theorem for continuous sources and obtain that when a continuous source is compressed by quantization, it is beneficial to put most of the quantization effort on its typical set, instead of on the entire source space: assign $m-1$ levels to the elements in $\mathcal{F}_n(\delta)$, and assign one level to the elements outside $\mathcal{F}_n(\delta)$.
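
A quick Monte Carlo illustration of Theorem 6.6 (my addition, under the assumption of an i.i.d. Gaussian source, using NumPy): the normalized log-density of a long sample path concentrates around $h(X)$.

```python
import numpy as np

# For i.i.d. N(0, sigma^2) samples, -(1/n) log p(X_1,...,X_n) should approach
# h(X) = 0.5 * log(2*pi*e*sigma^2) as n grows (AEP for continuous sources).
rng = np.random.default_rng(0)
sigma = 2.0
h_closed = 0.5 * np.log(2 * np.pi * np.e * sigma**2)

for n in (10, 1_000, 100_000):
    x = rng.normal(0.0, sigma, size=n)
    log_pdf = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)
    print(n, -log_pdf.mean(), h_closed)   # empirical value converges to h(X)
```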

9 Properties of Differential Entropy I: 6-8  Remarks: If the differential entropy of a continuous source is larger, it is expected that a larger number of quantization levels is required in order to minimize the distortion introduced by quantization. So we may conclude that continuous sources with higher differential entropy contain more information in volume.
Question: Which continuous source is the richest in information (i.e., has the highest differential entropy and costs the most in quantization)? Answer: Gaussian.

10 Properties of Differential Entropy I: 6-9  Theorem 6.10 (maximal differential entropy of Gaussian source) The Gaussian source has the largest differential entropy among all continuous sources with identical mean and variance.
Proof: Let $p(\cdot)$ be the pdf of a continuous source $X$, and let $\phi(\cdot)$ be the pdf of a Gaussian source $Y$. Assume that these two sources have the same mean $\mu$ and variance $\sigma^2$. Observe that
$-\int_{\mathbb{R}}\phi(y)\log\phi(y)\,dy = \int_{\mathbb{R}}\phi(y)\Big[\tfrac12\log(2\pi\sigma^2) + \frac{(y-\mu)^2}{2\sigma^2}\Big]dy = \tfrac12\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}E[(Y-\mu)^2] = \tfrac12\log(2\pi\sigma^2) + \frac{1}{2\sigma^2}E[(X-\mu)^2] = \int_{\mathbb{R}} p(x)\Big[\tfrac12\log(2\pi\sigma^2) + \frac{(x-\mu)^2}{2\sigma^2}\Big]dx = -\int_{\mathbb{R}} p(x)\log\phi(x)\,dx.$

11 Properties of Differential Entropy I: 6-10  Hence,
$h(Y) - h(X) = -\int_{\mathbb{R}}\phi(y)\log\phi(y)\,dy + \int_{\mathbb{R}} p(x)\log p(x)\,dx = -\int_{\mathbb{R}} p(x)\log\phi(x)\,dx + \int_{\mathbb{R}} p(x)\log p(x)\,dx = \int_{\mathbb{R}} p(x)\log\frac{p(x)}{\phi(x)}\,dx \ge \int_{\mathbb{R}} p(x)\Big(1 - \frac{\phi(x)}{p(x)}\Big)dx$ (fundamental inequality) $= \int_{\mathbb{R}}\big(p(x)-\phi(x)\big)\,dx = 0,$
with equality if, and only if, $p(x) = \phi(x)$ for all $x\in\mathbb{R}$.

12 Properties of Differential Entropy I: 6-11  Properties of differential entropy for continuous sources that are the same as in discrete cases.
Lemma 6.11
1. $h(X|Y) \le h(X)$ with equality if, and only if, $X$ and $Y$ are independent.
2. (chain rule for differential entropy) $h(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} h(X_i\,|\,X_1, X_2, \ldots, X_{i-1})$.
3. $h(X^n) \le \sum_{i=1}^{n} h(X_i)$ with equality if, and only if, $\{X_i\}_{i=1}^{n}$ are independent.

13 Properties of Differential Entropy I: 6-12  There are some properties that are conceptually different in the continuous case from the original ones in the discrete case.
Lemma 6.12 (In discrete cases) For any one-to-one correspondence mapping $f$, $H(f(X)) = H(X)$. (In continuous cases) For the mapping $f(x) = ax$ with some non-zero constant $a$, $h(f(X)) = h(X) + \log|a|$.
Proof: (For continuous cases only) Let $p_X(\cdot)$ and $p_{f(X)}(\cdot)$ be respectively the pdfs of the original source and the mapped source. Then $p_{f(X)}(u) = \frac{1}{|a|}\,p_X\!\big(\frac{u}{a}\big)$. Taking the new pdf into the formula of differential entropy gives the desired result.
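
A tiny sanity check of Lemma 6.12 (my addition), done on the Gaussian closed form rather than by direct integration: if $X\sim\mathcal{N}(0,\sigma^2)$ then $aX\sim\mathcal{N}(0,a^2\sigma^2)$, so the entropy gap should be exactly $\log|a|$.

```python
import numpy as np

def h_gauss(var):
    # differential entropy (nats) of a Gaussian with the given variance
    return 0.5 * np.log(2 * np.pi * np.e * var)

sigma2, a = 1.5, -3.0
print(h_gauss(a**2 * sigma2) - h_gauss(sigma2), np.log(abs(a)))   # both ≈ 1.0986
```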

14 Important Notes on Differential Entropy I: 6-13  The above lemma says that the differential entropy can be increased by a one-to-one correspondence mapping. Therefore, when viewing this quantity as a measure of the information content of continuous sources, we would conclude that information content can be increased by a function mapping, which is somewhat contrary to intuition. For this reason, it may not be appropriate to interpret the differential entropy as an index of the information content of continuous sources. Indeed, it may be better to view this quantity as a measure of quantization efficiency.

15 Important Notes on Differential Entropy I: 6-14  Some researchers interpret the differential entropy as a convenient intermediate formula for calculating the mutual information and divergence of systems with continuous alphabets, which is in general true. Before introducing the formulas of relative entropy and mutual information for continuous settings, we point out beforehand that the interpretation, as well as the operational characteristics, of the mutual information and divergence for systems with continuous alphabets is exactly the same as for systems with discrete alphabets.

16 More Properties on Differential Entropy I: 6-15  Corollary 6.13 For a sequence of continuous sources $X^n = (X_1,\ldots,X_n)$ and a non-singular $n\times n$ matrix $A$, $h(AX^n) = h(X^n) + \log|\det(A)|$, where $\det(A)$ represents the determinant of matrix $A$.
Corollary 6.14 $h(X+c) = h(X)$ and $h(X|Y) = h(X+Y\,|\,Y)$.

17 Operational Meaning of Differential Entropy I: 6-16  Lemma 6.15 Given the pdf $f(x)$ of a continuous source $X$, suppose that $f(x)\log_2 f(x)$ is Riemann-integrable. Then uniformly quantizing the random source to within $n$-bit accuracy, i.e., with quantization width no greater than $2^{-n}$, requires approximately $h(X)+n$ bits (for $n$ large enough).
Proof: Step 1: Mean-value theorem. Let $\Delta = 2^{-n}$ be the width between two adjacent quantization levels. Let $t_i = i\Delta$ for integer $i\in(-\infty,\infty)$. From the mean-value theorem, we can choose $x_i\in[t_{i-1}, t_i]$ such that $\int_{t_{i-1}}^{t_i} f(x)\,dx = f(x_i)(t_i - t_{i-1}) = f(x_i)\Delta$.

18 Operational Meaning of Differential Entropy I: 6-17  Step 2: Definition of $h_\Delta(X)$. Let $h_\Delta(X) = -\sum_{i=-\infty}^{\infty}\big[f(x_i)\log_2 f(x_i)\big]\Delta$. Since $f(x)\log_2 f(x)$ is Riemann-integrable, $h_\Delta(X)\to h(X)$ as $\Delta = 2^{-n}\to 0$. Therefore, given any $\varepsilon>0$, there exists $N$ such that for all $n>N$, $|h(X)-h_\Delta(X)| < \varepsilon$.
Step 3: Computation of $H(X_\Delta)$. The entropy of the quantized source $X_\Delta$ is $H(X_\Delta) = -\sum_{i=-\infty}^{\infty} p_i\log_2 p_i = -\sum_{i=-\infty}^{\infty}\big(f(x_i)\Delta\big)\log_2\big(f(x_i)\Delta\big)$ bits.

19 Operational Meaning of Differential Entropy I: 6-18  Step 4: $H(X_\Delta)$ versus $h_\Delta(X)$. From Steps 2 and 3,
$H(X_\Delta) - h_\Delta(X) = -\sum_{i=-\infty}^{\infty}\big[f(x_i)\Delta\big]\log_2\Delta = (-\log_2\Delta)\sum_{i=-\infty}^{\infty}\int_{t_{i-1}}^{t_i} f(x)\,dx = (-\log_2\Delta)\int_{-\infty}^{\infty} f(x)\,dx = -\log_2\Delta = n.$
Hence, for $n>N$, $[h(X)+n]-\varepsilon < H(X_\Delta) = h_\Delta(X)+n < [h(X)+n]+\varepsilon$.
Remark 1: Since $H(X_\Delta)$ is the minimum average codeword length for lossless data compression, uniformly quantizing a continuous source up to $n$-bit accuracy requires approximately $h(X)+n$ bits.
Remark 2: We may conclude that the larger the differential entropy, the larger the average number of bits required to uniformly quantize the source subject to a fixed accuracy.
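
An empirical check of Lemma 6.15 (my addition, for a Gaussian source with NumPy; the Monte Carlo entropy estimate is approximate): the entropy of the uniformly quantized source should track $h(X)+n$ bits as the step shrinks.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 1.0
samples = rng.normal(0.0, sigma, size=2_000_000)
h_bits = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)   # h(X) in bits

for n in (2, 4, 6):
    delta = 2.0 ** (-n)                                # quantization width 2^-n
    cells = np.floor(samples / delta).astype(np.int64)
    _, counts = np.unique(cells, return_counts=True)
    p = counts / counts.sum()
    H = -(p * np.log2(p)).sum()                        # entropy of quantized source
    print(n, H, h_bits + n)                            # H ≈ h(X) + n bits
```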

20 Operational Meaning of Differential Entropy I: 6-19  This operational meaning of differential entropy can be used to interpret the properties introduced in the previous subsection. For example, $h(X+c) = h(X)$ can be interpreted as: a shift in value does not change the quantization efficiency of the original source.

21 Riemann Integral Versus Lebesgue Integral I: 6-20  Riemann integral: Let $s(x)$ represent a step function on $[a,b)$, defined by the existence of a partition $a = x_0 < x_1 < \cdots < x_n = b$ such that $s(x)$ is constant on each $(x_i, x_{i+1})$ for $0\le i<n$. If a function $f(x)$ is Riemann integrable, then
$\int_a^b f(x)\,dx = \sup_{\{s(x)\,:\,s(x)\le f(x)\}}\int_a^b s(x)\,dx = \inf_{\{s(x)\,:\,s(x)\ge f(x)\}}\int_a^b s(x)\,dx.$
Example of a non-Riemann-integrable function: $f(x)=0$ if $x$ is irrational, $f(x)=1$ if $x$ is rational. Then $\sup_{\{s(x)\,:\,s(x)\le f(x)\}}\int_a^b s(x)\,dx = 0$, but $\inf_{\{s(x)\,:\,s(x)\ge f(x)\}}\int_a^b s(x)\,dx = b-a$.

22 Riemann Integral Versus Lebesgue Integral I: 6-21  Lebesgue integral: Let $t(x)$ represent a simple function, defined as a linear combination of indicator functions of mutually disjoint partitions. For example, let $U_1,\ldots,U_m$ be mutually disjoint partitions of the domain $\mathcal{X}$ with $\bigcup_{i=1}^{m} U_i = \mathcal{X}$. The indicator function of $U_i$ is $\mathbf{1}(x;U_i)=1$ if $x\in U_i$, and $0$ otherwise. Then $t(x) = \sum_{i=1}^{m} a_i\mathbf{1}(x;U_i)$ is a simple function. If a function $f(x)$ is Lebesgue integrable, then
$\int_a^b f(x)\,dx = \sup_{\{t(x)\,:\,t(x)\le f(x)\}}\int_a^b t(x)\,dx = \inf_{\{t(x)\,:\,t(x)\ge f(x)\}}\int_a^b t(x)\,dx.$
The previous example is actually Lebesgue integrable, and its Lebesgue integral is equal to zero.

23 Example of Quantization Efficiency I: 6-22  Example 6.16 Find the minimum average number of bits required to uniformly quantize the decay time (in years) of a radium atom up to 3-digit accuracy, if the half-life of the radium is 80 years.
Note that the half-life of a radium atom is the median of its decay-time distribution $f(x) = \lambda e^{-\lambda x}$ ($x>0$). Since the median is 80, we obtain $\int_0^{80}\lambda e^{-\lambda x}\,dx = 0.5$, which implies $\lambda = \ln(2)/80 \approx 0.00866$. Also, 3-digit accuracy is approximately equivalent to $\log_2 10^3 \approx 10$-bit accuracy. Therefore, the number of bits required to quantize the source is approximately $h(X) + 10 = \log_2\frac{e}{\lambda} + 10 \approx 8.29 + 10 \approx 18.3$ bits.
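
The arithmetic of Example 6.16 worked out in code (my computation from the example's given data; the exact decimal values are not taken from the slide):

```python
import numpy as np

lam = np.log(2) / 80.0            # median of Exp(lambda) is ln(2)/lambda = 80 years
h_bits = np.log2(np.e / lam)      # differential entropy of an exponential source, in bits
n_bits = np.log2(10**3)           # 3-decimal-digit accuracy ≈ 9.97 bits (the slide rounds to 10)
print(lam, h_bits, n_bits, h_bits + n_bits)   # ≈ 0.00866, 8.29, 9.97, 18.3 bits
```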

24 Relative Entropy and Mutual Information I: 6-23  Definition 6.17 (relative entropy) Define the relative entropy between two densities $p_X$ and $p_{\hat X}$ by $D(X\|\hat X) = \int_{\mathcal{X}} p_X(x)\log\frac{p_X(x)}{p_{\hat X}(x)}\,dx$.
Definition 6.18 (mutual information) The mutual information with input-output joint density $p_{X,Y}(x,y)$ is defined as $I(X;Y) = \int_{\mathcal{X}}\int_{\mathcal{Y}} p_{X,Y}(x,y)\log\frac{p_{X,Y}(x,y)}{p_X(x)p_Y(y)}\,dx\,dy$.

25 Relative Entropy and Mutual Information I: 6-24  Unlike the case of entropy, the properties of relative entropy and mutual information in continuous cases are the same as those in discrete cases. In particular, the mutual information of the quantized version of a continuous channel converges to the mutual information of the same continuous channel (as the quantization step size goes to zero). Hence, some researchers prefer to define the mutual information of a continuous channel directly as the limit over quantized channels. Here, we quote some of the properties of relative entropy and mutual information from discrete settings.
Lemma 6.19
1. $D(X\|\hat X) \ge 0$ with equality if, and only if, $p_X = p_{\hat X}$.
2. $I(X;Y) \ge 0$ with equality if, and only if, $X$ and $Y$ are independent.
3. $I(X;Y) = h(Y) - h(Y|X)$.

26 Rate Distortion Theorem Revisited I: 6-25  Since the entropy of continuous sources is infinite, compressing a continuous source without distortion is impossible according to Shannon's source coding theorem. Thus, one way to characterize data compression for continuous sources is to encode the original source subject to a constraint on the distortion, which yields the rate-distortion function for data compression. In concept, the rate-distortion function is the minimum data compression rate (nats required per source letter, or nats required per source sample) for which the distortion constraint is satisfied.

27 Specific Formula of Rate Distortion Theorem I: 6-26  Theorem 6.20 Under the squared-error distortion measure, namely $\rho(z,\hat z) = (z-\hat z)^2$, the rate-distortion function for a continuous source $Z$ with zero mean and variance $\sigma^2$ satisfies
$R(D) \le \begin{cases}\frac12\log\frac{\sigma^2}{D}, & 0\le D\le\sigma^2;\\ 0, & D>\sigma^2,\end{cases}$
with equality when $Z$ is Gaussian.
Proof: By Theorem 5.16 (extended to the squared-error distortion measure), $R(D) = \min_{\{p_{\hat Z|Z}\,:\,E[(Z-\hat Z)^2]\le D\}} I(Z;\hat Z)$. So for any $p_{\hat Z|Z}$ satisfying the distortion constraint, $R(D) \le I(p_Z, p_{\hat Z|Z})$.
For $0\le D\le\sigma^2$, choose a dummy Gaussian random variable $W$ with zero mean and variance $aD$, where $a = 1 - D/\sigma^2$, independent of $Z$. Let $\hat Z = aZ + W$. Then $E[(Z-\hat Z)^2] = E[(1-a)^2 Z^2] + E[W^2] = (1-a)^2\sigma^2 + aD = D,$

28 Specific Formula of Rate Distortion Theorem I: 6-27  which satisfies the distortion constraint. Note that the variance of $\hat Z$ is equal to $E[a^2Z^2] + E[W^2] = a^2\sigma^2 + aD = \sigma^2 - D$. Consequently,
$R(D) \le I(Z;\hat Z) = h(\hat Z) - h(\hat Z|Z) = h(\hat Z) - h(W + aZ\,|\,Z) = h(\hat Z) - h(W|Z)$ (by Corollary 6.14) $= h(\hat Z) - h(W)$ (by Lemma 6.11) $= h(\hat Z) - \tfrac12\log(2\pi e(aD)) \le \tfrac12\log(2\pi e(\sigma^2-D)) - \tfrac12\log(2\pi e(aD)) = \tfrac12\log\frac{\sigma^2}{D}.$
For $D>\sigma^2$, let $\hat Z$ satisfy $\Pr\{\hat Z = 0\} = 1$ and be independent of $Z$. Then $E[(Z-\hat Z)^2] = E[Z^2] + E[\hat Z^2] - 2E[Z]E[\hat Z] = \sigma^2 < D$, and $I(Z;\hat Z) = 0$. Hence, $R(D) = 0$ for $D>\sigma^2$.
The achievability of the upper bound by the Gaussian source will be proved in Theorem 6.22.
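
A small sketch (my addition) evaluating the Gaussian rate-distortion formula, i.e. the bound of Theorem 6.20 that is met with equality by a Gaussian source (Theorem 6.22):

```python
import numpy as np

def rate_distortion_gaussian(D, sigma2):
    # R(D) in nats per source sample: 0.5*log(sigma^2/D) for D < sigma^2, else 0
    return 0.5 * np.log(sigma2 / D) if D < sigma2 else 0.0

sigma2 = 1.0
for D in (0.1, 0.5, 1.0, 2.0):
    print(D, rate_distortion_gaussian(D, sigma2))
```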

29 Rate Distortion Function for Binary Source I: 6-28  Theorem 6.21 Fix a memoryless binary source $Z = \{Z^n = (Z_1, Z_2, \ldots, Z_n)\}_{n=1}^{\infty}$ with marginal distribution $P_Z(0) = 1 - P_Z(1) = p$. Assume that the Hamming additive distortion measure is employed. Then the rate-distortion function is
$R(D) = \begin{cases} H_b(p) - H_b(D), & 0\le D\le\min\{p, 1-p\};\\ 0, & D>\min\{p, 1-p\},\end{cases}$
where $H_b(p) = -p\log p - (1-p)\log(1-p)$ is the binary entropy function.
Proof: Assume without loss of generality that $p\le 1/2$. We first prove the theorem for $0\le D<\min\{p,1-p\} = p$. Observe that for any binary random variable $\hat Z$, $H(Z|\hat Z) = H(Z\oplus\hat Z\,|\,\hat Z)$. Also observe that $E[\rho(Z,\hat Z)]\le D$ implies $\Pr\{Z\oplus\hat Z = 1\}\le D$.

30 Rate Distortion Function for Binary Source I: 6-29  Then
$I(Z;\hat Z) = H(Z) - H(Z|\hat Z) = H_b(p) - H(Z\oplus\hat Z\,|\,\hat Z) \ge H_b(p) - H(Z\oplus\hat Z)$ (conditioning never increases entropy) $\ge H_b(p) - H_b(D),$
where the last inequality follows since $H_b(x)$ is increasing for $x\le 1/2$ and $\Pr\{Z\oplus\hat Z = 1\}\le D$. Since the above derivation is true for any $P_{\hat Z|Z}$, we have $R(D) \ge H_b(p) - H_b(D)$.
It remains to show that the lower bound is achievable by some $P_{\hat Z|Z}$, or equivalently, $H(Z|\hat Z) = H_b(D)$ for some $P_{\hat Z|Z}$. By defining $P_{Z|\hat Z}(0|0) = P_{Z|\hat Z}(1|1) = 1-D$, we immediately obtain $H(Z|\hat Z) = H_b(D)$. The desired $P_{\hat Z|Z}$ can be obtained by solving
$1 = P_{\hat Z}(0) + P_{\hat Z}(1) = \frac{P_Z(0)}{P_{Z|\hat Z}(0|0)}P_{\hat Z|Z}(0|0) + \frac{P_Z(0)}{P_{Z|\hat Z}(0|1)}P_{\hat Z|Z}(1|0) = \frac{p}{1-D}\,P_{\hat Z|Z}(0|0) + \frac{p}{D}\big(1 - P_{\hat Z|Z}(0|0)\big)$

31 Rate Distortion Function for Binary Source I: 6-30  and
$1 = P_{\hat Z}(0) + P_{\hat Z}(1) = \frac{P_Z(1)}{P_{Z|\hat Z}(1|0)}P_{\hat Z|Z}(0|1) + \frac{P_Z(1)}{P_{Z|\hat Z}(1|1)}P_{\hat Z|Z}(1|1) = \frac{1-p}{D}\big(1 - P_{\hat Z|Z}(1|1)\big) + \frac{1-p}{1-D}\,P_{\hat Z|Z}(1|1),$
which yield
$P_{\hat Z|Z}(0|0) = \frac{1-D}{1-2D}\Big(1 - \frac{D}{p}\Big) \quad\text{and}\quad P_{\hat Z|Z}(1|1) = \frac{1-D}{1-2D}\Big(1 - \frac{D}{1-p}\Big).$
Now in the case $p\le D<1-p$, we can let $P_{\hat Z|Z}(1|0) = P_{\hat Z|Z}(1|1) = 1$ to obtain $I(Z;\hat Z) = 0$ and $E[\rho(Z,\hat Z)] = \sum_{z=0}^{1}\sum_{\hat z=0}^{1} P_Z(z)P_{\hat Z|Z}(\hat z|z)\rho(z,\hat z) = p \le D$.
Similarly, in the case $D\ge 1-p$, we let $P_{\hat Z|Z}(0|0) = P_{\hat Z|Z}(0|1) = 1$ to obtain $I(Z;\hat Z) = 0$ and $E[\rho(Z,\hat Z)] = \sum_{z=0}^{1}\sum_{\hat z=0}^{1} P_Z(z)P_{\hat Z|Z}(\hat z|z)\rho(z,\hat z) = 1-p \le D$.

32 Rate Distortion Function for Binary Source I: 6-31  Remark: The Hamming additive distortion measure is defined as $\rho_n(z^n, \hat z^n) = \sum_{i=1}^{n} z_i\oplus\hat z_i$, where $\oplus$ denotes modulo-two addition. In such a case, $\rho_n(z^n,\hat z^n)$ is exactly the number of bit changes (bit errors) after compression.
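
A short sketch (my addition) of the binary rate-distortion function of Theorem 6.21, in nats; the helper names are mine.

```python
import numpy as np

def Hb(x):
    # binary entropy function in nats, with a small clip to avoid log(0)
    x = np.clip(x, 1e-12, 1 - 1e-12)
    return -x * np.log(x) - (1 - x) * np.log(1 - x)

def rate_distortion_binary(D, p):
    # R(D) = Hb(p) - Hb(D) for D <= min{p, 1-p}, and 0 otherwise
    return max(0.0, Hb(p) - Hb(D)) if D <= min(p, 1 - p) else 0.0

p = 0.3
for D in (0.0, 0.05, 0.1, 0.3, 0.4):
    print(D, rate_distortion_binary(D, p))   # nats per source letter
```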

33 Rate Distortion Function for Gaussian Sources I: 6-32  Theorem 6.22 Fix a memoryless source $Z = \{Z^n = (Z_1, Z_2, \ldots, Z_n)\}_{n=1}^{\infty}$ with zero-mean Gaussian marginal distribution of variance $\sigma^2$. Assume that the squared-error distortion measure is employed. Then the rate-distortion function is given by
$R(D) = \begin{cases}\frac12\log\frac{\sigma^2}{D}, & 0\le D\le\sigma^2;\\ 0, & D>\sigma^2.\end{cases}$
Proof: From Theorem 6.20, it suffices to show that for the Gaussian source, $(1/2)\log(\sigma^2/D)$ is a lower bound to $R(D)$ for $0\le D\le\sigma^2$.

34 Rate Distortion Function for Gaussian Sources I: 6-33  This can be proved as follows. For a Gaussian source $Z$ with $E[(Z-\hat Z)^2]\le D$,
$I(Z;\hat Z) = h(Z) - h(Z|\hat Z) = \tfrac12\log(2\pi e\sigma^2) - h(Z-\hat Z\,|\,\hat Z)$ (Corollary 6.14) $\ge \tfrac12\log(2\pi e\sigma^2) - h(Z-\hat Z)$ (Lemma 6.11) $\ge \tfrac12\log(2\pi e\sigma^2) - \tfrac12\log\big(2\pi e\,\mathrm{Var}[Z-\hat Z]\big)$ (Theorem 6.10) $\ge \tfrac12\log(2\pi e\sigma^2) - \tfrac12\log\big(2\pi e\,E[(Z-\hat Z)^2]\big) \ge \tfrac12\log(2\pi e\sigma^2) - \tfrac12\log(2\pi e D) = \tfrac12\log\frac{\sigma^2}{D}.$

35 Channel Coding Theorem for Continuous Sources I: 6-34  Power constraint on channel input: deriving the channel capacity of a memoryless continuous channel without any constraint on the inputs is somewhat impractical, especially when the input can be any number on the infinite real line. Such a constraint is usually of the form $E[t(X)]\le S$, or $\frac1n\sum_{i=1}^{n} E[t(X_i)]\le S$ for a sequence of random inputs, where $t(\cdot)$ is a non-negative cost function.
Example 6.23 (average power constraint) $t(x) = x^2$, i.e., the constraint is that the average input power is bounded above by $S$.

36 Channel Coding Theorem for Continuous Sources I: 6-35  As extended from discrete cases, the channel capacity of a discrete-time continuous channel with an input cost constraint is of the form
$C(S) = \max_{\{p_X\,:\,E[t(X)]\le S\}} I(X;Y). \qquad (6.3.1)$
Claim: $C(S)$ is a concave function of $S$.

37 Channel Coding Theorem for Continuous Sources I: 6-36  Lemma 6.24 (concavity of the capacity-cost function) $C(S)$ is concave, continuous, and strictly increasing in $S$.
Proof: Let $P_{X_1}$ and $P_{X_2}$ be two distributions that respectively achieve $C(P_1)$ and $C(P_2)$. Denote $P_{X_\lambda} = \lambda P_{X_1} + (1-\lambda)P_{X_2}$. Then
$C(\lambda P_1 + (1-\lambda)P_2) = \max_{\{P_X\,:\,E[t(X)]\le\lambda P_1+(1-\lambda)P_2\}} I(P_X, P_{Y|X}) \ge I(P_{X_\lambda}, P_{Y|X}) \ge \lambda I(P_{X_1}, P_{Y|X}) + (1-\lambda)I(P_{X_2}, P_{Y|X}) = \lambda C(P_1) + (1-\lambda)C(P_2),$
where the first inequality holds since
$E_{X_\lambda}[t(X)] = \int_{\mathbb{R}} t(x)\,dP_{X_\lambda}(x) = \lambda\int_{\mathbb{R}} t(x)\,dP_{X_1}(x) + (1-\lambda)\int_{\mathbb{R}} t(x)\,dP_{X_2}(x) = \lambda E_{X_1}[t(X)] + (1-\lambda)E_{X_2}[t(X)] \le \lambda P_1 + (1-\lambda)P_2,$
and the second inequality follows from the concavity of mutual information with respect to the first argument. Accordingly, $C(S)$ is concave in $S$.

38 Channel Coding Theorem for Continuous Sources I: 6-37  Furthermore, it can easily be seen from the definition that $C(S)$ is non-decreasing, which, together with its concavity, implies that it is continuous and strictly increasing.

39 Channel Coding Theorem for Continuous Sources I: 6-38 Although the capacity-cost function formula in (6.3.1) is valid for general cost function t( ), we only substantiate it under the average power constraint in the next forward channel coding theorem. Its validity for a more general case can be similarly proved based on the same concept.

40 Channel Coding Theorem for Continuous Sources I: 6-39  Theorem 6.25 (forward channel coding theorem for continuous channels under average power constraint) For any $\varepsilon\in(0,1)$, there exist $0<\gamma<2\varepsilon$ and a data transmission code sequence $\{\mathcal{C}_n = (n, M_n)\}_{n=1}^{\infty}$ satisfying $\frac1n\log M_n > C(S) - \gamma$ and, for each codeword $c = (c_1, c_2, \ldots, c_n)$,
$\frac1n\sum_{i=1}^{n} c_i^2 \le S, \qquad (6.3.2)$
such that the probability of decoding error $P_e(\mathcal{C}_n)$ is less than $\varepsilon$ for sufficiently large $n$, where $C(S) = \max_{\{p_X\,:\,E[X^2]\le S\}} I(X;Y)$.

41 Channel Coding Theorem for Continuous Sources I: 6-40  Proof: The theorem holds trivially when $C(S)=0$, because we can choose $M_n = 1$ for every $n$, which yields $P_e(\mathcal{C}_n)=0$. Hence, assume without loss of generality that $C(S)>0$.
Step 0: Take a positive $\gamma$ satisfying $\gamma < \min\{2\varepsilon, C(S)\}$. Pick $\xi>0$ small enough such that $2[C(S) - C(S-\xi)] < \gamma$, where the existence of such $\xi$ is assured by the strict increase of $C(S)$. Hence, we have $C(S-\xi) - \gamma/2 > C(S) - \gamma > 0$. Choose $M_n$ to satisfy $C(S-\xi) - \frac{\gamma}{2} > \frac1n\log M_n > C(S) - \gamma$, for which the choice should exist for all sufficiently large $n$. Take $\delta = \gamma/8$. Let $P_X$ be the distribution that achieves $C(S-\xi)$; hence, $E[X^2]\le S-\xi$ and $I(X;Y) = C(S-\xi)$.

42 Channel Coding Theorem for Continuous Sources I: 6-41  Step 1: Random coding with average power constraint. Randomly draw $M_n - 1$ codewords according to the distribution $P_{X^n}$ with $P_{X^n}(x^n) = \prod_{i=1}^{n} P_X(x_i)$. By the law of large numbers, each randomly selected codeword $c_m = (c_{m1},\ldots,c_{mn})$ satisfies $\lim_{n\to\infty}\frac1n\sum_{i=1}^{n} c_{mi}^2 \le S-\xi$ almost surely, for $m = 1, 2, \ldots, M_n - 1$.

43 Channel Coding Theorem for Continuous Sources I: 6-42  Step 2: Coder. For the $M_n$ selected codewords $\{c_1,\ldots,c_{M_n}\}$, replace the codewords that violate the power constraint (i.e., (6.3.2)) by the all-zero codeword $\mathbf{0}$. Define the encoder as $f_n(m) = c_m$ for $1\le m\le M_n$. When receiving an output sequence $y^n$, the decoder $g_n(\cdot)$ is given by
$g_n(y^n) = \begin{cases} m, & \text{if } (c_m, y^n)\in\mathcal{F}_n(\delta) \text{ and } (\forall\,m'\ne m)\ (c_{m'}, y^n)\notin\mathcal{F}_n(\delta);\\ \text{arbitrary}, & \text{otherwise},\end{cases}$
where $\mathcal{F}_n(\delta) = \big\{(x^n,y^n)\in\mathcal{X}^n\times\mathcal{Y}^n : \big|-\tfrac1n\log p_{X^n,Y^n}(x^n,y^n) - h(X,Y)\big| < \delta,\ \big|-\tfrac1n\log p_{X^n}(x^n) - h(X)\big| < \delta,\ \text{and}\ \big|-\tfrac1n\log p_{Y^n}(y^n) - h(Y)\big| < \delta\big\}$.

44 Channel Coding Theorem for Continuous Sources I: 6-43  Step 3: Probability of error. Let $\lambda_m$ denote the error probability given that codeword $m$ is transmitted. Define $\mathcal{E}_0 = \big\{x^n\in\mathcal{X}^n : \frac1n\sum_{i=1}^{n} x_i^2 > S\big\}$. Then, by following a similar argument as (4.3.2) (discrete cases), we get
$E[\lambda_m] \le P_{X^n}(\mathcal{E}_0) + P_{X^n,Y^n}(\mathcal{F}_n^c(\delta)) + \sum_{\substack{m'=1\\ m'\ne m}}^{M_n}\int_{\mathcal{X}^n}\int_{\mathcal{F}_n(\delta|c_{m'})} p_{X^n}(c_{m'})\,p_{Y^n}(y^n)\,dy^n\,dc_{m'},$
where $\mathcal{F}_n(\delta|x^n) = \{y^n\in\mathcal{Y}^n : (x^n,y^n)\in\mathcal{F}_n(\delta)\}$.
Note that the additional term $P_{X^n}(\mathcal{E}_0)$ (relative to the discrete case) copes with the errors due to the all-zero codeword replacement; it will be less than $\delta$ for all sufficiently large $n$ by the law of large numbers.

45 Channel Coding Theorem for Continuous Sources I: 6-44  Finally, by carrying out a procedure similar to that in the proof of the capacity of discrete channels, we obtain
$E[P_e(\mathcal{C}_n)] \le P_{X^n}(\mathcal{E}_0) + P_{X^n,Y^n}(\mathcal{F}_n^c(\delta)) + M_n\,e^{n(h(X,Y)+\delta)}\,e^{-n(h(X)-\delta)}\,e^{-n(h(Y)-\delta)} \le P_{X^n}(\mathcal{E}_0) + P_{X^n,Y^n}(\mathcal{F}_n^c(\delta)) + e^{n(C(S-\xi)-4\delta)}\,e^{-n(I(X;Y)-3\delta)} = P_{X^n}(\mathcal{E}_0) + P_{X^n,Y^n}(\mathcal{F}_n^c(\delta)) + e^{-n\delta}.$
Accordingly, we can make the average probability of error, namely $E[P_e(\mathcal{C}_n)]$, less than $3\delta = 3\gamma/8 < 3\varepsilon/4 < \varepsilon$ for all sufficiently large $n$.

46 Channel Coding Theorem for Continuous Sources I: 6-45  Theorem 6.26 (converse channel coding theorem for continuous channels) For any data transmission code sequence $\{\mathcal{C}_n = (n, M_n)\}_{n=1}^{\infty}$ satisfying the power constraint, if the ultimate data transmission rate satisfies $\liminf_{n\to\infty}\frac1n\log M_n > C(S)$, then its probability of decoding error is bounded away from zero for all $n$ sufficiently large.
Proof: For an $(n, M_n)$ block data transmission code, an encoding function is chosen as $f_n : \{1,2,\ldots,M_n\}\to\mathcal{X}^n$, and each index $i$ is equally likely under the average probability of block decoding error criterion. Hence, we can assume that the information message is generated by a random variable $W$ uniformly distributed over $\{1,2,\ldots,M_n\}$. As a result, $H(W) = \log M_n$. Since $W\to X^n\to Y^n$ forms a Markov chain ($Y^n$ depends only on $X^n$), we obtain by the data processing lemma that $I(W;Y^n)\le I(X^n;Y^n)$.

47 Channel Coding Theorem for Continuous Sources I: 6-46  We can also bound $I(X^n;Y^n)$ by $C(S)$ as follows:
$I(X^n;Y^n) \le \max_{\{P_{X^n}\,:\,(1/n)\sum_{i=1}^{n}E[X_i^2]\le S\}} I(X^n;Y^n) \le \max_{\{P_{X^n}\,:\,(1/n)\sum_{i=1}^{n}E[X_i^2]\le S\}}\ \sum_{j=1}^{n} I(X_j;Y_j) = \max_{\{(P_1,\ldots,P_n)\,:\,(1/n)\sum_{i=1}^{n}P_i=S\}}\ \max_{\{P_{X^n}\,:\,(\forall i)\,E[X_i^2]\le P_i\}}\ \sum_{j=1}^{n} I(X_j;Y_j) \le \max_{\{(P_1,\ldots,P_n)\,:\,(1/n)\sum_{i=1}^{n}P_i=S\}}\ \sum_{j=1}^{n}\ \max_{\{P_{X_j}\,:\,E[X_j^2]\le P_j\}} I(X_j;Y_j) = \max_{\{(P_1,\ldots,P_n)\,:\,(1/n)\sum_{i=1}^{n}P_i=S\}}\ \sum_{j=1}^{n} C(P_j) = \max_{\{(P_1,\ldots,P_n)\,:\,(1/n)\sum_{i=1}^{n}P_i=S\}}\ n\sum_{j=1}^{n}\frac1n C(P_j)$

48 Channel Coding Theorem for Continuous Sources I: 6-47  $\le \max_{\{(P_1,\ldots,P_n)\,:\,(1/n)\sum_{i=1}^{n}P_i=S\}} n\,C\!\Big(\frac1n\sum_{j=1}^{n} P_j\Big)$ (by concavity of $C(S)$) $= nC(S)$.
Consequently, by defining $P_e(\mathcal{C}_n)$ as the error of guessing $W$ from $Y^n$ via a decoding function $g_n : \mathcal{Y}^n\to\{1,2,\ldots,M_n\}$, which is exactly the average block decoding failure, we get
$\log M_n = H(W) = H(W|Y^n) + I(W;Y^n) \le H(W|Y^n) + I(X^n;Y^n) \le H_b(P_e(\mathcal{C}_n)) + P_e(\mathcal{C}_n)\log(|\mathcal{W}|-1) + nC(S)$ (by Fano's inequality) $\le \log(2) + P_e(\mathcal{C}_n)\log(M_n-1) + nC(S)$ (by the fact that $(\forall\,t\in[0,1])\ H_b(t)\le\log(2)$) $\le \log(2) + P_e(\mathcal{C}_n)\log M_n + nC(S),$

49 Channel Coding Theorem for Continuous Sources I: 6-48  which implies that
$P_e(\mathcal{C}_n) \ge 1 - \frac{C(S)}{(1/n)\log M_n} - \frac{\log(2)}{\log M_n}.$
So if $\liminf_{n\to\infty}(1/n)\log M_n > C(S)$, then there exist $\delta$ with $0<\delta<4\varepsilon$ and an integer $N$ such that for $n\ge N$, $\frac1n\log M_n > C(S)+\delta$. Hence, for $n\ge N_0 = \max\{N, 2\log(2)/\delta\}$,
$P_e(\mathcal{C}_n) \ge 1 - \frac{C(S)}{C(S)+\delta} - \frac{\log(2)}{n(C(S)+\delta)} \ge \frac{\delta}{2(C(S)+\delta)}.$
Remarks: This is a weak converse statement. Since the capacity is now a function of the cost constraint, it is named the capacity-cost function. Next, we will derive the capacity-cost function of some frequently used channel models.

50 Memoryless Additive Gaussian Channels I: 6-49  Definition 6.27 (memoryless additive channel) Let $X_1,\ldots,X_n$ and $Y_1,\ldots,Y_n$ be the input and output sequences of the channel, and let $N_1,\ldots,N_n$ be the noise. Then a memoryless additive channel is defined by $Y_i = X_i + N_i$ for each $i$, where $\{(X_i, Y_i, N_i)\}_{i=1}^{n}$ are i.i.d. and $X_i$ is independent of $N_i$.
Definition 6.28 (memoryless additive Gaussian channel) A memoryless additive channel is called a memoryless additive Gaussian channel if the noise is a Gaussian random variable.
Theorem 6.29 (capacity of the memoryless additive Gaussian channel under average power constraint) The capacity of a memoryless additive Gaussian channel with noise model $\mathcal{N}(0,\sigma^2)$ and average power constraint $S$ is equal to $C(S) = \frac12\log\big(1 + \frac{S}{\sigma^2}\big)$ nats/channel symbol.

51 Memoryless Additive Gaussian Channels I: 6-50  Proof: By definition,
$C(S) = \max_{\{p_X\,:\,E[X^2]\le S\}} I(X;Y) = \max_{\{p_X\,:\,E[X^2]\le S\}}\big(h(Y)-h(Y|X)\big) = \max_{\{p_X\,:\,E[X^2]\le S\}}\big(h(Y)-h(N+X\,|\,X)\big) = \max_{\{p_X\,:\,E[X^2]\le S\}}\big(h(Y)-h(N|X)\big) = \max_{\{p_X\,:\,E[X^2]\le S\}}\big(h(Y)-h(N)\big) = \Big(\max_{\{p_X\,:\,E[X^2]\le S\}} h(Y)\Big) - h(N),$
where $N$ represents the additive Gaussian noise. We thus need to find an input distribution satisfying $E[X^2]\le S$ that maximizes the differential entropy of $Y$. Recall that the differential entropy subject to a mean and variance constraint is maximized by a Gaussian random variable; also, the differential entropy of a Gaussian random variable with variance $\sigma^2$ is $(1/2)\log(2\pi\sigma^2 e)$ nats and has nothing to do with its mean.

52 Memoryless Additive Gaussian Channels I: 6-51  Therefore, by taking $X$ to be a Gaussian random variable with distribution $\mathcal{N}(0,S)$, $Y$ achieves its largest variance $S+\sigma^2$ under the constraint $E[X^2]\le S$. Consequently, $C(S) = \frac12\log(2\pi e(S+\sigma^2)) - \frac12\log(2\pi e\sigma^2) = \frac12\log\big(1+\frac{S}{\sigma^2}\big)$ nats/channel symbol.
Theorem 6.30 (worseness in capacity of Gaussian noise) For all memoryless additive discrete-time continuous channels whose noise has zero mean and variance $\sigma^2$, the capacity subject to the average power constraint is lower bounded by $\frac12\log\big(1+\frac{S}{\sigma^2}\big)$, which is the capacity of the memoryless additive discrete-time Gaussian channel. (This means that Gaussian noise is the worst kind of noise in the sense of channel capacity.)

53 Memoryless Additive Gaussian Channels I: 6-52  Proof: Let $p_{Y_g|X_g}$ and $p_{Y|X}$ denote the transition probabilities of the Gaussian channel and of some other channel satisfying the cost constraint, respectively. Let $N_g$ and $N$ respectively denote their noises. Then for any Gaussian input $p_{X_g}$,
$I(p_{X_g}, p_{Y|X}) - I(p_{X_g}, p_{Y_g|X_g}) = \int_{\mathbb{R}}\int_{\mathbb{R}} p_{X_g}(x)p_N(y-x)\log\frac{p_N(y-x)}{p_Y(y)}\,dy\,dx - \int_{\mathbb{R}}\int_{\mathbb{R}} p_{X_g}(x)p_{N_g}(y-x)\log\frac{p_{N_g}(y-x)}{p_{Y_g}(y)}\,dy\,dx$
$= \int_{\mathbb{R}}\int_{\mathbb{R}} p_{X_g}(x)p_N(y-x)\log\frac{p_N(y-x)}{p_Y(y)}\,dy\,dx - \int_{\mathbb{R}}\int_{\mathbb{R}} p_{X_g}(x)p_N(y-x)\log\frac{p_{N_g}(y-x)}{p_{Y_g}(y)}\,dy\,dx$
(the replacement of $p_{N_g}$ by $p_N$ inside the second expectation is justified because $\log p_{N_g}(\cdot)$ and $\log p_{Y_g}(\cdot)$ are quadratic, and $N$, $N_g$ — hence $Y$, $Y_g$ — share the same first and second moments)
$= \int_{\mathbb{R}}\int_{\mathbb{R}} p_{X_g}(x)p_N(y-x)\log\frac{p_N(y-x)\,p_{Y_g}(y)}{p_{N_g}(y-x)\,p_Y(y)}\,dy\,dx \ge \int_{\mathbb{R}}\int_{\mathbb{R}} p_{X_g}(x)p_N(y-x)\Big(1 - \frac{p_{N_g}(y-x)\,p_Y(y)}{p_N(y-x)\,p_{Y_g}(y)}\Big)dy\,dx = 1 - \int_{\mathbb{R}}\frac{p_Y(y)}{p_{Y_g}(y)}\Big(\int_{\mathbb{R}} p_{X_g}(x)p_{N_g}(y-x)\,dx\Big)dy = 0,$

54 Memoryless Additive Gaussian Channels I: 6-53  with equality if, and only if, $\frac{p_Y(y)}{p_{Y_g}(y)} = \frac{p_N(y-x)}{p_{N_g}(y-x)}$ for all $x$. Therefore,
$\max_{\{p_X\,:\,E[X^2]\le S\}} I(p_X, p_{Y_g|X_g}) = I(p_{X_g}, p_{Y_g|X_g}) \le I(p_{X_g}, p_{Y|X}) \le \max_{\{p_X\,:\,E[X^2]\le S\}} I(p_X, p_{Y|X}).$

55 Uncorrelated Parallel Gaussian Channels I: 6-54  Water-pouring scheme: in concept, more channel input power should be placed on those channels with smaller noise power. The water-pouring scheme not only substantiates this intuition but also gives it a quantitative meaning.
[Figure: water-pouring over four channels with noise powers $\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2$; the total power $S = S_1 + S_2 + S_4$ is poured up to a common water level, so the noisiest channel ($\sigma_3^2$) receives no power.]

56 Uncorrelated Parallel Gaussian Channels I: 6-55  Theorem 6.31 (capacity of parallel additive Gaussian channels) The capacity of $k$ parallel additive Gaussian channels under an overall input power constraint $S$ is
$C(S) = \sum_{i=1}^{k}\frac12\log\Big(1 + \frac{S_i}{\sigma_i^2}\Big),$
where $\sigma_i^2$ is the noise variance of channel $i$, $S_i = \max\{0, \theta - \sigma_i^2\}$, and $\theta$ is chosen to satisfy $\sum_{i=1}^{k} S_i = S$. This capacity is achieved by a set of independent Gaussian inputs with zero mean and variances $S_i$.

57 Uncorrelated Parallel Gaussian Channels I: 6-56  Proof: By definition, $C(S) = \max_{\{p_{X^k}\,:\,\sum_{i=1}^{k}E[X_i^2]\le S\}} I(X^k;Y^k)$. Since the noises $N_1,\ldots,N_k$ are independent,
$I(X^k;Y^k) = h(Y^k) - h(Y^k|X^k) = h(Y^k) - h(N^k+X^k\,|\,X^k) = h(Y^k) - h(N^k|X^k) = h(Y^k) - h(N^k) = h(Y^k) - \sum_{i=1}^{k} h(N_i) \le \sum_{i=1}^{k} h(Y_i) - \sum_{i=1}^{k} h(N_i) = \sum_{i=1}^{k} I(X_i;Y_i) \le \sum_{i=1}^{k}\frac12\log\Big(1 + \frac{S_i}{\sigma_i^2}\Big),$
with equality if each input is a Gaussian random variable with zero mean

58 Uncorrelated Parallel Gaussian Channels I: 6-57  and variance $S_i$, and the inputs are independent, where $S_i$ is the individual power constraint applied to channel $i$ with $\sum_{i=1}^{k} S_i = S$. So the problem is reduced to finding the power allotment that maximizes the capacity subject to the constraint $\sum_{i=1}^{k} S_i = S$. By using the Lagrange multiplier technique, the maximizer of
$\max\Big\{\sum_{i=1}^{k}\frac12\log\Big(1+\frac{S_i}{\sigma_i^2}\Big) + \lambda\Big(\sum_{i=1}^{k} S_i - S\Big)\Big\}$
can be found by taking the derivative (w.r.t. $S_i$) of the above expression and setting it to zero, which yields $\frac12\cdot\frac{1}{S_i+\sigma_i^2} + \lambda = 0$ if $S_i>0$, and $\frac12\cdot\frac{1}{S_i+\sigma_i^2} + \lambda \le 0$ if $S_i=0$. Hence, $S_i = \theta - \sigma_i^2$ if $S_i>0$, and $0 = S_i \ge \theta - \sigma_i^2$ if $S_i=0$, where $\theta = -1/(2\lambda)$.
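
A sketch of the water-filling allocation of Theorem 6.31 (my addition); it finds the water level $\theta$ by bisection rather than through the Lagrangian form used in the proof, and the function names are mine.

```python
import numpy as np

def water_filling(noise_vars, S, tol=1e-12):
    # Find theta such that sum_i max(0, theta - sigma_i^2) = S, then return S_i.
    lo, hi = 0.0, max(noise_vars) + S
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        if sum(max(0.0, theta - v) for v in noise_vars) > S:
            hi = theta
        else:
            lo = theta
    theta = 0.5 * (lo + hi)
    return [max(0.0, theta - v) for v in noise_vars]

noise_vars = [1.0, 2.0, 4.0, 8.0]
S = 5.0
powers = water_filling(noise_vars, S)
capacity = sum(0.5 * np.log(1 + p / v) for p, v in zip(powers, noise_vars))
print(powers, sum(powers), capacity)   # powers sum to S; capacity in nats per channel use
```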

59 Rate Distortion for Parallel Gaussian Sources I: 6-58  A theorem on the rate-distortion function parallel to that on the capacity-cost function can also be established.
Theorem 6.32 (rate distortion for parallel Gaussian sources) Given $k$ mutually independent Gaussian sources with variances $\sigma_1^2,\ldots,\sigma_k^2$, the overall rate-distortion function for the additive squared-error distortion, namely $\sum_{i=1}^{k} E[(Z_i-\hat Z_i)^2] \le D$, is given by
$R(D) = \sum_{i=1}^{k}\frac12\log\frac{\sigma_i^2}{D_i},$
where $D_i = \min\{\theta, \sigma_i^2\}$ and $\theta$ is chosen to satisfy $\sum_{i=1}^{k} D_i = D$.

60 Rate Distortion for Parallel Gaussian Sources I: 6-59  [Figure: the water-pouring scheme for lossy data compression of parallel Gaussian sources. Water of total amount $D$ is poured over bars of heights $\sigma_1^2, \sigma_2^2, \sigma_3^2, \sigma_4^2$ up to a common level $\theta$; here $D = D_1 + D_2 + D_3 + D_4 = \sigma_1^2 + \sigma_2^2 + \theta + \sigma_4^2$.]
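
A companion sketch (my addition) of the reverse water-filling of Theorem 6.32, again using bisection on the level $\theta$; the function names are mine.

```python
import numpy as np

def reverse_water_filling(source_vars, D, tol=1e-12):
    # Find theta such that sum_i min(theta, sigma_i^2) = D, then D_i = min(theta, sigma_i^2).
    lo, hi = 0.0, max(source_vars)
    while hi - lo > tol:
        theta = 0.5 * (lo + hi)
        if sum(min(theta, v) for v in source_vars) > D:
            hi = theta
        else:
            lo = theta
    theta = 0.5 * (lo + hi)
    return [min(theta, v) for v in source_vars]

source_vars = [1.0, 2.0, 4.0]
D = 2.0
Ds = reverse_water_filling(source_vars, D)
R = sum(0.5 * np.log(v / d) for v, d in zip(source_vars, Ds))
print(Ds, sum(Ds), R)   # distortions sum to D; overall rate in nats
```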

61 Correlated Parallel Additive Gaussian Channels I: 6-60  Theorem 6.33 (capacity of correlated parallel additive Gaussian channels) The capacity of $k$ parallel additive Gaussian channels under an overall input power constraint $S$ is
$C(S) = \sum_{i=1}^{k}\frac12\log\Big(1 + \frac{S_i}{\lambda_i}\Big),$
where $\lambda_i$ is the $i$-th eigenvalue of the positive-definite noise covariance matrix $K_N$, $S_i = \max\{0, \theta - \lambda_i\}$, and $\theta$ is chosen to satisfy $\sum_{i=1}^{k} S_i = S$. This capacity is achieved by a set of independent Gaussian inputs with zero mean and variances $S_i$.
A matrix $K_{k\times k}$ is positive definite if for every $(x_1,\ldots,x_k)$, $[x_1,\ldots,x_k]\,K\,[x_1,\ldots,x_k]^t \ge 0$, with equality only when $x_i = 0$ for all $1\le i\le k$. The matrix $\Lambda$ in the decomposition of a positive-definite matrix $K$, i.e., $K = Q\Lambda Q^t$, is a diagonal matrix with non-zero diagonal components.

62 Correlated Parallel Additive Gaussian Channels I: 6-61  Proof: Let $K_X$ be the covariance matrix of the input $X_1,\ldots,X_k$. The input power constraint then becomes $\sum_{i=1}^{k} E[X_i^2] = \mathrm{tr}(K_X) \le S$, where $\mathrm{tr}(\cdot)$ represents the trace of the $k\times k$ matrix $K_X$. Assume that the input is independent of the noise. Then
$I(X^k;Y^k) = h(Y^k) - h(Y^k|X^k) = h(Y^k) - h(N^k+X^k\,|\,X^k) = h(Y^k) - h(N^k|X^k) = h(Y^k) - h(N^k).$
Since $h(N^k)$ is not determined by the input, the capacity-finding problem reduces to maximizing $h(Y^k)$ over all possible inputs satisfying the power constraint. Now observe that the covariance matrix of $Y^k$ is $K_Y = K_X + K_N$, which implies that the differential entropy of $Y^k$ is upper bounded by $h(Y^k) \le \frac12\log\big((2\pi e)^k\,|K_X+K_N|\big)$. It remains to find the $K_X$ (if possible) under which the above upper bound is achieved, and this achievable upper bound is maximized.

63 Correlated Parallel Additive Gaussian Channels I: 6-62  Decompose $K_N$ into its diagonal form $K_N = Q\Lambda Q^t$, where the superscript $t$ represents matrix transposition, $QQ^t = I_{k\times k}$, and $I_{k\times k}$ represents the identity matrix of order $k$. Note that since $K_N$ is positive definite, $\Lambda$ is a diagonal matrix with positive diagonal components equal to the eigenvalues of $K_N$. Then
$|K_X + K_N| = |K_X + Q\Lambda Q^t| = |Q|\,|Q^t K_X Q + \Lambda|\,|Q^t| = |Q^t K_X Q + \Lambda| = |A + \Lambda|,$
where $A = Q^t K_X Q$. Since $\mathrm{tr}(A) = \mathrm{tr}(K_X)$, the problem is further transformed to maximizing $|A+\Lambda|$ subject to $\mathrm{tr}(A) \le S$.

64 Correlated Parallel Additive Gaussian Channels I: 6-63  Lemma (Hadamard's inequality) Any positive definite $k\times k$ matrix $K$ satisfies $|K| \le \prod_{i=1}^{k} K_{ii}$, where $K_{ii}$ is the component of matrix $K$ located at the $i$-th row and $i$-th column. Equality holds if, and only if, the matrix is diagonal.
By observing that $A+\Lambda$ is positive definite (because $\Lambda$ is positive definite), together with the above lemma, we have
$|A+\Lambda| \le \prod_{i=1}^{k}(A_{ii} + \lambda_i),$
where $\lambda_i$ is the component of matrix $\Lambda$ located at the $i$-th row and $i$-th column, which is exactly the $i$-th eigenvalue of $K_N$. Thus, the maximum value of $|A+\Lambda|$ under $\mathrm{tr}(A)\le S$ is achieved by a diagonal $A$ with $\sum_{i=1}^{k} A_{ii} = S$.

65 Correlated Parallel Additive Gaussian Channels I: 6-64  Finally, we can adopt the Lagrange multiplier technique as used previously to obtain $A_{ii} = \max\{0, \theta - \lambda_i\}$, where $\theta$ is chosen to satisfy $\sum_{i=1}^{k} A_{ii} = S$.
Waveform channels will be discussed next!

66 Bandlimited Waveform Channels with AWGN I: 6-65  A common model for communication over a radio network or a telephone line is a band-limited channel with white noise. This is a continuous-time channel, modeled as $Y_t = (X_t + N_t) * h(t)$, where $*$ represents the convolution operation, $X_t$ is the waveform source, $Y_t$ is the waveform output, $N_t$ is the white noise, and $h(t)$ is the band-limited filter.
[Figure: the channel adds the noise $N_t$ to $X_t$ and passes the sum through the filter $H(f)$ to produce $Y_t$.]

67 Bandlimited Waveform Channels with AWGN I: 6-66  Coding over waveform channels: for a fixed interval $[0,T)$, select $M$ different functions (a waveform codebook) $c_1(t), c_2(t), \ldots, c_M(t)$, one for each information message. Based on the received function $y(t)$ for $t\in[0,T)$ at the channel output, a decision on the input message is made.
Sampling theorem: sampling a band-limited signal at intervals of $1/(2W)$ (i.e., at rate $2W$) is sufficient to reconstruct the signal from its samples, if $W$ is the bandwidth of the signal.
Conventional remark: based on the sampling theorem, one can sample the filtered waveform signal $c_m(t)$ and the filtered waveform noise $N_t$, and reconstruct them distortionlessly with sampling frequency $2W$ (from the receiver's $Y_t$ point of view).

68 Bandlimited Waveform Channels with AWGN I: 6-67  My remarks: Assume for the moment that the channel is noiseless. The input waveform codeword $c_j(t)$ is time-limited to $[0,T)$; so it cannot be band-limited. However, the receiver can only observe a band-limited version $\tilde c_j(t)$ of $c_j(t)$ due to the ideal band-limited filter $h(t)$. Notably, $\tilde c_j(t)$ is no longer time-limited. By the sampling theorem, the band-limited but time-unlimited $\tilde c_j(t)$ can be distortionlessly reconstructed from its (possibly infinitely many) samples taken $1/(2W)$ apart.
Implicit system constraint: yet, the receiver can only use those samples within time $[0,T)$ to guess what the transmitter originally sent out. Notably, these $2WT$ samples may not reconstruct $\tilde c_j(t)$ without distortion. As a result, the waveform codewords $\{c_j(t)\}_{j=1}^{M}$ are chosen such that their residual signals $\{\tilde c_j(t)\}_{j=1}^{M}$, after experiencing the ideal lowpass filter $h(t)$ and the implicit $2WT$-sample constraint, are more resistant to noise.

69 Bandlimited Waveform Channels with AWGN I: 6-68  What we learn from these remarks: The time-limited waveform $c_j(t)$ passes through an ideal lowpass filter and is sampled, and only $2WT$ samples survive at the receiver end without noise. $\tilde X_t$ (not $X_t$) is the true residual signal that survives at the receiver end without noise. The power constraint in the capacity-cost function is actually applied to $\tilde X_t$ (the true signal that can be reconstructed from the $2WT$ samples seen at the receiver end), rather than to the transmitted signal $X_t$. Indeed, the signal-to-noise ratio of concern in most communication problems is the ratio of the signal power surviving at the receiver end to the noise power experienced by this received signal. Do not misunderstand the signal power in this ratio as the transmitted power at the transmitter end.

70 Bandlimited Waveform Channels with AWGN I: 6-69  The noise process: how about the band-limited AWGN $\tilde N_t = N_t * h(t)$? $\tilde N_t$ is no longer white. However, with the right sampling rate, the samples are still uncorrelated. Consequently, the noises experienced by the $2WT$ signal samples are still independent Gaussian distributed (cf. the next slide).
$Y_t = (X_t + N_t) * h(t) = \tilde X_t + \tilde N_t.$
$Y_{k/(2W)} = \tilde X_{k/(2W)} + \tilde N_{k/(2W)}$ for $0\le k < 2WT$.

71 Bandlimited Waveform Channels with AWGN I: 6-70
$E[\tilde N_{i/(2W)}\tilde N_{k/(2W)}] = E\Big[\Big(\int_{\mathbb{R}} h(\tau)N_{i/(2W)-\tau}\,d\tau\Big)\Big(\int_{\mathbb{R}} h(\tau')N_{k/(2W)-\tau'}\,d\tau'\Big)\Big] = \int_{\mathbb{R}}\int_{\mathbb{R}} h(\tau)h(\tau')\,E\big[N_{i/(2W)-\tau}N_{k/(2W)-\tau'}\big]\,d\tau\,d\tau' = \int_{\mathbb{R}}\int_{\mathbb{R}} h(\tau)h(\tau')\frac{N_0}{2}\delta\!\Big(\frac{i}{2W}-\frac{k}{2W}-\tau+\tau'\Big)d\tau\,d\tau' = \frac{N_0}{2}\int_{\mathbb{R}} h(\tau)\,h\big(\tau-(i-k)/(2W)\big)\,d\tau$
$\Big(\text{with }\int|H(f)|^2\,df = 1,\ \text{i.e., } H(f) = \tfrac{1}{\sqrt{2W}}\text{ on }[-W,W]\Big)$
$= \frac{N_0}{2}\int_{\mathbb{R}}\Big(\int_{-W}^{W}\frac{1}{\sqrt{2W}}e^{j2\pi f\tau}\,df\Big)\Big(\int_{-W}^{W}\frac{1}{\sqrt{2W}}e^{j2\pi f'(\tau-(i-k)/(2W))}\,df'\Big)d\tau = \frac{N_0}{4W}\int_{-W}^{W}\int_{-W}^{W}\Big(\int_{\mathbb{R}} e^{j2\pi(f+f')\tau}\,d\tau\Big)e^{-j2\pi f'(i-k)/(2W)}\,df\,df' = \frac{N_0}{4W}\int_{-W}^{W}\int_{-W}^{W}\delta(f+f')\,e^{-j2\pi f'(i-k)/(2W)}\,df\,df' = \frac{N_0}{4W}\int_{-W}^{W} e^{j2\pi f(i-k)/(2W)}\,df = \frac{N_0}{2}\cdot\frac{\sin\big(2\pi W(i-k)/(2W)\big)}{2\pi W(i-k)/(2W)} = \frac{N_0}{2}\cdot\frac{\sin(\pi(i-k))}{\pi(i-k)}$ (with the right sampling rate, $W$ is cancelled out) $= \begin{cases}N_0/2, & i=k;\\ 0, & i\ne k.\end{cases}$

72 Bandlimited Waveform Channels with AWGN I: 6-71  Hence, the capacity-cost function of this channel subject to input waveform duration $T$ (and the implicit system constraint) is equal to
$C_T(S) = \max_{\{p_{\tilde X^{2WT}}\,:\,\sum_{i=0}^{2WT-1} E[\tilde X_{i/(2W)}^2]\le S\}} I(\tilde X^{2WT}; Y^{2WT}) = \sum_{i=0}^{2WT-1}\frac12\log\Big(1+\frac{S_i}{\sigma_i^2}\Big) = \sum_{i=0}^{2WT-1}\frac12\log\Big(1+\frac{S/(2WT)}{N_0/2}\Big) = WT\log\Big(1+\frac{S}{WTN_0}\Big),$
where the input samples $\tilde X^{2WT}$ that achieve the capacity are also i.i.d. Gaussian distributed. It can be proved, similarly to the argument that white $N_t$ yields i.i.d. $\tilde N^{2WT}$, that a white Gaussian process $X_t$ renders i.i.d. Gaussian distributed filtered samples $\tilde X^{2WT}$, if the right sampling rate is employed.

73 Bandlimited Waveform Channels with AWGN I: 6-72  Remark on notation: $C_T(S)$ denotes the capacity-cost function subject to input waveform duration $T$. This notation is specifically used in waveform channels. $C_T(S)$ should not be confused with the notation $C(S)$, which represents the capacity-cost function of a discrete-time channel, where the channel input is transmitted only at each sampled time instance, and hence no duration $T$ is involved. One can, however, measure $C(S)$ in units of bits per sample period to relate the quantity to the usual unit of data transmission speed, such as bits per second. To obtain the maximum reliable data transmission speed over a waveform channel in units of bits per second, we need to calculate $C_T(S)/T$.

74 Bandlimited Waveform Channels with AWGN I: 6-73  Example 6.34 (telephone line channel) Suppose telephone signals are bandlimited to 4 kHz. Given a signal-to-noise ratio (SNR) of 20 dB (namely, $S/(WTN_0) = 100$) and $T = 1$ millisecond, the capacity of the bandlimited Gaussian waveform channel is equal to
$C_T(S) = WT\log\Big(1+\frac{S}{WTN_0}\Big) = 4000\times 10^{-3}\times\log(1+100) \approx 18.46 \text{ nats}.$
Therefore, the maximum reliable transmission speed is $\frac{C_T(S)}{T} \approx 1.85\times 10^4$ nats per second $\approx 2.66\times 10^4$ bits per second.
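
The numbers of Example 6.34 worked out in code (my computation; the ratio inside the logarithm is taken to be 100, i.e. 20 dB, and the specific decimal values are not from the slide):

```python
import numpy as np

W, T, snr = 4000.0, 1e-3, 100.0          # bandwidth (Hz), block duration (s), SNR = 20 dB
C_T = W * T * np.log(1 + snr)            # nats per block of duration T
print(C_T)                               # ≈ 18.46 nats
print(C_T / T)                           # ≈ 1.85e4 nats per second
print(C_T / T / np.log(2))               # ≈ 2.66e4 bits per second
```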

75 Bandlimited Waveform Channels with AWGN I: 6-74  Remarks: Special attention should be paid to the fact that the capacity-cost formula used above is based on a considerably simplified channel model, i.e., the noise is additive white Gaussian. The capacity formula from such a simplified model only provides a lower bound to the true channel capacity, since AWGN is the worst noise in the sense of capacity. So the true channel capacity is possibly higher than the quantity obtained in the above example!

76 Band-Unlimited Waveform Channels with AWGN I: 6-75  Taking $W\to\infty$ in the above formula, we obtain that the channel capacity of the Gaussian waveform channel with infinite bandwidth is $C_T(S) \to \frac{S}{N_0}$ (nats per $T$ unit time). The capacity grows linearly with the input power.

77 Filtered Waveform Stationary Gaussian Channels I: 6-76  [Figure: in the filtered waveform stationary Gaussian channel, $X_t$ passes through the filter $H(f)$ and the noise $N_t$ is added at the filter output to give $Y_t$.]
The filtered waveform stationary Gaussian channel can be transformed into an equivalent channel in which a noise $\bar N_t$ is added to $X_t$ before the filter $H(f)$, where the power spectral density $\mathrm{PSD}_{\bar N}(f)$ of $\bar N_t$ satisfies $\mathrm{PSD}_{\bar N}(f) = \frac{\mathrm{PSD}_N(f)}{|H(f)|^2}$, and $\mathrm{PSD}_N(f)$ is the power spectral density of $N_t$.

78 Filtered Waveform Stationary Gaussian Channels I: 6-77  Notes: We cannot use the sampling theorem in this case (as we did previously) because $\bar N_t$ is no longer assumed white; so the noise samples are not i.i.d. Also, $h(t)$ is not necessarily band-limited.
Lemma 6.35 Any real-valued function $v(t)$ defined over $[0,T)$ can be decomposed into $v(t) = \sum_{i=1}^{\infty} v_i\Psi_i(t)$, where the real-valued functions $\{\Psi_i(t)\}$ are any orthonormal set of functions on $[0,T)$ (which can span the space with respect to $v(t)$), namely $\int_0^T\Psi_i(t)\Psi_j(t)\,dt = \begin{cases}1, & i=j;\\ 0, & i\ne j,\end{cases}$ and $v_i = \int_0^T\Psi_i(t)v(t)\,dt$.

79 Filtered Waveform Stationary Gaussian Channels I: 6-78 Examples Orthonormal set is not unique. Sampling theorem employs sinc function as the orthonormal set to span a bandlimited signal. Fourier expansion is another example of orthonormal expansion. The above two orthonormal sets do not suit here, because their resultant coefficients are not necessarily i.i.d. We therefore have to introduce the Karhunen-Loeve expansion.

80 Filtered Waveform Stationary Gaussian Channels I: 6-79  Lemma 6.36 (Karhunen-Loeve expansion) Given a stationary random process $V_t$ and its autocorrelation function $\phi_V(t) = E[V_\tau V_{\tau+t}]$, let $\{\Psi_i(t)\}_{i=1}^{\infty}$ and $\{\lambda_i\}_{i=1}^{\infty}$ be the eigenfunctions and eigenvalues of $\phi_V(t)$, namely $\int_0^T\phi_V(t-s)\Psi_i(s)\,ds = \lambda_i\Psi_i(t)$ for $0\le t\le T$. Then the expansion coefficients $\{\Lambda_i\}_{i=1}^{\infty}$ of $V_t$ with respect to the orthonormal functions $\{\Psi_i(t)\}_{i=1}^{\infty}$ are uncorrelated. In addition, if $V_t$ is Gaussian, then $\{\Lambda_i\}_{i=1}^{\infty}$ are independent Gaussian random variables.

81 Filtered Waveform Stationary Gaussian Channels I: 6-80  Proof:
$E[\Lambda_i\Lambda_j] = E\Big[\int_0^T\Psi_i(t)V_t\,dt\int_0^T\Psi_j(s)V_s\,ds\Big] = \int_0^T\int_0^T\Psi_i(t)\Psi_j(s)E[V_tV_s]\,dt\,ds = \int_0^T\int_0^T\Psi_i(t)\Psi_j(s)\phi_V(t-s)\,dt\,ds = \int_0^T\Psi_i(t)\Big(\int_0^T\Psi_j(s)\phi_V(t-s)\,ds\Big)dt = \int_0^T\Psi_i(t)\big(\lambda_j\Psi_j(t)\big)\,dt = \begin{cases}\lambda_i, & i=j;\\ 0, & i\ne j.\end{cases}$

82 Filtered Waveform Stationary Gaussian Channels I: 6-81  Theorem 6.37 Given a filtered Gaussian waveform channel with noise spectral density $\mathrm{PSD}_N(f)$ and filter $H(f)$,
$C(S) = \lim_{T\to\infty}\frac{C_T(S)}{T} = \frac12\int_{-\infty}^{\infty}\max\Big[0,\ \log\frac{\theta}{\mathrm{PSD}_N(f)/|H(f)|^2}\Big]df,$
where $\theta$ is the solution of $S = \int_{-\infty}^{\infty}\max\Big[0,\ \theta-\frac{\mathrm{PSD}_N(f)}{|H(f)|^2}\Big]df$.
Remark: This also follows the water-pouring scheme.

83 Filtered Waveform Stationary Gaussian Channels I: 6-82  [Figure: (a) the water-pouring scheme, where the curve is $\mathrm{PSD}_N(f)/|H(f)|^2$; (b) the input spectral density that achieves capacity.]

84 Filtered Waveform Stationary Gaussian Channels I: 6-83  Proof: Let the eigenfunctions in the Karhunen-Loeve expansion of the autocorrelation function of $\bar N_t$ be denoted by $\{\Psi_i(t)\}_{i=1}^{\infty}$. Abusing notation without ambiguity, we let $\bar N_i$ and $X_i$ be the Karhunen-Loeve coefficients of $\bar N_t$ and $X_t$ with respect to $\Psi_i(t)$. Since $\{\bar N_i\}_{i=1}^{\infty}$ are independent Gaussian distributed, we obtain that the channel capacity subject to input waveform duration $T$ is
$C_T(S) = \sum_{i=1}^{\infty}\frac12\log\Big(1+\frac{\max(0,\theta-\lambda_i)}{\lambda_i}\Big) = \sum_{i=1}^{\infty}\frac12\log\Big[\max\Big(1,\frac{\theta}{\lambda_i}\Big)\Big] = \sum_{i=1}^{\infty}\frac12\max\Big[0,\ \log\frac{\theta}{\lambda_i}\Big],$
where $\theta$ is the solution of $S = \sum_{i=1}^{\infty}\max[0,\ \theta-\lambda_i]$, and $E[\bar N_i^2] = \lambda_i$ is the $i$-th eigenvalue of the autocorrelation function of $\bar N_t$ (corresponding to eigenfunction $\Psi_i(t)$).

85 Filtered Waveform Stationary Gaussian Channels I: 6-84  The proof is then completed by applying the Toeplitz distribution theorem.
[Toeplitz distribution theorem] Consider a zero-mean stationary random process $V_t$ with power spectral density $\mathrm{PSD}_V(f)$ and $\int_{-\infty}^{\infty}\mathrm{PSD}_V(f)\,df < \infty$. Denote by $\lambda_1(T), \lambda_2(T), \lambda_3(T), \ldots$ the eigenvalues of the Karhunen-Loeve expansion corresponding to the autocorrelation function of $V_t$ over a time interval of width $T$. Then for any real-valued continuous function $a(\cdot)$ satisfying $|a(t)| \le Kt$ for $0\le t\le\max_{f\in\mathbb{R}}\mathrm{PSD}_V(f)$ and some finite constant $K$,
$\lim_{T\to\infty}\frac1T\sum_{i=1}^{\infty} a(\lambda_i(T)) = \int_{-\infty}^{\infty} a\big(\mathrm{PSD}_V(f)\big)\,df.$

86 Information Transmission Theorem I: 6-85  Theorem 6.38 (joint source-channel coding theorem) Fix a distortion measure. A DMS can be reproduced at the output of a channel with distortion less than $D$ (by taking sufficiently large blocklength) if
$\frac{R(D)\ \text{nats/source letter}}{T_s\ \text{seconds/source letter}} < \frac{C(S)\ \text{nats/channel usage}}{T_c\ \text{seconds/channel usage}},$
where $T_s$ and $T_c$ represent the durations per source letter and per channel input, respectively. Note that the units of $R(D)$ and $C(S)$ should be the same, i.e., they should both be measured in nats (by taking the natural logarithm) or both in bits (by taking the base-2 logarithm).
Theorem 6.39 (joint source-channel coding converse) All data transmission codes will have average distortion larger than $D$ for sufficiently large blocklength if $\frac{R(D)}{T_s} > \frac{C(S)}{T_c}$.

87 Information Transmission Theorem I: 6-86  Example 6.40 (additive white Gaussian noise (AWGN) channel with binary channel input) The source is discrete-time binary memoryless with uniform marginal. The discrete-time continuous channel has binary input alphabet and real-line output alphabet with Gaussian transition probability. Denote by $P_b$ the probability of bit error (i.e., the Hamming distortion measure is adopted).

88 Information Transmission Theorem I: 6-87  The rate-distortion function for the binary uniform source and the Hamming additive distortion measure is
$R(D) = \begin{cases}\log(2) - H_b(D), & 0\le D\le\frac12;\\ 0, & D>\frac12.\end{cases}$
Due to Butman and McEliece, the channel capacity-cost function for the binary-input AWGN channel is
$C(S) = \frac{S}{\sigma^2} - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2}\log\Big[\cosh\Big(\frac{S}{\sigma^2} + y\sqrt{\frac{S}{\sigma^2}}\Big)\Big]dy = \frac{E_bT_c/T_s}{N_0/2} - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2}\log\Big[\cosh\Big(\frac{E_bT_c/T_s}{N_0/2} + y\sqrt{\frac{E_bT_c/T_s}{N_0/2}}\Big)\Big]dy = 2R\gamma_b - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2}\log\big[\cosh\big(2R\gamma_b + y\sqrt{2R\gamma_b}\big)\big]dy,$
where $R = T_c/T_s$ is the code rate for data transmission, measured in source letters/channel usage (or information bits/channel bit), and $\gamma_b$ (often denoted by $E_b/N_0$) is the signal-to-noise ratio per information bit.

89 Information Transmission Theorem I: 6-88  Then from the joint source-channel coding theorem, good codes exist when $R(D) < \frac{T_s}{T_c}C(S)$, or equivalently,
$\log(2) - H_b(P_b) < 2\gamma_b - \frac{1}{R\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2}\log\big[\cosh\big(2R\gamma_b + y\sqrt{2R\gamma_b}\big)\big]dy.$

90 Information Transmission Theorem I: 6-89  By re-formulating the above inequality as
$H_b(P_b) > \log(2) - 2\gamma_b + \frac{1}{R\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-y^2/2}\log\big[\cosh\big(2R\gamma_b + y\sqrt{2R\gamma_b}\big)\big]dy,$
a lower bound on the bit error probability as a function of $\gamma_b$ is established. This is the Shannon limit for any code over the binary-input Gaussian channel.
[Figure: the Shannon limits for $(2,1)$ ($R=1/2$) and $(3,1)$ ($R=1/3$) codes under the binary-input AWGN channel, plotting $P_b$ (from 1 down to $10^{-5}$) against $\gamma_b$ in dB.]
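
A numerical sketch of the Shannon-limit bound above (my addition, using SciPy's `quad` and `brentq`; the function names are mine). It evaluates the capacity integral of the slides and inverts $H_b$ to get the smallest admissible $P_b$ for a given rate $R$ and $\gamma_b$.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def capacity_biawgn(R, gamma_b):
    # C(S) = 2*R*gamma_b - (1/sqrt(2*pi)) * Int e^{-y^2/2} log cosh(2*R*gamma_b + y*sqrt(2*R*gamma_b)) dy
    a = 2 * R * gamma_b
    integrand = lambda y: np.exp(-y**2 / 2) * np.log(np.cosh(a + y * np.sqrt(a)))
    integral, _ = quad(integrand, -30, 30)
    return a - integral / np.sqrt(2 * np.pi)          # nats per channel use

def Hb(p):
    return -p * np.log(p) - (1 - p) * np.log(1 - p)    # binary entropy in nats

def pb_lower_bound(R, gamma_b):
    # Smallest Pb satisfying Hb(Pb) >= log(2) - C(S)/R; zero if the RHS is non-positive.
    rhs = np.log(2) - capacity_biawgn(R, gamma_b) / R
    if rhs <= 0:
        return 0.0
    return brentq(lambda p: Hb(p) - rhs, 1e-12, 0.5)

R = 0.5
for gamma_b_db in (-1.0, -0.5, 0.0, 0.5):
    gamma_b = 10 ** (gamma_b_db / 10)
    print(gamma_b_db, pb_lower_bound(R, gamma_b))      # the bound vanishes above ~0.19 dB
```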

91 Information Transmission Theorem I: 6-90  The result in the above example became important with the invention of Turbo coding, for which a near-Shannon-limit performance was first obtained. This implies that a near-optimal channel code has been constructed, since in principle no code can perform better than the Shannon limit.

92 Information Transmission Theorem I: 6-91  Example 6.41 (AWGN channel with real-number input) The source is discrete-time binary memoryless with uniform marginal. The discrete-time continuous channel has real-line input alphabet and real-line output alphabet with Gaussian transition probability. Denote by $P_b$ the probability of bit error (i.e., the Hamming distortion measure is adopted).


More information

Capacity of a channel Shannon s second theorem. Information Theory 1/33

Capacity of a channel Shannon s second theorem. Information Theory 1/33 Capacity of a channel Shannon s second theorem Information Theory 1/33 Outline 1. Memoryless channels, examples ; 2. Capacity ; 3. Symmetric channels ; 4. Channel Coding ; 5. Shannon s second theorem,

More information

Lecture 2. Capacity of the Gaussian channel

Lecture 2. Capacity of the Gaussian channel Spring, 207 5237S, Wireless Communications II 2. Lecture 2 Capacity of the Gaussian channel Review on basic concepts in inf. theory ( Cover&Thomas: Elements of Inf. Theory, Tse&Viswanath: Appendix B) AWGN

More information

LECTURE 13. Last time: Lecture outline

LECTURE 13. Last time: Lecture outline LECTURE 13 Last time: Strong coding theorem Revisiting channel and codes Bound on probability of error Error exponent Lecture outline Fano s Lemma revisited Fano s inequality for codewords Converse to

More information

Lecture 8: Channel and source-channel coding theorems; BEC & linear codes. 1 Intuitive justification for upper bound on channel capacity

Lecture 8: Channel and source-channel coding theorems; BEC & linear codes. 1 Intuitive justification for upper bound on channel capacity 5-859: Information Theory and Applications in TCS CMU: Spring 23 Lecture 8: Channel and source-channel coding theorems; BEC & linear codes February 7, 23 Lecturer: Venkatesan Guruswami Scribe: Dan Stahlke

More information

Shannon s A Mathematical Theory of Communication

Shannon s A Mathematical Theory of Communication Shannon s A Mathematical Theory of Communication Emre Telatar EPFL Kanpur October 19, 2016 First published in two parts in the July and October 1948 issues of BSTJ. First published in two parts in the

More information

ECE Information theory Final (Fall 2008)

ECE Information theory Final (Fall 2008) ECE 776 - Information theory Final (Fall 2008) Q.1. (1 point) Consider the following bursty transmission scheme for a Gaussian channel with noise power N and average power constraint P (i.e., 1/n X n i=1

More information

Digital Communications

Digital Communications Digital Communications Chapter 9 Digital Communications Through Band-Limited Channels Po-Ning Chen, Professor Institute of Communications Engineering National Chiao-Tung University, Taiwan Digital Communications:

More information

Shannon s noisy-channel theorem

Shannon s noisy-channel theorem Shannon s noisy-channel theorem Information theory Amon Elders Korteweg de Vries Institute for Mathematics University of Amsterdam. Tuesday, 26th of Januari Amon Elders (Korteweg de Vries Institute for

More information

7 The Waveform Channel

7 The Waveform Channel 7 The Waveform Channel The waveform transmitted by the digital demodulator will be corrupted by the channel before it reaches the digital demodulator in the receiver. One important part of the channel

More information

Lecture 4 Noisy Channel Coding

Lecture 4 Noisy Channel Coding Lecture 4 Noisy Channel Coding I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw October 9, 2015 1 / 56 I-Hsiang Wang IT Lecture 4 The Channel Coding Problem

More information

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes Shannon meets Wiener II: On MMSE estimation in successive decoding schemes G. David Forney, Jr. MIT Cambridge, MA 0239 USA forneyd@comcast.net Abstract We continue to discuss why MMSE estimation arises

More information

Lecture 11: Continuous-valued signals and differential entropy

Lecture 11: Continuous-valued signals and differential entropy Lecture 11: Continuous-valued signals and differential entropy Biology 429 Carl Bergstrom September 20, 2008 Sources: Parts of today s lecture follow Chapter 8 from Cover and Thomas (2007). Some components

More information

EE5139R: Problem Set 7 Assigned: 30/09/15, Due: 07/10/15

EE5139R: Problem Set 7 Assigned: 30/09/15, Due: 07/10/15 EE5139R: Problem Set 7 Assigned: 30/09/15, Due: 07/10/15 1. Cascade of Binary Symmetric Channels The conditional probability distribution py x for each of the BSCs may be expressed by the transition probability

More information

Communication Theory II

Communication Theory II Communication Theory II Lecture 15: Information Theory (cont d) Ahmed Elnakib, PhD Assistant Professor, Mansoura University, Egypt March 29 th, 2015 1 Example: Channel Capacity of BSC o Let then: o For

More information

Lecture 22: Final Review

Lecture 22: Final Review Lecture 22: Final Review Nuts and bolts Fundamental questions and limits Tools Practical algorithms Future topics Dr Yao Xie, ECE587, Information Theory, Duke University Basics Dr Yao Xie, ECE587, Information

More information

Electrical and Information Technology. Information Theory. Problems and Solutions. Contents. Problems... 1 Solutions...7

Electrical and Information Technology. Information Theory. Problems and Solutions. Contents. Problems... 1 Solutions...7 Electrical and Information Technology Information Theory Problems and Solutions Contents Problems.......... Solutions...........7 Problems 3. In Problem?? the binomial coefficent was estimated with Stirling

More information

Lecture 8: Channel Capacity, Continuous Random Variables

Lecture 8: Channel Capacity, Continuous Random Variables EE376A/STATS376A Information Theory Lecture 8-02/0/208 Lecture 8: Channel Capacity, Continuous Random Variables Lecturer: Tsachy Weissman Scribe: Augustine Chemparathy, Adithya Ganesh, Philip Hwang Channel

More information

EC2252 COMMUNICATION THEORY UNIT 5 INFORMATION THEORY

EC2252 COMMUNICATION THEORY UNIT 5 INFORMATION THEORY EC2252 COMMUNICATION THEORY UNIT 5 INFORMATION THEORY Discrete Messages and Information Content, Concept of Amount of Information, Average information, Entropy, Information rate, Source coding to increase

More information

EE 4TM4: Digital Communications II. Channel Capacity

EE 4TM4: Digital Communications II. Channel Capacity EE 4TM4: Digital Communications II 1 Channel Capacity I. CHANNEL CODING THEOREM Definition 1: A rater is said to be achievable if there exists a sequence of(2 nr,n) codes such thatlim n P (n) e (C) = 0.

More information

Solutions to Homework Set #4 Differential Entropy and Gaussian Channel

Solutions to Homework Set #4 Differential Entropy and Gaussian Channel Solutions to Homework Set #4 Differential Entropy and Gaussian Channel 1. Differential entropy. Evaluate the differential entropy h(x = f lnf for the following: (a Find the entropy of the exponential density

More information

Lecture 18: Gaussian Channel

Lecture 18: Gaussian Channel Lecture 18: Gaussian Channel Gaussian channel Gaussian channel capacity Dr. Yao Xie, ECE587, Information Theory, Duke University Mona Lisa in AWGN Mona Lisa Noisy Mona Lisa 100 100 200 200 300 300 400

More information

Solutions to Homework Set #3 Channel and Source coding

Solutions to Homework Set #3 Channel and Source coding Solutions to Homework Set #3 Channel and Source coding. Rates (a) Channels coding Rate: Assuming you are sending 4 different messages using usages of a channel. What is the rate (in bits per channel use)

More information

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1 Kraft s inequality An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if N 2 l i 1 Proof: Suppose that we have a tree code. Let l max = max{l 1,...,

More information

ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016

ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 ECE598: Information-theoretic methods in high-dimensional statistics Spring 06 Lecture : Mutual Information Method Lecturer: Yihong Wu Scribe: Jaeho Lee, Mar, 06 Ed. Mar 9 Quick review: Assouad s lemma

More information

EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm

EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm 1. Feedback does not increase the capacity. Consider a channel with feedback. We assume that all the recieved outputs are sent back immediately

More information

Block 2: Introduction to Information Theory

Block 2: Introduction to Information Theory Block 2: Introduction to Information Theory Francisco J. Escribano April 26, 2015 Francisco J. Escribano Block 2: Introduction to Information Theory April 26, 2015 1 / 51 Table of contents 1 Motivation

More information

CHAPTER 3. P (B j A i ) P (B j ) =log 2. j=1

CHAPTER 3. P (B j A i ) P (B j ) =log 2. j=1 CHAPTER 3 Problem 3. : Also : Hence : I(B j ; A i ) = log P (B j A i ) P (B j ) 4 P (B j )= P (B j,a i )= i= 3 P (A i )= P (B j,a i )= j= =log P (B j,a i ) P (B j )P (A i ).3, j=.7, j=.4, j=3.3, i=.7,

More information

Second-Order Asymptotics in Information Theory

Second-Order Asymptotics in Information Theory Second-Order Asymptotics in Information Theory Vincent Y. F. Tan (vtan@nus.edu.sg) Dept. of ECE and Dept. of Mathematics National University of Singapore (NUS) National Taiwan University November 2015

More information

Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information

Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information 204 IEEE International Symposium on Information Theory Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information Omur Ozel, Kaya Tutuncuoglu 2, Sennur Ulukus, and Aylin Yener

More information

ECE Information theory Final

ECE Information theory Final ECE 776 - Information theory Final Q1 (1 point) We would like to compress a Gaussian source with zero mean and variance 1 We consider two strategies In the first, we quantize with a step size so that the

More information

Solutions to Set #2 Data Compression, Huffman code and AEP

Solutions to Set #2 Data Compression, Huffman code and AEP Solutions to Set #2 Data Compression, Huffman code and AEP. Huffman coding. Consider the random variable ( ) x x X = 2 x 3 x 4 x 5 x 6 x 7 0.50 0.26 0. 0.04 0.04 0.03 0.02 (a) Find a binary Huffman code

More information

Upper Bounds on the Capacity of Binary Intermittent Communication

Upper Bounds on the Capacity of Binary Intermittent Communication Upper Bounds on the Capacity of Binary Intermittent Communication Mostafa Khoshnevisan and J. Nicholas Laneman Department of Electrical Engineering University of Notre Dame Notre Dame, Indiana 46556 Email:{mhoshne,

More information

UCSD ECE250 Handout #27 Prof. Young-Han Kim Friday, June 8, Practice Final Examination (Winter 2017)

UCSD ECE250 Handout #27 Prof. Young-Han Kim Friday, June 8, Practice Final Examination (Winter 2017) UCSD ECE250 Handout #27 Prof. Young-Han Kim Friday, June 8, 208 Practice Final Examination (Winter 207) There are 6 problems, each problem with multiple parts. Your answer should be as clear and readable

More information

On the Duality between Multiple-Access Codes and Computation Codes

On the Duality between Multiple-Access Codes and Computation Codes On the Duality between Multiple-Access Codes and Computation Codes Jingge Zhu University of California, Berkeley jingge.zhu@berkeley.edu Sung Hoon Lim KIOST shlim@kiost.ac.kr Michael Gastpar EPFL michael.gastpar@epfl.ch

More information

Lecture 15: Conditional and Joint Typicaility

Lecture 15: Conditional and Joint Typicaility EE376A Information Theory Lecture 1-02/26/2015 Lecture 15: Conditional and Joint Typicaility Lecturer: Kartik Venkat Scribe: Max Zimet, Brian Wai, Sepehr Nezami 1 Notation We always write a sequence of

More information

Binary Transmissions over Additive Gaussian Noise: A Closed-Form Expression for the Channel Capacity 1

Binary Transmissions over Additive Gaussian Noise: A Closed-Form Expression for the Channel Capacity 1 5 Conference on Information Sciences and Systems, The Johns Hopkins University, March 6 8, 5 inary Transmissions over Additive Gaussian Noise: A Closed-Form Expression for the Channel Capacity Ahmed O.

More information

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 Please submit the solutions on Gradescope. EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 1. Optimal codeword lengths. Although the codeword lengths of an optimal variable length code

More information

An introduction to basic information theory. Hampus Wessman

An introduction to basic information theory. Hampus Wessman An introduction to basic information theory Hampus Wessman Abstract We give a short and simple introduction to basic information theory, by stripping away all the non-essentials. Theoretical bounds on

More information

Energy State Amplification in an Energy Harvesting Communication System

Energy State Amplification in an Energy Harvesting Communication System Energy State Amplification in an Energy Harvesting Communication System Omur Ozel Sennur Ulukus Department of Electrical and Computer Engineering University of Maryland College Park, MD 20742 omur@umd.edu

More information

Coding for Discrete Source

Coding for Discrete Source EGR 544 Communication Theory 3. Coding for Discrete Sources Z. Aliyazicioglu Electrical and Computer Engineering Department Cal Poly Pomona Coding for Discrete Source Coding Represent source data effectively

More information

Lecture 6 Channel Coding over Continuous Channels

Lecture 6 Channel Coding over Continuous Channels Lecture 6 Channel Coding over Continuous Channels I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw November 9, 015 1 / 59 I-Hsiang Wang IT Lecture 6 We have

More information

Lecture 11: Quantum Information III - Source Coding

Lecture 11: Quantum Information III - Source Coding CSCI5370 Quantum Computing November 25, 203 Lecture : Quantum Information III - Source Coding Lecturer: Shengyu Zhang Scribe: Hing Yin Tsang. Holevo s bound Suppose Alice has an information source X that

More information

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy Coding and Information Theory Chris Williams, School of Informatics, University of Edinburgh Overview What is information theory? Entropy Coding Information Theory Shannon (1948): Information theory is

More information

P 1.5 X 4.5 / X 2 and (iii) The smallest value of n for

P 1.5 X 4.5 / X 2 and (iii) The smallest value of n for DHANALAKSHMI COLLEGE OF ENEINEERING, CHENNAI DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING MA645 PROBABILITY AND RANDOM PROCESS UNIT I : RANDOM VARIABLES PART B (6 MARKS). A random variable X

More information

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006)

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006) MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK SATELLITE COMMUNICATION DEPT./SEM.:ECE/VIII UNIT V PART-A 1. What is binary symmetric channel (AUC DEC 2006) 2. Define information rate? (AUC DEC 2007)

More information

x log x, which is strictly convex, and use Jensen s Inequality:

x log x, which is strictly convex, and use Jensen s Inequality: 2. Information measures: mutual information 2.1 Divergence: main inequality Theorem 2.1 (Information Inequality). D(P Q) 0 ; D(P Q) = 0 iff P = Q Proof. Let ϕ(x) x log x, which is strictly convex, and

More information

Lattices for Distributed Source Coding: Jointly Gaussian Sources and Reconstruction of a Linear Function

Lattices for Distributed Source Coding: Jointly Gaussian Sources and Reconstruction of a Linear Function Lattices for Distributed Source Coding: Jointly Gaussian Sources and Reconstruction of a Linear Function Dinesh Krithivasan and S. Sandeep Pradhan Department of Electrical Engineering and Computer Science,

More information

Multimedia Communications. Mathematical Preliminaries for Lossless Compression

Multimedia Communications. Mathematical Preliminaries for Lossless Compression Multimedia Communications Mathematical Preliminaries for Lossless Compression What we will see in this chapter Definition of information and entropy Modeling a data source Definition of coding and when

More information

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Nevroz Şen, Fady Alajaji and Serdar Yüksel Department of Mathematics and Statistics Queen s University Kingston, ON K7L 3N6, Canada

More information

( 1 k "information" I(X;Y) given by Y about X)

( 1 k information I(X;Y) given by Y about X) SUMMARY OF SHANNON DISTORTION-RATE THEORY Consider a stationary source X with f (x) as its th-order pdf. Recall the following OPTA function definitions: δ(,r) = least dist'n of -dim'l fixed-rate VQ's w.

More information

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample

More information

MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A

MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK DEPARTMENT: ECE SEMESTER: IV SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A 1. What is binary symmetric channel (AUC DEC

More information

Statistical signal processing

Statistical signal processing Statistical signal processing Short overview of the fundamentals Outline Random variables Random processes Stationarity Ergodicity Spectral analysis Random variable and processes Intuition: A random variable

More information

National University of Singapore Department of Electrical & Computer Engineering. Examination for

National University of Singapore Department of Electrical & Computer Engineering. Examination for National University of Singapore Department of Electrical & Computer Engineering Examination for EE5139R Information Theory for Communication Systems (Semester I, 2014/15) November/December 2014 Time Allowed:

More information

10-704: Information Processing and Learning Fall Lecture 9: Sept 28

10-704: Information Processing and Learning Fall Lecture 9: Sept 28 10-704: Information Processing and Learning Fall 2016 Lecturer: Siheng Chen Lecture 9: Sept 28 Note: These notes are based on scribed notes from Spring15 offering of this course. LaTeX template courtesy

More information

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University Chapter 3, 4 Random Variables ENCS6161 - Probability and Stochastic Processes Concordia University ENCS6161 p.1/47 The Notion of a Random Variable A random variable X is a function that assigns a real

More information

Information measures in simple coding problems

Information measures in simple coding problems Part I Information measures in simple coding problems in this web service in this web service Source coding and hypothesis testing; information measures A(discrete)source is a sequence {X i } i= of random

More information

Linear Codes, Target Function Classes, and Network Computing Capacity

Linear Codes, Target Function Classes, and Network Computing Capacity Linear Codes, Target Function Classes, and Network Computing Capacity Rathinakumar Appuswamy, Massimo Franceschetti, Nikhil Karamchandani, and Kenneth Zeger IEEE Transactions on Information Theory Submitted:

More information

EE 4TM4: Digital Communications II Scalar Gaussian Channel

EE 4TM4: Digital Communications II Scalar Gaussian Channel EE 4TM4: Digital Communications II Scalar Gaussian Channel I. DIFFERENTIAL ENTROPY Let X be a continuous random variable with probability density function (pdf) f(x) (in short X f(x)). The differential

More information

Lecture 15: Thu Feb 28, 2019

Lecture 15: Thu Feb 28, 2019 Lecture 15: Thu Feb 28, 2019 Announce: HW5 posted Lecture: The AWGN waveform channel Projecting temporally AWGN leads to spatially AWGN sufficiency of projection: irrelevancy theorem in waveform AWGN:

More information

ECE534, Spring 2018: Solutions for Problem Set #5

ECE534, Spring 2018: Solutions for Problem Set #5 ECE534, Spring 08: s for Problem Set #5 Mean Value and Autocorrelation Functions Consider a random process X(t) such that (i) X(t) ± (ii) The number of zero crossings, N(t), in the interval (0, t) is described

More information

EE5319R: Problem Set 3 Assigned: 24/08/16, Due: 31/08/16

EE5319R: Problem Set 3 Assigned: 24/08/16, Due: 31/08/16 EE539R: Problem Set 3 Assigned: 24/08/6, Due: 3/08/6. Cover and Thomas: Problem 2.30 (Maimum Entropy): Solution: We are required to maimize H(P X ) over all distributions P X on the non-negative integers

More information

MARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for

MARKOV CHAINS A finite state Markov chain is a sequence of discrete cv s from a finite alphabet where is a pmf on and for MARKOV CHAINS A finite state Markov chain is a sequence S 0,S 1,... of discrete cv s from a finite alphabet S where q 0 (s) is a pmf on S 0 and for n 1, Q(s s ) = Pr(S n =s S n 1 =s ) = Pr(S n =s S n 1

More information

The Central Limit Theorem: More of the Story

The Central Limit Theorem: More of the Story The Central Limit Theorem: More of the Story Steven Janke November 2015 Steven Janke (Seminar) The Central Limit Theorem:More of the Story November 2015 1 / 33 Central Limit Theorem Theorem (Central Limit

More information

C.M. Liu Perceptual Signal Processing Lab College of Computer Science National Chiao-Tung University

C.M. Liu Perceptual Signal Processing Lab College of Computer Science National Chiao-Tung University Quantization C.M. Liu Perceptual Signal Processing Lab College of Computer Science National Chiao-Tung University http://www.csie.nctu.edu.tw/~cmliu/courses/compression/ Office: EC538 (03)5731877 cmliu@cs.nctu.edu.tw

More information

Shannon s Noisy-Channel Coding Theorem

Shannon s Noisy-Channel Coding Theorem Shannon s Noisy-Channel Coding Theorem Lucas Slot Sebastian Zur February 2015 Abstract In information theory, Shannon s Noisy-Channel Coding Theorem states that it is possible to communicate over a noisy

More information

Exercise 1. = P(y a 1)P(a 1 )

Exercise 1. = P(y a 1)P(a 1 ) Chapter 7 Channel Capacity Exercise 1 A source produces independent, equally probable symbols from an alphabet {a 1, a 2 } at a rate of one symbol every 3 seconds. These symbols are transmitted over a

More information

Quiz 2 Date: Monday, November 21, 2016

Quiz 2 Date: Monday, November 21, 2016 10-704 Information Processing and Learning Fall 2016 Quiz 2 Date: Monday, November 21, 2016 Name: Andrew ID: Department: Guidelines: 1. PLEASE DO NOT TURN THIS PAGE UNTIL INSTRUCTED. 2. Write your name,

More information

SOURCE CODING WITH SIDE INFORMATION AT THE DECODER (WYNER-ZIV CODING) FEB 13, 2003

SOURCE CODING WITH SIDE INFORMATION AT THE DECODER (WYNER-ZIV CODING) FEB 13, 2003 SOURCE CODING WITH SIDE INFORMATION AT THE DECODER (WYNER-ZIV CODING) FEB 13, 2003 SLEPIAN-WOLF RESULT { X i} RATE R x ENCODER 1 DECODER X i V i {, } { V i} ENCODER 0 RATE R v Problem: Determine R, the

More information

19. Channel coding: energy-per-bit, continuous-time channels

19. Channel coding: energy-per-bit, continuous-time channels 9. Channel coding: energy-per-bit, continuous-time channels 9. Energy per bit Consider the additive Gaussian noise channel: Y i = X i + Z i, Z i N ( 0, ). (9.) In the last lecture, we analyzed the maximum

More information

Homework Set #2 Data Compression, Huffman code and AEP

Homework Set #2 Data Compression, Huffman code and AEP Homework Set #2 Data Compression, Huffman code and AEP 1. Huffman coding. Consider the random variable ( x1 x X = 2 x 3 x 4 x 5 x 6 x 7 0.50 0.26 0.11 0.04 0.04 0.03 0.02 (a Find a binary Huffman code

More information