Chapter I: Fundamental Information Theory


ECE-S622/T62 Notes, Ruifeng Zhang, Dept. of Electrical & Computer Eng., Drexel University

1.1 Information Source

Information is the outcome of some physical process. Though information sources are highly complex and diverse in the real world, we study them from the viewpoint of communications engineering, using probabilistic models. An information source is modeled as a stochastic process $\{X(t); p(\cdot)\}$, where $p(\cdot)$ is the (possibly infinite-dimensional) distribution of the process $X(t)$. According to the time index $t$ and the value domain of $X(t)$, information sources fall into different categories. If $t$ is continuous, the source is a continuous-time source, usually called a waveform source. If $t$ is discrete, the source is a discrete-time source. The set of values that $X(t)$ takes is denoted $\mathcal{X}$. If $\mathcal{X}$ is continuous, the source is a continuous source; otherwise, it is a discrete source. The value set $\mathcal{X}$ of a discrete source is called the alphabet of the source, and each of its elements is called a symbol or letter. Discrete sources are of primary interest in the discussion of digital communications, so we study them first.

Example 1.1. An information source of dialed telephone numbers has the source alphabet $\mathcal{X} = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, \#, *\}$. An information source of text consists of the letters (both lower and upper case), the space, and various punctuation symbols: $\mathcal{X} = \{a, \ldots, z, A, \ldots, Z\}$ together with the space, comma, period, semicolon, colon, exclamation and question marks. An information source representing the on/off status of a switch or actuator is a binary information source: $\mathcal{X} = \{\text{off}, \text{on}\}$, or simply $\mathcal{X} = \{0, 1\}$, with 0 representing off and 1 representing on.

Generally, we need an infinite-dimensional joint distribution to fully characterize a stochastic process and therefore an information source. However, if the values of $X(t)$ at different time instants $t$ are statistically independent, a one-dimensional distribution suffices. Such a source is called a memoryless source. We further impose the identical-distribution condition on memoryless sources because we want them to be stationary as well (Why?).

Definition 1.1 (Discrete Memoryless Source). A discrete source whose samples from its alphabet are independent and identically distributed is called a Discrete Memoryless Source (DMS).

For a DMS we can omit the time index of the stochastic process $X(t)$ and describe the source by a random variable $X$ with alphabet $\mathcal{X}$ and probability mass function $p_X(x) = P[X = x]$, $x \in \mathcal{X}$. Our discussion of information theory starts from the DMS.
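As a concrete companion to this model, here is a minimal Python sketch of a DMS: an alphabet, a pmf, and i.i.d. draws. The alphabet and probabilities are illustrative assumptions, not taken from the notes.

```python
import random

# A DMS is fully described by an alphabet and a probability mass function (pmf).
# The symbols and probabilities below are illustrative only.
alphabet = ["off", "on"]
pmf = [0.7, 0.3]          # p_X(off) = 0.7, p_X(on) = 0.3

def emit(k):
    """Draw k symbols from the DMS; memorylessness means independent draws."""
    return random.choices(alphabet, weights=pmf, k=k)

print(emit(10))           # e.g. ['off', 'on', 'off', ...]
```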

1.2 Self-information

When we say that something is informative, we imply that we have some unknowns or are not sure about that thing. Therefore, the information contained in an event must be associated in some sense with the uncertainty of that event; the higher the uncertainty of an event, the more information it carries. We therefore need a measure of uncertainty in order to quantify information. There is not yet a unanimous measure of uncertainty (and I do not think one exists). However, for the stochastically modeled information source described above, it is possible to measure the uncertainty, and thus the information, conveniently through the probability distribution.

We are considering DMSes. We first want a proper definition of the information measure of the event that the source outputs a specific symbol, i.e., $X = x \in \mathcal{X}$. Denote the information of this event by $I(X = x)$. According to the discussion above, $I(X = x)$ should depend on the probability $p = P[X = x]$. For obvious reasons of usefulness, we desire the following properties (axioms, in mathematical language) regarding $I(X = x)$ and $p$:

1. The information of an event is a function of its probability, i.e., $I(X = x) = F(p)$;
2. The information is differentiable with respect to the probability, i.e., $\frac{dF(p)}{dp}$ exists;
3. The information is monotonically decreasing in the probability, i.e., $\frac{dF(p)}{dp} < 0$;
4. Suppose that the source outputs two symbols in a row independently. The information of the event that the first symbol is $x$ and the second is $y$ equals the sum of the information of the two individual events $X = x$ and $X = y$, i.e., $I(X(1) = x, X(2) = y) = F(pq) = F(p) + F(q) = I(X = x) + I(X = y)$, where $q = P[X = y]$;
5. Deterministic events contain zero information and impossible events contain infinite information, i.e., $F(1) = 0$ and $F(0) = \infty$.

One can see that these requirements do match our intuition about the relation between information and uncertainty. Interestingly, only the following definition of $I(X = x)$ satisfies all of the above requirements.

Definition 1.2 (Self-information). The self-information of an event $X = x$ with probability $p = P[X = x]$ is
$$I(X = x) = \log \frac{1}{p}. \qquad (1.1)$$

That this definition meets all the desired properties is easily verified. We further note that the choice of the base of the logarithm does not matter much; it only affects the unit of the information measure. If base 2 is chosen, the resulting unit is called the bit (standing for binary digit). If the natural base $e$ is chosen, the resulting unit is called the nat (standing for natural digit). If base 10 is chosen, the resulting unit is called the Hartley (also called the dit). In most cases we use base 2.
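A quick numerical illustration of Definition 1.2 and the three units (a sketch; the probability value is arbitrary):

```python
import math

p = 0.25                     # probability of the event X = x (illustrative value)
I_bits = math.log2(1 / p)    # base 2  -> bits
I_nats = math.log(1 / p)     # base e  -> nats
I_dits = math.log10(1 / p)   # base 10 -> Hartleys (dits)

print(I_bits, I_nats, I_dits)     # 2.0, 1.386..., 0.602...
print(I_nats / math.log(2))       # nats converted to bits: equals I_bits
```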

The subtle reason for this choice will become clear as we proceed. Sometimes the natural base offers more convenience in mathematical analysis and is preferred there. For simplicity, we denote the base-2 logarithm by $\log$, the natural logarithm by $\ln$, and the base-10 logarithm by $\lg$ (though we will rarely use it). Different information units are easily converted into one another; using the change-of-base formula for logarithms, $1\ \text{nat} = \log e$ bits and $1\ \text{dit} = \log 10$ bits.

The last paragraph of this section is devoted to the mathematical justification of Definition 1.2, and is only for those who are curious about the uniqueness of Definition 1.2 and insist on mathematical rigor.

Proof: From the fourth property in the list above, differentiating both sides of $F(pq) = F(p) + F(q)$ with respect to $p$, we obtain
$$\frac{dF(pq)}{d(pq)} = \frac{1}{q}\,\frac{dF(p)}{dp}.$$
Similarly, differentiating with respect to $q$,
$$\frac{dF(pq)}{d(pq)} = \frac{1}{p}\,\frac{dF(q)}{dq}.$$
Combining the above two equations, we get
$$p\,\frac{dF(p)}{dp} = q\,\frac{dF(q)}{dq}.$$
Since this holds for arbitrary $p$ and $q$, $F(p)$ must satisfy
$$p\,\frac{dF(p)}{dp} = C,$$
where $C$ is a constant. Consequently,
$$F(p) = C \ln p + D,$$
where $D$ is another constant. Since we require $F(1) = 0$, we have $D = 0$; and since we require $\frac{dF(p)}{dp} < 0$, we have $C < 0$. We are in favor of the base-2 logarithm, so we set $C = -1/\ln 2$ and the desired result follows.

1.3 Entropy

Self-information is only a measure of the information of the outcome of a specific symbol in the alphabet of a source. To characterize the information content of the source itself, we need to (statistically) average the self-information over all symbols. This average information is called the entropy of the source.

Definition 1.3 (Entropy). The average information (per symbol), or entropy, $H(X)$ of a DMS $X$ with alphabet $\mathcal{X} = \{x_i,\ i = 1, \ldots, n\}$ and probability mass function $p_X(x_i) = P[X = x_i] = p_i$, is
$$H(X) = E[I(X = x_i)] = \sum_{i=1}^{n} p_i \log \frac{1}{p_i} = -\sum_{i=1}^{n} p_i \log p_i. \qquad (1.2)$$
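Equation (1.2) translates directly into a small helper, sketched below (the function name is mine); it is reused informally in later examples of this chapter.

```python
import math

def entropy(pmf, base=2.0):
    """Entropy of a DMS given its pmf. Zero-probability symbols contribute
    nothing, by the usual convention 0 * log(1/0) = 0."""
    return sum(p * math.log(1.0 / p, base) for p in pmf if p > 0)

print(entropy([0.5, 0.25, 0.25]))   # 1.5 bits (this pmf reappears in Example 1.3)
```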

The word entropy (from the Greek entropē, meaning change) was borrowed from thermodynamics, where it was first used by Clausius to measure the irreversible increase of non-disposable energy. Indeed, (1.2) is very similar to Boltzmann's statistical definition of entropy.

Example 1.2. Consider the binary source $X = \{0 : p,\ 1 : (1-p)\}$. Its entropy is
$$H(X) = p \log \frac{1}{p} + (1-p) \log \frac{1}{1-p}.$$
If the source is equiprobable, i.e., $p = 1/2$, then $H(X) = 1$. The equiprobable binary source thus carries an average of 1 bit of information per source symbol. That is why we call a binary symbol (0 or 1) a bit: a message represented with $N$ binary symbols contains $N$ bits of information, and, conversely, a message containing $N$ bits of information needs $N$ binary symbols to describe it. The binary source is the simplest and therefore most convenient source for both theoretical and practical purposes, which is why the base-2 logarithm, with the bit as information unit, is prevalent.

Example 1.3. Consider the source $X = \{x_1 : 0.5,\ x_2 : 0.25,\ x_3 : 0.25\}$. Then
$$H(X) = 0.5 \log 2 + 0.25 \log 4 + 0.25 \log 4 = 1.5 \text{ bits}.$$
Thus a typical message from this source contains 1.5 bits of information per symbol; consequently, one symbol of the given source is equivalent in information content to 1.5 binary symbols.

Example 1.4. Listed below are the letters of the English alphabet with their relative frequencies. Treating English as a DMS with these probabilities, its entropy is $H(X) \approx 4.1$ bits per letter.

Table 1.1: English alphabet letter frequencies
Letter  Frequency    Letter  Frequency
A       0.0856       N       0.0707
B       0.0139       O       0.0797
C       0.0279       P       0.0199
D       0.0378       Q       0.0012
E       0.1304       R       0.0677
F       0.0289       S       0.0607
G       0.0199       T       0.1045
H       0.0528       U       0.0249
I       0.0627       V       0.0092
J       0.0013       W       0.0149
K       0.0042       X       0.0017
L       0.0339       Y       0.0199
M       0.0249       Z       0.0008

The information entropy has the following properties.

Theorem 1.1 (Minimum and Maximum Entropy).
1. $H(X) \geq 0$, with equality when $p_i = 1$ for one of the symbols $x_i \in \mathcal{X}$.
2. $H(X) \leq \log n$ for an alphabet of $n$ symbols, with equality when $p_i = 1/n$ for all symbols.
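The value quoted in Example 1.4 can be checked directly from Table 1.1 (a sketch using the frequencies above):

```python
import math

freq = {'A': .0856, 'B': .0139, 'C': .0279, 'D': .0378, 'E': .1304, 'F': .0289,
        'G': .0199, 'H': .0528, 'I': .0627, 'J': .0013, 'K': .0042, 'L': .0339,
        'M': .0249, 'N': .0707, 'O': .0797, 'P': .0199, 'Q': .0012, 'R': .0677,
        'S': .0607, 'T': .1045, 'U': .0249, 'V': .0092, 'W': .0149, 'X': .0017,
        'Y': .0199, 'Z': .0008}

H = sum(p * math.log2(1 / p) for p in freq.values())
print(round(sum(freq.values()), 4))   # the frequencies sum to 1.0
print(round(H, 2))                    # roughly 4.1 bits per letter
```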

Theorem 1.1 gives the minimum and maximum of the entropy of an information source. The first result states that entropy is non-negative, which means that there is no negative information. This is quite intuitive: we will not lose information by receiving a message, even if we may get nothing from it. Zero entropy occurs when the source alphabet loses its randomness, because we get no information from a deterministic event. The second result tells us that the equiprobable source has the maximum entropy. Now we prove Theorem 1.1; before doing so, we first prove the following lemma.

Lemma 1.1 (Information theory fundamental inequality).
$$\ln x \leq x - 1. \qquad (1.3)$$
Proof: Let $f(x) = \ln x - x + 1$. We have $f'(x) = 1/x - 1$, so $f'(1) = 0$, and $f''(x) = -1/x^2 < 0$. Therefore $x = 1$ is the maximum of $f(x)$, i.e., $f(x) \leq f(1) = 0$. The desired result follows.

A corollary of Lemma 1.1 is that
$$\ln \frac{1}{x} \geq 1 - x, \qquad (1.4)$$
which is obtained by replacing $x$ with $1/x$ in (1.3).

Now let us prove Theorem 1.1.
Proof: The first result is derived as
$$H(X) = \sum_{i=1}^{n} p_i \log \frac{1}{p_i} = \frac{1}{\ln 2}\sum_{i=1}^{n} p_i \ln \frac{1}{p_i} \geq \frac{1}{\ln 2}\sum_{i=1}^{n} p_i (1 - p_i) = \frac{1}{\ln 2}\Big(1 - \sum_{i=1}^{n} p_i^2\Big) \geq 0,$$
and the condition for equality is easily verified. For the second result, consider
$$H(X) - \log n = \sum_{i=1}^{n} p_i \log \frac{1}{p_i} - \sum_{i=1}^{n} p_i \log n = \sum_{i=1}^{n} p_i \log \frac{1}{n p_i} \leq \frac{1}{\ln 2}\sum_{i=1}^{n} p_i \Big(\frac{1}{n p_i} - 1\Big) = 0,$$
and the desired result follows. Note that equality holds if and only if $np_i = 1$, i.e., $p_i = 1/n$ for all $i$.

Example 1.5. Using the expression obtained in Example 1.2, let us plot the entropy of the binary source as a function of the probability $p$. The plot is shown in Figure 1.1; the maximum entropy is reached at $p = 1/2$, i.e., when the binary source is equiprobable.

Figure 1.1: Plot of $H(X) = -p\log p - (1-p)\log(1-p)$ versus $p$.

1.4 Joint Entropy

Joint entropy is the natural extension of entropy when we need to study two or more information sources.

Definition 1.4 (Joint Entropy). The joint entropy of $k$ information sources $X_j \in \mathcal{X}_j$, $j = 1, \ldots, k$, with joint probability mass function $p_{X_1 \ldots X_k}(x_1, \ldots, x_k) = P[X_1 = x_1 \in \mathcal{X}_1, \ldots, X_k = x_k \in \mathcal{X}_k]$, is defined as
$$H(X_1, \ldots, X_k) = -\sum_{x_1 \in \mathcal{X}_1} \cdots \sum_{x_k \in \mathcal{X}_k} p_{X_1 \ldots X_k}(x_1, \ldots, x_k) \log p_{X_1 \ldots X_k}(x_1, \ldots, x_k). \qquad (1.5)$$
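Returning to Theorem 1.1 and Example 1.5, both are easy to check numerically; a sketch evaluating the binary entropy on a few values of p (the bound log 2 = 1 bit is attained only at p = 1/2):

```python
import math

def H2(p):
    """Binary entropy H(p) = -p log2 p - (1-p) log2 (1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0.0, 0.1, 0.25, 0.5, 0.75, 1.0]:
    print(p, round(H2(p), 4))     # nonnegative, at most 1, maximal at p = 0.5
```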

Example 1.6. Consider two binary sources $X_1$ and $X_2$ sharing the same alphabet $\mathcal{X} = \{0, 1\}$ but having different probability mass functions: for $X_1$, $p_1(0) = p_1(1) = 0.5$, while for $X_2$, $p_2(0) = 1/3$ and $p_2(1) = 2/3$. We assume the two sources are independent, i.e., $p_{X_1,X_2}(x_1, x_2) = p_1(x_1)\, p_2(x_2)$. Then
$$H(X_1, X_2) = \frac{1}{6}\log 6 + \frac{1}{6}\log 6 + \frac{1}{3}\log 3 + \frac{1}{3}\log 3 = 1.918 \text{ bits per symbol pair}.$$
We emphasize that $H(X_1, X_2)$ is defined as the average information in bits per pair of symbols, in contrast with $H(X)$, which is defined as the average information in bits per symbol. If the information per symbol of the joint source is of interest, we can divide $H(X_1, X_2)$ by 2, which gives 0.959 bits of information per symbol. You may compare this number with $H(X_1) = 1$ and $H(X_2) = 0.918$ to see how the entropy computed jointly relates to the entropies computed individually.
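The numbers in Example 1.6 can be reproduced in a few lines (a sketch that builds the joint pmf from independence and evaluates (1.5)):

```python
import math

def entropy(pmf):
    return sum(p * math.log2(1 / p) for p in pmf if p > 0)

p1 = {0: 0.5, 1: 0.5}          # source X1
p2 = {0: 1/3, 1: 2/3}          # source X2
joint = {(a, b): p1[a] * p2[b] for a in p1 for b in p2}   # independence

H12 = entropy(joint.values())
print(round(H12, 4))           # about 1.9183 bits per symbol pair
print(round(H12 / 2, 4))       # about 0.9591 bits per symbol
print(entropy(p1.values()), round(entropy(p2.values()), 4))   # 1.0 and about 0.9183
```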

An important application of joint entropy is to describe the information content of extended sources.

Definition 1.5 (The $k$-th Extension of an Information Source). Let $X$ be a source with alphabet $\mathcal{X} = \{x_1, \ldots, x_n\}$. The $k$-th extension of $X$, denoted $X^k$, is a source with alphabet $\mathcal{X}^k = \{\sigma_1, \ldots, \sigma_{n^k}\}$, each $\sigma_i$ corresponding to a length-$k$ block of symbols from $\mathcal{X}$, i.e., $\sigma_i = (x_{i,1}, \ldots, x_{i,k})$, $x_{i,j} \in \mathcal{X}$.

Example 1.7. Consider the binary source $\mathcal{X} = \{0, 1\}$. Its 2nd extension is $\mathcal{X}^2 = \{00, 01, 10, 11\}$ and its 3rd extension is $\mathcal{X}^3 = \{000, 001, 010, 011, 100, 101, 110, 111\}$.

The entropy of the extended source $X^k$ can be computed using the formula for joint entropy, viewing the $k$ positions in a block as $k$ sources sharing the same alphabet. The complexity lies in the fact that we need to know the $k$-dimensional distribution of $X$. However, for a DMS the $k$-dimensional probability follows simply from the one-dimensional mass function:
$$p_{X^k}(\sigma_i) = p_X(x_{i,1}) \cdots p_X(x_{i,k}).$$
Therefore, the entropy of the $k$-th extension of a DMS $X$ is
$$
\begin{aligned}
H(X^k) &= \sum_{i=1}^{n^k} p_{X^k}(\sigma_i) \log \frac{1}{p_{X^k}(\sigma_i)} \\
&= \sum_{i=1}^{n^k} p_X(x_{i,1}) \cdots p_X(x_{i,k}) \log \frac{1}{p_X(x_{i,1}) \cdots p_X(x_{i,k})} \\
&= \sum_{i_1=1}^{n} \cdots \sum_{i_k=1}^{n} p_X(x_{i_1}) \cdots p_X(x_{i_k}) \sum_{j=1}^{k} \log \frac{1}{p_X(x_{i_j})} \\
&= \sum_{j=1}^{k} \sum_{i_j=1}^{n} p_X(x_{i_j}) \log \frac{1}{p_X(x_{i_j})} \\
&= k\,H(X) \text{ bits per symbol block.} \qquad (1.6)
\end{aligned}
$$
Note that $H(X^k)/k = H(X)$: each symbol contains the same amount of information in the extended source as in the original source. This fact holds for DMSes. For sources with memory, however, each symbol contains less information in the extended source. (Why?)

1.5 Conditional Entropy

Conditional entropy is used to quantify the information of a source when the information of other sources is available. Consider two sources $X$ and $Y$ with alphabets $\mathcal{X}$ and $\mathcal{Y}$. The conditional probability that $Y = y \in \mathcal{Y}$ given $X = x \in \mathcal{X}$ is $p_{Y|X}(y|x)$. By simple analogy with self-information, the conditional self-information of $Y = y$ given $X = x$ is
$$I(Y = y \mid X = x) = \log \frac{1}{p_{Y|X}(y|x)}.$$
The average information of $Y$ conditioned on $X = x$ can then be written as
$$H(Y \mid X = x) = \sum_{y \in \mathcal{Y}} p_{Y|X}(y|x) \log \frac{1}{p_{Y|X}(y|x)}.$$
Furthermore, we want the average information of $Y$ conditioned on all possible symbols $x \in \mathcal{X}$:
$$H(Y|X) = \sum_{x \in \mathcal{X}} \Bigg[ \sum_{y \in \mathcal{Y}} p_{Y|X}(y|x) \log \frac{1}{p_{Y|X}(y|x)} \Bigg] p_X(x) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{X,Y}(x,y) \log \frac{1}{p_{Y|X}(y|x)}. \qquad (1.7)$$
The above equation is the definition of the conditional entropy of $Y$ given $X$. It is easily generalized to the case of multiple conditioning sources.

Definition 1.6 (Conditional Entropy). The conditional entropy of the information source $X_k$ given the sources $X_1, \ldots, X_{k-1}$ is
$$H(X_k \mid X_1, \ldots, X_{k-1}) = -\sum_{x_1 \in \mathcal{X}_1} \cdots \sum_{x_k \in \mathcal{X}_k} p_{X_1 \ldots X_k}(x_1, \ldots, x_k) \log p_{X_k | X_1 \ldots X_{k-1}}(x_k \mid x_1, \ldots, x_{k-1}). \qquad (1.8)$$

Example 1.8. Consider the sources $X \in \{0 : 0.5,\ 1 : 0.5\}$ and $Y \in \{0, 1\}$. The conditional probability of $Y$ given $X$ is
$$p_{Y|X}(0|0) = 0.25, \quad p_{Y|X}(0|1) = 0.6, \quad p_{Y|X}(1|0) = 0.75, \quad p_{Y|X}(1|1) = 0.4.$$
The conditional entropy of $Y$ given $X$ is then
$$H(Y|X) = (0.25)(0.5)\log\frac{1}{0.25} + (0.75)(0.5)\log\frac{1}{0.75} + (0.6)(0.5)\log\frac{1}{0.6} + (0.4)(0.5)\log\frac{1}{0.4} \approx 0.891.$$
We can also derive the marginal probability of $Y$ from the conditional probabilities and the probability of $X$: $p_Y(0) = 0.425$, $p_Y(1) = 0.575$. Then the (unconditional) entropy of $Y$ is $H(Y) = 0.9837$ bits per symbol. Note that the conditional entropy is smaller than the (unconditional) entropy.

The fact noted in the above example is an instance of a general result about conditional entropy.

Theorem 1.2 (Conditioning Reduces Uncertainty).
$$H(X|Y) \leq H(X). \qquad (1.9)$$
Proof:
$$
\begin{aligned}
H(X|Y) - H(X) &= -\sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} p_{XY}(x,y) \log p_{X|Y}(x|y) + \sum_{x \in \mathcal{X}} p_X(x) \log p_X(x) \\
&= \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} p_{XY}(x,y) \log \frac{p_X(x)}{p_{X|Y}(x|y)}
 = \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} p_{XY}(x,y) \log \frac{p_X(x)\, p_Y(y)}{p_{XY}(x,y)} \\
&\leq \frac{1}{\ln 2} \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} p_{XY}(x,y) \left[ \frac{p_X(x)\, p_Y(y)}{p_{XY}(x,y)} - 1 \right] = 0.
\end{aligned}
$$
This property is very intuitive: conditioning gives us some information about the source under consideration and thus reduces its remaining uncertainty.
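The computations of Example 1.8 can be verified in a few lines (a sketch following the definitions above):

```python
import math

pX = {0: 0.5, 1: 0.5}
pY_given_X = {(0, 0): 0.25, (1, 0): 0.75,      # keyed as (y, x)
              (0, 1): 0.60, (1, 1): 0.40}

# H(Y|X) = sum_x p(x) sum_y p(y|x) log 1/p(y|x)
HYX = sum(pX[x] * pY_given_X[(y, x)] * math.log2(1 / pY_given_X[(y, x)])
          for x in pX for y in (0, 1))

# marginal of Y and its (unconditional) entropy
pY = {y: sum(pY_given_X[(y, x)] * pX[x] for x in pX) for y in (0, 1)}
HY = sum(p * math.log2(1 / p) for p in pY.values())

print(round(HYX, 3), pY, round(HY, 4))   # about 0.891, {0: 0.425, 1: 0.575}, about 0.9837
```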

Conditional entropy and joint entropy are related by the chain rule.

Theorem 1.3 (Chain Rule).
$$H(X,Y) = H(X) + H(Y|X). \qquad (1.10)$$
Proof:
$$
\begin{aligned}
H(X,Y) &= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{XY}(x,y) \log p_{XY}(x,y) \\
&= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{XY}(x,y) \log \big[ p_{Y|X}(y|x)\, p_X(x) \big] \\
&= -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{XY}(x,y) \log p_X(x) - \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{XY}(x,y) \log p_{Y|X}(y|x) \\
&= H(X) + H(Y|X).
\end{aligned}
$$

Corollary 1.1.
$$H(X,Y|Z) = H(X|Z) + H(Y|X,Z). \qquad (1.11)$$
Proof: The proof is along the same lines as the theorem.

Example 1.9. Continuing from Example 1.8, we can derive the joint probabilities as
$$p_{XY}(0,0) = p_{Y|X}(0|0)\,p_X(0) = (0.25)(0.5) = 0.125, \quad p_{XY}(0,1) = 0.375, \quad p_{XY}(1,0) = 0.3, \quad p_{XY}(1,1) = 0.2.$$
From these we can compute the joint entropy of $X$ and $Y$: $H(X,Y) = 1.891$. In addition, the entropy of $X$ is obviously $H(X) = 1$, and in Example 1.8 we found $H(Y|X) = 0.891$. We thus verify that $H(X,Y) = H(X) + H(Y|X)$.

A final remark: in general $H(Y|X) \neq H(X|Y)$. However, $H(X) - H(X|Y) = H(Y) - H(Y|X)$, a property that we shall exploit later.

1.6 Communication Channels

Shannon gave a very abstract but precise model for communication systems: an information source, an output, and a communication channel between them. This model is shown in Figure 1.2. We consider the so-called discrete channel as a first step. A discrete channel is one for which both the source and the output are discrete processes. It can be modeled simply as a probabilistic mapping of the source alphabet to the output alphabet. If the discrete channel is additionally memoryless (from one source symbol to another), the mappings of successive source symbols are independent and the description becomes simple.

Definition 1.7 (Discrete Memoryless Channel). A discrete memoryless channel (DMC) between the information source $X \in \mathcal{X} = \{x_1, \ldots, x_n\}$ and the output $Y \in \mathcal{Y} = \{y_1, \ldots, y_m\}$ is a set of conditional probabilities $p_{Y|X}(y_j | x_i) = p_{ij}$, the probability that the output symbol $y_j$ is received when the source symbol $x_i$ is sent. Note that the channel may change a transmitted symbol into another one or introduce new symbols.

Figure 1.2: Information-theoretic model of a communication system: an information source $X$, a channel, and the information output $Y$.

It is convenient to organize the conditional probabilities into a matrix,
$$P = \begin{bmatrix} p_{Y|X}(y_1|x_1) & p_{Y|X}(y_2|x_1) & \cdots & p_{Y|X}(y_m|x_1) \\ p_{Y|X}(y_1|x_2) & p_{Y|X}(y_2|x_2) & \cdots & p_{Y|X}(y_m|x_2) \\ \vdots & \vdots & & \vdots \\ p_{Y|X}(y_1|x_n) & p_{Y|X}(y_2|x_n) & \cdots & p_{Y|X}(y_m|x_n) \end{bmatrix}
 = \begin{bmatrix} p_{11} & p_{12} & \cdots & p_{1m} \\ p_{21} & p_{22} & \cdots & p_{2m} \\ \vdots & \vdots & & \vdots \\ p_{n1} & p_{n2} & \cdots & p_{nm} \end{bmatrix}.$$
This matrix is usually referred to as the channel matrix. Each row of the channel matrix $P$ corresponds to an input of the channel, and each column of $P$ corresponds to an output of the channel. Since, if we send $x_i$, we must receive some $y_j$, we have
$$\sum_{j=1}^{m} p_{ij} = 1, \quad i = 1, \ldots, n.$$
We also usually represent the channel graphically as in Figure 1.3.

Figure 1.3: Channel transition graph: inputs $x_1, \ldots, x_n$ connected to outputs $y_1, \ldots, y_m$ by branches labeled with the transition probabilities $p_{ij}$.
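A channel matrix is just a row-stochastic array; a minimal sketch of how one might store a DMC and sanity-check the row-sum condition (the numbers are the asymmetric binary channel that appears in Example 1.10 below):

```python
# Rows = inputs x_i, columns = outputs y_j, entries p_{Y|X}(y_j | x_i).
P = [[0.8, 0.2],
     [0.3, 0.7]]

# Every row must sum to 1: if x_i is sent, some y_j is always received.
assert all(abs(sum(row) - 1.0) < 1e-12 for row in P)
print("valid channel matrix with", len(P), "inputs and", len(P[0]), "outputs")
```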

Example 1.10. A source emits the symbols $\{0, 1\}$ and the receiver receives the symbols $\{0, 1\}$ as well.

1. If the channel is noiseless and deterministic, then $p(0|0) = p(1|1) = 1$ and $p(1|0) = p(0|1) = 0$. The channel matrix is
$$P = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
This is called the binary deterministic channel.

2. If the channel introduces 1% bit-inversion errors, then $p(0|0) = p(1|1) = 0.99$ and $p(1|0) = p(0|1) = 0.01$, and the channel matrix is
$$P = \begin{bmatrix} 0.99 & 0.01 \\ 0.01 & 0.99 \end{bmatrix}.$$
The general case $P(0|0) = P(1|1) = 1-\epsilon$ and $P(1|0) = P(0|1) = \epsilon$ is called the binary symmetric channel (BSC). The binary deterministic channel described above is a special case of the BSC.

3. In general, the errors in a binary channel may depend on the symbol transmitted, i.e., $P(1|0) \neq P(0|1)$, for example
$$P = \begin{bmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{bmatrix}.$$

Example 1.11 (Binary Erasure Channel (BEC)). A binary erasure channel has a binary input $\{0, 1\}$ and a ternary output $\{0, ?, 1\}$, where $?$ means that a decision cannot be made on whether a 0 or a 1 was sent (the output is erased):
$$P = \begin{bmatrix} 1-q & q & 0 \\ 0 & q & 1-q \end{bmatrix}.$$

If the source symbols are sent with probabilities $p_X(x_i)$, $i = 1, \ldots, n$, the output symbols appear with some other set of probabilities $p_Y(y_j)$, $j = 1, \ldots, m$, which can be derived from the total probability formula
$$p_Y(y_j) = \sum_{i=1}^{n} p_{Y|X}(y_j | x_i)\, p_X(x_i) \qquad (1.12)$$
for any given channel $p_{Y|X}(y_j|x_i)$. From Bayes' law we also get
$$p_{X|Y}(x_i | y_j) = \frac{p_{XY}(x_i, y_j)}{p_Y(y_j)} = \frac{p_{Y|X}(y_j|x_i)\, p_X(x_i)}{p_Y(y_j)}, \qquad (1.13)$$
which is the probability that the input $x_i$ was sent given that the output $y_j$ was received. We call $p_{X|Y}(x_i|y_j)$ the backward probability and $p_{Y|X}(y_j|x_i)$ the forward probability. Note that if we are given $p_Y(y_j)$ and $p_{Y|X}(y_j|x_i)$, it may not be possible to invert (1.12) to determine $p_X(x_i)$: there may be many source distributions $p_X(x_i)$ that lead to the same output distribution $p_Y(y_j)$ for a given channel. But if we are given $p_X(x_i)$ and $p_{Y|X}(y_j|x_i)$, we always obtain a unique $p_Y(y_j)$.

Example 1.12. A binary channel with channel matrix
$$P = \begin{bmatrix} 2/3 & 1/3 \\ 1/10 & 9/10 \end{bmatrix}$$
and source probabilities $p_X(0) = 3/4$ and $p_X(1) = 1/4$ can be represented diagrammatically by its transition graph. We can now derive the output probabilities,
$$p_Y(0) = (2/3)(3/4) + (1/10)(1/4) = 21/40, \qquad p_Y(1) = (9/10)(1/4) + (1/3)(3/4) = 19/40,$$
and the backward probabilities,
$$p_{X|Y}(0|0) = \frac{(2/3)(3/4)}{21/40} = \frac{20}{21}, \quad p_{X|Y}(1|0) = \frac{(1/10)(1/4)}{21/40} = \frac{1}{21}, \quad p_{X|Y}(0|1) = \frac{(1/3)(3/4)}{19/40} = \frac{10}{19}, \quad p_{X|Y}(1|1) = \frac{(9/10)(1/4)}{19/40} = \frac{9}{19}.$$
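The forward/backward bookkeeping of Example 1.12 is a direct application of (1.12) and (1.13); a sketch using exact fractions:

```python
from fractions import Fraction as F

P = [[F(2, 3), F(1, 3)],        # p_{Y|X}(y_j|x_i): row = input, column = output
     [F(1, 10), F(9, 10)]]
pX = [F(3, 4), F(1, 4)]

# Output probabilities, eq. (1.12): p_Y(y_j) = sum_i p_{Y|X}(y_j|x_i) p_X(x_i)
pY = [sum(P[i][j] * pX[i] for i in range(2)) for j in range(2)]
print(pY)                        # [21/40, 19/40]

# Backward probabilities, eq. (1.13): p_{X|Y}(x_i|y_j) = p_{Y|X}(y_j|x_i) p_X(x_i) / p_Y(y_j)
back = [[P[i][j] * pX[i] / pY[j] for j in range(2)] for i in range(2)]
print(back)                      # [[20/21, 10/19], [1/21, 9/19]]
```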

1.7 Equivocation and Mutual Information

Now let us study how the information content of a source is affected by a communication channel. We first need to give the information measure of the source before and after the channel (i.e., when the output is available). The a priori entropy of a source $X$ is just its ordinary entropy,
$$H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i). \qquad (1.14)$$
It is easy to guess that the a posteriori entropy should be the conditional entropy of $X$ given the channel output $Y$, which we call the equivocation of $X$ with respect to $Y$.

Definition 1.8 (Equivocation). The equivocation of $X$ with respect to $Y$ is the conditional entropy of $X$ given $Y$:
$$H(X|Y) = -\sum_{i=1}^{n} \sum_{j=1}^{m} p_{XY}(x_i, y_j) \log p_{X|Y}(x_i \mid y_j). \qquad (1.15)$$

The equivocation gives the information of the source $X$ after its response at the output of the channel is available. From the discussion of conditional entropy in the previous section, we know that $H(X|Y) \leq H(X)$; that is, after transmission through the channel $p_{Y|X}(y_j|x_i)$, the uncertainty about $X$ has decreased. If we remember that information is uncertainty, then we have less uncertainty about the source $X$ after we obtain the channel output $Y$; in other words, we are more certain about the source after we observe it through the channel output. This agrees completely with our intuition. It is then natural to regard the difference between $H(X)$ and $H(X|Y)$ as the information we extract about $X$ through knowing $Y$. In other words, $H(X) - H(X|Y)$ gives the information about $X$ conveyed by $Y$. This quantity is named the mutual information.

Definition 1.9 (Mutual Information). The mutual information of $X$ and $Y$ is defined as
$$I(X;Y) = H(X) - H(X|Y). \qquad (1.16)$$
Alternative expressions for $I(X;Y)$ can be obtained as follows:
$$
\begin{aligned}
I(X;Y) &= -\sum_{i=1}^{n} p_X(x_i) \log p_X(x_i) + \sum_{i=1}^{n} \sum_{j=1}^{m} p_{XY}(x_i, y_j) \log p_{X|Y}(x_i|y_j) \\
&= \sum_{i=1}^{n} \sum_{j=1}^{m} p_{XY}(x_i, y_j) \log \frac{p_{X|Y}(x_i|y_j)}{p_X(x_i)} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{m} p_{XY}(x_i, y_j) \log \frac{p_{XY}(x_i, y_j)}{p_X(x_i)\, p_Y(y_j)} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{m} p_{XY}(x_i, y_j) \log \frac{p_{Y|X}(y_j|x_i)}{p_Y(y_j)}. \qquad (1.17)
\end{aligned}
$$

The following properties of the mutual information $I(X;Y)$ can be observed.

Theorem 1.4.
$$I(X;Y) \geq 0, \qquad (1.18)$$
with equality if and only if $p_{XY}(x_i, y_j) = p_X(x_i)\, p_Y(y_j)$ for all $i, j$.
Proof: Use the information theory fundamental inequality (1.3).

Theorem 1.5. $I(X;X) = H(X)$. Source entropy can therefore be viewed as a special case of mutual information.
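Mutual information as given by (1.16)–(1.17) is easy to evaluate from a joint pmf; a sketch using the joint probabilities derived in Example 1.9:

```python
import math

pXY = {(0, 0): 0.125, (0, 1): 0.375, (1, 0): 0.3, (1, 1): 0.2}   # from Example 1.9

pX = {x: sum(p for (a, _), p in pXY.items() if a == x) for x in (0, 1)}
pY = {y: sum(p for (_, b), p in pXY.items() if b == y) for y in (0, 1)}

# I(X;Y) = sum_{x,y} p(x,y) log [ p(x,y) / (p(x) p(y)) ]
I = sum(p * math.log2(p / (pX[x] * pY[y])) for (x, y), p in pXY.items() if p > 0)
print(round(I, 4))   # about 0.093 bits: positive, since X and Y are dependent
```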

Theorem 1.6.
$$I(X;Y) = I(Y;X). \qquad (1.19)$$
That is, mutual information is symmetric with respect to $X$ and $Y$, as is easily seen from (1.17).

It is interesting to investigate the relations among the entropies $H(X)$ and $H(Y)$, the joint entropy $H(X,Y)$, the equivocations $H(X|Y)$ and $H(Y|X)$, and the mutual information $I(X;Y) = I(Y;X)$. They are summarized in Figure 1.4 and the following theorem.

Theorem 1.7.
$$H(X,Y) = H(X) + H(Y) - I(X;Y) = H(X) + H(Y|X) = H(Y) + H(X|Y). \qquad (1.20)$$
The proof of this relationship is just an exercise in manipulating probabilities, joint probabilities, and conditional probabilities.

Figure 1.4: Relationship between the entropies $H(X)$, $H(Y)$, the equivocations $H(X|Y)$, $H(Y|X)$, the mutual information $I(X;Y) = I(Y;X)$, and the joint entropy $H(X,Y)$.

Example 1.13. Use the specifications given in Example 1.12. Before seeing an output from the channel, our a priori knowledge of the information source is
$$H(X) = (3/4)\log(4/3) + (1/4)\log 4 = 0.811.$$
After seeing the channel output, say $y_j = 0$, our knowledge of the input source becomes
$$H(X \mid y_j = 0) = (20/21)\log(21/20) + (1/21)\log 21 = 0.276.$$
We can similarly obtain $H(X \mid y_j = 1) = 0.998$. We see that we are more certain that a 0 was sent when we observe a 0, because the uncertainty (entropy) about the source $X$ is reduced from 0.811 to 0.276 when we receive a 0. On the other hand, when we receive a 1, we are more uncertain about what $x_i$ was (with almost equal uncertainty about whether it was a 0 or a 1). On average, the equivocation of $X$ with respect to $Y$ is
$$H(X|Y) = 0.276 \cdot (21/40) + 0.998 \cdot (19/40) = 0.619 < 0.811 = H(X).$$
The mutual information of $X$ and $Y$ is $I(X;Y) = 0.811 - 0.619 = 0.192$; that is, we obtain 0.192 bits of information per received symbol. Other quantities are $H(Y) = (21/40)\log(40/21) + (19/40)\log(40/19) = 0.9982$, $H(Y|X) = H(Y) - I(Y;X) = H(Y) - I(X;Y) = 0.806$, and $H(X,Y) = H(X) + H(Y) - I(X;Y) = 1.617$.

Example 1.14. For a BSC (cf. Example 1.10), the following probabilities are obtained:
$p_X(x_i)$: $p_X(0) = p$, $p_X(1) = 1-p$;

$p_{Y|X}(y_j|x_i)$: $p_{Y|X}(0|0) = p_{Y|X}(1|1) = 1-\epsilon$, $p_{Y|X}(1|0) = p_{Y|X}(0|1) = \epsilon$;
$p_{XY}(x_i, y_j)$: $p_{XY}(0,0) = p(1-\epsilon)$, $p_{XY}(1,1) = (1-p)(1-\epsilon)$, $p_{XY}(0,1) = p\epsilon$, $p_{XY}(1,0) = (1-p)\epsilon$;
$p_Y(y_j)$: $p_Y(0) = p(1-\epsilon) + (1-p)\epsilon$, $p_Y(1) = p\epsilon + (1-p)(1-\epsilon)$;
$p_{X|Y}(x_i|y_j)$: $p_{X|Y}(0|0) = \dfrac{p(1-\epsilon)}{p(1-\epsilon)+(1-p)\epsilon}$, $p_{X|Y}(1|0) = \dfrac{(1-p)\epsilon}{p(1-\epsilon)+(1-p)\epsilon}$, $p_{X|Y}(0|1) = \dfrac{p\epsilon}{p\epsilon+(1-p)(1-\epsilon)}$, $p_{X|Y}(1|1) = \dfrac{(1-p)(1-\epsilon)}{p\epsilon+(1-p)(1-\epsilon)}$.

From these we can compute the various entropies and the mutual information. Here we just show an easy route to the mutual information:
$$I(X;Y) = H(Y) - H(Y|X) = F\big(p(1-\epsilon) + (1-p)\epsilon\big) - F(\epsilon),$$
where $F(x) = x\log\frac{1}{x} + (1-x)\log\frac{1}{1-x}$. What would you expect $I(X;Y)$ to be for a BSC if $p = 0.5$, $p = 1$, $\epsilon = 0.5$, or $\epsilon = 0$? You do not need to calculate!

Example 1.15 (Noiseless Channel). A channel in which each output symbol can be produced by the occurrence of only one particular source symbol is called a noiseless channel: there is no noise or ambiguity about which input caused the output. An example is the channel
$$P = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 & 0 \\ 0 & 0 & 3/5 & 3/10 & 1/10 \end{bmatrix}.$$
We see that for a noiseless channel the channel matrix has one and only one nonzero element in each column. Also note that there may be more output symbols than source symbols; however, there cannot be fewer (Why?). For a noiseless channel, when we observe the output $y_j$ we know with probability 1 which input, say $x_0$, was sent; that is, $p_{X|Y}(x_0|y_j) = 1$ and $p_{X|Y}(x_i|y_j) = 0$ for all other $x_i \neq x_0$. The equivocation $H(X|Y)$ is therefore
$$H(X|Y) = -\sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j)\log P(x_i|y_j) = -\sum_{j=1}^{m} P(y_j) \sum_{i=1}^{n} P(x_i|y_j)\log P(x_i|y_j) = 0,$$
because for each $y_j$ only one $P(x_i|y_j)$ equals 1 and the others are zero. We then have the following result: for noiseless channels, $I(X;Y) = H(X)$. With a noiseless channel there is no uncertainty about the input upon observing the output, and the amount of information transmitted through the channel equals the information contained in the source. That is why we are in favor of noiseless channels.
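Returning to the BSC of Example 1.14, the closed form for I(X;Y) is easy to evaluate; a sketch that also answers the rhetorical question for the special values of p and ε:

```python
import math

def F(x):
    """Binary entropy function."""
    return 0.0 if x in (0.0, 1.0) else -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def I_bsc(p, eps):
    """I(X;Y) = H(Y) - H(Y|X) for a BSC with P[X=0] = p and crossover eps."""
    return F(p * (1 - eps) + (1 - p) * eps) - F(eps)

print(I_bsc(0.5, 0.1))   # maximal over p: 1 - F(0.1)
print(I_bsc(1.0, 0.1))   # 0: a deterministic source carries no information
print(I_bsc(0.3, 0.5))   # 0: eps = 1/2 makes the output independent of the input
print(I_bsc(0.3, 0.0))   # F(0.3): a noiseless channel delivers all of H(X)
```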

Example 1.16 (Deterministic Channel). A channel in which there may be more input symbols than output symbols, but where each input symbol is capable of producing only one of the output symbols, is called a deterministic channel. A channel matrix of this form, for instance
$$P = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},$$
has one and only one nonzero element in each row, and that element equals 1. For a deterministic channel we know with probability 1 which output symbol, say $y_j$, will be produced when $x_i$ is sent. Therefore $P(y_j|x_i) = 1$ for that $y_j$ and $P(y_{j'}|x_i) = 0$ for all other $y_{j'}$. The equivocation $H(Y|X) = 0$, following the same derivation as in the previous example. Hence we have the following result: for deterministic channels, $I(X;Y) = H(Y)$.

1.8 Cascaded Channel

A cascade of two channels is shown in Figure 1.5: the output of channel 1 is connected to the input of channel 2. When $x_i$ is sent through channel 1, the output is $y_j$; the same $y_j$ forms the input to channel 2, which produces the output $z_k$. If we know that the intermediate symbol is $y_j$, then the probability of obtaining $z_k$ at the output depends solely on $y_j$ and not on $x_i$. That is,
$$p_{Z|XY}(z_k \mid x_i, y_j) = p_{Z|Y}(z_k \mid y_j), \quad \forall i, j, k.$$
This relationship can in fact be viewed as the definition of a cascade of two channels. In the reverse direction we have
$$p_{X|YZ}(x_i \mid y_j, z_k) = p_{X|Y}(x_i \mid y_j).$$

Figure 1.5: Cascade of two channels: $X \to$ Channel 1 $\to Y \to$ Channel 2 $\to Z$.

Let us look at
$$
\begin{aligned}
H(X|Z) - H(X|Y) &= -\sum_{X,Z} P(x,z)\log P(x|z) + \sum_{X,Y} P(x,y)\log P(x|y) \\
&= \sum_{X,Y,Z} P(x,y,z)\log \frac{P(x|y)}{P(x|z)}
 = \sum_{X,Y,Z} P(x,y,z)\log \frac{P(x|y,z)}{P(x|z)} \\
&= \sum_{Y,Z} P(y,z) \sum_{X} P(x|y,z)\log \frac{P(x|y,z)}{P(x|z)} \\
&\geq \frac{1}{\ln 2}\sum_{Y,Z} P(y,z) \sum_{X} P(x|y,z)\left(1 - \frac{P(x|z)}{P(x|y,z)}\right) = 0,
\end{aligned}
$$
where we used $P(x|y) = P(x|y,z)$ for a cascade, and the inequality follows from (1.4). Hence $H(X|Z) \geq H(X|Y)$, with equality iff $p_{X|Z}(x|z) = p_{X|YZ}(x|y,z)$, i.e., $p_{X|Z}(x|z) = p_{X|Y}(x|y)$. Consequently, we have the following result.

Theorem 1.8. For the cascade of channels $X \to Y$ and $Y \to Z$,
$$I(X;Y) \geq I(X;Z), \qquad (1.21)$$
with equality iff $p_{X|Z}(x|z) = p_{X|Y}(x|y)$.

This result implies that information channels tend to leak: the information coming out at the end of a cascaded system can be no greater than (and is probably less than) the information available at an intermediate point.

Example 1.17. If the channel $Y \to Z$ is noiseless, then $p_{X|Z}(x|z) = p_{X|Y}(x|y)$, because $p_{X|Z}(x|z) = \sum_{y \in \mathcal{Y}} p_{X|Y}(x|y)\, p_{Y|Z}(y|z)$ and $p_{Y|Z}(y|z) = 1$ for only one specific $y$, by the property of noiseless channels. However, the condition $p_{X|Z}(x|z) = p_{X|Y}(x|y)$ can also be satisfied by noisy channels, as the next example shows.
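Theorem 1.8 is easy to check numerically by composing two channel matrices; a sketch using a cascade of two BSCs (the crossover probabilities are illustrative):

```python
import math

def mutual_info(pX, P):
    """I(X;Y) for input distribution pX and channel matrix P (rows = inputs)."""
    pY = [sum(pX[i] * P[i][j] for i in range(len(pX))) for j in range(len(P[0]))]
    return sum(pX[i] * P[i][j] * math.log2(P[i][j] / pY[j])
               for i in range(len(pX)) for j in range(len(P[0])) if P[i][j] > 0)

def compose(P1, P2):
    """Channel matrix of the cascade: P_XZ = P_XY * P_YZ (ordinary matrix product)."""
    return [[sum(P1[i][k] * P2[k][j] for k in range(len(P2)))
             for j in range(len(P2[0]))] for i in range(len(P1))]

bsc = lambda e: [[1 - e, e], [e, 1 - e]]
pX = [0.5, 0.5]
P_XY, P_YZ = bsc(0.1), bsc(0.2)

print(mutual_info(pX, P_XY))                  # I(X;Y)
print(mutual_info(pX, compose(P_XY, P_YZ)))   # I(X;Z), never larger: the cascade leaks
```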

Example 1.18. Consider the cascade of the channel $X \to Y$ with matrix
$$P_{XY} = \begin{bmatrix} 1/3 & 1/3 & 1/3 \\ 0 & 1/2 & 1/2 \end{bmatrix}$$
and the channel $Y \to Z$ with matrix
$$P_{YZ} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2/3 & 1/3 \\ 0 & 1/3 & 2/3 \end{bmatrix}.$$
You can verify that the channel $Y \to Z$ is not noiseless, but surprisingly it does not leak information, because $I(X;Z) = I(X;Y)$. Indeed,
$$P_{XZ} = P_{XY} P_{YZ} = P_{XY}.$$

1.9 Continuous Sources and Channels

Models for continuous sources and channels are necessary when we study analog communication systems. Even in digital communication systems, the signal transmission between the modulator and the demodulator is continuous in nature. A continuous source is a random process $X(t)$ with continuous amplitude. It can be memoryless or have memory, but we mainly consider memoryless sources, which can be described by a random variable $X$ representing one snapshot of the source. The associated probability density function (pdf) of $X$, $f_X(x)$, fully specifies the continuous information source.

A continuous channel maps a continuous source $X$ to an output $Y$ that is also continuous. The mapping is probabilistic. Such a channel is also called a waveform channel. Again, we consider memoryless channels, which can be described by the conditional pdf $f_{Y|X}(y|x)$.

Example 1.19 (Additive White Gaussian Noise Channel). The most popular channel model is the additive white Gaussian noise (AWGN) channel shown in Figure 1.6. It is described by the conditional pdf
$$f_{Y|X}(y|x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y-x)^2}{2\sigma^2}}. \qquad (1.22)$$

Figure 1.6: Additive white Gaussian noise (AWGN) channel: $Y = X + W$, where $W$ is the Gaussian noise.

There are also semi-continuous channels, in which one of $X$ or $Y$ is continuous and the other is discrete.

1.10 Mutual Information and Differential Entropy

Let us first consider extending the mutual information from discrete channels to continuous ones. Quantize the source $X$ and the output $Y$ by dividing their value sets into small intervals $\delta x$ and $\delta y$, respectively, and concentrating each small interval onto one value: $x_i = i\,\delta x$, $y_j = j\,\delta y$. The probabilities associated with $x_i$ and $y_j$ are related to the pdfs as
$$P(x_i) = f_X(i\delta x)\,\delta x, \qquad P(y_j) = f_Y(j\delta y)\,\delta y, \qquad P(x_i, y_j) = f_{XY}(i\delta x, j\delta y)\,\delta x\,\delta y.$$
We can use the mutual information of the quantized pair $X_d = \{x_i\}$, $Y_d = \{y_j\}$ to approximate the mutual information of $X$ and $Y$:
$$I(X;Y) \approx I(X_d;Y_d) = \sum_i \sum_j f_{XY}(i\delta x, j\delta y)\,\delta x\,\delta y\, \log \frac{f_{XY}(i\delta x, j\delta y)}{f_X(i\delta x)\, f_Y(j\delta y)}.$$
Letting $\delta x, \delta y \to 0$, we can expect the approximation to become exact; the limiting procedure turns the double summation into a double integration. We thus arrive at the following definition of mutual information for continuous channels.

Definition 1.10 (Mutual Information). The mutual information of $X$ and $Y$ is
$$I(X;Y) = \int_{\mathcal{X}} \int_{\mathcal{Y}} f_{XY}(x,y) \log \frac{f_{XY}(x,y)}{f_X(x)\, f_Y(y)}\, dx\, dy. \qquad (1.23)$$
Equation (1.23) also admits the alternative expressions
$$I(X;Y) = \int_{\mathcal{X}} \int_{\mathcal{Y}} f_X(x)\, f_{Y|X}(y|x) \log \frac{f_{Y|X}(y|x)}{f_Y(y)}\, dx\, dy = \int_{\mathcal{X}} \int_{\mathcal{Y}} f_Y(y)\, f_{X|Y}(x|y) \log \frac{f_{X|Y}(x|y)}{f_X(x)}\, dx\, dy. \qquad (1.24)$$

However, this quantization method does not carry over to the entropy, because
$$H(X_d) = -\sum_i f_X(i\delta x)\,\delta x\, \log\big[f_X(i\delta x)\,\delta x\big] = -\sum_i f_X(i\delta x) \log f_X(i\delta x)\;\delta x - \sum_i f_X(i\delta x)\,\delta x\, \log \delta x,$$
where the second term does not converge as $\delta x \to 0$. The solution is to take only the first, well-behaved term as the entropy of the continuous source $X$, which we call the differential entropy.

Definition 1.11 (Differential Entropy).
$$h(X) = \int_{\mathcal{X}} f_X(x) \log \frac{1}{f_X(x)}\, dx. \qquad (1.25)$$

Similarly, we have the joint differential entropy
$$h(X,Y) = \int_{\mathcal{X}} \int_{\mathcal{Y}} f_{XY}(x,y) \log \frac{1}{f_{XY}(x,y)}\, dx\, dy \qquad (1.26)$$
and the conditional differential entropy
$$h(Y|X) = \int_{\mathcal{X}} \int_{\mathcal{Y}} f_{XY}(x,y) \log \frac{1}{f_{Y|X}(y|x)}\, dx\, dy. \qquad (1.27)$$
Though these definitions look like a mere replacement of probabilities with pdfs, they no longer possess the mathematical elegance of the discrete ones. For example, $h(X)$ may be negative. More importantly, the usefulness of $h(X)$ depends on the existence of the integral, which is not necessarily finite. On the other hand, as long as $h(X)$ and $h(X|Y)$ exist, the relationship
$$I(X;Y) = h(X) - h(X|Y) \qquad (1.28)$$
holds.

Example 1.20. The differential entropy of a Gaussian source,
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}},$$
can be derived as
$$h(X) = E\left[ -\log\left( \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}} \right) \right] = \log(\sqrt{2\pi}\,\sigma) + \frac{\log e}{2}\, E\left[ \frac{(x-m)^2}{\sigma^2} \right] = \frac{1}{2}\log(2\pi e \sigma^2). \qquad (1.29)$$
If we view $\sigma^2$ as the power of $X$, we see that the differential entropy of a Gaussian source is determined by its power. The result (1.29) extends to $k$ jointly Gaussian sources:
$$h(X_1, \ldots, X_k) = \frac{1}{2}\log\big[(2\pi e)^k\, |\mathbf{P}|\big],$$
where $\mathbf{P}$ is the covariance matrix of $X_1, \ldots, X_k$.

Besides the Gaussian distribution, the uniform distribution, $f_X(x) = 1/(b-a)$, $a \leq x \leq b$, is another important distribution. Its differential entropy is
$$h(X) = \log(b-a), \qquad (1.30)$$
determined by the length of the interval. If $b - a < 1$, then $h(X) < 0$; so the non-negativity of $h(X)$ does not carry over to differential entropy.
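The closed forms (1.29) and (1.30) can be sanity-checked against a crude Monte Carlo estimate of h(X) = E[-log2 f_X(X)]; a sketch with arbitrary parameters and sample size:

```python
import math, random

sigma, m = 2.0, 0.0
h_gauss_formula = 0.5 * math.log2(2 * math.pi * math.e * sigma**2)   # eq. (1.29)

def f_gauss(x):
    return math.exp(-(x - m) ** 2 / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

samples = [random.gauss(m, sigma) for _ in range(200_000)]
h_gauss_mc = sum(-math.log2(f_gauss(x)) for x in samples) / len(samples)

a, b = 0.0, 0.5
h_unif = math.log2(b - a)        # eq. (1.30): negative here, since b - a < 1

print(round(h_gauss_formula, 3), round(h_gauss_mc, 3), round(h_unif, 3))
```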


More information

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes

Shannon meets Wiener II: On MMSE estimation in successive decoding schemes Shannon meets Wiener II: On MMSE estimation in successive decoding schemes G. David Forney, Jr. MIT Cambridge, MA 0239 USA forneyd@comcast.net Abstract We continue to discuss why MMSE estimation arises

More information

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay

Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Information Theory and Coding Prof. S. N. Merchant Department of Electrical Engineering Indian Institute of Technology, Bombay Lecture - 13 Competitive Optimality of the Shannon Code So, far we have studied

More information

EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm

EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm EE/Stats 376A: Homework 7 Solutions Due on Friday March 17, 5 pm 1. Feedback does not increase the capacity. Consider a channel with feedback. We assume that all the recieved outputs are sent back immediately

More information

Discrete Probability Refresher

Discrete Probability Refresher ECE 1502 Information Theory Discrete Probability Refresher F. R. Kschischang Dept. of Electrical and Computer Engineering University of Toronto January 13, 1999 revised January 11, 2006 Probability theory

More information

Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006

Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006 Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006 Fabio Grazioso... July 3, 2006 1 2 Contents 1 Lecture 1, Entropy 4 1.1 Random variable...............................

More information

Bounds on Mutual Information for Simple Codes Using Information Combining

Bounds on Mutual Information for Simple Codes Using Information Combining ACCEPTED FOR PUBLICATION IN ANNALS OF TELECOMM., SPECIAL ISSUE 3RD INT. SYMP. TURBO CODES, 003. FINAL VERSION, AUGUST 004. Bounds on Mutual Information for Simple Codes Using Information Combining Ingmar

More information

Investigation of the Elias Product Code Construction for the Binary Erasure Channel

Investigation of the Elias Product Code Construction for the Binary Erasure Channel Investigation of the Elias Product Code Construction for the Binary Erasure Channel by D. P. Varodayan A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF BACHELOR OF APPLIED

More information

ECE531: Principles of Detection and Estimation Course Introduction

ECE531: Principles of Detection and Estimation Course Introduction ECE531: Principles of Detection and Estimation Course Introduction D. Richard Brown III WPI 22-January-2009 WPI D. Richard Brown III 22-January-2009 1 / 37 Lecture 1 Major Topics 1. Web page. 2. Syllabus

More information

Appendix B Information theory from first principles

Appendix B Information theory from first principles Appendix B Information theory from first principles This appendix discusses the information theory behind the capacity expressions used in the book. Section 8.3.4 is the only part of the book that supposes

More information

Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information

Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information 1 Conditional entropy Let (Ω, F, P) be a probability space, let X be a RV taking values in some finite set A. In this lecture

More information

Review of Probability Theory

Review of Probability Theory Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving

More information

CHAPTER 12 Boolean Algebra

CHAPTER 12 Boolean Algebra 318 Chapter 12 Boolean Algebra CHAPTER 12 Boolean Algebra SECTION 12.1 Boolean Functions 2. a) Since x 1 = x, the only solution is x = 0. b) Since 0 + 0 = 0 and 1 + 1 = 1, the only solution is x = 0. c)

More information

Electrical and Information Technology. Information Theory. Problems and Solutions. Contents. Problems... 1 Solutions...7

Electrical and Information Technology. Information Theory. Problems and Solutions. Contents. Problems... 1 Solutions...7 Electrical and Information Technology Information Theory Problems and Solutions Contents Problems.......... Solutions...........7 Problems 3. In Problem?? the binomial coefficent was estimated with Stirling

More information

CS 630 Basic Probability and Information Theory. Tim Campbell

CS 630 Basic Probability and Information Theory. Tim Campbell CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)

More information

LECTURE 3. Last time:

LECTURE 3. Last time: LECTURE 3 Last time: Mutual Information. Convexity and concavity Jensen s inequality Information Inequality Data processing theorem Fano s Inequality Lecture outline Stochastic processes, Entropy rate

More information

A CLASSROOM NOTE: ENTROPY, INFORMATION, AND MARKOV PROPERTY. Zoran R. Pop-Stojanović. 1. Introduction

A CLASSROOM NOTE: ENTROPY, INFORMATION, AND MARKOV PROPERTY. Zoran R. Pop-Stojanović. 1. Introduction THE TEACHING OF MATHEMATICS 2006, Vol IX,, pp 2 A CLASSROOM NOTE: ENTROPY, INFORMATION, AND MARKOV PROPERTY Zoran R Pop-Stojanović Abstract How to introduce the concept of the Markov Property in an elementary

More information

Complex Systems Methods 2. Conditional mutual information, entropy rate and algorithmic complexity

Complex Systems Methods 2. Conditional mutual information, entropy rate and algorithmic complexity Complex Systems Methods 2. Conditional mutual information, entropy rate and algorithmic complexity Eckehard Olbrich MPI MiS Leipzig Potsdam WS 2007/08 Olbrich (Leipzig) 26.10.2007 1 / 18 Overview 1 Summary

More information

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY OUTLINE 3.1 Why Probability? 3.2 Random Variables 3.3 Probability Distributions 3.4 Marginal Probability 3.5 Conditional Probability 3.6 The Chain

More information

EE/Stat 376B Handout #5 Network Information Theory October, 14, Homework Set #2 Solutions

EE/Stat 376B Handout #5 Network Information Theory October, 14, Homework Set #2 Solutions EE/Stat 376B Handout #5 Network Information Theory October, 14, 014 1. Problem.4 parts (b) and (c). Homework Set # Solutions (b) Consider h(x + Y ) h(x + Y Y ) = h(x Y ) = h(x). (c) Let ay = Y 1 + Y, where

More information

William Stallings Copyright 2010

William Stallings Copyright 2010 A PPENDIX F M EASURES OF S ECRECY AND S ECURITY William Stallings Copyright 2010 F.1 PERFECT SECRECY...2! F.2 INFORMATION AND ENTROPY...8! Information...8! Entropy...10! Properties of the Entropy Function...12!

More information

Lecture 8: Shannon s Noise Models

Lecture 8: Shannon s Noise Models Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007) Lecture 8: Shannon s Noise Models September 14, 2007 Lecturer: Atri Rudra Scribe: Sandipan Kundu& Atri Rudra Till now we have

More information

6.1 Main properties of Shannon entropy. Let X be a random variable taking values x in some alphabet with probabilities.

6.1 Main properties of Shannon entropy. Let X be a random variable taking values x in some alphabet with probabilities. Chapter 6 Quantum entropy There is a notion of entropy which quantifies the amount of uncertainty contained in an ensemble of Qbits. This is the von Neumann entropy that we introduce in this chapter. In

More information

Chapter 7. Error Control Coding. 7.1 Historical background. Mikael Olofsson 2005

Chapter 7. Error Control Coding. 7.1 Historical background. Mikael Olofsson 2005 Chapter 7 Error Control Coding Mikael Olofsson 2005 We have seen in Chapters 4 through 6 how digital modulation can be used to control error probabilities. This gives us a digital channel that in each

More information

Lecture 8: Channel and source-channel coding theorems; BEC & linear codes. 1 Intuitive justification for upper bound on channel capacity

Lecture 8: Channel and source-channel coding theorems; BEC & linear codes. 1 Intuitive justification for upper bound on channel capacity 5-859: Information Theory and Applications in TCS CMU: Spring 23 Lecture 8: Channel and source-channel coding theorems; BEC & linear codes February 7, 23 Lecturer: Venkatesan Guruswami Scribe: Dan Stahlke

More information

An Alternative Proof of Channel Polarization for Channels with Arbitrary Input Alphabets

An Alternative Proof of Channel Polarization for Channels with Arbitrary Input Alphabets An Alternative Proof of Channel Polarization for Channels with Arbitrary Input Alphabets Jing Guo University of Cambridge jg582@cam.ac.uk Jossy Sayir University of Cambridge j.sayir@ieee.org Minghai Qin

More information

Chapter 8: Differential entropy. University of Illinois at Chicago ECE 534, Natasha Devroye

Chapter 8: Differential entropy. University of Illinois at Chicago ECE 534, Natasha Devroye Chapter 8: Differential entropy Chapter 8 outline Motivation Definitions Relation to discrete entropy Joint and conditional differential entropy Relative entropy and mutual information Properties AEP for

More information

Information Sources. Professor A. Manikas. Imperial College London. EE303 - Communication Systems An Overview of Fundamentals

Information Sources. Professor A. Manikas. Imperial College London. EE303 - Communication Systems An Overview of Fundamentals Information Sources Professor A. Manikas Imperial College London EE303 - Communication Systems An Overview of Fundamentals Prof. A. Manikas (Imperial College) EE303: Information Sources 24 Oct. 2011 1

More information