Chapter I: Fundamental Information Theory
ECE-S622/T62 Notes, Chapter I: Fundamental Information Theory. Ruifeng Zhang, Dept. of Electrical & Computer Eng., Drexel University.

1.1 Information Source

Information is the outcome of some physical process. Though information sources are highly complex and diverse in the real world, we study them from the viewpoint of communications engineering, using probabilistic models. An information source is modeled as a stochastic process {X(t); p(·)}, where p(·) is the (possibly infinite-dimensional) distribution of the process X(t). According to the time index t and the value domain of X(t), information sources can be divided into different categories. If t is continuous, the source is a continuous-time source, usually called a waveform source. If t is discrete, the source is called a discrete-time source. The set of values from which X(t) takes its values is denoted X. If X is continuous, the source is a continuous source; otherwise, it is a discrete source. The value set X of a discrete source is called the alphabet of the source, and each of its elements is called a symbol or letter. Discrete sources are of primary interest in the discussion of digital communications; therefore, we will study such sources first.

Example 1.1 An information source of dialed telephone numbers has the source alphabet X = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, #, *}. An information source of text would consist of the letters (both lower and upper case), the space, and various punctuation symbols: X = {a, ..., z, A, ..., Z, space, comma, period, ;, :, !, ?}. An information source representing the on/off status of a switch or actuator is a binary information source: X = {off, on}, or simply X = {0, 1}, with 0 representing off and 1 representing on.

Generally, we need an infinite-dimensional joint distribution to fully characterize a stochastic process and, therefore, an information source. However, if the values of X(t) at different time instances t are statistically independent, a one-dimensional distribution suffices.
Such a source is called a memoryless source. We further impose the identical-distribution condition on memoryless sources because we want them to also be stationary (Why?).

Definition 1.1 (Discrete Memoryless Source) A discrete source whose samples are independent and identically distributed over its alphabet is called a Discrete Memoryless Source (DMS).

For a DMS, we can omit the time index of the stochastic process X(t) and describe the source with a random variable X having alphabet X and probability mass function p_X(x) = P[X = x], x ∈ X. Our discussion of information theory will start from the DMS.
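As a concrete illustration, a DMS per Definition 1.1 can be simulated as i.i.d. draws; a minimal Python sketch (the alphabet and pmf below are made-up values, not from the notes):

```python
import random

# A hypothetical DMS over the binary alphabet {0, 1}:
# each output symbol is an independent draw from the same pmf p_X.
alphabet = [0, 1]
pmf = [0.25, 0.75]  # p_X(0) = 0.25, p_X(1) = 0.75 (illustrative values)

random.seed(0)  # fixed seed so the sketch is reproducible
samples = random.choices(alphabet, weights=pmf, k=10_000)

# Empirical frequencies approach the pmf as the sample grows;
# the i.i.d. property is exactly what makes the source memoryless.
freq1 = samples.count(1) / len(samples)
print(round(freq1, 3))
```

Because the draws are independent, no past output changes the distribution of the next one, which is the defining property used throughout this chapter.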
1.2 Self-information

When we say that something is informative, we imply that we have some unknowns or are not sure about that thing. Therefore, the information contained in an event must be associated with the uncertainty of that event in some sense; the higher the uncertainty of an event, the more information it contains. We thus need a measure of uncertainty in order to quantify information. There is not yet a unanimous measure of uncertainty (and I do not think that one exists). However, for a stochastically modeled information source as described above, it is possible to conveniently measure the uncertainty, and thus the information, with the probability distribution.

We are considering DMSes. We first want a proper definition of the information measure of the event that the source outputs a specific symbol, i.e., X = x ∈ X. Let us denote the information of this event as I(X = x). According to the discussion above, I(X = x) should depend on the probability P[X = x] = p. For some obvious reasons of usefulness, we desire the following properties (axioms, in mathematical language) regarding I(X = x) and p.

1. The information of an event is a function of its probability, i.e., I(X = x) = F(p);
2. The information is differentiable with respect to the probability, i.e., dF(p)/dp exists;
3. The information is monotonically decreasing in the probability, i.e., dF(p)/dp < 0;
4. Suppose that the source outputs two symbols in a row independently. The information of the event that the first symbol is x and the second is y equals the sum of the information of the two individual events X = x and X = y, i.e., I(X(1) = x, X(2) = y) = F(pq) = F(p) + F(q) = I(X = x) + I(X = y);
5. Deterministic events contain zero information and impossible events contain infinite information, i.e., F(1) = 0 and F(0) = ∞.
One can see that these requirements do match our intuition about the relation between information and uncertainty. Interestingly, only the following definition of I(X = x) satisfies all of the aforementioned requirements.

Definition 1.2 (Self-information) The self-information of an event X = x with probability p = P[X = x] is

I(X = x) = log(1/p) = -log p. (1.1)

That this definition meets all the desired properties is easily verified. We further note that the choice of the base of the logarithm does not matter much; it only affects the unit of the information measure. If base 2 is chosen, the resulting unit is called the bit (standing for binary digit). If the natural base e is chosen, the resulting unit is called the nat (standing for natural digit). If base 10 is chosen, the resulting unit is called the Hartley (also called the dit). In most cases, we use base 2. The subtle reason
for that will be clear as we proceed. Sometimes the natural base offers convenience in mathematical analysis and is preferred there. For simplicity, we denote the base-2 logarithm by log, the natural logarithm by ln, and the base-10 logarithm by lg (though we will rarely use it). Different information units are easily converted to one another. Using the change-of-base formula of the logarithm, 1 nat = log e ≈ 1.443 bits and 1 dit = log 10 ≈ 3.322 bits.

The last paragraph of this section is devoted to the mathematical justification of Definition 1.2, and is only for those who are curious about its uniqueness and insist on mathematical rigor.

Proof: From the fourth property in the list above, F(pq) = F(p) + F(q). Differentiating both sides with respect to p, we obtain

q dF(pq)/d(pq) = dF(p)/dp.

Similarly, differentiating with respect to q,

p dF(pq)/d(pq) = dF(q)/dq.

Combining the above two equations, we get

p dF(p)/dp = q dF(q)/dq.

Since this holds for arbitrary p and q, F(p) must satisfy

p dF(p)/dp = C,

where C is a constant. Consequently,

F(p) = C ln p + D,

where D is another constant. Since we require F(1) = 0, D = 0; and since we require dF(p)/dp < 0, C < 0. We are in favor of the base-2 logarithm, so we set C = -1/ln 2 and the desired result F(p) = -log p follows.

1.3 Entropy

Self-information measures the information of the outcome of one specific symbol in the alphabet of a source. To characterize the information content of the whole source, we (statistically) average the self-information over all symbols. This average information is called the entropy of the source.

Definition 1.3 (Entropy) The average information (per symbol), or entropy, H(X) of a DMS X with alphabet X = {x_i, i = 1, ..., n} and probability mass function p_X(x_i) = P[X = x_i] = p_i, is

H(X) = E[I(X = x_i)] = Σ_{i=1}^n p_i log(1/p_i) = -Σ_{i=1}^n p_i log p_i. (1.2)
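The definitions above can be checked numerically; a minimal sketch (function names are illustrative, not from the notes):

```python
import math

def self_info(p, base=2):
    # Self-information I(X = x) = log_base(1/p), per Definition 1.2
    return math.log(1.0 / p, base)

def entropy(pmf, base=2):
    # Entropy of Definition 1.3: H(X) = sum_i p_i log_base(1/p_i)
    return sum(p * math.log(1.0 / p, base) for p in pmf if p > 0)

# Unit conversion via change of base: 1 nat = log2(e) bits, 1 dit = log2(10) bits
bit_val = self_info(0.5)               # 1 bit
nat_val = self_info(0.5, base=math.e)  # ln 2 nats
print(bit_val, nat_val * math.log2(math.e))  # both values equal 1 bit
```

The same `entropy` helper reproduces the examples of the next section, e.g. `entropy([0.5, 0.25, 0.25])` gives 1.5 bits.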
The word entropy (from the Greek entrope, meaning change) was borrowed from thermodynamics, where it was first used by Clausius to measure the irreversible increase of non-disposable energy. Actually, (1.2) is very similar to Boltzmann's statistical definition of entropy.

Example 1.2 Consider the binary source X = {0: p, 1: (1 - p)}. Its entropy is H(X) = p log(1/p) + (1 - p) log(1/(1 - p)). If the source is equiprobable, i.e., p = 1/2, then H(X) = 1. We find that the equiprobable binary source carries on average 1 bit of information per source symbol. That is why we call a binary symbol (0 or 1) a bit: a message represented with N binary symbols contains N bits of information, and conversely, a message containing N bits of information needs N binary symbols to describe. The binary source is the simplest and therefore most convenient source for both theoretical and practical purposes, which is why the base-2 logarithm and the bit as information unit are prevalent.

Example 1.3 Consider the source X = {x_1: 0.5, x_2: 0.25, x_3: 0.25}. Then H(X) = 0.5 log 2 + 0.25 log 4 + 0.25 log 4 = 1.5 bits. Thus, a typical message from this source contains 1.5 bits of information per symbol; one symbol of the given source is equivalent in information content to 1.5 binary symbols.

Example 1.4 Listed below are the letters of the English alphabet with their relative frequencies. Treating English as a DMS, we can compute its entropy: H(X) ≈ 4.1 bits per letter.

Letter Frequency | Letter Frequency
A 0.0856 | N 0.0707
B 0.0139 | O 0.0797
C 0.0279 | P 0.0199
D 0.0378 | Q 0.0012
E 0.1304 | R 0.0677
F 0.0289 | S 0.0607
G 0.0199 | T 0.1045
H 0.0528 | U 0.0249
I 0.0627 | V 0.0092
J 0.0013 | W 0.0149
K 0.0042 | X 0.0017
L 0.0339 | Y 0.0199
M 0.0249 | Z 0.0008
Table 1.1: English Alphabet

The information entropy has the following properties.

Theorem 1.1 (Minimum and Maximum Entropy)
1. H(X) ≥ 0, with equality when p_i = 1 for one of the symbols x_i ∈ X.
2. H(X) ≤ log n for an alphabet of n symbols, with equality when p_i = 1/n for all symbols.
Theorem 1.1 gives the minimum and maximum of the entropy of an information source. The first result states that entropy is non-negative: there is no negative information. This is intuitive, because we cannot lose information to a message even if we get nothing from it. Zero entropy occurs when the source alphabet loses its randomness, since we get no information from a deterministic event. The second result tells us that the equiprobable source has the maximum entropy. Now we prove Theorem 1.1. Before doing so, we first prove the following lemma.

Lemma 1.1 (Fundamental inequality of information theory)

ln x ≤ x - 1. (1.3)

Proof: Let f(x) = ln x - x + 1. We have f'(1) = 0 and f''(1) = -1 < 0. Therefore, x = 1 is the maximum of f(x), i.e., f(x) ≤ f(1) = 0, and the desired result follows.

A corollary of Lemma 1.1 is

ln(1/x) ≥ 1 - x, (1.4)

which is obtained by replacing x with 1/x in (1.3).

Now let us prove Theorem 1.1.

Proof: The first result follows from

H(X) = Σ_{i=1}^n p_i log(1/p_i) = (1/ln 2) Σ_{i=1}^n p_i ln(1/p_i) ≥ (1/ln 2) Σ_{i=1}^n p_i (1 - p_i) = (1/ln 2)(1 - Σ_{i=1}^n p_i²) ≥ 0,

and the condition for equality is easily verified. For the second result, consider

H(X) - log n = Σ_{i=1}^n p_i log(1/p_i) - Σ_{i=1}^n p_i log n = Σ_{i=1}^n p_i log(1/(n p_i)) ≤ (1/ln 2) Σ_{i=1}^n p_i (1/(n p_i) - 1) = (1/ln 2)(Σ_{i=1}^n 1/n - Σ_{i=1}^n p_i) = 0,

and the desired result follows. The equality holds if and only if n p_i = 1, i.e., p_i = 1/n for all i.

Example 1.5 Using the expression obtained in Example 1.2, let us plot the entropy of the binary source as a function of the probability p. The plot is shown in Figure 1.1, from which we see that the maximum entropy is reached at p = 1/2, i.e., when the binary source is equiprobable.

1.4 Joint Entropy

Joint entropy is the obvious extension of entropy when we need to study two or more information sources.

Definition 1.4 (Joint Entropy) The joint entropy of k information sources, X_j ∈ X_j, j = 1, ..., k, with joint probability mass function p_{X_1...X_k}(x_1, ..., x_k) = P[X_1 = x_1, ..., X_k = x_k], is defined as

H(X_1, ..., X_k) = -Σ_{x_1 ∈ X_1} ... Σ_{x_k ∈ X_k} p_{X_1...X_k}(x_1, ..., x_k) log p_{X_1...X_k}(x_1, ..., x_k). (1.5)
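A numerical check of Theorem 1.1 and of the English-letter entropy of Example 1.4 (a sketch; the frequency list transcribes Table 1.1):

```python
import math

def entropy(pmf):
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

# Relative frequencies of A..Z as in Table 1.1
freqs = [0.0856, 0.0139, 0.0279, 0.0378, 0.1304, 0.0289, 0.0199,
         0.0528, 0.0627, 0.0013, 0.0042, 0.0339, 0.0249, 0.0707,
         0.0797, 0.0199, 0.0012, 0.0677, 0.0607, 0.1045, 0.0249,
         0.0092, 0.0149, 0.0017, 0.0199, 0.0008]

H = entropy(freqs)
# Theorem 1.1: 0 <= H(X) <= log2(n), with the maximum at the uniform pmf
print(round(H, 2), round(math.log2(26), 2))
```

Because the frequencies are far from uniform, H lands well below the log2(26) ≈ 4.70 bound of Theorem 1.1.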
Figure 1.1: Plot of H(X) = -p log p - (1 - p) log(1 - p)

Example 1.6 Consider two binary sources X_1 and X_2 sharing the same alphabet X = {0, 1} but having different probability mass functions. For X_1, p_1(0) = p_1(1) = 0.5, while for X_2, p_2(0) = 1/3 and p_2(1) = 2/3. We assume independence of the two sources, i.e., p_{X_1,X_2}(x_1, x_2) = p_1(x_1) p_2(x_2). Then

H(X_1, X_2) = (1/6) log 6 + (1/3) log 3 + (1/6) log 6 + (1/3) log 3 ≈ 1.9183 bits per symbol pair.

We want to emphasize that H(X_1, X_2) is defined as the average bits of information per pair of symbols, in contrast with H(X), which is defined as the average bits of information per symbol. If the information per symbol of the joint source is of interest, we can divide H(X_1, X_2) by 2 to get what we want, which is 0.9592 bits of information per symbol. You may compare this number with H(X_1) = 1 and H(X_2) = 0.9183 to see how the entropy computed jointly differs from those computed individually.

An important application of joint entropy is describing the information content of extended sources.

Definition 1.5 (The k-th Extension of an information source) Let X be a source with alphabet X = {x_1, ..., x_n}. The k-th extension of X, denoted X^k, is a source with alphabet X^k = {σ_1, ..., σ_{n^k}}, each σ_i corresponding to a length-k block of symbols from X, i.e., σ_i = (x_{i,1}, ..., x_{i,k}), x_{i,j} ∈ X.

Example 1.7 Consider the binary source X = {0, 1}. Its 2nd extension is X² = {00, 01, 10, 11} and its 3rd extension is X³ = {000, 001, 010, 011, 100, 101, 110, 111}.

The entropy of the extended source X^k can be computed using the formula for joint entropy, with all component sources sharing the same alphabet. The complexity lies in needing the k-dimensional distribution of X. However, for a DMS the k-dimensional probability factors into one-dimensional mass functions, p_{X^k}(σ_i) = p_X(x_{i,1}) ... p_X(x_{i,k}). Therefore, the
entropy of the k-th extension of a DMS X is

H(X^k) = Σ_{i=1}^{n^k} p_{X^k}(σ_i) log(1/p_{X^k}(σ_i))
       = Σ_{i=1}^{n^k} p_X(x_{i,1}) ... p_X(x_{i,k}) log(1/(p_X(x_{i,1}) ... p_X(x_{i,k})))
       = Σ_{i_1=1}^n ... Σ_{i_k=1}^n p_X(x_{i_1}) ... p_X(x_{i_k}) Σ_{j=1}^k log(1/p_X(x_{i_j}))
       = Σ_{j=1}^k Σ_{i_j=1}^n p_X(x_{i_j}) log(1/p_X(x_{i_j}))
       = k H(X) bits per symbol block, (1.6)

where the fourth line uses the fact that each one-dimensional mass function sums to 1. Note that H(X^k)/k = H(X): each symbol contains the same amount of information in the extended source as in the original source. This fact holds for DMSes. For sources with memory, however, each symbol contains less information in the extended source. (Why?)

1.5 Conditional Entropy

Conditional entropy quantifies the information of a source when the information of other sources is available. Consider two sources X and Y with alphabets X and Y. The conditional probability that Y = y ∈ Y when X = x ∈ X is p_{Y|X}(y|x). By simple analogy to self-information, we can define the conditional self-information of Y = y given X = x as

I(Y = y | X = x) = log(1/p_{Y|X}(y|x)).

The average information of Y conditioned on X = x can then be written as

H(Y | X = x) = Σ_{y ∈ Y} p_{Y|X}(y|x) log(1/p_{Y|X}(y|x)).

Furthermore, averaging over all possible symbols x ∈ X, we obtain

H(Y|X) = Σ_{x ∈ X} p_X(x) Σ_{y ∈ Y} p_{Y|X}(y|x) log(1/p_{Y|X}(y|x)) = Σ_{x ∈ X} Σ_{y ∈ Y} p_{X,Y}(x, y) log(1/p_{Y|X}(y|x)). (1.7)

The above equation is the definition of the conditional entropy of Y given X. It is easily generalized to the case of multiple conditioning sources.
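Returning to the source extension above, the identity H(X^k) = kH(X) for a DMS can be verified directly, here with the alphabet of Example 1.3 (a sketch):

```python
import math
from itertools import product

def entropy(pmf):
    return sum(p * math.log2(1.0 / p) for p in pmf if p > 0)

# DMS of Example 1.3; by memorylessness, the probability of each length-k
# block factors into a product of one-symbol probabilities.
p = [0.5, 0.25, 0.25]
k = 3
ext = [math.prod(block) for block in product(p, repeat=k)]

print(entropy(p), entropy(ext))  # H(X) and H(X^k) = k * H(X)
```

The 27 block probabilities sum to 1, and the block entropy is exactly three times the per-symbol entropy, as (1.6) predicts.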
Definition 1.6 (Conditional Entropy) The conditional entropy of the information source X_k given sources X_1, ..., X_{k-1} is

H(X_k | X_1, ..., X_{k-1}) = -Σ_{x_1 ∈ X_1} ... Σ_{x_k ∈ X_k} p_{X_1...X_k}(x_1, ..., x_k) log p_{X_k|X_1...X_{k-1}}(x_k | x_1, ..., x_{k-1}). (1.8)

Example 1.8 Consider sources X ∈ {0: 0.5, 1: 0.5} and Y ∈ {0, 1}. The conditional probability of Y given X is p_{Y|X}(0|0) = 0.25, p_{Y|X}(0|1) = 0.6, p_{Y|X}(1|0) = 0.75, p_{Y|X}(1|1) = 0.4. The conditional entropy of Y given X is then

H(Y|X) = (0.25)(0.5) log(1/0.25) + (0.75)(0.5) log(1/0.75) + (0.6)(0.5) log(1/0.6) + (0.4)(0.5) log(1/0.4) ≈ 0.8911.

We can also derive the marginal probability of Y from the conditional probability and the probability of X: p_Y(0) = 0.425, p_Y(1) = 0.575. Then the (unconditional) entropy of Y is H(Y) ≈ 0.9837 bits per symbol. Note that the conditional entropy is smaller than the unconditional entropy.

The fact noted in the above example is a general result about conditional entropy.

Theorem 1.2 (Conditioning Reduces Uncertainty)

H(X|Y) ≤ H(X). (1.9)

Proof:

H(X|Y) - H(X) = Σ_{y ∈ Y} Σ_{x ∈ X} p_{XY}(x, y) log(1/p_{X|Y}(x|y)) - Σ_{x ∈ X} p_X(x) log(1/p_X(x))
             = Σ_{y ∈ Y} Σ_{x ∈ X} p_{XY}(x, y) log(p_X(x)/p_{X|Y}(x|y))
             = Σ_{y ∈ Y} Σ_{x ∈ X} p_{XY}(x, y) log(p_X(x) p_Y(y)/p_{XY}(x, y))
             ≤ (1/ln 2) Σ_{y ∈ Y} Σ_{x ∈ X} p_{XY}(x, y) [p_X(x) p_Y(y)/p_{XY}(x, y) - 1] = 0.

This property is very intuitive: conditioning gives us some information about the considered source and thus reduces its remaining uncertainty. Conditional entropy and joint entropy are related by the chain rule.

Theorem 1.3 (Chain Rule)

H(X, Y) = H(X) + H(Y|X). (1.10)
Proof:

H(X, Y) = Σ_{x ∈ X} Σ_{y ∈ Y} p_{XY}(x, y) log(1/p_{XY}(x, y))
        = Σ_{x ∈ X} Σ_{y ∈ Y} p_{XY}(x, y) log(1/(p_{Y|X}(y|x) p_X(x)))
        = Σ_{x ∈ X} Σ_{y ∈ Y} p_{XY}(x, y) log(1/p_X(x)) + Σ_{x ∈ X} Σ_{y ∈ Y} p_{XY}(x, y) log(1/p_{Y|X}(y|x))
        = H(X) + H(Y|X).

Corollary 1.1

H(X, Y | Z) = H(X|Z) + H(Y | X, Z). (1.11)

Proof: Along the same lines as the theorem.

Example 1.9 Continuing from Example 1.8, we can derive the joint probabilities as p_{XY}(0, 0) = p_{Y|X}(0|0) p_X(0) = (0.25)(0.5) = 0.125, p_{XY}(0, 1) = 0.375, p_{XY}(1, 0) = 0.3, p_{XY}(1, 1) = 0.2. Therefore, the joint entropy of X and Y is H(X, Y) ≈ 1.8911. In addition, the entropy of X is obviously H(X) = 1, and in Example 1.8 we obtained H(Y|X) ≈ 0.8911. We thus see that H(X, Y) = H(X) + H(Y|X).

A final remark: H(Y|X) ≠ H(X|Y) in general. However, H(X) - H(X|Y) = H(Y) - H(Y|X), a property that we shall exploit later.

1.6 Communication Channels

Shannon gave a very abstract but precise model for communication systems: an information source, an output, and a communication channel in between them. This model is shown in Figure 1.2. We consider the so-called discrete channel as the first step. A discrete channel is a channel for which both the source and the output are discrete processes. It can be modeled simply as a probabilistic mapping of the source alphabet to the output alphabet. If the discrete channel is additionally memoryless (from one source symbol to another), the mappings of successive source symbols are independent and the description becomes simple.

Definition 1.7 (Discrete Memoryless Channel) A discrete memoryless channel (DMC) between the information source X ∈ X = {x_1, ..., x_n} and the output Y ∈ Y = {y_1, ..., y_m} is a set of conditional probabilities p_{Y|X}(y_j | x_i) = p_ij, standing for the probability that the output symbol y_j is received when the source symbol x_i is sent. Note that the channel may change the transmitted symbol into another one or introduce new symbols.
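The chain rule and the numbers in Examples 1.8 and 1.9 can be reproduced in a few lines (a sketch; helper names are illustrative):

```python
import math

def H(probs):
    # Entropy of a pmf given as a flat list of probabilities
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Numbers from Examples 1.8 and 1.9: p_X = (1/2, 1/2) and the
# conditional pmfs p_{Y|X}(.|0) = (0.25, 0.75), p_{Y|X}(.|1) = (0.6, 0.4).
pX = [0.5, 0.5]
pY_given_X = [[0.25, 0.75], [0.6, 0.4]]

# H(Y|X) = sum_x p(x) H(Y | X = x), per (1.7)
HYgX = sum(px * H(row) for px, row in zip(pX, pY_given_X))

# Joint pmf p(x, y) = p(x) p(y|x); chain rule: H(X,Y) = H(X) + H(Y|X)
joint = [px * pyx for px, row in zip(pX, pY_given_X) for pyx in row]
print(round(HYgX, 4), round(H(joint), 4))
```

The printed values match the 0.8911 and 1.8911 of the examples, and the second equals H(X) + H(Y|X) with H(X) = 1.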
Figure 1.2: Information-Theoretic Model for Communication Systems

It is convenient to organize the conditional probabilities into a matrix,

P = [ p_{Y|X}(y_1|x_1) p_{Y|X}(y_2|x_1) ... p_{Y|X}(y_m|x_1) ]   [ p_11 p_12 ... p_1m ]
    [ p_{Y|X}(y_1|x_2) p_{Y|X}(y_2|x_2) ... p_{Y|X}(y_m|x_2) ] = [ p_21 p_22 ... p_2m ]
    [ ...                                                     ]   [ ...                ]
    [ p_{Y|X}(y_1|x_n) p_{Y|X}(y_2|x_n) ... p_{Y|X}(y_m|x_n) ]   [ p_n1 p_n2 ... p_nm ]

This matrix is usually referred to as the channel matrix. Each row of the channel matrix P corresponds to an input of the channel, and each column of P corresponds to an output of the channel. Since if we send x_i we must receive some y_j, we have

Σ_{j=1}^m p_ij = 1, i = 1, ..., n.

We also usually represent the channel graphically, as in Figure 1.3.

Figure 1.3: Channel Transition Graph

Example 1.10 A source emits symbols {0, 1} and the receiver receives symbols {0, 1} as well.

1. If the channel is noiseless and deterministic, then p(0|0) = p(1|1) = 1 and p(1|0) = p(0|1) = 0. The channel matrix is

P = [ 1 0 ]
    [ 0 1 ]

This is called the binary deterministic channel.
2. If the channel introduces 1% bit-inversion errors, then p(0|0) = p(1|1) = 0.99 and p(1|0) = p(0|1) = 0.01. The channel matrix is

P = [ 0.99 0.01 ]
    [ 0.01 0.99 ]

The general case of P(0|0) = P(1|1) = 1 - ε and P(1|0) = P(0|1) = ε is called the binary symmetric channel (BSC). The binary deterministic channel described above is a special case of the BSC.

3. In general, the errors in a binary channel depend on the symbol transmitted, i.e., P(1|0) ≠ P(0|1). For example,

P = [ 0.8 0.2 ]
    [ 0.3 0.7 ]

Example 1.11 (Binary Erasure Channel (BEC)) A binary erasure channel has a binary source {0, 1} and a ternary output {0, ?, 1}, where ? means a decision cannot be made on whether a 0 or a 1 was sent (the output is erased).

P = [ 1-q q 0   ]
    [ 0   q 1-q ]

If the source symbols are sent with probabilities p_X(x_i), i = 1, ..., n, the output symbols will appear with some other set of probabilities p_Y(y_j), j = 1, ..., m, which can be derived using the total probability formula:

p_Y(y_j) = Σ_{i=1}^n p_{Y|X}(y_j | x_i) p_X(x_i), (1.12)

for any given channel p_{Y|X}(y_j | x_i). From Bayes' law, we also get

p_{X|Y}(x_i | y_j) = p_{XY}(x_i, y_j) / p_Y(y_j) = p_{Y|X}(y_j | x_i) p_X(x_i) / p_Y(y_j), (1.13)

which is the probability that the input x_i was sent given that the output y_j was received. We call p_{X|Y}(x_i | y_j) the backward probability and p_{Y|X}(y_j | x_i) the forward probability. Note that if we are given p_Y(y_j) and p_{Y|X}(y_j | x_i), it may not be possible to invert (1.12) to determine p_X(x_i): there may be many source distributions p_X(x_i) that lead to the same output distribution p_Y(y_j) for a given channel p_{Y|X}(y_j | x_i). But if we are given p_X(x_i) and p_{Y|X}(y_j | x_i), we always have a unique p_Y(y_j).
Example 1.12 A binary channel which includes the source probabilities p_X(0) = 3/4 and p_X(1) = 1/4 can be represented by the channel matrix

P = [ 2/3  1/3  ]
    [ 1/10 9/10 ]

Now we can derive the output probabilities,

p_Y(0) = (2/3)(3/4) + (1/10)(1/4) = 21/40,
p_Y(1) = (9/10)(1/4) + (1/3)(3/4) = 19/40,

and the backward probabilities,

p_{X|Y}(0|0) = (2/3)(3/4) / (21/40) = 20/21,   p_{X|Y}(1|0) = (1/10)(1/4) / (21/40) = 1/21,
p_{X|Y}(0|1) = (1/3)(3/4) / (19/40) = 10/19,   p_{X|Y}(1|1) = (9/10)(1/4) / (19/40) = 9/19.

1.7 Equivocation and Mutual Information

Now let us study how the information content of a source is affected by a communication channel. We first need the information measure of the source before and after the channel (i.e., when the output is available). The a priori entropy of a source X is just its regular entropy,

H(X) = -Σ_{i=1}^n P(x_i) log P(x_i). (1.14)

It is easy to guess that the a posteriori entropy should be the conditional entropy of X given the channel output Y, which we call the equivocation of X with respect to Y.

Definition 1.8 (Equivocation) The equivocation of X with respect to Y is the conditional entropy of X given Y,

H(X|Y) = -Σ_{i=1}^n Σ_{j=1}^m p_{XY}(x_i, y_j) log p_{X|Y}(x_i | y_j). (1.15)
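The forward probabilities (1.12) and backward probabilities (1.13) of Example 1.12 can be reproduced exactly with rational arithmetic (a sketch):

```python
from fractions import Fraction as F

# Channel of Example 1.12: rows are inputs 0/1, columns are outputs 0/1
P = [[F(2, 3), F(1, 3)],
     [F(1, 10), F(9, 10)]]
pX = [F(3, 4), F(1, 4)]

# Forward, total probability (1.12): p_Y(j) = sum_i p_X(i) P[i][j]
pY = [sum(pX[i] * P[i][j] for i in range(2)) for j in range(2)]

# Backward, Bayes (1.13): p_{X|Y}(i|j) = p_X(i) P[i][j] / p_Y(j)
back = [[pX[i] * P[i][j] / pY[j] for j in range(2)] for i in range(2)]
print(pY, back[0][0], back[1][1])
```

Exact fractions avoid any rounding and reproduce 21/40, 19/40, 20/21, and 9/19 directly.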
The equivocation gives the information of the source X after its response at the output of the channel is available. From the discussion of conditional entropy in the previous section, we know that H(X|Y) ≤ H(X): after transmission through the channel p_{Y|X}(y_j | x_i), the uncertainty about X has decreased. If we remember that information is uncertainty, then we have less uncertainty about the source X after we get the channel output Y; in other words, we are more certain about the source after we observe its response at the channel output. This agrees entirely with intuition. It is then natural to regard the difference between H(X) and H(X|Y) as the information we extract about X through knowing Y. In other words, H(X) - H(X|Y) gives the information about X conveyed by Y. This quantity is named the mutual information.

Definition 1.9 (Mutual Information) The mutual information of X and Y is defined as

I(X; Y) = H(X) - H(X|Y). (1.16)

Alternative expressions for I(X; Y) can be obtained as follows:

I(X; Y) = -Σ_{i=1}^n p_X(x_i) log p_X(x_i) + Σ_{i=1}^n Σ_{j=1}^m p_{XY}(x_i, y_j) log p_{X|Y}(x_i | y_j)
        = Σ_{i=1}^n Σ_{j=1}^m p_{XY}(x_i, y_j) log (p_{X|Y}(x_i | y_j) / p_X(x_i))
        = Σ_{i=1}^n Σ_{j=1}^m p_{XY}(x_i, y_j) log (p_{XY}(x_i, y_j) / (p_X(x_i) p_Y(y_j)))
        = Σ_{i=1}^n Σ_{j=1}^m p_{XY}(x_i, y_j) log (p_{Y|X}(y_j | x_i) / p_Y(y_j)). (1.17)

The following properties of the mutual information I(X; Y) can be observed.

Theorem 1.4

I(X; Y) ≥ 0, (1.18)

with equality if and only if p_{XY}(x_i, y_j) = p_X(x_i) p_Y(y_j) for all i, j.

Proof: Use the fundamental inequality of information theory, (1.3).

Theorem 1.5 I(X; X) = H(X). Therefore, source entropy can be viewed as a special case of mutual information.

Theorem 1.6

I(X; Y) = I(Y; X). (1.19)
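Theorems 1.4 and 1.6 can be checked on the joint pmf of Example 1.9 (a sketch; `mutual_info` is an illustrative helper implementing (1.17)):

```python
import math

def mutual_info(joint):
    # I(X;Y) from a joint pmf given as a 2-D list, per (1.17):
    # sum_ij p(x_i, y_j) log2( p(x_i, y_j) / (p_X(x_i) p_Y(y_j)) )
    pX = [sum(row) for row in joint]
    pY = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (pX[i] * pY[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

# Joint pmf of Example 1.9; transposing the table swaps the roles of
# X and Y, so the two calls below illustrate the symmetry of Theorem 1.6.
joint = [[0.125, 0.375], [0.3, 0.2]]
I_xy = mutual_info(joint)
I_yx = mutual_info([list(col) for col in zip(*joint)])
print(round(I_xy, 4), round(I_yx, 4))
```

The two values agree, and both are non-negative as Theorem 1.4 requires.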
Theorem 1.6 means that mutual information is symmetric with respect to X and Y, as is easily seen from (1.17). It is interesting to investigate the relations between the entropies H(X) and H(Y), the joint entropy H(X, Y), the equivocations H(X|Y) and H(Y|X), and the mutual information I(X; Y) = I(Y; X), which are summarized in Figure 1.4 and the following theorem.

Theorem 1.7

H(X, Y) = H(X) + H(Y) - I(X; Y) = H(X) + H(Y|X) = H(Y) + H(X|Y). (1.20)

The proof of this relationship is just an exercise in manipulating probabilities, joint probabilities, and conditional probabilities.

Figure 1.4: Relationship between entropies, equivocations, and mutual information

Example 1.13 Use the specifications given in Example 1.12. Before seeing an output from the channel, our a priori knowledge of the information source is H(X) = (3/4) log(4/3) + (1/4) log 4 ≈ 0.8113. After seeing the channel output, say y_j = 0, our knowledge of the input source becomes H(X | y_j = 0) = (20/21) log(21/20) + (1/21) log 21 ≈ 0.276. We can similarly obtain H(X | y_j = 1) ≈ 0.998. We see that we are more certain that a 0 was sent when we observe a 0, because the uncertainty (entropy) of the source X is reduced from 0.8113 to 0.276 when we receive a 0. On the other hand, when we receive a 1, we are more uncertain about which x_i was sent (with almost equal uncertainty about whether it was a 0 or a 1). On average, the equivocation of X with respect to Y is

H(X|Y) = 0.276 (21/40) + 0.998 (19/40) ≈ 0.619 < 0.8113 = H(X).

The mutual information of X and Y is I(X; Y) ≈ 0.8113 - 0.619 = 0.192. That means we get about 0.192 bits of information per symbol received. Other quantities include H(Y) = (21/40) log(40/21) + (19/40) log(40/19) ≈ 0.9982, H(Y|X) = H(Y) - I(Y; X) = H(Y) - I(X; Y) ≈ 0.806, and H(X, Y) = H(X) + H(Y) - I(X; Y) ≈ 1.617.

Example 1.14 For a BSC (cf. Example 1.10), the following probabilities can be obtained:

p_X(x_i): p_X(0) = p, p_X(1) = 1 - p;
p_{Y|X}(y_j | x_i): p_{Y|X}(0|0) = p_{Y|X}(1|1) = 1 - ε, p_{Y|X}(1|0) = p_{Y|X}(0|1) = ε;

p_{XY}(x_i, y_j): p_{XY}(0, 0) = p(1 - ε), p_{XY}(1, 1) = (1 - p)(1 - ε), p_{XY}(0, 1) = pε, p_{XY}(1, 0) = (1 - p)ε;

p_Y(y_j): p_Y(0) = p(1 - ε) + (1 - p)ε, p_Y(1) = pε + (1 - p)(1 - ε);

p_{X|Y}(x_i | y_j): p_{X|Y}(0|0) = p(1 - ε) / (p(1 - ε) + (1 - p)ε), p_{X|Y}(1|0) = (1 - p)ε / (p(1 - ε) + (1 - p)ε), p_{X|Y}(0|1) = pε / (pε + (1 - p)(1 - ε)), p_{X|Y}(1|1) = (1 - p)(1 - ε) / (pε + (1 - p)(1 - ε)).

Then we can compute the various entropies and the mutual information. Here we just show an easy route to the mutual information:

I(X; Y) = H(Y) - H(Y|X) = F(p(1 - ε) + (1 - p)ε) - F(ε),

where F(x) = x log(1/x) + (1 - x) log(1/(1 - x)) is the binary entropy function. What would you expect I(X; Y) to be for a BSC if p = 0.5, p = 1, ε = 0.5, or ε = 0? You do not need to calculate!

Example 1.15 (Noiseless Channel) A channel in which each output symbol can be produced by the occurrence of only one particular source symbol is called a noiseless channel: there is no noise or ambiguity about which input caused the output. An example is

P = [ 1/2 1/2 0   0    0    ]
    [ 0   0   3/5 3/10 1/10 ]

We see that for a noiseless channel, the channel matrix has one and only one nonzero element in each column. Also note that there may be more output symbols than source symbols; however, there cannot be fewer (Why?). For a noiseless channel, when we observe the output y_j we know with probability 1 which input, say x_0, was sent; that is, p_{X|Y}(x_0 | y_j) = 1 for x_0 and p_{X|Y}(x_i | y_j) = 0 for all other x_i ≠ x_0. The equivocation H(X|Y) is

H(X|Y) = -Σ_{i=1}^n Σ_{j=1}^m P(x_i, y_j) log p(x_i | y_j) = -Σ_{j=1}^m P(y_j) Σ_{i=1}^n P(x_i | y_j) log P(x_i | y_j) = 0,

because for each y_j exactly one P(x_i | y_j) equals 1 and the others are zero. Then we have the following result: for noiseless channels,

I(X; Y) = H(X).

That means that with a noiseless channel there is no uncertainty about the input upon observing the output, and the amount of information transmitted through the channel is the same as the information contained in the source.
That is why we favor noiseless channels.

Example 1.16 (Deterministic channel) A channel in which there are more possible input symbols than output symbols, but where each input symbol is capable of producing only one of the output
symbols, is called a deterministic channel. An example of a deterministic channel is

P = [ 1 0 ]
    [ 1 0 ]
    [ 1 0 ]
    [ 0 1 ]
    [ 0 1 ]

We can see that a deterministic channel has a channel matrix with one and only one nonzero element in each row. For a deterministic channel, we know with probability 1 which output symbol, say y_i, will be produced when x_i is sent. Therefore, P(y | x_i) = 1 for y = y_i and P(y_j | x_i) = 0 for the other y_j. The equivocation H(Y|X) = 0, following the same derivation as in the previous example. Hence, we have the following result: for deterministic channels,

I(X; Y) = H(Y).

1.8 Cascaded Channels

A cascade of two channels is shown in Figure 1.5: the output of channel 1 is connected to the input of channel 2. When x_i is sent through channel 1, the output is y_j. The same y_j forms the input to channel 2, which produces the output z_k. If we know that the intermediate symbol is y_j, then the probability of obtaining z_k at the output depends solely on y_j and not on x_i. That is,

p_{Z|XY}(z_k | x_i, y_j) = p_{Z|Y}(z_k | y_j), for all i, j, k.

Actually, this relationship can be viewed as a definition of a cascade of two channels. In the reverse direction, we have p_{X|YZ}(x_i | y_j, z_k) = p_{X|Y}(x_i | y_j).

Figure 1.5: Cascade of Two Channels
Let us look at

H(X|Z) - H(X|Y) = Σ_{X,Z} P(x, z) log(1/P(x|z)) - Σ_{X,Y} P(x, y) log(1/P(x|y))
               = Σ_{X,Y,Z} P(x, y, z) log(P(x|y)/P(x|z))
               = Σ_{X,Y,Z} P(x, y, z) log(P(x|y, z)/P(x|z))
               = Σ_{Y,Z} P(y, z) Σ_X P(x|y, z) log(P(x|y, z)/P(x|z))
               ≥ (1/ln 2) Σ_{Y,Z} P(y, z) Σ_X P(x|y, z) (1 - P(x|z)/P(x|y, z)) = 0,

where the third line uses the cascade property P(x|y) = P(x|y, z). Hence H(X|Z) ≥ H(X|Y), with equality iff p_{X|Z}(x|z) = p_{X|YZ}(x|y, z), i.e., p_{X|Z}(x|z) = p_{X|Y}(x|y). Consequently, we have the following result.

Theorem 1.8 For the cascade of channels X → Y and Y → Z,

I(X; Y) ≥ I(X; Z), (1.21)

with equality iff p_{X|Z}(x|z) = p_{X|Y}(x|y).

This result implies that information channels tend to leak: the information coming out at the end of a cascaded system can be no greater (and is probably less) than the information available at an intermediate point.

Example 1.17 If the channel Y → Z is noiseless, then p_{X|Z}(x|z) = p_{X|Y}(x|y), because p_{X|Z}(x|z) = Σ_{y ∈ Y} p_{X|Y}(x|y) p_{Y|Z}(y|z) and p_{Y|Z}(y|z) = 1 only for one specific y, by the property of noiseless channels. However, the condition p_{X|Z}(x|z) = p_{X|Y}(x|y) can also be satisfied by noisy channels.

Example 1.18 Consider the cascade of channel X → Y,

P_XY = [ 1/3 1/3 1/3 ]
       [ 1/2 0   1/2 ]

with channel Y → Z,

P_YZ = [ 2/3 0 1/3 ]
       [ 0   1 0   ]
       [ 1/3 0 2/3 ]

which gives
P_XZ = P_XY P_YZ = P_XY.

You can verify that channel Y → Z is not noiseless, but surprisingly it does not leak information, because I(X; Z) = I(X; Y); indeed, the cascade channel P_XZ coincides with P_XY itself.

1.9 Continuous Sources and Channels

Models for continuous sources and channels are necessary when we study analogue communication systems. Even in digital communication systems, the signal transmission between the modulator and demodulator is continuous in nature. A continuous source is a random process X(t) with continuous amplitude. It can be memoryless or have memory, but we mainly consider memoryless sources, which can be described by a random variable X representing one snapshot of the source. The associated probability density function (pdf) of X, f_X(x), fully specifies the continuous information source.

A continuous channel maps a continuous source X to an output Y which is also continuous. The mapping is probabilistic. Such a channel is also called a waveform channel. Again, we consider memoryless channels, which can be described by the conditional pdf f_{Y|X}(y|x).

Example 1.19 (Additive white Gaussian noise channel) The most popular channel model is the additive white Gaussian noise (AWGN) channel, Y = X + W, shown in Figure 1.6. The channel can be described by the conditional pdf

f_{Y|X}(y|x) = (1/sqrt(2πσ²)) e^{-(y - x)² / (2σ²)}. (1.22)

Figure 1.6: Additive White Gaussian Noise (AWGN) Channel

There are also semi-continuous channels, in which one of X or Y is continuous and the other is discrete.
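Returning to Example 1.18, the no-leak claim can be checked numerically (a sketch; the zero entries of the matrices are taken from the row-sum constraints, and the equiprobable input pmf is an arbitrary choice):

```python
import math
from fractions import Fraction as Fr

def mutual_info(pX, P):
    # I(X;Y) for input pmf pX through channel matrix P (rows = inputs)
    pY = [sum(pX[i] * P[i][j] for i in range(len(pX))) for j in range(len(P[0]))]
    return sum(float(pX[i] * P[i][j]) * math.log2(P[i][j] / pY[j])
               for i in range(len(pX)) for j in range(len(P[0])) if P[i][j] > 0)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Channels of Example 1.18, in exact rationals
P_XY = [[Fr(1, 3), Fr(1, 3), Fr(1, 3)],
        [Fr(1, 2), Fr(0), Fr(1, 2)]]
P_YZ = [[Fr(2, 3), Fr(0), Fr(1, 3)],
        [Fr(0), Fr(1), Fr(0)],
        [Fr(1, 3), Fr(0), Fr(2, 3)]]
P_XZ = matmul(P_XY, P_YZ)

pX = [Fr(1, 2), Fr(1, 2)]  # any input pmf would do here
print(P_XZ == P_XY, mutual_info(pX, P_XY), mutual_info(pX, P_XZ))
```

Exact rational arithmetic shows P_XZ equals P_XY entry by entry, so the equality case of Theorem 1.8 holds even though Y → Z is noisy.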
1.10 Mutual Information and Differential Entropy

Let us first consider extending the mutual information defined for discrete channels to continuous ones. Consider quantizing the source $X$ and the output $Y$ by dividing their value sets into small intervals of width $\delta x$ and $\delta y$, respectively, and concentrating each small interval onto a single value:
$$x_i = i\,\delta x, \qquad y_j = j\,\delta y.$$
The probabilities associated with $x_i$ and $y_j$ are related to the pdfs in the following way:
$$P(x_i) = f_X(i\,\delta x)\,\delta x, \quad P(y_j) = f_Y(j\,\delta y)\,\delta y, \quad P(x_i, y_j) = f_{XY}(i\,\delta x, j\,\delta y)\,\delta x\,\delta y.$$
We can use the mutual information of $X_d = \{x_i\}$ and $Y_d = \{y_j\}$ to approximate the mutual information of $X$ and $Y$:
$$I(X;Y) \approx I(X_d;Y_d) = \sum_i \sum_j f_{XY}(i\,\delta x, j\,\delta y)\,\delta x\,\delta y\, \log\frac{f_{XY}(i\,\delta x, j\,\delta y)}{f_X(i\,\delta x)\, f_Y(j\,\delta y)}.$$
Letting $\delta x, \delta y \to 0$, we can expect the approximation to become exact. Note that the limiting procedure changes the double summation into a double integration. Finally, we arrive at the following definition of mutual information for continuous channels.

Definition 1.10 (Mutual Information) The mutual information of $X$ and $Y$ is
$$I(X;Y) = \int_X \int_Y f_{XY}(x,y)\,\log\frac{f_{XY}(x,y)}{f_X(x)\,f_Y(y)}\,dx\,dy \tag{1.23}$$
(1.23) also allows the following alternative expressions:
$$I(X;Y) = \int_X \int_Y f_X(x)\,f_{Y|X}(y|x)\,\log\frac{f_{Y|X}(y|x)}{f_Y(y)}\,dx\,dy = \int_X \int_Y f_Y(y)\,f_{X|Y}(x|y)\,\log\frac{f_{X|Y}(x|y)}{f_X(x)}\,dx\,dy \tag{1.24}$$
However, this quantization method does not carry over to entropy, because
$$H(X_d) = -\sum_i f_X(i\,\delta x)\,\delta x\,\log\big(f_X(i\,\delta x)\,\delta x\big) = -\sum_i f_X(i\,\delta x)\,\log\big(f_X(i\,\delta x)\big)\,\delta x \;-\; \sum_i f_X(i\,\delta x)\,\delta x\,\log\delta x,$$
where the second term behaves like $-\log\delta x$ and does not converge as $\delta x \to 0$. The solution is to take only the first, well-behaved term as the entropy of the continuous source $X$; we call it the differential entropy.

Definition 1.11 (Differential Entropy)
$$h(X) = \int_X f_X(x)\,\log\frac{1}{f_X(x)}\,dx \tag{1.25}$$
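The two-term decomposition of $H(X_d)$ above can be observed numerically: as $\delta x$ shrinks, $H(X_d)$ itself blows up like $-\log\delta x$, while $H(X_d) + \log\delta x$ settles at the differential entropy. A small sketch for a zero-mean, unit-variance Gaussian source (the grid span and the step sizes are arbitrary choices of this illustration; entropies in bits):

```python
import math

def f(x):
    """pdf of a zero-mean, unit-variance Gaussian."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def quantized_entropy(dx, span=10.0):
    """H(X_d) in bits, where X_d places mass P(x_i) = f(i*dx)*dx on x_i = i*dx."""
    H = 0.0
    for i in range(-int(span / dx), int(span / dx) + 1):
        p = f(i * dx) * dx
        if p > 0:
            H -= p * math.log2(p)
    return H

h_true = 0.5 * math.log2(2 * math.pi * math.e)   # (1/2) log2(2*pi*e*sigma^2), sigma = 1

for dx in (0.1, 0.01, 0.001):
    H = quantized_entropy(dx)
    # H(X_d) diverges like -log2(dx), while H(X_d) + log2(dx) approaches h(X)
    print(dx, H, H + math.log2(dx))
```

Each tenfold refinement of the grid adds roughly $\log_2 10 \approx 3.32$ bits to $H(X_d)$, which is exactly the divergent term the differential entropy discards.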
Similarly, we have the joint differential entropy
$$h(X,Y) = \int_X \int_Y f_{XY}(x,y)\,\log\frac{1}{f_{XY}(x,y)}\,dx\,dy \tag{1.26}$$
and the conditional differential entropy
$$h(Y|X) = \int_X \int_Y f_{XY}(x,y)\,\log\frac{1}{f_{Y|X}(y|x)}\,dx\,dy \tag{1.27}$$
Though these definitions look like a mere replacement of the probabilities with pdfs, they no longer possess the mathematical elegance of the discrete ones. For example, $h(X)$ may be negative. More importantly, the usefulness of $h(X)$ depends on the existence of the integral, which is not necessarily finite. On the other hand, as long as $h(X)$ and $h(X|Y)$ exist, the relationship
$$I(X;Y) = h(X) - h(X|Y) \tag{1.28}$$
will hold.

Example 1.12 The differential entropy of a Gaussian source,
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}},$$
can be derived as
$$h(X) = E\left[-\log\left(\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-m)^2}{2\sigma^2}}\right)\right] = \log(\sqrt{2\pi}\,\sigma) + \frac{\log e}{2}\, E\left[\frac{(x-m)^2}{\sigma^2}\right] = \frac{1}{2}\log(2\pi e \sigma^2) \tag{1.29}$$
If we view $\sigma^2$ as the power of $X$, we see that the differential entropy of a Gaussian source is determined by its power. The result of (1.29) can be extended to $k$ jointly Gaussian sources:
$$h(X_1,\ldots,X_k) = \frac{1}{2}\log\big[(2\pi e)^k\,|\mathbf{P}|\big],$$
where $\mathbf{P}$ is the covariance matrix of $X_1,\ldots,X_k$.

Besides the Gaussian distribution, the uniform distribution, $f_X(x) = 1/(b-a)$, $a \le x \le b$, is another important distribution. Its differential entropy is
$$h(X) = \log(b-a), \tag{1.30}$$
determined by the length of the interval. If $b-a < 1$, then $h(X) < 0$; so the positivity of $h(X)$ does not hold in general.
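Both closed forms above, and the fact that differential entropy can be negative, can be confirmed by numerically integrating the defining integral for $h(X)$. A minimal sketch (midpoint rule; the integration ranges and step counts are arbitrary choices; entropies in bits):

```python
import math

def diff_entropy(f, a, b, n=200_000):
    """Evaluate h(X) = -integral of f(x) log2 f(x) dx over [a, b] (midpoint rule)."""
    dx = (b - a) / n
    h = 0.0
    for i in range(n):
        fx = f(a + (i + 0.5) * dx)
        if fx > 0:
            h -= fx * math.log2(fx) * dx
    return h

# Gaussian with sigma = 2: closed form is (1/2) log2(2*pi*e*sigma^2)
sigma = 2.0
def gauss(x):
    return math.exp(-x * x / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

print(diff_entropy(gauss, -40.0, 40.0))            # matches the closed form
print(0.5 * math.log2(2 * math.pi * math.e * sigma**2))

# Uniform on [0, 1/2]: h = log2(b - a) = log2(1/2), i.e. a negative value
def uni(x):
    return 2.0
print(diff_entropy(uni, 0.0, 0.5))                 # about -1.0
```

The uniform case shows the point made above: with $b - a = 1/2 < 1$, the differential entropy comes out to about $-1$ bit.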