I - Information theory basics

1 Introduction

To communicate, that is, to carry information between two points, we can employ analog or digital transmission techniques. In digital communications the message is constituted by sequences of bits. Digital transmissions offer the following advantages:

1. Robustness to noise and interference that cannot be attained with analog systems (through the use of source and channel coding and a convenient transmission rate in bits/second).

2. Integration of several information sources (analog and digital) in a common format.

3. Security of the information along the path between the source and the destination (through the use of encrypted messages and spread spectrum techniques).

4. Efficient storage of large amounts of data in optical or magnetic media.

5. Flexibility in the transmission of information through the communication network by formatting data in packets (data + origin and destination addresses + packet number).

Fig. 1 depicts a point-to-point digital communication system. The transmission channel is the physical transmission medium used to connect the source of information (transmitter) to the user (receiver). Different types of channels can be defined, depending on the part of the system we are analyzing. Between the modulator output and the demodulator input we have a continuous channel modeled, for instance, according to Fig. 2. In this case, the channel is completely characterized by the probability density function of the noise $w(t)$. A common channel in Telecommunications is the AWGN (additive white Gaussian noise) channel, where $w(t)$ is additive Gaussian noise with power spectral density $G_w(f) = N_0/2$. If, alternatively, we consider in Fig. 1 the channel encoder output and the decoder input, we have a discrete channel that accepts symbols $x_i$ of an input alphabet X provided by the channel encoder and
produces symbols $y_j$ belonging to an output alphabet Y. When X and Y contain the same symbols, $y_j$ is an estimate of the transmitted symbol $x_i$.

Figure 1: Functional block diagram of a point-to-point digital communication system (discrete source, source encoder, channel encoder, modulator, transmission channel, demodulator, channel decoder, source decoder, user).

Figure 2: Diagram of the AWGN channel, $y(t) = x(t) + w(t)$.

2 Source encoding

Consider that the message consists of a sequence of symbols selected from a finite set, named the source alphabet. In general, we can associate a given probability with the occurrence of each symbol of the alphabet. Besides, the successive emitted symbols may be statistically independent or exhibit some type of dependency between them. In the former case we say that the source is memoryless.

The amount of information carried by a given symbol of the alphabet depends on the uncertainty of its occurrence. For instance, given the sentences "a dog bit a man" and "a man bit a dog", the amount of information is larger in the latter sentence because the probability of occurrence of the latter event is smaller (an event with probability one corresponds to a null amount of information).

Consider a finite alphabet X formed by M symbols $\{x_i\}_{i=1}^{M}$ and define a message as a sequence of independent symbols $x(n)$, $n = 0, 1, \ldots$, with n denoting time. A probability of occurrence $p_i = \mathrm{Prob}(x_i)$ exists associated with each symbol. The amount of information corresponding to that symbol is

$$I(x_i) = \log_2 \frac{1}{p_i} \quad \text{(bits)}$$
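As a quick numerical check of this definition, the short Python sketch below evaluates $I(x_i)$ for a few probabilities (the function name and the sample values are illustrative choices, not part of the original notes):

```python
import math

def self_information(p):
    """Amount of information I(x) = log2(1/p) carried by a symbol of probability p, in bits."""
    return math.log2(1.0 / p)

# The rarer the symbol, the larger its information content:
print(self_information(0.5))    # 1.0 bit
print(self_information(0.125))  # 3.0 bits
print(self_information(1.0))    # 0.0 bits: a sure event carries no information
```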
In order to characterize the alphabet we define the average content of information (or entropy) of X

$$H(X) = \sum_{i=1}^{M} p_i I(x_i) = \sum_{i=1}^{M} p_i \log_2 \frac{1}{p_i}$$

which is expressed in bits/symbol.

Example: A source alphabet consists of four symbols with probabilities $p_1 = 1/2$, $p_2 = 1/4$, $p_3 = p_4 = 1/8$. The source entropy is given by

$$H(X) = \frac{1}{2}\log_2 2 + \frac{1}{4}\log_2 4 + 2\cdot\frac{1}{8}\log_2 8 = 1.75 \text{ bits/symbol}$$

A problem that arises is the encoding of each source symbol through a binary code word (using binary symbols 0 and 1). Since the way of encoding each symbol is not unique, this leads to the question of optimizing the encoding process in the sense of minimizing the average number of bits (binary symbols) used to transmit the message. A classical source encoding example is the Morse code, where the letters A..Z, the numbers 0..9 and some punctuation marks are encoded in binary words constituted by dashes and dots. Let $\bar{L}$ be the average length of the code words, given by

$$\bar{L} = \sum_{i=1}^{M} p_i l_i,$$

where $l_i$ is the length (in bits) of the code word associated with the symbol $x_i$. It can be proven that the average length of the code words presents a minimum value such that $\bar{L} \geq H(X)$ in order to allow the discrete memoryless source X to be encoded and uniquely decoded (without ambiguity), that is, in such a way that, to each finite sequence of bits, there corresponds at most one message. A sufficient condition that allows the code to be uniquely decodable and instantaneous (each word is immediately decoded after its occurrence) is that no code word is the prefix of another longer code word (prefix code).
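The entropy and the average code-word length just defined are straightforward to compute. The sketch below (a minimal Python illustration, with function names of our choosing) reproduces the 1.75 bits/symbol of the example and shows a prefix code that attains the bound $\bar{L} \geq H(X)$ with equality:

```python
import math

def entropy(probs):
    """H(X) = sum_i p_i log2(1/p_i), in bits/symbol (terms with p_i = 0 contribute nothing)."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def average_length(probs, lengths):
    """Average code-word length L = sum_i p_i l_i, in bits/word."""
    return sum(p * l for p, l in zip(probs, lengths))

probs = [1/2, 1/4, 1/8, 1/8]
print(entropy(probs))                        # 1.75 bits/symbol, as in the example
# The prefix code 0, 10, 110, 111 reaches the lower bound L >= H(X) with equality:
print(average_length(probs, [1, 2, 3, 3]))   # 1.75 bits/word
```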
For instance, the following code

symbols   code words
$x_1$     0
$x_2$     10
$x_3$     110
$x_4$     1100

is ambiguous, or not uniquely decodable, because the sequence of bits 1100 can represent either the symbols $x_3 + x_1$ or the symbol $x_4$. But the next code is decodable without ambiguity (prefix code):

symbols   code words
$x_1$     0
$x_2$     10
$x_3$     110
$x_4$     1110

The Huffman procedure can be used to build uniquely decodable codes:

step 1. Order the M symbols according to decreasing values of their probabilities.

step 2. Group the last two symbols, $x_{M-1}$ and $x_M$, into an equivalent symbol with probability $p_{M-1} + p_M$.

step 3. Repeat steps 1 and 2 until only one symbol is left.

step 4. Using the tree generated by the previous steps, associate the binary symbols 0 and 1 with each pair of branches originating from a given intermediate node. The code word of each message symbol is written (from left to right) as the binary sequence read from the root of the tree to the corresponding leaf (the bits are thus collected in the reverse order of their assignment, from right to left).

Example: Determine a Huffman code for the following source

symbols   probabilities, $p_i$
$x_1$     0.4
$x_2$     0.25
$x_3$     0.20
$x_4$     0.1
$x_5$     0.05

Solution:
A possible solution is

symbols   code words
$x_1$     1
$x_2$     01
$x_3$     000
$x_4$     0010
$x_5$     0011

corresponding to the Huffman tree of Fig. 3.

Figure 3: Example of a Huffman tree (successive groupings of the two least probable symbols: 0.4, 0.25, 0.2, 0.1, 0.05 → 0.4, 0.25, 0.2, 0.15 → 0.4, 0.35, 0.25 → 0.6, 0.4 → 1.0).

The efficiency of the resulting code is defined as

$$\eta = \frac{H}{\bar{L}},$$

where

$$H = 0.4\log_2\frac{1}{0.4} + 0.25\log_2\frac{1}{0.25} + 0.2\log_2\frac{1}{0.2} + 0.1\log_2\frac{1}{0.1} + 0.05\log_2\frac{1}{0.05} = 2.04 \text{ bits/symbol}$$

and

$$\bar{L} = 0.4\cdot 1 + 0.25\cdot 2 + 0.20\cdot 3 + 0.1\cdot 4 + 0.05\cdot 4 = 2.1 \text{ bits/word},$$

yielding $\eta = H/\bar{L} = 97.2\%$.

The Huffman algorithm, proposed in 1952, requires a probabilistic source model. This data compression technique was later surpassed by the Lempel-Ziv algorithm (invented in 1978), which is adaptive and does not require knowledge of the source distribution model.
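Steps 1 to 4 above translate directly into a priority-queue implementation. The following Python sketch (our own illustrative rendering, using a binary heap to keep the symbols ordered) reproduces the word lengths 1, 2, 3, 4, 4 of the example; the exact bit patterns may differ from Fig. 3, since the 0/1 labelling of each pair of branches is arbitrary:

```python
import heapq

def huffman_code(probs):
    """Build a binary Huffman code: repeatedly group the two least probable
    (equivalent) symbols and prepend one branch bit to every leaf below them."""
    # Heap entries: (probability, tie-breaker, indices of the leaves in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    codes = [""] * len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)   # least probable subtree
        p2, t, s2 = heapq.heappop(heap)   # second least probable subtree
        for i in s1:                      # reading from the root reverses the
            codes[i] = "1" + codes[i]     # order in which the bits were assigned,
        for i in s2:                      # so each new bit is prepended
            codes[i] = "0" + codes[i]
        heapq.heappush(heap, (p1 + p2, t, s1 + s2))
    return codes

print(huffman_code([0.4, 0.25, 0.20, 0.10, 0.05]))
# Word lengths 1, 2, 3, 4, 4 -> average length 2.1 bits/word, as computed above
```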
The Lempel-Ziv algorithm is nowadays the most popular data compression technique; when applied to English texts it allows a compaction of about 55%, whereas the Huffman algorithm achieves about 43% compaction. Note that the purpose of source encoding is to reduce the source code redundancy and not to protect against channel errors. That task is assigned to channel encoding, to be discussed later in this course.

3 Gaussian channel capacity

The optimal digital system is the one that minimizes the bit error probability when certain constraints are imposed on the transmitted energy and channel bandwidth. An issue is the possibility of transmitting data without bit errors through a noisy channel. This problem was solved by Claude Shannon in 1948, who showed that, for an AWGN channel, it is possible to transmit data with a bit error probability as small as desired (virtually tending to zero) provided that the transmission rate (in bits/second) is smaller than the channel capacity

$$C = B \log_2\left(1 + \frac{P}{N_0 B}\right) \quad \text{bits/s}$$

where B is the channel bandwidth in Hz, P is the average power of the received signal in watts and $P/(N_0 B)$ is the reception signal-to-noise ratio. The channel capacity theorem establishes the theoretical limit that actual communication systems can achieve, although it does not specify which modulation and encoding/decoding techniques are to be used to attain that limit.

Example: What is the capacity of the AWGN channel with bandwidth B = 10 kHz when the signal-to-noise ratio is: a) 0 dB; b) 20 dB?

Solution:
a) $C = 10^4 \log_2 2 = 10$ kbit/s
b) $C = 10^4 \log_2 101 = 66.6$ kbit/s
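The two capacities of the example are easy to verify numerically. In this small Python sketch (function and variable names are ours), the signal-to-noise ratio is first converted from dB to a linear scale:

```python
import math

def awgn_capacity(bandwidth_hz, snr_db):
    """C = B log2(1 + P/(N0 B)), in bits/s, with the SNR P/(N0 B) given in dB."""
    snr = 10.0 ** (snr_db / 10.0)
    return bandwidth_hz * math.log2(1.0 + snr)

B = 10e3  # 10 kHz
print(awgn_capacity(B, 0.0))   # 10000.0  -> 10 kbit/s   (SNR = 1)
print(awgn_capacity(B, 20.0))  # ~66582   -> 66.6 kbit/s (SNR = 100)
```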
Let $E_b$ be the average bit energy and $R_b$ the transmission rate in bits/second. The Shannon theorem may be re-written as

$$\frac{C}{B} = \log_2\left(1 + \frac{E_b R_b}{N_0 B}\right).$$

But $R_b \leq C$; thus

$$R_b \leq B \log_2\left(1 + \frac{E_b R_b}{N_0 B}\right),$$

or

$$\frac{E_b}{N_0} \geq \frac{2^{R_b/B} - 1}{R_b/B}.$$

This inequality gives us the minimum value of the bit signal-to-noise ratio for transmissions with arbitrarily small error probabilities. If we now allow the channel bandwidth to increase to infinity, the asymptotic value of the capacity is

$$C_\infty = \lim_{B\to\infty} C = \lim_{B\to\infty} \frac{B \ln\left(1 + \frac{P}{N_0 B}\right)}{\ln 2} = \frac{1}{\ln 2}\lim_{B\to\infty} \ln\left(1 + \frac{P}{N_0 B}\right)^{B}$$

But $\lim_{n\to\infty}(1 + 1/n)^n = e$, leading to

$$C_\infty = \frac{P}{N_0 \ln 2}$$

where $P = E_b/T_b = r_b E_b$ ($r_b = 1/T_b$ is the transmission rate in bits/s). But $r_b < C_\infty$, so

$$\frac{E_b}{N_0} > \ln 2 \simeq -1.6 \text{ dB}$$

This value is the absolute minimum for communications with virtually null error probabilities, and is named the Shannon limit.

Example: Determine the minimum bit signal-to-noise ratio needed to transmit with an arbitrarily small error probability at the rate of 1 kbit/second when the channel bandwidth is a) B = 1 kHz, b) B = 100 Hz.

Solution:
a) $R_b/B = 1$, $E_b/N_0 \geq 1$ (0 dB)
b) $R_b/B = 10$, $E_b/N_0 \geq 102.3$ (20.1 dB)
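The bound $E_b/N_0 \geq (2^{R_b/B}-1)/(R_b/B)$ and its limiting value ln 2 can be checked with a few lines of Python (an illustrative sketch; the names are ours):

```python
import math

def min_ebn0(spectral_efficiency):
    """Minimum Eb/N0 = (2^(Rb/B) - 1)/(Rb/B) for arbitrarily small error probability."""
    r = spectral_efficiency  # Rb/B, in bits/s/Hz
    return (2.0 ** r - 1.0) / r

for r in (1.0, 10.0):  # the two cases of the example
    e = min_ebn0(r)
    print(f"Rb/B = {r:4.1f}: Eb/N0 >= {e:6.1f} ({10.0 * math.log10(e):4.1f} dB)")

# As Rb/B -> 0 (infinite bandwidth) the bound tends to the Shannon limit ln 2:
print(min_ebn0(1e-9), math.log(2))  # both ~0.693 (-1.6 dB)
```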
4 Discrete memoryless channel

A discrete channel is characterized by an input alphabet $X = \{x_i\}$, $i = 1, \ldots, M$, an output alphabet $Y = \{y_j\}$, $j = 1, \ldots, N$, and a set of conditional probabilities $p_{ij}$, where $p_{ij} = P(y_j|x_i)$ represents the probability of receiving the symbol $y_j$ when symbol $x_i$ was transmitted (see Fig. 4). It is assumed that the channel does not have memory, that is,

$$P(y(1), \ldots, y(n)\,|\,x(1), \ldots, x(n)) = \prod_{i=1}^{n} P(y(i)\,|\,x(i))$$

where $x(i)$ and $y(i)$ are respectively the channel input and output symbols that occur at the discrete time i, with $i = 1, \ldots, n$.

Figure 4: Model of the discrete memoryless channel (inputs $x_1, \ldots, x_M$, outputs $y_1, \ldots, y_N$, branches labelled with the transition probabilities $p_{11}, p_{12}, \ldots, p_{MN}$).

In general we have

$$\sum_{j=1}^{N} p_{ij} = 1, \quad i = 1, \ldots, M$$

that is, the sum of all the transition probabilities associated with the same input symbol is equal to one. It is usual to organize the transition probabilities in the so-called channel matrix

$$P = \begin{bmatrix} p_{11} & \cdots & p_{1N} \\ p_{21} & \cdots & p_{2N} \\ \vdots & \ddots & \vdots \\ p_{M1} & \cdots & p_{MN} \end{bmatrix}$$
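In code, a channel matrix is just a list of rows, one per input symbol. The sketch below (a minimal Python check, using an arbitrary hypothetical channel with 2 inputs and 3 outputs) verifies the row-sum property:

```python
def is_channel_matrix(P, tol=1e-9):
    """Check that sum_j p_ij = 1 for every input symbol x_i."""
    return all(abs(sum(row) - 1.0) < tol for row in P)

# Hypothetical channel with M = 2 inputs and N = 3 outputs:
P = [[0.7, 0.2, 0.1],
     [0.1, 0.3, 0.6]]
print(is_channel_matrix(P))  # True
```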
For $M = N$ we define the average error probability as

$$P(e) = \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j\neq i}}^{N} P(x_i, y_j) = \sum_{i=1}^{N} P(x_i) \sum_{\substack{j=1 \\ j\neq i}}^{N} P(y_j|x_i) = \sum_{i=1}^{N} P(x_i) \sum_{\substack{j=1 \\ j\neq i}}^{N} p_{ij} = \sum_{i=1}^{N} P(x_i)(1 - p_{ii})$$

whereas the probability of receiving correctly the transmitted symbol is

$$P(c) = 1 - P(e) = \sum_{i=1}^{N} P(x_i)\, p_{ii}$$

noiseless channel. We have $M = N$ and the transition probabilities are

$$p_{ij} = \begin{cases} 1, & j = i \\ 0, & j \neq i \end{cases}$$

Thus, $P(e) = 0$.

useless channel. We have $M = N$ and the output symbols are independent of the input symbols:

$$p_{ij} = P(y_j|x_i) = P(y_j), \quad \forall\, i, j$$

The noiseless channel and the useless channel are the extreme cases of possible channel behavior. The output symbol of the noiseless channel uniquely defines the input symbol. In the useless channel the received symbol does not give any useful information about the transmitted symbol.

symmetric channel. In this channel each row of P contains the same set of values $\{r_j\}$, $j = 1, \ldots, N$, and each column contains the same set of values $\{q_i\}$, $i = 1, \ldots, M$. Examples:

$$P = \begin{bmatrix} 1/2 & 1/3 & 1/6 \\ 1/6 & 1/2 & 1/3 \\ 1/3 & 1/6 & 1/2 \end{bmatrix}, \qquad P = \begin{bmatrix} 1/3 & 1/3 & 1/6 & 1/6 \\ 1/6 & 1/6 & 1/3 & 1/3 \end{bmatrix}$$
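For a square channel matrix, the error probability formula above needs only the diagonal terms. Here is a minimal Python sketch (the input distribution and the matrix values are illustrative assumptions, not from the text):

```python
def error_probability(p_in, P):
    """P(e) = sum_i P(x_i)(1 - p_ii) for a square channel matrix (M = N)."""
    return sum(p_in[i] * (1.0 - P[i][i]) for i in range(len(p_in)))

# Illustrative 3-symbol channel and input distribution:
P = [[0.9, 0.05, 0.05],
     [0.1, 0.80, 0.10],
     [0.0, 0.20, 0.80]]
p_in = [0.5, 0.3, 0.2]
print(error_probability(p_in, P))        # P(e) = 0.15
print(1.0 - error_probability(p_in, P))  # P(c) = 0.85
```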
Using the channel input and output alphabets, respectively X and Y, and the channel matrix P, we can define the following five entropies.

(i) input entropy H(X)

$$H(X) = \sum_{i=1}^{M} P(x_i) \log_2\left(\frac{1}{P(x_i)}\right) \quad \text{bit/symbol}$$

which measures the average amount of information of each symbol of X.

(ii) output entropy H(Y)

$$H(Y) = \sum_{j=1}^{N} P(y_j) \log_2\left(\frac{1}{P(y_j)}\right) \quad \text{bit/symbol}$$

which measures the average amount of information of each symbol of Y.

(iii) joint entropy H(X,Y)

$$H(X,Y) = \sum_{i=1}^{M}\sum_{j=1}^{N} P(x_i, y_j) \log_2\left(\frac{1}{P(x_i, y_j)}\right) \quad \text{bit/(pair of symbols)}$$

which measures the average information content of a pair of output and input channel symbols.

(iv) conditional entropy H(Y|X)

$$H(Y|X) = \sum_{i=1}^{M}\sum_{j=1}^{N} P(x_i, y_j) \log_2\left(\frac{1}{P(y_j|x_i)}\right) \quad \text{bit/symbol}$$

which measures the average amount of information required to specify the output (received) symbol when the input (transmitted) symbol is known.

(v) conditional entropy H(X|Y)

$$H(X|Y) = \sum_{i=1}^{M}\sum_{j=1}^{N} P(x_i, y_j) \log_2\left(\frac{1}{P(x_i|y_j)}\right) \quad \text{bit/symbol}$$

which measures the average amount of information required to specify the input symbol when the output symbol is known. This conditional entropy represents the average amount of information that is lost in the channel (or equivocation). It can also be conceived as the uncertainty about the channel input after the observation of the channel output. Note that for a noiseless channel there is no loss of information in the channel and we have $H(X|Y) = 0$, whereas in the useless channel we have $H(X|Y) = H(X)$. In this case, the uncertainty about the transmitted symbol remains unaltered by the observation (reception) of the output symbol (all the information was lost in the channel).

Using the previous entropy definitions and the facts that $H(X|Y) \leq H(X)$ and $H(Y|X) \leq H(Y)$, we obtain

$$H(X,Y) = H(Y,X) = H(X) + H(Y|X) = H(Y) + H(X|Y) \qquad (1)$$
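All five entropies follow mechanically from the input distribution and the channel matrix, since $P(x_i, y_j) = P(y_j|x_i)P(x_i)$ and $P(y_j) = \sum_i P(x_i, y_j)$. The Python sketch below (our illustration; the binary-channel numbers are arbitrary) computes them and lets one verify the chain rule (1):

```python
import math

def h(probs):
    """Entropy of a probability list, in bits."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def channel_entropies(p_in, P):
    """Return H(X), H(Y), H(X,Y), H(Y|X), H(X|Y) from P(x_i) and p_ij = P(y_j|x_i)."""
    M, N = len(p_in), len(P[0])
    joint = [[p_in[i] * P[i][j] for j in range(N)] for i in range(M)]  # P(x_i, y_j)
    p_out = [sum(joint[i][j] for i in range(M)) for j in range(N)]     # P(y_j)
    HX, HY = h(p_in), h(p_out)
    HXY = h([q for row in joint for q in row])
    return HX, HY, HXY, HXY - HX, HXY - HY  # H(Y|X) = H(X,Y) - H(X), and symmetrically

# Binary symmetric channel with p = 0.1 and equiprobable inputs (illustrative values):
HX, HY, HXY, HY_X, HX_Y = channel_entropies([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]])
print(HX, HY, HY_X, HX_Y)      # 1.0, 1.0, ~0.469, ~0.469
print(abs(HXY - (HX + HY_X)))  # ~0: the chain rule (1) holds
```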
5 Capacity of the discrete memoryless channel

We define the flow of information (or mutual information) between X and Y through the channel as

$$I(X;Y) \equiv H(X) - H(X|Y) \quad \text{bit/symbol} \qquad (2)$$

or, using (1),

$$I(X;Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y) \qquad (3)$$

Figure 5: Relation between the conditional entropies and the mutual information (diagram with H(X), H(Y), H(X|Y), H(Y|X) and I(X;Y)).

We have

$$I(X;Y) = H(X) + H(Y) - H(X,Y) = E\left[\log_2\frac{1}{P(X)}\right] + E\left[\log_2\frac{1}{P(Y)}\right] - E\left[\log_2\frac{1}{P(X,Y)}\right] = E\left[\log_2\frac{P(X,Y)}{P(X)P(Y)}\right] = \sum_{i=1}^{M}\sum_{j=1}^{N} P(x_i, y_j)\log_2\left(\frac{P(x_i, y_j)}{P(x_i)P(y_j)}\right)$$

But $P(x_i, y_j) = P(y_j|x_i)P(x_i)$, leading to

$$I(X;Y) = \sum_{i=1}^{M}\sum_{j=1}^{N} P(x_i, y_j)\log_2\left(\frac{P(y_j|x_i)}{P(y_j)}\right)$$

From (2) and (3) we also get

$$I(X;Y) = I(Y;X)$$

The mutual information I(X;Y) quantifies the reduction of uncertainty about X given the knowledge of Y (see Fig. 5).
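The double sum above can be evaluated directly. The following Python sketch (our illustration, reusing a binary symmetric channel with arbitrary numbers) does exactly that:

```python
import math

def mutual_information(p_in, P):
    """I(X;Y) = sum_ij P(x_i,y_j) log2( P(x_i,y_j) / (P(x_i) P(y_j)) ), in bits."""
    M, N = len(p_in), len(P[0])
    joint = [[p_in[i] * P[i][j] for j in range(N)] for i in range(M)]
    p_out = [sum(joint[i][j] for i in range(M)) for j in range(N)]
    return sum(joint[i][j] * math.log2(joint[i][j] / (p_in[i] * p_out[j]))
               for i in range(M) for j in range(N) if joint[i][j] > 0)

# BSC with p = 0.1: equiprobable inputs give I(X;Y) = 1 - H(0.1) ~ 0.531 bits
print(mutual_information([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]]))
# Skewed inputs convey less information through the same channel:
print(mutual_information([0.9, 0.1], [[0.9, 0.1], [0.1, 0.9]]))
```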
The capacity C of a discrete memoryless channel is defined as the maximum of the mutual information I(X;Y) that can be transmitted through the channel:

$$C \equiv \max_{P(x)} I(X;Y) \quad \text{bit/transmission}$$

Maximization is carried out relative to the probabilities $P(x_i)$ of the input symbols.

6 Capacity of the binary symmetric channel

Consider the binary symmetric channel (BSC) of Fig. 6 and let $q \equiv P(x_1)$ and $r \equiv P(y_1)$.

Figure 6: Binary symmetric channel (BSC); each symbol is received in error ($x_1$ as $y_2$, $x_2$ as $y_1$) with transition probability p, and correctly with probability 1-p.

The entropy H(X) of source X is (see Fig. 7)

$$H(q) = q\log_2\frac{1}{q} + (1-q)\log_2\frac{1}{1-q}$$

and the entropy of source Y is $H(Y) = H(r)$. Besides,

$$H(Y|X) = \sum_{i=1}^{2}\sum_{j=1}^{2} P(x_i, y_j)\log_2\left(\frac{1}{P(y_j|x_i)}\right) \quad \text{bit/symbol}$$

with

$$P(x_1, y_1) = P(y_1|x_1)P(x_1) = (1-p)q$$
$$P(x_1, y_2) = P(y_2|x_1)P(x_1) = pq$$
$$P(x_2, y_1) = P(y_1|x_2)P(x_2) = p(1-q)$$
$$P(x_2, y_2) = P(y_2|x_2)P(x_2) = (1-p)(1-q)$$
Figure 7: Entropy H(q) of the binary source X versus q.

resulting in

$$H(Y|X) = (1-p)q\log_2\frac{1}{1-p} + pq\log_2\frac{1}{p} + p(1-q)\log_2\frac{1}{p} + (1-p)(1-q)\log_2\frac{1}{1-p} = H(p)$$

Thus, the mutual information of the BSC is given by

$$I(X;Y) = H(Y) - H(Y|X) = H(r) - H(p)$$

and the capacity of the BSC is

$$C = \max_r\{H(r)\} - H(p) \quad \text{bit/transmission}$$

or, taking into account Fig. 7,

$$C = 1 - H(p)$$

The plot of the BSC capacity versus the transition probability p is shown in Fig. 8.
Figure 8: Capacity of the BSC versus the transition probability p.

The situation that leads to the maximum, that is, H(r) = 1, corresponds to

$$r\log_2\frac{1}{r} + (1-r)\log_2\frac{1}{1-r} = 1$$

which, by inspection of Fig. 7, gives $r = P(y_1) = 1/2$. In other words, the maximum of information transmission from the channel input to the output, for any value of p, occurs when the probabilities of $y_1$ and $y_2$ are equal. The channel capacity is maximum when p = 0 or p = 1, since in both cases the channel is noiseless (see Fig. 9).

Figure 9: Noiseless channels that maximize the capacity C (p = 0: $x_1 \to y_1$, $x_2 \to y_2$; p = 1: $x_1 \to y_2$, $x_2 \to y_1$).

For p = 1/2, the channel capacity is zero because the output symbols are independent of the input symbols and no information can flow through the channel. We have then

$$I(X;Y) = H(Y) - H(Y|X) = H(r) - H(p) = H\left(\tfrac{1}{2}\right) - H\left(\tfrac{1}{2}\right) = 0$$
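The closed form $C = 1 - H(p)$ is easy to verify numerically. The Python sketch below (our illustration) also confirms, by a brute-force search over the input probability q, that the maximum of I(X;Y) is attained at q = 1/2:

```python
import math

def H(p):
    """Binary entropy function H(p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return p * math.log2(1.0 / p) + (1.0 - p) * math.log2(1.0 / (1.0 - p))

def bsc_mutual_information(q, p):
    """I(X;Y) = H(r) - H(p) for the BSC, with r = P(y1) = q(1-p) + (1-q)p."""
    return H(q * (1.0 - p) + (1.0 - q) * p) - H(p)

p = 0.1
capacity = 1.0 - H(p)  # closed form: C = 1 - H(p) ~ 0.531 bit/transmission
best_q = max((k / 1000.0 for k in range(1001)),
             key=lambda q: bsc_mutual_information(q, p))
print(capacity, best_q)  # the maximum is reached at q = 0.5
print(1.0 - H(0.5))      # C = 0 for p = 1/2: the channel is useless
```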
Bibliography

S. Benedetto, E. Biglieri - Principles of Digital Transmission with Wireless Applications, Kluwer, 1999.

Simon Haykin - Communication Systems, 4th edition, Wiley, 2001.

C. E. Shannon - A mathematical theory of communication, Bell Syst. Tech. J., vol. 27, pp. 379-423, 623-656, July-Oct. 1948.

S. Verdú - Fifty years of Shannon theory, IEEE Trans. Info. Theory, vol. 44, pp. 2057-2078, Oct. 1998.