ELEMENT OF INFORMATION THEORY

Size: px
Start display at page:

Download "ELEMENT OF INFORMATION THEORY"

Transcription

1 History Table of Content ELEMENT OF INFORMATION THEORY O. Le Meur Univ. of Rennes 1 October

2 History Table of Content VERSION: : Document creation, done by OLM; : Document updated, done by OLM: Slide in introduction: The human communication; Slide on Spearman's rank correlation coecient; Slide on Correlation does not imply causation; Slide to introduce the amount of information; Slide presenting a demonstration concerning the joint entropy H(X, Y ). 2

3 History Table of Content ELEMENTS OF INFORMATION THEORY

4 Information Theory Goal and framework of the communication system Some denitions 1 Goal and framework of the communication system Some denitions

5 The human communication... Goal and framework of the communication system Some denitions Visual signal: the smoke signal is one of the oldest forms of communication in recorded history; Painting by Frederic Remington. Smoke signals were also employed by Hannibal, the Carthaginian general who lived two centuries BC Sonor signal: in Ancient China, soldiers stationed along the Great Wall would alert each other of impending enemy attack by signaling from tower to tower. In this way, they were able to transmit a message as far away as 480 km (300 miles) in just a few hours. A common point, there is a code... 5

6 Goal and Framework of the communication system Goal and framework of the communication system Some denitions To transmit an information at the minimum rate for a given quality; Seminal work of Claude Shannon (1948)[Shannon,48]. Ultimate goal The source and channel codec must be designed to ensure a good transmission of a message given a minimum bit rate or a minimum level of quality. 6

7 Goal and framework of the communication system Goal and framework of the communication system Some denitions Three major research axis: 1 Measure: carried by a message. 2 Compression: Lossy vs lossless coding... Mastering the distortion d(s, Ŝ) 3 Transmission: Channel and noise modelling Channel capacity 7

8 Some denitions Goal and framework of the communication system Some denitions Denition Source of information: something that produces messages! Denition Message: a stream of symbols taking theirs values in a predened alphabet. Example Source: a camera Message: a picture Symbols: RGB coecients alphabet= (0,..., 255) Example Source: book Message: a text Symbols: letters alphabet= (a,..., z) 8

9 Some denitions Goal and framework of the communication system Some denitions Denition Source Encoder: the goal is to transform S in a binary signal X of size as small as possible (eliminate the redundancy). Channel Encoding: the goal is to add some redundancy in order to be sure to transmit the binary signal X without errors. Denition Compression Rate: σ = Nb bits of input Nb bits of output 9

10 Information Theory Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient 1 2 Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient

11 Random variables and probability distribution Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient The transmitted messages are considered as a random variable with a nite alphabet. Denition (Alphabet) An alphabet A is a set of data {a 1,..., a N } that we might wish to encode. Denition (Random Variable) A discrete random variable X is dened by an alphabet A = {x 1,..., x N } and a probability distribution {p 1,..., p N }, i.e. p i = P(X = x i ). Properties Remark: a symbol is the outcome of a random variable. 0 p i 1; N i=1 p i = 1 also noted x A p(x). p i = P(X = x i ) is equivalent to P X (x i ) and P X (i). 11

12 Joint probability Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient Denition (Joint probability) Let X and Y be discrete random variables dened by alphabets {x 1,..., x N } and {y 1,..., y M }, respectively. A and B are the events X = x i and Y = y j, P(X = x i, Y = y j ) is the joint probability also called P(A, B) or p ij. Properties of the joint probability density function (pdf) N M P(X = x i=1 j=1 i, Y = y j ) = 1, If A B =, P(A, B) = P(X = x i, Y = y j ) = 0, Marginal probability distribution of X and Y : P(A) = P(X = x i ) = M j=1 P(X = x i, Y = y j ) is the probability of the event A, P(B) = P(Y = y j ) = N i=1 P(X = x i, Y = y j ) is the probability of the event B. 12

13 Joint probability Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient Example Let X and Y be discrete random variables dened by alphabets {x 1, x 2} and {y 1, y 2, y 3, y 4}, respectively. The sets of events of (X, Y ) can be represented in a joint probability matrix: X, Y y 1 y 2 y 3 y 4 x 1 (x 1, y 1) (x 1, y 2) (x 1, y 3) (x 1, y 4) x 2 (x 2, y 1) (x 2, y 2) (x 2, y 3) (x 2, y 4) 13

14 Joint probability Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient Example Let X and Y be discrete random variables dened by alphabets {R, NR} and {S, Su, A, W }, respectively. X, Y S Su A W R NR

15 Joint probability Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient Example Let X and Y be discrete random variables dened by alphabets {R, NR} and {S, Su, A, W }, respectively. Questions: X, Y S Su A W R NR Does (X, Y ) dene a pdf? => Yes, 2 i=1 4 j=1 P(X = x i, Y = y j ) = 1; Is it possible to dene the marginal pdf of X? => Yes, X = {R, NR}, P(X = R) = 4 j=1 P(X = R, Y = y j) = 0.55, P(X = NR) = 4 j=1 P(X = NR, Y = y j) =

16 Conditional probability and Bayes rule Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient Notation: the conditional probability of X = x i knowing that Y = y j is written as P(X = x i Y = y j ). Denition (Bayes rule) P(X = x i Y = y j ) = P(X =x i,y =y j ) P(Y =y j ) Properties N k=1 P(X = x k Y = y j ) = 1; P(X = x i Y = y j ) P(Y = y j X = x i ). 14

17 Conditional probability and Bayes rule Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient Example Let X and Y be discrete random variables dened by alphabets {R, NR} and {S, Su, A, W }, respectively. X, Y S Su A W R NR Question: What is the conditional probability distribution of P(X = x i Y = S)? 15

18 Conditional probability and Bayes rule Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient Example Let X and Y be discrete random variables dened by alphabets {R, NR} and {S, Su, A, W }, respectively. X, Y S Su A W R NR Question: What is the conditional probability distribution of P(X = x i Y = S)? P(Y = S) = 2 P(Y = y i=1 i ) = 0.25, and from Bayes P(X = R Y = S) = and P(X = NR Y = S) =

19 Statistical independence of two random variables Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient Denition (Independence) Two discrete random variables are independent if the joint pdf is egal to the product of the marginal pdfs: P(X = x i, Y = y j ) = P(X = x i ) P(Y = y j ) i and j. Remark: If X and Y independent, P(X = x i Y = y j ) = P(X = x i ) (From Bayes). While independence of a set of random variables implies independence of any subset, the converse is not true. In particular, random variables can be pairwise independent but not independent as a set. 16

20 Linear correlation coecient (Pearson) Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient Denition (linear correlation coecient) The degree of linearity between two discrete random variables X and Y is given by the linear correlation coecient r: r(x, Y ) def = COV (X,Y ) σ X σ Y COV (X, Y ) is the covariance given by 1 N (X X )(Y Y ) (X σ X is the standard deviation given by X ). N Properties r(x, Y ) [ 1, 1]; r(x, Y ) 1 means that X and Y are very correlated; r(t X, Y ) = r(x, Y ), where T is a linear transform. 17

21 Linear correlation coecient (Pearson) Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient Relationship between the regression line and the linear correlation coecient: Regression line The goal is to nd Ŷ = ax + b. a and b deduced from the minimization of the MSE: J = argmin (a,b) (Y Ŷ ) 2 J COV (X,Y ) = 0 a = a σ X 2 J b = 0 b = Y ax The slope a is equal to r, only when σ X = σ Y (standardised variables). 18

22 Spearman's rank correlation coecient Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient Denition (Spearman's rank correlation coecient) The degree of linearity between two discrete random variables X and Y is given by the linear correlation coecient r: ρ(x, Y ) def = 1 6 d 2 i n 2 (n 1) where d i = x i y i. The n data X i and Y i are converted to ranks x i and y i. To illustrate the dierence between Pearson and Spearman correlation coecient: Pearson correlation is 0.456; Spearman correlation is 1. (0, 1),(10, 100),(101, 500),(102, 2000) 19

23 Correlation does not imply causation Random variables and probability distribution Joint probability Conditional probability and Bayes rule Statistical independence of two random variables Correlation coecient Correlation does not imply causation is a phrase used in science and statistics to emphasize that correlation between two variables does not automatically imply that one causes the other. Example (Ice cream and drowning) As ice cream sales increase, the rate of drowning deaths increases sharply. Therefore, ice cream causes drowning. The aforementioned example fails to recognize the importance of time in relationship to ice cream sales. Ice cream is sold during the summer months at a much greater rate, and it is during the summer months that people are more likely to engage in activities involving water, such as swimming. The increased drowning deaths are simply caused by more exposure to water based activities, not ice cream. From 20

24 Information Theory Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram

25 Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Claude Shannon ( ) and communication theory For Shannon a message is very informative if the chance of its occurrence is small. If, in contrast, a message is very predictable, then it has a small amount of information. One is not surprised to receive it. A measure of the amount of information in a message. 22

26 Self-Information Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Let X be a discrete random variable dened by the alphabet {x 1,..., x N } and the probability density {p(x = x 1),..., p(x = x N )}. How to measure the amount of information provided by an event A, X = x i? Denition (Self-Information proposed by Shannon) I (A) def = log 2p(A) I (X = x i ) def = log 2p(X = x i ) Unit: bit/symbol. Properties I (A) 0; I (A) = 0 if p(a) = 1; if p(a) < p(b) then I (A) > I (B); p(a) 0, I (A) +. The self-information can be measured in bits, nits, or hartleys depending on the base of the logarithm. 23

27 Shannon entropy Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Shannon introduced the entropy rate, a quantity that measured a source's information production rate. Denition (Shannon Entropy) The entropy of a discrete random variable X dened by the alphabet {x 1,..., x N } and the probability density {p(x = x 1),..., p(x = x N )} is given by: H(X ) = E{I (X )} = N i=1 p(x = x i )log 2(p(X = x i )), unit: bit/symbol. The entropy H(X ) is a measure of the amount of uncertainty, a measure of surprise associated with the value of X. Properties Entropy gives the average number of bits per symbol to represent X H(X ) 0; H(X ) log 2N (equality for a uniform probability distribution). 24

28 Shannon entropy Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Example Example 1: The value of p(0) is highly predictable, the entropy (amount of uncertainty) is zero. p(0) p(1) p(2) p(3) p(4) p(5) p(6) p(7) Entropy (bits/symbol) Example

29 Shannon entropy Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Example Example 1: The value of p(0) is highly predictable, the entropy (amount of uncertainty) is zero; Example 2: This is a probability distribution of Bernoulli {p, 1 p}. p(0) p(1) p(2) p(3) p(4) p(5) p(6) p(7) Entropy (bits/symbol) Example Example

30 Shannon entropy Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Example Example 1: The value of p(0) is highly predictable, the entropy (amount of uncertainty) is zero; Example 2: This is a probability distribution of Bernoulli {p, 1 p}; Example 3: Uniform probability distribution (P(X = x i ) = M 1, with M = 4, i {2,..., 5}). p(0) p(1) p(2) p(3) p(4) p(5) p(6) p(7) Entropy (bits/symbol) Example Example Example

31 Shannon entropy Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Example Example 1: The value of p(0) is highly predictable, the entropy (amount of uncertainty) is zero; Example 2: This is a probability distribution of Bernoulli {p, 1 p}; Example 3: Uniform probability distribution (P(X = x i ) = M 1, with M = 4, i {2,..., 5}); Example 4: ; p(0) p(1) p(2) p(3) p(4) p(5) p(6) p(7) Entropy (bits/symbol) Example Example Example Example

32 Shannon entropy Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Example Example 1: The value of p(0) is highly predictable, the entropy (amount of uncertainty) is zero; Example 2: This is a probability distribution of Bernoulli {p, 1 p}; Example 3: Uniform probability distribution (P(X = x i ) = M 1, with M = 4, i {2,..., 5}); Example 4: ; Example 5: Uniform probability distribution (P(X = x i ) = N 1, with N = 8, i {0,..., 7}) p(0) p(1) p(2) p(3) p(4) p(5) p(6) p(7) Entropy (bits/symbol) Example Example Example Example Example

33 Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Example (Shannon entropy) (a) H=1.39bit/s (b) H=3.35bit/s (c) H=6.92bit/s (d) H=7.48bit/s (e) H=7.62bit/s (c) Original picture (256 levels); (a) Original picture (4 levels); (b) Original picture (16 levels); (d) Original picture + uniform noise; (e) Original picture + uniform noise (more than previous one). 26

34 Joint information/joint entropy Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Denition (Joint information) Let X and Y be discrete random variables dened by alphabet {x 1,..., x N } and {y 1,..., y M }, respectively. The joint information of two events (X = x i ) and (Y = y i ) is dened by I (X = x i, Y = y j ) = log 2(p(X = x i, Y = y j )). Denition (Joint entropy) The joint entropy of the two discret random variables X and Y is given by: H(X, Y ) = E{I (X = x i, Y = y j )}. H(X, Y ) = N i=1 M j=1 p(x = x i, Y = y j )log 2(p(X = x i, Y = y j )) 27

35 Joint information/joint entropy Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Demonstrate that the joint entropy H(X, Y ) is equal to H(X ) + H(Y ), if X and Y are statistically independant: H(X, Y ) = P(x, y) log P(x, y) x X y Y = P(x)P(y) log P(x)P(y) x X y Y = P(x)P(y) [log P(x) + log P(y)] x X y Y = P(x)P(y) log P(x) P(x)P(y) log P(y) x X y Y y Y x X = P(x) log P(x) P(y) P(y) log P(y) P(x) x X y Y y Y x X = P(x) log P(x) P(y) log P(y) x X y Y H(X, Y ) = H(X ) + H(Y ) 28

36 Joint information/joint entropy Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram In the same vein, you should know how to demonstrate that: H(X, Y ) H(X ) + H(Y ) (Equality if X and Y independant). This property is called sub-additivity. It means that two systems, considered together, can never have more entropy than the sum of the entropy in each of them. 29

37 Conditional information/conditional entropy Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Denition (Conditional information) The conditional information... I (X = x i Y = y j ) = log 2(p(X = x i Y = y j )) Denition (Conditional entropy) The conditional entropy of Y given the random variable X : H(Y X ) = H(Y X ) = N p(x = x i )H(Y X = x i ) i=1 N M i=1 j=1 p(x = x i, Y = y j )log 2 1 p(y = y j X = x i ) The conditional entropy H(Y X ) is the amount of uncertainty remaining about Y after X is known. Remarks: We always have H(Y X ) H(Y ) (The knowledge reduces the uncertainty); Entropy chain rule: H(X, Y ) = H(X ) + H(Y X ) = H(Y ) + H(X Y ) (From Bayes); 30

38 Mutual information Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Denition (Mutual information) The Mutual information of two events X = x i and Y = y j is dened as I (X = x i ; Y = y j ) = log 2 p(x =x i )p(y =y j ) p(x =x i,y =y j ) The Mutual information of two random variables X and Y is dened as I (X ; Y ) = N i=1 Mj=1 p(x = x i, Y = y j )log 2 p(x =x i )p(y =y j ) p(x =x i,y =y j ) The mutual information I (X ; Y ) measures the information that X and Y share... Be careful to the notation, I (X, Y ) I (X ; Y ) Properties Symmetry: I (X = x i ; Y = y j ) = I (Y = y j ; X = x i ); I (X ; Y ) 0; zero if and only if X and Y are independent variables; H(X X ) = 0 H(X ) = I (X ; X ) I (X ; X ) I (X ; Y ). 31

39 Mutual information Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Mutual information and dependency The mutual information can be expressed as : I (X ; Y ) = D(p(X = x i, Y = y j ) p(x = x i )p(y = y j )) where, p(x = x i, Y = y j ) joint pdf of X and Y ; p(x = x i ) and p(y = y i ) marginal probability distribution of X and Y, respectively; D(..) the divergence of Kullback-Leibler. Remarks regarding the KL-divergence: D(p q) = N i=1 p(x = x i )log 2 p(x =x i ) p(q=q i ), Q random variable {q 1,..., q N }; This is a measure of divergence between two pdfs, not a distance; D(p q) = 0, if and only if the two pdfs are strictly the same. 32

40 Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram Example (Mutual information) Mutual information between the original picture of C. Shannon and the following ones: (a) I=2.84 (b) I=2.54 (c) I=2.59 (d) I=2.41 (e) I=1.97 (a) Original picture + uniform noise; (b) Original picture + uniform noise (more than previous one); (c) Human; (d) Einstein; (e) A landscape. 33

41 Venn's diagram Self-Information Entropy denition Joint information, joint entropy Conditional information, conditional entropy Mutual information Venn's diagram X Y H(X Y ) I (X ; Y ) H(Y X ) We retrieve: H(X, Y ) = H(X ) + H(Y X ) H(X, Y ) = H(Y ) + H(X Y ) H(X, Y ) = H(X Y ) + H(Y X ) + I (X ; Y ) I (X ; Y ) = H(X ) H(X Y ) I (X ; Y ) = H(Y ) H(Y X ) H(X ) H(Y ) H(X, Y ) 34

42 Information Theory Parameters of a discrete source Discrete memoryless source Extension of a discrete memoryless source with memory (Markov source) Parameters of a discrete source Discrete memoryless source Extension of a discrete memoryless source with memory (Markov source)

43 Parameters of a discrete source Discrete memoryless source Extension of a discrete memoryless source with memory (Markov source) Remind of the goal To transmit an information at the minimum rate for a given quality; Seminal work of Claude Shannon (1948)[Shannon,48]. 36

44 Parameters of a discrete source Parameters of a discrete source Discrete memoryless source Extension of a discrete memoryless source with memory (Markov source) Denition (Alphabet) An alphabet A is a set of data {a 1,..., a N } that we might wish to encode. Denition (discrete source) A source is dened as a discrete random variable S dened by the alphabet {s 1,..., s N } and the probability density {p(s = s 1),..., p(s = s N )}. Example (Text) Alphabet={a,..., z} Source Message {a, h, y, r, u} 37

45 Discrete memoryless source Parameters of a discrete source Discrete memoryless source Extension of a discrete memoryless source with memory (Markov source) Denition (Discrete memoryless source) A discrete source S is memoryless if the symbols of the source alphabet are independent and identically distributed: p(s = s 1,..., S = s N ) = N i=1 p(s = s i ) Remarks: Entropie: H(S) = N i=1 p(s = s i )log 2p(S = s i ) bit; Particular case of a uniform source: H(S) = log 2N. 38

46 Extension of a discrete memoryless source Parameters of a discrete source Discrete memoryless source Extension of a discrete memoryless source with memory (Markov source) Rather than considering individuals symbols, more useful to deal with blocks of symbols. Let S be a discrete source with an alphabet of size N. The output of the source is grouped into blocks of K symbols. The new source, called S K, is dened by an alphabet of size N K. Denition (Discrete memoryless source, K th extension of a source S) If the source S K is the K th extension of a source S, the entropy per extended symbols of S K is K times the entropy per individual symbol of S: H(S K ) = K H(S) Remark: the probability of a symbol si K p(si K ) = K j=1 p(s ij ). = (s i1,..., s ik ) from the source S K is given by 39

47 with memory (Markov source) Parameters of a discrete source Discrete memoryless source Extension of a discrete memoryless source with memory (Markov source) Discrete memoryless source This is not realistic! Successive symbols are not completely independent of one another... in a picture: a pel (S 0) depends statistically on the previous pels s5 s4 s3 s2 s1 s0 This dependence is expressed by the conditionnal probability p(s 0 S 1, S 2, S 3, S 4, S 5). p(s 0 S 1, S 2, S 3, S 4, S 5) p(s 0) in the langage (french): p(s k = u) p(s k = e), p(s k = u S k 1 = q) >> p(s k = e S k 1 = q); 40

48 with memory (Markov source) Parameters of a discrete source Discrete memoryless source Extension of a discrete memoryless source with memory (Markov source) Denition ( with memory) A discrete source with memory of order N (N th order Markov) is dened as: p(s k S k 1, S k 2,..., S k N ) The entropy is given by: H(S) = H(S k S k 1, S k 2,..., S k N ) Example (One dimensional Markov model) The pel value S 0 depends statistically only on the pel value S 1. Q H(X ) = 1.9 bit/symb, H(Y ) = 1.29, H(X, Y ) = 2.15, H(X Y ) =

49 Information Theory Parameters of a discrete channel Memoryless channel Channel with memory A taxonomy Parameters of a discrete channel Memoryless channel Channel with memory A taxonomy 6 42

50 Parameters of a discrete channel Memoryless channel Channel with memory A taxonomy Remind of the goal To transmit an information at the minimum rate for a given quality; Seminal work of Claude Shannon (1948)[Shannon,48]. 43

51 Parameters of a discrete channel Parameters of a discrete channel Memoryless channel Channel with memory A taxonomy X t {x 1,..., x N } Communication channel Y t {y 1,..., y M } Denition (Channel) A channel transmits as best as it can symbols X t. Symbols Y t are produced (possibly corrupted). The subscript t indicates the time. The channel is featured by the following conditional probabilities: p(y t = y j X t = x i, X t = x i 1...) The output symbol can be dependent on several input symbols. Capacity The channel is characterized by its capacity C bits per symbol. It is an upper bound on the bit rate that can be accomodated by the channel. C = max p(x ) I (X ; Y ) 44

52 Memoryless channel Parameters of a discrete channel Memoryless channel Channel with memory A taxonomy Denition (Memoryless channel) The channel is memoryless when the conditionnal probability p(y X ) is independent of previously transmitted channel symbols: p(y = y j X = x i ). Notice that the subscript t is no more required... Transition matrix Π, size (N, M): p(y =y1 X =x1)... p(y =y m X =x1) Π(X, Y ) = p(y =y1 X =x n)... p(y =y m X =x n) Graphical representation: x1 p(y1 x 1 ) p(y2 x 1 ) y1 x2 y2 p(ym x1 ) xn p(ym xn ) ym 45

53 Ideal Channel Parameters of a discrete channel Memoryless channel Channel with memory A taxonomy Ideal Channel or noiseless channel Perfect correspondance between the input and the output. Each transmitted bit is received without error. p(y = y j X = x i ) = δ xi y j with, δ xy = 1 if x = y 0 if x y The channel capacity is C = max p(x ) I (X ; Y ) = 1bit. Π(X, Y ) = ( )

54 Binary Symmetric Channel (BSC) Parameters of a discrete channel Memoryless channel Channel with memory A taxonomy Denition (Binary Symmetric Channel (BSC)) A binary symmetric channel has binary input and binary output (X = {0, 1}, Y = {0, 1}). C = 1 H(p) bits. p the probability that the symbol is modied at the ouptut. ( ) 1 p p Π(X, Y ) = p 1 p p p p 1 p 0 1 From left to right: original binary signal, result for p = 0.1, p = 0.2 and p = 1 (the 2 channel has a null capacity!). 47

55 Binary Erasure Channel (BEC) Parameters of a discrete channel Memoryless channel Channel with memory A taxonomy Denition (Binary Erasure Channel (BEC)) A binary erasure channel has binary input and ternary output (X = {0, 1}, Y = {0, 1, e}). The third symbol is called an erasure (complete loss of information about an input bit). C = 1 p e p bits. p e is the probability to receive the symbol e. 1 p p e 0 0 p e ( 1 p pe p e p Π(X, Y ) = p p e 1 p p e ) p p p e e 1 1 p p e 1 48

56 Channel with memory Parameters of a discrete channel Memoryless channel Channel with memory A taxonomy Gilbert-Elliott model The channel is a binary symmetric channel with memory determined by a two states Markov chain. A channel has two states: G, good state and B, bad state. α and β are the transition probabilities α 1 β G B 1 α β Transition matrix to go from one state to another: S t = {G, B}. S t ( 1 β S t 1 β ) α 1 α 49

57 : a taxonomy Parameters of a discrete channel Memoryless channel Channel with memory A taxonomy Lossless channel: H(X Y ) = 0, the input is dened by the output; Deterministic channel: H(Y X ), the output dened the input; Distortionless channel: H(X ) = H(Y ) = I (X ; Y ), deterministic and lossless; Channel with a null capacity: I (X ; Y ) = 0. 50

58 Information Theory Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem 51 7

59 Source code Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Denition (Source code) A source code C for a random variable X is a mapping from x X to {0, 1}. Let c i denotes the code word for x i and l i denote the length of c i. {0, 1} is the set of all nite binary string. Denition (Prex code) A code is called a prex code (instantaneous code) if no code word is a prex of another code word Not required to wait for the whole message to be able to decode it. 52

60 Kraft inequality Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Denition (Kraft inequality) A code C is instantaneous if it satises the following inequality: N i=1 2 l i 1 with, l i the length of code word length i 53

61 Kraft inequality Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Example (Illustration of the Kraft inequality using a coding tree) The following tree contains all three-bit codes:

62 Kraft inequality Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Example (Illustration of the Kraft inequality using a coding tree) The following tree contains a prex code. We decide to use the code word 0 and The remaining leaves constitute a prex code: 4 i=1 2 l i = = 1 55

63 Higher bound of entropy Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Let S a discret source dened by the alphabet {s 1,..., s N } and the probability density {p(s = s 1),..., p(s = s N )}. Denition (Higher bound of entropy) H(S) log 2N Interpretation the entropy is limited by the size of the alphabet; a source with a uniform pdf provides the highest entropy. 56

64 Source coding theorem Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Let S a discrete source dened by the alphabet {s 1,..., s N } and the probability density {p(s = s 1),..., p(s = s N )}. Each symbol s i is coded with a length l i bits: Denition (Source coding theorem or First ) H(S) l C with l C = N i=1 p i l i The entropy of the source gives the limit of the lossless compression. We can not encode the source with less than H(S) bit per symbol. The entropy of the source is the lower-bound. Warning... {l i } i=1,...,n must satisfy Kraft's inequality. Remarks: l C = H(S), when l i = log 2p(X = x i ). 57

65 Source coding theorem Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Denition (Source coding theorem (bis)) Whatever the source S, there exist an instantaneous code C, such that H(S) l C < H(S) + 1 The upper bound is equal to H(S) + 1, simply because the Shannon information gives a fractionnal value. 58

66 Source coding theorem Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Example Let X a random variable with the following probability density. The optimal code lengths are given by the self-information: X x 1 x 2 x 3 x 4 x 5 P(X = x i ) I (X = x i ) The entropy H(X ) is equal to bits. The source coding theorem gives: l <

67 Rabbani-Jones extension Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Symbols can be coded in blocks of source samples instead of one at a time (block coding). In this case, further bit-rate reductions are possible. Denition (Rabbani-Jones extension) Let S be an ergodic source with an entropy H(S). Consider encoding blocks of N source symbols at a time into binary codewords. For any δ > 0, it is possible to construct a code that the average number of bits per original source symbol l C satises: H(S) l C < H(S) + δ Remarks: Any source can be losslessly encoded with a code very close to the source entropy in bits; There is a high benet to increase the value N; But, the number of symbols in the alphabet becomes very high. Example: block of 2 2 pixels (coded on 8 bits) leads to values per block... 60

68 Channel coding theorem Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Let a discrete memoryless channel of capacity C. The channel coding transform the messages {b 1,..., b k } into binary codes having a length n. Denition (Transmission rate) The transmission rate R is given by: R def = k n R is the amount of information stemming from the symbols b i per transmitted bits. B i block of length k Channel coding C i block of length n 61

69 Channel coding theorem Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Example (repetition coder) The coder is simply a device repeating r times a particular bit. Below, example for r = 2. R = 1 3. B t 011 Channel coding C t This is our basic scheme to communicate with others! We repeat the information... Denition (Channel coding theorem or second ) R < C, p e > 0: it is possible to transmit information nearly without error at any rate below the channel capacity; if R C, all codes will have a probability of error greater than a certain positive minimal level, and this level increases with the rate. 62

70 Source/Channel theorem Source code Kraft inequality Higher bound of entropy Source coding theorem Rabbani-Jones extension Channel coding theorem Source/Channel theorem Let a noisy channel having a capacity C and a source S having an entropy H. Denition (Source/Channel theorem) if H < C it is possible to transmit information nearly without error. Shannon showed that it was possible to do that by making a source coding followed by a channel coding; if H C, the transmission cannot be done with an arbitrarily small probability. 63

71 Information Theory

72 YOU MUST KNOW Let X a random variable dened by X = {x 1,..., x N } and the probabilities {p x1,..., p xn }. Let Y a random variable dened by Y = {y 1,..., y N } and the probabilities {p y1,..., p yn }. Ni=1 p xi = 1 Independence: p(x = x 1,..., X = x N ) = N i=1 p(x = x i ) Bayes rule: p(x = x i Y = y j ) = p(x =x i,y =y j ) p(y =y j ) Self information: I (X = x i ) = log 2 p(x = x i ) Mutual information: I (X ; Y ) = N i=1 Mj=1 p(x = x i, Y = y j )log 2 p(x =x i )p(y =y j ) p(x =x i,y =y j ) Entropy: H(X ) = N i=1 p(x = x i )log 2 p(x = x i ) Conditional entropy of Y given X : H(Y X ) = N i=1 p(x = x i )H(Y X = x i ) Higher Bound of entropy: H(X ) log 2 N Limit of the lossless compression : H(X ) l C, l C = N i=1 p xi l i 65

73 Suggestion for further reading... [Shannon,48] C.E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27,

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information. L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate

More information

Dept. of Linguistics, Indiana University Fall 2015

Dept. of Linguistics, Indiana University Fall 2015 L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission

More information

UNIT I INFORMATION THEORY. I k log 2

UNIT I INFORMATION THEORY. I k log 2 UNIT I INFORMATION THEORY Claude Shannon 1916-2001 Creator of Information Theory, lays the foundation for implementing logic in digital circuits as part of his Masters Thesis! (1939) and published a paper

More information

Multimedia Communications. Mathematical Preliminaries for Lossless Compression

Multimedia Communications. Mathematical Preliminaries for Lossless Compression Multimedia Communications Mathematical Preliminaries for Lossless Compression What we will see in this chapter Definition of information and entropy Modeling a data source Definition of coding and when

More information

Noisy channel communication

Noisy channel communication Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 6 Communication channels and Information Some notes on the noisy channel setup: Iain Murray, 2012 School of Informatics, University

More information

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1 Kraft s inequality An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if N 2 l i 1 Proof: Suppose that we have a tree code. Let l max = max{l 1,...,

More information

Information Theory. M1 Informatique (parcours recherche et innovation) Aline Roumy. January INRIA Rennes 1/ 73

Information Theory. M1 Informatique (parcours recherche et innovation) Aline Roumy. January INRIA Rennes 1/ 73 1/ 73 Information Theory M1 Informatique (parcours recherche et innovation) Aline Roumy INRIA Rennes January 2018 Outline 2/ 73 1 Non mathematical introduction 2 Mathematical introduction: definitions

More information

Chapter 9 Fundamental Limits in Information Theory

Chapter 9 Fundamental Limits in Information Theory Chapter 9 Fundamental Limits in Information Theory Information Theory is the fundamental theory behind information manipulation, including data compression and data transmission. 9.1 Introduction o For

More information

CSCI 2570 Introduction to Nanocomputing

CSCI 2570 Introduction to Nanocomputing CSCI 2570 Introduction to Nanocomputing Information Theory John E Savage What is Information Theory Introduced by Claude Shannon. See Wikipedia Two foci: a) data compression and b) reliable communication

More information

Chapter 4. Data Transmission and Channel Capacity. Po-Ning Chen, Professor. Department of Communications Engineering. National Chiao Tung University

Chapter 4. Data Transmission and Channel Capacity. Po-Ning Chen, Professor. Department of Communications Engineering. National Chiao Tung University Chapter 4 Data Transmission and Channel Capacity Po-Ning Chen, Professor Department of Communications Engineering National Chiao Tung University Hsin Chu, Taiwan 30050, R.O.C. Principle of Data Transmission

More information

Capacity of a channel Shannon s second theorem. Information Theory 1/33

Capacity of a channel Shannon s second theorem. Information Theory 1/33 Capacity of a channel Shannon s second theorem Information Theory 1/33 Outline 1. Memoryless channels, examples ; 2. Capacity ; 3. Symmetric channels ; 4. Channel Coding ; 5. Shannon s second theorem,

More information

Chapter 2: Entropy and Mutual Information. University of Illinois at Chicago ECE 534, Natasha Devroye

Chapter 2: Entropy and Mutual Information. University of Illinois at Chicago ECE 534, Natasha Devroye Chapter 2: Entropy and Mutual Information Chapter 2 outline Definitions Entropy Joint entropy, conditional entropy Relative entropy, mutual information Chain rules Jensen s inequality Log-sum inequality

More information

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy Coding and Information Theory Chris Williams, School of Informatics, University of Edinburgh Overview What is information theory? Entropy Coding Information Theory Shannon (1948): Information theory is

More information

Entropies & Information Theory

Entropies & Information Theory Entropies & Information Theory LECTURE I Nilanjana Datta University of Cambridge,U.K. See lecture notes on: http://www.qi.damtp.cam.ac.uk/node/223 Quantum Information Theory Born out of Classical Information

More information

Chapter 2 Source Models and Entropy. Any information-generating process can be viewed as. computer program in executed form: binary 0

Chapter 2 Source Models and Entropy. Any information-generating process can be viewed as. computer program in executed form: binary 0 Part II Information Theory Concepts Chapter 2 Source Models and Entropy Any information-generating process can be viewed as a source: { emitting a sequence of symbols { symbols from a nite alphabet text:

More information

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006)

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006) MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK SATELLITE COMMUNICATION DEPT./SEM.:ECE/VIII UNIT V PART-A 1. What is binary symmetric channel (AUC DEC 2006) 2. Define information rate? (AUC DEC 2007)

More information

4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information

4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information 4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information Ramji Venkataramanan Signal Processing and Communications Lab Department of Engineering ramji.v@eng.cam.ac.uk

More information

EC2252 COMMUNICATION THEORY UNIT 5 INFORMATION THEORY

EC2252 COMMUNICATION THEORY UNIT 5 INFORMATION THEORY EC2252 COMMUNICATION THEORY UNIT 5 INFORMATION THEORY Discrete Messages and Information Content, Concept of Amount of Information, Average information, Entropy, Information rate, Source coding to increase

More information

MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A

MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK DEPARTMENT: ECE SEMESTER: IV SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A 1. What is binary symmetric channel (AUC DEC

More information

1 Introduction to information theory

1 Introduction to information theory 1 Introduction to information theory 1.1 Introduction In this chapter we present some of the basic concepts of information theory. The situations we have in mind involve the exchange of information through

More information

Information Theory CHAPTER. 5.1 Introduction. 5.2 Entropy

Information Theory CHAPTER. 5.1 Introduction. 5.2 Entropy Haykin_ch05_pp3.fm Page 207 Monday, November 26, 202 2:44 PM CHAPTER 5 Information Theory 5. Introduction As mentioned in Chapter and reiterated along the way, the purpose of a communication system is

More information

Information and Entropy

Information and Entropy Information and Entropy Shannon s Separation Principle Source Coding Principles Entropy Variable Length Codes Huffman Codes Joint Sources Arithmetic Codes Adaptive Codes Thomas Wiegand: Digital Image Communication

More information

5 Mutual Information and Channel Capacity

5 Mutual Information and Channel Capacity 5 Mutual Information and Channel Capacity In Section 2, we have seen the use of a quantity called entropy to measure the amount of randomness in a random variable. In this section, we introduce several

More information

X 1 : X Table 1: Y = X X 2

X 1 : X Table 1: Y = X X 2 ECE 534: Elements of Information Theory, Fall 200 Homework 3 Solutions (ALL DUE to Kenneth S. Palacio Baus) December, 200. Problem 5.20. Multiple access (a) Find the capacity region for the multiple-access

More information

Lecture 6 I. CHANNEL CODING. X n (m) P Y X

Lecture 6 I. CHANNEL CODING. X n (m) P Y X 6- Introduction to Information Theory Lecture 6 Lecturer: Haim Permuter Scribe: Yoav Eisenberg and Yakov Miron I. CHANNEL CODING We consider the following channel coding problem: m = {,2,..,2 nr} Encoder

More information

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability

More information

Information Theory, Statistics, and Decision Trees

Information Theory, Statistics, and Decision Trees Information Theory, Statistics, and Decision Trees Léon Bottou COS 424 4/6/2010 Summary 1. Basic information theory. 2. Decision trees. 3. Information theory and statistics. Léon Bottou 2/31 COS 424 4/6/2010

More information

Lecture 22: Final Review

Lecture 22: Final Review Lecture 22: Final Review Nuts and bolts Fundamental questions and limits Tools Practical algorithms Future topics Dr Yao Xie, ECE587, Information Theory, Duke University Basics Dr Yao Xie, ECE587, Information

More information

Exercise 1. = P(y a 1)P(a 1 )

Exercise 1. = P(y a 1)P(a 1 ) Chapter 7 Channel Capacity Exercise 1 A source produces independent, equally probable symbols from an alphabet {a 1, a 2 } at a rate of one symbol every 3 seconds. These symbols are transmitted over a

More information

Digital communication system. Shannon s separation principle

Digital communication system. Shannon s separation principle Digital communication system Representation of the source signal by a stream of (binary) symbols Adaptation to the properties of the transmission channel information source source coder channel coder modulation

More information

CS 630 Basic Probability and Information Theory. Tim Campbell

CS 630 Basic Probability and Information Theory. Tim Campbell CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)

More information

Noisy-Channel Coding

Noisy-Channel Coding Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/05264298 Part II Noisy-Channel Coding Copyright Cambridge University Press 2003.

More information

Digital Communications III (ECE 154C) Introduction to Coding and Information Theory

Digital Communications III (ECE 154C) Introduction to Coding and Information Theory Digital Communications III (ECE 154C) Introduction to Coding and Information Theory Tara Javidi These lecture notes were originally developed by late Prof. J. K. Wolf. UC San Diego Spring 2014 1 / 8 I

More information

Lecture 1: Introduction, Entropy and ML estimation

Lecture 1: Introduction, Entropy and ML estimation 0-704: Information Processing and Learning Spring 202 Lecture : Introduction, Entropy and ML estimation Lecturer: Aarti Singh Scribes: Min Xu Disclaimer: These notes have not been subjected to the usual

More information

Shannon s noisy-channel theorem

Shannon s noisy-channel theorem Shannon s noisy-channel theorem Information theory Amon Elders Korteweg de Vries Institute for Mathematics University of Amsterdam. Tuesday, 26th of Januari Amon Elders (Korteweg de Vries Institute for

More information

Entropy as a measure of surprise

Entropy as a measure of surprise Entropy as a measure of surprise Lecture 5: Sam Roweis September 26, 25 What does information do? It removes uncertainty. Information Conveyed = Uncertainty Removed = Surprise Yielded. How should we quantify

More information

Lecture 4 Noisy Channel Coding

Lecture 4 Noisy Channel Coding Lecture 4 Noisy Channel Coding I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw October 9, 2015 1 / 56 I-Hsiang Wang IT Lecture 4 The Channel Coding Problem

More information

Channel capacity. Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5.

Channel capacity. Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5. Channel capacity Outline : 1. Source entropy 2. Discrete memoryless channel 3. Mutual information 4. Channel capacity 5. Exercices Exercise session 11 : Channel capacity 1 1. Source entropy Given X a memoryless

More information

Lecture 5: Channel Capacity. Copyright G. Caire (Sample Lectures) 122

Lecture 5: Channel Capacity. Copyright G. Caire (Sample Lectures) 122 Lecture 5: Channel Capacity Copyright G. Caire (Sample Lectures) 122 M Definitions and Problem Setup 2 X n Y n Encoder p(y x) Decoder ˆM Message Channel Estimate Definition 11. Discrete Memoryless Channel

More information

Lecture 2: August 31

Lecture 2: August 31 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy

More information

3F1 Information Theory, Lecture 3

3F1 Information Theory, Lecture 3 3F1 Information Theory, Lecture 3 Jossy Sayir Department of Engineering Michaelmas 2011, 28 November 2011 Memoryless Sources Arithmetic Coding Sources with Memory 2 / 19 Summary of last lecture Prefix-free

More information

Lecture 5 Channel Coding over Continuous Channels

Lecture 5 Channel Coding over Continuous Channels Lecture 5 Channel Coding over Continuous Channels I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw November 14, 2014 1 / 34 I-Hsiang Wang NIT Lecture 5 From

More information

Bioinformatics: Biology X

Bioinformatics: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA Model Building/Checking, Reverse Engineering, Causality Outline 1 Bayesian Interpretation of Probabilities 2 Where (or of what)

More information

Information Theory: Entropy, Markov Chains, and Huffman Coding

Information Theory: Entropy, Markov Chains, and Huffman Coding The University of Notre Dame A senior thesis submitted to the Department of Mathematics and the Glynn Family Honors Program Information Theory: Entropy, Markov Chains, and Huffman Coding Patrick LeBlanc

More information

ELEC546 Review of Information Theory

ELEC546 Review of Information Theory ELEC546 Review of Information Theory Vincent Lau 1/1/004 1 Review of Information Theory Entropy: Measure of uncertainty of a random variable X. The entropy of X, H(X), is given by: If X is a discrete random

More information

3F1: Signals and Systems INFORMATION THEORY Examples Paper Solutions

3F1: Signals and Systems INFORMATION THEORY Examples Paper Solutions Engineering Tripos Part IIA THIRD YEAR 3F: Signals and Systems INFORMATION THEORY Examples Paper Solutions. Let the joint probability mass function of two binary random variables X and Y be given in the

More information

ECE Information theory Final

ECE Information theory Final ECE 776 - Information theory Final Q1 (1 point) We would like to compress a Gaussian source with zero mean and variance 1 We consider two strategies In the first, we quantize with a step size so that the

More information

Information Theory in Intelligent Decision Making

Information Theory in Intelligent Decision Making Information Theory in Intelligent Decision Making Adaptive Systems and Algorithms Research Groups School of Computer Science University of Hertfordshire, United Kingdom June 7, 2015 Information Theory

More information

Chapter I: Fundamental Information Theory

Chapter I: Fundamental Information Theory ECE-S622/T62 Notes Chapter I: Fundamental Information Theory Ruifeng Zhang Dept. of Electrical & Computer Eng. Drexel University. Information Source Information is the outcome of some physical processes.

More information

Chapter 2: Source coding

Chapter 2: Source coding Chapter 2: meghdadi@ensil.unilim.fr University of Limoges Chapter 2: Entropy of Markov Source Chapter 2: Entropy of Markov Source Markov model for information sources Given the present, the future is independent

More information

Block 2: Introduction to Information Theory

Block 2: Introduction to Information Theory Block 2: Introduction to Information Theory Francisco J. Escribano April 26, 2015 Francisco J. Escribano Block 2: Introduction to Information Theory April 26, 2015 1 / 51 Table of contents 1 Motivation

More information

COMPSCI 650 Applied Information Theory Jan 21, Lecture 2

COMPSCI 650 Applied Information Theory Jan 21, Lecture 2 COMPSCI 650 Applied Information Theory Jan 21, 2016 Lecture 2 Instructor: Arya Mazumdar Scribe: Gayane Vardoyan, Jong-Chyi Su 1 Entropy Definition: Entropy is a measure of uncertainty of a random variable.

More information

Exercises with solutions (Set B)

Exercises with solutions (Set B) Exercises with solutions (Set B) 3. A fair coin is tossed an infinite number of times. Let Y n be a random variable, with n Z, that describes the outcome of the n-th coin toss. If the outcome of the n-th

More information

Compression and Coding

Compression and Coding Compression and Coding Theory and Applications Part 1: Fundamentals Gloria Menegaz 1 Transmitter (Encoder) What is the problem? Receiver (Decoder) Transformation information unit Channel Ordering (significance)

More information

Homework Set #2 Data Compression, Huffman code and AEP

Homework Set #2 Data Compression, Huffman code and AEP Homework Set #2 Data Compression, Huffman code and AEP 1. Huffman coding. Consider the random variable ( x1 x X = 2 x 3 x 4 x 5 x 6 x 7 0.50 0.26 0.11 0.04 0.04 0.03 0.02 (a Find a binary Huffman code

More information

3F1 Information Theory, Lecture 1

3F1 Information Theory, Lecture 1 3F1 Information Theory, Lecture 1 Jossy Sayir Department of Engineering Michaelmas 2013, 22 November 2013 Organisation History Entropy Mutual Information 2 / 18 Course Organisation 4 lectures Course material:

More information

Principles of Communications

Principles of Communications Principles of Communications Weiyao Lin Shanghai Jiao Tong University Chapter 10: Information Theory Textbook: Chapter 12 Communication Systems Engineering: Ch 6.1, Ch 9.1~ 9. 92 2009/2010 Meixia Tao @

More information

Information Theory - Entropy. Figure 3

Information Theory - Entropy. Figure 3 Concept of Information Information Theory - Entropy Figure 3 A typical binary coded digital communication system is shown in Figure 3. What is involved in the transmission of information? - The system

More information

The Method of Types and Its Application to Information Hiding

The Method of Types and Its Application to Information Hiding The Method of Types and Its Application to Information Hiding Pierre Moulin University of Illinois at Urbana-Champaign www.ifp.uiuc.edu/ moulin/talks/eusipco05-slides.pdf EUSIPCO Antalya, September 7,

More information

Shannon s Noisy-Channel Coding Theorem

Shannon s Noisy-Channel Coding Theorem Shannon s Noisy-Channel Coding Theorem Lucas Slot Sebastian Zur February 13, 2015 Lucas Slot, Sebastian Zur Shannon s Noisy-Channel Coding Theorem February 13, 2015 1 / 29 Outline 1 Definitions and Terminology

More information

Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information

Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information 204 IEEE International Symposium on Information Theory Capacity of the Discrete Memoryless Energy Harvesting Channel with Side Information Omur Ozel, Kaya Tutuncuoglu 2, Sennur Ulukus, and Aylin Yener

More information

Module 1. Introduction to Digital Communications and Information Theory. Version 2 ECE IIT, Kharagpur

Module 1. Introduction to Digital Communications and Information Theory. Version 2 ECE IIT, Kharagpur Module ntroduction to Digital Communications and nformation Theory Lesson 3 nformation Theoretic Approach to Digital Communications After reading this lesson, you will learn about Scope of nformation Theory

More information

Information Theory. Week 4 Compressing streams. Iain Murray,

Information Theory. Week 4 Compressing streams. Iain Murray, Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 4 Compressing streams Iain Murray, 2014 School of Informatics, University of Edinburgh Jensen s inequality For convex functions: E[f(x)]

More information

Basic Principles of Video Coding

Basic Principles of Video Coding Basic Principles of Video Coding Introduction Categories of Video Coding Schemes Information Theory Overview of Video Coding Techniques Predictive coding Transform coding Quantization Entropy coding Motion

More information

CS6304 / Analog and Digital Communication UNIT IV - SOURCE AND ERROR CONTROL CODING PART A 1. What is the use of error control coding? The main use of error control coding is to reduce the overall probability

More information

Coding for Discrete Source

Coding for Discrete Source EGR 544 Communication Theory 3. Coding for Discrete Sources Z. Aliyazicioglu Electrical and Computer Engineering Department Cal Poly Pomona Coding for Discrete Source Coding Represent source data effectively

More information

Ch 0 Introduction. 0.1 Overview of Information Theory and Coding

Ch 0 Introduction. 0.1 Overview of Information Theory and Coding Ch 0 Introduction 0.1 Overview of Information Theory and Coding Overview The information theory was founded by Shannon in 1948. This theory is for transmission (communication system) or recording (storage

More information

Shannon s Noisy-Channel Coding Theorem

Shannon s Noisy-Channel Coding Theorem Shannon s Noisy-Channel Coding Theorem Lucas Slot Sebastian Zur February 2015 Abstract In information theory, Shannon s Noisy-Channel Coding Theorem states that it is possible to communicate over a noisy

More information

3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H.

3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H. Appendix A Information Theory A.1 Entropy Shannon (Shanon, 1948) developed the concept of entropy to measure the uncertainty of a discrete random variable. Suppose X is a discrete random variable that

More information

Information Theory Primer:

Information Theory Primer: Information Theory Primer: Entropy, KL Divergence, Mutual Information, Jensen s inequality Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,

More information

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE General e Image Coder Structure Motion Video x(s 1,s 2,t) or x(s 1,s 2 ) Natural Image Sampling A form of data compression; usually lossless, but can be lossy Redundancy Removal Lossless compression: predictive

More information

3F1 Information Theory, Lecture 3

3F1 Information Theory, Lecture 3 3F1 Information Theory, Lecture 3 Jossy Sayir Department of Engineering Michaelmas 2013, 29 November 2013 Memoryless Sources Arithmetic Coding Sources with Memory Markov Example 2 / 21 Encoding the output

More information

Chapter 2 Data Coding and Image Compression

Chapter 2 Data Coding and Image Compression Chapter 2 Data Coding and Image Compression The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Shannon, Claude

More information

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 Please submit the solutions on Gradescope. EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 1. Optimal codeword lengths. Although the codeword lengths of an optimal variable length code

More information

Lecture Notes on Digital Transmission Source and Channel Coding. José Manuel Bioucas Dias

Lecture Notes on Digital Transmission Source and Channel Coding. José Manuel Bioucas Dias Lecture Notes on Digital Transmission Source and Channel Coding José Manuel Bioucas Dias February 2015 CHAPTER 1 Source and Channel Coding Contents 1 Source and Channel Coding 1 1.1 Introduction......................................

More information

ITCT Lecture IV.3: Markov Processes and Sources with Memory

ITCT Lecture IV.3: Markov Processes and Sources with Memory ITCT Lecture IV.3: Markov Processes and Sources with Memory 4. Markov Processes Thus far, we have been occupied with memoryless sources and channels. We must now turn our attention to sources with memory.

More information

(Classical) Information Theory III: Noisy channel coding

(Classical) Information Theory III: Noisy channel coding (Classical) Information Theory III: Noisy channel coding Sibasish Ghosh The Institute of Mathematical Sciences CIT Campus, Taramani, Chennai 600 113, India. p. 1 Abstract What is the best possible way

More information

Lecture 8: Shannon s Noise Models

Lecture 8: Shannon s Noise Models Error Correcting Codes: Combinatorics, Algorithms and Applications (Fall 2007) Lecture 8: Shannon s Noise Models September 14, 2007 Lecturer: Atri Rudra Scribe: Sandipan Kundu& Atri Rudra Till now we have

More information

Chapter 3 Source Coding. 3.1 An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code

Chapter 3 Source Coding. 3.1 An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code Chapter 3 Source Coding 3. An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code 3. An Introduction to Source Coding Entropy (in bits per symbol) implies in average

More information

Lecture 3: Channel Capacity

Lecture 3: Channel Capacity Lecture 3: Channel Capacity 1 Definitions Channel capacity is a measure of maximum information per channel usage one can get through a channel. This one of the fundamental concepts in information theory.

More information

Revision of Lecture 4

Revision of Lecture 4 Revision of Lecture 4 We have completed studying digital sources from information theory viewpoint We have learnt all fundamental principles for source coding, provided by information theory Practical

More information

Lecture 5 - Information theory

Lecture 5 - Information theory Lecture 5 - Information theory Jan Bouda FI MU May 18, 2012 Jan Bouda (FI MU) Lecture 5 - Information theory May 18, 2012 1 / 42 Part I Uncertainty and entropy Jan Bouda (FI MU) Lecture 5 - Information

More information

Information. = more information was provided by the outcome in #2

Information. = more information was provided by the outcome in #2 Outline First part based very loosely on [Abramson 63]. Information theory usually formulated in terms of information channels and coding will not discuss those here.. Information 2. Entropy 3. Mutual

More information

A Gentle Tutorial on Information Theory and Learning. Roni Rosenfeld. Carnegie Mellon University

A Gentle Tutorial on Information Theory and Learning. Roni Rosenfeld. Carnegie Mellon University A Gentle Tutorial on Information Theory and Learning Roni Rosenfeld Mellon University Mellon Outline First part based very loosely on [Abramson 63]. Information theory usually formulated in terms of information

More information

Chapter 2 Review of Classical Information Theory

Chapter 2 Review of Classical Information Theory Chapter 2 Review of Classical Information Theory Abstract This chapter presents a review of the classical information theory which plays a crucial role in this thesis. We introduce the various types of

More information

Feature selection. c Victor Kitov August Summer school on Machine Learning in High Energy Physics in partnership with

Feature selection. c Victor Kitov August Summer school on Machine Learning in High Energy Physics in partnership with Feature selection c Victor Kitov v.v.kitov@yandex.ru Summer school on Machine Learning in High Energy Physics in partnership with August 2015 1/38 Feature selection Feature selection is a process of selecting

More information

Source and Channel Coding for Correlated Sources Over Multiuser Channels

Source and Channel Coding for Correlated Sources Over Multiuser Channels Source and Channel Coding for Correlated Sources Over Multiuser Channels Deniz Gündüz, Elza Erkip, Andrea Goldsmith, H. Vincent Poor Abstract Source and channel coding over multiuser channels in which

More information

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels

Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Feedback Capacity of a Class of Symmetric Finite-State Markov Channels Nevroz Şen, Fady Alajaji and Serdar Yüksel Department of Mathematics and Statistics Queen s University Kingston, ON K7L 3N6, Canada

More information

Midterm Exam Information Theory Fall Midterm Exam. Time: 09:10 12:10 11/23, 2016

Midterm Exam Information Theory Fall Midterm Exam. Time: 09:10 12:10 11/23, 2016 Midterm Exam Time: 09:10 12:10 11/23, 2016 Name: Student ID: Policy: (Read before You Start to Work) The exam is closed book. However, you are allowed to bring TWO A4-size cheat sheet (single-sheet, two-sided).

More information

Coding of memoryless sources 1/35

Coding of memoryless sources 1/35 Coding of memoryless sources 1/35 Outline 1. Morse coding ; 2. Definitions : encoding, encoding efficiency ; 3. fixed length codes, encoding integers ; 4. prefix condition ; 5. Kraft and Mac Millan theorems

More information

Computational Systems Biology: Biology X

Computational Systems Biology: Biology X Bud Mishra Room 1002, 715 Broadway, Courant Institute, NYU, New York, USA L#8:(November-08-2010) Cancer and Signals Outline 1 Bayesian Interpretation of Probabilities Information Theory Outline Bayesian

More information

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H.

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H. Problem sheet Ex. Verify that the function H(p,..., p n ) = k p k log p k satisfies all 8 axioms on H. Ex. (Not to be handed in). looking at the notes). List as many of the 8 axioms as you can, (without

More information

6.02 Fall 2011 Lecture #9

6.02 Fall 2011 Lecture #9 6.02 Fall 2011 Lecture #9 Claude E. Shannon Mutual information Channel capacity Transmission at rates up to channel capacity, and with asymptotically zero error 6.02 Fall 2011 Lecture 9, Slide #1 First

More information

ECE Advanced Communication Theory, Spring 2009 Homework #1 (INCOMPLETE)

ECE Advanced Communication Theory, Spring 2009 Homework #1 (INCOMPLETE) ECE 74 - Advanced Communication Theory, Spring 2009 Homework #1 (INCOMPLETE) 1. A Huffman code finds the optimal codeword to assign to a given block of source symbols. (a) Show that cannot be a Huffman

More information

Information Theory and Coding Techniques

Information Theory and Coding Techniques Information Theory and Coding Techniques Lecture 1.2: Introduction and Course Outlines Information Theory 1 Information Theory and Coding Techniques Prof. Ja-Ling Wu Department of Computer Science and

More information

Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006

Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006 Classical Information Theory Notes from the lectures by prof Suhov Trieste - june 2006 Fabio Grazioso... July 3, 2006 1 2 Contents 1 Lecture 1, Entropy 4 1.1 Random variable...............................

More information

Lecture 2. Capacity of the Gaussian channel

Lecture 2. Capacity of the Gaussian channel Spring, 207 5237S, Wireless Communications II 2. Lecture 2 Capacity of the Gaussian channel Review on basic concepts in inf. theory ( Cover&Thomas: Elements of Inf. Theory, Tse&Viswanath: Appendix B) AWGN

More information

Shannon's Theory of Communication

Shannon's Theory of Communication Shannon's Theory of Communication An operational introduction 5 September 2014, Introduction to Information Systems Giovanni Sileno g.sileno@uva.nl Leibniz Center for Law University of Amsterdam Fundamental

More information

Solutions to Homework Set #3 Channel and Source coding

Solutions to Homework Set #3 Channel and Source coding Solutions to Homework Set #3 Channel and Source coding. Rates (a) Channels coding Rate: Assuming you are sending 4 different messages using usages of a channel. What is the rate (in bits per channel use)

More information

Notes 3: Stochastic channels and noisy coding theorem bound. 1 Model of information communication and noisy channel

Notes 3: Stochastic channels and noisy coding theorem bound. 1 Model of information communication and noisy channel Introduction to Coding Theory CMU: Spring 2010 Notes 3: Stochastic channels and noisy coding theorem bound January 2010 Lecturer: Venkatesan Guruswami Scribe: Venkatesan Guruswami We now turn to the basic

More information