CS225/ECE205A, Spring 2005: Information Theory Wim van Dam Engineering 1, Room 5109

Size: px
Start display at page:

Download "CS225/ECE205A, Spring 2005: Information Theory Wim van Dam Engineering 1, Room 5109"

Transcription

1 CS225/ECE205A, Spring 2005: Information Theory Wim van Dam Engineering 1, Room

2 Formalities There was a mistake in earlier announcement. The midterm is scheduled for Week 5 (May 2/4). Before the weekend I will post a number of exercises that we will discuss the coming week. Try to answer them before next Tuesday. Also, start thinking about a project that you would like to work on. See the book and other places for ideas. Questions?

3 G. Perec s La Disparition A Void s plot follows a group of individuals hunting a missing companion, Anton Vowl. It is in part a parody of noir and horror fiction, with many stylistic tricks and gags, plot convolutions, and a grim conclusion. In many parts it implicitly talks about its own lipogrammatic limitation: illustrations of this including missing subdivision 5 of 26, and calling its main protagonist "Vowl". Individuals within A Void do work out what is missing, but find difficulty discussing or naming it as it has no word or form that conforms to its author's constraint. Philip Howard, writing a lipogrammatic appraisal of A Void in a British daily journal, said, This is a story chock-full of plots and sub-plots, of loops within loops, of trails in pursuit of trails, all of which allow its author an opportunity to display his customary virtuosity as an avant-gardist magician, acrobat and clown.

4 Letter Frequencies in English Figure from Information Theory, Inference, and Learning Algorithms, by David J.C. MacKay From Information Theory, Inference and Learning Algorithms, D. MacKay. The 27 letter alphabet gives an upper bound on the entropy of log 27 = A random sentence: XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD The different single letter frequencies gives a lower entropy of 4.1 bits/letter. A typical sentence: OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA.

5 Combined Letter Frequencies Figure from Information Theory, Inference, and Learning Algorithms, by David J.C. MacKay For letter pairs, the entropy is 4.03 bits per letter. Typical sentence using triplets: IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID For letter quadruplets, the entropy is down to 2.8 bits/letter. The entropy of English is estimated to be 1.5 bits per letter. See

6 Redundancy, Redundancy The fact that a natural language is very redundant is very useful: it allows us to understand English texts that are written or spoken unclearly. Redundancy of images is also helpful when we are reproducing them imperfectly (copying, faxing, et c.) But when we are using reliable (digital) technology, then the natural redundancy incurs an unnecessary cost when transmitting information. Shannon s idea while working for Bell Labs: Data Compression

7 Chapter 5: Data Compression

8 (Binary) Source Codes We want to efficiently encode (in bits) sequences produced by a random source variable X ~ p(x). The expected length of a code C is given by: L(C) = Σ x p(x) l(x), with l(x) the length of C(x). Example: For X={a,b,c} we can use the binary source code C(a)=00, C(b)=01, C(c)=10, with L(C)=2 bits/letter. A better idea is to use C(a)=0, C(b)=10, C(c)=11, such that if p(a) = p(b) = p(c) = 1/3, the code has expected length 1.66 bits per letter (bpc).

9 Relying on Bias Continued example for X={a,b,c}: The C(a)=0, C(b)=10, C(c)=11, works even better if p(a) = ½ and p(b) = p(c) = ¼, in which case the expected length is 1.5 bits per letter. It is worse if we have p(a) = p(b) = ¼ and p(c) = ½ such that L(C)= ¼ + ½ + 1 = 1.75 bits per letter. Central idea of variable length coding: Assign short code words to likely letters (and longer code words to the less likely letters).

10 Morse Code Figure from Information Theory, Inference, and Learning Algorithms, by David J.C. MacKay

11 Distributing Lengths The question has become: Given a probability p distribution over the alphabet X, how can we assign the lengths l(x) of the code words C(x)? Is it possible to have lengths l 1 =1, l 2 =2, l 3 =3, l 4 =3? Answer: Yes; use words 0,10,110 and 111. Is it possible to have lengths l 1 =1, l 2 =2, l 3 =2, l 4 =3? Answer: No. Try 0,01,10 and 111; what does 010 mean? We want our codes to be uniquely decodable.

12 A Uniquely Decodable Code Take the following code for X={a,b,c,d}: C(a)=10, C(b)=00, C(c)=11, C(d)=110. b ut a n n o y i n g Try decoding the string It gives C 1 ( ) = abcacbd, but there was some suspense with the second c, as the 11 could also have been the start of C(d)=110. This code is uniquely decodable, but the decoding is not instantaneous.

13 Instantaneous Codes Definition [CT, Section 5.1]: A code is called a prefix code or an instantaneous code if no code word is the prefix of any other code word. We want to do our coding with such prefix codes. The non-instantaneous code had code lengths 2,2,2,3. There is a instantaneous code with such lengths. Take the words 00,01,10,111. But there is no instantaneous code with lengths 1,2,2,3.

14 Kraft s Inequality Crucial question: For which code word lengths l 1,,l m does there exist a prefix code? Kraft s Inequality: For any binary prefix code with code word lengths l 1,,l m, it must hold that: Conversely, if the lengths l j obey Kraft s inequality, then there is a prefix code with such lengths. m j= 1 2 l j 1 If instead of bits we use the more general Dits {1,,D}, then we have m j= 1 D l j 1

15 Proving Kraft s Inequality L.G. Kraft, A device for quantizing, grouping, and coding amplitude modulated pulses, M.Sc. thesis, MIT, [Cover & Thomas]: The Kraft inequality was first proved by McMillan (1956) How to prove it? It might help to think about prefix codes as trees.

16 A Binary Code Tree Code words are the leafs, with lengths: 2,2,3,3,4,4,4,

17 CS225/ECE205A, Spring 2005: Information Theory Wim van Dam Engineering 1, Room

18 Last Week The amount of randomness of a discrete random variable X (with distribution p) is measured by the entropy function H (with bits as units): H(X) = x p(x)log 2 p(x) If X and Y are independent, then H(X,Y) = H(X)+H(Y). In general we have H(X,Y) = H(X) + Σ x p(x) H(Y X=x).

19 Conditional Entropy The expected entropy of Y after we have observed a value x X, is called the conditional entropy H(Y X): H(Y X) = x = = = E p(x) H(Y X = x) x x,y p(x) p(x,y)log p(y x) p(x,y) y log p(y X) p(y x)log p(y x) Chain rule: H(X,Y) = H(X)+H(Y X) = H(Y)+H(X Y).

20 Noisy Channel Example The symmetric binary channel with noise rate p again: Assume that the X bit is completely random with H(X) = H((½,½)) = 1. X Y 0 1 p p 0 p 1 1 p 1 After receiving/observing Y, the entropy has changed from H(X)=1 to H(X Y) = H((1 p,p)) = p log p (1 p)log(1 p). We expect an entropy reduction of 1 H((p,1 p)). Think of this reduction as gaining information about X.

21 Another Example of H(X Y) Take p(x) over {0,,500} with p = (½,1/1000,,1/1000) with entropy H(X) = ½ + ½ log 1000 = bits. If we learn that x is not 0, then we increase the entropy: p(x x 0 )=(0,1/500,,1/500) with H(X x 0 )=8.966 bits. We learned information, yet the entropy/uncertainty increased? Think: Not finding your wallet in the likely place. The expected uncertainty (=conditional entropy) goes down: H(X x=0? ) = ½ H(X x=0 ) + ½ H(X x 0 ) = bits.

22 Asymmetry of H(X Y) If the relation between X and Y is asymmetric, then we will have in general: H(X Y) H(Y X). X Y Example of an asymmetric channel Assume again H(X) = 1 and ½ note that H(Y) = H((3/4,1/4)) = ½ We have H(X Y) = bits. On the other hand H(Y X) = 0.5 bits. Check with H(X,Y) = 3/2 and the chain rule: H(X,Y) = H(X) + H(Y X) = H(Y) + H(X Y)

23 Chain Rule for Entropy For random variables X 1,,X n we have the Chain rule: H(X,...,X 1 n ) = H(X = n i= 1 1 ) + H(X H(X i X 2 i 1 X 1,...,X ) + 1 ) + H(X n X n 1,...,X Think: the amount of information that you obtain by observing X 1,,X n equals the X 1 information H(X 1 ), plus the additional X 2 information H(X 2 X 1 ), et cetera. L 1 ) Notice the similarity with the multiplicative rules for joint probabilities: p(x,y) = p(x) p(y x).

24 Mutual Information For two variables X,Y the mutual information I(X;Y) is the amount of certainty regarding X that we learned after observing Y. Hence I(X;Y) = H(X) H(X Y). Note that now X and Y can be interchanged using the chain rule: I(X; Y) = H(X) H(X Y) = H(X,Y) H(Y X) H(X Y) = H(Y) H(Y X) = I(Y;X) Think of I(X;Y) as the overlap between X and Y.

25 All Together Now H(X,Y) H(X Y) I(X;Y) H(Y X) H(X) H(Y)

26 Asymmetric Channel H(X Y)=0.689 H(X)=1 H(X,Y)=3/2 I(X;Y) = H(Y X) =0.5 H(Y)=0.811 X Y ½ 1 1 ½ For the previous asymmetric channel we had H(X)=1, H(Y)=0.811 and H(X,Y)=3/2, etc. Hence I(X;Y)=0.311.

27 About Mutual Information Mutual information is the central notion in information theory. It quantifies how much we learn about X by observing Y. When X and Y are the same we get: I(X;X) = H(X), hence entropy is called self information. It can be generalized to more than two variables: X Y Z

28 Expectation of What? Mutual information can be viewed as an expectation: I(X; Y) = = = H(X) H(X Y) x,y E p p(x,y)log log p(x, y) p(x) p(x, y) p(x) p(y) p(y) This expectation is also called the relative entropy between the probabilities p(x,y) and p(x) p(y).

29 Relative Entropy The relative entropy or Kullback Leibler distance between two distributions p and q is defined by D(p q) = p(x)log = x E p p(x) log q(x) p(x) q(x) This distance expresses our expected disbelief in q when we are observing probability distribution p. Not a true distance as D(p q) D(q p). Also, D(p q) can be infinite.

30 Example KL Distance Compare a fair coin and biased coin. Probabilities: p = (½,½) versus q = (q,1 q) D(p q) D(p q) expresses how much disbelief we have in the q-model, when we observe a fair coin (per coin flip). D(q p) expresses the disbelief in the fair-coin theory when observing q. D(q p)

31 Another D(q p) Example Consider a 10% difference: q=[q,1 q] while p=[q+/,1 q /] D(q p) The difference between [90%,10%] and [80%,20%] is much bigger than between [60%,40%] and [50%,50%]

32 CS225/ECE205A, Spring 2005: Information Theory Wim van Dam Engineering 1, Room

33 Help Wanted Note Taker You must be sensitive to the needs of students with disabilities $100 Talk to me after class

34 Formalities We are going to have projects. The emphasis will be on a small paper in which you show that you understand and can apply the tools of information theory. You will also give an ultra-short presentation (5 min). Grade determination: Midterm+Final+Project: 1+2+3=6.

35 Topics for Projects Information theory and data compression (lossless, lossy, mpeg) error correcting codes (802.11, DVDs, satellite) gambling and the stock market [CT, 6, 15] Kolmogorov complexity [CT, 7] Quantum information theory [Nielsen & Chuang] analog signals, rate distortion theory [CT, 13] information theory and natural languages statistical distances and phylogenetics anything that uses some nontrivial information theory

36 Entropic Inequalities Entropic definitions and equalities: H(X) = Σ x p(x) log p(x), H(X Y) = Σ y p(y) H(X Y=y), I(X;Y) = H(X) H(X Y), D(p q) = Σ x p(x) log p(x)/q(x), Entropic inequalities that follow from 0 p(x) 1: H(X) 0, H(X Y) 0, H(X,Y) H(X), Less obvious inequalities: I(X;Y) 0, H(X) log X, D(p q) 0, How do those inequalities relate? How to prove the not-so-obvious inequalities?

37 Information Inequality Theorem 2.6.3: For two probabilities distribution p and q we have D(p q) 0 and D(p q)=0 if and only if p=q. p(x) D(p q) = p(x) log x q(x) = = = log log(1) 0 x p(x) log x p(x) q(x) p(x) q(x) p(x) Q: What happened here? A: Application of Jensen s inequality to the concave function log:r + R.

38 Convex and Concave A function f is convex if for all points x 1 and x 2 it holds that f(λx 1 +(1 λ)x 2 ) λf(x 1 )+(1 λ)f(x 2 ). Equivalently: f 0 Strictly convex: f >0 A function f is concave if for all points x 1 and x 2 it holds that f(λx 1 +(1 λ)x 2 ) λf(x 1 )+(1 λ)f(x 2 ) Equivalently: f 0 Strictly concave: f <0

39 Jensen s Inequality Theorem 2.6.2: If f is a convex function on a random variable Z, then E[f(Z)] f(e[z]). If f is a concave function on a random variable Z, then E[f(Z)] f(e[z]). If f is strictly convex or strictly concave, then E[f(Z)] = f(e[z]) implies that Z is a deterministic variable. Proof: See Cover and Thomas. Note: f is convex if and only if f is concave.

40 Jensen for the Log Function The function log(z) is concave for z R +, hence E[log Z] log(e[z]) and also E[log 1/Z] log(1/e[z]). For our purposes we typically have variables X,Y and some additional function g:x,y R + such that (with Z=g(X,Y)) Jensen s inequality tells us: E[log g(x,y)] log(e[g(x,y)]). Important example: Σ x p(x) log q(x)/p(x) = E[log q(x)/p(x)] log(e[q(x)/p(x)]) = log(σ x p(x) q(x)/p(x)) = log(σ x q(x)) = 0.

41 Information Inequality, Again Theorem 2.6.3: For two probabilities distribution p and q we have D(p q) 0 and D(p q)=0 if and only if p=q. p(x) D(p q) = p(x) log x q(x) = = = log log(1) 0 x p(x) x q(x) log p(x) p(x) q(x) p(x) Q: What happened here? A: Application of Jensen s inequality to the concave function log:r + R.

42 For Next Tuesday Complete the proof that for d(x,y) = H(X Y) + H(Y X) it holds that d(x,y) + d(y,z) d(x,z). Go through as many entropic inequalities as you want and see which ones are the easy ones and which one require Jensen s inequality. Think about a project you want to work on.

43 CS225/ECE205A, Spring 2005: Information Theory Wim van Dam Engineering 1, Room

44 Formalities Answers to Homework Exercise 1 will be posted. Next week s Midterm will be on Thursday, May 4. Topics: Entropies and source coding. The Midterm and Final are both open book, etc. Also next week, I want to receive the tentative titles and abstracts for your projects.

45 Using Jensen s Inequality Using Jensen s inequality on the log function, we get log(e[1/g(x)]) E[log g(x)] log(e[g(x)]) for all functions g:x R + and distributions p(x). Using this we can prove H(X) log X I(X;Y) 0 D(p q) 0 for all random variables X, Y and distributions p,q.

46 Proving H(X) log X, Twice Directly using Jensen s Inequality on E[log(1/p(X))]: H(X) = E = E = = p p log(e log( log [ logp(x)] [log1/ p(x)] p x X [1/ p(x)]) X p(x) / p(x)) Using nonnegativity of Relative entropy D(p q) between p(x) and q=1/ X : 0 = E = E = E D(p q) p p p [log(p(x) / q(x))] [log(p(x) [log(p(x)) + log = H(X) + log X X )]. X ] Upper bound on the entropy H(X) with the probability distribution p over the finite alphabet X of size X.

47 What We Know Kraft s Inequality: For any binary prefix code with code word lengths l 1,,l m, it must hold that: Conversely, if the lengths l j obey Kraft s inequality, then there is a prefix code with such lengths. m j= 1 2 l j Entropic Inequalities like: D(p q) 0, such that E p [log 1/p(X)] E p [log 1/q(X)] for all p,q. 1 In other words: E p [log 1/q(X)] is minimized for q=p.

48 Source Coding Question Given a probability distribution p 1,,p m, what lengths l 1,,l m do we take such that we minimize the expected length L(C) = Σ j p j l j, while obeying Kraft s inequality? Rough answer: Take l j log p j. Note that if exact this gives: L(C) = Σ j p j l j = Σ j p j log p j = H(p). Using inequalities like D(p q) 0 we will show that: - For all prefix codes we must have: L(C) H(p) - There exists a code such that: L(C) < H(p)+1.

49 Lossless Coding Theorem Shannon s 1948 Lossless Source Coding Theorem: Let X be a random variable with entropy H(X). - For all uniquely decodable codes C it holds that L(C) H(X) - There exists a prefix code C such that L(C) < H(X)+1. Idea: Let X={1,,n}. Any code C with lengths l 1,, l n that obeys Kraft s inequality defines a probability distribution q over {0,1,,n} with q(j) = 2 lj (and q(0) such that Σ j q(j)=1). The expected length of this code: L(C) = E p [ log q(x)].

50 Lower Bound Proof Let p be the probability distribution of variable X. Let code C have lengths l 1,, l n, defining the distribution q over {0,1,,n} with q(j) = 2 lj, such that : L(C) = E p [ log q(x)]. This gives: L(C) H(X) = = = E E p p [ log q(x)] E [log p(x) / q(x)] D(p q) 0 p [ log p(x)] The equality L(C) H(X) = D(p q) also shows that we should try to have q=p to get the smallest possible L(C).

51 The Upper Bound Let p be the probability distribution of variable X. Try defining a prefix code C with l j = log p j. Algebra tells us: j 2 l j = = 1. j j 2 2 logp logp The expected length of this code is: j j Kraft s inequality tells us that there does exist a prefix code C with these lengths l j. E p [l j ] = E p [ log p(x) ] < E p [ log p(x)]+1 = H(X)+1. How to get rid of the annoying +1 term?

52 Lowering the Upper Bound To reduce the H(X)+1 upper bound to H(X)+ε, we apply the same approach to strings in X m instead of X. The new probability distribution p m is defined by: p m (x 1,,x m ) = p(x 1 ) p(x m ) such that H(X m ) = m H(X). We know that there exists a prefix code C m for X m with L(C m ) < H(X m ) +1, hence the average per letter is now bounded by: L(C) m < m H(X ) m + 1 = H(X) + 1 m. By letting m grow, we get L(C m )/m < H(X)+ε, for arbitrary small ε>0.

53 Source Coding Algorithms How to implement Shannon s idea for lossless source coding for a known distribution p(x)? Two important cases: - Huffman codes - Shannon, Shannon-Fano, Shannon-Fano-Elias, and/or Arithmetic codes An important but different setting occurs when we do not know a priori the distribution p. - Lempel-Ziv compression

54 Optimal Coding? Let X={0,1} with p(x=0) = 0.99 and p(x=1) = Shannon s original idea suggests a code with l 0 = log p 0 = 1 bit and l 1 = log p 1 = 7 bits, which is clearly not optimal. Looking at the code tree it is clear that for an optimal code the two longest words must have equal length and differ only in the last bits. Another optimal property: if p j > p k then l j l k. These observations lead to the Huffman code (which is indeed optimal).

55 Huffman Code Assume probabilities p 1 p 2 p m for X={1,,m}. Because p 1 and p 2 are the two smallest probabilities, the lengths l 1 and l 2 are the longest and equal. The node at depth l 1 1 connecting the code words for 1 and 2 can be viewed as a code word for 1 or 2 (with probability p 1 +p 2 ). If 1 or 2 is less likely than 3 (p 1 +p 2 < p 2 ), then l 1 1 l 3. Otherwise (p 1 +p 2 p 2 ), we must have l 1 1 l 3. Applying this line of thought recursively gives a code tree for the Huffman code.

Information Theory. David Rosenberg. June 15, New York University. David Rosenberg (New York University) DS-GA 1003 June 15, / 18

Information Theory. David Rosenberg. June 15, New York University. David Rosenberg (New York University) DS-GA 1003 June 15, / 18 Information Theory David Rosenberg New York University June 15, 2015 David Rosenberg (New York University) DS-GA 1003 June 15, 2015 1 / 18 A Measure of Information? Consider a discrete random variable

More information

Chapter 2: Entropy and Mutual Information. University of Illinois at Chicago ECE 534, Natasha Devroye

Chapter 2: Entropy and Mutual Information. University of Illinois at Chicago ECE 534, Natasha Devroye Chapter 2: Entropy and Mutual Information Chapter 2 outline Definitions Entropy Joint entropy, conditional entropy Relative entropy, mutual information Chain rules Jensen s inequality Log-sum inequality

More information

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information.

Introduction to Information Theory. Uncertainty. Entropy. Surprisal. Joint entropy. Conditional entropy. Mutual information. L65 Dept. of Linguistics, Indiana University Fall 205 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission rate

More information

Dept. of Linguistics, Indiana University Fall 2015

Dept. of Linguistics, Indiana University Fall 2015 L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 28 Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? What is the transmission

More information

1 Introduction to information theory

1 Introduction to information theory 1 Introduction to information theory 1.1 Introduction In this chapter we present some of the basic concepts of information theory. The situations we have in mind involve the exchange of information through

More information

Lecture 2: August 31

Lecture 2: August 31 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 2: August 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy

More information

4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information

4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information 4F5: Advanced Communications and Coding Handout 2: The Typical Set, Compression, Mutual Information Ramji Venkataramanan Signal Processing and Communications Lab Department of Engineering ramji.v@eng.cam.ac.uk

More information

Chapter 2: Source coding

Chapter 2: Source coding Chapter 2: meghdadi@ensil.unilim.fr University of Limoges Chapter 2: Entropy of Markov Source Chapter 2: Entropy of Markov Source Markov model for information sources Given the present, the future is independent

More information

Lecture 22: Final Review

Lecture 22: Final Review Lecture 22: Final Review Nuts and bolts Fundamental questions and limits Tools Practical algorithms Future topics Dr Yao Xie, ECE587, Information Theory, Duke University Basics Dr Yao Xie, ECE587, Information

More information

Entropy as a measure of surprise

Entropy as a measure of surprise Entropy as a measure of surprise Lecture 5: Sam Roweis September 26, 25 What does information do? It removes uncertainty. Information Conveyed = Uncertainty Removed = Surprise Yielded. How should we quantify

More information

Multimedia Communications. Mathematical Preliminaries for Lossless Compression

Multimedia Communications. Mathematical Preliminaries for Lossless Compression Multimedia Communications Mathematical Preliminaries for Lossless Compression What we will see in this chapter Definition of information and entropy Modeling a data source Definition of coding and when

More information

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1

An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1 Kraft s inequality An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if N 2 l i 1 Proof: Suppose that we have a tree code. Let l max = max{l 1,...,

More information

Lecture 1: September 25, A quick reminder about random variables and convexity

Lecture 1: September 25, A quick reminder about random variables and convexity Information and Coding Theory Autumn 207 Lecturer: Madhur Tulsiani Lecture : September 25, 207 Administrivia This course will cover some basic concepts in information and coding theory, and their applications

More information

Lecture 1: Introduction, Entropy and ML estimation

Lecture 1: Introduction, Entropy and ML estimation 0-704: Information Processing and Learning Spring 202 Lecture : Introduction, Entropy and ML estimation Lecturer: Aarti Singh Scribes: Min Xu Disclaimer: These notes have not been subjected to the usual

More information

Information & Correlation

Information & Correlation Information & Correlation Jilles Vreeken 11 June 2014 (TADA) Questions of the day What is information? How can we measure correlation? and what do talking drums have to do with this? Bits and Pieces What

More information

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018

EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 Please submit the solutions on Gradescope. EE376A: Homework #3 Due by 11:59pm Saturday, February 10th, 2018 1. Optimal codeword lengths. Although the codeword lengths of an optimal variable length code

More information

Chapter 9 Fundamental Limits in Information Theory

Chapter 9 Fundamental Limits in Information Theory Chapter 9 Fundamental Limits in Information Theory Information Theory is the fundamental theory behind information manipulation, including data compression and data transmission. 9.1 Introduction o For

More information

Chapter 3 Source Coding. 3.1 An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code

Chapter 3 Source Coding. 3.1 An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code Chapter 3 Source Coding 3. An Introduction to Source Coding 3.2 Optimal Source Codes 3.3 Shannon-Fano Code 3.4 Huffman Code 3. An Introduction to Source Coding Entropy (in bits per symbol) implies in average

More information

Chapter 5: Data Compression

Chapter 5: Data Compression Chapter 5: Data Compression Definition. A source code C for a random variable X is a mapping from the range of X to the set of finite length strings of symbols from a D-ary alphabet. ˆX: source alphabet,

More information

Information Theory and Statistics Lecture 2: Source coding

Information Theory and Statistics Lecture 2: Source coding Information Theory and Statistics Lecture 2: Source coding Łukasz Dębowski ldebowsk@ipipan.waw.pl Ph. D. Programme 2013/2014 Injections and codes Definition (injection) Function f is called an injection

More information

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H.

1 Ex. 1 Verify that the function H(p 1,..., p n ) = k p k log 2 p k satisfies all 8 axioms on H. Problem sheet Ex. Verify that the function H(p,..., p n ) = k p k log p k satisfies all 8 axioms on H. Ex. (Not to be handed in). looking at the notes). List as many of the 8 axioms as you can, (without

More information

Source Coding. Master Universitario en Ingeniería de Telecomunicación. I. Santamaría Universidad de Cantabria

Source Coding. Master Universitario en Ingeniería de Telecomunicación. I. Santamaría Universidad de Cantabria Source Coding Master Universitario en Ingeniería de Telecomunicación I. Santamaría Universidad de Cantabria Contents Introduction Asymptotic Equipartition Property Optimal Codes (Huffman Coding) Universal

More information

Intro to Information Theory

Intro to Information Theory Intro to Information Theory Math Circle February 11, 2018 1. Random variables Let us review discrete random variables and some notation. A random variable X takes value a A with probability P (a) 0. Here

More information

10-704: Information Processing and Learning Fall Lecture 10: Oct 3

10-704: Information Processing and Learning Fall Lecture 10: Oct 3 0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 0: Oct 3 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of

More information

Lecture 3. Mathematical methods in communication I. REMINDER. A. Convex Set. A set R is a convex set iff, x 1,x 2 R, θ, 0 θ 1, θx 1 + θx 2 R, (1)

Lecture 3. Mathematical methods in communication I. REMINDER. A. Convex Set. A set R is a convex set iff, x 1,x 2 R, θ, 0 θ 1, θx 1 + θx 2 R, (1) 3- Mathematical methods in communication Lecture 3 Lecturer: Haim Permuter Scribe: Yuval Carmel, Dima Khaykin, Ziv Goldfeld I. REMINDER A. Convex Set A set R is a convex set iff, x,x 2 R, θ, θ, θx + θx

More information

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy

Information Theory. Coding and Information Theory. Information Theory Textbooks. Entropy Coding and Information Theory Chris Williams, School of Informatics, University of Edinburgh Overview What is information theory? Entropy Coding Information Theory Shannon (1948): Information theory is

More information

Basic Principles of Lossless Coding. Universal Lossless coding. Lempel-Ziv Coding. 2. Exploit dependences between successive symbols.

Basic Principles of Lossless Coding. Universal Lossless coding. Lempel-Ziv Coding. 2. Exploit dependences between successive symbols. Universal Lossless coding Lempel-Ziv Coding Basic principles of lossless compression Historical review Variable-length-to-block coding Lempel-Ziv coding 1 Basic Principles of Lossless Coding 1. Exploit

More information

Solutions to Set #2 Data Compression, Huffman code and AEP

Solutions to Set #2 Data Compression, Huffman code and AEP Solutions to Set #2 Data Compression, Huffman code and AEP. Huffman coding. Consider the random variable ( ) x x X = 2 x 3 x 4 x 5 x 6 x 7 0.50 0.26 0. 0.04 0.04 0.03 0.02 (a) Find a binary Huffman code

More information

COMPSCI 650 Applied Information Theory Jan 21, Lecture 2

COMPSCI 650 Applied Information Theory Jan 21, Lecture 2 COMPSCI 650 Applied Information Theory Jan 21, 2016 Lecture 2 Instructor: Arya Mazumdar Scribe: Gayane Vardoyan, Jong-Chyi Su 1 Entropy Definition: Entropy is a measure of uncertainty of a random variable.

More information

Coding of memoryless sources 1/35

Coding of memoryless sources 1/35 Coding of memoryless sources 1/35 Outline 1. Morse coding ; 2. Definitions : encoding, encoding efficiency ; 3. fixed length codes, encoding integers ; 4. prefix condition ; 5. Kraft and Mac Millan theorems

More information

Lecture 1. Introduction

Lecture 1. Introduction Lecture 1. Introduction What is the course about? Logistics Questionnaire Dr. Yao Xie, ECE587, Information Theory, Duke University What is information? Dr. Yao Xie, ECE587, Information Theory, Duke University

More information

Complex Systems Methods 2. Conditional mutual information, entropy rate and algorithmic complexity

Complex Systems Methods 2. Conditional mutual information, entropy rate and algorithmic complexity Complex Systems Methods 2. Conditional mutual information, entropy rate and algorithmic complexity Eckehard Olbrich MPI MiS Leipzig Potsdam WS 2007/08 Olbrich (Leipzig) 26.10.2007 1 / 18 Overview 1 Summary

More information

3F1 Information Theory, Lecture 3

3F1 Information Theory, Lecture 3 3F1 Information Theory, Lecture 3 Jossy Sayir Department of Engineering Michaelmas 2011, 28 November 2011 Memoryless Sources Arithmetic Coding Sources with Memory 2 / 19 Summary of last lecture Prefix-free

More information

Entropies & Information Theory

Entropies & Information Theory Entropies & Information Theory LECTURE I Nilanjana Datta University of Cambridge,U.K. See lecture notes on: http://www.qi.damtp.cam.ac.uk/node/223 Quantum Information Theory Born out of Classical Information

More information

Lecture 5 - Information theory

Lecture 5 - Information theory Lecture 5 - Information theory Jan Bouda FI MU May 18, 2012 Jan Bouda (FI MU) Lecture 5 - Information theory May 18, 2012 1 / 42 Part I Uncertainty and entropy Jan Bouda (FI MU) Lecture 5 - Information

More information

Noisy channel communication

Noisy channel communication Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 6 Communication channels and Information Some notes on the noisy channel setup: Iain Murray, 2012 School of Informatics, University

More information

Information in Biology

Information in Biology Information in Biology CRI - Centre de Recherches Interdisciplinaires, Paris May 2012 Information processing is an essential part of Life. Thinking about it in quantitative terms may is useful. 1 Living

More information

Example: Letter Frequencies

Example: Letter Frequencies Example: Letter Frequencies i a i p i 1 a 0.0575 2 b 0.0128 3 c 0.0263 4 d 0.0285 5 e 0.0913 6 f 0.0173 7 g 0.0133 8 h 0.0313 9 i 0.0599 10 j 0.0006 11 k 0.0084 12 l 0.0335 13 m 0.0235 14 n 0.0596 15 o

More information

EC2252 COMMUNICATION THEORY UNIT 5 INFORMATION THEORY

EC2252 COMMUNICATION THEORY UNIT 5 INFORMATION THEORY EC2252 COMMUNICATION THEORY UNIT 5 INFORMATION THEORY Discrete Messages and Information Content, Concept of Amount of Information, Average information, Entropy, Information rate, Source coding to increase

More information

Example: Letter Frequencies

Example: Letter Frequencies Example: Letter Frequencies i a i p i 1 a 0.0575 2 b 0.0128 3 c 0.0263 4 d 0.0285 5 e 0.0913 6 f 0.0173 7 g 0.0133 8 h 0.0313 9 i 0.0599 10 j 0.0006 11 k 0.0084 12 l 0.0335 13 m 0.0235 14 n 0.0596 15 o

More information

Lecture 11: Information theory THURSDAY, FEBRUARY 21, 2019

Lecture 11: Information theory THURSDAY, FEBRUARY 21, 2019 Lecture 11: Information theory DANIEL WELLER THURSDAY, FEBRUARY 21, 2019 Agenda Information and probability Entropy and coding Mutual information and capacity Both images contain the same fraction of black

More information

Chapter 2 Date Compression: Source Coding. 2.1 An Introduction to Source Coding 2.2 Optimal Source Codes 2.3 Huffman Code

Chapter 2 Date Compression: Source Coding. 2.1 An Introduction to Source Coding 2.2 Optimal Source Codes 2.3 Huffman Code Chapter 2 Date Compression: Source Coding 2.1 An Introduction to Source Coding 2.2 Optimal Source Codes 2.3 Huffman Code 2.1 An Introduction to Source Coding Source coding can be seen as an efficient way

More information

Data Compression. Limit of Information Compression. October, Examples of codes 1

Data Compression. Limit of Information Compression. October, Examples of codes 1 Data Compression Limit of Information Compression Radu Trîmbiţaş October, 202 Outline Contents Eamples of codes 2 Kraft Inequality 4 2. Kraft Inequality............................ 4 2.2 Kraft inequality

More information

Coding for Discrete Source

Coding for Discrete Source EGR 544 Communication Theory 3. Coding for Discrete Sources Z. Aliyazicioglu Electrical and Computer Engineering Department Cal Poly Pomona Coding for Discrete Source Coding Represent source data effectively

More information

Homework Set #2 Data Compression, Huffman code and AEP

Homework Set #2 Data Compression, Huffman code and AEP Homework Set #2 Data Compression, Huffman code and AEP 1. Huffman coding. Consider the random variable ( x1 x X = 2 x 3 x 4 x 5 x 6 x 7 0.50 0.26 0.11 0.04 0.04 0.03 0.02 (a Find a binary Huffman code

More information

Information Theory and Communication

Information Theory and Communication Information Theory and Communication Ritwik Banerjee rbanerjee@cs.stonybrook.edu c Ritwik Banerjee Information Theory and Communication 1/8 General Chain Rules Definition Conditional mutual information

More information

ELEMENTS O F INFORMATION THEORY

ELEMENTS O F INFORMATION THEORY ELEMENTS O F INFORMATION THEORY THOMAS M. COVER JOY A. THOMAS Preface to the Second Edition Preface to the First Edition Acknowledgments for the Second Edition Acknowledgments for the First Edition x

More information

Computing and Communications 2. Information Theory -Entropy

Computing and Communications 2. Information Theory -Entropy 1896 1920 1987 2006 Computing and Communications 2. Information Theory -Entropy Ying Cui Department of Electronic Engineering Shanghai Jiao Tong University, China 2017, Autumn 1 Outline Entropy Joint entropy

More information

Information in Biology

Information in Biology Lecture 3: Information in Biology Tsvi Tlusty, tsvi@unist.ac.kr Living information is carried by molecular channels Living systems I. Self-replicating information processors Environment II. III. Evolve

More information

Lecture 16. Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code

Lecture 16. Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code Lecture 16 Agenda for the lecture Error-free variable length schemes (contd.): Shannon-Fano-Elias code, Huffman code Variable-length source codes with error 16.1 Error-free coding schemes 16.1.1 The Shannon-Fano-Elias

More information

Information Theory. M1 Informatique (parcours recherche et innovation) Aline Roumy. January INRIA Rennes 1/ 73

Information Theory. M1 Informatique (parcours recherche et innovation) Aline Roumy. January INRIA Rennes 1/ 73 1/ 73 Information Theory M1 Informatique (parcours recherche et innovation) Aline Roumy INRIA Rennes January 2018 Outline 2/ 73 1 Non mathematical introduction 2 Mathematical introduction: definitions

More information

Example: Letter Frequencies

Example: Letter Frequencies Example: Letter Frequencies i a i p i 1 a 0.0575 2 b 0.0128 3 c 0.0263 4 d 0.0285 5 e 0.0913 6 f 0.0173 7 g 0.0133 8 h 0.0313 9 i 0.0599 10 j 0.0006 11 k 0.0084 12 l 0.0335 13 m 0.0235 14 n 0.0596 15 o

More information

UNIT I INFORMATION THEORY. I k log 2

UNIT I INFORMATION THEORY. I k log 2 UNIT I INFORMATION THEORY Claude Shannon 1916-2001 Creator of Information Theory, lays the foundation for implementing logic in digital circuits as part of his Masters Thesis! (1939) and published a paper

More information

Lecture 1: Shannon s Theorem

Lecture 1: Shannon s Theorem Lecture 1: Shannon s Theorem Lecturer: Travis Gagie January 13th, 2015 Welcome to Data Compression! I m Travis and I ll be your instructor this week. If you haven t registered yet, don t worry, we ll work

More information

(Classical) Information Theory III: Noisy channel coding

(Classical) Information Theory III: Noisy channel coding (Classical) Information Theory III: Noisy channel coding Sibasish Ghosh The Institute of Mathematical Sciences CIT Campus, Taramani, Chennai 600 113, India. p. 1 Abstract What is the best possible way

More information

Machine Learning. Lecture 02.2: Basics of Information Theory. Nevin L. Zhang

Machine Learning. Lecture 02.2: Basics of Information Theory. Nevin L. Zhang Machine Learning Lecture 02.2: Basics of Information Theory Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering The Hong Kong University of Science and Technology Nevin L. Zhang

More information

Dynamics and time series: theory and applications. Stefano Marmi Scuola Normale Superiore Lecture 6, Nov 23, 2011

Dynamics and time series: theory and applications. Stefano Marmi Scuola Normale Superiore Lecture 6, Nov 23, 2011 Dynamics and time series: theory and applications Stefano Marmi Scuola Normale Superiore Lecture 6, Nov 23, 2011 Measure-preserving transformations X phase space, μ probability measure Φ:X R observable

More information

Information Theory CHAPTER. 5.1 Introduction. 5.2 Entropy

Information Theory CHAPTER. 5.1 Introduction. 5.2 Entropy Haykin_ch05_pp3.fm Page 207 Monday, November 26, 202 2:44 PM CHAPTER 5 Information Theory 5. Introduction As mentioned in Chapter and reiterated along the way, the purpose of a communication system is

More information

Digital Communications III (ECE 154C) Introduction to Coding and Information Theory

Digital Communications III (ECE 154C) Introduction to Coding and Information Theory Digital Communications III (ECE 154C) Introduction to Coding and Information Theory Tara Javidi These lecture notes were originally developed by late Prof. J. K. Wolf. UC San Diego Spring 2014 1 / 8 I

More information

CS 229r Information Theory in Computer Science Feb 12, Lecture 5

CS 229r Information Theory in Computer Science Feb 12, Lecture 5 CS 229r Information Theory in Computer Science Feb 12, 2019 Lecture 5 Instructor: Madhu Sudan Scribe: Pranay Tankala 1 Overview A universal compression algorithm is a single compression algorithm applicable

More information

Information Theory: Entropy, Markov Chains, and Huffman Coding

Information Theory: Entropy, Markov Chains, and Huffman Coding The University of Notre Dame A senior thesis submitted to the Department of Mathematics and the Glynn Family Honors Program Information Theory: Entropy, Markov Chains, and Huffman Coding Patrick LeBlanc

More information

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006)

MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK UNIT V PART-A. 1. What is binary symmetric channel (AUC DEC 2006) MAHALAKSHMI ENGINEERING COLLEGE-TRICHY QUESTION BANK SATELLITE COMMUNICATION DEPT./SEM.:ECE/VIII UNIT V PART-A 1. What is binary symmetric channel (AUC DEC 2006) 2. Define information rate? (AUC DEC 2007)

More information

Quiz 2 Date: Monday, November 21, 2016

Quiz 2 Date: Monday, November 21, 2016 10-704 Information Processing and Learning Fall 2016 Quiz 2 Date: Monday, November 21, 2016 Name: Andrew ID: Department: Guidelines: 1. PLEASE DO NOT TURN THIS PAGE UNTIL INSTRUCTED. 2. Write your name,

More information

3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H.

3. If a choice is broken down into two successive choices, the original H should be the weighted sum of the individual values of H. Appendix A Information Theory A.1 Entropy Shannon (Shanon, 1948) developed the concept of entropy to measure the uncertainty of a discrete random variable. Suppose X is a discrete random variable that

More information

ELEC 515 Information Theory. Distortionless Source Coding

ELEC 515 Information Theory. Distortionless Source Coding ELEC 515 Information Theory Distortionless Source Coding 1 Source Coding Output Alphabet Y={y 1,,y J } Source Encoder Lengths 2 Source Coding Two coding requirements The source sequence can be recovered

More information

Information Theory. Week 4 Compressing streams. Iain Murray,

Information Theory. Week 4 Compressing streams. Iain Murray, Information Theory http://www.inf.ed.ac.uk/teaching/courses/it/ Week 4 Compressing streams Iain Murray, 2014 School of Informatics, University of Edinburgh Jensen s inequality For convex functions: E[f(x)]

More information

Lecture 3 : Algorithms for source coding. September 30, 2016

Lecture 3 : Algorithms for source coding. September 30, 2016 Lecture 3 : Algorithms for source coding September 30, 2016 Outline 1. Huffman code ; proof of optimality ; 2. Coding with intervals : Shannon-Fano-Elias code and Shannon code ; 3. Arithmetic coding. 1/39

More information

Introduction to Information Theory. B. Škorić, Physical Aspects of Digital Security, Chapter 2

Introduction to Information Theory. B. Škorić, Physical Aspects of Digital Security, Chapter 2 Introduction to Information Theory B. Škorić, Physical Aspects of Digital Security, Chapter 2 1 Information theory What is it? - formal way of counting information bits Why do we need it? - often used

More information

EE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16

EE5139R: Problem Set 4 Assigned: 31/08/16, Due: 07/09/16 EE539R: Problem Set 4 Assigned: 3/08/6, Due: 07/09/6. Cover and Thomas: Problem 3.5 Sets defined by probabilities: Define the set C n (t = {x n : P X n(x n 2 nt } (a We have = P X n(x n P X n(x n 2 nt

More information

Information Theory, Statistics, and Decision Trees

Information Theory, Statistics, and Decision Trees Information Theory, Statistics, and Decision Trees Léon Bottou COS 424 4/6/2010 Summary 1. Basic information theory. 2. Decision trees. 3. Information theory and statistics. Léon Bottou 2/31 COS 424 4/6/2010

More information

AQI: Advanced Quantum Information Lecture 6 (Module 2): Distinguishing Quantum States January 28, 2013

AQI: Advanced Quantum Information Lecture 6 (Module 2): Distinguishing Quantum States January 28, 2013 AQI: Advanced Quantum Information Lecture 6 (Module 2): Distinguishing Quantum States January 28, 2013 Lecturer: Dr. Mark Tame Introduction With the emergence of new types of information, in this case

More information

LECTURE 3. Last time:

LECTURE 3. Last time: LECTURE 3 Last time: Mutual Information. Convexity and concavity Jensen s inequality Information Inequality Data processing theorem Fano s Inequality Lecture outline Stochastic processes, Entropy rate

More information

The binary entropy function

The binary entropy function ECE 7680 Lecture 2 Definitions and Basic Facts Objective: To learn a bunch of definitions about entropy and information measures that will be useful through the quarter, and to present some simple but

More information

Solutions to Homework Set #1 Sanov s Theorem, Rate distortion

Solutions to Homework Set #1 Sanov s Theorem, Rate distortion st Semester 00/ Solutions to Homework Set # Sanov s Theorem, Rate distortion. Sanov s theorem: Prove the simple version of Sanov s theorem for the binary random variables, i.e., let X,X,...,X n be a sequence

More information

Classification & Information Theory Lecture #8

Classification & Information Theory Lecture #8 Classification & Information Theory Lecture #8 Introduction to Natural Language Processing CMPSCI 585, Fall 2007 University of Massachusetts Amherst Andrew McCallum Today s Main Points Automatically categorizing

More information

Lecture Notes on Digital Transmission Source and Channel Coding. José Manuel Bioucas Dias

Lecture Notes on Digital Transmission Source and Channel Coding. José Manuel Bioucas Dias Lecture Notes on Digital Transmission Source and Channel Coding José Manuel Bioucas Dias February 2015 CHAPTER 1 Source and Channel Coding Contents 1 Source and Channel Coding 1 1.1 Introduction......................................

More information

Module 1. Introduction to Digital Communications and Information Theory. Version 2 ECE IIT, Kharagpur

Module 1. Introduction to Digital Communications and Information Theory. Version 2 ECE IIT, Kharagpur Module ntroduction to Digital Communications and nformation Theory Lesson 3 nformation Theoretic Approach to Digital Communications After reading this lesson, you will learn about Scope of nformation Theory

More information

Noisy-Channel Coding

Noisy-Channel Coding Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/05264298 Part II Noisy-Channel Coding Copyright Cambridge University Press 2003.

More information

CS6304 / Analog and Digital Communication UNIT IV - SOURCE AND ERROR CONTROL CODING PART A 1. What is the use of error control coding? The main use of error control coding is to reduce the overall probability

More information

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability

More information

CS4800: Algorithms & Data Jonathan Ullman

CS4800: Algorithms & Data Jonathan Ullman CS4800: Algorithms & Data Jonathan Ullman Lecture 22: Greedy Algorithms: Huffman Codes Data Compression and Entropy Apr 5, 2018 Data Compression How do we store strings of text compactly? A (binary) code

More information

Chapter 2 Review of Classical Information Theory

Chapter 2 Review of Classical Information Theory Chapter 2 Review of Classical Information Theory Abstract This chapter presents a review of the classical information theory which plays a crucial role in this thesis. We introduce the various types of

More information

MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A

MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK. SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A MAHALAKSHMI ENGINEERING COLLEGE QUESTION BANK DEPARTMENT: ECE SEMESTER: IV SUBJECT CODE / Name: EC2252 COMMUNICATION THEORY UNIT-V INFORMATION THEORY PART-A 1. What is binary symmetric channel (AUC DEC

More information

TTIC 31230, Fundamentals of Deep Learning David McAllester, April Information Theory and Distribution Modeling

TTIC 31230, Fundamentals of Deep Learning David McAllester, April Information Theory and Distribution Modeling TTIC 31230, Fundamentals of Deep Learning David McAllester, April 2017 Information Theory and Distribution Modeling Why do we model distributions and conditional distributions using the following objective

More information

3F1 Information Theory, Lecture 3

3F1 Information Theory, Lecture 3 3F1 Information Theory, Lecture 3 Jossy Sayir Department of Engineering Michaelmas 2013, 29 November 2013 Memoryless Sources Arithmetic Coding Sources with Memory Markov Example 2 / 21 Encoding the output

More information

Autumn Coping with NP-completeness (Conclusion) Introduction to Data Compression

Autumn Coping with NP-completeness (Conclusion) Introduction to Data Compression Autumn Coping with NP-completeness (Conclusion) Introduction to Data Compression Kirkpatrick (984) Analogy from thermodynamics. The best crystals are found by annealing. First heat up the material to let

More information

Introduction to Information Theory

Introduction to Information Theory Introduction to Information Theory Gurinder Singh Mickey Atwal atwal@cshl.edu Center for Quantitative Biology Kullback-Leibler Divergence Summary Shannon s coding theorems Entropy Mutual Information Multi-information

More information

Non-binary Distributed Arithmetic Coding

Non-binary Distributed Arithmetic Coding Non-binary Distributed Arithmetic Coding by Ziyang Wang Thesis submitted to the Faculty of Graduate and Postdoctoral Studies In partial fulfillment of the requirements For the Masc degree in Electrical

More information

(Classical) Information Theory II: Source coding

(Classical) Information Theory II: Source coding (Classical) Information Theory II: Source coding Sibasish Ghosh The Institute of Mathematical Sciences CIT Campus, Taramani, Chennai 600 113, India. p. 1 Abstract The information content of a random variable

More information

CSCI 2570 Introduction to Nanocomputing

CSCI 2570 Introduction to Nanocomputing CSCI 2570 Introduction to Nanocomputing Information Theory John E Savage What is Information Theory Introduced by Claude Shannon. See Wikipedia Two foci: a) data compression and b) reliable communication

More information

A Mathematical Theory of Communication

A Mathematical Theory of Communication A Mathematical Theory of Communication Ben Eggers Abstract This paper defines information-theoretic entropy and proves some elementary results about it. Notably, we prove that given a few basic assumptions

More information

Information and Entropy

Information and Entropy Information and Entropy Shannon s Separation Principle Source Coding Principles Entropy Variable Length Codes Huffman Codes Joint Sources Arithmetic Codes Adaptive Codes Thomas Wiegand: Digital Image Communication

More information

Shannon's Theory of Communication

Shannon's Theory of Communication Shannon's Theory of Communication An operational introduction 5 September 2014, Introduction to Information Systems Giovanni Sileno g.sileno@uva.nl Leibniz Center for Law University of Amsterdam Fundamental

More information

lossless, optimal compressor

lossless, optimal compressor 6. Variable-length Lossless Compression The principal engineering goal of compression is to represent a given sequence a, a 2,..., a n produced by a source as a sequence of bits of minimal possible length.

More information

Chapter I: Fundamental Information Theory

Chapter I: Fundamental Information Theory ECE-S622/T62 Notes Chapter I: Fundamental Information Theory Ruifeng Zhang Dept. of Electrical & Computer Eng. Drexel University. Information Source Information is the outcome of some physical processes.

More information

Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information

Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information Entropy and Ergodic Theory Lecture 4: Conditional entropy and mutual information 1 Conditional entropy Let (Ω, F, P) be a probability space, let X be a RV taking values in some finite set A. In this lecture

More information

Information Theory - Entropy. Figure 3

Information Theory - Entropy. Figure 3 Concept of Information Information Theory - Entropy Figure 3 A typical binary coded digital communication system is shown in Figure 3. What is involved in the transmission of information? - The system

More information

10-704: Information Processing and Learning Fall Lecture 9: Sept 28

10-704: Information Processing and Learning Fall Lecture 9: Sept 28 10-704: Information Processing and Learning Fall 2016 Lecturer: Siheng Chen Lecture 9: Sept 28 Note: These notes are based on scribed notes from Spring15 offering of this course. LaTeX template courtesy

More information

3F1 Information Theory Course

3F1 Information Theory Course 3F1 Information Theory Course (supervisor copy) 1 3F1 Information Theory Course Jossy Sayir js851@cam.ac.uk November 15, 211 Contents 1 Introduction to Information Theory 4 1.1 Source Coding.....................................

More information

Shannon s Noisy-Channel Coding Theorem

Shannon s Noisy-Channel Coding Theorem Shannon s Noisy-Channel Coding Theorem Lucas Slot Sebastian Zur February 2015 Abstract In information theory, Shannon s Noisy-Channel Coding Theorem states that it is possible to communicate over a noisy

More information