Chapter 11. Information Theory and Statistics

1. Chapter 11: Information Theory and Statistics. Peng-Hua Wang, Graduate Inst. of Comm. Engineering, National Taipei University.

2. Chapter Outline

Chap. 11 Information Theory and Statistics: 11.1 Method of Types; 11.2 Law of Large Numbers; 11.3 Universal Source Coding; 11.4 Large Deviation Theory; 11.5 Examples of Sanov's Theorem; 11.6 Conditional Limit Theorem; 11.7 Hypothesis Testing; 11.8 Chernoff-Stein Lemma; 11.9 Chernoff Information; 11.10 Fisher Information and the Cramér-Rao Inequality.

3. 11.1 Method of Types

4. Definitions

Let $X_1, X_2, \ldots, X_n$ be a sequence of $n$ symbols from an alphabet $\mathcal{X} = \{a_1, a_2, \ldots, a_M\}$, where $M = |\mathcal{X}|$ is the alphabet size. We write $\mathbf{x}$ (or $x^n$) for a sequence $x_1, x_2, \ldots, x_n$.

The type $P_{\mathbf{x}}$ (or empirical probability distribution) of a sequence $x_1, x_2, \ldots, x_n$ is the relative frequency of each symbol of $\mathcal{X}$:
$$P_{\mathbf{x}}(a) = \frac{N(a \mid \mathbf{x})}{n} \quad \text{for all } a \in \mathcal{X},$$
where $N(a \mid \mathbf{x})$ is the number of times the symbol $a$ occurs in the sequence $\mathbf{x}$.

Example. Let $\mathcal{X} = \{a, b, c\}$ and $\mathbf{x} = aabca$. Then the type $P_{\mathbf{x}} = P_{aabca}$ is
$$P_{\mathbf{x}}(a) = \tfrac{3}{5}, \quad P_{\mathbf{x}}(b) = \tfrac{1}{5}, \quad P_{\mathbf{x}}(c) = \tfrac{1}{5}, \quad \text{or} \quad P_{\mathbf{x}} = \left(\tfrac{3}{5}, \tfrac{1}{5}, \tfrac{1}{5}\right).$$
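As a quick illustration (added here, not part of the original slides), a minimal Python sketch that computes the type of a sequence over a finite alphabet; the function name empirical_type is a placeholder chosen for this example.

from collections import Counter

def empirical_type(x, alphabet):
    # Return the type (empirical distribution) of the sequence x as a dict.
    n = len(x)
    counts = Counter(x)                       # N(a|x) for each symbol a
    return {a: counts.get(a, 0) / n for a in alphabet}

print(empirical_type("aabca", "abc"))         # {'a': 0.6, 'b': 0.2, 'c': 0.2}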

5. Definitions

The type class $T(P)$ is the set of sequences that have the same type $P$:
$$T(P) = \{\mathbf{x} : P_{\mathbf{x}} = P\}.$$

Example. Let $\mathcal{X} = \{a, b, c\}$ and $\mathbf{x} = aabca$. Then the type $P_{\mathbf{x}} = P_{aabca}$ is $P_{\mathbf{x}}(a) = \tfrac{3}{5}$, $P_{\mathbf{x}}(b) = \tfrac{1}{5}$, $P_{\mathbf{x}}(c) = \tfrac{1}{5}$. The type class $T(P_{\mathbf{x}})$ is the set of length-5 sequences that contain three $a$'s, one $b$, and one $c$:
$$T(P_{\mathbf{x}}) = \{aaabc, aabca, abcaa, bcaaa, \ldots\}.$$
The number of elements in $T(P_{\mathbf{x}})$ is
$$|T(P_{\mathbf{x}})| = \binom{5}{3, 1, 1} = \frac{5!}{3!\,1!\,1!} = 20.$$
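To make the count concrete, the following sketch (my addition, not from the slides) enumerates all length-5 strings over {a, b, c} with type (3/5, 1/5, 1/5) and compares the count with the multinomial coefficient.

from itertools import product
from math import factorial

alphabet = "abc"
counts = {"a": 3, "b": 1, "c": 1}             # type (3/5, 1/5, 1/5) with n = 5
n = sum(counts.values())

# Brute-force enumeration of the type class T(P).
type_class = [s for s in product(alphabet, repeat=n)
              if all(s.count(a) == counts[a] for a in alphabet)]

# Multinomial coefficient n! / (product of counts!).
multinomial = factorial(n)
for a in alphabet:
    multinomial //= factorial(counts[a])

print(len(type_class), multinomial)           # both print 20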

6. Definitions

Let $\mathcal{P}_n$ denote the set of types with denominator $n$. For example, if $\mathcal{X} = \{a, b, c\}$,
$$\mathcal{P}_n = \left\{\left(\frac{x_1}{n}, \frac{x_2}{n}, \frac{x_3}{n}\right) : x_1 + x_2 + x_3 = n,\; x_1 \ge 0,\, x_2 \ge 0,\, x_3 \ge 0\right\},$$
where $x_1/n = P(a)$, $x_2/n = P(b)$, $x_3/n = P(c)$.

Theorem. $|\mathcal{P}_n| \le (n+1)^M$.

Proof. $\mathcal{P}_n = \{(x_1/n, x_2/n, \ldots, x_M/n)\}$ where $0 \le x_k \le n$. Since there are at most $n+1$ choices for each $x_k$, the result follows.
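A short Python check (an addition for illustration) that enumerates $\mathcal{P}_n$ for a small alphabet and confirms the polynomial bound:

from itertools import product

def types_with_denominator(n, M):
    # All types (x1/n, ..., xM/n) with nonnegative integer counts summing to n.
    return [c for c in product(range(n + 1), repeat=M) if sum(c) == n]

n, M = 5, 3
P_n = types_with_denominator(n, M)
print(len(P_n), (n + 1) ** M)                 # 21 <= 216, consistent with |P_n| <= (n+1)^M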

7. Observations

The number of sequences of length $n$ is $M^n$ (exponential in $n$). The number of types of length $n$ is at most $(n+1)^M$ (polynomial in $n$). Therefore, at least one type has exponentially many sequences in its type class. In fact, the largest type class has essentially the same number of elements as the entire set of sequences, to first order in the exponent.

8. Theorem

Theorem. If $X_1, X_2, \ldots, X_n$ are drawn i.i.d. according to $Q(x)$, the probability of $\mathbf{x}$ depends only on its type and is given by
$$Q^n(\mathbf{x}) = 2^{-n(H(P_{\mathbf{x}}) + D(P_{\mathbf{x}} \| Q))},$$
where $Q^n(\mathbf{x}) = \Pr(\mathbf{x}) = \prod_{i=1}^{n} \Pr(x_i) = \prod_{i=1}^{n} Q(x_i)$.

Proof.
$$Q^n(\mathbf{x}) = \prod_{i=1}^{n} Q(x_i) = \prod_{a \in \mathcal{X}} Q(a)^{N(a \mid \mathbf{x})} = \prod_{a \in \mathcal{X}} Q(a)^{n P_{\mathbf{x}}(a)} = \prod_{a \in \mathcal{X}} 2^{n P_{\mathbf{x}}(a) \log Q(a)} = 2^{n \sum_{a \in \mathcal{X}} P_{\mathbf{x}}(a) \log Q(a)}.$$

9. Theorem

Proof (cont.). Since
$$\sum_{a \in \mathcal{X}} P_{\mathbf{x}}(a) \log Q(a) = \sum_{a \in \mathcal{X}} \left(P_{\mathbf{x}}(a) \log Q(a) + P_{\mathbf{x}}(a) \log P_{\mathbf{x}}(a) - P_{\mathbf{x}}(a) \log P_{\mathbf{x}}(a)\right) = -H(P_{\mathbf{x}}) - D(P_{\mathbf{x}} \| Q),$$
we have $Q^n(\mathbf{x}) = 2^{-n(H(P_{\mathbf{x}}) + D(P_{\mathbf{x}} \| Q))}$.

Corollary. If $\mathbf{x}$ is in the type class of $Q$, then $Q^n(\mathbf{x}) = 2^{-nH(Q)}$.

Proof. If $\mathbf{x} \in T(Q)$, then $P_{\mathbf{x}} = Q$ and $D(P_{\mathbf{x}} \| Q) = 0$.
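The identity is easy to verify numerically. The sketch below (added here for illustration; the helper names are placeholders) computes both sides for a small example.

from math import log2, prod

def entropy(P):
    return -sum(p * log2(p) for p in P.values() if p > 0)

def kl(P, Q):
    return sum(p * log2(p / Q[a]) for a, p in P.items() if p > 0)

Q = {"a": 0.5, "b": 0.3, "c": 0.2}
x = "aabca"
n = len(x)
Px = {a: x.count(a) / n for a in Q}           # the type of x

direct = prod(Q[c] for c in x)                # Q^n(x) computed symbol by symbol
via_type = 2 ** (-n * (entropy(Px) + kl(Px, Q)))
print(direct, via_type)                       # the two values agree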

10. Size of T(P)

Next, we estimate the size of $T(P)$. The exact size of $T(P)$ is
$$|T(P)| = \binom{n}{nP(a_1), nP(a_2), \ldots, nP(a_M)}.$$
This value is hard to manipulate, so we give a simple bound on $|T(P)|$. We need the following lemmas.

11. Size of T(P)

Lemma. $\dfrac{m!}{n!} \ge n^{m-n}$.

Proof. For $m \ge n$,
$$\frac{m!}{n!} = (n+1)(n+2)\cdots m \ge \underbrace{n \cdot n \cdots n}_{m-n \text{ times}} = n^{m-n}.$$
For $m < n$,
$$\frac{m!}{n!} = \frac{1}{(m+1)(m+2)\cdots n} \ge \frac{1}{\underbrace{n \cdot n \cdots n}_{n-m \text{ times}}} = n^{m-n}.$$
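A tiny numerical check of the lemma (added for illustration only):

from math import factorial

for m in range(0, 8):
    for n in range(1, 8):
        assert factorial(m) / factorial(n) >= n ** (m - n)
print("lemma m!/n! >= n^(m-n) holds on the tested range")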

12. Size of T(P)

Lemma. The type class $T(P)$ has the highest probability among all type classes under the probability distribution $P^n$:
$$P^n(T(P)) \ge P^n(T(\hat{P})) \quad \text{for all } \hat{P} \in \mathcal{P}_n.$$

Proof.
$$\frac{P^n(T(P))}{P^n(T(\hat{P}))} = \frac{|T(P)| \prod_{a \in \mathcal{X}} P(a)^{nP(a)}}{|T(\hat{P})| \prod_{a \in \mathcal{X}} P(a)^{n\hat{P}(a)}} = \frac{\binom{n}{nP(a_1), \ldots, nP(a_M)}}{\binom{n}{n\hat{P}(a_1), \ldots, n\hat{P}(a_M)}} \prod_{a \in \mathcal{X}} P(a)^{n(P(a) - \hat{P}(a))} = \prod_{a \in \mathcal{X}} \frac{(n\hat{P}(a))!}{(nP(a))!}\, P(a)^{n(P(a) - \hat{P}(a))}.$$

13. Size of T(P)

Proof (cont.). Applying the lemma $m!/n! \ge n^{m-n}$ to each factor,
$$\prod_{a \in \mathcal{X}} \frac{(n\hat{P}(a))!}{(nP(a))!}\, P(a)^{n(P(a) - \hat{P}(a))} \ge \prod_{a \in \mathcal{X}} (nP(a))^{n\hat{P}(a) - nP(a)}\, P(a)^{n(P(a) - \hat{P}(a))} = \prod_{a \in \mathcal{X}} n^{n\hat{P}(a) - nP(a)} = n^{\,n \sum_a \hat{P}(a) - n \sum_a P(a)} = n^{\,n - n} = 1.$$

14. Size of T(P)

Theorem. $\dfrac{1}{(n+1)^M}\, 2^{nH(P)} \le |T(P)| \le 2^{nH(P)}$.

Note. The exact size of $T(P)$ is $|T(P)| = \binom{n}{nP(a_1), nP(a_2), \ldots, nP(a_M)}$, which is hard to manipulate.

Proof (upper bound). If $X_1, X_2, \ldots, X_n$ are drawn i.i.d. from $P$, then
$$1 \ge P^n(T(P)) = \sum_{\mathbf{x} \in T(P)} \prod_{a \in \mathcal{X}} P(a)^{nP(a)} = |T(P)| \prod_{a \in \mathcal{X}} 2^{nP(a) \log P(a)} = |T(P)|\, 2^{n \sum_{a} P(a) \log P(a)} = |T(P)|\, 2^{-nH(P)}.$$
Thus $|T(P)| \le 2^{nH(P)}$.

15. Size of T(P)

Proof (lower bound). Using the previous lemma,
$$1 = \sum_{Q \in \mathcal{P}_n} P^n(T(Q)) \le \sum_{Q \in \mathcal{P}_n} \max_{Q' \in \mathcal{P}_n} P^n(T(Q')) = \sum_{Q \in \mathcal{P}_n} P^n(T(P)) \le (n+1)^M P^n(T(P)) = (n+1)^M\, |T(P)|\, 2^{-nH(P)}.$$
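The bounds are easy to check numerically for a small case. The sketch below (an illustration added here, not from the slides) uses the type P = (3/5, 1/5, 1/5) with n = 5 and M = 3.

from math import factorial, log2

P = {"a": 3/5, "b": 1/5, "c": 1/5}
n, M = 5, len(P)

# Exact size of T(P): the multinomial coefficient.
size = factorial(n)
for p in P.values():
    size //= factorial(round(n * p))

H = -sum(p * log2(p) for p in P.values())     # entropy of P in bits
lower = 2 ** (n * H) / (n + 1) ** M
upper = 2 ** (n * H)
print(lower <= size <= upper, lower, size, upper)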

16. Probability of a Type Class

Theorem. For any $P \in \mathcal{P}_n$ and any distribution $Q$, the probability of the type class $T(P)$ under $Q^n$ satisfies
$$\frac{1}{(n+1)^M}\, 2^{-nD(P \| Q)} \le Q^n(T(P)) \le 2^{-nD(P \| Q)}.$$

Proof.
$$Q^n(T(P)) = \sum_{\mathbf{x} \in T(P)} Q^n(\mathbf{x}) = \sum_{\mathbf{x} \in T(P)} 2^{-n(H(P) + D(P \| Q))} = |T(P)|\, 2^{-n(H(P) + D(P \| Q))}.$$
Since
$$\frac{1}{(n+1)^M}\, 2^{nH(P)} \le |T(P)| \le 2^{nH(P)},$$
the bounds follow.
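Again this can be sanity-checked numerically. The following sketch (added for illustration) computes $Q^n(T(P))$ exactly by enumeration for a tiny example and compares it with the two bounds.

from itertools import product
from math import log2, prod

Q = {"a": 0.5, "b": 0.3, "c": 0.2}
counts = {"a": 3, "b": 1, "c": 1}             # type P = (3/5, 1/5, 1/5), n = 5
n, M = sum(counts.values()), len(Q)
P = {a: counts[a] / n for a in Q}

# Exact Q^n(T(P)) by summing over every sequence in the type class.
prob = sum(prod(Q[c] for c in s)
           for s in product(Q, repeat=n)
           if all(s.count(a) == counts[a] for a in Q))

D = sum(P[a] * log2(P[a] / Q[a]) for a in Q if P[a] > 0)
lower = 2 ** (-n * D) / (n + 1) ** M
upper = 2 ** (-n * D)
print(lower <= prob <= upper, lower, prob, upper)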

17. Summary

$|\mathcal{P}_n| \le (n+1)^M$.

$Q^n(\mathbf{x}) = 2^{-n(H(P_{\mathbf{x}}) + D(P_{\mathbf{x}} \| Q))}$.

$\frac{1}{n} \log |T(P)| \to H(P)$ as $n \to \infty$.

$\frac{1}{n} \log Q^n(T(P)) \to -D(P \| Q)$ as $n \to \infty$.

If $X_i \sim Q$, the probability of sequences with type $P \ne Q$ approaches 0 as $n \to \infty$; the typical sequences are those in $T(Q)$.

18. 11.2 Law of Large Numbers

19. Typical Sequences

Given $\epsilon > 0$, the typical set $T_Q^{\epsilon}$ for the distribution $Q^n$ is defined as
$$T_Q^{\epsilon} = \{\mathbf{x} : D(P_{\mathbf{x}} \| Q) \le \epsilon\}.$$
The probability that $\mathbf{x}$ is nontypical is
$$1 - Q^n(T_Q^{\epsilon}) = \sum_{P : D(P \| Q) > \epsilon} Q^n(T(P)) \le \sum_{P : D(P \| Q) > \epsilon} 2^{-nD(P \| Q)} \le \sum_{P : D(P \| Q) > \epsilon} 2^{-n\epsilon} \le (n+1)^M\, 2^{-n\epsilon} = 2^{-n\left(\epsilon - M \frac{\log(n+1)}{n}\right)}.$$

20. Theorem

Theorem. Let $X_1, X_2, \ldots$ be i.i.d. $\sim P(x)$. Then
$$\Pr\left(D(P_{\mathbf{x}} \| P) > \epsilon\right) \le 2^{-n\left(\epsilon - M \frac{\log(n+1)}{n}\right)},$$
and hence $D(P_{\mathbf{x}} \| P) \to 0$ with probability 1.
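As an illustration (not from the slides), the sketch below estimates $\Pr(D(P_{\mathbf{x}} \| P) > \epsilon)$ by Monte Carlo for a small Bernoulli source and compares it with the polynomial-times-exponential bound; the parameter values are arbitrary choices.

import random
from math import log2

random.seed(0)

def kl(P, Q):
    return sum(p * log2(p / q) for p, q in zip(P, Q) if p > 0)

P = (0.3, 0.7)                                # source distribution on {0, 1}
n, eps, trials = 400, 0.05, 5000

hits = 0
for _ in range(trials):
    x = random.choices((0, 1), weights=P, k=n)
    Px = (x.count(0) / n, x.count(1) / n)     # empirical type of the sample
    if kl(Px, P) > eps:
        hits += 1

bound = 2 ** (-n * (eps - len(P) * log2(n + 1) / n))
print(hits / trials, bound)                   # empirical frequency vs. the (loose) upper bound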

21. 11.3 Universal Source Coding

22. Introduction

An i.i.d. source with a known distribution $p(x)$ can be compressed to its entropy $H(X)$ by Huffman coding. If the wrong code, designed for an incorrect distribution $q(x)$, is used, a penalty of $D(p \| q)$ bits per symbol is incurred. Is there a universal code of rate $R$ that is sufficient to compress every i.i.d. source with entropy $H(X) < R$?

23. Concept

There are at most $2^{nH(P)}$ sequences of type $P$. There are no more than $(n+1)^{|\mathcal{X}|}$ types (a polynomial number). Hence there are no more than $(n+1)^{|\mathcal{X}|}\, 2^{nH(P)}$ sequences to describe. If $H(P) < R$, there are no more than $(n+1)^{|\mathcal{X}|}\, 2^{nR}$ such sequences to describe, which requires about $nR$ bits as $n \to \infty$.

24. 11.4 Large Deviation Theory

25. Large Deviation Theory

If the $X_i$ are i.i.d. Bernoulli with $\Pr(X_i = 1) = \tfrac{1}{3}$, what is the probability that $\frac{1}{n}\sum_{i=1}^{n} X_i$ is near $\tfrac{1}{3}$? This is a small deviation (deviation means deviation from the expected outcome), and the probability is near 1.

What is the probability that $\frac{1}{n}\sum_{i=1}^{n} X_i$ is greater than $\tfrac{3}{4}$? This is a large deviation, and the probability is exponentially small. We might estimate the exponent using the central limit theorem, but this is a poor approximation beyond a few standard deviations.

We note that $\frac{1}{n}\sum_{i=1}^{n} X_i = \tfrac{3}{4}$ is equivalent to $P_{\mathbf{x}} = \left(\tfrac{1}{4}, \tfrac{3}{4}\right)$, writing the type as (fraction of 0's, fraction of 1's), so that $Q = \left(\tfrac{2}{3}, \tfrac{1}{3}\right)$. Thus the probability is approximately
$$2^{-nD(P_{\mathbf{x}} \| Q)} = 2^{-nD\left(\left(\frac{1}{4}, \frac{3}{4}\right) \,\middle\|\, \left(\frac{2}{3}, \frac{1}{3}\right)\right)}.$$
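To see how good this exponential approximation is, the following sketch (my addition) compares $2^{-nD((1/4,3/4)\|(2/3,1/3))}$ with the exact binomial tail probability $\Pr\left(\sum_i X_i \ge \frac{3n}{4}\right)$ for a moderate $n$.

from math import comb, log2

def kl(P, Q):
    return sum(p * log2(p / q) for p, q in zip(P, Q) if p > 0)

q1 = 1 / 3                                    # Pr(X_i = 1)
n = 100
k = 75                                        # threshold: at least 3n/4 ones

exact = sum(comb(n, j) * q1**j * (1 - q1)**(n - j) for j in range(k, n + 1))
approx = 2 ** (-n * kl((1/4, 3/4), (2/3, 1/3)))
print(exact, approx)                          # same exponential order of magnitude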

26. Definition

Let $E$ be a subset of the set of probability mass functions. We write (with a slight abuse of notation)
$$Q^n(E) = Q^n(E \cap \mathcal{P}_n) = \sum_{\mathbf{x} :\, P_{\mathbf{x}} \in E \cap \mathcal{P}_n} Q^n(\mathbf{x}).$$
Why do we say this is a slight abuse of notation? Because $Q^n(\cdot)$ in its original meaning represents the probability of a set of sequences, but here we borrow the notation for a set of probability mass functions.

For example, let $|\mathcal{X}| = 2$ and let $E_1$ be the set of probability mass functions with mean 1; then $E_1 \cap \mathcal{P}_n \ne \emptyset$ for every $n$. Let $E_2$ be the set of probability mass functions with mean $\sqrt{2}/2$; then $E_2 \cap \mathcal{P}_n = \emptyset$. (Why?)

27. Definition

If $E$ contains a relative entropy neighborhood of $Q$ (i.e., $\{P : D(P \| Q) \le \epsilon\} \subseteq E$ for some $\epsilon > 0$), then by the weak law of large numbers $Q^n(E) \to 1$. Specifically,
$$Q^n(E) \ge \Pr\left(D(P_{\mathbf{x}} \| Q) \le \epsilon\right) \ge 1 - 2^{-n\left(\epsilon - |\mathcal{X}| \frac{\log(n+1)}{n}\right)} \to 1.$$
Otherwise, $Q^n(E) \to 0$ exponentially fast. We will use the method of types to calculate the exponent (the rate function).

28. Example

Suppose we observe that the sample average of $g(X)$ is greater than or equal to $\alpha$. This event is equivalent to the event $P_{\mathbf{X}} \in E \cap \mathcal{P}_n$, where
$$E = \left\{P : \sum_{a \in \mathcal{X}} g(a) P(a) \ge \alpha\right\},$$
because
$$\frac{1}{n}\sum_{i=1}^{n} g(X_i) \ge \alpha \;\Longleftrightarrow\; \sum_{a \in \mathcal{X}} P_{\mathbf{X}}(a)\, g(a) \ge \alpha \;\Longleftrightarrow\; P_{\mathbf{X}} \in E \cap \mathcal{P}_n.$$
Thus
$$\Pr\left(\frac{1}{n}\sum_{i=1}^{n} g(X_i) \ge \alpha\right) = Q^n(E \cap \mathcal{P}_n) = Q^n(E).$$

29. Sanov's Theorem

Theorem. Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\sim Q(x)$, and let $E \subseteq \mathcal{P}$ be a set of probability distributions. Then
$$Q^n(E) = Q^n(E \cap \mathcal{P}_n) \le (n+1)^{|\mathcal{X}|}\, 2^{-nD(P^* \| Q)},$$
where $P^* = \arg\min_{P \in E} D(P \| Q)$ is the distribution in $E$ that is closest to $Q$ in relative entropy. If, in addition, $E \cap \mathcal{P}_n \ne \emptyset$ for all $n \ge n_0$ for some $n_0$, then
$$\frac{1}{n} \log Q^n(E) \to -D(P^* \| Q).$$

30. Proof of Upper Bound

$$Q^n(E) = \sum_{P \in E \cap \mathcal{P}_n} Q^n(T(P)) \le \sum_{P \in E \cap \mathcal{P}_n} 2^{-nD(P \| Q)} \le \sum_{P \in E \cap \mathcal{P}_n} \max_{P \in E \cap \mathcal{P}_n} 2^{-nD(P \| Q)} \le \sum_{P \in E \cap \mathcal{P}_n} \max_{P \in E} 2^{-nD(P \| Q)}$$
$$= \sum_{P \in E \cap \mathcal{P}_n} 2^{-n \min_{P \in E} D(P \| Q)} = \sum_{P \in E \cap \mathcal{P}_n} 2^{-nD(P^* \| Q)} \le (n+1)^{|\mathcal{X}|}\, 2^{-nD(P^* \| Q)}.$$

31. Proof of Lower Bound

Since $E \cap \mathcal{P}_n \ne \emptyset$ for all $n \ge n_0$, we can find a sequence of distributions $P_n \in E \cap \mathcal{P}_n$ such that $D(P_n \| Q) \to D(P^* \| Q)$, and
$$Q^n(E) = \sum_{P \in E \cap \mathcal{P}_n} Q^n(T(P)) \ge Q^n(T(P_n)) \ge \frac{1}{(n+1)^{|\mathcal{X}|}}\, 2^{-nD(P_n \| Q)}.$$
Accordingly, $D(P^* \| Q)$ is the large deviation rate function.
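The two bounds in Sanov's theorem can be checked numerically for a small alphabet. The sketch below (an added illustration; the alphabet, threshold, and variable names are arbitrary choices) takes the event E = {P : sum_a a*P(a) >= alpha} over {1, 2, 3}, computes Q^n(E) exactly by summing type-class probabilities, and compares it with the exponential bound at the closest type in E.

from itertools import product
from math import factorial, log2

Q = {1: 0.5, 2: 0.3, 3: 0.2}
alphabet = sorted(Q)                          # [1, 2, 3]
n, alpha = 30, 2.4                            # event E: sample mean >= alpha

def type_class_prob(counts):
    # Exact Q^n(T(P)): multinomial coefficient times prod_a Q(a)^count(a).
    size = factorial(n)
    p = 1.0
    for a, c in zip(alphabet, counts):
        size //= factorial(c)
        p *= Q[a] ** c
    return size * p

def kl(P):
    return sum(p * log2(p / Q[a]) for a, p in zip(alphabet, P) if p > 0)

# Enumerate every type in P_n whose mean is at least alpha (the types in E).
types_in_E = [c for c in product(range(n + 1), repeat=3)
              if sum(c) == n and sum(a * ci for a, ci in zip(alphabet, c)) / n >= alpha]

exact = sum(type_class_prob(c) for c in types_in_E)
D_min = min(kl(tuple(ci / n for ci in c)) for c in types_in_E)
lower = 2 ** (-n * D_min) / (n + 1) ** 3
upper = (n + 1) ** 3 * 2 ** (-n * D_min)
print(lower <= exact <= upper, exact, 2 ** (-n * D_min))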

32. Example 1

Suppose that we toss a fair die $n$ times. What is the probability that the average of the throws is greater than or equal to 4?

From Sanov's theorem, the large deviation rate function is $D(P^* \| Q)$, where $P^*$ minimizes $D(P \| Q)$ over all distributions $P$ that satisfy
$$\sum_{i=1}^{6} i\, p_i \ge 4, \qquad \sum_{i=1}^{6} p_i = 1.$$
Using Lagrange multipliers, we construct the cost function
$$J = D(P \| Q) + \lambda \sum_{i=1}^{6} i\, p_i + \mu \sum_{i=1}^{6} p_i = \sum_{i=1}^{6} p_i \ln \frac{p_i}{q_i} + \lambda \sum_{i=1}^{6} i\, p_i + \mu \sum_{i=1}^{6} p_i.$$

33. Example 1

Setting $\frac{\partial J}{\partial p_i} = 0$ gives
$$\ln(6 p_i) + 1 + i\lambda + \mu = 0 \quad\Longrightarrow\quad p_i = \frac{e^{-1-\mu}}{6}\, e^{-i\lambda}.$$
Substituting into the constraints,
$$\sum_{i=1}^{6} i\, p_i = \frac{e^{-1-\mu}}{6} \sum_{i=1}^{6} i\, e^{-i\lambda} = 4, \qquad \sum_{i=1}^{6} p_i = \frac{e^{-1-\mu}}{6} \sum_{i=1}^{6} e^{-i\lambda} = 1.$$
Solving these numerically for $\lambda$ and $\mu$ gives
$$P^* = (0.1031,\ 0.1227,\ 0.1461,\ 0.1740,\ 0.2072,\ 0.2468).$$
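A small Python sketch (added here, not part of the slides) that solves the mean constraint for $e^{-\lambda}$ by bisection, reproduces the distribution $P^*$ above, and reports the resulting exponent $D(P^* \| Q)$ in bits:

from math import log2

def tilted(t):
    # Distribution p_i proportional to t**i on {1,...,6} (t plays the role of e^{-lambda}).
    w = [t ** i for i in range(1, 7)]
    s = sum(w)
    return [wi / s for wi in w]

def mean(p):
    return sum(i * pi for i, pi in zip(range(1, 7), p))

# Bisection on t: the mean of the tilted distribution is increasing in t.
lo, hi = 1.0, 3.0
for _ in range(60):
    mid = (lo + hi) / 2
    if mean(tilted(mid)) < 4:
        lo = mid
    else:
        hi = mid

P_star = tilted((lo + hi) / 2)
D_star = sum(p * log2(6 * p) for p in P_star)  # D(P* || uniform) in bits
print([round(p, 4) for p in P_star])           # ~ [0.1031, 0.1227, 0.1461, 0.1740, 0.2072, 0.2468]
print(D_star)                                  # the large deviation exponent

The printed distribution matches the $P^*$ quoted on the slide, which is a useful cross-check on the Lagrange-multiplier derivation.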

34. Example 1

The probability that the average of the throws is greater than or equal to 4 is therefore approximately $2^{-nD(P^* \| Q)}$.
