Chapter 11. Information Theory and Statistics
1 Chapter 11 Information Theory and Statistics
Peng-Hua Wang
Graduate Inst. of Comm. Engineering, National Taipei University
2 Chapter Outline
Chap. 11 Information Theory and Statistics
11.1 Method of Types
11.2 Law of Large Numbers
11.3 Universal Source Coding
11.4 Large Deviation Theory
11.5 Examples of Sanov's Theorem
11.6 Conditional Limit Theorem
11.7 Hypothesis Testing
11.8 Chernoff-Stein Lemma
11.9 Chernoff Information
11.10 Fisher Information and the Cramér-Rao Inequality
Peng-Hua Wang, May 21, 2012, Information Theory, Chap. 11, p. 2/34
3 11.1 Method of Types
4 Definitions
Let X_1, X_2, ... be a sequence of n symbols from an alphabet X = {a_1, a_2, ..., a_M}, where M = |X| is the alphabet size. x (also written x^n) denotes the sequence x_1, x_2, ..., x_n.
The type P_x (or empirical probability distribution) of a sequence x_1, x_2, ..., x_n is the relative frequency of each symbol of X:
P_x(a) = N(a|x)/n for all a ∈ X,
where N(a|x) is the number of times the symbol a occurs in the sequence x.
Example. Let X = {a, b, c} and x = aabca. Then the type P_x = P_aabca is
P_x(a) = 3/5, P_x(b) = 1/5, P_x(c) = 1/5, or P_x = (3/5, 1/5, 1/5).
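The empirical-distribution definition is easy to check by counting; a minimal Python sketch (the function name `type_of` is our own, not from the slides):

```python
from collections import Counter

def type_of(x, alphabet):
    """Return the type P_x(a) = N(a|x)/n for every symbol a in the alphabet."""
    n = len(x)
    counts = Counter(x)
    return {a: counts[a] / n for a in alphabet}

P_x = type_of("aabca", "abc")
print(P_x)  # {'a': 0.6, 'b': 0.2, 'c': 0.2}
```

This reproduces the slide's example P_aabca = (3/5, 1/5, 1/5).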
5 Definitions
The type class T(P) is the set of sequences that have the same type: T(P) = {x : P_x = P}.
Example. Let X = {a, b, c} and x = aabca. Then the type P_x = P_aabca is P_x(a) = 3/5, P_x(b) = 1/5, P_x(c) = 1/5. The type class T(P_x) is the set of all length-5 sequences that have 3 a's, 1 b, and 1 c:
T(P_x) = {aaabc, aabca, abcaa, bcaaa, ...}.
The number of elements in T(P_x) is
|T(P_x)| = (5 choose 3, 1, 1) = 5!/(3! 1! 1!) = 20.
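The count |T(P_x)| = 20 can be verified by brute force, comparing an explicit enumeration against the multinomial coefficient:

```python
from itertools import product
from math import factorial

# Enumerate all 3^5 length-5 sequences over {a, b, c} and keep those
# with the type (3/5, 1/5, 1/5), i.e. 3 a's, 1 b, and 1 c.
members = [s for s in product("abc", repeat=5)
           if (s.count("a"), s.count("b"), s.count("c")) == (3, 1, 1)]

# Multinomial coefficient 5!/(3! 1! 1!)
multinomial = factorial(5) // (factorial(3) * factorial(1) * factorial(1))
print(len(members), multinomial)  # 20 20
```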
6 Definitions
Let P_n denote the set of types with denominator n. For example, if X = {a, b, c},
P_n = {(x_1/n, x_2/n, x_3/n) : x_1 + x_2 + x_3 = n, x_1 ≥ 0, x_2 ≥ 0, x_3 ≥ 0},
where x_1/n = P(a), x_2/n = P(b), x_3/n = P(c).
Theorem. |P_n| ≤ (n + 1)^M.
Proof. P_n = {(x_1/n, x_2/n, ..., x_M/n)}, where 0 ≤ x_k ≤ n. Since there are n + 1 choices for each x_k, the result follows.
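The polynomial bound on the number of types can be checked by enumerating P_n directly; a small sketch for M = 3:

```python
from itertools import product

# All types with denominator n over an alphabet of size M:
# nonnegative integer count vectors (x_1, ..., x_M) summing to n.
def types(n, M):
    return [c for c in product(range(n + 1), repeat=M) if sum(c) == n]

M = 3
for n in (1, 2, 5, 10):
    assert len(types(n, M)) <= (n + 1) ** M   # the theorem's bound
print(len(types(5, 3)))  # 21 types of denominator 5 over a 3-symbol alphabet
```

The exact count is the number of compositions, C(n+M-1, M-1) = C(7, 2) = 21 for n = 5, M = 3, comfortably below the bound (n+1)^M = 216.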
7 Observations
The number of sequences of length n is M^n (exponential in n). The number of types of length n is at most (n + 1)^M (polynomial in n). Therefore, at least one type has exponentially many sequences in its type class. In fact, the largest type class has essentially the same number of elements as the entire set of sequences.
8 Theorem
Theorem. If X_1, X_2, ..., X_n are drawn i.i.d. according to Q(x), the probability of x depends only on its type and is given by
Q^n(x) = 2^{-n(H(P_x) + D(P_x‖Q))},
where
Q^n(x) = Pr(x) = ∏_{i=1}^n Pr(x_i) = ∏_{i=1}^n Q(x_i).
Proof.
Q^n(x) = ∏_{i=1}^n Q(x_i) = ∏_{a∈X} Q(a)^{N(a|x)} = ∏_{a∈X} Q(a)^{nP_x(a)}
= ∏_{a∈X} 2^{nP_x(a) log Q(a)} = 2^{n Σ_{a∈X} P_x(a) log Q(a)}.
9 Theorem
Proof (cont.) Since
Σ_{a∈X} P_x(a) log Q(a) = Σ_{a∈X} (P_x(a) log (Q(a)/P_x(a)) + P_x(a) log P_x(a))
= -D(P_x‖Q) - H(P_x),
we have Q^n(x) = 2^{-n(H(P_x) + D(P_x‖Q))}.
Corollary. If x is in the type class of Q, then Q^n(x) = 2^{-nH(Q)}.
Proof. If x ∈ T(Q), then P_x = Q and D(P_x‖Q) = 0.
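The identity proved on the two slides above can be checked numerically; a minimal sketch, with our own choice of Q and x:

```python
from math import log2, prod

def H(P):
    """Entropy in bits of a pmf given as a dict."""
    return -sum(p * log2(p) for p in P.values() if p > 0)

def D(P, Q):
    """Relative entropy D(P||Q) in bits."""
    return sum(p * log2(p / Q[a]) for a, p in P.items() if p > 0)

Q = {'a': 0.5, 'b': 0.3, 'c': 0.2}
x = "aabca"
n = len(x)
P_x = {a: x.count(a) / n for a in Q}

direct = prod(Q[c] for c in x)                 # product of Q(x_i)
via_type = 2 ** (-n * (H(P_x) + D(P_x, Q)))    # 2^{-n(H(P_x)+D(P_x||Q))}
print(direct, via_type)                         # the two values agree
```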
10 Size of T(P)
Next, we estimate the size of T(P). The exact size of T(P) is the multinomial coefficient
|T(P)| = (n choose nP(a_1), nP(a_2), ..., nP(a_M)).
This value is hard to manipulate, so we give simple bounds on |T(P)|. We need the following lemmas.
11 Size of T(P)
Lemma. For nonnegative integers m and n, m!/n! ≥ n^{m-n}.
Proof. For m ≥ n,
m!/n! = (n+1)(n+2)···m ≥ n · n ··· n (m - n factors) = n^{m-n}.
For m < n,
m!/n! = 1/((m+1)(m+2)···n) ≥ 1/(n · n ··· n) (n - m factors) = n^{m-n}.
12 Size of T(P)
Lemma. The type class T(P) has the highest probability among all type classes under the probability distribution P^n:
P^n(T(P)) ≥ P^n(T(P̂)) for all P̂ ∈ P_n.
Proof.
P^n(T(P)) / P^n(T(P̂)) = (|T(P)| ∏_{a∈X} P(a)^{nP(a)}) / (|T(P̂)| ∏_{a∈X} P(a)^{nP̂(a)})
= ((n choose nP(a_1), ..., nP(a_M)) / (n choose nP̂(a_1), ..., nP̂(a_M))) ∏_{a∈X} P(a)^{n(P(a) - P̂(a))}
= ∏_{a∈X} ((nP̂(a))! / (nP(a))!) P(a)^{n(P(a) - P̂(a))}.
13 Size of T(P)
Proof (cont.) Applying the previous lemma with m = nP̂(a) and n = nP(a),
∏_{a∈X} ((nP̂(a))! / (nP(a))!) P(a)^{n(P(a) - P̂(a))} ≥ ∏_{a∈X} (nP(a))^{nP̂(a) - nP(a)} P(a)^{n(P(a) - P̂(a))}
= ∏_{a∈X} n^{nP̂(a) - nP(a)}
= n^{n Σ_a P̂(a) - n Σ_a P(a)} = n^{n-n} = 1.
14 Size of T(P)
Theorem. (1/(n+1)^M) 2^{nH(P)} ≤ |T(P)| ≤ 2^{nH(P)}.
Note. The exact size of T(P) is
|T(P)| = (n choose nP(a_1), nP(a_2), ..., nP(a_M));
this value is hard to manipulate.
Proof (upper bound). If X_1, X_2, ..., X_n are drawn i.i.d. from P, then
1 ≥ P^n(T(P)) = Σ_{x∈T(P)} ∏_{a∈X} P(a)^{nP(a)} = |T(P)| ∏_{a∈X} 2^{nP(a) log P(a)} = |T(P)| 2^{n Σ_a P(a) log P(a)} = |T(P)| 2^{-nH(P)}.
Thus, |T(P)| ≤ 2^{nH(P)}.
15 Size of T(P)
Proof (lower bound).
1 = Σ_{Q∈P_n} P^n(T(Q)) ≤ Σ_{Q∈P_n} max_{Q∈P_n} P^n(T(Q)) = Σ_{Q∈P_n} P^n(T(P)) ≤ (n+1)^M P^n(T(P)) = (n+1)^M |T(P)| 2^{-nH(P)}.
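The theorem's bounds can be sanity-checked for the running example P = (3/5, 1/5, 1/5), where n = 5, M = 3, and |T(P)| = 20:

```python
from math import factorial, log2

# Check 2^{nH(P)}/(n+1)^M <= |T(P)| <= 2^{nH(P)} for P = (3/5, 1/5, 1/5).
counts = (3, 1, 1)
n, M = sum(counts), len(counts)

size = factorial(n)
for c in counts:
    size //= factorial(c)                      # exact multinomial |T(P)|

H = -sum((c / n) * log2(c / n) for c in counts)  # entropy of P in bits
lower = 2 ** (n * H) / (n + 1) ** M
upper = 2 ** (n * H)
print(lower, size, upper)   # |T(P)| = 20 sits between the two bounds
```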
16 Probability of a Type Class
Theorem. For any P ∈ P_n and any distribution Q, the probability of the type class T(P) under Q^n satisfies
(1/(n+1)^M) 2^{-nD(P‖Q)} ≤ Q^n(T(P)) ≤ 2^{-nD(P‖Q)}.
Proof.
Q^n(T(P)) = Σ_{x∈T(P)} Q^n(x) = Σ_{x∈T(P)} 2^{-n(H(P) + D(P‖Q))} = |T(P)| 2^{-n(H(P) + D(P‖Q))},
since P_x = P for every x ∈ T(P). The result then follows from
(1/(n+1)^M) 2^{nH(P)} ≤ |T(P)| ≤ 2^{nH(P)}.
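These bounds, too, can be verified numerically; a sketch with our own example P = (3/5, 1/5, 1/5) and Q = (0.5, 0.3, 0.2):

```python
from math import factorial, log2

Q = {'a': 0.5, 'b': 0.3, 'c': 0.2}
counts = {'a': 3, 'b': 1, 'c': 1}     # type P = (3/5, 1/5, 1/5)
n, M = sum(counts.values()), len(counts)

size = factorial(n)
for c in counts.values():
    size //= factorial(c)

prob = size * 1.0
for a, c in counts.items():
    prob *= Q[a] ** c                 # Q^n(T(P)) = |T(P)| * prod_a Q(a)^{nP(a)}

D = sum((c / n) * log2((c / n) / Q[a]) for a, c in counts.items())
print(2 ** (-n * D) / (n + 1) ** M, prob, 2 ** (-n * D))  # bounds hold
```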
17 Summary
|P_n| ≤ (n+1)^M.
Q^n(x) = 2^{-n(H(P_x) + D(P_x‖Q))}.
(1/n) log |T(P)| → H(P) as n → ∞.
(1/n) log Q^n(T(P)) → -D(P‖Q) as n → ∞.
If X_i ~ Q, the probability of sequences with type P ≠ Q approaches 0 as n → ∞. The typical sequences are those in T(Q).
18 11.2 Law of Large Numbers
19 Typical Sequences
Given ε > 0, the typical set T_Q^ε for the distribution Q^n is defined as
T_Q^ε = {x : D(P_x‖Q) ≤ ε}.
The probability that x is nontypical is
1 - Q^n(T_Q^ε) = Σ_{P: D(P‖Q)>ε} Q^n(T(P))
≤ Σ_{P: D(P‖Q)>ε} 2^{-nD(P‖Q)}
≤ Σ_{P: D(P‖Q)>ε} 2^{-nε}
≤ (n+1)^M 2^{-nε}
= 2^{-n(ε - M log(n+1)/n)}.
20 Theorem
Theorem. Let X_1, X_2, ... be i.i.d. ~ P(x). Then
Pr(D(P_x‖P) > ε) ≤ 2^{-n(ε - M log(n+1)/n)}.
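For a binary source the left-hand side can be computed exactly by summing over the n+1 types, which makes the bound concrete. A sketch assuming a Bernoulli(1/3) source (our own choice of n and ε):

```python
from math import comb, log2

# Exact Pr(D(P_x||P) > eps) for Bernoulli(1/3), versus the bound
# 2^{-n(eps - M log(n+1)/n)} = (n+1)^M 2^{-n eps}, with M = 2.
p1 = 1 / 3
n, eps = 200, 0.1

def Dbin(p, q):
    """D((p, 1-p) || (q, 1-q)) in bits."""
    d = 0.0
    if p > 0:
        d += p * log2(p / q)
    if p < 1:
        d += (1 - p) * log2((1 - p) / (1 - q))
    return d

prob = sum(comb(n, k) * p1**k * (1 - p1)**(n - k)
           for k in range(n + 1) if Dbin(k / n, p1) > eps)
bound = (n + 1) ** 2 * 2 ** (-n * eps)
print(prob, bound)   # the exact probability is far below the bound
```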
21 11.3 Universal Source Coding
22 Introduction
An i.i.d. source with a known distribution p(x) can be compressed to its entropy H(X) by Huffman coding. If the code is designed for an incorrect distribution q(x), a penalty of D(p‖q) bits is incurred.
Is there a universal code of rate R that is sufficient to compress every i.i.d. source with entropy H(X) < R?
23 Concept
There are at most 2^{nH(P)} sequences of type P. There are no more than (n+1)^{|X|} (polynomially many) types. Hence there are no more than (n+1)^{|X|} 2^{nH(P)} sequences to describe. If H(P) < R, there are no more than (n+1)^{|X|} 2^{nR} sequences to describe, which needs about nR bits as n → ∞.
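The idea above is a two-part description: name the type, then name the index of the sequence within its type class. A minimal sketch for binary strings (brute-force enumeration, function names our own; not the textbook's construction verbatim):

```python
from itertools import product
from math import ceil, comb, log2

# Two-part "type code": the type (count of 1s) costs about log(n+1) bits,
# and the index within T(P) costs at most about nH(P) bits.
def encode(x):
    n, k = len(x), x.count("1")
    members = [s for s in product("01", repeat=n) if s.count("1") == k]
    return k, members.index(tuple(x))      # (type, index within type class)

def decode(n, k, idx):
    members = [s for s in product("01", repeat=n) if s.count("1") == k]
    return "".join(members[idx])

x = "0110101"
k, idx = encode(x)
assert decode(len(x), k, idx) == x         # lossless round trip
bits = ceil(log2(len(x) + 1)) + ceil(log2(comb(len(x), k)))
print(k, idx, bits)                        # description length in bits
```

The description length is independent of the true source distribution, which is exactly the universality argued on the slide.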
24 11.4 Large Deviation Theory
25 Large Deviation Theory
If X_i is i.i.d. Bernoulli with p(X_i = 1) = 1/3, what is the probability that (1/n) Σ_{i=1}^n X_i is near 1/3? This is a small deviation (deviation means deviation from the expected outcome); the probability is near 1.
What is the probability that (1/n) Σ_{i=1}^n X_i is greater than 3/4? This is a large deviation; the probability is exponentially small. We might estimate the exponent using the central limit theorem, but this is a poor approximation for more than a few standard deviations.
We note that (1/n) Σ_{i=1}^n X_i = 3/4 is equivalent to P_x = (1/4, 3/4). Thus, the probability is approximately
2^{-nD(P_x‖Q)} = 2^{-nD((1/4, 3/4) ‖ (2/3, 1/3))}.
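The exponent in this example can be computed and compared against the exact binomial tail; a sketch (our own choice of n for the comparison):

```python
from math import comb, log2

# Rate for Pr((1/n) sum X_i >= 3/4) with P(X_i = 1) = 1/3:
# D((1/4, 3/4) || (2/3, 1/3)) in bits, pmfs listed as (P(0), P(1)).
def D(P, Q):
    return sum(p * log2(p / q) for p, q in zip(P, Q) if p > 0)

rate = D((1/4, 3/4), (2/3, 1/3))
print(rate)  # ≈ 0.5237 bits

# Compare with the exact tail exponent at a moderate n.
n = 200
tail = sum(comb(n, k) * (1/3)**k * (2/3)**(n - k)
           for k in range(3 * n // 4, n + 1))
print(-log2(tail) / n)  # close to `rate`, up to polynomial corrections
```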
26 Definition
Let E be a subset of the set of probability mass functions. We write (with a slight abuse of notation)
Q^n(E) = Q^n(E ∩ P_n) = Σ_{x: P_x ∈ E ∩ P_n} Q^n(x).
Why do we say this is a slight abuse of notation? The reason is that Q^n(·) in its original meaning represents the probability of a set of sequences, but now we borrow this notation for a set of probability mass functions.
For example, let X = {0, 1} and let E_1 be the set of probability mass functions with mean 1. Then E_1 = {(0, 1)}, the point mass at 1. For example, let X = {0, 1} and let E_2 be the set of probability mass functions with mean √2/2. Then E_2 ∩ P_n = ∅. (Why? Every type has rational components, so its mean is rational.)
27 Definition
If E contains a relative entropy neighborhood of Q, then by the weak law of large numbers, Q^n(E) → 1. Specifically, if {P : D(P‖Q) ≤ ε} ⊆ E, then
Q^n(E) ≥ Pr(D(P_x‖Q) ≤ ε) ≥ 1 - 2^{-n(ε - |X| log(n+1)/n)} → 1.
Otherwise, Q^n(E) → 0 exponentially fast. We will use the method of types to calculate the exponent (rate function).
28 Example
Suppose we observe that the sample average of g(X) is greater than or equal to α. This event is equivalent to the event P_x ∈ E ∩ P_n, where
E = {P : Σ_{a∈X} g(a)P(a) ≥ α},
because
(1/n) Σ_{i=1}^n g(X_i) ≥ α ⇔ Σ_{a∈X} P_x(a) g(a) ≥ α ⇔ P_x ∈ E ∩ P_n.
Thus,
Pr((1/n) Σ_{i=1}^n g(X_i) ≥ α) = Q^n(E ∩ P_n) = Q^n(E).
29 Sanov's Theorem
Theorem. Let X_1, X_2, ..., X_n be i.i.d. ~ Q(x). Let E ⊆ P be a set of probability distributions. Then
Q^n(E) = Q^n(E ∩ P_n) ≤ (n+1)^{|X|} 2^{-nD(P*‖Q)},
where P* = argmin_{P∈E} D(P‖Q) is the distribution in E that is closest to Q in relative entropy. If, in addition, E ∩ P_n ≠ ∅ for all n ≥ n_0 for some n_0, then
(1/n) log Q^n(E) → -D(P*‖Q).
30 Proof of Upper Bound
Q^n(E) = Σ_{P∈E∩P_n} Q^n(T(P))
≤ Σ_{P∈E∩P_n} 2^{-nD(P‖Q)}
≤ Σ_{P∈E∩P_n} max_{P∈E∩P_n} 2^{-nD(P‖Q)}
≤ Σ_{P∈E∩P_n} max_{P∈E} 2^{-nD(P‖Q)}
= Σ_{P∈E∩P_n} 2^{-n min_{P∈E} D(P‖Q)}
= Σ_{P∈E∩P_n} 2^{-nD(P*‖Q)}
≤ (n+1)^{|X|} 2^{-nD(P*‖Q)}.
31 Proof of Lower Bound
Since E ∩ P_n ≠ ∅ for all n ≥ n_0, we can find a sequence of types P_n ∈ E ∩ P_n such that D(P_n‖Q) → D(P*‖Q), and
Q^n(E) = Σ_{P∈E∩P_n} Q^n(T(P)) ≥ Q^n(T(P_n)) ≥ (1/(n+1)^{|X|}) 2^{-nD(P_n‖Q)}.
Accordingly, D(P*‖Q) is the large deviation rate function.
32 Example 1
Suppose that we toss a fair die n times. What is the probability that the average of the throws is greater than or equal to 4?
From Sanov's theorem, the large deviation rate function is D(P*‖Q), where P* minimizes D(P‖Q) over all distributions P that satisfy
Σ_{i=1}^6 i p_i ≥ 4, Σ_{i=1}^6 p_i = 1.
By Lagrange multipliers, we construct the cost function
J(P) = Σ_{i=1}^6 p_i ln(p_i/q_i) + λ Σ_{i=1}^6 i p_i + μ Σ_{i=1}^6 p_i.
33 Example 1
Setting ∂J/∂p_i = 0 (with q_i = 1/6),
ln(6p_i) + 1 + iλ + μ = 0 ⇒ p_i = (e^{-1-μ}/6) e^{-iλ}.
Substituting into the constraints,
Σ_{i=1}^6 i p_i = (e^{-1-μ}/6) Σ_{i=1}^6 i e^{-iλ} = 4,
Σ_{i=1}^6 p_i = (e^{-1-μ}/6) Σ_{i=1}^6 e^{-iλ} = 1.
Solving numerically for λ, we obtain
P* = (0.1031, 0.1227, 0.1461, 0.1740, 0.2072, 0.2468).
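The numerical solution can be reproduced with a simple bisection on λ (a sketch; the bracketing interval and iteration count are our own choices):

```python
from math import exp, log2

# p_i is proportional to exp(-i*lambda); choose lambda so that the
# constrained mean sum_i i*p_i equals 4, for a fair die (q_i = 1/6).
def mean_for(lam):
    w = [exp(-i * lam) for i in range(1, 7)]
    return sum(i * wi for i, wi in zip(range(1, 7), w)) / sum(w)

lo, hi = -5.0, 5.0          # mean_for is decreasing: ~6 at -5, ~1 at +5
for _ in range(100):
    mid = (lo + hi) / 2
    if mean_for(mid) > 4:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2

w = [exp(-i * lam) for i in range(1, 7)]
P_star = [wi / sum(w) for wi in w]
rate = sum(p * log2(6 * p) for p in P_star)   # D(P*||Q) in bits
print(P_star)   # ≈ (0.1031, 0.1227, 0.1461, 0.1740, 0.2072, 0.2468)
print(rate)
```

The recovered P* matches the slide's distribution to four decimal places.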
34 Example 1
The probability that the average of the throws is greater than or equal to 4 is about 2^{-nD(P*‖Q)}.
More informationDiscrete Probability Refresher
ECE 1502 Information Theory Discrete Probability Refresher F. R. Kschischang Dept. of Electrical and Computer Engineering University of Toronto January 13, 1999 revised January 11, 2006 Probability theory
More informationSubset Source Coding
Fifty-third Annual Allerton Conference Allerton House, UIUC, Illinois, USA September 29 - October 2, 205 Subset Source Coding Ebrahim MolavianJazi and Aylin Yener Wireless Communications and Networking
More informationLECTURE NOTES 57. Lecture 9
LECTURE NOTES 57 Lecture 9 17. Hypothesis testing A special type of decision problem is hypothesis testing. We partition the parameter space into H [ A with H \ A = ;. Wewrite H 2 H A 2 A. A decision problem
More informationInformation-theoretic Secrecy A Cryptographic Perspective
Information-theoretic Secrecy A Cryptographic Perspective Stefano Tessaro UC Santa Barbara WCS 2017 April 30, 2017 based on joint works with M. Bellare and A. Vardy Cryptography Computational assumptions
More informationh(x) lim H(x) = lim Since h is nondecreasing then h(x) 0 for all x, and if h is discontinuous at a point x then H(x) > 0. Denote
Real Variables, Fall 4 Problem set 4 Solution suggestions Exercise. Let f be of bounded variation on [a, b]. Show that for each c (a, b), lim x c f(x) and lim x c f(x) exist. Prove that a monotone function
More informationCOMPSCI 650 Applied Information Theory Jan 21, Lecture 2
COMPSCI 650 Applied Information Theory Jan 21, 2016 Lecture 2 Instructor: Arya Mazumdar Scribe: Gayane Vardoyan, Jong-Chyi Su 1 Entropy Definition: Entropy is a measure of uncertainty of a random variable.
More informationEECS 750. Hypothesis Testing with Communication Constraints
EECS 750 Hypothesis Testing with Communication Constraints Name: Dinesh Krithivasan Abstract In this report, we study a modification of the classical statistical problem of bivariate hypothesis testing.
More information1 Randomized Computation
CS 6743 Lecture 17 1 Fall 2007 1 Randomized Computation Why is randomness useful? Imagine you have a stack of bank notes, with very few counterfeit ones. You want to choose a genuine bank note to pay at
More informationMachine Learning. Computational Learning Theory. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012
Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Computational Learning Theory Le Song Lecture 11, September 20, 2012 Based on Slides from Eric Xing, CMU Reading: Chap. 7 T.M book 1 Complexity of Learning
More informationSequences. Chapter 3. n + 1 3n + 2 sin n n. 3. lim (ln(n + 1) ln n) 1. lim. 2. lim. 4. lim (1 + n)1/n. Answers: 1. 1/3; 2. 0; 3. 0; 4. 1.
Chapter 3 Sequences Both the main elements of calculus (differentiation and integration) require the notion of a limit. Sequences will play a central role when we work with limits. Definition 3.. A Sequence
More informationChapter 3: Random Variables 1
Chapter 3: Random Variables 1 Yunghsiang S. Han Graduate Institute of Communication Engineering, National Taipei University Taiwan E-mail: yshan@mail.ntpu.edu.tw 1 Modified from the lecture notes by Prof.
More informationIntroduction and basic definitions
Chapter 1 Introduction and basic definitions 1.1 Sample space, events, elementary probability Exercise 1.1 Prove that P( ) = 0. Solution of Exercise 1.1 : Events S (where S is the sample space) and are
More informationA Hierarchy of Information Quantities for Finite Block Length Analysis of Quantum Tasks
A Hierarchy of Information Quantities for Finite Block Length Analysis of Quantum Tasks Marco Tomamichel, Masahito Hayashi arxiv: 1208.1478 Also discussing results of: Second Order Asymptotics for Quantum
More informationELEMENTS O F INFORMATION THEORY
ELEMENTS O F INFORMATION THEORY THOMAS M. COVER JOY A. THOMAS Preface to the Second Edition Preface to the First Edition Acknowledgments for the Second Edition Acknowledgments for the First Edition x
More informationOn Achievable Rates for Channels with. Mismatched Decoding
On Achievable Rates for Channels with 1 Mismatched Decoding Anelia Somekh-Baruch arxiv:13050547v1 [csit] 2 May 2013 Abstract The problem of mismatched decoding for discrete memoryless channels is addressed
More informationInformation measures in simple coding problems
Part I Information measures in simple coding problems in this web service in this web service Source coding and hypothesis testing; information measures A(discrete)source is a sequence {X i } i= of random
More information(each row defines a probability distribution). Given n-strings x X n, y Y n we can use the absence of memory in the channel to compute
ENEE 739C: Advanced Topics in Signal Processing: Coding Theory Instructor: Alexander Barg Lecture 6 (draft; 9/6/03. Error exponents for Discrete Memoryless Channels http://www.enee.umd.edu/ abarg/enee739c/course.html
More informationPART III. Outline. Codes and Cryptography. Sources. Optimal Codes (I) Jorge L. Villar. MAMME, Fall 2015
Outline Codes and Cryptography 1 Information Sources and Optimal Codes 2 Building Optimal Codes: Huffman Codes MAMME, Fall 2015 3 Shannon Entropy and Mutual Information PART III Sources Information source:
More informationLecture 22: Final Review
Lecture 22: Final Review Nuts and bolts Fundamental questions and limits Tools Practical algorithms Future topics Dr Yao Xie, ECE587, Information Theory, Duke University Basics Dr Yao Xie, ECE587, Information
More information