Example: Letter Frequencies

Example: Letter Frequencies

 i   a_i   p_i
 1   a     0.0575
 2   b     0.0128
 3   c     0.0263
 4   d     0.0285
 5   e     0.0913
 6   f     0.0173
 7   g     0.0133
 8   h     0.0313
 9   i     0.0599
10   j     0.0006
11   k     0.0084
12   l     0.0335
13   m     0.0235
14   n     0.0596
15   o     0.0689
16   p     0.0192
17   q     0.0008
18   r     0.0508
19   s     0.0567
20   t     0.0706
21   u     0.0334
22   v     0.0069
23   w     0.0119
24   x     0.0073
25   y     0.0164
26   z     0.0007
27   -     0.1928   (space)

Figure 2.1. Probability distribution over the 27 outcomes for a randomly selected letter in an English language document (estimated from The Frequently Asked Questions Manual for Linux). The picture shows the probabilities by the areas of white squares.

Figure 2.2. The probability distribution over the 27 x 27 possible bigrams xy in an English language document, The Frequently Asked Questions Manual for Linux.

Book by David MacKay

Example: Surprisal Values (from http://www.umsl.edu/~fraundorfp/egsurpri.html)

 i   a_i   p_i     h(p_i) = log2(1/p_i)  [bits]
 1   a     .0575    4.1
 2   b     .0128    6.3
 3   c     .0263    5.2
 4   d     .0285    5.1
 5   e     .0913    3.5
 6   f     .0173    5.9
 7   g     .0133    6.2
 8   h     .0313    5.0
 9   i     .0599    4.1
10   j     .0006   10.7
11   k     .0084    6.9
12   l     .0335    4.9
13   m     .0235    5.4
14   n     .0596    4.1
15   o     .0689    3.9
16   p     .0192    5.7
17   q     .0008   10.3
18   r     .0508    4.3
19   s     .0567    4.1
20   t     .0706    3.8
21   u     .0334    4.9
22   v     .0069    7.2
23   w     .0119    6.4
24   x     .0073    7.1
25   y     .0164    5.9
26   z     .0007   10.4
27   -     .1928    2.4   (space)

Sum_i p_i log2(1/p_i) = 4.1

Table 2.9. Shannon information contents of the outcomes a-z. Book by David MacKay
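The surprisal column and the 4.1-bit total can be reproduced directly from the probabilities in Figure 2.1. Below is a minimal Python sketch (Python and the helper name `surprisal` are illustrative choices, not part of the slides):

```python
import math

# Letter probabilities from MacKay's Figure 2.1 (the 27th symbol is the space).
p = {
    'a': 0.0575, 'b': 0.0128, 'c': 0.0263, 'd': 0.0285, 'e': 0.0913,
    'f': 0.0173, 'g': 0.0133, 'h': 0.0313, 'i': 0.0599, 'j': 0.0006,
    'k': 0.0084, 'l': 0.0335, 'm': 0.0235, 'n': 0.0596, 'o': 0.0689,
    'p': 0.0192, 'q': 0.0008, 'r': 0.0508, 's': 0.0567, 't': 0.0706,
    'u': 0.0334, 'v': 0.0069, 'w': 0.0119, 'x': 0.0073, 'y': 0.0164,
    'z': 0.0007, ' ': 0.1928,
}

def surprisal(prob):
    """Shannon information content h(p) = log2(1/p), in bits."""
    return math.log2(1.0 / prob)

# Surprisal of individual outcomes (Table 2.9 lists h(e) ~ 3.5 bits, h(j) ~ 10.7 bits).
print(f"h(e) = {surprisal(p['e']):.1f} bits")
print(f"h(j) = {surprisal(p['j']):.1f} bits")

# Entropy = expected surprisal; the table gives roughly 4.1 bits per character.
H = sum(pi * surprisal(pi) for pi in p.values())
print(f"H(X) = {H:.1f} bits")
```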

MacKay's Mnemonic: convex ⌣ ("convex-smile"), concave ⌢ ("concave-frown").

Examples: Convex & Concave Functions

[Figure: plots of x^2, e^x, log(1/x), and x log x over small ranges of x. Book by David MacKay]

Jensen's Inequality

Definition 1. The function $f : D \to \mathbb{R}$ is convex if for all $x_1, x_2 \in D$ and for all $\lambda \in [0, 1]$:
$$\lambda f(x_1) + (1-\lambda) f(x_2) \ge f(\lambda x_1 + (1-\lambda) x_2).$$
The function $f$ is strictly convex if equality only holds when $\lambda = 0$ or $\lambda = 1$, or when $x_1 = x_2$. The function $f$ is (strictly) concave if the function $-f$ is (strictly) convex.

Proposition 2 (Jensen's inequality). Let the function $f : D \to \mathbb{R}$ be convex, and let $n \in \mathbb{N}$. Then for any $p_1, \ldots, p_n \in \mathbb{R}_{\ge 0}$ such that $\sum_{i=1}^n p_i = 1$ and for any $x_1, \ldots, x_n \in D$ it holds that
$$\sum_{i=1}^n p_i f(x_i) \ge f\!\left(\sum_{i=1}^n p_i x_i\right).$$
If $f$ is strictly convex and $p_1, \ldots, p_n > 0$, then equality holds iff $x_1 = \cdots = x_n$. In particular, if $X$ is a real random variable whose image $\mathcal{X}$ is contained in $D$, then $E[f(X)] \ge f(E[X])$, and, if $f$ is strictly convex, equality holds iff there is $c \in \mathcal{X}$ such that $X = c$ with probability 1.
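As a quick numerical illustration of Proposition 2 (a sketch of my own, not from the slides): take the strictly convex function f(x) = x^2 and check that the weighted average of f never falls below f of the weighted average.

```python
import random

def f(x):
    """A strictly convex function."""
    return x * x

def jensen_gap(p, x):
    """Return E[f(X)] - f(E[X]) for a finite distribution p over points x."""
    lhs = sum(pi * f(xi) for pi, xi in zip(p, x))
    rhs = f(sum(pi * xi for pi, xi in zip(p, x)))
    return lhs - rhs

# Random distribution over random points: the gap is always >= 0.
weights = [random.random() for _ in range(5)]
p = [w / sum(weights) for w in weights]
x = [random.uniform(-3, 3) for _ in range(5)]
print(jensen_gap(p, x) >= 0)                              # True (strictly > 0 unless all x_i coincide)

# Degenerate case: all points equal, so equality holds.
print(abs(jensen_gap([0.3, 0.7], [2.0, 2.0])) < 1e-12)    # True
```

With f(x) = x^2 the gap E[f(X)] - f(E[X]) is exactly the variance of X, which is why it is non-negative and vanishes only when X is constant.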

Binary Entropy Function

[Figure 1.3. The binary entropy function H_2(x), plotted for x from 0 to 1. Book by David MacKay]
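The curve in Figure 1.3 is H_2(x) = x log2(1/x) + (1 - x) log2(1/(1 - x)). A small Python sketch, assuming the usual convention 0 log(1/0) = 0 (the helper name `h2` is just illustrative):

```python
import math

def h2(x):
    """Binary entropy H_2(x) = x*log2(1/x) + (1-x)*log2(1/(1-x)), with H_2(0) = H_2(1) = 0."""
    if x in (0.0, 1.0):
        return 0.0
    return x * math.log2(1.0 / x) + (1.0 - x) * math.log2(1.0 / (1.0 - x))

print(h2(0.5))   # 1.0 -- the maximum, attained at x = 0.5
print(h2(0.1))   # ~0.469
print(h2(0.9))   # same as h2(0.1): the curve is symmetric about x = 0.5
```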

Decomposability of Entropy

$$H(\mathbf{p}) = H(p_1, 1-p_1) + (1-p_1)\, H\!\left(\frac{p_2}{1-p_1}, \frac{p_3}{1-p_1}, \ldots, \frac{p_I}{1-p_1}\right). \quad (2.43)$$

When it's written as a formula, this property looks regrettably ugly; nevertheless it is a simple property and one that you should make use of. Generalizing further, the entropy has the property for any m that

$$H(\mathbf{p}) = H\big[(p_1 + p_2 + \cdots + p_m),\, (p_{m+1} + p_{m+2} + \cdots + p_I)\big] + (p_1 + \cdots + p_m)\, H\!\left(\frac{p_1}{p_1 + \cdots + p_m}, \ldots, \frac{p_m}{p_1 + \cdots + p_m}\right) + (p_{m+1} + \cdots + p_I)\, H\!\left(\frac{p_{m+1}}{p_{m+1} + \cdots + p_I}, \ldots, \frac{p_I}{p_{m+1} + \cdots + p_I}\right). \quad (2.44)$$

Book by David MacKay
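Equation (2.43) is easy to confirm numerically. Here is a short Python check on an arbitrary four-outcome distribution (my own illustration, not from MacKay):

```python
import math

def H(p):
    """Shannon entropy (bits) of a probability vector, ignoring zero entries."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]

# Equation (2.43): split off p_1, then take the renormalized remainder.
lhs = H(p)
p1 = p[0]
rest = [pi / (1 - p1) for pi in p[1:]]
rhs = H([p1, 1 - p1]) + (1 - p1) * H(rest)
print(abs(lhs - rhs) < 1e-12)   # True
```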

Order These in Terms of Entropy ECE 534 by Natasha Devroye

Mutual Information and Entropy

Theorem (relationship between mutual information and entropy):
I(X; Y) = H(X) - H(X|Y)
I(X; Y) = H(Y) - H(Y|X)
I(X; Y) = H(X) + H(Y) - H(X, Y)
I(X; Y) = I(Y; X)   (symmetry)
I(X; X) = H(X)   ("self-information")

[Venn-diagram figure showing H(X), H(Y), H(X|Y), H(Y|X), and I(X;Y).]

ECE 534 by Natasha Devroye
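These identities can all be checked mechanically from a joint distribution. The sketch below (illustrative Python, not from the slides) computes the entropies of a small 2x2 joint table and verifies that H(X) + H(Y) - H(X, Y) agrees with H(X) - H(X|Y):

```python
import math

def H(p):
    """Entropy (bits) of an iterable of probabilities."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

# A small joint distribution p(x, y) as a 2x2 table (rows: x, columns: y).
pxy = [[0.30, 0.20],
       [0.10, 0.40]]

px = [sum(row) for row in pxy]                    # marginal of X
py = [sum(col) for col in zip(*pxy)]              # marginal of Y
H_X, H_Y = H(px), H(py)
H_XY = H(pij for row in pxy for pij in row)       # joint entropy H(X, Y)

I_XY = H_X + H_Y - H_XY                           # I(X;Y) = H(X) + H(Y) - H(X,Y)
H_X_given_Y = H_XY - H_Y                          # H(X|Y) = H(X,Y) - H(Y)
print(abs(I_XY - (H_X - H_X_given_Y)) < 1e-12)    # I(X;Y) = H(X) - H(X|Y): True
```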

Chain Rule for Entropy

Theorem (chain rule for entropy): for $(X_1, X_2, \ldots, X_n) \sim p(x_1, x_2, \ldots, x_n)$,
$$H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i \mid X_{i-1}, \ldots, X_1).$$

[Venn-diagram figure: H(X_1, X_2, X_3) = H(X_1) + H(X_2 | X_1) + H(X_3 | X_1, X_2).]

ECE 534 by Natasha Devroye
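For n = 2 the chain rule reads H(X, Y) = H(X) + H(Y|X), which the following sketch verifies on a toy joint table (again illustrative Python, with H(Y|X) computed as the average of H(Y | X = x) weighted by p(x)):

```python
import math

def H(p):
    """Entropy (bits) of an iterable of probabilities."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

# Toy joint distribution: rows index x, columns index y.
pxy = [[0.30, 0.20],
       [0.10, 0.40]]
px = [sum(row) for row in pxy]

# H(Y|X) = sum_x p(x) * H(Y | X = x), using the conditional rows p(y|x) = p(x,y)/p(x).
H_Y_given_X = sum(px_i * H(pij / px_i for pij in row)
                  for px_i, row in zip(px, pxy))

H_XY = H(pij for row in pxy for pij in row)
print(abs(H_XY - (H(px) + H_Y_given_X)) < 1e-12)   # chain rule for n = 2: True
```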

Chain Rule for Mutual Information

Theorem (chain rule for mutual information):
$$I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^{n} I(X_i; Y \mid X_{i-1}, X_{i-2}, \ldots, X_1).$$

[Venn-diagram figure: I(X, Y; Z) = I(X; Z) + I(Y; Z | X).]

ECE 534 by Natasha Devroye
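A numerical check of the n = 2 case, I(X1, X2; Y) = I(X1; Y) + I(X2; Y | X1), writing each mutual information in terms of marginal entropies (illustrative Python; the toy joint distribution and helper names are made up for this sketch):

```python
import itertools
import math

def H(p):
    """Entropy (bits) of an iterable of probabilities."""
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

def marginal_entropy(joint, keep):
    """Entropy of the marginal distribution over the coordinates listed in `keep`."""
    marg = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        marg[key] = marg.get(key, 0.0) + prob
    return H(marg.values())

# A toy joint distribution p(x1, x2, y) over three binary variables (weights sum to 36).
weights = (1, 2, 3, 4, 5, 6, 7, 8)
joint = {outcome: w / 36.0
         for outcome, w in zip(itertools.product((0, 1), repeat=3), weights)}

def Hm(*keep):
    return marginal_entropy(joint, keep)

# Coordinates: 0 -> X1, 1 -> X2, 2 -> Y.
I_X1X2_Y   = Hm(0, 1) + Hm(2) - Hm(0, 1, 2)             # I(X1, X2; Y)
I_X1_Y     = Hm(0) + Hm(2) - Hm(0, 2)                   # I(X1; Y)
I_X2_Y_gX1 = Hm(0, 1) + Hm(0, 2) - Hm(0) - Hm(0, 1, 2)  # I(X2; Y | X1)
print(abs(I_X1X2_Y - (I_X1_Y + I_X2_Y_gX1)) < 1e-12)    # chain rule holds: True
```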

What are the Grey Regions?

[Exercise: two Venn diagrams over H(X), H(Y), and H(Z), each with a shaded region to identify.]

ECE 534 by Natasha Devroye