Example: Letter Frequencies


Example: Letter Frequencies

     i  a_i    p_i          i  a_i    p_i          i  a_i    p_i
     1   a   0.0575        10   j   0.0006        19   s   0.0567
     2   b   0.0128        11   k   0.0084        20   t   0.0706
     3   c   0.0263        12   l   0.0335        21   u   0.0334
     4   d   0.0285        13   m   0.0235        22   v   0.0069
     5   e   0.0913        14   n   0.0596        23   w   0.0119
     6   f   0.0173        15   o   0.0689        24   x   0.0073
     7   g   0.0133        16   p   0.0192        25   y   0.0164
     8   h   0.0313        17   q   0.0008        26   z   0.0007
     9   i   0.0599        18   r   0.0508        27   -   0.1928

Figure 2.1. Probability distribution over the 27 outcomes for a randomly selected letter in an English language document (estimated from The Frequently Asked Questions Manual for Linux). The picture shows the probabilities by the areas of white squares. Book by David MacKay

Example: Letter Frequencies

Figure 2.2. The probability distribution over the 27 x 27 possible bigrams xy in an English language document, The Frequently Asked Questions Manual for Linux. Book by David MacKay
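
A bigram distribution like the one in Figure 2.2 is estimated by counting adjacent character pairs in a document. The following is a minimal Python sketch of such an estimate; the cleaning rule and the example sentence are illustrative choices, not MacKay's exact procedure.

    from collections import Counter

    def bigram_probs(text):
        # Estimate bigram probabilities over letters plus space by counting
        # adjacent pairs; any other character is mapped to a space.
        cleaned = "".join(c if c.isalpha() or c == " " else " " for c in text.lower())
        counts = Counter(zip(cleaned, cleaned[1:]))
        total = sum(counts.values())
        return {xy: n / total for xy, n in counts.items()}

    probs = bigram_probs("the quick brown fox jumps over the lazy dog")
    print(sorted(probs.items(), key=lambda kv: -kv[1])[:5])   # five most frequent bigrams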

Example: Surprisal Values (from http://www.umsl.edu/~fraundorfp/egsurpri.html)

     i  a_i    p_i   h(p_i)       i  a_i    p_i   h(p_i)       i  a_i    p_i   h(p_i)
     1   a   .0575    4.1        10   j   .0006   10.7        19   s   .0567    4.1
     2   b   .0128    6.3        11   k   .0084    6.9        20   t   .0706    3.8
     3   c   .0263    5.2        12   l   .0335    4.9        21   u   .0334    4.9
     4   d   .0285    5.1        13   m   .0235    5.4        22   v   .0069    7.2
     5   e   .0913    3.5        14   n   .0596    4.1        23   w   .0119    6.4
     6   f   .0173    5.9        15   o   .0689    3.9        24   x   .0073    7.1
     7   g   .0133    6.2        16   p   .0192    5.7        25   y   .0164    5.9
     8   h   .0313    5.0        17   q   .0008   10.3        26   z   .0007   10.4
     9   i   .0599    4.1        18   r   .0508    4.3        27   -   .1928    2.4

     Σ_i p_i log2(1/p_i) = 4.1

Table 2.9. Shannon information contents of the outcomes a-z. Book by David MacKay
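
The surprisal column h(p_i) = log2(1/p_i) and the 4.1-bit total at the bottom of Table 2.9 can be reproduced directly from the probabilities. A minimal Python sketch (not part of the original slides):

    import math

    # Letter probabilities from Table 2.9 / Figure 2.1; '-' denotes the space character.
    p = {
        'a': 0.0575, 'b': 0.0128, 'c': 0.0263, 'd': 0.0285, 'e': 0.0913,
        'f': 0.0173, 'g': 0.0133, 'h': 0.0313, 'i': 0.0599, 'j': 0.0006,
        'k': 0.0084, 'l': 0.0335, 'm': 0.0235, 'n': 0.0596, 'o': 0.0689,
        'p': 0.0192, 'q': 0.0008, 'r': 0.0508, 's': 0.0567, 't': 0.0706,
        'u': 0.0334, 'v': 0.0069, 'w': 0.0119, 'x': 0.0073, 'y': 0.0164,
        'z': 0.0007, '-': 0.1928,
    }

    def surprisal(prob):
        # Shannon information content h(p) = log2(1/p), in bits.
        return math.log2(1.0 / prob)

    for letter in ('e', 'z', '-'):
        print(f"h(p_{letter}) = {surprisal(p[letter]):.1f} bits")

    # The entropy is the probability-weighted average surprisal: about 4.1 bits.
    print(f"H(X) = {sum(q * surprisal(q) for q in p.values()):.1f} bits")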

MacKay's Mnemonic

    convex  --  "convex-smile"  (a convex function is shaped like a smile)
    concave --  "concave-frown" (a concave function is shaped like a frown)

Examples: Convex & Concave Functions

[Figure: plots of x^2 and e^x over -1 <= x <= 3, and of log(1/x) and x log x over 0 < x <= 3. Book by David MacKay]
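
As it happens, each of the four functions named above (x^2, e^x, log(1/x), and x log x) is convex on (0, infinity). A quick numerical spot-check of the defining inequality f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b) at one pair of points, as an illustrative sketch only (checking a single pair of points is of course not a proof):

    import math

    # f is convex if f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b) for all a, b and t in [0, 1].
    funcs = {
        "x^2":      lambda x: x * x,
        "e^x":      lambda x: math.exp(x),
        "log(1/x)": lambda x: math.log(1.0 / x),
        "x log x":  lambda x: x * math.log(x),
    }

    a, b, t = 0.5, 2.5, 0.3          # two points in (0, inf) and a mixing weight
    for name, f in funcs.items():
        lhs = f(t * a + (1 - t) * b)
        rhs = t * f(a) + (1 - t) * f(b)
        print(f"{name:10s}: {lhs:8.4f} <= {rhs:8.4f}  ->  {lhs <= rhs}")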

Binary Entropy Function

Figure 1.3. The binary entropy function H2(x), plotted for x in [0, 1]. Book by David MacKay
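
The curve in Figure 1.3 is H2(x) = x log2(1/x) + (1 - x) log2(1/(1 - x)). A minimal Python sketch (not part of the original slides) that evaluates it, using the convention 0 log 0 = 0:

    import math

    def H2(x):
        # Binary entropy H2(x) = x log2(1/x) + (1 - x) log2(1/(1 - x)), in bits,
        # with the convention 0 log 0 = 0.
        if x in (0.0, 1.0):
            return 0.0
        return x * math.log2(1.0 / x) + (1.0 - x) * math.log2(1.0 / (1.0 - x))

    for x in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
        print(f"H2({x:.2f}) = {H2(x):.3f}")   # maximum of 1 bit at x = 0.5, symmetric about 0.5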

Order These in Terms of Entropy

ECE 534 by Natasha Devroye

Mutual Information and Entropy

Theorem (relationship between mutual information and entropy):

    I(X; Y) = H(X) - H(X | Y)
    I(X; Y) = H(Y) - H(Y | X)
    I(X; Y) = H(X) + H(Y) - H(X, Y)
    I(X; Y) = I(Y; X)                  (symmetry)
    I(X; X) = H(X)                     ("self-information")

[Venn diagrams: H(X) and H(Y) drawn as overlapping sets, with H(X | Y), H(Y | X), and the overlap I(X; Y) marked.]

ECE 534 by Natasha Devroye
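
These identities are easy to verify numerically for any small joint distribution. A minimal Python sketch (the joint pmf below is an arbitrary illustrative choice); all three printed values agree:

    import math

    # A small joint pmf p(x, y), chosen arbitrarily for illustration.
    pxy = {
        (0, 0): 0.30, (0, 1): 0.20,
        (1, 0): 0.10, (1, 1): 0.40,
    }

    def H(dist):
        # Entropy in bits of a pmf given as {outcome: probability}.
        return sum(q * math.log2(1.0 / q) for q in dist.values() if q > 0)

    # Marginal distributions of X and Y.
    px, py = {}, {}
    for (x, y), q in pxy.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q

    HX, HY, HXY = H(px), H(py), H(pxy)
    print(HX + HY - HXY)     # I(X;Y) = H(X) + H(Y) - H(X,Y)
    print(HX - (HXY - HY))   # I(X;Y) = H(X) - H(X|Y), using H(X|Y) = H(X,Y) - H(Y)
    print(HY - (HXY - HX))   # I(X;Y) = H(Y) - H(Y|X), using H(Y|X) = H(X,Y) - H(X)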

Chain Rule for Entropy

Theorem (chain rule for entropy): Let (X1, X2, ..., Xn) ~ p(x1, x2, ..., xn). Then

    H(X1, X2, ..., Xn) = Σ_{i=1}^{n} H(Xi | X_{i-1}, ..., X1)

[Venn diagram illustration for n = 3:]

    H(X1, X2, X3) = H(X1) + H(X2 | X1) + H(X3 | X1, X2)

ECE 534 by Natasha Devroye
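
As a small worked example (not on the slide): let X1 and X2 be independent fair bits and X3 = X1 XOR X2. Then H(X1) = 1 bit, H(X2 | X1) = 1 bit, and H(X3 | X1, X2) = 0 bits, so the chain rule gives H(X1, X2, X3) = 2 bits. This matches the direct calculation, since (X1, X2, X3) is uniform over its 4 possible values.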

Chain Rule for Mutual Information

Theorem (chain rule for mutual information):

    I(X1, X2, ..., Xn; Y) = Σ_{i=1}^{n} I(Xi; Y | X_{i-1}, X_{i-2}, ..., X1)

[Venn diagram illustration:]

    I(X, Y; Z) = I(X; Z) + I(Y; Z | X)

ECE 534 by Natasha Devroye
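
Continuing the worked example above (not on the slide): with Z = X XOR Y for independent fair bits X and Y, we have I(X; Z) = 0, since Z alone is independent of X, while I(Y; Z | X) = H(Z | X) - H(Z | X, Y) = 1 - 0 = 1 bit. The chain rule gives I(X, Y; Z) = 0 + 1 = 1 bit, which agrees with the direct calculation I(X, Y; Z) = H(Z) - H(Z | X, Y) = 1 - 0 = 1 bit.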

What are the Grey Regions?

[Two Venn diagrams over H(X), H(Y), and H(Z), each with a grey region to be identified.]

ECE 534 by Natasha Devroye