An Introduction to Information Theory: Notes

Jon Shlens
jonshlens@ucsd.edu
03 February 2003

1 Preliminaries

1.1 Goals

1. Define the basic set-up of information theory.
2. Derive why entropy is the measure of information capacity.
3. Discuss the basics of mutual information.
4. Solve the binary symmetric channel.

1.2 Probability Theory

Probability distribution relations:

    joint: $P(X, Y) \neq P(X)\,P(Y)$ in general
    marginal: $P(X) = \sum_{y_i \in Y} P(X, Y = y_i)$
    conditional: $P(X) = \sum_{y_i \in Y} P(Y = y_i)\, P(X \mid Y = y_i)$
    Bayes' rule: $P(X, Y) = P(X \mid Y)\, P(Y) = P(Y \mid X)\, P(X)$
    iid (independently and identically distributed): $P(X_1, X_2) = P(X_1)\, P(X_2)$

2 A Simple Example

2.1 Situation A

Pretend we like to buy and sell a particular commodity - how about pork bellies at the Chicago Mercantile Exchange in 1860. We talk to our trader every day and tell him one action a day: BUY, SELL or HOLD. One day we decide that we are going on a trip to Europe, but we would like to keep trading. Because phones don't exist, we decide on a simple system. We use a telegraph line to send a Morse code signal of a dot (denoted 0) or a dash (denoted 1). Here is what we agree on: we will BUY and SELL exactly one half of the days; we will never HOLD. We will send a 0 repeatedly if it is a BUY and a 1 repeatedly if it is a SELL.
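To make the relations of Section 1.2 concrete, here is a minimal sketch (mine, not part of the original notes) that applies them to the telegraph of Situation A. The joint table is an assumption based on the agreement above: BUY and SELL are equally likely, BUY is always sent and received as 0, SELL as 1.

```python
# Sketch: checking the Section 1.2 probability relations on the noiseless telegraph.
# X = order sent (BUY/SELL), Y = symbol received (0/1). The joint table is assumed.
joint = {('BUY', 0): 0.5, ('BUY', 1): 0.0,
         ('SELL', 0): 0.0, ('SELL', 1): 0.5}

# marginal: P(X) = sum_y P(X, Y = y), and likewise for P(Y)
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in ('BUY', 'SELL')}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

# conditional: P(X | Y = y) = P(X, Y = y) / P(Y = y)
p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for (x, y) in joint}

# product rule: P(X, Y) = P(X | Y) P(Y) holds for every cell of the table
for (x, y), p in joint.items():
    assert abs(p - p_x_given_y[(x, y)] * p_y[y]) < 1e-12

print(p_x)          # {'BUY': 0.5, 'SELL': 0.5}
print(p_x_given_y)  # receiving a 0 pins the order to BUY with certainty, and vice versa
```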

2.2 Qualitative Analysis of A

Question: what does the trader learn by receiving a 0 or 1?

before signal: equal chance of a BUY or a SELL, but never HOLD.
after signal: a 0 or 1 denotes with 100% certainty either a BUY or a SELL.

2.3 Situation B

Same situation as above, but now let's say that our telegraph machine is noisy. Most of the time that we press a 0 or 1, the trader receives a 0 or 1, respectively. But occasionally, say 10% of the time, the trader receives the opposite.

2.4 Qualitative Analysis of B

Question: what does the trader learn by receiving a 0 or 1?

before signal: same as Situation A.
after signal: not 100% certain what the order was. However, the trader does have a good hunch.

2.5 Statements about Classical Information Theory

1. There exists a preset, agreed-upon model between the sender and the receiver.
2. Information is usually measured in bits: 1 question in the game of 20 questions.
3. Information is selection between possible alternatives. Deep point: the quantity of information does not depend on the complexity of the preset alternatives.

3 Intuitive Example

Pretend we have a set of possible messages X = {x_1, x_2, ..., x_N}, all with equal probability {p_i = 1/N for i = 1, 2, ..., N}. We plan to send only one message x_i through our channel.

3.1 A Simple Game

Each element of X is labeled with a number j = 1, 2, ..., N. Pretend that you are the sender and you are about to transmit one symbol x_i. Your friend will be the receiver. Let your friend try to guess which symbol you will send. This game is a formal version of 20 questions. The key question: how many questions does your friend need to select among N equally probable numbers?
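The sketch below (my own, not from the notes) plays this game: with N equally probable symbols, each yes/no question can halve the remaining candidates, so roughly log2(N) questions suffice. The halving strategy ("is it in the lower half?") is an assumption chosen for illustration.

```python
# Sketch: counting yes/no questions needed to identify one of N equally likely symbols.
import math

def questions_needed(n_symbols: int, target: int) -> int:
    """Count questions of the form 'is it in the lower half of the remaining range?'."""
    lo, hi = 0, n_symbols - 1
    count = 0
    while lo < hi:
        mid = (lo + hi) // 2
        count += 1
        if target <= mid:   # answer "yes": keep the lower half
            hi = mid
        else:               # answer "no": keep the upper half
            lo = mid + 1
    return count

N = 64
worst_case = max(questions_needed(N, t) for t in range(N))
print(worst_case, math.log2(N))  # 6 questions, and log2(64) = 6.0
```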

3.2 Define the uncertainty

If the answers to your questions are yes or no, then we can attach an equation to this situation. Let H be the average minimum number of questions your friend needs to guess which symbol you will send:

    $2^H = N \quad \Longrightarrow \quad H = \log_2 N$

We define H as the Shannon entropy.

4 Entropy

1. The Shannon entropy is one and the same as the entropy from thermodynamics.
2. Entropy measures the number of possible states in a system; it is equivalent to a measure of uncertainty, variability or even concentration in a pdf.
3. In base 2 the units of entropy are bits. In many theoretical treatments base e is used, and entropy is measured in nats.
4. The most general form of entropy for X = {x_1, x_2, ..., x_N} and P = {p_1, p_2, ..., p_N} (non-equal probabilities) is:

    $H(P(X)) \equiv \left\langle \log_2 \frac{1}{p_i} \right\rangle = -\sum_{i=1}^{N} p_i \log_2 p_i$

5. Entropies can generalize to continuous distributions.
   discrete distributions: $H \geq 0$
   continuous distributions: not well defined

5 Properties of Entropy

5.1 Sending two symbols

The same equiprobable situation as the previous example. However, this time we will send two symbols x_i and x_j. What is the entropy of sending two symbols x_i and x_j?

    $H(x_i, x_j) = -\sum_{i,j=1}^{N} \frac{1}{N^2} \log_2 \frac{1}{N^2}$
    $H(x_i, x_j) = \log_2 N^2$
    $H(x_i, x_j) = \log_2 N + \log_2 N$
    $H(x_i, x_j) = 2 \log_2 N$
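As a quick numerical check of the entropy formula and the two-symbol calculation (a sketch of mine, not from the notes): N equiprobable symbols give log2(N) bits, and two independent equiprobable symbols give exactly twice that.

```python
# Sketch: Shannon entropy in bits, and additivity for independent equiprobable symbols.
import math

def entropy(probs):
    """Shannon entropy in bits, using the convention 0 * log(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

N = 8
one_symbol = [1.0 / N] * N               # one equiprobable symbol
two_symbols = [1.0 / N**2] * (N**2)      # a pair of independent equiprobable symbols

print(entropy(one_symbol))   # log2(8)  = 3.0 bits
print(entropy(two_symbols))  # log2(64) = 6.0 bits, i.e. twice the single-symbol entropy
```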

5.2 Conclusion

1. Information is additive.
2. Entropy grows as more symbols are sent. Entropy is an extensive quantity.

6 Mutual Information

The goal is to formally quantify the reduction in uncertainty by examining the appropriate subtraction of entropies. Let us first look at the probability distributions of the receiver before and after one symbol is sent.

    beforehand: P(X)
    afterwards: P(X | Y = y_i), where the symbol y_i is received with probability P(Y = y_i)

By our definition of uncertainty, the reduction in entropy between the two probability states is defined as the mutual information:

    $I(X; Y) = H(P(X)) - H(P(X \mid Y))$

or

    $I(X; Y) = H(X) - H(X \mid Y)$

Mutual information is also measured in bits.

6.1 Relations between entropies

I will not justify these statements here, but they are easy to work out. Regardless of whether one remembers the details of these equations, it is much easier to remember the Venn diagram in Figure 1.

Mutual information is symmetric:

    $I(X; Y) = I(Y; X)$

Mutual information can be defined in many ways:

    $I(X; Y) = H(X) - H(X \mid Y)$
    $I(X; Y) = H(Y) - H(Y \mid X)$
    $I(X; Y) = H(X) + H(Y) - H(X, Y)$
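These identities are easy to verify numerically. The sketch below (mine, not from the notes; the 2x2 joint distribution is an arbitrary assumption) computes I(X; Y) all three ways from one small joint table and confirms that the answers agree.

```python
# Sketch: checking the three equivalent expressions for mutual information.
import math

def H(probs):
    """Shannon entropy in bits, with 0 * log(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # assumed P(X, Y)
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

H_X, H_Y, H_XY = H(p_x.values()), H(p_y.values()), H(joint.values())
H_X_given_Y = H_XY - H_Y   # chain rule: H(X, Y) = H(Y) + H(X | Y)
H_Y_given_X = H_XY - H_X

print(H_X - H_X_given_Y)   # I(X; Y) = H(X) - H(X | Y)
print(H_Y - H_Y_given_X)   # I(X; Y) = H(Y) - H(Y | X)
print(H_X + H_Y - H_XY)    # I(X; Y) = H(X) + H(Y) - H(X, Y); all three agree
```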

Figure 1: Venn diagram of the relations between the entropies of two variables.

7 Simple Examples, Returned

We will now return full circle and calculate the mutual information I in the two beginning examples. In other words, we will formally quantify our previous qualitative notions.

7.1 Example A: Returned

X = {BUY, SELL, HOLD}, P(X) = {1/2, 1/2, 0}.

The entropy beforehand is H(X):

    $H(X) = -\left[ \tfrac{1}{2} \log_2 \tfrac{1}{2} + \tfrac{1}{2} \log_2 \tfrac{1}{2} + 0 \log_2 0 \right]$

But notice that 0 log 0 is not well defined. This brings up the complicated issue of support, which authors go to great lengths to address. The simple, ad-hoc way of avoiding these proofs is just to state that in the context of information theory, $0 \log_b 0 \equiv 0$.

    $H(X) = -\left[ \tfrac{1}{2} \log_2 \tfrac{1}{2} + \tfrac{1}{2} \log_2 \tfrac{1}{2} + 0 \right] = -\left[ -\tfrac{1}{2} - \tfrac{1}{2} \right] = 1$

Before calculating H(X | Y), we need to compute P(X | Y):

    P(X | Y = 0) = {1, 0, 0}
    P(X | Y = 1) = {0, 1, 0}

Now we can compute the associated entropy:

    $H(X \mid Y) = P(Y = 0)\, H(X \mid Y = 0) + P(Y = 1)\, H(X \mid Y = 1) = 0 + 0 = 0$

Therefore, the mutual information is $I(X; Y) = H(X) - H(X \mid Y) = 1 - 0 = 1$ bit.
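A short numerical sketch (mine, not from the notes) reproduces these numbers from the distributions above.

```python
# Sketch: Example A numerically. H(X) = 1 bit, H(X | Y) = 0, so I(X; Y) = 1 bit.
import math

def H(probs):
    """Shannon entropy in bits, with 0 * log(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_x = [0.5, 0.5, 0.0]            # P(BUY), P(SELL), P(HOLD)
p_x_given_y0 = [1.0, 0.0, 0.0]   # receiving a 0 means BUY for certain
p_x_given_y1 = [0.0, 1.0, 0.0]   # receiving a 1 means SELL for certain
p_y = [0.5, 0.5]

H_X = H(p_x)                                                        # 1.0 bit
H_X_given_Y = p_y[0] * H(p_x_given_y0) + p_y[1] * H(p_x_given_y1)   # 0.0 bits
print(H_X - H_X_given_Y)                                            # I(X; Y) = 1.0 bit
```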

7.2 Example B: Returned

First of all, I need to state beforehand that this problem is a famous first-chapter problem in any information theory textbook. It is often called the binary symmetric channel or, more colloquially, the noisy typewriter. Let's just state all of the probability distributions before calculating the entropies. Let the variable p = 0.1 be the probability of incorrect transmission.

    P(X) = {1/2, 1/2, 0}
    P(X | Y = 0) = {1 - p, p, 0}
    P(X | Y = 1) = {p, 1 - p, 0}

We can now calculate all of the entropies:

    $H(X) = 1$
    $H(X \mid Y = 0) = -\left[ (1 - p) \log_2 (1 - p) + p \log_2 p \right] = (1 - p) \log_2 \frac{1}{1 - p} + p \log_2 \frac{1}{p}$
    $H(X \mid Y = 1) = -\left[ (1 - p) \log_2 (1 - p) + p \log_2 p \right] = (1 - p) \log_2 \frac{1}{1 - p} + p \log_2 \frac{1}{p}$

Finally, we can calculate the mutual information:

    $I(X; Y) = H(X) - \left[ P(Y = 0)\, H(X \mid Y = 0) + P(Y = 1)\, H(X \mid Y = 1) \right]$
    $I(X; Y) = 1 - \left[ \tfrac{1}{2}\left( (1 - p) \log_2 \frac{1}{1 - p} + p \log_2 \frac{1}{p} \right) + \tfrac{1}{2}\left( (1 - p) \log_2 \frac{1}{1 - p} + p \log_2 \frac{1}{p} \right) \right]$
    $I(X; Y) = 1 - \left[ (1 - p) \log_2 \frac{1}{1 - p} + p \log_2 \frac{1}{p} \right]$
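The final expression is easy to evaluate. The sketch below (mine, not from the notes; the helper name h_binary is an assumption) computes I(X; Y) = 1 - [(1 - p) log2 1/(1 - p) + p log2 1/p] for a few error probabilities, including the 10% error rate of Situation B.

```python
# Sketch: mutual information of the binary symmetric channel as a function of p.
import math

def h_binary(p: float) -> float:
    """Binary entropy in bits, with the convention 0 * log(0) = 0."""
    return sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

def mutual_information(p: float) -> float:
    return 1.0 - h_binary(p)

print(mutual_information(0.1))  # ~0.531 bits for the 10% error rate of Situation B
print(mutual_information(0.0))  # 1.0 bit: recovers the Example A answer
print(mutual_information(0.5))  # 0.0 bits: the channel conveys nothing
```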

As a consistency check, notice that if p = 0, we recover the solution for Example A of 1 bit. This function is plotted in Figure 2.

Figure 2: Mutual information I(X; Y) in a binary symmetric channel, as a function of the error probability p.

8 Conclusions

1. Classical information theory requires a preset probability model.
2. Information is selection between possibilities.
3. Entropy is an extensive measure of uncertainty.
4. Food for thought: if entropy is an extensive quantity, what is an invariant of a system?

9 References

1. Cover, T. and Thomas, J. (1991) Elements of Information Theory. New York: John Wiley and Sons.
2. Rieke, F. et al. (1997) Spikes: Exploring the Neural Code. Cambridge, MA: MIT Press.
3. MacKay, David (2003) Information Theory, Inference and Learning Algorithms. Online: http://www.inference.phy.cam.ac.uk/mackay/itprnn/book.html