3F1 Information Theory, Lecture 1


Jossy Sayir, Department of Engineering. Michaelmas 2013, 22 November 2013.

Course Organisation

- 4 lectures
- Course material: lecture notes (4 sections) and slides
- 1 examples paper
- Exam material: notes, examples paper and past exams

Course contents

1. Introduction to Information Theory
2. Good Variable Length Codes
3. Higher Order Sources
4. Communication Channels
5. Continuous Random Variables

This lecture

- A bit of history...
- Hartley's measure of information
- Shannon's uncertainty / entropy
- Properties of Shannon's entropy

Claude Elwood Shannon (1916-2001) [portrait photograph]

Shannon's paper, 1948

Reprinted with corrections from The Bell System Technical Journal, Vol. 27, pp. 379-423, 623-656, July, October, 1948. A Mathematical Theory of Communication, by C. E. Shannon.

INTRODUCTION

The recent development of various methods of modulation such as PCM and PPM which exchange bandwidth for signal-to-noise ratio has intensified the interest in a general theory of communication. A basis for such a theory is contained in the important papers of Nyquist [1] and Hartley [2] on this subject. In the present paper we will extend the theory to include a number of new factors, in particular the effect of noise in the channel, and the savings possible due to the statistical structure of the original message and due to the nature of the final destination of the information.

The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. Frequently the messages have meaning; that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design.

If the number of messages in the set is finite then this number or any monotonic function of this number can be regarded as a measure of the information produced when one message is chosen from the set, all choices being equally likely. As was pointed out by Hartley the most natural choice is the logarithmic function. Although this definition must be generalized considerably when we consider the influence of the statistics of the message and when we have a continuous range of messages, we will in all cases use an essentially logarithmic measure.

Shannon's paper, 1948

[Fig. 1 - Schematic diagram of a general communication system: information source -> (message) -> transmitter -> (signal) -> channel with noise source -> (received signal) -> receiver -> (message) -> destination]

... a decimal digit is about 3 1/3 bits. A digit wheel on a desk computing machine has ten stable positions and therefore has a storage capacity of one decimal digit. In analytical work where integration and differentiation are involved the base e is sometimes useful. The resulting units of information will be called natural units. Change from the base a to base b merely requires multiplication by log_b a.

By a communication system we will mean a system of the type indicated schematically in Fig. 1. It consists of essentially five parts:

1. An information source which produces a message or sequence of messages to be communicated to the receiving terminal. The message may be of various types: (a) A sequence of letters as in a telegraph or teletype system; (b) A single function of time f(t) as in radio or telephony; (c) A function of time and other variables as in black and white television - here the message may be thought of as a ...
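As a quick check on the figure quoted above (my own worked line, not in the slides or the paper): one decimal digit carries 1 Hartley of information, and converting to base 2 by the base-change rule quoted above gives

\log_2 10 = \frac{1}{\log_{10} 2} \approx 3.32 \approx 3\tfrac{1}{3} \text{ bits},

consistent with Shannon's "about 3 1/3 bits".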

R.V.L. Hartley (1888-1970) [portrait photograph]

Uncertainty (Entropy)

Shannon Entropy: for a discrete random variable X,

H(X) \stackrel{\text{def}}{=} -\sum_{x \in \mathrm{supp}(P_X)} P_X(x) \log_b P_X(x) = E[-\log_b P_X(X)]

- b = 2: entropy in bits
- b = e: entropy in nats
- b = 10: entropy in Hartleys
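A minimal numerical sketch of this definition (illustrative only; the function name entropy and the example distribution are my own choices, not course code):

    import math

    def entropy(probs, base=2.0):
        """Shannon entropy: -sum of p(x) log_b p(x) over the support of P_X."""
        return -sum(p * math.log(p, base) for p in probs if p > 0)

    p = [0.5, 0.25, 0.125, 0.125]    # an example P_X
    print(entropy(p, 2))             # 1.75 bits
    print(entropy(p, math.e))        # ~1.21 nats
    print(entropy(p, 10))            # ~0.53 Hartleys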

Properties of the Entropy

Extremes: if X is defined over an alphabet of size N, then

0 \le H(X) \le \log_b N

with equality on the left if and only if |\mathrm{supp}(P_X)| = 1, and on the right if and only if P_X(x) = 1/N for all x.

Proof:

H(X) - \log_b N = -\sum_x P_X(x) \log_b P_X(x) - \sum_x P_X(x) \log_b N
                = \frac{1}{\log_e b} \sum_x P_X(x) \log_e \frac{1}{N P_X(x)}
                \le \frac{1}{\log_e b} \sum_x P_X(x) \left( \frac{1}{N P_X(x)} - 1 \right)    (IT-inequality)
                = \frac{1}{\log_e b} \left( \sum_x \frac{1}{N} - \sum_x P_X(x) \right) = 0
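A hypothetical numerical sanity check of these extremes (my own sketch): a single-point distribution sits at the lower bound, the uniform distribution at the upper bound, and random distributions fall in between.

    import math, random

    def entropy_bits(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    N = 8
    print(entropy_bits([1.0] + [0.0] * (N - 1)))   # 0.0, single-point support
    print(entropy_bits([1.0 / N] * N))             # 3.0 = log2(N)
    for _ in range(1000):
        w = [random.random() for _ in range(N)]
        p = [x / sum(w) for x in w]
        assert 0 <= entropy_bits(p) <= math.log2(N) + 1e-12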

IT-Inequality

[Figure: plot of \log_e(x) against the line x - 1, showing \log_e(x) lying below x - 1 and touching it at x = 1]

Lemma (IT-Inequality): \log_e(x) \le x - 1, with equality if and only if x = 1.
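A one-line justification of the lemma (my addition, not on the slide): let f(x) = \log_e x - (x - 1) for x > 0; then f'(x) = 1/x - 1 is positive for x < 1 and negative for x > 1, so f attains its maximum at x = 1, where f(1) = 0. Hence \log_e x \le x - 1, with equality only at x = 1.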

Binary Entropy Function h(p)

[Figure: plot of the binary entropy function h(p) against p for 0 \le p \le 1, with maximum h(1/2) = 1]

Good to remember: h(0.11) \approx 1/2.
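A short sketch reproducing the value worth remembering (illustrative only; the helper h is my own naming):

    import math

    def h(p):
        """Binary entropy in bits: -p log2 p - (1-p) log2 (1-p)."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    print(h(0.5))    # 1.0 bit, the maximum
    print(h(0.11))   # ~0.4999 bits, i.e. h(0.11) is roughly 1/2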

Block and Conditional Entropy

Entropy naturally extends to vectors or "blocks":

H(X) = -\sum_x P_X(x) \log P_X(x)
H(XY) = -\sum_{x,y} P_{XY}(x, y) \log P_{XY}(x, y)

Entropy conditioned on an event:

H(X | Y = y) = -\sum_x P_{X|Y}(x|y) \log P_{X|Y}(x|y)

Equivocation or conditional entropy:

H(X | Y) = \sum_y P_Y(y) H(X | Y = y) = E[-\log P_{X|Y}(X|Y)]
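A small worked sketch of these definitions on a made-up 2x2 joint distribution (the table, names and values are mine; a hedged illustration, not course code):

    import math

    # Hypothetical joint distribution P_XY(x, y); the values are arbitrary
    P = {('a', 0): 0.25, ('a', 1): 0.25,
         ('b', 0): 0.40, ('b', 1): 0.10}

    def H(dist):
        """Entropy in bits of a distribution given as {outcome: probability}."""
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    # Marginal P_Y
    P_Y = {}
    for (x, y), p in P.items():
        P_Y[y] = P_Y.get(y, 0.0) + p

    def H_X_given_event(y):
        """Entropy conditioned on the event Y = y."""
        cond = {x: P[(x, yy)] / P_Y[y] for (x, yy) in P if yy == y}
        return H(cond)

    # Equivocation H(X|Y) = sum over y of P_Y(y) * H(X | Y = y)
    H_X_given_Y = sum(P_Y[y] * H_X_given_event(y) for y in P_Y)

    print(H(P))                # joint entropy H(XY), about 1.86 bits
    print(H_X_given_event(0))  # H(X | Y = 0)
    print(H_X_given_Y)         # H(X | Y)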

Chain rule of entropies

Two random variables:

H(XY) = H(X) + H(Y|X) = H(Y) + H(X|Y)

Follows directly from our definition of H(Y|X).

Any number of random variables:

H(X_1 X_2 \cdots X_N) = H(X_1) + H(X_2|X_1) + \cdots + H(X_N | X_1 \cdots X_{N-1})

Follows from recursive application of the two-variable chain rule.
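A quick numerical check of the two-variable chain rule (again on an arbitrary made-up joint distribution; the numbers have no significance):

    import numpy as np

    P = np.array([[0.25, 0.25],
                  [0.40, 0.10]])   # joint P_XY: rows index x, columns index y

    def H(p):
        """Entropy in bits of a probability array (zero entries ignored)."""
        p = np.asarray(p).ravel()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    P_X, P_Y = P.sum(axis=1), P.sum(axis=0)
    H_Y_given_X = sum(P_X[i] * H(P[i, :] / P_X[i]) for i in range(P.shape[0]))
    H_X_given_Y = sum(P_Y[j] * H(P[:, j] / P_Y[j]) for j in range(P.shape[1]))

    print(np.isclose(H(P), H(P_X) + H_Y_given_X))   # True: H(XY) = H(X) + H(Y|X)
    print(np.isclose(H(P), H(P_Y) + H_X_given_Y))   # True: H(XY) = H(Y) + H(X|Y)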

Does conditioning reduce uncertainty?

Example: X is the colour of a randomly picked ball, Y is the row the ball is picked from (1-10).

H(X) = h(1/4) = 2 - \frac{3}{4}\log_2 3 = 0.811 bits
H(X | Y = 1) = 1 bit > H(X)
H(X | Y = 2) = 0 bits < H(X)
H(X | Y) = \frac{5 \cdot 1 + 5 \cdot 0}{10} = \frac{1}{2} bit < H(X)

Conditioning Theorem: 0 \le H(X|Y) \le H(X).

Conditioning on a random variable never increases uncertainty / entropy (but conditioning on an event can increase it).

Proof of conditioning theorem

H(X|Y) - H(X) = H(XY) - H(X) - H(Y)                                      (chain rule)
              = \sum_{x,y} P(x,y) \log \frac{P(x)P(y)}{P(x,y)}
              \le \sum_{x,y} P(x,y) \left[ \frac{P(x)P(y)}{P(x,y)} - 1 \right]   (IT-inequality)
              = \sum_{x,y} P(x)P(y) - \sum_{x,y} P(x,y) = 0

Mutual Information

Definition:

I(X;Y) = H(X) - H(X|Y)

Mutual information is mutual:

I(X;Y) = H(X) + H(Y) - H(XY) = H(Y) - H(Y|X)
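A sketch computing the mutual information of an arbitrary made-up joint distribution in the three equivalent forms above (illustrative only; it reuses the same kind of entropy helper as the earlier sketches):

    import numpy as np

    P = np.array([[0.25, 0.25],
                  [0.40, 0.10]])   # arbitrary joint P_XY

    def H(p):
        p = np.asarray(p).ravel()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    P_X, P_Y = P.sum(axis=1), P.sum(axis=0)
    H_X_given_Y = sum(P_Y[j] * H(P[:, j] / P_Y[j]) for j in range(P.shape[1]))
    H_Y_given_X = sum(P_X[i] * H(P[i, :] / P_X[i]) for i in range(P.shape[0]))

    print(H(P_X) - H_X_given_Y)      # I(X;Y) as H(X) - H(X|Y)
    print(H(P_Y) - H_Y_given_X)      # the same value as H(Y) - H(Y|X)
    print(H(P_X) + H(P_Y) - H(P))    # and again as H(X) + H(Y) - H(XY)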

Positivity of Mutual Information

Theorem: I(X;Y) \ge 0, with equality if and only if X and Y are independent.

Equivalent to H(X|Y) \le H(X).