Entropies & Information Theory


Entropies & Information Theory, Lecture I. Nilanjana Datta, University of Cambridge, U.K. See lecture notes at: http://www.qi.damtp.cam.ac.uk/node/223

Quantum Information Theory is born out of Classical Information Theory, the mathematical theory of the storage, transmission & processing of information. Quantum Information Theory studies how these tasks can be accomplished using quantum-mechanical systems as information carriers (e.g. photons, electrons, ...). It lies at the intersection of Quantum Physics and Information Theory.

The underlying quantum mechanics brings distinctively new features. These can be exploited to improve the performance of certain information-processing tasks, as well as to accomplish tasks which are impossible in the classical realm!

Classical Information Theory: 1948, Claude Shannon. He posed 2 questions:
(Q1) What is the limit to which information can be reliably compressed? Relevance: the physical limit on storage.
(Q2) What is the maximum amount of information that can be transmitted reliably per use of a communications channel? Relevance: noise in communications channels.
Here information = data = signals = messages = outputs of a source.

Classical Information Theory: 1948, Claude Shannon. He posed 2 questions:
(Q1) What is the limit to which information can be reliably compressed?
(A1) Shannon's Source Coding Theorem: the data compression limit = the Shannon entropy of the source.
(Q2) What is the maximum amount of information that can be transmitted reliably per use of a communications channel?
(A2) Shannon's Noisy Channel Coding Theorem: the maximum rate of information transmission is given in terms of the mutual information.

Shannon: information is linked to uncertainty. What is information? Information gain = decrease in the uncertainty of an event, so a measure of information is a measure of uncertainty.
Surprisal (or self-information): consider an event described by a random variable (r.v.) $X \sim p(x)$ (its p.m.f.), with $x \in J$ (a finite alphabet). A measure of the uncertainty in getting the outcome $x$ is its surprisal
$$-\log p(x), \qquad \log \equiv \log_2:$$
a highly improbable outcome is surprising! The surprisal depends only on $p(x)$, not on the value $x$ taken by $X$; it is continuous in $p(x)$ and additive for independent events.

Shannon entropy = average surprisal. Definition: the Shannon entropy $H(X)$ of a discrete r.v. $X \sim p(x)$ is
$$H(X) := -\sum_{x \in J} p(x) \log p(x), \qquad \log \equiv \log_2.$$
Convention: $0 \log 0 := 0$, since $\lim_{w \to 0} w \log w = 0$ (if an event has zero probability, it does not contribute to the entropy).
$H(X)$ is a measure of the uncertainty of the r.v. $X$; it also quantifies the amount of information we gain, on average, when we learn the value of $X$. Since $H(X)$ depends only on the probabilities, we also write $H(X) = H(p_X)$, where $p_X = \{p(x)\}_{x \in J}$.
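As an illustration (my own sketch, not part of the lecture), the definition can be computed directly; the function name shannon_entropy is arbitrary.

```python
# Minimal sketch: Shannon entropy H(X) = -sum_x p(x) log2 p(x) of a finite p.m.f.
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a p.m.f. given as a sequence of probabilities."""
    # Convention: 0 log 0 = 0, so zero-probability outcomes are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))   # 1.0 bit  (fair coin)
print(shannon_entropy([1.0, 0.0]))   # zero uncertainty (deterministic outcome)
print(shannon_entropy([0.25] * 4))   # 2.0 bits (uniform over 4 outcomes)
```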

Example: binary entropy. Let $X \sim p(x)$ with $J = \{0,1\}$, $p(0) = p$, $p(1) = 1-p$. Then
$$H(X) = -p\log p - (1-p)\log(1-p) =: h(p).$$
Properties: $0 \le h(p) \le 1$; $h(p) = 0$ for $p = 0$ or $p = 1$ (no uncertainty); $h(p) = 1$ for $p = 0.5$ (maximum uncertainty); $h(p)$ is a continuous, concave function of $p$.
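For example, a worked evaluation at $p = 0.1$ (my own addition, not on the slide):
$$h(0.1) = -0.1\log_2 0.1 - 0.9\log_2 0.9 \approx 0.332 + 0.137 \approx 0.469 \ \text{bits},$$
so a heavily biased coin carries far less than one bit of uncertainty per toss.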

Operational significance of the Shannon entropy: it is the optimal rate of data compression for a classical memoryless (i.i.d.) information source.
Classical information source: its outputs/signals are sequences of letters from a finite set $J$, the source alphabet. Examples: (i) the binary alphabet $J = \{0,1\}$; (ii) telegraph English: 26 letters + a space; (iii) written English: 26 letters in upper & lower case + punctuation.

Simplest example: a memoryless source. Successive signals are independent of each other, and the source is characterized by a probability distribution $\{p(u)\}_{u \in J}$: on each use of the source, a letter $u \in J$ is emitted with probability $p(u)$. The source is modelled by a sequence of i.i.d. random variables $U_1, U_2, \ldots, U_n$ with $U_i \sim p(u)$, i.e. $p(u) = P(U_k = u)$ for all $u \in J$ and $1 \le k \le n$. The signal emitted by $n$ uses of the source is $\mathbf{u} = (u_1, u_2, \ldots, u_n) \in J^n$, with probability
$$p(\mathbf{u}) = P(U_1 = u_1, U_2 = u_2, \ldots, U_n = u_n) = p(u_1)\,p(u_2)\cdots p(u_n).$$
Shannon entropy of the source: $H(U) := -\sum_{u \in J} p(u)\log p(u)$.
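A small worked example (my own, not from the slides): for a binary memoryless source with $p(0) = 0.9$ and $p(1) = 0.1$, the signal $\mathbf{u} = (0,1,0)$ has probability
$$p(0,1,0) = 0.9 \times 0.1 \times 0.9 = 0.081, \qquad H(U) = h(0.1) \approx 0.469 \ \text{bits per use of the source}.$$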

(Q) Why is data compression possible?
(A) There is redundancy in the information emitted by the source: an information source typically produces some outputs more frequently than others (in English text, 'e' occurs more frequently than 'z'). During data compression one exploits this redundancy in the data to form the most compressed version possible.
Variable-length coding: more frequently occurring signals (e.g. 'e') are assigned shorter descriptions (fewer bits) than the less frequent ones (e.g. 'z').
Fixed-length coding: identify a set of signals which have a high probability of occurrence, the typical signals; assign a unique fixed-length ($\ell$) binary string to each of them; all other (atypical) signals are assigned a single binary string of the same length $\ell$.

Typical sequences. Definition: consider an i.i.d. information source $U_1, U_2, \ldots, U_n$ with $U_i \sim p(u)$, $u \in J$, and fix $\varepsilon > 0$. The sequences $\mathbf{u} = (u_1, u_2, \ldots, u_n) \in J^n$ for which
$$2^{-n(H(U)+\varepsilon)} \le p(u_1, u_2, \ldots, u_n) \le 2^{-n(H(U)-\varepsilon)},$$
where $H(U)$ is the Shannon entropy of the source, are called typical sequences. $T^{(n)}_\varepsilon$: the typical set = the set of typical sequences.
Note: typical sequences are almost equiprobable: for $\mathbf{u} \in T^{(n)}_\varepsilon$, $p(\mathbf{u}) \approx 2^{-nH(U)}$.
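To see the concentration behind typicality, here is a small simulation (my own illustration, with an assumed binary source): the quantity $-\tfrac{1}{n}\log_2 p(\mathbf{u})$ of a randomly emitted sequence concentrates around $H(U)$.

```python
# Illustration (assumed binary source): -(1/n) log2 p(u) concentrates around H(U)
# for long i.i.d. sequences, which is the statement that most sequences are typical.
import math
import random

p = {0: 0.9, 1: 0.1}                                 # binary memoryless source
H = -sum(q * math.log2(q) for q in p.values())       # H(U) = h(0.1) ~ 0.469 bits

n = 10_000
random.seed(0)
for trial in range(5):
    u = random.choices(list(p), weights=list(p.values()), k=n)
    log_prob = sum(math.log2(p[sym]) for sym in u)
    print(f"-(1/n) log2 p(u) = {-log_prob / n:.4f}   (H(U) = {H:.4f})")
```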

Operational significance of the Shannon entropy.
(Q) What is the optimal rate of reliable data compression for such a source? [i.e. the minimum number of bits needed to store the signals emitted per use of the source]
The optimal rate is evaluated in the asymptotic limit $n \to \infty$, where $n$ is the number of uses of the source; one requires the probability of error $p^{(n)}_{\mathrm{error}} \to 0$ as $n \to \infty$.
(A) The optimal rate of data compression = $H(U)$, the Shannon entropy of the source.

Compression-decompression scheme. Suppose $U_1, U_2, \ldots, U_n$, with $U_i \sim p(u)$, $u \in J$, is an i.i.d. information source with Shannon entropy $H(U)$.
A compression scheme of rate $R$: a map $E_n : \mathbf{u} = (u_1, u_2, \ldots, u_n) \in J^n \mapsto \mathbf{x} = (x_1, x_2, \ldots, x_{m_n}) \in \{0,1\}^{m_n}$ with $m_n = nR$. (When is this a compression scheme?)
Decompression: a map $D_n : \{0,1\}^{m_n} \to J^n$.
Average probability of error: $p^{(n)}_{\mathrm{av}} := \sum_{\mathbf{u}:\, D_n(E_n(\mathbf{u})) \neq \mathbf{u}} p(\mathbf{u})$.
The compression-decompression scheme is reliable if $p^{(n)}_{\mathrm{av}} \to 0$ as $n \to \infty$.
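A rough numerical sketch of why rate $R \approx H(U)$ suffices (my own illustration, with assumed parameters): count the typical sequences for a modest $n$ and note that $\tfrac{1}{n}\log_2 |T^{(n)}_\varepsilon| \approx H(U)$, so roughly $nH(U)$ bits are enough to index them.

```python
# Illustration (assumed small binary example): count the epsilon-typical sequences
# of length n for a source with P(1) = p1, and compare (1/n) log2 |T| with H(U).
# Indexing the typical set needs about n*H(U) bits, i.e. a compression rate ~ H(U).
import math

p1, eps, n = 0.25, 0.1, 20
H = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))

size, prob = 0, 0.0
for k in range(n + 1):                        # k = number of 1s in the sequence
    log_p = k * math.log2(p1) + (n - k) * math.log2(1 - p1)
    if -n * (H + eps) <= log_p <= -n * (H - eps):
        size += math.comb(n, k)               # every such sequence is typical
        prob += math.comb(n, k) * 2.0 ** log_p

print(f"H(U) = {H:.3f} bits")
print(f"|T| = {size},  (1/n) log2 |T| = {math.log2(size) / n:.3f}")
print(f"P(T) = {prob:.2f}  (tends to 1 as n grows)")
```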

Shannon's Source Coding Theorem: suppose $U_1, U_2, \ldots, U_n$, with $U_i \sim p(u)$, $u \in J$, is an i.i.d. information source with Shannon entropy $H(U)$.
If $R > H(U)$, then there exists a reliable compression scheme of rate $R$ for the source.
If $R < H(U)$, then any compression scheme of rate $R$ will not be reliable.

Entropies for a pair of random variables. Consider a pair of discrete random variables $X \sim p(x)$, $x \in J_X$, and $Y \sim p(y)$, $y \in J_Y$, with joint probabilities $P(X = x, Y = y) = p(x,y)$ and conditional probabilities $P(Y = y \mid X = x) = p(y \mid x)$.
Joint entropy: $H(X,Y) := -\sum_{x \in J_X}\sum_{y \in J_Y} p(x,y)\log p(x,y)$.
Conditional entropy:
$$H(Y \mid X) := \sum_{x \in J_X} p(x)\, H(Y \mid X = x) = -\sum_{x \in J_X} p(x) \sum_{y \in J_Y} p(y \mid x)\log p(y \mid x) = -\sum_{x \in J_X}\sum_{y \in J_Y} p(x,y)\log p(y \mid x) \;\ge\; 0.$$
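A quick numerical sketch (my own example table, not from the slides): computing $H(X,Y)$ and $H(Y \mid X)$ directly from a joint distribution.

```python
# Illustration with an assumed 2x2 joint distribution p(x, y): compute the joint
# entropy H(X,Y) and the conditional entropy H(Y|X).
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

H_XY = -sum(p * math.log2(p) for p in p_xy.values() if p > 0)

p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
H_YgX = -sum(p * math.log2(p / p_x[x]) for (x, _), p in p_xy.items() if p > 0)

print(f"H(X,Y) = {H_XY:.4f} bits")
print(f"H(Y|X) = {H_YgX:.4f} bits")   # equals H(X,Y) - H(X) by the chain rule
```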

Entropies for a pair of random variables. Relative entropy: a measure of the distance between two probability distributions $p = \{p(x)\}_{x \in J}$ and $q = \{q(x)\}_{x \in J}$:
$$D(p\|q) := \sum_{x \in J} p(x)\log\frac{p(x)}{q(x)},$$
with the conventions $0\log\frac{0}{u} := 0$ and $u\log\frac{u}{0} := \infty$ for $u > 0$.
$D(p\|q) \ge 0$, with $D(p\|q) = 0$ if and only if $p = q$. BUT it is not a true distance: it is not symmetric and does not satisfy the triangle inequality.
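A small check (my own example distributions): $D(p\|q) \ge 0$, but $D(p\|q) \neq D(q\|p)$ in general.

```python
# Illustration with two assumed distributions on {0,1}: the relative entropy is
# non-negative but not symmetric, so it is not a true distance.
import math

def rel_entropy(p, q):
    """D(p||q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p, q = [0.5, 0.5], [0.9, 0.1]
print(rel_entropy(p, q))   # ~0.737
print(rel_entropy(q, p))   # ~0.531  (different value: not symmetric)
print(rel_entropy(p, p))   # 0.0     (zero iff the two distributions coincide)
```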

Entropies for a pair of random variables. Mutual information: a measure of the amount of information one r.v. contains about another. For $X \sim p(x)$ and $Y \sim p(y)$:
$$I(X:Y) := \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)} = D(p_{XY}\,\|\,p_X\, p_Y),$$
where $p_{XY} = \{p(x,y)\}_{x,y}$, $p_X = \{p(x)\}_x$, $p_Y = \{p(y)\}_y$.
Chain rules: $H(X,Y) = H(Y \mid X) + H(X)$, and
$$I(X:Y) = H(X) + H(Y) - H(X,Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X).$$
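Continuing the numerical sketch (same assumed joint table as above): compute $I(X:Y)$ and verify the chain-rule identities.

```python
# Illustration (assumed joint distribution): verify numerically that
# I(X:Y) = H(X) + H(Y) - H(X,Y) = H(Y) - H(Y|X).
import math

p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy)
I = sum(p * math.log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items() if p > 0)

print(f"I(X:Y)           = {I:.4f} bits")
print(f"H(X)+H(Y)-H(X,Y) = {H_X + H_Y - H_XY:.4f} bits")
print(f"H(Y)-H(Y|X)      = {H_Y - (H_XY - H_X):.4f} bits")
```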

Properties of entropies. Let $X \sim p(x)$ and $Y \sim p(y)$ be discrete random variables. Then:
$H(X) \ge 0$, with equality if and only if $X$ is deterministic.
$H(X) \le \log|J|$ if $x \in J$.
Subadditivity: $H(X,Y) \le H(X) + H(Y)$.
Concavity: if $p_X$ and $p_Y$ are two probability distributions and $0 \le \lambda \le 1$, then $H(\lambda p_X + (1-\lambda)p_Y) \ge \lambda H(p_X) + (1-\lambda)H(p_Y)$.
$H(Y \mid X) \ge 0$, or equivalently $H(X,Y) \ge H(X)$.
$I(X:Y) \ge 0$, with equality if and only if $X$ and $Y$ are independent.

So far: classical entropies and their properties, and classical data compression, the answer to Shannon's question (Q1): What is the limit to which information can be reliably compressed? (A1) Shannon's Source Coding Theorem: the data compression limit = the Shannon entropy of the source.

Shannon's 2nd question. (Q2) What is the maximum amount of information that can be transmitted reliably per use of a communications channel?
The biggest hurdle in the path of efficient transmission of information is the presence of noise in the communications channel: noise distorts the information sent through the channel (input → noisy channel → output, where the output may differ from the input).
To combat the effects of noise: use error-correcting codes.

To overcome the effects of noise, instead of transmitting the original messages:
-- the sender encodes her messages into suitable codewords;
-- these codewords are then sent through (multiple uses of) the channel.
Schematically: Alice's message → encoding $E_n$ → codeword (input) → $n$ uses of $N$ (i.e. $N^{(n)}$) → output → decoding $D_n$ → Bob's inference.
Error-correcting code: $\mathcal{C}_n := (E_n, D_n)$.

The idea behind the encoding: to introduce redundancy in the message so that, upon decoding, Bob can retrieve the original message with a low probability of error. The amount of redundancy which needs to be added depends on the noise in the channel.

Example: repetition code over the memoryless binary symmetric channel (m.b.s.c.). The channel transmits single bits; the effect of the noise is to flip the bit with probability $p$ (the bit is transmitted unchanged with probability $1-p$).
Encoding: $0 \mapsto 000$, $1 \mapsto 111$ (the codewords); the 3 bits are sent through 3 successive uses of the m.b.s.c.
Suppose the codeword $000$ is sent and $010$ comes out of the channel.
Decoding (majority voting): Bob receives $010$ and infers $0$.

Probability of error for the m.b.s.c.:
-- without encoding: $p$;
-- with encoding: Prob(2 or more bits flipped) $=: q$.
(For the received word $010$, the possible inputs to the 3 uses of the m.b.s.c. are the codewords $000$ and $111$.)
Exercise: prove that $q < p$ if $p < 1/2$; in this case encoding helps! (Encoding + decoding = the repetition code.)
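A short worked check of the exercise (my own addition):
$$q = 3p^2(1-p) + p^3 = 3p^2 - 2p^3, \qquad p - q = p(1 - 3p + 2p^2) = p(1-p)(1-2p) > 0 \quad \text{for } 0 < p < \tfrac12,$$
so the repetition code strictly reduces the error probability whenever $p < 1/2$.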

Information transmission is said to be reliable if the probability of error in decoding the output vanishes asymptotically in the number of uses of the channel.
Aim: to achieve reliable information transmission whilst optimizing the rate, i.e. the amount of information that can be sent per use of the channel.
The optimal rate of reliable information transmission is the capacity.

Discrete classical channel $N$: $J_X$ = input alphabet, $J_Y$ = output alphabet. For $n$ uses of $N$: input $x^{(n)} = (x_1, \ldots, x_n) \in J_X^n$, output $y^{(n)} \in J_Y^n$; the channel $N^{(n)}$ is described by the conditional probabilities $p(y^{(n)} \mid x^{(n)})$, which are known to the sender & receiver.

The correspondence between input & output sequences is not 1-1: distinct inputs $x^{(n)}, x'^{(n)} \in J_X^n$ can give rise to the same output $y^{(n)} \in J_Y^n$.
Shannon proved that it is possible to choose a subset of input sequences such that there exists only 1 highly likely input corresponding to a given output. Use these input sequences as codewords.

Transmission of information through a classical channel. Alice has a finite set of messages $M$ and wishes to send a message $m \in M$ to Bob.
Alice's message $m$ → encoding $E_n$ → codeword $x^{(n)} = (x_1, x_2, \ldots, x_n)$ (input) → noisy channel $N^{(n)}$ ($n$ uses of $N$, described by $p(y^{(n)} \mid x^{(n)})$) → output $y^{(n)} = (y_1, y_2, \ldots, y_n)$ → decoding $D_n$ → Bob's inference $m' \in M$.
Error-correcting code: $\mathcal{C}_n := (E_n, D_n)$.

If Bob's inference $m'$ differs from Alice's message $m$, then an error occurs! Information transmission is reliable if the probability of error $\to 0$ as $n \to \infty$.
Rate of information transmission = the number of bits of message transmitted per use of the channel.
Aim: achieve reliable transmission whilst maximizing the rate. Shannon: there is a fundamental limit on the rate of reliable information transmission, and it is a property of the channel.
Capacity: the maximum rate of reliable information transmission.

Shannon, in his Noisy Channel Coding Theorem, obtained an explicit expression for the capacity of a memoryless classical channel, for which
$$p(y^{(n)} \mid x^{(n)}) = \prod_{i=1}^{n} p(y_i \mid x_i).$$
Memoryless (classical or quantum) channels: the action of each use of the channel is identical and independent for different uses, i.e. the noise affecting the states transmitted through the channel on successive uses is assumed to be uncorrelated.

Classical memoryless channel: a schematic representation. Input $X \sim p(x)$, $x \in J_X$ → channel $N$: a set of conditional probabilities $p(y \mid x)$ → output $y \in J_Y$.
Capacity:
$$C(N) = \max_{p(x)} I(X:Y),$$
where the maximum is over input distributions, $I(X:Y) = H(X) + H(Y) - H(X,Y)$ is the mutual information, and $H(X) = -\sum_x p(x)\log p(x)$ is the Shannon entropy.
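As a numerical sketch (my own illustration, assuming a binary symmetric channel with flip probability $p$): maximizing $I(X:Y)$ over input distributions recovers the known closed form $C = 1 - h(p)$.

```python
# Illustration (assumed BSC example): capacity found by maximizing
# I(X:Y) = H(Y) - H(Y|X) over the input distribution via a simple grid search,
# compared with the closed form C = 1 - h(p).
import math

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def capacity_bsc(p, steps=10_001):
    best = 0.0
    for i in range(steps):
        a = i / (steps - 1)                  # input distribution: P(X = 1) = a
        py1 = a * (1 - p) + (1 - a) * p      # P(Y = 1)
        # For the BSC, H(Y|X) = h(p) regardless of the input distribution.
        best = max(best, h(py1) - h(p))      # I(X:Y) = H(Y) - H(Y|X)
    return best

p = 0.1
print(f"grid search: {capacity_bsc(p):.4f}   closed form 1 - h(p): {1 - h(p):.4f}")
```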

Shannon's Noisy Channel Coding Theorem: for a memoryless channel with input $X \sim p(x)$, channel $p(y \mid x)$, and output $Y$, the optimal rate of reliable information transmission (the capacity) is
$$C(N) = \max_{p(x)} I(X:Y).$$
(Sketch of proof.)

Supposedly von Neumann asked Shannon to call this quantity entropy, saying: 'You should call it entropy for two reasons: first, the function is already in use in thermodynamics; second, and most importantly, most people don't know what entropy really is, and if you use entropy in an argument, you will win every time.'

Relation between Shannon entropy & thermodynamic entropy. Thermodynamic entropy: $S_{\mathrm{th}} := k\log \Omega$, where $\Omega$ is the total number of microstates.
Suppose the $r$-th microstate occurs with probability $p_r$, and consider $v$ replicas of the system. Then on average $v_r \approx v p_r$ replicas are in the $r$-th state. The total number of microstates of the compound system of $v$ replicas is
$$\Omega_v = \frac{v!}{v_1!\,v_2!\cdots v_r!\cdots},$$
so its thermodynamic entropy is
$$S_v = k\log\Omega_v \approx -k\,v\sum_r p_r\log p_r \quad \text{(by Stirling's approximation)}.$$
Hence the entropy per replica is
$$S = S_v/v \approx -k\sum_r p_r\log p_r = k\,H(\{p_r\}),$$
i.e. (up to the constant $k$) the Shannon entropy.