ELEMENTS OF INFORMATION THEORY


THOMAS M. COVER
JOY A. THOMAS

Preface to the Second Edition xv
Preface to the First Edition xvii
Acknowledgments for the Second Edition xxi
Acknowledgments for the First Edition xxiii

1 Introduction and Preview 1
1.1 Preview of the Book 5

2 Entropy, Relative Entropy, and Mutual Information 13
2.1 Entropy 13
2.2 Joint Entropy and Conditional Entropy 16
2.3 Relative Entropy and Mutual Information 19
2.4 Relationship Between Entropy and Mutual Information 20
2.5 Chain Rules for Entropy, Relative Entropy, and Mutual Information 22
2.6 Jensen's Inequality and Its Consequences 25
2.7 Log Sum Inequality and Its Applications 30
2.8 Data-Processing Inequality 34
2.9 Sufficient Statistics 35
2.10 Fano's Inequality 37
Summary 41
Problems 43
Historical Notes 54

3 Asymptotic Equipartition Property 57
3.1 Asymptotic Equipartition Property Theorem 58
3.2 Consequences of the AEP: Data Compression 60
3.3 High-Probability Sets and the Typical Set 62
Summary 64
Problems 64
Historical Notes 69

4 Entropy Rates of a Stochastic Process 71
4.1 Markov Chains 71
4.2 Entropy Rate 74
4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph 78
4.4 Second Law of Thermodynamics 81
4.5 Functions of Markov Chains 84
Summary 87
Problems 88
Historical Notes 100

5 Data Compression 103
5.1 Examples of Codes 103
5.2 Kraft Inequality 107
5.3 Optimal Codes 110
5.4 Bounds on the Optimal Code Length 112
5.5 Kraft Inequality for Uniquely Decodable Codes 115
5.6 Huffman Codes 118
5.7 Some Comments on Huffman Codes 120
5.8 Optimality of Huffman Codes 123
5.9 Shannon-Fano-Elias Coding 127
5.10 Competitive Optimality of the Shannon Code 130
5.11 Generation of Discrete Distributions from Fair Coins 134
Summary 141
Problems 142
Historical Notes 157

6 Gambling and Data Compression 159
6.1 The Horse Race 159
6.2 Gambling and Side Information 164
6.3 Dependent Horse Races and Entropy Rate 166
6.4 The Entropy of English 168
6.5 Data Compression and Gambling 171
6.6 Gambling Estimate of the Entropy of English 173
Summary 175
Problems 176
Historical Notes 182

7 Channel Capacity 183
7.1 Examples of Channel Capacity 184
7.1.1 Noiseless Binary Channel 184
7.1.2 Noisy Channel with Nonoverlapping Outputs 185
7.1.3 Noisy Typewriter 186
7.1.4 Binary Symmetric Channel 187
7.1.5 Binary Erasure Channel 188
7.2 Symmetric Channels 189
7.3 Properties of Channel Capacity 191
7.4 Preview of the Channel Coding Theorem 191
7.5 Definitions 192
7.6 Jointly Typical Sequences 195
7.7 Channel Coding Theorem 199
7.8 Zero-Error Codes 205
7.9 Fano's Inequality and the Converse to the Coding Theorem 206
7.10 Equality in the Converse to the Channel Coding Theorem 208
7.11 Hamming Codes 210
7.12 Feedback Capacity 216
7.13 Source-Channel Separation Theorem 218
Summary 222
Problems 223
Historical Notes 240

8 Differential Entropy 243
8.1 Definitions 243
8.2 AEP for Continuous Random Variables 245
8.3 Relation of Differential Entropy to Discrete Entropy 247
8.4 Joint and Conditional Differential Entropy 249
8.5 Relative Entropy and Mutual Information 250
8.6 Properties of Differential Entropy, Relative Entropy, and Mutual Information 252
Summary 256
Problems 256
Historical Notes 259

9 Gaussian Channel 261
9.1 Gaussian Channel: Definitions 263
9.2 Converse to the Coding Theorem for Gaussian Channels 268
9.3 Bandlimited Channels 270
9.4 Parallel Gaussian Channels 274
9.5 Channels with Colored Gaussian Noise 277
9.6 Gaussian Channels with Feedback 280
Summary 289
Problems 290
Historical Notes 299

10 Rate Distortion Theory 301
10.1 Quantization 301
10.2 Definitions 303
10.3 Calculation of the Rate Distortion Function 307
10.3.1 Binary Source 307
10.3.2 Gaussian Source 310
10.3.3 Simultaneous Description of Independent Gaussian Random Variables 312
10.4 Converse to the Rate Distortion Theorem 315
10.5 Achievability of the Rate Distortion Function 318
10.6 Strongly Typical Sequences and Rate Distortion 325
10.7 Characterization of the Rate Distortion Function 329

10.8 Computation of Channel Capacity and the Rate Distortion Function 332
Summary 335
Problems 336
Historical Notes 345

11 Information Theory and Statistics 347
11.1 Method of Types 347
11.2 Law of Large Numbers 355
11.3 Universal Source Coding 357
11.4 Large Deviation Theory 360
11.5 Examples of Sanov's Theorem 364
11.6 Conditional Limit Theorem 366
11.7 Hypothesis Testing 375
11.8 Chernoff-Stein Lemma 380
11.9 Chernoff Information 384
11.10 Fisher Information and the Cramér-Rao Inequality 392
Summary 397
Problems 399
Historical Notes 408

12 Maximum Entropy 409
12.1 Maximum Entropy Distributions 409
12.2 Examples 411
12.3 Anomalous Maximum Entropy Problem 413
12.4 Spectrum Estimation 415
12.5 Entropy Rates of a Gaussian Process 416
12.6 Burg's Maximum Entropy Theorem 417
Summary 420
Problems 421
Historical Notes 425

13 Universal Source Coding 427
13.1 Universal Codes and Channel Capacity 428
13.2 Universal Coding for Binary Sequences 433
13.3 Arithmetic Coding 436

13.4 Lempel-Ziv Coding 440
13.4.1 Sliding Window Lempel-Ziv Algorithm 441
13.4.2 Tree-Structured Lempel-Ziv Algorithms 442
13.5 Optimality of Lempel-Ziv Algorithms 443
13.5.1 Sliding Window Lempel-Ziv Algorithms 443
13.5.2 Optimality of Tree-Structured Lempel-Ziv Compression 448
Summary 456
Problems 457
Historical Notes 461

14 Kolmogorov Complexity 463
14.1 Models of Computation 464
14.2 Kolmogorov Complexity: Definitions and Examples 466
14.3 Kolmogorov Complexity and Entropy 473
14.4 Kolmogorov Complexity of Integers 475
14.5 Algorithmically Random and Incompressible Sequences 476
14.6 Universal Probability 480
14.7 The Halting Problem and the Noncomputability of Kolmogorov Complexity 482
14.8 Ω 484
14.9 Universal Gambling 487
14.10 Occam's Razor 488
14.11 Kolmogorov Complexity and Universal Probability 490
14.12 Kolmogorov Sufficient Statistic 496
14.13 Minimum Description Length Principle 500
Summary 501
Problems 503
Historical Notes 507

15 Network Information Theory 509
15.1 Gaussian Multiple-User Channels 513

15.1.1 Single-User Gaussian Channel 513
15.1.2 Gaussian Multiple-Access Channel with m Users 514
15.1.3 Gaussian Broadcast Channel 515
15.1.4 Gaussian Relay Channel 516
15.1.5 Gaussian Interference Channel 518
15.1.6 Gaussian Two-Way Channel 519
15.2 Jointly Typical Sequences 520
15.3 Multiple-Access Channel 524
15.3.1 Achievability of the Capacity Region for the Multiple-Access Channel 530
15.3.2 Comments on the Capacity Region for the Multiple-Access Channel 532
15.3.3 Convexity of the Capacity Region of the Multiple-Access Channel 534
15.3.4 Converse for the Multiple-Access Channel 538
15.3.5 m-User Multiple-Access Channels 543
15.3.6 Gaussian Multiple-Access Channels 544
15.4 Encoding of Correlated Sources 549
15.4.1 Achievability of the Slepian-Wolf Theorem 551
15.4.2 Converse for the Slepian-Wolf Theorem 555
15.4.3 Slepian-Wolf Theorem for Many Sources 556
15.4.4 Interpretation of Slepian-Wolf Coding 557
15.5 Duality Between Slepian-Wolf Encoding and Multiple-Access Channels 558
15.6 Broadcast Channel 560
15.6.1 Definitions for a Broadcast Channel 563
15.6.2 Degraded Broadcast Channels 564
15.6.3 Capacity Region for the Degraded Broadcast Channel 565
15.7 Relay Channel 571
15.8 Source Coding with Side Information 575
15.9 Rate Distortion with Side Information 580

15.10 General Multiterminal Networks 587
Summary 594
Problems 596
Historical Notes 609

16 Information Theory and Portfolio Theory 613
16.1 The Stock Market: Some Definitions 613
16.2 Kuhn-Tucker Characterization of the Log-Optimal Portfolio 617
16.3 Asymptotic Optimality of the Log-Optimal Portfolio 619
16.4 Side Information and the Growth Rate 621
16.5 Investment in Stationary Markets 623
16.6 Competitive Optimality of the Log-Optimal Portfolio 627
16.7 Universal Portfolios 629
16.7.1 Finite-Horizon Universal Portfolios 631
16.7.2 Horizon-Free Universal Portfolios 638
16.8 Shannon-McMillan-Breiman Theorem (General AEP) 644
Summary 650
Problems 652
Historical Notes 655

17 Inequalities in Information Theory 657
17.1 Basic Inequalities of Information Theory 657
17.2 Differential Entropy 660
17.3 Bounds on Entropy and Relative Entropy 663
17.4 Inequalities for Types 665
17.5 Combinatorial Bounds on Entropy 666
17.6 Entropy Rates of Subsets 667
17.7 Entropy and Fisher Information 671
17.8 Entropy Power Inequality and Brunn-Minkowski Inequality 674
17.9 Inequalities for Determinants 679

17.10 Inequalities for Ratios of Determinants 683
Summary 686
Problems 686
Historical Notes 687

Bibliography 689
List of Symbols 723
Index 727