Computing and Communications 2. Information Theory - Entropy


Computing and Communications 2. Information Theory - Entropy. Ying Cui, Department of Electronic Engineering, Shanghai Jiao Tong University, China. 2017, Autumn

Outline: Entropy. Joint entropy and conditional entropy. Relative entropy and mutual information. Relationship between entropy and mutual information. Chain rules for entropy, relative entropy and mutual information. Jensen's inequality and its consequences.

Reference: Elements of Information Theory, T. M. Cover and J. A. Thomas, Wiley.

OVERVIEW

Information Theory. Information theory answers two fundamental questions in communication theory: What is the ultimate data compression? (the entropy H.) What is the ultimate transmission rate of communication? (the channel capacity C.) For this reason, information theory is often considered a subset of communication theory.

Information Theory. Information theory has also made fundamental contributions to many other fields.

A Mathematical Theory of Communication. In 1948, Shannon published "A Mathematical Theory of Communication", founding information theory. Shannon made two major modifications that have had a huge impact on communication design: the source and the channel are modeled probabilistically, and bits became the common currency of communication.

A Mathematical Theory of Communication. Shannon proved the following three theorems. Theorem 1: the minimum compression rate of a source is its entropy rate H. Theorem 2: the maximum reliable rate over a channel is its mutual information I. Theorem 3: end-to-end reliable communication is possible if and only if H < I, i.e., there is no loss in performance from using a digital interface between source and channel coding. Impact of Shannon's results: after almost 70 years, all communication systems are designed based on the principles of information theory; the limits not only serve as benchmarks for evaluating communication schemes, but also provide insight into designing good ones; the basic information-theoretic limits in Shannon's theorems have now been achieved using efficient algorithms and codes.

ENTROPY

Definition. Entropy is a measure of the uncertainty of a r.v. Consider a discrete r.v. X with alphabet 𝒳 and p.m.f. p(x) = Pr[X = x], x ∈ 𝒳. The entropy of X is H(X) = -∑_{x∈𝒳} p(x) log p(x). The log is to the base 2, and entropy is expressed in bits; e.g., the entropy of a fair coin toss is 1 bit. We define 0 log 0 = 0, since x log x → 0 as x → 0, so adding terms of zero probability does not change the entropy.
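As an illustration (not on the original slides), the definition can be checked numerically. A minimal Python sketch; the function name entropy and the example distributions below are my own choices:

    # Entropy H(X) = -sum_x p(x) log2 p(x) of a discrete p.m.f., in bits,
    # with the convention 0 log 0 = 0.
    import math

    def entropy(pmf):
        # pmf is a list of probabilities; terms with p = 0 contribute nothing
        return sum(-p * math.log2(p) for p in pmf if p > 0)

    print(entropy([0.5, 0.5]))   # fair coin toss -> 1.0 bit
    print(entropy([0.25] * 4))   # uniform over 4 outcomes -> 2.0 bits
    print(entropy([1.0, 0.0]))   # deterministic outcome -> 0 bits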

Properties. Entropy is nonnegative: H(X) ≥ 0. The base of the log can be changed: H_b(X) = (log_b a) H_a(X).

Example (binary entropy). Let X be binary with Pr[X = 1] = p. Then H(X) = -p log p - (1-p) log(1-p). H(X) = 1 bit when p = 0.5 (maximum uncertainty); H(X) = 0 bits when p = 0 or 1 (minimum uncertainty); H(X) is a concave function of p.
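A small sketch (my own illustration, not from the slides) tabulating the binary entropy function to show the maximum at p = 0.5 and the zeros at p = 0 and p = 1:

    # Binary entropy function H(p) = -p log2 p - (1 - p) log2 (1 - p), in bits.
    import math

    def binary_entropy(p):
        if p in (0.0, 1.0):
            return 0.0                     # convention 0 log 0 = 0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
        print(f"p = {p:.1f}  H = {binary_entropy(p):.3f} bits")
    # The values rise to 1 bit at p = 0.5 and fall back to 0 at p = 0 and p = 1,
    # consistent with H being a concave function of p.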

Example

JOINT ENTROPY AND CONDITIONAL ENTROPY

Joint Entropy. Joint entropy is a measure of the uncertainty of a pair of r.v.s. Consider a pair of discrete r.v.s (X, Y) with alphabets 𝒳, 𝒴 and p.m.f.s p(x) = Pr[X = x], x ∈ 𝒳, and p(y) = Pr[Y = y], y ∈ 𝒴. The joint entropy is H(X, Y) = -∑_{x,y} p(x, y) log p(x, y).
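As a hedged illustration (not from the slides), the joint entropy of a small, arbitrarily chosen joint p.m.f. can be computed directly from the definition:

    # Joint entropy H(X, Y) = -sum_{x,y} p(x, y) log2 p(x, y).
    import math

    joint = {                     # illustrative p(x, y) for X, Y in {0, 1}
        (0, 0): 0.5,   (0, 1): 0.25,
        (1, 0): 0.125, (1, 1): 0.125,
    }

    H_XY = sum(-p * math.log2(p) for p in joint.values() if p > 0)
    print(H_XY)                   # 1.75 bits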

Conditional Entropy. The conditional entropy of a r.v. Y given another r.v. X is the expected value of the entropies of the conditional distributions, averaged over the conditioning r.v.: H(Y|X) = ∑_x p(x) H(Y | X = x) = -∑_{x,y} p(x, y) log p(y|x).
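Continuing the same illustrative p.m.f. (again my own example, not from the slides), the conditional entropy can be computed as the weighted average of the entropies of the conditional distributions, and the result agrees with the chain rule stated on the next slide:

    # Conditional entropy H(Y|X) = sum_x p(x) H(Y | X = x), and the check
    # H(X, Y) = H(X) + H(Y|X).
    import math

    def H(probs):
        return sum(-p * math.log2(p) for p in probs if p > 0)

    joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}

    H_Y_given_X = sum(
        p_x[x] * H([joint[(x, y)] / p_x[x] for y in (0, 1)])
        for x in (0, 1)
    )

    print(H_Y_given_X)                                          # about 0.94 bits
    print(H(joint.values()) - (H(p_x.values()) + H_Y_given_X))  # ~0: chain rule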

Chain Rule: H(X, Y) = H(X) + H(Y|X).

Chain Rule (continued)

Example

Example (continued)

RELATIVE ENTROPY AND MUTUAL INFORMATION

Relative Entropy. Relative entropy is a measure of the distance between two distributions: D(p||q) = ∑_x p(x) log( p(x) / q(x) ). Conventions: 0 log(0/0) = 0, 0 log(0/q) = 0, and p log(p/0) = ∞; in particular, if there is any x such that p(x) > 0 and q(x) = 0, then D(p||q) = ∞.
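A short hedged sketch (my own example distributions, not from the slides) computing D(p||q) from the definition, including the convention for zero probabilities:

    # Relative entropy D(p || q) = sum_x p(x) log2( p(x) / q(x) ), in bits.
    import math

    def kl_divergence(p, q):
        d = 0.0
        for pi, qi in zip(p, q):
            if pi == 0:
                continue              # 0 log(0/q) = 0
            if qi == 0:
                return math.inf       # p(x) > 0 while q(x) = 0
            d += pi * math.log2(pi / qi)
        return d

    p = [0.5, 0.5]
    q = [0.75, 0.25]
    print(kl_divergence(p, q))   # about 0.208 bits
    print(kl_divergence(q, p))   # about 0.189 bits: D is not symmetric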

Example

Mutual Information. Mutual information is a measure of the amount of information that one r.v. contains about another r.v.: I(X; Y) = ∑_{x,y} p(x, y) log( p(x, y) / (p(x) p(y)) ) = D( p(x, y) || p(x) p(y) ).
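As an illustration (my own example, not from the slides), mutual information can be computed as the relative entropy between the joint p.m.f. and the product of its marginals:

    # Mutual information I(X; Y) = sum_{x,y} p(x,y) log2( p(x,y) / (p(x) p(y)) ).
    import math

    joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
    p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

    I = sum(
        p * math.log2(p / (p_x[x] * p_y[y]))
        for (x, y), p in joint.items() if p > 0
    )
    print(I)   # about 0.016 bits; always >= 0, and equals H(X) + H(Y) - H(X, Y)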

RELATIONSHIP BETWEEN ENTROPY AND MUTUAL INFORMATION

Relation: I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X, Y); in particular, I(X; X) = H(X).

Proof

Illustration

CHAIN RULES FOR ENTROPY, RELATIVE ENTROPY, AND MUTUAL INFORMATION

Chain Rule for Entropy: H(X_1, X_2, ..., X_n) = ∑_{i=1}^{n} H(X_i | X_{i-1}, ..., X_1). (A numerical check follows the proofs below.)

Proof

Alternative Proof
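As a hedged numerical check (my own construction, not from the slides), the chain rule can be verified on a randomly drawn three-variable joint p.m.f. by computing each conditional entropy directly from the conditional distributions:

    # Check H(X1, X2, X3) = H(X1) + H(X2 | X1) + H(X3 | X1, X2) numerically.
    import itertools, math, random

    random.seed(0)
    alphabet = (0, 1)
    w = {k: random.random() for k in itertools.product(alphabet, repeat=3)}
    total = sum(w.values())
    p = {k: v / total for k, v in w.items()}        # random joint p.m.f. p(x1, x2, x3)

    def H(probs):
        return sum(-v * math.log2(v) for v in probs if v > 0)

    def marginal(dist, idx):
        out = {}
        for k, v in dist.items():
            key = tuple(k[i] for i in idx)
            out[key] = out.get(key, 0.0) + v
        return out

    p1, p12 = marginal(p, [0]), marginal(p, [0, 1])

    H2_given_1 = sum(
        p1[(x1,)] * H([p12[(x1, x2)] / p1[(x1,)] for x2 in alphabet])
        for x1 in alphabet
    )
    H3_given_12 = sum(
        p12[(x1, x2)] * H([p[(x1, x2, x3)] / p12[(x1, x2)] for x3 in alphabet])
        for x1 in alphabet for x2 in alphabet
    )

    lhs = H(p.values())
    rhs = H(p1.values()) + H2_given_1 + H3_given_12
    print(abs(lhs - rhs))   # ~0 up to floating-point error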

Chain Rule for Information: I(X_1, X_2, ..., X_n; Y) = ∑_{i=1}^{n} I(X_i; Y | X_{i-1}, ..., X_1).

Proof

Chain Rule for Relative Entropy: D(p(x, y) || q(x, y)) = D(p(x) || q(x)) + D(p(y|x) || q(y|x)).

Proof

JENSEN'S INEQUALITY AND ITS CONSEQUENCES

Convex & Concave Functions. Examples: convex functions: x², |x|, e^x, and x log x (for x ≥ 0); concave functions: log x and √x (for x ≥ 0); linear functions ax + b are both convex and concave.

Convex & Concave Functions

Jensen's Inequality: if f is a convex function and X is a r.v., then E[f(X)] ≥ f(E[X]). (A numerical check follows below.)

Information Inequality: D(p||q) ≥ 0, with equality if and only if p(x) = q(x) for all x.

Proof
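As a hedged sketch (my own examples), both inequalities can be spot-checked numerically: Jensen's inequality for the convex function f(x) = x², and the information inequality on randomly drawn distributions:

    import math, random

    random.seed(1)

    # Jensen's inequality: E[f(X)] >= f(E[X]) for convex f, here f(x) = x**2.
    xs = [1, 2, 5, 10]
    probs = [0.4, 0.3, 0.2, 0.1]
    E_fX = sum(p * x ** 2 for p, x in zip(probs, xs))
    f_EX = sum(p * x for p, x in zip(probs, xs)) ** 2
    print(E_fX >= f_EX)   # True

    # Information inequality: D(p || q) >= 0.
    def kl(p, q):
        return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    def random_pmf(n):
        w = [random.random() for _ in range(n)]
        s = sum(w)
        return [wi / s for wi in w]

    for _ in range(5):
        p, q = random_pmf(4), random_pmf(4)
        assert kl(p, q) >= 0    # nonnegativity follows from Jensen's inequality
    print("D(p || q) >= 0 held in all random trials")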

Nonnegativity of Mutual Information: I(X; Y) ≥ 0, with equality if and only if X and Y are independent.

Maximum Entropy Distribution: the Uniform Distribution. H(X) ≤ log|𝒳|, with equality if and only if X is uniform over 𝒳. (Numerical checks of this and the next two bounds follow below.)

Conditioning Reduces Entropy: H(X|Y) ≤ H(X), with equality if and only if X and Y are independent.

Independence Bound on Entropy: H(X_1, ..., X_n) ≤ ∑_{i=1}^{n} H(X_i), with equality if and only if the X_i are independent.
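A hedged numerical sketch (using my own illustrative joint p.m.f.) checking all three bounds at once:

    # (1) H(X) <= log2 |alphabet|, with equality for the uniform distribution
    # (2) H(X | Y) <= H(X)            (conditioning reduces entropy)
    # (3) H(X, Y) <= H(X) + H(Y)      (independence bound)
    import math

    def H(probs):
        return sum(-p * math.log2(p) for p in probs if p > 0)

    joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
    p_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
    p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in (0, 1)}

    print(H(p_x.values()) <= math.log2(2), H([0.5, 0.5]) == 1.0)   # (1)

    H_X_given_Y = H(joint.values()) - H(p_y.values())   # H(X|Y) = H(X,Y) - H(Y)
    print(H_X_given_Y <= H(p_x.values()))                          # (2)

    print(H(joint.values()) <= H(p_x.values()) + H(p_y.values()))  # (3)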

Summary

Summary (continued)

Summary (continued)

cuiying@sjtu.edu.cn | iwct.sjtu.edu.cn/personal/yingcui