Lecture 22: Final Review

- Nuts and bolts
- Fundamental questions and limits
- Tools
- Practical algorithms
- Future topics

Dr Yao Xie, ECE587, Information Theory, Duke University
Basics
Nuts and bolts

Entropy: H(X) = -\sum_x p(x) \log_2 p(x) (bits); differential entropy: H(X) = -\int f(x) \log f(x) \, dx

Conditional entropy: H(X|Y); joint entropy: H(X, Y)

Mutual information: reduction in uncertainty due to another random variable, I(X; Y) = H(X) - H(X|Y)

Relative entropy: D(p \| q) = \sum_x p(x) \log \frac{p(x)}{q(x)}
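These definitions are mechanical to check numerically. A minimal sketch in Python with NumPy; the joint pmf p_xy is a made-up example:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability entries are skipped."""
    p = np.asarray(p, float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl(p, q):
    """Relative entropy D(p||q) in bits; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = p > 0
    return np.sum(p[m] * np.log2(p[m] / q[m]))

# Made-up joint pmf p(x, y): rows index x, columns index y.
p_xy = np.array([[0.25, 0.25],
                 [0.50, 0.00]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p_xy)
H_X_given_Y = H_XY - H_Y               # chain rule: H(X|Y) = H(X,Y) - H(Y)
I_XY = H_X - H_X_given_Y               # I(X;Y) = H(X) - H(X|Y)
print(H_X, H_XY, H_X_given_Y, I_XY)
print(kl(p_x, [0.75, 0.25]))           # D(p||q) >= 0
```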
For stochastic processes: entropy rate

H(\mathcal{X}) = \lim_{n \to \infty} \frac{H(X_1, \ldots, X_n)}{n} = \lim_{n \to \infty} H(X_n | X_{n-1}, \ldots, X_1)

= -\sum_{ij} \mu_i P_{ij} \log P_{ij} for a first-order stationary Markov chain
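A quick sketch of the Markov-chain entropy rate (NumPy; the two-state transition matrix is a hypothetical example):

```python
import numpy as np

# Transition matrix of a two-state Markov chain (hypothetical values).
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Stationary distribution mu solves mu P = mu: take the eigenvector of P^T
# with eigenvalue 1 and normalize it to sum to 1.
w, v = np.linalg.eig(P.T)
mu = np.real(v[:, np.argmin(np.abs(w - 1))])
mu /= mu.sum()

# Entropy rate: -sum_{i,j} mu_i P_ij log2 P_ij  (bits per symbol).
mask = P > 0
H_rate = -np.sum((mu[:, None] * P)[mask] * np.log2(P[mask]))
print(mu, H_rate)    # mu = [0.8, 0.2], rate ~0.569 bits
```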
[Venn diagram: H(X, Y) is the union of H(X) and H(Y); it decomposes into H(X|Y), I(X; Y), and H(Y|X), with H(X) and H(Y) overlapping in I(X; Y)]
Thou Shalt Know (In)equalities

Chain rules: H(X, Y) = H(X) + H(Y|X); H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^n H(X_i | X_{i-1}, \ldots, X_1); I(X_1, X_2, \ldots, X_n; Y) = \sum_{i=1}^n I(X_i; Y | X_{i-1}, \ldots, X_1)

Jensen's inequality: if f is a convex function, Ef(X) \ge f(EX)

Conditioning reduces entropy: H(X|Y) \le H(X)

H(X) \ge 0 (but differential entropy can be < 0); I(X; Y) \ge 0 (for both discrete and continuous)
Data processing inequality: if X \to Y \to Z, then I(X; Z) \le I(X; Y) and I(X; Z) \le I(Y; Z)

Fano's inequality: P_e \ge \frac{H(X|Y) - 1}{\log |\mathcal{X}|}
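Both inequalities can be sanity-checked on a cascade of two binary symmetric channels; a toy sketch with arbitrarily chosen crossover probabilities:

```python
import numpy as np

def h2(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p*np.log2(p) - (1-p)*np.log2(1-p)

# Markov chain X -> Y -> Z: X ~ Bernoulli(1/2), each arrow a BSC with the
# given crossover probability (arbitrary toy values).
p1, p2 = 0.1, 0.2
p12 = p1*(1 - p2) + (1 - p1)*p2          # effective crossover of the cascade

I_XY = 1 - h2(p1)                        # I(X;Y) for a uniform input to a BSC
I_XZ = 1 - h2(p12)
print(I_XZ <= I_XY)                      # data processing inequality: True

# Fano with the guess Xhat = Z: Pe = p12, |X| = 2, H(X|Z) = h2(p12) by symmetry.
print(p12 >= (h2(p12) - 1) / np.log2(2))  # Fano's bound holds: True
```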
Fundamental Questions and Limits
Fundamental questions

[Block diagram: Source -> Encoder -> Transmitter -> Physical Channel -> Receiver -> Decoder]

- Data compression limit (lossless source coding)
- Data transmission limit (channel capacity)
- Tradeoff between rate and distortion (lossy compression)
[Figure: a codebook of 2^{nR} codewords]
Fundamental limits

Data compression limit: min I(X; \hat{X}); data transmission limit: max I(X; Y)

Lossless compression: I(X; \hat{X}) = H(X) - H(X|\hat{X}); since H(X|\hat{X}) = 0, I(X; \hat{X}) = H(X)

Data compression limit: \sum_x l(x) p(x) \ge H(X)
Instantaneous code (Kraft inequality): \sum_{i=1}^m D^{-l_i} \le 1

Optimal code length: l_i = -\log_D p_i

[Figure: binary code tree with codewords 0, 10, 110, 111]
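A sketch of the Kraft check and the (rounded-up) Shannon code lengths l_i = \lceil \log_D(1/p_i) \rceil for D = 2; the pmf is a made-up example:

```python
import numpy as np

# Made-up source pmf; Shannon code lengths l_i = ceil(log2(1/p_i)) for D = 2.
p = np.array([0.25, 0.25, 0.2, 0.15, 0.15])
lengths = np.ceil(np.log2(1/p)).astype(int)

kraft = np.sum(2.0**(-lengths))      # an instantaneous code must satisfy <= 1
avg_len = np.sum(p * lengths)        # expected codeword length
H = -np.sum(p * np.log2(p))

print(lengths, kraft)                # [2 2 3 3 3], Kraft sum 0.875 <= 1
print(H, avg_len)                    # H <= L < H + 1
```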
Data compression limit: min I(X; \hat{X}); data transmission limit: max I(X; Y)

Channel capacity: given a fixed channel with transition probability p(y|x),

C = \max_{p(x)} I(X; Y)
BSC: C = 1 - H(p); erasure channel: C = 1 - \alpha

Gaussian channel: C = \frac{1}{2} \log(1 + \frac{P}{N}); parallel Gaussian channels: water-filling

[Figure: water-filling power allocation P_1, P_2 over channels 1, 2, 3 with noise levels N_1, N_2, N_3]
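A sketch of these capacity formulas, with water-filling done by bisection on the water level; the noise levels and power budget are made-up values:

```python
import numpy as np

def h2(p):
    return 0.0 if p in (0.0, 1.0) else -p*np.log2(p) - (1-p)*np.log2(1-p)

print("BSC(0.1):", 1 - h2(0.1))          # ~0.531 bits per channel use
print("Gaussian:", 0.5*np.log2(1 + 10))  # SNR P/N = 10

# Water-filling over parallel Gaussian channels with noise levels N_i:
# allocate P_i = max(nu - N_i, 0), choosing nu so that sum(P_i) = P.
N = np.array([1.0, 2.0, 4.0])            # toy noise levels
P_total = 4.0                            # toy power budget
lo, hi = 0.0, N.max() + P_total
for _ in range(100):                     # bisection on the water level nu
    nu = (lo + hi) / 2
    if np.maximum(nu - N, 0).sum() > P_total:
        hi = nu
    else:
        lo = nu
P_i = np.maximum(nu - N, 0)
C = 0.5*np.sum(np.log2(1 + P_i / N))
print(nu, P_i, C)                        # nu = 3.5, P_i = [2.5, 1.5, 0]
```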
!" % & '() "!*" % & '() "!!" $" #" Dr Yao Xie, ECE587, Information Theory, Duke University 14
Data compression limit: min I(X; \hat{X}); data transmission limit: max I(X; Y)

Rate-distortion: given a source with distribution p(x),

R(D) = \min_{p(\hat{x}|x): \sum p(x) p(\hat{x}|x) d(x, \hat{x}) \le D} I(X; \hat{X})

To compute R(D): construct a test channel
Bernoulli source: R(D) = (H(p) - H(D))^+

Gaussian source: R(D) = \left( \frac{1}{2} \log \frac{\sigma^2}{D} \right)^+
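Both rate-distortion functions transcribed directly; the arguments in the example calls are arbitrary values:

```python
import numpy as np

def h2(p):
    return 0.0 if p in (0.0, 1.0) else -p*np.log2(p) - (1-p)*np.log2(1-p)

def rd_bernoulli(p, D):
    """R(D) = (H(p) - H(D))^+ for Bernoulli(p), Hamming distortion,
    valid for D <= min(p, 1-p)."""
    return max(h2(p) - h2(D), 0.0)

def rd_gaussian(sigma2, D):
    """R(D) = (0.5 log2(sigma^2 / D))^+ for a N(0, sigma^2) source,
    squared-error distortion."""
    return max(0.5*np.log2(sigma2 / D), 0.0)

print(rd_bernoulli(0.5, 0.1))   # ~0.531 bits/symbol
print(rd_gaussian(1.0, 0.25))   # 1 bit/symbol
```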
Tools
Tools: from probability

Law of Large Numbers (LLN): if the x_n are independent and identically distributed, \frac{1}{N} \sum_{n=1}^N x_n \to E\{X\} w.p. 1

Variance: \sigma^2 = E\{(X - \mu)^2\} = E\{X^2\} - \mu^2

Central Limit Theorem (CLT): \frac{1}{\sqrt{N\sigma^2}} \sum_{n=1}^N (x_n - \mu) \to N(0, 1)
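A quick simulation of both limit theorems; exponential draws are chosen arbitrarily so that \mu = \sigma^2 = 1:

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials = 1000, 2000
x = rng.exponential(scale=1.0, size=(trials, N))  # iid draws, mu = sigma^2 = 1

print(x.mean())                                   # LLN: sample mean -> mu = 1
z = (x.sum(axis=1) - N) / np.sqrt(N)              # CLT normalization per trial
print(z.mean(), z.std())                          # approx 0 and 1
```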
Tools: AEP

LLN for the product of iid random variables: \left( \prod_{i=1}^n X_i \right)^{1/n} \to e^{E(\log X)}

AEP: \frac{1}{n} \log \frac{1}{p(X_1, X_2, \ldots, X_n)} \to H(X), so p(X_1, X_2, \ldots, X_n) \approx 2^{-nH(X)}

Divide all sequences in \mathcal{X}^n into two sets
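A simulation of the AEP statement: the per-symbol log-likelihood of an iid string concentrates at H(X) as n grows; the pmf is a made-up example:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.7, 0.2, 0.1])            # hypothetical source pmf
H = -np.sum(p * np.log2(p))

for n in (10, 100, 10000):
    x = rng.choice(len(p), size=n, p=p)  # iid draws X_1, ..., X_n
    # -(1/n) log2 p(X_1,...,X_n) = -(1/n) sum_i log2 p(X_i) -> H(X)
    est = -np.mean(np.log2(p[x]))
    print(n, est, H)
```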
Typicality

[Figure: \mathcal{X}^n has |\mathcal{X}|^n elements; the typical set A_\epsilon^{(n)} has about 2^{n(H+\epsilon)} elements; the remainder is the non-typical set]
Joint typicality

[Figure: jointly typical pairs (x^n, y^n) within \mathcal{X}^n \times \mathcal{Y}^n]
Tools: maximum entropy

Discrete: H(X) \le \log |\mathcal{X}|, with equality when X has the uniform distribution

Continuous: H(X) \le \frac{1}{2} \log (2\pi e)^n |K| for EXX^T = K, with equality when X \sim N(0, K)
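A numeric spot-check of the discrete bound: random pmfs never beat the uniform one (Dirichlet sampling is just a convenient way to draw pmfs; the alphabet size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 8
for _ in range(3):
    p = rng.dirichlet(np.ones(k))   # a random pmf on k symbols
    H = -np.sum(p * np.log2(p))
    print(H <= np.log2(k))          # uniform maximizes entropy: True
```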
Practical Algorithms
Practical algorithms: source coding

Huffman, Shannon-Fano-Elias, arithmetic coding

Length  Codeword  X  Probability (through successive merges)
2       01        1  0.25  0.3   0.45  0.55  1
2       10        2  0.25  0.25  0.3   0.45
2       11        3  0.2   0.25  0.25
3       000       4  0.15  0.2
3       001       5  0.15
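A compact Huffman implementation reproducing this table's code lengths (the codeword bits can differ by relabeling 0/1 at each merge, but the lengths and the expected length of 2.3 bits match):

```python
import heapq
from itertools import count

def huffman(probs):
    """Binary Huffman code: repeatedly merge the two least likely nodes.
    probs: {symbol: probability}; returns {symbol: codeword string}."""
    tie = count()                                    # breaks ties between equal probs
    heap = [(p, next(tie), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)              # two least likely nodes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c0.items()} # prepend a bit on each merge
        merged.update({s: "1" + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

probs = {1: 0.25, 2: 0.25, 3: 0.2, 4: 0.15, 5: 0.15}
code = huffman(probs)
print({s: code[s] for s in sorted(code)})            # lengths 2, 2, 2, 3, 3
print(sum(p * len(code[s]) for s, p in probs.items()))  # 2.3 bits/symbol
```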
Practical algorithms: channel coding

[Figure: channel code families, including linear block codes, convolutional codes, Hamming codes, Reed-Solomon, Turbo, LDPC, and Polar codes]
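As one concrete member of this family, a (7,4) Hamming code sketch: it corrects any single bit error, and the generator and parity-check matrices below are one standard systematic choice, not necessarily the one pictured in the original figure:

```python
import numpy as np

# (7,4) Hamming code: 4 data bits -> 7 coded bits, corrects 1 bit error.
G = np.array([[1,0,0,0,1,1,0],   # generator matrix (systematic form)
              [0,1,0,0,1,0,1],
              [0,0,1,0,0,1,1],
              [0,0,0,1,1,1,1]])
H = np.array([[1,1,0,1,1,0,0],   # parity-check matrix: H @ G.T = 0 (mod 2)
              [1,0,1,1,0,1,0],
              [0,1,1,1,0,0,1]])

msg = np.array([1, 0, 1, 1])
code = msg @ G % 2               # encode
rx = code.copy()
rx[5] ^= 1                       # channel flips one bit

syndrome = H @ rx % 2            # nonzero syndrome equals the flipped column of H
err = np.where((H.T == syndrome).all(axis=1))[0]
rx[err] ^= 1                     # correct the located bit
print((rx == code).all())        # True
```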
Practical algorithms: decoding

Viterbi algorithm
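A generic Viterbi sketch: dynamic programming over a trellis, written here for a two-state hidden Markov model rather than a specific convolutional code; the same recursion (best score per state, plus backpointers) underlies convolutional decoding. All matrices are hypothetical:

```python
import numpy as np

def viterbi(log_trans, log_emit, log_init, obs):
    """Most likely state sequence by trellis dynamic programming.
    log_trans[i, j] = log P(state j | state i);
    log_emit[j, o]  = log P(observation o | state j)."""
    n_states, T = log_trans.shape[0], len(obs)
    score = log_init + log_emit[:, obs[0]]       # best log-prob ending in each state
    back = np.zeros((T, n_states), dtype=int)    # backpointers
    for t in range(1, T):
        cand = score[:, None] + log_trans        # cand[i, j]: come from i, go to j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_states)] + log_emit[:, obs[t]]
    path = [int(np.argmax(score))]               # trace back the best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]

lt = np.log(np.array([[0.9, 0.1], [0.3, 0.7]]))  # toy transition probabilities
le = np.log(np.array([[0.8, 0.2], [0.1, 0.9]]))  # toy emission probabilities
li = np.log(np.array([0.5, 0.5]))
print(viterbi(lt, le, li, [0, 0, 1, 1, 1]))      # -> [0, 0, 1, 1, 1]
```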
What are some future topics?
Distributed lossless coding (Slepian-Wolf)

Consider two iid sources (U, V) ~ p(u, v)

[Figure: U^n -> Encoder 1 -> I; V^n -> Encoder 2 -> J; Decoder outputs (\hat{U}^n, \hat{V}^n)]

Achievable rate region: R_1 \ge H(U|V), R_2 \ge H(V|U), R_1 + R_2 \ge H(U, V)
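A sketch evaluating the corner points of this region; the joint pmf is a made-up example:

```python
import numpy as np

def H(p):
    """Entropy in bits of a pmf given as an array of any shape."""
    p = np.asarray(p, float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Made-up joint pmf of correlated sources (U, V): rows index u, columns index v.
p_uv = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
H_UV = H(p_uv)
H_U_given_V = H_UV - H(p_uv.sum(axis=0))  # H(U|V) = H(U,V) - H(V)
H_V_given_U = H_UV - H(p_uv.sum(axis=1))  # H(V|U) = H(U,V) - H(U)

# Corner point: send V at full rate H(V), then U at conditional rate H(U|V),
# for a sum rate of H(U,V) -- less than H(U) + H(V) from coding separately.
print(H_U_given_V, H_V_given_U, H_UV,
      H(p_uv.sum(axis=0)) + H(p_uv.sum(axis=1)))
```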
Multi-user information theory
Multiple access channel

[Figure: W_1 -> Encoder 1 -> X_1^n; W_2 -> Encoder 2 -> X_2^n; channel p(y|x_1, x_2) -> Y^n -> Decoder -> (\hat{W}_1, \hat{W}_2); rate region pentagon with corners at I(X_1; Y), I(X_1; Y|X_2), I(X_2; Y), I(X_2; Y|X_1)]

[1,2]: the capacity region of the DM-MAC is the closure of the convex hull of all (R_1, R_2) with R_1 \le I(X_1; Y|X_2), R_2 \le I(X_2; Y|X_1), and R_1 + R_2 \le I(X_1, X_2; Y) for some input distribution p(x_1) p(x_2)
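A sketch evaluating this pentagon for the classic binary adder MAC Y = X_1 + X_2 with independent uniform inputs (a standard textbook example, not from these slides):

```python
import numpy as np

def H(p):
    p = np.asarray(p, float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Binary adder MAC: Y = X1 + X2 in {0, 1, 2}, X1, X2 iid Bernoulli(1/2).
p_y = np.array([0.25, 0.5, 0.25])

# Y is a deterministic function of (X1, X2), so H(Y | X1, X2) = 0 and
# I(X1, X2; Y) = H(Y); given X2, Y determines X1, so I(X1; Y | X2) = H(X1) = 1.
R_single = 1.0
R_sum = H(p_y)
print(R_single, R_sum)   # pentagon: R1 <= 1, R2 <= 1, R1 + R2 <= 1.5
```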
Statistics

- Large deviation theory
- Stein's lemma
One last thing

"Make things as simple as possible, but not simpler." (A. Einstein)