One Lesson of Information Theory
Prof. Dr.-Ing. Volker Kühn
Institute of Communications Engineering, University of Rostock, Germany
Email: volker.kuehn@uni-rostock.de
http://www.int.uni-rostock.de/
September 2010
Universität Rostock, Fakultät Informatik und Elektrotechnik
Outline of Lectures
Lesson 1: One Lesson of Information Theory
- Principal structure of communication systems
- Definitions of entropy and mutual information
- Channel coding theorem of Shannon
Lesson 2: Introduction to Error Correcting Codes
- Basics of error correcting codes
- Linear block codes
- Convolutional codes (if time permits)
Lesson 3: State-of-the-art channel coding
- Coding strategies to approach the capacity limits
- Definition of soft information and the turbo decoding principle
- Examples of state-of-the-art error correcting codes
Literature
- Lin/Costello: Error Control Coding: Fundamentals and Applications
- Bossert: Channel Coding
- Johannesson/Zigangirov: Fundamentals of Convolutional Codes
- Richardson/Urbanke: Modern Coding Theory
- Neubauer/Freudenberger/Kühn: Coding Theory: Algorithms, Architectures, and Applications
- Johannesson: Information Theory
- Cover/Thomas: Elements of Information Theory
Principal Structure of a Digital Communication System
- The analog source generates an analog signal (e.g. voice, video).
- Source coding samples, quantizes and compresses the analog signal.
- Digital source: comprises the analog source and the source coding; delivers the digital data vector u of length k.
Principal Structure of a Digital Communication System
- The channel encoder adds redundancy to u, resulting in the code word x of length n.
- The channel encoder may consist of several constituent codes.
- Code rate: R_c = k/n.
Principal Structure of a Digital Communication System
- The modulator maps the discrete vector x onto an analog waveform and shifts it into the transmission band.
- The physical channel represents the transmission medium:
  - multipath propagation causes intersymbol interference (ISI)
  - time-varying fading, i.e. deep fades
  - additive noise
- The demodulator shifts the signal back into the baseband and performs lowpass filtering, sampling and quantization.
- Time-discrete channel: comprises the analog part of the modulator, the physical channel and the analog part of the demodulator.
Principal Structure of a Digital Communication System
- Channel decoder: estimates u on the basis of the received vector y.
- y need not consist of hard-quantized values {0, 1}.
- Since the encoder may consist of several parts, the decoder may also consist of several modules.
Principal Structure of a Digital Communication System
- Quotation from Massey: "The purpose of the modulation system is to create a good discrete channel from the modulator input to the demodulator output, and the purpose of the coding system is to transmit the information bits reliably through this discrete channel at the highest practicable rate."
Time-Discrete Channel
- The time-discrete channel comprises the analog parts of the modulator and demodulator as well as the physical transmission medium.
- Discrete input alphabet: x_i ∈ X = {X_0, ..., X_{|X|-1}}.
- Discrete or continuous output alphabet: y_i ∈ Y = {Y_0, ..., Y_{|Y|-1}} or Y = R.
- Probabilities and probability densities:
  - a priori and output probabilities: Pr{X_ν}, Pr{Y_μ} (density p(y) for continuous output)
  - joint probability of an event: Pr{X_ν, Y_μ} or p(x = X_ν, y)
  - conditional (transition) probabilities: Pr{Y_μ | X_ν} or p(y | x = X_ν)
  - a posteriori probabilities: Pr{X_ν | Y_μ} or Pr{X_ν | y}
AWGN: Additive White Gaussian Noise
- Channel model: y_i = x_i + n_i with Gaussian noise n_i of variance σ_N^2.
- Conditional pdf:
  p(y | x = X_ν) = 1/sqrt(2π σ_N^2) · exp(−(y − X_ν)^2 / (2 σ_N^2))
- [Figure: conditional densities p(y | x = −1) and p(y | x = +1) together with the mixture density p(y), for signal-to-noise ratios E_s/N_0 of 2 dB and 6 dB; the densities overlap less at the higher SNR.]
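As a small numerical sketch of the conditional pdf above (not part of the original slides; the function name `awgn_likelihood` is mine):

```python
import math

def awgn_likelihood(y: float, x: float, sigma2: float) -> float:
    """Conditional pdf p(y | x) of the AWGN channel with noise variance sigma_N^2."""
    return math.exp(-(y - x) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

# BPSK symbols +/-1: the pdf is largest when y matches the transmitted symbol.
print(awgn_likelihood(1.0, 1.0, 0.5))   # peak value 1/sqrt(2*pi*0.5) = 1/sqrt(pi)
print(awgn_likelihood(1.0, -1.0, 0.5))  # much smaller: y = +1 is 4 sigma away from x = -1
```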
Error Probability of AWGN Channel and BPSK
- BPSK symbols X_0 = −1 and X_1 = +1 with the decision threshold at y = 0 (decision regions Y_0 and Y_1).
- The symbol error probability follows from the complementary error function:
  P_s = 1/sqrt(π) · ∫ from sqrt(E_s/N_0) to ∞ of e^(−ξ^2) dξ = 1/2 · erfc(sqrt(E_s/N_0))
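The closed form above is easy to evaluate; a minimal Python sketch (function name `bpsk_error_probability` is mine, `math.erfc` is the standard-library complementary error function):

```python
import math

def bpsk_error_probability(es_n0_db: float) -> float:
    """Symbol error probability of BPSK over AWGN: P_s = 1/2 * erfc(sqrt(Es/N0))."""
    es_n0 = 10.0 ** (es_n0_db / 10.0)  # dB -> linear
    return 0.5 * math.erfc(math.sqrt(es_n0))

# The error probability drops sharply with increasing SNR:
for snr_db in (0, 4, 8):
    print(f"Es/N0 = {snr_db} dB: Ps = {bpsk_error_probability(snr_db):.2e}")
```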
Transition to Discrete Channels
- Discrete channels arise from quantization of the continuous channel output.
- We consider binary antipodal transmission: X = {X_0, X_1} = {+1, −1}.
- The channel output is generally continuously distributed: Y = R.
- L-bit quantization, due to the finite precision of digital circuits, delivers the alphabet Y = {Y_0, ..., Y_{2^L − 1}}.
- L = 1 (hard decision): Y = {Y_0, Y_1} = {+1, −1} = X
- L = 2: four output symbols Y = {Y_0, Y_1, Y_2, Y_3}
- L = 3: eight output symbols Y = {Y_0, Y_1, ..., Y_7}
Discrete Channels (1)
- Binary Symmetric Channel (BSC) from hard decision (L = 1): Y = {Y_0, Y_1} = {+1, −1}.
- Transitions: correct decision Pr{Y_ν | X_ν} = 1 − P_e, error Pr{Y_μ | X_ν} = P_e for μ ≠ ν, with
  P_e = 1/2 · erfc(sqrt(E_s/N_0))
Discrete Channels (2)
- Binary Symmetric Erasure Channel (BSEC): thresholds at −a and +a split the output into Y_0, Y_1 and the erasure symbol Y_2.
- Transitions: correct decision with probability 1 − P_e − P_q, erasure with P_q, error with P_e.
Discrete Channels (3)
- 2-bit quantization: thresholds at −a, 0 and +a partition the output into four symbols Y_0, Y_1, Y_2, Y_3; each input symbol X_0, X_1 can be mapped onto every output symbol.
Information, Entropy
- The amount of information of an event should depend on its probability: I(X_ν) = f(Pr{X_ν}).
- For independent events, Pr{X_ν, Y_μ} = Pr{X_ν} · Pr{Y_μ} should lead to I(X_ν, Y_μ) = I(X_ν) + I(Y_μ).
- The logarithm is the sole function that maps a product onto a sum:
  I(X_ν) = −log2(Pr{X_ν}) = log2(1 / Pr{X_ν}) ≥ 0
- Entropy: H(X) = −Σ_ν Pr{X_ν} log2(Pr{X_ν}) = E{−log2(Pr{X})}
- Entropy is a measure of uncertainty.
Examples for Entropy
- Set of events X = {X_1, X_2, X_3, X_4, X_5}, each occurring with a certain probability:
  Pr{X_1} = 0.30 → I(X_1) = 1.7370 bit
  Pr{X_2} = 0.20 → I(X_2) = 2.3219 bit
  Pr{X_3} = 0.20 → I(X_3) = 2.3219 bit
  Pr{X_4} = 0.15 → I(X_4) = 2.7370 bit
  Pr{X_5} = 0.15 → I(X_5) = 2.7370 bit
- H(X) = −Σ_ν Pr{X_ν} log2(Pr{X_ν}) = 2.271 bit
- The entropy of a set with M elements is maximized when all elements are equally likely:
  max H(X) = H_equal(X) = Σ_{ν=0}^{M−1} (1/M) log2(M) = log2(M) bit = 2.32 bit for M = 5
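The numbers on this slide can be reproduced directly (a Python sketch, not from the slides; the helper `entropy` is mine):

```python
import math

def entropy(probs):
    """H(X) = -sum p * log2(p), skipping zero-probability events."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.30, 0.20, 0.20, 0.15, 0.15]
print(f"H(X)  = {entropy(probs):.3f} bit")          # 2.271 bit, as on the slide
print(f"max H = {math.log2(len(probs)):.2f} bit")   # uniform case: log2(5) = 2.32 bit
```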
Example: LCD for 10 Digits
- Seven segments a–g represent the digits 0–9:

  digit: 0 1 2 3 4 5 6 7 8 9
  a:     1 0 0 0 1 1 1 0 1 1
  b:     1 0 1 0 0 0 1 0 1 0
  c:     1 0 1 1 0 1 1 1 1 1
  d:     0 0 1 1 1 1 1 0 1 1
  e:     1 0 1 1 0 1 1 0 1 1
  f:     1 1 1 1 1 0 0 1 1 1
  g:     1 1 0 1 1 1 1 1 1 1

- All digits occur with the same probability: Pr{X_ν} = 0.1.
- Amount of information per digit: I(X_ν) = −log2(Pr{X_ν}) = log2(10) = 3.32 bit.
- Entropy of the alphabet: H(X) = Σ_ν Pr{X_ν} I(X_ν) = 3.32 bit.
- Absolute redundancy: R = m − H(X) = 7 bit − 3.32 bit = 3.68 bit.
- Relative redundancy: r = R/m = 3.68 bit / 7 bit = 52.54 %.
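The redundancy arithmetic of the LCD example in a few lines (a sketch, not from the slides; the variable names are mine):

```python
import math

# Seven segment bits per displayed digit, ten equally likely digits:
m = 7                      # bits actually spent per digit
h = math.log2(10)          # entropy of the digit alphabet, ~3.32 bit
r_abs = m - h              # absolute redundancy R = m - H(X)
r_rel = r_abs / m          # relative redundancy r = R/m
print(f"R = {r_abs:.2f} bit, r = {100 * r_rel:.2f} %")  # R = 3.68 bit, r = 52.54 %
```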
Binary Entropy Function
- Set of events X = {X_1, X_2} with Pr{X_1} = P_1 and Pr{X_2} = 1 − P_1:
  H(X) = H_2(P_1) = −P_1 log2(P_1) − (1 − P_1) log2(1 − P_1)
- [Figure: H_2(P_1) over P_1 ∈ [0, 1]; the maximum of 1 bit is reached at P_1 = 0.5.]
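A direct implementation of H_2 (a sketch, not from the slides; the name `h2` is mine), including the convention 0 · log2(0) = 0 at the endpoints:

```python
import math

def h2(p: float) -> float:
    """Binary entropy function H2(p) in bit; H2(0) = H2(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(h2(0.5))   # maximum: 1.0 bit
print(h2(0.11))  # about 0.5 bit
print(h2(0.0))   # certain event carries no uncertainty: 0.0 bit
```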
Illumination of Entropies
- H(X), H(Y): entropies of the source and sink alphabets
- H(X, Y): joint entropy of source and sink
- H(X|Y): equivocation, information lost during transmission
- H(Y|X): irrelevance, information not originating from the source
- H(X; Y): mutual information, information correctly received at the sink
Joint Entropy, Equivocation, Irrelevance
- Joint information: I(X_ν, Y_μ) = −log2(Pr{X_ν, Y_μ})
- Joint entropy of source and sink:
  H(X, Y) = −Σ_ν Σ_μ Pr{X_ν, Y_μ} log2(Pr{X_ν, Y_μ}) = E{−log2(Pr{X_ν, Y_μ})}
- Equivocation (information lost during transmission):
  H(X|Y) = H(X, Y) − H(Y) = −Σ_ν Σ_μ Pr{X_ν, Y_μ} log2(Pr{X_ν | Y_μ}) = E{−log2(Pr{X_ν | Y_μ})}
- Irrelevance:
  H(Y|X) = H(X, Y) − H(X) = −Σ_ν Σ_μ Pr{X_ν, Y_μ} log2(Pr{Y_μ | X_ν}) = E{−log2(Pr{Y_μ | X_ν})}
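The chain rules H(X|Y) = H(X,Y) − H(Y) and H(Y|X) = H(X,Y) − H(X) can be verified numerically for a BSC with P_e = 0.1 and uniform input (a sketch, not from the slides; all names are mine):

```python
import math

def joint_entropy(joint):
    """H(X,Y) = -sum p(x,y) * log2 p(x,y) over a joint pmf matrix."""
    return -sum(p * math.log2(p) for row in joint for p in row if p > 0)

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# BSC with Pe = 0.1 and uniform input: p(x, y) = p(x) * p(y|x)
joint = [[0.45, 0.05],
         [0.05, 0.45]]
h_xy = joint_entropy(joint)
h_x = entropy([sum(row) for row in joint])          # marginal of X
h_y = entropy([sum(col) for col in zip(*joint)])    # marginal of Y
print(h_xy - h_y)  # equivocation H(X|Y) = H2(0.1), about 0.469 bit
print(h_xy - h_x)  # irrelevance H(Y|X): equal here, since the channel is symmetric
```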
Mutual Information
- Definition of mutual information:
  H(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X) = H(X) + H(Y) − H(X, Y)
          = Σ_ν Σ_μ Pr{X_ν} Pr{Y_μ | X_ν} log2( Pr{Y_μ | X_ν} / Pr{Y_μ} )
          = E{ log2( Pr{Y_μ | X_ν} / Pr{Y_μ} ) }
          = E{ log2( Pr{X_ν, Y_μ} / (Pr{X_ν} Pr{Y_μ}) ) }
- Mutual information is the amount of information common to X and Y.
- Mutual information is the reduction of uncertainty in X due to the knowledge of Y.
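The last expectation form translates directly into code: sum p(x,y) · log2(p(x,y) / (p(x) p(y))) over the joint pmf (a sketch, not from the slides; the function name is mine):

```python
import math

def mutual_information(joint):
    """I(X;Y) = sum p(x,y) * log2( p(x,y) / (p(x) * p(y)) ) over a joint pmf matrix."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

# BSC with Pe = 0.1 and uniform input: p(x, y) = p(x) * p(y|x)
pe = 0.1
joint = [[0.5 * (1 - pe), 0.5 * pe],
         [0.5 * pe, 0.5 * (1 - pe)]]
print(f"I(X;Y) = {mutual_information(joint):.3f} bit")  # 1 - H2(0.1), about 0.531 bit
```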
Illustration of Channel Capacity
- [Figure: H(X) splits into the equivocation H(X|Y) and the mutual information H(X; Y); H(Y) consists of H(X; Y) and the irrelevance H(Y|X).]
- Maximizing the mutual information with respect to the source statistics delivers the channel capacity:
  C = sup over Pr{X} of Σ_ν Σ_μ Pr{Y_μ | X_ν} Pr{X_ν} log2( Pr{Y_μ | X_ν} / Pr{Y_μ} )
Channel Coding Theorem of Shannon
- Shannon, 1948: "A Mathematical Theory of Communication"
- If a channel has the capacity C, there exists a code with rate R_c ≤ C for which the probability of a decoding error can be made arbitrarily small.
- Converse theorem: if a channel has the capacity C, reliable (error-free) communication cannot be achieved with codes of rate R_c > C.
- The theorems are not constructive, i.e. they do not provide a construction guideline for powerful codes.
Capacity of Binary Channels
- Statistics of the error-free channel:
  Pr{X_ν} = P_0 for ν = 0 and 1 − P_0 for ν = 1
  Pr{Y_μ | X_ν} = 1 for μ = ν and 0 for μ ≠ ν
  Pr{Y_μ} = P_0 for μ = 0 and 1 − P_0 for μ = 1
- Mutual information:
  H(X; Y) = P_0 log2(1/P_0) + (1 − P_0) log2(1/(1 − P_0)) = H_2(P_0) = H(X)
  (hint: 0 · log2(0) = 0)
- Perfect transmission without any errors: the full source entropy reaches the sink.
Capacity of Binary Channels
- Statistics of the inverting channel:
  Pr{X_ν} = P_0 for ν = 0 and 1 − P_0 for ν = 1
  Pr{Y_μ | X_ν} = 0 for μ = ν and 1 for μ ≠ ν
  Pr{Y_μ} = 1 − P_0 for μ = 0 and P_0 for μ = 1
- Mutual information:
  H(X; Y) = P_0 log2(1/P_0) + (1 − P_0) log2(1/(1 − P_0)) = H_2(P_0) = H(X)
  (hint: 0 · log2(0) = 0)
- Again perfect transmission without any errors: the deterministic inversion can be undone, so no information is lost.
Capacity of Binary Erasure Channel
- BEC: correct reception with probability 1 − P_e, erasure (output Y_2) with probability P_e, no errors.
- Input statistics: Pr{X_0} = P_0, Pr{X_1} = 1 − P_0.
- Output statistics:
  Pr{Y_0} = P_0 (1 − P_e), Pr{Y_1} = (1 − P_0)(1 − P_e), Pr{Y_2} = P_e (P_0 + 1 − P_0) = P_e
- Mutual information of the BEC:
  I(X; Y) = (1 − P_e) P_0 log2( (1 − P_e) / (P_0 (1 − P_e)) ) + P_e P_0 log2(P_e / P_e)
          + (1 − P_e)(1 − P_0) log2( (1 − P_e) / ((1 − P_0)(1 − P_e)) ) + P_e (1 − P_0) log2(P_e / P_e)
          = (1 − P_e) · H_2(P_0)
Capacity of Binary Erasure Channel
- [Figure: mutual information of the BEC over the erasure probability P_e for Pr{X_0} = 0.1, 0.2, 0.3, 0.4, 0.5; the uniform distribution Pr{X_0} = 0.5 maximizes it for every P_e.]
- Capacity of the BEC for uniform input distribution: C_BEC = 1 − P_e.
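A brute-force check that the uniform input really maximizes I(X;Y) = (1 − P_e) H_2(P_0) (a sketch, not from the slides; all names are mine):

```python
import math

def h2(p):
    """Binary entropy function in bit."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bec_mutual_information(p0: float, pe: float) -> float:
    """Mutual information of the binary erasure channel: (1 - Pe) * H2(P0)."""
    return (1 - pe) * h2(p0)

pe = 0.2
# Scan input distributions P0 = 0.01 ... 0.99; the maximum sits at P0 = 0.5:
best_p0 = max((p / 100 for p in range(1, 100)),
              key=lambda p0: bec_mutual_information(p0, pe))
print(best_p0)                          # 0.5
print(bec_mutual_information(0.5, pe))  # capacity C_BEC = 1 - Pe = 0.8
```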
Capacity of Binary Symmetric Channel
- Statistics of the BSC for uniform input distribution:
  Pr{X_0} = Pr{X_1} = 1/2, hence Pr{Y_0} = Pr{Y_1} = 1/2
  Pr{Y_μ | X_ν} = 1 − P_e for μ = ν and P_e for μ ≠ ν
- Mutual information of the BSC:
  C_BSC = 2 (1 − P_e) (1/2) log2(2 (1 − P_e)) + 2 P_e (1/2) log2(2 P_e)
        = (1 − P_e)(1 + log2(1 − P_e)) + P_e (1 + log2(P_e))
        = 1 + (1 − P_e) log2(1 − P_e) + P_e log2(P_e)
        = 1 − H_2(P_e)
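The closed form C_BSC = 1 − H_2(P_e) as a quick sketch (not from the slides; the names `h2` and `bsc_capacity` are mine):

```python
import math

def h2(p):
    """Binary entropy function in bit."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(pe: float) -> float:
    """C_BSC = 1 - H2(Pe), achieved by the uniform input distribution."""
    return 1.0 - h2(pe)

print(bsc_capacity(0.0))   # perfect channel: 1 bit per channel use
print(bsc_capacity(0.5))   # useless channel: 0 bit
print(bsc_capacity(0.11))  # about 0.5 bit
```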
Capacity of Binary Symmetric Channel
- [Figure: mutual information C(P_e) of the BSC over the error probability P_e for Pr{X_0} = 0.1, 0.3 and 0.5; the uniform input distribution maximizes it for every P_e.]
- Capacity of the BSC for uniform input distribution:
  C_BSC = 1 + P_e log2(P_e) + (1 − P_e) log2(1 − P_e) = 1 − H_2(P_e)
Binary Symmetric Erasure Channel (BSEC)
- The quantization parameter a has to be optimized with respect to the channel capacity C; the optimal choice depends on the signal-to-noise ratio E_s/N_0.
- Capacity for uniform input distribution:
  C_BSEC = 1 − P_q + P_e log2(P_e) + (1 − P_e − P_q) log2(1 − P_e − P_q) − (1 − P_q) log2(1 − P_q)
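The BSEC capacity formula contains both earlier channels as special cases, which makes a good sanity check (a sketch, not from the slides; the function name is mine):

```python
import math

def bsec_capacity(pe: float, pq: float) -> float:
    """C_BSEC for error probability Pe and erasure probability Pq (uniform input)."""
    def xlog2(p):
        # convention 0 * log2(0) = 0
        return p * math.log2(p) if p > 0 else 0.0
    return 1 - pq + xlog2(pe) + xlog2(1 - pe - pq) - xlog2(1 - pq)

print(bsec_capacity(0.1, 0.0))  # Pq = 0 reduces to the BSC: 1 - H2(0.1), about 0.531
print(bsec_capacity(0.0, 0.2))  # Pe = 0 reduces to the BEC: 1 - Pq = 0.8
```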
Channel Capacity for BSC and BSEC
- [Figure: capacities C_BSC and C_BSEC (with optimized threshold a) over E_s/N_0 in dB, together with the optimum threshold a_opt over E_s/N_0.]
- a > 1 leads only to a minor improvement of the channel capacity.
Capacity of AWGN Channel
- Model: y = x + n with n ~ N(0, σ_N^2) and x ~ N(0, σ_X^2), so y ~ N(0, σ_Y^2) with σ_Y^2 = σ_X^2 + σ_N^2.
- Differential entropy of a Gaussian random process:
  h(X) = −∫ p_X(ξ) log2(p_X(ξ)) dξ = 1/2 · log2(2πe σ_X^2)
- Capacity of the AWGN channel:
  C = h(Y) − h(Y|X) = h(Y) − h(N)
    = 1/2 · log2(2πe (σ_X^2 + σ_N^2)) − 1/2 · log2(2πe σ_N^2)
    = 1/2 · log2(1 + σ_X^2 / σ_N^2)
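The resulting log-formula is one line of code (a sketch, not from the slides; the name `awgn_capacity` is mine; `snr` denotes σ_X^2/σ_N^2):

```python
import math

def awgn_capacity(snr: float) -> float:
    """C = 1/2 * log2(1 + SNR) bit per real-valued channel use."""
    return 0.5 * math.log2(1.0 + snr)

print(awgn_capacity(1.0))   # SNR of 0 dB: 0.5 bit per use
print(awgn_capacity(15.0))  # 2 bit per use
# Doubling the capacity requires far more than double the SNR (logarithmic growth).
```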
Channel Capacity of BPSK and AWGN
- Influence of quantization:
- [Figure: capacity C over E_s/N_0 in dB for BPSK with output quantization of q = 1, 2, 3 bits and without quantization, compared with the Gaussian-input AWGN capacity.]
Ultimate Communication Limit
- Energy per information bit: E_b = E_s / C, i.e. E_s = C · E_b.
- Capacity of the one-dimensional AWGN channel:
  C = 1/2 · log2(1 + 2 E_s/N_0) = 1/2 · log2(1 + 2C · E_b/N_0)
- Minimum signal-to-noise ratio per information bit:
  E_b/N_0 = (2^(2C) − 1) / (2C) → ln(2) for C → 0, i.e. −1.59 dB
- [Figure: C over E_b/N_0 in dB; the curve reaches C = 0 at −1.59 dB.]
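The approach to the −1.59 dB limit can be traced numerically (a sketch, not from the slides; the function name is mine):

```python
import math

def min_eb_n0_db(c: float) -> float:
    """Minimum Eb/N0 in dB for rate C on the real AWGN channel: (2^(2C) - 1) / (2C)."""
    eb_n0 = (2 ** (2 * c) - 1) / (2 * c)
    return 10 * math.log10(eb_n0)

# As C -> 0, the minimum Eb/N0 approaches 10*log10(ln 2), about -1.59 dB:
for c in (1.0, 0.5, 0.01):
    print(f"C = {c}: Eb/N0 >= {min_eb_n0_db(c):.2f} dB")
print(10 * math.log10(math.log(2)))  # ultimate limit, about -1.59 dB
```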
Thanks for your attention!