An Introduction to Algorithmic Coding Theory
M. Amin Shokrollahi
Bell Laboratories
Part 1: Codes
A Puzzle
What do the following problems have in common?
Problem 1: Information Transmission
(figure: a MESSAGE is sent through a noisy channel as a jumble of letters and arrives as MASSAGE??)
Problem 2: Football Pool Problem
Bayern München : 1. FC Kaiserslautern
Homburg : Hannover 96
1860 München : FC Karlsruhe
1. FC Köln : Hertha BSC
Wolfsburg : Stuttgart
Bremen : Unterhaching
Frankfurt : Rostock
Bielefeld : Duisburg
Freiburg : Schalke 04
(figure: a pool ticket with a tip of 0, 1, or 2 for each match)
Problem 3: In-Memory Database Systems
(figure: user memory with malloc'ed regions alongside code memory; the database is corrupted!)
Problem 4: Bulk Data Distribution
What do they have in common?
They all can be solved using (algorithmic) coding theory.
Basic Idea of Coding
Add redundancy to be able to correct errors!
Objectives: add as little redundancy as possible; correct as many errors as possible.
(figure: a MESSAGE is encoded, corrupted to MASSAGE in transit, and corrected back to MESSAGE)
Codes
A code of block-length n over the alphabet GF(q) is a set of vectors in GF(q)^n. If the set forms a vector space over GF(q), then the code is called linear: an [n,k]_q-code, where k is its dimension.
The Encoding Problem
Map source words to codewords. Efficiency!
Linear Codes: Generator Matrix
A k x n generator matrix maps a source word of length k to a codeword of length n. Encoding costs O(n^2) operations.
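Generator-matrix encoding is just a vector-matrix product over GF(2); a minimal sketch, with an illustrative [7,4] matrix (this particular G is a hypothetical choice, not one from the slides):

```python
import numpy as np

# Illustrative 4 x 7 generator matrix over GF(2) in systematic form [I | P].
G = np.array([
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
], dtype=int)

def encode(source, G):
    """Encode a length-k source vector into a length-n codeword: c = s G mod 2."""
    return (np.array(source) @ G) % 2

codeword = encode([1, 0, 1, 1], G)  # one codeword of the resulting [7,4] code
```

Since the code is linear, the sum (mod 2) of any two codewords is again a codeword.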
Linear Codes: Parity Check Matrix
An (n-k) x n parity check matrix H satisfies Hc^T = 0 exactly for the codewords c. Encoding costs O(n^2) after O(n^3) preprocessing.
The Decoding Problem: Maximum Likelihood Decoding
(figure: the received word (a, x, c, d) is compared against each codeword; the number of disagreements is counted, and a closest codeword is chosen)
Hamming Distance
The Hamming distance between two vectors of equal dimension is the number of positions at which they differ. An [n,k,d]_q-code is an [n,k]_q-code of minimum distance d.
Maximum Likelihood Decoding
Given the received word, find a codeword that has least Hamming distance to it. Intractable in general.
Worst Case Error Correction
An [n,k,d]_q-code is capable of correcting up to e = (d-1)/2 (rounded down) errors in arbitrary locations: Hamming balls of radius e around distinct codewords are disjoint.
Errors in Known Locations: Erasures
An [n,k,d]_q-code is capable of correcting up to d-1 errors in arbitrary locations if the locations are known.
Projective plane over GF(2)
(figure: the Fano plane with points 1-7; the associated code has minimum distance 4)
Hamming Code
(figure: construction on bits 1-7)
A [7,4,3]_2-code.
A Solution to the Football Match Problem
The [4,2,3]_3-Hamming code, e.g. with generator matrix
(1 0 1 1)
(0 1 1 2)
over GF(3) = {0,1,2}. Its 9 codewords:
(0,0,0,0), (1,0,1,1), (2,0,2,2), (0,1,1,2), (0,2,2,1), (1,1,2,0), (2,2,1,0), (1,2,0,2), (2,1,0,1).
A Solution to the Football Match Problem
This Hamming code is perfect: Hamming balls of radius 1 fill the space GF(3)^4:
9 * sum_{i=0}^{1} C(4,i) 2^i = 9 * (1 + 4*2) = 81 = 3^4.
Any vector in GF(3)^4 has Hamming distance at most 1 to a codeword: some tip is wrong in at most one fixed game.
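The ball-counting argument can be verified by brute force. The generator matrix below is a standard choice for the [4,2,3]_3 code (it may differ from the matrix on the slide by code equivalence):

```python
from itertools import product

# Standard generator matrix of the [4,2,3]_3 ternary Hamming code (one choice).
G = [(1, 0, 1, 1), (0, 1, 1, 2)]

def encode(a, b):
    """Codeword a*G[0] + b*G[1] over GF(3)."""
    return tuple((a * g1 + b * g2) % 3 for g1, g2 in zip(*G))

codewords = {encode(a, b) for a in range(3) for b in range(3)}

def dist(u, v):
    return sum(x != y for x, y in zip(u, v))

# Perfectness: every vector of GF(3)^4 lies in exactly one Hamming ball of radius 1.
covered = all(
    sum(dist(v, c) <= 1 for c in codewords) == 1
    for v in product(range(3), repeat=4)
)
```

The check confirms that the 9 balls of size 9 tile all 81 vectors of GF(3)^4.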
Bounds
How can we prove optimality of codes? (Fix any two of the three parameters n, k, d and maximize the third.)
1. Hamming bound: sum_{i=0}^{e} C(n,i) (q-1)^i <= q^{n-k}. Equality: perfect codes.
2. Singleton bound: d + k <= n + 1. Equality: MDS codes.
3. Other more refined bounds...
Perfect Codes
Have been completely classified by van Lint and Tietäväinen. Essentially: Hamming codes and Golay codes. Not desirable in communication scenarios.
MDS Codes
Not classified completely. Open problem: given dimension k and field size q, determine the maximum block length of an MDS code. (Conjecturally q+1 or q+2.) MDS codes are desirable in practice in worst case scenarios if efficient encoding and decoding are available. Prototype: Reed-Solomon codes.
Reed-Solomon Codes: Applications
1. Satellite communication,
2. Hard disks,
3. Compact Discs, Digital Versatile Disks, Digital Audio Tapes,
4. Wireless communication,
5. Secret sharing,
6. ...
Reed-Solomon Codes: Definitions
Choose n distinct elements x_1, ..., x_n in GF(q). The Reed-Solomon code is the image of the map
GF(q)[x]_{<k} -> GF(q)^n, f |-> (f(x_1), ..., f(x_n)).
Block length = n. Dimension? Minimum distance?
Reed-Solomon Codes: Parameters
Theorem. A nonzero polynomial of degree m over a field has at most m zeros in that field.
Reed-Solomon Codes: Dimension
GF(q)[x]_{<k} -> GF(q)^n, f |-> (f(x_1), ..., f(x_n)).
Kernel: 0 if k <= n, since a nonzero polynomial of degree < k has at most k-1 zeros. Dimension: k (if k <= n).
Reed-Solomon Codes: Minimum Distance
GF(q)[x]_{<k} -> GF(q)^n, f |-> (f(x_1), ..., f(x_n)).
Minimum distance: the maximal number of zeros in a nonzero codeword is k-1, since it evaluates a polynomial of degree at most k-1. The minimum distance is thus at least n-k+1, hence equal to n-k+1 by the Singleton bound: an MDS code!
Reed-Solomon Codes: Encoding
f |-> (f(x_1), ..., f(x_n)) is easy to compute!
O(n^2) using the naive algorithm; O(n log^2(n) log log(n)) using fast algorithms.
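A minimal sketch of the naive O(nk) encoder, over the illustrative prime field GF(13) with n = 8 evaluation points (all parameter choices here are for illustration only):

```python
# Reed-Solomon encoding by polynomial evaluation over GF(q), q prime.
q = 13
xs = list(range(1, 9))  # n = 8 distinct evaluation points in GF(13)

def rs_encode(message, xs, q):
    """Interpret the k message symbols as coefficients of a polynomial f of
    degree < k and evaluate it at the points xs (Horner's rule per point)."""
    def f(x):
        acc = 0
        for coeff in reversed(message):
            acc = (acc * x + coeff) % q
        return acc
    return [f(x) for x in xs]

codeword = rs_encode([3, 1, 4], xs, q)  # k = 3: encodes f(x) = 3 + x + 4x^2
```

Since a nonzero f of degree < k has at most k-1 zeros, distinct messages yield codewords agreeing in fewer than k places.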
Reed-Solomon Codes: Decoding
No efficient maximum likelihood decoding known. Concentrate on bounded distance decoding: the number of agreements between the transmitted and the received word is at least (n+k)/2, i.e., the number of disagreements is at most (n-k)/2.
Welch-Berlekamp Algorithm
Transmitted word: (f(x_1), ..., f(x_n)). Received word: (y_1, ..., y_n). Number of agreements >= (n+k)/2. Find f!
Welch-Berlekamp Algorithm
Step 1: Find g(x) in GF(q)[x]_{<(n+k)/2} and h(x) in GF(q)[x]_{<=(n-k)/2}, not both zero, such that for i = 1, ..., n:
g(x_i) - y_i h(x_i) = 0.
(Solving a system of equations!)
Step 2: Then f = g/h!
Welch-Berlekamp Algorithm: Proof
H(x) := g(x) - f(x)h(x). Degree of H(x) < (n+k)/2. If y_i = f(x_i), then H(x_i) = 0. So H(x) has at least (n+k)/2 zeros, hence H(x) is identically zero, and f(x) = g(x)/h(x).
Welch-Berlekamp Algorithm: Running Time
Step 1: Solving a homogeneous n x (n+1) system of equations; O(n^3). Can be reduced to O(n^2) (Welch-Berlekamp, 1983; displacement method, Olshevsky-Shokrollahi, 1999).
Step 2: Polynomial division; O(n^2).
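A toy end-to-end implementation over GF(13) (the parameters n = 8, k = 2 are illustrative; Step 1 uses plain O(n^3) Gaussian elimination rather than the fast variants mentioned above):

```python
# Toy Welch-Berlekamp decoder over GF(q), q prime.
q = 13
n, k = 8, 2
xs = list(range(1, n + 1))   # evaluation points x_1..x_n
e = (n - k) // 2             # number of correctable errors

def poly_eval(p, x):
    acc = 0
    for c in reversed(p):    # Horner's rule
        acc = (acc * x + c) % q
    return acc

def nullspace_vector(A):
    """One nonzero solution of A v = 0 over GF(q), via Gaussian elimination."""
    A = [row[:] for row in A]
    rows, cols = len(A), len(A[0])
    pivots, r = {}, 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if A[i][c]), None)
        if piv is None:
            continue
        A[r], A[piv] = A[piv], A[r]
        inv = pow(A[r][c], q - 2, q)        # inverse via Fermat's little theorem
        A[r] = [a * inv % q for a in A[r]]
        for i in range(rows):
            if i != r and A[i][c]:
                A[i] = [(a - A[i][c] * b) % q for a, b in zip(A[i], A[r])]
        pivots[c], r = r, r + 1
    free = next(c for c in range(cols) if c not in pivots)  # exists: cols > rank
    v = [0] * cols
    v[free] = 1
    for c, row in pivots.items():
        v[c] = -A[row][free] % q
    return v

def strip(p):
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def poly_div(num, den):
    """Quotient num/den over GF(q); division is exact since g = f*h."""
    num, quot = num[:], [0] * (len(num) - len(den) + 1)
    inv = pow(den[-1], q - 2, q)
    for i in reversed(range(len(quot))):
        quot[i] = num[i + len(den) - 1] * inv % q
        for j, d in enumerate(den):
            num[i + j] = (num[i + j] - quot[i] * d) % q
    return quot

def wb_decode(ys):
    """Recover f (deg f < k) from a received word with at most e errors.
    Step 1: g (deg < (n+k)/2) and h (deg <= (n-k)/2), not both zero, with
    g(x_i) - y_i h(x_i) = 0 for all i.  Step 2: f = g/h."""
    gl, hl = (n + k) // 2, e + 1
    A = [[pow(x, j, q) for j in range(gl)] +
         [-y * pow(x, j, q) % q for j in range(hl)]
         for x, y in zip(xs, ys)]
    v = nullspace_vector(A)
    g, h = strip(v[:gl]), strip(v[gl:])
    return poly_div(g, h)
```

With these parameters the decoder corrects up to e = 3 errors in a codeword of f(x) = 5 + 7x, for example.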
Welch-Berlekamp Algorithm: Generalization
Has been generalized to more than (n-k)/2 errors (list decoding; Sudan, 1997, Guruswami-Sudan, 1999). Step 2 then requires factorization of bivariate polynomials. Can be done more efficiently (Gao-Shokrollahi 1999, Olshevsky-Shokrollahi 1999).
A Solution to the In-Memory Database Problem
(figure: user memory with malloc'ed regions and code memory, as before, but the database is now stored redundantly)
Reed-Solomon Codes: Generalization
Disadvantage of RS codes: GF(q) must be large to accommodate many evaluation points, so long codes are impossible. Interpret GF(q) as the affine line over itself, and generalize to more complicated algebraic curves. These lead to the best known codes in terms of minimum distance, dimension, and block length. The above algorithms can be generalized to these algebraic-geometric codes.
Probabilistic Methods
Channels
(figure: input alphabet and output alphabet connected by arrows labeled with transition probabilities, e.g. 0.2, 0.3, 0.5, 0.1, 0.9)
Entropy and Mutual Information
X and Y are discrete random variables on alphabets X and Y with distributions p(x) and q(y); p(x,y) is their joint distribution.
Entropy H(X) of X:
H(X) = - sum_{x in X} p(x) log p(x).
Mutual information I(X;Y):
I(X;Y) = sum_{y in Y} sum_{x in X} p(x,y) log [ p(x,y) / (p(x)p(y)) ].
Entropy and Mutual Information
H(X) is the amount of uncertainty in the random variable X. I(X;Y) is the reduction in the uncertainty of X due to the knowledge of Y.
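Both definitions translate directly into code; a small sketch using log base 2, so all quantities are in bits:

```python
from math import log2

def entropy(p):
    """H(X) = -sum p(x) log2 p(x); terms with p(x) = 0 contribute nothing."""
    return -sum(px * log2(px) for px in p if px > 0)

def mutual_information(joint):
    """I(X;Y) from a joint distribution given as a matrix joint[x][y]."""
    px = [sum(row) for row in joint]                 # marginal of X
    py = [sum(col) for col in zip(*joint)]           # marginal of Y
    return sum(
        pxy * log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )
```

Independent variables give I(X;Y) = 0 (no reduction in uncertainty); a fully correlated pair of fair bits gives I(X;Y) = H(X) = 1 bit.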
Capacity
The capacity of a channel with input alphabet X, output alphabet Y, and probability transition matrix p(y|x) is
C = max_{p(x)} I(X;Y),
where the maximum is over all possible input distributions p(x).
Examples of Capacity: the Binary Erasure Channel (BEC)
(figure: 0 and 1 pass through unchanged with probability 1-p and are erased to E with probability p)
Capacity = 1 - p.
Examples of Capacity: the Binary Symmetric Channel (BSC)
(figure: 0 and 1 are transmitted correctly with probability 1-p and flipped with probability p)
Capacity = 1 + p log_2(p) + (1-p) log_2(1-p).
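A quick sketch of the two capacity formulas; the BSC capacity is 1 minus the binary entropy of the crossover probability:

```python
from math import log2

def bec_capacity(p):
    """Binary erasure channel with erasure probability p: C = 1 - p."""
    return 1 - p

def h2(p):
    """Binary entropy function h(p) = -p log2 p - (1-p) log2 (1-p)."""
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """BSC with crossover probability p: C = 1 + p log2 p + (1-p) log2 (1-p)."""
    return 1 - h2(p)
```

Note that the BSC capacity vanishes at p = 1/2 (the output is then independent of the input) and is about 1/2 near p = 0.11.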
Capacity and Communication
Shannon's Coding Theorem, 1948: Let C be the capacity of a channel. For any rate R < C there exists a sequence of codes of rate R such that the probability of error of maximum likelihood decoding for these codes approaches zero as the block length approaches infinity. Conversely, reliable communication is impossible at rates above C.
Problems
How to find such sequences of codes? (Random codes, concatenated codes, ...) How to decode efficiently? Has been open for almost 50 years.
Low-Density Parity-Check Codes
Part 2: Low-Density Parity-Check Codes
Low-Density Parity Check Codes
Gallager 1963
Zyablov 1971
Zyablov-Pinsker 1976
Tanner 1981
Turbo codes 1993 (Berrou-Glavieux-Thitimajshima)
Sipser-Spielman, Spielman 1995
MacKay-Neal, MacKay 1995
Luby-Mitzenmacher-Shokrollahi-Spielman-Stemann 1997
Luby-Mitzenmacher-Shokrollahi-Spielman 1998
Richardson-Urbanke 1999
Richardson-Shokrollahi-Urbanke 1999
Code Construction
Codes are constructed from sparse bipartite graphs.
Code Construction
Any binary linear code has a graphical representation: bits a, ..., f on the left are connected to check nodes on the right, each enforcing that its neighbors sum to 0 over GF(2), e.g. a + c + f = 0. Not every code can be represented by a sparse graph.
Parameters
n message nodes, r check nodes.
Rate >= (n - r)/n = 1 - r/n, and r/n = (average left degree)/(average right degree).
Dual Construction
(figure: source bits a, ..., h on the left; each redundant bit on the right is the sum of its neighbors)
Encoding time is proportional to the number of edges.
Algorithmic Issues
Encoding? Linear time for the dual construction; quadratic time (after preprocessing) for the Gallager construction. More later!
Decoding? Depends on the channel; depends on the fraction of errors.
Decoding on a BSC: Flipping
(figure: satisfied and unsatisfied checks; flip message bits that are contained in more unsatisfied than satisfied checks)
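A sketch of the flipping idea on a toy graph (the graph, stopping rule, and scheduling are illustrative choices): each check constrains the XOR of its bits to 0, and any bit sitting in more unsatisfied than satisfied checks gets flipped.

```python
def flip_decode(bits, checks, max_rounds=100):
    """Flipping decoder: checks is a list of variable-index lists; each check
    requires the XOR of its bits to be 0."""
    bits = list(bits)
    for _ in range(max_rounds):
        unsat = [c for c in checks if sum(bits[i] for i in c) % 2]
        if not unsat:
            break                       # all checks satisfied: done
        flipped = False
        for v in range(len(bits)):
            bad = sum(v in c for c in unsat)
            good = sum(v in c for c in checks) - bad
            if bad > good:
                bits[v] ^= 1            # flip the first majority-unsatisfied bit
                flipped = True
                break
        if not flipped:
            break                       # stuck: no bit has an unsatisfied majority
    return bits

# Toy graph (illustrative): 6 bits, 3 checks.
checks = [[0, 1, 2], [2, 3, 4], [4, 5, 0]]
decoded = flip_decode([0, 0, 1, 0, 1, 1], checks)
```

Here a single flipped bit is repaired; on a sparse expander graph this strategy provably corrects a constant fraction of errors (Sipser-Spielman).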
Decoding on a BSC: Gallager Algorithm A (Message Passing)
MESSAGE node (received bit b, incoming check messages x, y, z, u): m = x if x = y = z = u, else m = b.
CHECK node (incoming messages x, y, z, u): m = x + y + z + u mod 2.
Decoding on a BSC: Belief Propagation
Messages are log-likelihood ratios.
MESSAGE node (channel value b, incoming x, y, z, u): m = x + y + z + u + b.
CHECK node: after a hyperbolic transform into (magnitude, sign) pairs, m = x * y * z * u, where (a,b) * (c,d) := (a + c, b + d mod 2).
Optimality of Belief Propagation
Belief propagation is bit-optimal if the graph has no loops: it maximizes the probability
P(c_m = b | y) = sum_{c in C: c_m = b} P(c | y).
Performance on a (3,6)-graph
Shannon limit: 11%
Flipping algorithm: ?
Gallager A: 4%
Gallager B: 4% (6.27%)
Erasure decoder: 7%
Belief propagation: 8.7% (.8%)
The Binary Erasure Channel (BEC)
(figure: 0 and 1 pass through unchanged with probability 1-p and are erased to E with probability p)
Decoding on a BEC: Luby-Mitzenmacher-Shokrollahi-Spielman-Stemann
CHECK node (incoming x, y, z, u): m = x + y + z + u mod 2 if none of them is an erasure, else m = erasure.
MESSAGE node: m = the known value if the received bit b or any incoming message is known, else m = erasure.
Decoding on a BEC
Phase 1: Direct recovery.
(figure: received word a ? c ? ? f g h; each check with a single erased neighbor determines that bit, e.g. b)
Decoding on a BEC
Phase 2: Substitution.
(figure: the recovered value b is substituted into the remaining checks, which then determine further bits d and e)
Example
(figure: panels (a)-(f) showing successive rounds of the decoder until complete recovery)
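Both phases amount to a single peeling rule: repeatedly find a check with exactly one erased neighbor and solve for it. A sketch on a toy graph (the graph is an illustrative choice):

```python
def peel_decode(received, checks):
    """Peeling decoder for the BEC.
    received: list of known bits (0/1) with None marking an erasure.
    checks: list of variable-index lists; each check's bits XOR to 0."""
    bits = list(received)
    progress = True
    while progress:
        progress = False
        for check in checks:
            erased = [i for i in check if bits[i] is None]
            if len(erased) == 1:
                # substitute the known values: missing bit = XOR of the rest
                bits[erased[0]] = sum(bits[i] for i in check if bits[i] is not None) % 2
                progress = True
    return bits

# Toy graph (illustrative): 6 bits, 3 checks.
checks = [[0, 1, 2], [2, 3, 4], [4, 5, 0]]
recovered = peel_decode([0, None, 1, None, 1, 1], checks)
```

The loop terminates when no check has exactly one erasure left: either everything is recovered or the decoder is stuck on a "stopping set".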
The (Inverse) Problem
Have: fast decoding algorithms. Want: design codes that can correct many errors using these algorithms. Focus on the BEC in the following.
Experiments
Choose regular graphs. A (d,k)-regular graph has rate at least 1 - d/k, so it can correct at most a d/k-fraction of erasures. Choose a random (d,k)-graph and let p0 := the maximum fraction of erasures the algorithm can correct. What are these numbers?
d  k   d/k   p0
3  6   0.5   0.429
4  8   0.5   0.383
5  10  0.5   0.34
3  9   0.33  0.282
4  12  0.33  0.2572
A Theorem
Luby, Mitzenmacher, Shokrollahi, Spielman, Stemann, 1997: A randomly chosen (d,k)-graph can correct a p0-fraction of erasures with high probability if and only if
p0 (1 - (1 - x)^{k-1})^{d-1} < x for x in (0, p0).
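The condition can be checked numerically; bisecting on p0 with a grid-based check of the inequality reproduces the (3,6) value from the experiments (a sketch; grid size and iteration count are arbitrary choices):

```python
def decodes(p0, d, k, steps=4000):
    """Check p0*(1-(1-x)^(k-1))^(d-1) < x on a grid over (0, p0]."""
    return all(
        p0 * (1 - (1 - x) ** (k - 1)) ** (d - 1) < x
        for x in (p0 * (i + 1) / steps for i in range(steps))
    )

def threshold(d, k, iters=40):
    """Largest p0 satisfying the condition, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if decodes(mid, d, k):
            lo = mid
        else:
            hi = mid
    return lo

p_36 = threshold(3, 6)  # close to 0.429 for the (3,6)-regular graph
```

The gap between this threshold and the d/k upper bound is exactly what irregular graphs will close.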
Analysis: (3,6)-graphs
Expand neighborhoods of message nodes.
Analysis: (3,6)-graphs
p_i: probability that a message node is still erased after the ith iteration.
(figure: depth-one tree of a message node, its checks, and their other message nodes)
p_{i+1} = p0 (1 - (1 - p_i)^5)^2.
Successful Decoding
Condition: p0 (1 - (1 - p_i)^5)^2 < p_i.
Analysis: (3,6)-graphs
Making the arguments exact:
The neighborhood is tree-like with high probability (standard argument). The above argument works for the expected fraction of erasures at the lth round. The real value is sharply concentrated around the expected value p_l: edge exposure martingale, Azuma's inequality.
The General Case
Let lambda_i and rho_i be the fraction of edges of degree i on the left and the right hand side, respectively. Let lambda(x) := sum_i lambda_i x^{i-1} and rho(x) := sum_i rho_i x^{i-1}. The condition for successful decoding at erasure probability p0 is then
p0 lambda(1 - rho(1 - x)) < x
for all x in (0, p0).
Belief Propagation
Richardson-Urbanke, 1999: f_l is the density of the probability distribution of the messages passed from the check nodes to the message nodes at round l of the algorithm; P0 is the density of the error distribution (in log-likelihood representation). For a (d,k)-regular graph,
Gamma(f_{l+1}) = Gamma(P0 * f_l^{*(d-1)})^{*(k-1)},
where * denotes convolution and Gamma is a hyperbolic change of measure,
Gamma(f)(y) := f(ln coth(y/2)) / sinh(y).
We want f_l to converge to a delta function concentrated on correct messages (error probability tending to 0). This gives rise to high-dimensional optimization algorithms.
Achieving Capacity
Want to design codes that can recover from a fraction 1 - R of erasures (asymptotically). Want lambda and rho so that p0 lambda(1 - rho(1 - x)) < x for all x in (0, p0), with p0 arbitrarily close to
1 - R = (integral_0^1 rho(x)dx) / (integral_0^1 lambda(x)dx).
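The rate formula in code, with the (3,6)-regular pair as a sanity check (a sketch; the coefficient layout chosen here is an illustrative convention):

```python
def design_rate(lam, rho):
    """R = 1 - (integral of rho)/(integral of lam) over [0,1].
    Convention: lam[j] is the fraction of edges of left degree j+2, i.e. the
    coefficient of x^(j+1) in lambda(x), which integrates to lam[j]/(j+2);
    likewise for rho."""
    il = sum(c / (j + 2) for j, c in enumerate(lam))
    ir = sum(c / (j + 2) for j, c in enumerate(rho))
    return 1 - ir / il

# (3,6)-regular graph: all edges have left degree 3 and right degree 6,
# so integral(lambda) = 1/3, integral(rho) = 1/6, R = 1/2.
r_36 = design_rate([0, 1], [0, 0, 0, 0, 1])
```

Capacity-achieving designs then push p0 toward this 1 - R while keeping the decoding condition satisfied.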
Tornado Codes
Extremely irregular graphs provide, for any rate R, sequences of codes which come arbitrarily close to the capacity of the erasure channel! Degree structure? Choose a design parameter D.
lambda(x) := (1/H(D)) (x + x^2/2 + ... + x^D/D),
rho(x) := exp(mu(x - 1)),
where H(D) = 1 + 1/2 + ... + 1/D and mu = H(D)/(1 - 1/(D+1)).
Tornado Codes: Left Degree Distribution
(figure)
Right Regular Codes
Shokrollahi, 1999: graphs that are regular on the right. The degrees on the left are related to the Taylor expansion of (1 - x)^{1/m}. These are the only known examples of LDPC codes that achieve capacity on a nontrivial channel using a linear time decoding algorithm.
Other Channels?
f a density function. lambda(f) := sum_i lambda_i f^{*(i-1)}, rho(f) := sum_i rho_i f^{*(i-1)}, with * denoting convolution. Density evolution becomes
Gamma(f_{l+1}) = rho(Gamma(P0 * lambda(f_l))).
Want to characterize P0 such that f_l converges (error probability tending to 0).
Conditions on the Density Functions
Richardson-Shokrollahi-Urbanke, 1999:
Consistency: if the channel is symmetric, then the density functions f_l satisfy f(x) = f(-x)e^x.
Fixed point theorem: if P_err(f_i) = P_err(f_j) for some i < j, then f_i = f_j is a fixed point of the iteration.
Conditions on the Density Functions
Stability: let r := -lim_{n->infinity} (1/n) log P_err(P0^{*n}). Then for lambda_2 rho'(1) > e^r we have P_err(f_l) > epsilon for some fixed epsilon and all l. If lambda_2 rho'(1) < e^r, then the fixed point is stable. Here
P_err(f) := integral_{-infinity}^0 f(x)dx
is the error probability.
Stability
Erasure channel with erasure probability p: lambda_2 rho'(1) < 1/p.
BSC with crossover probability p: lambda_2 rho'(1) < 1/(2 sqrt(p(1-p))).
AWGN channel with variance sigma^2: lambda_2 rho'(1) < e^{1/(2 sigma^2)}.
Stability for the Erasure Channel
Shokrollahi, 1999:
(figure: plots of p0 lambda(1 - rho(1 - x)) - x for a stable and an unstable case)
Flatness: Higher Stability Conditions
Shokrollahi, 2000: let (lambda_m(x), rho_m(x)) be a capacity-achieving sequence of degree distributions. Then
(1 - R) lambda_m(1 - rho_m(1 - x)) - x
converges uniformly to the zero function on the interval [0, 1 - R]. No equivalent is known for other channels.
Flatness: Higher Stability Conditions
(figure: the curve p0 lambda(1 - rho(1 - x)) hugging the diagonal x from below)
Capacity Achieving
No sequences of capacity-achieving degree distributions are known for channels other than the erasure channel. Conjecture: they exist!
(figure: bit error rate P_b versus E_b/N_0 in dB for an irregular LDPCC with n = 10^6, a turbo code with n = 10^6, and a (3,6)-regular LDPCC with n = 10^6, with the Shannon limit and the respective thresholds marked)
Applications to Computer Networks
Distribution of bulk data to a large number of clients. Want: fully reliable, low network overhead, support for a vast number of receivers with heterogeneous characteristics; users want to access the data at times of their choosing, and these access times overlap.
A Solution
(figure: a broadcast server multicasts encoded packets to many clients)
A Solution
A client joins the multicast group until enough of the encoding has been received, and then decodes to obtain the original data.
(figure: amount of encoding received versus time)
Digital Fountain, http://www.dfountain.com.
Open Problems
Asymptotic theory:
1. Classification of capacity achieving sequences for the erasure channel.
2. Capacity achieving sequences for other channels.
3. Exponentially small error probabilities for the decoder (instead of polynomially small).
Explicit constructions:
1. Constructions using finite geometries.
2. Constructions using Reed-Solomon codes.
3. Algebraic constructions.
Graphs with loops. Short codes.
Algorithmic issues:
1. Design and analysis of new decoding algorithms.
2. Design of new encoders.
Applications: packet based wireless networks.
Randomness: use of randomness in other areas; random convolutional codes?