Information and Coding Theory, Autumn 2017
Lecturer: Madhur Tulsiani
Lecture 12: November 6, 2017

Recall: We were looking at codes of the form $C : \mathbb{F}_p^k \to \mathbb{F}_p^n$, where $p$ is prime, $k$ is the message length, and $n$ is the block length of the code. We also saw $C$ as a set in $\mathbb{F}_p^n$ and defined the distance of the code as
\[ \Delta(C) := \min_{x,y \in C,\, x \neq y} \{\Delta(x,y)\}, \]
where $\Delta(x,y)$ is the Hamming distance between $x$ and $y$. We proved that a code $C$ can correct $t$ errors iff $\Delta(C) \geq 2t + 1$.

1 Linear codes

Definition 1.1 (Linear Codes). A code $C \subseteq \mathbb{F}_p^n$ is a linear code if $C$ is a subspace of $\mathbb{F}_p^n$.

We will also think of $C$ as a linear encoding $C : \mathbb{F}_p^k \to \mathbb{F}_p^n$, satisfying $C(\alpha u + v) = \alpha C(u) + C(v)$ for all $u, v \in \mathbb{F}_p^k$ and $\alpha \in \mathbb{F}_p$.

Since a linear encoding is a linear map from one finite-dimensional vector space to another, we can write it as a matrix of finite size. That is, there is a corresponding $G \in \mathbb{F}_p^{n \times k}$ s.t. $C(x) = Gx$ for all $x \in \mathbb{F}_p^k$. If the code has nonzero distance, then the rank of $G$ must be $k$ (otherwise there exist $x \neq y \in \mathbb{F}_p^k$ such that $Gx = Gy$). Hence, the null space of $G^T$ has dimension $n - k$. This defines another useful matrix, known as the parity check matrix of the code.

Definition 1.2 (Parity Check Matrix). Let $b_1, \ldots, b_{n-k}$ be a basis for the null space of $G^T$ corresponding to a linear code $C$. Then $H \in \mathbb{F}_p^{(n-k) \times n}$, defined by
\[ H^T = \begin{bmatrix} b_1 & b_2 & \cdots & b_{n-k} \end{bmatrix}, \]
is called the parity check matrix of $C$.

Since $G^T H^T = 0 \Leftrightarrow HG = 0$, we have $(HG)x = 0$ for all $x \in \mathbb{F}_p^k$, i.e., $Hz = 0$ for all $z \in C$. Moreover, since the columns of $H^T$ are a basis for the null space of $G^T$, we have that $z \in C \Leftrightarrow Hz = 0$.
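As a quick sanity check of the linearity property, here is a minimal sketch (not part of the notes; it assumes NumPy, and the matrix $G$ below is a made-up example rather than a code from the lecture):

```python
import numpy as np

p = 5                                   # a small prime, so we work over F_5
G = np.array([[1, 0],                   # hypothetical 4x2 generator matrix
              [0, 1],
              [1, 1],
              [2, 3]])

def encode(x):
    """The linear encoding C(x) = Gx over F_p."""
    return (G @ x) % p

u, v, alpha = np.array([3, 1]), np.array([2, 4]), 4
lhs = encode((alpha * u + v) % p)
rhs = (alpha * encode(u) + encode(v)) % p
assert np.array_equal(lhs, rhs)         # C(alpha*u + v) = alpha*C(u) + C(v)
```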
So the parity check matrix gives us a way to quickly check a codeword, by checking the parities of some bits of $z$ (each row of $H$ gives a parity constraint on $z$). Also, one can equivalently define a linear code by either giving $G$ or the parity check matrix $H$.

The distance of linear codes can also be characterized in terms of the number of non-zero entries in any $z \in C \setminus \{0\}$.

Exercise 1.3. For $z \in \mathbb{F}_p^n$, let $\mathrm{wt}(z) = |\{i \in [n] \mid z_i \neq 0\}|$. Prove that for a linear code $C$,
\[ \Delta(C) = \min_{z \in C \setminus \{0\}} \mathrm{wt}(z). \]

1.1 Hamming Code

Consider the following code from $\mathbb{F}_2^4$ to $\mathbb{F}_2^7$, known as the Hamming code.

Example 1.4. Let $C : \mathbb{F}_2^4 \to \mathbb{F}_2^7$, where
\[ C(x_1, x_2, x_3, x_4) = (x_1,\ x_2,\ x_3,\ x_4,\ x_2 + x_3 + x_4,\ x_1 + x_3 + x_4,\ x_1 + x_2 + x_4). \]
Note that each element of the image is a linear function of the $x_i$'s, i.e., one can express $C$ with matrix multiplication as follows:
\[ C(x_1, x_2, x_3, x_4) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}. \]

Example 1.5. The parity check matrix of our example Hamming code is:
\[ H = \begin{bmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}. \]
Note that the $i$-th column is the integer $i$ in binary. One can easily check that $HG = 0$.
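Both facts are easy to check mechanically. Here is a small sketch (not from the notes; it assumes NumPy) that verifies $HG = 0$ over $\mathbb{F}_2$ and computes the minimum weight of a nonzero codeword, as in Exercise 1.3:

```python
import itertools
import numpy as np

G = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1],
              [0,1,1,1],[1,0,1,1],[1,1,0,1]])
H = np.array([[0,0,0,1,1,1,1],
              [0,1,1,0,0,1,1],
              [1,0,1,0,1,0,1]])

assert not ((H @ G) % 2).any()          # HG = 0 over F_2

# By Exercise 1.3, the distance equals the minimum weight of a nonzero
# codeword; enumerating all 2^4 - 1 of them shows it is 3.
min_wt = min(int(((G @ np.array(x)) % 2).sum())
             for x in itertools.product([0, 1], repeat=4) if any(x))
print(min_wt)                           # prints 3
```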
Now suppose $z = (z_1, \ldots, z_7)^T$ is our codeword and we make a single error in the $i$-th entry. Then the codeword with the error is
\[ z + e_i = \begin{bmatrix} z_1 \\ \vdots \\ z_i \\ \vdots \\ z_7 \end{bmatrix} + \begin{bmatrix} 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \]
and $H(z + e_i) = Hz + He_i = He_i = H_i$, the $i$-th column of $H$, which reads $i$ in binary. So this is a very efficient decoding algorithm, just based on parity checking (a concrete sketch appears at the end of this subsection).

Since the Hamming code $C$ can correct at least $t = 1$ errors, we must have that $\Delta(C) \geq 2t + 1 = 3$. Verify that the distance is exactly 3 using the characterization of distance for linear codes from Exercise 1.3.

One can generalize the Hamming code to larger message and block lengths: we create a parity check matrix $H \in \mathbb{F}_2^{(n-k) \times n}$ whose $i$-th column reads $i$ in binary.
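Here is the promised sketch of this single-error decoder (not part of the notes; it assumes NumPy and uses the matrices $G$ and $H$ of Examples 1.4 and 1.5):

```python
import numpy as np

# G and H for the [7,4] Hamming code of Examples 1.4 and 1.5.
G = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1],
              [0,1,1,1],[1,0,1,1],[1,1,0,1]])
H = np.array([[0,0,0,1,1,1,1],
              [0,1,1,0,0,1,1],
              [1,0,1,0,1,0,1]])

z = (G @ np.array([1, 0, 1, 1])) % 2    # a codeword
i = 5                                   # flip the 5th entry (1-indexed)
y = z.copy()
y[i - 1] ^= 1

s = (H @ y) % 2                         # syndrome He_i = i-th column of H
pos = 4 * s[0] + 2 * s[1] + s[2]        # read i in binary (row 1 is the MSB)
y[pos - 1] ^= 1                         # flip the erroneous bit back
assert pos == i and np.array_equal(y, z)
```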
2 Elementary bounds on codes

We consider a few bounds showing how much redundancy is necessary for any code with a given distance.

2.1 Hamming bound

We now show an optimality bound on the size of the code, starting with the case of distance-3 codes and then generalizing to distance-$d$ codes.

Theorem 2.1. Let $C : \mathbb{F}_2^k \to \mathbb{F}_2^n$ be any distance-3 code, i.e., $\Delta(C) = 3$. Then
\[ |C| = 2^k \leq \frac{2^n}{n+1}. \]

Proof: For each $z \in C$, let $B(z)$ be the ball of size $n + 1$ consisting of $z$ and the $n$ elements of $\mathbb{F}_2^n$ (not in $C$) at distance 1 from $z$. Then the balls formed by the codewords in $C$ are disjoint: if $B(z)$ and $B(z')$ were to intersect, then $\Delta(z, z') \leq 2$ by the triangle inequality, contradicting $\Delta(C) = 3$. Since each codeword $z \in C$ has $|B(z)| = n + 1$ strings in its ball, we get $|C| \cdot (n + 1) \leq 2^n$.

Note that our example Hamming code from $\mathbb{F}_2^4$ to $\mathbb{F}_2^7$ satisfies $|C| = 2^4 = 2^7/8$, so it is an optimal distance-3 code. Generalizing to distance-$d$ codes, we have:

Theorem 2.2 (Hamming Bound). Let $C : \mathbb{F}_2^k \to \mathbb{F}_2^n$ be any distance-$d$ code, i.e., $\Delta(C) = d$, and let $t = \lfloor (d-1)/2 \rfloor$. Then
\[ |C| = 2^k \leq \frac{2^n}{\mathrm{vol}(B_t)}, \]
where $\mathrm{vol}(B_t) = \sum_{i=0}^{t} \binom{n}{i}$ is the number of strings at distance at most $t$ from any fixed codeword $z \in C$.

Remark 2.3. The Hamming bound also gives us a bound on the rate of the code in terms of entropy (recall: the rate of the code is $k/n$). Let $d = \delta n$ for $\delta \leq 1$. Since $\sum_{i=0}^{\ell} \binom{n}{i} \leq 2^{nH(\ell/n)}$ for $\ell \leq n/2$, we have:
\[ \frac{k}{n} \leq 1 - H(\delta/2) + o(1). \]
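The bound is easy to evaluate numerically. A small sketch (not from the notes) computing the right-hand side of the Hamming bound, and confirming that the $[7,4]$ Hamming code meets it with equality:

```python
from math import comb

def hamming_bound(n, d):
    """Largest |C| allowed by the Hamming bound for length n, distance d."""
    t = (d - 1) // 2
    vol = sum(comb(n, i) for i in range(t + 1))  # vol(B_t) around a codeword
    return 2 ** n // vol

print(hamming_bound(7, 3))   # 16 = 2^4: the [7,4] Hamming code is optimal
```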
2.2 Singleton bound

While the Hamming bound is a good bound for codes over small alphabets ($\mathbb{F}_2$), the following is a better bound when the alphabet size is large.

Theorem 2.4 (Singleton Bound). Let $C : \mathbb{F}_p^k \to \mathbb{F}_p^n$ be a distance-$d$ code. Then $d \leq n - k + 1$.

Proof: Consider the map $\Gamma : \mathbb{F}_p^k \to \mathbb{F}_p^{n-d+1}$, where $\Gamma(x)$ is the first $n - d + 1$ coordinates of $C(x)$. Then for all $x \neq y$, we have $\Gamma(x) \neq \Gamma(y)$: since $\Delta(C(x), C(y)) \geq d$, the codewords $C(x)$ and $C(y)$ agree in at most $n - d$ coordinates, so they cannot agree on all of the first $n - d + 1$. Therefore $\Gamma$ is injective, and $p^k = |\mathrm{img}(\Gamma)| \leq p^{n-d+1}$, which gives $k \leq n - d + 1$.

We will next see Reed-Solomon codes, which achieve the Singleton bound.

3 Reed-Solomon codes

We now look at Reed-Solomon codes over $\mathbb{F}_p$. These are optimal codes which can achieve a very large distance. However, they have the drawback that they need $p \geq n$.

Definition 3.1 (Reed-Solomon Code). Assume $p \geq n$ and fix a set $S = \{a_1, \ldots, a_n\} \subseteq \mathbb{F}_p$ of distinct elements, with $|S| = n$. For each message $(m_0, \ldots, m_{k-1}) \in \mathbb{F}_p^k$, consider the polynomial $P(x) = m_0 + m_1 x + \cdots + m_{k-1} x^{k-1}$. Then the Reed-Solomon code is defined as:
\[ C(m_0, \ldots, m_{k-1}) = (P(a_1), \ldots, P(a_n)). \]

Remark 3.2. Reed-Solomon codes can again be encoded using inner codes to create concatenated codes, which can work with smaller $p$.

Let's compute the distance of the Reed-Solomon code:

Claim 3.3. $\Delta(C) \geq n - k + 1$.

Proof: Consider $C(m_0, \ldots, m_{k-1})$ with $P(x) = m_0 + m_1 x + \cdots + m_{k-1} x^{k-1}$ and $C(m'_0, \ldots, m'_{k-1})$ with $P'(x) = m'_0 + m'_1 x + \cdots + m'_{k-1} x^{k-1}$, where we assume $(m_0, \ldots, m_{k-1}) \neq (m'_0, \ldots, m'_{k-1})$ and hence $P \neq P'$. Then $P - P'$ is a non-zero polynomial of degree at most $k - 1$ and has at most $k - 1$ roots, i.e., $P$ and $P'$ agree on at most $k - 1$ points. This implies that $C(m_0, \ldots, m_{k-1}) = (P(a_1), \ldots, P(a_n))$ and $C(m'_0, \ldots, m'_{k-1}) = (P'(a_1), \ldots, P'(a_n))$ differ in at least $n - k + 1$ places. Hence, $\Delta(C) \geq n - k + 1$.

Remark 3.4. If $n = 2k$, then $d = k + 1$, and we can correct $k/2$ errors using the Reed-Solomon code; this is optimal (by the Singleton bound).

Moreover, the Reed-Solomon code is a linear code:
\[ C(m_0, \ldots, m_{k-1}) = \begin{bmatrix} 1 & a_1 & a_1^2 & \cdots & a_1^{k-1} \\ \vdots & & & & \vdots \\ 1 & a_n & a_n^2 & \cdots & a_n^{k-1} \end{bmatrix} \begin{bmatrix} m_0 \\ m_1 \\ \vdots \\ m_{k-1} \end{bmatrix}. \]
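To make the definition concrete, here is a minimal encoding sketch (the parameters $p = 13$, $k = 3$, $n = 7$ and the evaluation points $a_i = i$ are illustrative choices, not from the notes):

```python
p, k, n = 13, 3, 7                      # illustrative parameters with p >= n
a = list(range(1, n + 1))               # distinct evaluation points a_1..a_n

def rs_encode(m):
    """Encode m = (m_0, ..., m_{k-1}) as (P(a_1), ..., P(a_n)) over F_p."""
    return [sum(m[j] * pow(ai, j, p) for j in range(k)) % p for ai in a]

print(rs_encode([5, 0, 2]))             # evaluations of P(x) = 5 + 2x^2
```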
3.1 Unique decoding of Reed-Solomon codes

If the codewords output by the Reed-Solomon code did not contain any errors, we could simply use Lagrange interpolation to recover the messages. However, we must be able to handle noise, and the trick is to work with a hypothetical error-locator polynomial telling us where the errors are.

Definition 3.5 (Error-locator Polynomial). An error-locator polynomial $E(x)$ is a polynomial which satisfies
\[ \forall i \in [n], \quad E(a_i) = 0 \ \vee\ y_i = P(a_i), \]
where $y_i$ is the $i$-th element of the Reed-Solomon codeword obtained with noise.

Note that this directly implies that $y_i E(a_i) = P(a_i) E(a_i)$ for all $a_i \in S$. Let us denote $Q(x) := P(x) E(x)$. The following algorithm by Berlekamp and Welch [WB86] finds the polynomials $E$ and $Q$ defined above, using these to decode a message with at most $t = \frac{\Delta(C) - 1}{2} = \frac{n-k}{2}$ errors.

Note that the message $(m_0, \ldots, m_{k-1})$ is identified with the polynomial $P(x) = \sum_{i=0}^{k-1} m_i x^i$. Let $(P(a_1), \ldots, P(a_n))$ be the original codeword and let $y = (y_1, \ldots, y_n)$ be its corrupted version (we only know $y$). The distance between them is at most $t$.

Unique decoding for Reed-Solomon codes
Input: $\{(a_i, y_i)\}_{i=1,\ldots,n}$
1. Find $E, Q \in \mathbb{F}_p[x]$ such that $E \neq 0$, $\deg(E) \leq t$, $\deg(Q) \leq k - 1 + t$, and
\[ \forall i \in [n], \quad Q(a_i) = y_i \cdot E(a_i). \]
2. Output $Q/E$.

We first observe that Step 1 in the algorithm can be done by solving a linear system. Let $E(x) = e_0 + e_1 x + \cdots + e_t x^t$ and $Q(x) = q_0 + q_1 x + \cdots + q_{k-1+t} x^{k-1+t}$. Then for each given $(a_i, y_i)$, the equation $Q(a_i) = y_i E(a_i)$ is linear in the variables $e_0, \ldots, e_t$ and $q_0, \ldots, q_{k-1+t}$. Note that such a system is homogeneous and hence always has the trivial all-zero solution. We need to show that there is a solution with nonzero $E$.

Lemma 3.6. There exists $(E, Q)$ that satisfies the conditions in Step 1 of the algorithm.

Proof: Let $I = \{i \in [n] \mid P(a_i) \neq y_i\}$ and $E' = \prod_{i \in I} (x - a_i)$ (in particular, $E' \equiv 1$ if $I$ is empty). Since $|I| \leq t$, we have $\deg(E') \leq t$. Let $Q' = P \cdot E'$, so that $\deg(Q') \leq k - 1 + t$. Then for all $i \in [n]$, we have $y_i E'(a_i) = P(a_i) E'(a_i) = Q'(a_i)$: if $i \in I$ then both sides are 0, and otherwise $y_i = P(a_i)$.

If there is a unique nonzero solution to the linear system in Step 1, then Step 2 outputs the correct polynomial. But in general there can be more than one such solution. The following lemma guarantees the correctness of Step 2.

Lemma 3.7. For any two solutions $(Q_1, E_1)$ and $(Q_2, E_2)$ that satisfy the conditions in Step 1, $Q_1/E_1 = Q_2/E_2$.

Proof: It suffices to show $Q_1 E_2 = Q_2 E_1$. Indeed, since both solutions satisfy the equation $Q(a_i) = y_i E(a_i)$ for each $i \in [n]$, we have
\[ (Q_1 E_2)(a_i) = y_i E_1(a_i) E_2(a_i) = (E_1 Q_2)(a_i) \]
for each $i \in [n]$. This implies that the two polynomials must be equal, since $Q_1 E_2 - Q_2 E_1$ has $n$ zeros but its degree is at most $k + 2t - 1 \leq k + (n - k) - 1 = n - 1$.
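Below is a sketch of the whole decoder (again not from the notes, and reusing the illustrative parameters of the encoder sketch above). One standard convenience used here: we look for a monic $E$ of degree exactly $t$, which always exists since the error locator from Lemma 3.6 can be padded with extra monic factors; this turns Step 1 into an inhomogeneous linear system over $\mathbb{F}_p$.

```python
p, k, n = 13, 3, 7                      # same illustrative parameters as above
a = list(range(1, n + 1))               # evaluation points a_1, ..., a_n
t = (n - k) // 2                        # decoding radius; here t = 2

def solve_mod_p(A, b):
    """Gaussian elimination over F_p; returns one solution (free vars = 0)."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    pivots, r = [], 0
    for c in range(len(M[0]) - 1):
        piv = next((i for i in range(r, len(M)) if M[i][c]), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)    # Fermat inverse, since p is prime
        M[r] = [v * inv % p for v in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c]:
                M[i] = [(u - M[i][c] * v) % p for u, v in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    x = [0] * (len(M[0]) - 1)
    for i, c in enumerate(pivots):
        x[c] = M[i][-1]
    return x

def bw_decode(y):
    # Unknowns: e_0, ..., e_{t-1}, then q_0, ..., q_{k+t-1}; the monic
    # leading term e_t = 1 of E is moved to the right-hand side.
    A = [[-y[i] * pow(a[i], j, p) % p for j in range(t)]
         + [pow(a[i], j, p) for j in range(k + t)] for i in range(n)]
    b = [y[i] * pow(a[i], t, p) % p for i in range(n)]
    sol = solve_mod_p(A, b)
    E, Q = sol[:t] + [1], sol[t:]
    # By Lemma 3.7, Q = P * E, so dividing Q by the monic E recovers P.
    m = [0] * k
    for d in range(k + t - 1, t - 1, -1):    # long division, top term first
        m[d - t] = Q[d]
        for j in range(t + 1):
            Q[d - t + j] = (Q[d - t + j] - m[d - t] * E[j]) % p
    return m

y = [(5 + 2 * ai * ai) % p for ai in a]      # rs_encode([5, 0, 2]) from above
y[1], y[6] = (y[1] + 3) % p, (y[6] + 1) % p  # corrupt two of the n entries
print(bw_decode(y))                          # prints [5, 0, 2]
```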
References

[WB86] L. R. Welch and E. R. Berlekamp. Error correction for algebraic block codes, December 30, 1986. US Patent 4,633,470. URL: http://www.google.com/patents/US4633470.