MTH6108 Coding theory

Similar documents
MAS309 Coding theory

Mathematics Department

3. Coding theory 3.1. Basic concepts

MATH32031: Coding Theory Part 15: Summary

MATH 291T CODING THEORY

Hamming codes and simplex codes ( )

11 Minimal Distance and the Parity Check Matrix

MATH3302. Coding and Cryptography. Coding Theory

MATH Examination for the Module MATH-3152 (May 2009) Coding Theory. Time allowed: 2 hours. S = q

MATH 291T CODING THEORY

Orthogonal Arrays & Codes

Finite Mathematics. Nik Ruškuc and Colva M. Roney-Dougal

Math 512 Syllabus Spring 2017, LIU Post

: Coding Theory. Notes by Assoc. Prof. Dr. Patanee Udomkavanich October 30, upattane

Codes over Subfields. Chapter Basics

Chapter 7. Error Control Coding. 7.1 Historical background. Mikael Olofsson 2005

Chapter 2. Error Correcting Codes. 2.1 Basic Notions

Arrangements, matroids and codes

ELEC 519A Selected Topics in Digital Communications: Information Theory. Hamming Codes and Bounds on Codes

Final Review Sheet. B = (1, 1 + 3x, 1 + x 2 ) then 2 + 3x + 6x 2

MT5821 Advanced Combinatorics

MATH 433 Applied Algebra Lecture 21: Linear codes (continued). Classification of groups.

The Hamming Codes and Delsarte s Linear Programming Bound

2 so Q[ 2] is closed under both additive and multiplicative inverses. a 2 2b 2 + b

Chapter 3 Linear Block Codes

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

Coding Theory and Applications. Linear Codes. Enes Pasalic University of Primorska Koper, 2013

The Golay code. Robert A. Wilson. 01/12/08, QMUL, Pure Mathematics Seminar

MATH/MTHE 406 Homework Assignment 2 due date: October 17, 2016

First we introduce the sets that are going to serve as the generalizations of the scalars.

LINEAR ALGEBRA REVIEW

A Little Beyond: Linear Algebra

MATH3302 Coding Theory Problem Set The following ISBN was received with a smudge. What is the missing digit? x9139 9

Linear Cyclic Codes. Polynomial Word 1 + x + x x 4 + x 5 + x x + x

6 Cosets & Factor Groups

Math 396. Quotient spaces

Real representations

MTH6108 Coding theory Coursework 7

Binary Linear Codes G = = [ I 3 B ] , G 4 = None of these matrices are in standard form. Note that the matrix 1 0 0

Solutions of Exam Coding Theory (2MMC30), 23 June (1.a) Consider the 4 4 matrices as words in F 16

Cosets and Lagrange s theorem

Definition 2.3. We define addition and multiplication of matrices as follows.

Definitions. Notations. Injective, Surjective and Bijective. Divides. Cartesian Product. Relations. Equivalence Relations

MT361/461/5461 Error Correcting Codes: Preliminary Sheet

LINEAR ALGEBRA BOOT CAMP WEEK 1: THE BASICS

Math 4A Notes. Written by Victoria Kala Last updated June 11, 2017

FUNDAMENTALS OF ERROR-CORRECTING CODES - NOTES. Presenting: Wednesday, June 8. Section 1.6 Problem Set: 35, 40, 41, 43

Some notes on Coxeter groups

Linear Algebra II. 2 Matrices. Notes 2 21st October Matrix algebra

Linear Algebra, Summer 2011, pt. 2

We simply compute: for v = x i e i, bilinearity of B implies that Q B (v) = B(v, v) is given by xi x j B(e i, e j ) =

And for polynomials with coefficients in F 2 = Z/2 Euclidean algorithm for gcd s Concept of equality mod M(x) Extended Euclid for inverses mod M(x)

Groups of Prime Power Order with Derived Subgroup of Prime Order

Some error-correcting codes and their applications

Answers in blue. If you have questions or spot an error, let me know. 1. Find all matrices that commute with A =. 4 3

MA106 Linear Algebra lecture notes

GENERATING SETS KEITH CONRAD

Review of linear algebra

Introduction to binary block codes

Algebraic Methods in Combinatorics

a (b + c) = a b + a c

Vector Spaces. Addition : R n R n R n Scalar multiplication : R R n R n.

Chapter 2 Linear Transformations

Combinatória e Teoria de Códigos Exercises from the notes. Chapter 1

0 Sets and Induction. Sets

Linear Algebra Notes. Lecture Notes, University of Toronto, Fall 2016

Week 3: January 22-26, 2018

We saw in the last chapter that the linear Hamming codes are nontrivial perfect codes.

MA554 Assessment 1 Cosets and Lagrange s theorem

10. Rank-nullity Definition Let A M m,n (F ). The row space of A is the span of the rows. The column space of A is the span of the columns.

Modular numbers and Error Correcting Codes. Introduction. Modular Arithmetic.

Chapter 2: Matrix Algebra

Simple groups and the classification of finite groups

(K + L)(c x) = K(c x) + L(c x) (def of K + L) = K( x) + K( y) + L( x) + L( y) (K, L are linear) = (K L)( x) + (K L)( y).

ELEMENTARY SUBALGEBRAS OF RESTRICTED LIE ALGEBRAS

Linear Algebra. F n = {all vectors of dimension n over field F} Linear algebra is about vectors. Concretely, vectors look like this:

This last statement about dimension is only one part of a more fundamental fact.

Lecture 12. Block Diagram

Simplicial complexes, Demi-matroids, Flag of linear codes and pair of matroids

Definitions, Theorems and Exercises. Abstract Algebra Math 332. Ethan D. Bloch

Sphere Packing and Shannon s Theorem

x 1 x 2. x 1, x 2,..., x n R. x n

Elementary linear algebra

Coding Theory: Linear-Error Correcting Codes Anna Dovzhik Math 420: Advanced Linear Algebra Spring 2014

Math 121 Homework 5: Notes on Selected Problems

Section 3 Error Correcting Codes (ECC): Fundamentals

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

Lecture Notes 1: Vector spaces

Definition 2.1. Let w be a word. Then the coset C + w of w is the set {c + w : c C}.

Gaussian elimination

MIDTERM I LINEAR ALGEBRA. Friday February 16, Name PRACTICE EXAM SOLUTIONS

Vector Space Basics. 1 Abstract Vector Spaces. 1. (commutativity of vector addition) u + v = v + u. 2. (associativity of vector addition)

MATH 433 Applied Algebra Lecture 22: Review for Exam 2.

ARCS IN FINITE PROJECTIVE SPACES. Basic objects and definitions

Exam 1 - Definitions and Basic Theorems

Know the meaning of the basic concepts: ring, field, characteristic of a ring, the ring of polynomials R[x].

The extended Golay code

Vector spaces. EE 387, Notes 8, Handout #12

A PRIMER ON SESQUILINEAR FORMS

Transcription:

Contents

1 Introduction and definitions
2 Good codes
  2.1 The main coding theory problem
  2.2 The Singleton bound
  2.3 The Hamming bound
  2.4 The Plotkin bound
3 Error probabilities and nearest-neighbour decoding
  3.1 Noisy channels and decoding processes
  3.2 Rate, capacity and Shannon's Theorem
4 Linear codes
  4.1 Revision of linear algebra
  4.2 Finite fields and linear codes
  4.3 The minimum distance of a linear code
  4.4 Bases and generator matrices
  4.5 Equivalence of linear codes
  4.6 Decoding with a linear code
5 Dual codes and parity-check matrices
  5.1 The dual code
  5.2 Syndrome decoding
6 Some examples of linear codes
  6.1 Reed–Muller codes
  6.2 Hamming codes
  6.3 The ternary Golay code
  6.4 Existence of codes and linear independence
  6.5 MDS codes

1 Introduction and definitions

We begin with the most important definitions.

Definition. An alphabet is a finite set of symbols. If an alphabet $A$ contains exactly $q$ symbols, we call it a $q$-ary alphabet. In fact, we usually say "binary" rather than "2-ary", and "ternary" rather than "3-ary".

Definition. If $A$ is an alphabet, a word of length $n$ over $A$ is a string $a_1 \dots a_n$ of elements of $A$. The Hamming space $A^n$ is the set of all words of length $n$ over $A$. A code of length $n$ over $A$ is a subset of $A^n$, i.e. a set of words of length $n$ over $A$. If $A$ is a $q$-ary alphabet, we say that a code over $A$ is a $q$-ary code.

Definition. Given two words $x = x_1 \dots x_n$ and $y = y_1 \dots y_n$ of length $n$ over an alphabet $A$, we define the distance $d(x, y)$ between them to be the number of values of $i$ for which $x_i \neq y_i$.

Lemma 1.1. $d$ is a metric on $A^n$, i.e.:

1. $d(x, x) = 0$ for all $x \in A^n$;
2. $d(x, y) > 0$ for all $x \neq y \in A^n$;
3. $d(x, y) = d(y, x)$ for all $x, y \in A^n$;
4. (the triangle inequality) $d(x, z) \leq d(x, y) + d(y, z)$ for all $x, y, z \in A^n$.

Proof. (1), (2) and (3) are very easy, so let's do (4). Now $d(x, z)$ is the number of values $i$ for which $x_i \neq z_i$. Note that if $x_i \neq z_i$, then either $x_i \neq y_i$ or $y_i \neq z_i$. Hence
$$\{i \mid x_i \neq z_i\} \subseteq \{i \mid x_i \neq y_i\} \cup \{i \mid y_i \neq z_i\}.$$
So
$$d(x, z) = |\{i \mid x_i \neq z_i\}| \leq |\{i \mid x_i \neq y_i\}| + |\{i \mid y_i \neq z_i\}| = d(x, y) + d(y, z).$$

Now we can talk about error detection and correction.

Definition. Suppose $C$ is a code of length $n$ over an alphabet $A$ and $t > 0$.

- $C$ is $t$-error-detecting if $d(x, y) > t$ for any $x \neq y$ in $C$.
- $C$ is $t$-error-correcting if there do not exist distinct words $x, y \in C$ and $z \in A^n$ such that $d(x, z) \leq t$ and $d(y, z) \leq t$.
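Nothing in the notes depends on a computer, but the definitions above are easy to experiment with. Here is a minimal Python sketch (the helper names are mine, not from the notes) which computes $d(x, y)$ and tests the two conditions by brute force over $A^n$; it is only practical for very small codes.

```python
from itertools import combinations, product

def hamming_distance(x, y):
    """d(x, y): the number of positions in which x and y differ (equal-length strings)."""
    return sum(1 for a, b in zip(x, y) if a != b)

def is_t_error_detecting(code, t):
    """d(x, y) > t for all distinct codewords x, y."""
    return all(hamming_distance(x, y) > t for x, y in combinations(code, 2))

def is_t_error_correcting(code, t, alphabet="01"):
    """No word z of A^n lies within distance t of two distinct codewords."""
    n = len(next(iter(code)))
    for z in ("".join(w) for w in product(alphabet, repeat=n)):
        if sum(1 for x in code if hamming_distance(x, z) <= t) > 1:
            return False
    return True

C = {"00000", "11111"}                # binary repetition code of length 5
print(is_t_error_detecting(C, 4))     # True: the two words differ in every position
print(is_t_error_correcting(C, 2))    # True
print(is_t_error_correcting(C, 3))    # False: 00111 is within distance 3 of both words
```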

Informally, a code is $t$-error-detecting if, whenever we take a word from the code and change at most $t$ of the symbols in it, we don't reach a different word in the code. So if we send the new word to someone, he'll be able to tell that we've changed some symbols. A code is $t$-error-correcting if whenever we take a word in the code and change at most $t$ of the symbols in it, we don't reach a different word in the code, and we don't even reach a word which can be obtained from a different starting word by changing at most $t$ of the symbols. So if we send the new word to someone without telling him which symbols we changed, he will be able to tell us which word from the code we started with.

In fact, rather than thinking about error detection and correction, we'll talk about minimum distance.

Definition. If $C$ is a code, we define its minimum distance $d(C)$ to be the smallest distance between distinct words in $C$:
$$d(C) = \min\{d(x, y) \mid x \neq y \in C\}.$$

By definition, $C$ is $t$-error-detecting if and only if $d(C) > t$. We can also express error correction in terms of minimum distance.

Lemma 1.2. A code $C$ is $t$-error-correcting if and only if $d(C) > 2t$.

Proof. First suppose $C$ is not $t$-error-correcting. This means that we can find words $x, y \in C$ with $x \neq y$, and another word $z$ (not necessarily in $C$) such that $d(x, z) \leq t$ and $d(y, z) \leq t$. By the triangle inequality, we get $d(x, y) \leq 2t$, and therefore $d(C) \leq 2t$.

Conversely, suppose $d(C) \leq 2t$; we'll show that $C$ is not $t$-error-correcting. Take words $x \neq y \in C$ such that $d(x, y) = d \leq 2t$. Let $i_1, i_2, \dots, i_d$ be the positions where $x$ and $y$ are different. Define a word $z$ as follows: set $z_i = x_i$ for $i \in \{i_1, \dots, i_t\}$, set $z_i = y_i$ for $i \in \{i_{t+1}, \dots, i_d\}$, and set $z_i = x_i = y_i$ for all other $i$. Then $x$ and $z$ differ only in positions $i_{t+1}, \dots, i_d$, so $d(x, z) = d - t \leq t$; $y$ and $z$ differ only in positions $i_1, \dots, i_t$, so $d(y, z) = t$. And so $C$ is not $t$-error-correcting.

Key example. Suppose $A$ is any alphabet, and $n > 0$. The repetition code of length $n$ over $A$ simply consists of all words $aa \dots a$, for $a \in A$. For this code, any two distinct words differ in every position, and so $d(x, y) = n$ for all $x \neq y$ in $C$. So the code is $t$-error-detecting for every $t \leq n - 1$, and is $t$-error-correcting for every $t \leq \frac{n-1}{2}$.

Terminology. A $q$-ary $(n, M, d)$-code means a code of length $n$ over a $q$-ary alphabet, containing exactly $M$ words, and with minimum distance at least $d$.

Equivalent codes

Definition. If $C$ and $D$ are codes of length $n$ over an alphabet $A$, we say that $C$ is equivalent to $D$ if we can get from $C$ to $D$ by a combination of the following operations.

Operation 1: permutation of the positions. Choose a permutation $\sigma$ of $\{1, \dots, n\}$, and for any word $v = v_1 \dots v_n$ in $A^n$ define $v_\sigma = v_{\sigma(1)} \dots v_{\sigma(n)}$. Now replace $C$ with the code $C_\sigma = \{v_\sigma \mid v \in C\}$.

Operation 2: applying a permutation of $A$ in a fixed position. Choose a permutation $f$ of $A$ and an integer $i \in \{1, \dots, n\}$, and for $v = v_1 \dots v_n$ define $v_{f,i} = v_1 \dots v_{i-1} f(v_i) v_{i+1} \dots v_n$. Now replace $C$ with the code $C_{f,i} = \{v_{f,i} \mid v \in C\}$.

The point of equivalence is that equivalent codes have the same size and the same minimum distance; we can often simplify both decoding procedures and some of our proofs by replacing codes with equivalent codes.

Lemma 1.3. Suppose $v$ and $w$ are words of length $n$ over an alphabet $A$, and suppose $\sigma$ is a permutation of $\{1, \dots, n\}$. Define the words $v_\sigma$ and $w_\sigma$ as above. Then $d(v_\sigma, w_\sigma) = d(v, w)$.

Proof. Write $v = v_1 \dots v_n$, $w = w_1 \dots w_n$, so that $v_\sigma = v_{\sigma(1)} \dots v_{\sigma(n)}$ and $w_\sigma = w_{\sigma(1)} \dots w_{\sigma(n)}$. Now $d(v, w)$ is the number of positions in which $v$ and $w$ differ, i.e.
$$d(v, w) = |\{j \mid v_j \neq w_j\}|;$$
but $\sigma$ is a permutation, so each $j$ can be uniquely written in the form $\sigma(i)$. Hence
$$d(v, w) = |\{i \mid v_{\sigma(i)} \neq w_{\sigma(i)}\}| = |\{i \mid (v_\sigma)_i \neq (w_\sigma)_i\}| = d(v_\sigma, w_\sigma).$$

Corollary 1.4. Suppose $C$ is a code of length $n$ over $A$ and $\sigma$ is a permutation of $\{1, \dots, n\}$. Then $d(C_\sigma) = d(C)$ and $|C| = |C_\sigma|$.

Proof. First note that for words $v, w \in C$, using Lemma 1.3 for the middle step,
$$v \neq w \iff d(v, w) > 0 \iff d(v_\sigma, w_\sigma) > 0 \iff v_\sigma \neq w_\sigma. \qquad (*)$$
Hence
$$d(C_\sigma) = \min\{d(x, y) \mid x \neq y \in C_\sigma\} = \min\{d(v_\sigma, w_\sigma) \mid v \neq w \in C\} = \min\{d(v, w) \mid v \neq w \in C\} = d(C),$$
where the last equality but one holds by Lemma 1.3. To show that $|C_\sigma| = |C|$, we consider the function
$$\varphi : C \to C_\sigma, \qquad v \mapsto v_\sigma,$$
which we claim is a bijection. $\varphi$ is surjective, because $C_\sigma$ is defined to be the image of $\varphi$. $\varphi$ is also injective, because if $v \neq w \in C$, then by $(*)$ we have $v_\sigma \neq w_\sigma$, i.e. $\varphi(v) \neq \varphi(w)$. So $\varphi$ is a bijection, so $|C| = |C_\sigma|$.

Now we prove the same properties for Operation 2.

Lemma 1.5. Suppose $v$ and $w$ are words in $A^n$. Suppose $f$ is a permutation of $A$ and $i \in \{1, \dots, n\}$, and define the words $v_{f,i}$ and $w_{f,i}$ as above. Then $d(v_{f,i}, w_{f,i}) = d(v, w)$.

Proof. Recall that
$$(v_{f,i})_j = \begin{cases} v_j & (j \neq i) \\ f(v_i) & (j = i). \end{cases}$$
We consider two cases. If $v_i = w_i$, then $f(v_i) = f(w_i)$, so
$$d(v, w) = |\{j \neq i \mid v_j \neq w_j\}| = d(v_{f,i}, w_{f,i}).$$
If $v_i \neq w_i$, then (since $f$ is injective) $f(v_i) \neq f(w_i)$, and so
$$d(v, w) = |\{j \neq i \mid v_j \neq w_j\}| + 1 = d(v_{f,i}, w_{f,i}).$$

Corollary 1.6. Suppose $C$ is a code of length $n$ over $A$, $f$ is a permutation of $A$ and $i \in \{1, \dots, n\}$. Then $d(C_{f,i}) = d(C)$ and $|C_{f,i}| = |C|$.

Proof. Copy the proof of Corollary 1.4.

Corollary 1.7. Suppose $C$ is an $(n, M, d)$-code over $A$, and $D$ is a code equivalent to $C$. Then $D$ is an $(n, M, d)$-code.

Proof. Obviously $D$ is a code of length $n$. We get from $C$ to $D$ by repeatedly applying Operations 1 and/or 2, and by Corollaries 1.4 and 1.6 these operations do not change the size or the minimum distance of a code.
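Corollaries 1.4, 1.6 and 1.7 can also be checked experimentally. A sketch of the two operations in Python, reusing the hamming_distance helper from the earlier sketch (positions are 0-indexed here, unlike the notes):

```python
from itertools import combinations

def min_distance(code):
    """d(C): the smallest distance between distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(code, 2))

def apply_position_permutation(code, sigma):
    """Operation 1 (0-indexed): (v_sigma)_i = v_{sigma(i)}."""
    return {"".join(v[sigma[i]] for i in range(len(v))) for v in code}

def apply_symbol_permutation(code, f, i):
    """Operation 2: apply the alphabet permutation f (given as a dict) in position i."""
    return {v[:i] + f[v[i]] + v[i + 1:] for v in code}

C = {"000", "011", "110"}
for D in (apply_position_permutation(C, (2, 0, 1)),
          apply_symbol_permutation(C, {"0": "1", "1": "0"}, 0)):
    print(len(D) == len(C), min_distance(D) == min_distance(C))   # True True, both times
```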

2 Good codes

2.1 The main coding theory problem

The most basic question we might ask about codes is: given $q$, $n$, $M$ and $d$, does a $q$-ary $(n, M, d)$-code exist? More precisely, we make the following definition.

Definition. Suppose $q, n, d > 0$. $A_q(n, d)$ is defined to be the maximum $M$ such that a $q$-ary $(n, M, d)$-code exists.

The numbers $A_q(n, d)$ are unknown in general, and calculating them is often referred to as the main coding theory problem. Here are two very special cases.

Theorem 2.1.

1. $A_q(n, 1) = q^n$.
2. $A_q(n, n) = q$.

Proof.

1. We can take $C = A^n$, the set of all words of length $n$. Any two distinct words must differ in at least one position, so the code has minimum distance at least 1. Obviously a $q$-ary code of length $n$ can't be bigger than this.

2. Suppose we have a code of length $n$ with at least $q + 1$ words. Then by the pigeonhole principle there must be two words with the same first symbol. These two words can therefore differ in at most $n - 1$ positions, and so the code has minimum distance less than $n$. So $A_q(n, n) \leq q$. On the other hand, the repetition code described above is an $(n, q, n)$-code.

From the above proof we observe the following general principle, which is very useful in proofs.

- $A_q(n, d) \geq A$ if and only if a $q$-ary $(n, A, d)$-code exists.
- $A_q(n, d) \leq A$ if and only if there does not exist a $q$-ary $(n, M, d)$-code with $M > A$.

2.2 The Singleton bound

Now we prove our first substantial theorem.

Theorem 2.2 (Singleton bound).

1. Suppose $n, d > 1$. If a $q$-ary $(n, M, d)$-code exists, then a $q$-ary $(n - 1, M, d - 1)$-code exists.
2. Suppose $n, d > 1$. Then $A_q(n, d) \leq A_q(n - 1, d - 1)$.
3. Suppose $n, d \geq 1$. Then $A_q(n, d) \leq q^{n - d + 1}$.

Proof.

1. Let $C$ be a $q$-ary $(n, M, d)$-code, and for $x \in C$, let $x'$ be the word obtained by deleting the last symbol. Let $C' = \{x' \mid x \in C\}$.

Claim. If $x \neq y \in C$, then $d(x', y') \geq d - 1$.

Proof. We have $d(x, y) \geq d$, so $x$ and $y$ differ in at least $d$ positions; so there are at least $d - 1$ positions other than position $n$ where $x$ and $y$ differ. Hence
$$d - 1 \leq |\{i < n \mid x_i \neq y_i\}| = d(x', y').$$

The first consequence of the claim is that, since $d > 1$, $x'$ and $y'$ are distinct when $x$ and $y$ are. So $|C'| = M$. The second consequence is that $d(C') \geq d - 1$. So $C'$ is an $(n - 1, M, d - 1)$-code.

2. To show that $A_q(n, d) \leq A_q(n - 1, d - 1)$, take an $(n, M, d)$-code $C$ with $M = A_q(n, d)$. Then by part (1) there is an $(n - 1, M, d - 1)$-code, which means that $A_q(n - 1, d - 1) \geq M = A_q(n, d)$.

3. Applying part (2) repeatedly, we have
$$A_q(n, d) \leq A_q(n - 1, d - 1) \leq A_q(n - 2, d - 2) \leq \dots \leq A_q(n - d + 1, 1) = q^{n - d + 1},$$
from Theorem 2.1.

Now we prove another result which only applies to binary codes. We begin with a lemma.

Lemma 2.3. Suppose $x$ and $y$ are words over the alphabet $\{0, 1\}$, both containing an even number of 1s. Then $d(x, y)$ is even.

Proof.
$$d(x, y) = |\{i \mid x_i = 1 \text{ and } y_i = 0\}| + |\{i \mid x_i = 0 \text{ and } y_i = 1\}|$$
$$= |\{i \mid x_i = 1 \text{ and } y_i = 0\}| + |\{i \mid x_i = 1 \text{ and } y_i = 1\}| + |\{i \mid x_i = 0 \text{ and } y_i = 1\}| + |\{i \mid x_i = 1 \text{ and } y_i = 1\}| - 2|\{i \mid x_i = 1 \text{ and } y_i = 1\}|$$
$$= |\{i \mid x_i = 1\}| + |\{i \mid y_i = 1\}| - 2|\{i \mid x_i = 1 \text{ and } y_i = 1\}|,$$
which is the sum of three even numbers, so is even.

Theorem 2.4. Suppose $n, d > 0$ and $d$ is even. If a binary $(n - 1, M, d - 1)$-code exists, then a binary $(n, M, d)$-code exists. Hence if $d$ is even, then $A_2(n, d) \geq A_2(n - 1, d - 1)$.

Proof. Suppose we have a binary $(n - 1, M, d - 1)$-code $C$. Given a word $x \in C$, we form a word $\hat{x}$ of length $n$ by adding an extra symbol, which we choose to be 0 or 1 in such a way that $\hat{x}$ contains an even number of 1s. Now for any $x \neq y \in C$, we have $d(x, y) \geq d - 1$, and clearly this gives $d(\hat{x}, \hat{y}) \geq d - 1$. But $d - 1$ is odd, and by Lemma 2.3 $d(\hat{x}, \hat{y})$ is even, so in fact we have $d(\hat{x}, \hat{y}) \geq d$. So the code $\hat{C} = \{\hat{x} \mid x \in C\}$ is an $(n, M, d)$-code. If we do this with $M = A_2(n - 1, d - 1)$, we get $A_2(n, d) \geq M = A_2(n - 1, d - 1)$.

As a consequence of this theorem and the Singleton bound, we have $A_2(n, d) = A_2(n - 1, d - 1)$ whenever $d$ is even.
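Both constructions in this subsection — deleting the last symbol (Theorem 2.2) and appending a parity symbol (Theorem 2.4) — are one-liners in code. A sketch, assuming binary words as strings:

```python
def puncture(code):
    """Delete the last symbol of each word: an (n, M, d)-code gives an (n-1, M, d-1)-code (Theorem 2.2)."""
    return {x[:-1] for x in code}

def extend_with_parity(code):
    """Append a parity bit so each word has evenly many 1s (Theorem 2.4; odd d becomes d + 1)."""
    return {x + str(x.count("1") % 2) for x in code}

C = {"00", "01", "10", "11"}            # a binary (2, 4, 1)-code
print(sorted(extend_with_parity(C)))    # ['000', '011', '101', '110']: a (3, 4, 2)-code
print(sorted(puncture(extend_with_parity(C))) == sorted(C))   # True
```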

2.3 The Hamming bound

Lemma 2.5. Suppose $v$ is a word of length $n$ over a $q$-ary alphabet $A$, and $d \geq 0$. Then the number of words $w \in A^n$ such that $d(v, w) = d$ is $(q - 1)^d \binom{n}{d}$.

Proof. In order to write a word $w$ such that $d(v, w) = d$, we must choose $d$ positions where $v$ and $w$ will be different, and then we must choose which symbols will appear in these positions in $w$. We can choose the positions in $\binom{n}{d}$ ways; in each of these positions, we can choose any symbol from $A$ other than the symbol appearing in this position in $v$; this leaves $q - 1$ choices for each position. So the total number of choices is $(q - 1)^d \binom{n}{d}$.

Definition. If $x \in A^n$ is a word, then the sphere of radius $t$ and centre $x$ is
$$S(x, t) = \{y \in A^n \mid d(x, y) \leq t\}.$$

Lemma 2.6. If $A$ is a $q$-ary alphabet, $x$ is a word over $A$ of length $n$ and $t \leq n$, then the sphere $S(x, t)$ contains exactly
$$\binom{n}{0} + (q - 1)\binom{n}{1} + (q - 1)^2\binom{n}{2} + \dots + (q - 1)^t\binom{n}{t}$$
words.

Proof. This follows from Lemma 2.5, by summing over $d = 0, \dots, t$.

Lemma 2.7. Suppose $C$ is a code of length $n$ over $A$. $C$ is $t$-error-correcting if and only if for all $x \neq y \in C$, the spheres $S(x, t)$ and $S(y, t)$ are disjoint.

Proof.
$C$ is $t$-error-correcting
$\iff$ there do not exist $x \neq y \in C$ and $z \in A^n$ such that $d(x, z) \leq t$ and $d(y, z) \leq t$
$\iff$ there do not exist $x \neq y \in C$ and $z \in A^n$ such that $z \in S(x, t)$ and $z \in S(y, t)$
$\iff$ there do not exist $x \neq y \in C$ such that $S(x, t) \cap S(y, t) \neq \emptyset$
$\iff$ for all $x \neq y \in C$, $S(x, t) \cap S(y, t) = \emptyset$.

Theorem 2.8. Suppose $A$ is a $q$-ary alphabet, and $C$ is a code of length $n$ over $A$ which is $t$-error-correcting. Then
$$|C| \leq \frac{q^n}{\binom{n}{0} + (q - 1)\binom{n}{1} + (q - 1)^2\binom{n}{2} + \dots + (q - 1)^t\binom{n}{t}}.$$

Proof. Each word in $C$ has a sphere of radius $t$ around it, and by Lemma 2.7 these spheres are disjoint. So the total number of words in all these spheres together is
$$|C|\left(\binom{n}{0} + (q - 1)\binom{n}{1} + (q - 1)^2\binom{n}{2} + \dots + (q - 1)^t\binom{n}{t}\right),$$
and this can't be bigger than the total number of words in $A^n$, which is $q^n$.

Corollary 2.9 (Hamming bound). For $n, q, t > 0$,
$$A_q(n, 2t + 1) \leq \frac{q^n}{\binom{n}{0} + (q - 1)\binom{n}{1} + (q - 1)^2\binom{n}{2} + \dots + (q - 1)^t\binom{n}{t}}.$$

Proof. If $C$ is a $q$-ary $(n, M, 2t + 1)$-code, then by Lemma 1.2 $C$ is $t$-error-correcting. So by Theorem 2.8,
$$|C| \leq \frac{q^n}{\binom{n}{0} + (q - 1)\binom{n}{1} + (q - 1)^2\binom{n}{2} + \dots + (q - 1)^t\binom{n}{t}}.$$

Definition. Suppose $C$ is a $q$-ary code of length $n$, and $t > 0$. $C$ is a perfect $t$-error-correcting code if $d(C) = 2t + 1$ and
$$|C| = \frac{q^n}{\binom{n}{0} + (q - 1)\binom{n}{1} + (q - 1)^2\binom{n}{2} + \dots + (q - 1)^t\binom{n}{t}}.$$

2.4 The Plotkin bound

The Plotkin bound is more complicated, but more useful. There is a version for arbitrary $q$, but we'll address only binary codes. First we need to recall some notation: remember that if $x \in \mathbb{R}$, then $\lfloor x \rfloor$ is the largest integer which is less than or equal to $x$.

Lemma 2.10. If $x \in \mathbb{R}$, then $\lfloor 2x \rfloor \leq 2\lfloor x \rfloor + 1$.

Proof. Let $y = \lfloor x \rfloor$; then $x < y + 1$, so $2x < 2y + 2$, and so $\lfloor 2x \rfloor < 2y + 2$. But both sides of this inequality are integers, so we get $\lfloor 2x \rfloor \leq 2y + 1$.

Now we can state the Plotkin bound; there are two cases, depending on whether $d$ is even or odd.

Theorem 2.11 (Plotkin bound).

1. If $d$ is even and $d \leq n < 2d$, then
$$A_2(n, d) \leq 2\left\lfloor \frac{d}{2d - n} \right\rfloor.$$
2. If $d$ is odd and $d \leq n < 2d + 1$, then
$$A_2(n, d) \leq 2\left\lfloor \frac{d + 1}{2d + 1 - n} \right\rfloor.$$

We now begin the proof of this bound. The proof is not examinable, but you are encouraged to read it.

Non-examinable material

Lemma 2.12. Suppose $N, M$ are integers with $0 \leq N \leq M$. Then
$$N(M - N) \leq \begin{cases} \frac{M^2}{4} & (\text{if } M \text{ is even}) \\ \frac{M^2 - 1}{4} & (\text{if } M \text{ is odd}). \end{cases}$$

Proof. The graph of $N(M - N)$ is an unhappy quadratic with its turning point at $N = M/2$, so to maximise it we want to make $N$ as near as possible to this (remembering that $N$ must be an integer). If $M$ is even, then we can take $N = M/2$, while if $M$ is odd we take $N = (M - 1)/2$.

The proof of the Plotkin bound is a double-counting argument. We suppose that our alphabet is $\{0, 1\}$, and if $v = v_1 \dots v_n$ and $w = w_1 \dots w_n$ are words in $C$, then we define $v + w$ to be the word
$$(v_1 + w_1)(v_2 + w_2) \dots (v_n + w_n),$$
where we do addition modulo 2 (so $1 + 1 = 0$). Now given a binary $(n, M, d)$-code $C$, we write down a $\binom{M}{2}$ by $n$ array $B$ whose rows are all the words $v + w$ for pairs of distinct words $v, w \in C$. We're going to count the number of 1s in this array in two different ways.

A really useful feature of the addition operation we have defined on words is the following.

Lemma 2.13. Suppose $v, w$ are binary words of length $n$. Then $d(v, w)$ is the number of 1s in $v + w$.

Proof. By looking at the possibilities for $v_i$ and $w_i$, we see that
$$(v + w)_i = \begin{cases} 0 & (\text{if } v_i = w_i) \\ 1 & (\text{if } v_i \neq w_i). \end{cases}$$
So
$$d(v, w) = |\{i \mid v_i \neq w_i\}| = |\{i \mid (v + w)_i = 1\}|.$$

Lemma 2.14. The number of 1s in $B$ is at least $d\binom{M}{2}$.

Proof. We look at the number of 1s in each row. Since $d(C) \geq d$, we have $d(v, w) \geq d$ for each $v \neq w \in C$, so by Lemma 2.13 there are at least $d$ 1s in each row. Summing over all the rows gives the result.

Now we count the 1s in $B$ in a different way.

Lemma 2.15. The number of 1s in $B$ is at most
$$\begin{cases} \frac{nM^2}{4} & (\text{if } M \text{ is even}) \\ \frac{n(M^2 - 1)}{4} & (\text{if } M \text{ is odd}). \end{cases}$$

Proof. We count the number of 1s in each column. Choose $j \in \{1, \dots, n\}$. The word $v + w$ has a 1 in the $j$th position if and only if one of $v$ and $w$ has a 1 in the $j$th position, and the other has a 0. If we let $N$ be the number of words in $C$ which have a 1 in the $j$th position, then the number of ways of choosing a pair $v, w$ such that $v + w$ has a 1 in the $j$th position is $N(M - N)$. So the number of 1s in the $j$th column of our array is
$$N(M - N) \leq \begin{cases} \frac{M^2}{4} & (\text{if } M \text{ is even}) \\ \frac{M^2 - 1}{4} & (\text{if } M \text{ is odd}) \end{cases}$$
by Lemma 2.12. This is true for every $j$, so by adding up we obtain the desired inequality.

Proof of the Plotkin bound.

1. Suppose we have a binary $(n, M, d)$-code $C$ with $M = A_2(n, d)$, and construct the array $B$ as above. There are two cases, according to whether $M$ is even or odd.

Case 1: $M$ even. By combining Lemma 2.14 and Lemma 2.15, we get
$$d\binom{M}{2} = \frac{dM(M - 1)}{2} \leq \frac{nM^2}{4}.$$
Since $M > 0$, we get
$$2d(M - 1) \leq nM, \qquad\text{i.e.}\qquad (2d - n)M \leq 2d.$$
By assumption $2d - n > 0$, so we divide both sides by $2d - n$ to get
$$M \leq \frac{2d}{2d - n}.$$
Since $M$ is an integer, we can say (using Lemma 2.10)
$$M \leq \left\lfloor \frac{2d}{2d - n} \right\rfloor \leq 2\left\lfloor \frac{d}{2d - n} \right\rfloor + 1.$$
But $M$ is even and the right-hand side is odd, so
$$M \leq 2\left\lfloor \frac{d}{2d - n} \right\rfloor.$$

Case 2: $M$ odd. Now we combine Lemmas 2.14 and 2.15 to get
$$d\binom{M}{2} = \frac{dM(M - 1)}{2} \leq \frac{n(M^2 - 1)}{4} = \frac{n(M + 1)(M - 1)}{4}.$$
Assuming $M \geq 2$, we get
$$2dM \leq n(M + 1), \qquad\text{i.e.}\qquad (2d - n)M \leq n.$$
Again we have $2d - n > 0$, so we get
$$M \leq \frac{n}{2d - n} = \frac{2d}{2d - n} - 1.$$

Since $M$ is an integer, we get
$$M \leq \left\lfloor \frac{2d}{2d - n} \right\rfloor - 1 \leq 2\left\lfloor \frac{d}{2d - n} \right\rfloor + 1 - 1,$$
so
$$A_2(n, d) = M \leq 2\left\lfloor \frac{d}{2d - n} \right\rfloor.$$

2. Now we consider the case where $d$ is odd; this follows by Theorem 2.4. If $d$ is odd and $n < 2d + 1$, then $d + 1$ is even and $n + 1 < 2(d + 1)$. So by the even case of the Plotkin bound we have
$$A_2(n + 1, d + 1) \leq 2\left\lfloor \frac{d + 1}{2(d + 1) - (n + 1)} \right\rfloor = 2\left\lfloor \frac{d + 1}{2d + 1 - n} \right\rfloor,$$
and by Theorem 2.4 this equals $A_2(n, d)$.

End of non-examinable material
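The Hamming and Plotkin bounds are easy to evaluate numerically. A sketch (the function names are mine); note that the Hamming bound uses integer division, which is valid since $|C|$ must be an integer:

```python
from math import comb

def sphere_size(q, n, t):
    """|S(x, t)|, from Lemma 2.6."""
    return sum((q - 1) ** d * comb(n, d) for d in range(t + 1))

def hamming_bound(q, n, t):
    """Theorem 2.8 / Corollary 2.9: an upper bound on A_q(n, 2t + 1)."""
    return q ** n // sphere_size(q, n, t)

def plotkin_bound(n, d):
    """Theorem 2.11: an upper bound on A_2(n, d) in the stated ranges of n."""
    if d % 2 == 0:
        assert d <= n < 2 * d
        return 2 * (d // (2 * d - n))
    assert d <= n < 2 * d + 1
    return 2 * ((d + 1) // (2 * d + 1 - n))

print(hamming_bound(2, 7, 1))   # 16; a perfect code attaining this exists (the [7,4] Hamming code)
print(plotkin_bound(8, 5))      # 4 = 2 * floor(6/3)
print(plotkin_bound(10, 6))     # 6 = 2 * floor(6/2)
```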

3 Error probabilities and nearest-neighbour decoding

3.1 Noisy channels and decoding processes

In this section, we consider the situations in which our codes might be used, and show why we try to get a large distance between words. The idea is that we choose a code $C$, choose a word $v \in C$, and transmit $v$ along a noisy channel, which might introduce a few errors and produce a distorted word. The person receiving this word knows what code $C$ we're using, and guesses which word in $C$ we started from.

Definition. Given a code $C$ of length $n$ over the alphabet $A$, a decoding process for $C$ is a function from $A^n$ to $C$. A nearest-neighbour decoding process for $C$ is a function $f : A^n \to C$ such that
$$d(v, f(v)) \leq d(v, w) \qquad\text{for all } v \in A^n,\ w \in C.$$
What this means is that a nearest-neighbour decoding process sends a word to the nearest possible word in $C$.

We make certain assumptions about our noisy channel, namely that all errors are independent and equally likely. This means that there is a fixed $p$ (the symbol error probability) such that any symbol will be transmitted correctly with probability $1 - p$, or incorrectly with probability $p$, and that if there is an error then all the incorrect symbols are equally likely. Moreover, errors on different symbols are independent: whether an error occurs in one symbol has no effect on whether errors occur in later symbols.

Given a code $C$ and a decoding process $f : A^n \to C$, we consider the word error probability for each $v \in C$: this is the probability that if we transmit the word $v$ through our channel, we end up with a word $w$ such that $f(w) \neq v$.

Example. Suppose $A = \{0, 1\}$, and $C = \{000, 011, 110\}$. A nearest-neighbour decoding process for $C$ is
$$f : \quad 000 \mapsto 000, \quad 001 \mapsto 011, \quad 010 \mapsto 011, \quad 011 \mapsto 011, \quad 100 \mapsto 000, \quad 101 \mapsto 110, \quad 110 \mapsto 110, \quad 111 \mapsto 110.$$
(This is not the only possible choice of a nearest-neighbour decoding process.)

Suppose our noisy channel has symbol error probability $\frac{1}{5}$. Let's work out the word error probability for the word 000. This is the probability that when we transmit 000, the word $w$ that comes out has $f(w) \neq 000$. This is
$$1 - P(f(w) = 000) = 1 - P(w = 000 \text{ or } 100) = 1 - \tfrac{4}{5} \cdot \tfrac{4}{5} \cdot \tfrac{4}{5} - \tfrac{1}{5} \cdot \tfrac{4}{5} \cdot \tfrac{4}{5} = \tfrac{9}{25}.$$
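We can confirm this example by enumerating all eight received words; a sketch, reusing the hamming_distance helper from the sketch in Section 1:

```python
from itertools import product

C = ["000", "011", "110"]
f = {"000": "000", "001": "011", "010": "011", "011": "011",
     "100": "000", "101": "110", "110": "110", "111": "110"}
p = 1 / 5   # symbol error probability

def word_error_probability(v):
    """P(f(w) != v) when v is sent and each symbol is flipped independently with probability p."""
    return sum(p ** hamming_distance(v, w) * (1 - p) ** (3 - hamming_distance(v, w))
               for w in ("".join(u) for u in product("01", repeat=3))
               if f[w] != v)

print(word_error_probability("000"))   # 0.36 = 9/25, as computed above
```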

In general, word error probability depends on the symbol error probability, the code, the decoding process and the word chosen, and we seek a decoding process which minimises the maximum word error probability. It can be shown that (as long as $p$ is small) the best decoding process in this respect is always a nearest-neighbour decoding process.

3.2 Rate, capacity and Shannon's Theorem

Definition. Given a $q$-ary $(n, M, d)$-code $C$, we define the rate of $C$ to be
$$\frac{\log_q M}{n}.$$

The rate of a code can be interpreted as the ratio of useful information to total information transmitted. For example, the $q$-ary repetition code of length 3 is a $(3, q, 3)$-code, so has rate $\frac{1}{3}$. The useful information in a word can be thought of as the first digit; the rest of the digits are just redundant information included to reduce error probabilities.

Definition. The capacity of a symmetric binary channel with symbol error probability $p$ is
$$1 + p \log_2 p + (1 - p) \log_2 (1 - p).$$

The capacity of a channel is supposed to be a measure of how good the channel is for communicating. Note that if $p = \frac{1}{2}$, then the capacity is 0. This reflects the fact that it is hopeless transmitting through such a channel: given a received word, the word sent is equally likely to be any word in the code.

Theorem 3.1 (Shannon's Theorem). Suppose we have a noisy channel with capacity $C$. Suppose $\epsilon$ and $\rho$ are positive real numbers with $\rho < C$. Then for any sufficiently large $n$, there exist a binary code of length $n$ and rate at least $\rho$ and a decoding process such that the word error probability is at most $\epsilon$.

What does the theorem say? It says that as long as the rate of our code is less than the capacity of the channel, we can make the word error probability as small as we like. The proof of this theorem is well beyond the scope of this course.
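A quick sketch of the two formulas just defined:

```python
from math import log2

def rate(q, n, M):
    """The rate of a q-ary (n, M, d)-code: log_q(M) / n."""
    return log2(M) / (n * log2(q))

def capacity(p):
    """Capacity of a symmetric binary channel with symbol error probability p."""
    if p in (0, 1):
        return 1.0
    return 1 + p * log2(p) + (1 - p) * log2(1 - p)

print(rate(3, 3, 3))       # 1/3: the ternary repetition code of length 3
print(capacity(0.5))       # 0.0: a hopeless channel
print(capacity(1 / 5))     # about 0.278, for the channel in the example above
```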

4 Linear codes

For the rest of the course, we shall be restricting our attention to linear codes; these are codes in which the alphabet $A$ is a finite field, and the code itself forms a vector space over $A$. These codes are of great interest because:

- they are easy to describe: we need only specify a basis for our code;
- it is easy to calculate the minimum distance of a linear code: we need only calculate the distance of each word from the word $00\dots0$;
- it is easy to decode an error-correcting linear code, via syndrome decoding;
- many of the best codes known are linear; in particular, every known non-trivial perfect code has the same parameters (i.e. length, number of words and minimum distance) as some linear code.

4.1 Revision of linear algebra

Non-examinable material

Definition. A field is a set $F$ with distinguished elements 0 and 1 and binary operations $+$ and $\cdot$ such that:

- $F$ is an abelian group under $+$, with identity element 0 (that is, we have $a + b = b + a$, $(a + b) + c = a + (b + c)$ and $a + 0 = a$ for all $a, b, c \in F$, and for each $a \in F$ there exists an element $-a$ of $F$ such that $a + (-a) = 0$);
- $F \setminus \{0\}$ is an abelian group under $\cdot$, with identity element 1 (that is, we have $a \cdot b = b \cdot a$, $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ and $a \cdot 1 = a$ for all $a, b, c \in F \setminus \{0\}$, and for each $a \in F \setminus \{0\}$ there exists an element $a^{-1}$ of $F$ such that $a^{-1} \cdot a = 1$);
- $a \cdot (b + c) = (a \cdot b) + (a \cdot c)$ for all $a, b, c \in F$.

Definition. If $F$ is a field, a vector space over $F$ is a set $V$ with a distinguished element 0, a binary operation $+$ and a function $\cdot : F \times V \to V$ (that is, a function which, given an element $\lambda$ of $F$ and an element $v$ of $V$, produces a new element $\lambda \cdot v$ of $V$) such that:

- $V$ is an abelian group under $+$ with identity 0;

- for all $\lambda, \mu \in F$ and $u, v \in V$, we have $(\lambda \cdot \mu) \cdot v = \lambda \cdot (\mu \cdot v)$, $(\lambda + \mu) \cdot v = (\lambda \cdot v) + (\mu \cdot v)$, $\lambda \cdot (u + v) = (\lambda \cdot u) + (\lambda \cdot v)$, and $1 \cdot v = v$.

We make all the familiar notational conventions: we may write $a \cdot b$ as $ab$; we write $a \cdot b^{-1}$ as $a/b$; we write $a + (-b)$ as $a - b$.

Definition. If $V$ is a vector space over $F$, then a subspace is a subset of $V$ which is also a vector space under the same operations; we write $W \leq V$ to mean that $W$ is a subspace of $V$.

To check whether a subset $W \subseteq V$ is a subspace, we can use the Subspace Test: $W$ is a subspace if and only if $0 \in W$, $v + w \in W$ for all $v, w \in W$, and $\lambda v \in W$ for all $v \in W$, $\lambda \in F$.

Definition. Suppose $V$ is a vector space over $F$ and $v_1, \dots, v_n \in V$. Then we say that $v_1, \dots, v_n$ are linearly independent if there do not exist $\lambda_1, \dots, \lambda_n \in F$, not all zero, such that
$$\lambda_1 v_1 + \dots + \lambda_n v_n = 0.$$
A linear combination of $v_1, \dots, v_n$ is an expression $\lambda_1 v_1 + \dots + \lambda_n v_n$, where $\lambda_1, \dots, \lambda_n \in F$. The set of all linear combinations of $v_1, \dots, v_n$ is called the span of $v_1, \dots, v_n$, written as $\langle v_1, \dots, v_n \rangle$. If $\langle v_1, \dots, v_n \rangle = V$, we say that $v_1, \dots, v_n$ span $V$. A basis for $V$ is a set $\{v_1, \dots, v_n\}$ of vectors in $V$ which are linearly independent and span $V$.

Fact. If $\{v_1, \dots, v_n\}$ is a basis of a vector space $V$, then every $v \in V$ can be uniquely written in the form $\lambda_1 v_1 + \dots + \lambda_n v_n$, for $\lambda_1, \dots, \lambda_n \in F$.

Fact. Any vector space $V$ has at least one basis, and all bases of $V$ have the same size.

Definition. If $V$ is a vector space, the dimension of $V$ (written $\dim(V)$) is the number of elements in a basis of $V$.

Useful fact. If $V$ is a vector space of dimension $n$ and $v_1, \dots, v_n$ are vectors in $V$ which are linearly independent or span $V$, then $\{v_1, \dots, v_n\}$ is a basis of $V$.

Definition. Suppose $V, W$ are vector spaces over $F$. A linear map from $V$ to $W$ is a function $\alpha : V \to W$ such that $\alpha(\lambda u + \mu v) = \lambda\alpha(u) + \mu\alpha(v)$ for all $\lambda, \mu \in F$ and $u, v \in V$. If $\alpha$ is a linear map, the kernel of $\alpha$ is the subset $\ker(\alpha) = \{v \in V \mid \alpha(v) = 0\}$ of $V$.

The image of $\alpha$ is the subset $\mathrm{Im}(\alpha) = \{\alpha(v) \mid v \in V\}$ of $W$. $\ker(\alpha)$ is a subspace of $V$, and we refer to its dimension as the nullity of $\alpha$. $\mathrm{Im}(\alpha)$ is a subspace of $W$, and we refer to its dimension as the rank of $\alpha$. The Rank–nullity Theorem says that if $\alpha$ is a linear map from $V$ to $W$, then
$$\mathrm{nullity}(\alpha) + \mathrm{rank}(\alpha) = \dim(V).$$

We shall need the following familiar property of fields.

Lemma 4.1. Let $F$ be a field, and $a, b \in F$.

1. $a0 = 0$.
2. If $ab = 0$, then $a = 0$ or $b = 0$.

Proof.

1. We have $0 = 0 + 0$, so $a0 = a(0 + 0) = a0 + a0$. Adding $-(a0)$ to both sides gives $0 = a0$.
2. Suppose $ab = 0$ but $b \neq 0$. Then we have $bb^{-1} = 1$. So $0 = 0b^{-1} = abb^{-1} = a1 = a$.

We shall only be interested in one particular type of vector space. For a non-negative integer $n$, we consider the set $F^n$, which we think of as the set of column vectors of length $n$ over $F$, with operations
$$\begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} + \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_n + y_n \end{pmatrix} \qquad\text{and}\qquad \lambda \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} \lambda x_1 \\ \vdots \\ \lambda x_n \end{pmatrix}.$$
$F^n$ is a vector space over $F$ of dimension $n$. Sometimes we will think of the elements of $F^n$ as row vectors rather than column vectors, or as words of length $n$ over $F$.

Given $m$, $n$ and an $n \times m$ matrix $A$ over $F$, we can define a linear map $\alpha : F^m \to F^n$ by
$$\alpha \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix} = A \begin{pmatrix} x_1 \\ \vdots \\ x_m \end{pmatrix} = \begin{pmatrix} A_{11}x_1 + \dots + A_{1m}x_m \\ \vdots \\ A_{n1}x_1 + \dots + A_{nm}x_m \end{pmatrix}.$$
Every linear map from $F^m$ to $F^n$ arises in this way. The rank of $A$ is defined to be the rank of this linear map. The column rank of $A$ is defined to be the dimension of $\langle c_1, \dots, c_m \rangle$, where $c_1, \dots, c_m$ are the columns of $A$ regarded as vectors in $F^n$, and the row rank is defined to be the dimension of $\langle r_1, \dots, r_n \rangle$, where $r_1, \dots, r_n$ are the rows of $A$ regarded as (row) vectors in $F^m$.

Fact. If $A$ is an $n \times m$ matrix and $\alpha : F^m \to F^n$ is the corresponding linear map, then $\mathrm{rank}(\alpha) = \text{column rank}(A) = \text{row rank}(A)$.

End of non-examinable material
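For readers who like to see the revision material computationally: the following sketch (my own helper, not part of the notes) computes the rank of a matrix over a prime field $F_p$ by row reduction, and checks the Rank–nullity Theorem on a small example over $F_2$.

```python
def rank_mod_p(A, p):
    """Rank of the matrix A (a list of rows, entries reduced mod p, p prime), by row reduction."""
    A = [row[:] for row in A]
    rank, n_cols = 0, len(A[0])
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(A)) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        inv = pow(A[rank][col], p - 2, p)          # inverse in F_p, by Fermat's little theorem
        A[rank] = [x * inv % p for x in A[rank]]
        for r in range(len(A)):
            if r != rank and A[r][col] != 0:
                c = A[r][col]
                A[r] = [(A[r][j] - c * A[rank][j]) % p for j in range(n_cols)]
        rank += 1
    return rank

# alpha : F_2^4 -> F_2^3 given by the 3 x 4 matrix A below; the third row is the sum of the first two
A = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
r = rank_mod_p(A, 2)
print(r, 4 - r)   # rank 2 and nullity 2, so rank + nullity = 4 = dim(F_2^4)
```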

4.2 Finite fields and linear codes

The examples of fields you are most familiar with are $\mathbb{Q}$, $\mathbb{R}$ and $\mathbb{C}$. But these are of no interest in this course; we are concerned with finite fields. The classification of finite fields goes back to Galois.

Theorem 4.2. Let $q$ be an integer greater than 1. Then a field of order $q$ exists if and only if $q$ is a prime power, and this field is unique.

If $q$ is a prime power, then we refer to the unique field of order $q$ as $F_q$. For example, if $q$ is actually a prime, then $F_q$ simply consists of the set $\{0, 1, \dots, q - 1\}$, with addition and multiplication mod $q$. If $q$ is a prime power but not a prime, then the field $F_q$ is not of this form: you don't get a field if you define it like this. Actually, it's awkward to describe what the field $F_q$ looks like without developing lots of theory. But this doesn't matter: all that matters for us is that $F_q$ is a field.

Definition. Suppose $q$ is a prime power. A linear code over $F_q$ is a subspace of $F_q^n$.

4.3 The minimum distance of a linear code

One of the advantages of linear codes that we mentioned earlier is that it's easy to find the minimum distance of a linear code.

Definition. Suppose $q$ is a prime power and $w \in F_q^n$. The weight $\mathrm{weight}(w)$ of $w$ is defined to be the number of non-zero symbols in $w$.

Lemma 4.3. Suppose $q$ is a prime power, $w, x, y \in F_q^n$ and $\lambda$ is a non-zero element of $F_q$. Then:

1. $d(w + y, x + y) = d(w, x)$;
2. $d(\lambda w, \lambda x) = d(w, x)$;
3. $d(w, x) = \mathrm{weight}(w - x)$.

Proof.

1. $d(w + y, x + y) = |\{i \mid (w + y)_i \neq (x + y)_i\}| = |\{i \mid w_i + y_i \neq x_i + y_i\}|$. We have $w_i + y_i = x_i + y_i$ if and only if $w_i = x_i$ (because we can add $\pm y_i$ to both sides), so $d(w + y, x + y) = |\{i \mid w_i \neq x_i\}| = d(w, x)$.

2. $d(\lambda w, \lambda x) = |\{i \mid (\lambda w)_i \neq (\lambda x)_i\}| = |\{i \mid \lambda w_i \neq \lambda x_i\}|$. We have $\lambda w_i = \lambda x_i$ if and only if $w_i = x_i$ (because we can multiply both sides by $\lambda$ or $\lambda^{-1}$), so $d(\lambda w, \lambda x) = |\{i \mid w_i \neq x_i\}| = d(w, x)$.

3. By definition $\mathrm{weight}(w - x) = d(w - x, 00\dots0)$, which equals $d(w - x, x - x)$, and by part (1) this equals $d(w, x)$.

Corollary 4.4. Suppose $C$ is a linear code over $F_q$. The minimum distance of $C$ equals the minimum weight of a non-zero word in $C$.

Proof. Let $D$ be the minimum weight of a non-zero word in $C$. If we let $w$ be a word in $C$ of weight $D$, then we have $D = \mathrm{weight}(w) = d(w, 00\dots0)$. Since $w$ and $00\dots0$ are different words in $C$, we must have $d(C) \leq D$. Now let $x, y$ be words in $C$ such that $d(x, y) = d(C)$. Then we have $x - y \in C$ because $C$ is closed under subtraction, and $\mathrm{weight}(x - y) = d(x, y) = d(C)$. So $D \leq d(C)$. So we have $D \leq d(C)$ and $d(C) \leq D$, so $D = d(C)$.

Key example. Let $q$ be any prime power, $n \geq 2$ an integer, and define the parity-check code of length $n$ over $F_q$ to be
$$C = \{v \in F_q^n \mid v_1 + \dots + v_n = 0\}.$$
Then $C$ is linear (apply the Subspace Test), and $C$ has minimum distance 2. This is because $C$ contains a word of weight 2, namely $1(-1)00\dots0$, but $C$ does not contain any words of weight 1: if $v \in F_q^n$ has weight 1, then there is a unique $i$ such that $v_i \neq 0$. But then $v_1 + \dots + v_n = v_i \neq 0$, so $v \notin C$.

4.4 Bases and generator matrices

Lemma 4.5. A vector space $V$ of dimension $k$ over $F_q$ contains exactly $q^k$ elements.

Proof. Suppose $\{e_1, \dots, e_k\}$ is a basis for $V$. Then any $v \in V$ can be written uniquely in the form
$$\lambda_1 e_1 + \lambda_2 e_2 + \dots + \lambda_k e_k,$$
for some choice of $\lambda_1, \dots, \lambda_k \in F_q$. So the number of vectors in $V$ is the number of expressions of this form, i.e. the number of choices of $\lambda_1, \dots, \lambda_k$. Now there are $q$ ways to choose each $\lambda_i$ (since there are $q$ elements of $F_q$ to choose from), and so the total number of choices of these scalars is $q^k$.

Notation. Suppose $q$ is a prime power. A linear $[n, k]$-code over $F_q$ is a linear code over $F_q$ of length $n$ and dimension $k$. If in addition the code has minimum distance at least $d$, we say that $C$ is a linear $[n, k, d]$-code.

As a consequence of Lemma 4.5, we see that a linear $[n, k, d]$-code over $F_q$ is a $q$-ary $(n, q^k, d)$-code.

Definition. Suppose $C$ is a linear $[n, k]$-code over $F_q$. A generator matrix for $C$ is a $k \times n$ matrix with entries in $F_q$, whose rows form a basis for $C$.

4.5 Equivalence of linear codes

Recall the definition of equivalent codes from earlier: codes $C$ and $D$ over an alphabet $A$ are equivalent if we can get from $C$ to $D$ via a sequence of Operations 1 and 2. There's a slight problem with applying this to linear codes, which is that if $C$ is linear and $D$ is equivalent to $C$, then $D$ need not be linear.
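Before analysing the two operations, it is worth seeing Lemma 4.5 and Corollary 4.4 in action. A sketch for a prime field $F_p$ (a non-prime $F_q$ would need a proper field implementation): it lists all $q^k$ codewords spanned by the rows of a generator matrix, and finds the minimum distance as a minimum weight.

```python
from itertools import product

def codewords(G, p):
    """All q^k words lambda_1 g(1) + ... + lambda_k g(k), where the g(i) are the rows of G."""
    k, n = len(G), len(G[0])
    words = set()
    for coeffs in product(range(p), repeat=k):
        words.add(tuple(sum(c * G[i][j] for i, c in enumerate(coeffs)) % p
                        for j in range(n)))
    return words

def min_weight(G, p):
    """The minimum distance of the code, computed via Corollary 4.4."""
    return min(sum(1 for a in w if a != 0) for w in codewords(G, p) if any(w))

G = [[1, 0, 1], [0, 1, 1]]        # generator matrix of the parity-check code of length 3
print(len(codewords(G, 2)))       # 4 = 2^2 codewords, as Lemma 4.5 predicts
print(min_weight(G, 2))           # 2: the minimum distance, by Corollary 4.4
```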

Operation 1 is OK, as we shall now show.

Lemma 4.6. Suppose $C$ is a linear $[n, k, d]$-code over $F_q$, and $\sigma$ is a permutation of $\{1, \dots, n\}$. Then the map
$$\varphi : C \to F_q^n, \qquad v \mapsto v_\sigma$$
is linear, and $C_\sigma$ is a linear $[n, k, d]$-code.

Proof. Suppose $v, w \in C$ and $\lambda, \mu \in F_q$. We need to show that $\varphi(\lambda v + \mu w) = \lambda\varphi(v) + \mu\varphi(w)$, i.e.
$$(\varphi(\lambda v + \mu w))_j = (\lambda\varphi(v) + \mu\varphi(w))_j$$
for every $j \in \{1, \dots, n\}$. We have
$$(\varphi(\lambda v + \mu w))_j = ((\lambda v + \mu w)_\sigma)_j = (\lambda v + \mu w)_{\sigma(j)} = (\lambda v)_{\sigma(j)} + (\mu w)_{\sigma(j)} = \lambda v_{\sigma(j)} + \mu w_{\sigma(j)} = \lambda (v_\sigma)_j + \mu (w_\sigma)_j = (\lambda\varphi(v))_j + (\mu\varphi(w))_j = (\lambda\varphi(v) + \mu\varphi(w))_j,$$
as required. Now $C_\sigma$ is by definition the image of $\varphi$, and so is a subspace of $F_q^n$, i.e. a linear code. We know that $d(C_\sigma) = d(C)$ from before, and that $|C_\sigma| = |C|$. By Lemma 4.5 this implies that $\dim C_\sigma = \dim C = k$, so $C_\sigma$ is an $[n, k, d]$-code.

Unfortunately, Operation 2 does not preserve linearity. So we define the following.

Operation 2'. Suppose $C$ is a linear code of length $n$ over $F_q$. Choose $i \in \{1, \dots, n\}$ and $a \in F_q \setminus \{0\}$. For $v = v_1 \dots v_n \in F_q^n$ define
$$v_{a,i} = v_1 \dots v_{i-1} (a v_i) v_{i+1} \dots v_n.$$
Now replace $C$ with the code $C_{a,i} = \{v_{a,i} \mid v \in C\}$.

We want to show that Operation 2' preserves linearity, dimension and minimum distance. We begin by showing that it's a special case of Operation 2.

Lemma 4.7. If $F$ is a field and $a \in F \setminus \{0\}$, then the map
$$f : F \to F, \qquad x \mapsto ax$$
is a bijection, i.e. a permutation of $F$.

Proof. Since $a \neq 0$ and $F$ is a field, $a$ has an inverse $a^{-1}$. So $f$ has an inverse $x \mapsto a^{-1}x$, and hence is a bijection.

Now we show that the operation which sends $v$ to $v_{a,i}$ is linear, which will mean that it sends linear codes to linear codes.

Lemma 4.8. Suppose $C$ is a linear $[n, k, d]$-code over $F_q$, $i \in \{1, \dots, n\}$ and $0 \neq a \in F_q$. Then the map
$$\varphi : C \to F_q^n, \qquad v \mapsto v_{a,i}$$
is linear, and $C_{a,i}$ is a linear $[n, k, d]$-code over $F_q$.

Proof. Take $v, w \in C$ and $\lambda, \mu \in F_q$. We must show that $(\varphi(\lambda v + \mu w))_j = (\lambda\varphi(v) + \mu\varphi(w))_j$ for each $j \in \{1, \dots, n\}$. For $j \neq i$ we have
$$(\varphi(\lambda v + \mu w))_j = (\lambda v + \mu w)_j = (\lambda v)_j + (\mu w)_j = \lambda v_j + \mu w_j = \lambda(\varphi(v))_j + \mu(\varphi(w))_j = (\lambda\varphi(v) + \mu\varphi(w))_j,$$
while for $j = i$ we have
$$(\varphi(\lambda v + \mu w))_j = a(\lambda v + \mu w)_j = a((\lambda v)_j + (\mu w)_j) = a(\lambda v_j + \mu w_j) = a\lambda v_j + a\mu w_j = \lambda(\varphi(v))_j + \mu(\varphi(w))_j = (\lambda\varphi(v) + \mu\varphi(w))_j,$$

as required. Now $C_{a,i}$ is by definition the image of $\varphi$, and this is a subspace of $F_q^n$, i.e. a linear code. We know from before (since Operation 2' is a special case of Operation 2) that $d(C_{a,i}) = d(C)$ and $|C_{a,i}| = |C|$, and this gives $\dim C_{a,i} = \dim C = k$, so that $C_{a,i}$ is a linear $[n, k, d]$-code.

In view of these results we re-define equivalence for linear codes.

Definition. Suppose $C$ and $D$ are linear codes over $F_q$. $C$ and $D$ are equivalent (as linear codes) if we can get from one to the other by applying Operations 1 and 2' repeatedly.

As a consequence of Lemma 4.6 and Lemma 4.8, we see that if two linear codes are equivalent, then they have the same dimension and minimum distance.

Now we consider the relationship between equivalence and generator matrices. We define the following operations on matrices over $F_q$:

MO1. permuting the rows;
MO2. multiplying a row by a non-zero element of $F_q$;
MO3. adding a multiple of a row to another row.

You should recognise these as the elementary row operations from Linear Algebra. Their importance is as follows.

Lemma 4.9. Suppose $C$ is a linear $[n, k]$-code with generator matrix $G$. If the matrix $H$ can be obtained from $G$ by applying one of MO1–MO3, then $H$ is also a generator matrix for $C$.

Proof. Since $G$ is a generator matrix for $C$, we know that the rows of $G$ are linearly independent and span $C$. So $G$ has rank $k$ (the number of rows) and row space $C$. We know from linear algebra that elementary row operations do not affect the rank or the row space of a matrix, so $H$ also has rank $k$ and row space $C$. So the rows of $H$ are linearly independent and span $C$, so form a basis for $C$, i.e. $H$ is a generator matrix for $C$.

Now we define two more matrix operations:

MO4. permuting the columns;
MO5. multiplying a column by a non-zero element of $F_q$.

Lemma 4.10. Suppose $C$ is a linear $[n, k]$-code over $F_q$, with generator matrix $G$. If the matrix $H$ is obtained from $G$ by applying MO4 or MO5, then $H$ is a generator matrix for a code $D$ equivalent to $C$.

Proof. Suppose $G$ has entries $g_{jl}$, for $1 \leq j \leq k$ and $1 \leq l \leq n$. Let $r_1, \dots, r_k$ be the rows of $G$, i.e. $r_j = g_{j1} g_{j2} \dots g_{jn}$. By assumption $\{r_1, \dots, r_k\}$ is a basis for $C$; in particular, the rank of $G$ is $k$.

Suppose $H$ is obtained using MO4, applying a permutation $\sigma$ to the columns. This means that $h_{jl} = g_{j\sigma(l)}$, so row $j$ of $H$ is the word $g_{j\sigma(1)} g_{j\sigma(2)} \dots g_{j\sigma(n)}$. But this is the word $(r_j)_\sigma$ as defined in equivalence Operation 1. So the rows of $H$ lie in the code $C_\sigma$, which is equivalent to $C$.

Now suppose instead that we obtain $H$ by applying MO5, multiplying column $i$ by $a \in F_q \setminus \{0\}$. This means that
$$h_{jl} = \begin{cases} g_{jl} & (l \neq i) \\ a g_{jl} & (l = i), \end{cases}$$
so that row $j$ of $H$ is the word $g_{j1} g_{j2} \dots g_{j(i-1)} (a g_{ji}) g_{j(i+1)} \dots g_{jn}$. But this is the word $(r_j)_{a,i}$ as defined in equivalence Operation 2'. So the rows of $H$ lie in the code $C_{a,i}$, which is equivalent to $C$.

For either matrix operation, we have seen that the rows of $H$ lie in a code $D$ equivalent to $C$. We need to know that they form a basis for $D$. Since there are $k$ rows and $\dim(D) = \dim(C) = k$, it suffices to show that the rows of $H$ are linearly independent, i.e. to show that $H$ has rank $k$. But matrix operations MO4 and MO5 are elementary column operations, and we know from linear algebra that these don't affect the rank of a matrix. So $\mathrm{rank}(H) = \mathrm{rank}(G) = k$.

We can summarise these results as follows.

Proposition 4.11. Suppose $C$ is a linear $[n, k]$-code with a generator matrix $G$, and that the matrix $H$ is obtained by applying a combination of matrix operations MO1–5. Then $H$ is a generator matrix for a code $D$ equivalent to $C$.

Proof. Each time we apply MO1–3 we get a new generator matrix for the same code, by Lemma 4.9. Each time we apply MO4 or MO5, we get a generator matrix for an equivalent code, by Lemma 4.10. So a combination of these operations will yield a generator matrix for a code equivalent to the one we started with.

Note that in the list of matrix operations MO1–5, there is a sort of symmetry between rows and columns. In fact, you might expect that you can do another operation:

MO6. adding a multiple of a column to another column;

but you can't. Doing this can take you to a code with a different minimum distance. For example, suppose $q = 2$, and that $C$ is the parity-check code of length 3:
$$C = \{000, 011, 101, 110\}.$$
We have seen that $d(C) = 2$ and that $C$ has a generator matrix
$$G = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}.$$
If we applied MO6 above, adding column 1 to column 2, we'd get the matrix
$$H = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \end{pmatrix}.$$
This is a generator matrix for the code
$$\{000, 111, 011, 100\},$$

which has minimum distance 1, so is not equivalent to $C$. So the difference between row operations and column operations is critical, and we do not allow MO6.

Armed with these operations, we can define a standard way in which we can write generator matrices.

Definition. Let $G$ be a $k \times n$ matrix over $F_q$, with $k \leq n$. We say that $G$ is in standard form if
$$G = (I_k \mid B),$$
where $I_k$ is the $k \times k$ identity matrix, and $B$ is some $k \times (n - k)$ matrix.

Transforming a matrix into standard form

Now we show how to transform a generator matrix into standard form, using MO1–5. So suppose $G$ is a $k \times n$ matrix over $F_q$ whose rows are linearly independent. Since the rows of $G$ are linearly independent, $G$ has rank $k$. MO1–5 do not affect the rank of a matrix, so at any point in the following procedure the matrix obtained will have rank $k$, and will therefore have linearly independent rows.

For $i = 1, \dots, k$ we want to transform column $i$ into the column vector with 1 in position $i$ and 0 in every other position. Suppose we have already done this for columns $1, \dots, i - 1$.

Step 1. Since the rows of our matrix are linearly independent, they are non-zero, so there must be some non-zero entry in the $i$th row. Furthermore, by what we know about columns $1, \dots, i - 1$, this non-zero entry must occur in one of columns $i, \dots, n$. So we apply MO4, permuting columns $i, \dots, n$ to get a non-zero entry in the $(i, i)$-position of our matrix.

Step 2. Suppose the $(i, i)$-entry of our matrix is $a \neq 0$. Then we apply MO5, multiplying column $i$ of our matrix by $a^{-1}$, to get a 1 in the $(i, i)$-position.

Step 3. We now apply MO3, adding multiples of row $i$ to the other rows in order to kill the remaining non-zero entries in column $i$. Note that this operation does not affect columns $1, \dots, i - 1$.

By applying Steps 1–3 for $i = 1, \dots, k$ in turn, we get a matrix in standard form. Note that it is automatic from the proof that $k \leq n$.

Proposition 4.12. Suppose $C$ is a linear $[n, k]$-code over $F_q$. Then $C$ is equivalent to a code with a generator matrix in standard form.

Proof. Let $G$ be a generator matrix for $C$. $G$ has linearly independent rows, so using the procedure described above we can transform $G$ into a matrix $H$ in standard form using MO1–5. By Proposition 4.11, $H$ is a generator matrix for a code equivalent to $C$.
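A sketch of Steps 1–3 over a prime field $F_p$ (again, the helper names are mine). The column operations MO4 and MO5 may replace the code by an equivalent one, exactly as Proposition 4.12 says.

```python
def standard_form(G, p):
    """Steps 1-3: bring G (rows linearly independent, entries mod p, p prime) to (I_k | B)."""
    G = [row[:] for row in G]
    k, n = len(G), len(G[0])
    for i in range(k):
        # Step 1 (MO4): permute columns i..n-1 to put a non-zero entry at (i, i).
        j = next(j for j in range(i, n) if G[i][j] != 0)
        for row in G:
            row[i], row[j] = row[j], row[i]
        # Step 2 (MO5): multiply column i by the inverse of the (i, i) entry.
        a_inv = pow(G[i][i], p - 2, p)      # a^(p-2) = a^(-1) in F_p, by Fermat
        for row in G:
            row[i] = row[i] * a_inv % p
        # Step 3 (MO3): add multiples of row i to the other rows to clear column i.
        for r in range(k):
            if r != i and G[r][i] != 0:
                c = G[r][i]
                G[r] = [(G[r][j] - c * G[i][j]) % p for j in range(n)]
    return G

G = [[0, 1, 1, 1], [1, 1, 0, 1]]
print(standard_form(G, 2))   # [[1, 0, 1, 1], [0, 1, 1, 0]]
```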

4.6 Decoding with a linear code

Definition. Suppose $C$ is a linear $[n, k]$-code over $F_q$. For $v \in F_q^n$, define
$$v + C = \{v + w \mid w \in C\}.$$
The set $v + C$ is called a coset of $C$.

The crucial properties of cosets are as follows.

Proposition 4.13. Suppose $C$ is an $[n, k]$-code over $F_q$. Then:

1. every coset of $C$ contains exactly $q^k$ words;
2. every word in $F_q^n$ is contained in some coset of $C$;
3. if the word $v$ is contained in the coset $u + C$, then $v + C = u + C$;
4. any word in $F_q^n$ is contained in at most one coset of $C$;
5. there are exactly $q^{n-k}$ cosets of $C$.

Proof.

1. Given a coset $v + C$, we define
$$\varphi : C \to v + C, \qquad w \mapsto v + w,$$
and we claim that $\varphi$ is a bijection. $\varphi$ is certainly surjective, since by definition $v + C$ is the image of $\varphi$. And $\varphi$ is injective, since if $w, x \in C$ with $\varphi(w) = \varphi(x)$, then $v + w = v + x$, which gives $w = x$. So $\varphi$ is a bijection, so $|v + C| = |C| = q^k$, by Lemma 4.5.

2. Suppose $v \in F_q^n$. Since $C$ is linear, we have $00\dots0 \in C$, and so $v = v + 00\dots0 \in v + C$.

3. Since $v \in u + C$, we have $v = u + x$ for some $x \in C$. Now for every $w \in C$, we have $v + w = u + (x + w) \in u + C$; so every word in $v + C$ is in $u + C$, i.e. $v + C \subseteq u + C$. But $|v + C| = |u + C|$ by (1), so $v + C = u + C$.

4. Suppose $u \in v + C$ and $u \in w + C$, where $u, v, w \in F_q^n$. Then by part (3) we have $v + C = u + C = w + C$.

5. There are $q^n$ words altogether in $F_q^n$, and by parts (2) and (4) each of them is contained in exactly one coset of $C$. Each coset has size $q^k$, and so the number of cosets must be $q^n / q^k = q^{n-k}$.
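A sketch listing the cosets of a small binary code, illustrating parts (1), (2), (4) and (5) of Proposition 4.13:

```python
from itertools import product

def cosets(C, p=2):
    """Partition F_p^n into the cosets v + C (words as tuples over F_p)."""
    n = len(next(iter(C)))
    seen, result = set(), []
    for v in product(range(p), repeat=n):
        if v not in seen:
            coset = {tuple((vi + wi) % p for vi, wi in zip(v, w)) for w in C}
            result.append(coset)
            seen |= coset
    return result

C = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}   # a [3, 2]-code over F_2
for D in cosets(C):
    print(sorted(D))   # 2 = 2^(3-2) cosets, each containing 4 = 2^2 words
```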

Definition. Given a linear $[n, k]$-code $C$ over $F_q$ and a coset $w + C$, we define a leader of $w + C$ to be a word of minimal weight in $w + C$.

Now suppose we are given a linear $[n, k]$-code $C$ over $F_q$. We define a Slepian array for $C$ to be an array of words, constructed as follows:

- choose one leader from each coset (note that the word $00\dots0$ must be the leader chosen from the coset $00\dots0 + C = C$, since it has smaller weight than any other word);
- in the first row of the array, put all the words in $C$, with $00\dots0$ at the left and the other words in any order;
- in the first column put all the chosen coset leaders: the word $00\dots0$ is at the top, and the remaining leaders go in any order;
- now fill in the remaining entries by letting the entry in row $i$ and column $j$ be (leader at the start of row $i$) + (word at the top of column $j$).

Lemma 4.14. Suppose $C$ is a linear $[n, k]$-code over $F_q$, and $S$ is a Slepian array for $C$. Then every word in $F_q^n$ appears once in $S$.

Proof. Take $v \in F_q^n$. Then $v$ lies in some coset $D$, by Proposition 4.13(2). Let $y$ be the chosen leader for $D$; then $y$ appears in column 1, in row $i$, say. Since $y \in D$, we have $D = y + C$, by Proposition 4.13(3). So $v \in y + C$, and so we can write $v = y + u$, where $u \in C$. $u$ lies in row 1 of the array, in column $j$, say, and so $v$ lies in row $i$ and column $j$ of the array.

Definition. Let $C$ be an $[n, k]$-code over $F_q$, and let $S$ be a Slepian array for $C$. We define a decoding process $f : F_q^n \to C$ as follows. For $v \in F_q^n$, we find $v$ in the array $S$ (which we can do, by Lemma 4.14). Now we let $f(v)$ be the word at the top of the same column as $v$.

Theorem 4.15. Suppose $C$ is a linear $[n, k]$-code over $F_q$ and $S$ is a Slepian array for $C$, and let $f$ be the decoding process constructed above. Then $f$ is a nearest-neighbour decoding process.

Proof. We need to show that for any $v \in F_q^n$ and any $w \in C$, $d(v, f(v)) \leq d(v, w)$. $v$ appears somewhere in $S$, and by construction $f(v)$ is the word at the top of the same column. If we let $u$ be the word at the start of the same row as $v$, then by the construction of $S$ we have
$$v = u + f(v).$$
This gives
$$v - w = u + (f(v) - w) \in u + C.$$
We also have $u = u + 00\dots0 \in u + C$, and $u$ was chosen to be a leader for this coset, which means that $\mathrm{weight}(u) \leq \mathrm{weight}(v - w)$. So
$$d(v, f(v)) = \mathrm{weight}(v - f(v)) = \mathrm{weight}(u) \leq \mathrm{weight}(v - w) = d(v, w).$$

Quick algorithm for constructing a Slepian array

Row 1: write the words of $C$ in a row, with the word $00\dots0$ at the start and the remaining words in any order.

Subsequent rows: choose a word $u$ of minimal weight that you haven't yet written. Put $u$ at the start of the row, and then for the rest of the row, put
$$u + (\text{word at the top of column } j)$$
in column $j$.

Stop when every word appears in the array.
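A sketch of the quick algorithm for binary codes, together with the decoding process of Theorem 4.15 (the helper names are mine):

```python
from itertools import product

def add(v, w):
    """Symbol-wise addition mod 2 of binary words given as strings."""
    return "".join(str((int(a) + int(b)) % 2) for a, b in zip(v, w))

def slepian_array(C):
    """The quick algorithm: row 1 is C with 00...0 first; each later row is a new leader plus row 1."""
    n = len(next(iter(C)))
    row1 = sorted(C, key=lambda w: (w.count("1"), w))      # 00...0 comes first
    rows, written = [row1], set(row1)
    by_weight = sorted(("".join(w) for w in product("01", repeat=n)),
                       key=lambda w: (w.count("1"), w))
    for u in by_weight:                 # scan unwritten words in order of weight
        if u not in written:            # u is a leader of a coset not yet listed
            row = [add(u, c) for c in row1]
            rows.append(row)
            written.update(row)
    return rows

def decode(rows, v):
    """f(v): the codeword at the top of the column containing v."""
    for row in rows:
        if v in row:
            return rows[0][row.index(v)]

C = {"0000", "0111", "1011", "1100"}    # a [4, 2]-code over F_2
S = slepian_array(C)
print(len(S))                # 4 = 2^(4-2) rows, one per coset
print(decode(S, "1111"))     # 1011, a nearest codeword (distance 1 from 1111)
```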

5 Dual codes and parity-check matrices

5.1 The dual code

Definition. Suppose $q$ is a prime power, and $n > 0$. For $v, w \in F_q^n$, define the dot product
$$v \cdot w = v_1 w_1 + \dots + v_n w_n.$$

Lemma 5.1. The dot product $\cdot$ is symmetric and bilinear, i.e.
$$v \cdot w = w \cdot v \qquad\text{and}\qquad v \cdot (\lambda w + \mu x) = \lambda (v \cdot w) + \mu (v \cdot x)$$
for all words $v, w, x$ and all $\lambda, \mu \in F_q$.

Proof. Exercise (non-examinable).

Definition. Suppose $C$ is a linear $[n, k]$-code over $F_q$. The dual code is defined to be
$$C^\perp = \{w \in F_q^n \mid v \cdot w = 0 \text{ for all } v \in C\}.$$

Lemma 5.2. Suppose $C$ is a linear $[n, k]$-code and $G$ is a generator matrix for $C$. Then
$$w \in C^\perp \iff Gw^T = 0.$$

Note that we think of the elements of $F_q^n$ as row vectors; if $w$ is the row vector $(w_1 \dots w_n)$, then $w^T$ is the column vector with entries $w_1, \dots, w_n$. $G$ is a $k \times n$ matrix, so $Gw^T$ is a column vector of length $k$.

Proof. Suppose $G$ has entries $g_{ij}$, for $1 \leq i \leq k$ and $1 \leq j \leq n$. Let $g(i)$ denote the $i$th row of $G$. Then $g(1), \dots, g(k)$ are words which form a basis for $C$, and $g(i) = g_{i1} \dots g_{in}$. Now
$$(Gw^T)_i = \sum_{j=1}^n g_{ij} w_j = g(i) \cdot w,$$
so $Gw^T = 0$ if and only if $g(i) \cdot w = 0$ for all $i$.

Suppose $w \in C^\perp$. Then $v \cdot w = 0$ for all $v \in C$. In particular, $g(i) \cdot w = 0$ for $i = 1, \dots, k$. So $Gw^T = 0$.

Conversely, suppose $Gw^T = 0$. Given $v \in C$, we can write
$$v = \lambda_1 g(1) + \dots + \lambda_k g(k)$$

for $\lambda_1, \dots, \lambda_k \in F_q$, since $g(1), \dots, g(k)$ span $C$. So
$$v \cdot w = (\lambda_1 g(1) + \dots + \lambda_k g(k)) \cdot w = \lambda_1 (g(1) \cdot w) + \dots + \lambda_k (g(k) \cdot w) = 0 + \dots + 0 = 0,$$
using Lemma 5.1. This applies for any $v \in C$, so $w \in C^\perp$.

This lemma gives us another way to think of $C^\perp$: it is the kernel of any generator matrix of $C$.

Theorem 5.3. If $C$ is a linear $[n, k]$-code over $F_q$, then $C^\perp$ is a linear $[n, n-k]$-code.

Proof. Let $G$ be a generator matrix of $C$. Then Lemma 5.2 says that $C^\perp$ is the kernel of the linear map
$$\varphi : F_q^n \to F_q^k, \qquad w \mapsto Gw^T,$$
so it is a subspace of $F_q^n$, i.e. a linear code of length $n$. The Rank–nullity Theorem says that
$$\mathrm{rank}(\varphi) + \mathrm{nullity}(\varphi) = \dim F_q^n,$$
so $\mathrm{rank}(G) + \dim(C^\perp) = n$. $G$ has $k$ rows, which are linearly independent, and so $\mathrm{rank}(G) = k$. Hence $\dim C^\perp = n - k$.

Theorem 5.4. Suppose $C$ is a linear $[n, k]$-code over $F_q$. Then $(C^\perp)^\perp = C$.

Proof. First note that because $C^\perp = \{w \in F_q^n \mid v \cdot w = 0 \text{ for all } v \in C\}$, we have $v \cdot w = 0$ whenever $v \in C$ and $w \in C^\perp$. This means that any $v \in C$ is contained in
$$\{x \in F_q^n \mid w \cdot x = 0 \text{ for all } w \in C^\perp\}.$$
But this is by definition $(C^\perp)^\perp$. So $C \subseteq (C^\perp)^\perp$. On the other hand $C$ and $(C^\perp)^\perp$ are both vector spaces, and by Theorem 5.3
$$\dim((C^\perp)^\perp) = n - \dim(C^\perp) = n - (n - k) = k = \dim C,$$
so $C = (C^\perp)^\perp$.

Now we make a very important definition.

Definition. Suppose $C$ is a linear code. A parity-check matrix for $C$ is a generator matrix for $C^\perp$.

We will see in the rest of the course that parity-check matrices are generally more useful than generator matrices. Here is an instance of this.

Lemma 5.5. Suppose $C$ is a linear $[n, k]$-code over $F_q$, $H$ is a parity-check matrix for $C$, and $v$ is a word in $F_q^n$. Then $v \in C$ if and only if $Hv^T = 0$.

Proof. By Theorem 5.4 we have $v \in C$ if and only if $v \in (C^\perp)^\perp$. Now $H$ is a generator matrix for $C^\perp$, and so by Lemma 5.2 we have $w \in (C^\perp)^\perp$ if and only if $Hw^T = 0$.

Non-examinable material

But can we find a parity-check matrix? The following lemma provides a start.

Lemma 5.6. Suppose $C$ is a linear $[n, k]$-code over $F_q$ with generator matrix $G$, and $H$ is a matrix over $F_q$. Then $H$ is a parity-check matrix for $C$ if and only if $H$ is an $(n-k) \times n$ matrix, the rows of $H$ are linearly independent, and $GH^T = 0$.

In order to prove this lemma, we think about how we multiply matrices. Suppose $A$ is an $r \times s$ matrix and $B$ is an $s \times t$ matrix, and let $b(1), \dots, b(t)$ denote the columns of $B$. Then the columns of $AB$ are $Ab(1), \dots, Ab(t)$.

Proof of Lemma 5.6. ($\Rightarrow$) Suppose $H$ is a parity-check matrix for $C$; then it is a generator matrix for $C^\perp$, which is a linear $[n, n-k]$-code, so $H$ is an $(n-k) \times n$ matrix. If we let $h(1), \dots, h(n-k)$ denote the rows of $H$, then these rows form a basis for $C^\perp$; in particular, they are linearly independent. Now the columns of $H^T$ are $h(1)^T, \dots, h(n-k)^T$, so the columns of $GH^T$ are $Gh(1)^T, \dots, Gh(n-k)^T$. Since each $h(i)$ lies in $C^\perp$, we have $Gh(i)^T = 0$ by Lemma 5.2. So $GH^T = 0$.

($\Leftarrow$) Suppose $H$ is an $(n-k) \times n$ matrix with linearly independent rows, and that $GH^T = 0$. Again, we let $h(1), \dots, h(n-k)$ denote the rows of $H$. Since $GH^T = 0$, we have $Gh(i)^T = 0$ for each $i$, so each $h(i)$ lies in $C^\perp$ by Lemma 5.2. So the rows of $H$ are linearly independent words in $C^\perp$. But the number of rows is $n-k$, which is the dimension of $C^\perp$, so in fact the rows form a basis for $C^\perp$, and hence $H$ is a generator matrix for $C^\perp$, i.e. a parity-check matrix for $C$.

This helps us to find a parity-check matrix if we already have a generator matrix. If the generator matrix is in standard form, then a parity-check matrix is particularly easy to find. First we need a lemma about multiplying block matrices.

Lemma 5.7. Suppose $F$ is a field, and $A$ is an $m \times n$ matrix over $F$, $B$ is an $m \times s$ matrix over $F$, $C$ is an $n \times t$ matrix over $F$, $D$ is an $s \times t$ matrix over $F$. Let
$$E = \begin{pmatrix} A & B \end{pmatrix}, \qquad F = \begin{pmatrix} C \\ D \end{pmatrix}.$$
Then $EF = AC + BD$.
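A brute-force sketch over $F_2$: it computes $C^\perp$ directly from Lemma 5.2, and checks the $GH^T = 0$ criterion of Lemma 5.6 for a candidate parity-check matrix.

```python
from itertools import product

def dual_code(G, p=2):
    """C-perp = {w : G w^T = 0}, computed directly from Lemma 5.2 (G a list of rows)."""
    k, n = len(G), len(G[0])
    return [w for w in product(range(p), repeat=n)
            if all(sum(g[j] * w[j] for j in range(n)) % p == 0 for g in G)]

G = [[1, 0, 1], [0, 1, 1]]     # generator matrix of the parity-check code of length 3
print(dual_code(G))             # [(0, 0, 0), (1, 1, 1)]: the repetition code, a [3, 1]-code
H = [[1, 1, 1]]                 # candidate parity-check matrix: n - k = 1 row
print(all(sum(g[j] * h[j] for j in range(3)) % 2 == 0
          for g in G for h in H))   # True: GH^T = 0, as Lemma 5.6 requires
```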

30 Coding Theory Proof. By Theorem 5.4 we have v C if and only if v (C ). Now H is a generator matrix for C, and so by Lemma 5.2 we have w (C ) if and only if Hw T = 0. Non-examinable material But can we find a parity-check matrix? The following lemma provides a start. Lemma 5.6. Suppose C is a linear [n, k]-code over F q with generator matrix G, and H is a matrix over F q. Then H is a parity-check matrix for C if and only if H is an (n k) n matrix, the rows of H are linearly independent, and GH T = 0. In order to prove this lemma, we think about how we multiply matrices. Suppose A is an r s matrix, and B is an s t matrix, and let b(1),..., b(t) deonte the columns of B. Then the columns of AB are Ab(1),..., Ab(t). Proof of Lemma 5.6. ( ) Suppose H is a parity-check matrix for C, then it is a generator matrix for C, which is a linear [n, n k]-code, so H is an (n k) n matrix. If we let h(1),..., h(n k) denote the rows of H, then these rows form a basis for C ; in particular, they are linearly independent. Now the columns of H T are h(1) T,..., h(n k) T, so the columns of GH T are Gh(1) T,..., Gh(n k) T. Since each h(i) lies in C, we have Gh(i) T = 0 by Lemma 5.2. So GH T = 0. ( ) Suppose H is an (n k) n matrix with linearly independent rows, and that GH T = 0. Again, we let h(1),..., h(n k) denote the rows of H. Since GH T = 0, we have Gh(i) T = 0 for each i, so each h(i) lies in C by Lemma 5.2. So the rows of H are linearly independent words in C. But the number of rows is n k, which is the dimension of C, so in fact the rows form a basis for C, and hence H is a generator matrix for C, i.e. a parity-check matrix for C. This helps us to find a parity-check matrix if we already have a generator matrix. If the generator matrix is in standard form, then a parity-check matrix is particularly easy to find. First we need a lemma about multiplying block matrices. Lemma 5.7. Suppose F is a field, and A is an m n matrix over F, B is an m s matrix over F, C is an n t matrix over F, D is an s t matrix over F. Let Then E = ( A B ), F = EF = AC + BD. ( C D ).