
The extended Golay code

N. E. Straathof

July 6, 2014

Master thesis Mathematics
Supervisor: Dr R. R. J. Bocklandt

Korteweg-de Vries Instituut voor Wiskunde
Faculteit der Natuurwetenschappen, Wiskunde en Informatica
Universiteit van Amsterdam

Abstract

This thesis discusses a type of error-correcting code: the extended Golay code G_24. With the help of general coding theory the characteristic features of this code are explained. Special emphasis is laid on its automorphism group, the group that acts on all codewords and leaves the code unaltered. This is the Mathieu group M_24, one of the sporadic groups in the classification of the finite simple groups. The properties of G_24 and M_24 are visualised by four geometric objects: the icosahedron, the dodecahedron, the dodecadodecahedron, and the cubicuboctahedron. The dodecahedron also provides a means to visualise the encoding and decoding processes, and it offers an alternative way of discussing the code in coding theory or programming courses.

Title: The extended Golay code
Author: N. E. Straathof, nina.straathof@student.uva.nl, 6187501
Supervisor: Dr R. R. J. Bocklandt
Second assessor: Dr H. B. Posthuma
Date: July 6, 2014

Korteweg-de Vries Instituut voor Wiskunde
Universiteit van Amsterdam
Science Park 904, 1098 XH Amsterdam
http://www.science.uva.nl/math

Contents

1. Introduction 5
2. Error-correcting codes 6
   2.1. Definitions 6
   2.2. Encoding 10
   2.3. Decoding 12
   2.4. Cyclic codes 16
   2.5. Steiner systems 19
3. The extended Golay code G_24 22
   3.1. The extended Golay code 22
   3.2. The Golay code 34
   3.3. Decoding 35
4. The Mathieu Group M_24 39
   4.1. Introduction 39
   4.2. Definitions 40
   4.3. Construction 44
   4.4. Multiple transitivity 48
   4.5. Simplicity 50
5. Geometric constructions 53
   5.1. Definitions 53
   5.2. Icosahedron 55
   5.3. Dodecahedron 57
        5.3.1. Encoding 58
        5.3.2. Decoding 59
   5.4. Dodecadodecahedron 68
   5.5. Cubicuboctahedron 71
   5.6. Links 79
6. Discussion 82
A. Decoding patterns 84
B. GAP verifications 92
C. Popular summary 95
Bibliography 96

Chapter 1
Introduction

Communication is important in our daily life. We use phones, satellites, computers and other devices to send messages through a channel to a receiver. Unfortunately, most types of communication are subject to noise, which may cause errors in the messages that are being sent. Especially when sending messages is a difficult or expensive task, for example in satellite communication, it is important to find ways to diminish the occurrence of errors as much as possible. This is the central question in coding theory: which message was sent, given what we have received? To make this problem as easy as possible we use error-correcting codes. The main idea is to add redundancy to the messages, which enables us to both detect and correct the errors that may have occurred.

This thesis discusses a specific error-correcting code, the extended Golay code G_24, named after the Swiss mathematician Marcel J. E. Golay (1902-1989). He used mathematics to solve real-world problems, one of which was the question of how to send messages from satellites through space. The extended Golay code was used for sending images of Jupiter and Saturn taken by Voyager 1 and 2.

Along with the extended Golay code we discuss a specific Mathieu group, M_24, as it is closely linked to the code. This group is named after the French mathematician Émile Léonard Mathieu (1835-1890). The last part of this thesis describes four geometric figures with which we can visualise properties of G_24 and M_24.

The reader is assumed to be familiar with group theory, linear algebra and (hyperbolic) geometry.

Chapter 2
Error-correcting codes

The transfer of information in general comes down to three steps: a source sends, a channel transmits, and a receiver receives. However, in many cases the possibility exists that the information is altered by noise in the channel. For example, messages which are sent from satellites through space have a high chance of being altered along the way. The question of how to increase the probability of receiving the information as it was intended initiated the study of error-correcting codes: codes which enable us to correct errors that have occurred. Figure 2.1 shows the general idea: a message is encoded into a codeword, the codeword is sent to the receiver through a channel, in this channel the possibility exists that errors occur, and the receiver tries to recover the original message by decoding the received word.

[Figure 2.1: Using error-correcting codes. A message m is encoded into a codeword x, which is sent through the channel; the received information r is decoded back into the message m.]

In section 2.1 we start by explaining the basic definitions, and in sections 2.2 and 2.3 the encoding and decoding methods will be explained. Sections 2.4 and 2.5 discuss two further subjects that we will need later on, since they are properties of the extended Golay code.

the extended Golay code in chapter 3.

Definition 1. A message m of length k is a sequence of k symbols from some finite field F, so m = (m_1 ... m_k) ∈ F^k. An n-code C over a finite field F is a set of vectors in F^n, where n ≥ k. If |F| = q, we call the code q-ary, and if |F| = 2, we call it binary. Since we will be dealing with a binary code only, we will assume codes are binary from now on. This means that F = F_2 = {0, 1}, and thus that |F^n| = |F_2^n| = 2^n.

Definition 2. The error probability p is the probability that 0 is received when 1 was sent, or 1 is received when 0 was sent. In general we assume that p ∈ [0, 1/2).

Figure 2.2 illustrates a binary channel with error probability p.

[Figure 2.2: Binary channel with error probability p.]

Definition 3. The Hamming weight wt(v) of a vector v ∈ F^n is the number of its nonzero elements: wt(v) = #{i | v_i ≠ 0, i = 1, ..., n}.

Definition 4. The Hamming distance dist(v, w) of two vectors v, w ∈ F^n is the number of places where they differ: dist(v, w) = #{i | v_i ≠ w_i, i = 1, ..., n}.

The idea is that an n-code C is a strict subset of F^n in which we want the Hamming distance between any two vectors to be as large as possible. Therefore, the minimum Hamming distance is an important characteristic of the code.

Definition 5. The minimum Hamming distance d of a code C is defined as d = min{dist(x, y) | x, y ∈ C, x ≠ y}.

Definition 6. An (n, M, d)-code C is a set of M vectors in F_2^n such that dist(x, y) ≥ d for all x, y ∈ C with x ≠ y. Its block length is n and M is its size.

In general we know little about (n, M, d)-codes. What helps us is to add some structure to the set of vectors of which a code consists.

Definition 7. A linear [n, k, d]-code C is a k-dimensional subspace of F_2^n such that dist(x, y) ≥ d for all x, y ∈ C with x ≠ y.
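A small Python sketch may help fix these notions. It is an illustration only and not part of the thesis; the tiny code C used below is an arbitrary example, not one from the text.

```python
from itertools import combinations

def wt(v):
    """Hamming weight (Definition 3): number of nonzero coordinates."""
    return sum(1 for x in v if x != 0)

def dist(v, w):
    """Hamming distance (Definition 4): number of coordinates where v and w differ."""
    return sum(1 for a, b in zip(v, w) if a != b)

# An arbitrary small binary code, used only to illustrate the definitions.
C = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

for x in C:
    print(x, "has weight", wt(x))

# Minimum Hamming distance (Definition 5): minimum over pairs of distinct codewords.
d = min(dist(x, y) for x, y in combinations(C, 2))
print("minimum distance d =", d)   # 2 for this example

# This particular C happens to be linear, so d also equals the minimum
# nonzero weight (Theorem 9 below).
print("minimum nonzero weight =", min(wt(x) for x in C if wt(x) != 0))
```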

Remark 8. The notational distinction between linear and non-linear codes is given by the brackets: an [n, k, d]-code is an (n, 2^k, d)-code. If x and y are codewords in C, then for all a, b ∈ F_2 also ax + by ∈ C, since C is a subspace of F_2^n. This justifies the term linear. Since dim(C) = k, we have |C| = 2^k, but we still call k the dimension of C.

Another useful property of linear codes is the following:

Theorem 9. For a binary linear code C the minimum Hamming distance is equal to the minimum weight of any non-zero codeword.

Proof. If x, y ∈ C then also w := x − y ∈ C, since the code is linear. So:

d = min{dist(x, y) | x, y ∈ C, x ≠ y} = min{wt(x − y) | x, y ∈ C, x ≠ y} = min{wt(w) | w ∈ C, w ≠ 0}.

For each codeword x we can look at the set of all vectors y that lie within a certain distance of it:

Definition 10. The Hamming sphere S_t(x) of radius t around a vector x is the set of vectors y ∈ F_2^n such that dist(x, y) ≤ t.

Note that the number of vectors y at distance i from x is (n choose i), since we have to choose i out of n places where x and y differ. This means that the number of vectors in S_t(x) is equal to Σ_{i=0}^{t} (n choose i).

Definition 11. The error-correcting capability t of C is the largest radius of Hamming spheres around the codewords of C such that for any two different codewords y and z the corresponding Hamming spheres S_t(y) and S_t(z) are disjoint.

The error-correcting capability gives the number of errors that the code is able to correct, i.e. the number of digits in which a sent and a received vector may differ while the sent codeword can still be recovered. Note that this is not the same as the error-detecting capability, which says how many errors the code is able to detect, but not necessarily correct.

Theorem 12. For the error-correcting capability t of an (n, M, d)-code C it holds that t = ⌊(d − 1)/2⌋.¹

Proof. Firstly suppose that t > ⌊(d − 1)/2⌋. If we take two codewords x, y ∈ C for which dist(x, y) = d, then S_t(x) ∩ S_t(y) ≠ ∅, a contradiction. So we must have t ≤ ⌊(d − 1)/2⌋. Now suppose t ≤ ⌊(d − 1)/2⌋. Then for any two different codewords x, y ∈ C and any vector v ∈ S_t(x) ∩ S_t(y) we would have, by the triangle inequality:

dist(x, y) ≤ dist(x, v) + dist(v, y) ≤ (d − 1)/2 + (d − 1)/2 = d − 1,

¹ ⌊s⌋ denotes the largest integer not greater than s, for any real number s.

which contradicts the fact that the minimum Hamming distance is d. Hence if t = ⌊(d − 1)/2⌋ then it is the largest radius such that the Hamming spheres of any two different codewords are disjoint.

Codewords are vectors in F_2^n, so we are able to obtain another vector from a codeword x by permuting its n digits. If we permute the positions of the digits only, we say that the resulting codeword y is obtained by a positional permutation. If we permute the symbols at specific places, we say that it is obtained by a symbolic permutation.

Example 13. Suppose we have a [4, 2, 2]-code C = {(0000), (0011), (1100), (1111)}. If we look at the following sets: C_1 = {(0000), (1001), (0110), (1111)}, C_2 = {(0000), (0101), (1010), (1111)}, C_3 = {(1000), (1011), (0100), (0111)}, and C_4 = {(0110), (0101), (1010), (1001)}, then we see that C_1 and C_2 are obtained from C by performing a positional permutation: C_1 by shifting all digits one place to the right (so that the fourth digit moves to position 1), C_2 by interchanging the digits on positions 2 and 3. C_3 and C_4 are obtained from C by performing a symbolic permutation: C_3 by changing the first digit and C_4 by changing the digits on positions 2 and 3.

If we perform the same permutations on all codewords of a code C, the resulting set is again a code. Luckily, performing either kind of permutation does not alter the set of Hamming distances between the codewords, so the parameters of the code are preserved. Therefore, we can view such codes as being "the same":

Definition 14. Two codes are called equivalent if one can be obtained from the other by performing a sequence of positional and/or symbolic permutations.

In the case where C is a linear code we have to check whether linearity is preserved. Obviously positional permutations do not cause any trouble, but in example 13 we see that (1010) + (1001) = (0011) ∉ C_4, hence:

Definition 15. Two codes are called linearly equivalent if one can be obtained from the other by performing a sequence of positional permutations.

The next section describes how we can turn messages into codewords.

2.2. Encoding

Preferably we want the construction of a code, i.e. the choice of redundancy that we add to messages, to be a simple function f : F_2^k → F_2^n. For linear codes this is exactly the case: the function f sends a message m to a vector x ∈ F_2^n with xH^t = 0, for some (n − k) × n binary matrix H. Obviously there are several vectors x for which this holds, but in order to obtain one vector x for each message m we set (x_1 ... x_k) = (m_1 ... m_k), for then:

xH^t = (m_1 ... m_k x_{k+1} ... x_n)H^t = mH_k^t + (x_{k+1} ... x_n)H_{n−k}^t,

where H_k consists of the first k columns of H and H_{n−k} consists of its last n − k columns. So:

xH^t = 0  ⟺  mH_k^t = (x_{k+1} ... x_n)H_{n−k}^t,   (2.1)

which determines the last n − k digits of x uniquely. If we now let H be of the form H = (A | 1_{n−k}), where A is some (n − k) × k matrix, then equation (2.1) becomes:

xH^t = 0  ⟺  mA^t = (x_{k+1} ... x_n).   (2.2)

The matrix H is called the parity check matrix of our code, and if it has the form H = (A | 1_{n−k}) used in (2.2) then it is said to be in normal position.

Example 16. For a linear code C with n = 6 and k = 3 let H = (A | 1_3) be the parity check matrix with:

A =
0 1 1
1 0 1
1 1 0

If the message m is (011), then we encode it into a codeword as follows: first we set (x_1 x_2 x_3) equal to (011), and second we determine (x_4 x_5 x_6) by requiring xH^t = 0. As in equation (2.2) this gives us:

mA^t = (011)A^t = (011) = (x_4 x_5 x_6).

Hence, the codeword is (011011).

Now we can strengthen our original definition of a linear code:

Definition 17. An [n, k, d]-code C with parity check matrix H = (A | 1_{n−k}) consists of all vectors x ∈ F_2^n such that xH^t = 0, where A is some (n − k) × k binary matrix. The vectors x are called codewords of C.
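The following short Python sketch renders this encoding rule computationally, reproducing Example 16 (the matrix A and the message are taken from that example; the sketch itself is illustrative and not part of the thesis, and NumPy is used only for the matrix arithmetic).

```python
import numpy as np

# Parity check matrix in normal position: H = (A | I_{n-k}), as in Example 16.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])
n_minus_k, k = A.shape
H = np.hstack([A, np.eye(n_minus_k, dtype=int)])

def encode(m):
    """Encode a message m of length k: keep m as the first k digits and
    append the check symbols m A^t (mod 2), cf. equation (2.2)."""
    m = np.array(m)
    checks = m @ A.T % 2
    return np.concatenate([m, checks])

x = encode([0, 1, 1])
print("codeword:", x)                 # expected (0 1 1 0 1 1)
print("syndrome xH^t:", x @ H.T % 2)  # all zeros, so x is a codeword
```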

For any codeword x of a linear code with parity check matrix H in normal position, we call its first k digits message symbols and its last n − k digits check symbols. The check symbols form the redundancy, mentioned earlier, that is added to the messages.

Example 18. A repetition code C is an [n, 1, n]-code with parity check matrix H = (A | 1_{n−1}), where A is the all-ones column of length n − 1. If the message is m = (0), then the equation (0)A^t = (x_2 ... x_n) gives us x = (0 ... 0), and if m = (1) then (1)A^t = (x_2 ... x_n) implies x = (1 ... 1). Hence C = {(0 ... 0), (1 ... 1)}. In both codewords the message symbol is repeated n times, which explains the name of the code.

An equivalent way of encoding messages into codewords of a linear code C is by using a generator matrix G. Since all linear combinations of codewords in C are again codewords, we can find a basis for C. The rows of G are then the vectors of this basis. In example 16 the set B = {(100011), (010101), (001110)} forms a basis: the three vectors are linearly independent and |B| = k = 3. For example, the codeword (111000) is obtained by adding all three codewords in B. Hence, in this case:

G =
1 0 0 0 1 1
0 1 0 1 0 1
0 0 1 1 1 0

Definition 19. An [n, k, d]-code C with generator matrix G consists of all vectors x ∈ F_2^n of the form x = mG for some message m ∈ F_2^k.

Remark 20. Note that from equation (2.2) it follows that x = m(1_k | A^t), so the linear code with parity check matrix H = (A | 1_{n−k}) is equal to the linear code with generator matrix G = (1_k | A^t).

Another way of viewing how G and H are related is by computing the dual code:

Definition 21. If we have a linear code C, then the dual code C⊥ consists of all vectors v ∈ F_2^n such that v · x = v_1x_1 + ... + v_nx_n = 0 for all x ∈ C.

We can see that if y, z ∈ C⊥, then x · y = x · z = 0 for all x ∈ C. So x · (ay + bz) = 0 for all a, b ∈ F_2, which implies that C⊥ is also a linear code. In general, the dimension of C⊥ is n − k, but its minimum Hamming distance has to be computed separately for each C. If we now have generator matrices G of C and H of C⊥, then of course HG^t = GH^t = 0. Hence, H is the parity check matrix of C and G is the parity check matrix of C⊥.

Until now we assumed that H and G are in normal position. Codes that admit such G and H are called systematic codes. However, as long as H has n columns and n − k linearly independent rows, it is a parity check matrix. Likewise, for G to be a generator matrix it suffices to have n columns and k linearly independent rows. Moreover, permuting the rows of G will give the same code, and permuting its columns will give an equivalent code. Luckily, we can always work with systematic codes by the following theorem:

Theorem 22. For every linear code C there exist matrices H and G that are in normal position, such that the linear code with parity check matrix H and generator matrix G is linearly equivalent to C.

Proof. Suppose C is an [n, k, d]-code with generator matrix G (not necessarily in normal position). For any 1 ≤ i ≤ k, we first interchange rows to ensure that the leftmost non-zero element from row i downwards is in row i. Next we add row i to the rows above and below it in order to obtain zeros in the column in which that first non-zero entry appears. This means that we obtain a k × n matrix G′ in reduced row echelon form, so G′ = MG, where M is an invertible k × k matrix. Hence G′ has n columns and k linearly independent rows, so it is a generator matrix for C. All that is left to do is to permute the columns of G′ so that it is in normal position. If p_i is the position of the first 1 in row i, then we shift the columns such that column p_i becomes column i, and we are done.

The reason why we want to encode in this manner is that the redundancy we add to the message by multiplying it with a generator matrix allows us to detect some errors that might occur, and even to correct them. This will be explained in the next section.

2.3. Decoding

Noise in the channel through which we send our codeword x creates the possibility that the received vector r and x are different. The error vector e is then defined as e = r − x. If the error probability is p, then of course e_i = 1 with probability p for i = 1, ..., n. When deciding what message was sent upon receiving r, we want to know what codeword x was sent, and thus what the error vector e is. We will see how the encoding method described in the previous section helps to solve this, by looking at the syndrome of a received vector r:

Definition 23. The syndrome S(r) of a received vector r ∈ F_2^n is the vector rH^t ∈ F_2^{n−k}, where H is the parity check matrix of the linear code C that is used.

By definition S(x) = 0 if and only if x ∈ C, so:

S(r) = rH^t = (x + e)H^t = xH^t + eH^t = eH^t.

Hence, the syndrome of a received vector is equal to the syndrome of its error vector, and instead of looking at the received vector we can focus on its syndrome. Moreover:

eH^t = Σ_{i=1}^{n} e_i (H^t)_i,

where (H^t)_i is the i-th row of H^t, so the syndrome of a received vector is equal to the sum of the columns of H at the positions where the errors occurred. Finally, note that there is a one-to-one

correspondence between syndromes and cosets of a linear code C, since two vectors x and y are in the same coset of C if and only if x − y ∈ C, which in turn happens if and only if (x − y)H^t = 0, i.e. xH^t = yH^t. So the cosets, and with them the syndromes, partition F_2^n. Decoding therefore involves making a list of all possible syndromes of a received vector. There might be several error vectors that belong to one syndrome, but since p < 1/2 the error vector with the smallest Hamming weight is the one that is most likely to occur. To summarize, the method for decoding a vector r when an [n, k, d]-code was used is commonly referred to as the standard array, and it goes as follows:

Algorithm 24. The standard array.
(i) Make a list of all possible messages and their corresponding codewords.
(ii) Make a list of all 2^{n−k} syndromes and their corresponding error vectors. If more than one error vector belongs to a syndrome, pick the one(s) with the least Hamming weight. You do this by using the fact that the syndrome of a vector is the sum of the columns of the parity check matrix H of C at the positions where the error(s) occurred.
(iii) For any received vector r, compute its syndrome and check in the list which error vector e corresponds to it. The original codeword then most likely was x = r − e, and hence its first k digits are the message. If more than one error vector belongs to a syndrome, then they are equally likely, so you have found several codewords which are all equally likely to have been sent.

Note that for all error vectors e with Hamming weight at most t = ⌊(d − 1)/2⌋, this standard array will give you the correct solution. For larger Hamming weights of the error vector the possibility exists that the received vector will be decoded incorrectly.

Example 25. Let C be the [5, 2, 3]-code with parity check matrix H:

H =
1 0 1 0 0
1 1 0 1 0
0 1 0 0 1

We follow the standard array as in algorithm 24. First we make a list of all possible messages and their corresponding codewords:

message  codeword
(00)     (00000)
(01)     (01011)
(10)     (10110)
(11)     (11101)

Then we compute the 2^{5−2} = 8 syndromes and their corresponding error vectors of smallest Hamming weight:

syndrome  error vector
(000)     (00000)
(100)     (00100)
(010)     (00010)
(001)     (00001)
(110)     (10000)
(101)     (00101), (11000)
(011)     (01000)
(111)     (10001), (01100)

For example, the syndrome (100) equals the third column of H, so the corresponding error vector is (00100). Likewise, the syndrome (101) is both the sum of the third and the fifth column of H, and the sum of the first and the second column of H. The error vectors (00101) and (11000) both have Hamming weight 2, so we include them both in our list.

Now suppose we have received r = (11011). Then its syndrome S(r) is:

S(r) = rH^t = (11011) ·
1 1 0
0 1 1
1 0 0
0 1 0
0 0 1
= (110),

so in the above table we find that e = (10000), and thus x = r − e = (01011). Hence, our original message most likely was m = (01).

Next we will define the most desirable type of codes: the ones for which no ambiguity about which error vector to choose can arise. Suppose that C is an [n, k, d]-code. Then each Hamming sphere S_t(x) around a codeword x ∈ C contains Σ_{i=0}^{t} (n choose i) vectors, and if we take t = ⌊(d − 1)/2⌋, then by theorem 12 we have S_t(x) ∩ S_t(y) = ∅ for different x, y ∈ C. Now since #{S_t(x) | x ∈ C} = #C = 2^k, we have:

#( ∪_{x ∈ C} S_t(x) ) = 2^k Σ_{i=0}^{t} (n choose i) ≤ 2^n,

and we obtain the following inequality, which is called the sphere packing bound:

Σ_{i=0}^{t} (n choose i) ≤ 2^{n−k}.   (2.3)
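As a quick numerical illustration of the bound (2.3) (and of the perfect codes defined next), the short sketch below evaluates both sides for a few parameter sets; it is illustrative only. The parameters (7, 4, 3) are those of the cyclic code in Example 32, (23, 12, 7) those of the Golay code mentioned in the next section, and (9, 1, 9) a repetition code as in Example 28.

```python
from math import comb

def sphere_packing(n, k, d):
    """Return (sum_{i<=t} C(n,i), 2^(n-k)) for t = floor((d-1)/2), cf. (2.3) and (2.4)."""
    t = (d - 1) // 2
    return sum(comb(n, i) for i in range(t + 1)), 2 ** (n - k)

for n, k, d in [(5, 2, 3), (7, 4, 3), (23, 12, 7), (9, 1, 9)]:
    lhs, rhs = sphere_packing(n, k, d)
    print(f"[{n},{k},{d}]: {lhs} <= {rhs}, equality (perfect): {lhs == rhs}")
```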

As mentioned before, we want a code to contain codewords such that the probability that the receiver will be able to determine what message was sent is as large as possible. On the other hand, we also want the codewords not to be too long, since otherwise the communication will take up much time. Hence, for an [n, k, d]-code we want to increase both the ratio d/n and the rate of efficiency R = k/n. The sphere packing bound is one of the few inequalities that gives a relation between these parameters. Preferably we want it to be an equality, since then the disjoint Hamming spheres cover all of F_2^n. If this is the case, we call the code perfect:

Definition 26. An [n, k, d]-code C with t = ⌊(d − 1)/2⌋ is perfect if and only if:

Σ_{i=0}^{t} (n choose i) = 2^{n−k}.   (2.4)

Theorem 27. An [n, k, d]-code C is perfect if and only if for every possible syndrome there is a unique error vector e with wt(e) ≤ t = ⌊(d − 1)/2⌋.

Proof. For a codeword x we know that all error vectors inside S_t(x) have a different syndrome, since the error-correcting capability is t. The number of vectors inside S_t(x) is equal to Σ_{i=0}^{t} (n choose i), and the number of syndromes is 2^{n−k}. By the pigeonhole principle, if #S_t(x) is equal to the number of different syndromes, then every syndrome has a unique error vector e of Hamming weight at most t. Since the definition of a perfect code requires that Σ_{i=0}^{t} (n choose i) = 2^{n−k}, we have the desired result.

Note that linear codes with even d can never be perfect: if d is even then there are vectors in F_2^n which are at distance exactly d/2 from two different codewords, so the error vectors belonging to different codewords cannot be unique.

Example 28. The binary linear code C of example 18 is perfect if and only if n is odd. If C is perfect, then d is odd by the above remark, and hence n is odd. Conversely, if n is odd, say n = 2m + 1 for some m ∈ N ∪ {0}, then t = m = (n − 1)/2 and:

Σ_{i=0}^{t} (n choose i) = (1/2) Σ_{i=0}^{(n−1)/2} [ (n choose i) + (n choose n−i) ] = (1/2) Σ_{i=0}^{n} (n choose i) = (1/2) · 2^n = 2^{n−1} = 2^{n−k},

so by definition 26 C is perfect.
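To connect Theorem 27 with the decoding procedure, the illustrative Python sketch below (not part of the thesis) rebuilds the syndrome table of Example 25 by brute force, observes that the [5,2,3]-code is not perfect (two syndromes have two equally likely coset leaders, and 1 + 5 = 6 < 8 = 2^{n−k}), and then decodes the received word r = (11011).

```python
from itertools import product
from math import comb

# Parity check matrix of the [5,2,3]-code from Example 25.
H = [(1, 0, 1, 0, 0),
     (1, 1, 0, 1, 0),
     (0, 1, 0, 0, 1)]
n, k, d = 5, 2, 3
t = (d - 1) // 2

def syndrome(v):
    """S(v) = vH^t over F_2: sum mod 2 of the columns of H where v has a 1."""
    return tuple(sum(r * x for r, x in zip(row, v)) % 2 for row in H)

# Standard array, step (ii): group all vectors of F_2^5 by syndrome and keep
# the vector(s) of minimal weight in each coset (the most likely error vectors).
leaders = {}
for e in sorted(product((0, 1), repeat=n), key=sum):
    s = syndrome(e)
    if s not in leaders or sum(e) == sum(leaders[s][0]):
        leaders.setdefault(s, []).append(e)

for s in sorted(leaders):
    print(s, "->", leaders[s])   # (101) and (111) each have two weight-2 leaders

# Not perfect: sum_{i<=t} C(n,i) = 6 is smaller than the number of syndromes 2^(n-k) = 8.
print(sum(comb(n, i) for i in range(t + 1)), "<", 2 ** (n - k))

# Standard array, step (iii): decode the received word from Example 25.
r = (1, 1, 0, 1, 1)
e = leaders[syndrome(r)][0]
x = tuple((a + b) % 2 for a, b in zip(r, e))
print("error:", e, "codeword:", x, "message:", x[:k])
```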

Perfect codes are desirable, but unfortunately there are not many of them. An important example of a perfect code is the [23, 12, 7] Golay code, which is closely related to the [24, 12, 8] extended Golay code. In the next sections we discuss two more topics that enable us to describe these codes in full detail in chapter 3.

2.4. Cyclic codes

Linearity imposes quite some restrictions on the parameters that we want to use for our codes. Cyclic codes form another type of codes that allow more flexibility and are therefore widely used. As the Golay codes are cyclic too, this section discusses the properties of such codes.

Definition 29. An [n, k, d]-code C is cyclic if and only if (x_1 ... x_n) ∈ C implies (x_n x_1 ... x_{n−1}) ∈ C.

Note that (x_1 ... x_n) · (y_1 ... y_n) = (x_n x_1 ... x_{n−1}) · (y_n y_1 ... y_{n−1}), so if C is cyclic then C⊥ is too. Obviously, if the rows of a generator matrix of a code are cyclic shifts of each other, the code is cyclic.

If we look at the polynomial ring F_2[X], we see that with every vector r ∈ F_2^n we can associate a polynomial r(X) as follows:

(r_0 ... r_{n−1}) ↦ r(X) = r_0 + r_1X + ... + r_{n−1}X^{n−1}.

Cyclicity of a linear code C implies that (r_{n−1} r_0 ... r_{n−2}) ∈ C too; the cyclic shift should correspond to multiplication by X, but X · r(X) = r_0X + ... + r_{n−1}X^n has degree n and can no longer be associated with a codeword. We solve this by working with the quotient ring R = F_2[X]/⟨X^n − 1⟩, so that multiplying by X in R corresponds to a cyclic shift in F_2^n. The association of a polynomial to a vector then becomes the map f:

f : F_2^n → F_2[X]/⟨X^n − 1⟩,  (r_0 ... r_{n−1}) ↦ r(X) mod (X^n − 1).

Definition 30. An ideal I of F_2[X]/⟨X^n − 1⟩ is a linear subspace of F_2[X]/⟨X^n − 1⟩ such that if r(X) ∈ I, then so is r(X) · X. If in addition I is generated by one polynomial of F_2[X]/⟨X^n − 1⟩, then I is called a principal ideal.

We can easily see that the image of a cyclic [n, k, d]-code C under f is a subset of F_2[X]/⟨X^n − 1⟩ that is closed under addition (linearity) and under multiplication by X (cyclicity). Hence it is closed under multiplication by any polynomial modulo (X^n − 1), and C is an ideal of F_2[X]/⟨X^n − 1⟩. If C is principal, then its generator polynomial g(X) plays a role similar to that of the generator matrix of a linear code, as we will see in the next theorem.

Theorem 31. Let C be any non-zero ideal in F_2[X]/⟨X^n − 1⟩, i.e. a cyclic code of length n. Then:
(i) C = ⟨g(X)⟩, where g(X) is the unique monic polynomial of minimal degree r in C,
(ii) g(X) is a factor of X^n − 1,
(iii) If g(X) = g_0 + g_1X + ... + g_rX^r, then C has a generator matrix G whose rows are the coefficient vectors of

g(X), g(X) · X, ..., g(X) · X^{n−r−1}.

Proof. Let g(X) be a non-zero polynomial in C ⊆ F_2[X]/⟨X^n − 1⟩ of minimal degree r.

(i) Suppose g′(X) is also a polynomial of degree r in C. Because an ideal is a linear subspace of F_2[X]/⟨X^n − 1⟩, also g′(X) − g(X) ∈ C. However, g′(X) − g(X) has degree lower than r, a contradiction unless g′(X) = g(X), so g(X) is unique. If now f(X) is an arbitrary polynomial in C, then we can write f(X) = h(X)g(X) + r(X), where deg(r(X)) < r and h(X) ∈ F_2[X]/⟨X^n − 1⟩. But then r(X) = f(X) − h(X)g(X) ∈ C, and since g(X) has minimal degree we must conclude that r(X) = 0. Hence any polynomial in C is divisible by g(X), so C = ⟨g(X)⟩. Since g(X) is only defined up to multiplication by a constant, we may assume that it is monic.

(ii) Write X^n − 1 = q(X)g(X) + r(X) in F_2[X], where deg(r(X)) < r. By the same argument as in (i) we must conclude that r(X) = 0 in F_2[X]/⟨X^n − 1⟩, so X^n − 1 = q(X)g(X). Hence g(X) is a factor of X^n − 1.

(iii) From (i) we know that we can write any polynomial f(X) ∈ C as f(X) = h(X)g(X), and from (ii) we know that X^n − 1 = q(X)g(X). So in F_2[X]/⟨X^n − 1⟩:

f(X) = h(X)g(X) = h(X)g(X) + p(X)(X^n − 1)   for any p(X) ∈ F_2[X]
     = h(X)g(X) + p(X)q(X)g(X)
     = (h(X) + p(X)q(X))g(X)
     = c(X)g(X),

where p(X) can be chosen such that deg(c(X)) ≤ n − r − 1. So any f(X) ∈ C can be written as c(X)g(X) with deg(c(X)) ≤ n − r − 1. Since we have n − r linearly independent multiples of g(X), namely g(X), g(X) · X, ..., g(X) · X^{n−r−1}, and the dimension of C is n − r, we see that C is generated as a subspace of F_2^n by the rows of G.
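The next example works this out by hand for g(X) = 1 + X + X^3 and n = 7. As a computational companion (an illustrative Python sketch, not part of the thesis), the snippet below verifies Theorem 31(ii) for this g(X) by long division over F_2 and builds the generator matrix of Theorem 31(iii) from the shifted coefficient vectors.

```python
def poly_divmod(a, b):
    """Quotient and remainder of binary polynomials a, b given as coefficient
    lists (lowest degree first), with arithmetic over F_2."""
    a = a[:]
    q = [0] * max(len(a) - len(b) + 1, 1)
    while len(a) >= len(b) and any(a):
        shift = len(a) - len(b)
        if a[-1]:
            q[shift] = 1
            for i, c in enumerate(b):
                a[shift + i] ^= c
        a.pop()          # drop the (now zero) leading coefficient
    return q, a

n = 7
g = [1, 1, 0, 1]                           # g(X) = 1 + X + X^3, as in Example 32
r = len(g) - 1                             # degree of g
x_n_minus_1 = [1] + [0] * (n - 1) + [1]    # X^7 - 1 = X^7 + 1 over F_2

_, rem = poly_divmod(x_n_minus_1, g)
print("g divides X^n - 1:", not any(rem))  # True, cf. Theorem 31(ii)

# Generator matrix: row i holds the coefficients of g(X) * X^i (Theorem 31(iii)).
G = [[0] * i + g + [0] * (n - r - 1 - i) for i in range(n - r)]
for row in G:
    print(row)
```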

Example 32. Suppose C is a cyclic [7, 4, 3]-code with generator polynomial g(X) = 1 + X + X^3. Then n = 7 and r = 3, so n − r − 1 = 3 and we have:

G =
g(X)         =  1 + X + X^3
g(X) · X     =  X + X^2 + X^4
g(X) · X^2   =  X^2 + X^3 + X^5
g(X) · X^3   =  X^3 + X^4 + X^6

which can be expressed as the generator matrix in which each row has a 1 on the positions that correspond to the powers of the polynomial in that row; e.g. 1 + X + X^3 gives the row (1101000). So:

G =
1 1 0 1 0 0 0
0 1 1 0 1 0 0
0 0 1 1 0 1 0
0 0 0 1 1 0 1

A specific type of cyclic codes are the quadratic residue (QR) codes. These are codes of length p over F_q, where p is prime and q is a quadratic residue modulo p, i.e. q ≡ i^2 (mod p) for some i = 1, ..., p − 1. If q = 2, the QR-code is called binary. If we let Q be the set of quadratic residues modulo p and let N = F_p \ (Q ∪ {0}), then:

Definition 33. A QR-code C of length p over F_q is the cyclic code with generator polynomial g(X) ∈ F_q[X]/⟨X^p − 1⟩ given by:

g(X) = ∏_{r ∈ Q} (X − α^r),

where α is a primitive p-th root of unity in the splitting field of X^p − 1. Note that:

X^p − 1 = (X − 1) ∏_{r ∈ Q} (X − α^r) ∏_{r ∈ N} (X − α^r)   (2.5)
        = (X − 1) g(X) ∏_{r ∈ N} (X − α^r).   (2.6)

Finally, we move on to our last subject that enables us to discuss the Golay codes: Steiner systems.

2.5. Steiner systems

Definition 34. Let t, k, v be positive integers such that t ≤ k ≤ v. Let X be a set consisting of v points, and call a subset of k points a block. Then a Steiner system S(t, k, v) is a collection of distinct blocks such that any subset of t points is contained in exactly one block.

Example 35. Let X be the projective plane of order 2, which consists of 7 points, see figure 2.3. We call a line through 3 points a block. Then X is a Steiner system S(2, 3, 7), since the total number of points is 7, a block is a subset of 3 points, and any subset of 2 points is contained in exactly one block.

[Figure 2.3: The projective plane X.]

Definition 36. Let p_1, ..., p_k be the points in a block of a Steiner system S(t, k, v), and let λ_ij be the number of blocks which contain p_1, ..., p_i but do not contain p_{i+1}, ..., p_j, for 0 ≤ i ≤ j ≤ k. If λ_ij is constant, i.e. does not depend on the choice of p_1, ..., p_j, then λ_ij is called a block intersection number. If i = 0, then λ_0j is the number of blocks that do not contain p_1, ..., p_j, and if i = j then λ_ii is the number of blocks that do contain p_1, ..., p_i.

Theorem 37. Let t, k, v be integers for which a Steiner system S(t, k, v) exists. Then:
(i) λ_ij = λ_{i,j+1} + λ_{i+1,j+1} for all 0 ≤ i ≤ j ≤ k − 1 (the Pascal property),
(ii) Let p_1, ..., p_t be any t distinct points and let λ_i be the number of blocks that contain p_1, ..., p_i, for i ≤ t, where λ_0 is the total number of distinct blocks. Then λ_i is independent of the choice of the i points out of {p_1, ..., p_t}, and for all i ≤ t it holds that:

λ_i = (v−i choose t−i) / (k−i choose t−i),

(iii) λ_ii = λ_i for all i < t, and λ_ii = 1 for all i ≥ t.

Proof. (i) Let 0 ≤ i ≤ j ≤ k − 1. Then λ_ij is equal to the number of blocks that contain p_1, ..., p_i but do not contain p_{i+2}, ..., p_{j+1}, by definition of the block intersection number. This set of blocks consists of the blocks that contain p_{i+1} and those that do not: the number of blocks that do contain p_{i+1} is equal to λ_{i+1,j+1}, and the number of blocks that do not is equal to λ_{i,j+1}. Hence λ_ij = λ_{i,j+1} + λ_{i+1,j+1}.

(ii) The proof of this step goes by downward induction on i. Firstly, let i = t. Then by the definition of a Steiner system λ_t is independent of the choice of p_1, ..., p_t and it is equal to 1. Moreover:

(v−t choose t−t) / (k−t choose t−t) = (v−t choose 0) / (k−t choose 0) = 1/1 = 1,

so the statement holds for i = t. Now suppose it holds for some i + 1 ≤ t. For each block B that contains p_1, ..., p_i and each point q that is different from p_1, ..., p_i we define:

φ(q, B) := 1 if q ∈ B, and 0 if q ∉ B.

Then by the induction hypothesis we have:

Σ_q Σ_B φ(q, B) = λ_{i+1} (v − i),

since λ_{i+1} is the number of blocks containing p_1, ..., p_i and q, and v − i is the number of choices for q. Also:

Σ_B Σ_q φ(q, B) = λ_i (k − i),

since λ_i is the number of blocks containing p_1, ..., p_i, and k − i is the number of ways to choose q ∈ B. Hence λ_i is independent of the choice of p_1, ..., p_i, and:

λ_i = λ_{i+1} (v − i)/(k − i) = [ (v−i−1 choose t−i−1) / (k−i−1 choose t−i−1) ] · (v − i)/(k − i) = (v−i choose t−i) / (k−i choose t−i).

(iii) By definition λ_ii = λ_i for all 0 ≤ i ≤ t (here λ_00 is interpreted as the number of blocks on which no condition is imposed, so it equals the total number of blocks). In (ii) we have seen that λ_t = 1, which immediately implies that λ_ii = 1 for all i > t as well, since a block containing p_1, ..., p_i with i > t is in particular the unique block containing p_1, ..., p_t.
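A small illustrative sketch (not part of the thesis) that computes the numbers λ_i of Theorem 37(ii) and the full triangle of block intersection numbers via the Pascal property: run with (t, k, v) = (2, 3, 7) it reproduces Example 39 below, and with (5, 8, 24) it should reproduce Table 3.2 of the next chapter.

```python
from math import comb
from fractions import Fraction

def lambda_i(t, k, v, i):
    """Number of blocks of S(t, k, v) containing i prescribed points (Theorem 37(ii))."""
    return Fraction(comb(v - i, t - i), comb(k - i, t - i))

def intersection_triangle(t, k, v, rows):
    """Triangle of block intersection numbers lambda_ij: the diagonal is lambda_jj
    (Theorem 37(iii)) and the Pascal property lambda_ij = lambda_{i,j+1} + lambda_{i+1,j+1}
    is used backwards as lambda_{i,j} = lambda_{i,j-1} - lambda_{i+1,j}."""
    tri = []
    for j in range(rows):
        row = [None] * (j + 1)
        row[j] = lambda_i(t, k, v, j) if j <= t else Fraction(1)
        for i in range(j - 1, -1, -1):
            row[i] = tri[j - 1][i] - row[i + 1]
        tri.append(row)
    return tri

for row in intersection_triangle(2, 3, 7, 4):
    print([int(x) for x in row])      # 7 / 4 3 / 2 2 1 / 0 2 0 1  (Example 39)

for row in intersection_triangle(5, 8, 24, 9):
    print([int(x) for x in row])      # should reproduce Table 3.2
```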

Remark 38. Suppose that from a Steiner system S(t, k, v) we pick a point p. Then we can divide the blocks into two sets: the λ_01 blocks that do not contain p, and the λ_1 blocks that do contain p. If we now omit p from our system, then we can easily see that the λ_1 sets of k − 1 points form the blocks of a Steiner system S(t − 1, k − 1, v − 1) with block intersection numbers λ_{i+1,j+1}. Hence, if a Steiner system S(t, k, v) exists, then so does a Steiner system S(t − 1, k − 1, v − 1).

By theorem 37(i) we can make a Pascal triangle of the block intersection numbers as follows:

λ_0
λ_01  λ_1
λ_02  λ_12  λ_2

Example 39. The Pascal triangle of the block intersection numbers of figure 2.3 is:

7
4 3
2 2 1
0 2 0 1

Now that we have explored error-correcting codes in general we can move on to the Golay codes.

Chapter 3
The extended Golay code G_24

The extended Golay code was used for sending messages through space. This is because it is useful in situations where there is a high risk of noise in the channel, and where sending messages is difficult or expensive. We will discuss the most important features of the extended Golay code G_24 in section 3.1. In section 3.2 we discuss another Golay code, G_23, which is linked to G_24 and which we will need later on. Section 3.3 discusses how decoding works for both codes.

3.1. The extended Golay code

Definition 40. The extended Golay code G_24 is the binary linear code with generator matrix G = (1_12 | B), where B is given by:

B =
1 1 0 1 1 1 0 0 0 1 0 1
1 0 1 1 1 0 0 0 1 0 1 1
0 1 1 1 0 0 0 1 0 1 1 1
1 1 1 0 0 0 1 0 1 1 0 1
1 1 0 0 0 1 0 1 1 0 1 1
1 0 0 0 1 0 1 1 0 1 1 1
0 0 0 1 0 1 1 0 1 1 1 1
0 0 1 0 1 1 0 1 1 1 0 1
0 1 0 1 1 0 1 1 1 0 0 1
1 0 1 1 0 1 1 1 0 0 0 1
0 1 1 0 1 1 1 0 0 0 1 1
1 1 1 1 1 1 1 1 1 1 1 0

Remark 41. In general G_24 refers to any of the linear codes that are linearly equivalent to the one in definition 40. For example, we can find a generator matrix for G_24 whose rows all have Hamming weight 8.
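Anticipating the results proved below (Lemma 42, Theorem 47 and Table 3.1), the following Python sketch builds G = (1_12 | B) from the matrix B of Definition 40 and checks its basic properties computationally. It is an illustration only and not part of the thesis, whose own computer verifications use GAP (appendix B).

```python
import numpy as np
from itertools import product
from collections import Counter

# The matrix B of Definition 40, rows copied from the thesis.
B_rows = ["110111000101", "101110001011", "011100010111", "111000101101",
          "110001011011", "100010110111", "000101101111", "001011011101",
          "010110111001", "101101110001", "011011100011", "111111111110"]
B = np.array([[int(c) for c in row] for row in B_rows])

I12 = np.eye(12, dtype=int)
G = np.hstack([I12, B])

print("B symmetric:", np.array_equal(B, B.T))
print("B^2 = I over F_2:", np.array_equal(B @ B % 2, I12))

# Weight distribution: enumerate all 2^12 = 4096 codewords mG (mod 2).
weights = Counter()
for m in product((0, 1), repeat=12):
    weights[int((np.array(m) @ G % 2).sum())] += 1
print(sorted(weights.items()))   # expected: [(0,1), (8,759), (12,2576), (16,759), (24,1)]
```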

Firstly notice that B 2 = 1 12 and B t = B, which means that G is a parity check matrix itself and H is a generator matrix too. We can easily verify that any sum of two rows of G has Hamming weight 8. If we multiply G with its transpose, we obtain GG t = 0, so r r = n i=1 r ir i = 0 for each two rows r, r of G. This implies that x y = 0 for all x, y G 24, so G 24 is contained in its dual. However, since H is obtained from G by permuting columns, we have that G 24 and G 24 are linearly equivalent, so they have the same dimension. Hence, G 24 = G 24. Moreover, we must have that #{i x i = y i } is even for all x, y G 24. Now if we assume both wt(x) and wt(y) are divisible by 4 then: wt(x + y) = #{i x i + y i = 1} = #{i (x i, y i ) {(1, 0), (0, 1)}} = #({i x i = 1}\{i x i = y i = 1}) + #({i y i = 1}\{i y i = x i = 1}) = wt(x) + wt(y) 2wt(x y) = wt(x) + wt(y), so wt(x + y) again is divisible by 4. We can easily check that each row of G has weight 8 or 12. Combining all this immediately proves the following lemma: Lemma 42. The Hamming weight of each codeword of G 24 is divisible by 4. Proof. We can permute the columns of G to obtain a linearly equivalent code, which by remark 41 gives us another Golay code G 24. We take G = (L R), with: 1 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 L = 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0, and 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 23

0 1 1 0 1 1 1 0 0 0 1 0 0 1 0 1 1 1 0 0 0 1 0 1 0 0 1 1 1 0 0 0 1 0 1 1 0 1 1 1 0 0 0 1 0 1 1 0 0 1 1 0 0 0 1 0 1 1 0 1 R = 0 1 0 0 0 1 0 1 1 0 1 1 0 0 0 0 1 0 1 1 0 1 1 1. 0 0 0 1 0 1 1 0 1 1 1 0 0 0 1 0 1 1 0 1 1 1 0 0 0 1 0 1 1 0 1 1 1 0 0 0 0 0 1 1 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 We label the 12 columns of L from left to right as l, l 1,..., l 11, and the columns of R as r, r 1,..., r 11. We see that the columns l 1,..., l 11, r form the identity matrix 1 12, and columns r 1,..., r 11, l form our original matrix B. This enables us to deduce the following facts about G 24 : Theorem 43. G 24 is invariant under the permutation τ = (l r )(l 0 r 0 )(l 1 r 10 )(l 2 r 9 ) (l 10 r 1 ). Proof. If g 0,..., g 11 are the rows of G = (L R), then: τ(g 0 ) = τ(110000000000011011100010) = (010100011101110000000000) = g 0 + g 1 + g 3 + g 4 + g 5 + g 9 + g 11, which is in G 24 since it is a linear code. In the same manner we can show that τ(g i ) is a sum of certain rows of G 24 for i = 1,..., 10. Finally: τ(g 11 ) = τ(000000000000111111111111) = (111111111111000000000000) = g 11, which a row of G 24 which we have shown to be equal to G 24. Corollary 44. If we denote a codeword x of G 24 by (x L x R ), where x L and x R consist of the first and last twelve digits of x respectively, then theorem 43 states that also x = (x L x R ) is in G 24, where wt(x L ) = wt(x R ) and wt(x R ) = wt(x L ). For any message m we have a codeword x = mg = (x L x R ), where wt(x L ) = wt(x R ) 0 mod 2, since the weight of each row of L and the weight of each row of R is even. This leads to the following result: Theorem 45. No codeword of G 24 can have Hamming weight 4. Proof. Suppose x is a codeword of G 24 of weight 4. Then we have three possibilities: wt(x L ) = 0, 2 or 4. We use generator matrix G = (L R) for G 24 that is given in lemma 42 to check these options. 24

(i) If wt(x_L) = 0, then for the original message m it holds that m_1 = ... = m_11 = 0, and m_12 can be either 0 or 1. If m_12 = 0, then wt(x_R) = 0, and if m_12 = 1, then wt(x_R) = 12. In both cases wt(x) ≠ 4, a contradiction.

(ii) If wt(x_L) = 2, then for the original message m there are two cases. In case 1) we have m_i = 1 for some i = 1, ..., 11, all other digits are 0, and m_12 is either 0 or 1. Both when m_12 = 0 and when m_12 = 1 we can easily verify that wt(x_R) = 6. In case 2) we have m_i = m_j = 1 for some i, j = 1, ..., 11 with i ≠ j, all other digits are zero, and m_12 is either 0 or 1. Again for either value of m_12 we have wt(x_R) = 6. So all possibilities give us wt(x) = 8, a contradiction.

(iii) If wt(x_L) = 4, then wt(x_R) must be equal to 0, and by corollary 44 there is a codeword x′ ∈ G_24 with wt(x′_L) = wt(x_R) = 0. However, this is case (i), so again we have a contradiction.

Corollary 46. No codeword of G_24 can have Hamming weight 20.

Proof. If we take m = (1, ..., 1) ∈ F_2^12, then mG = 1 ∈ G_24, the all-one word. So if x is a codeword with wt(x) = 20, then x + 1 is again a codeword, with wt(x + 1) = 4, but by theorem 45 this cannot hold.

Theorem 47. G_24 is a [24, 12, 8]-code.

Proof. We already know that G_24 is linear and that it has block length 24 and dimension 12. By lemma 42 the weight of each codeword of G_24 is divisible by 4, by theorem 45 there are no codewords of Hamming weight 4, and we can easily check that there exist codewords of weight 8. For example, take m = (1 0 ... 0); then mG = (110000000000 011011100010), so wt(mG) = 8. Hence the minimal Hamming weight of all non-zero codewords is 8. Since G_24 is linear, by theorem 9 the minimal Hamming weight is equal to the minimum Hamming distance, so G_24 is a [24, 12, 8]-code.

The possible Hamming weights of codewords in G_24 are therefore 0, 8, 12, 16 and 24. Out of the total of 2^12 codewords in G_24 it is interesting to see how many of them have weight i, the so-called weight distribution number A_i. Of course we have A_0 = A_24 = 1, since 0 and 1 are the only codewords of Hamming weight 0 and 24 respectively. Also, G_24 is self-dual, so for any codeword of Hamming weight 8 there is exactly one codeword of Hamming weight 16, namely its complement. Hence A_8 = A_16. We will use the proof of theorem 45 to derive A_8.

Suppose x is a codeword of Hamming weight 8. If wt(x_L) = 0 then wt(x_R) is either 0 or 12, but in both cases this means that wt(x) ≠ 8. By corollary 44, wt(x_L) cannot be equal to 8 either. Hence wt(x_L) = 2, 4 or 6:

(i) If wt(x_L) = 2, then there are two cases for the original message m, as we have seen in the proof of theorem 45. For case 1), we have to choose 1 out of 11 digits to

set equal to 1, which gives us 11 possibilities. For case 2), we have to choose 2 out of 11 digits to set equal to 1, which gives us (11 choose 2) possibilities. Since in both cases m_12 can be either 0 or 1, we have to multiply by 2. This gives us 2(11 + (11 choose 2)) possibilities for wt(x_L) = 2.

(ii) If wt(x_L) = 4, then wt(x_R) must be equal to 4 as well. This means that for the original message m there are two cases. 1) We have m_i = m_j = m_k = 1 for some distinct i, j, k = 1, ..., 11, and all other digits are zero. 2) We have m_i = m_j = m_k = m_l = 1 for some distinct i, j, k, l = 1, ..., 11, and all other digits are zero. In both cases m_12 must be equal to 0, since otherwise wt(x_R) = 8. This means that we have (11 choose 3) + (11 choose 4) possibilities for wt(x_L) = 4.

(iii) If wt(x_L) = 6, then by corollary 44 there is a codeword x′ ∈ G_24 for which wt(x′_L) = 2. However, this is case (i), so again we have 2(11 + (11 choose 2)) possibilities.

Combining all this gives us:

A_8 = A_16 = 2(11 + (11 choose 2)) + ((11 choose 3) + (11 choose 4)) + 2(11 + (11 choose 2)) = 2 · 66 + (165 + 330) + 2 · 66 = 759,

and:

A_12 = 2^12 − A_0 − A_8 − A_16 − A_24 = 4096 − 1 − 759 − 759 − 1 = 2576.

Table 3.1 gives an overview of the weight distribution numbers of G_24.

Hamming weight i:                0    8    12    16   24
Weight distribution number A_i:  1    759  2576  759  1

Table 3.1: Weight distribution of G_24.

If we focus on the 759 codewords of Hamming weight 8, we will see that they form an important subset of G_24.

Theorem 48. Any vector r ∈ F_2^24 with wt(r) = 5 is covered¹ by exactly one x ∈ G_24 with wt(x) = 8.

Proof. Suppose we have a vector r with wt(r) = 5.

(i) If r is covered by two codewords x, y ∈ G_24 of Hamming weight 8, then we have r_{a_i} = x_{a_i} = y_{a_i} = 1 for some a_i ∈ {1, ..., 24}, i = 1, ..., 5. Since wt(x) = wt(y) = 8, x and y can then differ in at most 6 places, which means that dist(x, y) ≤ 6. However, this contradicts the fact that d = 8.

¹ We say that a vector r ∈ F_2^n is covered by a vector w ∈ F_2^n if r_i = 1 implies w_i = 1 for all i = 1, ..., n.

(ii) If x is a codeword with wt(x) = 8, then the number of vectors r with wt(r) = 5 that it covers is (8 choose 5), since there are 5 out of 8 positions that we can choose to set equal to 1.

(iii) We have shown that there are 759 codewords of Hamming weight 8 in G_24, and 759 · (8 choose 5) = (24 choose 5), which is exactly the number of vectors r with wt(r) = 5.

Theorem 49. The codewords x ∈ G_24 with wt(x) = 8, the so-called octads, form the blocks of a Steiner system S(5, 8, 24).

Proof. This follows immediately from theorem 48.

Since by remark 41 there is a generator matrix for G_24 whose rows all have Hamming weight 8, the octads generate G_24. Hence the extended Golay code is the vector space over F_2 spanned by the octads of the Steiner system S(5, 8, 24).

By theorem 37 we can make a Pascal triangle of the Steiner system S(5, 8, 24), which is given in table 3.2:

759
506 253
330 176 77
210 120 56 21
130 80 40 16 5
78 52 28 12 4 1
46 32 20 8 4 0 1
30 16 16 4 4 0 0 1
30 0 16 0 4 0 0 0 1

Table 3.2: The Pascal triangle of S(5, 8, 24).

From this triangle we can derive important properties of the blocks of a Steiner system S(5, 8, 24). For example, it will help us in proving that the Steiner system S(5, 8, 24) is unique, which we will see in theorem 55. First we need some preliminary results.

Theorem 50. Let {λ_ij}_{0 ≤ i ≤ j ≤ 8} be the block intersection numbers of S(5, 8, 24) as in table 3.2. Then:
(i) Two blocks can meet in either 0, 2 or 4 points,
(ii) If two blocks meet in 4 points, then their sum is again a block,
(iii) The blocks meeting any given block in 4 points determine all other blocks.

Proof. We denote the 24 points of S(5, 8, 24) by {a, b,..., x}. Since the octads of G 24 form the blocks of a Steiner system S(5, 8, 24), we can view addition of blocks in S(5, 8, 24) as taking their symmetric difference. E.g. {abcdef gh} + {abcdijkl} = {ef ghijkl}. (i) In table 3.2 we see that the number of blocks in S(5, 8, 24) containing 5, 6, 7 or 8 points are λ 5 =... = λ 8 = 1, so blocks cannot meet in 5 points or more. The octads of G 24 form the blocks of a Steiner system S(5, 8, 24), and since the weight of all codewords in G 24 is even, blocks can meet in 0, 2 or 4 points only. (ii) Let A and B be two blocks that meet in 4 points, say A = {abcdefgh} and B = {abcdijkl}. In S(5, 8, 24) each set of 5 points is covered by exactly one block. Suppose A + B = {efghijkl} is not a block. Then, if C is the block that contains {efghi}, by (i) it must contain at least one more point of B, say C = {efghijmn}. Likewise, the block D that contains {efghk} must contain at least one more point of B, say D = {efghklop}. Now let E be the block that contains {efgik}. By (i) it holds that E must contain at least one more point of A. If this point is a, b, c or d, then we see that E must also contain at least one more point of B, etc. However, then E meets A and B in more than 4 points, a contradiction with (i). If it is h, then E meets C and D in 5 points, also a contradiction to (i). Hence, A + B must be a block. (iii) Suppose we have all the blocks in S(5, 8, 24) that meet any given block A in 4 points. Then by (i) we must find all blocks that meet A in either 0 or 2 points. By table 3.2 we know that the number of blocks that meet A in 0 points is λ 08 = 30, and the number of blocks that meet A in 2 points is λ 28 = 16. We can write A = {abcde...} since any five points of a block uniquely determine three others. The number of blocks that contain {abcd} is λ 4 = 5, so next to A we have 4 more which we denote by B 1, B 2, B 3 and B 4. Likewise, the 4 blocks other than A that contain {abce} we denote by C 1, C 2, C 3 and C 4. From (ii) we know that the B n + C m are blocks for n, m = 1,..., 4. Since they are clearly distinct and #{B n + C m n, m = 1,..., 4} = 16, they are the blocks that meet A in 2 points: in d and e. Likewise, the B n + B m are blocks for n, m = 1,..., 4. Obviously, the B n + B m all meet A in 0 points and #{B n + B m n, m = 1,..., 4} = 3! = 6. Since instead of e we can also choose a, b, c or d, we have 5 choices in total. This gives us the 5 6 = 30 blocks that meet A in 0 points. We obtain other useful properties of the blocks of S(5, 8, 24) by looking at sextets. These will help us by proving uniqueness of the Steiner system S(5, 8, 24). Since for any four points a fifth uniquely determines a block, we can divide the 24 points into 6 distinct subsets of 4 points as follows: if we have {abcd}, then {e} gives us {fgh}, {i} gives us {jkl}, etc. We obtain the following partition of S(5, 8, 24): 28

{abcd} {e} {fgh} {i} {jkl} {m} {nop} {q} {rst} {u} {vwx} Table 3.3.: Partition of S(5, 8, 24). We can thus say that this partition is defined by {abcd}. Definition 51. A sextet is a partition of S(5, 8, 24) into 6 disjunct subsets that is defined by a set of 4 points. These 6 subsets are called tetrads. Remark 52. Note that the definition of a sextet implies that any two tetrads together form a block of S(5, 8, 24). We can look at the number of points in which a block meet such sextets. We denote this number by x k 1 1 x k 6 6, where x i is the number of points in which the block meets tetrad i, and k i gives the amount of tetrads for which this number is the same. E.g. for the sextet in table 3.3 and block {abcdefgh}, this number is 4 2 0 4. Theorem 53. The number of points in which a block meets a sextet is either 4 2 0 4, 2 4 0 2 or 3 1 1 5. Proof. Suppose we have a sextet where T is one of its tetrads. Let B = {abcdefgh} be any block. Then B can meet T in 4, 3, 2, 1 or 0 points. Firstly suppose B meets T in 4 points, say in {abcd}. Let S be any other tetrad. Then by remark 52 also {abcd} S is a block. If {efgh} meets S in 1, 2 or 3 points, then we are in the situation in which two blocks meet in 5, 6 or 7 points respectively, a contradiction to theorem 50 (i). Therefore, {efgh} must meet S in 0 or 4 points. If they meet in 0 points, then there must be another tetrad S that meets {efgh} in 4 points. Hence, if B meets a tetrad in 4 points, then x k 1 1 x k 6 6 = 4 2 0 4. Secondly suppose that B meets T in 3 points. Then by theorem 50 (ii) any other tetrad S must meet B in precisely 1 point. Hence, the only possibility is that x k 1 1 x k 6 6 = 3 1 1 5. Now suppose that B meets T in 2 points. Let S be any other tetrad. Since T S is a block, by theorem 50(ii) B and S can only meet in 0 or 2 points. The only possibility thus is x k 1 1 x k 6 6 = 2 4 0 2. Finally, when B meets T in 0 points, obviously the only possibilities are x k 1 1 x k 6 6 = 4 2 04 or 2 4 0 2. Definition 54. If we label the tetrads of any two sextets A and B as A 1,..., A 6 and B 1,..., B 6, then the ij th entry of the 6 6 intersection matrix I AB of A and B is given by the number of points that are both in A i and B j, for all i, j = 1,..., 6. 29

Using remark 52 and theorem 53 we can easily verify that an intersextion matrix of any two different sextets must be of the following form: 4 0 0 0 0 0 0 4 0 0 0 0 0 0 4 0 0 0 0 0 0 4 0 0 if the sextets are equal, (3.1) 0 0 0 0 4 0 0 0 0 0 0 4 2 2 0 0 0 0 2 2 0 0 0 0 0 0 2 2 0 0 0 0 2 2 0 0 if two colums of a sextet meet the other sextet in 4 2 0 4 points, 0 0 0 0 2 2 0 0 0 0 2 2 (3.2) 2 0 0 0 1 1 0 2 0 0 1 1 0 0 2 0 1 1 0 0 0 2 1 1 if two columns of a sextet meet the other sextet in 2 4 0 2 points, 1 1 1 1 0 0 1 1 1 1 0 0 (3.3) 3 0 0 0 0 1 1 0 0 0 0 3 0 1 1 1 1 0 0 1 1 1 1 0 if two columns of a sextet meet the other sextet in 3 1 5 1 points. 0 1 1 1 1 0 0 1 1 1 1 0 (3.4) We have now obtained enough tools to prove the following theorem. Theorem 55. The Steiner system S(5, 8, 24) is unique. Proof. We will show that an arbitraty block O of S(5, 8, 24) determines all others, which means that S(5, 8, 24) is unique. For this, we will look at 7 sextets that are uniquely determined by O. Then we will look at their intersection matrices and pick the ones that are of the same form as in 3.2. We will then show that with these intersection matrices we can obtain all blocks that meet O in 4 points, and theorem 50 (iii) will complete the proof. Suppose we have a Steiner system S(5, 8, 24) where we denote its points as a, b,..., x, and let O be any block, say O = {abcdefgh}. Let i be a point that is not in O and let S 1 be the sextet that is defined by {abcd}, say S 1 is as in table 3.3. If we let the disjoint 30

sets of 4 points be the columns of a 4 6-matrix, then we can represent S 1 as: a e i m q u S 1 = b f j n r v c g k o s w, d h l p t x where ofcourse permutations inside the columns are allowed. We will now construct six more sextets that are determined by O and call them S 2,..., S 7. Firstly, we look at the block {bcdei...}. By theorem 53 the number of points in which this block meets S 1 can only be 3 1 1 5. So, we can say that this block is {bcdeimqu}, and that the sextet S 2 that is defined by {bcde} is: b a i j k l S 2 = c f m n o p d g q r s t. e h u v w x If we now put the columns of S 2 that do not meet O in a 4 4-matrix B, we obtain: i j k l B = m n o p q r s t. u v w x From this matrix B we can make several new blocks by adding rows and columns: (i) The sums of any two rows of B give us 6 blocks, (ii) the sums of any two columns of B give us 6 more, and (iii) the sums of (i) and (ii) give us 6 6 = 2 182. This gives us 6 + 6 + 18 = 30 blocks that are disjoint from O, and in table 3.2 we see that λ 08 = 30, which means that we actually obtained all blocks which are disjoint from O. We now write: 1 2 3 4 5 6 S 1 = 1 2 3 4 5 6 1 2 3 4 5 6, 1 2 3 4 5 6 meaning we set S 1 as a basis matrix. Since permutations inside columns are allowed, the digits in column i all get label i. Then S 2 becomes: 2 1 3 3 3 3 S 2 = 1 2 4 4 4 4 1 2 5 5 5 5. 1 2 6 6 6 6 2 Here we have to divide by two, since otherwise we count all octads twice. For example, rows 1 and 2 with columns 1 and 2 yield the same octad as rows 3 and 4 with columns 3 and 4. 31
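The count of 30 blocks disjoint from O can be reproduced by plain set arithmetic from the 4 x 4 matrix B above. The Python sketch below forms the unions (i) of two rows and (ii) of two columns, and (iii) the symmetric differences of a row pair with a column pair, and counts the distinct 8-point sets; it does not verify that these sets are blocks, which is the content of the surrounding argument.

    from itertools import combinations

    # The 4x4 array B of the 16 points outside O = {abcdefgh}, as in the text.
    B = [list("ijkl"), list("mnop"), list("qrst"), list("uvwx")]
    rows = [set(B[i]) for i in range(4)]
    cols = [set(B[i][j] for i in range(4)) for j in range(4)]

    candidates = set()
    for a, b in combinations(range(4), 2):
        candidates.add(frozenset(rows[a] | rows[b]))     # (i) two rows: 6 sets
        candidates.add(frozenset(cols[a] | cols[b]))     # (ii) two columns: 6 sets
    for a, b in combinations(range(4), 2):               # (iii) a row pair combined
        for c, d in combinations(range(4), 2):           #      with a column pair: 18 sets
            candidates.add(frozenset((rows[a] | rows[b]) ^ (cols[c] | cols[d])))

    print(len(candidates))                # 30 distinct sets disjoint from O
    print({len(s) for s in candidates})   # {8}: each of them has 8 points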

For example the first column (2 1 1 1) t corresponds to the fact that a is in the second column in S 2, and b, c and d are in its first. By theorem 53 the number of points in which the block {acdei...} meets both S 1 and S 2 can only be 3 1 1 5, which means that this block is actually {acdeinsx}. The sextet S 3 that is generated by {acde} can be of different forms by theorem 53, but if keep in mind that it cannot give rise to any of the 30 blocks as in (i), (ii) or (iii), it can only be: 1 1 3 4 6 5 S 3 = 2 2 4 3 5 6 1 2 6 5 3 4. 1 2 5 6 4 3 By taking a closer look to the three sextets S 1, S 2 and S 3, we see that they are invariant under the following permutations of g, h,..., x: τ 1 = (mqu)(nsx)(jkl)(rwp)(otv) τ 2 = (mq)(pt)(jk)(vw)(ns)(or) τ 3 = (gh). The block {abcei...} meets the first columns of S 1, S 2 and S 3 in three points, so the number of points in which it meets either one of these sextets can only be 3 1 1 5. This means that its last three digits cannot be {jkl}, {nsx} or {mqu}. What remains is {otv} or {prw}. We see that τ 2 ({otv}) = (prw), so we can choose the block {abcei...} to be {abceiprw}. Hence, the sextet S 4 that is defined by {abce} is given by: 1 1 3 4 5 6 S 4 = 1 2 5 6 3 4 1 2 6 5 4 3. 2 2 4 3 6 5 In a similar manner we can see that the sextet S 5 that is defined by {abde} is given by: 1 1 3 4 5 6 S 5 = 1 2 6 5 4 3 2 2 4 3 6 5. 1 2 5 6 3 4 Now by theorem 53 the number of points in which the block {abefi...} meets S 1 is 2 4 0 2, and since τ 1 leaves S 1 unchanged we can assume the four set of two points where they meet are in the first four columns of S 1. Hence, this block is {abefijmn}, and this gives rise to the sextet S 6 : 1 1 3 3 4 4 S 6 = 1 1 3 3 4 4 2 2 5 5 6 6. 2 2 5 5 6 6 32

Finally, in a similiar way we can deduce that the block {aceik...} actually is {aceigkmo} by using the fact that τ 3 leaves S 1 unchanged. This gives rise to the sextet S 7 : 1 1 3 3 5 5 S 7 = 2 2 4 4 6 6 1 1 3 3 5 5. 2 2 4 4 6 6 Now we will show that the sextets S 1,..., S 7 which are uniquely determined by O actually determine all blocks that meet O in 4 points. For this, we firstly notice that the number of blocks containing four points is λ 4 = 5, and the number of ways to choose 4 points out of 8 is ( 8 4) = 70, so the number of blocks meeting O in 4 points is (5 1) 70 = 280. Hence, we will check if S 1,..., S 7 determine 280 different blocks that meet O in 4 points. By theorem 50 (ii) the sum of two blocks that meet in 4 points again is a block. If we now add blocks from two sextets that have an intersection matrix of the form as in 3.2, we will obtain new blocks and hence new sextets. An easy verification shows that only intersection matrices I S1 S 6, I S2 S 6, I S3 S 6, I S1 S 7, I S2 S 7, I S5 S 7 and I S6 S 7 are of this form. So we have 7 duo s of matrices from which we can obtain new blocks. For each of these duo s, the number of new blocks that we can make from them that meet O in 4 points is 8, but only four of them are different. For example, from the sextets S 6 and S 7 we can add the following blocks: S 6,1 S 6,3 + S 7,1 S 7,3, S 6,1 S 6,4 + S 7,1 S 7,4, S 6,1 S 6,5 + S 7,1 S 7,5 and S 6,1 S 6,6 + S 7,1 S 7,6, which are the same as S 6,2 S 6,3 + S 7,2 S 7,3, S 6,2 S 6,4 + S 7,2 S 7,4, S 6,2 S 6,5 + S 7,2 S 7,5 and S 6,2 S 6,6 + S 7,2 S 7,6. This means that in total we can obtain 7 4 = 28 new sextets that are defined by 4 points of O. We already had S 1,..., S 7, so in total we now have 35 such sextets. For any of these 35 sextets there are 8 possibilities of combining two of its tetrads in order to obtain a block that meets O in 4 points (namely, tetrads 1+3, 1+4, 1+5, 1+6, 2+3, 2+4, 2+5 or 2+6), so in total we have now find 35 8 = 280 blocks that meet O in 4 points. To conclude, the arbitrary block O determines all blocks that meet O in 4 points, which by theorem 50 (iii) determine all blocks of S(5, 8, 24). So, S(5, 8, 24) is unique. Corollary 56. G 24 is unique. Proof. We know the octads generate G 24 and that they form the blocks of the Steiner system S(5, 8, 24), which by theorem 55 is unique. Hence, G 24 is unique. Remark 57. Note that G 24 is unique up to equivalence, since a permutation of the columns of a generator matrix G corresponds to a permutation of the digits in the codewords, and thus maps any octad to another octad. However, we already stated in remark 41 that in general G 24 refers to any of the linear codes that are equivalent to the one in definition 40, so we can simply say that G 24 is unique. 33
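The count at the end of the proof is easy to double-check; a short Python computation:

    from math import comb

    # lambda_4 = 5 blocks pass through any 4 chosen points of O, one of them being O itself.
    print((5 - 1) * comb(8, 4))   # 280 blocks meeting O in exactly 4 points
    # The 7 + 28 = 35 sextets defined by 4 points of O, with 8 usable tetrad pairs each:
    print(35 * 8)                 # 280 as well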

3.2. The Golay code

Now that we have defined the extended Golay code G 24 and discussed its most important features, it is easy to construct the Golay code G 23.

Definition 58. The Golay code is the linear code G 23 which is obtained from G 24 by omitting the last digit from each codeword x ∈ G 24.

Remark 59. This definition is equivalent to stating that G 23 is the linear code whose generator matrix is obtained from the generator matrix G of G 24 by deleting its last column. Moreover, like the extended Golay code, G 23 refers to any of the linear codes that are linearly equivalent to the one in definition 58, so in fact we can remove the i-th digit of every codeword x ∈ G 24, for any i, to obtain the Golay code G 23. This means that the block length of G 23 is 23, its dimension remains 12 and its minimum Hamming distance is 7. Deleting one digit of course preserves linearity, so G 23 is a [23, 12, 7]-code. We can also construct G 24 from G 23 by adding a so-called overall parity check: to every codeword x ∈ G 23 we add a 1 as a 24th digit if the Hamming weight of x is odd, and we add a 0 if the Hamming weight of x is even. This explains why G 24 is called the extended Golay code.

Theorem 60. G 23 is a perfect code.

Proof. We calculate the sphere packing bound for G 23. Since d = 7 and t = (7 − 1)/2 = 3, we obtain:

    (23 choose 0) + (23 choose 1) + (23 choose 2) + (23 choose 3) = 1 + 23 + 253 + 1771 = 2048 = 2^(23 − 12),

and by definition 26 it follows that G 23 is perfect.

By remark 38 we know that if a Steiner system S(t, k, v) exists, then a Steiner system S(t − 1, k − 1, v − 1) exists as well. As the octads of G 24 form the blocks of the Steiner system S(5, 8, 24), a Steiner system S(4, 7, 23) must exist as well. The codewords of G 23 of Hamming weight 7 actually form its blocks.
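The sphere packing computation in theorem 60 can be verified directly; a short Python check:

    from math import comb

    # Hamming balls of radius t = 3 around the 2^12 codewords of the [23, 12, 7] code
    # G_23 must cover F_2^23 exactly for the code to be perfect.
    ball = sum(comb(23, i) for i in range(4))
    print(ball)                      # 1 + 23 + 253 + 1771 = 2048 = 2^11
    print(ball * 2**12 == 2**23)     # True, so G_23 is perfect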

Table 3.4 gives an overview of the properties of the two binary Golay codes, and table 3.5 gives their weight distributions:

                                     G 24                     G 23
    [n, k, d]                        [24, 12, 8]              [23, 12, 7]
    perfect                          no                       yes
    error-correcting capability t    3                        3
    self dual                        yes                      no
    Steiner system                   S(5, 8, 24)              S(4, 7, 23)
    generators/blocks                codewords of weight 8    codewords of weight 7

    Table 3.4.: Properties of G 24 and G 23.

    Hamming weight i    0    7    8    11    12    15    16    23    24
    G 24                1    -    759  -     2576  -     759   -     1
    G 23                1    253  506  1288  1288  506   253   1     -

    Table 3.5.: Weight distributions of G 24 and G 23.

3.3. Decoding

We use the Golay codes in communication because they give a small risk of misinterpreting a message that was sent through a noisy channel. In this section we describe how this interpreting, the decoding of received vectors, is done.

For G 24 we have d = 8 and for G 23 we have d = 7, so the error-correcting capability of both codes is t = 3. This means we can correct all error vectors of Hamming weight 3 or less.

Firstly we explain the decoding procedure for G 24. We know that the syndrome of a received vector is the same as the syndrome of its error vector, so we will look at the error vector only. Now suppose that e is an error vector of weight 3 or less, and write e = (e_L e_R). Since the generator matrix G of definition 40 is a parity check matrix as well, there are two ways to compute the syndrome:

    S_1(e) = eH^t = (e_L e_R)(B 1_12)^t = e_L B^t + e_R, and
    S_2(e) = eG^t = (e_L e_R)(1_12 B)^t = e_L + e_R B^t = S_1(e)B^t.

Since wt(e) ≤ 3, either wt(e_L) ≤ 1 or wt(e_R) ≤ 1:

(i) If wt(e_L) ≤ 1, then either: wt(e_L) = 0, in which case wt(e_R) ≤ 3 and S_1(e) = e_R, or wt(e_L) = 1, in which case S_1(e) = b + e_R, where b is a column of B and wt(e_R) ≤ 2.

(ii) If wt(e_R) ≤ 1, then either: wt(e_R) = 0, in which case wt(e_L) ≤ 3 and S_2(e) = e_L, or wt(e_R) = 1, in which case S_2(e) = e_L + b, where b is a column of B and wt(e_L) ≤ 2.

Hence, the syndrome is a column of B with at most two digits changed. This allows us to construct the following algorithm to decode a given vector r when using the extended Golay code:

Algorithm 61. Decoding the extended Golay code G 24. For a received vector r ∈ F_2^24:

(i) Compute the syndrome S_1(r) = rH^t = r(B 1_12)^t.

(ii) If wt(S_1(r)) ≤ 3, then the error vector is e = (0 | S_1(r)) and you can complete the decoding as in algorithm 24.

(iii) If wt(S_1(r)) > 3, then compute wt(S_1(r) + b_i), where b_i is the i-th column of B, for all i = 1, ..., 12. If wt(S_1(r) + b_i) ≤ 2 for some i, then the error vector is e = (δ_i | S_1(r) + b_i), where δ_i is the vector in F_2^12 with a 1 in position i and 0 in all other positions. If wt(S_1(r) + b_i) ≤ 2 holds for more than one i, choose an i for which this weight is smallest. You complete as in algorithm 24.

(iv) If wt(S_1(r) + b_i) > 2 for all i = 1, ..., 12, then compute the syndrome S_2(r) = S_1(r)B^t.

(v) If wt(S_2(r)) ≤ 3, then the error vector is e = (S_2(r) | 0) and you complete as in algorithm 24.

(vi) If wt(S_2(r)) > 3, then compute wt(S_2(r) + b_i) for all i = 1, ..., 12. If wt(S_2(r) + b_i) ≤ 2 for some i, then the error vector is e = (S_2(r) + b_i | δ_i), and if wt(S_2(r) + b_i) ≤ 2 holds for more than one i, choose an i for which this weight is smallest. You complete as in algorithm 24.

Example 62. Suppose we receive a vector r with syndrome S_1(r) = (100011010010). We will decode it using algorithm 61.

(i) S_1(r) = (100011010010).

(ii) wt(S_1(r)) = 6 > 3.

(iii) wt(S_1(r) + b_1) = wt(010100010111) = 6 > 2,

wt(s 1 (r) + b 2 ) = wt(001101011001) = 6 > 2, wt(s 1 (r) + b 3 ) = wt(111111000101) = 8 > 2, wt(s 1 (r) + b 4 ) = wt(011011111111) = 10 > 2, wt(s 1 (r) + b 5 ) = wt(010010001001) = 4 > 2, wt(s 1 (r) + b 6 ) = wt(000001100101) = 4 > 2, wt(s 1 (r) + b 7 ) = wt(100110111101) = 8 > 2, wt(s 1 (r) + b 8 ) = wt(101000001111) = 6 > 2, wt(s 1 (r) + b 9 ) = wt(110101101011) = 8 > 2, wt(s 1 (r) + b 10 ) = wt(111000110001) = 6 > 2, wt(s 1 (r) + b 11 ) = wt(111101000100) = 6 > 2, wt(s 1 (r) + b 12 ) = wt(011001011001) = 6 > 2. (iv) wt(s 1 (r) + b i ) > 3 for all i = 1,..., 12, so we compute the syndrome S 2 (r) = S 1 (r)b t = (100110100011). (v) wt(s 2 (r)) = 6 > 2. (vi) wt(s 2 (r) + b 1 ) = wt(010001100010) = 4 > 2, wt(s 2 (r) + b 2 ) = wt(001000101100) = 4 > 2, wt(s 2 (r) + b 3 ) = wt(111010110000) = 6 > 2, wt(s 2 (r) + b 4 ) = wt(011110001010) = 6 > 2, wt(s 2 (r) + b 5 ) = wt(010111111100) = 8 > 2, wt(s 2 (r) + b 6 ) = wt(000100010000) = 2, wt(s 2 (r) + b 7 ) = wt(100000001100) = 3 > 2, wt(s 2 (r) + b 8 ) = wt(101101111110) = 9 > 2, wt(s 2 (r) + b 9 ) = wt(110000011010) = 5 > 2, wt(s 2 (r) + b 10 ) = wt(001011010010) = 5 > 2, wt(s 2 (r) + b 11 ) = wt(111101000000) = 5 > 2, wt(s 2 (r) + b 12 ) = wt(011001011101) = 7 > 2. Our error vector is e = (δ 6 S 2 (r)) = (000001000000 000100010000), the original codeword is x = r e, and therefore the original message is m = x L. For the decoding procedure of G 23 we make use of the fact that G 24 is obtained from G 23 by adding an overall parity check to its codewords: if we add a 24 th digit to a received vector r F 23 2, we can use the standard array. 37

For the decoding procedure of G 23 we make use of the fact that G 24 is obtained from G 23 by adding an overall parity check to its codewords: if we add a 24th digit to a received vector r ∈ F_2^23, we can use the standard array. Note that all codewords of G 24 have even Hamming weight, so if a received vector in F_2^24 has odd Hamming weight then we know that an odd number of errors has occurred. Since the error-correcting capability of G 23 is also 3, we assume that at most 3 errors occurred in the 23 digits of r. We therefore choose the 24th digit r_24 in such a way that wt(r_1 ... r_23 r_24) is odd, for then we know that the Hamming weight of the error vector is at most 3, and we can use the standard array. Moreover, since G 23 is perfect and t = (7 − 1)/2 = 3, we can correct all error vectors of Hamming weight 3 or less uniquely. The algorithm for decoding both Golay codes by using GAP can be found in [14].

Now that we have explored the most important properties of the Golay codes, we will discuss a group which has an interesting connection to G 24, the Mathieu group M 24.

Chapter 4 The Mathieu Group M 24 4.1. Introduction The Mathieu groups are five groups which are commonly denoted by M 11, M 12, M 22, M 23 and M 24. Their characterizing features are very rare, and as a result they do not belong to any of the larger families in the classification of all finite simple groups. Therefore, they are called sporadic. This classification is as follows: Any finite simple group belongs to either one of the following categories: Cyclic groups C p, where p is prime, Alternating groups A n, where n 5, Lie-type groups, including both: Classical Lie-groups, and Exceptional and twisted groups of Lie-type, Sporadic groups. The last category consists of a collection of groups that do not have common characteristics with the first three categories, but cannot be charcaterized as a fourth family either. The Mathieu groups are multiple transitive permutation groups, but they are not alternating, which is the reason why they do not fall into either of the first three categories. These characteristics will be explained in detail in the remainder of this chapter. Mathieu first discovered M 12 in 1861, and already briefly mentioned M 24. For a long time the existence of M 24 was controversial, and in 1898 Miller even showed that this group could not exist. However, two years later he found a mistake in his proof. It was not until 1938 when Witt finally proved its existence, by showing that it was an automorphism group of the Steiner system S(5, 8, 24). Since the blocks of this Steiner 39

system are the octads of G 24, we will devote this chaper to M 24. In section 4.2 we will discuss some definitions and results that allow us to describe the construction of M 24 in section 4.3, and to prove its multiple transitivity and simplicity in sections 4.4 and 4.5. 4.2. Definitions We will first discuss the most important definitions and results which we will need in the remainder of this chapter. To start with, as M 24 is the automorphism group of the extended Golay code, we will define what an automorphism group of an error-correcting code is. Definition 63. The automorphism group of a linear code C is the group of all permutations of the digits of the codewords that map C to itself. It is denoted by Aut(C). Remark 64. We can view Aut(C) as a subgroup of S n, where n is the block length of C: σ(x) = (x σ(1)... x σ(n) ) for any σ Aut(C). The theorems and lemmas in this sextion will be convenient in the proofs of M 24 s order, multiple transitivity and simplicity. We use the following notation: For a group G of permutations on a set X it holds that: The point-wise stabilizer of an element x X is given by Stab(x) = {g G g(x) = x}. The set-wise stabilizer of a subset Y in X is given by Stab(Y ) = {g G g(y) Y y Y }. The orbit of a subset Y in X under G is given by Y G = {g(y ) g G} = {{g(y) y Y } g G}. The normalizer of a subgroup H of G is given by N G (H) = {g G gh = Hg}. Note that N G (H) always contains H and that H is a normal subgroup if and only if N G (H) = G. Theorem 65 (Orbit-Stabilizer theorem). For a group G of permutations on a set X and any Y X it holds that: #G = #Stab(Y ) #Y G. Proof. For any g, h G we have that gy = hy if and only if gh 1 Stab(Y ), so the size of Y G is equal to the number of cosets of Stab(Y ) in G, i.e. #Y G = [G : Stab(Y )] = #G #Stab(Y ). Definition 66. Let G be a group of permutations that acts on a set X. Then G is transitive on X if it has one orbit, i.e. if x G = X for all x X. 40
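As a toy illustration of theorem 65 (with a hypothetical example, not one from the text): S_4 acting on {0, 1, 2, 3}, with Y = {0, 1} and its set-wise stabilizer.

    from itertools import permutations

    G = list(permutations(range(4)))                           # S_4 as a list of tuples
    Y = frozenset({0, 1})
    orbit = {frozenset(g[y] for y in Y) for g in G}            # the orbit of Y under G
    stab = [g for g in G if frozenset(g[y] for y in Y) == Y]   # set-wise stabilizer of Y
    print(len(G), len(stab) * len(orbit))                      # 24 24, as theorem 65 predicts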

One of the characterizing properties of the Mathieu groups is mutiple transitivity, which is defined as follows: Definition 67. Let k be any integer and let G be a group of permutations that acts on a set X with #X k. Then G is k-fold transitive if for any two ordered k-tuples (x 1,..., x k ) and (y 1,..., y k ) in X with x i x j and y i y j for all i j, there is a permutation σ G such that σ(x i ) = y i for i = 1,..., k. Lemma 68. A group of permutations G that acts on a set X with #X 3 is k-fold transitive for 2 k #X if and only if the stabilizer of any k 1 points x 1,..., x k 1 X is transitive on X\{x 1,..., x k 1 }. Proof. Assume that G is k-fold transitive. Then for any two ordered k-tuples (x 1,..., x k 1, w) and (x 1,..., x k 1, z) in X with w, z x 1,..., x k 1 there is a permutation σ G such that σ(x i ) = x i for all i = 1,..., k 1 and σ(w) = z. Hence, Stab(x 1,..., x k 1 ) acts transively on X\{x 1,..., x k 1 }. Now assume that for any x 1,..., x k 1 X with x i x j for all i j we have that Stab(x 1,..., x k 1 ) acts transitively on X\{x 1,..., x k 1 }. Let (x 1,..., x k ) and (y 1,..., y k ) be two ordered k-tuples in X such that x i x j and y i y j for all i j. Firstly suppose that y k x 1,..., x k 1. Then there are σ Stab(x 1,..., x k 1 ) and τ Stab(y k ) such that τ σ(x 1,..., x k ) = τ(x 1,..., x k 1, y k ) = (y 1,..., y k ). Now if y k = x i for some i = 1,..., k, then we make an additional step: choose some d X such that d x i, y 1,..., y k 1, which is possible since k #X. Then there are σ Stab(x 1,..., x k 1 ), τ Stab(d) and υ Stab(y 1,..., y k 1 ) such that: υ τ σ(x 1,..., x k ) = υ τ(x 1,..., x k 1, d) = υ(y 1,..., y k 1, d) = (y 1,..., y k ). So, G is k-fold transitive. Lemma 69. Let G be a group of permutations on a set X with #X k, that is k-fold transitive for some k 4. If there is an x X such that Stab(x) is simple, then G is simple. Proof. See [19] p. 263. Definition 70 (Sylow p-subgroup). Let p be a prime number and let n, k be integers such that p k. If G is a finite group with #G = kp n, then a Sylow p-subgroup S p is a subgroup of G such that #S p = p n. I.e. it is a subgroup whose order is the highest power of p that divides the order of G. Theorem 71 (Sylow s theorem). Let p be any prime number and let n, k be integers such that p k. For any finite group G with #G = kp n it holds that: (i) G has at least 1 Sylow p-subgroup, 41

(ii) The number of Sylow p-subgroups is congruent to 1 mod p,

(iii) All Sylow p-subgroups of G are conjugate, i.e. if S_p and S_p' are Sylow p-subgroups then there is a g ∈ G such that g^(-1) S_p g = S_p'.

Proof. See [8], p. 99-102.

Since G 24 is generated by the octads, M 24 must be the set of permutations in S 24 that map each octad to an octad. But what exactly do these permutations look like? A few of them are found in a projective special linear group.

Definition 72. The projective special linear group PSL 2 (23) is given by the set of 2 x 2-matrices with entries in F 23 and determinant 1, factored out by the scalar matrices:

    PSL 2 (23) = { (a b; c d) | a, b, c, d ∈ F 23, ad − bc = 1 } / { λ·1_2 | λ = 1, −1 }.

Let P(F 23) be the projective line over F 23, so P(F 23) = {0, 1, ..., 22, ∞}. Then PSL 2 (23) acts on P(F 23) as follows:

    (a b; c d) · x = (ax + b)/(cx + d),    for all x ∈ P(F 23).

Hence, PSL 2 (23) is a set of permutations of the 24 points in P(F 23). One of the elements of PSL 2 (23) is τ: by a simple relabeling of the columns of G = (L R) in theorem 43, we can write:

    τ = (∞ 0)(1 22)(2 11)(3 15)(4 17)(5 9)(6 19)(7 13)(8 20)(10 16)(12 21)(14 18),

i.e. τ(i) = −i^(−1) mod 23, for i ∈ P(F 23). In matrix notation this becomes:

    τ = (0 −1; 1 0),

and since det(τ) = 0 · 0 − 1 · (−1) = 1, we have that τ ∈ PSL 2 (23).

Theorem 73. PSL 2 (23) = {υ^i σ^j, υ^i σ^j τ σ^k | 0 ≤ i ≤ 10, 0 ≤ j, k < 23}, where:

    σ = (1 1; 0 1),    υ = (ε 0; 0 ε^(−1)),    τ = (0 −1; 1 0),

and ε is a primitive element of F 23.

( ) a b Proof. For any element PSL c d 2 (23) and any x P (F 23 ) we have: ( ) a b (i) If c = 0 then d = 1, so x = a a 0 d 2 x + ab, (ii) If c 0 then b = ad c 1 c, so: ( a ad c c 1 c d ) a(cx + d) 1 x = c(cx + d) = a c 1 c 2 x + cd. This shows that PSL 2 (23)x = {a 2 x + ab a F 23, b F 23 } { a c 1 xc 2 +cd a, b, c F 23}, for any x P (F 23 ). Actually, we can show that: {a 2 x + ab a F 23, b F 23 } = {υ i σ j (x) 0 i 10, 0 j < 23} and (4.1) { } a c 1 xc 2 + cd a, b, c F 23 = {υ i σ j τσ k 0 i 10, 0 j, k < 23}, (4.2) which will prove the theorem. For this firstly note that: so for any x P (F 23 ): {a 2 a F 23} = {2 i mod 23 0 i 10} = {ε 2i mod 23 0 i 10}, {a 2 x + ab a F 23, b F 23 } = {ε 2i x + ε i b 0 i 10, b F 23 }, and { } { } a c 1 a c 2 x + cd a, c, d F 23 = ε 1 i ε 2i x + εd 0 i 10, a, d F 23. Now if we let a = ε i for any 0 i 10, then: υ i σ ab (x) = σ ab (ε 2i (x)) = ε 2i x + ab = ε 2i x + ε i b, and since b F 23 we have that 0 ab < 23, which proves (4.1). Also, if we let c = ε i for any 0 i 10, then: υ i σ cd τσ a c (x) = σ cd τσ a c (ε 2i x) and since a, d F 23 we have that 0 cd, a c = τσ a c (ε 2i x + cd) = σ a 1 c ( ε 2i x + cd ) 1 = ε 2i x + cd + a c = a ε 1 i ε 2 x + cd, < 23, which proves (4.2). 43
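The action just described is easy to experiment with. A small Python sketch that implements x → (ax + b)/(cx + d) on P(F 23) and recovers the cycle decomposition of τ: x → −1/x stated above:

    P = 23
    INF = "oo"                                   # the point at infinity of P(F_23)

    def moebius(a, b, c, d, x):
        """Action of the matrix (a b; c d) on the projective line P(F_23)."""
        if x == INF:
            return INF if c % P == 0 else a * pow(c, -1, P) % P
        num, den = (a * x + b) % P, (c * x + d) % P
        return INF if den == 0 else num * pow(den, -1, P) % P

    tau = lambda x: moebius(0, -1, 1, 0, x)      # tau = (0 -1; 1 0)

    cycles, seen = [], set()
    for x in [INF] + list(range(P)):             # extract the cycle decomposition of tau
        if x not in seen:
            cycle, y = [x], tau(x)
            seen.add(x)
            while y != x:
                cycle.append(y); seen.add(y); y = tau(y)
            cycles.append(tuple(cycle))
    print(cycles)   # [('oo', 0), (1, 22), (2, 11), (3, 15), ...] as in the text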

4.3. Construction

We have now obtained the necessary definitions and results, so we can go on to construct the Mathieu group M 24. First of all, we have seen that τ preserves G 24, so it is an element of M 24. The following theorem says that also the other two generators of PSL 2 (23), σ and υ, are elements of M 24.

Theorem 74. G 24 is preserved by PSL 2 (23).

Proof. We know that PSL 2 (23) is generated by τ, σ and υ, and theorem 43 says that τ preserves G 24. So what remains is to prove that σ and υ preserve G 24 too. For this, we consider G 23 as a QR-code with the digits of the codewords labeled as {0, 1, ..., 22}. Then G 24 is the code that is obtained by adding an overall parity check to the codewords of G 23, which is labeled as ∞. Then for any µ ∈ PSL 2 (23) and x ∈ G 24, the action of PSL 2 (23) on G 24 is as follows:

    µ(x) = µ(x_0 x_1 ... x_22 x_∞) = (x_µ(0) x_µ(1) ... x_µ(22) x_µ(∞)).

On {0, 1, ..., 22} the map σ is the cyclic shift i → i + 1 and it fixes ∞; since G 23 is a cyclic code, G 24 is fixed by σ. If we choose the primitive element ε such that ε² = 2 (for example ε = 5), then υ is the map i → 2i on {0, 1, ..., 22}, so for any polynomial f(X) ∈ F_2[X]/⟨X^23 − 1⟩ it holds that:

    υ(f(X)) = f(X²) = f(X)²,

so υ fixes G 23. Also, υ(∞) = ε² · ∞ = ∞, so υ fixes G 24.

We will show that there is actually one more permutation ω ∈ S 24 that fixes G 24 and, together with σ, υ and τ, generates M 24. This would then prove that ⟨σ, υ, τ, ω⟩ fixes G 24, so ⟨σ, υ, τ, ω⟩ ⊆ Aut(G 24) = M 24. If we again consider G 23 as a QR-code then we have:

    Q = {1², 2², ..., 22² mod 23} = {1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18}, and
    N = F 23 \ (Q ∪ {0}) = {5, 7, 10, 11, 14, 15, 17, 19, 20, 21, 22}.

We define the permutation ω of P(F 23) as:

    ω : i →  ∞          if i = 0,
             −(i/2)²    if i ∈ Q,
             (2i)²      if i ∈ N,
             0          if i = ∞,

with all computations carried out modulo 23. An easy verification shows that ω(Q) = N and that ω(N) = Q.
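The "easy verification" that ω interchanges Q and N can also be done by machine; a short Python check (computations mod 23, as above):

    P = 23
    Q = sorted({i * i % P for i in range(1, P)})      # quadratic residues mod 23
    N = [i for i in range(1, P) if i not in Q]        # non-residues

    def omega(i):
        if i == 0:
            return "oo"
        if i == "oo":
            return 0
        half = i * pow(2, -1, P) % P                  # i/2 in F_23
        return -half * half % P if i in Q else (2 * i) ** 2 % P

    print(Q)                                  # [1, 2, 3, 4, 6, 8, 9, 12, 13, 16, 18]
    print(sorted(omega(i) for i in Q) == N)   # True: omega(Q) = N
    print(sorted(omega(i) for i in N) == Q)   # True: omega(N) = Q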

Theorem 75. G 24 is invariant under ω. Proof. We will make use of a polynomial φ(x) that is defined as follows: φ(x) := i N X i. If this polynomial generates G 23, then we can write any polynomial in G 24 as φ(x) X i + X for i = 0,..., 22. Hence, if we prove that ω(φ(x) X i +X ) G 24 for all i = 0,..., 22 we are done. For this we notice that F 23 = {2 i mod 23 i = 1,..., 11} {2 i 22 mod 23 i = 1,..., 11}, and we make use of the fact that υω = ωυ 2, so for all 1 i 11 it holds that: ω(φ(x) X 2i + X ) = (υω)(φ(x) X i + X ω ) = (ωυ 2 )(φ(x) X i + X ). Since υ and therefore υ 2 fixes G 24, this means it suffices to prove that ω(g(x) X i +X ) G 24 for i = 0, 1, 22. These 3 cases give: ω(φ(x) + X )) = i N = i Q ω(φ(x) X + X )) = i N ω(φ(x) X 22 + X ) = i N X ω(i) + X 0 X i + X 0 = X i + X 0 + X i + i Q i N i N ( ) ( = X i + X + i N =(φ(x) + X ) + X i + i Q i N i P (F 23 ) =(φ(x) + X ) + 1 G 24, X ω(i+1 mod23) + X 0 X i X i + X + X X i + X 0 + X ) = X ω(6) + X ω(8) + X ω(11) + X ω(12) + X ω(15) + X ω(16) + X ω(18) + X ω(20) + X ω(21) + X ω(22) + X ω(0) + X 0 = X 14 + X 7 + X 1 + X 10 + X 3 + X 5 + X 11 + X 13 + X 16 + X 4 + X + X 0 = (φ(x) X 2 + φ(x) X 11 + φ(x) X 20 + X ) G 24, and X ω(i+22 mod23) + X 0 = X ω(4) + X ω(6) + X ω(9) + X ω(10) + X ω(13) + X ω(14) + X ω(16) + X ω(18) + X ω(19) + X ω(20) + X ω(22) + X 0 45

= X 19 + X 14 + X 20 + X 9 + X 21 + X 2 + X 5 + X 11 + X 18 + X 13 + X 4 + X 0 = (φ(x) + φ(x) X + φ(x) X 20 + φ(x) X 22 + X ) G 24. Now what remains to prove is that φ(x) indeed generates G 23. For this, notice that in F 23 2 [X]/ X 23 1 we have that X 23 1 = X 23 + 1 = g(x)h(x), where: g(x) = X 11 + X 10 + X 6 + X 5 + X 4 + X 2 + 1, and h(x) = (X 11 + X 9 + X 7 + X 6 + X 5 + X + 1)(X + 1). This means that we can choose the generator polynomial to be either g(x) or X 11 + X 9 + X 7 + X 6 + X 5 + X + 1. We take G 23 = g(x). Since g(x) and h(x) are relatively prime there are f(x), f (X) such that: Moreover, we can choose f(x) such that: 1 = f(x)g(x) + f (X)h(X). φ(x) = f(x)g(x). Since g(x) and h(x) are relatively prime, also f(x) and h(x) are relatively prime, so there are p(x), p (X) such that: 1 = p(x)f(x) + p (X)h(X), so g(x) = (p(x)f(x) + p (X)h(X))g(X) = p(x)f(x)g(x) + p (X)h(X)g(X) = p(x)f(x)g(x), since p (X)h(X)g(X) = p (X) (X23 +1) g(x) g(x) = 0 in F 23 2 [X]. Hence, g(x) g(x)r(x) = φ(x). Obviously, φ(x) g(x), so G 23 = φ(x), which completes the proof. We now know that σ, υ, τ, ω M 24. What remains to prove is that there is actually no other permutation in S 24 that fixes G 24, so that we are sure that M 24 = σ, υ, τ, ω. For this, we will find the order of σ, υ, τ, ω and conclude that it must be equal to the size of M 24. Theorem 76. # σ, υ, τ, ω = 244823040. Proof. Suppose Y is a block in the Steiner system S(5, 8, 24), say Y = {abcdefgh}, where a, b, c, d, e, f, g, h {0, 1,..., 22, } and they are all distinct. Let H be the subset of Stab(Y ) that fixes an additional i {0, 1,..., 22, }, so H = {g Stab(Y ) g(i) = i}. By lemma 65 we have that # σ, υ, τ, ω = # Stab(Y ) #Y σ,υ,τ,ω. Since Y σ,υ,τ,ω is the orbit of Y under σ, υ, τ, ω and Y is a block, we have that #Y σ,υ,τ,ω is equal to the number of blocks in the Steiner system S(5, 8, 24). From table 3.1 we know that this number is equal to 759, so: # σ, υ, τ, ω = #Stab(Y ) 759. We will find Stab(Y ) in four steps, where we follow the proof as in [13], p. 639-640: 46

(i) Firstly we prove that #Stab(Y ) = 16 #H, (ii) Then we show that #H 20160 by proving that H is isomorphic to a subgroup of GL 4 (F 2 ), (iii) Thirdly we look at the action of H on Y to prove that H contains a subgroup that is isomorphic to A 8, (iv) We conclude by comparing the orders of the above mentioned groups. Step (i). σ, υ, τ, ω contains a cykel of type 1 3 5 15, for example: ω 2 σ 11 = (( )(0)(3)(15)(1 18 4 2 6)(17 11 19 22 14)(5 21 20 10 7)(8 16 13 9 12)) (( )(0 11 22 10 21 9 20 8 19 7 18 6 17 5 16 4 15 3 14 2 13 1 12)) = ( )(2 17 22)(4 13 20 21 8)(0 11 7 16 1 6 12 19 10 18 15 3 14 5 9). Notice that for any two cycles τ 1 and τ 2, we have that τ1 1 τ 2 τ 1 is obtained from τ 2 by applying τ 1 to the symbols in τ 2. For example, if τ 1 = (a b c d) and τ 2 = (a c)(e d b), then τ1 1 τ 2 τ 1 = (b d)(e a c). So, by conjugating, we may assume that the cykel of size 5 is (a b c d e) and the one of size 1 is (i). Since Y is a block of S(5, 8, 24) it must be fixed by this permutation, so the cykel of size 3 is (f g h). We call this permutation π, and note that it is contained in Stab(Y ). Now since: ω 2 σ 5 = (( )(0)(3)(15)(1 18 4 2 6)(17 11 19 22 14)(5 21 20 10 7)(8 16 13 9 12)) ((24)(1 6 11 16 21 3 8 13 18 23 5 10 15 20 2 7 12 17 22 4 9 14 19)) = ( )(1)(10 15)(4 12 11 13)(2 5 7 8 9 17 14 22)(0 21 3 16 20 6 19 18), We see that σ, υ, τ, ω also contains a cykel of type 1 2 2 4 8 2. By conjugating we may assume that Y consists of the cykel of size 4 and a cykel of size 1. Then, since Y is a block, it must contain the other cykel of size 1 and the cykel of size 2 as well. Hence, Stab(Y ) contains a permutation that fixes Y set-wise and permutes all other points in two cykels of size 8. This means that Stab(Y ) is transitive on the remaining 16 points, so #{i} Stab(Y ) = 16. H is the subgroup of Stab(Y ) that fixes {i}, so by lemma 65 #Stab(Y ) = 16 #H. Step (ii). If we look at the collection of codewords of G 24 for which a = b =... = i = 0, then we obtain a linear code of length 15. (Obviously, these entries will remain 0 whenever we add such codewords or if we multiply them with a constant.) Since {abcdefgh} is a block and G 24 is self-dual, the size of this code is 212 = 2 4. Since the codewords of G 2 8 24 have Hamming weight 0, 8, 12, 16, or 24, this new code can only contain codewords of Hamming weight 0, 8 or 12. However, if it contains a codeword of Hamming weight 12, then G 24 must contain a codeword x with wt(x L ) = 4 of Hamming weight 20 as well, 47

which is a contradiction. So, this new code is a [15, 4, 8]-code that contains 15 codewords of Hamming weight 8. Now by [13] p.639 this code has automorphism group GL 4 (F 2 ), which has order 4 i=1 (24 2 i 1 ) = 20160. If we now look at two permutations g and h in H that give the same permutation of the digits of the codewords in this new code, then we must have that g 1 h fixes 16 digits (namely all digits of G 24 except for {abcdefgh}, since H fixes this octad set-wise). However, only the identity permutation can fix 16 points, so we must have that H is isomorphic to a subgroup of GL 4 (F 2 ). Step (iii). Let H consist of those permutations of H that permute entries of the octad Y. If we look at the permutation π as in step (i) we see that: π 5 = ((a b c d e)(f g h)(i)( )) 5 = (a)(b)(c)(d)(e)(f h g)(i)( )( )( )( )( ), which is of cykel type 1 6 3 6. It shows that (f g h) is in H, and by conjugating we can get all cykels of size 3. Since A 8 is generated by these cykels, we see that A 8 H. Step (iv). From (ii) we have that H is isomorphic to a subgroup of GL 4 (F 2 ), so #H #GL 4 (F 2 ) = 20160. From (iii) we have that H contains a subgroup that is isomorphic to A 8, so #H #A 8 = 1 8! = 20160. Combining these two results yields that #H = 20160. 2 From (i) we have that #Stab(Y ) = 16 #H = 16 20160 = 322560. Finally, since # σ, υ, τ, ω = #Stab(Y ) 759, we have that: # σ, υ, τ, ω = 322560 759 = 244823040. Corollary 77. M 24 = σ, υ, τ, ω. Proof. By theorem 75 we have σ, υ, τ, ω M 24 = Aut(G 24 ), so 244823040 #Aut(G 24 ). The proof of theorem 76 still holds if we replace σ, υ, τ, ω by Aut(G 24 ), so #Aut(G 24 ) = 244823040, and hence, Aut(G 24 ) = σ, υ, τ, ω. Since M 24 is defined as Aut(G 24 ), this gives the desired result. 4.4. Multiple transitivity Multiple transitivity is one of the characterizing features of the Mathieu groups. The groups M 11, M 12, M 22, M 23 and M 24 are 4-, 5-, 3-, 4- and 5-fold transitive respectively. This section gives a proof of 5-fold transitivity of M 24. Theorem 78. M 24 is 5-fold transitive. 48

Proof. We know that M 24 = σ, υ, τ, ω, and we write the permutations σ, υ, τ and ω on P (F 23 ) as follows: σ = ( )(0 1 22) υ = ( )(0)(1 2 4 8 16 9 18 13 3 6 12)(5 10 20 17 11 22 21 19 15 7 14) τ = ( 0)(1 22)(2 11)(3 15)(4 17)(5 9)(6 19)(7 13)(8 20)(10 16)(12 21)(14 18) ω = ( 0)(3 15)(1 17 6 14 2 22 4 19 18 11)(5 8 7 12 10 9 20 13 21 16). We denote the cykeltypes of the permutations in σ, υ, τ, ω as a b 1 1 a b k k, where a i is the cykel size of cykel i and b i is the number of cykels with size a i. The permutations σ, υ and τ give us cykeltypes 1 23, 1 2 11 2 and 2 12 respectively. By taking products of these permutations we can obtain other cykeltypes. For example: ω 2 = (( 0)(3 15)(1 17 6 14 2 22 4 19 18 11)(5 8 7 12 10 9 20 13 21 16)) 2 = ( )(0)(3)(15)(1 18 4 2 6)(17 11 19 22 14)(5 21 20 10 7)(8 16 13 9 12), which gives us cykeltype 1 4 5 4, and: ω 2 σ 2 = (( )(0)(3)(15)(1 18 4 2 6)(17 11 19 22 14)(5 21 20 10 7)(8 16 13 9 12)) (( )(0 1 22)) 2 = (( )(0)(3)(15)(1 18 4 2 6)(17 11 19 22 14)(5 21 20 10 7)(8 16 13 9 12)) (( )(0 2 4 6 8 10 12 14 16 18 20 22 1 3 5 7 9 11 13 15 17 19 21)) = (21 2 6 16 4 13)(19 20 14 13 15 11 9)(17 22 18 10 8 7 12)(5)( )(0), which gives us cykel type 1 3 7 3. Likewise, (σω 2 ) 3 and (σ 13 τω 2 ) 3 give us cykeltypes 1 8 2 8 and 4 6 respectively. We can easily see that cykeltypes 1 23 and 2 12 already show that σ, υ, τ, ω is transitive: to any element a {0, 1,..., 22} we can apply σ to obtain a + 1 mod 22, so σ k (a) gives us a + k mod 22. To we can apply τ combined with σ k to obtain a + k mod 22. Since this holds for any k we have for any a, b {0, 1,..., 22, } with a b that there is a permutation τ i σ k such that τ i σ k (a) = b. We will now successively prove that σ, υ, τ, ω is 2-, 3-, 4- and 5-fold transitive by using lemma 68 and the following facts about cykels: (i) τ 1 1 τ 2 τ 1 is obtained from τ 2 by applying τ 1 to the symbols in τ 2. (ii) Any permutation of {0, 1,..., k} is obtained from (0 1 k) and (i j), where i, j {0, 1,..., k} and i j. We already have one cykel of type 1 23, namely σ, where the cykel of size 1 is ( ). By (i), conjugating σ with τ gives us another cykel of type 1 23 where the cykel of size 1 is (0). If we then conjugate this with σ k we obtain a cykel of type 1 23 where the cykel of size 1 is (k), for any k {1,..., 22}. Hence, for any a {0, 1,..., 22, }, Stab(a) contains a cykel of type 1 23, so it acts transitively on the remaining 23 points, and by 49

lemma 68 this means that σ, υ, τ, ω is 2-fold transitive. Likewise, by conjugating υ with σ and τ we can obtain all cykels of type 1 2 11 2 where one of the cykels of size 1 is ( ) and the other is any a {0, 1,..., 22}. Also, conjugating ω 2 σ 2 with σ and τ gives us all cykels of type 1 3 7 3 where one of the cykels of size 1 is ( ) and the other two are (a) and (b) for any a, b {0, 1,..., 22} with a b. Combined, this means that the stabilizer of any two points contains these two cykeltypes, and so is transitive on the remaining 22 points. By lemma 68 we now have that σ, υ, τ, ω is 3-fold transitive. In a similar manner, conjugating ω 2 with σ and τ gives us the cykels of type 1 4 5 4 where one the the cykels of size 1 is ( ), and the other three are (a), (b) and (c) for any distinct a, b, c {0, 1,..., 22}. Hence, the stabilizer of any three points contains cykels of type 1 3 7 3 and 1 4 5 4, so is transitive on the remaining 21 points, and by lemma 68 σ, υ, τ, ω therefore is 4-fold transitive. Now since (σ 13 τω 2 ) 3 is of type 4 6, we see that a subgroup that fixes a set of four points as a whole contains cykels of type 1 4 5 4 and 4 6. So, it is transitive on the remaining 20 points, which means that σ, υ, τ, ω is transitive on any set of five points. Since (σω 2 ) 3 is of type 1 8 2 8, we see that a subgroup that fixes a set of five points as a whole contains cykels of type 1 4 5 4 and 1 8 2 8. If we again conjugate these types of permutations with σ and τ we see that for any set of five distinct points {a, b, c, d, e} we have permutations of type 1 4 5 4 where one of the cykels of size 5 is (a b c d e), and we have permutations of type 1 8 2 8 where three of the cykels of size 1 and one of size 2 are (a)(b)(c)(d e). By (ii) we have that (a b c d e) and (d e) generate S 5, so we must have that the stabilizer of any five points acts transitively on the remaining 19 points. Hence, by lemma 68, σ, υ, τ, ω is 5-fold transitive. Note that the octads of G 24 form the blocks of the Steiner system S(5, 8, 24), so any codeword of Hamming weight 5 is covered by exactly one codeword of Hamming weight 8. This corresponds to the fact that M 24 is 5-fold transitive and transitive on sets of eight points: Corollary 79. M 24 is transitive on the blocks of the Steiner system S(5, 8, 24). 4.5. Simplicity In the proof of the simplicity of M 24 we will make use of M 23. As it turns out, M 23 is the automorphism group of a Steiner system S(4, 7, 23). Since G 24 is obtained from G 23 by adding the overall parity check, we have that M 23 = {π M 24 π( ) = } = Stab( ). If we prove that M 23 is simple, then simplicity of M 24 follows by lemma 69. In order to prove that M 23 is simple we make use of the lemma s below. Lemma 80. Let p be any prime number and let G be a transitive subgroup of S p. Let n be the number of Sylow p-subgroups of G and r be the index of a Sylow p-subgroup in its normalizer. Then: 50

(i) G has a cyclic Sylow p-subgroup S p, (ii) #G = nrp, (iii) r is the least positive residue of #G p mod p, (iv) If r = 1, then G = F p. Proof. (i) If G is transitive on X, then {g(x) g G} = X for all x X, so the order of X divides the order of G, i.e. p #G. Since #S p = p! we have that p i #S p and hence p i #G for all i 2. This means that G = kp for some integer k with p k, so any Sylow p-subgroup of G has order p and therefore is cyclic. By theorem 71 (i) at least 1 such Sylow p-subgroup exists, which we will denote by S p. (ii) By theorem 65 we have that G = n N G (S p ), so n = [G : N G (S p )]. By theorem 71 (iii) all Sylow p-subgroups are conjugate, so: #G = [G : N G (S p )][N G (S p ) : S p ]#S p = nrp. (iii) Firstly, by (i) we have that S p = (1 p), and therefore N Sp (S p ) consists of all permutations π for which π(1 p)π 1 is some power of (1 p). We use that π(1 p)π 1 = (π(1) π(p)), and that (1 p) n = ((1)(1 + n)(1 + 2n) (1 + pn)) for all n < p, with all entries of the cykels calculated modulo p. Combined, this means that any such π is of the form: π : F p F p x ax + b, where a F p, b F p. Obviously, there are p(p 1) of such maps, so #N Sp (S p ) = p(p 1). Now since #S p = p and we have a sequence of subgroups S p N G (S p ) S Sp (S p ), it holds that: p 1 = [N Sp (S p ) : S p ] = [N Sp (S p ) : N G (S p )][N G (S p ) : S p ] = [N Sp (S p ) : N G (S p )] r, so we must have that r divides p 1. By theorem 71 we have that n 1 mod p, so using the result from (ii) we see that #G r mod p. Combined, this implies p that r is the least positive residue of #G mod p. p (iv) If r = 1 then the number of elements in S p of order p is equal to n(p 1). All these elements are p-cylces, which do not fix any point of X. So, G contains at most np n(p 1) = n elements that fix at least one point of X. Now because G is transitive on X, theorem 65 says that #Stab(x)= #G = n for any x X. p Hence, Stab(x) = Stab(x ) for all x, x X. However, the only permutation that fixes all elements of X is the trivial permutation, so this must mean that all these stabilizers are trivial. Hence, n = 1 and #G = p, so G = F p. 51

Lemma 81. Let p be any prime number and let G be a transitive subgroup of S_p. If #G = nrp where n > 1, n ≡ 1 mod p, p > r and r is prime, then G is simple.

Proof. Let n and r be as in lemma 80. Suppose that G is not simple, and let H be a non-trivial proper normal subgroup of G. The orbits of H are permuted transitively by G, so #{i}^H = #{j}^H for all 1 ≤ i, j ≤ p. These orbits therefore partition the p points into parts of equal size; since p is prime and H is non-trivial, every orbit has size p. This means that H is transitive on X. Therefore, p | #H, and by theorem 71 (i) H has a Sylow p-subgroup Σ_p. Now since H is a normal subgroup of G (and hence a union of conjugacy classes) and all Sylow p-subgroups are conjugate by theorem 71 (iii), H actually contains all n Sylow p-subgroups. By lemma 80 (ii), applied to H, #H = np[N_H(Σ_p) : Σ_p], and since #H divides #G = nrp, the index [N_H(Σ_p) : Σ_p] divides r. By lemma 80 (iv), if [N_H(Σ_p) : Σ_p] = 1 then H ≅ F_p, which has only one Sylow p-subgroup; this contradicts n > 1, so [N_H(Σ_p) : Σ_p] > 1. Finally, since r is prime, this must mean that [N_H(Σ_p) : Σ_p] = r, so #H = nrp = #G and H = G, a contradiction. Hence, G is simple.

Finally, notice that M 23 and M 24 are multiply transitive subgroups of S 23 and S 24 respectively, and that 23 is prime. So:

Theorem 82. M 23 is simple.

Proof. First of all, by theorem 76 and corollary 77, #M 24 = 244823040 = 24 · 23 · 11 · 7 · 5 · 3^2 · 2^7. Now since M 23 = Stab(∞) in M 24, theorem 65 says that:

    #M 23 = #M 24 / #{∞}^{M 24} = (24 · 23 · 11 · 7 · 5 · 3^2 · 2^7) / 24 = 23 · 11 · 7 · 5 · 3^2 · 2^7.

Now #M 23 / 23 = 443520 ≡ 11 mod 23, so by lemma 80 (iii) r = 11, and by (ii) n = #M 23 / (23 · 11) = 40320 ≡ 1 mod 23. Finally, since r = 11 is prime, n > 1 and 23 > r, lemma 81 gives the desired result.

Corollary 83. M 24 is simple.

Proof. M 24 is a subgroup of S 24, it is 5-fold transitive, and Stab(∞) = M 23 is simple, so the result follows by lemma 69.

Now that we have discussed the most important features of G 24 and its automorphism group M 24, we move on to visualise these properties by using geometric figures in chapter 5.
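The numerology used in the proof of theorem 82 (n > 1, n ≡ 1 mod 23, r = 11 prime and smaller than 23) can be checked in a few lines of Python:

    M24 = 244823040
    M23 = M24 // 24                  # Stab(oo) has index 24 in M_24
    print(M23)                       # 10200960 = 23 * 11 * 7 * 5 * 3^2 * 2^7
    print(M23 // 23 % 23)            # 11, so r = 11 by lemma 80 (iii)
    n = M23 // (23 * 11)
    print(n, n % 23)                 # 40320 1, so n > 1 and n = 1 mod 23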

Chapter 5 Geometric constructions In chapter 3 we discussed the properties of the extended Golay code. These properties are closely related to geometric figures: the icosahedron, the dodecahedron and the dodecadodecahedron. Also, the Mathieu group M 24 can be visualised by a geometric figure, the cubicuboctahedron. We start by giving some definitions concerning polyhedra in section 5.1, that enable us to discuss the geometric figures afterwards. Then in section 5.2 we show how we can use the icosahedron to visualise a generator matrix for G 24, and we use the dodecahedron to provide methods for both the encoding and decoding process in section 5.3. In section 5.4 we look at the dodecadodecahedron to visualise octads, the generators of G 24, and in section 5.5 we provide the cubicuboctahedron as a geometrical interpretation of the Mathieu group M 24. Finally, in section 5.6 we discuss the links between properties of the code and its automorphism group and the geometric figures. 5.1. Definitions Definition 84. A polygon is a sequence of line segments (v 1 v 2, v 2 v 3,..., v n v 1 ), where the v i are distinct points that lie in a common plane in R 3. The line segments are called edges and the n points are called vertices. A polygon is called regular if all edges have the same length and all angles v i 1 v i v i+1 have the same degree (where we take the labels of the points modulo n). Definition 85. A polyhedron is a collection of vertices V R 3, edges E {v i v j v i, v j V, i j}, and faces F which are polygons with edges in E, such that: (i) Each edge is an edge of exactly two polygons, (ii) The graph with the polygons as its nodes and the line segments as its edges is connected, 53

(iii) The subgraph of all edges and polygons that meet a given vertex is a cykel. The interior of a polyhedron we understand to be all points in R 3 from which there is no path to infinity that does not intersect the polyhedron. Definition 86. A polyhedron is convex if its interior is convex, i.e. if every line segment between two points of the interior is contained in the interior. The symmetry group of a polygon is the set of isometries under which it is invariant. The group operation is given by composition of the symmetries. Such a symmetry group can act on the set of faces, edges and vertices. Definition 87. A polyhedron is uniform if: (i) All its faces are regular polygons, (ii) Its symmetry group is transitive on its vertices (i.e. the concatenation of polygons around each vertex is the same). Remark 88. Note that a uniform polyhedron need not be convex. If we look at the topogical space that is obtained by gluing together convex polygons in the same way as the (possibly nonconvex) polygons in the polyhedron, then by definition 85 (ii) and (iii) we have that this is a surface, i.e. the surface of the polyhedron. Now for any polyhedron we can calculate a topological invariant χ that describes the structure of its surface (see [10]): Definition 89. The Euler characteristic χ is given by: We can distinguish 3 cases: χ = F + V E. If χ > 0, then the surface is a quotient of the sphere by a discrete group, If χ = 0, then the surface is a quotient of the plane by a discrete group, If χ < 0,then the surface is a quotient of the hyperbolic plane by a discrete group. We say that the surface can be uniformized by the sphere, plane or hyperbolic plane. If the surface of a polyhedron is connected and orientable, then the Euler characteristic also equals 2 2g, where g is the genus of the surface. This genus gives us the number of holes in the surface, i.e. the surface of the polyhedron can be considered as the surface of a g-holed torus. Vice versa, these surfaces of the tori can be embedded (if this can be done without intersections of the surface) or immersed (if the surface must intersect itself along the way) into 3D space. In the following sections we will consider four polyhedra that are related to the extended Golay code and its automorphism group. 54

5.2. Icosahedron Definition 90. The icosahedron is a convex, regular polyhedron that has 20 triangular faces, 12 vertices and 30 edges. At each vertex 5 triangles meet. Figure 5.1.: The icosahedron. A generator matrix for G 24 is given by G = (1 12 A), where A is the anti-adjacency matrix of the icosahedron. If we enumerate the 12 vertices of the icosahedron, then for any i, j = 1,..., 12 the ij th position in its anti-adjacency matrix is given by: { 0 if there is an edge between vertices i and j a ij = 1 if there is no edge between vertices i and j. We will now look at all possible elements of the symmetry group of the icosahedron that leave its orientation unchanged: Lemma 91. The group of orientation preserving symmetries of the icosahedron is isomorphic to A 5. Proof. We will use the structure of the proof as given in [3] p. 19. Firstly we label the vertices of the icosahedron as follows: we start with a vertex X and label its opposite vertex X. The vertices adjacent to X we label as A, B, C, D and E, and label their opposite vertices A, B, C, D and E respectively. Figure 5.2 illustrates this labeling. A E B X D C D E A Figure 5.2.: The icosahedron with labeled vertices. 55

We can immediately see that the following rotations belong to the group of orientation preserving symmetries of the icosahedron: (i) The rotations k 2π with k = 1, 2, 3, 4, around the axes through the six pairs of 5 opposite vertices. For example, the rotation 2π around the axis through A and A 5 yields the following permutation of vertices: B X E C D B, and C D B X E C. This gives us 6 4 = 24 rotations. (ii) The rotations π around the axes through the midpoints of the edges. There are 30 edges in total, so this gives us 15 rotations. (iii) The rotations k π with k = 1, 2, around the axes through the midpoints of opposite 3 faces. There are 20 faces in total, so we have 10 rotations for k = 1 and 10 rotations for k = 2, which gives us 20 rotations in total. (iv) Finally, we have to include the identity rotation as well. In total we now have a set R of 24 + 15 + 20 + 1 = 60 rotations that preserve orientation. To show that there are no others we will make use of the Orbit-Stabilizer theorem 65. The orbit of any vertex under R gives us the set of all vertices: we can go from any vertex to any other by using rotations that give us a path of adjacent vertices, and obviously we can also use rotations to interchange adjacent vertices. So, R is transitive on the set of vertices of the icosahedron, which means that #{i} R = 12 for some vertex i. Now if we look at the stabilizer of a given vertex, say X, then it consists of rotations around the axis through X and X, so this stablizer fixes X too. Therefore, Stab(X) only consists of the permutations k 2π for k = 1, 2, 3, 4 and the identity. Now by the 5 Orbit-Stabilizer theorem we have: #R = Stab(X) #{X} R = 5 12 = 60, which means that we have found all orientation preserving symmetries of the icosahedron. Next we will show that there exists an isomorphism between R and A 5. For this we divide the edges into five groups of 6 edges, where each group consists of those edges that are either parallel or perpendicular to each other. Such a group is completely determined by one edge. For example, the group determined by XB is {XB, X B, DE, D E, AC, A C }. We label this group as 1, and we label the other groups that are determined by XA, XC, XD and XE as 2, 3, 4 and 5 respectively. We now observe that any of the rotations in R permute these groups as a whole, since edges keep their mutual parallelism and perpendicularity. This already shows that there is a homomorphism f : R A 5. By comparing orders of elements, we see that f sends the 24 rotations of the form k 2π to the 5-cycles, 5 it sends the 15 rotations π to the cycles of type (2,2,1), it sends the 20 rotations of type k π to the cycles of type (3, 1, 1) and it sends the identity rotation to the identity cycle. 3 We see that the rotation 2π around the axis 5 XX corresponds to the cycle (12345), and that the rotation π around the axis through the faces 3 (CA E ) and (C AE) corresponds to the cycle (123). This means that (123), (12345) Im(f). 56

Finally, we can easily verify that by conjugating (123) with (12345) we can obtain all 3-cycles in S 5, so H = A 5. Since f is a homomorphism it holds that R\Ker(f) = Im(f), but #A 5 = #H = #R = 60 and H Im(f), so we must conclude that R = A 5. From the icosahedron we can easily obtain another important polyhedron: the dodecahedron. 5.3. Dodecahedron Definition 92. The dodecahedron is a convex, regular polyhedron that has 12 pentagonal faces, 20 vertices and 30 edges. At each vertex 3 pentagons meet. Figure 5.3.: The dodecahedron. Remark 93. The dodecahedron is the polyhedron that is obtained from the icosahedron by birectification: it is its dual. Hence, its groups of orientation preserving symmetries too is isomorphic to A 5. If we unfold the dodecahedron we obtain a pattern of twelve pentagons, see figure 5.4: Figure 5.4. We will show how we can encode and decode with G 24 by using the dodecahedron. The idea is that we provide each of the twelve faces with two bits: one representing a message symbol and one representing a check symbol of a codeword of G 24. To any given face on the dodecahedron we have that 5 faces are adjacent, 5 are at distance 2 and 1 is at distance 3. We say that a face is at distance 0 from itself. If we use the dodecahedron to 57

encode a message of 12 bits into a codewords of G 24, we must ensure that the minimal weight of the codewords is 8. So, if only one face has a 1 as message symbol, how do we obtain at least 7 faces with a 1 as check symbol? The answer is: eliminate five faces that are at distance 1 from the face with message symbol 1. For this we use a strip of five adjacent faces of the dodecahedron, which in figure 5.5 is indicated with green. In the following section a detailed description of this process is provided. Figure 5.5.: Strip. 5.3.1. Encoding Suppose we have a message m = (m 1... m 12 ) that we want do encode. We start by enumerating the faces of the dodecahedron, see figure 5.6. Next we put the message symbols on the appropriate faces: m 1 on face 1, m 2 on face 2, etc. We put the strip around face 1 and check the parity of all visible message symbols outside of the strip. If it is even, we put a 0 next to the message symbol on face 1, if it is odd, we put there a 1. This step we repeat for all other faces so that the dodecahedron will have a message symbol and a check symbol on each of its faces. The codeword x then is given by the message symbols followed by the check symbols. 3 8 9 4 7 2 12 10 1 5 6 Figure 5.6. 11 Notice that since x = (m 1... m 12 x 13... x 24 ), decoding through the dodecahedron is equivalent to decoding by using a generator matrix G = (1 12 A). Example 94. Suppose we want to encode the message m = (110000111101). Then firstly we enumerate the dodecahedron and put the message symbols on the appropriate faces, see figure 5.7: 58

3 0 8 9 1 1 4 0 7 1 1 1 2 1 12 1 10 1 0 0 0 5 6 11 Figure 5.7. Then we put the strip around each of the twelve faces and do the parity checks, see figure 5.8: 3 01 8 9 1 10 4 00 7 10 10 10 2 1 12 1 10 1 01 00 00 5 6 11 Figure 5.8. For example, if we put the strip around face 1, then we see one face that contains a 0 (face number 11) and six faces that contain a 1 (face numbers 1, 7, 8, 9, 10 and 12). Hence, its parity check is (0 + 1 + 1 + 1 + 1 + 1 + 1) mod 2 = 0. Now we can read off our desired codeword from the dodecahedron by putting the message and check symbels next to each other: x = (110000111101 001010010101). 5.3.2. Decoding In order to decode a given vector r with the dodecahedron, we have to use properties of G 24. Its error-correcting capability is 1 8 1 = 3, so if the Hamming weight of 2 the error vector r is less than or equal to 3 we will be able to correct all errors that have occured. However, if its Hamming weight is 4, we will only be able to detect that errors have occured, but we ill not be able to determine which ones. If the Hamming weight is 5 or more, then we will most probably decode incorrectly. Hence, in this section we assume that all our received vectors have error vectors of Hamming weight 3 or less. The decoding starts by putting all the digits of a received vector on the appropriate faces of the enumerated dodecahedron. We then make use of decoding checks, which are quite similar to parity checks. If a face of the dodecahedron passes this check, we will delete it. If it fails, then we leave the face where it is. In this manner we obtain a pattern of pentagons. Such patterns can be categorized into 12 categories which we will call C1, 59

C2, C3, Parachute, Wave, Rails, Rings, Lock, Fish, Horizon, Mountain and Cup. These patterns will be explained below. A decoding check on a face of the dodecahedron is made in the following way: we put the strip around the face and do the parity check, and to that we add the check symbol of the particular face. This face passes if the result is even, it fails if the result is odd. Ofcourse when no errors occured, all faces of the dodecahedron pass. We will explain what patterns can arise after performing all decoding checks, when the error vector e of the received vector r has Hamming weight 1, 2 or 3. These patterns give you the exact location of where the errors occured, and thus allows you to decode r. wt(e) = 1 1 in check symbol If the error occured in a check symbol, then only the face on which this check symbol is placed will fail the decoding check. Hence, the pattern consists of only one pentagon: the one on which the error occured. We will call this pattern C1. 1 in message symbol If the error occured in a message symbol, only the faces adjacent to the one with this incorrect symbol will pass the decoding check. This is because when we put the strip around these faces, the wrong message symbol is not visible. This means that after we performed all decoding checks we are left with a pattern that is (of the same form as) the complement of the strip. Because of its resemblance in 3D, this pattern is called the Parachute (by [4]). Notice that the location of the wrong message symbol corresponds to the location of the parachutist. In figure 5.9 it is indicated with a green line. Figure 5.9.: Parachute. wt(e) = 2 2 in check symbols If the two errors occured in check symbols then again only the faces on which these errors occured will fail the decoding check. Hence, we are left with a pattern of two pentagons only: the ones on which the errors occured. We will call this pattern C2. 1 in check symbol, 1 in message symbol If one of the errors occured in an message symbol, the pattern that we obtain after 60

performing the decoding checks is a Parachute. An additional error in a check symbol will keep this pattern intact but alters one of its faces: the face on which the error in the check symbol occured. If this face is on the Parachute we obtain a Parachute with a hole, see for example figure 5.10, where the face on which the error occured in a check symbol is marked with a blue line. If the error in the check symbol occured outside of the Parachute, we obtain a Parachute with an extra face, see for example figure 5.11, where again the face on which the error occured in a check symbol is marked with a blue line. Figure 5.10.: Parachute with a hole. Figure 5.11.: Parachute with an extra face. 2 in message symbols If both of the errors occured in message symbols then there are three possibilities that we have to consider: 1) the errors occured in adjacent faces, 2) the errors occured in faces that are at distance 2 of each other, or 3) the errors occured in faces that are at distance 3. If we perform the decoding checks, these cases yield patterns which we call the Wave, Rails, and Rings respectively. They are shown in figure 5.12, 5.13 and 5.14, where the faces on which the errors occured are marked by green lines. 61

Figure 5.12.: Wave.

Figure 5.13.: Rails.

Figure 5.14.: Rings.

wt(e) = 3

3 in check symbols
As in the case where errors occurred in 1 or 2 check symbols, the pattern that we obtain after performing the decoding checks when 3 errors occurred in check symbols consists of 3 pentagons only: the ones on which the errors occurred. This pattern is called C3.

2 in check symbols, 1 in message symbol
An error in a message symbol yields the Parachute, and two additional errors in check symbols result in the Parachute with either two holes, one hole and one extra face, or two extra faces. See figures 5.15, 5.16 and 5.17 respectively for three examples.

Figure 5.15.: Parachute with two holes.

Figure 5.16.: Parachute with one hole and one extra face.

Figure 5.17.: Parachute with two extra faces.

1 in check symbol, 2 in message symbols
If 2 errors occurred in message symbols, we obtain the Wave, Rails or Rings. One extra error in a check symbol results in one of these patterns with either a hole or an extra face. See for example figure 5.18 or 5.19.

Figure 5.18.: Wave with one extra face.

Figure 5.19.: Rails with one hole.

3 in message symbols
If the 3 errors occurred in message symbols, then we have to distinguish between the different distances between the faces on which the errors occurred. If we call the faces on which the errors occurred x, y and z, then with a triple (·, ·, ·) we denote (distance(x, y), distance(y, z), distance(x, z)). There are five cases we have to consider: (1,1,1), (1,1,2), (1,2,2), (2,2,2) and (1,2,3) (a detailed explanation of these cases is given in appendix A). If we perform the decoding checks, these cases yield patterns which we call the Lock, Fish, Horizon, Mountain and Cup respectively. They are shown in figures 5.20, 5.21, 5.22, 5.23 and 5.24.
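Since the five cases are determined by the pairwise distances alone, they can be told apart mechanically. Below is a minimal sketch of that classification; the face-distance function dist on the dodecahedron is assumed to be given, and sorting the triple (so that the order of x, y and z does not matter) is our own simplification.

# Pattern names for three errors in message symbols, keyed by the sorted
# triple of pairwise face distances (names as introduced above).
PATTERNS = {
    (1, 1, 1): "Lock",
    (1, 1, 2): "Fish",
    (1, 2, 2): "Horizon",
    (2, 2, 2): "Mountain",
    (1, 2, 3): "Cup",
}

def classify_three_message_errors(dist, x, y, z):
    # dist(a, b): distance between faces a and b on the dodecahedron (assumed given)
    triple = tuple(sorted((dist(x, y), dist(y, z), dist(x, z))))
    return PATTERNS[triple]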

Figure 5.20.: Lock.

Figure 5.21.: Fish.

Figure 5.22.: Horizon.

Figure 5.23.: Mountain.

Figure 5.24.: Cup.

We call the number of faces of which a pattern consists its weight. Table 5.1 gives an overview of all possible weights and the corresponding (possibly altered) patterns that we might obtain when decoding a given vector on the dodecahedron.

Weight   Possible patterns
1        C1
2        C2
3        C3
5        Parachute with 2 holes; Wave with 1 hole; Rails with 1 hole; Fish; Horizon; Cup
6        Parachute with a hole; Wave; Rails
7        Parachute; Parachute with one hole and one extra face; Wave with an extra face; Rails with an extra face
8        Parachute with an extra face
9        Parachute with 2 extra faces; Rings with a hole; Lock; Mountain
10       Rings
11       Rings with an extra face

Table 5.1.: Possible weights and corresponding patterns.

Example 95. Suppose we receive r = (101101001101 010011110100). Since wt(r) = 13, we immediately see that at least one error must have occurred. We start by writing down the digits on our dodecahedron. Next we make twelve decoding checks and we mark the faces that failed green, see figure 5.25.

Figure 5.25.

This pattern has weight 7, so the possibilities are: (i) Parachute with no extra holes or faces, (ii) Parachute with a hole and one extra face. By rotating this pattern in 3D we immediately see that it is not of the same form as the Parachute, so we are dealing with option (ii). This means that we must have one error in a message symbol and two errors in check symbols. In 3D we see that the location of the Parachutist (the face with the error in the message symbol) is face 6, and the faces on which the errors occurred in check symbols are faces 2 (extra face) and 3 (extra hole); see figure 5.26, where the faces on which the errors occurred are marked with red. So, the codeword x is r + (000001000000 011000000000) = (101100001101 001011110100).

Figure 5.26.

When deciding which errors occurred in the decoding process, we look at the 12 different patterns. A crucial fact is that we look at these patterns up to rotational symmetry. For example, the Parachute with one hole at distance 2 from the Parachutist gives the same pattern for all five possibilities for the location of the hole. Also, the case where only one error occurred in a check symbol gives the same pattern for all twelve possible errors. Only by numbering the faces of the dodecahedron are we able to determine where exactly the error occurred. If we make a list of all patterns that are different up to rotational symmetry, we obtain 41 possibilities, which is explained in detail in appendix A.

However, we can also apply another strategy for decoding a received vector. The idea is that we first check whether performing the decoding checks gives us C1, C2, C3 or a Parachute with at most two faces altered. If not, then we follow the exact same decoding scheme, only this time with the message and check symbols interchanged. Then the faces of C1, C2 and C3 or the place of the Parachutist correspond to the positions where the errors occurred plus 12: if the Parachutist is face number 6, then the error occurred in check symbol r_{6+12} = r_{18}. An extra face or a hole on the Parachute then gives the location of an error in a message symbol.
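The final step of example 95 is just coordinatewise addition modulo 2 and can be checked directly. The following is a minimal sketch, with r, the error vector and the resulting codeword x copied from the example:

r = "101101001101" + "010011110100"   # received vector of example 95
e = "000001000000" + "011000000000"   # errors: message symbol 6, check symbols 2 and 3

# The codeword is recovered by adding r and e coordinatewise modulo 2.
x = "".join(str((int(a) + int(b)) % 2) for a, b in zip(r, e))

print(x == "101100001101" + "001011110100")  # True
# wt(r) = 13 is odd, which is why an error was visible immediately;
# wt(x) = 12, one of the weights that occur in G_24.
print(r.count("1"), x.count("1"))            # 13 12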

5.4. Dodecadodecahedron

We know that the octads of G_24 form the blocks of the Steiner system S(5, 8, 24), and that they generate the code. It is interesting to provide a geometrical interpretation of these generators, and for this we will use the dodecadodecahedron.

Definition 96. The dodecadodecahedron is a semi-regular polyhedron that has 24 faces, of which 12 are pentagrams and 12 are pentagons, and it has 30 vertices and 60 edges. At each vertex, 2 pentagrams and 2 pentagons meet alternately.

Figure 5.27.: The dodecadodecahedron.

Remark 97. The 12 star pentagrams are clearly visible, but the 12 pentagons require a closer look because the dodecadodecahedron is not convex. Figure 5.28 shows one of them, indicated with a blue line.

Figure 5.28.: A pentagon on the dodecadodecahedron.

The dodecadodecahedron clearly has the same group of orientation-preserving symmetries as the dodecahedron, so it too is isomorphic to A_5. Since its surface is connected and orientable, we can consider it as the surface of a g-holed torus. From definition 89, we have χ = 24 + 30 − 60 = −6 = 2 − 2g, which implies that g = 4. Both the pentagons

and the pentagrams are pentagonal, so the surface consists of 24 pentagons, where 4 pentagons meet at each vertex. The dodecadodecahedron thus is an immersion of this surface into 3D space. The Euler characteristic of the dodecadodecahedron is negative, which implies that we can uniformize this surface by the hyperbolic plane. By doing so we are able to see all faces of the dodecadodecahedron at once. This will make it easier to describe how we can view codewords of G_24 and which of them generate the code. On the hyperbolic plane, 4 pentagons also have to meet at each vertex, see figure 5.29.

Figure 5.29.: Uniformized surface of the dodecadodecahedron by the hyperbolic plane.

Obviously, the 24 faces of the dodecadodecahedron represent the 24 digits of the codewords. However, to decide which 8 of them form an octad, we will look at color paths:

Definition 98. A color path on the dodecadodecahedron is a sequence of adjacent faces on the dodecadodecahedron that is obtained as follows. In figure 5.29, choose one pentagon in which to start the path. Cross one of its edges and successively cross the vertex that is furthest away from this edge. You follow a straight line, so crossing the vertex means you cross over an intersection point of two lines. The pentagon that you have reached is the endpoint.

We have given the same color to the pentagons in which we start and end a color path: on the dodecadodecahedron in 3D space this means that faces of the same color are

parallel. Moreover, if we follow 4 paths in succession, we end on the same face as the one where we started. If we color all faces of the dodecadodecahedron by walking 4 successive paths 6 times, we obtain a partition of the 24 faces into 6 differently colored sets. So, on the hyperbolic plane we can point out the 24 faces of the dodecadodecahedron as in figure 5.30. We labeled the parallel faces as follows: light blue: {a, b, c, d}, yellow: {A, B, C, D}, red: {I, II, III, IV}, purple: {w, x, y, z}, blue: {i, ii, iii, iv} and green: {1, 2, 3, 4}.

Figure 5.30.: The 24 faces of the dodecadodecahedron on the hyperbolic plane.

Now how do we use this to find faces that comprise an octad? Firstly, we can choose any two of the six sets of 4 equally colored faces: together they form an octad, and such octads are called parallel octads. Furthermore, on the hyperbolic plane we can easily see that any face has five faces that are adjacent to it. We find three other faces by walking color paths starting from our initial face; these three faces are the ones that are parallel to our initial face. This means that for each of the 24 faces on the dodecadodecahedron we can find an octad, and these octads are obviously all distinct. These 24 octads are called facial octads. An example is given by figure 5.31, where the faces of the octad are indicated with black.
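As a small aside before looking at the facial octad in figure 5.31, the parallel octads can be enumerated directly from the six colored sets labeled in figure 5.30. The following minimal sketch (the list layout and variable names are ours) confirms that there are 15 of them, each consisting of 8 faces:

from itertools import combinations

# The six sets of four parallel faces, labeled as in figure 5.30.
color_classes = [
    {"a", "b", "c", "d"},        # light blue
    {"A", "B", "C", "D"},        # yellow
    {"I", "II", "III", "IV"},    # red
    {"w", "x", "y", "z"},        # purple
    {"i", "ii", "iii", "iv"},    # blue
    {"1", "2", "3", "4"},        # green
]

# A parallel octad is the union of any two color classes.
parallel_octads = [s | t for s, t in combinations(color_classes, 2)]
print(len(parallel_octads))                       # 15
print(all(len(o) == 8 for o in parallel_octads))  # True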

Figure 5.31.: Example of a facial octad on the hyperbolic plane.

Another method for obtaining octads is by looking at the vertices of the dodecadodecahedron. Surrounding each vertex v there are four faces, which form the first half of an octad. If we now follow a color path from each of these four faces through v, then we obtain the remaining four faces of the octad. Such octads are called vertical octads. Since each vertex yields a different vertical octad, there are 30 of them in total.

If we now look at two opposite vertices, then their corresponding vertical octads do not have any face in common. The 8 faces that are in neither of these vertical octads form an octad, called an equatorial octad. On the dodecadodecahedron we can easily see that the vertices of the star pentagrams are exactly the vertices of the dodecadodecahedron. The 15 rotations π in the proof of lemma 91 correspond to rotations around axes through vertices of the dodecadodecahedron. Each of these rotations fixes two opposite vertices, which give two disjoint vertical octads and hence one equatorial octad. So, there are 15 equatorial octads.

5.5. Cubicuboctahedron

In the previous sections we discussed three geometric objects that were related to the extended Golay code. Likewise, we can provide a geometrical interpretation for its automorphism group, the Mathieu group M_24. For this we will use the cubicuboctahedron.

Definition 99. The cubicuboctahedron is a non-convex uniform polyhedron that has 20 faces, of which 8 are triangles, 6 are squares and 6 are octagons, and it has 24 vertices and 48 edges. At each vertex, 1 triangle, 1 square and 2 octagons meet.

Figure 5.32.: The cubicuboctahedron.

Remark 100. The octagonal faces are the faces that are parallel to the square faces, see figure 5.33, where one of the octagonal faces is indicated with a blue line.

Figure 5.33.: An octagon on the cubicuboctahedron.

We will show in a few steps how the Mathieu group can be seen as a group of permutations on the 24 vertices. First we uniformize the cubicuboctahedron by the hyperbolic plane, but modify it slightly so as to obtain a specific tiling of the so-called Klein quartic. We do this because we know the automorphism group of this covering tiling of the hyperbolic plane, and moreover can augment it by only one symmetry in order to obtain the Mathieu group M_24. To begin, the cubicuboctahedron is non-convex (the octagonal faces intersect transversally), so we can consider it as an immersion of a connected orientable surface into 3D space. We calculate its genus g by χ = 20 + 24 − 48 = −4 = 2 − 2g, which gives us g = 3.
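As a quick numerical check of the two genus computations in this chapter (for the dodecadodecahedron and the cubicuboctahedron), here is a minimal sketch; the face, vertex and edge counts are the ones given in definitions 96 and 99.

def genus(faces, vertices, edges):
    # Euler characteristic chi = F + V - E = 2 - 2g for a closed orientable surface.
    chi = faces + vertices - edges
    assert (2 - chi) % 2 == 0
    return (2 - chi) // 2

print(genus(24, 30, 60))  # dodecadodecahedron: chi = -6, so g = 4
print(genus(20, 24, 48))  # cubicuboctahedron:  chi = -4, so g = 3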