
Investigation of the Elias Product Code Construction for the Binary Erasure Channel

by

D. P. Varodayan

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF BACHELOR OF APPLIED SCIENCE

DIVISION OF ENGINEERING SCIENCE
FACULTY OF APPLIED SCIENCE AND ENGINEERING
UNIVERSITY OF TORONTO

Supervisor: F. R. Kschischang

December 2002

Abstract

This thesis studies the iteration of the product code construction (due to Elias) over a sequence of Reed-Muller codes for communication across the binary erasure channel. We derive criteria for the choice of the constituent (in this case, Reed-Muller) sequence in order that the iterated product code achieves positive rate and arbitrarily good erasure correction. Subsequently, we consider four approaches to selecting the constituent sequence to synthesize high performance iterated product codes. One of these techniques gives high rate (0.9768) codes with low erasure probability thresholds (0.0026). Another yields low rate (0.0067) codes with high erasure probability thresholds (0.9004). We suggest heuristics for synthesizing capacity-achieving codes with similar combinations of high rates and low erasure probability thresholds, and low rates and high erasure probability thresholds.

Acknowledgements

I extend my sincere thanks to my supervisor, Professor Frank Kschischang, for his expert guidance and mentorship. I would also like to acknowledge the friendship and support of my colleagues in the Engineering Science program. Finally, I would like to thank my family for their patience, advice and continued encouragement.

Contents

Abstract
Acknowledgements
List of Symbols
List of Figures
List of Tables
1 Introduction
2 Background
  2.1 Linear Block Codes
  2.2 Memoryless Binary Channels
    2.2.1 The Binary Erasure Channel
    2.2.2 The Binary Symmetric Channel
  2.3 The Reed-Muller Family of Codes
  2.4 The Product Construction
  2.5 Iterated Product Codes
3 Criteria and Analysis
  3.1 Criteria
  3.2 Analysis for Positive Limiting Rate
  3.3 Analysis for Zero Erasures in the Limit
4 Code Synthesis and Performance
  4.1 Performance Measures
  4.2 The Space of Constituent Codes
  4.3 Approaches to Code Synthesis
    4.3.1 Selecting every lth Link of a Chain
    4.3.2 Starting at the lth Link of a Chain
    4.3.3 Taking the Chain of Distance 2^c
    4.3.4 Switching between Chains
  4.4 Summary of Heuristics for Code Synthesis
5 Conclusions and Future Research
Bibliography

List of Symbols

n                blocklength of a code
k                dimension of a code
d                minimum distance of a code
R                rate of a code
G                generator matrix of a code
E                erasure symbol
p                probability of symbol erasure
η                threshold for erasure probability
C                capacity of a channel
H(X)             entropy of random variable X
I(X; Y)          mutual information of random variables X and Y
H_m              extended Hamming code of blocklength 2^m
RM(r, m)         rth order Reed-Muller code of blocklength 2^m
A ⊗ B            direct product code of A and B
{C_i}_{i=1}^∞    sequence of constituent codes
{P_j}_{j=1}^∞    sequence of iterated product codes

List of Figures

1.1 Block Diagram of Error Control Coding
2.1 Model of a Memoryless Channel
2.2 The Binary Erasure Channel
2.3 Plot of H(X) versus p_X(0) for a Binary Random Variable X
2.4 Output versus Input Erasure Profile for a Code over the BEC
2.5 The Binary Symmetric Channel
2.6 Plot of Capacity C versus Probability of Erasure or Error, p, for the BEC and BSC
3.1 Output versus Input Erasure Profiles for Reed-Muller Codes with d = 4
3.2 Output versus Input Erasure Profiles for Reed-Muller Codes with n = 16
4.1 The Reference Curve for Capacity-Achieving Codes
4.2 Performance of Codes, synthesized by selecting every lth Link of a Chain
4.3 Performance of Codes, synthesized by starting at the lth Link of a Chain
4.4 Performance of Codes, synthesized by taking the Chain of distance 2^c
4.5 Performance of Codes, synthesized by switching Chains
4.6 Performance of various Iterated Product Codes

List of Tables

2.1 The space of Reed-Muller codes RM(r, m) indexed by m and r
4.1 The space of Reed-Muller codes RM(r, m) indexed by n = 2^m and d = 2^{m−r}

Chapter 1

Introduction

Error control codes enable efficient and reliable data transmission over noisy channels. Fig. 1.1 shows a block diagram of the communication process with error control. The encoder takes a message stream from the source and incorporates some redundant information. During transmission through the channel, noise may introduce errors into the data. When the signal is received, the decoder uses the added information to attempt to detect and correct any errors that occurred. Thus, the original message may be recovered, free of transmission errors.

Figure 1.1: Block Diagram of Error Control Coding

Claude Shannon, in 1948 [1], established theoretical bounds on the performance of such error control codes. Since that time, coding theorists have striven (with success) to realize codes that approach these limits of performance. These capacity-achieving coding schemes, however, suffer from complexity in the encoding and decoding stages.

An alternate approach, which was suggested by Peter Elias in 1954 [2], is to build powerful linear block codes from simpler linear block codes using an operation known as the Elias product construction. The encoding and decoding of these so-called product codes are only as complicated as encoding and decoding the original (constituent) codes. Furthermore, Elias demonstrated that, by iterating the product construction over an infinite sequence of constituent codes, it is possible to achieve arbitrarily good error control.

The purpose of this thesis is to investigate the Shannon performance characteristics of iterated product codes over the binary erasure channel (BEC). Specifically, we derive conditions that a sequence of constituent codes should fulfill, and evaluate some examples of the resulting iterated product codes. This investigation suggests some heuristics for the synthesis of iterated product codes that operate close to the Shannon limit.

Chapter 2

Background

2.1 Linear Block Codes

The linear block code is the basic unit of the Elias product construction; two such codes combine to produce another linear block code. This section provides an elementary coverage of the theory of linear block coding.

An (n, k) block code encodes the input data stream by dividing it into blocks of k symbols. Each k-symbol block (an information word) is encoded independently into an n-symbol block (a codeword), which is transmitted over the channel. Usually (and in this thesis), we are interested only in the binary case; that is, each symbol is a bit.

Definition 2.1 An (n, k) binary block code C is a set of 2^k binary sequences of length n, called codewords.

The parameter n is known as the blocklength and the parameter k the dimension. The rate R of a block code measures the ratio of information symbols to total message symbols and is given by

R = k/n    (2.1)

Besides blocklength n and dimension k, the other important parameter of a code is its minimum distance d, which measures the difference between the most similar codewords.

Definition 2.2 The Hamming distance d(x, y) between two words, x and y, is the number of places in which they differ.

Definition 2.3 The minimum distance d of a code C is the minimum Hamming distance between any pair of its codewords; that is,

d = min { d(x, y) : x, y ∈ C, x ≠ y }    (2.2)

The three parameters, blocklength, dimension and minimum distance, are important in determining the performance of the code. Hence, in this thesis, we refer to block codes as (n, k, d) codes.

Linear block codes are subject to the additional constraint that any linear combination of the codewords is also a codeword. In the binary case, the addition is componentwise modulo 2 and the scalar multiplication is also modulo 2. To understand the behavior of arithmetic for codes on larger symbol alphabets, one should study the theory of finite (or Galois) fields. This, however, is beyond the scope of this thesis;

elementary texts on error control coding, such as [3], provide a sound coverage. Nonetheless, we point out to the reader that binary linear codes are said to be defined over the Galois field of two elements, GF(2). That is, GF(2) consists of the set {0, 1}, as well as the aforementioned modulo 2 arithmetic operations.

Notice that codewords, which are n-tuples over GF(2), belong to the vector space GF(2)^n. This observation suggests the following definition of the binary linear block code.

Definition 2.4 An (n, k) binary linear block code is a k-dimensional subspace of GF(2)^n.

A generator matrix G is a k × n matrix whose rows are basis vectors of the subspace. That is, the rows of G span the code and any linear combination of the rows of G is a codeword. This last fact means that G has an n × k right inverse G^{-1}. Equivalently, the n × k transpose G^T has a k × n left inverse (G^T)^{-1} = (G^{-1})^T. Finally, notice that G is not unique for a given linear code.

It is natural to use G to establish the encoding scheme. The one-to-one mapping from information words of length k to codewords of length n is

c = iG,    (2.3)

where i is the information word and c is the codeword.

Another consequence of Definition 2.4 is that the zero word is always a codeword of a linear code, because vector subspaces contain the zero element. Furthermore, the arrangement of codewords at various distances from the zero codeword is equivalent to the arrangement around any other codeword.

For, if c is any codeword and c_1, ..., c_r are the codewords at some distance l away, then the zero codeword has codewords c_1 − c, ..., c_r − c at the same distance l. As a result, we can find the minimum distance d of a linear code by finding the distance between the zero codeword and the nearest nonzero codeword.

Definition 2.5 The Hamming weight w(x) of a codeword x is its Hamming distance from the zero codeword; that is,

w(x) = d(x, 0)    (2.4)

Definition 2.6 The minimum weight w of a code C is the minimum Hamming weight of any nonzero codeword; that is,

w = min { w(x) : x ∈ C, x ≠ 0 }    (2.5)

Due to the symmetry in the code structure, the minimum distance d equals the minimum weight w for a linear code.

2.2 Memoryless Binary Channels

The performance of codes varies depending on the nature of the information transfer channel. In this study, the binary erasure channel (BEC) is considered because it affords simplicity and elegance to the analysis, while still being a valid model for certain communication channels (for example, Internet broadcasting). However, it is also important to realize that the binary symmetric channel (BSC) is a better model for most traditional channels.

Memoryless channels are ones for which the output symbol at a given time is a function of only the input symbol at that time. Hence, we can characterize a memoryless channel with input X and output Y by its input alphabet A_X, its output alphabet A_Y, and the conditional probability of the output given the input, p_{Y|X}(y|x), as shown in Fig. 2.1.

Figure 2.1: Model of a Memoryless Channel

2.2.1 The Binary Erasure Channel

Definition 2.7 The binary erasure channel (BEC) is a memoryless channel with input alphabet A_X = {0, 1} and output alphabet A_Y = {0, 1, E}.

The output symbol E indicates erasure; that is, the receiver is unable to detect the value of the input bit. Suppose p is the probability of erasure. Then the conditional probability of the output given the input is

p_{Y|X}(y|x) = p        if y = E
               1 − p    if y = x
               0        otherwise,

where 0 ≤ p ≤ 1.    (2.6)

This behavior is illustrated in Fig. 2.2.

Figure 2.2: The Binary Erasure Channel

Figure 2.3: Plot of H(X) versus p X (0) for a Binary Random Variable X Relating this notion of entropy back to the channel in Fig. 2.1, we see that the a priori information content of the input is H(X). After receiving the output, though, the information content is reduced to H(X Y ). Therefore, the amount of information transported by the channel is given by I(X; Y ) = H(X) H(X Y ), (2.9) where I(X; Y ) is known as the mutual information of random variables X and Y. Alternatively, the information content of the output a priori is H(Y ), and after sending the input, becomes H(Y X). So, the information transported is also I(X; Y ) = H(Y ) H(Y X), (2.10) 9

Using (2.9), the amount of information transported over the BEC, I(X; Y ) = H(X) [ H(X Y = 0)p Y (0) + H(X Y = 1)p Y (1) + H(X Y = E)p Y (E) ] It is a property of the BEC that when Y equals 0, X must equal 0, and when Y equals 1, X must equal 1. In other words X Y = 0 and X Y = 1 are deterministic; so, H(X Y = 0) = H(X Y = 1) = 0. On the other hand, when Y equals E, no extra information about X has been gained; this means H(X Y = E) = H(X). Also, notice that p Y (E) = p, the probability of erasure. Making these substitutions, the amount of information transported through the BEC is I(X; Y ) = H(X) H(X)p = H(X)(1 p), (2.11) where H(X) is the entropy of the input and p is the probability of erasure. Definition 2.9 The capacity C of a channel is the maximum amount of information that can be transported through that channel. That is where X is the input and Y the output of the channel. [ ] C = max I(X; Y ), (2.12) p X (x) From (2.11), the capacity of the BEC is given by [ ] C = max H(X)(1 p) = 1 p, (2.13) p X (x) since H(X) for binary input has maximum 1 when p X (0) = p X (1) = 0.5. From the point of view of code design, the capacity C of a channel is an upper limit on the rate R of a code that achieves arbitrarily good error control. To simplify, a 10

reliable code must have R C. Shannon, in his seminal 1948 paper [1], showed that there indeed exist capacity-achieving codes (for which R C) that enable reliable transmission. Finally, it is pertinent now to comment on the erasure correction capability of codes operating over the BEC. Theorem 2.10 An (n, k, d) block code C guarantees complete correction of a received word if the number of erasures in the received word is less than or equal to d 1. Proof of Theorem 2.10: Suppose, by way of contradiction, that there exists an uncorrectable received word containing e erasures, where e d 1. This means that the pattern of n e correct symbols in this word matches at least two valid codewords, x and y, belonging to C. Hence, x and y agree in at least these n e symbol locations. This contradicts the fact that C has minimum distance d, since d(x, y) e < d. On the other hand, there do exist two codewords, u and v for which d(u, v) = d, since this is the minimum distance of C. Construct the received word r from u, by erasing the symbol locations of u which disagree with v. Thus, r has d erasures and its pattern of correct symbols matches both u and v. That is, r is not correctable. Therefore, C only guarantees correction of received words that contain d 1 or fewer erasures. Please note that, although some patterns of d or more erasures may be correctable by C, we assume they are unchanged by the decoder in our subsequent analysis. 11

The resulting output versus input erasure profile for a code with n = 16 and d = 8 is shown in Fig. 2.4. Figure 2.4: Output versus Input Erasure Profile for a code over the BEC 2.2.2 The Binary Symmetric Channel Definition 2.11 The binary symmetric channel (BSC) is a memoryless channel with input alphabet A X = {0, 1} and output alphabet A Y = {0, 1}. Suppose p is the probability of error. Then the conditional probability of the output 12

given the input, p if y x p Y X (y x) = 1 p if y = x, where 0 p 1 (2.14) This behavior is illustrated in Fig. 2.5. Figure 2.5: The Binary Symmetric Channel To calculate the capacity of the BSC, we begin with (2.10), which states that I(X; Y ) = H(Y ) H(Y X). We then write the output Y = X + N, where noise N is a binary random variable with p N (1) = p. Hence, H(Y X) = H(X + N X) = H(N), since noise N is independent of input X. Substituting this into (2.10) gives 1 I(X; Y ) = H(Y ) H(N) = H(Y ) p log 2 p (1 p) log 1 2 1 p, because N is a binary random variable with p N (1) = p. Finally, notice that making the input X equiprobable makes the output Y also equiprobable over the BSC, 13

Finally, notice that making the input X equiprobable makes the output Y also equiprobable over the BSC, thereby maximizing H(Y) to the value 1. Thus, applying (2.12) yields the capacity of the BSC,

C = max_{p_X(x)} I(X; Y) = 1 − p log_2 (1/p) − (1 − p) log_2 (1/(1 − p))    (2.15)

The capacity of the BSC is compared with that of the BEC in Fig. 2.6. Note that for p > 0.5, the BSC transfers more bits incorrectly than correctly, so we should think of it instead as an inverting channel with error probability 1 − p. This observation explains the symmetry of the BSC capacity about p = 0.5. So, for probabilities of interest 0 < p ≤ 0.5, the capacity of the BEC exceeds that of the BSC.

Figure 2.6: Plot of Capacity C versus Probability of Erasure or Error, p, for the BEC and BSC
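The two curves of Fig. 2.6 follow directly from (2.13) and (2.15); a minimal Python sketch for evaluating them (the function names are our own):

    from math import log2

    def c_bec(p):
        # Capacity of the BEC, (2.13).
        return 1.0 - p

    def c_bsc(p):
        # Capacity of the BSC, (2.15).
        h = sum(q * log2(1 / q) for q in (p, 1.0 - p) if q > 0)
        return 1.0 - h

    for p in (0.1, 0.3, 0.5):
        print(p, c_bec(p), c_bsc(p))  # the BEC capacity is larger for 0 < p <= 0.5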

2.3 The Reed-Muller Family of Codes

Reed-Muller codes [4, 5] are an infinite class of binary linear block codes. Each code, specified by two parameters, r and m, where 0 ≤ r < m, has:

blocklength n = 2^m,
dimension k = Σ_{i=0}^{r} (m choose i), and
minimum distance d = 2^{m−r}.

Such a code is known as the rth-order Reed-Muller code of blocklength 2^m, or RM(r, m). The space of Reed-Muller codes with rows indexed by m and columns by r is depicted in Table 2.1. Constructions of these codes are presented in [3, Sec. 3.7].

        r = 0       r = 1       r = 2       r = 3       r = 4
m = 1   RM(0, 1)
m = 2   RM(0, 2)    RM(1, 2)
m = 3   RM(0, 3)    RM(1, 3)    RM(2, 3)
m = 4   RM(0, 4)    RM(1, 4)    RM(2, 4)    RM(3, 4)
m = 5   RM(0, 5)    RM(1, 5)    RM(2, 5)    RM(3, 5)    RM(4, 5)
...

Table 2.1: The space of Reed-Muller codes RM(r, m) indexed by m and r

Let us consider a specific Reed-Muller code: the case with r = 1 and m = 2. So, blocklength n = 4, dimension k = 3 and minimum distance d = 2, making RM(1, 2) a (4, 3, 2) linear block code. Notice that the rate R equals 0.75. There are a total of 2^k = 8 codewords, which form a 3-dimensional subspace of GF(2)^4:

0 0 0 0    1 0 0 1
0 0 1 1    1 0 1 0
0 1 0 1    1 1 0 0
0 1 1 0    1 1 1 1

Observe that each pair of these codewords is separated by Hamming distance greater than or equal to 2; that is, the code's minimum distance d equals 2. This means that the code is able to correct a single erasure per block over the BEC. For example, if the word 01E1 is received, it can unambiguously be corrected to 0101. However, a word with two erasures may not be correctable; 0EE1, for instance, may originally have been 0101 or 0011.

A generator matrix can be constructed by finding a set of basis vectors for the codeword space. One such choice is:

    [ 1 0 0 1 ]
G = [ 0 1 0 1 ]
    [ 0 0 1 1 ]
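Both the codeword list and the parameters quoted above can be reproduced mechanically from G: applying (2.3) to all 2^k information words enumerates the code, and the minimum distance follows from the minimum nonzero weight. A minimal Python sketch (illustrative only):

    from itertools import product

    G = [(1, 0, 0, 1),
         (0, 1, 0, 1),
         (0, 0, 1, 1)]  # the generator matrix above

    def encode(i, G):
        # c = iG over GF(2), as in (2.3).
        return tuple(sum(a * b for a, b in zip(i, col)) % 2 for col in zip(*G))

    codewords = [encode(i, G) for i in product((0, 1), repeat=len(G))]
    n, k = len(G[0]), len(G)
    d = min(sum(c) for c in codewords if any(c))  # minimum weight = minimum distance
    print(n, k, d)               # 4 3 2
    print(encode((1, 0, 1), G))  # (1, 0, 1, 0)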

This generator matrix specifies the coding scheme. For instance, suppose we wish to encode the information word i = 110. The codeword c is obtained by multiplying i by the generator matrix, according to (2.3):

                 [ 1 0 0 1 ]
c = iG = (1 1 0) [ 0 1 0 1 ] = (1 1 0 0)
                 [ 0 0 1 1 ]

Hence, the codeword c is 1100. (Keep in mind that arithmetic in GF(2) is carried out modulo 2.)

2.4 The Product Construction

In this thesis, we investigate the combination of linear block codes (such as Reed-Muller codes) to produce more powerful linear block codes. The method of combination, called the product construction, was suggested by Peter Elias in 1954 [2].

Suppose A and B are (n_A, k_A, d_A) and (n_B, k_B, d_B) linear block codes over GF(2), respectively. Let a ∈ A and b ∈ B be codewords of the respective codes. First we define the direct product code, A ⊗ B, as a code of blocklength n_A n_B whose codewords are n_A × n_B matrices over GF(2). Then we show that a^T b is a codeword of A ⊗ B.

Definition 2.12 The direct product A ⊗ B is the code consisting of all n_A × n_B matrices with the property that each matrix column is a codeword of A and each matrix row is a codeword of B.

Define a ⊗ b to be the n_A × n_B matrix a^T b, so that [a ⊗ b]_{ij} = a_i b_j. Each column of a ⊗ b is a scalar multiple of a and each row is a scalar multiple of b. Therefore, a ⊗ b is in fact a codeword of A ⊗ B. We term codewords of this form separable. Note that separable codewords usually comprise only a small proportion of the product code A ⊗ B.

Consider the direct product of two RM(1, 2) codes; that is, the code RM(1, 2) ⊗ RM(1, 2). An example of a separable codeword in this case is

                        [ 0 0 0 0 ]
(0 1 1 0)^T (0 1 0 1) = [ 0 1 0 1 ]
                        [ 0 1 0 1 ]
                        [ 0 0 0 0 ]

and examples of non-separable codewords are

[ 1 0 0 1 ]       [ 0 1 1 0 ]
[ 0 1 1 0 ]  and  [ 1 0 1 0 ]
[ 0 1 1 0 ]       [ 0 1 0 1 ]
[ 1 0 0 1 ]       [ 1 0 0 1 ]

We now present a useful result concerning the parameters (n, k, d) of the product code A ⊗ B. Theorem 2.13 and its proof are suggested by [6].

Theorem 2.13 If A and B are (n_A, k_A, d_A) and (n_B, k_B, d_B) linear block codes over GF(2), respectively, then A ⊗ B is an (n_A n_B, k_A k_B, d_A d_B) linear block code over GF(2).
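Theorem 2.13 can be sanity-checked by brute force for the RM(1, 2) ⊗ RM(1, 2) example above: enumerate all 2^16 binary 4 × 4 matrices and keep those satisfying Definition 2.12. A minimal Python sketch (illustrative only; the codebook RM12 is written out by hand):

    from itertools import product

    RM12 = {(0,0,0,0), (0,0,1,1), (0,1,0,1), (0,1,1,0),
            (1,0,0,1), (1,0,1,0), (1,1,0,0), (1,1,1,1)}  # the (4, 3, 2) code

    def in_product_code(M):
        # Definition 2.12: every row and every column is a codeword.
        return (all(row in RM12 for row in M)
                and all(col in RM12 for col in zip(*M)))

    words = [M for bits in product((0, 1), repeat=16)
             for M in [tuple(bits[4 * i: 4 * i + 4] for i in range(4))]
             if in_product_code(M)]

    print(len(words))  # 512 = 2^9, so the dimension k is 9
    print(min(sum(map(sum, M)) for M in words if any(any(r) for r in M)))  # d = 4

The counts agree with the (16, 9, 4) parameters discussed next.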

Since RM(1, 2) is a (4, 3, 2) code, Theorem 2.13 claims that RM(1, 2) ⊗ RM(1, 2) is a (16, 9, 4) code.

Proof of Theorem 2.13: The code C = A ⊗ B contains the n_A × n_B zero matrix, and so is a non-empty code of length n_A n_B. Since A and B are linear, any linear combination of codewords of C again has columns in A and rows in B. Thus, C is also linear.

To establish that C has minimum distance d = d_A d_B, consider a nonzero codeword x ∈ C. Now, x has a nonzero column a ∈ A, whose weight is at least d_A. Furthermore, each nonzero component of a is part of a nonzero row, and each of these rows is a nonzero codeword of B. That is, any nonzero codeword x ∈ C has weight at least d_A d_B, which implies that d ≥ d_A d_B. On the other hand, if a ∈ A and b ∈ B are nonzero codewords of weight d_A and d_B, respectively, then a ⊗ b ∈ C has weight d_A d_B. So, in fact, d = d_A d_B.

All that remains to be shown is that the dimension of C equals k_A k_B. Suppose that A and B have generator matrices G_A and G_B, respectively. Recall that the rows of a generator matrix form a basis for the code it generates. Let us denote the rows of G_A by g_i^A for 1 ≤ i ≤ k_A, and the rows of G_B by g_j^B for 1 ≤ j ≤ k_B. We will now prove that the separable codewords g_i^A ⊗ g_j^B form a basis for C, so that the dimension of C is k_A k_B. Exploiting Definition 2.12, we write the separable codewords as follows:

g_i^A ⊗ g_j^B = (g_i^A)^T (g_j^B) = G_A^T E_{ij} G_B,    (2.16)

where E_{ij} is a k_A × k_B matrix specified by its entries

e_{uv} = 1    if u = i and v = j
         0    otherwise    (2.17)

To show that the codewords g_i^A ⊗ g_j^B are linearly independent, suppose that some linear combination of them equals zero; that is, suppose

Σ_{i,j} λ_{ij} G_A^T E_{ij} G_B = 0, where λ_{ij} ∈ {0, 1}

By the distributive law for matrix multiplication,

G_A^T (Σ_{i,j} λ_{ij} E_{ij}) G_B = 0, and so Σ_{i,j} λ_{ij} E_{ij} = 0

by multiplying on the left by the left inverse (G_A^T)^{-1} and on the right by the right inverse G_B^{-1}. Hence, λ_{ij} = 0 for all 1 ≤ i ≤ k_A and 1 ≤ j ≤ k_B, showing that the codewords g_i^A ⊗ g_j^B are linearly independent.

On the other hand, suppose the n_A × n_B matrix M is a codeword of C. By Definition 2.12, each column of M is a codeword of A and each row is a codeword of B. This means that the columns of M are linear combinations of the columns of G_A^T, and the rows of M are linear combinations of the rows of G_B. The last fact means M = D G_B for some n_A × k_B matrix D. Multiplying this by the right inverse G_B^{-1} yields D = M G_B^{-1}. This expression indicates that every column of D is a linear combination of the columns of M,

which are themselves linear combinations of the columns of G_A^T. Hence, D = G_A^T E for some k_A × k_B matrix E. Since E is k_A × k_B, E = Σ_{i,j} λ_{ij} E_{ij} for some λ_{ij} ∈ {0, 1}. Combining these statements gives

M = D G_B = G_A^T E G_B = G_A^T (Σ_{i,j} λ_{ij} E_{ij}) G_B = Σ_{i,j} λ_{ij} G_A^T E_{ij} G_B,

which means that M is a linear combination of the separable codewords g_i^A ⊗ g_j^B. In other words, the codewords g_i^A ⊗ g_j^B span the code C. Since these k_A k_B codewords are also linearly independent, they form a basis of C. Therefore, the dimension of C is k_A k_B.

2.5 Iterated Product Codes

The Elias product construction may be iterated over a sequence of linear codes. Indeed, Elias did exactly this in [2], using a sequence of extended Hamming codes in the following manner. Suppose H_m is the (2^m, 2^m − m − 1, 4) extended Hamming code, where m ≥ 2. Notice that these are in fact the Reed-Muller codes RM(m − 2, m). Elias proposed the iterated product codes

E_L = H_2 ⊗ H_3 ⊗ ... ⊗ H_{L+1}    (2.18)

Since the capacity of the BEC exceeds that of the BSC for low probabilities p of erasure or error, we expect Elias iterated product codes E L to be more successful over the BEC than over the BSC. That is, if the input erasure probability is below some threshold level, the output erasure probability can be made arbitrarily small by increasing L. Indeed, we determine this threshold for the codes E L in Chapter 4, in which we investigate the performance of various iterated product codes. The decoder for an iterated product code over the BEC works in an iterative fashion as well. Consider, for example, the decoder for the (16, 9, 4) product code RM(1, 2) RM(1, 2) correcting the 4 4 received word with three erasures, 0 E 1 0 1 0 1 0 r = 0 1 0 1 1 E 0 E The decoder consists of two stages. At the first stage, four decoders for RM(1, 2) operate on the columns of r. Since RM(1, 2) has minimum distance 2, these decoders correct the single erasure in the fourth column but not the double erasure in the second column. The resulting intermediate word is 0 E 1 0 1 0 1 0 r 1 = 0 1 0 1 1 E 0 1 At the second stage of decoding, four decoders for RM(1, 2) operate on the rows of 22

r 1, correcting the erasures in both row 1 and row 4 and yielding, 0 1 1 0 1 0 1 0 r 2 = 0 1 0 1 1 0 0 1 Thus, in general, the decoder for an iterated product code is only as complicated as the decoders for its constituent codes. The decoder has j stages of decoding, where j is the number of constituent codes. As in the previous example, each stage of decoders operates on an orthogonal dimension of the received word. This means that the erasures encountered by a given stage are uniformly distributed regardless of the correction of previous stages. Hence, the input bit erasure probability encountered by each stage of decoding (after the first) is the output erasure probability resulting from the previous stage. For the ith stage of decoding, we can thus denote the input erasure probability by p i 1 and the output erasure probability by p i. So, p 0 and p L are, respectively, the input erasure probability and the output erasure probability of the iterated product code as a whole. 23

Chapter 3 Criteria and Analysis In this chapter, we first formalize the criteria that iterated product codes should satisfy in order to be considered effective. We then perform mathematical analysis to determine sufficient conditions (wherever possible) for the codes to meet these criteria. 3.1 Criteria Before discussing the criteria, we should establish the following notation. Let {C i } i=1 denote an infinite sequence of (n i, k i, d i ) binary linear block codes. Then, construct the infinite sequence of iterated product codes {P j } j=1, as follows: P j = C 1 C 2 C j (3.1) 24

Observe that Theorem 2.13 indicates that P j is an (n 1 n 2... n j, k 1 k 2... k j, d 1 d 2... d j ) binary linear block code. Hence, the rate of P j is given by R(j) = k 1k 2... k j n 1 n 2... n j = j i=1 k i n i (3.2) Notice also that decoding any of the codes in {P j } is identical, except that the decoder for each terminates at a different stage. Therefore, we can define the input and output erasure probabilities of the ith decoding stage for any code in the sequence {P j } to be p i 1 and p i, respectively. So, for a given code P j, the overall input erasure probability is p 0 and the overall output erasure probability is p j. Observe also that p 0 p 1 p 2, since the BEC decoders never introduce any erasures. The final comment about notation is a reminder that these codes are operating over the BEC with probability of erasure given by p, so p 0 = p and the capacity C of the channel equals 1 p. The first criterion concerns the rates of the codes. The rate R(j) measures the information content of a transmitted message encoded with P j. These rates are given by R(j) = j i=1 k i n i = k j n j R(j 1), (3.3) Now, k i < n i for each constituent code C i, so R(j) is a decreasing function bounded below by 0. Since R(j) is monotone and bounded, lim j R(j) = R for some R 0. We would like to ensure that R 0; otherwise, the iterated product codes would be transmitting no information in the limit. In other words, we require a positive limiting rate R. 25

Criterion 3.1 We require that where R > 0. lim R(j) = k i = R j n i=1 i Subsequently we will attempt to make R close to C = 1 p, the upper bound on R(j) for reliable transmission over the BEC. The second criterion relates to the output erasure probabilities p j for the codes in {P j }. Since p 0 p 1 p 2, the probability p j is also decreasing and bounded below by 0. In this case, we want to force lim j p j = 0; otherwise, the iterated product codes would leave some erasures uncorrected in the limit. That is, we require zero erasures in the limit, or arbitrarily good erasure correction. Criterion 3.2 We require that lim p j = 0 j 3.2 Analysis for Positive Limiting Rate Before grappling with Criterion 3.1, we should define exactly what is meant by convergence of an infinite product. Noting that [ j j lim x i = lim exp ln x i ], (3.4) j j i=1 i=1 we take the standard definition given in [7, Sec. 49]. 26

Definition 3.3 Suppose that {x i } i=1 is a sequence of real numbers. Then, x i converges if and only if i=1 ln x i converges. (3.5) i=1 Note that, under Definition 3.3, if the limit of the product approaches 0, the limit of the sum approaches and the infinite product is said to diverge to zero. At this point, we remind the reader of the limit comparison test for the convergence of infinite series. Refer to elementary Calculus texts, such as [8, Sec. 11.3], for a proof. This result is required for the proof of Theorem 3.5 for the convergence of infinite products; note that an alternate proof is given in [7, Sec. 50]. Theorem 3.4 Limit Comparison Test. Suppose i series of positive terms. Suppose also that lim i x i y i Then i x i converges if and only if i y i converges. x i and i y i are two infinite = L, where L is a positive real. Theorem 3.5 Suppose {x i } i=1 is a sequence of positive real numbers. Then, (1 + x i ) converges if and only if i=1 x i converges. i=1 Proof of Theorem 3.5: By Definition 3.3, it suffices to show that ln(1 + x i ) converges if and only if i=1 x i converges. i=1 Let f(x) = x 1+x, g(x) = ln(1 + x), and h(x) = x. Hence, f (x) = 1 (1+x) 2, g (x) = 1 1+x, and h (x) = 1. Therefore, x 1 + x < ln(1 + x) < x, for all x > 0 (3.6) 27

because f(0) = g(0) = h(0) = 0 and f (x) < g (x) < h (x) for all x > 0. Applying the comparison test for convergence to (3.6) gives the following two results: i=1 x i 1 + x i converges if ln(1 + x i ) converges. (3.7) i=1 Suppose i=1 ln(1 + x i ) converges if i=1 x i converges. (3.8) i=1 x i 1+x i converges. Since this is a sum of positive terms, lim i x i 1+x i = 0. That is, lim i (1 1 1+x i ) = 0, which implies lim i 1 1+x i = 1. Rewriting this as lim i x i 1+x i x i = 1, and applying the limit comparison test for convergence yields: x i converges if i=1 i=1 x i 1 + x i converges. (3.9) Combining (3.7), (3.8) and (3.9) proves the required result. Using Theorem 3.5 in conjunction with the assumption lim i k i n i = 1 gives Corollary 3.6, a necessary and sufficient condition for positive limiting rate. This assumption is reasonable; if lim i k i n i 1, then certainly R = k i i=1 n i = 0. Corollary 3.6 Suppose that lim i k i n i = 1. Then i=1 k i n i = R if and only if i=1 n i k i n i converges, where R > 0. 28

Proof of Corollary 3.6: Set x i = n i k i k i in Theorem 3.5. Since n i > k i for all i, {x i } i=1 is a sequence of positive reals. We can thus apply Theorem 3.5, to give: i=1 n i k i converges if and only if i=1 n i k i k i converges. (3.10) This is not exactly the result we wish to prove. However, notice that i=1 k i n i converges if and only if i=1 n i k i converges (3.11) since their combined product is 1. (Recall that convergence of a product is to a nonzero value.) And by the limit comparison test for convergence, since i=1 n i k i n i lim i converges if and only if n i k i n i n i k i k i i=1 k i = lim = 1, which is assumed. i n i Taking (3.10), (3.11) and (3.12) together with the fact that k i n i the required result. n i k i k i converges (3.12) > 0 for all i, yields 3.3 Analysis for Zero Erasures in the Limit Let us first consider the relationship between the output and input erasure probabilities for a linear block code operating over the BEC, before establishing a condition for Criterion 3.2. Theorem 3.7 The decoder for an (n, k, d) binary linear block code, operating over the BEC with erasure probability p in, produces the output erasure probability p out 29

given by p out p in = n t=d ( ) n 1 (p in ) t 1 (1 p in ) n t t 1 Proof of Theorem 3.7: Consider a particular symbol location S in a received word r. Recall that we represent erasure of the location S, by labelling it with the output symbol E. Hence, p in = P [ S contains E before decoding ] (3.13) and p out = P [ S contains E after decoding ] (3.14) Now, S contains E after decoding if and only if S contains E before decoding and r is not corrected during decoding. Applying the definition of conditional probability to this fact, in conjunction with (3.13) and (3.14) gives P [ r is not corrected S contains E before decoding ] = p out p in (3.15) Theorem 2.10 states that the decoder for the BEC guarantees correction of all erasures in r if the number of erasures is less than or equal to d 1. If the number of erasures is greater than or equal to d, we assume that the decoder leaves r unchanged. For the conditional probability in (3.15), we are given that the symbol in S is erased; so r is not corrected if and only if there are at least d 1 erasures in the other n 1 symbol locations of r. That is, p out p in = P [ at least d 1 erasures in n 1 symbol locations ] = n 1 s=d 1 P [ s erasures in n 1 symbol locations ] (3.16) 30

Remember that the BEC is memoryless, so erasures of symbol locations occur independently with probability p in. Therefore, the number of erasures in a set of symbol locations is a binomially-distributed random variable: P [ s erasures in n 1 symbol locations ] ( ) n 1 = (p in ) s (1 p in ) (n 1) s (3.17) s Combining (3.16) and (3.17) gives p out p in = n 1 s=d 1 = n t=d ( n 1 ) (pin ) s (1 p s in ) (n 1) s ) (pin ) t 1 (1 p in ) n t ( n 1 t 1 by making the substitution t 1 = s. The result in Theorem 3.7 is difficult to grasp because it is not expressed in a closed form. To visualize the consequences of this result, we plot the output versus input erasure profiles for two sets of Reed-Muller codes on logarithmic axes. Fig. 3.1 shows the erasure profiles for a collection of codes of equal minimum distance d = 4, {RM(0, 2), RM(1, 3), RM(2, 4),..., RM(9, 11)}. The best erasure correction profile belongs to the code RM(0, 2) since it has the shortest blocklength. The other codes match this erasure correction rate up to a constant factor for low values of p in. However, at high p in, they perform almost no erasure correction at all. Fig. 3.2 shows the erasure profiles for a collection of codes of equal blocklength n = 16, {RM(0, 4), RM(1, 4), RM(2, 4), RM(3, 4)}. Among these codes, the best profile belongs to RM(0, 4) since it has the largest minimum distance. The other codes improve in erasure correction as p in decreases but never achieve erasure correction comparable to RM(0, 4). 31

Figure 3.1: Output versus Input Erasure Profiles for Reed-Muller Codes with d = 4 We now present Corollary 3.8, a sufficient condition for zero erasures in the limit, that follows directly from Criterion 3.2 and Theorem 3.7. Corollary 3.8 The sequences {n i } i=1 and {d i } i=1 of integers and the initial value p 0 = p must be selected so that the relationship p i n i ( ) ni 1 = (p i 1 ) t 1 (1 p i 1 ) n i t p i 1 t 1 forces lim j p j = 0. t=d i Before completing the analysis for zero erasures in the limit, we interpret Corollary 3.8 from a different perspective. Given a sequence of iterated product codes {P j }, 32

Figure 3.2: Output versus Input Erasure Profiles for Reed-Muller Codes with n = 16 the sequences {n i } and {d i } are fixed, so let us consider which values of p force lim j p j = 0. Notice that if {P j } achieves zero erasures in the limit over the BEC with erasure probability p = ψ, then it certainly does so for all p < ψ, where 0 < ψ 1. This is because the BEC with p = ψ is less reliable than any of those with p < ψ. Hence, we introduce the concept of erasure probability threshold η for the sequence of codes {P j }. Definition 3.9 For the sequence of iterated product codes {P j }, define the Boolean 33

function CONV(p) as below: true, if lim j p j = 0 CONV (p) = false, if lim j p j 0, where 0 p 1 Definition 3.10 The erasure probability threshold η for the sequence of iterated product codes {P j }, is given by η = sup { p CONV (p), 0 p 1 } That is, η is the least upper bound on the set of BEC erasure probabilities p that ensure zero erasures in the limit. It is this threshold η, together with the limiting rate R, that is used in Chapter 4 to evaluate the performance of various sequences of iterated product codes. 34

Chapter 4

Code Synthesis and Performance

In this chapter, we construct several infinite sequences of iterated product codes, using Reed-Muller codes as the constituent codes. We then evaluate their performance with reference to the Shannon bound. By comparing the results of various approaches, we propose a number of heuristics for the synthesis of iterated product codes that operate close to capacity.

4.1 Performance Measures

The performance of a sequence of iterated product codes {P_j} is summarized by the two parameters introduced in Chapter 3; namely, the limiting rate R and the erasure probability threshold η for zero erasures in the limit. We now show that these two quantities are related by the Shannon bound. Consider the BEC with probability of erasure p.

According to Shannon [1], R ≤ C = 1 − p in order to achieve reliable transmission. For {P_j}, this is true for all p up to the threshold η. Hence,

R ≤ 1 − η,    (4.1)

with equality if and only if {P_j} is capacity-achieving in the limit. Fig. 4.1 shows the equality case in (4.1) plotted as a curve on logarithmic axes.

Figure 4.1: The Reference Curve for Capacity-Achieving Codes

The curve in Fig. 4.1 serves as a benchmark for evaluating the performance of a sequence of iterated product codes. Observe that the point (η, R) for the sequence {P_j} must lie on or below the curve.

The distance from (η, R) to the curve along the R axis is a proportional measure of how much the rate R is below the capacity of the BEC of erasure probability η. On the other hand, the distance from (η, R) to the curve along the η axis is a proportional measure of how much the threshold η is below the maximum value for the capacity-achieving code of rate R. Note that these are two different interpretations of performance, but both are equally valid.

4.2 The Space of Constituent Codes

In this thesis, we use the Reed-Muller codes RM(r, m) as the constituent codes in the synthesis of iterated product codes. Table 2.1 shows the space of Reed-Muller codes, indexed by m and r. In this chapter, however, we gain more insight by arranging the codes according to the block code parameters, blocklength n = 2^m and minimum distance d = 2^{m−r}. Table 4.1 shows this arrangement with the rows indexed by n and the columns by d.

          d = 2^1     d = 2^2     d = 2^3     d = 2^4     d = 2^5
n = 2^1   RM(0, 1)
n = 2^2   RM(1, 2)    RM(0, 2)
n = 2^3   RM(2, 3)    RM(1, 3)    RM(0, 3)
n = 2^4   RM(3, 4)    RM(2, 4)    RM(1, 4)    RM(0, 4)
n = 2^5   RM(4, 5)    RM(3, 5)    RM(2, 5)    RM(1, 5)    RM(0, 5)
...

Table 4.1: The space of Reed-Muller codes RM(r, m) indexed by n = 2^m and d = 2^{m−r}

For convenience, let us establish some terminology to refer to the codes in Table 4.1.

Definition 4.1 Let the chain of distance 2^c denote the sequence of codes in the cth column of Table 4.1, starting at the top of the column. That is, the chain of distance 2^c is the sequence {RM(0, c), RM(1, c + 1), RM(2, c + 2), ...}. We then refer to an individual code in a chain as a link.

4.3 Approaches to Code Synthesis

In this section, we consider and evaluate a number of approaches for synthesizing iterated product codes. Each technique involves selecting sequences of constituent codes {C_i} from the Reed-Muller family and applying the rule in (3.1),

P_j = C_1 ⊗ C_2 ⊗ ... ⊗ C_j,

to generate sequences of iterated product codes {P_j}. We then compute the performance parameters (η, R) (using Matlab), so that we may compare the performance of each code synthesized. Note that calculating η involves a binary search of the interval (0, 1), whereas R can be evaluated directly.

Our starting point for code synthesis is Elias' construction, described earlier in (2.18). For that sequence of iterated product codes, C_i = H_{i+1}, where H_m is the (2^m, 2^m − m − 1, 4) extended Hamming code. Recall that the sequence {H_m}_{m=2}^∞ is, in fact, the sequence of Reed-Muller codes {RM(m − 2, m)}_{m=2}^∞. This means that it forms the chain of distance 4. The performance of Elias' example is given by (η, R) = (0.7729, 0.0539), which is the point (η, R) shown in Fig. 4.1.
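The (η, R) point just quoted can be reproduced numerically. The sketch below is in Python rather than the Matlab used in the thesis; the truncation depth of 28 constituent codes and the tolerance 1e-12 are arbitrary choices, so it approximates the limiting behavior. It applies Theorem 3.7 stage by stage and binary-searches for η over (0, 1):

    from math import comb

    def rm(r, m):
        # (n, k, d) of the Reed-Muller code RM(r, m).
        return 2 ** m, sum(comb(m, i) for i in range(r + 1)), 2 ** (m - r)

    def p_out(n, d, p):
        # Theorem 3.7, via the complementary binomial sum so that
        # large blocklengths remain cheap to evaluate.
        tail = sum(comb(n - 1, s) * p ** s * (1.0 - p) ** (n - 1 - s)
                   for s in range(d - 1))
        return p * (1.0 - tail)

    chain = [rm(m - 2, m) for m in range(2, 30)]  # Elias: H_2, H_3, ... truncated

    def converges(p, tol=1e-12):
        for n, _, d in chain:
            p = p_out(n, d, p)
        return p < tol

    lo, hi = 0.0, 1.0  # binary search for the erasure probability threshold
    for _ in range(40):
        mid = (lo + hi) / 2
        if converges(mid):
            lo = mid
        else:
            hi = mid

    R = 1.0
    for n, k, _ in chain:
        R *= k / n

    print(round(lo, 4), round(R, 4))  # approximately 0.7729 and 0.0539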

Let us now progressively develop four approaches to code synthesis.

4.3.1 Selecting every lth Link of a Chain

The first attempt at code synthesis involves selecting every lth link, starting from the first code in a chain of codes, as the sequence of constituent codes {C_i}. Consider, for example, the chain of distance 4: {RM(0, 2), RM(1, 3), RM(2, 4), ...}. Selecting every lth link from the first code gives the sequence of iterated product codes that approaches

RM(0, 2) ⊗ RM(l, l + 2) ⊗ RM(2l, 2l + 2) ⊗ ...

The performance of these sequences, for l ranging from 1 to 16, is plotted in Fig. 4.2. The apparent trend among these points indicates that the erasure probability threshold η decreases dramatically, as a proportion of its capacity-achieving value, when l increases. At the same time, the rate R increases but approaches a limit significantly below capacity. Similar behavior is observed when links are skipped in the other chains of codes. Therefore, selecting every lth link in a chain is not a profitable method of constructing capacity-achieving codes; the gain in R is insignificant compared to the loss in η.

Notice that R is bounded above by 0.25, the rate of RM(0, 2). This is because R is the infinite product of the rates of the constituent codes. The fact that η falls so rapidly is apparent from Fig. 3.1. This demonstrates the fact that among codes of equal minimum distance, the ones with smallest blocklength are the best at erasure correction. Therefore, selecting only every lth link of a chain results in codes with good erasure correction ability being discarded.

LEGEND

      l    Iterated Product Code in the Limit
(a)   1    RM(0, 2) ⊗ RM(1, 3) ⊗ RM(2, 4) ⊗ ...
(b)   2    RM(0, 2) ⊗ RM(2, 4) ⊗ RM(4, 6) ⊗ ...
(c)   4    RM(0, 2) ⊗ RM(4, 6) ⊗ RM(8, 10) ⊗ ...
(d)   8    RM(0, 2) ⊗ RM(8, 10) ⊗ RM(16, 18) ⊗ ...
(e)  16    RM(0, 2) ⊗ RM(16, 18) ⊗ RM(32, 34) ⊗ ...

Figure 4.2: Performance of Codes, synthesized by selecting every lth Link of a Chain

We try to avoid this degradation in the subsequent strategies by never skipping over a link in a chain. Instead, we ensure that a constituent code C_i is always followed in the sequence by a code C_{i+1} of exactly twice the blocklength. In terms of the array in Table 4.1, this means that the code C_{i+1} must appear in the row immediately below the row that contains C_i.

4.3.2 Starting at the lth Link of a Chain

The second approach to code synthesis also involves only a single chain of codes. In this case, we select the sequence of constituent codes {C_i} starting at the lth link of the chain and continuing down the chain without skipping links. Once again, as an example, consider the chain of distance 4. Starting at the lth link of this chain, we generate a sequence of iterated product codes that approaches

RM(l − 1, l + 1) ⊗ RM(l, l + 2) ⊗ RM(l + 1, l + 3) ⊗ ...

The performance of this sequence, for l ranging from 1 to 9, is plotted in Fig. 4.3. As with the previous method, the threshold η for erasure probability falls as l increases. However, this time the proportional decline of η compared to the capacity-achieving η is not so steep. Meanwhile, the rate R approaches capacity. For instance, the code with l = 9 has performance (η, R) = (0.0026, 0.9768), represented by the point (e) in Fig. 4.3. Performing these calculations for other chains of codes yields similar conclusions. Therefore, starting at the lth link of a chain is a suitable technique for designing high rate iterated product codes with low erasure probability thresholds.

The behavior of η is justified by Fig. 3.1, as before.

LEGEND

      l    Iterated Product Code in the Limit
(a)   1    RM(0, 2) ⊗ RM(1, 3) ⊗ RM(2, 4) ⊗ ...
(b)   3    RM(2, 4) ⊗ RM(3, 5) ⊗ RM(4, 6) ⊗ ...
(c)   5    RM(4, 6) ⊗ RM(5, 7) ⊗ RM(6, 8) ⊗ ...
(d)   7    RM(6, 8) ⊗ RM(7, 9) ⊗ RM(8, 10) ⊗ ...
(e)   9    RM(8, 10) ⊗ RM(9, 11) ⊗ RM(10, 12) ⊗ ...

Figure 4.3: Performance of Codes, synthesized by starting at the lth Link of a Chain

Among codes with equal minimum distance, the ones with smallest blocklength are the best at erasure correction. By successively removing small blocklength codes from the iterated product codes, we gradually reduce the threshold for erasure probability. Since these small blocklength codes also have the lowest rates, removing them allows R to approach capacity.

4.3.3 Taking the Chain of Distance 2^c

The first two approaches to iterated product code synthesis manipulated the selection of constituent codes from a single chain. We now consider the iterated product codes that result from choosing the sequence {C_i} as the chain of codes of distance 2^c. Hence, the sequence of iterated product codes approaches

RM(0, c) ⊗ RM(1, c + 1) ⊗ RM(2, c + 2) ⊗ ...

The performance of this sequence, for c ranging from 1 to 3, is plotted in Fig. 4.4. When chains of larger distance are used as the sequence of constituent codes, the rate R falls as a proportion of capacity. The erasure probability threshold η, on the other hand, increases as a proportion of the capacity-achieving η. So, in contrast to the previous method, this approach is appropriate for synthesizing low rate codes that have high erasure probability thresholds. For example, the code with c = 3 has performance given by (η, R) = (0.9004, 0.0067), which is shown in Fig. 4.4 as point (c).

The decreasing limiting rate is expected because equal blocklength Reed-Muller codes with greater minimum distance, d = 2^{m−r}, have smaller dimension, k = Σ_{i=0}^{r} (m choose i), and therefore lower rates.

LEGEND

      c    Iterated Product Code in the Limit
(a)   1    RM(0, 1) ⊗ RM(1, 2) ⊗ RM(2, 3) ⊗ ...
(b)   2    RM(0, 2) ⊗ RM(1, 3) ⊗ RM(2, 4) ⊗ ...
(c)   3    RM(0, 3) ⊗ RM(1, 4) ⊗ RM(2, 5) ⊗ ...

Figure 4.4: Performance of Codes, synthesized by taking the Chain of distance 2^c

Also, among codes with equal blocklength, the codes with greater minimum distance display superior erasure correction ability, as demonstrated in Fig. 3.2. Thus, this method produces iterated product codes with η that approaches its capacity-achieving value.

4.3.4 Switching between Chains

Thus far, we have only been able to construct close-to-capacity iterated product codes which have extreme combinations of limiting rate and erasure probability threshold. We now augment our collection of iterated product codes with intermediate values of R and η.

Notice that, in an effective iterated product code, the bulk of erasure correction must be performed in the initial stages of decoding. Otherwise, the initial stages would be redundant, since subsequent stages would correct most of the erasures anyway. On the other hand, each constituent code in an iterated product code contributes equally to the limiting rate. Let us now reconsider the results of the previous approach of taking the chain of distance 2^c as the sequence of constituent codes; as c increases, the erasure probability threshold increases but the limiting rate decreases. In light of our new observations, it makes sense to use large distance codes for the initial stages of the iterated product code sequence and smaller distance codes for the later stages.

Therefore, we propose forming the constituent code sequence {C_i} by switching between contiguous segments of chains, in such a way that the blocklength doubles and the minimum distance does not increase from stage to stage. That is, given C_i, we