Guess & Check Codes for Deletions, Insertions, and Synchronization


Serge Kas Hanna, Salim El Rouayheb
ECE Department, IIT, Chicago

(This paper is an extended version of the conference paper [2], which will appear in the proceedings of the IEEE International Symposium on Information Theory, 2017. This work was supported in part by NSF Grant CCF.)

Abstract: We consider the problem of constructing codes that can correct δ deletions occurring in an arbitrary binary string of length n bits. Varshamov-Tenengolts (VT) codes are zero-error single deletion (δ = 1) correcting codes, and have an asymptotically optimal redundancy. Finding similar codes for δ ≥ 2 deletions is an open problem. We propose a new family of codes, which we call Guess & Check (GC) codes, that can correct, with high probability, up to δ deletions occurring in a binary string. Moreover, we show that GC codes can also correct up to δ insertions. GC codes are based on MDS codes and have an asymptotically optimal redundancy that is Θ(δ log n). We provide deterministic polynomial time encoding and decoding schemes for these codes. We also describe the applications of GC codes to file synchronization.

I. INTRODUCTION

The deletion channel is probably the most notorious example of a point-to-point channel whose capacity remains unknown. The bits that are deleted by this channel are completely removed from the transmitted sequence, and their locations are unknown at the receiver (unlike erasures). For example, if 11001 is transmitted, the receiver would get 101 if the first and third bits were deleted. Constructing efficient codes for the deletion channel has also been a challenging task. Varshamov-Tenengolts (VT) codes [1] are the only deletion codes with asymptotically optimal redundancy, and they can correct only a single deletion. The study of the deletion channel has many applications, such as file synchronization [3]-[7] and DNA-based storage [8].

The capacity of the deletion channel has been studied in the probabilistic model. In the model where the deletions are iid and occur with a fixed probability p, an immediate upper bound on the channel capacity is given by the capacity of the erasure channel, 1 - p. Mitzenmacher and Drinea showed in [9] that the capacity of the deletion channel in the iid model is at least (1 - p)/9. Extensive work in the literature has focused on determining lower and upper bounds on the capacity [9]-[14]. We refer interested readers to the comprehensive survey by Mitzenmacher [15]. Ma et al. [7] also studied the capacity in the bursty model of the deletion channel, where the deletion process is modeled by a Markov chain.

A separate line of work has focused on constructing codes that can correct a given number of deletions. In this work, we are interested in binary codes that correct δ deletions. Levenshtein showed in [16] that VT codes [1] are capable of correcting a single deletion (δ = 1), with an asymptotically optimal redundancy (log(n + 1) bits). More information about VT codes and other properties of single deletion codes can be found in [17]. VT codes have also been used to construct codes that can correct a combination of a single deletion and multiple adjacent transpositions [18]. However, finding VT-like codes for multiple deletions (δ ≥ 2) is an open problem. In [16], Levenshtein provided bounds showing that the asymptotic number of redundant bits needed to correct δ bit deletions in an n-bit codeword is Θ(δ log n), i.e., cδ log n for some constant c > 0. Levenshtein's bounds were later generalized and improved in [8]. The simplest code for correcting δ deletions is the (δ + 1) repetition code, where every bit is repeated (δ + 1) times. However, this code is inefficient because it requires δn redundant bits, i.e., a redundancy that is linear in n. Helberg codes [19], [20] are a generalization of VT codes for multiple deletions. These codes can correct multiple deletions, but their redundancy is at least linear in n even for two deletions. Schulman and Zuckerman [21] presented codes that can correct a constant fraction of deletions. Their construction was improved in [22], [23], but the redundancies in these constructions are O(n). Recently, Brakensiek et al. [24] provided an explicit encoding and decoding scheme, for fixed δ, that has O(δ^2 log δ log n) redundancy and near-linear complexity. But the crux of the approach in [24] is that the scheme is limited to a specific family of strings, which the authors refer to as pattern rich strings. In summary, even for the case of two deletions, there are no known explicit codes for arbitrary strings with O(log n) redundancy.

Contributions: We consider the problem of constructing codes that can correct multiple deletions with an asymptotically optimal redundancy, a problem that has been open for many decades. We relax the standard zero-error decoding requirement in the literature (e.g., [19]-[24]) and construct codes that have an asymptotically vanishing probability of decoding failure, assuming uniform iid binary messages. The term decoding failure means that the decoder cannot make a correct decision and outputs a failure to decode error message. We also assume that the positions of the deletions are independent of the information message. Specifically, we make the following contributions: (i) we propose new explicit codes, which we

Fig. 1: General encoding block diagram of the GC code for δ deletions. The message u (k bits) passes through four blocks. Block I: the binary message of length k bits is chunked into adjacent blocks of length log k bits each, and each block is mapped to its corresponding symbol in GF(q), where q = 2^(log k) = k. Block II: the resulting string of k/log k symbols is coded using a systematic (k/log k + c, k/log k) q-ary MDS code, where c > δ is the number of parity symbols. Block III: the symbols in GF(q) are mapped to their binary representations (k + c log k bits). Block IV: only the parity bits are coded using a (δ + 1) repetition code, giving a codeword x of length k + c(δ + 1) log k bits.

call Guess & Check (GC) codes, that can correct, with high probability and in polynomial time, up to δ deletions (or insertions), for constant δ, occurring in arbitrary binary strings. The encoding and decoding schemes of GC codes are deterministic. Moreover, these codes have an asymptotically optimal redundancy of value c(δ + 1) log k, which is c(δ + 1) log n asymptotically, where k and n are the lengths of the message and codeword, respectively, and c > δ is a constant integer; (ii) GC codes enable different trade-offs between redundancy, decoding complexity, and probability of decoding failure; (iii) we provide numerical simulations on the decoding failure of GC codes and compare these simulations to our theoretical results. For instance, we observe that a GC code with rate 0.8 can correct up to 4 deletions in a message of 1024 bits with no decoding failure detected within the simulation runs; (iv) we describe how to use GC codes for file synchronization as part of the interactive algorithm proposed by Venkataramanan et al. in [3], [4], and provide simulation results highlighting the resulting savings in number of rounds and total communication cost.

Organization: The paper is organized as follows. In Section II, we introduce the necessary notation used throughout the paper. We state and discuss the main result of this paper in Section III. In Section IV, we provide encoding and decoding examples for GC codes. In Section V, we describe in detail the encoding and decoding schemes of GC codes. The proof of the main result is given in Section VI. In Section VII, we explain the trade-offs achieved by GC codes. In Section VIII, we explain how these codes can be used to correct insertions instead of deletions. The results of the numerical simulations on the decoding failure of GC codes and their applications to file synchronization are shown in Sections IX and X, respectively. We conclude with some open problems in Section XI.

II. NOTATION

Let k and n be the lengths in bits of the message and codeword, respectively. Let δ be the number of deletions. Without loss of generality, we assume that k is a power of 2. Our code is based on a q-ary systematic (k/log k + c, k/log k) MDS code, where q = k > k/log k + c and c > δ is a code parameter representing the number of MDS parity symbols. (Explicit constructions of systematic Reed-Solomon codes, based on Cauchy or Vandermonde matrices, always exist for these parameters.) We will drop the ceiling notation for ⌈k/log k⌉ and simply write k/log k. All logarithms in this paper are of base 2. The block diagram of the encoder is shown in Fig. 1. We denote binary and q-ary vectors by lower and upper case bold letters respectively, and random variables by calligraphic letters.

III. MAIN RESULT

Let u be a binary vector of length k with iid Bernoulli(1/2) components representing the information message. The message u is encoded into the codeword x of length n bits using the Guess & Check (GC) code illustrated in Fig. 1. The GC decoder, explained in Section V, can either decode successfully and output the decoded string, or output a failure to decode error message because it cannot make a correct decision. The latter case is referred to as a decoding failure, and its corresponding event is denoted by F.

Theorem 1. The Guess & Check (GC) code can correct in polynomial time up to δ deletions, for constant δ, occurring within x. Let c > δ be a constant integer. The code has the following properties:
1) Redundancy: n - k = c(δ + 1) log k bits.
2) Encoding complexity is O(k log k), and decoding complexity is O(k^(δ+2)/log^δ k).
3) Probability of decoding failure: Pr(F) = O(k^δ/(k^(c-δ) log^δ k)).

The probability of decoding failure in Theorem 1 holds for any given deletion positions which are independent of x. Hence, the same result can also be obtained for any random distribution over the deletion positions (like the uniform distribution, for example), by averaging over all the possible deletion positions. GC codes enable trade-offs between the code properties shown in Theorem 1; this will be highlighted later in Section VII. These properties show that: (i) the code rate, R = k/(k + c(δ + 1) log k), is asymptotically optimal and approaches one as k goes to infinity; (ii) the order of complexity is polynomial in k and is not affected by the constant c; (iii) the probability of decoding failure goes to zero polynomially in k if c > 2δ, and exponentially in c for fixed k. Note that the decoder can always detect when it cannot decode successfully. This can serve as an advantage in models which allow feedback, where the decoder can ask for additional redundancy to be able to decode successfully.
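The encoding pipeline of Fig. 1 (Blocks I-IV) can be sketched in Python for k = 16, δ = 1, c = 2. This is a minimal illustration under stated assumptions, not the paper's exact construction: the parity rule p_r = Σ_j U_j α^(rj) is an assumed Vandermonde-style MDS choice, and the 16-bit input message is arbitrary.

```python
# Sketch of the GC encoder (Blocks I-IV) for k = 16, delta = 1, c = 2.
# Assumption: parities p_r = sum_j U_j * alpha^(r*j), an illustrative
# MDS-style rule, not necessarily the paper's generator matrix.

# GF(16) log/antilog tables, primitive polynomial x^4 + x + 1 (alpha^4 = alpha + 1)
EXP = [1]
for _ in range(14):
    v = EXP[-1] << 1
    if v & 0x10:
        v ^= 0b10011               # reduce modulo x^4 + x + 1
    EXP.append(v)
LOG = {v: i for i, v in enumerate(EXP)}

def gf_mul(a, b):
    return 0 if 0 in (a, b) else EXP[(LOG[a] + LOG[b]) % 15]

def gc_encode(u, delta=1, c=2):
    m = 4                                                  # log2(k) for k = 16
    # Block I: chunk into 4-bit blocks, map to GF(16) symbols (MSB first)
    syms = [int(u[i:i + m], 2) for i in range(0, len(u), m)]
    # Block II: compute c parity symbols p_r = sum_j U_j * alpha^(r*j)
    parities = [0] * c
    for r in range(c):
        for j, s in enumerate(syms):
            parities[r] ^= gf_mul(s, EXP[(r * j) % 15])    # GF addition is XOR
    # Block III: parities back to binary
    par_bits = ''.join(format(p, '04b') for p in parities)
    # Block IV: (delta + 1)-fold repetition of the parity bits only
    return u + ''.join(bit * (delta + 1) for bit in par_bits)

x = gc_encode('0010000110000001')
assert len(x) == 16 + 2 * 2 * 4    # redundancy c(delta + 1) log k = 16 bits
```

The final assertion reflects property 1 of Theorem 1: only the c log k parity bits pay the (δ + 1) repetition factor, so the total redundancy is c(δ + 1) log k bits.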

IV. EXAMPLES

The GC code we propose can correct up to δ deletions with high probability. We provide examples to illustrate the encoding and decoding schemes. The examples are for δ = 1 deletion, just for the sake of simplicity. (VT codes can correct one deletion with zero error; GC codes, however, are generalizable to multiple deletions.)

Example 1 (Encoding). Consider a binary message u of length k = 16 bits. u is encoded by following the different encoding blocks illustrated in Fig. 1.
1) Binary to q-ary (Block I, Fig. 1). The message u is chunked into adjacent blocks of length log k = 4 bits each. Each block is then mapped to its corresponding symbol in GF(q), q = 2^(log k) = 2^4 = 16, by considering its leftmost bit as its most significant bit. This results in a string U which consists of k/log k = 4 symbols in GF(16). The extension field used here has a primitive element α, with α^4 = α + 1.
2) Systematic MDS code (Block II, Fig. 1). U is then coded using a systematic (k/log k + c, k/log k) = (6, 4) MDS code over GF(16), with c = 2 > δ = 1. The encoded string is denoted by X ∈ GF(16)^6 and is obtained by multiplying U by the code generator matrix; X consists of the 4 systematic symbols of U followed by the two parity symbols p_1 and p_2.
3) Q-ary to binary (Block III, Fig. 1). The binary codeword x corresponding to X is of length n = k + c log k = 24 bits.
For simplicity we skip the last encoding step (Block IV), intended to protect the parity bits, and assume that deletions affect only the systematic bits.

The high level idea of the decoding algorithm is to: (i) make an assumption on the block in which the bit deletion has occurred (the guessing part); (ii) chunk the bits accordingly, treat the affected block as erased, decode the erasure and check whether the obtained sequence is consistent with the parities (the checking part); (iii) go over all the possibilities.

Example 2 (Successful decoding). Suppose that the 14th bit of x gets deleted, so the decoder receives a 23-bit string y. The decoder goes through all the possible k/log k = 4 cases, where in each case i, i = 1, ..., 4, the deletion is assumed to have occurred in block i and y is chunked accordingly. Given this assumption, symbol i is considered erased, and erasure decoding is applied over GF(16) to recover this symbol. Furthermore, given two parities, each symbol i can be recovered in two different ways. Without loss of generality, we assume that the first parity p_1 is the parity used for decoding the erasure. The decoded q-ary string in case i is denoted by Y_i ∈ GF(16)^4, and its binary representation is denoted by y_i ∈ GF(2)^16. The four cases are as follows.
Case 1: The deletion is assumed to have occurred in block 1, so y is chunked with its first three bits forming the erased block, denoted by E, followed by three blocks of 4 bits. Applying erasure decoding over GF(16), the value of symbol 1 is recovered, yielding the decoded q-ary string Y_1 ∈ GF(16)^4 with binary representation y_1 ∈ GF(2)^16. Now, to check our assumption, we test whether Y_1 is consistent with the second parity p_2. Here the computed parity differs from p_2, which shows that Y_1 does not satisfy the second parity. Therefore, we deduce that our assumption on the deletion location is wrong, i.e., the deletion did not occur in block 1. Throughout the paper we refer to such cases as impossible cases.
Case 2: The deletion is assumed to have occurred in block 2, so the sequence is chunked with the erasure in the second position. Applying erasure decoding, the value of symbol 2 is recovered. Now, before checking whether the decoded string is consistent with the second parity p_2, one can notice that the binary representation of the decoded erasure is not a supersequence of its corresponding 3-bit sub-block. So, without checking p_2, we can deduce that this case is impossible.

Definition 1. We restrict this definition to the case of δ = 1 deletion with two MDS parity symbols in GF(q). A case i, i = 1, 2, ..., k/log k, is said to be possible if it satisfies the two criteria below simultaneously.
Criterion 1: The q-ary string decoded based on the first parity in case i, denoted by Y_i, satisfies the second parity.
Criterion 2: The binary representation of the decoded erasure is a supersequence of its corresponding sub-block.
If any of the two criteria is not satisfied, the case is said to be impossible.

The two criteria mentioned above are both necessary. For instance, in this example, case 1 does not satisfy Criterion 1, but it is easy to verify that it satisfies Criterion 2. Furthermore, case 2 satisfies Criterion 1 but does not satisfy Criterion 2.
Case 3: The deletion is assumed to have occurred in block 3, so the sequence is chunked with the erasure in the third position. By following the same steps as in cases 1 and 2, it is easy to verify that both criteria are unsatisfied, i.e., case 3 is also impossible.
Case 4: The deletion is assumed to have occurred in block 4, so the sequence is chunked with the erasure in the last position. It is easy to verify that this case satisfies both criteria and is indeed possible. After going through all the cases, case 4 stands alone as the only possible case, so the decoder declares successful decoding and outputs y_4 (y_4 = u).

The next example considers another message u and shows how the proposed decoding scheme can lead to a decoding failure. The importance of Theorem 1 is that it shows that the probability of a decoding failure vanishes as k goes to infinity.

Example 3 (Decoding failure). Let k = 16, δ = 1 and c = 2. Consider a binary message u whose q-ary codeword is X = (α^3, 1, α^3, α^8, 1, α^8) ∈ GF(16)^6. We still assume that the deletion affects only the systematic bits. Suppose that the 14th bit of the binary codeword x gets deleted, so the decoder receives a 23-bit binary string y ∈ GF(2)^23. The decoding is carried out as explained in Example 2. The q-ary strings decoded in case 1 and case 4 are Y_1 = (α^3, α^3, α, 1) and Y_4 = (α^3, 1, α^3, α^8). It is easy to verify that both cases 1 and 4 are possible cases. The decoder here cannot know which of the two cases is the correct one, so it declares a decoding failure.

Remark 1. In the previous analysis, each case refers to the assumption that a certain block is affected by the deletion. Hence, among all the cases considered, there is only one correct case that corresponds to the actual deletion location. That correct case always satisfies the two criteria for possible cases (Definition 1). So whenever there is only one possible case (as in Example 2), the decoding will be successful, since that case is for sure the correct one. However, in general, the analysis may yield multiple possible cases. Nevertheless, the decoding can still be successful if all these possible cases lead to the same decoded string. An example of this is when the transmitted codeword is the all 1s sequence. Regardless of the deletion position, this sequence will be decoded as all 1s in all the cases. In fact, whenever the deletion occurs within a run of 0s or 1s that extends over multiple blocks, the cases corresponding to these blocks will all be possible and will lead to the same decoded string. However, sometimes the possible cases can lead to different decoded strings, as in Example 3, thereby causing a decoding failure.

V. GENERAL ENCODING AND DECODING OF GC CODES

In this section, we describe the general encoding and decoding schemes that can correct up to δ deletions. The encoding and decoding steps for δ > 1 deletions are a direct generalization of the steps for δ = 1 described in the previous section. For decoding, we assume without loss of generality that exactly δ deletions have occurred. Therefore, the length of the binary string y received by the decoder is n - δ bits.

A. Encoding Steps

The encoding steps follow the block diagram shown in Fig. 1.
1) Binary to q-ary (Block I, Fig. 1). The message u is chunked into adjacent blocks of length log k bits each. Each block is then mapped to its corresponding symbol in GF(q), where q = 2^(log k) = k. This results in a string U which consists of k/log k symbols in GF(q). (After chunking, the last block may contain fewer than log k bits; in order to map it to its corresponding symbol, it is first padded with zeros to a length of log k bits. Hence, U consists of ⌈k/log k⌉ symbols. We drop the ceiling notation throughout the paper and simply write k/log k.)
2) Systematic MDS code (Block II, Fig. 1). U is then coded using a systematic (k/log k + c, k/log k) MDS code over

GF(q), where c > δ is a code parameter. The q-ary codeword X consists of k/log k + c symbols in GF(q).
3) Q-ary to binary (Block III, Fig. 1). The binary representations of the symbols in X are concatenated respectively.
4) Coding parity bits by repetition (Block IV, Fig. 1). Only the parity bits are coded using a (δ + 1) repetition code, i.e., each bit is repeated (δ + 1) times. The resulting binary codeword x to be transmitted is of length n = k + c(δ + 1) log k.

B. Decoding Steps

1) Decoding the parity symbols of Block II (Fig. 1): these parities are protected by a (δ + 1) repetition code, and can therefore always be recovered correctly by the decoder. A simple way to do this is to examine the bits of y from right to left and decode the deletions instantaneously, until the total length of the decoded sequence is c(δ + 1) log k bits (the original length of the coded parity bits). Therefore, for the remaining steps we assume, without loss of generality, that all the deletions have occurred in the systematic bits.
2) The guessing part: the number of possible ways to distribute the δ deletions among the k/log k blocks is t = C(k/log k + δ - 1, δ). We index these possibilities by i, i = 1, ..., t, and refer to each possibility by case i. The decoder goes through all the t cases (guesses).
3) The checking part: for each case i, i = 1, ..., t, the decoder (i) chunks the sequence according to the corresponding assumption; (ii) considers the affected blocks erased and maps the remaining blocks to their corresponding symbols in GF(q); (iii) decodes the erasures using the first δ parity symbols; (iv) checks whether the case is possible or not based on the criteria described below.

Definition 2. For δ deletions, a case i, i = 1, 2, ..., t, is said to be possible if it satisfies the two criteria below simultaneously.
Criterion 1: The decoded q-ary string in case i, denoted by Y_i ∈ GF(q)^(k/log k), satisfies the last c - δ parities simultaneously.
Criterion 2: The binary representations of all the decoded erasures in Y_i are supersequences of their corresponding sub-blocks.
If any of these two criteria is not satisfied, the case is said to be impossible.

4) After going through all the cases, the decoder declares successful decoding if (i) only one possible case exists, or (ii) multiple possible cases exist but all lead to the same decoded string. Otherwise, the decoder declares a decoding failure.

VI. PROOF OF THEOREM 1

A. Redundancy

The (k/log k + c, k/log k) q-ary MDS code in step 2 of the encoding scheme adds a redundancy of c log k bits. These c log k bits are then coded using a (δ + 1) repetition code. Therefore, the overall redundancy of the code is c(δ + 1) log k bits.

B. Complexity

i) Encoding complexity: The complexity of mapping a binary string to its corresponding q-ary symbols (step 1), or vice versa (step 3), is O(k). Furthermore, the encoding complexity of a (k/log k + c, k/log k) q-ary systematic MDS code is quantified by the complexity of computing the c MDS parity symbols. Computing one MDS parity symbol involves k/log k multiplications of symbols in GF(q). The complexity of multiplying two symbols in GF(q) is O(log^2 q). Recall that in our code q = k. Therefore, the complexity of step 2 is O(c(k/log k) log^2 k) = O(ck log k). Step 4 in the encoding scheme codes c log k bits by repetition, so its complexity is O(cδ log k). Therefore, the encoding complexity of GC codes is O(ck log k) = O(k log k), since c and δ are constants.
ii) Decoding complexity: The computationally dominant step in the decoding scheme is step 3, which goes over all the t cases and decodes the erasures in each case. (The complexity of checking whether a decoded erasure, of length log k bits, is a supersequence of its corresponding sub-block is O(log^2 k) using the Wagner-Fischer algorithm; hence, Criterion 2 of Definition 2 does not affect the order of decoding complexity.) The coding scheme works for any systematic MDS code construction. Suppose we apply the Berlekamp-Massey algorithm, which can be used with Reed-Solomon codes to decode erasures. The Berlekamp-Massey algorithm has a complexity of O(k^2) [25]. Since the erasure decoding is performed for all the t cases, the total decoding complexity is O(k^2 t). The number of cases t is given by t = C(k/log k + δ - 1, δ) = O(k^δ/log^δ k). Therefore, the decoding complexity is O(k^(δ+2)/log^δ k), which is polynomial in k for constant δ.

C. Proof of the probability of decoding failure for δ = 1

To prove the upper bound on the probability of decoding failure, we first introduce the steps of the proof for δ = 1 deletion. Then, we generalize the proof to the case of δ > 1. The probability of decoding failure for δ = 1 is computed over all possible k-bit messages. Recall that the bits of the message u are iid Bernoulli(1/2). The message u is encoded as shown in Fig. 1. For δ = 1, the decoder goes through a total of k/log k cases, where in case i it decodes by assuming that block i is affected by the deletion. Let Y_i be the random variable representing the q-ary string decoded in case i, i = 1, 2, ..., k/log k, in step 3 of the decoding scheme. Let Y ∈ GF(q)^(k/log k) be a realization of the random variable Y_i. We denote by P_r ∈ GF(q), r = 1, 2, ..., c, the random

variable representing the r-th MDS parity symbol (Block II, Fig. 1). Also, let G_r ∈ GF(q)^(k/log k) be the MDS encoding vector responsible for generating P_r. Consider c > 1 arbitrary MDS parities p_1, ..., p_c, for which we define the following sets. For r = 1, ..., c,

A_r := {Y ∈ GF(q)^(k/log k) : G_r^T Y = p_r},   A := A_1 ∩ A_2 ∩ ... ∩ A_c.

A_r and A are affine subspaces of dimensions k/log k - 1 and k/log k - c, respectively. Therefore,

|A_r| = q^(k/log k - 1) and |A| = q^(k/log k - c).   (1)

Recall that the correct values of the MDS parities are recovered at the decoder, and that for δ = 1, Y_i is decoded based on the first parity. Hence, for a fixed MDS parity p_1, and for δ = 1 deletion, Y_i takes values in A_1. Note that Y_i is not necessarily uniformly distributed over A_1. For instance, if the assumption in case i is wrong, two different message inputs can generate the same decoded string Y_i ∈ A_1. We illustrate this later through Example 4. The next claim gives an upper bound on the probability mass function of Y_i for δ = 1 deletion.

Claim 1. For any case i, i = 1, 2, ..., k/log k,

Pr(Y_i = Y | P_1 = p_1) ≤ 2/q^(k/log k - 1).

Claim 1 can be interpreted as saying that at most 2 different input messages can generate the same decoded string Y_i ∈ A_1. We assume Claim 1 is true for now and prove it later in Section VI-E. Next, we use this claim to prove the following upper bound on the probability of decoding failure for δ = 1,

Pr(F) < 2/(k^(c-2) log k).   (2)

In the general decoding scheme, we mentioned two criteria which determine whether a case is possible or not (Definition 2). Here, we upper bound Pr(F) by taking into account Criterion 1 only. Based on Criterion 1, if a case i is possible, then Y_i satisfies all the c MDS parities simultaneously, i.e., Y_i ∈ A. Without loss of generality, we assume case 1 is the correct case, i.e., the deletion occurred in block 1. A decoding failure is declared if there exists a possible case j, j = 2, ..., k/log k, that leads to a decoded string different than that of case 1, namely, Y_j ∈ A and Y_j ≠ Y_1. Therefore,

Pr(F | P_1 = p_1) ≤ Pr( ∪_{j=2}^{k/log k} {Y_j ∈ A, Y_j ≠ Y_1} | P_1 = p_1 )   (3)
  ≤ Σ_{j=2}^{k/log k} Pr(Y_j ∈ A, Y_j ≠ Y_1 | P_1 = p_1)   (4)
  ≤ Σ_{j=2}^{k/log k} Pr(Y_j ∈ A | P_1 = p_1)   (5)
  = Σ_{j=2}^{k/log k} Σ_{Y ∈ A} Pr(Y_j = Y | P_1 = p_1)   (6)
  ≤ Σ_{j=2}^{k/log k} Σ_{Y ∈ A} 2/q^(k/log k - 1)   (7)
  = Σ_{j=2}^{k/log k} 2|A|/q^(k/log k - 1)   (8)
  = Σ_{j=2}^{k/log k} 2 q^(k/log k - c)/q^(k/log k - 1)   (9)
  = Σ_{j=2}^{k/log k} 2/q^(c-1)   (10)
  < 2k/(q^(c-1) log k) = 2/(k^(c-2) log k).   (11)

(4) follows from applying the union bound. (5) follows from the fact that Pr(Y_j ≠ Y_1 | Y_j ∈ A, P_1 = p_1) ≤ 1. (7) follows from Claim 1. (9) follows from (1). (11) follows from the fact that q = k in the coding scheme. Next, to complete the proof of (2), we use (11) and average over all values of p_1:

Pr(F) = Σ_{p_1 ∈ GF(q)} Pr(F | P_1 = p_1) Pr(P_1 = p_1) < 2/(k^(c-2) log k) · Σ_{p_1 ∈ GF(q)} Pr(P_1 = p_1) = 2/(k^(c-2) log k).

D. Proof of the probability of decoding failure for δ deletions

We now generalize the previous proof for δ > 1 deletions and show that Pr(F) = O(k^δ/(k^(c-δ) log^δ k)). For δ deletions, the number of cases is given by

t = C(k/log k + δ - 1, δ) = O(k^δ/log^δ k).   (12)

Consider the random variable Y_i which represents the q-ary string decoded in case i, i = 1, 2, ..., t. The next claim generalizes Claim 1 for δ > 1 deletions.
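The δ = 1 counting behind Claim 1 can also be checked exhaustively for k = 16. The sketch below assumes, as in the proof of Section VI-E, that the first parity is the plain GF(16) sum of the four message symbols (a worked assumption, not the only valid MDS parity), fixes one deletion position and one wrong guess, and verifies over all 2^16 messages that no decoded string is shared by more than γ = 2 messages with equal parity.

```python
# Brute-force check of Claim 1 for k = 16, delta = 1: for a fixed deletion
# position and a wrong guess, at most gamma = 2 messages with the same first
# parity p1 decode to the same string Y_i. Assumed parity: the plain GF(2^4)
# sum (bitwise XOR) of the four 4-bit message symbols.
from collections import defaultdict
from itertools import product

def case_decode(y, i, p1):
    """Chunk the 15 received bits assuming the deletion hit block i (1-indexed),
    then erasure-decode the missing symbol from the sum parity p1."""
    chunks, pos = [], 0
    for b in range(1, 5):
        n = 3 if b == i else 4              # the guessed block lost one bit
        chunks.append(y[pos:pos + n])
        pos += n
    syms = [None if b == i else int(ch, 2)
            for b, ch in enumerate(chunks, start=1)]
    syms[i - 1] = p1
    for s in syms[:i - 1] + syms[i:]:
        syms[i - 1] ^= s                    # GF(2^4) addition is XOR
    return tuple(syms)

groups = defaultdict(set)
d, wrong_case = 2, 4                        # delete bit 3; guess block 4 (wrong)
for bits in product('01', repeat=16):
    u = ''.join(bits)
    p1 = 0
    for j in range(0, 16, 4):
        p1 ^= int(u[j:j + 4], 2)            # sum parity over GF(16)
    y = u[:d] + u[d + 1:]                   # channel: delete one bit
    groups[(p1, case_decode(y, wrong_case, p1))].add(u)

gamma = max(len(g) for g in groups.values())
assert gamma <= 2                           # the constant in Claim 1
```

Repeating the check for other deletion positions and cases gives the same maximum multiplicity, matching the γ ≤ 2 argument of Section VI-E.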

Claim 2. There exists a deterministic function h of δ, h(δ), independent of k, such that for any case i, i = 1, 2, ..., t,

Pr(Y_i = Y | P_1 = p_1, ..., P_δ = p_δ) ≤ h(δ)/q^(k/log k - δ).

We assume Claim 2 is true for now and prove it later (see Appendix A). To bound the probability of decoding failure for δ deletions, we use the result of Claim 2 and follow the same steps as in the proof for δ = 1 deletion, while considering t cases instead of k/log k. Some of the steps are skipped for the sake of brevity.

Pr(F | p_1, ..., p_δ) ≤ Σ_{j=2}^{t} Pr(Y_j ∈ A, Y_j ≠ Y_1 | p_1, ..., p_δ)   (13)
  ≤ Σ_{j=2}^{t} Pr(Y_j ∈ A | p_1, ..., p_δ)   (14)
  ≤ Σ_{j=2}^{t} Σ_{Y ∈ A} h(δ)/q^(k/log k - δ)   (15)
  < t h(δ)/q^(c-δ).   (16)

Furthermore,

Pr(F) = Σ_{p_1, ..., p_δ ∈ GF(q)} Pr(F | p_1, ..., p_δ) Pr(p_1, ..., p_δ)   (17)
  < t h(δ)/q^(c-δ) · Σ_{p_1, ..., p_δ} Pr(p_1, ..., p_δ)   (18)
  = t h(δ)/q^(c-δ).   (19)

Therefore,

Pr(F) = O(k^δ/(k^(c-δ) log^δ k)).   (20)

(15) follows from Claim 2. (18) follows from (16). (20) follows from (12), (19), and the fact that h(δ) is constant (independent of k) for a constant δ.

E. Proof of Claim 1

We focus on case i (i fixed), which assumes that the deletion has occurred in block i. We observe the output Y_i of the decoder in step 3 of the decoding scheme for all possible input messages, for a fixed deletion position and a given parity p_1. Recall that Y_i is a random variable taking values in A_1. Claim 1 gives an upper bound on the probability mass function of Y_i for any i and for a given p_1. We distinguish here between two cases. If case i is correct, i.e., the assumption on the deletion position is correct, then Y_i is always decoded correctly and is uniformly distributed over A_1. If case i is wrong, then Y_i is not uniformly distributed over A_1, as illustrated in the next example.

Example 4. Let the length of the binary message u be k = 16 and consider δ = 1 deletion. Let the first MDS parity p_1 be the sum of the k/log k = 4 message symbols. Consider the previously defined set A_1 with p_1 = 0, and consider the messages U_1 = (0, 0, 0, 0) ∈ A_1 and U_2 = (α, 0, 0, α) ∈ A_1. For the sake of simplicity, we skip the last encoding step (Block IV, Fig. 1) and assume that p_1 is recovered at the decoder. The systematic parts of the corresponding codewords to be transmitted are x_1 = 0000000000000000 and x_2 = 0010000000000010. Now, assume that the 3rd bit of x_1 and x_2 was deleted, and that case 4 (a wrong case) is considered. It is easy to verify that in this case, for both codewords, the q-ary output of the decoder will be Y_4 = (0, 0, 0, 0) ∈ A_1. This shows that there exists a wrong case i in which the same output can be obtained for two different inputs and a fixed deletion position. Thereby, the distribution of Y_i over A_1 is not uniform.

The previous example suggests that to find the bound in Claim 1, we need to determine the maximum number of different inputs that can generate the same output, for an arbitrary fixed deletion position and a given parity p_1. We call this number γ. Once we obtain γ, we can write

Pr(Y_i = Y | D = d, P_1 = p_1) ≤ γ/|A_1| = γ/q^(k/log k - 1),   (21)

where D ∈ {1, ..., n} is the random variable representing the position of the deleted bit. We will explain our approach for determining γ by going through an example for k = 16 that can be easily generalized for any k. We denote by b_z ∈ GF(2), z = 1, 2, ..., k, the bit of the message u in position z.

Example 5. Let k = 16 and δ = 1. Consider the binary message u given by u = b_1 b_2 b_3 b_4 b_5 b_6 b_7 b_8 b_9 b_10 b_11 b_12 b_13 b_14 b_15 b_16. The extension field used here has a primitive element α, with α^4 = α + 1. Assume that b_3 was deleted and case 4 is considered. Hence, the binary string received at the decoder is chunked as follows: the first three chunks are b_1 b_2 b_4 b_5, b_6 b_7 b_8 b_9 and b_10 b_11 b_12 b_13, and the last chunk b_14 b_15 b_16 corresponds to the erasure, denoted by E. The first 3 symbols of Y_4 are then given by

S_1 = α^3 b_1 + α^2 b_2 + α b_4 + b_5 ∈ GF(16),
S_2 = α^3 b_6 + α^2 b_7 + α b_8 + b_9 ∈ GF(16),
S_3 = α^3 b_10 + α^2 b_11 + α b_12 + b_13 ∈ GF(16).

The fourth symbol S_4 of Y_4 is to be determined by erasure decoding. Suppose that there exists another message u' ≠ u such that u and u' lead to the same decoded string

8 Y 4 = (S, S, S 3, S 4 ) Since these two messages generate the same values of S, S and S 3, then they should have the same values for the following bits b b b 4 b 5 b 6 b 7 b 8 b 9 b b b b 3 We refer to these log = bits by fixed bits The only bits that can be different in u and u are b 4, b 5 and b 6 which correspond to the erasure, and the deleted bit b 3 We refer to these log = 4 bits by free bits Although these free bits can be different in u and u, they are constrained by the fact that the first parity in their corresponding q ary codewords X and X should have the same value p Next, we express this constraint by a linear equation in GF (6) Without loss of generality, we assume that p GF (6) is the sum of the = 4 message symbols Hence, p is given by p = α 3 (b + b 5 + b 9 + b 3 ) + α (b + b 6 + b + b 4 ) + α (b 3 + b 7 + b + b 5 ) + (b 4 + b 8 + b + b 6 ) Rewriting the previous equation by having the free bits on the LHS and the fixed bits and p on the RHS we get αb 3 + α b 4 + αb 5 + b 6 = p, () where p GF (6) and is given by p = p + α 3 (b + b 5 + b 9 + b 3 ) + α (b + b 6 + b ) + α (b 7 + b ) + (b 4 + b 8 + b ) The previous equation can be written as the following linear equation in GF (6), α 3 + b 4 α + (b 3 + b 5 ) α + b 6 = p (3) Now, to determine γ, we count the number of solutions of (3) If the unnowns in (3) were symbols in GF (6), then the solutions of (3) would span an affine subspace of size q = 6 3 However, the unnowns in (3) are binary, so we show next that it has at most solutions Let a 3 α 3 + a α + a α + a = p (4) be the polynomial representation of p in GF (6) where (a 3, a, a, a ) GF () 4 Every element in GF (6) has a unique polynomial representation of degree at most 3 Comparing (3) and (4), we obtain the following system of equations b 6 = a, b 3 + b 5 = a, b 4 = a, = a 3 If a 3, then (3) has no solution If a 3 =, then (3) has solutions because b 3 + b 5 = a has solutions Therefore, (3) has at most solutions, ie, γ The analysis in Example 5 
can be directly generalized to messages of any length k. In general, the analysis yields log k free bits and k - log k fixed bits. Now, we generalize (3) and show that γ ≤ 2 for any k. Without loss of generality, we assume that p_1 ∈ GF(q) is the sum of the symbols of the q-ary message U. Consider a wrong case i that assumes that the deletion has occurred in block i. Let d_j be a fixed bit position in block j, j ≠ i, that represents the position of the deletion. Depending on whether the deletion occurred before or after block i, the generalization of (3) is given by one of the two following equations in GF(q). If j < i,

b_{d_j} α^w + b_{l+1} α^(m-1) + b_{l+2} α^(m-2) + ⋯ + b_{l+m} = p_1', (5)

If j > i,

b_{d_j} α^w + b_l α^m + b_{l+1} α^(m-1) + ⋯ + b_{l+m-1} α = p_1', (6)

where l = (i - 1) log k + 1, m = log k, w = j log k - d_j, and p_1' ∈ GF(q) (the generalization of p_1' in Example 5) is the sum of p_1 and the part corresponding to the fixed bits. Suppose that j < i. Note that w ≤ m - 1, so (5) can be written as

b_{l+1} α^(m-1) + ⋯ + (b_{d_j} + b_{l+o}) α^w + ⋯ + b_{l+m} = p_1', (7)

where o is an integer such that 1 ≤ o ≤ m. Hence, by the same reasoning used in Example 5, we can conclude that (7) has at most 2 solutions. The same reasoning applies for (6), where j > i. Therefore, γ ≤ 2, and it follows that

Pr(Y_i = Y | D = d, P_1 = p_1) ≤ 2/q^(k/log k - 1). (8)

The bound in (8) holds for arbitrary d. Therefore, the upper bound on the probability of decoding failure holds for any deletion position picked independently of the codeword. Moreover, for any given distribution on D (like the uniform distribution, for example), we can apply the total law of probability with respect to D and use the result from (8) to get

Pr(Y_i = Y | P_1 = p_1) ≤ 2/q^(k/log k - 1).

VII TRADE-OFFS

As previously mentioned, the first encoding step in GC codes (Block I, Fig. 1) consists of chunking the message into blocks of length log k bits. In this section, we generalize the results in Theorem 1 by considering chunks of arbitrary length l bits (1 ≤ l ≤ k)⁷, instead of log k bits. We show that if l = Ω(log k), then GC codes have an asymptotically vanishing probability of decoding
failure. This generalization allows us to demonstrate two trade-offs achieved by GC codes, based on the code properties in Theorem 2.

Theorem 2. The Guess & Check (GC) code can correct in polynomial time up to a constant number of δ deletions. Let c > δ be a constant integer. The code has the following properties:
1) Redundancy: n - k = c(δ + 1) l bits.
2) Encoding complexity is O(k l), and decoding complexity is O(k^(δ+1)/l^δ).

7 For l = k and c = 1, the code becomes a (δ + 2) repetition code.

3) Probability of decoding failure:

Pr(F) = O( (k/l)^δ / 2^(l(c - δ)) ).

Proof: See Appendix B.

The code properties in Theorem 2 enable two trade-offs for GC codes:
1) Trade-off A: By increasing l for a fixed k and c, we observe from Theorem 2 that the redundancy increases linearly, while the decoding complexity decreases as a polynomial of degree δ. Moreover, increasing l also decreases the probability of decoding failure as a polynomial of degree δ. Therefore, trade-off A shows that by paying a linear price in terms of redundancy, we can simultaneously gain a degree δ polynomial improvement in both decoding complexity and probability of decoding failure.
2) Trade-off B: By increasing c for a fixed k and l, we observe from Theorem 2 that the redundancy increases linearly, while the probability of decoding failure decreases exponentially. Here, the order of complexity is not affected by c. Therefore, trade-off B shows that by paying a linear price in terms of redundancy, we can gain an exponential improvement in probability of decoding failure, without affecting the order of complexity. This trade-off is of particular importance in models which allow feedback, where asking for additional parities (increasing c) will highly increase the probability of successful decoding.

VIII CORRECTING INSERTIONS

In this section we show that by modifying the decoding scheme of GC codes, and keeping the same encoding scheme, we can obtain codes that can correct δ insertions, instead of δ deletions. In fact, the resulting code properties for insertions (Theorem 3) are the same as those for deletions. Recall that l (Section VII) is the code parameter representing the chunking length, i.e., the number of bits in a single block. For correcting insertions, we keep the same encoding scheme as in Fig. 1, and for decoding we only modify the following: (i) Consider the assumption that a certain block B is affected by δ insertions; then, while decoding, we chunk a length of l + δ bits at the position of block B (compared to chunking l - δ bits when decoding
deletions); (ii) The blocks assumed to be affected by insertions are considered to be erased (same as in the deletions problem), but now for Criterion 2 (Definition 2) we check if the decoded erasure is a subsequence of the chunked superblock (of length l + δ). The rest of the details in the decoding scheme stay the same.

Theorem 3. The Guess & Check (GC) code can correct in polynomial time up to a constant number of δ insertions. Let c > δ be a constant integer. The code has the following properties:
1) Redundancy: n - k = c(δ + 1) l bits.
2) Encoding complexity is O(k l), and decoding complexity is O(k^(δ+1)/l^δ).
3) Probability of decoding failure: Pr(F) = O( (k/l)^δ / 2^(l(c - δ)) ).

We omit the proof of Theorem 3 because the same analysis applies as in the proof of Theorem 2. More specifically, the redundancy and the encoding complexity are the same as in Theorem 2 because we use the same encoding scheme. The decoding complexity is also the same as in Theorem 2 because the total number of cases to be checked by the decoder is unchanged. The proof of the upper bound on the probability of decoding failure also applies similarly, using the same techniques.

IX SIMULATION RESULTS

We simulated the decoding of GC codes and compared the obtained probability of decoding failure to the upper bound in Theorem 1. We tested the code for messages of length k = 256, 512 and 1024 bits, and for δ = 2, 3 and 4 deletions.

[Table I layout: columns for δ = 2, 3, 4, each reporting R and Pr(F); rows for the message lengths k. The numeric entries were lost in extraction.]
TABLE I: The table shows the code rate R = k/n and the probability of decoding failure Pr(F) of GC codes for different message lengths k and different numbers of deletions δ. The results shown are for c = δ + 1 and l = log k. The results of Pr(F) are averaged over many runs of simulations. In each run, a message u chosen uniformly at random is encoded into the codeword x. δ bits are then deleted uniformly at random from x, and the resulting string is decoded.

To guarantee an asymptotically vanishing probability of decoding failure, the upper bound in Theorem 1 requires that c > 2δ. Therefore, we make a distinction
between two regimes: (i) δ < c ≤ 2δ: here, the theoretical upper bound is trivial. Table I gives the results for c = δ + 1, with the highest probability of decoding failure observed in our simulations being of the order of 10^-3. This indicates that GC codes can decode correctly with high probability in this regime, although this is not reflected in the upper bound. (ii) c > 2δ: the upper bound is of the order of 10^-5 for k = 1024, δ = 2, and c = 2δ + 1. In the simulations, no decoding failure was detected within the runs performed for the values of c in this regime. In general, the simulations show that GC codes perform better than what the upper bound indicates. This is due to the fact that the effect of Criterion 2 (Definition 2) is not taken into account when deriving the upper bound in Theorem 1. These simulations were performed on a personal computer and the programming code was not optimized. The average decoding time⁸ is in the order of milliseconds for (k = 1024, δ = 2), in the order of seconds for (k = 1024, δ = 3), and in the order of minutes for (k = 1024, δ = 4). Going beyond these values of k and δ will largely increase the running time due to the number of cases to be tested by the decoder. However, for the file synchronization application in which we are interested (see next section), the values of k and δ are relatively small and decoding can be practical.

8 These results are for l = log k; the decoding time can be decreased if we increase l (trade-off A, Section VII).
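The two regimes discussed above can be illustrated numerically. The sketch below evaluates a failure-probability upper bound of the assumed form (k/l)^δ / 2^(l(c−δ)) (our reading of the bound in Theorem 2; the exact expression may carry additional constant factors) at k = 1024, δ = 2, l = log k = 10:

```python
# Hypothetical evaluation of a Theorem-2-style upper bound of the assumed
# form (k/l)^delta / 2^(l*(c - delta)); constant factors may differ from
# the exact bound in the paper.
def failure_bound(k: int, delta: int, l: int, c: int) -> float:
    return (k / l) ** delta / 2 ** (l * (c - delta))

k, delta, l = 1024, 2, 10  # l = log2(k)

# Regime (i): delta < c <= 2*delta. The bound exceeds 1, hence trivial.
print(failure_bound(k, delta, l, c=delta + 1))      # ~ 10.24 (trivial)

# Regime (ii): c > 2*delta. The bound is of the order of 1e-5.
print(failure_bound(k, delta, l, c=2 * delta + 1))  # ~ 9.8e-06
```

The jump from a trivial bound at c = δ + 1 to roughly 10^-5 at c = 2δ + 1 matches the qualitative distinction between the two regimes.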

X APPLICATION TO FILE SYNCHRONIZATION

In this section, we describe how our codes can be used to construct interactive protocols for file synchronization. We consider the model where two nodes (servers) have copies of the same file, but one is obtained from the other by deleting d bits. These nodes communicate interactively over a noiseless link to synchronize the file affected by deletions. Some of the most recent work on synchronization can be found in [3]-[7]. In this section, we modify the synchronization algorithm by Venkataramanan et al. [3,4], and study the improvement that can be achieved by including our code as a black box inside the algorithm. The key idea in [3,4] is to use center bits to divide a large string, affected by d deletions, into shorter segments, such that each segment is affected by one deletion at most. Then, VT codes are used to correct these segments. Now, consider a similar algorithm where the large string is divided such that the shorter segments are affected by δ (δ < d) or fewer deletions. Then, the GC code is used to correct the segments affected by more than one deletion⁹. We set c = δ + 1 and l = log k, and if the decoding fails for a certain segment, we send one extra MDS parity at a time within the next communication round until the decoding is successful. By implementing this algorithm, the gain we get is twofold: (i) reduction in the number of communication rounds; (ii) reduction in the total communication cost. We performed simulations for δ = 2 on files of size 1 Mb, for different numbers of deletions d. The results are illustrated in Table II. We refer to the original scheme in [3,4] as Sync-VT, and to the modified version as Sync-GC. The savings for δ = 2 are roughly 43% to 73% in number of rounds, and 5% to 14% in total communication cost.

[Table II layout: for each number of deletions d, the number of rounds and the total communication cost of Sync-VT and Sync-GC. The numeric entries were lost in extraction.]
TABLE II: Results are averaged over many runs. In each run, a string of size 1 Mb is chosen uniformly at random, and the file to be synchronized is obtained by
deleting d bits from it uniformly at random. The total communication cost is expressed in bits. The number of center bits used is 5.

Note that the interactive algorithm in [3,4] also deals with the general file synchronization problem where the edited file is affected by both deletions and insertions. There, the string is divided into shorter segments such that each segment is either: (i) not affected by any deletions/insertions; or (ii) affected by only one deletion; or (iii) affected by only one insertion. Then, VT codes are used to correct the short segments. GC codes can also be used in a similar manner when both deletions and insertions are involved. In this case, the string would be divided such that the shorter segments are affected by: (i) δ or fewer deletions; or (ii) δ or fewer insertions.

9 VT codes are still used for segments affected by only one deletion.

XI CONCLUSION

In this paper, we introduced a new family of codes, which we called Guess & Check (GC) codes, that can correct multiple deletions (or insertions) with high probability. We provided deterministic polynomial time encoding and decoding schemes for these codes. We validated our theoretical results by numerical simulations. Moreover, we showed how these codes can be used in applications to remote file synchronization. In conclusion, we point out some open problems and possible directions of future work: 1) GC codes can correct δ deletions or δ insertions. Generalizing these constructions to deal with mixed deletions and insertions (indels) is a possible future direction. 2) In our analysis, we declare a decoding failure if the decoding yields more than one possible candidate string. A direction is to study the performance of GC codes in list decoding. 3) From the proof of Theorem 1, it can be seen that the bound on the probability of decoding failure of GC codes holds if the deletion positions are chosen by an adversary that does not observe the codeword. It would be interesting to study the performance of GC codes under more
powerful adversaries which can observe part of the codeword, while still allowing a vanishing probability of decoding failure.

APPENDIX A PROOF OF CLAIM 2

Claim 2 is a generalization of Claim 1 for δ > 1 deletions. Recall that the decoder goes through t cases, where each case corresponds to one possible way to distribute the δ deletions among the blocks. Claim 2 states that there exists a deterministic function h(δ) of δ, independent of k, such that for any case i, i = 1, 2, …, t,

Pr(Y_i = Y | P_1 = p_1, …, P_δ = p_δ) ≤ h(δ)/q^(k/log k - δ),

where Y_i is the random variable representing the q-ary string decoded in case i, i = 1, 2, …, t, and P_r, r = 1, …, c, is the random variable representing the r-th MDS parity symbol. To prove the claim, we follow the same approach used in the proof of Claim 1. Namely, we count the maximum number of different inputs (messages) that can generate the same output (decoded string) for fixed deletion positions (d_1, …, d_δ) and given parities (p_1, …, p_δ). Again, we call this number γ. Recall that, for δ deletions, Y_i is decoded based on the first δ parities¹⁰. Hence, Y_i ∈ A, where

A = A_1 ∩ A_2 ∩ ⋯ ∩ A_δ, (29)
A_r = {Y ∈ GF(q)^(k/log k) | G_r^T Y = p_r}, (30)

for r = 1, …, δ¹¹. We are interested in showing that γ is independent of the binary message length k. To this end, we

10 The underlying assumption here is that the deletions affected exactly δ blocks. In cases where it is assumed that fewer than δ blocks are affected, fewer than δ parities will be used to decode Y_i, and the same analysis applies.
11 The set A is the generalization of the set A_1 for δ = 1.

upper bound γ by a deterministic function of δ denoted by h(δ). Hence, we establish the following bound:

Pr(Y_i = Y | d_1, …, d_δ, p_1, …, p_δ) ≤ γ/|A| < h(δ)/q^(k/log k - δ). (31)

We will now explain the approach for bounding γ through an example for δ = 2 deletions.

Example 6. Let k = 35 and δ = 2. Consider the binary message u given by u = b_1 b_2 ⋯ b_35. Its corresponding q-ary message U consists of 7 symbols (blocks) of length log k = 5 bits each. The message u is encoded into a codeword x using the GC code (Fig. 1). We assume that the first parity is the sum of the systematic symbols and that the encoding vector for the second parity is (1, α, α^2, α^3, α^4, α^5, α^6). Moreover, we assume that the actual deleted bits in x are b_2 and b_7, but the chunking is done based on the assumption that the deletions occurred in the 3rd and 5th blocks (wrong case). Similar to Example 5, it can be shown that the free bits are constrained by the following system of two linear equations in GF(32)¹²:

α^4 b_2 + α^3 b_7 + α^2 b_13 + α b_14 + b_15 + α^4 b_16 + α^3 b_22 + α^2 b_23 + α b_24 + b_25 = p_1',
α^4 b_2 + α^4 b_7 + α^2 (α^2 b_13 + α b_14 + b_15) + α^7 b_16 + α^5 (α^3 b_22 + α^2 b_23 + α b_24 + b_25) = p_2'. (32)

To upper bound γ, we upper bound the number of solutions of the system given by (32). Equation (32) can be written as follows:

(b_2 + b_16) α^4 + b_7 α^3 + B_1 + B_2 = p_1',
(b_2 + b_7) α^4 + b_16 α^7 + α^2 B_1 + α^5 B_2 = p_2', (33)

where B_1 and B_2 are two symbols in GF(32) given by

B_1 = α^2 b_13 + α b_14 + b_15, (34)
B_2 = α^3 b_22 + α^2 b_23 + α b_24 + b_25. (35)

Notice that the coefficients of B_1 and B_2 in (33) originate from the MDS encoding vectors. Hence, for given bit values of b_2, b_7 and b_16, the MDS property implies that (33) has a unique solution for B_1 and B_2. Furthermore, since B_1 and B_2 have unique polynomial representations in GF(32) of degree at most 4, for given values of B_1 and B_2, (34) and (35) have at most one solution for b_13, b_14, b_15, b_22, b_23, b_24 and b_25. We think of bits b_13, b_14, b_15, b_22, b_23, b_24 and b_25 as free bits of type I, and bits b_2, b_7, b_16 as free bits of type II. The previous reasoning
indicates that for given values of the bits of type II, (32) has at most one solution. Therefore, an upper bound on γ is given by the number of possible choices of the bits of type II. Hence, γ ≤ 2^3 = 8.

12 The extension field used is GF(32) and has a primitive element α, with α^5 = α^2 + 1.

Now, we generalize the previous example and upper bound γ for δ > 1 deletions. Without loss of generality, we assume that the δ deletions occur in δ different blocks. Then, the free bits are constrained by a system of δ linear equations in GF(q), where q = k. Let ν(·) and µ(·) be non-negative integers. Each of the δ equations has the following form:

Σ_{i=1}^β α^(ν_i) b_(µ_i) + Σ_{j=1}^δ α^(ν_j) B_j = p', (36)

where β is the number of free bits of type II, B_j ∈ GF(q) is a linear combination of part of the bits in block j (the free bits of type I), and p' ∈ GF(q). The coefficients of B_j, j = 1, …, δ, originate from the MDS code generator matrix. Hence, for given values of the bits of type II, the system of equations has a unique solution for B_j, j = 1, …, δ. Furthermore, the linear combination of the bits of type I which gives B_j has the following form:

B_j = α^m b_(j1) + α^(m-1) b_(j2) + ⋯ + α^(m-λ+1) b_(jλ), (37)

where m < log k is an integer, and λ is the number of free bits of type I. Notice that (37) corresponds to a polynomial representation in GF(q), q = k, of degree less than log k. Hence, for a given B_j, (37) has at most one solution. Therefore, γ is upper bounded by the number of possible choices of the bits of type II, i.e., γ ≤ 2^β. Recall that, when it is assumed that block j is affected by deletions, a sub-block of log k - δ bits is chunked at the position of block j. However, because of the shift caused by the deletions, that sub-block may contain bits which do not originate from block j. The λ free bits of type I in B_j, b_(j1), …, b_(jλ), are the bits of the sub-block which do originate from block j. Since the shift at block j is at most δ positions, it is easy to see that λ > log k - 2δ. Hence, since β = δ + δ(log k - λ), we can show that γ ≤ 2^β < 2^(δ(2δ+1)) = h(δ)¹³. Therefore, we have shown that γ is
upper bounded by a deterministic function h(δ) of δ that is independent of k. Since the bound in (31) holds for arbitrary deletion positions (d_1, …, d_δ), the upper bound on the probability of decoding failure in Theorem 1 holds for any deletion positions picked independently of the codeword. Moreover, for any given distribution on the deletion positions (like the uniform distribution, for example), we can apply the total law of probability and use the result from (31) to get

Pr(Y_i = Y | P_1 = p_1, …, P_δ = p_δ) < h(δ)/q^(k/log k - δ).

APPENDIX B PROOF OF THEOREM 2

We consider the same encoding scheme illustrated in Fig. 1, with the only modification that the message is chunked into blocks of length l bits, and the field size is q = 2^l. Taking

13 If the deletions occur in z < δ blocks, then β = δ + z(log k - λ), thus the upper bound would still hold.
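All of the parity manipulations in this appendix are ordinary arithmetic in GF(2^l). As a minimal self-contained illustration (not the authors' implementation), the sketch below builds GF(16) with the primitive polynomial x^4 + x + 1 (an assumed choice), computes two MDS-style parities p_1 = Σ S_j and p_2 = Σ α^(j-1) S_j for a toy 4-symbol message, and then recovers two erased symbols from the parities, mirroring how the decoder solves for the guessed blocks:

```python
# Toy GF(16) arithmetic with primitive polynomial x^4 + x + 1 (0b10011).
# This is an illustrative choice; the paper only assumes some GF(2^l).
MOD = 0b10011

def gf_mul(a: int, b: int) -> int:
    # Carry-less multiplication followed by reduction modulo MOD.
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= MOD
        b >>= 1
    return r

def gf_pow(a: int, e: int) -> int:
    r = 1
    for _ in range(e):
        r = gf_mul(r, a)
    return r

def gf_inv(a: int) -> int:
    return gf_pow(a, 14)  # a^(q-2) with q = 16

ALPHA = 0b0010  # the element x, primitive in this field

# Message of 4 symbols over GF(16); parities p1 = sum S_j, p2 = sum alpha^(j-1) S_j.
S = [0b1011, 0b0100, 0b1110, 0b0001]
p1 = S[0] ^ S[1] ^ S[2] ^ S[3]
p2 = 0
for j, s in enumerate(S):
    p2 ^= gf_mul(gf_pow(ALPHA, j), s)

# Erase symbols S[0] and S[2]; solve the 2x2 system over GF(16):
#   x0 + x2 = b1,    x0 + alpha^2 x2 = b2
b1 = p1 ^ S[1] ^ S[3]
b2 = p2 ^ gf_mul(gf_pow(ALPHA, 1), S[1]) ^ gf_mul(gf_pow(ALPHA, 3), S[3])
det = gf_pow(ALPHA, 0) ^ gf_pow(ALPHA, 2)  # 1 + alpha^2, nonzero (MDS property)
x2 = gf_mul(gf_inv(det), b1 ^ b2)
x0 = b1 ^ x2
print(x0 == S[0] and x2 == S[2])  # True
```

Because the 2x2 coefficient matrix is Vandermonde-type, its determinant factor 1 + α^2 is nonzero, which is exactly the MDS property invoked in the proofs above.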

Guess & Check Codes for Deletions, Insertions, and Synchronization. Serge Kas Hanna, Salim El Rouayheb, ECE Department, Rutgers University. sergekhanna@rutgers.edu, salimelrouayheb@rutgers.edu. arXiv:759569v3

More information

Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation

Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation Information-Theoretic Lower Bounds on the Storage Cost of Shared Memory Emulation Viveck R. Cadambe EE Department, Pennsylvania State University, University Park, PA, USA viveck@engr.psu.edu Nancy Lynch

More information

Transmitting k samples over the Gaussian channel: energy-distortion tradeoff

Transmitting k samples over the Gaussian channel: energy-distortion tradeoff Transmitting samples over the Gaussian channel: energy-distortion tradeoff Victoria Kostina Yury Polyansiy Sergio Verdú California Institute of Technology Massachusetts Institute of Technology Princeton

More information

18.310A Final exam practice questions

18.310A Final exam practice questions 18.310A Final exam practice questions This is a collection of practice questions, gathered randomly from previous exams and quizzes. They may not be representative of what will be on the final. In particular,

More information

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE

Run-length & Entropy Coding. Redundancy Removal. Sampling. Quantization. Perform inverse operations at the receiver EEE General e Image Coder Structure Motion Video x(s 1,s 2,t) or x(s 1,s 2 ) Natural Image Sampling A form of data compression; usually lossless, but can be lossy Redundancy Removal Lossless compression: predictive

More information

Regenerating Codes and Locally Recoverable. Codes for Distributed Storage Systems

Regenerating Codes and Locally Recoverable. Codes for Distributed Storage Systems Regenerating Codes and Locally Recoverable 1 Codes for Distributed Storage Systems Yongjune Kim and Yaoqing Yang Abstract We survey the recent results on applying error control coding to distributed storage

More information

Chapter 7. Error Control Coding. 7.1 Historical background. Mikael Olofsson 2005

Chapter 7. Error Control Coding. 7.1 Historical background. Mikael Olofsson 2005 Chapter 7 Error Control Coding Mikael Olofsson 2005 We have seen in Chapters 4 through 6 how digital modulation can be used to control error probabilities. This gives us a digital channel that in each

More information

6.895 PCP and Hardness of Approximation MIT, Fall Lecture 3: Coding Theory

6.895 PCP and Hardness of Approximation MIT, Fall Lecture 3: Coding Theory 6895 PCP and Hardness of Approximation MIT, Fall 2010 Lecture 3: Coding Theory Lecturer: Dana Moshkovitz Scribe: Michael Forbes and Dana Moshkovitz 1 Motivation In the course we will make heavy use of

More information

IN this paper, we will introduce a new class of codes,

IN this paper, we will introduce a new class of codes, IEEE TRANSACTIONS ON INFORMATION THEORY, VOL 44, NO 5, SEPTEMBER 1998 1861 Subspace Subcodes of Reed Solomon Codes Masayuki Hattori, Member, IEEE, Robert J McEliece, Fellow, IEEE, and Gustave Solomon,

More information

Algebraic Gossip: A Network Coding Approach to Optimal Multiple Rumor Mongering

Algebraic Gossip: A Network Coding Approach to Optimal Multiple Rumor Mongering Algebraic Gossip: A Networ Coding Approach to Optimal Multiple Rumor Mongering Supratim Deb, Muriel Médard, and Clifford Choute Laboratory for Information & Decison Systems, Massachusetts Institute of

More information

Turbo Codes Can Be Asymptotically Information-Theoretically Secure

Turbo Codes Can Be Asymptotically Information-Theoretically Secure Turbo Codes Can Be Asymptotically Information-Theoretically Secure Xiao Ma Department of ECE, Sun Yat-sen University Guangzhou, GD 50275, China E-mail: maxiao@mailsysueducn Abstract This paper shows that

More information

On MBR codes with replication

On MBR codes with replication On MBR codes with replication M. Nikhil Krishnan and P. Vijay Kumar, Fellow, IEEE Department of Electrical Communication Engineering, Indian Institute of Science, Bangalore. Email: nikhilkrishnan.m@gmail.com,

More information

Section 3 Error Correcting Codes (ECC): Fundamentals

Section 3 Error Correcting Codes (ECC): Fundamentals Section 3 Error Correcting Codes (ECC): Fundamentals Communication systems and channel models Definition and examples of ECCs Distance For the contents relevant to distance, Lin & Xing s book, Chapter

More information

Distributed Data Storage with Minimum Storage Regenerating Codes - Exact and Functional Repair are Asymptotically Equally Efficient

Distributed Data Storage with Minimum Storage Regenerating Codes - Exact and Functional Repair are Asymptotically Equally Efficient Distributed Data Storage with Minimum Storage Regenerating Codes - Exact and Functional Repair are Asymptotically Equally Efficient Viveck R Cadambe, Syed A Jafar, Hamed Maleki Electrical Engineering and

More information

On Bit Error Rate Performance of Polar Codes in Finite Regime

On Bit Error Rate Performance of Polar Codes in Finite Regime On Bit Error Rate Performance of Polar Codes in Finite Regime A. Eslami and H. Pishro-Nik Abstract Polar codes have been recently proposed as the first low complexity class of codes that can provably achieve

More information

MATH32031: Coding Theory Part 15: Summary

MATH32031: Coding Theory Part 15: Summary MATH32031: Coding Theory Part 15: Summary 1 The initial problem The main goal of coding theory is to develop techniques which permit the detection of errors in the transmission of information and, if necessary,

More information

An Improvement of Non-binary Code Correcting Single b-burst of Insertions or Deletions

An Improvement of Non-binary Code Correcting Single b-burst of Insertions or Deletions An Improvement of Non-binary Code Correcting Single b-burst of Insertions or Deletions Toyohiko Saeki, Takayuki Nozaki Yamaguchi University ISITA2018 Oct. 29th, 2018 1 / 15 Background (1: Purpose) b-burst

More information

Security in Locally Repairable Storage

Security in Locally Repairable Storage 1 Security in Locally Repairable Storage Abhishek Agarwal and Arya Mazumdar Abstract In this paper we extend the notion of locally repairable codes to secret sharing schemes. The main problem we consider

More information

Lecture 4 Noisy Channel Coding

Lecture 4 Noisy Channel Coding Lecture 4 Noisy Channel Coding I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw October 9, 2015 1 / 56 I-Hsiang Wang IT Lecture 4 The Channel Coding Problem

More information

Solutions of Exam Coding Theory (2MMC30), 23 June (1.a) Consider the 4 4 matrices as words in F 16

Solutions of Exam Coding Theory (2MMC30), 23 June (1.a) Consider the 4 4 matrices as words in F 16 Solutions of Exam Coding Theory (2MMC30), 23 June 2016 (1.a) Consider the 4 4 matrices as words in F 16 2, the binary vector space of dimension 16. C is the code of all binary 4 4 matrices such that the

More information

Distributed Reed-Solomon Codes

Distributed Reed-Solomon Codes Distributed Reed-Solomon Codes Farzad Parvaresh f.parvaresh@eng.ui.ac.ir University of Isfahan Institute for Network Coding CUHK, Hong Kong August 216 Research interests List-decoding of algebraic codes

More information

Berlekamp-Massey decoding of RS code

Berlekamp-Massey decoding of RS code IERG60 Coding for Distributed Storage Systems Lecture - 05//06 Berlekamp-Massey decoding of RS code Lecturer: Kenneth Shum Scribe: Bowen Zhang Berlekamp-Massey algorithm We recall some notations from lecture

More information

Lecture 6 I. CHANNEL CODING. X n (m) P Y X

Lecture 6 I. CHANNEL CODING. X n (m) P Y X 6- Introduction to Information Theory Lecture 6 Lecturer: Haim Permuter Scribe: Yoav Eisenberg and Yakov Miron I. CHANNEL CODING We consider the following channel coding problem: m = {,2,..,2 nr} Encoder

More information

MATH3302 Coding Theory Problem Set The following ISBN was received with a smudge. What is the missing digit? x9139 9

MATH3302 Coding Theory Problem Set The following ISBN was received with a smudge. What is the missing digit? x9139 9 Problem Set 1 These questions are based on the material in Section 1: Introduction to coding theory. You do not need to submit your answers to any of these questions. 1. The following ISBN was received

More information

Lecture 6: Expander Codes

Lecture 6: Expander Codes CS369E: Expanders May 2 & 9, 2005 Lecturer: Prahladh Harsha Lecture 6: Expander Codes Scribe: Hovav Shacham In today s lecture, we will discuss the application of expander graphs to error-correcting codes.

More information

Communication Efficient Secret Sharing

Communication Efficient Secret Sharing 1 Communication Efficient Secret Sharing Wentao Huang, Michael Langberg, Senior Member, IEEE, Joerg Kliewer, Senior Member, IEEE, and Jehoshua Bruck, Fellow, IEEE Abstract A secret sharing scheme is a

More information

On Sparsity, Redundancy and Quality of Frame Representations

On Sparsity, Redundancy and Quality of Frame Representations On Sparsity, Redundancy and Quality of Frame Representations Mehmet Açaaya Division of Engineering and Applied Sciences Harvard University Cambridge, MA Email: acaaya@fasharvardedu Vahid Taroh Division

More information

Accumulate-Repeat-Accumulate Codes: Capacity-Achieving Ensembles of Systematic Codes for the Erasure Channel with Bounded Complexity

Accumulate-Repeat-Accumulate Codes: Capacity-Achieving Ensembles of Systematic Codes for the Erasure Channel with Bounded Complexity Accumulate-Repeat-Accumulate Codes: Capacity-Achieving Ensembles of Systematic Codes for the Erasure Channel with Bounded Complexity Henry D. Pfister, Member, Igal Sason, Member Abstract This paper introduces

More information

STUDY OF PERMUTATION MATRICES BASED LDPC CODE CONSTRUCTION

STUDY OF PERMUTATION MATRICES BASED LDPC CODE CONSTRUCTION EE229B PROJECT REPORT STUDY OF PERMUTATION MATRICES BASED LDPC CODE CONSTRUCTION Zhengya Zhang SID: 16827455 zyzhang@eecs.berkeley.edu 1 MOTIVATION Permutation matrices refer to the square matrices with

More information

Zigzag Codes: MDS Array Codes with Optimal Rebuilding

Zigzag Codes: MDS Array Codes with Optimal Rebuilding 1 Zigzag Codes: MDS Array Codes with Optimal Rebuilding Itzhak Tamo, Zhiying Wang, and Jehoshua Bruck Electrical Engineering Department, California Institute of Technology, Pasadena, CA 91125, USA Electrical

More information

Asymptotically good sequences of codes and curves

Asymptotically good sequences of codes and curves Asymptotically good sequences of codes and curves Ruud Pellikaan Technical University of Eindhoven Soria Summer School on Computational Mathematics July 9, 2008 /k 1/29 Content: 8 Some algebraic geometry

More information

Mathematics Department

Mathematics Department Mathematics Department Matthew Pressland Room 7.355 V57 WT 27/8 Advanced Higher Mathematics for INFOTECH Exercise Sheet 2. Let C F 6 3 be the linear code defined by the generator matrix G = 2 2 (a) Find

More information

Upper Bounds to Error Probability with Feedback

Upper Bounds to Error Probability with Feedback Upper Bounds to Error robability with Feedbac Barış Naiboğlu Lizhong Zheng Research Laboratory of Electronics at MIT Cambridge, MA, 0239 Email: {naib, lizhong }@mit.edu Abstract A new technique is proposed

More information

Linear Algebra. F n = {all vectors of dimension n over field F} Linear algebra is about vectors. Concretely, vectors look like this:

Linear Algebra. F n = {all vectors of dimension n over field F} Linear algebra is about vectors. Concretely, vectors look like this: 15-251: Great Theoretical Ideas in Computer Science Lecture 23 Linear Algebra Linear algebra is about vectors. Concretely, vectors look like this: They are arrays of numbers. fig. by Peter Dodds # of numbers,

More information

Error Detection & Correction

Error Detection & Correction Error Detection & Correction Error detection & correction noisy channels techniques in networking error detection error detection capability retransmition error correction reconstruction checksums redundancy

More information

Lecture 4 : Introduction to Low-density Parity-check Codes

Lecture 4 : Introduction to Low-density Parity-check Codes Lecture 4 : Introduction to Low-density Parity-check Codes LDPC codes are a class of linear block codes with implementable decoders, which provide near-capacity performance. History: 1. LDPC codes were

More information

Lecture 4: Proof of Shannon s theorem and an explicit code

Lecture 4: Proof of Shannon s theorem and an explicit code CSE 533: Error-Correcting Codes (Autumn 006 Lecture 4: Proof of Shannon s theorem and an explicit code October 11, 006 Lecturer: Venkatesan Guruswami Scribe: Atri Rudra 1 Overview Last lecture we stated

More information

Related-Key Statistical Cryptanalysis

Related-Key Statistical Cryptanalysis Related-Key Statistical Cryptanalysis Darakhshan J. Mir Department of Computer Science, Rutgers, The State University of New Jersey Poorvi L. Vora Department of Computer Science, George Washington University

More information

Binary Linear Codes G = = [ I 3 B ] , G 4 = None of these matrices are in standard form. Note that the matrix 1 0 0

Binary Linear Codes G = = [ I 3 B ] , G 4 = None of these matrices are in standard form. Note that the matrix 1 0 0 Coding Theory Massoud Malek Binary Linear Codes Generator and Parity-Check Matrices. A subset C of IK n is called a linear code, if C is a subspace of IK n (i.e., C is closed under addition). A linear

More information

Communication Efficient Secret Sharing

Communication Efficient Secret Sharing Communication Efficient Secret Sharing 1 Wentao Huang, Michael Langberg, senior member, IEEE, Joerg Kliewer, senior member, IEEE, and Jehoshua Bruck, Fellow, IEEE arxiv:1505.07515v2 [cs.it] 1 Apr 2016

More information

Decoding Reed-Muller codes over product sets

Decoding Reed-Muller codes over product sets Rutgers University May 30, 2016 Overview Error-correcting codes 1 Error-correcting codes Motivation 2 Reed-Solomon codes Reed-Muller codes 3 Error-correcting codes Motivation Goal: Send a message Don t

More information

Communications II Lecture 9: Error Correction Coding. Professor Kin K. Leung EEE and Computing Departments Imperial College London Copyright reserved

Communications II Lecture 9: Error Correction Coding. Professor Kin K. Leung EEE and Computing Departments Imperial College London Copyright reserved Communications II Lecture 9: Error Correction Coding Professor Kin K. Leung EEE and Computing Departments Imperial College London Copyright reserved Outline Introduction Linear block codes Decoding Hamming

More information

Coding for loss tolerant systems

Coding for loss tolerant systems Coding for loss tolerant systems Workshop APRETAF, 22 janvier 2009 Mathieu Cunche, Vincent Roca INRIA, équipe Planète INRIA Rhône-Alpes Mathieu Cunche, Vincent Roca The erasure channel Erasure codes Reed-Solomon

More information

Lecture 7 September 24

Lecture 7 September 24 EECS 11: Coding for Digital Communication and Beyond Fall 013 Lecture 7 September 4 Lecturer: Anant Sahai Scribe: Ankush Gupta 7.1 Overview This lecture introduces affine and linear codes. Orthogonal signalling

More information

Hybrid Noncoherent Network Coding

Hybrid Noncoherent Network Coding Hybrid Noncoherent Network Coding Vitaly Skachek, Olgica Milenkovic, Angelia Nedić University of Illinois, Urbana-Champaign 1308 W. Main Street, Urbana, IL 61801, USA Abstract We describe a novel extension

More information

A Tight Rate Bound and Matching Construction for Locally Recoverable Codes with Sequential Recovery From Any Number of Multiple Erasures

A Tight Rate Bound and Matching Construction for Locally Recoverable Codes with Sequential Recovery From Any Number of Multiple Erasures 1 A Tight Rate Bound and Matching Construction for Locally Recoverable Codes with Sequential Recovery From Any Number of Multiple Erasures arxiv:181050v1 [csit] 6 Dec 018 S B Balaji, Ganesh R Kini and

More information

Source Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime

Source Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime Source Coding and Function Computation: Optimal Rate in Zero-Error and Vanishing Zero-Error Regime Solmaz Torabi Dept. of Electrical and Computer Engineering Drexel University st669@drexel.edu Advisor:

More information

Dependence and independence

Dependence and independence Roberto s Notes on Linear Algebra Chapter 7: Subspaces Section 1 Dependence and independence What you need to now already: Basic facts and operations involving Euclidean vectors. Matrices determinants

More information

Chapter 2: Random Variables

Chapter 2: Random Variables ECE54: Stochastic Signals and Systems Fall 28 Lecture 2 - September 3, 28 Dr. Salim El Rouayheb Scribe: Peiwen Tian, Lu Liu, Ghadir Ayache Chapter 2: Random Variables Example. Tossing a fair coin twice:

More information

An Extended Fano s Inequality for the Finite Blocklength Coding

An Extended Fano s Inequality for the Finite Blocklength Coding An Extended Fano s Inequality for the Finite Bloclength Coding Yunquan Dong, Pingyi Fan {dongyq8@mails,fpy@mail}.tsinghua.edu.cn Department of Electronic Engineering, Tsinghua University, Beijing, P.R.

More information

AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN

AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN AN INFORMATION THEORY APPROACH TO WIRELESS SENSOR NETWORK DESIGN A Thesis Presented to The Academic Faculty by Bryan Larish In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

More information

New constructions of WOM codes using the Wozencraft ensemble

New constructions of WOM codes using the Wozencraft ensemble New constructions of WOM codes using the Wozencraft ensemble Amir Shpilka Abstract In this paper we give several new constructions of WOM codes. The novelty in our constructions is the use of the so called

More information

Codes over Subfields. Chapter Basics

Codes over Subfields. Chapter Basics Chapter 7 Codes over Subfields In Chapter 6 we looked at various general methods for constructing new codes from old codes. Here we concentrate on two more specialized techniques that result from writing

More information

1 Basic Combinatorics

1 Basic Combinatorics 1 Basic Combinatorics 1.1 Sets and sequences Sets. A set is an unordered collection of distinct objects. The objects are called elements of the set. We use braces to denote a set, for example, the set

More information

Optimal Fractal Coding is NP-Hard 1

Optimal Fractal Coding is NP-Hard 1 Optimal Fractal Coding is NP-Hard 1 (Extended Abstract) Matthias Ruhl, Hannes Hartenstein Institut für Informatik, Universität Freiburg Am Flughafen 17, 79110 Freiburg, Germany ruhl,hartenst@informatik.uni-freiburg.de

More information