Complexity of Decoding Positive-Rate Reed-Solomon Codes

Qi Cheng (1) and Daqing Wan (2)

(1) School of Computer Science, The University of Oklahoma, Norman, OK 73019. Email: qcheng@cs.ou.edu
(2) Department of Mathematics, University of California, Irvine, CA 92697-3875. Email: dwan@math.uci.edu

Abstract. The complexity of maximum likelihood decoding of the Reed-Solomon codes $[q-1, k]_q$ is a well known open problem. The only known result [4] in this direction states that it is at least as hard as the discrete logarithm in some cases where the information rate unfortunately goes to zero. In this paper, we remove the rate restriction and prove that the same complexity result holds for any positive information rate. In particular, this resolves an open problem left in [4], and rules out the possibility of a polynomial time algorithm for the maximum likelihood decoding problem of Reed-Solomon codes of any rate under a well known cryptographical hardness assumption. As a side result, we give an explicit construction of Hamming balls of radius bounded away from the minimum distance, which contain exponentially many codewords for Reed-Solomon codes of any positive rate less than one. The previous constructions in [2][7] only apply to Reed-Solomon codes of diminishing rates. We also give an explicit construction of Hamming balls of relative radius less than 1 which contain subexponentially many codewords for Reed-Solomon codes of rate approaching one.

1 Introduction

Let $\mathbb{F}_q$ be a finite field of $q$ elements and of characteristic $p$. A linear error-correcting $[n, k]_q$ code is defined to be a linear subspace of dimension $k$ in $\mathbb{F}_q^n$. Let $D = \{x_1, \dots, x_n\} \subseteq \mathbb{F}_q$ be a subset of cardinality $|D| = n > 0$. For $1 \le k \le n$, let $f$ run over all polynomials in $\mathbb{F}_q[x]$ of degree at most $k-1$; the vectors of the form $(f(x_1), \dots, f(x_n)) \in \mathbb{F}_q^n$ constitute a linear error-correcting $[n, k]_q$ code. If $D = \mathbb{F}_q^*$, it is famously known as the Reed-Solomon code. If $D = \mathbb{F}_q$, it is known as the extended Reed-Solomon code. We denote them by $RS_q[q-1, k]$ and $RS_q[q, k]$ respectively. We simply call it a generalized Reed-Solomon code if $D$ is an arbitrary subset of $\mathbb{F}_q$.
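As a concrete instance of the definition above, the following minimal sketch evaluates one message polynomial on $D = \mathbb{F}_q$ and on $D = \mathbb{F}_q^*$. It assumes that $q$ is prime, so that $\mathbb{F}_q$ is simply the integers modulo $q$; the function names `evaluate` and `encode` are illustrative and do not come from the paper.

```python
# Toy illustration of the evaluation-code definition.
# Assumption: q is prime, so F_q is the ring of integers modulo q.

def evaluate(coeffs, x, q):
    """Evaluate a polynomial (coefficients listed lowest degree first) at x in F_q."""
    acc = 0
    for c in reversed(coeffs):      # Horner's rule
        acc = (acc * x + c) % q
    return acc

def encode(coeffs, points, q):
    """Return the codeword (f(x_1), ..., f(x_n)) for the evaluation set `points`."""
    return [evaluate(coeffs, x, q) for x in points]

q, k = 7, 3
message = [2, 0, 5]                           # f(x) = 2 + 5x^2, degree <= k - 1
extended = encode(message, range(q), q)       # D = F_q    gives RS_q[q, k]
primitive = encode(message, range(1, q), q)   # D = F_q^*  gives RS_q[q - 1, k]
print(extended)    # [2, 0, 1, 5, 5, 1, 0]
print(primitive)   # [0, 1, 5, 5, 1, 0]
```

The two codewords differ only in the coordinate indexed by $x = 0$, which is the sense in which the two codes are closely related.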

Remark 1. In some coding theory literature, $RS_q[q-1, k]$ is called the primitive Reed-Solomon code, and a generalized Reed-Solomon code $[n, k]_q$ is defined to be $\{(y_1 f(x_1), \dots, y_n f(x_n)) \mid f \in \mathbb{F}_q[x], \deg(f) < k\}$, where $y_1, y_2, \dots, y_n$ are nonzero elements in $\mathbb{F}_q$.

The minimal distance of a generalized Reed-Solomon $[n, k]_q$ code is $n - k + 1$ because a non-zero polynomial of degree at most $k-1$ has at most $k-1$ zeroes. The ultimate decoding problem for an error-correcting $[n, k]_q$ code is maximum likelihood decoding: given a received word $u \in \mathbb{F}_q^n$, find a codeword $v$ such that the Hamming distance $d(u, v)$ is minimal. When the number of errors is reasonably small, say, smaller than $n - \sqrt{nk}$, the list decoding algorithm of Guruswami-Sudan [8] gives a polynomial time algorithm to find all such codewords for the generalized Reed-Solomon $[n, k]_q$ code. When the number of errors increases beyond $n - \sqrt{nk}$, it is not known whether there exists a polynomial time decoding algorithm.

The maximum likelihood decoding of a generalized Reed-Solomon $[n, k]_q$ code is known to be NP-complete [6]. The difficulty is caused by the combinatorial complication of a subset $D$ with no structure. In fact, there is a straightforward way to reduce the subset sum problem in $D$ to the deep hole problem of a generalized Reed-Solomon code, which can then be reduced to the maximum likelihood decoding problem [3]. Note that the subset sum problem for $D \subseteq \mathbb{F}_q$ is hard only if $|D|$ is much smaller than $q$.

In practical applications, one rarely uses the case of an arbitrary subset $D$. The most widely used case is $D = \mathbb{F}_q^*$, with rich algebraic structure. This case is essentially equivalent to the case $D = \mathbb{F}_q$. For simplicity, we focus on the extended Reed-Solomon code $RS_q[q, k]$ in this paper; all our results can be applied to the Reed-Solomon code $RS_q[q-1, k]$ with little modification. The maximum likelihood decoding problem of $RS_q[q, k]$ is considered to be hard, but the attempts to prove its NP-completeness have failed so far.
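To make the decoding problem concrete, here is a brute-force sketch of maximum likelihood decoding for a tiny extended Reed-Solomon code. It assumes a prime $q$ (so arithmetic is modulo $q$), and it enumerates all $q^k$ codewords, which is exactly the exponential cost that a polynomial time algorithm would have to avoid. The helper names are illustrative only.

```python
from itertools import product

def evaluate(coeffs, x, q):
    """Evaluate a polynomial (lowest degree first) at x modulo the prime q."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % q
    return acc

def codewords(q, k):
    """Yield every codeword of RS_q[q, k] (evaluation set D = F_q, q prime)."""
    for coeffs in product(range(q), repeat=k):
        yield tuple(evaluate(coeffs, x, q) for x in range(q))

def hamming(u, v):
    """Number of coordinates in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def ml_decode(u, q, k):
    """Exhaustive search: return (minimal distance, list of closest codewords)."""
    best, closest = len(u) + 1, []
    for c in codewords(q, k):
        d = hamming(u, c)
        if d < best:
            best, closest = d, [c]
        elif d == best:
            closest.append(c)
    return best, closest

q, k = 5, 2
# For a linear code the minimum distance equals the minimum nonzero weight.
min_dist = min(hamming(c, (0,) * q) for c in codewords(q, k) if any(c))
print(min_dist)                      # 4, matching n - k + 1 with n = q

received = (1, 2, 3, 4, 1)           # f(x) = 1 + x with one corrupted coordinate
print(ml_decode(received, q, k))     # (1, [(1, 2, 3, 4, 0)])
```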
The methods in [6][3] cannot be specialized to $RS_q[q, k]$ because we have lost the freedom to select $D$. The only known complexity result [4] in this direction says that the decoding of $RS_q[q, k]$ is at least as hard as the discrete logarithm in $\mathbb{F}_{q^h}^*$ for $h$ satisfying

$$h \le \sqrt{q} - k, \qquad h \le q^{\frac{1}{2+\epsilon}} + 1 \qquad \text{and} \qquad h \le \frac{k - \frac{4}{\epsilon} - 2}{\frac{4}{\epsilon} + 1}$$

for any $\epsilon > 0$. The main weakness of this result is that $q$ has to be greater than $k^2$, which implies that the information rate $k/q$ goes to zero. But in the real world, we tend to use Reed-Solomon codes of high rates.

1.1 Our results

Our main result in this paper is to remove this restriction. Precisely, we show the following.

Theorem 1. For any $c \in [0, 1]$, there exists an infinite explicit family of Reed-Solomon codes

$$\{RS_{q_1}[q_1, k_1], RS_{q_2}[q_2, k_2], \dots, RS_{q_i}[q_i, k_i], \dots\}$$

with $q_i = \Theta(i^2 \log^2 i)$ and $k_i = (c + o(1)) q_i$ such that if there is a polynomial time randomized algorithm solving the maximum likelihood decoding problem for the above family of codes, then there is a polynomial time randomized algorithm solving the discrete logarithm problem over all the fields in

$$\{\mathbb{F}_{q_1^{h_1}}, \mathbb{F}_{q_2^{h_2}}, \dots, \mathbb{F}_{q_i^{h_i}}, \dots\},$$

where $h_i$ is any integer less than $q_i^{1/4+o(1)}$.

The discrete logarithm problem over finite fields is well studied in computational number theory. It is not believed to have a polynomial time algorithm. Many cryptographical protocols base their security on this assumption. The fastest general purpose algorithm [1] solves the discrete logarithm problem over the finite field $\mathbb{F}_{q^h}$ in conjectured time

$$\exp\left(O\left((\log q^h)^{1/3} (\log\log q^h)^{2/3}\right)\right).$$

Thus, in the above theorem, it is best to take $h_i$ as large as possible (close to $q_i^{1/4+o(1)}$) in order for the discrete logarithm to be hard. If $h = q^{1/4+o(1)}$, this complexity is subexponential in $q$. The above theorem rules out a polynomial time algorithm for the maximum likelihood decoding problem of Reed-Solomon codes of any rate under a cryptographical hardness assumption.

By a direct counting argument, for any positive integer $r < q - k$, there exists a Hamming ball of radius $r$ containing at least $\binom{q}{r} / q^{q-r-k}$ many codewords in the Reed-Solomon code $RS_q[q, k]$. Thus, if $k = cq$ for a constant $0 < c < 1$, we set $r = q - k - q^{1/4}$ and the number of codewords in the Hamming ball will be exponential in $q$. However, finding such a Hamming ball deterministically is a hard problem. There is some work done on this problem [7][2], but all the results are for codes of diminishing rates. Our contribution to this problem is to remove the rate restriction.

Theorem 2. For any $c \in (0, 1)$, there exists a deterministic algorithm that, given a positive integer $i$, outputs a prime power $q$, a positive integer $k$ and a vector $v \in \mathbb{F}_q^q$ such that $q = \Theta(i^2 \log^2 i)$ and $k = (c + o(1))q$, and the Hamming ball centered at $v$ and of radius $q - k - q^{1/4+o(1)}$ contains $\exp(\Omega(q))$ many codewords in $RS_q[q, k]$, and the algorithm runs in time $i^{O(1)}$.

Our construction allows the information rate to be positive. However, the ratio between the Hamming ball radius $q - k - q^{1/4+o(1)}$ and the minimum distance $q - k + 1$, which is known as the relative radius of the Hamming ball, is approaching 1, as in [7][2]. The following result shows that we can decrease the relative radius to a constant less than 1 if we work with codes with information rates going to one.
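The direct counting bound $\binom{q}{r}/q^{q-r-k}$ quoted above can be evaluated numerically. The sketch below works with natural logarithms (via `math.lgamma`) to avoid huge integers, and it rounds $k = cq$ and $q^{1/4}$ to integers, which is an assumption made only for this illustration. The function names are hypothetical.

```python
import math

def log_binomial(n, r):
    """Natural logarithm of the binomial coefficient C(n, r)."""
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)

def log_counting_bound(q, k, r):
    """Natural logarithm of C(q, r) / q^(q - r - k)."""
    return log_binomial(q, r) - (q - r - k) * math.log(q)

def bound_for_rate(q, c):
    """Counting bound with k = floor(c*q) and r = q - k - round(q^(1/4))."""
    k = int(c * q)
    r = q - k - round(q ** 0.25)
    return log_counting_bound(q, k, r)

# The logarithm of the bound, divided by q, stays bounded away from zero,
# i.e. the number of codewords in the ball is exponential in q.
for q in (2**8, 2**12, 2**16):
    print(q, bound_for_rate(q, 0.5) / q)
```

For rate $c = 1/2$ the printed ratios increase toward $\ln 2$, consistent with a count of the form $\exp(\Theta(q))$. The bound is only an existence statement; it does not locate the center of such a ball.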

Theorem 3. For any real number $\rho \in (2/3, 1)$, there is a deterministic algorithm that, given a positive integer $i$, outputs a prime power $q = i^{O(1)}$, a positive integer $k = q - o(\sqrt{q})$ and a vector $v \in \mathbb{F}_q^q$ such that the Hamming ball centered at $v$ and of radius $[\rho(q - k + 1)]$ contains at least $q^i$ many codewords in $RS_q[q, k]$. The algorithm has time complexity $i^{O(1)}$.

Note that the information rate is $1 - o(1)$. It would be interesting for future research to extend the result to all $\rho \in (1/2, 1)$ and to prove a similar result with the information rate positive and the relative radius less than 1.

Given a real number $\rho \in (0, 1)$, the codes where some Hamming balls of relative radius $\rho$ contain superpolynomially many codewords are called $\rho$-dense. It was known in [5] how to efficiently construct such codes for any $\rho \in (1/2, 1)$, but finding the center of such a Hamming ball in deterministic polynomial time was left open. In this paper, we solve this problem if the relative radius falls in the range $(2/3, 1)$, using Reed-Solomon codes of rate approaching one. This result derandomizes an important step in the inapproximability result for the minimum distance problem of a linear code in [5]. However, to completely derandomize the reduction there, one needs to find a linear map from a dense Hamming ball into a linear subspace. This is again an interesting future research direction.

1.2 Techniques

Our earlier paper [4] proved Theorem 1 for $c = 0$ (in that case we have $h_i \le q_i^{1/2+o(1)}$). The main result of our earlier paper was to show that the maximum likelihood decoding of $RS_q[q, k]$ is at least as hard as the discrete logarithm over $\mathbb{F}_{q^h}^*$ if every element in $\mathbb{F}_{q^h}^*$ can be represented as a product of $k + h$ distinct elements from $\alpha + \mathbb{F}_q$, where $\alpha$ satisfies $\mathbb{F}_q[\alpha] = \mathbb{F}_{q^h}$. The number of representations corresponds to the number of codewords in a certain Hamming ball of radius $q - k - h$. In this paper, we shall be concentrating on $0 < c \le 1$. We shall show that the case $c = 1$ follows from the case $c = 0$ by a dual argument.
The main new idea for the case $0 < c < 1$ is to exploit the role of subfields contained in $\mathbb{F}_q$. Assume that $q = q_1^2$ and $h = q^{1/4+o(1)}$ is a positive integer. We have $\mathbb{F}_{q_1} \subset \mathbb{F}_q \subset \mathbb{F}_{q^h}$. Let $\alpha$ be an element in $\mathbb{F}_{q^h}$ such that $\mathbb{F}_{q_1}[\alpha] = \mathbb{F}_q[\alpha] = \mathbb{F}_{q^h}$. We observe that if every element in $\mathbb{F}_{q^h}^*$ can be written as a product of $g_1$ many distinct $\alpha + a$ with $a \in \mathbb{F}_{q_1}$, then for any nonnegative integer $g_2 \le q - q_1$, every element in $\mathbb{F}_{q^h}^*$ can be written as a product of $g_1 + g_2$ many distinct $\alpha + a$ with $a \in \mathbb{F}_q$. This observation enables us to prove the main technical lemma that for any constant $0 < c < 1$, any element in $\mathbb{F}_{q^h}^*$ can be written as a product of $\lfloor cq \rfloor$ distinct factors in $\{\alpha + a \mid a \in \mathbb{F}_q\}$ for $q$ large enough.

2 Previous work for rate c = 0

For the reader's convenience, in this section we sketch the main ideas in our earlier paper [4]. This will be the starting point of our new results in the present paper.

Let $h$ be a positive integer. Let $h(x)$ be a monic irreducible polynomial in $\mathbb{F}_q[x]$ of degree $h$. Let $\alpha$ be a root of $h(x)$ in an extension field. Then, $\mathbb{F}_q[\alpha] = \mathbb{F}_{q^h}$ is a finite field of $q^h$ elements. We have

Theorem 4. Let $h < g < q$ be positive integers. If every element of $\mathbb{F}_{q^h}^*$ can be written as a product of exactly $g$ distinct linear factors of the form $\alpha + a$ with $a \in \mathbb{F}_q$, then the discrete logarithm in $\mathbb{F}_{q^h}^*$ can be efficiently reduced in random time $q^{O(1)}$ to the maximum likelihood decoding of the Reed-Solomon code $RS_q[q, g - h]$.

Proof. In [4], the same result was stated for the weaker bounded distance decoding. Since the specific words used in [4] have exact distance $q - g$ to the code $RS_q[q, g - h]$, the bounded distance decoding and the maximum likelihood decoding are equivalent for those special words. Thus, we may replace bounded distance decoding by maximum likelihood decoding in the above statement.

We now sketch the main ideas. Let $h(x)$ be a monic irreducible polynomial of degree $h$ in $\mathbb{F}_q[x]$. We shall identify the extension field $\mathbb{F}_{q^h}$ with the residue field $\mathbb{F}_q[x]/(h(x))$. Let $\alpha$ be the class of $x$ in $\mathbb{F}_q[x]/(h(x))$. Then, $\mathbb{F}_q[\alpha] = \mathbb{F}_{q^h}$. Consider the Reed-Solomon code $RS_q[q, g - h]$. For a polynomial $f(x) \in \mathbb{F}_q[x]$ of degree at most $h - 1$, let $u_f$ be the received word

$$u_f = \left( \frac{f(a)}{h(a)} + a^{g-h} \right)_{a \in \mathbb{F}_q}.$$

By assumption, we can write

$$f(\alpha) = \prod_{i=1}^{g} (\alpha + a_i),$$

where the $a_i \in \mathbb{F}_q$ are distinct. It follows that as polynomials, we have the identity

$$\prod_{i=1}^{g} (x + a_i) = f(x) + t(x) h(x),$$

where $t(x) \in \mathbb{F}_q[x]$ is some monic polynomial of degree $g - h$. Thus,

$$\frac{f(x)}{h(x)} + x^{g-h} + \left(t(x) - x^{g-h}\right) = \frac{\prod_{i=1}^{g} (x + a_i)}{h(x)},$$

where $t(x) - x^{g-h} \in \mathbb{F}_q[x]$ is a polynomial of degree at most $g - h - 1$ and thus corresponds to a codeword. This equation implies that the distance of the received word $u_f$ to the code $RS_q[q, g - h]$ is at most $q - g$. If the distance is smaller than $q - g$, then one gets a monic polynomial of degree $g$ with more than $g$ distinct roots. Thus, the distance of $u_f$ to the code is exactly $q - g$. Let $C_f$ be the set of codewords in $RS_q[q, g - h]$ that have distance exactly $q - g$ to the received word $u_f$.
The cardinality of $C_f$ is then equal to $1/g!$ times the number of ordered ways that $f(\alpha)$ can be written as a product of exactly $g$ distinct linear factors of the form $\alpha + a$ with $a \in \mathbb{F}_q$. For error radius $q - g$, the maximum likelihood decoding of the received word $u_f$ is the same as finding a solution to the equation

$$f(\alpha) = \prod_{i=1}^{g} (\alpha + a_i),$$

where the $a_i \in \mathbb{F}_q$ are distinct.

To show that the discrete logarithm in $\mathbb{F}_{q^h}^*$ can be reduced to the decoding of the words of the type $u_f$, we apply the index calculus algorithm. Let $b(\alpha)$ be a primitive element of $\mathbb{F}_{q^h}$. Taking $f(\alpha) = b(\alpha)^i$ for a random $0 \le i \le q^h - 2$, the maximum likelihood decoding of the word $u_f$ gives a relation

$$b(\alpha)^i = \prod_{j=1}^{g} (\alpha + a_j(i)),$$

where the $a_j(i) \in \mathbb{F}_q$ are distinct for $1 \le j \le g$. This gives the congruence equation

$$i \equiv \sum_{j=1}^{g} \log_{b(\alpha)} (\alpha + a_j(i)) \pmod{q^h - 1}.$$

Repeating the decoding and letting $i$ vary, this gives enough linear equations in the $q$ variables $\log_{b(\alpha)}(\alpha + a)$ ($a \in \mathbb{F}_q$). Solving the linear system modulo $q^h - 1$, one finds the values of $\log_{b(\alpha)}(\alpha + a)$ for all $a \in \mathbb{F}_q$. To compute the discrete logarithm of an element $v(\alpha) \in \mathbb{F}_{q^h}^*$ with respect to the base $b(\alpha)$, one applies the decoding to the element $v(\alpha)$ and finds a relation

$$v(\alpha) = \prod_{j=1}^{g} (\alpha + b_j),$$

where the $b_j \in \mathbb{F}_q$ are distinct. Then,

$$\log_{b(\alpha)} v(\alpha) \equiv \sum_{j=1}^{g} \log_{b(\alpha)} (\alpha + b_j) \pmod{q^h - 1}.$$

In this way, the discrete logarithm of $v(\alpha)$ is computed. The detailed analysis can be found in [4].

The above theorem is the starting point of our method. In order to use it, one needs to get good information on the integer $g$ satisfying the assumption of the theorem. This is a difficult theoretical problem in general. It can be done in some cases, with the help of Weil's character sum estimate together with a simple sieving. Precisely, the following result was proved for $g$ in [4].

Theorem 5. Let $h < g$ be positive integers. Let

$$N(g, h) = \frac{1}{g!} \left( \frac{q^g - \binom{g}{2} q^{g-1}}{q^h - 1} - \left(1 + \binom{g}{2}\right) (h-1)^g q^{g/2} \right).$$

Then every element in $\mathbb{F}_{q^h}^*$ can be written in at least $N(g, h)$ ways as a product of exactly $g$ distinct linear factors of the form $\alpha + a$ with $a \in \mathbb{F}_q$. If for some constant $\epsilon > 0$, we have

$$q \ge \max\left(g^2, (h-1)^{2+\epsilon}\right), \qquad g \ge \left(\frac{4}{\epsilon} + 2\right)(h + 1),$$

then $N(g, h) \ge q^{g/2}/g! > 0$.

The main drawback of the above theorem is the condition $q \ge g^2$, which translates to the condition that the information rate $(g - h)/q$ goes to zero in applications.

3 The result for rate c = 1

Now we show that Theorem 1 holds when the information rate approaches one.

Proposition 6. Let $g, h$ be positive integers such that for some constant $\epsilon > 0$, we have

$$q \ge \max\left(g^2, (h-1)^{2+\epsilon}\right), \qquad g \ge \left(\frac{4}{\epsilon} + 2\right)(h + 1).$$

Then, every element in $\mathbb{F}_{q^h}^*$ can be written in at least $N(g, h)$ ways as a product of exactly $q - g$ distinct linear factors of the form $\alpha + a$ with $a \in \mathbb{F}_q$.

To prove this proposition, we observe that the map that sends $\beta \in \mathbb{F}_{q^h}^*$ to $\prod_{a \in \mathbb{F}_q} (\alpha + a)/\beta$ is one-to-one from $\mathbb{F}_{q^h}^*$ to itself.

Proof: Note that

$$\prod_{a \in \mathbb{F}_q} (\alpha + a) \ne 0.$$

Given an element $\beta \in \mathbb{F}_{q^h}^*$, from Theorem 5, we have that $\prod_{a \in \mathbb{F}_q} (\alpha + a)/\beta$ can be written in at least $N(g, h)$ ways as a product of exactly $g$ distinct linear factors of the form $\alpha + a$ with $a \in \mathbb{F}_q$. Taking the complementary set of factors in each such representation, $\beta$ can be written in at least $N(g, h)$ ways as a product of exactly $q - g$ distinct linear factors of the form $\alpha + a$ with $a \in \mathbb{F}_q$.

It follows from Theorem 4 that we have the following two results.

Proposition 7. Suppose that

$$q \ge \max\left(g^2, (h-1)^{2+\epsilon}\right), \qquad g \ge \left(\frac{4}{\epsilon} + 2\right)(h + 1).$$

Then the maximum likelihood decoding of $RS_q[q, q - g - h]$ is as hard as the discrete logarithm over the finite field $\mathbb{F}_{q^h}^*$.

Note that the rate $(q - g - h)/q$ approaches 1 as $q$ increases for $g = O(\sqrt{q})$ and $h = O(g) = O(\sqrt{q})$.
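The received word used in the proof of Theorem 4, and its high-rate analogue behind Proposition 7, can be checked on a toy field. The sketch below assumes $q = 7$ (prime, so $\mathbb{F}_q$ is the integers modulo 7) and $h(x) = x^2 + 1$, which is irreducible over $\mathbb{F}_7$ because $-1$ is not a square modulo 7. Starting from a chosen set of distinct shifts $a_i$, it recovers $f$ and $t$ by polynomial division and confirms that the word and the codeword agree in exactly as many positions as there are factors. All function names are illustrative.

```python
P = 7            # q = 7, assumed prime for this sketch
H = [1, 0, 1]    # h(x) = x^2 + 1, coefficients lowest degree first

def poly_mul(a, b, p=P):
    """Multiply two polynomials over F_p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def poly_divmod(a, b, p=P):
    """Divide a by the monic polynomial b; return (quotient, remainder)."""
    a = a[:]
    quot = [0] * max(len(a) - len(b) + 1, 1)
    for shift in range(len(a) - len(b), -1, -1):
        coef = a[shift + len(b) - 1]
        quot[shift] = coef
        for j, y in enumerate(b):
            a[shift + j] = (a[shift + j] - coef * y) % p
    return quot, a[:len(b) - 1]

def poly_eval(a, x, p=P):
    """Evaluate a polynomial at x in F_p (Horner's rule)."""
    acc = 0
    for c in reversed(a):
        acc = (acc * x + c) % p
    return acc

def received_and_codeword(shifts, p=P, h=H):
    """From distinct shifts a_i, build the received word u_f and a codeword."""
    n, deg_h = len(shifts), len(h) - 1
    prod = [1]
    for a in shifts:
        prod = poly_mul(prod, [a % p, 1], p)          # multiply by (x + a)
    t, f = poly_divmod(prod, h, p)                    # prod = t*h + f, deg f < deg h
    u, c = [], []
    for x in range(p):
        inv_h = pow(poly_eval(h, x, p), p - 2, p)     # h has no root in F_p
        x_pow = pow(x, n - deg_h, p)
        u.append((poly_eval(f, x, p) * inv_h + x_pow) % p)   # f(x)/h(x) + x^(n-h)
        c.append((x_pow - poly_eval(t, x, p)) % p)           # degree below n - h
    return u, c

def distance(u, v):
    return sum(a != b for a, b in zip(u, v))

u, c = received_and_codeword([0, 1, 2, 3])   # g = 4 factors, code RS_7[7, 2]
print(distance(u, c))                        # 3 = q - g
u2, c2 = received_and_codeword([0, 1, 2])    # q - g = 3 factors, code RS_7[7, 1]
print(distance(u2, c2))                      # 4 = g
```

The sketch runs the argument in the easy direction, from a known factorization to a nearby codeword; the hardness statements concern the reverse direction, where a decoder must find the factorization.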

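Proposition 8. Suppose that

$$q \ge \max\left(g^2, (h-1)^{2+\epsilon}\right), \qquad g \ge \left(\frac{4}{\epsilon} + 2\right)(h + 1).$$

Let $h(x)$ be an irreducible polynomial of degree $h$ over $\mathbb{F}_q$ and let $f(x)$ be a nonzero polynomial of degree less than $h$ over $\mathbb{F}_q$. Then in the Reed-Solomon code $RS_q[q, q - g - h]$, the Hamming ball centered at

$$\left( \frac{f(a)}{h(a)} + a^{q-g-h} \right)_{a \in \mathbb{F}_q}$$

of radius $g$ contains at least $q^{g/2}/g!$ many codewords.

Note that if we set $g = \sqrt{q}$, then the number of codewords is greater than $2^{\sqrt{q}}$, which is subexponential.

Proof of Theorem 3: The relative radius of the Hamming ball in the above proposition is $\frac{g}{g+h+1}$. If $g = (\frac{4}{\epsilon} + 2)(h + 1)$, then the relative radius is approaching

$$\frac{\frac{4}{\epsilon} + 2}{\frac{4}{\epsilon} + 3} = \frac{2\epsilon + 4}{3\epsilon + 4}.$$

Select $\epsilon$ such that

$$\rho = \frac{2\epsilon + 4}{3\epsilon + 4}.$$

Note that $\epsilon$ can be large if $\rho$ is close to $2/3$. If $g = q^{\frac{1}{2+\epsilon}}$, the number of codewords is at least

$$\frac{q^{g/2}}{g!} > \left(\frac{\sqrt{q}}{g}\right)^g = q^{\frac{\epsilon g}{2(2+\epsilon)}}.$$

To make sure that this number is greater than $q^i$, we need $\epsilon g > 2(2+\epsilon) i$. It is satisfied if we let $q$ be the least prime power that is greater than

$$\left( \frac{2(2+\epsilon) i}{\epsilon} \right)^{2+\epsilon} = i^{O(1)}.$$

We then calculate $g = q^{\frac{1}{2+\epsilon}}$ and solve for $h$ from the equation $g = (\frac{4}{\epsilon} + 2)(h + 1)$. Finally we find an irreducible polynomial $h(x)$ of degree $h$ over $\mathbb{F}_q$ using the algorithm in [9].

4 The result for rate 0 < c < 1

We now consider the positive rate case with $0 < c < 1$. For this purpose, we take $q = q_1^m$ with $m \ge 2$. Let $\alpha$ be an element in $\mathbb{F}_{q^h}$ with $\mathbb{F}_{q_1}[\alpha] = \mathbb{F}_{q^h}$. Since

$$\mathbb{F}_{q_1}[\alpha] \subseteq \mathbb{F}_q[\alpha] \subseteq \mathbb{F}_{q^h},$$

we also have $\mathbb{F}_{q^h} = \mathbb{F}_q[\alpha]$.

Theorem 9. Let $q = q_1^m$ with $m \ge 2$. Let $g_1$ and $g_2$ be non-negative integers with $g_2 \le q - q_1$. Let

$$N(g_1, g_2, h, m) = \frac{1}{g_1!} \left( \frac{q_1^{g_1} - \binom{g_1}{2} q_1^{g_1 - 1}}{q_1^{mh} - 1} - \left(1 + \binom{g_1}{2}\right) (mh - 1)^{g_1} q_1^{g_1/2} \right) \binom{q - q_1}{g_2}.$$

Then, every element in $\mathbb{F}_{q^h}^*$ can be written in at least $N(g_1, g_2, h, m)$ ways as a product of exactly $g_1 + g_2$ distinct linear factors of the form $\alpha + a$ with $a \in \mathbb{F}_q$. If for some constant $\epsilon > 0$, we have

$$q_1 \ge \max\left(g_1^2, (mh - 1)^{2+\epsilon}\right), \qquad g_1 \ge \left(\frac{4}{\epsilon} + 2\right)(mh + 1),$$

then

$$N(g_1, g_2, h, m) \ge \frac{q_1^{g_1/2}}{g_1!} \binom{q - q_1}{g_2} > 0.$$

Proof. Since $g_2 \le q - q_1$, we can choose $g_2$ distinct elements $b_1, \dots, b_{g_2}$ from the set $\mathbb{F}_q - \mathbb{F}_{q_1}$. For any element $\beta \in \mathbb{F}_{q^h}^* = \mathbb{F}_{q_1^{mh}}^*$, since $\mathbb{F}_{q_1}[\alpha] = \mathbb{F}_{q_1^{mh}}$, we can apply Theorem 5 to deduce that

$$\frac{\beta}{(\alpha + b_1) \cdots (\alpha + b_{g_2})} = (\alpha + a_1) \cdots (\alpha + a_{g_1}),$$

where the $a_i \in \mathbb{F}_{q_1}$ are distinct. The number of such sets $\{a_1, a_2, a_3, \dots, a_{g_1}\} \subseteq \mathbb{F}_{q_1}$ is greater than

$$\frac{1}{g_1!} \left( \frac{q_1^{g_1} - \binom{g_1}{2} q_1^{g_1 - 1}}{q_1^{mh} - 1} - \left(1 + \binom{g_1}{2}\right) (mh - 1)^{g_1} q_1^{g_1/2} \right).$$

Since $\mathbb{F}_{q_1}$ and its complement $\mathbb{F}_q - \mathbb{F}_{q_1}$ are disjoint, it follows that

$$\beta = (\alpha + b_1) \cdots (\alpha + b_{g_2}) (\alpha + a_1) \cdots (\alpha + a_{g_1})$$

is a product of exactly $g_1 + g_2$ distinct linear factors of the form $\alpha + a$ with $a \in \mathbb{F}_q$.

We now take $g_1 = q^{1/(2m)} = \sqrt{q_1}$ and $g_2 = \lfloor cq \rfloor - g_1$ in the above theorem. Thus, $g_1 + g_2 = \lfloor cq \rfloor$. We need $g_2$ to satisfy the inequalities

$$0 \le g_2 \le q - q_1 = q - q^{1/m}.$$

That is,

$$0 \le cq - q^{1/(2m)} \le q - q^{1/m}.$$

The left side inequality is satisfied if $q_1 \ge c^{-2/(2m-1)}$. The right side inequality is satisfied if $q_1 \ge (1 - c)^{-1/(m-1)}$. Thus, we obtain

Theorem 10. Let $m$ and $h$ be two positive integers such that $q = q_1^m$. Let $0 < c < 1$ be a constant such that

$$q_1 \ge \max\left( (mh - 1)^{2+\epsilon}, \left(\frac{4}{\epsilon} + 2\right)^2 (mh + 1)^2, c^{-\frac{2}{2m-1}}, (1 - c)^{-\frac{1}{m-1}} \right)$$

for some constant $\epsilon > 0$. Then, every element in $\mathbb{F}_{q^h}^*$ can be written as a product of exactly $\lfloor cq \rfloor$ distinct linear factors of the form $\alpha + a$ with $a \in \mathbb{F}_q$.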
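The parameter conditions of Theorem 10 are easy to check numerically for a candidate $q_1$. The sketch below computes the lower bound on $q_1$ and the split $g_1 + g_2 = \lfloor cq \rfloor$; it takes $g_1 = \lfloor \sqrt{q_1} \rfloor$, which is an assumption made here so that non-square $q_1$ can be tried, and it does not test whether $q_1$ is a prime power. The function names are hypothetical.

```python
import math

def theorem10_threshold(m, h, eps, c):
    """Lower bound on q_1 from the hypothesis of Theorem 10 (as a float)."""
    return max(
        (m * h - 1) ** (2 + eps),
        (4 / eps + 2) ** 2 * (m * h + 1) ** 2,
        c ** (-2 / (2 * m - 1)),
        (1 - c) ** (-1 / (m - 1)),
    )

def split_factors(q1, m, c):
    """Return (g1, g2) with g1 = floor(sqrt(q1)) and g1 + g2 = floor(c * q)."""
    q = q1 ** m
    g1 = math.isqrt(q1)
    g2 = math.floor(c * q) - g1
    return g1, g2

def conditions_hold(q1, m, h, eps, c):
    """Check the inequalities used to derive Theorem 10 from Theorem 9."""
    q = q1 ** m
    g1, g2 = split_factors(q1, m, c)
    return (
        q1 >= theorem10_threshold(m, h, eps, c)
        and g1 >= (4 / eps + 2) * (m * h + 1)   # needed because g1 was rounded down
        and 0 <= g2 <= q - q1
    )

# Example: m = 2, h = 2, eps = 2, rate constant c = 1/2, q_1 = 401 (a prime).
print(theorem10_threshold(2, 2, 2, 0.5))     # 400.0
print(split_factors(401, 2, 0.5))            # (20, 80380)
print(conditions_hold(401, 2, 2, 2, 0.5))    # True
```

In this example the dominant term of the threshold is $(\frac{4}{\epsilon} + 2)^2 (mh + 1)^2$, while the two terms that depend on $c$ are tiny; for a fixed rate the constraint on $q_1$ is driven by $h$ and $\epsilon$.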

Combining this theorem with Theorem 4, we deduce

Theorem 11. Let $m$ and $h$ be two positive integers such that $q = q_1^m$. Let $0 < c < 1$ be a constant such that

$$q_1 \ge \max\left( (mh - 1)^{2+\epsilon}, \left(\frac{4}{\epsilon} + 2\right)^2 (mh + 1)^2, c^{-\frac{2}{2m-1}}, (1 - c)^{-\frac{1}{m-1}} \right)$$

for some constant $\epsilon > 0$. Then, the maximum likelihood decoding of the Reed-Solomon code $RS_q[q, \lfloor cq \rfloor - h]$ is at least as hard (in random time $q^{O(1)}$ reduction) as the discrete logarithm in $\mathbb{F}_{q^h}^*$.

Taking $m = 2$ in this theorem, we deduce Theorem 1.

Proposition 12. Let $h$ be a positive integer and $0 < c < 1$ be a constant. Let $q_1$ be a prime power such that

$$q_1 \ge \max\left( (2h - 1)^{2+\epsilon}, \left(\frac{4}{\epsilon} + 2\right)^2 (2h + 1)^2, c^{-2/3}, (1 - c)^{-1} \right) \qquad (1)$$

for some constant $\epsilon > 0$. Let $q = q_1^2$. Let $h(x)$ be an irreducible polynomial of degree $h$ over $\mathbb{F}_q$ whose root $\alpha$ satisfies $\mathbb{F}_{q_1}[\alpha] = \mathbb{F}_{q^h}$. Let $f(x)$ be a nonzero polynomial over $\mathbb{F}_q$ of degree less than $h$. Then in the Reed-Solomon code $RS_q[q, \lfloor cq \rfloor - h]$, the Hamming ball centered at

$$\left( \frac{f(a)}{h(a)} + a^{\lfloor cq \rfloor - h} \right)_{a \in \mathbb{F}_q}$$

of radius $q - \lfloor cq \rfloor$ contains at least $\exp(\Theta(q))$ many codewords.

Proof: The number of codewords in the ball is greater than

$$\frac{q_1^{\sqrt{q_1}/2}}{\sqrt{q_1}!} \binom{q - q_1}{\lfloor cq \rfloor - \sqrt{q_1}},$$

which is greater than

$$\binom{q - q_1}{\lfloor cq \rfloor - \sqrt{q_1}} = \exp(\Theta(q)).$$

Proof of Theorem 2. Let $q$ be the square of the $i$-th prime power $q_1$ (listed in increasing order). Assume that $i$ is large enough such that $q_1 \ge \max(c^{-2/3}, (1 - c)^{-1})$. We then let $\epsilon$ be $1/\log q$ and $h$ be the largest integer satisfying (1). It remains to find an irreducible polynomial of degree $h$ over $\mathbb{F}_q$ whose root $\alpha$ satisfies $\mathbb{F}_{q_1}[\alpha] = \mathbb{F}_{q^h}$. Let $p$ be the characteristic of $\mathbb{F}_q$. We can use $\alpha$ such that $\mathbb{F}_p[\alpha] = \mathbb{F}_{q^h}$. We need to find an irreducible polynomial of degree $\log_p q^h$ over $\mathbb{F}_p$. It can be done in time polynomial in $p$ and the degree [9]. Then we factor the polynomial over $\mathbb{F}_q$ and take any factor to be $h(x)$. As for $f(x)$, we may simply let $f(x) = 1$.

5 Conclusion and future research

In this paper, we show that the maximum likelihood decoding of the Reed-Solomon code is at least as hard as the discrete logarithm for any given information rate. In our result, we assumed that the cardinality of the finite field is composite. While this is not a problem in practical applications (e.g. $q = 256$ is quite popular), it would be interesting to remove this restriction, that is, to allow prime finite fields as well.

Many important questions about decoding Reed-Solomon codes remain open. For example, little is known about the exact list decoding radius of Reed-Solomon codes. In particular, does there exist a Hamming ball of relative radius less than one that contains super-polynomially many codewords in Reed-Solomon codes of rate less than one?

References

1. Antoine Joux, Reynald Lercier, Nigel Smart, and Frederik Vercauteren. The number field sieve in the medium prime case. In Advances in Cryptology - CRYPTO 2006, volume 4117 of Lecture Notes in Computer Science, pages 326-344. Springer-Verlag, 2006.
2. Eli Ben-Sasson, Swastik Kopparty, and Jaikumar Radhakrishnan. Subspace polynomials and list decoding of Reed-Solomon codes. In 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 207-216, 2006.
3. Qi Cheng and Elizabeth Murray. On deciding deep holes of Reed-Solomon codes. In Proceedings of Annual Conference on Theory and Applications of Models of Computation (TAMC), volume 4484 of Lecture Notes in Computer Science, pages 296-305. Springer-Verlag, 2007.
4. Qi Cheng and Daqing Wan. On the list and bounded distance decodability of Reed-Solomon codes. SIAM Journal on Computing, 37(1):195-209, 2007. Special Issue on FOCS 2004.
5. Ilya Dumer, Daniele Micciancio, and Madhu Sudan. Hardness of approximating the minimum distance of a linear code. IEEE Transactions on Information Theory, 49(1):22-37, 2003.
6. V. Guruswami and A. Vardy. Maximum-likelihood decoding of Reed-Solomon codes is NP-hard. IEEE Transactions on Information Theory, 51(7):2249-2256, 2005.
7. Venkatesan Guruswami and Atri Rudra. Limits to list decoding Reed-Solomon codes. IEEE Transactions on Information Theory, 52(8):3642-3649, 2006.
8. Venkatesan Guruswami and Madhu Sudan. Improved decoding of Reed-Solomon and algebraic-geometry codes. IEEE Transactions on Information Theory, 45(6):1757-1767, 1999.
9. Victor Shoup. New algorithms for finding irreducible polynomials over finite fields. Mathematics of Computation, 54:435-447, 1990.