High-rate Locally-testable Codes with Quasi-polylogarithmic Query Complexity

Swastik Kopparty*   Or Meir†   Noga Ron-Zewi‡   Shubhangi Saraf§

September 5, 2015

Abstract

An error-correcting code is said to be locally testable if there is a test that checks whether a given string is a codeword, or rather far from the code, by reading only a small number of symbols of the string. Locally testable codes (LTCs) are both interesting in their own right and have important applications in complexity theory. A long line of research tries to determine the best tradeoff between rate and distance that LTCs can achieve.

In this work, we construct LTCs that have high rate (arbitrarily close to 1), have constant relative distance, and can be tested using (log n)^{O(log log n)} queries. This improves over the previous best construction of high-rate LTCs, by the same authors, which uses exp(√(log n · log log n)) queries [KMRS15]. In fact, as in [KMRS15], our result is actually stronger: for binary codes, we obtain LTCs that match the Zyablov bound for any rate 0 < r < 1. For codes over a large (but constant-size) alphabet, we obtain LTCs that approach the Singleton bound, for any rate 0 < r < 1.

1 Introduction

Locally-testable codes (LTCs) are error-correcting codes that admit local error-detecting algorithms. Specifically, LTCs are codes that come equipped with a randomized sub-linear-time local-testing algorithm which, when given oracle access to a received word z, makes a few queries to z and distinguishes with high probability between the following two cases: z is a codeword, or z is ε-far from every codeword of the code. LTCs were introduced by [FS95, RS96] and their systematic study was begun in [GS06]. They are not only interesting in their own right, but also have important connections to other areas of complexity theory, most notably to PCPs [AS98, ALM+98].

* Department of Mathematics & Department of Computer Science, Rutgers University, Piscataway NJ 08854, USA. Supported in part by a Sloan Fellowship and NSF grant CCF-1253886. swastik.kopparty@gmail.com
† Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 76100, Israel. or.meir@weizmann.ac.il. This research was carried out when Meir was supported in part by the Israel Science Foundation (grant No. 460/05).
‡ School of Mathematics, Institute for Advanced Study, Princeton, NJ, USA. nogazewi@ias.edu. Supported in part by the Rothschild fellowship and NSF grant CCF-1412958.
§ Department of Mathematics & Department of Computer Science, Rutgers University, Piscataway NJ 08854, USA. Supported in part by NSF grant CCF-1350572. shubhangi.saraf@gmail.com

The three key parameters of LTCs are the rate, the relative distance, and the query complexity. The rate of a code is the ratio of the message length to the codeword length, and it measures the amount of redundancy of the code. The relative distance of a code is the minimum fraction of coordinates on which every pair of distinct codewords disagree, and it is related to the fraction of errors that the code can detect and correct. Finally, the query complexity of an LTC is the number of queries made by the local-testing algorithm to the received word. Naturally, one would like to maximize the rate and relative distance of the LTC while minimizing its query complexity. In particular, the most interesting and fundamental question about LTCs asks: are there LTCs with constant rate, constant relative distance and constant query complexity?

Most of the research on LTCs so far has focused on the constant-query regime. After many years of intensive research [HS00, GS06, BSVW03, BGH+06], it is now known [BS08, Din07, Vid15b] that there are LTCs of length n with rate Ω(1/polylog(n)), constant relative distance and constant query complexity (where the query complexity can even be as small as 3!). It is not known whether one can obtain codes with constant rate and constant relative distance in this regime of constant query complexity, and it is known that if the LTCs satisfy additional restrictions then constant rate and constant relative distance are not possible [DK11, BV12].

This paper is about the regime of LTCs with constant rate. In this regime one fixes the rate r and the relative distance to be constants and asks for the minimum query complexity achievable by LTCs. For constant rate r < 1/2, Reed-Muller codes over large fields were long known [RS96] to give LTCs of length n with rate r, constant relative distance and query complexity n^{O(1/log(1/r))}. More recent work [Vid15a, GKS13] showed that in fact one can get LTCs with constant relative distance and query complexity O(n^β) for any constant β > 0, even with rate arbitrarily close to 1. Even more recently, we showed in [KMRS15] that there exist LTCs with constant rate (and again the rate can be taken arbitrarily close to 1), constant relative distance, and query complexity exp(√(log n · log log n)).

1.1 Our results

In this work, we construct LTCs that have constant rate (which can be taken arbitrarily close to 1) and constant relative distance, with query complexity being only (log n)^{O(log log n)}. Formally, we prove the following result.

Theorem 1.1 (High-rate binary LTCs with quasi-polylogarithmic query complexity). For every r ∈ (0, 1), there exist δ > 0 and an explicit infinite family of binary linear codes {C_n}_n satisfying:
1. C_n has block length n, rate at least r, and relative distance at least δ.
2. C_n is locally testable with query complexity and running time at most (log n)^{O(log log n)}.

In fact, not only do our LTCs have constant rate and constant relative distance, but the tradeoff between the rate and the distance matches the Zyablov bound [Zya71]. Formally, for every ε > 0 the relative distance δ above can be taken to satisfy
δ = max_{r < R < 1} { (1 − R − ε) · H^{-1}(1 − r/R) },
where H^{-1} is the inverse of the binary entropy function. Over large (but still constant-size) alphabets, we show that such codes can approach the Singleton bound, i.e., they can essentially obtain a rate-distance tradeoff that is the best possible for general codes.

Theorem 1.2 (LTCs with quasi-polylogarithmic query complexity approaching the Singleton bound). For every r ∈ (0, 1) and every ε > 0, there exists an explicit infinite family of linear codes {C_n}_n satisfying:
1. C_n has block length n, rate at least r, and relative distance at least 1 − r − ε.
2. C_n is locally testable with query complexity and running time at most (log n)^{O(log log n)}.
3. The alphabet of C_n is of size at most exp(poly(1/ε)).

We note that the exponential dependence of the alphabet size on ε follows from our use of the Alon-Luby distance-amplification method (see below). This dependence indeed seems to be a bottleneck in all applications of this method, e.g. [GI05].

1.2 Our techniques

Our construction of LTCs uses ingredients from our previous construction of high-rate LTCs [KMRS15], in conjunction with ideas from the iterative construction of constant-query LTCs by Meir [Mei09].

In [KMRS15], we showed that a technique of Alon and Luby [AL96] can be used to amplify the relative distance of LTCs while preserving their local testability. In particular, starting with an LTC with relative distance δ, we can amplify its relative distance to any 0 < δ' < 1 while increasing the query complexity by a factor of poly(1/δ') and decreasing the rate by a factor of only 1 − δ'. This technique is useful, since it is easier to construct LTCs with small relative distance, and the technique allows us to transform such LTCs into ones with better relative distance. Indeed, our construction of LTCs in [KMRS15] followed this scheme: first, we constructed LTCs with constant rate and extremely small relative distance, namely exp(−√(log n · log log n)). Then, we applied the Alon-Luby technique to amplify the relative distance to a constant, while paying a factor of exp(√(log n · log log n)) in the query complexity.

The main technical contribution of this paper is an improved construction of LTCs in the sub-constant relative distance regime. More specifically, we show how to construct LTCs of high rate with relative distance 1/polylog(n) and query complexity (log n)^{O(log log n)} (so both the relative distance and the query complexity are improved). Using the Alon-Luby distance-amplification, this gives in turn LTCs of high rate with constant relative distance and query complexity (log n)^{O(log log n)}.

We construct our improved LTCs in the sub-constant relative distance regime using an iterative strategy, along the lines of the construction of Meir [Mei09] (which itself is in the spirit of several classical, iterative, "brave, yet moderate" algorithms [Gol11]: the zig-zag product [RVW00], undirected connectivity in log-space [Rei08] and the PCP theorem via gap amplification [Din07]). The main new ingredient in our iterative strategy is again the Alon-Luby distance-amplification technique (but in a different setting of parameters).

1.2.1 The iterative construction

We now describe our iterative construction of high-rate LTCs with relative distance 1/polylog(n) and query complexity (log n)^{O(log log n)}. Following [Mei09], our construction starts with a code of very small block length, which can be tested simply by reading the entire received word. Then, the block length is increased iteratively, while the rate, relative distance, and query complexity are not harmed by too much. More specifically, suppose we want to construct a code with block length n. We start with a code of block length poly log n, rate 1 − 1/poly log n, and relative distance 1/poly log n.
Then, in each iteration, the block length and the rate are (roughly) squared, the relative distance is maintained, and the query complexity is increased by a factor of poly log n. Thus, after log log n iterations, we obtain an LTC with block length n, constant rate, relative distance 1/poly log n, and query complexity (log n)^{O(log log n)}, as required.

A single iteration. We now describe the structure of a single iteration. Suppose that, at the beginning of the iteration, the code C has block length n, rate r, relative distance δ, and query complexity q. We apply to C the following two operations:

- Tensor product: We replace C with its tensor product C^{⊗2}. The tensor product C^{⊗2} is the code that contains all n × n matrices M such that all rows of M and all columns of M are codewords of C. The code C^{⊗2} has block length n^2, rate r^2, relative distance δ^2, and query complexity q · poly(1/δ).

- Distance amplification: We apply (a variant of) the Alon-Luby distance-amplification to the code C^{⊗2}, and amplify the relative distance from δ^2 back to δ. The resulting code has block length O(n^2), rate (1 − δ) · r^2, relative distance δ, and query complexity q · poly(1/δ).

The iteration ends after the distance amplification.

The local testability of the tensor product. There is one more complication that needs to be handled: as explained above, we need the tensor product to preserve the local testability, i.e., to increase the query complexity by a factor of at most poly(1/δ). However, this does not necessarily hold if C is an arbitrary code. In fact, there is a long line of research that attempts to understand the conditions under which the tensor product preserves local testability [BS06, Val05, CR05, DSW06, GM12, BV09b, BV09a, Vid15a].

In order to resolve this issue, we use the following idea of [Mei09]. We say that a code C_0 has property ★ if there exists a code D such that C_0 is the tensor product D^{⊗2}. It follows from [BS06, Vid15a] that the tensor product operation roughly preserves the local testability of codes that have property ★. Specifically, if C_0 has property ★, is locally testable with query complexity q, and has relative distance δ, then C_0^{⊗2} is locally testable with query complexity q · poly(1/δ). Hence, in order to make our construction go through, we maintain the invariant that our code C has property ★ throughout the iterations.

This requires us to show that a single iteration preserves property ★. To this end, first observe that the tensor product operation clearly preserves property ★. The more challenging part is to make sure that the distance amplification preserves property ★. In order to do so, we define a new operation, which we call ★-distance amplification, which amplifies the distance while preserving property ★ and the local testability.

The ★-distance amplification. We conclude by describing how the ★-distance amplification works. Suppose that we are given a code C_0 that has property ★, and we wish to amplify its relative distance to δ. By definition, there exists some code D such that C_0 = D^{⊗2}. We apply the Alon-Luby distance-amplification to D to obtain a new code D' with relative distance √δ. We now define the code C_0' = (D')^{⊗2} to be the result of applying the ★-distance amplification to C_0. Observe that C_0' indeed has relative distance δ, and that it has property ★, as required. It remains to prove that the ★-distance amplification preserves the local testability, i.e., that this operation increases the query complexity by a factor of at most poly(1/δ).
Following [Mei09], we show this by decomposing the operation into simpler block-wise operations, and showing that each of these operations preserves local testability.
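The parameter bookkeeping of the iteration above can be made concrete with a small Python sketch. The initial polylog exponents and the poly(1/δ) powers below are illustrative placeholders, not the constants of the actual construction.

```python
# A minimal numeric sketch of how the parameters evolve in the iterative construction.
# The constants hidden inside the poly(.) factors are arbitrary choices for illustration.
import math

def iterate_parameters(target_n: int):
    """Track (block length, rate, relative distance, query complexity) across iterations."""
    log_n = math.log2(target_n)
    n = int(log_n) ** 2              # initial block length: poly log n (exponent is arbitrary)
    rate = 1 - 1 / log_n             # initial rate: 1 - 1/poly log n
    delta = 1 / log_n ** 2           # relative distance kept at ~1/poly log n throughout
    queries = n                      # the base code is tested by reading the whole word

    while n < target_n:
        # Tensor product: block length and rate are squared ...
        n, rate = n * n, rate * rate
        queries *= (1 / delta) ** 3  # ... and the tester pays a poly(1/delta) factor
        # Distance amplification: distance restored to delta, rate loses a (1 - delta) factor
        rate *= 1 - delta
        queries *= (1 / delta) ** 3
    return n, rate, delta, queries

print(iterate_parameters(2 ** 64))
```

Since the block length squares in every round, only O(log log n) rounds are needed, which is where the (log n)^{O(log log n)} query bound comes from.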

1.3 Related work

1.3.1 Comparison with [KMRS15]

As explained above, in our earlier work [KMRS15] we gave a construction of LTCs with constant rate, constant relative distance, and query complexity exp(√(log n · log log n)). This construction had two parts: (1) the construction of an LTC with sub-constant relative distance, and (2) the application of the Alon-Luby distance-amplification to this LTC to get constant relative distance (while roughly preserving the other parameters). Our technical contribution is improving the LTC that is constructed in the first step.

In order to make the comparison between the constructions easier, the first step of [KMRS15] can be presented as follows. As in the current construction, we start with a code of small block length, which is trivially locally testable, and then iteratively increase its block length. However, unlike our current construction, the iterations of [KMRS15] consist only of the tensor product operation, without using the distance amplification. Since we do not amplify the distance in each iteration, it decays quickly, and we end up with relative distance exp(−√(log n · log log n)). Moreover, since the tensor product increases the query complexity by a factor that depends on the relative distance, the query complexity increases faster, and we can afford fewer iterations. Hence, we end up with worse parameters compared to the current construction.

1.3.2 Comparison with [Mei09]

As mentioned above, our construction bears resemblance to the LTCs of [Mei09]. The main part of [Mei09] constructs LTCs with rate 1/poly log(n), constant relative distance, and query complexity poly log(n) (the query complexity is decreased to a constant only in a later part of [Mei09]). That construction, too, starts with a code of small block length, which is trivially locally testable, and increases it iteratively. In each iteration, the block length is squared, the rate decreases by a constant factor, the relative distance is maintained, and the query complexity increases by a constant factor. Hence, after O(log log n) iterations, we get the required parameters.

A single iteration of [Mei09] consists, too, of applying tensor product and distance amplification to the code. However, a single iteration there also applies an additional operation called random projection, which partially undoes the rate loss caused by tensoring: it causes the rate to decrease by a constant factor rather than be squared in each iteration.

The difference between the two constructions can therefore be summed up as follows. The construction of [Mei09] starts with constant rate, and then decreases the rate by a constant factor in each iteration; thus, after O(log log n) iterations, it ends up with rate 1/poly log(n). Our construction starts with rate 1 − 1/poly log n, and then squares the rate in each iteration; thus, after O(log log n) iterations, it ends up with constant rate. A crucial point here is that when the rate is very high (e.g., 1 − 1/poly log n), squaring the rate is better than multiplying the rate by a constant factor.

The difference between the constructions may seem to be only a matter of choosing the parameters, which raises the question of why [Mei09] could not obtain our construction. There are two reasons for that:

- The construction of [Mei09] uses a different distance-amplification method, due to [ABN+92]. This method always loses at least a constant factor in the rate, regardless of the choice of parameters. Hence, [Mei09] could not avoid a loss of a constant factor in each iteration, and had to end up with rate 1/poly log(n). Our use of the Alon-Luby amplification method, on the other hand, allows us to lose only a factor of 1 − 1/poly log n in each iteration.

- In analyzing the local testability of the tensor product, [Mei09] relied on a theorem of [BS06]. This theorem only holds for codes that have very high relative distance. Thus, [Mei09] had to work with codes of very high relative distance, which forced those codes to have low rate. In particular, [Mei09] could not start with rate 1 − 1/poly log n. We, on the other hand, replace the theorem of [BS06] with a more recent theorem of [Vid15a], which also works for codes of low distance. Hence, we can work with codes of rate 1 − 1/poly log n.

An interesting research direction would be to use the random projection operation of [Mei09] to further improve our construction. However, it seems that the straightforward analysis of this operation does not work in the setting of high rate and sub-constant relative distance.

1.4 Organization

We review preliminaries regarding error-correcting codes and locally-testable codes in Section 2. In Sections 3 and 4 we formally define the tensor product and ★-distance amplification operations respectively, and analyze the effects these operations have on the parameters of LTCs. Finally, in Section 5, we construct our LTCs using these operations.

2 Preliminaries

All logarithms in this paper are in base 2 unless specified otherwise. For any n ∈ ℕ we denote [n] := {1, ..., n}. We denote by F_2 the finite field of two elements. For any finite alphabet Σ and any pair of strings x, y ∈ Σ^n, the relative Hamming distance (or, simply, relative distance) between x and y is the fraction of coordinates on which x and y differ, and is denoted dist(x, y) := |{i ∈ [n] : x_i ≠ y_i}| / n. We have the following useful approximation.

Fact 2.1. For every x, y ≥ 0 such that x ≤ 1 and x·y ≤ 1, it holds that (1 − x)^y ≤ 1 − x·y/4.

Proof. It holds that (1 − x)^y ≤ e^{−x·y} ≤ 1 − x·y/4. The second inequality relies on the fact that e^{−t} ≤ 1 − t/4 for every t ∈ (0, 1), which can be proved by noting that e^{−t} = 1 − t/4 at t = 0, and that the derivative of e^{−t} is smaller than that of 1 − t/4 for every t ∈ (0, 1). The first inequality relies on the fact that 1 − x ≤ e^{−x} for every x ∈ ℝ, which can be proved using similar considerations.

The tensor product of matrices. For a pair of matrices G_1 ∈ F^{m_1 × n_1} and G_2 ∈ F^{m_2 × n_2}, their tensor product G_1 ⊗ G_2 (a.k.a. the Kronecker product) is the (m_1 · m_2) × (n_1 · n_2) matrix over F with entries
(G_1 ⊗ G_2)_{(i_1,i_2),(j_1,j_2)} = (G_1)_{i_1,j_1} · (G_2)_{i_2,j_2}
for every i_1 ∈ [m_1], i_2 ∈ [m_2], j_1 ∈ [n_1] and j_2 ∈ [n_2]. The following is a well-known fact about the tensor product of matrices.

Fact 2.2. Let A, B, C, D be matrices (of compatible dimensions). Then,
(A ⊗ B) · (C ⊗ D) = (A · C) ⊗ (B · D).

2.1 Error-correcting codes

Let Σ be an alphabet and let n be a positive integer (the block length). A code is simply a subset C ⊆ Σ^n. If F is a finite field and Σ is a vector space over F, we say a code C ⊆ Σ^n is F-linear if it is an F-linear subspace of the F-vector space Σ^n. If Σ = F, then we simply say that C is linear. The rate of a code is the ratio log|C| / log(|Σ|^n), which for F-linear codes equals dim_F(C) / (n · dim_F(Σ)). The elements of a code C are called codewords. We say that C has relative distance at least δ if for every pair of distinct codewords c_1, c_2 ∈ C it holds that dist(c_1, c_2) ≥ δ. We use the notation dist(z, C) to denote the relative distance of a string z ∈ Σ^n from C, and say that z is ε-close to (respectively, ε-far from) C if dist(z, C) < ε (respectively, dist(z, C) ≥ ε).

Let C ⊆ F^n be a linear code of dimension k. A generating matrix of C is an n × k matrix G such that the map x ↦ G·x is an isomorphism from F^k to C. We say that an infinite family of linear codes {C_n}_n is explicit if there is an algorithm that, on input n, outputs a generating matrix of C_n in time poly(n).

Zyablov codes. We use the following fact, which states the existence of the Zyablov codes.

Fact 2.3 (Zyablov bound [Zya71]). For every 0 < r < 1 and ε > 0, there exists an explicit infinite family {Z_n}_n of binary linear codes with rate r and relative distance
δ = max_{r < R < 1} { (1 − R − ε) · H^{-1}(1 − r/R) },
where H^{-1} is the inverse of the binary entropy function.

In the latter statement of the Zyablov bound, we chose δ as a function of r. However, we may also choose r as a function of δ, which leads to the following statement: for every 0 < δ < 1 and ε > 0, there exists an explicit infinite family {Z_n}_n of binary linear codes with relative distance δ and rate
r = max_{0 < R < 1 − H(δ + ε)} { R · (1 − δ / (H^{-1}(1 − R) − ε)) },
where H is the binary entropy function. In particular, for small values of δ, we can choose ε = √δ and R = 1 − H(2·√δ), yielding the following result.

Fact 2.4 (Special case of the Zyablov bound). For every sufficiently small δ > 0, there exists an explicit infinite family {Z_n}_n of binary linear codes with relative distance δ and rate at least
r = (1 − H(2·√δ)) · (1 − √δ) ≥ 1 − δ^{1/3},
where the last inequality holds for sufficiently small values of δ.
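For a concrete feel for the trade-off in Fact 2.3, the following Python sketch evaluates the Zyablov bound numerically. The grid search over R and the bisection-based inverse of the binary entropy function are illustrative implementation choices, not part of the paper.

```python
# Numerically evaluate delta(r) = max_{r < R < 1} (1 - R - eps) * Hinv(1 - r/R)  (Fact 2.3).
import math

def H(x: float) -> float:
    """Binary entropy function."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def H_inv(y: float) -> float:
    """Inverse of H on the domain (0, 1/2), computed by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if H(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def zyablov_distance(r: float, eps: float = 1e-3, steps: int = 10_000) -> float:
    """Best relative distance promised by Fact 2.3 for rate r (grid search over R)."""
    best = 0.0
    for i in range(1, steps):
        R = r + (1 - r) * i / steps          # sweep R over (r, 1)
        best = max(best, (1 - R - eps) * H_inv(1 - r / R))
    return best

for r in (0.1, 0.5, 0.9):
    print(f"rate {r:.1f}  ->  Zyablov relative distance ~ {zyablov_distance(r):.4f}")
```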

2.2 Locally-testable codes

Intuitively, a code is said to be locally testable [FS95, RS96, GS06] if, given a string z ∈ Σ^n, it is possible to determine whether z is a codeword of C, or rather far from C, by reading only a small part of z. There are two variants of LTCs in the literature, weak LTCs and strong LTCs. From now on, we work exclusively with strong LTCs, since this is a simpler notion and allows us to state a stronger result.

Definition 2.5. We say that a code C ⊆ Σ^n is (strongly) locally testable with query complexity q if there exists a randomized algorithm A that satisfies the following requirements:

- Input: A gets oracle access to a string z ∈ Σ^n.
- Completeness: If z is a codeword of C, then A accepts with probability 1.
- Soundness: If z is not a codeword of C, then A rejects with probability at least dist(z, C)/4.
- Query complexity: A makes at most q queries to the oracle z.

We say that the algorithm A is a local tester of C. Given an infinite family of LTCs {C_n}_n, a uniform local tester for the family is a randomized oracle algorithm that, given n, computes the local tester of C_n. We will often also be interested in the running time of the uniform local tester.

A remark on amplifying the rejection probability. It is common to define strong LTCs with an additional parameter ρ, and to use the following soundness requirement: if z is not a codeword of C, then A rejects with probability at least ρ · dist(z, C). Our definition corresponds to the special case where ρ = 1/4. However, given an LTC with ρ < 1/4, it is possible to amplify ρ up to 1/4 at the cost of increasing the query complexity. Hence, we chose to fix ρ to 1/4 in our definition, which somewhat simplifies the presentation.

The amplification of ρ is performed as follows. The amplified tester invokes the original tester A for 1/ρ times, and accepts only if all invocations of A accept. Clearly, this increases the query complexity by a factor of 1/ρ and preserves the completeness property. To analyze the rejection probability, let z be a string that is not a codeword of C, and observe that the amplified tester rejects z with probability at least
1 − (1 − ρ · dist(z, C))^{1/ρ} ≥ 1 − (1 − (1/4) · ρ · dist(z, C) · (1/ρ)) = dist(z, C)/4   (by Fact 2.1),
as required.
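To make the amplification step in the remark above concrete, here is a small Python sketch. The base tester is simulated by a coin flip with the promised rejection probability, so the interfaces and parameters are illustrative assumptions rather than the paper's construction.

```python
# Amplify a tester with soundness rho * dist(z, C) by repeating it 1/rho times (see Fact 2.1).
import math
import random

def amplified_tester(base_tester, rho: float, z) -> bool:
    """Accept iff all ceil(1/rho) independent invocations of the base tester accept."""
    repetitions = math.ceil(1 / rho)
    return all(base_tester(z) for _ in range(repetitions))

# Toy illustration: a simulated base tester that rejects with probability rho * distance.
def make_toy_tester(rho: float, distance: float):
    return lambda _z: random.random() >= rho * distance   # True means "accept"

rho, distance = 0.01, 0.3
tester = make_toy_tester(rho, distance)
trials = 20_000
rejections = sum(not amplified_tester(tester, rho, None) for _ in range(trials))
print(f"empirical rejection rate {rejections / trials:.3f}  (guaranteed >= {distance / 4:.3f})")
```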

3 Tensor product

In this section, we provide the formal definition of the tensor product operation and state the effect of this operation on the parameters of an LTC. The effect of this operation on the classical parameters of a code, such as the block length, rate and relative distance, is well known (see, e.g., [Sud01, DSW06]). To argue about the effect of this operation on the query complexity of an LTC we shall use a result due to Viderman [Vid15a].

For linear codes C_1 ⊆ F^{n_1} and C_2 ⊆ F^{n_2}, their tensor product code C_1 ⊗ C_2 ⊆ F^{n_1 · n_2} consists of all n_1 × n_2 matrices M such that all the rows of M are codewords of C_2 and all the columns of M are codewords of C_1. For a linear code C, let C^{⊗1} := C and C^{⊗m} := C^{⊗(m−1)} ⊗ C. The following are some useful facts regarding the tensor product operation and its effect on the classical parameters of a code (see, e.g., [Sud01, DSW06]).

Fact 3.1 (Properties of tensor product). Let C_1 ⊆ F^{n_1} and C_2 ⊆ F^{n_2} be linear codes of rates r_1, r_2 and relative distances δ_1, δ_2 respectively. Then C_1 ⊗ C_2 ⊆ F^{n_1 · n_2} is a linear code of rate r_1 · r_2 and relative distance δ_1 · δ_2. Furthermore, if G_1, G_2 are generating matrices of C_1, C_2 respectively, then the tensor product G_1 ⊗ G_2 is a generating matrix of C_1 ⊗ C_2 (see Section 2 for the definition of G_1 ⊗ G_2).

Let C ⊆ F^n be a linear code, and consider the code C^{⊗4}. The codewords of C^{⊗4} are of length n^4, and it is useful to identify their coordinate set with the 4-dimensional hypercube [n]^4. We say that a set P ⊆ [n]^4 is an axis-parallel plane of the hypercube [n]^4 if there exist α, β ∈ [n] and k_1 ≠ k_2 ∈ [4] such that
P = { (i_1, i_2, i_3, i_4) ∈ [n]^4 : i_{k_1} = α, i_{k_2} = β }.
The following fact gives an alternative characterization of C^{⊗4}, and follows from our definition of tensor product codes.

Fact 3.2 (see, e.g., [Mei09, Section 4.1]). Let C ⊆ F^n be a linear code, and let z ∈ F^{n^4}. Then, z ∈ C^{⊗4} if and only if z|_P ∈ C^{⊗2} for every axis-parallel plane P.

It turns out that the characterization in Fact 3.2 is robust, in the sense that if, for the average axis-parallel plane P, the restriction z|_P is close to C^{⊗2}, then z is close to C^{⊗4}. Conversely, if z is far from C^{⊗4}, then z|_P is far from C^{⊗2} on average. This was proved by [BS06, Vid15a] for the purpose of constructing locally testable codes using tensor products. In particular, we use the following result.

Theorem 3.3 ([Vid15a, Theorem 4.4]). Let C ⊆ F^n be a code with relative distance δ. Let z ∈ F^{n^4} be a string, and let P ⊆ [n]^4 be a uniformly random axis-parallel plane. Then,
E[ dist(z|_P, C^{⊗2}) ] ≥ (δ^{12}/300) · dist(z, C^{⊗4}).

Following [Mei09], we use the latter theorem to show that the tensor product operation maintains the local testability of codes.

Corollary 3.4 (Local testability of tensor product). Let C be a code of relative distance δ that is locally testable with query complexity q and running time T. Suppose furthermore that C has property ★, i.e., that there exists a code D such that C = D^{⊗2}. Then C^{⊗2} is locally testable with query complexity 1200 · q/δ^6 and running time (O(T) + poly log(n)) · 1200/δ^6.

Proof. Suppose that C is a code over F of block length n, and let n' := √n. Observe that D ⊆ F^{n'} is a code of relative distance δ' := √δ, and that C^{⊗2} = D^{⊗4}. Let A be the local tester of C. We describe a local tester A' for C^{⊗2}. When given a received word z ∈ F^{n^2} = F^{(n')^4}, the tester A' chooses a uniformly random axis-parallel plane P ⊆ [n']^4, runs the local tester A to verify that z|_P ∈ C = D^{⊗2}, and accepts if and only if A accepts. Clearly, A' uses q queries, runs in time O(T) + poly log(n), and accepts codewords of C^{⊗2} with probability 1.

We show that A' rejects a string z ∈ F^{n^2} with probability at least (δ^6/1200) · dist(z, C^{⊗2}). This will imply the required result, since the latter rejection probability can be amplified to dist(z, C^{⊗2})/4 by increasing the query complexity and running time by a factor of 1200/δ^6 (see the discussion in Section 2.2). Let z ∈ F^{n^2} = F^{(n')^4} be a string. Then, for every axis-parallel plane P ⊆ [n']^4, the local tester A rejects z|_P with probability at least dist(z|_P, C)/4 = dist(z|_P, D^{⊗2})/4. By Theorem 3.3, it follows that A' rejects z with probability at least
E_P[ dist(z|_P, D^{⊗2})/4 ] = (1/4) · E_P[ dist(z|_P, D^{⊗2}) ] ≥ (1/4) · ((δ')^{12}/300) · dist(z, D^{⊗4}) = (δ^6/1200) · dist(z, C^{⊗2}),
as required.
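The following toy Python sketch illustrates the structure of the plane test from Fact 3.2 and the proof of Corollary 3.4 on a tiny example. For simplicity the restriction is checked against D ⊗ D by brute force rather than with a local tester, so it illustrates only the shape of the test, not its query complexity.

```python
# Pick a random axis-parallel plane of [n]^4 and check the restriction against D tensor D.
import itertools
import random

def in_tensor_square(M, D) -> bool:
    """Is the n x n matrix M a codeword of D tensor D (all rows and columns in D)?"""
    return all(tuple(row) in D for row in M) and \
           all(tuple(col) in D for col in zip(*M))

def plane_test(z, D, n: int) -> bool:
    """One invocation of the plane test on z : [n]^4 -> symbols, given as a dict."""
    k1, k2 = random.sample(range(4), 2)                 # the two axes that are fixed
    alpha, beta = random.randrange(n), random.randrange(n)
    free = [axis for axis in range(4) if axis not in (k1, k2)]
    M = [[None] * n for _ in range(n)]
    for a, b in itertools.product(range(n), repeat=2):
        index = [None] * 4
        index[k1], index[k2] = alpha, beta
        index[free[0]], index[free[1]] = a, b
        M[a][b] = z[tuple(index)]
    return in_tensor_square(M, D)

# Example: D is the length-3 even-weight (parity) code; test the all-zeros word of D^{tensor 4}.
D = {w for w in itertools.product((0, 1), repeat=3) if sum(w) % 2 == 0}
codeword = {idx: 0 for idx in itertools.product(range(3), repeat=4)}
print(all(plane_test(codeword, D, 3) for _ in range(20)))   # always accepts a codeword
```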

4 ★-distance amplification

In this section, we describe the ★-distance amplification operation and analyze its effect on the parameters of an LTC. As explained in the introduction, the ★-distance amplification operation is designed to improve the relative distance of an LTC C while preserving property ★. This is done roughly as follows. Let C be an LTC that has property ★, i.e., there exists some code D such that C = D^{⊗2}. We would like to improve the relative distance of C. To this end, we apply the Alon-Luby distance-amplification technique to D, thus obtaining a code D', and set C' = (D')^{⊗2} to be the new code. Clearly, C' has property ★, and it is not hard to show that this operation improves the relative distance. The main challenge in this section will be to show that C' is still locally testable.

There is one more issue that needs to be handled: the Alon-Luby distance-amplification technique increases the alphabet size of the code. Thus, if D was a binary linear code, the code D' would have alphabet Σ = {0,1}^p for some p ∈ ℕ. In particular, D' would not be a linear code, but only an F_2-linear code. This is problematic, both because we need to maintain the linearity of our codes (as otherwise the tensor product operation would not be defined), and because we do not want the alphabet size of our codes to increase throughout the iterations. In order to resolve this issue, after applying the Alon-Luby amplification, we concatenate the resulting code with a binary inner code to reduce the alphabet size back to 2.

We turn to implementing the above ideas formally. We start by describing the two main ingredients of the ★-distance amplification operation: the Alon-Luby distance-amplification and concatenation.

Alon-Luby distance-amplification. Recall that in our previous work [KMRS15] we observed that a technique of Alon and Luby [AL96] can be used to amplify the relative distance of LTCs. We now state the lemma that we need from [KMRS15] (actually, a special case of this lemma for binary linear codes).

Lemma 4.1 (Alon-Luby distance-amplification, [KMRS15, Lemma 4.2]). Suppose that there exists a binary linear code C ⊆ {0,1}^n with relative distance δ and rate r that is locally testable with query complexity q. Then, for every 0 < δ', ε < 1, there exists a code C' with relative distance at least δ' that is locally testable with query complexity q · poly(1/(ε·δ')) such that:

- |C'| = |C|.
- C' has rate at least r · (1 − δ' − ε).
- The alphabet of C' is Σ := {0,1}^p for some p = poly(1/(ε·δ')).
- C' is F_2-linear.

Furthermore,

- There is a polynomial-time algorithm that computes a bijection from every code C to the corresponding code C', given r, δ, δ', and ε.
- There is an oracle algorithm that computes the local tester of the code C' when given black-box access to the local tester of the code C, and given also r, δ, δ', ε, and the block length of C. Moreover, if the local tester of C runs in time T_C, then the resulting local tester of C' runs in time T_C · poly(log(n)/(ε·δ')), where n is the block length of C.

Remark 4.2. The running time that is stated in Lemma 4.1 for the tester of C' is better than the one that is stated in [KMRS15, Lemma 4.2]. However, the better running time can be verified by inspecting the proof in [KMRS15].

Concatenation. Concatenation is an operation on codes that can be used for reducing the alphabet size of a code. Let Λ and Σ be alphabets such that |Σ| = |Λ|^p for some p ∈ ℕ. Let C ⊆ Σ^n be a code over Σ and let H ⊆ Λ^m be a code over Λ. Suppose there exists a bijection φ : Σ → H. The concatenation of C with H is the code C' ⊆ Λ^{m·n} that is obtained as follows: for each codeword c ∈ C, we construct a corresponding codeword c' ∈ C' by replacing each symbol c_i with φ(c_i). We shall use the following well-known fact.

Fact 4.3 (Concatenation). Let C ⊆ Σ^n be a code over Σ = Λ^p with rate r_C and relative distance δ_C, let H ⊆ Λ^m be a code over Λ with rate r_H and relative distance δ_H, and let C' ⊆ Λ^{m·n} be the concatenation of C with H. Then C' has rate r_C · r_H and relative distance at least δ_C · δ_H. Furthermore, if Λ is a field, C is Λ-linear, and H is linear, then C' is linear.
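As a concrete illustration of the concatenation operation and of Fact 4.3, the following toy Python sketch concatenates a length-4 parity code with the length-3 repetition code. The codes and the bijection φ are illustrative choices, not the codes used in the paper.

```python
# Concatenation: replace each outer symbol (an element of Lambda^p) by an inner codeword via phi.
import itertools

# Inner code H over Lambda = {0, 1}: the length-3 repetition code, with phi : {0,1} -> H.
H = {0: (0, 0, 0), 1: (1, 1, 1)}                       # phi given as a lookup table

# Outer code C over Sigma = {0, 1}: the length-4 even-weight (parity) code.
C = [w for w in itertools.product((0, 1), repeat=4) if sum(w) % 2 == 0]

def concatenate(codeword, phi):
    """Replace each outer symbol c_i by phi(c_i) and flatten the result."""
    return tuple(bit for symbol in codeword for bit in phi[symbol])

concatenated_code = [concatenate(c, H) for c in C]
# Rates multiply: (3/4) * (1/3) = 1/4.  Relative distances multiply: (2/4) * 1 = 1/2.
dist = min(sum(a != b for a, b in zip(x, y))
           for x in concatenated_code for y in concatenated_code if x != y)
print(len(concatenated_code), "codewords of length 12, minimum distance", dist)
```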

The following lemma states the ★-distance amplification. We only state the amplification for binary linear codes and for sufficiently small distances, which is sufficient for our purposes.

Lemma 4.4 (Properties of ★-distance amplification). There exists a universal constant δ_0 > 0 such that the following holds. Let C ⊆ {0,1}^n be a binary linear code with relative distance δ and rate r that is locally testable with query complexity q. Suppose further that there exists a code D such that C = D^{⊗2}. Then, for every sufficiently small δ' such that δ_0 > δ' > δ, there exists a binary linear code C' with relative distance δ' that is locally testable with query complexity q · poly(1/δ') such that:

- |C'| = |C|.
- C' has rate at least (1 − 6 · δ'^{1/12}) · r.
- There exists a code D' such that C' = (D')^{⊗2}.

Furthermore,

- There is a polynomial-time algorithm that computes a bijection from every code C to the corresponding code C', given r, δ, δ', and the generating matrix of C.
- There is an oracle algorithm that computes the local tester of the corresponding code C' when given black-box access to the local tester of C, and given also r, δ, δ', and the block length of C. The resulting local tester of C' runs in time poly(log n/δ') · T_C, where T_C is the running time of the local tester of C.

We now construct the code C' and show that it has the required rate and relative distance. In the rest of the section we will show that C' is locally testable with the required query complexity and running time. As explained above, the basic idea of the construction is to first apply the Alon-Luby distance-amplification to D, and then concatenate the resulting code with a binary inner code to reduce the alphabet size back to 2. We choose δ_0 to be sufficiently small such that Fact 2.4 (the special case of the Zyablov bound) holds.

The code C' is constructed as follows. Recall that the code D has rate √r and relative distance √δ. We apply the Alon-Luby distance-amplification (Lemma 4.1) to the code D, choosing both of the parameters δ' and ε of that lemma to be δ'^{1/4}. This results in an F_2-linear code D'' with relative distance δ'^{1/4} and rate (1 − 2 · δ'^{1/4}) · √r over the alphabet Σ = {0,1}^p (for some p = poly(1/δ')). We choose our inner code to be the binary linear code Z obtained from the special case of the Zyablov bound (Fact 2.4)² with relative distance δ_Z := δ'^{1/4} and rate at least r_Z := 1 − δ'^{1/12}. We now concatenate D'' with the code Z, thus obtaining a binary linear code D' with relative distance δ'^{1/4} · δ_Z = √δ' and rate at least
(1 − 2 · δ'^{1/4}) · √r · r_Z = (1 − 2 · δ'^{1/4}) · (1 − δ'^{1/12}) · √r ≥ (1 − 3 · δ'^{1/12}) · √r.
Finally, we set C' := (D')^{⊗2}. It is not hard to see that C' is a linear code over F_2 with relative distance δ' and rate
[(1 − 3 · δ'^{1/12}) · √r]^2 = (1 − 3 · δ'^{1/12})^2 · r ≥ (1 − 6 · δ'^{1/12}) · r.
Thus, C' has the required parameters.

It remains to analyze the query complexity and running time of the local tester of C'. The basic idea of the proof is as follows: we observe that C' can be obtained from C by applying local operations, and show that such local operations preserve local testability. The rest of this section is organized as follows: in Section 4.1, we set up a framework for working with local operations, which is a special case of the framework of [Mei09]. Then, in Section 4.2, we show that C' is locally testable by showing that it can be obtained from C using local operations, thus completing the proof of Lemma 4.4.

[Footnote 2: The code meeting the special case of the Zyablov bound was chosen just for concreteness of parameters. Any explicit family of binary linear codes with distance δ and rate 1 − poly(δ) (for arbitrary δ) would have sufficed for this construction.]

4.1 Local operations

4.1.1 Block-wise operations

The first type of local operations that we use is block-wise operations, defined as follows.

Definition 4.5 (Block-wise operations). Let φ : Σ^p → Λ^m be one-to-one, and let w ∈ Σ^n be such that p divides n. We say that w' ∈ Λ^{m·n/p} is obtained by applying φ to w block-wise if it holds that
w' = φ(w_1, ..., w_p) ∘ φ(w_{p+1}, ..., w_{2p}) ∘ ... ∘ φ(w_{n−p+1}, ..., w_n),
where ∘ denotes concatenation of strings. Observe that applying φ to w block-wise is a one-to-one function. For a set W ⊆ Σ^n, we say that a set W' ⊆ Λ^{m·n/p} is obtained by applying φ to W block-wise if W' is the result of applying φ block-wise to all the elements w ∈ W. We say that φ is invertible in time T if there is an algorithm that takes as input a string u ∈ Λ^m, runs in time T, and computes its pre-image φ^{-1}(u) (or rejects if it does not exist). We refer to the latter algorithm as the inverter of φ.

The following proposition shows that block-wise operations preserve local testability (this is a variant of Corollary 5.8 in [Mei09]).

Proposition 4.6 (Local testability of block-wise operations). Let L ⊆ Σ^n be a code that is locally testable with query complexity q, and let φ : Σ^p → Λ^m be one-to-one. Then, the code L' ⊆ Λ^{m·n/p} that is obtained by applying φ to L block-wise is locally testable with query complexity O(m^2 · q). Furthermore, if φ is invertible in time T, then there is an oracle algorithm that computes the local tester of the corresponding code L' when given black-box access to the local tester of L and to the inverter of φ. The resulting local tester of L' runs in time O(m · T · T_L), where T_L is the running time of the local tester of L.

Proof. Let A be the local tester of L. We describe a local tester A' for L'. When given oracle access to a purported codeword z' ∈ Λ^{m·n/p}, the local tester A' acts as follows. First, A' partitions z' into blocks of length m, chooses a uniformly distributed block, checks that this block is an image of φ, and rejects otherwise. Next, A' emulates A. Whenever A makes a query to a coordinate i ∈ [n], the tester A' acts as follows: A' reads the block of z' to which i belongs, i.e., the block whose index is ⌈i/p⌉. If this block is not an image of φ, the tester A' rejects. If the block is an image of φ, the tester A' inverts it, retrieves the value of the i-th coordinate from the pre-image, and feeds it to A as the answer to the query. Finally, when A finishes running, A' accepts if A accepts and rejects otherwise.

It is easy to see that the query complexity of A' is (q + 1) · m, that it has the required running time, and that it satisfies the completeness property. We show that A' rejects z' with probability at least dist(z', L')/(8m); this can be amplified to dist(z', L')/4 at the cost of increasing the query complexity and running time by a factor of O(m), which will give us the required result (see the discussion in Section 2.2).

Let y' be the string that is obtained by replacing each block of z' with the closest image of φ. Suppose first that dist(y', z') ≥ dist(z', L')/2. In this case, at least a dist(z', L')/2 fraction of the blocks of z' are not images of φ, and hence A' rejects in the first step with probability at least dist(z', L')/2 ≥ dist(z', L')/(8m).

Consider now the case where dist(y', z') < dist(z', L')/2. In this case, the triangle inequality implies that dist(y', L') ≥ dist(z', L')/2. Let y be the string obtained by inverting φ on all the blocks of y'. It is not hard to see that when A' emulates a query i of the tester A, it either answers it with y_i or rejects. Hence, the rejection probability of A' on z' is at least the rejection probability of A on y, which is at least dist(y, L)/4. Now, observe that dist(y', L') · n' ≤ m · dist(y, L) · n (where n' = m·n/p denotes the block length of L'), and therefore
dist(y, L) ≥ (n'/(m·n)) · dist(y', L') ≥ dist(y', L')/m ≥ dist(z', L')/(2m).
It thus follows that A' rejects z' with probability at least dist(z', L')/(8m), as required.
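The tester A' built in the proof of Proposition 4.6 can be sketched in Python as follows. The callables invert_phi and base_tester are assumed interfaces standing in for the inverter of φ and the local tester A, and the final O(m)-fold amplification of the rejection probability is omitted, so this is a schematic illustration rather than the paper's algorithm.

```python
# Emulate a tester for L on a block-wise image of L: decode the queried block, answer from it.
import random

def blockwise_tester(z_prime, p: int, m: int, invert_phi, base_tester) -> bool:
    """invert_phi: length-m block -> length-p pre-image, or None if the block is not an image."""
    num_blocks = len(z_prime) // m

    # Step 1: check that a uniformly chosen block is an image of phi.
    j = random.randrange(num_blocks)
    if invert_phi(z_prime[j * m:(j + 1) * m]) is None:
        return False

    # Step 2: emulate the base tester A, answering its queries from decoded blocks.
    rejected = False
    def answer_query(i: int):
        nonlocal rejected
        block = z_prime[(i // p) * m:(i // p + 1) * m]
        preimage = invert_phi(block)
        if preimage is None:
            rejected = True
            return None
        return preimage[i % p]

    accepted = base_tester(answer_query)
    return accepted and not rejected

# Toy usage: phi repeats each bit (p = 1, m = 2); the base tester queries one position.
inv = lambda block: (block[0],) if block[0] == block[1] else None
base = lambda ask: ask(0) in (0, None)      # accept unless the first symbol decodes to 1
print(blockwise_tester((0, 0, 1, 1), p=1, m=2, invert_phi=inv, base_tester=base))
```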

4.1.2 Permutations

The second type of local operations that we use is permuting the coordinates of a code.

Definition 4.7 (Permutations). Let σ : [n] → [n] be a permutation, and let w ∈ Σ^n. We say that w' ∈ Σ^n is obtained by applying σ to w if it holds that w'_i = w_{σ(i)} for every i ∈ [n]. For a set W ⊆ Σ^n, we say that a set W' ⊆ Σ^n is obtained by applying σ to W if W' is the result of applying σ to all the elements w ∈ W. We say that σ is invertible in time T if there is an algorithm that takes as input a coordinate j ∈ [n], runs in time T, and computes its pre-image σ^{-1}(j). We refer to the latter algorithm as the inverter of σ.

It is easy to see that applying a permutation preserves local testability. The following proposition states this fact along with the effect of the permutation on the running time of the tester.

Proposition 4.8 (Local testability of permutations). Let L ⊆ Σ^n be a code that is locally testable with query complexity q, and let σ : [n] → [n] be a permutation. Then, the code L' that is obtained by applying σ to L is locally testable with query complexity q. Furthermore, if σ is invertible in time T, then there is an oracle algorithm that computes the local tester of the corresponding code L' when given black-box access to the local tester of L and to the inverter of σ. The resulting local tester of L' runs in time O(T · T_L), where T_L is the running time of the local tester of L.

4.1.3 Local operations and tensor products

The following propositions describe the interaction between the above local operations and tensor products of codes. This will be useful for analyzing the local testability properties of the ★-distance amplification procedure, which applies local operations to a tensor product code.

Proposition 4.9 (Follows from [Mei09, Propositions 5.5, 5.6]). Let F be a finite field, and let φ : F^p → F^m be a linear one-to-one function. Let L ⊆ F^n be a linear code, and let L' ⊆ F^{m·n/p} be the code that is obtained by applying φ to L block-wise. Then, there exist a linear one-to-one function φ' : F^{p^2} → F^{m^2} and permutations σ_1, σ_2 such that (L')^{⊗2} is obtained by applying to L^{⊗2} the permutation σ_1, then φ' block-wise, and then the permutation σ_2. Furthermore, σ_1 and σ_2 are invertible in time poly log n. The functions φ and φ' are invertible in time poly(m), since they are linear.

Proof. Let G be a generating matrix of L, and let A be an m × p matrix such that φ(x) = A·x for every x ∈ F^p. Let I_{n/p} be the (n/p) × (n/p) identity matrix. It is not hard to see that applying φ block-wise to a vector w ∈ F^n is equivalent to multiplying w by I_{n/p} ⊗ A. Hence, a generating matrix of L' is (I_{n/p} ⊗ A) · G.

By Fact 3.1, the matrix G ⊗ G is a generating matrix of L^{⊗2}, and a generating matrix of (L')^{⊗2} is ((I_{n/p} ⊗ A) · G) ⊗ ((I_{n/p} ⊗ A) · G). By Fact 2.2, the latter matrix is equal to the matrix (I_{n/p} ⊗ A ⊗ I_{n/p} ⊗ A) · (G ⊗ G). In other words, codewords of (L')^{⊗2} are obtained by multiplying codewords of L^{⊗2} by the matrix I_{n/p} ⊗ A ⊗ I_{n/p} ⊗ A.

Now, it is not hard to see³ that by permuting the rows and columns of the latter matrix we can get the matrix I_{n/p} ⊗ I_{n/p} ⊗ A ⊗ A = I_{n^2/p^2} ⊗ A ⊗ A, which is the matrix that applies the operation A ⊗ A block-wise. In other words, there exist permutation matrices P_1, P_2 such that
I_{n/p} ⊗ A ⊗ I_{n/p} ⊗ A = P_2 · (I_{n^2/p^2} ⊗ A ⊗ A) · P_1,
and therefore a generating matrix of (L')^{⊗2} is
P_2 · (I_{n^2/p^2} ⊗ A ⊗ A) · P_1 · (G ⊗ G).
Let σ_1 and σ_2 be the permutations that correspond to the matrices P_1 and P_2. The latter expression means that (L')^{⊗2} is obtained by applying to L^{⊗2} the permutation σ_1, then A ⊗ A block-wise, and then the permutation σ_2, as required. It is not hard to see that the permutations σ_1 and σ_2 are invertible in time poly log(n).

Proposition 4.10. Let F be a finite field, and let σ : [n] → [n] be a permutation. Let L ⊆ F^n be a linear code, and let L' ⊆ F^n be the code that is obtained by applying σ to L. Then, there exists a permutation σ' such that (L')^{⊗2} is obtained by applying σ' to L^{⊗2}. Furthermore, if σ is invertible in time T, then σ' is invertible in time O(T).

The proof of Proposition 4.10 is similar to that of Proposition 4.9, but simpler, and is therefore omitted.

4.2 The local testability of C'

In this section, we show that the code C' is locally testable, thus completing the proof of Lemma 4.4. This is done in three steps: first, we observe that D' is obtained from D by applying a sequence of block-wise operations and permutations. We then use Propositions 4.9 and 4.10 to show that the same holds for C' and C. Finally, we use the local testability of C and Propositions 4.6 and 4.8 to conclude that C' is locally testable.

[Footnote 3: To see this, let us denote M := I_{n/p} ⊗ A ⊗ I_{n/p} ⊗ A and N := I_{n/p} ⊗ I_{n/p} ⊗ A ⊗ A. Since these matrices are tensor products, we can label the rows and the columns with quadruples of indices (i_1, i_2, i_3, i_4) and (j_1, j_2, j_3, j_4). Now, by the definition of the tensor product of matrices, it holds that
M_{(i_1,i_2,i_3,i_4),(j_1,j_2,j_3,j_4)} = (I_{n/p})_{i_1,j_1} · A_{i_2,j_2} · (I_{n/p})_{i_3,j_3} · A_{i_4,j_4} = (I_{n/p})_{i_1,j_1} · (I_{n/p})_{i_3,j_3} · A_{i_2,j_2} · A_{i_4,j_4} = N_{(i_1,i_3,i_2,i_4),(j_1,j_3,j_2,j_4)}.
Hence, N can be obtained from M by permuting the rows and the columns, as required.]
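The two matrix identities used in this proof can be checked numerically. The following sketch, with arbitrary small dimensions, verifies the mixed-product rule of Fact 2.2 and the row/column permutation identity of footnote 3 on one random example; it is a sanity check, not part of the construction.

```python
# Check (A kron B)(C kron D) = (AC) kron (BD), and that I kron A kron I kron A equals
# I kron I kron A kron A up to fixed row/column permutations given by the index swap
# (i1, i2, i3, i4) -> (i1, i3, i2, i4).
import itertools
import numpy as np

rng = np.random.default_rng(0)
p, m, k = 2, 3, 4                          # A is m x p; I stands for I_{n/p} of size k
A = rng.integers(0, 5, size=(m, p))
I = np.eye(k, dtype=int)

def kron_all(*mats):
    out = mats[0]
    for M in mats[1:]:
        out = np.kron(out, M)
    return out

# Fact 2.2 (mixed product) on random matrices of compatible shapes.
B, C, D = rng.integers(0, 5, (p, 2)), rng.integers(0, 5, (p, 3)), rng.integers(0, 5, (2, 2))
assert np.array_equal(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# Footnote 3: build the permutation matrices explicitly from the index relabeling.
M_mat = kron_all(I, A, I, A)
N_mat = kron_all(I, I, A, A)

def swap_perm(d1, d2, d3, d4):
    """Permutation matrix sending mixed-radix index (i1,i2,i3,i4) to (i1,i3,i2,i4)."""
    size = d1 * d2 * d3 * d4
    P = np.zeros((size, size), dtype=int)
    for i1, i2, i3, i4 in itertools.product(range(d1), range(d2), range(d3), range(d4)):
        src = ((i1 * d2 + i2) * d3 + i3) * d4 + i4
        dst = ((i1 * d3 + i3) * d2 + i2) * d4 + i4
        P[dst, src] = 1
    return P

P2 = swap_perm(k, m, k, m)                 # reorders the rows
P1 = swap_perm(k, p, k, p)                 # reorders the columns
assert np.array_equal(P2 @ M_mat @ P1.T, N_mat)
print("both identities verified")
```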

Obtaining D' from D. Recall that D' is obtained from D by applying the distance amplification of Lemma 4.1 to D to obtain a code D'', and then concatenating the resulting code D'' with an inner code Z ⊆ {0,1}^m. The construction of D'' from D in Lemma 4.1 provides an F_2-linear bijection from the code D to the code D''. We give a brief description of this bijection, in order to argue that it is local (see [KMRS15] for more details). Let n denote the block length of D. Given a codeword d ∈ D, the bijection maps it to a codeword d'' ∈ D'' as follows:

1. Choose a Reed-Solomon code of block length s := poly(1/(ε·δ')), rate 1 − δ' − ε, dimension b, and alphabet {0,1}^r for r := log s.
2. The codeword d is partitioned⁴ into n̄ := n/(b·r) blocks of length b·r.
3. Each block is viewed as a string of length b over the alphabet {0,1}^r, and is encoded with the Reed-Solomon code.
4. Together, the encoded blocks form a string w of length n̄·s over the alphabet {0,1}^r. The coordinates of w are now permuted according to some fixed permutation, which is determined by the edges of a strongly-explicit expander. Let us denote the obtained string by w'.
5. Finally, the string w' is partitioned into n̄ blocks of length s. Let us denote those blocks by S_1, ..., S_{n̄}. We view each block S_j as a single symbol over the alphabet Σ = {0,1}^{r·s}, and define d'' to be the resulting string over Σ.

It is easy to see that the code D' is obtained from D by applying the same bijection, and then encoding the blocks S_1, ..., S_{n̄} with the inner code Z rather than viewing them as single symbols. Now, observe that this corresponds to obtaining D' from D by applying a block-wise operation (the encoding with Reed-Solomon), a permutation, and then another block-wise operation (the encoding with Z).

Obtaining C' from C. Let D_1 be the code obtained from D after applying the first block-wise operation, and let D_2 be the code obtained from D_1 by applying the permutation. Observe that D' is obtained by applying a block-wise operation to D_2. Let C_1 = (D_1)^{⊗2} and C_2 = (D_2)^{⊗2}. By Propositions 4.9 and 4.10, the following claims hold:

1. C_1 is obtained by applying to C = D^{⊗2} a permutation, a block-wise operation, and then another permutation. Specifically, the block-wise operation has block length (s·r)^2 = poly(1/δ').
2. C_2 is obtained by applying a permutation to C_1.
3. C' = (D')^{⊗2} is obtained by applying to C_2 a permutation, a block-wise operation, and then another permutation. Specifically, the block-wise operation has block length m^2 (where m is the block length of Z).

[Footnote 4: For simplicity, we assume here that b·r divides n. In the general case, we increase n to the next multiple of b·r by padding d with zeroes (see [KMRS15, Section 3.1.2] for details).]

The local testability of C'. We conclude that C' is obtained from C by applying permutations and two block-wise operations. The blocks of the block-wise operations are of length poly(m/(ε·δ')) (where m is the block length of Z). By Propositions 4.6 and 4.8, it follows that C' is locally testable with query complexity q · poly(m/δ') = q · poly(1/δ').

We turn to analyzing the running time of the local tester. Observe that the block-wise operations are invertible in time poly(m/δ'): the reason is that those operations are linear, and generating matrices of the Reed-Solomon code and of the Zyablov code Z can be constructed in time poly(s) = poly(1/δ') and poly(m), respectively. Next, observe that the permutations are invertible in time poly(m · log n/δ'): for the permutations that obtain C_1 from C and C' from C_2, this follows from Proposition 4.9. For the permutation that obtains C_2 from C_1, this follows from Proposition 4.10, and from the fact that the above bijection determines the permutation according to a strongly-explicit expander. Propositions 4.6 and 4.8 imply in turn that the local tester of C' runs in time
poly(m · log n/δ') · T_C = poly(log n/δ') · T_C,
as required.

5 LTCs with quasi-polylogarithmic query complexity

In this section we prove Theorem 1.2, which gives an infinite family of high-rate LTCs with quasi-polylogarithmic query complexity. The codes in this family also have the property of approaching the Singleton bound over a constant-size alphabet. This immediately proves Theorem 1.2 from the introduction.

Theorem 1.2 (restated). For every r ∈ (0, 1) and every ε > 0, there exists an explicit infinite family of linear codes {C_n}_n satisfying:
1. C_n has block length n, rate at least r, and relative distance at least 1 − r − ε.
2. C_n is locally testable with query complexity and running time at most (log n)^{O(log log n)}.
3. The alphabet of C_n is of size at most exp(poly(1/ε)).
Furthermore, the family {C_n}_n has a uniform local tester that runs in time (log n)^{O(log log n)}.

By concatenating the codes given by the above theorem with binary Gilbert-Varshamov codes [Gil52, Var57] of constant block length, we get the following stronger version of Theorem 1.1, which gives binary LTCs with quasi-polylogarithmic query complexity that attain the Zyablov bound [Zya71] (this immediately implies Theorem 1.1).

Theorem 5.1. For every r ∈ (0, 1) and ε > 0, there exists an explicit infinite family of binary linear codes {C_n}_n satisfying:
1. C_n has block length at least n, rate at least r, and relative distance at least
max_{r < R < 1} { (1 − R − ε) · H^{-1}(1 − r/R) },
where H^{-1} is the inverse of the binary entropy function in the domain (0, 1/2).
2. C_n is locally testable with query complexity (log n)^{O(log log n)}.
Furthermore, the family {C_n}_n has a uniform local tester that runs in time (log n)^{O(log log n)}.

We prove Theorem 1.2 in two stages. In the first stage, we construct LTCs with the desired query complexity that have sub-constant relative distance. In the second stage, we apply the Alon-Luby distance-amplification technique to the latter codes to obtain LTCs with the desired parameters. The first stage is the main part of the proof, and is formalized in the following result.

Lemma 5.2 (Main lemma). There exists an explicit infinite family of binary linear codes {W_n}_n satisfying:
1. W_n has block length at least n, rate at least 1 − O(1/log n), and relative distance at least 1/log^{O(1)} n.
2. W_n is locally testable with query complexity (log n)^{O(log log n)}.
Furthermore, the family {W_n}_n has a uniform local tester that runs in time (log n)^{O(log log n)}.

We prove the main lemma in Section 5.1. We now prove Theorem 1.2 based on this lemma.

Proof of Theorem 1.2. We would like to construct the desired LTCs by amplifying the relative distance of the LTCs of the main lemma. The straightforward solution would be to apply the Alon-Luby distance-amplification (Lemma 4.1) to those codes and amplify them directly to the desired distance. However, this solution is problematic, since Lemma 4.1 yields codes whose alphabet size depends on the original relative distance. In our case this would yield a super-constant alphabet size, while we would like our codes to have a constant alphabet size. In order to resolve this issue, we construct our LTCs in multiple steps:

1. We start by amplifying the LTCs of the main lemma only to a small relative distance poly(ε). This yields codes with super-constant alphabet size, and only decreases the rate by a factor of 1 − poly(ε).
2. Next, we concatenate the latter codes with some binary codes with relative distance poly(ε). This yields binary codes with relative distance poly(ε), and again only decreases the rate by a factor of 1 − poly(ε).
3. Finally, we amplify the latter LTCs to the desired relative distance. This yields codes whose alphabet size depends only on ε, as required.

We now provide the formal details. Fix 0 < r < 1 and ε > 0, and let δ = 1 − r − ε be the desired relative distance. Let {W_n}_n be the infinite family of Lemma 5.2. We start by applying Lemma 4.1 with both of its parameters δ' and ε set to ε/5. This yields an infinite family of F_2-linear LTCs {U_n}_n with relative distance ε/5, rate at least (1 − (2/5)·ε) · (1 − O(1/log n)) ≥ 1 − (3/5)·ε, query complexity (log n)^{O(log log n)}, and alphabet size exp(poly(1/ε)).

Next, let {Z_n}_n be an infinite family of binary Zyablov codes (Fact 2.4) with relative distance (ε/5)^3 and rate at least 1 − ε/5. We concatenate the family {U_n}_n with the family {Z_n}_n, thus obtaining an infinite family of binary LTCs {V_n}_n with relative distance (ε/5)^4, rate at least 1 − (4/5)·ε, and query complexity (log n)^{O(log log n)}. The upper bound on the query complexity can be proved by noting that concatenation is a block-wise operation (as in Definition 4.5), and thus we can apply Proposition 4.6.