CS 59000 CTT Current Topics in Theoretical CS    Oct 30, 0
Lecturer: Elena Grigorescu    Lecture 9    Scribe: Vivek Patel

1 Introduction

In this lecture we study locally decodable codes. Locally decodable codes are error-correcting codes that allow very efficient access to encoded data and are, in addition, highly resilient to noise.

As a motivating example, suppose that we have a list of songs, say $s_1, s_2, \ldots, s_n$, that we would like to store so as to ensure both efficient decoding and error resilience. One way to encode this library would be to encode each song separately with an error-correcting code and store the concatenation of these encodings $E(s_1), E(s_2), \ldots, E(s_n)$. To recover a song one would only need to access the corresponding block in the database. However, if a hacker takes out the block of one song from this list, there is no way of getting that song back. To deal with this possibility one might instead concatenate all songs into a string $s = s_1 s_2 s_3 \ldots s_n$ and encode them all together into a single string $E(s)$. The drawback now is that the time to recover one song is proportional to the total length of all songs, since we have to decode the whole library. Locally decodable codes overcome this trade-off by allowing both resilience to errors and fast access to the data.

2 Formal Definition

Let $\mathbb{F}_q$ be a field of size $q$ (for simplicity you can just think of the more familiar field $\mathbb{Z}_q$, where $q$ is a prime and all operations are mod $q$). Informally, an $r$-query locally decodable code (LDC) encodes $k$-symbol messages $x$ into $n$-symbol codewords $C(x)$ in such a way that one can probabilistically recover any symbol $x_i$ of the message by querying only $r$ symbols of the (possibly corrupted) codeword $C(x)$, where $r$ is a very small number.

Definition 1 (Locally decodable codes) A code $C : \mathbb{F}_q^k \to \mathbb{F}_q^n$ is $(r, \delta, \epsilon)$-locally decodable if there exists a randomized algorithm $A$ such that:
1. $\forall x \in \mathbb{F}_q^k$, $\forall i \in [k]$, and $\forall y \in \mathbb{F}_q^n$ with $d(y, C(x)) \le \delta$ (where $d(y, C(x))$ denotes the relative Hamming distance between $y$ and $C(x)$) we have $\Pr[A(y, i) = x_i] \ge \epsilon$ (that is, $A$ recovers the $i$th symbol of $x$ w.p. at least $\epsilon$ over its random coins), and
2. $A$ reads at most $r$ coordinates of the received vector $y$.

The ideal settings are $n = O(k)$, $r = o(n)$. Particularly interesting settings for cryptographic applications and private information retrieval schemes are $r = 2, 3, 4$. Building LDCs of small rate is interesting for all ranges of $r$ up to $O(n)$. For binary codes, $\delta < 1/4$. The running time of the decoder is $\mathrm{poly}(r, \log n)$, which is asymptotically much smaller than $\mathrm{poly}(n)$, the time to decode the entire received word.

A related and stronger notion of local decoding is that of local correction. Here we disregard the message that we started with and only focus on its actual encoding, and we want to
recover each symbol of the encoding from a possibly corrupted received word. Recall that by definition a codeword of a systematic code contains the actual message itself (together with some redundancy), so for systematic codes local correction implies local decoding. Also recall that every linear code has a systematic encoding. In this lecture we only look at locally correctable linear codes, and so we'll use the terms locally decodable and locally correctable interchangeably.

Definition 2 (Locally correctable codes (LCCs)) A code $C \subseteq \mathbb{F}_q^n$ is $(r, \delta, \epsilon)$-locally correctable if there exists a randomized algorithm $A$ such that:
1. $\forall c \in C$, $\forall i \in [n]$, and every vector $y \in \mathbb{F}_q^n$ such that $d(y, c) \le \delta$ (again, $d(y, c)$ is the relative Hamming distance between $y$ and $c$) we have $\Pr[A(y, i) = c_i] \ge \epsilon$.
2. $A$ makes only $r$ queries into $y$.

3 A local decoder for the Hadamard Code

Recall that the Hadamard code is
$\mathrm{Had} = \{ \ell_a : \{0,1\}^k \to \{0,1\},\ \ell_a(x) = a \cdot x \bmod 2 \mid a \in \{0,1\}^k \}$.
In other words, a binary message $a = (a_1, \ldots, a_k)$ is encoded as $(a \cdot x_0, a \cdot x_1, \ldots, a \cdot x_{2^k - 1}) = \langle \ell_a(x) \rangle_{x \in \mathbb{F}_2^k}$. To locally decode/correct Had means to provide an algorithm that makes a constant number of queries and is able to output $\ell_a(b)$ for any value of $b \in \{0,1\}^k$, when we have access to a received word that differs from $\langle \ell_a(x) \rangle_{x \in \mathbb{F}_2^k}$ in at most a $\delta$ fraction of positions.

Theorem 3 Had is $(2, \delta, 1 - 2\delta)$-locally decodable.

Proof Recall that since $\ell_a$ is a linear function we have $\ell_a(b) = \ell_a(b + c) + \ell_a(c)$ for all $c \in \mathbb{F}_2^k$. The intuition is that if not too many values of $\ell_a(x)$ got corrupted, we can count the votes of each $c \in \mathbb{F}_2^k$ for the value of $\ell_a(b) = \ell_a(b + c) + \ell_a(c)$ and output the majority. This idea gives us the basic decoder below.

Algorithm 1 Local decoding of the Hadamard Code
Input: A function $f$ such that $d(\ell_a, f) \le \delta$, and $b \in \mathbb{F}_2^k$.
Goal: output $\ell_a(b)$.
1. Pick $c$ uniformly at random from $\mathbb{F}_2^k$.
2. Output $f(b + c) + f(c)$.

We can now analyse this decoder. Since the distance between $f$ and $\ell_a$ is at most $\delta$, we have that $\Pr_c[f(c) \ne \ell_a(c)] \le \delta$.
For any fixed $b$, if $c$ is chosen uniformly from $\mathbb{F}_2^k$ then $b + c$ is also uniformly distributed in $\mathbb{F}_2^k$, so we also have $\Pr_c[f(c + b) \ne \ell_a(c + b)] \le \delta$. By the union bound, both queried values are correct with probability at least $1 - 2\delta$, and in that case the output equals $\ell_a(b + c) + \ell_a(c) = \ell_a(b)$. Hence $\Pr_c[\ell_a(b) = f(b + c) + f(c)] \ge 1 - 2\delta$.

So we have shown an example of a code for which there is a local decoder with optimal query complexity (2 bits) but which has terrible rate (the codeword length is exponential in the message length). We will next see an important family of LDCs with much better rate (only a polynomial blowup in message length) whose query complexity is still small (polylogarithmic in the message length).
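The 2-query decoder of Algorithm 1 can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions that are not part of the notes: points of $\mathbb{F}_2^k$ are represented as integer bitmasks (so addition is XOR), the received word is a Python list indexed by those integers, and the helper names `hadamard_encode`, `local_decode` and `exact_success_probability` are hypothetical. The last helper enumerates every choice of $c$ so that the $1 - 2\delta$ bound can be checked exactly on a small instance.

```python
import random

def hadamard_encode(a):
    """Codeword of a in {0,1}^k: entry x holds l_a(x) = <a, x> mod 2, with x read as a bitmask."""
    k = len(a)
    a_mask = sum(bit << i for i, bit in enumerate(a))
    return [bin(a_mask & x).count("1") % 2 for x in range(1 << k)]

def local_decode(f, b, rng=random):
    """One run of the 2-query decoder: pick c uniformly, return f(b + c) + f(c) over F_2."""
    c = rng.randrange(len(f))
    return f[b ^ c] ^ f[c]  # XOR is addition in F_2^k (indices) and in F_2 (values)

def exact_success_probability(f, codeword, b):
    """Fraction of choices of c for which the decoder outputs the true value l_a(b)."""
    good = sum(1 for c in range(len(f)) if f[b ^ c] ^ f[c] == codeword[b])
    return good / len(f)

# Usage: corrupt 2 of the 16 positions (delta = 1/8) and check the 1 - 2*delta bound.
codeword = hadamard_encode([1, 0, 1, 1])
received = list(codeword)
for position in (3, 9):
    received[position] ^= 1
delta = 2 / len(codeword)
worst = min(exact_success_probability(received, codeword, b)
            for b in range(len(codeword)))
print(worst >= 1 - 2 * delta)
```

Repeating `local_decode` several times and taking the majority vote would amplify the success probability, at the cost of more queries.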
4 Reed Muller Codes

Reed-Muller codes (RM) are multivariate extensions of Reed-Solomon codes, which we've seen in a previous lecture. Most of the new families of LDCs are generalizations of RM codes. Informally, $\mathrm{RM}(m, \ell)$ is the code consisting of evaluations of $m$-variate polynomials of degree at most $\ell$ over $\mathbb{F}_q$.

Definition 4 Let $m, \ell$ be positive integers and $\mathbb{F}_q$ a finite field, with $\ell < q$. Then
$\mathrm{RM}(m, \ell) = \{ \langle p(\alpha) \rangle_{\alpha \in \mathbb{F}_q^m} \mid p \in \mathbb{F}_q[x_1, \ldots, x_m],\ \deg(p) \le \ell \}$.

For example, $p(x_1, x_2) = x_1^3 + 3x_2$ is a polynomial in 2 variables over the field $\mathbb{F}_5 = \mathbb{Z}_5$ and has total degree 3. The codeword corresponding to it is the list of evaluations of $p$ at every point in $\mathbb{Z}_5^2$.

4.1 Parameters of RM

Dimension (message length) We can think of an RM code as a generalization of RS in the following sense. Recall that an RS codeword was an encoding of a message $(m_0, \ldots, m_{k-1}) \in \mathbb{F}_q^k$ into the codeword $\langle m(\alpha) \rangle_{\alpha \in \mathbb{F}_q}$, where $m(x) = \sum_{i=0}^{k-1} m_i x^i$. Similarly, an RM codeword encodes a message whose coordinates are viewed as the coefficients of a polynomial of degree at most $\ell$ in $m$ variables. Hence the dimension of the code is the number of possible monomials of such polynomials, which turns out to be $\binom{m+\ell}{\ell}$. To see this, note that a monomial of degree at most $\ell$ is $x_1^{d_1} x_2^{d_2} \cdots x_m^{d_m}$ with $\sum_{i=1}^m d_i \le \ell$, and so we are asking about the size of the set $\{(d_1, d_2, \ldots, d_m) \mid \sum_i d_i \le \ell\}$, which is the number of ways one can place $m$ delimiters between $\ell$ units, which is easily seen to be $\binom{m+\ell}{m} = \binom{m+\ell}{\ell}$.

Block length By definition this is $q^m$.

Distance $(1 - \ell/q)\,q^m$, i.e. relative distance $1 - \ell/q$. This is obtained from the following useful lemma about the number of roots of a multivariate polynomial over a finite field.

Lemma 5 (Schwartz-Zippel) Let $p \in \mathbb{F}_q[x_1, \ldots, x_m]$ be a nonzero polynomial of total degree at most $\ell$. Then the number of $x \in \mathbb{F}_q^m$ s.t. $p(x) = 0$ is at most $\ell q^{m-1}$.

Since RM is a linear code, its minimum distance is the weight of a minimum weight nonzero codeword (say given by a polynomial $p$). Therefore it is $q^m - |\{x \mid p(x) = 0\}| \ge q^m - \ell q^{m-1} = q^m (1 - \ell/q)$.
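The dimension count and the Schwartz-Zippel bound above can be sanity-checked by brute force on small instances. The sketch below is illustrative only: the helper names `count_monomials` and `count_zeros` and the test polynomial `sample_poly` are hypothetical choices made for this example, and the field is assumed to be $\mathbb{Z}_q$ for a prime $q$ so that arithmetic is plain modular arithmetic.

```python
from itertools import product
from math import comb

def count_monomials(m, l):
    """Number of exponent vectors (d_1, ..., d_m) with d_1 + ... + d_m <= l."""
    return sum(1 for d in product(range(l + 1), repeat=m) if sum(d) <= l)

def count_zeros(poly, q, m):
    """Number of points x in (Z_q)^m at which poly evaluates to 0 mod q."""
    return sum(1 for x in product(range(q), repeat=m) if poly(*x) % q == 0)

def sample_poly(x1, x2):
    """A hypothetical nonzero polynomial of total degree 3 in 2 variables."""
    return x1 * x1 * x2 + 3 * x1 + 1

# Dimension: the brute-force count should match the binomial coefficient.
print(count_monomials(2, 3), comb(2 + 3, 3))

# Schwartz-Zippel over Z_5 with m = 2, l = 3: at most l * q^(m-1) = 15 zeros.
q, m, l = 5, 2, 3
print(count_zeros(sample_poly, q, m) <= l * q ** (m - 1))
```

Brute force is exponential in $m$, so this is only meant for checking the formulas on toy parameters.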
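Useful settings for the parameters So $\mathrm{RM}(m, \ell)$ is an $[n, k, d]$ code where $k = \binom{m+\ell}{\ell}$, $n = q^m$, $d = (1 - \ell/q)\,q^m$. Expressing everything in terms of $k$, we may choose $m = \log k / \log\log k$ and $q = \log^2 k$, so that $n = q^m = k^2$. Since $\binom{m+\ell}{m} \ge (1 + \ell/m)^m$, a degree $\ell < \log^2 k / \log\log k \ll q$ suffices to reach dimension $k$, and so $d > \left(1 - \frac{1}{\log\log k}\right) n$.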
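To make the asymptotic choice concrete, here is a small numeric sketch. It assumes the setting $m = \log k / \log\log k$ and $q = \log^2 k$ (logarithms base 2), picks the target $k = 2^{16}$ purely for illustration, and uses the hypothetical helper names `rm_parameters` and `smallest_degree`. Only the parameter formulas are evaluated, so no field arithmetic is needed.

```python
from math import comb

def rm_parameters(m, l, q):
    """Return (n, k, d) for RM(m, l) over a field of size q, using the formulas above."""
    n = q ** m
    k = comb(m + l, l)
    d = (q - l) * q ** (m - 1)  # equals (1 - l/q) * q^m
    return n, k, d

def smallest_degree(m, target_k):
    """Smallest l such that the dimension binom(m + l, l) reaches target_k."""
    l = 0
    while comb(m + l, l) < target_k:
        l += 1
    return l

target_k = 2 ** 16
log_k = target_k.bit_length() - 1      # log2(k) = 16
loglog_k = log_k.bit_length() - 1      # log2(log2(k)) = 4
m = log_k // loglog_k                  # m = log k / log log k
q = log_k ** 2                         # q = log^2 k
l = smallest_degree(m, target_k)
n, k, d = rm_parameters(m, l, q)

print("m, q, l:", m, q, l)
print("n == target_k^2:", n == target_k ** 2)
print("relative distance:", d / n)
```

On this instance the block length is exactly the square of the message length, which is the polynomial blowup referred to in the text.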
5 A local decoder for RM codes

Suppose $f : \mathbb{F}_q^m \to \mathbb{F}_q$ is the function to which the algorithm has oracle access and $g \in \mathrm{RM}(m, \ell)$ is s.t. $d(f, g) = \delta$ (so $|\{x : g(x) \ne f(x)\}| = \delta q^m$). A local decoder for RM is required, on input $a \in \mathbb{F}_q^m$, to output $g(a)$ in time $\mathrm{poly}(m, \ell, q)$ (w.h.p.).

The idea is to query points that are structured in a way that is specific to RM codes. Recall that for the Hadamard code, any tuple of points $(b, c, b + c)$ always satisfies the pattern $h(b) = h(c) + h(b + c)$ (where $h \in \mathrm{Had}$). It turns out that a similar pattern characterizes higher degree polynomials, but this time the pattern is more complicated and the points that give us the useful structure form a so-called line in $\mathbb{F}_q^m$.

Definition 6 A line in $\mathbb{F}_q^m$ is defined by $a \in \mathbb{F}_q^m$ and $b \in \mathbb{F}_q^m \setminus \{0\}$ and is given by the collection of points $L_{a,b} = \{a + bt \mid t \in \mathbb{F}_q\}$. We will sometimes view the line as a function $L_{a,b} : \mathbb{F}_q \to \mathbb{F}_q^m$, $L_{a,b}(t) = a + bt$.

Before proceeding with the decoder we state some useful facts.

Proposition 7 For any $a \in \mathbb{F}_q^m$ and $t \in \mathbb{F}_q \setminus \{0\}$, if $b$ is uniform over $\mathbb{F}_q^m$ then $L_{a,b}(t)$ is uniform in $\mathbb{F}_q^m$.

Definition 8 If $f : \mathbb{F}_q^m \to \mathbb{F}_q$ then the restriction of $f$ to the line $L = L_{a,b}$ is the function $f|_L : \mathbb{F}_q \to \mathbb{F}_q$, $f|_L(t) = f(L(t))$.

Proposition 9 If $p \in \mathbb{F}_q[x_1, \ldots, x_m]$ is a polynomial of degree at most $\ell$ then $p|_L = p \circ L$ (i.e. $p|_L(t) = p(L(t))$ for all $t$) is a polynomial in one variable of degree at most $\ell$.

The following proposition is the main tool in the proof of the correctness of the local decoder that we will present. It indirectly gives a way of characterizing degree $\ell$ polynomials by $\ell + 1$ points on a line, which will be the query complexity of a decoder for RM codes.

Proposition 10 If $g \in \mathbb{F}_q[x]$ is a polynomial of degree at most $\ell$ and for $\ell + 1$ distinct values $\alpha_0, \alpha_1, \ldots, \alpha_\ell \in \mathbb{F}_q$ we know $g(\alpha_i) = \beta_i$, then $g$ can be recovered exactly at any point $\alpha \in \mathbb{F}_q$.

Theorem 11 $\mathrm{RM}(m, \ell)$ over $\mathbb{F}_q$ is $\left(\ell + 1, \frac{1}{3(\ell+1)}, \frac{2}{3}\right)$-locally decodable.

Proof

Algorithm 2 A basic local decoder for RM codes
Input: A function $f$ such that $d(g, f) \le \delta$ for some $g \in \mathrm{RM}(m, \ell)$, and $a \in \mathbb{F}_q^m$.
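Goal: output $g(a)$.
1. Pick a direction $b \in \mathbb{F}_q^m \setminus \{0\}$ uniformly at random and let $L_{a,b}(t) = a + bt$.
2. Let $\alpha_0, \ldots, \alpha_\ell$ be distinct nonzero elements of $\mathbb{F}_q$ and let $\beta_i = f(L_{a,b}(\alpha_i)) = f(a + \alpha_i b)$.
3. Interpolate (using Proposition 10) to find the unique univariate polynomial $h$ of degree at most $\ell$ s.t. $h(\alpha_i) = \beta_i$ for all $i$.
4. Output $h(0)$ (which should be the corrected value of $f(a) = f(a + 0 \cdot b)$, namely $g(a)$).

By Proposition 7, if $b$ is chosen uniformly at random then $a + bt$ is a uniform point in $\mathbb{F}_q^m$ for any $t \ne 0$ (excluding $b = 0$ changes this only negligibly). So $\Pr[f(a + \alpha_i b) \ne g(a + \alpha_i b)] \le \frac{1}{3(\ell+1)}$ by the assumption $\delta \le \frac{1}{3(\ell+1)}$, for every $i \in \{0, 1, \ldots, \ell\}$. Again,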
by a union bound, $\Pr[\exists\, \alpha_i \text{ s.t. } f(a + \alpha_i b) \ne g(a + \alpha_i b)] \le (\ell + 1) \cdot \frac{1}{3(\ell+1)} = 1/3$, and so all the queried points are correct w.p. at least $2/3$. When that is the case, the interpolation step successfully finds the unique polynomial of degree at most $\ell$ that agrees with $g$ on the queried points, namely the restriction of $g$ to the line $L_{a,b}$, and its value at $0$ is $g(a)$.

Notice that the amount of error that the above decoder can recover from (i.e. $\frac{1}{3(\ell+1)}$) degrades with the degree $\ell$. More sophisticated analyses can however show that RM codes are locally decodable from a constant fraction of error (unambiguously, even from a $1/4$-fraction of error).
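The basic line decoder for RM codes can be sketched as follows. This is a minimal illustration under stated assumptions rather than the lecture's own implementation: the field is $\mathbb{Z}_q$ for a prime $q$, the oracle `f` is any Python callable on tuples, the query points use the nonzero values $\alpha_i = 1, \ldots, \ell + 1$ (which requires $\ell + 1 \le q - 1$), and the names `interpolate_at_zero`, `rm_local_decode`, `g` and `f` are hypothetical. The modular inverse via `pow(x, -1, q)` needs Python 3.8 or later.

```python
import random

def interpolate_at_zero(alphas, betas, q):
    """Lagrange interpolation over Z_q (q prime): value at 0 of the unique
    polynomial of degree <= len(alphas) - 1 through the points (alpha_i, beta_i)."""
    total = 0
    for i, (ai, bi) in enumerate(zip(alphas, betas)):
        num, den = 1, 1
        for j, aj in enumerate(alphas):
            if j != i:
                num = num * (-aj) % q          # factor (0 - alpha_j)
                den = den * (ai - aj) % q      # factor (alpha_i - alpha_j)
        total = (total + bi * num * pow(den, -1, q)) % q
    return total

def rm_local_decode(f, a, l, q, rng=random):
    """Query f at l + 1 points of a random line through a and interpolate at t = 0."""
    m = len(a)
    b = tuple(rng.randrange(q) for _ in range(m))
    while not any(b):                          # resample until the direction is nonzero
        b = tuple(rng.randrange(q) for _ in range(m))
    alphas = list(range(1, l + 2))             # assumed choice: l + 1 distinct nonzero values
    betas = [f(tuple((ai + t * bi) % q for ai, bi in zip(a, b))) for t in alphas]
    return interpolate_at_zero(alphas, betas, q)

# Usage with a hypothetical degree-3 polynomial over Z_7, corrupted at one point.
q, l = 7, 3

def g(x):
    return (x[0] ** 2 * x[1] + 3 * x[0] + 1) % q

def f(x):
    return (g(x) + 1) % q if x == (0, 0) else g(x)

print(rm_local_decode(f, (0, 0), l, q) == g((0, 0)))
```

Because the queried points all have $t \ne 0$, the decoder never reads $f(a)$ itself, so a corruption at the requested position alone cannot affect the output.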