
CS 59000 CTT Current Topics in Theoretical CS Oct 30, 2012
Lecturer: Elena Grigorescu Lecture 9 Scribe: Vivek Patel

1 Introduction

In this lecture we study locally decodable codes. Locally decodable codes are error-correcting codes that allow very efficient access to encoded data while remaining highly resilient to noise. As a motivating example, suppose we have a list of songs s_1, s_2, ..., s_n that we would like to store so as to ensure both efficient decoding and error resilience. One way to encode this library would be to encode each song separately with an error-correcting code and store the concatenation of these encodings E(s_1), E(s_2), ..., E(s_n). To recover a song one would only need to access the corresponding block in the database. However, if a hacker destroys the block of one song, there is no way of getting that song back. To deal with this possibility one might instead concatenate all songs into a single string s = s_1, s_2, s_3, ..., s_n and encode them altogether as one codeword E(s). The drawback now is that the time to recover/decode one song is proportional to the length of all songs, since we must decode the whole library. Locally decodable codes overcome this trade-off by allowing both resilience to errors and fast access to the data.

2 Formal Definition

Let F_q be a field of size q (for simplicity you can just think of the more familiar field Z_q, where q is a prime and all operations are mod q). Informally, an r-query locally decodable code (LDC) encodes k-symbol messages x into n-symbol codewords C(x) in such a way that one can probabilistically recover any symbol x_i of the message by querying only r symbols of the (possibly corrupted) codeword C(x), where r is a very small number.

Definition 1 (Locally decodable codes) A code C : F_q^k -> F_q^n is (r, δ, ε)-locally decodable if there exists a randomized algorithm A such that:
1. For all x ∈ F_q^k, all i ∈ [k], and all y ∈ F_q^n with d(y, C(x)) ≤ δ (where d(y, C(x)) denotes the relative Hamming distance between y and C(x)), we have Pr[A(y, i) = x_i] ≥ 1 - ε (that is, A recovers the i-th symbol of x with probability at least 1 - ε over its random coins), and

2. A reads at most r coordinates of the received vector y.

The ideal settings are n = O(k) and r = o(n). Some particularly interesting settings for cryptographic applications and private information retrieval schemes are r = 2, 3, 4. Building LDCs of small rate is interesting for all ranges of r up to O(n). For binary codes δ < 1/4. The running time of the decoder is poly(r, log n), which is asymptotically much smaller than poly(n), the time to decode the entire received word.

A related and stronger notion than local decoding is that of local correction. Here we disregard the message that we started with and focus only on its actual encoding, and we want to

recover each symbol of the encoding from a possibly corrupted received word. Recall that by definition a codeword of a systematic code contains the actual message itself (together with some redundancy), so for systematic codes local correction implies local decoding. Also recall that every linear code can be made systematic. In this lecture we only look at locally correctable linear codes, and so we will use the terms locally decodable/correctable interchangeably.

Definition 2 (Locally correctable codes (LCCs)) A code C ⊆ F_q^n is (r, δ, ε)-locally correctable if there exists a randomized algorithm A such that:

1. For all c ∈ C, all i ∈ [n], and every vector y ∈ F_q^n such that d(y, c) ≤ δ (again, d(y, c) is the relative Hamming distance between y and c), we have Pr[A(y, i) = c_i] ≥ 1 - ε.

2. A makes only r queries into y.

3 A local decoder for the Hadamard Code

Recall that the Hadamard code is Had = { ℓ_a : {0,1}^k -> {0,1} | ℓ_a(x) = a·x mod 2, a ∈ {0,1}^k }. In other words, a binary message a = (a_1, ..., a_k) is encoded as the list of evaluations (ℓ_a(x))_{x ∈ F_2^k}, a string of length 2^k. To locally decode/correct Had means to provide an algorithm that makes a constant number of queries and is able to output ℓ_a(b) for any value b ∈ {0,1}^k, when we have access to a received word that differs from (ℓ_a(x))_{x ∈ F_2^k} in at most a δ fraction of positions.

Theorem 3 Had is (2, δ, 2δ)-locally decodable.

Proof: Recall that since ℓ_a is a linear function we have ℓ_a(b) = ℓ_a(b + c) + ℓ_a(c) for all c ∈ F_2^k. The intuition is that if not too many values of ℓ_a got corrupted, we can count the vote of each c ∈ F_2^k for the value ℓ_a(b) = ℓ_a(b + c) + ℓ_a(c) and output the majority. This idea gives us the basic decoder below.

Algorithm 1 Local decoding of the Hadamard Code
Input: a function f such that d(ℓ_a, f) ≤ δ, and b ∈ F_2^k.
Goal: output ℓ_a(b).
1. Pick c uniformly at random from F_2^k.
2. Output f(b + c) + f(c).

We can now analyse this decoder. Since the relative distance between f and ℓ_a is at most δ, we have Pr_c[f(c) ≠ ℓ_a(c)] ≤ δ.
For any fixed b, if c is chosen uniformly from F_2^k then b + c is also uniformly distributed in F_2^k, so we also have Pr_c[f(b + c) ≠ ℓ_a(b + c)] ≤ δ. By the union bound, Pr[ℓ_a(b) ≠ f(b + c) + f(c)] ≤ 2δ.

So we have shown an example of a code that admits a local decoder with optimal query complexity (2 queries) but has terrible rate: the codeword length is exponential in the message length. We will next see an important family of LDCs with much better rate (only a polynomial blowup of the message length) for which the query complexity is still very small (polylogarithmic in the message length).
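The two-query decoder of Algorithm 1 can be sketched in a few lines of Python. This is a toy illustration, not part of the lecture: the message, the corrupted position, and the number of repetitions are arbitrary choices, and points of F_2^k are packed into the bits of an integer so that addition over F_2^k becomes XOR.

```python
import random

random.seed(0)  # for reproducibility of this toy run

def hadamard_encode(a):
    """Table of l_a(x) = <a, x> mod 2 for all x in {0,1}^k.
    A point x of F_2^k is packed into the bits of an integer."""
    k = len(a)
    return [sum(ai * ((x >> i) & 1) for i, ai in enumerate(a)) % 2
            for x in range(2 ** k)]

def decode_once(f, k, b):
    """One run of Algorithm 1: pick c uniformly, output f(b+c) + f(c).
    Addition in F_2^k is bitwise XOR on the packed integers."""
    c = random.randrange(2 ** k)
    return (f[b ^ c] + f[c]) % 2

a = [1, 0, 1]                      # message, k = 3
codeword = hadamard_encode(a)      # length 2^3 = 8
received = list(codeword)
received[5] ^= 1                   # one corrupted position, delta = 1/8

b = 0b011                          # recover l_a(b) = 1*1 + 0*1 + 1*0 = 1
votes = [decode_once(received, 3, b) for _ in range(101)]
print(max(set(votes), key=votes.count))  # majority over many runs
```

Each run errs only if c or b + c lands on the corrupted position (probability at most 2δ = 1/4 here), so the majority vote over independent runs is correct with overwhelming probability.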

4 Reed-Muller Codes

Reed-Muller codes (RM) are multivariate extensions of Reed-Solomon codes, which we have seen in a previous lecture. Most of the new families of LDCs are generalizations of RM codes. Informally, RM(m, l) is the code consisting of the evaluations of m-variate polynomials of total degree at most l over F_q.

Definition 4 Let m, l be positive integers and F_q a finite field with l ≤ q. Then RM(m, l) = { (p(α))_{α ∈ F_q^m} | p ∈ F_q[x_1, ..., x_m], deg(p) ≤ l }.

For example, p(x_1, x_2) = x_1^3 + 3x_2 is a polynomial in 2 variables over the field F_5 = Z_5 of total degree 3. The corresponding codeword can be thought of as the evaluation of p at every point of Z_5^2.

4.1 Parameters of RM

Dimension (message length). We can think of an RM code as a generalization of RS in the following sense. Recall that an RS codeword encodes a message (m_0, ..., m_{k-1}) ∈ F_q^k as the codeword (m(α))_{α ∈ F_q}, where m(x) = Σ_{i=0}^{k-1} m_i x^i. Similarly, an RM codeword encodes a message whose coordinates are viewed as the coefficients of a degree-l polynomial in m variables. Hence the dimension of the code is the number of possible monomials of such polynomials, which turns out to be C(m+l, m). To see this, note that a monomial of degree at most l is x_1^{d_1} x_2^{d_2} ... x_m^{d_m} with Σ_i d_i ≤ l, so we are asking for the size of the set { (d_1, d_2, ..., d_m) : Σ_i d_i ≤ l }, which is the number of ways one can place m delimiters among l units, easily seen to be C(m+l, m) = C(m+l, l).

Block length. By definition this is q^m.

Distance. (1 - l/q) q^m, i.e. relative distance 1 - l/q. This is obtained from the following useful lemma about the number of roots of a multivariate polynomial over a finite field.

Lemma 5 (Schwartz-Zippel) Let p ∈ F_q[x_1, ..., x_m] be a nonzero polynomial of total degree at most l. Then the number of x ∈ F_q^m such that p(x) = 0 is at most l·q^{m-1}.

Since RM is a linear code, its minimum distance is the weight of a minimum-weight codeword (say, given by a polynomial p). Therefore the distance is q^m - |{x : p(x) = 0}| ≥ q^m - l·q^{m-1} = (1 - l/q) q^m, and this bound is attained, e.g., by p(x) = (x_1 - a_1)···(x_1 - a_l) for distinct a_i.
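Both counts above are easy to sanity-check by brute force. The following short Python snippet (an illustration, not from the lecture) verifies the monomial count and the Schwartz-Zippel bound for the example polynomial over F_5:

```python
from itertools import product
from math import comb

m, l, q = 2, 3, 5   # matches the example p(x1, x2) = x1^3 + 3*x2 over F_5

# Dimension: monomials x1^d1 * x2^d2 with d1 + d2 <= l (stars and bars).
monomials = [d for d in product(range(l + 1), repeat=m) if sum(d) <= l]
print(len(monomials), comb(m + l, m))   # both equal C(m+l, m) = 10

# Schwartz-Zippel: a nonzero degree-l polynomial has <= l*q^(m-1) roots.
p = lambda x1, x2: (x1 ** 3 + 3 * x2) % q
roots = sum(p(x1, x2) == 0 for x1, x2 in product(range(q), repeat=2))
print(roots, l * q ** (m - 1))          # 5 roots, well below the bound 15
```

Here the polynomial has exactly one root x_2 per choice of x_1 (since 3 is invertible mod 5), i.e. q = 5 roots, comfortably below the Schwartz-Zippel bound of 15.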
Useful settings of the parameters. So RM(m, l) is an [n, k, d] code where k = C(m+l, l), n = q^m, d = (1 - l/q) q^m. Expressing everything in terms of k, we may choose m = log k / log log k and q = log^2 k, so that n = k^2; it then follows that l ≈ log^2 k / log log k << q, and so d > (1 - 1/log log k) n.

5 A local decoder for RM codes

Suppose f : F_q^m -> F_q is the function to which the algorithm has oracle access, and g ∈ RM(m, l) is such that d(f, g) = δ (so |{x : g(x) ≠ f(x)}| = δ q^m). A local decoder for RM is required, on input a ∈ F_q^m, to output g(a) in time poly(m, l, q) (w.h.p.). The idea is to query points that are structured in a way that is specific to RM codes. Recall that for the Hadamard code, any tuple of points (b, c, b + c) satisfied the pattern h(b) = h(c) + h(b + c) (where h ∈ Had). It turns out that a similar pattern characterizes higher-degree polynomials, but this time the pattern is more complicated, and the points that give us the useful structure form a so-called line in F_q^m.

Definition 6 A line in F_q^m is defined by a ∈ F_q^m and b ∈ F_q^m \ {0}, and is given by the collection of points L_{a,b} = { a + bt | t ∈ F_q }. We will sometimes view the line as a function L_{a,b} : F_q -> F_q^m, L_{a,b}(t) = a + bt.

Before proceeding with the decoder we state some useful facts.

Proposition 7 For any a ∈ F_q^m and any t ∈ F_q \ {0}, if b is uniform over F_q^m then L_{a,b}(t) = a + bt is uniform in F_q^m.

Definition 8 If f : F_q^m -> F_q then the restriction of f to a line L = L_{a,b} is the function f|_L : F_q -> F_q, f|_L(t) = f(L(t)).

Proposition 9 If p ∈ F_q[x_1, ..., x_m] is a polynomial of degree at most l, then p|_L (i.e., p|_L(t) = p(L(t)) for all t) is a polynomial in one variable of degree at most l.

The following proposition is the main tool in the proof of correctness of the local decoder that we will present. It indirectly gives a way of characterizing degree-l polynomials by l + 1 points on a line, which will be the query complexity of a decoder for RM codes.

Proposition 10 If g ∈ F_q[x] is a polynomial of degree at most l and we know g(α_i) = β_i for l + 1 distinct values α_0, α_1, ..., α_l ∈ F_q, then g can be recovered exactly at any point α ∈ F_q (by Lagrange interpolation).

Theorem 11 RM_q(m, l) is (l + 1, 1/(3(l+1)), 1/3)-locally decodable.

Proof:

Algorithm 2 A basic local decoder for the RM Code
Input: a function f such that d(g, f) ≤ 1/(3(l+1)) for some g ∈ RM(m, l), and a ∈ F_q^m.
Goal: output g(a).
1. Pick a direction b uniformly at random from F_q^m and let L_{a,b}(t) = a + bt.
2. Let α_0, ..., α_l be distinct nonzero elements of F_q (this requires q ≥ l + 2) and let β_i = f(L_{a,b}(α_i)) = f(a + α_i b).
3. Interpolate (using Proposition 10) to find the unique polynomial h of degree at most l such that h(α_i) = β_i for all i.
4. Output h(0) (which should be the corrected value g(a) = g(a + 0·b)).

By Proposition 7, if b is chosen uniformly at random then a + α_i b is a uniform point in F_q^m for each nonzero α_i. So Pr[f(a + α_i b) ≠ g(a + α_i b)] ≤ 1/(3(l+1)) by assumption, for every i ∈ {0, 1, ..., l}. Again,

by a union bound, Pr[there is some α_i s.t. f(a + α_i b) ≠ g(a + α_i b)] ≤ (l + 1) · 1/(3(l+1)) = 1/3, and so all the queried points are correct with probability at least 2/3. When that is the case, the interpolation step successfully finds the unique degree-≤ l polynomial that agrees with g on the queried points, namely g|_{L_{a,b}}, whose value at 0 is g(a).

Notice that the amount of error that the above decoder can tolerate (namely 1/(3(l+1))) degrades with the degree l. More sophisticated analyses show, however, that RM codes are locally decodable from a constant fraction of error (unambiguously, even from a 1/4-fraction of error).
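Algorithm 2 can likewise be sketched in Python over a small prime field. This is a toy illustration, not part of the lecture: the field size, polynomial, corruption pattern, and vote count are arbitrary choices, and Lagrange interpolation (with inverses via Fermat's little theorem) stands in for Proposition 10.

```python
import random

random.seed(1)  # for reproducibility of this toy run

def rm_decode_once(f, m, l, q, a):
    """One run of Algorithm 2 for RM(m, l) over a prime field F_q.
    f maps m-tuples over F_q to F_q; requires q >= l + 2 so that
    l + 1 distinct nonzero interpolation points exist."""
    b = tuple(random.randrange(q) for _ in range(m))   # random direction
    alphas = range(1, l + 2)                           # distinct, nonzero
    # Query f on l + 1 points of the line L_{a,b}(t) = a + t*b.
    betas = [f(tuple((ai + t * bi) % q for ai, bi in zip(a, b)))
             for t in alphas]
    # Lagrange-interpolate the degree-<=l restriction and evaluate it at 0.
    value = 0
    for i, (ti, yi) in enumerate(zip(alphas, betas)):
        num = den = 1
        for j, tj in enumerate(alphas):
            if j != i:
                num = num * (-tj) % q
                den = den * (ti - tj) % q
        value = (value + yi * num * pow(den, q - 2, q)) % q  # den^-1 mod q
    return value

q, m, l = 11, 2, 3
g = lambda x: (x[0] ** 3 + 3 * x[1]) % q        # the codeword's polynomial
corrupt = {(0, 0), (1, 1), (2, 2)}              # 3 of the 121 positions
f = lambda x: (g(x) + 1) % q if x in corrupt else g(x)

a = (4, 7)
votes = [rm_decode_once(f, m, l, q, a) for _ in range(51)]
print(max(set(votes), key=votes.count), g(a))   # majority matches g(a)
```

Each run queries only l + 1 = 4 of the 121 symbols; since every queried point a + t·b with t ≠ 0 is uniformly distributed (Proposition 7), a union bound gives error probability at most (l + 1)·δ per run, and the majority vote over independent runs drives it down further.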