CS 70 Discrete Mathematics for CS, Spring 2007, Luca Trevisan, Lecture 11

Error Correcting Codes

Erasure Errors

We will consider two situations in which we wish to transmit information on an unreliable channel. The first is exemplified by the internet, where the information (say a file) is broken up into packets, and the unreliability is manifest in the fact that some of the packets are lost during transmission.

Suppose that the message consists of n packets and that k packets are lost during transmission. We will show how to encode the initial message of n packets into a redundant encoding of n + k packets such that the recipient can reconstruct the message from any n received packets. Note that in this setting the packets are labelled, so the recipient knows exactly which packets were dropped during transmission.

We can assume without loss of generality that the contents of each packet are a number modulo q, where q is a prime. This is because the contents of a packet might be a 32-bit string, which can be regarded as a number between 0 and 2^32 - 1; thus we can choose q to be any prime larger than 2^32.

The properties of polynomials over GF(q) (i.e. with coefficients and values reduced modulo q) are perfectly suited to this problem and are the backbone of this error-correcting scheme. To see this, let us denote the message to be sent by m_1, ..., m_n and make the following crucial observations:

1) There is a unique polynomial P(x) of degree n - 1 such that P(i) = m_i for 1 ≤ i ≤ n (i.e. P(x) contains all of the information about the message, and evaluating P(i) gives the contents of the i-th packet).

2) The message to be sent is now m_1 = P(1), ..., m_n = P(n). We can generate additional packets by evaluating P(x) at the points n + j (remember, the transmitted message must be redundant, i.e. it must contain more packets than the original message to account for the lost packets). Thus the transmitted message is c_1 = P(1), c_2 = P(2), ..., c_{n+k} = P(n + k). Since we are working modulo q, we must make sure that n + k ≤ q, but this condition does not impose a serious constraint since q is very large.

3) We can uniquely reconstruct P(x) from its values at any n distinct points, since it has degree n - 1. This means that P(x) can be reconstructed from any n of the transmitted packets. Evaluating the reconstructed polynomial P(x) at x = 1, ..., n then yields the original message m_1, ..., m_n.
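
To make the encoding step concrete, here is a minimal Python sketch (not part of the original notes) of observations 1) and 2): it evaluates, at the points 1, ..., n + k, the unique degree-(n - 1) polynomial through the message values, using Lagrange's interpolation formula. The helper names eval_interpolated and encode are illustrative choices.

# Sketch of the erasure-coding encoder: treat the message m_1, ..., m_n as the
# values P(1), ..., P(n) of a degree-(n-1) polynomial over GF(q) and transmit
# c_j = P(j) for j = 1, ..., n+k.

def eval_interpolated(points, x, q):
    """Evaluate at x the unique low-degree polynomial over GF(q) through the (xi, yi) pairs."""
    total = 0
    for xi, yi in points:
        num, den = 1, 1
        for xj, _ in points:
            if xj != xi:
                num = num * (x - xj) % q
                den = den * (xi - xj) % q
        total = (total + yi * num * pow(den, -1, q)) % q   # pow(den, -1, q) is the inverse mod q
    return total

def encode(message, k, q):
    """Encode n packets into n + k packets c_j = P(j), j = 1, ..., n + k."""
    n = len(message)
    points = list(zip(range(1, n + 1), message))           # P(i) = m_i for 1 <= i <= n
    return [eval_interpolated(points, j, q) for j in range(1, n + k + 1)]

# The example worked out below: q = 7, message (3, 1, 5, 0), k = 2 extra packets.
print(encode([3, 1, 5, 0], 2, 7))                          # [3, 1, 5, 0, 6, 1]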

Example

Suppose we are working over GF(7) (i.e. all coefficients and values are between 0 and 6), and the number of packets in the message is n = 4. Suppose the message that Alice wants to send to Bob is m_1 = 3, m_2 = 1, m_3 = 5, and m_4 = 0. The unique polynomial of degree n - 1 = 3 through these 4 points is P(x) = x^3 + 4x^2 + 5 (verify that P(i) = m_i for 1 ≤ i ≤ 4). Now suppose that Alice wishes to guard against k = 2 lost packets. To do this, she evaluates P(x) at 2 extra points: P(5) = 6 and P(6) = 1. Alice can now transmit the encoded message, which consists of n + k = 6 packets, where c_j = P(j) for 1 ≤ j ≤ 6. So c_1 = P(1) = 3, c_2 = P(2) = 1, c_3 = P(3) = 5, c_4 = P(4) = 0, c_5 = P(5) = 6, and c_6 = P(6) = 1.

Suppose packets 2 and 6 are dropped. From the values that Bob received (3, 5, 0, and 6, at the points 1, 3, 4, and 5), he uses Lagrange interpolation and computes the following delta functions:

Δ_1(x) = (x - 3)(x - 4)(x - 5) / (-24)
Δ_3(x) = (x - 1)(x - 4)(x - 5) / 4
Δ_4(x) = (x - 1)(x - 3)(x - 5) / (-3)
Δ_5(x) = (x - 1)(x - 3)(x - 4) / 8

He then reconstructs the polynomial P(x) = 3·Δ_1(x) + 5·Δ_3(x) + 0·Δ_4(x) + 6·Δ_5(x) = x^3 + 4x^2 + 5. Bob then evaluates m_2 = P(2) = 1, which is the packet that was lost from the original message. No matter which 2 packets had been dropped, Bob could still have reconstructed P(x), and thus the original message, thanks to the remarkable properties of polynomials over GF(q).

Let us consider what would happen if Alice sent one fewer packet. If Alice only sent c_j for 1 ≤ j ≤ n + k - 1, then with k erasures Bob would receive c_j for only n - 1 distinct values of j. Thus Bob would not be able to reconstruct P(x) (there are exactly q polynomials of degree at most n - 1 that agree with the n - 1 packets Bob received). This error-correcting scheme is therefore optimal: it can recover the n characters of the transmitted message from any n received characters.
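
Bob's decoding step in this example can be sketched in a few lines of Python (again not part of the notes; the helper name lagrange_eval is an illustrative choice): interpolate through the four received packets and re-evaluate at 1, ..., n.

# Sketch of the erasure decoder for the example above: Bob received packets
# 1, 3, 4, 5 with values 3, 5, 0, 6 over GF(7); packets 2 and 6 were dropped.

def lagrange_eval(points, x, q):
    """Value at x of the unique low-degree polynomial through `points` over GF(q)."""
    total = 0
    for xi, yi in points:
        term = yi
        for xj, _ in points:
            if xj != xi:
                term = term * (x - xj) * pow(xi - xj, -1, q) % q
        total = (total + term) % q
    return total

received = [(1, 3), (3, 5), (4, 0), (5, 6)]
message = [lagrange_eval(received, i, 7) for i in range(1, 5)]
print(message)                                # [3, 1, 5, 0], recovering m_2 = 1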

General Errors

Let us now consider a much more challenging scenario. Alice wishes to communicate with Bob over a noisy channel (say, via a modem). Her message is m_1, ..., m_n, where we will think of the m_i's as characters (either bytes or characters in the English alphabet). The problem now is that some of the characters are corrupted during transmission due to channel noise. So Bob receives exactly as many characters as Alice transmits, but k of them are corrupted, and Bob has no idea which k. Recovering from such general errors is much more challenging than recovering from erasure errors, though once again polynomials hold the key.

Let us again think of each character as a number modulo q for some prime q (for the English alphabet, q is some prime larger than 26, say q = 29). As before, we can describe the message by a polynomial P(x) of degree n - 1 over GF(q) such that P(1) = m_1, ..., P(n) = m_n, and to cope with the transmission errors Alice will transmit additional characters obtained by evaluating P(x) at additional points. To guard against k general errors, Alice must transmit 2k additional characters (as opposed to just k additional packets, which was the case with erasure errors). Thus the encoded message is c_1, ..., c_{n+2k}, where c_j = P(j) for 1 ≤ j ≤ n + 2k, and n + k of the characters that Bob receives are uncorrupted. As before, we put the mild constraint on q that it be large enough that q ≥ n + 2k.

For example, if Alice wishes to send n = 4 characters to Bob via a modem in which k = 1 of the characters is corrupted, she must redundantly send an encoded message consisting of 6 characters. Suppose she wants to transmit the same message as above, and that c_1 is corrupted and changed to 2.

From Bob's viewpoint, the problem of reconstructing Alice's message is the same as reconstructing the polynomial P(x) from the n + 2k received characters R(1), R(2), ..., R(n + 2k). In other words, Bob is given n + 2k values modulo q, R(1), R(2), ..., R(n + 2k), with the promise that there is a polynomial P(x) of degree n - 1 over GF(q) such that R(i) = P(i) for at least n + k distinct values of i between 1 and n + 2k. Bob must reconstruct P(x) from this data (in the above example, n + k = 5 and R(2) = P(2) = 1, R(3) = P(3) = 5, R(4) = P(4) = 0, R(5) = P(5) = 6, and R(6) = P(6) = 1).

But does Bob have sufficient information to reconstruct P(x)? Our first observation shows that the answer is yes: P(x) is the unique polynomial of degree n - 1 that agrees with R at n + k points. Suppose that P'(x) is any polynomial of degree n - 1 that agrees with R at n + k points. Among these n + k points there are at most k errors, so on at least n points x_i we have P'(x_i) = P(x_i). But a polynomial of degree n - 1 is uniquely determined by its values at n points, and therefore P(x) = P'(x) (for all x).

But how can Bob quickly find such a polynomial? The issue at hand is the location of the k errors. Let e_1, ..., e_k be the k locations at which errors occurred, so that P(e_i) ≠ R(e_i) for 1 ≤ i ≤ k. We could try to guess where the k errors lie, but this would take too long (exponential time, in fact). Consider instead the error-locator polynomial E(x) = (x - e_1)(x - e_2)···(x - e_k), which has degree k (it is a product of k linear factors). Let us make a simple but crucial observation: P(i)E(i) = R(i)E(i) for 1 ≤ i ≤ n + 2k. This is true at points i at which no error occurred, since there P(i) = R(i), and it is trivially true at points i at which an error occurred, since there E(i) = 0.
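
This observation is easy to check numerically on the running example, with values taken from the text: q = 7, P(x) = x^3 + 4x^2 + 5, the received word with R(1) corrupted to 2, and error-locator E(x) = x - 1. The small sketch below is not part of the notes.

# Numerical check of P(i)E(i) = R(i)E(i) mod q on the running example.
q = 7
def P(x): return (x**3 + 4 * x**2 + 5) % q
def E(x): return (x - 1) % q                   # error-locator: the single error is at i = 1

R = {1: 2, 2: 1, 3: 5, 4: 0, 5: 6, 6: 1}       # received word, R(1) is corrupted

for i in range(1, 7):                          # the n + 2k = 6 transmitted positions
    assert P(i) * E(i) % q == R[i] * E(i) % q
print("P(i)E(i) = R(i)E(i) for i = 1, ..., 6")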

This observation forms the basis of a very clever algorithm invented by Berlekamp and Welch. Looking more closely at these equalities, we will show that they are n + 2k linear equations in n + 2k unknowns, from which the locations of the errors and the coefficients of P(x) can easily be deduced.

Let Q(x) = P(x)E(x), which is a polynomial of degree n + k - 1 and is therefore described by n + k coefficients. The error-locator polynomial E(x) = (x - e_1)···(x - e_k) has degree k and is described by k + 1 coefficients, but its leading coefficient (the coefficient of x^k) is always 1. So we have:

Q(x) = a_{n+k-1} x^{n+k-1} + ··· + a_1 x + a_0
E(x) = x^k + b_{k-1} x^{k-1} + ··· + b_1 x + b_0

Once we fix a value i for x, the received value R(i) is fixed. Moreover, Q(i) is a linear function of the n + k coefficients a_{n+k-1}, ..., a_0, and E(i) is a linear function of the k coefficients b_{k-1}, ..., b_0. Therefore the equation Q(i) = R(i)E(i) is a linear equation in the n + 2k unknowns a_{n+k-1}, ..., a_0 and b_{k-1}, ..., b_0. We thus have n + 2k linear equations, one for each value of i, in n + 2k unknowns. We can solve these equations to obtain E(x) and Q(x), and then compute the ratio Q(x)/E(x) to obtain P(x).

Example. Suppose we are working over GF(7) and Alice wants to send Bob the n = 3 characters 3, 0, and 6 over a modem. Turning to the analogy of the English alphabet, this is equivalent to using only the first 7 letters of the alphabet, where a = 0, ..., g = 6; so the message Alice wishes Bob to receive is "dag". Alice interpolates to find the polynomial P(x) = x^2 + x + 1, the unique polynomial of degree 2 such that P(1) = 3, P(2) = 0, and P(3) = 6. Suppose that k = 1 character is corrupted, so she needs to transmit the n + 2k = 5 characters P(1) = 3, P(2) = 0, P(3) = 6, P(4) = 0, and P(5) = 3 to Bob. Suppose P(1) is corrupted, so Bob receives 2 instead of 3 (i.e. Alice sends the encoded message "dagad" but Bob receives "cagad").

Let E(x) = x - e_1 be the error-locator polynomial (remember, Bob doesn't yet know what e_1 is, since he doesn't know where the error occurred), and let Q(x) = P(x)E(x) be the degree-3 polynomial satisfying Q(x) = R(x)E(x) for x = 1, ..., 5. Now Bob just substitutes x = 1, x = 2, ..., x = 5 to get five linear equations in five unknowns (recall that we are working modulo 7 and that R(i) = c_i is the value Bob received for the i-th character):

a_3 + a_2 + a_1 + a_0 + 5b_0 = 2
a_3 + 4a_2 + 2a_1 + a_0 = 0
6a_3 + 2a_2 + 3a_1 + a_0 + b_0 = 4
a_3 + 2a_2 + 4a_1 + a_0 = 0
6a_3 + 4a_2 + 5a_1 + a_0 + 4b_0 = 1

Bob solves this linear system and finds a_3 = 1, a_2 = 0, a_1 = 0, a_0 = 6, and b_0 = -e_1 = -1 ≡ 6 (mod 7). (As a check, we expect e_1 to be the index at which the error occurred; indeed, e_1 = 1 and the first character was corrupted from a "d" to a "c".) This gives him the polynomials Q(x) = x^3 + 6 and E(x) = x - 1. He can then find P(x) by computing the quotient P(x) = Q(x)/E(x) = (x^3 + 6)/(x - 1) = x^2 + x + 1. Bob notices that the first character was corrupted (since e_1 = 1), so now that he has P(x) he simply computes P(1) = 3 = "d" and obtains the original, uncorrupted message "dag".
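
The same computation can be carried out mechanically. Here is a minimal Python sketch (not part of the notes) that builds this system of five equations and solves it by Gaussian elimination over GF(7); the helper name solve_mod is an illustrative choice, and the code assumes the system has a unique solution, as it does here.

# Berlekamp-Welch decoding for the "dag" example: n = 3, k = 1, q = 7, and the
# received word is R = (2, 0, 6, 0, 3). The unknowns are the coefficients
# a3, a2, a1, a0 of Q(x) and b0 of E(x) = x + b0; position i contributes the
# equation Q(i) = R(i)*(i + b0), i.e. a3*i^3 + a2*i^2 + a1*i + a0 - R(i)*b0 = R(i)*i.

q = 7
R = {1: 2, 2: 0, 3: 6, 4: 0, 5: 3}

# Augmented matrix with columns a3, a2, a1, a0, b0 | right-hand side.
rows = [[pow(i, 3, q), pow(i, 2, q), i % q, 1, -R[i] % q, R[i] * i % q] for i in R]

def solve_mod(rows, q):
    """Gauss-Jordan elimination over GF(q); assumes a unique solution exists."""
    m = [r[:] for r in rows]
    n = len(m)
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] % q != 0)
        m[col], m[pivot] = m[pivot], m[col]
        inv = pow(m[col][col], -1, q)
        m[col] = [x * inv % q for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] % q != 0:
                factor = m[r][col]
                m[r] = [(x - factor * y) % q for x, y in zip(m[r], m[col])]
    return [row[-1] for row in m]

a3, a2, a1, a0, b0 = solve_mod(rows, q)
print("Q(x) coefficients:", a3, a2, a1, a0)    # 1, 0, 0, 6, i.e. Q(x) = x^3 + 6
print("E(x) = x +", b0)                         # b0 = 6, i.e. E(x) = x - 1 over GF(7)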

Finer Points

Two points need further discussion. First, how do we know that the n + 2k equations are consistent? What if they have no solution? This is simple: the equations must be consistent, since Q(x) = P(x)E(x) together with the true error-locator polynomial E(x) gives a solution.

A more interesting question is this: how do we know that the n + 2k equations are independent, i.e. how do we know that there are no spurious solutions in addition to the real solution we are looking for? Put mathematically, how do we know that any solution Q'(x), E'(x) that we reconstruct satisfies the property that E'(x) divides Q'(x) and that Q'(x)/E'(x) = P(x)? To see this, notice that Q(x)E'(x) = Q'(x)E(x) for 1 ≤ x ≤ n + 2k: this holds trivially whenever E(x) or E'(x) is 0, and otherwise it follows from the fact that Q(x)/E(x) = R(x) = Q'(x)/E'(x) at such points. But the degree of each of Q(x)E'(x) and Q'(x)E(x) is at most n + 2k - 1. Since these two polynomials are equal at n + 2k points, they must be the same polynomial, and rearranging we get Q'(x)/E'(x) = Q(x)/E(x) = P(x).
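
The final step, P(x) = Q(x)/E(x), is ordinary polynomial long division with all coefficient arithmetic done mod q. As a closing illustration (not part of the notes; the helper name poly_divmod is an illustrative choice), here is a sketch dividing the example's Q(x) = x^3 + 6 by E(x) = x - 1 over GF(7):

# Polynomial long division over GF(q). Polynomials are coefficient lists,
# highest degree first.

def poly_divmod(num, den, q):
    """Return (quotient, remainder) of num / den over GF(q)."""
    num = [c % q for c in num]
    inv_lead = pow(den[0], -1, q)               # inverse of the divisor's leading coefficient
    quot = []
    while len(num) >= len(den):
        coeff = num[0] * inv_lead % q           # next quotient coefficient
        quot.append(coeff)
        for j in range(len(den)):               # subtract coeff * den, aligned at the front
            num[j] = (num[j] - coeff * den[j]) % q
        num.pop(0)                              # the leading term is now zero
    return quot, num

Q = [1, 0, 0, 6]        # x^3 + 6
E = [1, 6]              # x - 1, written as x + 6 over GF(7)
P, remainder = poly_divmod(Q, E, 7)
print(P, remainder)     # [1, 1, 1] and [0]: P(x) = x^2 + x + 1, remainder 0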