COMPSCI 650 Applied Information Theory    Apr 5, 2016    Lecture 18
Instructor: Arya Mazumdar    Scribe: Hamed Zamani, Hadi Zolfaghari, Fatemeh Rezaei

1 Correcting Errors in Linear Codes

Suppose someone wants to send us a message $x$ over a noisy channel. To do so, the sender first obtains the corresponding codeword $c$ by multiplying $x$ by the generator matrix $G$:

$$ \underbrace{x}_{1 \times k}\,\underbrace{G}_{k \times n} = \underbrace{c}_{1 \times n} \qquad (1) $$

The sender then transmits the codeword $c$ over the channel; however, the channel introduces some error $e$:

$$ c + e = y \qquad (2) $$

On the other side of the channel we receive $y$, and our objective is to infer the original message $x$. If we could somehow identify the error $e$ that the channel introduced, we could recover the codeword $c$ sent by the sender, which would directly give us the original message $x$. Let us multiply the parity-check matrix $H$ by $y$:

$$ Hy^T = H(c^T + e^T) = Hc^T + He^T $$

$Hc^T$ is zero by definition ($c$ is a codeword if and only if multiplying it by $H$ gives $0$), thus

$$ Hy^T = He^T. \qquad (3) $$

This quantity is called the syndrome of the received word.

If every correctable error $e$ that the channel may produce yields a different value of the syndrome $He^T$, then we can uniquely identify the error simply by multiplying the parity-check matrix by the received word $y$. And if we can identify the error in the received word, we can recover the codeword by flipping the erroneous bits in the received bit stream. More formally, if $E$ is the set of errors we are able to correct,

$$ E = \{\text{set of correctable errors}\} \subseteq \{0,1\}^n, $$

then for all distinct $e_1, e_2 \in E$ we need $He_1^T \neq He_2^T$, or in other words $H(e_1^T + e_2^T) \neq 0$. (Note that $+$ and $-$ are the same in binary.)

Example. Error Correcting in the Hamming Code

For the Hamming code we have the parity-check matrix

$$ H = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix} \qquad (4) $$

Note that all columns of this matrix are different. Suppose there is only one non-zero entry in our error $e$. Then $He^T$ simply produces one of the distinct columns of $H$. So for the Hamming code, the set of correctable errors is the set of errors with exactly one non-zero entry:

$$ E = \{e \in \{0,1\}^n : \mathrm{wt}(e) = 1\}. $$
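As a concrete illustration, here is a minimal sketch of syndrome decoding of a single bit error with this parity-check matrix. The use of numpy and the helper name decode_single_error are our own choices, not part of the notes.

```python
# A minimal sketch of single-error syndrome decoding, assuming the H above.
import numpy as np

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def decode_single_error(y):
    """Correct at most one flipped bit in the received word y."""
    y = np.array(y)
    syndrome = H @ y % 2                       # s = H y^T = H e^T
    if not syndrome.any():
        return y                               # zero syndrome: y is already a codeword
    for j in range(H.shape[1]):                # the syndrome equals the column of H
        if np.array_equal(H[:, j], syndrome):  # at the position of the flipped bit
            y[j] ^= 1                          # flip that bit back
            return y
    raise ValueError("syndrome matches no single column: more than one error")

# Flip bit 5 of the all-zero codeword; the decoder recovers the codeword.
print(decode_single_error([0, 0, 0, 0, 1, 0, 0]))   # -> [0 0 0 0 0 0 0]
```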

As you can see, we can identify any one-bit error in the Hamming code. Recall that the minimum distance of the Hamming code is 3, that is, the Hamming distance between any two codewords is at least 3. If the minimum distance were any lower, say 2, then a single-bit error could leave us as far from the original codeword as from some other codeword, and we would not be able to distinguish which of the two is the original codeword. However, as long as the minimum distance between any two codewords is 3, any single-bit error still keeps us closer to the original codeword and is correctable. To confuse one Hamming codeword with another, we need at least two errors.

Now what if we want to correct more than one error? In other words, we want the set of correctable errors to be

$$ E = \{e \in \{0,1\}^n : \mathrm{wt}(e) \le t\}. $$

To do so, we must build our parity-check matrix $H$ in such a way that

$$ \forall\, e_1, e_2 \in E,\ e_1 \neq e_2 : \quad H(e_1^T + e_2^T) \neq 0. $$

BCH codes can correct $t$ errors as long as $t$ is a constant (i.e., this does not work if $t$ grows as a function of $n$). BCH codes are beyond the scope of this class, but you can learn more in next year's Coding Theory course.

2 Correcting Erasures in Linear Codes

In the previous section we investigated decoding messages on channels that introduce errors (i.e., flip codeword bits). In this section we will look at a channel that erases bits: each bit either remains unchanged or is changed to a new symbol "?".

Question. Suppose we have a code with minimum codeword distance $d$. How many bit errors and how many erasures can this code correct?

We'll start with errors. In order to correct an error, the sequence that results from applying the error to a codeword must remain closer to that codeword than to any other codeword. Knowing that the distance between the original codeword and any other is at least $d$, the maximum number of error bits we can tolerate is $\lfloor \frac{d-1}{2} \rfloor$. Any more than this and the result may be closer to, or as close to, another codeword, and we will not be able to infer the original codeword. In the case of erasures, even with up to $d-1$ erasures we are still closer to the original codeword than to any other; however, any more than that may make us confuse the codeword with another.

As an example, in the Hamming code, where the minimum codeword distance is 3, we can correct up to $\lfloor \frac{3-1}{2} \rfloor = 1$ error and $3 - 1 = 2$ erasures.

2.1 Singleton bound

In the Hamming code, any two columns of the parity-check matrix $H$ are linearly independent. If you multiply $H$ by any vector of weight two, which essentially means summing two columns of $H$, the result is never zero. However, for the Hamming code, three columns of the parity-check matrix may be linearly dependent.

In general, in an error-correcting code with minimum distance $d$, any $d-1$ columns of the parity-check matrix are linearly independent. Conversely, to obtain an error-correcting code with minimum distance $d$, we need to construct a parity-check matrix in which any $d-1$ columns are linearly independent.
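These column conditions are easy to check numerically. The sketch below (again assuming numpy; the function name is ours) verifies that any two columns of the Hamming $H$ are linearly independent while some three columns sum to zero, which is consistent with minimum distance $d = 3$, and hence with correcting $\lfloor (d-1)/2 \rfloor = 1$ error or $d-1 = 2$ erasures.

```python
# A small numerical check of the column conditions, assuming the same H as above.
# The smallest number of columns of H that sum to zero (mod 2) is the minimum distance.
import itertools
import numpy as np

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def some_columns_sum_to_zero(H, size):
    """True if some choice of `size` columns of H sums to the zero vector mod 2."""
    for cols in itertools.combinations(range(H.shape[1]), size):
        if not (H[:, list(cols)].sum(axis=1) % 2).any():
            return True
    return False

print(some_columns_sum_to_zero(H, 2))   # False: any two columns are independent
print(some_columns_sum_to_zero(H, 3))   # True: e.g. columns 1, 2, 3 sum to zero
d = 3
print("correctable errors:", (d - 1) // 2, "| correctable erasures:", d - 1)   # 1 and 2
```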

Since each column of $H$ has length $n - k$, at most $n - k$ columns of $H$ can be linearly independent. Therefore

$$ d - 1 \le n - k $$
$$ d \le n - k + 1 $$
$$ \frac{d}{n} \le 1 - \frac{k}{n} + \frac{1}{n} $$
$$ \frac{d}{n} \le 1 - R + O\!\left(\tfrac{1}{n}\right), $$

where $R = k/n$ is the rate of the code. This is called the Singleton bound.

2.2 Correcting Erasures

Let's consider an example. Suppose $H$ is the parity-check matrix of the Hamming code and $x$ is a bit stream we have received over the channel. As shown below, three bits of the codeword have been erased by the channel. Assuming bits $x_2, x_3, x_5, x_6$ have been received correctly, our objective is to deduce the values of bits $x_1, x_4, x_7$.

$$ H = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix}, \qquad x = (\,?\,,\,1\,,\,0\,,\,?\,,\,0\,,\,0\,,\,?\,) = (x_1, x_2, x_3, x_4, x_5, x_6, x_7) $$

If $x$ is to be a valid codeword, we know by definition that $Hx^T = 0$:

$$ \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{pmatrix} = 0 $$

We expand this multiplication to obtain the equations below:

$$ x_4 + x_5 + x_6 + x_7 = 0 $$
$$ x_2 + x_3 + x_6 + x_7 = 0 $$
$$ x_1 + x_3 + x_5 + x_7 = 0 $$

We already know the values of bits $x_2, x_3, x_5, x_6$, so the equations reduce to

$$ x_4 + x_7 = 0 $$
$$ x_7 = 1 $$
$$ x_1 + x_7 = 0 $$
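As a quick sanity check of this reduction, the short sketch below (assuming the same $H$; variable names are ours) substitutes the known bits into $Hx^T = 0$ and prints the reduced constraints on the erased bits.

```python
# A quick check of the reduction above: substitute the known bits x2=1, x3=0, x5=0, x6=0
# into H x^T = 0, leaving constraints on the erased bits x1, x4, x7 only.
import numpy as np

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

known = {1: 1, 2: 0, 4: 0, 5: 0}   # 0-indexed positions and values of x2, x3, x5, x6
erased = [0, 3, 6]                 # 0-indexed positions of x1, x4, x7

for row in H:
    unknowns = [f"x{j + 1}" for j in erased if row[j]]
    rhs = sum(row[j] * v for j, v in known.items()) % 2   # known terms moved to the right
    print(" + ".join(unknowns), "=", rhs)
# Prints the reduced system:  x4 + x7 = 0,  x7 = 1,  x1 + x7 = 0
```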

We can solve this system of equations by writing it in matrix form:

$$ \underbrace{\begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}}_{H_{\text{unknown}}} \underbrace{\begin{pmatrix} x_1 \\ x_4 \\ x_7 \end{pmatrix}}_{x_{\text{unknown}}} = \underbrace{\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}}_{H_{\text{known}}\, x_{\text{known}}^T} $$

In other words, multiplying the unknown columns of $H$ by the unknown variables must equal multiplying the known columns of $H$ by the known variables. Our decoding objective is to solve this set of linear equations. The system is solvable as long as $H_{\text{unknown}}$ is invertible, which is the case in this example:

$$ \begin{pmatrix} x_1 \\ x_4 \\ x_7 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} $$

Note that we previously said we can correct two erasures in the Hamming code; however, in this example we were able to correct three. This is because the three erased columns in this example are linearly independent, which is not always the case when correcting more than two erasures in the Hamming code. For example, if $x_1, x_2, x_3$ had been erased, $H_{\text{unknown}}$ would not have been invertible and we would not have been able to decode the message.

We are now able to correct erasures in polynomial time. However, our decoding performance is bottlenecked at $O(n^3)$ by the matrix inversion step, which is unacceptable considering that we take $n$ to be very large. We need an easier and faster way to solve the linear system. Consider our previous example: we can directly find the value of $x_7$ to be 1, since it appears in an equation with only a single unknown. We can then remove $x_7$ from the other equations and find $x_1$ and $x_4$ in the same way. Obviously this method cannot be applied to all linear systems in general, but we will formalize this procedure in the next section using Tanner graphs.
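Before moving to Tanner graphs, here is a sketch of the matrix-based erasure decoder described above: it builds $H_{\text{unknown}}$ and the right-hand side $H_{\text{known}}\, x_{\text{known}}^T$, then solves over GF(2) by Gaussian elimination (equivalent to inverting $H_{\text{unknown}}$ when it is invertible). The function names solve_gf2 and decode_erasures are illustrative, not from the notes.

```python
# A sketch of the matrix-based erasure decoder; erased positions of y are marked None.
import numpy as np

def solve_gf2(A, b):
    """Solve A x = b over GF(2) by elimination; return None if there is no unique solution."""
    A, b = A.copy() % 2, np.array(b) % 2
    m = A.shape[1]                                # number of unknowns
    for col in range(m):
        pivot = next((r for r in range(col, A.shape[0]) if A[r, col]), None)
        if pivot is None:
            return None                           # the unknown columns are dependent
        A[[col, pivot]], b[[col, pivot]] = A[[pivot, col]], b[[pivot, col]]
        for r in range(A.shape[0]):
            if r != col and A[r, col]:
                A[r] ^= A[col]                    # eliminate this unknown from row r
                b[r] ^= b[col]
    return b[:m]

def decode_erasures(H, y):
    """Recover erased positions of y, assuming H c^T = 0 for every codeword c."""
    y = list(y)
    erased = [j for j, v in enumerate(y) if v is None]
    known = [j for j, v in enumerate(y) if v is not None]
    b = H[:, known] @ np.array([y[j] for j in known]) % 2   # H_known x_known^T
    solution = solve_gf2(H[:, erased], b)                   # H_unknown x_unknown^T = b
    if solution is None:
        return None
    for j, v in zip(erased, solution):
        y[j] = int(v)
    return y

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])
print(decode_erasures(H, [None, 1, 0, None, 0, 0, None]))   # -> [1, 1, 0, 1, 0, 0, 1]
```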

3 Iterative Decoding using Tanner Graphs

A Tanner graph is a bipartite graph with $n$ variable nodes, one for each variable, on one side of the graph and $n - k$ check nodes on the other. Each check node represents a linear constraint, and there is an edge between a check node and a variable node if the variable appears in the equation represented by the check node. In other words, you can consider the parity-check matrix $H$ to be the adjacency matrix of the Tanner graph. The Tanner graph of the code with the parity-check matrix below is shown in Figure 1. A different parity-check matrix leads to a different Tanner graph representation.

$$ H = \begin{pmatrix} 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{pmatrix} $$

Figure 1: The Tanner graph of the code above. The variable nodes $x_1, \ldots, x_7$ are on one side and the three check nodes on the other; check node 1 is connected to $x_4, x_5, x_6, x_7$, check node 2 to $x_2, x_3, x_6, x_7$, and check node 3 to $x_1, x_3, x_5, x_7$.

We will use Tanner graphs to solve linear systems such as the one obtained in Section 2.2. The procedure is as follows. Define the erasure set $E$ to be the subset of variable nodes that are erased and unknown. We look for a check node in the Tanner graph that has exactly one neighbor within the erasure set, that is, a check node whose linear constraint contains only one unknown variable. Our iterative decoding procedure can proceed as long as we can find such a node. Once such a check node is found, we sum the values of all its known neighbors and assign the result to the unknown variable. This variable is now known and is removed from the erasure set. We continue this procedure iteratively until all variable values are known.

Example. Consider the example in Section 2.2. We will solve the linear system using the Tanner graph; Figure 2 illustrates each step of the procedure. The initial erasure set is

$$ E = \{x_1, x_4, x_7\}. $$

Check node 2 is the only check node with a single neighbor in the erasure set:

Check node 1: $\{x_4, x_7\}$
Check node 2: $\{x_7\}$
Check node 3: $\{x_1, x_7\}$

We sum the values of check node 2's known variables, $x_2, x_3, x_6$, and place the result, 1, as the value of $x_7$ (Figure 2a). The new erasure set contains only $x_1$ and $x_4$, and both remaining check nodes now have exactly one neighbor in the erasure set. We apply the same process to these check nodes and obtain 1 as the value of both variables (Figure 2b and Figure 2c).

Figure 2: The steps of the iterative decoding procedure: (a) $x_7$ is recovered from check node 2 and set to 1; (b) $x_4$ is recovered and set to 1; (c) $x_1$ is recovered and set to 1.
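The iterative procedure just illustrated can be sketched directly in code. This is a simplified peeling decoder over the Tanner graph (the helper name peel_decode is ours); it stops and reports failure when no check node has exactly one erased neighbor.

```python
# A sketch of the iterative (peeling) decoder described above, assuming the same H.
# Each check node is represented by the set of variable positions in its equation;
# erased bits of y are marked None.
import numpy as np

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def peel_decode(H, y):
    """Iteratively fill in erased bits; return None if the decoder gets stuck."""
    y = list(y)
    checks = [set(np.flatnonzero(row)) for row in H]   # neighbors of each check node
    erased = {j for j, v in enumerate(y) if v is None}
    while erased:
        for neighbors in checks:
            hit = neighbors & erased
            if len(hit) == 1:                          # exactly one erased neighbor
                j = hit.pop()
                # The erased bit is the mod-2 sum of the check's known neighbors.
                y[j] = sum(y[i] for i in neighbors if i != j) % 2
                erased.remove(j)
                break
        else:
            return None      # stuck: no check node has exactly one erased neighbor
    return y

print(peel_decode(H, [None, 1, 0, None, 0, 0, None]))   # -> [1, 1, 0, 1, 0, 0, 1]
```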

3.0.1 Stopping Sets and Sparse Graph Codes

To succeed in iteratively solving the linear system on a Tanner graph, we must be able to find a check node with a single neighbor in the erasure set at every step of the process. We define a stopping set to be a subset $S$ of variable nodes such that every check node with a neighbor in $S$ has at least two neighbors in $S$. More formally, $S$ is a stopping set if

$$ S \subseteq V_{\text{variable}} \quad \text{and} \quad \forall\, v \in V_{\text{check}} :\ |N(v) \cap S| \neq 1, $$

i.e., no check node has exactly one neighbor inside $S$.

If the erasure set contains a stopping set ($S \subseteq E$), then the iterative algorithm above will fail. If the graph is dense, a large enough erasure set will always contain a stopping set. However, if the graph is sparse, meaning that every linear equation in the system involves only a few variables, it is unlikely that the erasure set contains a stopping set.

Sparse graph codes, also known as low-density parity-check (LDPC) codes, are codes in which the number of ones in the parity-check matrix is small compared to the number of zeros: the number of ones grows only linearly with $n$, rather than with the total number $n(n-k)$ of entries. The benefit of low-density parity-check codes is that iterative algorithms can be used for decoding.

4 Belief Propagation Decoding

Suppose $c$ is a binary codeword that has been sent over the channel, and suppose the channel probabilistically introduces some errors. To decode the message, we could find the most probable codeword given the observation $y$:

$$ \arg\max_{c}\ p(c \mid y) $$

This is called maximum a posteriori (MAP) decoding, and it is the best possible decoding. However, performing MAP decoding directly is often computationally infeasible. We will look at two different approaches to this problem.

4.1 Maximum Likelihood Decoding

In maximum likelihood decoding, we use Bayes' rule to rewrite the MAP maximization above as

$$ \arg\max_{c}\ \frac{p(y \mid c)\, p(c)}{p(y)}. $$

Note that $p(y)$ can be dropped from the maximization since it does not depend on $c$. Assuming a uniform prior distribution over codewords, meaning all codewords are equally likely to occur, we can also drop $p(c)$:

$$ \arg\max_{c}\ p(y \mid c) $$

Given the memoryless property of the channel, this can be computed as

$$ p(y \mid c) = \prod_{i=1}^{n} p(y_i \mid c_i). $$
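To make the factorization concrete, the brute-force sketch below enumerates all codewords of the (7,4) Hamming code from this lecture and maximizes $\prod_i p(y_i \mid c_i)$ for a generic memoryless binary channel, given as a table $W[c][y] = p(y \mid c)$. The names ml_decode and W are ours, and the exhaustive enumeration is only meant as an illustration, since it is exponential in general.

```python
# A brute-force sketch of maximum likelihood decoding for a memoryless binary channel.
from itertools import product
import numpy as np

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

# The 16 codewords of the (7,4) Hamming code: all c with H c^T = 0.
codewords = [c for c in product([0, 1], repeat=7) if not (H @ c % 2).any()]

def likelihood(c, y, W):
    """p(y | c) = prod_i W[c_i][y_i] for a memoryless channel."""
    prob = 1.0
    for ci, yi in zip(c, y):
        prob *= W[ci][yi]
    return prob

def ml_decode(y, W):
    """Return the codeword maximizing p(y | c)."""
    return max(codewords, key=lambda c: likelihood(c, y, W))

# Example with an asymmetric channel: p(flip | 0) = 0.1, p(flip | 1) = 0.2.
W = [[0.9, 0.1], [0.2, 0.8]]
print(ml_decode([0, 0, 0, 0, 1, 0, 0], W))   # -> (0, 0, 0, 0, 0, 0, 0)
```

The asymmetric channel in the example is deliberate: nothing in this rule yet assumes the binary symmetric channel considered next.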

Figure 3: The binary symmetric channel. Each input bit is received correctly with probability $1-p$ and flipped with probability $p$.

In the case of the binary symmetric channel, $p(y_i \mid c_i)$ depends only on whether $c_i$ equals $y_i$ or not. The channel flips the value of a bit with probability $p$, so $c_i$ and $y_i$ differ with probability $p$. If the Hamming distance between $y$ and $c$ is denoted $d(y,c)$, then $y$ and $c$ differ in exactly $d(y,c)$ positions, so the likelihood reduces to

$$ p(y \mid c) = p^{\,d(y,c)} (1-p)^{\,n - d(y,c)} = (1-p)^n \left( \frac{p}{1-p} \right)^{d(y,c)}. $$

If $p < \frac{1}{2}$ (a reasonable assumption, since $p = \frac{1}{2}$ would give a channel that provides no information), then $\frac{p}{1-p} < 1$ and the maximization above is equivalent to minimizing $d(y,c)$:

$$ \arg\min_{c}\ d(y,c) $$

In other words, as in minimum Hamming distance decoding, we find the codeword that has the smallest Hamming distance to the received bit stream $y$.

4.2 Belief Propagation Decoding

In belief propagation, instead of maximum likelihood decoding we perform bit MAP (bitwise MAP) decoding. In bit MAP decoding, we find the value of each bit $i$ by computing

$$ \arg\max_{x \in \{0,1\}}\ p(c_i = x \mid y). $$

The $i$-th bit takes the value that has the higher probability of occurring as the $i$-th bit of the codeword, given $y$. As you can see, in bit MAP decoding we are not optimizing over the whole codeword but rather over each bit of the codeword separately. As a result, the bit stream we end up with may not necessarily be a codeword, although it often turns out to be one. To compute the value of each bit $i$, we must sum the probabilities of all codewords whose $i$-th bit equals $x$. So the maximization above is equivalent to

$$ \arg\max_{x \in \{0,1\}} \sum_{c\,:\,c_i = x} p(c \mid y). $$
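Here is a brute-force sketch of this bitwise MAP rule, under the same uniform-prior assumption so that $p(c \mid y)$ is proportional to $p(y \mid c)$; the names bit_map_decode and W are again ours.

```python
# A brute-force sketch of bitwise MAP decoding with a uniform prior over codewords.
from itertools import product
import numpy as np

H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])
codewords = [c for c in product([0, 1], repeat=7) if not (H @ c % 2).any()]

def likelihood(c, y, W):
    """p(y | c) = prod_i W[c_i][y_i] for a memoryless channel."""
    prob = 1.0
    for ci, yi in zip(c, y):
        prob *= W[ci][yi]
    return prob

def bit_map_decode(y, W):
    """For each i, pick arg max_x of the sum of p(y | c) over codewords c with c_i = x."""
    decoded = []
    for i in range(len(y)):
        marginals = [sum(likelihood(c, y, W) for c in codewords if c[i] == x)
                     for x in (0, 1)]
        decoded.append(0 if marginals[0] >= marginals[1] else 1)
    return decoded

# Binary symmetric channel with p = 0.1; a single flipped bit is corrected bit by bit.
p = 0.1
W = [[1 - p, p], [p, 1 - p]]
print(bit_map_decode([0, 0, 0, 0, 1, 0, 0], W))   # -> [0, 0, 0, 0, 0, 0, 0]
```

Enumerating all codewords is of course exponential in the code dimension; belief propagation obtains (approximations of) these bit marginals far more cheaply by passing messages along the edges of the Tanner graph.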