Department of Computer Sciences Applied Algorithms Lab. July 24, 2011
Outline
1 Introduction
2 Algorithms for LDPC
3 Properties
4 Iterative Learning in Crowds
5 Algorithm
6 Results
7 Conclusion
PART I LDPC Codes
Coding Theory Basics
- We wish to send an original message over a noisy channel.
- The message is encoded into a code: redundant bits are added for robustness.
- The code gets corrupted while passing through the noisy channel.
- Bit-flip errors (BSC: Binary Symmetric Channel): each bit flips w.p. p (the error probability).
- Restoring the corrupted code to the original code is called decoding.
Figure: Communication Channel Diagram
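As a minimal illustration of the BSC, the following Python sketch flips each bit of a codeword independently with probability p (the helper name bsc and the sample codeword are ours; numpy assumed):

```python
import numpy as np

def bsc(codeword, p, rng=None):
    """Pass a binary codeword through a binary symmetric channel:
    each bit is flipped independently with probability p."""
    rng = np.random.default_rng() if rng is None else rng
    flips = rng.random(codeword.shape) < p      # True where a bit-flip error occurs
    return np.bitwise_xor(codeword, flips.astype(codeword.dtype))

w = np.array([0, 1, 1, 0, 1, 0, 0, 1])
print(bsc(w, p=0.1))   # roughly one bit in ten is flipped
```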
LDPC Codes
- Low-Density: sparse constraints on each bit.
- Parity-Check: bits in the same constraint must sum to 0 (modulo 2).
- The graphical representation is a regular bipartite graph.
- As a linear code, the Hamming-distance profile looks the same from every codeword.
Figure: Parity-Check Diagram for a (7,2,4)-code. Square nodes are check nodes and circles are message nodes.
Matrix Representation
A $c \times n$ matrix representation of an LDPC code, where there are $c$ check nodes and $n$ message nodes (i.e. length-$n$ codewords):
$$H = \begin{pmatrix} 0 & 1 & 0 & 1 & 1 & 0 & 0 & 1 \\ 1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 \\ 1 & 0 & 0 & 1 & 1 & 0 & 1 & 0 \end{pmatrix}$$
Characteristics:
- Each row and each column has the same number of 1s.
- The rows specify the parity constraints: e.g. for each codeword, the 2nd, 4th, 5th, and 8th bits should sum to 0 (by the first row), and similarly for the subsequent rows.
Code Verification - Constraints on Codewords
Given a code $w$, how do we know if it's a valid codeword?
- It must satisfy the parity constraints, i.e. sum to 0 for each row.
- For a given codeword $w$ of length $n$, the following must be satisfied: $Hw = 0$, where $H \in \{0,1\}^{c \times n}$ and $w \in \{0,1\}^n$ (arithmetic mod 2).
- That is, all valid codewords belong to the null space of the LDPC matrix.
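A quick sketch of this check in Python (numpy assumed; H is the matrix from the previous slide, and the helper name is ours):

```python
import numpy as np

# The 4 x 8 parity-check matrix from the previous slide.
H = np.array([[0, 1, 0, 1, 1, 0, 0, 1],
              [1, 1, 1, 0, 0, 1, 0, 0],
              [0, 0, 1, 0, 0, 1, 1, 1],
              [1, 0, 0, 1, 1, 0, 1, 0]])

def is_codeword(H, w):
    """w is a valid codeword iff Hw = 0 over GF(2), i.e. w lies in H's null space."""
    return not np.any(H @ w % 2)

print(is_codeword(H, np.zeros(8, dtype=int)))              # True: 0 is always a codeword
print(is_codeword(H, np.array([1, 0, 0, 0, 0, 0, 0, 0])))  # False: violates rows 2 and 4
```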
Encoding
A generator matrix $G$ for an LDPC code provides a basis for the code, such that
$$w = \sum_i m_i g_i = mG,$$
where $m$ is the $k$-bit original message and $w$ is a valid codeword.
- $G$ and $H$ have a dual relationship in that $G = [I_k \mid P]$ and $H = [P^T \mid I_{n-k}]$.
- Hence, to derive $G$ from $H$, put $H$ in the form $[P^T \mid I_{n-k}]$ through row operations, and set $G = [I_k \mid P]$.
- But this encoding scheme can take $O(n^2)$ operations; linear-time encoding schemes exist.
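A minimal sketch of the last step, assuming $H$ has already been brought into the systematic form $[P^T \mid I_{n-k}]$ by row operations (the toy matrix and helper name below are ours; numpy assumed):

```python
import numpy as np

def generator_from_systematic(H_sys, k):
    """Given H in systematic form [P^T | I_{n-k}], return G = [I_k | P] over GF(2)."""
    P = H_sys[:, :k].T                          # P^T is the left k columns of H
    return np.hstack([np.eye(k, dtype=int), P])

# Toy systematic parity-check matrix (k = 2, n = 4), for illustration only.
H_sys = np.array([[1, 1, 1, 0],
                  [0, 1, 0, 1]])
G = generator_from_systematic(H_sys, k=2)
# Every row of G -- hence every codeword w = mG -- lies in H's null space.
assert not np.any(H_sys @ G.T % 2)
```

Encoding is then a vector-matrix product, $w = mG \bmod 2$.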
Decoding
- Exact decoding of LDPC codes is NP-hard.
- Instead, we approximate it by belief propagation (message passing).
- The messages passed are log-likelihoods of the bits:
  - From a codeword node v to a check node c: the likelihood of v given the observed value at v (and the messages from v's other check nodes).
  - From a check node c to a codeword node v: the likelihood of v given all the messages passed to c by its other codeword nodes.
Decoding Formula Derivation [3]
Define the likelihood of a binary r.v. $x$ as
$$L(x) \triangleq \frac{P(x=0)}{P(x=1)}, \qquad L(x \mid y) \triangleq \frac{P(x=0 \mid y)}{P(x=1 \mid y)}.$$
If $P(x=0) = P(x=1)$, then $L(x \mid y) = L(y \mid x)$ by Bayes' rule. Therefore, if $y_1, \dots, y_d$ are independent r.v.'s,
$$\ln L(x \mid y_1, \dots, y_d) = \sum_i \ln L(x \mid y_i). \tag{1}$$
Eqn (1) will become the message from codeword node v to check node c.
Decoding Derivation (Cont.)
Let $l_i = \ln L(x_i \mid y_i)$. Consider the log-likelihood
$$\ln L(x_1 + \cdots + x_l \mid y_1, \dots, y_l) = \ln \frac{1 + \prod_{i=1}^{l} \tanh(l_i/2)}{1 - \prod_{i=1}^{l} \tanh(l_i/2)}$$
(summations are mod 2). This equation holds due to
$$2P(x_1 + \cdots + x_l = 0 \mid y_1, \dots, y_l) - 1 = \prod_{i=1}^{l} \big(2P(x_i = 0 \mid y_i) - 1\big)$$
and
$$2P(x_i = 0 \mid y_i) - 1 = \frac{L(x_i \mid y_i) - 1}{L(x_i \mid y_i) + 1} = \tanh(l_i/2).$$
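The identity is easy to sanity-check numerically; the sketch below brute-forces the even-parity probability under hypothetical bit posteriors (all names ours; numpy assumed):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
l = 4
p0 = rng.random(l)               # hypothetical posteriors P(x_i = 0 | y_i)
li = np.log(p0 / (1 - p0))       # log-likelihoods l_i = ln L(x_i | y_i)

# Left-hand side: 2 P(x_1 + ... + x_l = 0 | y) - 1, by enumerating all
# even-parity bit configurations.
prob_even = sum(np.prod(np.where(np.array(x) == 0, p0, 1 - p0))
                for x in product([0, 1], repeat=l) if sum(x) % 2 == 0)
lhs = 2 * prob_even - 1

# Right-hand side: the product of tanh(l_i / 2).
rhs = np.prod(np.tanh(li / 2))
assert np.isclose(lhs, rhs)      # the two sides agree
```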
Defining Messages
At round 0, $m_v$ is set to the log-likelihood of $v$ conditioned on the observed value. At round $k$,
$$m_{vc}^{(k)} = \begin{cases} m_v, & \text{if } k = 0 \\ m_v + \sum_{c' \in C_v \setminus \{c\}} m_{c'v}^{(k-1)}, & \text{if } k \ge 1 \end{cases}$$
$$m_{cv}^{(k)} = \ln \frac{1 + \prod_{v' \in V_c \setminus \{v\}} \tanh\big(m_{v'c}^{(k)}/2\big)}{1 - \prod_{v' \in V_c \setminus \{v\}} \tanh\big(m_{v'c}^{(k)}/2\big)}$$
To see that $m_{cv}$ makes sense, notice that $x_1 + \cdots + x_{v-1} + x_{v+1} + \cdots + x_p = 0$ (or 1) $\iff x_v = 0$ (or 1), due to the mod-2 sum and the parity constraint. Hence, $m_{cv}$ is a likelihood for the node $v$.
Decoding - Final Decision
Once the messages converge, we can set $\ln L(x_v \mid y_v) = \sum_{c \in C(v)} m_{cv}$, and make the decision on bit $v$ by:
$$\hat{x}_v = \begin{cases} 0, & \text{if } \ln L(x_v \mid y_v) > 0 \\ 1, & \text{if } \ln L(x_v \mid y_v) < 0 \end{cases}$$
If the log-likelihood is positive, it means $P(x = 0 \mid y) > P(x = 1 \mid y)$, and similarly for negative.
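Putting the messages and the final decision together, here is a compact, unoptimized sum-product decoder sketch (our code, not a tuned implementation; numpy assumed, with clipping added for numerical safety):

```python
import numpy as np

def bp_decode(H, llr, iters=20):
    """Belief-propagation decoding on the bipartite graph of H.
    llr[v] = ln L(x_v | y_v) is the channel log-likelihood of bit v."""
    c_idx, v_idx = np.nonzero(H)                 # one entry per edge (check, variable)
    m_vc = llr[v_idx].astype(float)              # round 0: m_vc = m_v
    m_cv = np.zeros_like(m_vc)
    edges = np.arange(len(c_idx))
    for _ in range(iters):
        # Check-to-variable: tanh rule over the *other* neighbours of each check.
        t = np.tanh(m_vc / 2)
        for e in edges:
            others = (c_idx == c_idx[e]) & (edges != e)
            prod = np.clip(np.prod(t[others]), -1 + 1e-12, 1 - 1e-12)
            m_cv[e] = np.log((1 + prod) / (1 - prod))
        # Variable-to-check: channel LLR plus the *other* incoming check messages.
        for e in edges:
            others = (v_idx == v_idx[e]) & (edges != e)
            m_vc[e] = llr[v_idx[e]] + np.sum(m_cv[others])
    # Decision: total log-likelihood per bit (channel term plus all incoming
    # check messages); positive means bit 0, negative means bit 1.
    total = llr.astype(float).copy()
    np.add.at(total, v_idx, m_cv)
    return (total < 0).astype(int)
```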
Error Bounds [1]
The error bound is given in terms of the depth of the tree (i.e. the number of iterations):

Theorem
Given an $(n, j, k)$-LDPC code, with
$$\frac{\ln\left[\frac{2kn}{2j(k-1)}\right]}{2\ln[(j-1)(k-1)]} \le m \le \frac{\ln n}{\ln[(j-1)(k-1)]},$$
the probability of a decoding error after $m$ iterations is
$$P_m \le \exp\left(-c_{jk}\left[\frac{2kn}{2j(k-1)}\right]^{\alpha}\right), \qquad \alpha = \begin{cases} \dfrac{\ln\frac{j-1}{2}}{2\ln[(j-1)(k-1)]}, & \text{if } j \text{ odd} \\[1ex] \dfrac{\ln\frac{j}{2}}{2\ln[(j-1)(k-1)]}, & \text{if } j \text{ even} \end{cases}$$
for suitable positive constants $c_{jk}$.
PART II Iterative Learning in Crowds [2]
Tasks and Workers
Workers:
- There are $n$ workers: $w_a$ for $a \in \{1, \dots, n\}$.
- Each worker gets $r$ tasks randomly.
Tasks:
- There are $m$ tasks: $t_i$ for $i \in \{1, \dots, m\}$.
- Each task is assigned to $l$ workers.
NOTE: The above settings imply $lm = rn$.
Responses
For simplicity, assume a binary response:
- Worker $w_a$ completing task $t_i$ yields response $A_{ia} \in \{\pm 1\}$.
- The correct answer for $t_i$ is $s_i = 1$.
- Let $p_a \triangleq P(A_{ia} = s_i)$, where the $p_a$ are drawn i.i.d. from some probability distribution.
Task Allocation
Given the setting, we can construct a bipartite graph $G(\{t_i\}, \{w_a\}, E, A)$:
- $E \subseteq [m] \times [n]$: $(i, a)$ is an edge iff $t_i$ is assigned to $w_a$.
- $A_{ia}$ is the weight assigned to edge $(i, a)$.
How to assign: according to a random $(l, r)$-regular (bipartite) graph, i.e. all tasks have degree $l$ and all workers have degree $r$. Among all possible $(l, r)$-regular graphs, we sample one uniformly at random; a sampling sketch follows.
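A common way to realize such an assignment in practice is the configuration model: give each task l half-edges and each worker r half-edges, then match them with a random permutation. The sketch below (names ours; numpy assumed) samples uniformly over such matchings, which may create repeated task-worker pairs, rather than exactly uniformly over simple $(l, r)$-regular graphs:

```python
import numpy as np

def random_lr_regular(m, l, r, rng=None):
    """Assign m tasks (degree l) to n = m*l/r workers (degree r) by randomly
    matching half-edges (configuration model)."""
    assert (m * l) % r == 0, "need lm = rn for some integer n"
    n = m * l // r
    rng = np.random.default_rng() if rng is None else rng
    task_stubs = np.repeat(np.arange(m), l)      # l half-edges per task
    worker_stubs = np.repeat(np.arange(n), r)    # r half-edges per worker
    rng.shuffle(worker_stubs)
    return list(zip(task_stubs, worker_stubs))   # edges (t_i, w_a)

edges = random_lr_regular(m=6, l=2, r=3)         # 12 edges over 4 workers
```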
Message Passing
Messages on the k-th iteration:
- $x_{i \to a}^{(k)}$: from task to worker.
- $y_{a \to i}^{(k)}$: from worker to task.
Initialization: $y_{a \to i}^{(0)} \sim N(1, 1)$, i.i.d.
$$x_{i \to a}^{(k)} = \sum_{b \in N(i) \setminus \{a\}} A_{ib}\, y_{b \to i}^{(k-1)}, \qquad y_{a \to i}^{(k)} = \sum_{j \in N(a) \setminus \{i\}} A_{ja}\, x_{j \to a}^{(k)}$$
Final Decision
For task $i$, run a predefined number of iterations $k$. The decision is made according to:
$$\hat{s}_i = \mathrm{sign}\big(x_i^{(k)}\big), \qquad x_i^{(k)} = \sum_{b \in N(i)} A_{ib}\, y_{b \to i}^{(k-1)}$$
If $x_i^{(k)} = 0$, flip a fair coin.
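The whole procedure fits in a few lines; the sketch below follows the message updates and decision rule above (our code and naming; numpy assumed, with responses stored per edge):

```python
import numpy as np

def crowd_decode(edges, A, m, k=10, rng=None):
    """Iterative crowd algorithm: edges is a list of (task i, worker a) pairs,
    A[(i, a)] in {+1, -1} is worker a's response on task i."""
    rng = np.random.default_rng() if rng is None else rng
    y = {e: rng.normal(1.0, 1.0) for e in edges}        # y^(0) ~ N(1, 1), i.i.d.
    for _ in range(k):
        # Task-to-worker: weighted sum over the task's *other* workers.
        x = {(i, a): sum(A[i, b] * y[i, b] for (i2, b) in edges
                         if i2 == i and b != a)
             for (i, a) in edges}
        # Worker-to-task: weighted sum over the worker's *other* tasks.
        y = {(i, a): sum(A[j, a] * x[j, a] for (j, a2) in edges
                         if a2 == a and j != i)
             for (i, a) in edges}
    # Final decision: aggregate all of task i's incoming messages.
    s_hat = np.empty(m, dtype=int)
    for i in range(m):
        x_i = sum(A[i, b] * y[i, b] for (i2, b) in edges if i2 == i)
        s_hat[i] = int(np.sign(x_i)) if x_i != 0 else rng.choice([-1, 1])
    return s_hat
```

The inner generator expressions scan all edges, so this sketch is quadratic in the number of edges; an adjacency-list representation would make each round linear.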
Theorems
From the random variables defined previously, we can derive probability bounds on mis-prediction:

Theorem
Assume $\hat{l}\hat{r}q^2 > 1$. Then
$$\lim_{k \to \infty} P\big(x_i^{(k)} \le 0\big) \le \exp\left(-\frac{1}{2} \cdot \frac{l^3(\hat{l}\hat{r}q^2 - 1)}{\hat{l}^3(4 + \hat{r}q)}\right),$$
where $\hat{x} \triangleq x - 1$ and $q \triangleq E[(2p-1)^2]$.
Interpretations
- With message passing, the error decreases exponentially as the rounds proceed.
- This holds regardless of how the tasks are assigned (as long as $(l, r)$-regularity is satisfied).
- Initial messages can be random, as long as their distribution has non-zero mean.
But also:
- Strict setting: binary responses, a uniform prior on the $p_a$'s.
- The theorem holds whenever $q^2\hat{l}\hat{r} > 1$, which can hold even if $r$ is large, i.e. even if workers process many irrelevant tasks (counter-intuitive).
References
[1] Robert Gallager. Low-density parity-check codes, 1963.
[2] David Karger, Sewoong Oh, and Devavrat Shah. Iterative learning from a crowd. In WIDS, 2011.
[3] Amin Shokrollahi. LDPC codes: An introduction, 2003.