Decomposition Methods for Large Scale LP Decoding Siddharth Barman Joint work with Xishuo Liu, Stark Draper, and Ben Recht
Outline Background and Problem Setup LP Decoding Formulation Optimization Framework ADMM Technical Core Projecting onto the Parity Polytope Numerical results Graphs
Efficient and Reliable Digital Transmission Error Correcting Codes Constructs that enable reliable delivery of digital data over unreliable (noisy) communication channels. Claude Shannon Richard Hamming
Model for Communication Channel [Figure: Alice sends x through a noisy channel; Bob receives x̂] We will focus on transmission of bit strings (x ∈ {0,1}^n) and the Binary Symmetric Channel (BSC). [Figure 1: BSC with crossover probability p — each transmitted bit is flipped independently with probability p]
Probabilistic implications of BSC Say x is the transmitted bit string and x̂ is the received bit string. The BSC is memoryless, hence Pr(x̂ | x) = ∏_i Pr(x̂_i | x_i). Example: Pr(x̂ = 001 | x = 000) = (1 − p) · (1 − p) · p.
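The memoryless factorization can be checked numerically. A minimal sketch; the function name and the choice p = 0.1 are illustrative, not from the talk:

```python
# Likelihood of a received string over a BSC: the channel is memoryless,
# so the probability factors into a product of per-bit terms.

def bsc_likelihood(x, xhat, p):
    """Pr(xhat received | x sent) over a BSC with crossover probability p."""
    prob = 1.0
    for xi, yi in zip(x, xhat):
        prob *= p if xi != yi else (1.0 - p)  # flip with prob p, else 1-p
    return prob

p = 0.1
# One flipped bit out of three: (1-p) * (1-p) * p = 0.081
print(bsc_likelihood([0, 0, 0], [0, 0, 1], p))
```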
Executive Summary of Error Correcting Codes [Figure: Alice sends x ∈ Dictionary through the noisy channel; Bob receives x̂, possibly ∉ Dictionary] Codewords C: a dictionary (structured set). Error detection: spell checking. Error correction: spell correction.
Maximum Likelihood (ML) Decoding With codebook C and received bit string x̂, ML decoding picks the codeword x ∈ C that maximizes the probability that x̂ was received given that x was sent, Pr(received x̂ | sent x): maximize Pr(x̂ | x) subject to x ∈ C ⇔ maximize ∏_i Pr(x̂_i | x_i) subject to x ∈ C ⇔ maximize Σ_i log Pr(x̂_i | x_i) subject to x ∈ C
Maximum Likelihood (ML) Decoding ML decoding: maximize Σ_i log Pr(x̂_i | x_i) s.t. x ∈ C. Negative log-likelihood ratios: for all i, γ_i = log(p/(1−p)) if x̂_i = 1, and γ_i = log((1−p)/p) if x̂_i = 0. ML decoding: minimize Σ_i γ_i x_i s.t. x ∈ C.
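The reformulation above can be exercised on a toy code by brute force. A sketch, assuming a tiny enumerable codebook (real codes are far too large for this; the repetition code below is chosen only for illustration):

```python
import math

# ML decoding via log-likelihood ratios: compute gamma from the received
# string, then minimize sum_i gamma_i * x_i over the codebook.

def llrs(xhat, p):
    # gamma_i = log((1-p)/p) if xhat_i = 0, log(p/(1-p)) if xhat_i = 1
    return [math.log((1 - p) / p) if b == 0 else math.log(p / (1 - p))
            for b in xhat]

def ml_decode(xhat, codebook, p):
    gamma = llrs(xhat, p)
    return min(codebook, key=lambda x: sum(g * xi for g, xi in zip(gamma, x)))

# Length-3 repetition code {000, 111}: ML decoding reduces to majority vote.
codebook = [(0, 0, 0), (1, 1, 1)]
print(ml_decode((1, 0, 1), codebook, p=0.1))  # -> (1, 1, 1)
```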
Low Density Parity Check (LDPC) Codes LDPC: x ∈ C iff all the parity checks are satisfied. Parity checks: (x_1, x_2, x_3), (x_1, x_3, x_4), (x_2, x_5, x_6), (x_4, x_5, x_6). Codeword bits: x_1, x_2, x_3, x_4, x_5, x_6. Example: x = (1 0 1 0 1 1). Parity check 1: (1 0 1), parity check 2: (1 1 0), parity check 3: (0 1 1), parity check 4: (0 1 1) — each sub-vector has even parity.
Decoding Low Density Parity Check (LDPC) Codes Parity checks: (x_1, x_2, x_3), (x_1, x_3, x_4), (x_2, x_5, x_6), (x_4, x_5, x_6); codeword bits x_1, …, x_6. Let P_j x be the sub-vector participating in the j-th parity check: P_1 x = (x_1, x_2, x_3), P_4 x = (x_4, x_5, x_6), and d = 3. P_d = {all even-parity bit-vectors of length d}. LDPC: x ∈ C if and only if P_j x ∈ P_d for all j.
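The membership condition "P_j x ∈ P_d for all j" is a few lines of code for the toy code on these slides (bit positions below are 0-indexed):

```python
# Codeword membership for the slides' toy LDPC code: x is a codeword iff
# every parity-check sub-vector P_j x has even parity.

checks = [(0, 1, 2), (0, 2, 3), (1, 4, 5), (3, 4, 5)]  # 0-indexed bits

def is_codeword(x, checks):
    return all(sum(x[i] for i in check) % 2 == 0 for check in checks)

print(is_codeword([1, 0, 1, 0, 1, 1], checks))  # -> True, all checks even
print(is_codeword([1, 1, 1, 0, 1, 1], checks))  # -> False, first check is odd
```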
ML Decoding for LDPC Codes γ: negative log-likelihood ratios. P_j x: the sub-vector participating in the j-th parity check. P_d = {all even-parity bit-vectors of length d}. ML decoding: minimize Σ_i γ_i x_i subject to x ∈ C. ML decoding for LDPC codes: minimize Σ_i γ_i x_i subject to P_j x ∈ P_d ∀j.
Decoding by Belief Propagation (BP) [Figure: factor graph connecting parity-check nodes (x_1, x_2, x_3), (x_1, x_3, x_4), … to the codeword bits x_1, …, x_6] BP has been empirically successful at decoding LDPC codes, but it comes with no convergence guarantees. It is inherently distributed and takes full advantage of locality (the low density of the parity checks). A distributed decoding algorithm is desirable as it directly implies scalability.
Decoding Linear Program minimize Σ_i γ_i x_i subject to P_j x ∈ P_d ∀j and x ∈ {0,1}^n. The parity polytope PP_d is the convex hull of all even-parity bit-vectors of length d: PP_d = conv(P_d). Feldman et al. (2005) proposed the following relaxation: minimize Σ_i γ_i x_i subject to P_j x ∈ PP_d ∀j and x ∈ [0,1]^n.
Decoding LP and the Parity Polytope P_d = {all even-parity bit-vectors of length d}, PP_d = conv(P_d). minimize Σ_i γ_i x_i subject to P_j x ∈ PP_d ∀j and x ∈ [0,1]^n. [Figure: PP_3 = conv{000, 011, 101, 110}]
Outline Background and Problem Setup [6 mins.] LP Decoding Formulation Optimization Framework [4 mins.] Alternating Direction Method of Multipliers (ADMM)
Decoding LP with parity polytope PP_d: minimize Σ_i γ_i x_i subject to P_j x ∈ PP_d ∀j, x ∈ [0,1]^n. Add replicas z_j: minimize Σ_i γ_i x_i subject to z_j = P_j x ∀j, z_j ∈ PP_d ∀j, x ∈ [0,1]^n.
Augmented Lagrangian minimize Σ_i γ_i x_i subject to z_j = P_j x ∀j, z_j ∈ PP_d ∀j, x ∈ [0,1]^n. Augmented Lagrangian with Lagrange multipliers λ_j and penalty parameter µ: L_µ(x, z, λ) := γᵀx + Σ_j λ_jᵀ(P_j x − z_j) + (µ/2) Σ_j ||P_j x − z_j||₂².
Alternating Direction Method of Multipliers L_µ(x, z, λ) := γᵀx + Σ_j λ_jᵀ(P_j x − z_j) + (µ/2) Σ_j ||P_j x − z_j||₂². ADMM Update Steps: Lather, Rinse, Repeat. x^{k+1} := argmin_{x∈X} L_µ(x, z^k, λ^k); z^{k+1} := argmin_{z∈Z} L_µ(x^{k+1}, z, λ^k); λ_j^{k+1} := λ_j^k + µ(P_j x^{k+1} − z_j^{k+1}).
ADMM x-update Decoding LP: minimize Σ_i γ_i x_i subject to z_j = P_j x ∀j, z_j ∈ PP_d ∀j, x ∈ [0,1]^n. Augmented Lagrangian: L_µ(x, z, λ) := γᵀx + Σ_j λ_jᵀ(P_j x − z_j) + (µ/2) Σ_j ||P_j x − z_j||₂². With z and λ fixed the x-updates are simple: minimize L_µ(x, z^k, λ^k) subject to x ∈ [0,1]^n.
ADMM x-update In the x-update step the replicas (z) and dual variables (λ) are fixed. Setting ∇_x L_µ(x, z^k, λ^k) = 0 gives the component-wise update x_i = Π_[0,1]( (1/|N_v(i)|) [ Σ_{j∈N_v(i)} ( z_j^{(i)} − λ_j^{(i)}/µ ) − γ_i/µ ] ). N_v(i): set of parity checks containing component i. z_j^{(i)}: component of the j-th replica associated with x_i. Π_[0,1]: projection onto the interval [0,1].
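The component-wise update is a one-liner once the per-bit replica and multiplier components are gathered. A sketch, assuming the sign convention +λᵀ(Px − z) in the augmented Lagrangian; the numbers in the usage line are illustrative, not from the talk:

```python
# x-update for a single component i: average the corrected replica values,
# subtract the scaled channel term, and clip to [0, 1].
#   x_i = clip( (1/|Nv(i)|) * ( sum_j (z_j_i - lam_j_i / mu) - gamma_i / mu ) )

def x_update_component(z_vals, lam_vals, gamma_i, mu):
    """z_vals / lam_vals: replica and multiplier components associated with
    bit i, one entry per parity check in Nv(i)."""
    deg = len(z_vals)
    t = (sum(z - l / mu for z, l in zip(z_vals, lam_vals)) - gamma_i / mu) / deg
    return min(1.0, max(0.0, t))  # Euclidean projection onto [0, 1]

# (0.9 + 0.8 + 0.5) / 2 = 1.1, clipped to the hypercube:
print(x_update_component([0.9, 0.8], [0.0, 0.0], gamma_i=-1.0, mu=2.0))  # -> 1.0
```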
ADMM z-update Augmented Lagrangian: L_µ(x, z, λ) := γᵀx + Σ_j λ_jᵀ(P_j x − z_j) + (µ/2) Σ_j ||P_j x − z_j||₂². z-update: minimize Σ_j [ λ_jᵀ(P_j x − z_j) + (µ/2) ||P_j x − z_j||₂² ] subject to z_j ∈ PP_d ∀j. The minimization is completely separable in j, hence for each z_j we need to solve: minimize λ_jᵀ(P_j x − z_j) + (µ/2) ||P_j x − z_j||₂² subject to z_j ∈ PP_d.
z-updates: minimize λ_jᵀ(P_j x − z_j) + (µ/2) ||P_j x − z_j||₂² subject to z_j ∈ PP_d. With v = P_j x + λ_j/µ (completing the square) the problem is equivalent to: minimize ||v − z||₂² subject to z ∈ PP_d. The primary challenge in ADMM: the z-update requires projecting onto the parity polytope.
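The completing-the-square step, written out (the last term is constant in z_j, so it does not affect the minimizer):

```latex
\lambda_j^T (P_j x - z_j) + \frac{\mu}{2}\,\| P_j x - z_j \|_2^2
  \;=\; \frac{\mu}{2}\,\Big\| z_j - \underbrace{\big( P_j x + \lambda_j/\mu \big)}_{v} \Big\|_2^2
  \;-\; \frac{\| \lambda_j \|_2^2}{2\mu}
```

Minimizing over z_j ∈ PP_d is therefore exactly the Euclidean projection of v onto PP_d.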
Outline Background and Problem Setup [6 mins.] LP Decoding Formulation Optimization Framework [4 mins.] Alternating Direction Method of Multipliers (ADMM) Technical Core [15 + ɛ mins.] Projecting onto the Parity Polytope
minimize ||v − z||₂² subject to z ∈ PP_d. Recall that the parity polytope PP_d is the convex hull of all binary vectors of length d with even Hamming weight. [Figure: PP_3 = conv{000, 011, 101, 110}]
Parity Polytope y ∈ PP_d iff y = Σ_i α_i e_i, where the e_i are even-Hamming-weight binary vectors of length d, Σ_i α_i = 1, and α_i ≥ 0. Example (d = 6): (1/2, 1/2, 1, 1, 1/4, 1/4) = 1/2 (1,1,1,1,0,0) + 1/4 (0,0,1,1,1,1) + 1/4 (0,0,1,1,0,0).
Characterizing the Parity Polytope Two-Slice Lemma: For any y ∈ PP_d there exists a representation y = Σ_i α_i e_i such that the Hamming weight of every e_i is either r or r + 2, for some even integer r. Example with d = 6 and r = 2: (1/2, 1/2, 1, 1, 1/4, 1/4) = 1/2 (1,1,1,1,0,0) + 1/4 (0,0,1,1,1,1) + 1/4 (0,0,1,1,0,0), using only vertices of weight 4 and weight 2.
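A d = 6, r = 2 decomposition of this kind can be verified with exact arithmetic. The particular vertices below are one valid choice (an illustration, not necessarily the decomposition drawn on the slide):

```python
from fractions import Fraction as F

# Write a point of PP_6 as a convex combination of even-parity vertices
# whose weights are only r = 2 and r + 2 = 4, then recompute the point.

coeffs = [F(1, 2), F(1, 4), F(1, 4)]
verts = [(1, 1, 1, 1, 0, 0),   # weight 4
         (0, 0, 1, 1, 1, 1),   # weight 4
         (0, 0, 1, 1, 0, 0)]   # weight 2
assert all(sum(v) % 2 == 0 for v in verts)   # every vertex has even parity
assert sum(coeffs) == 1                      # convex combination

y = [sum(a * v[i] for a, v in zip(coeffs, verts)) for i in range(6)]
print([str(f) for f in y])  # -> ['1/2', '1/2', '1', '1', '1/4', '1/4']
```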
Characterizing the Parity Polytope Let PP_d^r be the convex hull of all binary vectors of length d and Hamming weight r. Two-Slice Lemma: y ∈ PP_d iff y = αs + (1 − α)t, for some s ∈ PP_d^r, t ∈ PP_d^{r+2}, and α ∈ [0, 1]. [Figure: d = 5, r = 2 — a point y ∈ PP_5 expressed via s ∈ PP_5^2 and t ∈ PP_5^4]
Structure of the Parity Polytope PP_d^r = (hyperplane containing all vectors of weight r) ∩ hypercube, i.e., PP_d^r = {y ∈ [0,1]^d : Σ_i y_i = r}. [Figure: the slice PP_5^4, containing points such as (4/5, 4/5, 4/5, 4/5, 4/5)] Any u ∈ PP_d is sandwiched between the slices of weight r and r + 2, where r ≤ Σ_i u_i ≤ r + 2.
Projecting onto the parity polytope Two-Slice Lemma: y ∈ PP_d iff y = αs + (1 − α)t, where s ∈ PP_d^r and t ∈ PP_d^{r+2}. Polytope projection: min ||v − y||₂² s.t. y ∈ PP_d. From the two-slice lemma: min ||v − (αs + (1 − α)t)||₂² s.t. s ∈ PP_d^r, t ∈ PP_d^{r+2}, α ∈ [0, 1].
Majorization Let u and w be d-vectors sorted in decreasing order. The vector w is said to majorize u if Σ_{k=1}^d w_k = Σ_{k=1}^d u_k and Σ_{k=1}^q w_k ≥ Σ_{k=1}^q u_k for all q. Example: (1, 1, 1, 1, 0) majorizes (4/5, 4/5, 4/5, 4/5, 4/5). [Figure: the slice PP_5^4 with the point (4/5, 4/5, 4/5, 4/5, 4/5) in its interior]
Majorization Theorem: u is in the convex hull of all permutations of w if and only if w majorizes u. Example: (1, 1, 1, 1, 0) majorizes (4/5, 4/5, 4/5, 4/5, 4/5), so (4/5, …, 4/5) lies in PP_5^4, the convex hull of the permutations of (1, 1, 1, 1, 0).
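The majorization condition is directly checkable: equal total sums, with the prefix sums of w dominating those of u. A small sketch (the function name and tolerance are illustrative):

```python
# Majorization test: sort both vectors in decreasing order, require equal
# total sums, and require every prefix sum of w to dominate that of u.

def majorizes(w, u, tol=1e-12):
    w, u = sorted(w, reverse=True), sorted(u, reverse=True)
    if len(w) != len(u) or abs(sum(w) - sum(u)) > tol:
        return False
    pw = pu = 0.0
    for a, b in zip(w, u):
        pw, pu = pw + a, pu + b
        if pw < pu - tol:
            return False
    return True

print(majorizes([1, 1, 1, 1, 0], [0.8] * 5))  # -> True, the slide's example
print(majorizes([0.8] * 5, [1, 1, 1, 1, 0]))  # -> False, prefix sums fall short
```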
Projecting onto the parity polytope Polytope projection: min ||v − y||₂² s.t. y ∈ PP_d. From the two-slice lemma: min ||v − (αs + (1 − α)t)||₂² s.t. s ∈ PP_d^r, t ∈ PP_d^{r+2}, α ∈ [0, 1]. Using majorization: min ||v − y||₂² s.t. Σ_i y_i = αr + (1 − α)(r + 2), Σ_{k=1}^{r+1} y_(k) ≤ r + (1 − α), 0 ≤ α ≤ 1, where y_(k) denotes the k-th largest component of y.
Quadratic program for the projection problem: min ||v − y||₂² s.t. Σ_i y_i = αr + (1 − α)(r + 2), Σ_{k=1}^{r+1} y_(k) ≤ r + (1 − α), 0 ≤ α ≤ 1. For this quadratic program the Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient. We develop a water-filling type algorithm which determines a solution satisfying the KKT conditions. Overall Result: we can project onto the parity polytope in O(d log d) time.
Final Projection Algorithm The KKT conditions imply that either the projection of v onto the hypercube [0,1]^d is already in the parity polytope, or there exists a β ∈ R_+ such that the projection y satisfies y = Π_{[0,1]^d}(v − β w), where (with v sorted in decreasing order) w = (1, …, 1, −1, …, −1)ᵀ has r + 1 ones followed by d − r − 1 minus ones. Using this characterization we develop a water-filling type algorithm that determines y, the projection onto the parity polytope. Overall Result: we can project onto the parity polytope in O(d log d) time.
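A runnable sketch of the projection. This is not the talk's O(d log d) water-filling routine; it is an assumed stand-in based on the same dichotomy: take the hypercube projection, and if it violates the most violated odd-set facet Σ_{i∈S} y_i − Σ_{i∉S} y_i ≤ |S| − 1, re-project onto the hypercube intersected with that facet, finding the multiplier β by bisection rather than in closed form:

```python
# Approximate Euclidean projection onto the parity polytope PP_d via the
# "hypercube projection + most-violated-facet repair" characterization.

def clip(t):
    return min(1.0, max(0.0, t))

def project_parity_polytope(v):
    d = len(v)
    u = [clip(t) for t in v]               # projection onto [0,1]^d
    # Candidate facet: S collects coordinates rounding to 1, toggled at the
    # entry nearest 1/2 if needed so that |S| is odd.
    S = {i for i in range(d) if u[i] > 0.5}
    if len(S) % 2 == 0:
        S ^= {min(range(d), key=lambda i: abs(u[i] - 0.5))}
    a = [1.0 if i in S else -1.0 for i in range(d)]
    rhs = len(S) - 1
    if sum(ai * ui for ai, ui in zip(a, u)) <= rhs + 1e-12:
        return u                           # hypercube projection is feasible
    # Otherwise project onto {y in [0,1]^d : a^T y = rhs}: y_i = clip(v_i -
    # beta*a_i), where a^T y(beta) is continuous and nonincreasing in beta.
    lo, hi = 0.0, max(abs(t) for t in v) + 2.0
    for _ in range(80):                    # bisection on beta
        mid = (lo + hi) / 2
        if sum(ai * clip(vi - mid * ai) for ai, vi in zip(a, v)) > rhs:
            lo = mid
        else:
            hi = mid
    return [clip(vi - hi * ai) for vi, ai in zip(v, a)]

print(project_parity_polytope([1.2, -0.3, 0.4]))  # approx (5/6, 1/15, 23/30)
```

The bisection makes each projection O(d) per iteration with a fixed iteration budget; the talk's water-filling scheme removes the iterative search and yields the stated O(d log d) bound.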
Outline Background and Problem Setup [6 mins.] LP Decoding Formulation Optimization Framework [4 mins.] ADMM Technical Core [15 + ɛ mins.] Projecting onto the Parity Polytope Numerical results [5 mins.] Graphs
Implementation of ADMM decoder Recap ADMM update steps: x^{k+1} := argmin_{x∈X} L_µ(x, z^k, λ^k); z^{k+1} := argmin_{z∈Z} L_µ(x^{k+1}, z, λ^k); λ_j^{k+1} := λ_j^k + µ(P_j x^{k+1} − z_j^{k+1}). Question: how to choose the penalty parameter µ? Another question: when do we terminate the iteration? Need to determine µ, the maximum number of iterations T_max, and the error tolerance ɛ.
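The three update steps assemble into a short decoder. A self-contained sketch for the toy 6-bit code from earlier slides, with µ, T_max, and ɛ wired in as parameters per the recipe above; the projection here is a bisection-based stand-in for the talk's O(d log d) water-filling routine, and the stopping rule (primal residual below ɛ) is one simple choice among several:

```python
def clip(t):
    return min(1.0, max(0.0, t))

def project_pp(v):
    # Projection onto PP_d: hypercube projection, then a most-violated-facet
    # repair with the facet multiplier found by bisection.
    d = len(v)
    u = [clip(t) for t in v]
    S = {i for i in range(d) if u[i] > 0.5}
    if len(S) % 2 == 0:
        S ^= {min(range(d), key=lambda i: abs(u[i] - 0.5))}
    a = [1.0 if i in S else -1.0 for i in range(d)]
    rhs = len(S) - 1
    if sum(ai * ui for ai, ui in zip(a, u)) <= rhs + 1e-12:
        return u
    lo, hi = 0.0, max(abs(t) for t in v) + 2.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if sum(ai * clip(vi - mid * ai) for ai, vi in zip(a, v)) > rhs:
            lo = mid
        else:
            hi = mid
    return [clip(vi - hi * ai) for vi, ai in zip(v, a)]

def admm_decode(gamma, checks, mu=2.0, T_max=300, eps=1e-4):
    n = len(gamma)
    deg = [sum(i in c for c in checks) for i in range(n)]
    x = [0.5] * n
    z = [[0.5] * len(c) for c in checks]      # replicas z_j
    lam = [[0.0] * len(c) for c in checks]    # multipliers lambda_j
    for _ in range(T_max):
        # x-update: clipped average of corrected replicas minus gamma_i/mu.
        for i in range(n):
            s = sum(z[j][k] - lam[j][k] / mu
                    for j, c in enumerate(checks)
                    for k, ii in enumerate(c) if ii == i)
            x[i] = clip((s - gamma[i] / mu) / deg[i])
        # z-update: project P_j x + lam_j/mu onto PP_d; then dual update.
        resid = 0.0
        for j, c in enumerate(checks):
            z[j] = project_pp([x[i] + lam[j][k] / mu for k, i in enumerate(c)])
            for k, i in enumerate(c):
                r = x[i] - z[j][k]
                lam[j][k] += mu * r
                resid += r * r
        if resid < eps * eps:                 # primal residual small: stop
            break
    return [round(xi) for xi in x]

checks = [(0, 1, 2), (0, 2, 3), (1, 4, 5), (3, 4, 5)]
print(admm_decode([2.2] * 6, checks))  # all-positive LLRs -> all-zeros codeword
print(admm_decode([2.2, -2.2, 2.2, 2.2, 2.2, 2.2], checks))  # one flipped bit
```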
Simulation for the (1057,244) LDPC code. Fix T_max = 300, ɛ = 1e-4. [Figure: word error rate (WER) vs. penalty parameter µ, for SNR = 5 dB, 5.25 dB, 5.5 dB]
Simulation for the (1057,244) LDPC code. Fix T_max = 300, ɛ = 1e-4. [Figure: number of iterations per decoding vs. µ, for SNR = 5 dB, 5.25 dB, 5.5 dB]
Simulation for the (1057,244) LDPC code. ɛ = 1e-4, µ = 2. [Figure: WER vs. Eb/N0 (dB) for ADMM decoding with T_max = 50, 100, 300, all at µ = 2] Wait! We have seen that larger µ gives better WER performance.
Simulation for the (1057,244) LDPC code. ɛ = 1e-4. [Figure: WER vs. Eb/N0 (dB) for ADMM decoding with T_max = 50, 100, 300 at µ = 2, plus T_max = 300 at µ = 10] Is this performance good enough?
Simulation for the (1057,244) LDPC code. ɛ = 1e-4. [Figure: WER vs. Eb/N0 (dB) for the ADMM decoders above plus an LP decoding baseline] Same error performance as LP decoding using the simplex method.