Decomposition Methods for Large Scale LP Decoding

Decomposition Methods for Large Scale LP Decoding. Siddharth Barman; joint work with Xishuo Liu, Stark Draper, and Ben Recht.

Outline: Background and Problem Setup (LP Decoding Formulation); Optimization Framework (ADMM); Technical Core (Projecting onto the Parity Polytope); Numerical Results (Graphs).

Efficient and Reliable Digital Transmission. Error correcting codes: constructs that enable reliable delivery of digital data over unreliable (noisy) communication channels. (Claude Shannon, Richard Hamming.)

Model for Communication Channel. Alice sends x through a noisy channel and Bob receives x̃. We will focus on transmission of bit strings (x, x̃ ∈ {0, 1}^n) and the Binary Symmetric Channel (BSC): each bit is received correctly with probability 1 − p and flipped with probability p. [Figure: BSC with crossover probability p.]

Probabilistic implications of the BSC. Say x is the transmitted bit string and x̃ is the received bit string. The BSC is memoryless, hence Pr(x̃ | x) = ∏_i Pr(x̃_i | x_i). Example: Pr(x̃ = 001 | x = 000) = (1 − p)(1 − p)p.
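
A tiny numeric check of the memoryless product formula above (the crossover probability and bit strings are made-up values, not from the talk):

```python
import numpy as np

p = 0.1
x     = np.array([0, 0, 0])   # transmitted word
x_rec = np.array([0, 0, 1])   # received word with one flipped bit
per_bit = np.where(x == x_rec, 1 - p, p)   # Pr(x~_i | x_i) for each position
print(np.prod(per_bit))                    # (1-p)*(1-p)*p = 0.081
```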

Executive Summary of Error Correcting Codes. Alice sends a codeword x ∈ Dictionary over the noisy channel; Bob receives x̃ ∉ Dictionary. Codewords C: a dictionary (structured set). Error detection: spell checking. Error correction: spell correction.

Maximum Likelihood (ML) Decoding. With code C and received bit string x̃, ML decoding picks a codeword x ∈ C that maximizes the probability that x̃ was received given that x was sent, Pr(received x̃ | sent x): maximize Pr(x̃ | x) subject to x ∈ C; equivalently, maximize ∏_i Pr(x̃_i | x_i) subject to x ∈ C; equivalently, maximize ∑_i log Pr(x̃_i | x_i) subject to x ∈ C.

Maximum Likelihood (ML) Decoding. ML decoding: maximize ∑_i log Pr(x̃_i | x_i) s.t. x ∈ C. Negative log-likelihood ratios for all i: γ_i = log(p/(1 − p)) if x̃_i = 1, and γ_i = log((1 − p)/p) if x̃_i = 0. ML decoding: minimize ∑_i γ_i x_i s.t. x ∈ C.
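
A minimal sketch of the γ computation for a BSC (the received word and crossover probability below are illustrative, not from the talk):

```python
import numpy as np

def bsc_neg_log_likelihood_ratios(x_received, p):
    """gamma_i = log((1-p)/p) if the received bit is 0, and log(p/(1-p)) if it is 1."""
    llr = np.log((1 - p) / p)
    return np.where(x_received == 0, llr, -llr)

x_received = np.array([0, 1, 1, 0, 1, 0])   # hypothetical received word
gamma = bsc_neg_log_likelihood_ratios(x_received, p=0.05)
# ML decoding is now: minimize gamma @ x over codewords x in C.
```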

Low Density Parity Check (LDPC) Codes. LDPC: x ∈ C iff all the parity checks are satisfied. Parity checks: (x_1, x_2, x_3), (x_1, x_3, x_4), (x_2, x_5, x_6), (x_4, x_5, x_6); codeword bits x_1, x_2, x_3, x_4, x_5, x_6. Example: x = (1 0 1 0 1 1). Parity check 1: (1 0 1), parity check 2: (1 1 0), parity check 3: (0 1 1), parity check 4: (0 1 1); all have even parity.

Decoding Low Density Parity Check (LDPC) Codes. Parity checks: (x_1, x_2, x_3), (x_1, x_3, x_4), (x_2, x_5, x_6), (x_4, x_5, x_6); codeword bits x_1, ..., x_6. Let P_j x be the sub-vector participating in the jth parity check: P_1 x = (x_1, x_2, x_3), P_4 x = (x_4, x_5, x_6), and d = 3. P_d = {all even parity bit-vectors of length d}. LDPC: x ∈ C if and only if P_j x ∈ P_d for all j.
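
A small sketch of this membership test, using the four checks from the slides (0-indexed) and the example word above:

```python
import numpy as np

checks = [(0, 1, 2), (0, 2, 3), (1, 4, 5), (3, 4, 5)]   # the slides' parity checks, 0-indexed

def in_code(x, checks):
    """x is a codeword iff every sub-vector P_j x has even parity."""
    return all(int(np.sum(x[list(idx)])) % 2 == 0 for idx in checks)

x = np.array([1, 0, 1, 0, 1, 1])   # the example word from the LDPC slide
print(in_code(x, checks))          # True
```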

ML Decoding for LDPC Codes. γ: negative log-likelihood ratios. P_j x: the sub-vector participating in the jth parity check. P_d = {all even parity bit-vectors of length d}. ML decoding: minimize ∑_i γ_i x_i subject to x ∈ C. ML decoding for LDPC codes: minimize ∑_i γ_i x_i subject to P_j x ∈ P_d ∀j.

Decoding by Belief Propagation (BP). [Figure: Tanner graphs with parity check nodes (x_1, x_2, x_3), (x_1, x_3, x_4) connected to codeword bits x_1, x_2, x_3, x_4.] BP has been empirically successful in decoding LDPC codes, but it has no convergence guarantees. It is inherently distributed and takes full advantage of locality (the low density of parity checks). A distributed decoding algorithm is desirable as it directly implies scalability.

Decoding Linear Program. minimize ∑_i γ_i x_i subject to P_j x ∈ P_d ∀j and x ∈ {0, 1}^n. The Parity Polytope, PP_d, is the convex hull of all even parity bit-vectors of length d: PP_d = conv(P_d). Feldman et al. (2005) proposed the following relaxation: minimize ∑_i γ_i x_i subject to P_j x ∈ PP_d ∀j and x ∈ [0, 1]^n.

Decoding LP and the Parity Polytope. P_d = {all even parity bit-vectors of length d}; PP_d = conv(P_d). minimize ∑_i γ_i x_i subject to P_j x ∈ PP_d ∀j and x ∈ [0, 1]^n. [Figure: the parity polytope PP_3, the convex hull of 000, 011, 101, 110.]
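
A brute-force sketch of this relaxation for the toy code above, representing each PP_d by its vertices (assumes cvxpy is available; practical only for small check degree d, and the γ values here are made up):

```python
import itertools
import numpy as np
import cvxpy as cp

def even_parity_vectors(d):
    """All length-d binary vectors of even Hamming weight (the set P_d)."""
    return np.array([v for v in itertools.product([0, 1], repeat=d)
                     if sum(v) % 2 == 0], dtype=float)

def lp_decode(gamma, checks):
    """Feldman LP relaxation: min gamma^T x s.t. P_j x in PP_d for all j, x in [0,1]^n."""
    x = cp.Variable(len(gamma))
    constraints = [x >= 0, x <= 1]
    for idx in checks:
        E = even_parity_vectors(len(idx))              # vertices of PP_d for this check
        alpha = cp.Variable(E.shape[0], nonneg=True)   # convex-combination weights
        constraints += [cp.sum(alpha) == 1, x[list(idx)] == E.T @ alpha]
    cp.Problem(cp.Minimize(gamma @ x), constraints).solve()
    return x.value

checks = [(0, 1, 2), (0, 2, 3), (1, 4, 5), (3, 4, 5)]    # toy code from the slides
gamma = np.array([-1.2, 0.8, -0.9, 1.1, -1.0, -0.7])     # made-up log-likelihood ratios
print(np.round(lp_decode(gamma, checks), 3))             # should recover the codeword (1,0,1,0,1,1)
```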

Outline: Background and Problem Setup [6 mins.] (LP Decoding Formulation); Optimization Framework [4 mins.] (Alternating Direction Method of Multipliers, ADMM).

Decoding LP with parity polytope PP_d: minimize ∑_i γ_i x_i subject to P_j x ∈ PP_d ∀j, x ∈ [0, 1]^n. Add replicas z_j: minimize ∑_i γ_i x_i subject to z_j = P_j x ∀j, z_j ∈ PP_d ∀j, x ∈ [0, 1]^n.

Augmented Lagrangian. minimize ∑_i γ_i x_i subject to z_j = P_j x ∀j, z_j ∈ PP_d ∀j, x ∈ [0, 1]^n. Augmented Lagrangian with Lagrange multipliers λ and penalty parameter µ: L_µ(x, z, λ) := γ^T x + ∑_j λ_j^T (P_j x − z_j) + (µ/2) ∑_j ‖P_j x − z_j‖₂².

Alternating Direction Method of Multipliers. L_µ(x, z, λ) := γ^T x + ∑_j λ_j^T (P_j x − z_j) + (µ/2) ∑_j ‖P_j x − z_j‖₂². ADMM update steps (lather, rinse, repeat): x^{k+1} := argmin_{x ∈ X} L_µ(x, z^k, λ^k); z^{k+1} := argmin_{z ∈ Z} L_µ(x^{k+1}, z, λ^k); λ_j^{k+1} := λ_j^k + µ (P_j x^{k+1} − z_j^{k+1}).

ADMM x-update. Decoding LP: minimize ∑_i γ_i x_i subject to z_j = P_j x ∀j, z_j ∈ PP_d ∀j, x ∈ [0, 1]^n. Augmented Lagrangian: L_µ(x, z, λ) := γ^T x + ∑_j λ_j^T (P_j x − z_j) + (µ/2) ∑_j ‖P_j x − z_j‖₂². With z and λ fixed the x-update is simple: minimize L_µ(x, z^k, λ^k) subject to x ∈ [0, 1]^n.

ADMM x-update. In the x-update step the replicas (z) and dual variables (λ) are fixed. Setting ∇_x L_µ(x, z^k, λ^k) = 0 gives the component-wise update x_i = Π_[0,1] [ (1/|N_v(i)|) ( ∑_{j ∈ N_v(i)} ( z_j^(i) − λ_j^(i)/µ ) − γ_i/µ ) ]. N_v(i): the set of parity checks containing component i. z_j^(i): the component of the jth replica associated with x_i. Π_[0,1]: projection (clipping) onto [0, 1].
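
A minimal sketch of that component-wise update (the data layout, with replica and dual entries indexed by bit id, is an illustrative choice, not the authors' implementation):

```python
import numpy as np

def x_update_component(i, gamma, neighbors, z, lam, mu):
    """neighbors[i]: checks j containing bit i; z[j][i], lam[j][i]: replica and dual
    entries of bit i in check j. Returns the clipped average from the slide."""
    acc = sum(z[j][i] - lam[j][i] / mu for j in neighbors[i]) - gamma[i] / mu
    return float(np.clip(acc / len(neighbors[i]), 0.0, 1.0))
```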

ADMM z-update. Augmented Lagrangian: L_µ(x, z, λ) := γ^T x + ∑_j λ_j^T (P_j x − z_j) + (µ/2) ∑_j ‖P_j x − z_j‖₂². z-update: minimize ∑_j λ_j^T (P_j x − z_j) + (µ/2) ∑_j ‖P_j x − z_j‖₂² subject to z_j ∈ PP_d ∀j. The minimization is completely separable in j, hence for each z_j we need to solve: minimize λ_j^T (P_j x − z_j) + (µ/2) ‖P_j x − z_j‖₂² subject to z_j ∈ PP_d.

z-updates: minimize λ_j^T (P_j x − z_j) + (µ/2) ‖P_j x − z_j‖₂² subject to z_j ∈ PP_d. With v = P_j x + λ_j/µ (completing the square) the problem is equivalent to: minimize ‖v − z‖₂² subject to z ∈ PP_d. The primary challenge in ADMM: the z-update requires projecting onto the parity polytope.
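
A quick numeric check of the completing-the-square step (all values below are arbitrary test data): the two objectives differ only by a constant in z, so they have the same minimizer over PP_d.

```python
import numpy as np

rng = np.random.default_rng(0)
Px, lam, mu = rng.random(6), rng.random(6), 2.0
v = Px + lam / mu

f = lambda z: lam @ (Px - z) + (mu / 2) * np.sum((Px - z) ** 2)   # original z-update objective
g = lambda z: (mu / 2) * np.sum((v - z) ** 2)                     # completed-square form

z1, z2 = rng.random(6), rng.random(6)
print(np.isclose(f(z1) - g(z1), f(z2) - g(z2)))   # True: the difference does not depend on z
```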

Outline: Background and Problem Setup [6 mins.] (LP Decoding Formulation); Optimization Framework [4 mins.] (Alternating Direction Method of Multipliers, ADMM); Technical Core [15 + ε mins.] (Projecting onto the Parity Polytope).

minimize ‖v − z‖₂² subject to z ∈ PP_d. Recall that the parity polytope, PP_d, is the convex hull of all binary vectors of length d and even Hamming weight. [Figure: PP_3, the convex hull of 000, 011, 101, 110.]

Parity Polytope. y ∈ PP_d iff y = ∑_i α_i e_i, where the e_i are even-Hamming-weight vectors of dimension d, ∑_i α_i = 1 and α_i ≥ 0. [Example (d = 6): a point of PP_6 written as a convex combination of three even-weight binary vectors with weights 1/4, 1/2, 1/4.]

Characterizing the Parity Polytope. Two-Slice Lemma: for any y ∈ PP_d there exists a representation y = ∑_i α_i e_i such that the Hamming weight of every e_i is either r or r + 2, for some even integer r. [Example with d = 6 and r = 2: the same point written as a convex combination (weights 1/4, 1/2, 1/4) of even-parity vectors of weight 2 or 4.]

Characterizing the Parity Polytope. Lemma: for any y ∈ PP_d there exists a representation y = ∑_i α_i e_i such that the Hamming weight of every e_i is either r or r + 2, for some even integer r. [Figure: y lies on the segment between a point s ∈ PP_d^r and a point t ∈ PP_d^{r+2}; d = 5, r = 2.]

Characterizing the Parity Polytope. Let PP_d^r be the convex hull of all length-d binary vectors of Hamming weight r. Two-Slice Lemma: y ∈ PP_d iff y = αs + (1 − α)t for some α ∈ [0, 1], where s ∈ PP_d^r and t ∈ PP_d^{r+2}. [Figure: d = 5, r = 2.]

Structure of the Parity Polytope. PP_d^r = (hyperplane containing the weight-r vectors) ∩ hypercube. [Figure: the weight slices of PP_5, with the points (2/5, 2/5, 2/5, 2/5, 2/5) and (4/5, 4/5, 4/5, 4/5, 4/5) marked.] Any u ∈ PP_d is sandwiched between the slices of weight r and r + 2, where r ≤ ∑_i u_i ≤ r + 2.

Projecting onto the parity polytope. Two-Slice Lemma: y ∈ PP_d iff y = αs + (1 − α)t, where s ∈ PP_d^r and t ∈ PP_d^{r+2}. Polytope projection: min ‖v − y‖₂² s.t. y ∈ PP_d. From the two-slice lemma: min ‖v − (αs + (1 − α)t)‖₂² s.t. s ∈ PP_d^r, t ∈ PP_d^{r+2}, α ∈ [0, 1].

Majorization. Let u and w be d-vectors sorted in decreasing order. The vector w is said to majorize u if ∑_{k=1}^d w_k = ∑_{k=1}^d u_k and ∑_{k=1}^q u_k ≤ ∑_{k=1}^q w_k for every q. Example: the vector (1, 1, 1, 1, 0) majorizes (4/5, 4/5, 4/5, 4/5, 4/5). [Figure: PP_5^4.]
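
A small sketch of this definition as code (the tolerance value is an arbitrary numerical slack):

```python
import numpy as np

def majorizes(w, u):
    """True iff w majorizes u: equal totals, and every prefix sum of the
    descending-sorted w dominates the corresponding prefix sum of u."""
    w, u = np.sort(w)[::-1], np.sort(u)[::-1]
    return bool(np.isclose(w.sum(), u.sum()) and
                np.all(np.cumsum(u) <= np.cumsum(w) + 1e-12))

print(majorizes(np.array([1, 1, 1, 1, 0]), np.full(5, 4 / 5)))   # True, the slide's example
```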

Majorization Theorem: u is in the convex hull of all permutations of w if and only if w majorizes u. Example: the vector (1, 1, 1, 1, 0) majorizes (4/5, 4/5, 4/5, 4/5, 4/5). [Figure: PP_5^4.]

Projecting onto the parity polytope. Polytope projection: min ‖v − y‖₂² s.t. y ∈ PP_d. From the two-slice lemma: min ‖v − (αs + (1 − α)t)‖₂² s.t. s ∈ PP_d^r, t ∈ PP_d^{r+2}, α ∈ [0, 1]. Using majorization: min ‖v − y‖₂² s.t. ∑_i y_i = αr + (1 − α)(r + 2), ∑_{k=1}^{r+1} y_(k) ≤ r + (1 − α), 0 ≤ α ≤ 1, where y_(k) denotes the kth largest component of y.

Quadratic program for the projection problem: min ‖v − y‖₂² s.t. ∑_i y_i = αr + (1 − α)(r + 2), ∑_{k=1}^{r+1} y_(k) ≤ r + (1 − α), 0 ≤ α ≤ 1. For this quadratic program the Karush-Kuhn-Tucker (KKT) conditions are necessary and sufficient. We develop a water-filling type algorithm which determines a solution satisfying the KKT conditions. Overall result: we can project onto the parity polytope in O(d log d) time.

Final Projection Algorithm. The KKT conditions imply that either the projection of v onto the hypercube [0, 1]^d is already in the parity polytope, or there exists a β ∈ R_+ such that the projection y satisfies y = v − β w, where w = (1, ..., 1, −1, ..., −1)^T has r + 1 entries equal to 1 followed by d − r − 1 entries equal to −1. Using this characterization we develop a water-filling type algorithm that determines y, the projection onto the parity polytope. Overall result: we can project onto the parity polytope in O(d log d) time.
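
The water-filling algorithm itself is not spelled out in these slides; the sketch below is only a brute-force reference projection via the vertex description of PP_d (assumes cvxpy; exponential in d, so suitable only for small check degrees or for sanity-checking a fast implementation):

```python
import itertools
import numpy as np
import cvxpy as cp

def project_onto_parity_polytope_reference(v):
    """Exact Euclidean projection onto PP_d by optimizing over its vertices.
    NOT the O(d log d) water-filling algorithm described in the talk."""
    d = len(v)
    E = np.array([e for e in itertools.product([0, 1], repeat=d)
                  if sum(e) % 2 == 0], dtype=float)        # even-parity vertices
    alpha = cp.Variable(E.shape[0], nonneg=True)            # convex-combination weights
    cp.Problem(cp.Minimize(cp.sum_squares(v - E.T @ alpha)),
               [cp.sum(alpha) == 1]).solve()
    return E.T @ alpha.value

print(np.round(project_onto_parity_polytope_reference(np.array([0.9, 0.8, 0.2, 0.6])), 3))
```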

Outline: Background and Problem Setup [6 mins.] (LP Decoding Formulation); Optimization Framework [4 mins.] (ADMM); Technical Core [15 + ε mins.] (Projecting onto the Parity Polytope); Numerical Results [5 mins.] (Graphs).

Implementation of ADMM decoder. Recap of the ADMM update steps: x^{k+1} := argmin_{x ∈ X} L_µ(x, z^k, λ^k); z^{k+1} := argmin_{z ∈ Z} L_µ(x^{k+1}, z, λ^k); λ_j^{k+1} := λ_j^k + µ(P_j x^{k+1} − z_j^{k+1}). Question: how to choose the penalty parameter µ? Another question: when do we terminate the iteration? Need to determine µ, the maximum number of iterations T_max, and the error tolerance ε.
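
A compact sketch of the resulting decoder loop (assuming a project(v) routine for the parity-polytope projection, such as the reference sketch above; the initialization and stopping rule here are illustrative choices, not necessarily the authors' exact implementation):

```python
import numpy as np

def admm_decode(gamma, checks, project, mu=3.0, T_max=300, eps=1e-5):
    """ADMM LP decoding sketch: x-update, per-check projection z-update, dual update."""
    n = len(gamma)
    pos = [{i: k for k, i in enumerate(idx)} for idx in checks]   # bit id -> slot within check
    neighbors = [[j for j, idx in enumerate(checks) if i in idx] for i in range(n)]
    z = [np.full(len(idx), 0.5) for idx in checks]
    lam = [np.zeros(len(idx)) for idx in checks]
    x = np.zeros(n)
    for _ in range(T_max):
        # x-update: clipped average of (z - lam/mu) over incident checks, minus gamma/mu.
        for i in range(n):
            acc = sum(z[j][pos[j][i]] - lam[j][pos[j][i]] / mu for j in neighbors[i]) - gamma[i] / mu
            x[i] = np.clip(acc / len(neighbors[i]), 0.0, 1.0)
        # z-update (projection onto PP_d) and dual update, check by check.
        resid = 0.0
        for j, idx in enumerate(checks):
            Pjx = x[list(idx)]
            z[j] = project(Pjx + lam[j] / mu)
            lam[j] = lam[j] + mu * (Pjx - z[j])
            resid = max(resid, float(np.max(np.abs(Pjx - z[j]))))
        if resid < eps:   # simple termination test on the primal residual
            break
    return x
```

With the reference projection from the previous sketch, admm_decode(gamma, checks, project_onto_parity_polytope_reference) should reproduce the LP decoder's output on the toy code.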

Simulation for the (1057, 244) LDPC code. Fix T_max = 300, ε = 10^−4. [Figure: word error rate (WER) versus the penalty parameter µ, for SNR = 5 dB, 5.25 dB, and 5.5 dB.]

Simulation for the (1057, 244) LDPC code. Fix T_max = 300, ε = 10^−4. [Figure: number of iterations per decoding versus µ, for SNR = 5 dB, 5.25 dB, and 5.5 dB.]

Simulation for the (1057, 244) LDPC code. ε = 10^−4, µ = 2. [Figure: word error rate versus Eb/N0 (dB) for ADMM decoding with T_max = 50, 100, and 300 (µ = 2).] Wait! We have seen that larger µ gives better WER performance.

Simulation for the (1057, 244) LDPC code. ε = 10^−4. [Figure: word error rate versus Eb/N0 (dB) for ADMM decoding with (T_max = 50, µ = 2), (T_max = 100, µ = 2), (T_max = 300, µ = 2), and (T_max = 300, µ = 10).] Is this performance good enough?

Simulation for the (1057, 244) LDPC code. ε = 10^−4. [Figure: word error rate versus Eb/N0 (dB) for ADMM decoding with (T_max = 50, µ = 2), (T_max = 100, µ = 2), (T_max = 300, µ = 2), (T_max = 300, µ = 10), and for LP decoding.] Same error performance as LP decoding using the simplex method.