Coding for loss tolerant systems
Workshop APRETAF, 22 January 2009
Mathieu Cunche, Vincent Roca
INRIA Rhône-Alpes, Planète team

o The erasure channel
o Erasure codes
o Reed-Solomon codes
o LDPC codes
o Application to distributed storage

The erasure channel
o Definition: a symbol either arrives at the destination without any error, or is erased and never received
  [Figure: erasure channel; an input 0 or 1 is either delivered unchanged or erased]
o Differs from the BSC (binary symmetric) and AWGN channels
  o the integrity assumption is a strong hypothesis
  o a received symbol is 100% guaranteed error free

The erasure channel
Where do we find erasure channels?
o On the Internet
  o because of routing errors or congestion
  o because of a bad CRC/checksum
o On wireless and satellite networks
  o intermittent connection due to obstacles
o Distributed storage
  o disk failure in RAID systems
  o node failure in a data center
o Distributed computation
  o fail-stop

Erasure codes
o k source symbols, encoded into n encoding symbols
o Code rate = k / n = (number of symbols before encoding) / (number of symbols after encoding)
  o close to 1 => little redundancy
  o close to 0 => high amount of redundancy
[Figure: source object -> k source symbols -> encoding -> k source symbols + (n-k) repair symbols -> transmission with symbol erasures -> decoding -> decoded object]

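To make the vocabulary concrete, here is a minimal sketch that assumes the simplest possible erasure code: k source symbols plus a single XOR repair symbol (n = k + 1). This toy code is not one of the codes presented in this talk, and the function names are purely illustrative.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length symbols."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(source):
    """Return the n = k + 1 encoding symbols: the k sources plus one XOR repair symbol."""
    repair = reduce(xor_bytes, source)
    return list(source) + [repair]

def decode(received):
    """Recover the k source symbols when at most one symbol (marked None) was erased."""
    missing = [i for i, s in enumerate(received) if s is None]
    if len(missing) > 1:
        raise ValueError("a single repair symbol can only repair one erasure")
    symbols = list(received)
    if missing:
        # The XOR of all n symbols is zero, so the missing one is the XOR of the others.
        symbols[missing[0]] = reduce(xor_bytes, [s for s in symbols if s is not None])
    return symbols[:-1]

source = [b"loss", b"tole", b"rant", b"code"]   # k = 4 source symbols
encoded = encode(source)                         # n = 5 encoding symbols
code_rate = len(source) / len(encoded)           # k / n

received = list(encoded)
received[2] = None                               # the channel erases one symbol
recovered = decode(received)
```

With k = 4 and n = 5 the code rate is close to 1, so there is little redundancy: one erasure can be repaired, two cannot.
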
Erasure codes
Often used as AL-FEC codes
o Application-Level Forward Error Correction codes
AL-FEC codes differ from physical-layer FEC codes
o PHY codes:
  o correct bit errors and, when that is not possible, detect them
  o symbol = bit
o AL-FEC codes:
  o recover from symbol erasures
  o symbol = byte, IP datagram, file chunk

Erasure codes
How can we define good erasure codes? Performance metrics for erasure codes:
o Erasure recovery capabilities
  o main metric, measured as the overhead ratio:
      decoding_overhead = (number of symbols required for decoding / k) - 1
  o decoding needs (1 + overhead) * k symbols to succeed, whereas ideal (MDS) codes need only k symbols
o Encoding and decoding speed
  o to appreciate the complexity
o Required memory during encoding and decoding

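The overhead ratio can be checked with a few lines of arithmetic. The sketch below only restates the definition; the figure of 1021 symbols is a hypothetical measurement chosen for illustration, not a result from this talk.

```python
def decoding_overhead(symbols_needed: int, k: int) -> float:
    """Overhead ratio: symbols needed beyond k, relative to k."""
    return symbols_needed / k - 1

def symbols_to_receive(overhead: float, k: int) -> float:
    """Number of symbols decoding needs: (1 + overhead) * k."""
    return (1 + overhead) * k

k = 1000
mds_overhead = decoding_overhead(1000, k)      # ideal (MDS) code: k symbols suffice
sample_overhead = decoding_overhead(1021, k)   # hypothetical code that needed 1021 symbols
```
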
Reed-Solomon codes
In short
o Discovered by Reed & Solomon in 1959
o Linear codes over GF(2^m)
  o sum: simple binary XOR
  o multiplication and division: use a logarithmic table
o Based on polynomial interpolation
o Practical implementation with a Vandermonde matrix
  o any k x k submatrix of a Vandermonde matrix is invertible

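The finite field operations listed above can be sketched as follows for GF(2^8). The primitive polynomial 0x11D (x^8 + x^4 + x^3 + x^2 + 1) and the generator 2 are assumptions made for this example, and the names EXP, LOG, gf_add, gf_mul and gf_div are illustrative.

```python
PRIM_POLY = 0x11D  # assumption: x^8 + x^4 + x^3 + x^2 + 1

# EXP[i] = 2^i in GF(2^8); LOG is the inverse mapping for non-zero elements.
# EXP is doubled in length so that the sum of two logarithms never needs a modulo.
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1                 # multiply by the generator 2
    if x & 0x100:           # degree 8 reached: reduce modulo the primitive polynomial
        x ^= PRIM_POLY
for i in range(255, 510):
    EXP[i] = EXP[i - 255]

def gf_add(a: int, b: int) -> int:
    """Sum (and difference) in GF(2^8): a simple binary XOR."""
    return a ^ b

def gf_mul(a: int, b: int) -> int:
    """Product via the logarithmic table: log(a*b) = log(a) + log(b)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_div(a: int, b: int) -> int:
    """Quotient via the logarithmic table: log(a/b) = log(a) - log(b)."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[LOG[a] - LOG[b] + 255]
```

Both tables together hold well under a thousand entries. Over GF(2^16) the same tables would need 65,536 entries each.
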
Reed-Solomon codes
Encoding
o Matrix-vector multiplication: X · G = Y
    X = source vector (k source symbols)
    G = generator matrix (k x n Vandermonde matrix)
    Y = encoded vector (n encoding symbols)
o Complexity: O(k^2) operations

Reed-Solomon codes
Decoding
o Solve a linear system: X · G' = Y'
    X  = source vector (k source symbols)
    G' = k x k submatrix of G (invertible)
    Y' = received vector (k received symbols)
o Good Vandermonde property: any k x k submatrix is invertible
  o k encoding symbols are enough to decode
  o decoding overhead = 0; said differently, RS codes are MDS
o Complexity: O(k^3)

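The following is a minimal sketch of Vandermonde-based encoding and decoding, not a production codec. It assumes GF(2^8) with the primitive polynomial 0x11D, evaluation points x_j = j + 1, and one-byte symbols; all function names are illustrative. Decoding solves the k x k system by Gauss-Jordan elimination.

```python
from itertools import combinations

# GF(2^8) log/exp tables (assumption: primitive polynomial 0x11D, generator 2).
EXP, LOG = [0] * 510, [0] * 256
value = 1
for power in range(255):
    EXP[power], LOG[value] = value, power
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for power in range(255, 510):
    EXP[power] = EXP[power - 255]

def gf_mul(a, b):
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]

def gf_inv(a):
    """Multiplicative inverse; a must be non-zero."""
    return EXP[255 - LOG[a]]

def vandermonde(k, n):
    """k x n matrix with G[i][j] = x_j^i, using distinct non-zero points x_j = j + 1."""
    return [[EXP[(LOG[j + 1] * i) % 255] for j in range(n)] for i in range(k)]

def rs_encode(source, G):
    """Y = X . G: each encoding symbol is a linear combination of the k source symbols."""
    k, n = len(G), len(G[0])
    encoded = []
    for j in range(n):
        acc = 0
        for i in range(k):
            acc ^= gf_mul(source[i], G[i][j])
        encoded.append(acc)
    return encoded

def rs_decode(received, G):
    """received maps a column index j to the symbol Y[j]; any k entries are enough."""
    k = len(G)
    cols = sorted(received)[:k]
    # One equation per received symbol: sum_i X[i] * G[i][j] = Y[j] (augmented matrix).
    A = [[G[i][j] for i in range(k)] + [received[j]] for j in cols]
    for c in range(k):
        pivot = next(r for r in range(c, k) if A[r][c] != 0)
        A[c], A[pivot] = A[pivot], A[c]
        inv = gf_inv(A[c][c])
        A[c] = [gf_mul(v, inv) for v in A[c]]          # normalise the pivot row
        for r in range(k):
            if r != c and A[r][c] != 0:
                f = A[r][c]
                A[r] = [v ^ gf_mul(f, w) for v, w in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]

k, n = 4, 7
G = vandermonde(k, n)
source = [0x43, 0x6F, 0x64, 0x65]
encoded = rs_encode(source, G)

# MDS property: every choice of k received symbols out of n recovers the source.
all_ok = all(
    rs_decode({j: encoded[j] for j in cols}, G) == source
    for cols in combinations(range(n), k)
)
```

The generator matrix here is not systematic, so the source symbols do not appear unchanged among the encoding symbols. Practical codecs usually transform the matrix so that they do.
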
Reed-Solomon codes: summary
Ideal (MDS) codes
o decoding overhead = 0
o decoding possible as soon as k symbols are received
... but limited scalability
o n <= 255: GF(2^8) is sufficient
  o fast operations over GF(2^8) (small logarithmic table)
  o decoding speed = a few tens of Mbps
o n > 255: use GF(2^16) or more
  o log table too large, cannot fit in cache
  o decoding speed falls to a few Mbps

LDPC codes
In short
o Low Density Parity Check (LDPC) codes
  o linear block codes
  o sparse parity check matrix
  o discovered by Gallager in the 1960s, rediscovered in the mid-1990s
o In general, encoding requires solving a linear system: O(k^3)
  o but high-performance, lightweight variants exist
o In the remainder we focus on binary LDPC codes
  o based on XOR operations

LDPC codes
LDPC-staircase codes (RFC 5170)
o a simple (trivial) parity check matrix structure

      Source symbols    Parity symbols
      S1 S2 S3 S4 S5    P1 P2 P3 P4 P5
       0  0  1  1  0     1  0  0  0  0
       1  0  0  1  1     1  1  0  0  0
       1  1  1  0  0     0  1  1  0  0
       0  1  0  1  1     0  0  1  1  0
       1  1  1  0  1     0  0  0  1  1

  Constraints, e.g. for the second row: S1 ⊕ S4 ⊕ S5 ⊕ P1 ⊕ P2 = 0
o a.k.a. double diagonal or Repeat Accumulate codes
o high encoding speed (encoding is trivial)
o recovery capabilities can be made close to ideal

LDPC codes
Encoding

      S1 S2 S3 S4 S5    P1 P2 P3 P4 P5
       0  0  1  1  0     1  0  0  0  0      S3 ⊕ S4 ⊕ P1 = 0
       1  0  0  1  1     1  1  0  0  0      S1 ⊕ S4 ⊕ S5 ⊕ P1 ⊕ P2 = 0
       1  1  1  0  0     0  1  1  0  0      S1 ⊕ S2 ⊕ S3 ⊕ P2 ⊕ P3 = 0
       0  1  0  1  1     0  0  1  1  0      S2 ⊕ S4 ⊕ S5 ⊕ P3 ⊕ P4 = 0
       1  1  1  0  1     0  0  0  1  1      S1 ⊕ S2 ⊕ S3 ⊕ S5 ⊕ P4 ⊕ P5 = 0

o Linear complexity: O(k)
Decoding
o solve a system of linear equations
o several techniques are feasible

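Because of the staircase structure, each constraint contains exactly one parity symbol that is not yet known, so encoding is a single pass over the rows. The sketch below uses the source part of the 5 x 10 matrix shown on the slide and treats each symbol as a one-byte integer; the function name and the sample values are illustrative.

```python
# Source part of the parity check matrix from the slide (columns S1..S5).
H_SOURCE = [
    [0, 0, 1, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [1, 1, 1, 0, 1],
]

def staircase_encode(source):
    """Compute P1..P5 so that every row of the parity check matrix XORs to zero."""
    parity = []
    previous = 0                      # there is no parity symbol before P1
    for row in H_SOURCE:
        acc = previous                # staircase: P_i depends on P_(i-1)
        for bit, symbol in zip(row, source):
            if bit:
                acc ^= symbol
        parity.append(acc)
        previous = acc
    return parity

source = [0x11, 0x22, 0x33, 0x44, 0x55]   # S1..S5, arbitrary sample values
parity = staircase_encode(source)          # P1..P5
```

Each source symbol is visited once per non-zero entry in its column, which is where the linear complexity comes from when the matrix is sparse.
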
LDPC codes
Sol.1: Iterative Decoding (ID)
o If an equation has only one unknown variable, this variable is equal to the sum (XOR) of the others; reiterate
o Efficient thanks to the sparseness of the parity check matrix
o Pros: low complexity (linear, O(k))
  o low CPU load and high sustainable bandwidth
o Cons: suboptimal in terms of correction capabilities
  o some full-rank systems cannot be solved

      Code rate (k=1000, N1=3)   Average overhead   Overhead for a failure probability of 10^-4
      2/3 (≈ 0.67)               9.99 %             13.93 %
      2/5 (= 0.4)                17.13 %            22.91 %

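The iterative rule can be sketched in a few lines. The example assumes the small 5 x 10 LDPC-staircase matrix with one-byte symbols and a codeword built from arbitrary source values; erased symbols are marked None and the function name is illustrative. The second erasure pattern shows the limitation: every equation keeps at least two unknowns, so the decoder stalls even though the underlying system is full rank.

```python
# Parity check matrix of the small LDPC-staircase example (columns S1..S5, P1..P5).
H = [
    [0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
]
# A valid codeword for H: arbitrary source values followed by their parity symbols.
CODEWORD = [0x11, 0x22, 0x33, 0x44, 0x55, 0x77, 0x77, 0x77, 0x44, 0x11]

def iterative_decode(symbols):
    """Fill in erased symbols (None) as long as some equation has a single unknown."""
    symbols = list(symbols)
    progress = True
    while progress:
        progress = False
        for row in H:
            unknown = [j for j, bit in enumerate(row) if bit and symbols[j] is None]
            if len(unknown) == 1:
                acc = 0
                for j, bit in enumerate(row):
                    if bit and symbols[j] is not None:
                        acc ^= symbols[j]
                symbols[unknown[0]] = acc      # the unknown is the XOR of the others
                progress = True
    return symbols

def erase(codeword, positions):
    return [None if j in positions else s for j, s in enumerate(codeword)]

recovered = iterative_decode(erase(CODEWORD, {0, 2, 6}))   # S1, S3, P2 erased: solved
stuck = iterative_decode(erase(CODEWORD, {0, 1, 4}))       # S1, S2, S5 erased: stalls
```
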
LDPC codes
Sol.2: Maximum Likelihood (ML) decoding
o Solve a linear system (Gaussian elimination, LU decomposition): x · A = b
    x = missing symbols
    A = submatrix of the generator matrix
    b = information from the received symbols
o Excellent erasure correction capabilities

      Code rate (k=1000, N1=5)   Average overhead   Overhead for a failure probability of 10^-4
      2/3 (≈ 0.67)               0.63 %             2.21 %
      2/5 (= 0.4)                2.04 %             4.41 %

o High complexity: O(k^3)

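A minimal sketch of ML decoding by Gaussian elimination follows. Note the swap: for brevity it builds the linear system from the parity check matrix H rather than from the generator matrix mentioned above. It assumes the small 5 x 10 LDPC-staircase matrix with one-byte symbols; the function name is illustrative. The erasure pattern used (S1, S2 and S5) leaves at least two unknowns in every equation that involves them, yet the system is full rank and is solved here.

```python
# Parity check matrix of the small LDPC-staircase example (columns S1..S5, P1..P5).
H = [
    [0, 0, 1, 1, 0, 1, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 1, 1, 0, 0],
    [0, 1, 0, 1, 1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 1, 0, 0, 0, 1, 1],
]
CODEWORD = [0x11, 0x22, 0x33, 0x44, 0x55, 0x77, 0x77, 0x77, 0x44, 0x11]

def ml_decode(symbols):
    """Recover erased symbols (None) by Gauss-Jordan elimination over GF(2)."""
    symbols = list(symbols)
    unknown = [j for j, s in enumerate(symbols) if s is None]
    # One equation per parity check: binary coefficients over the unknowns,
    # and a constant term equal to the XOR of the known symbols of that check.
    rows = []
    for row in H:
        rhs = 0
        for j, bit in enumerate(row):
            if bit and symbols[j] is not None:
                rhs ^= symbols[j]
        rows.append(([row[j] for j in unknown], rhs))
    pivot_row = 0
    for col in range(len(unknown)):
        sel = next((r for r in range(pivot_row, len(rows)) if rows[r][0][col]), None)
        if sel is None:
            raise ValueError("system is not full rank: decoding fails")
        rows[pivot_row], rows[sel] = rows[sel], rows[pivot_row]
        p_coeffs, p_rhs = rows[pivot_row]
        for r in range(len(rows)):
            if r != pivot_row and rows[r][0][col]:
                coeffs, rhs = rows[r]
                rows[r] = ([a ^ b for a, b in zip(coeffs, p_coeffs)], rhs ^ p_rhs)
        pivot_row += 1
    # After elimination, row i reads "unknown i = constant term".
    for i, j in enumerate(unknown):
        symbols[j] = rows[i][1]
    return symbols

erased = [None if j in {0, 1, 4} else s for j, s in enumerate(CODEWORD)]
recovered = ml_decode(erased)
```

The elimination step is what drives the cubic worst-case cost, in contrast with the single sweep per pass of iterative decoding.
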
LDPC codes
Sol.3: Hybrid ID/ML scheme
o Hybrid decoder
  o start decoding with ID (fast)
  o finish with ML if necessary (optimal)
o Excellent erasure correction capabilities
  o while remaining very fast

LDPC codes
Decoding speed of the hybrid decoder
o LDPC-staircase (N1=5), code rate 2/3, k=1,000
o compared with Reed-Solomon over GF(2^8)
[Figure: sustainable decoding speed (Mbps) as a function of the loss probability (%)]
o with RS: 54 Mbps
o while ID is sufficient: 32.4 times faster than RS (1.7 Gbps)
o when ML is needed more and more often: still 10.2 times faster (500 Mbps)

Application to distributed storage
Using replication:
o a file partitioned into 8 blocks
o each block is replicated 4 times
[Figure: Client_1, Client_2 and the storage nodes, holding blocks 1 3 6 7 | 2 4 5 8 | 1 3 4 6 | 1 2 6 8 | 3 4 6 7 | 2 3 5 7 | 2 5 7 8 | 1 4 5 8]
o can tolerate up to 3 failures

Application to distributed storage
Using erasure codes:
o a file encoded into 32 blocks: 8 source blocks + 24 repair blocks
[Figure: Client_1, Client_2 and the storage nodes, holding blocks A B C D | E F G H | 1 2 3 4 | M N O P | I J K L | U V W X | Q R S T | 5 6 7 8]
o can tolerate up to 6 failures, since 8 blocks are enough to decode

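The two failure tolerance figures can be checked by brute force. The sketch below assumes 8 storage nodes holding 4 blocks each, and assumes the replication layout is the sequence of block numbers from the replication figure grouped four per node in reading order. It also assumes an MDS erasure code, for which any 8 of the 32 blocks suffice. All names are illustrative.

```python
from itertools import combinations

NUM_NODES = 8
ALL_BLOCKS = set(range(1, 9))

# Assumed replication layout: block numbers of the figure, four per node.
REPLICATED_NODES = [
    {1, 3, 6, 7}, {2, 4, 5, 8}, {1, 3, 4, 6}, {1, 2, 6, 8},
    {3, 4, 6, 7}, {2, 3, 5, 7}, {2, 5, 7, 8}, {1, 4, 5, 8},
]

def max_failures_replication():
    """Largest f such that ANY f failed nodes still leave a copy of every block."""
    tolerated = 0
    for f in range(1, NUM_NODES + 1):
        for failed in combinations(range(NUM_NODES), f):
            alive = [REPLICATED_NODES[i] for i in range(NUM_NODES) if i not in failed]
            if set().union(*alive) != ALL_BLOCKS:
                return tolerated
        tolerated = f
    return tolerated

def max_failures_erasure_code(k=8, blocks_per_node=4):
    """Largest f such that the surviving nodes still hold at least k distinct blocks."""
    tolerated = 0
    for f in range(1, NUM_NODES + 1):
        if (NUM_NODES - f) * blocks_per_node < k:
            return tolerated
        tolerated = f
    return tolerated

replication_tolerance = max_failures_replication()
erasure_code_tolerance = max_failures_erasure_code()
```

Both schemes store 32 blocks in total, so the extra tolerance of the erasure-coded layout comes at the same storage cost.
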
Conclusion
Erasure codes
o add redundancy to combat symbol erasures
Reed-Solomon codes
o ideal codes (MDS), but inefficient for large objects
LDPC codes
o can encode large objects
o correction capabilities close to MDS
o high encoding and decoding speed

Questions?