Coding Techniques for Data Storage Systems
Thomas Mittelholzer, IBM Zurich Research Laboratory
Göttingen
Agenda
1. Channel Coding and Practical Coding Constraints
2. Linear Codes
3. Weight Enumerators and Error Rates
4. Soft Decoding
5. Codes on Graphs
6. Stopping Sets
7. Error Rate Analysis of Array-Based LDPC Codes
8. Summary
1. Channel Capacity and Coding

Discrete memoryless channel (DMC). Example: BSC(ε) = binary symmetric channel with crossover probability ε: input X, output Y, with P(Y ≠ X) = ε and P(Y = X) = 1 − ε.

Shannon 1948: channel capacity can be approached by coding.
Capacity: C = max_{P_X} I(X;Y)
(Recall: I(X;Y) = −Σ_y P(y) log P(y) + Σ_{x,y} P(x,y) log P(y|x).)

Channel Coding Theorem (Gallager): For every rate R < C, there is a code of length n and rate R such that under maximum-likelihood (ML) decoding the worst-case block error probability is arbitrarily small, i.e.,
P{X^ ≠ X}_worst case < 4 exp[−n E_r(R)],
where E_r(R) > 0 is the random coding exponent (which depends on the channel and R).

Practical problems:
1. Finding a good code is difficult.
2. ML decoding is complex, e.g., an exhaustive search over the code to find x^ = arg max_{x ∈ code} P(y|x).
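For the BSC(ε) the maximizing input distribution is uniform, giving the closed form C = 1 − h₂(ε) with h₂ the binary entropy function. A minimal sketch (the function names are illustrative, not from the slides):

```python
from math import log2

# Capacity of the BSC(eps): the maximizing input distribution is uniform,
# giving C = 1 - h2(eps), where h2 is the binary entropy function.
def h2(p):
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(eps):
    return 1.0 - h2(eps)

print(bsc_capacity(0.0))   # -> 1.0 (noiseless channel)
print(bsc_capacity(0.5))   # -> 0.0 (output independent of input)
print(bsc_capacity(0.11))  # ≈ 0.5
```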
1. Practical Coding Constraints of Data Storage Devices

Practical constraints of DRAM, hard disk drives and tape drives:

                                 DRAM (DDR3)     Hard disk drives          Tape drives
Reliability (max. error rates)   BER ~ 10^-...   BER ~ 10^-15              BER ~ 10^-17
Throughput (burst operation)     ~ 5 Gbit/s      ~ 1 Gbit/s                ~ 1 Gbit/s
Delay                            ~ 5 ns          ~ 3 ms (with seek time)   -
Codeword length (in bits)        ~ 72            ~ 4,500                   ~ 600,000

- For DRAM, the codeword length and/or decoding time cannot be much extended.
- For hard disk drives and tape drives, the codeword lengths and/or decoding time could be extended at the price of buffering more data.
- The low bit error rates (BER) cannot be verified by simulations.
⇒ Need coding schemes with efficient decoders whose error rates can be assessed analytically.
2. Linear Codes

Def.: A linear (n,k) code over a finite field GF(q) is a k-dimensional linear subspace C ⊆ GF(q)^n.

Examples:
1. Binary (3,1) repetition code C = {000, 111} ⊆ GF(2)^3
2. Ternary (3,1) repetition code C = {000, 111, 222} ⊆ GF(3)^3

Def.: Hamming distance between two vectors x = x_1 x_2 ... x_n and y = y_1 y_2 ... y_n in GF(q)^n:
d(x, y) = #{ i : x_i ≠ y_i } (= number of components in which x and y differ)
Hamming weight of a vector x: w(x) = d(x, 0)

Proposition: The pair (GF(q)^n, d(·,·)) is a metric space.

Def.: Minimum distance of a linear code C: d_min = min { d(x, y) : x, y ∈ C, x ≠ y }

Proposition: A linear code C with minimum distance d_min can correct up to t = ⌊(d_min − 1)/2⌋ errors.
Proof: Any error pattern e = e_1 e_2 ... e_n of weight w(e) ≤ t distorts a codeword x into a received word y = x + e, which lies in the Hamming sphere S(x,t) of radius t around the codeword x. Since the spheres S(x,t), x ∈ C, are pairwise disjoint, there is a unique closest codeword x to each such received word y.
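The definitions above are easy to check by brute force on a toy example. A minimal sketch for the ternary (3,1) repetition code from the slide:

```python
# Hamming distance, minimum distance and error-correction radius,
# checked on the ternary (3,1) repetition code C = {000, 111, 222} over GF(3)
def d(x, y):
    return sum(xi != yi for xi, yi in zip(x, y))

C = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]
d_min = min(d(x, y) for x in C for y in C if x != y)
t = (d_min - 1) // 2
print(d_min, t)   # -> 3 1
```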
2. Linear Codes: Hamming Codes

Description of a linear code C by parity-check matrices H and generator matrices G: ker H = C = im G.

Example: binary (n=7, k=4, d_min=3) Hamming code
C = { x = [x_1 ... x_7] : x H^T = 0 } = ker H, where the j-th column of H is the binary representation of j:

H = [ 1 0 1 0 1 0 1
      0 1 1 0 0 1 1
      0 0 0 1 1 1 1 ]

Equivalent systematic parity-check matrix H' (i.e., ker H' = C) and systematic generator matrix G:
H' = [ P | I ],  G = [ I | −P^T ]
⇒ G H'^T = [ I | −P^T ] [ P | I ]^T = P^T − P^T = 0

Encoding a 4-bit message u = [u_1 u_2 u_3 u_4] with the generator matrix G: x = u G.
If G is systematic, the encoder inverse map x ↦ u is a projection.
2. Linear Codes: Hamming Codes (cont'd)

The binary (7,4) Hamming code has d_min = 3:
- there is a weight-3 codeword
- no two columns of H are linearly dependent, i.e., there is no weight-2 codeword (and no weight-1 codeword)
⇒ The (7,4) Hamming code can correct t = 1 error.

Syndrome decoding: for an error pattern e = [e_1 ... e_7] of weight w(e) = 1 with its nonzero component at position j, the received word y = x + e satisfies
s = y H^T = (x + e) H^T = e H^T = [0 ... 0 1 0 ... 0] H^T = transpose of the j-th column of H (the syndrome).
⇒ Error correction at the j-th position: x^ = y + [0 ... 0 1 0 ... 0].

Decoding chain: u → encoder (x = uG) → BSC → y → syndrome former (s = yH^T) → decoder → x^ → encoder inverse → u^.
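With the column ordering of the slide (column j = binary representation of j), the syndrome read as an integer is directly the error position. A minimal sketch of single-error syndrome decoding:

```python
import itertools

# Parity-check matrix of the (7,4) Hamming code: column j is the binary
# representation of j (j = 1..7), as on the slide.
H = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]

def syndrome(y):
    # s = y H^T over GF(2); read as an integer it names the error position
    return sum(((sum(h[i] & y[i] for i in range(7)) % 2) << b)
               for b, h in enumerate(H))

# The code is the kernel of H
codewords = [c for c in itertools.product([0, 1], repeat=7)
             if syndrome(list(c)) == 0]
assert len(codewords) == 16       # 2^k codewords with k = 4

# Single-error correction: flip one bit of a codeword and decode
x = list(codewords[5])
y = x[:]
y[2] ^= 1            # error at position j = 3 (1-based)
j = syndrome(y)      # syndrome = binary representation of the error position
assert j == 3
y[j - 1] ^= 1        # correct the j-th position
assert y == x
```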
2. Linear Codes: Hamming Codes (cont'd)

Proposition: For every m > 1, there is a binary Hamming code of length n = 2^m − 1, dimension k = n − m and d_min = 3, characterized by the m×n parity-check matrix whose columns are the binary representations of all numbers 1, ..., 2^m − 1 (m rows).

Proposition: Hamming codes are perfect, i.e., the decoding spheres fill the entire space:
∪_{x ∈ C} S(x, t=1) = GF(2)^n
To show: 2^k · V_1 = 2^{n−m} (1 + n) = 2^n, where V_t = C(n,0) + C(n,1) + ... + C(n,t) is the volume of a Hamming sphere of radius t.

Remark: There is a (n=23, k=12, d_min=7) perfect Golay code (indeed, C(23,0) + C(23,1) + C(23,2) + C(23,3) = 2^11), and there are no other nontrivial perfect binary codes. → Sphere packing problems.

Applications: a shortened (n=72, k=64, d_min=4) Hamming code with an overall parity bit is used in DRAM standards. For a BSC with ε ~ 10^-..., one achieves a BER of about 10^-....
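Both sphere-packing identities above can be verified directly:

```python
from math import comb

# Hamming codes are perfect: |C| * V_1 = 2^(n-m) * (1 + n) = 2^n
for m in range(2, 8):
    n = 2**m - 1
    assert 2**(n - m) * (1 + n) == 2**n

# (23,12) Golay code: the radius-3 sphere volume is exactly 2^11,
# so 2^12 spheres of volume 2^11 fill GF(2)^23
assert sum(comb(23, i) for i in range(4)) == 2**11
print("sphere-packing identities hold")
```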
2. Linear Codes: Reed-Solomon Codes

Let α ∈ GF(q) be a primitive element, i.e., ⟨α⟩ = GF(q)\{0}, and let d > 1 be the design distance.

Def.: An (n = q − 1, k = n − d + 1) Reed-Solomon (RS) code is determined by the parity-check matrix

H = [ 1  α        α²           ...  α^{n−1}
      1  α²       (α²)²        ...  (α²)^{n−1}
      ...
      1  α^{d−1}  (α^{d−1})²   ...  (α^{d−1})^{n−1} ]   (d−1 rows)

Every (d−1)×(d−1) submatrix is a Vandermonde matrix based on mutually different terms α^{i_1}, α^{i_2}, ..., α^{i_{d−1}} and hence has full rank d−1. Thus, no d−1 columns of H are linearly dependent ⇒ d_min ≥ d.

Theorem: The minimum distance of the (n,k) RS code is d_min = d = n − k + 1.

Singleton bound on an (n, k, d_min) linear code: d_min ≤ n − k + 1.
(Let G be systematic. Then [0 ... 0 1] G = [0 ... 0 1 x_{k+1} x_{k+2} ... x_n] has weight at most n − k + 1.)

A linear code meeting the Singleton bound is called maximum-distance separable (MDS).
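For a prime field this can be checked exhaustively without a finite-field library. A sketch, using the (not from the slides) evaluation view of RS codes: codewords are evaluations of polynomials of degree < k at the points 1, α, ..., α^{n−1}, which all lie in ker H for the H above:

```python
from itertools import product

# Toy RS code over the prime field GF(7): q = 7, n = 6, design distance d = 3
q, n, d = 7, 6, 3
k = n - d + 1
alpha = 3  # 3 is primitive mod 7: powers 3, 2, 6, 4, 5, 1

# Parity-check matrix H[l][i] = (alpha^(l+1))^i, l = 0..d-2, i = 0..n-1
H = [[pow(alpha, (l + 1) * i, q) for i in range(n)] for l in range(d - 1)]

def evaluate(coeffs, x):
    return sum(c * pow(x, j, q) for j, c in enumerate(coeffs)) % q

points = [pow(alpha, i, q) for i in range(n)]
codewords = [tuple(evaluate(f, x) for x in points)
             for f in product(range(q), repeat=k)]

# Every codeword lies in ker H
assert all(sum(H[l][i] * c[i] for i in range(n)) % q == 0
           for c in codewords for l in range(d - 1))

# The Singleton bound is met with equality: d_min = n - k + 1
d_min = min(sum(ci != 0 for ci in c) for c in codewords if any(c))
assert d_min == n - k + 1 == 3
```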
3. Weight Enumerators

Let C be a linear (n, k, d_min) code over GF(q) and let A_i be the number of codewords of weight i.

Def.: A(z) = Σ_{i=0}^{n} A_i z^i is the weight enumerator of the code C.

Example: weight enumerator of the (n=7, k=4) binary Hamming code:
A_0 = 1, A_1 = A_2 = 0, A_3 = 7, A_4 = 7, A_5 = A_6 = 0, A_7 = 1
A(z) = 1 + 7z³ + 7z⁴ + z⁷

Theorem: The weight enumerator of the binary Hamming code of length n (= 2^m − 1) is given by
A(z) = [ (1+z)^n + n (1+z)^{(n−1)/2} (1−z)^{(n+1)/2} ] / (n + 1)

→ MacWilliams identities, automorphism group of a code.

Lattice of a linear (n,k) code: Λ = ρ^{-1}(C), where ρ : Z^n → (Z/2Z)^n. E.g., the Gosset lattice E_8 corresponds to the extended Hamming code of length 8.

(Figure: the codewords of the (7,4) Hamming code grouped by weight: weight 0, weight 3, weight 4, weight 7.)
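The weight distribution of the example is easy to obtain by brute force over the kernel of H:

```python
from itertools import product

# Brute-force weight distribution of the (7,4) Hamming code (C = ker H,
# column j of H = binary representation of j)
H = [[(j >> b) & 1 for j in range(1, 8)] for b in range(3)]

def in_code(x):
    return all(sum(hi & xi for hi, xi in zip(h, x)) % 2 == 0 for h in H)

A = [0] * 8
for x in product([0, 1], repeat=7):
    if in_code(x):
        A[sum(x)] += 1
print(A)  # -> [1, 0, 0, 7, 7, 0, 0, 1], i.e. A(z) = 1 + 7z^3 + 7z^4 + z^7
```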
3. Weight Enumerators and Error Rates

When using a linear (n,k) code C on the BSC(ε), what is the probability of an undetected error? An error is undetected iff the received word is a (different) codeword. By linearity, one can assume that the all-zero codeword was transmitted.

ε^i (1−ε)^{n−i} = prob. that the BSC transforms the all-zero word into a fixed codeword of weight i.

Prob. of an undetected error:
P_u(E) = Σ_{i=1}^{n} A_i ε^i (1−ε)^{n−i} = (1−ε)^n A(ε/(1−ε)) − (1−ε)^n

Remarks:
1. This result can be extended to the q-ary symmetric DMC with P(b|a) = 1 − P if a = b and P/(q−1) otherwise:
P_u(E) = (1−P)^n A( (P/(q−1)) / (1−P) ) − (1−P)^n

2. The weight distribution of (n,k,d) RS codes over GF(q) is given by A_0 = 1, A_1 = A_2 = ... = A_{d−1} = 0, and
A_l = C(n,l) (q−1) Σ_{j=0}^{l−d} (−1)^j C(l−1, j) q^{l−d−j}   for l = d, d+1, ..., n
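The sum over the weight distribution and the closed form via A(z) agree, as a quick numerical check on the (7,4) Hamming code shows:

```python
# P_u(E) for the (7,4) Hamming code on the BSC(eps), computed two ways
A = [1, 0, 0, 7, 7, 0, 0, 1]        # weight distribution from the slide
n = 7

def Pu_sum(eps):
    return sum(A[i] * eps**i * (1 - eps)**(n - i) for i in range(1, n + 1))

def Pu_enum(eps):
    z = eps / (1 - eps)
    Az = sum(a * z**i for i, a in enumerate(A))
    return (1 - eps)**n * Az - (1 - eps)**n

eps = 0.01
assert abs(Pu_sum(eps) - Pu_enum(eps)) < 1e-12
print(Pu_sum(eps))   # ≈ 6.8e-6, dominated by the A_3 eps^3 term
```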
3. Bounded Distance Decoding

Let C be a linear (n,k,d_min) code over GF(q), t = ⌊(d_min − 1)/2⌋, used on a symmetric DMC as above with error probability P. Codewords are selected equally likely (not needed for the result).

Chain: u → encoder (x = uG) → symmetric DMC → y → decoder: find x^ such that d(x^, y) ≤ t, otherwise declare a failure → encoder inverse → u^.

Block error prob. P_B = P{X^ ≠ X or failure}; prob. of correct decoding P_c = P{X^ = X} = 1 − P_B.

Theorem: P_c = Σ_{i=0}^{t} C(n,i) P^i (1−P)^{n−i}

Proof: Decoding is correct iff at most t errors occur. Each particular pattern of s errors has probability (P/(q−1))^s (1−P)^{n−s}; there are C(n,i) ways to select i error locations, and each location can take q−1 wrong values, so exactly i errors occur with probability C(n,i) P^i (1−P)^{n−i}.

(Figure: transmitted codeword with its decoding sphere S(x,t), and competing codewords x' with their decoding spheres S(x',t).)
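The theorem makes bounded distance decoding easy to evaluate numerically, e.g., for the (7,4) Hamming code (t = 1) on a BSC:

```python
from math import comb

def prob_correct(n, t, P):
    """Bounded-distance decoding is correct iff at most t channel errors occur."""
    return sum(comb(n, i) * P**i * (1 - P)**(n - i) for i in range(t + 1))

# (7,4) Hamming code, t = 1, on a BSC with P = 0.01
Pc = prob_correct(7, 1, 0.01)
print(1 - Pc)  # block error probability ≈ 2.0e-3
```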
3. Bounded Distance Decoding: Applications

LTO tape recording devices use two Reed-Solomon (RS) codes over GF(2^8), which are concatenated to form a product code:
- (240, 230, d=11) RS C1 code on the rows
- (96, 84, d=13) RS C2 code on the columns
(Data array with C1 parity appended to each row and C2 parity appended to each column.)

RS codes can be efficiently decoded by the Berlekamp-Massey algorithm, which achieves bounded distance decoding:
- C1 is decoded first; if the decoder fails, it outputs erasure symbols.
- C2 is decoded second. By taking erasures into account, performance improves substantially!

(Figure: C2 performance for N=96, t=6: block and byte error rates in error-correction mode and in erasure mode (with margin), plotted against the byte error probability at the input of the C2 decoder; 10^-17 limit.)
4. Soft Decoding: AWGN Channel

Binary-input additive white Gaussian noise (AWGN) channel: ±1-valued inputs x ∈ {±1} and real-valued outputs
Y = X + Z,  Z ~ N(0, σ²) zero-mean normal distribution with variance σ²,
so p_{Y|X}(y|x) = p_Z(y − x).

Bit-error rate (uncoded): BER = Q(√(2·SNR)), where SNR = E_b/N_0 = 1/(2σ²) and
Q(t) = (1/√(2π)) ∫_t^∞ exp(−s²/2) ds.

(Figure: the two conditional output densities p_{Y|X=−1} and p_{Y|X=+1}, Gaussians centered at −1 and +1.)
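The uncoded BER formula can be evaluated with the complementary error function, Q(t) = erfc(t/√2)/2:

```python
from math import sqrt, erfc

def Q(t):
    # Gaussian tail probability via the complementary error function
    return 0.5 * erfc(t / sqrt(2))

def uncoded_ber(ebn0_db):
    snr = 10 ** (ebn0_db / 10)      # SNR = Eb/N0 on a linear scale
    return Q(sqrt(2 * snr))

print(uncoded_ber(0.0))   # ≈ 7.9e-2
print(uncoded_ber(9.6))   # ≈ 1e-5 (the classical uncoded BPSK operating point)
```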
4. Soft Decoding vs. Hard Decoding

Soft channel outputs: X ∈ {±1} → AWGN → Y.
Hard decision channel: quantizing Y to y_HD ∈ {±1} yields a BSC(ε = BER).

Toy code with n=9, k=2, d_min = 6.

Different decoding algorithms:
- ML: maximum-likelihood decoding
- BP: graph-based decoding with belief propagation
- HD-ML: hard-decision ML decoding
- HD-BDD: hard-decision bounded distance decoding

Bounded distance decoding is far from optimum.

(Figure: block error rate vs. E_b/N_0 for P_HD-BDD, P_HD-ML, P_BP and P_ML, together with the capacity limit.)
4. Soft Decoding: Bit-Wise MAP Decoding

Codewords x of C are selected uniformly at random; X_i ∈ {±1} → AWGN → Y → MAP decoder w.r.t. code C → X_i^(y).

MAP (maximum a posteriori) decoding:
x_i^(y) = arg max_{x_i ∈ {±1}} P_{X_i|Y}(x_i | y)
        = arg max_{x_i ∈ {±1}} Σ_{~x_i} P_{X|Y}(x | y)                    (law of total probability)
        = arg max_{x_i ∈ {±1}} Σ_{~x_i} p_{Y|X}(y | x) P_X(x)             (Bayes rule)
        = arg max_{x_i ∈ {±1}} Σ_{~x_i} Π_l p_{Y_l|X_l}(y_l | x_l) 1{x ∈ C}   (uniform priors, memoryless channel)

where 1{·} is the indicator function and Σ_{~x_i} denotes summation over all components of x except x_i.
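For a tiny code the marginalization can be done exactly. A sketch for the binary (3,1) repetition code on the AWGN channel (σ and the observation are illustrative values, not from the slides):

```python
from math import exp, pi, sqrt

# Exact bit-wise MAP for the (3,1) repetition code {+1+1+1, -1-1-1} on AWGN
sigma = 0.8
code = [(1, 1, 1), (-1, -1, -1)]

def likelihood(y, x):
    # memoryless channel: product of per-symbol Gaussian densities p_Z(y_l - x_l)
    p = 1.0
    for yl, xl in zip(y, x):
        p *= exp(-(yl - xl) ** 2 / (2 * sigma ** 2)) / (sqrt(2 * pi) * sigma)
    return p

def map_bit(y, i):
    # marginalize the joint posterior over all codewords, keeping component i;
    # the uniform prior P_X(x) cancels in the arg max
    scores = {+1: 0.0, -1: 0.0}
    for x in code:
        scores[x[i]] += likelihood(y, x)
    return max(scores, key=scores.get)

y = (0.9, -0.3, 0.4)   # noisy observation: two positive samples, one negative
print([map_bit(y, i) for i in range(3)])   # -> [1, 1, 1]
```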
5. Codes on Graphs

Example: the binary length-4 repetition code C = {0000, 1111} can be characterized by
1{x ∈ C} = 1{x1+x2+x3+x4 = 0} · 1{x1+x2 = 0} · 1{x1+x3 = 0},
i.e., by the parity-check matrix

H = [ 1 1 1 1
      1 1 0 0
      1 0 1 0 ]

The code membership function decomposes into factors ⇒ factor graph of the code:
variable nodes V: x1, x2, x3, x4; check nodes F: f1, f2, f3; bipartite graph Γ = Γ_H (factor graph or Tanner graph).

H is related to the adjacency matrix of Γ by A_Γ = [ 0  H^T ; H  0 ].
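The factorization of the membership function can be checked directly, assuming the H given above:

```python
from itertools import product

# Membership function of the length-4 repetition code = product of local checks
H = [(1, 1, 1, 1), (1, 1, 0, 0), (1, 0, 1, 0)]

def factor(h, x):
    # local check node function: 1 iff the parity constraint is satisfied
    return 1 if sum(hi & xi for hi, xi in zip(h, x)) % 2 == 0 else 0

members = [x for x in product([0, 1], repeat=4)
           if all(factor(h, x) for h in H)]
print(members)   # -> [(0, 0, 0, 0), (1, 1, 1, 1)]
```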
5. Codes on Graphs and Message Passing

Example (cont'd): the variable nodes are initialized with the channel output values p(y_i | x_i), i.e., p(y_1|x_1), ..., p(y_4|x_4).

Message-passing rules ("belief propagation").

Motivation for the message-passing rules: local MAP decoding. If the graph has no cycles, global MAP decoding is achieved.
5. Codes on Graphs: LDPC Codes

Regular LDPC codes:
- fixed degree l at the variable nodes, fixed degree r at the check nodes
- number of edges: n·l = r·#{check nodes}
- there are (n·l)! such codes (graphs)

Array-based LDPC codes: let q be a prime, j ≤ q, and let P denote the q×q cyclic permutation matrix. Then

H = [ I   I        I           ...  I
      I   P        P²          ...  P^{q−1}
      I   P²       P⁴          ...  P^{2(q−1)}
      ...
      I   P^{j−1}  P^{2(j−1)}  ...  P^{(j−1)(q−1)} ]

H has column weight j and row weight q, and the code has length n = q². The shortest cycle in the graph Γ_H has length 6.
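A minimal sketch of the construction, block (i,k) = P^{i·k}, for the toy parameters q = 5, j = 3 (chosen for illustration):

```python
# Parity-check matrix of an array-based LDPC code: block (i,k) = P^(i*k),
# where P is the q x q cyclic permutation matrix (here q = 5, j = 3)
q, j = 5, 3

def perm_power(m):
    # P^m: entry (r, c) = 1 iff c = r + m (mod q)
    return [[1 if (r + m) % q == c else 0 for c in range(q)] for r in range(q)]

# Assemble the (j*q) x (q^2) matrix H from the j x q grid of blocks
H = [sum((perm_power(i * k)[r] for k in range(q)), [])
     for i in range(j) for r in range(q)]

assert len(H) == j * q and len(H[0]) == q * q
assert all(sum(row) == q for row in H)                    # row weight q
assert all(sum(H[r][c] for r in range(j * q)) == j        # column weight j
           for c in range(q * q))
print("H is", len(H), "x", len(H[0]))
```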
5. Codes on Graphs: Performance of LDPC Codes

- Low-density parity-check (LDPC) codes introduced by Gallager (1963).
- Very good performance under message passing (BP) (MacKay & Neal 1995).
- Sparse H matrix → few cycles in Γ_H → good performance of message passing.
- Depending on the degree structure of the nodes in Γ_H, there is a threshold T such that message passing decodes successfully if SNR > T and n → ∞.
  N.B. (i) T > SNR(capacity); (ii) Γ_H can be designed such that the gap to capacity is < 0.0045 dB (Chung, Forney, Richardson, Urbanke, IEEE Comm. Letters, 2001).

Example: (n=2209, k=2024) array-based LDPC code, row weight 47, column weight 4.
(Figure: rate-2024/2209 (j=4, q=47) array code on the AWGN channel: P_Block and P_bit vs. E_b/N_0, the BLER/BER capacity limits, and the range T of successful density evolution.)
5. Codes on Graphs: Finite-Length Performance

Short LDPC codes typically have an error-floor region due to cycles in the graph. How can the performance be assessed in the error-floor region?

(Figure: rate-2024/2209 (j=4, q=47) array code on the AWGN channel: P_Block and P_bit vs. E_b/N_0, with the capacity limits, the range T of successful density evolution, and the error-floor region. Data points obtained with a dedicated FPGA hardware decoder [L. Dolecek, Z. Zhang, V. Anantharam, M.J. Wainwright, B. Nikolic, IEEE Trans. Information Theory, 2010].)
6. Stopping Sets

Motivation: analyze the behavior of message passing on the binary erasure channel BEC(ε): the input X is received correctly with probability 1 − ε and mapped to the erasure symbol Δ with probability ε.

Consider (short) erasure patterns, i.e., subsets D of the variable nodes V, for which the message-passing decoder fails.

Def.: A stopping set is a subset D ⊆ V such that all neighbors of D are connected to D at least twice.

Example: binary n=4 repetition code with parity-check matrix

H = [ 1 1 1 1
      1 1 0 0
      1 0 1 0 ]

The decoder fails if x1, x2, x3 are erased (whether or not x4 is erased). There are two nonempty stopping sets:
D = {1, 2, 3} and D = {1, 2, 3, 4} (= support of a codeword)
⇒ P_F = ε³(1−ε) + ε⁴

Decoding failure probability: P_F = Σ_{D : nonempty stopping set} ε^{|D|} (1−ε)^{n−|D|}
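The stopping sets of the example can be enumerated by brute force (variables indexed 0..3, assuming the H above):

```python
from itertools import combinations

# Stopping sets of the length-4 repetition code with the H from the slide
H = [(1, 1, 1, 1), (1, 1, 0, 0), (1, 0, 1, 0)]
V = range(4)

def is_stopping_set(D):
    # every check node with a neighbor in D must touch D at least twice,
    # i.e., no check is connected to D exactly once
    return all(sum(h[v] for v in D) != 1 for h in H)

stopping = [set(D) for r in range(1, 5) for D in combinations(V, r)
            if is_stopping_set(D)]
print(stopping)   # -> [{0, 1, 2}, {0, 1, 2, 3}]

# Decoding failure probability on the BEC(eps)
eps = 0.1
PF = sum(eps**len(D) * (1 - eps)**(4 - len(D)) for D in stopping)
assert abs(PF - (eps**3 * (1 - eps) + eps**4)) < 1e-15
```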
6. Finding Stopping Sets

Minimum distance problem:
Instance: a (random) binary m×n matrix H and an integer w > 0.
Question: is there a nonzero vector x ∈ GF(2)^n of weight ≤ w such that x H^T = 0?

Theorem (Vardy 1997): The minimum distance problem is NP-complete.
Corollary: Finding the minimum distance or the weight distribution of a linear code is NP-hard.
Theorem (Krishnan & Shankar 2006): Finding the cardinality of the smallest nonempty stopping set is NP-hard.

Average ensemble performance (Di et al., IT-48, 2002): for an LDPC code ensemble of length n with fixed variable and check node degree distributions, the probability that s chosen variable nodes contain a stopping set is B(s)/T(s). Average failure probability:
E[P_F] = Σ_{s=1}^{n} C(n,s) ε^s (1−ε)^{n−s} B(s)/T(s)
6. Error Rate Analysis of Array-Based LDPC Codes

Motivation: consider subsets D of the variable nodes V which remain erroneous under the bit-flipping algorithm.

For D ⊆ V, decompose the set of neighboring check nodes N(D) into those with unsatisfied checks, O(D), and those with satisfied checks, E(D).

Def.: An (a,b) absorbing set D ⊆ V is characterized by
- |D| = a and |O(D)| = b
- every node in D has fewer neighbors in O(D) than in E(D).

Def.: An (a,b) fully absorbing set is an absorbing set such that, in addition, all nodes in V \ D have more neighbors in F \ O(D) than in O(D), where F is the set of all checks.

(Figure: example of a (3,3) fully absorbing set D, with its satisfied neighboring checks E(D), unsatisfied neighboring checks O(D) ⊆ N(D), and the set F of all checks.)
6. Error Rate Analysis of Array-Based LDPC Codes

Theorem: The minimum distance of column-weight-3 array LDPC codes is 6, with multiplicity (q−1)q.

Theorem [DZAWN, IT-2010]: For the family of column-weight-3, row-weight-q array codes, the minimal absorbing sets and minimal fully absorbing sets are of size (3,3) and (4,2), respectively. Their numbers grow as (q−1)q.

Absorbing-set bound on the block error probability: let D be a (3,3) absorbing set; then
P_B ≳ (q−1)q · P{Y ∈ absorbing region of D},
where (q−1)q is the number of (3,3) absorbing sets and P{Y ∈ absorbing region of D} is easily estimated by simulation.

(Figure: error-floor performance of the (2209, 2070) array LDPC code: union bound and absorbing-set bound for the rate-2070/2209 (j=3, q=47) array code on the AWGN channel; P_Block vs. E_b/N_0 with the capacity limit.)
6. Error Rate Analysis of Array-Based LDPC Codes (cont'd)

Theorem [DZAWN, IT-2010]: For the family of column-weight-4, row-weight-q array codes with q > 19, the minimal absorbing sets and minimal fully absorbing sets are of size (6,4). Their numbers grow as q³ (up to a constant factor).

Example: error-floor performance of the (2209, 2024) array LDPC code.
Ref.: L. Dolecek, Z. Zhang, V. Anantharam, M.J. Wainwright, B. Nikolic, IEEE JSAC, vol. 27(6), 2009.
Summary

In storage applications, Hamming codes and Reed-Solomon codes are used (delay constraints). These codes are optimal with respect to the Hamming bound and the Singleton bound, i.e., in terms of algebraic coding criteria.

The weight enumerator of these codes is known:
- analytical evaluation of the undetected error probability
- error rate performance of bounded distance decoding

Bounded distance decoding is not optimum for the additive white Gaussian noise channel, but maximum-likelihood decoding is too complex. Iterative BP decoding of LDPC codes achieves almost capacity at low complexity.

Error-floor performance of short LDPC codes is an issue:
- analytical results exist only for a small class of codes (e.g., array-based LDPC codes)
- open problems: characterization of stopping sets / absorbing sets for special classes of LDPC codes
References

Algebraic Coding and Information Theory
- Friedrich Hirzebruch, Codierungstheorie und ihre Beziehung zu Geometrie und Zahlentheorie, Rheinisch-Westfälische Akademie der Wissenschaften, Vorträge N 37, Westdeutscher Verlag, 1989
- J.H. van Lint, Introduction to Coding Theory, GTM vol. 86, Springer, 1982
- Robert G. Gallager, Information Theory and Reliable Communication, John Wiley & Sons, N.Y., 1968
- A. Vardy, "The Intractability of Computing the Minimum Distance of a Code," IEEE Trans. Information Theory, vol. 43(6), pp. 1757-1766, Nov. 1997

Codes on Graphs
- T. Richardson & R. Urbanke, Modern Coding Theory, Cambridge Univ. Press, N.Y., 2008
- L. Dolecek, Z. Zhang, V. Anantharam, M.J. Wainwright, B. Nikolic, "Analysis of Absorbing Sets and Fully Absorbing Sets of Array-Based LDPC Codes," IEEE Trans. Information Theory, vol. 56(1), pp. 181-201, Jan. 2010
Back-Up
Bounded Distance Decoding of Toy Code

Although suboptimal, the performance of bounded distance decoding can be easily evaluated.

(Figure: block error rate vs. E_b/N_0 for P_HD-BDD and P_HD-ML of the toy code.)
Soft Decoding Analysis of Toy Code

The toy code is an (n=9, k=2) array-based LDPC code with q = 3 = j.
- The BP performance is determined by the 18 absorbing sets of type (3,3).
- The ML performance is determined by the 3 weight-6 codewords.

(Figure: block error rate vs. E_b/N_0: P_F^BP and P_F^ML, together with the ML union bound and the absorbing-set bound.)
Evolution of BP for Toy Code: Absorbing Sets

Input: a noisy version of the absorbing set {3, 4, 9}.

(Table: evolution of the outputs P(x_i = 1), i = 1, ..., 9, over the iterations of the BP algorithm.)

The output alternates between the three absorbing sets {3, 4, 9}, {1, 5, 7}, {2, 6, 8}.
Other possible behaviors:
- the output stays on the same absorbing set
- the output converges to a (possibly wrong) codeword