Graph-based codes for flash memory
Discrete Mathematics Seminar, September 3, 2013
Katie Haymaker
Joint work with Professor Christine Kelley, University of Nebraska-Lincoln
Outline
1 Background: brief introduction to coding theory, low-density parity-check codes, structure of flash memory
2 Bit assignments for binary regular LDPC codes
3 Designing structured nonbinary LDPC codes using binary images
What is coding theory?
Coding theory is the study of the reliable transmission of information. The rate of a code is the ratio of information symbols to transmitted symbols.
A bit of communication theory
A communication channel is a system in which the output depends probabilistically on the input. Each channel has an associated parameter called the channel capacity. Shannon [1] showed that coding can be used to transmit information at a rate close to the capacity of a channel.
Noisy channel coding theorem: For any code rate R below the channel capacity, and for any ε > 0, there exists a (possibly very long) code and a decoding algorithm such that it is possible to transmit information at rate R with probability of decoding error less than ε.
[1] Claude Shannon, A mathematical theory of communication, 1948.
A common channel
The binary symmetric channel has binary input and output, and a transition probability p: each transmitted bit is flipped independently with probability p.
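As a concrete sketch of the binary symmetric channel (the helper name `bsc` is ours, not from the slides), the channel can be simulated in a few lines:

```python
import random

def bsc(bits, p, rng=None):
    """Pass a bit sequence through a binary symmetric channel:
    each bit is flipped independently with probability p."""
    rng = rng or random.Random(0)
    return [b ^ (rng.random() < p) for b in bits]

# p = 0 leaves the word untouched; p = 1 flips every bit
assert bsc([0, 1, 0, 1], 0.0) == [0, 1, 0, 1]
assert bsc([0, 1, 0, 1], 1.0) == [1, 0, 1, 0]
```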
Simple repetition code
Example: the binary repetition code of length 3 repeats the information symbol 3 times: 1 becomes 111, 0 becomes 000. If at most one error occurs, this can be decoded correctly using majority rule. In general, to correct t errors, the symbol must be repeated 2t + 1 times. The rate of this code is 1/n, where n is the length of the codewords.
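The encode and majority-rule decode described above fit in a few lines (function names are ours, for illustration):

```python
def rep_encode(bit, t):
    # repeat the bit 2t + 1 times, enough to correct up to t errors
    return [bit] * (2 * t + 1)

def rep_decode(received):
    # majority rule: more than half of the copies decide the bit
    return int(sum(received) > len(received) // 2)

# length-3 code: one flipped copy is still decoded correctly
assert rep_decode([1, 0, 1]) == 1
```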
Parameters of error-correcting codes
A linear error-correcting code C is a subspace of a finite-dimensional vector space over a finite field GF(q). The dimension n of the vector space is the blocklength of the code. The dimension k of the subspace is the number of information symbols. The Hamming distance between two vectors is the number of positions in which they differ. Let d denote the minimum Hamming distance between any two distinct codewords. Notation: C is an [n, k, d] code.
Code representations
Since C is a subspace, we can use a matrix to define the code: a parity check matrix is a matrix H whose kernel is C, i.e., v ∈ C if and only if Hv^T = 0.
Example: a parity check matrix of the [7, 4, 3] Hamming code is:

H = [ 0 0 0 1 1 1 1 ]
    [ 0 1 1 0 0 1 1 ]
    [ 1 0 1 0 1 0 1 ]

The three rows of H determine the following parity check equations. For any codeword (x_1, ..., x_7) ∈ C ⊆ GF(2)^7,
x_4 + x_5 + x_6 + x_7 ≡ 0 (mod 2)   (1)
x_2 + x_3 + x_6 + x_7 ≡ 0 (mod 2)   (2)
x_1 + x_3 + x_5 + x_7 ≡ 0 (mod 2)   (3)
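The syndrome computation Hv^T can be checked directly; here is a small sketch with the Hamming matrix above (the sample word is our own illustration):

```python
import numpy as np

# Parity check matrix of the [7,4,3] Hamming code
H = np.array([[0, 0, 0, 1, 1, 1, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [1, 0, 1, 0, 1, 0, 1]])

def syndrome(v):
    # Hv^T mod 2; the zero vector certifies membership in the code
    return H @ v % 2

v = np.array([1, 1, 0, 0, 1, 1, 0])   # all three parity checks vanish
assert not syndrome(v).any()

# A single error in position i produces the i-th column of H as syndrome,
# which is how Hamming decoding locates the error.
e = v.copy(); e[2] ^= 1               # flip x_3
assert list(syndrome(e)) == [0, 1, 1]  # column 3 of H
```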
LDPC codes
A low-density parity-check (LDPC) code C over GF(2) is the kernel of a matrix H that is sparse in the number of nonzero entries. Tanner graph representation:

H = [ 1 1 1 0 0 0 0 ]
    [ 1 0 0 1 1 0 0 ]
    [ 0 1 0 1 0 1 0 ]
    [ 1 0 0 0 0 1 1 ]
    [ 0 0 1 1 0 0 1 ]
    [ 0 0 1 0 1 1 0 ]
    [ 0 1 0 0 1 0 1 ]

(1, 0, 1, 0, 1, 0, 1) is a codeword.
Message-passing decoding
Idea: The variable nodes receive information from the channel and use it to form a binary codeword estimate. These estimates are sent along the edges of the Tanner graph to the neighboring check nodes. Each check node sends messages back to each of its neighboring variable nodes, but the messages only use extrinsic information. The variable nodes then update their messages, again using extrinsic information.
Gallager hard-decision decoding example
Message-passing decoding on a Tanner graph for the [7, 3, 4] code shown earlier.
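A minimal sketch of hard-decision decoding on this code, using the simple bit-flipping variant rather than the full Gallager A message-passing schedule (which exchanges extrinsic messages along edges):

```python
import numpy as np

# Parity check matrix of the (3,3)-regular [7,3,4] code from the earlier slide
H = np.array([[1, 1, 1, 0, 0, 0, 0],
              [1, 0, 0, 1, 1, 0, 0],
              [0, 1, 0, 1, 0, 1, 0],
              [1, 0, 0, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 0, 1],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1]])

def bit_flip_decode(y, H, max_iters=20):
    """Hard-decision decoding: repeatedly flip the bit appearing in the
    most unsatisfied checks until the syndrome is zero (or we give up)."""
    y = y.copy()
    for _ in range(max_iters):
        syn = H @ y % 2
        if not syn.any():
            return y                 # all parity checks satisfied
        counts = H.T @ syn           # unsatisfied checks touching each bit
        y[np.argmax(counts)] ^= 1
    return y

# codeword 1010101 with its first bit flipped is recovered
y = np.array([0, 0, 1, 0, 1, 0, 1])
assert list(bit_flip_decode(y, H)) == [1, 0, 1, 0, 1, 0, 1]
```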
Flash memory structure
Flash memory is a non-volatile storage medium used in USB drives, cameras, phones, hybrid computer hard drives, etc. The memory is organized into blocks of about 10^5 cells, each of which can be charged up to one of q levels. In MLC (multilevel cell) flash, each cell can hold one of four symbols that may be viewed as binary 2-tuples [2]. The left-most bit is called the most significant bit (MSB) and the right-most bit is the least significant bit (LSB).
[2] E. Yaakobi, et al., Characterization and error-correcting codes for TLC flash memories, IEEE ICCNC, Jan. 2012.
Bit-error probabilities in MLC flash memory
The two bits of a single cell are distributed among MSB pages and LSB pages. In MLC flash, a large majority of errors are single-bit errors. Moreover, MSB pages have a lower page error rate than LSB pages [3]. Say that b_1 is the channel error probability for MSBs and b_2 is the error probability for the LSBs. Thus b_1 < b_2.
[3] E. Yaakobi, et al., Error characterization and coding schemes for flash memory, IEEE GLOBECOM Workshops, Dec. 2010.
Using standard codes in flash memory
Currently, binary codes are used in MLC flash memory in two ways:
- Bit-interleaved coded modulation (BICM): alternating MSB/LSB
- Multilevel coding: separate codes for MSB and LSB pages
Question: Given a code, what is the best way to assign coded bits to pages for improved performance?
Check node types
We consider (j, k)-regular LDPC codes. Question: given such a code, how should we assign bits to MSB and LSB pages?
Definition: a check node has type (α, β), denoted T(α, β), if it has α MSB neighbors and β LSB neighbors.
For 0 ≤ g ≤ 1, let g be the fraction of check nodes having type T(α_1, β_1) and (1 − g) be the fraction of check nodes having type T(α_2, β_2), where β_1 = k − α_1 and β_2 = k − α_2. Consider cases where α_1 ≤ α_2. Let l be the number of check nodes. Since half of the variable nodes are necessarily assigned to MSB pages and the other half to LSB pages, the following constraint holds:

α_1 g l + α_2 (1 − g) l = kl/2
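Dividing the constraint by l gives α_1 g + α_2 (1 − g) = k/2, which can be solved for g directly; a small sketch (the helper `balanced_g` is our own name):

```python
from fractions import Fraction

def balanced_g(k, a1, a2):
    """Solve a1*g + a2*(1 - g) = k/2 for the fraction g of T(a1, k - a1)
    checks, so that exactly half of the coded bits land on MSB pages."""
    return Fraction(Fraction(k, 2) - a2, a1 - a2)

# For k = 6: the pair T(1,5) / T(5,1) needs g = 1/2,
# while T(2,4) paired with T(5,1) needs g = 2/3.
assert balanced_g(6, 1, 5) == Fraction(1, 2)
assert balanced_g(6, 2, 5) == Fraction(2, 3)
```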
Gallager A and B hard decision decoding
Figure: Extrinsic information in message passing decoding.
Gallager A algorithm [4]: all check node neighbors of a given variable node v must agree (except the neighbor c that v is sending to) in order to change the value that v sends to c in the next iteration.
Gallager B algorithm: at least half of the check neighbors must agree in order to change the value that v sends on the next iteration.
[4] R. G. Gallager, Low density parity check codes, IRE Trans. on Info. Theory, Jan. 1962.
Notation: the probability that an erroneous message is sent from an MSB variable node to a neighboring check node on the (t + 1)th iteration is denoted by p_M^(t+1) (define p_L^(t+1) similarly). Let q_M^(t) and q_L^(t) denote the probability that a message sent on iteration t from a check node to an MSB or LSB, respectively, is in error.
Probability of error using iterative decoding
Lemma: If x_1 and x_2 are the number of MSB and LSB neighbors, respectively, involved in a message update at a check node c, then the probability that the message from c is in error on iteration t is

q^(t) = [1 − (1 − 2 p_M^(t))^{x_1} (1 − 2 p_L^(t))^{x_2}] / 2,

where x_1 + x_2 = k − 1, since the check node has degree k and the variable node receiving the message is not included in the message update. Moreover,

p_M^(t+1) = b_1 (1 − (1 − q^(t))^{j−1}) + (1 − b_1) (q^(t))^{j−1}, and
p_L^(t+1) = b_2 (1 − (1 − q^(t))^{j−1}) + (1 − b_2) (q^(t))^{j−1}.
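The lemma's recursion can be iterated numerically, which is how density evolution thresholds like those in the next figures are computed. A minimal sketch (our own function names; check types are weighted by the fraction of MSB or LSB edges they carry, an assumption consistent with the lemma but not spelled out on the slide):

```python
def check_err(pM, pL, x1, x2):
    # probability that the XOR of x1 MSB and x2 LSB messages is in error
    return (1 - (1 - 2 * pM) ** x1 * (1 - 2 * pL) ** x2) / 2

def density_evolution(b1, b2, j, k, types, iters=200):
    """types: list of (fraction g_i, alpha_i) pairs; beta_i = k - alpha_i.
    Returns (pM, pL) after `iters` Gallager A iterations."""
    pM, pL = b1, b2
    # edge-perspective weights for messages arriving at MSB / LSB nodes
    wM = [g * a for g, a in types]
    wL = [g * (k - a) for g, a in types]
    sM, sL = sum(wM), sum(wL)
    for _ in range(iters):
        qM = sum(w * check_err(pM, pL, a - 1, k - a)
                 for w, (_, a) in zip(wM, types)) / sM
        qL = sum(w * check_err(pM, pL, a, k - a - 1)
                 for w, (_, a) in zip(wL, types)) / sL
        pM = b1 * (1 - (1 - qM) ** (j - 1)) + (1 - b1) * qM ** (j - 1)
        pL = b2 * (1 - (1 - qL) ** (j - 1)) + (1 - b2) * qL ** (j - 1)
    return pM, pL

# (3,6)-regular graph, g = 1/2 with types T(1,5) and T(5,1); channel
# error probabilities well below threshold, so errors are driven to zero
pM, pL = density_evolution(0.01, 0.01, 3, 6, [(0.5, 1), (0.5, 5)])
assert pM < 1e-9 and pL < 1e-9
```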
Results for binary regular codes using Gallager A
Our results indicate that for a fixed probability b_1, the largest worst-case value of b_2 (the density evolution threshold) occurs when g = 1/2 and the two check types are T(1, k − 1) and T(k − 1, 1). That is, codes having highly unbalanced check nodes with respect to MSBs and LSBs will perform better than the expected result of standard BICM, which yields on average half MSB and half LSB neighbors at each check node.
Figure: Thresholds for structured bit-interleaved (3, 6)-regular codes and the corresponding random code (b_2 threshold vs. b_1; curves for Random, g = 1/2 T(1,5), g = 1/2 T(2,4), and g = 2/3 T(2,4)).
Figure: Zoom-in of the previous figure to small b_1 values, specifically where b_1 < b_2. A higher b_2 threshold indicates a stronger code.
Comparison
The b_2 threshold graphs for (3, 16), (3, 30), and (4, 8)-regular codes show the same trends: the unbalanced check node types have larger b_2 thresholds than the average case.
Figure: Thresholds for (3, 16)-regular codes (curves for g = 1/2 T(1,15) through T(4,12), g = 2/3 T(5,11), g = 3/4 T(6,10), g = 7/8 T(7,9), and Random).
Figure: Thresholds for (3, 30)-regular codes, random vs. g = 1/2 and T(1, 29).
Figure: Zoom-in of the plot on the left to small b_1 values, specifically where b_1 < b_2.
Figure: Thresholds for structured bit-interleaved (4, 8)-regular codes and the corresponding random code (curves for Random, g = 1/4 and g = 1/2 T(1,7), g = 1/3 and g = 1/2 T(2,6), g = 1/2 and g = 3/4 T(3,5)).
In the case of j = 3 and j = 4, the expressions for p_M^(t+1) and p_L^(t+1) are the same for both Algorithms A and B, and thus the results in the previous section are the same for Gallager B decoding. Using Gallager B decoding, unbalanced check node types still outperform BICM in the case of (5, 10)-regular bipartite graphs.
Next step: use these observations to assign nonbinary edge labels to regular bipartite graphs to form structured nonbinary LDPC codes for flash memory.
Implementing nonbinary codes in MLC flash
We construct nonbinary codes from an underlying (j, k)-regular graph by adding edge labels from GF(4) that give the desired types of check nodes. The binary image graph that we introduce next gives a natural assignment of bits to pages and results in two types of check nodes.
Binary matrix representation
Let r be a root of the primitive polynomial g(x) = x^2 + x + 1. Then the binary matrix representation of the elements of GF(4) is:

1 = [ 1 0 ]        r = A = [ 0 1 ]
    [ 0 1 ]                [ 1 1 ]

r^2 = A^2 = [ 1 1 ]    0 = [ 0 0 ]
            [ 1 0 ]        [ 0 0 ]
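The matrices {0, I, A, A^2} multiply mod 2 exactly as the field elements {0, 1, r, r^2} do, since A is the companion matrix of g(x). A quick sketch verifying this:

```python
import numpy as np

# Companion matrix of g(x) = x^2 + x + 1 over GF(2); it represents r
A = np.array([[0, 1],
              [1, 1]])
I = np.eye(2, dtype=int)

def gf2mul(X, Y):
    # matrix product with entries reduced mod 2
    return X @ Y % 2

A2 = gf2mul(A, A)
assert (A2 == np.array([[1, 1], [1, 0]])).all()  # A^2 represents r^2
assert (gf2mul(A, A2) == I).all()                # A^3 = I: r has order 3
```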
Examples of a binary image graph
The binary image graph of a code is the Tanner graph obtained from the binary image parity-check matrix. Adding edge labels from GF(4) to an arbitrary (j, k)-regular graph results in a binary image graph with left degrees from {j, j + 1, ..., 2j} and right degrees from {k, k + 1, ..., 2k}.
Figure: Nonbinary and binary graph representations of a code over GF(4).
Check node types in the binary image graph
Fix a set of labels {r_1, ..., r_k}, where r_i ∈ GF(4), such that at each check node these k labels are randomly ordered and assigned to its incident edges. The resulting binary image graph has binary check nodes c_{i1} and c_{i2} for each check node c_i in the nonbinary Tanner graph (where i = 1, ..., l).
Figure: The left graph has edge labels from GF(4). The binary image graph on the right has check c_1 of type T(3, 1) and check c_2 of type T(1, 2).
Assigning edge-labels to a (3, 6)-regular graph

Edge labels               | Type 1  | Type 2  | M (MSB degree distribution)         | L (LSB degree distribution)
{1, 1, A, A, A^2, A^2}    | T(4, 4) | T(4, 4) | (0, 0, 8/27, 4/9, 2/9, 1/27)        | (0, 0, 8/27, 4/9, 2/9, 1/27)
{1, 1, 1, A^2, A^2, A^2}  | T(3, 3) | T(6, 3) | (0, 0, 1/8, 3/8, 3/8, 1/8)          | (0, 0, 1, 0, 0, 0)
{1, 1, 1, 1, 1, A^2}      | T(6, 1) | T(1, 5) | (0, 0, 125/216, 25/72, 5/72, 1/216) | (0, 0, 1, 0, 0, 0)

Table: Select edge label sets for (3, 6)-regular graphs and corresponding check types and degree distributions.
Results for (3, 6)-regular graphs over GF(4)
Figure: Thresholds of binary image graph codes obtained from (3, 6)-regular graphs using edge label sets from Table 1, under Gallager A decoding (b_2 threshold vs. b_1; one curve per edge label set, plus the random code).
Gallager B decoding of (3, 6)-regular graphs over GF(4)
Figure: Thresholds of binary image graph codes obtained from (3, 6)-regular graphs under Gallager B decoding (b_2 threshold vs. b_1; one curve per edge label set, plus the random code).
Conclusions
We looked at:
- Adapting binary LDPC codes for different bit-error probabilities.
- Designing nonbinary LDPC codes using the binary image graph.
- Analyzing performance using binary hard decision decoding.
Result: Under hard-decision decoding, regular graph-based codes whose check equations use a highly unbalanced mix of LSBs and MSBs outperform the alternating BICM method [5].
[5] Haymaker, Kelley, Structured bit-interleaved LDPC codes for MLC flash memory, submitted to JSAC: Communication methodologies for the next-generation storage systems, Sept. 2013.
Future directions
- Is the same true for other decoding algorithms (e.g., majority-logic decoding, nonbinary hard-decision decoding)?
- What check node types are optimal in codes over GF(8)? Is it possible to develop a general framework for codes over GF(2^m)?
- Determine explicit bit assignments to MSB and LSB pages for structured LDPC code families that obtain the desired check node types, for example, finite geometry LDPC codes.

Thank you.