Advanced Hardware Architecture for Soft Decoding Reed-Solomon Codes Stefan Scholl, Norbert Wehn Microelectronic Systems Design Research Group TU Kaiserslautern, Germany
Overview
- Soft decoding for the RS(255,239) code
- New hardware architecture
- Goal: large FER gain (over hard decision decoding)
- Algorithm based on information set decoding
- Complexity evaluation on a Virtex 5 FPGA
Motivation
- RS / BCH decoder hardware: wireless, wired, storage
- Applications: VDSL, NASA / CCSDS, Optical (G.709)
- Widely used code: RS(255,239) or its shortened versions
Decoding Algorithms for Reed-Solomon
Hard Decoding
- Algorithm: standard method, algebraic decoding
- Complexity very low: first chip implementations in the 1970s/80s
- Progress in microelectronics allows for more complexity today!
Soft Decoding
- Improved error correction
- Possible gain: up to 3 dB (depends on length and code rate)
- Algorithms: Chase decoding, information set decoding, adaptive belief propagation, Kötter-Vardy
We consider the widely used RS(255,239), but RS(255,239) seems to be challenging.
State-of-the-art Soft Decoder Hardware
Real & complete hardware implementations for RS(255,239)

Paper                          Year   Algorithm              Gain (over HDD)
An (PhD thesis, MIT)           2010   Low complexity Chase   0.45 dB
Hsu et al. (ESSCIRC)           2011   Chase                  0.35 dB
Garcia-Herrero et al. (CSSP)   2011   Low complexity Chase   0.3 dB
  => low gain hardware: < 0.5 dB
Kan et al. (ISTC)              2008   Adaptive BP            0.75 dB
Heloir et al. (NEWCAS)         2012   Stochastic Chase       0.7 dB
Scholl et al. (DATE)           2014   Information set        0.75 dB
  => medium gain hardware: 0.5 to 1 dB
State-of-the-art Hardware Implementations
[Figure: gain classes of existing hardware relative to hard decision decoding]
- Low gain: < 0.5 dB
- Medium gain: 0.5 to 1 dB
- High gain: > 1 dB. Not yet investigated! Literature shows that up to 2 dB gain should be possible.
Implemented Algorithm*
Information set decoding approach
- Received bits (example), ordered from most reliable to least reliable: 1 1 0 0 1 0 1 1 0 1 0 0 1 1
- Binary image H of the parity check matrix, diagonalized by Gaussian elimination
  [Figure: binary image H before and after diagonalization]
- Syndrome (example): 0 0 1 0 0 0
- Syndrome weight:
  - Small: only errors in the least reliable part
  - Large: at least 1 error in the most reliable part
- Order 1 processing: tentatively flip each most reliable bit (here: 1912)
- Order 2 processing: tentatively flip all combinations of 2 most reliable bits (~2 million cases)
- Can be seen as a low complexity variant of ordered-statistics decoding
*A. Ahmed, R. Koetter, and N. R. Shanbhag. Performance analysis of the adaptive parity check matrix based soft-decision decoding algorithm, 2004.
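The steps on this slide can be illustrated on a toy code. The following is a minimal sketch, not the authors' implementation: it diagonalizes a small parity check matrix on the least reliable independent columns, reads the tentative error pattern from the syndrome, and then applies order 1 processing. Ranking candidates by the summed |LLR| of the flipped bits is an assumption borrowed from ordered-statistics decoding (the slide only mentions the syndrome weight), and the names `diagonalize` and `decode_order1` are hypothetical.

```python
def diagonalize(H, col_order):
    """Row-reduce H over GF(2) so that the first independent columns
    in col_order become unit columns. Row i has its pivot in pivots[i]."""
    H = [row[:] for row in H]
    m, pivots, row = len(H), [], 0
    for c in col_order:
        if row == m:
            break
        r = next((i for i in range(row, m) if H[i][c]), None)
        if r is None:
            continue  # column depends on earlier pivots, try the next one
        H[row], H[r] = H[r], H[row]
        for i in range(m):
            if i != row and H[i][c]:
                H[i] = [a ^ b for a, b in zip(H[i], H[row])]
        pivots.append(c)
        row += 1
    return H, pivots


def decode_order1(H, llr):
    """Order 0 and order 1 processing. Convention: LLR < 0 means bit 1."""
    n = len(llr)
    hard = [1 if x < 0 else 0 for x in llr]
    order = sorted(range(n), key=lambda i: abs(llr[i]))  # least reliable first
    Hd, pivots = diagonalize(H, order)
    synd = [sum(h & b for h, b in zip(row, hard)) & 1 for row in Hd]
    most_reliable = [j for j in range(n) if j not in pivots]

    def candidate(flipped):
        s = synd[:]
        for j in flipped:  # flipping bit j adds column j to the syndrome
            s = [si ^ row[j] for si, row in zip(s, Hd)]
        # remaining syndrome bits are repaired in the diagonalized part
        flips = set(flipped) | {pivots[i] for i, si in enumerate(s) if si}
        return sum(abs(llr[i]) for i in flips), flips

    tests = [candidate(())] + [candidate((j,)) for j in most_reliable]
    _, flips = min(tests, key=lambda t: t[0])
    return [b ^ (i in flips) for i, b in enumerate(hard)]


# Toy example: (7,4) Hamming code, all-zero codeword sent, bit 5 received wrong.
H_HAMMING = [[1, 0, 1, 0, 1, 0, 1],
             [0, 1, 1, 0, 0, 1, 1],
             [0, 0, 0, 1, 1, 1, 1]]
print(decode_order1(H_HAMMING, [4.0, 3.5, 0.8, 5.0, 0.6, -1.0, 4.5]))
```

In the toy example the erroneous bit is not among the three least reliable independent positions, so only an order 1 flip recovers the transmitted codeword.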
Algorithm Improvements
We add further features for improvement (mostly from other literature):
- Use a hard decision decoder (counters a potential error floor)
- Use three differently diagonalized parity check matrices (improves FER)
  - Partial overlapping of the diagonalized parts allows for a sophisticated architecture (complexity reduction)
- Restrict order 2 processing to fairly reliable bits (250 out of 1912)
  - Requires determining an additional group: fairly reliable (besides least and most reliable)
  - Large reduction in the number of order 2 test patterns (about a factor of 60)
- Use approximate reliability sorting to enable parallelization (higher speed)
Overall loss due to complexity reduction: < 0.1 dB
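The pattern counts quoted on the slides can be checked with a few lines of arithmetic. This sketch assumes that the 1912 most reliable bits correspond to the 239 x 8 information bits of the binary image and that order 2 patterns are unordered pairs; the variable names are illustrative only.

```python
from math import comb

INFO_BITS = 239 * 8          # 1912 most reliable bit positions
FAIR_BITS = 250              # fairly reliable subset used for order 2

order1_patterns = INFO_BITS                  # one flip per most reliable bit
order2_full = comb(INFO_BITS, 2)             # all pairs of most reliable bits
order2_restricted = comb(FAIR_BITS, 2)       # pairs within the fairly reliable group
reduction = order2_full / order2_restricted

print(order1_patterns, order2_full, order2_restricted, round(reduction, 1))
```

The full order 2 search comes to roughly 1.8 million pairs, in line with the "~2 million" on the slide, and the restriction to 250 bits reduces it by a factor of just under 60.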
Our New Hardware Architecture
- Input: 2040 bit LLRs, 8 in parallel
- Quantization: 6 bits
- Output: 2040 bits (hard output), 8 in parallel
- Implementation on Virtex 5 FPGA
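The slide fixes the LLR word width and the input parallelism but does not describe the quantizer. The sketch below assumes a uniform, saturating two's complement quantizer with a hypothetical step size of 0.25; only the 6 bit width, the frame length of 2040, and the parallelism of 8 come from the slide.

```python
FRAME_BITS = 2040
PARALLELISM = 8
LOAD_CYCLES = FRAME_BITS // PARALLELISM   # cycles to shift in one frame


def quantize_llr(x, bits=6, step=0.25):
    """Uniform saturating quantizer (assumed scheme, step size is hypothetical)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1   # -32 .. 31 for 6 bits
    return max(lo, min(hi, round(x / step)))


print(LOAD_CYCLES, quantize_llr(1.0), quantize_llr(100.0), quantize_llr(-100.0))
```

With 8 LLRs per cycle, one frame needs 255 input cycles, which coincides with the number of code symbols.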
Our Hardware Architecture
Sorting
- Finds the least reliable and the fairly reliable bits
- Finds the 378 lowest out of 2040 LLRs
- Shift register based insertion sort
- 8 sorters in parallel (approximate sorting)
- Stores bit positions in four memories
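One way to picture approximate sorting with parallel insertion sorters is shown below. This is a sketch under stated assumptions, not the scheme used in the hardware: it assumes that LLR number i goes to sorter i mod 8 and that every sorter keeps only its own ceil(378 / 8) smallest magnitudes. `InsertionSorter`, `approx_lowest`, and `exact_lowest` are hypothetical names.

```python
import bisect
import math
import random


class InsertionSorter:
    """Software model of a bounded insertion sorter: keeps the `depth`
    smallest (magnitude, position) pairs seen so far."""

    def __init__(self, depth):
        self.depth = depth
        self.cells = []

    def push(self, magnitude, position):
        bisect.insort(self.cells, (magnitude, position))
        if len(self.cells) > self.depth:
            self.cells.pop()  # the largest entry is shifted out


def approx_lowest(llrs, k=378, sorters=8):
    """Positions of (approximately) the k least reliable LLRs."""
    depth = math.ceil(k / sorters)
    units = [InsertionSorter(depth) for _ in range(sorters)]
    for pos, llr in enumerate(llrs):
        units[pos % sorters].push(abs(llr), pos)   # assumed interleaving
    merged = sorted(cell for u in units for cell in u.cells)
    return [pos for _, pos in merged[:k]]


def exact_lowest(llrs, k=378):
    return sorted(range(len(llrs)), key=lambda i: abs(llrs[i]))[:k]


if __name__ == "__main__":
    rng = random.Random(0)
    llrs = [rng.gauss(2.0, 2.0) for _ in range(2040)]
    hits = len(set(approx_lowest(llrs)) & set(exact_lowest(llrs)))
    print(f"{hits} of 378 positions agree with exact sorting")
```

The approximation arises because a sorter that happens to receive more than its share of unreliable bits has to drop some of them, while the sorters never exchange data and can therefore run fully in parallel.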
Our Hardware Architecture
Gaussian Elimination / Diagonalization
- Original matrix stored in memory
- Diagonalization on the fly, column-wise
- 2 phases: setup & elimination
- Saves ~70% hardware over state-of-the-art diagonalizations (e.g. systolic arrays)
- Three diagonalizations: exploit overlapping
[Figure: pipelined array eliminator; columns of the original matrix enter, columns of the eliminated matrix leave. P: fixed pivot positions!]
Our Hardware Architecture
Correction Unit
- Performs order 1 and order 2 processing
- Parallelized order 2 processing. In 1 clock cycle:
  - 1x order 1
  - 6x order 2
- 3 instances (for 3 matrices)
- Selects the best result for output
Our Hardware Architecture
Syndrome Calculation
- Required: syndrome of the diagonalized matrix
- Strategy:
  1. Calculate the syndrome using the original matrix
  2. Diagonalize the syndrome in the Gaussian elimination
- Advantage: allows use of Galois field operations (much faster)
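The two-step strategy works because row operations are linear over GF(2): applying them to the syndrome of the original matrix gives the same result as computing the syndrome with the diagonalized matrix. The sketch below only demonstrates this second step on a toy matrix; it does not model the Galois field syndrome computation. Columns are stored as integer bitmasks (bit i is row i), and the helper names and the particular row operations are illustrative assumptions.

```python
def apply_row_ops(col, ops):
    """Apply a list of GF(2) row additions (src added onto dst) to one column."""
    for src, dst in ops:
        if (col >> src) & 1:
            col ^= 1 << dst
    return col


def syndrome(columns, bits):
    """XOR of all matrix columns whose received bit is 1."""
    s = 0
    for col, b in zip(columns, bits):
        if b:
            s ^= col
    return s


# Toy data: columns of a (7,4) Hamming parity check matrix and a received word.
columns = [1, 2, 3, 4, 5, 6, 7]
received = [0, 1, 1, 0, 0, 1, 0]
ops = [(0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]   # some elimination sequence

# Path A: syndrome with the original matrix, then push it through the row ops.
path_a = apply_row_ops(syndrome(columns, received), ops)
# Path B: eliminate the whole matrix first, then compute the syndrome.
path_b = syndrome([apply_row_ops(c, ops) for c in columns], received)
print(path_a == path_b)
```

In other words, the syndrome can be treated as one extra column that is streamed through the same elimination as the matrix columns.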
FPGA Implementations
State-of-the-art soft decoders for RS(255,239), gain > 0.5 dB

                               Kan et al.    Scholl et al.     Heloir et al.   THIS WORK (our new architecture)
Algorithm                      Adaptive BP   Information Set   Stoch. Chase    Information Set
Chip                           Stratix II    Virtex 5          Virtex 5        Virtex 5
Flipflops                      n/a           42,000            143,000         70,200
Look-Up Tables                 43,700        13,700            117,000         32,400
Throughput                     4 Mbit/s      800 Mbit/s        50 Mbit/s       300 Mbit/s
Communications gain over HDD   0.75 dB       0.75 dB           0.7 dB          1.3 dB

M. Kan et al. Hardware implementation of soft-decision decoding for Reed-Solomon code. In Proc. 5th Int. Turbo Codes and Related Topics Symp., 2008.
S. Scholl and N. Wehn. Hardware Implementation of a Reed-Solomon Soft Decoder based on Information Set Decoding. DATE 2014.
R. Heloir, C. Leroux, S. Hemati, M. Arzel, and W. J. Gross. Stochastic Chase decoder for Reed-Solomon codes. IEEE NEWCAS 2012.
Comparison FER
[Figure: frame error rate (FER) comparison; one curve is labeled "This work"]
Summary & Outlook
Summary
- Proposed new RS soft decoder hardware for RS(255,239)
- Based on information set decoding
- Implementation with currently best FER: gain of 1.3 dB over HDD
- New high gain architecture, in addition to the existing low and medium gain ones
- Acceptable complexity
Future Challenges
- Improving implementation efficiency
- Architectures for application-specific requirements
- Approach applicable to every linear code
Thank you for your attention! Questions?
Our new Binary Gaussian Elimination
- Basic operation: adding rows onto other rows to form unit columns
- For our hardware: two phase approach
  1. Setup: configures addition patterns
  2. Elimination: performs actual elimination
- Architecture: column by column processing with pipelined array
[Figure: pipelined array; columns from the original matrix enter, columns of the eliminated matrix leave. P: fixed pivot positions!]
S. Scholl, C. Stumm, and N. Wehn. Hardware Implementations of Gaussian Elimination over GF(2) for Channel Decoding Algorithms. IEEE AFRICON 2013.
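The two phase idea can be modeled in software as follows. This is one possible interpretation, not the architecture from the cited paper: each stage i owns the fixed pivot row i, the setup phase derives an addition pattern from the stage's pivot column, and the elimination phase streams every column through all stages. Columns are integer bitmasks (bit r is row r). Repairing a zero at the fixed pivot position by adding a lower row is an assumption of this sketch, and all function names are hypothetical.

```python
def setup_stage(col, i, m):
    """Setup: derive the addition pattern of stage i from its pivot column."""
    src = None
    if not (col >> i) & 1:
        # pivot position is 0: add a lower row with a 1 onto row i (assumed fix)
        src = next((r for r in range(i + 1, m) if (col >> r) & 1), None)
        if src is None:
            raise ValueError("column cannot serve as pivot for this stage")
        col ^= 1 << i
    targets = col & ~(1 << i)      # every other row with a 1 receives row i
    return src, targets


def apply_stage(col, i, pattern):
    """Elimination: apply the stored row additions of stage i to one column."""
    src, targets = pattern
    if src is not None and (col >> src) & 1:
        col ^= 1 << i
    if (col >> i) & 1:
        col ^= targets
    return col


def diagonalize_columns(columns, pivot_cols):
    """Make columns[pivot_cols[i]] the unit column with a 1 in row i."""
    m = len(pivot_cols)
    patterns = []
    for i, c in enumerate(pivot_cols):            # phase 1: setup
        col = columns[c]
        for k, p in enumerate(patterns):
            col = apply_stage(col, k, p)
        patterns.append(setup_stage(col, i, m))
    out = []
    for col in columns:                           # phase 2: elimination
        for k, p in enumerate(patterns):
            col = apply_stage(col, k, p)
        out.append(col)
    return out


# Toy example: columns of a (7,4) Hamming parity check matrix.
print(diagonalize_columns([1, 2, 3, 4, 5, 6, 7], pivot_cols=[4, 2, 1]))
```

Because each stage only needs its stored pattern, columns can be processed one after another without knowing the rest of the matrix, which is what makes a pipelined, column by column arrangement possible.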
Comparison, 128 x 2040 matrix
Design example: Reed-Solomon (255,239) code
- Binary matrix size: 128 x 2040
- Implementation on a Xilinx FPGA chip (Virtex 7)

Architecture        Look-Up Tables   Flipflops      Throughput
SMITH*              780k*            260k*
Systolic array      82k              99k            219k matrices/s
Proposed            17k              33k            272k matrices/s
Proposed vs.        -80% (saving)    -67% (saving)  +25% (increase)
systolic array
* estimated

Efficient Gaussian elimination is the key for efficient soft decoding!
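The matrix size and the relative figures in the table follow from simple arithmetic. The sketch below assumes that the binary image expands each symbol of GF(2^8) into 8 bits and that the percentages compare the proposed design with the systolic array; the slide values are rounded.

```python
N, K, BITS_PER_SYMBOL = 255, 239, 8

rows = (N - K) * BITS_PER_SYMBOL      # parity checks of the binary image
cols = N * BITS_PER_SYMBOL            # code bits of the binary image


def rel_change(new, old):
    """Relative change of `new` with respect to `old`."""
    return (new - old) / old


lut_change = rel_change(17_000, 82_000)
ff_change = rel_change(33_000, 99_000)
throughput_change = rel_change(272_000, 219_000)

print(rows, cols)
print(f"{lut_change:+.1%} {ff_change:+.1%} {throughput_change:+.1%}")
```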