Capacity-achieving codes via Forney concatenation

Shannon's Noisy Channel Theorem assures us of the existence of capacity-achieving codes. However, exhaustive search for such a code has doubly exponential complexity: we would have to search over all codebooks of size $\exp(nR)$ drawn from the $|\mathcal{X}|^n$ possible codewords. Plan for today: a constructive version of Shannon's Noisy Channel Theorem. The goal is to show that for the BSC it is possible to achieve capacity in polynomial time. Note that we need to consider three aspects of complexity:

- Encoding
- Decoding
- Construction of the code

1 Error exponents

Recall that we have defined the fundamental limit
$$M^*(n, \epsilon) = \max\{M : \exists\, (n, M, \epsilon)\text{-code}\}.$$
For notational convenience, let us define its functional inverse
$$\epsilon^*(n, M) = \inf\{\epsilon : \exists\, (n, M, \epsilon)\text{-code}\}.$$
Shannon's theorem shows that for stationary memoryless channels
$$\epsilon_n \triangleq \epsilon^*(n, \exp(nR)) \to 0$$
for any $R < C = \sup_X I(X; Y)$. Now we want to know how fast it goes to zero as $n \to \infty$. It turns out the speed is exponential, i.e.,
$$\epsilon_n \approx \exp(-n E(R))$$
for some error exponent $E(R)$, a function of $R$ which is also known as the reliability function of the channel. Determining $E(R)$ is one of the longest-standing open problems in information theory. What we know:

- Lower bound on $E(R)$ (achievability): Gallager's random coding bound, which analyzes the ML decoder instead of the suboptimal decoders used in Shannon's random coding bound or the DT bound.
- Upper bound on $E(R)$ (converse): the sphere-packing bound (Shannon-Gallager-Berlekamp), etc.

It turns out there exists a number $R_{\mathrm{crit}} \in (0, C)$, called the critical rate, such that the lower and upper bounds meet for all $R \in (R_{\mathrm{crit}}, C)$, where we thus obtain the value of $E(R)$. For $R \in (0, R_{\mathrm{crit}})$, we do not even know the existence of the exponent! Deriving these bounds is outside the scope of this lecture. Instead, we only need the positivity of the error exponent, i.e., $E(R) > 0$ for any $R < C$. On the other hand, it is easy to see that $E(C) = 0$ as a consequence of the weak converse: as the rate approaches capacity from below, the communication becomes less reliable. The next theorem is a simple application of large deviations.
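Before turning to the theorem, here is a minimal numerical sketch (an illustration only, under assumed parameters) of Gallager's random coding exponent $E_r(R) = \max_{0 \le \rho \le 1} [E_0(\rho) - \rho R]$ for the BSC with uniform input, which is the achievability bound mentioned above. The crossover probability $\delta = 0.11$ is an arbitrary illustrative choice; the sketch only exhibits the qualitative picture that the exponent is positive for all $R < C$ and vanishes at $R = C$.

```python
# Minimal sketch: Gallager's random coding exponent for the BSC (illustration).
# E_r(R) = max_{0 <= rho <= 1} [E_0(rho) - rho*R]; delta = 0.11 is an assumed value.
from math import log2

delta = 0.11
h = lambda p: -p * log2(p) - (1 - p) * log2(1 - p)   # binary entropy, in bits
C = 1 - h(delta)                                     # BSC capacity, bits/channel use

def E0(rho):
    # E_0(rho) for the BSC with the uniform (capacity-achieving) input, in bits.
    s = delta ** (1 / (1 + rho)) + (1 - delta) ** (1 / (1 + rho))
    return rho - (1 + rho) * log2(s)

def Er(R, grid=1000):
    # Maximize E_0(rho) - rho*R over a grid of rho in [0, 1].
    return max(E0(i / grid) - (i / grid) * R for i in range(grid + 1))

for R in [0.1, 0.3, 0.45, C]:
    print(f"R = {R:.3f}   E_r(R) = {Er(R):.4f}")     # positive below C, zero at R = C
```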

Theorem 1. For any DMC and any $R < C = \sup_X I(X; Y)$,
$$\epsilon^*(n, \exp(nR)) \leq \exp(-n E(R))$$
for some $E(R) > 0$.

Proof. Fix $R < C$, so that $C - R > 0$. Let $P_X^*$ be the capacity-achieving input distribution, i.e., $C = I(X^*; Y^*)$. Recall Shannon's random coding bound (the DT/Feinstein bounds work as well):
$$\epsilon \leq P[\,i(X; Y) \leq \log M + \tau\,] + \exp(-\tau).$$
As usual, we apply this bound with iid inputs $P_{X^n} = (P_X^*)^n$, $\log M = nR$ and $\tau = \frac{n(C - R)}{2}$, to conclude the achievability of
$$\epsilon_n \leq P\left[\frac{1}{n} i(X^n; Y^n) \leq \frac{C + R}{2}\right] + \exp\left(-\frac{n(C - R)}{2}\right).$$
Since $i(X^n; Y^n) = \sum_{k=1}^n i(X_k; Y_k)$ is an iid sum and $\mathbb{E}[i(X; Y)] = C > \frac{C + R}{2}$, the first term is upper bounded by $\exp(-n \psi_T^*(\frac{C + R}{2}))$ where $T = i(X; Y)$. The proof is complete since $\epsilon_n$ is smaller than the sum of two exponentially small terms.

Note: A better bound can be obtained using the DT bound. But to get the best lower bound on $E(R)$ that we know (Gallager's random coding bound), we have to analyze the ML decoder.

2 Achieving polynomially small error probability

In the sequel we focus on the BSC with crossover probability $\delta$, which is an additive-noise DMC. Fix $R < C = 1 - h(\delta)$ bits. Let the block length be $n$. Our goal is to achieve error probability $\epsilon_n \leq n^{-\alpha}$ for arbitrarily large $\alpha > 0$ in polynomial time. To this end, fix some $b > 1$ to be specified later, pick $m = b \log n$, and divide the block into $\frac{n}{m}$ sub-blocks of $m$ bits. Applying Theorem 1, we can find [later on how to find] an $(m, \exp(Rm), \epsilon_m)$-code such that $\epsilon_m \leq \exp(-m E(R)) = n^{-b E(R)}$, where $E(R) > 0$. Apply this code to each $m$-bit sub-block and apply ML decoding to each sub-block. The encoding/decoding complexity is at most $\frac{n}{m} \exp(O(m)) = n^{O(1)}$. To analyze the probability of error, use the union bound:
$$P_e \leq \frac{n}{m}\, \epsilon_m \leq n^{-b E(R) + 1} \leq n^{-\alpha}, \quad \text{if we choose } b \geq \frac{\alpha + 1}{E(R)}.$$

Remark 1. The final question boils down to how to find the shorter code of blocklength $m$ in $\mathrm{poly}(n)$ time. This will be done if we can show that we can find a good code (satisfying the Shannon random coding bound) for the BSC of blocklength $m$ in time exponential in $m$. To this end, let us go through the following strategies:

1. Exhaustive search: A codebook is a subset of cardinality $2^{Rm}$ out of the $2^m$ possible codewords. Total number of codebooks: $\binom{2^m}{2^{Rm}} = \exp(\Omega(m\, 2^{Rm})) = \exp(\Omega(n^{c} \log n))$ for some constant $c > 0$. The search space is too big.

2. Linear codes: In Lecture 16 we have shown that for additive-noise channels over finite fields we can restrict attention to linear codes. For the BSC, each linear code is parameterized by a generator matrix, with $Rm \cdot m = R m^2$ entries. Then there are a total of $2^{R m^2} = n^{\Omega(\log n)}$ linear codes, which is still superpolynomial, so we cannot afford the search over all linear codes.

3. Toeplitz generator matrices: In Homework 8 we saw that it loses no generality to focus on linear codes with Toeplitz generator matrices, i.e., $G$ such that $G_{ij} = G_{i-1, j-1}$ for all $i, j > 1$. Toeplitz matrices are determined by their diagonals, i.e., by at most $2m$ entries. So there are at most $2^{2m} = n^{2b} = n^{O(1)}$ of them, and we can find the optimal one in $\mathrm{poly}(n)$ time. Since the channel is additive-noise, a linear code with a syndrome decoder attains the same maximal probability of error as the average (Lecture 16).

Remark 2. Remark on de-randomization: randomness as a resource; coin flips and cooking (brown both sides of onions)...

3 Concatenated codes

Forney introduced the idea of concatenated codes in 1965 to build longer codes from shorter codes with manageable complexity. A concatenated code consists of an inner code and an outer code:

1. $C_{\mathrm{in}}: \{0,1\}^k \to \{0,1\}^n$, with rate $\frac{k}{n}$.
2. $C_{\mathrm{out}}: \mathcal{B}^K \to \mathcal{B}^N$ for some alphabet $\mathcal{B}$ of cardinality $2^k$, with rate $\frac{K}{N}$.

The concatenated code $C: \{0,1\}^{kK} \to \{0,1\}^{nN}$ works as follows (Fig. 1):

1. Collect the $kK$ message bits into $K$ symbols in the alphabet $\mathcal{B}$ and apply $C_{\mathrm{out}}$ to get a vector in $\mathcal{B}^N$.
2. Map each symbol in $\mathcal{B}$ back into $k$ bits and apply $C_{\mathrm{in}}$ componentwise to get an $nN$-bit codeword.

The rate of the concatenated code is the product of the rates of the inner and outer codes: $R = \frac{k}{n} \cdot \frac{K}{N}$.

Figure 1: Concatenated code, where there are $N$ inner encoder-decoder pairs.
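To make Fig. 1 concrete, here is a minimal sketch of the concatenated encoder in Python. The inner and outer encoders below (a trivial repetition inner code and an identity outer code) are placeholders chosen only so that the example runs; they are not the codes used in the actual construction.

```python
# Minimal sketch of the concatenated encoder of Fig. 1 (placeholder component codes).
k, n = 2, 6        # inner code: {0,1}^k -> {0,1}^n, rate k/n
K, N = 4, 4        # outer code: B^K -> B^N over B = {0,1}^k, rate K/N

def encode_in(sym):                    # placeholder inner code: 3-fold repetition
    return sym * (n // k)

def encode_out(syms):                  # placeholder outer code: identity (K = N)
    return list(syms)

def concat_encode(bits):
    """Map kK message bits to an nN-bit codeword, as in Fig. 1."""
    assert len(bits) == k * K
    symbols = [tuple(bits[i:i + k]) for i in range(0, len(bits), k)]   # K symbols in B
    outer_cw = encode_out(symbols)                                     # N symbols in B
    return [b for sym in outer_cw for b in encode_in(sym)]             # nN bits

cw = concat_encode([1, 0, 1, 1, 0, 0, 1, 1])
print(len(cw), "bits; rate =", (k * K) / (n * N))   # rate = (k/n) * (K/N)
```

The decoder mirrors this structure: apply the inner decoder to each of the $N$ blocks of $n$ bits, then run the outer decoder on the resulting $N$ symbols.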

4 Achieving exponentially small error probability

Forney proposed the following idea:

- Use an optimal code as the inner code.
- Use a Reed-Solomon code, which can correct a constant fraction of errors, as the outer code.

Reed-Solomon (RS) codes are linear codes from $\mathbb{F}_q^K \to \mathbb{F}_q^N$, where the block length is $N = q - 1$ and the message length is $K$. Similar to the Reed-Muller code, the RS code treats the input $(a_0, a_1, \ldots, a_{K-1})$ as a polynomial $p(z) = \sum_{i=0}^{K-1} a_i z^i$ over $\mathbb{F}_q$ of degree at most $K - 1$, and encodes it by its values at all non-zero elements. Therefore the RS codeword is the vector $(p(\alpha) : \alpha \in \mathbb{F}_q \setminus \{0\}) \in \mathbb{F}_q^N$, and the generator matrix of an RS code is a Vandermonde matrix. The RS code has the following advantages:

1. The minimum distance of the RS code is $N - K + 1$. So if we choose $K = (1 - \epsilon) N$, then the RS code can correct $\frac{\epsilon N}{2}$ errors.
2. Encoding and decoding (e.g., the Berlekamp-Massey decoding algorithm) can be implemented in $\mathrm{poly}(N)$ time.

In fact, as we will see later, any efficient code which can correct a constant fraction of errors will suffice as the outer code for our purpose. Now we show that we can achieve any rate below capacity and exponentially small probability of error in polynomial time. Fix $\eta, \epsilon > 0$ arbitrary.

- Inner code: Let $k = (1 - h(\delta) - \eta) n$. By Theorem 1, there exists a $C_{\mathrm{in}}: \{0,1\}^k \to \{0,1\}^n$ which is a linear $(n, k, \epsilon_n)$-code with maximal error probability $\epsilon_n \leq e^{-n E_\eta}$ for some $E_\eta > 0$. By Remark 1, $C_{\mathrm{in}}$ can be chosen to be a linear code with a Toeplitz generator matrix, which can be found in $2^{O(n)}$ time. The inner decoder is ML, which we can afford since $n$ is small (only logarithmic in the overall blocklength).
- Outer code: We pick the RS code with field size $q = 2^k$ and blocklength $N = 2^k - 1$. Pick the number of message symbols to be $K = (1 - \epsilon) N$. Then we have $C_{\mathrm{out}}: \mathbb{F}_{2^k}^K \to \mathbb{F}_{2^k}^N$.

Then we obtain a concatenated code $C: \{0,1\}^{kK} \to \{0,1\}^{nN}$ with blocklength $L = nN = n (2^k - 1)$, which is exponential in $n$, and rate $R = (1 - \epsilon)(1 - h(\delta) - \eta)$. It is clear that the code can be constructed in $2^{O(n)} = \mathrm{poly}(L)$ time and that all encoding/decoding operations run in $\mathrm{poly}(L)$ time.

Now we analyze the probability of error. Let us condition on the message bits (the input to $C_{\mathrm{out}}$). Since the outer code can correct $\frac{\epsilon N}{2}$ errors, an error happens only if the number of erroneous inner encoder-decoder pairs exceeds $\frac{\epsilon N}{2}$. Since the channel is memoryless, each of the $N$ pairs makes an error independently¹ with probability at most $\epsilon_n$. Therefore the number of errors is stochastically smaller than $\mathrm{Binom}(N, \epsilon_n)$, and we can upper bound the total probability of error using the Chernoff bound:
$$P_e \leq P\left[\mathrm{Binom}(N, \epsilon_n) \geq \frac{\epsilon N}{2}\right] \leq \exp\left(-N\, d\!\left(\tfrac{\epsilon}{2} \,\big\|\, \epsilon_n\right)\right) = \exp(-\Omega(N \log N)) = \exp(-\Omega(L)),$$
where we have used $\epsilon_n \leq \exp(-\Omega(n))$ and $d(\frac{\epsilon}{2} \| \epsilon_n) \geq \frac{\epsilon}{2} \log \frac{\epsilon/2}{\epsilon_n} - \log 2 = \Omega(n) = \Omega(\log N)$.

¹ Here controlling the maximal error probability of the inner code is the key. If we only had a bound on the average error probability, then given a uniformly distributed input to the RS code, the output symbols (which are the inputs to the inner encoders) need not be independent, and the Chernoff bound is not necessarily applicable.
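To make the RS outer code concrete, here is a minimal sketch of Reed-Solomon encoding by polynomial evaluation. The construction above works over $\mathbb{F}_{2^k}$; the sketch uses a small prime field $\mathbb{F}_p$ instead, purely so that the field arithmetic is plain modular arithmetic, and the parameters $p = 7$, $K = 3$ are arbitrary illustrative choices.

```python
# Minimal sketch: RS encoding by polynomial evaluation over a prime field F_p.
# (The construction above uses F_{2^k}; p and K below are illustrative choices.)
p = 7                      # field size (prime, so arithmetic mod p is a field)
N = p - 1                  # block length: evaluate at all nonzero field elements
K = 3                      # message length: polynomials of degree < K

def rs_encode(msg):
    """Evaluate a_0 + a_1 z + ... + a_{K-1} z^{K-1} at every nonzero field element."""
    assert len(msg) == K
    return [sum(a * pow(alpha, i, p) for i, a in enumerate(msg)) % p
            for alpha in range(1, p)]

c1 = rs_encode([1, 2, 3])
c2 = rs_encode([1, 2, 4])
# Two distinct polynomials of degree < K agree on at most K - 1 points, so
# distinct codewords differ in at least N - K + 1 coordinates (the minimum
# distance quoted above).
print(sum(x != y for x, y in zip(c1, c2)), ">=", N - K + 1)
```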

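As a final numerical sanity check on the Chernoff bound above, the following sketch compares the exact binomial tail with $\exp(-N\, d(\epsilon/2 \,\|\, \epsilon_n))$. The parameters $\epsilon$, $\epsilon_n$ and $N$ below are assumed illustrative values, not the ones produced by the construction.

```python
# Sanity check: exact tail of Binom(N, eps_n) vs. the Chernoff bound
# exp(-N d(eps/2 || eps_n)).  The parameter values are illustrative only.
from math import log, exp, comb

eps, eps_n, N = 0.1, 1e-4, 1000
t = int(eps * N / 2)          # the outer code corrects up to eps*N/2 inner-block errors

# Exact probability that more than t of the N inner blocks are decoded incorrectly.
tail = sum(comb(N, i) * eps_n**i * (1 - eps_n)**(N - i) for i in range(t + 1, N + 1))

def d(p, q):                  # binary KL divergence d(p || q), in nats
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

print(tail, "<=", exp(-N * d(eps / 2, eps_n)))   # the exact tail is far below the bound
```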
Note: For more details see the excellent exposition by Spielman [Spi97]. For modern constructions using sparse graph codes, which achieve the same goal in linear time, see, e.g., [Spi96].

MIT OpenCourseWare
https://ocw.mit.edu

6.441 Information Theory
Spring 2016

For information about citing these materials or our Terms of Use, visit: https://ocw.mit.edu/terms.