Lecture 11: Polar codes construction


15-859: Information Theory and Applications in TCS (CMU, Spring 2013)
Lecturer: Venkatesan Guruswami
Lecture 11: Polar codes construction
February 26, 2013
Scribe: Dan Stahlke

1 Polar codes: recap of last lecture

For a linear code, the codewords are related to the messages by a linear transformation (modulo 2). Since the codewords are longer than the messages, the matrix representing this linear transformation is rectangular. However, it simplifies the analysis to pad the input message with some extra bits which are held constant ("frozen") in order to make the matrix square. It does not matter if those bits get corrupted, since they are not part of the message and we already know what their values were supposed to be.

[Figure: the message $U_0, \ldots, U_{N-1}$ (with $N = 2^n$) is multiplied by the invertible $N \times N$ matrix $A_n$ to give the codeword $X_0, \ldots, X_{N-1}$, which is sent through the channel $W = \mathrm{BEC}_\alpha$ to produce the output $Y_0, \ldots, Y_{N-1}$, where $Y_i = X_i$ with probability $1 - \alpha$ and $Y_i = {?}$ with probability $\alpha$.]

The encoding operation is $X_0^{N-1} = A_n U_0^{N-1}$. As described last lecture, the matrix $A_n$ is defined recursively and ends up having a tensor product form, $A_n = G_2^{\otimes n} B_n$, where $B_n$ denotes the bit-reversal permutation.

The idea behind the polar code is that when decoding a corrupted message $Y_0^{N-1}$, the vast majority of the uncertainty ends up concentrated in the subset of $U_0^{N-1}$ that was frozen. Specifically, consider the entropy chain rule:

    \sum_{i=0}^{N-1} H(U_i \mid U_0^{i-1}, Y_0^{N-1}) = \sum_{i=0}^{N-1} \underbrace{H(X_i \mid Y_i)}_{=\alpha} = N\alpha    (1)

We want a fraction $\alpha$ of the terms on the left-hand side to be nearly 1 and the rest nearly 0. The former bits will be nearly uncertain, and we take these to be the frozen bits; the latter will be nearly certain, and we feed the message into them. The goal of this lecture is to prove that the entropy accumulates into an $\alpha$ fraction of the $U_i$. This is called polarization.
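To make the recursive definition concrete, here is a minimal sketch in Python/NumPy (the function names are ours, not from the lecture, and we assume the kernel $G_2 = \begin{pmatrix}1 & 0\\ 1 & 1\end{pmatrix}$ from last lecture's construction). It builds $A_n = G_2^{\otimes n} B_n$ over GF(2) and encodes a length-$N$ vector that already contains the frozen bits.

```python
import numpy as np

def bit_reversal_permutation(n):
    """The permutation on {0, ..., 2^n - 1} that reverses each index's n-bit
    binary representation; this is the action of B_n."""
    return np.array([int(format(i, '0{}b'.format(n))[::-1], 2)
                     for i in range(1 << n)])

def polar_transform_matrix(n):
    """A_n = G_2^{tensor n} B_n over GF(2), with kernel G_2 = [[1,0],[1,1]]."""
    G2 = np.array([[1, 0], [1, 1]], dtype=int)
    G = np.array([[1]], dtype=int)
    for _ in range(n):
        G = np.kron(G, G2)
    perm = bit_reversal_permutation(n)
    # Right-multiplying by the permutation matrix B_n permutes columns; since
    # bit reversal is an involution, this is simply G[:, perm].
    return G[:, perm]

def encode(u):
    """Codeword x = A_n u (mod 2); u has length N = 2^n and already holds
    zeros in the frozen positions F_n."""
    n = int(np.log2(len(u)))
    return (polar_transform_matrix(n) @ u) % 2

# Tiny example with N = 8 (frozen positions already set to 0 inside u).
u = np.array([0, 0, 0, 1, 0, 1, 1, 0])
print(encode(u))
```

In practice one would not materialize the dense $N \times N$ matrix; the same transform can be applied in $O(N \log N)$ operations using its recursive butterfly structure. The dense form is just the most direct transcription of the definition above.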

2 Proof of polarization

First, give a name to the terms on the left-hand side of (1) by defining a function $\alpha_n : \{0, 1, \ldots, 2^n - 1\} \to [0, 1]$ by

    \alpha_n(i) = H(U_i \mid U_0^{i-1}, Y_0^{N-1})    (2)

This is the probability that $U_i$ is not known given $U_0^{i-1}$ and $Y_0^{N-1}$. Last lecture it was shown that $\alpha_n(\cdot)$ evolves according to the recurrence

    \alpha_n(i) = \begin{cases} 2\alpha_{n-1}(\lfloor i/2 \rfloor) - \alpha_{n-1}(\lfloor i/2 \rfloor)^2 & \text{if } i \text{ even} \\ \alpha_{n-1}(\lfloor i/2 \rfloor)^2 & \text{if } i \text{ odd} \end{cases}    (3)

It is helpful to visualize this recurrence as a tree: the root holds $\alpha$ (level $\alpha_0(\cdot)$); its children hold $2\alpha - \alpha^2$ and $\alpha^2$ (level $\alpha_1(\cdot)$); their children in turn hold $2(2\alpha - \alpha^2) - (2\alpha - \alpha^2)^2$, $(2\alpha - \alpha^2)^2$, $2\alpha^2 - \alpha^4$, and $\alpha^4$ (level $\alpha_2(\cdot)$); and so on.

Think of $\alpha_n$ as a random variable on $\{0,1\}^n$ (equivalently, on $\{0, 1, \ldots, 2^n - 1\}$) induced by the uniform distribution on $\{0,1\}^n$. Specifically, with $i(b)$ denoting the integer having $b$ as its binary representation, define the random variable

    Z_n(b) = \alpha_n(i(b)) \in [0, 1]    (4)

with $b$ uniform on $\{0,1\}^n$. Then $\mathbb{E}(Z_n)$ is the average value of the nodes on level $n$ of the tree. We want to show that as $n$ grows, $Z_n$ approaches a Bernoulli distribution, so that the values of the leaves are all close to either zero or one.

Suppose we know $Z_n$. Then the value of $Z_{n+1}$ is

    Z_{n+1} = \begin{cases} 2Z_n - Z_n^2 & \text{with probability } 1/2 \\ Z_n^2 & \text{with probability } 1/2 \end{cases}    (5)

Therefore

    \mathbb{E}[Z_{n+1} \mid Z_n] = \tfrac{1}{2}(2Z_n - Z_n^2) + \tfrac{1}{2} Z_n^2 = Z_n

and so $\mathbb{E}[Z_n] = \mathbb{E}[Z_0] = \alpha$. This sequence $Z_0, Z_1, \ldots$ is a martingale.

Definition 1 (Martingale). A sequence of random variables $Z_0, Z_1, \ldots$ such that for all $n$
(i) $\mathbb{E}|Z_n| < \infty$
(ii) $\mathbb{E}[Z_{n+1} \mid Z_0, \ldots, Z_n] = Z_n$
is a martingale.
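Before continuing with the proof, the polarization phenomenon can be checked numerically. The sketch below (Python; our own illustrative code, with $\alpha = 0.4$ and the threshold $\epsilon$ chosen arbitrarily) iterates the recurrence (3) level by level and reports the fraction of leaf values near 0 and near 1; as $n$ grows these fractions tend to $1 - \alpha$ and $\alpha$, while the average stays exactly $\alpha$, as the martingale argument predicts.

```python
import numpy as np

def polarize(alpha, n):
    """Leaf values alpha_n(0), ..., alpha_n(2^n - 1) from recurrence (3):
    even child 2a - a^2, odd child a^2."""
    vals = np.array([alpha], dtype=float)
    for _ in range(n):
        nxt = np.empty(2 * vals.size)
        nxt[0::2] = 2 * vals - vals ** 2   # i even: the "worse" branch
        nxt[1::2] = vals ** 2              # i odd: the "better" branch
        vals = nxt
    return vals

alpha, n, eps = 0.4, 16, 1e-3
leaves = polarize(alpha, n)
print("average        :", leaves.mean())                 # exactly alpha
print("fraction near 0:", np.mean(leaves < eps))          # -> 1 - alpha
print("fraction near 1:", np.mean(leaves > 1 - eps))      # -> alpha
print("still in middle:", np.mean((leaves >= eps) & (leaves <= 1 - eps)))
```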

Since our random variables $Z_n$ are bounded in $[0, 1]$, they converge (almost surely) by virtue of the martingale convergence theorem. In particular,

    \forall \epsilon > 0, \quad \lim_{n \to \infty} \Pr_\omega\big( |Z_{n+1}(\omega) - Z_n(\omega)| > \epsilon \big) = 0,    (6)

where $\omega$ is the tree path. Therefore $\mathbb{E}\,|Z_{n+1}(\omega) - Z_n(\omega)| \to 0$. But this expectation can be expanded as

    \mathbb{E}\,|Z_{n+1}(\omega) - Z_n(\omega)| = \tfrac{1}{2}\, \mathbb{E}[(2Z_n - Z_n^2) - Z_n] + \tfrac{1}{2}\, \mathbb{E}[Z_n - Z_n^2]    (7)
                                               = \mathbb{E}[Z_n - Z_n^2]    (8)
                                               = \mathbb{E}[Z_n(1 - Z_n)] \to 0    (9)

(the absolute values can be dropped because both differences equal $Z_n(1 - Z_n) \ge 0$). So most of the time $Z_n(1 - Z_n) \approx 0$. This only happens if $Z_n$ is close to 0 or 1 with high probability:

    \lim_{n \to \infty} \Pr[Z_n \in (\epsilon, 1 - \epsilon)] = 0.

Since $\mathbb{E}[Z_n] = \mathbb{E}[Z_0] = \alpha$, we have $\Pr(Z_n \approx 0) \to 1 - \alpha$ and $\Pr(Z_n \approx 1) \to \alpha$, or more precisely, for all $\epsilon > 0$,

    \lim_{n \to \infty} \Pr(Z_n < \epsilon) = 1 - \alpha    (10)
    \lim_{n \to \infty} \Pr(Z_n > 1 - \epsilon) = \alpha    (11)

Quantitatively, one can prove

    \lim_{n \to \infty} \Pr\big( Z_n < 2^{-N^{0.49}} \big) = 1 - \alpha    (12)

where 0.49 is just some number less than one half (since the random variable gets squared about $n/2$ times).

Corollary 1. For all large enough $n$, there exists a subset $F_n \subseteq \{0, 1, \ldots, 2^n - 1\}$ (the "bad bits") with $|F_n| \le (\alpha + o(1))N$ such that $\alpha_n(i) < 2^{-N^{0.49}}$ for all $i \notin F_n$.

3 Obtaining the code

[Figure: the message $u$ is fed through $A_n$ to produce the codeword $x$.]

Freeze $u_i$, $i \in F_n$, to 0. This gives a code of rate $1 - \alpha - o(1)$. Formally, this is the linear code

    C_n = \{ A_n u : u \in \{0,1\}^N,\ u_i = 0\ \forall i \in F_n \}    (13)

The decoding procedure (for $\mathrm{BEC}_\alpha$) is as follows. For $i = 0, 1, \ldots, 2^n - 1$:

(i) If $i \in F_n$ then $u_i = 0$.

(ii) If $i \notin F_n$ then recover $u_i$ from $u_0^{i-1}, y_0^{N-1}$ if possible. If not possible, then FAIL. This is done using maximum-likelihood decoding.

Claim: The probability that this algorithm fails to decode a random[1] input message $u$ under $\mathrm{BEC}_\alpha$ noise is (using the union bound)

    \Pr[\mathrm{FAIL}] \le \sum_{i \notin F_n} H(u_i \mid u_0^{i-1}, y_0^{N-1})    (14)
                        = \sum_{i \notin F_n} \alpha_n(i)    (15)
                        \le N \cdot 2^{-N^{0.49}} \to 0    (16)

Efficiency: For $\mathrm{BEC}_\alpha$ one can compute $F_n$ efficiently (in time poly(N)) by computing the tree of values $\alpha_n(i)$. Once $F_n$ is known, recovery is efficient: step (ii) can be implemented recursively, mirroring the analysis that we have done.

4 General channels

For general channels, the recursive setup remains the same and $A_n$ is the same, but $F_n$ will depend on the channel. Consider recovery of $u_{2i} = v_i \oplus w_i$ and $u_{2i+1} = w_i$ from $v_0^{i-1}, w_0^{i-1}, y_0^{N-1}$. Define

    \beta := H(U_{2i} \mid \underbrace{V_0^{i-1}, W_0^{i-1}}_{\equiv\, U_0^{2i-1}}, Y_0^{N-1})    (17)
    \gamma := H(U_{2i+1} \mid V_0^{i-1}, W_0^{i-1}, U_{2i}, Y_0^{N-1})    (18)

If such a conditional entropy is small then, by Fano's inequality, we can estimate the corresponding bit from the previous ones and the channel output.

The random variables $(V_i \mid V_0^{i-1}, Y_0^{N/2-1})$ and $(W_i \mid W_0^{i-1}, Y_{N/2}^{N-1})$ are i.i.d., each with conditional entropy $\alpha$ (the value at the previous level of the recursion). By conservation of entropy, $\beta + \gamma = 2\alpha$, since $u_{2i} = v_i \oplus w_i$ and $u_{2i+1} = w_i$ determine, and are determined by, $(v_i, w_i)$. It can be shown that one of $\beta, \gamma$ is greater than $\alpha$ and the other less, so there is polarization.

Lemma 1 (proof omitted). Let $(B_1, D_1)$ and $(B_2, D_2)$ be i.i.d. pairs of discrete random variables with $B_1, B_2 \in \{0, 1\}$. If $H(B_i \mid D_i) \in (\delta, 1 - \delta)$ for some $\delta > 0$, then $H(B_1 + B_2 \mid D_1, D_2) \ge H(B_1 \mid D_1) + \gamma(\delta)$ for some $\gamma(\delta) > 0$.

By this lemma, $\beta > \alpha$ and $\gamma < \alpha$.

In order to get quantitative bounds that imply high-probability decoding, we track not the conditional entropies $H(U_i \mid \cdot)$ but the so-called Bhattacharyya parameters of the channel,

    Z(W) = \sum_{y \in \mathcal{Y}} \sqrt{p(y \mid 0)\, p(y \mid 1)}    (19)
         = \langle v_0, v_1 \rangle    (20)

where $v_0 = \big(\sqrt{p(y \mid 0)}\big)_{y \in \mathcal{Y}}$ and $v_1 = \big(\sqrt{p(y \mid 1)}\big)_{y \in \mathcal{Y}}$. For the $\mathrm{BEC}_\alpha$ channel we have $Z(\mathrm{BEC}_\alpha) = \alpha$, so our earlier analysis was just tracking the perceived erasure probabilities of all the intermediate bits during the decoding process.

[1] Since the code is linear, the failure probability for a random message is the same as for a fixed one.
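As a small illustration of definition (19), the sketch below (Python; the function name and example channels are ours, not from the lecture) computes $Z(W)$ from a table of transition probabilities $p(y \mid x)$ and confirms $Z(\mathrm{BEC}_\alpha) = \alpha$. A binary symmetric channel is included for comparison, where the same formula gives the standard value $2\sqrt{p(1-p)}$.

```python
import numpy as np

def bhattacharyya(P):
    """Z(W) = sum_y sqrt(p(y|0) p(y|1)) for a binary-input channel given as a
    2 x |Y| matrix P whose rows are p(. | 0) and p(. | 1)."""
    return float(np.sum(np.sqrt(P[0] * P[1])))

alpha = 0.3
# BEC_alpha with output alphabet (0, 1, ?): the erasure symbol is the only
# output reachable from both inputs, so Z = sqrt(alpha * alpha) = alpha.
bec = np.array([[1 - alpha, 0.0, alpha],
                [0.0, 1 - alpha, alpha]])
print(bhattacharyya(bec))                     # 0.3

p = 0.11
# BSC_p for comparison: Z = 2 * sqrt(p * (1 - p)).
bsc = np.array([[1 - p, p],
                [p, 1 - p]])
print(bhattacharyya(bsc), 2 * np.sqrt(p * (1 - p)))
```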

5 Some comments

We conclude the discussion of polar codes with some comments:

- The technique actually gives an alternate proof of Shannon's noisy coding theorem, for the case of binary-input symmetric-output memoryless channels.

- As polar codes are linear, they can only achieve the capacity of channels for which the uniform input distribution achieves $\max_X I(X; Y)$.

- The method generalizes to prime alphabets without changing the code construction. For non-prime alphabets, a different polarizing map has been constructed.

- Taking $Y$ to be empty in our analysis gives a source code for compressing $N$ i.i.d. copies of a random variable $X$: map $X_0^{N-1} \mapsto U_0^{N-1} = M_n X_0^{N-1}$. For all but a vanishing fraction of indices $i$, we will have $H(U_i \mid U_0^{i-1}) \approx 0$ or $H(U_i \mid U_0^{i-1}) \approx 1$. As the sum of these entropies equals $N H(X)$, the number of bits for which $H(U_i \mid U_0^{i-1}) \approx 0$ is about $(1 - H(X))N$, so we need only store the remaining $\approx H(X) N$ bits of $U_0^{N-1}$; from these we can recover the rest, and therefore also $X_0^{N-1}$.

- For the construction of the channel code (or the source code of the previous item), we need to know which indices have conditional entropy close to 0 and which have conditional entropy close to 1. In the source-coding setting above, it might seem that the entropies $H(U_i \mid U_0^{i-1})$ for an initial segment of $i$'s should be 1 and, as we condition on more and more, the later $H(U_i \mid U_0^{i-1})$ should be 0. Unfortunately, there seems to be no simple characterization of which entropies polarize to 0 and which polarize to 1. However, there is an algorithmic way to estimate these quantities (or, to be precise, their proxies, the associated Bhattacharyya parameters), which can be used to compute the generator matrix of the code (or the compression matrix for source coding).

- Soon after their discovery, polar codes were applied to many other classic information theory problems such as Slepian-Wolf, Wyner-Ziv, Gelfand-Pinsker, etc.

A well-written reference on polar codes is the survey "Polarization and polar codes" by Eren Şaşoğlu, published as Volume 8, No. 4 of the series Foundations and Trends in Communications and Information Theory.