Polar Coding Part 1 - Background Erdal Arıkan Electrical-Electronics Engineering Department, Bilkent University, Ankara, Turkey Algorithmic Coding Theory Workshop June 13-17, 2016 ICERM, Providence, RI

Outline Sequential decoding and the cutoff rate Guessing and cutoff rate Boosting the cutoff rate Pinsker's scheme Massey's scheme Polar coding

Sequential decoding and the cutoff rate Guessing and cutoff rate Boosting the cutoff rate Pinsker's scheme Massey's scheme Polar coding Sequential decoding and the cutoff rate 1 / 72

Tree coding and sequential decoding (SD) Consider a tree code (of rate 1/2). A path is chosen and transmitted. Given the channel output, search the tree for the correct (transmitted) path. The tree structure turns the ML decoding problem into a tree search problem. A depth-first search algorithm exists, called sequential decoding (SD). [Figure: binary code tree with branch labels 00, 11, 10, 01; the transmitted path is highlighted.] Sequential decoding and the cutoff rate 2 / 72

Search metric SD uses a metric to distinguish the correct path from the incorrect ones. Fano's metric: $\Gamma(y^n, x^n) = \log \frac{P(y^n \mid x^n)}{P(y^n)} - nR$, where $n$ is the path length, $x^n$ the candidate path, $y^n$ the received sequence, and $R$ the code rate. Sequential decoding and the cutoff rate 3 / 72
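
As a concrete illustration (an added sketch, not part of the slides), the per-symbol Fano metric for a BSC with uniform inputs, where P(y) = 1/2, can be evaluated in a few lines of Python; the parameter values and the helper name fano_metric are assumptions of the sketch.

    import math

    def fano_metric(x, y, eps, R):
        """Fano metric increment for one use of a BSC with uniform inputs.

        Gamma = log2( P(y|x) / P(y) ) - R, with P(y) = 1/2 for uniform inputs.
        """
        p_y_given_x = (1 - eps) if x == y else eps
        return math.log2(p_y_given_x / 0.5) - R

    # Accumulate the metric along a candidate path at rate R over a BSC(eps).
    eps, R = 0.1, 0.45
    path     = [0, 1, 1, 0, 1]   # hypothesized channel inputs
    received = [0, 1, 0, 0, 1]   # observed channel outputs
    total = sum(fano_metric(x, y, eps, R) for x, y in zip(path, received))
    print(f"accumulated Fano metric: {total:.3f}")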

History Tree codes were introduced by Elias (1955) with the aim of reducing the complexity of ML decoding (the tree structure makes it possible to use search heuristics for ML decoding). Sequential decoding was introduced by Wozencraft (1957) as part of his doctoral thesis. Fano (1963) simplified the search algorithm and introduced the above metric. Sequential decoding and the cutoff rate 4 / 72

Drift properties of the metric On the correct path, the expectation of the metric per channel symbol is $\sum_{x,y} p(x, y)\left[\log \frac{p(y \mid x)}{p(y)} - R\right] = I(X; Y) - R$. On any incorrect path, the expectation is $\sum_{x,y} p(x)p(y)\left[\log \frac{p(y \mid x)}{p(y)} - R\right] \le -R$. A properly designed SD scheme, given enough time, identifies the correct path with probability one at any rate $R < I(X; Y)$. Sequential decoding and the cutoff rate 5 / 72

Computation problem in sequential decoding Computation in sequential decoding is a random quantity, depending on the code rate R and the noise realization. Bursts of noise create barriers for the depth-first search algorithm, necessitating excessive backtracking in the search. Still, the average computation per decoded digit in sequential decoding can be kept bounded provided the code rate R is below the cutoff rate $R_0 = -\log \sum_y \Bigl(\sum_x Q(x)\sqrt{W(y \mid x)}\Bigr)^2$. So, SD solves the coding problem for rates below $R_0$. Indeed, SD was the method of choice in space communications, albeit briefly. Sequential decoding and the cutoff rate 6 / 72
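
As a numerical illustration (an added sketch, not from the talk), the cutoff-rate expression can be evaluated for a DMC specified by a transition matrix W[x][y] and an input distribution Q; the BSC example and the helper name cutoff_rate are assumptions of the sketch.

    import math

    def cutoff_rate(Q, W):
        """R0 = -log2 sum_y ( sum_x Q(x) * sqrt(W(y|x)) )^2 for a DMC, in bits."""
        num_outputs = len(W[0])
        total = 0.0
        for y in range(num_outputs):
            inner = sum(Q[x] * math.sqrt(W[x][y]) for x in range(len(Q)))
            total += inner ** 2
        return -math.log2(total)

    # Example: BSC with crossover 0.1 and uniform inputs.
    eps = 0.1
    W = [[1 - eps, eps],
         [eps, 1 - eps]]
    print(f"R0 = {cutoff_rate([0.5, 0.5], W):.4f} bits")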

References on complexity of sequential decoding Achievability: Wozencraft (1957), Reiffen (1962), Fano (1963), Stiglitz and Yudkin (1964) Converse: Jacobs and Berlekamp (1967) Refinements: Wozencraft and Jacobs (1965), Savage (1966), Gallager (1968), Jelinek (1968), Forney (1974), Arıkan (1986), Arıkan (1994) Sequential decoding and the cutoff rate 7 / 72

Sequential decoding and the cutoff rate Guessing and cutoff rate Boosting the cutoff rate Pinsker's scheme Massey's scheme Polar coding Guessing and cutoff rate 8 / 72

A computational model for sequential decoding SD visits nodes at level N in a certain order. No look-ahead assumption: SD forgets what it saw beyond level N upon backtracking. Complexity measure $G_N$: the number of nodes searched (visited) at level N until the correct node is visited for the first time. Guessing and cutoff rate 9 / 72

A bound of computational complexity Let R be a fixed code rate. There exist tree codes of rate R such that $E[G_N] \le 1 + 2^{-N(R_0 - R)}$. Conversely, for any tree code of rate R, $E[G_N] \ge 1 + 2^{-N(R_0 - R)}$. Guessing and cutoff rate 10 / 72

The Guessing Problem Alice draws a sample of a random variable $X \sim P$. Bob wishes to determine X by asking questions of the form "Is X equal to x?" which are answered truthfully by Alice. Bob's goal is to minimize the expected number of questions until he gets a YES answer. Guessing and cutoff rate 11 / 72

Guessing with Side Information Alice samples $(X, Y) \sim P(x, y)$. Bob observes Y and is to determine X by asking the same type of questions "Is X equal to x?" The goal is to minimize the expected number of guesses. Guessing and cutoff rate 12 / 72

Optimal guessing strategies Let G be the number of guesses to determine X. The expected number of guesses is given by $E[G] = \sum_{x \in \mathcal{X}} P(x) G(x)$. A guessing strategy minimizes E[G] if $P(x) > P(x') \implies G(x) < G(x')$. Guessing and cutoff rate 13 / 72
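
A short illustrative sketch of the optimal rule (added here, not from the slides): guess values in decreasing order of probability and evaluate E[G]; the example distribution is arbitrary.

    def expected_guesses(P):
        """E[G] for the optimal strategy: guess values in decreasing order of P(x)."""
        probs = sorted(P, reverse=True)          # optimal guessing order
        return sum(i * p for i, p in enumerate(probs, start=1))

    P = [0.5, 0.25, 0.15, 0.1]
    print(f"E[G*] = {expected_guesses(P):.3f}")  # 1*0.5 + 2*0.25 + 3*0.15 + 4*0.1 = 1.85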

Upper bound on guessing effort For any optimal guessing function $G^*$, $E[G^*(X)] \le \bigl[\sum_x \sqrt{P(x)}\bigr]^2$. Proof. For each x, $G^*(x) \le \sum_{\text{all } x'} \sqrt{P(x')/P(x)}$, hence $E[G^*(X)] \le \sum_x P(x) \sum_{x'} \sqrt{P(x')/P(x)} = \bigl[\sum_x \sqrt{P(x)}\bigr]^2$. Guessing and cutoff rate 14 / 72

Lower bound on guessing effort For any guessing function for a target r.v. X with M possible values, $E[G(X)] \ge (1 + \ln M)^{-1} \bigl[\sum_x \sqrt{P(x)}\bigr]^2$. For the proof we use the following variant of Hölder's inequality. Guessing and cutoff rate 15 / 72

Lemma. Let $a_i$, $p_i$ be positive numbers. Then $\sum_i a_i p_i \ge \bigl[\sum_i a_i^{-1}\bigr]^{-1} \bigl[\sum_i \sqrt{p_i}\bigr]^2$. Proof. Let $\lambda = 1/2$ and put $A_i = a_i^{-\lambda}$, $B_i = a_i^{\lambda} p_i^{\lambda}$ in Hölder's inequality $\sum_i A_i B_i \le \bigl[\sum_i A_i^{1/(1-\lambda)}\bigr]^{1-\lambda} \bigl[\sum_i B_i^{1/\lambda}\bigr]^{\lambda}$. Guessing and cutoff rate 16 / 72

Proof of Lower Bound $E[G(X)] = \sum_{i=1}^{M} i\, p_G(i) \ge \bigl[\sum_{i=1}^{M} 1/i\bigr]^{-1} \bigl[\sum_{i=1}^{M} \sqrt{p_G(i)}\bigr]^2 = \bigl[\sum_{i=1}^{M} 1/i\bigr]^{-1} \bigl[\sum_x \sqrt{P(x)}\bigr]^2 \ge (1 + \ln M)^{-1} \bigl[\sum_x \sqrt{P(x)}\bigr]^2$. Guessing and cutoff rate 17 / 72

Essence of the inequalities For any set of real numbers $p_1 \ge p_2 \ge \dots \ge p_M > 0$, $1 \ge \frac{\sum_{i=1}^{M} i\, p_i}{\bigl[\sum_{i=1}^{M} \sqrt{p_i}\bigr]^2} \ge (1 + \ln M)^{-1}$. Guessing and cutoff rate 18 / 72
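
The sandwich above can be checked numerically; the following illustrative sketch (added, assuming the form just stated) evaluates both bounds for a short decreasing probability vector whose particular values are arbitrary.

    import math

    p = sorted([0.4, 0.3, 0.2, 0.07, 0.03], reverse=True)   # p_1 >= ... >= p_M > 0
    M = len(p)

    sum_ip = sum(i * pi for i, pi in enumerate(p, start=1))   # sum_i i * p_i
    sq = sum(math.sqrt(pi) for pi in p) ** 2                  # [sum_i sqrt(p_i)]^2

    print(f"lower bound: {sq / (1 + math.log(M)):.3f}")
    print(f"sum i*p_i  : {sum_ip:.3f}")
    print(f"upper bound: {sq:.3f}")
    assert sq / (1 + math.log(M)) <= sum_ip <= sq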

Guessing Random Vectors Let $X = (X_1, \dots, X_n) \sim P(x_1, \dots, x_n)$. Guessing X means asking questions of the form "Is X = x?" for possible values $x = (x_1, \dots, x_n)$ of X. Notice that coordinate-wise probes of the type "Is $X_i = x_i$?" are not allowed. Guessing and cutoff rate 19 / 72

Complexity of Vector Guessing Suppose $X_i$ has $M_i$ possible values, $i = 1, \dots, n$. Then $1 \ge \frac{E[G^*(X_1, \dots, X_n)]}{\bigl[\sum_{x_1, \dots, x_n} \sqrt{P(x_1, \dots, x_n)}\bigr]^2} \ge [1 + \ln(M_1 \cdots M_n)]^{-1}$. In particular, if $X_1, \dots, X_n$ are i.i.d. $\sim P$ with a common alphabet $\mathcal{X}$, $1 \ge \frac{E[G^*(X_1, \dots, X_n)]}{\bigl[\sum_{x \in \mathcal{X}} \sqrt{P(x)}\bigr]^{2n}} \ge [1 + n \ln|\mathcal{X}|]^{-1}$. Guessing and cutoff rate 20 / 72

Guessing with Side Information $(X, Y)$ is a pair of random variables with a joint distribution $P(x, y)$. Y known. X to be guessed as before. $G(x \mid y)$ is the number of guesses when $X = x$, $Y = y$. Guessing and cutoff rate 21 / 72

Lower Bound For any guessing strategy, $E[G(X \mid Y)] \ge (1 + \ln M)^{-1} \sum_y \bigl[\sum_x \sqrt{P(x, y)}\bigr]^2$, where M is the number of possible values of X. Proof. $E[G(X \mid Y)] = \sum_y P(y) E[G(X \mid Y = y)] \ge \sum_y P(y)(1 + \ln M)^{-1} \bigl[\sum_x \sqrt{P(x \mid y)}\bigr]^2 = (1 + \ln M)^{-1} \sum_y \bigl[\sum_x \sqrt{P(x, y)}\bigr]^2$. Guessing and cutoff rate 22 / 72

Upper bound Optimal guessing functions satisfy $E[G^*(X \mid Y)] \le \sum_y \bigl[\sum_x \sqrt{P(x, y)}\bigr]^2$. Proof. $E[G^*(X \mid Y)] = \sum_y P(y) \sum_x P(x \mid y) G^*(x \mid y) \le \sum_y P(y) \bigl[\sum_x \sqrt{P(x \mid y)}\bigr]^2 = \sum_y \bigl[\sum_x \sqrt{P(x, y)}\bigr]^2$. Guessing and cutoff rate 23 / 72

Generalization to Random Vectors For optimal guessing functions, $1 \ge \frac{E[G^*(X_1, \dots, X_k \mid Y_1, \dots, Y_n)]}{\sum_{y_1, \dots, y_n} \bigl[\sum_{x_1, \dots, x_k} \sqrt{P(x_1, \dots, x_k, y_1, \dots, y_n)}\bigr]^2} \ge [1 + \ln(M_1 \cdots M_k)]^{-1}$, where $M_i$ denotes the number of possible values of $X_i$. Guessing and cutoff rate 24 / 72

A guessing decoder Consider a block code with M codewords $x_1, \dots, x_M$ of block length N. Suppose a codeword is chosen at random and sent over a channel W. Given the channel output y, a guessing decoder decodes by asking questions of the form "Is the correct codeword the mth one?" to which it receives a truthful YES or NO answer. On a NO answer it repeats the question with a new m. The complexity C for this decoder is the number of questions until a YES answer. Guessing and cutoff rate 25 / 72

Optimal guessing decoder An optimal guessing decoder is one that minimizes the expected complexity E[C]. Clearly, E[C] is minimized by generating the guesses in decreasing order of the likelihoods $W(y \mid x_m)$: $x_{i_1}$, 1st guess (the most likely codeword given y); $x_{i_2}$, 2nd guess (2nd most likely codeword given y); ...; $x_{i_L}$, correct codeword obtained, guessing stops. The complexity C equals the number of guesses L. Guessing and cutoff rate 26 / 72
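
A toy guessing decoder along these lines (an added illustrative sketch; the small code, the BSC model, and the helper names are assumptions): codewords are probed in decreasing order of the likelihood W(y|x), and the number of guesses is counted.

    def likelihood(y, x, eps):
        """W^N(y|x) for a BSC with crossover eps."""
        d = sum(yi != xi for yi, xi in zip(y, x))            # Hamming distance
        return (eps ** d) * ((1 - eps) ** (len(y) - d))

    def guessing_decoder(y, codebook, transmitted, eps):
        """Number of guesses until the transmitted codeword is confirmed."""
        order = sorted(codebook, key=lambda x: likelihood(y, x, eps), reverse=True)
        return order.index(transmitted) + 1

    codebook = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]   # a small linear code
    transmitted = (0, 1, 1)
    received = (0, 1, 0)                                      # one bit flipped
    print("guesses needed:", guessing_decoder(received, codebook, transmitted, eps=0.1))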

Application to the guessing decoder A block code $\mathcal{C} = \{x_1, \dots, x_M\}$ with $M = e^{NR}$ codewords of block length N. A codeword X chosen at random and sent over a DMC W. Given the channel output vector Y, the decoder guesses X. A special case of guessing with side information where $P(X = x, Y = y) = e^{-NR} \prod_{i=1}^{N} W(y_i \mid x_i)$, $x \in \mathcal{C}$. Guessing and cutoff rate 27 / 72

Cutoff rate bound $E[G^*(X \mid Y)] \ge [1 + NR]^{-1} \sum_y \bigl[\sum_{x \in \mathcal{C}} \sqrt{P(x, y)}\bigr]^2 = [1 + NR]^{-1} e^{NR} \sum_y \bigl[\sum_x Q_N(x) \sqrt{W^N(y \mid x)}\bigr]^2 \ge [1 + NR]^{-1} e^{N(R - R_0(W))}$, where $Q_N$ is the uniform distribution on the code and $R_0(W) = \max_Q\, -\ln \sum_y \bigl[\sum_x Q(x) \sqrt{W(y \mid x)}\bigr]^2$ is the channel cutoff rate. Guessing and cutoff rate 28 / 72

Sequential decoding and the cutoff rate Guessing and cutoff rate Boosting the cutoff rate Pinsker's scheme Massey's scheme Polar coding Boosting the cutoff rate 29 / 72

Boosting the cutoff rate It was clear almost from the beginning that $R_0$ was at best shaky in its role as a limit to practical communications. There were many attempts to boost the cutoff rate by devising clever schemes for searching a tree. One striking example is Pinsker's scheme, which displayed the strange nature of $R_0$. Boosting the cutoff rate 30 / 72

Sequential decoding and the cutoff rate Guessing and cutoff rate Boosting the cutoff rate Pinsker's scheme Massey's scheme Polar coding Pinsker's scheme 31 / 72

Binary Symmetric Channel We will describe Pinsker's scheme using the BSC example: Capacity $C = 1 + \epsilon \log_2(\epsilon) + (1 - \epsilon) \log_2(1 - \epsilon)$. Cutoff rate $R_0 = \log_2 \frac{2}{1 + 2\sqrt{\epsilon(1 - \epsilon)}}$. Pinsker's scheme 32 / 72
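
Both expressions are easy to evaluate; the illustrative sketch below (added, with arbitrarily chosen crossover probabilities) prints C, R_0, and their ratio.

    import math

    def bsc_capacity(eps):
        if eps in (0.0, 1.0):
            return 1.0
        return 1 + eps * math.log2(eps) + (1 - eps) * math.log2(1 - eps)

    def bsc_cutoff_rate(eps):
        return math.log2(2 / (1 + 2 * math.sqrt(eps * (1 - eps))))

    for eps in (0.001, 0.01, 0.1, 0.3):
        C, R0 = bsc_capacity(eps), bsc_cutoff_rate(eps)
        print(f"eps={eps:<6} C={C:.4f}  R0={R0:.4f}  R0/C={R0 / C:.3f}")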

Capacity and cutoff rate for the BSC [Figure: plots of $R_0$ and $C$, and of the ratio $R_0/C$, as functions of the crossover probability $\epsilon$.] Pinsker's scheme 33 / 72

Pinsker's scheme Based on the observations that, as $\epsilon \to 0$, $R_0(\epsilon)/C(\epsilon) \to 1$ and $R_0(\epsilon) \to 1$, Pinsker (1965) proposed a concatenation scheme that achieved capacity within constant average cost per decoded bit irrespective of the level of reliability. Pinsker's scheme 34 / 72

Pinsker's scheme [Figure: $K_2$ identical convolutional encoders $CE_1, \dots, CE_{K_2}$ feed a block encoder; the block codeword of length $N_2$ is sent over $N_2$ independent copies of W; an ML block decoder is followed by $K_2$ independent sequential decoders $SD_1, \dots, SD_{K_2}$.] The inner block code does the initial clean-up at huge but finite complexity; the outer convolutional encoding (CE) and sequential decoding (SD) boost the reliability at little extra cost. Pinsker's scheme 35 / 72

Discussion Although Pinsker's scheme made a very strong theoretical point, it was not practical. There were many more attempts to go around the $R_0$ barrier in the 1960s: D. Falconer, A Hybrid Sequential and Algebraic Decoding Scheme, Sc.D. thesis, Dept. of Elec. Eng., M.I.T., 1966. I. Stiglitz, Iterative sequential decoding, IEEE Transactions on Information Theory, vol. 15, no. 6, pp. 715-721, Nov. 1969. F. Jelinek and J. Cocke, Bootstrap hybrid decoding for symmetrical binary input channels, Inform. Contr., vol. 18, no. 3, pp. 261-298, Apr. 1971. It is fair to say that none of these schemes had any practical impact. Pinsker's scheme 36 / 72

$R_0$ as practical capacity The failure to beat the cutoff rate bound in a meaningful manner despite intense efforts elevated $R_0$ to the status of a realistic limit to reliable communications. $R_0$ appears as the key figure-of-merit for communication system design in the influential works of the period: Wozencraft and Jacobs, Principles of Communication Engineering, 1965. Wozencraft and Kennedy, Modulation and demodulation for probabilistic coding, IT Trans., 1966. Massey, Coding and modulation in digital communications, Zürich, 1974. Forney (1995) gives a first-hand account of this situation in his Shannon Lecture "Performance and Complexity". Pinsker's scheme 37 / 72

Other attempts to boost the cutoff rate Efforts to beat the cutoff rate continue to this day: D. J. Costello and F. Jelinek, 1972. P. R. Chevillat and D. J. Costello Jr., 1977. F. Hemmati, 1990. B. Radosavljevic, E. Arıkan, B. Hajek, 1992. J. Belzile and D. Haccoun, 1993. S. Kallel and K. Li, 1997. E. Arıkan, 2006... In fact, polar coding originates from such attempts. Pinsker's scheme 38 / 72

Sequential decoding and the cutoff rate Guessing and cutoff rate Boosting the cutoff rate Pinsker's scheme Massey's scheme Polar coding Massey's scheme 39 / 72

The $R_0$ debate A case study by McEliece (1980) cast a big doubt on the significance of $R_0$ as a practical limit. McEliece's study was concerned with a Pulse Position Modulation (PPM) scheme, modeled as a q-ary erasure channel with erasure probability $\epsilon$. Capacity: $C(q) = (1 - \epsilon)\log q$. Cutoff rate: $R_0(q) = \log \frac{q}{1 + (q - 1)\epsilon}$. As the bandwidth (q) grew, $R_0(q)/C(q) \to 0$. Algebraic coding (Reed-Solomon) scored a big win over probabilistic coding! Massey's scheme 40 / 72
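
A numerical sketch (added here, not from the slides) of McEliece's point: with $\epsilon$ held fixed, $R_0(q)$ saturates while $C(q)$ keeps growing like $\log q$, so the ratio tends to 0; the parameter values are illustrative.

    import math

    def qec_capacity(q, eps):
        return (1 - eps) * math.log2(q)

    def qec_cutoff_rate(q, eps):
        return math.log2(q / (1 + (q - 1) * eps))

    eps = 0.1
    for q in (2, 4, 16, 256, 4096):
        C, R0 = qec_capacity(q, eps), qec_cutoff_rate(q, eps)
        print(f"q={q:<5} C={C:7.3f}  R0={R0:6.3f}  R0/C={R0 / C:.3f}")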

Massey meets the challenge Massey (1981) showed that there was a different way of doing coding and modulation on a q-ary erasure channel that boosted $R_0$ effortlessly. Paradoxically, as Massey restored the status of $R_0$, he exhibited the flaky nature of this parameter. Massey's scheme 42 / 72

Channel splitting to boost cutoff rate (Massey, 1981) [Figure: a quaternary erasure channel (QEC) with erasure probability $\epsilon$; its inputs {1, 2, 3, 4} are relabeled {00, 01, 10, 11}, and the channel is then split into two binary erasure channels (BECs).] Begin with a quaternary erasure channel (QEC). Relabel the inputs. Split the QEC into two binary erasure channels (BECs); the BECs are fully correlated: erasures occur jointly. Massey's scheme 43-45 / 72

Capacity, cutoff rate for one QEC vs two BECs [Figure: ordinary coding of the QEC (one encoder-decoder pair) vs independent coding of the two BECs (two encoder-decoder pairs).] $C(\text{QEC}) = 2(1 - \epsilon)$, $R_0(\text{QEC}) = \log \frac{4}{1 + 3\epsilon}$; $C(\text{BEC}) = 1 - \epsilon$, $R_0(\text{BEC}) = \log \frac{2}{1 + \epsilon}$. $C(\text{QEC}) = 2\, C(\text{BEC})$, but $R_0(\text{QEC}) \le 2\, R_0(\text{BEC})$, with equality iff $\epsilon = 0$ or 1. Massey's scheme 46 / 72
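
The splitting gain can be checked directly; the illustrative sketch below (added, with arbitrary erasure probabilities) compares $R_0(\text{QEC})$ with $2\, R_0(\text{BEC})$ alongside the matching capacities.

    import math

    def qec_cutoff_rate(eps):                 # quaternary erasure channel
        return math.log2(4 / (1 + 3 * eps))

    def bec_cutoff_rate(eps):                 # binary erasure channel
        return math.log2(2 / (1 + eps))

    for eps in (0.0, 0.1, 0.3, 0.5, 0.9):
        r_qec, r_split = qec_cutoff_rate(eps), 2 * bec_cutoff_rate(eps)
        print(f"eps={eps:<4} C(QEC)=2*C(BEC)={2 * (1 - eps):.2f}  "
              f"R0(QEC)={r_qec:.3f}  2*R0(BEC)={r_split:.3f}  gain={r_split - r_qec:.3f}")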

Cutoff rate improvement by splitting [Figure: capacity and cutoff rate (bits) versus erasure probability $\epsilon$, comparing $2 \times$ the BEC cutoff rate, the QEC capacity, and the QEC cutoff rate.] Massey's scheme 47 / 72

Comparison of Pinsker's and Massey's schemes Pinsker: construct a superchannel by combining independent copies of a given DMC W; split the superchannel into correlated subchannels; ignore correlations between the subchannels, encode and decode them independently. Can be used universally; can achieve capacity; not practical. Massey: split the given DMC W into correlated subchannels; ignore correlations between the subchannels, encode and decode them independently. Applicable only to specific channels; cannot achieve capacity; practical. Massey's scheme 48 / 72

A conservation law for the cutoff rate [Figure: a derived (vector) channel of rate K/N, formed by a block encoder, N uses of the memoryless channel W, and a block decoder.] Parallel channels theorem (Gallager, 1965): $R_0(\text{derived vector channel}) \le N\, R_0(W)$. Cleaning up the channel by pre-/post-processing can only hurt $R_0$. Shows that boosting the cutoff rate requires more than one sequential decoder. Massey's scheme 49 / 72

Sequential decoding and the cutoff rate Guessing and cutoff rate Boosting the cutoff rate Pinsker's scheme Massey's scheme Polar coding Polar coding 50 / 72

Prescription for a new scheme Consider small constructions. Retain independent encoding for the subchannels. Do not ignore correlations between subchannels at the expense of capacity. This points to multi-level coding and successive cancellation decoding. Polar coding 51 / 72

Multi-stage decoding architecture [Figure: data streams $d_1, \dots, d_N$ enter N convolutional encoders $CE_1, \dots, CE_N$, producing $u_1, \dots, u_N$; a one-to-one mapper $f_N$ turns these into channel inputs $x_1, \dots, x_N$, sent over N independent copies of W; from the outputs $y_1, \dots, y_N$ a soft-decision generator $g_N$ produces inputs $l_1, \dots, l_N$ for N sequential decoders $SD_1, \dots, SD_N$.] N convolutional encoders, N independent copies of W, N sequential decoders. Polar coding 52 / 72

Prescription for a new scheme Consider small constructions. Retain independent encoding for the subchannels. Do not ignore correlations between subchannels at the expense of capacity. This points to multi-level coding and successive cancellation decoding. Polar coding 53 / 72

Notation Let $V: \mathbb{F}_2 = \{0, 1\} \to \mathcal{Y}$ be an arbitrary binary-input memoryless channel. Let (X, Y) be an input-output ensemble for channel V with X uniform on $\mathbb{F}_2$. The (symmetric) capacity is defined as $I(V) = I(X; Y) = \sum_{y \in \mathcal{Y}} \sum_{x \in \mathbb{F}_2} \tfrac{1}{2} V(y \mid x) \log \frac{V(y \mid x)}{\tfrac{1}{2} V(y \mid 0) + \tfrac{1}{2} V(y \mid 1)}$. The (symmetric) cutoff rate is defined as $R_0(V) = R_0(X; Y) = -\log \sum_{y \in \mathcal{Y}} \Bigl[\sum_{x \in \mathbb{F}_2} \tfrac{1}{2}\sqrt{V(y \mid x)}\Bigr]^2$. Polar coding 54 / 72
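
Both quantities are straightforward to compute for a channel given as a table; in the added illustrative sketch below, a binary-input channel is represented (an assumption of the sketch) by a dict mapping each output y to the pair (V(y|0), V(y|1)), and the two formulas are evaluated for a BSC.

    import math

    def symmetric_capacity(V):
        """I(V) for uniform inputs; V maps y -> (V(y|0), V(y|1))."""
        total = 0.0
        for p0, p1 in V.values():
            q = 0.5 * p0 + 0.5 * p1                      # output distribution P(y)
            for p in (p0, p1):
                if p > 0:
                    total += 0.5 * p * math.log2(p / q)
        return total

    def symmetric_cutoff_rate(V):
        """R0(V) = -log2 sum_y (0.5*sqrt(V(y|0)) + 0.5*sqrt(V(y|1)))^2."""
        s = sum((0.5 * math.sqrt(p0) + 0.5 * math.sqrt(p1)) ** 2 for p0, p1 in V.values())
        return -math.log2(s)

    # Example: BSC(0.1) written in this form.
    eps = 0.1
    bsc = {0: (1 - eps, eps), 1: (eps, 1 - eps)}
    print(f"I = {symmetric_capacity(bsc):.4f}, R0 = {symmetric_cutoff_rate(bsc):.4f}")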

The basic construction Given two copies of a binary-input channel $W: \mathbb{F}_2 = \{0, 1\} \to \mathcal{Y}$, [Figure: inputs $U_1, U_2$ are mapped to $X_1 = U_1 + U_2$ and $X_2 = U_2$ and sent through the two copies of W, producing outputs $Y_1, Y_2$.] consider the transformation above to generate two channels $W^-: \mathbb{F}_2 \to \mathcal{Y}^2$ and $W^+: \mathbb{F}_2 \to \mathcal{Y}^2 \times \mathbb{F}_2$ with $W^-(y_1 y_2 \mid u_1) = \sum_{u_2} \tfrac{1}{2} W(y_1 \mid u_1 + u_2) W(y_2 \mid u_2)$ and $W^+(y_1 y_2 u_1 \mid u_2) = \tfrac{1}{2} W(y_1 \mid u_1 + u_2) W(y_2 \mid u_2)$. Polar coding 55 / 72

The 2x2 transformation is information lossless With independent, uniform $U_1, U_2$: $I(W^-) = I(U_1; Y_1 Y_2)$, $I(W^+) = I(U_2; Y_1 Y_2 U_1)$. Thus, $I(W^-) + I(W^+) = I(U_1 U_2; Y_1 Y_2) = 2 I(W)$, and $I(W^-) \le I(W) \le I(W^+)$. Polar coding 56 / 72

The 2x2 transformation creates cutoff rate With independent, uniform $U_1, U_2$: $R_0(W^-) = R_0(U_1; Y_1 Y_2)$, $R_0(W^+) = R_0(U_2; Y_1 Y_2 U_1)$. Theorem (2005). Correlation helps create cutoff rate: $R_0(W^-) + R_0(W^+) \ge 2 R_0(W)$, with equality iff W is a perfect channel, $I(W) = 1$, or a pure noise channel, $I(W) = 0$. Cutoff rates start polarizing: $R_0(W^-) \le R_0(W) \le R_0(W^+)$. Polar coding 57 / 72
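
The construction and both statements can be checked numerically; the added illustrative sketch below builds the transition probabilities of $W^-$ and $W^+$ from a BSC (reusing the dict representation assumed earlier) and prints the capacity and cutoff-rate sums.

    import math

    def symmetric_capacity(V):
        total = 0.0
        for p0, p1 in V.values():
            q = 0.5 * p0 + 0.5 * p1
            for p in (p0, p1):
                if p > 0:
                    total += 0.5 * p * math.log2(p / q)
        return total

    def symmetric_cutoff_rate(V):
        s = sum((0.5 * math.sqrt(p0) + 0.5 * math.sqrt(p1)) ** 2 for p0, p1 in V.values())
        return -math.log2(s)

    def polar_transform(W):
        """Return (W_minus, W_plus) as dicts: output -> (prob given 0, prob given 1)."""
        Wm, Wp = {}, {}
        for y1, (a0, a1) in W.items():
            for y2, (b0, b1) in W.items():
                # W^-(y1,y2 | u1) = sum_{u2} 0.5 * W(y1 | u1 ^ u2) * W(y2 | u2)
                Wm[(y1, y2)] = (0.5 * (a0 * b0 + a1 * b1),    # u1 = 0
                                0.5 * (a1 * b0 + a0 * b1))    # u1 = 1
                # W^+(y1,y2,u1 | u2) = 0.5 * W(y1 | u1 ^ u2) * W(y2 | u2)
                for u1 in (0, 1):
                    Wp[(y1, y2, u1)] = (0.5 * (a0 if u1 == 0 else a1) * b0,   # u2 = 0
                                        0.5 * (a1 if u1 == 0 else a0) * b1)   # u2 = 1
        return Wm, Wp

    eps = 0.1
    W = {0: (1 - eps, eps), 1: (eps, 1 - eps)}
    Wm, Wp = polar_transform(W)
    I, Im, Ip = (symmetric_capacity(V) for V in (W, Wm, Wp))
    R, Rm, Rp = (symmetric_cutoff_rate(V) for V in (W, Wm, Wp))
    print(f"I:  {Im:.4f} + {Ip:.4f} = {Im + Ip:.4f}  (2*I(W) = {2 * I:.4f})")
    print(f"R0: {Rm:.4f} + {Rp:.4f} = {Rm + Rp:.4f} >= 2*R0(W) = {2 * R:.4f}")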

Recursive continuation Do the same recursively: Given W, duplicate W and obtain $W^-$ and $W^+$. Duplicate $W^-$ ($W^+$), and obtain $W^{--}$ and $W^{-+}$ ($W^{+-}$ and $W^{++}$). Duplicate $W^{--}$ ($W^{-+}$, $W^{+-}$, $W^{++}$) and obtain $W^{---}$ and $W^{--+}$ ($W^{-+-}$, $W^{-++}$, $W^{+--}$, $W^{+-+}$, $W^{++-}$, $W^{+++}$). ... Polar coding 58 / 72
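
For the BEC this recursion has a well-known closed form: if I is the symmetric capacity of a BEC, then $I(W^-) = I^2$ and $I(W^+) = 2I - I^2$. The added illustrative sketch below iterates it and shows the $2^n$ values drifting toward 0 and 1 while their mean stays at I(W); the level count and thresholds are arbitrary.

    def polarize_bec(I0, levels):
        """Symmetric capacities of the 2**levels channels obtained from a BEC with I(W) = I0."""
        values = [I0]
        for _ in range(levels):
            values = [v for I in values for v in (I * I, 2 * I - I * I)]   # (I^-, I^+)
        return values

    values = polarize_bec(0.5, levels=12)             # BEC with erasure probability 0.5
    near_one = sum(v > 0.99 for v in values) / len(values)
    near_zero = sum(v < 0.01 for v in values) / len(values)
    print(f"mean capacity   = {sum(values) / len(values):.4f} (conserved)")
    print(f"fraction near 1 = {near_one:.3f}, fraction near 0 = {near_zero:.3f}")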

Polarization Process Evolution of $I = I(W)$, $I^+ = I(W^+)$, $I^- = I(W^-)$, $I^{++} = I(W^{++})$, etc. [Figure: the values plotted level by level for $n = 0, 1, \dots, 7$ steps of the recursion, branching into $2^n$ values that spread toward 0 and 1.] Polar coding 59-66 / 72