On the Convergence of the Polarization Process in the Noisiness/Weak-* Topology

Rajai Nasser
Email: rajai.nasser@alumni.epfl.ch
arXiv:1810.10821v2 [cs.IT] 26 Oct 2018

Abstract: Let $W$ be a channel where the input alphabet is endowed with an Abelian group operation, and let $(W_n)_{n\geq 0}$ be Arıkan's channel-valued polarization process that is obtained from $W$ using this operation. We prove that the process $(W_n)_{n\geq 0}$ converges almost surely to deterministic homomorphism channels in the noisiness/weak-* topology. This provides a simple proof of multilevel polarization for a large family of channels, containing, among others, discrete memoryless channels (DMCs) and channels with continuous output alphabets. This also shows that any continuous channel functional converges almost surely (even if the functional does not induce a submartingale or a supermartingale).

I. INTRODUCTION

Polar codes are a family of capacity-achieving codes which were first introduced by Arıkan for binary-input channels [1]. The construction of polar codes relies on a phenomenon called polarization: a collection of independent copies of a channel is converted into a collection of synthetic channels that are extreme, i.e., almost useless or almost perfect. The construction of polar codes was later generalized to channels with arbitrary (but finite) input alphabets [2], [3], [4], [5], [6], [7]. Note that for channels whose input alphabet size is not prime, polarization is not necessarily a two-level polarization (to useless and perfect channels): we may have multilevel polarization, where the polarized channels can be neither useless nor perfect.

In this paper, we are interested in the general multilevel polarization phenomenon which happens when we apply an Arıkan-style transformation that is based on an Abelian group operation. It was shown in [4] that as the number of polarization steps becomes large, the behavior of the synthetic channels resembles that of deterministic homomorphism channels projecting their input onto a quotient group. This resemblance was formulated in [4] using Bhattacharyya parameters. We may say (informally) that as the number of polarization steps goes to infinity, the synthetic channels converge to deterministic homomorphism channels. One reason why this statement is informal is that the synthetic channels do not have the same output alphabet, so in order to make the statement formal, we must define a space in which we can topologically compare channels with different output alphabets.

In [8], we defined the space of all channels with a fixed input alphabet and an arbitrary but finite output alphabet. This space was first quotiented by an equivalence relation, and then several topological structures were defined on it. In this paper, we show that Arıkan's polarization process does converge in the noisiness/weak-* topology to deterministic homomorphism channels. The proof uses the Blackwell measure of channels (Blackwell measures were used in [9] and [10] for analyzing the polarization of binary-input channels), and hence can be generalized to all channels whose equivalence class can be determined by the Blackwell measure. This family of channels contains, among others, all discrete memoryless channels and all channels with continuous output alphabets. Another advantage of our proof is that it implies the convergence of all channel functionals that are continuous in the noisiness/weak-* topology. Therefore, we have convergence of those functionals even if they do not induce a submartingale or a supermartingale process.

In Section II, we introduce the preliminaries of this paper.
In Section III, we recall the multilevel polarization phenomenon. In Section IV, we show the convergence of the polarization process in the noisiness/weak-* topology. For simplicity, we only discuss discrete memoryless channels, but the proof is valid for any channel whose equivalence class is determined by the Blackwell measure.

II. PRELIMINARIES

A. Meta-Probability Measures

Let $\mathcal{X}$ be a finite set. The set of probability distributions on $\mathcal{X}$ is denoted as $\Delta_{\mathcal{X}}$. We associate $\Delta_{\mathcal{X}}$ with its Borel $\sigma$-algebra. A meta-probability measure on $\mathcal{X}$ is a probability measure on the Borel sets of $\Delta_{\mathcal{X}}$. It is called a meta-probability measure because it is a probability measure on the set of probability distributions on $\mathcal{X}$. We denote the set of meta-probability measures on $\mathcal{X}$ as $\mathrm{MP}(\mathcal{X})$.

A meta-probability measure $\mathrm{MP}$ on $\mathcal{X}$ is said to be balanced if
$$\int_{\Delta_{\mathcal{X}}} p \, d\mathrm{MP}(p) = \pi_{\mathcal{X}},$$
where $\pi_{\mathcal{X}}$ is the uniform probability distribution on $\mathcal{X}$. The set of balanced meta-probability measures on $\mathcal{X}$ is denoted as $\mathrm{MP}_b(\mathcal{X})$. The set of balanced and finitely supported meta-probability measures on $\mathcal{X}$ is denoted as $\mathrm{MP}_{bf}(\mathcal{X})$.
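To make the balance condition concrete, here is a minimal numerical sketch (not from the paper; the two-element alphabet, the chosen atoms, and the helper name `is_balanced` are illustrative assumptions) of a finitely supported meta-probability measure and a check that its barycenter is the uniform distribution.

```python
import numpy as np

# A finitely supported meta-probability measure on X = {0, 1}, written as a list of
# (weight, distribution) pairs: MP = sum_i w_i * delta_{p_i}.
MP = [
    (0.5, np.array([0.8, 0.2])),   # an atom at the distribution (0.8, 0.2)
    (0.5, np.array([0.2, 0.8])),   # an atom at the distribution (0.2, 0.8)
]

def is_balanced(mp, tol=1e-12):
    """Check the balance condition: the barycenter of MP equals the uniform distribution."""
    weights = np.array([w for w, _ in mp])
    atoms = np.stack([p for _, p in mp])
    assert np.isclose(weights.sum(), 1.0), "MP must be a probability measure"
    barycenter = weights @ atoms                      # = integral of p dMP(p), atom by atom
    uniform = np.full(atoms.shape[1], 1.0 / atoms.shape[1])
    return bool(np.allclose(barycenter, uniform, atol=tol))

print(is_balanced(MP))   # True: this measure belongs to MP_bf({0, 1})
```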

B. DMC Spaces

Let $\mathcal{X}$ and $\mathcal{Y}$ be two finite sets. The set of discrete memoryless channels (DMCs) with input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ is denoted as $\mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$. The set of channels with input alphabet $\mathcal{X}$ is defined as
$$\mathrm{DMC}_{\mathcal{X},\ast} = \coprod_{n\geq 1} \mathrm{DMC}_{\mathcal{X},[n]},$$
where $[n] = \{1,\ldots,n\}$ and $\coprod$ is the disjoint union symbol. The symbol $\ast$ in $\mathrm{DMC}_{\mathcal{X},\ast}$ means that the output alphabet is arbitrary but finite.

Let $\mathcal{X}$, $\mathcal{Y}$ and $\mathcal{Z}$ be three finite sets. Let $W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$ and $V \in \mathrm{DMC}_{\mathcal{Y},\mathcal{Z}}$. The composition of $V$ and $W$ is the channel $V \circ W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Z}}$ defined as
$$(V \circ W)(z|x) = \sum_{y\in\mathcal{Y}} V(z|y)\, W(y|x).$$
A channel $W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$ is said to be degraded from another channel $W' \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}'}$ if there exists a channel $V \in \mathrm{DMC}_{\mathcal{Y}',\mathcal{Y}}$ such that $W = V \circ W'$. Two channels are said to be equivalent if each one is degraded from the other. It is well known that if two channels are equivalent, then every code has the same probability of error (under ML decoding) for both channels. This is why it makes sense, from an information-theoretic point of view, to identify equivalent channels and consider them as one object in the space of equivalent channels. The quotient of $\mathrm{DMC}_{\mathcal{X},\ast}$ by the equivalence relation is denoted as $\mathrm{DMC}_{\mathcal{X},\ast}^{(o)}$. The equivalence class of a channel $W \in \mathrm{DMC}_{\mathcal{X},\ast}$ is denoted as $\hat{W}$.

C. A Necessary and Sufficient Condition for Degradedness

Let $\mathcal{U},\mathcal{X},\mathcal{Y}$ be three finite sets and let $W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$. For every $p \in \Delta_{\mathcal{U}\times\mathcal{X}}$, define
$$P_c(p,W) = \sup_{D \in \mathrm{DMC}_{\mathcal{Y},\mathcal{U}}} \; \sum_{u\in\mathcal{U},\, x\in\mathcal{X},\, y\in\mathcal{Y}} p(u,x)\, W(y|x)\, D(u|y).$$
$P_c(p,W)$ can be interpreted as follows: let $(U,X)$ be a pair of random variables in $\mathcal{U}\times\mathcal{X}$ distributed according to $p$. Send $X$ through the channel $W$ and let $Y$ be the output. $P_c(p,W)$ can be seen as the optimal probability of correctly guessing $U$ from $Y$ among all random decoders $D \in \mathrm{DMC}_{\mathcal{Y},\mathcal{U}}$.

Now let $\mathcal{X},\mathcal{Y},\mathcal{Y}'$ be three finite sets and let $W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$ and $W' \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}'}$. Buscemi proved in [11] that $W$ is degraded from $W'$ if and only if $P_c(p,W) \leq P_c(p,W')$ for every $p \in \Delta_{\mathcal{U}\times\mathcal{X}}$ and every finite set $\mathcal{U}$. This means that $W$ and $W'$ are equivalent if and only if $P_c(p,W) = P_c(p,W')$ for every $p \in \Delta_{\mathcal{U}\times\mathcal{X}}$ and every finite set $\mathcal{U}$. Therefore, if $p \in \Delta_{\mathcal{U}\times\mathcal{X}}$ and $\hat{W} \in \mathrm{DMC}_{\mathcal{X},\ast}^{(o)}$, we can define $P_c(p,\hat{W}) = P_c(p,W)$ for any $W \in \hat{W}$.

D. Blackwell Measures

Let $W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$. Let $(X,Y)$ be a pair of random variables in $\mathcal{X}\times\mathcal{Y}$ which is distributed as $P_{X,Y}(x,y) = \frac{1}{|\mathcal{X}|} W(y|x)$. In other words, $X$ is uniformly distributed in $\mathcal{X}$ and $Y$ is the output of the channel $W$ when $X$ is the input. For every $y \in \mathcal{Y}$ satisfying $P_Y(y) > 0$, let $W_y^{-1} \in \Delta_{\mathcal{X}}$ be the posterior probability distribution of the input assuming $y$ was received. More precisely,
$$W_y^{-1}(x) = P_{X|Y}(x|y) = \frac{W(y|x)}{\sum_{x'\in\mathcal{X}} W(y|x')}.$$
The Blackwell measure of $W$ is the meta-probability measure $\mathrm{MP}_W \in \mathrm{MP}(\mathcal{X})$ which describes the random variable $W_Y^{-1}$. It is easy to see that
$$\mathrm{MP}_W = \sum_{y\in\mathcal{Y}:\, P_Y(y)>0} P_Y(y)\, \delta_{W_y^{-1}} \in \mathrm{MP}(\mathcal{X}).$$

The following proposition, which is easy to prove, characterizes the Blackwell measures of DMCs:

Proposition 1. [12] A meta-probability measure $\mathrm{MP} \in \mathrm{MP}(\mathcal{X})$ is the Blackwell measure of a DMC with input alphabet $\mathcal{X}$ if and only if $\mathrm{MP} \in \mathrm{MP}_{bf}(\mathcal{X})$.

The following proposition shows that the Blackwell measure characterizes the equivalence class of a channel:

Proposition 2. [12] Two channels $W \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}}$ and $W' \in \mathrm{DMC}_{\mathcal{X},\mathcal{Y}'}$ are equivalent if and only if $\mathrm{MP}_W = \mathrm{MP}_{W'}$.

For every $\hat{W} \in \mathrm{DMC}_{\mathcal{X},\ast}^{(o)}$, define $\mathrm{MP}_{\hat{W}} = \mathrm{MP}_W$ for any $W \in \hat{W}$. Proposition 2 shows that $\mathrm{MP}_{\hat{W}}$ is well defined.
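To illustrate the definition above, the following sketch (illustrative only; the example channel and the function name `blackwell_measure` are not from the paper) computes the Blackwell measure of a small DMC as a finitely supported meta-probability measure and checks that it is balanced, consistently with Proposition 1.

```python
import numpy as np

def blackwell_measure(W):
    """Blackwell measure of a DMC given as an |X| x |Y| matrix with W[x, y] = W(y|x).

    Returns a list of (P_Y(y), W_y^{-1}) pairs over the outputs with P_Y(y) > 0, where
    W_y^{-1} is the posterior distribution of a uniform input X given the output y."""
    nx = W.shape[0]
    joint = W / nx                      # P_{X,Y}(x, y) = W(y|x) / |X|
    p_y = joint.sum(axis=0)             # output distribution P_Y
    return [(p_y[y], joint[:, y] / p_y[y]) for y in range(W.shape[1]) if p_y[y] > 0]

# Example: a channel with input alphabet {0, 1} and output alphabet {0, 1, 2}.
W = np.array([[0.7, 0.1, 0.2],
              [0.1, 0.7, 0.2]])

MP_W = blackwell_measure(W)
print([(round(float(w), 3), p.round(3).tolist()) for w, p in MP_W])

# The barycenter of a Blackwell measure is the uniform input distribution,
# so MP_W is balanced (and finitely supported), as Proposition 1 requires.
barycenter = sum(w * p for w, p in MP_W)
print(np.allclose(barycenter, np.full(W.shape[0], 1.0 / W.shape[0])))   # True
```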
E. The Noisiness/Weak-* Topology

In [8], we defined the noisiness metric $d_{\mathcal{X},\ast}^{(o)}$ on $\mathrm{DMC}_{\mathcal{X},\ast}^{(o)}$ as follows:
$$d_{\mathcal{X},\ast}^{(o)}(\hat{W},\hat{W}') = \sup_{m\geq 1,\; p \in \Delta_{[m]\times\mathcal{X}}} \bigl| P_c(p,\hat{W}) - P_c(p,\hat{W}') \bigr|,$$
where $[m] = \{1,\ldots,m\}$. $d_{\mathcal{X},\ast}^{(o)}$ is called the noisiness metric because it compares the noisiness of $\hat{W}$ with that of $\hat{W}'$: if $P_c(p,\hat{W})$ is close to $P_c(p,\hat{W}')$ for every random encoder $p$, then $\hat{W}$ and $\hat{W}'$ have close noisiness levels. The topology on $\mathrm{DMC}_{\mathcal{X},\ast}^{(o)}$ which is induced by the metric $d_{\mathcal{X},\ast}^{(o)}$ is denoted as $T_{\mathcal{X},\ast}^{(o)}$.

Another way to topologize the space $\mathrm{DMC}_{\mathcal{X},\ast}^{(o)}$ is through Blackwell measures: Proposition 1 implies that the mapping $\hat{W} \mapsto \mathrm{MP}_{\hat{W}}$ is a bijection from $\mathrm{DMC}_{\mathcal{X},\ast}^{(o)}$ to $\mathrm{MP}_{bf}(\mathcal{X})$. We call this mapping the canonical bijection from $\mathrm{DMC}_{\mathcal{X},\ast}^{(o)}$ to $\mathrm{MP}_{bf}(\mathcal{X})$. By choosing a topology on $\mathrm{MP}_{bf}(\mathcal{X})$, we can construct a topology on $\mathrm{DMC}_{\mathcal{X},\ast}^{(o)}$ through the canonical bijection. We showed in [8] that the topology obtained in this way from the weak-* topology is exactly the same as $T_{\mathcal{X},\ast}^{(o)}$. This is why we call $T_{\mathcal{X},\ast}^{(o)}$ the noisiness/weak-* topology.

Remark 1. Since we identify $\mathrm{DMC}_{\mathcal{X},\ast}^{(o)}$ and $\mathrm{MP}_{bf}(\mathcal{X})$ through the canonical bijection, we can use $d_{\mathcal{X},\ast}^{(o)}$ to define a metric on $\mathrm{MP}_{bf}(\mathcal{X})$. Furthermore, since $\mathrm{MP}_{bf}(\mathcal{X})$ is dense in $\mathrm{MP}_b(\mathcal{X})$ (see, e.g., [8]), we can extend the definition of $d_{\mathcal{X},\ast}^{(o)}$ to $\mathrm{MP}_b(\mathcal{X})$ by continuity. Similarly, we can extend the definition of any channel parameter or operation which is continuous in the noisiness/weak-* topology (such as the symmetric capacity, Arıkan's polar transformations, etc. [13]) to $\mathrm{MP}_b(\mathcal{X})$.
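Since the objective in the definition of $P_c(p,W)$ is linear in the decoder $D$, the supremum is attained by a deterministic MAP decoder, which gives $P_c(p,W) = \sum_y \max_u \sum_x p(u,x)\, W(y|x)$. The sketch below (an illustration, not the paper's code; the random channels and the helper name `P_c` are assumptions of the example) evaluates this formula and checks numerically that composing $W$ with a further channel $V$ can only decrease $P_c$, in line with Buscemi's degradedness criterion.

```python
import numpy as np

def P_c(p, W):
    """Optimal probability of guessing U from the output of W, for p a |U| x |X| joint pmf
    and W an |X| x |Y| channel matrix. The sup over random decoders is attained by MAP:
    P_c(p, W) = sum_y max_u sum_x p(u, x) W(y | x)."""
    scores = p @ W                 # scores[u, y] = sum_x p(u, x) W(y | x)
    return float(scores.max(axis=0).sum())

rng = np.random.default_rng(0)

# A random channel W : X -> Y and a further (degrading) channel V : Y -> Z.
W = rng.dirichlet(np.ones(4), size=3)          # 3 inputs, 4 outputs
V = rng.dirichlet(np.ones(3), size=4)          # 4 inputs, 3 outputs
W_degraded = W @ V                             # the composition V o W, a channel from X to Z

# Compare the two channels through P_c for a few random joint distributions p on U x X.
for _ in range(5):
    p = rng.dirichlet(np.ones(2 * 3)).reshape(2, 3)    # |U| = 2
    assert P_c(p, W_degraded) <= P_c(p, W) + 1e-12     # degradation never helps the guesser
print("P_c(p, V o W) <= P_c(p, W) held for all sampled p")
```

The largest such gap over all $m$ and all $p \in \Delta_{[m]\times\mathcal{X}}$ is, by definition, the noisiness distance between the two equivalence classes.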

III. THE POLARIZATION PHENOMENON

A. Useful Notations

Throughout this paper, $(G,+)$ denotes a finite Abelian group. If $W$ is a channel, we denote the symmetric capacity of $W$ as $I(W)$ (the symmetric capacity of a channel is the mutual information between a uniformly distributed input and the output).

For every subgroup $H$ of $G$, define the channel $D_H \in \mathrm{DMC}_{G,G/H}$ as
$$D_H(A|x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{otherwise.} \end{cases}$$
In other words, $D_H$ is the deterministic channel where the output is the coset to which the input belongs. It is easy to see that $I(D_H) = \log|G/H|$. We denote the set $\{D_H : H \text{ is a subgroup of } G\}$ as $\mathrm{DH}_G$.

Now let $\mathcal{Y}$ be a finite set and let $W \in \mathrm{DMC}_{G,\mathcal{Y}}$. For every subgroup $H$ of $G$, define the channel $W[H] \in \mathrm{DMC}_{G/H,\mathcal{Y}}$ as
$$W[H](y|A) = \frac{1}{|A|} \sum_{x\in A} W(y|x) = \frac{1}{|H|} \sum_{x\in A} W(y|x).$$

Remark 2. Let $X$ be a random variable uniformly distributed in $G$ and let $Y$ be the output of the channel $W$. It is easy to see that $I(W[H]) = I(X \bmod H;\, Y)$.

Let $\delta > 0$. We say that a channel $W \in \mathrm{DMC}_{G,\mathcal{Y}}$ is $\delta$-determined by a subgroup $H$ of $(G,+)$ if
$$\bigl| I(W) - \log|G/H| \bigr| < \delta \quad\text{and}\quad \bigl| I(W[H]) - \log|G/H| \bigr| < \delta.$$
We say that $W$ is $\delta$-determined if there exists at least one subgroup $H$ which $\delta$-determines $W$. It is easy to see that if $\delta$ is small enough, there exists at most one subgroup that $\delta$-determines $W$.

Intuitively, if $\delta$ is small and $W$ is $\delta$-determined by a subgroup $H$, then the channel $W$ is almost equivalent to $D_H$: let $X$ be a random variable that is uniformly distributed in $G$ and let $Y$ be the output of $W$ when $X$ is the input. The inequality $|I(X \bmod H; Y) - \log|G/H|| < \delta$ means that $X \bmod H$ can be determined from $Y$ with high probability. The inequality $|I(X;Y) - \log|G/H|| < \delta$ means that there is almost no other information about $X$ which can be determined from $Y$. Due to these two observations, we can (informally) say that if $W$ is $\delta$-determined by $H$, then $W$ is almost equivalent to $D_H$.

B. The Polarization Process

Let $W \in \mathrm{DMC}_{G,\mathcal{Y}}$ be a channel with input alphabet $G$. Define the channels $W^- \in \mathrm{DMC}_{G,\mathcal{Y}^2}$ and $W^+ \in \mathrm{DMC}_{G,\mathcal{Y}^2\times G}$ as follows:
$$W^-(y_1,y_2|u_1) = \frac{1}{|G|} \sum_{u_2\in G} W(y_1|u_1+u_2)\, W(y_2|u_2),$$
and
$$W^+(y_1,y_2,u_1|u_2) = \frac{1}{|G|}\, W(y_1|u_1+u_2)\, W(y_2|u_2).$$
For every $n \geq 1$ and every $s = (s_1,\ldots,s_n) \in \{-,+\}^n$, define $W^s = (((W^{s_1})^{s_2})\cdots)^{s_n}$.

Remark 3. It can be shown that if $W$ and $V$ are equivalent, then $W^-$ (resp. $W^+$) and $V^-$ (resp. $V^+$) are equivalent. This allows us to write $\hat{W}^-$ and $\hat{W}^+$ to denote $\widehat{W^-}$ and $\widehat{W^+}$, respectively.

It was shown in [3], [4] and [5] that as $n$ becomes large, the behavior of almost all the synthetic channels $(W^s)_{s\in\{-,+\}^n}$ approaches the behavior of deterministic homomorphism channels projecting their input onto quotient groups. One way to formalize the above statement was given in [5] as follows: for every $\delta > 0$, we have
$$\lim_{n\to\infty} \frac{1}{2^n} \bigl| \{ s \in \{-,+\}^n : W^s \text{ is } \delta\text{-determined} \} \bigr| = 1. \qquad (1)$$

Definition 1. Let $(B_n)_{n\geq 1}$ be a sequence of independent and uniformly distributed Bernoulli random variables in $\{-,+\}$. Define the channel-valued random process $(W_n)_{n\geq 0}$ as follows:
$$W_0 = W, \qquad W_n = W_{n-1}^{B_n} = W^{(B_1,\ldots,B_n)} \text{ if } n \geq 1.$$

Equation (1) can be rewritten as:
$$\lim_{n\to\infty} \mathbb{P}\bigl[ W_n \text{ is } \delta\text{-determined} \bigr] = 1. \qquad (2)$$
One informal way to interpret Equation (2) is to say that the process $(W_n)_{n\geq 0}$ converges to channels in $\mathrm{DH}_G$. This statement will be made formal in the following section.
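As a concrete illustration of the transforms (a sketch, not the paper's code; the channel over $\mathbb{Z}_4$ and all function names are assumptions of the example), the following computes $W^-$ and $W^+$ for a channel whose input alphabet is the cyclic group $\mathbb{Z}_q$, evaluates the symmetric capacity, and lists $I(W^s)$ for all $s \in \{-,+\}^2$. Over a group of non-prime order such as $\mathbb{Z}_4$, the synthetic capacities need not move only toward $0$ and $\log|G|$, which is the multilevel effect described above.

```python
import numpy as np
from itertools import product

def minus_plus(W, q):
    """Arıkan transforms of a channel over Z_q, with W given as a q x |Y| matrix W[x, y] = W(y|x).
    W^-(y1, y2 | u1)     = (1/q) * sum_{u2} W(y1 | u1 + u2) * W(y2 | u2)
    W^+(y1, y2, u1 | u2) = (1/q) * W(y1 | u1 + u2) * W(y2 | u2)
    """
    ny = W.shape[1]
    Wm = np.zeros((q, ny * ny))
    Wp = np.zeros((q, ny * ny * q))
    for u1, u2, y1, y2 in product(range(q), range(q), range(ny), range(ny)):
        val = W[(u1 + u2) % q, y1] * W[u2, y2] / q
        Wm[u1, y1 * ny + y2] += val
        Wp[u2, (y1 * ny + y2) * q + u1] += val
    return Wm, Wp

def sym_capacity(W):
    """Symmetric capacity I(W) in bits (mutual information with a uniform input)."""
    q = W.shape[0]
    joint = W / q                            # P_{X,Y}
    p_y = joint.sum(axis=0)                  # P_Y
    prod = np.outer(np.full(q, 1.0 / q), p_y)
    mask = joint > 0
    return float((joint[mask] * np.log2(joint[mask] / prod[mask])).sum())

# A channel over Z_4 that reveals the input half of the time, and only its coset
# modulo H = {0, 2} otherwise.
q = 4
W = np.zeros((q, q + 2))
for x in range(q):
    W[x, x] = 0.5              # outputs 0..3: the input itself
    W[x, q + (x % 2)] = 0.5    # outputs 4, 5: the coset of x modulo {0, 2}

print(f"I(W) = {sym_capacity(W):.4f} bits")     # 1.5 bits for this channel
synthetic = {"": W}
for _ in range(2):
    synthetic = {s + b: c for s, Ws in synthetic.items()
                 for b, c in zip("-+", minus_plus(Ws, q))}
for s in sorted(synthetic):
    print(f"I(W^{s}) = {sym_capacity(synthetic[s]):.4f} bits")
# At every step I(W^-) + I(W^+) = 2 I(W); by the results recalled above, as n grows the
# values concentrate near 0, 1, or 2 bits (the multilevel polarization levels for Z_4).
```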
IV. CONVERGENCE OF THE POLARIZATION PROCESS

Throughout this section, we identify a channel $W \in \mathrm{DMC}_{G,\ast}^{(o)}$ with its Blackwell measure $\mathrm{MP}_W \in \mathrm{MP}_{bf}(G)$. We also extend the definition of the $+$ and $-$ operations to all balanced measures in $\mathrm{MP}_b(G)$ (as discussed in Remark 1).

Lemma 1. Let $(W_n)_{n\geq 0}$ be the channel-valued process defined in Definition 1. Almost surely, the sequence $\bigl( |I(W_n^-) - I(W_n)| \bigr)_{n\geq 0}$ converges to zero.

Proof. It is well known that $I(W^-) + I(W^+) = 2 I(W)$ for every channel with input alphabet $G$. Hence, we have
$$\mathbb{E}\bigl[ I(W_{n+1}) \,\big|\, W_n \bigr] = \frac{1}{2} I(W_n^-) + \frac{1}{2} I(W_n^+) = I(W_n).$$
This shows that the process $(I(W_n))_{n\geq 0}$ is a martingale, hence it converges almost surely. This means that the process $\bigl( |I(W_{n+1}) - I(W_n)| \bigr)_{n\geq 0}$ almost surely converges to zero. On the other hand, we have
$$|I(W_{n+1}) - I(W_n)| = \begin{cases} |I(W_n^-) - I(W_n)| & \text{if } B_{n+1} = -, \\ |I(W_n^+) - I(W_n)| & \text{if } B_{n+1} = +, \end{cases} \;\overset{(a)}{=}\; |I(W_n^-) - I(W_n)|,$$
where (a) follows from the fact that $I(W_n^-) + I(W_n^+) = 2 I(W_n)$, which means that $I(W_n) - I(W_n^-) = I(W_n^+) - I(W_n)$.
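For intuition about the martingale argument, here is a small Monte Carlo sketch of the capacity process in the simplest case, a binary erasure channel over $G = \mathbb{Z}_2$. For the BEC the transforms have classical closed forms ($W^-$ is a BEC with erasure probability $2\varepsilon - \varepsilon^2$ and $W^+$ a BEC with erasure probability $\varepsilon^2$); these recursions are standard facts about the BEC and are not taken from this paper, and all names in the sketch are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def polarize_bec(eps0, n_steps):
    """One sample path of Arıkan's process for a BEC(eps0) over Z_2.
    The BEC is stable under both transforms: W^- is a BEC(2e - e^2), W^+ is a BEC(e^2),
    and I(W) = 1 - e bits, so the path of capacities can be tracked in closed form."""
    eps = eps0
    path = [1 - eps]
    for _ in range(n_steps):
        if rng.random() < 0.5:          # B_{n+1} = '-'
            eps = 2 * eps - eps ** 2
        else:                           # B_{n+1} = '+'
            eps = eps ** 2
        path.append(1 - eps)
    return np.array(path)

paths = np.array([polarize_bec(0.4, 20) for _ in range(2000)])
print("E[I(W_n)] for n = 0, 5, 10, 20:", paths[:, [0, 5, 10, 20]].mean(axis=0).round(3))
print("P[I(W_20) within 0.01 of {0, 1}]:",
      ((paths[:, 20] < 0.01) | (paths[:, 20] > 0.99)).mean())
```

The sample mean of $I(W_n)$ stays near $I(W_0) = 0.6$ (the martingale property), while individual paths drift toward $0$ or $1$; since $|G| = 2$ is prime, this is the two-level case of the polarization described above.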

Now define the set
$$\mathrm{POL}_G = \bigl\{ \mathrm{MP} \in \mathrm{MP}_b(G) : I(\mathrm{MP}^-) = I(\mathrm{MP}) \bigr\}.$$
The next lemma shows that the Blackwell measures of channels with a small $|I(W^-) - I(W)|$ are close (in the noisiness metric sense) to measures in $\mathrm{POL}_G$.

Lemma 2. For every $\epsilon > 0$, there exists $\delta > 0$ such that for every $\mathrm{MP} \in \mathrm{MP}_b(G)$, if $|I(\mathrm{MP}^-) - I(\mathrm{MP})| < \delta$, then $d_{G,\ast}^{(o)}(\mathrm{MP}, \mathrm{POL}_G) < \epsilon$.

Proof. Define the function $f : \mathrm{MP}_b(G) \to \mathbb{R}_+$ as $f(\mathrm{MP}) = |I(\mathrm{MP}^-) - I(\mathrm{MP})|$. Since the symmetric capacity and the $-$ transformation are continuous in the noisiness/weak-* topology (see [13]), the function $f$ is also continuous in the same topology. Since $\mathrm{POL}_G = f^{-1}(\{0\})$ and since $f$ is continuous, the set $\mathrm{POL}_G$ is closed.

Now for every $\epsilon > 0$, define the set
$$\mathrm{POL}_{G,\epsilon} = \bigl\{ \mathrm{MP} \in \mathrm{MP}_b(G) : d_{G,\ast}^{(o)}(\mathrm{MP}, \mathrm{POL}_G) < \epsilon \bigr\},$$
and let
$$\delta = \inf f(\mathrm{POL}_{G,\epsilon}^c) = \inf \bigl\{ f(\mathrm{MP}) : \mathrm{MP} \in \mathrm{MP}_b(G) \text{ and } d_{G,\ast}^{(o)}(\mathrm{MP}, \mathrm{POL}_G) \geq \epsilon \bigr\}.$$
Since the set $\mathrm{POL}_G$ is closed, we can see that the set $\mathrm{POL}_{G,\epsilon}^c$ is closed as well. Furthermore, since the space $\mathrm{MP}_b(G)$ is compact (see, e.g., [8]), the set $\mathrm{POL}_{G,\epsilon}^c$ is compact as well. Therefore, the set $f(\mathrm{POL}_{G,\epsilon}^c)$ is compact in $\mathbb{R}_+$, which means that its infimum is achieved, i.e., there exists $\mathrm{MP}_\epsilon \in \mathrm{POL}_{G,\epsilon}^c$ such that $\delta = f(\mathrm{MP}_\epsilon)$. But $\mathrm{POL}_{G,\epsilon}^c \cap \mathrm{POL}_G = \emptyset$ and $\mathrm{POL}_G = f^{-1}(\{0\})$, so we must have $\delta = f(\mathrm{MP}_\epsilon) > 0$. From the definition of $\delta$, we have
$$d_{G,\ast}^{(o)}(\mathrm{MP}, \mathrm{POL}_G) \geq \epsilon \;\Longrightarrow\; f(\mathrm{MP}) \geq \delta.$$
Hence, by contraposition, we have
$$f(\mathrm{MP}) < \delta \;\Longrightarrow\; d_{G,\ast}^{(o)}(\mathrm{MP}, \mathrm{POL}_G) < \epsilon.$$

In the rest of this section, we analyze the balanced meta-probability measures that are in $\mathrm{POL}_G$ (i.e., those that satisfy $I(\mathrm{MP}^-) = I(\mathrm{MP})$). For every $p \in \Delta_G$, we denote the entropy of $p$ as $H(p)$. For every $p, q \in \Delta_G$, define $p \circledast q$ as follows:
$$(p \circledast q)(u_1) = \sum_{u_2\in G} p(u_1+u_2)\, q(u_2).$$

Lemma 3. For every $\mathrm{MP} \in \mathrm{MP}_b(G)$, we have
$$|I(\mathrm{MP}^-) - I(\mathrm{MP})| = \int\!\!\int \bigl( H(p \circledast q) - H(p) \bigr)\, d\mathrm{MP}(p)\, d\mathrm{MP}(q).$$

Proof. It is sufficient to show this for Blackwell measures of DMCs (because we can then extend the equation to $\mathrm{MP}_b(G)$ by continuity). Let $W$ be a DMC with input alphabet $G$. We have $I(W^-) \leq I(W)$, so $|I(\mathrm{MP}_W^-) - I(\mathrm{MP}_W)| = I(W) - I(W^-)$. From [13, Proposition 8], we have
$$I(W) = \log|G| - \int H(p)\, d\mathrm{MP}_W(p) = \log|G| - \int\!\!\int H(p)\, d\mathrm{MP}_W(p)\, d\mathrm{MP}_W(q).$$
Similarly,
$$\begin{aligned}
I(W^-) &= \log|G| - \int H(p)\, d\mathrm{MP}_{W^-}(p) \\
&\overset{(a)}{=} \log|G| - \int H(p)\, d(\mathrm{MP}_W, \mathrm{MP}_W)^-(p) \\
&\overset{(b)}{=} \log|G| - \int H(p)\, d\bigl( C^{-,+}_{\#}(\mathrm{MP}_W \times \mathrm{MP}_W) \bigr)(p) \\
&\overset{(c)}{=} \log|G| - \int H\bigl( C^{-,+}(p,q) \bigr)\, d(\mathrm{MP}_W \times \mathrm{MP}_W)(p,q) \\
&\overset{(d)}{=} \log|G| - \int\!\!\int H(p \circledast q)\, d\mathrm{MP}_W(p)\, d\mathrm{MP}_W(q),
\end{aligned}$$
where (a) follows from [13, Proposition 10], (b) follows from the definition of the $(-,+)$-convolution (see Page 20 of [13]), (c) follows from the properties of the push-forward probability measure, and (d) follows from the definition of the $C^{-,+}$ map (see Page 20 of [13]) and Fubini's theorem. The lemma now follows from the fact that $|I(W^-) - I(W)| = I(W) - I(W^-)$.

For every $p \in \Delta_G$ and every $u \in G$, define $p^{+u}$ as $p^{+u}(x) = p(x+u)$. Let $p, q \in \Delta_G$. We have
$$p \circledast q = \sum_{u_2\in G} q(u_2)\, p^{+u_2}.$$
Due to the strict concavity of entropy, we have $H(p \circledast q) \geq H(p)$. Moreover, we have
$$\begin{aligned}
H(p \circledast q) = H(p) \;&\Longleftrightarrow\; p^{+u_2} = p^{+u_2'} \text{ for all } u_2, u_2' \in \mathrm{supp}(q) \\
&\Longleftrightarrow\; p^{+u} = p \text{ for all } u \in \mathrm{supp}(q) - \mathrm{supp}(q) \\
&\Longleftrightarrow\; p^{+u} = p \text{ for all } u \in \langle \mathrm{supp}(q) - \mathrm{supp}(q) \rangle,
\end{aligned}$$
where $\mathrm{supp}(q) - \mathrm{supp}(q) = \{u_2 - u_2' : u_2, u_2' \in \mathrm{supp}(q)\}$, and $\langle \mathrm{supp}(q) - \mathrm{supp}(q) \rangle$ is the subgroup of $G$ generated by $\mathrm{supp}(q) - \mathrm{supp}(q)$.
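The convolution $\circledast$ and the entropy inequality behind these equivalences are easy to check numerically. The sketch below (illustrative only; the distributions and helper names are arbitrary assumptions) computes $p \circledast q$ over $\mathbb{Z}_4$, verifies $H(p \circledast q) \geq H(p)$, and exhibits the equality case with $p$ uniform on the subgroup $H = \{0, 2\}$ and $\mathrm{supp}(q) \subseteq H$, so that $p^{+u} = p$ for every $u$ in the subgroup generated by $\mathrm{supp}(q) - \mathrm{supp}(q)$.

```python
import numpy as np

def convolve(p, q):
    """(p ⊛ q)(u1) = sum_{u2} p(u1 + u2) q(u2) over Z_n (indices taken mod n)."""
    n = len(p)
    return np.array([sum(p[(u1 + u2) % n] * q[u2] for u2 in range(n)) for u1 in range(n)])

def entropy(p):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(2)

# Generic case: strict inequality H(p ⊛ q) > H(p) is typical.
p = rng.dirichlet(np.ones(4))
q = rng.dirichlet(np.ones(4))
print(entropy(convolve(p, q)) >= entropy(p) - 1e-12)    # True

# Equality case: p uniform on the subgroup H = {0, 2} of Z_4 and supp(q) contained in H,
# so p^{+u} = p for every u in the subgroup generated by supp(q) - supp(q).
p = np.array([0.5, 0.0, 0.5, 0.0])
q = np.array([0.3, 0.0, 0.7, 0.0])
print(np.isclose(entropy(convolve(p, q)), entropy(p)))  # True (in fact p ⊛ q = p here)
```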

Lemma 4. Let $\mathrm{MP} \in \mathrm{MP}_b(G)$. We have $I(\mathrm{MP}^-) = I(\mathrm{MP})$ if and only if for every $p, q \in \mathrm{supp}(\mathrm{MP})$ we have $p = p^{+u}$ for every $u \in \langle \mathrm{supp}(q) - \mathrm{supp}(q) \rangle$.

Proof. Define the function $F : \Delta_G \times \Delta_G \to \mathbb{R}_+$ as $F(p,q) = H(p \circledast q) - H(p)$. Since $F$ is continuous and nonnegative, the integral
$$\int\!\!\int F(p,q)\, d\mathrm{MP}(p)\, d\mathrm{MP}(q) \qquad (3)$$
is equal to zero if and only if the function $F$ is equal to zero on $\mathrm{supp}(\mathrm{MP}) \times \mathrm{supp}(\mathrm{MP})$. The lemma now follows from Lemma 3 and Equation (3).

Lemma 5. $\mathrm{POL}_G = \{\mathrm{MP}_D : D \in \mathrm{DH}_G\}$.

Proof. Let $H$ be a subgroup of $G$. It is easy to see that
$$\mathrm{MP}_{D_H} = \frac{1}{|G/H|} \sum_{A\in G/H} \delta_{\pi_A},$$
where $\delta_{\pi_A}$ is a Dirac measure centered at $\pi_A$ (the uniform distribution on $A$). It is easy to check that $\mathrm{MP}_{D_H}$ satisfies the condition of Lemma 4, hence $I(\mathrm{MP}_{D_H}^-) = I(\mathrm{MP}_{D_H})$, and so $\{\mathrm{MP}_D : D \in \mathrm{DH}_G\} \subseteq \mathrm{POL}_G$.

Now suppose that $\mathrm{MP} \in \mathrm{POL}_G$. For every $p \in \mathrm{supp}(\mathrm{MP})$, define $A_p = \mathrm{supp}(p)$ and $H_p = \langle A_p - A_p \rangle$. Lemma 4 shows that $p = p^{+u}$ for every $u \in H_p$. Let $x, x' \in A_p$. We have
$$p(x') = p(x + (x'-x)) = p^{+(x'-x)}(x) \overset{(a)}{=} p(x),$$
where (a) follows from the fact that $x' - x \in H_p$. This shows that $p$ is the uniform distribution on $A_p$. Moreover, for every $u \in H_p$, we have $p(x+u) = p^{+u}(x) = p(x) > 0$. This implies that $A_p = x + H_p$, which means that the support of $p$ is a coset of the subgroup $H_p$.

Now let $p, q \in \mathrm{supp}(\mathrm{MP})$. Let $x \in A_p$ and $u \in H_q$. Lemma 4 implies that $p(x+u) = p^{+u}(x) = p(x) > 0$, hence $u = (x+u) - x \in H_p$. This shows that $H_q \subseteq H_p$. Similarly, we can show that $H_p \subseteq H_q$. Therefore, $H_p = H_q$ for every $p, q \in \mathrm{supp}(\mathrm{MP})$. This means that the support of $\mathrm{MP}$ consists of uniform distributions over cosets of the same subgroup of $G$. Let $H$ be this subgroup. The above discussion shows that
$$\mathrm{MP} = \sum_{A\in G/H} \alpha_A\, \delta_{\pi_A},$$
for some distribution $(\alpha_A : A \in G/H)$ over the quotient group $G/H$. Fix $A \in G/H$ and let $x \in A$. We have
$$\frac{1}{|G|} = \pi_G(x) \overset{(a)}{=} \int p(x)\, d\mathrm{MP}(p) = \sum_{B\in G/H} \alpha_B\, \pi_B(x) = \frac{\alpha_A}{|A|} = \frac{\alpha_A}{|H|},$$
where (a) follows from the fact that $\mathrm{MP}$ is balanced. Hence $\alpha_A = \frac{|H|}{|G|} = \frac{1}{|G/H|}$, and so $\mathrm{MP} = \frac{1}{|G/H|} \sum_{A\in G/H} \delta_{\pi_A}$. This means that $\mathrm{MP} = \mathrm{MP}_{D_H}$, thus $\mathrm{MP} \in \{\mathrm{MP}_D : D \in \mathrm{DH}_G\}$, and so $\mathrm{POL}_G \subseteq \{\mathrm{MP}_D : D \in \mathrm{DH}_G\}$. We conclude that $\mathrm{POL}_G = \{\mathrm{MP}_D : D \in \mathrm{DH}_G\}$.

Theorem 1. Let $(W_n)_{n\geq 0}$ be the channel-valued process defined in Definition 1. Almost surely, there exists a subgroup $H$ of $G$ such that the sequence $(\hat{W}_n)_{n\geq 0}$ converges to $\hat{D}_H$ in the noisiness/weak-* topology.

Proof. Lemma 1 shows that almost surely, the sequence $\bigl( |I(W_n^-) - I(W_n)| \bigr)_{n\geq 0}$ converges to zero. Let $(W_n)_{n\geq 0}$ be a sample of the process for which the sequence $\bigl( |I(W_n^-) - I(W_n)| \bigr)_{n\geq 0}$ converges to zero. Lemma 2 implies that $\bigl( d_{G,\ast}^{(o)}(\mathrm{MP}_{W_n}, \mathrm{POL}_G) \bigr)_{n\geq 0}$ converges to zero. Now since $\mathrm{POL}_G$ is finite (see Lemma 5), the sequence $(\mathrm{MP}_{W_n})_{n\geq 0}$ converges to an element in $\mathrm{POL}_G$. Lemma 5 now implies that there exists a subgroup $H$ of $G$ such that the sequence $(\hat{W}_n)_{n\geq 0}$ converges to $\hat{D}_H$ in the noisiness/weak-* topology.

Corollary 1. For any channel functional $F : \mathrm{DMC}_{G,\ast} \to \mathbb{R}$ which is invariant under channel equivalence, and which is continuous in the noisiness/weak-* topology, the process $\bigl( F(W_n) \bigr)_{n\geq 0}$ almost surely converges. More precisely, $F(W_n)$ converges to $F(D_H)$ if $\hat{W}_n$ converges to $\hat{D}_H$.

V. DISCUSSION

Our proof can be used (verbatim) to show the almost sure convergence of the polarization process associated to any channel whose equivalence class is determined by the Blackwell measure. This family of channels is large and contains almost any sensible channel we can think of [12].

ACKNOWLEDGMENT

I would like to thank Emre Telatar and Maxim Raginsky for helpful discussions.

REFERENCES

[1] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Transactions on Information Theory, vol. 55, no. 7, pp. 3051-3073, 2009.
[2] E. Şaşoğlu, E. Telatar, and E. Arıkan, "Polarization for arbitrary discrete memoryless channels," in IEEE Information Theory Workshop (ITW), 2009, pp. 144-148.
[3] W. Park and A. Barg, "Polar codes for q-ary channels," IEEE Transactions on Information Theory, vol. 59, no. 2, pp. 955-969, 2013.
[4] A. G. Sahebi and S. S. Pradhan, "Multilevel channel polarization for arbitrary discrete memoryless channels," IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 7839-7857, Dec. 2013.
[5] R. Nasser and E. Telatar, "Polar codes for arbitrary DMCs and arbitrary MACs," IEEE Transactions on Information Theory, vol. 62, no. 6, pp. 2917-2936, June 2016.
[6] R. Nasser, "An ergodic theory of binary operations, part I: Key properties," IEEE Transactions on Information Theory, vol. 62, no. 12, pp. 6931-6952, Dec. 2016.
[7] R. Nasser, "An ergodic theory of binary operations, part II: Applications to polarization," IEEE Transactions on Information Theory, vol. 63, no. 2, pp. 1063-1083, Feb. 2017.
[8] R. Nasser, "Topological structures on DMC spaces," Entropy, vol. 20, no. 5, 2018. [Online]. Available: http://www.mdpi.com/1099-4300/20/5/343
[9] M. Raginsky, "Channel polarization and Blackwell measures," in 2016 IEEE International Symposium on Information Theory (ISIT), July 2016, pp. 56-60.
[10] N. Goela and M. Raginsky, "Channel polarization through the lens of Blackwell measures," arXiv:1809.05073, September 2018.
[11] F. Buscemi, "Degradable channels, less noisy channels, and quantum statistical morphisms: An equivalence relation," Problems of Information Transmission, vol. 52, no. 3, pp. 201-213, Jul. 2016.
[12] E. Torgersen, Comparison of Statistical Experiments, ser. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 1991.
[13] R. Nasser, "Continuity of channel parameters and operations under various DMC topologies," Entropy, vol. 20, no. 5, 2018. [Online]. Available: http://www.mdpi.com/1099-4300/20/5/330