Optimal Encoding Schemes for Several Classes of Discrete Degraded Broadcast Channels

Bike Xie, Student Member, IEEE, and Richard D. Wesel, Senior Member, IEEE

arXiv:0811.4162v4 [cs.IT] 8 May 2009

Abstract

Let $X \rightarrow Y \rightarrow Z$ be a discrete memoryless degraded broadcast channel (DBC) with marginal transition probability matrices $T_{YX}$ and $T_{ZX}$. This paper explores encoding schemes for DBCs. We call a DBC encoder a natural encoding (NE) scheme when symbols from independent codebooks, each using the same alphabet as $X$, are combined using the same single-letter function that adds distortion to the channel. This paper shows that NE schemes achieve the boundary of the capacity region for the multi-user broadcast Z channel, the two-user group-additive DBC, and the two-user discrete multiplicative DBC. This paper also defines and studies the input-symmetric DBC, introduces a permutation encoding approach for the input-symmetric DBC, and proves its optimality.

Denote by $q$ the distribution of the channel input $X$. Inspired by Witsenhausen and Wyner, this paper defines and studies the function $F^*$. For any given $q$, and any $H(Y|X) \le s \le H(Y)$, where $H(Y|X)$ is the conditional entropy of $Y$ given $X$ and $H(Y)$ is the entropy of $Y$, the function $F^*_{T_{YX},T_{ZX}}(q, s)$ is defined as the infimum of $H(Z|U)$, the conditional entropy of $Z$ given $U$, over all discrete random variables $U$ such that a) $H(Y|U) = s$, and b) $U$ and $(Y, Z)$ are conditionally independent given $X$. This paper studies the function $F^*$, its properties, and its calculation. It then represents the capacity region of the DBC $X \rightarrow Y \rightarrow Z$ in terms of $F^*_{T_{YX},T_{ZX}}$. Finally, these results are applied to the classes of DBCs and their encoders discussed above.

Keywords: Degraded broadcast channel, natural encoding, broadcast Z channel, input-symmetric, group-additive degraded broadcast channel, discrete multiplicative degraded broadcast channel.

I. Introduction

In the 1970s, Cover [1], Bergmans [2], and Gallager [3] established the capacity region for degraded broadcast channels (DBCs). A common optimal transmission strategy that achieves the boundary of the capacity region for degraded broadcast channels is the joint encoding scheme presented in [4]. Specifically, the data sent to the user with the most degraded channel is encoded first. Given the codeword selected for that user, an appropriate codebook for the user with the second most degraded channel is selected, and so forth.

This work was supported by the Defense Advanced Research Projects Agency, SPAWAR Systems Center, San Diego, California, under Grant N66001-02-1-8938. The authors are with the Electrical Engineering Department, University of California, Los Angeles, CA 90095 USA (e-mail: xbk@ee.ucla.edu; wesel@ee.ucla.edu).

An independent-encoding scheme can also achieve the capacity of any DBC, as described in Appendix I [5]. This scheme essentially embeds all symbols from all the needed codebooks for the less-degraded user into a single super-symbol (with a perhaps large alphabet). A single-letter function then uses the input symbol from the more-degraded user to extract the needed symbol from the super-symbol provided by the less-degraded user.

Cover [6] introduced an independent-encoding scheme for two-user broadcast channels. When applied to two-user degraded broadcast channels, this scheme independently encodes the users' messages and then combines the resulting codewords by applying a single-letter function. This scheme does not specify which codebooks or which single-letter function to use. It is a general independent-encoding approach, which includes the independent-encoding scheme of Appendix I.

A simple encoding scheme that is optimal for some common DBCs is an independent-encoding approach in which symbols from independent codebooks, each with the same alphabet as $X$, are combined using the same single-letter function that adds distortion to the channel. We refer to this encoding scheme as the natural encoding (NE) scheme. As an example, the NE scheme for a two-receiver broadcast Gaussian channel transmits, as each symbol, the real addition of two real symbols from independent codebooks. The NE scheme is known to achieve the boundary of the capacity region for several broadcast channels including broadcast Gaussian channels [7], broadcast binary-symmetric channels [2] [8] [9] [10], discrete additive degraded broadcast channels [11], and two-user broadcast Z channels [12] [13]. This paper shows that NE schemes also achieve the boundary of the capacity region for the multi-user broadcast Z channel, the two-user group-additive DBC, and the two-user discrete multiplicative DBC.

Shannon's entropy power inequality (EPI) [14] gives a lower bound on the differential entropy of the sum of independent random variables. In his remarkable paper [7], Bergmans applied the EPI to establish a converse showing the optimality of the scheme of [1] [2] (the NE scheme) for broadcast Gaussian channels. Mrs. Gerber's Lemma [15] provides a lower bound on the entropy of a sequence of binary-symmetric channel outputs. Wyner and Ziv obtained Mrs. Gerber's Lemma and applied it to establish a converse showing that the NE scheme for broadcast binary-symmetric channels suggested by Cover [1] and Bergmans [2] achieves the boundary of the capacity region [8]. The EPI and Mrs. Gerber's Lemma play the same significant role in proving the optimality of the NE schemes for broadcast Gaussian channels and broadcast binary-symmetric channels respectively.

Witsenhausen and Wyner studied a conditional entropy bound for the output of a discrete channel and applied the results to establish an outer bound on the capacity region of DBCs [9] [10]. For broadcast binary-symmetric channels, this outer bound coincides with the capacity region. This paper extends ideas from Witsenhausen and Wyner [10] to study a conditional entropy bound for the channel output of a discrete DBC, and represents the capacity region of discrete DBCs in terms of this conditional entropy bound. This paper also simplifies the expression of the conditional entropy bound for broadcast binary-symmetric channels and broadcast Z channels. For broadcast Z channels, the simplified expression demonstrates that the NE scheme provided in [12], which is optimal for two-user broadcast Z channels, is also optimal for multi-user broadcast Z channels.

Witsenhausen and Wyner made two seminal contributions in [9] and [10]: the notion of minimizing one entropy under the constraint that another related entropy is fixed, and the use of input symmetry as a way of solving an entire class of channels with a single unifying approach. Benzel [11] used the first idea to study discrete additive degraded broadcast channels. Recently, Liu and Ulukus [16] [17] used both ideas together to extend Benzel's results to the larger class of discrete degraded interference channels (DDICs). Our paper defines what it means for a degraded broadcast channel to be input-symmetric (IS) and provides an independent-encoding scheme which achieves the capacity region of all input-symmetric DBCs.
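To make the NE operation concrete, here is a minimal sketch (Python; the block length and the symbol streams are placeholder random draws, not actual codebook designs) of how each transmitted symbol would be formed for three of the channels mentioned above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # illustrative block length

# Natural encoding combines per-user codeword symbols with the same
# single-letter operation by which the channel adds distortion.

# Broadcast Gaussian channel: channel noise is additive over the reals,
# so NE transmits the real sum of the two users' symbols.
x_gauss = rng.normal(size=n) + rng.normal(size=n)

# Broadcast binary-symmetric channel: noise is mod-2 additive,
# so NE transmits the XOR of the two binary codewords.
b1, b2 = rng.integers(0, 2, n), rng.integers(0, 2, n)
x_bsc = b1 ^ b2

# Broadcast Z channel: the output is the OR of input and noise,
# so NE transmits the OR of the two binary codewords.
x_z = b1 | b2
print(x_gauss, x_bsc, x_z, sep="\n")
```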

The input-symmetric channel was introduced by Witsenhausen and Wyner [10] and studied in [16] [17] and [18]. This paper extends the definition of the input-symmetric channel to a definition of the input-symmetric DBC, and introduces an independent-encoding scheme employing permutation functions of independently encoded streams (the permutation encoding approach) for the input-symmetric DBC, proving its optimality. The discrete additive DBC [11] is a special case of the input-symmetric DBC, and the optimal encoding approach for the discrete additive DBC [11] is also a special case of the permutation encoding approach. The group-additive DBC is a class of input-symmetric DBCs whose channel outputs are group additions of the channel input and noise. The permutation encoding approach for the group-additive DBC is the group-addition encoding approach, which is the NE scheme for the group-additive DBC. The discrete multiplicative DBC is a discrete DBC whose channel outputs are discrete multiplications (multiplications in a finite field) of the channel input and noise. This paper studies the conditional entropy bound for the discrete multiplicative DBC and proves that the NE scheme achieves the boundary of the capacity region for discrete multiplicative DBCs.

This paper is organized as follows: Section II defines the conditional entropy bound $F^*(\cdot)$ for the channel output of a discrete DBC and represents the capacity region of the discrete DBC in terms of the function $F^*$. Section III establishes a number of theorems concerning various properties of $F^*$. Section IV evaluates $F^*(\cdot)$ and identifies the optimal transmission strategy for the discrete DBC. Section V proves the optimality of the NE scheme for the multi-user broadcast Z channel. Section VI introduces the input-symmetric DBC, provides the permutation encoding approach, an independent-encoding scheme for the input-symmetric DBC, and proves its optimality. Section VII studies the discrete multiplicative DBC and shows that the NE scheme achieves the boundary of its capacity region. Section VIII delivers the conclusions.

II. The Conditional Entropy Bound $F^*(\cdot)$

Let $X \rightarrow Y \rightarrow Z$ be a discrete memoryless DBC where $X \in \{1, 2, \ldots, k\}$, $Y \in \{1, 2, \ldots, n\}$, and $Z \in \{1, 2, \ldots, m\}$. Let $T_{YX}$ be an $n \times k$ stochastic matrix with entries $T_{YX}(j, i) = \Pr(Y = j \mid X = i)$, and let $T_{ZX}$ be an $m \times k$ stochastic matrix with entries $T_{ZX}(j, i) = \Pr(Z = j \mid X = i)$. Thus, $T_{YX}$ and $T_{ZX}$ are the marginal transition probability matrices of the degraded broadcast channel.

Definition 1: Let the vector $q$ in the simplex $\Delta_k$ of probability $k$-vectors be the distribution of the channel input $X$. Define the function $F^*_{T_{YX},T_{ZX}}(q, s)$ as the infimum of $H(Z|U)$, the conditional entropy of $Z$ given $U$, with respect to all discrete random variables $U$ such that

a) $H(Y|U) = s$;
b) $U$ and $(Y, Z)$ are conditionally independent given $X$, i.e., the sequence $U, X, Y, Z$ forms a Markov chain $U \rightarrow X \rightarrow Y \rightarrow Z$.

For any fixed vector $q$, the domain of $F^*_{T_{YX},T_{ZX}}(q, s)$ in $s$ is the closed interval $[H(Y|X), H(Y)]$, where $H(Y|X)$ is the conditional entropy of $Y$ given $X$ and $H(Y)$ is the entropy of $Y$. This will be proved later in Lemma 3. The function $F^*(\cdot)$ is an extension of the function $F(\cdot)$ introduced in [10]. We will use $F^*_{T_{YX},T_{ZX}}(q, s)$, $F^*(q, s)$ and $F^*(s)$ interchangeably.

Theorem 1: $F^*_{T_{YX},T_{ZX}}(q, s)$ is monotonically nondecreasing in $s$, and the infimum in its definition is a minimum. Hence, $F^*_{T_{YX},T_{ZX}}(q, s)$ can be taken as the minimum of $H(Z|U)$ with respect to all discrete random variables $U$ such that

a) $H(Y|U) \le s$;
b) $U$ and $(Y, Z)$ are conditionally independent given $X$.

The proof of Theorem 1 will be given in Section III.

Theorem 2: The capacity region for the discrete memoryless degraded broadcast channel $X \rightarrow Y \rightarrow Z$ is the closure of the convex hull of all rate pairs $(R_1, R_2)$ satisfying

$0 \le R_1 \le I(X; Y)$,   (1)
$R_2 \le H(Z) - F^*_{T_{YX},T_{ZX}}(q, R_1 + H(Y|X))$,   (2)

for some $q \in \Delta_k$, where $I(X; Y)$ is the mutual information between $X$ and $Y$, $H(Y|X)$ is the conditional entropy of $Y$ given $X$, and $H(Z)$ is the entropy of $Z$, all resulting from the channel input distribution $q$. Thus, for a fixed input distribution $q$ and for $\lambda \ge 0$, finding the maximum of $R_2 + \lambda R_1$ is equivalent to finding the minimum of $F^*(q, s) - \lambda s$ as follows:

$\max(R_2 + \lambda R_1) = \max\big(H(Z) - F^*(q, R_1 + H(Y|X)) + \lambda R_1 + \lambda H(Y|X) - \lambda H(Y|X)\big)$
$= H(Z) - \lambda H(Y|X) + \max\big(-F^*(q, R_1 + H(Y|X)) + \lambda(R_1 + H(Y|X))\big)$
$= H(Z) - \lambda H(Y|X) - \min\big(F^*(q, s) - \lambda s\big)$.   (3)

Proof: The capacity region for the DBC is known from [1] [3] [4] to be

$\overline{\mathrm{co}} \bigcup_{p(u), p(x|u)} \{(R_1, R_2) : R_1 \le I(X; Y|U),\ R_2 \le I(U; Z)\}$,   (4)

where $\overline{\mathrm{co}}$ denotes the closure of the convex hull operation, and $U$ is the auxiliary random variable, which satisfies the Markov chain $U \rightarrow X \rightarrow Y \rightarrow Z$ and $|U| \le \min(|X|, |Y|, |Z|)$.

Rewriting (4), we have

$\overline{\mathrm{co}} \bigcup_{p(u),p(x|u)} \{(R_1, R_2) : R_1 \le I(X; Y|U),\ R_2 \le I(U; Z)\}$
$= \overline{\mathrm{co}} \bigcup_{q \in \Delta_k} \bigcup_{p(u,x)\,:\,p_X = q} \{(R_1, R_2) : R_1 \le I(X; Y|U),\ R_2 \le I(U; Z)\}$   (5)
$= \overline{\mathrm{co}} \bigcup_{q \in \Delta_k} \bigcup_{p(u,x)\,:\,p_X = q} \{(R_1, R_2) : R_1 \le H(Y|U) - H(Y|X),\ R_2 \le H(Z) - H(Z|U)\}$   (6)
$= \overline{\mathrm{co}} \bigcup_{q \in \Delta_k} \bigcup_{H(Y|X) \le s \le H(Y)} \{(R_1, R_2) : R_1 \le s - H(Y|X),\ R_2 \le H(Z) - F^*_{T_{YX},T_{ZX}}(q, s)\}$   (7)
$= \overline{\mathrm{co}} \bigcup_{q \in \Delta_k} \{(R_1, R_2) : 0 \le R_1 \le I(X; Y),\ R_2 \le H(Z) - F^*_{T_{YX},T_{ZX}}(q, R_1 + H(Y|X))\}$,   (8)

where $p_X$ is the vector expression of the distribution of the channel input $X$. Some of these steps are justified as follows: (5) follows from the equivalence of $\bigcup_{p(u),p(x|u)}$ and $\bigcup_{q \in \Delta_k} \bigcup_{p(u,x):p_X=q}$; (7) follows from the definition of the conditional entropy bound $F^*$; (8) follows from the nondecreasing property of $F^*(s)$ in Theorem 1, which allows the substitution $s = R_1 + H(Y|X)$ in the argument of $F^*$. Q.E.D.

Note that for a fixed distribution $p_X = q$ of the channel input $X$, the terms $I(X; Y)$, $H(Z)$ and $H(Y|X)$ in (8) are constants. This theorem provides the relationship between the capacity region and the conditional entropy bound $F^*$ for a discrete degraded broadcast channel. It also motivates the further study of $F^*$.

III. Some Properties of $F^*(\cdot)$

In this section, we extend ideas from [10] to establish several properties of the conditional entropy bound $F^*(\cdot)$. In [10], Witsenhausen and Wyner defined a conditional entropy bound $F(\cdot)$ for a pair of discrete random variables and provided some of its properties. The definition of $F(\cdot)$ is restated here. Let $X \rightarrow Z$ be a discrete memoryless channel with the $m \times k$ transition probability matrix $T$, where the entries $T(j, i) = \Pr(Z = j \mid X = i)$. Let $q$ be the distribution of $X$. For any $q \in \Delta_k$ and $0 \le s \le H(X)$, the function $F_T(q, s)$ is the infimum of $H(Z|U)$ with respect to all discrete random variables $U$ such that $H(X|U) = s$ and the sequence $U, X, Z$ is a Markov chain. By definition, $F_T(q, s) = F^*_{I,T}(q, s)$, where $I$ is an identity matrix. Since $F^*(\cdot)$ is the extension of $F(\cdot)$, most of the properties of $F^*(\cdot)$ in this section are generalizations of properties of $F(\cdot)$ in [10].

For any choice of the integer $l \ge 1$, $w = [w_1, \ldots, w_l]^T \in \Delta_l$ and $p_j \in \Delta_k$ for $j = 1, \ldots, l$, let $U$ be an $l$-ary random variable with distribution $w$, and let $T_{XU} = [p_1 \cdots p_l]$ be the transition probability matrix from $U$ to $X$. We can compute

$p = p_X = T_{XU} w = \sum_{j=1}^{l} w_j p_j$,   (9)
$\xi = H(Y|U) = \sum_{j=1}^{l} w_j h_n(T_{YX} p_j)$,   (10)
$\eta = H(Z|U) = \sum_{j=1}^{l} w_j h_m(T_{ZX} p_j)$,   (11)

where $h_n : \Delta_n \rightarrow \mathbb{R}$ is the entropy function, i.e., $h_n(p_1, \ldots, p_n) = -\sum_i p_i \ln p_i$. Thus the choices of $U$ satisfying conditions a) and b) in the definition of $F^*_{T_{YX},T_{ZX}}(q, s)$ correspond to the choices of $l$, $w$ and $p_j$ for which (9) and (10) yield $p = q$ and $\xi = s$.

Let $S = \{(p, h_n(T_{YX} p), h_m(T_{ZX} p)) \in \Delta_k \times [0, \ln n] \times [0, \ln m] \mid p \in \Delta_k\}$. Since $\Delta_k$ is $(k-1)$-dimensional, $\Delta_k \times [0, \ln n] \times [0, \ln m]$ is a $(k+1)$-dimensional convex polytope. The mapping $p \mapsto (p, h_n(T_{YX} p), h_m(T_{ZX} p))$ assigns a point in $S$ to each $p \in \Delta_k$. Because this mapping is continuous and the domain of the mapping, $\Delta_k$, is compact and connected, the image $S$ is also compact and connected. Let $C$ be the set of all $(p, \xi, \eta)$ satisfying (9), (10) and (11) for some choice of $l$, $w$ and $p_j$.

Lemma 1: $C$ is the convex hull of $S$, and thus $C$ is compact, connected, and convex.

Lemma 2: i) Every point of $C$ can be obtained by (9), (10) and (11) with $l \le k + 1$. In other words, one only needs to consider random variables $U$ taking at most $k + 1$ values. ii) Every extreme point of the intersection of $C$ with a two-dimensional plane can be obtained with $l \le k$.

The proof of Lemma 2 is the same as the proof of a similar lemma for $F(\cdot)$ in [10]. The details of the proof are given in Appendix II.

Let $\tilde{C} = \{(\xi, \eta) \mid \exists q : (q, \xi, \eta) \in C\}$ be the projection of the set $C$ onto the $(\xi, \eta)$-plane. Let $C^*_q = \{(\xi, \eta) \mid (q, \xi, \eta) \in C\}$ be the projection onto the $(\xi, \eta)$-plane of the intersection of $C$ with the two-dimensional plane $p = q$. By definition, $\tilde{C} = \bigcup_{q \in \Delta_k} C^*_q$. Also, $\tilde{C}$ and $C^*_q$ are compact and convex. By definition, $F^*_{T_{YX},T_{ZX}}(q, s)$ is the infimum of all $\eta$ for which $C^*_q$ contains the point $(s, \eta)$. Thus

$F^*_{T_{YX},T_{ZX}}(q, s) = \inf\{\eta \mid (q, s, \eta) \in C\} = \inf\{\eta \mid (s, \eta) \in C^*_q\}$.   (12)

Lemma 3: For any fixed $q$ as the distribution of $X$, the domain of $F^*_{T_{YX},T_{ZX}}(q, s)$ in $s$ is the closed interval $[H(Y|X), H(Y)]$, i.e., $[\sum_{i=1}^{k} q_i h_n(T_{YX} e_i),\ h_n(T_{YX} q)]$, where $e_i$ is the vector whose $i$th entry is 1 and whose other entries are zeros.

Proof: For any Markov chain $U \rightarrow X \rightarrow Y$, by the Data Processing Theorem [19], $H(Y|U) \ge H(Y|X)$, and equality is achieved when the random variable $U = X$. One also has $H(Y|U) \le H(Y)$, and equality is achieved when $U$ is a constant. Thus, the domain of $F^*_{T_{YX},T_{ZX}}(q, s)$ in $s$ is $[H(Y|X), H(Y)]$ for a fixed distribution of the channel input $X$. Since $q$ is the distribution of $X$, $H(Y|X) = \sum_{i=1}^{k} q_i h_n(T_{YX} e_i)$ and $H(Y) = h_n(T_{YX} q)$. Q.E.D.

Theorem 3: The function $F^*_{T_{YX},T_{ZX}}(q, s)$ is defined on the compact convex domain $\{(q, s) \mid q \in \Delta_k,\ \sum_{i=1}^{k} q_i h_n(T_{YX} e_i) \le s \le h_n(T_{YX} q)\}$, and for each $(q, s)$ in this domain, the infimum in its definition is a minimum, attainable with $U$ taking at most $k + 1$ values.

Proof: By Lemma 3, the function $F^*$ is defined on the compact domain $\{(q, s) \mid q \in \Delta_k,\ \sum_{i=1}^{k} q_i h_n(T_{YX} e_i) \le s \le h_n(T_{YX} q)\}$. This domain is convex because $\Delta_k$ is convex, the entropy function $h_n(T_{YX} q)$ is concave in $q$, and $\sum_{i=1}^{k} q_i h_n(T_{YX} e_i)$ is linear in $q$. For each $(q, s)$ in this domain, the set $\{\eta \mid (s, \eta) \in C^*_q\}$ is non-empty. It is in fact a compact interval since $C^*_q$ is compact. Therefore,

$F^*_{T_{YX},T_{ZX}}(q, s) = \inf\{\eta \mid (s, \eta) \in C^*_q\} = \min\{\eta \mid (s, \eta) \in C^*_q\} = \min\{\eta \mid (q, s, \eta) \in C\}$.   (13)

By Lemma 2 i), this minimum is attained with $U$ taking at most $k + 1$ values. Q.E.D.

By Lemma 2 ii), the extreme points of $C^*_q$ can be attained by convex combinations of at most $k$ points of $S$. Thus, every linear function of $(\xi, \eta)$ can attain its minimum with $U$ taking at most $k$ values, since every linear function of $(\xi, \eta)$ achieves its minimum over $C^*_q$ at an extreme point of the compact set $C^*_q$.

Lemma 4: The function $F^*_{T_{YX},T_{ZX}}(q, s)$ is jointly convex in $(q, s)$.

Proof: $F^*_{T_{YX},T_{ZX}}(q, s)$ is jointly convex in $(q, s)$ because $C$ is a convex set. In particular, the domain of $F^*$ is convex by Theorem 3. For any two points $(q_1, s_1)$ and $(q_2, s_2)$ in the domain, and for any $0 \le \theta \le 1$,

$F^*_{T_{YX},T_{ZX}}(\theta q_1 + (1-\theta) q_2,\ \theta s_1 + (1-\theta) s_2)$
$= \min\{\eta \mid (\theta q_1 + (1-\theta) q_2,\ \theta s_1 + (1-\theta) s_2,\ \eta) \in C\}$
$\le \min\{\theta \eta_1 + (1-\theta) \eta_2 \mid (q_1, s_1, \eta_1), (q_2, s_2, \eta_2) \in C\}$
$= \theta F^*_{T_{YX},T_{ZX}}(q_1, s_1) + (1-\theta) F^*_{T_{YX},T_{ZX}}(q_2, s_2)$.

Therefore, $F^*_{T_{YX},T_{ZX}}(q, s)$ is jointly convex in $(q, s)$. Q.E.D.

Now we give the proof of Theorem 1.

Proof: Since Theorem 3 has shown that the infimum in the definition of $F^*$ is a minimum, it suffices to show that $F^*(s) = F^*_{T_{YX},T_{ZX}}(q, s)$ is monotonically nondecreasing in $s$. For any fixed $q$, the domain of $s$ is $[H(Y|X), H(Y)]$. On the one hand,

$F^*(q, H(Y|X)) = \min\{H(Z|U) \mid p_X = q,\ H(Y|U) = H(Y|X)\} \le \min\{H(Z|U) \mid p_X = q,\ U = X\} = H(Z|X)$.   (14)

On the other hand, for any $s \in [H(Y|X), H(Y)]$,

$F^*(q, s) = \min\{H(Z|U) \mid p_X = q,\ H(Y|U) = s\}$
$\ge \min\{H(Z|U, X) \mid p_X = q,\ H(Y|U) = s\}$   (15)
$= H(Z|X)$,   (16)

where (15) follows from $H(Z|U) \ge H(Z|U, X)$ and (16) follows from the conditional independence of $Z$ and $U$ given $X$. Inequalities (14) and (16) imply that for any $s \in [H(Y|X), H(Y)]$,

$F^*(q, s) \ge F^*(q, H(Y|X))$.   (17)

Combining (17) and the fact that $F^*(q, s)$ is convex in $s$ for any fixed $q$, we have that $F^*(q, s)$ is monotonically nondecreasing in $s$. Q.E.D.

The proof of Theorem 1 also gives one endpoint of $F^*(s)$,

$F^*(q, H(Y|X)) = H(Z|X)$,   (18)

which is achieved when $U = X$. The following theorem will provide the other endpoint,

$F^*(q, H(Y)) = H(Z)$,   (19)

which is obtained when $U$ is a constant.

Theorem 4: For $H(Y|X) \le s \le H(Y)$, a lower bound on $F^*(s)$ is

$F^*(s) \ge s + H(Z) - H(Y)$.   (20)

$F^*(s)$ is differentiable at all but at most countably many points. At differentiable points of $F^*(s)$,

$0 \le \frac{dF^*(s)}{ds} \le 1$.   (21)

Proof: Since $U \rightarrow X \rightarrow Y \rightarrow Z$ is a Markov chain,

$I(U; Z) \le I(U; Y)$   (22)
$\Rightarrow H(Z) - H(Z|U) \le H(Y) - H(Y|U)$
$\Rightarrow H(Z|U) \ge H(Y|U) + H(Z) - H(Y)$
$\Rightarrow F^*(s) \ge s + H(Z) - H(Y)$.   (23)

Some of these steps are justified as follows: (22) follows from the Data Processing Theorem [19]; (23) follows from the definition of $F^*(s)$. When the random variable $U$ is a constant, $H(Y|U) = H(Y)$ and $H(Z|U) = H(Z)$. Thus, equality in (23) is attained when $s = H(Y)$.

Since $F^*(s)$ is convex in $s$, it is differentiable at all but at most countably many points. If $F^*(s)$ is differentiable at $s = H(Y)$, then $\frac{dF^*(s)}{ds} \le 1$ there, because the line $s + H(Z) - H(Y)$, which has slope 1, supports the curve $F^*(s)$ at its endpoint $(H(Y), F^*(H(Y)))$. For any $H(Y|X) \le s < H(Y)$ where $F^*(s)$ is differentiable, since $F^*(s)$ is convex, the slope of the supporting line at the point $(s, F^*(s))$ is less than or equal to the slope of the supporting line $s + H(Z) - H(Y)$ at the point $(H(Y), F^*(H(Y)))$. Thus, for any $H(Y|X) \le s \le H(Y)$ where $F^*(s)$ is differentiable,

$\frac{dF^*(s)}{ds} \le 1$.   (24)

Finally, $\frac{dF^*(s)}{ds} \ge 0$ because $F^*(s)$ is monotonically nondecreasing. Illustrations of the function $F^*(s) = F^*_{T_{YX},T_{ZX}}(q, s)$ and $C^*_q$ are shown in Fig. 1. Q.E.D.

[Fig. 1: Illustrations of the curve $F^*(q, s) = F^*_{T_{YX},T_{ZX}}(q, s)$ (shown in bold), the region $C^*_q$, and the point $(0, \psi(q, \lambda))$.]
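The parametrization (9)-(11), together with Lemma 2 ii), also suggests a direct numerical approximation of $F^*(q, \cdot)$ for binary-input channels: scan all two-point decompositions of $q$ and keep the lower boundary of the resulting $(\xi, \eta)$ cloud. Below is a minimal sketch (Python; the channel matrices, $q$, the grid resolution, and the slice tolerance are assumed, illustrative choices, not values from the paper):

```python
import numpy as np

def h(v):
    """Entropy (nats) of a probability vector, ignoring zero entries."""
    v = np.asarray(v)
    v = v[v > 1e-15]
    return float(-(v * np.log(v)).sum())

# Illustrative binary-input DBC: broadcast binary-symmetric channel.
a1, a2 = 0.1, 0.2
TY = np.array([[1-a1, a1], [a1, 1-a1]])   # T_{YX}, columns indexed by X
TZ = np.array([[1-a2, a2], [a2, 1-a2]])   # T_{ZX}

qx = 0.4                       # Pr(X = second letter); q = (0.6, 0.4)
grid = np.linspace(0.0, 1.0, 401)

# By Lemma 2 ii), extreme points of C*_q need at most l = k = 2 values
# of U.  Scan decompositions q = w1*p1 + w2*p2 and record the feasible
# (xi, eta) = (H(Y|U), H(Z|U)) pairs from (9)-(11).
pairs = []
for a in grid:
    for b in grid:
        if not (a <= qx <= b) or b - a < 1e-9:
            continue
        w1 = (b - qx) / (b - a)          # weight of p1 = (1-a, a)
        w2 = 1.0 - w1                    # weight of p2 = (1-b, b)
        xi  = w1 * h(TY @ [1-a, a]) + w2 * h(TY @ [1-b, b])
        eta = w1 * h(TZ @ [1-a, a]) + w2 * h(TZ @ [1-b, b])
        pairs.append((xi, eta))

pairs = np.array(pairs)
for s in np.linspace(pairs[:, 0].min(), pairs[:, 0].max(), 9):
    mask = np.abs(pairs[:, 0] - s) < 2e-3     # thin slice around s
    if mask.any():
        print(f"s = {s:.3f}   F*(q,s) ~ {pairs[mask, 1].min():.4f}")
```

Each recorded pair is an achievable $(H(Y|U), H(Z|U))$, so the printed slice minima approach $F^*(q, s)$ from above as the grid is refined.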

For $X \sim q$, where $q$ is a fixed vector, by Theorem 2, finding the maximum of $R_2 + \lambda R_1$ is equivalent to finding the minimum of $F^*(q, s) - \lambda s$. Theorem 4 indicates that for every $\lambda > 1$, the minimum of $F^*(q, s) - \lambda s$ is attained when $s = H(Y)$ and $F^*(s) = H(Z)$, i.e., when $U$ is a constant. Thus, the non-trivial range of $\lambda$ is $0 \le \lambda \le 1$.

The following theorem is the key to the applications in Section V and is an extension and generalization of Theorem 2.4 in [10]. Let $\mathbf{X} = (X_1, \ldots, X_N)$ be a sequence of channel inputs to the degraded broadcast channel $X \rightarrow Y \rightarrow Z$. The corresponding channel outputs are $\mathbf{Y} = (Y_1, \ldots, Y_N)$ and $\mathbf{Z} = (Z_1, \ldots, Z_N)$. Thus, the channel-output pairs $(Y_i, Z_i)$, $i = 1, \ldots, N$, are conditionally independent of each other given the channel inputs $\mathbf{X}$. Note that the channel outputs $(Y_i, Z_i)$ do not have to be identically or independently distributed, since $X_1, \ldots, X_N$ could be correlated and have different distributions. Denote by $q_i$ the distribution of $X_i$ for $i = 1, \ldots, N$. Thus, $q = \sum_i q_i / N$ is the average of the distributions of the channel inputs. For any $q \in \Delta_k$, define $F^{*(N)}_{T_{YX},T_{ZX}}(q, Ns)$ to be the infimum of $H(\mathbf{Z}|U)$ with respect to all random variables $U$ and all possible channel inputs $\mathbf{X}$ such that $H(\mathbf{Y}|U) = Ns$, the average of the distributions of the channel inputs is $q$, and $U \rightarrow \mathbf{X} \rightarrow \mathbf{Y} \rightarrow \mathbf{Z}$ is a Markov chain.

Theorem 5: For all $N = 1, 2, \ldots$, and all $T_{YX}$, $T_{ZX}$, $q$, and $H(Y|X) \le s \le H(Y)$, one has

$F^{*(N)}_{T_{YX},T_{ZX}}(q, Ns) = N F^*_{T_{YX},T_{ZX}}(q, s)$.   (25)

Proof: We first prove that $F^{*(N)}_{T_{YX},T_{ZX}}(q, Ns) \ge N F^*_{T_{YX},T_{ZX}}(q, s)$. Since

$Ns = H(\mathbf{Y}|U) = \sum_{i=1}^{N} H(Y_i \mid Y_1, \ldots, Y_{i-1}, U)$   (26)
$= \sum_{i=1}^{N} s_i$,   (27)

where $s_i = H(Y_i \mid Y_1, \ldots, Y_{i-1}, U)$ and (26) follows from the chain rule of entropy [4],

$H(\mathbf{Z}|U) = \sum_{i=1}^{N} H(Z_i \mid Z_1, \ldots, Z_{i-1}, U)$   (28)
$\ge \sum_{i=1}^{N} H(Z_i \mid Z_1, \ldots, Z_{i-1}, Y_1, \ldots, Y_{i-1}, U)$   (29)
$= \sum_{i=1}^{N} H(Z_i \mid Y_1, \ldots, Y_{i-1}, U)$   (30)
$\ge \sum_{i=1}^{N} F^*_{T_{YX},T_{ZX}}(q_i, s_i)$   (31)
$\ge N F^*_{T_{YX},T_{ZX}}\Big(\sum_{i=1}^{N} q_i / N,\ \sum_{i=1}^{N} s_i / N\Big)$   (32)
$= N F^*_{T_{YX},T_{ZX}}(q, s)$.   (33)

Some of these steps are justified as follows: (28) follows from the chain rule of entropy [4]; (29) holds because conditional entropy decreases when the conditioning increases; (30) follows from the fact that $Z_i$ and $Z_1, \ldots, Z_{i-1}$ are conditionally independent given $Y_1, \ldots, Y_{i-1}$; (31) follows from the definition of $F^*$, considering the Markov chain $(U, Y_1, \ldots, Y_{i-1}) \rightarrow X_i \rightarrow Y_i \rightarrow Z_i$; (32) results from applying Jensen's inequality to the convex function $F^*$. By the definition of $F^{*(N)}_{T_{YX},T_{ZX}}(q, Ns)$, Equation (33) implies that

$F^{*(N)}_{T_{YX},T_{ZX}}(q, Ns) \ge N F^*_{T_{YX},T_{ZX}}(q, s)$.   (34)

On the other hand, in the case that $U$ is composed of $N$ independent and identically distributed (i.i.d.) random variables $(U_1, \ldots, U_N)$, and each $U_i \rightarrow X_i$ achieves $p_{X_i} = q$, $H(Y_i|U_i) = s$ and $H(Z_i|U_i) = F^*_{T_{YX},T_{ZX}}(q, s)$, one has $H(\mathbf{Y}|U) = Ns$ and $H(\mathbf{Z}|U) = N F^*_{T_{YX},T_{ZX}}(q, s)$. Since $F^{*(N)}_{T_{YX},T_{ZX}}$ is defined by taking the minimum,

$F^{*(N)}_{T_{YX},T_{ZX}}(q, Ns) \le N F^*_{T_{YX},T_{ZX}}(q, s)$.   (35)

Combining (34) and (35), one has $F^{*(N)}_{T_{YX},T_{ZX}}(q, Ns) = N F^*_{T_{YX},T_{ZX}}(q, s)$. Q.E.D.

Theorem 5 indicates that if the degraded broadcast channel $X \rightarrow Y \rightarrow Z$ is used $N$ times with a fixed $q$ as the average of the distributions of the channel inputs, the conditional entropy bound $F^{*(N)}_{T_{YX},T_{ZX}}(q, Ns)$ is achieved when the channel is used independently and identically $N$ times, with each single use of the channel achieving the conditional entropy bound $F^*_{T_{YX},T_{ZX}}(q, s)$.

IV. Evaluation of $F^*(\cdot)$

In this section, we evaluate $F^*(s) = F^*_{T_{YX},T_{ZX}}(q, s)$ via a duality technique, which is also used for evaluating $F(\cdot)$ in [10]. This duality technique also provides the optimal transmission strategy for the DBC $X \rightarrow Y \rightarrow Z$ to achieve the maximum of $R_2 + \lambda R_1$ for any $\lambda \ge 0$. Theorem 3 shows that

$F^*_{T_{YX},T_{ZX}}(q, s) = \min\{\eta \mid (s, \eta) \in C^*_q\} = \min\{\eta \mid (q, s, \eta) \in C\}$.   (36)

Thus, the function $F^*_{T_{YX},T_{ZX}}(q, s)$ is determined by the lower boundary of $C^*_q$. Since $C^*_q$ is convex, its lower boundary can be described by the lines supporting its graph from below. The line with slope $\lambda$ in the $(\xi, \eta)$-plane supporting $C^*_q$, as shown in Fig. 1, has the equation

$\eta = \lambda \xi + \psi(q, \lambda)$,   (37)

where $\psi(q, \lambda)$ is the $\eta$-intercept of the tangent line with slope $\lambda$ for the function $F^*_{T_{YX},T_{ZX}}(q, s)$.

Thus, ψ(q, λ) = min{f (q, ξ) λξ H(Y X) ξ H(Y )} (38) = min{η λξ (ξ, η) C q } (39) = min{η λξ (q, ξ, η) C}. (40) For H(Y X) s H(Y ), the function F (s) = F T Y X,T ZX (q, s) can be represented as F (s) = max{ψ(q, λ) + λs < λ < }. (41) Theorem 1 shows that the graph of F (s) is supported at s = H(Y X) by a line of slope 0, and Theorem 4 shows that the graph of F (s) is supported at s = H(Y ) by a line of slope 1. Thus, for H(Y X) s H(Y ), F (s) = max{ψ(q, λ) + λs 0 λ 1}. (42) Let L λ be a linear transformation (q, ξ, η) (q, η λξ). It maps C and S onto the sets C λ = {(q, η λξ) (q, ξ, η) C}, (43) and S λ = {(q, h m (T ZX q) λh n (T Y X q)) q k }. (44) The lower boundaries of C λ and S λ are the graphs of ψ(q, λ) and φ(q, λ) = h m (T ZX q) λh n (T Y X q) respectively. Since C is the convex hull of S, and thus C λ is the convex hull of S λ, ψ(q, λ) is the lower convex envelope of φ(q, λ) on k. In conclusion, ψ(, λ) can be obtained by forming the lower convex envelope of φ(, λ) for each λ and F (q, s) can be reconstructed from ψ(q, λ) by (42). This is the dual approach to 17

the evaluation of F. Theorem 2 represents the capacity region for a DBC by the function F (q, s). Since ψ(q, λ) and F (q, s) can be constructed by each other from (38) and (42), for any λ 0, the associated point on the boundary of the capacity region may be found (from its unique value of R 2 + λr 1 ) as follows max max{r 2 + λr 1 p X = q} q k = max q k max{h(z) F (q, s) + λs λh(y X)} = max q k (H(Z) λh(y X) min{f (q, s) λs}) = max q k (H(Z) λh(y X) ψ(q, λ)). (45) We have shown the relationship among F, ψ and the capacity region for the DBC. Now we state a theorem which provides the relationship among F (q, s), ψ(q, λ), φ(q, λ), and the optimal transmission strategies for the DBC. Theorem 6: i) For any 0 λ 1, if a point of the graph of ψ(, λ) is the convex combination of l points of the graph of φ(, λ) with arguments p j and weights w λ, j = 1,, l, then ) F T Y X,T ZX ( j w j p j, j w j h n (T Y X p j ) = j w j h m (T ZX p j ). (46) Furthermore, for a fixed channel input distribution q = j w jp j, the optimal transmission strategy to achieve the maximum of R 2 +λr 1 is determined by l,w j and p j. In particular, an optimal transmission strategy has U = l, Pr(U = j) = w j and p X U=j = p j, where p X U=j denotes the conditional distribution of X given U = j. ii)for a predetermined channel input distribution q, if the transmission strategy U = l, Pr(U = j) = w j and p X U=j = p j achieves max{r 2 + λr 1 j w jp j = q}, then the point (q, ψ(q, λ)) is the convex combination of l points of the graph of φ(, λ) with arguments p j 18
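The dual approach is straightforward to carry out numerically: sample $\phi(\cdot, \lambda)$ on a grid and take its lower convex envelope. The sketch below (Python; the BSC parameters $\alpha_1 = 0.1$, $\alpha_2 = 0.25$ and $\lambda = 0.35$ are assumed, illustrative values anticipating the example of subsection A) recovers $\psi(\cdot, \lambda)$ and reports the interval on which it replaces $\phi$:

```python
import numpy as np

def h(y):
    """Binary entropy in nats."""
    y = np.clip(y, 1e-12, 1 - 1e-12)
    return -y * np.log(y) - (1 - y) * np.log(1 - y)

def lower_convex_envelope(x, f):
    """Lower convex envelope of points (x_i, f_i), x sorted increasing."""
    hull = []                                  # indices of hull vertices
    for i in range(len(x)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            # pop i1 if it lies on or above the chord from i0 to i
            if (f[i1]-f[i0])*(x[i]-x[i0]) >= (f[i]-f[i0])*(x[i1]-x[i0]):
                hull.pop()
            else:
                break
        hull.append(i)
    return np.interp(x, x[hull], f[hull])

# phi(p, lambda) for the broadcast BSC of (48) below; assumed parameters.
a1, a2, lam = 0.1, 0.25, 0.35
p = np.linspace(0, 1, 2001)
phi = h(a2 + (1 - 2*a2)*p) - lam * h(a1 + (1 - 2*a1)*p)
psi = lower_convex_envelope(p, phi)            # psi(., lambda)

flat = np.where(psi < phi - 1e-9)[0]           # where the envelope is strict
if flat.size:
    print(f"envelope replaces phi on p in [{p[flat[0]]:.4f}, {p[flat[-1]]:.4f}]")
else:
    print("phi is already convex; psi = phi")
```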

Now we state a theorem which provides the relationship among $F^*(q, s)$, $\psi(q, \lambda)$, $\phi(q, \lambda)$, and the optimal transmission strategies for the DBC.

Theorem 6: i) For any $0 \le \lambda \le 1$, if a point of the graph of $\psi(\cdot, \lambda)$ is the convex combination of $l$ points of the graph of $\phi(\cdot, \lambda)$ with arguments $p_j$ and weights $w_j$, $j = 1, \ldots, l$, then

$F^*_{T_{YX},T_{ZX}}\Big(\sum_j w_j p_j,\ \sum_j w_j h_n(T_{YX} p_j)\Big) = \sum_j w_j h_m(T_{ZX} p_j)$.   (46)

Furthermore, for a fixed channel input distribution $q = \sum_j w_j p_j$, the optimal transmission strategy to achieve the maximum of $R_2 + \lambda R_1$ is determined by $l$, $w_j$ and $p_j$. In particular, an optimal transmission strategy has $|U| = l$, $\Pr(U = j) = w_j$ and $p_{X|U=j} = p_j$, where $p_{X|U=j}$ denotes the conditional distribution of $X$ given $U = j$.

ii) For a predetermined channel input distribution $q$, if the transmission strategy $|U| = l$, $\Pr(U = j) = w_j$ and $p_{X|U=j} = p_j$ achieves $\max\{R_2 + \lambda R_1 \mid \sum_j w_j p_j = q\}$, then the point $(q, \psi(q, \lambda))$ is the convex combination of $l$ points of the graph of $\phi(\cdot, \lambda)$ with arguments $p_j$ and weights $w_j$, $j = 1, \ldots, l$.

The proof is given in Appendix III. Note that if $\psi(q, \lambda) = \phi(q, \lambda)$ for some pair $(q, \lambda)$, then the corresponding optimal transmission strategy has $l = 1$, which means $U$ is a constant. Thus, the line $\eta = \lambda \xi + \psi(q, \lambda)$ supports the graph of $F^*(q, \cdot)$ at its endpoint $(H(Y), H(Z)) = (h_n(T_{YX} q), h_m(T_{ZX} q))$.

A. Example: broadcast binary-symmetric channel

For the broadcast binary-symmetric channel $X \rightarrow Y \rightarrow Z$ with

$T_{YX} = \begin{bmatrix} 1-\alpha_1 & \alpha_1 \\ \alpha_1 & 1-\alpha_1 \end{bmatrix}, \quad T_{ZX} = \begin{bmatrix} 1-\alpha_2 & \alpha_2 \\ \alpha_2 & 1-\alpha_2 \end{bmatrix}$,   (47)

where $0 < \alpha_1 < \alpha_2 < 1/2$, one has

$\phi(p, \lambda) = \phi((p, 1-p)^T, \lambda) = h_m(T_{ZX} q) - \lambda h_n(T_{YX} q) = h((1-\alpha_2)p + \alpha_2(1-p)) - \lambda h((1-\alpha_1)p + \alpha_1(1-p))$,   (48)

where $h(x) = -x \ln x - (1-x) \ln(1-x)$ is the binary entropy function. Taking the second derivative of $\phi(p, \lambda)$ with respect to $p$, we have

$\phi''(p, \lambda) = \frac{-(1-2\alpha_2)^2}{(\alpha_2 p + (1-\alpha_2)(1-p))((1-\alpha_2)p + \alpha_2(1-p))} + \frac{\lambda (1-2\alpha_1)^2}{(\alpha_1 p + (1-\alpha_1)(1-p))((1-\alpha_1)p + \alpha_1(1-p))}$,   (49)

which has the sign of

$\rho(p, \lambda) = -\Big(\frac{1-\alpha_1}{1-2\alpha_1} - p\Big)\Big(\frac{1-\alpha_1}{1-2\alpha_1} - 1 + p\Big) + \lambda \Big(\frac{1-\alpha_2}{1-2\alpha_2} - p\Big)\Big(\frac{1-\alpha_2}{1-2\alpha_2} - 1 + p\Big)$.   (50)

[Fig. 2]

For any $0 \le \lambda \le 1$,

$\min_p \rho(p, \lambda) = \frac{\lambda}{4(1-2\alpha_2)^2} - \frac{1}{4(1-2\alpha_1)^2}$.   (51)

Thus, for $\lambda \ge (1-2\alpha_2)^2 / (1-2\alpha_1)^2$, $\phi''(p, \lambda) \ge 0$ for all $0 \le p \le 1$, and so $\psi(p, \lambda) = \phi(p, \lambda)$. In this case, the optimal transmission strategy achieving the maximum of $R_1$ also achieves the maximum of $R_2 + \lambda R_1$, and thus the optimal transmission strategy has $l = 1$, which means $U$ is a constant.

Note that $\phi(1/2 + p, \lambda) = \phi(1/2 - p, \lambda)$. For $\lambda < (1-2\alpha_2)^2 / (1-2\alpha_1)^2$, $\phi(p, \lambda)$ has a negative second derivative on an interval symmetric about $p = 1/2$. Let $p_\lambda = \arg\min_p \phi(p, \lambda)$ with $p_\lambda \le 1/2$. Thus $p_\lambda$ satisfies $\phi'_p(p_\lambda, \lambda) = 0$. By symmetry, the envelope $\psi(\cdot, \lambda)$ is obtained by replacing $\phi(p, \lambda)$ on the interval $(p_\lambda, 1 - p_\lambda)$ by its minimum over $p$, as shown in Fig. 2. Therefore, the lower envelope of $\phi(p, \lambda)$ is

$\psi(p, \lambda) = \begin{cases} \phi(p_\lambda, \lambda), & p_\lambda \le p \le 1 - p_\lambda \\ \phi(p, \lambda), & \text{otherwise.} \end{cases}$   (52)

For the predetermined distribution of $X$, $p_X = q = (q, 1-q)^T$ with $p_\lambda < q < 1 - p_\lambda$, the point $(q, \psi(q, \lambda))$ is the convex combination of the points $(p_\lambda, \psi(p_\lambda, \lambda))$ and $(1 - p_\lambda, \psi(1 - p_\lambda, \lambda))$.
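The contact point $p_\lambda$ in (52) generally has no closed form, but it is easy to locate numerically. A minimal sketch (Python; $\alpha_1 = 0.1$, $\alpha_2 = 0.25$, $\lambda = 0.35$ are assumed values chosen so that $\lambda < (1-2\alpha_2)^2/(1-2\alpha_1)^2$ and $p_\lambda$ is interior):

```python
import math

a1, a2, lam = 0.1, 0.25, 0.35

def dphi(p):
    """d/dp of phi(p, lam) = h(a2+(1-2a2)p) - lam*h(a1+(1-2a1)p), in nats."""
    y1 = a1 + (1 - 2*a1) * p
    y2 = a2 + (1 - 2*a2) * p
    return ((1 - 2*a2) * math.log((1 - y2) / y2)
            - lam * (1 - 2*a1) * math.log((1 - y1) / y1))

# For these values dphi < 0 near p = 0 and dphi > 0 just below p = 1/2,
# with a single sign change at p_lam; plain bisection suffices.
lo, hi = 1e-9, 0.5 - 1e-9
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if dphi(mid) < 0:
        lo = mid
    else:
        hi = mid
p_lam = 0.5 * (lo + hi)
print(f"p_lam ~ {p_lam:.6f}")   # envelope is flat on (p_lam, 1 - p_lam)
```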

Therefore, by Theorem 6, $F^*(q, s) = h_2(T_{ZX}(p_\lambda, 1-p_\lambda)^T) = h(\alpha_2 + (1-2\alpha_2)p_\lambda)$ for $s = h_2(T_{YX}(p_\lambda, 1-p_\lambda)^T) = h(\alpha_1 + (1-2\alpha_1)p_\lambda)$, with $0 \le p_\lambda \le q$ or $1 - q \le p_\lambda \le 1$. This defines $F^*(q, \cdot)$ on its entire domain $[h(\alpha_1),\ h(\alpha_1 + (1-2\alpha_1)q)]$, i.e., $[H(Y|X), H(Y)]$. For the predetermined distribution of $X$, $q = (q, 1-q)^T$ with $q < p_\lambda$ or $q > 1 - p_\lambda$, one has $\phi(q, \lambda) = \psi(q, \lambda)$, which means that a line with slope $\lambda$ supports $F^*(q, \cdot)$ at the point $s = H(Y) = h(\alpha_1 + (1-2\alpha_1)q)$, and thus the optimal transmission strategy has $l = 1$, which means $U$ is a constant.

V. Broadcast Z Channels

The Z channel, shown in Fig. 3(a), is a binary asymmetric channel which is noiseless when symbol 1 is transmitted and noisy when symbol 0 is transmitted. The channel output $Y$ is the binary OR of the channel input $X$ and Bernoulli noise with parameter $\alpha$. The capacity of the Z channel was studied in [20]. The broadcast Z channel is a class of discrete memoryless broadcast channels whose component channels are Z channels. A two-user broadcast Z channel with marginal transition probability matrices

$T_{YX} = \begin{bmatrix} 1 & \alpha_1 \\ 0 & 1-\alpha_1 \end{bmatrix}, \quad T_{ZX} = \begin{bmatrix} 1 & \alpha_2 \\ 0 & 1-\alpha_2 \end{bmatrix}$,   (53)

where $0 < \alpha_1 \le \alpha_2 < 1$, is shown in Fig. 3(b). The two-user broadcast Z channel is stochastically degraded and can be modeled as a physically degraded broadcast channel as shown in Fig. 4, where $\alpha = (\alpha_2 - \alpha_1)/(1 - \alpha_1)$ [12]. In the NE scheme for broadcast Z channels, the transmitter first independently encodes the users' information messages into binary codewords and then broadcasts the binary OR of these encoded codewords. The NE scheme achieves the whole boundary of the capacity region for the two-user broadcast Z channel [12] [13]. In this section, we will show that the NE scheme also achieves the boundary of the capacity region for multi-user broadcast Z channels.
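The physically degraded model of Fig. 4 can be checked mechanically: composing a Z channel $T_{ZY}$ with parameter $\alpha = (\alpha_2 - \alpha_1)/(1 - \alpha_1)$ after $T_{YX}$ must reproduce $T_{ZX}$. A small sanity check (Python; $\alpha_1 = 0.2$, $\alpha_2 = 0.5$ are assumed values):

```python
import numpy as np

a1, a2 = 0.2, 0.5
# Rows/columns ordered (1, 0), matching (53): input 1 is noiseless.
TYX = np.array([[1.0, a1], [0.0, 1 - a1]])
alpha = (a2 - a1) / (1 - a1)
TZY = np.array([[1.0, alpha], [0.0, 1 - alpha]])
TZX = np.array([[1.0, a2], [0.0, 1 - a2]])
assert np.allclose(TZY @ TYX, TZX)
print("T_{ZY} T_{YX} = T_{ZX}: degradation verified")
```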

[Fig. 3: The broadcast Z channel]

[Fig. 4: The degraded version of the broadcast Z channel]

A. $F^*$ for the broadcast Z channel

For the broadcast Z channel $X \rightarrow Y \rightarrow Z$ shown in Fig. 3(b) and Fig. 4 with

$T_{YX} = \begin{bmatrix} 1 & \alpha_1 \\ 0 & \beta_1 \end{bmatrix}, \quad T_{ZX} = \begin{bmatrix} 1 & \alpha_2 \\ 0 & \beta_2 \end{bmatrix}$,   (54)

where $0 < \alpha_1 \le \alpha_2 < 1$, $\beta_1 = 1 - \alpha_1$, and $\beta_2 = 1 - \alpha_2$, one has

$\phi(p, \lambda) = \phi((1-p, p)^T, \lambda) = h(p\beta_2) - \lambda h(p\beta_1)$.   (55)

Taking the second derivative of $\phi(p, \lambda)$ with respect to $p$, we have

$\phi''(p, \lambda) = \frac{-\beta_2^2}{(1 - p\beta_2)\, p\beta_2} + \frac{\lambda \beta_1^2}{(1 - p\beta_1)\, p\beta_1}$,   (56)

which has the sign of

$\rho(p, \lambda) = p\beta_1\beta_2(1 - \lambda) + \lambda\beta_1 - \beta_2$.   (57)

Let $\beta = \beta_2/\beta_1$. For the case $\beta \le \lambda \le 1$, $\phi''(p, \lambda) \ge 0$ for all $0 \le p \le 1$. Hence, $\phi(p, \lambda)$ is convex in $p$, and thus $\phi(p, \lambda) = \psi(p, \lambda)$ for all $0 \le p \le 1$. In this case, the optimal transmission strategy achieving the maximum of $R_1$ also achieves the maximum of $R_2 + \lambda R_1$, and the optimal transmission strategy has $l = 1$, i.e., $U$ is a constant. Note that the transmission strategy with $l = 1$ is a special case of the NE scheme in which the only codeword for the second user is an all-ones codeword.

For the case $0 \le \lambda < \beta$, $\phi(p, \lambda)$ is concave in $p$ on $\big[0, \frac{\beta_2 - \lambda\beta_1}{\beta_1\beta_2(1-\lambda)}\big]$ and convex on $\big[\frac{\beta_2 - \lambda\beta_1}{\beta_1\beta_2(1-\lambda)}, 1\big]$. The graph of $\phi(\cdot, \lambda)$ in this case is shown in Fig. 5. Since $\phi(0, \lambda) = 0$, the lower convex envelope $\psi(\cdot, \lambda)$ of $\phi(\cdot, \lambda)$ is constructed by drawing the tangent through the origin. Let $(p_\lambda, \phi(p_\lambda, \lambda))$ be the point of contact. The value of $p_\lambda$ is determined by $\phi'_p(p_\lambda, \lambda) = \phi(p_\lambda, \lambda)/p_\lambda$, i.e.,

$\ln(1 - \beta_2 p_\lambda) = \lambda \ln(1 - \beta_1 p_\lambda)$.   (58)

[Fig. 5: Illustrations of $\phi(\cdot, \lambda)$ and $\psi(\cdot, \lambda)$ for the broadcast Z channel]
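Equation (58) is also easy to solve numerically. In the sketch below (Python; $\alpha_1 = 0.1$, $\alpha_2 = 0.3$, $\lambda = 0.6$ are assumed values with $\lambda < \beta$), $g(p) = \ln(1 - \beta_2 p) - \lambda \ln(1 - \beta_1 p)$ vanishes at $p = 0$, dips negative since $g'(0) = \lambda\beta_1 - \beta_2 < 0$, and $p_\lambda$ is its second root:

```python
import math

a1, a2, lam = 0.1, 0.3, 0.6
b1, b2 = 1 - a1, 1 - a2          # beta1, beta2

def g(p):
    return math.log(1 - b2 * p) - lam * math.log(1 - b1 * p)

if g(1.0) <= 0:
    p_lam = 1.0                  # no interior root: contact at p = 1
else:
    lo, hi = 1e-6, 1.0           # g(lo) < 0 < g(hi): bisect
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    p_lam = 0.5 * (lo + hi)
print(f"p_lam ~ {p_lam:.6f}")
```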

[Fig. 6: The optimal transmission strategy for the two-user broadcast Z channel]

Let $q = (1-q, q)^T$ be the distribution of the channel input $X$. For $q \le p_\lambda$, $\psi(q, \lambda)$ is obtained as a convex combination of the points $(0, 0)$ and $(p_\lambda, \phi(p_\lambda, \lambda))$ with weights $(p_\lambda - q)/p_\lambda$ and $q/p_\lambda$. By Theorem 6, it corresponds to $s = [(p_\lambda - q)/p_\lambda] \cdot 0 + [q/p_\lambda]\, h(\beta_1 p_\lambda) = q\, h(\beta_1 p_\lambda)/p_\lambda$ and $F^*(q, s) = (q/p_\lambda)\, h(\beta_2 p_\lambda)$. Hence, for the broadcast Z channel,

$F^*_{T_{YX},T_{ZX}}(q,\ q\, h(\beta_1 p)/p) = q\, h(\beta_2 p)/p$   (59)

for $p \in [q, 1]$, which defines $F^*_{T_{YX},T_{ZX}}(q, \cdot)$ on its entire domain $[q\, h(\beta_1),\ h(q\beta_1)]$. Also by Theorem 6, the optimal transmission strategy $U \rightarrow X$ achieving $\max\{R_2 + \lambda R_1 \mid \sum_j w_j p_j = q\}$ is determined by $l = 2$, $w_1 = (p_\lambda - q)/p_\lambda$, $w_2 = q/p_\lambda$, $p_1 = (1, 0)^T$ and $p_2 = (1 - p_\lambda, p_\lambda)^T$. Since the optimal transmission strategy $U \rightarrow X$ is a Z channel as shown in Fig. 6, the random variable $X$ can also be constructed as the OR of two Bernoulli random variables with parameters $(p_\lambda - q)/p_\lambda$ and $1 - p_\lambda$ respectively. Hence, the optimal transmission strategy for the broadcast Z channel is still the NE scheme in this case. For $q > p_\lambda$, $\psi(q, \lambda) = \phi(q, \lambda)$, and so the optimal transmission strategy has $l = 1$, i.e., $U$ is a constant. We have therefore provided an alternative proof that the NE scheme achieves the whole boundary of the capacity region of the two-user broadcast Z channel.

B. Multi-user broadcast Z channel

Let $\mathbf{X} = (X_1, \ldots, X_N)$ be a sequence of channel inputs to the broadcast Z channel $X \rightarrow Y \rightarrow Z$ satisfying (54). The corresponding channel outputs are $\mathbf{Y} = (Y_1, \ldots, Y_N)$ and $\mathbf{Z} = (Z_1, \ldots, Z_N)$.

Thus, the channel-output pairs $(Y_i, Z_i)$, $i = 1, \ldots, N$, are conditionally independent of each other given the channel inputs $\mathbf{X}$. Note that the channel outputs $(Y_i, Z_i)$ do not have to be identically or independently distributed, since $X_1, \ldots, X_N$ could be correlated and have different distributions.

Lemma 5: Consider the Markov chain $U \rightarrow \mathbf{X} \rightarrow \mathbf{Y} \rightarrow \mathbf{Z}$ with $\sum_i \Pr(X_i = 0)/N = q$. If

$H(\mathbf{Y}|U) \ge N \frac{q}{p} h(\beta_1 p)$   (60)

for some $p \in [q, 1]$, then

$H(\mathbf{Z}|U) \ge N \frac{q}{p} h(\beta_2 p)$   (61)
$= N \frac{q}{p} h(\beta_1 p \beta)$.   (62)

The proof is given in Appendix IV.

Consider a $K$-user broadcast Z channel with marginal transition probability matrices

$T_{Y_j X} = \begin{bmatrix} 1 & \alpha_j \\ 0 & \beta_j \end{bmatrix}$,   (63)

where $0 < \alpha_1 \le \cdots \le \alpha_K < 1$ and $\beta_j = 1 - \alpha_j$ for $j = 1, \ldots, K$. The $K$-user broadcast Z channel is stochastically degraded and can be modeled as a physically degraded broadcast channel as shown in Fig. 7.

[Fig. 7: The K-user broadcast Z channel]

The NE scheme for the $K$-user broadcast Z channel is to independently encode the $K$ users' information messages into $K$ binary codewords and broadcast the binary OR of these $K$ encoded codewords. The $j$th user then successively decodes the messages for User $K$, User $K-1, \ldots$, and finally for User $j$. The codebook for the $j$th user is designed by a random coding technique according to the binary random variable $X^{(j)}$ with $\Pr\{X^{(j)} = 0\} = q^{(j)}$. Denote by $X^{(i)} \vee X^{(j)}$ the OR of $X^{(i)}$ and $X^{(j)}$. Hence, the channel input $X$ is the OR of the $X^{(j)}$ for all $1 \le j \le K$, i.e., $X = X^{(1)} \vee \cdots \vee X^{(K)}$. From the coding theorem for DBCs [2] [3], the achievable region of the NE scheme for the $K$-user broadcast Z channel is determined by

$R_j \le I(Y_j; X^{(j)} \mid X^{(j+1)}, \ldots, X^{(K)})$   (64)
$= H(Y_j \mid X^{(j+1)}, \ldots, X^{(K)}) - H(Y_j \mid X^{(j)}, X^{(j+1)}, \ldots, X^{(K)})$   (65)
$= \Big(\prod_{i=j+1}^{K} q^{(i)}\Big) h\Big(\beta_j \prod_{i=1}^{j} q^{(i)}\Big) - \Big(\prod_{i=j}^{K} q^{(i)}\Big) h\Big(\beta_j \prod_{i=1}^{j-1} q^{(i)}\Big)$   (66)
$= \frac{q}{t_j} h(\beta_j t_j) - \frac{q}{t_{j-1}} h(\beta_j t_{j-1})$,   (67)

where $t_j = \prod_{i=1}^{j} q^{(i)}$ for $j = 1, \ldots, K$, and $q = \Pr(X = 0) = \prod_{i=1}^{K} q^{(i)}$. Denote $t_0 = 1$. Since $0 \le q^{(1)}, \ldots, q^{(K)} \le 1$, one has

$1 = t_0 \ge t_1 \ge \cdots \ge t_K = q$.   (68)

We now state and prove that the achievable region of the NE scheme is the capacity region for the multi-user broadcast Z channel. Fig. 8 shows the communication system for the $K$-user broadcast Z channel. $\mathbf{X} = (X_1, \ldots, X_N)$ is a length-$N$ codeword determined by the messages $W_1, \ldots, W_K$. $\mathbf{Y}_1, \ldots, \mathbf{Y}_K$ are the channel outputs corresponding to the channel input $\mathbf{X}$.

[Fig. 8: The communication system for the multi-user broadcast Z channel]
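The rate expressions (67) are easy to evaluate for given per-codebook parameters. A small sketch (Python; the three-user parameters are assumed, illustrative values; rates are in nats per channel use):

```python
import numpy as np

def h(y):
    """Binary entropy in nats."""
    y = np.clip(y, 1e-12, 1 - 1e-12)
    return -y*np.log(y) - (1-y)*np.log(1-y)

def ne_rates(q_users, alphas):
    """NE achievable rates (67) for the K-user broadcast Z channel.

    q_users[j-1] = Pr{X^(j) = 0} for each independent codebook,
    alphas[j-1]  = crossover alpha_j of user j's marginal channel."""
    betas = 1.0 - np.asarray(alphas, float)
    t = np.concatenate(([1.0], np.cumprod(q_users)))   # t_0 = 1, t_j
    q = t[-1]                                          # q = Pr{X = 0}
    return [(q/t[j])*h(betas[j-1]*t[j]) - (q/t[j-1])*h(betas[j-1]*t[j-1])
            for j in range(1, len(t))]

# Illustrative 3-user example (assumed parameters).
print(ne_rates([0.7, 0.8, 0.9], [0.05, 0.15, 0.30]))
```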

Theorem 7: If $\sum_{i=1}^{N} \Pr\{X_i = 0\}/N = q$, then no point $(R_1, \ldots, R_K)$ such that

$R_j \ge \frac{q}{t_j} h(\beta_j t_j) - \frac{q}{t_{j-1}} h(\beta_j t_{j-1}), \quad j = 1, \ldots, K$,
$R_d = \frac{q}{t_d} h(\beta_d t_d) - \frac{q}{t_{d-1}} h(\beta_d t_{d-1}) + \delta, \quad \text{for some } d \in \{1, \ldots, K\},\ \delta > 0$,   (69)

is achievable, where the $t_j$ are as in (67) and (68).

Proof (by contradiction): This proof borrows the idea of the proof of the converse of the coding theorem for broadcast Gaussian channels [2]. Lemma 5 plays the same role in this proof as the entropy power inequality does in the proof for broadcast Gaussian channels. We suppose that the rates of (69) are achievable, which means that the probability of decoding error for each receiver can be upper bounded by an arbitrarily small $\epsilon$ for sufficiently large $N$:

$\Pr\{\hat{W}_j \ne W_j\} < \epsilon, \quad j = 1, \ldots, K$,   (70)

where $\hat{W}_j$ is decoded from $\mathbf{Y}_j$. By Fano's inequality, this implies that

$H(W_j \mid \mathbf{Y}_j) \le h(\epsilon) + \epsilon \ln(M_j - 1), \quad j = 1, \ldots, K$.   (71)

Let $o(\epsilon)$ represent any function of $\epsilon$ such that $o(\epsilon) \ge 0$ and $o(\epsilon) \rightarrow 0$ as $\epsilon \rightarrow 0$. Equation (71) implies that $H(W_j \mid \mathbf{Y}_j)$, $j = 1, \ldots, K$, are all $o(\epsilon)$. Therefore,

$H(W_j) = H(W_j \mid W_{j+1}, \ldots, W_K)$   (72)
$= I(W_j; \mathbf{Y}_j \mid W_{j+1}, \ldots, W_K) + H(W_j \mid \mathbf{Y}_j, W_{j+1}, \ldots, W_K)$   (73)
$\le I(W_j; \mathbf{Y}_j \mid W_{j+1}, \ldots, W_K) + H(W_j \mid \mathbf{Y}_j)$   (74)
$= H(\mathbf{Y}_j \mid W_{j+1}, \ldots, W_K) - H(\mathbf{Y}_j \mid W_j, W_{j+1}, \ldots, W_K) + o(\epsilon)$,   (75)

where (72) follows from the independence of the $W_j$, $j = 1, \ldots, K$. From (69), (75) and the fact that $N R_j \le H(W_j)$,

$H(\mathbf{Y}_j \mid W_{j+1}, \ldots, W_K) - H(\mathbf{Y}_j \mid W_j, W_{j+1}, \ldots, W_K) \ge N \frac{q}{t_j} h(\beta_j t_j) - N \frac{q}{t_{j-1}} h(\beta_j t_{j-1}) - o(\epsilon)$.   (76)

Next, using Lemma 5 and (76), we show in Appendix V that

$H(\mathbf{Y}_K) \ge N h(\beta_K q) + N\delta - o(\epsilon)$,   (77)

where $q = t_K = \sum_{i=1}^{N} \Pr(X_i = 0)/N$. Since $\epsilon$ can be arbitrarily small for sufficiently large $N$, $o(\epsilon) \rightarrow 0$ as $N \rightarrow \infty$. For sufficiently large $N$, $H(\mathbf{Y}_K) \ge N h(\beta_K q) + N\delta/2$. However, this contradicts

$H(\mathbf{Y}_K) \le \sum_{i=1}^{N} H(Y_{K,i})$   (78)
$= \sum_{i=1}^{N} h(\beta_K \Pr(X_i = 0))$   (79)
$\le N h\Big(\beta_K \sum_{i=1}^{N} \Pr(X_i = 0)/N\Big)$   (80)
$= N h(\beta_K q)$.   (81)

Some of these steps are justified as follows: (78) follows from $\mathbf{Y}_K = (Y_{K,1}, \ldots, Y_{K,N})$; (80) is obtained by applying Jensen's inequality to the concave function $h(\cdot)$; (81) follows from $q = \sum_{i=1}^{N} \Pr(X_i = 0)/N$. The desired contradiction has been obtained, so the theorem is proved. Q.E.D.

VI. Input-Symmetric Degraded Broadcast Channels

The input-symmetric channel was first introduced in [10] and studied further in [16] [17] [18]. The definition is as follows. Let $\Phi_n$ denote the symmetric group of permutations of $n$ objects, represented by the $n \times n$ permutation matrices. An $n$-input $m$-output channel with transition probability matrix $T_{m \times n}$ is input-symmetric if the set

$G_T = \{G \in \Phi_n \mid \exists \Pi \in \Phi_m \text{ s.t. } TG = \Pi T\}$   (82)

is transitive, which means each element of $\{1, \ldots, n\}$ can be mapped to every other element of $\{1, \ldots, n\}$ by some permutation matrix in $G_T$ [10]. An important property of input-symmetric channels is that the uniform input distribution achieves capacity. We extend the definition of the input-symmetric channel to the input-symmetric DBC as follows:

Definition 2 (Input-Symmetric Degraded Broadcast Channel): A discrete memoryless DBC $X \rightarrow Y \rightarrow Z$ with $|X| = k$, $|Y| = n$ and $|Z| = m$ is input-symmetric if the set

$G_{T_{YX},T_{ZX}} = G_{T_{YX}} \cap G_{T_{ZX}}$   (83)
$= \{G \in \Phi_k \mid \exists \Pi_{YX} \in \Phi_n,\ \exists \Pi_{ZX} \in \Phi_m \text{ s.t. } T_{YX} G = \Pi_{YX} T_{YX},\ T_{ZX} G = \Pi_{ZX} T_{ZX}\}$   (84)

is transitive.
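Definition 2 can be checked mechanically for small alphabets by enumerating candidate permutation pairs. A minimal sketch (Python; brute force with factorial cost, so only for toy alphabet sizes; the BSC parameters are assumed values):

```python
import itertools
import numpy as np

def perm_matrices(n):
    """All n x n permutation matrices (n! of them; fine for small n)."""
    eye = np.eye(n)
    return [eye[list(p)] for p in itertools.permutations(range(n))]

def symmetry_set(T):
    """The set G_T of (82): input permutations G with TG = Pi T for some Pi."""
    m, k = T.shape
    Pis = perm_matrices(m)
    return [G for G in perm_matrices(k)
            if any(np.allclose(T @ G, P @ T) for P in Pis)]

def is_transitive(group, k):
    """For a *group* of permutations (Lemma 6 below guarantees the group
    property for the set of (83)), transitivity holds iff the orbit of
    the first input is the whole input alphabet."""
    return {int(np.argmax(G[:, 0])) for G in group} == set(range(k))

# Broadcast BSC with assumed alphas; expect the two matrices of (89) below.
a1, a2 = 0.1, 0.2
TY = np.array([[1 - a1, a1], [a1, 1 - a1]])
TZ = np.array([[1 - a2, a2], [a2, 1 - a2]])
common = [G for G in symmetry_set(TY)
          if any(np.allclose(G, H) for H in symmetry_set(TZ))]
print(len(common), is_transitive(common, 2))   # -> 2 True
```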

Lemma 6: $G_{T_{YX},T_{ZX}}$ is a group under matrix multiplication.

Proof: Every subset of a finite group that is closed under the group operation is itself a group. Since $G_{T_{YX},T_{ZX}}$ is a subset of the finite group $\Phi_k$, it suffices to show that $G_{T_{YX},T_{ZX}}$ is closed under matrix multiplication. Suppose $G_1, G_2 \in G_{T_{YX},T_{ZX}}$, so that $T_{YX} G_1 = \Pi_{YX,1} T_{YX}$, $T_{ZX} G_1 = \Pi_{ZX,1} T_{ZX}$, $T_{YX} G_2 = \Pi_{YX,2} T_{YX}$ and $T_{ZX} G_2 = \Pi_{ZX,2} T_{ZX}$. Then

$T_{YX} G_1 G_2 = \Pi_{YX,1} \Pi_{YX,2} T_{YX}$   (85)

and

$T_{ZX} G_1 G_2 = \Pi_{ZX,1} \Pi_{ZX,2} T_{ZX}$.   (86)

Therefore, $G_1 G_2 \in G_{T_{YX},T_{ZX}}$. Q.E.D.

Let $l = |G_{T_{YX},T_{ZX}}|$ and $G_{T_{YX},T_{ZX}} = \{G_1, \ldots, G_l\}$.

Lemma 7: $\sum_{i=1}^{l} G_i = \frac{l}{k} \mathbf{1}\mathbf{1}^T$, where $l/k$ is an integer and $\mathbf{1}$ is an all-ones vector.

Proof: Since $G_{T_{YX},T_{ZX}}$ is a group, for all $j = 1, \ldots, l$,

$G_j \Big(\sum_{i=1}^{l} G_i\Big) = \sum_{i=1}^{l} G_j G_i = \sum_{i=1}^{l} G_i$.   (87)

Hence, $\sum_{i=1}^{l} G_i$ has $k$ identical columns and $k$ identical rows since $G_{T_{YX},T_{ZX}}$ is transitive. Therefore, $\sum_{i=1}^{l} G_i = \frac{l}{k} \mathbf{1}\mathbf{1}^T$. Q.E.D.

Definition 3: A subset $\{G_{i_1}, \ldots, G_{i_{l_s}}\}$ of $G_{T_{YX},T_{ZX}}$ is a smallest transitive subset of $G_{T_{YX},T_{ZX}}$ if

$\sum_{j=1}^{l_s} G_{i_j} = \frac{l_s}{k} \mathbf{1}\mathbf{1}^T$,   (88)

where $l_s$ is the smallest possible integer for which (88) can be satisfied.

A. Examples: broadcast binary-symmetric channels and broadcast binary-erasure channels

The class of input-symmetric DBCs includes most of the common discrete memoryless DBCs.

For example, the broadcast binary-symmetric channel $X \rightarrow Y \rightarrow Z$ with marginal transition probability matrices

$T_{YX} = \begin{bmatrix} 1-\alpha_1 & \alpha_1 \\ \alpha_1 & 1-\alpha_1 \end{bmatrix} \quad \text{and} \quad T_{ZX} = \begin{bmatrix} 1-\alpha_2 & \alpha_2 \\ \alpha_2 & 1-\alpha_2 \end{bmatrix}$,

where $0 \le \alpha_1 \le \alpha_2 \le 1/2$, is input-symmetric since

$G_{T_{YX},T_{ZX}} = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \right\}$   (89)

is transitive. Another interesting example is the broadcast binary-erasure channel with marginal transition probability matrices

$T_{YX} = \begin{bmatrix} 1-a_1 & 0 \\ a_1 & a_1 \\ 0 & 1-a_1 \end{bmatrix} \quad \text{and} \quad T_{ZX} = \begin{bmatrix} 1-a_2 & 0 \\ a_2 & a_2 \\ 0 & 1-a_2 \end{bmatrix}$,

where $0 \le a_1 \le a_2 \le 1$. It is input-symmetric since its $G_{T_{YX},T_{ZX}}$ is the same as that of the broadcast binary-symmetric channel shown in (89).

B. Example: group-additive DBC

Definition 4 (Group-Additive Degraded Broadcast Channel): A degraded broadcast channel $X \rightarrow Y \rightarrow Z$ with $X, Y, Z \in \{1, \ldots, n\}$ is a group-additive degraded broadcast channel if there exist two $n$-ary random variables $N_1$ and $N_2$ such that $Y \equiv X \oplus N_1$ and $Z \equiv Y \oplus N_2$, as shown in Fig. 9, where $\equiv$ denotes identical distribution and $\oplus$ denotes group addition.

The class of group-additive DBCs includes the broadcast binary-symmetric channel and the discrete additive DBC [11] as special cases.

Theorem 8: Group-additive DBCs are input-symmetric.

[Fig. 9: The group-additive degraded broadcast channel]

Proof: For the group-additive DBC $X \rightarrow Y \rightarrow Z$ with $X, Y, Z \in \{1, \ldots, n\}$, let $G_x$, for $x = 1, \ldots, n$, be the 0-1 matrices with entries

$G_x(i, j) = \begin{cases} 1 & \text{if } j \oplus x = i \\ 0 & \text{otherwise} \end{cases} \quad \text{for } i, j = 1, \ldots, n$.   (90)

The $G_x$, $x = 1, \ldots, n$, are permutation matrices and have the property that $G_{x_1} G_{x_2} = G_{x_2} G_{x_1} = G_{x_1 \oplus x_2}$. Let $(\gamma_1, \ldots, \gamma_n)^T$ be the distribution of $N_1$. Since $Y$ has the same distribution as $X \oplus N_1$, one has

$T_{YX} = \sum_{x=1}^{n} \gamma_x G_x$.   (91)

Hence, $T_{YX} G_x = G_x T_{YX}$ for all $x = 1, \ldots, n$. Similarly, we have $T_{ZX} G_x = G_x T_{ZX}$ for all $x = 1, \ldots, n$, and so

$\{G_1, \ldots, G_n\} \subseteq G_{T_{YX},T_{ZX}}$.   (92)

Since the set $\{G_1, \ldots, G_n\}$ is transitive by definition, $G_{T_{YX},T_{ZX}}$ is also transitive, and hence the group-additive degraded broadcast channel is input-symmetric. Q.E.D.

By definition, $\sum_{j=1}^{n} G_j = \mathbf{1}\mathbf{1}^T$, and hence $\{G_1, \ldots, G_n\}$ is a smallest transitive subset of $G_{T_{YX},T_{ZX}}$ for the group-additive DBC.
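For a concrete instance of (90)-(92), take the cyclic group $\mathbb{Z}_n$ (an assumed choice, since any finite group works; indices are 0-based in code for convenience). A short check in Python:

```python
import numpy as np

n = 4
def G(x):
    """G_x of (90) for Z_n: G_x(i, j) = 1 iff j + x = i (mod n)."""
    M = np.zeros((n, n))
    for j in range(n):
        M[(j + x) % n, j] = 1.0
    return M

gamma = [0.7, 0.1, 0.1, 0.1]            # illustrative pmf of the noise N1
TYX = sum(g * G(x) for x, g in enumerate(gamma))
# The two facts used in Theorem 8: T_{YX} commutes with every G_x, and
# sum_x G_x = 11^T (so {G_x} is a smallest transitive subset).
assert all(np.allclose(TYX @ G(x), G(x) @ TYX) for x in range(n))
assert np.allclose(sum(G(x) for x in range(n)), np.ones((n, n)))
print("T_{YX} commutes with every G_x, and sum_x G_x = 11^T")
```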

C. Example: an IS-DBC not covered in [16] [17]

The class of DDICs and the corresponding DBCs studied in [16] [17] must satisfy the condition that the transition probability matrix $T_{ZY}$ is input-symmetric, i.e., that $G_{T_{ZY}}$ is transitive. The input-symmetric DBC, however, does not have to satisfy this condition. The following example provides an IS-DBC which is not covered in [16] [17]. Consider a DBC $X \rightarrow Y \rightarrow Z$ with transition probability matrices

$T_{YX} = \begin{bmatrix} a & c \\ b & d \\ c & a \\ d & b \end{bmatrix}, \quad T_{ZY} = \begin{bmatrix} e & f & g & h \\ g & h & e & f \end{bmatrix}, \quad \text{and} \quad T_{ZX} = T_{ZY} T_{YX} = \begin{bmatrix} \alpha & \beta \\ \beta & \alpha \end{bmatrix}$,   (93)

where $a + b + c + d = 1$, $e + g = f + h = 1$, $\alpha = ae + bf + cg + dh$ and $\beta = ag + bh + ce + df$. This DBC is input-symmetric since its $G_{T_{YX},T_{ZX}}$ is the same as that of the broadcast binary-symmetric channel shown in (89). It is not covered in [16] [17] because

$G_{T_{ZY}} = \left\{ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix} \right\}$   (94)

is not transitive.
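A concrete instance of this example can be checked by brute force (Python; $e = 0.9$, $f = 0.6$ are assumed values satisfying $e + g = f + h = 1$). The enumeration recovers exactly the two matrices of (94), whose orbit of the first input is $\{1, 3\}$, so $G_{T_{ZY}}$ is not transitive:

```python
import itertools
import numpy as np

e, f = 0.9, 0.6
T = np.array([[e, f, 1 - e, 1 - f],
              [1 - e, 1 - f, e, f]])     # T_{ZY}, 2 x 4

eye2, eye4 = np.eye(2), np.eye(4)
group = [sig for sig in itertools.permutations(range(4))
         if any(np.allclose(T @ eye4[list(sig)].T, eye2[list(pi)] @ T)
                for pi in itertools.permutations(range(2)))]
print(group)   # [(0, 1, 2, 3), (2, 3, 0, 1)] (0-indexed): not transitive
```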

D. Optimal input distribution and capacity region

Consider the input-symmetric DBC $X \rightarrow Y \rightarrow Z$ with marginal transition probability matrices $T_{YX}$ and $T_{ZX}$. Recall that $C$ is the set of all $(p, \xi, \eta)$ satisfying (9), (10) and (11) for some choice of $l$, $w$ and $p_j$, $j = 1, \ldots, l$; that $\tilde{C} = \{(\xi, \eta) \mid \exists q : (q, \xi, \eta) \in C\}$ is the projection of $C$ onto the $(\xi, \eta)$-plane; and that $C^*_q = \{(\xi, \eta) \mid (q, \xi, \eta) \in C\}$ is the projection onto the $(\xi, \eta)$-plane of the intersection of $C$ with the two-dimensional plane $p = q$.

Lemma 8: For any permutation matrix $G \in G_{T_{YX},T_{ZX}}$ and $(p, \xi, \eta) \in C$, one has $(Gp, \xi, \eta) \in C$.

Proof: Since $(p, \xi, \eta)$ satisfies (9), (10) and (11) for some choice of $l$, $w$ and $p_j$,

$\sum_{j=1}^{l} w_j G p_j = Gp$,   (95)
$\sum_{j=1}^{l} w_j h_n(T_{YX} G p_j) = \sum_{j=1}^{l} w_j h_n(\Pi_{YX} T_{YX} p_j) = \xi$,   (96)
$\sum_{j=1}^{l} w_j h_m(T_{ZX} G p_j) = \sum_{j=1}^{l} w_j h_m(\Pi_{ZX} T_{ZX} p_j) = \eta$.   (97)

Hence, $(Gp, \xi, \eta)$ satisfies (9), (10) and (11) for the choice of $l$, $w$ and $G p_j$, $j = 1, \ldots, l$. Q.E.D.

Corollary 1: For all $p \in \Delta_k$ and $G \in G_{T_{YX},T_{ZX}}$, one has $C^*_{Gp} = C^*_p$, and so $F^*(Gp, s) = F^*(p, s)$ for any $H(Y|X) \le s \le H(Y)$.

Lemma 9: For any input-symmetric DBC, $\tilde{C} = C^*_u$, where $u$ denotes the uniform distribution.

Proof: For any $(\xi, \eta) \in \tilde{C}$, there exists a distribution $p$ such that $(p, \xi, \eta) \in C$. Let $G_{T_{YX},T_{ZX}} = \{G_1, \ldots, G_l\}$. By Lemma 8, $(G_j p, \xi, \eta) \in C$ for all $j = 1, \ldots, l$. By the convexity of the set $C$,

$(\bar{q}, \xi, \eta) = \frac{1}{l} \sum_{j=1}^{l} (G_j p, \xi, \eta) \in C$,   (98)

where $\bar{q} = \frac{1}{l} \sum_{j=1}^{l} G_j p$. Since $G_{T_{YX},T_{ZX}}$ is a group, for any permutation matrix $G \in G_{T_{YX},T_{ZX}}$,

$G \bar{q} = \frac{1}{l} \sum_{j=1}^{l} G G_j p = \frac{1}{l} \sum_{j=1}^{l} G_j p = \bar{q}$.   (99)

Since $G \bar{q} = \bar{q}$, the $i$th entry and the $j$th entry of $\bar{q}$ are the same if $G$ permutes the $i$th row to the $j$th row. Since the set $G_{T_{YX},T_{ZX}}$ for an input-symmetric DBC is transitive, all the entries of $\bar{q}$ are the same, and so $\bar{q} = u$. This implies that $(\xi, \eta) \in C^*_u$. Since $(\xi, \eta)$ is arbitrarily taken from $\tilde{C}$, one has $\tilde{C} \subseteq C^*_u$. On the other hand, by definition, $C^*_u \subseteq \tilde{C}$. Therefore, $\tilde{C} = C^*_u$. Q.E.D.

Now we state and prove that a uniformly distributed $X$ is optimal for input-symmetric DBCs.

Theorem 9: For any input-symmetric DBC, the capacity region can be achieved by using transmission strategies such that the broadcast signal $X$ is uniformly distributed. As a consequence, the capacity region is

$\overline{\mathrm{co}} \{(R_1, R_2) : R_1 \le s - h_n(T_{YX} e_1),\ R_2 \le h_m(T_{ZX} u) - F^*_{T_{YX},T_{ZX}}(u, s),\ h_n(T_{YX} e_1) \le s \le \ln(n)\}$,   (100)

where $e_1 = (1, 0, \ldots, 0)^T$, $n = |Y|$, and $m = |Z|$.

Proof: Let $q = (q_1, \ldots, q_k)^T$ be the distribution of the channel input $X$ for the input-symmetric DBC $X \rightarrow Y \rightarrow Z$. Since $G_{T_{YX}}$ is transitive, the columns of $T_{YX}$ are permutations of each other. Hence

$H(Y|X) = \sum_{i=1}^{k} q_i H(Y|X = i)$   (101)
$= \sum_{i=1}^{k} q_i h_n(T_{YX} e_i)$   (102)
$= \sum_{i=1}^{k} q_i h_n(T_{YX} e_1)$   (103)
$= h_n(T_{YX} e_1)$,   (104)

which is independent of $q$. Let $l = |G_{T_{YX},T_{ZX}}|$ and $G_{T_{YX},T_{ZX}} = \{G_1, \ldots, G_l\}$. Then

$H(Z) = h_m(T_{ZX} q)$   (105)
$= h_m(T_{ZX} G_i q), \quad i = 1, \ldots, l$   (106)
$= \frac{1}{l} \sum_{i=1}^{l} h_m(T_{ZX} G_i q)$   (107)
$\le h_m\Big(T_{ZX} \frac{1}{l} \sum_{i=1}^{l} G_i q\Big)$   (108)
$= h_m(T_{ZX} u)$,   (109)

where (106) holds because $T_{ZX} G_i q = \Pi_{ZX,i} T_{ZX} q$ and permuting a probability vector does not change its entropy, (108) follows from Jensen's inequality, and (109) uses Lemma 7, by which $\frac{1}{l} \sum_i G_i q = u$. Since $\tilde{C} = C^*_u$ for the input-symmetric DBC,

$F^*(q, s) \ge F^*(u, s)$.   (110)

Plugging (104), (109) and (110) into (7), the expression of the capacity region for the DBC, the capacity region for input-symmetric DBCs satisfies

$\overline{\mathrm{co}} \bigcup_{q \in \Delta_k} \{(R_1, R_2) : R_1 \le s - H(Y|X),\ R_2 \le H(Z) - F^*_{T_{YX},T_{ZX}}(q, s)\}$   (111)
$\subseteq \overline{\mathrm{co}} \bigcup_{q \in \Delta_k} \{(R_1, R_2) : R_1 \le s - h_n(T_{YX} e_1),\ R_2 \le h_m(T_{ZX} u) - F^*_{T_{YX},T_{ZX}}(u, s)\}$   (112)
$= \overline{\mathrm{co}} \{(R_1, R_2) : R_1 \le s - h_n(T_{YX} e_1),\ R_2 \le h_m(T_{ZX} u) - F^*_{T_{YX},T_{ZX}}(u, s)\}$   (113)
$= \overline{\mathrm{co}} \{(R_1, R_2) : p_X = u,\ R_1 \le s - H(Y|X),\ R_2 \le H(Z) - F^*_{T_{YX},T_{ZX}}(u, s)\}$   (114)
$\subseteq \overline{\mathrm{co}} \bigcup_{q \in \Delta_k} \{(R_1, R_2) : R_1 \le s - H(Y|X),\ R_2 \le H(Z) - F^*_{T_{YX},T_{ZX}}(q, s)\}$.   (115)

Note that (111) and (115) are identical expressions; hence (111)-(115) are all equal. Therefore, (100) and (113) express the capacity region for the input-symmetric DBC, which also means that the capacity region can be achieved by using transmission strategies in which the broadcast signal $X$ is uniformly distributed. Q.E.D.

E. Permutation encoding approach and its optimality

The permutation encoding approach is an independent-encoding scheme which achieves the capacity region for input-symmetric DBCs. The block diagram of this approach is shown in Fig. 10. In Fig. 10, $W_1$ is the message for User 1, who sees the better channel $T_{YX}$, and $W_2$ is the message for User 2, who sees the worse channel $T_{ZX}$. The permutation encoding approach first independently encodes these two messages into two codewords $\mathbf{X}_1$ and $\mathbf{X}_2$, and then combines these two independent codewords using a single-letter operation.

[Fig. 10: The block diagram of the permutation encoding approach]

[Fig. 11: The structure of the successive decoder for input-symmetric DBCs]

Let $G_s$ be a smallest transitive subset of $G_{T_{YX},T_{ZX}}$. Denote $k = |X|$ and $l_s = |G_s|$. Use a random coding technique to design the codebook for User 1 according to the $k$-ary random variable $X_1$ with distribution $p_1$, and the codebook for User 2 according to the $l_s$-ary random variable $X_2$ with the uniform distribution. Let $G_s = \{G_1, \ldots, G_{l_s}\}$. Define the permutation function $g_{x_2}(x_1) = x$ if the permutation matrix $G_{x_2}$ maps the $x_1$th column to the $x$th column, where $x_2 \in \{1, \ldots, l_s\}$ and $x, x_1 \in \{1, \ldots, k\}$. Hence, $g_{x_2}(x_1) = x$ if and only if the entry of $G_{x_2}$ in the $x_1$th row and the $x$th column is 1. The permutation encoding approach then broadcasts $X$, which is obtained by applying the single-letter permutation function $X = g_{X_2}(X_1)$ to the symbols of the codewords $\mathbf{X}_1$ and $\mathbf{X}_2$. Since $X_2$ is uniformly distributed and $\sum_{j=1}^{l_s} G_j = \frac{l_s}{k} \mathbf{1}\mathbf{1}^T$, the broadcast signal $X$ is also uniformly distributed. User 2 receives $\mathbf{Z}$ and decodes the desired message directly. User 1 receives $\mathbf{Y}$ and successively decodes the message for User 2 and then the message for User 1. The structure of the successive decoder is shown in Fig. 11. Note that Decoder 1 in Fig. 11 is not a joint decoder even though it has two inputs, $\mathbf{Y}$ and $\hat{\mathbf{X}}_2$. In particular, for the group-additive DBC with $Y \equiv X \oplus N_1$ and $Z \equiv Y \oplus N_2$, the permutation function $g_{x_2}(x_1)$ is the group addition $x_2 \oplus x_1$. Hence the permutation encoding approach for the group-additive DBC is the NE scheme for the group-additive DBC.
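To make the single-letter combining concrete, here is a self-contained sketch (Python) using the cyclic-group matrices $G_x$ of (90) with $n = 4$ (an assumed example; 0-indexed in code). With the convention of (90), $g_{x_2}(x_1)$ works out to $x_1 - x_2 \bmod n$, which is the group operation up to a relabeling of $X_2$'s alphabet; the simulation also checks that a uniform $X_2$ makes the broadcast symbol $X$ uniform regardless of $p_1$:

```python
import numpy as np

n = 4
Gs = [np.roll(np.eye(n), x, axis=0) for x in range(n)]   # cyclic G_x of (90)

def g(x2, x1):
    """Permutation function of Fig. 10: the x with G_{x2}[x1, x] = 1."""
    return int(np.argmax(Gs[x2][x1]))

rng = np.random.default_rng(1)
x1 = rng.choice(n, size=100_000, p=[0.5, 0.2, 0.2, 0.1])  # any codebook pmf p1
x2 = rng.integers(0, n, size=100_000)                     # uniform X2
x = np.array([g(b, a) for a, b in zip(x1, x2)])
print(np.bincount(x, minlength=n) / x.size)               # approx. uniform
```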