Mismatched Multi-letter Successive Decoding for the Multiple-Access Channel

Jonathan Scarlett, University of Cambridge, jms265@cam.ac.uk
Alfonso Martinez, Universitat Pompeu Fabra, alfonso.martinez@ieee.org
Albert Guillén i Fàbregas, ICREA & Universitat Pompeu Fabra, University of Cambridge, guillen@ieee.org

Abstract

This paper studies channel coding for the discrete memoryless multiple-access channel with a given decoding rule. A multi-letter successive decoding rule depending on an arbitrary non-negative function $q(x_1, x_2, y)$ is considered, and an achievable rate region and error exponent are derived. The rate region is compared with that of the maximum-metric decoder which uses the function $q(x_1, x_2, y)$, and a numerical example is given for which successive decoding yields a strictly higher sum rate for a given pair of input distributions.

I. INTRODUCTION

The mismatched decoding problem [1]-[3] seeks to characterize the performance of channel coding when the decoding rule is fixed and possibly suboptimal (e.g. due to channel uncertainty or implementation constraints). Extensions of this problem to multiuser settings are not only of interest in their own right, but can also provide valuable insight into the single-user setting [3]-[5]. In particular, significant attention has been paid to the mismatched multiple-access channel (MAC), for which, given the length-$n$ output vector $\mathbf{y}$ and codebooks $\mathcal{C}_\nu = \{\mathbf{x}_\nu^{(1)}, \dots, \mathbf{x}_\nu^{(M_\nu)}\}$ ($\nu = 1, 2$), the decoder estimates the message pair as

$$(\hat m_1, \hat m_2) = \arg\max_{(i,j)} q^n(\mathbf{x}_1^{(i)}, \mathbf{x}_2^{(j)}, \mathbf{y}), \tag{1}$$

where $q^n(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \triangleq \prod_{i=1}^{n} q(x_{1,i}, x_{2,i}, y_i)$ for some non-negative decoding metric $q(x_1, x_2, y)$.

Given that the decoder only knows the metric $q^n(\mathbf{x}_1^{(i)}, \mathbf{x}_2^{(j)}, \mathbf{y})$ corresponding to each codeword pair, one may question whether there exists a decoding rule with better performance than the maximum-metric rule in (1). In general, this question is only interesting if reasonable decoding rules are considered.
For example, if the values $\log q(x_1, x_2, y)$ are rationally independent (i.e. no value can be written as a linear combination of the others with rational coefficients), then there is a one-to-one correspondence between the joint empirical distribution of $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y})$ and the possible values of $q^n(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y})$, and hence the decoder can implement the maximum-likelihood (ML) rule (assuming the channel is memoryless).

This work has been funded in part by the European Research Council under ERC grant agreement 259663, by the European Union's 7th Framework Programme (PEOPLE-2011-CIG) under grant agreement 303633 and by the Spanish Ministry of Economy and Competitiveness under grants RYC-2011-08150 and TEC2012-38800-C03-03.

In this paper, we consider successive decoding of the form

$$\hat m_1 = \arg\max_{i} \sum_{j} q^n(\mathbf{x}_1^{(i)}, \mathbf{x}_2^{(j)}, \mathbf{y}) \tag{2}$$

$$\hat m_2 = \arg\max_{j} q^n(\mathbf{x}_1^{(\hat m_1)}, \mathbf{x}_2^{(j)}, \mathbf{y}). \tag{3}$$

Stated formally, we have the following. Let $W(y|x_1, x_2)$ be the transition law of a memoryless MAC, and let $q(x_1, x_2, y)$ be an arbitrary non-negative function. The alphabets are denoted by $\mathcal{X}_1$, $\mathcal{X}_2$ and $\mathcal{Y}$, and each is assumed to be finite. Encoder $\nu = 1, 2$ takes as input $m_\nu$ equiprobable on $\{1, \dots, M_\nu\}$, and transmits the corresponding codeword $\mathbf{x}_\nu^{(m_\nu)}$ from a codebook $\mathcal{C}_\nu$. We say that a rate pair $(R_1, R_2)$ is achievable if, for all $\delta > 0$, there exist sequences of codebooks $\mathcal{C}_{1,n}$ and $\mathcal{C}_{2,n}$ with $M_1 \ge e^{n(R_1 - \delta)}$ and $M_2 \ge e^{n(R_2 - \delta)}$ respectively, such that $\mathbb{P}[(\hat m_1, \hat m_2) \ne (m_1, m_2)] \to 0$ under the decoding rule described by (2)-(3).

Letting $E_\nu \triangleq \{\hat m_\nu \ne m_\nu\}$ for $\nu = 1, 2$, we observe that if $q(x_1, x_2, y) = W(y|x_1, x_2)$, then (2) is the decision rule which minimizes $\mathbb{P}[E_1]$. That is, (2) is a mismatched version of the optimal decoding rule for (one user of) the interference channel (IC). Thus, as well as giving an achievable rate region for the MAC with mismatched successive decoding, our results will quantify the loss due to mismatch for the IC. In particular, we obtain an achievable error exponent using different techniques to those of [6].
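To make the difference between the maximum-metric rule (1) and the successive rule (2)-(3) concrete, the following Python sketch implements both by brute force. It is only an illustration under stated assumptions: the function names, the nested-list layout `q[x1][x2][y]`, the zero-based message indices, and the toy metric `q_toy` are all hypothetical choices that do not come from the paper.

```python
import math
from itertools import product

def qn(q, x1, x2, y):
    """Multi-letter metric q^n(x1, x2, y): product of per-letter values q[a][b][c]."""
    return math.prod(q[a][b][c] for a, b, c in zip(x1, x2, y))

def max_metric_decode(q, C1, C2, y):
    """Rule (1): return the codeword pair (i, j) with the largest q^n."""
    pairs = product(range(len(C1)), range(len(C2)))
    return max(pairs, key=lambda ij: qn(q, C1[ij[0]], C2[ij[1]], y))

def successive_decode(q, C1, C2, y):
    """Rules (2)-(3): decode user 1 from the metric summed over user 2's
    codewords, then decode user 2 given the estimate for user 1."""
    m1 = max(range(len(C1)),
             key=lambda i: sum(qn(q, C1[i], x2, y) for x2 in C2))
    m2 = max(range(len(C2)), key=lambda j: qn(q, C1[m1], C2[j], y))
    return m1, m2

# Hypothetical toy metric q[x1][x2][y] on binary alphabets, block length n = 1.
# These numbers are made up for illustration and are not taken from the paper.
q_toy = [[[0.5, 0.5], [0.0, 1.0]],
         [[0.3, 0.7], [0.35, 0.65]]]
C1 = [(0,), (1,)]   # user 1 codebook (messages indexed from 0 here)
C2 = [(0,), (1,)]   # user 2 codebook
y = (0,)            # received sequence

print(max_metric_decode(q_toy, C1, C2, y))   # (0, 0)
print(successive_decode(q_toy, C1, C2, y))   # (1, 1)
```

On this toy input the two rules disagree: the pair (0, 0) has the largest single metric (0.5), whereas user 1's codeword 1 has the larger summed metric (0.3 + 0.35 versus 0.5 + 0.0). The example only shows that the rules are different decoders; it says nothing about achievable rates.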
It can be shown that the exponents and rates with $q = W$ coincide with those of ML decoding (i.e. (1) with $q = W$); this is done by noting that (2) minimizes $\mathbb{P}[E_1]$, (3) minimizes the probability of favoring some $(m_1, j)$ ($j \ne m_2$) over $(m_1, m_2)$, and (1) minimizes $\mathbb{P}[E_1 \cup E_2]$. In contrast, we will see that when $q \ne W$, the successive decoder can lead to significantly different rate regions to those of maximum-metric decoding.

Notation: Bold symbols are used for vectors (e.g. $\mathbf{x}$), and the corresponding $i$-th entry is written using a subscript (e.g. $x_i$). Subscripts are used to denote the distributions corresponding to expectations and mutual informations (e.g. $\mathbb{E}_P[\cdot]$, $I_P(X;Y)$). The marginals of a joint distribution $P_{XY}$ are denoted by $P_X$ and $P_Y$. We write $\widetilde P_X = P_X$ to denote element-wise equality between two probability distributions on the same alphabet. The set of all sequences of length $n$ with a given empirical distribution $P_X$ (i.e. type [7, Ch. 2]) is denoted by $T^n(P_X)$. We write $f(n) \doteq g(n)$ if $\lim_{n\to\infty} \frac{1}{n} \log \frac{f(n)}{g(n)} = 0$, and similarly for $\dot\le$ and $\dot\ge$. We write $[\alpha]^+ = \max(0, \alpha)$, and denote the indicator function by $\mathbb{1}\{\cdot\}$.

II. MAIN RESULT

We fix the input distributions $Q_1$ and $Q_2$, let $P_{X_1X_2Y} \triangleq Q_1 \times Q_2 \times W$, and define the functions

$$F(\widetilde P_{X_1X_2Y}, \widetilde P'_{X_1X_2Y}, R_2) \triangleq \max\Big\{ \mathbb{E}_{\widetilde P}[\log q(X_1, X_2, Y)],\ \mathbb{E}_{\widetilde P'}[\log q(X_1, X_2, Y)] + \big[R_2 - I_{\widetilde P'}(X_2; X_1, Y)\big]^+ \Big\}, \tag{4}$$

$$\underline{F}(P_{X_1X_2Y}, R_2) \triangleq \max\Big\{ \mathbb{E}_{P}[\log q(X_1, X_2, Y)],\ \max_{P'_{X_1X_2Y} \in T_1'(P_{X_1X_2Y}, R_2)} \mathbb{E}_{P'}[\log q(X_1, X_2, Y)] + R_2 - I_{P'}(X_2; X_1, Y) \Big\}, \tag{5}$$

and the sets

$$T_1(P_{X_1X_2Y}, R_2) \triangleq \Big\{ (\widetilde P_{X_1X_2Y}, \widetilde P'_{X_1X_2Y}) : \widetilde P_{X_2Y} = P_{X_2Y},\ \widetilde P_{X_1} = P_{X_1},\ \widetilde P'_{X_1Y} = \widetilde P_{X_1Y},\ \widetilde P'_{X_2} = P_{X_2},\ F(\widetilde P_{X_1X_2Y}, \widetilde P'_{X_1X_2Y}, R_2) \ge \underline{F}(P_{X_1X_2Y}, R_2) \Big\} \tag{6}$$

$$T_1'(P_{X_1X_2Y}, R_2) \triangleq \Big\{ P'_{X_1X_2Y} : P'_{X_1Y} = P_{X_1Y},\ P'_{X_2} = P_{X_2},\ I_{P'}(X_2; X_1, Y) \le R_2 \Big\} \tag{7}$$

$$T_2(P_{X_1X_2Y}) \triangleq \Big\{ \widetilde P_{X_1X_2Y} : \widetilde P_{X_2} = P_{X_2},\ \widetilde P_{X_1Y} = P_{X_1Y},\ \mathbb{E}_{\widetilde P}[\log q(X_1, X_2, Y)] \ge \mathbb{E}_{P}[\log q(X_1, X_2, Y)] \Big\}. \tag{8}$$

Theorem 1. For any input distributions $Q_1$ and $Q_2$, the pair $(R_1, R_2)$ is achievable for the decoder in (2)-(3) provided that

$$R_1 \le \min_{(\widetilde P_{X_1X_2Y}, \widetilde P'_{X_1X_2Y}) \in T_1(P_{X_1X_2Y}, R_2)} I_{\widetilde P}(X_1; X_2, Y) + \big[I_{\widetilde P'}(X_2; X_1, Y) - R_2\big]^+ \tag{9}$$

$$R_2 \le \min_{\widetilde P_{X_1X_2Y} \in T_2(P_{X_1X_2Y})} I_{\widetilde P}(X_2; X_1, Y). \tag{10}$$

Proof: See Section III.

The minimization in (9) is a non-convex optimization problem, but it can be cast in terms of convex optimization problems; see the Appendix for details. While our focus is on achievable rates, the proof of Theorem 1 reveals that the error exponent corresponding to (10) coincides with that of one of the three error events for maximum-metric decoding [5], and the error exponent corresponding to (9) is given by

$$\min_{P_{X_1X_2Y} : P_{X_1} = Q_1,\ P_{X_2} = Q_2} D(P_{X_1X_2Y} \,\|\, Q_1 \times Q_2 \times W) + \big[I_0(P_{X_1X_2Y}, R_2) - R_1\big]^+, \tag{11}$$

where $I_0(P_{X_1X_2Y}, R_2)$ denotes the right-hand side of (9) with an arbitrary distribution $P_{X_1X_2Y}$ used in (5)-(8) (rather than $P_{X_1X_2Y} = Q_1 \times Q_2 \times W$).

Figure 1. Achievable rate regions for the channel given in (12). [Plot of $R_2$ (bits/use) against $R_1$ (bits/use) with three curves: Matched, Maximum Metric, Successive.]

A. Numerical Example

We consider the MAC with $\mathcal{X}_1 = \mathcal{X}_2 = \{0, 1\}$, $\mathcal{Y} = \{0, 1, 2\}$, and

$$W(y|x_1, x_2) = \begin{cases} 1 - 2\eta_{x_1x_2} & y = x_1 + x_2 \\ \eta_{x_1x_2} & \text{otherwise}, \end{cases} \tag{12}$$

where the $\eta_{x_1x_2}$ are constants. The mismatched decoder uses $q(x_1, x_2, y)$ of a similar form, but with a fixed value $\eta$ in place of $\eta_{x_1x_2}$. We set $\eta_{00} = 0.01$, $\eta_{01} = 0.1$, $\eta_{10} = 0.01$, $\eta_{11} = 0.3$, $\eta = 0.15$, and $Q_1 = Q_2 = (0.5, 0.5)$.

Figure 1 plots the achievable rate regions of successive decoding (Theorem 1), maximum-metric decoding (see [3], [5]), and matched decoding (yielding the same region whether successive or maximum-metric). Interestingly, neither of the mismatched rate regions dominates the other, thus suggesting that the two decoding rules are fundamentally different. For the given input distribution, the sum rate for successive decoding exceeds that of maximum-metric decoding. Furthermore, upon taking the convex hull (which is justified by a time-sharing argument [3], [8]), the region for successive decoding is strictly larger. While we observed similar behaviors for other choices of $Q_1$ and $Q_2$, it remains unclear whether this always holds. Furthermore, while the rate region for maximum-metric decoding is tight with respect to the ensemble average [3], it is unclear whether the same is true for that of successive decoding.

The vertical line at $R_1 \approx 0.1$ is analogous to the interference channel, where for $R_1$ below a certain threshold, $R_2$ can take any value while still ensuring user 1's message is estimated correctly [6]. Due to the mismatch, this induces a non-pentagonal shape in the present example.

III. PROOF OF THEOREM 1

Our analysis is based on the method of type class enumeration (e.g. see [6], [9], [10]), and is perhaps most similar to that of Somekh-Baruch and Merhav [10]. We consider constant-composition random coding, where for $\nu = 1, 2$ we have

$$P_{\mathbf{X}_\nu}(\mathbf{x}_\nu) = \frac{1}{|T^n(Q_\nu)|} \mathbb{1}\{\mathbf{x}_\nu \in T^n(Q_\nu)\}. \tag{13}$$
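As a minimal sketch of the ingredients used so far, the Python snippet below builds the example channel $W$ of (12), the mismatched metric $q$ with the fixed value $\eta = 0.15$, the joint distribution $Q_1 \times Q_2 \times W$, and a uniform draw from a type class as in (13). It also evaluates two of the quantities that appear in (4)-(10), namely $\mathbb{E}_P[\log q]$ and $I_P(X_2; X_1, Y)$, at $P = Q_1 \times Q_2 \times W$. All function and variable names are hypothetical, logarithms are natural (nats), and the snippet does not attempt the optimizations in Theorem 1 or reproduce Figure 1.

```python
import math
import random

# eta_{x1 x2} and the decoder's fixed eta, as quoted in the numerical example.
ETA = {(0, 0): 0.01, (0, 1): 0.1, (1, 0): 0.01, (1, 1): 0.3}
ETA_DECODER = 0.15
Q1 = Q2 = (0.5, 0.5)
INPUTS = [(a, b) for a in (0, 1) for b in (0, 1)]

def law(eta, x1, x2):
    """Row of (12): mass 1 - 2*eta on y = x1 + x2, and eta on each other output."""
    return [1 - 2 * eta if y == x1 + x2 else eta for y in range(3)]

W = {(a, b): law(ETA[a, b], a, b) for a, b in INPUTS}    # true channel
q = {(a, b): law(ETA_DECODER, a, b) for a, b in INPUTS}  # mismatched metric

def joint(Q1, Q2, W):
    """P(x1, x2, y) = Q1(x1) * Q2(x2) * W(y | x1, x2)."""
    return {(a, b, y): Q1[a] * Q2[b] * W[a, b][y]
            for a, b in INPUTS for y in range(3)}

def expected_log_q(P, q):
    """E_P[log q(X1, X2, Y)] in nats."""
    return sum(p * math.log(q[a, b][y]) for (a, b, y), p in P.items() if p > 0)

def mutual_info_x2_x1y(P):
    """I_P(X2; X1, Y) in nats for a joint pmf P over (x1, x2, y)."""
    p2, p1y = {}, {}
    for (a, b, y), p in P.items():
        p2[b] = p2.get(b, 0.0) + p
        p1y[a, y] = p1y.get((a, y), 0.0) + p
    return sum(p * math.log(p / (p2[b] * p1y[a, y]))
               for (a, b, y), p in P.items() if p > 0)

def constant_composition_codeword(Q, n, rng):
    """Draw x uniformly from T^n(Q); assumes every n * Q(a) is an integer."""
    counts = [round(n * p) for p in Q]
    if sum(counts) != n or any(abs(c - n * p) > 1e-9 for c, p in zip(counts, Q)):
        raise ValueError("Q must be a type with denominator n")
    x = [a for a, c in enumerate(counts) for _ in range(c)]
    rng.shuffle(x)  # a uniform shuffle of a fixed multiset is uniform on the type class
    return x

P = joint(Q1, Q2, W)
codeword = constant_composition_codeword(Q1, 10, random.Random(0))
print(expected_log_q(P, q), mutual_info_x2_x1y(P), codeword)
```

Drawing a codeword by shuffling a sequence with the prescribed symbol counts is one simple way to realize (13) when $Q_\nu$ is a type; for general $Q_\nu$ one would first round it to a type, as the proof notes.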

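Here we assume that $Q_1$ and $Q_2$ are types for notational convenience; more generally, we can approximate these by types and the analysis is unchanged. The (independent) random codewords are denoted by $(\mathbf{X}_\nu^{(1)}, \dots, \mathbf{X}_\nu^{(M_\nu)})$. We assume without loss of generality that $m_1 = m_2 = 1$, and we write $\mathbf{X}_\nu = \mathbf{X}_\nu^{(1)}$ and let $\overline{\mathbf{X}}_\nu$ denote an arbitrary $\mathbf{X}_\nu^{(j)}$ with $j \ne 1$. The output sequence is denoted by $\mathbf{Y}$, and we write $R_\nu \triangleq \frac{1}{n} \log M_\nu$ ($\nu = 1, 2$).

As noted by Grant et al. [11], we can analyze the error probability of the second decoding step (see (3)) assuming that no error occurred on the first step (see (2)), while still using the unconditional statistics of $(\mathbf{X}_1, \mathbf{X}_2, \mathbf{Y})$. The subsequent analysis has been done in the study of maximum-metric decoding [3], [5], and the corresponding rate condition is precisely (10). In the remainder of this section, we focus on the first decoding step.

Let $p_{e,1}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y})$ denote the random-coding error probability for the first decoding step conditioned on $(\mathbf{X}_1^{(1)}, \mathbf{X}_2^{(1)}, \mathbf{Y}) = (\mathbf{x}_1, \mathbf{x}_2, \mathbf{y})$. The joint type of $(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y})$ is denoted by $P_{X_1X_2Y}$ (see footnote 1). We write the objective in (2) as

$$\Xi_{\mathbf{x}_2\mathbf{y}}(\mathbf{x}_1) \triangleq q^n(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) + \sum_{j \ne 1} q^n(\mathbf{x}_1, \mathbf{X}_2^{(j)}, \mathbf{y}), \tag{14}$$

which is random due to the randomness of $\mathbf{X}_2^{(j)}$. Using the union bound, we have

$$p_{e,1}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \le (M_1 - 1)\, \mathbb{P}\big[\Xi_{\mathbf{x}_2\mathbf{y}}(\overline{\mathbf{X}}_1) \ge \Xi_{\mathbf{x}_2\mathbf{y}}(\mathbf{x}_1)\big]. \tag{15}$$

Footnote 1. This is a slight abuse of notation in light of the previous definition $P_{X_1X_2Y} = Q_1 \times Q_2 \times W$, but this substitution will be made later.

We proceed by analyzing the statistics of $\Xi_{\mathbf{x}_2\mathbf{y}}$. From (14),

$$\Xi_{\mathbf{x}_2\mathbf{y}}(\overline{\mathbf{x}}_1) = q^n(\widetilde P_{X_1X_2Y}) + \sum_{\widetilde P'_{X_1X_2Y}} N_{\overline{\mathbf{x}}_1\mathbf{y}}(\widetilde P'_{X_1X_2Y})\, q^n(\widetilde P'_{X_1X_2Y}), \tag{16}$$

where $\widetilde P_{X_1X_2Y}$ is the joint type of $(\overline{\mathbf{x}}_1, \mathbf{x}_2, \mathbf{y})$, $N_{\overline{\mathbf{x}}_1\mathbf{y}}(\widetilde P'_{X_1X_2Y})$ is the random number of $\mathbf{X}_2^{(j)}$ ($j \ne 1$) such that $(\overline{\mathbf{x}}_1, \mathbf{X}_2^{(j)}, \mathbf{y}) \in T^n(\widetilde P'_{X_1X_2Y})$, and we write $q^n(\widetilde P'_{X_1X_2Y}) \triangleq q^n(\overline{\mathbf{x}}_1, \overline{\mathbf{x}}_2, \mathbf{y})$ for an arbitrary triplet $(\overline{\mathbf{x}}_1, \overline{\mathbf{x}}_2, \mathbf{y}) \in T^n(\widetilde P'_{X_1X_2Y})$. Since the codewords are generated independently, $N_{\overline{\mathbf{x}}_1\mathbf{y}}(\widetilde P'_{X_1X_2Y})$ is binomially distributed with $M_2 - 1$ trials and success probability $\mathbb{P}\big[(\overline{\mathbf{x}}_1, \overline{\mathbf{X}}_2, \mathbf{y}) \in T^n(\widetilde P'_{X_1X_2Y})\big]$. By construction, we have $N_{\overline{\mathbf{x}}_1\mathbf{y}}(\widetilde P'_{X_1X_2Y}) = 0$ unless $\widetilde P'_{X_1X_2Y} \in S_1'(Q_2, \widetilde P_{X_1Y})$, where

$$S_1'(Q_2, \widetilde P_{X_1Y}) \triangleq \Big\{ \widetilde P'_{X_1X_2Y} : \widetilde P'_{X_1Y} = \widetilde P_{X_1Y},\ \widetilde P'_{X_2} = Q_2 \Big\}. \tag{17}$$

The following lemma characterizes the behavior of $N_{\overline{\mathbf{x}}_1\mathbf{y}}(\widetilde P'_{X_1X_2Y})$ for fixed $R_2$ and $\widetilde P'_{X_1X_2Y}$. The proof can be found in [6], [10], and is based on the fact that

$$\mathbb{P}\big[(\overline{\mathbf{x}}_1, \overline{\mathbf{X}}_2, \mathbf{y}) \in T^n(\widetilde P'_{X_1X_2Y})\big] \doteq e^{-n I_{\widetilde P'}(X_2; X_1, Y)}. \tag{18}$$

Roughly speaking, the lemma states that if $R_2 > I_{\widetilde P'}(X_2; X_1, Y)$ then the corresponding type enumerator is highly concentrated about its mean, whereas if $R_2 < I_{\widetilde P'}(X_2; X_1, Y)$ then the type enumerator takes a subexponential value (possibly zero) with overwhelming probability.

Lemma 1. [6], [10] Fix the pair $(\overline{\mathbf{x}}_1, \mathbf{y}) \in T^n(\widetilde P_{X_1Y})$, a constant $\delta > 0$, and a type $\widetilde P'_{X_1X_2Y} \in S_1'(Q_2, \widetilde P_{X_1Y})$.

(i) If $R_2 \ge I_{\widetilde P'}(X_2; X_1, Y) + \delta$, then

$$M_2 e^{-n(I_{\widetilde P'}(X_2; X_1, Y) + \delta)} \le N_{\overline{\mathbf{x}}_1\mathbf{y}}(\widetilde P'_{X_1X_2Y}) \tag{19}$$

$$\le M_2 e^{-n(I_{\widetilde P'}(X_2; X_1, Y) - \delta)} \tag{20}$$

with probability approaching one super-exponentially fast.

(ii) If $R_2 < I_{\widetilde P'}(X_2; X_1, Y) + \delta$, then

$$N_{\overline{\mathbf{x}}_1\mathbf{y}}(\widetilde P'_{X_1X_2Y}) \le e^{2n\delta} \tag{21}$$

with probability approaching one super-exponentially fast.

Given a joint type $\widetilde P_{X_1X_2Y}$, let $A_\delta(\widetilde P_{X_1X_2Y})$ denote the event that the high-probability events in Lemma 1 occur for all $\widetilde P'_{X_1X_2Y} \in S_1'(Q_2, \widetilde P_{X_1Y})$. Since $\mathbb{P}[A_\delta(\widetilde P_{X_1X_2Y})] \to 1$ super-exponentially fast, we can safely condition any event on $A_\delta(\widetilde P_{X_1X_2Y})$ without changing the exponential behavior of the corresponding probability. Conditioned on $A_\delta(P_{X_1X_2Y})$, we have the following:

$$\Xi_{\mathbf{x}_2\mathbf{y}}(\mathbf{x}_1) = q^n(P_{X_1X_2Y}) + \sum_{P'_{X_1X_2Y}} N_{\mathbf{x}_1\mathbf{y}}(P'_{X_1X_2Y})\, q^n(P'_{X_1X_2Y}) \tag{22}$$

$$\ge q^n(P_{X_1X_2Y}) + \max_{\substack{P'_{X_1X_2Y} \in S_1'(Q_2, P_{X_1Y}) \\ R_2 \ge I_{P'}(X_2; X_1, Y) + \delta}} N_{\mathbf{x}_1\mathbf{y}}(P'_{X_1X_2Y})\, q^n(P'_{X_1X_2Y}) \tag{23}$$

$$\ge q^n(P_{X_1X_2Y}) + \max_{\substack{P'_{X_1X_2Y} \in S_1'(Q_2, P_{X_1Y}) \\ R_2 \ge I_{P'}(X_2; X_1, Y) + \delta}} M_2 e^{-n(I_{P'}(X_2; X_1, Y) + \delta)}\, q^n(P'_{X_1X_2Y}) \tag{24}$$

$$\triangleq F_\delta(P_{X_1X_2Y}), \tag{25}$$

where (24) follows from part (i) of Lemma 1. Unlike $\Xi_{\mathbf{x}_2\mathbf{y}}(\mathbf{x}_1)$, the quantity $F_\delta(P_{X_1X_2Y})$ is deterministic. Substituting (25) into (15), we obtain

$$p_{e,1}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \le M_1\, \mathbb{P}\big[\Xi_{\mathbf{x}_2\mathbf{y}}(\overline{\mathbf{X}}_1) \ge F_\delta(P_{X_1X_2Y})\big]. \tag{26}$$

Since the statistics of $\Xi_{\mathbf{x}_2\mathbf{y}}(\overline{\mathbf{x}}_1)$ depend on $\overline{\mathbf{x}}_1$ only through the joint type of $(\overline{\mathbf{x}}_1, \mathbf{x}_2, \mathbf{y})$, we can write (26) as follows:

$$p_{e,1}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \le M_1 \sum_{\widetilde P_{X_1X_2Y}} \mathbb{P}\big[(\overline{\mathbf{X}}_1, \mathbf{x}_2, \mathbf{y}) \in T^n(\widetilde P_{X_1X_2Y})\big]\, \mathbb{P}\big[\Xi_{\mathbf{x}_2\mathbf{y}}(\overline{\mathbf{x}}_1) \ge F_\delta(P_{X_1X_2Y})\big] \tag{27}$$

$$\doteq M_1 \max_{\widetilde P_{X_1X_2Y} \in S_1(Q_1, P_{X_2Y})} e^{-n I_{\widetilde P}(X_1; X_2, Y)}\, \mathbb{P}\big[\Xi_{\mathbf{x}_2\mathbf{y}}(\overline{\mathbf{x}}_1) \ge F_\delta(P_{X_1X_2Y})\big], \tag{28}$$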

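where $\overline{\mathbf{x}}_1$ denotes an arbitrary sequence such that $(\overline{\mathbf{x}}_1, \mathbf{x}_2, \mathbf{y}) \in T^n(\widetilde P_{X_1X_2Y})$, and

$$S_1(Q_1, P_{X_2Y}) \triangleq \Big\{ \widetilde P_{X_1X_2Y} : \widetilde P_{X_1} = Q_1,\ \widetilde P_{X_2Y} = P_{X_2Y} \Big\}. \tag{29}$$

In (28), we have used an analogous property to (18).

Next, we again use Lemma 1 in order to replace $\Xi_{\mathbf{x}_2\mathbf{y}}(\overline{\mathbf{x}}_1)$ in (28) by a deterministic quantity. We have from (16) that

$$\Xi_{\mathbf{x}_2\mathbf{y}}(\overline{\mathbf{x}}_1) \le q^n(\widetilde P_{X_1X_2Y}) + p_0(n) \max_{\widetilde P'_{X_1X_2Y}} N_{\overline{\mathbf{x}}_1\mathbf{y}}(\widetilde P'_{X_1X_2Y})\, q^n(\widetilde P'_{X_1X_2Y}), \tag{30}$$

where $p_0(n)$ is a polynomial corresponding to the total number of joint types. Substituting (30) into (28), we obtain

$$p_{e,1}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \,\dot\le\, M_1 \max_{\widetilde P_{X_1X_2Y} \in S_1(Q_1, P_{X_2Y})} e^{-n I_{\widetilde P}(X_1; X_2, Y)} \max_{\widetilde P'_{X_1X_2Y}} \mathbb{P}\big[E_{P, \widetilde P}(\widetilde P'_{X_1X_2Y})\big], \tag{31}$$

where

$$E_{P, \widetilde P}(\widetilde P'_{X_1X_2Y}) \triangleq \Big\{ q^n(\widetilde P_{X_1X_2Y}) + p_0(n)\, N_{\overline{\mathbf{x}}_1\mathbf{y}}(\widetilde P'_{X_1X_2Y})\, q^n(\widetilde P'_{X_1X_2Y}) \ge F_\delta(P_{X_1X_2Y}) \Big\}, \tag{32}$$

and we have used the union bound to take the maximum over $\widetilde P'_{X_1X_2Y}$ outside the probability in (31). Continuing, we have

$$\max_{\widetilde P'_{X_1X_2Y}} \mathbb{P}\big[E_{P, \widetilde P}(\widetilde P'_{X_1X_2Y})\big] = \max\Bigg\{ \max_{\substack{\widetilde P'_{X_1X_2Y} \in S_1'(Q_2, \widetilde P_{X_1Y}) \\ R_2 \ge I_{\widetilde P'}(X_2; X_1, Y) + \delta}} \mathbb{P}\big[E_{P, \widetilde P}(\widetilde P'_{X_1X_2Y})\big],\ \max_{\substack{\widetilde P'_{X_1X_2Y} \in S_1'(Q_2, \widetilde P_{X_1Y}) \\ R_2 < I_{\widetilde P'}(X_2; X_1, Y) + \delta}} \mathbb{P}\big[E_{P, \widetilde P}(\widetilde P'_{X_1X_2Y})\big] \Bigg\}. \tag{33}$$

For the first maximization in (33), observe that conditioned on $A_\delta(\widetilde P_{X_1X_2Y})$ (defined following Lemma 1), we have for $\widetilde P'_{X_1X_2Y}$ satisfying $R_2 \ge I_{\widetilde P'}(X_2; X_1, Y) + \delta$ that

$$N_{\overline{\mathbf{x}}_1\mathbf{y}}(\widetilde P'_{X_1X_2Y})\, q^n(\widetilde P'_{X_1X_2Y}) \le M_2 e^{-n(I_{\widetilde P'}(X_2; X_1, Y) - \delta)}\, q^n(\widetilde P'_{X_1X_2Y}). \tag{34}$$

Hence, and using Lemma 1, we have

$$\mathbb{P}\big[E_{P, \widetilde P}(\widetilde P'_{X_1X_2Y})\big] \,\dot\le\, \mathbb{1}\Big\{ q^n(\widetilde P_{X_1X_2Y}) + M_2\, p_0(n)\, e^{-n(I_{\widetilde P'}(X_2; X_1, Y) - \delta)}\, q^n(\widetilde P'_{X_1X_2Y}) \ge F_\delta(P_{X_1X_2Y}) \Big\}. \tag{35}$$

For the second maximization in (33), we define the event $B \triangleq \big\{ N_{\overline{\mathbf{x}}_1\mathbf{y}}(\widetilde P'_{X_1X_2Y}) > 0 \big\}$, yielding

$$\mathbb{P}[B] \,\dot\le\, M_2 e^{-n I_{\widetilde P'}(X_2; X_1, Y)}, \tag{36}$$

which follows from the union bound and the identity in (18). Whenever $R_2 < I_{\widetilde P'}(X_2; X_1, Y) + \delta$, we have

$$\mathbb{P}\big[E_{P, \widetilde P}(\widetilde P'_{X_1X_2Y})\big] \le \mathbb{P}\big[E_{P, \widetilde P}(\widetilde P'_{X_1X_2Y}) \,\big|\, B^c\big] + \mathbb{P}[B]\, \mathbb{P}\big[E_{P, \widetilde P}(\widetilde P'_{X_1X_2Y}) \,\big|\, B\big] \tag{37}$$

$$\dot\le\, \mathbb{1}\big\{ q^n(\widetilde P_{X_1X_2Y}) \ge F_\delta(P_{X_1X_2Y}) \big\} + M_2 e^{-n I_{\widetilde P'}(X_2; X_1, Y)}\, \mathbb{P}\big[E_{P, \widetilde P}(\widetilde P'_{X_1X_2Y}) \,\big|\, B\big] \tag{38}$$

$$\dot\le\, \mathbb{1}\big\{ q^n(\widetilde P_{X_1X_2Y}) \ge F_\delta(P_{X_1X_2Y}) \big\} + M_2 e^{-n I_{\widetilde P'}(X_2; X_1, Y)}\, \mathbb{1}\big\{ q^n(\widetilde P_{X_1X_2Y}) + p_0(n)\, e^{2n\delta}\, q^n(\widetilde P'_{X_1X_2Y}) \ge F_\delta(P_{X_1X_2Y}) \big\}, \tag{39}$$

where (38) follows from (36) and since $B^c$ implies $N_{\overline{\mathbf{x}}_1\mathbf{y}}(\widetilde P'_{X_1X_2Y}) = 0$, and (39) uses part (ii) of Lemma 1.

Observe that $\underline{F}(P_{X_1X_2Y}, R_2)$ in (5) is obtained from $F_\delta$ in (25) in the limit as $\delta \to 0$. Similarly, the exponents corresponding to the other quantities appearing in the indicator functions in (35) and (39) tend toward the following:

$$F_1(\widetilde P_{X_1X_2Y}, \widetilde P'_{X_1X_2Y}, R_2) \triangleq \max\Big\{ \mathbb{E}_{\widetilde P}[\log q(X_1, X_2, Y)],\ \mathbb{E}_{\widetilde P'}[\log q(X_1, X_2, Y)] + R_2 - I_{\widetilde P'}(X_2; X_1, Y) \Big\} \tag{40}$$

$$F_2(\widetilde P_{X_1X_2Y}, \widetilde P'_{X_1X_2Y}) \triangleq \max\Big\{ \mathbb{E}_{\widetilde P}[\log q(X_1, X_2, Y)],\ \mathbb{E}_{\widetilde P'}[\log q(X_1, X_2, Y)] \Big\}. \tag{41}$$

Combining (31), (33), (35) and (39) with these expressions, taking $\delta \to 0$, and using the continuity of the underlying terms in the optimizations, we obtain

$$p_{e,1}(\mathbf{x}_1, \mathbf{x}_2, \mathbf{y}) \,\dot\le\, \max\Bigg\{ \max_{(\widetilde P, \widetilde P') \in T_1^{(1)}(P_{X_1X_2Y}, R_2)} M_1 e^{-n I_{\widetilde P}(X_1; X_2, Y)},\ \max_{(\widetilde P, \widetilde P') \in T_1^{(2')}(P_{X_1X_2Y}, R_2)} M_1 e^{-n I_{\widetilde P}(X_1; X_2, Y)},\ \max_{(\widetilde P, \widetilde P') \in T_1^{(2)}(P_{X_1X_2Y}, R_2)} M_1 e^{-n I_{\widetilde P}(X_1; X_2, Y)}\, M_2 e^{-n I_{\widetilde P'}(X_2; X_1, Y)} \Bigg\}, \tag{42}$$

where (see footnote 2)

$$T_1^{(1)}(P_{X_1X_2Y}, R_2) \triangleq \Big\{ (\widetilde P_{X_1X_2Y}, \widetilde P'_{X_1X_2Y}) : \widetilde P_{X_1X_2Y} \in S_1(Q_1, P_{X_2Y}),\ \widetilde P'_{X_1X_2Y} \in S_1'(Q_2, \widetilde P_{X_1Y}),\ I_{\widetilde P'}(X_2; X_1, Y) \le R_2,\ F_1(\widetilde P_{X_1X_2Y}, \widetilde P'_{X_1X_2Y}, R_2) \ge \underline{F}(P_{X_1X_2Y}, R_2) \Big\} \tag{43}$$

$$T_1^{(2')}(P_{X_1X_2Y}, R_2) \triangleq \Big\{ (\widetilde P_{X_1X_2Y}, \widetilde P'_{X_1X_2Y}) : \widetilde P_{X_1X_2Y} \in S_1(Q_1, P_{X_2Y}),\ \widetilde P'_{X_1X_2Y} \in S_1'(Q_2, \widetilde P_{X_1Y}),\ \mathbb{E}_{\widetilde P}[\log q(X_1, X_2, Y)] \ge \underline{F}(P_{X_1X_2Y}, R_2) \Big\} \tag{44}$$

$$T_1^{(2)}(P_{X_1X_2Y}, R_2) \triangleq \Big\{ (\widetilde P_{X_1X_2Y}, \widetilde P'_{X_1X_2Y}) : \widetilde P_{X_1X_2Y} \in S_1(Q_1, P_{X_2Y}),\ \widetilde P'_{X_1X_2Y} \in S_1'(Q_2, \widetilde P_{X_1Y}),\ I_{\widetilde P'}(X_2; X_1, Y) \ge R_2,\ F_2(\widetilde P_{X_1X_2Y}, \widetilde P'_{X_1X_2Y}) \ge \underline{F}(P_{X_1X_2Y}, R_2) \Big\}. \tag{45}$$

Footnote 2. Strictly speaking, these sets depend on $(Q_1, Q_2)$, but this dependence need not be explicit, since we have $P_{X_1} = Q_1$ and $P_{X_2} = Q_2$.

The three terms in the maximization in (42) respectively correspond to (35) and the two terms in (39). Since $F_1 \ge \mathbb{E}_{\widetilde P}[\log q]$, we see that $T_1^{(1)} \supseteq T_1^{(2')}$, and hence the second term in the outer maximum of (42) can be removed. Furthermore, we can safely substitute $P_{X_1X_2Y} = Q_1 \times Q_2 \times W$, since $P_{X_1X_2Y} \to Q_1 \times Q_2 \times W$ with probability approaching one by the law of large numbers. We thus obtain the following rate conditions for the first decoding step:

$$R_1 \le \min_{(\widetilde P, \widetilde P') \in T_1^{(1)}(P_{X_1X_2Y}, R_2)} I_{\widetilde P}(X_1; X_2, Y) \tag{46}$$

$$R_1 + R_2 \le \min_{(\widetilde P, \widetilde P') \in T_1^{(2)}(P_{X_1X_2Y}, R_2)} I_{\widetilde P}(X_1; X_2, Y) + I_{\widetilde P'}(X_2; X_1, Y). \tag{47}$$

Finally, using the definitions of $F$, $S_1$, $S_1'$, $T_1^{(1)}$ and $T_1^{(2)}$ (see (4), (17), (29), (43) and (45)) to unite (46)-(47) yields (9).

APPENDIX

Here we write (9) in terms of convex optimization problems, starting with the alternative expression in (46)-(47). We first note that (47) holds if and only if

$$R_1 \le \min_{(\widetilde P, \widetilde P') \in T_1^{(2)}(P_{X_1X_2Y}, R_2)} I_{\widetilde P}(X_1; X_2, Y) + \big[I_{\widetilde P'}(X_2; X_1, Y) - R_2\big]^+, \tag{48}$$

due to the constraint $I_{\widetilde P'}(X_2; X_1, Y) \ge R_2$. Next, we claim that when combining (46) and (48), the rate region is unchanged if the constraint $I_{\widetilde P'}(X_2; X_1, Y) \ge R_2$ is omitted from (48). To see this, note that for $I_{\widetilde P'}(X_2; X_1, Y) < R_2$, the objective in (48) coincides with that of (46). The desired result follows from the inequality $F_1 \ge F_2$ (using (40)-(41) and the assumption $I_{\widetilde P'}(X_2; X_1, Y) < R_2$), implying that (46) is more restrictive.

We now deal with the non-concavity of $F_1$ and $F_2$.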
Using the identity

$$\min_{x \le \max\{a, b\}} f(x) = \min\Big\{ \min_{x \le a} f(x),\ \min_{x \le b} f(x) \Big\}, \tag{49}$$

we obtain the following rate conditions from (46) and (48):

$$R_1 \le \min_{(\widetilde P, \widetilde P') \in T_1^{(1,1)}(P_{X_1X_2Y}, R_2)} I_{\widetilde P}(X_1; X_2, Y) \tag{50}$$

$$R_1 \le \min_{(\widetilde P, \widetilde P') \in T_1^{(1,2)}(P_{X_1X_2Y}, R_2)} I_{\widetilde P}(X_1; X_2, Y) \tag{51}$$

$$R_1 \le \min_{(\widetilde P, \widetilde P') \in T_1^{(2,1)}(P_{X_1X_2Y}, R_2)} I_{\widetilde P}(X_1; X_2, Y) + \big[I_{\widetilde P'}(X_2; X_1, Y) - R_2\big]^+ \tag{52}$$

$$R_1 \le \min_{(\widetilde P, \widetilde P') \in T_1^{(2,2)}(P_{X_1X_2Y}, R_2)} I_{\widetilde P}(X_1; X_2, Y) + \big[I_{\widetilde P'}(X_2; X_1, Y) - R_2\big]^+, \tag{53}$$

where for $k = 1, 2$ and $l = 1, 2$, $T_1^{(k,l)}$ is defined in the same way as $T_1^{(k)}$ with the following modifications: (i) the constraint $F_k \ge \underline{F}$ is changed so that the left-hand side contains the $l$-th term in the maximization in $F_k$ (see (40)-(41)); (ii) for $k = 2$, the constraint $I_{\widetilde P'}(X_2; X_1, Y) \ge R_2$ is removed, in accordance with the discussion following (48).

The variable $\widetilde P'_{X_1X_2Y}$ can be removed from both (50) and (52), since in both cases the choice $\widetilde P'_{X_1X_2Y}(x_1, x_2, y) = P_{X_2}(x_2)\, \widetilde P_{X_1Y}(x_1, y)$ is feasible and yields an objective of $I_{\widetilde P}(X_1; X_2, Y)$. It follows that (50) and (52) yield the same value, and we conclude that (9) can equivalently be expressed in terms of three conditions: (51), (53), and

$$R_1 \le \min_{\widetilde P_{X_1X_2Y} \in T_1^{(1,1')}(P_{X_1X_2Y}, R_2)} I_{\widetilde P}(X_1; X_2, Y), \tag{54}$$

where $T_1^{(1,1')}$ is defined in the same way as $T_1^{(1,1)}$ with the variable $\widetilde P'_{X_1X_2Y}$ removed. These three conditions are all written as convex optimization problems, as desired.

REFERENCES

[1] I. Csiszár and P. Narayan, "Channel capacity for a given decoding metric," IEEE Trans. Inf. Theory, vol. 41, no. 1, pp. 35-43, Jan. 1995.
[2] A. Ganti, A. Lapidoth, and E. Telatar, "Mismatched decoding revisited: General alphabets, channels with memory, and the wide-band limit," IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 2315-2328, Nov. 2000.
[3] A. Lapidoth, "Mismatched decoding and the multiple-access channel," IEEE Trans. Inf. Theory, vol. 42, no. 5, pp. 1439-1452, Sept. 1996.
[4] A. Somekh-Baruch, "On achievable rates for channels with mismatched decoding," 2013, submitted to IEEE Trans. Inf. Theory [Online: http://arxiv.org/abs/1305.0547].
[5] J. Scarlett, A. Martinez, and A. Guillén i Fàbregas, "Multiuser coding techniques for mismatched decoding," 2013, submitted to IEEE Trans. Inf. Theory [Online: http://arxiv.org/abs/1311.6635].
[6] R. Etkin, N. Merhav, and E. Ordentlich, "Error exponents of optimum decoding for the interference channel," IEEE Trans. Inf. Theory, vol. 56, no. 1, pp. 40-56, 2010.
[7] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, 2nd ed. Cambridge University Press, 2011.
[8] A. El Gamal and Y. H. Kim, Network Information Theory. Cambridge University Press, 2011.
[9] N. Merhav, "Error exponents of erasure/list decoding revisited via moments of distance enumerators," IEEE Trans. Inf. Theory, vol. 54, no. 10, pp. 4439-4447, Oct. 2008.
[10] A. Somekh-Baruch and N. Merhav, "Exact random coding exponents for erasure decoding," IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 6444-6454, Oct. 2011.
[11] A. Grant, B. Rimoldi, R. Urbanke, and P. Whiting, "Rate-splitting multiple access for discrete memoryless channels," IEEE Trans. Inf. Theory, vol. 47, no. 3, pp. 873-890, 2001.