
In Proc. Allerton Conf. Commun., Contr., Computing, (Illinois), Oct. 2003

Interactive Decoding of a Broadcast Message

Stark C. Draper, Brendan J. Frey, Frank R. Kschischang
University of Toronto, Toronto, ON, M5S 3G4, Canada
{sdraper@comm.utoronto.ca, frey@psi.toronto.edu, frank@comm.utoronto.ca}

Abstract

We develop communication strategies for the rate-constrained interactive decoding of a message broadcast to a group of interested users. This situation differs from the relay channel in that all users are interested in the transmitted message, and from the broadcast channel because no user can decode on its own. We focus on two-user scenarios, and describe a baseline strategy that uses ideas of coding with decoder side information. One user acts initially as a relay for the other. That other user then decodes the message and sends back random parity bits, enabling the first user to decode. We show how to improve on this scheme's performance through a conversation consisting of multiple rounds of discussion. While there are now more messages, each message is shorter, lowering the overall rate of the conversation. Such multi-round conversations can be more efficient because earlier messages serve as side information known at both encoder and decoder. We illustrate these ideas for binary erasure channels. We show that multi-round conversations can decode using less overall rate than is possible with the single-round scheme.

1 Introduction

In this paper we consider the interactive decoding of a message received by a number of users. None of the users has a clear enough reception to decode the message on its own, and all users are interested in the message sent. Therefore, the users must hold a conversation to determine the message. This scenario models a wide range of situations where requesting a retransmission or a clarifying message from the transmitter is not possible. For instance, the transmitter may be a satellite or airplane that is within view of the users for only a short period of time. This may be the case, e.g., in a military application where the users are soldiers who for security reasons do not want to send a high-power request for retransmission back to the transmitter, but would rather cooperate to decode the message locally. Alternately, the users may constitute a sensor network, and the message a common source of randomness needed to set up network functionality, but power limitations constrain the sensors to local communication. Or, in a network context, the users may be terminals receiving multicast data. If the terminals have received different subsets of the packets sent, they can sort out ambiguity in the data stream by sharing already-received packets. In situations where the terminals are located far from the information source, this local sharing can cause much less network congestion than requesting clarifying packets from the source.

In digital fountain approaches, where re-requests are not needed, interactive decoding can still be useful since, through interaction, terminals may be able to sort out their ambiguity much more quickly than the time it would take for additional packets to arrive from the source.

The central question we focus on is whether the conversation between users should consist of a single round of discussion (one message per user) or of multiple rounds. We show that many-round conversations based on generalizations of the most efficient strategies known for the relay channel outperform single-round conversations based on these same strategies. Multiple rounds can help because already-received messages serve as side information known to both encoder and decoder. Generally, coding with side information is more efficient when the side information is known at both encoder and decoder, rather than at one or the other. In a multiple-round discussion, encoder and decoder can condition their encoding and decoding, respectively, on this shared information.

The investigations of this paper are related to Orlitsky's and others' work on interactive communication, e.g., see [4, 5] and the references therein. The situation is most akin to [5], where Orlitsky and Roche design interactive communication strategies to reconstruct a memoryless function of two correlated observations at one of two users. The major difference in the problem setups is that in our case a detection problem underlies the space of observations about which discussion occurs. A second difference is that in our case both users are interested in decoding the transmitted message.

The outline of the paper is as follows. In Section 2 we describe our system model and define a conversation formally. In Section 3 we describe a single-round approach using ideas of relay channel coding. In Section 4 we generalize this approach to many rounds of discussion. And, in Section 5, we analyze the proposed strategies for binary erasure channels.

2 System Model

In this section we describe our communication system and define a conversation. We concentrate on the simplest scenario, consisting of a pair of users. In this case, the memoryless channel law is given by

$p_{y_a, y_b | x}(y_a, y_b | x) = \prod_{j=1}^{n} p(y_{a,j} | x_j) \, p(y_{b,j} | x_j),$

where $y_a$ and $y_b$ are the observations of the two users a and b, respectively, and where $x$ is the channel input.
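The memoryless broadcast law above is straightforward to instantiate in simulation. The following minimal sketch (added for concreteness; it is not from the paper, and the names ERASED, erase, and broadcast are ours) implements the binary erasure instance studied in Section 5, where each user's channel independently erases each input symbol with probability p.

```python
# Minimal simulation of the memoryless broadcast channel, specialized to the
# binary erasure case of Section 5 (illustrative sketch; names are ours).
import random

ERASED = None  # marker for an erased symbol

def erase(x_j, p):
    """Pass one input symbol through a BEC(p): erased w.p. p, else intact."""
    return ERASED if random.random() < p else x_j

def broadcast(x, p):
    """Send codeword x through two independent BECs, one per user.

    Mirrors the product law p(y_a, y_b | x) = prod_j p(y_aj | x_j) p(y_bj | x_j).
    """
    y_a = [erase(x_j, p) for x_j in x]
    y_b = [erase(x_j, p) for x_j in x]
    return y_a, y_b

if __name__ == "__main__":
    random.seed(0)
    n, p = 20, 0.3
    x = [random.randint(0, 1) for _ in range(n)]
    y_a, y_b = broadcast(x, p)
    print("x  :", x)
    print("y_a:", y_a)  # on average a fraction p of the entries are None
    print("y_b:", y_b)
```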

Definition 1. A $(2^{nR}, n, k)$ code and k-round conversation for this channel consists of an encoding function $f : \{1, 2, \ldots, 2^{nR}\} \to \mathcal{X}^n$; a set of $2k$ inter-user messages $m_{a,i} \in \{1, 2, \ldots, 2^{nR_{a,i}}\}$, $m_{b,i} \in \{1, 2, \ldots, 2^{nR_{b,i}}\}$; a corresponding set of $2k$ inter-user message encoding functions $\{g_{a,i}, g_{b,i}\}_{i=1}^{k}$ such that

$m_{a,i} = g_{a,i}(y_a, m_{b,1}, m_{b,2}, \ldots, m_{b,i-1}), \qquad m_{b,i} = g_{b,i}(y_b, m_{a,1}, m_{a,2}, \ldots, m_{a,i}),$

where, without loss of generality, we assume that user a begins the conversation; and a pair of decoding functions

$h_a : \mathcal{Y}_a^n \times \{1, 2, \ldots, 2^{nR_{b,1}}\} \times \cdots \times \{1, 2, \ldots, 2^{nR_{b,k}}\} \to \{1, 2, \ldots, 2^{nR}\},$
$h_b : \mathcal{Y}_b^n \times \{1, 2, \ldots, 2^{nR_{a,1}}\} \times \cdots \times \{1, 2, \ldots, 2^{nR_{a,k}}\} \to \{1, 2, \ldots, 2^{nR}\}.$

In this definition we assume that users wait until they have each received their full n-length block of observations before beginning their conversation. The conversation consists of a sequence of finite-rate messages, where $R_{a,i}$ denotes the rate of the ith message $m_{a,i}$ sent from user a to user b, and $R_{b,i}$ the rate of the ith message $m_{b,i}$ sent from user b to user a. Rate is normalized with respect to n, the length of the codeword $x(m)$, where $m \in \{1, 2, \ldots, 2^{nR}\}$ is the channel message. Our goal is to find the strategy that minimizes conversation complexity as measured by the sum-rate

$R_{\mathrm{sum}} = \sum_{i=1}^{k} (R_{a,i} + R_{b,i}).$

From cut-set arguments, the transmitted message m cannot be reliably decoded if its rate $R > I(x; y_a, y_b) \triangleq C_{ab}$. We term $C_{ab}$ the joint-decoding capacity; it can be achieved if the decoders are able to convene and jointly decode the message, and it is an upper bound on the rate of reliable communication.

3 Decoding in a Single Round of Discussion

In this section we describe a decoding strategy where the conversation consists of a single round of discussion. This strategy serves as a baseline with which we will compare conversations consisting of many rounds of discussion.

Theorem 1. A rate-R message can be reliably decoded by users a and b in one round of discussion if

$R_{\mathrm{sum}} \geq R - I(x; y_a) + \min_{(x,u) \in \mathcal{P}} I(u; y_a | y_b), \qquad (1)$

where the set $\mathcal{P}$ consists of all input variables $x$ and auxiliary random variables $u$ such that (i) the Markov condition $u - y_a - (x, y_b)$ holds, and (ii) $I(x; y_b, u) \geq R$.

To achieve this sum-rate, user a first acts as a relay for user b; user b then decodes and sends back random parity bits. We first describe how user b can reliably decode if $R_{a,1} \geq \min_{(x,u) \in \mathcal{P}} I(u; y_a | y_b)$. This result follows from standard coding-with-decoder-side-information arguments, and follows as a special case of Cover and El Gamal's Theorem 6 in [2] when the channel from the relay to the destination is a finite-rate noiseless link. Briefly, the transmitter uses a channel codebook $C$ of rate $R < C_{ab}$. Codebook $C$ is generated randomly in an independent identically distributed (i.i.d.) manner according to $p(x)$. User a has a source codebook $C_a$ consisting of $2^{n\hat{R}_a}$ length-n codewords $u(s)$, $s \in \{1, 2, \ldots, 2^{n\hat{R}_a}\}$, generated i.i.d. according to $p(u)$. User a randomly and uniformly partitions the codewords in $C_a$ into $2^{nR_{a,1}}$ subsets or bins. When user a observes $y_a$, it finds a $u(s) \in C_a$ jointly typical with $y_a$, and transmits to user b the index of the bin in which $u(s)$ lies. User b searches that bin for a $u(s)$ jointly typical with $y_b$. It then selects the $x(m)$ jointly typical with the pair $(u(s), y_b)$ as the transmitted codeword. Let us choose $\hat{R}_a = I(y_a; u) + \epsilon$ and $R_{a,1} = I(u; y_a) - I(u; y_b) + 3\epsilon = I(u; y_a | y_b) + 3\epsilon$. Then, because of our choice of rates, the Markov Lemma [1], and because $R < C_{ab}$, the encoding into $u(s)$, the selection from the bin, and message decoding can all be done reliably. At this point user b has successfully determined m.

Since user b knows m, it can use a more efficient strategy when replying to user a. In particular, user b also uses a binning strategy, but this time it bins the messages (or codewords) directly, rather than the intermediate statistics given by the $u(s)$. User b bins the $2^{nR}$ codewords into $2^{nR_{b,1}} = 2^{n(R - I(x; y_a) + 2\epsilon)}$ bins and transmits to a the index of the bin containing the codeword $x(m)$. User a intersects the contents of this bin with the list $L(y_a)$ of codewords jointly typical with its observation $y_a$. Thus, if we use $A_\epsilon^{(n)}$ to denote the jointly typical set [3], $L(y_a) = \{x \in C : (x, y_a) \in A_\epsilon^{(n)}\}$. From standard typicality arguments, the expected size of the list is

$E[|L(y_a)|] = \sum_{x \in C} \Pr[(x, y_a) \in A_\epsilon^{(n)}] \leq \sum_{x \in C} 2^{-n(I(x; y_a) - \epsilon)} = 2^{n(R - I(x; y_a) + \epsilon)}.$

Since the probability of any given codeword being in the indicated bin is $2^{-nR_{b,1}}$, the size of the intersection will be roughly $1 + 2^{n(R - I(x; y_a) + \epsilon)} \cdot 2^{-nR_{b,1}} = 1 + 2^{-n\epsilon}$. Therefore, with this choice of rates, we can find a codebook with a small maximum probability of error that satisfies (1).

Note that communication to user a is the most efficient possible. The information flow from the transmitter to user a is $I(x; y_a)$. The information flow from user b to user a is $R_{b,1} = R - I(x; y_a) + 2\epsilon$. Thus the information cut-set to user a is $R + 2\epsilon$, which is within $2\epsilon$ of the rate of the channel code. This is the least possible information flow for which user a can still decode reliably. Zhang has shown [6] that such most-efficient communication strategies are possible only when the relay has perfectly decoded the message. This result implies that the information flow to the first user to decode (user b in this case) will always exceed $R - I(x; y_b)$, the difference between the message rate and the information flow from the transmitter to b.
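The bin-and-intersect step above is easy to see in a toy simulation. The sketch below (our construction with illustrative parameters, not the authors' code) instantiates the erasure case: user a's typical list becomes the list of codewords consistent with its unerased symbols, and the bin index from user b isolates the transmitted codeword.

```python
# Toy bin-and-intersect decoder for the erasure case (illustrative only).
import random

def compatible(codeword, y):
    """True if the codeword agrees with y on every unerased position."""
    return all(yj is None or yj == cj for cj, yj in zip(codeword, y))

random.seed(1)
n, R, p = 16, 0.8, 0.3                 # note R > 1 - p: user a cannot decode alone
C = [tuple(random.randint(0, 1) for _ in range(n))
     for _ in range(2 ** int(n * R))]  # i.i.d. Bernoulli(0.5) codebook
m = 42                                 # index of the transmitted codeword
y_a = [cj if random.random() > p else None for cj in C[m]]  # BEC(p) output at a

# User a's list L(y_a): codewords consistent with its observation.
L = [i for i, c in enumerate(C) if compatible(c, y_a)]

# User b, knowing m, bins the codebook at rate ~ R - I(x; y_a) (plus slack)
# and sends the bin index of C[m]; user a intersects that bin with L(y_a).
num_bins = 2 ** (int(n * (R - (1 - p))) + 5)
bins = [random.randrange(num_bins) for _ in C]
survivors = [i for i in L if bins[i] == bins[m]]

print(f"|L(y_a)| = {len(L)}, survivors after binning: {survivors}")
# Typically |L(y_a)| > 1 (a alone is stuck), while survivors == [42].
```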

4 Decoding in Many Rounds

In this section we generalize the one-round approach to multiple rounds of discussion. We show the following theorem.

Theorem 2. A rate-R message can be reliably decoded by users a and b through a k-round conversation if

$R_{\mathrm{sum}} \geq \min_{(x, u_{a,1}, \ldots, u_{a,k}, u_{b,1}, \ldots, u_{b,k-1}) \in \mathcal{P}} \Big[ \sum_{i=1}^{k} I(u_{a,i}; y_a | y_b, u_{a,1}, \ldots, u_{a,i-1}, u_{b,1}, \ldots, u_{b,i-1}) + \sum_{i=1}^{k-1} I(u_{b,i}; y_b | y_a, u_{a,1}, \ldots, u_{a,i}, u_{b,1}, \ldots, u_{b,i-1}) + R - I(x; y_a, u_{b,1}, \ldots, u_{b,k-1}) \Big], \qquad (2)$

where the set $\mathcal{P}$ consists of all random variables $x, u_{a,1}, \ldots, u_{a,k}, u_{b,1}, \ldots, u_{b,k-1}$ that satisfy the Markov conditions (i) $u_{a,i} - (y_a, u_{a,1}, \ldots, u_{a,i-1}, u_{b,1}, \ldots, u_{b,i-1}) - (x, y_b)$ and (ii) $u_{b,i} - (y_b, u_{a,1}, \ldots, u_{a,i}, u_{b,1}, \ldots, u_{b,i-1}) - (x, y_a)$ for all i, and that satisfy (iii) $I(x; y_b, u_{a,1}, \ldots, u_{a,k}) > R$.

The coding construction that achieves this theorem is a direct generalization of that of Theorem 1. For message $m_{a,i}$ we use the posterior $p(u_{a,i} | y_a, u_{a,1}, \ldots, u_{a,i-1}, u_{b,1}, \ldots, u_{b,i-1})$, and for message $m_{b,i}$ we use the posterior $p(u_{b,i} | y_b, u_{a,1}, \ldots, u_{a,i}, u_{b,1}, \ldots, u_{b,i-1})$. Many-round decoding takes advantage of the fact that coding-with-side-information techniques are generally more efficient when the side information is known at both encoder and decoder. In a conversation, already-transmitted messages serve as side information known at both encoder and decoder. We see in the next section that, for this reason, this strategy can decode the message at a strictly lower sum-rate $R_{\mathrm{sum}}$ than the baseline approach of Theorem 1.

5 Example: Binary Erasure Channels

In this section we illustrate single-round and many-round conversation strategies for a binary erasure broadcast channel. For simplicity we assume that the erasure channels to the two users are symmetric, with the same erasure probability p. Furthermore, in order to approach within $\epsilon$ the joint-decoding capacity $C_{ab}$, we use a channel codebook generated according to a Bernoulli(0.5) distribution, consisting of $2^{n(1 - p^2 - \epsilon)} = 2^{n(C_{ab} - \epsilon)}$ codewords.

5.1 Decoding in One Round

We now show that to be able to decode in one round via the strategy of Theorem 1, user a must send enough information so that user b can fully determine user a's observation. In other words, user a should use a Slepian-Wolf code. First observe that with the choice of $p(x)$ as Bernoulli(0.5) we can rewrite the second condition of Theorem 1 as $\epsilon \geq C_{ab} - I(x; y_b, u) = I(x; y_a, y_b) - I(x; y_b, u)$. This puts extra conditions on the test channel $p(u | y_a)$ that we are able to pick. In particular, it implies that a certain extra Markov condition must hold as $\epsilon$ goes to zero:

$\epsilon \geq I(x; y_a, y_b) - I(x; u, y_b) = H(x | u, y_b) - H(x | y_a, y_b) \qquad (3)$
$= p[H(x | u) - H(x | y_a)] = p[I(x; y_a) - I(x; u)], \qquad (4)$

where (4) follows because x is Bernoulli(0.5) and the channels are symmetric binary erasure channels with erasure probability p. At this point we can combine the mutual information terms in (4) into the single divergence term $p D(p(x | y_a, u) \| p(x | u))$, and use Pinsker's inequality [3] to show that the difference between $p(x | u, y_a)$ and $p(x | u)$ must be small. For brevity, we simply assume for the rest of the discussion that as $\epsilon \to 0$, the Markov chain $x - u - y_a$ holds exactly.

We next expand (3) in a second way to learn about the rate $R_{a,1}$:

$\epsilon \geq H(y_a | y_b) - H(y_a | x) + H(u | x) - H(u | y_b)$
$= I(u; y_b) - I(y_a; y_b) + H(x | u) - H(x | y_a) \qquad (5)$
$\geq I(u; y_b) - I(y_a; y_b), \qquad (6)$

where (6) follows from the fact that $H(x | u) - H(x | y_a) \geq 0$, a consequence of the Markov chain $x - y_a - u$ and the data processing inequality. Substituting (6) into the expression for $R_{a,1}$ in Theorem 1 we get

$R_{a,1} = I(u; y_a | y_b) = I(u; y_a) - I(u; y_b) \geq I(u; y_a) - I(y_a; y_b) - \epsilon = H(y_a | y_b) - H(y_a | u) - \epsilon. \qquad (7)$

Finally, we now show that the Markov conditions derived above imply that $H(y_a | u)$ in (7) is zero. The two Markov chains $x - y_a - u$ and $x - u - y_a$ imply the following, for any $(y_a, u)$ pair such that $p(y_a, u) > 0$:

$p(x | y_a, u) = p(x | y_a) = p(x | u). \qquad (8)$

For example, say there is some value $u = j$ such that $p(y_a = 1, u = j) > 0$; then, because the channel is a binary erasure channel,

$1 = p(x = 1 | y_a = 1, u = j) = p(x = 1 | y_a = 1) = p(x = 1 | u = j). \qquad (9)$

It is then easy to show that there must exist a set $S_1$ of values of u such that $p(y_a = 1 | u \in S_1) = 1$ and $p(u \in S_1) = p(y_a = 1)$; thus $p(u \in S_1 | y_a = 1) = 1$. Similarly, one can define sets $S_e$ and $S_0$ such that $p(u \in S_e | y_a = e) = p(y_a = e | u \in S_e) = 1$ and $p(u \in S_0 | y_a = 0) = p(y_a = 0 | u \in S_0) = 1$. This implies that any test channel $p(u | y_a)$ that achieves $I(x; u, y_b) = I(x; y_a, y_b)$ maps $y_a$ into one of three disjoint sets $S_0$, $S_e$, and $S_1$, depending on the value of $y_a$, and, therefore, that $H(y_a | u) = 0$.

Substituting the result that $H(y_a | u) = 0$ into (7) shows that in order to decode in one round, user a must, at best, do Slepian-Wolf coding. As we show in the next section, this means that for a single-round discussion $R_{\mathrm{sum}} \geq 2p(1-p) + H_B(p)$.

6 Test Channel for One-Round Decoding

In this section we introduce a candidate test channel $p(u | y_a)$ that enables decoding at the minimal rate derived in the last section. It can be used to decode at an even lower sum-rate if multiple rounds of conversation are allowed. Let $\mathcal{U} = \{0, e, 1\}$, where e is the symbol for an erasure. Define $p(u | y_a)$ as follows:

$p(u = 1 | y_a = 1) = 1 - p_u, \quad p(u = e | y_a = 1) = p_u,$
$p(u = e | y_a = e) = 1,$
$p(u = 0 | y_a = 0) = 1 - p_u, \quad p(u = e | y_a = 0) = p_u.$

For this test channel it is straightforward to show that

$R_{a,1} = I(u; y_a | y_b) = H_B(p + p_u(1-p)) + p(1-p)(1-p_u) - (1-p) H_B(p_u), \qquad (10)$

where $H_B(p_u)$ denotes the entropy of a Bernoulli($p_u$) random variable. (In deriving these relationships it is helpful to define extra random variables $e_a$, $e_b$, $e_u$, which denote whether $y_a$, $y_b$, or u are erasures, respectively. These random variables can be used in a manner analogous to the derivation of Fano's inequality in [3].) For a given $p_u$, the maximum communication rate that user b can decode reliably is

$I(x; y_b, u) = 1 - p[p + (1-p)p_u]. \qquad (11)$

To get $I(x; y_b, u) = 1 - p^2 = C_{ab}$, we must set $p_u = 0$, yielding $R_{a,1} = p(1-p) + H_B(p) = H(y_a | y_b)$. After user b decodes, it can bin the channel messages per Theorem 1 and transmit the bin index at rate $p(1-p)$ to user a. This gives sum-rate

$R_{\mathrm{sum}} = R_{a,1} + R_{b,1} = 2p(1-p) + H_B(p). \qquad (12)$

We compare the sum-rate of many-round conversations to this baseline.
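For concreteness, the following sketch (ours, added for illustration) evaluates (10) and (11) for a few values of $p_u$, confirming that only $p_u = 0$ supports the full joint-decoding capacity $C_{ab}$, and recovering the baseline sum-rate (12).

```python
# Numeric sketch of eqs. (10)-(12) for the symmetric BEC pair (illustrative).
from math import log2

def h_b(q):
    """Binary entropy H_B(q) in bits."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def rate_a1(p, p_u):
    """Eq. (10): R_{a,1} = I(u; y_a | y_b) for the candidate test channel."""
    return h_b(p + p_u * (1 - p)) + p * (1 - p) * (1 - p_u) - (1 - p) * h_b(p_u)

def rate_b_decodable(p, p_u):
    """Eq. (11): I(x; y_b, u), the largest code rate user b can decode."""
    return 1 - p * (p + (1 - p) * p_u)

p = 0.05
print(f"C_ab = 1 - p^2 = {1 - p * p:.4f}")
for p_u in (0.0, 0.25, 0.5):
    print(f"p_u = {p_u:.2f}: R_a1 = {rate_a1(p, p_u):.4f}, "
          f"I(x; y_b, u) = {rate_b_decodable(p, p_u):.4f}")
# Only p_u = 0 reaches C_ab, forcing R_a1 = H(y_a | y_b) = p(1-p) + H_B(p)
# and hence the single-round sum-rate of eq. (12):
print(f"one-round R_sum = {2 * p * (1 - p) + h_b(p):.4f}")
```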

6.1 Useful Alternative Derivation of One-Round Strategy

We first give a useful alternative derivation of (10) and (11) in terms of the absolute fraction of symbols described by each message. Keeping track of absolute fractions, rather than $p_u$, which is a relative fraction of extra erasures added in by the test channel, makes it easier to consider the effect of multiple rounds. The alternative derivation follows in three steps.

First, say message $m_{a,1}$ informs b of the values of a fraction $\gamma_{a,1}$ of the n symbols that a observes. Furthermore, let each symbol described by $m_{a,1}$ be unerased on $y_a$; thus $0 \leq \gamma_{a,1} \leq (1-p)$. The total number of possible sets of $n\gamma_{a,1}$ symbols is

$\binom{n}{n\gamma_{a,1}} \leq 2^{nH_B(\gamma_{a,1})}. \qquad (13)$

Second, we don't care which set of $n\gamma_{a,1}$ unerased symbols we describe, but all such sets must be subsets of the $n(1-p)$ symbols that are observed unerased by a. There are

$\binom{n(1-p)}{n\gamma_{a,1}} \geq \frac{2^{n(1-p)H_B(\gamma_{a,1}/(1-p))}}{n(1-p) + 1} = 2^{n((1-p)H_B(\gamma_{a,1}/(1-p)) - \epsilon)} \qquad (14)$

such subsets. Therefore, for there to be on average at least one subset that matches $n\gamma_{a,1}$ of a's unerased symbols, we need to index at least $2^{n(H_B(\gamma_{a,1}) - (1-p)H_B(\gamma_{a,1}/(1-p)) + \epsilon)}$ subsets of size $n\gamma_{a,1}$. Again, since we don't care which such subset we describe, we can index fewer subsets than in (13), thereby saving rate. This is equivalent to the quantization effect of the test channel $p(u | y_a)$.

Finally, we need to account for the conditional entropy of the $n\gamma_{a,1}$ symbols given the decoder information. The conditional entropy of symbol $y_{a,i}$ given $y_{b,i}$ and the event that $y_{a,i}$ is not erased (since we only discuss unerased symbols) equals p. We describe $n\gamma_{a,1}$ symbols, giving a conditional entropy rate $\gamma_{a,1} p$. This is equivalent to the rate savings given by binning. All together, this gives message rate

$R_{a,1} = H_B(\gamma_{a,1}) - (1-p)H_B(\gamma_{a,1}/(1-p)) + \gamma_{a,1} p + \epsilon. \qquad (15)$

If we set $\gamma_{a,1} = (1-p_u)(1-p)$ and use the symmetry of $H_B(\cdot)$, then (15) equals (10). At the end of this communication step, user b knows perfectly the following fraction of symbols (which we term the communication rate supportable by user b at step 1, since if the rate of the channel code C is below this, user b will be able to decode):

$R_{\mathrm{comm},b,1} = 1 - p + p\gamma_{a,1}. \qquad (16)$

By setting $\gamma_{a,1} = (1-p_u)(1-p)$, (16) equals (11).

Figure 1 gives a pictorial explanation of this derivation. The top and bottom bars represent the received vectors $y_a$ and $y_b$, respectively. The shaded areas represent the fractions of erased symbols. Symbols have been ordered so that, starting from the left, the symbols received unerased by both users are shown first, next those unerased on $y_a$ but erased on $y_b$, next those erased on $y_a$ but unerased on $y_b$, and finally those erased on both. The dotted lines indicate the fraction of symbols transmitted in each message and whether they are erased or unerased in $y_a$ and $y_b$. For example, on average user a's first message $m_{a,1}$ describes $np\gamma_{a,1}$ symbols to user b that are unerased on $y_a$ but erased on $y_b$.

Since when sending $m_{b,1}$ user b only needs to be concerned with sending information about the symbols not discussed in $m_{a,1}$, the message $m_{b,1}$ is conditioned on $m_{a,1}$. This is equivalent to conditioning the test channel for $u_{b,1}$ on the result of the first test channel, $u_{a,1}$, in Theorem 2. User b discusses $n\gamma_{b,1}$ symbols in message $m_{b,1}$. The first step in finding $R_{b,1}$ is to determine the number of subsets of size $n\gamma_{b,1}$ of the $n - n\gamma_{a,1} = n(1-\gamma_{a,1})$ symbols not discussed in $m_{a,1}$, i.e.,

$\binom{n(1-\gamma_{a,1})}{n\gamma_{b,1}} \leq 2^{n(1-\gamma_{a,1})H_B(\gamma_{b,1}/(1-\gamma_{a,1}))}. \qquad (17)$

Second, the expected number of possibly useful symbols (i.e., unerased on $y_b$ and undiscussed in earlier messages) is $n(1-p) - n\gamma_{a,1}(1-p) = n(1-\gamma_{a,1})(1-p)$. User b needs to describe only a subset of $n\gamma_{b,1}$ of these symbols, where $\gamma_{b,1} < (1-\gamma_{a,1})(1-p)$. The number of such subsets is

$\binom{n(1-p)(1-\gamma_{a,1})}{n\gamma_{b,1}} \geq 2^{n[(1-p)(1-\gamma_{a,1})H_B(\gamma_{b,1}/((1-p)(1-\gamma_{a,1}))) - \epsilon]}. \qquad (18)$
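Before completing the derivation of $R_{b,1}$, here is a quick numeric check (ours, repeating the binary-entropy helper from the previous sketch) that the combinatorial rate (15), with the $\epsilon$ slack dropped, coincides with the test-channel rate (10) under the substitution $\gamma_{a,1} = (1-p_u)(1-p)$.

```python
# Check that eq. (15) (epsilon slack dropped) matches eq. (10) under
# gamma_{a,1} = (1 - p_u)(1 - p). Illustrative sketch; names are ours.
from math import log2

def h_b(q):
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def rate_a1(p, p_u):                  # eq. (10)
    return h_b(p + p_u * (1 - p)) + p * (1 - p) * (1 - p_u) - (1 - p) * h_b(p_u)

def rate_a1_combinatorial(p, gamma):  # eq. (15)
    return h_b(gamma) - (1 - p) * h_b(gamma / (1 - p)) + gamma * p

p = 0.05
for p_u in (0.0, 0.25, 0.5):
    gamma = (1 - p_u) * (1 - p)
    print(f"p_u = {p_u:.2f}: eq.(15) = {rate_a1_combinatorial(p, gamma):.6f}, "
          f"eq.(10) = {rate_a1(p, p_u):.6f}")  # the two columns agree
```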

[Figure 1: The received vectors $y_a$ (top bar) and $y_b$ (bottom bar), partitioned into fractions $(1-p)^2$, $p(1-p)$, $p(1-p)$, and $p^2$; the fraction of unerased symbols is white, the fraction of erased symbols is shaded.]

Finally, the conditional entropy of a symbol equals the probability that a remaining possibly useful symbol on $y_b$ is erased on $y_a$. Figure 1 helps us see that this probability is

$\frac{np(1-p)}{n(1-p) - n\gamma_{a,1}(1-p)} = \frac{p}{1-\gamma_{a,1}}. \qquad (19)$

Putting together (17), (18), and (19) gives the rate of $m_{b,1}$,

$R_{b,1} = (1-\gamma_{a,1})H_B\left(\frac{\gamma_{b,1}}{1-\gamma_{a,1}}\right) - (1-p)(1-\gamma_{a,1})H_B\left(\frac{\gamma_{b,1}}{(1-p)(1-\gamma_{a,1})}\right) + \frac{\gamma_{b,1} p}{1-\gamma_{a,1}} + \epsilon. \qquad (20)$

After receiving $m_{b,1}$, the communication rate reliably decodable by user a is

$R_{\mathrm{comm},a,1} = 1 - p + \frac{p\gamma_{b,1}}{1-\gamma_{a,1}}. \qquad (21)$

6.2 Decoding in Many Rounds

In this section we use the reinterpretation of (10) and (11) given in the last section to present results for decoding in many rounds of discussion. We begin by defining a number of useful quantities. First, we define $f_{a,k}$ and $f_{b,k}$ to be the fractions of symbols not already discussed when a and b formulate their kth messages, respectively. These fractions indicate the size of the set of symbols about which discussion can continue:

$f_{a,k} = 1 - \sum_{i=1}^{k-1} \gamma_{a,i} - \sum_{i=1}^{k-1} \gamma_{b,i}, \qquad f_{b,k} = 1 - \sum_{i=1}^{k} \gamma_{a,i} - \sum_{i=1}^{k-1} \gamma_{b,i}.$

The difference in the limits of summation occurs because a is assumed to transmit first. Next, define $P_{a,k}$ ($P_{b,k}$) to be the probability that a symbol discussed in $m_{a,k}$ ($m_{b,k}$) is useful, i.e., is erased on $y_b$ ($y_a$).

These probabilities are

$P_{a,k} = \frac{p(1-p) - \sum_{i=1}^{k-1} \gamma_{a,i} P_{a,i}}{1 - p - \sum_{i=1}^{k-1} \gamma_{a,i} - \sum_{i=1}^{k-1} \gamma_{b,i}(1 - P_{b,i})} \triangleq \frac{n_{a,k}}{d_{a,k}}, \qquad (22)$

$P_{b,k} = \frac{p(1-p) - \sum_{i=1}^{k-1} \gamma_{b,i} P_{b,i}}{1 - p - \sum_{i=1}^{k-1} \gamma_{b,i} - \sum_{i=1}^{k} \gamma_{a,i}(1 - P_{a,i})} \triangleq \frac{n_{b,k}}{d_{b,k}}. \qquad (23)$

The denominators $d_{a,k}$ and $d_{b,k}$ are particularly useful, as they give the fractions of unerased and undiscussed, and therefore possibly useful, symbols remaining on $y_a$ and $y_b$, respectively. With these definitions, and for any choices of $\gamma_{a,i}$ ($0 \leq \gamma_{a,i} \leq d_{a,i}$) and $\gamma_{b,i}$ ($0 \leq \gamma_{b,i} \leq d_{b,i}$), the message rates at each step are

$R_{a,k} = f_{a,k} H_B(\gamma_{a,k}/f_{a,k}) - d_{a,k} H_B(\gamma_{a,k}/d_{a,k}) + \gamma_{a,k} P_{a,k}, \qquad (24)$
$R_{b,k} = f_{b,k} H_B(\gamma_{b,k}/f_{b,k}) - d_{b,k} H_B(\gamma_{b,k}/d_{b,k}) + \gamma_{b,k} P_{b,k}. \qquad (25)$

The communication rates supportable by each user after receiving the first k messages are

$R_{\mathrm{comm},a,k} = 1 - p + \sum_{i=1}^{k} P_{b,i} \gamma_{b,i}, \qquad (26)$
$R_{\mathrm{comm},b,k} = 1 - p + \sum_{i=1}^{k} P_{a,i} \gamma_{a,i}. \qquad (27)$

Discussion continues until, say, $R_{\mathrm{comm},b,k} > R$, the rate of the channel code. At this point user b decodes the message, bins the channel codebook into $2^{n(R - R_{\mathrm{comm},a,k})}$ bins, and sends the bin index in which the transmitted codeword lies. User a intersects the contents of this bin with its list of remaining codeword possibilities.

6.3 Comparison of Schemes

We now compare the many-round and single-round strategies. In Fig. 2 we plot the sum-rate of conversation, $R_{\mathrm{sum}} = \sum_{i=1}^{k} (R_{a,i} + R_{b,i})$, for probability of erasure p = 0.05; a sketch of the computation behind these curves follows below. The highest sum-rate is given by a single-round conversation consisting of two messages. As we allow the number of rounds of discussion till decoding to increase, the sum-rate declines. Assuming k rounds till decoding, in Figs. 2 and 3, at step i each user describes to the other a fraction 1/(k-i+1) of its remaining unerased and undiscussed bits; equivalently, $\gamma_{a,i} = d_{a,i}/(k+1-i)$. A lower bound on the sum-rate is given by $2R - I(x; y_a) - I(x; y_b) = 2p(1-p)$. This is the additional information flow needed for the flow to each user to equal the code rate R. As mentioned earlier, Zhang [6] showed this lower bound is unachievable for the situations we discuss.

In Fig. 3 we plot the percentage decrease in sum-rate relative to decoding in a single round, where k is the number of rounds to full decoding. We plot k = 1, ..., 10 for four probabilities of erasure, p = 0.05, 0.1, 0.2, 0.3. Note that there is more gain for systems with lower probability of erasure. This is because a lower probability of erasure means a higher percentage of symbols is unerased in both observations, and thus there is more redundancy and greater savings from side-information coding. Note that for p = 0.3 the rate savings decline for larger numbers of rounds of discussion. This is a consequence of poor choices for the $\gamma_{a,i}$ and $\gamma_{b,i}$; determining efficient optimizations for the $\gamma_{a,i}$ and $\gamma_{b,i}$ is part of ongoing research.
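The sketch below is our reconstruction of the computation behind these curves: it iterates (22)-(27) under the uniform schedule $\gamma_{a,i} = d_{a,i}/(k+1-i)$, and assumes (our reading of the protocol) that b decodes after a's kth message and replies with a final bin index of rate $R - R_{\mathrm{comm},a}$ per (26); for k = 1 it reproduces the single-round sum-rate (12).

```python
# Our reconstruction (not the authors' code) of the k-round sum-rate behind
# Fig. 2: iterate (22)-(27) with gamma_{a,i} = d_{a,i}/(k+1-i), let b decode
# after a's kth message, then add b's bin-index reply of rate R - R_comm,a.
from math import log2

def h_b(q):
    """Binary entropy H_B(q) in bits."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def msg_rate(f, d, g, P):
    """Eqs. (24)/(25): f H_B(g/f) - d H_B(g/d) + g P."""
    return f * h_b(g / f) - d * h_b(g / d) + g * P

def sum_rate(p, k):
    """Total conversation rate for k rounds till decoding, code rate R = C_ab."""
    R = 1 - p * p
    ga, gb, Pa, Pb = [], [], [], []  # per-round fractions and usefulness probs
    total = 0.0
    for i in range(1, k + 1):
        # User a's i-th message: eqs. (22) and (24).
        f_a = 1 - sum(ga) - sum(gb)
        d_a = 1 - p - sum(ga) - sum(g * (1 - P) for g, P in zip(gb, Pb))
        P_a = (p * (1 - p) - sum(g * P for g, P in zip(ga, Pa))) / d_a
        g_a = d_a / (k + 1 - i)      # discuss 1/(k+1-i) of what remains
        total += msg_rate(f_a, d_a, g_a, P_a)
        ga.append(g_a); Pa.append(P_a)
        if i == k:
            break                    # user b can now decode the message
        # User b's i-th quantization message: eqs. (23) and (25).
        f_b = 1 - sum(ga) - sum(gb)
        d_b = 1 - p - sum(gb) - sum(g * (1 - P) for g, P in zip(ga, Pa))
        P_b = (p * (1 - p) - sum(g * P for g, P in zip(gb, Pb))) / d_b
        g_b = d_b / (k + 1 - i)
        total += msg_rate(f_b, d_b, g_b, P_b)
        gb.append(g_b); Pb.append(P_b)
    # Final reply: b bins the codebook and sends R - R_comm,a bits (eq. (26)).
    R_comm_a = 1 - p + sum(g * P for g, P in zip(gb, Pb))
    return total + (R - R_comm_a)

p = 0.05
print(f"single round (eq. 12): {2 * p * (1 - p) + h_b(p):.4f}")
for k in (1, 2, 4, 10):
    print(f"k = {k:2d} rounds till decoding: {sum_rate(p, k):.4f}")
print(f"cut-set bound 2p(1-p): {2 * p * (1 - p):.4f}  (unachievable)")
```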

[Figure 2: Sum-rate of conversation for p = 0.05 versus number of rounds of discussion till decoding; curves shown for single-round decoding, many-round decoding, and the (unachievable) cut-set bound.]

7 Discussion and Directions

In this paper we introduce the problem of interactive decoding of a broadcast message. We give an approach based on a generalization of the most efficient relaying techniques known. We show that multi-round conversations can give strictly better performance than single-round conversations, where performance is measured by the sum-rate of the conversation. In demonstrating this result we define a family of candidate test channels for the binary erasure version of the problem. In later rounds the test channel output is conditioned on the results of earlier rounds.

There are many interesting directions to pursue with regard to the basic interactive decoding problem posed herein. A central question is what should be discussed in each round. The candidate test channel introduced here gives one answer to this question but, except in the single-round situation with $p_u = 0$, it is not always the optimal answer. The interplay between the source and channel coding aspects of the problem means that finding the optimal input distribution and test channels, which determine what is discussed each round, is generally not a convex optimization. Further, what is discussed by a in the ith round (as specified by the test channel defining $u_{a,i}$) may be radically different from what a should discuss in the (i+1)th round. Even if we fix a family of test channels, as we did here, we still have parameters to optimize for each round. For the case presented herein, this corresponds to deciding how much to discuss in each round. In the single-round situation we discuss everything in one round. Clearly, one can only gain by allowing more rounds of discussion; but determining the optimal choice of how much to discuss in each round is difficult and test-channel dependent.

We are also working on applying these ideas to other types of channels, such as binary-symmetric or Gaussian channels. It is important to note that while, e.g., in the Gaussian mean-squared-error case, having the decoder's side information at the encoder does not help in Wyner-Ziv coding, it helps hugely here. If the encoder (user a) knew both its own observation and the decoder's (user b's), then the encoder could use both observations to jointly decode the message.

[Figure 3: Percent sum-rate savings versus decoding in one round, for various probabilities of erasure (p = 0.05, 0.1, 0.2, 0.3) and rounds of discussion till decoding.]

Following that, it would use the perfectly efficient message-binning scheme we introduced to communicate to b. As shown by Zhang, however, this performance is unachievable. Therefore, even in the Gaussian case the underlying detection problem dramatically changes the quality of the solution.

Finally, we can think of casting interactive decoding as a type of network coding problem in which interaction is allowed among various network nodes to determine the information they want. This perspective may allow us to connect our work to the very active network coding community, and give insight into practical code designs for interactive decoding.

References

[1] T. Berger. Multiterminal source coding. In G. Longo, editor, The Information Theory Approach to Communications, chapter 4. Springer-Verlag, 1977.

[2] T. M. Cover and A. El Gamal. Capacity theorems for the relay channel. IEEE Trans. Inform. Theory, 25:572-584, September 1979.

[3] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley and Sons, 1991.

[4] A. Orlitsky. Worst-case interactive communication I: Two messages are almost optimal. IEEE Trans. Inform. Theory, 36:1111-1126, September 1990.

[5] A. Orlitsky and J. R. Roche. Coding for computing. IEEE Trans. Inform. Theory, 47:903-917, March 2001.

[6] Z. Zhang. Partial converse for a relay channel. IEEE Trans. Inform. Theory, 34:1106-1110, September 1988.