Interactive Hypothesis Testing with Communication Constraints


Fiftieth Annual Allerton Conference, Allerton House, UIUC, Illinois, USA, October 1-5, 2012

Yu Xiang and Young-Han Kim
Department of Electrical and Computer Engineering
University of California, San Diego, La Jolla, CA 92093, USA
Email: {yxiang, yhk}@ucsd.edu

Abstract—This paper studies the problem of interactive hypothesis testing with communication constraints, in which two communication nodes separately observe one of two correlated sources and interact with each other to decide between two hypotheses on the joint distribution of the sources. When testing against independence, that is, when the joint distribution of the sources under the alternative hypothesis is the product of the marginal distributions under the null hypothesis, a computable characterization is provided for the optimal tradeoff between the communication rates of two-round interaction and the testing performance, measured by the type II error exponent when the type I error probability vanishes asymptotically. An example is provided to show that interaction is strictly helpful.

I. INTRODUCTION

Berger [1], in an inspiring attempt at combining information theory and statistical inference, formulated the problem of hypothesis testing with communication constraints as depicted in Fig. 1. Let $(X^n, Y^n) \sim \prod_{i=1}^n p_{X,Y}(x_i, y_i)$ be a pair of independent and identically distributed (i.i.d.) $n$-sequences generated by a two-component discrete memoryless source (2-DMS) $(X, Y)$. Suppose that there are two hypotheses on the joint distribution of $(X, Y)$, namely,

$$H_0: (X, Y) \sim p_0(x, y), \qquad H_1: (X, Y) \sim p_1(x, y).$$

In order to decide which hypothesis is true, nodes 1 and 2 observe $X^n$ and $Y^n$, respectively, compress their observed sequences into indices of rates $R_1$ and $R_2$, and communicate them over noiseless links to node 3, which then makes a decision $\hat{H} \in \{H_0, H_1\}$ based on the received compression indices.

Fig. 1. Multiterminal hypothesis testing with communication constraints.

What is the impact of communication constraints on the performance of hypothesis testing? To answer this question, Berger [1] studied the optimal tradeoff between the communication rates and the testing performance, measured by the exponent of the type II error probability when the type I error probability is upper bounded by a given $\epsilon < 1$. Despite many natural applications, however, theoretical understanding of this problem is far from complete, and a simple characterization of this rate-exponent tradeoff remains open in general.

In their celebrated work [2], Ahlswede and Csiszár studied the special case in which the sequence $Y^n$ is fully available at the destination node, i.e., $R_2 = \infty$. They established single-letter inner and outer bounds on the optimal tradeoff between the communication rate $R_1$ and the type II error exponent, and showed that these bounds are tight for testing against independence, i.e., when the alternative hypothesis is $H_1: p_1(x,y) = p_0(x)\,p_0(y)$. Later, Han [3] and Shimokawa, Han, and Amari [4] provided a new coding scheme that improves upon the Ahlswede-Csiszár inner bound for the general hypothesis testing problem. The Shimokawa-Han-Amari scheme is similar to the Berger-Tung scheme [5], [6] for the distributed lossy source coding problem, in which nodes 1 and 2 perform joint typicality encoding followed by binning. A more comprehensive survey of the earlier literature can be found in [7].
Recently, several variations of this setup have been studied, including successive refinement hypothesis testing [8] and testing against conditional independence [9].

This paper studies an interactive version of hypothesis testing with communication constraints. Two nodes communicate with each other through noiseless links, and one of the nodes performs hypothesis testing at the end of the interactive communication. To be concrete, we focus on two rounds of interaction for hypothesis testing against independence. For this special case, we establish a single-letter characterization of the optimal tradeoff between the communication rates and the type II error exponent when the type I error probability is arbitrarily small.

The rest of the paper is organized as follows. In Section II, we review the problem of one-way hypothesis testing with communication constraints. In Section III, we formulate the problem of interactive hypothesis testing with communication constraints and present our main theorem. In Section IV, we compare the interactive hypothesis testing problem with the interactive lossy source coding problem studied by Kaspi [10].

Throughout the paper, we closely follow the notation in [11]. In particular, for $X \sim p(x)$ and $\epsilon \in (0,1)$, we define the set of $\epsilon$-typical $n$-sequences $x^n$ (or the typical set in short) [12] as

$$\mathcal{T}_\epsilon^{(n)}(X) = \big\{x^n : |\#\{i : x_i = x\}/n - p(x)| \le \epsilon\, p(x) \text{ for all } x \in \mathcal{X}\big\}.$$

We say that $X \to Y \to Z$ form a Markov chain if $p(x,y,z) = p(x)\,p(y|x)\,p(z|y)$, that is, $X$ and $Z$ are conditionally independent of each other given $Y$.

II. ONE-WAY HYPOTHESIS TESTING WITH COMMUNICATION CONSTRAINTS

As before, let $(X^n, Y^n) \sim \prod_{i=1}^n p_{X,Y}(x_i, y_i)$ be a pair of i.i.d. sequences generated by a 2-DMS $(X, Y)$ and consider two hypotheses

$$H_0: (X,Y) \sim p_0(x,y), \qquad H_1: (X,Y) \sim p_1(x,y).$$

We consider the special case of the hypothesis testing problem depicted in Fig. 1 in which $R_2 = 0$; see Fig. 2. Here node 2 plays the role of node 3 and is required to make a decision $\hat{H} \in \{H_0, H_1\}$.

Fig. 2. One-way hypothesis testing with a communication constraint.

A $(2^{nR}, n)$ hypothesis test consists of an encoder that assigns an index $m_1(x^n) \in [1 : 2^{nR}]$ to each sequence $x^n \in \mathcal{X}^n$, and a tester that assigns $\hat{h}(m_1, y^n) \in \{H_0, H_1\}$ to each $(m_1, y^n) \in [1 : 2^{nR}] \times \mathcal{Y}^n$. The acceptance region is defined as

$$\mathcal{A}_n := \{(m_1, y^n) \in [1 : 2^{nR}] \times \mathcal{Y}^n : \hat{h}(m_1, y^n) = H_0\}.$$

Then the type I error probability is

$$P_0(\mathcal{A}_n^c) = \sum_{(x^n, y^n) :\, (m_1(x^n), y^n) \in \mathcal{A}_n^c} p_0(x^n, y^n)$$

and the type II error probability is

$$P_1(\mathcal{A}_n) = \sum_{(x^n, y^n) :\, (m_1(x^n), y^n) \in \mathcal{A}_n} p_1(x^n, y^n).$$

Fix $\epsilon \in (0,1)$ and define the optimal type II error probability as $\beta_n(R, \epsilon) := \min P_1(\mathcal{A}_n)$, where the minimum is over all $(2^{nR}, n)$ tests such that $P_0(\mathcal{A}_n^c) \le \epsilon$. Further define the optimal type II error exponent as

$$\theta_1(R, \epsilon) := \lim_{n \to \infty} -\frac{1}{n} \log \beta_n(R, \epsilon).$$

Now suppose that the two hypotheses are

$$H_0: (X,Y) \sim p_0(x,y), \qquad H_1: (X,Y) \sim p_1(x,y) = p_0(x)\, p_0(y),$$

where $p_0(x)$ and $p_0(y)$ are the marginal distributions of $p_0(x,y)$. For this special case of hypothesis testing against independence, Ahlswede and Csiszár established the following.

Theorem 1 (Ahlswede and Csiszár [2]): For every $\epsilon \in (0,1)$,

$$\theta_1(R, \epsilon) = \max_{p(u|x) :\, R \ge I(U; X)} I(U; Y), \qquad (1)$$

where the cardinality bound $|\mathcal{U}| \le |\mathcal{X}| + 1$ holds.

We illustrate the theorem with the following example.

Example 1: Consider the Z binary sources $(X,Y)$ depicted in Fig. 3, where $Y$ is the output of $X$ through a Z channel (equivalently, $X$ is the output of $Y$ through an inverted Z channel) and

$$p_{X,Y}(0,0) = 1/2, \quad p_{X,Y}(0,1) = 0, \quad p_{X,Y}(1,0) = 1/4, \quad p_{X,Y}(1,1) = 1/4.$$

Fig. 3. Two equivalent representations of $(X, Y)$: $X \sim \mathrm{Bern}(1/2)$ through a Z channel with crossover probability $1/2$, or $Y \sim \mathrm{Bern}(1/4)$ through an inverted Z channel.

We now apply Theorem 1 and evaluate the optimal type II error exponent in (1). Since $|\mathcal{U}| \le |\mathcal{X}| + 1 = 3$, we can optimize over all conditional pmfs $p(u|x)$ of the form in Fig. 4, with $p(u|x{=}0) = (c, d, 1-c-d)$ and $p(u|x{=}1) = (a, b, 1-a-b)$ over $u \in \{0, 1, 2\}$.

Fig. 4. Conditional pmf $p(u|x)$.

Then we have

$$\theta_1(R, \epsilon) = \max\Big( H_1 - \tfrac{3}{4} H_3 - \tfrac{1}{4} H_4 \Big),$$

where the maximum is over all $(a, b, c, d)$ such that

$$R \ge H_1 - \tfrac{1}{2} H_2 - \tfrac{1}{2} H_4
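Since (1) is a finite-dimensional optimization, it can also be checked numerically. The following is a minimal sketch (not from the paper; the grid resolution and the rate $R = 1/2$ are illustrative choices) that grid-searches the four parameters of Fig. 4 and evaluates the two mutual informations directly; refining the grid should reproduce the value $\theta_1(1/2, \epsilon) \approx 0.1878$ quoted in Example 1.

```python
# Brute-force evaluation of Theorem 1 for the Z source of Example 1.
# Assumptions: ternary U parameterized by (a, b, c, d) as in Fig. 4, and a
# coarse grid, so the printed value slightly underestimates the true maximum.
import numpy as np
from itertools import product

p_xy = np.array([[0.5, 0.0],     # p0(x, y) for the Z source: rows x, cols y
                 [0.25, 0.25]])
p_x = p_xy.sum(axis=1)           # marginal of X: (1/2, 1/2)

def mutual_info(pj):
    """I(A;B) in bits from a 2-D joint pmf."""
    pa = pj.sum(axis=1, keepdims=True)
    pb = pj.sum(axis=0, keepdims=True)
    m = pj > 0
    return float((pj[m] * np.log2(pj[m] / (pa @ pb)[m])).sum())

R, best = 0.5, 0.0
for a, b, c, d in product(np.linspace(0, 1, 21), repeat=4):
    if a + b > 1 or c + d > 1:
        continue
    p_u_x = np.array([[c, d, 1 - c - d],     # p(u | x = 0)
                      [a, b, 1 - a - b]])    # p(u | x = 1)
    if mutual_info(p_x[:, None] * p_u_x) <= R:          # I(U;X) <= R
        best = max(best, mutual_info(p_u_x.T @ p_xy))   # I(U;Y)
print(f"theta_1({R}) >= {best:.4f}")   # approaches 0.1878 as the grid refines
```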

and $H_1$ through $H_4$ are defined as

$$H_1 := H\Big(\frac{a+c}{2}, \frac{b+d}{2}, \frac{2-a-b-c-d}{2}\Big), \qquad H_2 := H(c, d, 1-c-d),$$
$$H_3 := H\Big(\frac{a+2c}{3}, \frac{b+2d}{3}, \frac{3-a-b-2c-2d}{3}\Big), \qquad H_4 := H(a, b, 1-a-b).$$

For example, when $R = 1/2$, we have $\theta_1(R, \epsilon) \approx 0.1878$. The entire curve of $\theta_1(R, \epsilon)$ is plotted in Fig. 7 in Section III.

III. INTERACTIVE HYPOTHESIS TESTING WITH COMMUNICATION CONSTRAINTS

Suppose that instead of making an immediate decision based on one round of communication, the two nodes can communicate interactively over a noiseless bidirectional link before one of the nodes performs the hypothesis test. We wish to characterize the optimal tradeoff between the communication rates and the performance of hypothesis testing. For simplicity of discussion, we focus on the two-round case depicted in Fig. 5.

Fig. 5. Interactive hypothesis testing with communication constraints: node 1 observes $X^n$ and sends an index of rate $R_1$, node 2 observes $Y^n$ and replies with an index of rate $R_2$, and node 1 declares $\hat{H}$.

As before, we consider testing against independence, i.e.,

$$H_0: (X,Y) \sim p_0(x,y), \qquad H_1: (X,Y) \sim p_1(x,y) = p_0(x)\, p_0(y).$$

A $(2^{nR_1}, 2^{nR_2}, n)$ hypothesis test consists of a round-1 encoder that assigns an index $m_1(x^n) \in [1 : 2^{nR_1}]$ to each sequence $x^n \in \mathcal{X}^n$, a round-2 encoder that assigns an index $m_2(m_1, y^n) \in [1 : 2^{nR_2}]$ to each $(m_1, y^n) \in [1 : 2^{nR_1}] \times \mathcal{Y}^n$, and a tester that assigns $\hat{h}(m_2, x^n) \in \{H_0, H_1\}$ to each $(m_2, x^n) \in [1 : 2^{nR_2}] \times \mathcal{X}^n$. The acceptance region is defined as

$$\mathcal{A}_n := \{(m_2, x^n) \in [1 : 2^{nR_2}] \times \mathcal{X}^n : \hat{h}(m_2, x^n) = H_0\}.$$

The type I error probability is $P_0(\mathcal{A}_n^c)$ and the type II error probability is $P_1(\mathcal{A}_n)$. Fix $\epsilon \in (0,1)$ and define the optimal type II error probability as $\beta_n(R_1, R_2, \epsilon) := \min P_1(\mathcal{A}_n)$, where the minimum is over all $(2^{nR_1}, 2^{nR_2}, n)$ tests with $P_0(\mathcal{A}_n^c) \le \epsilon$. Further define the optimal type II error exponent as

$$\theta_2(R_1, R_2, \epsilon) := \lim_{n \to \infty} -\frac{1}{n} \log \beta_n(R_1, R_2, \epsilon).$$

Remark 1: The optimal type II error exponent is lower bounded as

$$\theta_2(R_1, R_2, \epsilon) \ge \max\big\{\theta_2(R_1 + R_2, 0, \epsilon),\; \theta_2(0, R_1 + R_2, \epsilon)\big\} \ge \theta_1(R_1 + R_2, \epsilon).$$

We establish the optimal tradeoff between the rate constraints and the testing performance by characterizing $\theta_2(R_1, R_2, \epsilon)$ as $\epsilon \to 0$. We are now ready to state the main result of the paper.

Theorem 2:

$$\lim_{\epsilon \to 0} \theta_2(R_1, R_2, \epsilon) = \max_{\substack{p(u_1|x)\, p(u_2|u_1,y):\\ R_1 \ge I(U_1;X),\; R_2 \ge I(U_2;Y|U_1)}} \big( I(U_1; Y) + I(U_2; X | U_1) \big), \qquad (2)$$

where $|\mathcal{U}_1| \le |\mathcal{X}| + 1$ and $|\mathcal{U}_2| \le |\mathcal{Y}|\,|\mathcal{U}_1| + 1$.

Remark 2: By setting $U_2 = \emptyset$ and $R_2 = 0$, Theorem 2 recovers the optimal one-way type II error exponent in Theorem 1.

Remark 3: We can express the optimal tradeoff between the communication constraints and the type II error exponent by the rate-exponent region that consists of all rate-exponent triples $(R_1, R_2, \theta)$ such that

$$R_1 \ge I(U_1; X), \qquad R_2 \ge I(U_2; Y | U_1), \qquad \theta \le I(U_1; Y) + I(U_2; X | U_1)$$

for some conditional pmfs $p(u_1|x)\, p(u_2|u_1, y)$.

Example 2 (Interaction helps): We revisit the Z binary sources in Fig. 3 and show that two-round interaction can strictly outperform the one-way case. While the optimal type II error exponent in Theorem 2 can be evaluated directly, we instead use the simple lower bound on $\theta_2(R_1, R_2, \epsilon)$ discussed in Remark 1. Consider

$$\lim_{\epsilon \to 0} \theta_2(R_1, R_2, \epsilon) \ge \lim_{\epsilon \to 0} \theta_2(0, R_1 + R_2, \epsilon) \stackrel{(a)}{=} \max_{p(u_2|y) :\, R_1 + R_2 \ge I(U_2; Y)} I(U_2; X),$$

where $(a)$ follows by Theorem 1 (with the roles of $X$ and $Y$ interchanged). Since $|\mathcal{U}_2| \le 3$, we can again optimize over all conditional pmfs $p(u_2|y)$ of the form in Fig. 6, with $p(u_2|y{=}0) = (a, b, 1-a-b)$ and $p(u_2|y{=}1) = (c, d, 1-c-d)$.

Fig. 6. Conditional pmf $p(u_2|y)$.

This yields

$$\theta_2(0, R_1 + R_2, \epsilon) = \max\Big( H_1 - \tfrac{1}{2} H_2 - \tfrac{1}{2} H_3 \Big),$$

where the maximum is over all $(a, b, c, d)$ such that

$$R_1 + R_2 \ge H_1 - \tfrac{3}{4} H_2 - \tfrac{1}{4} H_4$$

and

$$H_1 := H\Big(\frac{3a+c}{4}, \frac{3b+d}{4}, \frac{4-3a-3b-c-d}{4}\Big), \qquad H_2 := H(a, b, 1-a-b),$$
$$H_3 := H\Big(\frac{a+c}{2}, \frac{b+d}{2}, \frac{2-a-b-c-d}{2}\Big), \qquad H_4 := H(c, d, 1-c-d).$$
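The comparison behind Fig. 7 (discussed next) can be reproduced numerically. Here is a rough sketch (not from the paper; the grid resolution and the sum rate $0.5$ are arbitrary illustrative choices): it evaluates the one-way exponent of Theorem 1 in both directions, i.e., $\theta_1(R_1 + R_2, \epsilon)$ versus the reverse-direction bound $\theta_2(0, R_1 + R_2, \epsilon)$ from Remark 1, for the Z source.

```python
# Forward vs. reverse one-way exponents for the Z source (Example 2).
# Assumptions: ternary auxiliary U, coarse grid search, sum rate 0.5.
import numpy as np
from itertools import product

p_xy = np.array([[0.5, 0.0], [0.25, 0.25]])   # p0(x, y)

def mutual_info(pj):
    pa, pb = pj.sum(1, keepdims=True), pj.sum(0, keepdims=True)
    m = pj > 0
    return float((pj[m] * np.log2(pj[m] / (pa @ pb)[m])).sum())

def one_way(p_ab, rate, steps=21):
    """max I(U;B) over p(u|a) subject to I(U;A) <= rate (A = observed source)."""
    p_a, best = p_ab.sum(axis=1), 0.0
    for a, b, c, d in product(np.linspace(0, 1, steps), repeat=4):
        if a + b > 1 or c + d > 1:
            continue
        p_u_a = np.array([[a, b, 1 - a - b], [c, d, 1 - c - d]])
        if mutual_info(p_a[:, None] * p_u_a) <= rate:
            best = max(best, mutual_info(p_u_a.T @ p_ab))
    return best

rate = 0.5
fwd = one_way(p_xy, rate)     # theta_1(R1+R2): node 1 quantizes X
rev = one_way(p_xy.T, rate)   # theta_2(0, R1+R2): node 2 quantizes Y instead
print(f"one-way {fwd:.4f} vs. reverse one-way {rev:.4f}")
```

Per Example 2, the reverse direction should come out strictly larger at every sum rate in $(0, 1)$, which is exactly the gap plotted in Fig. 7.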

Fig. 7 numerically compares the one-way exponent $\theta_1(R_1 + R_2, \epsilon)$ with the lower bound $\theta_2(0, R_1 + R_2, \epsilon)$ on the two-round exponent $\theta_2(R_1, R_2, \epsilon)$ as $\epsilon \to 0$. For every value of the sum rate $R_1 + R_2 \in (0, 1)$,

$$\theta_2(0, R_1 + R_2, \epsilon) > \theta_1(R_1 + R_2, \epsilon),$$

and thus there is a strict improvement from using interaction. In fact, it can be shown that the gap increases as the crossover probability of the Z channel increases, that is, as the channel becomes more and more skewed.

Fig. 7. Comparison of the one-way case with the two-round case. The solid red curve corresponds to the lower bound $\theta_2(0, R_1 + R_2, \epsilon)$ for the two-round case and the dotted blue curve corresponds to $\theta_1(R_1 + R_2, \epsilon)$ for the one-way case.

In the following two subsections, we prove Theorem 2 by establishing achievability and the weak converse.

A. Proof of Achievability

Codebook generation. Fix a conditional pmf $p(u_1, u_2 | x, y) = p(u_1|x)\, p(u_2|u_1, y)$ that attains the maximum in (2). Let $p_0(u_1) = \sum_x p_0(x)\, p(u_1|x)$ and let $p_0(u_2|u_1) = \sum_y p_0(y|u_1)\, p(u_2|u_1, y)$ be the conditional pmf of $U_2$ given $U_1$ under $H_0$. Randomly and independently generate $2^{nR_1}$ sequences $u_1^n(m_1)$, $m_1 \in [1 : 2^{nR_1}]$, each according to $\prod_{i=1}^n p_0(u_{1i})$. For each $m_1$, randomly and independently generate $2^{nR_2}$ sequences $u_2^n(m_2|m_1)$, $m_2 \in [1 : 2^{nR_2}]$, each according to $\prod_{i=1}^n p_0(u_{2i} | u_{1i}(m_1))$. These sequences constitute the codebook $\mathcal{C}$, which is revealed to both nodes.

Encoding for round 1. Given a sequence $x^n$, node 1 finds an index $m_1$ such that $(x^n, u_1^n(m_1)) \in \mathcal{T}_{\epsilon'}^{(n)}$. If there is more than one such index, it sends the smallest one among them. If there is no such index, it selects an index from $[1 : 2^{nR_1}]$ uniformly at random.

Encoding for round 2. Given $y^n$ and $m_1$, node 2 finds an index $m_2$ such that $(y^n, u_1^n(m_1), u_2^n(m_2|m_1)) \in \mathcal{T}_{\epsilon''}^{(n)}$. If there is more than one such index, it selects one of them uniformly at random. If there is no such index, it selects an index from $[1 : 2^{nR_2}]$ uniformly at random.

Testing. Upon receiving $m_2$, node 1 sets the acceptance region $\mathcal{A}_n$ for $H_0$ to

$$\mathcal{A}_n = \{(m_2, x^n) : (u_1^n(m_1), u_2^n(m_2|m_1), x^n) \in \mathcal{T}_\epsilon^{(n)}\},$$

where the jointly typical set $\mathcal{T}_\epsilon^{(n)} = \mathcal{T}_\epsilon^{(n)}(U_1, U_2, X)$ is defined with respect to $p_0(x,y)$, $p(u_1|x)$, and $p(u_2|u_1, y)$, and $0 < \epsilon' < \epsilon'' < \epsilon$.

Analysis of the two types of error. Let $(M_1, M_2)$ denote the indices chosen at nodes 1 and 2, respectively. Node 1 chooses $\hat{H} = H_1$ iff one or more of the following events occur:

$$\mathcal{E}_1 = \{(U_1^n(m_1), X^n) \notin \mathcal{T}_{\epsilon'}^{(n)} \text{ for all } m_1 \in [1 : 2^{nR_1}]\},$$
$$\mathcal{E}_2 = \{(U_2^n(m_2|M_1), U_1^n(M_1), Y^n) \notin \mathcal{T}_{\epsilon''}^{(n)} \text{ for all } m_2 \in [1 : 2^{nR_2}]\},$$
$$\mathcal{E}_3 = \{(U_1^n(M_1), U_2^n(M_2|M_1), X^n) \notin \mathcal{T}_\epsilon^{(n)}\}.$$

For the type I error probability, assume that $H_0$ is true. Then

$$\alpha_n = P(\mathcal{E}_1 \cup \mathcal{E}_2 \cup \mathcal{E}_3) \le P(\mathcal{E}_1) + P(\mathcal{E}_1^c \cap \mathcal{E}_2) + P(\mathcal{E}_1^c \cap \mathcal{E}_2^c \cap \mathcal{E}_3).$$

We now bound each term. By the covering lemma [11, Section 3.7], $P(\mathcal{E}_1)$ tends to zero as $n \to \infty$ if $R_1 > I(U_1; X) + \delta(\epsilon')$. For the second term, since $\epsilon'' > \epsilon' > 0$, $\mathcal{E}_1^c = \{(U_1^n(M_1), X^n) \in \mathcal{T}_{\epsilon'}^{(n)}\}$ and

$$Y^n \mid \{U_1^n(M_1) = u_1^n, X^n = x^n\} \sim \prod_{i=1}^n p_0(y_i | u_{1i}, x_i) = \prod_{i=1}^n p_0(y_i | x_i),$$

so by the conditional typicality lemma [11, Section 2.5], $P\{(U_1^n(M_1), X^n, Y^n) \notin \mathcal{T}_{\epsilon''}^{(n)}\}$ tends to zero as $n \to \infty$, and thus so does $P\{(U_1^n(M_1), Y^n) \notin \mathcal{T}_{\epsilon''}^{(n)}\}$. Therefore, again by the covering lemma, $P(\mathcal{E}_1^c \cap \mathcal{E}_2)$ tends to zero as $n \to \infty$ if $R_2 > I(U_2; Y | U_1) + \delta(\epsilon'')$.

To bound the last term, we use a version of the Markov lemma [5] in [11, Section 12.1]. Let $(x^n, u_1^n, y^n) \in \mathcal{T}_{\epsilon''}^{(n)}$ and consider

$$P\{U_2^n(M_2) = u_2^n \mid X^n = x^n, U_1^n(M_1) = u_1^n, Y^n = y^n\} = P\{U_2^n(M_2) = u_2^n \mid U_1^n(M_1) = u_1^n, Y^n = y^n\} =: p(u_2^n | u_1^n, y^n).$$

First note that, by the covering lemma, $P\{U_2^n(M_2) \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n) \mid U_1^n(M_1) = u_1^n, Y^n = y^n\}$ tends to one as $n \to \infty$; that is, $p(u_2^n | u_1^n, y^n)$ satisfies the first condition in the Markov lemma.
For the second condition, we need the following lemma, which is proved in the Appendix.

Lemma 1: For every $u_2^n \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n)$ and $n$ sufficiently large,

$$2^{-n(H(U_2|U_1,Y) + \delta(\epsilon''))} \le p(u_2^n | u_1^n, y^n) \le 2^{-n(H(U_2|U_1,Y) - \delta(\epsilon''))}.$$

Hence, by the Markov lemma,

$$P\{(x^n, u_1^n, y^n, U_2^n(M_2)) \in \mathcal{T}_\epsilon^{(n)} \mid X^n = x^n, U_1^n(M_1) = u_1^n, Y^n = y^n\}$$

tends to one as $n \to \infty$ if $(u_1^n, x^n, y^n) \in \mathcal{T}_{\epsilon''}^{(n)}(U_1, X, Y)$ and $\epsilon'' < \epsilon$ is sufficiently small. Therefore, $P(\mathcal{E}_1^c \cap \mathcal{E}_2^c \cap \mathcal{E}_3) \to 0$ as $n \to \infty$.

For the type II error probability, assume in this case that $H_1$ is true. Then

$$\beta_n = P(\mathcal{E}_1^c \cap \mathcal{E}_2^c \cap \mathcal{E}_3^c) = P(\mathcal{E}_1^c)\, P(\mathcal{E}_2^c | \mathcal{E}_1^c)\, P(\mathcal{E}_3^c | \mathcal{E}_1^c \cap \mathcal{E}_2^c).$$

We now bound each factor. By the covering lemma, $P(\mathcal{E}_1^c)$ tends to one as $n \to \infty$ if $R_1 > I(U_1; X) + \delta(\epsilon')$. Next, define the event $\tilde{\mathcal{E}} = \{(U_1^n(M_1), Y^n) \notin \mathcal{T}_{\epsilon''}^{(n)}\}$ and note that $\mathcal{E}_2^c \subseteq \tilde{\mathcal{E}}^c$, since joint typicality of $(U_2^n(m_2|M_1), U_1^n(M_1), Y^n)$ implies joint typicality of $(U_1^n(M_1), Y^n)$. Under $H_1$, $Y^n$ is independent of $(U_1^n(M_1), X^n)$, so

$$P(\mathcal{E}_2^c | \mathcal{E}_1^c) \le P(\tilde{\mathcal{E}}^c | \mathcal{E}_1^c) = \sum_{u_1^n} P\{U_1^n(M_1) = u_1^n \mid \mathcal{E}_1^c\} \sum_{y^n :\, (u_1^n, y^n) \in \mathcal{T}_{\epsilon''}^{(n)}} \prod_{i=1}^n p_0(y_i) \le 2^{-n(I(U_1;Y) - \delta(\epsilon''))}.$$

For the third factor $P(\mathcal{E}_3^c | \mathcal{E}_1^c \cap \mathcal{E}_2^c)$, we need the following.

Lemma 2: If $H_1$ is true, we have $p_1(u_2, x | u_1) = p_1(u_2 | u_1)\, p_1(x | u_1)$.

Proof: We have

$$p_1(u_2, x | u_1) = \sum_y p_1(u_2, y, x | u_1) = \sum_y p_1(y, x | u_1)\, p(u_2 | y, u_1) = \sum_y p_1(x | y, u_1)\, p_1(y | u_1)\, p(u_2 | y, u_1) = \sum_y p_1(x | u_1)\, p_1(u_2, y | u_1) = p_1(u_2 | u_1)\, p_1(x | u_1),$$

where the fourth equality holds since $X$ and $Y$ are independent under $H_1$, so that $p_1(x | y, u_1) = p_1(x | u_1)$.

Now we bound $P(\mathcal{E}_3^c | \mathcal{E}_1^c \cap \mathcal{E}_2^c)$. Given that $(U_1^n(M_1), X^n) \in \mathcal{T}_{\epsilon'}^{(n)}$ and $(U_2^n(M_2|M_1), U_1^n(M_1), Y^n) \in \mathcal{T}_{\epsilon''}^{(n)}$, we have

$$P(\mathcal{E}_3^c | \mathcal{E}_1^c \cap \mathcal{E}_2^c) = \sum_{(u_1^n, u_2^n, x^n) \in \mathcal{T}_\epsilon^{(n)}} P\{U_1^n(M_1) = u_1^n,\, U_2^n(M_2|M_1) = u_2^n,\, X^n = x^n \mid \mathcal{E}_1^c \cap \mathcal{E}_2^c\}$$
$$\le \sum_{(u_1^n, u_2^n, x^n) \in \mathcal{T}_\epsilon^{(n)}} p_1(u_2^n | u_1^n)\, p_1(x^n, u_1^n) \le 2^{n(H(U_1,U_2,X) + \delta(\epsilon))}\, 2^{-n(H(U_2|U_1) - \delta(\epsilon''))}\, 2^{-n(H(U_1,X) - \delta(\epsilon''))} \le 2^{-n(I(U_2;X|U_1) - \delta(\epsilon))},$$

where the last step uses $H(U_1, U_2, X) - H(U_2|U_1) - H(U_1, X) = H(U_2|U_1, X) - H(U_2|U_1) = -I(U_2; X | U_1)$, with $H(U_2|U_1, X)$ evaluated under the product distribution of Lemma 2. Combining the bounds on the three factors, we have

$$\beta_n \le 2^{-n(I(U_1;Y) + I(U_2;X|U_1) - \delta(\epsilon))}.$$

In summary, the type I error probability averaged over all codebooks is upper bounded by $\epsilon$ if $R_1 \ge I(U_1; X)$ and $R_2 \ge I(U_2; Y | U_1)$, while the type II error probability averaged over all codebooks is upper bounded (in the exponent) by $2^{-n(I(U_1;Y) + I(U_2;X|U_1) - \delta(\epsilon))}$. Therefore, there exists a codebook such that

$$\theta_2(R_1, R_2, \epsilon) \ge I(U_1; Y) + I(U_2; X | U_1).$$

This completes the achievability proof.
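The conditional independence in Lemma 2 is a single-letter identity and easy to sanity-check numerically. The following minimal sketch (not from the paper; the alphabet sizes and the randomly drawn pmfs are arbitrary, seeded choices) builds the joint distribution $p_1(x, y, u_1, u_2) = p_0(x)\,p_0(y)\,p(u_1|x)\,p(u_2|u_1,y)$ under $H_1$ and verifies the factorization $p_1(u_2, x | u_1) = p_1(u_2|u_1)\,p_1(x|u_1)$ to machine precision.

```python
# Numerical check of Lemma 2: U2 - U1 - X is a Markov chain under H1.
import numpy as np

rng = np.random.default_rng(1)
nx, ny, nu1, nu2 = 2, 2, 3, 3
p_x = rng.dirichlet(np.ones(nx))                        # p0(x)
p_y = rng.dirichlet(np.ones(ny))                        # p0(y)
p_u1_x = rng.dirichlet(np.ones(nu1), size=nx)           # p(u1|x)
p_u2_u1y = rng.dirichlet(np.ones(nu2), size=(nu1, ny))  # p(u2|u1,y)

# joint under H1, axes (x, y, u1, u2): p0(x) p0(y) p(u1|x) p(u2|u1,y)
J = np.einsum('x,y,xa,ayb->xyab', p_x, p_y, p_u1_x, p_u2_u1y)

p_u1 = J.sum(axis=(0, 1, 3))                         # p1(u1)
lhs = J.sum(axis=1) / p_u1[None, :, None]            # p1(x, u2 | u1)
p_x_u1 = J.sum(axis=(1, 3)) / p_u1                   # p1(x | u1)
p_u2_u1 = J.sum(axis=(0, 1)) / p_u1[:, None]         # p1(u2 | u1)
rhs = p_x_u1[:, :, None] * p_u2_u1[None, :, :]
print("max deviation:", np.abs(lhs - rhs).max())     # ~1e-16
```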

B. Proof of the Converse

Given a $(2^{nR_1}, 2^{nR_2}, n)$ test characterized by the encoding functions $m_1$ and $m_2$ and the acceptance region $\mathcal{A}_n$, let $M_1 = m_1(X^n)$ and $M_2 = m_2(M_1, Y^n)$, and let $\alpha := P_0(\mathcal{A}_n^c)$ and $\beta := P_1(\mathcal{A}_n)$. By the data processing inequality for relative entropy, we have

$$D\Big( \sum_{y^n} p_0(x^n, y^n)\, p(m_1|x^n)\, p(m_2|m_1, y^n) \,\Big\|\, \sum_{y^n} p_0(x^n)\, p_0(y^n)\, p(m_1|x^n)\, p(m_2|m_1, y^n) \Big) \ge (1-\alpha) \log \frac{1-\alpha}{\beta} + \alpha \log \frac{\alpha}{1-\beta}.$$

Since $H(M_1) \le nR_1$, $H(M_2) \le nR_2$, $\alpha \le \epsilon$, and $\beta \ge \beta_n(R_1, R_2, \epsilon)$ by the definition of $\beta_n(R_1, R_2, \epsilon)$, we have

$$(1-\alpha) \log \frac{1-\alpha}{\beta} + \alpha \log \frac{\alpha}{1-\beta} \ge (1-\alpha) \log \frac{1}{\beta} - H(\alpha) \ge (1-\epsilon) \log \frac{1}{\beta} - H(\alpha),$$

where $H(\alpha)$ denotes the binary entropy function. Thus we have the multiletter upper bound

$$\lim_{\epsilon \to 0} \theta_2(R_1, R_2, \epsilon) \le \lim_{n \to \infty} \frac{1}{n} D\big( p_0(m_1, m_2, x^n) \,\big\|\, p_1(m_1, m_2, x^n) \big),$$

where

$$p_0(m_1, m_2, x^n) = \sum_{y^n} p_0(x^n, y^n)\, p(m_1|x^n)\, p(m_2|m_1, y^n), \qquad p_1(m_1, m_2, x^n) = \sum_{y^n} p_0(x^n)\, p_0(y^n)\, p(m_1|x^n)\, p(m_2|m_1, y^n).$$

The relative entropy term is upper bounded as

$$D\big( p_0(m_1, m_2, x^n) \,\big\|\, p_1(m_1, m_2, x^n) \big) = D\big( p_0(x^n, m_2 | m_1)\, p_0(m_1) \,\big\|\, p_1(x^n | m_1)\, p_1(m_2 | m_1)\, p_1(m_1) \big)$$
$$= \sum_{x^n, m_1, m_2} p_0(x^n, m_1, m_2) \log \frac{p_0(m_2 | m_1, x^n)}{p_1(m_2 | m_1)}$$
$$= \sum_{x^n, m_1, m_2} p_0(x^n, m_1, m_2) \log \frac{p_0(m_2 | m_1, x^n)}{p_0(m_2 | m_1)} + \sum_{m_1, m_2} p_0(m_1, m_2) \log \frac{p_0(m_2 | m_1)}{p_1(m_2 | m_1)}$$
$$= I(X^n; M_2 | M_1) + \sum_{m_1, m_2} p_0(m_1, m_2) \log \frac{p_0(m_2 | m_1)}{p_1(m_2 | m_1)}, \qquad (3)$$

where $p_0(m_2 | m_1, x^n)$, $p_0(m_2 | m_1)$, and $p_1(m_2 | m_1)$ are defined as

$$p_0(m_2 | m_1, x^n) := \sum_{y^n} p_0(y^n | x^n)\, p(m_2 | m_1, y^n), \quad p_0(m_2 | m_1) := \sum_{y^n} p_0(y^n | m_1)\, p(m_2 | m_1, y^n), \quad p_1(m_2 | m_1) := \sum_{y^n} p_0(y^n)\, p(m_2 | m_1, y^n).$$

The second term in (3) is upper bounded as

$$\sum_{m_1, m_2} p_0(m_1, m_2) \log \frac{p_0(m_2 | m_1)}{p_1(m_2 | m_1)} = D\big( p_0(m_1, m_2) \,\big\|\, p_1(m_1, m_2) \big) \le D\big( p_0(m_1)\, p_0(y^n | m_1)\, p(m_2 | m_1, y^n) \,\big\|\, p_0(m_1)\, p_0(y^n)\, p(m_2 | m_1, y^n) \big) = I(M_1; Y^n),$$

where the inequality follows by the data processing inequality for relative entropy. Thus we have

$$\theta^* := \lim_{\epsilon \to 0} \theta_2(R_1, R_2, \epsilon) \le \lim_{n \to \infty} \frac{1}{n} \big( I(X^n; M_2 | M_1) + I(M_1; Y^n) \big). \qquad (4)$$

To complete the converse proof, we single-letterize the upper bound in (4) in the following steps. First consider

$$nR_1 \ge H(M_1) \ge I(M_1; X^n) = \sum_{i=1}^n I(M_1; X_i | X^{i-1}) \stackrel{(a)}{\ge} \sum_{i=1}^n I(M_1, X^{i-1}, Y^{i-1}; X_i),$$

where $(a)$ follows from the fact that $(X^n, Y^n)$ is i.i.d., so that $I(M_1; X_i | X^{i-1}) = H(X_i) - H(X_i | M_1, X^{i-1}) \ge H(X_i) - H(X_i | M_1, X^{i-1}, Y^{i-1})$. Next consider

$$nR_2 \ge H(M_2) \ge I(M_2; X^n, Y^n | M_1) = \sum_{i=1}^n I(M_2; X_i, Y_i | M_1, X^{i-1}, Y^{i-1}) \ge \sum_{i=1}^n I(M_2; Y_i | M_1, X^{i-1}, Y^{i-1}).$$

Now the mutual information term $I(M_1; Y^n)$ is upper bounded as

$$I(M_1; Y^n) = \sum_{i=1}^n I(M_1; Y_i | Y^{i-1}) \le \sum_{i=1}^n I(M_1, Y^{i-1}; Y_i) \le \sum_{i=1}^n I(M_1, X^{i-1}, Y^{i-1}; Y_i).$$

Finally, $I(M_2; X^n | M_1)$ is upper bounded as

$$I(M_2; X^n | M_1) = \sum_{i=1}^n I(M_2; X_i | M_1, X^{i-1}) \stackrel{(a)}{=} \sum_{i=1}^n \big( I(M_2; X_i | M_1, X^{i-1}, Y^{i-1}) + I(M_2; Y^{i-1} | M_1, X^{i-1}) - I(M_2; Y^{i-1} | M_1, X^{i-1}, X_i) \big) \stackrel{(b)}{\le} \sum_{i=1}^n I(M_2; X_i | M_1, X^{i-1}, Y^{i-1}),$$

where $(a)$ follows since, expanding the chain rule in two ways,

$$I(M_2; X_i, Y^{i-1} | M_1, X^{i-1}) = I(M_2; X_i | M_1, X^{i-1}) + I(M_2; Y^{i-1} | M_1, X^{i-1}, X_i) = I(M_2; Y^{i-1} | M_1, X^{i-1}) + I(M_2; X_i | M_1, X^{i-1}, Y^{i-1}),$$

and $(b)$ follows since

$$I(M_2; Y^{i-1} | M_1, X^{i-1}) - I(M_2; Y^{i-1} | M_1, X^{i-1}, X_i)$$
$$= H(Y^{i-1} | M_1, X^{i-1}) - H(Y^{i-1} | M_1, X^{i-1}, X_i) - H(Y^{i-1} | M_1, M_2, X^{i-1}) + H(Y^{i-1} | M_1, M_2, X^{i-1}, X_i)$$
$$= I(Y^{i-1}; X_i | M_1, X^{i-1}) - I(Y^{i-1}; X_i | M_1, M_2, X^{i-1}) \stackrel{(c)}{\le} 0.$$

Here the inequality $(c)$ holds since $X_i \to (M_1, X^{i-1}) \to Y^{i-1}$ forms a Markov chain, so the first term is zero while the second is nonnegative. Identifying $U_{1i} = (M_1, X^{i-1}, Y^{i-1})$ and $U_{2i} = M_2$, note that $U_{1i} \to X_i \to Y_i$ and $U_{2i} \to (U_{1i}, Y_i) \to X_i$ form two Markov chains. Thus, from

$$nR_1 \ge \sum_{i=1}^n I(M_1, X^{i-1}, Y^{i-1}; X_i) = \sum_{i=1}^n I(U_{1i}; X_i)$$

and

$$nR_2 \ge \sum_{i=1}^n I(M_2; Y_i | M_1, X^{i-1}, Y^{i-1}) = \sum_{i=1}^n I(U_{2i}; Y_i | U_{1i}),$$

we have

$$n\theta^* \le \sum_{i=1}^n \big( I(M_1, X^{i-1}, Y^{i-1}; Y_i) + I(M_2; X_i | M_1, X^{i-1}, Y^{i-1}) \big) = \sum_{i=1}^n \big( I(U_{1i}; Y_i) + I(U_{2i}; X_i | U_{1i}) \big).$$

Define the time-sharing random variable $Q$ to be uniformly distributed over $[1 : n]$ and independent of $(M_1, M_2, X^n, Y^n)$, and identify $U_1 = (Q, U_{1Q})$, $U_2 = (Q, U_{2Q})$, $X = X_Q$, and $Y = Y_Q$. Clearly, $U_1 \to X \to Y$ and $U_2 \to (U_1, Y) \to X$ form two Markov chains. Thus we have

$$\lim_{\epsilon \to 0} \theta_2(R_1, R_2, \epsilon) \le \max_{\substack{p(u_1|x)\, p(u_2|u_1,y):\\ R_1 \ge I(U_1;X),\; R_2 \ge I(U_2;Y|U_1)}} \big( I(U_1; Y) + I(U_2; X | U_1) \big),$$

which matches the achievable exponent in (2). Finally, the cardinality bounds on $\mathcal{U}_1$ and $\mathcal{U}_2$ follow from the standard technique, in particular, the one used for the two-round interactive lossy source coding problem [10]. This completes the converse proof.

IV. DISCUSSION

Consider the interactive lossy source coding problem first studied by Kaspi [10], as depicted in Fig. 8. Here two nodes communicate interactively with each other so that each node can reconstruct the source observed by the other node with a prescribed distortion. Kaspi [10] established the general $q$-round rate-distortion region. Ma and Ishwar [13] provided an ingenious example showing that interactive communication can strictly outperform one-way communication. In this section, we compare the two-round interactive hypothesis testing problem with the two-round interactive lossy source coding problem. For the formal definition of the latter, refer to [10] or [11, Section 20.3]. We first recall the optimal tradeoff between communication constraints and distortion constraints.

Fig. 8. Interactive lossy source coding: node 1 observes $X^n$ and reconstructs $(\hat{Y}^n, D_Y)$; node 2 observes $Y^n$ and reconstructs $(\hat{X}^n, D_X)$.

Theorem 3 (Kaspi [10]): The two-round rate-distortion region is the set of all rate pairs $(R_1, R_2)$ such that

$$R_1 \ge I(U_1; X) - I(U_1; Y), \qquad R_2 \ge I(U_2; Y | U_1) - I(U_2; X | U_1)$$

for some $p(u_1|x)\, p(u_2|u_1, y)$ with $|\mathcal{U}_1| \le |\mathcal{X}| + 1$ and $|\mathcal{U}_2| \le |\mathcal{Y}|\,|\mathcal{U}_1| + 1$, and functions $\hat{x}(u_1, u_2, y)$ and $\hat{y}(u_1, u_2, x)$ that satisfy $\mathrm{E}(d(X, \hat{X})) \le D_X$ and $\mathrm{E}(d(Y, \hat{Y})) \le D_Y$.

Achievability is established by performing Wyner-Ziv coding [14] in each round, i.e., joint typicality encoding followed by binning. By contrast, the scheme we used for the interactive hypothesis testing problem performs joint typicality encoding in each round without binning. The excess communication rates incurred by not binning pay off in the type II error exponent; see Remark 3. It turns out, however, that this distinction between binning and no binning is not fundamental. By using Wyner-Ziv coding in the interactive hypothesis testing problem, we can establish the following tradeoff between communication constraints and testing performance.

Proposition 1: The rate-exponent region for two-round interactive hypothesis testing is the set of rate-exponent triples $(R_1, R_2, \theta)$ such that

$$\theta \le I(U_1; Y) + I(U_2; X | U_1),$$
$$R_1 \ge I(U_1; X) - I(U_1; Y),$$
$$R_2 \ge I(U_2; Y | U_1) - I(U_2; X | U_1),$$
$$R_1 + R_2 \ge \theta + I(U_1; X) + I(U_2; Y | U_1) - I(U_1; Y) - I(U_2; X | U_1)$$

for some $p(u_1|x)\, p(u_2|u_1, y)$.

It can be shown that the region in Proposition 1 is equivalent to the region in Remark 3 (and hence yields the optimal error exponent in Theorem 2). As pointed out by Rahman and Wagner [9] in the one-way setup, binning never hurts. Therefore, the coding scheme for two-round interactive lossy source coding leads to an essentially identical scheme for two-round interactive hypothesis testing, which is optimal!
This equivalence can be extended to the general $q$-round interactive hypothesis testing and lossy source coding problems. We will explore this connection further elsewhere.
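For completeness, the full two-round exponent of Theorem 2 (equivalently, the region of Remark 3, or of Proposition 1) can also be explored numerically. The sketch below (not from the paper) uses random search over pmfs $p(u_1|x)\,p(u_2|u_1,y)$ for the Z source, with the cardinalities truncated to $|\mathcal{U}_1| = |\mathcal{U}_2| = 3$ and illustrative rates, so it only produces a lower estimate of the optimum.

```python
# Random-search lower estimate of the Theorem 2 exponent for the Z source.
import numpy as np

rng = np.random.default_rng(0)
p_xy = np.array([[0.5, 0.0], [0.25, 0.25]])
R1, R2, nu1, nu2, tries = 0.5, 0.5, 3, 3, 20000

def mi(pj):
    pa, pb = pj.sum(1, keepdims=True), pj.sum(0, keepdims=True)
    m = pj > 0
    return float((pj[m] * np.log2(pj[m] / (pa @ pb)[m])).sum())

def cond_mi(p_cab):
    """I(A;B|C) from a joint pmf with axes ordered (c, a, b)."""
    total = 0.0
    for slc in p_cab:
        pc = slc.sum()
        if pc > 0:
            total += pc * mi(slc / pc)
    return total

best = 0.0
for _ in range(tries):
    p_u1_x = rng.dirichlet(np.ones(nu1), size=2)              # p(u1|x)
    p_u2_u1y = rng.dirichlet(np.ones(nu2), size=(nu1, 2))     # p(u2|u1,y)
    J = np.einsum('xy,xa,ayb->xyab', p_xy, p_u1_x, p_u2_u1y)  # p0(x,y,u1,u2)
    if mi(J.sum(axis=(1, 3))) > R1:                           # I(U1;X) <= R1?
        continue
    if cond_mi(J.sum(axis=0).transpose(1, 0, 2)) > R2:        # I(U2;Y|U1) <= R2?
        continue
    theta = mi(J.sum(axis=(0, 3))) + cond_mi(J.sum(axis=1).transpose(1, 0, 2))
    best = max(best, theta)                                   # I(U1;Y)+I(U2;X|U1)
print(f"theta_2({R1},{R2}) >= {best:.4f}")
```

By Remark 1, the value found should be at least as large as the one-way exponent at sum rate $R_1 + R_2$, up to the error of the random search.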

APPENDIX: PROOF OF LEMMA 1

For every $u_2^n \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n)$,

$$P\{U_2^n(M_2) = u_2^n \mid U_1^n = u_1^n, Y^n = y^n\}$$
$$= P\{U_2^n(M_2) = u_2^n,\, U_2^n(M_2) \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n) \mid U_1^n = u_1^n, Y^n = y^n\}$$
$$\le P\{U_2^n(M_2) = u_2^n \mid U_1^n = u_1^n, Y^n = y^n,\, U_2^n(M_2) \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n)\}$$
$$= \sum_{m_2} P\{M_2 = m_2 \mid U_1^n = u_1^n, Y^n = y^n,\, U_2^n(M_2) \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n)\}\; P\{U_2^n(m_2) = u_2^n \mid U_1^n = u_1^n, Y^n = y^n,\, U_2^n(m_2) \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n),\, M_2 = m_2\}$$
$$\stackrel{(a)}{=} \sum_{m_2} P\{M_2 = m_2 \mid U_1^n = u_1^n, Y^n = y^n,\, U_2^n(M_2) \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n)\}\; P\{U_2^n(m_2) = u_2^n \mid U_2^n(m_2) \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n)\}$$
$$\stackrel{(b)}{\le} \sum_{m_2} P\{M_2 = m_2 \mid U_1^n = u_1^n, Y^n = y^n,\, U_2^n(M_2) \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n)\}\; 2^{-n(H(U_2|U_1,Y) - \delta(\epsilon''))}$$
$$= 2^{-n(H(U_2|U_1,Y) - \delta(\epsilon''))}.$$

Here $(a)$ follows since $U_2^n(m_2)$ is independent of $(Y^n, U_1^n(m_1))$ and of $U_2^n(m_2')$ for $m_2' \ne m_2$, and is conditionally independent of $M_2$ given $(Y^n, U_1^n(m_1))$ and the indicator variables of the events $\{U_2^n(m_2) \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n)\}$, $m_2 \in [1 : 2^{nR_2}]$; this implies that the event $\{U_2^n(m_2) = u_2^n\}$ is conditionally independent of $\{Y^n = y^n, U_1^n(m_1) = u_1^n, M_2 = m_2\}$ given $\{U_2^n(m_2) \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n)\}$. Step $(b)$ follows from the properties of typical sequences. Similarly, for every $u_2^n \in \mathcal{T}_{\epsilon''}^{(n)}(U_2 | u_1^n, y^n)$ and $n$ sufficiently large,

$$P\{U_2^n(M_2) = u_2^n \mid U_1^n = u_1^n, Y^n = y^n\} \ge 2^{-n(H(U_2|U_1,Y) + \delta(\epsilon''))}.$$

This completes the proof of Lemma 1.

REFERENCES

[1] T. Berger, "Decentralized estimation and decision theory," in Proc. IEEE Inf. Theory Workshop, Mt. Kisco, NY, Sep. 1979.
[2] R. Ahlswede and I. Csiszár, "Hypothesis testing with communication constraints," IEEE Trans. Inf. Theory, vol. 32, no. 4, pp. 533-542, 1986.
[3] T. S. Han, "Hypothesis testing with multiterminal data compression," IEEE Trans. Inf. Theory, vol. 33, no. 6, pp. 759-772, 1987.
[4] H. Shimokawa, T. S. Han, and S. Amari, "Error bound of hypothesis testing with data compression," in Proc. IEEE Int. Symp. Inf. Theory, Jun. 1994.
[5] S.-Y. Tung, "Multiterminal source coding," Ph.D. thesis, Cornell University, Ithaca, NY, 1978.
[6] T. Berger, "Multiterminal source coding," in The Information Theory Approach to Communications, G. Longo, Ed. New York: Springer-Verlag, 1978.
[7] T. S. Han and S. Amari, "Statistical inference under multiterminal data compression," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2300-2324, Oct. 1998.
[8] C. Tian and J. Chen, "Successive refinement for hypothesis testing and lossless one-helper problem," IEEE Trans. Inf. Theory, vol. 54, no. 10, pp. 4666-4681, Oct. 2008.
[9] M. S. Rahman and A. B. Wagner, "On the optimality of binning for distributed hypothesis testing," IEEE Trans. Inf. Theory, vol. 58, no. 10, pp. 6282-6303, Oct. 2012.
[10] A. H. Kaspi, "Two-way source coding with a fidelity criterion," IEEE Trans. Inf. Theory, vol. 31, no. 6, pp. 735-740, 1985.
[11] A. El Gamal and Y.-H. Kim, Network Information Theory. Cambridge: Cambridge University Press, 2011.
[12] A. Orlitsky and J. R. Roche, "Coding for computing," IEEE Trans. Inf. Theory, vol. 47, no. 3, pp. 903-917, 2001.
[13] N. Ma and P. Ishwar, "Some results on distributed source coding for interactive function computation," IEEE Trans. Inf. Theory, vol. 57, no. 9, pp. 6180-6195, Sep. 2011.
[14] A. D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 1-10, 1976.