On Rate Requirements for Achieving the Centralized Performance in Distributed Estimation

Size: px

Start display at page:

Download "On Rate Requirements for Achieving the Centralized Performance in Distributed Estimation"

Irma Simpson
6 years ago
Views:

1 O Rate Requiremets for Achievig the Cetralized Performace i Distributed Estimatio Mostafa El Gamal ad Lifeg Lai Abstract We cosider a distributed parameter estimatio problem, i which multiple termials sed messages related to their local observatios usig limited rates to a fusio ceter who will obtai a estimate of a parameter related to the observatios of all termials. It is well kow that if the trasmissio rates are i the Slepia-Wolf regio, the fusio ceter ca fully recover all observatios ad hece ca costruct a estimator havig the same performace as that of the cetralized case. Oe atural questio is whether Slepia-Wolf rates are ecessary to achieve the same estimatio performace as that of the cetralized case. I this paper, we show that the aswer to this questio is egative. We establish our result by explicitly costructig a asymptotically miimum variace ubiased estimator (MVUE) that has the same performace as that of the optimal estimator i the cetralized case while requirig iformatio rates less tha the coditios required i the Slepia-Wolf rate regio. The key idea is that, istead of aimig to recover the observatios at the fusio ceter, we desig uiversal schemes eablig the fusio ceter to compute a sufficiet statistic usig rates outside of the Selpia-Wolf regio. Idex Terms Distributed learig, MVUE, estimatio algorithm, Slepia-Wolf rates, uiversal ecodig/decodig scheme. I. INTRODUCTION Motivated by applicatios i sesor etworks ad other areas, the problem of distributed estimatio has bee extesively ivestigated from various perspective [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16], [17]. As observatios are distributed over multiple termials i the distributed settig, the performaces of distributed estimators are o better tha those of cetralized estimators who have access to all observatios. The questio we address i this paper is, to achieve the same performace as that of the cetralized setup, how much iformatio has to be exchaged i the distributed settig. We cosider this problem for the followig setup. There are two radom variables (X, Y ) with a joit probability mass fuctio (PMF) P θ (X, Y ) parameterized by a ukow parameter θ. Two termials A ad B observe X ad Y respectively ad sed messages related to their ow local observatios with limited rates to termial C, which will the obtai a estimate of the ukow parameter. It is well kow that if the trasmissio rates from the termials are iside the Slepia-Wolf rate regio [18], there exists a uiversal The results i this paper were preseted i part at 53th Aual Allerto Coferece o Commuicatio, Cotrol, ad Computig, 2015 [1]. The authors are with Departmet of Electrical ad Computer Egieerig, Worcester Polytechic Istitute. {melgamal, llai}@wpi.edu. The work was supported by the Qatar Natioal Research Fud uder Grat QNRF codig scheme [19] that eables termial C to fully recover (X, Y ). Hece, oce the trasmissio rates are iside the Slepia-Wolf rate regio, the performace of the best estimator for the distributed setup is the same as that of the best estimator for the cetralized case. Oe atural questio is: are Slepia-Wolf rates ecessary to achieve the same estimatio performace as that of the cetralized case? The aswer to this questio has sigificat implicatios i the distributed estimatio. If the aswer is yes, the to obtai the best estimate of the ukow parameter requires trasmissio rates to be so high that they are sufficiet to fully recover the observatios at the decoder, hece o rate reductio is possible. O the other had, if the aswer is o, the the observatios ca be compressed beyod the limits of source codig for full observatio recovery. At a first glace, the aswer to this questio should be o as we are oly iterested i estimatig a parameter related to the observatios ad are ot iterested i recoverig the observatios themselves. However, all existig related works idicate otherwise. For example, [20] addressed the same questio ad suggested that Slepia-Wolf rates might be ecessary. I additio, the performace of the best kow estimator by Ha ad Amari [21] does ot match that of the cetralized case whe the iformatio rates are outside of the Slepia-Wolf rate regio. Furthermore, [22] showed that, uder certai coditios, extractig eve oe bit of iformatio from distributed sources is as hard as recoverig full observatios ad hece requires the iformatio rates to be i the Slepia- Wolf rate regio. I this paper, we show that the aswer to this questio is ideed o. We establish our result by explicitly costructig a distributed estimatio algorithm that achieves the same performace as that of the optimal estimator for the cetralized case while usig iformatio rates outside of the Slepia-Wolf regio. The mai observatio is that, to costruct a estimator that has the same performace as that of the cetralized case, the fusio ceter eeds oly sufficiet statistics ot full data. Based o this observatio, the key idea of our algorithm is that, istead of tryig to fully recover the source observatios, we desig schemes that eable the fusio ceter to recover sufficiet statistics usig less iformatio rates. To illustrate the idea, we first cosider biary symmetric sources (i.e., both X ad Y are biary sequeces) parameterized by a ukow parameter θ. For this model, i our algorithm, we first desig a uiversal codig/decodig scheme that eables termial C to compute compoet-wise moduletwo sum Z = X Y, which ca be achieved usig rates

2 outside of the Slepia-Wolf rate regio, ad the costruct a estimator usig Z. Here deotes elemet-wise xor. We show that our estimator is a asymptotically miimum variace ubiased estimator (MVUE) [23] ad achieves the same variace idex as that of the best estimator i the cetralized case. We the geeralize our study to geeral biary sources models that are ot ecessarily symmetric aymore. Compared with the symmetric case, there are two additioal challeges: 1) Z aloe is ot a sufficiet statistic aymore; ad 2) We do ot have a MVUE to compare the performace to aymore, as it is ot clear whether a MVUE exists ad eve if it exists its form is model depedet. To address the first issue, we modify our scheme ad ask the trasmitters to sed additioal iformatio (more specifically, empirical margial PMF) that requires dimiishig rate. Combiig Z with these additioal iformatio, the fusio ceter ca the costruct the empirical joit PMF, which is a sufficiet statistic. To address the secod issue, we show a stroger result that for ay cetralized estimator, we ca costruct a plugi estimator with the same performace by usig the oly decoded iformatio at termial C. We further exted our results to a more geeral class of o-biary sources ad show that our algorithm ca also achieve the same performace as that of the best estimator i the cetralized case while usig trasmissio rates less tha the coditios required i the Slepia-Wolf rate regio. Fially, although our estimatio algorithm achieves the cetralized performace at rates less tha Slepia-Wolf rates, there is o optimality guaratee at very low rates which ca be the case for a umber of practical applicatios. To address this, we propose a practical desig of our estimatio algorithm ad show that it outperforms the best kow estimator by Ha ad Amari [21] at all rates. I [21], the authors established their estimatio algorithm by itroducig auxiliary radom variables ad solvig the maximum-likelihood equatio. They showed that their estimatio algorithm achieves a smaller variace tha the estimator by Zhag ad Berger [24]. This paper builds ad substatially expad upo our recet coferece paper [1]. Compared with [1], we provide detailed proof for the case with biary symmetric sources, we exted our results to geeral biary sources i Sectio IV, we also propose a practical desig of our estimator that achieves a good performace at all compressio rates i Sectio VI, ad we add more umerical results i Sectio VII. The rest of the paper is orgaized as follows. We itroduce the problem formulatio i Sectio II. I Sectio III, we establish our mai results for biary symmetric sources, the we geeralize it to o-symmetric biary sources i Sectio IV. We exted our work to a more geeral class of iformatio sources i Sectio V. We propose a practical desig of our estimatio algorithm i Sectio VI. We preset the simulatio results i Sectio VII. Fially, we coclude the paper i Sectio VIII. II. PROBLEM FORMULATION Cosider two iformatio sources X ad Y takig values from the discrete alphabets X ad Y, respectively. (X, Y ) = {(X i, Y i )} i=1 are idepedetly ad idetically distributed (i.i.d.) observatios draw accordig to the parametric joit PMF P θ (X, Y ) where θ Θ is the ukow parameter. We assume that the rage of Θ is bouded ad hece θ u max{ if(θ), sup(θ) } is fiite. We cosider a distributed setup i which X are observed at termial A ad Y are observed at termial B. Usig limited rates, these two termials sed messages related to their ow local observatios to a fusio ceter (termial C), which will the obtai a estimate ˆθ of θ usig these messages. The setup is illustrated i Fig. 1. Termial A Termial B Fig. 1: System Model. Termial C I particular, termial A employs a ecodig fuctio g 1 : X g 1 (X ), while termial B employs a ecodig fuctio g 2 : Y g 2 (Y ). The code rates are R X = log g 1, R Y = log g 2, (1) where g i is the cardiality of the ecodig fuctio g i. From g 1 (X ) ad g 2 (Y ), the decoder obtais a estimate ˆθ of the ukow parameter θ usig estimator ψ: ˆθ = ψ(g 1 (X ), g 2 (Y )). (2) To evaluate the quality of the estimator, we use the variace idex that is defied as 1 V θ [ˆθ] = lim Var θ[ˆθ] = lim E θ[(ˆθ E[ˆθ]) 2 ]. (3) It is desirable to have a estimator that is asymptotically ubiased, i.e., E θ [ˆθ] θ as, rage-preservig, i.e., the rage of the estimatio fuctio ψ is Θ, ad has a small variace idex. It is well-kow that, if the codig rates satisfy (will be called Slepia-Wolf rates i the sequel) R X H θ (X Y ), (4) R Y H θ (Y X), (5) R X + R Y H θ (X, Y ), (6) there exists uiversal source codig schemes [19] (i.e., the codig scheme does ot depeds o the value of the ukow parameter θ) such that the decoder ca recostruct X ad Y with a dimiishig error probability. Here, H θ ( ) ad 1 Throughout the paper, we use the subscript θ to emphasize the fact that value of the quatity of iterest depeds o the parameter θ.

3 H θ ( ) deote the etropy ad coditioal etropy respectively. Hece, if (4)-(6) are satisfied, we ca obtai the same estimatio performace as that of the cetralized case. The questio we ask i this paper is: are Slepia-Wolf rates ecessary to achieve the same estimatio performace as that of the cetralized case? [20] ivestigated the same questio ad suggested that Slepia-Wolf rates appear to be ecessary for achievig the cetralized estimatio performace. I this paper, we show that Slepia-Wolf rates are ot ecessary. I particular, we show that there ideed exists a class of PMFs ad the correspodig distributed estimators that require commuicatio rates less tha the Slepia-Wolf rates while still achievig the same performace as that of the best estimator for the cetralized case. Throughout the paper, we use a upper case letter U to deote a radom variable, a lower case letter u to deote a realizatio of U, ad U to deote the discrete alphabet from which U takes values. For ay sequece u = (u(1),, u()) U, the relative frequecies (empirical PMF) π(a u ) (a u )/, a U of the compoets of u is called the type of u. Here (a u ) is the total umber of idices t at which u(t) = a. Chapter 11 of [25] cotais a comprehesive overview of useful properties of the type. III. BINARY SYMMETRIC CASE I this sectio, to illustrate our mai idea, we first cosider the case of biary symmetric sources with X = Y = 2 ad a joit PMF of (X, Y ) as give i Table I, i which the ukow parameter θ Θ = (0, 1). The isights obtaied here will be geeralized to more geeral models i later sectios. X/Y θ/2 (1 θ)/2 1 (1 θ)/2 θ/2 TABLE I: The joit PMF of biary symmetric sources. Note that for this model, either termial A or termial B aloe will be able to obtai a meaigful estimatio of the value of θ, as the margial distributios of X ad Y are idepedet of θ. O the other had, to estimate θ, the fusio ceter does ot eed to kow (X, Y ) fully. It is easy to check that the compoet-wise module-two sum Z = X Y [X 1 Y 1,, X i Y i,, X Y ] is a sufficiet statistic for estimatig θ. Hece, as log as the fusio ceter ca compute Z, it ca costruct a estimator that has the same performace as that of the cetralized case. Based o this observatio, we show that, to estimate θ for this class of PMFs, we ca achieve the cetralized estimatio performace usig rates that do ot satisfy (4)-(6). We establish this result usig two steps: 1) i the first step, we desig a uiversal ecoder at termials A ad B ad uiversal decoder at termial C to compute the modulo-two sum Z = X Y ; 2) i the secod step, we costruct a estimator usig Z ad aalyze its performace. A. Step 1: Computig Z Here, we discuss how to uiversally compute Z = X Y at termial C. Towards this goal, we will use the same liear code at both ecoders ad use a miimum etropy decoder at termial C. Sice the ecoders at termials A ad B are the same, we use the followig simplified otatio f = g 1 = g 2, R = R X = R Y. (7) The followig theorem shows that as log as R H θ (X Y ) = H θ (Y X), the decoder ca recostruct Z with a dimiishig error probability. Theorem 1: If R H θ (X Y ) = H θ (Y X), (8) there exist uiversal ecodig/decodig fuctios to recostruct Z = X Y at termial C with a expoetially decreasig error probability. Proof: The proof follows a similar structure as the proofs i [26] ad [19]. I particular, usig the ideas i [19], we modify the proof of [26] to make it uiversal. Radom Code Geeratio: We use a liear code f with a ecodig matrix A of size R to map {0, 1} to {0, 1} R. Hece f = 2 R. We idepedetly geerate each etry of A usig a uiform biary distributio, i.e., each etry of A is 0 or 1 with probability 0.5. Ecodig: The ecoded messages of the realizatios x {0, 1} ad y {0, 1} are f(x ) = x A, f(y ) = y A, (9) i which the operatios are all i biary field. Decodig: The decoder first combies the messages ito a sigle message as f(x ) f(y ). (10) It follows from the code liearity that f(x ) f(y ) = f(x y ) = f(z ). (11) From f(x y ), termial C uses a miimum etropy decoder to obtai ẑ. I particular, for each z such that f( z ) = f(x y ), the miimum etropy decoder first calculates the etropy of its type, the picks the oe that has the least etropy to be the decoded sequece. I the followig, to simplify the otatio, we use Z () ad Z () to deote dummy radom variables whose PMFs P ad P Z() Z () are the same as the types of z ad z, respectively. The fial decoded message is deoted as ẑ = φ(f(z )), (12) where φ deotes the miimum etropy decodig fuctio. Error Probability Aalysis: A decodig error occurs if ad oly if there exists a sequece ẑ z such that f(ẑ ) = f(z ) ad H(Ẑ() ) H(Z () ). (13)

4 The error probability, averagig over all possible codebooks, is P e () = P θ (z )Pr(ẑ z ) = Pr(f)P (), (14) z {0,1} f i which P θ (z ) Pr(Z = z ), ad P () deotes the error probability if a particular codebook f is used. By aalyzig (14), we show that there exists a particular codebook f such that P () 0 expoetially as as log as the coditios i the theorem are satisfied. Detailed aalysis ca be foud i Appedix A. This implies that if we use f, the the fusio ceter will be able to compute Z with a expoetially decreasig error probability. Theorem 1 implies that the required rates to decode Z = X Y with a small error probability is R X H θ (X Y ), (15) R Y H θ (Y X). (16) This rate regio is larger tha the Slepia-Wolf regio i (4)- (6), as the coditio R X + R Y H θ (X, Y ) is ot ecessary aymore. B. Step 2: Estimatio After obtaiig Ẑ, which is equal to Z with a probability covergig to 1 expoetially, we the desig a asymptotically MVUE of θ. Our estimator is ˆθ = (0 Ẑ ), (17) i which the otatio ( ) is defied i Sectio II. Theorem 2: If the coditios i Theorem 1 are satisfied, the estimator i (17) is a asymptotically MVUE ad achieves the optimal variace idex as that of the cetralized case. Proof: We establish this result by showig that the estimator (17) achieves the same performace as that of the optimal estimator i the cetralized case. Optimal Cetralized Estimator: First cosider the cetralized case i which X ad Y are both kow perfectly. Let ( 1, 2, 3, ) 4 deote the joit type of the sequeces x ad y, where ( 1, 2, 3, 4 ) are the frequecies of occurrece of the pairs {(0, 0), (1, 1), (0, 1), (1, 0)}, respectively. The joit PMF of (x, y ) is ( ) (1+ θ 2)( ) (3+ 1 θ 4) P θ (x, y ) =. (18) 2 2 Cosider the cetralized estimator ˆθ c = ( ). (19) This estimator is ubiased sice E θ [ˆθ c ] = θ. (20) The variace of the estimator is calculated as Var θ [ˆθ c ] = 1 2 E θ[( ) 2 ] θ 2 = θ(1 θ). (21) The variace idex is give by V θ [ˆθ c ] = lim Var θ[ˆθ c ] = θ(1 θ). (22) The Cramer-Rao lower boud (CRLB) of the cetralized case is [ 2 l[p θ (x, y ] )] CRLB = 1/E θ 2 θ θ(1 θ) = = Var θ [ˆθ c ]. (23) This implies that ˆθ c is a MVUE for the cetralized case. Compariso: Now, come back to our decetralized case, for which our estimator is ˆθ = (0 Ẑ ). (24) We will compare the performace of ˆθ with that of the optimal cetralized estimator ˆθ c. For the codebook f, defie T e () as the set of sequeces z s that are icorrectly decoded. Therefore, P () = P θ (z ). (25) z T () e The expected value of our estimator is give by E θ [ˆθ] = Pr(Ẑ = z ) (0 z ). (26) z {0,1} Note that Pr(Ẑ = z ) is ot ecessarily equal to P θ (z ), ad the sum of the probability differece ca be bouded as Pr(Ẑ = z ) P θ (Z = z ) z {0,1} We have that Sice the 2 z T () e P θ (z ) = 2P (). (27) E θ [ˆθ] E θ [ˆθ c ] Pr(Ẑ = z ) P θ (Z = z ) (0 z ). z {0,1} E θ [ˆθ] E θ [ˆθ c ] 0 (0 z ) 1, (28) z {0,1} Pr(Ẑ = z ) P θ (Z = z ) 2P (), (29) i which the last iequality is due to (27). As P () is show to coverge to zero expoetially fast i Sectio III-A, we have lim E θ[ˆθ] = E θ [ˆθ c ] = θ. (30)

5 This shows that our estimator is asymptotically ubiased. Similarly, we have Hece, Var θ [ˆθ] Var θ [ˆθ c ] E θ [ˆθ 2 ] E θ [ˆθ 2 c] + E θ [ˆθ] E θ [ˆθ c ] 4P (). (31) () V θ [ˆθ] V θ [ˆθ c ] lim 4P. (32) As, P () 0 expoetially, we have 4P () 0. Therefore, V θ [ˆθ] = V θ [ˆθ c ] = θ(1 θ). (33) This proves that our estimator is asymptotically ubiased ad achieves the same miimum variace that ca be achieved eve i the cetralized case. Hece, our estimator is optimal. Slepia- Wolf Regio some cases, MUVE may ot exist). Despite these challeges, we prove the followig result: Theorem 3: For ay biary source with a parametric PMF P θ (X, Y ), where θ Θ is the ukow parameter ad Θ is a bouded set, there exits a ubiased estimator ˆF based o Z = X Y that achieves the cetralized performace asymptotically ad requires commuicatio rates of R X = R Y H θ (Z). (34) Proof: The proof cosists of two mai steps: 1) i the first step, we costruct a scheme to eable the fusio ceter to compute a sufficiet statistic with expoetially dimiishig error probability; 2) i the secod step, we establish a estimator usig the computed statistics ad show that the estimator achieves the performace of the cetralized estimator. Step 1: Computig a Sufficiet Statistic Differet from the biary symmetric case cosidered i Sectio III, Z is ot a sufficiet statistic for the geeral biary ( case aymore. Now, we show the joit type P X () Y = () 1, 2, 3, ) 4 of the observatio sequeces (x, y ) is a sufficiet statistic ad show how to compute this statistics at the fusio ceter usig rates (34). Let T PX () Y be the set of all sequece pairs (x, y ) () that have the joit type P X () Y (). The coditioal PMF of (X, Y ) give the joit type P X () Y () is P θ (x, y P X () Y ()) = { 0, if (x, y ) / T PX () Y () 1 T PX () Y (), otherwise, (35) Fig. 2: : the rate pair required i our estimator, which is outside of the Slepia-Wolf rate regio. Combiig Theorems 1 ad 2, we coclude that, i the distributed parameter estimatio, the Slepia-Wolf rates are ot ecessary to achieve the same optimal estimatio performace as that of the cetralized case. Fig. 2 illustrates the compariso betwee the Slepia-Wolf rate regio ad the rate pair used i our estimator. IV. GENERAL BINARY CASE I this sectio, we exted our study to the geeral biary source models P θ (X, Y ). Here, we do ot make ay particular assumptio of the form of P θ (X, Y ). For example, P θ (X, Y ) could be a oliear fuctio of θ. Similar to the previous sectio, we assume that P θ (X = i, Y = j) > 0 for all θ Θ ad i, j {0, 1}. Compared with the biary symmetric source model cosidered i Sectio III, there are two additioal challeges. First, the compoet-wise module-two sum Z is ot a sufficiet statistic i geeral, hece recoverig Z aloe is ot eough. Secod, ulike the symmetric case i which we have a MUVE cetralized estimator to compare to, we caot do that aymore as we are cosiderig geeral models whose optimal cetralized is model specific (ad i which is ot a fuctio of θ. Therefore, the joit type P X () Y () is a sufficiet statistic of θ. Now we show how to compute this statistic at the fusio ceter with rates i (34). Ecodig: At termials A ad B, we first ecode X ad Y usig the same scheme preseted i Sectio III-A. This will eable termial C to compute Z. I additio, each termial will sed the margial types P X () P () X ad P Y () P () Y of the sequeces x ad y, respectively. The umber of margial types ca be bouded as [25] P () X = P() Y ( + 1)2. (36) Therefore each of the margial types ca be ecoded usig the rate 2 log(+1), which goes to zero as icreases. Hece, sedig these additioal iformatio requires dimiishig additioal rates. Decodig: At termial C, we first decode Ẑ usig the same scheme as discussed i Sectio III-A. Oce Ẑ is decoded, ( termial C will compute the joit type ˆP X () Y = () ˆ1, ˆ2, ˆ3, ) ˆ4 by combiig Ẑ alog with the additioal iformatio P X (), P Y () set from termials A ad B re-

6 spectively. I particular, from these iformatio, we have the followig relatioship ˆ 1 + ˆ 2 = (0 Ẑ ), (37) ˆ 1 + ˆ 3 = P X ()(x = 0), (38) ˆ 1 + ˆ 4 = P Y ()(y = 0), (39) 4 ˆ i =. (40) i=1 From these four equatios, we ca easily obtai ˆP X () Y (). Error Probability: Defie P e () as P e () = Pr( ˆP X () Y P () X () Y ()). (41) As show i Sectio III-A, Z ca be decoded at the rates give i (34) with a expoetially decreasig probability of error. Furthermore, the margial types ca be perfectly recovered at asymptotically zero rates, the the joit type P X () Y ca be computed with a expoetially decreasig () error probability P e (). Step 2: Estimatio I the biary symmetric case cosidered i Sectio III, we have MVUE for the cetralized case ad hece we ca compare our distributed estimator with this cetralized MVUE. I the geeral biary model, this approach will ot work as we do t kow whether or ot a MVUE exists. Furthermore, eve if it exists, the form of MVUE is model specific. I the followig, we show a stroger result that we ca achieve the same performace for ay cetralized estimator. First, as P X () Y is a sufficiet statistic for the cetralized case, by Rao-Blackwell theorem [23], if we wat () to miimize the variace of ubiased estimators, we ca focus o estimators that are fuctios of P X () Y (), amely F c = F (P X () Y ()), for the cetralized case. For ay ubiased F c, we desig the followig simple plugi estimator As the result, we have hece as P () e Therefore, E θ [ ˆF ] E θ [F c ] 2P () e θ u, (45) lim E θ[ ˆF ] = E θ [F c ] = θ, (46) goes to zero expoetially. Similarly, Var θ [ ˆF ] Var θ [F c ] 2P () e (θ 2 u + θ u ). (47) V θ [ ˆF ] = V θ [F c ]. (48) This implies that the plugi distributed estimator ˆF achieves the same performace as that of the cetralized estimator F c if the rate coditio (34) is satisfied. Depedig o the PMF of the biary source, the required sum rate to achieve the optimal cetralized performace 2H θ (Z) as obtaied usig our algorithm ca be less tha Slepia-Wolf sum rate H θ (X, Y ). As a example, cosider a o-symmetric oliear biary source with the PMF show i Table II. X/Y /4 + θ 2 1/4 θ 2 1 1/4 θ 1/4 + θ TABLE II: A example of a joit PMF of a o-symmetric biary source with θ Θ = (0, 1/4). Although the joit PMF give i Table II is ot symmetric ad oliear i θ, the required rates to obtai a ubiased estimator that achieves the cetralized performace are still lower tha Slepia-Wolf rates as show i Fig. 3. ˆF = F ( ˆP X () Y ()). (42) I the followig, we compare the performace of F c ad ˆF. We have that E θ [ ˆF ] = Pr( ˆP X () Y = P () x () y ())F (P x () y ()), (43) ad P x () y () E θ [ ˆF ] E θ [F c ] Pr( ˆP X () Y = P () x () y ()) P θ(p x () y ()) P x () y () F (P x () y ()). (44) Sice F (P x () y ()) Θ ad Θ is bouded, we have F (P x () y ()) θ u. Furthermore, followig similar steps as that of (27), we have Pr( ˆP X () Y = P () x () y ()) P () θ(p x () y ()) 2P e. P x () y () Fig. 3: The required rates to achieve the optimal cetralized performace for the biary source give i Table II is lower tha Slepia-Wolf rates.

7 V. NON-BINARY MODELS I this sectio, we exted our results for biary models to more geeral class of o-biary models. Let X = Y = {0, 1,..., M 1} ad cosider the class of PMFs P θ (X = i, Y = j) = { θ M, if (i + j) M 1, otherwise, 1 θ(m 1) M (49) 1 where θ Θ = (0, (M 1)). Note that each iformatio source has a uiform margial PMF ad settig M = 2 recovers the biary case. Similar to the biary case, we first use a liear code ad miimum etropy decoder to recostruct Z = (X + Y ) mod M at the decoder ad the desig a estimator from Z. I this sectio, we use mod M to deote elemet-wise mod operatio, I particular, we use a liear code f that maps {0, 1,..., M 1} to {0, 1,..., M 1} k. The ecoded messages of the realizatios x {0, 1,..., M 1} ad y {0, 1,..., M 1} are f(x ) = x A, f(y ) = y A, (50) Followig similar steps as those i the biary case, we ca show that, if (55)-(56) are satisfied, the estimator i (57) is asymptotically ubiased ad achieves a variace idex V θ [ˆθ] = θ[1 θ(m 1)]. (58) M 1 We ca further show that (58) is the best variace idex that ca be achieved eve i the cetralized case. This implies that our algorithm achieves the cetralized performace usig rates outside the Slepia-Wolf regio. VI. PRACTICAL APPROACH I the previous sectios, we established a ubiased estimator that achieves the cetralized performace for a umber of iformatio sources, while requires less rates tha Slepia- Wolf rates. For biary symmetric sources ad its extesio, our estimator achieves the CRLB withi the combied regios of Slepia-Wolf ad the dotted regio as show i Fig. 4, where R X H θ (X Y ), R Y H θ (Y X). (59) i which the code matrix A has rows ad k colums with each etry takig values from {0, 1,..., M 1}. The codig rate is R = k log M. (51) The decoder first combies the ecoded messages ito a sigle message as f(x ) + f(y ) mod M. (52) The fial decoded message is give by ẑ = φ(f(z )), (53) where φ the the miimum etropy decodig fuctio. Followig the same error probability aalysis for the biary case, we ca show that there exists a codebook f (ad hece a particular ecodig matrix A) that achieves a probability of decodig error P () 0 expoetially as if R H θ (Z) = H θ (X Y ) = H θ (Y X). (54) Therefore, as log as R X H θ (X Y ), (55) R Y H θ (Y X), (56) we ca recostruct Z = X + Y mod M at the decoder with a expoetially dimiishig error probability. After obtaiig Ẑ, which is equal to Z with a probability covergig to 1 expoetially, our estimator is ˆθ = (M 1 Ẑ ). (57) (M 1) Fig. 4: The low rates iside the dashed regio are cosidered i this sectio. Our estimator is optimal if Z = X Y is decoded with a vaishig probability of error. Otherwise, there is o optimality guaratee. I practical applicatios, the commuicatios rates ca be lower tha our coditios (59). Therefore, we modify the desig of our estimatio algorithm i this sectio to esure a good performace at all rates icludig the low rates iside the dashed regio as show i Fig. 4. We start with the case of biary symmetric sources the we exted the results to the geeral class of PMFs as preseted i Sectio V. For biary symmetric sources, we assume that the ukow parameter θ takes values i (0, t), where t (0, 0.5) is kow. First, we apply the ecodig/decodig scheme itroduced i Sectio III to ecode p observatios (x p, y p ) ad decode ẑ p = x p y p, where {, if R H(t) p = R H(t), otherwise, (60)

8 where is a operator that maps its argumet to the largest previous iteger, the we modify our estimator as followig: ˆθ = (0 Ẑp ). (61) p The followig Theorem states the performace bouds of our estimator. Theorem 4: If R H(t), (62) our estimator is a asymptotically MVUE. Otherwise, our estimator is asymptotically ubiased ad its variace idex is bouded as V θ [ˆθ] Proof: Case 1: R H(t) R H(t) H(t)θ(1 θ). (63) R H θ (Z) = H θ (X Y ) = H θ (Y X). (64) I this case, p =, ad our estimator is give by ˆθ = (0 Ẑ ). (65) As we proved i the previous sectios, this estimator is a asymptotically MVUE if R H θ (Z). Case 2: R < H(t) I the cetralized case, cosider the estimator ˆθ c = ( ), (66) p where 1 ad 2 are the frequecy of occurrece of the pairs (0, 0) ad (1, 1) i the observatios (x p, y p ), respectively. We have that E θ [ˆθ c ] = pθ = θ, (67) p ad Var θ [ˆθ c ] = θ(1 θ). (68) p I the decetralized case, the effective rate per observatio is give by Sice the R eff = R p. (69) p R H(t), (70) R eff H(t) H θ (Z). (71) For this rage of rates, we showed that lim E θ[ˆθ] = E θ [ˆθ c ] = θ. (72) Therefore, our estimator is asymptotically ubiased. We also have that It is obvious that Hece, V θ [ˆθ] = lim Var θ[ˆθ c ] V θ [ˆθ] = lim θ(1 θ). (73) p p R 1. (74) H(t) H(t)θ(1 θ). (75) R For the geeral class of PMFs give i (49), we assume that 1 θ takes values i (0, t), ad t (0, 2(M 1) ). We establish our estimator as ˆθ = p (M 1 Ẑp ). (76) p(m 1) Followig similar steps to the proof of Theorem 4, we have that our estimator is a asymptotically MVUE if R H(t). Otherwise, our estimator is asymptotically ubiased ad its variace idex is bouded as V θ [ˆθ] H(t)θ[1 θ(m 1)]. (77) R(M 1) For biary symmetric sources ad its extesio, we guaratee a worst case performace that is a fuctio of the commuicatio rate R. I the followig sectio, we show that despite of a small performace degradatio i the rate regio H(θ) R < H(t) as compared to our estimator i Sectio V, we maaged to achieve a very good performace at low rates. VII. NUMERICAL RESULTS I this sectio, we use several umerical examples to illustrate the compariso betwee our estimators to the best kow estimator by Ha ad Amari [21]. I the simulatio, we fix the ukow parameter θ ad chage the ecodig rates R X ad R Y such that R X = R Y = R H θ (Z). (78) We coduct the compariso for M = 2 ad M = 4, respectively. For our estimator i Sectio V ad M = 2, the variace idex of our estimator is (33), while the variace idex of the estimator by Ha ad Amari is calculated i example 3 of [21] (V θ [ˆθ]) HA (79) { ( a 2 b 2 4 θ 1 2 } [1 (1 4a 2) 2 )(1 4b 2 )], where a ad b are fuctios of R X ad R Y, whose expressios are give i (14.12) ad (14.13) of [21], respectively.

9 Fig. 5: Performace Compariso: θ = 0.05, M = 2 Ha ad Amari s estimator relies o the choice of the test chaels. The authors did ot specify a optimal choice of the test chaels i order to exted example 3 i [21] to the case of M = 4. We fid the followig mappig to be a atural extesio: Q = { 0, if X {0, 1} 1, if X {2, 3}, T = { 0, if Y {0, 1} 1, if Y {2, 3}. (80) Notice that (Q, T ) are distributed accordig to a biary symmetric PMF with a ukow parameter α = 2θ. Usig a estimator ˆθ = ˆα 2 leads to the followig expressio for the variace idex: (V θ [ˆθ]) HA (81) { ( a 2 b 2 4 2θ 2) 1 2 } [1 (1 4a 2 )(1 4b 2 )]. Fig. 6: Performace Compariso: θ = 0.9, M = 2 Fig. 5 ad Fig. 6 show the performace gai, i terms of the variace idex, of our estimator over Ha ad Amari s estimator for biary symmetric sources (M = 2) at two differet values of the ukow parameter, θ = 0.05 ad θ = 0.9, respectively. The performace differece is more oticeable at low rates. For θ = 0.05, the Slepia-Wolf sum rate is R X + R Y = 1.29 bits, while our estimator requires a sum rate of R X + R Y = 2R = 0.57 bits. For θ = 0.9, the Slepia-Wolf sum rate is 1.47 bits, while our estimator requires a sum rate of 0.94 bits. Furthermore, for Ha ad Amari s estimator to achieve the cetralized performace, the required sum-rate is 2 bits for both cases, which is ot oly much larger tha the sum rate required i our estimator but also much larger tha the sum-rate required by coditios specified i the Slepia-Wolf rate regio. For our estimator i Sectio V ad M = 4, the variace idex of our estimator is give i (58). The performace of Fig. 7: Performace Compariso: θ = 0.01, M = 4 Fig. 7 compares the variace idices achieved usig our estimator ad Ha ad Amari s estimator for M = 4 ad θ = It is clear that our estimator outperforms that of Ha ad Amari s estimator. Furthermore, the performace differece is more oticeable at low rates. The Slepia-Wolf sum rate is 2.24 bits, while our estimator requires a sum rate of 0.48 bits. For our practical estimator i Sectio VI ad M = 2, the variace idex of our estimator is bouded as i (77) if R < H(t). Otherwise, it achieves the CRLB. The variace idex of Ha ad Amari s estimator is (79). For our practical estimator i Sectio VI ad M = 4, the variace idex of our estimator is bouded as i (75) if R < H(t). Otherwise, it achieves the CRLB. The variace idex of Ha ad Amari s estimator is (81). Fig. 8 ad Fig. 9 show that our estimator outperforms Ha ad Amari s estimator at all rates. The performace differece is more oticeable at very low rates, which makes our estimator a good choice for applicatios with low rate costraits. Our estimator performs better for smaller values of the rage of θ, which is determied by t.

10 APPENDIX A ERROR PROBABILITY ANALYSIS IN THE PROOF OF THEOREM 1 To aalyze the probability of the decodig error, let z {0, 1} deote a sequece such that z z, f( z ) = f(z ). (82) Fig. 8: Performace Compariso: θ = 0.05, M = 2, t = 0.5 ad 0.1 Let Z() be a dummy radom variable whose PMF P Z() is the same as the type of z. Defie P () as the set of all joit types betwee ay two sequeces z ad z. For ay give f (equivaletly for a give ecodig matrix A), defie Nf () as the umber of sequeces z such that there exists aother sequece z havig the joit type P Z () P () Z() Z ad (82) holds. Z Sice each etry i A is uiformly distributed, the each elemet i f(z ) is uiformly distributed if z is a ozero sequece. Therefore, Pr(f(z ) = 0) = (0.5) R = 1 f, (83) i which the probability is computed over all codebooks. This implies that Pr(f( z ) = f(z )) = Pr(f( z z ) = 0) = 1 f. (84) Defie T as the set of all sequece pairs PZ () (z, z ) that have the joit Z() type P Z () Z(), T PZ as the set of () all sequeces z that have the margial type P Z (), ad T P Z() Z () (z ) as the set of all sequeces z that have the joit type P Z () with z. The sizes of the sets T ad Z() PZ () T P Z() Z () (z ) are bouded as [27] T PZ () 2H(Z()), Fig. 9: Performace Compariso: θ = 0.01, M = 4, t = 0.16 VIII. CONCLUSION I this paper, we have aswered the questio: Are Slepia- Wolf rates ecessary to achieve the same estimatio performace as that of the cetralized case? We have showed that the aswer to this questio is egative by costructig a asymptoticly MVUE for biary symmetric sources usig rates less tha the coditios required i the Slepia-Wolf rate regio. We have showed that our estimatio algorithm ca work for geeral biary sources to achieve the cetralized estimatio performace. We have also exteded our work to a geeral class of o-biary iformatio sources by modifyig our estimatio algorithm. We have further proposed a practical desig of our estimatio algorithm ad compared our results to the best kow estimator by Ha ad Amari to show the superiority of our estimator. T P Z() Z () (z ) 2 H( Z () Z () )+ɛ, (85) where ɛ is a arbitrary small umber. Notice that, for ay give P Z () Z(), Nf () is a radom variable (radom over f) that ca be expressed as Nf () = 1 ( z z : f( z ) = f(z ), z T PZ () = z T PZ () ad (z, z ) T PZ () Z() 1 ( z z : f( z ) = f(z ), ad z T P Z() Z () (z ) ),(86) where 1( ) is the idicatio fuctio. The expectatio of N f () over all possible codebooks f is E[N f ()] = z T PZ () E [ 1 ( z z : f( z ) = f(z ), z T PZ z () T P Z() Z () (z ) ad z T P Z() Z () (z ) )] Pr(f( z ) = f(z )).(87) )

11 (84), (85), ad (87) imply that 2(H(Z() E[Nf ()] )+H( Z () Z () )+ɛ). (88) f Applyig the Markov s iequality, we have ( Pr Nf () 2(H(Z () )+H( Z () Z () )+ɛ) ( P () ) + δ) f 1, (89) P () + δ where P () is the total umber of possible joit types ad δ is a arbitrary small umber. To simplify the otatio, let B () 2(H(Z () )+H( Z () Z () )+ɛ) ( P () + δ). (90) f Cosiderig all joit types P Z () simultaeously, the Z() uio boud ad (89) imply that ( Pr Nf () B (), ) P Z () P () Z() P () P () + δ > 0. (91) Sice the probability i (91) is positive, the there exists a codebook f that the followig equatio holds for all joit types P simultaeously Nf () 2(H(Z () )+H( Z () Z () )+ɛ) ( P () + δ) f. (92) As f = 2 R ad P () ( + 1)4, we further have N f () (93) (( + 1) 4 + δ) 2 (H(Z() )+H( Z () Z () )+ɛ R). I the followig, we will focus o f. Let P () () deote the portio of error probability associated with a fixed joit type P Z () Z() P () () (94) P θ (z )1 ( z z : f ( z ) = f (z ), z T PZ () ad (z, z ) T PZ () Z() ). The total decodig error probability P (), whe usig f, ca be expressed as P () = P Z () Z() P () (). (95) Let A () ɛ 1 deote the set of margial types P Z () such that P Z ()(z = i) P θ (z = i) < ɛ1 2 for i {0, 1}, where ɛ 1 is a arbitrarily small umber. Usig the defiitio of A () ɛ 1, (95) ca be rewritte as P () = P Z () Z(),P Z () A () ɛ 1 + () P () P Z () Z(),P Z () Ā() ɛ 1 () P () S 1 + S 2, (96) where Ā() ɛ 1 deotes the complimetary set of A () ɛ 1. For S 2, we have that P () () 2 (D(P Z () P θ (Z))), (97) where D(P Z () P θ (Z)) is the KullbackLeibler divergece [25] betwee the margial type P Z () ad the true PMF P θ (Z) of Z = X Y. Usig Pisker s iequality [28], for P Z (), we have Ā() ɛ 1 Therefore, D(P Z () P θ (Z)) 2ɛ 2 1. (98) S 2 P Z () Z() 2 2ɛ2 1 ( + 1) 4 2 2ɛ2 1. (99) (99) implies that S 2 0 expoetially as. For S 1, we have that P () () N f () 2 (H(Z() )+D(P Z () P θ (Z)). (100) Usig (93), we further have P () () (101) ( ) (( + 1) 4 + δ) 2 D(P Z () P θ (Z))+R H( Z () Z () ) ɛ. As we use the miimum etropy decoder, we have H( Z () ) H(Z () ), which implies H( Z () Z () ) H( Z () ) H(Z () ). Therefore, Here P () () (102) ( ) (( + 1) 4 + δ) 2 D(P Z () P θ (Z))+R H(Z () ) ɛ. Sice P Z () A () ɛ 1, it is easy to check that H(Z () ) H θ (Z) D(P Z () P θ (Z)) + ɛ 2. (103) ɛ 2 = ɛ 1 2 log P θ (z = i), (104) i which ca be made arbitrarily small as ɛ 1 0 for θ (0, 1). Therefore, ( ) P () () (( + 1) 4 + δ) 2 R H θ (Z) ɛ 3, (105) i which ɛ 3 = ɛ + ɛ 2. This implies that S 1 0 expoetially as if R H θ (Z). (106)

12 Therefore, (106) is sufficiet to guaratee that P () 0 expoetially as. It is easy to check that H θ (Z) = H θ (X Y ) = H θ (Y X). The proof is complete. REFERENCES [1] M. ElGamal ad L. Lai, Are Slepia-Wolf rates ecessary for distributed parameter estimatio?, i Proc. Allerto Cof. o Commuicatio, Cotrol, ad Computig, (Moticello, IL), pp , Oct [2] G. Xu, S. Zhu, ad B. Che, Decetralized data reductio with quatizatio costraits, IEEE Tras. Sigal Processig, vol. 62, pp , Apr [3] A. Vempaty, H. He, B. Che, ad P. K. Varshey, O quatizer desig for distributed Bayesia estimatio i sesor etworks, IEEE Tras. Sigal Processig, vol. 62, pp , Oct [4] J. Zhu, X. Li, R. S. Blum, ad Y. Gu, Parameter estimatio from quatized observatios i multiplicative oise eviromets, IEEE Tras. Sigal Processig, vol. 63, pp , Aug [5] J. Zhag, R. S. Blum, X. Lu, ad D. Cous, Asymptotically optimum distributed estimatio i the presece of attacks, IEEE Tras. Sigal Processig, vol. 63, pp , Mar [6] S. Kar, H. Che, ad P. K. Varshey, Optimal idetical biary quatizer desig for distributed estimatio, IEEE Tras. Sigal Processig, vol. 60, pp , Jul [7] J. Fag ad H. Li, Optimal/ear-optimal dimesioality reductio for distributed estimatio i homogeeous ad certai ihomogeeous scearios, IEEE Tras. Sigal Processig, vol. 58, pp , Aug [8] F. S. Cattivelli ad A. H. Sayed, Diffusio LMS strategies for distributed estimatio, IEEE Tras. Sigal Processig, vol. 58, pp , Oct [9] S. Cui, J.-J. Xiao, A. J. Goldsmith, Z.-Q. Luo, ad H. V. Poor, Estimatio diversity ad eergy efficiecy i distributed sesig, IEEE Tras. Sigal Processig, vol. 55, pp , Sep [10] M. Ragisky, Learig from compressed observatios, i Proc. IEEE Iform. Theory Workshop, (Tahoe City, CA), pp , Sep [11] A. Xu ad M. Ragisky, Coverses for distributed estimatio via strog data processig iequalities, i Proc. IEEE Itl. Symposium o Iform. Theory, (Hog Kog, Chia), Ju [12] Y. Zhag, J. Duchi, M. I. Jorda, ad M. J. Waiwright, Iformatiotheoretic lower bouds for distributed statistical estimatio with commuicatio costraits, i Advaces i Neural Iformatio Processig Systems, (Statelie, NV), pp , Dec [13] O. Shamir ad N. Srebro, Distributed stochastic optimizatio ad learig, i Proc. Allerto Cof. o Commuicatio, Cotrol, ad Computig, (Moticello, IL), pp , Oct [14] J.-J. Xiao, S. Cui, Z.-Q. Luo, ad A. J. Goldsmith, Power schedulig of uiversal decetralized estimatio i sesor etworks, IEEE Tras. Sigal Processig, vol. 54, pp , Feb [15] A. Ribeiro ad G. B. Giaakis, Badwidth-costraied distributed estimatio for wireless sesor etworks, Part I: Gaussia case, IEEE Tras. Sigal Processig, vol. 54, pp , Mar [16] A. Ribeiro ad G. B. Giaakis, Badwidth-costraied distributed estimatio for wireless sesor etworks, Part II: Ukow probability desity fuctio, IEEE Tras. Sigal Processig, vol. 54, pp , Jul [17] J. Li ad G. AlRegib, Rate-costraied distributed estimatio i wireless sesor etworks, IEEE Tras. Sigal Processig, vol. 55, pp , May [18] D. Slepia ad J. K. Wolf, Noiseless codig of correlated iformatio sources, IEEE Tras. Iform. Theory, vol. 19, pp , Nov [19] I. Csiszár, Liear codes for sources ad source etworks: Error expoets, uiversal codig, IEEE Tras. Iform. Theory, vol. 28, pp , Jul [20] A. Zia, J. P. Reilly, ad S. Shirai, Distributed estimatio; three theorems, i Proc. IEEE Iform. Theory Workshop, (Tahoe City, CA), pp , Sep [21] T. S. Ha ad S.-I. Amari, Parameter estimatio with multitermial data compressio, IEEE Tras. Iform. Theory, vol. 41, pp , Nov [22] R. Ahlswede ad I. Csiszár, To get a bit of iformatio may be as hard as to get full iformatio, IEEE Tras. Iform. Theory, vol. 27, pp , Jul [23] H. V. Poor, A itroductio to sigal detectio ad estimatio. New York: Spriger Sciece & Busiess Media, [24] Z. Zhag ad T. Berger, Estimatio via compressed iformatio, IEEE Tras. Iform. Theory, vol. 34, pp , Mar [25] T. M. Cover ad J. A. Thomas, Elemets of Iformatio Theory. Wiley Series i Telecommuicatios ad Sigal Processig, Wiley- Itersciece, [26] J. Körer ad K. Marto, How to ecode the modulo-two sum of biary sources, IEEE Tras. Iform. Theory, vol. 25, pp , Mar [27] I. Csiszár, The method of types, IEEE Tras. Iform. Theory, vol. 44, pp , Oct [28] I. Csiszár ad J. Körer, Iformatio Theory: Codig Theorems for Discrete Memoryless Systems. Cambridge Uiversity Press, 2011.

Are Slepian-Wolf Rates Necessary for Distributed Parameter Estimation?

Are Slepian-Wolf Rates Necessary for Distributed Parameter Estimation? Are Slepia-Wolf Rates Necessary for Distributed Parameter Estimatio? Mostafa El Gamal ad Lifeg Lai Departmet of Electrical ad Computer Egieerig Worcester Polytechic Istitute {melgamal, llai}@wpi.edu arxiv:1508.02765v2