Channel optimization for binary hypothesis testing


Channel optimization for binary hypothesis testing

Giancarlo Baldan (MIT, Laboratory for Information and Decision Systems, Cambridge, MA 02139 USA, email: gbaldan@mit.edu)
Munther Dahleh (MIT, Laboratory for Information and Decision Systems, Cambridge, MA 02139 USA, email: dahleh@mit.edu)

Abstract: In this paper we consider the classical binary hypothesis testing problem where the i.i.d. samples are obtained through a channel. Our goal is to study the relationship between the channel capacity and the goodness of the estimation, measured by the Chernoff information, in order to obtain an upper bound on the estimation performance as well as some insight into the structure of the optimal channel.

Keywords: Optimal Estimation; Hypotheses; Capacity.

1. INTRODUCTION

The binary hypothesis testing problem is probably the simplest estimation problem one can consider. In the classical setup, a sequence of samples x_i is drawn from an unknown n-dimensional probability distribution which can be either p_1 (hypothesis H_1), with prior probability \pi_1, or p_2 (hypothesis H_2), with prior probability \pi_2. The problem is to infer from the samples which of the two hypotheses is correct or, to be precise, the more likely. This problem is very well known and is optimally solved using the likelihood ratio test (LRT), as shown, for example, in [1]. Furthermore, applying a large deviation principle, an asymptotic analysis can be performed to show that the probability of error in the estimation decays exponentially in the number of samples, with a rate given by the so-called Chernoff information.

In this paper we consider an extension of this problem, motivated by the fact that each collected sample is always obtained through a measuring system that can affect the estimation process. To model the effects due to the measuring system, we assume that the observations at the source are available only through a finite-capacity, discrete, memoryless stochastic channel. Our goal is to address the question of designing such a channel to maximize the goodness of the estimation, as measured by the decay rate of the probability of error, and to obtain a relationship between the capacity of the channel and the quality of the estimation.

(This work was supported under NSF/EFRI grant 0735956 and under AFOSR/MURI grant R6756-G.)

2. BASIC DEFINITIONS

In this section we briefly review some basic quantities defined in information theory. These quantities will be used throughout the whole paper, and some approximations will be introduced to make their definitions more tractable.

The Kullback-Leibler distance is one of the most important quantities in information theory and measures the distance between two probability distributions p and q defined over the same alphabet X. It is defined as:

    D(p \| q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)},    (1)

where the logarithm is always taken in base e. Since it is clear from definition (1) that D(p \| q) does not depend on the alphabet but only on the two distributions themselves, we will usually adopt the notation:

    D(p \| q) = \sum_{i=1}^n p_i \log \frac{p_i}{q_i},    (2)

where n is the cardinality of X and we omit the dependence on the alphabet.

Another measure of the distance between two distributions p and q, closely related to the binary hypothesis testing problem, is the so-called Chernoff information, defined as:

    C(p, q) = D(p_{\lambda^*} \| p) = D(p_{\lambda^*} \| q),    (3)

where p_\lambda is the probability distribution

    p_\lambda(i) = \frac{p_i^\lambda q_i^{1-\lambda}}{\sum_{j=1}^n p_j^\lambda q_j^{1-\lambda}}

and \lambda^* is such that D(p_{\lambda^*} \| p) = D(p_{\lambda^*} \| q).

We will often consider discrete, memoryless, stochastic channels mapping the alphabet X into a finite alphabet Y whose cardinality is m. This kind of channel is completely described by a conditional probability distribution:

    W(y|x) = P(Y = y | X = x),    (4)

which can be regarded as an m x n stochastic matrix and will often be denoted simply by W.
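[Editorial note: the following Python sketch is not part of the original paper. It evaluates definitions (1)-(3) numerically, finding \lambda^* by bisection; it relies on the monotonicity of D(p_\lambda \| p) and D(p_\lambda \| q) in \lambda recalled in Appendix A.]

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler distance D(p||q) in nats, as in (1)-(2)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def tilted(p, q, lam):
    """Geometric mixture p_lambda defined after (3)."""
    w = np.asarray(p, float) ** lam * np.asarray(q, float) ** (1.0 - lam)
    return w / w.sum()

def chernoff_information(p, q, tol=1e-12):
    """C(p,q) from (3): bisect on lambda until D(p_lam||p) = D(p_lam||q)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        pl = tilted(p, q, lam)
        # D(p_lam||p) decreases in lambda while D(p_lam||q) increases,
        # so move toward their crossing point.
        if kl(pl, p) > kl(pl, q):
            lo = lam
        else:
            hi = lam
    return kl(tilted(p, q, 0.5 * (lo + hi)), p)
```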
To measure the capacity of such a channel we will use the standard information-theoretic definition:

    C = \max_{p_x} I(X; Y),    (5)

where X is a random variable such that X ~ p_x and Y ~ W p_x is the corresponding random variable obtained through the channel. I(X; Y) is called the mutual information between X and Y and is defined as:

    I(X; Y) = D(p_{xy} \| p_x p_y) = E_x[ D(W(\cdot | x) \| p_y) ].    (6)
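[Editorial note: the paper treats the capacity (5) as a design constraint rather than computing it, but for experiments one can evaluate (6) directly and approach (5) with the classical Blahut-Arimoto iteration. A minimal sketch, reusing kl from above; as in the paper, the columns of W are the conditional laws W(.|x).]

```python
def mutual_information(W, px):
    """I(X;Y) from (6): the p_x-average of D(W(.|x) || p_y)."""
    W, px = np.asarray(W, float), np.asarray(px, float)
    py = W @ px  # output law p_y = W p_x
    return float(sum(px[i] * kl(W[:, i], py) for i in range(W.shape[1])))

def capacity(W, iters=500):
    """Capacity (5) of an m x n column-stochastic W via Blahut-Arimoto."""
    W = np.asarray(W, float)
    n = W.shape[1]
    px = np.full(n, 1.0 / n)
    for _ in range(iters):
        py = W @ px
        # multiplicative update p(x) <- p(x) exp(D(W(.|x)||p_y)), renormalized
        px *= np.exp([kl(W[:, i], py) for i in range(n)])
        px /= px.sum()
    return mutual_information(W, px)
```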

3. PROBLEM FORMULATION

The problem we face is essentially one of optimization; therefore, to provide a correct formulation, we have to identify three major components: optimization variables, cost function and constraints. In this section we define these components, trying to motivate the choices made.

3.1 Optimization variables.

We model our sample source as a discrete random variable X over a finite alphabet X such that |X| = n. The mass distribution of X depends on the unknown hypothesis:

    X ~ p_1 under H_1,    X ~ p_2 under H_2,    (7)

where p_1, p_2 \in R^n and P[H_1] = \pi_1, P[H_2] = \pi_2. The channel through which we obtain the measurements is assumed to be a discrete, memoryless, stochastic channel W mapping the alphabet X into a finite alphabet Y whose cardinality is m. Both the dimension m of the output alphabet and the channel W itself will be regarded as optimization variables, thus allowing complete flexibility in the choice of the most suitable channel.

3.2 Constraints.

Without any further assumption on the class of feasible channels, any such optimization problem would be solved by the choice m = n and W = I, which makes the random variable X perfectly measurable, as if there were no channel at all. To make the scenario more realistic we introduce a constraint on the capacity of the channel, as measured by the usual mutual information between X and Y:

    \max_{p_x} I(X; Y) <= C.    (8)

We made this choice because the capacity of a channel is a reasonable abstraction of its quality and is often the most critical specification for a communication system.

3.3 Cost function.

Since the random process y^n observable after the channel is still i.i.d., with a distribution that is either q_1 = W p_1 or q_2 = W p_2 depending on the true hypothesis, it is reasonable to measure the quality of the estimation by applying a standard binary hypothesis testing technique to the process y^n. We chose to optimize the asymptotic performance of the system in terms of the probability of error. Specifically, it is well known that for a binary hypothesis testing problem there exists a sequence of optimal estimators \hat{H}_n : Y^n -> {1, 2}, designed using a log-likelihood ratio, that minimize the probability of error given n samples:

    P_e(n) = P(\hat{H}_n(y_1, ..., y_n) = 2 | H = 1) \pi_1 + P(\hat{H}_n(y_1, ..., y_n) = 1 | H = 2) \pi_2.

Moreover, it has been shown that P_e(n) decays exponentially with n at a rate given by the Chernoff information C(q_1, q_2), that is:

    \lim_{n -> \infty} -\frac{1}{n} \log P_e(n) = C(q_1, q_2).

Our goal is then to maximize the Chernoff information C(q_1, q_2) = C(W p_1, W p_2), so that the probability of error decays as fast as possible. The complete formulation of the optimization problem can be written as:

    \max_{W, m}  C(W p_1, W p_2)
    s.t.  \max_{p_X} I(X; Y) <= C,
          \mathbf{1}^T W = \mathbf{1}^T,
          W_{i,j} >= 0.    (9)

In Section 5 we will assume that the capacity C is small enough that the cost function and the constraints in (9) can be approximated by more tractable expressions, leading to an approximate optimization problem valid for small capacities. By explicitly solving this problem we will gain some insight into the structure of the solutions of (9) in the small-C regime. In the next section we introduce some basic tools needed to perform the required approximations.
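[Editorial note: the objective of (9) is easy to evaluate for any candidate channel by combining the sketches above; the check below, with illustrative hypothesis vectors of my choosing, confirms that the identity channel attains the source Chernoff information, while any other stochastic channel can only lower the exponent (data processing).]

```python
def error_exponent(W, p1, p2):
    """Cost function of problem (9): Chernoff information of the output
    laws q1 = W p1, q2 = W p2, i.e. the decay rate of P_e(n)."""
    W = np.asarray(W, float)
    return chernoff_information(W @ np.asarray(p1, float),
                                W @ np.asarray(p2, float))

p1 = np.array([0.5, 0.3, 0.2])   # illustrative hypothesis distributions
p2 = np.array([0.2, 0.3, 0.5])
print(error_exponent(np.eye(3), p1, p2))   # identity channel: C(p1, p2)
```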
4. EUCLIDEAN APPROXIMATIONS

In this section we present some approximations to the quantities defined in Section 2. To obtain these approximations we follow the idea known as Euclidean information theory, presented in detail in [2]. We start from the simple Taylor expansion

    \log(1 + x) = x - \frac{x^2}{2} + \varphi(x),

where \varphi(x) = o(x^2) as x tends to 0.

Applying this expansion to definition (2) of the Kullback distance, we get:

    D(p \| q) = \sum_{i=1}^n p_i \log \frac{p_i}{q_i}
              = - \sum_{i=1}^n p_i \log\left( 1 + \frac{q_i - p_i}{p_i} \right)
              = - \sum_{i=1}^n p_i \left[ \frac{q_i - p_i}{p_i} - \frac{(q_i - p_i)^2}{2 p_i^2} + \varphi\left( \frac{q_i - p_i}{p_i} \right) \right]
              = \frac{1}{2} \| p - q \|_{[p]}^2 - \sum_{i=1}^n p_i \varphi\left( \frac{q_i - p_i}{p_i} \right),    (10)

where the first-order term vanishes because p and q both sum to 1, [p] is a diagonal matrix whose diagonal elements are given by p_i, i = 1, ..., n, and \| x \|_{[p]}^2 denotes the weighted norm x^T [p]^{-1} x.
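[Editorial note: a quick numerical illustration of (10), my addition: for q close to p the Kullback distance is nearly the weighted Euclidean distance.]

```python
# Check of (10): D(p||q) ~ (1/2) sum_i (p_i - q_i)^2 / p_i for q near p.
p = np.array([0.4, 0.35, 0.25])
d = np.array([1.0, -0.5, -0.5])   # zero-sum direction keeps q on the simplex
for eps in [1e-1, 1e-2, 1e-3]:
    q = p + eps * d
    quad = 0.5 * np.sum((p - q) ** 2 / p)
    print(eps, kl(p, q) / quad)    # ratio tends to 1 as eps -> 0
```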

We can simplify the expression in (10) by noticing that the last summation is an infinitesimal of higher order with respect to \| p - q \|_{[p]}^2, as proved by the following inequality:

    \frac{ \left| \sum_{i=1}^n p_i \varphi\left( \frac{q_i - p_i}{p_i} \right) \right| }{ \| p - q \|_{[p]}^2 }
    \le \frac{ n \, p_j \left| \varphi\left( \frac{q_j - p_j}{p_j} \right) \right| }{ (q_j - p_j)^2 / p_j }
    = n \, \frac{ \left| \varphi\left( \frac{q_j - p_j}{p_j} \right) \right| }{ \left( \frac{q_j - p_j}{p_j} \right)^2 } \longrightarrow 0,    (11)

where j is the index for which p_i | \varphi((q_i - p_i)/p_i) | is maximum and can be regarded as a function of p. Furthermore, the quantities \| p - q \|_{[p]} and \| p - q \|_{[q]} are infinitesimals of the same order as p -> q, since from the inequalities

    \min_i \frac{q_i}{p_i} \le \frac{ \| p - q \|_{[p]}^2 }{ \| p - q \|_{[q]}^2 } \le \max_i \frac{q_i}{p_i}

it follows that

    \lim_{p \to q} \frac{ \| p - q \|_{[p]}^2 }{ \| p - q \|_{[q]}^2 } = 1,

simply by applying the squeeze theorem. (For results on bounding a ratio of two quadratic forms we refer to [4].)

By virtue of the last two observations, the expression in (10) can finally be written as

    D(p \| q) = \frac{1}{2} \| p - q \|_{[q]}^2 + o( \| p - q \|_{[q]}^2 ),    (12)

and we can now use this expression to approximate both the definition of capacity (5) and the Chernoff information.

Regarding the Chernoff information, we can approximate it with an easier Kullback distance, as stated in the following proposition.

Proposition 1. If two probability distributions p and q, defined on the same alphabet X, are close enough, then the following approximation holds:

    C(p, q) \approx \frac{1}{4} D(p \| q).

More formally, we have:

    \lim_{p \to q} \frac{C(p, q)}{D(p \| q)} = \frac{1}{4}.

Proof: See Appendix A.

Regarding the definition of channel capacity (5), it is well known (see [3]) that if p^* is the optimal input distribution achieving the capacity and p_0^* is the corresponding output distribution, we have

    D(W_i \| p_0^*) = C  for all i such that p_i^* > 0,    D(W_i \| p_0^*) <= C  for all i such that p_i^* = 0,

where W_i denotes the i-th column of W. By virtue of this consideration, under the assumption of a small C, all the conditional distributions W_i will be close to p_0^*, and the distances are well approximated by the expression (12), thus obtaining:

    \| W_i - p_0 \|_{[p_0]}^2 <= 2C,  i = 1, ..., n.    (13)

It is easy to see that the converse is true as well: if we fix a point p_0 in the simplex and choose n probability vectors W_i satisfying the constraints (13), the resulting channel will have capacity less than C. Therefore conditions (13) are an alternative formulation of the channel capacity constraint (8), and their only disadvantage is that they require a new arbitrary probability vector p_0.
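[Editorial note: Proposition 1 is easy to probe numerically with the earlier helpers, my addition: as q approaches p along a fixed direction, the ratio C(p,q)/D(p||q) settles near 1/4.]

```python
# Proposition 1 numerically: C(p,q)/D(p||q) -> 1/4 as q -> p.
p = np.array([0.4, 0.35, 0.25])
d = np.array([1.0, -0.5, -0.5])   # zero-sum direction keeps q on the simplex
for eps in [1e-1, 1e-2, 1e-3]:
    q = p + eps * d
    print(eps, chernoff_information(p, q) / kl(p, q))
```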
5. NOISY CHANNEL SOLUTION

By the term "noisy channel" we mean a channel whose capacity C is small. In this section we aim to approximate, under the assumption C << 1, the general problem (9) with a more tractable optimization problem whose solution can be computed explicitly, which will allow us to understand the behavior of (9) in the noisy channel regime.

If C << 1 we can take advantage of the constraints (13), since they imply that W p_1 and W p_2 are close no matter what p_1 and p_2 are. If W p_1 and W p_2 are close then, by virtue of Proposition 1, we can use the approximation

    C(W p_1, W p_2) \approx \frac{1}{4} D(W p_1 \| W p_2),

and therefore maximizing the Chernoff information turns out to be equivalent to maximizing the Kullback distance D(W p_1 \| W p_2). Finally, using equation (12) again, we can approximate the Chernoff information via a Euclidean distance:

    C(W p_1, W p_2) \approx \frac{1}{4} D(W p_1 \| W p_2) \approx \frac{1}{8} \| W p_1 - W p_2 \|_{[p_0]}^2.    (14)

Using the approximation (13) of the capacity constraint and the result in (14), the original problem (9) can be approximated by:

    \max_{W, m, p_0}  \frac{1}{8} \| W (p_1 - p_2) \|_{[p_0]}^2
    s.t.  \| W_i - p_0 \|_{[p_0]}^2 <= 2C,  i = 1, ..., n,
          \mathbf{1}^T W = \mathbf{1}^T,
          W_{i,j} >= 0,    (15)

and the advantage of this formulation is that it leads to an analytical solution, as stated in the following proposition.

Proposition 2. Choose arbitrarily m and a point p_0 in the m-dimensional simplex, and consider an arbitrary probability vector w_A such that \| w_A - p_0 \|_{[p_0]}^2 = 2C, as well as the only other vector w_B whose distance from p_0 is \sqrt{2C} and which is opposite to w_A with respect to p_0, that is, w_B = 2 p_0 - w_A. Next consider the following channel:

    W_i^* = w_A  if p_{1,i} >= p_{2,i},    W_i^* = w_B  if p_{1,i} < p_{2,i},    i = 1, ..., n.    (16)

Then channel (16) is the optimal solution of (15), and the associated optimal cost is

    \frac{1}{4} C \| p_1 - p_2 \|_1^2.    (17)

Proof: Let us consider m and p_0 fixed. We will prove the statement by showing first that the expression (17) is an upper bound on the optimal value, and then that W^* achieves that bound.

To bound the cost we use the fact that p_1 - p_2 adds up to zero, and therefore, if A is a matrix with all columns equal to each other, then A (p_1 - p_2) = 0. Formally, we obtain:

    \| W (p_1 - p_2) \|_{[p_0]} = \| (W - p_0 \mathbf{1}^T)(p_1 - p_2) \|_{[p_0]}
                                = \left\| \sum_{i=1}^n (W_i - p_0)(p_{1,i} - p_{2,i}) \right\|_{[p_0]}
                                \le \sum_{i=1}^n \| W_i - p_0 \|_{[p_0]} \, | p_{1,i} - p_{2,i} |
                                \le \sqrt{2C} \, \| p_1 - p_2 \|_1,

which is equivalent to

    \frac{1}{8} \| W (p_1 - p_2) \|_{[p_0]}^2 \le \frac{1}{4} C \| p_1 - p_2 \|_1^2.

To prove that W^* achieves this bound, let us start by defining the quantity

    \alpha = \sum_{i : p_{1,i} \ge p_{2,i}} (p_{1,i} - p_{2,i})

and noticing that, since p_1 - p_2 adds up to zero, we also have:

    \alpha = \sum_{i : p_{1,i} < p_{2,i}} (p_{2,i} - p_{1,i}),    2\alpha = \| p_1 - p_2 \|_1.

Now, with some algebra, we get:

    \frac{1}{8} \| W^* (p_1 - p_2) \|_{[p_0]}^2
      = \frac{1}{8} \left\| \sum_{i : p_{1,i} \ge p_{2,i}} w_A (p_{1,i} - p_{2,i}) + \sum_{i : p_{1,i} < p_{2,i}} w_B (p_{1,i} - p_{2,i}) \right\|_{[p_0]}^2
      = \frac{1}{8} \| \alpha w_A - \alpha w_B \|_{[p_0]}^2
      = \frac{1}{8} \| 2 \alpha (w_A - p_0) \|_{[p_0]}^2
      = \alpha^2 C
      = \frac{1}{4} C \| p_1 - p_2 \|_1^2.

Remarkably, the optimal value we obtained with m and p_0 fixed turns out to be completely independent of m and p_0; therefore problem (15) is solved by any triple (W^*, m, p_0) where m and p_0 can be chosen arbitrarily, provided that the definition of W^* in (16) yields a well-defined stochastic matrix (namely, p_0 must be chosen far enough from the borders of the simplex that w_A and w_B fall inside the simplex). A graphical depiction of w_A and w_B, used to construct the optimal channel, is reported in Figure 1; we point out that, since C is considered small, it is always possible to determine such a pair of vectors inside the simplex.

Fig. 1. Position of w_A and w_B in the simplex with respect to p_0.

The result just proven shows that for small capacity the behavior of the Chernoff bound is linear in C and proportional to the squared L1 distance between the two hypotheses. In the next section we present some observations, based on simulations alone, regarding the behavior for larger C.
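[Editorial note: a small end-to-end check of Proposition 2, my construction; the uniform p_0 and the first-coordinate direction are arbitrary choices permitted by the proposition, and the helpers are the ones defined earlier.]

```python
def optimal_noisy_channel(p1, p2, C, m=2):
    """Small-capacity optimum (16): inputs with p1_i >= p2_i map to w_A,
    the others to w_B = 2*p0 - w_A, with ||w_A - p0||^2_{[p0]} = 2C."""
    p0 = np.full(m, 1.0 / m)                 # arbitrary interior point
    v = np.zeros(m)
    v[0], v[1] = 1.0, -1.0                   # arbitrary zero-sum direction (m >= 2)
    v *= np.sqrt(2.0 * C / np.sum(v ** 2 / p0))   # enforce distance sqrt(2C)
    wA, wB = p0 + v, p0 - v
    cols = [wA if p1[i] >= p2[i] else wB for i in range(len(p1))]
    return np.stack(cols, axis=1)            # m x n stochastic matrix

p1 = np.array([0.5, 0.3, 0.2])
p2 = np.array([0.2, 0.3, 0.5])
C = 1e-3
W = optimal_noisy_channel(p1, p2, C)
print(capacity(W))                            # ~ C (up to o(C))
print(error_exponent(W, p1, p2))              # ~ (1/4) C ||p1 - p2||_1^2
print(0.25 * C * np.sum(np.abs(p1 - p2)) ** 2)
```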

6. LARGE CAPACITY BEHAVIOR

As the capacity increases, problem (15) no longer approximates the original optimization problem (9). In the general case, finding an analytical solution to (9) is unrealistic, but we can still make some remarks. In this section we point out some of these interesting features and present a numerical result.

For each m, the solution of (9) as a function of C is monotone increasing, and the optimal channel always has capacity exactly C. This is true because the cost function can be shown to be convex and W belongs to a convex set, by virtue of the convexity of I(X; Y) with respect to the channel.

For each m, the performance does not improve for C >= log m, because the maximum capacity achievable with an m-dimensional output alphabet is always at most log m. Moreover, if m = n, then for C >= log n we obtain exactly the Chernoff information, since among the feasible channels there is the identity channel I, which allows the samples to be measured directly from the source.

Finally, the curves obtained for m > n appear to be identical to the one obtained for m = n. Interestingly, for some choices of the hypotheses p_1 and p_2, the Chernoff information is reached (with m = n) before the limit C = log n.

In Figure 2 we show the solution of problem (9) with m kept as a parameter. Only two different values of m have been taken into account, but it is still possible to observe some of the behaviors just pointed out.

Fig. 2. Solution of problem (9) with n = 3 for a fixed pair of hypotheses p_1 and p_2: error exponent versus channel capacity (in nats, from 0 to log 3), for m = 2 and m = 3, together with the small-capacity result and the Chernoff information C(p_1, p_2).

7. CONCLUSIONS AND FUTURE WORK

In this paper we considered a modified version of the binary hypothesis testing problem in which the samples are measured through a channel. We looked for the best possible channel among those with a limited capacity, and we showed that, if the channel has a small capacity, this optimization problem can be approximated by a quadratic one. The optimal solution of the approximating problem achieves an error exponent given by (1/4) C \| p_1 - p_2 \|_1^2, where C is the capacity of the channel and p_1 and p_2 are the two hypotheses. In the small-C regime we were also able to provide an explicit formula for the optimal channel. It is not yet formally proved, although clearly supported by simulations, that the optimal solution of the approximating problem converges to the solution of the original problem as C tends to 0. We are currently working on some generalizations to the m-ary case as well as to some non-i.i.d. models such as hidden Markov models.

8. ACKNOWLEDGMENT

The authors wish to thank Mesrob I. Ohannessian for his suggestions and help in proving Proposition 1.

REFERENCES

[1] T. M. Cover, J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 1991.
[2] S. Borade, L. Zheng. Euclidean Information Theory. 2008 IEEE International Zurich Seminar on Communications, pages 14-17, 2008.
[3] R. Gallager. Information Theory and Reliable Communication. Wiley, 1968.
[4] F. Caliskan, C. Hajiyev. Sensor fault detection in flight control systems based on the Kalman filter innovation sequence. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, volume 213, issue 3, pages 143-148, 1999.
[5] A. Bhattacharyya. On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society, volume 35, pages 99-109, 1943.

Appendix A. PROOF OF PROPOSITION 1

We have to prove that:

    \lim_{p \to q} \frac{C(p, q)}{D(p \| q)} = \frac{1}{4}.

First of all, we point out that if some of the components of q are equal to zero, then D(p \| q) is not defined unless the same components of p are zero as well, and the limit p -> q is taken over the subspace in which p_i = 0 whenever q_i = 0. For this reason, without loss of generality, we can restrict our analysis to the case q_i > 0.

Since the definition of the Chernoff information in (3) is not in closed form, in this section we provide an explicit expression approximating it when p and q are close enough. In order to deal easily with equation (3), let us introduce some notational conventions:

    D_q(\lambda) = D(p_\lambda \| q),    D_p(\lambda) = D(p_\lambda \| p),
    \hat{D}_q(\lambda) = \frac{1}{2} \| p_\lambda - q \|_{[q]}^2,    \hat{D}_p(\lambda) = \frac{1}{2} \| p_\lambda - p \|_{[p]}^2.

In [1] it is shown that the function D_p(\lambda) is monotone decreasing in \lambda \in [0, 1] while D_q(\lambda) is increasing in the same interval; moreover, there exists a unique \lambda^* \in [0, 1] such that D_p(\lambda^*) = D_q(\lambda^*). It is also easy to show that \hat{D}_p(\lambda) is monotone decreasing and \hat{D}_q(\lambda) is monotone increasing in [0, 1], and that the unique value of \lambda satisfying the equation \hat{D}_p(\lambda) = \hat{D}_q(\lambda) is \lambda = 1/2. In fact, if we denote by \phi = \sum_{i=1}^n \sqrt{p_i q_i} the Bhattacharyya coefficient [5], we have:

    \hat{D}_q(1/2) = \frac{1}{2} \sum_{i=1}^n \frac{1}{q_i} \left( \frac{\sqrt{p_i q_i}}{\phi} - q_i \right)^2
                   = \frac{1}{2} \sum_{i=1}^n \left( \frac{p_i}{\phi^2} - \frac{2 \sqrt{p_i q_i}}{\phi} + q_i \right)
                   = \frac{1}{2} \left( \frac{1}{\phi^2} - 1 \right),    (A.1)

which, by the same argument, can be shown to be equal to \hat{D}_p(1/2).

We now show that the expression just found in (A.1) can be regarded as an approximation of the Chernoff information whose distance from the latter is an infinitesimal of higher order with respect to \| p - q \|_{[q]}^2. Let us start by examining the difference D_q - \hat{D}_q which, using the uniform bound \| p_\lambda - q \|_{[q]} \le \| p - q \|_{[q]} for all \lambda and by virtue of equation (12), turns out to be small as p -> q:

    D_q(\lambda) - \hat{D}_q(\lambda) = \delta_q(\lambda) = o( \| p_\lambda - q \|_{[q]}^2 ) = o( \| p - q \|_{[q]}^2 )  for all \lambda.    (A.2)

Using the same argument and the result in (12), we can derive a similar result for the difference D_p - \hat{D}_p:

    D_p(\lambda) - \hat{D}_p(\lambda) = \delta_p(\lambda) = o( \| p_\lambda - p \|_{[p]}^2 ) = o( \| q - p \|_{[p]}^2 ) = o( \| p - q \|_{[q]}^2 )  for all \lambda.    (A.3)

If we now introduce the two functions

    f(\lambda) = D_p(\lambda) - D_q(\lambda),    \hat{f}(\lambda) = \hat{D}_p(\lambda) - \hat{D}_q(\lambda),

keeping in mind that \hat{f}(1/2) = 0, we can obtain the following bound on |f(1/2)|:

    |f(1/2)| = | f(1/2) - \hat{f}(1/2) |
             = | D_p(1/2) - \hat{D}_p(1/2) + \hat{D}_q(1/2) - D_q(1/2) |
             \le | \delta_p(1/2) | + | \delta_q(1/2) |.    (A.4)

Using the fact that | dD_q / d\lambda | \le | df / d\lambda | together with the results in (A.2), (A.3) and (A.4), we can now show that the distance between \hat{D}_q(1/2) and the Chernoff information C(p, q) = D_q(\lambda^*) is small as p -> q:

    | \hat{D}_q(1/2) - D_q(\lambda^*) | \le | D_q(1/2) - D_q(\lambda^*) | + | \delta_q(1/2) |
                                        \le | f(1/2) - f(\lambda^*) | + | \delta_q(1/2) |
                                        = | f(1/2) | + | \delta_q(1/2) |
                                        \le | \delta_p(1/2) | + 2 | \delta_q(1/2) |
                                        = o( \| p - q \|_{[q]}^2 ),    (A.5)

where we used f(\lambda^*) = 0. The result found in (A.5) allows us to write the Chernoff information in an explicit form suitable for our purposes; more precisely:

    C(p, q) = \frac{1}{2} \left( \frac{1}{\phi^2} - 1 \right) + o( \| p - q \|_{[q]}^2 ) = \hat{C}(p, q) + o( \| p - q \|_{[q]}^2 ).    (A.6)

In order to compute the limit of C(p, q) / D(p \| q) on the n-dimensional simplex, let us first reduce the dimension to an (n-1)-dimensional space where we get rid of the constraint \sum_i p_i = 1. In this lower-dimensional space the approximate expression for the Kullback distance becomes:

    \| p - q \|_{[q]}^2 = \sum_{i=1}^{n-1} \frac{(p_i - q_i)^2}{q_i} + \frac{1}{q_n} \left( \sum_{i=1}^{n-1} (p_i - q_i) \right)^2
                        = (\tilde{p} - \tilde{q})^T \left( [\tilde{q}]^{-1} + \frac{1}{q_n} \mathbf{1} \mathbf{1}^T \right) (\tilde{p} - \tilde{q})
                        = (\tilde{p} - \tilde{q})^T M_q (\tilde{p} - \tilde{q}),    (A.7)

where \tilde{p} and \tilde{q} are the (n-1)-dimensional vectors formed by the first n-1 elements of p and q, and \mathbf{1} is the (n-1)-dimensional vector of all 1s. The approximate expression for the Chernoff information becomes:

    \hat{C}(p, q) = \frac{1}{2} \left( \frac{1}{\phi^2} - 1 \right),  with
    \phi = \sum_{i=1}^{n-1} \sqrt{p_i q_i} + \sqrt{ (1 - \mathbf{1}^T \tilde{p})(1 - \mathbf{1}^T \tilde{q}) },  so that  \hat{C}(p, q) = F(\tilde{p}, \tilde{q}).    (A.8)

Before considering the limit, let us compute a Taylor expansion of the function F around \tilde{q}. After some straightforward computations we obtain:

    F |_{\tilde{p} = \tilde{q}} = 0,
    \frac{\partial F}{\partial p_i} \Big|_{\tilde{p} = \tilde{q}} = 0,  i = 1, ..., n-1,
    \frac{\partial^2 F}{\partial p_i^2} \Big|_{\tilde{p} = \tilde{q}} = \frac{1}{4} \left( \frac{1}{q_i} + \frac{1}{q_n} \right),  i = 1, ..., n-1,
    \frac{\partial^2 F}{\partial p_i \partial p_j} \Big|_{\tilde{p} = \tilde{q}} = \frac{1}{4 q_n},  i \ne j;

therefore we have:

    \hat{C}(p, q) = \frac{1}{2!} (\tilde{p} - \tilde{q})^T \, \frac{1}{4} \left( [\tilde{q}]^{-1} + \frac{1}{q_n} \mathbf{1} \mathbf{1}^T \right) (\tilde{p} - \tilde{q}) + o( \| \tilde{p} - \tilde{q} \|^2 )
                  = \frac{1}{8} (\tilde{p} - \tilde{q})^T M_q (\tilde{p} - \tilde{q}) + o( \| \tilde{p} - \tilde{q} \|^2 ).

Collecting the results obtained so far, the limit we want to compute is now straightforward:

    \lim_{p \to q} \frac{C(p, q)}{D(p \| q)}
      = \lim_{p \to q} \frac{ \hat{C}(p, q) }{ \frac{1}{2} \| p - q \|_{[q]}^2 }
      = \lim_{\tilde{p} \to \tilde{q}} \frac{ \frac{1}{8} (\tilde{p} - \tilde{q})^T M_q (\tilde{p} - \tilde{q}) + o( \| \tilde{p} - \tilde{q} \|^2 ) }{ \frac{1}{2} (\tilde{p} - \tilde{q})^T M_q (\tilde{p} - \tilde{q}) }
      = \frac{1}{4}.
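[Editorial note: the closed form (A.1)/(A.6) is also easy to sanity-check numerically with the earlier helpers, my addition.]

```python
# Check of (A.1)/(A.6): (1/2)(1/phi^2 - 1), with phi the Bhattacharyya
# coefficient, approximates C(p,q) for nearby distributions.
p = np.array([0.4, 0.35, 0.25])
q = p + 1e-2 * np.array([1.0, -0.5, -0.5])
phi = np.sum(np.sqrt(p * q))
print(0.5 * (1.0 / phi ** 2 - 1.0), chernoff_information(p, q))
```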