Channel optimization for binary hypothesis testing


Channel optimization for binary hypothesis testing

Giancarlo Baldan (MIT, Laboratory for Information and Decision Systems, Cambridge, MA 02139 USA, email: gbaldan@mit.edu)
Munther Dahleh (MIT, Laboratory for Information and Decision Systems, Cambridge, MA 02139 USA, email: dahleh@mit.edu)

Abstract: In this paper we consider the classical binary hypothesis testing problem where the i.i.d. samples are obtained through a channel. Our goal is to study the relationship between the channel capacity and the goodness of the estimation, measured by the Chernoff information, in order to obtain an upper bound on the estimation performance as well as some insight into the structure of the optimal channel.

Keywords: Optimal Estimation; Hypotheses; Capacity.

1. INTRODUCTION

The binary hypothesis testing problem is probably the simplest estimation problem one can consider. In the classical setup, a sequence of samples x_i is drawn from an unknown n-dimensional probability distribution which can be either p_1 (hypothesis H_1), with prior probability \pi_1, or p_2 (hypothesis H_2), with prior probability \pi_2. The problem is to infer from the samples which of the two hypotheses is correct or, to be precise, the more likely. This problem is very well known and is optimally solved using the likelihood ratio test (LRT), as shown, for example, in [1]. Furthermore, applying a large deviation principle, an asymptotic analysis can be performed to show that the probability of error in the estimation decays exponentially in the number of samples, with a rate given by the so-called Chernoff information.

In this paper we consider an extension of this problem, motivated by the fact that each collected sample is always obtained through a measuring system that can affect the estimation process. To model the effects due to the measuring system, we assume that the observations at the source are available only through a finite-capacity, discrete, memoryless stochastic channel. Our goal is to address the question of designing such a channel to maximize the goodness of the estimation, as measured by the decay rate of the probability of error, and to obtain a relationship between the capacity of the channel and the quality of the estimation.

(This work was supported under NSF/EFRI grant 0735956 and under AFOSR/MURI grant R6756-G.)

2. BASIC DEFINITIONS

In this section we briefly review some basic quantities defined in information theory. These quantities will be used throughout the whole paper, and some approximations will be introduced to make their definitions more tractable.

The Kullback-Leibler distance is one of the most important quantities in information theory and measures the distance between two probability distributions p and q defined over the same alphabet X. It is defined as:

    D(p \| q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)},    (1)

where the logarithm is always taken in base e. Since it is clear from definition (1) that D(p \| q) does not depend on the alphabet but only on the two distributions themselves, we will usually adopt the notation:

    D(p \| q) = \sum_{i=1}^n p_i \log \frac{p_i}{q_i},    (2)

where n is the cardinality of X and we omit the dependence on the alphabet.

Another measure of the distance between two distributions p and q, closely related to the binary hypothesis testing problem, is the so-called Chernoff information, defined as:

    C(p, q) = D(p_{\lambda^*} \| p) = D(p_{\lambda^*} \| q),    (3)

where p_\lambda is the probability distribution

    p_\lambda(i) = \frac{p_i^\lambda q_i^{1-\lambda}}{\sum_{j=1}^n p_j^\lambda q_j^{1-\lambda}}

and \lambda^* is such that D(p_{\lambda^*} \| p) = D(p_{\lambda^*} \| q).

We will often consider discrete, memoryless, stochastic channels mapping the alphabet X into a finite alphabet Y whose cardinality is m. This kind of channel is completely described by a conditional probability distribution:

    W(y|x) = P(Y = y | X = x),    (4)

which can be regarded as an m x n stochastic matrix and will often be denoted simply by W.
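[Editorial note: the following Python sketch is not part of the original paper. It evaluates definitions (1)-(3) numerically, finding \lambda^* by bisection; it relies on the monotonicity of D(p_\lambda \| p) and D(p_\lambda \| q) in \lambda recalled in Appendix A.]

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler distance D(p||q) in nats, as in (1)-(2)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def tilted(p, q, lam):
    """Geometric mixture p_lambda defined after (3)."""
    w = np.asarray(p, float) ** lam * np.asarray(q, float) ** (1.0 - lam)
    return w / w.sum()

def chernoff_information(p, q, tol=1e-12):
    """C(p,q) from (3): bisect on lambda until D(p_lam||p) = D(p_lam||q)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        pl = tilted(p, q, lam)
        # D(p_lam||p) decreases in lambda while D(p_lam||q) increases,
        # so move toward their crossing point.
        if kl(pl, p) > kl(pl, q):
            lo = lam
        else:
            hi = lam
    return kl(tilted(p, q, 0.5 * (lo + hi)), p)
```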
To measure the capacity of such a channel we will use the standard information-theoretic definition:

    C = \max_{p_x} I(X; Y),    (5)

where X is a random variable such that X ~ p_x and Y ~ W p_x is the corresponding random variable obtained through the channel. I(X; Y) is called the mutual information between X and Y and is defined as:

    I(X; Y) = D(p_{xy} \| p_x p_y) = E_x[ D(W(\cdot | x) \| p_y) ].    (6)
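[Editorial note: the paper treats the capacity (5) as a design constraint rather than computing it, but for experiments one can evaluate (6) directly and approach (5) with the classical Blahut-Arimoto iteration. A minimal sketch, reusing kl from above; as in the paper, the columns of W are the conditional laws W(.|x).]

```python
def mutual_information(W, px):
    """I(X;Y) from (6): the p_x-average of D(W(.|x) || p_y)."""
    W, px = np.asarray(W, float), np.asarray(px, float)
    py = W @ px  # output law p_y = W p_x
    return float(sum(px[i] * kl(W[:, i], py) for i in range(W.shape[1])))

def capacity(W, iters=500):
    """Capacity (5) of an m x n column-stochastic W via Blahut-Arimoto."""
    W = np.asarray(W, float)
    n = W.shape[1]
    px = np.full(n, 1.0 / n)
    for _ in range(iters):
        py = W @ px
        # multiplicative update p(x) <- p(x) exp(D(W(.|x)||p_y)), renormalized
        px *= np.exp([kl(W[:, i], py) for i in range(n)])
        px /= px.sum()
    return mutual_information(W, px)
```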

3. PROBLEM FORMULATION

The problem we face is essentially one of optimization; therefore, to provide a correct formulation, we have to identify three major components: optimization variables, cost function and constraints. In this section we define these components, trying to motivate the choices made.

3.1 Optimization variables.

We model our sample source as a discrete random variable X over a finite alphabet X such that |X| = n. The mass distribution of X depends on the unknown hypothesis:

    X ~ p_1 under H_1,    X ~ p_2 under H_2,    (7)

where p_1, p_2 \in R^n and P[H_1] = \pi_1, P[H_2] = \pi_2. The channel through which we obtain the measurements is assumed to be a discrete, memoryless, stochastic channel W mapping the alphabet X into a finite alphabet Y whose cardinality is m. Both the dimension m of the output alphabet and the channel W itself will be regarded as optimization variables, thus allowing complete flexibility in the choice of the most suitable channel.

3.2 Constraints.

Without any further assumption on the class of feasible channels, any such optimization problem would be solved by the choice m = n and W = I, which makes the random variable X perfectly measurable, as if there were no channel at all. To make the scenario more realistic we introduce a constraint on the capacity of the channel, as measured by the usual mutual information between X and Y:

    \max_{p_x} I(X; Y) <= C.    (8)

We made this choice because the capacity of a channel is a reasonable abstraction of its quality and is often the most critical specification for a communication system.

3.3 Cost function.

Since the random process y^n observable after the channel is still i.i.d., with a distribution that is either q_1 = W p_1 or q_2 = W p_2 depending on the true hypothesis, it is reasonable to measure the quality of the estimation by applying a standard binary hypothesis testing technique to the process y^n. We chose to optimize the asymptotic performance of the system in terms of the probability of error. Specifically, it is well known that for a binary hypothesis testing problem there exists a sequence of optimal estimators \hat{H}_n : Y^n -> {1, 2}, designed using a log-likelihood ratio, that minimize the probability of error given n samples:

    P_e(n) = P(\hat{H}_n(y_1, ..., y_n) = 2 | H = 1) \pi_1 + P(\hat{H}_n(y_1, ..., y_n) = 1 | H = 2) \pi_2.

Moreover, it has been shown that P_e(n) decays exponentially with n at a rate given by the Chernoff information C(q_1, q_2), that is:

    \lim_{n -> \infty} -\frac{1}{n} \log P_e(n) = C(q_1, q_2).

Our goal is then to maximize the Chernoff information C(q_1, q_2) = C(W p_1, W p_2), so that the probability of error decays as fast as possible. The complete formulation of the optimization problem can be written as:

    \max_{W, m}  C(W p_1, W p_2)
    s.t.  \max_{p_X} I(X; Y) <= C,
          \mathbf{1}^T W = \mathbf{1}^T,
          W_{i,j} >= 0.    (9)

In Section 5 we will assume that the capacity C is small enough that the cost function and the constraints in (9) can be approximated by more tractable expressions, leading to an approximate optimization problem valid for small capacities. By explicitly solving this problem we will gain some insight into the structure of the solutions of (9) in the small-C regime. In the next section we introduce some basic tools needed to perform the required approximations.
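[Editorial note: the objective of (9) is easy to evaluate for any candidate channel by combining the sketches above; the check below, with illustrative hypothesis vectors of my choosing, confirms that the identity channel attains the source Chernoff information, while any other stochastic channel can only lower the exponent (data processing).]

```python
def error_exponent(W, p1, p2):
    """Cost function of problem (9): Chernoff information of the output
    laws q1 = W p1, q2 = W p2, i.e. the decay rate of P_e(n)."""
    W = np.asarray(W, float)
    return chernoff_information(W @ np.asarray(p1, float),
                                W @ np.asarray(p2, float))

p1 = np.array([0.5, 0.3, 0.2])   # illustrative hypothesis distributions
p2 = np.array([0.2, 0.3, 0.5])
print(error_exponent(np.eye(3), p1, p2))   # identity channel: C(p1, p2)
```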
4. EUCLIDEAN APPROXIMATIONS

In this section we present some approximations to the quantities defined in Section 2. To obtain these approximations we follow the idea known as Euclidean information theory, presented in detail in [2]. We start from the simple Taylor expansion

    \log(1 + x) = x - \frac{x^2}{2} + \varphi(x),

where \varphi(x) = o(x^2) as x tends to 0.

Applying this expansion to definition (2) of the Kullback distance, we get:

    D(p \| q) = \sum_{i=1}^n p_i \log \frac{p_i}{q_i}
              = - \sum_{i=1}^n p_i \log\left( 1 + \frac{q_i - p_i}{p_i} \right)
              = - \sum_{i=1}^n p_i \left[ \frac{q_i - p_i}{p_i} - \frac{(q_i - p_i)^2}{2 p_i^2} + \varphi\left( \frac{q_i - p_i}{p_i} \right) \right]
              = \frac{1}{2} \| p - q \|_{[p]}^2 - \sum_{i=1}^n p_i \varphi\left( \frac{q_i - p_i}{p_i} \right),    (10)

where the first-order term vanishes because p and q both sum to 1, [p] is a diagonal matrix whose diagonal elements are given by p_i, i = 1, ..., n, and \| x \|_{[p]}^2 denotes the weighted norm x^T [p]^{-1} x.
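[Editorial note: a quick numerical illustration of (10), my addition: for q close to p the Kullback distance is nearly the weighted Euclidean distance.]

```python
# Check of (10): D(p||q) ~ (1/2) sum_i (p_i - q_i)^2 / p_i for q near p.
p = np.array([0.4, 0.35, 0.25])
d = np.array([1.0, -0.5, -0.5])   # zero-sum direction keeps q on the simplex
for eps in [1e-1, 1e-2, 1e-3]:
    q = p + eps * d
    quad = 0.5 * np.sum((p - q) ** 2 / p)
    print(eps, kl(p, q) / quad)    # ratio tends to 1 as eps -> 0
```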

We can simplify the expression in (10) by noticing that the last summation is an infinitesimal of higher order with respect to \| p - q \|_{[p]}^2, as proved by the following inequality:

    \frac{ \left| \sum_{i=1}^n p_i \varphi\left( \frac{q_i - p_i}{p_i} \right) \right| }{ \| p - q \|_{[p]}^2 }
    \le \frac{ n \, p_j \left| \varphi\left( \frac{q_j - p_j}{p_j} \right) \right| }{ (q_j - p_j)^2 / p_j }
    = n \, \frac{ \left| \varphi\left( \frac{q_j - p_j}{p_j} \right) \right| }{ \left( \frac{q_j - p_j}{p_j} \right)^2 } \longrightarrow 0,    (11)

where j is the index for which p_i | \varphi((q_i - p_i)/p_i) | is maximum and can be regarded as a function of p. Furthermore, the quantities \| p - q \|_{[p]} and \| p - q \|_{[q]} are infinitesimals of the same order as p -> q, since from the inequalities

    \min_i \frac{q_i}{p_i} \le \frac{ \| p - q \|_{[p]}^2 }{ \| p - q \|_{[q]}^2 } \le \max_i \frac{q_i}{p_i}

it follows that

    \lim_{p \to q} \frac{ \| p - q \|_{[p]}^2 }{ \| p - q \|_{[q]}^2 } = 1,

simply by applying the squeeze theorem. (For results on bounding a ratio of two quadratic forms we refer to [4].)

By virtue of the last two observations, the expression in (10) can finally be written as

    D(p \| q) = \frac{1}{2} \| p - q \|_{[q]}^2 + o( \| p - q \|_{[q]}^2 ),    (12)

and we can now use this expression to approximate both the definition of capacity (5) and the Chernoff information.

Regarding the Chernoff information, we can approximate it with an easier Kullback distance, as stated in the following proposition.

Proposition 1. If two probability distributions p and q, defined on the same alphabet X, are close enough, then the following approximation holds:

    C(p, q) \approx \frac{1}{4} D(p \| q).

More formally, we have:

    \lim_{p \to q} \frac{C(p, q)}{D(p \| q)} = \frac{1}{4}.

Proof: See Appendix A.

Regarding the definition of channel capacity (5), it is well known (see [3]) that if p^* is the optimal input distribution achieving the capacity and p_0^* is the corresponding output distribution, we have

    D(W_i \| p_0^*) = C  for all i such that p_i^* > 0,    D(W_i \| p_0^*) <= C  for all i such that p_i^* = 0,

where W_i denotes the i-th column of W. By virtue of this consideration, under the assumption of a small C, all the conditional distributions W_i will be close to p_0^*, and the distances are well approximated by the expression (12), thus obtaining:

    \| W_i - p_0 \|_{[p_0]}^2 <= 2C,  i = 1, ..., n.    (13)

It is easy to see that the converse is true as well: if we fix a point p_0 in the simplex and choose n probability vectors W_i satisfying the constraints (13), the resulting channel will have capacity less than C. Therefore conditions (13) are an alternative formulation of the channel capacity constraint (8), and their only disadvantage is that they require a new arbitrary probability vector p_0.
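[Editorial note: Proposition 1 is easy to probe numerically with the earlier helpers, my addition: as q approaches p along a fixed direction, the ratio C(p,q)/D(p||q) settles near 1/4.]

```python
# Proposition 1 numerically: C(p,q)/D(p||q) -> 1/4 as q -> p.
p = np.array([0.4, 0.35, 0.25])
d = np.array([1.0, -0.5, -0.5])   # zero-sum direction keeps q on the simplex
for eps in [1e-1, 1e-2, 1e-3]:
    q = p + eps * d
    print(eps, chernoff_information(p, q) / kl(p, q))
```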
5. NOISY CHANNEL SOLUTION

By the term "noisy channel" we mean a channel whose capacity C is small. In this section we aim to approximate, under the assumption C << 1, the general problem (9) with a more tractable optimization problem whose solution can be computed explicitly, which will allow us to understand the behavior of (9) in the noisy channel regime.

If C << 1 we can take advantage of the constraints (13), since they imply that W p_1 and W p_2 are close no matter what p_1 and p_2 are. If W p_1 and W p_2 are close then, by virtue of Proposition 1, we can use the approximation

    C(W p_1, W p_2) \approx \frac{1}{4} D(W p_1 \| W p_2),

and therefore maximizing the Chernoff information turns out to be equivalent to maximizing the Kullback distance D(W p_1 \| W p_2). Finally, using equation (12) again, we can approximate the Chernoff information via a Euclidean distance:

    C(W p_1, W p_2) \approx \frac{1}{4} D(W p_1 \| W p_2) \approx \frac{1}{8} \| W p_1 - W p_2 \|_{[p_0]}^2.    (14)

Using the approximation (13) of the capacity constraint and the result in (14), the original problem (9) can be approximated by:

    \max_{W, m, p_0}  \frac{1}{8} \| W (p_1 - p_2) \|_{[p_0]}^2
    s.t.  \| W_i - p_0 \|_{[p_0]}^2 <= 2C,  i = 1, ..., n,
          \mathbf{1}^T W = \mathbf{1}^T,
          W_{i,j} >= 0,    (15)

and the advantage of this formulation is that it leads to an analytical solution, as stated in the following proposition.

Proposition 2. Choose arbitrarily m and a point p_0 in the m-dimensional simplex, and consider an arbitrary probability vector w_A such that \| w_A - p_0 \|_{[p_0]}^2 = 2C, as well as the only other vector w_B whose distance from p_0 is \sqrt{2C} and which is opposite to w_A with respect to p_0, that is, w_B = 2 p_0 - w_A. Next consider the following channel:

    W_i^* = w_A  if p_{1,i} >= p_{2,i},    W_i^* = w_B  if p_{1,i} < p_{2,i},    i = 1, ..., n.    (16)

Then channel (16) is the optimal solution of (15), and the associated optimal cost is

    \frac{1}{4} C \| p_1 - p_2 \|_1^2.    (17)

Proof: Let us consider m and p_0 fixed. We will prove the statement by showing first that the expression (17) is an upper bound on the optimal value, and then that W^* achieves that bound.

To bound the cost we use the fact that p_1 - p_2 adds up to zero, and therefore, if A is a matrix with all columns equal to each other, then A (p_1 - p_2) = 0. Formally, we obtain:

    \| W (p_1 - p_2) \|_{[p_0]} = \| (W - p_0 \mathbf{1}^T)(p_1 - p_2) \|_{[p_0]}
                                = \left\| \sum_{i=1}^n (W_i - p_0)(p_{1,i} - p_{2,i}) \right\|_{[p_0]}
                                \le \sum_{i=1}^n \| W_i - p_0 \|_{[p_0]} \, | p_{1,i} - p_{2,i} |
                                \le \sqrt{2C} \, \| p_1 - p_2 \|_1,

which is equivalent to

    \frac{1}{8} \| W (p_1 - p_2) \|_{[p_0]}^2 \le \frac{1}{4} C \| p_1 - p_2 \|_1^2.

To prove that W^* achieves this bound, let us start by defining the quantity

    \alpha = \sum_{i : p_{1,i} \ge p_{2,i}} (p_{1,i} - p_{2,i})

and noticing that, since p_1 - p_2 adds up to zero, we also have:

    \alpha = \sum_{i : p_{1,i} < p_{2,i}} (p_{2,i} - p_{1,i}),    2\alpha = \| p_1 - p_2 \|_1.

Now, with some algebra, we get:

    \frac{1}{8} \| W^* (p_1 - p_2) \|_{[p_0]}^2
      = \frac{1}{8} \left\| \sum_{i : p_{1,i} \ge p_{2,i}} w_A (p_{1,i} - p_{2,i}) + \sum_{i : p_{1,i} < p_{2,i}} w_B (p_{1,i} - p_{2,i}) \right\|_{[p_0]}^2
      = \frac{1}{8} \| \alpha w_A - \alpha w_B \|_{[p_0]}^2
      = \frac{1}{8} \| 2 \alpha (w_A - p_0) \|_{[p_0]}^2
      = \alpha^2 C
      = \frac{1}{4} C \| p_1 - p_2 \|_1^2.

Remarkably, the optimal value we obtained with m and p_0 fixed turns out to be completely independent of m and p_0; therefore problem (15) is solved by any triple (W^*, m, p_0) where m and p_0 can be chosen arbitrarily, provided that the definition of W^* in (16) yields a well-defined stochastic matrix (namely, p_0 must be chosen far enough from the borders of the simplex that w_A and w_B fall inside the simplex). A graphical depiction of w_A and w_B, used to construct the optimal channel, is reported in Figure 1; we point out that, since C is considered small, it is always possible to determine such a pair of vectors inside the simplex.

Fig. 1. Position of w_A and w_B in the simplex with respect to p_0.

The result just proven shows that for small capacity the behavior of the Chernoff bound is linear in C and proportional to the squared L1 distance between the two hypotheses. In the next section we present some observations, based on simulations alone, regarding the behavior for larger C.
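[Editorial note: a small end-to-end check of Proposition 2, my construction; the uniform p_0 and the first-coordinate direction are arbitrary choices permitted by the proposition, and the helpers are the ones defined earlier.]

```python
def optimal_noisy_channel(p1, p2, C, m=2):
    """Small-capacity optimum (16): inputs with p1_i >= p2_i map to w_A,
    the others to w_B = 2*p0 - w_A, with ||w_A - p0||^2_{[p0]} = 2C."""
    p0 = np.full(m, 1.0 / m)                 # arbitrary interior point
    v = np.zeros(m)
    v[0], v[1] = 1.0, -1.0                   # arbitrary zero-sum direction (m >= 2)
    v *= np.sqrt(2.0 * C / np.sum(v ** 2 / p0))   # enforce distance sqrt(2C)
    wA, wB = p0 + v, p0 - v
    cols = [wA if p1[i] >= p2[i] else wB for i in range(len(p1))]
    return np.stack(cols, axis=1)            # m x n stochastic matrix

p1 = np.array([0.5, 0.3, 0.2])
p2 = np.array([0.2, 0.3, 0.5])
C = 1e-3
W = optimal_noisy_channel(p1, p2, C)
print(capacity(W))                            # ~ C (up to o(C))
print(error_exponent(W, p1, p2))              # ~ (1/4) C ||p1 - p2||_1^2
print(0.25 * C * np.sum(np.abs(p1 - p2)) ** 2)
```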

6. LARGE CAPACITY BEHAVIOR

As the capacity increases, problem (15) no longer approximates the original optimization problem (9). In the general case, finding an analytical solution to (9) is unrealistic, but we can still make some remarks. In this section we point out some of these interesting features and present a numerical result.

For each m, the solution of (9) as a function of C is monotone increasing, and the optimal channel always has capacity exactly C. This is true because the cost function can be shown to be convex and W belongs to a convex set, by virtue of the convexity of I(X; Y) with respect to the channel.

For each m, the performance does not improve for C >= log m, because the maximum capacity achievable with an m-dimensional output alphabet is always at most log m. Moreover, if m = n, then for C >= log n we obtain exactly the Chernoff information, since among the feasible channels there is the identity channel I, which allows the samples to be measured directly from the source.

Finally, the curves obtained for m > n appear to be identical to the one obtained for m = n. Interestingly, for some choices of the hypotheses p_1 and p_2, the Chernoff information is reached (with m = n) before the limit C = log n.

In Figure 2 we show the solution of problem (9) with m kept as a parameter. Only two different values of m have been taken into account, but it is still possible to observe some of the behaviors just pointed out.

Fig. 2. Solution of problem (9) with n = 3 for a fixed pair of hypotheses p_1 and p_2: error exponent versus channel capacity (in nats, from 0 to log 3), for m = 2 and m = 3, together with the small-capacity result and the Chernoff information C(p_1, p_2).

7. CONCLUSIONS AND FUTURE WORK

In this paper we considered a modified version of the binary hypothesis testing problem in which the samples are measured through a channel. We looked for the best possible channel among those with a limited capacity, and we showed that, if the channel has a small capacity, this optimization problem can be approximated by a quadratic one. The optimal solution of the approximating problem achieves an error exponent given by (1/4) C \| p_1 - p_2 \|_1^2, where C is the capacity of the channel and p_1 and p_2 are the two hypotheses. In the small-C regime we were also able to provide an explicit formula for the optimal channel. It is not yet formally proved, although clearly supported by simulations, that the optimal solution of the approximating problem converges to the solution of the original problem as C tends to 0. We are currently working on some generalizations to the m-ary case as well as to some non-i.i.d. models such as hidden Markov models.

8. ACKNOWLEDGMENT

The authors wish to thank Mesrob I. Ohannessian for his suggestions and help in proving Proposition 1.

REFERENCES

[1] T. M. Cover, J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 1991.
[2] S. Borade, L. Zheng. Euclidean Information Theory. 2008 IEEE International Zurich Seminar on Communications, pages 14-17, 2008.
[3] R. Gallager. Information Theory and Reliable Communication. Wiley, 1968.
[4] F. Caliskan, C. Hajiyev. Sensor fault detection in flight control systems based on the Kalman filter innovation sequence. Proceedings of the Institution of Mechanical Engineers, Part I: Journal of Systems and Control Engineering, volume 213, issue 3, pages 143-148, 1999.
[5] A. Bhattacharyya. On a measure of divergence between two statistical populations defined by their probability distributions. Bulletin of the Calcutta Mathematical Society, volume 35, pages 99-109, 1943.

Appendix A. PROOF OF PROPOSITION 1

We have to prove that:

    \lim_{p \to q} \frac{C(p, q)}{D(p \| q)} = \frac{1}{4}.

First of all, we point out that if some of the components of q are equal to zero, then D(p \| q) is not defined unless the same components of p are zero as well, and the limit p -> q is taken over the subspace in which p_i = 0 whenever q_i = 0. For this reason, without loss of generality, we can restrict our analysis to the case q_i > 0.

Since the definition of the Chernoff information in (3) is not in closed form, in this section we provide an explicit expression approximating it when p and q are close enough. In order to deal easily with equation (3), let us introduce some notational conventions:

    D_q(\lambda) = D(p_\lambda \| q),    D_p(\lambda) = D(p_\lambda \| p),
    \hat{D}_q(\lambda) = \frac{1}{2} \| p_\lambda - q \|_{[q]}^2,    \hat{D}_p(\lambda) = \frac{1}{2} \| p_\lambda - p \|_{[p]}^2.

In [1] it is shown that the function D_p(\lambda) is monotone decreasing in \lambda \in [0, 1] while D_q(\lambda) is increasing in the same interval; moreover, there exists a unique \lambda^* \in [0, 1] such that D_p(\lambda^*) = D_q(\lambda^*). It is also easy to show that \hat{D}_p(\lambda) is monotone decreasing and \hat{D}_q(\lambda) is monotone increasing in [0, 1], and that the unique value of \lambda satisfying the equation \hat{D}_p(\lambda) = \hat{D}_q(\lambda) is \lambda = 1/2. In fact, if we denote by \phi = \sum_{i=1}^n \sqrt{p_i q_i} the Bhattacharyya coefficient [5], we have:

    \hat{D}_q(1/2) = \frac{1}{2} \sum_{i=1}^n \frac{1}{q_i} \left( \frac{\sqrt{p_i q_i}}{\phi} - q_i \right)^2
                   = \frac{1}{2} \sum_{i=1}^n \left( \frac{p_i}{\phi^2} - \frac{2 \sqrt{p_i q_i}}{\phi} + q_i \right)
                   = \frac{1}{2} \left( \frac{1}{\phi^2} - 1 \right),    (A.1)

which, by the same argument, can be shown to be equal to \hat{D}_p(1/2).

We now show that the expression just found in (A.1) can be regarded as an approximation of the Chernoff information whose distance from the latter is an infinitesimal of higher order with respect to \| p - q \|_{[q]}^2. Let us start by examining the difference D_q - \hat{D}_q which, using the uniform bound \| p_\lambda - q \|_{[q]} \le \| p - q \|_{[q]} for all \lambda and by virtue of equation (12), turns out to be small as p -> q:

    D_q(\lambda) - \hat{D}_q(\lambda) = \delta_q(\lambda) = o( \| p_\lambda - q \|_{[q]}^2 ) = o( \| p - q \|_{[q]}^2 )  for all \lambda.    (A.2)

Using the same argument and the result in (12), we can derive a similar result for the difference D_p - \hat{D}_p:

    D_p(\lambda) - \hat{D}_p(\lambda) = \delta_p(\lambda) = o( \| p_\lambda - p \|_{[p]}^2 ) = o( \| q - p \|_{[p]}^2 ) = o( \| p - q \|_{[q]}^2 )  for all \lambda.    (A.3)

If we now introduce the two functions

    f(\lambda) = D_p(\lambda) - D_q(\lambda),    \hat{f}(\lambda) = \hat{D}_p(\lambda) - \hat{D}_q(\lambda),

keeping in mind that \hat{f}(1/2) = 0, we can obtain the following bound on |f(1/2)|:

    |f(1/2)| = | f(1/2) - \hat{f}(1/2) |
             = | D_p(1/2) - \hat{D}_p(1/2) + \hat{D}_q(1/2) - D_q(1/2) |
             \le | \delta_p(1/2) | + | \delta_q(1/2) |.    (A.4)

Using the fact that | dD_q / d\lambda | \le | df / d\lambda | together with the results in (A.2), (A.3) and (A.4), we can now show that the distance between \hat{D}_q(1/2) and the Chernoff information C(p, q) = D_q(\lambda^*) is small as p -> q:

    | \hat{D}_q(1/2) - D_q(\lambda^*) | \le | D_q(1/2) - D_q(\lambda^*) | + | \delta_q(1/2) |
                                        \le | f(1/2) - f(\lambda^*) | + | \delta_q(1/2) |
                                        = | f(1/2) | + | \delta_q(1/2) |
                                        \le | \delta_p(1/2) | + 2 | \delta_q(1/2) |
                                        = o( \| p - q \|_{[q]}^2 ),    (A.5)

where we used f(\lambda^*) = 0. The result found in (A.5) allows us to write the Chernoff information in an explicit form suitable for our purposes; more precisely:

    C(p, q) = \frac{1}{2} \left( \frac{1}{\phi^2} - 1 \right) + o( \| p - q \|_{[q]}^2 ) = \hat{C}(p, q) + o( \| p - q \|_{[q]}^2 ).    (A.6)

In order to compute the limit of C(p, q) / D(p \| q) on the n-dimensional simplex, let us first reduce the dimension to an (n-1)-dimensional space where we get rid of the constraint \sum_i p_i = 1. In this lower-dimensional space the approximate expression for the Kullback distance becomes:

    \| p - q \|_{[q]}^2 = \sum_{i=1}^{n-1} \frac{(p_i - q_i)^2}{q_i} + \frac{1}{q_n} \left( \sum_{i=1}^{n-1} (p_i - q_i) \right)^2
                        = (\tilde{p} - \tilde{q})^T \left( [\tilde{q}]^{-1} + \frac{1}{q_n} \mathbf{1} \mathbf{1}^T \right) (\tilde{p} - \tilde{q})
                        = (\tilde{p} - \tilde{q})^T M_q (\tilde{p} - \tilde{q}),    (A.7)

where \tilde{p} and \tilde{q} are the (n-1)-dimensional vectors formed by the first n-1 elements of p and q, and \mathbf{1} is the (n-1)-dimensional vector of all 1s. The approximate expression for the Chernoff information becomes:

    \hat{C}(p, q) = \frac{1}{2} \left( \frac{1}{\phi^2} - 1 \right),  with
    \phi = \sum_{i=1}^{n-1} \sqrt{p_i q_i} + \sqrt{ (1 - \mathbf{1}^T \tilde{p})(1 - \mathbf{1}^T \tilde{q}) },  so that  \hat{C}(p, q) = F(\tilde{p}, \tilde{q}).    (A.8)

Before considering the limit, let us compute a Taylor expansion of the function F around \tilde{q}. After some straightforward computations we obtain:

    F |_{\tilde{p} = \tilde{q}} = 0,
    \frac{\partial F}{\partial p_i} \Big|_{\tilde{p} = \tilde{q}} = 0,  i = 1, ..., n-1,
    \frac{\partial^2 F}{\partial p_i^2} \Big|_{\tilde{p} = \tilde{q}} = \frac{1}{4} \left( \frac{1}{q_i} + \frac{1}{q_n} \right),  i = 1, ..., n-1,
    \frac{\partial^2 F}{\partial p_i \partial p_j} \Big|_{\tilde{p} = \tilde{q}} = \frac{1}{4 q_n},  i \ne j;

therefore we have:

    \hat{C}(p, q) = \frac{1}{2!} (\tilde{p} - \tilde{q})^T \, \frac{1}{4} \left( [\tilde{q}]^{-1} + \frac{1}{q_n} \mathbf{1} \mathbf{1}^T \right) (\tilde{p} - \tilde{q}) + o( \| \tilde{p} - \tilde{q} \|^2 )
                  = \frac{1}{8} (\tilde{p} - \tilde{q})^T M_q (\tilde{p} - \tilde{q}) + o( \| \tilde{p} - \tilde{q} \|^2 ).

Collecting the results obtained so far, the limit we want to compute is now straightforward:

    \lim_{p \to q} \frac{C(p, q)}{D(p \| q)}
      = \lim_{p \to q} \frac{ \hat{C}(p, q) }{ \frac{1}{2} \| p - q \|_{[q]}^2 }
      = \lim_{\tilde{p} \to \tilde{q}} \frac{ \frac{1}{8} (\tilde{p} - \tilde{q})^T M_q (\tilde{p} - \tilde{q}) + o( \| \tilde{p} - \tilde{q} \|^2 ) }{ \frac{1}{2} (\tilde{p} - \tilde{q})^T M_q (\tilde{p} - \tilde{q}) }
      = \frac{1}{4}.
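[Editorial note: the closed form (A.1)/(A.6) is also easy to sanity-check numerically with the earlier helpers, my addition.]

```python
# Check of (A.1)/(A.6): (1/2)(1/phi^2 - 1), with phi the Bhattacharyya
# coefficient, approximates C(p,q) for nearby distributions.
p = np.array([0.4, 0.35, 0.25])
q = p + 1e-2 * np.array([1.0, -0.5, -0.5])
phi = np.sum(np.sqrt(p * q))
print(0.5 * (1.0 / phi ** 2 - 1.0), chernoff_information(p, q))
```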