A GENERAL CLASS OF LOWER BOUNDS ON THE PROBABILITY OF ERROR IN MULTIPLE HYPOTHESIS TESTING

Tirza Routtenberg and Joseph Tabrikian
Department of Electrical and Computer Engineering
Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
Email: {tirzar,joseph}@ee.bgu.ac.il

ABSTRACT

In this paper, a new class of lower bounds on the probability of error for M-ary hypothesis tests is proposed. Computation of the minimum probability of error, which is attained by the maximum a-posteriori probability (MAP) criterion, is usually not tractable. The new class is derived using Hölder's inequality. The bounds in this class are continuous and differentiable functions of the conditional error probability, and they provide good prediction of the minimum probability of error in multiple hypothesis testing. It is shown that for the binary hypothesis testing problem this bound asymptotically coincides with the optimum probability of error provided by the MAP criterion. The bound is compared with other existing lower bounds in several typical detection and classification problems in terms of tightness and computational complexity.

Index Terms— MAP, detection, lower bounds, hypothesis testing, probability of error

1. INTRODUCTION

Lower bounds on the probability of error are of great importance in system design and performance analysis in many applications, such as signal detection, communications, and classification. It is well known that the minimum probability of error is attained by the maximum a-posteriori probability (MAP) criterion; however, its probability of error is often difficult to calculate and usually not tractable. In such cases, lower bounds on the probability of error are useful for performance analysis, feasibility studies, and system design. These bounds can also be useful for the derivation of analytical expressions for the Ziv-Zakai family of bounds for parameter estimation [1]. One of the difficulties in computation of the Ziv-Zakai bounds is that they involve an expression for the minimum probability of error of a binary hypothesis problem. Analytic expressions for lower bounds on the probability of error may therefore simplify the calculation of the bound.

Several lower bounds on the probability of error have been presented in the literature. The bounds can be divided into binary hypothesis bounds [2, 3] and general bounds for multiple hypotheses [4, 5, 6]. The lower bounds presented in [4] and [7] are based on the Fano [8] and Shannon inequalities, respectively. The relations between entropy and error probability have been used to derive the bounds in [5, 9]. Lower bounds on the Bayes risk which utilize distance measures between statistical distributions [2, 3, 6] can also be used as lower bounds on the probability of error. The lower bounds in [2, 3] are based on the Bhattacharyya distance and have closed-form expressions for many commonly used distributions, but their tightness is unsatisfactory in most cases. Devijver [6] introduced another bound in terms of the Bayesian distance. This bound is tighter than the Bhattacharyya bound and is appropriate also for multiple hypothesis testing.

Practical and useful lower bounds on the probability of error are expected to be computationally simple, tight, and appropriate for general multi-hypothesis problems. In this paper, a new class of lower bounds with the aforementioned desired properties is derived using Hölder's inequality.
The bounds in this class are simpler to compute than the optimum probability of error provided by the MAP criterion, and they provide good prediction of the minimum probability of error in multiple hypothesis testing. The tightest lower bound within this class is derived. It is shown that some existing lower bounds [5] can be derived from this family. In addition, for the binary hypothesis testing problem this bound asymptotically coincides with the optimum probability of error provided by the MAP criterion. The bound is compared with other existing lower bounds.

The paper is organized as follows. The basic idea of the bounding problem is presented in Section 2. A brief review of existing lower bounds on the probability of error is presented in Section 3. The new class of bounds is derived in Section 4. The performance of the proposed bounds is evaluated for various examples in Section 5. Finally, our conclusions appear in Section 6.

2. PROBLEM STATEMENT

Consider an M-ary hypothesis testing problem, in which the hypotheses are $\theta_i$, $i = 1,\dots,M$, with corresponding a-priori probabilities $P(\theta_i)$, $i = 1,\dots,M$, and the random observation vector is x. Let $P(\theta_i|x)$, $i = 1,\dots,M$ denote the conditional probability of $\theta_i$ given x, and let $f(x|\theta_i)$ and $f(x,\theta_i)$ denote the conditional and joint probability density functions (pdf) of x and $\theta_i$, $i = 1,\dots,M$, respectively. The probability of error of the decision problem is denoted by $P_e$. It is well known that the minimum average probability of error, obtained by the MAP criterion, is given by [9]

$$P_e^{\min} = 1 - E\left[\max_{i=1,\dots,M} P(\theta_i|x)\right]. \qquad (1)$$

However, the minimum probability of error in (1) is often difficult to calculate and usually not tractable. Therefore, computable and tight lower bounds on the probability of error are useful for performance analysis and system design.

3. REVIEW OF EXISTING LOWER BOUNDS

In this section, some existing lower bounds on the minimum probability of error are presented. The following bounds have been derived especially for binary hypothesis testing. Most of the binary hypothesis testing bounds are based on divergence measures of the difference between two probability distributions, known as f-divergences or Ali-Silvey distances [10].

In [2], the divergence and two Bhattacharyya-based lower bounds were proposed. The divergence lower bound is

$$B^{(\mathrm{div})} = \frac{1}{8}\, e^{-J/2}, \qquad (2)$$

where $J = E_1[\log L(x)] - E_2[\log L(x)]$, $L(x) = \frac{f_{x|\theta}(x|\theta_1)}{f_{x|\theta}(x|\theta_2)}$ is the likelihood ratio function, and $E_i[\log L(x)] = \int_x \log L(x)\, f_{x|\theta_i}(x|\theta_i)\, dx$, $i = 1, 2$. A simple Bhattacharyya-based lower bound is

$$B_1^{(\mathrm{BLB})} = \frac{E^2\left[\sqrt{P(\theta_1|x)P(\theta_2|x)}\right]}{8\,P(\theta_1)P(\theta_2)}. \qquad (3)$$

This bound is always tighter than the divergence lower bound. The second Bhattacharyya-based bound on $P_e$ is

$$B_2^{(\mathrm{BLB})} = 0.5 - 0.5\sqrt{1 - 4E^2\left[\sqrt{P(\theta_1|x)P(\theta_2|x)}\right]}. \qquad (4)$$

Another f-divergence bound is proposed in [3]:

$$B^{(f)} = \frac{1}{2}\left[1 - \left(1 - 4E\left[P(\theta_1|x)P(\theta_2|x)\right]\right)^{L}\right], \qquad (5)$$

where $L \le 1/2$. For $L = 1/2$, this bound can also be obtained by applying Jensen's inequality to the MAP probability of error. The harmonic lower bound was proposed in [5]:

$$B^{(\mathrm{HLB})} = E\left[P(\theta_1|x)\,P(\theta_2|x)\right]. \qquad (6)$$

The Gaussian-sinusoidal lower bound [11] is given by

$$B^{(\mathrm{Gauss\text{-}sin})} = 0.395\, E\left[\sin\!\left(\pi P(\theta_1|x)\right)\exp\left\{-\alpha\left(P(\theta_1|x)-0.5\right)^2\right\}\right], \qquad (7)$$

where $\alpha = 1.8063$. Although this bound is tight, it is usually not tractable. An arbitrarily tight lower bound [12] is given by

$$B^{(\mathrm{ATLB})} = \frac{1}{\alpha}\, E\left[\log\frac{1+e^{-\alpha}}{e^{-\alpha P(\theta_1|x)}+e^{-\alpha P(\theta_2|x)}}\right] \qquad (8)$$

for any $\alpha > 0$. By selecting high enough values of $\alpha$, this lower bound can be made arbitrarily close to $P_e^{\min}$. However, this bound is, in general, difficult to evaluate.

For multiple hypothesis testing problems, the following lower bounds have been proposed. In [6], Devijver derived the following bounds using the conditional Bayesian distance:

$$B_1^{(\mathrm{Bayes})} = \frac{M-1}{M}\, E\left[1 - \sqrt{\frac{M B_{\theta|x}-1}{M-1}}\right] \qquad (9)$$

and

$$B_2^{(\mathrm{Bayes})} = 1 - E\left[\sqrt{B_{\theta|x}}\right], \qquad (10)$$

where $B_{\theta|x} = \sum_{i=1}^{M} P^2(\theta_i|x)$ stands for the conditional Bayesian distance. In [6], it is analytically shown that for the binary case the Bayesian distance lower bound in (9) is always tighter than the Bhattacharyya bound in (4). The bound in (10) is tighter than the bound [5, 6]

$$B^{(\mathrm{Bayes3})} = 1 - \sqrt{E\left[\sum_{i=1}^{M} P^2(\theta_i|x)\right]}. \qquad (11)$$

The bound

$$B^{(\mathrm{quad})} = \frac{1}{2}\left(1 - E\left[B_{\theta|x}\right]\right) \qquad (12)$$

was proposed in [13] and [14] in the context of Vajda's quadratic entropy and the quadratic mutual information, respectively. In [6], it is claimed that $B^{(\mathrm{quad})} \le B_2^{(\mathrm{Bayes})} \le B_1^{(\mathrm{Bayes})}$. The bound $B^{(\mathrm{quad})}$ can be interpreted as an M-ary extension of the harmonic mean bound presented in (6).
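Most of the binary bounds reviewed above are simple functions of the conditional probability $P(\theta_1|x)$, so they can be compared pointwise. The following Python sketch (not part of the paper) evaluates the harmonic, Gaussian-sinusoidal, and arbitrarily tight bounds of (6)-(8) as reconstructed above, and checks that each stays below the conditional MAP error $\min\{p, 1-p\}$; the constants 0.395 and 1.8063 follow (7).

```python
import numpy as np

def conditional_error(p):
    """Conditional MAP probability of error: min(p, 1 - p)."""
    return np.minimum(p, 1.0 - p)

def harmonic_bound(p):
    """Harmonic lower bound (6): P(theta_1|x) * P(theta_2|x)."""
    return p * (1.0 - p)

def gauss_sin_bound(p, c=0.395, alpha=1.8063):
    """Gaussian-sinusoidal lower bound (7)."""
    return c * np.sin(np.pi * p) * np.exp(-alpha * (p - 0.5) ** 2)

def atlb_bound(p, alpha=5.0):
    """Arbitrarily tight lower bound (8); tightens as alpha grows."""
    return (1.0 / alpha) * np.log((1.0 + np.exp(-alpha)) /
                                  (np.exp(-alpha * p) + np.exp(-alpha * (1.0 - p))))

p = np.linspace(0.01, 0.99, 99)
for bound in (harmonic_bound, gauss_sin_bound, atlb_bound):
    # every reviewed bound must lie below the conditional MAP error
    assert np.all(bound(p) <= conditional_error(p) + 1e-12), bound.__name__
```

Plotting these curves against $\min\{p, 1-p\}$ reproduces the qualitative picture of Fig. 1 below.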

4. A NEW CLASS OF BOUNDS ON THE PROBABILITY OF ERROR

4.1. Derivation of the new class of bounds

Consider an M-ary hypothesis testing problem with detector $\hat\theta = \hat\theta(x)$. Let

$$u(x,\theta) = 1_{\{\hat\theta \ne \theta\}} = \begin{cases} 1, & \hat\theta \ne \theta \\ 0, & \hat\theta = \theta \end{cases}, \qquad (13)$$

where $\theta$ is the true hypothesis and $1_A$ is the indicator function of the event A. Then, according to Hölder's inequality [15], for $p, q \ge 1$ with $\frac{1}{p} + \frac{1}{q} = 1$,

$$E^{\frac{1}{p}}\left[|u(x,\theta)|^p\right]\, E^{\frac{1}{q}}\left[|v(x,\theta)|^q\right] \ge E\left[u(x,\theta)\,v(x,\theta)\right] \qquad (14)$$

for an arbitrary scalar function $v(x,\theta)$. It can be seen that

$$E\left[|u(x,\theta)|^p\right] = E\left[u(x,\theta)\right] = P_e, \qquad (15)$$

where $p \ge 1$. By substituting (15) into (14), one obtains the following lower bound on the probability of error:

$$P_e \ge \frac{E^p\left[u(x,\theta)\,v(x,\theta)\right]}{E^{\frac{p}{q}}\left[|v(x,\theta)|^q\right]}. \qquad (16)$$

Using (13), the expectation term in the numerator of (16) can be rewritten as

$$E\left[u(x,\theta)\,v(x,\theta)\right] = E\big[E\left[u(x,\theta)\,v(x,\theta)\,|\,x\right]\big] = E\left[v(x,\theta)\right] - E\big[P(\hat\theta|x)\,v(x,\hat\theta)\big]. \qquad (17)$$

It can be shown that in order to obtain a valid bound which is independent of the detector $\hat\theta$, $v(x,\theta)$ should be structured as follows:

$$v(x,\theta) = \frac{\zeta(x)}{P(\theta|x)}, \qquad (18)$$

where $\zeta(\cdot)$ is an arbitrary function. With no loss of generality, $\zeta(\cdot)$ can be chosen to be a nonnegative function. By substituting (18) in (17), one obtains

$$E\left[u(x,\theta)\,v(x,\theta)\right] = E\Bigg[\sum_{\substack{i=1 \\ \theta_i \ne \hat\theta}}^{M} \zeta(x)\,\frac{P(\theta_i|x)}{P(\theta_i|x)}\Bigg] = (M-1)\,E\left[\zeta(x)\right]. \qquad (19)$$

Using (18), it can be seen that

$$E\left[|v(x,\theta)|^q\right] = E\left[\zeta^q(x)\,g(x)\right], \qquad (20)$$

where $g(x) = \sum_{i=1}^{M}\left(P(\theta_i|x)\right)^{1-q}$. By substitution of (19) and (20) into (16), the bound can be rewritten as

$$P_e \ge \frac{(M-1)^p\, E^p\left[\zeta(x)\right]}{E^{\frac{p}{q}}\left[\zeta^q(x)\,g(x)\right]}. \qquad (21)$$

Maximization of (21) with respect to (w.r.t.) $\zeta(\cdot)$ results in

$$\zeta(x) = c\, g^{\frac{1}{1-q}}(x), \qquad (22)$$

and by substituting (22) in (21), the attained lower bound is

$$P_e \ge (M-1)^{\frac{q}{q-1}}\, E\left[\left(\sum_{i=1}^{M}\left(P(\theta_i|x)\right)^{1-q}\right)^{\frac{1}{1-q}}\right] \qquad (23)$$

for all $q > 1$.

4.2. Binary hypothesis testing

In the binary hypothesis problem with the hypotheses $\theta_1$ and $\theta_2$, the lower bound in (23) is

$$P_e \ge E\left[\frac{P(\theta_1|x)\,P(\theta_2|x)}{\left(P^{q-1}(\theta_1|x) + P^{q-1}(\theta_2|x)\right)^{\frac{1}{q-1}}}\right]. \qquad (24)$$

4.2.1. Asymptotic properties

It can be seen that the bound in (24) becomes tighter by increasing q, and for $q \to \infty$ the bound is

$$E\left[\min\left\{P(\theta_1|x),\, P(\theta_2|x)\right\}\right] = 1 - E\left[\max_{i=1,2}\left\{P(\theta_i|x)\right\}\right], \qquad (25)$$

which is the optimal bound for the binary hypothesis test, presented in (1).

4.2.2. The harmonic lower bound

For q = 2, the bound can be written in the following simple form:

$$E_x\left[P(\theta_1|x)\,P(\theta_2|x)\right], \qquad (26)$$

which is identical to the harmonic lower bound in (6) and to $B^{(\mathrm{quad})}$ for the binary case in (12). Thus, the binary lower bound in [5] can be interpreted as a special case of our general M-hypothesis bound, presented in (23).

4.2.3. Relation to upper bounds on the minimum probability of error

In [5], an upper bound on the probability of error of the MAP estimator for binary hypothesis testing is derived using the negative power mean inequalities. According to this paper,

$$P_e \le 2^{\frac{1}{q-1}}\, E_x\left[\left(\sum_{i=1}^{2} P^{1-q}(\theta_i|x)\right)^{\frac{1}{1-q}}\right] \qquad (27)$$

for any $q > 1$. It can be seen that this upper bound can be obtained by multiplying the proposed lower bound in (23) by $2^{\frac{1}{q-1}}$. The factor $2^{\frac{1}{q-1}}$ controls the tightness between the upper and lower bounds on the probability of error for binary hypothesis testing.
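The general bound (23) is straightforward to evaluate once the posterior probabilities are available. As a rough numerical illustration (not taken from the paper), the following Python sketch draws synthetic posterior vectors from a Dirichlet distribution — an arbitrary modeling assumption used only to generate test data — and checks that the bound of (23) stays below the exact minimum probability of error (1) and tightens as q grows.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4
# Assumed synthetic model: random posterior vectors P(theta_1|x),...,P(theta_M|x)
# drawn from a flat Dirichlet distribution over the probability simplex.
post = rng.dirichlet(np.ones(M), size=100_000)

def min_prob_error(post):
    """Exact minimum probability of error (1): 1 - E[max_i P(theta_i|x)]."""
    return 1.0 - post.max(axis=1).mean()

def proposed_bound(post, q):
    """Proposed lower bound (23): (M-1)^(q/(q-1)) * E[(sum_i P_i^(1-q))^(1/(1-q))]."""
    M = post.shape[1]
    inner = (post ** (1.0 - q)).sum(axis=1) ** (1.0 / (1.0 - q))
    return (M - 1) ** (q / (q - 1.0)) * inner.mean()

pe_min = min_prob_error(post)
for q in (2.0, 5.0, 20.0):
    b = proposed_bound(post, q)
    print(f"q = {q:5.1f}: bound = {b:.4f}  (minimum P_e = {pe_min:.4f})")
    assert b <= pe_min + 1e-9   # the bound never exceeds the exact minimum error
```

In this experiment the printed bound increases toward the exact minimum error as q grows, in line with the asymptotic property of Section 4.2.1.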

4.2.4. Bounds comparison

Figure 1 depicts the new lower bound for the binary hypothesis problem against the conditional probability $P(\theta_1|x)$, for different values of the parameter q. The new bounds are compared to the bounds $B^{(\mathrm{Gauss\text{-}sin})}$, $B^{(\mathrm{ATLB})}$ with $\alpha = 5$, and $B^{(\mathrm{Bayes3})}$, presented in (7), (8), and (11), respectively. It can be seen that the bound in (24) becomes tighter as q grows, and that for q = 10 the new bound is tighter than the other lower bounds almost everywhere.

Fig. 1. The proposed lower bounds for q = 2, 5, 10 and other existing bounds as a function of the conditional probability $P(\theta_1|x)$ for binary hypothesis testing.

5. EXAMPLES

In this section, two examples are presented to evaluate the performance of the new lower bounds on the minimum probability of error derived in this paper.

5.1. Binary hypothesis problem

Consider the following binary hypothesis testing problem:

$$\theta_1:\; f(x|\theta_1) = \lambda_1 e^{-\lambda_1 x}\, u(x), \qquad \theta_2:\; f(x|\theta_2) = \lambda_2 e^{-\lambda_2 x}\, u(x), \qquad (28)$$

where $u(\cdot)$ denotes the unit step function, with $P(\theta_1) = P(\theta_2) = 1/2$ and $\lambda_2 = 0.5$. For this problem, the proposed bounds with q = 2 and q = 3 can be expressed in closed form in terms of the Gauss hypergeometric function $_2F_1(\cdot,\cdot;\cdot;\cdot)$ [16]. Several bounds on the probability of error and the minimum probability of error obtained by the MAP detector are presented in Fig. 2 as a function of the distribution parameter $\lambda = \lambda_1$. The bounds in this figure are $B_1^{(\mathrm{BLB})}$, $B_2^{(\mathrm{BLB})}$, $B_1^{(\mathrm{Bayes})}$, $B^{(\mathrm{Bayes3})}$, and the new lower bounds with q = 2 and q = 3, presented in (3), (4), (9), (11), and (24), respectively. It can be seen that for $\lambda \ge 0.8$ the proposed bound with q = 3 is tighter than the Bhattacharyya lower bounds and is close to the minimum probability of error obtained by the MAP decision rule. The proposed bound with q = 2 is tighter than the $B_1^{(\mathrm{BLB})}$ lower bound everywhere and tighter than the other bounds in some specific regions. In addition, the upper and lower bounds for q = 2, 3, obtained by (27) and (24), respectively, are presented in Fig. 3 as a function of the distribution parameter $\lambda$.

Fig. 2. Comparison of the different bounds and the exact minimum probability of error as a function of $\lambda$ for two equally-likely exponentially distributed hypotheses.

5.2. Multiple hypothesis problem

Consider the following multiple hypothesis testing problem:

$$\theta_1:\; f(x|\theta_1) = \tfrac{4}{3}\cos^2(x/2)\, e^{-x} u(x), \qquad \theta_2:\; f(x|\theta_2) = 4\sin^2(x/2)\, e^{-x} u(x), \qquad \theta_3:\; f(x|\theta_3) = \tfrac{5}{2}\sin^2(x)\, e^{-x} u(x), \qquad (29)$$

with $P(\theta_1) = 15/28$, $P(\theta_2) = 5/28$, and $P(\theta_3) = 8/28$. In this problem, the exact probability of error of the MAP estimator is difficult to compute, and the bounds $B_1^{(\mathrm{Bayes})}$, $B_2^{(\mathrm{Bayes})}$, $B^{(\mathrm{Bayes3})}$, and $B^{(\mathrm{quad})}$ are not tractable. The proposed bound with q = 2 is computable and is equal to

$$B_{q=2} = \frac{20}{7}\int_0^{\infty} e^{-x}\left[\frac{1}{\cos^2(x/2)} + \frac{1}{\sin^2(x/2)} + \frac{1}{\sin^2(x)}\right]^{-1} dx = \frac{2}{35}\Big[e^{-x}\left(\cos(2x) - 2\sin(2x) - 5\right)\Big]_0^{\infty} = \frac{8}{35} \approx 0.2286.$$
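The q = 2 computation above can be verified numerically. The sketch below (an illustration, not from the paper) integrates the expression for $B_{q=2}$ on a finite grid, assuming the densities and priors exactly as stated in (29); the truncation point and grid size are arbitrary implementation choices.

```python
import numpy as np

# Numerical sanity check of the q = 2 bound for the three-hypothesis example,
# assuming the densities and priors exactly as stated in (29).
x = np.linspace(1e-6, 60.0, 600_001)   # integration grid; truncation at x = 60 is arbitrary
dx = x[1] - x[0]

priors = np.array([15/28, 5/28, 8/28])
f1 = (4/3) * np.cos(x/2)**2 * np.exp(-x)
f2 = 4.0   * np.sin(x/2)**2 * np.exp(-x)
f3 = (5/2) * np.sin(x)**2   * np.exp(-x)
joint = np.vstack([priors[0]*f1, priors[1]*f2, priors[2]*f3])   # P(theta_i) f(x|theta_i)

# B_{q=2} = (M-1)^2 * integral of [ sum_i 1/(P(theta_i) f(x|theta_i)) ]^{-1} dx
integrand = 1.0 / (1.0 / joint).sum(axis=0)
b_q2 = 4.0 * integrand.sum() * dx
print(b_q2, 8/35)   # both are approximately 0.2286
```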

Fig. 3. Comparison of the upper and lower bounds and the exact minimum probability of error as a function of $\lambda$ for two equally-likely exponential distribution hypotheses.

This example demonstrates the simplicity of the proposed bound with q = 2, while the other bounds are intractable.

6. CONCLUSION

In this paper, a new class of lower bounds on the probability of error in multiple hypothesis testing was presented. These new bounds maintain the desirable properties of continuity, differentiability, and symmetry. In the binary case, the proposed class depends on a parameter q, which in the limit of infinity provides the minimum attainable probability of error, provided by the MAP detector. It is shown that this class of bounds generalizes some existing bounds for binary hypothesis tests. It was shown via examples that the proposed bounds outperform other existing bounds in terms of tightness and simplicity of calculation.

7. REFERENCES

[1] D. Chazan, M. Zakai, and J. Ziv, "Improved lower bounds on signal parameter estimation," IEEE Trans. Inform. Theory, vol. IT-21, no. 1, pp. 90-93, 1975.

[2] T. Kailath, "The divergence and Bhattacharyya distance measures in signal selection," IEEE Trans. Commun. Technol., vol. COM-15, no. 1, pp. 52-60, 1967.

[3] J. van Tilburg and D. E. Boekee, "Divergence bounds on key equivocation and error probability in cryptanalysis," Advances in Cryptology - CRYPTO '85, vol. 218, pp. 489-513, 1986.

[4] T. S. Han and S. Verdú, "Generalizing the Fano inequality," IEEE Trans. Inform. Theory, vol. 40, no. 4, pp. 1247-1251, 1994.

[5] N. Santhi and A. Vardy, "On an improvement over Rényi's equivocation bound," Proc. 44th Allerton Conference on Communications, Control and Computing, pp. 1118-1124, 2006.

[6] P. A. Devijver, "On a new class of bounds on Bayes risk in multihypothesis pattern recognition," IEEE Trans. Comput., vol. C-23, pp. 70-80, 1974.

[7] C. E. Shannon, "Certain results in coding theory for noisy channels," Inform. Contr., vol. 1, pp. 6-25, 1957.

[8] R. M. Fano, Class notes for Transmission of Information, course 6.574, MIT, Cambridge, MA, 1952.

[9] M. Feder and N. Merhav, "Relations between entropy and error probability," IEEE Trans. Inform. Theory, vol. 40, no. 1, pp. 259-266, 1994.

[10] H. V. Poor and J. B. Thomas, "Applications of Ali-Silvey distance measures in the design of generalized quantizers for binary decision systems," IEEE Trans. Commun., vol. COM-25, no. 9, pp. 893-900, 1977.

[11] W. A. Hashlamoun, Applications of Distance Measures and Probability of Error Bounds to Distributed Detection Systems, Ph.D. thesis, Syracuse University, Syracuse, NY, 1991.

[12] H. Avi-Itzhak and T. Diep, "Arbitrarily tight upper and lower bounds on the Bayesian probability of error," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 1, pp. 89-91, 1996.

[13] I. Vajda, "Bounds of the minimal error probability on checking a finite or countable number of hypotheses," Inform. Transmis. Problems, vol. 4, pp. 9-19, 1968.

[14] G. T. Toussaint, Feature Evaluation Criteria and Contextual Decoding Algorithms in Statistical Pattern Recognition, Ph.D. thesis, University of British Columbia, Vancouver, Canada, 1972.

[15] G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, 2nd ed., Cambridge Univ. Press, Cambridge, U.K., 1988.

[16] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover, New York, 1964.