A GENERAL CLASS OF LOWER BOUNDS ON THE PROBABILITY OF ERROR IN MULTIPLE HYPOTHESIS TESTING. Tirza Routtenberg and Joseph Tabrikian
Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel

ABSTRACT

In this paper, a new class of lower bounds on the probability of error for M-ary hypothesis tests is proposed. Computation of the minimum probability of error, which is attained by the maximum a-posteriori probability (MAP) criterion, is usually not tractable. The new class is derived using Hölder's inequality. The bounds in this class are continuous and differentiable functions of the conditional probabilities, and they provide good prediction of the minimum probability of error in multiple hypothesis testing. It is shown that for the binary hypothesis testing problem this bound asymptotically coincides with the optimum probability of error provided by the MAP criterion. This bound is compared with other existing lower bounds in several typical detection and classification problems in terms of tightness and computational complexity.

Index Terms: MAP, detection, lower bounds, hypothesis testing, probability of error.

1. INTRODUCTION

Lower bounds on the probability of error are of great importance in system design and performance analysis in many applications, such as signal detection, communications, and classification. It is well known that the minimum probability of error is attained by the maximum a-posteriori probability (MAP) criterion; however, its probability of error is often difficult to calculate and usually not tractable. In such cases, lower bounds on the probability of error are useful for performance analysis, feasibility studies, and system design. These bounds can also be useful for the derivation of analytical expressions for the Ziv-Zakai family of bounds for parameter estimation [1]. One of the difficulties in the computation of the Ziv-Zakai bounds is that they involve an expression for the minimum probability of error of a binary hypothesis problem. Analytic expressions for lower bounds on the probability of error may therefore simplify the calculation of those bounds.

Several lower bounds on the probability of error have been presented in the literature. They can be divided into binary hypothesis bounds [2, 3] and general bounds for multiple-hypothesis problems [4, 5, 6]. The lower bounds presented in [4] and [7] are based on the Fano [8] and Shannon inequalities, respectively. The relations between entropy and error probability have been used to derive the bounds in [5, 9]. Lower bounds on the Bayes risk which utilize distance measures between statistical distributions [2, 3, 6] can also be used as lower bounds on the probability of error. The lower bounds in [2, 3] are based on the Bhattacharyya distance and have closed-form expressions for many commonly used distributions, but their tightness is unsatisfactory in most cases. Devijver [6] introduced another bound in terms of the Bayesian distance. This bound is tighter than the Bhattacharyya bound and is appropriate also for multiple hypothesis testing.

Practical and useful lower bounds on the probability of error are expected to be computationally simple, tight, and appropriate for general multi-hypothesis problems. In this paper, a new class of lower bounds with the aforementioned desired properties is derived using Hölder's inequality.
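As a quick illustration of the central tool named above, the following sketch (not part of the paper; all distributions and sample sizes are arbitrary choices) checks Hölder's inequality numerically on random samples.

```python
# Numerical sanity check of Holder's inequality, the tool used below to
# derive the new class of bounds:
#   E[|UV|] <= E[|U|^p]^(1/p) * E[|V|^q]^(1/q),  1/p + 1/q = 1, p, q > 1.
import numpy as np

rng = np.random.default_rng(0)
u = rng.exponential(size=100_000)     # arbitrary nonnegative random variable
v = rng.normal(size=100_000)          # arbitrary random variable

for q in (1.5, 2.0, 4.0):
    p = q / (q - 1.0)                 # Holder conjugate exponent of q
    lhs = np.mean(np.abs(u * v))
    rhs = np.mean(u ** p) ** (1 / p) * np.mean(np.abs(v) ** q) ** (1 / q)
    print(f"q = {q}: E|UV| = {lhs:.4f} <= {rhs:.4f}")
```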
The bounds in this class are simpler to compute than the optimum probability of error provided by the MAP criterion, and they provide good prediction of the minimum probability of error in multiple hypothesis testing. The tightest lower bound within this class is derived. It is shown that some existing lower bounds [5] can be derived from this family. In addition, for the binary hypothesis testing problem this bound asymptotically coincides with the optimum probability of error provided by the MAP criterion. This bound is compared with other existing lower bounds.

The paper is organized as follows. The basic idea of the bounding problem is presented in Section 2. A brief review of existing lower bounds on the probability of error is presented in Section 3. The new class of bounds is derived in Section 4. The performance of the proposed bounds in various examples is evaluated in Section 5. Finally, our conclusions appear in Section 6.

2. PROBLEM STATEMENT

Consider an M-ary hypothesis testing problem, in which the hypotheses are $\theta_i$, $i = 1, \ldots, M$, with the corresponding a-priori probabilities $P(\theta_i)$, $i = 1, \ldots, M$, and the random observation vector is $\mathbf{x}$. Let $P(\theta_i|\mathbf{x})$, $i = 1, \ldots, M$, denote the conditional probability of $\theta_i$ given $\mathbf{x}$, and let $f(\mathbf{x}|\theta_i)$ and $f(\mathbf{x}, \theta_i)$ denote the conditional and joint probability density functions (pdf) of $\mathbf{x}$ and $\theta_i$, $i = 1, \ldots, M$, respectively. The probability of error of the decision problem is denoted by $P_e$. It is well known that the minimum average probability of error, obtained by the MAP criterion, is given by [9]

$$P_e^{\min} = 1 - E\left[\max_{i=1,\ldots,M} P(\theta_i|\mathbf{x})\right]. \quad (1)$$

However, the minimum probability in (1) is often difficult to calculate and usually not tractable. Therefore, computable and tight lower bounds on the probability of error are useful for performance analysis and system design.
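When (1) has no closed form, it can still be estimated by Monte Carlo. The sketch below (an illustration under assumed priors and unit-variance Gaussian likelihoods, not a model from the paper) evaluates the quantity in (1) for a toy three-hypothesis problem.

```python
# Monte-Carlo evaluation of the minimum (MAP) probability of error in (1)
# for a toy three-hypothesis Gaussian problem. Priors and means are
# arbitrary illustration values.
import numpy as np

priors = np.array([0.5, 0.3, 0.2])
means = np.array([0.0, 1.0, 2.5])

rng = np.random.default_rng(1)
n = 200_000
labels = rng.choice(len(priors), size=n, p=priors)
x = rng.normal(loc=means[labels])                    # x ~ f(x | theta_i)

# Posteriors P(theta_i | x) via Bayes' rule (unit-variance Gaussians).
lik = np.exp(-0.5 * (x[:, None] - means[None, :]) ** 2) / np.sqrt(2 * np.pi)
post = priors * lik
post /= post.sum(axis=1, keepdims=True)

p_e_min = 1.0 - post.max(axis=1).mean()              # Eq. (1)
print(f"Estimated minimum probability of error: {p_e_min:.4f}")
```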
3. REVIEW OF EXISTING LOWER BOUNDS

In this section, some existing lower bounds on the minimum probability of error are presented. The following bounds have been derived especially for binary hypothesis testing. Most of the binary hypothesis testing bounds are based on divergence measures of the difference between two probability distributions, known as f-divergences or Ali-Silvey distances [10]. In [2], the divergence and two Bhattacharyya-based lower bounds were proposed. The divergence lower bound is

$$B^{(\mathrm{div})} = \frac{1}{8} e^{-J/2} \quad (2)$$

where $J = E_1[\log L(\mathbf{x})] - E_2[\log L(\mathbf{x})]$, $L(\mathbf{x}) = \frac{f(\mathbf{x}|\theta_1)}{f(\mathbf{x}|\theta_2)}$ is the likelihood ratio function, and $E_i[\log L(\mathbf{x})] = \int \log L(\mathbf{x})\, f(\mathbf{x}|\theta_i)\, d\mathbf{x}$, $i = 1, 2$. A simple Bhattacharyya-based lower bound is

$$B_1^{(\mathrm{BLB})} = \frac{E^2\left[\sqrt{P(\theta_1|\mathbf{x})\, P(\theta_2|\mathbf{x})}\right]}{8\, P(\theta_1)\, P(\theta_2)}. \quad (3)$$

This bound is always tighter than the divergence lower bound. The second Bhattacharyya-based bound on $P_e^{\min}$ is

$$B_2^{(\mathrm{BLB})} = \frac{1}{2}\left(1 - \sqrt{1 - 4\, E^2\left[\sqrt{P(\theta_1|\mathbf{x})\, P(\theta_2|\mathbf{x})}\right]}\right). \quad (4)$$

Another f-divergence bound is proposed in [3]:

$$B^{(f)} = \frac{1}{2}\left(1 - \sqrt{1 - E^{L}\left[\left(4\, P(\theta_1|\mathbf{x})\, P(\theta_2|\mathbf{x})\right)^{1/L}\right]}\right) \quad (5)$$

where $L \geq 1$. For $L = 1$ this bound can be obtained also by applying Jensen's inequality to the MAP probability of error. The harmonic lower bound was proposed in [5]:

$$B^{(\mathrm{HLB})} = E\left[P(\theta_1|\mathbf{x})\, P(\theta_2|\mathbf{x})\right]. \quad (6)$$

The Gaussian-sinusoidal lower bound is given by

$$B^{(\mathrm{Gauss-sin})} = 0.395\, E\left[\sin\left(\pi P(\theta_1|\mathbf{x})\right) \exp\left\{-\alpha \left(P(\theta_1|\mathbf{x}) - 0.5\right)^2\right\}\right] \quad (7)$$

for a fixed constant $\alpha$. Although this bound is tight, it is usually not tractable. An arbitrarily tight lower bound is given by

$$B^{(\mathrm{ATLB})} = \frac{1}{\alpha}\, E\left[\log \frac{1 + e^{-\alpha}}{e^{-\alpha P(\theta_1|\mathbf{x})} + e^{-\alpha P(\theta_2|\mathbf{x})}}\right] \quad (8)$$

for any $\alpha > 0$. By selecting high enough values of $\alpha$, this lower bound can be made arbitrarily close to $P_e^{\min}$. However, this bound is, in general, difficult to evaluate.

For multiple hypothesis testing problems, the following lower bounds have been proposed. In [6], Devijver derived the following bounds using the conditional Bayesian distance:

$$B_1^{(\mathrm{Bayes})} = \frac{M-1}{M}\, E\left[1 - \sqrt{\frac{M B_{\theta|\mathbf{x}} - 1}{M - 1}}\right] \quad (9)$$

and

$$B_2^{(\mathrm{Bayes})} = 1 - E\left[\sqrt{B_{\theta|\mathbf{x}}}\right] \quad (10)$$

where $B_{\theta|\mathbf{x}} = \sum_{i=1}^{M} P^2(\theta_i|\mathbf{x})$ stands for the conditional Bayesian distance. In [6], it is analytically shown that for the binary case the Bayesian distance lower bound in (9) is always tighter than the Bhattacharyya bound in (4). The bound in (10) is tighter than the bound [5, 6]

$$B^{(\mathrm{Bayes3})} = 1 - \sqrt{E\left[\sum_{i=1}^{M} P^2(\theta_i|\mathbf{x})\right]}. \quad (11)$$

The bound

$$B^{(\mathrm{quad})} = \frac{1}{2}\left(1 - E\left[B_{\theta|\mathbf{x}}\right]\right) \quad (12)$$

was proposed in [13] and [14] in the context of Vajda's quadratic entropy and the quadratic mutual information, respectively. In [6], it is claimed that $B^{(\mathrm{quad})} \leq B_2^{(\mathrm{Bayes})} \leq B_1^{(\mathrm{Bayes})}$. The bound $B^{(\mathrm{quad})}$ can be interpreted as an M-ary extension of the harmonic mean bound presented in (6).
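The multi-hypothesis bounds above are easy to evaluate once posterior samples are available. The following sketch, assuming the reconstructed forms of (10) and (12) above, compares them on surrogate posterior vectors; Dirichlet samples stand in for the posteriors of an unspecified observation model.

```python
# Compare two reviewed multi-hypothesis lower bounds on surrogate posteriors:
#   B_quad    = (1/2) * (1 - E[B_{theta|x}])     -- Eq. (12)
#   B_Bayes,2 = 1 - E[sqrt(B_{theta|x})]         -- Eq. (10)
# where B_{theta|x} = sum_i P(theta_i|x)^2 is the conditional Bayesian distance.
import numpy as np

rng = np.random.default_rng(2)
post = rng.dirichlet([2.0, 1.0, 1.0], size=200_000)  # surrogate P(theta_i|x)

B = np.sum(post ** 2, axis=1)              # conditional Bayesian distance
p_e_min = 1.0 - post.max(axis=1).mean()    # Eq. (1)
print(f"minimum error : {p_e_min:.4f}")
print(f"B_quad        : {0.5 * (1.0 - B.mean()):.4f}")
print(f"B_Bayes,2     : {1.0 - np.sqrt(B).mean():.4f}")
```

Both printed bounds fall below the minimum error for any posterior distribution, since $\max_i P(\theta_i|\mathbf{x}) \leq \sqrt{B_{\theta|\mathbf{x}}}$ and $B_{\theta|\mathbf{x}} \geq 2\max_i P(\theta_i|\mathbf{x}) - 1$ hold pointwise.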
4. A NEW CLASS OF BOUNDS ON THE PROBABILITY OF ERROR

4.1. Derivation of the new class of bounds

Consider an M-ary hypothesis testing problem with detector $\hat{\theta} = \hat{\theta}(\mathbf{x})$. Let

$$u(\mathbf{x}, \theta) = \mathbb{1}_{\{\hat{\theta} \neq \theta\}} = \begin{cases} 1 & \hat{\theta} \neq \theta \\ 0 & \hat{\theta} = \theta \end{cases} \quad (13)$$

where $\theta$ is the true hypothesis and $\mathbb{1}_A$ is the indicator function of the event $A$. Then, according to Hölder's inequality [15], for $p, q > 1$ with $\frac{1}{p} + \frac{1}{q} = 1$:

$$E^{1/p}\left[|u(\mathbf{x}, \theta)|^p\right]\, E^{1/q}\left[|v(\mathbf{x}, \theta)|^q\right] \geq E\left[u(\mathbf{x}, \theta)\, v(\mathbf{x}, \theta)\right] \quad (14)$$

for an arbitrary scalar function $v(\mathbf{x}, \theta)$. Since $u(\mathbf{x}, \theta)$ takes only the values 0 and 1, it can be seen that

$$E\left[u^p(\mathbf{x}, \theta)\right] = E\left[u(\mathbf{x}, \theta)\right] = P_e \quad (15)$$

for any $p \geq 1$. By substituting (15) into (14) one obtains the following lower bound on the probability of error:

$$P_e \geq \frac{E^p\left[u(\mathbf{x}, \theta)\, v(\mathbf{x}, \theta)\right]}{E^{p/q}\left[|v(\mathbf{x}, \theta)|^q\right]}. \quad (16)$$

Using (13), the expectation term in the numerator of (16) can be rewritten as

$$E\left[u(\mathbf{x}, \theta)\, v(\mathbf{x}, \theta)\right] = E\left[E\left[u(\mathbf{x}, \theta)\, v(\mathbf{x}, \theta) \,\middle|\, \mathbf{x}\right]\right] = E\left[v(\mathbf{x}, \theta)\right] - E\left[P(\hat{\theta}|\mathbf{x})\, v(\mathbf{x}, \hat{\theta})\right]. \quad (17)$$

It can be shown that in order to obtain a valid bound which is independent of the detector $\hat{\theta}$, $v(\mathbf{x}, \theta)$ should be structured as follows:

$$v(\mathbf{x}, \theta) = \frac{\zeta(\mathbf{x})}{P(\theta|\mathbf{x})} \quad (18)$$

where $\zeta(\cdot)$ is an arbitrary function. With no loss of generality, $\zeta(\cdot)$ can be chosen to be a nonnegative function. By substituting (18) in (17) one obtains

$$E\left[u(\mathbf{x}, \theta)\, v(\mathbf{x}, \theta)\right] = E\left[\sum_{i=1,\, \theta_i \neq \hat{\theta}}^{M} \zeta(\mathbf{x})\, \frac{P(\theta_i|\mathbf{x})}{P(\theta_i|\mathbf{x})}\right] = (M-1)\, E\left[\zeta(\mathbf{x})\right]. \quad (19)$$

Using (18), it can be seen that

$$E\left[|v(\mathbf{x}, \theta)|^q\right] = E\left[\zeta^q(\mathbf{x})\, g(\mathbf{x})\right] \quad (20)$$

where $g(\mathbf{x}) = \sum_{i=1}^{M} \left(P(\theta_i|\mathbf{x})\right)^{1-q}$. By substitution of (19) and (20) into (16), the bound can be rewritten as

$$P_e \geq \frac{(M-1)^p\, E^p\left[\zeta(\mathbf{x})\right]}{E^{p/q}\left[\zeta^q(\mathbf{x})\, g(\mathbf{x})\right]}. \quad (21)$$

Maximization of (21) with respect to (w.r.t.) $\zeta(\cdot)$ results in

$$\zeta(\mathbf{x}) = c\, g^{-\frac{1}{q-1}}(\mathbf{x}) \quad (22)$$

for an arbitrary constant $c > 0$, and by substituting (22) in (21), the attained lower bound is

$$P_e \geq (M-1)^{\frac{q}{q-1}}\, E\left[\left(\sum_{i=1}^{M} P^{1-q}(\theta_i|\mathbf{x})\right)^{-\frac{1}{q-1}}\right] \quad (23)$$

for all $q > 1$.

4.2. Binary hypothesis testing

In the binary hypothesis problem with the hypotheses $\theta_1$ and $\theta_2$, the lower bound in (23) is

$$P_e \geq E\left[\frac{P(\theta_1|\mathbf{x})\, P(\theta_2|\mathbf{x})}{\left(P^{q-1}(\theta_1|\mathbf{x}) + P^{q-1}(\theta_2|\mathbf{x})\right)^{\frac{1}{q-1}}}\right]. \quad (24)$$

4.2.1. Asymptotic properties

It can be seen that the bound in (24) becomes tighter as $q$ increases, and for $q \to \infty$ the bound is

$$E\left[\min\left\{P(\theta_1|\mathbf{x}), P(\theta_2|\mathbf{x})\right\}\right] = 1 - E\left[\max_{i=1,2}\left\{P(\theta_i|\mathbf{x})\right\}\right], \quad (25)$$

which is the optimal bound for the binary hypothesis test, presented in (1).

4.2.2. The harmonic lower bound

For $q = 2$ this bound can be written in the following simple form:

$$P_e \geq E_{\mathbf{x}}\left[P(\theta_1|\mathbf{x})\, P(\theta_2|\mathbf{x})\right], \quad (26)$$

which is identical to the harmonic lower bound in (6) and to $B^{(\mathrm{quad})}$ for the binary case in (12). Thus, the binary lower bound in [5] can be interpreted as a special case of our general M-hypothesis bound, presented in (23).

4.2.3. Relation to upper bounds on the minimum probability of error

In [5], an upper bound on the probability of error of the MAP detector for binary hypothesis testing is derived using the negative power mean inequalities. According to this paper,

$$P_e^{\min} \leq 2^{\frac{1}{q-1}}\, E_{\mathbf{x}}\left[\left(\sum_{i=1}^{2} P^{1-q}(\theta_i|\mathbf{x})\right)^{-\frac{1}{q-1}}\right] \quad (27)$$

for any $q > 1$. It can be seen that this upper bound can be obtained by multiplying the proposed lower bound in (23) by $2^{\frac{1}{q-1}}$. The factor $2^{\frac{1}{q-1}}$ controls the tightness between the upper and lower bounds on the probability of error for binary hypothesis testing.
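A minimal sketch of the proposed bound (23) follows, evaluated on surrogate Dirichlet posterior samples (an assumption of this illustration, not a model from the paper). The sum is factored by the smallest posterior to avoid overflow when $q$ is large.

```python
# Sketch of the proposed lower bound (23),
#   P_e >= (M-1)^(q/(q-1)) * E[ (sum_i P(theta_i|x)^(1-q))^(-1/(q-1)) ],
# in a numerically stable form.
import numpy as np

def proposed_bound(post: np.ndarray, q: float) -> float:
    """Evaluate Eq. (23) for posterior samples `post` of shape (n, M), q > 1."""
    M = post.shape[1]
    pmin = post.min(axis=1, keepdims=True)
    # (sum_i P_i^(1-q))^(-1/(q-1)) = pmin * (sum_i (P_i/pmin)^(1-q))^(-1/(q-1))
    inner = pmin[:, 0] * np.sum((post / pmin) ** (1.0 - q), axis=1) ** (-1.0 / (q - 1.0))
    return (M - 1) ** (q / (q - 1.0)) * inner.mean()

rng = np.random.default_rng(3)
post = rng.dirichlet([2.0, 1.0, 1.0], size=200_000)  # surrogate P(theta_i|x)

for q in (2.0, 3.0, 5.0, 10.0, 50.0):
    print(f"q = {q:5.1f}: bound = {proposed_bound(post, q):.4f}")
print(f"minimum error  : {1.0 - post.max(axis=1).mean():.4f}")
```

In the binary case the printed bound climbs toward the minimum error as q grows, in line with (25).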
4.2.4. Bounds comparison

Figure 1 depicts the new lower bound for the binary hypothesis problem against the conditional probability $P(\theta_1|\mathbf{x})$, for different values of the parameter $q$. The new bounds are compared to the bounds $B^{(\mathrm{Gauss-sin})}$, $B^{(\mathrm{ATLB})}$ with $\alpha = 5$, and $B^{(\mathrm{Bayes3})}$, presented in (7), (8), and (11), respectively. It can be seen that the bound in (24) becomes tighter as $q$ grows, and that for $q = 10$ the new bound is tighter than the other lower bounds almost everywhere.

[Fig. 1. The proposed lower bounds for q = 2, 5, 10 and other existing bounds as a function of the conditional probability P(θ1|x) for binary hypothesis testing.]

5. EXAMPLES

In this section, two examples are presented to evaluate the performance of the new lower bounds on the minimum probability of error derived in this paper.

5.1. Binary hypothesis problem

Consider the following binary hypothesis testing problem:

$$\theta_1: f(x|\theta_1) = \lambda_1 e^{-\lambda_1 x} u(x), \qquad \theta_2: f(x|\theta_2) = \lambda_2 e^{-\lambda_2 x} u(x) \quad (28)$$

where $u(\cdot)$ denotes the unit step function, with $P(\theta_1) = P(\theta_2) = \frac{1}{2}$ and $\lambda_2 = 0.5$. For this problem, the proposed bounds with $q = 2$ and $q = 3$ can be expressed in closed form in terms of the Gauss hypergeometric function $_2F_1$ [16]; for example, for $\lambda_1 > \lambda_2$,

$$B_{q=2} = \frac{1}{2}\, {}_2F_1\!\left(1,\ \frac{\lambda_1}{\lambda_1 - \lambda_2};\ \frac{\lambda_1}{\lambda_1 - \lambda_2} + 1;\ -\frac{\lambda_1}{\lambda_2}\right),$$

and $B_{q=3}$ admits a similar, lengthier expression. Several bounds on the probability of error and the minimum probability of error obtained by the MAP detector are presented in Fig. 2 as a function of the distribution parameter $\lambda_1$. The bounds in this figure are $B_1^{(\mathrm{BLB})}$, $B_2^{(\mathrm{BLB})}$, $B_1^{(\mathrm{Bayes})}$, $B^{(\mathrm{Bayes3})}$, and the new lower bounds with $q = 2$ and $q = 3$, presented in (3), (4), (9), (11), and (24), respectively. It can be seen that for $\lambda_1 \geq 0.8$ the proposed bound with $q = 3$ is tighter than the Bhattacharyya lower bounds and is close to the minimum probability of error obtained by the MAP decision rule. The proposed bound with $q = 2$ is tighter than the $B_1^{(\mathrm{BLB})}$ lower bound everywhere and tighter than the other bounds in some specific regions. In addition, the upper and lower bounds for $q = 2, 3$, obtained by (27) and (23), respectively, are presented in Fig. 3 as a function of the distribution parameter $\lambda_1$.

[Fig. 2. Comparison of the different bounds and the exact minimum probability of error as a function of λ1 for two equally-likely exponential-distribution hypotheses.]

5.2. Multiple hypothesis problem

Consider the following multiple hypothesis testing problem:

$$\theta_1: f(x|\theta_1) = \tfrac{2}{3} \cos^2(x/2)\, e^{-|x|}, \quad \theta_2: f(x|\theta_2) = 2 \sin^2(x/2)\, e^{-|x|}, \quad \theta_3: f(x|\theta_3) = \tfrac{5}{4} \sin^2(x)\, e^{-|x|} \quad (29)$$

with $P(\theta_1) = \frac{15}{28}$, $P(\theta_2) = \frac{5}{28}$, and $P(\theta_3) = \frac{8}{28}$. In this problem the exact probability of error of the MAP detector is difficult to compute, and the bounds $B_1^{(\mathrm{Bayes})}$, $B_2^{(\mathrm{Bayes})}$, $B^{(\mathrm{Bayes3})}$, and $B^{(\mathrm{quad})}$ are not tractable. The proposed bound with $q = 2$ is computable, and it is equal to

$$B_{q=2} = \frac{10}{7} \int_{-\infty}^{\infty} e^{-|x|}\left[\frac{1}{\cos^{2}(x/2)} + \frac{1}{\sin^{2}(x/2)} + \frac{1}{\sin^{2}(x)}\right]^{-1} dx = \frac{2}{35}\left[e^{-x}\left(\cos(2x) - 2\sin(2x) - 5\right)\right]_{0}^{\infty} = \frac{8}{35} \approx 0.2286.$$
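The closed-form value above can be cross-checked numerically. The sketch below assumes the densities and priors of (29) as reconstructed here and evaluates the bound (23) for M = 3, q = 2 directly, since the posterior-space expectation reduces to the integral of a harmonic term in the joint densities.

```python
# Numerical check of B_{q=2} = 8/35 for the three-hypothesis example (29).
import numpy as np
from scipy.integrate import quad

priors = [15 / 28, 5 / 28, 8 / 28]
dens = [
    lambda x: (2 / 3) * np.cos(x / 2) ** 2 * np.exp(-abs(x)),
    lambda x: 2 * np.sin(x / 2) ** 2 * np.exp(-abs(x)),
    lambda x: (5 / 4) * np.sin(x) ** 2 * np.exp(-abs(x)),
]

def integrand(x):
    # (sum_i 1/(P(theta_i) f(x|theta_i)))^(-1); the expectation in (23)
    # with M = 3, q = 2 reduces to the integral of this harmonic term.
    vals = np.array([p * f(x) for p, f in zip(priors, dens)])
    if np.any(vals == 0.0):       # harmonic mean vanishes at density zeros
        return 0.0
    return 1.0 / np.sum(1.0 / vals)

val, _ = quad(integrand, -50.0, 50.0, limit=500)
print(f"B_(q=2) = 4 * integral = {4 * val:.5f}   (8/35 = {8 / 35:.5f})")
```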
[Fig. 3. Comparison of the upper and lower bounds and the exact minimum probability of error as a function of λ1 for two equally-likely exponential-distribution hypotheses.]

This example demonstrates the simplicity of the proposed bound with $q = 2$ in a case where the other bounds are intractable.

6. CONCLUSION

In this paper, a new class of lower bounds on the probability of error in multiple hypothesis testing was presented. These new bounds maintain the desirable properties of continuity, differentiability, and symmetry. In the binary case, the proposed class depends on a parameter $q$ which, in the limit $q \to \infty$, provides the minimum attainable probability of error, given by the MAP detector. It was shown that this class of bounds generalizes some existing bounds for binary hypothesis tests. It was also shown via examples that the proposed bounds outperform other existing bounds in terms of tightness and simplicity of calculation.

REFERENCES

[1] D. Chazan, M. Zakai, and J. Ziv, "Improved lower bounds on signal parameter estimation," IEEE Trans. Inform. Theory, vol. IT-21, no. 1, pp. 90-93, 1975.

[2] T. Kailath, "The divergence and Bhattacharyya distance measures in signal selection," IEEE Trans. Commun. Technol., vol. COM-15, no. 1, pp. 52-60, 1967.

[3] J. van Tilburg and D. E. Boekee, "Divergence bounds on key equivocation and error probability in cryptanalysis," Advances in Cryptology - CRYPTO '85, vol. 218, 1986.

[4] T. S. Han and S. Verdú, "Generalizing the Fano inequality," IEEE Trans. Inform. Theory, vol. 40, no. 4, pp. 1247-1251, 1994.

[5] N. Santhi and A. Vardy, "On an improvement over Rényi's equivocation bound," Proc. 44th Allerton Conference on Communication, Control and Computing, 2006.

[6] P. A. Devijver, "On a new class of bounds on Bayes risk in multihypothesis pattern recognition," IEEE Trans. Comput., vol. C-23, pp. 70-80, 1974.

[7] C. E. Shannon, "Certain results in coding theory for noisy channels," Inform. Contr., vol. 1, pp. 6-25, 1957.

[8] R. M. Fano, Class notes for Transmission of Information, course 6.574, MIT, Cambridge, MA, 1952.

[9] M. Feder and N. Merhav, "Relations between entropy and error probability," IEEE Trans. Inform. Theory, vol. 40, no. 1, pp. 259-266, 1994.

[10] H. V. Poor and J. B. Thomas, "Applications of Ali-Silvey distance measures in the design of generalized quantizers for binary decision systems," IEEE Trans. Commun., vol. 25, no. 9, pp. 893-900, 1977.

[11] W. A. Hashlamoun, Applications of distance measures and probability of error bounds to distributed detection systems, Ph.D. thesis, Syracuse Univ., Syracuse, NY, 1991.

[12] H. Avi-Itzhak and T. Diep, "Arbitrarily tight upper and lower bounds on the Bayesian probability of error," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 18, no. 1, pp. 89-91, 1996.

[13] I. Vajda, "Bounds of the minimal error probability on checking a finite or countable number of hypotheses," Problems of Information Transmission, vol. 4, pp. 9-19, 1968.

[14] G. T. Toussaint, Feature evaluation criteria and contextual decoding algorithms in statistical pattern recognition, Ph.D. thesis, University of British Columbia, Vancouver, Canada, 1972.

[15] G. H. Hardy, J. E. Littlewood, and G. Pólya, Inequalities, Cambridge Univ. Press, Cambridge, U.K., 2nd edition, 1952.

[16] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Dover, New York, ninth Dover printing, 1972.