A Note on Bound for Jensen-Shannon Divergence by Jeffreys

Similar documents
The Order Relation and Trace Inequalities for. Hermitian Operators

Comparison of the Population Variance Estimators. of 2-Parameter Exponential Distribution Based on. Multiple Criteria Decision Making Method

2E Pattern Recognition Solutions to Introduction to Pattern Recognition, Chapter 2: Bayesian pattern classification

The Jacobsthal and Jacobsthal-Lucas Numbers via Square Roots of Matrices

Sharp integral inequalities involving high-order partial derivatives. Journal Of Inequalities And Applications, 2008, v. 2008, article no.

Another converse of Jensen s inequality

Some Inequalities Between Two Polygons Inscribed One in the Other

Comparative Studies of Law of Conservation of Energy. and Law Clusters of Conservation of Generalized Energy

On the spectral norm of r-circulant matrices with the Pell and Pell-Lucas numbers

Research Article Green s Theorem for Sign Data

CSCE 790S Background Results

One-sided finite-difference approximations suitable for use with Richardson extrapolation

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu

(Online First)A Lattice Boltzmann Scheme for Diffusion Equation in Spherical Coordinate

ON A DETERMINATION OF THE INITIAL FUNCTIONS FROM THE OBSERVED VALUES OF THE BOUNDARY FUNCTIONS FOR THE SECOND-ORDER HYPERBOLIC EQUATION

More metrics on cartesian products

MATH 829: Introduction to Data Mining and Analysis The EM algorithm (part 2)

Lectures - Week 4 Matrix norms, Conditioning, Vector Spaces, Linear Independence, Spanning sets and Basis, Null space and Range of a Matrix

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

TAIL BOUNDS FOR SUMS OF GEOMETRIC AND EXPONENTIAL VARIABLES

Lecture 12: Discrete Laplacian

Online Classification: Perceptron and Winnow

A note on almost sure behavior of randomly weighted sums of φ-mixing random variables with φ-mixing weights

An application of generalized Tsalli s-havrda-charvat entropy in coding theory through a generalization of Kraft inequality

Edge Isoperimetric Inequalities

COMPLEX NUMBERS AND QUADRATIC EQUATIONS

Randić Energy and Randić Estrada Index of a Graph

University of Washington Department of Chemistry Chemistry 453 Winter Quarter 2015

Curvature and isoperimetric inequality

On the correction of the h-index for career length

Goodness of fit and Wilks theorem

A CHARACTERIZATION OF ADDITIVE DERIVATIONS ON VON NEUMANN ALGEBRAS

Volume 18 Figure 1. Notation 1. Notation 2. Observation 1. Remark 1. Remark 2. Remark 3. Remark 4. Remark 5. Remark 6. Theorem A [2]. Theorem B [2].

Introduction to Information Theory, Data Compression,

A new construction of 3-separable matrices via an improved decoding of Macula s construction

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

11 Tail Inequalities Markov s Inequality. Lecture 11: Tail Inequalities [Fa 13]

Maximizing the number of nonnegative subsets

Simulated Power of the Discrete Cramér-von Mises Goodness-of-Fit Tests

Open Systems: Chemical Potential and Partial Molar Quantities Chemical Potential

UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours

MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 12 10/21/2013. Martingale Concentration Inequalities and Applications

The lower and upper bounds on Perron root of nonnegative irreducible matrices

CHAPTER-5 INFORMATION MEASURE OF FUZZY MATRIX AND FUZZY BINARY RELATION

Power law and dimension of the maximum value for belief distribution with the max Deng entropy

The Geometry of Logit and Probit

Research Article An Extension of Stolarsky Means to the Multivariable Case

Games of Threats. Elon Kohlberg Abraham Neyman. Working Paper

Stanford University CS359G: Graph Partitioning and Expanders Handout 4 Luca Trevisan January 13, 2011

Lecture 7: Boltzmann distribution & Thermodynamics of mixing

The internal structure of natural numbers and one method for the definition of large prime numbers

Assortment Optimization under MNL

Binomial transforms of the modified k-fibonacci-like sequence

10-801: Advanced Optimization and Randomized Methods Lecture 2: Convex functions (Jan 15, 2014)

The Minimum Universal Cost Flow in an Infeasible Flow Network

Using T.O.M to Estimate Parameter of distributions that have not Single Exponential Family

3.1 ML and Empirical Distribution

Lecture 4: November 17, Part 1 Single Buffer Management

The binomial transforms of the generalized (s, t )-Jacobsthal matrix sequence

Integrals and Invariants of Euler-Lagrange Equations

A Bayes Algorithm for the Multitask Pattern Recognition Problem Direct Approach

Uniqueness of Weak Solutions to the 3D Ginzburg- Landau Model for Superconductivity

ECE 534: Elements of Information Theory. Solutions to Midterm Exam (Spring 2006)

Exercises of Chapter 2

Temperature. Chapter Heat Engine

BOUNDS ON DIVERGENCE IN NONEXTENSIVE STATISTICAL MECHANICS

Math 217 Fall 2013 Homework 2 Solutions

Estimation: Part 2. Chapter GREG estimation

Speeding up Computation of Scalar Multiplication in Elliptic Curve Cryptosystem

CALCULUS CLASSROOM CAPSULES

CHALMERS, GÖTEBORGS UNIVERSITET. SOLUTIONS to RE-EXAM for ARTIFICIAL NEURAL NETWORKS. COURSE CODES: FFR 135, FIM 720 GU, PhD

Linear Approximation with Regularization and Moving Least Squares

Foundations of Arithmetic

Statistical analysis using matlab. HY 439 Presented by: George Fortetsanakis

Lecture 4: September 12

On mutual information estimation for mixed-pair random variables

Asymptotics of the Solution of a Boundary Value. Problem for One-Characteristic Differential. Equation Degenerating into a Parabolic Equation

Eigenvalues of Random Graphs

A Robust Method for Calculating the Correlation Coefficient

arxiv: v1 [math.co] 1 Mar 2014

10.40 Appendix Connection to Thermodynamics and Derivation of Boltzmann Distribution

Monotonic Interpolating Curves by Using Rational. Cubic Ball Interpolation

Ensemble Methods: Boosting

STAT 3008 Applied Regression Analysis

Introduction to information theory and data compression

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Appendix for Causal Interaction in Factorial Experiments: Application to Conjoint Analysis

General viscosity iterative method for a sequence of quasi-nonexpansive mappings

Finding Primitive Roots Pseudo-Deterministically

Canonical transformations

System in Weibull Distribution

Physics 5153 Classical Mechanics. D Alembert s Principle and The Lagrangian-1

Kernel Methods and SVMs Extension

Lecture Notes on Linear Regression

U.C. Berkeley CS294: Beyond Worst-Case Analysis Luca Trevisan September 5, 2017

Voting Games with Positive Weights and. Dummy Players: Facts and Theory

Entropy generation in a chemical reaction

Limited Dependent Variables

Error Probability for M Signals

A new Approach for Solving Linear Ordinary Differential Equations

Transcription:

OPEN ACCESS Conference Proceedngs Paper Entropy www.scforum.net/conference/ecea- A Note on Bound for Jensen-Shannon Dvergence by Jeffreys Takuya Yamano, * Department of Mathematcs and Physcs, Faculty of Scence, Kanagawa Unversty, 946, 6-33 Tsuchya, Hratsuka, Kanagawa 59-93, Japan; E-Mals: yamano@amy.h-ho.ne.jp * Author to whom correspondence should be addressed; E-Mal: yamano@amy.h-ho.ne.jp; Tel.: +8-45-47-8796; Fax: +8-45-47-8796. Receved: 4 September 4 / Accepted: 6 October 4 / Publshed: 3 November 4 Abstract: We present a lower bound on the Jensen-Shannon dvergence by the Jeffrers dvergence when p q s satsfed. In the orgnal Ln's paper [IEEE Trans. Info. Theory, 37, 45 (99)], where the dvergence was ntroduced, the upper bound n terms of the Jeffreys was the quarter of t. In vew of a recent shaper one reported by Crooks, we present a dscusson on upper bounds by transcendental functons of Jeffreys by comparng those values for a bnary dstrbuton. Keywords: Jensen-Shannon dvergence; varatonal dstance; Kullback-Lebler dvergence; Jeffreys dvergence PACS Codes: 89.7.-a, 89.7.Cf. Introducton The Jensen-Shannon dvergence JS(p; q) s a smlarty measure between two probablty dstrbutons p and q. It s presently used n varous dscplnes rangng from nformaton theory to electron orbtal patterns. It was ntroduced n the descrete case as [, ] JS(p; q) p ln p p + q + q ln q p + q In terms of the Kullback-Lebler dvergence D(p; q) = p ln p, t can also be expressed as q JS(p; q) = p + q D (p; ) + p + q D (q; ) ()

() In addton, the square root of the JS dvergence becomes a metrc n that t satsfes the trangle nequalty [3]. The JS dvergence can be bounded by other dvergence. For example, the varatonal dstance V(p; q) = p q can be used both for upper and lower bounds: JS(p; q) V(p; q), ([]), JS(p; q) 8 V (p; q) + 5 V4 (p; q) + (3) The second nequalty (lower bound) s explaned n Appendx. The Jeffreys dvergence J(p; q), on the other hand, s an old measure [4] and s defned as J(p; q) D(p; q) + D(q; p). However bounds of JS dvergence n terms of the Jeffreys dvergence s not so much examned snce the ntroducton by Ln []. It was shown that the JS dvergence s upper-bounded by the quarter of the Jeffreys,.e., J(p; q)/4 wthout mentonng whether there exsts the best possble (We present an alternatve proof of the Ln s upper bound n Appendx). Later, Crooks [5] presented a sharper one than the Ln s bound. In addton, t s stated that there s no correspondng lower bound n terms of J(p; q). In ths artcle, however, we show that there exsts n a lmted case.. Results and Dscusson.. A lower bound We here are nterested n obtanng a lower bound n terms of the Jeffreys dvergence. It should practcally be better to have both upper and lower bounds n terms of the same dvergence. However, t was reported that there s no lower bound by Jeffreys [5]. We dsprove ths statement below. By a heurstc method (though t s easly verfed), we fnd the followng nequalty: ln x + x 4 ln( + ln x), (x ) (4) The equalty holds when x =. Substtutng x = p /q and takng an average wth the probablty p, we have p ln p p + q 4 p ln( + ln ( p ) ) q Smlarly for the probablty q, an nequalty q ln p p + q 4 q ln( + ln ( q ) ) p (6) holds. Snce the functon ln( + x) s convex, the Jensen s nequalty tells ln( + x) ln( + x ). Therefore, the rght-hand sdes of the above two nequaltes are bounded from below respectvely by 4 ln( + D(p; q)) and ln( + D(q; p)) 4 Summng the both sdes and applyng a Jensen s nequalty wth the equal weght, we reach a lower bound: JS(p; q) ln( + J(p; q)) 4 (5)

3 (8).. Comparson of upper bounds The Ln s upper bound JS(p; q) 4 J(p; q) (9) s based on the nequalty of the arthmetc and geometrc means,.e., (p + q )/ p q. On the other hand, a sharper one derved by Crooks uses the Jensen s nequalty and t s expressed as [5], JS(p; q) ln [ ] + e J(p;q) () It s expressed by a transcendental functon of the Jeffreys dvergence. Here, we pursut yet another smpler one n the followng. Frst, we recognze that the Kullback-Lebler dvergence satsfes D(p; q) p () Ths nequalty s easly confrmed by a functon xln x, ( x ), and thus the nequalty p [ p ln ( p )] holds. Therefore, as n the same lne n [5], we can evaluate the followng way: q q p + q D (p; ) p p + q = p + e lnq p + e D(p;q) () where we have used the Jensen s nequalty for a concave functon at the last nequalty,.e. + e x + e x In a smlar way, we have p + q D (q; ) + e D(q;p) (3) Puttng together the nequaltes () and (3), we have JS(p; q) ( + e D(p;q) ) + ( + e D(q;p) ) (4) Lastly, usng the Jensen s nequalty once more, we have an upper bound JS(p; q) + e J(p;q) (5) Ths bound s smlar to () and s expressed by a transcendental functon of the Jeffreys, however t s smpler. For practcal purposes, t s mportant to examne how the upper bound (5) relates to the prevous ones. In Fgure, we show three curves of the upper bounds n the JS-Jeffreys plane. These q

4 curves correspond to nequaltes (9), () and (5). We fnd that the bound (5) s sharper than Ln s, however not than Crook s. In addton, the comparson of these bounds for a bnary dstrbuton s shown n Fgure, n whch p = (t, t) and q = ( t, t), where t. We observe that the bounds gve nearly same value around t=/, whle the dfferences become promnent as t departs towards ether or. Among these bounds we fnd that the Crooks bound s sharper, however t ndcates that there stll seems to exst a sharper one for entre range. Fgure. Comparson of the bound functons n the JS-Jeffreys plane: Ln (red), Crooks (green) and the present (blue). Fgure. Comparson of the bounds for a bnary dstrbuton: Ln (red), Crooks (green) and the present (blue). 4. Conclusons We have derved a lower bound of the Jensen-Shannon dvergence n terms of the Jeffrey s dvergence under a condton of p q : JS(p; q) 4 ln( + J(p; q)). Regardng the upper bound, we found a smpler one that s another transcendental functon of the Jeffrey s dvergence along the lne of the

5 prevous study. It s sharper than Ln s, but not than Crooks. Comparsons of alternatve upper bounds may practcally be useful and t s worth seekng for a sharper one. Appendx On lower bound by varatonal dstance The Kullback-Lebler dvergence s lower bounded by the varatonal dstance V, whch s known as the Pnsker nequalty [6, 7]: D(p; q) V + 36 V4 + (Varous bounds on a generalzed Kullback-Lebler dvergence was gven n [8]). Therefore, an equalty D (p; p + q p + q ) = D (q; ) 8 V + 5 V4 + holds, where V (p; p+q p+q ) = V (q; ) = V(p; q)/ s used. Thus, for the JS dvergence, the second nequalty of Eq. (3) follows. An alternatve proof of the Ln s bound []: The Ln s bound JS(p; q) J(p; q) can be derved as follows. Let us put the rato of the two 4 probablty functons as θ = q. Then, we have p Notng ( + θ ) = we have Thus, we can evaluate as Smlarly, we have p ln e (+θ )x e (+θ)x dx p p + q = {ln + p ( + θ ) } dx and applyng the Schwartz nequalty to t, e x dx e θx dx = θ ln( + θ ) ln lnθ = ln + D(p; q) p ln p p + q D(p; q) q ln q p + q D(q; p) Therefore, puttng together the above two nequaltes, we have the Ln s bound: JS(p; q) 4 (D(p; q) + D(q; p)) = J(p; q) 4

6 Acknowledgments The author thanks the organzers for arrangng the e-conference. Author Contrbutons The author conducted ths research and wrote the whole part of the paper. Conflcts of Interest The author declares no conflct of nterest. References and Notes. Ln, J. Dvergence measures based on the Shannon entropy. IEEE Trans. Info. Theory 99, 37, 45-5.. Rao, C. R. Dfferental Geometry n Statstcal Inference. IMS-Lecture Notes 987,, 7 5. 3. Endres D. and Schndeln, J. A new metrc for probablty dstrbutons. IEEE Trans. Inf. Theory, 3 49, 858-8. 4. Jeffreys, H. An nvarant form for the pror probablty n estmaton problems. Proc. Roy. Soc. Lon., Ser. A 946, 86, 453-46. 5. Crooks, G.E. Inequaltes between the Jensen Shannon and Jeffreys dvergences. Tech. Note 4 (8) http://threeplusone.com/crooks-nequalty.pdf 6. Kraft, O. A note on exponental bounds for bnomal probabltes. Ann. Inst. Statst. Math. 969,, 9-. 7. Fedotov, A., Harremoes, P., Topsoe, F. Refnements of Pnsker nequalty. IEEE Trans. Inf. Theory, 3, 49 49-498. 8. Yamano, T. Bounds on dvergence n nonextensve statstcal mechancs. J. Indones. Math. Soc. 3, 9, 89-97. 4 by the authors; lcensee MDPI, Basel, Swtzerland. Ths artcle s an open access artcle dstrbuted under the terms and condtons of the Creatve Commons Attrbuton lcense (http://creatvecommons.org/lcenses/by/3./).