arxiv: v2 [math.st] 22 Aug 2018
|
|
- Suzan Simpson
- 5 years ago
- Views:
Transcription
1 GENERALIZED BREGMAN AND JENSEN DIVERGENCES WHICH INCLUDE SOME F-DIVERGENCES TOMOHIRO NISHIYAMA arxiv: v2 [math.st] 22 Aug 208 Abstract. In this paper, we introduce new classes of divergences by extending the definitions of the Bregman divergence and the skew Jensen divergence. These new divergence classes (g-bregman divergence and skew g-jensen divergence satisfy some properties similar to the Bregman or skew Jensen divergence. We show these g-divergences include divergences which belong to a class of f-divergence (the Hellinger distance, the chi-square divergence and the alpha-divergence in addition to the Kullback-Leibler divergence. Moreover, we derive an inequality between the skew g-jensen divergence and the g-bregman divergence and show this inequality is a generalization of Lin s inequality. Keywords: Bregman divergence, f-divergence, Jensen divergence, Kullback- Leibler divergence, Jeffreys divergence, Jensen-Shannon divergence, Hellinger distance, chi-square divergence, alpha divergence.. Introduction Divergences are functions measure the discrepancy between two points and play a key role in the field of machine learning, signal processing and so on. Given a set S and p,q S, a divergence is defined as a function D : S S R which satisfies the following properties.. D(p,q 0 for all p,q S 2. D(p,q = 0 p = q The Bregman divergence [6] B F (p,q F(p F(q p q, F(q and f- divergence [,9] D f (p q i f( pi are representative divergences defined by using a strictly convex function, where F,f : S R are strictly convex functions and f( = 0. Besides that Nielsen have introduced the skew Jensen divergence J F,α (p,q ( αf(p + αf(q F(( αp + αq for a paramter α (0, and have shown the relation between the Bregman divergence and the skew Jensen divergence [7, 4]. The Bregaman divergence and the f-divergece include well-known divergences. For example, the Kullback-Leibler divergence(kl-divergence [] is a type of the Bregman divergence or the f-divergence. On the other hand, the Hellinger distance, the χ-square divergence, the α-divergence [4,8] are a type of the f-divergence and the Jensen-Shannon divergence(js-divergence [2] is a type of the skew Jensen divergence. For probability distributions p and q, each divergence is defined as follows. KL-divergence KL(p q i p i ln( p i (
2 2 TOMOHIRO NISHIYAMA Hellinger distance H 2 (p,q i ( p i 2 (2 Pearson χ-square divergence Neyman χ-square divergence D χ 2,P(p q i (p i 2 (3 α-divergence D χ2,n(p q i D α (p q α(α i (p i 2 p i (4 (p α i q α i (5 The main goal of this paper is to introduce new classes of divergences ( gdivergence and clarify the relations among g-divergences and f-divergence. First, we introduce the g-bregman divergence and the skew g-jensen divergence by extending the definitions of the Bregman divergence and the skew Jensen divergence respectively. For these new classes of divergences, we show they satisfy some properties similar to the Bregman or the skew Jensen divergence and derive an inequality between them. Then, we show that the g-bregman divergence includes the Hellinger distance, the Pearson and Neyman χ-square divergence and the α-divergence. Furthermore, we show that the skew g-jensen divergence include the Hellinger distance and the α-divergence. These divergences have not only the properties of f-divergence but also some properties similar to the Bregman or the skew Jensen divergence. Finally we derive many inequalities by using an inequality between the g-bregman divergence and skew g-jensen divergence. 2. g-bregman divergence and skew g-jensen divergence By using a function g which has a inverse function, we define the g-bregman divergence and the skew g-jensen divergence. Definition. Let Ω R d be a convex set. For p Ω, let F(x be a strictly convex function F : Ω R. The Bregman divergence and the symmetric Bregman divergence are defined as B F (p,q F(p F(q p q, F(q (6 B g F,sym (p,q Bg F (p,q+bg F (q,p = p q, F(p F(q. (7 The symbol, indicates an inner product. For the parameter α R \{0,}, the scaled skew Jensen divergence [3,4] is defined as sj F,α (p,q ( ( αf(p+αf(q F(( αp+αq. (8 α( α Nielsen has shown the relation between the Bregman divergence and the Jensen divergence as follows [4]. B F (q,p = lim sj F,α (p,q α 0 (9 B F (p,q = lim sj F,α (p,q α (0
3 GENERALIZED BREGMAN AND JENSEN DIVERGENCES WHICH INCLUDE SOME F-DIVERGENCES3 J F,α (p,q = ( ( αb F (p,( αp+αq+αb F (q,( αp+αq α( α Definition 2. (Definition of the g-convex setlet S be a vector space over the real numbers and g be a function g : S S. If S satisfies the following condition, we call the set S g-convex set. For all p and n S and all α in the interval (0,, the point ( αg(p+αg(q also belongs to S. Definition 3. (Definition of the g-divergences Let Ω be a g-convex set and let g : Ω Ω be a function which satisfies g(p = g(q p = q. Let α be a parameter in (0,. We define the (scaled skew g-jensen divergence, the g-bregman divergence and the g-symmetric Bregman divergence as follows. ( sj g F,α (p,q sj F,α(g(p,g(q (2 B g F (p,q B F(g(p,g(q (3 B g F,sym (p,q B F,sym(g(p,g(q (4 From the definition of function g, these functions satisfy divergence properties D g (p,q = 0 g(p = g(q p = q and D g (p,q 0, where D g indicates the g-divergences. When g(p is equal to p for all p Ω, the g-divergences are consistent with original divergences. In the following, we assume that g is a function having an inverse function. The g-bregman divergence and the symmetric g-bregman divergence satisfy the linearity, the generalized law of cosines and the generalized parallelogram law as well as the Bregman and the symmetric Bregman divergence. Linearity For positive constant c and c 2, holds. B g c F +c 2F 2 (p,q = c B g F (p,q+c 2 B g F 2 (p,q (5 Proposition. (Generalized law of cosines Let Ω be a g-convex set. For points p,q,r Ω, the following equations hold. B g F (p,q = Bg F (p,r+bg F (r,q g(p g(r, F(g(q F(g(r (6 B g F,sym (p,q = Bg F,sym (p,r+bg F,sym (r,q (7 g(p g(r, F(g(q F(g(r + g(q g(r, F(g(p F(g(r It is easily proved in the same way as the Bregman divergence by putting p = g(p, q = g(q and r = g(r. Theorem. (Generalized parallelogram law Let Ω be a g-convexset. When points p,q,r,s Ω satisfy g(p+g(r = g(q+g(s, the following equations holds. B g F,sym (p,q+bg F,sym (q,r+bg F,sym (r,s+bg F,sym (s,q = Bg F,sym (p,r+bg F,sym (q,s (8 The left hand side is the sum of four sides of rectangle pqrs and the right hand side is the sum of diagonal lines of rectangle pqrs. Proof. It has been shown that the Bregman divergence satisfies the four point identity(see equation (2.3 in [6]. F(r F(s,p q = B F (q,r+b F (p,s B F (p,r B F (q,s (9
4 4 TOMOHIRO NISHIYAMA By putting p = g(p, q = g(q, r = g(r and s = g(s, we can prove the g- Bregman divergence satisfies the similar four point identity. F(g(r F(g(s,g(p g(q = B g F (q,r+bg F (p,s Bg F (p,r Bg F (q,s (20 By combining the assumption and the definition of the symmetric g-bregman divergence ((7 and (4, we have B g F,sym (r,s = Bg F (q,r+bg F (p,s Bg F (p,r Bg F (q,s. (2 Exchanging p r and q s in (2 and taking the sum with (2, the result follows. This theorem can be also proved in the same way as mentioned in the paper [5]. The same relation between the Bregman and the skew-jensen divergence also hold for the g-bregman and the skew g-jensen divergence. B g F (q,p = lim α 0 sjg F,α (p,q (22 B g F (p,q = lim α sjg F,α (p,q (23 Lemma. Let Ω be a g-convexset and let p,q be points p,q Ω. For a parameter α (0, and a point r Ω which satisfies g(r = ( αg(p + αg(q, the g-bregman divergence can be expressed as follows. sj g F,α (p,q α( α (( αb gf (p,r+αbgf (q,r It is easily proved in the same way as equation ( by putting p = g(p and q = g(q. Next, we derive an inequality between the skew g-jensen and the symmetric g-bregman divergence. Lemma 2. Let Ω be a g-convexset and let p,q be points p,q Ω. For a parameter α (0, and a point r Ω which satisfies g(r = ( αg(p+αg(q, the following equations hold. B g F (p,q = Bg F (p,r+bg F (r,q+ α α Bg F,sym (r,q (25 (24 B g F (q,p = α Bg F (r,p+bg F (q,r+ α Bg F,sym (r,p (26 Proof. We first prove (25. By the generalized law of cosines (6, we have B g F (p,q = Bg F (p,r+bg F (r,q+ g(r g(p, F(g(q F(g(r. (27 From the assumption, the equation g(r g(p = α (g(q g(r (28 α holds. By substituting (28 to (27 and using (4, the result follows. By exchanging p and n (27, we can prove (26 in the same way. Lemma 3. Let Ω be a g-convexset and let p,q be points p,q Ω. For a parameter α (0, and a point r Ω which satisfies g(r = ( αg(p+αg(q, the following equation holds. B g F,sym (p,q = α Bg F,sym (p,r+ α Bg F,sym (r,q (29 Proof. Taking the sum of (25 and (26, the result follows.
5 GENERALIZED BREGMAN AND JENSEN DIVERGENCES WHICH INCLUDE SOME F-DIVERGENCES5 Theorem 2. (Jensen-Bregaman inequality Let Ω be a g-convex set and let p,q be points in Ω. For a parameter α (0,, the skew g-jensen divergence and the symmetric g-bregman divergence, the following inequality holds. B g F,sym (p,q sjg F,α (p,q (30 Proof. Let r Ω be a point which satisfies g(r = ( αg(p + αg(q. From (24 and using B g F (p,q Bg F,sym (p,q, we have sj g F,α (p,q = α Bg F (p,r+ α Bg F (q,r α Bg F,sym (p,r+ α Bg F,sym (q,r. Using Lemma 3., we have (3 sj g F,α (p,q α Bg F,sym (p,r+ α Bg F,sym (r,q = Bg F,sym (p,q (32 3. Examples We show some examples of the g-divergences. We denote the g-bregman divergence as GBD, the symmetric g-bregman divergence as SGBD, the skew g-jensen divergence as SGJD and the Jensen-Bregman inequality (30 as JBinequality. Aparameterαisin(0,exceptforexample3andweuseaparameterβ (0, instead of α in example 3. From example to 6, we show the case that GBD is belong to a class of f-divergence and example 7 is the case of information geometry [2 4] and statistical mechanics. KL-divergence, Jeffreys divergence, JS-divergence Generating functions: F(p = i p ilnp i, g(p = p GBD: generalized KL-divergence KL(p q i ( p i + +p i ln( pi SGBD: Jeffreys divergence(j-divergence [0] J(p,q i (p i ln( pi SGJD: scaled skew JS-divergence ( α( α JS α(p q α( α ( αkl(p ( αp+αq+αkl(q ( αp+αq JB-ineuqlity: generalized Lin s inequality α( αj(p,q JS α (p,q. When α is equall to 2, this inequality is consistent with Lin s inequality [2]. 2 KL-divergence, Jeffreys divergence, α-divergence Generating functions: F(p = i exp(p i, g(p = lnp GBD: generalized reverse KL-divergence KL(q p i (p i + ln( qi p i SGBD: J-divergence J(p,q i (p i ln( pi SGJD: generalized α-divergenced α (p q, where D α (p q i αp i ( α JB-ineuqlity: J(p,q D α (p q 3 α-divergence Generating functions: F(p = α i p α i, g(p = p α for α R\{0,} GBD: generalized α-divergence D α (p q SGBD: symmetric α-divergence ( D α,sym (p,q D α (p q+d α (q p ( βp i +β ( ( βp α i +βqα α i SGJD: β( β( α i α(α (pα i q α i
6 6 TOMOHIRO NISHIYAMA ( JB-ineuqlity: D α,sym (p,q β( β( α i ( βp i +β ( ( βp α i +βqα i α By using the equations 2 D ( 2 (p q = H2 (p,q, 2D (2 (p q = D χ 2,P(p q and 2D ( (p q = D χ2,n(p q [8], we obtain example 4 to 6. 4 Hellinger distance Generating functions: F(p = i p2 i, g(p = p GBD: Hellinger distance H 2 (p,q i ( p i 2 SGBD: Hellinger distance 2H 2 (p,q SGJD: Hellinger distance H 2 (p,q JB-ineuqlity: trivial 5 Pearson χ-square divergence Generating functions: F(p = 2 i pi, g(p = p 2 (p i 2 GBD: Pearson χ-square divergence D χ2,p(q p i SGBD: symmetric χ-square divergence D χ 2,sym(p,q D χ 2(p q+d χ 2(q p 2 SGJD: α( α i ( ( αp i α + ( αp 2 i +αq2 i JB-ineuqlity: D χ2,sym(p,q 2 α( α i ( ( αp i α + ( αp 2 i +αq2 i 6 Neyman χ-square divergence Generating functions: F(p = i p i, g(p = p (p i 2 p i GBD: Neyman χ-square divergence D χ 2,N(q p i SGBD: symmetric χ-square divergence D χ 2,sym(p,q SGJD: α( α i (( αp p i +α i ( α+αp i JB-ineuqlity: D χ 2,sym(p,q α( α i (( αp i +α p i ( α+αp i 7 Information geometry and statistical mechanics In the dually flat space M, there exist two affine coordinates θ i, η i and two convex functions called potential ψ(θ, φ(η. For the points P,Q M, we put p i = θ i (P, = θ i (Q or p i = η i (P, = η i (Q. Generating functions: F(p = ψ(θ or F(p = φ(η, g(p = p GBD: canonical divergence D(P Q φ(η(q+ψ(θ(p i η i(qθ i (P SGBD: D A (P,Q i (η i(q η i (P(θ i (Q θ i (P SGJD: sj ψ,α (θ(p,θ(q or sj φ,α (η(p,η(q JB-ineuqlity: D A (P,Q sj ψ,α (θ(p,θ(q or D A (P,Q sj φ,α (η(p,η(q For the exponential or mixture families, GBD is equal to the KL-divergence and SGBD is equal to the J-divergence. For the exponential families, SGJD sj ψ,α (θ(p,θ(q is equal to the skew Bhattacharyya distance [5,4], and for the mixture families, SGJD sj φ,α (η(p,η(q is equal to the skew-js divergence(see [5] in detail. In the case of canonical ensemble in statistical mechanics, the probability distribution function belongs to exponential families. Because the manifold of the exponential family is a dually flat space, the same equations and inequality hold
7 GENERALIZED BREGMAN AND JENSEN DIVERGENCES WHICH INCLUDE SOME F-DIVERGENCES7 for θ = β k B T (33 η = U (34 ψ = βf (35 φ = S, (36 k B where k B is the Boltzmann constant, T is temperature, U is the internal energy, S is the entropy and F is the Helmholtz free energy. Defining aparameterαas aindex ofthe statewhich satisfy T α = ( α T 0 +α T or U α = ( αu 0 +αu, JB-inequality can be written as follows. α( α (T T 0 (U U 0 T 0 T F α T α ( α F 0 T 0 α F T (37 α( α (T T 0 (U U 0 T 0 T S α ( αs 0 αs (38 4. Conclusion We have introduced the g-bregman divergence and the skew g-jensen divergence by extending the definitions of the Bregman and the skew Jensen divergence respectively. We have shown they satisfy some properties similar to the Bregman and the skew-jensen divergence. Furthermore, we have derived an inequality between the symmetric g-bregman divergence and the skew g-jensen divergence and we have shown they include divergences which belong to a class of f-divergence. If the relationship between these three classes of divergence are studied in more detail, it is expected to proceed applications to various fields including machine learning. References [] Syed Mumtaz Ali and Samuel D Silvey. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society. Series B (Methodological, pages 3 42, 966. [2] Shun-ichi Amari. Information geometry and its applications. Springer, 206. [3] Shun-ichi Amari and Andrzej Cichocki. Information geometry of divergence functions. Bulletin of the Polish Academy of Sciences: Technical Sciences, 58(:83 95, 200. [4] Shun-ichi Amari and Hiroshi Nagaoka. Methods of information geometry, volume 9. American Mathematical Soc., [5] Anil Bhattacharyya. On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc., 35:99 09, 943. [6] Lev M Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR computational mathematics and mathematical physics, 7(3:200 27, 967. [7] Jacob Burbea and C Radhakrishna Rao. On the convexity of some divergence measures based on entropy functions. Technical report, PITTSBURGH UNIV PA INST FOR STATISTICS AND APPLICATIONS, 980. [8] Andrzej Cichocki and Shun-ichi Amari. Families of alpha-beta-and gamma-divergences: Flexible and robust measures of similarities. Entropy, 2(6: , 200. [9] Imre Csiszár. Information-type measures of difference of probability distributions and indirect observation. studia scientiarum Mathematicarum Hungarica, 2:229 38, 967. [0] Harold Jeffreys. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. A, 86(007:453 46, 946. [] Solomon Kullback. Information theory and statistics. Courier Corporation, 997. [2] Jianhua Lin. Divergence measures based on the shannon entropy. IEEE Transactions on Information theory, 37(:45 5, 99. [3] Frank Nielsen. A family of statistical symmetric divergences based on jensen s inequality. arxiv preprint arxiv: , 200.
8 8 TOMOHIRO NISHIYAMA [4] Frank Nielsen and Sylvain Boltz. The burbea-rao and bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8: , 20. [5] Tomohiro Nishiyama. Divergence functions in dually flat spaces and their properties. arxiv preprint arxiv: , 208. [6] Simeon Reich and Shoham Sabach. Two strong convergence theorems for bregman strongly nonexpansive operators in reflexive banach spaces. Nonlinear Analysis: Theory, Methods & Applications, 73(:22 35, 200.
On the Chi square and higher-order Chi distances for approximating f-divergences
On the Chi square and higher-order Chi distances for approximating f-divergences Frank Nielsen, Senior Member, IEEE and Richard Nock, Nonmember Abstract We report closed-form formula for calculating the
More informationOn the Chi square and higher-order Chi distances for approximating f-divergences
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 1/17 On the Chi square and higher-order Chi distances for approximating f-divergences Frank Nielsen 1 Richard Nock 2 www.informationgeometry.org
More informationarxiv:math/ v1 [math.st] 19 Jan 2005
ON A DIFFERENCE OF JENSEN INEQUALITY AND ITS APPLICATIONS TO MEAN DIVERGENCE MEASURES INDER JEET TANEJA arxiv:math/05030v [math.st] 9 Jan 005 Let Abstract. In this paper we have considered a difference
More informationarxiv: v2 [cs.cv] 19 Dec 2011
A family of statistical symmetric divergences based on Jensen s inequality arxiv:009.4004v [cs.cv] 9 Dec 0 Abstract Frank Nielsen a,b, a Ecole Polytechnique Computer Science Department (LIX) Palaiseau,
More informationA SYMMETRIC INFORMATION DIVERGENCE MEASURE OF CSISZAR'S F DIVERGENCE CLASS
Journal of the Applied Mathematics, Statistics Informatics (JAMSI), 7 (2011), No. 1 A SYMMETRIC INFORMATION DIVERGENCE MEASURE OF CSISZAR'S F DIVERGENCE CLASS K.C. JAIN AND R. MATHUR Abstract Information
More informationarxiv: v1 [cs.lg] 22 Oct 2018
The Bregman chord divergence arxiv:1810.09113v1 [cs.lg] Oct 018 rank Nielsen Sony Computer Science Laboratories Inc, Japan rank.nielsen@acm.org Richard Nock Data 61, Australia The Australian National &
More informationTight Bounds for Symmetric Divergence Measures and a Refined Bound for Lossless Source Coding
APPEARS IN THE IEEE TRANSACTIONS ON INFORMATION THEORY, FEBRUARY 015 1 Tight Bounds for Symmetric Divergence Measures and a Refined Bound for Lossless Source Coding Igal Sason Abstract Tight bounds for
More informationarxiv: v2 [cs.it] 26 Sep 2011
Sequences o Inequalities Among New Divergence Measures arxiv:1010.041v [cs.it] 6 Sep 011 Inder Jeet Taneja Departamento de Matemática Universidade Federal de Santa Catarina 88.040-900 Florianópolis SC
More informationThe Information Bottleneck Revisited or How to Choose a Good Distortion Measure
The Information Bottleneck Revisited or How to Choose a Good Distortion Measure Peter Harremoës Centrum voor Wiskunde en Informatica PO 94079, 1090 GB Amsterdam The Nederlands PHarremoes@cwinl Naftali
More informationSome New Information Inequalities Involving f-divergences
BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 12, No 2 Sofia 2012 Some New Information Inequalities Involving f-divergences Amit Srivastava Department of Mathematics, Jaypee
More informationInformation geometry of Bayesian statistics
Information geometry of Bayesian statistics Hiroshi Matsuzoe Department of Computer Science and Engineering, Graduate School of Engineering, Nagoya Institute of Technology, Nagoya 466-8555, Japan Abstract.
More informationInformation Geometry of Positive Measures and Positive-Definite Matrices: Decomposable Dually Flat Structure
Entropy 014, 16, 131-145; doi:10.3390/e1604131 OPEN ACCESS entropy ISSN 1099-4300 www.mdpi.com/journal/entropy Article Information Geometry of Positive Measures and Positive-Definite Matrices: Decomposable
More informationNested Inequalities Among Divergence Measures
Appl Math Inf Sci 7, No, 49-7 0 49 Applied Mathematics & Information Sciences An International Journal c 0 NSP Natural Sciences Publishing Cor Nested Inequalities Among Divergence Measures Inder J Taneja
More informationSymplectic and Kähler Structures on Statistical Manifolds Induced from Divergence Functions
Symplectic and Kähler Structures on Statistical Manifolds Induced from Divergence Functions Jun Zhang 1, and Fubo Li 1 University of Michigan, Ann Arbor, Michigan, USA Sichuan University, Chengdu, China
More informationAN AFFINE EMBEDDING OF THE GAMMA MANIFOLD
AN AFFINE EMBEDDING OF THE GAMMA MANIFOLD C.T.J. DODSON AND HIROSHI MATSUZOE Abstract. For the space of gamma distributions with Fisher metric and exponential connections, natural coordinate systems, potential
More informationMutual Information and Optimal Data Coding
Mutual Information and Optimal Data Coding May 9 th 2012 Jules de Tibeiro Université de Moncton à Shippagan Bernard Colin François Dubeau Hussein Khreibani Université de Sherbooe Abstract Introduction
More informationInformation Geometry on Hierarchy of Probability Distributions
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 5, JULY 2001 1701 Information Geometry on Hierarchy of Probability Distributions Shun-ichi Amari, Fellow, IEEE Abstract An exponential family or mixture
More informationtopics about f-divergence
topics about f-divergence Presented by Liqun Chen Mar 16th, 2018 1 Outline 1 f-gan: Training Generative Neural Samplers using Variational Experiments 2 f-gans in an Information Geometric Nutshell Experiments
More informationMathematical Foundations of the Generalization of t-sne and SNE for Arbitrary Divergences
MACHINE LEARNING REPORTS Mathematical Foundations of the Generalization of t-sne and SNE for Arbitrary Report 02/2010 Submitted: 01.04.2010 Published:26.04.2010 T. Villmann and S. Haase University of Applied
More informationIntroduction to Information Geometry
Introduction to Information Geometry based on the book Methods of Information Geometry written by Shun-Ichi Amari and Hiroshi Nagaoka Yunshu Liu 2012-02-17 Outline 1 Introduction to differential geometry
More informationNon-Negative Matrix Factorization with Quasi-Newton Optimization
Non-Negative Matrix Factorization with Quasi-Newton Optimization Rafal ZDUNEK, Andrzej CICHOCKI Laboratory for Advanced Brain Signal Processing BSI, RIKEN, Wako-shi, JAPAN Abstract. Non-negative matrix
More informationZOBECNĚNÉ PHI-DIVERGENCE A EM-ALGORITMUS V AKUSTICKÉ EMISI
ZOBECNĚNÉ PHI-DIVERGENCE A EM-ALGORITMUS V AKUSTICKÉ EMISI Jan Tláskal a Václav Kůs Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague 11.11.2010 Concepts What the
More informationPROPERTIES. Ferdinand Österreicher Institute of Mathematics, University of Salzburg, Austria
CSISZÁR S f-divergences - BASIC PROPERTIES Ferdinand Österreicher Institute of Mathematics, University of Salzburg, Austria Abstract In this talk basic general properties of f-divergences, including their
More informationJune 21, Peking University. Dual Connections. Zhengchao Wan. Overview. Duality of connections. Divergence: general contrast functions
Dual Peking University June 21, 2016 Divergences: Riemannian connection Let M be a manifold on which there is given a Riemannian metric g =,. A connection satisfying Z X, Y = Z X, Y + X, Z Y (1) for all
More informationA View on Extension of Utility-Based on Links with Information Measures
Communications of the Korean Statistical Society 2009, Vol. 16, No. 5, 813 820 A View on Extension of Utility-Based on Links with Information Measures A.R. Hoseinzadeh a, G.R. Mohtashami Borzadaran 1,b,
More informationInformation geometry of the power inverse Gaussian distribution
Information geometry of the power inverse Gaussian distribution Zhenning Zhang, Huafei Sun and Fengwei Zhong Abstract. The power inverse Gaussian distribution is a common distribution in reliability analysis
More informationInformation Divergences and the Curious Case of the Binary Alphabet
Information Divergences and the Curious Case of the Binary Alphabet Jiantao Jiao, Thomas Courtade, Albert No, Kartik Venkat, and Tsachy Weissman Department of Electrical Engineering, Stanford University;
More informationResearch Article On Some Improvements of the Jensen Inequality with Some Applications
Hindawi Publishing Corporation Journal of Inequalities and Applications Volume 2009, Article ID 323615, 15 pages doi:10.1155/2009/323615 Research Article On Some Improvements of the Jensen Inequality with
More informationF -Geometry and Amari s α Geometry on a Statistical Manifold
Entropy 014, 16, 47-487; doi:10.3390/e160547 OPEN ACCESS entropy ISSN 1099-4300 www.mdpi.com/journal/entropy Article F -Geometry and Amari s α Geometry on a Statistical Manifold Harsha K. V. * and Subrahamanian
More informationInformation Geometric view of Belief Propagation
Information Geometric view of Belief Propagation Yunshu Liu 2013-10-17 References: [1]. Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, Stochastic reasoning, Free energy and Information Geometry, Neural
More informationCONVEX FUNCTIONS AND MATRICES. Silvestru Sever Dragomir
Korean J. Math. 6 (018), No. 3, pp. 349 371 https://doi.org/10.11568/kjm.018.6.3.349 INEQUALITIES FOR QUANTUM f-divergence OF CONVEX FUNCTIONS AND MATRICES Silvestru Sever Dragomir Abstract. Some inequalities
More informationInformation geometry connecting Wasserstein distance and Kullback Leibler divergence via the entropy-relaxed transportation problem
https://doi.org/10.1007/s41884-018-0002-8 RESEARCH PAPER Information geometry connecting Wasserstein distance and Kullback Leibler divergence via the entropy-relaxed transportation problem Shun-ichi Amari
More informationChapter 2 Exponential Families and Mixture Families of Probability Distributions
Chapter 2 Exponential Families and Mixture Families of Probability Distributions The present chapter studies the geometry of the exponential family of probability distributions. It is not only a typical
More informationInformation geometry of mirror descent
Information geometry of mirror descent Geometric Science of Information Anthea Monod Department of Statistical Science Duke University Information Initiative at Duke G. Raskutti (UW Madison) and S. Mukherjee
More informationf-divergence Inequalities
SASON AND VERDÚ: f-divergence INEQUALITIES f-divergence Inequalities Igal Sason, and Sergio Verdú Abstract arxiv:508.00335v7 [cs.it] 4 Dec 06 This paper develops systematic approaches to obtain f-divergence
More informationLegendre transformation and information geometry
Legendre transformation and information geometry CIG-MEMO #2, v1 Frank Nielsen École Polytechnique Sony Computer Science Laboratorie, Inc http://www.informationgeometry.org September 2010 Abstract We explain
More informationf-divergence Inequalities
f-divergence Inequalities Igal Sason Sergio Verdú Abstract This paper develops systematic approaches to obtain f-divergence inequalities, dealing with pairs of probability measures defined on arbitrary
More informationAlpha/Beta Divergences and Tweedie Models
Alpha/Beta Divergences and Tweedie Models 1 Y. Kenan Yılmaz A. Taylan Cemgil Department of Computer Engineering Boğaziçi University, Istanbul, Turkey kenan@sibnet.com.tr, taylan.cemgil@boun.edu.tr arxiv:1209.4280v1
More informationMachine Learning. Lecture 02.2: Basics of Information Theory. Nevin L. Zhang
Machine Learning Lecture 02.2: Basics of Information Theory Nevin L. Zhang lzhang@cse.ust.hk Department of Computer Science and Engineering The Hong Kong University of Science and Technology Nevin L. Zhang
More informationBregman divergence and density integration Noboru Murata and Yu Fujimoto
Journal of Math-for-industry, Vol.1(2009B-3), pp.97 104 Bregman divergence and density integration Noboru Murata and Yu Fujimoto Received on August 29, 2009 / Revised on October 4, 2009 Abstract. In this
More informationAdvanced Machine Learning & Perception
Advanced Machine Learning & Perception Instructor: Tony Jebara Topic 6 Standard Kernels Unusual Input Spaces for Kernels String Kernels Probabilistic Kernels Fisher Kernels Probability Product Kernels
More informationTight Bounds for Symmetric Divergence Measures and a New Inequality Relating f-divergences
Tight Bounds for Symmetric Divergence Measures and a New Inequality Relating f-divergences Igal Sason Department of Electrical Engineering Technion, Haifa 3000, Israel E-mail: sason@ee.technion.ac.il Abstract
More informationLiterature on Bregman divergences
Literature on Bregman divergences Lecture series at Univ. Hawai i at Mānoa Peter Harremoës February 26, 2016 Information divergence was introduced by Kullback and Leibler [25] and later Kullback started
More informationMean-field equations for higher-order quantum statistical models : an information geometric approach
Mean-field equations for higher-order quantum statistical models : an information geometric approach N Yapage Department of Mathematics University of Ruhuna, Matara Sri Lanka. arxiv:1202.5726v1 [quant-ph]
More informationOn Information Divergence Measures, Surrogate Loss Functions and Decentralized Hypothesis Testing
On Information Divergence Measures, Surrogate Loss Functions and Decentralized Hypothesis Testing XuanLong Nguyen Martin J. Wainwright Michael I. Jordan Electrical Engineering & Computer Science Department
More informationBelief Propagation, Information Projections, and Dykstra s Algorithm
Belief Propagation, Information Projections, and Dykstra s Algorithm John MacLaren Walsh, PhD Department of Electrical and Computer Engineering Drexel University Philadelphia, PA jwalsh@ece.drexel.edu
More informationThis article was published in an Elsevier journal. The attached copy is furnished to the author for non-commercial research and education use, including for instruction at the author s institution, sharing
More informationExistence and Approximation of Fixed Points of. Bregman Nonexpansive Operators. Banach Spaces
Existence and Approximation of Fixed Points of in Reflexive Banach Spaces Department of Mathematics The Technion Israel Institute of Technology Haifa 22.07.2010 Joint work with Prof. Simeon Reich General
More informationJournée Interdisciplinaire Mathématiques Musique
Journée Interdisciplinaire Mathématiques Musique Music Information Geometry Arnaud Dessein 1,2 and Arshia Cont 1 1 Institute for Research and Coordination of Acoustics and Music, Paris, France 2 Japanese-French
More informationSymmetrizing the Kullback-Leibler distance
Symmetrizing the Kullback-Leibler distance Don H. Johnson Λ and Sinan Sinanović Computer and Information Technology Institute Department of Electrical and Computer Engineering Rice University Houston,
More informationApproximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry
Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry Frank Nielsen 1 and Gaëtan Hadjeres 1 École Polytechnique, France/Sony Computer Science Laboratories, Japan Frank.Nielsen@acm.org
More information2. A diagonal of a parallelogram divides it into two congruent triangles. 5. Diagonals of a rectangle bisect each other and are equal and vice-versa.
QURILTERLS 1. Sum of the angles of a quadrilateral is 360. 2. diagonal of a parallelogram divides it into two congruent triangles. 3. In a parallelogram, (i) opposite sides are equal (ii) opposite angles
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationTotal Jensen divergences: Definition, Properties and k-means++ Clustering
Total Jensen divergences: Definition, Properties and k-means++ Clustering Frank Nielsen, Senior Member, IEEE Sony Computer Science Laboratories, Inc. 3-4-3 Higashi Gotanda, 4-00 Shinagawa-ku arxiv:309.709v
More informationExpectation Maximization
Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /
More informationInformation Projection Algorithms and Belief Propagation
χ 1 π(χ 1 ) Information Projection Algorithms and Belief Propagation Phil Regalia Department of Electrical Engineering and Computer Science Catholic University of America Washington, DC 20064 with J. M.
More information1/37. Convexity theory. Victor Kitov
1/37 Convexity theory Victor Kitov 2/37 Table of Contents 1 2 Strictly convex functions 3 Concave & strictly concave functions 4 Kullback-Leibler divergence 3/37 Convex sets Denition 1 Set X is convex
More informationInformation Geometry and Algebraic Statistics
Matematikos ir informatikos institutas Vinius Information Geometry and Algebraic Statistics Giovanni Pistone and Eva Riccomagno POLITO, UNIGE Wednesday 30th July, 2008 G.Pistone, E.Riccomagno (POLITO,
More informationPattern learning and recognition on statistical manifolds: An information-geometric review
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc. 1/64 Pattern learning and recognition on statistical manifolds: An information-geometric review Frank Nielsen Frank.Nielsen@acm.org www.informationgeometry.org
More informationPosterior Regularization
Posterior Regularization 1 Introduction One of the key challenges in probabilistic structured learning, is the intractability of the posterior distribution, for fast inference. There are numerous methods
More informationOn Improved Bounds for Probability Metrics and f- Divergences
IRWIN AND JOAN JACOBS CENTER FOR COMMUNICATION AND INFORMATION TECHNOLOGIES On Improved Bounds for Probability Metrics and f- Divergences Igal Sason CCIT Report #855 March 014 Electronics Computers Communications
More informationLimits from l Hôpital rule: Shannon entropy as limit cases of Rényi and Tsallis entropies
Limits from l Hôpital rule: Shannon entropy as it cases of Rényi and Tsallis entropies Frank Nielsen École Polytechnique/Sony Computer Science Laboratorie, Inc http://www.informationgeometry.org August
More informationDivergences, surrogate loss functions and experimental design
Divergences, surrogate loss functions and experimental design XuanLong Nguyen University of California Berkeley, CA 94720 xuanlong@cs.berkeley.edu Martin J. Wainwright University of California Berkeley,
More informationOn the decay of elements of inverse triangular Toeplitz matrix
On the decay of elements of inverse triangular Toeplitz matrix Neville Ford, D. V. Savostyanov, N. L. Zamarashkin August 03, 203 arxiv:308.0724v [math.na] 3 Aug 203 Abstract We consider half-infinite triangular
More informationarxiv: v4 [cs.it] 8 Apr 2014
1 On Improved Bounds for Probability Metrics and f-divergences Igal Sason Abstract Derivation of tight bounds for probability metrics and f-divergences is of interest in information theory and statistics.
More informationCOMMON FIXED POINT THEOREM OF THREE MAPPINGS IN COMPLETE METRIC SPACE
COMMON FIXED POINT THEOREM OF THREE MAPPINGS IN COMPLETE METRIC SPACE Latpate V.V. and Dolhare U.P. ACS College Gangakhed DSM College Jintur Abstract:-In this paper we prove common fixed point theorem
More informationA New Quantum f-divergence for Trace Class Operators in Hilbert Spaces
Entropy 04, 6, 5853-5875; doi:0.3390/e65853 OPEN ACCESS entropy ISSN 099-4300 www.mdpi.com/journal/entropy Article A New Quantum f-divergence for Trace Class Operators in Hilbert Spaces Silvestru Sever
More informationConvexity/Concavity of Renyi Entropy and α-mutual Information
Convexity/Concavity of Renyi Entropy and -Mutual Information Siu-Wai Ho Institute for Telecommunications Research University of South Australia Adelaide, SA 5095, Australia Email: siuwai.ho@unisa.edu.au
More informationSuperstability of Pexiderized functional equations arising from distance measures
Available online at www.tjnsa.com J. Nonlinear Sci. Appl. 9 (206), 43 423 Research Article Superstability of Pexiderized functional equations arising from distance measures Gwang Hui Kim a, Young Whan
More informationMeasuring Disclosure Risk and Information Loss in Population Based Frequency Tables
Measuring Disclosure Risk and Information Loss in Population Based Frequency Tables László Antal, atalie Shlomo, Mark Elliot University of Manchester laszlo.antal@postgrad.manchester.ac.uk 8 September
More informationShunlong Luo. Academy of Mathematics and Systems Science Chinese Academy of Sciences
Superadditivity of Fisher Information: Classical vs. Quantum Shunlong Luo Academy of Mathematics and Systems Science Chinese Academy of Sciences luosl@amt.ac.cn Information Geometry and its Applications
More informationBayesian Inference Course, WTCN, UCL, March 2013
Bayesian Course, WTCN, UCL, March 2013 Shannon (1948) asked how much information is received when we observe a specific value of the variable x? If an unlikely event occurs then one would expect the information
More informationInformation Measures: the Curious Case of the Binary Alphabet
Information Measures: the Curious Case of the Binary Alphabet Jiantao Jiao, Student Member, IEEE, Thomas A. Courtade, Member, IEEE, Albert No, Student Member, IEEE, Kartik Venkat, Student Member, IEEE,
More informationInformation geometry in optimization, machine learning and statistical inference
Front. Electr. Electron. Eng. China 2010, 5(3): 241 260 DOI 10.1007/s11460-010-0101-3 Shun-ichi AMARI Information geometry in optimization, machine learning and statistical inference c Higher Education
More informationEntropy measures of physics via complexity
Entropy measures of physics via complexity Giorgio Kaniadakis and Flemming Topsøe Politecnico of Torino, Department of Physics and University of Copenhagen, Department of Mathematics 1 Introduction, Background
More informationStatistical Analysis of Distance Estimators with Density Differences and Density Ratios
Entropy 2014, 16, 921-942; doi:10.3390/e16020921 OPEN ACCESS entropy ISSN 1099-4300 www.mdpi.com/journal/entropy Article Statistical Analysis of Distance Estimators with Density Differences and Density
More informationStrong Converse and Stein s Lemma in the Quantum Hypothesis Testing
Strong Converse and Stein s Lemma in the Quantum Hypothesis Testing arxiv:uant-ph/9906090 v 24 Jun 999 Tomohiro Ogawa and Hiroshi Nagaoka Abstract The hypothesis testing problem of two uantum states is
More informationStatistical Machine Learning Lectures 4: Variational Bayes
1 / 29 Statistical Machine Learning Lectures 4: Variational Bayes Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 29 Synonyms Variational Bayes Variational Inference Variational Bayesian Inference
More informationNew universal Lyapunov functions for nonlinear kinetics
New universal Lyapunov functions for nonlinear kinetics Department of Mathematics University of Leicester, UK July 21, 2014, Leicester Outline 1 2 Outline 1 2 Boltzmann Gibbs-Shannon relative entropy (1872-1948)
More informationWasserstein GAN. Juho Lee. Jan 23, 2017
Wasserstein GAN Juho Lee Jan 23, 2017 Wasserstein GAN (WGAN) Arxiv submission Martin Arjovsky, Soumith Chintala, and Léon Bottou A new GAN model minimizing the Earth-Mover s distance (Wasserstein-1 distance)
More informationTHEOREM AND METRIZABILITY
f-divergences - REPRESENTATION THEOREM AND METRIZABILITY Ferdinand Österreicher Institute of Mathematics, University of Salzburg, Austria Abstract In this talk we are first going to state the so-called
More informationGoodness of Fit Test and Test of Independence by Entropy
Journal of Mathematical Extension Vol. 3, No. 2 (2009), 43-59 Goodness of Fit Test and Test of Independence by Entropy M. Sharifdoost Islamic Azad University Science & Research Branch, Tehran N. Nematollahi
More informationBoyd Indices of Orlicz Lorentz Spaces
Boyd Indices of Orlicz Lorentz Spaces Department of Mathematics, University of Mis- STEPHEN J MONTGOMERY-SMITH souri, Columbia, Missouri 65211 ABSTRACT Orlicz Lorentz spaces provide a common generalization
More informationNishant Gurnani. GAN Reading Group. April 14th, / 107
Nishant Gurnani GAN Reading Group April 14th, 2017 1 / 107 Why are these Papers Important? 2 / 107 Why are these Papers Important? Recently a large number of GAN frameworks have been proposed - BGAN, LSGAN,
More informationTHE UNIVERSITY OF TORONTO UNDERGRADUATE MATHEMATICS COMPETITION. Sunday, March 14, 2004 Time: hours No aids or calculators permitted.
THE UNIVERSITY OF TORONTO UNDERGRADUATE MATHEMATICS COMPETITION Sunday, March 1, Time: 3 1 hours No aids or calculators permitted. 1. Prove that, for any complex numbers z and w, ( z + w z z + w w z +
More information21.2 Example 1 : Non-parametric regression in Mean Integrated Square Error Density Estimation (L 2 2 risk)
10-704: Information Processing and Learning Spring 2015 Lecture 21: Examples of Lower Bounds and Assouad s Method Lecturer: Akshay Krishnamurthy Scribes: Soumya Batra Note: LaTeX template courtesy of UC
More informationL p Spaces and Convexity
L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function
More informationBregman Divergences for Data Mining Meta-Algorithms
p.1/?? Bregman Divergences for Data Mining Meta-Algorithms Joydeep Ghosh University of Texas at Austin ghosh@ece.utexas.edu Reflects joint work with Arindam Banerjee, Srujana Merugu, Inderjit Dhillon,
More informationHamiltonian Mechanics
Chapter 3 Hamiltonian Mechanics 3.1 Convex functions As background to discuss Hamiltonian mechanics we discuss convexity and convex functions. We will also give some applications to thermodynamics. We
More informationInference. Data. Model. Variates
Data Inference Variates Model ˆθ (,..., ) mˆθn(d) m θ2 M m θ1 (,, ) (,,, ) (,, ) α = :=: (, ) F( ) = = {(, ),, } F( ) X( ) = Γ( ) = Σ = ( ) = ( ) ( ) = { = } :=: (U, ) , = { = } = { = } x 2 e i, e j
More informationBounds for entropy and divergence for distributions over a two-element set
Bounds for entropy and divergence for distributions over a two-element set Flemming Topsøe Department of Mathematics University of Copenhagen, Denmark 18.1.00 Abstract Three results dealing with probability
More informationReceived: 20 December 2011; in revised form: 4 February 2012 / Accepted: 7 February 2012 / Published: 2 March 2012
Entropy 2012, 14, 480-490; doi:10.3390/e14030480 Article OPEN ACCESS entropy ISSN 1099-4300 www.mdpi.com/journal/entropy Interval Entropy and Informative Distance Fakhroddin Misagh 1, * and Gholamhossein
More informationInformation Theory and Hypothesis Testing
Summer School on Game Theory and Telecommunications Campione, 7-12 September, 2014 Information Theory and Hypothesis Testing Mauro Barni University of Siena September 8 Review of some basic results linking
More informationMinimax lower bounds I
Minimax lower bounds I Kyoung Hee Kim Sungshin University 1 Preliminaries 2 General strategy 3 Le Cam, 1973 4 Assouad, 1983 5 Appendix Setting Family of probability measures {P θ : θ Θ} on a sigma field
More informationSupplementary Materials for: f-gan: Training Generative Neural Samplers using Variational Divergence Minimization
Supplementary Materials for: f-gan: Training Generative Neural Samplers using Variational Divergence Minimization Sebastian Nowozin, Botond Cseke, Ryota Tomioka Machine Intelligence and Perception Group
More informationBISTA: a Bregmanian proximal gradient method without the global Lipschitz continuity assumption
BISTA: a Bregmanian proximal gradient method without the global Lipschitz continuity assumption Daniel Reem (joint work with Simeon Reich and Alvaro De Pierro) Department of Mathematics, The Technion,
More informationSpeech Recognition Lecture 7: Maximum Entropy Models. Mehryar Mohri Courant Institute and Google Research
Speech Recognition Lecture 7: Maximum Entropy Models Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.com This Lecture Information theory basics Maximum entropy models Duality theorem
More informationarxiv: v9 [math.st] 10 Apr 2016
A New Family of Bounded Divergence Measures and Application to Signal Detection Shivakumar Jolad 1, Ahmed Roman, Mahesh C. Shastry 3, Mihir Gadgil 4 and Ayanendranath Basu 5 1 Department of Physics, Indian
More informationarxiv:math/ v1 [math.co] 3 Sep 2000
arxiv:math/0009026v1 [math.co] 3 Sep 2000 Max Min Representation of Piecewise Linear Functions Sergei Ovchinnikov Mathematics Department San Francisco State University San Francisco, CA 94132 sergei@sfsu.edu
More informationOn Bayes Risk Lower Bounds
Journal of Machine Learning Research 17 (2016) 1-58 Submitted 4/16; Revised 10/16; Published 12/16 On Bayes Risk Lower Bounds Xi Chen Stern School of Business New York University New York, NY 10012, USA
More information