arxiv: v2 [math.st] 22 Aug 2018

GENERALIZED BREGMAN AND JENSEN DIVERGENCES WHICH INCLUDE SOME F-DIVERGENCES

TOMOHIRO NISHIYAMA

Abstract. In this paper, we introduce new classes of divergences by extending the definitions of the Bregman divergence and the skew Jensen divergence. These new divergence classes (the g-Bregman divergence and the skew g-Jensen divergence) satisfy some properties similar to those of the Bregman and skew Jensen divergences. We show that these g-divergences include divergences which belong to the class of f-divergences (the Hellinger distance, the chi-square divergences and the alpha-divergence, in addition to the Kullback-Leibler divergence). Moreover, we derive an inequality between the skew g-Jensen divergence and the g-Bregman divergence and show that this inequality is a generalization of Lin's inequality.

Keywords: Bregman divergence, f-divergence, Jensen divergence, Kullback-Leibler divergence, Jeffreys divergence, Jensen-Shannon divergence, Hellinger distance, chi-square divergence, alpha divergence.

1. Introduction

Divergences are functions that measure the discrepancy between two points, and they play a key role in fields such as machine learning and signal processing. Given a set $S$ and $p, q \in S$, a divergence is defined as a function $D : S \times S \to \mathbb{R}$ which satisfies the following properties.

1. $D(p,q) \ge 0$ for all $p, q \in S$
2. $D(p,q) = 0 \iff p = q$

The Bregman divergence [6] $B_F(p,q) := F(p) - F(q) - \langle p - q, \nabla F(q) \rangle$ and the f-divergence [1,9] $D_f(p\|q) := \sum_i q_i f(p_i/q_i)$ are representative divergences defined by using a strictly convex function, where $F$ and $f$ are strictly convex real-valued functions and $f(1) = 0$. In addition, Nielsen introduced the skew Jensen divergence $J_{F,\alpha}(p,q) := (1-\alpha)F(p) + \alpha F(q) - F((1-\alpha)p + \alpha q)$ for a parameter $\alpha \in (0,1)$ and showed the relation between the Bregman divergence and the skew Jensen divergence [7,14].

The Bregman divergence and the f-divergence include well-known divergences. For example, the Kullback-Leibler divergence (KL-divergence) [11] is both a Bregman divergence and an f-divergence. On the other hand, the Hellinger distance, the χ-square divergences and the α-divergence [4,8] are f-divergences, and the Jensen-Shannon divergence (JS-divergence) [12] is a skew Jensen divergence. For probability distributions $p$ and $q$, each divergence is defined as follows.

KL-divergence
$$KL(p\|q) := \sum_i p_i \ln\frac{p_i}{q_i} \tag{1}$$

Hellinger distance
$$H^2(p,q) := \sum_i \bigl(\sqrt{p_i} - \sqrt{q_i}\bigr)^2 \tag{2}$$

Pearson χ-square divergence
$$D_{\chi^2,P}(p\|q) := \sum_i \frac{(p_i - q_i)^2}{q_i} \tag{3}$$

Neyman χ-square divergence
$$D_{\chi^2,N}(p\|q) := \sum_i \frac{(p_i - q_i)^2}{p_i} \tag{4}$$

α-divergence
$$D_\alpha(p\|q) := \frac{1}{\alpha(\alpha-1)} \Bigl( \sum_i p_i^{\alpha} q_i^{1-\alpha} - 1 \Bigr) \tag{5}$$

The main goal of this paper is to introduce new classes of divergences (g-divergences) and to clarify the relations between the g-divergences and the f-divergence. First, we introduce the g-Bregman divergence and the skew g-Jensen divergence by extending the definitions of the Bregman divergence and the skew Jensen divergence, respectively. For these new classes of divergences, we show that they satisfy some properties similar to those of the Bregman and skew Jensen divergences, and we derive an inequality between them. Then, we show that the g-Bregman divergence includes the Hellinger distance, the Pearson and Neyman χ-square divergences and the α-divergence. Furthermore, we show that the skew g-Jensen divergence includes the Hellinger distance and the α-divergence. These divergences therefore have not only the properties of the f-divergence but also some properties similar to those of the Bregman or the skew Jensen divergence. Finally, we derive several inequalities by applying the inequality between the g-Bregman divergence and the skew g-Jensen divergence to each example.

2. g-Bregman divergence and skew g-Jensen divergence

By using a function $g$ which has an inverse function, we define the g-Bregman divergence and the skew g-Jensen divergence.

Definition 1. Let $\Omega \subset \mathbb{R}^d$ be a convex set. For $p, q \in \Omega$, let $F : \Omega \to \mathbb{R}$ be a strictly convex function. The Bregman divergence and the symmetric Bregman divergence are defined as
$$B_F(p,q) := F(p) - F(q) - \langle p - q, \nabla F(q) \rangle \tag{6}$$
$$B_{F,sym}(p,q) := B_F(p,q) + B_F(q,p) = \langle p - q, \nabla F(p) - \nabla F(q) \rangle. \tag{7}$$
The symbol $\langle \cdot, \cdot \rangle$ indicates an inner product. For the parameter $\alpha \in \mathbb{R} \setminus \{0,1\}$, the scaled skew Jensen divergence [13,14] is defined as
$$sJ_{F,\alpha}(p,q) := \frac{1}{\alpha(1-\alpha)} \bigl( (1-\alpha)F(p) + \alpha F(q) - F((1-\alpha)p + \alpha q) \bigr). \tag{8}$$

Nielsen has shown the following relations between the Bregman divergence and the Jensen divergence [14].
$$B_F(q,p) = \lim_{\alpha \to 0} sJ_{F,\alpha}(p,q) \tag{9}$$
$$B_F(p,q) = \lim_{\alpha \to 1} sJ_{F,\alpha}(p,q) \tag{10}$$
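The limit relations (9) and (10) can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes the generator $F(p) = \sum_i p_i \ln p_i$, two arbitrarily chosen probability vectors, and the illustrative function names `bregman`, `scaled_skew_jensen` and `kl`. For this generator, the Bregman divergence between probability vectors coincides with the KL-divergence (1).

```python
import math

def F(p):
    # Assumed generator: negative Shannon entropy, strictly convex for p_i > 0.
    return sum(x * math.log(x) for x in p)

def grad_F(p):
    return [math.log(x) + 1.0 for x in p]

def bregman(p, q):
    # B_F(p, q) = F(p) - F(q) - <p - q, grad F(q)>, equation (6).
    inner = sum((a - b) * d for a, b, d in zip(p, q, grad_F(q)))
    return F(p) - F(q) - inner

def scaled_skew_jensen(p, q, alpha):
    # sJ_{F,alpha}(p, q), equation (8).
    mix = [(1 - alpha) * a + alpha * b for a, b in zip(p, q)]
    return ((1 - alpha) * F(p) + alpha * F(q) - F(mix)) / (alpha * (1 - alpha))

def kl(p, q):
    # KL-divergence, equation (1).
    return sum(a * math.log(a / b) for a, b in zip(p, q))

# Arbitrary example distributions (illustrative values only).
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

eps = 1e-6
print("B_F(p,q) vs KL(p||q):", bregman(p, q), kl(p, q))
print("alpha -> 0:", scaled_skew_jensen(p, q, eps), "vs B_F(q,p):", bregman(q, p))
print("alpha -> 1:", scaled_skew_jensen(p, q, 1 - eps), "vs B_F(p,q):", bregman(p, q))
```

A small but nonzero `eps` is used because (8) is undefined at $\alpha = 0$ and $\alpha = 1$; the agreement is therefore only approximate, with an error of order `eps`.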

The scaled skew Jensen divergence can also be expressed through the Bregman divergence:
$$sJ_{F,\alpha}(p,q) = \frac{1}{\alpha(1-\alpha)} \bigl( (1-\alpha) B_F(p, (1-\alpha)p + \alpha q) + \alpha B_F(q, (1-\alpha)p + \alpha q) \bigr) \tag{11}$$

Definition 2. (Definition of the g-convex set) Let $S$ be a vector space over the real numbers and let $g$ be a function $g : S \to S$. If $S$ satisfies the following condition, we call the set $S$ a g-convex set: for all $p$ and $q$ in $S$ and all $\alpha$ in the interval $(0,1)$, the point $(1-\alpha)g(p) + \alpha g(q)$ also belongs to $S$.

Definition 3. (Definition of the g-divergences) Let $\Omega$ be a g-convex set and let $g : \Omega \to \Omega$ be a function which satisfies $g(p) = g(q) \iff p = q$. Let $\alpha$ be a parameter in $(0,1)$. We define the (scaled) skew g-Jensen divergence, the g-Bregman divergence and the symmetric g-Bregman divergence as follows.
$$sJ^g_{F,\alpha}(p,q) := sJ_{F,\alpha}(g(p), g(q)) \tag{12}$$
$$B^g_F(p,q) := B_F(g(p), g(q)) \tag{13}$$
$$B^g_{F,sym}(p,q) := B_{F,sym}(g(p), g(q)) \tag{14}$$

From the definition of the function $g$, these functions satisfy the divergence properties $D^g(p,q) = 0 \iff g(p) = g(q) \iff p = q$ and $D^g(p,q) \ge 0$, where $D^g$ indicates any of the g-divergences. When $g(p)$ is equal to $p$ for all $p \in \Omega$, the g-divergences reduce to the original divergences. In the following, we assume that $g$ is a function having an inverse function.

The g-Bregman divergence and the symmetric g-Bregman divergence satisfy linearity, the generalized law of cosines and the generalized parallelogram law, as do the Bregman and the symmetric Bregman divergence.

Linearity. For positive constants $c_1$ and $c_2$,
$$B^g_{c_1 F_1 + c_2 F_2}(p,q) = c_1 B^g_{F_1}(p,q) + c_2 B^g_{F_2}(p,q) \tag{15}$$
holds.

Proposition 1. (Generalized law of cosines) Let $\Omega$ be a g-convex set. For points $p, q, r \in \Omega$, the following equations hold.
$$B^g_F(p,q) = B^g_F(p,r) + B^g_F(r,q) - \langle g(p) - g(r), \nabla F(g(q)) - \nabla F(g(r)) \rangle \tag{16}$$
$$B^g_{F,sym}(p,q) = B^g_{F,sym}(p,r) + B^g_{F,sym}(r,q) - \langle g(p) - g(r), \nabla F(g(q)) - \nabla F(g(r)) \rangle - \langle g(q) - g(r), \nabla F(g(p)) - \nabla F(g(r)) \rangle \tag{17}$$
These are easily proved in the same way as for the Bregman divergence by replacing $p$, $q$ and $r$ with $g(p)$, $g(q)$ and $g(r)$.

Theorem 1. (Generalized parallelogram law) Let $\Omega$ be a g-convex set. When points $p, q, r, s \in \Omega$ satisfy $g(p) + g(r) = g(q) + g(s)$, the following equation holds.
$$B^g_{F,sym}(p,q) + B^g_{F,sym}(q,r) + B^g_{F,sym}(r,s) + B^g_{F,sym}(s,p) = B^g_{F,sym}(p,r) + B^g_{F,sym}(q,s) \tag{18}$$
The left hand side is the sum over the four sides of the quadrilateral $pqrs$ and the right hand side is the sum over its two diagonals.

Proof. It has been shown that the Bregman divergence satisfies the four point identity (see equation (2.3) in [16]).
$$\langle \nabla F(r) - \nabla F(s), p - q \rangle = B_F(q,r) + B_F(p,s) - B_F(p,r) - B_F(q,s) \tag{19}$$

By replacing $p$, $q$, $r$ and $s$ with $g(p)$, $g(q)$, $g(r)$ and $g(s)$, we can prove that the g-Bregman divergence satisfies a similar four point identity.
$$\langle \nabla F(g(r)) - \nabla F(g(s)), g(p) - g(q) \rangle = B^g_F(q,r) + B^g_F(p,s) - B^g_F(p,r) - B^g_F(q,s) \tag{20}$$
The assumption gives $g(p) - g(q) = g(s) - g(r)$. Combining this with the definition of the symmetric g-Bregman divergence ((7) and (14)), we have
$$-B^g_{F,sym}(r,s) = B^g_F(q,r) + B^g_F(p,s) - B^g_F(p,r) - B^g_F(q,s). \tag{21}$$
Exchanging $p \leftrightarrow r$ and $q \leftrightarrow s$ in (21) and taking the sum with (21), the result follows.

This theorem can also be proved in the same way as described in [15].

The same relations as between the Bregman and the skew Jensen divergence also hold for the g-Bregman and the skew g-Jensen divergence.
$$B^g_F(q,p) = \lim_{\alpha \to 0} sJ^g_{F,\alpha}(p,q) \tag{22}$$
$$B^g_F(p,q) = \lim_{\alpha \to 1} sJ^g_{F,\alpha}(p,q) \tag{23}$$

Lemma 1. Let $\Omega$ be a g-convex set and let $p, q$ be points in $\Omega$. For a parameter $\alpha \in (0,1)$ and a point $r \in \Omega$ which satisfies $g(r) = (1-\alpha)g(p) + \alpha g(q)$, the scaled skew g-Jensen divergence can be expressed as follows.
$$sJ^g_{F,\alpha}(p,q) = \frac{1}{\alpha(1-\alpha)} \bigl( (1-\alpha) B^g_F(p,r) + \alpha B^g_F(q,r) \bigr) \tag{24}$$
It is easily proved in the same way as equation (11) by replacing $p$ and $q$ with $g(p)$ and $g(q)$.

Next, we derive an inequality between the skew g-Jensen and the symmetric g-Bregman divergence.

Lemma 2. Let $\Omega$ be a g-convex set and let $p, q$ be points in $\Omega$. For a parameter $\alpha \in (0,1)$ and a point $r \in \Omega$ which satisfies $g(r) = (1-\alpha)g(p) + \alpha g(q)$, the following equations hold.
$$B^g_F(p,q) = B^g_F(p,r) + B^g_F(r,q) + \frac{\alpha}{1-\alpha} B^g_{F,sym}(r,q) \tag{25}$$
$$B^g_F(q,p) = B^g_F(r,p) + B^g_F(q,r) + \frac{1-\alpha}{\alpha} B^g_{F,sym}(r,p) \tag{26}$$

Proof. We first prove (25). By the generalized law of cosines (16), we have
$$B^g_F(p,q) = B^g_F(p,r) + B^g_F(r,q) + \langle g(r) - g(p), \nabla F(g(q)) - \nabla F(g(r)) \rangle. \tag{27}$$
From the assumption, the equation
$$g(r) - g(p) = \frac{\alpha}{1-\alpha} \bigl( g(q) - g(r) \bigr) \tag{28}$$
holds. By substituting (28) into (27) and using (7) and (14), the result follows. By exchanging $p$ and $q$ (and correspondingly $\alpha$ and $1-\alpha$) in (27) and (28), we can prove (26) in the same way.

Lemma 3. Let $\Omega$ be a g-convex set and let $p, q$ be points in $\Omega$. For a parameter $\alpha \in (0,1)$ and a point $r \in \Omega$ which satisfies $g(r) = (1-\alpha)g(p) + \alpha g(q)$, the following equation holds.
$$B^g_{F,sym}(p,q) = \frac{1}{\alpha} B^g_{F,sym}(p,r) + \frac{1}{1-\alpha} B^g_{F,sym}(r,q) \tag{29}$$

Proof. Taking the sum of (25) and (26), the result follows.
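As a sanity check, the identities (25) and (29) and the generalized parallelogram law (18) can be evaluated numerically. The sketch below is illustrative only and not part of the original paper: it assumes the particular choice $g = \ln$ (applied componentwise, with inverse $\exp$) and $F(u) = \sum_i \exp(u_i)$, arbitrary positive test vectors, and hypothetical helper names such as `g_bregman` and `g_point`. Each printed residual should vanish up to floating-point rounding.

```python
import math

# Assumed illustrative choice: g = ln (inverse exp), F(u) = sum_i exp(u_i).
def g(p):
    return [math.log(x) for x in p]

def g_inv(u):
    return [math.exp(x) for x in u]

def F(u):
    return sum(math.exp(x) for x in u)

def grad_F(u):
    return [math.exp(x) for x in u]

def g_bregman(p, q):
    # B^g_F(p, q) = B_F(g(p), g(q)), equation (13).
    u, v = g(p), g(q)
    inner = sum((a - b) * d for a, b, d in zip(u, v, grad_F(v)))
    return F(u) - F(v) - inner

def g_bregman_sym(p, q):
    # Symmetric g-Bregman divergence, equation (14).
    return g_bregman(p, q) + g_bregman(q, p)

def g_point(p, q, alpha):
    # The point r with g(r) = (1 - alpha) g(p) + alpha g(q).
    mixed = [(1 - alpha) * a + alpha * b for a, b in zip(g(p), g(q))]
    return g_inv(mixed)

# Arbitrary positive test vectors (illustrative values only).
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]
alpha = 0.3
r = g_point(p, q, alpha)

# Residual of Lemma 2, equation (25).
res_25 = g_bregman(p, q) - (g_bregman(p, r) + g_bregman(r, q)
                            + alpha / (1 - alpha) * g_bregman_sym(r, q))
# Residual of Lemma 3, equation (29).
res_29 = g_bregman_sym(p, q) - (g_bregman_sym(p, r) / alpha
                                + g_bregman_sym(r, q) / (1 - alpha))

# Parallelogram law: choose a, b, c freely and define d by g(a) + g(c) = g(b) + g(d).
a, b, c = p, q, [0.1, 0.3, 0.6]
d = g_inv([x + z - y for x, y, z in zip(g(a), g(b), g(c))])
sides = (g_bregman_sym(a, b) + g_bregman_sym(b, c)
         + g_bregman_sym(c, d) + g_bregman_sym(d, a))
diagonals = g_bregman_sym(a, c) + g_bregman_sym(b, d)
res_18 = sides - diagonals

print(res_25, res_29, res_18)
```

The test vectors need not be normalized: with this choice of $g$, any vectors with positive entries are admissible, and the fourth vertex `d` is in general not a probability vector.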

Theorem 2. (Jensen-Bregman inequality) Let $\Omega$ be a g-convex set and let $p, q$ be points in $\Omega$. For a parameter $\alpha \in (0,1)$, the scaled skew g-Jensen divergence and the symmetric g-Bregman divergence satisfy the following inequality.
$$B^g_{F,sym}(p,q) \ge sJ^g_{F,\alpha}(p,q) \tag{30}$$

Proof. Let $r \in \Omega$ be a point which satisfies $g(r) = (1-\alpha)g(p) + \alpha g(q)$. From (24) and $B^g_F(p,q) \le B^g_{F,sym}(p,q)$, we have
$$sJ^g_{F,\alpha}(p,q) = \frac{1}{\alpha} B^g_F(p,r) + \frac{1}{1-\alpha} B^g_F(q,r) \le \frac{1}{\alpha} B^g_{F,sym}(p,r) + \frac{1}{1-\alpha} B^g_{F,sym}(q,r). \tag{31}$$
Using Lemma 3, we have
$$sJ^g_{F,\alpha}(p,q) \le \frac{1}{\alpha} B^g_{F,sym}(p,r) + \frac{1}{1-\alpha} B^g_{F,sym}(r,q) = B^g_{F,sym}(p,q). \tag{32}$$

3. Examples

We show some examples of the g-divergences. We denote the g-Bregman divergence as GBD, the symmetric g-Bregman divergence as SGBD, the skew g-Jensen divergence as SGJD and the Jensen-Bregman inequality (30) as JB-inequality. The parameter $\alpha$ is in $(0,1)$ except in example 3, where $\alpha$ is the order of the α-divergence and we use a parameter $\beta \in (0,1)$ as the skew parameter instead. Examples 1 to 6 are cases in which the GBD belongs to the class of f-divergences, and example 7 is the case of information geometry [2-4] and statistical mechanics.

1) KL-divergence, Jeffreys divergence, JS-divergence
Generating functions: $F(p) = \sum_i p_i \ln p_i$, $g(p) = p$
GBD: generalized KL-divergence $KL(p\|q) := \sum_i \bigl( -p_i + q_i + p_i \ln\frac{p_i}{q_i} \bigr)$
SGBD: Jeffreys divergence (J-divergence) [10] $J(p,q) := \sum_i (p_i - q_i) \ln\frac{p_i}{q_i}$
SGJD: scaled skew JS-divergence $\frac{1}{\alpha(1-\alpha)} JS_\alpha(p\|q) := \frac{1}{\alpha(1-\alpha)} \bigl( (1-\alpha) KL(p\|(1-\alpha)p + \alpha q) + \alpha KL(q\|(1-\alpha)p + \alpha q) \bigr)$
JB-inequality: generalized Lin's inequality $\alpha(1-\alpha) J(p,q) \ge JS_\alpha(p\|q)$. When $\alpha$ is equal to $\frac{1}{2}$, this inequality is consistent with Lin's inequality [12].
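2) KL-divergence, Jeffreys divergence, α-divergence
Generating functions: $F(p) = \sum_i \exp(p_i)$, $g(p) = \ln p$
GBD: generalized reverse KL-divergence $KL(q\|p) := \sum_i \bigl( p_i - q_i + q_i \ln\frac{q_i}{p_i} \bigr)$
SGBD: J-divergence $J(p,q) := \sum_i (p_i - q_i) \ln\frac{p_i}{q_i}$
SGJD: generalized α-divergence $D_{1-\alpha}(p\|q)$, where $D_\alpha(p\|q) := \frac{1}{\alpha(1-\alpha)} \sum_i \bigl( \alpha p_i + (1-\alpha) q_i - p_i^{\alpha} q_i^{1-\alpha} \bigr)$
JB-inequality: $J(p,q) \ge D_{1-\alpha}(p\|q)$

3) α-divergence
Generating functions: $F(p) = \frac{1}{1-\alpha} \sum_i p_i^{1/\alpha}$, $g(p) = p^{\alpha}$ for $\alpha \in \mathbb{R} \setminus \{0,1\}$
GBD: generalized α-divergence $D_\alpha(p\|q)$
SGBD: symmetric α-divergence $D_{\alpha,sym}(p,q) := D_\alpha(p\|q) + D_\alpha(q\|p) = \frac{1}{\alpha(1-\alpha)} \sum_i (p_i^{\alpha} - q_i^{\alpha})(p_i^{1-\alpha} - q_i^{1-\alpha})$
SGJD: $\frac{1}{\beta(1-\beta)(1-\alpha)} \sum_i \Bigl( (1-\beta) p_i + \beta q_i - \bigl( (1-\beta) p_i^{\alpha} + \beta q_i^{\alpha} \bigr)^{1/\alpha} \Bigr)$
JB-inequality: $D_{\alpha,sym}(p,q) \ge \frac{1}{\beta(1-\beta)(1-\alpha)} \sum_i \Bigl( (1-\beta) p_i + \beta q_i - \bigl( (1-\beta) p_i^{\alpha} + \beta q_i^{\alpha} \bigr)^{1/\alpha} \Bigr)$

By using the equations $\frac{1}{2} D_{(1/2)}(p\|q) = H^2(p,q)$, $2 D_{(2)}(p\|q) = D_{\chi^2,P}(p\|q)$ and $2 D_{(-1)}(p\|q) = D_{\chi^2,N}(p\|q)$ [8], we obtain examples 4 to 6.

4) Hellinger distance
Generating functions: $F(p) = \sum_i p_i^2$, $g(p) = \sqrt{p}$
GBD: Hellinger distance $H^2(p,q) := \sum_i \bigl( \sqrt{p_i} - \sqrt{q_i} \bigr)^2$
SGBD: Hellinger distance $2H^2(p,q)$
SGJD: Hellinger distance $H^2(p,q)$
JB-inequality: trivial

5) Pearson χ-square divergence
Generating functions: $F(p) = -2 \sum_i \sqrt{p_i}$, $g(p) = p^2$
GBD: Pearson χ-square divergence $D_{\chi^2,P}(p\|q) := \sum_i \frac{(p_i - q_i)^2}{q_i}$
SGBD: symmetric χ-square divergence $D_{\chi^2,sym}(p,q) := D_{\chi^2,P}(p\|q) + D_{\chi^2,P}(q\|p)$
SGJD: $\frac{2}{\alpha(1-\alpha)} \sum_i \Bigl( -(1-\alpha) p_i - \alpha q_i + \sqrt{(1-\alpha) p_i^2 + \alpha q_i^2} \Bigr)$
JB-inequality: $D_{\chi^2,sym}(p,q) \ge \frac{2}{\alpha(1-\alpha)} \sum_i \Bigl( -(1-\alpha) p_i - \alpha q_i + \sqrt{(1-\alpha) p_i^2 + \alpha q_i^2} \Bigr)$

6) Neyman χ-square divergence
Generating functions: $F(p) = \sum_i \frac{1}{p_i}$, $g(p) = \frac{1}{p}$
GBD: Neyman χ-square divergence $D_{\chi^2,N}(p\|q) := \sum_i \frac{(p_i - q_i)^2}{p_i}$
SGBD: symmetric χ-square divergence $D_{\chi^2,sym}(p,q)$
SGJD: $\frac{1}{\alpha(1-\alpha)} \sum_i \Bigl( (1-\alpha) p_i + \alpha q_i - \frac{p_i q_i}{(1-\alpha) q_i + \alpha p_i} \Bigr)$
JB-inequality: $D_{\chi^2,sym}(p,q) \ge \frac{1}{\alpha(1-\alpha)} \sum_i \Bigl( (1-\alpha) p_i + \alpha q_i - \frac{p_i q_i}{(1-\alpha) q_i + \alpha p_i} \Bigr)$

7) Information geometry and statistical mechanics
In a dually flat space $M$, there exist two affine coordinate systems $\theta^i$, $\eta_i$ and two convex functions $\psi(\theta)$, $\varphi(\eta)$ called potentials. For points $P, Q \in M$, we put $p_i = \theta^i(P)$, $q_i = \theta^i(Q)$ or $p_i = \eta_i(P)$, $q_i = \eta_i(Q)$.
Generating functions: $F(p) = \psi(\theta)$ or $F(p) = \varphi(\eta)$, $g(p) = p$
GBD: canonical divergence $D(P\|Q) := \varphi(\eta(Q)) + \psi(\theta(P)) - \sum_i \eta_i(Q) \theta^i(P)$
SGBD: $D_A(P,Q) := \sum_i (\eta_i(Q) - \eta_i(P))(\theta^i(Q) - \theta^i(P))$
SGJD: $sJ_{\psi,\alpha}(\theta(P), \theta(Q))$ or $sJ_{\varphi,\alpha}(\eta(P), \eta(Q))$
JB-inequality: $D_A(P,Q) \ge sJ_{\psi,\alpha}(\theta(P), \theta(Q))$ or $D_A(P,Q) \ge sJ_{\varphi,\alpha}(\eta(P), \eta(Q))$

For the exponential or mixture families, the GBD is equal to the KL-divergence and the SGBD is equal to the J-divergence. For the exponential families, the SGJD $sJ_{\psi,\alpha}(\theta(P), \theta(Q))$ is equal to the skew Bhattacharyya distance [5,14], and for the mixture families, the SGJD $sJ_{\varphi,\alpha}(\eta(P), \eta(Q))$ is equal to the skew JS-divergence (see [15] for details). In the case of the canonical ensemble in statistical mechanics, the probability distribution function belongs to the exponential families.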
Because the manifold of the exponential family is a dually flat space, the same equations and inequality hold for
$$\theta = -\beta := -\frac{1}{k_B T} \tag{33}$$
$$\eta = U \tag{34}$$
$$\psi = -\beta F \tag{35}$$
$$\varphi = -\frac{S}{k_B}, \tag{36}$$
where $k_B$ is the Boltzmann constant, $T$ is the temperature, $U$ is the internal energy, $S$ is the entropy and $F$ is the Helmholtz free energy. Defining the parameter $\alpha$ as an index of the state which satisfies $\frac{1}{T_\alpha} = \frac{1-\alpha}{T_0} + \frac{\alpha}{T_1}$ or $U_\alpha = (1-\alpha) U_0 + \alpha U_1$, the JB-inequality can be written as follows.
$$\alpha(1-\alpha) \frac{(T_1 - T_0)(U_1 - U_0)}{T_0 T_1} \ge \frac{F_\alpha}{T_\alpha} - (1-\alpha) \frac{F_0}{T_0} - \alpha \frac{F_1}{T_1} \tag{37}$$
$$\alpha(1-\alpha) \frac{(T_1 - T_0)(U_1 - U_0)}{T_0 T_1} \ge S_\alpha - (1-\alpha) S_0 - \alpha S_1 \tag{38}$$

4. Conclusion

We have introduced the g-Bregman divergence and the skew g-Jensen divergence by extending the definitions of the Bregman and the skew Jensen divergence, respectively. We have shown that they satisfy some properties similar to those of the Bregman and the skew Jensen divergence. Furthermore, we have derived an inequality between the symmetric g-Bregman divergence and the skew g-Jensen divergence, and we have shown that the g-divergences include divergences which belong to the class of f-divergences. If the relationships among these three classes of divergences are studied in more detail, applications to various fields, including machine learning, can be expected.

References

[1] Syed Mumtaz Ali and Samuel D. Silvey. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society. Series B (Methodological), pages 131-142, 1966.
[2] Shun-ichi Amari. Information geometry and its applications. Springer, 2016.
[3] Shun-ichi Amari and Andrzej Cichocki. Information geometry of divergence functions. Bulletin of the Polish Academy of Sciences: Technical Sciences, 58(1):183-195, 2010.
[4] Shun-ichi Amari and Hiroshi Nagaoka. Methods of information geometry, volume 191. American Mathematical Soc.
[5] Anil Bhattacharyya. On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc., 35:99-109, 1943.
[6] Lev M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7(3):200-217, 1967.
[7] Jacob Burbea and C. Radhakrishna Rao. On the convexity of some divergence measures based on entropy functions. Technical report, Pittsburgh Univ PA Inst for Statistics and Applications, 1980.
[8] Andrzej Cichocki and Shun-ichi Amari. Families of alpha- beta- and gamma-divergences: Flexible and robust measures of similarities. Entropy, 12(6), 2010.
[9] Imre Csiszár. Information-type measures of difference of probability distributions and indirect observation. Studia Scientiarum Mathematicarum Hungarica, 2:229-318, 1967.
[10] Harold Jeffreys. An invariant form for the prior probability in estimation problems. Proc. R. Soc. Lond. A, 186(1007):453-461, 1946.
[11] Solomon Kullback. Information theory and statistics. Courier Corporation, 1997.
[12] Jianhua Lin. Divergence measures based on the Shannon entropy. IEEE Transactions on Information Theory, 37(1):145-151, 1991.
[13] Frank Nielsen. A family of statistical symmetric divergences based on Jensen's inequality. arXiv preprint, 2010.
[14] Frank Nielsen and Sylvain Boltz. The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8), 2011.
[15] Tomohiro Nishiyama. Divergence functions in dually flat spaces and their properties. arXiv preprint, 2018.
[16] Simeon Reich and Shoham Sabach. Two strong convergence theorems for Bregman strongly nonexpansive operators in reflexive Banach spaces. Nonlinear Analysis: Theory, Methods & Applications, 73(1):122-135, 2010.


More information

Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry

Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry Approximating Covering and Minimum Enclosing Balls in Hyperbolic Geometry Frank Nielsen 1 and Gaëtan Hadjeres 1 École Polytechnique, France/Sony Computer Science Laboratories, Japan Frank.Nielsen@acm.org

More information

2. A diagonal of a parallelogram divides it into two congruent triangles. 5. Diagonals of a rectangle bisect each other and are equal and vice-versa.

2. A diagonal of a parallelogram divides it into two congruent triangles. 5. Diagonals of a rectangle bisect each other and are equal and vice-versa. QURILTERLS 1. Sum of the angles of a quadrilateral is 360. 2. diagonal of a parallelogram divides it into two congruent triangles. 3. In a parallelogram, (i) opposite sides are equal (ii) opposite angles

More information
