On the Chi square and higher-order Chi distances for approximating f-divergences


Slide 1/17. On the Chi square and higher-order Chi distances for approximating f-divergences

Frank Nielsen (Sony Computer Science Laboratories, Inc.) and Richard Nock (UAG-CEREGMIA)
September 2013
© 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.

Slide 2/17. Statistical divergences

A statistical divergence measures the separability between two distributions. Examples: the Pearson/Neyman $\chi^2$ and the Kullback-Leibler divergence:

$$\chi^2_P(X_1 : X_2) = \int \frac{(x_2(x) - x_1(x))^2}{x_1(x)} \, d\nu(x),$$
$$\chi^2_N(X_1 : X_2) = \int \frac{(x_1(x) - x_2(x))^2}{x_2(x)} \, d\nu(x),$$
$$\mathrm{KL}(X_1 : X_2) = \int x_1(x) \log \frac{x_1(x)}{x_2(x)} \, d\nu(x).$$

Slide 3/17. f-divergences: A generic definition

$$I_f(X_1 : X_2) = \int x_1(x) \, f\!\left(\frac{x_2(x)}{x_1(x)}\right) d\nu(x) \geq 0,$$

where $f : (0,\infty) \subseteq \mathrm{dom}(f) \to [0,\infty]$ is a convex function such that $f(1) = 0$.

Jensen's inequality: $I_f(X_1 : X_2) \geq f\left(\int x_2(x) \, d\nu(x)\right) = f(1) = 0$.

One may further require $f'(1) = 0$ and fix the scale of the divergence by setting $f''(1) = 1$.

An f-divergence can always be symmetrized: $S_f(X_1 : X_2) = I_f(X_1 : X_2) + I_{f^*}(X_1 : X_2)$, with the conjugate generator $f^*(u) = u f(1/u)$ for which $I_{f^*}(X_1 : X_2) = I_f(X_2 : X_1)$.

Slide 4/17. f-divergences: Some examples

Name of the f-divergence; formula $I_f(P : Q)$; generator $f(u)$ with $f(1) = 0$:

- Total variation (metric): $\frac{1}{2}\int |p(x) - q(x)|\, d\nu(x)$; $f(u) = \frac{1}{2}|u - 1|$
- Squared Hellinger: $\int (\sqrt{p(x)} - \sqrt{q(x)})^2\, d\nu(x)$; $f(u) = (\sqrt{u} - 1)^2$
- Pearson $\chi_P^2$: $\int \frac{(q(x) - p(x))^2}{p(x)}\, d\nu(x)$; $f(u) = (u - 1)^2$
- Neyman $\chi_N^2$: $\int \frac{(p(x) - q(x))^2}{q(x)}\, d\nu(x)$; $f(u) = \frac{(1 - u)^2}{u}$
- Pearson-Vajda $\chi_P^k$: $\int \frac{(q(x) - p(x))^k}{p^{k-1}(x)}\, d\nu(x)$; $f(u) = (u - 1)^k$
- Pearson-Vajda $|\chi|_P^k$: $\int \frac{|q(x) - p(x)|^k}{p^{k-1}(x)}\, d\nu(x)$; $f(u) = |u - 1|^k$
- Kullback-Leibler: $\int p(x)\log\frac{p(x)}{q(x)}\, d\nu(x)$; $f(u) = -\log u$
- Reverse Kullback-Leibler: $\int q(x)\log\frac{q(x)}{p(x)}\, d\nu(x)$; $f(u) = u\log u$
- $\alpha$-divergence: $\frac{4}{1-\alpha^2}\left(1 - \int p^{\frac{1-\alpha}{2}}(x)\, q^{\frac{1+\alpha}{2}}(x)\, d\nu(x)\right)$; $f(u) = \frac{4}{1-\alpha^2}\left(1 - u^{\frac{1+\alpha}{2}}\right)$
- Jensen-Shannon: $\frac{1}{2}\int \left(p(x)\log\frac{2p(x)}{p(x)+q(x)} + q(x)\log\frac{2q(x)}{p(x)+q(x)}\right) d\nu(x)$; $f(u) = -(u+1)\log\frac{1+u}{2} + u\log u$

Slide 5/17. Stochastic approximations of f-divergences

$$\hat{I}^{(n)}_f(X_1 : X_2) = \frac{1}{2n} \sum_{i=1}^{n} \left( f\!\left(\frac{x_2(s_i)}{x_1(s_i)}\right) + \frac{x_1(t_i)}{x_2(t_i)} \, f\!\left(\frac{x_2(t_i)}{x_1(t_i)}\right) \right),$$

with $s_1, \ldots, s_n$ and $t_1, \ldots, t_n$ IID sampled from $X_1$ and $X_2$, respectively.

$\hat{I}^{(n)}_f(X_1 : X_2) \to I_f(X_1 : X_2)$ as $n \to \infty$. This works for any generator $f$, but in practice it is limited to supports of small dimension.
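As an illustration (a minimal sketch of ours, not the authors' implementation; all names are illustrative), this Monte Carlo estimator can be coded for the KL generator $f(u) = -\log u$ between two Poisson distributions:

```python
import numpy as np
from scipy.stats import poisson

def f_kl(u):
    # KL generator f(u) = -log(u), with f(1) = 0.
    return -np.log(u)

def stochastic_f_divergence(f, pdf1, pdf2, sample1, sample2, n=1_000_000, seed=0):
    """Monte Carlo estimate of I_f(X1 : X2), averaging the two IID estimators."""
    rng = np.random.default_rng(seed)
    s = sample1(rng, n)  # s_1, ..., s_n ~ X1
    t = sample2(rng, n)  # t_1, ..., t_n ~ X2
    term1 = f(pdf2(s) / pdf1(s))                        # ~ E_{X1}[f(x2/x1)]
    term2 = (pdf1(t) / pdf2(t)) * f(pdf2(t) / pdf1(t))  # ~ E_{X2}[(x1/x2) f(x2/x1)]
    return 0.5 * (term1.mean() + term2.mean())

lam1, lam2 = 0.6, 0.3
est = stochastic_f_divergence(
    f_kl,
    pdf1=lambda x: poisson.pmf(x, lam1),
    pdf2=lambda x: poisson.pmf(x, lam2),
    sample1=lambda rng, n: rng.poisson(lam1, n),
    sample2=lambda rng, n: rng.poisson(lam2, n))
print(est)  # approaches KL(Poi(0.6) : Poi(0.3)) as n grows
```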

Slide 6/17. Exponential families

Canonical decomposition of the probability measure:

$$p_\theta(x) = \exp(\langle t(x), \theta \rangle - F(\theta) + k(x)).$$

Here, we consider families whose natural parameter space $\Theta$ is affine.

$$\mathrm{Poi}(\lambda): \quad p(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad \lambda > 0, \; x \in \{0, 1, \ldots\}$$
$$\mathrm{Nor}_I(\mu): \quad p(x \mid \mu) = (2\pi)^{-\frac{d}{2}} e^{-\frac{1}{2}(x - \mu)^\top (x - \mu)}, \quad \mu \in \mathbb{R}^d, \; x \in \mathbb{R}^d$$

Family | $\theta$ | $\Theta$ | $F(\theta)$ | $k(x)$ | $t(x)$ | $\nu$
Poisson | $\log\lambda$ | $\mathbb{R}$ | $e^\theta$ | $-\log x!$ | $x$ | $\nu_c$ (counting)
Iso. Gaussian | $\mu$ | $\mathbb{R}^d$ | $\frac{1}{2}\theta^\top\theta + \frac{d}{2}\log 2\pi$ | $-\frac{1}{2}x^\top x$ | $x$ | $\nu_L$ (Lebesgue)
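As a quick sanity check (a sketch of ours, not from the talk), the Poisson row of this table can be verified numerically: $\exp(\langle t(x), \theta\rangle - F(\theta) + k(x))$ reproduces the Poisson probability mass function.

```python
import math

def poisson_pmf_via_exp_family(x, lam):
    # Poisson row of the table: theta = log(lam), F(theta) = exp(theta),
    # t(x) = x, k(x) = -log(x!), with log(x!) = lgamma(x + 1).
    theta = math.log(lam)
    return math.exp(x * theta - math.exp(theta) - math.lgamma(x + 1))

lam = 0.6
for x in range(6):
    direct = lam**x * math.exp(-lam) / math.factorial(x)
    assert abs(poisson_pmf_via_exp_family(x, lam) - direct) < 1e-12
```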

Slide 7/17. χ² for affine exponential families

Bypass the integral computation with closed-form formulas:

$$\chi^2_P(X_1 : X_2) = e^{F(2\theta_2 - \theta_1) - (2F(\theta_2) - F(\theta_1))} - 1,$$
$$\chi^2_N(X_1 : X_2) = e^{F(2\theta_1 - \theta_2) - (2F(\theta_1) - F(\theta_2))} - 1.$$

The Kullback-Leibler divergence amounts to a Bregman divergence [3]:

$$\mathrm{KL}(X_1 : X_2) = B_F(\theta_2 : \theta_1), \quad B_F(\theta : \theta') = F(\theta) - F(\theta') - (\theta - \theta')^\top \nabla F(\theta').$$
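A sketch of the Pearson formula for two Poisson distributions (our illustration, not the authors' package; with $F(\theta) = e^\theta$ the term $F(2\theta_2 - \theta_1)$ becomes $\lambda_2^2/\lambda_1$), cross-checked against direct summation:

```python
import math

def chi2_pearson_poisson(lam1, lam2):
    # exp(F(2*theta2 - theta1) - (2*F(theta2) - F(theta1))) - 1
    # with theta_i = log(lam_i) and F(theta) = exp(theta).
    return math.exp(lam2**2 / lam1 - (2 * lam2 - lam1)) - 1.0

def chi2_pearson_numeric(lam1, lam2, x_max=50):
    # Direct summation of (p2(x) - p1(x))^2 / p1(x) over the support.
    pmf = lambda x, lam: lam**x * math.exp(-lam) / math.factorial(x)
    return sum((pmf(x, lam2) - pmf(x, lam1))**2 / pmf(x, lam1)
               for x in range(x_max))

print(chi2_pearson_poisson(0.6, 0.3))  # closed form
print(chi2_pearson_numeric(0.6, 0.3))  # agrees up to truncation of the tail
```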

Slide 8/17. Higher-order Vajda χ^k divergences

$$\chi^k_P(X_1 : X_2) = \int \frac{(x_2(x) - x_1(x))^k}{x_1(x)^{k-1}} \, d\nu(x),$$
$$|\chi|^k_P(X_1 : X_2) = \int \frac{|x_2(x) - x_1(x)|^k}{x_1(x)^{k-1}} \, d\nu(x)$$

are f-divergences for the generators $(u - 1)^k$ and $|u - 1|^k$, respectively.

When $k = 1$, $\chi^1_P(X_1 : X_2) = \int (x_2(x) - x_1(x)) \, d\nu(x) = 0$ (never discriminative), while $|\chi|^1_P(X_1, X_2)$ is twice the total variation distance. $\chi^0_P$ is the unit constant, and $\chi^k_P$ is a signed distance.

Slide 9/17. Higher-order Vajda χ^k divergences

Lemma. The (signed) $\chi^k_P$ distance between members $X_1 \sim E_F(\theta_1)$ and $X_2 \sim E_F(\theta_2)$ of the same affine exponential family is ($k \in \mathbb{N}$) always bounded and equal to:

$$\chi^k_P(X_1 : X_2) = \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} \, e^{F((1-j)\theta_1 + j\theta_2) - ((1-j)F(\theta_1) + jF(\theta_2))}.$$
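The lemma is straightforward to implement once $F$ is known (a generic sketch with names of our choosing); for unit-variance Gaussians, for example, it recovers $\chi^2_P = e^{(\mu_1 - \mu_2)^2} - 1$:

```python
import math

def chi_k_affine(F, theta1, theta2, k):
    # Binomial sum of the lemma: signed chi^k_P between E_F(theta1) and E_F(theta2).
    return sum((-1)**(k - j) * math.comb(k, j)
               * math.exp(F((1 - j) * theta1 + j * theta2)
                          - ((1 - j) * F(theta1) + j * F(theta2)))
               for j in range(k + 1))

# 1D isotropic Gaussian: theta = mu, F(theta) = theta^2/2 + log(2*pi)/2.
F_gauss = lambda th: th**2 / 2 + math.log(2 * math.pi) / 2
print(chi_k_affine(F_gauss, 0.0, 1.0, 1))  # k = 1: identically 0
print(chi_k_affine(F_gauss, 0.0, 1.0, 2))  # k = 2: exp((mu1 - mu2)^2) - 1
```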

Slide 10/17. Higher-order Vajda χ^k divergences

For Poisson and isotropic Normal distributions, we get closed-form formulas:

$$\chi^k_P(\lambda_1 : \lambda_2) = \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} \, e^{\lambda_1^{1-j}\lambda_2^{j} - ((1-j)\lambda_1 + j\lambda_2)},$$
$$\chi^k_P(\mu_1 : \mu_2) = \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} \, e^{\frac{1}{2}j(j-1)(\mu_1 - \mu_2)^\top(\mu_1 - \mu_2)}.$$

These are signed distances.
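A minimal sketch of the Poisson formula (illustrative code of ours, not the authors' Java package), again cross-checked against a direct summation of the defining integral:

```python
import math

def chi_k_poisson(lam1, lam2, k):
    # sum_j (-1)^(k-j) C(k,j) exp(lam1^(1-j) * lam2^j - ((1-j)*lam1 + j*lam2))
    return sum((-1)**(k - j) * math.comb(k, j)
               * math.exp(lam1**(1 - j) * lam2**j - ((1 - j) * lam1 + j * lam2))
               for j in range(k + 1))

def chi_k_numeric(lam1, lam2, k, x_max=50):
    # Direct summation of (p2(x) - p1(x))^k / p1(x)^(k-1) over the support.
    pmf = lambda x, lam: lam**x * math.exp(-lam) / math.factorial(x)
    return sum((pmf(x, lam2) - pmf(x, lam1))**k / pmf(x, lam1)**(k - 1)
               for x in range(x_max))

for k in range(1, 5):
    print(k, chi_k_poisson(0.6, 0.3, k), chi_k_numeric(0.6, 0.3, k))
```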

Slide 11/17. f-divergences from Taylor series

Lemma (extends Theorem 1 of [1]). When bounded, the f-divergence $I_f$ can be expressed as a power series of higher-order chi-type distances:

$$I_f(X_1 : X_2) = \sum_{i=0}^{\infty} \int x_1(x) \, \frac{1}{i!} f^{(i)}(\lambda) \left(\frac{x_2(x)}{x_1(x)} - \lambda\right)^{i} d\nu(x) = \sum_{i=0}^{\infty} \frac{1}{i!} f^{(i)}(\lambda) \, \chi^i_{\lambda,P}(X_1 : X_2),$$

provided $I_f < \infty$, where $\chi^i_{\lambda,P}$ generalizes $\chi^i_P$:

$$\chi^i_{\lambda,P}(X_1 : X_2) = \int \frac{(x_2(x) - \lambda x_1(x))^i}{x_1(x)^{i-1}} \, d\nu(x),$$

with $\chi^0_{\lambda,P}(X_1 : X_2) = 1$ by convention. Note that $\chi^k_{\lambda,P} - (1 - \lambda)^k$ is an f-divergence for the generator $f(u) = (u - \lambda)^k - (1 - \lambda)^k$, which satisfies $f(1) = 0$.

Slide 12/17. f-divergences: Analytic formula

For $\lambda = 1 \in \mathrm{int}(\mathrm{dom}(f^{(i)}))$, truncating the series bounds the f-divergence (Theorem 1 of [1]):

$$\left| I_f(X_1 : X_2) - \sum_{k=0}^{s} \frac{f^{(k)}(1)}{k!} \, \chi^k_P(X_1 : X_2) \right| \leq \frac{1}{(s+1)!} \, \|f^{(s+1)}\|_\infty \, (M - m)^{s+1},$$

where $\|f^{(s+1)}\|_\infty = \sup_{t \in [m,M]} |f^{(s+1)}(t)|$ and $m \leq \frac{q(x)}{p(x)} \leq M$.

For $\lambda = 0$ (whenever $0 \in \mathrm{int}(\mathrm{dom}(f^{(i)}))$) and affine exponential families, we get the simpler expression:

$$I_f(X_1 : X_2) = \sum_{i=0}^{\infty} \frac{f^{(i)}(0)}{i!} \, I_{1-i,i}(\theta_1 : \theta_2), \quad I_{1-i,i}(\theta_1 : \theta_2) = e^{F(i\theta_2 + (1-i)\theta_1) - (iF(\theta_2) + (1-i)F(\theta_1))}.$$

Slide 13/17. Corollary: Approximating f-divergences by χ² divergences

Corollary. A second-order Taylor expansion yields

$$I_f(X_1 : X_2) \approx f(1) + f'(1)\,\chi^1_N(X_1 : X_2) + \frac{1}{2} f''(1)\,\chi^2_N(X_1 : X_2).$$

Since $f(1) = 0$ and $\chi^1_N(X_1 : X_2) = 0$, it follows that

$$I_f(X_1 : X_2) \approx \frac{f''(1)}{2} \, \chi^2_N(X_1 : X_2)$$

($f''(1) > 0$ follows from the strict convexity of the generator). When $f(u) = u\log u$, this yields the well-known approximation [2]:

$$\chi^2_P(X_1 : X_2) \approx 2\,\mathrm{KL}(X_1 : X_2).$$

Slide 14/17. Kullback-Leibler divergence: Analytic expression

For the Kullback-Leibler divergence, $f(u) = -\log u$, so $f^{(i)}(u) = (-1)^i (i-1)! \, u^{-i}$ and hence $\frac{f^{(i)}(1)}{i!} = \frac{(-1)^i}{i}$ for $i \geq 1$ (with $f(1) = 0$). Since $\chi^1_{1,P} = 0$, it follows that

$$\mathrm{KL}(X_1 : X_2) = \sum_{j=2}^{\infty} \frac{(-1)^j}{j} \, \chi^j_P(X_1 : X_2),$$

an alternating-sign series.

Example with Poisson distributions, $\lambda_1 = 0.6$ and $\lambda_2 = 0.3$: KL evaluated exactly via the Bregman divergence, via stochastic estimation with $n = 10^6$ samples, and via the Taylor truncations at $s = 2, 3, 4, 10, 15$, etc.
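A sketch reproducing this experiment (our code, reusing the Poisson $\chi^k$ closed form of slide 10); the truncated sums approach the exact Bregman value as $s$ grows:

```python
import math

def chi_k_poisson(lam1, lam2, k):
    # Closed-form signed chi^k_P for Poisson distributions (slide 10).
    return sum((-1)**(k - j) * math.comb(k, j)
               * math.exp(lam1**(1 - j) * lam2**j - ((1 - j) * lam1 + j * lam2))
               for j in range(k + 1))

def kl_poisson_exact(lam1, lam2):
    # Bregman divergence B_F(theta2 : theta1) with F(theta) = exp(theta):
    # KL(Poi(lam1) : Poi(lam2)) = lam2 - lam1 + lam1 * log(lam1/lam2).
    return lam2 - lam1 + lam1 * math.log(lam1 / lam2)

def kl_poisson_truncated(lam1, lam2, s):
    # Truncated alternating series: sum_{j=2}^{s} (-1)^j / j * chi^j_P.
    return sum((-1)**j / j * chi_k_poisson(lam1, lam2, j) for j in range(2, s + 1))

lam1, lam2 = 0.6, 0.3
print("exact:", kl_poisson_exact(lam1, lam2))
for s in (2, 3, 4, 10, 15):
    print("s =", s, kl_poisson_truncated(lam1, lam2, s))
```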

Slide 15/17. Contributions

- Statistical f-divergences between members of the same exponential family with affine natural parameter space.
- A generic closed-form formula for the Pearson/Neyman $\chi^2$ and the Vajda $\chi^k$-type distances.
- An analytic expression of f-divergences using Pearson-Vajda-type distances.
- A second-order Taylor approximation for fast estimation of f-divergences.
- Java™ package:

Slide 16/17. Thank you!

@article{,
  author = "Frank Nielsen and Richard Nock",
  title = "On the {C}hi square and higher-order {C}hi distances for approximating $f$-divergences",
  year = "2013",
  eprint = "arxiv/ "
}

Slide 17/17. Bibliographic references

[1] N. S. Barnett, P. Cerone, S. S. Dragomir, and A. Sofo. Approximating Csiszár f-divergence by the use of Taylor's formula with integral remainder. Mathematical Inequalities & Applications, 5(3).
[2] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley-Interscience, New York, NY, USA.
[3] Frank Nielsen and Sylvain Boltz. The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8), August 2011.
