On the Chi square and higher-order Chi distances for approximating f-divergences
Slide 1/17: On the Chi square and higher-order Chi distances for approximating f-divergences
Frank Nielsen (Sony Computer Science Laboratories, Inc.) and Richard Nock (UAG-CEREGMIA)
September 2013
c 2013 Frank Nielsen, Sony Computer Science Laboratories, Inc.
Slide 2/17: Statistical divergences
A statistical divergence measures the separability between two distributions. Examples: the Pearson and Neyman chi-square divergences and the Kullback-Leibler divergence:

$\chi^2_P(X_1 : X_2) = \int \frac{(x_2(x) - x_1(x))^2}{x_1(x)} \, d\nu(x)$
$\chi^2_N(X_1 : X_2) = \int \frac{(x_1(x) - x_2(x))^2}{x_2(x)} \, d\nu(x)$
$\mathrm{KL}(X_1 : X_2) = \int x_1(x) \log \frac{x_1(x)}{x_2(x)} \, d\nu(x)$
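These three divergences are straightforward to evaluate on finite discrete distributions. A minimal Python sketch (the helper names and example distributions are mine, not from the slides), following the slides' convention that the first argument plays the role of $x_1$:

```python
import math

def chi2_pearson(x1, x2):
    # Pearson chi-square: sum of (x2 - x1)^2 / x1 over a finite support
    return sum((b - a) ** 2 / a for a, b in zip(x1, x2))

def chi2_neyman(x1, x2):
    # Neyman chi-square: sum of (x1 - x2)^2 / x2 over a finite support
    return sum((a - b) ** 2 / b for a, b in zip(x1, x2))

def kl(x1, x2):
    # Kullback-Leibler divergence: sum of x1 * log(x1 / x2)
    return sum(a * math.log(a / b) for a, b in zip(x1, x2))

x1, x2 = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]
print(chi2_pearson(x1, x2), chi2_neyman(x1, x2), kl(x1, x2))
```

Note the asymmetry: in general $\chi^2_P(X_1 : X_2) \neq \chi^2_N(X_1 : X_2)$.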
Slide 3/17: f-divergences, a generic definition

$I_f(X_1 : X_2) = \int x_1(x) \, f\!\left(\frac{x_2(x)}{x_1(x)}\right) d\nu(x) \geq 0$,

where $f : (0, \infty) \to [0, \infty]$ is a convex function such that $f(1) = 0$.
By Jensen's inequality: $I_f(X_1 : X_2) \geq f\!\left(\int x_2(x) \, d\nu(x)\right) = f(1) = 0$.
One may further consider $f'(1) = 0$ and fix the scale of the divergence by setting $f''(1) = 1$.
Any f-divergence can be symmetrized: $S_f(X_1 : X_2) = I_f(X_1 : X_2) + I_{f^*}(X_1 : X_2)$ with the conjugate generator $f^*(u) = u f(1/u)$, for which $I_{f^*}(X_1 : X_2) = I_f(X_2 : X_1)$.
Slide 4/17: f-divergences, some examples
For each divergence: the formula $I_f(P : Q)$, then the generator $f(u)$ with $f(1) = 0$.
- Total variation (metric): $\frac{1}{2}\int |p(x) - q(x)| \, d\nu(x)$; $f(u) = \frac{1}{2}|u - 1|$
- Squared Hellinger: $\int (\sqrt{p(x)} - \sqrt{q(x)})^2 \, d\nu(x)$; $f(u) = (\sqrt{u} - 1)^2$
- Pearson $\chi^2_P$: $\int \frac{(q(x) - p(x))^2}{p(x)} \, d\nu(x)$; $f(u) = (u - 1)^2$
- Neyman $\chi^2_N$: $\int \frac{(p(x) - q(x))^2}{q(x)} \, d\nu(x)$; $f(u) = \frac{(1 - u)^2}{u}$
- Pearson-Vajda $\chi^k_P$: $\int \frac{(q(x) - p(x))^k}{p^{k-1}(x)} \, d\nu(x)$; $f(u) = (u - 1)^k$
- Pearson-Vajda $|\chi|^k_P$: $\int \frac{|q(x) - p(x)|^k}{p^{k-1}(x)} \, d\nu(x)$; $f(u) = |u - 1|^k$
- Kullback-Leibler: $\int p(x) \log \frac{p(x)}{q(x)} \, d\nu(x)$; $f(u) = -\log u$
- Reverse Kullback-Leibler: $\int q(x) \log \frac{q(x)}{p(x)} \, d\nu(x)$; $f(u) = u \log u$
- $\alpha$-divergence: $\frac{4}{1 - \alpha^2}\left(1 - \int p^{\frac{1 - \alpha}{2}}(x) \, q^{\frac{1 + \alpha}{2}}(x) \, d\nu(x)\right)$; $f(u) = \frac{4}{1 - \alpha^2}\left(1 - u^{\frac{1 + \alpha}{2}}\right)$
- Jensen-Shannon: $\frac{1}{2}\int \left(p(x) \log \frac{2 p(x)}{p(x) + q(x)} + q(x) \log \frac{2 q(x)}{p(x) + q(x)}\right) d\nu(x)$; $f(u) = -(u + 1) \log \frac{1 + u}{2} + u \log u$
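The generic definition and the table of generators translate directly into code: one evaluator plus a library of generators. A short sketch (my own helper names; finite discrete distributions assumed):

```python
import math

def f_divergence(x1, x2, f):
    # I_f(X1 : X2) = sum over the support of x1(x) * f(x2(x) / x1(x))
    return sum(a * f(b / a) for a, b in zip(x1, x2))

# A few generators from the table (each convex with f(1) = 0):
f_tv = lambda u: 0.5 * abs(u - 1)       # total variation
f_pearson = lambda u: (u - 1) ** 2      # Pearson chi-square
f_kl = lambda u: -math.log(u)           # Kullback-Leibler

x1, x2 = [0.5, 0.3, 0.2], [0.4, 0.4, 0.2]
pearson = f_divergence(x1, x2, f_pearson)
# Same value as summing (x2 - x1)^2 / x1 directly:
direct = sum((b - a) ** 2 / a for a, b in zip(x1, x2))
assert abs(pearson - direct) < 1e-12
```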
Slide 5/17: Stochastic approximations of f-divergences

$\hat{I}^{(n)}_f(X_1 : X_2) = \frac{1}{2n} \sum_{i=1}^{n} \left( f\!\left(\frac{x_2(s_i)}{x_1(s_i)}\right) + \frac{x_1(t_i)}{x_2(t_i)} f\!\left(\frac{x_2(t_i)}{x_1(t_i)}\right) \right)$,

with $s_1, \dots, s_n$ and $t_1, \dots, t_n$ sampled i.i.d. from $X_1$ and $X_2$, respectively.
Then $\lim_{n \to \infty} \hat{I}^{(n)}_f(X_1 : X_2) = I_f(X_1 : X_2)$. This works for any generator $f$, but in practice it is limited to distributions with small-dimensional support.
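This Monte Carlo scheme can be sketched as follows (the helper names are mine; the two univariate Gaussians are only an illustration I chose, for which the exact value is $\mathrm{KL}(N(0,1) : N(1,1)) = 1/2$):

```python
import math
import random

def mc_f_divergence(f, sample1, pdf1, sample2, pdf2, n=100_000, seed=0):
    # Stochastic estimate of I_f(X1 : X2):
    #   (1/2n) * sum_i [ f(x2(s_i)/x1(s_i)) + (x1(t_i)/x2(t_i)) * f(x2(t_i)/x1(t_i)) ]
    # with s_i drawn i.i.d. from X1 and t_i drawn i.i.d. from X2.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        s = sample1(rng)
        t = sample2(rng)
        total += f(pdf2(s) / pdf1(s))
        total += (pdf1(t) / pdf2(t)) * f(pdf2(t) / pdf1(t))
    return total / (2 * n)

Z = math.sqrt(2 * math.pi)
pdf1 = lambda x: math.exp(-0.5 * x * x) / Z          # density of N(0, 1)
pdf2 = lambda x: math.exp(-0.5 * (x - 1) ** 2) / Z   # density of N(1, 1)
est = mc_f_divergence(lambda u: -math.log(u),
                      lambda r: r.gauss(0, 1), pdf1,
                      lambda r: r.gauss(1, 1), pdf2)
print(est)  # should be close to 0.5 for this sample size
```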
Slide 6/17: Exponential families
Canonical decomposition of the probability measure:

$p_\theta(x) = \exp(\langle t(x), \theta \rangle - F(\theta) + k(x))$.

Here, we consider families whose natural parameter space $\Theta$ is affine. Examples:
Poisson: $p(x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}$, $\lambda > 0$, $x \in \{0, 1, \dots\}$
Isotropic Gaussian: $p(x \mid \mu) = (2\pi)^{-d/2} e^{-\frac{1}{2}(x - \mu)^\top (x - \mu)}$, $\mu \in \mathbb{R}^d$, $x \in \mathbb{R}^d$

Family; $\theta$; $\Theta$; $F(\theta)$; $k(x)$; $t(x)$; $\nu$:
- Poisson: $\theta = \log\lambda$; $\Theta = \mathbb{R}$; $F(\theta) = e^\theta$; $k(x) = -\log x!$; $t(x) = x$; $\nu_c$ (counting measure)
- Isotropic Gaussian: $\theta = \mu$; $\Theta = \mathbb{R}^d$; $F(\theta) = \frac{1}{2}\theta^\top\theta + \frac{d}{2}\log 2\pi$; $k(x) = -\frac{1}{2}x^\top x$; $t(x) = x$; $\nu_L$ (Lebesgue measure)
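As a small sanity check (mine, not from the slides), the Poisson row of the table reproduces the usual mass function through the canonical decomposition:

```python
import math

# Poisson in canonical exponential-family form:
#   p_theta(x) = exp(t(x) * theta - F(theta) + k(x))
# with theta = log(lambda), F(theta) = e^theta, t(x) = x, k(x) = -log(x!)
def poisson_canonical(x, theta):
    return math.exp(x * theta - math.exp(theta) - math.lgamma(x + 1))

def poisson_direct(x, lam):
    return lam ** x * math.exp(-lam) / math.factorial(x)

lam = 2.5
for x in range(8):
    assert abs(poisson_canonical(x, math.log(lam)) - poisson_direct(x, lam)) < 1e-12
```

Here `math.lgamma(x + 1)` computes $\log x!$, so the carrier term $k(x)$ never forms the large factorial explicitly.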
Slide 7/17: $\chi^2$ for affine exponential families
Bypassing the integral computation, we get closed-form formulas:

$\chi^2_P(X_1 : X_2) = e^{F(2\theta_2 - \theta_1) - (2F(\theta_2) - F(\theta_1))} - 1$
$\chi^2_N(X_1 : X_2) = e^{F(2\theta_1 - \theta_2) - (2F(\theta_1) - F(\theta_2))} - 1$

The Kullback-Leibler divergence amounts to a Bregman divergence [3]:

$\mathrm{KL}(X_1 : X_2) = B_F(\theta_2 : \theta_1)$, where $B_F(\theta : \theta') = F(\theta) - F(\theta') - (\theta - \theta')^\top \nabla F(\theta')$.
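The closed-form $\chi^2_P$ formula can be verified numerically against the defining sum for the Poisson family ($F(\theta) = e^\theta$). This sketch and its helper names are mine:

```python
import math

def chi2_P_expfam(F, theta1, theta2):
    # Closed form for an affine exponential family:
    #   chi^2_P(X1 : X2) = exp(F(2*theta2 - theta1) - (2*F(theta2) - F(theta1))) - 1
    return math.exp(F(2 * theta2 - theta1) - (2 * F(theta2) - F(theta1))) - 1

# Poisson family: F(theta) = e^theta, theta = log(lambda)
lam1, lam2 = 0.6, 0.3
closed = chi2_P_expfam(math.exp, math.log(lam1), math.log(lam2))

# Direct evaluation of the defining sum, truncated where terms are negligible
pmf = lambda lam, x: lam ** x * math.exp(-lam) / math.factorial(x)
direct = sum((pmf(lam2, x) - pmf(lam1, x)) ** 2 / pmf(lam1, x) for x in range(60))
assert abs(closed - direct) < 1e-10
```

For this pair the exponent reduces to $\lambda_2^2/\lambda_1 - 2\lambda_2 + \lambda_1 = 0.15$, so $\chi^2_P = e^{0.15} - 1$.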
Slide 8/17: Higher-order Vajda $\chi^k$ divergences

$\chi^k_P(X_1 : X_2) = \int \frac{(x_2(x) - x_1(x))^k}{x_1(x)^{k-1}} \, d\nu(x)$
$|\chi|^k_P(X_1 : X_2) = \int \frac{|x_2(x) - x_1(x)|^k}{x_1(x)^{k-1}} \, d\nu(x)$

These are f-divergences for the generators $(u - 1)^k$ and $|u - 1|^k$, respectively.
When $k = 1$, $\chi^1_P(X_1 : X_2) = \int (x_2(x) - x_1(x)) \, d\nu(x) = 0$ (never discriminative), while $|\chi|^1_P(X_1, X_2)$ is twice the total variation distance.
$\chi^0_P$ is the unit constant, and $\chi^k_P$ is a signed distance.
Slide 9/17: Higher-order Vajda $\chi^k$ divergences
Lemma. The (signed) $\chi^k_P$ distance between members $X_1 \sim E_F(\theta_1)$ and $X_2 \sim E_F(\theta_2)$ of the same affine exponential family is ($k \in \mathbb{N}$) always bounded and equal to:

$\chi^k_P(X_1 : X_2) = \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} e^{F((1-j)\theta_1 + j\theta_2) - ((1-j)F(\theta_1) + jF(\theta_2))}$.
Slide 10/17: Higher-order Vajda $\chi^k$ divergences
For Poisson and (isotropic) normal distributions, we get closed-form formulas:

$\chi^k_P(\lambda_1 : \lambda_2) = \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} e^{\lambda_1^{1-j} \lambda_2^{j} - ((1-j)\lambda_1 + j\lambda_2)}$
$\chi^k_P(\mu_1 : \mu_2) = \sum_{j=0}^{k} (-1)^{k-j} \binom{k}{j} e^{\frac{1}{2} j(j-1)(\mu_1 - \mu_2)^\top (\mu_1 - \mu_2)}$

These are signed distances.
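The Poisson closed form is easy to check against the defining sum, including the $k = 1$ case, which vanishes identically. A sketch (my own helper names and truncation cutoff):

```python
import math

def chi_k_poisson(k, lam1, lam2):
    # Closed-form signed chi^k between Poisson(lam1) and Poisson(lam2):
    #   sum_{j=0}^{k} (-1)^{k-j} C(k, j) exp(lam1^(1-j) lam2^j - ((1-j) lam1 + j lam2))
    return sum((-1) ** (k - j) * math.comb(k, j)
               * math.exp(lam1 ** (1 - j) * lam2 ** j - ((1 - j) * lam1 + j * lam2))
               for j in range(k + 1))

def chi_k_direct(k, lam1, lam2, cutoff=80):
    # Defining sum, truncated where the Poisson tails are negligible
    pmf = lambda lam, x: lam ** x * math.exp(-lam) / math.factorial(x)
    return sum((pmf(lam2, x) - pmf(lam1, x)) ** k / pmf(lam1, x) ** (k - 1)
               for x in range(cutoff))

for k in (1, 2, 3):
    assert abs(chi_k_poisson(k, 0.6, 0.3) - chi_k_direct(k, 0.6, 0.3)) < 1e-8
```

For $k = 2$ this recovers the $\chi^2_P$ closed form of the previous slide.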
Slide 11/17: f-divergences from Taylor series
Lemma (extends Theorem 1 of [1]). When bounded, the f-divergence $I_f$ can be expressed as a power series of higher-order chi-type distances:

$I_f(X_1 : X_2) = \sum_{i=0}^{\infty} \frac{1}{i!} f^{(i)}(\lambda) \int x_1(x) \left(\frac{x_2(x)}{x_1(x)} - \lambda\right)^i d\nu(x) = \sum_{i=0}^{\infty} \frac{1}{i!} f^{(i)}(\lambda) \, \chi^i_{\lambda,P}(X_1 : X_2)$,

provided $I_f < \infty$, where $\chi^i_{\lambda,P}$ is a generalization of $\chi^i_P$ defined by:

$\chi^i_{\lambda,P}(X_1 : X_2) = \int \frac{(x_2(x) - \lambda x_1(x))^i}{x_1(x)^{i-1}} \, d\nu(x)$,

with $\chi^0_{\lambda,P}(X_1 : X_2) = 1$ by convention. Note that $\chi^i_{\lambda,P}$ is an f-divergence for the generator $f(u) = (u - \lambda)^i - (1 - \lambda)^i$, shifted by $(1 - \lambda)^i$ so that $f(1) = 0$.
Slide 12/17: f-divergences, analytic formula
For $\lambda = 1 \in \mathrm{int}(\mathrm{dom}(f^{(i)}))$, the f-divergence satisfies (Theorem 1 of [1]):

$\left| I_f(X_1 : X_2) - \sum_{k=0}^{s} \frac{f^{(k)}(1)}{k!} \chi^k_P(X_1 : X_2) \right| \leq \frac{1}{(s+1)!} \|f^{(s+1)}\|_\infty (M - m)^{s+1}$,

where $\|f^{(s+1)}\|_\infty = \sup_{t \in [m, M]} |f^{(s+1)}(t)|$ and $m \leq \frac{p}{q} \leq M$.

For $\lambda = 0$ (whenever $0 \in \mathrm{int}(\mathrm{dom}(f^{(i)}))$) and affine exponential families, we get the simpler expression:

$I_f(X_1 : X_2) = \sum_{i=0}^{\infty} \frac{f^{(i)}(0)}{i!} I_{1-i,i}(\theta_1 : \theta_2)$, with $I_{1-i,i}(\theta_1 : \theta_2) = e^{F(i\theta_2 + (1-i)\theta_1) - (iF(\theta_2) + (1-i)F(\theta_1))}$.
Slide 13/17: Corollary, approximating f-divergences by $\chi^2$ divergences
Corollary. A second-order Taylor expansion yields

$I_f(X_1 : X_2) \approx f(1) + f'(1)\,\chi^1_N(X_1 : X_2) + \frac{1}{2} f''(1)\,\chi^2_N(X_1 : X_2)$.

Since $f(1) = 0$ and $\chi^1_N(X_1 : X_2) = 0$, it follows that

$I_f(X_1 : X_2) \approx \frac{f''(1)}{2}\,\chi^2_N(X_1 : X_2)$

($f''(1) > 0$ follows from the strict convexity of the generator). When $f(u) = u \log u$, this yields the well-known approximation [2]: $\chi^2_P(X_1 : X_2) \approx 2\,\mathrm{KL}(X_1 : X_2)$.
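A quick numeric illustration of the corollary (the two example distributions are mine): with $f(u) = -\log u$, for which $f''(1) = 1$, the corollary reads $\mathrm{KL}(X_1 : X_2) \approx \frac{1}{2}\chi^2_N(X_1 : X_2)$, and the agreement is excellent for nearby distributions:

```python
import math

# Two nearby discrete distributions (illustrative values of mine)
x1 = [0.50, 0.30, 0.20]
x2 = [0.48, 0.31, 0.21]

kl = sum(a * math.log(a / b) for a, b in zip(x1, x2))    # I_f for f(u) = -log u
chi2_N = sum((a - b) ** 2 / b for a, b in zip(x1, x2))   # Neyman chi-square
# Second-order approximation: KL ~ chi2_N / 2; the error is third order
print(kl, chi2_N / 2)
```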
Slide 14/17: Kullback-Leibler divergence, analytic expression
For the Kullback-Leibler divergence, $f(u) = -\log u$, so $f^{(i)}(u) = (-1)^i (i-1)! \, u^{-i}$ and hence $\frac{f^{(i)}(1)}{i!} = \frac{(-1)^i}{i}$ for $i \geq 1$ (with $f(1) = 0$). Since $\chi^1_{1,P} = 0$, it follows that:

$\mathrm{KL}(X_1 : X_2) = \sum_{j=2}^{\infty} \frac{(-1)^j}{j} \, \chi^j_P(X_1 : X_2)$,

an alternating-sign series.
Numerical illustration with Poisson distributions, $\lambda_1 = 0.6$ and $\lambda_2 = 0.3$: the exact KL (computed via the Bregman divergence) is compared against a stochastic evaluation with $n = 10^6$ samples and against truncations of the Taylor series at $s = 2$, $s = 3$, $s = 4$, $s = 10$, $s = 15$, etc.
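The truncated series can be reproduced for this Poisson pair by combining the closed-form $\chi^k$ formula with the Bregman-divergence expression of the exact KL. A sketch (my own helper names; the truncation levels follow the slide):

```python
import math

def chi_k_poisson(k, lam1, lam2):
    # Closed-form signed chi^k between Poisson(lam1) and Poisson(lam2)
    return sum((-1) ** (k - j) * math.comb(k, j)
               * math.exp(lam1 ** (1 - j) * lam2 ** j - ((1 - j) * lam1 + j * lam2))
               for j in range(k + 1))

def kl_poisson_exact(lam1, lam2):
    # KL as the Bregman divergence B_F(theta2 : theta1) with F(theta) = e^theta:
    #   lam2 - lam1 - (log lam2 - log lam1) * lam1
    return lam2 - lam1 - (math.log(lam2) - math.log(lam1)) * lam1

def kl_poisson_series(lam1, lam2, s):
    # Truncated alternating series sum_{j=2}^{s} ((-1)^j / j) chi^j_P
    return sum((-1) ** j / j * chi_k_poisson(j, lam1, lam2) for j in range(2, s + 1))

lam1, lam2 = 0.6, 0.3
exact = kl_poisson_exact(lam1, lam2)
for s in (2, 3, 4, 10, 15):
    print(s, kl_poisson_series(lam1, lam2, s), exact)
```

The truncations approach the exact value as $s$ grows.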
Slide 15/17: Contributions
- Statistical f-divergences between members of the same exponential family with affine natural parameter space.
- Generic closed-form formulas for the Pearson/Neyman $\chi^2$ and Vajda $\chi^k$-type distances.
- Analytic expression of f-divergences using Pearson-Vajda-type distances.
- Second-order Taylor approximation for fast estimation of f-divergences.
- Java(TM) package:
Slide 16/17: Thank you

@article{
  author = "Frank Nielsen and Richard Nock",
  title  = "On the {C}hi square and higher-order {C}hi distances for approximating $f$-divergences",
  year   = "2013",
  eprint = "arxiv/ "
}
Slide 17/17: Bibliographic references
[1] N. S. Barnett, P. Cerone, S. S. Dragomir, and A. Sofo. Approximating Csiszár f-divergence by the use of Taylor's formula with integral remainder. Mathematical Inequalities & Applications, 5(3).
[2] Thomas M. Cover and Joy A. Thomas. Elements of Information Theory. Wiley-Interscience, New York, NY, USA.
[3] Frank Nielsen and Sylvain Boltz. The Burbea-Rao and Bhattacharyya centroids. IEEE Transactions on Information Theory, 57(8), August 2011.
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More information40.530: Statistics. Professor Chen Zehua. Singapore University of Design and Technology
Singapore University of Design and Technology Lecture 9: Hypothesis testing, uniformly most powerful tests. The Neyman-Pearson framework Let P be the family of distributions of concern. The Neyman-Pearson
More informationf-gan: Training Generative Neural Samplers using Variational Divergence Minimization
f-gan: Training Generative Neural Samplers using Variational Divergence Minimization Sebastian Nowozin, Botond Cseke, Ryota Tomioka Machine Intelligence and Perception Group Microsoft Research {Sebastian.Nowozin,
More informationPolynomial Preserving Jump-Diffusions on the Unit Interval
Polynomial Preserving Jump-Diffusions on the Unit Interval Martin Larsson Department of Mathematics, ETH Zürich joint work with Christa Cuchiero and Sara Svaluto-Ferro 7th General AMaMeF and Swissquote
More informationOn divergences, surrogate loss functions, and decentralized detection
On divergences, surrogate loss functions, and decentralized detection XuanLong Nguyen Computer Science Division University of California, Berkeley xuanlong@eecs.berkeley.edu Martin J. Wainwright Statistics
More informationEstimators, escort probabilities, and φ-exponential families in statistical physics
Estimators, escort probabilities, and φ-exponential families in statistical physics arxiv:math-ph/425v 4 Feb 24 Jan Naudts Departement Natuurkunde, Universiteit Antwerpen UIA, Universiteitsplein, 26 Antwerpen,
More informationGeneralized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses
Ann Inst Stat Math (2009) 61:773 787 DOI 10.1007/s10463-008-0172-6 Generalized Neyman Pearson optimality of empirical likelihood for testing parameter hypotheses Taisuke Otsu Received: 1 June 2007 / Revised:
More information