On the Chi square and higher-order Chi distances for approximating f-divergences
Frank Nielsen, Senior Member, IEEE, and Richard Nock, Nonmember

Abstract—We report closed-form formulas for calculating the Chi square and higher-order Chi distances between statistical distributions belonging to the same exponential family with affine natural space, and instantiate those formulas for the Poisson and isotropic Gaussian families. We then describe an analytic formula for the f-divergences based on Taylor expansions and relying on an extended class of Chi-type distances.

Index Terms—statistical divergences, chi square distance, Kullback-Leibler divergence, Taylor series, exponential families.

I. INTRODUCTION

A. Statistical divergences: f-divergences

Measuring the similarity or dissimilarity between two probability measures is met ubiquitously in signal processing. Some usual distances are the Pearson χ²_P and Neyman χ²_N chi square distances, and the Kullback-Leibler divergence [1], defined respectively by

χ²_P(X₁ : X₂) = ∫ (x₂(x) − x₁(x))² / x₁(x) dν(x),   (1)

χ²_N(X₁ : X₂) = ∫ (x₁(x) − x₂(x))² / x₂(x) dν(x),   (2)

KL(X₁ : X₂) = ∫ x₁(x) log (x₁(x) / x₂(x)) dν(x),   (3)

Frank Nielsen is with Sony Computer Science Laboratories, Inc., 3-14-13 Higashi Gotanda, 141-0022 Shinagawa-ku, Tokyo, Japan, nielsen@csl.sony.co.jp. Richard Nock is with UAG CEREGMIA, Martinique, France, rnock@martinique.univ-ag.fr.
where X₁ and X₂ are probability measures absolutely continuous with respect to a reference measure ν, and x₁ and x₂ denote their Radon-Nikodym densities, respectively. Those dissimilarity measures M are termed divergences, in contrast with metric distances, since they are oriented distances (i.e., M(X₁ : X₂) ≠ M(X₂ : X₁)) that do not satisfy the triangle inequality. In the 1960s, many of those divergences were unified using the generic framework of f-divergences [2], I_f, defined for an arbitrary functional f:

I_f(X₁ : X₂) = ∫ x₁(x) f(x₂(x) / x₁(x)) dν(x) ≥ 0,   (4)

where f is a convex function f : (0, ∞) ⊆ dom(f) → [0, ∞] (often normalized so that f(1) = 0). Those f-divergences can always be symmetrized by taking:

S_f(X₁ : X₂) = I_f(X₁ : X₂) + I_{f*}(X₁ : X₂),   (5)

with f*(u) = u f(1/u), and I_{f*}(X₁ : X₂) = I_f(X₂ : X₁). See Table I for a list of common f-divergences with their corresponding generators f. In information theory, f-divergences are characterized as the unique family of convex divergences that satisfies the information monotonicity property [3]: given any transition probability τ that transforms measures X₁ and X₂ to X₁τ and X₂τ (by a Markov morphism τ), we have I_f(X₁ : X₂) ≥ I_f(X₁τ : X₂τ), with equality if and only if the transition τ is induced from a sufficient statistic [3].

Note that f-divergences may evaluate to infinity (that is, unbounded I_f) when the integral diverges, even if x₁, x₂ > 0 on the support X. For example, let X = (0, 1) be the unit interval, and consider the two densities (with respect to the Lebesgue measure ν_L) x₁(x) = 1 and x₂(x) = e^{−1/x}/c, with c = ∫₀¹ e^{−1/x} dx ≈ 0.148 the normalizing constant. Consider the Kullback-Leibler divergence (the f-divergence for f(u) = −log u):

KL(X₁ : X₂) = ∫₀¹ x₁(x) log (x₁(x) / x₂(x)) dν_L(x)   (6)

            = log c + ∫₀¹ (1/x) dν_L(x) = ∞.   (7)

Beware that sometimes the χ²_N and χ²_P definitions are inverted in the literature. This may stem from an alternative definition of f-divergences, I'_f(X₁ : X₂) = ∫ x₂(x) f(x₁(x)/x₂(x)) dν(x) = I_{f*}(X₁ : X₂).
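Before turning to closed-form formulas, the definitions above are easy to check numerically in the finite discrete case. The following minimal Python sketch (the two distributions and all helper names are our own, not from the paper) evaluates Eq. (4) for several generators and verifies the conjugate-generator identity I_{f*}(X₁ : X₂) = I_f(X₂ : X₁):

```python
import numpy as np

def f_divergence(p, q, f):
    # I_f(X1 : X2) = sum_x x1(x) f(x2(x) / x1(x))   (Eq. (4), discrete case)
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * f(q / p)))

kl_gen = lambda u: -np.log(u)        # Kullback-Leibler generator f(u) = -log u
conj   = lambda u: u * np.log(u)     # conjugate f*(u) = u f(1/u) = u log u

p = [0.1, 0.2, 0.3, 0.4]             # density x1 (hypothetical example)
q = [0.25, 0.25, 0.25, 0.25]         # density x2

kl      = f_divergence(p, q, kl_gen)
pearson = f_divergence(p, q, lambda u: (u - 1) ** 2)      # chi^2_P
neyman  = f_divergence(p, q, lambda u: (1 - u) ** 2 / u)  # chi^2_N
print(kl, pearson, neyman)
```

As expected, the conjugate generator swaps the arguments: `f_divergence(p, q, conj)` coincides with `f_divergence(q, p, kl_gen)`, both evaluating the reverse Kullback-Leibler divergence.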
f-divergence I_f(P : Q) | Formula | Generator f(u)
Total variation (metric) | ½ ∫ |p(x) − q(x)| dν(x) | ½ |u − 1|
Squared Hellinger | ∫ (√p(x) − √q(x))² dν(x) | (√u − 1)²
Pearson χ²_P | ∫ (q(x) − p(x))² / p(x) dν(x) | (u − 1)²
Neyman χ²_N | ∫ (p(x) − q(x))² / q(x) dν(x) | (1 − u)²/u
Pearson-Vajda χ^k_{λ,P} | ∫ (q(x) − λp(x))^k / p^{k−1}(x) dν(x) | (u − λ)^k
Pearson-Vajda |χ|^k_{λ,P} | ∫ |q(x) − λp(x)|^k / p^{k−1}(x) dν(x) | |u − λ|^k
Kullback-Leibler | ∫ p(x) log (p(x)/q(x)) dν(x) | −log u
reverse Kullback-Leibler | ∫ q(x) log (q(x)/p(x)) dν(x) | u log u
α-divergence | (4/(1−α²)) (1 − ∫ p^{(1−α)/2}(x) q^{(1+α)/2}(x) dν(x)) | (4/(1−α²)) (1 − u^{(1+α)/2})
Jensen-Shannon | ∫ (p(x) log (2p(x)/(p(x)+q(x))) + q(x) log (2q(x)/(p(x)+q(x)))) dν(x) | (u + 1) log (2/(u + 1)) + u log u

TABLE I: Some common f-divergences I_f with their corresponding generators. Except the total variation, f-divergences are not metrics [4].

B. Stochastic approximations of f-divergences

To bypass the integral evaluation of I_f of Eq. 4 (often mathematically intractable), we can carry out a stochastic integration:

Î_f(X₁ : X₂) = (1/(2n)) Σ_{i=1}^{n} ( f(x₂(s_i)/x₁(s_i)) + (x₁(t_i)/x₂(t_i)) f(x₂(t_i)/x₁(t_i)) ),   (8)

with s₁, ..., s_n and t₁, ..., t_n IID sampled from X₁ and X₂, respectively. Those approximations, although converging to the true values when n → ∞, are time consuming and yield poor results in practice, especially when the dimension of the observation space X is large. We therefore concentrate on obtaining exact or arbitrarily fine approximation formulas for f-divergences by considering a restricted class of exponential families.

C. Exponential families

Let ⟨x, y⟩ = x^⊤ y denote the inner product for x, y ∈ X. An exponential family [6] is a set of probability measures E_F = {P_θ}_θ dominated by a measure ν, having their Radon-Nikodym densities p_θ expressed canonically as:
p_θ(x) = exp(⟨t(x), θ⟩ − F(θ) + k(x)),   (9)

for θ belonging to the natural parameter space:

Θ = { θ ∈ R^D : ∫_X p_θ(x) dν(x) = 1 }.

Since log ∫_X p_θ(x) dν(x) = log 1 = 0, it follows that F(θ) = log ∫_X exp(⟨t(x), θ⟩ + k(x)) dν(x). For full regular families [6], it can be proved that the function F is strictly convex and differentiable over the open convex set Θ. Function F characterizes the family and bears different names in the literature (partition function, log-normalizer, or cumulant function), and the parameter θ (the natural parameter) defines the member P_θ of the family E_F. Let D = dim(Θ) denote the dimension of Θ, the order of the family. The map k(x) : X → R is an auxiliary function defining a carrier measure ξ with dξ(x) = e^{k(x)} dν(x). In practice, we often consider the Lebesgue measure ν_L defined over the Borel σ-algebra E = B(R^d) of R^d for continuous distributions (e.g., Gaussian), or the counting measure ν_c defined on the power set σ-algebra E = 2^X for discrete distributions (e.g., Poisson or multinomial families). The term t(x) is a measure mapping called the sufficient statistic [6]. Table II shows the canonical decomposition for the Poisson and isotropic Gaussian families.

Notice that the Kullback-Leibler divergence between members X₁ ∼ E_F(θ₁) and X₂ ∼ E_F(θ₂) of the same exponential family amounts to computing a Bregman divergence on swapped natural parameters [8]: KL(X₁ : X₂) = B_F(θ₂ : θ₁), where B_F(θ : θ') = F(θ) − F(θ') − (θ − θ')^⊤ ∇F(θ'), and ∇F denotes the gradient of F.

II. χ² AND HIGHER-ORDER χ^k DISTANCES

A. A closed-form formula

When X₁ and X₂ belong to the same restricted exponential family E_F, we obtain the following result:

Lemma 1: The Pearson/Neyman Chi square distance between X₁ ∼ E_F(θ₁) and X₂ ∼ E_F(θ₂) is given by:

χ²_P(X₁ : X₂) = e^{F(2θ₂ − θ₁) − (2F(θ₂) − F(θ₁))} − 1,   (10)

χ²_N(X₁ : X₂) = e^{F(2θ₁ − θ₂) − (2F(θ₁) − F(θ₂))} − 1,   (11)

provided that 2θ₂ − θ₁ and 2θ₁ − θ₂ belong to the natural parameter space Θ. This implies that the chi square distances are then always bounded.
The proof relies on the following lemma:
Lemma 2: The integral I_{p,q} = ∫ x₁(x)^p x₂(x)^q dν(x), for X₁ ∼ E_F(θ₁) and X₂ ∼ E_F(θ₂) with p ∈ R and p + q = 1, converges and equals:

I_{p,q} = e^{F(pθ₁ + qθ₂) − (pF(θ₁) + qF(θ₂))},   (12)

provided the natural parameter space Θ is affine.

Proof: Let us calculate the integral I_{p,q}:

I_{p,q} = ∫ exp(p(⟨t(x), θ₁⟩ − F(θ₁) + k(x))) exp(q(⟨t(x), θ₂⟩ − F(θ₂) + k(x))) dν(x)
        = ∫ e^{⟨t(x), pθ₁ + qθ₂⟩ − (pF(θ₁) + qF(θ₂)) + k(x)} dν(x)
        = e^{F(pθ₁ + qθ₂) − (pF(θ₁) + qF(θ₂))} ∫ p_{pθ₁ + qθ₂}(x) dν(x),

where the carrier term e^{k(x)} appears with unit power since p + q = 1. When pθ₁ + qθ₂ ∈ Θ, we have ∫ p_{pθ₁ + qθ₂}(x) dν(x) = 1, hence the result. □

To prove Lemma 1, we rewrite:

χ²_P(X₁ : X₂) = ∫ ( x₂(x)²/x₁(x) − 2x₂(x) + x₁(x) ) dν(x)   (13)
             = ∫ ( x₂(x)²/x₁(x) ) dν(x) − 1,   (14)

and apply Lemma 2 for p = −1 and q = 2 (checking that p + q = 1). The closed-form formula for the Neyman chi square follows from the fact that χ²_N(X₁ : X₂) = χ²_P(X₂ : X₁).

Thus when the natural parameter space Θ is affine, the Pearson/Neyman Chi square distances and their symmetrization χ²_P + χ²_N between members of the same exponential family are available in closed form. Examples of such families are the Poisson, binomial, multinomial, or isotropic Gaussian families, to name a few. Let us call those families affine exponential families for short. The canonical decompositions of usual affine exponential families are reported in Table II. Note that a formula for the α-divergences between members of the same exponential family was reported in [8] for α ∈ [0, 1]: in that case, αθ₁ + (1 − α)θ₂ always belongs to the open convex natural space Θ (here, by contrast, p ranges over R).

B. The Poisson and isotropic Gaussian cases

As reported in Table II, the Poisson and isotropic Gaussian exponential families have affine natural parameter spaces Θ.
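As a numerical sanity check (our own sketch, not part of the paper), Lemma 2 can be instantiated for the Poisson family, where θ = log λ and F(θ) = e^θ, and the resulting closed-form χ²_P = I_{−1,2} − 1 of Lemma 1 compared against a direct summation over the support:

```python
import math

# Sketch of Lemma 2 for the Poisson family: F(theta) = exp(theta),
# theta = log(lambda). All names below are ours, not the paper's.
def I_pq(p, q, theta1, theta2, F):
    # I_{p,q} = exp(F(p*theta1 + q*theta2) - (p*F(theta1) + q*F(theta2)))
    return math.exp(F(p * theta1 + q * theta2) - (p * F(theta1) + q * F(theta2)))

lam1, lam2 = 1.0, 2.0
chi2_P = I_pq(-1, 2, math.log(lam1), math.log(lam2), math.exp) - 1.0  # Lemma 1

# cross-check by directly summing (x2 - x1)^2 / x1 over the support
def pmf(x, lam):
    # Poisson probability mass, computed in log-space for stability
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

direct = sum((pmf(x, lam2) - pmf(x, lam1)) ** 2 / pmf(x, lam1) for x in range(100))
print(chi2_P, direct)  # both equal e - 1 for lam1 = 1, lam2 = 2
```

Truncating the support at 100 is harmless here since the summand decays super-exponentially.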
Poi(λ): p(x|λ) = λ^x e^{−λ} / x!, λ > 0, x ∈ {0, 1, ...}
Nor_I(µ): p(x|µ) = (2π)^{−d/2} e^{−½(x−µ)^⊤(x−µ)}, µ ∈ R^d, x ∈ R^d

Family | θ | Θ | F(θ) | k(x) | t(x) | ν
Poisson | log λ | R | e^θ | −log x! | x | ν_c
Iso. Gaussian | µ | R^d | ½ θ^⊤θ + (d/2) log 2π | −½ x^⊤x | x | ν_L

TABLE II: Examples of exponential families with affine natural space Θ. ν_c denotes the counting measure and ν_L the Lebesgue measure.

The Poisson family. For P₁ ∼ Poi(λ₁) and P₂ ∼ Poi(λ₂), we have:

χ²_P(λ₁ : λ₂) = exp( λ₂²/λ₁ − 2λ₂ + λ₁ ) − 1.   (15)

To illustrate this formula with a numerical example, consider X₁ ∼ Poi(1) and X₂ ∼ Poi(2). Then it comes that χ²_P(P₁ : P₂) = e − 1 ≈ 1.718.

The isotropic Normal family. For N₁ ∼ Nor_I(µ₁) and N₂ ∼ Nor_I(µ₂), we have according to Table II:

χ²_P(µ₁ : µ₂) = e^{(µ₂ − µ₁)^⊤(µ₂ − µ₁)} − 1.

In that case the χ² distance is symmetric:

χ²_P(µ₁ : µ₂) = e^{(µ₂ − µ₁)^⊤(µ₂ − µ₁)} − 1 = χ²_N(µ₁ : µ₂).   (16)

C. Extensions to higher-order Vajda χ^k divergences

The higher-order Pearson-Vajda χ^k_P and |χ|^k_P distances [5], defined by:

χ^k_P(X₁ : X₂) = ∫ (x₂(x) − x₁(x))^k / x₁(x)^{k−1} dν(x),   (17)

|χ|^k_P(X₁ : X₂) = ∫ |x₂(x) − x₁(x)|^k / x₁(x)^{k−1} dν(x),   (18)

are f-divergences for the generators (u − 1)^k and |u − 1|^k, respectively (with |χ|^k_P(X₁ : X₂) ≥ |χ^k_P(X₁ : X₂)|). When k = 1, we have χ¹_P(X₁ : X₂) = ∫ (x₂(x) − x₁(x)) dν(x) = 0 (i.e., the divergence is never discriminative), and |χ|¹_P(X₁, X₂) is twice the total variation distance (the only metric
f-divergence [4]). χ⁰_P is the unit constant. Observe that the χ^k_P distance may be negative for odd k (a signed distance), but not |χ|^k_P. We can compute the χ^k_P term explicitly by performing a binomial expansion:

Lemma 3: The (signed) χ^k_P distance between members X₁ ∼ E_F(θ₁) and X₂ ∼ E_F(θ₂) of the same affine exponential family is (for k ∈ N) always bounded and equal to:

χ^k_P(X₁ : X₂) = Σ_{j=0}^{k} C(k,j) (−1)^{k−j} e^{F((1−j)θ₁ + jθ₂)} / e^{(1−j)F(θ₁) + jF(θ₂)}.   (19)

Proof:

χ^k_P(X₁ : X₂) = ∫ (x₂(x) − x₁(x))^k / x₁(x)^{k−1} dν(x)   (20)
  = ∫ Σ_{j=0}^{k} C(k,j) (−1)^{k−j} x₂(x)^j x₁(x)^{k−j} / x₁(x)^{k−1} dν(x)   (21)
  = Σ_{j=0}^{k} C(k,j) (−1)^{k−j} ∫ x₁(x)^{1−j} x₂(x)^j dν(x).   (22)

Then the proof follows from Lemma 2, which shows that I_{1−j,j}(X₁ : X₂) = ∫ x₁(x)^{1−j} x₂(x)^j dν(x) = e^{F((1−j)θ₁ + jθ₂)} / e^{(1−j)F(θ₁) + jF(θ₂)}. □

For Poisson/Normal distributions, we get:

χ^k_P(λ₁ : λ₂) = Σ_{j=0}^{k} C(k,j) (−1)^{k−j} e^{λ₁^{1−j} λ₂^j − ((1−j)λ₁ + jλ₂)},   (23)

χ^k_P(µ₁ : µ₂) = Σ_{j=0}^{k} C(k,j) (−1)^{k−j} e^{½ j(j−1) (µ₂ − µ₁)^⊤(µ₂ − µ₁)}.   (24)

Observe that for λ₁ = λ₂ = λ, we have χ^k_P(λ : λ) = Σ_{j=0}^{k} (−1)^{k−j} C(k,j) e^{λ−λ} = (1 − 1)^k = 0 for k ∈ N, as expected. The χ^k_P value is always bounded. As a sanity check, consider the binomial expansion for k = 2:

χ²_P(λ₁ : λ₂) = e^{λ₁ − λ₁} − 2 e^{λ₂ − λ₂} + e^{λ₂²/λ₁ − 2λ₂ + λ₁} = e^{λ₂²/λ₁ − 2λ₂ + λ₁} − 1,

in accordance with Eq. 15. Consider a numerical example: let λ₁ = 0.6 and λ₂ = 0.3. Then χ²_P ≈ 0.162, χ³_P ≈ −0.031, χ⁴_P ≈ 0.043, χ⁵_P ≈ −0.021, and so on. This numerical example illustrates the alternating sign of those χ^k-type signed distances.
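The binomial-sum formula of Lemma 3 is straightforward to implement. The sketch below (our own code, using the Poisson instantiation) reproduces the alternating-sign behavior observed above for λ₁ = 0.6 and λ₂ = 0.3:

```python
import math
from math import comb

def chi_k_poisson(k, lam1, lam2):
    # Lemma 3 instantiated for the Poisson family:
    # sum_j C(k,j) (-1)^{k-j} exp(lam1^{1-j} lam2^j - ((1-j)*lam1 + j*lam2))
    return sum(comb(k, j) * (-1) ** (k - j)
               * math.exp(lam1 ** (1 - j) * lam2 ** j
                          - ((1 - j) * lam1 + j * lam2))
               for j in range(k + 1))

lam1, lam2 = 0.6, 0.3
vals = [chi_k_poisson(k, lam1, lam2) for k in range(2, 6)]
print(vals)  # signs alternate: +, -, +, -
```

For k = 2 the sum collapses to e^{λ₂²/λ₁ − 2λ₂ + λ₁} − 1 = e^{0.15} − 1, matching the closed-form Pearson chi square.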
III. f-DIVERGENCES FROM TAYLOR SERIES

Recall that the f-divergence defined for a generator f is I_f(X₁ : X₂) = ∫ x₁(x) f(x₂(x)/x₁(x)) dν(x). Assuming f analytic, we use the Taylor expansion about a point λ:

f(x) = f(λ) + f'(λ)(x − λ) + ½ f''(λ)(x − λ)² + ... = Σ_{i=0}^{∞} (1/i!) f^{(i)}(λ) (x − λ)^i,

the power series expansion of f, for λ ∈ int(dom(f^{(i)})) for all i.

Lemma 4 (extends Theorem 1 of [5]): When bounded, the f-divergence I_f can be expressed as a power series of higher-order Chi distances:

I_f(X₁ : X₂) = ∫ x₁(x) Σ_{i=0}^{∞} (1/i!) f^{(i)}(λ) (x₂(x)/x₁(x) − λ)^i dν(x)
            = Σ_{i=0}^{∞} (1/i!) f^{(i)}(λ) χ^i_{λ,P}(X₁ : X₂),   (25)

where in the second equality we have swapped the integral and the sum according to Fubini's theorem, since we assumed that I_f < ∞, and χ^i_{λ,P}(X₁ : X₂) is a generalization of the χ^i_P distance (see Table I), defined by:

χ^i_{λ,P}(X₁ : X₂) = ∫ (x₂(x) − λ x₁(x))^i / x₁(x)^{i−1} dν(x),   (26)

with χ⁰_{λ,P}(X₁ : X₂) = 1 by convention.

Choosing λ = 1 ∈ int(dom(f^{(i)})), we approximate the f-divergence as follows (Theorem 1 of [5]):

| I_f(X₁ : X₂) − Σ_{k=0}^{s} (1/k!) f^{(k)}(1) χ^k_P(X₁ : X₂) | ≤ (1/(s+1)!) ‖f^{(s+1)}‖_∞ (M − m)^{s+1},   (27)

where ‖f^{(s+1)}‖_∞ = sup_{t ∈ [m,M]} |f^{(s+1)}(t)| and m ≤ p(x)/q(x) ≤ M. Notice that by assuming the boundedness of the density ratio p/q, we ensure that I_f < ∞.

Choosing λ = 0 (whenever 0 ∈ int(dom(f^{(i)}))) and affine exponential families, we get the f-divergence in a much simpler analytic expression:

I_f(X₁ : X₂) = Σ_{i=0}^{∞} (1/i!) f^{(i)}(0) I_{1−i,i}(θ₁ : θ₂),   (28)

I_{1−i,i}(θ₁ : θ₂) = e^{F(iθ₂ + (1−i)θ₁)} / e^{iF(θ₂) + (1−i)F(θ₁)}.   (29)
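To make Lemma 4 concrete, here is a small discrete-case sketch (our own; the two distributions are hypothetical) for the Pearson generator f(u) = (u − 1)², whose Taylor expansion about any point λ has only three nonzero terms, so the power series is finite and can be summed exactly:

```python
import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])      # density x1 (hypothetical)
q = np.array([0.25, 0.25, 0.25, 0.25])  # density x2

def chi(i, lam):
    # generalized Pearson-Vajda distance chi^i_{lam,P} of Lemma 4
    if i == 0:
        return 1.0  # by convention
    return float(np.sum((q - lam * p) ** i / p ** (i - 1)))

# f(u) = (u - 1)^2 expanded about lam:
# f(lam) = (lam-1)^2, f'(lam) = 2(lam-1), f''(lam)/2! = 1; higher terms vanish.
lam = 0.5
series = (lam - 1) ** 2 * chi(0, lam) + 2 * (lam - 1) * chi(1, lam) + chi(2, lam)

direct = float(np.sum((q - p) ** 2 / p))   # chi^2_P computed directly
print(series, direct)
```

The two printed values agree for any choice of λ, since the expansion of a quadratic generator is exact.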
Lemma 5: The bounded f-divergences between members of the same affine exponential family can be computed as an equivalent power series whenever f is analytic.

Corollary 1: A second-order Taylor expansion yields I_f(X₁ : X₂) ≈ f(1) + f'(1) χ¹_N(X₁ : X₂) + ½ f''(1) χ²_N(X₁ : X₂). Since f(1) = 0 (f can always be renormalized) and χ¹_N(X₁ : X₂) = 0, it follows that

I_f(X₁ : X₂) ≈ (f''(1)/2) χ²_N(X₁ : X₂),   (30)

and reciprocally χ²_N(X₁ : X₂) ≈ (2/f''(1)) I_f(X₁ : X₂) (f''(1) > 0 follows from the strict convexity of the generator). When f(u) = u log u, this yields the well-known approximation [1]:

½ χ²_P(X₁ : X₂) ≈ KL(X₁ : X₂).   (31)

For affine exponential families, we can then plug in the closed-form formula of Lemma 1 to get a simple approximation formula for I_f. For example, consider the Jensen-Shannon divergence (Table I), with f''(u) = 1/(u(u+1)) and f''(1) = ½. It follows that I_JS(X₁ : X₂) ≈ ¼ χ²_N(X₁ : X₂). (For Poisson distributions with close rates, e.g., λ₁ = 5 and λ₂ = 5.1, the approximation is accurate to within a few percent of relative error.)

A. Example 1: χ² revisited

Let us start with a sanity check for the χ² distance between Poisson distributions. The Pearson chi square distance is an f-divergence for f(u) = (u − 1)², with f'(u) = 2(u − 1), f''(u) = 2, and f^{(i)}(u) = 0 for i > 2. Thus f^{(0)}(0) = 1, f^{(1)}(0) = −2, f^{(2)}(0) = 2, and f^{(i)}(0) = 0 for i > 2. Recall that I_{1−i,i}(θ₁ : θ₂) = e^{F(iθ₂ + (1−i)θ₁) − (iF(θ₂) + (1−i)F(θ₁))} = exp(λ₁^{1−i} λ₂^i − iλ₂ − (1−i)λ₁). Note that I_{1−i,i}(λ, λ) = e⁰ = 1 for all i. Thus, from Eq. (28), we get: I_f(X₁ : X₂) = I_{1,0} − 2 I_{0,1} + I_{−1,2}, with I_{1,0} = I_{0,1} = 1 and I_{−1,2} = e^{λ₂²/λ₁ − 2λ₂ + λ₁}. Thus we obtain I_f(X₁ : X₂) = −1 + e^{λ₂²/λ₁ − 2λ₂ + λ₁}, in accordance with Eq. 15.

B. Example 2: Kullback-Leibler divergence

By choosing f(u) = −log u, we obtain the Kullback-Leibler divergence (see Table I). We have f^{(i)}(u) = (−1)^i (i−1)! u^{−i} for i ≥ 1, and hence f^{(i)}(1)/i! = (−1)^i/i for i ≥ 1 (with f(1) = 0). Since χ¹_{1,P}(X₁ : X₂) = 0, it follows that:

KL(X₁ : X₂) = Σ_{i=2}^{∞} ((−1)^i / i) χ^i_{1,P}(X₁ : X₂).   (32)
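The power series for the Kullback-Leibler divergence derived above can be checked numerically. The sketch below (our own code, with hypothetical rates) computes the χ^i terms from the Poisson instantiation of Lemma 3 and compares truncations of the series against the exact divergence, which for Poisson rates equals λ₁ log(λ₁/λ₂) + λ₂ − λ₁ (the Bregman divergence on swapped natural parameters mentioned in Section I):

```python
import math
from math import comb

lam1, lam2 = 0.6, 0.3  # hypothetical Poisson rates

def chi_k(k):
    # higher-order chi distance between Poissons (Lemma 3)
    return sum(comb(k, j) * (-1) ** (k - j)
               * math.exp(lam1 ** (1 - j) * lam2 ** j
                          - ((1 - j) * lam1 + j * lam2))
               for j in range(k + 1))

def kl_series(s):
    # truncation of the series KL = sum_{i >= 2} (-1)^i / i * chi^i
    return sum((-1) ** i / i * chi_k(i) for i in range(2, s + 1))

exact = lam1 * math.log(lam1 / lam2) + lam2 - lam1   # closed-form KL
print([round(kl_series(s), 4) for s in (2, 3, 4, 30)], round(exact, 4))
```

The partial sums increase monotonically towards the exact value, every term of the series being nonnegative here since the sign of χ^i alternates with i.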
Note that in the case of the KL divergence between members of the same exponential family, the divergence can be expressed in a simpler closed form using a Bregman divergence [8] on the swapped natural parameters. For example, consider Poisson distributions with λ₁ = 0.6 and λ₂ = 0.3. The Kullback-Leibler divergence computed from the equivalent Bregman divergence yields KL ≈ 0.1158, the stochastic evaluation of Eq. 8 with n = 10⁶ yields KL ≈ 0.1156, and the truncation of Eq. 32 to the first s terms yields the increasing sequence ≈ 0.0809 (s = 2), 0.0911 (s = 3), 0.1018 (s = 4), ..., slowly converging towards the closed-form value.

IV. CONCLUDING REMARKS

We investigated the calculation of statistical f-divergences between members of the same exponential family with affine natural space. We first reported a generic closed-form formula for the Pearson/Neyman χ² and Vajda χ^k-type distances, and instantiated that formula for the Poisson and the isotropic Gaussian affine exponential families. We then considered the Taylor expansion of the generator f at any given point λ to deduce an analytic expression of f-divergences using Pearson-Vajda χ^k_{λ,P} distances. A second-order Taylor approximation yielded a fast estimation of f-divergences. This framework shall find potential applications in signal processing and when designing inequality bounds between divergences. A Java™ package that illustrates the lemmas numerically is available online.

REFERENCES

[1] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York, NY, USA: Wiley-Interscience, 1991.
[2] F. Österreicher, "Distances based on the perimeter of the risk set of a testing problem," Austrian Journal of Statistics, vol. 42, no. 1, pp. 3-19, 2013.
[3] S. Amari and H. Nagaoka, Methods of Information Geometry. Oxford University Press / AMS, 2000.
[4] M. Khosravifard, D. Fooladivanda, and T. A. Gulliver, "Confliction of the convexity and metric properties in f-divergences," IEICE Trans. Fundam. Electron. Commun. Comput. Sci.,
vol. E90-A, no. 9, 2007.
[5] N. Barnett, P. Cerone, S. Dragomir, and A. Sofo, "Approximating Csiszár f-divergence by the use of Taylor's formula with integral remainder," Mathematical Inequalities & Applications, vol. 5, no. 3, 2002.
[6] L. D. Brown, Fundamentals of Statistical Exponential Families: with Applications in Statistical Decision Theory. Hayward, CA, USA: Institute of Mathematical Statistics, 1986.
[7] A. Cichocki and S.-i. Amari, "Families of alpha- beta- and gamma-divergences: Flexible and robust measures of similarities," Entropy, vol. 12, no. 6, 2010.
[8] F. Nielsen and S. Boltz, "The Burbea-Rao and Bhattacharyya centroids," IEEE Transactions on Information Theory, vol. 57, no. 8, 2011.
More informationPart II Probability and Measure
Part II Probability and Measure Theorems Based on lectures by J. Miller Notes taken by Dexter Chua Michaelmas 2016 These notes are not endorsed by the lecturers, and I have modified them (often significantly)
More informationLatent Variable Models and EM algorithm
Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic
More informationChapter 2: Fundamentals of Statistics Lecture 15: Models and statistics
Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:
More informationContents 1. Introduction 1 2. Main results 3 3. Proof of the main inequalities 7 4. Application to random dynamical systems 11 References 16
WEIGHTED CSISZÁR-KULLBACK-PINSKER INEQUALITIES AND APPLICATIONS TO TRANSPORTATION INEQUALITIES FRANÇOIS BOLLEY AND CÉDRIC VILLANI Abstract. We strengthen the usual Csiszár-Kullback-Pinsker inequality by
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More information6.1 Variational representation of f-divergences
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 6: Variational representation, HCR and CR lower bounds Lecturer: Yihong Wu Scribe: Georgios Rovatsos, Feb 11, 2016
More informationMTH 404: Measure and Integration
MTH 404: Measure and Integration Semester 2, 2012-2013 Dr. Prahlad Vaidyanathan Contents I. Introduction....................................... 3 1. Motivation................................... 3 2. The
More informationMATH MEASURE THEORY AND FOURIER ANALYSIS. Contents
MATH 3969 - MEASURE THEORY AND FOURIER ANALYSIS ANDREW TULLOCH Contents 1. Measure Theory 2 1.1. Properties of Measures 3 1.2. Constructing σ-algebras and measures 3 1.3. Properties of the Lebesgue measure
More informationDivergence measure of intuitionistic fuzzy sets
Divergence measure of intuitionistic fuzzy sets Fuyuan Xiao a, a School of Computer and Information Science, Southwest University, Chongqing, 400715, China Abstract As a generation of fuzzy sets, the intuitionistic
More informationConcentration Inequalities for Density Functionals. Shashank Singh
Concentration Inequalities for Density Functionals by Shashank Singh Submitted to the Department of Mathematical Sciences in partial fulfillment of the requirements for the degree of Master of Science
More informationResearch Article On Some Improvements of the Jensen Inequality with Some Applications
Hindawi Publishing Corporation Journal of Inequalities and Applications Volume 2009, Article ID 323615, 15 pages doi:10.1155/2009/323615 Research Article On Some Improvements of the Jensen Inequality with
More informationGoodness of Fit Test and Test of Independence by Entropy
Journal of Mathematical Extension Vol. 3, No. 2 (2009), 43-59 Goodness of Fit Test and Test of Independence by Entropy M. Sharifdoost Islamic Azad University Science & Research Branch, Tehran N. Nematollahi
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationChapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1)
Chapter 2. Binary and M-ary Hypothesis Testing 2.1 Introduction (Levy 2.1) Detection problems can usually be casted as binary or M-ary hypothesis testing problems. Applications: This chapter: Simple hypothesis
More information2882 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 6, JUNE A formal definition considers probability measures P and Q defined
2882 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 6, JUNE 2009 Sided and Symmetrized Bregman Centroids Frank Nielsen, Member, IEEE, and Richard Nock Abstract In this paper, we generalize the notions
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)
More informationST5215: Advanced Statistical Theory
Department of Statistics & Applied Probability Monday, September 26, 2011 Lecture 10: Exponential families and Sufficient statistics Exponential Families Exponential families are important parametric families
More informationRobustness and duality of maximum entropy and exponential family distributions
Chapter 7 Robustness and duality of maximum entropy and exponential family distributions In this lecture, we continue our study of exponential families, but now we investigate their properties in somewhat
More informationStochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions
International Journal of Control Vol. 00, No. 00, January 2007, 1 10 Stochastic Optimization with Inequality Constraints Using Simultaneous Perturbations and Penalty Functions I-JENG WANG and JAMES C.
More informationf-divergence Inequalities
f-divergence Inequalities Igal Sason Sergio Verdú Abstract This paper develops systematic approaches to obtain f-divergence inequalities, dealing with pairs of probability measures defined on arbitrary
More informationOn the Entropy of Sums of Bernoulli Random Variables via the Chen-Stein Method
On the Entropy of Sums of Bernoulli Random Variables via the Chen-Stein Method Igal Sason Department of Electrical Engineering Technion - Israel Institute of Technology Haifa 32000, Israel ETH, Zurich,
More informationf-divergence Inequalities
SASON AND VERDÚ: f-divergence INEQUALITIES f-divergence Inequalities Igal Sason, and Sergio Verdú Abstract arxiv:508.00335v7 [cs.it] 4 Dec 06 This paper develops systematic approaches to obtain f-divergence
More informationarxiv: v2 [cs.it] 26 Sep 2011
Sequences o Inequalities Among New Divergence Measures arxiv:1010.041v [cs.it] 6 Sep 011 Inder Jeet Taneja Departamento de Matemática Universidade Federal de Santa Catarina 88.040-900 Florianópolis SC
More information1: PROBABILITY REVIEW
1: PROBABILITY REVIEW Marek Rutkowski School of Mathematics and Statistics University of Sydney Semester 2, 2016 M. Rutkowski (USydney) Slides 1: Probability Review 1 / 56 Outline We will review the following
More informationBayesian statistics: Inference and decision theory
Bayesian statistics: Inference and decision theory Patric Müller und Francesco Antognini Seminar über Statistik FS 28 3.3.28 Contents 1 Introduction and basic definitions 2 2 Bayes Method 4 3 Two optimalities:
More informationCONVEX FUNCTIONS AND MATRICES. Silvestru Sever Dragomir
Korean J. Math. 6 (018), No. 3, pp. 349 371 https://doi.org/10.11568/kjm.018.6.3.349 INEQUALITIES FOR QUANTUM f-divergence OF CONVEX FUNCTIONS AND MATRICES Silvestru Sever Dragomir Abstract. Some inequalities
More informationf-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models
IEEE Transactions on Information Theory, vol.58, no.2, pp.708 720, 2012. 1 f-divergence Estimation and Two-Sample Homogeneity Test under Semiparametric Density-Ratio Models Takafumi Kanamori Nagoya University,
More informationZOBECNĚNÉ PHI-DIVERGENCE A EM-ALGORITMUS V AKUSTICKÉ EMISI
ZOBECNĚNÉ PHI-DIVERGENCE A EM-ALGORITMUS V AKUSTICKÉ EMISI Jan Tláskal a Václav Kůs Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague 11.11.2010 Concepts What the
More informationSTATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero
STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero 1999 32 Statistic used Meaning in plain english Reduction ratio T (X) [X 1,..., X n ] T, entire data sample RR 1 T (X) [X (1),..., X (n) ] T, rank
More informationFormulas for probability theory and linear models SF2941
Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms
More informationAlgebraic Information Geometry for Learning Machines with Singularities
Algebraic Information Geometry for Learning Machines with Singularities Sumio Watanabe Precision and Intelligence Laboratory Tokyo Institute of Technology 4259 Nagatsuta, Midori-ku, Yokohama, 226-8503
More informationStatistical Inference
Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA Week 12. Testing and Kullback-Leibler Divergence 1. Likelihood Ratios Let 1, 2, 2,...
More informationLecture 1: Introduction
Principles of Statistics Part II - Michaelmas 208 Lecturer: Quentin Berthet Lecture : Introduction This course is concerned with presenting some of the mathematical principles of statistical theory. One
More informationINFORMATION PROCESSING ABILITY OF BINARY DETECTORS AND BLOCK DECODERS. Michael A. Lexa and Don H. Johnson
INFORMATION PROCESSING ABILITY OF BINARY DETECTORS AND BLOCK DECODERS Michael A. Lexa and Don H. Johnson Rice University Department of Electrical and Computer Engineering Houston, TX 775-892 amlexa@rice.edu,
More information4. Conditional risk measures and their robust representation
4. Conditional risk measures and their robust representation We consider a discrete-time information structure given by a filtration (F t ) t=0,...,t on our probability space (Ω, F, P ). The time horizon
More information4 Expectation & the Lebesgue Theorems
STA 205: Probability & Measure Theory Robert L. Wolpert 4 Expectation & the Lebesgue Theorems Let X and {X n : n N} be random variables on a probability space (Ω,F,P). If X n (ω) X(ω) for each ω Ω, does
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationExpectation Maximization
Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /
More informationConvex Functions. Wing-Kin (Ken) Ma The Chinese University of Hong Kong (CUHK)
Convex Functions Wing-Kin (Ken) Ma The Chinese University of Hong Kong (CUHK) Course on Convex Optimization for Wireless Comm. and Signal Proc. Jointly taught by Daniel P. Palomar and Wing-Kin (Ken) Ma
More informationInformation Geometric view of Belief Propagation
Information Geometric view of Belief Propagation Yunshu Liu 2013-10-17 References: [1]. Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, Stochastic reasoning, Free energy and Information Geometry, Neural
More information