Limits from l Hôpital rule: Shannon entropy as limit cases of Rényi and Tsallis entropies

Size: px

Start display at page:

Download "Limits from l Hôpital rule: Shannon entropy as limit cases of Rényi and Tsallis entropies"

Letitia Ryan
5 years ago
Views:

1 Limits from l Hôpital rule: Shannon entropy as it cases of Rényi and Tsallis entropies Frank Nielsen École Polytechnique/Sony Computer Science Laboratorie, Inc August 6, 200 Let us prove that both Rényi extensive R α and Tsallis non-extensive T α entropies of order α tend to Shannon entropy when α using l Hôpital rule. Following the same principle, we then show that both Rényi and Tsallis information divergences tend to the celebrated Kullback-Leibler divergence when α. L Hôpital rule dates back to the 7th century, and states that the it of the indeterminate ratio of functions equals to the it of the ratio of their derivatives provided that (i) the its of both the numerator and denominator coincide (either to, 0, or ), and that (ii) the it of the ratio of the derivatives also exists. That is, if x α f(x) = x α g(x) = 0 and x α f (x)/g (x) = l exists, then f(x) x α g(x) = f (x) x α g (x) = l. (Although the its of f and g can alternatively be chosen as ± instead of 0, we shall use below the theorem to solve 0/0 indeterminate forms.) Rényi entropy Consider Rényi entropy [2] R α (P ) = log n α i= pα i (for α > 0 and α ), and let us prove that R α (P ) tends to Shannon entropy S(P ) = n i= p i log p i

2 when α. Set f(α) = log n i= pα i (for any fixed distribution P ) and g(α) = α. Then dg(α) = and dα df(α) dα = n i= d dα (pα i ) n i= pα i after applying the derivative chain rule. Since d dα (pα i ) = d dα eα log p i = (log p i )e α log p i = p α i log p i, we get n g (α) = p α n i log p i, and α g (α) = p i log p i. i= Since α f(α) = α g(α) = 0 and α g (α) = n i= p i log p i, we deduce from l Hôpital rule that α R α (P ) = H(P ). That is, Rényi entropy tends to Shannon entropy as α. 2 Tsallis entropy Consider now the Tsallis entropy [3] T α of order α (for α R)): T α (P ) = i= pα i α. Let us prove using l Hôpital rule that α T α (P ) = S(p) = i p log p, Shannon entropy. For a fixed distribution P, write T α(p ) = f(α) g(α) with g(α) = α and g (α) =. We have f(α) = i= pα i and = i pα i log p i since (p α i ) = (e α log p i ) = p α i log p i. Since both its of f(α) and g(α) tend to 0 as α, we can apply l Hôpital rule and get α T α (P ) = α g (α) = i p i log p i = H(P ), namely Shannon entropy. 3 Rényi and Tsallis information divergences Based on Rényi and Tsallis entropies, we define the corresponding information divergences [4, ]: i= R α (P : Q) = T α (P : Q) = α log i α ( i p α i q α i, α > 0 and α 0 p α i q α i ), α R. 2

3 Let us again apply l Hôpital rule to show that those parametric information divergences tend to the Kullback-Leibler divergence when α. Consider f(α) = i= pα i q α i (for fixed distributions P and Q), using the derivative chain rule we get = i pα i q α i log p i p α i q α i log = i pα i q α i log p i using the fact that (q α i ) = (e ( α) log ) = log (e ( α) log ) = q α i log. Let g(α) = α, and r(α) = log f(α). We have g (α) = and r (α) = f(α) = i pα i q α i log p i q i i= pα i q α i. We are ready to apply l Hôpital rule since α r(x) = α g(x) = 0 and α r (x) = i p i log p i and α g (x) = to conclude that Rényi information divergence tends to α R α(p : Q) = i p i log p i = KL(P : Q), i.e., the Kullback-Leibler divergence, also known as the relative entropy with respect to Shannon entropy. Similarly, for Tsallis information divergence, we set t(α) = f(α) = i pα i q α i and get t (α) = = i pα i q α i log p i. We apply l Hôpital rule and conclude that Tsallis information tends to the Kullback-Leibler divergence: α T α(p : Q) = i p i log p i = KL(P : Q). Note that those Rényi and Tsallis information divergences are mutually related to each other by monotonic transformations:, R α (P : Q) = log( + (α )T (P : Q)), () α T α (P : Q) = e(α )Rα(P :Q) (2) α In particular for α, using a first-order approximation log(+x) x 0 x, we find that R α(p : Q) α α α (α )T (P : Q) α T α (P : Q) (Similarly, we have exp(x) x + x, so that using E q.?? we deduce that T α (P : Q) α R α (P : Q).) 3

4 Although both extensive Rényi and non-extensive Tsallis information divergences meet in the it case α =, those divergences behave very differently from the classical Kullback-Leibler divergence. Indeed, the Kullback- Leibler divergence can be decomposed as KL(P : Q) = H(P, Q) H(P, P ), that is the cross-entropy H(P : Q) = i p i log minus the entropy H(P ) = H(P, P ) = i p i log p i. Such a sum-decomposition does not apply to Rényi/Tsallis information divergences. (Unfortunately, Rényi/Tsallis information divergences are also sometimes misleadingly called cross-entropies in the literature. Indeed, this is confusing since in that case the entropy cannot be considered as a self cross-entropy, that would otherwise yields an inconsistent zero!) 4 Proof of l Hôpital rule Consider the case of the indeterminate ratio 0/0. The proof of l Hôpital rule relies on Cauchy mean value theorem that states for a C -function f on [a, b] there exists c (a, b) such that f (c) = f(b) f(a). b a First, let us define by continuity f(α) = g(α) = 0, and let x α g (α) = l. There exists x (α δ, α + δ)\{α} such that both f (x) and g (x) exist and g (x) 0. Suppose without loss of generality that x (α, α + δ), then the mean value theorem implies that g(x) 0 (otherwise by contradiction, we would have y (α, x) with g (y) = 0). Thus according to the mean value theorem there exists a point ɛ x (α, x) such that f(x) g(x) = f (ɛ x ) g (ɛ x ) When x α, we have ɛ x α and since the ratio of the derivatives exists, we deduce that f(x) x α g(x) = f (ɛ x ) x α g (ɛ x ) = f (x) x α g (x) = l. 4

Figure : L Hospital textbook on infinitesimal calculus: Cover pages of the original anonymous edition of M.DC.XCVI=696 (left) and of the second edition of MDCCXVI=76 (right).

old French) hired Swiss mathematician Johann Bernoulli (667 748) to lecture him on mathematics.

5 Figure : L Hospital textbook on infinitesimal calculus: Cover pages of the original anonymous edition of M.DC.XCVI=696 (left) and of the second edition of MDCCXVI=76 (right). 5 Historical notes Although the theorem is nowadays cited as l Hôpital rule, it is commonly believed to be partly his work since French nobleman Guillaume de l Hôpital (66-704, spelled l Hospital in old French) hired Swiss mathematician Johann Bernoulli ( ) to lecture him on mathematics. L Hôpital wrote and published in 696 the very first textbook on infinitesimal calculus (a precursor of modern differential calculus omitting voluntarily integration) titled: Analysis of the infinitely small to understand curves (see Figure ). The name of Bernoulli is acknowledged in the preface as the mathematician that pursued on the inspiring preinary work of Leibniz (646 76). 5

6 References [] Andrzej Cichocki, Anh Huy Phan, and Rafal Zdunek. Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation; electronic version. Wiley, [2] Alfred Rényi. On measures of entropy and information. In Proc. 4th Berkeley Symp. Math. Stat. and Prob., volume, pages , 96. [3] Constantino Tsallis. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Physics, 52, 988. [4] Tim van Erven and Peter Harremoës. Rényi divergence and its properties. CoRR, abs/ ,

Convexity/Concavity of Renyi Entropy and α-mutual Information

Convexity/Concavity of Renyi Entropy and -Mutual Information Siu-Wai Ho Institute for Telecommunications Research University of South Australia Adelaide, SA 5095, Australia Email: siuwai.ho@unisa.edu.au