arXiv:1503.03417v4 [cs.IT] 17 Oct 2015


Upper Bounds on the Relative Entropy and Rényi Divergence as a Function of Total Variation Distance for Finite Alphabets

Igal Sason
Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa 32000, Israel
E-mail: sason@ee.technion.ac.il

Sergio Verdú
Department of Electrical Engineering, Princeton University, Princeton, New Jersey 08544, USA
E-mail: verdu@princeton.edu

Abstract
A new upper bound on the relative entropy is derived as a function of the total variation distance for probability measures defined on a common finite alphabet. The bound improves a previously reported bound by Csiszár and Talata. It is further extended to an upper bound on the Rényi divergence of an arbitrary non-negative order (including $\infty$) as a function of the total variation distance.

Keywords: Pinsker's inequality, relative entropy, relative information, Rényi divergence, total variation distance.

1. INTRODUCTION

Consider two probability distributions $P$ and $Q$ defined on a common measurable space $(\mathcal{A}, \mathscr{F})$. The Csiszár-Kemperman-Kullback-Pinsker inequality (a.k.a. Pinsker's inequality) states that

$$\tfrac{1}{2}\,|P-Q|^2 \log e \le D(P\|Q) \tag{1}$$

where

$$D(P\|Q) = \mathbb{E}_P\!\left[\log\frac{dP}{dQ}\right] = \int_{\mathcal{A}} dP\,\log\frac{dP}{dQ} \tag{2}$$

designates the relative entropy (a.k.a. the Kullback-Leibler divergence) from $P$ to $Q$, and

$$|P-Q| = 2\sup_{F\in\mathscr{F}} |P(F)-Q(F)| \tag{3}$$

is the total variation distance between $P$ and $Q$.

In the case where the probability measures $P$ and $Q$ are defined on a common discrete (i.e., finite or countable) set $\mathcal{A}$,

$$D(P\|Q) = \sum_{a\in\mathcal{A}} P(a)\log\frac{P(a)}{Q(a)}, \tag{4}$$

$$|P-Q| = \sum_{a\in\mathcal{A}} |P(a)-Q(a)|. \tag{5}$$

One of the implications of (1) is that convergence in relative entropy implies convergence in total variation distance.
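As a quick numerical illustration of (1), (4) and (5), the following minimal sketch computes the relative entropy and the total variation distance for two distributions on a three-letter alphabet. The function names and the example distributions are hypothetical choices made for illustration; natural logarithms are used, so that $\log e = 1$ and all quantities are in nats.

```python
import math

def relative_entropy(p, q):
    """D(P||Q) in nats as in (4); terms with P(a) = 0 contribute zero."""
    total = 0.0
    for pa, qa in zip(p, q):
        if pa > 0:
            if qa == 0:
                return math.inf  # P is not absolutely continuous w.r.t. Q
            total += pa * math.log(pa / qa)
    return total

def total_variation(p, q):
    """|P - Q| as in (5): the sum of absolute differences, a value in [0, 2]."""
    return sum(abs(pa - qa) for pa, qa in zip(p, q))

# Hypothetical example distributions (not taken from the paper).
P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]

d = relative_entropy(P, Q)
tv = total_variation(P, Q)
pinsker_lower = 0.5 * tv ** 2  # left side of (1) with log e = 1

print(f"D(P||Q) = {d:.6f} nats")
print(f"|P - Q| = {tv:.6f}")
print(f"Pinsker: {pinsker_lower:.6f} <= {d:.6f} is {pinsker_lower <= d}")
```

With this normalization of the total variation distance (twice the supremum in (3)), the constant in Pinsker's inequality is 1/2.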
The total variation distance is bounded, $|P-Q| \le 2$, whereas the relative entropy is an unbounded information measure. Improved versions of Pinsker's inequality were studied, e.g., in [9], [10], [14], [17], [22].

A reverse Pinsker inequality providing an upper bound on the relative entropy in terms of the total variation distance does not exist in general, since we can find distributions that are arbitrarily close in total variation but with arbitrarily high relative entropy. Nevertheless, it is possible to introduce constraints under which such reverse Pinsker inequalities can be obtained. In the case of a finite alphabet $\mathcal{A}$, Csiszár and Talata [6, p. 1012] show that

$$D(P\|Q) \le \left(\frac{\log e}{Q_{\min}}\right) |P-Q|^2 \tag{6}$$

where

$$Q_{\min} = \min_{a\in\mathcal{A}} Q(a). \tag{7}$$

Recent applications of (6) can be found in [12, Appendix D] and [21, Lemma 7] for the analysis of the third-order asymptotics of the discrete memoryless channel with or without cost constraints. In addition to $Q_{\min}$ in (7), the bounds in this paper involve

$$\beta_1 = \min_{a\in\mathcal{A}} \frac{Q(a)}{P(a)}, \tag{8}$$

$$\beta_2 = \min_{a\in\mathcal{A}} \frac{P(a)}{Q(a)} \tag{9}$$

so $\beta_1, \beta_2 \in [0,1]$.

In this paper, Section 2 derives a reverse Pinsker inequality for probability measures defined on a common finite set, improving the bound in (6). The utility of this inequality is studied in Section 3, and it is extended in Section 4 to Rényi divergences of an arbitrary non-negative order.

2. A NEW REVERSE PINSKER INEQUALITY FOR DISTRIBUTIONS ON A FINITE SET

The present section introduces a strengthened version of (6), followed by some remarks and an example.
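Before the main result, the quantities that enter the bounds, namely $Q_{\min}$ in (7), $\beta_1$ in (8), $\beta_2$ in (9), and the Csiszár-Talata bound (6), can be computed directly. The sketch below is an illustration under stated assumptions: the helper names and the example distributions are hypothetical, $Q$ is assumed strictly positive, and natural logarithms are used (so $\log e = 1$).

```python
import math

def reverse_pinsker_ingredients(p, q):
    """Return (Q_min, beta_1, beta_2) as in (7)-(9); Q must be strictly positive."""
    q_min = min(q)
    # beta_1 = min Q(a)/P(a); letters with P(a) = 0 give an infinite ratio and are skipped.
    beta1 = min(qa / pa for pa, qa in zip(p, q) if pa > 0)
    # beta_2 = min P(a)/Q(a).
    beta2 = min(pa / qa for pa, qa in zip(p, q))
    return q_min, beta1, beta2

def csiszar_talata_bound(p, q):
    """Right side of (6) in nats: |P - Q|^2 / Q_min."""
    tv = sum(abs(pa - qa) for pa, qa in zip(p, q))
    return tv ** 2 / min(q)

# Hypothetical example distributions (not taken from the paper).
P = [0.5, 0.3, 0.2]
Q = [0.4, 0.4, 0.2]

q_min, beta1, beta2 = reverse_pinsker_ingredients(P, Q)
d = sum(pa * math.log(pa / qa) for pa, qa in zip(P, Q) if pa > 0)
ct = csiszar_talata_bound(P, Q)

print(f"Q_min = {q_min}, beta_1 = {beta1:.4f}, beta_2 = {beta2:.4f}")
print(f"D(P||Q) = {d:.6f} <= Csiszar-Talata bound = {ct:.6f}")
```

Both ratios in (8) and (9) are at most 1 because two probability vectors cannot dominate each other strictly in every coordinate, which is why $\beta_1, \beta_2 \in [0,1]$.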

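A. Main Result and Proof

Theorem 1. Let $P$ and $Q$ be probability measures defined on a common finite set $\mathcal{A}$, and assume that $Q$ is strictly positive on $\mathcal{A}$. Then, the following inequality holds:

$$D(P\|Q) \le \log\left(1 + \frac{|P-Q|^2}{2Q_{\min}}\right) - \frac{\beta_2 \log e}{2}\,|P-Q|^2 \tag{10}$$

$$\le \log\left(1 + \frac{|P-Q|^2}{2Q_{\min}}\right) \tag{11}$$

where $Q_{\min}$ and $\beta_2$ are given in (7) and (9), respectively.

Proof: Theorem 1 is proved by obtaining upper and lower bounds on the $\chi^2$-divergence from $P$ to $Q$,

$$\chi^2(P\|Q) \triangleq \sum_{a\in\mathcal{A}} \frac{(P(a)-Q(a))^2}{Q(a)}. \tag{12}$$

A lower bound follows by invoking Jensen's inequality:

$$\chi^2(P\|Q) = \sum_{a\in\mathcal{A}} \frac{P^2(a)}{Q(a)} - 1 \tag{13}$$

$$= \sum_{a\in\mathcal{A}} P(a)\exp\left(\log\frac{P(a)}{Q(a)}\right) - 1 \tag{14}$$

$$\ge \exp\left(\sum_{a\in\mathcal{A}} P(a)\log\frac{P(a)}{Q(a)}\right) - 1 \tag{15}$$

$$= \exp\bigl(D(P\|Q)\bigr) - 1. \tag{16}$$

Alternatively, (16) can be obtained by combining the equality

$$\chi^2(P\|Q) = \exp\bigl(D_2(P\|Q)\bigr) - 1 \tag{17}$$

with the monotonicity of the Rényi divergence $D_\alpha(P\|Q)$ in $\alpha$, which implies that $D_2(P\|Q) \ge D(P\|Q)$.

A refined version of (16) is derived in the following. The starting point is a refined version of Jensen's inequality in [20], generalizing a result from [7], which leads to (see [20, Theorem 7])

$$\min_{a\in\mathcal{A}} \frac{P(a)}{Q(a)}\; D(Q\|P) \le \log\bigl(1+\chi^2(P\|Q)\bigr) - D(P\|Q) \tag{18}$$

$$\le \max_{a\in\mathcal{A}} \frac{P(a)}{Q(a)}\; D(Q\|P). \tag{19}$$

From (18) and the definition of $\beta_2$ in (9), we have

$$\chi^2(P\|Q) \ge \exp\bigl(D(P\|Q) + \beta_2\, D(Q\|P)\bigr) - 1 \tag{20}$$

$$\ge \exp\left(D(P\|Q) + \frac{\beta_2 \log e}{2}\,|P-Q|^2\right) - 1 \tag{21}$$

where (21) follows from Pinsker's inequality (1) applied to $D(Q\|P)$. Note that the lower bound in (21) refines the lower bound in (16) since $\beta_2 \in [0,1]$.

An upper bound on $\chi^2(P\|Q)$ is derived as follows:

$$\chi^2(P\|Q) = \sum_{a\in\mathcal{A}} \frac{(P(a)-Q(a))^2}{Q(a)} \le \frac{\sum_{a\in\mathcal{A}} (P(a)-Q(a))^2}{Q_{\min}} = \frac{\|P-Q\|_2^2}{Q_{\min}} \tag{22}$$

$$\le \frac{|P-Q|\,\max_{a\in\mathcal{A}} |P(a)-Q(a)|}{Q_{\min}} \tag{23}$$

and, from (3),

$$|P-Q| \ge 2\max_{a\in\mathcal{A}} |P(a)-Q(a)|. \tag{24}$$

Combining (23) and (24) yields

$$\chi^2(P\|Q) \le \frac{|P-Q|^2}{2Q_{\min}}. \tag{25}$$

Finally, (10) follows by combining the lower and upper bounds on the $\chi^2$-divergence in (21) and (25), and (11) follows by dropping the non-negative term subtracted in (10).

Remark 1. It is easy to check that Theorem 1 strengthens the bound by Csiszár and Talata in (6) by at least a factor of 2, since upper bounding the logarithm in (10) gives

$$D(P\|Q) \le \frac{1-\beta_2 Q_{\min}}{2Q_{\min}}\,\log e\;|P-Q|^2. \tag{26}$$

In the finite-alphabet case, we can obtain another upper bound on $D(P\|Q)$ as a function of the $\ell_2$ norm $\|P-Q\|_2$:

$$D(P\|Q) \le \log\left(1 + \frac{\|P-Q\|_2^2}{Q_{\min}}\right) - \frac{\beta_2 \log e}{2}\,\|P-Q\|_2^2 \tag{27}$$

which follows by combining (21), (22), and $\|P-Q\|_2 \le |P-Q|$. Using the inequality $\log(1+x) \le x\log e$ for $x \ge 0$ in the right side of (27), and also loosening this bound by ignoring the term $\frac{\beta_2\log e}{2}\|P-Q\|_2^2$, we recover the bound

$$D(P\|Q) \le \frac{\|P-Q\|_2^2 \,\log e}{Q_{\min}} \tag{28}$$

which appears in the proof of Property 4 of [21, Lemma 7], and is also used in [12].

Remark 2. The lower bounds on the $\chi^2$-divergence in (16) and (21) improve the one in [6, Lemma 6.3], which states that $D(P\|Q) \le \chi^2(P\|Q)\,\log e$.

Remark 3. Reverse Pinsker inequalities have also been derived in quantum information theory [1], [2], providing upper bounds on the relative entropy of two quantum states as a function of the trace norm distance when the minimal eigenvalues of the states are positive (cf. [1, Theorem 6] and [2]). These types of bounds are akin to the weakened form in (11). When the variational distance is much smaller than the minimal eigenvalue, the latter bounds have a quadratic scaling in this distance, similarly to (11); they are also inversely proportional to the minimal eigenvalue, similarly to the dependence of (11) on $Q_{\min}$.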
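The chain of inequalities in the proof of Theorem 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names, the random sampling scheme, and the tolerance are hypothetical choices, and natural logarithms are used (so $\log e = 1$). It evaluates the lower bound (21) and the upper bound (25) on the $\chi^2$-divergence, the bounds (10) and (11), and the Csiszár-Talata bound (6) on randomly drawn strictly positive distributions.

```python
import math
import random

def theorem1_quantities(p, q):
    """Quantities from the proof of Theorem 1, in nats; Q must be strictly positive."""
    tv = sum(abs(pa - qa) for pa, qa in zip(p, q))
    d = sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)
    chi2 = sum((pa - qa) ** 2 / qa for pa, qa in zip(p, q))
    q_min = min(q)
    beta2 = min(pa / qa for pa, qa in zip(p, q))
    log_term = math.log(1.0 + tv ** 2 / (2.0 * q_min))
    return {
        "D": d,
        "chi2": chi2,
        "chi2_lower_21": math.exp(d + 0.5 * beta2 * tv ** 2) - 1.0,
        "chi2_upper_25": tv ** 2 / (2.0 * q_min),
        "bound_10": log_term - 0.5 * beta2 * tv ** 2,
        "bound_11": log_term,
        "bound_6": tv ** 2 / q_min,
    }

def random_distribution(rng, size):
    """Draw a strictly positive probability vector (hypothetical sampling scheme)."""
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
TOL = 1e-12  # slack for floating-point rounding
all_hold = True
for _ in range(1000):
    p = random_distribution(rng, 4)
    q = random_distribution(rng, 4)
    r = theorem1_quantities(p, q)
    all_hold = all_hold and (
        r["chi2_lower_21"] <= r["chi2"] + TOL          # (21)
        and r["chi2"] <= r["chi2_upper_25"] + TOL      # (25)
        and r["D"] <= r["bound_10"] + TOL              # (10)
        and r["bound_10"] <= r["bound_11"] + TOL       # (11)
        and r["bound_11"] <= 0.5 * r["bound_6"] + TOL  # factor of 2 in Remark 1
    )

print("all inequalities hold on the sample:", all_hold)
```

The last comparison reflects Remark 1: since $\log(1+x) \le x$ in nats, the bound (11) is at most half of the right side of (6).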

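3. APPLICATIONS OF THEOREM 1

A. The Exponential Decay of the Probability for a Non-Typical Sequence

To exemplify the utility of Theorem 1, we bound the function

$$L_\delta(Q) = \min_{P \notin T_\delta(Q)} D(P\|Q) \tag{29}$$

where we have denoted the subset of probability measures on $(\mathcal{A}, \mathscr{F})$ which are $\delta$-close to $Q$ as

$$T_\delta(Q) = \bigl\{P : \forall\, a\in\mathcal{A},\ |P(a)-Q(a)| \le \delta\, Q(a)\bigr\}. \tag{30}$$

Note that $(a_1,\ldots,a_n)$ is strongly $\delta$-typical according to $Q$ if its empirical distribution belongs to $T_\delta(Q)$. According to Sanov's theorem (e.g., [5, Theorem 11.4.1]), if the random variables are independent and distributed according to $Q$, then the probability that $(Y_1,\ldots,Y_n)$ is not $\delta$-typical vanishes exponentially with exponent $L_\delta(Q)$.

To state the next result, we invoke the following notions from [14]. Given a probability measure $Q$, its balance coefficient is given by

$$\beta_Q = \inf_{A\in\mathscr{F}:\ Q(A)\ge\frac12} Q(A). \tag{31}$$

The function $\varphi\colon (0,\tfrac12] \to [\tfrac12\log e, \infty)$ is given by

$$\varphi(p) = \begin{cases} \dfrac{1}{4(1-2p)}\,\log\dfrac{1-p}{p}, & p\in(0,\tfrac12), \\[2mm] \tfrac12\log e, & p=\tfrac12. \end{cases} \tag{32}$$

Theorem 2. If $Q_{\min} > 0$, then

$$\varphi(1-\beta_Q)\,Q_{\min}^2\,\delta^2 \le L_\delta(Q) \tag{33}$$

$$\le \log\bigl(1+2Q_{\min}\delta^2\bigr) \tag{34}$$

where (34) holds if $\delta \le Q_{\min}^{-1}-1$.

Proof: Ordentlich and Weinberger [14, Section 4] show the refinement of Pinsker's inequality:

$$\varphi(1-\beta_Q)\,|P-Q|^2 \le D(P\|Q). \tag{35}$$

Note that if $Q_{\min} > 0$ then $\beta_Q < 1$, and therefore $\varphi(1-\beta_Q)$ is well defined and finite. If $P\notin T_\delta(Q)$, the simple bound

$$|P-Q| > \delta\, Q_{\min} \tag{36}$$

together with (35) yields (33). The upper bound (34) follows from (11) and the fact that if $\delta \le Q_{\min}^{-1}-1$, then

$$\min_{P\notin T_\delta(Q)} |P-Q| = 2\delta\, Q_{\min}. \tag{37}$$

If $\delta \le Q_{\min}^{-1}-1$, the ratio between the upper and lower bounds in (33) and (34) satisfies

$$\frac{\log\bigl(1+2Q_{\min}\delta^2\bigr)}{\varphi(1-\beta_Q)\,Q_{\min}^2\delta^2} = \frac{1}{Q_{\min}}\cdot\frac{\tfrac12\log e}{\varphi(1-\beta_Q)}\cdot\frac{\log\bigl(1+2Q_{\min}\delta^2\bigr)}{\tfrac12\log e\; Q_{\min}\delta^2} \le \frac{4}{Q_{\min}} \tag{38}$$

where (38) follows from the fact that its second and third factors are less than or equal to 1 and 4, respectively. Note that the bounds in (33) and (34) scale like $\delta^2$ for $\delta \approx 0$.

B. Distance from Equiprobable

If $P$ is a distribution on a finite set $\mathcal{A}$, $H(P)$ gauges the "distance" from $U$, the equiprobable distribution, since

$$H(P) = \log|\mathcal{A}| - D(P\|U). \tag{39}$$

Thus, it is of interest to explore the relationship between $H(P)$ and $|P-U|$. Particularizing (1), the inequality of Bretagnolle and Huber [4] (see also [24]), and (11), we obtain

$$\tfrac12\,|P-U|^2\log e \le \log|\mathcal{A}| - H(P), \tag{40}$$

$$|P-U| \le 2\sqrt{1-\frac{1}{|\mathcal{A}|}\exp\bigl(H(P)\bigr)}, \tag{41}$$

$$|P-U| \ge \sqrt{2\left(\exp\bigl(-H(P)\bigr)-\frac{1}{|\mathcal{A}|}\right)}, \tag{42}$$

respectively.

[Figure 1: two panels, $|\mathcal{A}| = 4$ and $|\mathcal{A}| = 16$; horizontal axis $H(P)$ in bits, vertical axis $|P-U|$; curves labeled (a), (b), (c).]

Fig. 1. Bounds on $|P-U|$ as a function of $H(P)$ for $|\mathcal{A}| = 4$ and $|\mathcal{A}| = 16$. The point $(H(P), |P-U|) = (0,\ 2(1-1/|\mathcal{A}|))$ is depicted on the y-axis. In the curves of the two plots, the bounds (a), (b) and (c) refer, respectively, to (40), (41) and (42).

The bounds in (40)-(42) are illustrated for $|\mathcal{A}| = 4, 16$ in Figure 1. For $H(P) = 0$, $|P-U| = 2(1-1/|\mathcal{A}|)$ is shown for reference in Figure 1; as the cardinality of the alphabet increases, the gap between $|P-U|$ and its upper bound is reduced (and this gap decays asymptotically to zero).

Results on the more general problem of finding bounds on $|H(P)-H(Q)|$ based on $|P-Q|$ can be found in [5, Theorem 17.3.3], [11], [16], [18], [26, Section 1.7] and [27].

4. EXTENSION OF THEOREM 1 TO RÉNYI DIVERGENCES

Definition 1. The Rényi divergence of order $\alpha\in[0,\infty]$ from $P$ to $Q$ is defined for $\alpha\in(0,1)\cup(1,\infty)$ as

$$D_\alpha(P\|Q) \triangleq \frac{1}{\alpha-1}\,\log\sum_{a\in\mathcal{A}} P^\alpha(a)\,Q^{1-\alpha}(a). \tag{43}$$

Recall that $D_1(P\|Q) \triangleq D(P\|Q)$ is defined to be the analytic extension of $D_\alpha(P\|Q)$ at $\alpha = 1$ (if $D(P\|Q)$ is finite, L'Hôpital's rule gives that $D(P\|Q) = \lim_{\alpha\uparrow 1} D_\alpha(P\|Q)$). The extreme cases of $\alpha = 0, \infty$ are defined as follows:
If $\alpha = 0$ then $D_0(P\|Q) = -\log Q(\mathrm{Support}(P))$.
If $\alpha = +\infty$ then $D_\infty(P\|Q) = \log\sup_{a\in\mathcal{A}} \frac{P(a)}{Q(a)}$.

Pinsker's inequality was extended by Gilardoni [10] for a Rényi divergence of order $\alpha\in(0,1]$ (see also [8, Theorem 31]), and it gets the form

$$\frac{\alpha}{2}\,|P-Q|^2\log e \le D_\alpha(P\|Q).$$

A tight lower bound on the Rényi divergence of order $\alpha > 0$ as a function of the total variation distance is given in [19], which is consistent with Vajda's tight lower bound for f-divergences in [23, Theorem 3]. Motivated by these findings, we extend the upper bound on the relative entropy in Theorem 1 to Rényi divergences of an arbitrary order.

Theorem 3. Assume that $P$, $Q$ are strictly positive with minimum masses denoted by $P_{\min}$ and $Q_{\min}$, respectively. Let $\beta_1$ and $\beta_2$ be given in (8) and (9), respectively, and abbreviate $\delta \triangleq \tfrac12|P-Q| \in [0,1]$. Then, the Rényi divergence of order $\alpha\in[0,\infty]$ satisfies

$$D_\alpha(P\|Q) \le \begin{cases} f_1, & \alpha\in(2,\infty] \\ f_2, & \alpha\in[1,2] \\ \min\{f_2, f_3, f_4\}, & \alpha\in(\tfrac12,1) \\ \min\bigl\{2\log\tfrac{1}{1-\delta},\ f_2,\ f_3,\ f_4\bigr\}, & \alpha\in[0,\tfrac12] \end{cases} \tag{44}$$

where, for $\alpha\in[0,\infty]$,

$$f_1(\alpha,\beta_1,\delta) \triangleq \begin{cases} \dfrac{1}{\alpha-1}\,\log\left(1+\dfrac{\delta\,(\beta_1^{1-\alpha}-1)}{1-\beta_1}\right), & \alpha\in[0,1)\cup(1,\infty) \\[2mm] \dfrac{\delta}{1-\beta_1}\,\log\dfrac{1}{\beta_1}, & \alpha=1 \\[2mm] \log\dfrac{1}{\beta_1}, & \alpha=\infty \end{cases} \tag{45}$$

and, for $\alpha\in[0,2]$,

$$f_2(\alpha,\beta_1,Q_{\min},\delta) \triangleq \min\left\{ f_1(\alpha,\beta_1,\delta),\ \log\left(1+\frac{2\delta^2}{Q_{\min}}\right)\right\} \tag{46}$$

and, for $\alpha\in[0,1)$, $f_3$ and $f_4$ are given by

$$f_3(\alpha,P_{\min},\beta_1,\delta) \triangleq \frac{\alpha}{1-\alpha}\left[\log\left(1+\frac{2\delta^2}{P_{\min}}\right) - 2\beta_1\delta^2\log e\right], \tag{47}$$

$$f_4(\beta_2,Q_{\min},\delta) \triangleq \min\left\{\log\left(1+\frac{2\delta^2}{Q_{\min}}\right) - 2\beta_2\delta^2\log e,\ \ldots\right\}. \tag{48}$$

Proof: See [20, Section 7.C].

Remark 4. A simple bound, albeit looser than the one in Theorem 3, is

$$D_\alpha(P\|Q) \le \log\left(1+\frac{|P-Q|}{2Q_{\min}}\right) \tag{49}$$

which is asymptotically tight as $\alpha\to\infty$ in the case of a binary alphabet with equiprobable $Q$.

Example 1. Figure 2 illustrates the bound in (45), which is valid for all $\alpha\in[0,\infty]$ (see [20, Theorem 3]), and the upper bounds of Theorem 3 in the case of binary alphabets.

5. SUMMARY

We derive in this paper some reverse Pinsker inequalities for probability measures $P \ll Q$ defined on a common finite set, which provide lower bounds on the total variation distance $|P-Q|$ as a function of the relative entropy $D(P\|Q)$ under the assumption of a bounded relative information or $Q_{\min} > 0$. More general results for an arbitrary alphabet are available in [20, Section 5]. In [20], we study bounds among various f-divergences, dealing with arbitrary alphabets and deriving bounds on the ratios of various distance measures. New expressions of the Rényi divergence in terms of the relative information spectrum are derived, leading to upper and lower bounds on the Rényi divergence in terms of the variational distance.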
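As a numerical companion to Definition 1 and Remark 4, the following sketch evaluates the Rényi divergence (43) for several orders on a binary alphabet with equiprobable $Q$ and compares it with the simple bound (49) and with the Pinsker-type lower bound for $\alpha\in(0,1]$ stated in Section 4. The function name and the example distributions are hypothetical; $P$ and $Q$ are assumed strictly positive, and natural logarithms are used (so $\log e = 1$).

```python
import math

def renyi_divergence(p, q, alpha):
    """D_alpha(P||Q) in nats as in (43), for strictly positive P and Q."""
    if alpha == 1:
        # Order 1 is the relative entropy (analytic extension of (43)).
        return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q))
    if alpha == math.inf:
        return math.log(max(pa / qa for pa, qa in zip(p, q)))
    s = sum(pa ** alpha * qa ** (1.0 - alpha) for pa, qa in zip(p, q))
    return math.log(s) / (alpha - 1.0)

# Hypothetical binary example with equiprobable Q (not taken from the paper).
P = [0.7, 0.3]
Q = [0.5, 0.5]

tv = sum(abs(pa - qa) for pa, qa in zip(P, Q))
simple_bound = math.log(1.0 + tv / (2.0 * min(Q)))  # right side of (49)

orders = [0.0, 0.5, 1, 2.0, 10.0, math.inf]
values = [renyi_divergence(P, Q, a) for a in orders]

for a, v in zip(orders, values):
    print(f"alpha = {a}: D_alpha = {v:.6f} <= {simple_bound:.6f}")

# Pinsker-type lower bound (alpha / 2) |P - Q|^2 <= D_alpha for alpha in (0, 1].
pinsker_type_ok = all(
    0.5 * a * tv ** 2 <= renyi_divergence(P, Q, a) for a in (0.25, 0.5, 0.75, 1)
)
print("Pinsker-type lower bound holds:", pinsker_type_ok)
```

In this example the value at $\alpha = \infty$ coincides with the right side of (49), in line with the tightness statement of Remark 4.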

[Figure 2: Rényi divergence $D_\alpha(P\|Q)$ in nats, with curves labeled (a) and (b).]

Fig. 2. The Rényi divergence $D_\alpha(P\|Q)$ for $P$ and $Q$ which are defined on a binary alphabet, compared to (a) its upper bound in (44), and (b) its upper bound in (45) (see [20, Theorem 3]). The two bounds coincide here on part of the range of $\alpha$.

ACKNOWLEDGMENT

The work of I. Sason has been supported by the Israeli Science Foundation (ISF) under Grant 12/12, and the work of S. Verdú has been supported by the US National Science Foundation under Grant CCF-1016625, and in part by the Center for Science of Information, an NSF Science and Technology Center under Grant CCF-0939370.

REFERENCES

[1] K. M. R. Audenaert and J. Eisert, "Continuity bounds on the quantum relative entropy," Journal of Mathematical Physics, vol. 46, paper 102104, October 2005.
[2] K. M. R. Audenaert and J. Eisert, "Continuity bounds on the quantum relative entropy - II," Journal of Mathematical Physics, vol. 52, paper 112201, November 2011.
[3] G. Böcherer and B. C. Geiger, "Optimal quantization for distribution synthesis," March 2015. Available at http://arxiv.org/abs/1307.6843.
[4] J. Bretagnolle and C. Huber, "Estimation des densités: risque minimax," Probability Theory and Related Fields, vol. 47, no. 2, pp. 119-137, 1979.
[5] T. M. Cover and J. A. Thomas, Elements of Information Theory, second edition, John Wiley & Sons, 2006.
[6] I. Csiszár and Z. Talata, "Context tree estimation for not necessarily finite memory processes, via BIC and MDL," IEEE Trans. on Information Theory, vol. 52, no. 3, pp. 1007-1016, March 2006.
[7] S. S. Dragomir, "Bounds for the normalized Jensen functional," Bulletin of the Australian Mathematical Society, vol. 74, no. 3, pp. 471-478, 2006.
[8] T. van Erven and P. Harremoës, "Rényi divergence and Kullback-Leibler divergence," IEEE Trans. on Information Theory, vol. 60, no. 7, pp. 3797-3820, July 2014.
[9] A. A. Fedotov, P. Harremoës and F. Topsøe, "Refinements of Pinsker's inequality," IEEE Trans. on Information Theory, vol. 49, no. 6, pp. 1491-1498, June 2003.
[10] G. L. Gilardoni, "On Pinsker's and Vajda's type inequalities for Csiszár's f-divergences," IEEE Trans. on Information Theory, vol. 56, no. 11, pp. 5377-5386, November 2010.
[11] S. W. Ho and R. W. Yeung, "The interplay between entropy and variational distance," IEEE Trans. on Information Theory, vol. 56, no. 12, pp. 5906-5929, December 2010.
[12] V. Kostina and S. Verdú, "Channels with cost constraints: strong converse and dispersion," to appear in the IEEE Trans. on Information Theory, vol. 61, no. 5, May 2015.
[13] M. Krajči, C. F. Liu, L. Mikeš and S. M. Moser, "Performance analysis of Fano coding," Proceedings of the IEEE 2015 International Symposium on Information Theory, Hong Kong, June 14-19, 2015.
[14] E. Ordentlich and M. J. Weinberger, "A distribution dependent refinement of Pinsker's inequality," IEEE Trans. on Information Theory, vol. 51, no. 5, pp. 1836-1840, May 2005.
[15] M. S. Pinsker, Information and Information Stability of Random Variables and Random Processes, San Francisco: Holden-Day, 1964, originally published in Russian in 1960.
[16] V. V. Prelov and E. C. van der Meulen, "Mutual information, variation, and Fano's inequality," Problems of Information Transmission, vol. 44, no. 3, pp. 185-197, September 2008.
[17] M. D. Reid and R. C. Williamson, "Information, divergence and risk for binary experiments," Journal of Machine Learning Research, vol. 12, no. 3, pp. 731-817, March 2011.
[18] I. Sason, "Entropy bounds for discrete random variables via maximal coupling," IEEE Trans. on Information Theory, vol. 59, no. 11, pp. 7118-7131, November 2013.
[19] I. Sason, "On the Rényi divergence and the joint range of relative entropies," Proceedings of the 2015 IEEE International Symposium on Information Theory, pp. 1610-1614, Hong Kong, June 14-19, 2015.
[20] I. Sason and S. Verdú, "Bounds among f-divergences," submitted to the IEEE Trans. on Information Theory, July 2015. [Online]. Available at http://arxiv.org/abs/1508.00335.
[21] M. Tomamichel and V. Y. F. Tan, "A tight upper bound for the third-order asymptotics for most discrete memoryless channels," IEEE Trans. on Information Theory, vol. 59, no. 11, pp. 7041-7051, November 2013.
[22] I. Vajda, "Note on discrimination information and variation," IEEE Trans. on Information Theory, vol. 16, no. 6, pp. 771-773, November 1970.
[23] I. Vajda, "On f-divergence and singularity of probability measures," Periodica Mathematica Hungarica, vol. 2, no. 1-4, pp. 223-234, 1972.
[24] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, 1998.
[25] S. Verdú, "Total variation distance and the distribution of the relative information," Proceedings of the Information Theory and Applications Workshop, pp. 499-501, San Diego, California, USA, February 2014.
[26] S. Verdú, Information Theory, in preparation.
[27] Z. Zhang, "Estimating mutual information via Kolmogorov distance," IEEE Trans. on Information Theory, vol. 53, no. 9, pp. 3280-3287, September 2007.