Convexity/Concavity of Rényi Entropy and α-Mutual Information

Siu-Wai Ho, Institute for Telecommunications Research, University of South Australia, Adelaide, SA 5095, Australia. Email: siuwai.ho@unisa.edu.au
Sergio Verdú, Dept. of Electrical Engineering, Princeton University, Princeton, NJ 08544, U.S.A. Email: verdu@princeton.edu

Abstract: Entropy is well known to be Schur concave on finite alphabets. Recently, the authors strengthened this result by showing that for any pair of probability distributions $P$ and $Q$ with $Q$ majorized by $P$, the entropy of $Q$ is larger than the entropy of $P$ by at least the relative entropy $D(P\|Q)$. This result applies to $P$ and $Q$ defined on countable alphabets. This paper shows the counterpart of this result for the Rényi entropy and the Tsallis entropy. Lower bounds on the difference in the Rényi (or Tsallis) entropy are given in terms of a new divergence which is related to the Rényi (or Tsallis) divergence. This paper also considers a notion of generalized mutual information, namely the α-mutual information, which is defined through the Rényi divergence. Its convexity/concavity for different ranges of α is shown. A sufficient condition for Schur concavity is discussed and upper bounds on the α-mutual information are given in terms of the Rényi entropy.

I. INTRODUCTION

The Rényi entropy $H_\alpha(P)$ of order $\alpha$ of a probability distribution $P$ was introduced in [1]. The concavity of $H_\alpha(P)$ for $P$ defined on a finite support was shown in [2]. If $\alpha \in (0,1]$, $H_\alpha(P)$ is strictly concave in $P$ and it is also Schur concave [3]. When $\alpha > 1$, the Rényi entropy is neither convex nor concave in general, but it is pseudoconcave. Since all concave functions and pseudoconcave functions are quasiconcave, the Rényi entropy is Schur concave for $\alpha > 0$ [4].

It is well known that the Shannon entropy of finitely-valued random variables is a Schur-concave function [4]. This result was strengthened and generalized to countably infinite random variables in [5], which shows that if a probability distribution $Q$ is majorized by a probability distribution $P$, then their difference in entropy is lower bounded by the relative entropy $D(P\|Q)$. This motivates the search for strengthened results on the Schur-concavity of the Rényi entropy.

In order to strengthen the Schur-concavity of the Rényi entropy, this paper investigates the Schur-concavity of the Tsallis entropy. The Tsallis entropy was introduced by Tsallis [6] to study multifractals. The same paper showed that the Tsallis entropy of order $\beta$ is concave for $\beta > 0$ and convex for $\beta < 0$. The Schur-concavity of the Tsallis entropy was discussed in [7]. The Tsallis divergence of order $\beta$ was introduced in [8] and is also known as the Hellinger divergence of order $\beta$ [9]. Indeed, the Tsallis divergence is an $f$-divergence with $f(x) = \frac{x - x^\beta}{1-\beta}$, and the Tsallis entropy is the corresponding entropy, of the form $-\sum_i f(P(i))$ for any probability distribution $P$.

This paper also considers the convexity of a notion of generalized mutual information, namely the α-mutual information. This notion of mutual information was introduced by Sibson [10] in the discrete case, as the information radius of order α [11]. Alternative notions of what could be called Rényi mutual information were proposed by Csiszár [12] and Arimoto [13]. The α-mutual information is closely related to Gallager's exponent function and plays an important role in a number of applications in channel capacity problems, e.g., [14].

In Section II, the Schur concavity of the Rényi entropy and the Tsallis entropy are revisited.
A new divergence is defined and new bounds on entropy in terms of the new divergence are derived. The α-mutual information is defined and discussed in Section III. Its concavity, convexity, Schur concavity and upper bounds under different settings are illustrated. Due to space limits, some of the proofs are omitted.

For simplicity, only countable alphabets are considered in this paper. All $\log$ and $\exp$ have the same arbitrary base. We use $P \ll Q$ to denote that the support of $P$ is a subset of the support of $Q$. Furthermore, $P \not\ll Q$ means that $P \ll Q$ does not hold.

II. SCHUR-CONCAVITY OF RÉNYI ENTROPY AND TSALLIS ENTROPY

Definition 1: For any probability distribution $P$, the Rényi entropy of order $\alpha \in (0,1)\cup(1,\infty)$ is defined as
$$H_\alpha(P) = \frac{1}{1-\alpha}\log\sum_{i\in\mathcal{A}} P^\alpha(i), \qquad (1)$$
where $\mathcal{A}$ is the support of $P$. By convention, $H_1(P)$ is the Shannon entropy of $P$.

Definition 2: For any probability distributions $P$ and $Q$, let $\mathcal{A}$ be the union of the supports of $P$ and $Q$. The Rényi divergence [1] of order $\alpha \in (0,1)\cup(1,\infty)$ is defined as
$$D_\alpha(P\|Q) = \frac{1}{\alpha-1}\log\sum_{i\in\mathcal{A}} P^\alpha(i)\,Q^{1-\alpha}(i), \qquad (2)$$
which becomes infinite if $\alpha > 1$ and $P \not\ll Q$.
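As a concrete companion to Definitions 1 and 2, here is a small Python sketch (ours, not part of the paper) that evaluates $H_\alpha$ and $D_\alpha$ in nats and shows numerically that both approach the Shannon entropy and the relative entropy as $\alpha \to 1$; the function names are illustrative only.

```python
# Minimal numerical sketch (not from the paper) of Definitions 1 and 2, in nats.
import numpy as np

def renyi_entropy(P, alpha):
    """H_alpha(P) = 1/(1-alpha) * log sum_i P(i)^alpha, over the support of P."""
    P = np.asarray(P, dtype=float)
    P = P[P > 0]                       # restrict to the support
    if np.isclose(alpha, 1.0):         # convention: H_1 is the Shannon entropy
        return -np.sum(P * np.log(P))
    return np.log(np.sum(P ** alpha)) / (1.0 - alpha)

def renyi_divergence(P, Q, alpha):
    """D_alpha(P||Q) = 1/(alpha-1) * log sum_i P(i)^alpha Q(i)^(1-alpha)."""
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    if alpha > 1 and np.any((P > 0) & (Q == 0)):
        return np.inf                  # infinite if alpha > 1 and P is not << Q
    if np.isclose(alpha, 1.0):         # convention: D_1 is the relative entropy
        mask = P > 0
        return np.sum(P[mask] * np.log(P[mask] / Q[mask]))
    mask = (P > 0) & (Q > 0)           # zero-probability terms contribute nothing
    return np.log(np.sum(P[mask] ** alpha * Q[mask] ** (1 - alpha))) / (alpha - 1.0)

P = np.array([0.4, 0.35, 0.15, 0.1])
Q = np.array([0.3, 0.3, 0.3, 0.1])
for a in (0.5, 0.999, 1.001, 2.0):
    print(a, renyi_entropy(P, a), renyi_divergence(P, Q, a))
# As alpha -> 1 both quantities approach the Shannon entropy and D(P||Q).
```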

Definition 3: Given $P_X$, $P_{Y|X}$, $Q_{Y|X}$, the conditional Rényi divergence of order $\alpha\in(0,1)\cup(1,\infty)$ is
$$D_\alpha(P_{Y|X}\|Q_{Y|X}|P_X) \qquad (3)$$
$$= D_\alpha(P_{Y|X}P_X\,\|\,Q_{Y|X}P_X) \qquad (4)$$
$$= \frac{1}{\alpha-1}\log\sum_{x\in\mathcal{X},\,y\in\mathcal{Y}} P^\alpha_{Y|X}(y|x)\,Q^{1-\alpha}_{Y|X}(y|x)\,P_X(x), \qquad (5)$$
where $\mathcal{X}$ and $\mathcal{Y}$ are the supports of $P_X$ and $P_Y$, respectively.

The Tsallis entropy is related to the Rényi entropy through the following one-to-one function. For $\alpha\in(0,1)\cup(1,\infty)$ and $z > 0$, let
$$\phi_\alpha(z) = \frac{\exp\big((1-\alpha)z\big)-1}{1-\alpha} \qquad (6)$$
and
$$\phi_\alpha^{-1}(v) = \frac{1}{1-\alpha}\log\big(1+(1-\alpha)v\big). \qquad (7)$$

Definition 4: For any probability distribution $P$, the Tsallis entropy of order $\beta\in(0,1)\cup(1,\infty)$ is defined as
$$S_\beta(P) = \phi_\beta\big(H_\beta(P)\big) = \frac{1}{1-\beta}\left(\sum_{i\in\mathcal{A}} P^\beta(i) - 1\right), \qquad (8)$$
where $\mathcal{A}$ is the support of $P$.

Definition 5: For any probability distributions $P$ and $Q$, let $\mathcal{A}$ be the support of $Q$. The Tsallis divergence of order $\beta\in(0,1)\cup(1,\infty)$ is defined as
$$T_\beta(P\|Q) = -\phi_\beta\big(-D_\beta(P\|Q)\big) \qquad (9)$$
$$= \frac{1}{\beta-1}\left(\sum_{i\in\mathcal{A}} P^\beta(i)\,Q^{1-\beta}(i) - 1\right), \qquad (10)$$
which becomes infinite if $\beta>1$ and $P\not\ll Q$.

Definition 6: For any probability distributions $P$ and $Q$ and strictly concave function $\varphi:[0,1]\to\mathbb{R}$, let
$$\Delta_\varphi\big(P(i),Q(i)\big) = \big(P(i)-Q(i)\big)\,\varphi'\big(Q(i)\big) + \varphi\big(Q(i)\big) - \varphi\big(P(i)\big), \qquad (11)$$
where $\varphi'(x)$ is the derivative of $\varphi(x)$ and $i\in\mathcal{A}$, the union of the supports of $P$ and $Q$. Define the divergence
$$W_\varphi(P\|Q) = \sum_{i\in\mathcal{A}} \Delta_\varphi\big(P(i),Q(i)\big). \qquad (12)$$

Lemma 1: For any probability distributions $P$ and $Q$ and strictly concave function $\varphi:[0,1]\to\mathbb{R}$,
$$\Delta_\varphi\big(P(i),Q(i)\big) \ge 0 \qquad (13)$$
and
$$W_\varphi(P\|Q) \ge 0, \qquad (14)$$
with equality if and only if $P = Q$.

In the following, we fix $\beta\in(0,1)\cup(1,\infty)$ and consider the strictly concave function
$$\varphi_\beta(x) = \frac{x^\beta - x}{1-\beta}. \qquad (15)$$
Its derivative is denoted by $\varphi_\beta'(x) = \frac{\beta x^{\beta-1}-1}{1-\beta}$. For $0 \le x \le 1$, $\varphi_\beta'(x)$ is a decreasing function. Note that the Tsallis divergence can be seen as an $f$-divergence with $f(x) = -\varphi_\beta(x)$, and the Tsallis entropy is equal to $\sum_i \varphi_\beta(P(i))$.

Without loss of generality, we assume that $P$ and $Q$ are defined on the positive integers and labeled in decreasing probabilities, i.e.,
$$P(i) \ge P(i+1) \qquad (16)$$
$$Q(i) \ge Q(i+1). \qquad (17)$$
We say that $Q$ is majorized by $P$ if, for all $k = 1, 2, \ldots$,
$$\sum_{i=1}^{k} Q(i) \le \sum_{i=1}^{k} P(i). \qquad (18)$$
If $Q$ is majorized by $P$, then $P \ll Q$. If a real-valued functional $f$ is such that $f(P) \le f(Q)$ whenever $Q$ is majorized by $P$, we say that $f$ is Schur-concave. The following result generalizes [5, Theorem 3].

Theorem 1: For $\beta\in(0,1)\cup(1,\infty)$, if $Q$ is majorized by $P$, then
$$S_\beta(Q) \ge S_\beta(P) + W_{\varphi_\beta}(P\|Q). \qquad (19)$$

Proof: We first consider the case where $Q$ (and therefore $P$) takes values on a finite integer set $\{1,2,\ldots,M\}$. Then
$$S_\beta(Q) - S_\beta(P) - W_{\varphi_\beta}(P\|Q) \qquad (20)$$
$$= \sum_i \varphi_\beta\big(Q(i)\big) - \sum_i \varphi_\beta\big(P(i)\big) - W_{\varphi_\beta}(P\|Q) \qquad (21)$$
$$= \sum_i \varphi_\beta'\big(Q(i)\big)\big(Q(i)-P(i)\big) \qquad (22)$$
$$= \sum_{i=1}^{M}\big(Q(i)-P(i)\big)\sum_{k=i}^{M-1}\Big(\varphi_\beta'\big(Q(k)\big)-\varphi_\beta'\big(Q(k+1)\big)\Big) + \sum_{i=1}^{M}\big(Q(i)-P(i)\big)\,\varphi_\beta'\big(Q(M)\big) \qquad (23)$$
$$= \sum_{k=1}^{M-1}\Big(\varphi_\beta'\big(Q(k)\big)-\varphi_\beta'\big(Q(k+1)\big)\Big)\left(\sum_{i=1}^{k} Q(i) - \sum_{i=1}^{k} P(i)\right) \qquad (24)$$
$$\ge 0, \qquad (25)$$
where (25) follows from the fact that $\varphi_\beta'(x)$ is a decreasing function and $Q$ is majorized by $P$.

Consider now the case where $Q(i)$ is non-zero for all positive integers $i$. We assume that $S_\beta(Q)$ is finite, as otherwise there is nothing to prove. We need to take $M\to\infty$ in the foregoing expressions. Although the last term on the right side of (23) can now be negative, it vanishes due to the finiteness of $S_\beta(Q)$: for all sufficiently large $M$,
$$-\sum_{i=1}^{M}\big(Q(i)-P(i)\big)\,\varphi_\beta'\big(Q(M)\big) = \sum_{i=M+1}^{\infty}\big(Q(i)-P(i)\big)\,\varphi_\beta'\big(Q(M)\big) \le \sum_{i=M+1}^{\infty} Q(i)\,\varphi_\beta'\big(Q(M)\big) \qquad (26)$$
$$\le \sum_{i=M+1}^{\infty} Q(i)\,\varphi_\beta'\big(Q(i)\big) \qquad (27)$$
$$= \frac{1}{1-\beta}\sum_{i=M+1}^{\infty}\Big(\beta\, Q^{\beta}(i) - Q(i)\Big) \;\longrightarrow\; 0 \quad\text{as } M\to\infty, \qquad (28)$$
where (27) follows from the fact that $\varphi_\beta'(x)$ is a decreasing function and (17). Therefore, the Tsallis entropy is a Schur-concave function.
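The following Python sketch (ours, not from the paper) spells out Definition 6 and checks Theorem 1 numerically on a pair with $Q$ majorized by $P$; the distributions and the helper names (`W`, `is_majorized`, etc.) are our own choices, and all quantities are in nats.

```python
# Illustrative check (not from the paper) of Definition 6 and Theorem 1, in nats.
import numpy as np

def phi(x, beta):                      # the strictly concave function in (15)
    return (x ** beta - x) / (1.0 - beta)

def phi_prime(x, beta):                # its derivative
    return (beta * x ** (beta - 1.0) - 1.0) / (1.0 - beta)

def tsallis_entropy(P, beta):
    P = np.asarray(P, float)
    P = P[P > 0]
    return (np.sum(P ** beta) - 1.0) / (1.0 - beta)

def W(P, Q, beta):
    """W_phi(P||Q) = sum_i [(P(i)-Q(i)) phi'(Q(i)) + phi(Q(i)) - phi(P(i))], Definition 6."""
    return sum((p - q) * phi_prime(q, beta) + phi(q, beta) - phi(p, beta)
               for p, q in zip(P, Q))

def is_majorized(Q, P):                # is Q majorized by P?  (cf. (18))
    return np.all(np.cumsum(sorted(Q, reverse=True))
                  <= np.cumsum(sorted(P, reverse=True)) + 1e-12)

P = [0.5, 0.25, 0.15, 0.1]             # labeled in decreasing probabilities, cf. (16)-(17)
Q = [0.35, 0.3, 0.2, 0.15]
beta = 0.6
assert is_majorized(Q, P)

gap = tsallis_entropy(Q, beta) - tsallis_entropy(P, beta)
w = W(P, Q, beta)
print(gap, w, gap + 1e-12 >= w >= 0)   # Theorem 1: S_beta(Q) - S_beta(P) >= W >= 0
```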

Due to the one-to-one correspondence between the Rényi entropy and the Tsallis entropy through (8), Theorem 1 leads to the following bounds. If $0<\alpha<1$, then
$$\phi_\alpha\big(H_\alpha(Q)\big) \ge \phi_\alpha\big(H_\alpha(P)\big) + W_{\varphi_\alpha}(P\|Q). \qquad (29)$$
If $\alpha>1$, then
$$\phi_\alpha\big(H_\alpha(P)\big) \le \phi_\alpha\big(H_\alpha(Q)\big) - W_{\varphi_\alpha}(P\|Q). \qquad (30)$$
These bounds are sufficient to show that the Rényi entropy is Schur-concave. They can further be used to obtain bounds on $H_\alpha(Q) - H_\alpha(P)$ in terms of $W_{\varphi_\alpha}(P\|Q)$ as follows.

Theorem 2: For $P$ defined on a countable alphabet, the Rényi entropy $H_\alpha(P)$ is a Schur-concave function for $\alpha\in(0,1)\cup(1,\infty)$. Furthermore, if $Q$ is majorized by $P$, then:
a) If $0<\alpha<1$, then in nats
$$H_\alpha(Q) - H_\alpha(P) \ge \log|\mathcal{A}| - \phi_\alpha^{-1}\Big(\phi_\alpha\big(\log|\mathcal{A}|\big) - W_{\varphi_\alpha}(P\|Q)\Big) \qquad (31)$$
$$\ge \frac{W_{\varphi_\alpha}(P\|Q)}{|\mathcal{A}|^{1-\alpha}} \qquad (32)$$
$$\ge 0, \qquad (33)$$
where $\mathcal{A}$ is the support of $Q$.
b) If $\alpha>1$, then in nats
$$H_\alpha(Q) - H_\alpha(P) \ge \phi_\alpha^{-1}\big(W_{\varphi_\alpha}(P\|Q)\big) \qquad (34)$$
$$\ge W_{\varphi_\alpha}(P\|Q) \qquad (35)$$
$$\ge 0. \qquad (36)$$

It would be interesting to obtain an inequality similar to [5, Theorem 3] for the Rényi entropy. However, the following example shows that it is not possible.

Example 1: Consider $P = \{0.4, 0.35, 0.15, 0.1\}$ and $Q = \{0.3, 0.3, 0.3, 0.1\}$, so that $Q$ is majorized by $P$. If $\alpha = 0.6$, then $H_\alpha(Q) - H_\alpha(P) > 0$ but $H_\alpha(Q) - H_\alpha(P) < D_\alpha(P\|Q)$.

To end this section, we illustrate the meaning of $\Delta_\varphi(P(i),Q(i))$ and how $W_\varphi(P\|Q)$ is related to relative entropy and the Tsallis divergence. Consider the tangent to $\varphi(x)$ at $x = Q(i)$, as shown in Fig. 1. Lemma 1 shows that the value of this tangent at $P(i)$ is always above $\varphi(P(i))$, and the gap equals $\Delta_\varphi(P(i),Q(i))$.

[Fig. 1: An illustration of $\varphi(x)$ and $\Delta_\varphi$ in case (a) $P(i) < Q(i)$ and case (b) $Q(i) < P(i)$.]

Theorem 3: For any probability distributions $P$ and $Q$ with $P \ll Q$ and $\varphi(x) = -x\log x$,
$$D(P\|Q) = W_\varphi(P\|Q) \qquad (37)$$
with the convention $0\log 0 = 0$. Furthermore, if $\varphi_\beta(x) = \frac{x^\beta - x}{1-\beta}$, then
$$\sum_{i\in\mathcal{A}} Q^{1-\beta}(i)\,\Delta_{\varphi_\beta}\big(P(i),Q(i)\big) = T_\beta(P\|Q), \qquad (38)$$
where $\mathcal{A}$ is the support of $Q$, and hence, for $0<\beta<1$,
$$T_\beta(P\|Q) \le W_{\varphi_\beta}(P\|Q). \qquad (39)$$

III. α-MUTUAL INFORMATION

Definition 7: Let $P_X \to P_{Y|X} \to P_Y$, with $X\in\mathcal{X}$ and $Y\in\mathcal{Y}$. The α-mutual information, with $\alpha\in(0,1)\cup(1,\infty)$, is
$$I_\alpha(X;Y) = \min_{Q_Y} D_\alpha\big(P_{Y|X}\,\|\,Q_Y\,\big|\,P_X\big) \qquad (40)$$
$$= \frac{\alpha}{\alpha-1}\log\sum_{y\in\mathcal{Y}}\left(\sum_{x\in\mathcal{X}} P_X(x)\,P^\alpha_{Y|X=x}(y)\right)^{\frac{1}{\alpha}}. \qquad (41)$$
By convention, $I_1(X;Y)$ is the mutual information.
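To illustrate Definition 7, the sketch below (ours; the channel and names are illustrative) computes $I_\alpha(X;Y)$ from the closed form (41) and checks that it agrees with the conditional Rényi divergence (40) evaluated at the minimizing output distribution $Q_Y^*(y) \propto \big(\sum_x P_X(x)P^\alpha_{Y|X}(y|x)\big)^{1/\alpha}$, which is the standard minimizer of (40).

```python
# A small sketch (not from the paper) of Definition 7: (41) versus (40) at Q_Y*.
import numpy as np

def cond_renyi_div(PYgX, QY, PX, alpha):
    """D_alpha(P_{Y|X} || Q_Y | P_X), cf. (4)-(5) with Q_{Y|X}(.|x) = Q_Y for every x."""
    inner = np.sum(PX[:, None] * PYgX ** alpha * QY[None, :] ** (1.0 - alpha))
    return np.log(inner) / (alpha - 1.0)

def alpha_mi(PX, PYgX, alpha):
    """Closed form (41): alpha/(alpha-1) * log sum_y (sum_x P_X(x) P_{Y|X}(y|x)^alpha)^(1/alpha)."""
    a_y = PX @ (PYgX ** alpha)                       # one value per output y
    return alpha / (alpha - 1.0) * np.log(np.sum(a_y ** (1.0 / alpha)))

PX = np.array([0.6, 0.4])
PYgX = np.array([[0.8, 0.2],                         # row x: P_{Y|X}(.|x)
                 [0.3, 0.7]])
alpha = 2.0

a_y = PX @ (PYgX ** alpha)
QY_star = a_y ** (1.0 / alpha) / np.sum(a_y ** (1.0 / alpha))

print(alpha_mi(PX, PYgX, alpha))                     # value of (41)
print(cond_renyi_div(PYgX, QY_star, PX, alpha))      # value of (40) at Q_Y*: the same
print(cond_renyi_div(PYgX, np.array([0.5, 0.5]), PX, alpha))  # any other Q_Y gives more
```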

Some properties of $I_\alpha(X;Y)$ related to the parameter $\alpha$ are summarized in the following theorem.

Theorem 4: For any fixed $P_X$ on $\mathcal{X}$ and $P_{Y|X}:\mathcal{X}\to\mathcal{Y}$,
a) $I_\alpha(X;Y)$ is continuous in $\alpha$ on $(0,\infty)$, and
$$\lim_{\alpha\downarrow 0} I_\alpha(X;Y) = \log\frac{1}{\displaystyle\sup_{y\in\mathcal{Y}}\sum_{x:\,P_{Y|X}(y|x)>0} P_X(x)}, \qquad (42)$$
$$\lim_{\alpha\to 1} I_\alpha(X;Y) = I(X;Y), \qquad (43)$$
$$\lim_{\alpha\to\infty} I_\alpha(X;Y) = \log\sum_{y\in\mathcal{Y}}\,\sup_{x:\,P_X(x)>0} P_{Y|X}(y|x). \qquad (44)$$
b) $I_\alpha(X;Y)$ is increasing in $\alpha$.

Proof: a) The limit in (42) can be obtained by using the $L_\infty$ norm. For any $\epsilon>0$ and every $y\in\mathcal{Y}$, there exists a sufficiently small $\alpha>0$ such that
$$\sum_{x:\,P_{Y|X}(y|x)>0} P_X(x) - \epsilon < \sum_{x\in\mathcal{X}} P_X(x)\,P^\alpha_{Y|X=x}(y) \qquad (45)$$
$$\le \sum_{x:\,P_{Y|X}(y|x)>0} P_X(x). \qquad (46)$$
Since $\epsilon>0$ is arbitrary, using the $L_\infty$ norm gives
$$\lim_{\alpha\downarrow 0}\left(\sum_{y\in\mathcal{Y}}\left(\sum_{x\in\mathcal{X}} P_X(x)\,P^\alpha_{Y|X=x}(y)\right)^{\frac{1}{\alpha}}\right)^{\alpha} \qquad (47)$$
$$= \sup_{y\in\mathcal{Y}}\sum_{x:\,P_{Y|X}(y|x)>0} P_X(x). \qquad (48)$$
So the limit in (42) follows. The limits in (43) and (44) can be easily verified by using L'Hôpital's rule. Due to (43), $I_\alpha(X;Y)$ is continuous in $\alpha$ on $(0,\infty)$.

b) For $\alpha<\beta$, denote $Q_\beta = \arg\min_{Q_Y} D_\beta(P_{Y|X}\|Q_Y|P_X)$, so that
$$I_\alpha(X;Y) \le D_\alpha\big(P_{Y|X}\|Q_\beta\,\big|\,P_X\big) \qquad (49)$$
$$\le D_\beta\big(P_{Y|X}\|Q_\beta\,\big|\,P_X\big) \qquad (50)$$
$$= I_\beta(X;Y), \qquad (51)$$
where (49) and (50) follow from (40) and [15, Theorem 3], respectively.

Provided that $C_{0f} > 0$, Theorem 4 intriguingly reveals that the zero-error capacity of a discrete memoryless channel with feedback [16] is equal to
$$C_{0f} = \sup_{X} I_0(X;Y), \qquad (52)$$
where $I_0$ is defined as the continuous extension of $I_\alpha$.

Some upper bounds on the α-mutual information in terms of the Rényi entropy are illustrated next. These bounds provide some insights about generalizing conditional entropy.

Theorem 5: For any given $P_X$, $P_{Y|X}$ and $0<\alpha<\infty$,
$$H_{\frac{1}{\alpha}}(P_X) = I_\alpha(X;X) \ge I_\alpha(P_X, P_{Y|X}). \qquad (53)$$

Proof: It is easy to verify the equality in (53). The inequality can be seen as follows. Let $U = X$, so that $X - U - Y$ forms a Markov chain. The inequality in (53) then follows from [14, Theorem 5.2] for $\alpha\neq 1$ and from [17] for $\alpha = 1$.

The difference between the two sides of (53) is a good candidate for the conditional Rényi entropy, for which several candidates have been suggested in the past, e.g., [12], [13]. Since $H_\alpha(P_X)$ is decreasing in $\alpha$ [2], the following corollary holds.

Corollary 6: For any given $P_X$, $P_{Y|X}$ and $0<\alpha\le 1$,
$$H_\alpha(P_X) \ge I_\alpha(P_X, P_{Y|X}). \qquad (54)$$

The following result shows the convexity or concavity of the conditional Rényi divergence, which proves to be useful to investigate the properties of the α-mutual information.

Theorem 7: Given $P_X$, $P_{Y|X}$, $Q_{Y|X}$, the conditional Rényi divergence $D_\alpha(P_{Y|X}\|Q_{Y|X}|P_X)$ is
a) concave in $P_X$ for $\alpha\ge 1$ and quasiconcave in $P_X$ for $0<\alpha<1$;
b) convex in $(P_{Y|X}, Q_{Y|X})$ for $0<\alpha\le 1$ and quasiconvex in $(P_{Y|X}, Q_{Y|X})$ for $0<\alpha<\infty$.

Proof: a) By (5), $D_\alpha(P_{Y|X}\|Q_{Y|X}|P_X) = \frac{1}{\alpha-1}\log a$, where $a$ is linear in $P_X$. For $0<\alpha<1$, $\frac{1}{\alpha-1}\log a$ is a decreasing function of $a$, from which the quasiconcavity in $P_X$ can be seen. For $\alpha>1$, $\frac{1}{\alpha-1}\log a$ is concave and increasing in $a$, so the concavity in $P_X$ can also be seen from (5). Part b) is a consequence of (4) together with the convexity and quasiconvexity properties of the Rényi divergence established in [15].

Theorem 8: Let $P_X \to P_{Y|X} \to P_Y$. For fixed $P_{Y|X}$,
a) $I_\alpha(P_X, P_{Y|X}) = I_\alpha(X;Y)$ is concave in $P_X$ for $\alpha\ge 1$;
b) $I_\alpha(P_X, P_{Y|X}) = I_\alpha(X;Y)$ is quasiconcave in $P_X$ for $0<\alpha<1$.

Proof: Consider $0<\lambda<1$ and $\bar\lambda = 1-\lambda$. We first prove part a). $I_1(X;Y)$ is known to be concave in $P_X$ for fixed $P_{Y|X}$ [17]. Consider any probability distributions $P_{X_0}$ and $P_{X_1}$ and let
$$Q = \arg\min_{Q_Y} D_\alpha\big(P_{Y|X}\|Q_Y\,\big|\,\lambda P_{X_0}+\bar\lambda P_{X_1}\big). \qquad (55)$$
Theorem 7 can be applied to show part a) as follows:
$$I_\alpha\big(\lambda P_{X_0}+\bar\lambda P_{X_1},\, P_{Y|X}\big) = D_\alpha\big(P_{Y|X}\|Q\,\big|\,\lambda P_{X_0}+\bar\lambda P_{X_1}\big) \qquad (56)$$
$$\ge \lambda\, D_\alpha\big(P_{Y|X}\|Q\,\big|\,P_{X_0}\big) + \bar\lambda\, D_\alpha\big(P_{Y|X}\|Q\,\big|\,P_{X_1}\big) \qquad (57)$$
$$\ge \lambda\,\min_{Q_Y} D_\alpha\big(P_{Y|X}\|Q_Y\,\big|\,P_{X_0}\big) + \bar\lambda\,\min_{Q_Y} D_\alpha\big(P_{Y|X}\|Q_Y\,\big|\,P_{X_1}\big). \qquad (58)$$
Similarly, part b) can also be shown by using Theorem 7.

Example 2: Although $I_\alpha(P_X, P_{Y|X})$ is concave in $P_X$ for $\alpha\ge 1$, it can be shown that in general it is not Schur concave, by considering $\lambda = 0.3$, $\alpha = 2$, $P_{X_0}(0) = 1-P_{X_0}(1) = 0.4$, $P_{X_1}(0) = P_{X_1}(1) = 0.5$, and
$$P_{Y|X} = \begin{bmatrix} 1 & 0 \\ 0.5 & 0.5 \end{bmatrix}. \qquad (59)$$
However, Schur concavity is preserved if $P_{Y|X}$ describes a symmetric channel, i.e., if all the rows of the probability transition matrix $P_{Y|X}$ are permutations of other rows, and so are the columns. Since all symmetric quasiconcave functions are Schur concave [4], Theorem 8 implies the following corollary.

Corollary 9: For any fixed symmetric $P_{Y|X}$, $I_\alpha(P_X, P_{Y|X}) = I_\alpha(X;Y)$ is Schur concave in $P_X$ for $0<\alpha<\infty$.
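The following numerical sketch (our own, in the spirit of Example 2 and Corollary 9; the comparison input $(0.7, 0.3)$ and the binary symmetric channel are our choices, not from the paper) contrasts the channel of Example 2, for which $I_2$ fails to be Schur concave in $P_X$, with a symmetric channel, for which the uniform input is best.

```python
# Numerical illustration (ours) of the Schur-concavity discussion around Example 2.
import numpy as np

def alpha_mi(PX, PYgX, alpha):
    a_y = PX @ (PYgX ** alpha)
    return alpha / (alpha - 1.0) * np.log(np.sum(a_y ** (1.0 / alpha)))

asym = np.array([[1.0, 0.0], [0.5, 0.5]])      # the channel of Example 2
bsc  = np.array([[0.9, 0.1], [0.1, 0.9]])      # a symmetric channel (our choice)

uniform = np.array([0.5, 0.5])
skewed  = np.array([0.7, 0.3])                 # majorizes the uniform distribution

# Schur concavity would force I_2(uniform) >= I_2(skewed); it fails for the asymmetric channel:
print(alpha_mi(uniform, asym, 2.0), alpha_mi(skewed, asym, 2.0))   # ~0.269 < ~0.287 nats
# For the symmetric channel the uniform input does best, consistent with Corollary 9:
print(alpha_mi(uniform, bsc, 2.0), alpha_mi(skewed, bsc, 2.0))     # first value is larger
```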
We now fix $P_X$ and consider the behavior of $I_\alpha(P_X, P_{Y|X})$ as a function of $P_{Y|X}$.

Theorem 10: Let $P_X \to P_{Y|X} \to P_Y$. For fixed $P_X$,
a) $I_\alpha(P_X, P_{Y|X}) = I_\alpha(X;Y)$ is convex in $P_{Y|X}$ for $0<\alpha\le 1$;
b) $I_\alpha(P_X, P_{Y|X}) = I_\alpha(X;Y)$ is quasiconvex in $P_{Y|X}$ for $0<\alpha<\infty$.

Proof: Consider $0<\lambda<1$ and $\bar\lambda = 1-\lambda$. We first prove part a). $I_1(X;Y)$ is known to be convex in $P_{Y|X}$ [17]. Consider any conditional probability distributions $P_{Y_0|X}$ and $P_{Y_1|X}$ and let
$$Q_i = \arg\min_{Q_Y} D_\alpha\big(P_{Y_i|X}\|Q_Y\,\big|\,P_X\big) \qquad (60)$$
for $i = 0, 1$. Due to $0<\alpha<1$ and Theorem 7,
$$I_\alpha\big(P_X,\, \lambda P_{Y_1|X}+\bar\lambda P_{Y_0|X}\big) \qquad (61)$$
$$= \min_{Q_Y} D_\alpha\big(\lambda P_{Y_1|X}+\bar\lambda P_{Y_0|X}\,\big\|\,Q_Y\,\big|\,P_X\big) \qquad (62)$$
$$\le D_\alpha\big(\lambda P_{Y_1|X}+\bar\lambda P_{Y_0|X}\,\big\|\,\lambda Q_1+\bar\lambda Q_0\,\big|\,P_X\big) \qquad (63)$$
$$\le \lambda\, D_\alpha\big(P_{Y_1|X}\|Q_1\,\big|\,P_X\big) + \bar\lambda\, D_\alpha\big(P_{Y_0|X}\|Q_0\,\big|\,P_X\big) \qquad (64)$$
$$= \lambda\, I_\alpha\big(P_X, P_{Y_1|X}\big) + \bar\lambda\, I_\alpha\big(P_X, P_{Y_0|X}\big). \qquad (65)$$
Similarly, part b) can be verified by using Theorem 7.

In Theorems 8 and 10, we have made different assumptions on the ranges of $\alpha$. The following examples show that those assumptions are not superfluous.

Example 3: Let $\lambda = 0.3$, $P_{X_0}(0) = 1-P_{X_0}(1) = 1$, $P_{X_1}(0) = P_{X_1}(1) = 0.5$, and
$$P_{Y|X} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \qquad (66)$$
Then $I_{0.1}(P_X, P_{Y|X})$ is not concave in $P_X$ and $I_{0.3}(P_X, P_{Y|X})$ is not convex in $P_X$.

Example 4: Let $\lambda = 0.3$, $P_X(0) = P_X(1) = 0.5$,
$$P_{Y_0|X} = \begin{bmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{bmatrix} \quad\text{and}\quad P_{Y_1|X} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \qquad (67)$$
Then $I_2(P_X, P_{Y|X})$ is not concave in $P_{Y|X}$ and $I_3(P_X, P_{Y|X})$ is not convex in $P_{Y|X}$.

Although Examples 3 and 4 show some unpleasant behaviour of $I_\alpha(P_X, P_{Y|X})$, the Arimoto-Blahut algorithm [18] can still be applied to maximize $I_\alpha(P_X, P_{Y|X})$ for $\alpha>0$. The following theorem, motivated by [14], illustrates that some optimization problems involving $I_\alpha(P_X, P_{Y|X})$ can be converted into convex optimization problems.

Theorem 11: Consider $\alpha\in(0,1)\cup(1,\infty)$. Let
$$f_\alpha(P_X, P_{Y|X}) = \phi_{\frac{1}{\alpha}}\big(I_\alpha(P_X, P_{Y|X})\big) \qquad (68)$$
$$= \frac{\alpha}{\alpha-1}\left(\sum_{y\in\mathcal{Y}}\left(\sum_{x\in\mathcal{X}} P_X(x)\,P^\alpha_{Y|X=x}(y)\right)^{\frac{1}{\alpha}} - 1\right), \qquad (69)$$
where $\phi_{\frac{1}{\alpha}}$ is given by (6), so that $\phi_{\frac{1}{\alpha}}^{-1}(z)$ is a monotonically increasing function of $z$.
a) For a fixed $P_{Y|X}$,
$$\sup_{P_X} I_\alpha(P_X, P_{Y|X}) = \phi_{\frac{1}{\alpha}}^{-1}\Big(\sup_{P_X} f_\alpha(P_X, P_{Y|X})\Big), \qquad (70)$$
where $f_\alpha(P_X, P_{Y|X})$ is concave in $P_X$.
b) For a fixed $P_X$,
$$\inf_{P_{Y|X}} I_\alpha(P_X, P_{Y|X}) = \phi_{\frac{1}{\alpha}}^{-1}\Big(\inf_{P_{Y|X}} f_\alpha(P_X, P_{Y|X})\Big), \qquad (71)$$
where $f_\alpha(P_X, P_{Y|X})$ is convex in $P_{Y|X}$.

Proof: Since $\phi_{\frac{1}{\alpha}}^{-1}(z)$ is a monotonically increasing function of $z$, (70) and (71) follow from the relation in (68). The function $f_\alpha(P_X, P_{Y|X})$ in (69) is concave in $P_X$ because $\frac{\alpha}{\alpha-1}z^{\frac{1}{\alpha}}$ is concave in $z$. The function $f_\alpha(P_X, P_{Y|X})$ in (69) is convex in $P_{Y|X}$ due to the Minkowski inequality for $\alpha>1$ and the reverse Minkowski inequality for $\alpha<1$.

Remark: Note that convexity/concavity were mistakenly reversed in [14, Theorem 5].
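As a rough illustration of Theorem 11(a), the following sketch (ours, using the form of $f_\alpha$ reconstructed in (69); the channel, grid resolution and helper names are our own choices) checks numerically that $f_\alpha$ is concave along a binary input simplex and recovers $\sup_{P_X} I_\alpha$ through the monotone map $\phi_{1/\alpha}^{-1}$.

```python
# A sketch (ours) of the convex reformulation in Theorem 11(a).
import numpy as np

def f_alpha(PX, PYgX, alpha):
    """f_alpha = alpha/(alpha-1) * (sum_y (sum_x P_X(x) P_{Y|X}(y|x)^alpha)^(1/alpha) - 1), cf. (69)."""
    a_y = PX @ (PYgX ** alpha)
    return alpha / (alpha - 1.0) * (np.sum(a_y ** (1.0 / alpha)) - 1.0)

def alpha_mi_from_f(f, alpha):
    """Invert the monotone map: I_alpha = alpha/(alpha-1) * log(1 + (alpha-1)/alpha * f_alpha)."""
    return alpha / (alpha - 1.0) * np.log(1.0 + (alpha - 1.0) / alpha * f)

PYgX = np.array([[0.8, 0.1, 0.1],
                 [0.1, 0.8, 0.1]])              # an arbitrary 2-input channel (our choice)
alpha = 0.3

ps = np.linspace(0.001, 0.999, 999)             # binary inputs P_X = (p, 1-p)
fs = np.array([f_alpha(np.array([p, 1 - p]), PYgX, alpha) for p in ps])

second_diff = fs[2:] - 2 * fs[1:-1] + fs[:-2]   # discrete second differences
print("f_alpha concave along the simplex:", np.all(second_diff <= 1e-12))

p_star = ps[np.argmax(fs)]                      # maximizing f_alpha also maximizes I_alpha
print("argmax p:", p_star, "sup I_alpha:", alpha_mi_from_f(fs.max(), alpha))
```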
REFERENCES

[1] A. Rényi, "On measures of entropy and information," Proc. Fourth Berkeley Symposium on Mathematical Statistics and Probability, pp. 547-561, 1961.
[2] M. Ben-Bassat and J. Raviv, "Rényi's entropy and the probability of error," IEEE Transactions on Information Theory, vol. 24, no. 3, pp. 324-331, May 1978.
[3] Y. He, A. B. Hamza, and H. Krim, "A generalized divergence measure for robust image registration," IEEE Transactions on Signal Processing, vol. 51, no. 5, pp. 1211-1220, 2003.
[4] A. W. Marshall, I. Olkin, and B. C. Arnold, Inequalities: Theory of Majorization and Its Applications. Springer, 2011.
[5] S.-W. Ho and S. Verdú, "On the interplay between conditional entropy and error probability," IEEE Transactions on Information Theory, vol. 56, no. 12, pp. 5930-5942, 2010.
[6] C. Tsallis, "Possible generalization of Boltzmann-Gibbs statistics," Journal of Statistical Physics, vol. 52, no. 1-2, pp. 479-487, 1988.
[7] S. Furuichi, K. Yanagi, and K. Kuriyama, "Fundamental properties of Tsallis relative entropy," Journal of Mathematical Physics, vol. 45, no. 12, pp. 4868-4877, 2004.
[8] C. Tsallis, "Generalized entropy-based criterion for consistent testing," Physical Review E, vol. 58, no. 2, p. 1442, 1998.
[9] F. Liese and I. Vajda, "On divergences and informations in statistics and information theory," IEEE Transactions on Information Theory, vol. 52, no. 10, pp. 4394-4412, Oct. 2006.
[10] R. Sibson, "Information radius," Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, vol. 14, no. 2, pp. 149-160, 1969.
[11] S. Verdú, "α-mutual information," 2015 Information Theory and Applications Workshop, 2015.
[12] I. Csiszár, "Generalized cutoff rates and Rényi's information measures," IEEE Transactions on Information Theory, vol. 41, no. 1, pp. 26-34, 1995.
[13] S. Arimoto, "Information measures and capacity of order α for discrete memoryless channels," Proc. Coll. Math. Soc. János Bolyai, 1975.
[14] Y. Polyanskiy and S. Verdú, "Arimoto channel coding converse and Rényi divergence," 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1327-1333, 2010.
[15] T. van Erven and P. Harremoës, "Rényi divergence and Kullback-Leibler divergence," IEEE Transactions on Information Theory, vol. 60, no. 7, pp. 3797-3820, July 2014.
[16] C. E. Shannon, "The zero error capacity of a noisy channel," IRE Transactions on Information Theory, vol. 2, no. 3, pp. 8-19, 1956.
[17] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. John Wiley & Sons, 2012.
[18] S. Arimoto, "Computation of random coding exponent functions," IEEE Transactions on Information Theory, vol. 22, no. 6, pp. 665-671, 1976.