Some Results on the Generalized Gaussian Distribution

Alex Dytso 1, Ronit Bustin 2, H. Vincent Poor 1, Daniela Tuninetti 3, Natasha Devroye 3, and Shlomo Shamai (Shitz) 2

Abstract

The paper considers information-theoretic applications of a broad class of distributions termed generalized Gaussian (GG). The flexible parametric form of the probability density function of the GG family makes it an excellent choice for many modeling scenarios. Well-known examples of this distribution include the Laplace, Gaussian, and uniform distributions. The first part of the paper explores properties of the GG distribution. In particular, it is shown that a GG random variable can be decomposed into a product of a Gaussian random variable and an independent positive random variable. The properties of this decomposition are carefully examined. The second part of the paper considers a rate-distortion problem of GG sources under the $L_p$ error distortion. For example, conditions are derived under which Shannon's lower bound is tight.

I. INTRODUCTION

A classical rate-distortion problem, first formulated by Shannon in [1], considers a source with distribution $P_X$ on $\mathcal{X}$, a reconstruction alphabet $\hat{\mathcal{X}}$, and a distortion measure $d: \mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}_+$. One of the crowning achievements of Shannon is an exact expression for the rate-distortion function given by

$$R(D) = \inf_{P_{\hat{X}|X}:\, E[d(X,\hat{X})] \le D} I(X;\hat{X}). \tag{1}$$

For continuous sources the rate-distortion function has been found for a Laplace source with absolute-error distortion $d(x,\hat{x}) = |x-\hat{x}|$ [2], an exponential source with absolute-error distortion [3], and a Gaussian source with square-error distortion $d(x,\hat{x}) = |x-\hat{x}|^2$ [1]. In this paper, we enlarge this set of known cases by considering the rate-distortion problem for generalized Gaussian (GG) distributions.

1 A. Dytso and H. V. Poor are with the Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA (e-mail: {adytso, poor}@princeton.edu).
2 R. Bustin and S. Shamai (Shitz) are with the Department of Electrical Engineering, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel (e-mail: bustin@technion.ac.il, sshlomo@ee.technion.ac.il).
3 D. Tuninetti and N. Devroye are with the Department of Electrical and Computer Engineering, University of Illinois at Chicago, Chicago, IL 60607, USA (e-mail: {danielat, devroye}@uic.edu).
The work of A. Dytso and H. V. Poor was supported by the National Science Foundation under Grants CCF-1420575 and CNS-1456793. The work of R. Bustin and S. Shamai has been supported by the European Union's Horizon 2020 Research and Innovation Programme, grant agreement no. 694630. The work of Daniela Tuninetti and Natasha Devroye was supported in part by the National Science Foundation under Grant 1422511.

A. Problem Formulation

We shall refer to $X$ with GG distribution given by the probability density function (pdf)

$$f_X(x) = \frac{c_p}{\alpha}\, e^{-\frac{|x-\mu|^p}{2\alpha^p}}, \quad c_p = \frac{1}{2^{\frac{p+1}{p}}\,\Gamma\!\left(\frac{1}{p}+1\right)}, \quad x \in \mathbb{R}, \tag{2}$$

as $X \sim \mathcal{N}_p(\mu, \alpha^p)$. Well-known examples of this family of distributions include: the Laplace distribution for $p = 1$; the Gaussian distribution for $p = 2$; and the uniform distribution on $[-1, 1]$ for $\alpha = 1$ and $p \to \infty$.

We consider the rate-distortion problem with a GG source and a distortion measure that corresponds to an $\ell_p$-norm,

$$d(x,\hat{x}) = |x-\hat{x}|^p. \tag{3}$$

Formally, we seek to solve

$$R_{q,p}(\alpha, D) = \inf_{P_{\hat{X}|X}:\, E[|X-\hat{X}|^p] \le \frac{2D^p}{p}} I(X;\hat{X}), \tag{4}$$

where $X \sim \mathcal{N}_q(\mu, \alpha^q)$. We also define

$$R_{q,\infty}(\alpha, D) = \inf_{P_{\hat{X}|X}:\, |X-\hat{X}| \le D \text{ a.s. } P_X} I(X;\hat{X}). \tag{5}$$

It should be noted that our analysis carries over to distributions on $\mathbb{R}_+$ such as the exponential distribution, and thus subsumes the known cases mentioned above.

B. Why Generalized Gaussian Sources?

The flexible parametric form of the pdf of GG distributions allows for tails that are either heavier than Gaussian ($p < 2$) or lighter than Gaussian ($p > 2$), which makes it an excellent choice for many modeling scenarios. From the information-theoretic perspective the GG distribution is interesting because it maximizes the entropy and the Rényi entropy under a $p$-th absolute moment constraint [4], [5].

Theorem 1.
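Let $X \in \mathbb{R}$ be such that $E[|X|^p] \le \frac{2\alpha^p}{p}$. Then

$$h(X) \le \frac{1}{p}\log\frac{p\,e\,E[|X|^p]}{2c_p^p} \le \frac{1}{p}\log\frac{e\,\alpha^p}{c_p^p}. \tag{6}$$

The inequality in (6) is attained with equality iff $X \sim \mathcal{N}_p(0, \alpha^p)$.

Proof: This result can be proved via a method outlined in [4, Chapter 12].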
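As a numerical illustration of the maximum-entropy statement, the following minimal sketch evaluates the entropy of a GG random variable and the moment-based bound. It assumes the density normalization $f_X(x) = \frac{c_p}{\alpha} e^{-|x|^p/(2\alpha^p)}$, under which $h(X) = \log(\alpha/c_p) + 1/p$ nats; the function names `c`, `gg_entropy`, and `entropy_bound` are illustrative and not taken from the paper.

```python
import math

def c(p):
    """Normalizing constant c_p = 1 / (2^((p+1)/p) * Gamma(1/p + 1)) (assumed form)."""
    return 1.0 / (2.0 ** ((p + 1.0) / p) * math.gamma(1.0 / p + 1.0))

def gg_entropy(p, alpha=1.0):
    """Differential entropy in nats of X ~ N_p(0, alpha^p): log(alpha / c_p) + 1/p."""
    return math.log(alpha / c(p)) + 1.0 / p

def entropy_bound(p, moment):
    """Bound (1/p) * log(p * e * m / (2 * c_p^p)) on h(X), where m = E|X|^p."""
    return math.log(p * math.e * moment / (2.0 * c(p) ** p)) / p

# A uniform variable on [-1, 1] has E|X|^2 = 1/3 and entropy log(2) nats;
# the p = 2 bound at the same second moment must be at least as large.
uniform_entropy = math.log(2.0)
uniform_bound = entropy_bound(2.0, 1.0 / 3.0)
print(uniform_entropy, uniform_bound)
```

For $p = 2$ and $p = 1$ the function `gg_entropy` reduces to the familiar Gaussian and Laplace entropies, which gives a quick sanity check on the assumed normalization.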

C. Shannon's Lower Bound

In [1] Shannon developed a technique for constructing a lower bound on the rate-distortion function, which we now explore in our context.

Theorem 2. For $X \sim \mathcal{N}_q(0, \alpha^q)$ and $(p, q) \in \mathbb{R}_+^2$,

$$R_{q,p}(\alpha, D) \ge \log\frac{\alpha\, c_p}{D\, c_q} + \left(\frac{1}{q}-\frac{1}{p}\right)\log e. \tag{7}$$

Proof:

$$I(X;\hat{X}) = h(X) - h(X|\hat{X}) = h(X) - h(X-\hat{X}\,|\,\hat{X}) \ge h(X) - h(X-\hat{X}) \tag{8a}$$
$$\ge \frac{1}{q}\log\frac{q\,e\,E[|X|^q]}{2c_q^q} - \frac{1}{p}\log\frac{p\,e\,E[|X-\hat{X}|^p]}{2c_p^p} \tag{8b}$$
$$\ge \log\frac{\alpha\, c_p}{D\, c_q} + \left(\frac{1}{q}-\frac{1}{p}\right)\log e, \tag{8c}$$

where the inequalities follow from: a) the fact that conditioning reduces entropy; b) the maximum entropy principle from Theorem 1; and c) the distortion constraint $E[|X-\hat{X}|^p] \le \frac{2D^p}{p}$ together with $E[|X|^q] = \frac{2\alpha^q}{q}$. This concludes the proof.

The inequalities in (8) are tight if there exists a backward test channel for some random variable $\hat{X}$ such that

$$X = \hat{X} + Z, \tag{9}$$

where $Z \sim \mathcal{N}_p(0, D^p)$ is independent of $\hat{X}$. Moreover, denoting by $\phi_r(t)$ the characteristic function of $X \sim \mathcal{N}_r(0, 1)$, showing the existence of a test channel in (9) is equivalent to showing that the function

$$h(t) = \frac{\phi_q(\alpha t)}{\phi_p(D t)} \tag{10}$$

is a valid characteristic function of some random variable $\hat{X}$. W.l.o.g. in what follows we set $D = 1$, take $\alpha \ge 1$, and define

$$\phi_{(q,p,\alpha)}(t) = \frac{\phi_q(\alpha t)}{\phi_p(t)}. \tag{11}$$

Therefore, showing the existence of the test channel in (9) amounts to showing that $\phi_{(q,p,\alpha)}(t)$ is a valid characteristic function for $\alpha \ge 1$.

Remark 1. Note that using an additive backward test channel is not the only way of achieving equality in (8). However, this is one of the most commonly used techniques, and understanding its limitations can be very valuable.

D. Examples of Gaussian and Laplace Sources

The problem of analyzing $\phi_{(q,p,\alpha)}(t)$ in (11) can be best demonstrated by studying the following four cases: $(p,q) = (2,2)$, a Gaussian source with square-error distortion; $(p,q) = (1,1)$, a Laplace source with absolute-error distortion; $(p,q) = (1,2)$, a Gaussian source with absolute-error distortion; and $(p,q) = (2,1)$, a Laplace source with square-error distortion. Recall that the characteristic functions of Gaussian and Laplace random variables are given by $\phi_2(t) = e^{-\frac{t^2}{2}}$ and $\phi_1(t) = \frac{1}{1+4t^2}$, respectively.
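Therefore, for the four cases above the function $\phi_{(q,p,\alpha)}(t)$ is given by

$$\phi_{(2,2,\alpha)}(t) = e^{-\frac{(\alpha^2-1)t^2}{2}}, \quad \phi_{(2,1,\alpha)}(t) = (1+4t^2)\,e^{-\frac{\alpha^2 t^2}{2}}, \quad \phi_{(1,1,\alpha)}(t) = \frac{1+4t^2}{1+4\alpha^2 t^2}, \quad \phi_{(1,2,\alpha)}(t) = \frac{e^{\frac{t^2}{2}}}{1+4\alpha^2 t^2}.$$

On the one hand, note that $\phi_{(2,2,\alpha)}(t) = e^{-\frac{(\alpha^2-1)t^2}{2}}$ is a valid characteristic function and corresponds to $\hat{X} \sim \mathcal{N}_2(0, \alpha^2-1)$, and

$$\phi_{(1,1,\alpha)}(t) = \frac{1+4t^2}{1+4\alpha^2 t^2} = \frac{1}{\alpha^2} + \left(1-\frac{1}{\alpha^2}\right)\frac{1}{1+4\alpha^2 t^2} \tag{12}$$

is a valid characteristic function of $\hat{X}$ with pdf given by

$$f_{\hat{X}}(x) = \frac{1}{\alpha^2}\,\delta(x) + \left(1-\frac{1}{\alpha^2}\right)\frac{c_1}{\alpha}\, e^{-\frac{|x|}{2\alpha}}, \tag{13}$$

which is the pdf of a random variable $\hat{X} = B \cdot Y$, where $B \sim \text{Bernoulli}\!\left(1-\frac{1}{\alpha^2}\right)$ and $Y \sim \mathcal{N}_1(0, \alpha)$ are independent.

On the other hand, recall that all characteristic functions satisfy the inequality $|\phi(t)| \le 1$, which is not satisfied by $\phi_{(1,2,\alpha)}(t) = \frac{e^{t^2/2}}{1+4\alpha^2 t^2}$. Moreover, observe that $\phi_{(2,1,\alpha)}(t) = (1+4t^2)\,e^{-\frac{\alpha^2 t^2}{2}}$ has an inverse Fourier transform given by

$$\mathcal{F}^{-1}\{\phi_{(2,1,\alpha)}(t)\}(x) = \frac{\alpha^4 + 4\alpha^2 - 4x^2}{\sqrt{2\pi}\,\alpha^5}\, e^{-\frac{x^2}{2\alpha^2}}, \tag{14}$$

which takes on negative values for $\frac{\alpha}{2}\sqrt{\alpha^2+4} < |x|$ and therefore cannot be a valid pdf. Therefore, $\phi_{(2,1,\alpha)}(t)$ and $\phi_{(1,2,\alpha)}(t)$ are not valid characteristic functions, and the test channel in (9) does not exist for $(p,q) = (1,2)$ and $(p,q) = (2,1)$.

Remark 2. From the Gaussian and Laplace examples, we see that the existence of the test channel is not always guaranteed. This motivates the question of for which values of $(p,q)$ the test channel can be formed. However, answering such a question can be challenging since there exists no closed-form expression for the characteristic function $\phi_r(t)$ for $r \ne 1, 2$. In fact, very little is known about properties of $\phi_r(t)$ for $r \ne 1, 2$. Section III is, therefore, devoted to studying properties of the characteristic function of the GG random variable, which are of independent interest.

II. MAIN RESULTS

Our main results are given next.

Theorem 3. Let $Z \sim \mathcal{N}_p(0, D^p)$ and for $(p,q) \in \mathbb{R}_+^2$ let

$$S = \{(p,q): q \in (0,2],\ p > 0,\ p \ne q\} \cup \{(p,q): q > 2,\ 0 < p < q\}. \tag{15}$$

Then we have:

- for $(p,q) \in S$ and any $\alpha$ the test channel in (9) cannot be formed. In other words, for $(p,q) \in S$ and any $\alpha$ there exists no $\hat{X}$ independent of $Z$ such that (9) holds with $X \sim \mathcal{N}_q(0, \alpha^q)$.
- for $(p,q) \in \{(p,q): 2 < q \le p\}$ and almost all $\alpha$ the test channel in (9) cannot be formed. In other words, for $(p,q) \in \{(p,q): 2 < q \le p\}$ and almost all $\alpha$ there exists no $\hat{X}$ independent of $Z$ such that (9) holds with $X \sim \mathcal{N}_q(0, \alpha^q)$.
- for $(p,q) \in \{(p,q): 1 \le p = q \le 2\}$ and all $\alpha$ the test channel in (9) exists and Shannon's lower bound in Theorem 2 is attainable. Moreover, the distribution of $\hat{X}$ is given by

$$f_{\hat{X}}(x) = \frac{1}{\sqrt{2\pi(\alpha^2-D^2)}}\, e^{-\frac{x^2}{2(\alpha^2-D^2)}}, \tag{16}$$

for $p = 2$, and

$$f_{\hat{X}}(x) = \left(\frac{D}{\alpha}\right)^{p+1}\delta(x) + \left(1-\left(\frac{D}{\alpha}\right)^{p+1}\right) g(x), \tag{17}$$

for $1 \le p < 2$, where $g(x)$ is the pdf of a random variable with characteristic function given by

$$\frac{\frac{\phi_p(\alpha t)}{\phi_p(D t)} - \left(\frac{D}{\alpha}\right)^{p+1}}{1-\left(\frac{D}{\alpha}\right)^{p+1}}. \tag{18}$$

Here "almost all $\alpha$" means that the set of $\alpha$ for which the statement does not hold has Lebesgue measure zero.

Note that for the regime $2 < q \le p$ the result holds for almost all $\alpha$, but we would like to point out that there are cases in this regime when a test channel can be established and Shannon's lower bound is tight, as demonstrated next for the case of $(p,q) = (\infty,\infty)$.

Theorem 4. For $X \sim \text{Unif}[-1,1]$ and $\frac{1}{D} \in \mathbb{N}$ (a set of measure zero),

$$R_{\infty,\infty}(1, D) = \log\frac{1}{D}, \tag{19}$$

where the equality in (19) is attained by a discrete random variable $\hat{X}$ uniformly distributed on

$$\text{supp}(\hat{X}) = \left\{\pm(2i-1)D : i = 1,\ldots,\tfrac{N}{2}\right\} \text{ if } N \text{ is even}, \qquad \text{supp}(\hat{X}) = \left\{\pm 2iD : i = 0,\ldots,\tfrac{N-1}{2}\right\} \text{ if } N \text{ is odd},$$

where $N = \frac{1}{D}$.

Proof: Let $Z \sim \text{Unif}[-D, D]$ be independent of $\hat{X}$ and define a test channel as in (9). Note that by this construction we have that $X \sim \text{Unif}[-1,1]$, $|X-\hat{X}| \le D$ a.s., and

$$I(X;\hat{X}) = h(X) - h(X|\hat{X}) = h(X) - h(Z) = \log\frac{1}{D}.$$

This concludes the proof.

III. PROPERTIES OF THE GENERALIZED GAUSSIAN DISTRIBUTION

In this section we study properties of the GG distribution.

A. Moments and Mellin Transform

The moments and absolute moments of the GG distribution are given next.

Proposition 1. (Moments [6].) For any $p > 0$ and $k > 0$ the moments of $X \sim \mathcal{N}_p(0, \alpha^p)$ are given by

$$E[X^k] = \alpha^k\, 2^{\frac{k}{p}}\, \frac{\Gamma\!\left(\frac{k+1}{p}\right)}{\Gamma\!\left(\frac{1}{p}\right)}, \text{ for } k \text{ even}, \tag{20}$$
$$E[X^k] = 0, \text{ for } k \text{ odd}. \tag{21}$$

Definition 1. The Mellin transform of a positive random variable $X$ is defined as

$$m_X(s) = E[X^{s-1}], \tag{22}$$

for $s \in \mathbb{C}$ and $\text{Re}(s) > 0$.

The Mellin transform emerges as a major tool in characterizing products of positive independent random variables since

$$m_{X \cdot Y}(s) = m_X(s)\, m_Y(s). \tag{23}$$

Proposition 2. (Mellin Transform of $|X|$.) For any $p > 0$ and $X \sim \mathcal{N}_p(0, \alpha^p)$,

$$E[|X|^{s-1}] = \alpha^{s-1}\, 2^{\frac{s-1}{p}}\, \frac{\Gamma\!\left(\frac{s}{p}\right)}{\Gamma\!\left(\frac{1}{p}\right)}, \tag{24}$$

where $s \in \mathbb{C}$ is such that $\text{Re}(s) > 0$. Moreover, for any $p > 0$ and $k > 0$ the absolute moments are given by

$$E[|X|^k] = \alpha^k\, 2^{\frac{k}{p}}\, \frac{\Gamma\!\left(\frac{k+1}{p}\right)}{\Gamma\!\left(\frac{1}{p}\right)}. \tag{25}$$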
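The absolute-moment formula can be checked numerically. The sketch below compares a closed form of the type $\alpha^k 2^{k/p}\Gamma((k+1)/p)/\Gamma(1/p)$ against a plain midpoint-rule integration of $|x|^k f_X(x)$. It assumes the density $f_X(x) = \frac{c_p}{\alpha} e^{-|x|^p/(2\alpha^p)}$ with $c_p = 1/(2^{(p+1)/p}\Gamma(1/p+1))$; the function names and the integration settings (`upper`, `steps`) are illustrative choices, not part of the paper.

```python
import math

def gg_abs_moment(p, k, alpha=1.0):
    """Closed form alpha^k * 2^(k/p) * Gamma((k+1)/p) / Gamma(1/p) for E|X|^k."""
    return alpha ** k * 2.0 ** (k / p) * math.gamma((k + 1.0) / p) / math.gamma(1.0 / p)

def gg_pdf(x, p, alpha=1.0):
    """Assumed GG density (c_p / alpha) * exp(-|x|^p / (2 alpha^p))."""
    c_p = 1.0 / (2.0 ** ((p + 1.0) / p) * math.gamma(1.0 / p + 1.0))
    return (c_p / alpha) * math.exp(-abs(x) ** p / (2.0 * alpha ** p))

def numeric_abs_moment(p, k, alpha=1.0, upper=60.0, steps=100_000):
    """Midpoint-rule estimate of E|X|^k, using symmetry of the density around 0."""
    h = upper * alpha / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** k * gg_pdf(x, p, alpha)
    return 2.0 * total * h

for p, k in [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (4.0, 3.0)]:
    print(p, k, gg_abs_moment(p, k), numeric_abs_moment(p, k))
```

The truncation point `upper` is adequate for $p \ge 1$; for smaller $p$ the tails are heavier and a larger integration range would be needed.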

Proof: The Mellin transform can be easily computed by using the integral

$$\int_0^\infty x^{s-1} e^{-\frac{x^p}{2}}\, dx = \frac{2^{\frac{s}{p}}}{p}\,\Gamma\!\left(\frac{s}{p}\right), \tag{26}$$

and a change of variable, where the above integral is finite if $\text{Re}(s) > 0$ and $p > 0$.

Note that the $p$-th absolute moment of $X \sim \mathcal{N}_p(0, \alpha^p)$ is given by

$$E[|X|^p] = \frac{2\alpha^p}{p}. \tag{27}$$

The following corollary, which relates the $k$-th moments of two GG distributions of different orders, is useful in many proofs.

Corollary 1. Let $X_p \sim \mathcal{N}_p(0, 1)$ and $X_q \sim \mathcal{N}_q(0, 1)$. Then for $q \ge p$

$$E[|X_q|^k] \le E[|X_p|^k], \tag{28}$$

for any $k \in \mathbb{R}_+$. Moreover, for $q > p$

$$\lim_{k \to \infty} \frac{E[|X_p|^k]^{\frac{1}{k}}}{E[|X_q|^k]^{\frac{1}{k}}} = \infty. \tag{29}$$

Proof: The proof follows by using Proposition 2 and Stirling's approximation for the gamma function.

B. Relation to Positive Definite Functions

As will be observed throughout this paper, the GG distribution exhibits different properties depending on whether $p \le 2$ or $p > 2$. At the heart of this behavior is the concept of positive-definite functions.

Definition 2. A function $f: \mathbb{R} \to \mathbb{C}$ is called positive definite if for every positive integer $n$ and all real numbers $x_1, x_2, \ldots, x_n$ the $n \times n$ matrix

$$A = (a_{i,j})_{i,j=1}^n, \quad a_{i,j} = f(x_i - x_j), \tag{30}$$

is positive semi-definite.

Our first result relates the pdf of the GG distribution to the class of positive definite functions.

Theorem 5. The function $e^{-\frac{|x|^p}{2}}$ is: not positive definite for $p > 2$; and positive definite for $0 < p \le 2$. Moreover, for $0 < p \le 2$ and $x > 0$,

$$e^{-\frac{x^{p/2}}{2}} = \int_0^\infty e^{-tx}\, d\mu(t), \tag{31}$$

where $d\mu(t)$ (independent of $x$) is a finite non-negative Borel measure on $t \in [0, \infty)$.

Proof: See Appendix B.

The concept of positive-definite functions will also play an important role in examining properties of the characteristic function of the GG distribution.

C. Product Decomposition of the GG Random Variable

As a consequence of Theorem 5 we have the following decompositional representation of the GG random variable.

Proposition 3. For any $0 < p \le 2$ and $X \sim \mathcal{N}_p(0, 1)$ we have that

$$X = V \cdot Z, \tag{32}$$

in distribution, where $V$ (dependent on $p$) is a positive random variable independent of $Z \sim \mathcal{N}_2(0, 1)$. Moreover, $V$ has the following properties: $V$ is an unbounded random variable for $p < 2$ and $V = 1$ for $p = 2$; and for $p < 2$, $V$ is a continuous random variable with pdf given by

$$f_V(v) = \frac{\sqrt{\pi}}{2\pi\,\Gamma\!\left(\frac{1}{p}\right)} \int_{\mathbb{R}} v^{-it-1}\, 2^{\frac{it(2-p)}{2p}}\, \frac{\Gamma\!\left(\frac{it+1}{p}\right)}{\Gamma\!\left(\frac{it+1}{2}\right)}\, dt, \quad v > 0. \tag{33}$$

Proof: See Appendix A.

The pdf of $V$ can also be represented as a power series expansion.

Proposition 4. For $p < 2$ the pdf of $V$ in (33) is given by

$$f_V(v) = \sum_{k=1}^\infty a_k\, v^{kp}, \quad v > 0, \tag{34}$$

where

$$a_k = \frac{(-1)^{k+1}\, p\, \sin\!\left(\frac{\pi k p}{2}\right) \Gamma\!\left(\frac{kp}{2}+1\right)}{\sqrt{\pi}\;\Gamma\!\left(\frac{1}{p}\right)\, 2^{\frac{(kp+1)(2-p)}{2p}}\; k!}. \tag{35}$$

Proof: The proof follows by using the residue theorem on the integral in (33).

Remark 3. For the case of $p = 1$, the random variable $V$ is distributed according to the Rayleigh distribution.

D. Characteristic Function

The focus of this section is on the characteristic function of the GG distribution, which will play an important role in analyzing the output distribution of additive channels with GG distributed noise.

The characteristic function of the GG distribution can be given in the following integral form.

Theorem 6. For any $p > 0$ the characteristic function of $X \sim \mathcal{N}_p(0, 1)$ is given by

$$\phi_p(t) = 2c_p \int_0^\infty \cos(tx)\, e^{-\frac{x^p}{2}}\, dx. \tag{36}$$

Proof: The proof follows from the fact that $e^{-\frac{|x|^p}{2}}$ is an even function and therefore its Fourier transform is equivalent to the cosine transform.

Examples of characteristic functions of $X \sim \mathcal{N}_p(0, 1)$ for several values of $p$ are given in Fig. 1.
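The cosine-transform form of the characteristic function lends itself to direct numerical evaluation. The following sketch integrates $2c_p\cos(tx)e^{-x^p/2}$ over $[0, \text{upper}]$ with a midpoint rule, assuming the normalization $c_p = 1/(2^{(p+1)/p}\Gamma(1/p+1))$ and unit scale. The function name `gg_cf` and the quadrature parameters are illustrative; the closed forms $e^{-t^2/2}$ (Gaussian) and $1/(1+4t^2)$ (Laplace) serve as reference values.

```python
import math

def gg_cf(p, t, upper=60.0, steps=120_000):
    """Midpoint-rule estimate of 2 * c_p * integral_0^upper cos(t x) exp(-x^p / 2) dx."""
    c_p = 1.0 / (2.0 ** ((p + 1.0) / p) * math.gamma(1.0 / p + 1.0))
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += math.cos(t * x) * math.exp(-x ** p / 2.0)
    return 2.0 * c_p * total * h

# Compare against the known Gaussian and Laplace characteristic functions.
print(gg_cf(2.0, 1.3), math.exp(-1.3 ** 2 / 2.0))
print(gg_cf(1.0, 0.7), 1.0 / (1.0 + 4.0 * 0.7 ** 2))
# For a large order the density is close to uniform on [-1, 1],
# so the estimate behaves like sin(t)/t and is negative at t = 4.5.
print(gg_cf(50.0, 4.5))
```

The truncation at `upper=60` is sufficient for $p \ge 1$; heavier-tailed cases ($p < 1$) would need a larger range or a better quadrature rule.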

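Fig. 1: Plot of the characteristic function $\phi_p(t)$ of $X \sim \mathcal{N}_p(0, 1)$ versus $t$ for several values of $p$; the plotted curves include $p = 1$ (Laplace), $p = 2$ (Gaussian), $p = 3$, and $p = 6$.

E. On the Distribution of Zeros of the Characteristic Function

As can be seen from Fig. 1, the characteristic function of the GG distribution can have zeros. The following theorem gives a somewhat surprising result on the distribution of the zeros of $\phi_p(t)$ in (36).

Theorem 7. (Distribution of Zeros.) The characteristic function $\phi_p(t)$ in (36) has the following properties: for $p > 2$, $\phi_p(t)$ has at least one positive-to-negative zero crossing; and for $0 < p \le 2$, $\phi_p(t)$ is non-negative and can be expressed as

$$\phi_p(t) = E\!\left[e^{-\frac{t^2 V^2}{2}}\right], \tag{37}$$

where the random variable $V$ (dependent on $p$) is defined in (32).

Proof: See Appendix B.

Remark 4. Note that for $0 < p \le 2$ the function $\phi_p(\sqrt{2t})$ can be thought of as a Laplace transform of the random variable $V^2$. This observation will be useful in several of the proofs.

F. Asymptotics of $\phi_p(t)$

By using the representation of $\phi_p(t)$ in Theorem 7, we can give an exact tail behavior for $\phi_p(t)$ in (36).

Proposition 5. For $0 < p < 2$ we have that

$$\lim_{t \to \infty} \frac{\phi_p(t)\, t^{p+1}}{A_p} = 1, \tag{38}$$

where $A_p = 2^{\frac{p-1}{2}}\, \Gamma\!\left(\frac{p+1}{2}\right) a_1$ and $a_1$ is defined in (35). Moreover, for $0 < p, q < 2$,

$$\lim_{t \to \infty} \phi_{(q,p,\alpha)}(t) = \begin{cases} 0, & q > p, \\ +\infty, & q < p. \end{cases} \tag{39}$$

Proof: See Appendix C.

G. Deconvolution Results

We are now in a position to answer whether (11) is a characteristic function.

Theorem 8. The function $\phi_{(q,p,\alpha)}(t)$ in (11) has the following properties: for $(p,q) \in S$, $\phi_{(q,p,\alpha)}(t)$ is not a characteristic function for any $\alpha \ge 1$; and for every $(p,q) \in \{(p,q): 2 < q \le p\}$ there exists an $\alpha \ge 1$ such that $\phi_{(q,p,\alpha)}(t)$ is not a characteristic function.

Proof: See Appendix D.

We would like to point out that for $2 < q \le p$ there are cases when $\phi_{(q,p,\alpha)}(t)$ is a characteristic function. Specifically, we can find an $\alpha$ such that $\phi_{(q,p,\alpha)}(t)$ is a characteristic function for $p > 2$ and $q > 2$. The most trivial case is that of $p = q$ and $\alpha = 1$, for which

$$\phi_{(p,p,1)}(t) = 1 \tag{40}$$

is a characteristic function. A less trivial example is when $p = q = \infty$, in which case $\phi_\infty(t) = \text{sinc}(t)$ and

$$\phi_{(\infty,\infty,\alpha)}(t) = \frac{\text{sinc}(\alpha t)}{\text{sinc}(t)}. \tag{41}$$

For example, when $\alpha = 2$ we have that $\phi_{(\infty,\infty,2)}(t) = \cos(t)$, which corresponds to the characteristic function of the random variable $\hat{X} = \pm 1$ equally likely. Note that in the above example, because the zeros of $\phi_\infty(t)$ occur periodically, we can select $\alpha$ such that the poles and zeros of $\phi_{(\infty,\infty,\alpha)}(t)$ cancel. However, we conjecture that such examples are only possible for $p = q = \infty$, and that for $2 < p < \infty$ the zeros of $\phi_p(t)$ do not appear periodically, leading to the following conjecture.

Conjecture 1. For $2 < q \le p < \infty$, $\phi_{(q,p,\alpha)}(t)$ is not a characteristic function for any $\alpha > 1$.

It is not difficult to check, by using the property that convolution with an analytic function is again analytic, that Conjecture 1 is true if $p$ is an even integer and $q$ is any non-even real number.

In view of Theorem 8 it remains to answer what happens to $\phi_{(q,p,\alpha)}(t)$ when $p = q < 2$.

Theorem 9. For $1 \le p = q < 2$, $\phi_{(p,p,\alpha)}(t)$ in (11) is a characteristic function. Moreover,

$$\phi_{(p,p,\alpha)}(t) = \frac{1}{\alpha^{p+1}} + \left(1-\frac{1}{\alpha^{p+1}}\right) G(t), \tag{42}$$

where $G(t)$ is the characteristic function of a continuous random variable.

Proof: See Appendix E.

Note that for the case of $0 < p = q < 1$, whether $\phi_{(p,p,\alpha)}(t)$ is a characteristic function remains an open question.

H. Proof of Theorem 3

The proof of the main results now follows easily by noting that the test channel in (9) does not exist under the circumstances described in Theorem 8, and exists (i.e., Shannon's lower bound is achievable) under the circumstances described in Theorem 9.

IV. CONCLUSION

In this work, we have examined a problem of lossy compression for the generalized Gaussian family of source distributions. For the case when the distortion and the source distribution are matched (i.e., $p = q$), a closed-form expression for the rate-distortion function has been given and shown to be attained by Shannon's lower bound. Moreover, for cases when the distortion and the source distribution are not matched, conditions under which no backward test channel exists have been given.

APPENDIX A
PROOF OF PROPOSITION 3

To show that $X = V \cdot Z$, first observe that $d\nu(t) = c_p \sqrt{\frac{\pi}{t}}\, d\mu(t)$ is a probability measure, where $d\mu(t)$ was defined in Theorem 5. Indeed,

$$1 = P(X \in \mathbb{R}) = c_p \int_{\mathbb{R}} e^{-\frac{|x|^p}{2}}\, dx \overset{a)}{=} c_p \int_{\mathbb{R}} \int_0^\infty e^{-t x^2}\, d\mu(t)\, dx \overset{b)}{=} c_p \int_0^\infty \int_{\mathbb{R}} e^{-t x^2}\, dx\, d\mu(t) = c_p \int_0^\infty \sqrt{\frac{\pi}{t}}\, d\mu(t) = \int_0^\infty d\nu(t),$$

where the equalities follow from: a) using the representation in Theorem 5 evaluated at the point $x^2$; and b) switching the order of integration as justified by Tonelli's theorem for positive functions. The above implies that $d\nu(t) = c_p \sqrt{\frac{\pi}{t}}\, d\mu(t)$ is a probability measure on $[0, \infty)$. Moreover, for any measurable set $\mathcal{S} \subset \mathbb{R}$ we have

$$P(X \in \mathcal{S}) = c_p \int_{\mathcal{S}} e^{-\frac{|x|^p}{2}}\, dx \overset{a)}{=} c_p \int_{\mathcal{S}} \int_0^\infty e^{-t x^2}\, d\mu(t)\, dx = \int_0^\infty \sqrt{\frac{t}{\pi}} \int_{\mathcal{S}} e^{-t x^2}\, dx\; c_p \sqrt{\frac{\pi}{t}}\, d\mu(t)$$
$$= \int_0^\infty P\!\left(\frac{Z}{\sqrt{2t}} \in \mathcal{S}\right) c_p \sqrt{\frac{\pi}{t}}\, d\mu(t) \overset{b)}{=} E\!\left[P\!\left(\frac{Z}{\sqrt{2T}} \in \mathcal{S}\,\Big|\, T\right)\right] \overset{c)}{=} P\!\left(\frac{Z}{\sqrt{2T}} \in \mathcal{S}\right) \overset{d)}{=} P(V \cdot Z \in \mathcal{S}),$$

where the equalities follow from: a) the representation in Theorem 5; b) the fact that $c_p \sqrt{\frac{\pi}{t}}\, d\mu(t)$ is a probability measure, which we take to be the law of a random variable $T$; c) because $Z$ is independent of $T$; and d) renaming $V = \frac{1}{\sqrt{2T}}$. Therefore, it follows that $X$ can be decomposed into $V \cdot Z$.

To find the pdf of $V$ we use the Mellin transform approach by observing that

$$E[|X|^{it}] = E[|V \cdot Z|^{it}] = E[V^{it}]\, E[|Z|^{it}]. \tag{43}$$

Therefore, by using Proposition 2 we have that the Mellin transform of $V$ is given by

$$E[V^{it}] = \frac{E[|X|^{it}]}{E[|Z|^{it}]} = \frac{\sqrt{\pi}\; 2^{\frac{it(2-p)}{2p}}\, \Gamma\!\left(\frac{it+1}{p}\right)}{\Gamma\!\left(\frac{1}{p}\right) \Gamma\!\left(\frac{it+1}{2}\right)}. \tag{44}$$

Finally, the pdf of $V$ is computed by using the inverse Mellin transform of (44):

$$f_V(v) = \frac{\sqrt{\pi}}{2\pi\,\Gamma\!\left(\frac{1}{p}\right)} \int_{\mathbb{R}} v^{-it-1}\, 2^{\frac{it(2-p)}{2p}}\, \frac{\Gamma\!\left(\frac{it+1}{p}\right)}{\Gamma\!\left(\frac{it+1}{2}\right)}\, dt, \quad v > 0.$$

This concludes the proof.

APPENDIX B
PROOF OF THEOREM 7

We first show that for $p > 2$ there is at least one zero. We use the approach of [7].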
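Towards a contradiction assume that $\phi_p(t) \ge 0$ for all $t$. Then for all $t$ we have that

$$0 \le \frac{1}{\pi c_p} \int_{\mathbb{R}} \phi_p(x)\,(1-\cos(xt))^2\, dx \overset{a)}{=} \frac{1}{2\pi c_p} \int_{\mathbb{R}} \phi_p(x)\,\big(3 - 4\cos(tx) + \cos(2tx)\big)\, dx \overset{b)}{=} 3 - 4e^{-\frac{t^p}{2}} + e^{-\frac{(2t)^p}{2}},$$

where the equalities follow from: a) using $(1-\cos(xt))^2 = \frac{1}{2}\big(3 - 4\cos(tx) + \cos(2tx)\big)$; and b) using the inverse Fourier transform and Parseval's identity. For small $x$ we can write $e^{-x} = 1 - x + O(x^2)$ and therefore we have that

$$0 \le 3 - 4\left(1-\frac{t^p}{2}\right) + \left(1-\frac{2^p t^p}{2}\right) + O(t^{2p}) = \frac{(4-2^p)\, t^p}{2} + O(t^{2p}).$$

Therefore, for $p > 2$ we reach a contradiction since $4 - 2^p < 0$ for $p > 2$. This concludes the proof for the case of $p > 2$.

For $0 < p \le 2$ the proof goes as follows:

$$\phi_p(t) = E[e^{itX}] \overset{a)}{=} E[e^{itVZ}] = E\big[E[e^{itVZ}\,|\,V]\big] \overset{b)}{=} E\!\left[e^{-\frac{t^2 V^2}{2}}\right] \overset{c)}{>} 0,$$

where the (in)-equalities follow from: a) the decomposition in Proposition 3; b) the independence of $V$ and $Z$ and the fact that the characteristic function of $Z$ is $e^{-\frac{t^2}{2}}$; and c) the positivity of $e^{-\frac{t^2 V^2}{2}}$. This concludes the proof.

APPENDIX C
PROOF OF PROPOSITION 5

The proof follows by observing that

$$\phi_p(t) = \int_0^\infty e^{-\frac{t^2 v^2}{2}} f_V(v)\, dv \overset{a)}{\le} a_1 \int_0^\infty e^{-\frac{t^2 v^2}{2}}\, v^p\, dv \overset{b)}{=} \frac{A_p}{t^{p+1}}, \tag{45}$$

where the (in)-equalities follow from: a) using the first term of the series in Proposition 4 to bound $f_V(v)$; and b) using the $p$-th absolute moment of a Gaussian random variable from Proposition 2. The lower bound on $\phi_p(t)$ follows similarly and is given by

$$\frac{A_p}{t^{p+1}} - O\!\left(\frac{1}{t^{2p+1}}\right) \le \phi_p(t). \tag{46}$$

The upper bound in (45) and the lower bound in (46) imply that

$$\lim_{t \to \infty} \frac{\phi_p(t)\, t^{p+1}}{A_p} = 1. \tag{47}$$

This concludes the proof.

APPENDIX D
PROOF OF THEOREM 8

A. Case of $\infty > q > p$

We start by looking at the regime $\infty > q > p$. We want to show that there exists no random variable $\hat{X}$ independent of $Z \sim \mathcal{N}_p(0, 1)$ such that

$$X = \hat{X} + Z, \tag{48}$$

where $X \sim \mathcal{N}_q(0, \alpha^q)$, for all $\alpha \ge 1$. Note that, since $X$ and $Z$ are symmetric and have finite moments, if such an $\hat{X}$ exists it must also be symmetric with finite moments. Then for all $k \ge 1$

$$E[|X|^k] = E[|\hat{X}+Z|^k] = E\big[E[|\hat{X}+Z|^k\,|\,Z]\big] \overset{a)}{\ge} E\big[|E[\hat{X}+Z\,|\,Z]|^k\big] \overset{b)}{=} E\big[|E[\hat{X}]+Z|^k\big] = E[|Z|^k], \tag{49}$$

where the (in)-equalities follow from: a) Jensen's inequality; and b) the independence of $\hat{X}$ and $Z$. Writing $X = \alpha X_q$ with $X_q \sim \mathcal{N}_q(0, 1)$, this in turn implies that, in order for the inequality in (49) to be true, we must have that for all $k \ge 1$

$$\frac{E[|Z|^k]^{\frac{1}{k}}}{E[|X_q|^k]^{\frac{1}{k}}} \le \alpha. \tag{50}$$

However, by Corollary 1, for $p < q$

$$\lim_{k \to \infty} \frac{E[|Z|^k]^{\frac{1}{k}}}{E[|X_q|^k]^{\frac{1}{k}}} = \infty;$$

therefore, there exists no $\alpha$ that can satisfy (50) for all $k$. This concludes the proof for $p < q$.

B. Case of $p = 2$ and $q < 2$

Note that in the case of $p = 2$ and $q < 2$ we want to show that there is no random variable $\hat{X}$ such that the convolution leads to

$$f_X(y) = c_2\, E\!\left[e^{-\frac{(y-\hat{X})^2}{2}}\right],$$

where $f_X(y) = \frac{c_q}{\alpha}\, e^{-\frac{|y|^q}{2\alpha^q}}$. Such an $\hat{X}$ does not exist since convolution preserves analyticity. That is, the convolution with an analytic pdf must result in an analytic pdf. Noting that $f_X(y)$ is not analytic for $q < 2$ (i.e., the derivative at zero is not defined) leads to the desired conclusion.

C. Case of $p > 2$ and $q \le 2$

Now for $p > 2$ and $q \le 2$ the function $\phi_{(q,p,\alpha)}(t)$ has a pole but no zeros by Theorem 7. Therefore, for the case of $p > 2$ and $q \le 2$ there exists a $t_0$, namely the pole of $\phi_{(q,p,\alpha)}(t)$, such that $\phi_{(q,p,\alpha)}(t)$ is not continuous at $t = t_0$. This violates the condition that a characteristic function is always a continuous function of $t$, and therefore $\phi_{(q,p,\alpha)}(t)$ is not a characteristic function for all $\alpha$.

D. Case of $p \ge q > 2$

For the case of $p > 2$ and $q > 2$ the function

$$\phi_{(q,p,\alpha)}(t) = \frac{\phi_q(\alpha t)}{\phi_p(t)} \tag{51}$$

has both poles and zeros by Theorem 7. Moreover, let $t_0$ be such that $\phi_p(t_0) = 0$; then we can always choose an $\alpha$ such that

$$\phi_q(\alpha t_0) \ne 0 \quad \text{and} \quad |\phi_{(q,p,\alpha)}(t_0)| = \infty. \tag{52}$$

In other words, we choose an $\alpha$ such that the poles do not cancel the zeros. This implies that there exists an $\alpha$ such that $\phi_{(q,p,\alpha)}(t)$ is not a continuous function of $t$ and is not a characteristic function. Moreover, it is not difficult to show that the above can be done for almost all $\alpha$.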
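The pole-zero cancellation discussed in this appendix can be illustrated with the uniform case, where the characteristic function is $\text{sinc}(t) = \sin(t)/t$ and the ratio is $\text{sinc}(\alpha t)/\text{sinc}(t)$. The sketch below is only an illustration: the helper names `sinc` and `ratio` and the test values of $\alpha$ are chosen here and do not appear in the paper. For $\alpha = 2$ every zero of the denominator is cancelled and the ratio equals $\cos(t)$; for $\alpha = 1.5$ the zero of the denominator at $t = \pi$ is not cancelled and the ratio blows up nearby.

```python
import math

def sinc(t):
    """Unnormalized sinc, sin(t)/t, the characteristic function of Unif[-1, 1]."""
    return 1.0 if t == 0.0 else math.sin(t) / t

def ratio(alpha, t):
    """Candidate characteristic function sinc(alpha * t) / sinc(t) of X_hat."""
    return sinc(alpha * t) / sinc(t)

# alpha = 2: zeros of sinc(t) are cancelled by zeros of sinc(2t); the ratio is cos(t).
for t in (0.3, 0.7, 2.0, 5.0):
    print(t, ratio(2.0, t), math.cos(t))

# alpha = 1.5: sinc(1.5 * pi) is nonzero while sinc(pi) = 0, so the ratio is
# unbounded near t = pi and cannot be a characteristic function.
print(ratio(1.5, math.pi - 1e-6))
```

A bounded, continuous ratio is only a necessary condition; the $\alpha = 2$ case is a valid characteristic function because $\cos(t)$ is known to correspond to $\hat{X} = \pm 1$ with equal probability.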

E. Case of $q < p < 2$

Finally, for the regime $q < p < 2$ the result follows from Proposition 5, where it is shown that

$$\lim_{t \to \infty} \phi_{(q,p,\alpha)}(t) = \infty, \tag{53}$$

which violates the condition that a characteristic function is bounded. This concludes the proof.

APPENDIX E
PROOF OF THEOREM 9

To show that $\phi_{(p,p,\alpha)}$ is a characteristic function we use Cramér's theorem [8], which requires verification that

$$\frac{2}{x} \int_0^{xA} \left(A - \frac{u}{x}\right) \phi_{(p,p,\alpha)}\!\left(\frac{u}{x}\right) \cos(u)\, du \ge 0, \tag{54}$$

for $x > 0$ and $A \ge 0$.

First, we focus on the case in which $xA$ is small enough that the integration range $[0, xA]$ need not be split. By using the fact that $\phi_{(p,p,\alpha)}(t)$ is monotonically decreasing, by the second mean value theorem for integration we have that

$$\int_0^{xA} \left(A - \frac{u}{x}\right) \phi_{(p,p,\alpha)}\!\left(\frac{u}{x}\right) \cos(u)\, du = \phi_{(p,p,\alpha)}(0) \int_0^b \left(A - \frac{u}{x}\right) \cos(u)\, du, \tag{55}$$

for some $b \in [0, xA]$. Note that the integral $\int_0^b \left(A - \frac{u}{x}\right) \cos(u)\, du \ge 0$ for all $b \in [0, xA]$.

For larger $xA$, the range $[0, xA]$ is split into $K$ consecutive subintervals of equal length plus a remainder, where $K$ is the largest integer for which $K$ such subintervals fit into $[0, xA]$, so that the integral in (54) becomes a sum of $K$ integrals over the subintervals plus one integral over the remainder. The proof then follows similarly to the first case by using the second mean value theorem for integration on each of the domains. For further details see [9].

Next, we show that

$$\phi_{(p,p,\alpha)}(t) = \frac{1}{\alpha^{p+1}} + \left(1-\frac{1}{\alpha^{p+1}}\right) G(t), \tag{56}$$

where

$$G(t) = \frac{\phi_{(p,p,\alpha)}(t) - \frac{1}{\alpha^{p+1}}}{1-\frac{1}{\alpha^{p+1}}}$$

is the characteristic function of a continuous distribution. First, observe that $f(t) = 1$ is the characteristic function of $\delta(x)$. Second, recall that the set of characteristic functions is closed under convex combinations, and since $\phi_{(p,p,\alpha)}(t)$ and $f(t)$ are characteristic functions so is $G(t)$. To show that $G(t)$ is the characteristic function of a continuous distribution it is enough to show that $G(t)$ is integrable (i.e., we can perform Fourier inversion). The integrability of $G(t)$ follows from the fact that $G(t)$ is bounded and from its decay as $t \to \infty$ (57); for details see [9]. This concludes the proof.

REFERENCES

[1] C. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, no. 3, pp. 379-423, 623-656, Jul., Oct. 1948.
[2] H. Si, O. O. Koyluoglu, and S. Vishwanath, "Lossy compression of exponential and Laplacian sources using expansion coding," in Proc. IEEE Int. Symp. Inf. Theory. IEEE, 2014, pp. 3052-3056.
[3] S. Verdú, "The exponential distribution in information theory," Problemy Peredachi Informatsii, vol. 32, no. 1, pp. 100-111, 1996.
[4] T. Cover and J. Thomas, Elements of Information Theory: Second Edition. Wiley, 2006.
[5] E. Lutwak, D. Yang, and G. Zhang, "Moment-entropy inequalities for a random vector," IEEE Trans. Inf. Theory, vol. 53, no. 4, pp. 1603-1607, April 2007.
[6] S. Nadarajah, "A generalized normal distribution," Journal of Applied Statistics, vol. 32, no. 7, pp. 685-694, 2005.
[7] N. Elkies, A. Odlyzko, and J. Rush, "On the packing densities of superballs and other bodies," Inventiones Mathematicae, vol. 105, no. 1, pp. 613-639, 1991.
[8] N. G. Ushakov, Selected Topics in Characteristic Functions. Walter de Gruyter, 1999.
[9] A. Dytso, R. Bustin, H. V. Poor, D. Tuninetti, N. Devroye, and S. Shamai, "The generalized Gaussian distribution in information theory: Rate distortion," www.princeton.edu/~adytso/papers/gginfotheoresourcecoding.pdf, 2017.