Moment-entropy inequalities for a random vector


Erwin Lutwak, Deane Yang, and Gaoyong Zhang

E. Lutwak (elutwak@poly.edu), D. Yang (dyang@poly.edu), and G. Zhang (gzhang@poly.edu) are with the Department of Mathematics, Polytechnic University, Brooklyn, New York, and were supported in part by NSF Grant DMS-0405707.

Abstract: The $p$-th moment matrix is defined for a real random vector, generalizing the classical covariance matrix. Sharp inequalities relating the $p$-th moment and Renyi entropy are established, generalizing the classical inequality relating the second moment and the Shannon entropy. The extremal distributions for these inequalities are completely characterized.

I. INTRODUCTION

In [9] the authors demonstrated how the classical information-theoretic inequality for the Shannon entropy and second moment of a real random variable could be extended to inequalities for Renyi entropy and the $p$-th moment. The extremals of these inequalities were also completely characterized. Moment-entropy inequalities, using Renyi entropy, for discrete random variables have also been obtained by Arikan [2].

We describe how to extend the definition of the second moment matrix of a real random vector to that of the $p$-th moment matrix. Using this, we extend the moment-entropy inequalities and the characterization of the extremal distributions proved in [9] to higher dimensions.

Variants and generalizations of the theorems presented can be found in work of the authors [8], [10], [11] and Bastero-Romance [3].

The authors would like to thank Christoph Haberl for his careful reading of this paper and valuable suggestions for improving it.

II. THE p-th MOMENT MATRIX OF A RANDOM VECTOR

A. Basic notation

Throughout this paper we denote:
$\mathbb{R}^n$ = $n$-dimensional Euclidean space
$x\cdot y$ = standard Euclidean inner product of $x, y \in \mathbb{R}^n$
$|x| = \sqrt{x\cdot x}$
$\mathcal{S}$ = positive definite symmetric $n$-by-$n$ matrices
$|A|$ = determinant of $A \in \mathcal{S}$
$|K|$ = Lebesgue measure of $K \subset \mathbb{R}^n$.

The standard Euclidean ball in $\mathbb{R}^n$ will be denoted by $B$, and its volume by $\omega_n$.

Each inner product on $\mathbb{R}^n$ can be written uniquely as
$$(x, y) \mapsto \langle x, y\rangle_A = Ax\cdot Ay,$$
for $A \in \mathcal{S}$. The associated norm will be denoted by $|\cdot|_A$.
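As a small numerical illustration of this notation (a sketch, not part of the original paper), the Python code below takes an inner product on $\mathbb{R}^2$ given by a symmetric positive definite Gram matrix $G$, so that $\langle x, y\rangle = x\cdot Gy$, and recovers the matrix $A \in \mathcal{S}$ with $\langle x, y\rangle = Ax\cdot Ay$ as the positive definite square root $A = G^{1/2}$. It also computes $\omega_n = \pi^{n/2}/\Gamma(n/2+1)$ and the volume $\omega_2/|A|$ of the unit ball $A^{-1}B$ of $|\cdot|_A$. The restriction to $n = 2$ (where the square root has a closed form) and all function names are illustrative assumptions.

```python
import math


def omega(n):
    """Volume omega_n of the Euclidean unit ball B in R^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def sqrt_spd_2x2(g):
    """Positive definite square root of a 2x2 symmetric positive definite matrix.

    Closed form: sqrt(G) = (G + s I) / t, with s = sqrt(det G), t = sqrt(tr G + 2 s).
    """
    s = math.sqrt(det2(g))
    t = math.sqrt(g[0][0] + g[1][1] + 2 * s)
    return [[(g[0][0] + s) / t, g[0][1] / t],
            [g[1][0] / t, (g[1][1] + s) / t]]


def matvec(m, x):
    """Matrix-vector product for a 2x2 matrix and a 2-vector."""
    return [m[0][0] * x[0] + m[0][1] * x[1], m[1][0] * x[0] + m[1][1] * x[1]]


def inner_A(a, x, y):
    """The inner product <x, y>_A = Ax . Ay."""
    ax, ay = matvec(a, x), matvec(a, y)
    return ax[0] * ay[0] + ax[1] * ay[1]


def unit_ball_volume(a):
    """Volume of the unit ball {x : |Ax| <= 1} = A^{-1} B, namely omega_2 / |A|."""
    return omega(2) / det2(a)


# Example: the inner product <x, y> = x . G y with Gram matrix G.
G = [[2.0, 1.0], [1.0, 3.0]]
A = sqrt_spd_2x2(G)
x, y = [1.0, -2.0], [0.5, 4.0]
print(inner_A(A, x, y))  # agrees with x . G y = -20 up to rounding
```

Because $A$ is required to lie in $\mathcal{S}$, the square root is the positive definite one, which is what makes the representation unique.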
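Throughout this paper, $X$ denotes a random vector in $\mathbb{R}^n$. The probability measure on $\mathbb{R}^n$ associated with a random vector $X$ is denoted $m_X$. We will denote the standard Lebesgue density on $\mathbb{R}^n$ by $dx$. By the density function $f_X$ of a random vector $X$, we mean the Radon-Nikodym derivative of the probability measure $m_X$ with respect to Lebesgue measure.

If $V$ is a vector space and $\Phi : \mathbb{R}^n \to V$ is a continuous function, then the expected value of $\Phi(X)$ is given by
$$E[\Phi(X)] = \int_{\mathbb{R}^n} \Phi(x)\, dm_X(x).$$
We call a random vector $X$ nondegenerate if $E[|v\cdot X|] > 0$ for each nonzero $v \in \mathbb{R}^n$.

B. The p-th moment of a random vector

For $p \in (0, \infty)$, the standard $p$-th moment of a random vector $X$ is given by
$$E[|X|^p] = \int_{\mathbb{R}^n} |x|^p\, dm_X(x). \tag{1}$$
More generally, the $p$-th moment with respect to the inner product $\langle\cdot,\cdot\rangle_A$ is
$$E[|X|_A^p] = \int_{\mathbb{R}^n} |x|_A^p\, dm_X(x).$$

C. The p-th moment matrix

The second moment matrix of a random vector $X$ is defined to be
$$M_2[X] = E[X\otimes X],$$
where, for $v \in \mathbb{R}^n$, $v\otimes v$ is the linear transformation given by $x \mapsto (x\cdot v)v$. Recall that $M_2[X - E[X]]$ is the covariance matrix. An important observation is that the definition of the moment matrix does not use the inner product on $\mathbb{R}^n$.

The second moment matrix has the following characterization. Let $M = M_2[X]$. The inner product $\langle\cdot,\cdot\rangle_{M^{-1/2}}$ is the unique one whose unit ball has minimal volume (equivalently, whose matrix $A$ has maximal determinant $|A|$) among all inner products $\langle\cdot,\cdot\rangle_A$ that are normalized so that the second moment satisfies $E[|X|_A^2] = n$.

We extend this characterization to a definition of the $p$-th moment matrix $M_p[X]$ for all $p \in (0, \infty)$.

Theorem 1: If $p \in (0, \infty)$ and $X$ is a nondegenerate random vector in $\mathbb{R}^n$ with finite $p$-th moment, then there exists a unique matrix $A \in \mathcal{S}$ such that
$$E[|X|_A^p] = n$$
and
$$|A| \ge |A'|$$
for each $A' \in \mathcal{S}$ such that $E[|X|_{A'}^p] = n$. Moreover, the matrix $A$ is the unique matrix in $\mathcal{S}$ satisfying
$$I = E[AX\otimes AX\, |AX|^{p-2}].$$

We define the $p$-th moment matrix of a random vector $X$ to be $M_p[X] = A^{-p}$, where $A$ is given by the theorem above. The proof of the theorem is given in Section IV.

III. MOMENT-ENTROPY INEQUALITIES

A. Entropy

The Shannon entropy of a random vector $X$ is defined to be
$$h[X] = -\int_{\mathbb{R}^n} f_X \log f_X\, dx,$$
provided that the integral above exists. For $\lambda > 0$ the $\lambda$-Renyi entropy power of a density function is defined to be
$$N_\lambda[X] = \begin{cases} \left(\int_{\mathbb{R}^n} f_X^\lambda\, dx\right)^{\frac{1}{1-\lambda}} & \text{if } \lambda \ne 1,\\ e^{h[X]} & \text{if } \lambda = 1,\end{cases}$$
provided that the integral above exists. Observe that $\lim_{\lambda\to 1} N_\lambda[X] = N_1[X]$.

The $\lambda$-Renyi entropy of a random vector $X$ is defined to be $h_\lambda[X] = \log N_\lambda[X]$. The entropy $h_\lambda[X]$ is continuous in $\lambda$ and, by the Hölder inequality, decreasing in $\lambda$. It is strictly decreasing, unless $X$ is a uniform random vector. It follows by the chain rule that
$$N_\lambda[AX] = |A|\, N_\lambda[X], \tag{2}$$
for each $A \in \mathcal{S}$.

B. Relative entropy

Given two random vectors $X, Y$ in $\mathbb{R}^n$, their relative Shannon entropy or Kullback-Leibler distance [6], [5], [1] (also, see page 231 in [4]) is defined by
$$h_1[X, Y] = \int_{\mathbb{R}^n} f_X \log\left(\frac{f_X}{f_Y}\right) dx, \tag{3}$$
provided that the integral above exists. Given $\lambda > 0$, we define the relative $\lambda$-Renyi entropy power of $X$ and $Y$ as follows. If $\lambda \ne 1$, then
$$N_\lambda[X, Y] = \frac{\left(\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X\, dx\right)^{\frac{1}{1-\lambda}} \left(\int_{\mathbb{R}^n} f_Y^{\lambda}\, dx\right)^{\frac{1}{\lambda}}}{\left(\int_{\mathbb{R}^n} f_X^{\lambda}\, dx\right)^{\frac{1}{\lambda(1-\lambda)}}}, \tag{4}$$
and, if $\lambda = 1$, then
$$N_1[X, Y] = e^{h_1[X, Y]},$$
provided in both cases that the right-hand side exists. Define the $\lambda$-Renyi relative entropy of random vectors $X$ and $Y$ by
$$h_\lambda[X, Y] = \log N_\lambda[X, Y].$$
Observe that $h_\lambda[X, Y]$ is continuous in $\lambda$.

Lemma 2: If $X$ and $Y$ are random vectors such that $h_\lambda[X]$, $h_\lambda[Y]$, and $h_\lambda[X, Y]$ are finite, then
$$h_\lambda[X, Y] \ge 0.$$
Equality holds if and only if $X = Y$ (that is, $f_X = f_Y$ almost everywhere).

Proof: If $\lambda > 1$, then by the Hölder inequality,
$$\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X\, dx \le \left(\int_{\mathbb{R}^n} f_Y^{\lambda}\, dx\right)^{\frac{\lambda-1}{\lambda}} \left(\int_{\mathbb{R}^n} f_X^{\lambda}\, dx\right)^{\frac{1}{\lambda}},$$
and if $\lambda < 1$, then we have
$$\int_{\mathbb{R}^n} f_X^{\lambda}\, dx = \int_{\mathbb{R}^n} \left(f_Y^{\lambda-1} f_X\right)^{\lambda} f_Y^{\lambda(1-\lambda)}\, dx \le \left(\int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X\, dx\right)^{\lambda} \left(\int_{\mathbb{R}^n} f_Y^{\lambda}\, dx\right)^{1-\lambda}.$$
Rearranging either inequality gives $N_\lambda[X, Y] \ge 1$, that is, $h_\lambda[X, Y] \ge 0$. The inequality for $\lambda = 1$ follows by taking the limit $\lambda \to 1$. The equality conditions for $\lambda \ne 1$ follow from the equality conditions of the Hölder inequality.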
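The inequality for $\lambda = 1$, including the equality condition, follows from the Jensen inequality (details may be found, for example, on page 234 in [4]).

C. Generalized Gaussians

We call the extremal random vectors for the moment-entropy inequalities generalized Gaussians and recall their definition here. Given $t \in \mathbb{R}$, let
$$t_+ = \max(t, 0).$$
Let
$$\Gamma(t) = \int_0^\infty x^{t-1} e^{-x}\, dx$$
denote the Gamma function, and let
$$\beta(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$
denote the Beta function.

For each $p \in (0, \infty)$ and $\lambda \in (n/(n+p), \infty)$, define the standard generalized Gaussian to be the random vector $Z$ in $\mathbb{R}^n$ whose density function $f_Z : \mathbb{R}^n \to [0, \infty)$ is given by
$$f_Z(x) = \begin{cases} a_{p,\lambda}\left(1 + (1-\lambda)|x|^p\right)_+^{1/(\lambda-1)} & \text{if } \lambda \ne 1,\\ a_{p,1}\, e^{-|x|^p} & \text{if } \lambda = 1,\end{cases} \tag{5}$$
where
$$a_{p,\lambda} = \begin{cases} \dfrac{p(1-\lambda)^{n/p}}{n\omega_n\, \beta\!\left(\frac{n}{p}, \frac{1}{1-\lambda} - \frac{n}{p}\right)} & \text{if } \lambda < 1,\\[2ex] \dfrac{p}{n\omega_n\, \Gamma\!\left(\frac{n}{p}\right)} & \text{if } \lambda = 1,\\[2ex] \dfrac{p(\lambda-1)^{n/p}}{n\omega_n\, \beta\!\left(\frac{n}{p}, \frac{\lambda}{\lambda-1}\right)} & \text{if } \lambda > 1.\end{cases}$$
Any random vector $Y$ in $\mathbb{R}^n$ that can be written as $Y = AZ$, for some $A \in \mathcal{S}$, is called a generalized Gaussian.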
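The normalizing constants $a_{p,\lambda}$ can be sanity-checked numerically. The sketch below (an illustration, not taken from the paper; the function names are hypothetical) uses polar coordinates, $\int_{\mathbb{R}^n} g(|x|)\,dx = n\omega_n\int_0^\infty r^{n-1} g(r)\,dr$, and the composite Simpson rule in the radius to confirm that the density (5) integrates to 1 for a few choices of $(n, p, \lambda)$. The truncation radius used for $\lambda \le 1$ and the number of Simpson steps are ad hoc assumptions.

```python
import math


def omega(n):
    """Volume of the unit ball in R^n: pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


def beta(a, b):
    """Beta function expressed through the Gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def a_const(n, p, lam):
    """Normalizing constant a_{p,lambda} of the standard generalized Gaussian in R^n."""
    if lam < 1:
        return p * (1 - lam) ** (n / p) / (n * omega(n) * beta(n / p, 1 / (1 - lam) - n / p))
    if lam == 1:
        return p / (n * omega(n) * math.gamma(n / p))
    return p * (lam - 1) ** (n / p) / (n * omega(n) * beta(n / p, lam / (lam - 1)))


def profile(r, p, lam):
    """Unnormalized radial profile of f_Z in (5), evaluated at |x| = r."""
    if lam == 1:
        return math.exp(-(r ** p))
    base = 1 + (1 - lam) * r ** p
    # The positive part (.)_+ only matters for lam > 1, where f_Z has compact support.
    return base ** (1 / (lam - 1)) if base > 0 else 0.0


def total_mass(n, p, lam, r_max=None, steps=20000):
    """Approximate the integral of f_Z over R^n by Simpson's rule in the radius r.

    For lam > 1 the support radius (lam - 1)^(-1/p) is used; otherwise r_max
    is a user-chosen truncation radius. `steps` must be even.
    """
    if lam > 1:
        r_max = (lam - 1) ** (-1 / p)
    scale = n * omega(n) * a_const(n, p, lam)
    h = r_max / steps

    def g(r):
        return scale * r ** (n - 1) * profile(r, p, lam)

    s = g(0.0) + g(r_max)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * g(k * h)
    return s * h / 3
```

For example, `total_mass(2, 2, 1.5)` integrates over the support disk of radius $\sqrt{2}$, while `total_mass(2, 2, 0.8, 100.0)` needs a large truncation radius because for $\lambda < 1$ the density decays only like $|x|^{-p/(1-\lambda)}$.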

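D. Information measures of generalized Gaussians

If $0 < p < \infty$ and $\lambda > n/(n+p)$, the $\lambda$-Renyi entropy power of the standard generalized Gaussian random vector $Z$ is given by
$$N_\lambda[Z] = \begin{cases} \left(1 + \dfrac{n(\lambda-1)}{p\lambda}\right)^{\frac{1}{\lambda-1}} a_{p,\lambda}^{-1} & \text{if } \lambda \ne 1,\\[2ex] e^{n/p}\, a_{p,1}^{-1} & \text{if } \lambda = 1.\end{cases}$$
If $0 < p < \infty$ and $\lambda > n/(n+p)$, then the $p$-th moment of $Z$ is given by
$$E[|Z|^p] = \left[\lambda\left(1 + \frac{p}{n}\right) - 1\right]^{-1}.$$
We define the constant
$$c(n, p, \lambda) = \frac{E[|Z|^p]^{1/p}}{N_\lambda[Z]^{1/n}} = a_{p,\lambda}^{1/n}\left[\lambda\left(1 + \frac{p}{n}\right) - 1\right]^{-1/p} b(n, p, \lambda), \tag{6}$$
where
$$b(n, p, \lambda) = \begin{cases} \left(1 - \dfrac{n(1-\lambda)}{p\lambda}\right)^{\frac{1}{n(1-\lambda)}} & \text{if } \lambda \ne 1,\\[2ex] e^{-1/p} & \text{if } \lambda = 1.\end{cases}$$

Observe that if $\lambda \ne 1$ and $0 < p < \infty$, then
$$\int_{\mathbb{R}^n} f_Z^\lambda\, dx = a_{p,\lambda}^{\lambda-1}\left(1 + (1-\lambda)E[|Z|^p]\right), \tag{7}$$
and if $\lambda = 1$, then
$$h[Z] = -\log a_{p,1} + E[|Z|^p]. \tag{8}$$

We will also need the following scaling identities:
$$f_{tZ}(x) = t^{-n} f_Z(t^{-1}x), \tag{9}$$
for each $x \in \mathbb{R}^n$. Therefore,
$$\int_{\mathbb{R}^n} f_{tZ}^\lambda\, dx = t^{n(1-\lambda)} \int_{\mathbb{R}^n} f_Z^\lambda\, dx, \tag{10}$$
and
$$E[|tZ|^p] = t^p E[|Z|^p].$$

E. Spherical moment-entropy inequalities

The proof of Theorem 2 in [9] extends easily to prove the following. A more general version can be found in [7].

Theorem 3: If $p \in (0, \infty)$, $\lambda > n/(n+p)$, and $X$ is a random vector in $\mathbb{R}^n$ such that $N_\lambda[X], E[|X|^p] < \infty$, then
$$\frac{E[|X|^p]^{1/p}}{N_\lambda[X]^{1/n}} \ge c(n, p, \lambda),$$
where $c(n, p, \lambda)$ is given by (6). Equality holds if and only if $X = tZ$, for some $t \in (0, \infty)$.

Proof: For convenience let $a = a_{p,\lambda}$. Let
$$t = \left(\frac{E[|X|^p]}{E[|Z|^p]}\right)^{1/p} \tag{11}$$
and $Y = tZ$. If $\lambda \ne 1$, then by (9) and (5), (1), (11), and (7),
$$\begin{aligned} \int_{\mathbb{R}^n} f_Y^{\lambda-1} f_X\, dx &\ge a^{\lambda-1} t^{n(1-\lambda)} + (1-\lambda)\, a^{\lambda-1} t^{n(1-\lambda)-p} \int_{\mathbb{R}^n} |x|^p f_X(x)\, dx\\ &= a^{\lambda-1} t^{n(1-\lambda)}\left(1 + (1-\lambda) t^{-p} E[|X|^p]\right)\\ &= a^{\lambda-1} t^{n(1-\lambda)}\left(1 + (1-\lambda) E[|Z|^p]\right)\\ &= t^{n(1-\lambda)} \int_{\mathbb{R}^n} f_Z^\lambda\, dx, \end{aligned} \tag{12}$$
where equality holds if $\lambda < 1$. It follows that if $\lambda \ne 1$, then by Lemma 2, (4), (10) and (12), and (11), we have
$$1 \le N_\lambda[X, Y]^\lambda \le \left(\frac{\int_{\mathbb{R}^n} f_Y^\lambda\, dx}{\int_{\mathbb{R}^n} f_X^\lambda\, dx}\right)^{\frac{1}{1-\lambda}} = \frac{t^n N_\lambda[Z]}{N_\lambda[X]} = \frac{E[|X|^p]^{n/p}}{N_\lambda[X]}\cdot\frac{N_\lambda[Z]}{E[|Z|^p]^{n/p}},$$
which, after taking $n$-th roots, is the asserted inequality.

If $\lambda = 1$, then by Lemma 2, (3) and (5), and (8) and (11),
$$0 \le h_1[X, Y] = -h[X] - \log a + n\log t + t^{-p}E[|X|^p] = -h[X] + h[Z] + \frac{n}{p}\log\frac{E[|X|^p]}{E[|Z|^p]}.$$
Lemma 2 shows that equality holds in all cases if and only if $Y = X$.

F. Elliptic moment-entropy inequalities

Corollary 4: If $A \in \mathcal{S}$, $p \in (0, \infty)$, $\lambda > n/(n+p)$, and $X$ is a random vector in $\mathbb{R}^n$ satisfying $N_\lambda[X], E[|X|^p] < \infty$, then
$$\frac{E[|X|_A^p]^{1/p}}{|A|^{1/n}\, N_\lambda[X]^{1/n}} \ge c(n, p, \lambda), \tag{13}$$
where $c(n, p, \lambda)$ is given by (6).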
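Equality holds if and only if $X = tA^{-1}Z$ for some $t \in (0, \infty)$.

Proof: By (2) and Theorem 3,
$$\frac{E[|X|_A^p]^{1/p}}{|A|^{1/n}\, N_\lambda[X]^{1/n}} = \frac{E[|AX|^p]^{1/p}}{N_\lambda[AX]^{1/n}} \ge \frac{E[|Z|^p]^{1/p}}{N_\lambda[Z]^{1/n}},$$
and equality holds if and only if $AX = tZ$ for some $t \in (0, \infty)$.

G. Affine moment-entropy inequalities

Optimizing Corollary 4 over all $A \in \mathcal{S}$ yields the following affine inequality.

Theorem 5: If $p \in (0, \infty)$, $\lambda > n/(n+p)$, and $X$ is a random vector in $\mathbb{R}^n$ satisfying $N_\lambda[X], E[|X|^p] < \infty$, then
$$\frac{|M_p[X]|^{1/(np)}}{N_\lambda[X]^{1/n}} \ge n^{-1/p}\, c(n, p, \lambda),$$
where $c(n, p, \lambda)$ is given by (6). Equality holds if and only if $X = A^{-1}Z$ for some $A \in \mathcal{S}$.

Proof: Substitute $A = M_p[X]^{-1/p}$ into (13); by Theorem 1, $E[|X|_A^p] = n$ and $|A|^{-1/n} = |M_p[X]|^{1/(np)}$. Conversely, Corollary 4 follows from Theorem 5 by Theorem 1.

IV. PROOF OF THEOREM 1

A. Isotropic position of a probability measure

A Borel measure $\mu$ on $\mathbb{R}^n$ is said to be in isotropic position if
$$\int_{\mathbb{R}^n} \frac{x\otimes x}{|x|^2}\, d\mu(x) = \frac{1}{n} I, \tag{14}$$
where $I$ is the identity matrix.

Lemma 6: If $p > 0$ and $\mu$ is a Borel probability measure in isotropic position, then for each $A \in \mathcal{S}$,
$$|A|^{-1/n}\left(\int_{\mathbb{R}^n} \frac{|Ax|^p}{|x|^p}\, d\mu(x)\right)^{1/p} \ge 1,$$
with equality holding if and only if $A = aI$ for some $a > 0$.

Proof: By Hölder's (or Jensen's) inequality,
$$\left(\int_{\mathbb{R}^n} \frac{|Ax|^p}{|x|^p}\, d\mu(x)\right)^{1/p} \ge \exp\left(\int_{\mathbb{R}^n} \log\frac{|Ax|}{|x|}\, d\mu(x)\right),$$
with equality only if $|Ax|/|x|$ is constant $\mu$-almost everywhere, so it suffices to prove the inequality in the limiting case $p = 0$, namely $\int \log(|Ax|/|x|)\, d\mu(x) \ge \log|A|^{1/n}$. By (14),
$$\int_{\mathbb{R}^n} \frac{(x\cdot e)^2}{|x|^2}\, d\mu(x) = \frac{1}{n}, \tag{15}$$
for any unit vector $e$. Let $e_1, \dots, e_n$ be an orthonormal basis of eigenvectors of $A$ with corresponding eigenvalues $\lambda_1, \dots, \lambda_n$. By the concavity of log, and (15),
$$\begin{aligned} \int_{\mathbb{R}^n} \log\frac{|Ax|}{|x|}\, d\mu(x) &= \frac{1}{2}\int_{\mathbb{R}^n} \log\frac{|Ax|^2}{|x|^2}\, d\mu(x) = \frac{1}{2}\int_{\mathbb{R}^n} \log\left(\sum_{i=1}^n \lambda_i^2 \frac{(x\cdot e_i)^2}{|x|^2}\right) d\mu(x)\\ &\ge \frac{1}{2}\int_{\mathbb{R}^n} \sum_{i=1}^n \frac{(x\cdot e_i)^2}{|x|^2} \log\lambda_i^2\, d\mu(x) = \frac{1}{2n}\sum_{i=1}^n \log\lambda_i^2 = \log|A|^{1/n}. \end{aligned}$$
If equality holds, then $|Ax|/|x|$ equals a constant $a > 0$ for $\mu$-almost every $x$, and the strict concavity of log forces $\lambda_i^2 = a^2$ for every $i$ with $x\cdot e_i \ne 0$, for $\mu$-almost every $x$. By (15), for each $i$ the set where $x\cdot e_i \ne 0$ has positive $\mu$-measure, so $\lambda_i = a$ for all $i$, that is, $A = aI$. Conversely, $A = aI$ clearly gives equality.

B. Proof of theorem

Lemma 7: If $p > 0$ and $X$ is a nondegenerate random vector in $\mathbb{R}^n$ with finite $p$-th moment, then there exists $c > 0$ such that
$$E[|e\cdot X|^p] \ge c, \tag{16}$$
for every unit vector $e$.

Proof: The left side of (16) is a positive (by nondegeneracy) and continuous (by dominated convergence, since $|e\cdot X|^p \le |X|^p$) function on the unit sphere, which is compact.

Theorem 8: If $p > 0$ and $X$ is a nondegenerate random vector in $\mathbb{R}^n$ with finite $p$-th moment, then there exists $A \in \mathcal{S}$, unique up to a scalar multiple, such that
$$|A|^{-1/n} E[|AX|^p]^{1/p} \le |A'|^{-1/n} E[|A'X|^p]^{1/p} \tag{17}$$
for every $A' \in \mathcal{S}$.

Proof: Let $\mathcal{S}' \subset \mathcal{S}$ be the subset of matrices whose maximum eigenvalue is exactly 1. Since both sides of (17) are unchanged when the matrix is multiplied by a positive scalar, it suffices to minimize over $\mathcal{S}'$. This is a bounded set inside the set of all symmetric matrices, with its boundary $\partial\mathcal{S}'$ equal to the positive semidefinite matrices with maximum eigenvalue 1 and minimum eigenvalue 0. Given $A' \in \mathcal{S}'$, let $e$ be a unit eigenvector of $A'$ with eigenvalue 1. Since $|A'X| \ge |e\cdot A'X| = |X\cdot e|$, Lemma 7 gives
$$|A'|^{-1/n} E[|A'X|^p]^{1/p} \ge |A'|^{-1/n} E[|X\cdot e|^p]^{1/p} \ge c^{1/p}|A'|^{-1/n}. \tag{18}$$
Therefore, if $A'$ approaches the boundary $\partial\mathcal{S}'$, the left side of (18) grows without bound.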
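Since the left side of (18) is a continuous function on $\mathcal{S}'$, the existence of a minimum follows. Let $A \in \mathcal{S}'$ be such a minimum and $Y = AX$. Then for each $B \in \mathcal{S}$,
$$|B|^{-1/n} E[|BY|^p]^{1/p} = |A|^{1/n}\, |BA|^{-1/n} E[|(BA)X|^p]^{1/p} \ge |A|^{1/n}\, |A|^{-1/n} E[|AX|^p]^{1/p} = E[|Y|^p]^{1/p}, \tag{19}$$
with equality holding if and only if equality holds for (17) with $A'$ replaced by $BA$. (Although $BA$ need not be symmetric, its polar decomposition $BA = QP$, with $Q$ orthogonal and $P = (AB^2A)^{1/2} \in \mathcal{S}$, gives $|BAx| = |Px|$ for all $x$ and $|BA| = |P|$, so (17) applies with $A' = P$.) Setting $B = I + tB'$ for $B' \in \mathcal{S}$, we get
$$|I + tB'|^{-1/n} E[|(I + tB')Y|^p]^{1/p} \ge E[|Y|^p]^{1/p},$$
for each $t$ near 0. It follows that
$$\left.\frac{d}{dt}\right|_{t=0} |I + tB'|^{-1/n} E[|(I + tB')Y|^p]^{1/p} = 0,$$
for each $B' \in \mathcal{S}$. A straightforward computation shows that this derivative equals
$$E[|Y|^p]^{\frac{1}{p}-1}\, \mathrm{tr}\Big(B'\big(E[Y\otimes Y\, |Y|^{p-2}] - \tfrac{1}{n}E[|Y|^p]\, I\big)\Big),$$
and since $\mathcal{S}$ spans the space of symmetric matrices, it vanishes for each $B' \in \mathcal{S}$ only if
$$\frac{1}{n} E[|Y|^p]\, I = E[Y\otimes Y\, |Y|^{p-2}]. \tag{20}$$
Applying Lemma 6 to
$$d\mu(x) = \frac{|x|^p\, dm_Y(x)}{E[|Y|^p]},$$
which is a probability measure in isotropic position by (20), implies that equality holds for (19) only if $B = aI$ for some $a \in (0, \infty)$. This, in turn, implies that equality holds for (17) only if $A' = aA$.

Theorem 1 follows from Theorem 8 by rescaling $A$ so that $E[|Y|^p] = n$ and substituting $Y = AX$ into (20).

REFERENCES

[1] Shun-ichi Amari, Differential-geometrical methods in statistics, Lecture Notes in Statistics, vol. 28, Springer-Verlag, New York, 1985. MR 86m:62053
[2] Erdal Arikan, An inequality on guessing and its application to sequential decoding, IEEE Trans. Inform. Theory 42 (1996), 99–105.
[3] J. Bastero and M. Romance, Positions of convex bodies associated to extremal problems and isotropic measures, Adv. Math. 184 (2004), no. 1, 64–88.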

[4] Thomas M. Cover and Joy A. Thomas, Elements of information theory, John Wiley & Sons Inc., New York, 1991, A Wiley-Interscience Publication.
[5] I. Csiszár, Information-type measures of difference of probability distributions and indirect observations, Studia Sci. Math. Hungar. 2 (1967), 299–318. MR 36 #2428
[6] S. Kullback and R. A. Leibler, On information and sufficiency, Ann. Math. Statistics 22 (1951), 79–86. MR 12,623a
[7] E. Lutwak, D. Yang, and G. Zhang, The Cramer-Rao inequality for star bodies, Duke Math. J. 112 (2002), 59–81.
[8] E. Lutwak, D. Yang, and G. Zhang, Moment-entropy inequalities, Annals of Probability 32 (2004), 757–774.
[9] E. Lutwak, D. Yang, and G. Zhang, Cramer-Rao and moment-entropy inequalities for Renyi entropy and generalized Fisher information, IEEE Trans. Inform. Theory 51 (2005), 473–478.
[10] E. Lutwak, D. Yang, and G. Zhang, $L_p$ John ellipsoids, Proc. London Math. Soc. 90 (2005), 497–520.
[11] E. Lutwak, D. Yang, and G. Zhang, Optimal Sobolev norms and the $L_p$ Minkowski problem, Int. Math. Res. Not. (2006), 62987, 1–21.

Erwin Lutwak received his B.S., M.S., and Ph.D. degrees in Mathematics from Polytechnic University, where he is now Professor of Mathematics.

Deane Yang received his B.A. in mathematics and physics from University of Pennsylvania and Ph.D. in mathematics from Harvard University. He has been an NSF Postdoctoral Fellow at the Courant Institute and on the faculty of Rice University and Columbia University. He is now a full professor at Polytechnic University.

Gaoyong Zhang received his B.S. degree in mathematics from Wuhan University of Science and Technology, M.S. degree in mathematics from Wuhan University, Wuhan, China, and Ph.D. degree in mathematics from Temple University, Philadelphia. He was a Rademacher Lecturer at the University of Pennsylvania, a member of the Institute for Advanced Study at Princeton, and a member of the Mathematical Sciences Research Institute at Berkeley. He was an assistant professor and is now a full professor at Polytechnic University.