arxiv: v1 [stat.me] 28 Mar 2011

Similar documents
Non Uniform Bounds on Geometric Approximation Via Stein s Method and w-functions

Bounds on expectation of order statistics from a nite population

Chapter 5 continued. Chapter 5 sections

Chapter 5. Chapter 5 sections

MAXIMUM VARIANCE OF ORDER STATISTICS

Expectation. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Strong log-concavity is preserved by convolution

Chapter 1. Sets and probability. 1.3 Probability space

Lecture 2: Repetition of probability theory and statistics

Expectation. DS GA 1002 Probability and Statistics for Data Science. Carlos Fernandez-Granda

On Approximating a Generalized Binomial by Binomial and Poisson Distributions

A Criterion for the Compound Poisson Distribution to be Maximum Entropy

1: PROBABILITY REVIEW

Bayesian decision theory Introduction to Pattern Recognition. Lectures 4 and 5: Bayesian decision theory

Random variables. DS GA 1002 Probability and Statistics for Data Science.

MFM Practitioner Module: Quantitative Risk Management. John Dodson. September 23, 2015

CS Lecture 19. Exponential Families & Expectation Propagation

Probability Distributions Columns (a) through (d)

Expectation of Random Variables

1 Presessional Probability

Lecture 13. Poisson Distribution. Text: A Course in Probability by Weiss 5.5. STAT 225 Introduction to Probability Models February 16, 2014

Conditional distributions (discrete case)

Review (Probability & Linear Algebra)

Chap 2.1 : Random Variables

Mathematical Statistics 1 Math A 6330

t x 1 e t dt, and simplify the answer when possible (for example, when r is a positive even number). In particular, confirm that EX 4 = 3.

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

International Journal of Scientific & Engineering Research, Volume 6, Issue 2, February-2015 ISSN

APPROXIMATION OF GENERALIZED BINOMIAL BY POISSON DISTRIBUTION FUNCTION

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

Binomial and Poisson Probability Distributions

Lecture 3. Discrete Random Variables

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Formulas for probability theory and linear models SF2941

Continuous Random Variables

Continuous Random Variables

EE4601 Communication Systems

The random variable 1

Continuous Probability Spaces

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University

A Few Special Distributions and Their Properties

3. Probability and Statistics

Basics on Probability. Jingrui He 09/11/2007

Algorithms for Uncertainty Quantification

Random Variables and Their Distributions

arxiv: v1 [stat.me] 6 Jun 2016

Random Variables. Definition: A random variable (r.v.) X on the probability space (Ω, F, P) is a mapping

Introduction to Machine Learning

EE 505 Introduction. What do we mean by random with respect to variables and signals?

Brief Review of Probability

STAT 302 Introduction to Probability Learning Outcomes. Textbook: A First Course in Probability by Sheldon Ross, 8 th ed.

MAS223 Statistical Inference and Modelling Exercises

Chapter 8.8.1: A factorization theorem

p. 6-1 Continuous Random Variables p. 6-2

Discrete Distributions

Chapter 1. Preliminaries. The purpose of this chapter is to provide some basic background information. Linear Space. Hilbert Space.

Multiple Random Variables

On discrete distributions with gaps having ALM property

More on Distribution Function

Probability and Distributions

Random variables, Expectation, Mean and Variance. Slides are adapted from STAT414 course at PennState

Chapter 3: Random Variables 1

STATISTICS SYLLABUS UNIT I

Minimum Error Rate Classification

(y 1, y 2 ) = 12 y3 1e y 1 y 2 /2, y 1 > 0, y 2 > 0 0, otherwise.

STATISTICAL METHODS FOR SIGNAL PROCESSING c Alfred Hero

Estimation of parametric functions in Downton s bivariate exponential distribution

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2

1 Uniform Distribution. 2 Gamma Distribution. 3 Inverse Gamma Distribution. 4 Multivariate Normal Distribution. 5 Multivariate Student-t Distribution

A Pointwise Approximation of Generalized Binomial by Poisson Distribution

1 Review of Probability and Distributions

Fisher information and Stam inequality on a finite group

ON CHARACTERIZATION OF POWER SERIES DISTRIBUTIONS BY A MARGINAL DISTRIBUTION AND A REGRESSION FUNCTION

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

01 Probability Theory and Statistics Review

Chapter 2. Random Variable. Define single random variables in terms of their PDF and CDF, and calculate moments such as the mean and variance.

HANDBOOK OF APPLICABLE MATHEMATICS

Week 2. Review of Probability, Random Variables and Univariate Distributions

Lecture 4: Probability and Discrete Random Variables

Relationship between probability set function and random variable - 2 -

Irr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland

Statistical Methods in HYDROLOGY CHARLES T. HAAN. The Iowa State University Press / Ames

Fisher Information, Compound Poisson Approximation, and the Poisson Channel

Information geometry for bivariate distribution control

Free Meixner distributions and random matrices

Things to remember when learning probability distributions:

Deccan Education Society s FERGUSSON COLLEGE, PUNE (AUTONOMOUS) SYLLABUS UNDER AUTOMONY. SECOND YEAR B.Sc. SEMESTER - III

2. Variance and Higher Moments

Fundamental Tools - Probability Theory II

STAT 3610: Review of Probability Distributions

. Find E(V ) and var(v ).

On a simple construction of bivariate probability functions with fixed marginals 1

Probability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014

Mathematical Statistics

MA22S3 Summary Sheet: Ordinary Differential Equations

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

Katz Family of Distributions and Processes

CMPE 58K Bayesian Statistics and Machine Learning Lecture 5

Markov Chains. X(t) is a Markov Process if, for arbitrary times t 1 < t 2 <... < t k < t k+1. If X(t) is discrete-valued. If X(t) is continuous-valued

SDS 321: Introduction to Probability and Statistics

Transcription:

arxiv:1103.5447v1 [stat.me] 28 Mar 2011 On matrix variance inequalities G. Afendras and N. Papadatos Department of Mathematics, Section of Statistics and O.R., University of Athens, Panepistemiopolis, 157 84 Athens, Greece. Abstract Olkin and Shepp (2005, J. Statist. Plann. Inference, vol. 130, pp. 351 358) presented a matrix form of Chernoff s inequality for Normal and Gamma (univariate) distributions. We extend and generalize this result, proving Poincarétype and Bessel-type inequalities, for matrices of arbitrary order and for a large class of distributions. MSC: Primary 60E15. Key words and phrases: Integrated Pearson family; Cumulative Ord family; quadratic; matrix inequality. 1 Introduction Let Z be a standard Normal random variable (r.v.) and assume that g 1,...,g p are absolutely continuous, real-valued, functions of Z, each with finite variance (with respect to Z). Olkin and Shepp (2005) presented a matrix extension of Chernoff s variance inequality, which reads as follows: Olkin and Sepp (2005). If D = D(g) is the covariance matrix of the random vector g = (g 1 (Z),...,g p (Z)) t, where t denotes transpose, and if E[g i (Z)]2 < for all i = 1,2,...,p, then D H, where H = H(g) = (E[g i (Z)g j (Z)]) p p, andthe inequality is considered in thesense of Loewner ordering, that is, the matrix H D is nonnegative definite. In this note we extend and generalize this inequality for a large family of discrete and continuous r.v. s. Specifically, our results apply to any r.v. X according to one of the following definitions (c.f. [1]). Definition 1. (Integrated Pearson Family). Let X be an r.v. with probability density function (p.d.f.) f and finite mean µ = E(X). We say that X follows the Integrated Pearson distribution IP(µ;δ,β,γ), X IP(µ;δ,β,γ), if there exists a quadratic q(x) = δx 2 +βx+γ (with β,δ,γ R, δ + β + γ > 0) such that x (µ t)f(t)dt = q(x)f(x) for all x R. (1) Work partially supported by the University of Athens Research fund under Grant 70/4/5637. Corresponding author. e-mail address: npapadat@math.uoa.gr

Definition 2. (Cumulative Ord Family). Let X be an integer-valued r.v. with finite mean µ and assume that p(k) = P(X = k), k Z, is the probability mass function (p.m.f.) of X. We say that X follows the Cumulative Ord distribution CO(µ;δ,β,γ), X CO(µ;δ,β,γ), if there exists a quadratic q(j) = δj 2 +βj +γ (with β,δ,γ R, δ + β + γ > 0) such that (µ k)p(k) = q(j)p(j) for all j Z. (2) k j It is well known that the commonly used distributions are members of the above families, e.g. Normal, Gamma, Beta, F and t distributions belong to the Integrated Pearson family, while Poisson, Binomial, Pascal (Negative Binomial) and Hypergeometric distribution are members of the Cumulative Ord family. Therefore, the results of the present note also improve and unify the corresponding bounds for Beta r.v. s, given by Prakasa Rao (2006) and Wei and Zhang (2009). 2 Matrix variance inequalities of Poincaré and Bessel type 2.1 Continuous Case In this subsection we shall make use of the following notations. Assume that X IP(µ;δ,β,γ) and denote by q(x) = δx 2 +βx+γ its quadratic. It is known that, under (1), the support J = J(X) = {x R : f(x) > 0} is a (finite or infinite) open interval, say J(X) = (α,ω) see [1], [2]. For a fixed integer n {1,2,...} we shall denote by H n (X) the class of functions g : (α,ω) R satisfying the following properties: H 1 : For each k {0,1,...,n 1}, g (k) (with g (0) = g) is an absolutely continuous function with derivative g (k+1). H 2 : For each k {0,1,...,n}, E[q k (X)(g (k) (X)) 2 ] <. Also, for a fixed integer n {1,2,...} we shall denote by B n (X) the class of functions g : (α, ω) R satisfying the following properties: B 1 : Var[g(X)] <. B 2 : For each k {0,1,...,n 1}, g (k) (with g (0) = g) is an absolutely continuous function with derivative g (k+1). B 3 : For each k {0,1,...,n}, E[q k (X) g (k) (X) ] <. Clearly, if E X 2n < then H n (X) B n (X) (note that H 2 (with k = 0) yields B 1 ), since by the C-S inequality, E 2 [q k (X) g (k) (X) ] E[q k (X)] E[q k (X)(g (k) (X)) 2 ]. Consider now any p functions g 1,...,g p H n (X) and set g = (g 1,g 2,...,g p ) t. Then, the following p p matrices H k = H k (g) are well-defined for k = 1,2,...,n: H k = (h ij;k ), where h ij;k := E[q k (X)g (k) i (X)g (k) j (X)], i,j = 1,2,...,p. (3) 2

Similarly, for any functions g 1,...,g p B n (X), the following p p matrices B k = B k (g) are well-defined for k = 1,2,...,n: B k = (b ij;k ), where b ij;k := E[q k (X)g (k) i (X)] E[q k (X)g (k) j (X)], i,j = 1,2,...,p. (4) Our first result concerns a class of Poincaré-type matrix variance bounds, as follows: Theorem 1. Let X IP(µ;δ,β,γ)andassume that E X 2n < for some fixed integern {1,2,...}. Letg 1,...,g p bearbitraryfunctionsinh n (X)anddenotebyD = D(g) the dispersion matrix of the random vector g = g(x) = (g 1 (X),...,g p (X)) t. Also, denote by S n = S n (g) the p p matrix S n = n ( 1) k 1 k! k 1 j=0 (1 jδ) H k, where the matrices H k, k = 1,2,...,n, are defined by (3). Then, the matrix A n = ( 1) n (D S n ) is nonnegative definite. Moreover, A n is positive definite unless there exist constants c 1,...,c p R, not all zero, such that the function c 1 g 1 (x) + + c p g p (x) is a polynomial (in x) of degree at most n. Proof: Fix c = (c 1,...,c p ) t R p and define the function hc(x) = c t g(x) = c 1 g 1 (x)+ +c p g p (x). Since g 1,...,g p H n (X) we see that the function hc belongs to H n (X)and, in particular, hc(x) has finite varianceand E[q k (X)(h (k) c (X))2 ] <, k = 1,2...,n. Thus, we can make use of the inequality (see [1], [5]; c.f. [4], [7]) ( 1) n [Varhc(X) S n ] 0, where S n = n ( 1) k 1 k! (X)(h (k) k 1 c (X))2 ], j=0 (1 jδ)e[qk in which the equality holds if and only if hc is a polynomial of degree at most n. It is well-known that Varhc(X) = c t Dc and it is easily seen that E[q k (X)(h (k) c (X))2 ] = c t H k c with H k (k = 1,...,n) as in (3). Thus, S n = c t S n c and the preceding inequality takes the form c t [( 1) n (D S n )]c 0. Sincec R p isarbitraryitfollowsthatthematrix( 1) n (D S n )isnonnegativedefinite. Clearly the inequality is strict for all c R p for which hc / spam[1,x,...,x n ]. Remark 1. (a) Olkin and Shepp s (2005) matrix inequalities are particular cases of Theorem 1 for n = 1 and with X being a standard Normal or a Gamma r.v. For 3

example, when n = 1 and X = Z N(0,1) IP(0;0,0,1) then S 1 = H 1 = H = (E[g i (Z)g j (Z)]) p p and we get the inequality D H (in the Loewner ordering). Moreover, for X = Z and n = 2 or 3, Theorem 1 yields the new matrix variance bounds H 1H 2 2 D and D H 1H 2 2+ 1H 6 3 where H 2 = (E[g i (Z)g j (Z)]) p p and H 3 = (E[g i (Z)g j (Z)]) p p. (b) Theorem 1 applies to Beta r.v. s. In particular, when n = 1 and X Beta(a,b) (with p.d.f. f(x) x a 1 (1 x) b 1, 0 < x < 1, and parameters a,b > 0) then q(x) = x(1 x)/(a+b). Theorem 1 yields the inequality D H (in the Loewner ordering)whereh = 1 a+b (E[X(1 X)g i(x)g j(x)]) p p. ThiscompareswithTheorem 4.1 in Prakasa Rao (2006); c.f. Wei and Zhang (2009), Remark 1. Next we show some similar Bessel-type matrix variance bounds. The particular case of a Beta r.v. is covered by Theorem 1 of Wei and Zhang (2009). Theorem 2. Let X IP(µ;δ,β,γ)andassume that E X 2n < for some fixed integer n {1,2,...}. Let g 1,...,g p be arbitraryfunctions in B n (X) anddenote by D = D(g) the dispersion matrix of the random vector g = g(x) = (g 1 (X),...,g p (X)) t. Also, denote by L n = L n (g) the p p matrix L n = n 1 k! E[q k (X)] 2k 2 j=k 1 (1 jδ) B k, where the matrices B k, k = 1,2,...,n, are defined by (4). Then, L n D in the Loewner ordering. Moreover, D L n is positive definite unless there exist constants c 1,...,c p R, not all zero, such that the function c 1 g 1 (x)+ +c p g p (x) is a polynomial (in x) of degree at most n. Proof: Fix c = (c 1,...,c p ) t R p and, as in the previous proof, define the function hc(x) = c t g(x) = c 1 g 1 (x)+ +c p g p (x). Since g 1,...,g p B n (X) we see that the function hc belongs to B n (X). In particular, hc(x) has finite variance and E[q k (X) h (k) c (X) ] <, k = 1,2...,n. Thus, we can apply the inequality (see [2]) Varhc(X) L n, where L n = n E 2 [q k (X)h (k) c (X)] k! E[q k (X)] 2k 2 j=k 1 (1 jδ), in which the equality holds if and only if hc is a polynomial of degree at most n. Observe that Varhc(X) = c t Dc and E 2 [q k (X) h (k) c (X) ] = ct B k c with B k (k = 1,...,n) as in (4). Thus, L n = c t L n c and the preceding inequality takes the form c t [D L n ]c 0. Since c R p is arbitrary it follows that the matrix D L n is nonnegative definite. Clearly the inequality is strict for all c R p for which hc / spam[1,x,...,x n ]. 4

2.2 Discrete Case In this subsection we shall make use of the following notations. Assume that X CO(µ;δ,β,γ). It is known (see [1], [2]) that, under (2), the support J = J(X) = {k Z : p(k) > 0} is a (finite of infinite) interval of integers, say J(X) = {α,α + 1,...,ω 1,ω}. Write q(x) = δx 2 + βx + γ for the quadratic of X and let q [k] (x) = q(x)q(x + 1) q(x + k 1) for k = 1,2,... (with q [0] (x) 1, q [1] (x) q(x)). For any function g : Z R we shall denote by k [g(x)] its k-th forward difference, i.e., k [g(x)] = [ k 1 [g(x)]], k = 1,2,..., with [g(x)] = g(x+1) g(x) and 0 [g(x)] g(x). For a fixed integer n {1,2,...} we shall denote by Hd n (X) the class of functions g : J(X) R satisfying the following property: HD 1 : For each k {0,1,...,n}, E[q [k] (X)( k [g(x)]) 2 ] <. Also, for a fixed integer n {1,2,...} we shall denote by Bd n (X) the class of functions g : J(X) R satisfying the following properties: BD 1 : Var[g(X)] <. BD 2 : For each k {0,1,...,n}, E[q k (X) k [g(x)] ] <. Clearly, if E X 2n < then H n d (X) Bn d (X) (note that HD 1 (with k = 0) yields BD 1 ). Indeed, since P[q [k] (X) 0] = 1, the C-S inequality implies that E 2 [q [k] (X) k [g(x)] ] E[q [k] (X)] E[q [k] (X)( k [g(x)]) 2 ]. Consider now any p functions g 1,...,g p H n d (X) and set g = (g 1,g 2,...,g p ) t. Then, the following p p matrices H k = H k (g) are well-defined for k = 1,2,...,n: H k = (h ij;k ), where h ij;k := E[q [k] (X) k [g i (X)] k [g j (X)]], i,j = 1,2,...,p. (5) Similarly, for any functions g 1,...,g p B n d (X), the following p p matrices B k = B k (g) are well-defined for k = 1,2,...,n: B k = (b ij;k ), where b ij;k := E[q [k] (X) k [g i (X)]] E[q [k] (X) k [g j (X)]], i,j = 1,2,...,p. (6) The matrix variance inequalities for the discrete case are summarized in the following theorem; its proof, being the same as in the continuous case, is omitted. Theorem 3. Let X CO(µ;δ,β,γ) and assume that E X 2n < for some fixed integer n {1,2,...}. (a) Let g 1,...,g p be arbitrary functions in Hd n (X) and denote by D = D(g) the dispersion matrix of the random vector g = g(x) = (g 1 (X),...,g p (X)) t. Also, denote by S n = S n (g) the p p matrix S n = n ( 1) k 1 k! k 1 j=0 (1 jδ) H k, 5

where the matrices H k, k = 1,2,...,n, are defined by (5). Then, the matrix A n = ( 1) n (D S n ) is nonnegative definite. Moreover, A n is positive definite unless there exist constants c 1,...,c p R, not all zero, and a polynomial P n : R R, of degree at most n, such that P[c 1 g 1 (X)+ +c p g p (X) = P n (X)] = 1. (b) Let g 1,...,g p be arbitrary functions in Bd n (X) and denote by D = D(g) the dispersion matrix of the random vector g = g(x) = (g 1 (X),...,g p (X)) t. Also, denote by L n = L n (g) the p p matrix L n = n 1 k! E[q [k] (X)] 2k 2 j=k 1 (1 jδ) B k, where the matrices B k, k = 1,2,...,n, are defined by (6). Then, L n D in the Loewner ordering. Moreover, D L n is positive definite unless there exist constants c 1,...,c p R, not all zero, and a polynomial P n : R R, of degree at most n, such that P[c 1 g 1 (X)+ +c p g p (X) = P n (X)] = 1. [Itshouldbenotedthatthek-thterminthesumdefiningthematrixS n orthematrix L n, above, should be treated as the null matrix, 0 p p, whenever E[q [k] (X)] = 0.] As an example consider the case where X Poisson(λ) with p.m.f. p(k) = e λ λ k /k!, k = 0,1,... (λ > 0). Then X CO(λ;0,0,λ) so that q(x) λ. It follows that H k = λ k (E[ k [g i (X)] k [g j (X)]]) p p and B k = λ 2k (E[ k [g i (X)]] E[ k [g j (X)]]) p p. Thus, for n = 1 and p = 2 Theorem 3(a) yields the matrix inequality ( Var[g1 ] Cov[g 1,g 2 ] Cov[g 1,g 2 ] Var[g 2 ] ) ( E[( [g1 ]) λ 2 ] E[ [g 1 ] [g 2 ]] E[ [g 1 ] [g 2 ]] E[( [g 2 ]) 2 ] References [1] Afendras, G., Papadatos, N. and Papathanasiou, V. (2007). The discrete Mohr and Noll Inequality with applications to variance bounds. Sankhyā, 69, 162 189. [2] Afendras, G., Papadatos, N. and Papathanasiou, V. (2011). An extended Stein-type covariance identity for the Pearson family, with applications to lower variance bounds. Bernoulli (in press). [3] Chernoff, H. (1981). A note on an inequality involving the normal distribution. Ann. Probab., 9, 533 535. [4] Houdre, C. and Kagan, A. (1995). Variance inequalities for functions of Gaussian variables. J. Theoret. Probab., 8, 23 30. [5] Johnson, R.W. (1993). A note on variance bounds for a function of a Pearson variate. Statist. Decisions, 11, 273 278. 6 ).

[6] Olkin, I. and Shepp, L. (2005). A matrix variance inequality. J. Statist. Plann. Inference, 130, 351 358. [7] Papathanasiou, V. (1988). Variance bounds by a generalization of the Cauchy-Schwarz inequality. Statist. Probab. Lett., 7, 29 33. [8] Prakasa Rao, B.L.S. (2006). Matrix variance inequalities for multivariate distributions. Statistical Methodology, 3, 416 430. [9] Wei, Z. and Zhang, X. (2009). Covariance matrix inequalities for functions of Beta random variables. Statist. Probab. Lett., 79, 873 879. 7