Uniform Correlation Mixture of Bivariate Normal Distributions and Hypercubically-contoured Densities That Are Marginally Normal


Kai Zhang
Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, 27599
email: zhangk@email.unc.edu

Lawrence D. Brown
Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, 19104
email: lbrown@wharton.upenn.edu

Andreas Buja
Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, 19104
email: buja.at.wharton@gmail.com

Edward George
Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, 19104
email: edgeorge@wharton.upenn.edu

Linda Zhao
Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, 19104
email: lzhao@wharton.upenn.edu

February 6, 2014

Author's Footnote: Kai Zhang is Assistant Professor, Department of Statistics and Operations Research, the University of North Carolina, Chapel Hill (E-mail: zhangk@email.unc.edu). Lawrence D. Brown, Andreas Buja, Edward George, and Linda Zhao are Professors, Department of Statistics, The Wharton School, the University of Pennsylvania. We appreciate helpful comments from James Berger, Shane Jensen, Abba Krieger, J. Steve Marron, Henry McKean, Xiao-Li Meng, J. Michael Steele, and Cun-Hui Zhang.

Abstract

The bivariate normal density with unit variances and correlation ρ is well known. We show that integrating ρ out of this density yields a function of the maximum norm. The Bayesian interpretation of this result is that if we put a uniform prior over ρ, then the marginal bivariate density depends only on the maximal magnitude of the variables. The resulting marginal bivariate density, whose isodensity contours are squares, can thus be regarded as the equally weighted mixture of bivariate normal distributions over all possible correlation coefficients. This density links to the Khintchine mixture method of generating random variables. We use this method to construct higher-dimensional generalizations of this distribution. We further show that for each dimension there is a unique multivariate density that is a differentiable function of the maximum norm and is marginally normal, and that the bivariate density obtained from the integral over ρ is its special case in two dimensions.

Keywords: Bivariate Normal Mixture, Khintchine Mixture, Uniform Prior over Correlation

1. INTRODUCTION

It is well known that a multivariate distribution that has normal marginal distributions is not necessarily jointly multivariate normal (in fact, not even when the distribution is conditionally normal; see Gelman and Meng (1991)). That is, a p-dimensional random vector X = (X_1, ..., X_p) that has marginal standard normal densities φ(x_1), ..., φ(x_p) and marginal distribution functions Φ(x_1), ..., Φ(x_p) may not have a jointly multivariate normal density

    f(x_1, ..., x_p | Σ) = (2π)^{−p/2} |Σ|^{−1/2} exp{−x^T Σ^{−1} x / 2}    (1.1)

for some p × p correlation matrix Σ. Classical examples of such distributions can be found in Feller (1971) and Kotz et al. (2004). In this paper we focus on one particular class of such distributions that arises from uniform mixtures of bivariate normal densities over the correlation. When p = 2, the bivariate normal density is

    f(x_1, x_2 | ρ) = 1/(2π√(1 − ρ²)) · exp{−(x_1² + x_2² − 2ρx_1x_2)/(2(1 − ρ²))}    (1.2)

for some −1 ≤ ρ ≤ 1. By a uniform correlation mixture of the bivariate normal density we mean the bivariate density function

    f(x_1, x_2) = (1/2) ∫_{−1}^{1} f(x_1, x_2 | ρ) dρ.    (1.3)

This type of continuous mixture of bivariate normal distributions has been used in applications such as imaging analysis (Aylward and Pizer (1997)). We show that such a uniform correlation mixture results in a bivariate density that depends only on the maximal magnitude of the two variables:

    f(x_1, x_2) = (1 − Φ(‖x‖_∞))/2    (1.4)

where Φ(·) is the cdf of the standard normal distribution and ‖x‖_∞ = max{|x_1|, |x_2|}. This bivariate density has a natural Bayesian interpretation: it can be regarded as the marginal density of (X_1, X_2) under a uniform prior over the correlation ρ (this type of density is referred to as the marginal predictive distribution in the Bayesian literature). Moreover, one interesting feature of this density is that its isodensity contours consist of concentric squares.
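As a quick numerical sanity check of (1.3) and (1.4) (our addition, not part of the original paper), the mixture integral can be evaluated by quadrature. The sketch below assumes numpy and scipy are available; the function names f_cond and f_mix are illustrative only.

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    def f_cond(x1, x2, rho):
        # Bivariate normal density (1.2) with unit variances and correlation rho.
        q = (x1**2 + x2**2 - 2*rho*x1*x2) / (2*(1 - rho**2))
        return np.exp(-q) / (2*np.pi*np.sqrt(1 - rho**2))

    def f_mix(x1, x2):
        # Uniform correlation mixture (1.3): average of (1.2) over rho in [-1, 1].
        val, _ = quad(lambda r: f_cond(x1, x2, r), -1, 1)
        return val / 2

    x1, x2 = 0.7, -1.3
    print(f_mix(x1, x2))                              # quadrature value of (1.3)
    print((1 - norm.cdf(max(abs(x1), abs(x2)))) / 2)  # closed form (1.4)

The two printed values agree to quadrature accuracy.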

Although we were not able to find the above result in the literature, we note that the bivariate density f(x_1, x_2) = (1 − Φ(‖x‖_∞))/2 was first obtained in a different manner by Bryson and Johnson (1982). In that paper, the authors consider constructing multivariate distributions through the Khintchine mixture method (Khintchine (1938)), and the bivariate density f(x_1, x_2) is listed as an example of their construction. But the link between this density and the uniform mixture over correlations is not addressed. Through the Khintchine mixture approach, we show that the resulting mixed density is a function of ‖x‖_∞. Moreover, we show that for each p, this resulting density is the unique multivariate density that is a differentiable function of ‖x‖_∞ and is marginally normal. It thus becomes interesting to investigate the connection between the Khintchine mixture and the uniform mixture over correlation matrices.

2. THE UNIFORM CORRELATION MIXTURE INTEGRAL

Our first main result is the following theorem:

Theorem 2.1.

    f(x_1, x_2) = (1/2) ∫_{−1}^{1} f(x_1, x_2 | ρ) dρ = (1 − Φ(‖x‖_∞))/2.    (2.1)

The proof can be found in Appendix A. Note that f(x_1, x_2) is a proper bivariate density, and it is marginally standard normal:

    ∫_R f(x_1, x_2) dx_2
      = ∫_{|x_2| ≤ |x_1|} (1 − Φ(|x_1|))/2 dx_2 + 2 ∫_{|x_1|}^{∞} (1 − Φ(x_2))/2 dx_2
      = |x_1|(1 − Φ(|x_1|)) + ∫_{|x_1|}^{∞} (1 − Φ(x_2)) dx_2
      = |x_1|(1 − Φ(|x_1|)) − |x_1|(1 − Φ(|x_1|)) + ∫_{|x_1|}^{∞} x_2 · (1/√(2π)) e^{−x_2²/2} dx_2
      = (1/√(2π)) e^{−x_1²/2} = φ(x_1),    (2.2)

where the third equality follows from integration by parts. The form of this bivariate density implies that its isodensity contours consist of concentric squares. Thus, an intuitive interpretation of this result is that if we average the isodensity contours of bivariate normal distributions, which are concentric ellipses, we obtain isodensity contours that are concentric squares. The plot of f(x_1, x_2) is given in Figure 1.
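The marginal computation (2.2) can be verified numerically in the same spirit; the short sketch below (ours, assuming scipy) integrates (1.4) over x_2 and compares the result with φ(x_1).

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    def f(x1, x2):
        # The mixture density (1.4): (1 - Phi(max(|x1|, |x2|))) / 2.
        return (1 - norm.cdf(max(abs(x1), abs(x2)))) / 2

    for x1 in (0.0, 0.5, 1.7):
        marginal, _ = quad(lambda x2: f(x1, x2), -np.inf, np.inf)
        print(x1, marginal, norm.pdf(x1))  # last two columns should agree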

Figure 1: The plot of the bivariate density function f(x_1, x_2) = (1 − Φ(‖x‖_∞))/2. Note that this bivariate density has square isodensity contours and is marginally normal.

This result also indicates that under a uniform prior over the correlation ρ, the resulting marginal density depends only on the maximal magnitude of the two variables. The application of this result to Bayesian inference needs to be further investigated, but this marginal density does immediately lead to the Bayes factor for testing ρ = 0 in this very special situation. Moreover, the uniform prior has been used to model covariance matrices (see Barnard et al. (2000) for theory and applications to shrinkage estimation). We shall report these types of applications in future work.

3. CONNECTION TO THE KHINTCHINE MIXTURE AND HYPERCUBICALLY CONTOURED DISTRIBUTIONS THAT ARE MARGINALLY NORMAL

In Bryson and Johnson (1982), the authors developed a method of generating multivariate distributions through Khintchine's theorem. Khintchine's theorem states that a univariate continuous random variable X has a single mode at zero if and only if it can be expressed as the product X = YU, where Y and U are independent continuous variables and U has a uniform distribution over [0, 1]. This result and its extensions can be used to construct multivariate distributions that have specified marginal distributions. As an example of such a distribution with standard normal marginal distributions, the authors consider the construction with mutually independent U_i ~ Uniform[−1, 1], i = 1, 2, and Y ~ χ_3 (so that Y > 0 and Y² ~ χ²_3). The random variables X_1 and X_2 are then generated as X_1 = YU_1 and X_2 = YU_2. The authors show that with this construction from Y and the U_i's, the density of (X_1, X_2) is exactly f(x_1, x_2) = (1 − Φ(‖x‖_∞))/2.

This density can also be generalized to higher dimensions through Khintchine's method. In fact, for any p, one generates U_i ~ Uniform[−1, 1], i = 1, ..., p, and Y ~ χ_3, and considers X_i = YU_i, i = 1, ..., p. Then each X_i is standard normally distributed. By using a (p + 1)-dimensional transformation with X_i = YU_i and W = Y and then integrating out W, we derive the joint density of X = (X_1, ..., X_p) as

    f_p(x_1, ..., x_p) = 2^{−p} √(2/π) ∫_{‖x‖_∞}^{∞} y^{2−p} e^{−y²/2} dy.    (3.1)

Note that f_p(0, ..., 0) = ∞ for p ≥ 3. Nevertheless, f_p is a proper p-dimensional density for every p, and it has standard normal marginal distributions. Since f_p is a function of ‖x‖_∞, the isodensity contours of f_p consist of concentric hypercubes, which generalizes f_2(x_1, x_2) = (1 − Φ(‖x‖_∞))/2. We further show below that f_p is the only density that possesses this property of being hypercubically contoured and marginally normal.

Proposition 3.1. Consider a p-dimensional density that is a function of ‖x‖_∞, i.e., g_p(x_1, ..., x_p) = h_p(‖x‖_∞) for some differentiable function h_p : R_+ → R_+. If g_p(x_1, ..., x_p) has standard normal marginal densities, then the unique expression of g_p(x_1, ..., x_p) is

    g_p(x_1, ..., x_p) = 2^{−p} √(2/π) ∫_{‖x‖_∞}^{∞} y^{2−p} e^{−y²/2} dy.    (3.2)

The proof can be found in Appendix B.
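The Khintchine construction above is straightforward to simulate. The following Monte Carlo sketch (our illustration, assuming numpy and scipy) draws X_i = YU_i with Y ~ χ_3 and U_i ~ Uniform[−1, 1], and checks the claimed marginal normality with Kolmogorov–Smirnov tests.

    import numpy as np
    from scipy.stats import chi, kstest

    rng = np.random.default_rng(0)
    p, n = 3, 100_000
    Y = chi(3).rvs(size=n, random_state=rng)  # Y > 0 with Y^2 ~ chi-squared_3
    U = rng.uniform(-1, 1, size=(n, p))       # mutually independent U_i
    X = Y[:, None] * U                        # X_i = Y * U_i, i = 1, ..., p

    for i in range(p):
        print(kstest(X[:, i], 'norm'))        # large p-values expected

The same samples can also be binned to visualize the concentric hypercubic contours of f_p.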

4. DISCUSSION: EQUIVALENCE BETWEEN THE UNIFORM CORRELATION MIXTURE AND THE KHINTCHINE MIXTURE IN HIGHER DIMENSIONS?

In this paper, we show the equivalence of three bivariate densities: the uniform correlation mixture of bivariate normal densities, the unique square-contoured bivariate density with normal marginals, and the joint density of the Khintchine mixture of χ_3 densities. We also show the equivalence of hypercubically-contoured densities and the Khintchine mixture of χ_3 densities in higher dimensions. It thus becomes interesting to investigate whether the uniform correlation mixture of bivariate normal densities is equivalent to them in higher dimensions. Intuitively, this equivalence should carry over, as we would just be averaging the ellipsoid-shaped contours of multivariate normal densities instead of elliptical ones. However, directly integrating the multivariate normal density over the uniform measure over positive definite correlation matrices is not a transparent task. Furthermore, if this relationship holds for normal distributions in higher dimensions, it is curious whether the equivalence also holds for other distributions. We will investigate these problems in future work.

The fact that f_p(0, ..., 0) < ∞ only for p ≤ 2 is also interesting. Due to the relationship between the normal distribution and random walks, it would be interesting to see whether the finiteness of f_p(0, ..., 0) connects with the recurrence or transience of random walks. Due to the natural Bayesian interpretation of this mixture, we will also investigate its Bayesian applications.

A. PROOF OF THEOREM 2.1

The derivation of f(x_1, x_2) consists of the following steps:

1. To show f(0, 0) = 1/4 = (1 − Φ(0))/2.

2. To show that if x_1 and x_2 are not both 0, then the function g(ρ) = (x_1² + x_2² − 2ρx_1x_2)/(2(1 − ρ²)) is monotone decreasing for ρ ∈ (−1, a] and monotone increasing for ρ ∈ [a, 1), where a = a(x_1, x_2) = sgn(x_1x_2) · min{|x_1|, |x_2|}/max{|x_1|, |x_2|}.

3. To treat I_1 = (1/2) ∫_{−1}^{a} f(x_1, x_2 | ρ) dρ and I_2 = (1/2) ∫_{a}^{1} f(x_1, x_2 | ρ) dρ separately, and to show that I_1 + I_2 = (1 − Φ(‖x‖_∞))/2.

Step 1. For x_1 = x_2 = 0,

    f(0, 0) = (1/(4π)) ∫_{−1}^{1} 1/√(1 − ρ²) dρ = (1/(4π)) [arcsin ρ]_{−1}^{1} = 1/4 = (1 − Φ(0))/2.    (A.1)

Step 2. If x_1 and x_2 are not both 0, consider the exponent in f(x_1, x_2 | ρ),

    g(ρ) = (x_1² + x_2² − 2ρx_1x_2)/(2(1 − ρ²)).

After some algebra, it can be shown that

    d/dρ g(ρ) = −(ρx_1 − x_2)(ρx_2 − x_1)/(1 − ρ²)²    (A.2)

and

    d²/dρ² g(ρ) = [(1 + 3ρ²)x_1² − 2ρ(ρ² + 3)x_1x_2 + (1 + 3ρ²)x_2²]/(1 − ρ²)³ ≥ 0.    (A.3)

Therefore, the minimum of g(ρ) is attained at a = a(x_1, x_2) = sgn(x_1x_2) · min{|x_1|, |x_2|}/max{|x_1|, |x_2|}. For example, if x_1 > x_2 ≥ 0, then a = x_2/x_1. We should also note that the minimum value of g(ρ) is

    g(a) = ‖x‖²_∞/2.    (A.4)
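The algebra behind (A.2) and (A.3) is mechanical and can be confirmed symbolically; the check below (ours, assuming sympy) should print 0 twice.

    import sympy as sp

    rho, x1, x2 = sp.symbols('rho x1 x2', real=True)
    g = (x1**2 + x2**2 - 2*rho*x1*x2) / (2*(1 - rho**2))

    dg_claim = -(rho*x1 - x2)*(rho*x2 - x1) / (1 - rho**2)**2
    d2g_claim = ((1 + 3*rho**2)*x1**2 - 2*rho*(rho**2 + 3)*x1*x2
                 + (1 + 3*rho**2)*x2**2) / (1 - rho**2)**3

    print(sp.simplify(sp.diff(g, rho) - dg_claim))      # 0
    print(sp.simplify(sp.diff(g, rho, 2) - d2g_claim))  # 0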

Step 3. Without loss of generality, we consider the case x_1 > x_2 ≥ 0. We split the integral into two pieces:

    (1/2) ∫_{−1}^{1} f(x_1, x_2 | ρ) dρ = (1/2) ∫_{−1}^{a} f(x_1, x_2 | ρ) dρ + (1/2) ∫_{a}^{1} f(x_1, x_2 | ρ) dρ = I_1 + I_2.    (A.5)

We start with

    I_2 = ∫_{a}^{1} 1/(4π√(1 − ρ²)) · exp{−(x_1² + x_2² − 2ρx_1x_2)/(2(1 − ρ²))} dρ.    (A.6)

For ρ ∈ [a, 1), g(ρ) is monotone increasing in ρ. Therefore, we consider the transformation y = √(2g(ρ)). Solving the quadratic equation g(ρ) = y²/2 for ρ yields

    ρ = g^{−1}(y²/2) = [x_1x_2 + √((y² − x_1²)(y² − x_2²))]/y².    (A.7)

Denote Δ(y) = (y² − x_1²)(y² − x_2²). Note that

    d/dy Δ(y) = 4y³ − 2(x_1² + x_2²)y.    (A.8)

Thus, by differentiating (A.7) in y, we obtain

    dρ = [(x_1² + x_2²)y² − 2x_1²x_2² − 2x_1x_2√Δ(y)]/(y³√Δ(y)) dy.    (A.9)

Moreover, by (A.7),

    1 − ρ² = [(x_1² + x_2²)y² − 2x_1²x_2² − 2x_1x_2√Δ(y)]/y⁴.    (A.10)

By applying the last two equations to I_2, we have

    I_2 = ∫_{x_1}^{∞} √((x_1² + x_2²)y² − 2x_1²x_2² − 2x_1x_2√Δ(y)) / (4πy√Δ(y)) · e^{−y²/2} dy.    (A.11)

A similar argument for I_1 yields

    I_1 = ∫_{x_1}^{∞} √((x_1² + x_2²)y² − 2x_1²x_2² + 2x_1x_2√Δ(y)) / (4πy√Δ(y)) · e^{−y²/2} dy.    (A.12)

We show next that

    √((x_1² + x_2²)y² − 2x_1²x_2² + 2x_1x_2√Δ(y)) + √((x_1² + x_2²)y² − 2x_1²x_2² − 2x_1x_2√Δ(y)) = 2x_1√(y² − x_2²).    (A.13)

This is seen by noting that the square of the left-hand side is

    2[(x_1² + x_2²)y² − 2x_1²x_2²] + 2√([(x_1² + x_2²)y² − 2x_1²x_2²]² − 4x_1²x_2²Δ(y))
      = 2[(x_1² + x_2²)y² − 2x_1²x_2²] + 2y²(x_1² − x_2²)
      = 4x_1²y² − 4x_1²x_2² = 4x_1²(y² − x_2²).    (A.14)

Now the integral is simplified as

    (1/2) ∫_{−1}^{1} f(x_1, x_2 | ρ) dρ = I_1 + I_2 = ∫_{x_1}^{∞} 2x_1/(4πy√(y² − x_1²)) · e^{−y²/2} dy = ∫_{1}^{∞} 1/(2πz√(z² − 1)) · e^{−x_1²z²/2} dz,    (A.15)

where z = y/x_1. By classical Laplace transform results in Erdélyi et al. (1954), the last integral is found to be

    ∫_{1}^{∞} 1/(2πz√(z² − 1)) · e^{−x_1²z²/2} dz = (1 − Φ(x_1))/2 = (1 − Φ(‖x‖_∞))/2.    (A.16)
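The Laplace-transform identity (A.16) is also easy to check numerically; in the sketch below (ours, assuming scipy), quad handles the integrable singularity at z = 1.

    import numpy as np
    from scipy.integrate import quad
    from scipy.stats import norm

    x1 = 1.2
    integrand = lambda z: np.exp(-x1**2*z**2/2) / (2*np.pi*z*np.sqrt(z**2 - 1))
    val, _ = quad(integrand, 1, np.inf)
    print(val, (1 - norm.cdf(x1)) / 2)  # should agree to quadrature accuracy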

B. PROOF OF PROPOSITION 3.1

We shall prove the proposition by induction. For p = 1, ‖x‖_∞ = |x_1|, and the marginal density is the density of x_1 itself. Thus g_1(x_1) = φ(x_1) is the unique density that is marginally normal and is a function of ‖x‖_∞ = |x_1|. This is the only case where the marginal normality is needed.

Suppose (3.2) is true for 1, ..., p. For p + 1, write x̃ = (x_1, ..., x_p). Suppose g_{p+1}(x) = h_{p+1}(‖x‖_∞) for some differentiable function h_{p+1} : R_+ → R_+, and that g_{p+1} has standard normal marginal distributions. Note that if we integrate out x_{p+1}, i.e.,

    ∫_R g_{p+1}(x) dx_{p+1} = 2‖x̃‖_∞ h_{p+1}(‖x̃‖_∞) + 2 ∫_{‖x̃‖_∞}^{∞} h_{p+1}(x_{p+1}) dx_{p+1},    (B.1)

then the resulting density on the right-hand side is a p-dimensional joint density that depends only on ‖x̃‖_∞. By the uniqueness from the induction hypothesis,

    2‖x̃‖_∞ h_{p+1}(‖x̃‖_∞) + 2 ∫_{‖x̃‖_∞}^{∞} h_{p+1}(x_{p+1}) dx_{p+1} = 2^{−p} √(2/π) ∫_{‖x̃‖_∞}^{∞} z^{2−p} e^{−z²/2} dz.    (B.2)

Now consider this as an equation in a positive t:

    2t h_{p+1}(t) + 2 ∫_{t}^{∞} h_{p+1}(u) du = 2^{−p} √(2/π) ∫_{t}^{∞} z^{2−p} e^{−z²/2} dz.    (B.3)

Differentiating in t on both sides (the h_{p+1}(t) terms on the left cancel), we get

    h'_{p+1}(t) = −2^{−(p+1)} √(2/π) t^{1−p} e^{−t²/2}.    (B.4)

Thus,

    h_{p+1}(y) = 2^{−(p+1)} √(2/π) ∫_{y}^{∞} t^{2−(p+1)} e^{−t²/2} dt + c    (B.5)

for some constant c. Since lim_{y→∞} h_{p+1}(y) = 0, we have c = 0. Thus the induction proof is complete.

REFERENCES

Aylward, S. and Pizer, S. (1997), "Continuous Gaussian Mixture Modeling," in Information Processing in Medical Imaging, eds. Duncan, J. and Gindi, G., Springer Berlin Heidelberg, vol. 1230, pp. 176–189.

Barnard, J., McCulloch, R., and Meng, X.-L. (2000), "Modeling Covariance Matrices in Terms of Standard Deviations and Correlations, with Application to Shrinkage," Statistica Sinica, 10, 1281–1311.

Bryson, M. C. and Johnson, M. E. (1982), "Constructing and Simulating Multivariate Distributions Using Khintchine's Theorem," Journal of Statistical Computation and Simulation, 16, 129–137.

Erdélyi, A., Magnus, W., Oberhettinger, F., and Tricomi, F. G. (1954), Tables of Integral Transforms, Vols. 1 and 2, McGraw-Hill, New York.

Feller, W. (1971), An Introduction to Probability Theory and its Applications, Vol. 2, Wiley, 2nd ed.

Gelman, A. and Meng, X.-L. (1991), "A Note on Bivariate Distributions That Are Conditionally Normal," The American Statistician, 45, 125–126.

Khintchine, A. Y. (1938), "On Unimodal Distributions," Izv. Nauchno-Issled. Inst. Mat. Mekh. Tomsk. Gos. Univ., 2, 1–7.

Kotz, S., Balakrishnan, N., and Johnson, N. L. (2004), Continuous Multivariate Distributions, Vol. 1: Models and Applications, Wiley, 2nd ed.