Assessing Multivariate Normality using Normalized Hermite Moments


BIWI-TR-5, May 1999

Assessing Multivariate Normality using Normalized Hermite Moments

Christian Stoecklin, Christian Brechbühler and Gábor Székely
Swiss Federal Institute of Technology, ETH Zentrum
Communication Technology Lab, Image Science
CH-8092 Zürich, Switzerland
{cstoeckl brech

Summary

Classical multivariate data analysis has been based largely on the assumption of a multivariate normal distribution. Therefore almost all of the work has concentrated on location and dispersion parameters, with relatively little attention to higher-order characteristics. For characterizing a multivariate distribution, its moments are very expressive. This fact was used to develop a method for testing multivariate normality given an i.i.d. sample. Most of the known procedures try to detect joint nonnormality by studying the marginal normality of an observation set. Tests based on the moments of a sample, by contrast, measure joint normality directly, ignoring marginal effects. The method introduced in this work shows that normalized Hermite moments can describe a multivariate distribution in a way that discriminates it from the normal case better than another often-used method (the nearest distance method). To illustrate the capabilities of the normalized moments, several artificial distributions with marginal normality were tested with both methods and directly compared.

Contents

Part I: Theory
1 Multivariate Normal Distribution
  1.1 Basic Definitions
  1.2 The Moments of a Distribution
  1.3 Hermite Polynomials
2 Related Work
  2.1 Probability Plot
  2.2 Nearest Distance
3 Normalized Moment Method
  3.1 One Dimension
  3.2 Multidimensional
  3.3 Distribution of the Test Value T_{p,m}
4 Reconstruction using Normalized Moments

Part II: Measurements
5 Test Distributions
  5.1 Test Distributions with Marginal Normal Distribution
  5.2 Other Test Distribution
6 Results
7 Conclusions

Appendix
A Miscellaneous
  A.1 Sphericized Residuals
  A.2 Distribution of the Nearest Distances d
  A.3 Equivalence of the d_min
  A.4 The Definite Integral I(n)
  A.5 Variance of the Scatter Distribution
  A.6 Examples of Reconstructions

References

Introduction

Statistical distributions play a useful part in modeling data. One reason for the interest in using appropriate distributions for modeling data is the feasibility of obtaining parsimonious representations of data in terms of the parameters of such distributions. The assumption of multivariate normality underlies much of the standard classical multivariate statistical methodology, and the effects of deviations from normality on these methods are not easily or clearly understood. If a normally distributed variable, or a random sample from a normal distribution, has some property, it may be of interest to know whether such a property characterizes normality. Serious research in this area has been done since 1935, but most results date from 1950 or later [Patel and Read 1996].

It is useful to have procedures for assessing normality of a given multivariate sample. Such a check is helpful in guiding the subsequent analysis of the data; it may, for example, suggest a transformation of the data to make them as normally distributed as possible.

In the first part, some basic definitions of the most important concepts are given. They are used later to introduce two often-used methods and to develop a new method for measuring normality of a given sample. In the second part, the new method is directly compared to a standard method using several specially constructed distributions.

Part I: Theory

1 Multivariate Normal Distribution

There are various ways of introducing the multivariate normal distribution. In this description the definitions from [Krzanowski 1988] are used. It should be the simplest possible introduction that suffices to establish the most important properties for subsequent use.

1.1 Basic Definitions

If a random vector u = (u_1, ..., u_m)^T is distributed such that the u_i are all independent standard normal univariate random variables, then u is said to have a standard multivariate normal distribution. By definition the expected value E[u] is the zero vector and the covariance matrix Cov[u] is the (m x m) identity matrix I_m. The distribution of u is denoted by N_m(0, I_m), indicating an m-component multivariate normal distribution with mean vector 0 and covariance matrix I_m.

If A is any (p x m) matrix of rank m, and µ is any p-vector, then the random vector x = µ + Au is said to have a general multivariate normal distribution. The expected value of x is µ + A·0 = µ and the covariance matrix of x is A I_m A^T = AA^T. Thus if we write Σ = AA^T, then x has a p-component multivariate normal distribution with mean µ and covariance matrix Σ, denoted N_p(µ, Σ). If rank(A) < p, then Σ is not of full rank, and the distribution is degenerate in that all the probability lies in a subspace of dimension m. This situation can be handled, but it requires more advanced mathematics; in this study only the case m = p is used.

Probability density function of x: since u_1, ..., u_p are independent standard normal variates, their joint probability density function is

    f(u) = (2π)^{-p/2} exp(-(1/2) Σ_{i=1}^p u_i^2) = (2π)^{-p/2} exp(-(1/2) u^T u)    (1)

and for u transformed to x = µ + Au with distribution N(µ, Σ), the joint probability density function is

    f(x) = (2π)^{-p/2} |Σ|^{-1/2} exp(-(1/2) (x - µ)^T Σ^{-1} (x - µ))    (2)

The multivariate normal distribution has a number of simple properties that make its use as a model for observed data most attractive. In the following it is assumed that x ~ N(µ, Σ).
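The sphericization z = Σ^{-1/2}(x - µ) used throughout the report is straightforward to carry out on a sample by replacing µ and Σ with their estimates x̄ and S. A minimal sketch in Python (NumPy), using a symmetric inverse square root of S; the sample data are illustrative:

```python
import numpy as np

def sphericize(x):
    """Transform a sample x (n x p) to z = S^{-1/2} (x - mean), so that z has
    zero mean and identity sample covariance (a sketch of the sphericization)."""
    mu = x.mean(axis=0)
    S = np.cov(x, rowvar=False)
    # Symmetric inverse square root of S via an eigendecomposition
    vals, vecs = np.linalg.eigh(S)
    S_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return (x - mu) @ S_inv_sqrt

# Illustrative correlated normal sample
rng = np.random.default_rng(0)
A = np.array([[2.0, 0.5], [0.0, 1.0]])
x = rng.standard_normal((5000, 2)) @ A.T + np.array([1.0, -2.0])
z = sphericize(x)
# z now has (numerically) zero mean and identity sample covariance.
```

Because the same sample is used to estimate the mean and covariance, the sphericized points have exactly zero mean and identity sample covariance, up to floating-point error.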

Property 1: If y = Bx + C is any linear transformation of x ~ N(µ, Σ), where B is an (m x p) matrix of real numbers with m ≤ p and rank m, and C a translation vector, then y is distributed as

    y ~ N(Bµ + C, BΣB^T)

Property 2: The marginal distribution of any subset of the x_i has a multivariate normal distribution, whose mean vector and covariance matrix are obtained as the subvector of µ and submatrix of Σ appropriate to the chosen subset of x_i.

Property 3: Suppose that x is partitioned into two portions x_1 and x_2, where x_1 has q elements and x_2 has (p - q) elements. Then µ and Σ can be written in a conformable way:

    µ = (µ_1, µ_2)  and  Σ = ( Σ_11  Σ_12 ; Σ_21  Σ_22 )    (3)

With these few properties it can be shown that every random vector x can be sphericized [Gnanadesikan 1997].

Sphericized residuals: for every random vector x ~ N(µ, Σ) with mean vector µ and covariance matrix Σ the following equation is valid:

    z = Σ^{-1/2} (x - µ) ~ N(0, I_p)    (4)

(for a proof see Appendix A.1).

1.2 The Moments of a Distribution

In one dimension the moments of a distribution can be defined easily [Bronstein 1991]. Let x be a continuous random variable with density function f(x). Then the i-th moment of x (i = 0, 1, 2, ...) is defined as

    α_i = E[X^i] = ∫ x^i f(x) dx    (5)

if this integral converges.

Moments for multidimensional random variables x = (x_1, x_2, ..., x_p) can be defined similarly as in one dimension. The n-th moment is (for j_l = 1, 2, ..., p and l = 1, ..., n):

    α_{j_1 j_2 ... j_n} = E[X_{j_1} ... X_{j_n}] = ∫ ... ∫ x_{j_1} x_{j_2} ... x_{j_n} f(x_1, x_2, ..., x_p) dx_1 ... dx_p    (6)

Remark: some or all of the j_l can be the same.

Standard normal distribution: using the definition (Eq. (2)) of the normal probability density function and the assumption that the variables are independent, the expression for the n-th moment can be simplified. Given m = x_1^{n_1} x_2^{n_2} ... x_p^{n_p} with n = Σ_{j=1}^p n_j, the moment is

    E[m] = ∫ m (2π)^{-p/2} exp(-(1/2)(x_1^2 + x_2^2 + ... + x_p^2)) dx_1 ... dx_p
         = (1/√(2π)) ∫ x_1^{n_1} e^{-x_1^2/2} dx_1 · (1/√(2π)) ∫ x_2^{n_2} e^{-x_2^2/2} dx_2 · ... · (1/√(2π)) ∫ x_p^{n_p} e^{-x_p^2/2} dx_p
         = I(n_1) I(n_2) ... I(n_p)    (7)

and I(n) evaluates to (see Section A.4):

    I(n) = (n - 1)!!  if n is even
    I(n) = 0          if n is odd

1.3 Hermite Polynomials

Definition: the n-th Hermite polynomial is defined as

    H_n(x) = (-1)^n e^{x^2} (d^n/dx^n) e^{-x^2}    (8)

Table of the first few polynomials:

    H_0(x) = 1
    H_1(x) = 2x
    H_2(x) = 4x^2 - 2
    H_3(x) = 8x^3 - 12x
    H_4(x) = 16x^4 - 48x^2 + 12
    H_5(x) = 32x^5 - 160x^3 + 120x

They define a complete orthogonal system with inner product

    ∫ e^{-x^2} H_n(x) H_m(x) dx = √π 2^n n! δ_nm    (9)

2 Related Work

There are many methods, formal hypothesis testing procedures and informal graphical techniques, available to investigate the assumption that a set of data vectors can be treated as a set of independent observations from a multivariate normal distribution. Descriptions of several methods may be found in [Gnanadesikan 1997] and [Krzanowski 1988].
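The polynomials in the table above satisfy the standard three-term recurrence H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x), which is the practical way to evaluate them. A small sketch (Python/NumPy); the quadrature check of the orthogonality relation uses Gauss-Hermite nodes, which integrate against the weight e^{-x^2} exactly for polynomial integrands of this degree:

```python
import numpy as np

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x), evaluated via the recurrence
    H_{n+1}(x) = 2x H_n(x) - 2n H_{n-1}(x)."""
    x = np.asarray(x, dtype=float)
    h_prev, h = np.ones_like(x), 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

# Orthogonality check: the integral of e^{-x^2} H_n H_m is sqrt(pi) 2^n n! delta_nm
nodes, weights = np.polynomial.hermite.hermgauss(40)
inner_23 = np.sum(weights * hermite(2, nodes) * hermite(3, nodes))  # ~0
inner_22 = np.sum(weights * hermite(2, nodes) ** 2)                 # ~ sqrt(pi)*8
```

With 40 quadrature nodes the inner products are exact up to rounding, so inner_23 vanishes and inner_22 equals √π · 2^2 · 2!.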

In practice, the presence of joint nonnormality is likely to be detected quite often by methods directed at studying the marginal normality of the observations on each variable. There also exist tests that explicitly exploit the multivariate nature of the data in order to yield greater sensitivity. In this section a simple graphical test and a more complex test are introduced.

2.1 Probability Plot

Suppose that x has a multivariate normal distribution with mean µ and covariance matrix Σ. Using Equations (2) and (4), a variable z can be defined:

    z := (x - µ)^T Σ^{-1} (x - µ)    (10)

and there exists a u for which

    z = u^T u = Σ_{i=1}^p u_i^2  where  u ~ N(0, I_p)    (11)

But z is then the sum of squares of p independent N(0,1) variates, hence z has a χ² distribution with p degrees of freedom. Thus if x_1, ..., x_n are a random sample from a N(µ, Σ) population, then the z_i = (x_i - µ)^T Σ^{-1} (x_i - µ) for i = 1, ..., n correspondingly form a random sample from a χ²_p population. Let z_(1), z_(2), ..., z_(n) denote the z_i values arranged in increasing order, and suppose that z_(i) is plotted against the quantile of a χ²_p distribution corresponding to a cumulative probability of (i - 1/2)/n for i = 1, ..., n [Krzanowski 1988]. If the z_(i) are truly independent observations from a χ²_p distribution, then such a χ² probability plot should yield a straight line. The χ² probability plot is thus a simple graphical possibility for testing the hypothesis that the observations x_i are independent realizations of a N(µ, Σ) variate: if the plot is approximately linear the hypothesis is tenable, but if there are marked deviations from linearity the hypothesis is not tenable.

Problem: in practice, µ and Σ will be unknown, and the z_(i) are thus not directly computable. The simplest way around the problem is to replace µ and Σ by the sample mean vector x̄ and the sample covariance matrix S = (1/(n-1)) Σ_{i=1}^n (x_i - x̄)(x_i - x̄)^T. Replacing µ and Σ enables the z_(i) to be plotted, but two further problems have been introduced:

1. The z_(i) are no longer independent of each other, because each z_(i) involves the common sample values x̄ and S.
2. Since x̄ and S are only estimates of µ and Σ, the z_(i) computed from them no longer have a χ² distribution.

Solution: there are two particular ways to solve these problems.

Enlarging the sample size: once n is much greater than p, the difference between the true distribution and the χ² distribution seems to be negligible. In this case these complications can be ignored.

Jack-knifed variables: if x̄ and S are computed from a sample of size n from a N_p(µ, Σ) population, and x is a further observation independent of them, then z = (x - x̄)^T S^{-1} (x - x̄) is distributed as

    z ~ [p(n² - 1) / (n(n - p))] F_{p, n-p}

Therefore independence of each x_i from the mean and covariance used in z_i must be ensured: jack-knifed means and covariance matrices must be used (instead of the single x̄ and S). The adjusted z_i for use in the probability plot are thus ẑ_i = (x_i - x̄_(i))^T S_(i)^{-1} (x_i - x̄_(i)). It can be shown [Bartlett 1951] that

    S_(i)^{-1} = a S^{-1} + [a b S^{-1} (x_i - x̄)(x_i - x̄)^T S^{-1}] / [1 - b (x_i - x̄)^T S^{-1} (x_i - x̄)]
    x̄_(i) = (n x̄ - x_i) / (n - 1)

where a = (n - 2)/(n - 1) and b = n/(n - 1)². So each ẑ_i can be obtained using just x̄, S and x̄_(i). An example of this technique with a random sample of points is shown in Fig. (1).

Figure 1: Probability plot for testing multivariate normality (ranked z_(i) against χ² quantiles).

The probability plot technique depends on the human interpretation of a graph. It is therefore not usable for our application, because we are seeking a method that yields an objective measurement.

The jack-knifed mean x̄_(i) and covariance matrix S_(i) are simply the mean vector and covariance matrix of the (n - 1) sample members excluding x_i.
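The probability-plot construction can be prototyped directly: compute the ordered squared Mahalanobis distances from the sample mean and covariance, and pair them with χ²_p quantiles at cumulative probabilities (i - 1/2)/n. A minimal sketch in Python (NumPy/SciPy); the sample data are illustrative, the plotting itself is omitted, and so is the jack-knife refinement:

```python
import numpy as np
from scipy import stats

def chi2_plot_points(x):
    """Ordered squared Mahalanobis distances z_(i) paired with chi^2_p
    quantiles at cumulative probabilities (i - 1/2)/n."""
    n, p = x.shape
    mu = x.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(x, rowvar=False))
    d = x - mu
    z = np.sort(np.einsum('ij,jk,ik->i', d, S_inv, d))  # quadratic forms, sorted
    q = stats.chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)
    return q, z

rng = np.random.default_rng(1)
q, z = chi2_plot_points(rng.standard_normal((500, 3)))
# For normal data the point cloud (q, z) lies close to the line z = q.
```

Plotting z against q (e.g. with matplotlib) gives the χ² probability plot; marked curvature indicates nonnormality.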

10 Related Work. Nearest Distance An other, more complex test for testing joint normality, called the nearest distance test, has been proposed by [Andrews et al. 97]. In this test nearest neighbor distances for each z-transformed point are passed through a series of steps to standard normal deviates. Under the null hypothesis these transformed distances are independent of the coordinates of points from which they are measured. This independence is tested by multiple regression techniques. In the next section this technique will be introduced and an instructive example will be presented in Section (..)... Procedure. Sphericizing: the first two steps in the procedure are to transform the data to the unit hypercube. This can be done by computing the sample mean vector ȳ and the covariance matrix S. Then the sphericized residuals z i have to be evaluated:. Uniforming: if z ij denotes the jth element of z i, then: z i = S / (y i ȳ) () x ij = Φ(z ij ), i =,..., n; j =,..., p (3) where Φ(z ij ) is the cumulative distribution function (cdf) of the standard normal distribution. 3. Nearest Distance: for each pair (x i, x j ) in the hypercube [, ] p, the distance may be calculated by using the metric: d(x i, x j ) = max k {min[ x ik x jk, x ik x jk ]} (4) and the smallest distance for every point x i may be obtained by: d min (x i ) = min i j d(x i, x j ) (5) To avoid boundary effects, the metric wraps around the opposite faces of the unit hypercube. If the distribution of the input sample y was multinormal, then it can be shown (see Sec. A.) that the distribution function F dmin (d) is given by: F dmin (d) = ( (d) p ) n (6) 4. Selection criterion: the next step is to define a critical distance d crit. Geometrically it can be explained using Figure (). If n points placed in equidistance positions in the unit hypercube (in dimensions the unit square), then the critical distance d crit is defined as: d crit = n /p (7)

Figure 2: Defining the critical distance d_crit = 1/6 in an example with 9 points.

Now all points x_i which have no neighbour with lower index within this critical distance are selected. Algebraically this means that a selected point x_i must satisfy the following two conditions:

    d_min(i) < (1/2) n^{-1/p}  and  d(i, j) > (1/2) n^{-1/p} for j < i    (18)

5. Independence: since all d_min are positive, the following inequalities hold (see Sec. A.3):

    0 < d_min < (1/2) n^{-1/p}  implies  e^{-1} < e^{-n (2 d_min(i))^p} < 1    (19)

[Gnanadesikan 1997] claims that if the points x_i are uniformly distributed in the unit hypercube, the standardized expression on the right-hand side of Eq. (19) has a standard uniform distribution for the selected d_min. A standard normal deviate corresponds to this probability:

    w_i = Φ^{-1}( (e^{-n (2 d_min(i))^p} - e^{-1}) / (1 - e^{-1}) )    (20)

Finally, the computed w_i should not show any dependence on x_i.

6. Regression: such a dependency between the w_i and the x_i can be tested by fitting a quadratic surface in the elements of x to w and computing the regression sum of squares. In other words, the quadratic relationship E(w) = β_0 + β_1^T x + x^T B_2 x has to be fitted to the points which survived the selection criterion. The regression sum of squares with (p + 1)(p + 2)/2 degrees of freedom is then used for comparison.

7. Comparison: the value obtained above is compared to the critical value of the χ² distribution with (p + 1)(p + 2)/2 degrees of freedom. Joint normality has to be rejected for large values.
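The distance machinery above is easy to prototype. The sketch below (Python/NumPy, with illustrative sample data) computes the wrap-around nearest-neighbour distances d_min on the unit hypercube; the selection, regression and comparison steps are omitted:

```python
import numpy as np

def nearest_wrap_distances(x):
    """Smallest wrap-around Chebyshev distance d_min(x_i) for each point x_i
    in the unit hypercube: d = max_k min(|diff_k|, 1 - |diff_k|)."""
    diff = np.abs(x[:, None, :] - x[None, :, :])   # |x_ik - x_jk| for all pairs
    per_axis = np.minimum(diff, 1.0 - diff)        # wrap around opposite faces
    d = per_axis.max(axis=2)                       # max over coordinates k
    np.fill_diagonal(d, np.inf)                    # exclude the self-distance
    return d.min(axis=1)

rng = np.random.default_rng(2)
dmin = nearest_wrap_distances(rng.uniform(size=(200, 2)))
# Every dmin lies in (0, 1/2]; under uniformity its cdf follows Eq. (16).
```

The O(n²) pairwise array keeps the sketch short; for large n a spatial index would be preferable.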

12 Related Work.. Example using a multivariate distribution In this example a multivariate normal random sample in dimensions of points was used (Fig. (3)). To show the effect of the several transformations, a scatterplot of the corresponding variables is added.. Sphericizing: starting with a multivariate normal distribution, all points were transformed using Equation () = z i = S / (y i ȳ) Figure 3: Scatterplot of all input y-points. Figure 4: Scatterplot of the sphericized points z.. Uniforming: after evaluating the cumulative density function Φ(z ij ) (see Eq. (3)) at every point z ij, the points are distributed as follows: Figure 5: Scatterplot of all uniformly distributed x ij. Figure 6: Merged distribution function theoretical (dashed) and experimental. 3. Nearest distance: the nearest distance of every point is added into the scatterplot (Fig. (7)). The experimental distribution of the d min is quite as similar as the computed theoretical function (see Eq. (6)).

Figure 7: Scatterplot with nearest-distance areas.
Figure 8: Distribution of the d_min with the theoretical curve (dashed).

4. Selection: after evaluating the selection criterion (Eq. (18)), the selected points are marked bold in the scatterplot (Fig. (9)). Their standardized distances are, as mentioned before, uniformly distributed (see Eq. (19) and Fig. (10)).

Figure 9: Scatterplot with selected nearest-distance areas.
Figure 10: (Standardized) distribution of the selected d_min with the theoretical curve (dashed).

3 Normalized Moment Method

The characteristics of a distribution can be described by its moments. This method is introduced using an example with a standard multinormal distribution (a sample of n points). The goal is to find a set of descriptors of a distribution that can be compared to the corresponding set of a standard multivariate normal distribution. First the method is developed in one dimension; in a second step it is extended to p dimensions.

3.1 One Dimension

The theoretical moments, and the variances of the moments of a single point, of the standard normal distribution can be deduced using Equation (7):

    E[M] = {1, mean, variance, skewness, kurtosis, ...} = {I(j)}_{j=0,1,2,...} = {1, 0, 1, 0, 3, 0, 15, ...}
    Var[M] = {I(2j) - I(j)²}_{j=0,1,2,...} = {0, 1, 2, 15, 96, 945, 10170, ...}

For a random sample with n points the moments and variances are:

    E[M] = {1, 0, 1, 0, ...}  and  Var[M] = (1/n) {0, 1, 2, 15, ...}

Let S be a random sample of n points. For every point x_i (i = 1, ..., n) in S the first m powers M_i = {1, x_i, x_i², ..., x_i^m}^T are computed and averaged over all n points of S:

    M̄ = {1, x̄⁽¹⁾, x̄⁽²⁾, ..., x̄⁽ᵐ⁾}^T  with  x̄⁽ˡ⁾ = (1/n) Σ_{i=1}^n x_i^l  and  l = 0, ..., m

To visualize the problems introduced by using raw moments, this procedure was repeated several times and the moments were plotted against each other (see Fig. (11)). It is obvious that the moments are correlated and not independent. The problem of correlation (though not of dependency) can be solved by using an orthonormal basis of Hermite polynomials (Eq. (8)):

    H̃_n(x) := (1/√(2^n n!)) H_n(x/√2)    (22)

The orthonormality can be expressed using the inner product

    ∫ (1/√(2π)) e^{-x²/2} H̃_n(x) H̃_m(x) dx = δ_nm    (23)

Figure 11: Plot of the raw moments against each other for several i.i.d. samples.

Using these polynomials and evaluating the first m normalized Hermite functions for every point x_i in a sample S, we get

    M_i = {1, H̃_1(x_i), H̃_2(x_i), ..., H̃_m(x_i)}^T = {1, x_i, (1/√2)(x_i² - 1), (1/√6)(x_i³ - 3x_i), ...}^T    (24)

and after determining the average, we get for the sample

    M̄ = {1, ξ_1, ξ_2, ..., ξ_m}^T  with  ξ_l := (1/n) Σ_{i=1}^n H̃_l(x_i)  and  l = 0, 1, ..., m    (25)

After evaluating M̄ several times for distinct samples S and again plotting the moments against each other (see Fig. (12)), we see that they are uncorrelated. This behavior of the moments can be used for characterizing a multivariate normal distribution.

Remark: normalized Hermite moments are just linearly transformed raw moments (Eq. (25)). Our Hermite moments and the raw moments are related by a linear transformation represented by a lower-triangular matrix N whose rows contain the coefficients of the normalized Hermite polynomials:

    M̃ = N M
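The one-dimensional procedure, evaluating the normalized Hermite polynomials at every sample point and averaging, can be sketched as follows (Python/NumPy; the normalization H̃_l(x) = H_l(x/√2)/√(2^l l!) follows the definition in the text, and the sample is illustrative):

```python
import numpy as np
from math import factorial, sqrt

def normalized_hermite(l, x):
    """Normalized Hermite polynomial H~_l(x) = H_l(x/sqrt(2)) / sqrt(2^l l!),
    orthonormal with respect to the standard normal density."""
    t = np.asarray(x, dtype=float) / sqrt(2.0)
    h_prev, h = np.ones_like(t), 2.0 * t   # physicists' recurrence on t
    if l == 0:
        return h_prev
    for k in range(1, l):
        h_prev, h = h, 2.0 * t * h - 2.0 * k * h_prev
    return h / sqrt(2.0 ** l * factorial(l))

def hermite_moments(sample, m):
    """Averaged normalized Hermite moments xi_1, ..., xi_m of a 1-D sample."""
    return np.array([normalized_hermite(l, sample).mean() for l in range(1, m + 1)])

rng = np.random.default_rng(3)
xi = hermite_moments(rng.standard_normal(100_000), 5)
# For a standard normal sample every xi_l is close to 0, with variance 1/n.
```

Note that H̃_1(x) = x and H̃_2(x) = (x² - 1)/√2, matching the expansion in the text.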

Figure 12: Plot of the normalized Hermite moments against each other for several i.i.d. samples.

3.2 Multidimensional

The next step is to expand this method to more than one dimension. The problem is that this formalism cannot be visualized as in one dimension. For further development it is sufficient to use one sample S from a distribution. Using the normalized Hermite moments M_i (see Eq. (24)) we get for every point x_i the upper corner of a tensor of rank p (p is the dimension of the points x_i):

    M_i = { Π_{k=1}^p H̃_{α_k}(x_ik) }  with  Σ_k α_k ≤ m    (26)

After obtaining all moment tensors they are averaged over all n points, which results in

    M̄ = (1/n) Σ_{i=1}^n M_i    (27)

For example in p = 2 dimensions, i.e. a sample S of a distribution with n points (x_i, y_i) (i = 1, ..., n), we get for the first m = 3 moments of every point a triangular matrix instead of an array of length m as in one dimension,

namely M_i, which is averaged over all n points:

    M̄ = (1/n) Σ_{i=1}^n { H̃_{α_1}(x_i) H̃_{α_2}(y_i) }_{α_1 + α_2 ≤ 3}

        = (1/n) Σ_{i=1}^n  [ 1           H̃_1(y_i)            H̃_2(y_i)           H̃_3(y_i)
                             H̃_1(x_i)    H̃_1(x_i) H̃_1(y_i)   H̃_1(x_i) H̃_2(y_i)
                             H̃_2(x_i)    H̃_2(x_i) H̃_1(y_i)
                             H̃_3(x_i) ]

The entries of this tensor M̄ are used to define a test value T. The easiest way is to take the sum of squares of all its elements (without the first one³) and to normalize:

    T_{p,m} := n Σ { M̄ }²    (28)

3.3 Distribution of the test value T_{p,m}

For the theoretical distribution of T_{p,m} we did not find a reasonably simple and usable form. Therefore a test statistic for the multivariate standard normal case must be generated numerically. Some examples are shown in Figure (13). They were computed using repeated random samples from a bivariate standard normal distribution, and they can be used for hypothesis testing.

Figure 13: Distribution of the test value T_{2,m} of a multivariate standard normal distribution, computed from repeated random samples S_α with maximum moment orders m = 4, 6, 8 and 10.

³ Because the first element is just a constant of value 1.

4 Reconstruction using Normalized Moments

The normalized Hermite moments can also be used to reconstruct the density function. Defining the Gaussian function as g(x) = (1/√(2π)) exp(-x²/2), and with the normalized Hermite polynomials H̃_n(x) (see Eq. (22)) and their inner product (Eq. (23)), a series expansion can be written for any smooth density function:

    p_r(x) = g(x) Σ_m c_m H̃_m(x)    (29)

with series coefficients c_m. Because of the orthonormality of the Hermite polynomials the coefficients are given by

    c_m = ∫ p_r(x) H̃_m(x) dx    (30)

The computed averages of the normalized Hermite moments ξ_m (Eq. (25)) of a given sample and the series coefficients can be combined⁴ in the following manner:

    ξ_m = (1/n) Σ_{j=1}^n H̃_m(x_j) ≈ ∫ p_r(x) H̃_m(x) dx = c_m    (31)

The result is that with the computed moments ξ_m the density function p_r(x) can be approximated. The extension to more than one dimension is straightforward. Some examples of reconstructions are shown in Section A.6.

⁴ The Monte Carlo approximation of this integral uses a finite sample from the distribution.
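The reconstruction can be sketched in a few lines (Python/NumPy). The helper normalized_hermite and the sample below are illustrative; the series coefficients c_m are estimated by the sample moments ξ_l as in the text:

```python
import numpy as np
from math import factorial, sqrt, pi

def normalized_hermite(l, x):
    """H~_l(x) = H_l(x/sqrt(2)) / sqrt(2^l l!) via the physicists' recurrence."""
    t = np.asarray(x, dtype=float) / sqrt(2.0)
    h_prev, h = np.ones_like(t), 2.0 * t
    if l == 0:
        return h_prev
    for k in range(1, l):
        h_prev, h = h, 2.0 * t * h - 2.0 * k * h_prev
    return h / sqrt(2.0 ** l * factorial(l))

def reconstruct_density(sample, m):
    """Return p_r(x) = g(x) * sum_l xi_l H~_l(x), with the coefficients
    estimated as the averaged normalized Hermite moments of the sample."""
    xi = [normalized_hermite(l, sample).mean() for l in range(m + 1)]
    def p_r(x):
        x = np.asarray(x, dtype=float)
        g = np.exp(-x ** 2 / 2.0) / sqrt(2.0 * pi)   # Gaussian weight g(x)
        return g * sum(c * normalized_hermite(l, x) for l, c in enumerate(xi))
    return p_r

rng = np.random.default_rng(7)
p_r = reconstruct_density(rng.standard_normal(100_000), 6)
# For a standard normal sample, p_r is close to the N(0,1) density itself.
```

Because ξ_0 = 1 and the higher moments of a normal sample are near zero, the reconstruction of a normal sample nearly reproduces g(x); for the test distributions of Part II the higher terms reshape the density.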

Part II: Measurements

In this section some quality measurements are made. We want to know how well the normalized moment method distinguishes a given distribution from the standard normal one, in comparison to the nearest distance method introduced in Section 2.2. For this purpose, some special distributions were created. Some of them have marginal normal distributions; they were tested with both methods.

5 Test Distributions

Most of these distributions differ sharply from normality. Some are concentrated on lines of measure zero, which the nearest distance method might detect more precisely than the moment method.

5.1 Test distributions with marginal normal distribution

A set of three artificial distributions is introduced here. Each sample from them shows marginally normal variation. For better visualization only the two-dimensional case was used.

Diamond distribution: this bivariate distribution covers only a diamond-shaped curve. Its mean and covariance matrix are

    E_D[x] = (0, 0)^T  and  Var_D[x] = I_2

Figure 14: Diamond distribution with marginal cumulative distribution functions in x- and y-direction ((a) density plot, (b) cdf).

Half normal distribution: this bivariate distribution covers the quadrants x > 0, y > 0 and x < 0, y < 0. Its mean and covariance matrix are

    E_H[x] = (0, 0)^T  and  Var_H[x] = ( 1  2/π ; 2/π  1 )

Figure 15: Half normal distribution with marginal cumulative distribution functions in x- and y-direction ((a) density plot, (b) cdf).

X-distribution: this bivariate distribution covers the lines x = y and x = -y. Its mean and covariance matrix are

    E_X[x] = (0, 0)^T  and  Var_X[x] = I_2

Figure 16: X-distribution with marginal cumulative distribution functions in x- and y-direction ((a) density plot, (b) cdf).

5.2 Other test distribution

This type of distribution has no marginal normality, but we are interested in knowing how such a sample of scattered points is described by these methods. We designed this peculiar distribution specifically to provide the nearest distance method with data on which it could perform well.

Lattice distribution: all points in this distribution are placed on lattice points. Because the algorithm of the nearest distance method needs strictly positive distances (that means not equal to zero), a small variation is added to every point. The variation is uniformly distributed in a small box around every lattice point. The mean and covariance matrix are derived in Section A.5.

Figure 17: Lattice (scattered point) distribution with marginal cumulative distribution functions in x- and y-direction ((a) density plot, (b) cdf).

6 Results

To measure the variability of the test values T_{p,m} (Eq. (28)), repeated samples were used. The distributions of the computed test values were plotted against a reference plot. This reference (or theoretical) curve was computed beforehand with a much larger number of samples from the standard normal case. Parameters used: dimension p = 2 and maximum moment m = 6.

In Figure (18) the results using the nearest distance method are plotted, and the results of the normalized moment method are shown in Figure (19).

Figure 18: Nearest distance method: all tested distributions ((a) Diamond, (b) X, (c) Half Normal, (d) Lattice) in comparison to a bivariate normal distribution (dashed line).

Figure 19: Normalized moment method: all tested distributions ((a) Diamond, (b) X, (c) Half Normal, (d) Lattice) in comparison to a bivariate normal distribution (dotted line), using moments up to 6th order.

7 Conclusions

In search of a more robust method to test whether a given sample in one or more dimensions is normally distributed, we introduced a new procedure using the moments of the points. The moments of the points describe their distribution, and this fact allows us to define a test value that summarizes this information. This value measures the dissimilarity of the given sampled distribution from normality. The advantages and disadvantages of the normalized Hermite moment method can be summarized as follows:

Advantages:

- Real testing of joint normality instead of marginal normality.

- Better discrimination of distributions with marginal normality in comparison to other methods.
- Easy to use and fast for moderate dimensions and large numbers of points. The cost of the computation grows roughly like O(n m^p), the number of moment-tensor entries times the number of points. (Nearest distance method: O(n²).)

Disadvantages:

- We have not found a formulation for the theoretical distribution of the test value; it must first be computed numerically from a large number of samples.

A Miscellaneous

A.1 Sphericized Residuals

The sphericized residual z can be written as

    z = Σ^{-1/2}(x - µ) = Σ^{-1/2} x - Σ^{-1/2} µ = Ax + b    (32)

so for the distribution it follows that

    z ~ N(Aµ + b, AΣA^T) = N(Σ^{-1/2}µ - Σ^{-1/2}µ, Σ^{-1/2} Σ (Σ^{-1/2})^T) = N(0, I_p)    (33)

A.2 Distribution of the nearest distances d

1. n = 2 points in p = 1 dimension: let us first look at the distance distribution of two random⁵ points lying on the unit circle, i.e. the closed unit interval with its ends identified (see Fig. (20)).

Figure 20: Closed unit interval as a circle with randomly placed points.

Distance forward: D_for = (X_2 - X_1) mod 1 ~ U[0, 1]
Distance backward: D_back = 1 - D_for = (X_1 - X_2) mod 1 ~ U[0, 1]
Both directions: min[D_for, 1 - D_for] equals D_for if D_for ≤ 1/2 and 1 - D_for otherwise; this minimum is ~ U[0, 1/2].

The distribution function F_D(d) of these distances is

    F_D(d) = 0 for d < 0;  2d for 0 ≤ d ≤ 1/2;  1 for d > 1/2    (34)

⁵ That means the positions of these points are uniformly distributed.

2. n points in p = 1 dimension: let the minimal nearest-neighbour distance of a fixed point be d_min = min_i d_i over the distances d_1, ..., d_{n-1} to the other points. Using the probability P, the distribution of d_min can be written as:

    F_dmin(d) = P(d_min ≤ d)
              = 1 - P(d_1 > d & d_2 > d & ... & d_{n-1} > d)
              = 1 - P(d_1 > d) P(d_2 > d) ... P(d_{n-1} > d)
              = 1 - (1 - P(d_1 ≤ d)) (1 - P(d_2 ≤ d)) ... (1 - P(d_{n-1} ≤ d))
              = 1 - (1 - F_D(d))^{n-1} = 1 - (1 - 2d)^{n-1}

3. n points in p dimensions: with a similar computation the distribution of the nearest distances d follows:

    F_dmin(d) = 1 - (1 - (2d)^p)^{n-1}    (35)

A.3 Equivalence of the d_min

Using Equation (18) the following equivalences can be found:

    0 < d_min < (1/2) n^{-1/p}
    ⟺ 0 < (2 d_min)^p < 1/n
    ⟺ 0 > -n (2 d_min)^p > -1
    ⟺ 1 > e^{-n (2 d_min)^p} > e^{-1}
    ⟺ 0 < (e^{-n (2 d_min)^p} - e^{-1}) / (1 - e^{-1}) < 1

This verifies that the argument of Φ^{-1} (Eq. (20)) is in the correct range.

A.4 The definite Integral I(n)

Using the definition (Eq. (5)), partial integration and the substitution u = x²/2, du = x dx, the 2nd moment of a standard Gaussian random variable can be written as

    I(2) = (1/√(2π)) ∫ x² e^{-x²/2} dx
         = [-x e^{-x²/2} / √(2π)] + (1/√(2π)) ∫ e^{-x²/2} dx
         = 0 + 1 = 1
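The closed form F_dmin(d) = 1 - (1 - (2d)^p)^{n-1} is easy to check by simulation: repeatedly draw n uniform points, record the wrap-around nearest-neighbour distance of one fixed point, and compare the empirical frequency below a threshold d_0 with the formula. A sketch with illustrative parameters (Python/NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, trials, d0 = 20, 2, 4000, 0.05
hits = 0
for _ in range(trials):
    x = rng.uniform(size=(n, p))
    diff = np.abs(x[0] - x[1:])                    # coordinate gaps to point x_0
    d = np.minimum(diff, 1.0 - diff).max(axis=1)   # wrap-around Chebyshev metric
    hits += int(d.min() <= d0)                     # nearest neighbour within d0?

F_theory = 1.0 - (1.0 - (2.0 * d0) ** p) ** (n - 1)   # closed form above
F_mc = hits / trials
```

For one fixed point the n - 1 distances are independent, each falling below d with probability (2d)^p, which is exactly what the formula encodes; the Monte Carlo frequency should match it within sampling error.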

And the n-th moment can be evaluated recursively:

    I(n) = (1/√(2π)) ∫ x^{n-1} · x e^{-x²/2} dx
         = [-x^{n-1} e^{-x²/2} / √(2π)] + (n - 1) (1/√(2π)) ∫ x^{n-2} e^{-x²/2} dx
         = 0 + (n - 1) I(n - 2)

After repeated partial integrations, with I(0) = 1 and I(1) = 0, it can be deduced that

    I(n) = (n - 1)!!  if n is even;  I(n) = 0  if n is odd

A.5 Variance of the Scatter Distribution

For symmetry reasons, the mean and variance of this distribution can be developed in one dimension and then extended. The configuration is shown in Figure (21): n boxes of width w are positioned at the lattice points. All points are uniformly distributed over the lattice points, each with an added random variation that is itself uniformly distributed within its box.

Figure 21: Configuration of the one-dimensional lattice distribution.

The probability density function p(x) is

    p(x) = 1/(nw) inside a box,  and  ∫ p(x) dx = Σ_i ∫_{i-w/2}^{i+w/2} 1/(nw) dx = 1

Mean µ_B: by symmetry the mean must be E[x] = 0.

Variance:

    Var[x] = ∫ x² p(x) dx = Σ_i ∫_{i-w/2}^{i+w/2} x²/(nw) dx
           = (1/(3nw)) Σ_i ((i + w/2)³ - (i - w/2)³)
           = (1/(3n)) Σ_i (3i² + w²/4)
           = (1/n) Σ_i (i² + w²/12)

where the sum runs over the lattice positions i.

A.6 Examples of Reconstructions

To illustrate some reconstructions, the distributions introduced in Section (5) were used. The series expansion (Eq. (29)) was carried out up to the 6th moment.

Figure 22: Reconstruction of a density function from the moments of a bivariate normally distributed sample ((a) reconstructed density function, (b) contour plot).

Figure 23: Reconstruction of a density function from the moments of a sample from a diamond distribution ((a) reconstructed density function, (b) contour plot).

Figure 24: Reconstruction of a density function from the moments of a sample from a half-normal distribution ((a) reconstructed density function, (b) contour plot).

Figure 25: Reconstruction of a density function from the moments of a sample from an X-distribution ((a) reconstructed density function, (b) contour plot).

Figure 26: Reconstruction of a density function from the moments of a sample from a lattice distribution ((a) reconstructed density function, (b) contour plot).

References

[Andrews et al. 1972] D.F. Andrews, R. Gnanadesikan, and J.L. Warner. Methods for assessing multivariate normality. Bell Laboratories Memorandum, 1972.

[Bartlett 1951] M.S. Bartlett. An inverse matrix adjustment arising in discriminant analysis. Ann. Math. Statist., 22:107, 1951.

[Bronstein 1991] I.N. Bronstein and K.A. Semendjajew. Taschenbuch der Mathematik. B.G. Teubner Verlagsgesellschaft, Stuttgart, 25th edition, 1991.

[Gnanadesikan 1997] R. Gnanadesikan. Methods for Statistical Data Analysis of Multivariate Observations. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc., second edition, 1997.

[Krzanowski 1988] W.J. Krzanowski. Principles of Multivariate Analysis. Oxford Statistical Science Series. Oxford University Press, New York, 1988.

[Patel and Read 1996] Jagdish K. Patel and Campbell B. Read. Handbook of the Normal Distribution. Statistics: Textbooks and Monographs. Marcel Dekker, Inc., second edition, 1996.


5. Random Vectors. probabilities. characteristic function. cross correlation, cross covariance. Gaussian random vectors. functions of random vectors EE401 (Semester 1) 5. Random Vectors Jitkomut Songsiri probabilities characteristic function cross correlation, cross covariance Gaussian random vectors functions of random vectors 5-1 Random vectors we

More information

Unsupervised machine learning

Unsupervised machine learning Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels

More information