Multivariate normal distribution

Reading: AMSA: pages
Multivariate Analysis, Spring 2016
Institute of Statistics, National Chiao Tung University
March 1, 2016

Brief outline

1. Density and properties
2. Sampling from multivariate normal and MLE
3. Sampling distribution and large sample behavior of $\bar{X}$ and $S$
4. Assessing the assumption of normality
5. Detecting outliers and cleaning data
6. Transformations to near normality

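Density

$X = (X_1, \ldots, X_p)^T$: a $(p \times 1)$ random vector.

$X \sim N_p(\mu, \Sigma)$, with $\mu$ of size $(p \times 1)$ and $\Sigma$ of size $(p \times p)$ (i.e., $X$ follows a $p$-dimensional normal distribution), has density

$$f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}$$

where $-\infty < x_i < \infty$, $i = 1, 2, \ldots, p$.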
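The density formula can be evaluated directly. The following is a minimal sketch, assuming NumPy is available; the function name `mvn_density` is hypothetical and chosen only for illustration.

```python
import numpy as np

def mvn_density(x, mu, sigma):
    """Evaluate the N_p(mu, sigma) density at a single point x."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.shape[0]
    diff = x - mu
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), computed with a
    # linear solve rather than an explicit inverse.
    quad = diff @ np.linalg.solve(sigma, diff)
    # Normalizing constant (2 pi)^{p/2} |Sigma|^{1/2}.
    norm_const = (2.0 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(sigma))
    return float(np.exp(-0.5 * quad) / norm_const)

# At the mean of a standard bivariate normal the density equals 1 / (2 pi).
peak = mvn_density([0.0, 0.0], [0.0, 0.0], np.eye(2))
```

Moving away from the mean can only lower the value, which matches the role of $\mu$ as the mode.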

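$p = 2$, Bivariate normal density

$$f(x_1, x_2) = \frac{1}{2\pi \sqrt{\sigma_{11} \sigma_{22} (1 - \rho_{12}^2)}} \exp\left\{ -\frac{1}{2(1 - \rho_{12}^2)} \left[ \left( \frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}} \right)^2 + \left( \frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}} \right)^2 - 2\rho_{12} \left( \frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}} \right) \left( \frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}} \right) \right] \right\}$$

where

$$\mu = (\mu_1, \mu_2)^T, \quad \Sigma = \begin{bmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{bmatrix}, \quad \rho_{12} = \frac{\sigma_{12}}{\sqrt{\sigma_{11}} \sqrt{\sigma_{22}}}.$$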
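As a worked check, the bivariate form follows from the general $p$-dimensional density by writing out the determinant and inverse of the $2 \times 2$ covariance matrix. The steps below are a sketch of that algebra and use only the symbols defined on this slide.

```latex
\begin{aligned}
|\Sigma| &= \sigma_{11}\sigma_{22} - \sigma_{12}^2
          = \sigma_{11}\sigma_{22}\,(1 - \rho_{12}^2), \\[4pt]
\Sigma^{-1} &= \frac{1}{\sigma_{11}\sigma_{22}(1 - \rho_{12}^2)}
  \begin{bmatrix} \sigma_{22} & -\sigma_{12} \\ -\sigma_{12} & \sigma_{11} \end{bmatrix}, \\[4pt]
(x - \mu)^T \Sigma^{-1} (x - \mu)
  &= \frac{\sigma_{22}(x_1 - \mu_1)^2 + \sigma_{11}(x_2 - \mu_2)^2
           - 2\sigma_{12}(x_1 - \mu_1)(x_2 - \mu_2)}
          {\sigma_{11}\sigma_{22}(1 - \rho_{12}^2)} \\[4pt]
  &= \frac{1}{1 - \rho_{12}^2}\left[
       \left(\frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}}\right)^2
     + \left(\frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}}\right)^2
     - 2\rho_{12}\left(\frac{x_1 - \mu_1}{\sqrt{\sigma_{11}}}\right)
                 \left(\frac{x_2 - \mu_2}{\sqrt{\sigma_{22}}}\right)\right].
\end{aligned}
```

The last step uses $\sigma_{12} = \rho_{12}\sqrt{\sigma_{11}}\sqrt{\sigma_{22}}$, and the normalizing constant follows from $(2\pi)^{2/2}|\Sigma|^{1/2} = 2\pi\sqrt{\sigma_{11}\sigma_{22}(1 - \rho_{12}^2)}$.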

Properties

Result: For a $p$-dimensional normal distribution $X \sim N_p(\mu, \Sigma)$,
1. $\mu$ is the point of maximum density (the mode).
2. $\mu$ is the expected value of $X$ (the mean).

Result:
1. If $X \sim N_p(\mu, \Sigma)$, then for any $(p \times 1)$ vector $a$, $a^T X \sim N(a^T \mu, a^T \Sigma a)$.
2. Conversely, if $a^T X \sim N(a^T \mu, a^T \Sigma a)$ for every $a$, then $X \sim N_p(\mu, \Sigma)$.

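Result: If $X \sim N_p(\mu, \Sigma)$,
1. the $q$ linear combinations

$$A_{(q \times p)} X = \begin{bmatrix} a_{11} X_1 + \cdots + a_{1p} X_p \\ \vdots \\ a_{q1} X_1 + \cdots + a_{qp} X_p \end{bmatrix}$$

are distributed as $N_q(A\mu, A\Sigma A^T)$.
2. Also, $X + d$, where $d$ is a vector of constants, is distributed as $N_p(\mu + d, \Sigma)$.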
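A small numerical illustration of the linear-combination result, assuming NumPy. The particular $\mu$, $\Sigma$, and $A$ are made-up values for illustration only.

```python
import numpy as np

# Hypothetical parameters of a 3-dimensional normal vector X.
mu = np.array([1.0, 2.0, 3.0])
sigma = np.array([[4.0, 1.0, 0.0],
                  [1.0, 2.0, 0.5],
                  [0.0, 0.5, 1.0]])

# Rows of A define q = 2 linear combinations: X1 - X2 and X2 - X3.
A = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])

mean_AX = A @ mu            # A mu
cov_AX = A @ sigma @ A.T    # A Sigma A^T
```

Here `mean_AX` is `[-1, -1]` and `cov_AX` is `[[4, -0.5], [-0.5, 2]]`. The diagonal entries agree with the direct calculation $\mathrm{Var}(X_1 - X_2) = 4 + 2 - 2(1) = 4$ and $\mathrm{Var}(X_2 - X_3) = 2 + 1 - 2(0.5) = 2$.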

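Result: Let $X \sim N_p(\mu, \Sigma)$ be partitioned as

$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}, \quad \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix},$$

where $X_1$ and $\mu_1$ are $(q \times 1)$, $X_2$ and $\mu_2$ are $((p-q) \times 1)$, $\Sigma_{11}$ is $(q \times q)$, $\Sigma_{12}$ is $(q \times (p-q))$, $\Sigma_{21}$ is $((p-q) \times q)$, and $\Sigma_{22}$ is $((p-q) \times (p-q))$. Then $X_1 \sim N_q(\mu_1, \Sigma_{11})$.

Proof: Let $A_{(q \times p)} = [\, I_{(q \times q)} \;\; 0_{(q \times (p-q))} \,]$. Then $AX = X_1 \sim N_q(A\mu, A\Sigma A^T)$, with $A\mu = \mu_1$ and $A\Sigma A^T = \Sigma_{11}$.

Note: This implies that, for the multivariate normal, all marginal distributions are normal.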

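Result:
(a) If $X_1$ $(q_1 \times 1)$ and $X_2$ $(q_2 \times 1)$ are independent, then $\mathrm{Cov}(X_1, X_2) = 0$ (a $q_1 \times q_2$ matrix of zeros).
(b) If

$$\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim N_{q_1 + q_2}\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right),$$

then $X_1$ and $X_2$ are independent if and only if $\Sigma_{12} = 0$.
(c) If $X_1 \sim N_{q_1}(\mu_1, \Sigma_{11})$, $X_2 \sim N_{q_2}(\mu_2, \Sigma_{22})$, and $X_1$ and $X_2$ are independent, then

$$\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim N_{q_1 + q_2}\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{bmatrix} \right).$$

(Note: marginal normality alone does not imply that the combined vector is normal.)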

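Result: Let $X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \sim N_p(\mu, \Sigma)$ with

$$\mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}, \quad \text{and } |\Sigma_{22}| > 0.$$

Then the conditional distribution of $X_1$ given $X_2 = x_2$ is normal and has

mean $= \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2)$
covariance $= \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$.

Note: The above result implies that, for the multivariate normal,
1. All conditional distributions are normal.
2. The conditional mean is of the form

$$\begin{bmatrix} \mu_1 + \beta_{1,q+1}(x_{q+1} - \mu_{q+1}) + \cdots + \beta_{1,p}(x_p - \mu_p) \\ \vdots \\ \mu_q + \beta_{q,q+1}(x_{q+1} - \mu_{q+1}) + \cdots + \beta_{q,p}(x_p - \mu_p) \end{bmatrix}$$

where the $\beta$'s are the entries of $\Sigma_{12} \Sigma_{22}^{-1}$.
3. The conditional covariance, $\Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}$, does not depend on the value of the conditioning variable, $x_2$.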
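The conditional mean and covariance can be computed from the partitioned parameters. This is a minimal sketch, assuming NumPy and assuming $X_1$ consists of the first $q$ components; the function name `conditional_mvn` and the numbers in the example are hypothetical.

```python
import numpy as np

def conditional_mvn(mu, sigma, q, x2):
    """Mean and covariance of X1 | X2 = x2, where X1 is the first q components."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    mu1, mu2 = mu[:q], mu[q:]
    s11 = sigma[:q, :q]
    s21, s22 = sigma[q:, :q], sigma[q:, q:]
    # beta = Sigma_12 Sigma_22^{-1}. Because Sigma_22 is symmetric, this is
    # the transpose of the solution of Sigma_22 B = Sigma_21.
    beta = np.linalg.solve(s22, s21).T
    cond_mean = mu1 + beta @ (x2 - mu2)
    cond_cov = s11 - beta @ s21
    return cond_mean, cond_cov

# Bivariate example (made-up numbers): condition X1 on X2 = 4.
cond_mean, cond_cov = conditional_mvn(
    [0.0, 2.0], [[2.0, 1.0], [1.0, 4.0]], q=1, x2=[4.0]
)
```

In this example $\beta = 1/4$, so the conditional mean is $0 + 0.25\,(4 - 2) = 0.5$ and the conditional variance is $2 - 0.25 \cdot 1 = 1.75$, whatever value of $x_2$ is supplied.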

Result: Let $X \sim N_p(\mu, \Sigma)$ with $|\Sigma| > 0$. Then
(a) $(X - \mu)^T \Sigma^{-1} (X - \mu)$ is distributed as $\chi^2_p$, where $\chi^2_p$ denotes the chi-square distribution with $p$ degrees of freedom.
(b) The $N_p(\mu, \Sigma)$ distribution assigns probability $1 - \alpha$ to the solid ellipsoid $\{x : (x - \mu)^T \Sigma^{-1} (x - \mu) \le \chi^2_p(\alpha)\}$, where $\chi^2_p(\alpha)$ denotes the upper $(100\alpha)$th percentile of the $\chi^2_p$ distribution.

Proof: Note that $Z = \Sigma^{-1/2} (X - \mu)$ is distributed as $N_p(0, I)$ and $(X - \mu)^T \Sigma^{-1} (X - \mu) = Z^T Z$.
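Part (b) can be checked by simulation. The sketch below assumes NumPy and restricts attention to $p = 2$, where the upper $100\alpha$ percentile of $\chi^2_2$ has the closed form $-2\ln\alpha$; the parameter values are made up for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
alpha = 0.05

# For p = 2 only: P(chi2_2 > c) = exp(-c / 2), so the upper 100*alpha
# percentile is -2 ln(alpha) (about 5.99 for alpha = 0.05).
chi2_upper = -2.0 * math.log(alpha)

x = rng.multivariate_normal(mu, sigma, size=200_000)
diff = x - mu
# Squared statistical distance (x - mu)^T Sigma^{-1} (x - mu) for each draw.
d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(sigma), diff)

# Fraction of draws inside the ellipsoid; this should be close to 1 - alpha.
coverage = float(np.mean(d2 <= chi2_upper))
```

For other values of $p$ the percentile has no such simple form and would have to come from a chi-square table or a statistics library.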

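Result: Let $X_1, \ldots, X_n$ be mutually independent with $X_j \sim N_p(\mu_j, \Sigma)$. (Note that the $X_j$'s have the same covariance matrix $\Sigma$.) Then

$$V_1 = c_1 X_1 + \cdots + c_n X_n \sim N_p\left( \sum_{j=1}^n c_j \mu_j, \; \Big( \sum_{j=1}^n c_j^2 \Big) \Sigma \right).$$

Moreover, $V_1$ and $V_2 = b_1 X_1 + \cdots + b_n X_n$ are jointly multivariate normal with covariance matrix

$$\begin{bmatrix} \big( \sum_{j=1}^n c_j^2 \big) \Sigma & (b^T c) \Sigma \\ (b^T c) \Sigma & \big( \sum_{j=1}^n b_j^2 \big) \Sigma \end{bmatrix}.$$

Consequently, $V_1$ and $V_2$ are independent if $b^T c = \sum_{j=1}^n c_j b_j = 0$.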

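Sampling from multivariate normal and MLE

$X_1, \ldots, X_n$: random sample from $N_p(\mu, \Sigma)$.

$$\{\text{Joint density of } X_1, \ldots, X_n\} = \prod_{j=1}^n \left\{ \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x_j - \mu)^T \Sigma^{-1} (x_j - \mu) \right] \right\} = \frac{1}{(2\pi)^{np/2} |\Sigma|^{n/2}} \exp\left[ -\frac{1}{2} \sum_{j=1}^n (x_j - \mu)^T \Sigma^{-1} (x_j - \mu) \right]$$

Result: Let $X_1, \ldots, X_n$ be a random sample from $N_p(\mu, \Sigma)$. Then

$$\hat{\mu} = \frac{1}{n} \sum_{j=1}^n X_j = \bar{X}, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{j=1}^n (X_j - \bar{X})(X_j - \bar{X})^T = \frac{n-1}{n} S$$

are the MLEs of $\mu$ and $\Sigma$.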
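The two estimators differ only in their divisor, which a short computation makes concrete. This is a sketch assuming NumPy, with simulated data standing in for a real sample.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated n x p data matrix (rows are observations); values are illustrative.
X = rng.multivariate_normal([0.0, 1.0, 2.0], np.diag([1.0, 2.0, 3.0]), size=50)
n = X.shape[0]

mu_hat = X.mean(axis=0)                  # MLE of mu: the sample mean
centered = X - mu_hat
sigma_hat = centered.T @ centered / n    # MLE of Sigma: divisor n
S = centered.T @ centered / (n - 1)      # sample covariance S: divisor n - 1
```

`sigma_hat` equals `(n - 1) / n * S`, and `S` matches what `np.cov(X, rowvar=False)` returns by default.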

Result: Let $X_1, \ldots, X_n$ be a random sample from $N_p(\mu, \Sigma)$. Then $\bar{X}$ and $S$ are sufficient statistics.

Proof: The joint density depends on the whole set of observations $x_1, \ldots, x_n$ only through $\bar{x}$ and $\sum_{j=1}^n (x_j - \bar{x})(x_j - \bar{x})^T = (n-1)S$.

Note: Because many multivariate techniques begin with the sample mean and covariance, it is prudent to check the adequacy of the multivariate normal assumption. If the data cannot be regarded as multivariate normal, techniques that depend solely on $\bar{x}$ and $S$ may ignore useful information.

Sampling distribution and large sample behavior of $\bar{X}$ and $S$

For the univariate case ($p = 1$), if $X_1, \ldots, X_n$ are a random sample from $N(\mu, \sigma^2)$, then
1. $\bar{X} \sim N(\mu, \frac{1}{n} \sigma^2)$
2. $(n-1)S^2 = \sum_{j=1}^n (X_j - \bar{X})^2 \sim \sigma^2 \chi^2_{n-1}$
3. $\bar{X}$ and $S^2$ are independent.

Wishart distribution (multivariate analog of $\chi^2$)

Let $Z_1, \ldots, Z_m$ be independently distributed as $N_p(0, \Sigma)$. Then

$W_m(\cdot \mid \Sigma)$ = Wishart distribution with $m$ d.f. = distribution of $\sum_{j=1}^m Z_j Z_j^T$.

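Result: Let $X_1, \ldots, X_n$ be a random sample of size $n$ from a $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$. Then
1. $\bar{X} \sim N_p(\mu, \frac{1}{n} \Sigma)$
2. $(n-1)S \sim W_{n-1}(\cdot \mid \Sigma)$
3. $\bar{X}$ and $S$ are independent.

Law of large numbers: Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and covariance $\Sigma$. Then, as $n \to \infty$,

$\bar{X} \xrightarrow{p} \mu$ (converges in probability to $\mu$)
$S$ (or $S_n$) $\xrightarrow{p} \Sigma$.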

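Central limit theorem: Let $X_1, \ldots, X_n$ be independent observations from a population with mean $\mu$ and covariance $\Sigma$. Then, for $n \gg p$,

$$\sqrt{n}\,(\bar{X} - \mu) \xrightarrow{d} N_p(0, \Sigma)$$

and

$$n (\bar{X} - \mu)^T S^{-1} (\bar{X} - \mu) \xrightarrow{d} \chi^2_p.$$

Note: From the earlier result, $n (\bar{X} - \mu)^T \Sigma^{-1} (\bar{X} - \mu) \sim \chi^2_p$ if $\bar{X} \sim N_p(\mu, \frac{1}{n} \Sigma)$. The $\chi^2_p$ distribution is approximately the sampling distribution of $n (\bar{X} - \mu)^T \Sigma^{-1} (\bar{X} - \mu)$ when $\bar{X}$ is approximately normally distributed. Replacing $\Sigma^{-1}$ by $S^{-1}$ does not seriously affect this approximation when $n$ is much larger than $p$.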

Assessing the assumption of normality

In practical work, investigating normality in one and two dimensions is ordinarily sufficient. We want to address these questions:
1. Do the marginal distributions of $X = (X_1, \ldots, X_p)$ appear to be normal? What about a few linear combinations of the components $X_i$?
2. Do the scatter plots of pairs of observations on different characteristics give the elliptical appearance expected from normal populations?
3. Are there any wild observations that should be checked for accuracy?

Evaluating the normality of the univariate marginal distributions

Histogram of each $X_i$, and the observed proportion of observations lying in
$(\bar{x}_i - \sqrt{s_{ii}},\; \bar{x}_i + \sqrt{s_{ii}})$
$(\bar{x}_i - 2\sqrt{s_{ii}},\; \bar{x}_i + 2\sqrt{s_{ii}})$

Q-Q plot against the normal distribution

Shapiro and Wilk test for normality
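The proportion check can be sketched in a few lines, assuming NumPy. For a normal distribution the two intervals should contain roughly 68.3% and 95.4% of the observations; these reference values are standard normal probabilities and are not stated on the slide. The helper name `proportion_within` is hypothetical.

```python
import numpy as np

def proportion_within(x, k):
    """Proportion of observations within k sample standard deviations of the mean."""
    x = np.asarray(x, dtype=float)
    s = x.std(ddof=1)   # sqrt(s_ii), computed with the n - 1 divisor
    return float(np.mean(np.abs(x - x.mean()) <= k * s))

rng = np.random.default_rng(2)
# Simulated normal sample used purely as an illustration.
sample = rng.normal(loc=10.0, scale=3.0, size=5000)

within_1 = proportion_within(sample, 1)   # compare with about 0.683
within_2 = proportion_within(sample, 2)   # compare with about 0.954
```

Observed proportions far from these reference values suggest that the marginal distribution of that variable is not normal.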

Evaluating bivariate normality

If the observations were generated from a multivariate normal distribution, each bivariate distribution should be normal, and the contours of constant density would be ellipses. The scatter plots for pairs of characteristics should exhibit an overall pattern that is nearly elliptical.

Since $X \sim N_p(\mu, \Sigma) \Rightarrow (X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi^2_p$, the proportion of sample observations in

$$\{ \text{all } x \text{ such that } (x - \bar{x})^T S^{-1} (x - \bar{x}) \le \chi^2_p(1 - r) \}$$

should be roughly $r \times 100\%$.

For the sample observations $x_1, x_2, \ldots, x_n$, we can make a Q-Q plot of

$$d_j^2 = (x_j - \bar{x})^T S^{-1} (x_j - \bar{x}), \quad j = 1, \ldots, n$$

versus the quantiles of $\chi^2_p$ (the chi-square plot). The plot should be a straight line with slope 1 if the $x_j$'s are from a multivariate normal distribution.
1. Order the $d_j^2$'s from smallest to largest as $d_{(1)}^2 \le d_{(2)}^2 \le \cdots \le d_{(n)}^2$.
2. Graph the pairs $\left( q_{c,p}\big( \frac{j - 1/2}{n} \big),\; d_{(j)}^2 \right)$, where $q_{c,p}\big( \frac{j - 1/2}{n} \big)$ is the $100 \big( \frac{j - 1/2}{n} \big)$ quantile of $\chi^2_p$.
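The two steps of the chi-square plot can be sketched as follows, assuming NumPy and SciPy (`scipy.stats.chi2.ppf` supplies the chi-square quantiles). The function name `chi_square_plot_points` is hypothetical, and the data are simulated.

```python
import numpy as np
from scipy.stats import chi2

def chi_square_plot_points(X):
    """Return (chi-square quantiles, ordered squared distances) for n x p data."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    diff = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    # d_j^2 = (x_j - xbar)^T S^{-1} (x_j - xbar) for every observation.
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(S), diff)
    d2_sorted = np.sort(d2)                    # step 1: order the distances
    probs = (np.arange(1, n + 1) - 0.5) / n    # (j - 1/2) / n
    q = chi2.ppf(probs, df=p)                  # step 2: chi-square quantiles
    return q, d2_sorted

rng = np.random.default_rng(3)
X = rng.multivariate_normal(np.zeros(3), np.eye(3), size=500)
q, d2_sorted = chi_square_plot_points(X)

# Least-squares slope of a line through the origin; near 1 for normal data.
slope = float(np.sum(q * d2_sorted) / np.sum(q * q))
```

Plotting `d2_sorted` against `q` with any plotting library gives the chi-square plot. The slope summary is only a rough numerical stand-in for looking at the plot, since curvature can hide behind a slope near 1.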

Detecting outliers and cleaning data

1. Make a dot plot for each variable.
2. Make a scatter plot for each pair of variables.
3. Calculate the standardized values

$$z_{jk} = \frac{x_{jk} - \bar{x}_k}{\sqrt{s_{kk}}}, \quad j = 1, \ldots, n, \quad k = 1, \ldots, p.$$

Examine these for large or small values (usually $|z_{jk}| \ge 3.5$).
4. Calculate $d_j^2 = (x_j - \bar{x})^T S^{-1} (x_j - \bar{x})$. In a chi-square plot ($d_j^2$'s versus quantiles of $\chi^2_p$), examine the points that are farthest from the origin (usually those exceeding $\chi^2_p(0.05)$).
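Steps 3 and 4 can be sketched together, assuming NumPy and SciPy. The function name `outlier_screens` and the planted observation are hypothetical; the cutoffs follow the rules of thumb quoted above, with the upper 5th percentile of $\chi^2_p$ obtained as the 0.95 quantile.

```python
import numpy as np
from scipy.stats import chi2

def outlier_screens(X, z_cut=3.5, alpha=0.05):
    """Standardized values, squared distances, and indices flagged by each screen."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    diff = X - X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    z = diff / np.sqrt(np.diag(S))                  # z_jk
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(S), diff)
    flagged_z = np.argwhere(np.abs(z) >= z_cut)     # (j, k) index pairs
    cutoff = chi2.ppf(1.0 - alpha, df=p)            # upper 100*alpha percentile
    flagged_d2 = np.flatnonzero(d2 > cutoff)        # observation indices j
    return z, d2, flagged_z, flagged_d2

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=200)
# Planted point: unremarkable on each margin but against the correlation.
X[0] = [2.0, -2.0]
z, d2, flagged_z, flagged_d2 = outlier_screens(X)
```

The planted point passes the univariate screen, because both of its standardized values are near 2 in absolute value, yet it has the largest $d_j^2$. This is why the distance-based step is worth doing in addition to step 3. With normal data about 5% of ordinary observations will also exceed the chi-square cutoff, so flagged points are candidates for inspection rather than automatic deletions.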

Transformations to normality

Method: Make non-normal data more normal looking by replacing $x$ with $g(x)$.

For non-continuous data:

Original scale        Transformed scale
count, $y$            $\sqrt{y}$
proportion, $\hat{p}$     $\mathrm{logit}(\hat{p}) = \log\left( \frac{\hat{p}}{1 - \hat{p}} \right)$
correlation, $r$      Fisher's $z(r) = \frac{1}{2} \log\left( \frac{1 + r}{1 - r} \right)$
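The three transformations are one-liners. The sketch below uses only the Python standard library and assumes the count transformation is the square root (the usual variance-stabilizing choice for counts); the function names are hypothetical.

```python
import math

def sqrt_count(y):
    """Square-root transformation for a count y >= 0 (assumed form)."""
    return math.sqrt(y)

def logit(p_hat):
    """logit(p) = log(p / (1 - p)) for a proportion 0 < p < 1."""
    return math.log(p_hat / (1.0 - p_hat))

def fisher_z(r):
    """Fisher's z(r) = 0.5 * log((1 + r) / (1 - r)) for a correlation -1 < r < 1."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))

examples = (sqrt_count(9), logit(0.5), fisher_z(0.0))
```

Both `logit` and `fisher_z` map a bounded scale onto the whole real line and are undefined at the boundaries, so proportions of exactly 0 or 1 and correlations of exactly $\pm 1$ need separate handling.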

Transformations to normality: continuous data

Skewed data

Power transformation

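Box-Cox transformation: for $x > 0$

$$x^{(\lambda)} = \begin{cases} \dfrac{x^\lambda - 1}{\lambda} & \lambda \ne 0 \\[6pt] \ln(x) & \lambda = 0 \end{cases}$$

The choice of $\lambda$ is the value that maximizes

$$l(\lambda) = -\frac{n}{2} \ln\left[ \frac{1}{n} \sum_{j=1}^n \left( x_j^{(\lambda)} - \bar{x}^{(\lambda)} \right)^2 \right] + (\lambda - 1) \sum_{j=1}^n \ln(x_j),$$

where $\bar{x}^{(\lambda)} = \frac{1}{n} \sum_{j=1}^n x_j^{(\lambda)}$.

Why does it work?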
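One simple way to carry out the maximization is a grid search over $\lambda$. The sketch below assumes NumPy; the grid range $[-2, 2]$, the step 0.01, and the function names are illustrative choices, not part of the slides.

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox transform of positive data x for a given lambda."""
    x = np.asarray(x, dtype=float)
    if abs(lam) < 1e-12:          # treat lambda = 0 as the log case
        return np.log(x)
    return (x ** lam - 1.0) / lam

def box_cox_loglik(x, lam):
    """l(lambda) = -(n/2) ln[(1/n) sum (x^(lam) - mean)^2] + (lam - 1) sum ln x."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    xt = box_cox(x, lam)
    return float(-0.5 * n * np.log(np.mean((xt - xt.mean()) ** 2))
                 + (lam - 1.0) * np.sum(np.log(x)))

rng = np.random.default_rng(5)
# Simulated lognormal data: ln(x) is exactly normal, so lambda near 0 is expected.
x = rng.lognormal(mean=0.0, sigma=1.0, size=2000)

grid = np.round(np.linspace(-2.0, 2.0, 401), 2)
loglik = np.array([box_cox_loglik(x, lam) for lam in grid])
lam_hat = float(grid[np.argmax(loglik)])
```

The first term of $l(\lambda)$ rewards a transformed sample with small variance, while the second term is the log-Jacobian of the transformation, which keeps the comparison across different $\lambda$ on the scale of the original data.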

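Transforming multivariate data

Method 1: Make each marginal distribution approximately normal by, e.g., the Box-Cox transformation.

Method 2: Multivariate Box-Cox transformation. Choose $\lambda = (\lambda_1, \ldots, \lambda_p)$ to maximize

$$l(\lambda_1, \ldots, \lambda_p) = -\frac{n}{2} \ln |S(\lambda)| + (\lambda_1 - 1) \sum_{j=1}^n \ln(x_{j1}) + \cdots + (\lambda_p - 1) \sum_{j=1}^n \ln(x_{jp}),$$

where $S(\lambda)$ is the sample covariance matrix computed from

$$x_j^{(\lambda)} = \left[ \frac{x_{j1}^{\lambda_1} - 1}{\lambda_1}, \; \ldots, \; \frac{x_{jp}^{\lambda_p} - 1}{\lambda_p} \right]^T, \quad j = 1, \ldots, n.$$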
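The multivariate criterion can be evaluated for any candidate $\lambda$ vector as sketched below, assuming NumPy. The function names are hypothetical, and $S(\lambda)$ is assumed to use the $n - 1$ divisor; a different divisor shifts $l$ by a constant and does not change the maximizer.

```python
import numpy as np

def box_cox_columns(X, lams):
    """Apply a separate Box-Cox transform to each column of positive data X."""
    X = np.asarray(X, dtype=float)
    cols = []
    for k, lam in enumerate(lams):
        col = X[:, k]
        cols.append(np.log(col) if abs(lam) < 1e-12 else (col ** lam - 1.0) / lam)
    return np.column_stack(cols)

def multivariate_box_cox_loglik(X, lams):
    """l(lam_1..lam_p) = -(n/2) ln|S(lam)| + sum_k (lam_k - 1) sum_j ln x_jk."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    S_lam = np.atleast_2d(np.cov(box_cox_columns(X, lams), rowvar=False))
    _, logdet = np.linalg.slogdet(S_lam)   # log-determinant, numerically stable
    jacobian = sum((lam - 1.0) * np.sum(np.log(X[:, k]))
                   for k, lam in enumerate(lams))
    return float(-0.5 * n * logdet + jacobian)

rng = np.random.default_rng(6)
X_pos = rng.lognormal(size=(100, 2))   # simulated positive data
n_obs = X_pos.shape[0]

# With every lambda equal to 1 the transform is a shift, so l = -(n/2) ln|S|.
l_identity = multivariate_box_cox_loglik(X_pos, [1.0, 1.0])
l_log = multivariate_box_cox_loglik(X_pos, [0.0, 0.0])
```

A maximizer can then be found by a grid over $(\lambda_1, \ldots, \lambda_p)$ or a general-purpose optimizer. For these simulated lognormal columns, `l_log` should exceed `l_identity`, since the log transform makes the data exactly normal.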
