Multivariate Analysis Homework 1

Multivariate Analysis Homework A490970 Yi-Chen Zhang March 6, 08 4.. Consider a bivariate normal population with µ = 0, µ =, σ =, σ =, and ρ = 0.5. a Write out the bivariate normal density. b Write out the squared generalized distance expression x µ T Σ x µ as a function of x and x. c Determine and sketch the constant-density contour that contains 50% of the probability. a The multivariate normal density is defined by the following equation. { fx = exp } π p/ Σ / x µt Σ x µ. x µ σ σ In this question, we have p =, x =, µ =, Σ = x µ σ σ 0 σ = ρ σ σ. Note that µ =, Σ =, Σ = Σ / =, and Σ =. So the bivariate normal density is fx = = { exp π / { exp 6π x x x x x + x } } x x, and =, b x µ T Σ x µ = x x x x = x x x + x. c For α = 0.5, the solid ellipsoid of x, x satisfy x µ T Σ x µ χ p,α = c will have probability 50%. From the quantile function in R we have χ,0.5 = qchisq0.5,df= =.86, therefore, c =.774. The eigenvalues of Σ are λ, λ =.660, 0.640 with eigenvectors 0.888 0.4597 e e =. 0.4597 0.888 Therefore, we have the axes as: c λ =.8 and c λ = 0.975. The contour is plotted in Figure.

x 0 4 4 0 4 x Figure : Contour that contains 50% of the probability 4.4. Let X be N µ, Σ with µ T =,, and Σ = a Find the distribution of X X + X. b Relabel the variables if necessary, and find a vector a such that X X a T X are independent. X and a Let a =,, T, then a T X = X X + X. Therefore, where and a T X Na T µ, a T Σa, a T µ = = a T Σa = = 9 The distribution of X X + X is N, 9. b Let a = T a a, then Y = X a T X = a X X + X a X. 0 0 X Now, let A =, then AX = NAµ, AΣA a a Y T, where 0 a 0 0 AΣA T = a a 0 a a = a + a a + a a 4a + a a + a +

Since we want to have X and Y independent, this implies that a a + = 0. So we have vector a = + c, for c R 0 4 0 4.6. Let X be distributed as N µ, Σ, where µ T =,, and Σ = 0 5 0. Which 0 of the following random variables are independent? Explain. a X and X b X and X c X and X d X, X and X e X and X + X X a σ = σ = 0, X and X are independent. b σ = σ =, X and X are not independent. c σ = σ = 0, X and X are independent. d We rearrange the covariance matrix and partition it. The new covariance matrix is as following: 4 0 Σ = 0 0 0 5 It is clear that X, X and X are independent. 0 0 X e Let A =, then AX = and AX NAµ, AΣA X + X X T, where 4 0 0 0 AΣA T = 0 5 0 0 0 0 4 6 = 6 6 It is clear that X and X + X X are not independent. 4.7. Refer to Exercise 4.6 and specify each of the following. a The conditional distribution of X, given that X = x. b The conditional distribution of X, given that X = x and X = x. X We use the result 4.6 from textbook. Let X = Nµ, Σ with µ = X Σ Σ and Σ = and Σ Σ Σ > 0. Then X X = x N µ + Σ Σ x µ, Σ Σ Σ Σ µ µ

a X X = x N + x, 4 X X = x N x +, b X X = x, X = x N + 0 5 0 x, 4 0 5 0 0 0 x 0 X X = x, X = x N x +, 4.6. Let X, X, X, and X 4 be independent N p µ, Σ random vectors. a Find the marginal distributions for each of the random vectors V = 4 X 4 X + 4 X 4 X 4 and V = 4 X + 4 X 4 X 4 X 4 b Find the joint density of the random vectors V and V defined in a. a By result 4.8 in the textbook, V and V have the following distribution n n N p c i µ, c i Σ i= i= Then we have V N p 0, Σ and V 4 N p 0, Σ. 4 b Also by result 4.8, V and V are jointly multivariate normal with covariance matrix n c i Σ b T cσ i= n b T cσ Σ, j= b j with c =,,, 4 4 4 4 T and b =,,, 4 4 4 4 T. So that we have the joint distribution of V and V as following: V 0 N V p, Σ 0 4 0 0 Σ 4 4.8. Find the maximum likelihood estimates of the mean vector µ and the covariance matrix Σ based on the random sample 6 X = 4 4 5 7 4 7 from a bivariate normal population. 4

Since the random samples X, X, X, and X 4 are from normal population, the maximum likelihood estimates of µ and Σ are X and n X i n XX i X T. Therefore, ˆµ = X = 4 6 and Σ = 4 i= 4 X i XX i X T = i= / /4 /4 / 4.9. Let X, X,..., X 0 be a random sample of size n = 0 from an N 6 µ, Σ population. Specify each of the following completely. a The distribution of X µ T Σ X µ b The distributions of X and n X µ c The distribution of n S a X µ T Σ X µ is distributed as χ 6 b X is distributed as N6 µ, Σ and n X 0 µ is distributed as N6 0, Σ 0 c n S is distributed as Wishart distribution Z i Zi T, where Z i N 6 0, Σ. We write this as W 6 9, Σ, i.e., Wishart distribution with dimensionality 6, degrees of freedom 9, and covariance matrix Σ. 4.. Let X,..., X 60 be a random sample of size 60 from a four-variate normal distribution having mean µ and covariance Σ. Specify each of the following completely. a The distribution of X b The distribution of X µ T Σ X µ c The distribution of n X µ T Σ X µ d The approximate distribution of n X µ T S X µ a X is distributed as N4 µ, 60 Σ. b X µ T Σ X µ is distributed as χ 4. c n X µ T Σ X µ is distributed as χ 4. d Since 60 4, n X µ T S X µ can be approximated as χ 4. 4.. Consider the annual rates of return including dividends on the Dow-Jones industrial average for the years 996-005. These data, multiplied by 00, are 0.6. 5. 6.8 7. 6. 5..6 6.0 Use these 0 observations to complete the following. a Construct a Q-Q plot. Do the data seem to be normally distributed? Explain. b Carry out a test of normality based on the correlation coefficient r Q. Let the significance level be α = 0.. i= a The Q-Q plot of this data is plotted in Figure. It seems that all the sample quantiles are close the theoretical quantiles. However, the Q-Q plots are not particularly informative unless the sample size is moderate to large, for instance, n 0. There can be quite a bit of variability in the straightness of the Q-Q plot for small samples, even when the observations are known to come from a normal population. 5

Sample Quantiles 0 0 0 0.5.0 0.5 0.0 0.5.0.5 Theoretical Quantiles Figure : Normal Q-Q plot b From 4- in the textbook, the q Q is defined by r Q = n n j= x j xq j q j= x j x n j= q j q Using the information from the data, we have r Q = 0.95. The R code of this calculation is compiled in Appendix. From Table 4. in the textbook we know that the critical point to test of normality at the 0% level of significance corresponding to n = 9 and α = 0. is between 0.90 and 0.95. Since r Q = 0.95 > the critical point, we do not reject the hypothesis of normality. 4.6. Exercise. gives the age x, measured in years, as well as the selling price x, measured in thousands of dollars, for n = 0 used cars. These data are reproduced as follows: x 4 5 6 8 9 x 8.95 9.00 7.95 5.54 4.00.95 8.94 7.49 6.00.99 a Use the results of Exercise. to calculate the squared statistical distances x j x T S x j x, j =,,..., 0, where x T j = x j, x j. b Using the distances in Part a, determine the proportion of the observations falling within the estimated 50% probability contour of a bivariate normal distribution. c Order the distances in Part a and construct a chi-square plot. d Given the results in Parts b and c, are these data approximately bivariate normal? Explain. x 5. 0.6 7.70 a From Exercise. we have x = = and S =. x.48 7.70 0.8544 The squared statistical distances d j = x j x T S x j x, j =,..., 0 are calculated and listed below d d d d 4 d 5 d 6 d 7 d 8 d 9 d 0.875.00.9009 0.75 0.05 0.076.79 0.865.75 4.5 6

b We plot the data points and 50% probability contour the blue ellipse in Figure. It is clear that subject 4, 5, 6, 8, and 9 are falling within the estimated 50% probability contour. The proportion of that is 0.5. x 5 0 5 4 5 6 7 8 9 4 6 8 0 x 0 Figure : Contour of a bivariate normal c The squared distances in Part a are ordered as below. The chi-square plot is shown in Figure 4. d 6 d 5 d 4 d 8 d 9 d d d d 7 d 0 0.076 0.05 0.75 0.865.75.875.00.9009.79 4.5 Sample Quantiles 0 4 0 4 6 8 0 4 Theoretical Quantiles Figure 4: Chi-square plot d Given the results in Parts b and c, we conclude these data are approximately bivariate normal. Most of the data are around the theoretical line. 7

Appendix R code for Problem 4. c. > libraryellipse > librarymass > librarymvtnorm > set.seed > > mu <- c0, > Sigma <- matrixc,sqrt/,sqrt/,, nrow=, ncol= > X <- mvrnormn=0000,mu=mu, Sigma=Sigma > lambda <- eigensigma$values > Gamma <- eigensigma$vectors > elps <- ttellipsesigma, level=0.5, npoints=000+mu > chi <- qchisq0.5,df= > c <- sqrtchi > factor <- c*sqrtlambda > plotx[,],x[,] > lineselps > pointsmu[], mu[] > segmentsmu[],mu[],factor[]*gamma[,],factor[]*gamma[,]+mu[] > segmentsmu[],mu[],factor[]*gamma[,],factor[]*gamma[,]+mu[] R code for Problem 4.. > x <- c-0.6,., 5., -6.8, -7., -6., 5.,.6, 6.0 > # a > qqnormx > qqlinex > # b > y <- sortx > n <- lengthy > p <- :n-0.5/n > q <- qnormp > rq <- cory,q R code for Problem 4.6. > n <- 0 > x <- c,,,,4,5,6,8,9, > x <- c8.95, 9.00, 7.95, 5.54, 4.00,.95, 8.94, 7.49, 6.00,.99 > X <- cbindx,x > Xbar <- colmeansx > S <- covx > Sinv <- solves > > # a > d <- diagttx-xbar%*%sinv%*%tx-xbar > > # b > libraryellipse 8

> p <- > elps <- ttellipses, level=0.85, npoints=000+xbar > plotx[,],x[,],type="n" > index <- d < qchisq0.5,df=p > textx[,][index],x[,][index],:n[index],col="blue" > textx[,][!index],x[,][!index],:n[!index],col="red" > lineselps,col="blue" > > # c > namesd <- :0 > sortd > qqplotqchisqppoints500,df=p, d, main="", + xlab="theoretical Quantiles", ylab="sample Quantiles" > qqlined,distribution=functionx{qchisqx,df=p} 9