Multivariate Analysis Homework 1

Similar documents
STT 843 Key to Homework 1 Spring 2018

Exam 2. Jeremy Morris. March 23, 2006

1. Density and properties Brief outline 2. Sampling from multivariate normal and MLE 3. Sampling distribution and large sample behavior of X and S 4.

Random Vectors 1. STA442/2101 Fall See last slide for copyright information. 1 / 30

The Multivariate Normal Distribution 1

Inferences about a Mean Vector

The Multivariate Normal Distribution 1

Introduction to Normal Distribution

Chapter 5 continued. Chapter 5 sections

TUTORIAL 2 STA437 WINTER 2015

MAS223 Statistical Inference and Modelling Exercises

Stat 206: the Multivariate Normal distribution

BIOS 2083 Linear Models Abdus S. Wahed. Chapter 2 84

Lecture 3. Inference about multivariate normal distribution

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables.

Random Variables and Their Distributions

Multivariate Analysis and Likelihood Inference

Lecture 11. Multivariate Normal theory

x. Figure 1: Examples of univariate Gaussian pdfs N (x; µ, σ 2 ).

7 Multivariate Statistical Models

Random Vectors and Multivariate Normal Distributions

Visualizing the Multivariate Normal, Lecture 9

Multivariate Distributions

01 Probability Theory and Statistics Review

Chapter 4. Multivariate Distributions. Obviously, the marginal distributions may be obtained easily from the joint distribution:

III - MULTIVARIATE RANDOM VARIABLES

STA 2201/442 Assignment 2

The Delta Method and Applications

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015

Multivariate Statistical Analysis

STAT 501 Assignment 1 Name Spring 2005

Table of Contents. Multivariate methods. Introduction II. Introduction I

Whitening and Coloring Transformations for Multivariate Gaussian Data. A Slecture for ECE 662 by Maliha Hossain

5 Inferences about a Mean Vector

8 - Continuous random vectors

The Multivariate Gaussian Distribution

Multivariate Linear Models

STA 2101/442 Assignment 3 1

Comparing two independent samples

Gaussian Models (9/9/13)

The purpose of this section is to derive the asymptotic distribution of the Pearson chi-square statistic. k (n j np j ) 2. np j.

5.1 Consistency of least squares estimates. We begin with a few consistency results that stand on their own and do not depend on normality.

Lecture Note 1: Probability Theory and Statistics

Applied Multivariate and Longitudinal Data Analysis

Exercises and Answers to Chapter 1

2. Multivariate Distributions

Introduction to Computational Finance and Financial Econometrics Probability Review - Part 2

ECON Fundamentals of Probability

(Multivariate) Gaussian (Normal) Probability Densities

Statistics II: Estimating simple parametric models

Appendix 2. The Multivariate Normal. Thus surfaces of equal probability for MVN distributed vectors satisfy

under the null hypothesis, the sign test (with continuity correction) rejects H 0 when α n + n 2 2.

STAT 501 Assignment 1 Name Spring Written Assignment: Due Monday, January 22, in class. Please write your answers on this assignment

A Probability Review

Multiple Random Variables

MATH5745 Multivariate Methods Lecture 07

Chapter 4. i=(;j E= E-1 = ~: (27l) I(:i) = V: exp -- ( -(Xi - 1) + -(Xl - i)(x2-3) + -(X2-3)2) 1 ( )2 2V2(

STA 437: Applied Multivariate Statistics

Multivariate Statistical Analysis

TAMS39 Lecture 2 Multivariate normal distribution

Elements of Probability Theory

Fall 07 ISQS 6348 Midterm Solutions

Lecture 2: Review of Probability

Stat 704 Data Analysis I Probability Review

18 Bivariate normal distribution I

Statistics. Lent Term 2015 Prof. Mark Thomson. 2: The Gaussian Limit

EEL 5544 Noise in Linear Systems Lecture 30. X (s) = E [ e sx] f X (x)e sx dx. Moments can be found from the Laplace transform as

Stat 710: Mathematical Statistics Lecture 31

3. Probability and Statistics

Principal Component Analysis for a Spiked Covariance Model with Largest Eigenvalues of the Same Asymptotic Order of Magnitude

1 Data Arrays and Decompositions

MLES & Multivariate Normal Theory

BASICS OF PROBABILITY

The Multivariate Gaussian Distribution [DRAFT]

STA442/2101: Assignment 5

Introduction to Computational Finance and Financial Econometrics Matrix Algebra Review

Formulas for probability theory and linear models SF2941

STAT 4385 Topic 01: Introduction & Review

Multivariate Random Variable

Random Matrices and Multivariate Statistical Analysis

Stat 5101 Notes: Brand Name Distributions

University of California, Berkeley

Confidence Intervals, Testing and ANOVA Summary

Multivariate Normal & Wishart

CLASSICAL NORMAL-BASED DISCRIMINANT ANALYSIS

MULTIVARIATE POPULATIONS

Lecture 15. Hypothesis testing in the linear model

First steps of multivariate data analysis

Canonical Correlation Analysis of Longitudinal Data

USEFUL PROPERTIES OF THE MULTIVARIATE NORMAL*

2.3. The Gaussian Distribution

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

MULTIVARIATE DISTRIBUTIONS

Elliptically Contoured Distributions

Basic Concepts in Matrix Algebra

Lecture 22: A Review of Linear Algebra and an Introduction to The Multivariate Normal Distribution

Math 181B Homework 1 Solution

Moment Generating Function. STAT/MTHE 353: 5 Moment Generating Functions and Multivariate Normal Distribution

Multivariate Analysis Homework 3

Transcription:

Multivariate Analysis Homework A490970 Yi-Chen Zhang March 6, 08 4.. Consider a bivariate normal population with µ = 0, µ =, σ =, σ =, and ρ = 0.5. a Write out the bivariate normal density. b Write out the squared generalized distance expression x µ T Σ x µ as a function of x and x. c Determine and sketch the constant-density contour that contains 50% of the probability. a The multivariate normal density is defined by the following equation. { fx = exp } π p/ Σ / x µt Σ x µ. x µ σ σ In this question, we have p =, x =, µ =, Σ = x µ σ σ 0 σ = ρ σ σ. Note that µ =, Σ =, Σ = Σ / =, and Σ =. So the bivariate normal density is fx = = { exp π / { exp 6π x x x x x + x } } x x, and =, b x µ T Σ x µ = x x x x = x x x + x. c For α = 0.5, the solid ellipsoid of x, x satisfy x µ T Σ x µ χ p,α = c will have probability 50%. From the quantile function in R we have χ,0.5 = qchisq0.5,df= =.86, therefore, c =.774. The eigenvalues of Σ are λ, λ =.660, 0.640 with eigenvectors 0.888 0.4597 e e =. 0.4597 0.888 Therefore, we have the axes as: c λ =.8 and c λ = 0.975. The contour is plotted in Figure.

x 0 4 4 0 4 x Figure : Contour that contains 50% of the probability 4.4. Let X be N µ, Σ with µ T =,, and Σ = a Find the distribution of X X + X. b Relabel the variables if necessary, and find a vector a such that X X a T X are independent. X and a Let a =,, T, then a T X = X X + X. Therefore, where and a T X Na T µ, a T Σa, a T µ = = a T Σa = = 9 The distribution of X X + X is N, 9. b Let a = T a a, then Y = X a T X = a X X + X a X. 0 0 X Now, let A =, then AX = NAµ, AΣA a a Y T, where 0 a 0 0 AΣA T = a a 0 a a = a + a a + a a 4a + a a + a +

Since we want to have X and Y independent, this implies that a a + = 0. So we have vector a = + c, for c R 0 4 0 4.6. Let X be distributed as N µ, Σ, where µ T =,, and Σ = 0 5 0. Which 0 of the following random variables are independent? Explain. a X and X b X and X c X and X d X, X and X e X and X + X X a σ = σ = 0, X and X are independent. b σ = σ =, X and X are not independent. c σ = σ = 0, X and X are independent. d We rearrange the covariance matrix and partition it. The new covariance matrix is as following: 4 0 Σ = 0 0 0 5 It is clear that X, X and X are independent. 0 0 X e Let A =, then AX = and AX NAµ, AΣA X + X X T, where 4 0 0 0 AΣA T = 0 5 0 0 0 0 4 6 = 6 6 It is clear that X and X + X X are not independent. 4.7. Refer to Exercise 4.6 and specify each of the following. a The conditional distribution of X, given that X = x. b The conditional distribution of X, given that X = x and X = x. X We use the result 4.6 from textbook. Let X = Nµ, Σ with µ = X Σ Σ and Σ = and Σ Σ Σ > 0. Then X X = x N µ + Σ Σ x µ, Σ Σ Σ Σ µ µ

a X X = x N + x, 4 X X = x N x +, b X X = x, X = x N + 0 5 0 x, 4 0 5 0 0 0 x 0 X X = x, X = x N x +, 4.6. Let X, X, X, and X 4 be independent N p µ, Σ random vectors. a Find the marginal distributions for each of the random vectors V = 4 X 4 X + 4 X 4 X 4 and V = 4 X + 4 X 4 X 4 X 4 b Find the joint density of the random vectors V and V defined in a. a By result 4.8 in the textbook, V and V have the following distribution n n N p c i µ, c i Σ i= i= Then we have V N p 0, Σ and V 4 N p 0, Σ. 4 b Also by result 4.8, V and V are jointly multivariate normal with covariance matrix n c i Σ b T cσ i= n b T cσ Σ, j= b j with c =,,, 4 4 4 4 T and b =,,, 4 4 4 4 T. So that we have the joint distribution of V and V as following: V 0 N V p, Σ 0 4 0 0 Σ 4 4.8. Find the maximum likelihood estimates of the mean vector µ and the covariance matrix Σ based on the random sample 6 X = 4 4 5 7 4 7 from a bivariate normal population. 4

Since the random samples X, X, X, and X 4 are from normal population, the maximum likelihood estimates of µ and Σ are X and n X i n XX i X T. Therefore, ˆµ = X = 4 6 and Σ = 4 i= 4 X i XX i X T = i= / /4 /4 / 4.9. Let X, X,..., X 0 be a random sample of size n = 0 from an N 6 µ, Σ population. Specify each of the following completely. a The distribution of X µ T Σ X µ b The distributions of X and n X µ c The distribution of n S a X µ T Σ X µ is distributed as χ 6 b X is distributed as N6 µ, Σ and n X 0 µ is distributed as N6 0, Σ 0 c n S is distributed as Wishart distribution Z i Zi T, where Z i N 6 0, Σ. We write this as W 6 9, Σ, i.e., Wishart distribution with dimensionality 6, degrees of freedom 9, and covariance matrix Σ. 4.. Let X,..., X 60 be a random sample of size 60 from a four-variate normal distribution having mean µ and covariance Σ. Specify each of the following completely. a The distribution of X b The distribution of X µ T Σ X µ c The distribution of n X µ T Σ X µ d The approximate distribution of n X µ T S X µ a X is distributed as N4 µ, 60 Σ. b X µ T Σ X µ is distributed as χ 4. c n X µ T Σ X µ is distributed as χ 4. d Since 60 4, n X µ T S X µ can be approximated as χ 4. 4.. Consider the annual rates of return including dividends on the Dow-Jones industrial average for the years 996-005. These data, multiplied by 00, are 0.6. 5. 6.8 7. 6. 5..6 6.0 Use these 0 observations to complete the following. a Construct a Q-Q plot. Do the data seem to be normally distributed? Explain. b Carry out a test of normality based on the correlation coefficient r Q. Let the significance level be α = 0.. i= a The Q-Q plot of this data is plotted in Figure. It seems that all the sample quantiles are close the theoretical quantiles. However, the Q-Q plots are not particularly informative unless the sample size is moderate to large, for instance, n 0. There can be quite a bit of variability in the straightness of the Q-Q plot for small samples, even when the observations are known to come from a normal population. 5

Sample Quantiles 0 0 0 0.5.0 0.5 0.0 0.5.0.5 Theoretical Quantiles Figure : Normal Q-Q plot b From 4- in the textbook, the q Q is defined by r Q = n n j= x j xq j q j= x j x n j= q j q Using the information from the data, we have r Q = 0.95. The R code of this calculation is compiled in Appendix. From Table 4. in the textbook we know that the critical point to test of normality at the 0% level of significance corresponding to n = 9 and α = 0. is between 0.90 and 0.95. Since r Q = 0.95 > the critical point, we do not reject the hypothesis of normality. 4.6. Exercise. gives the age x, measured in years, as well as the selling price x, measured in thousands of dollars, for n = 0 used cars. These data are reproduced as follows: x 4 5 6 8 9 x 8.95 9.00 7.95 5.54 4.00.95 8.94 7.49 6.00.99 a Use the results of Exercise. to calculate the squared statistical distances x j x T S x j x, j =,,..., 0, where x T j = x j, x j. b Using the distances in Part a, determine the proportion of the observations falling within the estimated 50% probability contour of a bivariate normal distribution. c Order the distances in Part a and construct a chi-square plot. d Given the results in Parts b and c, are these data approximately bivariate normal? Explain. x 5. 0.6 7.70 a From Exercise. we have x = = and S =. x.48 7.70 0.8544 The squared statistical distances d j = x j x T S x j x, j =,..., 0 are calculated and listed below d d d d 4 d 5 d 6 d 7 d 8 d 9 d 0.875.00.9009 0.75 0.05 0.076.79 0.865.75 4.5 6

b We plot the data points and 50% probability contour the blue ellipse in Figure. It is clear that subject 4, 5, 6, 8, and 9 are falling within the estimated 50% probability contour. The proportion of that is 0.5. x 5 0 5 4 5 6 7 8 9 4 6 8 0 x 0 Figure : Contour of a bivariate normal c The squared distances in Part a are ordered as below. The chi-square plot is shown in Figure 4. d 6 d 5 d 4 d 8 d 9 d d d d 7 d 0 0.076 0.05 0.75 0.865.75.875.00.9009.79 4.5 Sample Quantiles 0 4 0 4 6 8 0 4 Theoretical Quantiles Figure 4: Chi-square plot d Given the results in Parts b and c, we conclude these data are approximately bivariate normal. Most of the data are around the theoretical line. 7

Appendix R code for Problem 4. c. > libraryellipse > librarymass > librarymvtnorm > set.seed > > mu <- c0, > Sigma <- matrixc,sqrt/,sqrt/,, nrow=, ncol= > X <- mvrnormn=0000,mu=mu, Sigma=Sigma > lambda <- eigensigma$values > Gamma <- eigensigma$vectors > elps <- ttellipsesigma, level=0.5, npoints=000+mu > chi <- qchisq0.5,df= > c <- sqrtchi > factor <- c*sqrtlambda > plotx[,],x[,] > lineselps > pointsmu[], mu[] > segmentsmu[],mu[],factor[]*gamma[,],factor[]*gamma[,]+mu[] > segmentsmu[],mu[],factor[]*gamma[,],factor[]*gamma[,]+mu[] R code for Problem 4.. > x <- c-0.6,., 5., -6.8, -7., -6., 5.,.6, 6.0 > # a > qqnormx > qqlinex > # b > y <- sortx > n <- lengthy > p <- :n-0.5/n > q <- qnormp > rq <- cory,q R code for Problem 4.6. > n <- 0 > x <- c,,,,4,5,6,8,9, > x <- c8.95, 9.00, 7.95, 5.54, 4.00,.95, 8.94, 7.49, 6.00,.99 > X <- cbindx,x > Xbar <- colmeansx > S <- covx > Sinv <- solves > > # a > d <- diagttx-xbar%*%sinv%*%tx-xbar > > # b > libraryellipse 8

> p <- > elps <- ttellipses, level=0.85, npoints=000+xbar > plotx[,],x[,],type="n" > index <- d < qchisq0.5,df=p > textx[,][index],x[,][index],:n[index],col="blue" > textx[,][!index],x[,][!index],:n[!index],col="red" > lineselps,col="blue" > > # c > namesd <- :0 > sortd > qqplotqchisqppoints500,df=p, d, main="", + xlab="theoretical Quantiles", ylab="sample Quantiles" > qqlined,distribution=functionx{qchisqx,df=p} 9