ANOVA: Analysis of Variance - Part I

The purpose of these notes is to discuss the theory behind the analysis of variance. They summarize the definitions and results presented in class, with a few exercises. The solutions of the exercises will follow in another file. In Part II, we will discuss linear models and their relationship with ANOVA models.

Symmetric Matrices

Symmetric Matrices: Let $A$ be an $m \times m$ matrix. It is said to be symmetric if $a_{ij} = a_{ji}$ for $i = 1, \ldots, m$ and $j = 1, \ldots, m$. Equivalently, it means that $A = A'$, where $A'$ is the transpose of $A$.

Spectral Decomposition: Let $A$ be a symmetric matrix of size $m \times m$. It has $m$ real eigenvalues $\lambda_1, \ldots, \lambda_m$ associated to the eigenvectors $e_1, \ldots, e_m$, where $\{e_1, \ldots, e_m\}$ forms an orthonormal basis. That is, $e_i'e_j = 0$ if $i \neq j$, and $e_i'e_i = 1$, that is, $e_i$ is a unit vector. Its spectral decomposition is
$$A = \sum_{i=1}^{m} \lambda_i e_i e_i'.$$

Recall: We say that $\lambda$ is an eigenvalue of the matrix $A$ if $A v = \lambda v$ for some non-zero vector $v$. We say that $v$ is the corresponding eigenvector.

Projection Matrix: Let $P$ be a symmetric matrix of size $n \times n$ whose eigenvalues are either 0 or 1. Then we say that $P$ is a projection matrix, since
$$P Y = \sum_{i=1}^{k} e_i e_i' Y = \sum_{i=1}^{k} (e_i'Y)\, e_i$$
represents the orthogonal projection of the vector $Y = [Y_1, \ldots, Y_n]'$ onto the subspace with basis $\{e_1, \ldots, e_k\}$ of dimension $k$.
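These facts are easy to check numerically. The following is a minimal numpy sketch (not part of the original notes; the matrix $A$ and the vector $Y$ are arbitrary examples) that rebuilds a symmetric matrix from its spectral decomposition and forms a projection from the first $k$ eigenvectors.

```python
import numpy as np

# A small symmetric matrix (an arbitrary example chosen for illustration).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# eigh returns real eigenvalues and orthonormal eigenvectors for symmetric matrices.
lam, E = np.linalg.eigh(A)            # columns of E are e_1, ..., e_m

# Spectral decomposition: A = sum_i lambda_i e_i e_i'
A_rebuilt = sum(lam[i] * np.outer(E[:, i], E[:, i]) for i in range(len(lam)))
print(np.allclose(A, A_rebuilt))      # True

# Projection matrix built from the first k eigenvectors: P = sum_{i<=k} e_i e_i'
k = 2
P = sum(np.outer(E[:, i], E[:, i]) for i in range(k))
Y = np.array([1.0, -2.0, 0.5])
print(np.allclose(P @ Y, sum((E[:, i] @ Y) * E[:, i] for i in range(k))))  # True
```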
Idempotent Symmetric Matrix: Consider a symmetric matrix $A$, i.e. $A = A'$. If $A^2 = A$, then it is said to be idempotent. A symmetric idempotent matrix is a projection matrix.

Exercise 1: Show that a symmetric idempotent matrix $A$ must have eigenvalues equal to either 0 or 1. Therefore, $A$ is a projection matrix.

Trace of a square matrix: Consider an $n \times n$ matrix $A$; its trace is defined as
$$\operatorname{tr}(A) = \sum_{i=1}^{n} a_{ii}.$$
In other words, the trace is the sum of the elements on the diagonal.

Properties of the trace: $\operatorname{tr}(A_1 + A_2) = \operatorname{tr}(A_1) + \operatorname{tr}(A_2)$; $\operatorname{tr}(cA) = c\,\operatorname{tr}(A)$; $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, as long as $AB$ and $BA$ are both square matrices.

Exercise 2: Consider $A$ a symmetric matrix of size $n \times n$. Show that
$$\operatorname{tr}(A) = \sum_{i=1}^{n} \lambda_i,$$
where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$. Hint: Consider the spectral decomposition of $A$. Let $P$ be a projection matrix. Show that $\operatorname{tr}(P)$ is the dimension of the subspace onto which the matrix is projecting. Hint: The eigenvalues of the projection matrix are either 0 or 1.
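A quick numerical sketch (again not part of the notes, using an arbitrary random symmetric matrix) illustrates the two facts behind Exercise 2: the trace equals the sum of the eigenvalues, and the trace of a projection equals the dimension of the subspace it projects onto.

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric matrix (illustrative only).
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2

lam, E = np.linalg.eigh(A)
print(np.isclose(np.trace(A), lam.sum()))          # tr(A) = sum of eigenvalues

# Projection onto the span of the first k eigenvectors: its trace is k.
k = 3
P = E[:, :k] @ E[:, :k].T
print(np.allclose(P @ P, P), np.allclose(P, P.T))  # idempotent and symmetric
print(np.isclose(np.trace(P), k))                  # tr(P) = dimension of the subspace
```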
Multivariate Normal Distribution

Consider $Y = [Y_1, \ldots, Y_n]'$ a vector of independent normal random variables, where $Y_i \sim N(\mu_i, \sigma_i^2)$, for $i = 1, \ldots, n$. Let $A$ be an $m \times n$ matrix. We say that the vector $W = A Y$ follows a multivariate normal distribution.

Exercise 3: Let $W = A Y$ be a multivariate normal random vector. Consider two components of $W$:
$$W_i = \sum_{k=1}^{n} a_{ik} Y_k \quad \text{and} \quad W_j = \sum_{k=1}^{n} a_{jk} Y_k.$$
Show that
$$\sigma\{W_i, W_j\} = \sum_{k=1}^{n} a_{ik} a_{jk}\, \sigma_k^2.$$

Theorem: Consider a multivariate normal random vector $W = A Y$. The components of $W$ are normal random variables. Uncorrelated components of $W$ are independent.

Proof: The $i$th component of $W$ is $\sum_{k=1}^{n} a_{ik} Y_k$, which is normal since it is a linear combination of independent normals.

Sketch of the proof: The moment generating function for a random vector $W = [W_1, \ldots, W_m]'$ is
$$M_W(t_1, \ldots, t_m) = E\{e^{t'W}\} = E\{e^{t_1 W_1 + \ldots + t_m W_m}\}.$$
It can be shown that a joint distribution is characterized by its moment generating function, assuming that it exists. In other words, two different
joint distributions cannot have the same moment generating function. We will consider two components:
$$W_i = \sum_{k=1}^{n} a_{ik} Y_k \quad \text{and} \quad W_j = \sum_{k=1}^{n} a_{jk} Y_k.$$
If $W_i$ and $W_j$ are independent, then
$$M_W(t_i, t_j) = E\{e^{t_i W_i}\}\, E\{e^{t_j W_j}\}. \qquad (1)$$
If we can show that $W_i$ and $W_j$ being uncorrelated implies (1), then this will mean that $(W_i, W_j)$ have the same joint distribution as independent random variables. Thus, they must be independent. So let's assume that $W_i$ and $W_j$ are uncorrelated; this means
$$\sigma\{W_i, W_j\} = \sum_{k=1}^{n} a_{ik} a_{jk}\, \sigma_k^2 = 0.$$

Recall: If $X \sim N(\mu, \sigma^2)$, then $M_X(t) = E\{e^{tX}\} = e^{t\mu + t^2\sigma^2/2}$.

We have
$$M(t_i, t_j) = E\{e^{t_i W_i + t_j W_j}\} = E\{e^{\sum_{k=1}^{n}(t_i a_{ik} + t_j a_{jk}) Y_k}\} = E\left\{\prod_{k=1}^{n} e^{(t_i a_{ik} + t_j a_{jk}) Y_k}\right\}$$
$$= \prod_{k=1}^{n} M_{Y_k}(t_i a_{ik} + t_j a_{jk}) \quad \text{by the independence of } Y_1, \ldots, Y_n$$
$$= \prod_{k=1}^{n} e^{(t_i a_{ik} + t_j a_{jk})\mu_k + (t_i a_{ik} + t_j a_{jk})^2 \sigma_k^2/2}$$
$$= e^{t_i \sum_{k=1}^{n} a_{ik}\mu_k + t_i^2 \sum_{k=1}^{n} a_{ik}^2 \sigma_k^2/2}\; e^{t_j \sum_{k=1}^{n} a_{jk}\mu_k + t_j^2 \sum_{k=1}^{n} a_{jk}^2 \sigma_k^2/2}\; e^{t_i t_j \sum_{k=1}^{n} a_{ik} a_{jk} \sigma_k^2}$$
$$= M_{W_i}(t_i)\, M_{W_j}(t_j), \quad \text{since } \sum_{k=1}^{n} a_{ik} a_{jk}\, \sigma_k^2 = 0.$$
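The covariance formula of Exercise 3, which is the key quantity in the argument above, can be checked by simulation. The following numpy sketch (not from the notes; the means, variances and matrix $A$ are arbitrary choices) compares the theoretical covariance $\sum_k a_{ik}a_{jk}\sigma_k^2$ with a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

# Independent normals Y_k ~ N(mu_k, sigma_k^2) and an arbitrary 2 x n matrix A.
n = 4
mu = np.array([1.0, -0.5, 2.0, 0.0])
sigma = np.array([1.0, 2.0, 0.5, 1.5])
A = rng.standard_normal((2, n))

# Simulate W = A Y for many draws of Y.
reps = 200_000
Y = mu + sigma * rng.standard_normal((reps, n))
W = Y @ A.T                              # rows are draws of W = [W_1, W_2]

# Theoretical covariance: sum_k a_1k a_2k sigma_k^2.
cov_theory = np.sum(A[0] * A[1] * sigma**2)
cov_mc = np.cov(W[:, 0], W[:, 1])[0, 1]
print(cov_theory, cov_mc)                # the two values should be close
```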
Theorem: Let $W = A Y$ and $U = B Y$ be multivariate normal vectors, where $Y_1, \ldots, Y_n$ are independent normal random variables with $Y_i \sim N(\mu_i, \sigma^2)$. Note: We are assuming a common variance. If $A B' = 0$, then $A Y$ and $B Y$ are independent.

Proof: Consider the $i$th component of $W = A Y$:
$$W_i = \sum_{k=1}^{n} a_{ik} Y_k,$$
and the $j$th component of $U = B Y$:
$$U_j = \sum_{k=1}^{n} b_{jk} Y_k.$$
It suffices to show that $W_i$ and $U_j$ are uncorrelated. Their covariance is
$$\sigma\{W_i, U_j\} = \sigma^2 \sum_{k=1}^{n} a_{ik} b_{jk} = 0,$$
since $A B' = 0$.

Remark: If the variances are not equal, then the covariance is
$$\sigma\{W_i, U_j\} = \sum_{k=1}^{n} a_{ik} b_{jk}\, \sigma_k^2,$$
which is not necessarily equal to zero when $A B' = 0$.
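As a small numerical illustration (not part of the notes), one can construct two matrices with $A B' = 0$ and verify by simulation that, under a common variance, the components of $A Y$ and $B Y$ are uncorrelated; the particular matrices below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two matrices with orthogonal rows, so that A B' = 0 (a constructed example).
A = np.array([[1.0, 1.0, 0.0, 0.0]])     # 1 x 4
B = np.array([[1.0, -1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])    # 2 x 4
print(np.allclose(A @ B.T, 0))           # A B' = 0

# Independent normals with a common variance sigma^2.
n, sigma = 4, 1.5
mu = np.array([0.5, -1.0, 2.0, 0.0])
reps = 200_000
Y = mu + sigma * rng.standard_normal((reps, n))

W = Y @ A.T                               # draws of W = A Y
U = Y @ B.T                               # draws of U = B Y

# Empirical covariances between components of W and U should be near zero.
print([np.cov(W[:, 0], U[:, j])[0, 1] for j in range(2)])
```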
Quadratic Forms

Definition: We say that $Q$ is a quadratic form of $Y = [Y_1, \ldots, Y_n]'$ if we can write it as follows:
$$Q = Y' A Y,$$
where $A$ is symmetric. Equivalently, we have
$$Q = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} Y_i Y_j,$$
where $a_{ij} = a_{ji}$. Note: We say that $A$ is the matrix of the quadratic form.

Properties of Quadratic Forms:

If $Q_1, \ldots, Q_k$ are quadratic forms of $Y_1, \ldots, Y_n$, then
$$Q = c_1 Q_1 + \ldots + c_k Q_k = Y' \left( \sum_{i=1}^{k} c_i A_i \right) Y$$
is also a quadratic form of $Y_1, \ldots, Y_n$.

If the matrix of $Q$ is a projection matrix $P$, then we can write $Q$ as a sum of squares:
$$Q = Y' P Y = (P Y)'(P Y),$$
since $P' = P$ and $P^2 = P$. Another representation that is useful is
$$Q = Y' \left( \sum_{i=1}^{k} e_i e_i' \right) Y = \sum_{i=1}^{k} Y' (e_i e_i') Y = \sum_{i=1}^{k} (e_i'Y)^2,$$
where $\lambda_1 = \ldots = \lambda_k = 1$ and $\lambda_i = 0$ for $i > k$.

Theorem: Consider $Y_1, \ldots, Y_n$ independent normal random variables with a common variance, i.e. $\sigma^2\{Y_i\} = \sigma^2$ for $i = 1, \ldots, n$.

If $P_1$ and $P_2$ are projection matrices such that $P_1 P_2 = 0$, then $Q_1 = Y' P_1 Y$ and $Q_2 = Y' P_2 Y$ are independent. It follows from the fact that they are functions of $P_1 Y$ and $P_2 Y$, which are independent.
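The two representations of $Y'PY$ as a sum of squares are easy to confirm numerically. The sketch below (not from the notes; the projection is built from an arbitrary random basis via a QR factorization) checks that $Y'PY = (PY)'(PY) = \sum_{i=1}^{k}(e_i'Y)^2$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Build a projection matrix from k orthonormal vectors (an arbitrary choice via QR).
n, k = 5, 2
E, _ = np.linalg.qr(rng.standard_normal((n, k)))   # columns e_1, ..., e_k
P = E @ E.T

Y = rng.standard_normal(n)

q1 = Y @ P @ Y                        # Y' P Y
q2 = (P @ Y) @ (P @ Y)                # (P Y)'(P Y)
q3 = np.sum((E.T @ Y) ** 2)           # sum of (e_i' Y)^2
print(np.isclose(q1, q2), np.isclose(q1, q3))      # both True
```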
The quadratic form $Y' P Y$ follows a $\chi^2$ distribution. From above, we see that the quadratic form is a sum of squares of $e_1'Y, e_2'Y, \ldots, e_k'Y$. Since $e_i'e_j = 0$ for $i \neq j$, these are independent normals. Furthermore,
$$\sigma^2\{e_i'Y\} = \sigma^2\, e_i'e_i = \sigma^2,$$
since $e_i'e_i = 1$ (i.e. $e_i$ is a unit vector).

Remarks: [Construction of $\chi^2$] We are using the following construction of a $\chi^2$ random variable. Let $Y_1, \ldots, Y_n$ be independent normal random variables with a common variance $\sigma^2$; then
$$\sum_{i=1}^{n} \frac{Y_i^2}{\sigma^2} \sim \chi^2,$$
with $\nu = n$ degrees of freedom and non-centrality parameter
$$nc = \frac{\sum_{i=1}^{n} \mu_i^2}{\sigma^2}.$$
The parameters come from the mean of the sum of squares, that is,
$$E\left\{\sum_{i=1}^{n} Y_i^2\right\} = n\,\sigma^2 + \sum_{i=1}^{n} \mu_i^2.$$

Expectation of a Quadratic Form: We showed in class that
$$E\{Y' A Y\} = \operatorname{tr}(A)\,\sigma^2 + \mu' A \mu,$$
where $\mu = [\mu_1, \ldots, \mu_n]'$. Therefore, if $P$ is a projection matrix, then
$$E\{Y' P Y\} = \operatorname{tr}(P)\,\sigma^2 + \mu' P \mu,$$
and $Y' P Y / \sigma^2 \sim \chi^2$ with $\nu = \operatorname{tr}(P)$ degrees of freedom and $nc = \mu' P \mu / \sigma^2$.

Exercise 4: Consider $Q = Q_1 + Q_2$; show that $\nu = \nu_1 + \nu_2$, where $Q$, $Q_1$, $Q_2$ are quadratic forms of independent random variables $Y_1, \ldots, Y_n$ with a common variance. In other words, show that the degrees of freedom are additive.
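The expectation formula for a quadratic form can also be checked by simulation. The following sketch (not part of the notes; the symmetric matrix, means and variance are arbitrary) compares a Monte Carlo estimate of $E\{Y'AY\}$ with $\operatorname{tr}(A)\sigma^2 + \mu'A\mu$.

```python
import numpy as np

rng = np.random.default_rng(4)

n, sigma = 4, 2.0
mu = np.array([1.0, 0.0, -1.0, 0.5])

# An arbitrary symmetric matrix for the quadratic form.
M = rng.standard_normal((n, n))
A = (M + M.T) / 2

# Monte Carlo estimate of E{Y' A Y} with Y_i ~ N(mu_i, sigma^2) independent.
reps = 200_000
Y = mu + sigma * rng.standard_normal((reps, n))
q_mc = np.mean(np.einsum('ri,ij,rj->r', Y, A, Y))

q_theory = np.trace(A) * sigma**2 + mu @ A @ mu
print(q_mc, q_theory)                    # the two values should be close
```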
Exercise 5: Let $Y_1, \ldots, Y_n$ be independent normals with a common variance, that is, $Y_i \sim N(\mu_i, \sigma^2)$.

1. Show that the total variability satisfies $SSTO/\sigma^2 \sim \chi^2$, by showing that $SSTO = Y' P Y$, where $P$ is a projection matrix, i.e. $P' = P$ and $P^2 = P$. The total variability is
$$SSTO = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \quad \text{where} \quad \bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i.$$

2. Show that the degrees of freedom are $\nu = n - 1$. Hint: Compute the trace of the matrix.

3. Give the form of the non-centrality parameter of $SSTO$.
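A numerical check can guide the exercise without replacing the proof. In the sketch below (not from the notes), one candidate matrix, the centering matrix $I - \frac{1}{n}\mathbf{1}\mathbf{1}'$, is an assumption to be verified when working the exercise; the code only confirms the identities numerically for a random $Y$.

```python
import numpy as np

rng = np.random.default_rng(5)

n = 6
Y = rng.standard_normal(n) + 2.0

# Candidate matrix: the centering matrix I - (1/n) 1 1'
# (an assumption to be verified when working Exercise 5).
P = np.eye(n) - np.ones((n, n)) / n

ssto = np.sum((Y - Y.mean()) ** 2)
print(np.isclose(ssto, Y @ P @ Y))                    # SSTO = Y' P Y
print(np.allclose(P, P.T), np.allclose(P @ P, P))     # symmetric and idempotent
print(np.isclose(np.trace(P), n - 1))                 # trace = n - 1
```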