Preliminaries

Copyright © 2018 Dan Nettleton (Iowa State University), Statistics 510
Notation for Scalars, Vectors, and Matrices

Lowercase letters denote scalars: $x$, $c$, $\sigma$.

Boldface, lowercase letters denote vectors: $\mathbf{x}$, $\mathbf{y}$, $\boldsymbol{\beta}$.

Boldface, uppercase letters denote matrices: $\mathbf{A}$, $\mathbf{X}$, $\boldsymbol{\Sigma}$.
The Product of a Scalar and a Matrix

Suppose $A$ is a matrix with $m$ rows and $n$ columns. Let $a_{ij} \in \mathbb{R}$ be the element (or entry) in the $i$th row and $j$th column of $A$. We convey all this information with the notation $\underset{m \times n}{A} = [a_{ij}]$.

For any $c \in \mathbb{R}$, $cA = Ac = [c\,a_{ij}]$; i.e., the product of the scalar $c$ and the matrix $A = [a_{ij}]$ is the matrix whose entry in the $i$th row and $j$th column is $c$ times $a_{ij}$, for each $i = 1, \ldots, m$ and $j = 1, \ldots, n$.
Vector and Vector Transpose

In STAT 510, a vector is a matrix with one column:
$$x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$
In STAT 510, we use $x'$ to denote the transpose of the vector $x$: $x' = [x_1, \ldots, x_n]$; i.e., $x$ is a matrix with one column, and $x'$ is the matrix with the same entries as $x$ but written as a row rather than a column.
Transpose of a Matrix

Suppose $A$ is an $m \times n$ matrix. Then we may write $A$ as $[a_1, \ldots, a_n]$, where $a_i$ is the $i$th column of $A$ for each $i = 1, \ldots, n$. The transpose of the matrix $A$ is
$$A' = [a_1, \ldots, a_n]' = \begin{bmatrix} a_1' \\ \vdots \\ a_n' \end{bmatrix}.$$
Matrix Multiplication

Suppose $\underset{m \times n}{A} = [a_{ij}]$ and $\underset{n \times k}{B} = [b_{ij}]$. Then
$$\underset{m \times n}{A}\,\underset{n \times k}{B} = \underset{m \times k}{C} = \left[c_{ij} = \sum_{l=1}^{n} a_{il} b_{lj}\right].$$
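The entrywise formula above can be checked numerically. A minimal sketch, assuming NumPy is available; the $2 \times 3$ and $3 \times 2$ matrices are illustrative:

```python
# Verify the definition c_ij = sum_l a_il * b_lj against NumPy's
# built-in matrix product on a small example.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])             # 3 x 2

C = A @ B                              # 2 x 2 product

# Recompute each entry directly from the definition.
m, n = A.shape
_, k = B.shape
C_def = np.array([[sum(A[i, l] * B[l, j] for l in range(n))
                   for j in range(k)]
                  for i in range(m)])

print(np.allclose(C, C_def))           # True
```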
Matrix Multiplication Special Cases

If $a = \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}$ and $b = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix}$, then
$$a'b = \sum_{i=1}^{n} a_i b_i.$$
Also,
$$a'a = \sum_{i=1}^{n} a_i^2 = \|a\|^2.$$
$$\sqrt{a'a} = \|a\| = \sqrt{\sum_{i=1}^{n} a_i^2}$$
is known as the Euclidean norm of $a$.
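A minimal sketch, assuming NumPy, of the inner product $a'b$ and the Euclidean norm $\|a\| = \sqrt{a'a}$; the vectors are illustrative:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])

inner = a @ b           # a'b = 3*1 + 4*2 = 11
norm = np.sqrt(a @ a)   # ||a|| = sqrt(9 + 16) = 5
print(inner, norm)      # 11.0 5.0
```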
Another Look at Matrix Multiplication

Suppose
$$\underset{m \times n}{A} = [a_{ij}] = [a_1, \ldots, a_n] = \begin{bmatrix} a^{(1)\prime} \\ \vdots \\ a^{(m)\prime} \end{bmatrix}
\quad \text{and} \quad
\underset{n \times k}{B} = [b_{ij}] = [b_1, \ldots, b_k] = \begin{bmatrix} b^{(1)\prime} \\ \vdots \\ b^{(n)\prime} \end{bmatrix}.$$
Then
$$\underset{m \times n}{A}\,\underset{n \times k}{B} = \underset{m \times k}{C} = \left[c_{ij} = \sum_{l=1}^{n} a_{il} b_{lj}\right] = \left[c_{ij} = a^{(i)\prime} b_j\right] = [Ab_1, \ldots, Ab_k] = \begin{bmatrix} a^{(1)\prime} B \\ \vdots \\ a^{(m)\prime} B \end{bmatrix} = \sum_{l=1}^{n} a_l\, b^{(l)\prime}.$$
Transpose of a Matrix Product

The transpose of a matrix product is the product of the transposes in reverse order; i.e.,
$$(AB)' = B'A'.$$
Rank and Trace

The rank of a matrix $A$ is written as $\text{rank}(A)$ and is the maximum number of linearly independent rows (or columns) of $A$.

The trace of an $n \times n$ matrix $A$ is written as $\text{trace}(A)$ and is the sum of the diagonal elements of $A$; i.e., $\text{trace}(A) = \sum_{i=1}^{n} a_{ii}$.
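A short sketch, assuming NumPy, on an illustrative $3 \times 3$ matrix whose second row is twice its first (so the rank is less than 3):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # 2 * row 1, linearly dependent
              [0.0, 1.0, 1.0]])

r = np.linalg.matrix_rank(A)     # 2: only two independent rows
t = np.trace(A)                  # 1 + 4 + 1 = 6
print(r, t)
```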
Idempotent Matrices

A matrix $A$ is said to be idempotent if $AA = A$. The rank of an idempotent matrix is equal to its trace; i.e., $\text{rank}(A) = \text{trace}(A)$.
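A classic idempotent matrix is the orthogonal projection $P = X(X'X)^{-1}X'$ onto the column space of a full-rank $X$; a sketch, assuming NumPy, with an illustrative $3 \times 2$ design matrix:

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])                  # full column rank

P = X @ np.linalg.inv(X.T @ X) @ X.T        # projection onto C(X)

idempotent = np.allclose(P @ P, P)          # PP = P
rank_P = np.linalg.matrix_rank(P)           # 2 = dim of C(X)
trace_P = np.trace(P)                       # equals rank_P, up to rounding
print(idempotent, rank_P, trace_P)
```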
Linear Combinations and Column Spaces

$Ab$ is a linear combination of the columns of an $m \times n$ matrix $A$:
$$Ab = [a_1, \ldots, a_n] \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} = b_1 a_1 + \cdots + b_n a_n.$$
The set of all possible linear combinations of the columns of $A$ is called the column space of $A$ and is written as
$$\mathcal{C}(A) = \{Ab : b \in \mathbb{R}^n\}.$$
Note that $\mathcal{C}(A) \subseteq \mathbb{R}^m$.
Inverse of a Matrix

An $m \times n$ matrix $A$ is said to be square if $m = n$. A square matrix $A$ is nonsingular (or invertible) if there exists a square matrix $B$ such that $AB = I$. If $A$ is nonsingular and $AB = I$, then $B$ is the unique inverse of $A$ and is written as $A^{-1}$.

For a nonsingular matrix $A$, we have $AA^{-1} = I$. (It is also true that $A^{-1}A = I$.) A square matrix without an inverse is called singular.
Generalized Inverses

$G$ is a generalized inverse of an $m \times n$ matrix $A$ if $AGA = A$. We usually denote a generalized inverse of $A$ by $A^-$.

If $A$ is nonsingular, i.e., if $A^{-1}$ exists, then $A^{-1}$ is the one and only generalized inverse of $A$:
$$AA^{-1}A = IA = AI = A.$$
If $A$ is singular, i.e., if $A^{-1}$ does not exist, then there are infinitely many generalized inverses of $A$.
Finding a Generalized Inverse of a Matrix A

1. Find any $r \times r$ nonsingular submatrix of $A$, where $r = \text{rank}(A)$. Call this matrix $W$.
2. Invert and transpose $W$; i.e., compute $(W^{-1})'$.
3. Replace each element of $W$ in $A$ with the corresponding element of $(W^{-1})'$.
4. Replace all other elements of $A$ with zeros.
5. Transpose the resulting matrix to obtain $G$, a generalized inverse of $A$.
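The steps above can be sketched in NumPy on a concrete rank-1 matrix; the matrix and the choice of submatrix $W$ are illustrative (step 1 is done by hand here rather than by search):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])             # rank(A) = 1, singular

# Step 1: pick a 1 x 1 nonsingular submatrix W (here, the (0, 0) entry).
rows, cols = [0], [0]
W = A[np.ix_(rows, cols)]

# Step 2: invert and transpose W.
W_inv_t = np.linalg.inv(W).T

# Steps 3-4: put (W^{-1})' in W's positions; zeros everywhere else.
G = np.zeros_like(A)
G[np.ix_(rows, cols)] = W_inv_t

# Step 5: transpose the result to obtain a generalized inverse G.
G = G.T

print(np.allclose(A @ G @ A, A))       # True: AGA = A
```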
Positive and Non-Negative Definite Matrices

$x'Ax$ is known as a quadratic form.

We say that an $n \times n$ matrix $A$ is positive definite (PD) iff $A$ is symmetric (i.e., $A = A'$) and $x'Ax > 0$ for all $x \in \mathbb{R}^n \setminus \{0\}$.

We say that an $n \times n$ matrix $A$ is non-negative definite (NND) iff $A$ is symmetric (i.e., $A = A'$) and $x'Ax \geq 0$ for all $x \in \mathbb{R}^n$.
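For symmetric matrices, an equivalent check (a standard fact, not stated on the slide) is via eigenvalues: PD iff all eigenvalues are positive, NND iff all are non-negative. A sketch assuming NumPy, with illustrative matrices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                       # eigenvalues 1 and 3
is_A_pd = bool(np.all(np.linalg.eigvalsh(A) > 0))

B = np.array([[1.0, 1.0],
              [1.0, 1.0]])                       # eigenvalues 0 and 2
eig_B = np.linalg.eigvalsh(B)
is_B_nnd = bool(np.all(eig_B >= -1e-12))         # NND
is_B_pd = bool(np.all(eig_B > 1e-12))            # but not PD

print(is_A_pd, is_B_nnd, is_B_pd)
```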
Positive and Non-Negative Definite Matrices

A matrix that is positive definite is nonsingular; i.e., $A$ positive definite $\Rightarrow$ $A^{-1}$ exists.

A matrix that is non-negative definite but not positive definite is singular.
Random Vectors

A random vector is a vector whose components are random variables:
$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$
Expected Value of a Random Vector

The expected value, or mean, of a random vector $y$ is the vector of expected values of the components of $y$:
$$E(y) = E\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_n) \end{bmatrix}.$$
Variance of a Random Vector

The variance of a random vector $y = [y_1, y_2, \ldots, y_n]'$ is the matrix whose $i,j$th element is $\text{Cov}(y_i, y_j)$ $(i, j \in \{1, \ldots, n\})$:
$$\text{Var}(y) = \begin{bmatrix}
\text{Cov}(y_1, y_1) & \text{Cov}(y_1, y_2) & \cdots & \text{Cov}(y_1, y_n) \\
\text{Cov}(y_2, y_1) & \text{Cov}(y_2, y_2) & \cdots & \text{Cov}(y_2, y_n) \\
\vdots & \vdots & \ddots & \vdots \\
\text{Cov}(y_n, y_1) & \text{Cov}(y_n, y_2) & \cdots & \text{Cov}(y_n, y_n)
\end{bmatrix}.$$
Variance of a Random Vector

The covariance of a random variable with itself is the variance of that random variable. Thus,
$$\text{Var}(y) = \begin{bmatrix}
\text{Var}(y_1) & \text{Cov}(y_1, y_2) & \cdots & \text{Cov}(y_1, y_n) \\
\text{Cov}(y_2, y_1) & \text{Var}(y_2) & \cdots & \text{Cov}(y_2, y_n) \\
\vdots & \vdots & \ddots & \vdots \\
\text{Cov}(y_n, y_1) & \text{Cov}(y_n, y_2) & \cdots & \text{Var}(y_n)
\end{bmatrix}.$$
Covariance Between Two Random Vectors

The covariance between random vectors $u = [u_1, \ldots, u_m]'$ and $v = [v_1, \ldots, v_n]'$ is the matrix whose $i,j$th element is $\text{Cov}(u_i, v_j)$ $(i \in \{1, \ldots, m\},\ j \in \{1, \ldots, n\})$:
$$\text{Cov}(u, v) = \begin{bmatrix}
\text{Cov}(u_1, v_1) & \text{Cov}(u_1, v_2) & \cdots & \text{Cov}(u_1, v_n) \\
\text{Cov}(u_2, v_1) & \text{Cov}(u_2, v_2) & \cdots & \text{Cov}(u_2, v_n) \\
\vdots & \vdots & \ddots & \vdots \\
\text{Cov}(u_m, v_1) & \text{Cov}(u_m, v_2) & \cdots & \text{Cov}(u_m, v_n)
\end{bmatrix} = E(uv') - E(u)E(v').$$
Linear Transformation of a Random Vector

If $y$ is an $n \times 1$ random vector, $A$ is an $m \times n$ matrix of constants, and $b$ is an $m \times 1$ vector of constants, then $Ay + b$ is a linear transformation of the random vector $y$.
Mean, Variance, and Covariance of Linear Transformations of a Random Vector y

$$E(Ay + b) = AE(y) + b$$
$$\text{Var}(Ay + b) = A\,\text{Var}(y)\,A'$$
$$\text{Cov}(Ay + b,\ Cy + d) = A\,\text{Var}(y)\,C'$$
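These identities can be illustrated by simulation; a sketch assuming NumPy, with an illustrative $\mu$, $\Sigma$, $A$, and $b$, and a fixed seed so the Monte Carlo estimates are reproducible:

```python
# Check E(Ay + b) = A E(y) + b and Var(Ay + b) = A Var(y) A' by
# simulating many draws of y ~ N(mu, Sigma) and transforming them.
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0, 1.0],
              [1.0, -1.0]])
b = np.array([3.0, 0.0])

y = rng.multivariate_normal(mu, Sigma, size=200_000)  # rows are draws
z = y @ A.T + b                                       # Ay + b per draw

mean_hat = z.mean(axis=0)      # should approximate A mu + b = [3, 2]
var_hat = np.cov(z.T)          # should approximate A Sigma A'
```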
Standard Multivariate Normal Distributions

If $z_1, \ldots, z_n \overset{iid}{\sim} N(0, 1)$, then
$$z = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}$$
has a standard multivariate normal distribution: $z \sim N(0, I)$.
Multivariate Normal Distributions

Suppose $z$ is an $n \times 1$ standard multivariate normal random vector, i.e., $z \sim N(0, I_{n \times n})$. Suppose $A$ is an $m \times n$ matrix of constants and $\mu$ is an $m \times 1$ vector of constants. Then $Az + \mu$ has a multivariate normal distribution with mean $\mu$ and variance $AA'$:
$$z \sim N(0, I) \Rightarrow Az + \mu \sim N(\mu, AA').$$
Multivariate Normal Distributions

If $\mu$ is an $m \times 1$ vector of constants and $\Sigma$ is an $m \times m$ symmetric, non-negative definite (NND) matrix of rank $n$, then $N(\mu, \Sigma)$ signifies the multivariate normal distribution with mean $\mu$ and variance $\Sigma$.

If $y \sim N(\mu, \Sigma)$, then $y \overset{d}{=} Az + \mu$, where $z \sim N(0, I_{n \times n})$ and $A$ is an $m \times n$ matrix of rank $n$ such that $AA' = \Sigma$.
Linear Transformations of Multivariate Normal Distributions are Multivariate Normal

$$y \sim N(\mu, \Sigma) \Rightarrow y \overset{d}{=} Az + \mu, \quad z \sim N(0, I), \quad AA' = \Sigma$$
$$\Rightarrow Cy + d \overset{d}{=} C(Az + \mu) + d$$
$$\Rightarrow Cy + d \overset{d}{=} CAz + C\mu + d$$
$$\Rightarrow Cy + d \overset{d}{=} Mz + u, \quad M \equiv CA, \quad u \equiv C\mu + d$$
$$\Rightarrow Cy + d \sim N(u, MM').$$
Non-Central Chi-Square Distributions

If $y \sim N(\mu, I_{n \times n})$, then $w \equiv y'y = \sum_{i=1}^{n} y_i^2$ has a non-central chi-square distribution with $n$ degrees of freedom and non-centrality parameter $\mu'\mu/2$:
$$w \sim \chi^2_n(\mu'\mu/2).$$
(Some define the non-centrality parameter as $\mu'\mu$ rather than $\mu'\mu/2$.)
Central Chi-Square Distributions

If $z \sim N(0, I_{n \times n})$, then $w \equiv z'z = \sum_{i=1}^{n} z_i^2$ has a central chi-square distribution with $n$ degrees of freedom: $w \sim \chi^2_n$.

A central chi-square distribution is a non-central chi-square distribution with non-centrality parameter 0: $w \sim \chi^2_n(0)$.
Important Distributional Result about Quadratic Forms

Suppose $\Sigma$ is an $n \times n$ positive definite matrix. Suppose $A$ is an $n \times n$ symmetric matrix of rank $m$ such that $A\Sigma$ is idempotent (i.e., $A\Sigma A\Sigma = A\Sigma$). Then
$$y \sim N(\mu, \Sigma) \Rightarrow y'Ay \sim \chi^2_m(\mu'A\mu/2).$$
Mean and Variance of Chi-Square Distributions

If $w \sim \chi^2_m(\theta)$, then
$$E(w) = m + 2\theta \quad \text{and} \quad \text{Var}(w) = 2m + 8\theta.$$
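A sketch, assuming SciPy, that checks these moment formulas. Note SciPy's `ncx2` parameterizes non-centrality as $\mu'\mu$, so its `nc` equals $2\theta$ under the $\mu'\mu/2$ convention used here:

```python
from scipy.stats import ncx2

m, theta = 5, 1.5
mean, var = ncx2.stats(df=m, nc=2 * theta, moments="mv")

print(float(mean))   # 8.0  = m + 2*theta
print(float(var))    # 22.0 = 2*m + 8*theta
```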
Non-Central t Distributions

Suppose $y \sim N(\delta, 1)$, $w \sim \chi^2_m$, and $y$ and $w$ are independent. Then $y/\sqrt{w/m}$ has a non-central t distribution with $m$ degrees of freedom and non-centrality parameter $\delta$:
$$\frac{y}{\sqrt{w/m}} \sim t_m(\delta).$$
Central t Distributions

Suppose $z \sim N(0, 1)$, $w \sim \chi^2_m$, and $z$ and $w$ are independent. Then $z/\sqrt{w/m}$ has a central t distribution with $m$ degrees of freedom:
$$\frac{z}{\sqrt{w/m}} \sim t_m.$$
The distribution $t_m$ is the same as $t_m(0)$.
Non-Central F Distributions

Suppose $w_1 \sim \chi^2_{m_1}(\theta)$, $w_2 \sim \chi^2_{m_2}$, and $w_1$ and $w_2$ are independent. Then $(w_1/m_1)/(w_2/m_2)$ has a non-central F distribution with $m_1$ numerator degrees of freedom, $m_2$ denominator degrees of freedom, and non-centrality parameter $\theta$:
$$\frac{w_1/m_1}{w_2/m_2} \sim F_{m_1, m_2}(\theta).$$
Central F Distributions

Suppose $w_1 \sim \chi^2_{m_1}$, $w_2 \sim \chi^2_{m_2}$, and $w_1$ and $w_2$ are independent. Then $(w_1/m_1)/(w_2/m_2)$ has a central F distribution with $m_1$ numerator degrees of freedom and $m_2$ denominator degrees of freedom:
$$\frac{w_1/m_1}{w_2/m_2} \sim F_{m_1, m_2}$$
(which is the same as the $F_{m_1, m_2}(0)$ distribution).
Relationship between t and F Distributions

If $u \sim t_m(\delta)$, then $u^2 \sim F_{1,m}(\delta^2/2)$.
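In the central case ($\delta = 0$), the relationship can be checked through CDFs: if $u \sim t_m$, then $P(u^2 \le x) = P(|u| \le \sqrt{x})$, which should equal the $F_{1,m}$ CDF at $x$. A sketch assuming SciPy, with illustrative $m$ and $x$:

```python
import numpy as np
from scipy.stats import t, f

m, x = 7, 2.5
p_from_t = 2 * t.cdf(np.sqrt(x), df=m) - 1   # P(|u| <= sqrt(x)), u ~ t_m
p_from_f = f.cdf(x, dfn=1, dfd=m)            # P(v <= x), v ~ F_{1,m}

print(np.isclose(p_from_t, p_from_f))        # True
```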
Some Independence ($\perp$) Results

Suppose $y \sim N(\mu, \Sigma)$, where $\Sigma$ is an $n \times n$ PD matrix.

If $A_1$ is an $n_1 \times n$ matrix of constants and $A_2$ is an $n_2 \times n$ matrix of constants, then
$$A_1 \Sigma A_2' = 0 \Rightarrow A_1 y \perp A_2 y.$$
If $A_1$ is an $n_1 \times n$ matrix of constants and $A_2$ is an $n \times n$ symmetric matrix of constants, then
$$A_1 \Sigma A_2 = 0 \Rightarrow A_1 y \perp y' A_2 y.$$
If $A_1$ and $A_2$ are $n \times n$ symmetric matrices of constants, then
$$A_1 \Sigma A_2 = 0 \Rightarrow y' A_1 y \perp y' A_2 y.$$
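A sketch, assuming NumPy, of the first result: it verifies the condition $A_1 \Sigma A_2' = 0$ for an illustrative pair of row vectors with $\Sigma = I$, then uses a seeded simulation to show the two linear combinations are (empirically) uncorrelated. This only exhibits zero correlation, not a proof of full independence:

```python
import numpy as np

Sigma = np.eye(2)
A1 = np.array([[1.0, 1.0]])
A2 = np.array([[1.0, -1.0]])

cond = np.allclose(A1 @ Sigma @ A2.T, 0)     # A1 Sigma A2' = 0 holds

rng = np.random.default_rng(1)
y = rng.multivariate_normal(np.zeros(2), Sigma, size=100_000)
u = (y @ A1.T).ravel()                       # draws of A1 y
v = (y @ A2.T).ravel()                       # draws of A2 y
corr = np.corrcoef(u, v)[0, 1]               # should be near 0

print(cond, corr)
```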