
Notation

S. Puntanen et al., Matrix Tricks for Linear Statistical Models: Our Personal Top Twenty, DOI 10.1007/978-3-642-10473-2, Springer-Verlag Berlin Heidelberg 2011.

$\mathbb{R}$ : real numbers
$\mathbb{R}^{n \times m}$ : set of $n \times m$ real matrices
$\mathbb{R}^{n \times m}_r$ : subset of $\mathbb{R}^{n \times m}$ consisting of matrices with rank $r$
$\mathbb{R}^{n}_{s}$ : subset of $\mathbb{R}^{n \times n}$ consisting of symmetric matrices
$\mathrm{NND}_n$ : subset of $\mathbb{R}^{n}_{s}$ consisting of nonnegative definite (nnd) matrices: $A \in \mathrm{NND}_n \Leftrightarrow A = LL'$ for some $L$; instead of nnd, the term positive semidefinite is often used
$\mathrm{PD}_n$ : subset of $\mathrm{NND}_n$ consisting of positive definite (pd) matrices: $A = LL'$ for some nonsingular $L$
$0$ : null vector, null matrix; denoted also as $0_n$ or $0_{n \times m}$
$1_n$ : column vector of ones, shortened $1$
$I_n$ : identity matrix, shortened $I$
$i_j$ : the $j$th column of $I$; the $j$th standard basis vector
$A = \{a_{ij}\}$ : matrix $A$ with its elements $a_{ij}$
$A_{n \times m}$ : $n \times m$ matrix $A$
$a$ : column vector $a \in \mathbb{R}^n$
$A'$ : transpose of the matrix $A$
$(A : B)$ : partitioned (augmented) matrix
$A = (a_1 : \dots : a_m)$ : $A_{n \times m}$ represented columnwise
$A = \begin{pmatrix} a_{(1)}' \\ \vdots \\ a_{(n)}' \end{pmatrix}$ : $A_{n \times m}$ represented row-wise
$A^{-1}$ : inverse of the matrix $A$
$A^{-}$ : generalized inverse of the matrix $A$: $A A^{-} A = A$; also called $\{1\}$-inverse, or inner inverse
$\{A^{-}\}$ : the set of generalized inverses of $A$
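A minimal NumPy sketch (not from the book) of the defining property $AA^{-}A = A$, using the Moore–Penrose inverse as one member of $\{A^{-}\}$; the example matrix and variable names are arbitrary.

```python
# Check the {1}-inverse property A A^- A = A for a rank-deficient matrix,
# using the Moore-Penrose inverse (one particular generalized inverse).
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])          # rank 1, so no ordinary inverse exists
A_ginv = np.linalg.pinv(A)            # A^+ is in particular a {1}-inverse
assert np.allclose(A @ A_ginv @ A, A)  # AA^-A = A holds
```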

$A^{-}_{12}$ : reflexive generalized inverse of $A$: $A A^{-} A = A$, $A^{-} A A^{-} = A^{-}$; also called $\{12\}$-inverse
$A^{+}$ : the Moore–Penrose inverse of $A$: the unique matrix satisfying the four Moore–Penrose conditions: (mp1) $A A^{+} A = A$, (mp2) $A^{+} A A^{+} = A^{+}$, (mp3) $(A A^{+})' = A A^{+}$, (mp4) $(A^{+} A)' = A^{+} A$
$A^{-}_{ij}$ : generalized inverse of $A$ satisfying the Moore–Penrose conditions (mp$i$) and (mp$j$)
$\mathrm{In}(A) = (\pi, \nu, \delta)$ : inertia of the square matrix $A$: $\pi$, $\nu$, and $\delta$ are the numbers of positive, negative, and zero eigenvalues of $A$, respectively, all counting multiplicities
$A^{1/2}$ : symmetric nnd square root of $A \in \mathrm{NND}_n$: $A^{1/2} = T \Lambda^{1/2} T'$, where $A = T \Lambda T'$ is the eigenvalue decomposition of $A$
$A^{+1/2}$ : $(A^{+})^{1/2}$
$\langle a, b \rangle$ : standard inner product in $\mathbb{R}^n$: $\langle a, b \rangle = a'b$; can denote also a general inner product in a vector space
$\langle a, b \rangle_V$ : inner product $a'Vb$; $V$ is the inner product matrix (ipm)
$a \perp b$ : vectors $a$ and $b$ are orthogonal with respect to a given inner product
$\|a\|$ : Euclidean norm (standard norm, 2-norm) of vector $a$, also denoted $\|a\|_2$: $\|a\|^2 = a'a$; can denote also a general vector norm in a vector space
$\|a\|_V$ : $\|a\|_V^2 = a'Va$, norm when the ipm is $V$ (ellipsoidal norm)
$\langle A, B \rangle$ : standard matrix inner product between $A, B \in \mathbb{R}^{n \times m}$: $\langle A, B \rangle = \mathrm{tr}(A'B) = \sum_{i,j} a_{ij} b_{ij}$
$\|A\|_F$ : Euclidean (Frobenius) norm of the matrix $A$: $\|A\|_F^2 = \mathrm{tr}(A'A) = \sum_{i,j} a_{ij}^2$
$\|A\|_2$ : matrix 2-norm of the matrix $A$ (spectral norm): $\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \mathrm{sg}_1(A) = +\sqrt{\mathrm{ch}_1(A'A)}$
$\|A^{-1}\|_2$ : matrix 2-norm of nonsingular $A_{n \times n}$: $\|A^{-1}\|_2 = 1/\mathrm{sg}_n(A)$
$\mathrm{cond}(A)$ : condition number of nonsingular $A_{n \times n}$: $\mathrm{cond}(A) = \|A\|_2 \|A^{-1}\|_2 = \mathrm{sg}_1(A)/\mathrm{sg}_n(A)$
$\cos(a, b)$ : $\cos \angle(a, b)$, the cosine of the angle $\theta$ between the nonzero vectors $a$ and $b$: $\cos(a, b) = \cos\theta = \frac{\langle a, b \rangle}{\|a\| \, \|b\|}$
$\angle(a, b)$ : the angle $\theta$, $0 \le \theta \le \pi$, between the nonzero vectors $a$ and $b$: $\theta = \angle(a, b) = \cos^{-1}[\cos(a, b)]$
$A[\alpha, \beta]$ : submatrix of $A_{n \times n}$, obtained by choosing the elements of $A$ which lie in rows $\alpha$ and columns $\beta$; $\alpha$ and $\beta$ are index sets of the rows and the columns of $A$, respectively
$A[\alpha]$ : $A[\alpha, \alpha]$, principal submatrix; same rows and columns chosen
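A minimal NumPy sketch (assumed example, not from the book) verifying the four Moore–Penrose conditions (mp1)–(mp4) and the relation $\mathrm{cond}(A) = \mathrm{sg}_1(A)/\mathrm{sg}_n(A)$ on random matrices.

```python
# Verify the Moore-Penrose conditions and the condition-number identity numerically.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))
Ap = np.linalg.pinv(A)
assert np.allclose(A @ Ap @ A, A)        # (mp1)
assert np.allclose(Ap @ A @ Ap, Ap)      # (mp2)
assert np.allclose((A @ Ap).T, A @ Ap)   # (mp3)
assert np.allclose((Ap @ A).T, Ap @ A)   # (mp4)

B = rng.standard_normal((5, 5))
sg = np.linalg.svd(B, compute_uv=False)  # singular values, sg_1 >= ... >= sg_n
assert np.isclose(np.linalg.cond(B), sg[0] / sg[-1])
```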

$A(\alpha, \beta)$ : submatrix of $A$, obtained by choosing the elements of $A$ which do not lie in rows $\alpha$ and columns $\beta$
$A(i, j)$ : submatrix of $A$, obtained by deleting row $i$ and column $j$ from $A$
$\mathrm{minor}(a_{ij})$ : $ij$th minor of $A$ corresponding to $a_{ij}$: $\mathrm{minor}(a_{ij}) = \det(A(i, j))$, $i, j \in \{1, \dots, n\}$
$A^{L}_{i}$ : $i$th leading principal submatrix of $A_{n \times n}$: $A^{L}_{i} = A[\alpha, \alpha]$, where $\alpha = \{1, \dots, i\}$
$\mathrm{cof}(a_{ij})$ : $ij$th cofactor of $A$: $\mathrm{cof}(a_{ij}) = (-1)^{i+j} \mathrm{minor}(a_{ij})$
$\det(A)$ : determinant of the matrix $A_{n \times n}$: $\det(a) = a$ for $a \in \mathbb{R}$; $\det(A) = \sum_{j=1}^{n} a_{ij} \mathrm{cof}(a_{ij})$ for any $i \in \{1, \dots, n\}$: the Laplace expansion by minors along the $i$th row
$\det(A[\alpha])$ : principal minor
$\det(A^{L}_{i})$ : leading principal minor of order $i$
$|A|$ : determinant of the matrix $A_{n \times n}$
$\mathrm{diag}(A)$ : diagonal matrix formed by the diagonal entries of $A_{n \times n}$
$\mathrm{diag}(d_1, \dots, d_n)$ : $n \times n$ diagonal matrix with listed diagonal entries
$\mathrm{diag}(d)$ : $n \times n$ diagonal matrix whose $i$th diagonal element is $d_i$
$A_{\delta}$ : diagonal matrix formed by the diagonal entries of $A_{n \times n}$
$\mathrm{rk}(A)$ : rank of the matrix $A$
$\mathrm{rank}(A)$ : rank of the matrix $A$
$\mathrm{tr}(A)$ : trace of the matrix $A_{n \times n}$: $\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$
$\mathrm{trace}(A)$ : trace of the matrix $A_{n \times n}$
$\mathrm{vec}(A)$ : vectoring operation: the vector formed by placing the columns of $A$ under one another successively
$A \otimes B$ : Kronecker product of $A_{n \times m}$ and $B_{p \times q}$: $A \otimes B = \begin{pmatrix} a_{11}B & \cdots & a_{1m}B \\ \vdots & & \vdots \\ a_{n1}B & \cdots & a_{nm}B \end{pmatrix} \in \mathbb{R}^{np \times mq}$
$A/A_{11}$ : Schur complement of $A_{11}$ in $A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$: $A/A_{11} = A_{22} - A_{21} A_{11}^{-} A_{12}$
$A_{22 \cdot 1}$ : $A_{22} - A_{21} A_{11}^{-} A_{12}$
$A \geq_L 0$ : $A$ is nonnegative definite: $A = LL'$ for some $L$; $A \in \mathrm{NND}_n$
$A >_L 0$ : $A$ is positive definite: $A = LL'$ for some invertible $L$; $A \in \mathrm{PD}_n$
$A \leq_L B$ : $B - A$ is nonnegative definite; $B - A \in \mathrm{NND}_n$; $A$ lies below $B$ with respect to the Löwner ordering
$A <_L B$ : $B - A$ is positive definite; $B - A \in \mathrm{PD}_n$
$A \leq_{rs} B$ : $A$ and $B$ are rank-subtractive: $\mathrm{rk}(B - A) = \mathrm{rk}(B) - \mathrm{rk}(A)$; $A$ lies below $B$ with respect to the minus ordering
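A minimal NumPy sketch (assumed example, not from the book) of the Kronecker product and the Schur complement; the Schur complement is checked against the standard block-determinant identity $\det(A) = \det(A_{11})\det(A/A_{11})$, valid when $A_{11}$ is nonsingular.

```python
# Schur complement A/A11 = A22 - A21 A11^{-1} A12 and a Kronecker-product shape check.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A11, A12 = A[:2, :2], A[:2, 2:]
A21, A22 = A[2:, :2], A[2:, 2:]
schur = A22 - A21 @ np.linalg.inv(A11) @ A12
assert np.isclose(np.linalg.det(A), np.linalg.det(A11) * np.linalg.det(schur))

B = rng.standard_normal((3, 2))
K = np.kron(A11, B)                  # (2*3) x (2*2) Kronecker product
assert K.shape == (A11.shape[0] * B.shape[0], A11.shape[1] * B.shape[1])
```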

$\mathrm{Sh}(V \mid X)$ : the shorted matrix of $V \in \mathrm{NND}_n$ with respect to $X_{n \times p}$; $\mathrm{Sh}(V \mid X)$ is the maximal element $U$ (in the Löwner ordering) in the set $\mathcal{U} = \{\, U : 0 \leq_L U \leq_L V, \ \mathscr{C}(U) \subset \mathscr{C}(X) \,\}$
$P_A$ : orthogonal projector onto $\mathscr{C}(A)$ (wrt $I$): $P_A = A(A'A)^{-}A' = AA^{+}$
$P_{A;V}$ : orthogonal projector onto $\mathscr{C}(A)$ wrt $V \in \mathrm{PD}_n$: $P_{A;V} = A(A'VA)^{-}A'V$
$P_{A;V}$ : generalized orthogonal projector onto $\mathscr{C}(A)$ wrt $V \in \mathrm{NND}_n$: $P_{A;V} = A(A'VA)^{-}A'V + A[I - (A'VA)^{-}A'VA]U$, where $U$ is arbitrary
$P_{A \mid B}$ : projector onto $\mathscr{C}(A)$ along $\mathscr{C}(B)$: $P_{A \mid B}(A : B) = (A : 0)$
$\{P_{A \mid B}\}$ : set of matrices satisfying $P_{A \mid B}(A : B) = (A : 0)$
$P_{\mathcal{U}}$ : orthogonal projector onto the vector space $\mathcal{U}$ (wrt a given inner product)
$\mathscr{C}(A)$ : column space of the matrix $A_{n \times p}$: $\mathscr{C}(A) = \{\, y \in \mathbb{R}^n : y = Ax \text{ for some } x \in \mathbb{R}^p \,\}$
$\mathscr{N}(A)$ : null space of the matrix $A_{n \times p}$: $\mathscr{N}(A) = \{\, x \in \mathbb{R}^p : Ax = 0 \,\}$
$\mathscr{C}(A)^{\perp}$ : orthocomplement of $\mathscr{C}(A)$ wrt $I$: $\mathscr{C}(A)^{\perp} = \{\, z \in \mathbb{R}^n : z'Ax = 0 \ \forall x \in \mathbb{R}^p \,\} = \mathscr{N}(A')$
$A^{\perp}$ : matrix whose column space is $\mathscr{C}(A^{\perp}) = \mathscr{C}(A)^{\perp}$
$\mathscr{C}(A)^{\perp V}$ : orthocomplement of $\mathscr{C}(A)$ wrt $V$: $\mathscr{C}(A)^{\perp V} = \{\, z \in \mathbb{R}^n : z'VAx = 0 \ \forall x \in \mathbb{R}^p \,\} = \mathscr{N}(A'V)$
$A^{\perp V}$ : matrix whose column space is $\mathscr{C}(A)^{\perp V}$
$p_A(x)$ : the characteristic polynomial of $A$: $p_A(x) = \det(A - xI)$
$\mathcal{U} \subset \mathcal{V}$ : $\mathcal{U}$ is a subset of $\mathcal{V}$; possibly $\mathcal{U} = \mathcal{V}$
$\mathcal{U} + \mathcal{V}$ : sum of the vector spaces $\mathcal{U}$ and $\mathcal{V}$
$\mathcal{U} \oplus \mathcal{V}$ : direct sum of the vector spaces $\mathcal{U}$ and $\mathcal{V}$
$\mathcal{U} \boxplus \mathcal{V}$ : direct sum of the orthogonal vector spaces $\mathcal{U}$ and $\mathcal{V}$
$\mathcal{U} \cap \mathcal{V}$ : intersection of the vector spaces $\mathcal{U}$ and $\mathcal{V}$
$\mathrm{ch}_i(A) = \lambda_i$ : the $i$th largest eigenvalue of $A_{n \times n}$ (all eigenvalues being real)
$\mathrm{ch}(A)$ : set of all $n$ eigenvalues of $A_{n \times n}$, including multiplicities, called also the spectrum of $A$: $\mathrm{ch}(A) = \{\mathrm{ch}_1(A), \dots, \mathrm{ch}_n(A)\}$
$\mathrm{ch}(A, B)$ : set of proper eigenvalues of symmetric $A_{n \times n}$ with respect to $B \in \mathrm{NND}_n$; $\lambda \in \mathrm{ch}(A, B)$ if $Aw = \lambda Bw$, $Bw \neq 0$
$\mathrm{nzch}(A)$ : set of the nonzero eigenvalues of $A_{n \times n}$: $\mathrm{nzch}(A) = \{\mathrm{ch}_1(A), \dots, \mathrm{ch}_r(A)\}$, $r = \mathrm{rank}(A)$
$\mathrm{chv}_i(A)$ : eigenvector of $A_{n \times n}$ with respect to $\lambda_i = \mathrm{ch}_i(A)$: a nonzero vector $t_i$ satisfying the equation $At_i = \lambda_i t_i$
$\mathrm{sg}_i(A) = \delta_i$ : the $i$th largest singular value of $A_{n \times m}$: $\mathrm{sg}_i(A) = +\sqrt{\mathrm{ch}_i(A'A)} = +\sqrt{\mathrm{ch}_i(AA')}$
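A minimal NumPy sketch (assumed example, not from the book) showing that $P_A = A(A'A)^{-}A'$ is a symmetric idempotent that fixes $\mathscr{C}(A)$, and that the singular values of $A$ are the square roots of the eigenvalues of $A'A$.

```python
# Orthogonal projector onto C(A) and the singular-value / eigenvalue relation.
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))
P = A @ np.linalg.pinv(A.T @ A) @ A.T
assert np.allclose(P, P.T) and np.allclose(P @ P, P)  # symmetric, idempotent
assert np.allclose(P @ A, A)                          # P_A leaves C(A) fixed

sg = np.linalg.svd(A, compute_uv=False)
ch = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]       # eigenvalues, largest first
assert np.allclose(sg, np.sqrt(ch))                   # sg_i(A) = sqrt(ch_i(A'A))
```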

$\mathrm{sg}(A)$ : set of the singular values of $A_{n \times m}$ ($m \le n$): $\mathrm{sg}(A) = \{\mathrm{sg}_1(A), \dots, \mathrm{sg}_m(A)\}$
$\mathrm{nzsg}(A)$ : set of the nonzero singular values of $A_{n \times m}$: $\mathrm{nzsg}(A) = \{\mathrm{sg}_1(A), \dots, \mathrm{sg}_r(A)\}$, $r = \mathrm{rank}(A)$
$\rho(A)$ : the spectral radius of $A_{n \times n}$: the maximum of the absolute values of the eigenvalues of $A_{n \times n}$
$\mathrm{var}_s(y)$ : sample variance of the variable $y$
$\mathrm{var}_d(y) = s_y^2$ : sample variance: argument is the variable vector $y \in \mathbb{R}^n$: $\mathrm{var}_d(y) = \frac{1}{n-1} y'Cy = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$
$\mathrm{cov}_s(x, y)$ : sample covariance between the variables $x$ and $y$
$\mathrm{cov}_d(x, y) = s_{xy}$ : sample covariance: arguments are variable vectors in $\mathbb{R}^n$: $\mathrm{cov}_d(x, y) = \frac{1}{n-1} x'Cy = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
$\mathrm{cor}_d(x, y) = r_{xy}$ : sample correlation: $r_{xy} = \frac{x'Cy}{\sqrt{x'Cx}\,\sqrt{y'Cy}} = \cos(Cx, Cy)$
$\bar{\mathbf{x}}$ : projection of $x$ onto $\mathscr{C}(1_n)$: $\bar{\mathbf{x}} = Jx = \bar{x} 1_n$
$\tilde{x}$ : centered $x$: $\tilde{x} = Cx = x - Jx = x - \bar{x} 1_n$
$U$ : $n \times d$ data matrix of the $u$-variables: $U = (u_1 : \dots : u_d) = \begin{pmatrix} u_{(1)}' \\ \vdots \\ u_{(n)}' \end{pmatrix}$
$u_1, \dots, u_d$ : variable vectors in the variable space $\mathbb{R}^n$
$u_{(1)}, \dots, u_{(n)}$ : observation vectors in the observation space $\mathbb{R}^d$
$\bar{u}$ : vector of means of the variables $u_1, \dots, u_d$: $\bar{u} = (\bar{u}_1, \dots, \bar{u}_d)'$
$\tilde{U}$ : centered $U$: $\tilde{U} = CU$, where $C$ is the centering matrix
$\tilde{u}_1, \dots, \tilde{u}_d$ : centered variable vectors
$\tilde{u}_{(1)}, \dots, \tilde{u}_{(n)}$ : centered observation vectors
$\mathrm{var}_d(u_i) = s_i^2$ : sample variance: argument is the variable vector $u_i \in \mathbb{R}^n$: $\mathrm{var}_d(u_i) = \frac{1}{n-1} u_i'Cu_i = \frac{1}{n-1} \sum_{l=1}^{n} (u_{li} - \bar{u}_i)^2$
$\mathrm{cov}_d(u_i, u_j) = s_{ij}$ : sample covariance: arguments are variable vectors in $\mathbb{R}^n$: $s_{ij} = \frac{1}{n-1} u_i'Cu_j = \frac{1}{n-1} \sum_{l=1}^{n} (u_{li} - \bar{u}_i)(u_{lj} - \bar{u}_j)$
$\mathrm{ssp}(U) = \{t_{ij}\}$ : matrix $T$ ($d \times d$) of the sums of squares and products of deviations about the mean: $T = U'CU = \sum_{i=1}^{n} (u_{(i)} - \bar{u})(u_{(i)} - \bar{u})'$
$\mathrm{cov}_d(U) = \{s_{ij}\}$ : sample covariance matrix $S$ ($d \times d$) of the data matrix $U$: $S = \frac{1}{n-1} T = \frac{1}{n-1} \sum_{i=1}^{n} (u_{(i)} - \bar{u})(u_{(i)} - \bar{u})'$
$\mathrm{cor}_d(u_i, u_j) = r_{ij}$ : sample correlation: arguments are variable vectors in $\mathbb{R}^n$
$\mathrm{cor}_d(U) = \{r_{ij}\}$ : sample correlation matrix $R$ ($d \times d$) of the data matrix $U$: $R = \mathrm{cor}_d(U) = (\mathrm{diag}\, S)^{-1/2} S (\mathrm{diag}\, S)^{-1/2}$
$\mathrm{MHLN}^2(u_{(i)}, \bar{u}, S)$ : sample Mahalanobis distance (squared) of the $i$th observation from the mean: $\mathrm{MHLN}^2(u_{(i)}, \bar{u}, S) = (u_{(i)} - \bar{u})' S^{-1} (u_{(i)} - \bar{u})$
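A minimal NumPy sketch (assumed example, not from the book) checking that the centering-matrix formulas $\mathrm{var}_d(y) = y'Cy/(n-1)$, $S = U'CU/(n-1)$ and $R = (\mathrm{diag}\,S)^{-1/2}S(\mathrm{diag}\,S)^{-1/2}$ agree with NumPy's standard estimates.

```python
# Sample variance, covariance and correlation via the centering matrix C = I_n - J.
import numpy as np

rng = np.random.default_rng(3)
n, d = 20, 3
U = rng.standard_normal((n, d))
y = U[:, 0]
C = np.eye(n) - np.ones((n, n)) / n                  # centering matrix
assert np.allclose(y @ C @ y / (n - 1), np.var(y, ddof=1))
S = U.T @ C @ U / (n - 1)
assert np.allclose(S, np.cov(U, rowvar=False))
Dinv = np.diag(1 / np.sqrt(np.diag(S)))
assert np.allclose(Dinv @ S @ Dinv, np.corrcoef(U, rowvar=False))
```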

$\mathrm{MHLN}^2(\bar{u}_i, \bar{u}_j, S_{*})$ : sample Mahalanobis distance (squared) between two mean vectors: $\mathrm{MHLN}^2(\bar{u}_i, \bar{u}_j, S_{*}) = (\bar{u}_i - \bar{u}_j)' S_{*}^{-1} (\bar{u}_i - \bar{u}_j)$, where $S_{*} = \frac{1}{n_1 + n_2 - 2}\left(U_1'C_{n_1}U_1 + U_2'C_{n_2}U_2\right)$
$\mathrm{MHLN}^2(u, \mu, \Sigma)$ : population Mahalanobis distance squared: $\mathrm{MHLN}^2(u, \mu, \Sigma) = (u - \mu)'\Sigma^{-1}(u - \mu)$
$\mathrm{E}(\cdot)$ : expectation of a random argument: $\mathrm{E}(x) = p_1 x_1 + \dots + p_k x_k$ if $x$ is a discrete random variable whose values are $x_1, \dots, x_k$ with corresponding probabilities $p_1, \dots, p_k$
$\mathrm{var}(x) = \sigma_x^2$ : variance of the random variable $x$: $\sigma_x^2 = \mathrm{E}(x - \mu_x)^2$, $\mu_x = \mathrm{E}(x)$
$\mathrm{cov}(x, y) = \sigma_{xy}$ : covariance between the random variables $x$ and $y$: $\sigma_{xy} = \mathrm{E}(x - \mu_x)(y - \mu_y)$, $\mu_x = \mathrm{E}(x)$, $\mu_y = \mathrm{E}(y)$
$\mathrm{cor}(x, y) = \varrho_{xy}$ : correlation between the random variables $x$ and $y$: $\varrho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
$\mathrm{cov}(x)$ : covariance matrix ($d \times d$) of a $d$-dimensional random vector $x$: $\mathrm{cov}(x) = \Sigma = \mathrm{E}(x - \mu_x)(x - \mu_x)'$
$\mathrm{cor}(x)$ : correlation matrix ($d \times d$) of the random vector $x$: $\mathrm{cor}(x) = \rho = (\mathrm{diag}\,\Sigma)^{-1/2}\Sigma(\mathrm{diag}\,\Sigma)^{-1/2}$
$\mathrm{cov}(x, y)$ : (cross-)covariance matrix between the random vectors $x$ and $y$: $\mathrm{cov}(x, y) = \mathrm{E}(x - \mu_x)(y - \mu_y)' = \Sigma_{xy}$
$\mathrm{cov}(x, x)$ : $\mathrm{cov}(x, x) = \mathrm{cov}(x)$
$\mathrm{cor}(x, y)$ : (cross-)correlation matrix between the random vectors $x$ and $y$
$\mathrm{cov}\begin{pmatrix} x \\ y \end{pmatrix}$ : partitioned covariance matrix of the random vector $\begin{pmatrix} x \\ y \end{pmatrix}$: $\mathrm{cov}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \Sigma_{xx} & \sigma_{xy} \\ \sigma_{xy}' & \sigma_y^2 \end{pmatrix} = \begin{pmatrix} \mathrm{cov}(x, x) & \mathrm{cov}(x, y) \\ \mathrm{cov}(y, x) & \mathrm{var}(y) \end{pmatrix}$
$x \sim (\mu, \Sigma)$ : $\mathrm{E}(x) = \mu$, $\mathrm{cov}(x) = \Sigma$
$x \sim N_p(\mu, \Sigma)$ : $x$ follows the $p$-dimensional normal distribution $N_p(\mu, \Sigma)$
$n(x; \mu, \Sigma)$ : density for $x \sim N_p(\mu, \Sigma)$, $\Sigma$ pd: $n(x; \mu, \Sigma) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \, e^{-\frac{1}{2}(x - \mu)'\Sigma^{-1}(x - \mu)}$
$\mathrm{cc}_i(x, y)$ : $i$th largest canonical correlation between the random vectors $x$ and $y$
$\mathrm{cc}(x, y)$ : set of the canonical correlations between the random vectors $x$ and $y$
$\mathrm{cc}_{+}(x, y)$ : set of the nonzero (necessarily positive) canonical correlations between the random vectors $x$ and $y$; square roots of the nonzero eigenvalues of $P_A P_B$: $\mathrm{cc}_{+}(x, y) = \mathrm{nzch}^{1/2}(P_A P_B) = \mathrm{nzsg}[(A'A)^{+1/2} A'B (B'B)^{+1/2}]$, where $\mathrm{cov}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} A'A & A'B \\ B'A & B'B \end{pmatrix}$
$X = (1 : X_0)$ : in the regression context often the model matrix
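A minimal NumPy sketch (assumed example, not from the book) of the population Mahalanobis distance $\mathrm{MHLN}^2(u, \mu, \Sigma) = (u-\mu)'\Sigma^{-1}(u-\mu)$, illustrating its well-known invariance under invertible affine transformations $u \mapsto Bu + c$; the function name and data are hypothetical.

```python
# Population Mahalanobis distance and its affine invariance.
import numpy as np

def mhln2(u, mu, Sigma):
    diff = u - mu
    return diff @ np.linalg.solve(Sigma, diff)   # (u - mu)' Sigma^{-1} (u - mu)

rng = np.random.default_rng(4)
mu = rng.standard_normal(3)
L = rng.standard_normal((3, 3))
Sigma = L @ L.T + 3 * np.eye(3)                  # a positive definite covariance
u = rng.standard_normal(3)
B, c = rng.standard_normal((3, 3)), rng.standard_normal(3)
assert np.isclose(mhln2(u, mu, Sigma),
                  mhln2(B @ u + c, B @ mu + c, B @ Sigma @ B.T))
```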

$X_0$ : $n \times k$ data matrix of the $x$-variables: $X_0 = (x_1 : \dots : x_k) = \begin{pmatrix} x_{(1)}' \\ \vdots \\ x_{(n)}' \end{pmatrix}$
$x_1, \dots, x_k$ : variable vectors in the variable space $\mathbb{R}^n$
$x_{(1)}, \dots, x_{(n)}$ : observation vectors in the observation space $\mathbb{R}^k$
$\mathrm{ssp}(X_0 : y)$ : partitioned matrix of the sums of squares and products of deviations about the mean of the data $(X_0 : y)$: $\mathrm{ssp}(X_0 : y) = \begin{pmatrix} T_{xx} & t_{xy} \\ t_{xy}' & t_{yy} \end{pmatrix} = (X_0 : y)'C(X_0 : y)$
$\mathrm{cov}_d(X_0 : y)$ : partitioned sample covariance matrix of the data $(X_0 : y)$: $\mathrm{cov}_d(X_0 : y) = \begin{pmatrix} S_{xx} & s_{xy} \\ s_{xy}' & s_y^2 \end{pmatrix}$
$\mathrm{cor}_d(X_0 : y)$ : partitioned sample correlation matrix of the data $(X_0 : y)$: $\mathrm{cor}_d(X_0 : y) = \begin{pmatrix} R_{xx} & r_{xy} \\ r_{xy}' & 1 \end{pmatrix}$
$H$ : orthogonal projector onto $\mathscr{C}(X)$, the hat matrix: $H = X(X'X)^{-}X' = XX^{+} = P_X$
$M$ : orthogonal projector onto $\mathscr{C}(X)^{\perp}$: $M = I_n - H$
$J$ : the orthogonal projector onto $\mathscr{C}(1_n)$: $J = \frac{1}{n} 1_n 1_n' = P_{1_n}$
$C$ : centering matrix, the orthogonal projector onto $\mathscr{C}(1_n)^{\perp}$: $C = I_n - J$
$(X_1 : X_2)$ : partitioned model matrix $X$
$M_1$ : orthogonal projector onto $\mathscr{C}(X_1)^{\perp}$: $M_1 = I_n - P_{X_1}$
$\hat{\beta}$ : solution to the normal equation $X'X\beta = X'y$, $\mathrm{OLSE}(\beta)$
$X\hat{\beta} = \hat{y}$ : $\hat{y} = Hy$ = OLS fitted values, $\mathrm{OLSE}(X\beta)$, denoted also $\widehat{X\beta} = \hat{\mu}$, when $\mu = X\beta$
$\tilde{\beta}$ : solution to the generalized normal equation $X'W^{-}X\beta = X'W^{-}y$, where $W = V + XUX'$, $\mathscr{C}(W) = \mathscr{C}(X : V)$
$\tilde{\beta}$ : if $V$ is positive definite and $X$ has full column rank, then $\tilde{\beta} = \mathrm{BLUE}(\beta) = (X'V^{-1}X)^{-1}X'V^{-1}y$
$X\tilde{\beta}$ : $\mathrm{BLUE}(X\beta)$, denoted also $\widetilde{X\beta} = \tilde{\mu}$
$\bar{y}$ : mean of the response variable $y$: $\bar{y} = (y_1 + \dots + y_n)/n$
$\bar{x}$ : vector of the means of the $k$ regressor variables $x_1, \dots, x_k$: $\bar{x} = (\bar{x}_1, \dots, \bar{x}_k)' \in \mathbb{R}^k$
$\bar{\mathbf{y}}$ : projection of $y$ onto $\mathscr{C}(1_n)$: $\bar{\mathbf{y}} = Jy = \bar{y} 1_n$
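A minimal NumPy sketch (assumed example, not from the book) of the hat matrix $H = X(X'X)^{-}X'$ with $X = (1 : X_0)$: it reproduces the OLS fitted values $X\hat{\beta}$, and $M = I_n - H$ annihilates the columns of $X$. The simulated data and coefficients are arbitrary.

```python
# Hat matrix H = P_X, fitted values y_hat = Hy, and M = I - H with MX = 0.
import numpy as np

rng = np.random.default_rng(5)
n, k = 30, 2
X0 = rng.standard_normal((n, k))
X = np.column_stack([np.ones(n), X0])            # model matrix (1 : X0)
y = X @ np.array([1.0, 2.0, -0.5]) + 0.1 * rng.standard_normal(n)

H = X @ np.linalg.pinv(X.T @ X) @ X.T            # hat matrix
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(H @ y, X @ beta_hat)          # y_hat = Hy = X beta_hat
assert np.allclose((np.eye(n) - H) @ X, 0)       # M X = 0
```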

$\tilde{y}$ : centered $y$: $\tilde{y} = Cy = y - \bar{\mathbf{y}}$
$\hat{\beta}_x$ : $\hat{\beta}_x = T_{xx}^{-1} t_{xy} = S_{xx}^{-1} s_{xy}$: the OLS regression coefficients of the $x$-variables when $X = (1 : X_0)$
$\hat{\beta}_0$ : $\hat{\beta}_0 = \bar{y} - \hat{\beta}_x'\bar{x} = \bar{y} - (\hat{\beta}_1 \bar{x}_1 + \dots + \hat{\beta}_k \bar{x}_k)$: OLSE of the constant term (intercept) when $X = (1 : X_0)$
$\mathrm{BLP}(y; x)$ : the best linear predictor of the random vector $y$ on the basis of the random vector $x$
$\mathrm{BLUE}(K'\beta)$ : the best linear unbiased estimator of the estimable parametric function $K'\beta$, denoted as $\widetilde{K'\beta}$ or $K'\tilde{\beta}$
$\mathrm{BLUP}(y_f; y)$ : the best linear unbiased predictor of a new unobserved $y_f$
$\mathrm{LE}(K'\beta; y)$ : (homogeneous) linear estimator of $K'\beta$, where $K \in \mathbb{R}^{p \times q}$: $\{\mathrm{LE}(K'\beta; y)\} = \{\, Ay : A \in \mathbb{R}^{q \times n} \,\}$
$\mathrm{LP}(y; x)$ : (inhomogeneous) linear predictor of the $p$-dimensional random vector $y$ on the basis of the $q$-dimensional random vector $x$: $\{\mathrm{LP}(y; x)\} = \{\, f(x) : f(x) = Ax + a, \ A \in \mathbb{R}^{p \times q}, \ a \in \mathbb{R}^p \,\}$
$\mathrm{LUE}(K'\beta; y)$ : (homogeneous) linear unbiased estimator of $K'\beta$: $\{\mathrm{LUE}(K'\beta; y)\} = \{\, Ay : \mathrm{E}(Ay) = K'\beta \,\}$
$\mathrm{LUP}(y_f; y)$ : linear unbiased predictor of a new unobserved $y_f$: $\{\mathrm{LUP}(y_f; y)\} = \{\, Ay : \mathrm{E}(Ay - y_f) = 0 \,\}$
$\mathrm{MSEM}(f(x); y)$ : mean squared error matrix of $f(x)$ (a random vector, function of the random vector $x$) with respect to $y$ (a random vector or a given fixed vector): $\mathrm{MSEM}[f(x); y] = \mathrm{E}[y - f(x)][y - f(x)]'$
$\mathrm{MSEM}(Fy; K'\beta)$ : mean squared error matrix of the linear estimator $Fy$ under $\{y, X\beta, \sigma^2 V\}$ with respect to $K'\beta$: $\mathrm{MSEM}(Fy; K'\beta) = \mathrm{E}(Fy - K'\beta)(Fy - K'\beta)'$
$\mathrm{OLSE}(K'\beta)$ : the ordinary least squares estimator of the parametric function $K'\beta$, denoted as $K'\hat{\beta}$ or $\widehat{K'\beta}$; here $\hat{\beta}$ is any solution to the normal equation $X'X\beta = X'y$
$\mathrm{risk}(Fy; K'\beta)$ : quadratic risk of $Fy$ under $\{y, X\beta, \sigma^2 V\}$ with respect to $K'\beta$: $\mathrm{risk}(Fy; K'\beta) = \mathrm{tr}[\mathrm{MSEM}(Fy; K'\beta)] = \mathrm{E}(Fy - K'\beta)'(Fy - K'\beta)$
$\mathscr{M}$ : linear model $\{y, X\beta, \sigma^2 V\}$: $y = X\beta + \varepsilon$, $\mathrm{cov}(y) = \mathrm{cov}(\varepsilon) = \sigma^2 V$, $\mathrm{E}(y) = X\beta$
$\mathscr{M}_{\mathrm{mix}}$ : mixed linear model $\mathscr{M}_{\mathrm{mix}} = \{y, X\beta + Z\gamma, D, R\}$: $y = X\beta + Z\gamma + \varepsilon$; $\gamma$ is the vector of the random effects, $\mathrm{cov}(\gamma) = D$, $\mathrm{cov}(\varepsilon) = R$, $\mathrm{cov}(\gamma, \varepsilon) = 0$, $\mathrm{E}(y) = X\beta$
$\mathscr{M}_f$ : linear model with new future observations $y_f$: $\mathscr{M}_f = \left\{ \begin{pmatrix} y \\ y_f \end{pmatrix}, \begin{pmatrix} X\beta \\ X_f\beta \end{pmatrix}, \sigma^2 \begin{pmatrix} V & V_{12} \\ V_{21} & V_{22} \end{pmatrix} \right\}$
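A minimal NumPy sketch (assumed example, not from the book) of the slope and intercept formulas $\hat{\beta}_x = S_{xx}^{-1}s_{xy}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_x'\bar{x}$, checked against the OLS solution for $X = (1 : X_0)$; the simulated data are arbitrary.

```python
# OLS slopes from the sample covariances and the intercept from the means.
import numpy as np

rng = np.random.default_rng(6)
n, k = 40, 3
X0 = rng.standard_normal((n, k))
y = X0 @ np.array([0.5, -1.0, 2.0]) + 1.5 + 0.2 * rng.standard_normal(n)

C = np.eye(n) - np.ones((n, n)) / n              # centering matrix
S_xx = X0.T @ C @ X0 / (n - 1)
s_xy = X0.T @ C @ y / (n - 1)
beta_x = np.linalg.solve(S_xx, s_xy)             # beta_x = S_xx^{-1} s_xy
beta_0 = y.mean() - beta_x @ X0.mean(axis=0)     # beta_0 = ybar - beta_x' xbar

X = np.column_stack([np.ones(n), X0])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(np.concatenate(([beta_0], beta_x)), beta_ols)
```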