Bare minimum on matrix algebra
Psychology 588: Covariance structure and factor models
Matrix multiplication

Consider three notations for the same set of linear combinations:

Entry-wise:
$y_{ij} = x_{i1}b_{1j} + x_{i2}b_{2j} + \cdots + x_{ip}b_{pj}, \quad i = 1, \dots, n, \; j = 1, \dots, m$

Column-wise:
$\mathbf{y}_j = X\mathbf{b}_j, \quad j = 1, \dots, m$

Matrix form:
$Y = XB$

where
$Y = \begin{bmatrix} y_{11} & \cdots & y_{1m} \\ \vdots & & \vdots \\ y_{n1} & \cdots & y_{nm} \end{bmatrix}_{n \times m}, \quad
X = \begin{bmatrix} x_{11} & \cdots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{np} \end{bmatrix}_{n \times p}, \quad
B = \begin{bmatrix} b_{11} & \cdots & b_{1m} \\ \vdots & & \vdots \\ b_{p1} & \cdots & b_{pm} \end{bmatrix}_{p \times m}$

Matrix multiplication is a very efficient way of writing simultaneous equation systems (i.e., linear combinations).
Inner (scalar) product

Suppose $Y$ ($N \times p$) contains $p$ DVs as columns, $X$ ($N \times q$) contains $q$ IVs, and $B$ ($q \times p$) contains regression weights:

$\begin{bmatrix} y_{11} & \cdots & y_{1p} \\ \vdots & & \vdots \\ y_{N1} & \cdots & y_{Np} \end{bmatrix} =
\begin{bmatrix} x_{11} & \cdots & x_{1q} \\ \vdots & & \vdots \\ x_{N1} & \cdots & x_{Nq} \end{bmatrix}
\begin{bmatrix} b_{11} & \cdots & b_{1p} \\ \vdots & & \vdots \\ b_{q1} & \cdots & b_{qp} \end{bmatrix}$

Then an arbitrary entry of $Y$, for subject $i$ and variable $j$, is a linear combination (inner product) of subject $i$'s $X$-scores with the weights for the $j$-th variable:

$y_{ij} = \mathbf{x}_i'\mathbf{b}_j = x_{i1}b_{1j} + x_{i2}b_{2j} + \cdots + x_{iq}b_{qj}$
Outer products

If all entries of $Y$ are considered simultaneously, $Y$ can be written as a sum of $q$ outer products:

$Y = \sum_{k=1}^{q} Y_k = \sum_{k=1}^{q} \mathbf{x}_k \mathbf{b}_k'$

where $\mathbf{x}_k$ is the $k$-th column of $X$ and $\mathbf{b}_k'$ is the $k$-th row of $B$. Each $Y_k$ accounts for a fraction of the DVs' variances, explained by the $k$-th IV $\mathbf{x}_k$ with its weights $\mathbf{b}_k$ for the $p$ DVs.
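A minimal numpy sketch (toy dimensions and random data assumed purely for illustration) checking that the matrix form, the entry-wise inner products, and the sum of outer products all produce the same $Y$:

```python
import numpy as np

rng = np.random.default_rng(1)
N, q, p = 5, 3, 2              # assumed toy sizes: subjects, IVs, DVs
X = rng.normal(size=(N, q))    # IV scores
B = rng.normal(size=(q, p))    # regression weights

# Matrix form: Y = XB
Y = X @ B

# Entry-wise: y_ij is the inner product of row i of X and column j of B
Y_inner = np.array([[X[i] @ B[:, j] for j in range(p)] for i in range(N)])

# Sum of q outer products: Y = sum_k x_k b_k'
Y_outer = sum(np.outer(X[:, k], B[k, :]) for k in range(q))

assert np.allclose(Y, Y_inner) and np.allclose(Y, Y_outer)
```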
Algebraic properties of matrix multiplication

Suppose all of the following products are defined:

$AB \neq BA$ in general
$(AB)C = A(BC)$
$A(B + C) = AB + AC$
$c(AB) = (cA)B = A(cB)$
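A quick numeric check of these rules (random square matrices assumed so that all products are defined):

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C = (rng.normal(size=(3, 3)) for _ in range(3))
c = 2.5

print(np.allclose(A @ B, B @ A))                 # generally False: AB != BA
assert np.allclose((A @ B) @ C, A @ (B @ C))     # associativity
assert np.allclose(A @ (B + C), A @ B + A @ C)   # distributivity
assert np.allclose(c * (A @ B), (c * A) @ B)     # scalar factors move freely
assert np.allclose(c * (A @ B), A @ (c * B))
```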
Trace

The trace is defined for square matrices as the sum of the diagonal elements:

$\operatorname{tr}(A_{n \times n}) = \sum_{i=1}^{n} a_{ii}$

It is useful for operations on sums of squares (typically of the discrepancy of a model from the data) or weighted SS, along with the following properties:

$\operatorname{tr}(A') = \operatorname{tr}(A)$
$\operatorname{tr}(AB) = \operatorname{tr}(BA), \quad \operatorname{tr}(ABC) = \operatorname{tr}(CAB) = \operatorname{tr}(BCA)$
$\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$
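A short sketch verifying the trace properties numerically (conformable random matrices assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))
C = rng.normal(size=(4, 4))

assert np.isclose(np.trace(A.T), np.trace(A))                 # tr(A') = tr(A)
assert np.isclose(np.trace(A @ B), np.trace(B @ A))           # tr(AB) = tr(BA)
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))   # cyclic permutations
assert np.isclose(np.trace(A @ B @ C), np.trace(B @ C @ A))
assert np.isclose(np.trace(A + B), np.trace(A) + np.trace(B)) # tr(A+B) = tr(A)+tr(B)
```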
For example, consider the residual matrix under the principal component model:

$D = X - FA'$

Then the sum of squares of all residuals is:

$F = \operatorname{tr}(D'D) = \operatorname{tr}\left[(X - FA')'(X - FA')\right] = \operatorname{tr}(X'X) - 2\operatorname{tr}(X'FA') + \operatorname{tr}(AF'FA')$

The least-squares estimator of $A$ is the one that minimizes $F$.
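A sketch of this residual sum of squares under an $R$-component model, checking the trace expansion numerically. Deviation-score data and the SVD-based least-squares solution (scores from the leading left singular vectors and values, loadings from the leading right singular vectors; see the SVD slides below) are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(20, 6))
X = X - X.mean(axis=0)            # deviation scores
R = 2                             # assumed number of components

# Least-squares principal component solution via the SVD
U, t, Vt = np.linalg.svd(X, full_matrices=False)
F = U[:, :R] * t[:R]              # component scores
A = Vt[:R, :].T                   # component loadings

D = X - F @ A.T                   # residual matrix
ss_direct = np.trace(D.T @ D)     # sum of squared residuals
ss_expanded = (np.trace(X.T @ X)
               - 2 * np.trace(X.T @ F @ A.T)
               + np.trace(A @ F.T @ F @ A.T))
assert np.isclose(ss_direct, ss_expanded)
```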
Determinant

The determinant of an $n \times n$ matrix $A = \{a_{ij}\}$, denoted by $|A|$, is defined as:

$|A| = a_{11}$, if $n = 1$
$|A| = \sum_{j=1}^{n} a_{ij} A_{ij}$ for any row $i$, if $n > 1$

where the cofactor $A_{ij}$ of element $(i, j)$ equals $(-1)^{i+j}$ times the minor of $(i, j)$, and the minor is the determinant of the submatrix of $A$ with row $i$ and column $j$ removed.
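A small sketch of the cofactor expansion along the first row, compared against numpy's determinant (a recursive toy implementation, assumed for small matrices only):

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along row 0 (small matrices only)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)   # drop row 0, col j
        total += (-1) ** j * A[0, j] * det_cofactor(minor)       # (-1)^(i+j) with i = 0
    return total

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
assert np.isclose(det_cofactor(A), np.linalg.det(A))
```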
Inverse of square matrices

If all rows of an $n \times n$ matrix $A$ are linearly independent, there exists a unique $n \times n$ matrix $B$ such that:

$AB = BA = I$

$B$ is denoted by $A^{-1}$ and is called the inverse of $A$. The $(j, i)$-th entry of $A^{-1}$ is $A_{ij} / |A|$ (note the reversed subscripts: the cofactor of element $(i, j)$ goes to position $(j, i)$). Thus it is obvious that $|A| \neq 0$ is required for $A^{-1}$ to be defined.
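A sketch of this cofactor formula for the inverse, with the $(j, i)$-th entry built from the cofactor of element $(i, j)$, compared to np.linalg.inv (small matrices assumed):

```python
import numpy as np

def inverse_by_cofactors(A):
    """Inverse via cofactors: the (j, i) entry of A^{-1} is A_ij / |A|."""
    n = A.shape[0]
    detA = np.linalg.det(A)
    inv = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cofactor = (-1) ** (i + j) * np.linalg.det(minor)
            inv[j, i] = cofactor / detA        # note the transposed placement
    return inv

A = np.array([[2.0, 1.0], [1.0, 3.0]])
assert np.allclose(inverse_by_cofactors(A), np.linalg.inv(A))
assert np.allclose(A @ inverse_by_cofactors(A), np.eye(2))
```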
Rank

The rank of an $m \times n$ ($m \geq n$) matrix $A$ is defined as the number of linearly independent columns of $A$, with the following crucial properties:

$\operatorname{rank}(A_{m \times n}) \leq \min(m, n)$
$\operatorname{rank}(AB) \leq \min\left(\operatorname{rank}(A), \operatorname{rank}(B)\right)$

A square matrix must be full-rank for its inverse to exist; this is necessary for estimation of parameters in SEM, since estimation involves the inverse of the data covariance matrix and its determinant. For a covariance matrix, the following are all equivalent: full rank, nonsingular, positive definite, $|A| \neq 0$.
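A quick check of these rank properties (random and deliberately rank-deficient matrices assumed):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(6, 4))                              # generically full column rank
B = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))    # 4 x 3 but rank 2 by construction

print(np.linalg.matrix_rank(A))          # 4 = min(6, 4)
print(np.linalg.matrix_rank(B))          # 2
print(np.linalg.matrix_rank(A @ B))      # <= min(rank A, rank B) = 2

# A rank-deficient (singular) square matrix has |A| = 0 and no inverse
S = B.T @ B                              # 3 x 3 but rank 2
print(np.isclose(np.linalg.det(S), 0.0)) # True: singular
```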
Spectral decomposition

For an $n \times n$ matrix $A$, an eigenvalue $e$ and its eigenvector $\mathbf{v}$ are defined by:

$A\mathbf{v} = e\mathbf{v}, \quad \mathbf{v} \neq \mathbf{0}, \quad \mathbf{v}'\mathbf{v} = 1 \;\Rightarrow\; \mathbf{v}'A\mathbf{v} = e$

For any symmetric matrix $A$ all $n$ eigenvalues are real; for a covariance matrix they are also nonnegative (i.e., the matrix is positive semi-definite, or nonnegative definite). If the eigenpairs are written collectively:

$AV = VE, \quad V'V = I, \quad E = \operatorname{diag}(e_1, \dots, e_n)$

$A = VEV', \quad V'AV = E$ (spectral decomposition)

where the eigenvalues are ordered to be successively maximal, with $\operatorname{tr}(E) = \operatorname{tr}(A)$, i.e., $\sum_{j=1}^{n} e_j = \operatorname{tr}(A)$.
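A sketch using np.linalg.eigh on a sample covariance matrix (toy deviation-score data assumed), verifying $A = VEV'$, $V'AV = E$, and $\operatorname{tr}(E) = \operatorname{tr}(A)$:

```python
import numpy as np

rng = np.random.default_rng(6)
Z = rng.normal(size=(50, 4))
Z = Z - Z.mean(axis=0)
A = Z.T @ Z / (Z.shape[0] - 1)        # sample covariance matrix (symmetric, nonnegative definite)

evals, V = np.linalg.eigh(A)          # eigh returns eigenvalues in ascending order
evals, V = evals[::-1], V[:, ::-1]    # reorder so eigenvalues are successively maximal
E = np.diag(evals)

assert np.allclose(V.T @ V, np.eye(4))        # orthonormal eigenvectors
assert np.allclose(A, V @ E @ V.T)            # A = V E V'
assert np.allclose(V.T @ A @ V, E)            # V' A V = E
assert np.isclose(np.trace(E), np.trace(A))   # sum of eigenvalues equals the trace
assert np.all(evals >= -1e-12)                # nonnegative eigenvalues
```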
Singular value decomposition

While the eigenvalue decomposition is defined for square matrices (with the spectral decomposition as a special case), the SVD is defined more generally for any rectangular matrix as:

$X = UTV', \quad U'U = V'V = I$

where $T$ is a diagonal matrix with nonnegative singular values on the diagonal, and the columns of $U$ and $V$ are orthonormal singular vectors.

For example, if $Z$ is a matrix of deviation scores, the principal component model of its sample covariance matrix has a simple relationship to its SVD:

$Z = UTV' = FV'$

where $F = UT$ contains the principal component scores.
In terms of the data covariance matrix,

$S = \frac{1}{N-1} Z'Z = \frac{1}{N-1} VTU'UTV' = V\left(\frac{1}{N-1}T^2\right)V' = VEV'$

If the factor-analysis convention of scaling is desired (i.e., components/factors scaled to have variance one), the constant moves from the scores to the loadings:

$Z = UTV' = \left(\sqrt{N-1}\,U\right)\left(\frac{1}{\sqrt{N-1}}\,VT\right)' = FA'$

so that the rescaled scores $F = \sqrt{N-1}\,U$ have unit variances while the loadings $A = \frac{1}{\sqrt{N-1}}VT$ carry the scale.

Alternatively, taking the SVD of $\frac{1}{\sqrt{N-1}}Z$ removes the constant scaling factor, so that

$\frac{1}{\sqrt{N-1}}Z = UTV' \quad \text{with} \quad T^2 = E$
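A sketch relating the SVD of deviation scores $Z$ to the spectral decomposition of $S = Z'Z/(N-1)$ and to the factor-analysis scaling above (toy data assumed; note that np.linalg.svd returns $V'$ rather than $V$):

```python
import numpy as np

rng = np.random.default_rng(7)
N, p = 40, 3
Z = rng.normal(size=(N, p))
Z = Z - Z.mean(axis=0)                     # deviation scores

U, t, Vt = np.linalg.svd(Z, full_matrices=False)
T = np.diag(t)
assert np.allclose(Z, U @ T @ Vt)          # Z = U T V'

S = Z.T @ Z / (N - 1)
E = T ** 2 / (N - 1)                       # eigenvalues of S = squared singular values / (N-1)
assert np.allclose(S, Vt.T @ E @ Vt)       # S = V E V'

# Factor-analysis scaling: scores with unit variance, loadings carry the scale
F = np.sqrt(N - 1) * U
A = Vt.T @ T / np.sqrt(N - 1)
assert np.allclose(Z, F @ A.T)
assert np.allclose(F.var(axis=0, ddof=1), 1.0)   # each rescaled score has variance 1
```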
Lower-rank approximation (Eckart-Young theorem)

Suppose we want to extract $R$ principal components from an $N \times p$, rank-$p$ data matrix $X$ ($N > p > R$). Then they are given by the $R$ largest singular values and their singular vectors:

$X = UTV' = U_1T_1V_1' + U_2T_2V_2'$

$U = [U_1, U_2], \quad V = [V_1, V_2], \quad T = \begin{bmatrix} T_1 & 0 \\ 0 & T_2 \end{bmatrix}$

where $U_1T_1V_1'$ represents the $R$-dimensional subspace in which the data variance is maximally captured, and $U_2T_2V_2'$ indicates the $(p - R)$-dimensional subspace orthogonal to it. The $R$ columns of $V_1$ are orthogonal reference axes of the $R$-dimensional space, and the rows of $U_1T_1$ are the coordinates of the $N$ observations projected onto this space.
Likewise, the rank-$R$ approximation can be shown for the covariance matrix $S = \frac{1}{N-1}Z'Z$ by the spectral decomposition:

$S = VEV' = V_1E_1V_1' + V_2E_2V_2'$

$V = [V_1, V_2], \quad E = \begin{bmatrix} E_1 & 0 \\ 0 & E_2 \end{bmatrix}$

where $V_1E_1V_1'$ is the rank-$R$ approximation to $S$ that minimizes the SS of the residuals.

In addition, the left singular vectors of $Z$ can be found by the spectral decomposition of $\frac{1}{N-1}ZZ'$ as:

$\frac{1}{N-1}ZZ' = UEU' = U_1E_1U_1' + U_2E_2U_2', \quad U = [U_1, U_2]$
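A sketch of the rank-$R$ (Eckart-Young) approximation from the leading singular values and vectors, together with the corresponding rank-$R$ approximation of $S$ (toy data and $R = 2$ assumed):

```python
import numpy as np

rng = np.random.default_rng(8)
N, p, R = 30, 5, 2
Z = rng.normal(size=(N, p))
Z = Z - Z.mean(axis=0)

U, t, Vt = np.linalg.svd(Z, full_matrices=False)
U1, T1, V1t = U[:, :R], np.diag(t[:R]), Vt[:R, :]

Z_R = U1 @ T1 @ V1t                    # best rank-R approximation of Z (least squares)
coords = U1 @ T1                       # coordinates of the N observations on the R axes in V1

# Rank-R approximation of the covariance matrix from the spectral decomposition
S = Z.T @ Z / (N - 1)
E1 = T1 ** 2 / (N - 1)
S_R = V1t.T @ E1 @ V1t                 # V1 E1 V1'

# Residual SS of Z_R equals the sum of the discarded squared singular values
resid_ss = np.sum((Z - Z_R) ** 2)
assert np.isclose(resid_ss, np.sum(t[R:] ** 2))
```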
Further operations

The vec operation vectorizes a matrix by stacking its columns one below another:

$\operatorname{vec}(A_{m \times n}) = \begin{bmatrix} \mathbf{a}_1 \\ \vdots \\ \mathbf{a}_n \end{bmatrix}_{mn \times 1}$

The Kronecker product of $A$ and $B$, of any orders, is defined as:

$A_{m \times n} \otimes B_{p \times q} = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}_{mp \times nq}$
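A short numpy sketch of vec (column stacking, i.e., Fortran-order flattening) and the Kronecker product (small toy matrices assumed):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

# vec stacks the columns of A one below another
vecA = A.flatten(order="F").reshape(-1, 1)
print(vecA.ravel())          # [1 3 2 4]

# Kronecker product: each a_ij is replaced by the block a_ij * B
K = np.kron(A, B)
print(K.shape)               # (4, 4) = (m*p, n*q)
assert np.allclose(K[:2, :2], A[0, 0] * B)   # top-left block is a_11 * B
```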