Mathematical Foundations of Applied Statistics: Matrix Algebra


Mathematical Foundations of Applied Statistics: Matrix Algebra

Steffen Unkel, Department of Medical Statistics, University Medical Center Göttingen, Germany

Winter term 2018/19

Literature

Seber, G. A. F. (2008): A Matrix Handbook for Statisticians, Wiley.
Gruber, M. H. J. (2014): Matrix Algebra for Linear Models, Wiley.
Fieller, N. (2016): Basics of Matrix Algebra for Statistics with R, Chapman & Hall/CRC Press.
Schmidt, K. and Trenkler, G. (2015): Einführung in die Moderne Matrix-Algebra, 3rd edition, Springer Gabler.
Lang, S. (2005): Matrixalgebra mit einer Einführung in lineare Modelle, https://www.uibk.ac.at/statistics/personal/lang/publications/matrixalgebra.pdf

Outline

1 Notation
2 Operations
3 Partitioned matrices
4 Linear independence, subspaces and rank
5 Matrix inverse
6 Definiteness of matrices and quadratic forms
7 Systems of linear equations
8 Generalized inverses and Moore-Penrose inverse
9 Determinants
10 Orthogonality
11 Trace
12 Eigenvalues and eigenvectors
13 Idempotent matrices
14 Matrix decompositions
15 Vector and matrix differentiation

1 Notation

A rectangular array of the form
$$A = (a_{ij}) = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{np} \end{pmatrix},$$
where $a_{ij} \in \mathbb{R}$ $(i = 1, \ldots, n;\ j = 1, \ldots, p)$, is a matrix of size or dimension $n \times p$. The symbol $\mathbb{R}$ represents the set of real numbers.

The first subscript in $a_{ij}$ indicates the row, the second identifies the column.

Other notations are $A \in \mathbb{R}^{n \times p}$ or $A_{n \times p}$, where $\mathbb{R}^{n \times p}$ denotes the vector space of all $n$-by-$p$ real matrices.

Matrices are denoted by uppercase letters in bold-faced type.

1 Notation

If we interchange the rows and columns of a matrix $A$, the resulting matrix is known as the transpose of $A$ and is denoted by $A'$ (or $A^\top$); for example
$$A = \begin{pmatrix} 6 & 2 \\ 4 & 7 \\ 1 & 3 \end{pmatrix}, \qquad A' = \begin{pmatrix} 6 & 4 & 1 \\ 2 & 7 & 3 \end{pmatrix}.$$

Formally, if $A$ is denoted by $A = (a_{ij})$, then $A'$ is defined as
$$A' = (a'_{ij}) = (a_{ji}).$$

If $A \in \mathbb{R}^{n \times p}$, then $A' \in \mathbb{R}^{p \times n}$.

If $A$ is any matrix, then $(A')' = A$.

1 Notation

A vector is a matrix with a single row or column. Vectors are denoted by lowercase letters in bold-faced type.

A vector $a \in \mathbb{R}^n$ is considered to be a column vector, and elements in a vector are often identified by a single subscript; for example
$$a = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}.$$

A row vector is regarded as the transpose of a column vector. Thus, if $a$ is a column vector, then $a'$ is a row vector.

A single real number is called a scalar. A variable representing a scalar is denoted by a lowercase italic letter, such as $c$.

1 Notation

Two matrices or two vectors are equal if they are of the same size and if all the elements in corresponding positions are equal.

If $A = A'$, then the matrix $A$ is said to be symmetric; for example
$$A = \begin{pmatrix} 3 & 2 & 6 \\ 2 & 10 & 7 \\ 6 & 7 & 9 \end{pmatrix}$$
is symmetric. All symmetric matrices are square.

1 Notation

Square $p \times p$ matrix:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{pmatrix}.$$

The main diagonal of $A$ consists of the elements $a_{11}, a_{22}, \ldots, a_{pp}$.

If a matrix contains zeros in all off-diagonal positions, it is said to be a diagonal matrix.

1 Notation

For example, consider the matrix
$$D = \begin{pmatrix} 8 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix}.$$
This matrix can also be defined as $D = \mathrm{diag}(8, 3, 0, 4)$.

We also use the notation $\mathrm{diag}(A)$ to indicate a diagonal matrix with the same diagonal elements as $A$; for example
$$A = \begin{pmatrix} 3 & 2 & 6 \\ 2 & 10 & 7 \\ 6 & 7 & 9 \end{pmatrix}, \qquad \mathrm{diag}(A) = \begin{pmatrix} 3 & 0 & 0 \\ 0 & 10 & 0 \\ 0 & 0 & 9 \end{pmatrix}.$$

1 Notation

A diagonal matrix with a 1 in each diagonal position is called an identity matrix and is denoted by $I$ (or $I_n$ to emphasize that the matrix is of order $n$); for example
$$I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

A lower triangular matrix is a square matrix with zeros above the diagonal; for example
$$L = \begin{pmatrix} 7 & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 3 & 2 & 4 & 0 \\ 5 & 6 & 1 & 8 \end{pmatrix}.$$
An upper triangular matrix is defined similarly.

1 Notation

A vector of $n$ ones is denoted by $\mathbf{1}_n$; for example $\mathbf{1}_3 = (1, 1, 1)'$.

We define $J_n = \mathbf{1}_n \mathbf{1}_n'$.

An $n \times p$ matrix of zeros is denoted by $O_{n \times p}$ and a vector of $n$ zeros by $\mathbf{0}_n$.

If the dimension of the matrix $I_n$, $J_n$ or $O_{n \times p}$ is clear from the context, the subscript is omitted. The same holds for the vectors $\mathbf{1}_n$ and $\mathbf{0}_n$.

1 Notation

Exercises 1-4

2 Operations - Sum of two matrices or two vectors

If two matrices or two vectors are of the same size, they are said to be conformal for addition. Their sum is found by adding corresponding elements. Thus, if $A$ is $n \times p$ and $B$ is $n \times p$, then $C = A + B$ is also $n \times p$ and is found as $C = (c_{ij}) = (a_{ij} + b_{ij})$.

The difference $C = A - B$ between two conformal matrices $A$ and $B$ is defined similarly: $C = (c_{ij}) = (a_{ij} - b_{ij})$.

If $A$ and $B$ are both $n \times p$, then
i. $A + B = B + A$.
ii. $(A + B)' = A' + B'$.

2 Operations - Product of a scalar and a matrix

Any scalar can be multiplied by any matrix. The product of a scalar and a matrix is defined as the product of each element of the matrix and the scalar:
$$cA = (ca_{ij}) = \begin{pmatrix} ca_{11} & ca_{12} & \cdots & ca_{1p} \\ ca_{21} & ca_{22} & \cdots & ca_{2p} \\ \vdots & \vdots & & \vdots \\ ca_{n1} & ca_{n2} & \cdots & ca_{np} \end{pmatrix}.$$

Since $ca_{ij} = a_{ij} c$, the product of a scalar and a matrix is commutative: $cA = Ac$.

2 Operations - Product of two matrices or two vectors

In order for the product $AB$ to be defined, the number of columns in $A$ must equal the number of rows in $B$, in which case $A$ and $B$ are said to be conformal for multiplication. Then, the $(ij)$th element of the product $C = AB$ is defined as
$$c_{ij} = \sum_k a_{ik} b_{kj},$$
which is the sum of products of the elements in the $i$th row of $A$ and the elements in the $j$th column of $B$. If $A$ is $n \times m$ and $B$ is $m \times p$, then $C = AB$ is $n \times p$.
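
The definition translates directly into code. A minimal NumPy sketch (the matrix entries are arbitrary illustrative values), with the elementwise formula checked against the built-in matrix product:

    import numpy as np

    A = np.array([[1., 2., 3.],
                  [4., 5., 6.]])               # n x m = 2 x 3
    B = np.array([[7., 8.],
                  [9., 10.],
                  [11., 12.]])                 # m x p = 3 x 2

    n, m = A.shape
    assert m == B.shape[0], "A and B must be conformal for multiplication"
    p = B.shape[1]

    # c_ij = sum_k a_ik * b_kj
    C = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            C[i, j] = sum(A[i, k] * B[k, j] for k in range(m))

    assert np.allclose(C, A @ B)               # agrees with NumPy's product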

2 Operations - Product of two matrices or two vectors

In general, $AB \neq BA$. Thus, matrix multiplication is not commutative.

Matrix multiplication is distributive over addition or subtraction:
i. $A(B \pm C) = AB \pm AC$,
ii. $(A \pm B)C = AC \pm BC$.

Multiplication involving vectors follows the same rules as for matrices.

2 Operations - Product of two matrices or two vectors

For example, if $b$ is $p \times 1$, then
$$b'b = b_1^2 + b_2^2 + \cdots + b_p^2$$
is a sum of squares, and
$$bb' = \begin{pmatrix} b_1^2 & b_1 b_2 & \cdots & b_1 b_p \\ b_2 b_1 & b_2^2 & \cdots & b_2 b_p \\ \vdots & \vdots & & \vdots \\ b_p b_1 & b_p b_2 & \cdots & b_p^2 \end{pmatrix}.$$

The distance from the origin to the point $b$ is also referred to as the length or norm of $b$:
$$\|b\| = \sqrt{b'b} = \sqrt{\sum_{i=1}^p b_i^2}.$$

The distance between conformal $a$ and $b$ is defined to be $d(a, b) = \|a - b\|$.

2 Operations - Product of two matrices or two vectors

If $A$ is $n \times p$ and $B$ is $p \times m$, then $(AB)' = B'A'$.

If $A$, $B$, and $C$ are conformal so that $ABC$ is defined, then $(ABC)' = C'B'A'$.

Let $A$ be any $n \times p$ matrix. Then $A'A$ and $AA'$ have the following properties:
i. $A'A$ is $p \times p$ and its elements are (inner) products of the columns of $A$.
ii. $AA'$ is $n \times n$ and its elements are (inner) products of the rows of $A$.
iii. Both $A'A$ and $AA'$ are symmetric.

2 Operations - Product of two matrices or two vectors

If $A$ is a symmetric $p \times p$ matrix and $y$ is a $p \times 1$ vector, the product
$$y'Ay = \sum_i a_{ii} y_i^2 + \sum_{i \neq j} a_{ij} y_i y_j$$
is called a quadratic form.

If $x$ is $n \times 1$, $y$ is $p \times 1$, and $A$ is $n \times p$, the product
$$x'Ay = \sum_{i,j} a_{ij} x_i y_j$$
is called a bilinear form.

2 Operations - Hadamard product

If two matrices $A$ and $B$ are of the same size, the Hadamard (elementwise) product $A \odot B$ is found by multiplying corresponding elements:
$$A \odot B = (a_{ij} b_{ij}) = \begin{pmatrix} a_{11} b_{11} & a_{12} b_{12} & \cdots & a_{1p} b_{1p} \\ a_{21} b_{21} & a_{22} b_{22} & \cdots & a_{2p} b_{2p} \\ \vdots & \vdots & & \vdots \\ a_{n1} b_{n1} & a_{n2} b_{n2} & \cdots & a_{np} b_{np} \end{pmatrix}.$$

2 Operations - Kronecker product

Let $A$ be an $m \times n$ and $B$ a $p \times q$ matrix. The Kronecker product of $A$ and $B$ is defined by the $mp \times nq$ matrix
$$A \otimes B = (a_{ij} B) = \begin{pmatrix} a_{11} B & a_{12} B & \cdots & a_{1n} B \\ a_{21} B & a_{22} B & \cdots & a_{2n} B \\ \vdots & \vdots & & \vdots \\ a_{m1} B & a_{m2} B & \cdots & a_{mn} B \end{pmatrix}.$$

The following are properties of the Kronecker product:
i. Assume that $A$ and $B$ are of the same size. Then
(a) $(A + B) \otimes C = A \otimes C + B \otimes C$.
(b) $C \otimes (A + B) = C \otimes A + C \otimes B$.
ii. Assuming $A$, $B$, $C$, and $D$ have appropriate dimensions so that $AC$ and $BD$ are defined, $(A \otimes B)(C \otimes D) = AC \otimes BD$.
iii. The transpose: $(A \otimes B)' = A' \otimes B'$.

2 Operations - Vec-operator

The vec-operator transforms a matrix into a vector by stacking all the columns of this matrix one underneath the other. Let $A = (a_1, a_2, \ldots, a_n)$ be an $m \times n$ matrix. Then
$$\mathrm{vec}(A) = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.$$

Some properties:
i. $\mathrm{vec}(A + B) = \mathrm{vec}(A) + \mathrm{vec}(B)$.
ii. $\mathrm{vec}(ABC) = (C' \otimes A)\,\mathrm{vec}(B)$.
iii. $\mathrm{vec}(AB) = (I \otimes A)\,\mathrm{vec}(B) = (B' \otimes I)\,\mathrm{vec}(A)$.
iv. $\mathrm{vec}(x') = \mathrm{vec}(x) = x$.
v. $\mathrm{vec}(xy') = y \otimes x$.
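
Property ii and the Kronecker mixed-product rule from the previous slide are easy to verify numerically. A small sketch with random matrices (note that NumPy stores arrays row-major, so column stacking needs order="F"):

    import numpy as np

    def vec(M):
        # stack the columns of M one underneath the other
        return M.reshape(-1, order="F")

    rng = np.random.default_rng(0)
    A = rng.normal(size=(2, 3))
    B = rng.normal(size=(3, 4))
    C = rng.normal(size=(4, 5))

    # property ii: vec(ABC) = (C' kron A) vec(B)
    assert np.allclose(vec(A @ B @ C), np.kron(C.T, A) @ vec(B))

    # mixed-product rule: (A kron B)(C2 kron D) = (A C2) kron (B D)
    C2 = rng.normal(size=(3, 5))
    D = rng.normal(size=(4, 2))
    assert np.allclose(np.kron(A, B) @ np.kron(C2, D),
                       np.kron(A @ C2, B @ D))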

2 Operations - Direct sum of matrices

The direct sum of any pair of matrices $A$ of size $m \times n$ and $B$ of size $p \times q$ is a matrix of size $(m + p) \times (n + q)$ defined as
$$A \oplus B = \begin{pmatrix} A & O_{m \times q} \\ O_{p \times n} & B \end{pmatrix}.$$

The direct sum of matrices is a special type of block or partitioned matrix. In general, the direct sum of $n$ matrices is
$$\bigoplus_{i=1}^n A_i = \mathrm{diag}(A_1, A_2, \ldots, A_n) = \begin{pmatrix} A_1 & O & \cdots & O \\ O & A_2 & \cdots & O \\ \vdots & \vdots & \ddots & \vdots \\ O & O & \cdots & A_n \end{pmatrix}.$$

2 Operations

Exercises 5-13

3 Partitioned matrices

It is sometimes convenient to partition a matrix into submatrices. For example, a partitioning of a matrix $A$ into four (square or rectangular) submatrices of appropriate sizes can be indicated symbolically as follows:
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$

To illustrate, let the $4 \times 5$ matrix $A$ be partitioned (for instance after the second row and third column) as
$$A = \left(\begin{array}{ccc|cc} 7 & 2 & 5 & 8 & 4 \\ 3 & 4 & 0 & 2 & 7 \\ \hline 9 & 3 & 6 & 5 & 2 \\ 3 & 1 & 2 & 1 & 6 \end{array}\right) = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}.$$

3 Partitioned matrices

If two matrices $A$ and $B$ are conformal for multiplication, and if $A$ and $B$ are partitioned so that the submatrices are appropriately conformal, then the product $AB$ can be found using the usual pattern of row-by-column multiplication with the submatrices as if they were single elements. For example,
$$AB = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} A_{11} B_{11} + A_{12} B_{21} & A_{11} B_{12} + A_{12} B_{22} \\ A_{21} B_{11} + A_{22} B_{21} & A_{21} B_{12} + A_{22} B_{22} \end{pmatrix}.$$
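
Block multiplication is easy to verify numerically. A sketch with randomly filled, conformally partitioned blocks (shapes chosen arbitrarily):

    import numpy as np

    rng = np.random.default_rng(1)
    # A is 4 x 5, B is 5 x 3, partitioned conformally
    A11, A12 = rng.normal(size=(2, 3)), rng.normal(size=(2, 2))
    A21, A22 = rng.normal(size=(2, 3)), rng.normal(size=(2, 2))
    B11, B12 = rng.normal(size=(3, 2)), rng.normal(size=(3, 1))
    B21, B22 = rng.normal(size=(2, 2)), rng.normal(size=(2, 1))

    A = np.block([[A11, A12], [A21, A22]])
    B = np.block([[B11, B12], [B21, B22]])

    # row-by-column multiplication with the submatrices as single elements
    AB_blocks = np.block([
        [A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
        [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22],
    ])
    assert np.allclose(AB_blocks, A @ B)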

3 Partitioned matrices

If $B$ is replaced by a vector $b$ partitioned into two sets of elements, and if $A$ is correspondingly partitioned into two sets of columns, then the previous equation becomes
$$Ab = (A_1, A_2) \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = A_1 b_1 + A_2 b_2,$$
where the number of columns of $A_1$ is equal to the number of elements of $b_1$, and $A_2$ and $b_2$ are similarly conformal.

The partitioned multiplication above can be extended to individual columns of $A$ and individual elements of $b$:
$$Ab = (a_1, a_2, \ldots, a_p) \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_p \end{pmatrix} = b_1 a_1 + b_2 a_2 + \cdots + b_p a_p.$$

3 Partitioned matrices

Thus, $Ab$ is expressible as a linear combination of the columns of $A$, in which the coefficients are elements of $b$.

The product of a row vector and a matrix, $a'B$, can be expressed as a linear combination of the rows of $B$, in which the coefficients are elements of $a'$:
$$a'B = (a_1, a_2, \ldots, a_n) \begin{pmatrix} b_1' \\ b_2' \\ \vdots \\ b_n' \end{pmatrix} = a_1 b_1' + a_2 b_2' + \cdots + a_n b_n'.$$

If a matrix $A$ is partitioned as $A = (A_1, A_2)$, then
$$A' = (A_1, A_2)' = \begin{pmatrix} A_1' \\ A_2' \end{pmatrix}.$$

3 Partitioned matrices

Exercise 14

4 Linear independence, subspaces and rank

A set of vectors $\{a_1, a_2, \ldots, a_p\}$ in $\mathbb{R}^n$ is said to be linearly dependent if scalars $c_1, c_2, \ldots, c_p$ (not all zero) can be found such that
$$c_1 a_1 + c_2 a_2 + \cdots + c_p a_p = \mathbf{0}.$$

If no such scalars exist, the set of vectors $a_1, a_2, \ldots, a_p$ is said to be linearly independent.

A subspace of $\mathbb{R}^n$ is a subset that is also a vector space. Given a collection $a_1, a_2, \ldots, a_p \in \mathbb{R}^n$, we define the following subspace as the span of $\{a_1, a_2, \ldots, a_p\}$:
$$\mathrm{span}\{a_1, a_2, \ldots, a_p\} = \left\{ \sum_{j=1}^p b_j a_j : b_j \in \mathbb{R} \right\}.$$

4 Linear independence, subspaces and rank

If $S \subseteq \mathbb{R}^n$ is a subspace, then it is possible to find linearly independent basis vectors $a_1, a_2, \ldots, a_k \in S$ such that
$$S = \mathrm{span}\{a_1, a_2, \ldots, a_k\}.$$

All bases for a subspace $S$ have the same number of elements. This number is the dimension and is denoted by $\dim(S)$.

4 Linear independence, subspaces and rank

There are two important subspaces associated with an $n \times p$ matrix $A$.

The range of $A$ is defined by
$$\mathrm{range}(A) = \{ y \in \mathbb{R}^n \mid y = Ax \text{ for some } x \in \mathbb{R}^p \}.$$

The null space of $A$ is defined by
$$\mathrm{null}(A) = \{ x \in \mathbb{R}^p \mid Ax = \mathbf{0} \}.$$

If $A \in \mathbb{R}^{n \times p}$, then $\dim(\mathrm{null}(A)) + \dim(\mathrm{range}(A)) = p$.

4 Linear independence, subspaces and rank

The rank of any rectangular matrix $A$ is defined as
$$\mathrm{rank}(A) = \text{number of linearly independent columns of } A = \text{number of linearly independent rows of } A = \dim(\mathrm{range}(A)).$$

Suppose a rectangular matrix $A$ is $n \times p$ of rank $p$, where $p < n$. Then $A$ is said to be of full rank. We say that $A$ is rank deficient if $\mathrm{rank}(A) < p$.

The maximum possible rank of an $n \times p$ matrix $A$ cannot be greater than either $n$ or $p$; that is, $\mathrm{rank}(A) \leq \min\{n, p\}$. Thus, in a rectangular matrix with $n \neq p$, the rows or the columns (or both) are linearly dependent.

4 Linear independence, subspaces and rank

A common approach to finding the rank of a matrix is to reduce it to a simpler form, generally row echelon form, by elementary row operations (see the sketch below). A matrix is in row echelon form when it satisfies the following conditions:
1 all nonzero rows (rows with at least one nonzero element) are above any rows of all zeroes, and
2 the leading coefficient (the first nonzero number from the left, also called the pivot) of a nonzero row is always strictly to the right of the leading coefficient of the row above it.

Once in row echelon form, the rank equals the number of pivots and also the number of nonzero rows.
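
A sketch of rank computation by reduction to row echelon form (rank_by_elimination is an illustrative helper, with a tolerance guarding against floating-point noise), checked against np.linalg.matrix_rank:

    import numpy as np

    def rank_by_elimination(A, tol=1e-10):
        # rank = number of pivots after reduction to row echelon form
        R = np.array(A, dtype=float)
        n, p = R.shape
        rank, row = 0, 0
        for col in range(p):
            piv = row + np.argmax(np.abs(R[row:, col]))   # best pivot candidate
            if abs(R[piv, col]) < tol:
                continue                                  # no pivot in this column
            R[[row, piv]] = R[[piv, row]]                 # swap rows
            R[row] = R[row] / R[row, col]                 # scale pivot to 1
            for r in range(row + 1, n):
                R[r] -= R[r, col] * R[row]                # eliminate below pivot
            rank, row = rank + 1, row + 1
            if row == n:
                break
        return rank

    A = np.array([[1., 2., 3.], [2., 4., 6.], [1., 0., 1.]])  # second row = 2 x first
    assert rank_by_elimination(A) == np.linalg.matrix_rank(A) == 2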

4 Linear independence, subspaces and rank

The following properties hold:
i. $\mathrm{rank}(A + B) \leq \mathrm{rank}(A) + \mathrm{rank}(B)$.
ii. $\mathrm{rank}(AB) \leq \mathrm{rank}(A)$ and $\mathrm{rank}(AB) \leq \mathrm{rank}(B)$, hence $\mathrm{rank}(AB) \leq \min\{\mathrm{rank}(A), \mathrm{rank}(B)\}$.
iii. Multiplication by a full rank square matrix does not change the rank; that is, if $B$ and $C$ are full rank square matrices, $\mathrm{rank}(AB) = \mathrm{rank}(CA) = \mathrm{rank}(A)$.
iv. For any matrix $A$, $\mathrm{rank}(A'A) = \mathrm{rank}(AA') = \mathrm{rank}(A') = \mathrm{rank}(A)$.

4 Linear independence, subspaces and rank

Exercises 15-17

5 Matrix inverse

A full rank square matrix is said to be nonsingular. A nonsingular matrix $A$ has a unique inverse, denoted by $A^{-1}$, with the property that
$$A A^{-1} = A^{-1} A = I.$$

From this definition it is clear that $(A^{-1})^{-1} = A$.

If $A$ is square and rank deficient, then it does not have an inverse and is said to be singular.

5 Matrix inverse

The inverse of a nonsingular matrix $A$ can be found using the Gauss-Jordan method. The inverse $A^{-1}$ is obtained by transforming the augmented matrix $[A \ I]$, using elementary row operations, into the matrix $[I \ A^{-1}]$.

A method for finding the inverse using determinants will be presented later.
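
A minimal sketch of the Gauss-Jordan method (gauss_jordan_inverse is an illustrative helper, not a library routine; partial pivoting is added for numerical stability):

    import numpy as np

    def gauss_jordan_inverse(A):
        # transform [A | I] into [I | A^{-1}] by elementary row operations
        A = np.array(A, dtype=float)
        n = A.shape[0]
        M = np.hstack([A, np.eye(n)])                    # augmented matrix [A I]
        for col in range(n):
            piv = col + np.argmax(np.abs(M[col:, col]))  # partial pivoting
            if np.isclose(M[piv, col], 0.0):
                raise np.linalg.LinAlgError("matrix is singular")
            M[[col, piv]] = M[[piv, col]]                # swap rows
            M[col] /= M[col, col]                        # make the pivot 1
            for r in range(n):
                if r != col:
                    M[r] -= M[r, col] * M[col]           # clear the rest of the column
        return M[:, n:]

    A = np.array([[4., 7.], [2., 6.]])
    assert np.allclose(gauss_jordan_inverse(A), np.linalg.inv(A))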

5 Matrix inverse

If $B$ is nonsingular and $AB = CB$, then we can multiply on the right by $B^{-1}$ to obtain $A = C$.

Similarly, if $A$ is nonsingular, the system of equations $Ax = c$ has the unique solution $x = A^{-1} c$.

Two properties of inverses:
i. If $A$ is nonsingular, then $A'$ is nonsingular and its inverse can be found as $(A')^{-1} = (A^{-1})'$.
ii. If $A$ and $B$ are nonsingular matrices of the same size, then $AB$ is nonsingular and $(AB)^{-1} = B^{-1} A^{-1}$.

5 Matrix inverse

If $A$ is symmetric, nonsingular, and is partitioned as
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
and if $B = A_{22} - A_{21} A_{11}^{-1} A_{12}$, then, provided $A_{11}^{-1}$ and $B^{-1}$ exist, the inverse of $A$ is given by
$$A^{-1} = \begin{pmatrix} A_{11}^{-1} + A_{11}^{-1} A_{12} B^{-1} A_{21} A_{11}^{-1} & -A_{11}^{-1} A_{12} B^{-1} \\ -B^{-1} A_{21} A_{11}^{-1} & B^{-1} \end{pmatrix}.$$

5 Matrix inverse

Exercises 18-19

6 Definiteness of matrices and quadratic forms

Any quadratic form $y'Ay$ can be expressed as
$$y'Ay = y' \left( \frac{A + A'}{2} \right) y,$$
and thus the matrix of a quadratic form can always be chosen to be symmetric.

If the symmetric matrix $A$ has the property $y'Ay > 0$ for all possible $y$ except $y = \mathbf{0}$, then the quadratic form $y'Ay$ is said to be positive definite, and $A$ is said to be a positive definite matrix.

Similarly, if $y'Ay \geq 0$ for all $y$ and there is at least one $y \neq \mathbf{0}$ such that $y'Ay = 0$, then $y'Ay$ and $A$ are said to be positive semidefinite.

6 Definiteness of matrices and quadratic forms

$A$ is a negative definite matrix if $-A$ is positive definite. $A$ is a negative semidefinite matrix if $-A$ is positive semidefinite.

A matrix which is neither positive definite, negative definite, positive semidefinite, nor negative semidefinite is called indefinite.

6 Definiteness of matrices and quadratic forms

If $A$ is positive definite, then all its diagonal elements $a_{ii}$ are positive. If $A$ is positive semidefinite, then all $a_{ii} \geq 0$.

A positive definite matrix is nonsingular. If $A$ is positive definite, then $A^{-1}$ is positive definite.

Let $B$ be an $n \times p$ matrix.
i. If $\mathrm{rank}(B) = p$, then $B'B$ is positive definite.
ii. If $\mathrm{rank}(B) < p$, then $B'B$ is positive semidefinite.
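
In practice, definiteness of a symmetric matrix is conveniently checked via the signs of its eigenvalues (the eigenvalue characterization appears in Section 12). A small sketch, with a tolerance for floating-point noise:

    import numpy as np

    def definiteness(A, tol=1e-10):
        # classify a symmetric matrix by the signs of its eigenvalues
        lam = np.linalg.eigvalsh(A)          # real eigenvalues, ascending
        if np.all(lam > tol):
            return "positive definite"
        if np.all(lam >= -tol):
            return "positive semidefinite"
        if np.all(lam < -tol):
            return "negative definite"
        if np.all(lam <= tol):
            return "negative semidefinite"
        return "indefinite"

    assert definiteness(np.array([[2., -1.], [-1., 2.]])) == "positive definite"
    assert definiteness(np.array([[1., 0.], [0., -1.]])) == "indefinite"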

6 Definiteness of matrices and quadratic forms

Note that if $B$ is a square matrix, the matrix $BB = B^2$ is not necessarily positive semidefinite. For example, let
$$B = \begin{pmatrix} 1 & -2 \\ 1 & -2 \end{pmatrix}.$$
Then
$$B^2 = \begin{pmatrix} -1 & 2 \\ -1 & 2 \end{pmatrix}, \qquad B'B = \begin{pmatrix} 2 & -4 \\ -4 & 8 \end{pmatrix}.$$
In this case, $B^2$ is not positive semidefinite, but $B'B$ is positive semidefinite, since $y'B'By = 2(y_1 - 2y_2)^2$.

6 Definiteness of matrices and quadratic forms

If $A$ is positive definite and is partitioned in the form
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
where $A_{11}$ and $A_{22}$ are square, then $A_{11}$ and $A_{22}$ are positive definite.

6 Definiteness of matrices and quadratic forms

Exercises 20-21

7 Systems of linear equations

The system of $n$ linear equations in $p$ unknowns
$$\begin{aligned} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1p} x_p &= c_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2p} x_p &= c_2 \\ &\ \,\vdots \\ a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{np} x_p &= c_n \end{aligned}$$
can be written in matrix form as
$$Ax = c,$$
where $A$ is $n \times p$, $x$ is $p \times 1$ and $c$ is $n \times 1$.

7 Systems of linear equations

If $n = p$ and $A$ is nonsingular, then there exists a unique solution vector $x$ obtained as $x = A^{-1} c$.

If the system of equations $Ax = c$ has one or more solution vectors, it is said to be consistent. If the system has no solution, it is said to be inconsistent.

The system of equations $Ax = c$ has at least one solution vector if and only if $\mathrm{rank}(A) = \mathrm{rank}([A \ c])$, where $[A \ c]$ is an augmented matrix in which $c$ has been appended to the coefficient matrix $A$ as an additional column.

A consistent system of equations can be solved by the usual methods for eliminating variables. A method involving generalized inverses is given later.
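
The rank criterion is directly checkable. A small sketch with a rank-1 coefficient matrix (is_consistent is an illustrative helper):

    import numpy as np

    def is_consistent(A, c):
        # Ax = c has a solution iff rank(A) == rank([A c])
        aug = np.hstack([A, c.reshape(-1, 1)])
        return np.linalg.matrix_rank(A) == np.linalg.matrix_rank(aug)

    A = np.array([[1., 2.], [2., 4.]])               # rank 1
    assert is_consistent(A, np.array([3., 6.]))      # c lies in range(A)
    assert not is_consistent(A, np.array([3., 7.]))  # c does not lie in range(A)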

7 Systems of linear equations

Exercise 22

8 Generalized inverses and Moore-Penrose inverse

We now consider generalized inverses of those matrices that do not have inverses in the usual sense.

A generalized inverse of an $n \times p$ matrix $A$ is any $p \times n$ matrix $A^-$ that satisfies
$$A A^- A = A.$$

A generalized inverse is not unique except when $A$ is nonsingular, in which case $A^- = A^{-1}$.

Every matrix, whether square or rectangular, has a generalized inverse. This holds even for vectors.

8 Generalized inverses and Moore-Penrose inverse

Let
$$A = \begin{pmatrix} 2 & 2 & 3 \\ 1 & 0 & 1 \\ 3 & 2 & 4 \end{pmatrix}.$$
Let
$$A_1^- = \begin{pmatrix} 0 & 1 & 0 \\ 1/2 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad A_2^- = \begin{pmatrix} 0 & 1 & 0 \\ 0 & -3/2 & 1/2 \\ 0 & 0 & 0 \end{pmatrix}.$$
It is easily verified that $A A_1^- A = A$ and $A A_2^- A = A$.

8 Generalized inverses and Moore-Penrose inverse

Suppose $A$ is $n \times p$ of rank $r$ and that $A$ is partitioned as
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
where $A_{11}$ is $r \times r$ of rank $r$. Then a generalized inverse of $A$ is given by
$$A^- = \begin{pmatrix} A_{11}^{-1} & O \\ O & O \end{pmatrix},$$
where the three $O$ matrices are of appropriate sizes so that $A^-$ is $p \times n$.

The nonsingular submatrix need not be in the $A_{11}$ position.

8 Generalized inverses and Moore-Penrose inverse

Algorithm for finding a generalized inverse $A^-$ for any $n \times p$ matrix $A$ of rank $r$ (a worked sketch follows below):
1 Find any nonsingular $r \times r$ submatrix $C$.
2 Compute $C^{-1}$ and $(C^{-1})'$.
3 Replace the elements of $C$ by the elements of $(C^{-1})'$.
4 Replace all other elements in $A$ by zeros.
5 Transpose the resulting matrix.
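
A sketch of the algorithm applied to the $3 \times 3$ example two slides back, with the nonsingular $2 \times 2$ submatrix $C$ chosen by hand as the upper left block (since $\mathrm{rank}(A) = 2$); the result reproduces $A_1^-$:

    import numpy as np

    A = np.array([[2., 2., 3.],
                  [1., 0., 1.],
                  [3., 2., 4.]])
    assert np.linalg.matrix_rank(A) == 2

    rows, cols = [0, 1], [0, 1]               # step 1: nonsingular 2 x 2 submatrix C
    C = A[np.ix_(rows, cols)]
    C_inv_T = np.linalg.inv(C).T              # steps 2-3: (C^{-1})'

    G = np.zeros_like(A)                      # step 4: zeros elsewhere
    G[np.ix_(rows, cols)] = C_inv_T
    G = G.T                                   # step 5: transpose

    assert np.allclose(A @ G @ A, A)          # G satisfies A A^- A = A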

8 Generalized inverses and Moore-Penrose inverse

Let $A$ be $n \times p$ of rank $r$, let $A^-$ be any generalized inverse of $A$, and let $(A'A)^-$ be any generalized inverse of $A'A$. Then
i. $\mathrm{rank}(A^-A) = \mathrm{rank}(AA^-) = \mathrm{rank}(A) = r$.
ii. $(A^-)'$ is a generalized inverse of $A'$; that is, one may take $(A')^- = (A^-)'$.
iii. $A = A(A'A)^- A'A$ and $A' = A'A(A'A)^- A'$.
iv. $(A'A)^- A'$ is a generalized inverse of $A$; that is, $A^- = (A'A)^- A'$ is one choice.
v. $A(A'A)^- A'$ is symmetric, has rank $r$, and is invariant to the choice of $(A'A)^-$.

8 Generalized inverses and Moore-Penrose inverse

Generalized inverses can be used to find solutions to a system of equations of less than full rank.

If the system of equations $Ax = c$ is consistent and if $A^-$ is any generalized inverse of $A$, then $x = A^- c$ is a solution.

If $Ax = c$ is consistent, then all possible solutions can be obtained by using all possible values of $A^-$ in $x = A^- c$, if $c \neq \mathbf{0}$.

The system of equations $Ax = c$ has a solution if and only if, for any generalized inverse $A^-$ of $A$,
$$A A^- c = c.$$

8 Generalized inverses and Moore-Penrose inverse

The Moore-Penrose inverse of a matrix is a generalized inverse that is always unique. A matrix $A^+$ is the Moore-Penrose inverse of $A$ if
i. $A^+$ is a generalized inverse of $A$ ($A A^+ A = A$).
ii. $A^+$ is reflexive ($A^+ A A^+ = A^+$).
iii. $A^+ A$ is symmetric.
iv. $A A^+$ is symmetric.

The generalized inverse and the Moore-Penrose inverse of a nonsingular matrix are its ordinary inverse.
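
NumPy's pinv computes the Moore-Penrose inverse; the four defining conditions can be verified directly for a random rectangular matrix:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(5, 3))
    Ap = np.linalg.pinv(A)                    # Moore-Penrose inverse A^+

    assert np.allclose(A @ Ap @ A, A)         # i.   A A+ A = A
    assert np.allclose(Ap @ A @ Ap, Ap)       # ii.  A+ A A+ = A+
    assert np.allclose((Ap @ A).T, Ap @ A)    # iii. A+ A symmetric
    assert np.allclose((A @ Ap).T, A @ Ap)    # iv.  A A+ symmetric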

8 Generalized inverses and Moore-Penrose inverse

Exercises 23-24

9 Determinants

The determinant of a square matrix $A$ is denoted $\det(A)$ or $|A|$.

Let $A$ be an $n \times n$ matrix. Let $A_{ij}$ be the $(n-1) \times (n-1)$ submatrix formed by deleting the $i$th row and the $j$th column of $A$. Then formulae for expanding determinants are
$$\det(A) = \sum_{j=1}^n (-1)^{i+j} a_{ij} \det(A_{ij}) \quad (i \text{ fixed})$$
and
$$\det(A) = \sum_{i=1}^n (-1)^{i+j} a_{ij} \det(A_{ij}) \quad (j \text{ fixed}).$$

9 Determinants

For example, expanding along the first row, we have
$$\begin{vmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{vmatrix} = 1 \begin{vmatrix} 2 & 4 \\ 3 & 9 \end{vmatrix} - 1 \begin{vmatrix} 1 & 4 \\ 1 & 9 \end{vmatrix} + 1 \begin{vmatrix} 1 & 2 \\ 1 & 3 \end{vmatrix} = ((2)(9) - (4)(3)) - ((1)(9) - (4)(1)) + ((1)(3) - (2)(1)) = 6 - 5 + 1 = 2.$$

Expanding along the second column instead, we have
$$\begin{vmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \end{vmatrix} = (-1) \begin{vmatrix} 1 & 4 \\ 1 & 9 \end{vmatrix} + 2 \begin{vmatrix} 1 & 1 \\ 1 & 9 \end{vmatrix} - 3 \begin{vmatrix} 1 & 1 \\ 1 & 4 \end{vmatrix} = (-1)((1)(9) - (4)(1)) + 2((1)(9) - (1)(1)) - 3((1)(4) - (1)(1)) = -5 + 16 - 9 = 2.$$
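
The row expansion translates into a short recursive function (illustrative only: the cofactor recursion costs $O(n!)$, whereas np.linalg.det's LU-based method is far cheaper for larger matrices):

    import numpy as np

    def det_cofactor(A, i=0):
        # expand det(A) along row i: sum_j (-1)^(i+j) a_ij det(A_ij)
        n = A.shape[0]
        if n == 1:
            return A[0, 0]
        total = 0.0
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            total += (-1) ** (i + j) * A[i, j] * det_cofactor(minor)
        return total

    A = np.array([[1., 1., 1.], [1., 2., 4.], [1., 3., 9.]])
    assert np.isclose(det_cofactor(A), 2.0)       # matches the worked example
    assert np.isclose(np.linalg.det(A), 2.0)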

9 Determinants

The determinants of some special square matrices are as follows:
i. If $D = \mathrm{diag}(d_1, \ldots, d_n)$, then $\det(D) = \prod_{i=1}^n d_i$.
ii. The determinant of a triangular matrix is the product of the diagonal elements.
iii. If $A$ is singular, then $\det(A) = 0$.
iv. If $A$ is nonsingular, then $\det(A) \neq 0$.
v. If $A$ is positive definite, then $\det(A) > 0$.
vi. $\det(A') = \det(A)$.
vii. If $A$ is nonsingular, then $\det(A^{-1}) = \det(A)^{-1}$.
viii. If an $n \times n$ matrix is multiplied by a scalar, the determinant becomes $\det(cA) = c^n \det(A)$.

9 Determinants

If $A$ and $B$ are square matrices of the same size, then
i. $\det(A)\det(B) = \det(AB)$.
ii. $\det(AB) = \det(BA)$.
iii. $\det(A^2) = \det(A)^2$.
iv. In general, $\det(A) + \det(B) = \det(A + B)$ does not hold.

9 Determinants

If the square matrix $A$ is partitioned as
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
and if $A_{11}$ and $A_{22}$ are square and nonsingular (but not necessarily the same size), then
$$\det(A) = \det(A_{11}) \det(A_{22} - A_{21} A_{11}^{-1} A_{12}) = \det(A_{22}) \det(A_{11} - A_{12} A_{22}^{-1} A_{21}).$$

9 Determinants

Adjoint method for finding the inverse of a matrix: let $A$ be an $n \times n$ matrix with $\det(A) \neq 0$. Then
$$A^{-1} = \frac{1}{\det(A)} \mathrm{adj}(A),$$
where $\mathrm{adj}(A)$ is the adjoint matrix of $A$ with elements $C_{ji}$, the transpose of the matrix of cofactors
$$C_{ij} = (-1)^{i+j} \det(A_{ij}),$$
where $A_{ij}$ is the $(n-1) \times (n-1)$ submatrix formed by deleting the $i$th row and $j$th column of $A$.

9 Determinants

Exercises 25-26

10 Orthogonality

Two $n \times 1$ vectors $a$ and $b$ are said to be orthogonal if $a'b = 0$. Geometrically, two orthogonal vectors are perpendicular to each other.

Let $\theta$ be the angle between vectors $a$ and $b$. The vector from the terminal point of $a$ to the terminal point of $b$ can be represented as $c = b - a$.

10 Orthogonality

Figure: Vectors a and b in three-dimensional space.

10 Orthogonality

The law of cosines for the relationship of $\theta$ to the sides of the triangle can be stated in vector form as
$$\cos(\theta) = \frac{a'a + b'b - (b - a)'(b - a)}{2\sqrt{(a'a)(b'b)}} = \frac{a'a + b'b - (b'b + a'a - 2a'b)}{2\sqrt{(a'a)(b'b)}} = \frac{a'b}{\sqrt{(a'a)(b'b)}}.$$

When $\theta = 90^\circ$, $a'b = 0$ since $\cos(90^\circ) = 0$. Thus, $a$ and $b$ are perpendicular when $a'b = 0$.

10 Orthogonality

If $a'a = 1$, the vector $a$ is said to be normalized. A vector $b$ can be normalized by dividing by its length. Thus
$$c = \frac{b}{\sqrt{b'b}}$$
is normalized so that $c'c = 1$.

A set of $p \times 1$ vectors $c_1, c_2, \ldots, c_p$ that are normalized and mutually orthogonal is said to be an orthonormal set of vectors.

10 Orthogonality

If the $p \times p$ matrix $C = (c_1, c_2, \ldots, c_p)$ has orthonormal columns, $C$ is called an orthogonal matrix. An orthogonal $p \times p$ matrix $C$ has the property
$$C'C = CC' = I_p.$$
Thus an orthogonal matrix has orthonormal rows as well as orthonormal columns.

A $p \times k$ matrix $C$ is said to be a columnwise orthonormal matrix if $C'C = I_k$. A $p \times k$ matrix $C$ is said to be a rowwise orthonormal matrix if $CC' = I_p$.

10 Orthogonality

To illustrate an orthogonal matrix, we start with
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & -2 & 0 \\ 1 & 1 & -1 \end{pmatrix},$$
whose columns are mutually orthogonal but not orthonormal. To normalize the three columns, we divide by their respective lengths, $\sqrt{3}$, $\sqrt{6}$ and $\sqrt{2}$, to obtain the matrix
$$C = \begin{pmatrix} 1/\sqrt{3} & 1/\sqrt{6} & 1/\sqrt{2} \\ 1/\sqrt{3} & -2/\sqrt{6} & 0 \\ 1/\sqrt{3} & 1/\sqrt{6} & -1/\sqrt{2} \end{pmatrix},$$
whose columns are orthonormal. Note that the rows of $C$ are also orthonormal, so that $C$ is an orthogonal matrix.
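
The normalization step is one line in code; a sketch reproducing the example and checking both $C'C = I$ and $CC' = I$:

    import numpy as np

    A = np.array([[1., 1., 1.],
                  [1., -2., 0.],
                  [1., 1., -1.]])             # mutually orthogonal columns

    lengths = np.sqrt((A ** 2).sum(axis=0))   # sqrt(3), sqrt(6), sqrt(2)
    C = A / lengths                           # normalize each column

    assert np.allclose(C.T @ C, np.eye(3))    # orthonormal columns
    assert np.allclose(C @ C.T, np.eye(3))    # orthonormal rows as well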

10 Orthogonality

Multiplication of a vector by an orthogonal matrix has the effect of rotating axes. If a point $x$ is transformed to $z = Cx$, where $C$ is orthogonal, then
$$z'z = (Cx)'(Cx) = x'C'Cx = x'Ix = x'x.$$
Hence, the transformation from $x$ to $z$ is a rotation.

If the $p \times p$ matrix $C$ is orthogonal and $A$ is any $p \times p$ matrix, then
i. $-1 \leq c_{ij} \leq 1$, where $c_{ij}$ is any element of $C$.
ii. $\det(C) = +1$ or $-1$.
iii. $\det(C'AC) = \det(A)$.

10 Orthogonality

Exercises 27-28

11 Trace

The trace of an $n \times n$ matrix $A = (a_{ij})$ is a scalar function defined as the sum of the diagonal elements of $A$; that is, $\mathrm{tr}(A) = \sum_{i=1}^n a_{ii}$.

For example, suppose
$$A = \begin{pmatrix} 8 & 4 & 2 \\ 2 & -3 & 6 \\ 3 & 5 & 9 \end{pmatrix}.$$
Then $\mathrm{tr}(A) = 8 - 3 + 9 = 14$.

11 Trace

Some properties of the trace are:
i. If $A$ and $B$ are $n \times n$, then $\mathrm{tr}(A \pm B) = \mathrm{tr}(A) \pm \mathrm{tr}(B)$.
ii. If $A$ is $n \times p$ and $B$ is $p \times n$, then $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
iii. If $A$ is $n \times p$, then $\mathrm{tr}(A'A) = \sum_{j=1}^p a_j' a_j$, where $a_j$ is the $j$th column of $A$.
iv. If $A$ is $n \times p$, then $\mathrm{tr}(AA') = \sum_{i=1}^n a_i' a_i$, where $a_i'$ is the $i$th row of $A$.

11 Trace

v. If $A = (a_{ij})$ is an $n \times p$ matrix with representative element $a_{ij}$, then
$$\mathrm{tr}(A'A) = \mathrm{tr}(AA') = \sum_{i=1}^n \sum_{j=1}^p a_{ij}^2.$$
vi. If $A$ is any $n \times n$ matrix and $P$ is any $n \times n$ nonsingular matrix, then $\mathrm{tr}(P^{-1}AP) = \mathrm{tr}(A)$.
vii. If $A$ is any $n \times n$ matrix and $C$ is any $n \times n$ orthogonal matrix, then $\mathrm{tr}(C'AC) = \mathrm{tr}(A)$.
viii. If $A$ is $n \times p$ of rank $r$ and $A^-$ is a generalized inverse of $A$, then $\mathrm{tr}(A^-A) = \mathrm{tr}(AA^-) = r$.

11 Trace

Exercises 29-30

12 Eigenvalues and eigenvectors

For every square matrix $A$, a scalar $\lambda$ and a nonzero vector $x$ can be found such that
$$Ax = \lambda x,$$
where $\lambda$ is an eigenvalue of $A$ and $x$ is an eigenvector.

To find $\lambda$ and $x$ for a matrix $A$, we write the previous equation as
$$(A - \lambda I)x = \mathbf{0}.$$

The square matrix $(A - \lambda I)$ is singular, and we can solve for $\lambda$ using
$$\det(A - \lambda I) = 0,$$
which is known as the characteristic equation.

12 Eigenvalues and eigenvectors

If $A$ is $n \times n$, the characteristic equation will have $n$ roots; that is, $A$ will have $n$ eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$.

The $\lambda$'s will not necessarily all be distinct, or all nonzero, or even all real.

After finding $\lambda_1, \lambda_2, \ldots, \lambda_n$, the accompanying eigenvectors $x_1, x_2, \ldots, x_n$ can be found by solving the homogeneous linear system of equations $(A - \lambda I)x = \mathbf{0}$.
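
Numerically, eigenpairs are computed in one call. A quick sketch checking $Ax = \lambda x$ for a small symmetric matrix (whose eigenvalues are 1 and 3):

    import numpy as np

    A = np.array([[2., 1.], [1., 2.]])
    lam, X = np.linalg.eig(A)                 # eigenvalues and eigenvectors

    for k in range(2):
        # A x = lambda x for each eigenpair (columns of X are eigenvectors)
        assert np.allclose(A @ X[:, k], lam[k] * X[:, k])

    # the eigenvalues are the roots of det(A - lambda I) = 0
    assert np.allclose(np.sort(lam), [1., 3.])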

12 Eigenvalues and eigenvectors

If $x$ is an eigenvector of $A$, then $kx$ is also an eigenvector. Eigenvectors are therefore unique only up to multiplication by a scalar. The length of $x$ is arbitrary, but its direction from the origin is unique.

Typically, an eigenvector $x$ is scaled to normalized form so that $x'x = 1$.

12 Eigenvalues and eigenvectors

If $\lambda$ is an eigenvalue of $A$, then $c\lambda$ is an eigenvalue of $cA$.

If $\lambda$ is an eigenvalue of $A$ and $x$ is the corresponding eigenvector of $A$, then $c\lambda + k$ is an eigenvalue of the matrix $cA + kI$ and $x$ is an eigenvector of $cA + kI$, where $c$ and $k$ are scalars.

If $\lambda$ is an eigenvalue of $A$, then $\lambda^2$ is an eigenvalue of $A^2$. This can be extended to any power of $A$; that is, $\lambda^k$ is an eigenvalue of $A^k$ and $x$ is the corresponding eigenvector.

If $\lambda$ is an eigenvalue of the nonsingular matrix $A$, then $1/\lambda$ is an eigenvalue of $A^{-1}$ and $x$ is an eigenvector of both $A$ and $A^{-1}$.

12 Eigenvalues and eigenvectors

The eigenvalues of $A + B$ are not of the form $\lambda_A + \lambda_B$, where $\lambda_A$ is an eigenvalue of $A$ and $\lambda_B$ is an eigenvalue of $B$. Similarly, the eigenvalues of $AB$ are not products of the form $\lambda_A \lambda_B$.

However, the (nonzero) eigenvalues of $AB$ are the same as those of $BA$. If $x$ is an eigenvector of $AB$, then $Bx$ is an eigenvector of $BA$.

Let $A$ be any $n \times n$ matrix.
i. If $P$ is any $n \times n$ nonsingular matrix, then $A$ and $P^{-1}AP$ have the same eigenvalues.
ii. If $C$ is any $n \times n$ orthogonal matrix, then $A$ and $C'AC$ have the same eigenvalues.

12 Eigenvalues and eigenvectors

Let $A$ be an $n \times n$ symmetric matrix.
i. The eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$ are real.
ii. The eigenvectors $x_1, x_2, \ldots, x_k$ of $A$ corresponding to distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_k$ are mutually orthogonal.

If $A$ is any $n \times n$ matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then
i. $\det(A) = \prod_{i=1}^n \lambda_i$.
ii. $\mathrm{tr}(A) = \sum_{i=1}^n \lambda_i$.
iii. If $A$ is positive definite, then $\lambda_i > 0$ for $i = 1, 2, \ldots, n$.
iv. If $A$ is positive semidefinite, then $\lambda_i \geq 0$ for $i = 1, 2, \ldots, n$. The number of eigenvalues $\lambda_i$ for which $\lambda_i > 0$ is the rank of $A$.

12 Eigenvalues and eigenvectors

Exercises 31-33

13 Idempotent matrices

A square matrix $A$ is said to be idempotent if $A^2 = A$.

The only nonsingular idempotent matrix is the identity matrix $I$.

If $A$ is singular, symmetric, and idempotent, then $A$ is positive semidefinite.

If $A$ is an $n \times n$ symmetric idempotent matrix of rank $r$, then $A$ has $r$ eigenvalues equal to 1 and $n - r$ eigenvalues equal to 0.

13 Idempotent matrices

If $A$ is symmetric and idempotent of rank $r$, then $\mathrm{rank}(A) = \mathrm{tr}(A) = r$.

If $A$ is an $n \times n$ idempotent matrix, $P$ is an $n \times n$ nonsingular matrix, and $C$ is an $n \times n$ orthogonal matrix, then
i. $I - A$ is idempotent.
ii. $A(I - A) = O$ and $(I - A)A = O$.
iii. $P^{-1}AP$ is idempotent.
iv. $C'AC$ is idempotent. (If $A$ is symmetric, $C'AC$ is a symmetric idempotent matrix.)
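
A standard statistical illustration (here with X just a random matrix standing in for a design matrix, which is almost surely of full column rank): the projection matrix $H = X(X'X)^{-1}X'$ is symmetric and idempotent, its trace equals its rank, and its eigenvalues are 0 or 1:

    import numpy as np

    rng = np.random.default_rng(3)
    X = rng.normal(size=(10, 3))                  # full column rank (a.s.)
    H = X @ np.linalg.inv(X.T @ X) @ X.T          # projection onto range(X)

    assert np.allclose(H @ H, H)                  # idempotent
    assert np.allclose(H, H.T)                    # symmetric
    assert np.isclose(np.trace(H), 3.0)           # tr(H) = rank(H) = 3

    lam = np.linalg.eigvalsh(H)                   # ascending eigenvalues
    assert np.allclose(lam[:7], 0.0, atol=1e-8)   # n - r zeros
    assert np.allclose(lam[7:], 1.0)              # r ones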

13 Idempotent matrices

Let $A$ be $n \times p$ of rank $r$, let $A^-$ be any generalized inverse of $A$, and let $(A'A)^-$ be any generalized inverse of $A'A$. Then $A^-A$, $AA^-$, and $A(A'A)^-A'$ are all idempotent.

Suppose that the $n \times n$ symmetric matrix $A$ can be written as $A = \sum_{i=1}^k A_i$ for some $k$, where each $A_i$ is an $n \times n$ symmetric matrix. Then any two of the following conditions imply the third condition:
i. $A$ is idempotent.
ii. Each of $A_1, A_2, \ldots, A_k$ is idempotent.
iii. $A_i A_j = O$ for $i \neq j$.

If $I = \sum_{i=1}^k A_i$, where each $n \times n$ matrix $A_i$ is symmetric of rank $r_i$, and if $n = \sum_{i=1}^k r_i$, then both of the following are true:
i. Each of $A_1, A_2, \ldots, A_k$ is idempotent.
ii. $A_i A_j = O$ for $i \neq j$.

13 Idempotent matrices

Exercises 34-35

14 Matrix decompositions

If $A$ is a $p \times p$ symmetric matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_p$ and normalized eigenvectors $e_1, e_2, \ldots, e_p$, then $A$ can be expressed as
$$A = E \Lambda E' = \sum_{j=1}^p \lambda_j e_j e_j',$$
where $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$ and $E$ is the orthogonal matrix $E = (e_1, e_2, \ldots, e_p)$.

The result above is the eigendecomposition (or spectral decomposition) of $A$. $E$ diagonalizes $A$; that is, $E'AE = \Lambda$.

14 Matrix decompositions

If a $p \times p$ matrix $A$ is positive definite, we can use the eigendecomposition to find a square root matrix $A^{1/2}$. Since the eigenvalues of $A$ are positive, we can substitute the square roots $\sqrt{\lambda_j}$ for $\lambda_j$ in the eigendecomposition of $A$ to obtain
$$A^{1/2} = E \Lambda^{1/2} E',$$
where $\Lambda^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_p})$.

The matrix $A^{1/2}$ is symmetric and has the property
$$A^{1/2} A^{1/2} = (A^{1/2})^2 = A.$$
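
A sketch computing $A^{1/2}$ via the eigendecomposition, for a small positive definite example matrix (eigenvalues 3 and 7):

    import numpy as np

    A = np.array([[5., 2.], [2., 5.]])        # symmetric positive definite
    lam, E = np.linalg.eigh(A)                # eigendecomposition A = E Lam E'

    A_half = E @ np.diag(np.sqrt(lam)) @ E.T  # A^{1/2} = E Lam^{1/2} E'
    assert np.allclose(A_half @ A_half, A)    # (A^{1/2})^2 = A
    assert np.allclose(A_half, A_half.T)      # A^{1/2} is symmetric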

14 Matrix decompositions

Let $X \in \mathbb{R}^{n \times p}$ and $\min\{n, p\} = t$. Expressing $X$ by its singular value decomposition (SVD) gives
$$X = U \Sigma V' = \sum_{j=1}^t \sigma_j u_j v_j',$$
where $\Sigma \in \mathbb{R}^{n \times p}$ is a rectangular diagonal matrix with the singular values of $X$, $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_t \geq 0$, on the diagonal, and $U = (u_1, \ldots, u_n) \in \mathbb{R}^{n \times n}$ and $V = (v_1, \ldots, v_p) \in \mathbb{R}^{p \times p}$ are orthogonal matrices containing the left and right singular vectors of $X$, respectively.

14 Matrix decompositions

Applications of the SVD:
1 $\mathrm{rank}(X) = r \leq \min\{n, p\}$, where $r$ is equal to the number of nonzero singular values, $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r > 0$, of $X$.
2 The Moore-Penrose inverse $X^+ \in \mathbb{R}^{p \times n}$ is
$$X^+ = V_r \Sigma_r^+ U_r',$$
where $\Sigma_r^+ = \mathrm{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}) \in \mathbb{R}^{r \times r}$ and $V_r$ and $U_r$ contain the first $r$ columns of $V$ and $U$, respectively.

14 Matrix decompositions

3 Low-rank approximation: For $k < r = \mathrm{rank}(X)$, define
$$X_k = U_k \Sigma_k V_k' = \sum_{j=1}^k \sigma_j u_j v_j',$$
where $\Sigma_k = \mathrm{diag}(\sigma_1, \ldots, \sigma_k) \in \mathbb{R}^{k \times k}$.

Least-squares property: $X_k$ is the solution of
$$\min_Y \|X - Y\|_F^2 \quad \text{subject to} \quad \mathrm{rank}(Y) \leq k,$$
with
$$\|X - X_k\|_F^2 = \Big\| \sum_{j=k+1}^r \sigma_j u_j v_j' \Big\|_F^2 = \sum_{j=k+1}^r \sigma_j^2,$$
where $\|X\|_F = \sqrt{\mathrm{tr}(X'X)} = \sqrt{\sum_{i=1}^n \sum_{j=1}^p x_{ij}^2}$.
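
A sketch of the least-squares property: the squared Frobenius error of the rank-$k$ truncation equals the sum of the discarded squared singular values:

    import numpy as np

    rng = np.random.default_rng(4)
    X = rng.normal(size=(8, 6))
    U, s, Vt = np.linalg.svd(X, full_matrices=False)

    k = 2
    Xk = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # best rank-k approximation

    # squared Frobenius error = sum of the discarded squared singular values
    err2 = np.linalg.norm(X - Xk, "fro") ** 2
    assert np.isclose(err2, np.sum(s[k:] ** 2))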

14 Matrix decompositions

Figure: Melencolia I (Albrecht Dürer)

14 Matrix decompositions

Figure: Image compression: SVDs of Dürer's magic square

14 Matrix decompositions

The SVD is very general in the sense that it can be applied to any $n \times p$ matrix, whereas the eigenvalue decomposition can only be applied to certain classes of square matrices. Nevertheless, the two decompositions are related. Given an SVD of $X$, as described above, the following relations hold:
1 $X'X = V \Sigma' U' U \Sigma V' = V (\Sigma' \Sigma) V'$.
2 $XX' = U \Sigma V' V \Sigma' U' = U (\Sigma \Sigma') U'$.

Consequently:
i. The columns of $V$ (right singular vectors) are eigenvectors of $X'X$.
ii. The columns of $U$ (left singular vectors) are eigenvectors of $XX'$.
iii. The nonzero elements of $\Sigma$ are the square roots of the nonzero eigenvalues of $X'X$ or $XX'$.

14 Matrix decompositions

Exercises 36-38

15 Vector and matrix differentiation

Suppose the function $f: \mathbb{R}^p \to \mathbb{R}$, $x = (x_1, \ldots, x_p)' \mapsto f(x) = f(x_1, \ldots, x_p)$, is given. The graph of $f$ is a surface in $\mathbb{R}^{p+1}$.

For example, $f: \mathbb{R}^2 \to \mathbb{R}$ with $f(x_1, x_2) = x_1^2 + x_2^2$.

Figure: Surface plot of $f(x_1, x_2) = x_1^2 + x_2^2$.

15 Vector and matrix differentiation

The $p$ partial derivatives of a function $f(x) = f(x_1, \ldots, x_p)$ can be summarized by a vector, called the gradient:
$$\nabla f(x) = \frac{\partial f(x)}{\partial x} = \begin{pmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_p} \end{pmatrix}.$$

The vector $\nabla f(x)$ points in the direction of steepest ascent.

15 Vector and matrix differentiation

Let $H(x)$ be the matrix of the second derivatives of $f(x)$ with respect to the vector $x$, named the Hessian matrix:
$$H(f(x)) = \frac{\partial^2 f(x)}{\partial x \, \partial x'} = \begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_1 \partial x_p} \\ \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_2 \partial x_p} \\ \vdots & \vdots & & \vdots \\ \frac{\partial^2 f(x)}{\partial x_p \partial x_1} & \frac{\partial^2 f(x)}{\partial x_p \partial x_2} & \cdots & \frac{\partial^2 f(x)}{\partial x_p^2} \end{pmatrix}.$$

The Hessian matrix is symmetric.

15 Vector and matrix differentiation

The gradient and the Hessian matrix of a function $f: \mathbb{R}^p \to \mathbb{R}$ can help to determine local optima of $f$.

Let $x_0$ be a local optimum of $f$. Then $\nabla f(x_0) = \mathbf{0}$. The Hessian informs us if and what kind of local optimum there is:
i. $H(f(x_0))$ positive definite: local minimum.
ii. $H(f(x_0))$ negative definite: local maximum.
iii. $H(f(x_0))$ indefinite: saddle point.

15 Vector and matrix differentiation

Consider a function $u = f(x)$ of the $p$ variables in $x$. In many cases we can find an optimum of $u$ by solving the system of $p$ equations
$$\frac{\partial u}{\partial x} = \mathbf{0}.$$

Occasionally the situation requires the optimization of the function $u$ subject to $q$ constraints on $x$. We denote the constraints as $h_1(x) = 0, h_2(x) = 0, \ldots, h_q(x) = 0$ or, more succinctly, $h(x) = \mathbf{0}$.

15 Vector and matrix differentiation

Optimization of $u$ subject to $h(x) = \mathbf{0}$ can be carried out using the method of Lagrange multipliers. We denote a vector of Lagrange multipliers by $\lambda$ and let $y = (x', \lambda')'$. We then let
$$v = u + \lambda' h(x).$$
The optimum of $u$ subject to $h(x) = \mathbf{0}$ is obtained by solving the equations
$$\frac{\partial v}{\partial y} = \mathbf{0}$$
or, equivalently,
$$\frac{\partial u}{\partial x} + \frac{\partial h'}{\partial x} \lambda = \mathbf{0} \quad \text{and} \quad h(x) = \mathbf{0},$$
where
$$\frac{\partial h'}{\partial x} = \begin{pmatrix} \frac{\partial h_1}{\partial x_1} & \cdots & \frac{\partial h_q}{\partial x_1} \\ \vdots & & \vdots \\ \frac{\partial h_1}{\partial x_p} & \cdots & \frac{\partial h_q}{\partial x_p} \end{pmatrix}.$$

15 Vector and matrix differentiation

Let $a = (a_1, \ldots, a_p)'$, $x = (x_1, \ldots, x_p)'$ and let $A$ be a symmetric $p \times p$ matrix. Then
i. $\dfrac{\partial (a'x)}{\partial x} = \dfrac{\partial (x'a)}{\partial x} = a$.
ii. $\dfrac{\partial (Ax)}{\partial x'} = A$.
iii. $\dfrac{\partial (x'Ax)}{\partial x} = 2Ax$.
iv. $\dfrac{\partial (x'Ax)}{\partial A} = xx'$.
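
The gradient identities i and iii can be checked against a finite-difference approximation. A small sketch (num_grad is an illustrative helper using central differences with step $h$):

    import numpy as np

    def num_grad(f, x, h=1e-6):
        # central finite-difference approximation of the gradient of f at x
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = h
            g[i] = (f(x + e) - f(x - e)) / (2 * h)
        return g

    rng = np.random.default_rng(5)
    p = 4
    a, x = rng.normal(size=p), rng.normal(size=p)
    M = rng.normal(size=(p, p))
    A = (M + M.T) / 2                          # make A symmetric

    # i:   d(a'x)/dx = a        iii: d(x'Ax)/dx = 2Ax
    assert np.allclose(num_grad(lambda v: a @ v, x), a, atol=1e-4)
    assert np.allclose(num_grad(lambda v: v @ A @ v, x), 2 * A @ x, atol=1e-4)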

15 Vector and matrix differentiation

Exercises 39-40