Lecture notes: Applied linear algebra Part 1. Version 2
Michael Karow
Berlin University of Technology
karow@math.tu-berlin.de
October 2, 2008

1 Notation, basic notions and facts

1.1 Subspaces, range and kernel of a matrix

In the following $\mathbb{F}^m$ denotes the set of column vectors of length $m$ with entries in $\mathbb{F} = \mathbb{R}$ (real numbers) or $\mathbb{F} = \mathbb{C}$ (complex numbers). A subset $\mathcal{V} \subseteq \mathbb{F}^m$ is called a subspace if $v_1, v_2 \in \mathcal{V}$ implies $\lambda_1 v_1 + \lambda_2 v_2 \in \mathcal{V}$ for all $\lambda_1, \lambda_2 \in \mathbb{F}$. Each subspace contains the zero vector. The sets $\{0\}$ (trivial space) and $\mathbb{F}^m$ are subspaces.

A subset $\{v_1, \dots, v_p\}$ of a subspace $\mathcal{V}$ is said to be a basis of $\mathcal{V}$ if each $v \in \mathcal{V}$ can be written in the form
$$v = v_1 x_1 + v_2 x_2 + \dots + v_p x_p$$
with unique coefficients $x_k \in \mathbb{F}$. In this case, the relation $0 = v_1 x_1 + v_2 x_2 + \dots + v_p x_p$ holds only if $x_1 = x_2 = \dots = x_p = 0$. Hence the elements of a basis are linearly independent. Each basis of $\mathcal{V}$ has the same number of elements. This number is called the dimension of the subspace.

Let $A = [a_{jk}] \in \mathbb{F}^{m \times n}$ be a matrix with $m$ rows and $n$ columns whose entries are elements of $\mathbb{F}$. Then the sets
$$N_{\mathbb{F}}(A) = \{x \in \mathbb{F}^n : Ax = 0\}, \qquad R_{\mathbb{F}}(A) = \{Ax : x \in \mathbb{F}^n\}$$
are subspaces. $N_{\mathbb{F}}(A)$ is called the nullspace (synonym: kernel) of $A$. $R_{\mathbb{F}}(A)$ is called the range (synonym: image) of $A$. In the sequel we omit the subscript $\mathbb{F}$.

Let $A = [a_1, \dots, a_n] \in \mathbb{F}^{m \times n}$, $a_k \in \mathbb{F}^m$, and $x = [x_1, \dots, x_n]^T$, $x_k \in \mathbb{F}$. Then
$$Ax = a_1 x_1 + a_2 x_2 + \dots + a_n x_n \in \mathbb{F}^m.$$
Thus $Ax$ is a linear combination of the columns of $A$ with coefficients $x_k$, and the vector space $R(A)$ is the set of all these linear combinations. One says that $R(A)$ is the vector space spanned by the columns of $A$. Therefore $R(A)$ is sometimes denoted as $\operatorname{span}(a_1, \dots, a_n) := R(A)$.
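To make the column picture concrete, here is a small sketch in Python with NumPy (our choice of language; the notes themselves contain no code). It checks numerically that $Ax$ equals the linear combination $a_1x_1 + a_2x_2 + a_3x_3$ of the columns, and exhibits a nonzero vector in $N(A)$ for a matrix whose columns are linearly dependent. The matrix and vectors are arbitrary illustrative data.

```python
import numpy as np

# Illustrative data: A has columns a_1, a_2, a_3 (m = n = 3).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
x = np.array([2.0, -1.0, 0.5])

# Ax as a matrix-vector product ...
Ax = A @ x
# ... and as the linear combination a_1 x_1 + a_2 x_2 + a_3 x_3 of the columns.
combo = sum(A[:, k] * x[k] for k in range(A.shape[1]))
print(np.allclose(Ax, combo))    # True

# Here a_1 - 2 a_2 + a_3 = 0, so the nonzero vector z lies in N(A).
z = np.array([1.0, -2.0, 1.0])
print(np.allclose(A @ z, 0.0))   # True
```

Because this $A$ has a nontrivial nullspace, its columns are not a basis of $R(A)$, which anticipates the rank statements that follow.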
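The maximum number of linearly independent columns of $A$ is said to be the rank of $A$. The rank of $A$ equals the dimension of $R(A)$. It can be shown that
$$\operatorname{rank} A + \dim N(A) = \dim R(A) + \dim N(A) = n,$$
where $n$ is the number of columns of $A$. In particular, we have the equivalences
$$\begin{aligned}
\operatorname{rank} A = n &\iff \dim R(A) = n \iff N(A) = \{0\}\\
&\iff \text{the columns of } A \text{ are linearly independent}\\
&\iff \text{the columns of } A \text{ form a basis of } R(A)\\
&\overset{m = n}{\iff} R(A) = \mathbb{F}^n\\
&\overset{m = n}{\iff} A \text{ is invertible (synonym: nonsingular), i.e. } A^{-1} \text{ exists.}
\end{aligned}$$

Notation: In the following $\mathcal{V} \oplus \mathcal{U}$ denotes the direct sum of the subspaces $\mathcal{V}, \mathcal{U}$. Recall: a sum of subspaces is direct if $\mathcal{V} \cap \mathcal{U} = \{0\}$.

1.2 Scalar product, adjoint, unitary and Hermitian matrices

Let $\mathcal{V}$ be a vector space over the field $\mathbb{F}$, $\mathbb{F} = \mathbb{R}$ or $\mathbb{C}$. A scalar product on $\mathcal{V}$ is an $\mathbb{F}$-valued function $\langle \cdot, \cdot \rangle : \mathcal{V} \times \mathcal{V} \to \mathbb{F}$ with the following properties. For all $u, v, w \in \mathcal{V}$, $\lambda \in \mathbb{F}$:

1. $\langle v, v \rangle > 0$ for $v \neq 0$ (positive definiteness),
2. $\langle v, w \rangle = \overline{\langle w, v \rangle}$ (symmetry),
3. $\langle \lambda v, w \rangle = \bar\lambda \langle v, w \rangle$,
4. $\langle v, \lambda w \rangle = \langle v, w \rangle \lambda$,
5. $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$,
6. $\langle u, v + w \rangle = \langle u, v \rangle + \langle u, w \rangle$.

Vectors $v, w \in \mathcal{V}$ are orthogonal (synonym: perpendicular) if $\langle v, w \rangle = 0$. The associated norm (also called the Euclidean length) is defined as $\|v\| = \sqrt{\langle v, v \rangle}$. If $\|v\| = 1$ then $v$ is called a unit vector. Unless otherwise stated, the scalar product in this lecture is the standard scalar product on $\mathcal{V} = \mathbb{C}^n$:
$$\langle v, w \rangle := v^* w = \sum_{k=1}^n \bar v_k w_k, \qquad v = [v_1, \dots, v_n]^T, \quad w = [w_1, \dots, w_n]^T.$$
Here $v^*$ denotes the conjugate transpose of the column vector $v$. More generally, $A^* = \bar A^T \in \mathbb{C}^{n \times m}$ denotes the adjoint (= conjugate transpose) of the matrix $A = [a_{jk}] \in \mathbb{C}^{m \times n}$. If $A \in \mathbb{R}^{m \times n}$ is a matrix with real entries then $A^* = A^T$. We have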
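1. $\langle Av, w \rangle = \langle v, A^* w \rangle$ for all $A \in \mathbb{C}^{n \times n}$, $v, w \in \mathbb{C}^n$,
2. $(AB)^* = B^* A^*$, $(A^*)^* = A$, $(\lambda A)^* = \bar\lambda A^*$ for all $A \in \mathbb{C}^{m \times n}$, $B \in \mathbb{C}^{n \times p}$, $\lambda \in \mathbb{C}$.

Let $a_1, \dots, a_n \in \mathbb{C}^m$ be the columns of $A \in \mathbb{C}^{m \times n}$, i.e. $A = [a_1, \dots, a_n]$. Analogously, $B = [b_1, \dots, b_n] \in \mathbb{C}^{m \times n}$. It is easily verified that
$$A^* B = \begin{bmatrix} a_1^* \\ \vdots \\ a_n^* \end{bmatrix} [b_1, \dots, b_n] = \begin{bmatrix} a_1^* b_1 & a_1^* b_2 & \cdots & a_1^* b_n \\ \vdots & \vdots & & \vdots \\ a_n^* b_1 & a_n^* b_2 & \cdots & a_n^* b_n \end{bmatrix} = \begin{bmatrix} \langle a_1, b_1 \rangle & \langle a_1, b_2 \rangle & \cdots & \langle a_1, b_n \rangle \\ \vdots & \vdots & & \vdots \\ \langle a_n, b_1 \rangle & \langle a_n, b_2 \rangle & \cdots & \langle a_n, b_n \rangle \end{bmatrix}.$$
Thus $A^* B$ is the matrix of scalar products of the columns of $A$ with the columns of $B$. The matrix $A^* A = [\langle a_j, a_k \rangle] \in \mathbb{C}^{n \times n}$ is called the Gramian of the columns of $A$. The relation $A^* A = I$ states that
$$\langle a_j, a_k \rangle = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{otherwise.} \end{cases}$$
In words, we have $A^* A = I$ if and only if the columns of $A$ are pairwise orthogonal unit vectors. A square matrix $A \in \mathbb{C}^{n \times n}$ with $A^* A = I$ is said to be unitary. For a unitary matrix we have $A^* = A^{-1}$.

Exercise 1.1 (2+2 points) Let $A = [a_1, \dots, a_n] \in \mathbb{C}^{m \times n}$, $B = [b_1, \dots, b_p] \in \mathbb{C}^{q \times p}$ and $X = [x_{jk}] \in \mathbb{C}^{n \times p}$.
(a) Verify that
$$A X B^* = \sum_{j=1}^n \sum_{k=1}^p x_{jk}\, a_j b_k^*. \qquad (1)$$
In particular, $A B^* = \sum_{k=1}^n a_k b_k^*$ if $n = p$.
(b) Suppose $A^* A = I$. Show that $v = \sum_{k=1}^n a_k \langle a_k, v \rangle$ for all $v \in R(A)$.

A square matrix $A \in \mathbb{C}^{n \times n}$ is said to be Hermitian if $A^* = A$.

Exercise 1.2 (1 point) Let $A \in \mathbb{C}^{n \times n}$ be Hermitian. Show that $x^* A x \in \mathbb{R}$ for any $x \in \mathbb{C}^n$.

A Hermitian matrix is said to be positive semi-definite if $x^* A x \geq 0$ for all $x \in \mathbb{C}^n$,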
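positive definite if $x^* A x > 0$ for all $x \in \mathbb{C}^n$, $x \neq 0$.

Exercise 1.3 (4 points) Verify the following facts for $A \in \mathbb{C}^{m \times n}$.
1. $\|Ax\|^2 = \langle x, A^* A x \rangle$ for all $x \in \mathbb{C}^n$.
2. $A^* A$ is positive semi-definite.
3. $A^* A$ is positive definite $\iff N(A) = \{0\} \iff A^* A$ is invertible (non-singular).
4. $N(A^* A) = N(A)$.
5. $\operatorname{rank}(A^* A) = \operatorname{rank}(A)$.

For a subspace $\mathcal{V} \subseteq \mathbb{F}^m$ we define its orthogonal complement as
$$\mathcal{V}^\perp := \{u \in \mathbb{F}^m : \langle u, v \rangle = 0 \text{ for all } v \in \mathcal{V}\}.$$
We always have $\mathbb{F}^m = \mathcal{V} \oplus \mathcal{V}^\perp$.

2 The least squares problem

Let $A \in \mathbb{C}^{m \times n}$, $b \in \mathbb{C}^m$ and let $\mathcal{V}$ be a subspace of $\mathbb{C}^m$. We consider the functions
$$f : \mathcal{V} \to \mathbb{R}, \quad f(v) := \|v - b\|, \qquad g : \mathbb{C}^n \to \mathbb{R}, \quad g(x) := \|Ax - b\|.$$
The least squares problem is to find the minimizers of $f$ and $g$.

Lemma 2.1 Let $v_0 \in \mathcal{V}$. Then $v_0$ is the unique minimizer of $f$ if and only if $\langle v_0 - b, h \rangle = 0$ for all $h \in \mathcal{V}$.

Proof: For any $v_0, h \in \mathcal{V}$, $\lambda \in \mathbb{C}$ we have
$$\begin{aligned}
f(v_0 + \lambda h)^2 &= \langle v_0 + \lambda h - b,\ v_0 + \lambda h - b \rangle \\
&= \langle v_0 - b, v_0 - b \rangle + \langle v_0 - b, \lambda h \rangle + \langle \lambda h, v_0 - b \rangle + \langle \lambda h, \lambda h \rangle \\
&= f(v_0)^2 + 2\,\mathrm{Re}\big(\lambda \underbrace{\langle v_0 - b, h \rangle}_{(*)}\big) + |\lambda|^2 \|h\|^2.
\end{aligned}$$
If $(*) \neq 0$ for some $h$ then we can find a $\lambda \in \mathbb{C}$ of sufficiently small modulus and suitable phase angle such that $f(v_0 + \lambda h) < f(v_0)$. Hence in this case $v_0$ is not a minimizer. If, on the other hand, $(*) = 0$ for all $h \in \mathcal{V}$, then taking $\lambda = 1$ gives $f(v_0 + h)^2 = f(v_0)^2 + \|h\|^2 > f(v_0)^2$ for $h \neq 0$. So $v_0$ is the unique minimizer of $f$.

Corollary 2.2 A vector $x_0 \in \mathbb{C}^n$ is a minimizer of $g$ if and only if $(A^* A) x_0 = A^* b$ (this is called the normal equation).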
Proof: Applying Lemma 2.1 to $\mathcal{V} = R(A)$ we conclude that $x_0$ is a minimizer if and only if
$$0 = \langle A x_0 - b, A\xi \rangle = \langle A^*(A x_0 - b), \xi \rangle \quad \text{for all } \xi \in \mathbb{C}^n.$$
The latter holds if and only if $0 = A^*(A x_0 - b) = A^* A x_0 - A^* b$.

Proposition 2.3 Suppose $A \in \mathbb{C}^{m \times n}$ has linearly independent columns and $R(A) = \mathcal{V}$ (i.e. the columns of $A$ form a basis of $\mathcal{V}$). Then $A^* A$ is nonsingular, and the unique minimizer of $g$ is
$$x_0 = (A^* A)^{-1} A^* b.$$
The unique minimizer of $f$ is $Pb$, where
$$P := A (A^* A)^{-1} A^*.$$
The matrix $P$ is called the orthogonal projector onto $\mathcal{V}$.

Proof: The nonsingularity of $A^* A$ is shown in Exercise 1.3. The rest follows from Corollary 2.2, since $g(x) = f(Ax)$ and $R(A) = \mathcal{V}$.

Note that the orthogonal projector $P$ is Hermitian: $P^* = P$.

Special cases: If $A$ has only one column, $A = a \in \mathbb{C}^m \setminus \{0\}$, then
$$Pv = \frac{a a^*}{\|a\|^2} v = \frac{\langle a, v \rangle}{\|a\|^2} a.$$
Suppose the columns of $A = [a_1, \dots, a_n]$ form an orthonormal basis of $R(A)$ (i.e. $A^* A = I$). Then
$$Pv = A A^* v = \sum_{k=1}^n a_k a_k^* v = \sum_{k=1}^n \langle a_k, v \rangle a_k. \qquad (2)$$

Further remarks: Lemma 2.1 and its proof also hold in infinite dimensional Hilbert spaces. However, a minimizer $v_0$ may not exist if $\mathcal{V}$ is not a closed subspace. Since every subspace of $\mathbb{C}^m$ has a finite basis, the problem of finding the minimizer of $f$ is (in principle) solved by Proposition 2.3: just find a basis and compute the unique minimizer $Pb$. If $A$ does not have full column rank then the minimizer of $g$ is not unique. The set of minimizers is the affine space $\hat x_0 + N(A^* A)$, where $\hat x_0$ is any solution of the normal equation.

Exercise 2.4 (8 points) Definition: Let $\mathcal{V}$ and $\mathcal{U}$ be subspaces of $\mathbb{C}^m$ such that $\mathbb{C}^m = \mathcal{V} \oplus \mathcal{U}$. A matrix $P \in \mathbb{C}^{m \times m}$ is said to be a projector onto $\mathcal{V}$ along $\mathcal{U}$ if $Pv = v$ for all $v \in \mathcal{V}$ and $Pu = 0$ for all $u \in \mathcal{U}$. A projector is said to be orthogonal if $\mathcal{U} = \mathcal{V}^\perp$. Prove the statements (a)-(d) below.
(a) The following assertions are equivalent for $P \in \mathbb{C}^{m \times m}$.
(1) $P^2 = P$.
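(2) $\mathbb{C}^m = R(P) \oplus N(P)$ and $P$ is the projector onto $R(P)$ along $N(P)$.
(b) A projector $P$ is orthogonal if and only if it is Hermitian ($P^* = P$).
(c) Let $V = [v_1, \dots, v_r] \in \mathbb{C}^{m \times r}$ and $W = [w_1, \dots, w_r] \in \mathbb{C}^{m \times r}$ be such that
$$\langle w_j, v_k \rangle = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{otherwise} \end{cases}$$
(this is called biorthogonality). Then the matrix $P := V W^*$ is a projector onto $R(V)$ along $R(W)^\perp$.
(d) Let $\mathbb{C}^m = \mathcal{V} \oplus \mathcal{U}$. Suppose the columns of $V \in \mathbb{C}^{m \times r}$ form a basis of $\mathcal{V}$ and the columns of $Z \in \mathbb{C}^{m \times r}$ form a basis of $\mathcal{U}^\perp$. Then $Z^* V$ is nonsingular and $P := V (Z^* V)^{-1} Z^*$ is the projector onto $\mathcal{V}$ along $\mathcal{U}$.

3 The QR-decomposition

We are going to show the following result.

Theorem 3.1 Let $A \in \mathbb{F}^{m \times n}$, $m \geq n$. There exist a unitary matrix $Q \in \mathbb{C}^{m \times m}$ and an upper triangular matrix $R = [r_{jk}] \in \mathbb{C}^{n \times n}$ such that
$$A = Q \begin{bmatrix} R \\ 0 \end{bmatrix} \qquad \text{(QR-decomposition of } A\text{)} \qquad (3)$$
(If $m = n$ then the zero block below $R$ is not present and we have $A = QR$.)

Let $Q = [q_1, \dots, q_m]$. Then the identity (3) states
$$\begin{aligned}
a_1 &= q_1 r_{11} \\
a_2 &= q_1 r_{12} + q_2 r_{22} \\
a_3 &= q_1 r_{13} + q_2 r_{23} + q_3 r_{33} \\
&\ \ \vdots \\
a_k &= \sum_{j=1}^k q_j r_{jk}, \qquad k = 1, \dots, n.
\end{aligned}$$
If the columns of $A$ are linearly independent then the first $n$ columns of $Q$ can be found by Gram-Schmidt orthogonalization:
$$\begin{aligned}
q_1 &= \frac{a_1}{\|a_1\|}, \\
q_2 &= \frac{a_2 - q_1 \langle q_1, a_2 \rangle}{\|a_2 - q_1 \langle q_1, a_2 \rangle\|}, \\
q_3 &= \frac{a_3 - q_1 \langle q_1, a_3 \rangle - q_2 \langle q_2, a_3 \rangle}{\|a_3 - q_1 \langle q_1, a_3 \rangle - q_2 \langle q_2, a_3 \rangle\|}, \\
&\ \ \vdots \\
q_k &= \frac{a_k - \sum_{j=1}^{k-1} q_j \langle q_j, a_k \rangle}{\big\|a_k - \sum_{j=1}^{k-1} q_j \langle q_j, a_k \rangle\big\|}, \qquad k = 1, \dots, n.
\end{aligned}$$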
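The Gram-Schmidt formulas above can be sketched in NumPy as follows (an illustrative sketch, not the method the notes go on to recommend). It assumes $A$ has linearly independent columns; the function name `gram_schmidt` and the tolerance `tol` are our own. Besides $q_k$ it records $r_{jk} = \langle q_j, a_k \rangle$ for $j < k$ and $r_{kk} = \|a_k - \sum_{j<k} q_j \langle q_j, a_k \rangle\|$, so that $a_k = \sum_{j \le k} q_j r_{jk}$. This yields the first $n$ columns $Q_1$ of $Q$ together with the triangular factor $R$ in (3).

```python
import numpy as np

def gram_schmidt(A, tol=1e-12):
    """Classical Gram-Schmidt: A = Q1 @ R with orthonormal columns in Q1.

    Assumes A (m x n, real or complex) has linearly independent columns.
    R is n x n upper triangular with r_jk = <q_j, a_k> for j < k.
    """
    A = np.asarray(A)
    m, n = A.shape
    dtype = np.result_type(A.dtype, float)
    Q1 = np.zeros((m, n), dtype=dtype)
    R = np.zeros((n, n), dtype=dtype)
    for k in range(n):
        w = A[:, k].astype(dtype)          # copy of a_k
        for j in range(k):
            # <q_j, a_k> = q_j^* a_k; np.vdot conjugates its first argument
            R[j, k] = np.vdot(Q1[:, j], A[:, k])
            w = w - Q1[:, j] * R[j, k]
        R[k, k] = np.linalg.norm(w)
        # Relative test: in floating point dependent columns give a tiny r_kk.
        if R[k, k].real <= tol * np.linalg.norm(A[:, k]):
            raise ValueError("columns of A are (numerically) linearly dependent")
        Q1[:, k] = w / R[k, k]
    return Q1, R

A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q1, R = gram_schmidt(A)
print(np.allclose(Q1.T @ Q1, np.eye(2)), np.allclose(Q1 @ R, A))  # True True
```

The relative tolerance stands in for the exact "division by zero" failure described in the text, since rounding rarely produces an exactly zero $r_{kk}$.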
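The remaining columns of $Q$ could then be computed by Gram-Schmidt orthogonalization of any basis of $R(A)^\perp$. However, this method for constructing $Q$ is numerically unstable. Moreover, the method fails if $A$ does not have full column rank (since then division by zero occurs). We will use an alternative method that is based on Householder matrices.

Definition: A matrix $H \in \mathbb{C}^{m \times m}$ of the form
$$H = I - 2 \frac{a a^*}{\|a\|^2}, \qquad a \in \mathbb{C}^m \setminus \{0\},$$
is called a Householder matrix. We have $Ha = -a$ and $Hv = v$ if $\langle v, a \rangle = 0$. Thus, multiplication with $H$ is a reflection at the subspace $(\mathbb{C} a)^\perp$.

Exercise 3.2 (2 points) Show that $H$ is both Hermitian and unitary.

Lemma 3.3 Let $x, y \in \mathbb{C}^m$. If $x \neq y$, $\|x\| = \|y\|$ and $x^* y \in \mathbb{R}$ then the Householder matrix
$$H = I - 2 \frac{(x - y)(x - y)^*}{\|x - y\|^2}$$
satisfies $Hx = y$ and $Hy = x$.

Proof: We have
$$\begin{aligned}
\|x - y\|^2 &= (x - y)^*(x - y) = x^* x - x^* y - y^* x + y^* y \\
&= 2(x^* x - y^* x) \qquad (y^* y = x^* x \text{ and } x^* y = y^* x \text{ since } x^* y \in \mathbb{R}) \\
&= 2 (x - y)^* x \\
&= 2(y^* y - x^* y) = 2 (y - x)^* y.
\end{aligned}$$
Hence $Hx = x - 2(x - y)\dfrac{(x - y)^* x}{\|x - y\|^2} = x - (x - y) = y$ and, in the same way, $Hy = y + (x - y) = x$.

Proof of Theorem 3.1 We proceed by induction on $m$. The case $m = 1$ is trivial. Let $m \geq 2$ and $e_1 = [1, 0, \dots, 0]^T \in \mathbb{C}^m$. If the first column $a_1$ of $A$ satisfies $a_1 = r_{11} e_1$ for some $r_{11} \in \mathbb{C}$, let $H = I$. Otherwise choose the factor $r_{11} \in \mathbb{C}$ such that $(r_{11} e_1)^* a_1 \in \mathbb{R}$ and $|r_{11}| = \|a_1\|$. Then, by Lemma 3.3, there exists a Householder matrix $H$ with $H a_1 = r_{11} e_1$. In both cases we have
$$HA = \begin{bmatrix} r_{11} & * \\ 0 & \hat A \end{bmatrix}, \qquad \hat A \in \mathbb{C}^{(m-1) \times (n-1)}.$$
By the induction assumption we have $\hat A = \hat Q \begin{bmatrix} \hat R \\ 0 \end{bmatrix}$ with a unitary matrix $\hat Q$ and an upper triangular matrix $\hat R$. Since $H^{-1} = H$, this gives
$$A = H \begin{bmatrix} r_{11} & * \\ 0 & \hat Q \begin{bmatrix} \hat R \\ 0 \end{bmatrix} \end{bmatrix} = \underbrace{H \begin{bmatrix} 1 & 0 \\ 0 & \hat Q \end{bmatrix}}_{=:Q} \begin{bmatrix} r_{11} & * \\ 0 & \hat R \\ 0 & 0 \end{bmatrix}.$$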
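A minimal NumPy sketch of the Householder approach, written as a loop instead of the induction in the proof of Theorem 3.1. At step $k$ it applies Lemma 3.3 with $x$ the current column from the diagonal down and $y = r e_1$. The choice $r = -\omega \|x\|$, with $\omega$ the phase of the first entry of $x$, is a common convention that we assume here: it satisfies $|r| = \|x\|$ and $\bar r x_1 \in \mathbb{R}$, and avoids cancellation in $x - y$. The proof allows any admissible $r$ and takes $H = I$ whenever $x$ is already a multiple of $e_1$, whereas this sketch only skips $x = 0$. The name `householder_qr` is hypothetical.

```python
import numpy as np

def householder_qr(A):
    """Return unitary Q (m x m) and R (m x n, zero below the diagonal) with A = Q R.

    Works for real or complex A and does not need full column rank.
    """
    R = np.array(A, dtype=complex if np.iscomplexobj(A) else float)
    m, n = R.shape
    Q = np.eye(m, dtype=R.dtype)
    for k in range(min(n, m - 1)):
        x = R[k:, k]
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue                     # x = 0: nothing to do, take H = I
        phase = x[0] / abs(x[0]) if x[0] != 0 else 1.0
        r = -phase * norm_x              # target y = r e_1, |r| = ||x||, conj(r) x[0] real
        a = x.copy()
        a[0] -= r                        # a = x - y, nonzero because x != 0
        H = np.eye(m - k, dtype=R.dtype) - 2.0 * np.outer(a, a.conj()) / np.vdot(a, a).real
        R[k:, k:] = H @ R[k:, k:]        # reflect the trailing block
        Q[:, k:] = Q[:, k:] @ H          # Q = H_1 H_2 ..., using H^{-1} = H
    return Q, np.triu(R)                 # discard rounding residue below the diagonal

A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])   # rank 1: Gram-Schmidt would fail
Q, R = householder_qr(A)
print(np.allclose(Q.T @ Q, np.eye(3)), np.allclose(Q @ R, A))   # True True
```

The only division is by $\|a\|^2$, and $a \neq 0$ whenever $x \neq 0$, so rank deficiency causes no breakdown. Forming each $H$ explicitly is done here only for readability; applying the reflection as $v - 2a(a^*v)/\|a\|^2$ is cheaper.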
Exercise 3.4 (3 points) Suppose $A$ has linearly independent columns and has the QR-factorization (3). Write $Q$ in the form $Q = [Q_1, Q_2]$ with $Q_1 \in \mathbb{C}^{m \times n}$, $Q_2 \in \mathbb{C}^{m \times (m-n)}$. Show that the unique solutions of the least squares problems in Section 2 (with $\mathcal{V} = R(A)$) are given by
$$v_0 = Q_1 Q_1^* b, \qquad x_0 = R^{-1} Q_1^* b.$$

4 The Schur decomposition

Theorem 4.1 For any $A \in \mathbb{C}^{n \times n}$ there exist a unitary matrix $V \in \mathbb{C}^{n \times n}$ and an upper triangular matrix $T$ such that
$$A = V T V^* \qquad \text{(Schur decomposition).}$$

Proof: The proof is by induction on $n$. The statement is trivial for $1 \times 1$ matrices. Let $n \geq 2$ and let $v$ be a unit eigenvector of $A$, i.e. $Av = \lambda v$ and $\|v\| = 1$. Choose an orthonormal basis $v_1, v_2, \dots, v_n$ of $\mathbb{C}^n$ such that $v_1 = v$. We have
$$A v_k = \xi_k v_1 + \sum_{j=2}^n x_{jk} v_j$$
for some $\xi_k, x_{jk} \in \mathbb{C}$. Let $\xi = [\xi_2, \dots, \xi_n]$ and $X = [x_{jk}]_{j,k=2}^n$. Then
$$A \underbrace{[v_1, v_2, \dots, v_n]}_{=:V_1} = [A v_1, A v_2, \dots, A v_n] = [\lambda v_1, A v_2, \dots, A v_n] = [v_1, v_2, \dots, v_n] \begin{bmatrix} \lambda & \xi \\ 0 & X \end{bmatrix}.$$
By the induction assumption there exist a unitary matrix $V_2$ and an upper triangular matrix $T_2$ such that $X = V_2 T_2 V_2^*$. Hence,
$$A = V_1 \begin{bmatrix} \lambda & \xi \\ 0 & X \end{bmatrix} V_1^* = V_1 \begin{bmatrix} \lambda & \xi \\ 0 & V_2 T_2 V_2^* \end{bmatrix} V_1^* = \underbrace{V_1 \begin{bmatrix} 1 & 0 \\ 0 & V_2 \end{bmatrix}}_{=:V} \underbrace{\begin{bmatrix} \lambda & \xi V_2 \\ 0 & T_2 \end{bmatrix}}_{=:T} \underbrace{\begin{bmatrix} 1 & 0 \\ 0 & V_2^* \end{bmatrix} V_1^*}_{=V^*}.$$
It is easily verified that $V$ is unitary, and $T$ is upper triangular because $T_2$ is.

5 Normal matrices

A matrix $A \in \mathbb{C}^{n \times n}$ is said to be normal if it commutes with its adjoint, i.e. if the identity $A A^* = A^* A$ holds. Hermitian and unitary matrices are normal.

Proposition 5.1 A matrix $A \in \mathbb{C}^{n \times n}$ is normal if and only if there exist a unitary matrix $V \in \mathbb{C}^{n \times n}$ and a diagonal matrix $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n) \in \mathbb{C}^{n \times n}$ such that
$$A = V \Lambda V^*. \qquad (4)$$

Exercise 5.2 (2 points) Prove Proposition 5.1. Hint: First show that if $A = V B V^*$ with unitary $V$ then $A$ is normal if and only if $B$ is normal. Then use the Schur decomposition and show that a triangular matrix $T$ is normal if and only if it is diagonal.
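As a numerical illustration of Proposition 5.1 and of the hint in Exercise 5.2 (an illustration, not a proof), the following NumPy sketch builds $A = V \Lambda V^*$ from a random unitary $V$, obtained from `numpy.linalg.qr` of a random complex matrix, and checks that $A$ is normal, while a triangular but non-diagonal $T$ is not. The helper `is_normal`, its tolerance and the chosen eigenvalues are our own assumptions.

```python
import numpy as np

def is_normal(A, tol=1e-10):
    """Hypothetical helper: test A A^* = A^* A up to a relative tolerance."""
    A = np.asarray(A)
    Ah = A.conj().T
    return np.linalg.norm(A @ Ah - Ah @ A) <= tol * max(1.0, np.linalg.norm(A) ** 2)

rng = np.random.default_rng(1)
n = 4
# A random unitary V from the QR-decomposition of a random complex matrix.
V, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
lam = np.array([2.0, -1.0 + 1.0j, 0.5j, 3.0])
A = V @ np.diag(lam) @ V.conj().T      # A = V Lambda V^*, cf. (4)

print(is_normal(A))                    # True
T = np.array([[1.0, 1.0], [0.0, 2.0]])
print(is_normal(T))                    # False: T T^* != T^* T
```

For this $T$ one finds $TT^* = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}$ but $T^*T = \begin{bmatrix} 1 & 1 \\ 1 & 5 \end{bmatrix}$.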
Recall that in the decomposition (4) the diagonal elements of $\Lambda$ are the eigenvalues of $A$ and the columns of $V$ are associated eigenvectors. Thus Proposition 5.1 states that a matrix $A$ is normal if and only if there exists an orthonormal basis of $\mathbb{C}^n$ consisting of eigenvectors of $A$. The eigenvalues can be arbitrary complex numbers. However, a normal matrix $A$ is Hermitian (unitary) if and only if all its eigenvalues are real (have modulus 1). Finally, note that (4) can be written in the form
$$A = \sum_{k=1}^n \lambda_k v_k v_k^*.$$
This follows from (1).

6 The singular value decomposition (SVD)

Proposition 6.1 Let $A \in \mathbb{C}^{m \times n}$, $\operatorname{rank} A = r$. Then there exist unitary matrices $V \in \mathbb{C}^{n \times n}$, $U \in \mathbb{C}^{m \times m}$ and positive numbers $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$ such that
$$A = U \underbrace{\begin{bmatrix} \hat\Sigma & 0 \\ 0 & 0 \end{bmatrix}}_{=:\Sigma} V^*, \qquad \hat\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r). \qquad (5)$$
In this factorization the numbers $\sigma_k$ are unique. They are called the singular values of $A$. Let $\lambda_1(\cdot) \geq \lambda_2(\cdot) \geq \dots$ denote the eigenvalues of a Hermitian matrix in decreasing order. Then
$$\sigma_k = \sqrt{\lambda_k(A^* A)} = \sqrt{\lambda_k(A A^*)} \qquad \text{for } k = 1, \dots, r.$$
Convention: we define the singular values $\sigma_k$ of $A$ to be $0$ for $k > \operatorname{rank} A$.

Proof: The positive semidefinite Hermitian matrix $A^* A$ has nonnegative eigenvalues $\lambda_k \geq 0$. Define $\sigma_k = \sqrt{\lambda_k}$. Let $V = [v_1, \dots, v_n]$ be a unitary matrix whose columns form a basis of eigenvectors such that $A^* A v_k = \sigma_k^2 v_k$ and $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_n$. Since $\operatorname{rank} A^* A = \operatorname{rank} A = r$, we have $\sigma_k > 0$ for $k \leq r$ and $\sigma_k = 0$ for $k > r$. For $k \leq r$ define $u_k := A v_k / \sigma_k$. Then
$$u_j^* u_k = \frac{v_j^* (A^* A) v_k}{\sigma_j \sigma_k} = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{otherwise.} \end{cases}$$
Thus, the vectors $u_1, \dots, u_r$ are pairwise orthogonal unit vectors. Now choose $u_{r+1}, \dots, u_m \in \mathbb{C}^m$ such that $U = [u_1, \dots, u_m] \in \mathbb{C}^{m \times m}$ is unitary. Then we have $AV = U\Sigma$ (for $k > r$ note that $\|A v_k\|^2 = v_k^* A^* A v_k = \sigma_k^2 = 0$). This implies (5).

Remark: Suppose the unitary matrices
$$V = [\underbrace{v_1, \dots, v_r}_{=:V_1}, \underbrace{v_{r+1}, \dots, v_n}_{=:V_2}], \qquad (6)$$
$$U = [\underbrace{u_1, \dots, u_r}_{=:U_1}, \underbrace{u_{r+1}, \dots, u_m}_{=:U_2}] \qquad (7)$$
satisfy (5). Then
$$R(V_1) = N(A)^\perp, \qquad R(V_2) = N(A), \qquad R(U_1) = R(A), \qquad R(U_2) = R(A)^\perp.$$
Furthermore, we have
$$A = U_1 \hat\Sigma V_1^* = \sum_{k=1}^r \sigma_k u_k v_k^*.$$
(The second equation is a special case of (1).)

Exercise 6.2 (2 points) Show that the singular values of a normal matrix $A$ are the absolute values of the eigenvalues of $A$.

7 The Moore-Penrose generalized inverse

For any matrix $A \in \mathbb{C}^{m \times n}$ the linear map
$$l : N(A)^\perp \to R(A), \qquad x \mapsto l(x) := Ax,$$
is bijective (i.e. one-to-one and onto). Hence the map
$$\operatorname{pinv} : \mathbb{C}^m \to \mathbb{C}^n, \qquad \operatorname{pinv}(y_1 + y_2) := l^{-1}(y_1), \quad y_1 \in R(A), \ y_2 \in R(A)^\perp,$$
is well defined (the decomposition $y = y_1 + y_2$ is unique because $\mathbb{C}^m = R(A) \oplus R(A)^\perp$). It is easily seen that $\operatorname{pinv}$ is linear. By elementary linear algebra there is a unique matrix $A^+ \in \mathbb{C}^{n \times m}$ such that $\operatorname{pinv}(y) = A^+ y$ for all $y \in \mathbb{C}^m$. This matrix is called the Moore-Penrose generalized inverse of $A$. It can be computed via a singular value decomposition. Precisely, with the notation (5), (6) and (7), we have
$$A^+ = V \begin{bmatrix} \hat\Sigma^{-1} & 0 \\ 0 & 0 \end{bmatrix} U^* = V_1 \hat\Sigma^{-1} U_1^* = \sum_{k=1}^r \sigma_k^{-1} v_k u_k^*.$$
The Moore-Penrose inverse yields the solution of the least squares problem even if $A$ does not have full column rank:

Proposition 7.1 Let $A \in \mathbb{C}^{m \times n}$, $b \in \mathbb{C}^m$. The set of minimizers of the function $g(x) = \|Ax - b\|$, $x \in \mathbb{C}^n$, is the affine space $A^+ b + N(A)$. Furthermore, $A^+ b$ is the minimizer with the smallest (Euclidean) norm.

Exercise 7.2 (4+1+1 points)
(a) Prove Proposition 7.1.
(b) Show that $A^+ = (A^* A)^{-1} A^*$ if the columns of $A$ are linearly independent.
(c) Show that $A^+ A$ is the orthogonal projector onto $N(A)^\perp$, and that $A A^+$ is the orthogonal projector onto $R(A)$.
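A NumPy sketch of the SVD formula $A^+ = V_1 \hat\Sigma^{-1} U_1^*$. `numpy.linalg.svd` returns $U$, the singular values in decreasing order and $V^*$ (as `Vh`). The numerical rank $r$ is decided by a relative threshold `rtol`, an assumption of this sketch, since computed singular values of a rank-deficient matrix are rarely exactly zero. The function name `pinv_svd` is hypothetical; NumPy's own `numpy.linalg.pinv` serves the same purpose.

```python
import numpy as np

def pinv_svd(A, rtol=1e-12):
    """Moore-Penrose inverse A^+ = V_1 Sigma_hat^{-1} U_1^* via a thin SVD."""
    A = np.asarray(A)
    U, s, Vh = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) Vh
    r = int(np.sum(s > rtol * s[0])) if s.size and s[0] > 0 else 0
    U1 = U[:, :r]                 # basis of R(A)
    V1 = Vh[:r, :].conj().T       # basis of N(A)^perp
    return V1 @ np.diag(1.0 / s[:r]) @ U1.conj().T

A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])      # rank 1
b = np.array([1.0, 0.0, 1.0])
Ap = pinv_svd(A)
x = Ap @ b                                            # minimum-norm least squares solution
print(np.allclose(A.T @ A @ x, A.T @ b))              # True: x solves the normal equation
```

Here $x = A^+ b = [4, 8]^T / 70$. Adding a multiple of the null vector $[2, -1]^T$ gives other minimizers (Proposition 7.1), all of larger norm because $x \in N(A)^\perp$.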
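Exercise 7.3 (6 points) Prove the following statement. Let $A \in \mathbb{C}^{m \times n}$, $B \in \mathbb{C}^{p \times q}$, $C \in \mathbb{C}^{m \times q}$. Then the matrix equation $AXB = C$ has a solution $X \in \mathbb{C}^{n \times p}$ if and only if
$$A A^+ C B^+ B = C.$$
In this case the general solution is
$$X = A^+ C B^+ + Y - A^+ A Y B B^+, \qquad Y \in \mathbb{C}^{n \times p}.$$

8 The spectral norm and the condition number of a matrix

We consider the quantities
$$\|A\| := \max_{x \in \mathbb{C}^n,\, x \neq 0} \frac{\|Ax\|}{\|x\|}, \qquad \inf(A) := \min_{x \in \mathbb{C}^n,\, x \neq 0} \frac{\|Ax\|}{\|x\|},$$
where $\|x\|$ and $\|Ax\|$ are the Euclidean vector norms. From the definition it follows that
$$\inf(A)\, \|x\| \leq \|Ax\| \leq \|A\|\, \|x\| \qquad \text{for all } x \in \mathbb{C}^n.$$
The inequalities are sharp: for each of them there is an $x \neq 0$ attaining equality. The quantity $\|A\|$ is called the spectral norm of $A$.

Proposition 8.1 For any $A \in \mathbb{C}^{m \times n}$,
$$\|A\| = \sigma_1, \qquad \inf(A) = \sigma_n,$$
where $\sigma_1$ and $\sigma_n$ denote the largest and the smallest (the $n$th) singular value of $A$ (recall the convention $\sigma_k = 0$ for $k > \operatorname{rank} A$).

Proof: Multiplication of a vector with a unitary matrix does not change its Euclidean norm. Thus
$$\frac{\|Ax\|}{\|x\|} = \frac{\|U \Sigma V^* x\|}{\|V^* x\|} = \frac{\|\Sigma y\|}{\|y\|} = \sqrt{\frac{\sum_{k=1}^n \sigma_k^2 |y_k|^2}{\sum_{k=1}^n |y_k|^2}}, \qquad (*)$$
where $y = [y_1, \dots, y_n]^T := V^* x$. The quotient $(*)$ is obviously bounded from below by the smallest singular value and from above by the largest singular value. The bounds are attained for the vectors $y = [0, \dots, 0, 1]^T = V^* v_n$ and $y = [1, 0, \dots, 0]^T = V^* v_1$, respectively.

Exercise 8.2 (2 points) Let $A \in \mathbb{C}^{n \times n}$ be nonsingular. Prove that
$$\inf(A) = \|A^{-1}\|^{-1}. \qquad (8)$$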
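The following NumPy sketch checks Proposition 8.1 and Exercise 8.2 on random data (a sanity check, not a proof). It relies on `numpy.linalg.norm(A, 2)` returning the largest singular value and on `numpy.linalg.svd` returning singular values in decreasing order. The random matrices and the `1e-12` slack for rounding are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))            # m >= n, so sigma_n is the last entry of s
s = np.linalg.svd(A, compute_uv=False)     # singular values in decreasing order
sigma_1, sigma_n = s[0], s[-1]

# ||A|| = sigma_1: NumPy's matrix 2-norm is the spectral norm.
print(np.isclose(np.linalg.norm(A, 2), sigma_1))                      # True

# inf(A) <= ||Ax|| / ||x|| <= ||A|| on many random x.
X = rng.standard_normal((3, 1000))
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)
print(sigma_n - 1e-12 <= ratios.min(), ratios.max() <= sigma_1 + 1e-12)  # True True

# The bounds are attained at the right singular vectors v_1 and v_n.
_, _, Vh = np.linalg.svd(A)
v1, vn = Vh[0], Vh[-1]
print(np.isclose(np.linalg.norm(A @ v1), sigma_1), np.isclose(np.linalg.norm(A @ vn), sigma_n))

# Exercise 8.2 for a (random, hence almost surely nonsingular) square matrix.
B = rng.standard_normal((4, 4))
sB = np.linalg.svd(B, compute_uv=False)
print(np.isclose(sB[-1], 1.0 / np.linalg.norm(np.linalg.inv(B), 2)))  # True
```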
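For a nonsingular square matrix $A \in \mathbb{C}^{n \times n}$ the quantity
$$\kappa(A) := \|A\|\, \|A^{-1}\| = \frac{\|A\|}{\inf(A)} = \frac{\sigma_1}{\sigma_n}$$
is called its condition number. The condition number occurs in the following error bound for the solution of linear equations.

Proposition 8.3 Let $A, \hat A \in \mathbb{C}^{n \times n}$ be nonsingular, and let $x, \hat x, b \neq 0$ be such that $Ax = b$, $\hat A \hat x = b$. Then
$$\frac{\|x - \hat x\|}{\|\hat x\|} \leq \kappa(A)\, \frac{\|A - \hat A\|}{\|A\|}.$$

Proof: We have
$$x - \hat x = A^{-1} b - \hat A^{-1} b = (A^{-1} - \hat A^{-1}) b = A^{-1} (\hat A - A) \hat A^{-1} b = A^{-1} (\hat A - A) \hat x.$$
Thus,
$$\|x - \hat x\| \leq \|A^{-1}\|\, \|\hat A - A\|\, \|\hat x\| = \kappa(A)\, \frac{\|\hat A - A\|}{\|A\|}\, \|\hat x\|.$$
This yields the result.

The larger the condition number, the less reliable is the numerical solution of a linear equation.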
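To see Proposition 8.3 at work, this NumPy sketch solves $Ax = b$ and a slightly perturbed system $\hat A \hat x = b$, then compares the relative error with the bound $\kappa(A) \|A - \hat A\| / \|A\|$. The helper `cond2` and the perturbation size `1e-6` are illustrative assumptions. `numpy.linalg.cond` with its default `p=None` computes the same 2-norm condition number from the SVD.

```python
import numpy as np

def cond2(A):
    """Hypothetical helper: kappa(A) = sigma_1 / sigma_n for nonsingular square A."""
    s = np.linalg.svd(A, compute_uv=False)
    return s[0] / s[-1]

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
A_hat = A + 1e-6 * rng.standard_normal((n, n))   # perturbed matrix

x = np.linalg.solve(A, b)
x_hat = np.linalg.solve(A_hat, b)

lhs = np.linalg.norm(x - x_hat) / np.linalg.norm(x_hat)
rhs = cond2(A) * np.linalg.norm(A - A_hat, 2) / np.linalg.norm(A, 2)
print(lhs <= rhs)                                 # True, as Proposition 8.3 predicts
print(np.isclose(cond2(A), np.linalg.cond(A)))    # True
```

For a random perturbation the bound is usually far from attained. It is sharp only for particular directions of $\hat A - A$ and $\hat x$.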