MS-A000* Matrix algebra (Ville Turunen, Aalto University)


Figure 1: Geometry of Pythagoras' equation a^2 + b^2 = c^2, with x = a − b

MS-A000* Matrix algebra (Ville Turunen, Aalto University)

Welcome to the lecture course on linear algebra! We shall treat vectors and matrices in Euclidean spaces and the spectral theory (eigenvalue decompositions) of normal matrices, finally reaching the so-called SVD (Singular Value Decomposition). Such vector matters will be found everywhere in advanced mathematics and its applications in engineering and the sciences. Feedback is welcome: please contact ville.turunen@aalto.fi.

0 It's all about geometry and algebra

Vectors are best understood when illustrating the algebraic manipulations with schematic pictures. For the best learning outcomes, the reader of these notes should draw his/her own mathematical pictures next to the calculations. For example, see the picture visualizing Pythagoras' equation. There, by the areas of the right-angled triangles and the squares,

c^2 = (a − b)^2 + 4·(ab/2) = (a^2 + b^2 − 2ab) + 2ab = a^2 + b^2.

Thus we get Pythagoras' equation

a^2 + b^2 = c^2.  (1)

The geometry of vectors is based on this! Actually, we may say without exaggeration that high-dimensional (even infinite-dimensional) calculations can often be correctly illustrated by just two-dimensional pictures!

Figure 2: Polar coordinates (r, φ) for point x = (x₁, x₂) ∈ R^2

1 Real plane and complex numbers

Definition. Let R be the real line. The real plane R^2 consists of points x = (x₁, x₂), where x₁, x₂ ∈ R are the (Cartesian) coordinates of x ∈ R^2.

Definition. The polar coordinates (r, φ) = (|x|, arg(x)) of point x = (x₁, x₂) ∈ R^2 satisfy

r := |x| = √(x₁^2 + x₂^2);  tan(φ) = x₂/x₁ if x₁ ≠ 0.

Here r ≥ 0 is the distance of x from the origin, and the argument φ = arg(x) ∈ R is the angle between the positive horizontal axis and the interval from O to x.

Remark: the argument φ is identified with φ + 2π here.

Example. If x = (√3, 1), then x = (2 cos(π/6), 2 sin(π/6)), since

|x| = √((√3)^2 + 1^2) = 2,  arg(x) = arctan(1/√3) = π/6.
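The Cartesian-to-polar conversion above is easy to check numerically. A small sketch in Python (the helper name `polar` is just illustrative; `atan2` is used instead of a plain arctangent so that all quadrants are handled):

```python
import math

# Cartesian -> polar conversion for a point x = (x1, x2) in R^2.
def polar(x1, x2):
    r = math.hypot(x1, x2)        # r = sqrt(x1^2 + x2^2)
    phi = math.atan2(x2, x1)      # phi = arg(x), in (-pi, pi]
    return r, phi

# The example from the notes: x = (sqrt(3), 1) has r = 2, phi = pi/6.
r, phi = polar(math.sqrt(3), 1.0)
```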

Figure 3: Arithmetic with complex numbers

Complex numbers

Definition. Points x = (x₁, x₂) ∈ R^2 can be thought of as complex numbers x = x₁ + ix₂ ∈ C, where Re(x) := x₁ ∈ R is the real part and Im(x) := x₂ ∈ R is the imaginary part, i := 0 + i1 is the imaginary unit, and t ∈ R is identified with t + i0 ∈ C. If x, y ∈ C, there are the familiar formulas:

x + y := (x₁ + y₁) + i(x₂ + y₂),
x − y := (x₁ − y₁) + i(x₂ − y₂),
−x := (−x₁) + i(−x₂).

The complex conjugate of x = x₁ + ix₂ ∈ C is x̄ = x* := x₁ − ix₂ ∈ C. The absolute value of x ∈ C is (thinking of Pythagoras)

|x| := √(x₁^2 + x₂^2).

Example. If x = 3 + i and y = 1 + 2i then x + y = 4 + 3i, x − y = 2 − i, x̄ = 3 − i, |x| = √(3^2 + 1^2) = √10.
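Python's built-in complex type implements exactly this arithmetic, so the example can be replayed directly:

```python
# The example from the notes: x = 3 + i, y = 1 + 2i.
x = 3 + 1j
y = 1 + 2j

s = x + y              # (3+1) + i(1+2) = 4 + 3i
d = x - y              # (3-1) + i(1-2) = 2 - i
c = x.conjugate()      # complex conjugate: 3 - i
a = abs(x)             # absolute value: sqrt(3^2 + 1^2) = sqrt(10)
```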

Figure 4: Triangle inequality of complex numbers

Triangle inequality in C. For x, y ∈ C, there is the triangle inequality

|x + y| ≤ |x| + |y|.  (2)

Let's verify this. First, calculating

(|x| + |y|)^2 − |x + y|^2
= (|x| + |y|)^2 − |(x₁ + y₁) + i(x₂ + y₂)|^2
= (|x|^2 + 2|x||y| + |y|^2) − ((x₁ + y₁)^2 + (x₂ + y₂)^2)
= 2(|x||y| − (x₁y₁ + x₂y₂)),

we see that we need to show that |x||y| ≥ x₁y₁ + x₂y₂. Here

|x|^2|y|^2 − (x₁y₁ + x₂y₂)^2
= (x₁^2 + x₂^2)(y₁^2 + y₂^2) − (x₁^2 y₁^2 + 2x₁x₂y₁y₂ + x₂^2 y₂^2)
= x₁^2 y₂^2 + x₂^2 y₁^2 − 2x₁x₂y₁y₂
= (x₁y₂ − x₂y₁)^2 ≥ 0.

This proves the triangle inequality (2).

Figure 5: Product of complex numbers

Product of complex numbers. The product xy ∈ C of x, y ∈ C has a natural polar coordinate definition:

|xy| = |x||y|,  arg(xy) = arg(x) + arg(y).  (3)

Equivalently, in Cartesian coordinates

xy := (x₁y₁ − x₂y₂) + i(x₁y₂ + x₂y₁) ∈ C.  (4)

Especially, i^2 = (0 + 1i)(0 + 1i) = (0 − 1) + i(0 + 0) = −1. Equivalently: |i^2| = |i||i| = 1, and arg(i^2) = 2 arg(i) = 2·π/2 = π.

Example. If x = (4 + 3i)/5 and y = (12 + 5i)/13, then

xy = ((48 − 15) + i(20 + 36))/65 = (33 + 56i)/65.

By the way, here |x| = |y| = 1 = |xy|.

Remark. Always x̄x = (x₁ − ix₂)(x₁ + ix₂) = x₁^2 + x₂^2 = |x|^2 ≥ 0.

Figure 6: Triangle inequality of complex numbers, again

Another proof of the Triangle inequality:

|x + y|^2 = (x + y)(x + y)*
= (x + y)(x* + y*)
= xx* + yy* + xy* + yx*
= |x|^2 + |y|^2 + 2 Re(xy*)
≤ |x|^2 + |y|^2 + 2|xy*|
= |x|^2 + |y|^2 + 2|x||y|
= (|x| + |y|)^2.

Thus the Triangle inequality follows by taking square roots: |x + y| ≤ |x| + |y|.

Figure 7: Euler's formula e^(it) = cos(t) + i sin(t), t ∈ R; here 1 = e^0 = e^(i2π), i = e^(iπ/2), −1 = e^(iπ)

Division and powers of complex numbers. The division x/y ∈ C of x, y ∈ C (where y ≠ 0) satisfies

|x/y| = |x|/|y|,  arg(x/y) = arg(x) − arg(y).  (5)

Notice that x/y = xȳ/(yȳ) = xȳ/|y|^2.

Example. If x = (4 + 3i)/5 and y = (12 + 5i)/13 then

x/y = xȳ/|y|^2 = xȳ  (as |y| = 1)
= ((48 + 15) + i(36 − 20))/65 = (63 + 16i)/65.

Example. If n ∈ Z then |z^n| = |z|^n and arg(z^n) = n arg(z). In case z = cos(t) + i sin(t), we get de Moivre's formula:

(cos(t) + i sin(t))^n = cos(nt) + i sin(nt).  (6)

It can also be shown that Euler's formula

e^(it) = cos(t) + i sin(t)  (7)

holds, where the complex exponential is defined by the power series

e^z := Σ_{k=0}^∞ z^k/k!.  (8)

This satisfies for instance e^(w+z) = e^w e^z.
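Euler's formula (7) and de Moivre's formula (6) can be checked numerically with the standard library; the values t = 0.7 and n = 5 below are arbitrary test inputs:

```python
import cmath
import math

t, n = 0.7, 5
z = cmath.exp(1j * t)                         # Euler: e^{it} = cos t + i sin t
lhs = z ** n                                  # (cos t + i sin t)^n
rhs = complex(math.cos(n * t), math.sin(n * t))   # cos(nt) + i sin(nt)
```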

Figure 8: Schematic picture of vector operations

2 Vector spaces

Vectors x = (x₁, x₂, x₃, …, xₙ) are everywhere: the numbers xₖ are typically results of systematic measurements related to various phenomena. To understand real vectors, we need complex numbers (see Gauss' Fundamental Theorem of Algebra on page 40; also the Fourier transform easily takes real vectors to complex ones). From now on, let K := R or K := C (the real or complex number field).

Definition. Vector space K^n consists of points x = (x₁, …, xₙ), where xₖ ∈ K is the kth (Cartesian) coordinate of x ∈ K^n. Point O := (0, …, 0) ∈ K^n is the origin. If λ ∈ K and x, y ∈ K^n, let

x + y := (x₁ + y₁, …, xₙ + yₙ),
x − y := (x₁ − y₁, …, xₙ − yₙ),
λx := (λx₁, …, λxₙ),
−x := (−x₁, …, −xₙ).

Let a, b, x, y ∈ K^n; we identify the vectors ab and xy if b − a = y − x. Remark! We identify vector xy with the point y − x ∈ K^n.

How do the vector operations look? Illustrate these examples:

Example. Points λx, when x = (3, 1), −π ≤ λ ≤ π.
Example. λx + µy, when λ, µ ∈ {−1, 0, 1}, x = (3, 1), y = (1, 2).

Figure 9: Norms, distances

Geometry of vectors

Definition. The inner product (or dot product) of u, v ∈ C^n is

u · v = ⟨u, v⟩ := Σ_{k=1}^n uₖv̄ₖ = u₁v̄₁ + … + uₙv̄ₙ ∈ C.  (9)

The norm ‖u‖ ≥ 0 of u ∈ C^n is

‖u‖ := ⟨u, u⟩^(1/2) = (Σ_{k=1}^n |uₖ|^2)^(1/2) = (|u₁|^2 + … + |uₙ|^2)^(1/2).  (10)

‖u‖^2 = ⟨u, u⟩ can be thought of as the energy of the vector u ∈ C^n. Here, the number |uₖ| ≥ 0 is the distance from the equilibrium for each index k ∈ {1, …, n}, with the energy proportional to |uₖ|^2. Then the sum of such energies is the total energy ‖u‖^2.

Definition. Define the distance between u, v ∈ C^n to be

‖u − v‖ ≥ 0.  (11)

Remark. Picture a triangle with vertices at O, u, v ∈ C^n; notice that

‖u − v‖^2 = ‖u‖^2 + ‖v‖^2 − 2 Re⟨u, v⟩.

Definition. Vectors u, v ∈ C^n are orthogonal if ⟨u, v⟩ = 0 (and then ‖u − v‖^2 = ‖u‖^2 + ‖v‖^2; remember Pythagoras!).

Idea: ⟨u, v⟩ = ‖u‖‖v‖ cos(α), with α the angle between u and v. Now ⟨u, v⟩ = 0 means cos(α) = 0 (when u ≠ O ≠ v), the case of orthogonality!

Figure 10: Unit normalizations u/‖u‖ and v/‖v‖ of vectors u ≠ O and v ≠ O

Cauchy–Schwarz inequality. Remember: energy ⟨u, u⟩ = ‖u‖^2 ≥ 0. Can we estimate ⟨u, v⟩ ∈ C somehow?

Proposition (Cauchy–Schwarz inequality). For every u, v ∈ C^n,

|⟨u, v⟩| ≤ ‖u‖‖v‖.  (12)

Proof. We may assume ‖u‖‖v‖ > 0 (for otherwise the claim is trivial). Also, noticing that

|⟨u/‖u‖, v/‖v‖⟩| = |⟨u, v⟩|/(‖u‖‖v‖),

we may assume the unit normalizations ‖u‖ = 1 = ‖v‖, and then just prove that |⟨u, v⟩| ≤ 1. (Why is this enough? Think!) Indeed, now

|⟨u, v⟩| = |Σ_{k=1}^n uₖv̄ₖ|
≤ Σ_{k=1}^n |uₖ||vₖ|  (by the triangle inequality (2))
≤ Σ_{k=1}^n (|uₖ|^2 + |vₖ|^2)/2  (∗)
= (‖u‖^2 + ‖v‖^2)/2
= 1  (as ‖u‖ = 1 = ‖v‖).

Above, inequality (∗) holds because 0 ≤ (|uₖ| − |vₖ|)^2. QED
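The inner product (9), the norm (10) and the Cauchy–Schwarz inequality (12) can be tried out in NumPy. One convention detail: `np.vdot(a, b)` conjugates its first argument, while (9) conjugates the second, so ⟨u, v⟩ corresponds to `np.vdot(v, u)`. The concrete vectors below are arbitrary test data:

```python
import numpy as np

u = np.array([1 + 2j, 3 - 1j, 0.5 + 0j])
v = np.array([2 - 1j, 1 + 1j, -1 + 4j])

# <u, v> = sum_k u_k * conj(v_k); np.vdot conjugates its FIRST argument,
# so with the convention of (9): <u, v> = np.vdot(v, u).
ip = np.vdot(v, u)
norm_u = np.sqrt(np.vdot(u, u).real)   # ||u|| = <u, u>^{1/2}
norm_v = np.sqrt(np.vdot(v, v).real)
```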

Figure 11: Triangle inequality of vectors

Triangle inequality (Corollary to Cauchy–Schwarz). For all u, v ∈ C^n,

‖u + v‖ ≤ ‖u‖ + ‖v‖.  (13)

Proof. By an application of Cauchy–Schwarz (12),

‖u + v‖^2 = ⟨u + v, u + v⟩
= ⟨u, u⟩ + ⟨u, v⟩ + ⟨v, u⟩ + ⟨v, v⟩
= ‖u‖^2 + 2 Re⟨u, v⟩ + ‖v‖^2
≤ ‖u‖^2 + 2‖u‖‖v‖ + ‖v‖^2  (by (12))
= (‖u‖ + ‖v‖)^2. QED

Example. Let x = London, y = Paris, z = Rome. For u = x − y and v = y − z, the Triangle inequality (13) then says:

‖x − z‖ ≤ ‖x − y‖ + ‖y − z‖.

The distance from x to z is at most the distance from x via y to z.

Figure 12: Parallelogram identity ‖u + v‖^2 + ‖u − v‖^2 = 2(‖u‖^2 + ‖v‖^2): for a parallelogram in C^n, the sum of the squares of the diagonals equals the sum of the squares of the edges

Polarization and parallelogram identities. The norm was defined by the inner product. Actually, the inner product can be recovered from the norm:

Exercise. Prove the polarization identity

4⟨u, v⟩ = ‖u + v‖^2 − ‖u − v‖^2 + i‖u + iv‖^2 − i‖u − iv‖^2.  (14)

Especially, notice that here

4 Re⟨u, v⟩ = ‖u + v‖^2 − ‖u − v‖^2,  Im⟨u, v⟩ = Re⟨u, iv⟩.

Exercise. Prove the parallelogram identity in C^n:

‖u + v‖^2 + ‖u − v‖^2 = 2(‖u‖^2 + ‖v‖^2).  (15)

Draw a parallelogram in a plane, with vertices at O, u, v, u + v ∈ R^2: how are ‖u + v‖, ‖u − v‖, ‖u‖, ‖v‖ connected to the distances between these vertices?
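Both exercise identities can be sanity-checked numerically before proving them; a sketch on random complex vectors (seeded for reproducibility, dimension 4 is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=4) + 1j * rng.normal(size=4)
v = rng.normal(size=4) + 1j * rng.normal(size=4)

ip = np.vdot(v, u)            # <u, v> with conjugation on the second argument
n = np.linalg.norm

# Polarization (14): 4<u,v> = ||u+v||^2 - ||u-v||^2 + i||u+iv||^2 - i||u-iv||^2
pol = n(u + v)**2 - n(u - v)**2 + 1j * n(u + 1j*v)**2 - 1j * n(u - 1j*v)**2

# Parallelogram (15): ||u+v||^2 + ||u-v||^2 = 2(||u||^2 + ||v||^2)
lhs_par = n(u + v)**2 + n(u - v)**2
rhs_par = 2 * (n(u)**2 + n(v)**2)
```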

Figure 13: Cross product

Cross product in R^3. The cross product u × v ∈ R^3 of vectors u, v ∈ R^3 is

u × v := (u₂v₃, u₃v₁, u₁v₂) − (u₃v₂, u₁v₃, u₂v₁).  (16)

Equivalent geometric definition for the cross product:
1. ⟨u, u × v⟩ = 0 = ⟨v, u × v⟩;
2. ‖u × v‖ = ‖u‖‖v‖ sin(α), with α the angle between u, v;
3. the vector triple (u, v, u × v) has right-handed orientation.

Application 1. ‖u × v‖ is the area of the parallelogram with vertices at O, u, v, u + v ∈ R^3.
Application 2. |⟨u × v, w⟩| is the volume of the polytope with vertices at O, u, v, w, u + v, u + w, v + w, u + v + w ∈ R^3.
Application 3. (q − p) × (r − p) is a normal vector to a plane S ⊂ R^3 if p, q, r ∈ S form a non-degenerate triangle.
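Definition (16) and Applications 1 and 2 can be exercised with `np.cross`; the three vectors below are arbitrary test data:

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0])
v = np.array([0.0, 1.0, 3.0])
w = np.array([2.0, 0.0, 1.0])

c = np.cross(u, v)           # (u2 v3 - u3 v2, u3 v1 - u1 v3, u1 v2 - u2 v1)
area = np.linalg.norm(c)     # area of the parallelogram spanned by u, v
volume = abs(np.dot(c, w))   # |<u x v, w>| = volume of the parallelepiped
```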

Nice to know: vector subspaces, dimension

Definition. Subset V ⊂ K^n is called a vector subspace if
(1) O ∈ V,
(2) x + y ∈ V whenever x, y ∈ V, and
(3) λx ∈ V whenever λ ∈ K and x ∈ V.

The span of vectors S₁, …, Sₖ ∈ K^n is the vector subspace

Zₖ = span{Sⱼ}_{j=1}^k := { Σ_{j=1}^k λⱼSⱼ ∈ K^n : λ₁, …, λₖ ∈ K }.

For example, Z₁ = KS₁ = {λ₁S₁ : λ₁ ∈ K} ⊂ K^n (this is a line or the origin), and

{O} ⊂ Z₁ ⊂ Z₂ ⊂ … ⊂ Zₖ₋₁ ⊂ Zₖ ⊂ K^n.

The dimensions are dim(K^n) = n and dim({O}) = 0; if here Zⱼ₊₁ ≠ Zⱼ, then

dim(Zⱼ₊₁) = 1 + dim(Zⱼ).

Hence 0 ≤ dim(Zₖ) ≤ k. Vectors S₁, …, Sₖ are linearly independent if dim(Zₖ) = k (otherwise they are called linearly dependent).

Note. The dimension depends on the scalar field K: the complex number field C has dimension 1 as a complex vector space. If C is identified with the real plane R^2, then it has dimension 2 as a real vector space. That is,

dim(C) = 1 < 2 = dim(R^2).

Similarly, dim(C^n) = n < 2n = dim(R^(2n)), where we understand C^n to be a complex vector space, even if sometimes it might be identified with the real vector space R^(2n).

Nice to know: Gram–Schmidt orthonormalization

Definition. Vectors U₁, …, Uₖ ∈ K^n are orthonormal if for all j, l ∈ {1, …, k}

⟨Uⱼ, Uₗ⟩ = 1 when j = l, and ⟨Uⱼ, Uₗ⟩ = 0 when j ≠ l.

Gram–Schmidt algorithm. Let vectors S₁, …, Sₖ ∈ K^n be linearly independent. The Gram–Schmidt process produces orthonormal vectors U₁, …, Uₖ ∈ K^n as follows. Let

U₁ := S₁/‖S₁‖,  (17)

and for j ≥ 2 then

Uⱼ := Vⱼ/‖Vⱼ‖,  (18)

where

Vⱼ := Sⱼ − Σ_{l=1}^{j−1} ⟨Sⱼ, Uₗ⟩Uₗ.  (19)

Notice that for all d ∈ {1, …, k} here

span{S₁, …, S_d} = span{U₁, …, U_d}.

Remark. This Gram–Schmidt process is numerically unstable (round-off errors accumulate), but there are ways to stabilize the process.

Example. Vectors S₁ = (3, 4) and S₂ = (−10, 5) give

U₁ = S₁/‖S₁‖ = (1/5)(3, 4) = (3/5, 4/5),

and then, since ⟨S₂, U₁⟩ = −6 + 4 = −2,

V₂ = S₂ − ⟨S₂, U₁⟩U₁ = (−10, 5) + 2(3/5, 4/5) = (−44/5, 33/5),

so that

U₂ = V₂/‖V₂‖ = (1/11)(−44/5, 33/5) = (−4/5, 3/5).
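Formulas (17)-(19) translate almost line by line into code. A minimal sketch of the classical (numerically unstable, as remarked above) process, reproducing the example:

```python
import numpy as np

def gram_schmidt(S):
    """Classical Gram-Schmidt: rows of S -> orthonormal rows, per (17)-(19)."""
    U = []
    for s in S:
        # V_j = S_j - sum_l <S_j, U_l> U_l ; np.vdot(u, s) = <s, u>
        v = s - sum(np.vdot(u, s) * u for u in U)
        U.append(v / np.linalg.norm(v))   # U_j = V_j / ||V_j||
    return np.array(U)

# The example: S1 = (3, 4), S2 = (-10, 5).
U = gram_schmidt(np.array([[3.0, 4.0], [-10.0, 5.0]]))
```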

Figure 14: Idea of a linear mapping A : K^n → K^m; A(u + v) = A(u) + A(v), A(λu) = λA(u), O = A(O)

3 Linear functions, linear equations

A function A : K^n → K^m is linear if

A(u + v) = A(u) + A(v),  A(λu) = λA(u)

for all λ ∈ K and u, v ∈ K^n. Then we often write Au := A(u). Equivalently, function A : K^n → K^m is linear if and only if its graph G(A) is a vector subspace of K^n × K^m = K^(n+m), where

G(A) := {(u, A(u)) : u ∈ K^n}.

Generic example. Define function A : K^n → K^m by

(A(u))ⱼ := Σ_{k=1}^n Aⱼₖuₖ,  (20)

where the matrix elements Aⱼₖ ∈ K for each j ∈ {1, …, m} and k ∈ {1, …, n}. Then A is linear. Actually, this example is surprisingly(?) general, as we shall soon see.

Trivial example. We identify numbers x ∈ R with vectors (x) ∈ R^1. In this sense, a function f : R → R is linear if and only if f(x) = ax for some a ∈ R. Then the graph G(f) = {(x, f(x)) : x ∈ R}, described by the equation y = f(x), is a line through the origin (0, 0) of the (x, y)-plane, and this line has the slope a.

Figure 15: Standard basis vectors of R^3 and R^2, respectively: i = (1, 0, 0) = I₁, j = (0, 1, 0) = I₂, k = (0, 0, 1) = I₃; and i = (1, 0) = I₁, j = (0, 1) = I₂

Norm. The norm of a linear A : K^n → K^m is the number

‖A‖ := max_{u ∈ K^n: ‖u‖ ≤ 1} ‖Au‖ = max_{u ∈ K^n, v ∈ K^m: ‖u‖, ‖v‖ ≤ 1} |⟨Au, v⟩|.  (21)

Then clearly ‖Aw‖ ≤ ‖A‖‖w‖ for all w ∈ K^n. Idea here: linear functions treat the vector operations nicely. The norm measures the size of the mapping A (that is, how much A can stretch vectors in the extreme case).

Standard basis vectors

Definition. Standard basis vectors I₁, …, Iₙ ∈ K^n are defined by

Iⱼₖ := 1 if j = k,  Iⱼₖ := 0 if j ≠ k.

For example, I₁, I₂, I₃ ∈ K^3 are

I₁ = (1, 0, 0) = i,  I₂ = (0, 1, 0) = j,  I₃ = (0, 0, 1) = k.

Vectors u = (u₁, …, uₙ) ∈ K^n can then be written

u = Σ_{k=1}^n uₖIₖ,

which we shall use in the sequel. Example: (9, 7, 5) = 9(1, 0, 0) + 7(0, 1, 0) + 5(0, 0, 1).

Matrix [A] of a linear function A. Let A : K^n → K^m be linear. Let I₁, …, Iₙ ∈ K^n be the standard basis vectors. If u = (u₁, …, uₙ) ∈ K^n then Au ∈ K^m, and

Au = A(Σ_{k=1}^n uₖIₖ) = Σ_{k=1}^n uₖA(Iₖ)  (A linear),

where A(Iₖ) ∈ K^m. Let Aⱼₖ := (A(Iₖ))ⱼ ∈ K. Then the matrix

[A] :=
[ A₁₁ A₁₂ … A₁ₙ ]
[ A₂₁ A₂₂ … A₂ₙ ]
[ ⋮         ⋮  ]
[ Aₘ₁ Aₘ₂ … Aₘₙ ]  ∈ K^(m×n)

(with m rows, n columns) has all the information about the linear function A, as

A : K^n → K^m,  (Au)ⱼ = Σ_{k=1}^n Aⱼₖuₖ.  (22)

In the sequel, we shall always identify the vector u = (u₁, …, uₙ) ∈ K^n with the column matrix [u] = [u₁; …; uₙ] ∈ K^(n×1), and we shall identify the linear mapping A : K^n → K^m with its matrix [A] ∈ K^(m×n). Notice that the kth column Aₖ ∈ K^(m×1) of the matrix [A] = [A₁ … Aₙ] ∈ K^(m×n) is then

Aₖ = [A₁ₖ; A₂ₖ; …; Aₘₖ] = [A(Iₖ)] ∈ K^(m×1),

which is identified with the vector A(Iₖ) = (A₁ₖ, A₂ₖ, …, Aₘₖ) ∈ K^m. Especially, the standard basis vectors I₁, …, Iₙ ∈ K^n form the so-called identity matrix [I] = [I₁ … Iₙ] ∈ K^(n×n), for which Iu = u for all u ∈ K^n.

Figure 16: Matrix [A] = [A₁ … Aₙ] ∈ K^(m×n) of linear mapping A : K^n → K^m; A₁ = A(I₁), …, Aₙ = A(Iₙ), O = A(O)

Example. Let A : R^3 → R^2 be defined by

A(u) := (2u₁ + 3u₂ + 4u₃, 5u₁ + 6u₂ + 7u₃).  (23)

Then A is linear, and its matrix [A] ∈ R^(2×3) is

[A] = [2 3 4; 5 6 7].

The matrix notation for (23) would be

Au = [2 3 4; 5 6 7][u₁; u₂; u₃] = [2u₁ + 3u₂ + 4u₃; 5u₁ + 6u₂ + 7u₃].

For instance, if u = (−1, −2, −3) ∈ R^3, then

[Au] = [A][u] = [2(−1) + 3(−2) + 4(−3); 5(−1) + 6(−2) + 7(−3)] = [−20; −38],

meaning that Au = (−20, −38) ∈ R^2. We shall calculate the norm ‖A‖ later.
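The columns-are-images-of-basis-vectors principle from the previous page can be demonstrated on this very example: feeding the standard basis vectors to the map (23) recovers the matrix [A], which then reproduces Au = (−20, −38):

```python
import numpy as np

# The linear map (23): A(u) = (2u1 + 3u2 + 4u3, 5u1 + 6u2 + 7u3).
def A(u):
    return np.array([2*u[0] + 3*u[1] + 4*u[2],
                     5*u[0] + 6*u[1] + 7*u[2]])

# The kth column of [A] is A(I_k), so stack the images of the basis vectors.
M = np.column_stack([A(e) for e in np.eye(3)])
Au = M @ np.array([-1.0, -2.0, -3.0])
```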

Matrices can be read from inner products: If A : K^n → K^m is linear and v ∈ K^m, then

⟨Au, v⟩ = Σ_{j=1}^m (Au)ⱼv̄ⱼ = Σ_{j=1}^m Σ_{k=1}^n Aⱼₖuₖv̄ⱼ.

Thus, if I = [I₁ I₂ I₃ …] denotes the identity matrix of any dimension, and if u = Iₖ ∈ K^n and v = Iⱼ ∈ K^m, then

⟨AIₖ, Iⱼ⟩ = Aⱼₖ.  (24)

So, if we know all the inner products ⟨Au, v⟩ ∈ K for all u ∈ K^n and v ∈ K^m, we know all the matrix elements Aⱼₖ ∈ K.

Remark. Next we see that already the inner products ⟨Au, u⟩ are enough to determine A ∈ C^(n×n) when u ∈ C^n (but not when u ∈ R^n).

Figure 17: Rotation A ∈ R^(2×2) by angle φ around the origin: A = [A₁ A₂] = [cos(φ) −sin(φ); sin(φ) cos(φ)]; identity matrix I = [I₁ I₂] = [1 0; 0 1]

Weak formulation of linear mappings

Weak Formulation Theorem. Let A, B ∈ C^(n×n). Then A = B if

⟨Au, u⟩ = ⟨Bu, u⟩ for all u ∈ C^n.  (25)

Proof. Let u, v, w ∈ C^n, λ ∈ C, and Cw = (A − B)w = Aw − Bw. Here

⟨Cw, w⟩ = ⟨(A − B)w, w⟩ = ⟨Aw − Bw, w⟩ = ⟨Aw, w⟩ − ⟨Bw, w⟩ = 0,

so that, taking w = u + λv and using ⟨Cw, w⟩ = 0 = ⟨Cu, u⟩ = ⟨Cv, v⟩,

0 = ⟨C(u + λv), u + λv⟩ = λ̄⟨Cu, v⟩ + λ⟨Cv, u⟩.

Plug in λ ∈ {1, i} to get the following pair of equations:

0 = ⟨Cu, v⟩ + ⟨Cv, u⟩,
0 = ⟨Cu, v⟩ − ⟨Cv, u⟩.

Clearly, ⟨Cu, v⟩ = 0 for all u, v ∈ C^n. So Aw − Bw = Cw = O for all w ∈ C^n. Thus A = B. QED

Remark. Complex numbers are needed in (25). If we only had ⟨Au, u⟩ = ⟨Bu, u⟩ for all u ∈ R^n, we could still have A ≠ B. For example,

A = [cos(φ) −sin(φ); sin(φ) cos(φ)]

rotates R^2 by angle φ ∈ R around the origin. Then ⟨Au, u⟩ = cos(φ)‖u‖^2, which does not identify A when |cos(φ)| < 1.

Figure 18: Linear equation Ax = b may have infinitely many solutions: if Ax = b = Az and x ≠ z, then all the points y = tx + (1 − t)z on the line passing through x and z are also solutions, as A(tx + (1 − t)z) = tAx + (1 − t)Az = b for all t ∈ K

4 Linear equations, Gauss elimination

For b ∈ K^m and linear A : K^n → K^m, the linear equation

Ax = b  (26)

can have 0 or 1 or infinitely many solutions x ∈ K^n.

Remark: While real-world problems are typically non-linear, they can often be locally approximated by linear equations with good accuracy.

Matrix formulation. Linear equation Ax = b can be written with matrices:

[A]x = b,  (27)

that is,

[ A₁₁ A₁₂ … A₁ₙ ] [x₁]   [b₁]
[ A₂₁ A₂₂ … A₂ₙ ] [x₂] = [b₂]
[ ⋮         ⋮  ] [⋮ ]   [⋮ ]
[ Aₘ₁ Aₘ₂ … Aₘₙ ] [xₙ]   [bₘ],  (28)

equivalently

A₁₁x₁ + A₁₂x₂ + … + A₁ₙxₙ = b₁,
A₂₁x₁ + A₂₂x₂ + … + A₂ₙxₙ = b₂,
⋮
Aₘ₁x₁ + Aₘ₂x₂ + … + Aₘₙxₙ = bₘ.  (29)

Figure 19: Linear equation Ax = b, with solutions x at the intersection of the hypersurfaces A₁ · x = b₁ and A₂ · x = b₂

Geometric interpretation. In Problem (29), the equation

Aⱼ₁x₁ + Aⱼ₂x₂ + … + Aⱼₙxₙ = bⱼ

can be written as the hypersurface equation Aⱼ · x = bⱼ in K^n, where Aⱼ = (Aⱼₖ)_{k=1}^n = (Aⱼ₁, Aⱼ₂, …, Aⱼₙ) ∈ K^n. In other words, such a hypersurface is the subset Sⱼ ⊂ K^n, where

Sⱼ := {x ∈ K^n : Aⱼ · x = bⱼ}.

That is, the problem Ax = b has its solutions in the intersection S ⊂ K^n of the hypersurfaces S₁, S₂, …, Sₘ, i.e.

S := {x ∈ K^n : Aⱼ · x = bⱼ for all j ∈ {1, …, m}}.

So the linear problem has either 0 or 1 or infinitely many solutions:
(0) zero solutions if S is empty,
(1) one solution x if S = {x} (S has just one point x),
(∞) infinitely many solutions if S contains at least two different points x and z (and then S contains the infinite line passing through x and z).

Remark: Without harm, the linear problem Ax = b can be abbreviated by the augmented matrix [A | b], that is

[ A₁₁ A₁₂ … A₁ₙ | b₁ ]
[ A₂₁ A₂₂ … A₂ₙ | b₂ ]
[ ⋮         ⋮  | ⋮ ]
[ Aₘ₁ Aₘ₂ … Aₘₙ | bₘ ].

Figure 20: In the example here, the linear equation [A][x] = [b] has a unique solution x = (3, −1) at the intersection of the hypersurfaces (2, 5) · x = 1 and (4, 3) · x = 9 (which are lines in the plane, in this case)

Gauss eliminations

Example. [Clumsy version on the left; matrix version on the right:]

2x₁ + 5x₂ = 1,
4x₁ + 3x₂ = 9,  or  [2 5 | 1; 4 3 | 9]

(row 2 := row 2 − 2 × row 1)

2x₁ + 5x₂ = 1,
−7x₂ = 7,  or  [2 5 | 1; 0 −7 | 7]

(row 1 := row 1 + (5/7) × row 2)

2x₁ = 6,
−7x₂ = 7,  or  [2 0 | 6; 0 −7 | 7]

(divide row 1 by 2 and row 2 by −7)

x₁ = 3,
x₂ = −1,  or  [1 0 | 3; 0 1 | −1]

So the solution is x = (x₁, x₂) = (3, −1). (Check!) Geometric interpretation: this is the intersection of lines!
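The same 2 × 2 system (with the right-hand sides (1, 9) as in the figure caption) can be handed to LAPACK, which performs the elimination for us; a minimal sketch:

```python
import numpy as np

# The system 2x1 + 5x2 = 1, 4x1 + 3x2 = 9 from the example.
A = np.array([[2.0, 5.0],
              [4.0, 3.0]])
b = np.array([1.0, 9.0])

x = np.linalg.solve(A, b)   # Gauss elimination with partial pivoting inside
```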

Sometimes no solutions

Example.

2x₁ + 3x₂ + 4x₃ = 2,
4x₁ + 3x₂ + 2x₃ = 6,
6x₁ + 6x₂ + 6x₃ = 9;

(row 2 := row 2 − 2 × row 1, row 3 := row 3 − 3 × row 1)

2x₁ + 3x₂ + 4x₃ = 2,
−3x₂ − 6x₃ = 2,
−3x₂ − 6x₃ = 3;

(row 3 := row 3 − row 2)

2x₁ + 3x₂ + 4x₃ = 2,
−3x₂ − 6x₃ = 2,
0 = 1.

Equation 0 = 1 is false; more precisely, the last row in the last matrix reads

0·x₁ + 0·x₂ + 0·x₃ = 1,

which is a contradiction! So the original problem

[2 3 4; 4 3 2; 6 6 6][x₁; x₂; x₃] = [2; 6; 9]

has no solution! Geometric interpretation: the original three (hyper)planes had empty intersection! In this example, each pair of the original (hyper)planes has an infinite line intersection (check this!)

Sometimes infinitely many solutions

Example. Ethanol C₂H₅OH burns (combusts with oxygen O₂), producing carbon dioxide CO₂ and water vapor H₂O: the reaction is

(x₁) C₂H₅OH + (x₂) O₂ → (x₃) CO₂ + (x₄) H₂O,

where x₁, x₂, x₃, x₄ ∈ Z⁺. Conservation of the atoms C, H, O gives:

C: 2x₁ − x₃ = 0,
H: 6x₁ − 2x₄ = 0,
O: x₁ + 2x₂ − 2x₃ − x₄ = 0.

Eliminating: the first two equations give x₃ = 2x₁ and x₄ = 3x₁, and substituting these into the third,

2x₂ = 2x₃ + x₄ − x₁ = 4x₁ + 3x₁ − x₁ = 6x₁, so x₂ = 3x₁.

When x₁ = t ∈ R, we get

x = (x₁, x₂, x₃, x₄) = (t, 3t, 2t, 3t) ∈ R^4, where t ∈ R

(infinitely many solutions!). The smallest xⱼ ∈ Z⁺ are

x = (1, 3, 2, 3),

so that the reaction formula is

C₂H₅OH + 3 O₂ → 2 CO₂ + 3 H₂O. (Check!)

Geometric interpretation: the intersection of the original hypersurfaces in R^4 is the line {(t, 3t, 2t, 3t) ∈ R^4 : t ∈ R}.
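The solution line of this homogeneous system is the null space of the coefficient matrix, which can be extracted from an SVD: the right singular vector belonging to the zero singular value spans the null space. A sketch (rescaling so that x₁ = 1):

```python
import numpy as np

# Conservation of C, H, O atoms; unknowns (x1, x2, x3, x4).
M = np.array([[2.0, 0.0, -1.0,  0.0],    # C: 2x1 - x3 = 0
              [6.0, 0.0,  0.0, -2.0],    # H: 6x1 - 2x4 = 0
              [1.0, 2.0, -2.0, -1.0]])   # O: x1 + 2x2 - 2x3 - x4 = 0

# Singular values are sorted in decreasing order, so the last row of Vt
# spans the (one-dimensional) null space of M.
_, _, Vt = np.linalg.svd(M)
x = Vt[-1] / Vt[-1][0]    # rescale the null vector so that x1 = 1
```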

Gauss elimination process in a nutshell. What happens in the Gauss elimination? Write the linear equation Ax = b as the matrix [A | b], on which we apply row operations (here row = equation):
1. add a weighted row to another row;
2. exchange the order of two rows;
3. multiply a row by a constant λ ≠ 0.

If we get from Ax = b another linear problem Cx = d, we denote [A | b] ~ [C | d]. Our goal is a matrix [C], where
1. the next row should not have fewer initial zeros than the row above;
2. each column has at most one non-zero entry µ (and you may still normalize the row with division by µ).

Write the solution(s) in the form x = (x₁, …, xₙ) = … (or show that there are no solutions!).

Remark! When finding solution candidates x, it is best to check that really Ax = b.

Figure 21: Composing linear mappings A : K^n → K^m and B : K^m → K^p gives a linear mapping BA := B ∘ A : K^n → K^p

5 Matrix product [B][A] = [BA], matrix inversion [A]⁻¹ = [A⁻¹]

Let A, B be linear such that K^n → K^m → K^p (first A, then B). Then BA : K^n → K^p is linear, and we define the matrix product

[B][A] := [BA] ∈ K^(p×n).

Now, because

(BAu)ᵢ = Σ_{k=1}^m Bᵢₖ(Au)ₖ = Σ_{k=1}^m Bᵢₖ Σ_{j=1}^n Aₖⱼuⱼ = Σ_{j=1}^n (Σ_{k=1}^m BᵢₖAₖⱼ)uⱼ,

we get

(BA)ᵢⱼ = Σ_{k=1}^m BᵢₖAₖⱼ.  (30)

Remark. Let A, B be linear, where A : K^(n₁) → K^(m₁) and B : K^(n₂) → K^(m₂). For BA to be defined, we must have m₁ = n₂. Thus [B][A] := [BA] is defined only if the number of columns in [B] is the number of rows in [A]. Then we have the norm estimate

‖BA‖ ≤ ‖B‖‖A‖,  (31)

because ‖BAu‖ ≤ ‖B‖‖Au‖ ≤ ‖B‖‖A‖‖u‖.

Example. Let

[B] = [9 8; 7 6; 5 4] and [A] = [−1 −2; −3 −4].

Now A : R^2 → R^2 and B : R^2 → R^3, so A + B, AB and BB cannot be defined. Similarly, we cannot compute [A] + [B] = [A + B], we cannot compute [A][B] = [AB], and we cannot compute [B][B] = [BB]! But BA : R^2 → R^3 and A^2 = AA : R^2 → R^2 can be defined, and

[B][A] = [9(−1) + 8(−3)  9(−2) + 8(−4); 7(−1) + 6(−3)  7(−2) + 6(−4); 5(−1) + 4(−3)  5(−2) + 4(−4)]
= [−33 −50; −25 −38; −17 −26],

[A]^2 := [A][A] = [(−1)(−1) + (−2)(−3)  (−1)(−2) + (−2)(−4); (−3)(−1) + (−4)(−3)  (−3)(−2) + (−4)(−4)]
= [7 10; 15 22].
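NumPy enforces exactly these shape rules: `B @ A` and `A @ A` work, while `A @ B` raises a shape-mismatch error. Replaying the example:

```python
import numpy as np

B = np.array([[9.0, 8.0], [7.0, 6.0], [5.0, 4.0]])   # 3x2
A = np.array([[-1.0, -2.0], [-3.0, -4.0]])           # 2x2

BA = B @ A     # (3x2)(2x2) -> 3x2
A2 = A @ A     # (2x2)(2x2) -> 2x2
# A @ B would raise ValueError: a 2x2 matrix cannot multiply a 3x2 one.
```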

Figure 22: An injective function f : R → S does not destroy information! If x ≠ y for x, y ∈ R and f : R → S is injective, then f(x) ≠ f(y)

Properties of functions. Informally, a function f : R → S between sets R, S is a rule that on each input x ∈ R produces an output f(x) ∈ S. The f-image f(Q) ⊂ S of a subset Q ⊂ R is

f(Q) = {f(x) : x ∈ Q}.

Function f : R → S is

injective provided that f(x) = f(y) only if x = y ∈ R;
surjective if f(R) = S (that is, the f-image covers S completely);
bijective if it is both injective and surjective.

In the bijective case, f : R → S has the inverse function f⁻¹ : S → R, where

f(x) = u ⟺ x = f⁻¹(u).

What if here R, S are vector spaces and f : R → S is linear?

Figure 23: The inverse function B = A⁻¹ of a linear bijection A : K^n → K^m is automatically linear, and then also m = n

Bijective linear functions

Theorem. Let A : K^n → K^m be a linear bijection. Then A⁻¹ : K^m → K^n is linear, and m = n.

Proof. Let us denote B = A⁻¹, and let u, v ∈ K^m, λ ∈ K. Then u = A(x) and v = A(y) for some x, y ∈ K^n, and the linearity of B follows:

B(u + v) = B(A(x) + A(y)) = B(A(x + y)) = x + y = B(u) + B(v),
B(λu) = B(λA(x)) = B(A(λx)) = λx = λB(u).

Notice that the origin x = O ∈ K^n is the unique solution to A(x) = O ∈ K^m; the solution could not be unique if m < n (here A(x) = O is a system of m equations with n unknown variables x₁, …, xₙ ∈ K). Hence m ≥ n. But symmetrically n ≥ m: here u = O ∈ K^m is the unique solution to A⁻¹(u) = O ∈ K^n. QED

Inverse matrix [A]⁻¹ = [A⁻¹]. Let I = (x ↦ x) : K^n → K^n be the identity function. If A, B, C : K^n → K^n are linear and BA = I = AC, then B = C = A⁻¹, because

B = BI = B(AC) = (BA)C = IC = C.

Let A : K^n → K^n be a linear bijection; we know that then A⁻¹ : K^n → K^n is also a linear bijection. Then the inverse matrix of A is defined to be the matrix of A⁻¹, that is [A]⁻¹ := [A⁻¹], and we say that [A] is invertible. Here ‖A⁻¹‖‖A‖ ≥ 1, because

1 = ‖I‖ = ‖A⁻¹A‖ ≤ ‖A⁻¹‖‖A‖.

Finding X = A⁻¹ means solving the linear equation AX = I, which can be done by the Gauss elimination:

[A | I] ~ [I | X] = [I | A⁻¹].

Example. [I]⁻¹ = [I⁻¹] = [I], that is, the identity matrix is its own inverse; for instance [1]⁻¹ = [1] and [1 0; 0 1]⁻¹ = [1 0; 0 1].

Example. If A = [2 1; 5 3] and B = [3 −1; −5 2] then B = A⁻¹:

BA = [3(2) − 1(5)  3(1) − 1(3); −5(2) + 2(5)  −5(1) + 2(3)] = [1 0; 0 1] = I,
AB = [2(3) + 1(−5)  2(−1) + 1(2); 5(3) + 3(−5)  5(−1) + 3(2)] = [1 0; 0 1] = I.

Example. Let [A] = [2 1; 5 3]. Then

[A | I] = [2 1 | 1 0; 5 3 | 0 1]
~ [2 1 | 1 0; 0 1 | −5 2]  (row 2 := 2 × row 2 − 5 × row 1)
~ [2 0 | 6 −2; 0 1 | −5 2]  (row 1 := row 1 − row 2)
~ [1 0 | 3 −1; 0 1 | −5 2] = [I | X]  (row 1 := row 1 / 2).

Thus [A]⁻¹ = [X] = [3 −1; −5 2].

Remark! It is good to check that [X] really is the inverse matrix of [A]:

[A][X] = [2 1; 5 3][3 −1; −5 2] = [1 0; 0 1] = [I],
[X][A] = [3 −1; −5 2][2 1; 5 3] = [1 0; 0 1] = [I].

(It is enough to check only [A][X] or [X][A]: why?)

Example. Let us try to find the inverse of a matrix [A] ∈ R^(3×3) which is not invertible. Then the Gauss elimination on [A | I] eventually produces rows whose left-hand sides are entirely zero, for instance rows reading

0 = 1, 0 = 2, 0 = 1,

which is clearly false! The reason is that such a matrix [A] ∈ R^(3×3) is not invertible, i.e. the linear function A : R^3 → R^3 is not bijective!
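The [A | I] ~ [I | A⁻¹] elimination can be coded directly. A minimal sketch of Gauss–Jordan inversion (with partial pivoting added for numerical safety, which the hand computation above did not need), reproducing the 2 × 2 example:

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert A by row operations on the augmented matrix [A | I]."""
    n = len(A)
    M = np.hstack([np.array(A, dtype=float), np.eye(n)])
    for j in range(n):
        p = j + np.argmax(np.abs(M[j:, j]))   # partial pivoting: pick pivot row
        M[[j, p]] = M[[p, j]]
        if abs(M[j, j]) < 1e-12:
            raise ValueError("matrix is not invertible")
        M[j] /= M[j, j]                       # normalize the pivot row
        for i in range(n):
            if i != j:
                M[i] -= M[i, j] * M[j]        # clear the rest of column j
    return M[:, n:]                           # right half is now A^{-1}

X = gauss_jordan_inverse([[2.0, 1.0], [5.0, 3.0]])
```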

Figure 24: Let A : R^n → R^n be linear. Let Q = [0, 1]^n ⊂ R^n be the unit cube. Then |det[A]| ≥ 0 is the volume of the polyhedron A(Q) ⊂ R^n, and det[A] < 0 would mean flipping the orientation

6 Determinant

The determinant det[A] ∈ K of a matrix [A] ∈ K^(n×n) describes geometric information about the linear function A : K^n → K^n. If a subset S ⊂ K^n has n-dimensional volume 1, then A(S) ⊂ K^n has n-dimensional volume |det[A]|. When K = R: if det[A] < 0 then A(S) is a mirror image of S; if det[A] > 0 then A(S) and S have the same orientation.

We shall need four laws (L1, L2, L3, L4) to find determinants:

(L1) det[I] = 1 for the identity matrix [I] ∈ K^(n×n).
(L2) If one column is multiplied by λ ∈ K, then det is multiplied by λ.
(L3) If the matrix has two identical columns, then the determinant is 0.
(L4) Let Xⱼ be the jth column of X = [X₁ … Xₙ] ∈ K^(n×n). If Cₖ = Aₖ + Bₖ and Cⱼ = Aⱼ = Bⱼ whenever j ≠ k, then det[C] = det[A] + det[B].

Example. For [A] = [2 0 0; 0 3 0; 0 0 5], laws (L1, L2) yield

det[A] = 2·3·5·det[1 0 0; 0 1 0; 0 0 1] = 2·3·5 = 30  (by (L1)).

(The matrix [A] here stretches the standard basis by the respective factors 2, 3, 5, thus stretching the 3-dimensional volumes by the factor 2·3·5 = 30.)

Figure 25: Matrix [A] = [a b; c d] ∈ R^(2×2) maps the unit cube (the usual unit square) Q = [0, 1]^2 = {x = (x₁, x₂) ∈ R^2 : x₁, x₂ ∈ [0, 1]} to the parallelogram A(Q) ⊂ R^2 with vertices O, A₁ = [a; c], A₂ = [b; d], A₁ + A₂ = [a + b; c + d], which has the 2-dimensional volume (the usual area) |det[A]| ≥ 0

Law (L5) (Corollary to (L3, L4)). The sign of det changes by interchanging two columns!

Proof: It is enough to check the 2-dimensional case:

0 = det[a+b a+b; c+d c+d]  (by (L3))
= det[a a+b; c c+d] + det[b a+b; d c+d]  (by (L4))
= det[a a; c c] + det[a b; c d] + det[b a; d c] + det[b b; d d]  (by (L4))
= det[a b; c d] + det[b a; d c]  (by (L3)). QED

The general computation for the 2-dimensional case is

det[a b; c d]
= det[a b; 0 d] + det[0 b; c d]  (by (L4))
= det[a 0; 0 d] + det[a b; 0 0] + det[0 0; c d] + det[0 b; c 0]  (by (L4))
= ad·det[1 0; 0 1] + ab·det[1 1; 0 0] + cd·det[0 0; 1 1] + bc·det[0 1; 1 0]  (by (L2))
= (ad − bc)·det[1 0; 0 1]  (by (L3, L5))
= ad − bc  (by (L1)).

More generally, for any [A] ∈ K^(n×n),

det[A] = Σ_{[P] permutation} a_P det[P],  (32)

where the permutation matrices [P] ∈ R^(n×n) have elements Pⱼₖ ∈ {0, 1} with exactly one 1 in each row and in each column, and a_P ∈ K is the product of those Aⱼₖ for which Pⱼₖ = 1. For dimension n, there are exactly n! permutation matrices. For instance, for n = 2 we have n! = 2 permutations, and

det[a b; c d] = det[a 0; 0 d] + det[0 b; c 0]
= ad·det[1 0; 0 1] + bc·det[0 1; 1 0] = ad − bc.

Example. For n = 3, there are n! = 6 permutation matrices [P] ∈ R^(3×3), which are

[1 0 0; 0 1 0; 0 0 1], [1 0 0; 0 0 1; 0 1 0], [0 1 0; 1 0 0; 0 0 1],
[0 1 0; 0 0 1; 1 0 0], [0 0 1; 1 0 0; 0 1 0], [0 0 1; 0 1 0; 1 0 0].

Find the determinants of these permutation matrices! Notice that

det[a b c; d e f; g h i]
= det[a 0 0; 0 e 0; 0 0 i] + det[a 0 0; 0 0 f; 0 h 0] + det[0 b 0; 0 0 f; g 0 0]
+ det[0 b 0; d 0 0; 0 0 i] + det[0 0 c; d 0 0; 0 h 0] + det[0 0 c; 0 e 0; g 0 0]
= +aei − afh + bfg − bdi + cdh − ceg.

Remark! By (32), Laws (L1, L2, L3, L4, L5) remain valid when the word "column" is changed to the word "row". Thus if we obtain [B] ∈ K^(n×n) from [A] ∈ K^(n×n) by the Gauss elimination (not multiplying single rows), then

det[A] = (−1)^k det[B],

where k is the number of the row-interchange operations.
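The permutation expansion (32) can be implemented literally: iterate over all n! permutations, take the product a_P, and weigh it by det[P], which equals the parity of the permutation (computable from its inversion count). The 3 × 3 test matrix below is arbitrary; this is only practical for tiny n, since the work grows like n·n!:

```python
import itertools
import math
import numpy as np

def det_by_permutations(A):
    """Determinant via the permutation expansion (32); O(n * n!) work."""
    n = len(A)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        # det[P] = (-1)^(number of inversions of the permutation)
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        a_P = math.prod(A[j][perm[j]] for j in range(n))   # product over P_jk = 1
        total += (-1) ** inversions * a_P
    return total

A = [[2.0, 0.0, 1.0], [1.0, 3.0, 2.0], [0.0, 1.0, 4.0]]
d = det_by_permutations(A)
```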

Remark! Clearly, A ∈ K^(n×n) is invertible if and only if it can be transformed to the identity matrix I ∈ K^(n×n) by the Gauss row operations:

[A | I] ~ (Gauss) ~ [I | A⁻¹].

On the other hand, invertibility is equivalent to det[A] ≠ 0, since the determinant can be calculated by the Gauss row operations.

Sub-determinants. For finding determinants, the Gauss row operations are most useful if we know the exact numerical values of the matrix elements. For more general determinant formulas, application of (32) might be better. Related to this, we may go from dimension n to dimension n − 1 by sub-determinants: for [A] ∈ K^(n×n), let

[B^(j,k)] = [omit row j and column k from A] ∈ K^((n−1)×(n−1)).

Then

det[A] = Σ_{j=1}^n (−1)^(j+k) Aⱼₖ det[B^(j,k)] = Σ_{k=1}^n (−1)^(j+k) Aⱼₖ det[B^(j,k)].

Here the signs have the pattern

[(−1)^(j+k)]_{j,k=1}^n =
[ + − + … ]
[ − + − … ]
[ + − + … ]
[ ⋮       ].

Example. Let us first develop the sub-determinants with respect to the first row (a, b, c), and then with respect to the second column (b, e, h):

det[a b c; d e f; g h i]
= +a·det[e f; h i] − b·det[d f; g i] + c·det[d e; g h]
= −b·det[d f; g i] + e·det[a c; g i] − h·det[a c; d f].

Row operations as matrix products. Any simple Gauss row operation B ↦ EB on a matrix B ∈ K^(n×n) can be written as an elementary matrix product EB ∈ K^(n×n): for instance,

[1 0 0; 0 0 1; 0 1 0][a b c; d e f; g h i] = [a b c; g h i; d e f]

corresponds to interchanging rows (2nd and 3rd),

[1 0 0; 0 λ 0; 0 0 1][a b c; d e f; g h i] = [a b c; λd λe λf; g h i]

corresponds to multiplying a row (2nd) by the constant λ ∈ K, and

[1 0 0; 0 1 λ; 0 0 1][a b c; d e f; g h i] = [a b c; d+λg e+λh f+λi; g h i]

corresponds to adding a λ-weighted row (3rd) to another row (2nd).

Theorem. The determinant is multiplicative: for all A, B ∈ K^(n×n),

det[AB] = det[A] det[B].  (33)

Proof. For non-invertible A this is trivial. An invertible A is of the form

A = E₁E₂ ⋯ Eₖ,  (34)

where each Eⱼ ∈ K^(n×n) corresponds to a Gauss row operation as above. Here

det[EⱼM] = det[Eⱼ] det[M]  (35)

for all M ∈ K^(n×n). Thereby

det[AB] = det[E₁E₂ ⋯ EₖB]  (by (34))
= det[E₁] det[E₂ ⋯ EₖB]  (by (35))
= det[E₁] det[E₂] ⋯ det[Eₖ] det[B]  (by (35), iterated)
= det[E₁E₂ ⋯ Eₖ] det[B]  (by (35))
= det[A] det[B]  (by (34)). QED

Corollary: If A ∈ K^(n×n) is invertible then

det[A⁻¹] = 1/det[A],  (36)

because 1 = det[I] = det[AA⁻¹] = det[A] det[A⁻¹].

Figure 26: Eigenvectors x, z ∈ C^n on the same eigenline Cx of A ∈ C^(n×n)

7 Eigenvalues and eigenvectors

Linear A : C^n → C^n (and the corresponding matrix [A] ∈ C^(n×n)) has eigenvector x ∈ C^n (or [x] ∈ C^(n×1)) and eigenvalue λ ∈ C if

Ax = λx,

where the eigenvector x ≠ O (of course, trivially AO = O = λO, where O ∈ C^n is the origin).

Remark. Notice that if t ∈ C then here A(tx) = tA(x) = λ(tx): thus it would be appropriate to talk about the eigenline Cx = {tx ∈ C^n : t ∈ C}, where tx ∈ C^n is another eigenvector whenever t ≠ 0.

How to find all the eigenvalues? Notice that

Ax = λx ⟺ (A − λI)x = O  (A − λI linear, x ≠ O)
⟺ A − λI is not invertible,

which means det[A − λI] = 0, so you may find the eigenvalues λ ∈ C by solving this so-called characteristic equation.

How to find the eigenvectors? Let λ ∈ C be an eigenvalue of a square matrix A ∈ C^(n×n). The corresponding eigenvectors x ∈ C^n satisfy Ax = λx (equivalently (A − λI)x = O), and we find them by the Gauss elimination on [A − λI | O]. Just remember that the origin O ∈ C^n is never an eigenvector.

Eigenvalues as roots of the characteristic polynomial

Definition. The characteristic polynomial of [A] ∈ C^(n×n) is p_A : C → C, where

p_A(z) := det[A − zI].

By the Fundamental Theorem of Algebra [Gauss], polynomials split uniquely in C into products of first-order terms; thus

p_A(z) = (−1)^n (z − λ₁) ⋯ (z − λₙ),

where λ₁, …, λₙ ∈ C are the eigenvalues of A. The algebraic multiplicity d = m_a(λ) of an eigenvalue λ ∈ C is the degree d of the factor (z − λ)^d in p_A. The geometric multiplicity m_g(λ) of an eigenvalue λ ∈ C is the dimension of the vector subspace spanned by the corresponding eigenvectors. Notice that always

1 ≤ m_g(λ) ≤ m_a(λ) ≤ n.

Actually,

p_A(z) = (−1)^n (z^n − tr[A]z^(n−1) + … + (−1)^n det[A]),

where tr[A] ∈ C is the trace of [A] ∈ C^(n×n), satisfying (typically here Aₖₖ ≠ λₖ)

tr[A] := Σ_{k=1}^n Aₖₖ = Σ_{k=1}^n λₖ.  (37)
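The trace identity (37) and the constant term det[A] = λ₁ ⋯ λₙ of the characteristic polynomial can be checked numerically; the 2 × 2 test matrix below is arbitrary (its eigenvalues happen to be 2 and 5):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

lam = np.linalg.eigvals(A)     # roots of the characteristic polynomial
trace = np.trace(A)            # 4 + 3 = 7, should equal lam.sum()
det = np.linalg.det(A)         # 12 - 2 = 10, should equal lam.prod()
```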

41
Figure 27: |λ| ≤ ‖A‖ if λ ∈ C is an eigenvalue of A ∈ C^{n×n}.

Eigenvalues and the matrix norm

Recall that the norm of a matrix A ∈ C^{n×n} is

‖A‖ := max_{u ∈ C^n : ‖u‖ ≤ 1} ‖Au‖ = max_{u,v ∈ C^n : ‖u‖,‖v‖ ≤ 1} |⟨Au, v⟩|.

If |λ| > ‖A‖ then λ ∈ C cannot be an eigenvalue of A. This follows from

(λI − A) Σ_{k=0}^n (A/λ)^k = (λI − A) ( I + (A/λ)^1 + (A/λ)^2 + ⋯ + (A/λ)^n ) = λI − λ(A/λ)^{n+1} → λI as n → ∞,

where the limit exists, because

‖(A/λ)^{n+1}‖ = ‖A^{n+1}‖ / |λ|^{n+1} ≤ ( ‖A‖ / |λ| )^{n+1} → 0 as n → ∞,

since here ‖A‖ / |λ| < 1. Hence λI − A ∈ C^{n×n} is invertible, and

(λI − A)^{-1} = λ^{-1} Σ_{k=0}^∞ (A/λ)^k.

Moreover, by applying the geometric series, we obtain the norm estimate

‖(λI − A)^{-1}‖ ≤ |λ|^{-1} Σ_{k=0}^∞ ‖A‖^k / |λ|^k = 1 / ( |λ| − ‖A‖ ).
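The Neumann-series formula for (λI − A)^{-1} and the norm estimate can both be tested numerically. A minimal sketch (not part of the original notes; the matrix and λ are arbitrary choices with |λ| > ‖A‖, and NumPy is assumed):

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.5, 0.0]])
lam = 3.0
norm_A = np.linalg.norm(A, 2)            # operator norm = largest singular value
assert abs(lam) > norm_A                 # so lam cannot be an eigenvalue

# Partial sums of the series lam^{-1} * sum_k (A/lam)^k
S, term = np.zeros((2, 2)), np.eye(2)
for _ in range(100):
    S += term
    term = term @ (A / lam)
approx_inv = S / lam

exact_inv = np.linalg.inv(lam * np.eye(2) - A)
assert np.allclose(approx_inv, exact_inv)

# Norm estimate ||(lam I - A)^{-1}|| <= 1 / (|lam| - ||A||)
assert np.linalg.norm(exact_inv, 2) <= 1.0 / (abs(lam) - norm_A) + 1e-12
print(np.linalg.norm(exact_inv, 2), 1.0 / (abs(lam) - norm_A))
```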

42
Example. If [A] = [a 0 0; 0 b 0; 0 0 c] ∈ C^{3×3} then

p_A(z) = det[A − zI] = det[a−z 0 0; 0 b−z 0; 0 0 c−z] = (−1)^3 (z − a)(z − b)(z − c),

giving eigenvalues a, b, c ∈ C. Moreover,

p_A(z) = (−1)^3 ( z^3 − (a + b + c) z^2 + (ab + ac + bc) z − abc ),

where

tr[A] = a + b + c ∈ C, det[A] = abc ∈ C.

Example. If [A] = [a b; c d] ∈ C^{2×2} then

p_A(z) = det[A − zI] = det[a−z b; c d−z] = (a − z)(d − z) − bc = (−1)^2 ( z^2 − (a + d) z + (ad − bc) ) = (−1)^2 (z − λ_1)(z − λ_2),

where the eigenvalues λ_1, λ_2 ∈ C can be found easily. Notice that here

tr[A] = a + d ∈ C, det[A] = ad − bc ∈ C.

43
Example. Let R : R^2 → R^2, for which R(x) = (x_2, x_1), that is

[R] = [0 1; 1 0].

Here det[R] = −1. R reflects the space with respect to the hyperplane (line) {x ∈ R^2 : x_1 = x_2}. Now

p_R(z) = det[R − zI] = det[−z 1; 1 −z] = z^2 − 1 = (z − 1)(z + 1),

giving eigenvalues λ = ±1. Corresponding eigenvectors x? R(x) = λx? We solve (R − λI)(x) = O by Gauss elimination:

[R − λI | O] = [−λ 1 0; 1 −λ 0] → [1 −λ 0; 0 1−λ^2 0], where λ = ±1 gives [1 ∓1 0; 0 0 0].

Eigenvectors for λ = +1: x = (t, t) ∈ C^2 for 0 ≠ t ∈ C. Eigenvectors for λ = −1: x = (t, −t) ∈ C^2 for 0 ≠ t ∈ C.

Example. Let Q : R^2 → R^2, for which Q(x) = (−x_2, x_1), that is

[Q] = [0 −1; 1 0].

Here det[Q] = +1. Q rotates the space around the origin. Now

p_Q(z) = det[Q − zI] = det[−z −1; 1 −z] = z^2 + 1 = (z − i)(z + i),

giving eigenvalues λ = ±i. Corresponding eigenvectors x? Q(x) = λx? We solve (Q − λI)(x) = O by Gauss elimination:

[Q − λI | O] = [−λ −1 0; 1 −λ 0] → [1 −λ 0; 0 1+λ^2 0], where λ = ±i gives [1 ∓i 0; 0 0 0].

Eigenvectors for λ = +i: x = (t, −it) ∈ C^2 for 0 ≠ t ∈ C. Eigenvectors for λ = −i: x = (t, it) ∈ C^2 for 0 ≠ t ∈ C.
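Both examples can be replayed numerically; note how the real matrix Q has purely imaginary eigenvalues. A minimal check (not part of the original notes; assumes NumPy):

```python
import numpy as np

R = np.array([[0., 1.], [1., 0.]])    # reflection across the line x1 = x2
Q = np.array([[0., -1.], [1., 0.]])   # rotation by pi/2 around the origin

lams_R = np.linalg.eigvals(R)
lams_Q = np.linalg.eigvals(Q)

assert np.allclose(sorted(lams_R.real), [-1.0, 1.0])    # lam = +-1 for R
assert np.allclose(sorted(lams_Q.imag), [-1.0, 1.0])    # lam = +-i for Q
assert np.allclose(lams_Q.real, 0.0)

# The eigenline of lam = +1 for R: vectors (t, t) are fixed by R
x = np.array([2.0, 2.0])
assert np.allclose(R @ x, x)
print(lams_Q)
```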

44
Figure 28: Action of a diagonal matrix Λ ∈ C^{n×n} with Λ_kk = λ_k.

8 Diagonalization

A matrix Λ = [Λ_jk]_{j,k=1}^n ∈ C^{n×n} is diagonal if Λ_jk = 0 whenever j ≠ k. That is, with λ_k := Λ_kk ∈ C we have

Λ = [λ_1 0 ⋯ 0; 0 λ_2 ⋯ 0; ⋮; 0 0 ⋯ λ_n].

Let I_1, I_2, …, I_n ∈ R^{n×1} be the standard basis vectors. Here obviously Λ(I_k) = λ_k I_k, meaning that I_k is an eigenvector corresponding to the eigenvalue λ_k ∈ C.

Diagonal matrices are easy to sum and to multiply, e.g. here

Λ^999 = [(λ_1)^999 0 ⋯ 0; 0 (λ_2)^999 ⋯ 0; ⋮; 0 0 ⋯ (λ_n)^999].

Also, here Λ is invertible if and only if λ_1 λ_2 λ_3 ⋯ λ_n ≠ 0, and then

Λ^{-1} = [(λ_1)^{-1} 0 ⋯ 0; 0 (λ_2)^{-1} ⋯ 0; ⋮; 0 0 ⋯ (λ_n)^{-1}].

All this is to say: diagonal matrices are really easy to handle. It turns out that many other matrices can be diagonalized, making them easy to treat when seen from a suitable viewpoint.

45
Figure 29: Diagonalization Λ = S^{-1} A S. In other words, A = S Λ S^{-1}.

Diagonalization

A matrix A ∈ C^{n×n} can be diagonalized if there is an invertible S ∈ C^{n×n} for which Λ = S^{-1} A S ∈ C^{n×n} is diagonal, i.e. Λ_jk = 0 whenever j ≠ k: then A = S Λ S^{-1}, and (writing λ_k := Λ_kk ∈ C) we have

Λ = [λ_1 0 ⋯ 0; 0 λ_2 ⋯ 0; ⋮; 0 0 ⋯ λ_n],

where λ_1, λ_2, …, λ_n ∈ C are the eigenvalues of A. The kth column S_k of S = [S_1 ⋯ S_n] is an eigenvector of A corresponding to the eigenvalue λ_k ∈ C:

(AS)_jk = (S Λ S^{-1} S)_jk = (S Λ)_jk = Σ_{l=1}^n S_jl Λ_lk = λ_k S_jk.

We may say that a diagonalizable matrix A = S Λ S^{-1} looks like a diagonal matrix Λ after changing the coordinate point of view by S.
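The factorization A = S Λ S^{-1} can be built directly from the eigenvalue decomposition. A minimal sketch (not part of the original notes; the matrix is an arbitrary diagonalizable example, NumPy assumed):

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])

# numpy returns eigenvalues and an eigenvector matrix S with A S = S Lam
lams, S = np.linalg.eig(A)
Lam = np.diag(lams)

# S is invertible here, so A = S Lam S^{-1} is a diagonalization
assert abs(np.linalg.det(S)) > 1e-12
assert np.allclose(A, S @ Lam @ np.linalg.inv(S))

# Each column S_k is an eigenvector for lam_k: (A S)_{jk} = lam_k S_{jk}
assert np.allclose(A @ S, S @ Lam)
print(lams)
```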

46
Remark: Such a diagonalization A = S Λ S^{-1}, if it exists, may not be unique, as it changes if we choose different eigenvectors, or if we permute the order of the eigenvalues (and the corresponding order of the eigenvectors). For instance,

[a b; c d] [λ_1 0; 0 λ_2] [a b; c d]^{-1} = [t_2 b t_1 a; t_2 d t_1 c] [λ_2 0; 0 λ_1] [t_2 b t_1 a; t_2 d t_1 c]^{-1}

whenever ad − bc ≠ 0 ≠ t_1 t_2. Here the eigenvectors

t_1 [a; c], t_2 [b; d]

respectively correspond to the eigenvalues λ_1, λ_2 ∈ C. Nevertheless, we have:

Theorem. A matrix A ∈ C^{n×n} can be diagonalized if and only if m_g(λ) = m_a(λ) for all the eigenvalues λ of A.

Remark: For a matrix A ∈ C^{n×n}, remember that 1 ≤ m_g(λ) ≤ m_a(λ) ≤ n always. A matrix A ∈ C^{n×n} cannot be diagonalized if m_g(λ) < m_a(λ) for just one eigenvalue λ of A, as then we cannot find enough eigenvectors to build the invertible eigenvector matrix S! In the last two examples in the previous chapter (Eigenvalues and eigenvectors), the eigenvalues of the diagonalizable matrices R, Q : C^2 → C^2 all have multiplicity 1.

Example. Let [A] = [1 b; 0 1], where b ≠ 0 (so [A] is not diagonal). Now

p_A(z) = det[A − zI] = det[1−z b; 0 1−z] = (1 − z)^2,

giving algebraic multiplicity m_a(λ) = 2 for the eigenvalue λ = 1. As b ≠ 0, the equation (A − λI)(x) = O has only the solutions x = (t, 0) for constants t ∈ C. So the geometric multiplicity m_g(λ) = 1 < m_a(λ). This is an example of a non-diagonalizable matrix! In other words, here S^{-1} A S is not diagonal no matter which invertible S we choose.
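The multiplicities of the non-diagonalizable example can be computed numerically: the geometric multiplicity is the dimension of the null space of A − λI, i.e. n minus its rank. A minimal check (not part of the original notes; b = 1 chosen as an example, NumPy assumed):

```python
import numpy as np

b = 1.0
A = np.array([[1., b],
              [0., 1.]])   # the non-diagonal example with b != 0
lam = 1.0

# Algebraic multiplicity: lam = 1 is a double root of (1 - z)^2
lams = np.linalg.eigvals(A)
assert np.allclose(lams, [1.0, 1.0])

# Geometric multiplicity: dim of the eigenspace = n - rank(A - lam I)
m_g = A.shape[0] - np.linalg.matrix_rank(A - lam * np.eye(2))
print(m_g)   # 1 < m_a = 2, so A cannot be diagonalized
```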

47
Figure 30: Lower triangular L ∈ C^{n×n} (entries L_jk = 0 for j < k), upper triangular U ∈ C^{n×n} (entries U_jk = 0 for j > k).

Example. Let λ ∈ C be the only eigenvalue of a diagonalizable A ∈ C^{n×n}. Thus

A = S (λI) S^{-1} = λ S S^{-1} = λI.

That is, A = constant times identity.

Example. A matrix M ∈ C^{n×n} is triangular if it is lower triangular (M_jk = 0 whenever j < k) or upper triangular (M_jk = 0 whenever j > k). If a triangular A has zero diagonal, then λ = 0 is its only eigenvalue; by the previous example, such an A can be diagonalized only if A = O.

Example. AS = SΛ, if

A = [a b; 0 0], S = [1 b; 0 −a], Λ = [a 0; 0 0].

Thus if a ≠ 0 then we have A = S Λ S^{-1} (if a = 0 then S is not invertible).

Remark. On page 57 we learn that a matrix is normal (see page 50) if and only if it has a unitary diagonalization. In the previous example above, A = S Λ S^{-1} is a normal matrix if and only if b = 0: thus some non-normal matrices can be diagonalized (yet not unitarily diagonalized).

48 Functions of square matrices

An analytic function f : C → C can be presented as a power series

f(z) = Σ_{k=0}^∞ c_k z^k. (38)

E.g. the functions exp, cos, sin are analytic. Define f : C^{n×n} → C^{n×n} by

f(A) = Σ_{k=0}^∞ c_k A^k. (39)

Here f(A)_ij ≠ f(A_ij) often! But with a diagonalization A = S Λ S^{-1},

f(A) = Σ_{k=0}^∞ c_k (S Λ S^{-1})^k = S ( Σ_{k=0}^∞ c_k Λ^k ) S^{-1} = S f(Λ) S^{-1},

where f(Λ) ∈ C^{n×n} is diagonal with f(Λ)_jj = f(Λ_jj) ∈ C. Nice!

Example. Let A = S Λ S^{-1}, where

Λ = [a 0 0; 0 b 0; 0 0 c].

Then

exp(A) = S exp(Λ) S^{-1}, A^{1/2} = S Λ^{1/2} S^{-1},

where

exp(Λ) = [e^a 0 0; 0 e^b 0; 0 0 e^c], Λ^{1/2} = [√a 0 0; 0 √b 0; 0 0 √c].

So, what would be √A then?

Application to differential equations: If A ∈ C^{n×n} and smooth functions x_1, …, x_n : R → C are such that x′(t) = A x(t) (where (x′(t))_j = (x_j)′(t) = d/dt x_j(t), naturally), then x(t) = exp(tA) x(0).
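Both f(A) = S f(Λ) S^{-1} and the ODE application can be checked numerically. A minimal sketch (not part of the original notes; the matrix and initial value are arbitrary examples, NumPy assumed):

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])          # diagonalizable, eigenvalues 5 and 2
lams, S = np.linalg.eig(A)
S_inv = np.linalg.inv(S)

# f(A) = S f(Lam) S^{-1}, with f applied entrywise on the diagonal
exp_A = S @ np.diag(np.exp(lams)) @ S_inv
sqrt_A = S @ np.diag(np.sqrt(lams)) @ S_inv

# sqrt_A really is a square root of A
assert np.allclose(sqrt_A @ sqrt_A, A)

# Solution of x'(t) = A x(t) is x(t) = exp(tA) x(0); verify the ODE
# at one point with a central finite difference
t, h, x0 = 0.5, 1e-6, np.array([1.0, -1.0])
x = lambda s: (S @ np.diag(np.exp(s * lams)) @ S_inv) @ x0
derivative = (x(t + h) - x(t - h)) / (2 * h)
assert np.allclose(derivative, A @ x(t), atol=1e-4)
print(exp_A)
```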

49
Figure 31: Adjoint A∗ : K^m → K^n is the mirror image of linear A : K^n → K^m in the sense that ⟨u, A∗w⟩ = ⟨Au, w⟩ for all u ∈ K^n and w ∈ K^m.

9 Adjoint matrices

For a linear mapping A : K^n → K^m, the adjoint is the linear mapping A∗ : K^m → K^n defined by the inner product duality

⟨Au, w⟩ =: ⟨u, A∗w⟩ (40)

for all u ∈ K^n and w ∈ K^m. Why does this definition work? Indeed,

⟨Au, w⟩ = Σ_{j=1}^m (Au)_j w̄_j = Σ_{j=1}^m Σ_{k=1}^n A_jk u_k w̄_j = Σ_{k=1}^n u_k ( Σ_{j=1}^m A_jk w̄_j ),

which shows that the adjoint A∗ has the matrix elements (A∗)_kj = Ā_jk ∈ K. Clearly (A∗)∗ = A. There is the norm equality ‖A∗‖ = ‖A‖, because

‖A‖ = max_{u ∈ C^n, v ∈ C^m : ‖u‖,‖v‖ ≤ 1} |⟨Au, v⟩| = max_{u ∈ C^n, v ∈ C^m : ‖u‖,‖v‖ ≤ 1} |⟨u, A∗v⟩| = ‖A∗‖.

Example. Let A = [1+2i 3+4i 5+6i; 7+8i 9+10i 11+12i] ∈ C^{2×3}. Its adjoint matrix is

A∗ = [1−2i 7−8i; 3−4i 9−10i; 5−6i 11−12i] ∈ C^{3×2}.
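The duality (40) can be tested on the example matrix with random vectors; in NumPy the adjoint is simply the conjugate transpose. A minimal sketch (not part of the original notes; assumes NumPy):

```python
import numpy as np

def inner(u, v):
    # Complex inner product <u, v> = sum_k u_k * conj(v_k)
    return np.sum(u * np.conj(v))

A = np.array([[1 + 2j, 3 + 4j, 5 + 6j],
              [7 + 8j, 9 + 10j, 11 + 12j]])   # the example matrix above
A_star = A.conj().T                            # adjoint = conjugate transpose

rng = np.random.default_rng(1)
u = rng.standard_normal(3) + 1j * rng.standard_normal(3)   # u in C^3
w = rng.standard_normal(2) + 1j * rng.standard_normal(2)   # w in C^2

# The inner product duality <Au, w> = <u, A* w>
assert np.isclose(inner(A @ u, w), inner(u, A_star @ w))

# (A*)* = A and the norm equality ||A*|| = ||A||
assert np.allclose(A_star.conj().T, A)
assert np.isclose(np.linalg.norm(A, 2), np.linalg.norm(A_star, 2))
print(A_star)
```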

50 Important classes of matrices

Definition. Let us introduce the following classes of square matrices:

A matrix P ∈ C^{n×n} is positive if ⟨Px, x⟩ ≥ 0 for all x ∈ C^n.
A matrix S ∈ C^{n×n} is symmetric (or self-adjoint) if S∗ = S.
A matrix N ∈ C^{n×n} is normal if N∗N = NN∗.
A matrix U ∈ C^{n×n} is unitary if U∗ = U^{-1}: then ⟨U(x), U(y)⟩ = ⟨x, y⟩ for all x, y ∈ C^n; such a U preserves distances and angles in C^n!

Remark. These classes of matrices have some connections: Positive ⊂ Symmetric, and Symmetric ⊂ Normal, but Normal ⊄ Symmetric, and Symmetric ⊄ Positive. Also, Unitary ⊂ Normal, but Normal ⊄ Unitary. The term positive is often more accurately positive semi-definite.

Exercise. Let us study a diagonal matrix Λ = [λ_1 0 ⋯ 0; 0 λ_2 ⋯ 0; ⋮; 0 0 ⋯ λ_n] ∈ C^{n×n}. For which λ_k ∈ C do the following properties hold? (a) Λ is positive. (b) Λ is symmetric. (c) Λ is normal. (d) Λ is unitary.

Remark. Notice that U∗U = I means that the columns U_k ∈ C^{n×1} of a unitary U = [U_1 ⋯ U_n] ∈ C^{n×n} are mutually orthonormal: ⟨U_k, U_k⟩ = ‖U_k‖^2 = 1, and ⟨U_j, U_k⟩ = 0 for j ≠ k. Moreover, if A ∈ C^{n×n} is positive/symmetric/normal/unitary then also U∗AU has the same property.

Remark. Invertibility of S = [S_1 ⋯ S_n] ∈ C^{n×n} means that the vectors S_1, …, S_n are linearly independent; out of these vectors Gram–Schmidt gives orthonormal U_1, …, U_n, and then U = [U_1 ⋯ U_n] ∈ C^{n×n} is unitary.

Definition. For A ∈ R^{m×n}, the adjoint is the transpose A^T = A∗. A real unitary matrix A ∈ R^{n×n} is called orthogonal: this means A^T A = I = A A^T, in other words A^T = A^{-1} here.

51 Eigenvalues of symmetric / positive matrices

Claim: A symmetric matrix has real eigenvalues.

Proof. Let λ ∈ C be an eigenvalue of a symmetric matrix S = S∗ ∈ C^{n×n} with eigenvector x ∈ C^{n×1}. Then, using Sx = λx and S∗ = S,

λ ⟨x, x⟩ = ⟨λx, x⟩ = ⟨Sx, x⟩ = ⟨x, S∗x⟩ = ⟨x, Sx⟩ = ⟨x, λx⟩ = λ̄ ⟨x, x⟩.

Thus λ = λ̄ (i.e. λ ∈ R), because ⟨x, x⟩ = ‖x‖^2 ≠ 0. QED

Claim: A positive P ∈ C^{n×n} is symmetric, having eigenvalues ≥ 0.

Proof. By the Weak Formulation Theorem we have P∗ = P, as

⟨P∗x, x⟩ = ⟨x, Px⟩ = ⟨Px, x⟩ (as ⟨Px, x⟩ ≥ 0 is real)

for any x ∈ C^n. Let Pu = λu, where u ≠ O is an eigenvector with eigenvalue λ ∈ C. Then

0 ≤ ⟨Pu, u⟩ = ⟨λu, u⟩ = λ ⟨u, u⟩ = λ ‖u‖^2,

so that 0 ≤ λ. QED
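Both claims can be illustrated with randomly generated matrices: B + B∗ is always self-adjoint, and B∗B is always positive. A minimal sketch (not part of the original notes; assumes NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

# S = B + B* is self-adjoint: its eigenvalues are real
S = B + B.conj().T
assert np.allclose(np.linalg.eigvals(S).imag, 0.0, atol=1e-10)

# P = B* B is positive: <Px, x> = ||Bx||^2 >= 0, so its eigenvalues are >= 0
P = B.conj().T @ B
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
quad = np.vdot(x, P @ x)      # np.vdot conjugates its first argument
assert quad.real >= 0 and abs(quad.imag) < 1e-10
assert np.all(np.linalg.eigvalsh(P) >= -1e-10)
print(np.linalg.eigvalsh(P))
```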

52
Figure 32: Orthogonal projection P ∈ K^{n×n} onto a vector subspace Z ⊂ K^n. Then the linear mapping Q = I − P is the orthogonal projection onto the orthogonal vector subspace Z⊥ = {u ∈ K^n : ⟨u, z⟩ = 0 for all z ∈ Z}.

Nice to know: orthogonal projections

For a vector subspace Z ⊂ K^n of dimension d = dim(Z) there are linearly independent vectors S_1, …, S_d ∈ K^n such that Z = span{S_1, …, S_d}. From the sequence (S_1, …, S_d, I_1, I_2, …, I_n), after throwing away those vectors that violate linear independence with the earlier ones, the Gram–Schmidt process finds orthonormal vectors U_1, …, U_n ∈ K^n such that Z = span{U_1, …, U_d}. If we define P : K^n → K^n by

P x := Σ_{k=1}^d ⟨x, U_k⟩ U_k, (41)

then P is the orthogonal projection onto Z, satisfying P = P∗ = P^2.

Remark: If P ∈ K^{n×n} satisfies P = P∗ = P^2 then P is the orthogonal projection onto Z := P(K^n).

The idea: P x ∈ Z is the closest point in Z to x ∈ K^n. In other words, P x is the perpendicular shadow of the point x on the subspace Z. Now let Q = I − P, so that x = P x + Q x (for all x ∈ K^n); then we have the Pythagorean identity

‖x‖^2 = ‖P x‖^2 + ‖Q x‖^2.

Here Q is the orthogonal projection onto the orthogonal vector subspace Z⊥ = {u ∈ K^n : ⟨u, z⟩ = 0 for all z ∈ Z}. Notice that P Q = O = Q P, and

P(K^n) = Z = {x ∈ K^n : Q x = O}, Q(K^n) = Z⊥ = {x ∈ K^n : P x = O}.

P = [P_1 ⋯ P_n] ∈ K^{n×n} projects orthogonally onto Z = P(K^n) = span{P_j}_{j=1}^n.
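Formula (41) can be turned into a matrix as a sum of outer products U_k U_k∗. A minimal sketch for a two-dimensional subspace of R^3 (not part of the original notes; the subspace is an arbitrary example, NumPy assumed):

```python
import numpy as np

# Z = span{(1,1,0), (0,0,1)} in R^3, given by an orthonormal basis U1, U2
U1 = np.array([1., 1., 0.]) / np.sqrt(2)
U2 = np.array([0., 0., 1.])
P = np.outer(U1, U1) + np.outer(U2, U2)   # matrix of P x = sum_k <x, U_k> U_k
Q = np.eye(3) - P                          # projection onto Z-perp

# P is an orthogonal projection: P = P* = P^2, and P Q = O = Q P
assert np.allclose(P, P.T) and np.allclose(P, P @ P)
assert np.allclose(P @ Q, np.zeros((3, 3)))

# Pythagorean identity ||x||^2 = ||Px||^2 + ||Qx||^2
x = np.array([3., -1., 2.])
assert np.isclose(np.dot(x, x),
                  np.dot(P @ x, P @ x) + np.dot(Q @ x, Q @ x))
print(P @ x)   # the perpendicular shadow of x on Z
```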

53
Figure 33: Diagonalization Λ = S^{-1} A S. In other words, A = S Λ S^{-1}.

10 Unitary diagonalization

Remember the following matrix properties: M ∈ C^{n×n} is symmetric (or self-adjoint) if M∗ = M; B ∈ C^{n×n} is normal if B∗B = BB∗; U ∈ C^{n×n} is unitary if U∗ = U^{-1}.

Also, remember what the diagonalization Λ = S^{-1} A S for a matrix A ∈ C^{n×n} means: previously we just required S ∈ C^{n×n} to be invertible. In unitary diagonalization, we want S unitary: remember that unitary matrices preserve inner products and norms! In order for this to be possible, something has to be assumed of A, but fortunately this assumption turns out to be quite easy and natural: namely, A should be normal!

Example. Let A = [0 b; c 0], where b, c ∈ C. Then A∗ = [0 c̄; b̄ 0]. Thus here A is symmetric if and only if c = b̄. Moreover,

A∗A = [0 c̄; b̄ 0] [0 b; c 0] = [|c|^2 0; 0 |b|^2],
AA∗ = [0 b; c 0] [0 c̄; b̄ 0] = [|b|^2 0; 0 |c|^2].

Hence here A is normal if and only if |b| = |c|.

54
Figure 34: Any normal (upper or lower) triangular matrix is diagonal! And what if the matrix were symmetric and triangular?

Normal triangular matrix is diagonal!

Let A ∈ C^{n×n}. Then

(A∗A)_kk = Σ_{l=1}^n Ā_lk A_lk = Σ_{l=1}^n |A_lk|^2,
(AA∗)_kk = Σ_{l=1}^n A_kl Ā_kl = Σ_{l=1}^n |A_kl|^2.

Let A be normal (A∗A = AA∗) and upper triangular (A_ij = 0 if i > j). Then

|A_11|^2 = Σ_{l=1}^n |A_l1|^2 = (A∗A)_11 = (AA∗)_11 = Σ_{l=1}^n |A_1l|^2 = |A_11|^2 + Σ_{l=2}^n |A_1l|^2,

which shows that A_1l = 0 for l ≥ 2. From this result (∗), next

|A_22|^2 = Σ_{l=1}^n |A_l2|^2 (by (∗)) = (A∗A)_22 = (AA∗)_22 = Σ_{l=1}^n |A_2l|^2 = |A_22|^2 + Σ_{l=3}^n |A_2l|^2

shows that A_2l = 0 for l ≥ 3. Inductively, this yields A_ij = 0 whenever j > i (and we already had A_ji = 0 here). Hence A is a diagonal matrix!

Result: So, normal triangular matrices are actually diagonal! (A trivial result: symmetric triangular matrices are diagonal and real! Think about that!)

Remark. This previous result soon becomes a powerful tool. Namely, we will show that all square matrices can be unitarily triangulated; hence all normal matrices can be unitarily diagonalized! (Especially, all symmetric matrices can be unitarily diagonalized.)

55 Gram–Schmidt process reformulated

Let the matrix S = [S_1 ⋯ S_n] ∈ C^{n×n} be invertible, with columns S_k ∈ C^{n×1}. The Gram–Schmidt process gives a unitary U = [U_1 ⋯ U_n] ∈ C^{n×n} as follows: Let

U_1 := S_1 / ‖S_1‖, (42)

and for k ≥ 2 then

U_k := V_k / ‖V_k‖, (43)

where

V_k := S_k − Σ_{j=1}^{k−1} ⟨S_k, U_j⟩ U_j. (44)

It is trivial that ⟨U_k, U_k⟩ = 1. Also, ⟨U_k, U_l⟩ = 0 for k ≠ l, because if k > l then

⟨V_k, U_l⟩ = ⟨S_k, U_l⟩ − Σ_{j=1}^{k−1} ⟨S_k, U_j⟩ ⟨U_j, U_l⟩ = ⟨S_k, U_l⟩ − ⟨S_k, U_l⟩ = 0.

Notice that for all k ∈ {1, …, n} here span{S_1, …, S_k} = span{U_1, …, U_k}. This Gram–Schmidt process is numerically unstable (round-off errors accumulate), but there are ways to stabilize the process.

Example. Let S = [4 6; 3 2]. Then

U_1 = S_1 / ‖S_1‖ = [4; 3] / √(4^2 + 3^2) = [4/5; 3/5],

V_2 = S_2 − ⟨S_2, U_1⟩ U_1 = [6; 2] − 6 [4/5; 3/5] = [6/5; −8/5],

so U_2 = V_2 / ‖V_2‖ = [3/5; −4/5]. We have

U = [U_1 U_2] = [4/5 3/5; 3/5 −4/5].
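Formulas (42)-(44) translate directly into code. The sketch below (not part of the original notes; it implements classical Gram-Schmidt as stated, despite its known instability, and assumes NumPy) reproduces the worked example:

```python
import numpy as np

def gram_schmidt(S):
    """Classical Gram-Schmidt on the columns of an invertible S, as in (42)-(44)."""
    n = S.shape[1]
    U = np.zeros_like(S, dtype=complex)
    for k in range(n):
        V = S[:, k].astype(complex)
        for j in range(k):
            # subtract <S_k, U_j> U_j; np.vdot conjugates its first argument
            V = V - np.vdot(U[:, j], S[:, k]) * U[:, j]
        U[:, k] = V / np.linalg.norm(V)
    return U

S = np.array([[4., 6.],
              [3., 2.]])
U = gram_schmidt(S)

# Matches the worked example: U = [4/5 3/5; 3/5 -4/5]
assert np.allclose(U.real, [[0.8, 0.6], [0.6, -0.8]])
# U is unitary: U* U = I
assert np.allclose(U.conj().T @ U, np.eye(2))
print(U.real)
```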

56
Figure 35: Gram–Schmidt: from an invertible S = [S_1 ⋯ S_n] ∈ C^{n×n} to a unitary U = [U_1 ⋯ U_n] ∈ C^{n×n} such that span{S_1, …, S_k} = span{U_1, …, U_k}.

Unitary triangulation of square matrices

For a matrix A_n ∈ C^{n×n}, we next find a unitary matrix U_n ∈ C^{n×n} and an upper triangular matrix Λ_n ∈ C^{n×n} such that

A_n = U_n Λ_n U_n∗. (45)

This presentation (45) is trivial when n = 1, so let us assume n > 1. We prove (45) by induction, by reducing case n to case n − 1. Take an eigenvalue λ ∈ C with a normalized eigenvector v ∈ C^{n×1}: A_n v = λv, ‖v‖ = 1. By Gram–Schmidt, find a unitary V ∈ C^{n×n} with first column v. So

A_n = V [λ w; O A_{n−1}] V∗

for some w ∈ C^{1×(n−1)}, where O ∈ R^{(n−1)×1} is the zero column vector, and A_{n−1} = U_{n−1} Λ_{n−1} U_{n−1}∗ ∈ C^{(n−1)×(n−1)} by case n − 1 of (45). Let

U_n := V [1 O; O U_{n−1}], Λ_n := [λ w U_{n−1}; O Λ_{n−1}].

Then A_n = U_n Λ_n U_n∗, where U_n is unitary and Λ_n upper triangular.
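The induction above can be coded almost verbatim. The sketch below (not part of the original notes; it uses a QR factorization in place of the Gram-Schmidt step to build V, and assumes NumPy) triangulates the rotation Q from the earlier example:

```python
import numpy as np

def unitary_triangulate(A):
    # Recursive unitary triangulation A = U Lam U*, following the
    # induction (45): peel off one eigenvalue, recurse on the rest.
    n = A.shape[0]
    A = A.astype(complex)
    if n == 1:
        return np.eye(1, dtype=complex), A
    _, X = np.linalg.eig(A)
    v = X[:, 0] / np.linalg.norm(X[:, 0])        # normalized eigenvector
    # Unitary V whose first column spans Cv (QR instead of Gram-Schmidt)
    V = np.linalg.qr(np.column_stack([v, np.eye(n)]))[0]
    B = V.conj().T @ A @ V                       # block form [lam w; O A_{n-1}]
    U_sub, Lam_sub = unitary_triangulate(B[1:, 1:])
    U = np.eye(n, dtype=complex)
    U[1:, 1:] = U_sub
    U = V @ U                                    # U_n = V * blkdiag(1, U_{n-1})
    Lam = np.zeros((n, n), dtype=complex)
    Lam[0, 0] = B[0, 0]                          # lam
    Lam[0, 1:] = B[0, 1:] @ U_sub                # w U_{n-1}
    Lam[1:, 1:] = Lam_sub
    return U, Lam

A = np.array([[0., -1.],
              [1.,  0.]])                        # rotation Q, eigenvalues +-i
U, Lam = unitary_triangulate(A)

assert np.allclose(U.conj().T @ U, np.eye(2))    # U unitary
assert np.allclose(np.tril(Lam, -1), 0.0)        # Lam upper triangular
assert np.allclose(U @ Lam @ U.conj().T, A)      # A = U Lam U*
```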

57
Figure 36: Idea of unitary diagonalization Λ = U∗AU of a normal A ∈ C^{n×n}. Equivalently, this means A = UΛU∗. Unitary operations U and U∗ preserve distances and angles.

Unitary diagonalization of normal matrices

Thus, for any A ∈ C^{n×n} there is a unitary triangulation A = UΛU∗, where U ∈ C^{n×n} is unitary and Λ ∈ C^{n×n} is upper triangular. It is easy to see that here Λ is normal if and only if A is normal. Because normal triangular matrices are diagonal (see page 54), we get: Normal matrices can be diagonalized by unitary matrices! More precisely:

Theorem. Conditions (1) and (2) are equivalent:
(1) A ∈ C^{n×n} is normal (that is, A∗A = AA∗).
(2) A = UΛU∗ for a unitary U ∈ C^{n×n} and a diagonal Λ ∈ C^{n×n}.

Remark: In this result on the unitary diagonalization A = UΛU∗, it is easy to see that A = A∗ if and only if Λ = Λ∗. (Why?) So, a normal matrix is symmetric if and only if its eigenvalues are real. However, there are non-normal diagonalizable matrices with real eigenvalues: see the example on page 47.
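The theorem can be observed numerically on the normal example A = [0 b; c 0] with |b| = |c| from page 53. A minimal sketch (not part of the original notes; it orthonormalizes the eigenvector basis with QR, which keeps the eigenvector directions here because the eigenvalues are distinct, and assumes NumPy):

```python
import numpy as np

b, c = 2.0, 2.0j              # |b| = |c|, so A is normal (page 53)
A = np.array([[0, b],
              [c, 0]])
assert np.allclose(A.conj().T @ A, A @ A.conj().T)   # A* A = A A*

# For normal A the eigenvectors can be chosen orthonormal: A = U Lam U*
lams, X = np.linalg.eig(A)
U, _ = np.linalg.qr(X)        # orthonormalize the (already orthogonal) basis
Lam = U.conj().T @ A @ U

assert np.allclose(U.conj().T @ U, np.eye(2))        # U unitary
assert np.allclose(Lam, np.diag(np.diag(Lam)))       # Lam diagonal
assert np.allclose(A, U @ Lam @ U.conj().T)          # A = U Lam U*
print(np.diag(Lam))
```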

Math Linear Algebra Final Exam Review Sheet

Math Linear Algebra Final Exam Review Sheet Math 15-1 Linear Algebra Final Exam Review Sheet Vector Operations Vector addition is a component-wise operation. Two vectors v and w may be added together as long as they contain the same number n of

More information

CS 246 Review of Linear Algebra 01/17/19

CS 246 Review of Linear Algebra 01/17/19 1 Linear algebra In this section we will discuss vectors and matrices. We denote the (i, j)th entry of a matrix A as A ij, and the ith entry of a vector as v i. 1.1 Vectors and vector operations A vector

More information

1. General Vector Spaces

1. General Vector Spaces 1.1. Vector space axioms. 1. General Vector Spaces Definition 1.1. Let V be a nonempty set of objects on which the operations of addition and scalar multiplication are defined. By addition we mean a rule

More information

Math Linear Algebra II. 1. Inner Products and Norms

Math Linear Algebra II. 1. Inner Products and Norms Math 342 - Linear Algebra II Notes 1. Inner Products and Norms One knows from a basic introduction to vectors in R n Math 254 at OSU) that the length of a vector x = x 1 x 2... x n ) T R n, denoted x,

More information

Review problems for MA 54, Fall 2004.

Review problems for MA 54, Fall 2004. Review problems for MA 54, Fall 2004. Below are the review problems for the final. They are mostly homework problems, or very similar. If you are comfortable doing these problems, you should be fine on

More information

Ir O D = D = ( ) Section 2.6 Example 1. (Bottom of page 119) dim(v ) = dim(l(v, W )) = dim(v ) dim(f ) = dim(v )

Ir O D = D = ( ) Section 2.6 Example 1. (Bottom of page 119) dim(v ) = dim(l(v, W )) = dim(v ) dim(f ) = dim(v ) Section 3.2 Theorem 3.6. Let A be an m n matrix of rank r. Then r m, r n, and, by means of a finite number of elementary row and column operations, A can be transformed into the matrix ( ) Ir O D = 1 O

More information

Linear Algebra Primer

Linear Algebra Primer Linear Algebra Primer David Doria daviddoria@gmail.com Wednesday 3 rd December, 2008 Contents Why is it called Linear Algebra? 4 2 What is a Matrix? 4 2. Input and Output.....................................

More information

HOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION)

HOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION) HOMEWORK PROBLEMS FROM STRANG S LINEAR ALGEBRA AND ITS APPLICATIONS (4TH EDITION) PROFESSOR STEVEN MILLER: BROWN UNIVERSITY: SPRING 2007 1. CHAPTER 1: MATRICES AND GAUSSIAN ELIMINATION Page 9, # 3: Describe

More information

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra.

DS-GA 1002 Lecture notes 0 Fall Linear Algebra. These notes provide a review of basic concepts in linear algebra. DS-GA 1002 Lecture notes 0 Fall 2016 Linear Algebra These notes provide a review of basic concepts in linear algebra. 1 Vector spaces You are no doubt familiar with vectors in R 2 or R 3, i.e. [ ] 1.1

More information

IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET

IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET This is a (not quite comprehensive) list of definitions and theorems given in Math 1553. Pay particular attention to the ones in red. Study Tip For each

More information

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination

Math 102, Winter Final Exam Review. Chapter 1. Matrices and Gaussian Elimination Math 0, Winter 07 Final Exam Review Chapter. Matrices and Gaussian Elimination { x + x =,. Different forms of a system of linear equations. Example: The x + 4x = 4. [ ] [ ] [ ] vector form (or the column

More information

IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET

IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET IMPORTANT DEFINITIONS AND THEOREMS REFERENCE SHEET This is a (not quite comprehensive) list of definitions and theorems given in Math 1553. Pay particular attention to the ones in red. Study Tip For each

More information

Elementary linear algebra

Elementary linear algebra Chapter 1 Elementary linear algebra 1.1 Vector spaces Vector spaces owe their importance to the fact that so many models arising in the solutions of specific problems turn out to be vector spaces. The

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

3 Matrix Algebra. 3.1 Operations on matrices

3 Matrix Algebra. 3.1 Operations on matrices 3 Matrix Algebra A matrix is a rectangular array of numbers; it is of size m n if it has m rows and n columns. A 1 n matrix is a row vector; an m 1 matrix is a column vector. For example: 1 5 3 5 3 5 8

More information

Linear Algebra March 16, 2019

Linear Algebra March 16, 2019 Linear Algebra March 16, 2019 2 Contents 0.1 Notation................................ 4 1 Systems of linear equations, and matrices 5 1.1 Systems of linear equations..................... 5 1.2 Augmented

More information

Linear Algebra M1 - FIB. Contents: 5. Matrices, systems of linear equations and determinants 6. Vector space 7. Linear maps 8.

Linear Algebra M1 - FIB. Contents: 5. Matrices, systems of linear equations and determinants 6. Vector space 7. Linear maps 8. Linear Algebra M1 - FIB Contents: 5 Matrices, systems of linear equations and determinants 6 Vector space 7 Linear maps 8 Diagonalization Anna de Mier Montserrat Maureso Dept Matemàtica Aplicada II Translation:

More information

MATH 240 Spring, Chapter 1: Linear Equations and Matrices

MATH 240 Spring, Chapter 1: Linear Equations and Matrices MATH 240 Spring, 2006 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 8th Ed. Sections 1.1 1.6, 2.1 2.2, 3.2 3.8, 4.3 4.5, 5.1 5.3, 5.5, 6.1 6.5, 7.1 7.2, 7.4 DEFINITIONS Chapter 1: Linear

More information

2. Every linear system with the same number of equations as unknowns has a unique solution.

2. Every linear system with the same number of equations as unknowns has a unique solution. 1. For matrices A, B, C, A + B = A + C if and only if A = B. 2. Every linear system with the same number of equations as unknowns has a unique solution. 3. Every linear system with the same number of equations

More information

Math113: Linear Algebra. Beifang Chen

Math113: Linear Algebra. Beifang Chen Math3: Linear Algebra Beifang Chen Spring 26 Contents Systems of Linear Equations 3 Systems of Linear Equations 3 Linear Systems 3 2 Geometric Interpretation 3 3 Matrices of Linear Systems 4 4 Elementary

More information

Knowledge Discovery and Data Mining 1 (VO) ( )

Knowledge Discovery and Data Mining 1 (VO) ( ) Knowledge Discovery and Data Mining 1 (VO) (707.003) Review of Linear Algebra Denis Helic KTI, TU Graz Oct 9, 2014 Denis Helic (KTI, TU Graz) KDDM1 Oct 9, 2014 1 / 74 Big picture: KDDM Probability Theory

More information

Conceptual Questions for Review

Conceptual Questions for Review Conceptual Questions for Review Chapter 1 1.1 Which vectors are linear combinations of v = (3, 1) and w = (4, 3)? 1.2 Compare the dot product of v = (3, 1) and w = (4, 3) to the product of their lengths.

More information

Introduction to Linear Algebra, Second Edition, Serge Lange

Introduction to Linear Algebra, Second Edition, Serge Lange Introduction to Linear Algebra, Second Edition, Serge Lange Chapter I: Vectors R n defined. Addition and scalar multiplication in R n. Two geometric interpretations for a vector: point and displacement.

More information

Basic Elements of Linear Algebra

Basic Elements of Linear Algebra A Basic Review of Linear Algebra Nick West nickwest@stanfordedu September 16, 2010 Part I Basic Elements of Linear Algebra Although the subject of linear algebra is much broader than just vectors and matrices,

More information

A Brief Outline of Math 355

A Brief Outline of Math 355 A Brief Outline of Math 355 Lecture 1 The geometry of linear equations; elimination with matrices A system of m linear equations with n unknowns can be thought of geometrically as m hyperplanes intersecting

More information

Linear Algebra. Matrices Operations. Consider, for example, a system of equations such as x + 2y z + 4w = 0, 3x 4y + 2z 6w = 0, x 3y 2z + w = 0.

Linear Algebra. Matrices Operations. Consider, for example, a system of equations such as x + 2y z + 4w = 0, 3x 4y + 2z 6w = 0, x 3y 2z + w = 0. Matrices Operations Linear Algebra Consider, for example, a system of equations such as x + 2y z + 4w = 0, 3x 4y + 2z 6w = 0, x 3y 2z + w = 0 The rectangular array 1 2 1 4 3 4 2 6 1 3 2 1 in which the

More information

Introduction to Matrix Algebra

Introduction to Matrix Algebra Introduction to Matrix Algebra August 18, 2010 1 Vectors 1.1 Notations A p-dimensional vector is p numbers put together. Written as x 1 x =. x p. When p = 1, this represents a point in the line. When p

More information

Linear Algebra Highlights

Linear Algebra Highlights Linear Algebra Highlights Chapter 1 A linear equation in n variables is of the form a 1 x 1 + a 2 x 2 + + a n x n. We can have m equations in n variables, a system of linear equations, which we want to

More information

1 Matrices and Systems of Linear Equations. a 1n a 2n

1 Matrices and Systems of Linear Equations. a 1n a 2n March 31, 2013 16-1 16. Systems of Linear Equations 1 Matrices and Systems of Linear Equations An m n matrix is an array A = (a ij ) of the form a 11 a 21 a m1 a 1n a 2n... a mn where each a ij is a real

More information

Linear Algebra- Final Exam Review

Linear Algebra- Final Exam Review Linear Algebra- Final Exam Review. Let A be invertible. Show that, if v, v, v 3 are linearly independent vectors, so are Av, Av, Av 3. NOTE: It should be clear from your answer that you know the definition.

More information

Solution to Homework 1

Solution to Homework 1 Solution to Homework Sec 2 (a) Yes It is condition (VS 3) (b) No If x, y are both zero vectors Then by condition (VS 3) x = x + y = y (c) No Let e be the zero vector We have e = 2e (d) No It will be false

More information

Math 3108: Linear Algebra

Math 3108: Linear Algebra Math 3108: Linear Algebra Instructor: Jason Murphy Department of Mathematics and Statistics Missouri University of Science and Technology 1 / 323 Contents. Chapter 1. Slides 3 70 Chapter 2. Slides 71 118

More information

Final Review Sheet. B = (1, 1 + 3x, 1 + x 2 ) then 2 + 3x + 6x 2

Final Review Sheet. B = (1, 1 + 3x, 1 + x 2 ) then 2 + 3x + 6x 2 Final Review Sheet The final will cover Sections Chapters 1,2,3 and 4, as well as sections 5.1-5.4, 6.1-6.2 and 7.1-7.3 from chapters 5,6 and 7. This is essentially all material covered this term. Watch

More information

1 Last time: least-squares problems

1 Last time: least-squares problems MATH Linear algebra (Fall 07) Lecture Last time: least-squares problems Definition. If A is an m n matrix and b R m, then a least-squares solution to the linear system Ax = b is a vector x R n such that

More information

LINEAR ALGEBRA REVIEW

LINEAR ALGEBRA REVIEW LINEAR ALGEBRA REVIEW JC Stuff you should know for the exam. 1. Basics on vector spaces (1) F n is the set of all n-tuples (a 1,... a n ) with a i F. It forms a VS with the operations of + and scalar multiplication

More information

OHSx XM511 Linear Algebra: Solutions to Online True/False Exercises

OHSx XM511 Linear Algebra: Solutions to Online True/False Exercises This document gives the solutions to all of the online exercises for OHSx XM511. The section ( ) numbers refer to the textbook. TYPE I are True/False. Answers are in square brackets [. Lecture 02 ( 1.1)

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.]

[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.] Math 43 Review Notes [Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty Dot Product If v (v, v, v 3 and w (w, w, w 3, then the

More information

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88

Math Camp Lecture 4: Linear Algebra. Xiao Yu Wang. Aug 2010 MIT. Xiao Yu Wang (MIT) Math Camp /10 1 / 88 Math Camp 2010 Lecture 4: Linear Algebra Xiao Yu Wang MIT Aug 2010 Xiao Yu Wang (MIT) Math Camp 2010 08/10 1 / 88 Linear Algebra Game Plan Vector Spaces Linear Transformations and Matrices Determinant

More information

Equality: Two matrices A and B are equal, i.e., A = B if A and B have the same order and the entries of A and B are the same.

Equality: Two matrices A and B are equal, i.e., A = B if A and B have the same order and the entries of A and B are the same. Introduction Matrix Operations Matrix: An m n matrix A is an m-by-n array of scalars from a field (for example real numbers) of the form a a a n a a a n A a m a m a mn The order (or size) of A is m n (read

More information

Problem Set (T) If A is an m n matrix, B is an n p matrix and D is a p s matrix, then show

Problem Set (T) If A is an m n matrix, B is an n p matrix and D is a p s matrix, then show MTH 0: Linear Algebra Department of Mathematics and Statistics Indian Institute of Technology - Kanpur Problem Set Problems marked (T) are for discussions in Tutorial sessions (T) If A is an m n matrix,

More information

October 25, 2013 INNER PRODUCT SPACES

October 25, 2013 INNER PRODUCT SPACES October 25, 2013 INNER PRODUCT SPACES RODICA D. COSTIN Contents 1. Inner product 2 1.1. Inner product 2 1.2. Inner product spaces 4 2. Orthogonal bases 5 2.1. Existence of an orthogonal basis 7 2.2. Orthogonal

More information

Linear Algebra Lecture Notes-II

Linear Algebra Lecture Notes-II Linear Algebra Lecture Notes-II Vikas Bist Department of Mathematics Panjab University, Chandigarh-64 email: bistvikas@gmail.com Last revised on March 5, 8 This text is based on the lectures delivered

More information

MATH 106 LINEAR ALGEBRA LECTURE NOTES

MATH 106 LINEAR ALGEBRA LECTURE NOTES MATH 6 LINEAR ALGEBRA LECTURE NOTES FALL - These Lecture Notes are not in a final form being still subject of improvement Contents Systems of linear equations and matrices 5 Introduction to systems of

More information

Linear Algebra. Workbook

Linear Algebra. Workbook Linear Algebra Workbook Paul Yiu Department of Mathematics Florida Atlantic University Last Update: November 21 Student: Fall 2011 Checklist Name: A B C D E F F G H I J 1 2 3 4 5 6 7 8 9 10 xxx xxx xxx

More information

Matrices and Linear Algebra

Matrices and Linear Algebra Contents Quantitative methods for Economics and Business University of Ferrara Academic year 2017-2018 Contents 1 Basics 2 3 4 5 Contents 1 Basics 2 3 4 5 Contents 1 Basics 2 3 4 5 Contents 1 Basics 2

More information

Linear Algebra Review

Linear Algebra Review Chapter 1 Linear Algebra Review It is assumed that you have had a course in linear algebra, and are familiar with matrix multiplication, eigenvectors, etc. I will review some of these terms here, but quite

More information

1 Matrices and Systems of Linear Equations

1 Matrices and Systems of Linear Equations March 3, 203 6-6. Systems of Linear Equations Matrices and Systems of Linear Equations An m n matrix is an array A = a ij of the form a a n a 2 a 2n... a m a mn where each a ij is a real or complex number.

More information

LINEAR ALGEBRA SUMMARY SHEET.

LINEAR ALGEBRA SUMMARY SHEET. LINEAR ALGEBRA SUMMARY SHEET RADON ROSBOROUGH https://intuitiveexplanationscom/linear-algebra-summary-sheet/ This document is a concise collection of many of the important theorems of linear algebra, organized

More information

CS 143 Linear Algebra Review

CS 143 Linear Algebra Review CS 143 Linear Algebra Review Stefan Roth September 29, 2003 Introductory Remarks This review does not aim at mathematical rigor very much, but instead at ease of understanding and conciseness. Please see

More information

4. Determinants.

4. Determinants. 4. Determinants 4.1. Determinants; Cofactor Expansion Determinants of 2 2 and 3 3 Matrices 2 2 determinant 4.1. Determinants; Cofactor Expansion Determinants of 2 2 and 3 3 Matrices 3 3 determinant 4.1.

More information

LINEAR ALGEBRA MICHAEL PENKAVA

LINEAR ALGEBRA MICHAEL PENKAVA LINEAR ALGEBRA MICHAEL PENKAVA 1. Linear Maps Definition 1.1. If V and W are vector spaces over the same field K, then a map λ : V W is called a linear map if it satisfies the two conditions below: (1)

More information

Math 18, Linear Algebra, Lecture C00, Spring 2017 Review and Practice Problems for Final Exam

Math 18, Linear Algebra, Lecture C00, Spring 2017 Review and Practice Problems for Final Exam Math 8, Linear Algebra, Lecture C, Spring 7 Review and Practice Problems for Final Exam. The augmentedmatrix of a linear system has been transformed by row operations into 5 4 8. Determine if the system

More information

ELEMENTARY LINEAR ALGEBRA WITH APPLICATIONS. 1. Linear Equations and Matrices

ELEMENTARY LINEAR ALGEBRA WITH APPLICATIONS. 1. Linear Equations and Matrices ELEMENTARY LINEAR ALGEBRA WITH APPLICATIONS KOLMAN & HILL NOTES BY OTTO MUTZBAUER 11 Systems of Linear Equations 1 Linear Equations and Matrices Numbers in our context are either real numbers or complex

More information

MATRICES ARE SIMILAR TO TRIANGULAR MATRICES

MATRICES ARE SIMILAR TO TRIANGULAR MATRICES MATRICES ARE SIMILAR TO TRIANGULAR MATRICES 1 Complex matrices Recall that the complex numbers are given by a + ib where a and b are real and i is the imaginary unity, ie, i 2 = 1 In what we describe below,

More information

Extra Problems for Math 2050 Linear Algebra I

Extra Problems for Math 2050 Linear Algebra I Extra Problems for Math 5 Linear Algebra I Find the vector AB and illustrate with a picture if A = (,) and B = (,4) Find B, given A = (,4) and [ AB = A = (,4) and [ AB = 8 If possible, express x = 7 as

More information

Linear Algebra in Actuarial Science: Slides to the lecture

Linear Algebra in Actuarial Science: Slides to the lecture Linear Algebra in Actuarial Science: Slides to the lecture Fall Semester 2010/2011 Linear Algebra is a Tool-Box Linear Equation Systems Discretization of differential equations: solving linear equations

More information

Matrix Theory. A.Holst, V.Ufnarovski

Matrix Theory. A.Holst, V.Ufnarovski Matrix Theory AHolst, VUfnarovski 55 HINTS AND ANSWERS 9 55 Hints and answers There are two different approaches In the first one write A as a block of rows and note that in B = E ij A all rows different

More information

MTH 2032 SemesterII

MTH 2032 SemesterII MTH 202 SemesterII 2010-11 Linear Algebra Worked Examples Dr. Tony Yee Department of Mathematics and Information Technology The Hong Kong Institute of Education December 28, 2011 ii Contents Table of Contents

More information

NORMS ON SPACE OF MATRICES

NORMS ON SPACE OF MATRICES NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system

More information

LINEAR ALGEBRA W W L CHEN

LINEAR ALGEBRA W W L CHEN LINEAR ALGEBRA W W L CHEN c W W L Chen, 1997, 2008. This chapter is available free to all individuals, on the understanding that it is not to be used for financial gain, and may be downloaded and/or photocopied,

More information

Quantum Computing Lecture 2. Review of Linear Algebra

Quantum Computing Lecture 2. Review of Linear Algebra Quantum Computing Lecture 2 Review of Linear Algebra Maris Ozols Linear algebra States of a quantum system form a vector space and their transformations are described by linear operators Vector spaces

More information

NOTES FOR LINEAR ALGEBRA 133

NOTES FOR LINEAR ALGEBRA 133 NOTES FOR LINEAR ALGEBRA 33 William J Anderson McGill University These are not official notes for Math 33 identical to the notes projected in class They are intended for Anderson s section 4, and are 2

More information

6 Inner Product Spaces

6 Inner Product Spaces Lectures 16,17,18 6 Inner Product Spaces 6.1 Basic Definition Parallelogram law, the ability to measure angle between two vectors and in particular, the concept of perpendicularity make the euclidean space

More information

Vectors and matrices: matrices (Version 2) This is a very brief summary of my lecture notes.

Vectors and matrices: matrices (Version 2) This is a very brief summary of my lecture notes. Vectors and matrices: matrices (Version 2) This is a very brief summary of my lecture notes Matrices and linear equations A matrix is an m-by-n array of numbers A = a 11 a 12 a 13 a 1n a 21 a 22 a 23 a

More information

Algebra II. Paulius Drungilas and Jonas Jankauskas

Algebra II. Paulius Drungilas and Jonas Jankauskas Algebra II Paulius Drungilas and Jonas Jankauskas Contents 1. Quadratic forms 3 What is quadratic form? 3 Change of variables. 3 Equivalence of quadratic forms. 4 Canonical form. 4 Normal form. 7 Positive

More information

Unit 3: Matrices. Juan Luis Melero and Eduardo Eyras. September 2018

Unit 3: Matrices. Juan Luis Melero and Eduardo Eyras. September 2018 Unit 3: Matrices Juan Luis Melero and Eduardo Eyras September 2018 1 Contents 1 Matrices and operations 4 1.1 Definition of a matrix....................... 4 1.2 Addition and subtraction of matrices..............

More information

Math Bootcamp An p-dimensional vector is p numbers put together. Written as. x 1 x =. x p

Math Bootcamp An p-dimensional vector is p numbers put together. Written as. x 1 x =. x p Math Bootcamp 2012 1 Review of matrix algebra 1.1 Vectors and rules of operations An p-dimensional vector is p numbers put together. Written as x 1 x =. x p. When p = 1, this represents a point in the

More information

LINEAR ALGEBRA WITH APPLICATIONS

LINEAR ALGEBRA WITH APPLICATIONS SEVENTH EDITION LINEAR ALGEBRA WITH APPLICATIONS Instructor s Solutions Manual Steven J. Leon PREFACE This solutions manual is designed to accompany the seventh edition of Linear Algebra with Applications

More information

Recall: Dot product on R 2 : u v = (u 1, u 2 ) (v 1, v 2 ) = u 1 v 1 + u 2 v 2, u u = u u 2 2 = u 2. Geometric Meaning:

Recall: Dot product on R 2 : u v = (u 1, u 2 ) (v 1, v 2 ) = u 1 v 1 + u 2 v 2, u u = u u 2 2 = u 2. Geometric Meaning: Recall: Dot product on R 2 : u v = (u 1, u 2 ) (v 1, v 2 ) = u 1 v 1 + u 2 v 2, u u = u 2 1 + u 2 2 = u 2. Geometric Meaning: u v = u v cos θ. u θ v 1 Reason: The opposite side is given by u v. u v 2 =

More information

Chapter 2 Notes, Linear Algebra 5e Lay

Chapter 2 Notes, Linear Algebra 5e Lay Contents.1 Operations with Matrices..................................1.1 Addition and Subtraction.............................1. Multiplication by a scalar............................ 3.1.3 Multiplication

More information

x 3y 2z = 6 1.2) 2x 4y 3z = 8 3x + 6y + 8z = 5 x + 3y 2z + 5t = 4 1.5) 2x + 8y z + 9t = 9 3x + 5y 12z + 17t = 7

x 3y 2z = 6 1.2) 2x 4y 3z = 8 3x + 6y + 8z = 5 x + 3y 2z + 5t = 4 1.5) 2x + 8y z + 9t = 9 3x + 5y 12z + 17t = 7 Linear Algebra and its Applications-Lab 1 1) Use Gaussian elimination to solve the following systems x 1 + x 2 2x 3 + 4x 4 = 5 1.1) 2x 1 + 2x 2 3x 3 + x 4 = 3 3x 1 + 3x 2 4x 3 2x 4 = 1 x + y + 2z = 4 1.4)

More information

Matrix Algebra Determinant, Inverse matrix. Matrices. A. Fabretti. Mathematics 2 A.Y. 2015/2016. A. Fabretti Matrices

Matrix Algebra Determinant, Inverse matrix. Matrices. A. Fabretti. Mathematics 2 A.Y. 2015/2016. A. Fabretti Matrices Matrices A. Fabretti Mathematics 2 A.Y. 2015/2016 Table of contents Matrix Algebra Determinant Inverse Matrix Introduction A matrix is a rectangular array of numbers. The size of a matrix is indicated

More information

Math 4A Notes. Written by Victoria Kala Last updated June 11, 2017

Math 4A Notes. Written by Victoria Kala Last updated June 11, 2017 Math 4A Notes Written by Victoria Kala vtkala@math.ucsb.edu Last updated June 11, 2017 Systems of Linear Equations A linear equation is an equation that can be written in the form a 1 x 1 + a 2 x 2 +...

More information

Stat 159/259: Linear Algebra Notes

Stat 159/259: Linear Algebra Notes Stat 159/259: Linear Algebra Notes Jarrod Millman November 16, 2015 Abstract These notes assume you ve taken a semester of undergraduate linear algebra. In particular, I assume you are familiar with the

More information

Review of linear algebra

Review of linear algebra Review of linear algebra 1 Vectors and matrices We will just touch very briefly on certain aspects of linear algebra, most of which should be familiar. Recall that we deal with vectors, i.e. elements of

More information

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM

LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM LINEAR ALGEBRA BOOT CAMP WEEK 4: THE SPECTRAL THEOREM Unless otherwise stated, all vector spaces in this worksheet are finite dimensional and the scalar field F is R or C. Definition 1. A linear operator

More information

GQE ALGEBRA PROBLEMS

GQE ALGEBRA PROBLEMS GQE ALGEBRA PROBLEMS JAKOB STREIPEL Contents. Eigenthings 2. Norms, Inner Products, Orthogonality, and Such 6 3. Determinants, Inverses, and Linear (In)dependence 4. (Invariant) Subspaces 3 Throughout

More information

1 9/5 Matrices, vectors, and their applications

1 9/5 Matrices, vectors, and their applications 1 9/5 Matrices, vectors, and their applications Algebra: study of objects and operations on them. Linear algebra: object: matrices and vectors. operations: addition, multiplication etc. Algorithms/Geometric

More information

Chapter 6: Orthogonality

Chapter 6: Orthogonality Chapter 6: Orthogonality (Last Updated: November 7, 7) These notes are derived primarily from Linear Algebra and its applications by David Lay (4ed). A few theorems have been moved around.. Inner products

More information

Linear algebra 2. Yoav Zemel. March 1, 2012

Linear algebra 2. Yoav Zemel. March 1, 2012 Linear algebra 2 Yoav Zemel March 1, 2012 These notes were written by Yoav Zemel. The lecturer, Shmuel Berger, should not be held responsible for any mistake. Any comments are welcome at zamsh7@gmail.com.

More information

Linear Algebra II. Ulrike Tillmann. January 4, 2018

Linear Algebra II. Ulrike Tillmann. January 4, 2018 Linear Algebra II Ulrike Tillmann January 4, 208 This course is a continuation of Linear Algebra I and will foreshadow much of what will be discussed in more detail in the Linear Algebra course in Part

More information

Elementary Linear Algebra

Elementary Linear Algebra Matrices J MUSCAT Elementary Linear Algebra Matrices Definition Dr J Muscat 2002 A matrix is a rectangular array of numbers, arranged in rows and columns a a 2 a 3 a n a 2 a 22 a 23 a 2n A = a m a mn We

More information

Math 108b: Notes on the Spectral Theorem

Math 108b: Notes on the Spectral Theorem Math 108b: Notes on the Spectral Theorem From section 6.3, we know that every linear operator T on a finite dimensional inner product space V has an adjoint. (T is defined as the unique linear operator

More information

Linear Algebra: Lecture Notes. Dr Rachel Quinlan School of Mathematics, Statistics and Applied Mathematics NUI Galway

Linear Algebra: Lecture Notes. Dr Rachel Quinlan School of Mathematics, Statistics and Applied Mathematics NUI Galway Linear Algebra: Lecture Notes Dr Rachel Quinlan School of Mathematics, Statistics and Applied Mathematics NUI Galway November 6, 23 Contents Systems of Linear Equations 2 Introduction 2 2 Elementary Row

More information

EXERCISE SET 5.1. = (kx + kx + k, ky + ky + k ) = (kx + kx + 1, ky + ky + 1) = ((k + )x + 1, (k + )y + 1)

EXERCISE SET 5.1. = (kx + kx + k, ky + ky + k ) = (kx + kx + 1, ky + ky + 1) = ((k + )x + 1, (k + )y + 1) EXERCISE SET 5. 6. The pair (, 2) is in the set but the pair ( )(, 2) = (, 2) is not because the first component is negative; hence Axiom 6 fails. Axiom 5 also fails. 8. Axioms, 2, 3, 6, 9, and are easily

More information

Notes on Linear Algebra

Notes on Linear Algebra 1 Notes on Linear Algebra Jean Walrand August 2005 I INTRODUCTION Linear Algebra is the theory of linear transformations Applications abound in estimation control and Markov chains You should be familiar

More information

Linear Algebra Massoud Malek

Linear Algebra Massoud Malek CSUEB Linear Algebra Massoud Malek Inner Product and Normed Space In all that follows, the n n identity matrix is denoted by I n, the n n zero matrix by Z n, and the zero vector by θ n An inner product

More information

Lecture 2: Linear operators

Lecture 2: Linear operators Lecture 2: Linear operators Rajat Mittal IIT Kanpur The mathematical formulation of Quantum computing requires vector spaces and linear operators So, we need to be comfortable with linear algebra to study

More information

Calculating determinants for larger matrices

Calculating determinants for larger matrices Day 26 Calculating determinants for larger matrices We now proceed to define det A for n n matrices A As before, we are looking for a function of A that satisfies the product formula det(ab) = det A det

More information

j=1 x j p, if 1 p <, x i ξ : x i < ξ} 0 as p.

j=1 x j p, if 1 p <, x i ξ : x i < ξ} 0 as p. LINEAR ALGEBRA Fall 203 The final exam Almost all of the problems solved Exercise Let (V, ) be a normed vector space. Prove x y x y for all x, y V. Everybody knows how to do this! Exercise 2 If V is a

More information

Math 520 Exam 2 Topic Outline Sections 1 3 (Xiao/Dumas/Liaw) Spring 2008

Math 520 Exam 2 Topic Outline Sections 1 3 (Xiao/Dumas/Liaw) Spring 2008 Math 520 Exam 2 Topic Outline Sections 1 3 (Xiao/Dumas/Liaw) Spring 2008 Exam 2 will be held on Tuesday, April 8, 7-8pm in 117 MacMillan What will be covered The exam will cover material from the lectures

More information

Math 291-2: Lecture Notes Northwestern University, Winter 2016

Math 291-2: Lecture Notes Northwestern University, Winter 2016 Math 291-2: Lecture Notes Northwestern University, Winter 2016 Written by Santiago Cañez These are lecture notes for Math 291-2, the second quarter of MENU: Intensive Linear Algebra and Multivariable Calculus,

More information

Chapter 6 Inner product spaces

Chapter 6 Inner product spaces Chapter 6 Inner product spaces 6.1 Inner products and norms Definition 1 Let V be a vector space over F. An inner product on V is a function, : V V F such that the following conditions hold. x+z,y = x,y

More information

G1110 & 852G1 Numerical Linear Algebra

G1110 & 852G1 Numerical Linear Algebra The University of Sussex Department of Mathematics G & 85G Numerical Linear Algebra Lecture Notes Autumn Term Kerstin Hesse (w aw S w a w w (w aw H(wa = (w aw + w Figure : Geometric explanation of the

More information

Dot Products. K. Behrend. April 3, Abstract A short review of some basic facts on the dot product. Projections. The spectral theorem.

Dot Products. K. Behrend. April 3, Abstract A short review of some basic facts on the dot product. Projections. The spectral theorem. Dot Products K. Behrend April 3, 008 Abstract A short review of some basic facts on the dot product. Projections. The spectral theorem. Contents The dot product 3. Length of a vector........................

More information

MATH Topics in Applied Mathematics Lecture 12: Evaluation of determinants. Cross product.

MATH Topics in Applied Mathematics Lecture 12: Evaluation of determinants. Cross product. MATH 311-504 Topics in Applied Mathematics Lecture 12: Evaluation of determinants. Cross product. Determinant is a scalar assigned to each square matrix. Notation. The determinant of a matrix A = (a ij

More information

Discrete Math, Second Problem Set (June 24)

Discrete Math, Second Problem Set (June 24) Discrete Math, Second Problem Set (June 24) REU 2003 Instructor: Laszlo Babai Scribe: D Jeremy Copeland 1 Number Theory Remark 11 For an arithmetic progression, a 0, a 1 = a 0 +d, a 2 = a 0 +2d, to have

More information

MATH 423 Linear Algebra II Lecture 33: Diagonalization of normal operators.

MATH 423 Linear Algebra II Lecture 33: Diagonalization of normal operators. MATH 423 Linear Algebra II Lecture 33: Diagonalization of normal operators. Adjoint operator and adjoint matrix Given a linear operator L on an inner product space V, the adjoint of L is a transformation

More information

Eigenvalues and Eigenvectors

Eigenvalues and Eigenvectors /88 Chia-Ping Chen Department of Computer Science and Engineering National Sun Yat-sen University Linear Algebra Eigenvalue Problem /88 Eigenvalue Equation By definition, the eigenvalue equation for matrix

More information