Chapter 1. Linear Algebra Review

It is assumed that you have had a course in linear algebra, and are familiar with matrix multiplication, eigenvectors, etc. I will review some of these terms here, but quite rapidly.

1.1 Vector Spaces

The standard object in linear algebra is a vector space.

Definition 1.1. A vector space $V$ over a field $F$ (the scalars) is a set of vectors with two operations: vector addition (vector + vector = vector), which makes $V$ into an Abelian group, and scalar multiplication (scalar $\cdot$ vector = vector) with properties
$$\alpha(\beta v) = (\alpha\beta)v, \qquad \alpha(v + w) = \alpha v + \alpha w, \qquad (\alpha + \beta)v = \alpha v + \beta v.$$
A subspace is a subset of $V$ which is closed under addition and scalar multiplication.

In this course, we will only consider $\mathbb{R}$ (real numbers) and $\mathbb{C}$ (complex numbers) as scalars. I will state the definitions for the complex case, if it makes a difference, but all of the examples and homework problems will be real (except possibly in the section on eigenvalues).

There are two standard examples of finite-dimensional vector spaces.

1.1.1 The Geometric View: Arrows

A vector is an arrow, given by a direction and a length. You can move it around to any place you want. Examples are forces in physics, or velocities.

1.1.2 The Analytic View: Columns of Numbers

The vector space is $V = \mathbb{R}^n$ or $\mathbb{C}^n$:
$$v = \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}, \qquad v + w = \begin{pmatrix} v_1 + w_1 \\ v_2 + w_2 \\ \vdots \\ v_n + w_n \end{pmatrix}, \qquad \alpha v = \begin{pmatrix} \alpha v_1 \\ \alpha v_2 \\ \vdots \\ \alpha v_n \end{pmatrix}.$$
Note: Vectors are columns of numbers, not rows.

In this course, we basically only use the analytic view, but the geometric view is occasionally useful to get an intuitive understanding of what some theorem or algorithm means.

1.1.3 Linear Independence and Bases

Let $\{v_i\}$ be a collection of vectors, and $\{\alpha_i\}$ some scalars. A linear combination of the vectors is
$$\sum_i \alpha_i v_i.$$
The set of all linear combinations of $\{v_i\}$ is called the span. The span is always a subspace.

The $\{v_i\}$ are called linearly dependent if there is a set of coefficients $\{\alpha_i\}$, not all zero, for which
$$\sum_i \alpha_i v_i = 0.$$
Otherwise, they are linearly independent.

A basis of $V$ is a collection $\{v_i\}$ so that every $w \in V$ can be written uniquely as a linear combination.
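In Matlab-like code, linear independence can be tested numerically by comparing the rank with the number of vectors. A minimal sketch (the vectors here are arbitrary examples):

% columns of V are the vectors v_i; they are linearly
% independent iff rank(V) equals the number of columns
V = [1 0 1;
     0 1 1;
     0 0 0];
rank(V)    % 2 < 3 columns, so the vectors are dependent
null(V)    % coefficients alpha with sum_i alpha_i v_i = 0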

Theorem 1.2. (a) Every vector space has a basis (usually infinitely many of them). (b) Every basis has the same number of elements. This is called the dimension of the space.

Take an arbitrary $n$-dimensional vector space $V$ over $F$. Pick a basis $\{e_1, e_2, \ldots, e_n\}$. Then we can equate
$$w = \sum_i \alpha_i e_i \quad\longleftrightarrow\quad w = \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix}, \qquad V \cong F^n.$$
Thus, every finite-dimensional vector space over $F$ is isomorphic to $F^n$. We will frequently use the notation
$$e_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \ldots$$
for the standard basis vectors.

Sideline: As a generalization of $F^n$, the space of infinite sequences $\{\alpha_1, \alpha_2, \alpha_3, \ldots\}$ is also a vector space, with dimension infinity. Even more generally, the space of functions on an interval $[a, b]$ is a vector space. In finite dimensions, we have subscripts $1, \ldots, n$. In the case of sequences, we have subscripts $1, 2, \ldots$. In the case of functions, you can think of the $x$ in $f(x)$ as a subscript. Linear algebra for infinite-dimensional spaces is called functional analysis, and is its own topic.

1.2 Linear Maps

A homomorphism $L$ between vector spaces needs to preserve the two vector space operations. This means
$$L(v + w) = L(v) + L(w), \qquad L(\alpha v) = \alpha L(v),$$

or equivalently
$$L(\alpha v + \beta w) = \alpha L(v) + \beta L(w).$$
Such a mapping is called a linear map. Assume $L$ is a linear map from $V$ to $W$.

Definition 1.3.
$$R(L) = \text{range of } L = \{Lv : v \in V\} \subset W,$$
$$N(L) = \text{nullspace of } L = \text{kernel of } L = \{v \in V : Lv = 0\} \subset V.$$
Both of these are subspaces. The dimension of $R(L)$ is called the rank of $L$.

Theorem 1.4. $\dim(R(L)) + \dim(N(L)) = \dim(V)$.

Note: From the definitions, a linear map $L$ from $V$ to $W$ is one-to-one if and only if $N(L) = \{0\}$. It is onto if and only if $R(L) = W$. By the theorem and a dimension count, a mapping $L$ from $V$ into itself is one-to-one if and only if it is onto.
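Theorem 1.4 is easy to verify numerically. A minimal Matlab-like sketch (the matrix is an arbitrary example):

% rank-nullity: dim R(L) + dim N(L) = dim V = number of columns
L = [1 2 3;
     4 5 6];       % a map from R^3 to R^2
r = rank(L);       % dimension of the range
Z = null(L);       % orthonormal basis of the nullspace
r + size(Z, 2)     % equals size(L, 2) = 3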

1.3 Matrices

So far, $L$ is just a mapping between vector spaces. It maps vectors to other vectors. Now we want to introduce coordinates. We pick a basis $\{e_i\}$, $i = 1, \ldots, n$ in $V$, and another basis $\{f_j\}$, $j = 1, \ldots, m$ in $W$. If $v = \sum_i v_i e_i$ is an arbitrary vector in $V$ (expressed in terms of the basis $\{e_i\}$), then by linearity,
$$w = Lv = \sum_i v_i \, L e_i.$$
We can express $L e_i$ in the basis of $W$:
$$L e_i = \sum_j l_{ji} f_j.$$
We collect all these numbers in a matrix $L$. Then $Lv = w$ becomes
$$\begin{pmatrix} l_{11} & l_{12} & \cdots & l_{1n} \\ l_{21} & l_{22} & \cdots & l_{2n} \\ \vdots & & & \vdots \\ l_{m1} & l_{m2} & \cdots & l_{mn} \end{pmatrix} \begin{pmatrix} v_1 \\ \vdots \\ v_n \end{pmatrix} = \begin{pmatrix} w_1 \\ \vdots \\ w_m \end{pmatrix}.$$
So, the columns of the matrix represent the images of the basis vectors.

Note the numbering in $L$: the first subscript always refers to the row, the second to the column. Entry $l_{35}$ is in row 3, column 5. The map $L$ goes from $F^n$ to $F^m$, and is of size $m \times n$ (which is the opposite order).

There are two ways to think about matrix times vector multiplication. The first one is the dot product interpretation: the $j$th entry in $w$ is the dot product between the $j$th row of $L$ and $v$:
$$w_j = (l_{j1}, l_{j2}, \ldots, l_{jn}) \begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} = l_{j1} v_1 + l_{j2} v_2 + \cdots + l_{jn} v_n.$$
The second one is the linear combination interpretation: $w$ is the linear combination of the columns of $L$, with coefficients from $v$. This is the way we derived it above.

When you program this on a computer, (matrix times vector) corresponds to a double loop. The loop can be executed in either order, corresponding to the two interpretations. Depending on the computer architecture, one way may be faster than the other. In Matlab-like code, with $L$ of size $m \times n$:

% dot product interpretation: fill in w one entry at a time
w = zeros(m, 1);
for i = 1:m
    for j = 1:n
        w(i) = w(i) + L(i,j)*v(j);
    end
end

% linear combination interpretation: accumulate one column at a time
w = zeros(m, 1);
for j = 1:n
    for i = 1:m
        w(i) = w(i) + L(i,j)*v(j);
    end
end

Remark: Technically, I should distinguish between the mapping $L$ and the matrix $L$, but I won't. Just keep in mind that the matrix depends on the choice of basis, the mapping does not.

1.3.1 Matrix Multiplication

Suppose you have a linear map $L$ from $V$ (dimension $n$) to $W$ (dimension $m$), and $M$ from $W$ to $X$ (dimension $p$). We can then consider the combined map $N = M \circ L$ (backwards again).

You can verify that for the matrices,
$$N = M L, \qquad (p \times n) = (p \times m)\,(m \times n), \qquad n_{ij} = \sum_k m_{ik} \, l_{kj}.$$
In words: the entry $n_{ij}$ in the product is the dot product of row $i$ in $M$ with column $j$ in $L$.

Note: The middle dimension has to match, and gets canceled. The size of the result comes from the outer numbers. Likewise in the sum, the middle index gets canceled.

On a computer, (matrix times matrix) corresponds to a triple loop, which can be executed in 6 orders, corresponding to 3 different viewpoints. (Each one shows up twice, depending on the order in which the product matrix gets filled in.) The first two are the dot product and linear combination interpretations from above. The third one is the sum of rank one matrices interpretation. If $w$ is a row vector (size $1 \times n$), and $v$ is a column vector (size $n \times 1$), then $wv$ is a scalar ($1 \times 1$ matrix), and $vw$ is a matrix of rank 1:
$$vw = \begin{pmatrix} v_1 w_1 & v_1 w_2 & \cdots & v_1 w_n \\ v_2 w_1 & v_2 w_2 & \cdots & v_2 w_n \\ \vdots & & & \vdots \\ v_n w_1 & v_n w_2 & \cdots & v_n w_n \end{pmatrix}.$$
You can think of matrix multiplication $ML$ as the sum of the rank one matrices produced from the products between the $i$th column of $M$ and the $i$th row of $L$.
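The rank one viewpoint is easy to check in Matlab-like code (the sizes here are chosen arbitrarily):

% ML as a sum of rank one matrices: column k of M times row k of L
M = randn(4, 3);  L = randn(3, 5);
N = zeros(4, 5);
for k = 1:3
    N = N + M(:,k) * L(k,:);   % outer product, a rank one matrix
end
norm(N - M*L)                  % ~1e-15: the two computations agree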

Matrix multiplication is not commutative in general: $LM \neq ML$. Unless the matrices are square, the two products are not even both defined, or not the same size. However, matrix multiplication is associative and distributive:
$$(LM)N = L(MN), \qquad L(M + N) = LM + LN, \qquad (L + M)N = LN + MN.$$

One more observation about matrix multiplication: If $N = ML$, then the first column of $N$ depends only on the first column of $L$, not any of the other numbers in $L$. The second column of $N$ depends only on the second column in $L$, and so on. Likewise, the first row in $N$ depends only on the first row in $M$, and so on.

This means that if you want to solve a system of matrix equations $AX = B$, you can treat this as a sequence of matrix-vector problems
$$A x_1 = b_1, \quad \ldots, \quad A x_m = b_m,$$
where $b_i$, $x_i$ are the columns of $B$, $X$.

1.3.2 Basis Change

The mapping $L$ is independent of the choice of bases in $V$, $W$, but the matrix $L$ depends on the bases. How does the matrix change when you change the bases? We will just consider this for the case where $V = W$.

Suppose you have the standard basis $\{e_i\}$, and a new basis $\{f_i\}$. Let $F$ be the matrix with columns $f_i$. Let $x$ be an arbitrary vector. In the original basis, it is expressed with coefficients $v_i$, in the new basis with coefficients $w_i$:
$$x = \sum_i v_i e_i = \sum_i w_i f_i = F w.$$
(Use the interpretation of the matrix-vector product $Fw$ as a linear combination of the columns of $F$.) So, $v = Fw$, or $w = F^{-1} v$. To get from the original representation $v$ in basis $\{e_i\}$ to the new representation $w$ in basis $\{f_i\}$, you have to multiply by $F^{-1}$.

Now assume we have a linear map from $V$ to $V$. In basis $\{e_i\}$ it is represented by a square matrix $L$. In the original basis, consider
$$y = Lx.$$
Convert to the new basis:
$$F^{-1} y = \left( F^{-1} L F \right) \left( F^{-1} x \right).$$
Thus, the mapping $L$ is represented in the new basis as $F^{-1} L F$.

The matrices $L$ and $F^{-1} L F$ are called conjugates of each other, or similar matrices. Similar matrices can be considered as the same mapping represented in two different bases. Properties of a matrix that are geometric, such as the determinant or eigenvalues, are preserved under conjugation. Other properties that are analytic are not preserved, such as the special shapes of matrices listed below.
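A minimal Matlab-like sketch of this invariance (the matrix and the new basis are arbitrary examples):

% similar matrices represent the same map, so they share
% the determinant and the eigenvalues
L = [2 1; 0 3];
F = [1 1; 0 1];          % columns of F form the new basis
Lnew = F \ (L * F);      % F^{-1} L F
[det(L), det(Lnew)]      % both 6
[sort(eig(L)), sort(eig(Lnew))]   % both [2; 3]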

1.3.3 Special Matrices

Let $F^{m \times n}$ be the set of matrices of size $m \times n$. This becomes a vector space over $F$ of dimension $mn$, with entry-by-entry addition and scalar multiplication. The unit element for addition is the zero matrix
$$O = \begin{pmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & 0 \end{pmatrix}.$$
In $F^{n \times n}$, the identity matrix
$$I = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 \end{pmatrix}$$
satisfies $I L = L I = L$.

For a given $L$, the inverse matrix $L^{-1}$ satisfies $L L^{-1} = L^{-1} L = I$. The inverse matrix may or may not exist.

Example: Take
$$L = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \mathbb{C}^{2 \times 2}.$$
The inverse is
$$L^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix},$$
which can be verified by multiplying. The inverse exists if and only if $ad - bc \neq 0$.

The term $ad - bc$ in the example is the determinant of the matrix. There are formulas for the determinant of a larger matrix, but we won't need them. Here is what the determinant means. The unit basis vectors $e_i$ form a square (in dimension 2) or cube (in dimension 3) of area or volume 1. After the mapping, the images form a parallelogram or parallelepiped. The absolute value of the determinant is the area (or volume) of that. The sign of the determinant has to do with orientation. This is the reason the Jacobian determinant shows up in multidimensional change of variable in integrals. If the determinant is zero, that means that the square or cube gets flattened into something lower-dimensional, and there is no inverse.
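A quick Matlab-like check of the $2 \times 2$ formula (the entries are arbitrary):

% verify the 2x2 inverse formula against the built-in inv()
a = 2; b = 1; c = 5; d = 3;              % ad - bc = 1, so L is invertible
L = [a b; c d];
Linv = (1/(a*d - b*c)) * [d -b; -c a];
norm(Linv - inv(L))                      % ~0
abs(det(L))                              % area of the image of the unit square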

Theorem 1.5. Properties of the determinant:
- $\det(I) = 1$
- $\det(AB) = \det(A) \det(B)$
- $\det(A^{-1}) = 1/\det(A)$
- $\det(\text{triangular matrix}) = $ product of diagonal terms

Properties of the inverse:
- $A^{-1}$ exists if and only if $\det(A) \neq 0$
- $A^{-1}$ is unique
- $A A^{-1} = A^{-1} A = I$
- $(AB)^{-1} = B^{-1} A^{-1}$

1.4 Eigenvalues

Assume $A$ is a square matrix, of size $n \times n$. A nonzero vector $v$ is called an eigenvector to eigenvalue $\lambda$, if
$$A v = \lambda v \quad\Longleftrightarrow\quad (A - \lambda I) v = 0.$$
It is obvious that if $v$ is an eigenvector, so is any multiple of $v$. More generally, linear combinations of eigenvectors to the same eigenvalue are again eigenvectors, so it makes more sense to think of the eigenspace
$$E(\lambda) = N(A - \lambda I) = \{v : (A - \lambda I) v = 0\}.$$
The dimension of $E(\lambda)$ is called the geometric multiplicity of $\lambda$, written $\gamma(\lambda)$.

From the definition, a nonzero $v$ exists for a given $\lambda$ if and only if $\det(A - \lambda I) = 0$. It turns out that this determinant is a polynomial of degree $n$ in $\lambda$. This is called the characteristic polynomial $\chi(\lambda)$. By the fundamental theorem of algebra, $\chi(\lambda)$ has precisely $n$ roots, possibly complex, possibly multiple. The multiplicity of $\lambda$ as a root of $\chi(\lambda)$ is called the algebraic multiplicity of $\lambda$, written $\alpha(\lambda)$.

Example: The $n \times n$ identity matrix has $\chi(\lambda) = (1 - \lambda)^n$, and every vector is an eigenvector to eigenvalue 1. Algebraic and geometric multiplicity are both $n$.

Example: The matrix
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
has $\chi(\lambda) = (1 - \lambda)^2$, so $\lambda = 1$ has algebraic multiplicity 2. The only eigenvector is $(1, 0)^T$, so the geometric multiplicity is 1.
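The second example can be checked in Matlab-like code:

% the defective example: algebraic multiplicity 2, geometric multiplicity 1
A = [1 1; 0 1];
poly(A)                  % coefficients [1 -2 1], i.e. (lambda - 1)^2
E = null(A - 1*eye(2));  % the eigenspace E(1)
size(E, 2)               % 1 = geometric multiplicity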

We always have
$$1 \leq \gamma(\lambda) \leq \alpha(\lambda).$$
If $\gamma(\lambda) = \alpha(\lambda)$ for all $\lambda$, then $A$ is called diagonalizable. In this case, there is a basis of eigenvectors in which $A$ becomes a diagonal matrix:
$$V^{-1} A V = \Lambda \quad\Longleftrightarrow\quad A = V \Lambda V^{-1}.$$

Remark: In theoretical linear algebra courses, you spend a lot of time investigating what happens to matrices which are not diagonalizable. There is something called the Jordan Normal Form that leads to lots of interesting mathematics. For numerical purposes, this is completely irrelevant. We simply assume that every matrix is diagonalizable, which is actually pretty close to the truth: for a matrix with randomly chosen entries, the probability that it is not diagonalizable is 0.

Some more properties of eigenvalues:
- The product of all eigenvalues (with appropriate algebraic multiplicity) is the determinant.
- If $A$ is triangular, the eigenvalues are the numbers on the diagonal.
- If $H$ is Hermitian, the eigenvalues are real, and $H$ is diagonalizable.
- If $Q$ is unitary, the eigenvalues are on the complex unit circle: $|\lambda_i| = 1$, and $Q$ is diagonalizable.
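A Matlab-like sketch of diagonalization, and of why it is convenient for matrix powers (the matrix is an arbitrary symmetric example, so it is guaranteed diagonalizable):

% diagonalize A = V*Lambda*V^{-1}; then powers become cheap
A = [2 1; 1 2];
[V, Lambda] = eig(A);
norm(A - V*Lambda/V)        % ~0
A10 = V * Lambda^10 / V;    % A^10 via the eigendecomposition
norm(A10 - A^10)            % ~0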

1.5 The Inner Product

In addition to addition and scalar multiplication, many vector spaces also have an inner product: vector times vector = scalar. The standard inner product on $\mathbb{C}^n$ is
$$\langle v, w \rangle = \sum_i \bar{v}_i w_i,$$
where the bar denotes complex conjugation. For $\mathbb{R}^n$, just ignore the bar.

Any inner product produces a norm by
$$\|v\| = \sqrt{\langle v, v \rangle} = \sqrt{\sum_i |v_i|^2}.$$
A norm in general is a measure for the length or size of a vector. This one is called the Euclidean norm, and corresponds to the geometrical length of the vector.

If $v$, $w$ are real, then
$$\langle v, w \rangle = \|v\| \, \|w\| \cos\theta,$$
where $\theta$ is the angle between $v$ and $w$. In particular, $v$ and $w$ are orthogonal if $\langle v, w \rangle = 0$. For complex vectors you cannot talk about angles, but the definition of orthogonality is the same.

Once you have an inner product, you can define the adjoint $A^*$ of a linear map $A$ by
$$\langle A v, w \rangle = \langle v, A^* w \rangle.$$
For matrices, it turns out that $A^* = A^T$ ("$A$ transpose") if $A$ is real, or $A^* = A^H$ ("$A$ Hermitian transpose") if $A$ is complex. If $A$ has size $m \times n$, then $A^*$ has size $n \times m$, and
$$(A^T)_{ij} = A_{ji}, \qquad (A^H)_{ij} = \bar{A}_{ji}.$$
I frequently use a notation like $v = (1, 2, 3, 4)^T$. That means that $v$ is a column vector, but typesetting it like that would take up too much space on the page, so I write it as the transpose of a row vector.

Theorem 1.6. The adjoint has the following properties:
- $(AB)^* = B^* A^*$
- $(A^*)^{-1} = (A^{-1})^*$
- $\det(A^*) = \overline{\det(A)}$
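In Matlab, the operator ' computes the Hermitian transpose, so Theorem 1.6 can be spot-checked with random matrices:

% adjoint identities for random complex matrices
A = randn(3) + 1i*randn(3);
B = randn(3) + 1i*randn(3);
norm((A*B)' - B'*A')            % ~0
norm(inv(A') - inv(A)')         % ~0
abs(det(A') - conj(det(A)))     % ~0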

1.6 Special Matrix Types

1.6.1 Matrices With Certain Patterns of Zeros

Some matrices have certain patterns of zeros that make them easy to work with. These patterns do not usually stay the same under a basis transformation. In fact, many algorithms are based on finding a basis that makes a general matrix take such a shape. The notation $\star$ just means any entry that does not have to be zero (but it could be).

Diagonal:
$$\begin{pmatrix} \star & 0 & \cdots & 0 \\ 0 & \star & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \star \end{pmatrix}$$

Tridiagonal:
$$\begin{pmatrix} \star & \star & 0 & \cdots & 0 \\ \star & \star & \star & \ddots & \vdots \\ 0 & \star & \star & \ddots & 0 \\ \vdots & \ddots & \ddots & \ddots & \star \\ 0 & \cdots & 0 & \star & \star \end{pmatrix}$$
More generally, a banded matrix has several bands of numbers near the diagonal.

Triangular (upper or lower):
$$\begin{pmatrix} \star & \star & \cdots & \star \\ 0 & \star & & \vdots \\ \vdots & \ddots & \ddots & \star \\ 0 & \cdots & 0 & \star \end{pmatrix}$$

Hessenberg (upper or lower). This is triangular, with one more band along the diagonal:
$$\begin{pmatrix} \star & \star & \cdots & \cdots & \star \\ \star & \star & \star & & \vdots \\ 0 & \star & \star & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & \star \\ 0 & \cdots & 0 & \star & \star \end{pmatrix}$$
Hessenberg matrices are used for eigenvalue problems. It turns out that you cannot transform a general matrix into triangular form with identical eigenvalues in a finite number of steps, but you can get it into Hessenberg form. Then, you start an iterative method to wipe out the extra band. Note that a symmetric Hessenberg matrix is tridiagonal.

All of these matrix types can also be defined for non-square matrices. You just start at the top left and follow the diagonal, until you run into an edge. For example, matrices of the shapes
$$\begin{pmatrix} \star & \star & \star & \star \\ 0 & \star & \star & \star \end{pmatrix}, \qquad \begin{pmatrix} \star & \star \\ 0 & \star \\ 0 & 0 \end{pmatrix}$$
all count as upper triangular.
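The built-in hess() computes the Hessenberg form mentioned above, via a unitary similarity, so the eigenvalues are unchanged. A small sketch of the symmetric special case:

% Hessenberg reduction; for symmetric A the result is tridiagonal
A = randn(5); A = A + A';            % make A symmetric
H = hess(A);
norm(tril(H, -2))                    % ~0: nothing below the first subdiagonal
norm(H - H')                         % ~0: symmetric Hessenberg = tridiagonal
norm(sort(eig(A)) - sort(eig(H)))    % ~0: same eigenvalues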

1.6.2 Hermitian and Unitary Matrices

Here are two other special kinds of matrices that come up a lot in numerical analysis.

A matrix which is equal to its adjoint is called self-adjoint in general, but that word is usually only used in the infinite-dimensional setting. For real matrices, it is called symmetric, for complex matrices it is called Hermitian. For solving $Ax = b$ for a symmetric or Hermitian matrix, we only need half the storage space, and half the number of calculations. For eigenvalue problems of real symmetric matrices, we know that the eigenvalues and eigenvectors are real, so we don't need complex arithmetic. Also, the Hessenberg form becomes tridiagonal, which speeds up the algorithm significantly.

A square matrix whose columns are mutually orthogonal unit vectors is called orthogonal in the real case, and unitary in the complex case. Orthogonal matrices are very popular in computation for two reasons:
- $U^{-1} = U^*$, so we have the inverse handy if we need it.
- These matrices are extremely numerically stable (very little round-off error).

Since all eigenvalues of a unitary matrix are on the unit circle, the determinant also has absolute value 1. In the real case, the determinant is either 1 or $(-1)$. Determinant 1 means the matrix represents a rotation of the coordinate system. Determinant $(-1)$ means it is a rotation plus a reflection.

There are two special examples of orthogonal matrices that are worth mentioning. A permutation matrix has one 1 in each row and column, the rest are 0. This corresponds to a permutation of the basis vectors:
$$\begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$
If $A$ is any matrix, then $AP$ is $A$ with its columns in different order. $PA$ is $A$ with its rows in different order.

The general $2 \times 2$ orthogonal matrix is
$$\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \ (\det = 1), \qquad\text{or}\qquad \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix} \ (\det = -1).$$
This is a rotation by angle $\theta$ (and maybe a reflection).
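A Matlab-like sketch of these two examples (the angle and the permutation are arbitrary):

% a rotation matrix: Q' is its inverse, det(Q) = 1
theta = pi/6;
Q = [cos(theta) -sin(theta); sin(theta) cos(theta)];
norm(Q'*Q - eye(2))                % ~0
det(Q)                             % 1

% a permutation matrix built by permuting the rows of I
P = eye(4); P = P([3 1 4 2], :);
A = magic(4);
% P*A reorders the rows of A; A*P reorders the columns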

1.7 Block Matrices

You can partition matrices into blocks:
$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \\ 10 & 11 & 12 \end{pmatrix} = \begin{pmatrix} A & B & C \\ D & E & F \end{pmatrix}.$$
As long as the dimensions all fit together, you can add/multiply block matrices just like regular matrices:
$$\begin{pmatrix} A & B & C \\ D & E & F \end{pmatrix} \begin{pmatrix} G \\ H \\ I \end{pmatrix} = \begin{pmatrix} AG + BH + CI \\ DG + EH + FI \end{pmatrix}.$$
This is used extensively in parallel computing, for splitting the calculations among multiple processors.
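A Matlab-like check that block multiplication agrees with ordinary multiplication (the block sizes are chosen arbitrarily):

% build M and X from blocks, then compare the two products
A = randn(2,3); B = randn(2,3); C = randn(2,3);
D = randn(2,3); E = randn(2,3); F = randn(2,3);
G = randn(3,4); H = randn(3,4); I = randn(3,4);
M = [A B C; D E F];        % 4 x 9
X = [G; H; I];             % 9 x 4
norm(M*X - [A*G + B*H + C*I; D*G + E*H + F*I])   % ~0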