Introduction to Matrix Algebra

August 18, 2010

1 Vectors

1.1 Notations

A p-dimensional vector is p numbers put together, written as
\[
x = \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix}.
\]
When $p = 1$, this represents a point on the line. When $p = 2$, it represents a point in a plane; when $p = 3$, it represents a point in 3-dimensional space. For general $p$, it represents a point in $p$-dimensional Euclidean space, written as $\mathbb{R}^p$. Alternatively, it can also be interpreted as an arrow, or vector, starting from the origin and pointing to that point.

The transpose operation changes a column vector into a row vector and is written as $x'$; that is, $x' = (x_1, \ldots, x_p)$.

A number is also called a scalar. We can define the product between a vector $x \in \mathbb{R}^p$ and a scalar $c \in \mathbb{R}$ as follows:
\[
cx = c \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix} = \begin{pmatrix} cx_1 \\ \vdots \\ cx_p \end{pmatrix}.
\]
Draw a picture here.

We define the addition of two vectors $x$ and $y$ as follows:
\[
x + y = \begin{pmatrix} x_1 \\ \vdots \\ x_p \end{pmatrix} + \begin{pmatrix} y_1 \\ \vdots \\ y_p \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ \vdots \\ x_p + y_p \end{pmatrix}.
\]
Draw a picture here.
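These componentwise operations are easy to check numerically. A minimal sketch in Python with NumPy; the vectors here are arbitrary illustrations, not from the text:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # a vector in R^3
y = np.array([4.0, 5.0, 6.0])

print(2 * x)    # scalar multiplication: [2. 4. 6.]
print(x + y)    # vector addition:      [5. 7. 9.]
```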
1.2 Length and angle

Suppose $p = 2$. By the Pythagorean theorem, we know that the length of $x = (x_1, x_2)'$ is simply $\sqrt{x_1^2 + x_2^2}$. We can extend this concept to arbitrary $p$. The length of a vector $x \in \mathbb{R}^p$ is the length of the arrow from the origin to that point:
\[
\|x\| = \sqrt{x_1^2 + \cdots + x_p^2}.
\]
Note that, for any scalar $c \in \mathbb{R}$, $\|cx\| = |c|\,\|x\|$.

Now let us define the angle between two vectors. Again start with $p = 2$. Let $\theta$ be the angle between $x$ and $y$. Let $\theta_1$ be the angle between $x$ and the horizontal axis, and $\theta_2$ the angle between $y$ and the horizontal axis. Then
\[
\cos(\theta_1) = \frac{x_1}{\|x\|}, \quad \cos(\theta_2) = \frac{y_1}{\|y\|}, \quad \sin(\theta_1) = \frac{x_2}{\|x\|}, \quad \sin(\theta_2) = \frac{y_2}{\|y\|}.
\]
Now we can express the cosine of the angle $\theta$ in terms of $x$ and $y$, as follows:
\[
\cos(\theta) = \cos(\theta_2 - \theta_1) = \cos(\theta_2)\cos(\theta_1) + \sin(\theta_2)\sin(\theta_1) = \frac{x_1 y_1 + x_2 y_2}{\|x\|\,\|y\|}.
\]
This formula generalizes to the $p$-dimensional case as
\[
\cos(\theta) = \frac{x_1 y_1 + \cdots + x_p y_p}{\|x\|\,\|y\|}.
\]
We call $x_1 y_1 + \cdots + x_p y_p$ the inner product, or scalar product, between $x$ and $y$, and write it as $x'y$; that is,
\[
x'y = x_1 y_1 + \cdots + x_p y_p.
\]
In this notation,
\[
\cos(\theta) = \frac{x'y}{\|x\|\,\|y\|}.
\]
The number
\[
\theta = \cos^{-1}\!\left( \frac{x'y}{\|x\|\,\|y\|} \right)
\]
is defined as the angle between two vectors $x$ and $y$ in $\mathbb{R}^p$.

Draw a picture here.
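The length, inner product, and angle formulas translate directly into code. A minimal sketch in Python with NumPy; the helper name `angle` is ours, for illustration:

```python
import numpy as np

def angle(x, y):
    """Angle between x and y: arccos( x'y / (||x|| ||y||) ), in radians."""
    cos_theta = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return np.arccos(cos_theta)

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
print(np.linalg.norm(y))     # sqrt(2), the length of y
print(np.dot(x, y))          # inner product x'y = 1
print(angle(x, y))           # pi/4, i.e. 45 degrees
```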
Example. Let $x = (1, 3, 2)'$, $y = (-2, 1, -1)'$.

1. Find $3x$.
2. Find $x + y$.
3. Find the lengths of $x$ and $y$.
4. Find the cosine of the angle between $x$ and $y$.
5. Find the angle between $x$ and $y$.
6. Check that $\|3x\| = 3\|x\|$.

Solution:

1. $3x = 3(1, 3, 2)' = (3 \cdot 1,\; 3 \cdot 3,\; 3 \cdot 2)' = (3, 9, 6)'$.

2. $x + y = (1, 3, 2)' + (-2, 1, -1)' = (1 - 2,\; 3 + 1,\; 2 - 1)' = (-1, 4, 1)'$.

3. $\|x\| = \sqrt{1^2 + 3^2 + 2^2} = \sqrt{14} = 3.742$, $\|y\| = \sqrt{(-2)^2 + 1^2 + (-1)^2} = \sqrt{6} = 2.449$.

4. Let $\theta$ be the angle between $x$ and $y$. Then
\[
\cos(\theta) = \frac{(1)(-2) + (3)(1) + (2)(-1)}{\sqrt{14}\sqrt{6}} = -0.109.
\]

5. Now $\theta = \cos^{-1}(-0.109) = 1.680$ radians $= 96.26°$.

6. $3\|x\| = 3\sqrt{14}$, and $\|3x\| = \sqrt{3^2(14)} = 3\sqrt{14}$. Obviously, they are the same.
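The whole example can be verified numerically. A minimal sketch in Python with NumPy reproducing each step:

```python
import numpy as np

x = np.array([1.0, 3.0, 2.0])
y = np.array([-2.0, 1.0, -1.0])

print(3 * x)                      # [3. 9. 6.]
print(x + y)                      # [-1.  4.  1.]
print(np.linalg.norm(x))          # 3.7417  (= sqrt(14))
print(np.linalg.norm(y))          # 2.4495  (= sqrt(6))

cos_t = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(cos_t)                          # -0.1091
print(np.arccos(cos_t))               # 1.680 radians
print(np.degrees(np.arccos(cos_t)))   # 96.26 degrees
```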
1.3 Linear independence

A set of vectors $x_1, \ldots, x_k$ is said to be linearly dependent if there exist scalars $c_1, \ldots, c_k$, not all zero, such that
\[
c_1 x_1 + \cdots + c_k x_k = 0.
\]
Interpretation: if $k = 2$, this means the two vectors are proportional to each other; that is, they lie on the same line. When $k = 3$, this means the three points lie in the same plane.

The set of vectors $\{x_1, \ldots, x_k\}$ is linearly independent if it is not linearly dependent. That is, there do not exist $c_1, \ldots, c_k$, not all zero, such that the above holds. In other words, whenever the above holds, $c_1 = \cdots = c_k = 0$. Interpretation: the vectors do not reside in any lower-dimensional hyperplane.

Example. Are the following groups of vectors linearly dependent?

1. $x_1 = (1, 2, 3)'$, $x_2 = (1, 1, 1)'$, $x_3 = (0, 0, 1)'$.
2. $x_1 = (1, 2, 3)'$, $x_2 = (1, 1, 1)'$, $x_3 = (1, 3, 5)'$.
3. Illustrate your conclusions by vectors in a 3-dimensional space.

Solution:

1. Suppose
\[
c_1 x_1 + c_2 x_2 + c_3 x_3 = 0. \tag{1}
\]
Then
\[
c_1 \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} + c_2 \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + c_3 \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]
This is equivalent to
\begin{align}
c_1 + c_2 &= 0, \tag{2}\\
2c_1 + c_2 &= 0, \tag{3}\\
3c_1 + c_2 + c_3 &= 0. \tag{4}
\end{align}
From equation (2) we see that $c_2 = -c_1$. Substitute this into equation (3) to obtain $2c_1 - c_1 = c_1 = 0$. Substitute this into (2) to show that $c_2 = 0$. Substitute $c_1 = c_2 = 0$ into (4) to show that $c_3 = 0$. Thus we have shown that whenever (1) holds, $c_1 = c_2 = c_3 = 0$. That is, $x_1, x_2, x_3$ are linearly independent.

2. Similarly, equation (1) is equivalent to
\begin{align}
c_1 + c_2 + c_3 &= 0, \tag{5}\\
2c_1 + c_2 + 3c_3 &= 0, \tag{6}\\
3c_1 + c_2 + 5c_3 &= 0. \tag{7}
\end{align}
Subtracting (5) from (6) gives $c_1 + 2c_3 = 0$, so $c_1 = -2c_3$. Substituting this into (5) gives $c_2 = c_3$, and these values also satisfy (7). Thus equation (1) is satisfied whenever $c_1 = -2c_3$ and $c_2 = c_3$. For example, if $(c_1, c_2, c_3) = (-2, 1, 1)$, then equation (1) is satisfied. Hence $x_1, x_2, x_3$ are linearly dependent.

3. Draw two pictures here to illustrate whether the three points are in a lower-dimensional space or not.

1.4 Projection of a vector

Suppose we have two vectors $x, y \in \mathbb{R}^p$. The vector
\[
(x'y)(y'y)^{-1} y
\]
is called the projection of $x$ onto $y$. We write this as $P_y(x)$. Note that this vector is proportional to $y$. The length of this projection is
\[
\|P_y(x)\| = \frac{|x'y|}{\|y\|^2}\,\|y\| = \frac{|x'y|}{\|y\|} = \|x\|\,\frac{|x'y|}{\|x\|\,\|y\|} = \|x\|\,|\cos(\theta)|.
\]
Notice that $y/\|y\|$ is a vector with unit length, and
\[
P_y(x) = \frac{x'y}{\|y\|} \cdot \frac{y}{\|y\|} = \|x\| \cos(\theta)\, \frac{y}{\|y\|}.
\]
Draw a picture here.
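The projection formula is a one-liner in code. A minimal sketch in Python with NumPy, reusing the $x$ and $y$ from the example in Section 1.2; the helper name `project` is ours, for illustration:

```python
import numpy as np

def project(x, y):
    """Projection of x onto y: (x'y) / (y'y) * y."""
    return (np.dot(x, y) / np.dot(y, y)) * y

x = np.array([1.0, 3.0, 2.0])
y = np.array([-2.0, 1.0, -1.0])

p = project(x, y)
print(p)                                      # proportional to y
print(np.linalg.norm(p))                      # length of the projection
print(abs(np.dot(x, y)) / np.linalg.norm(y))  # |x'y| / ||y||, the same number
```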
2 Matrices

2.1 Notations

A matrix is a collection of numbers ordered by rows and columns. It is customary to enclose the elements of a matrix in parentheses, brackets, or braces. For example, the following is a matrix:
\[
X = \begin{pmatrix} 5 & 8 & 2 \\ 1 & 0 & 7 \end{pmatrix}.
\]
This matrix has two rows and three columns, so it is referred to as a 2 by 3 matrix. The elements of a matrix are numbered in the following way:
\[
X = \begin{pmatrix} X_{11} & X_{12} & X_{13} \\ X_{21} & X_{22} & X_{23} \end{pmatrix}.
\]
The first subscript refers to the row and the second subscript refers to the column: $X_{ij}$ denotes the element in the $i$th row and $j$th column of the matrix $X$.

A square matrix has as many rows as it has columns:
\[
A = \begin{pmatrix} 1 & 3 \\ 6 & 2 \end{pmatrix}.
\]
A symmetric matrix is a square matrix in which $A_{ij} = A_{ji}$ for all possible $i$ and $j$:
\[
A = \begin{pmatrix} 1 & 3 \\ 3 & 2 \end{pmatrix}.
\]
A diagonal matrix is a square matrix in which all the off-diagonal elements are 0. A diagonal matrix is always symmetric:
\[
A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}.
\]
An identity matrix is a diagonal matrix with 1s, and only 1s, on the diagonal. The $p \times p$ identity matrix is almost always denoted by $I_p$:
\[
I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
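These matrix types map directly onto NumPy constructors. A minimal sketch, using the example matrices above; note that NumPy indices start at 0, so the $(i,j)$ element of the text is `X[i-1, j-1]` in code:

```python
import numpy as np

X = np.array([[5, 8, 2],
              [1, 0, 7]])      # a 2 x 3 matrix
print(X.shape)                 # (2, 3)
print(X[0, 2])                 # element X_13 = 2

D = np.diag([1, 2])            # diagonal matrix with 1, 2 on the diagonal
I3 = np.eye(3)                 # the 3 x 3 identity matrix I_3
```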
2.2 Basic operations

Matrix addition: to add two matrices, both must have the same number of rows and the same number of columns. The elements of the two matrices are simply added together, element by element, to produce the result. Let $A$ and $B$ be two $m \times n$ matrices; then $(A + B)_{ij} = A_{ij} + B_{ij}$ for $1 \le i \le m$, $1 \le j \le n$. For example,
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} + \begin{pmatrix} 1 & 2 & 3 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 2 & 3 \\ 0 & 2 & 0 \end{pmatrix}.
\]
Matrix subtraction works in the same way.

Scalar multiplication: each element in the product matrix is simply the scalar multiplied by the corresponding element of the matrix. Let $c$ be a scalar and $A$ an $m \times n$ matrix; then $(cA)_{ij} = cA_{ij}$ for $1 \le i \le m$, $1 \le j \le n$. For example,
\[
2 \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \end{pmatrix}.
\]
Scalar multiplication is commutative: $cA = Ac$.

Multiplication of a row vector and a column vector: the row vector must have as many columns as the column vector has rows. The product is a scalar. Let
\[
b = (b_1, \ldots, b_p), \qquad a = \begin{pmatrix} a_1 \\ \vdots \\ a_p \end{pmatrix};
\]
then
\[
ba = (b_1, \ldots, b_p) \begin{pmatrix} a_1 \\ \vdots \\ a_p \end{pmatrix} = \sum_{i=1}^p b_i a_i.
\]
Multiplication of two matrices: the product of two matrices is defined only if the number of columns of the left matrix equals the number of rows of the right matrix. Let $A$ be $m \times p$ and $B$ be $p \times n$; then
\[
(AB)_{ij} = \sum_{l=1}^p A_{il} B_{lj}.
\]
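All of these operations are built into NumPy. A minimal sketch using the matrices from the addition example above; the matrix `C` is an arbitrary illustration added so the shapes are compatible for multiplication:

```python
import numpy as np

A = np.array([[1, 0, 0],
              [0, 1, 0]])
B = np.array([[1, 2, 3],
              [0, 1, 0]])

print(A + B)      # element-by-element addition
print(2 * A)      # scalar multiplication

C = np.array([[1, 2],
              [3, 4],
              [5, 6]])          # 3 x 2
print(A @ C)      # matrix product: (2 x 3)(3 x 2) = 2 x 2
```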
Let $a_i'$ be the $i$th row vector of $A$ and $b_j$ the $j$th column vector of $B$. Then $(AB)_{ij} = a_i' b_j$.

Some properties:

1. $A + B = B + A$ (commutativity).
2. $(cA)B = A(cB)$ (scalar associativity).
3. $c(A + B) = cA + cB$ (scalar distributivity).
4. $(AB)C = A(BC)$ (matrix associativity).
5. $(A + B)C = AC + BC$ (matrix right distributivity).
6. $C(A + B) = CA + CB$ (matrix left distributivity).
7. $AI = A$, $IB = B$.

In general, $AB \ne BA$.

Matrix transpose: the transpose of a matrix $A$ is denoted by $A'$ or $A^T$. The first row of the matrix becomes the first column of the transpose, the second row becomes the second column, etc.; that is, $(A')_{ij} = A_{ji}$. The transpose of a row vector is a column vector, and the transpose of a column vector is a row vector. The transpose of a symmetric matrix is simply the original matrix.

Matrix inverse: the inverse of a matrix $A$ is denoted by $A^{-1}$ and satisfies
\[
AA^{-1} = A^{-1}A = I.
\]
A matrix must be square to have an inverse, but not all square matrices have one; in some cases the inverse does not exist.

Trace of a matrix: denoted $\mathrm{tr}(A)$, the trace is defined only for square matrices and equals the sum of the diagonal elements. Let $A$ be $p \times p$; then $\mathrm{tr}(A) = \sum_{i=1}^p A_{ii}$.

Determinant: the determinant of $A$ is denoted by $\det(A)$ or $|A|$. Some properties:
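The transpose, inverse, trace, and determinant can all be computed with NumPy. A minimal sketch using the square matrix $A$ from Section 2.1:

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [6.0, 2.0]])

print(A.T)                 # transpose A'
Ainv = np.linalg.inv(A)    # inverse (exists since det(A) != 0)
print(A @ Ainv)            # approximately the identity matrix
print(np.trace(A))         # 1 + 2 = 3
print(np.linalg.det(A))    # 1*2 - 3*6 = -16
```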
1. A matrix is invertible if and only if its determinant is nonzero.

2. The determinant of a product of square matrices equals the product of their determinants: $\det(AB) = \det(A)\det(B)$.

3. $\det(A^{-1}) = [\det(A)]^{-1}$ and $\det(A') = \det(A)$.

4. Adding a multiple of any row to another row, or a multiple of any column to another column, does not change the determinant.

5. Interchanging two rows or two columns multiplies the determinant by $-1$.

6. The $2 \times 2$ matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ has determinant $\det(A) = ad - bc$.

7. The determinant $\det(A)$ can be viewed as the oriented area of the parallelogram with vertices at $(0, 0)$, $(a, b)$, $(a + c, b + d)$, and $(c, d)$. The oriented area is the same as the usual area, except that it is negative when the vertices are listed in clockwise order.

Let $A$ be an $n \times p$ matrix. The column rank of $A$ is the maximal number of linearly independent columns of $A$; the row rank of $A$ is the maximal number of linearly independent rows of $A$. Let $a_1, \ldots, a_p$ be the columns of $A$. For any $1 \le k \le p$, there are $\binom{p}{k}$ sets of $k$ columns. If at least one set of $k$ columns is linearly independent, then the column rank of $A$ is no less than $k$. Increase $k$ to $k + 1$: there are $\binom{p}{k+1}$ sets of $k + 1$ columns. If every such set is linearly dependent, then the column rank is exactly $k$. The row rank is always equal to the column rank, and their common value is called the rank of $A$. A square matrix $A$ of dimension $k \times k$ is a full rank matrix if its rank is $k$. A square matrix is invertible if and only if it is full rank (illustrated numerically below).

Orthogonal matrices: only square matrices may be orthogonal. An orthogonal matrix satisfies
\[
AA' = A'A = I.
\]
The inverse of an orthogonal matrix is the transpose of that matrix: $A' = A^{-1}$.
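Rank and orthogonality are easy to check numerically. A minimal sketch in Python with NumPy, reusing the vectors from the linear independence example in Section 1.3; the rotation matrix is an arbitrary illustration of an orthogonal matrix:

```python
import numpy as np

# Rank: the example vectors from Section 1.3, stacked as columns.
A1 = np.column_stack([[1, 2, 3], [1, 1, 1], [0, 0, 1]])
A2 = np.column_stack([[1, 2, 3], [1, 1, 1], [1, 3, 5]])
print(np.linalg.matrix_rank(A1))   # 3 -> columns linearly independent
print(np.linalg.matrix_rank(A2))   # 2 -> columns linearly dependent
print(np.linalg.det(A2))           # 0 (up to rounding): not invertible

# An orthogonal matrix: a 2-d rotation. Q Q' = I, so Q^{-1} = Q'.
t = 0.3
Q = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])
print(np.allclose(Q @ Q.T, np.eye(2)))   # True
```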
Eigenvalues and eigenvectors: a non-zero vector $q$ of dimension $n$ is an eigenvector of a square $n \times n$ matrix $A$ if and only if it satisfies the eigenvalue equation
\[
Aq = \lambda q,
\]
where $\lambda$ is a scalar called the eigenvalue corresponding to the eigenvector $q$. This yields the characteristic equation $p(\lambda) = 0$, where the characteristic polynomial is defined as
\[
p(\lambda) = \det(A - \lambda I).
\]
Eigenvalue decomposition: if $A$ is a $k \times k$ symmetric matrix and $(\lambda_1, e_1), \ldots, (\lambda_k, e_k)$ are its $k$ eigenpairs, then the following equality holds:
\[
A = \lambda_1 e_1 e_1' + \cdots + \lambda_k e_k e_k'.
\]
This is called the spectral decomposition of $A$, or eigen decomposition of $A$.

Let $A$ be a $k \times k$ square, symmetric matrix, let $(\lambda_1, e_1), \ldots, (\lambda_k, e_k)$ be the eigenpairs of $A$, and let $\alpha$ be a number. The following $k \times k$ matrix
\[
\lambda_1^\alpha e_1 e_1' + \cdots + \lambda_k^\alpha e_k e_k'
\]
is called the $\alpha$th power of $A$ and is written as $A^\alpha$. Notice that when $0 < \alpha < 1$, the above is defined only when $\lambda_1, \ldots, \lambda_k$ are nonnegative. In particular, if $\lambda_1 \ge 0, \ldots, \lambda_k \ge 0$ and $\alpha = 1/2$, then $A^{1/2}$ is called the square root of the matrix $A$. The matrix $A$ raised to the power $-1$ is the same as the inverse of $A$.

Definiteness: a symmetric $n \times n$ matrix $A$ is called positive definite if, for all nonzero vectors $x \in \mathbb{R}^n$, the associated quadratic form
\[
Q(x) = x'Ax
\]
takes only positive values. If the quadratic form takes only non-negative values, the symmetric matrix is called positive semi-definite; if the quadratic form takes only negative values, the matrix is called negative definite.
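The spectral decomposition and the matrix power construction can be verified numerically. A minimal sketch in Python with NumPy; the symmetric matrix `A` is an arbitrary illustration with eigenvalues 1 and 3:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # symmetric, eigenvalues 1 and 3

lam, E = np.linalg.eigh(A)        # eigenvalues and orthonormal eigenvectors
print(lam)                        # [1. 3.]

# Spectral decomposition: A = sum_i lam_i e_i e_i'
A_rebuilt = sum(l * np.outer(e, e) for l, e in zip(lam, E.T))
print(np.allclose(A, A_rebuilt))  # True

# Matrix square root via the alpha = 1/2 power (eigenvalues are positive)
A_half = sum(np.sqrt(l) * np.outer(e, e) for l, e in zip(lam, E.T))
print(np.allclose(A_half @ A_half, A))   # True

# Positive definiteness check: all eigenvalues > 0
print(np.all(lam > 0))            # True
```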