Chapter 1. Matrix Algebra


ST4233, Linear Models, Semester 1 2008-2009

1 Matrix and vector notation

Definition 1.1 A matrix is a rectangular or square array of numbers or variables. We use uppercase boldface letters, and in this module all elements of matrices will be real numbers or variables representing real numbers. The notation $A = (a_{ij})$ represents a matrix by means of a typical element, for example
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}.$$

Definition 1.2 A vector is a matrix with a single column. We use lowercase boldface letters for column vectors. Row vectors are expressed as transposes of column vectors, for example, $x' = (x_1, x_2, x_3)$.

2 Rank

Definition 2.1 A set of vectors $a_1, a_2, \ldots, a_n$ is said to be linearly dependent if scalars $c_1, c_2, \ldots, c_n$ (not all zero) can be found such that
$$c_1 a_1 + c_2 a_2 + \cdots + c_n a_n = 0.$$
If no coefficients $c_1, c_2, \ldots, c_n$ (not all zero) can be found that satisfy the above equation, the set of vectors $a_1, a_2, \ldots, a_n$ is said to be linearly independent.

Definition 2.2 The rank of a matrix is the number of linearly independent rows (equivalently, columns) in the matrix.

Theorem 2.1 (i) If the matrices $A$ and $B$ are conformable, then $\mathrm{rank}(AB) \le \mathrm{rank}(A)$ and $\mathrm{rank}(AB) \le \mathrm{rank}(B)$. (ii) If $B$ and $C$ are nonsingular, $\mathrm{rank}(AB) = \mathrm{rank}(CA) = \mathrm{rank}(A)$. (iii) For any matrix $A$, $\mathrm{rank}(A'A) = \mathrm{rank}(AA') = \mathrm{rank}(A') = \mathrm{rank}(A)$.

Proof: (i) All the columns of $AB$ are linear combinations of the columns of $A$. Consequently, the number of linearly independent columns of $AB$ is less than or equal to the number of linearly independent columns of $A$, and $\mathrm{rank}(AB) \le \mathrm{rank}(A)$. Similarly, all the rows of $AB$ are linear combinations of the rows of $B$, and therefore $\mathrm{rank}(AB) \le \mathrm{rank}(B)$.

(ii) If $B$ is nonsingular, then there exists a matrix $B^{-1}$ such that $BB^{-1} = I$. Then by part (i) we have $\mathrm{rank}(A) = \mathrm{rank}(ABB^{-1}) \le \mathrm{rank}(AB) \le \mathrm{rank}(A)$. The argument for $CA$ is analogous.

(iii) If $Ax = 0$, then $A'Ax = 0$, so the null space of $A$ is contained in that of $A'A$; since both matrices have the same number of columns, $\mathrm{rank}(A'A) \le \mathrm{rank}(A)$. Conversely, if $A'Ax = 0$, then $x'A'Ax = (Ax)'(Ax) = 0$, and thus $Ax = 0$; hence $\mathrm{rank}(A) \le \mathrm{rank}(A'A)$. So we have $\mathrm{rank}(A'A) = \mathrm{rank}(A)$. Applying the same result to $A'$ gives $\mathrm{rank}(AA') = \mathrm{rank}(A') = \mathrm{rank}(A)$.
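As a numerical sketch of Theorem 2.1 (iii) (this uses NumPy, which is an assumption of the illustration rather than part of the notes):

```python
import numpy as np

# A 4x3 matrix whose third column is the sum of the first two,
# so its rank is 2 rather than 3.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [2.0, 3.0, 5.0],
              [1.0, 1.0, 2.0]])

r = np.linalg.matrix_rank(A)

# Theorem 2.1 (iii): rank(A'A) = rank(AA') = rank(A)
r_gram = np.linalg.matrix_rank(A.T @ A)
r_outer = np.linalg.matrix_rank(A @ A.T)
```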

3 Orthogonal Matrices

Definition 3.1 A square matrix $A$ ($n \times n$) is said to be orthogonal if $AA' = I = A'A$. For orthogonal matrices, we have:
(1) $A' = A^{-1}$.
(2) $|A| = \pm 1$.
(3) Let $\delta_{ij} = 1$ for $i = j$ and $0$ for $i \ne j$ denote the Kronecker symbol. The column vectors $a_i$ and the row vectors $a_{(i)}$ of $A$ satisfy the conditions $a_i'a_j = \delta_{ij}$ and $a_{(i)}a_{(j)}' = \delta_{ij}$.
(4) $AB$ is orthogonal if $A$ and $B$ are orthogonal.

Theorem 3.1 If the matrix $Q$ ($p \times p$) is orthogonal and if $A$ is any $p \times p$ matrix, then
(i) $|Q| = +1$ or $-1$.
(ii) $|Q'AQ| = |A|$.
(iii) $-1 \le c_{ij} \le 1$, where $c_{ij}$ is any element of $Q$.
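A quick numerical sketch of Definition 3.1 and Theorem 3.1, using a rotation matrix as the orthogonal example (NumPy and the specific matrices are assumptions of this illustration, not part of the notes):

```python
import numpy as np

# A 2x2 rotation matrix is orthogonal.
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

QtQ = Q.T @ Q                        # Definition 3.1: should equal I
detQ = np.linalg.det(Q)              # Theorem 3.1 (i): +1 or -1

A = np.array([[2.0, 1.0],
              [0.5, 3.0]])           # an arbitrary 2x2 matrix
detQAQ = np.linalg.det(Q.T @ A @ Q)  # Theorem 3.1 (ii): equals |A|
```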

4 Eigenvalues and Eigenvectors

Definition 4.1 If $A$ ($p \times p$) is a square matrix, then $q(\lambda) = |A - \lambda I|$ is a polynomial of degree $p$ in $\lambda$. The $p$ roots $\lambda_1, \ldots, \lambda_p$ of the characteristic equation $q(\lambda) = |A - \lambda I| = 0$ are called the eigenvalues or characteristic roots of $A$. The eigenvalues are possibly complex numbers. Since $|A - \lambda_i I| = 0$, the matrix $A - \lambda_i I$ is singular. Hence, there exists a nonzero vector $\gamma_i \ne 0$ satisfying $(A - \lambda_i I)\gamma_i = 0$, that is, $A\gamma_i = \lambda_i\gamma_i$; $\gamma_i$ is called a (right) eigenvector of $A$ for the eigenvalue $\lambda_i$. If $\lambda_i$ is complex, then $\gamma_i$ may have complex components. An eigenvector $\gamma$ with real components is called standardized if $\gamma'\gamma = 1$.

Theorem 4.1 If $A$ is any $n \times n$ matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$, then
(i) $|A| = \prod_{i=1}^n \lambda_i$.
(ii) $\mathrm{trace}(A) = \sum_{i=1}^n \lambda_i$.

Theorem 4.2 Let $A$ be an $n \times n$ symmetric matrix.
(i) The eigenvalues of $A$ are all real.
(ii) The eigenvectors $x_1, x_2, \ldots, x_n$ of $A$ can be chosen to be mutually orthogonal; that is, $x_i'x_j = 0$ for $i \ne j$.
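A NumPy check of Theorems 4.1 and 4.2 on a small symmetric matrix (NumPy and the particular matrix are assumptions of this sketch):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # symmetric

lam, X = np.linalg.eigh(A)        # eigh is for symmetric matrices; lam is real

det_A = np.prod(lam)              # Theorem 4.1 (i): |A| = product of eigenvalues
tr_A = np.sum(lam)                # Theorem 4.1 (ii): trace = sum of eigenvalues
dot_12 = X[:, 0] @ X[:, 1]        # Theorem 4.2 (ii): eigenvectors orthogonal
```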

Theorem 4.3 If $\lambda$ is an eigenvalue of the square matrix $A$ with corresponding eigenvector $x$, and $g$ is a polynomial, then $g(\lambda)$ is an eigenvalue of the matrix $g(A)$, and $x$ is the corresponding eigenvector of $g(A)$ as well as of $A$.

Example 4.1
$$(A^3 + 4A^2 - 3A + 5I)x = A^3x + 4A^2x - 3Ax + 5x = (\lambda^3 + 4\lambda^2 - 3\lambda + 5)x.$$
Thus $\lambda^3 + 4\lambda^2 - 3\lambda + 5$ is an eigenvalue of $A^3 + 4A^2 - 3A + 5I$, and $x$ is the corresponding eigenvector.
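Example 4.1 can be verified numerically; a minimal NumPy sketch (the matrix is an assumed example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, X = np.linalg.eigh(A)
I = np.eye(2)

# g(A) = A^3 + 4A^2 - 3A + 5I, as in Example 4.1
gA = A @ A @ A + 4 * (A @ A) - 3 * A + 5 * I

x = X[:, 0]                                        # eigenvector for lam[0]
g_lam = lam[0]**3 + 4 * lam[0]**2 - 3 * lam[0] + 5
```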

5 Decomposition of Matrices

Theorem 5.1 (Spectral decomposition theorem) If $A$ is an $n \times n$ symmetric matrix with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ and normalized eigenvectors $x_1, x_2, \ldots, x_n$, then $A$ can be expressed as
$$A = CDC' = \sum_{i=1}^n \lambda_i x_i x_i',$$
where $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and $C$ is the orthogonal matrix $C = (x_1, x_2, \ldots, x_n)$.

Proof: See Rencher (2000, pp. 46-47).
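The spectral decomposition can be reproduced with NumPy; a sketch in which the symmetric matrix is an arbitrary assumed example:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])   # symmetric

lam, C = np.linalg.eigh(A)        # columns of C are normalized eigenvectors
D = np.diag(lam)

A_cdc = C @ D @ C.T               # A = C D C'
A_sum = sum(lam[i] * np.outer(C[:, i], C[:, i]) for i in range(3))
```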

Theorem 5.2 (Singular value decomposition) Suppose that $A$ is an $n \times p$ matrix of rank $r$. Then there exist two orthogonal matrices $P$ ($n \times n$) and $Q$ ($p \times p$) such that
$$A = P \begin{pmatrix} A_r & O \\ O & O \end{pmatrix} Q',$$
where $A_r = \mathrm{diag}(\lambda_1, \ldots, \lambda_r)$ with $\lambda_i > 0$ for $i = 1, 2, \ldots, r$, and $\lambda_1^2, \ldots, \lambda_r^2$ are the nonzero eigenvalues of $A'A$.
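Theorem 5.2 can be checked with `np.linalg.svd` (a sketch assuming NumPy; note that NumPy returns the factor $Q'$ directly, and the matrix below is an assumed rank-2 example):

```python
import numpy as np

# A 4x3 matrix of rank 2 (third column = first column + second column)
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])

P, s, Qt = np.linalg.svd(A)           # A = P diag(s) Q', s in descending order
r = int(np.sum(s > 1e-10))            # numerical rank

A_rec = (P[:, :len(s)] * s) @ Qt      # rebuild A from the decomposition

# The squared singular values are the eigenvalues of A'A (descending)
eig_gram = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
```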

6 Positive Definite and Positive Semidefinite Matrices

Definition 6.1 If a symmetric matrix $A$ has the property $y'Ay > 0$ for all possible $y$ except $y = 0$, then $A$ is said to be a positive definite matrix, denoted by $A > 0$. Similarly, if $y'Ay \ge 0$ for all $y$, then $A$ is said to be positive semidefinite, denoted by $A \ge 0$.

Theorem 6.1 Let $A$ be $n \times n$ with eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$.
(i) If $A$ is positive definite, then $\lambda_i > 0$ for $i = 1, 2, \ldots, n$.
(ii) If $A$ is positive semidefinite, then $\lambda_i \ge 0$ for $i = 1, 2, \ldots, n$. The number of nonzero eigenvalues is the rank of $A$.

If a matrix $A$ is positive definite, we can find a square root matrix
$$A^{1/2} = CD^{1/2}C',$$
where $D^{1/2} = \mathrm{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_n})$ and $C$ is the orthogonal matrix of normalized eigenvectors in the spectral decomposition of $A$. The matrix $A^{1/2}$ is symmetric and has the property $A^{1/2}A^{1/2} = (A^{1/2})^2 = A$.
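A sketch of the square root construction in NumPy (the positive definite matrix is an assumed example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # eigenvalues 1 and 3, so positive definite

lam, C = np.linalg.eigh(A)
A_half = C @ np.diag(np.sqrt(lam)) @ C.T   # A^{1/2} = C D^{1/2} C'
```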

7 Idempotent Matrices

Definition 7.1 A square matrix $A$ is called idempotent if it satisfies $A^2 = A$ and $A' = A$.

Theorem 7.1 Let $A$ be an $n \times n$ idempotent matrix with $\mathrm{rank}(A) = r \le n$. Then we have:
(i) The eigenvalues of $A$ are 1 or 0.
(ii) $\mathrm{tr}(A) = \mathrm{rank}(A) = r$.
(iii) If $A$ is of full rank $n$, then $A = I_n$.
(iv) If $A$ and $B$ are idempotent and if $AB = BA$, then $AB$ is also idempotent.
(v) If $A$ is idempotent and $P$ is orthogonal, then $P'AP$ is also idempotent.
(vi) If $A$ is idempotent, then $I - A$ is idempotent and $A(I - A) = (I - A)A = 0$.
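A standard statistical instance of an idempotent matrix is the hat matrix $H = X(X'X)^{-1}X'$ of a full-column-rank design matrix $X$; a NumPy sketch (the particular $X$ is an assumption of this illustration):

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])        # full column rank (rank 2)

H = X @ np.linalg.inv(X.T @ X) @ X.T    # symmetric and idempotent
lam_H = np.linalg.eigvalsh(H)           # Theorem 7.1 (i): eigenvalues 0 or 1
```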

8 Generalized inverse

Definition 8.1 Let $A$ be an $n \times p$ matrix. Then a $p \times n$ matrix $A^-$ is said to be a generalized inverse of $A$ if
$$AA^-A = A$$
holds. Note that when $A$ is nonsingular, $A^-$ is unique and $A^- = A^{-1}$. A generalized inverse is also called a conditional inverse. If $A$ is $n \times p$, any generalized inverse $A^-$ is $p \times n$.

Every matrix, whether square or rectangular, has a generalized inverse. For example, let
$$x = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}.$$
Then $x_1^- = (1, 0, 0, 0)$, $x_2^- = (0, 1/2, 0, 0)$, $x_3^- = (0, 0, 1/3, 0)$, and $x_4^- = (0, 0, 0, 1/4)$ are all generalized inverses of $x$.

Example 8.1 One generalized inverse of
$$A = \begin{pmatrix} 2 & 2 & 3 \\ 1 & 0 & 1 \\ 3 & 2 & 4 \end{pmatrix}$$
is
$$A^- = \begin{pmatrix} 0 & 1 & 0 \\ 1/2 & -1 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$
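Example 8.1 can be verified directly; a NumPy sketch:

```python
import numpy as np

A = np.array([[2.0, 2.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 2.0, 4.0]])   # singular: row 3 = row 1 + row 2

G = np.array([[0.0,  1.0, 0.0],
              [0.5, -1.0, 0.0],
              [0.0,  0.0, 0.0]])  # the A^- claimed in Example 8.1

AGA = A @ G @ A                   # should reproduce A
```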

Theorem 8.1 Suppose $A$ is $n \times p$ of rank $r$ and that $A$ is partitioned as
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
where $A_{11}$ is $r \times r$ of rank $r$. Then a generalized inverse of $A$ is given by
$$A^- = \begin{pmatrix} A_{11}^{-1} & O \\ O & O \end{pmatrix},$$
where the three $O$ matrices are of appropriate sizes so that $A^-$ is $p \times n$.

Proof: By multiplication of partitioned matrices, we obtain
$$AA^- = \begin{pmatrix} I & O \\ A_{21}A_{11}^{-1} & O \end{pmatrix}, \qquad AA^-A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{21}A_{11}^{-1}A_{12} \end{pmatrix}.$$
To show $A_{21}A_{11}^{-1}A_{12} = A_{22}$, we multiply $A$ by
$$B = \begin{pmatrix} I & O \\ -A_{21}A_{11}^{-1} & I \end{pmatrix}$$
and obtain
$$BA = \begin{pmatrix} A_{11} & A_{12} \\ O & A_{22} - A_{21}A_{11}^{-1}A_{12} \end{pmatrix}.$$
Since $B$ is nonsingular, $\mathrm{rank}(BA) = \mathrm{rank}(A) = r$. Since $A_{11}$ is nonsingular of rank $r$, the last $p - r$ columns of $BA$ must be linear combinations of its first $r$ columns; that is, there exists a matrix $D$ such that
$$\begin{pmatrix} A_{12} \\ A_{22} - A_{21}A_{11}^{-1}A_{12} \end{pmatrix} = \begin{pmatrix} A_{11} \\ O \end{pmatrix} D.$$
Hence $A_{22} - A_{21}A_{11}^{-1}A_{12} = O$, and the theorem is proved.

Corollary 8.1 Suppose $A$ is $n \times p$ of rank $r$ and that $A$ is partitioned as
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix},$$
where $A_{22}$ is $r \times r$ of rank $r$. Then a generalized inverse of $A$ is given by
$$A^- = \begin{pmatrix} O & O \\ O & A_{22}^{-1} \end{pmatrix},$$
where the three $O$ matrices are of appropriate sizes so that $A^-$ is $p \times n$.

The nonsingular submatrix need not be in the $A_{11}$ or $A_{22}$ position, as in the above theorem and corollary. In that case, we can perform row and column operations on $A$ to obtain an $r \times r$ nonsingular leading minor. That is, there exist nonsingular matrices $P$ ($n \times n$) and $Q$ ($p \times p$) such that
$$PAQ = B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix},$$
where $B_{11}$ is $r \times r$ with $\mathrm{rank}(B_{11}) = r$. It can be shown that
$$G = Q \begin{pmatrix} B_{11}^{-1} & O \\ O & O \end{pmatrix} P$$
is a generalized inverse of $A$.

More generally, we can find many generalized inverses of $A$ by reducing it to rank normal form via elementary row and column operations: there exist nonsingular matrices $P$ and $Q$ such that
$$PAQ = \begin{pmatrix} I_r & O \\ O & O \end{pmatrix}.$$
Then
$$G = Q \begin{pmatrix} I_r & G_{12} \\ G_{21} & G_{22} \end{pmatrix} P$$
is a generalized inverse of $A$, where $G_{12}$, $G_{21}$ and $G_{22}$ are arbitrary with conformable sizes. To see this, note that
$$A = P^{-1} \begin{pmatrix} I_r & O \\ O & O \end{pmatrix} Q^{-1},$$
so
$$AGA = P^{-1} \begin{pmatrix} I_r & O \\ O & O \end{pmatrix} Q^{-1} Q \begin{pmatrix} I_r & G_{12} \\ G_{21} & G_{22} \end{pmatrix} P P^{-1} \begin{pmatrix} I_r & O \\ O & O \end{pmatrix} Q^{-1}$$
$$= P^{-1} \begin{pmatrix} I_r & O \\ O & O \end{pmatrix} \begin{pmatrix} I_r & G_{12} \\ G_{21} & G_{22} \end{pmatrix} \begin{pmatrix} I_r & O \\ O & O \end{pmatrix} Q^{-1} = P^{-1} \begin{pmatrix} I_r & O \\ O & O \end{pmatrix} Q^{-1} = A.$$

From the above results, we can see that:
(1) There exist infinitely many generalized inverses of any singular matrix.
(2) $\mathrm{rank}(A^-) \ge \mathrm{rank}(A)$ (and we can choose $A^-$ with $\mathrm{rank}(A^-) = \mathrm{rank}(A)$).
(3) If $A$ is symmetric, $A^-$ need not be symmetric.

In practice, we do not need to carry out row and column operations to obtain an $r \times r$ nonsingular leading minor in order to find a generalized inverse. A general algorithm for finding a generalized inverse $A^-$ of any $n \times p$ matrix $A$ of rank $r$ (Searle 1982, p. 218) is as follows.
1. Find any nonsingular $r \times r$ submatrix $C$. It is not necessary that the elements of $C$ occupy adjacent rows and columns in $A$.
2. Find $C^{-1}$ and $(C^{-1})'$.
3. Replace the elements of $C$ by the elements of $(C^{-1})'$.
4. Replace all other elements in $A$ by zeros.
5. Transpose the resulting matrix.
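The five-step algorithm above can be sketched in NumPy; `rows` and `cols` (the location of the nonsingular submatrix $C$) are inputs the user must supply, and the function name is ours, not Searle's:

```python
import numpy as np

def g_inverse(A, rows, cols):
    """Generalized inverse via Searle's algorithm, given the row and column
    indices of a nonsingular r x r submatrix C of A."""
    C = A[np.ix_(rows, cols)]            # step 1: the chosen submatrix
    Cinv_t = np.linalg.inv(C).T          # step 2: C^{-1} and (C^{-1})'
    W = np.zeros_like(A, dtype=float)    # step 4: zeros elsewhere
    W[np.ix_(rows, cols)] = Cinv_t       # step 3: put (C^{-1})' in C's place
    return W.T                           # step 5: transpose

A = np.array([[2.0, 2.0, 3.0],
              [1.0, 0.0, 1.0],
              [3.0, 2.0, 4.0]])          # rank 2
G = g_inverse(A, rows=[0, 1], cols=[0, 1])
```

Applied to the matrix of Example 8.1 with $C$ taken as its leading $2 \times 2$ submatrix, this reproduces the generalized inverse given there.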

Some properties of generalized inverses are given in the following theorem; first we need a lemma.

Lemma 8.1 (i) For any matrix $A$ we have $A'A = O$ if and only if $A = O$. (ii) Let $X \ne O$ be an $m \times n$ matrix and let $A$ and $B$ be $n \times n$ matrices. Then $AX'X = BX'X$ if and only if $AX' = BX'$.

Proof: (i) If $A = O$, we have $A'A = O$. Conversely, if $A'A = O$, let $A = (a_1, \ldots, a_n)$ be the columnwise partition; then $A'A = (a_i'a_j) = O$, so that each $a_i'a_i = 0$, giving $a_i = 0$ and $A = O$.

(ii) If $AX' = BX'$, we have $AX'X = BX'X$. Conversely, if $AX'X = BX'X$, we have $(A - B)X'X(A' - B') = O$, that is, $[(A - B)X'][(A - B)X']' = O$. By part (i), $(A - B)X' = O$, i.e., $AX' = BX'$.

Theorem 8.2 Let $A$ be $n \times p$ of rank $r$, let $A^-$ be any generalized inverse of $A$, and let $(A'A)^-$ be any generalized inverse of $A'A$. Then:
(i) $\mathrm{rank}(A^-A) = \mathrm{rank}(AA^-) = \mathrm{rank}(A) = r$.
(ii) $(A^-)'$ is a generalized inverse of $A'$; that is, $(A')^- = (A^-)'$.
(iii) $A = A(A'A)^-A'A$ and $A' = A'A(A'A)^-A'$.
(iv) $(A'A)^-A'$ is a generalized inverse of $A$; that is, $A^- = (A'A)^-A'$.
(v) $A(A'A)^-A'$ is symmetric, has rank $r$, and is invariant to the choice of $(A'A)^-$; that is, $A(A'A)^-A'$ remains the same no matter which $(A'A)^-$ is used.

Proof: (i) The result follows from $\mathrm{rank}(A) = \mathrm{rank}(AA^-A) \le \mathrm{rank}(A^-A) \le \mathrm{rank}(A)$, and similarly for $AA^-$.
(ii) Transposing $AA^-A = A$ gives $A'(A^-)'A' = A'$; hence $(A^-)'$ is a generalized inverse of $A'$.
(iii) Let $B = A(A'A)^-A'A - A$. Expanding $B'B$ and using the fact that both $(A'A)^-$ and $((A'A)^-)'$ are generalized inverses of the symmetric matrix $A'A$, every term reduces to $\pm A'A$ and the sum is $B'B = O$. By Lemma 8.1(i), $B = O$, so $A = A(A'A)^-A'A$. The second identity follows by transposing the first, since $((A'A)^-)'$ is also a generalized inverse of $A'A$.
(iv) It follows from (iii): $A[(A'A)^-A']A = A$.
(v) Since $(A'A)^-A'$ is a generalized inverse of $A$ by (iv), part (i) gives $\mathrm{rank}(A(A'A)^-A') = \mathrm{rank}(AA^-) = r$. Let $G_1$ and $G_2$ be any two generalized inverses of $A'A$; from (iii) we have $AG_1A'A = AG_2A'A = A$. By Lemma 8.1(ii), $AG_1A' = AG_2A'$, which means that $A(A'A)^-A'$ is invariant to the choice of $(A'A)^-$. In particular, taking $G_2 = G_1'$ shows that $A(A'A)^-A'$ equals its own transpose, so it is symmetric.

Definition 8.2 (Moore-Penrose inverse) A matrix $A^+$ satisfying the following conditions is called the Moore-Penrose inverse of $A$:
(i) $AA^+A = A$,
(ii) $A^+AA^+ = A^+$,
(iii) $(A^+A)' = A^+A$,
(iv) $(AA^+)' = AA^+$.

$A^+$ is unique.

Proof (uniqueness of the Moore-Penrose inverse): Suppose that $X$ and $Y$ both satisfy the four conditions. Then
$$X = XAX = X(AX)' = XX'A' = XX'(AYA)' = X(AX)'(AY)' = (XAX)AY = XAY$$
$$= (XA)'(YA)'Y = A'X'A'Y'Y = A'Y'Y = (YA)'Y = YAY = Y.$$
This completes the proof.
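NumPy's `pinv` computes the Moore-Penrose inverse; a sketch verifying the four defining conditions on an assumed singular matrix:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])   # rank 2

Ap = np.linalg.pinv(A)

# The four Penrose conditions:
c1 = np.allclose(A @ Ap @ A, A)
c2 = np.allclose(Ap @ A @ Ap, Ap)
c3 = np.allclose((Ap @ A).T, Ap @ A)
c4 = np.allclose((A @ Ap).T, A @ Ap)
```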

$A^+$ is a special case of $A^-$: it possesses all of the properties that $A^-$ has. In addition, it is unique and possesses the following properties.

Theorem 8.3 Let $A$ be an $m \times n$ matrix. Then
(i) When $A$ is nonsingular, $A^+ = A^{-1}$.
(ii) $(A^+)^+ = A$.
(iii) $(A^+)' = (A')^+$.
(iv) $\mathrm{rank}(A) = \mathrm{rank}(A^+) = \mathrm{rank}(A^+A) = \mathrm{rank}(AA^+)$.
(v) If $\mathrm{rank}(A) = m$, then $A^+ = A'(AA')^{-1}$ and $AA^+ = I_m$.
(vi) If $\mathrm{rank}(A) = n$, then $A^+ = (A'A)^{-1}A'$ and $A^+A = I_n$.
(vii) If $P$ is an $m \times m$ orthogonal matrix and $Q$ is an $n \times n$ orthogonal matrix, then $(PAQ)^+ = Q^{-1}A^+P^{-1} = Q'A^+P'$.
(viii) $(A'A)^+ = A^+(A')^+$ and $(AA')^+ = (A')^+A^+$.
(ix) $A^+ = (A'A)^+A' = A'(AA')^+$.

Theorem 8.4 For any matrix $A$, $A^+$ exists and is unique. Suppose that $A$ is expressed as in the singular value decomposition theorem (Theorem 5.2). Then
$$A^+ = Q \begin{pmatrix} A_r^{-1} & O \\ O & O \end{pmatrix} P'.$$

Proof: It can easily be verified that the $A^+$ defined in the theorem satisfies all four conditions in the definition of the Moore-Penrose inverse.

Theorem 8.5 Let $A$ be an $n \times n$ symmetric matrix, $a \in \mathcal{R}(A)$, $b \in \mathcal{R}(A)$, and assume $1 + b'A^+a \ne 0$. Then
$$(A + ab')^+ = A^+ - \frac{A^+ab'A^+}{1 + b'A^+a}.$$

9 Generalized Inverse and Systems of Equations

Theorem 9.1 The system of equations $Ax = c$ has a solution (is consistent) if and only if $AA^-c = c$ for any generalized inverse $A^-$ of $A$.

Proof: (necessity) If the system is consistent, there exists a solution $x$ with $Ax = c$. Since $AA^-A = A$, we have $AA^-Ax = Ax$. Substituting $Ax = c$ on both sides gives $AA^-c = c$.
(sufficiency) If $AA^-c = c$, then a solution exists, namely $x = A^-c$, so the system is consistent.

Theorem 9.2 If the system of equations $Ax = c$ is consistent, then all possible solutions can be obtained in the following two ways.
(a) Use all possible values of $A^-$ in $x = A^-c$.
(b) Use a specific $A^-$ in $x = A^-c + (I - A^-A)z$, and use all possible values of the arbitrary vector $z$.
In particular, we may write $x = A^+c + (I - A^+A)z$. Among all the solutions, $x_0 = A^+c$ has the smallest length.

Proof: Part (a): see Searle (1982, p. 238). For the last claim of part (b), write $x = A^+c + (I - A^+A)z = x_0 + (I - A^+A)z$. Then
$$\|x\|^2 = [x_0 + (I - A^+A)z]'[x_0 + (I - A^+A)z] = \|x_0\|^2 + z'(I - A^+A)^2z + 2c'(A^+)'(I - A^+A)z = \|x_0\|^2 + z'(I - A^+A)^2z \ge \|x_0\|^2.$$
This is because $(A^+)'(I - A^+A) = O$, and $z'(I - A^+A)^2z \ge 0$ for arbitrary $z$. The result follows.
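Theorems 9.1 and 9.2 can be illustrated with NumPy, using $A^+$ (via `np.linalg.pinv`) as the generalized inverse; the matrix, right-hand side, and random $z$ below are assumptions of the sketch:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])   # singular, rank 2
c = np.array([1.0, 2.0, 3.0])    # row 3 = row 1 + row 2 and c3 = c1 + c2

Ap = np.linalg.pinv(A)

# Theorem 9.1 consistency check, with A^- = A^+
consistent = np.allclose(A @ Ap @ c, c)

x0 = Ap @ c                       # minimum-norm solution (Theorem 9.2)
rng = np.random.default_rng(0)
z = rng.standard_normal(3)
x = x0 + (np.eye(3) - Ap @ A) @ z # another solution, from Theorem 9.2 (b)
```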

10 Derivatives of Linear Functions and Quadratic Forms

Let $u = f(x)$ be a function of the variables $x_1, x_2, \ldots, x_p$ in $x = (x_1, x_2, \ldots, x_p)'$, and let $\partial u/\partial x_1, \ldots, \partial u/\partial x_p$ be the partial derivatives. We define $\partial u/\partial x$ as
$$\frac{\partial u}{\partial x} = \begin{pmatrix} \partial u/\partial x_1 \\ \partial u/\partial x_2 \\ \vdots \\ \partial u/\partial x_p \end{pmatrix}.$$

Theorem 10.1 Let $u = a'x = x'a$, where $a = (a_1, a_2, \ldots, a_p)'$ is a vector of constants. Then
$$\frac{\partial u}{\partial x} = \frac{\partial(a'x)}{\partial x} = \frac{\partial(x'a)}{\partial x} = a.$$

Theorem 10.2 Let $u = x'Ax$, where $A$ is a symmetric matrix of constants. Then
$$\frac{\partial u}{\partial x} = \frac{\partial(x'Ax)}{\partial x} = 2Ax.$$
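Theorems 10.1 and 10.2 can be checked against a central-difference gradient; `num_grad` and the specific $A$, $a$, $x$ are our assumptions, not part of the notes:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # symmetric
a = np.array([1.0, -2.0])
x = np.array([0.5, 1.5])

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

grad_linear = num_grad(lambda v: a @ v, x)    # Theorem 10.1: should equal a
grad_quad = num_grad(lambda v: v @ A @ v, x)  # Theorem 10.2: should equal 2Ax
```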