Matrix Vector Products


We covered these notes in the tutorial sessions. I strongly recommend that you further read the presented material in classical books on linear algebra. Please make sure that you understand the proofs and that you can regenerate them. Should you have any questions, please feel free to contact the TAs.

Matrix Vector Products

Let $A$ be a matrix in $\mathbb{R}^{n \times n}$ whose $(i,j)$-th element is denoted by $a_{ij}$, and let $v$ be a vector in $\mathbb{R}^n$ whose $i$-th element is denoted by $v_i$. Note that the elements of $A$ and $v$ are real numbers. The length of $v$, denoted by $\|v\|$, is defined as
$$\|v\| = \sqrt{\sum_{i=1}^{n} v_i^2}.$$
The matrix-vector product $Av$ is a vector in $\mathbb{R}^n$ defined as follows:
$$[Av]_i = \sum_{j=1}^{n} a_{ij} v_j.$$
The relationship between $v$ and $Av$ is the focus of our discussion in this section. There are two basic questions that one needs to deal with when $A$ is multiplied by $v$: what happens to $v$, and what happens to $A$? The former question is that of deforming a vector, whereas the latter is that of projecting several points. The answer to each question provides us with a different interpretation of $Av$.

When a vector is multiplied by a matrix, its orientation changes and its length gets scaled, as the following example illustrates.

Example 1: Let $A$ be given by
$$A = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \qquad (1)$$
and take
$$v = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}.$$
Note that $v$ has unit length and makes an angle of $\pi/4$ with the $x$-axis. The vector $Av$, however, has length $\|Av\| = \sqrt{5/2}$ and orientation $\arctan(1/2) < \pi/4$. The amount by which the length and orientation of $Av$ differ from those of $v$ is related to properties of $A$. Hence, $Av$ is sometimes called the image of $v$ under $A$.

Now let $A$ represent a set of $m$ observations (rows of $A$) made over $n$ features (columns of $A$). Then $Ax$ represents a projection of the $m$ samples of $A$ onto $x$; in other words, it is a one-dimensional representation of $A$. To retrieve the coordinates of $Ax$ in $\mathbb{R}^n$, one needs to convert each element of the image into a vector along the direction of $x$. Note that the element $[Ax]_i$ is the length of the $i$-th row of $A$ projected onto $x$. The coordinates of the projected data in the original coordinate system are given by $Axx^T$. See Fig. 1 for more details.

Eigenvalues and Eigenvectors

A scalar $\lambda$ is called an eigenvalue of $A$ if there exists $v \in \mathbb{R}^n$ with $v \neq 0$ such that
$$Av = \lambda v,$$
which implies $(A - \lambda I)v = 0$.
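As a quick numerical check of Example 1 and of the eigenvalue definition above, the short MATLAB sketch below (illustrative only, not part of the original notes) computes $Av$, its length, and its orientation, and lists the eigenvalues of $A$ used in the next section.

% Numerical check of Example 1 (illustrative sketch)
A  = [2 0; 0 1];
v  = [1/sqrt(2); 1/sqrt(2)];
Av = A*v;
norm(v)              % 1: v has unit length
norm(Av)             % sqrt(5/2): the image is stretched
atan(v(2)/v(1))      % pi/4: orientation of v
atan(Av(2)/Av(1))    % atan(1/2) < pi/4: orientation of Av
eig(A)               % eigenvalues 1 and 2 of A (see below)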

Figure 1: Different interpretations of $Av$. The red circles are two observations contained in $A$. Their projection onto the unit-length vector $v = [1/\sqrt{2},\ 1/\sqrt{2}]^T$ is given by the green stars ($Avv^T$). Therefore, $Av$ gives us the lengths of the projected points on $v$. Furthermore, $Av$ can also be viewed as a vector, which is called the image of $v$ under $A$. $Av$ is in general a vector with a different length and orientation than those of $v$; in fact, if $v$ does not rotate under $A$, then it must be an eigenvector (see the section on eigenvalues). If one images $Av$ under $A$, the new image $A^2v$ is further stretched and rotated. In this particular case, the length of $Av$ is $\sqrt{5/2}$ and the length of $A^2v$ is $\sqrt{17/2}$; $A^2v$ is more oriented toward the $x$-axis. If we keep imaging $v$ under $A$, we approach the eigenvector of $A$ associated with its largest eigenvalue; one can see that $A^{10}v$ is almost aligned with the $x$-axis.

The vector $v$ is called a right eigenvector of $A$. Similarly, a vector satisfying $v^T A = \lambda v^T$ is a left eigenvector of $A$. From a geometrical point of view, an eigenvector is one that does not change its orientation when imaged under $A$ (see Fig. 1); it only gets scaled by a factor called the eigenvalue. Note that $Av$ and $\lambda v$ have the same orientation and differ only by a constant factor. Furthermore, note that any $\lambda$ at which $A - \lambda I$ becomes singular is an eigenvalue of $A$. If $A$ has $n$ distinct eigenvalues $\{\lambda_1, \ldots, \lambda_n\}$, we have
$$\det(\lambda I - A) = (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n).$$
The right-hand side of the above equation is called the characteristic polynomial of $A$. In fact, we can find the eigenvalues of $A$ by finding the roots of the characteristic polynomial. This gives us an analytical approach to finding the eigenvalues of small matrices or matrices with special structure; it is not, however, computationally efficient, and it is not how numerical solvers find eigenvalues. An eigenvector $v_i$ of $A$ belongs to the null space of $A - \lambda_i I$, which means that it satisfies the equality $(A - \lambda_i I)v_i = 0$.
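To make the connection with the characteristic polynomial concrete, the following MATLAB sketch (illustrative only, reusing the matrix of Example 1) compares the roots of the characteristic polynomial with the output of a numerical eigensolver.

% Characteristic polynomial vs. numerical eigensolver (illustrative sketch)
A = [2 0; 0 1];
p = poly(A)          % coefficients of det(lambda*I - A) = lambda^2 - 3*lambda + 2
roots(p)             % roots of the characteristic polynomial: 2 and 1
eig(A)               % a numerical solver returns the same values without forming p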

If the eigenvalues are distinct, their corresponding eigenvectors are unique up to a scaling factor, i.e., there is a unique direction along which $(A - \lambda_i I)v_i = 0$ holds.

Example 2: Let $A$ be given by Eq. (1). The eigenvalues of $A$ are the roots of $\det(A - \lambda I) = (\lambda - 2)(\lambda - 1)$, which are $\lambda_1 = 2$ and $\lambda_2 = 1$. The eigenvectors associated with $\lambda_1$ are obtained from
$$(A - \lambda_1 I)\begin{bmatrix} v_{11} \\ v_{12} \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} v_{11} \\ v_{12} \end{bmatrix} = 0 \;\Longrightarrow\; v_{12} = 0,\ v_{11} = 1,$$
where $v_{11} = 1$ is chosen so that the eigenvector has unit length. Note that
$$A^m v = \begin{bmatrix} 2^m & 0 \\ 0 & 1 \end{bmatrix} v.$$
One can see that $v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ is the limit of $A^m v / \|A^m v\|$ as $m \to \infty$. This concept is illustrated in Fig. 1.

We now consider cases where an eigenvalue $\lambda_r$ is repeated. This happens when the characteristic polynomial has a repeated root:
$$\det(\lambda I - A) = (\lambda - \lambda_r)^m (\lambda - \lambda_1)\cdots(\lambda - \lambda_{n-m}), \qquad m > 1.$$
Here $m$ is called the algebraic multiplicity of $\lambda_r$. In this case, the eigenvector satisfying $(A - \lambda_r I)v = 0$ may no longer be unique. More precisely, if $A - \lambda_r I$ has rank $n - m$, then there are $m$ independent vectors that satisfy $(\lambda_r I - A)v = 0$. If this occurs, the number of independent (and, in fact, orthogonal) eigenvectors associated with $\lambda_r$, also called the geometric multiplicity, is equal to $m$. However, if $A - \lambda_r I$ has rank $k > n - m$, we cannot find $m$ independent vectors $v$ that satisfy $(A - \lambda_r I)v = 0$. In this case, we need to introduce generalized eigenvectors to complete a basis for $A$. In what follows, let $v_1$ denote the regular eigenvector that satisfies $(A - \lambda_r I)v = 0$.

Example 3: Let $A$ be the identity matrix
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}.$$
The eigenvalues of $A$ are given by $\lambda_1 = \lambda_2 = 1$. Therefore, $1$ is a repeated eigenvalue with multiplicity $2$. The eigenvectors of $A$ must satisfy
$$(A - 1I)v = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} v = 0.$$
One can see that any nonzero vector in $\mathbb{R}^2$ is an eigenvector of $A$. We can choose $v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$ to form an orthogonal basis.
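The rank criterion above is easy to check numerically. The sketch below (illustrative only) verifies that for the identity matrix of Example 3 the algebraic and geometric multiplicities of $\lambda = 1$ are both equal to 2.

% Algebraic vs. geometric multiplicity for Example 3 (illustrative sketch)
A = eye(2);
lambda = eig(A)                      % 1 is repeated: algebraic multiplicity m = 2
g = size(A,1) - rank(A - 1*eye(2))   % dim null(A - I) = 2 = geometric multiplicity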

Example 4: Let $A$ be given by
$$A = \begin{bmatrix} 0 & 1 \\ -1 & 2 \end{bmatrix}.$$
Again, $A$ has a repeated eigenvalue at $1$ with multiplicity $2$. The associated eigenvectors must satisfy
$$(A - 1I)v = \begin{bmatrix} -1 & 1 \\ -1 & 1 \end{bmatrix} v = 0.$$
In this case, there is a unique eigenvector (up to a scaling factor) $v_1 = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$ for $\lambda = 1$. This means that $A$ is a defective matrix and we need to use generalized eigenvectors to complete a basis for $A$. One can show that $(A - 1I)v_2 = v_1$ yields
$$v_2 = \frac{1}{2\sqrt{2}}\begin{bmatrix} -1 \\ 1 \end{bmatrix}.$$
It can be verified that $(A - I)v_2 = v_1$ and that $Q = [v_1\ v_2]$ forms a basis for $A$.

Now we can prove an important property of symmetric matrices.

Lemma 1: There are no generalized eigenvectors for symmetric matrices.

Proof: Assume otherwise. Then there exist vectors $v_1$ and $v_2$ for some symmetric matrix $A$ such that
$$(A - \lambda_r I)v_2 = v_1$$
for some repeated eigenvalue $\lambda_r$, where $v_1$ is an eigenvector of $A$ associated with $\lambda_r$. However, this implies that
$$v_1^T (A - \lambda_r I) v_2 = v_1^T v_1,$$
or
$$v_1^T A v_2 - \lambda_r v_1^T v_2 = v_1^T v_1. \qquad (2)$$
Note that for a general matrix $A$, if $Av = \lambda v$, we have $v^T A^T = \lambda v^T$. This means that a right eigenvector of $A$ is always a left eigenvector of $A^T$. However, if $A$ is symmetric, we have
$$(Av_1)^T = v_1^T A^T = v_1^T A = \lambda_r v_1^T.$$
Replacing $v_1^T A$ in the first term of Eq. (2) by $\lambda_r v_1^T$ yields
$$v_1^T v_1 = 0,$$
which implies $v_1 = 0$. However, this is a contradiction because $v_1 = 0$ cannot be an eigenvector. Therefore, $A$ cannot have generalized eigenvectors. QED
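The vectors of Example 4 can be verified directly; the sketch below (illustrative only) checks that $v_1$ is an ordinary eigenvector and that $v_2$ satisfies $(A - I)v_2 = v_1$. Note that this $A$ is not symmetric, consistent with Lemma 1.

% Verifying the generalized eigenvector of Example 4 (illustrative sketch)
A  = [0 1; -1 2];
v1 = [1; 1]/sqrt(2);           % ordinary eigenvector for the repeated eigenvalue 1
v2 = [-1; 1]/(2*sqrt(2));      % generalized eigenvector
(A - eye(2))*v1                % returns [0; 0]
(A - eye(2))*v2                % returns v1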

Diagonalizing Symmetric Matrices

In the first homework, you were asked to prove that any matrix with a set of $n$ linearly independent eigenvectors $Q = [v_1 \cdots v_n]$ is diagonalizable, i.e., can be decomposed into $A = Q\Sigma Q^{-1}$. In general, defective matrices are not diagonalizable. (A similar decomposition exists for defective matrices, of the form $A = QJQ^{-1}$, where $J$ contains the eigenvalues of $A$ and is in Jordan normal form.) Therefore, using Lemma 1, one can conclude that symmetric matrices are always diagonalizable. The following lemma proves a stronger property of symmetric matrices.

Lemma 2: Symmetric matrices are unitarily diagonalizable, i.e., they can be decomposed into $A = Q\Sigma Q^T$.

Proof: Lemma 1 implies that $Q$ contains $n$ linearly independent eigenvectors. This means the decomposition $A = Q\Sigma Q^{-1}$ exists (homework 1). Therefore, it suffices to prove that $Q$ is orthogonal, i.e., $Q^{-1} = Q^T$. To do so, one needs to prove that the columns of $Q$ are orthogonal, i.e., $v_i^T v_j = 0$ for $i \neq j$, with $\|v_i\| = 1$ (why does this imply $Q^{-1} = Q^T$?). Two cases are to be considered: $\lambda_j \neq \lambda_i$ and $\lambda_j = \lambda_i$.

Case 1: $\lambda_i \neq \lambda_j$. It follows from the definition of eigenvectors that
$$Av_i = \lambda_i v_i,\quad Av_j = \lambda_j v_j \;\Longrightarrow\; v_j^T A v_i = \lambda_i v_j^T v_i,\quad v_i^T A v_j = \lambda_j v_i^T v_j.$$
Since $A$ is symmetric, the left-hand sides of the last two equations are transposes of each other. Therefore,
$$(v_i^T A v_j)^T = \lambda_j v_j^T v_i = \lambda_i v_j^T v_i.$$
This implies that $(\lambda_i - \lambda_j)\, v_j^T v_i = 0$. Given that $\lambda_i \neq \lambda_j$, it must hold that $v_i^T v_j = 0$.

Case 2: $\lambda_i = \lambda_j$. The proof follows directly from Lemma 1. Recall that since $A$ is not defective, $A - \lambda_i I$ has rank $n - m$, with $m$ being the multiplicity of $\lambda_i$. Therefore, there is an $m$-dimensional subspace $N$ (see Example 3) such that $(A - \lambda_i I)v = 0$ for every $v \in N$. One can always choose orthogonal vectors $v_i$ and $v_j$ from $N$ such that $v_i^T v_j = 0$. QED
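Lemma 2 can be confirmed numerically for any symmetric matrix. The sketch below uses an arbitrarily chosen symmetric matrix S (not taken from the notes) and checks that the eigenvector matrix returned by eig is orthogonal and reconstructs the decomposition.

% Unitary diagonalization of a symmetric matrix (illustrative sketch)
S = [2 1; 1 3];                % an arbitrary symmetric matrix
[Q, Sigma] = eig(S);
Q'*Q                           % identity, so Q is orthogonal (Q^{-1} = Q^T)
norm(Q*Sigma*Q' - S)           % ~0, so S = Q*Sigma*Q^T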

Principal Component Analysis

Let $A$ be an $m \times n$ matrix containing $m$ zero-mean observations in $n$ dimensions. The interest in this section is to find a single direction $v$ that maximizes the variance of $Av$ (the variance of the projection of $A$ on $v$). It is proven in your textbook (see the chapter on PCA) that maximizing the variance amounts to minimizing the loss in projection. In other words,
$$\operatorname*{argmin}_{v \in \mathbb{R}^n,\ \|v\| = 1}\ \sum_{i=1}^{m} \big\| a_i^T - a_i^T v v^T \big\|^2 \;=\; \operatorname*{argmax}_{v \in \mathbb{R}^n,\ \|v\| = 1}\ v^T A^T A v,$$
where $a_i^T$ refers to the $i$-th row of $A$. Let $\varphi : \mathbb{R}^n \to \mathbb{R}$ denote the variance, $\varphi(y) = y^T A^T A y$. Maximizing $\varphi(y)$ amounts to maximizing the norm of $Ay$. The following lemma proves that the maximum over unit-length vectors equals the largest eigenvalue of $A^T A$ (the square of the largest singular value of $A$) and occurs at the principal eigenvector of $A^T A$.

Lemma 3: Let $A$ be $m \times n$, containing $m$ zero-mean observations in $n$ dimensions, and define $\varphi(y) = y^T A^T A y$. Let $\sigma_{\max}$ denote the largest eigenvalue of $A^T A$ and $v_{\max}$ its associated eigenvector. It holds that
$$v_{\max} = \operatorname*{argmax}_{\|y\| = 1}\ \varphi(y).$$
In other words, the principal eigenvector of $A^T A$ is a maximizer of the variance $\varphi$ over the set of unit-length vectors.

Proof 1: Since $A^T A$ is symmetric, it can be decomposed into $A^T A = Q\Sigma Q^T$ with $Q = [v_1 \cdots v_n]$. Given that the columns of $Q$ form an orthonormal basis for $\mathbb{R}^n$, any vector $y$ can be written as a linear combination of the columns of $Q$,
$$y = \sum_{i=1}^{n} \alpha_i v_i,$$
with $\sum_{i=1}^{n} \alpha_i^2 = 1$ whenever $\|y\| = 1$. The variance $\varphi$ can thus be written as a function of $\alpha$. Since the columns of $Q$ are orthonormal, $Q^T y = \alpha = (\alpha_1, \ldots, \alpha_n)^T$, and therefore
$$\varphi(\alpha_1, \ldots, \alpha_n) = y^T Q \Sigma Q^T y = \alpha^T \Sigma \alpha.$$
Given that $\Sigma$ is diagonal, one can write
$$\varphi(\alpha_1, \ldots, \alpha_n) = \sum_{i=1}^{n} \alpha_i^2 \sigma_i.$$
It is clear that the maximum value of $\varphi$ is $\sigma_{\max}$, attained when $\alpha_{\max} = 1$ and all other coefficients are zero, where $\alpha_{\max}$ is the weight of the eigenvector associated with $\sigma_{\max}$. Thus, the maximizer is $y = v_{\max}$. QED

Proof 2: Consider the following optimization problem:
$$\text{maximize } y^T A^T A y \quad \text{s.t. } y^T y = 1.$$
Define the Lagrangian $L$:
$$L(\lambda, y) = y^T A^T A y - \lambda (y^T y - 1).$$
The necessary optimality conditions require that
$$\frac{\partial L}{\partial y} = 0,$$
which means
$$A^T A y = \lambda y.$$
This implies that the optimal Lagrange multiplier $\lambda$ is an eigenvalue of $A^T A$ and that the optimal $y$ is an eigenvector of $A^T A$. Furthermore, note that $A^T A$ can be decomposed into $A^T A = Q\Sigma Q^T$. Imposing the constraint $y^T y = 1$ restricts the candidates to unit-length eigenvectors, so
$$L(y, \lambda) = y^T Q \Sigma Q^T y, \qquad y \in \{v_1, \ldots, v_n\},$$
where the $v_i$ are eigenvectors of $A^T A$. Let $y = v_k$ be the $k$-th eigenvector. We have
$$L(v_k, \lambda) = v_k^T Q \Sigma Q^T v_k = \sigma_k.$$
Therefore, the maximum value of $L$ is $\sigma_{\max}$, which occurs at $y = v_{\max}$. QED

Example 5: Consider the set of points shown in Fig. 2. The goal is to find the PCA components of the set. This requires us to find a line on which the projected points are as diverse as possible. For instance, if the points are projected onto the $x$-axis, the differences along the $y$-axis are lost; note that if we change the $y$-coordinates of the points, the projection onto the $x$-axis does not change! Similarly, projecting points onto the $y$-axis does not preserve the differences along the $x$-axis. However, as can be seen in the figure, the PCA line maximizes the projected variance by taking into account the differences in both coordinates. The code for generating the PCA direction is provided below.

Figure 2: A sample data set and its first principal component.

% Choose arbitrary points and press Enter to see the principal component
close all
[x,y] = getpts;                          % interactive point selection (Image Processing Toolbox)
plot(x,y,'ro'); hold on
A  = [x y];
Ab = A - repmat(mean(A),size(A,1),1);    % center the observations
[v,d] = eig(Ab'*Ab);                     % eigenvectors of the (scaled) covariance matrix
z = Ab*v(:,end)*v(:,end)';               % project the centered data onto the leading eigenvector
z = z + repmat(mean(A)*v(:,end)*v(:,end)',size(z,1),1);   % shift back to the original coordinates
plot([min(z(:,1)) max(z(:,1))], ...
     [z(z(:,1)==min(z(:,1)),2) v(2,end)/v(1,end)*max(z(:,1))],'b--');   % PCA direction
plot(z(:,1),z(:,2),'g*');                % projected points
legend('Original points','PCA direction','Projected points')
axis equal
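For readers without an interactive session, the same computation can be reproduced on synthetic data. The sketch below (the synthetic data and the names vmax and proj are illustrative, not from the notes) obtains the principal direction from svd; the right singular vectors of the centered data matrix are the eigenvectors of $A^T A$.

% Non-interactive PCA sketch on synthetic data (illustrative only)
rng(0);
R  = [cos(pi/6) -sin(pi/6); sin(pi/6) cos(pi/6)];     % rotate an elongated cloud
A  = randn(200,2)*[2 0; 0 0.5]*R;                     % 200 observations, 2 features
Ab = A - repmat(mean(A),size(A,1),1);                 % center the observations
[~,~,V] = svd(Ab,'econ');                             % right singular vectors of Ab
vmax = V(:,1);                                        % principal direction (largest singular value)
proj = Ab*vmax*vmax' + repmat(mean(A),size(A,1),1);   % projected points in original coordinates
plot(A(:,1),A(:,2),'ro'); hold on
plot(proj(:,1),proj(:,2),'g*'); axis equal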