Eigenvectors and SVD
Definition
An eigenvector $x$ of a square matrix $A$ satisfies $Ax = \lambda x$, $x \neq 0$.
Intuition: $x$ is unchanged by $A$ (except for scaling).
Examples: the axis of a rotation, the stationary distribution of a Markov chain.
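A quick numerical check of the definition (a minimal sketch; the matrix A is an arbitrary illustrative choice, not from the slides):

% Verify the eigenvector equation A*x = lambda*x.
A = [2 1; 1 2];              % example symmetric matrix
[X, L] = eig(A);             % columns of X are eigenvectors, diag(L) the eigenvalues
x = X(:,1); lambda = L(1,1);
disp(norm(A*x - lambda*x))   % ~0 up to floating-point error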
Diagonalization
Stacking up the eigenvector equations gives $AX = X\Lambda$, where $X = [x_1 \; x_2 \; \cdots \; x_n] \in \mathbb{R}^{n \times n}$ and $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$.
If the eigenvectors are linearly independent, then $X$ is invertible, so $A = X \Lambda X^{-1}$.
Eigenvectors of a symmetric matrix
All eigenvalues are real (not complex).
The eigenvectors are orthonormal: $u_i^T u_j = 0$ if $i \neq j$, and $u_i^T u_i = 1$.
So $U$ is an orthogonal matrix: $U^T U = U U^T = I$.
Diagonalizing a symmetric matrix
We have
$A = U \Lambda U^T = \sum_{i=1}^n \lambda_i u_i u_i^T = \lambda_1 u_1 u_1^T + \cdots + \lambda_n u_n u_n^T$
i.e., a weighted sum of rank-one outer products.
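This decomposition is easy to verify term by term (a sketch; the symmetric matrix is an arbitrary example):

% Reconstruct a symmetric A as the sum of rank-one terms lambda_i * u_i * u_i'.
A = [3 1 0; 1 3 0; 0 0 2];   % example symmetric matrix
[U, L] = eig(A);
Ahat = zeros(size(A));
for i = 1:size(A,1)
    Ahat = Ahat + L(i,i) * U(:,i) * U(:,i)';
end
disp(norm(A - Ahat))         % ~0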
Transformation by an orthogonal matrix
Consider a vector $x$ transformed by the orthogonal matrix $U$ to give $\tilde{x} = Ux$.
The length of the vector is preserved, since $\|\tilde{x}\|^2 = \tilde{x}^T \tilde{x} = x^T U^T U x = x^T x = \|x\|^2$.
The angle between vectors is preserved: $\tilde{x}^T \tilde{y} = x^T U^T U y = x^T y$.
Thus multiplication by $U$ can be interpreted as a rigid rotation of the coordinate system.
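These invariances can be confirmed numerically (a sketch; the rotation angle and test vectors are arbitrary):

% Check that an orthogonal matrix preserves lengths and inner products.
theta = pi/4;
U = [cos(theta) -sin(theta); sin(theta) cos(theta)];   % rotation by 45 degrees
x = [1; 2]; y = [3; -1];
disp(norm(U*x) - norm(x))    % ~0: length preserved
disp((U*x)'*(U*y) - x'*y)    % ~0: inner product (angle) preserved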
Geometry of diagonalization
Let $A$ be a linear transformation given by a symmetric matrix. We can always decompose it into a rotation $U$, a scaling $\Lambda$, and a reverse rotation $U^T = U^{-1}$. Hence
$A = U \Lambda U^T = \sum_{i=1}^n \lambda_i u_i u_i^T$
The inverse mapping is given by
$A^{-1} = U \Lambda^{-1} U^T = \sum_{i=1}^n \frac{1}{\lambda_i} u_i u_i^T$
Matlab example
Given
A = [1.5 -0.5 0; -0.5 1.5 0; 0 0 3]
diagonalize it:
[U,D] = eig(A)
U =
   -0.7071   -0.7071         0
   -0.7071    0.7071         0
         0         0    1.0000
D =
     1     0     0
     0     2     0
     0     0     3
This is Rot(45) composed with Scale(1,2,3). Check:
>> U*D*U'
ans =
    1.5000   -0.5000         0
   -0.5000    1.5000         0
         0         0    3.0000
Positive definite matrices
A matrix $A$ is positive definite (pd) if $x^T A x > 0$ for any non-zero vector $x$.
Hence all the eigenvalues of a pd matrix are positive:
$A u_i = \lambda_i u_i \;\Rightarrow\; u_i^T A u_i = \lambda_i u_i^T u_i = \lambda_i > 0$
A matrix is positive semi-definite (psd) if $\lambda_i \geq 0$.
A matrix of all positive entries is not necessarily pd; conversely, a pd matrix can have negative entries:
>> [u,v] = eig([1 2; 3 4])    % all-positive entries, but one negative eigenvalue
u =
   -0.8246   -0.4160
    0.5658   -0.9094
v =
   -0.3723         0
         0    5.3723
>> [u,v] = eig([2 -1; -1 2])  % negative entries, but pd
u =
   -0.7071   -0.7071
   -0.7071    0.7071
v =
     1     0
     0     3
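In code, a common test for positive definiteness is to check the eigenvalue signs (a sketch; symmetrizing first guards against tiny numerical asymmetries):

% Test positive (semi-)definiteness via eigenvalues.
A = [2 -1; -1 2];
isPd  = all(eig((A + A')/2) > 0)     % true for this A
isPsd = all(eig((A + A')/2) >= 0)    % also true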
Multivariate Gaussian
The multivariate normal (MVN) density is
$N(x \mid \mu, \Sigma) \;\overset{\mathrm{def}}{=}\; \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} \exp\left[-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)\right]$
The exponent is the Mahalanobis distance between $x$ and $\mu$:
$\Delta = (x-\mu)^T \Sigma^{-1} (x-\mu)$
$\Sigma$ is the covariance matrix, which is symmetric positive definite: $x^T \Sigma x > 0 \;\; \forall x \neq 0$.
Bivariate Gaussian
The covariance matrix is
$\Sigma = \begin{pmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}$
where the correlation coefficient is
$\rho = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$
and satisfies $-1 \leq \rho \leq 1$.
The density (for zero mean) is
$p(x,y) = \frac{1}{2\pi \sigma_x \sigma_y \sqrt{1-\rho^2}} \exp\left(-\frac{1}{2(1-\rho^2)}\left(\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2\rho x y}{\sigma_x \sigma_y}\right)\right)$
Spherical, diagonal, full covariance
Spherical: $\Sigma = \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}$;
diagonal: $\Sigma = \begin{pmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{pmatrix}$;
full: $\Sigma = \begin{pmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{pmatrix}$
Surface plots
[Figure: surface plots of the Gaussian density for the three covariance types.]
Matlab plotting code
mu = [0 0]; S = eye(2);   % example mean and covariance (assumed values)
stepsize = 0.5;
[x,y] = meshgrid(-5:stepsize:5, -5:stepsize:5);
[r,c] = size(x);
data = [x(:) y(:)];
p = mvnpdf(data, mu, S);
p = reshape(p, r, c);
surfc(x,y,p);     % 3D plot
contour(x,y,p);   % Plot contours
Visualizing a covariance matrix
Let $\Sigma = U \Lambda U^T$. Hence
$\Sigma^{-1} = U^{-T} \Lambda^{-1} U^{-1} = U \Lambda^{-1} U^T = \sum_{i=1}^p \frac{1}{\lambda_i} u_i u_i^T$
Let $y = U^T(x - \mu)$ be a transformed coordinate system, translated by $\mu$ and rotated by $U$. Then
$(x-\mu)^T \Sigma^{-1} (x-\mu) = (x-\mu)^T \left( \sum_{i=1}^p \frac{1}{\lambda_i} u_i u_i^T \right) (x-\mu) = \sum_{i=1}^p \frac{1}{\lambda_i} (x-\mu)^T u_i u_i^T (x-\mu) = \sum_{i=1}^p \frac{y_i^2}{\lambda_i}$
Visualizing a covariance matrix
From the previous slide,
$(x-\mu)^T \Sigma^{-1} (x-\mu) = \sum_{i=1}^p \frac{y_i^2}{\lambda_i}$
Recall that the equation of an ellipse in 2D is
$\frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} = 1$
Hence the contours of equal probability density are ellipses, with axes given by the eigenvectors of $\Sigma$ and scales given by (the square roots of) its eigenvalues.
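The identity above can be checked directly (a sketch; Sigma, mu, and x are arbitrary example values):

% Mahalanobis distance equals sum(y_i^2 / lambda_i) in rotated coordinates.
Sigma = [2 1; 1 2]; mu = [1; -1]; x = [0.5; 2];
[U, L] = eig(Sigma);
y  = U' * (x - mu);                        % rotated, translated coordinates
d1 = (x - mu)' * inv(Sigma) * (x - mu);
d2 = sum(y.^2 ./ diag(L));
disp(d1 - d2)                              % ~0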
Visualizing a covariance matrix
Let $X \sim N(0, I)$ be points on a 2D circle. If
$Y = U \Lambda^{1/2} X$, where $\Lambda^{1/2} = \mathrm{diag}(\sqrt{\Lambda_{ii}})$,
then
$\mathrm{Cov}[Y] = U \Lambda^{1/2} \,\mathrm{Cov}[X]\, \Lambda^{1/2} U^T = U \Lambda^{1/2} \Lambda^{1/2} U^T = \Sigma$
So we can map a set of points on a circle to points on an ellipse.
Implementation in Matlab
$Y = U \Lambda^{1/2} X$, $\Lambda^{1/2} = \mathrm{diag}(\sqrt{\Lambda_{ii}})$
function h = gaussPlot2d(mu, Sigma, color)
[U, D] = eig(Sigma);
n = 100;
t = linspace(0, 2*pi, n);
xy = [cos(t); sin(t)];
k = 6; %k = sqrt(chi2inv(0.95, 2));
w = (k * U * sqrt(D)) * xy;
z = repmat(mu, [1 n]) + w;
h = plot(z(1, :), z(2, :), color);
axis equal;
Standardizing the data
We can subtract off the mean and divide by the standard deviation of each dimension to get (for cases $i = 1{:}n$ and dimensions $j = 1{:}d$)
$y_{ij} = \frac{x_{ij} - \bar{x}_j}{\sigma_j}$
Then $E[Y] = 0$ and $\mathrm{Var}[Y_j] = 1$. However, $\mathrm{Cov}[Y]$ might still be elliptical due to correlation amongst the dimensions.
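A minimal Matlab sketch of standardization (the synthetic data matrix X is an assumed n-by-d example):

% Standardize each column of X to zero mean and unit variance.
X = randn(100, 3) * [2 1 0; 0 1 0; 0 0 5];   % synthetic correlated data
Y = (X - repmat(mean(X), size(X,1), 1)) ./ repmat(std(X), size(X,1), 1);
disp(mean(Y))      % ~[0 0 0]
disp(std(Y))       % ~[1 1 1]
disp(corrcoef(Y))  % off-diagonals need not be 0: correlations remain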
Whitening the data
Let $X \sim N(\mu, \Sigma)$ and $\Sigma = U \Lambda U^T$. To remove any correlation, we can apply the following linear transformation:
$Y = \Lambda^{-1/2} U^T X$, where $\Lambda^{-1/2} = \mathrm{diag}(1/\sqrt{\Lambda_{ii}})$
In Matlab (with X stored d-by-n, columns as samples):
[U,D] = eig(cov(X'));        % cov expects rows = observations
Y = sqrt(inv(D)) * U' * X;
Whitening: example
[Figure: scatter plots of the data before and after whitening.]
Whitening: proof
Let $Y = \Lambda^{-1/2} U^T X$, where $\Lambda^{-1/2} = \mathrm{diag}(1/\sqrt{\Lambda_{ii}})$.
Using $\mathrm{Cov}[AX] = A \,\mathrm{Cov}[X]\, A^T$, we have
$\mathrm{Cov}[Y] = \Lambda^{-1/2} U^T \Sigma U \Lambda^{-1/2} = \Lambda^{-1/2} U^T (U \Lambda U^T) U \Lambda^{-1/2} = \Lambda^{-1/2} \Lambda \Lambda^{-1/2} = I$
and $E[Y] = \Lambda^{-1/2} U^T E[X]$.
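A numerical confirmation (a sketch; it assumes the Statistics Toolbox's mvnrnd, and X is stored d-by-n so each column is a sample):

% Whiten correlated Gaussian samples and check the sample covariance.
X = mvnrnd([0 0], [2 1; 1 2], 5000)';   % 2-by-n data
[U, D] = eig(cov(X'));
Y = sqrt(inv(D)) * U' * X;              % Y = Lambda^{-1/2} * U' * X
disp(cov(Y'))                           % approximately the 2x2 identity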
Singular Value Decomposition
$A = U \Sigma V^T = \sigma_1 u_1 v_1^T + \cdots + \sigma_r u_r v_r^T$
where $r$ is the rank of $A$, the $\sigma_i$ are the singular values, and the $u_i$, $v_i$ are the left and right singular vectors.
Right singular vectors are eigenvectors of $A^T A$
For any matrix $A$,
$A^T A = V \Sigma^T U^T U \Sigma V^T = V (\Sigma^T \Sigma) V^T$
$(A^T A) V = V (\Sigma^T \Sigma) = V D$
Left singular vectors are eigenvectors of $A A^T$
$A A^T = U \Sigma V^T V \Sigma^T U^T = U (\Sigma \Sigma^T) U^T$
$(A A^T) U = U (\Sigma \Sigma^T) = U D$
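Both facts are easy to verify numerically (a sketch; A is a random rectangular example):

% Check: V holds eigenvectors of A'*A, and U holds eigenvectors of A*A'.
A = randn(5, 3);
[U, S, V] = svd(A, 0);           % economy-size SVD
disp(norm(A'*A*V - V*(S'*S)))    % ~0
disp(norm(A*A'*U - U*(S*S')))    % ~0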
Truncated SVD
$A \approx U_{:,1:k} \, \Sigma_{1:k,1:k} \, V_{:,1:k}^T = \sigma_1 u_1 v_1^T + \cdots + \sigma_k u_k v_k^T$
Keeping only the top $k$ terms of the spectrum of singular values gives a rank-$k$ approximation to the matrix.
SVD on images
Run demo:
load clown            % provides the image X and its colormap map
[U,S,V] = svd(X,0);
colormap(map);
ranks = [1 2 5 10 20 rank(X)];
for k = ranks
    Xhat = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';
    image(Xhat);
end
Clown example
[Figure: reconstructions of the clown image with rank k = 1, 2, 5, 10, 20, 200.]
Space savings
$A \approx U_{:,1:k} \, \Sigma_{1:k,1:k} \, V_{:,1:k}^T$
Storing the truncated factors takes $(m \cdot k) + k + (n \cdot k) = (m + n + 1)k$ numbers, versus $m \cdot n$ for the full matrix.
For the clown image: $200 \times 320 = 64{,}000$ versus $(200 + 320 + 1) \times 20 = 10{,}420$.