FSAN/ELEG815: Statistical Learning


1 FSAN/ELEG815: Statistical Learning Gonzalo R. Arce Department of Electrical and Computer Engineering University of Delaware 3. Eigen Analysis, SVD and PCA

2 Outline of the Course 1. Review of Probability 2. Stationary processes 3. Eigen Analysis, Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) 4. The Learning Problem 5. Training vs Testing 6. Estimation theory: Maximum likelihood and Bayes estimation 7. The Wiener Filter 8. Adaptive Optimization: Steepest descent and the LMS algorithm 9. Least Squares (LS) and Recursive Least Squares (RLS) algorithm 10. Overfitting and Regularization 11. Logistic, Ridge and Lasso regression. 12. Neural Networks 13. Matrix Completion 1/79

3 Eigen Analysis Eigen Analysis Objective: Utilize tools from linear algebra to characterize and analyze matrices, especially the correlation matrix. The correlation matrix plays a large role in statistical characterization and processing. Previous result: R is Hermitian. Further insight into the correlation matrix is achieved through eigen analysis: eigenvalues and eigenvectors, matrix diagonalization. Application: optimum filtering problems. 2/79

4 Eigen Analysis Objective: For a Hermitian matrix R, find a vector q satisfying Rq = λq. Interpretation: Linear transformation by R changes the scale, but not the direction, of q. Fact: An M × M matrix R has M eigenvalues and eigenvectors. To see this, note
Rq_i = λ_i q_i, i = 1, 2, ..., M  ⇒  (R - λI)q = 0
For this to hold for a nonzero q, the rows/columns of (R - λI) must be linearly dependent, i.e.,
det(R - λI) = 0
3/79

5 Eigen Analysis Note: det(R - λI) is an Mth-order polynomial in λ. The roots of the polynomial are the eigenvalues λ_1, λ_2, ..., λ_M. In Rq_i = λ_i q_i, each eigenvector q_i is associated with one eigenvalue λ_i. The eigenvectors are not unique:
Rq_i = λ_i q_i  ⇒  R(aq_i) = λ_i(aq_i)
Consequence: eigenvectors are generally normalized, e.g., ‖q_i‖ = 1 for i = 1, 2, ..., M. 4/79

6 Eigen Analysis Example (General two-dimensional case) Let M = 2 and
R = [R_{1,1} R_{1,2}; R_{2,1} R_{2,2}]
Determine the eigenvalues and eigenvectors. Thus
det(R - λI) = 0
|R_{1,1} - λ, R_{1,2}; R_{2,1}, R_{2,2} - λ| = 0
λ² - λ(R_{1,1} + R_{2,2}) + (R_{1,1}R_{2,2} - R_{1,2}R_{2,1}) = 0
λ_{1,2} = (1/2)[(R_{1,1} + R_{2,2}) ± sqrt(4R_{1,2}R_{2,1} + (R_{1,1} - R_{2,2})²)]
5/79
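
A quick numerical check of the closed-form 2 × 2 eigenvalue formula against a library routine; the matrix below is a hypothetical example chosen only for illustration (a minimal NumPy sketch):

    import numpy as np

    # Hypothetical 2x2 Hermitian (symmetric) matrix, for illustration only.
    R = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

    # Closed-form roots of the characteristic polynomial from the slide.
    tr = R[0, 0] + R[1, 1]
    disc = np.sqrt(4 * R[0, 1] * R[1, 0] + (R[0, 0] - R[1, 1]) ** 2)
    lam_closed = np.sort([(tr - disc) / 2, (tr + disc) / 2])

    # Numerical eigenvalues for comparison.
    lam_np = np.sort(np.linalg.eigvalsh(R))
    print(lam_closed, lam_np)   # the two results should agree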

7 Eigen Analysis Back substitution yields the eigenvectors:
[R_{1,1} - λ, R_{1,2}; R_{2,1}, R_{2,2} - λ] [q_1; q_2] = [0; 0]
In general, this yields a set of linear equations. In the M = 2 case:
(R_{1,1} - λ)q_1 + R_{1,2} q_2 = 0
R_{2,1} q_1 + (R_{2,2} - λ)q_2 = 0
Solving the set of linear equations for a specific eigenvalue λ_i yields the corresponding eigenvector q_i. 6/79

8 Eigen Analysis Example (Two-dimensional white noise) Let R be the correlation matrix of a two-sample vector of zero-mean white noise,
R = [σ², 0; 0, σ²]
Determine the eigenvalues and eigenvectors. Carrying out the analysis yields eigenvalues
λ_{1,2} = (1/2)[(R_{1,1} + R_{2,2}) ± sqrt(4R_{1,2}R_{2,1} + (R_{1,1} - R_{2,2})²)]
        = (1/2)[(σ² + σ²) ± sqrt(0 + (σ² - σ²)²)] = σ²
and eigenvectors
q_1 = [1; 0] and q_2 = [0; 1]
Note: The eigenvectors are unit length (and orthogonal). 7/79

9 Eigen Properties Eigen Properties Property (eigenvalues of R^k) If λ_1, λ_2, ..., λ_M are the eigenvalues of R, then λ_1^k, λ_2^k, ..., λ_M^k are the eigenvalues of R^k. Proof: Note Rq_i = λ_i q_i. Pre-multiplying both sides by R, k - 1 times,
R^k q_i = λ_i R^{k-1} q_i = λ_i^k q_i
Property (linear independence of eigenvectors) The eigenvectors q_1, q_2, ..., q_M of R are linearly independent, i.e.,
Σ_{i=1}^{M} a_i q_i ≠ 0
for all sets of scalars a_1, a_2, ..., a_M that are not all zero. 8/79

10 Eigen Properties Property (Correlation matrix eigenvalues are real & nonnegative) The eigenvalues of R are real and nonnegative. Proof:
Rq_i = λ_i q_i
q_i^H R q_i = λ_i q_i^H q_i   [pre-multiply by q_i^H]
λ_i = (q_i^H R q_i)/(q_i^H q_i) ≥ 0
This follows from the facts that R is positive semi-definite and q_i^H q_i = ‖q_i‖² > 0. Note: In most cases, R is positive definite and λ_i > 0, i = 1, 2, ..., M. 9/79

11 Eigen Properties Property (Unique eigenvalues ⇒ orthogonal eigenvectors) If λ_1, λ_2, ..., λ_M are unique eigenvalues of R, then the corresponding eigenvectors q_1, q_2, ..., q_M are orthogonal. Proof:
Rq_i = λ_i q_i  ⇒  q_j^H R q_i = λ_i q_j^H q_i   (*)
Also, since λ_j is real and R is Hermitian,
Rq_j = λ_j q_j  ⇒  q_j^H R = λ_j q_j^H  ⇒  q_j^H R q_i = λ_j q_j^H q_i
Substituting the LHS from (*),
λ_i q_j^H q_i = λ_j q_j^H q_i
10/79

12 Eigen Properties Thus
λ_i q_j^H q_i = λ_j q_j^H q_i  ⇒  (λ_i - λ_j) q_j^H q_i = 0
Since λ_1, λ_2, ..., λ_M are unique,
q_j^H q_i = 0 for i ≠ j  ⇒  q_1, q_2, ..., q_M are orthogonal. QED 11/79

13 Eigen Properties Diagonalization of R Objective: Find a transformation that transforms the correlation matrix into a diagonal matrix. Let λ_1, λ_2, ..., λ_M be the unique eigenvalues of R and take q_1, q_2, ..., q_M to be the M orthonormal eigenvectors,
q_i^H q_j = 1 if i = j, 0 if i ≠ j
Define Q = [q_1, q_2, ..., q_M] and Ω = diag(λ_1, λ_2, ..., λ_M). Then consider
Q^H R Q = [q_1^H; q_2^H; ...; q_M^H] R [q_1, q_2, ..., q_M]
12/79

14 Eigen Properties
Q^H R Q = [q_1^H; q_2^H; ...; q_M^H] R [q_1, q_2, ..., q_M]
        = [q_1^H; q_2^H; ...; q_M^H] [λ_1 q_1, λ_2 q_2, ..., λ_M q_M]
        = diag(λ_1, λ_2, ..., λ_M)
⇒ Q^H R Q = Ω   (eigenvector diagonalization of R)
13/79

15 Eigen Properties Property (Q is unitary) Q is unitary, i.e., Q^{-1} = Q^H. Proof: Since the eigenvectors q_i are orthonormal,
Q^H Q = [q_1^H; q_2^H; ...; q_M^H] [q_1, q_2, ..., q_M] = I  ⇒  Q^{-1} = Q^H
Property (Eigen decomposition of R) The correlation matrix can be expressed as
R = Σ_{i=1}^{M} λ_i q_i q_i^H
14/79

16 Eigen Properties Proof: The correlation diagonalization result states Q^H R Q = Ω. Isolating R and expanding,
R = QΩQ^H = [q_1, q_2, ..., q_M] Ω [q_1^H; q_2^H; ...; q_M^H]
  = [q_1, q_2, ..., q_M] [λ_1 q_1^H; λ_2 q_2^H; ...; λ_M q_M^H]
  = Σ_{i=1}^{M} λ_i q_i q_i^H
Note: This also gives
R^{-1} = (Q^H)^{-1} Ω^{-1} Q^{-1} = QΩ^{-1}Q^H, where Ω^{-1} = diag(1/λ_1, 1/λ_2, ..., 1/λ_M)
15/79
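
A minimal NumPy sketch (using a small correlation-type matrix built from random data, purely for illustration) that checks the outer-product expansion R = Σ_i λ_i q_i q_i^H and the inverse formula R^{-1} = QΩ^{-1}Q^H:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 100))
    R = X @ X.T / 100                      # sample correlation-type matrix (symmetric, PSD)

    lam, Q = np.linalg.eigh(R)             # eigenvalues (ascending) and orthonormal eigenvector columns

    # Outer-product expansion: R = sum_i lam_i q_i q_i^H
    R_rebuilt = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in range(len(lam)))
    print(np.allclose(R, R_rebuilt))       # True

    # Inverse via the eigen decomposition: R^{-1} = Q Omega^{-1} Q^H
    R_inv = Q @ np.diag(1.0 / lam) @ Q.T
    print(np.allclose(R_inv, np.linalg.inv(R)))   # True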

17 Eigen Properties Aside (trace & determinant for matrix products) Note trace(A) = Σ_i A_{i,i}. Also, trace(AB) = trace(BA) and, similarly, det(AB) = det(A)det(B). Property (Determinant-Eigenvalue Relation) The determinant of the correlation matrix is related to the eigenvalues as follows:
det(R) = Π_{i=1}^{M} λ_i
Proof: Using R = QΩQ^H and the above,
det(R) = det(QΩQ^H) = det(Q)det(Q^H)det(Ω) = det(Ω) = Π_{i=1}^{M} λ_i
16/79

18 Eigen Properties Property (Trace-Eigenvalue Relation) The trace of the correlation matrix is related to the eigenvalues as follows:
trace(R) = Σ_{i=1}^{M} λ_i
Proof: Note
trace(R) = trace(QΩQ^H) = trace(Q^H QΩ) = trace(Ω) = Σ_{i=1}^{M} λ_i
QED 17/79
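
Both relations are easy to verify numerically; a short sketch with a randomly generated symmetric matrix (chosen only for illustration):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((5, 200))
    R = X @ X.T / 200                      # illustrative correlation-type matrix

    lam = np.linalg.eigvalsh(R)
    print(np.isclose(np.linalg.det(R), np.prod(lam)))   # det(R) = product of eigenvalues
    print(np.isclose(np.trace(R), np.sum(lam)))         # trace(R) = sum of eigenvalues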

19 Eigen Properties Definition (Normal Matrix) A complex square matrix A is a normal matrix if
A^H A = AA^H
That is, a matrix is normal if it commutes with its conjugate transpose. Note: All Hermitian symmetric matrices are normal. Every matrix that can be diagonalized by a unitary transform is normal. Definition (Condition Number) The condition number reflects how numerically well conditioned a problem is, i.e., a low condition number ⇒ well conditioned; a high condition number ⇒ ill conditioned. 18/79

20 Eigen Properties Definition (Condition Number for Linear Systems) For a linear system Ax = b defined by a normal matrix A, the condition number is
χ(A) = λ_max / λ_min
where λ_max and λ_min are the maximum/minimum eigenvalues of A. Observations: Large eigenvalue spread ⇒ ill conditioned. Small eigenvalue spread ⇒ well conditioned. 19/79
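
As a sanity check, for a symmetric positive definite matrix the eigenvalue-spread definition coincides with the usual 2-norm condition number; a small sketch with a random illustrative matrix:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((4, 50))
    R = X @ X.T / 50                       # symmetric (hence normal), positive definite example

    lam = np.linalg.eigvalsh(R)
    chi = lam.max() / lam.min()            # eigenvalue-spread condition number
    print(chi, np.linalg.cond(R))          # matches NumPy's 2-norm condition number here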

21 SVD Matrix-Vector Multiplication Example in 2D: for a 2 × 2 matrix A and a vector x,
y = Ax = [5; 2]
What is the geometrical meaning of the matrix-vector multiplication? 20/79

22 SVD Matrix-Vector Multiplication
y = Ax = [5; 2]
The multiplication rotates the vector by an angle θ and stretches it. 21/79

23 SVD Matrix-Vector Multiplication To rotate x by an angle θ, we pre-multiply by
A = [cos θ, -sin θ; sin θ, cos θ]
To stretch x by a factor α, we pre-multiply by
A = [α, 0; 0, α]
22/79
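
A brief sketch applying the two transformations above to a vector (the angle, stretch factor, and vector are arbitrary illustrative values):

    import numpy as np

    theta, alpha = np.pi / 6, 2.0          # illustrative angle and stretch factor
    x = np.array([3.0, 2.0])

    Rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])   # rotation by theta
    S = alpha * np.eye(2)                                # uniform stretch by alpha

    y = S @ Rot @ x                        # rotate, then stretch
    print(np.linalg.norm(y) / np.linalg.norm(x))         # = alpha: length scaled, direction rotated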

24 SVD Matrix-Vector Multiplication Consider the vectors v 1 and v 2 depicting a circle. What happens to the circle under matrix multiplication? 23/79

25 SVD Matrix-Vector Multiplication What happens to the 2D circle under matrix multiplication? Note: Orthogonality holds since they are all rotated by the same angle. 24/79

26 SVD Matrix-Vector Multiplication What happens to the n-d hyper-sphere under matrix multiplication? 25/79

27 SVD n-dim Hyper-Sphere Mapping to n-dim Hyper-Ellipsoid The mapping can be written as
Av_1 = σ_1 û_1, ..., Av_n = σ_n û_n
Expressed in matrix form as
A [v_1 v_2 ... v_n] = [û_1 û_2 ... û_n] diag(σ_1, ..., σ_n)
with A ∈ C^{m×n}, V = [v_1 v_2 ... v_n] ∈ C^{n×n}, Û = [û_1 û_2 ... û_n] ∈ C^{m×n} and ˆΣ = diag(σ_1, ..., σ_n) ∈ C^{n×n}, i.e.,
AV = ÛˆΣ
26/79

28 SVD n-dim Hyper-Sphere Mapping to n-dim Hyper-Ellipsoid Let v_1, ..., v_n be orthonormal vectors; then V = [v_1 v_2 ... v_n] is a unitary transformation matrix, that is, V^{-1} = V^H. Let û_1, ..., û_n be orthonormal vectors; then Û = [û_1 û_2 ... û_n] is a unitary transformation matrix, that is, Û^{-1} = Û^H. 27/79

29 SVD Reduced Singular Value Decomposition The mapping is thus given by
AV = ÛˆΣ
Multiplying both sides on the right by V^{-1}, we obtain:
AVV^{-1} = ÛˆΣV^{-1}
AVV^H = ÛˆΣV^H
AI = ÛˆΣV^H
A = ÛˆΣV^H
where ˆΣ = diag(σ_1, σ_2, ..., σ_n), such that σ_1 ≥ σ_2 ≥ ... ≥ σ_p ≥ 0. 28/79
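
A minimal NumPy sketch of the reduced SVD for an illustrative random matrix, checking the shapes, the ordering of the singular values, and the reconstruction A = ÛˆΣV^H:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((6, 4))        # illustrative tall matrix, m = 6, n = 4

    # Reduced SVD: U_hat is m x n, s holds the n singular values, Vh is n x n.
    U_hat, s, Vh = np.linalg.svd(A, full_matrices=False)

    print(U_hat.shape, s.shape, Vh.shape)               # (6, 4) (4,) (4, 4)
    print(np.all(np.diff(s) <= 0))                      # singular values sorted in descending order
    print(np.allclose(A, U_hat @ np.diag(s) @ Vh))      # A = U_hat Sigma_hat V^H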

30 SVD Singular Value Decomposition Reduced SVD SVD 29/79

31 SVD Theorem 1 Every matrix A ∈ C^{m×n} has a singular value decomposition (SVD). The singular values σ_j are uniquely determined. If A is square and the σ_j are distinct, then the u_j and v_j are also unique up to a complex sign (i.e., unique if the complex sign is ignored). 30/79

32 SVD SVD calculation Start with A^H A:
A^H A = (UΣV^H)^H (UΣV^H) = VΣU^H UΣV^H = VΣ²V^H
A^H A V = VΣ²V^H V
A^H A V = VΣ²
This reduces to an eigenvalue decomposition problem of the form BV = VΛ, with B = A^H A and Λ = Σ², where Λ is a diagonal matrix with the eigenvalues of B and V corresponds to the eigenvectors of B. 31/79

33 SVD SVD calculation How do we calculate U:
AA^H = (UΣV^H)(UΣV^H)^H = UΣV^H VΣU^H = UΣ²U^H
AA^H U = UΣ²U^H U
AA^H U = UΣ²
Again an eigenvalue problem, BU = UΛ, with B = AA^H and Λ = Σ², where Λ is a diagonal matrix with the eigenvalues of B and U corresponds to the eigenvectors of B. 32/79
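
A sketch of this eigenvalue route for a real-valued example (so H reduces to T), compared against a library SVD; the matrix is random and purely illustrative:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 3))

    lam_v, V = np.linalg.eigh(A.T @ A)     # A^H A V = V Sigma^2 (eigenvalues ascending)
    lam_u, U = np.linalg.eigh(A @ A.T)     # A A^H U = U Sigma^2 (columns of U: left singular vectors)
    sigma = np.sqrt(lam_v[::-1])           # singular values = sqrt of eigenvalues, descending

    _, s, _ = np.linalg.svd(A)             # reference computation
    print(np.allclose(sigma, s))           # True: same singular values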

34 SVD Movie Rating - A Solution Describe a movie as an array of factors, e.g. comedy, action, ... Describe each viewer using the same factors, e.g. likes comedy, likes action, etc. The rating is based on the match/mismatch between the two. More factors ⇒ better prediction. 33/79

35 SVD Singular Value Decomposition Solution Viewers rated movies on a scale from 1 to 5, with 0 for movies that were not rated by the user. Each column j is a different movie. Each row i is a different viewer. Each element a_{i,j} represents the rating of movie j by viewer i.
A = [a_{1,1} ... a_{1,n}; ...; a_{m,1} ... a_{m,n}]
Goal: Use SVD to predict unobserved data or the rating of a movie that hasn't come out yet. 34/79

36 SVD Singular Value Decomposition Solution We want to classify Movies and Viewers: the movies are grouped into Category 1, Category 2, Category 3, ... Intuitively, if Movie 1 ≈ Movie 2, these movies are similar (same category). Categories are determined by the matrix A and the SVD algorithm. 35/79

37 SVD Singular Value Decomposition Solution Now, consider that each movie belongs to more than one category, e.g. half comedy and half action. This can be written as:
Movie_j = v_1 Cat_1 + v_2 Cat_2 + ... + v_n Cat_n, s.t. ‖v‖_2 = 1
where the set of categories {Cat_i} forms an orthonormal basis. 36/79

38 SVD Singular Value Decomposition Solution In the case of Viewers, we use the same Movie categories (Category 1, Category 2, Category 3, ...). E.g. a viewer that loves comedy is represented with the same unit vector as the comedy-category movies. Each Viewer is represented as:
Viewer_i = u_1 Cat_1 + u_2 Cat_2 + ... + u_n Cat_n, s.t. ‖u‖_2 = 1
37/79

39 SVD Singular Value Decomposition Solution From Theorem 1: There exists a unique decomposition into categories. Every matrix A ∈ C^{m×n} can be factorized as A = ÛΣV^H, where: 38/79

40 SVD Singular Value Decomposition Solution Each row vector u_i in Û represents the taste of Viewer i on the corresponding categories.
Û = [u_{1,1} ... u_{1,n}; ...; u_{m,1} ... u_{m,n}] = [u_1; u_2; ...; u_m]
39/79

41 SVD Singular Value Decomposition Solution Each column v_j in V^H represents the content of Movie j on the corresponding categories.
V^H = [v_{1,1} ... v_{1,n}; ...; v_{n,1} ... v_{n,n}] = [v_1 v_2 ... v_n]
40/79

42 SVD Singular Value Decomposition Solution Each singular value σ_{i,i} in Σ computes how a viewer of category i rates a movie of the same category i.
Σ = diag(σ_{1,1}, σ_{2,2}, ..., σ_{n,n})
41/79

43 SVD Singular Value Decomposition Solution We have more viewers than movies: New categories are created. The new vectors are still unit vectors orthonormal to all the basis vectors but the ratings of these useless categories are zero. 42/79

44 SVD Singular Value Decomposition Solution The representation of each movie and each viewer can be obtained by
Movie_j = v_{1,j} Cat_1 + v_{2,j} Cat_2 + ... + v_{n,j} Cat_n, s.t. ‖v_j‖_2 = 1
        = v_{1,j} √σ_{1,1} + v_{2,j} √σ_{2,2} + ... + v_{n,j} √σ_{n,n} = √Σ v_j
Viewer_i = u_{i,1} Cat_1 + u_{i,2} Cat_2 + ... + u_{i,n} Cat_n + ... + u_{i,m} Cat_m, s.t. ‖u_i‖_2 = 1
         = u_{i,1} √σ_{1,1} + u_{i,2} √σ_{2,2} + ... + u_{i,n} √σ_{n,n} = u_i √Σ
43/79

45 SVD Singular Value Decomposition Solution Given the decomposition of a movie and a viewer, the rating is estimated by:
Viewer_i · Movie_j = u_{i,1} v_{1,j} σ_{1,1} + u_{i,2} v_{2,j} σ_{2,2} + ... + u_{i,n} v_{n,j} σ_{n,n}
                   = (u_i √Σ)(√Σ v_j) = u_i Σ v_j
44/79
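
A small NumPy sketch (with a randomly generated toy ratings matrix standing in for A) showing that the product u_i Σ v_j reproduces the entry a_{i,j} when all singular values are kept:

    import numpy as np

    rng = np.random.default_rng(5)
    A = rng.integers(0, 6, size=(8, 5)).astype(float)   # toy ratings matrix (0 = not rated)

    U_hat, s, Vh = np.linalg.svd(A, full_matrices=False)
    Sigma = np.diag(s)

    i, j = 2, 3                                          # an arbitrary viewer/movie pair
    rating_ij = U_hat[i, :] @ Sigma @ Vh[:, j]           # u_i Sigma v_j
    print(np.isclose(rating_ij, A[i, j]))                # full-rank SVD reproduces the entry exactly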

46 SVD Singular Value Decomposition - Example Considering the ratings from 60 viewers of 16 movies of 4 different genres (action, romance, sci-fi, comedy), we generate A ∈ R^{60×16}. Viewers rated movies on a scale from 1 to 5, with 0 for movies that were not rated by the user. Observe the same 4 categories of viewers. 45/79

47 SVD Singular Value Decomposition - Example 46/79

48 SVD Singular Value Decomposition - Example 47/79

49 SVD Singular Value Decomposition - Example 48/79

50 SVD Singular Value Decomposition - Example To estimate not-rated movies (zero entries in A), we use additional information: A is known to be low-rank or approximately low-rank. Thus, we are going to use the rank-k approximation of the matrix A, that is:
Â = ÛΣ_k V^H
where Σ_k has all but the first k singular values σ_{i,i} set to zero. The ratings different from zero in A are set back to their original values. Note: The ratings matrix A is expected to be low-rank since user preferences can be described by a few categories (k), such as the movie genres. 49/79
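
A minimal sketch of this rank-k completion step; the ratings matrix below is a random stand-in (the actual A of the example is not reproduced here), and k = 4 matches the four genres:

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.integers(0, 6, size=(60, 16)).astype(float)  # stand-in ratings matrix, 0 = unrated

    k = 4                                                # assumed number of categories (genres)
    U_hat, s, Vh = np.linalg.svd(A, full_matrices=False)
    s_k = np.where(np.arange(len(s)) < k, s, 0.0)        # keep only the k largest singular values
    A_hat = U_hat @ np.diag(s_k) @ Vh                    # rank-k approximation of A

    observed = A != 0
    A_hat[observed] = A[observed]                        # restore the known ratings
    print(np.linalg.matrix_rank(U_hat @ np.diag(s_k) @ Vh))   # k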

51 SVD Singular Value Decomposition - Example 50/79

52 SVD Singular Value Decomposition - Example 51/79

53 PCA Principal Component Analysis (PCA) A simple method for extracting relevant information from confusing data sets. How do we reduce a complex data set to a lower dimension? Consider a mass attached to a spring which oscillates as shown below. What if we did not know that F = ma? 52/79

54 PCA PCA - Motivation: Toy example Since we live in a 3D world, we use three cameras to capture data from the system. There is no information about the real x, y, and z axes; the camera positions are chosen arbitrarily. How do we get from this data set to a simple equation of z? 53/79

55 PCA PCA - Motivation: Toy example Three cameras give redundant information. Only one camera at a specific angle necessary to describe the system behavior. PCA is used to avoid redundancy. 54/79

56 PCA Framework: Change of Basis The goal of PCA is to identify the most meaningful basis to re-express a dataset. Let the basis of representation for our samples be the naive choice, that is, the identity matrix:
B = [b_1; b_2; ...; b_m] = I    (1)
Note all the recorded data can be trivially expressed as a linear combination of the b_i. 55/79

57 PCA Change of Basis PCA: Is there another basis, which is a linear combination of the original basis, that best represents the data set? Let X be the original data set, where each column is a single measurement set. Let Y be a linear transformation of X by P, i.e. Y = PX. Implications: Geometrically, P is a rotation and a stretch which transforms X into Y. The rows of P, {p_1, ..., p_m}, are a set of new basis vectors for expressing the columns of X. What is the best way to re-express X? What is a good choice for P? 56/79

58 PCA Noise Signal and noise variances are depicted as σ²_signal and σ²_noise. The largest direction of variance is not along the natural basis but along the best-fit line. The directions with largest variances contain the dynamics of interest. Intuition: Find the direction indicated by σ_signal. 57/79

59 PCA Redundancy Figures depict possible plots between two arbitrary measurement types r_1 and r_2. Low redundancy ⇒ uncorrelated recordings. High redundancy ⇒ correlated recordings, e.g. the sensors are too close or the measured variables are equivalent. If recordings are highly correlated, it is not necessary to measure both of them. 58/79

60 PCA PCA - Basic concepts Let a = [a_1, a_2, ..., a_n] and b = [b_1, b_2, ..., b_n] be two sets of measurements. Are they related? If the mean of a and b is zero, then:
Variance: How large the change is in each vector.
σ_a² = (1/n) a a^T = (1/n) Σ_i a_i²
σ_b² = (1/n) b b^T = (1/n) Σ_i b_i²
Covariance: Statistical relationship between the data in a and b.
σ_ab² = (1/n) a b^T = (1/n) Σ_i a_i b_i
59/79
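
A short sketch computing these quantities for two synthetic zero-mean measurement vectors (generated only for illustration) and comparing with NumPy's covariance routine:

    import numpy as np

    rng = np.random.default_rng(7)
    n = 1000
    a = rng.standard_normal(n)
    b = 0.8 * a + 0.2 * rng.standard_normal(n)   # correlated with a, for illustration
    a, b = a - a.mean(), b - b.mean()            # zero-mean, as assumed on the slide

    var_a = (a @ a) / n                          # (1/n) a a^T
    cov_ab = (a @ b) / n                         # (1/n) a b^T
    print(var_a, cov_ab)
    print(np.cov(a, b, bias=True))               # NumPy's (1/n-normalized) covariance matrix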

61 PCA Variance and Covariance Let X be defined as X = [x_1^T; x_2^T; ...; x_m^T], where each row x_i^T corresponds to all measurements of a particular type. Then the covariance matrix is defined as:
C_X = (1/n) X X^H
The covariance values reflect the noise and redundancy in the measurements. 60/79

62 PCA Variance and Covariance Recall C_X is the covariance matrix of X defined as C_X = (1/n) X X^H. The covariance matrix in the spring example is C_X ∈ R^{6×6}, with rows and columns indexed by the six recorded coordinates (y_1, z_1, y_2, z_2, y_3, z_3) and entries σ²_{y_1y_1}, σ²_{y_1z_1}, ..., σ²_{z_3z_3}. Diagonal: variance measures. Off-diagonal: covariances between all pairs. C_X is Hermitian and symmetric, i.e. C_X = C_X^T. 61/79

63 PCA Covariance Matrix Interpretation Consider the entries of the 6 × 6 covariance matrix C_X above. Off-diagonal terms: If a covariance is large, the components are statistically dependent. If a covariance is small, the components are statistically independent. Diagonal terms: If a variance is large, the component contains a lot of information about the system. If a variance is small, it does not provide significant information about the system. 62/79

64 PCA PCA Goal: Change basis such that the covariance matrix of the data is diagonal. If the off-diagonal terms are driven to 0, the redundancies are eliminated. Diagonal terms represent the variance of each component. Components with large variance are the most representative. 63/79

65 PCA PCA and Eigenvalue Decomposition How to solve the problem? Data set: X ∈ R^{m×n}, where m is the number of measurement types and n is the number of samples. PCA: Find an orthonormal matrix P in Y = PX such that C_Y = (1/n) Y Y^T is a diagonal matrix. The rows of P are the principal components of X. 64/79

66 PCA PCA and Eigenvalue Decomposition We begin by rewriting C_Y in terms of the unknown variable:
C_Y = (1/n) Y Y^T = (1/n) (PX)(PX)^T = (1/n) P X X^T P^T = P ((1/n) X X^T) P^T = P C_X P^T
65/79

67 PCA PCA and Eigenvalue Decomposition C_X can be diagonalized by an orthogonal matrix of its eigenvectors since it is a symmetric matrix. Let P = Q^T, where Q is the matrix whose columns are the eigenvectors of (1/n) X X^T, so that C_X = QΩQ^T = P^TΩP. Then:
C_Y = P C_X P^T = P (P^TΩP) P^T = (PP^T)Ω(PP^T) = (PP^{-1})Ω(PP^{-1}) = Ω
The transformation Y = PX diagonalizes the system. The covariance of Y is a diagonal matrix with the eigenvalues of (1/n) X X^T. 66/79
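
A minimal NumPy sketch of this eigendecomposition route on synthetic zero-mean data (the redundancy between two measurement rows is introduced only for illustration):

    import numpy as np

    rng = np.random.default_rng(8)
    m, n = 3, 500
    X = rng.standard_normal((m, n))
    X[1] = 0.9 * X[0] + 0.1 * X[1]         # introduce redundancy between two measurement types
    X = X - X.mean(axis=1, keepdims=True)  # zero-mean rows, as assumed on the slides

    C_X = X @ X.T / n                      # covariance matrix of the data
    lam, Q = np.linalg.eigh(C_X)           # eigenvalues (ascending) and eigenvector columns
    P = Q.T                                # rows of P are the principal components

    Y = P @ X
    C_Y = Y @ Y.T / n
    print(np.allclose(C_Y, np.diag(lam)))  # C_Y is diagonal, with the eigenvalues of C_X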

68 PCA PCA and SVD The SVD of X is given by X = UΣV^H. Let P = U^H; then Y = U^H X, and the covariance matrix of Y is given by:
C_Y = (1/n) Y Y^H = (1/n) U^H X X^H U = (1/n) U^H UΣV^H VΣU^H U = (1/n) Σ²
67/79

69 PCA PCA The transformation Y = U^H X diagonalizes the system. The covariance of Y is a diagonal matrix with the squared singular values of X multiplied by a factor of 1/n. It can be concluded that (1/n)Σ² = Ω, i.e., σ_i² = nλ_i. The principal components of the data matrix are given by U^H. 68/79
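
The two routes can be compared numerically; a sketch on synthetic zero-mean data checking that C_Y = (1/n)Σ² and that σ_i² = nλ_i:

    import numpy as np

    rng = np.random.default_rng(9)
    m, n = 3, 400
    X = rng.standard_normal((m, n))
    X = X - X.mean(axis=1, keepdims=True)

    lam = np.linalg.eigvalsh(X @ X.T / n)[::-1]     # eigenvalues of C_X, descending

    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    Y = U.T @ X                                     # principal-component scores
    C_Y = Y @ Y.T / n

    print(np.allclose(np.diag(C_Y), s**2 / n))      # C_Y = (1/n) Sigma^2
    print(np.allclose(s**2, n * lam))               # sigma_i^2 = n * lambda_i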

70 PCA Application: Face Recognition PCA in face recognition ⇒ Eigenfaces. Intuition: Figure out the correlation between the rows/columns of A from the SVD,
A = UΣV^H    (2)
How important each direction is: Σ. Principal directions: U. How each individual component (row/column) projects onto the principal components: V. 69/79

71 PCA Data in Face Recognition The data matrix for the face recognition problem is constructed by vectorizing the face images as shown below, i.e. A = [A_1^T A_2^T ... A_N^T]^T. The matrix will be N × M, where N is the number of images in the database and M is the number of pixels in each image. 70/79

72 PCA Example - Celebrity Images Example: take 5 images of each celebrity: George Clooney, Bruce Willis, Margaret Thatcher and Matt Damon. In the example, N = 20 and M is the number of pixels in each image. 71/79

73 PCA Average Faces How do the average faces of these celebrities look?
ā_i = (1/5) Σ_{j=1}^{5} A_j    (3)
where the sum runs over the 5 images of celebrity i. 72/79

74 PCA Average Faces What defines George Clooney's face? Data matrix A ∈ R^{N×M} with the images of the example. Compute the correlation matrix of the features of the dataset, i.e. the pixels. The correlation matrix is C = A^T A ∈ R^{M×M}, where M is the number of pixels. High correlation values ⇒ everybody has eyes, a nose and a mouth. Correlations between images of the same person will be higher. Average Face 73/79

75 PCA Eigendecomposition Obtain the eigenvalue decomposition of C = A^T A, that is, C = QΩQ^{-1}. The first eigenvectors q_i ∈ R^{M×1} are called the principal components (eigenfaces). One can reconstruct each face as a weighted sum of the eigenvectors. 74/79

76 PCA Representing Faces onto the Basis Each face A_i ∈ R^{1×M} in the data set A = [A_1^T A_2^T ... A_N^T]^T can be represented as a linear combination of the best K eigenvectors:
A_i^T = Σ_{j=1}^{K} w_j q_j, where w_j = q_j^T A_i^T    (4)
75/79
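
A sketch of this projection and reconstruction with random data standing in for the face images (the sizes N, M and K below are assumed values; the true pixel count of the example is not given here):

    import numpy as np

    rng = np.random.default_rng(10)
    N, M, K = 20, 1024, 20                 # assumed sizes: 20 images, 1024 pixels, 20 eigenvectors
    A = rng.standard_normal((N, M))        # stand-in for the vectorized face images

    C = A.T @ A                            # M x M correlation matrix of the pixels
    lam, Q_full = np.linalg.eigh(C)
    Q = Q_full[:, ::-1][:, :K]             # first K eigenvectors (largest eigenvalues), M x K

    face = A[0]                            # one vectorized face
    w = Q.T @ face                         # weights w_j = q_j^T A_i^T
    face_hat = Q @ w                       # reconstruction from the K eigenfaces
    print(w.shape, face_hat.shape)         # (20,) (1024,)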

77 PCA Projection of the Average Faces onto the K = 20 Largest Eigenvectors Q is M × M; here let Q be the matrix formed by the first 20 eigenvectors, i.e. Q ∈ R^{M×20}. Project the average faces ā_i onto the reduced eigenvector space, i.e. projection = ā_i Q. The projections for each face are characteristic of each average face and could be used for classification purposes. 76/79

78 PCA Projection of new images Test set: a new image of Margaret Thatcher, Meryl Streep as Margaret Thatcher in The Iron Lady, and Betty White. Project the test images onto the eigenvector space, p = xQ, where x ∈ R^{1×M} is the new vectorized image and Q is the matrix with the first 20 eigenvectors of the database. Reconstruct the images as ˆx = Qp^T. The error is defined as the difference between the projection of the new image and the projection of the original Margaret Thatcher images A_jQ, j = 1, ..., 5, that is,
E_j = ‖A_jQ - xQ‖ / ‖A_jQ‖
where the A_j are the original images of the database, in this case the 5 images of Margaret Thatcher. 77/79

79 PCA Projection of new images The image depicts, from left to right: the test images; the projection of the test images onto the eigenvector space, p = xQ; the reconstructed images using the first 20 eigenvectors of the database, ˆx = Qp^T; and the error of the projection with respect to each original Margaret Thatcher image. 78/79

80 PCA Projection of new images 79/79
