Principal components analysis (COMS 4771)
1. Representation learning
Useful representations of data

Representation learning. Given: raw feature vectors x_1, x_2, ..., x_n ∈ R^d. Goal: learn a useful feature transformation φ: R^d → R^k. (Often k ≪ d, i.e., dimensionality reduction, but not always.) Can then use φ as a feature map for supervised learning.

Some previously encountered examples:
- Feature maps corresponding to positive definite kernels (and their approximations). (Usually data-oblivious: the feature map doesn't depend on the data.)
- Centering: x ↦ x − µ. (Effect: resulting features have mean 0.)
- Standardization: x ↦ diag(σ_1, σ_2, ..., σ_d)^{−1} (x − µ). (Effect: resulting features have mean 0 and unit variance.)

What other properties of a feature representation may be desirable?
2. Principal components analysis
Dimensionality reduction via projections

Input: x_1, x_2, ..., x_n ∈ R^d; target dimensionality k ∈ N.
Output: a k-dimensional subspace, represented by an orthonormal basis q_1, q_2, ..., q_k ∈ R^d.

(Orthogonal) projection: the projection of x ∈ R^d onto span(q_1, q_2, ..., q_k) is

    Π x = Σ_{i=1}^k q_i q_i^T x = Σ_{i=1}^k ⟨q_i, x⟩ q_i ∈ R^d.

Can also represent the projection of x in terms of its coefficients w.r.t. the orthonormal basis q_1, q_2, ..., q_k:

    φ(x) := (⟨q_1, x⟩, ⟨q_2, x⟩, ..., ⟨q_k, x⟩) ∈ R^k.
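The projection and its coefficient representation can be sketched in a few lines of numpy (an illustration, not part of the slides; the variable names are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 5, 2

# Build an orthonormal basis q_1, ..., q_k via QR of a random matrix.
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # Q is d x k, Q^T Q = I

x = rng.standard_normal(d)
phi = Q.T @ x          # coefficients phi(x) = (<q_1,x>, ..., <q_k,x>) in R^k
proj = Q @ phi         # projection Pi x = sum_i <q_i, x> q_i in R^d

# The residual x - Pi x is orthogonal to the subspace.
assert np.allclose(Q.T @ (x - proj), 0)
```

Note that the projector Π = Q Q^T never needs to be formed explicitly; two matrix-vector products suffice.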
Projection of minimum residual squared error

Objective: find a k-dimensional projector Π: R^d → R^d such that the average residual squared error

    (1/n) Σ_{i=1}^n ‖x_i − Π x_i‖_2^2

is as small as possible.

k = 1 case (Π = q q^T). Objective: find a unit vector q ∈ R^d to minimize

    (1/n) Σ_{i=1}^n ‖x_i − q q^T x_i‖_2^2
      = (1/n) Σ_{i=1}^n ‖x_i‖_2^2 − q^T ( (1/n) Σ_{i=1}^n x_i x_i^T ) q
      = (1/n) Σ_{i=1}^n ‖x_i‖_2^2 − q^T ( (1/n) A^T A ) q

(where x_i^T is the i-th row of A ∈ R^{n×d}). Therefore

    argmin_{q ∈ R^d : ‖q‖_2 = 1} (1/n) Σ_{i=1}^n ‖x_i − q q^T x_i‖_2^2
      ≡ argmax_{q ∈ R^d : ‖q‖_2 = 1} q^T ( (1/n) A^T A ) q.
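The equivalence above rests on the identity that residual error plus the quadratic form equals the total average squared norm; a quick numerical check (illustrative names, random data):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 4
A = rng.standard_normal((n, d))

def residual(q):
    # average squared residual of projecting rows of A onto span(q)
    R = A - np.outer(A @ q, q)
    return np.mean(np.sum(R**2, axis=1))

def quad(q):
    # quadratic form q^T ((1/n) A^T A) q
    return q @ (A.T @ A / n) @ q

q = rng.standard_normal(d)
q /= np.linalg.norm(q)          # make q a unit vector
total = np.mean(np.sum(A**2, axis=1))

# residual(q) + quad(q) == (1/n) sum_i ||x_i||^2, for any unit q;
# hence minimizing the residual is maximizing the quadratic form.
assert np.isclose(residual(q) + quad(q), total)
```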
Aside: Eigendecompositions

Every symmetric matrix M ∈ R^{d×d} is guaranteed to have an eigendecomposition with real eigenvalues:

    M = V Λ V^T = Σ_{i=1}^d λ_i v_i v_i^T
    (d×d)   (d×d)(d×d)(d×d)

- real eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_d (Λ = diag(λ_1, λ_2, ..., λ_d));
- corresponding orthonormal eigenvectors v_1, v_2, ..., v_d (V = [v_1 v_2 ... v_d]).

Fixed-point characterization of eigenvectors: M v_i = λ_i v_i.
Eigendecompositions

Variational characterization of eigenvectors:

    max_{q ∈ R^d} q^T M q   s.t. ‖q‖_2 = 1

Maximum value: λ_1 (top eigenvalue). Maximizer: v_1 (top eigenvector).

For i > 1:

    max_{q ∈ R^d} q^T M q   s.t. ‖q‖_2 = 1, ⟨q, v_j⟩ = 0 for all j < i

Maximum value: λ_i (i-th largest eigenvalue). Maximizer: v_i (i-th eigenvector).
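The variational characterization can be sanity-checked numerically on a random symmetric matrix (a sketch, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
M = (B + B.T) / 2                   # symmetrize

# np.linalg.eigh returns eigenvalues in ascending order.
evals, evecs = np.linalg.eigh(M)
lam1, v1 = evals[-1], evecs[:, -1]  # top eigenpair

# v1 attains the maximum of q^T M q over unit vectors ...
assert np.isclose(v1 @ M @ v1, lam1)

# ... and random unit vectors do no better.
for _ in range(100):
    q = rng.standard_normal(4)
    q /= np.linalg.norm(q)
    assert q @ M @ q <= lam1 + 1e-9
```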
Principal components analysis (k = 1)

k = 1 case (Π = q q^T):

    argmin_{q ∈ R^d : ‖q‖_2 = 1} (1/n) Σ_{i=1}^n ‖x_i − q q^T x_i‖_2^2
      ≡ argmax_{q ∈ R^d : ‖q‖_2 = 1} q^T ( (1/n) A^T A ) q.

Solution: the eigenvector of A^T A corresponding to its largest eigenvalue (i.e., the top eigenvector v_1).

Note that

    q^T ( (1/n) A^T A ) q = (1/n) Σ_{i=1}^n ⟨q, x_i⟩^2

(the variance in direction q, assuming (1/n) Σ_{i=1}^n x_i = 0).

So: top eigenvector ≡ direction of maximum variance.
Principal components analysis (general k)

General k case (Π = Q Q^T):

    argmin_{Q ∈ R^{d×k} : Q^T Q = I} (1/n) Σ_{i=1}^n ‖x_i − Q Q^T x_i‖_2^2
      ≡ argmax_{Q ∈ R^{d×k} : Q^T Q = I} Σ_{i=1}^k q_i^T ( (1/n) A^T A ) q_i.

Solution: the k eigenvectors of A^T A corresponding to its k largest eigenvalues.

Note that

    Σ_{i=1}^k q_i^T ( (1/n) A^T A ) q_i = Σ_{i=1}^k (1/n) Σ_{j=1}^n ⟨q_i, x_j⟩^2

(the sum of variances in the q_i directions, assuming (1/n) Σ_{i=1}^n x_i = 0).

So: top k eigenvectors ≡ k-dimensional subspace of maximum variance.
Principal components analysis (PCA)

Data matrix A ∈ R^{n×d}.

Rank-k PCA (k-dimensional linear subspace):
- Get the top k eigenvectors V_k := [v_1 v_2 ... v_k] of (1/n) A^T A = (1/n) Σ_{i=1}^n x_i x_i^T.
- Feature map: φ(x) := (⟨v_1, x⟩, ⟨v_2, x⟩, ..., ⟨v_k, x⟩) ∈ R^k.
- Decorrelating property: (1/n) Σ_{i=1}^n φ(x_i) φ(x_i)^T = diag(λ_1, λ_2, ..., λ_k).
- Approximate reconstruction: x ≈ V_k φ(x).
Principal components analysis (PCA), with centering

Data matrix A ∈ R^{n×d}.

Rank-k PCA with centering (k-dimensional affine subspace):
- Get the top k eigenvectors V_k := [v_1 v_2 ... v_k] of (1/n) Σ_{i=1}^n (x_i − µ)(x_i − µ)^T, where µ = (1/n) Σ_{i=1}^n x_i.
- Feature map: φ(x) := (⟨v_1, x − µ⟩, ⟨v_2, x − µ⟩, ..., ⟨v_k, x − µ⟩) ∈ R^k.
- Decorrelating property: (1/n) Σ_{i=1}^n φ(x_i) = 0 and (1/n) Σ_{i=1}^n φ(x_i) φ(x_i)^T = diag(λ_1, λ_2, ..., λ_k).
- Approximate reconstruction: x ≈ µ + V_k φ(x).
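A minimal numpy sketch of rank-k PCA with centering, following the steps above (variable names and the synthetic data are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, k = 500, 6, 2
X = rng.standard_normal((n, d)) @ rng.standard_normal((d, d))  # correlated data

mu = X.mean(axis=0)
C = (X - mu).T @ (X - mu) / n        # (1/n) sum_i (x_i - mu)(x_i - mu)^T

evals, evecs = np.linalg.eigh(C)     # eigh: ascending eigenvalue order
Vk = evecs[:, ::-1][:, :k]           # top-k eigenvectors
lam = evals[::-1][:k]                # top-k eigenvalues

Phi = (X - mu) @ Vk                  # features phi(x_i), one row per point

# Decorrelating property: features have mean 0 and diagonal covariance.
assert np.allclose(Phi.mean(axis=0), 0, atol=1e-8)
assert np.allclose(Phi.T @ Phi / n, np.diag(lam))

X_rec = mu + Phi @ Vk.T              # approximate reconstruction
```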
Example: PCA on OCR digits data

Data {x_i}_{i=1}^n from R^784.

Fraction of residual variance left by the rank-k PCA projection:

    1 − ( Σ_{j=1}^k variance in direction v_j ) / (total variance).

Fraction of residual variance left by the best k coordinate projections:

    1 − ( Σ_{j=1}^k variance in direction e_j ) / (total variance).

[Figure: fraction of residual variance vs. dimension of projection k, for coordinate projections and PCA projections.]
Example: compressing digit images

Pixel images of handwritten 3s (as vectors in R^256).

Mean µ and top eigenvectors v_1, v_2, v_3, v_4 (with eigenvalues λ_1 ≥ λ_2 ≥ λ_3 ≥ λ_4), shown as images.

Reconstructions of an image x for k = 1, k = 10, k = 50, k = 200.

Only have to store k numbers per image, along with the mean µ and the k eigenvectors (256(k + 1) numbers).
Example: eigenfaces

High-dimensional data: 92 × 112 pixel greyscale images of faces (as vectors in R^10304), a subset of the 400 images in the Olivetti Research Face Database.

[Figure: 100 example training images, and the top k = 48 eigenvectors displayed as images.]
Other examples

x ∈ R^d: movement of stock prices for d different stocks in one day.
- Principal component: the combination of stocks that accounts for the most variation in stock price movement.

x ∈ {1, 2, ..., 5}^d: levels at which various terms describe an individual (e.g., "jolly", "impulsive", "outgoing", "conceited", "meddlesome").
- Principal components: major personality axes in a population (e.g., "extroversion", "agreeableness", "conscientiousness").
3. Computation
Power method

Problem: Given a matrix A ∈ R^{n×d}, compute the top eigenvector of A^T A.

- Initialize with a random v̂ ∈ R^d.
- Repeat:
  1. v̂ := A^T A v̂.
  2. v̂ := v̂ / ‖v̂‖_2.

Theorem: For any ε ∈ (0, 1), with high probability (over the choice of the initial v̂),

    v̂^T A^T A v̂ ≥ (1 − ε) · (top eigenvalue of A^T A)

after O((1/ε) log(d/ε)) iterations.

A similar algorithm can be used to get the top k eigenvectors.
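The iteration above translates directly into code; a sketch (a fixed iteration count stands in for a convergence test, and A^T A is never formed explicitly):

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 300, 5
A = rng.standard_normal((n, d))

v = rng.standard_normal(d)        # random initialization
v /= np.linalg.norm(v)
for _ in range(200):
    v = A.T @ (A @ v)             # v := A^T A v (two matrix-vector products)
    v /= np.linalg.norm(v)        # renormalize

# Rayleigh quotient v^T A^T A v approximates the top eigenvalue.
top_eig = np.linalg.eigvalsh(A.T @ A)[-1]
assert v @ (A.T @ A) @ v >= (1 - 1e-6) * top_eig
```

Computing `A.T @ (A @ v)` costs O(nd) per iteration, versus O(d^2) after an O(nd^2) precomputation of A^T A; the former wins when few iterations are needed.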
4. Singular value decomposition
Singular value decomposition

Every matrix A ∈ R^{n×d} has a singular value decomposition (SVD):

    A = U S V^T = Σ_{i=1}^r s_i u_i v_i^T
    (n×d)   (n×r)(r×r)(r×d)

where r = rank(A) (so r ≤ min{n, d});
- U^T U = I (i.e., U = [u_1 u_2 ... u_r] has orthonormal columns): left singular vectors;
- S = diag(s_1, s_2, ..., s_r) where s_1 ≥ s_2 ≥ ... ≥ s_r > 0: singular values;
- V^T V = I (i.e., V = [v_1 v_2 ... v_r] has orthonormal columns): right singular vectors.
SVD vs PCA

If the SVD of A is U S V^T = Σ_{i=1}^r s_i u_i v_i^T, then:
- the non-zero eigenvalues of A^T A are s_1^2, s_2^2, ..., s_r^2 (the squares of the singular values of A);
- the corresponding eigenvectors are v_1, v_2, ..., v_r ∈ R^d (the right singular vectors of A).

By symmetry, we also have:
- the non-zero eigenvalues of A A^T are s_1^2, s_2^2, ..., s_r^2 (the squares of the singular values of A);
- the corresponding eigenvectors are u_1, u_2, ..., u_r ∈ R^n (the left singular vectors of A).
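This correspondence is easy to verify numerically (a sketch on random data):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Non-zero eigenvalues of A^T A are the squared singular values of A.
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]   # descending order
assert np.allclose(eig_AtA, s**2)

# Right singular vectors are eigenvectors of A^T A: (A^T A) v_i = s_i^2 v_i.
for i in range(len(s)):
    v = Vt[i]
    assert np.allclose(A.T @ (A @ v), s[i]**2 * v)
```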
Low-rank SVD

For any k ≤ rank(A), the rank-k SVD approximation is

    Û_k Ŝ_k V̂_k^T = Σ_{i=1}^k s_i u_i v_i^T
    (n×k)(k×k)(k×d)

(Just retain the top k left/right singular vectors and singular values from the SVD.)

Best rank-k approximation:

    Â := Û_k Ŝ_k V̂_k^T = argmin_{M ∈ R^{n×d} : rank(M) ≤ k} Σ_{i=1}^n Σ_{j=1}^d (A_{i,j} − M_{i,j})^2.

The minimum value is simply

    Σ_{i=1}^n Σ_{j=1}^d (A_{i,j} − Â_{i,j})^2 = Σ_{t > k} s_t^2.
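A sketch of rank-k truncation and the error formula above (this optimality result is the Eckart-Young theorem):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((10, 7))
k = 3

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]    # rank-k SVD approximation

# Squared error equals the sum of the discarded squared singular values.
err = np.sum((A - A_hat)**2)
assert np.isclose(err, np.sum(s[k:]**2))
```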
Example: latent semantic analysis

Represent a corpus of documents by the counts of the words they contain:
- one row per document in A ∈ R^{n×d};
- one column per vocabulary word (e.g., "aardvark", "abacus", "abalone", ...);
- A_{i,j} = number of times word j appears in document i.
Example: latent semantic analysis

Statistical model for the document-word count matrix.

Parameters θ = (β_1, β_2, ..., β_k, π_1, π_2, ..., π_n, l_1, l_2, ..., l_n):
- k ≪ min{n, d} topics, each represented by a distribution over vocabulary words: β_1, β_2, ..., β_k ∈ R_+^d. Each β_t = (β_{t,1}, β_{t,2}, ..., β_{t,d}) is a probability vector, so Σ_{j=1}^d β_{t,j} = 1.
- Each document i is associated with a probability distribution π_i = (π_{i,1}, π_{i,2}, ..., π_{i,k}) over topics, so Σ_{t=1}^k π_{i,t} = 1.
- Each document i has a length l_i.

The model posits that document i's count vector (the i-th row of A) follows a multinomial distribution with l_i trials and word probabilities given by the mixture Σ_{t=1}^k π_{i,t} β_t:

    [A_{i,1} A_{i,2} ... A_{i,d}] ~ Multinomial( l_i, Σ_{t=1}^k π_{i,t} β_t ).

Its expected value is l_i Σ_{t=1}^k π_{i,t} β_t^T.
Example: latent semantic analysis

Suppose A ~ P_θ. In expectation, A has rank ≤ k:

    E(A) = L B^T,

where L ∈ R^{n×k} has rows l_1 π_1^T, l_2 π_2^T, ..., l_n π_n^T, and B^T ∈ R^{k×d} has rows β_1^T, β_2^T, ..., β_k^T.

Observed matrix A:

    A = E(A) + (zero-mean noise),

so A is generally of rank min{n, d} ≫ k.
Example: latent semantic analysis

Using the SVD: the rank-k SVD Û_k Ŝ_k V̂_k^T of A gives an approximation to L B^T:

    Â := Û_k Ŝ_k V̂_k^T ≈ E(A).

(The SVD helps remove some of the effect of the noise.)

Each of the n documents can be summarized by k numbers:

    Â V̂_k = Û_k Ŝ_k ∈ R^{n×k}.

This new document feature representation is very useful for information retrieval. (Example: cosine similarities between documents become faster to compute and possibly less noisy.)

Actually estimating the π_i and β_t takes a bit more work.
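The document-summarization step can be sketched on a tiny made-up count matrix (the data and the two implied topics are purely illustrative):

```python
import numpy as np

# Hypothetical documents x words count matrix with two latent topics.
A = np.array([
    [3, 2, 0, 0],   # doc 1: mostly topic 1
    [2, 3, 1, 0],   # doc 2: mostly topic 1
    [0, 1, 3, 2],   # doc 3: mostly topic 2
    [0, 0, 2, 3],   # doc 4: mostly topic 2
], dtype=float)
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
docs = U[:, :k] * s[:k]          # k-number summary per document (U_k S_k)

def cos(u, v):
    # cosine similarity in the k-dimensional representation
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Same-topic documents come out more similar than cross-topic pairs.
assert cos(docs[0], docs[1]) > cos(docs[0], docs[2])
```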
Recap

- PCA: directions of maximum variance in the data ≡ the subspace minimizing residual squared error.
- Computation: power method.
- SVD: a general decomposition for arbitrary matrices.
- Low-rank SVD: the best low-rank approximation of a matrix in terms of average squared error.
- PCA/SVD: often useful when low-rank structure is expected (e.g., in probabilistic modeling).
More information1 Feature Vectors and Time Series
PCA, SVD, LSI, and Kernel PCA 1 Feature Vectors and Time Series We now consider a sample x 1,..., x of objects (not necessarily vectors) and a feature map Φ such that for any object x we have that Φ(x)
More informationDimensionality Reduction
Dimensionality Reduction Le Song Machine Learning I CSE 674, Fall 23 Unsupervised learning Learning from raw (unlabeled, unannotated, etc) data, as opposed to supervised data where a classification of
More informationPCA and LDA. Man-Wai MAK
PCA and LDA Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/ mwmak References: S.J.D. Prince,Computer
More informationCS 3750 Advanced Machine Learning. Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya
CS 375 Advanced Machine Learning Applications of SVD and PCA (LSA and Link analysis) Cem Akkaya Outline SVD and LSI Kleinberg s Algorithm PageRank Algorithm Vector Space Model Vector space model represents
More informationDimensionality reduction
Dimensionality Reduction PCA continued Machine Learning CSE446 Carlos Guestrin University of Washington May 22, 2013 Carlos Guestrin 2005-2013 1 Dimensionality reduction n Input data may have thousands
More informationIV. Matrix Approximation using Least-Squares
IV. Matrix Approximation using Least-Squares The SVD and Matrix Approximation We begin with the following fundamental question. Let A be an M N matrix with rank R. What is the closest matrix to A that
More informationPrincipal Component Analysis
CSci 5525: Machine Learning Dec 3, 2008 The Main Idea Given a dataset X = {x 1,..., x N } The Main Idea Given a dataset X = {x 1,..., x N } Find a low-dimensional linear projection The Main Idea Given
More informationUnsupervised Learning
2018 EE448, Big Data Mining, Lecture 7 Unsupervised Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html ML Problem Setting First build and
More information1 Singular Value Decomposition and Principal Component
Singular Value Decomposition and Principal Component Analysis In these lectures we discuss the SVD and the PCA, two of the most widely used tools in machine learning. Principal Component Analysis (PCA)
More informationLinear Algebra & Geometry why is linear algebra useful in computer vision?
Linear Algebra & Geometry why is linear algebra useful in computer vision? References: -Any book on linear algebra! -[HZ] chapters 2, 4 Some of the slides in this lecture are courtesy to Prof. Octavia
More informationPrincipal Component Analysis (PCA)
Principal Component Analysis (PCA) Additional reading can be found from non-assessed exercises (week 8) in this course unit teaching page. Textbooks: Sect. 6.3 in [1] and Ch. 12 in [2] Outline Introduction
More informationSTAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9
STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 9 1. qr and complete orthogonal factorization poor man s svd can solve many problems on the svd list using either of these factorizations but they
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 5: Numerical Linear Algebra Cho-Jui Hsieh UC Davis April 20, 2017 Linear Algebra Background Vectors A vector has a direction and a magnitude
More informationj=1 u 1jv 1j. 1/ 2 Lemma 1. An orthogonal set of vectors must be linearly independent.
Lecture Notes: Orthogonal and Symmetric Matrices Yufei Tao Department of Computer Science and Engineering Chinese University of Hong Kong taoyf@cse.cuhk.edu.hk Orthogonal Matrix Definition. Let u = [u
More informationLinear Algebra Review. Vectors
Linear Algebra Review 9/4/7 Linear Algebra Review By Tim K. Marks UCSD Borrows heavily from: Jana Kosecka http://cs.gmu.edu/~kosecka/cs682.html Virginia de Sa (UCSD) Cogsci 8F Linear Algebra review Vectors
More informationEECS 275 Matrix Computation
EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 6 1 / 22 Overview
More informationNotes on singular value decomposition for Math 54. Recall that if A is a symmetric n n matrix, then A has real eigenvalues A = P DP 1 A = P DP T.
Notes on singular value decomposition for Math 54 Recall that if A is a symmetric n n matrix, then A has real eigenvalues λ 1,, λ n (possibly repeated), and R n has an orthonormal basis v 1,, v n, where
More informationExpectation Maximization
Expectation Maximization Machine Learning CSE546 Carlos Guestrin University of Washington November 13, 2014 1 E.M.: The General Case E.M. widely used beyond mixtures of Gaussians The recipe is the same
More informationMATH36001 Generalized Inverses and the SVD 2015
MATH36001 Generalized Inverses and the SVD 201 1 Generalized Inverses of Matrices A matrix has an inverse only if it is square and nonsingular. However there are theoretical and practical applications
More informationSystem 1 (last lecture) : limited to rigidly structured shapes. System 2 : recognition of a class of varying shapes. Need to:
System 2 : Modelling & Recognising Modelling and Recognising Classes of Classes of Shapes Shape : PDM & PCA All the same shape? System 1 (last lecture) : limited to rigidly structured shapes System 2 :
More informationPreprocessing & dimensionality reduction
Introduction to Data Mining Preprocessing & dimensionality reduction CPSC/AMTH 445a/545a Guy Wolf guy.wolf@yale.edu Yale University Fall 2016 CPSC 445 (Guy Wolf) Dimensionality reduction Yale - Fall 2016
More informationPrincipal Component Analysis
Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand
More informationLecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26
Principal Component Analysis Brett Bernstein CDS at NYU April 25, 2017 Brett Bernstein (CDS at NYU) Lecture 13 April 25, 2017 1 / 26 Initial Question Intro Question Question Let S R n n be symmetric. 1
More information(a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? Solution: dim N(A) 1, since rank(a) 3. Ax =
. (5 points) (a) If A is a 3 by 4 matrix, what does this tell us about its nullspace? dim N(A), since rank(a) 3. (b) If we also know that Ax = has no solution, what do we know about the rank of A? C(A)
More informationLinear Algebra & Geometry why is linear algebra useful in computer vision?
Linear Algebra & Geometry why is linear algebra useful in computer vision? References: -Any book on linear algebra! -[HZ] chapters 2, 4 Some of the slides in this lecture are courtesy to Prof. Octavia
More informationEECS 275 Matrix Computation
EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 22 1 / 21 Overview
More informationSingular Value Decomposition and Principal Component Analysis (PCA) I
Singular Value Decomposition and Principal Component Analysis (PCA) I Prof Ned Wingreen MOL 40/50 Microarray review Data per array: 0000 genes, I (green) i,i (red) i 000 000+ data points! The expression
More informationMain matrix factorizations
Main matrix factorizations A P L U P permutation matrix, L lower triangular, U upper triangular Key use: Solve square linear system Ax b. A Q R Q unitary, R upper triangular Key use: Solve square or overdetrmined
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 9: Dimension Reduction/Word2vec Cho-Jui Hsieh UC Davis May 15, 2018 Principal Component Analysis Principal Component Analysis (PCA) Data
More informationDimensionality Reduction with Principal Component Analysis
10 Dimensionality Reduction with Principal Component Analysis Working directly with high-dimensional data, such as images, comes with some difficulties: it is hard to analyze, interpretation is difficult,
More informationThe Singular Value Decomposition
The Singular Value Decomposition Philippe B. Laval KSU Fall 2015 Philippe B. Laval (KSU) SVD Fall 2015 1 / 13 Review of Key Concepts We review some key definitions and results about matrices that will
More informationLatent semantic indexing
Latent semantic indexing Relationship between concepts and words is many-to-many. Solve problems of synonymy and ambiguity by representing documents as vectors of ideas or concepts, not terms. For retrieval,
More informationLecture 3: Review of Linear Algebra
ECE 83 Fall 2 Statistical Signal Processing instructor: R Nowak Lecture 3: Review of Linear Algebra Very often in this course we will represent signals as vectors and operators (eg, filters, transforms,
More informationMatrix Decomposition and Latent Semantic Indexing (LSI) Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson
Matrix Decomposition and Latent Semantic Indexing (LSI) Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson Latent Semantic Indexing Outline Introduction Linear Algebra Refresher
More informationFSAN/ELEG815: Statistical Learning
: Statistical Learning Gonzalo R. Arce Department of Electrical and Computer Engineering University of Delaware 3. Eigen Analysis, SVD and PCA Outline of the Course 1. Review of Probability 2. Stationary
More information