Principal components analysis COMS 4771


1. Representation learning

Useful representations of data

Representation learning:
Given: raw feature vectors x_1, x_2, ..., x_n ∈ R^d.
Goal: learn a useful feature transformation φ: R^d → R^k.
(Often k ≪ d, i.e., dimensionality reduction, but not always.)
Can then use φ as a feature map for supervised learning.

Some previously encountered examples:
- Feature maps corresponding to pos. def. kernels (+ approximations).
  (Usually data-oblivious: the feature map doesn't depend on the data.)
- Centering: x ↦ x − μ. (Effect: resulting features have mean 0.)
- Standardization: x ↦ diag(σ_1, σ_2, ..., σ_d)^{-1} (x − μ).
  (Effect: resulting features have mean 0 and unit variance.)

What other properties of a feature representation may be desirable?
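As a concrete illustration of the centering and standardization maps above, here is a minimal numpy sketch; the function names and the small eps guard against zero-variance features are illustrative assumptions, not part of the slides.

```python
import numpy as np

def center(X):
    """Centering: x -> x - mu, so each feature has mean 0."""
    mu = X.mean(axis=0)
    return X - mu, mu

def standardize(X, eps=1e-12):
    """Standardization: x -> diag(sigma_1, ..., sigma_d)^(-1) (x - mu)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / (sigma + eps), mu, sigma

# Illustrative data: n = 5 points in R^3.
X = np.random.default_rng(0).normal(size=(5, 3))
Z, mu, sigma = standardize(X)
print(Z.mean(axis=0))  # approximately 0 in each coordinate
print(Z.std(axis=0))   # approximately 1 in each coordinate
```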

2. Principal components analysis

Dimensionality reduction via projections

Input: x_1, x_2, ..., x_n ∈ R^d, target dimensionality k ∈ N.
Output: a k-dimensional subspace, represented by an orthonormal basis q_1, q_2, ..., q_k ∈ R^d.

(Orthogonal) projection: the projection of x ∈ R^d onto span(q_1, q_2, ..., q_k) is

    Π x = Σ_{i=1}^k q_i q_i^T x = Σ_{i=1}^k ⟨q_i, x⟩ q_i ∈ R^d.

Can also represent the projection of x in terms of its coefficients w.r.t. the orthonormal basis q_1, q_2, ..., q_k:

    φ(x) := (⟨q_1, x⟩, ⟨q_2, x⟩, ..., ⟨q_k, x⟩) ∈ R^k.
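A short numpy sketch of the projection Π x and the coefficient map φ(x); the function name and the toy orthonormal basis are illustrative.

```python
import numpy as np

def project(x, Q):
    """Orthogonal projection of x onto the span of the columns of Q.

    Q is d x k with orthonormal columns; returns (Pi @ x, phi(x)), where
    Pi = Q Q^T and phi(x) = (<q_1, x>, ..., <q_k, x>).
    """
    phi = Q.T @ x          # coefficients w.r.t. q_1, ..., q_k  (in R^k)
    return Q @ phi, phi    # projection Q Q^T x (in R^d), coefficients

# Toy example: an orthonormal basis for a 2-dimensional subspace of R^3.
Q = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
x = np.array([3.0, 4.0, 5.0])
proj, phi = project(x, Q)
print(proj)   # [3. 4. 0.]
print(phi)    # [3. 4.]
```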

Projection of minimum residual squared error

Objective: find a k-dimensional projector Π: R^d → R^d such that the average residual squared error

    (1/n) Σ_{i=1}^n ||x_i − Π x_i||_2^2

is as small as possible.

k = 1 case (Π = qq^T): find a unit vector q ∈ R^d to minimize

    (1/n) Σ_{i=1}^n ||x_i − qq^T x_i||_2^2
      = (1/n) Σ_{i=1}^n ||x_i||_2^2 − q^T ((1/n) Σ_{i=1}^n x_i x_i^T) q      (using ||q||_2 = 1)
      = (1/n) Σ_{i=1}^n ||x_i||_2^2 − q^T ((1/n) A^T A) q      (where x_i^T is the i-th row of A ∈ R^{n×d}).

Since the first term does not depend on q,

    argmin_{q ∈ R^d : ||q||_2 = 1} (1/n) Σ_{i=1}^n ||x_i − qq^T x_i||_2^2
      ≡ argmax_{q ∈ R^d : ||q||_2 = 1} q^T ((1/n) A^T A) q.

Aside: Eigendecompositions

Every symmetric matrix M ∈ R^{d×d} is guaranteed to have an eigendecomposition with real eigenvalues:

    M = V Λ V^T = Σ_{i=1}^d λ_i v_i v_i^T      (all matrices d×d)

- Real eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_d (Λ = diag(λ_1, λ_2, ..., λ_d));
- Corresponding orthonormal eigenvectors v_1, v_2, ..., v_d (V = [v_1 v_2 ... v_d]).

Fixed-point characterization of eigenvectors: M v_i = λ_i v_i.

Eigendecompositions

Variational characterization of eigenvectors:

    max_{q ∈ R^d} q^T M q   s.t. ||q||_2 = 1

Maximum value: λ_1 (top eigenvalue). Maximizer: v_1 (top eigenvector).

For i > 1,

    max_{q ∈ R^d} q^T M q   s.t. ||q||_2 = 1 and ⟨q, v_j⟩ = 0 for all j < i

Maximum value: λ_i (i-th largest eigenvalue). Maximizer: v_i (i-th eigenvector).
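A quick numerical check of the variational characterization, using numpy's eigh (which returns eigenvalues in ascending order); the random symmetric matrix is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(6, 6))
M = (B + B.T) / 2                    # an arbitrary symmetric matrix

evals, evecs = np.linalg.eigh(M)     # eigenvalues in ascending order
lam1, v1 = evals[-1], evecs[:, -1]   # top eigenvalue / top eigenvector

print(v1 @ M @ v1, lam1)             # equal up to floating point

q = rng.normal(size=6)
q /= np.linalg.norm(q)               # an arbitrary unit vector
print(q @ M @ q <= lam1 + 1e-10)     # True: no unit vector does better
```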

Principal components analysis (k = 1)

k = 1 case (Π = qq^T):

    argmin_{q ∈ R^d : ||q||_2 = 1} (1/n) Σ_{i=1}^n ||x_i − qq^T x_i||_2^2
      ≡ argmax_{q ∈ R^d : ||q||_2 = 1} q^T ((1/n) A^T A) q.

Solution: the eigenvector of A^T A corresponding to the largest eigenvalue (i.e., the top eigenvector v_1).

    q^T ((1/n) A^T A) q = (1/n) Σ_{i=1}^n ⟨q, x_i⟩^2

(the variance in direction q, assuming (1/n) Σ_{i=1}^n x_i = 0).

Top eigenvector ≡ direction of maximum variance.

Principal components analysis (general k)

General k case (Π = QQ^T):

    argmin_{Q ∈ R^{d×k} : Q^T Q = I} (1/n) Σ_{i=1}^n ||x_i − QQ^T x_i||_2^2
      ≡ argmax_{Q ∈ R^{d×k} : Q^T Q = I} Σ_{i=1}^k q_i^T ((1/n) A^T A) q_i.

Solution: the k eigenvectors of A^T A corresponding to the k largest eigenvalues.

    Σ_{i=1}^k q_i^T ((1/n) A^T A) q_i = Σ_{i=1}^k (1/n) Σ_{j=1}^n ⟨q_i, x_j⟩^2

(the sum of variances in the q_i directions, assuming (1/n) Σ_{i=1}^n x_i = 0).

Top k eigenvectors ≡ k-dimensional subspace of maximum variance.

Principal components analysis (PCA)

Data matrix A ∈ R^{n×d}.

Rank-k PCA (k-dimensional linear subspace):
Get the top k eigenvectors V_k := [v_1 v_2 ... v_k] of

    (1/n) A^T A = (1/n) Σ_{i=1}^n x_i x_i^T.

Feature map: φ(x) := (⟨v_1, x⟩, ⟨v_2, x⟩, ..., ⟨v_k, x⟩) ∈ R^k.
Decorrelating property: (1/n) Σ_{i=1}^n φ(x_i) φ(x_i)^T = diag(λ_1, λ_2, ..., λ_k).
Approx. reconstruction: x ≈ V_k φ(x).

Principal components analysis (PCA)

Data matrix A ∈ R^{n×d}.

Rank-k PCA with centering (k-dimensional affine subspace):
Get the top k eigenvectors V_k := [v_1 v_2 ... v_k] of

    (1/n) Σ_{i=1}^n (x_i − μ)(x_i − μ)^T,   where μ = (1/n) Σ_{i=1}^n x_i.

Feature map: φ(x) := (⟨v_1, x − μ⟩, ⟨v_2, x − μ⟩, ..., ⟨v_k, x − μ⟩) ∈ R^k.
Decorrelating property: (1/n) Σ_{i=1}^n φ(x_i) = 0 and (1/n) Σ_{i=1}^n φ(x_i) φ(x_i)^T = diag(λ_1, λ_2, ..., λ_k).
Approx. reconstruction: x ≈ μ + V_k φ(x).
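A minimal numpy sketch of rank-k PCA with centering as described above; the function name, the synthetic data, and the use of an explicit covariance matrix (rather than the SVD) are illustrative choices.

```python
import numpy as np

def pca(X, k):
    """Rank-k PCA with centering (a sketch of the procedure on the slide above).

    X is n x d; returns (mu, V_k, Phi) with Phi[i] = phi(x_i) in R^k.
    """
    n, d = X.shape
    mu = X.mean(axis=0)
    C = (X - mu).T @ (X - mu) / n        # (1/n) sum_i (x_i - mu)(x_i - mu)^T
    evals, evecs = np.linalg.eigh(C)     # eigenvalues in ascending order
    V_k = evecs[:, ::-1][:, :k]          # top-k eigenvectors v_1, ..., v_k
    Phi = (X - mu) @ V_k                 # feature map applied to each x_i
    return mu, V_k, Phi

# Illustrative data: 200 points in R^5 whose variance lies mostly in 2 directions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))
mu, V_k, Phi = pca(X, k=2)
X_hat = mu + Phi @ V_k.T                             # reconstruction x ~ mu + V_k phi(x)
print(np.mean(np.sum((X - X_hat) ** 2, axis=1)))     # small residual squared error
print(np.round(Phi.T @ Phi / len(X), 3))             # ~ diag(lambda_1, lambda_2): decorrelated
```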

Example: PCA on OCR digits data

Data {x_i}_{i=1}^n from R^784.

Fraction of residual variance left by the rank-k PCA projection:

    1 − (Σ_{j=1}^k variance in direction v_j) / (total variance).

Fraction of residual variance left by the best k coordinate projections:

    1 − (Σ_{j=1}^k variance in direction e_j) / (total variance).

[Plot: fraction of residual variance vs. dimension of projection k, for coordinate projections and PCA projections.]

Example: compressing digits images

Pixel images of handwritten 3s (as vectors in R^256).

[Figures: the mean μ and the top eigenvectors v_1, v_2, v_3, v_4 with their eigenvalues λ_1, ..., λ_4; reconstructions of an image x using k = 1, 10, 50, and 200 components.]

Only have to store k numbers per image, along with the mean μ and the k eigenvectors (256(k + 1) numbers).

Example: eigenfaces

High-dimensional data: 92 × 112 pixel greyscale images of faces (as vectors in R^10304), a subset of the 400 images in the full Olivetti Face Database.

[Figures: 100 example training images; the top k = 48 eigenvectors ("eigenfaces").]

Other examples

x ∈ R^d: movement of stock prices for d different stocks in one day.
Principal component: a combination of stocks that accounts for the most variation in stock price movement.

x ∈ {1, 2, ..., 5}^d: levels at which various terms describe an individual (e.g., "jolly", "impulsive", "outgoing", "conceited", "meddlesome").
Principal components: major personality axes in a population (e.g., "extroversion", "agreeableness", "conscientiousness").

3. Computation

Power method

Problem: Given a matrix A ∈ R^{n×d}, compute the top eigenvector of A^T A.

Initialize with a random v̂ ∈ R^d. Repeat:
1. v̂ := A^T A v̂.
2. v̂ := v̂ / ||v̂||_2.

Theorem: For any ε ∈ (0, 1), with high probability (over the choice of the initial v̂),

    v̂^T A^T A v̂ ≥ (1 − ε) · (top eigenvalue of A^T A)

after O((1/ε) log(d/ε)) iterations.

A similar algorithm can be used to get the top k eigenvectors.
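A minimal numpy sketch of the power method above; the fixed iteration count is an illustrative stopping rule rather than the O((1/ε) log(d/ε)) bound from the theorem.

```python
import numpy as np

def power_method(A, n_iters=200, seed=0):
    """Power iteration for the top eigenvector of A^T A (a sketch of the slide above)."""
    d = A.shape[1]
    v = np.random.default_rng(seed).normal(size=d)   # random initialization
    v /= np.linalg.norm(v)
    for _ in range(n_iters):
        v = A.T @ (A @ v)          # v := A^T A v, without forming A^T A explicitly
        v /= np.linalg.norm(v)     # v := v / ||v||_2
    return v

# Compare against a direct eigendecomposition on a small example.
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 8))
v_hat = power_method(A)
evals, evecs = np.linalg.eigh(A.T @ A)
v1 = evecs[:, -1]
print(abs(v_hat @ v1))                       # ~ 1: same direction up to sign
print(v_hat @ (A.T @ A) @ v_hat, evals[-1])  # Rayleigh quotient ~ top eigenvalue
```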

4. Singular value decomposition

Singular value decomposition

Every matrix A ∈ R^{n×d} has a singular value decomposition (SVD)

    A = U S V^T = Σ_{i=1}^r s_i u_i v_i^T
  (n×d) (n×r)(r×r)(r×d)

where r = rank(A) (so r ≤ min{n, d});
- U^T U = I (i.e., U = [u_1 u_2 ... u_r] has orthonormal columns): left singular vectors;
- S = diag(s_1, s_2, ..., s_r) with s_1 ≥ s_2 ≥ ... ≥ s_r > 0: singular values;
- V^T V = I (i.e., V = [v_1 v_2 ... v_r] has orthonormal columns): right singular vectors.

SVD vs PCA

If the SVD of A is U S V^T = Σ_{i=1}^r s_i u_i v_i^T, then:
- the non-zero eigenvalues of A^T A are s_1^2, s_2^2, ..., s_r^2 (the squares of the singular values of A);
- the corresponding eigenvectors are v_1, v_2, ..., v_r ∈ R^d (the right singular vectors of A).

By symmetry, we also have:
- the non-zero eigenvalues of A A^T are s_1^2, s_2^2, ..., s_r^2 (the squares of the singular values of A);
- the corresponding eigenvectors are u_1, u_2, ..., u_r ∈ R^n (the left singular vectors of A).

Low-rank SVD

For any k ≤ rank(A), the rank-k SVD approximation is

    Û_k Ŝ_k V̂_k^T = Σ_{i=1}^k s_i u_i v_i^T
   (n×k)(k×k)(k×d)

(Just retain the top k left/right singular vectors and singular values from the SVD.)

Best rank-k approximation:

    Â := Û_k Ŝ_k V̂_k^T = argmin_{M ∈ R^{n×d} : rank(M) ≤ k} Σ_{i=1}^n Σ_{j=1}^d (A_{i,j} − M_{i,j})^2.

The minimum value is simply Σ_{t>k} s_t^2.
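A short numpy check of the rank-k SVD approximation and of the error formula Σ_{t>k} s_t^2; the matrix sizes and rank k are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 25))
U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U diag(s) V^T

k = 5
A_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # rank-k SVD approximation

# Squared Frobenius error of the best rank-k approximation equals sum_{t>k} s_t^2:
err = np.sum((A - A_hat) ** 2)
print(err, np.sum(s[k:] ** 2))                     # equal up to floating point
```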

Example: latent semantic analysis

Represent a corpus of documents by counts of the words they contain:

                 aardvark   abacus   abalone   ...
    document 1       ·         ·         ·
    document 2       ·         ·         ·
    document 3       ·         ·         ·
       ...

- One row per document and one column per vocabulary word, in A ∈ R^{n×d}.
- A_{i,j} = number of times word j appears in document i.

Example: latent semantic analysis

A statistical model for the document-word count matrix.

Parameters: θ = (β_1, β_2, ..., β_k, π_1, π_2, ..., π_n, l_1, l_2, ..., l_n).

- k ≤ min{n, d} topics, each represented by a distribution over vocabulary words: β_1, β_2, ..., β_k ∈ R_+^d. Each β_t = (β_{t,1}, β_{t,2}, ..., β_{t,d}) is a probability vector, so Σ_{j=1}^d β_{t,j} = 1.
- Each document i is associated with a probability distribution π_i = (π_{i,1}, π_{i,2}, ..., π_{i,k}) over topics, so Σ_{t=1}^k π_{i,t} = 1.
- The model posits that document i's count vector (the i-th row of A) follows a multinomial distribution:

      [A_{i,1}  A_{i,2}  ...  A_{i,d}] ~ Multinomial(l_i, Σ_{t=1}^k π_{i,t} β_t).

  Its expected value is l_i Σ_{t=1}^k π_{i,t} β_t^T.
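A small numpy sketch that samples a synthetic document-word count matrix from this model; the sizes, the Dirichlet draws used to generate the β_t and π_i, and the document-length range are illustrative assumptions rather than part of the model's specification.

```python
import numpy as np

# A small synthetic instance of the model above (all sizes are illustrative).
rng = np.random.default_rng(0)
n, d, k = 100, 50, 3                       # documents, vocabulary size, topics

beta = rng.dirichlet(np.ones(d), size=k)   # topic-word distributions beta_1, ..., beta_k
pi = rng.dirichlet(np.ones(k), size=n)     # per-document topic distributions pi_1, ..., pi_n
lengths = rng.integers(50, 200, size=n)    # document lengths l_1, ..., l_n

# Row i of A ~ Multinomial(l_i, sum_t pi_{i,t} beta_t).
word_probs = pi @ beta                     # n x d matrix of per-document word probabilities
A = np.vstack([rng.multinomial(lengths[i], word_probs[i]) for i in range(n)])
print(A.shape)                             # (100, 50); E[A] has rank at most k
```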

Example: latent semantic analysis

Suppose A ~ P_θ. In expectation, A has rank k:

    E(A) = [ l_1 π_1^T ]   [ β_1^T ]
           [ l_2 π_2^T ]   [ β_2^T ]
           [    ...    ] · [  ...  ]
           [ l_n π_n^T ]   [ β_k^T ]
              (n × k)       (k × d)

Observed matrix A:

    A = E(A) + zero-mean noise,

so A is generally of rank min{n, d}, not k.

Example: latent semantic analysis

Using the SVD: the rank-k SVD Û_k Ŝ_k V̂_k^T of A gives an approximation to E(A) = L B^T (with L the n×k matrix whose i-th row is l_i π_i^T, and B = [β_1 β_2 ... β_k]):

    Â := Û_k Ŝ_k V̂_k^T ≈ E(A).

(The SVD helps remove some of the effect of the noise.)

Each of the n documents can be summarized by k numbers:

    Â V̂_k = Û_k Ŝ_k ∈ R^{n×k}.

This new document feature representation is very useful for information retrieval. (Example: cosine similarities between documents become faster to compute and possibly less noisy.)

Actually estimating the π_i and β_t takes a bit more work.
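A minimal numpy sketch of the rank-k document summaries Û_k Ŝ_k and a cosine similarity between two documents; the helper names and the placeholder count matrix are illustrative.

```python
import numpy as np

def lsa_features(A, k):
    """Summarize each document (row of A) by k numbers via the rank-k SVD (a sketch)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] * s[:k]                 # hat{U}_k hat{S}_k: one k-dim row per document

def cosine_sim(u, v, eps=1e-12):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps)

# Placeholder document-word counts (any n x d nonnegative count matrix would do).
rng = np.random.default_rng(0)
A = rng.poisson(1.0, size=(100, 50)).astype(float)
Z = lsa_features(A, k=3)
print(Z.shape)                              # (100, 3)
print(cosine_sim(Z[0], Z[1]))               # similarity between documents 0 and 1
```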

Recap

- PCA: directions of maximum variance in the data ≡ the subspace that minimizes residual squared error.
- Computation: power method.
- SVD: a general decomposition for arbitrary matrices.
- Low-rank SVD: the best low-rank approximation of a matrix in terms of average squared error.
- PCA/SVD: often useful when low-rank structure is expected (e.g., probabilistic modeling).
