Principal Component Analysis and Linear Discriminant Analysis

Size: px
Start display at page:

Download "Principal Component Analysis and Linear Discriminant Analysis"

Transcription

1 Principal Component Analysis and Linear Discriminant Analysis Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL /29

2 Decomposition and Components Decomposition is a great idea. Linear decomposition and linear basis, e.g., the Fourier transform The bases construct the feature space may be orthogonal bases, may be not give the direction to find the components specified vs. learnt? The features are the image (or projection) of the original signal in the feature space e.g., the orthogonal projection of the original signal onto the feature space the projection does not have to be orthogonal Feature extraction 2/29

3 Outline Principal Component Analysis Linear Discriminant Analysis Comparison between PCA and LDA 3/29

4 Principal Components and Subspaces Subspaces preserve part of the information (and energy, or uncertainty) Principal components are orthogonal bases and preserve the large portion of the information of the data capture the major uncertainties (or variations) of data Two views Deterministic: minimizing the distortion of projection of the data Statistical: maximizing the uncertainty in data are they the same? under what condition they are the same? 4/29

5 View 1: Minimizing the MSE x ( WW T )x x R n, and assume centering E{x} = 0. m is the dim of the subspace, m < n orthonormal bases W = [w 1,w 2,...,w m ] where W T W = I, i.e., rotation orthogonal projection of x: Px = m (wi T x)w i = (WW T )x i=1 it achieves the minimum mean-square error (prove it!) e min MSE = E{ x Px 2 } = E{ P x } PCA can be posed as: finding a subspace that minimizes the MSE: argminj MSE (W) = E{ x Px 2 }, s.t., W T W = I W 5/29

6 Let do it... It is easy to see: So, J MSE (W) = E{x T x} E{x T Px} minimizing J MSE (W) maximizing E{x T Px} Then we have the following constrained optimization problem The Lagrangian is max W E{xT WW T x} s.t. W T W = I L(W,λ) = E{x T WW T x}+λ T (I W T W) The set of KKT conditions gives: L(W, λ) w i = 2E{xx T }w i 2λ i w i, i 6/29

7 What is it? Let s denote by S = E{xx T } (note: E{x} = 0). The KKT conditions give: or in a more concise matrix form: Sw i = λ i w i, i SW = λ i W What is this? Then, the value of minimum MSE is e min MSE = n i=m+1 i.e., the sum of the eigenvalues of the orthogonal subspace to the PCA subspace. λ i 7/29

8 View 2: Maximizing the Variation Let s look it from another perspective: We have a linear projection of x to a 1-d subspace y = w T x an important note: E{y} = 0 as E{x} = 0 The first principal component of x is such that the variance of the projection y is maximized of course, we need to constrain w to be a unit vector. so we have the following optimization problem what is it? max w J(w) = E{y2 } = E{(w T x) 2 }, s.t. w T w = 1 max w J(w) = wt Sw, s.t. w T w = 1 8/29

9 Maximizing the Variation (cont.) The sorted eigenvalues of S are λ 1 λ 2... λ n, and eigenvectors are {e 1,...,e n }. It is clearly that the first PC is y 1 = e T 1 x This can be generalized to m PCs (where m < n) with one more constraint E{y m y k } = 0, k < m i.e., the PCs are uncorrelated with all previously found PCs The solution is: Sounds familiar? w k = e k 9/29

10 The Two Views Converge The two views lead to the same result! You should prove: uncorrelated components orthonormal projection bases What if we are more greedy, say needing independent components? Do we shall expect orthonormal bases? In which case, we still have orthonormal bases? We ll see it in next lecture. 10/29

11 The Closed-Form Solution Learning the principal components from {x 1,...,x N }: 1. calculating m = 1 N N k=1 x k 2. centering A = [x 1 m,...,x N m] 3. calculating S = N k=1 (x k m)(x k m) T = AA T 4. eigenvalue decomposition 5. sorting λ i and e i 6. finding the bases S = U T ΣU W = [e 1,e 2,...,e m ] Note: The components for x is y = W T (x m), where x R n and y R m 11/29

12 A First Issue n is the dimension of input data, N is the size of the training set In practice, n N E.g., in image-based face recognition, if the resolution of a face image is , when stacking all the pixels, we end up n = 10,000. Note that S is a n n matrix Difficulties: S is ill-conditioned, as in general rank(s) n Eigenvalue decomposition of S is too demanding So, what should we do? 12/29

13 Solution I: First Trick A is a n N matrix, then S = AA T is n n, but A T A is N N Trick Let s do eigenvalue decomposition on A T A A T Ae = λe AA T Ae = λae i.e., if e is an eigenvector of A T A, then Ae is the eigenvector of AA T and the corresponding eigenvalues are the same Don t forget to normalize Ae Note: This trick does not fully solved the problem, as we still need to do eigenvalue decomposition on a N N matrix, which can be fairly large in practice. 13/29

14 Solution II: Using SVD Instead of doing EVD, doing SVD (singular value decomposition) is easier A R n R N A = UΣV T U R n R N, and U T U = I Σ R N R N, is diagonal V R N R N, and V T V = VV T = I 14/29

15 Solution III: Iterative Solution We can design an iterative procedure for finding W, i.e., W W+ W looking at the View of MSE minimization, our cost function: x m (wi T x)w i 2 = x i=1 m y i w i 2 = x (WW T )x 2 i=1 we can stop updating if the KKT is met m w i = γy i [x y i w i ] i=1 Its matrix form is: subspace learning algorithm Two issues: W = γ(xx T W WW T xx T W) The orthogonality is not reinforced Slow convergence 15/29

16 Solution IV: PAST To speed up the iteration, we can use recursive least squares (RLS). We can consider the following cost function J(t) = t β t i x(i) W(t)y(i) 2 i=1 where β is the forgetting factor. W can be solved recursively by the following PAST algorithm 1. y(t) = W T (t 1)x(t) 2. h(t) = P(t 1)y(t) 3. m(t) = h(t)/(β +y T (t)h(t)) 4. P(t) = 1 β Tri[P(t 1) m(t)ht (t)] 5. e(t) = x(t) W(t 1)y(t) 6. W(t) = W(t 1)+e(t)m T (t) 16/29

17 Outline Principal Component Analysis Linear Discriminant Analysis Comparison between PCA and LDA 17/29

18 Face Recognition: Does PCA work well? The same face under different illumination conditions What does PCA capture? Is this what we really want? 18/29

19 From Descriptive to Discriminative PCA extracts features (or components) that well describe the pattern Are they necessarily good for distinguishing between classes and separating patterns? Examples? We need discriminative features. Supervision (or labeled training data) is needed. The issues are: How do we define the discriminant and separability between classes? How many features do we need? How do we maximizing the separability? Here, we give an example of linear discriminant analysis. 19/29

20 Linear Discriminant Analysis Finding an optimal linear projection W Catches major difference between classes and discount irrelevant factors In the projected discriminative subspace, data are clustered 20/29

21 Within-class and Between-class Scatters We have two sets of labeled data: D 1 = {x 1,...,x n1 } and D 2 = {x 1,...,x n2 }. Let s define some terms: The centers of two classes, m i = 1 n i x D i x i Data scatter by definition S = (x m)(x m) T x D Within-class scatter: Between-class scatter: S w = S 1 +S 2 S b = (m 1 m 2 )(m 1 m 2 ) T 21/29

22 Fisher Liner Discriminant Input: We have two sets of labeled data: D 1 = {x 1,...,x n1 } and D 2 = {x 1,...,x n2 }. Output: We want to find a 1-d linear projection w that maximizes the separability between these two classes. Projected data: Y 1 = w T D 1 and Y 2 = w T D 2 Projected class centers: m i = w T m i Projected within-class scatter (it is a scalar in this case) S w = w T S w w prove it! Projected between-class scatter (it is a scalar in this case) Fisher Linear Discriminant S b = w T S b w prove it! J(w) = m 1 m 2 2 S 1 + S 2 = S b S w = wt S b w w T S w w 22/29

23 Rayleigh Quotient Theorem f(λ) = Ax λbx B where z B = z T B 1 z is minimized by the Rayleigh quotient λ = xt Ax x T Bx Proof. f(λ) λ = (Bx) T (Bz) = x T B T B 1 z = x T (Ax λbx) = x T Ax λx T Bx setting it to zero to see the result clearly. 23/29

24 Optimizing Fish Discriminant Theorem J(w) = wt S b w w T S ww is maximized when S b w = λs w w Proof. Let w T S w w = c 0. We can construct the Lagrangian as L(w,λ) = w T S b w λ(w T S w w c) Then KKT is It is clearly that L(w, λ) w = S b w λs w w S b w = λs w w 24/29

25 An Efficient Solution A naive solution is S 1 w S b w = λw Then we can do EVD on S 1 w S b, which needs some computation Is there a more efficient way? Facts: Sb w is along the direction of m 1 m 2. why? we don t care about λ the scalar factor So we can easily figure out the direction of w by Note: rank(s b ) = 1 w = S 1 w (m 1 m 2 ) 25/29

26 Multiple Discriminant Analysis Now, we have c number of classes: within-class scatter S w = c i=1 S i as before between-class scatter is a bit different from 2-class total scatter c S b = n i (m i m)(m i m) T i=1 S t = (x m)(x m) T = S w +S b x MDA is to find a subspace with bases W that maximizes J(W) = S b S w = WT S b W W T S w W 26/29

27 The Solution to MDA The solution is obtained by G-EVD S b w i = λ i S w w i where each w i is a generalized eigenvector In practice, what we can do is the following find the eigenvalues as the root of the characteristic polynomial for each λi, solve w i from S b λ i S w = 0 (S b λ i S w )w i = 0 Note: W is not unique (up to rotation and scaling) Note: rank(s b ) (c 1) (why?) 27/29

28 Outline Principal Component Analysis Linear Discriminant Analysis Comparison between PCA and LDA 28/29

29 The Relation between PCA and LDA LDA PCA 29/29

Dimension Reduction and Low-dimensional Embedding

Dimension Reduction and Low-dimensional Embedding Dimension Reduction and Low-dimensional Embedding Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1/26 Dimension

More information

Pattern Recognition 2

Pattern Recognition 2 Pattern Recognition 2 KNN,, Dr. Terence Sim School of Computing National University of Singapore Outline 1 2 3 4 5 Outline 1 2 3 4 5 The Bayes Classifier is theoretically optimum. That is, prob. of error

More information

Linear Dimensionality Reduction

Linear Dimensionality Reduction Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis

More information

December 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis

December 20, MAA704, Multivariate analysis. Christopher Engström. Multivariate. analysis. Principal component analysis .. December 20, 2013 Todays lecture. (PCA) (PLS-R) (LDA) . (PCA) is a method often used to reduce the dimension of a large dataset to one of a more manageble size. The new dataset can then be used to make

More information

Linear Algebra Methods for Data Mining

Linear Algebra Methods for Data Mining Linear Algebra Methods for Data Mining Saara Hyvönen, Saara.Hyvonen@cs.helsinki.fi Spring 2007 Linear Discriminant Analysis Linear Algebra Methods for Data Mining, Spring 2007, University of Helsinki Principal

More information

PCA and LDA. Man-Wai MAK

PCA and LDA. Man-Wai MAK PCA and LDA Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/ mwmak References: S.J.D. Prince,Computer

More information

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given

More information

Machine Learning - MT & 14. PCA and MDS

Machine Learning - MT & 14. PCA and MDS Machine Learning - MT 2016 13 & 14. PCA and MDS Varun Kanade University of Oxford November 21 & 23, 2016 Announcements Sheet 4 due this Friday by noon Practical 3 this week (continue next week if necessary)

More information

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas

Dimensionality Reduction: PCA. Nicholas Ruozzi University of Texas at Dallas Dimensionality Reduction: PCA Nicholas Ruozzi University of Texas at Dallas Eigenvalues λ is an eigenvalue of a matrix A R n n if the linear system Ax = λx has at least one non-zero solution If Ax = λx

More information

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil

GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis. Massimiliano Pontil GI07/COMPM012: Mathematical Programming and Research Methods (Part 2) 2. Least Squares and Principal Components Analysis Massimiliano Pontil 1 Today s plan SVD and principal component analysis (PCA) Connection

More information

Lecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University

Lecture 24: Principal Component Analysis. Aykut Erdem May 2016 Hacettepe University Lecture 4: Principal Component Analysis Aykut Erdem May 016 Hacettepe University This week Motivation PCA algorithms Applications PCA shortcomings Autoencoders Kernel PCA PCA Applications Data Visualization

More information

PCA and LDA. Man-Wai MAK

PCA and LDA. Man-Wai MAK PCA and LDA Man-Wai MAK Dept. of Electronic and Information Engineering, The Hong Kong Polytechnic University enmwmak@polyu.edu.hk http://www.eie.polyu.edu.hk/ mwmak References: S.J.D. Prince,Computer

More information

CS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision)

CS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision) CS4495/6495 Introduction to Computer Vision 8B-L2 Principle Component Analysis (and its use in Computer Vision) Wavelength 2 Wavelength 2 Principal Components Principal components are all about the directions

More information

Image Analysis & Retrieval. Lec 14. Eigenface and Fisherface

Image Analysis & Retrieval. Lec 14. Eigenface and Fisherface Image Analysis & Retrieval Lec 14 Eigenface and Fisherface Zhu Li Dept of CSEE, UMKC Office: FH560E, Email: lizhu@umkc.edu, Ph: x 2346. http://l.web.umkc.edu/lizhu Z. Li, Image Analysis & Retrv, Spring

More information

Example: Face Detection

Example: Face Detection Announcements HW1 returned New attendance policy Face Recognition: Dimensionality Reduction On time: 1 point Five minutes or more late: 0.5 points Absent: 0 points Biometrics CSE 190 Lecture 14 CSE190,

More information

Course 495: Advanced Statistical Machine Learning/Pattern Recognition

Course 495: Advanced Statistical Machine Learning/Pattern Recognition Course 495: Advanced Statistical Machine Learning/Pattern Recognition Deterministic Component Analysis Goal (Lecture): To present standard and modern Component Analysis (CA) techniques such as Principal

More information

1 Principal Components Analysis

1 Principal Components Analysis Lecture 3 and 4 Sept. 18 and Sept.20-2006 Data Visualization STAT 442 / 890, CM 462 Lecture: Ali Ghodsi 1 Principal Components Analysis Principal components analysis (PCA) is a very popular technique for

More information

Face Recognition. Face Recognition. Subspace-Based Face Recognition Algorithms. Application of Face Recognition

Face Recognition. Face Recognition. Subspace-Based Face Recognition Algorithms. Application of Face Recognition ace Recognition Identify person based on the appearance of face CSED441:Introduction to Computer Vision (2017) Lecture10: Subspace Methods and ace Recognition Bohyung Han CSE, POSTECH bhhan@postech.ac.kr

More information

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) Principal Component Analysis (PCA) Salvador Dalí, Galatea of the Spheres CSC411/2515: Machine Learning and Data Mining, Winter 2018 Michael Guerzhoy and Lisa Zhang Some slides from Derek Hoiem and Alysha

More information

Computational Methods. Eigenvalues and Singular Values

Computational Methods. Eigenvalues and Singular Values Computational Methods Eigenvalues and Singular Values Manfred Huber 2010 1 Eigenvalues and Singular Values Eigenvalues and singular values describe important aspects of transformations and of data relations

More information

Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 11-1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed

More information

MATH 829: Introduction to Data Mining and Analysis Principal component analysis

MATH 829: Introduction to Data Mining and Analysis Principal component analysis 1/11 MATH 829: Introduction to Data Mining and Analysis Principal component analysis Dominique Guillot Departments of Mathematical Sciences University of Delaware April 4, 2016 Motivation 2/11 High-dimensional

More information

Machine Learning 2nd Edition

Machine Learning 2nd Edition INTRODUCTION TO Lecture Slides for Machine Learning 2nd Edition ETHEM ALPAYDIN, modified by Leonardo Bobadilla and some parts from http://www.cs.tau.ac.il/~apartzin/machinelearning/ The MIT Press, 2010

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis Yingyu Liang yliang@cs.wisc.edu Computer Sciences Department University of Wisconsin, Madison [based on slides from Nina Balcan] slide 1 Goals for the lecture you should understand

More information

Advanced Introduction to Machine Learning CMU-10715

Advanced Introduction to Machine Learning CMU-10715 Advanced Introduction to Machine Learning CMU-10715 Principal Component Analysis Barnabás Póczos Contents Motivation PCA algorithms Applications Some of these slides are taken from Karl Booksh Research

More information

Lecture: Face Recognition and Feature Reduction

Lecture: Face Recognition and Feature Reduction Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the

More information

ECE 661: Homework 10 Fall 2014

ECE 661: Homework 10 Fall 2014 ECE 661: Homework 10 Fall 2014 This homework consists of the following two parts: (1) Face recognition with PCA and LDA for dimensionality reduction and the nearest-neighborhood rule for classification;

More information

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) Principal Component Analysis (PCA) Additional reading can be found from non-assessed exercises (week 8) in this course unit teaching page. Textbooks: Sect. 6.3 in [1] and Ch. 12 in [2] Outline Introduction

More information

Data Mining Lecture 4: Covariance, EVD, PCA & SVD

Data Mining Lecture 4: Covariance, EVD, PCA & SVD Data Mining Lecture 4: Covariance, EVD, PCA & SVD Jo Houghton ECS Southampton February 25, 2019 1 / 28 Variance and Covariance - Expectation A random variable takes on different values due to chance The

More information

Tutorial on Principal Component Analysis

Tutorial on Principal Component Analysis Tutorial on Principal Component Analysis Copyright c 1997, 2003 Javier R. Movellan. This is an open source document. Permission is granted to copy, distribute and/or modify this document under the terms

More information

What is Principal Component Analysis?

What is Principal Component Analysis? What is Principal Component Analysis? Principal component analysis (PCA) Reduce the dimensionality of a data set by finding a new set of variables, smaller than the original set of variables Retains most

More information

MLCC 2015 Dimensionality Reduction and PCA

MLCC 2015 Dimensionality Reduction and PCA MLCC 2015 Dimensionality Reduction and PCA Lorenzo Rosasco UNIGE-MIT-IIT June 25, 2015 Outline PCA & Reconstruction PCA and Maximum Variance PCA and Associated Eigenproblem Beyond the First Principal Component

More information

Principal Component Analysis

Principal Component Analysis B: Chapter 1 HTF: Chapter 1.5 Principal Component Analysis Barnabás Póczos University of Alberta Nov, 009 Contents Motivation PCA algorithms Applications Face recognition Facial expression recognition

More information

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang

Deep Learning Basics Lecture 7: Factor Analysis. Princeton University COS 495 Instructor: Yingyu Liang Deep Learning Basics Lecture 7: Factor Analysis Princeton University COS 495 Instructor: Yingyu Liang Supervised v.s. Unsupervised Math formulation for supervised learning Given training data x i, y i

More information

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017

COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 COMS 4721: Machine Learning for Data Science Lecture 19, 4/6/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University PRINCIPAL COMPONENT ANALYSIS DIMENSIONALITY

More information

Lecture 5 Supspace Tranformations Eigendecompositions, kernel PCA and CCA

Lecture 5 Supspace Tranformations Eigendecompositions, kernel PCA and CCA Lecture 5 Supspace Tranformations Eigendecompositions, kernel PCA and CCA Pavel Laskov 1 Blaine Nelson 1 1 Cognitive Systems Group Wilhelm Schickard Institute for Computer Science Universität Tübingen,

More information

Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface

Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface CS/EE 5590 / ENG 401 Special Topics, Spring 2018 Image Analysis & Retrieval Lec 14 - Eigenface & Fisherface Zhu Li Dept of CSEE, UMKC http://l.web.umkc.edu/lizhu Office Hour: Tue/Thr 2:30-4pm@FH560E, Contact:

More information

Introduction to Graphical Models

Introduction to Graphical Models Introduction to Graphical Models The 15 th Winter School of Statistical Physics POSCO International Center & POSTECH, Pohang 2018. 1. 9 (Tue.) Yung-Kyun Noh GENERALIZATION FOR PREDICTION 2 Probabilistic

More information

Computation. For QDA we need to calculate: Lets first consider the case that

Computation. For QDA we need to calculate: Lets first consider the case that Computation For QDA we need to calculate: δ (x) = 1 2 log( Σ ) 1 2 (x µ ) Σ 1 (x µ ) + log(π ) Lets first consider the case that Σ = I,. This is the case where each distribution is spherical, around the

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

PCA, Kernel PCA, ICA

PCA, Kernel PCA, ICA PCA, Kernel PCA, ICA Learning Representations. Dimensionality Reduction. Maria-Florina Balcan 04/08/2015 Big & High-Dimensional Data High-Dimensions = Lot of Features Document classification Features per

More information

Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling

Machine Learning. B. Unsupervised Learning B.2 Dimensionality Reduction. Lars Schmidt-Thieme, Nicolas Schilling Machine Learning B. Unsupervised Learning B.2 Dimensionality Reduction Lars Schmidt-Thieme, Nicolas Schilling Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University

More information

Machine Learning. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Machine Learning. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Machine Learning Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Machine Learning Fall 1395 1 / 47 Table of contents 1 Introduction

More information

COMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare

COMP6237 Data Mining Covariance, EVD, PCA & SVD. Jonathon Hare COMP6237 Data Mining Covariance, EVD, PCA & SVD Jonathon Hare jsh2@ecs.soton.ac.uk Variance and Covariance Random Variables and Expected Values Mathematicians talk variance (and covariance) in terms of

More information

Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation)

Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation) Principal Component Analysis -- PCA (also called Karhunen-Loeve transformation) PCA transforms the original input space into a lower dimensional space, by constructing dimensions that are linear combinations

More information

Lecture 8: Linear Algebra Background

Lecture 8: Linear Algebra Background CSE 521: Design and Analysis of Algorithms I Winter 2017 Lecture 8: Linear Algebra Background Lecturer: Shayan Oveis Gharan 2/1/2017 Scribe: Swati Padmanabhan Disclaimer: These notes have not been subjected

More information

Data Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395

Data Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395 Data Mining Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 42 Outline 1 Introduction 2 Feature selection

More information

Machine Learning (Spring 2012) Principal Component Analysis

Machine Learning (Spring 2012) Principal Component Analysis 1-71 Machine Learning (Spring 1) Principal Component Analysis Yang Xu This note is partly based on Chapter 1.1 in Chris Bishop s book on PRML and the lecture slides on PCA written by Carlos Guestrin in

More information

LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach

LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach LEC 2: Principal Component Analysis (PCA) A First Dimensionality Reduction Approach Dr. Guangliang Chen February 9, 2016 Outline Introduction Review of linear algebra Matrix SVD PCA Motivation The digits

More information

Stat 159/259: Linear Algebra Notes

Stat 159/259: Linear Algebra Notes Stat 159/259: Linear Algebra Notes Jarrod Millman November 16, 2015 Abstract These notes assume you ve taken a semester of undergraduate linear algebra. In particular, I assume you are familiar with the

More information

STA 414/2104: Lecture 8

STA 414/2104: Lecture 8 STA 414/2104: Lecture 8 6-7 March 2017: Continuous Latent Variable Models, Neural networks With thanks to Russ Salakhutdinov, Jimmy Ba and others Outline Continuous latent variable models Background PCA

More information

Linear Algebra Review. Fei-Fei Li

Linear Algebra Review. Fei-Fei Li Linear Algebra Review Fei-Fei Li 1 / 37 Vectors Vectors and matrices are just collections of ordered numbers that represent something: movements in space, scaling factors, pixel brightnesses, etc. A vector

More information

CS540 Machine learning Lecture 5

CS540 Machine learning Lecture 5 CS540 Machine learning Lecture 5 1 Last time Basis functions for linear regression Normal equations QR SVD - briefly 2 This time Geometry of least squares (again) SVD more slowly LMS Ridge regression 3

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

CSC 411 Lecture 12: Principal Component Analysis

CSC 411 Lecture 12: Principal Component Analysis CSC 411 Lecture 12: Principal Component Analysis Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 12-PCA 1 / 23 Overview Today we ll cover the first unsupervised

More information

Lecture 7 Spectral methods

Lecture 7 Spectral methods CSE 291: Unsupervised learning Spring 2008 Lecture 7 Spectral methods 7.1 Linear algebra review 7.1.1 Eigenvalues and eigenvectors Definition 1. A d d matrix M has eigenvalue λ if there is a d-dimensional

More information

UNIT 6: The singular value decomposition.

UNIT 6: The singular value decomposition. UNIT 6: The singular value decomposition. María Barbero Liñán Universidad Carlos III de Madrid Bachelor in Statistics and Business Mathematical methods II 2011-2012 A square matrix is symmetric if A T

More information

Least Squares Optimization

Least Squares Optimization Least Squares Optimization The following is a brief review of least squares optimization and constrained optimization techniques. I assume the reader is familiar with basic linear algebra, including the

More information

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012

Machine Learning. Principal Components Analysis. Le Song. CSE6740/CS7641/ISYE6740, Fall 2012 Machine Learning CSE6740/CS7641/ISYE6740, Fall 2012 Principal Components Analysis Le Song Lecture 22, Nov 13, 2012 Based on slides from Eric Xing, CMU Reading: Chap 12.1, CB book 1 2 Factor or Component

More information

LECTURE 16: PCA AND SVD

LECTURE 16: PCA AND SVD Instructor: Sael Lee CS549 Computational Biology LECTURE 16: PCA AND SVD Resource: PCA Slide by Iyad Batal Chapter 12 of PRML Shlens, J. (2003). A tutorial on principal component analysis. CONTENT Principal

More information

Lecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26

Lecture 13. Principal Component Analysis. Brett Bernstein. April 25, CDS at NYU. Brett Bernstein (CDS at NYU) Lecture 13 April 25, / 26 Principal Component Analysis Brett Bernstein CDS at NYU April 25, 2017 Brett Bernstein (CDS at NYU) Lecture 13 April 25, 2017 1 / 26 Initial Question Intro Question Question Let S R n n be symmetric. 1

More information

Image Analysis. PCA and Eigenfaces

Image Analysis. PCA and Eigenfaces Image Analysis PCA and Eigenfaces Christophoros Nikou cnikou@cs.uoi.gr Images taken from: D. Forsyth and J. Ponce. Computer Vision: A Modern Approach, Prentice Hall, 2003. Computer Vision course by Svetlana

More information

Final Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson

Final Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson Final Exam, Linear Algebra, Fall, 2003, W. Stephen Wilson Name: TA Name and section: NO CALCULATORS, SHOW ALL WORK, NO OTHER PAPERS ON DESK. There is very little actual work to be done on this exam if

More information

Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis Probabilistic Latent Semantic Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

Discriminant analysis and supervised classification

Discriminant analysis and supervised classification Discriminant analysis and supervised classification Angela Montanari 1 Linear discriminant analysis Linear discriminant analysis (LDA) also known as Fisher s linear discriminant analysis or as Canonical

More information

Numerical Methods I Singular Value Decomposition

Numerical Methods I Singular Value Decomposition Numerical Methods I Singular Value Decomposition Aleksandar Donev Courant Institute, NYU 1 donev@courant.nyu.edu 1 MATH-GA 2011.003 / CSCI-GA 2945.003, Fall 2014 October 9th, 2014 A. Donev (Courant Institute)

More information

Introduction to Machine Learning

Introduction to Machine Learning 10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what

More information

Lecture 8. Principal Component Analysis. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 13, 2016

Lecture 8. Principal Component Analysis. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. December 13, 2016 Lecture 8 Principal Component Analysis Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza December 13, 2016 Luigi Freda ( La Sapienza University) Lecture 8 December 13, 2016 1 / 31 Outline 1 Eigen

More information

In this section again we shall assume that the matrix A is m m, real and symmetric.

In this section again we shall assume that the matrix A is m m, real and symmetric. 84 3. The QR algorithm without shifts See Chapter 28 of the textbook In this section again we shall assume that the matrix A is m m, real and symmetric. 3.1. Simultaneous Iterations algorithm Suppose we

More information

Signal Analysis. Principal Component Analysis

Signal Analysis. Principal Component Analysis Multi dimensional Signal Analysis Lecture 2E Principal Component Analysis Subspace representation Note! Given avector space V of dimension N a scalar product defined by G 0 a subspace U of dimension M

More information

IV. Matrix Approximation using Least-Squares

IV. Matrix Approximation using Least-Squares IV. Matrix Approximation using Least-Squares The SVD and Matrix Approximation We begin with the following fundamental question. Let A be an M N matrix with rank R. What is the closest matrix to A that

More information

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction

ECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering

More information

Multivariate Statistical Analysis

Multivariate Statistical Analysis Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions

More information

CS181 Midterm 2 Practice Solutions

CS181 Midterm 2 Practice Solutions CS181 Midterm 2 Practice Solutions 1. Convergence of -Means Consider Lloyd s algorithm for finding a -Means clustering of N data, i.e., minimizing the distortion measure objective function J({r n } N n=1,

More information

1 Singular Value Decomposition and Principal Component

1 Singular Value Decomposition and Principal Component Singular Value Decomposition and Principal Component Analysis In these lectures we discuss the SVD and the PCA, two of the most widely used tools in machine learning. Principal Component Analysis (PCA)

More information

Face Recognition. Lecture-14

Face Recognition. Lecture-14 Face Recognition Lecture-14 Face Recognition imple Approach Recognize faces (mug shots) using gray levels (appearance). Each image is mapped to a long vector of gray levels. everal views of each person

More information

Principal Component Analysis

Principal Component Analysis CSci 5525: Machine Learning Dec 3, 2008 The Main Idea Given a dataset X = {x 1,..., x N } The Main Idea Given a dataset X = {x 1,..., x N } Find a low-dimensional linear projection The Main Idea Given

More information

The Singular Value Decomposition

The Singular Value Decomposition The Singular Value Decomposition Philippe B. Laval KSU Fall 2015 Philippe B. Laval (KSU) SVD Fall 2015 1 / 13 Review of Key Concepts We review some key definitions and results about matrices that will

More information

A Least Squares Formulation for Canonical Correlation Analysis

A Least Squares Formulation for Canonical Correlation Analysis A Least Squares Formulation for Canonical Correlation Analysis Liang Sun, Shuiwang Ji, and Jieping Ye Department of Computer Science and Engineering Arizona State University Motivation Canonical Correlation

More information

Principal Component Analysis (PCA) CSC411/2515 Tutorial

Principal Component Analysis (PCA) CSC411/2515 Tutorial Principal Component Analysis (PCA) CSC411/2515 Tutorial Harris Chan Based on previous tutorial slides by Wenjie Luo, Ladislav Rampasek University of Toronto hchan@cs.toronto.edu October 19th, 2017 (UofT)

More information

Learning with Singular Vectors

Learning with Singular Vectors Learning with Singular Vectors CIS 520 Lecture 30 October 2015 Barry Slaff Based on: CIS 520 Wiki Materials Slides by Jia Li (PSU) Works cited throughout Overview Linear regression: Given X, Y find w:

More information

Preliminary Examination, Numerical Analysis, August 2016

Preliminary Examination, Numerical Analysis, August 2016 Preliminary Examination, Numerical Analysis, August 2016 Instructions: This exam is closed books and notes. The time allowed is three hours and you need to work on any three out of questions 1-4 and any

More information

Dimensionality Reduction

Dimensionality Reduction Lecture 5 1 Outline 1. Overview a) What is? b) Why? 2. Principal Component Analysis (PCA) a) Objectives b) Explaining variability c) SVD 3. Related approaches a) ICA b) Autoencoders 2 Example 1: Sportsball

More information

Least Squares Optimization

Least Squares Optimization Least Squares Optimization The following is a brief review of least squares optimization and constrained optimization techniques. Broadly, these techniques can be used in data analysis and visualization

More information

L11: Pattern recognition principles

L11: Pattern recognition principles L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction

More information

Lecture: Face Recognition

Lecture: Face Recognition Lecture: Face Recognition Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab Lecture 12-1 What we will learn today Introduction to face recognition The Eigenfaces Algorithm Linear

More information

Fisher s Linear Discriminant Analysis

Fisher s Linear Discriminant Analysis Fisher s Linear Discriminant Analysis Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr

More information

18.06SC Final Exam Solutions

18.06SC Final Exam Solutions 18.06SC Final Exam Solutions 1 (4+7=11 pts.) Suppose A is 3 by 4, and Ax = 0 has exactly 2 special solutions: 1 2 x 1 = 1 and x 2 = 1 1 0 0 1 (a) Remembering that A is 3 by 4, find its row reduced echelon

More information

Linear Algebra & Geometry why is linear algebra useful in computer vision?

Linear Algebra & Geometry why is linear algebra useful in computer vision? Linear Algebra & Geometry why is linear algebra useful in computer vision? References: -Any book on linear algebra! -[HZ] chapters 2, 4 Some of the slides in this lecture are courtesy to Prof. Octavia

More information

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin

Introduction to Machine Learning. PCA and Spectral Clustering. Introduction to Machine Learning, Slides: Eran Halperin 1 Introduction to Machine Learning PCA and Spectral Clustering Introduction to Machine Learning, 2013-14 Slides: Eran Halperin Singular Value Decomposition (SVD) The singular value decomposition (SVD)

More information

Principal Component Analysis

Principal Component Analysis Principal Component Analysis CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Principal

More information

Principal Component Analysis CS498

Principal Component Analysis CS498 Principal Component Analysis CS498 Today s lecture Adaptive Feature Extraction Principal Component Analysis How, why, when, which A dual goal Find a good representation The features part Reduce redundancy

More information

Face Recognition and Biometric Systems

Face Recognition and Biometric Systems The Eigenfaces method Plan of the lecture Principal Components Analysis main idea Feature extraction by PCA face recognition Eigenfaces training feature extraction Literature M.A.Turk, A.P.Pentland Face

More information

Dimensionality Reduction

Dimensionality Reduction Dimensionality Reduction Le Song Machine Learning I CSE 674, Fall 23 Unsupervised learning Learning from raw (unlabeled, unannotated, etc) data, as opposed to supervised data where a classification of

More information

Motivating the Covariance Matrix

Motivating the Covariance Matrix Motivating the Covariance Matrix Raúl Rojas Computer Science Department Freie Universität Berlin January 2009 Abstract This note reviews some interesting properties of the covariance matrix and its role

More information

Computational functional genomics

Computational functional genomics Computational functional genomics (Spring 2005: Lecture 8) David K. Gifford (Adapted from a lecture by Tommi S. Jaakkola) MIT CSAIL Basic clustering methods hierarchical k means mixture models Multi variate

More information

EECS 275 Matrix Computation

EECS 275 Matrix Computation EECS 275 Matrix Computation Ming-Hsuan Yang Electrical Engineering and Computer Science University of California at Merced Merced, CA 95344 http://faculty.ucmerced.edu/mhyang Lecture 6 1 / 22 Overview

More information

20 Unsupervised Learning and Principal Components Analysis (PCA)

20 Unsupervised Learning and Principal Components Analysis (PCA) 116 Jonathan Richard Shewchuk 20 Unsupervised Learning and Principal Components Analysis (PCA) UNSUPERVISED LEARNING We have sample points, but no labels! No classes, no y-values, nothing to predict. Goal:

More information

Mathematical foundations - linear algebra

Mathematical foundations - linear algebra Mathematical foundations - linear algebra Andrea Passerini passerini@disi.unitn.it Machine Learning Vector space Definition (over reals) A set X is called a vector space over IR if addition and scalar

More information

Eigenvalues and diagonalization

Eigenvalues and diagonalization Eigenvalues and diagonalization Patrick Breheny November 15 Patrick Breheny BST 764: Applied Statistical Modeling 1/20 Introduction The next topic in our course, principal components analysis, revolves

More information