Principal Component Analysis
CSci 5525: Machine Learning, Dec 3, 2008
The Main Idea
- Given a dataset $X = \{x_1, \ldots, x_N\}$
- Find a low-dimensional linear projection
- Two possible formulations:
  - The variance in the low-dimensional space is maximized
  - The average projection cost is minimized
- Both are equivalent
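The equivalence of the two formulations can be checked numerically (a minimal sketch on synthetic data, not part of the slides): for any unit direction $u$, the projected variance and the average squared projection error sum to the constant $\mathrm{tr}(S)$, so the direction maximizing one minimizes the other.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 1.0]])
Xc = X - X.mean(axis=0)

# scan unit directions u(theta) and record both objectives
thetas = np.linspace(0.0, np.pi, 180, endpoint=False)
var, cost = [], []
for t in thetas:
    u = np.array([np.cos(t), np.sin(t)])
    p = Xc @ u                                   # projections of the centered data
    var.append(np.mean(p ** 2))                  # variance of the projected data
    cost.append(np.mean(np.sum((Xc - np.outer(p, u)) ** 2, axis=1)))  # avg projection cost

# the same direction maximizes variance and minimizes projection cost
assert np.argmax(var) == np.argmin(cost)
```

Since $\|x_n - (u^T x_n)u\|^2 = \|x_n\|^2 - (u^T x_n)^2$ for unit $u$, the two curves are exact mirror images of each other.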
Two Viewpoints
[Figure: data points $x_n$ in the $(x_1, x_2)$ plane, the projection direction $u_1$, and each point's orthogonal projection $\tilde{x}_n$ onto it]
Maximum Variance Formulation
- Consider $X = \{x_1, \ldots, x_N\}$
- With $x_i \in \mathbb{R}^d$, the goal is to get a projection in $\mathbb{R}^m$, $m < d$
- Consider $m = 1$; we need a projection vector $u_1 \in \mathbb{R}^d$
- Each data point $x_n$ gets projected to $u_1^T x_n$
- Mean of the projected data: $u_1^T \bar{x}$, where $\bar{x} = \frac{1}{N} \sum_{n=1}^N x_n$
- Variance of the projected data: $\frac{1}{N} \sum_{n=1}^N (u_1^T x_n - u_1^T \bar{x})^2 = u_1^T S u_1$, where $S = \frac{1}{N} \sum_{n=1}^N (x_n - \bar{x})(x_n - \bar{x})^T$
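The identity on the last bullet is easy to verify numerically (a sketch with synthetic data, not from the slides): the variance of the projected samples equals the quadratic form $u_1^T S u_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                 # N = 500 points in R^d, d = 3
x_bar = X.mean(axis=0)                         # sample mean
S = (X - x_bar).T @ (X - x_bar) / len(X)       # S = (1/N) sum (x_n - xbar)(x_n - xbar)^T

u = np.array([1.0, 0.0, 0.0])                  # an arbitrary unit projection vector u_1
proj = X @ u                                   # projected data u_1^T x_n
var_direct = np.mean((proj - proj.mean()) ** 2)
var_quad = u @ S @ u                           # u_1^T S u_1
assert np.isclose(var_direct, var_quad)
```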
Maximum Variance Formulation (Contd.)
- Maximize $u_1^T S u_1$ w.r.t. $u_1$
- Need a constraint to prevent $\|u_1\| \to \infty$
- Normalization constraint: $\|u_1\|^2 = 1$
- The Lagrangian for the problem: $u_1^T S u_1 + \lambda_1 (1 - u_1^T u_1)$
- First-order necessary condition: $S u_1 = \lambda_1 u_1$
- $u_1$ must be the largest eigenvector of $S$, since $u_1^T S u_1 = \lambda_1$
- The eigenvector $u_1$ is called a principal component
Maximum Variance Formulation (Contd.)
- Subsequent principal components must be orthogonal to $u_1$
- Maximize $u_2^T S u_2$ s.t. $\|u_2\|^2 = 1$, $u_2 \perp u_1$
- The solution turns out to be the second eigenvector, and so on
- The top-$m$ eigenvectors give the best $m$-dimensional projection
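Putting the maximum-variance derivation together, PCA reduces to an eigendecomposition of the sample covariance. A minimal sketch (the function name `pca` and the test data are illustrative, not from the slides):

```python
import numpy as np

def pca(X, m):
    """Top-m principal components: eigenvectors of the sample covariance S,
    sorted by decreasing eigenvalue."""
    x_bar = X.mean(axis=0)
    Xc = X - x_bar
    S = Xc.T @ Xc / len(X)
    vals, vecs = np.linalg.eigh(S)             # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1][:m]         # indices of the m largest eigenvalues
    return vecs[:, order], vals[order]

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
U, lam = pca(X, 2)                             # U: d x m orthonormal, lam: top-m variances
```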
Minimum Error Formulation
- Consider a complete orthonormal basis $\{u_i\}$ in $\mathbb{R}^d$
- Each data point can be written as $x_n = \sum_{i=1}^d \alpha_{ni} u_i$
- Note that $\alpha_{ni} = x_n^T u_i$, so that $x_n = \sum_{i=1}^d (x_n^T u_i) u_i$
- Our goal is to obtain a lower-dimensional subspace, $m < d$
- A generic representation of a low-dimensional point: $\tilde{x}_n = \sum_{i=1}^m z_{ni} u_i + \sum_{i=m+1}^d b_i u_i$
- The coefficients $z_{ni}$ depend on the data point $x_n$
- We are free to choose $z_{ni}$, $b_i$, $u_i$ to get $\tilde{x}_n$ close to $x_n$
Minimum Error Formulation (Contd.)
- The objective is to minimize $J = \frac{1}{N} \sum_{n=1}^N \|x_n - \tilde{x}_n\|^2$
- Taking the derivative w.r.t. $z_{nj}$, we get $z_{nj} = x_n^T u_j$, $j = 1, \ldots, m$
- Taking the derivative w.r.t. $b_j$, we get $b_j = \bar{x}^T u_j$, $j = m+1, \ldots, d$
- Then we have $x_n - \tilde{x}_n = \sum_{i=m+1}^d \{(x_n - \bar{x})^T u_i\} u_i$
- The residual lies in the space orthogonal to the principal subspace
- The distortion measure to be minimized: $J = \frac{1}{N} \sum_{n=1}^N \sum_{i=m+1}^d (x_n^T u_i - \bar{x}^T u_i)^2 = \sum_{i=m+1}^d u_i^T S u_i$
- Need orthonormality constraints on the $u_i$ to prevent the $u_i = 0$ solution
Minimum Error Formulation (Contd.)
- Consider the special case $d = 2$, $m = 1$
- The Lagrangian of the objective: $L = u_2^T S u_2 + \lambda_2 (1 - u_2^T u_2)$
- The first-order condition is $S u_2 = \lambda_2 u_2$
- In general, the condition is $S u_i = \lambda_i u_i$
- The minimum is given by the eigenvectors corresponding to the smallest $(d - m)$ eigenvalues
- So the principal subspace $u_i$, $i = 1, \ldots, m$, is spanned by the largest eigenvectors
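The minimum-error result can be verified end to end (a sketch with synthetic data, not from the slides): reconstructing each point from the top-$m$ eigenvectors with $z_{nj} = x_n^T u_j$ and $b_j = \bar{x}^T u_j$ gives a distortion $J$ exactly equal to the sum of the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
x_bar = X.mean(axis=0)
Xc = X - x_bar
S = Xc.T @ Xc / len(X)
vals, vecs = np.linalg.eigh(S)                 # eigenvalues in ascending order

m = 2
U = vecs[:, -m:]                                # top-m eigenvectors span the principal subspace
X_hat = x_bar + (Xc @ U) @ U.T                  # reconstruction using z_nj = x_n^T u_j, b_j = xbar^T u_j
J = np.mean(np.sum((X - X_hat) ** 2, axis=1))   # distortion J = (1/N) sum ||x_n - x~_n||^2

# J equals the sum of the (d - m) smallest eigenvalues of S
assert np.isclose(J, vals[:-m].sum())
```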
Kernel PCA
- In PCA, the principal components $u_i$ are given by $S u_i = \lambda_i u_i$, where $S = \frac{1}{N} \sum_{n=1}^N x_n x_n^T$ (assuming centered data)
- Consider a feature mapping $\phi(x)$
- We want to implicitly perform PCA in the feature space
- Assume the features have zero mean: $\sum_n \phi(x_n) = 0$
Kernel PCA (Contd.)
- The sample covariance matrix in the feature space: $C = \frac{1}{N} \sum_{n=1}^N \phi(x_n) \phi(x_n)^T$
- The eigenvectors are given by $C v_i = \lambda_i v_i$
- We want to avoid computing $C$ explicitly
- Note that the eigenvectors satisfy $\frac{1}{N} \sum_{n=1}^N \phi(x_n) \{\phi(x_n)^T v_i\} = \lambda_i v_i$
- Since the inner product is a scalar, we have $v_i = \sum_{n=1}^N a_{in} \phi(x_n)$
Kernel PCA (Contd.)
- Substituting back into the eigenvalue equation: $\frac{1}{N} \sum_{n=1}^N \phi(x_n) \phi(x_n)^T \sum_{m=1}^N a_{im} \phi(x_m) = \lambda_i \sum_{n=1}^N a_{in} \phi(x_n)$
- Multiplying both sides by $\phi(x_l)^T$ and using $K(x_n, x_m) = \phi(x_n)^T \phi(x_m)$, we have $\frac{1}{N} \sum_{n=1}^N K(x_l, x_n) \sum_{m=1}^N a_{im} K(x_n, x_m) = \lambda_i \sum_{n=1}^N a_{in} K(x_l, x_n)$
- In matrix notation: $K^2 a_i = \lambda_i N K a_i$
- Except for eigenvectors with zero eigenvalue, we can instead solve $K a_i = \lambda_i N a_i$
Kernel PCA (Contd.)
- Since the original $v_i$ are normalized, we have $1 = v_i^T v_i = a_i^T K a_i = \lambda_i N a_i^T a_i$
- This gives a normalization condition for the $a_i$
- Compute the $a_i$ by solving the eigenvalue decomposition of $K$
- The projection of a point $x$ is given by $y_i(x) = \phi(x)^T v_i = \sum_{n=1}^N a_{in} \phi(x)^T \phi(x_n) = \sum_{n=1}^N a_{in} K(x, x_n)$
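The steps above can be sketched in a few lines (an illustrative sketch, not the slides' code): build the Gram matrix, solve $K a_i = (\lambda_i N) a_i$, rescale the $a_i$ so that $\lambda_i N a_i^T a_i = 1$, and project via $K a_i$. As on the slides, the features are assumed zero-mean, so centering of $K$ is omitted here; the RBF kernel is an arbitrary choice.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # K(x, x') = exp(-gamma ||x - x'||^2)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
K = rbf_kernel(X, X)

# solve K a_i = mu_i a_i, where mu_i = lambda_i N; eigh returns ascending order
mu, A = np.linalg.eigh(K)
m = 2
mu, A = mu[-m:][::-1], A[:, -m:][:, ::-1]      # keep top-m, descending
A = A / np.sqrt(mu)                            # enforce 1 = lambda_i N a_i^T a_i = mu_i a_i^T a_i

# projections of the training points: y_i(x_n) = sum_m a_im K(x_n, x_m)
Y = K @ A
```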
Illustration of Kernel PCA (Feature Space)
[Figure: feature-space axes $\phi_1$, $\phi_2$ with the principal direction $v_1$]

Illustration of Kernel PCA (Data Space)
[Figure: the corresponding nonlinear projection contours in the $(x_1, x_2)$ data space]
Dimensionality of Projection
- Original $x_i \in \mathbb{R}^d$; features $\phi(x_i) \in \mathbb{R}^D$
- Possibly $D \gg d$, so the number of principal components can be greater than $d$
- However, the number of nonzero eigenvalues cannot exceed $N$: the covariance matrix $C$ has rank at most $N$, even if $D \gg d$
- Kernel PCA involves the eigenvalue decomposition of an $N \times N$ matrix
Kernel PCA: Non-zero Mean
- In general the features need not have zero mean, and they cannot be explicitly centered
- The centered features would be of the form $\tilde{\phi}(x_n) = \phi(x_n) - \frac{1}{N} \sum_{l=1}^N \phi(x_l)$
- The corresponding Gram matrix: $\tilde{K} = K - \mathbf{1}_N K - K \mathbf{1}_N + \mathbf{1}_N K \mathbf{1}_N$, where $\mathbf{1}_N$ is the $N \times N$ matrix with every entry $1/N$
- Use $\tilde{K}$ in the basic kernel PCA formulation
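The centering formula needs no access to $\phi$: it uses only $K$. A quick numerical check (a sketch, not from the slides) uses the linear kernel $\phi(x) = x$, where the centered Gram matrix must equal the Gram matrix of the explicitly centered data.

```python
import numpy as np

def center_gram(K):
    """K_tilde = K - 1N K - K 1N + 1N K 1N,
    with 1N the N x N matrix whose entries are all 1/N."""
    N = len(K)
    one = np.full((N, N), 1.0 / N)
    return K - one @ K - K @ one + one @ K @ one

# check against explicit centering with a linear kernel, phi(x) = x
rng = np.random.default_rng(4)
X = rng.normal(size=(20, 3))
K = X @ X.T
Xc = X - X.mean(axis=0)
assert np.allclose(center_gram(K), Xc @ Xc.T)
```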
Kernel PCA on Artificial Data
[Figure: kernel PCA applied to a synthetic dataset]
Kernel PCA Properties
- Computes the eigenvalue decomposition of an $N \times N$ matrix; standard PCA computes it for a $d \times d$ matrix
- For large datasets ($N \gg d$), kernel PCA is therefore more expensive
- Standard PCA gives a projection onto a low-dimensional principal subspace: $\hat{x}_n = \sum_{i=1}^m (x_n^T u_i) u_i$
- Kernel PCA cannot do this: $\phi(x)$ forms a $d$-dimensional manifold in $\mathbb{R}^D$, and the PCA projection $\hat{\phi}$ of $\phi(x)$ need not lie on the manifold, so it may not have a pre-image $\hat{x}$ in the data space
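As a consistency check between the two formulations (a sketch with synthetic data, not from the slides): with a linear kernel on pre-centered data, kernel PCA's $N \times N$ eigenproblem recovers the same projections as standard PCA's $d \times d$ eigenproblem, up to the sign of the eigenvector, with eigenvalues related by $\mu_i = \lambda_i N$.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(40, 3))
Xc = X - X.mean(axis=0)
N = len(X)

# standard PCA: project onto the top eigenvector of the d x d covariance
S = Xc.T @ Xc / N
w, V = np.linalg.eigh(S)
y_pca = Xc @ V[:, -1]

# kernel PCA with the (pre-centered) linear kernel: solve the N x N problem
K = Xc @ Xc.T
mu, A = np.linalg.eigh(K)
a1 = A[:, -1] / np.sqrt(mu[-1])       # normalization: a_1^T K a_1 = 1
y_kpca = K @ a1                        # y(x_n) = sum_m a_1m K(x_n, x_m)

# the two projections agree up to sign, and mu = lambda N
assert np.allclose(np.abs(y_pca), np.abs(y_kpca))
assert np.isclose(w[-1], mu[-1] / N)
```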
More informationSTA141C: Big Data & High Performance Statistical Computing
STA141C: Big Data & High Performance Statistical Computing Lecture 9: Dimension Reduction/Word2vec Cho-Jui Hsieh UC Davis May 15, 2018 Principal Component Analysis Principal Component Analysis (PCA) Data
More informationFocus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations.
Previously Focus was on solving matrix inversion problems Now we look at other properties of matrices Useful when A represents a transformations y = Ax Or A simply represents data Notion of eigenvectors,
More informationLecture 7: Con3nuous Latent Variable Models
CSC2515 Fall 2015 Introduc3on to Machine Learning Lecture 7: Con3nuous Latent Variable Models All lecture slides will be available as.pdf on the course website: http://www.cs.toronto.edu/~urtasun/courses/csc2515/
More informationPrincipal Components Analysis
Principal Components Analysis Santiago Paternain, Aryan Mokhtari and Alejandro Ribeiro March 29, 2018 At this point we have already seen how the Discrete Fourier Transform and the Discrete Cosine Transform
More informationOnline Kernel PCA with Entropic Matrix Updates
Dima Kuzmin Manfred K. Warmuth Computer Science Department, University of California - Santa Cruz dima@cse.ucsc.edu manfred@cse.ucsc.edu Abstract A number of updates for density matrices have been developed
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA
More informationLecture 7 Spectral methods
CSE 291: Unsupervised learning Spring 2008 Lecture 7 Spectral methods 7.1 Linear algebra review 7.1.1 Eigenvalues and eigenvectors Definition 1. A d d matrix M has eigenvalue λ if there is a d-dimensional
More informationCS4495/6495 Introduction to Computer Vision. 8B-L2 Principle Component Analysis (and its use in Computer Vision)
CS4495/6495 Introduction to Computer Vision 8B-L2 Principle Component Analysis (and its use in Computer Vision) Wavelength 2 Wavelength 2 Principal Components Principal components are all about the directions
More informationLearning Eigenfunctions: Links with Spectral Clustering and Kernel PCA
Learning Eigenfunctions: Links with Spectral Clustering and Kernel PCA Yoshua Bengio Pascal Vincent Jean-François Paiement University of Montreal April 2, Snowbird Learning 2003 Learning Modal Structures
More informationUnsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent
Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:
More informationLinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis
More informationSolutions to Review Problems for Chapter 6 ( ), 7.1
Solutions to Review Problems for Chapter (-, 7 The Final Exam is on Thursday, June,, : AM : AM at NESBITT Final Exam Breakdown Sections % -,7-9,- - % -9,-,7,-,-7 - % -, 7 - % Let u u and v Let x x x x,
More informationMACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA
1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationLecture: Face Recognition and Feature Reduction
Lecture: Face Recognition and Feature Reduction Juan Carlos Niebles and Ranjay Krishna Stanford Vision and Learning Lab 1 Recap - Curse of dimensionality Assume 5000 points uniformly distributed in the
More informationSPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS
SPECTRAL CLUSTERING AND KERNEL PRINCIPAL COMPONENT ANALYSIS ARE PURSUING GOOD PROJECTIONS VIKAS CHANDRAKANT RAYKAR DECEMBER 5, 24 Abstract. We interpret spectral clustering algorithms in the light of unsupervised
More informationEigenvalues, Eigenvectors, and an Intro to PCA
Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.
More informationMachine Learning: Basis and Wavelet 김화평 (CSE ) Medical Image computing lab 서진근교수연구실 Haar DWT in 2 levels
Machine Learning: Basis and Wavelet 32 157 146 204 + + + + + - + - 김화평 (CSE ) Medical Image computing lab 서진근교수연구실 7 22 38 191 17 83 188 211 71 167 194 207 135 46 40-17 18 42 20 44 31 7 13-32 + + - - +
More informationStatistics for Applications. Chapter 9: Principal Component Analysis (PCA) 1/16
Statistics for Applications Chapter 9: Principal Component Analysis (PCA) 1/16 Multivariate statistics and review of linear algebra (1) Let X be a d-dimensional random vector and X 1,..., X n be n independent
More informationEigenvalues, Eigenvectors, and an Intro to PCA
Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.
More informationLecture 5 Supspace Tranformations Eigendecompositions, kernel PCA and CCA
Lecture 5 Supspace Tranformations Eigendecompositions, kernel PCA and CCA Pavel Laskov 1 Blaine Nelson 1 1 Cognitive Systems Group Wilhelm Schickard Institute for Computer Science Universität Tübingen,
More informationLecture 6 Proof for JL Lemma and Linear Dimensionality Reduction
COMS 4995: Unsupervised Learning (Summer 18) June 7, 018 Lecture 6 Proof for JL Lemma and Linear imensionality Reduction Instructor: Nakul Verma Scribes: Ziyuan Zhong, Kirsten Blancato This lecture gives
More informationLearning sets and subspaces: a spectral approach
Learning sets and subspaces: a spectral approach Alessandro Rudi DIBRIS, Università di Genova Optimization and dynamical processes in Statistical learning and inverse problems Sept 8-12, 2014 A world of
More information
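The maximum-variance derivation above — center the data, form the sample covariance S = (1/N) Σ (x_n − x̄)(x_n − x̄)^T, and take the unit vector u1 maximizing u1^T S u1 (the top eigenvector of S) — can be sketched in NumPy. This is an illustrative sketch, not code from the slides; the function name and the synthetic example data are assumptions.

```python
import numpy as np

def pca_first_component(X):
    """Return the first principal component u1 of data X (shape N x d).

    u1 is the unit vector maximizing the projected variance u^T S u,
    where S is the sample covariance matrix of X, i.e. the eigenvector
    of S with the largest eigenvalue.
    """
    x_bar = X.mean(axis=0)                 # sample mean x_bar = (1/N) sum x_n
    Xc = X - x_bar                         # center the data
    S = (Xc.T @ Xc) / X.shape[0]           # S = (1/N) sum (x_n - x_bar)(x_n - x_bar)^T
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh: S is symmetric; eigenvalues ascending
    u1 = eigvecs[:, -1]                    # eigenvector for the largest eigenvalue
    return u1

# Hypothetical example: 2-D data whose dominant variance direction is (1, 1)/sqrt(2)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.3])          # stretch along axis 1
R = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2.0)        # 45-degree rotation
X = X @ R
u1 = pca_first_component(X)                                   # approx +/- (1, 1)/sqrt(2)
```

Projecting each x_n onto u1 (i.e. computing u1^T x_n) then gives the one-dimensional representation with maximum variance; the m > 1 case stacks the top m eigenvectors of S as columns of the projection matrix.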