PCA and admixture models
1 PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57
2 Announcements HW1 solutions posted. PCA and admixture models 2 / 57
3 Supervised versus Unsupervised Learning Unsupervised Learning from unlabeled observations Dimensionality Reduction. Last class. Other latent variable models. This class + review of PCA. PCA and admixture models 3 / 57
4 Outline Dimensionality reduction Linear Algebra background PCA Practical issues Probabilistic PCA Admixture models Population structure and GWAS PCA and admixture models Dimensionality reduction 4 / 57
5 Raw data can be complex, high-dimensional If we knew what to measure, we could find simple relationships. Signals have redundancy. Genotype measured at 500K SNPs. Genotypes at neighboring SNPs correlated. PCA and admixture models Dimensionality reduction 5 / 57
6 Dimensionality reduction Goal: Find a more compact representation of data Why? Visualize and discover hidden patterns. Preprocessing for a supervised learning problem. Statistical: remove noise. Computational: reduce wasteful computation. PCA and admixture models Dimensionality reduction 6 / 57
8 An example We measure parent and offspring heights. Two measurements: points in $\mathbb{R}^2$. How can we find a more compact representation? The two measurements are correlated with some noise. Pick a direction and project. PCA and admixture models Dimensionality reduction 7 / 57
11 Goal: Minimize reconstruction error Find projection that minimizes the Euclidean distance between original points and projections. Principal Components Analysis solves this problem! PCA and admixture models Dimensionality reduction 8 / 57
12 Principal Components Analysis PCA: find a lower-dimensional representation of the data. Choose $K$. $X$ is the $N \times M$ raw data matrix. $X \approx Z W^T$, where $Z$ is the $N \times K$ reduced representation (PC scores) and $W$ is $M \times K$ (its columns are the principal components). PCA and admixture models Dimensionality reduction 9 / 57
13 Outline Dimensionality reduction Linear Algebra background PCA Practical issues Probabilistic PCA Admixture models Population structure and GWAS PCA and admixture models Linear Algebra background 10 / 57
14 Covariance matrix $C = \frac{1}{N} X^T X$. Generalizes variance to many features: $C_{i,i}$ is the variance of feature $i$ and $C_{i,j}$ is the covariance of features $i$ and $j$. Symmetric. PCA and admixture models Linear Algebra background 11 / 57
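As a concrete illustration (not part of the slides), here is a minimal NumPy sketch of this covariance computation; the sample size, feature count, and random data are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 5                     # illustrative sizes: N samples, M features
X = rng.normal(size=(N, M))
X = X - X.mean(axis=0)            # center each feature (assumed throughout)

C = (X.T @ X) / N                 # M x M covariance matrix, C = (1/N) X^T X

# C[i, i] is the variance of feature i, C[i, j] the covariance of i and j
assert np.allclose(C, C.T)        # symmetric
```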
15 Covariance matrix $C = \frac{1}{N} X^T X$ is positive semi-definite (PSD), sometimes written $C \succeq 0$. (Positive semi-definite matrix) A matrix $A \in \mathbb{R}^{n \times n}$ is positive semi-definite iff $v^T A v \geq 0$ for all $v \in \mathbb{R}^n$. PCA and admixture models Linear Algebra background 11 / 57
16 Covariance matrix $C = \frac{1}{N} X^T X$ is PSD: $v^T C v = \frac{1}{N} v^T X^T X v = \frac{1}{N} (Xv)^T (Xv) = \frac{1}{N} \sum_{i=1}^{N} (Xv)_i^2 \geq 0$. PCA and admixture models Linear Algebra background 11 / 57
17 Covariance matrix $C = \frac{1}{N} X^T X$. All covariance matrices (being symmetric and PSD) have an eigendecomposition. PCA and admixture models Linear Algebra background 11 / 57
18 Eigenvector and eigenvalue (Eigenvector and eigenvalue) A vector $v$ is an eigenvector of $A \in \mathbb{R}^{n \times n}$ if $Av = \lambda v$ for some scalar $\lambda$; $\lambda$ is the eigenvalue associated with $v$. PCA and admixture models Linear Algebra background 12 / 57
19 Eigendecomposition of a covariance matrix $C$ is symmetric. Its eigenvectors $\{u_i\}$, $i \in \{1, \dots, M\}$, can be chosen to be orthonormal: $u_i^T u_j = 0$ for $i \neq j$ and $u_i^T u_i = 1$. We can also order the eigenvectors so that the eigenvalues are in decreasing order: $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_M$. PCA and admixture models Linear Algebra background 13 / 57
20 Eigendecomposition of a covariance matrix Arrange $U = [u_1 \dots u_M]$. Since $C u_i = \lambda_i u_i$ for $i \in \{1, \dots, M\}$, $CU = C[u_1 \dots u_M] = [C u_1 \dots C u_M] = [\lambda_1 u_1 \dots \lambda_M u_M] = [u_1 \dots u_M]\,\mathrm{diag}(\lambda_1, \dots, \lambda_M) = U \Lambda$. PCA and admixture models Linear Algebra background 13 / 57
21 Eigendecomposition of a covariance matrix $CU = U\Lambda$. Now $U$ is an orthogonal matrix, so $U U^T = I_M$ and $C = C U U^T = U \Lambda U^T$. PCA and admixture models Linear Algebra background 14 / 57
22 Eigendecomposition of a covariance matrix $C = U \Lambda U^T$. $U$ is an $M \times M$ orthogonal matrix whose columns are the eigenvectors, sorted by eigenvalue. $\Lambda$ is a diagonal matrix of eigenvalues. PCA and admixture models Linear Algebra background 14 / 57
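As a sketch of how this looks numerically (illustrative, not from the slides): `numpy.linalg.eigh` handles symmetric matrices and returns eigenvalues in increasing order, so we reorder them to match the decreasing convention used here.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X = X - X.mean(axis=0)
C = (X.T @ X) / X.shape[0]

evals, evecs = np.linalg.eigh(C)       # eigh: for symmetric matrices, ascending order
order = np.argsort(evals)[::-1]        # reorder so lambda_1 >= lambda_2 >= ...
U = evecs[:, order]                    # columns are orthonormal eigenvectors
Lam = np.diag(evals[order])            # diagonal matrix of eigenvalues

assert np.allclose(U @ U.T, np.eye(C.shape[0]))   # U U^T = I
assert np.allclose(U @ Lam @ U.T, C)              # C = U Lambda U^T
```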
23 Eigendecomposition: Example Covariance matrix : Ψ PCA and admixture models Linear Algebra background 15 / 57
25 Alternate characterization of eigenvectors Eigenvectors are orthonormal directions of maximum variance; eigenvalues are the variances in these directions. The first eigenvector is the direction of maximum variance, with variance $\lambda_1$. PCA and admixture models Linear Algebra background 16 / 57
26 Alternate characterization of eigenvectors Given a covariance matrix $C \in \mathbb{R}^{M \times M}$, solve $x^\star = \arg\max_{x} x^T C x$ subject to $\|x\|_2 = 1$. Solution: $x^\star = u_1$, the first eigenvector of $C$. This is an example of a constrained optimization problem. Why do we need the constraint? PCA and admixture models Linear Algebra background 16 / 57
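A quick numerical check of this characterization (a sketch; the matrix and the comparison vectors below are arbitrary): no unit vector should give a larger value of $x^T C x$ than the first eigenvector.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
C = A @ A.T                              # an arbitrary symmetric PSD matrix

evals, evecs = np.linalg.eigh(C)
u1, lam1 = evecs[:, -1], evals[-1]       # first eigenvector and eigenvalue

V = rng.normal(size=(5, 1000))
V = V / np.linalg.norm(V, axis=0)        # 1000 random unit vectors

assert np.isclose(u1 @ C @ u1, lam1)                             # u1 attains lambda_1
assert (np.einsum('id,ij,jd->d', V, C, V) <= lam1 + 1e-9).all()  # nothing beats it
```

Without the constraint $\|x\|_2 = 1$, the objective $x^T C x$ could be made arbitrarily large just by scaling $x$, which is why the constraint is needed.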
27 Outline Dimensionality reduction Linear Algebra background PCA Practical issues Probabilistic PCA Admixture models Population structure and GWAS PCA and admixture models PCA 17 / 57
28 Back to PCA Given $N$ data points $x_n \in \mathbb{R}^M$, $n \in \{1, \dots, N\}$, find a linear transformation from a lower-dimensional space ($K < M$), $W \in \mathbb{R}^{M \times K}$, and a projection $z_n \in \mathbb{R}^K$ so that we can reconstruct the original data from the lower-dimensional projection: $x_n \approx w_1 z_{n,1} + \dots + w_K z_{n,K} = [w_1 \dots w_K](z_{n,1}, \dots, z_{n,K})^T = W z_n$, with $z_n \in \mathbb{R}^K$. We assume the data is centered: $\sum_n x_{n,m} = 0$ for every feature $m$. Compression: we go from storing $NM$ numbers to $MK + NK$. How do we define the quality of the reconstruction? PCA and admixture models PCA 18 / 57
29 PCA Find $z_n \in \mathbb{R}^K$ and $W \in \mathbb{R}^{M \times K}$ to minimize the reconstruction error $J(W, Z) = \frac{1}{N} \sum_n \|x_n - W z_n\|^2$, where $Z = [z_1, \dots, z_N]^T$. Require the columns of $W$ to be orthonormal. The optimal solution is obtained by setting $\hat{W} = U_K$, where $U_K$ contains the $K$ eigenvectors associated with the $K$ largest eigenvalues of the covariance matrix $C$ of $X$. The low-dimensional projection is $\hat{z}_n = \hat{W}^T x_n$. PCA and admixture models PCA 19 / 57
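Putting the whole recipe together, a sketch of PCA via the covariance eigendecomposition (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def pca(X, K):
    """Return PC scores Z (N x K) and loadings W (M x K) for row-wise data X."""
    Xc = X - X.mean(axis=0)                 # center the data
    C = (Xc.T @ Xc) / Xc.shape[0]           # M x M covariance matrix
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]
    W = evecs[:, order[:K]]                 # W-hat = U_K, top-K eigenvectors
    Z = Xc @ W                              # z_n = W^T x_n, stacked as rows
    return Z, W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Z, W = pca(X, K=2)
X_hat = Z @ W.T                             # reconstruction W z_n
err = np.mean(np.sum((X - X.mean(axis=0) - X_hat) ** 2, axis=1))
print("mean reconstruction error:", err)
```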
32 PCA: K = 1 $J(w_1, z_1) = \frac{1}{N} \sum_n \|x_n - w_1 z_{n,1}\|^2 = \frac{1}{N} \sum_n (x_n - w_1 z_{n,1})^T (x_n - w_1 z_{n,1}) = \frac{1}{N} \sum_n \left( x_n^T x_n - 2 w_1^T x_n z_{n,1} + z_{n,1}^2 w_1^T w_1 \right) = \mathrm{const} + \frac{1}{N} \sum_n \left( -2 w_1^T x_n z_{n,1} + z_{n,1}^2 \right)$, using $w_1^T w_1 = 1$. To minimize this function, take derivatives with respect to $z_{n,1}$: $\frac{\partial J(w_1, z_1)}{\partial z_{n,1}} = 0 \Rightarrow z_{n,1} = w_1^T x_n$. PCA and admixture models PCA 20 / 57
33 PCA: K = 1 Plugging back $z_{n,1} = w_1^T x_n$: $J(w_1) = \mathrm{const} + \frac{1}{N} \sum_n \left( -2 w_1^T x_n z_{n,1} + z_{n,1}^2 \right) = \mathrm{const} + \frac{1}{N} \sum_n \left( -2 z_{n,1} z_{n,1} + z_{n,1}^2 \right) = \mathrm{const} - \frac{1}{N} \sum_n z_{n,1}^2$. Now, because the data is centered, $E[z_1] = \frac{1}{N} \sum_n z_{n,1} = \frac{1}{N} \sum_n w_1^T x_n = w_1^T \frac{1}{N} \sum_n x_n = 0$. PCA and admixture models PCA 20 / 57
34 PCA: K = 1 $J(w_1) = \mathrm{const} - \frac{1}{N} \sum_n z_{n,1}^2$, and $\mathrm{Var}[z_1] = E[z_1^2] - E[z_1]^2 = \frac{1}{N} \sum_n z_{n,1}^2 - 0 = \frac{1}{N} \sum_n z_{n,1}^2$. PCA and admixture models PCA 20 / 57
35 PCA: K = 1 Putting it together: $J(w_1) = \mathrm{const} - \frac{1}{N} \sum_n z_{n,1}^2$ and $\mathrm{Var}[z_1] = \frac{1}{N} \sum_n z_{n,1}^2$, so $J(w_1) = \mathrm{const} - \mathrm{Var}[z_1]$. Two views of PCA: find the direction that minimizes the reconstruction error, or find the direction that maximizes the variance of the projected data: $\arg\min_{w_1} J(w_1) = \arg\max_{w_1} \mathrm{Var}[z_1]$. PCA and admixture models PCA 20 / 57
36 PCA: K = 1 $\arg\min_{w_1} J(w_1) = \arg\max_{w_1} \mathrm{Var}[z_1]$, where $\mathrm{Var}[z_1] = \frac{1}{N} \sum_n z_{n,1}^2 = \frac{1}{N} \sum_n (w_1^T x_n)(w_1^T x_n) = \frac{1}{N} \sum_n w_1^T x_n x_n^T w_1 = w_1^T \left( \frac{1}{N} \sum_n x_n x_n^T \right) w_1 = w_1^T C w_1$. PCA and admixture models PCA 21 / 57
37 PCA: K = 1 So we need to solve $\arg\min_{w_1} J(w_1) = \arg\max_{w_1} \mathrm{Var}[z_1] = \arg\max_{w_1} w_1^T C w_1$. Since we required $W$ to be orthonormal, we need the constraint $\|w_1\|_2 = 1$. This objective is maximized when $w_1$ is the first eigenvector of $C$. PCA and admixture models PCA 21 / 57
38 PCA: K > 1 We can repeat the argument for $K > 1$. Since we require the directions $w_k$ to be orthonormal, we repeatedly search for the direction that maximizes the remaining variance and is orthogonal to the previously selected directions. PCA and admixture models PCA 22 / 57
39 Computing eigendecompositions Numerical algorithms can compute all eigenvalues and eigenvectors in $O(M^3)$ time: infeasible for genetic datasets. Computing the largest eigenvalue and eigenvector: power iteration, $O(M^2)$. Since we are interested in covariance matrices, we can use algorithms that compute the singular value decomposition (SVD): $O(MN^2)$. (Will discuss later.) PCA and admixture models PCA 23 / 57
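A sketch of power iteration for the leading eigenpair (the tolerance and iteration cap are arbitrary choices, and the random start is assumed not to be orthogonal to the top eigenvector):

```python
import numpy as np

def power_iteration(C, num_iters=1000, tol=1e-10, seed=0):
    """Leading eigenvalue/eigenvector of a symmetric PSD matrix C."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=C.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = C @ v                    # one matrix-vector product: O(M^2)
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:
            v = w
            break
        v = w
    lam = v @ C @ v                  # Rayleigh quotient recovers the eigenvalue
    return lam, v
```

In practice one can avoid forming $C$ at all and compute the product as $X^T (X v)$, or use a truncated/randomized SVD of $X$ directly.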
40 Practical issues Choosing $K$: for visualization, $K = 2$ or $K = 3$. For other analyses, pick $K$ so that most of the variance in the data is retained. Fraction of variance retained in the top $K$ eigenvectors: $\frac{\sum_{k=1}^{K} \lambda_k}{\sum_{m=1}^{M} \lambda_m}$. PCA and admixture models PCA 24 / 57
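A small sketch of this criterion (the 0.90 threshold below is just an example, not a recommendation from the lecture):

```python
import numpy as np

def choose_K(evals, frac=0.90):
    """Smallest K whose top-K eigenvalues retain at least `frac` of the variance."""
    evals = np.sort(evals)[::-1]
    retained = np.cumsum(evals) / np.sum(evals)   # fraction retained for each K
    return int(np.searchsorted(retained, frac) + 1)
```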
41 PCA: Example PCA and admixture models PCA 25 / 57
46 PCA on HapMap PCA and admixture models PCA 26 / 57
47 PCA on Human Genome Diversity Project PCA and admixture models PCA 27 / 57
49 PCA on European genetic data 1 Novembre et al. Nature 2008 PCA and admixture models PCA 28 / 57
50 Probabilistic interpretation of PCA $z_n \overset{\text{iid}}{\sim} \mathcal{N}(0, I_K)$, $p(x_n \mid z_n) = \mathcal{N}(W z_n, \sigma^2 I_M)$. PCA and admixture models PCA 29 / 57
51 Probabilistic interpretation of PCA $z_n \overset{\text{iid}}{\sim} \mathcal{N}(0, I_K)$, $p(x_n \mid z_n) = \mathcal{N}(W z_n, \sigma^2 I_M)$. $E[x_n \mid z_n] = W z_n$, so $E[x_n] = E[E[x_n \mid z_n]] = E[W z_n] = W E[z_n] = 0$. PCA and admixture models PCA 29 / 57
52 Probabilistic interpretation of PCA $z_n \overset{\text{iid}}{\sim} \mathcal{N}(0, I_K)$, $p(x_n \mid z_n) = \mathcal{N}(W z_n, \sigma^2 I_M)$, i.e. $x_n = W z_n + \epsilon_n$ with $\epsilon_n \sim \mathcal{N}(0, \sigma^2 I_M)$ independent of $z_n$. Then $\mathrm{Cov}[x_n] = E[x_n x_n^T] - E[x_n]E[x_n]^T = E[(W z_n + \epsilon_n)(W z_n + \epsilon_n)^T] - 0 = E[W z_n z_n^T W^T] + E[2 W z_n \epsilon_n^T] + E[\epsilon_n \epsilon_n^T] = W E[z_n z_n^T] W^T + 2 W E[z_n] E[\epsilon_n]^T + \sigma^2 I_M = W I_K W^T + 2 W \cdot 0 + \sigma^2 I_M = W W^T + \sigma^2 I_M$. PCA and admixture models PCA 29 / 57
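A Monte Carlo sanity check of the identity $\mathrm{Cov}[x_n] = W W^T + \sigma^2 I_M$ (the particular $W$, $\sigma$, and sample size below are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, sigma = 4, 2, 0.5
W = rng.normal(size=(M, K))

# Sample x_n = W z_n + eps_n with z_n ~ N(0, I_K) and eps_n ~ N(0, sigma^2 I_M)
N = 200_000
Z = rng.normal(size=(N, K))
X = Z @ W.T + sigma * rng.normal(size=(N, M))

emp_cov = np.cov(X, rowvar=False, bias=True)
print(np.abs(emp_cov - (W @ W.T + sigma**2 * np.eye(M))).max())   # close to 0
```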
53 Probabilistic PCA Log likelihood $LL(W, \sigma^2) \equiv \log P(D \mid W, \sigma^2)$. Maximize over $W$ subject to the constraint that the columns of $W$ are orthonormal. The maximum likelihood estimator is $\hat{W}_{ML} = U_K (\Lambda_K - \sigma^2 I_K)^{1/2}$, where $U_K = [u_1 \dots u_K]$, $\Lambda_K = \mathrm{diag}(\lambda_1, \dots, \lambda_K)$, and $\hat{\sigma}^2_{ML} = \frac{1}{M - K} \sum_{j=K+1}^{M} \lambda_j$. PCA and admixture models PCA 30 / 57
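A sketch of this closed-form MLE computed from the eigendecomposition of the sample covariance (the matrix square root acts elementwise on the diagonal of $\Lambda_K - \sigma^2 I_K$; the function name is illustrative):

```python
import numpy as np

def ppca_mle(C, K):
    """Closed-form probabilistic-PCA MLE from the covariance matrix C (assumes K < M)."""
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[K:].mean()                         # average of discarded eigenvalues
    U_K = evecs[:, :K]
    Lam_K = np.diag(evals[:K])
    W_ml = U_K @ np.sqrt(Lam_K - sigma2 * np.eye(K))  # U_K (Lambda_K - sigma^2 I_K)^{1/2}
    return W_ml, sigma2
```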
55 Probabilistic PCA Computing the MLE Compute eigenvalues, eigenvectors Hidden/latent variable problem: Use EM PCA and admixture models PCA 31 / 57
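For completeness, a sketch of the EM alternative, following the standard Tipping and Bishop updates for probabilistic PCA; this is an illustration, not necessarily the exact algorithm presented in lecture:

```python
import numpy as np

def ppca_em(X, K, n_iters=100, seed=0):
    """EM for probabilistic PCA on centered data X (N x M); returns W and sigma^2."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    W, sigma2 = rng.normal(size=(M, K)), 1.0
    for _ in range(n_iters):
        # E-step: posterior moments of the latent z_n given x_n
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(K))     # K x K
        Ez = X @ W @ Minv                                      # E[z_n], stacked as rows
        Ezz = N * sigma2 * Minv + Ez.T @ Ez                    # sum_n E[z_n z_n^T]
        # M-step: update W, then sigma^2 using the new W
        W = (X.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(X ** 2)
                  - 2.0 * np.sum(Ez * (X @ W))
                  + np.trace(Ezz @ W.T @ W)) / (N * M)
    return W, sigma2
```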
57 Other advantages of Probabilistic PCA Can use model selection to infer K. Choose K to maximize the marginal likelihood P (D K). Use cross-validation and pick K that maximizes likelihood on held out data. Other model selection criteria such as AIC or BIC (see lecture 6 on clustering). PCA and admixture models PCA 32 / 57
58 Mini-Summary Dimensionality reduction: linear methods. Exploratory analysis and visualization. Downstream inference: can use the low-dimensional features for other tasks. Principal Components Analysis finds a linear subspace that minimizes the reconstruction error or, equivalently, maximizes the variance. Eigenvalue problem. The probabilistic interpretation also leads to EM. Why might PCA not be appropriate for genetic data? PCA and admixture models PCA 33 / 57