PCA and admixture models


1 PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57

2 Announcements HW1 solutions posted. PCA and admixture models 2 / 57

3 Supervised versus Unsupervised Learning. Unsupervised: learning from unlabeled observations. Dimensionality reduction: last class. Other latent variable models: this class + review of PCA. PCA and admixture models 3 / 57

4 Outline Dimensionality reduction Linear Algebra background PCA Practical issues Probabilistic PCA Admixture models Population structure and GWAS PCA and admixture models Dimensionality reduction 4 / 57

5 Raw data can be complex, high-dimensional. If we knew what to measure, we could find simple relationships. Signals have redundancy: genotypes measured at 500K SNPs; genotypes at neighboring SNPs are correlated. PCA and admixture models Dimensionality reduction 5 / 57

6 Dimensionality reduction Goal: Find a more compact representation of data Why? Visualize and discover hidden patterns. Preprocessing for a supervised learning problem. Statistical: remove noise. Computational: reduce wasteful computation. PCA and admixture models Dimensionality reduction 6 / 57

8 An example. We measure parent and offspring heights: two measurements, so points in $\mathbb{R}^2$. How can we find a more compact representation? The two measurements are correlated, with some noise. Pick a direction and project. PCA and admixture models Dimensionality reduction 7 / 57

11 Goal: Minimize reconstruction error Find projection that minimizes the Euclidean distance between original points and projections. Principal Components Analysis solves this problem! PCA and admixture models Dimensionality reduction 8 / 57

12 Principal Components Analysis PCA: find a lower-dimensional representation of the data. Choose $K$. $X$ is the $N \times M$ raw data. $X \approx Z W^T$, where $Z$ is the $N \times K$ reduced representation (PC scores) and $W$ is $M \times K$ (columns are the principal components). PCA and admixture models Dimensionality reduction 9 / 57

13 Outline Dimensionality reduction Linear Algebra background PCA Practical issues Probabilistic PCA Admixture models Population structure and GWAS PCA and admixture models Linear Algebra background 10 / 57

14 Covariance matrix $C = \frac{1}{N} X^T X$. Generalizes variance/covariance to many features: $C_{i,i}$ is the variance of feature $i$; $C_{i,j}$ is the covariance of features $i$ and $j$. Symmetric. PCA and admixture models Linear Algebra background 11 / 57
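
As a concrete illustration, a minimal numpy sketch of this computation (assuming samples are rows of X, features are columns, and each feature has been mean-centered; the toy data here are arbitrary):

```python
import numpy as np

# Toy data: N samples (rows) by M features (columns).
rng = np.random.default_rng(0)
N, M = 100, 5
X = rng.normal(size=(N, M))

# Center each feature so that column means are zero.
X = X - X.mean(axis=0)

# Sample covariance matrix: C = (1/N) X^T X, an M x M symmetric matrix.
C = X.T @ X / N

print(np.allclose(C, C.T))                                 # symmetric
print(np.allclose(C, np.cov(X, rowvar=False, bias=True)))  # matches np.cov with the 1/N convention
```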

15 Covariance matrix $C = \frac{1}{N} X^T X$. Positive semi-definite (PSD), sometimes written $C \succeq 0$. (Positive semi-definite matrix) A matrix $A \in \mathbb{R}^{n \times n}$ is positive semi-definite iff $v^T A v \geq 0$ for all $v \in \mathbb{R}^n$. PCA and admixture models Linear Algebra background 11 / 57

16 Covariance matrix $C = \frac{1}{N} X^T X$. Positive semi-definite (PSD), sometimes written $C \succeq 0$: $v^T C v = \frac{1}{N} v^T X^T X v = \frac{1}{N} (Xv)^T (Xv) = \frac{1}{N} \sum_{i=1}^{N} (Xv)_i^2 \geq 0$. PCA and admixture models Linear Algebra background 11 / 57

17 Covariance matrix $C = \frac{1}{N} X^T X$. All covariance matrices (being symmetric and PSD) have an eigendecomposition. PCA and admixture models Linear Algebra background 11 / 57

18 Eigenvector and eigenvalue. (Definition) A nonzero vector $v$ is an eigenvector of $A \in \mathbb{R}^{n \times n}$ if $A v = \lambda v$ for some scalar $\lambda$; $\lambda$ is the eigenvalue associated with $v$. PCA and admixture models Linear Algebra background 12 / 57

19 Eigendecomposition of a covariance matrix. $C$ is symmetric. Its eigenvectors $\{u_i\}$, $i \in \{1, \dots, M\}$, can be chosen to be orthonormal: $u_i^T u_j = 0$ for $i \neq j$ and $u_i^T u_i = 1$. We can order the eigenvectors so that the eigenvalues are decreasing: $\lambda_1 \geq \lambda_2 \geq \dots \geq \lambda_M$. PCA and admixture models Linear Algebra background 13 / 57

20 Eigendecomposition of a covariance matrix. Arrange $U = [u_1 \dots u_M]$. Since $C u_i = \lambda_i u_i$ for $i \in \{1, \dots, M\}$: $C U = C [u_1 \dots u_M] = [C u_1 \dots C u_M] = [\lambda_1 u_1 \dots \lambda_M u_M] = [u_1 \dots u_M] \, \mathrm{diag}(\lambda_1, \dots, \lambda_M) = U \Lambda$. PCA and admixture models Linear Algebra background 13 / 57

21 Eigendecomposition of a covariance matrix. $C U = U \Lambda$. Now $U$ is an orthogonal matrix, so $U U^T = I_M$ and $C = C U U^T = U \Lambda U^T$. PCA and admixture models Linear Algebra background 14 / 57

22 Eigendecomposition of a covariance matrix. $C = U \Lambda U^T$, where $U$ is an $M \times M$ orthogonal matrix whose columns are the eigenvectors sorted by eigenvalue, and $\Lambda$ is the diagonal matrix of eigenvalues. PCA and admixture models Linear Algebra background 14 / 57
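
A short numpy sketch of this decomposition (the toy covariance matrix is simulated purely for illustration): `numpy.linalg.eigh` handles symmetric matrices and returns eigenvalues in ascending order, so they are re-sorted to match the decreasing convention above.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
X -= X.mean(axis=0)
C = X.T @ X / X.shape[0]

# eigh is specialized for symmetric matrices; eigenvalues come back ascending.
evals, U = np.linalg.eigh(C)
order = np.argsort(evals)[::-1]      # re-sort so that lambda_1 >= lambda_2 >= ...
evals, U = evals[order], U[:, order]

Lam = np.diag(evals)
print(np.allclose(U @ U.T, np.eye(C.shape[0])))   # U is orthogonal: U U^T = I
print(np.allclose(C, U @ Lam @ U.T))              # C = U Lambda U^T
```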

23 Eigendecomposition: Example. Covariance matrix Ψ with its eigenvectors and eigenvalues (figure not reproduced in the transcription). PCA and admixture models Linear Algebra background 15 / 57

25 Alternate characterization of eigenvectors. Eigenvectors are orthonormal directions of maximum variance; eigenvalues are the variances in these directions. The first eigenvector is the direction of maximum variance, with variance $\lambda_1$. PCA and admixture models Linear Algebra background 16 / 57

26 Alternate characterization of eigenvectors. Given a covariance matrix $C \in \mathbb{R}^{M \times M}$, solve $x^\star = \arg\max_{x : \|x\|_2 = 1} x^T C x$. Solution: $x^\star = u_1$, the first eigenvector of $C$. This is an example of a constrained optimization problem. Why do we need the constraint? PCA and admixture models Linear Algebra background 16 / 57
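
A quick numerical check of this characterization (a sketch on simulated data; the number of random trials is arbitrary): no random unit vector achieves a larger value of $x^T C x$ than the top eigenvector, and the maximum equals $\lambda_1$.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 6))
X -= X.mean(axis=0)
C = X.T @ X / X.shape[0]

evals, U = np.linalg.eigh(C)
u1, lam1 = U[:, -1], evals[-1]        # top eigenvector and eigenvalue

best_random = 0.0
for _ in range(10_000):
    v = rng.normal(size=C.shape[0])
    v /= np.linalg.norm(v)            # enforce the constraint ||v||_2 = 1
    best_random = max(best_random, v @ C @ v)

print(np.isclose(u1 @ C @ u1, lam1))  # variance along u1 equals lambda_1
print(best_random <= lam1)            # no random unit direction does better
```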

27 Outline Dimensionality reduction Linear Algebra background PCA Practical issues Probabilistic PCA Admixture models Population structure and GWAS PCA and admixture models PCA 17 / 57

28 Back to PCA. Given $N$ data points $x_n \in \mathbb{R}^M$, $n \in \{1, \dots, N\}$, find a linear transformation from a lower-dimensional space ($K < M$), $W \in \mathbb{R}^{M \times K}$, and projections $z_n \in \mathbb{R}^K$, so that we can reconstruct the original data from the lower-dimensional projection: $x_n \approx w_1 z_{n,1} + \dots + w_K z_{n,K} = [w_1 \dots w_K] (z_{n,1}, \dots, z_{n,K})^T = W z_n$, $z_n \in \mathbb{R}^K$. We assume the data are centered: $\sum_n x_{n,m} = 0$ for each feature $m$. Compression: we go from storing $N M$ numbers to $M K + N K$. How do we define the quality of the reconstruction? PCA and admixture models PCA 18 / 57

29 PCA. Find $z_n \in \mathbb{R}^K$ and $W \in \mathbb{R}^{M \times K}$ to minimize the reconstruction error $J(W, Z) = \frac{1}{N} \sum_n \|x_n - W z_n\|_2^2$, where $Z = [z_1, \dots, z_N]^T$. Require the columns of $W$ to be orthonormal. The optimal solution is obtained by setting $\hat{W} = U_K$, where $U_K$ contains the $K$ eigenvectors associated with the $K$ largest eigenvalues of the covariance matrix $C$ of $X$. The low-dimensional projection is $\hat{z}_n = \hat{W}^T x_n$. PCA and admixture models PCA 19 / 57
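
A minimal sketch of this recipe in numpy (the function names `pca_fit`, `pca_project`, and `pca_reconstruct` are hypothetical, not from the slides; assumes X is centered with samples in rows):

```python
import numpy as np

def pca_fit(X, K):
    """Top-K eigenvectors (M x K) of the covariance matrix of centered X."""
    C = X.T @ X / X.shape[0]
    evals, U = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]          # eigenvalues from largest to smallest
    return U[:, order[:K]]

def pca_project(X, W):
    """Low-dimensional scores z_n = W^T x_n, one row per sample."""
    return X @ W

def pca_reconstruct(Z, W):
    """Map scores back to the original space: x_n is approximated by W z_n."""
    return Z @ W.T

# Usage on toy data.
rng = np.random.default_rng(3)
X = rng.normal(size=(100, 10))
X -= X.mean(axis=0)
W_hat = pca_fit(X, K=2)
Z_hat = pca_project(X, W_hat)
X_hat = pca_reconstruct(Z_hat, W_hat)
print(Z_hat.shape, np.mean((X - X_hat) ** 2))   # (100, 2), mean reconstruction error
```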

32 PCA: K = 1. $J(w_1, z_1) = \frac{1}{N} \sum_n \|x_n - w_1 z_{n,1}\|_2^2 = \frac{1}{N} \sum_n (x_n - w_1 z_{n,1})^T (x_n - w_1 z_{n,1}) = \frac{1}{N} \sum_n \left( x_n^T x_n - 2 w_1^T x_n z_{n,1} + z_{n,1}^2 w_1^T w_1 \right) = \mathrm{const} + \frac{1}{N} \sum_n \left( -2 w_1^T x_n z_{n,1} + z_{n,1}^2 \right)$, using $w_1^T w_1 = 1$. To minimize this function, take the derivative with respect to $z_{n,1}$ and set it to zero: $\frac{\partial J(w_1, z_1)}{\partial z_{n,1}} = 0 \Rightarrow z_{n,1} = w_1^T x_n$. PCA and admixture models PCA 20 / 57

33 PCA: K = 1. Plugging back $z_{n,1} = w_1^T x_n$: $J(w_1) = \mathrm{const} + \frac{1}{N} \sum_n \left( -2 w_1^T x_n z_{n,1} + z_{n,1}^2 \right) = \mathrm{const} + \frac{1}{N} \sum_n \left( -2 z_{n,1}^2 + z_{n,1}^2 \right) = \mathrm{const} - \frac{1}{N} \sum_n z_{n,1}^2$. Now, because the data are centered, $\mathbb{E}[z_1] = \frac{1}{N} \sum_n z_{n,1} = \frac{1}{N} \sum_n w_1^T x_n = w_1^T \left( \frac{1}{N} \sum_n x_n \right) = 0$. PCA and admixture models PCA 20 / 57

34 PCA: K = 1. $J(w_1) = \mathrm{const} - \frac{1}{N} \sum_n z_{n,1}^2$. $\mathrm{Var}[z_1] = \mathbb{E}[z_1^2] - \mathbb{E}[z_1]^2 = \frac{1}{N} \sum_n z_{n,1}^2 - 0 = \frac{1}{N} \sum_n z_{n,1}^2$. PCA and admixture models PCA 20 / 57

35 PCA: K = 1. Putting it together: $J(w_1) = \mathrm{const} - \frac{1}{N} \sum_n z_{n,1}^2$ and $\mathrm{Var}[z_1] = \frac{1}{N} \sum_n z_{n,1}^2$, so $J(w_1) = \mathrm{const} - \mathrm{Var}[z_1]$. Two views of PCA: find a direction that minimizes the reconstruction error; find a direction that maximizes the variance of the projected data. $\arg\min_{w_1} J(w_1) = \arg\max_{w_1} \mathrm{Var}[z_1]$. PCA and admixture models PCA 20 / 57

36 PCA: K = 1. $\arg\min_{w_1} J(w_1) = \arg\max_{w_1} \mathrm{Var}[z_1]$. $\mathrm{Var}[z_1] = \frac{1}{N} \sum_n z_{n,1}^2 = \frac{1}{N} \sum_n (w_1^T x_n)(w_1^T x_n) = \frac{1}{N} \sum_n w_1^T x_n x_n^T w_1 = w_1^T \left( \frac{1}{N} \sum_n x_n x_n^T \right) w_1 = w_1^T C w_1$. PCA and admixture models PCA 21 / 57

37 PCA: K = 1. So we need to solve $\arg\min_{w_1} J(w_1) = \arg\max_{w_1} \mathrm{Var}[z_1] = \arg\max_{w_1} w_1^T C w_1$. Since we required $W$ to be orthonormal, we add the constraint $\|w_1\|_2 = 1$. This objective is maximized when $w_1$ is the first eigenvector of $C$. PCA and admixture models PCA 21 / 57

38 PCA: K > 1. The argument extends to $K > 1$: since we require the directions $w_k$ to be orthonormal, we repeatedly search for the direction that maximizes the remaining variance and is orthogonal to the previously selected directions. PCA and admixture models PCA 22 / 57

39 Computing eigendecompositions. Numerical algorithms compute all eigenvalues and eigenvectors in $O(M^3)$ time: infeasible for genetic datasets. Computing the largest eigenvalue and eigenvector: power iteration, $O(M^2)$ per iteration. Since we are interested in covariance matrices, we can instead use algorithms that compute the singular value decomposition (SVD): $O(M N^2)$. (Will discuss later.) PCA and admixture models PCA 23 / 57
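
A sketch of power iteration for the leading eigenvalue/eigenvector (the tolerance and iteration cap are illustrative assumptions; each iteration is one $O(M^2)$ matrix-vector product for a dense covariance matrix):

```python
import numpy as np

def power_iteration(C, n_iter=1000, tol=1e-10, seed=0):
    """Leading eigenvalue and eigenvector of a symmetric PSD matrix C."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=C.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = C @ v                          # one O(M^2) matrix-vector product
        w /= np.linalg.norm(w)
        converged = np.linalg.norm(w - v) < tol
        v = w
        if converged:
            break
    return v @ C @ v, v                    # Rayleigh quotient gives the eigenvalue

# Sanity check against a full eigendecomposition on a small matrix.
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 8))
X -= X.mean(axis=0)
C = X.T @ X / X.shape[0]
lam1, u1 = power_iteration(C)
evals, U = np.linalg.eigh(C)
print(np.isclose(lam1, evals[-1]))
print(np.allclose(np.abs(u1), np.abs(U[:, -1]), atol=1e-6))   # up to sign
```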

40 Practical issues: choosing K. For visualization, $K = 2$ or $K = 3$. For other analyses, pick $K$ so that most of the variance in the data is retained. Fraction of variance retained by the top $K$ eigenvectors: $\frac{\sum_{k=1}^{K} \lambda_k}{\sum_{m=1}^{M} \lambda_m}$. PCA and admixture models PCA 24 / 57
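
A small sketch of this rule (the 90% threshold is an illustrative assumption, not a recommendation from the slides):

```python
import numpy as np

def choose_K(eigenvalues, frac=0.90):
    """Smallest K whose top-K eigenvalues retain at least `frac` of the variance."""
    lam = np.sort(eigenvalues)[::-1]            # decreasing order
    retained = np.cumsum(lam) / np.sum(lam)     # retained fraction for K = 1..M
    return int(np.searchsorted(retained, frac) + 1), retained

# Example on a toy covariance spectrum.
rng = np.random.default_rng(5)
X = rng.normal(size=(200, 10)) @ np.diag(np.linspace(3.0, 0.1, 10))
X -= X.mean(axis=0)
evals = np.linalg.eigvalsh(X.T @ X / X.shape[0])
K, retained = choose_K(evals, frac=0.90)
print(K, np.round(retained, 3))
```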

41 PCA: Example PCA and admixture models PCA 25 / 57

46 PCA on HapMap PCA and admixture models PCA 26 / 57

47 PCA on Human Genome Diversity Project PCA and admixture models PCA 27 / 57

49 PCA on European genetic data (Novembre et al., Nature 2008). PCA and admixture models PCA 28 / 57

50 Probabilistic interpretation of PCA. $z_n \overset{\text{iid}}{\sim} \mathcal{N}(0, I_K)$, $p(x_n \mid z_n) = \mathcal{N}(W z_n, \sigma^2 I_M)$. PCA and admixture models PCA 29 / 57

51 Probabilistic interpretation of PCA. $z_n \overset{\text{iid}}{\sim} \mathcal{N}(0, I_K)$, $p(x_n \mid z_n) = \mathcal{N}(W z_n, \sigma^2 I_M)$. Then $\mathbb{E}[x_n \mid z_n] = W z_n$ and $\mathbb{E}[x_n] = \mathbb{E}[\mathbb{E}[x_n \mid z_n]] = \mathbb{E}[W z_n] = W \, \mathbb{E}[z_n] = 0$. PCA and admixture models PCA 29 / 57

52 Probabilistic interpretation of PCA. $z_n \overset{\text{iid}}{\sim} \mathcal{N}(0, I_K)$, $p(x_n \mid z_n) = \mathcal{N}(W z_n, \sigma^2 I_M)$. Writing $x_n = W z_n + \epsilon_n$ with $\epsilon_n \sim \mathcal{N}(0, \sigma^2 I_M)$: $\mathrm{Cov}[x_n] = \mathbb{E}[x_n x_n^T] - \mathbb{E}[x_n] \mathbb{E}[x_n]^T = \mathbb{E}[(W z_n + \epsilon_n)(W z_n + \epsilon_n)^T] - 0 = \mathbb{E}[W z_n z_n^T W^T + 2 W z_n \epsilon_n^T + \epsilon_n \epsilon_n^T] = \mathbb{E}[W z_n z_n^T W^T] + \mathbb{E}[2 W z_n \epsilon_n^T] + \mathbb{E}[\epsilon_n \epsilon_n^T] = W \, \mathbb{E}[z_n z_n^T] \, W^T + 2 W \, \mathbb{E}[z_n \epsilon_n^T] + \sigma^2 I_M = W \, \mathbb{E}[z_n z_n^T] \, W^T + 2 W \, \mathbb{E}[z_n] \, \mathbb{E}[\epsilon_n]^T + \sigma^2 I_M = W I_K W^T + 2 W \cdot 0 + \sigma^2 I_M = W W^T + \sigma^2 I_M$. PCA and admixture models PCA 29 / 57
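
A quick simulation sketch of this identity (the particular W, sigma, and sample size are arbitrary choices): draw $x_n = W z_n + \epsilon_n$ from the model and compare the sample covariance to $W W^T + \sigma^2 I_M$.

```python
import numpy as np

rng = np.random.default_rng(6)
M, K, N, sigma = 5, 2, 200_000, 0.5
W = rng.normal(size=(M, K))

Z = rng.normal(size=(N, K))                    # z_n ~ N(0, I_K)
E = sigma * rng.normal(size=(N, M))            # eps_n ~ N(0, sigma^2 I_M)
X = Z @ W.T + E                                # x_n = W z_n + eps_n

sample_cov = X.T @ X / N                       # E[x_n] = 0, so no centering needed
model_cov = W @ W.T + sigma ** 2 * np.eye(M)
print(np.max(np.abs(sample_cov - model_cov)))  # small for large N
```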

53 Probabilistic PCA. Log-likelihood $LL(W, \sigma^2) \equiv \log P(D \mid W, \sigma^2)$. Maximize over $W$ subject to the constraint that the columns of $W$ are orthonormal. The maximum-likelihood estimators are $\hat{W}_{ML} = U_K (\Lambda_K - \sigma^2 I_K)^{1/2}$, where $U_K = [u_1 \dots u_K]$ and $\Lambda_K = \mathrm{diag}(\lambda_1, \dots, \lambda_K)$, and $\hat{\sigma}^2_{ML} = \frac{1}{M - K} \sum_{j=K+1}^{M} \lambda_j$. PCA and admixture models PCA 30 / 57
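
A sketch of this closed-form estimator (assumes centered data with samples in rows and a given K; the helper name `ppca_mle` is hypothetical):

```python
import numpy as np

def ppca_mle(X, K):
    """Closed-form ML estimates (W_hat, sigma2_hat) for probabilistic PCA."""
    N, M = X.shape
    C = X.T @ X / N
    evals, U = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]
    evals, U = evals[order], U[:, order]
    sigma2 = evals[K:].mean()                             # (1/(M-K)) * sum of discarded eigenvalues
    W = U[:, :K] @ np.diag(np.sqrt(evals[:K] - sigma2))   # U_K (Lambda_K - sigma2 I)^{1/2}
    return W, sigma2

# Usage on toy data with a few strong directions.
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6)) @ np.diag([3.0, 2.0, 1.0, 0.3, 0.3, 0.3])
X -= X.mean(axis=0)
W_hat, sigma2_hat = ppca_mle(X, K=2)
print(W_hat.shape, round(sigma2_hat, 3))
```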

55 Probabilistic PCA. Computing the MLE: either compute the eigenvalues and eigenvectors directly, or treat it as a hidden/latent variable problem and use EM. PCA and admixture models PCA 31 / 57
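
For the EM route, a compact sketch (the updates follow the standard EM recursions for probabilistic PCA; the initialization and fixed iteration count are arbitrary assumptions):

```python
import numpy as np

def ppca_em(X, K, n_iter=200, seed=0):
    """EM for probabilistic PCA on centered data X (N samples x M features)."""
    rng = np.random.default_rng(seed)
    N, M = X.shape
    W = rng.normal(size=(M, K))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of z_n under the current parameters.
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(K))   # K x K
        Ez = X @ W @ Minv                                    # rows are E[z_n]
        Ezz = N * sigma2 * Minv + Ez.T @ Ez                  # sum_n E[z_n z_n^T]
        # M-step: maximize the expected complete-data log-likelihood.
        W = (X.T @ Ez) @ np.linalg.inv(Ezz)
        sigma2 = (np.sum(X ** 2)
                  - 2 * np.sum(Ez * (X @ W))
                  + np.trace(Ezz @ W.T @ W)) / (N * M)
    return W, sigma2

# Usage: the implied covariance W W^T + sigma^2 I approximates the sample covariance.
rng = np.random.default_rng(8)
X = rng.normal(size=(400, 5)) @ np.diag([2.5, 1.5, 0.4, 0.4, 0.4])
X -= X.mean(axis=0)
W_em, sigma2_em = ppca_em(X, K=2)
print(np.round(W_em @ W_em.T + sigma2_em * np.eye(5), 2))
```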

57 Other advantages of Probabilistic PCA. Can use model selection to infer $K$: choose $K$ to maximize the marginal likelihood $P(D \mid K)$, or use cross-validation and pick the $K$ that maximizes the likelihood on held-out data, or use other model selection criteria such as AIC or BIC (see lecture 6 on clustering). PCA and admixture models PCA 32 / 57
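
A sketch of the cross-validation option (the train/held-out split, the candidate values of K, and the helper functions are illustrative assumptions; the closed-form estimator from the earlier snippet is repeated so this runs standalone):

```python
import numpy as np

def ppca_mle(X, K):
    """Closed-form ML estimates (W, sigma^2) for probabilistic PCA."""
    evals, U = np.linalg.eigh(X.T @ X / X.shape[0])
    order = np.argsort(evals)[::-1]
    evals, U = evals[order], U[:, order]
    sigma2 = evals[K:].mean()
    return U[:, :K] @ np.diag(np.sqrt(evals[:K] - sigma2)), sigma2

def heldout_loglik(X, W, sigma2):
    """Average log-density of rows of X under N(0, W W^T + sigma^2 I)."""
    M = X.shape[1]
    Cov = W @ W.T + sigma2 * np.eye(M)
    _, logdet = np.linalg.slogdet(Cov)
    quad = np.sum(X * (X @ np.linalg.inv(Cov)), axis=1)   # x^T Cov^{-1} x per row
    return float(np.mean(-0.5 * (M * np.log(2 * np.pi) + logdet + quad)))

rng = np.random.default_rng(9)
X = rng.normal(size=(600, 8)) @ np.diag([3.0, 2.0, 1.5, 0.3, 0.3, 0.3, 0.3, 0.3])
X -= X.mean(axis=0)
train, test = X[:400], X[400:]
for K in (1, 2, 3, 4, 5):
    W, s2 = ppca_mle(train, K)
    print(K, round(heldout_loglik(test, W, s2), 3))   # pick K with the best held-out value
```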

58 Mini-Summary. Dimensionality reduction: linear methods. Exploratory analysis and visualization. Downstream inference: can use the low-dimensional features for other tasks. Principal Components Analysis finds a linear subspace that minimizes the reconstruction error or, equivalently, maximizes the variance; an eigenvalue problem. The probabilistic interpretation also leads to EM. Why may PCA not be appropriate for genetic data? PCA and admixture models PCA 33 / 57
