Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA. Tobias Scheffer


1 Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen PCA Tobias Scheffer

2 Overview
Principal Component Analysis (PCA)
Kernel PCA
Fisher Linear Discriminant Analysis
t-SNE

3 PCA: Motivation
Data compression
Preprocessing (feature selection / noisy features)
Data visualization

4 PCA: Example
Representation of digits as an $m \times m$ pixel matrix. The actual number of degrees of freedom is significantly smaller, because many features are meaningless or are composites of several pixels.
Goal: reduce to a $d$-dimensional subspace.

5 PCA: Example
Representation of faces as an $m \times m$ pixel matrix. The actual number of degrees of freedom is significantly smaller, because many combinations of pixels cannot occur in faces.
Goal: reduce to a $d$-dimensional subspace.

6 PCA: Projection
A projection is an idempotent linear transformation.
Center point: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Covariance: $\Sigma = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T$
(Figure: data points $x_i$ and their projections $\mathrm{proj}_u(x_i)$ onto a direction $u$.)

7 PCA: Projection
A projection is an idempotent linear transformation.
Let $u_1 \in \mathbb{R}^m$ with $u_1^T u_1 = 1$. Then $y_1(x) = u_1^T x$ constitutes a projection onto a one-dimensional subspace.
For the projected data it follows that:
Mean: $u_1^T \bar{x}$
Variance: $\frac{1}{n} \sum_{i=1}^{n} \left(u_1^T x_i - u_1^T \bar{x}\right)^2 = u_1^T \Sigma u_1$

8 PCA: Problem Setting
Given: data $X = \begin{pmatrix} x_1^T \\ \vdots \\ x_n^T \end{pmatrix} = \begin{pmatrix} x_{11} & \cdots & x_{1m} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nm} \end{pmatrix}$
Find a matrix $U = (u_1 \cdots u_d)$ such that:
The vectors $u_i$ form an orthonormal basis.
Vector $u_1$ preserves maximal variance of the data: $\max_{u_1 : \|u_1\| = 1} u_1^T \frac{1}{n} X^T X \, u_1$
Vector $u_i$ preserves maximal residual variance: $\max_{u_i : \|u_i\| = 1,\, u_i \perp u_1, \ldots, u_i \perp u_{i-1}} u_i^T \frac{1}{n} X^T X \, u_i$
(With instances as rows of $X$, the empirical covariance is $\frac{1}{n} X^T X$.)

9 PCA: Problem Setting
Given: data $X$ as above; find a matrix $U = (u_1 \cdots u_d)$ such that:
The value $u_k^T x_i$ is the projection of $x_i$ onto dimension $u_k$.
The vector $U^T x_i$ is the projection of $x_i$ onto the coordinates $U$.
The matrix $Y = XU$ is the projection of $X$ onto the coordinates $U$:
$Y = XU = \begin{pmatrix} x_1^T u_1 & \cdots & x_1^T u_d \\ \vdots & & \vdots \\ x_n^T u_1 & \cdots & x_n^T u_d \end{pmatrix}$
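
A minimal numpy sketch (not part of the slides) of these two equivalent views of the projection; the data and directions are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # n = 100 instances as rows, m = 5 features
U = np.linalg.qr(rng.normal(size=(5, 2)))[0]  # 5 x 2 matrix with orthonormal columns u_1, u_2

Y = X @ U                # Y = XU: all projections at once, shape n x d
y0 = U.T @ X[0]          # U^T x_i: projection of a single instance
assert np.allclose(Y[0], y0)
```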

10 PCA: Assumption
To simplify the notation, assume centered data: $\bar{x} = 0$. This can be achieved by subtracting the mean value: $x_i^c = x_i - \bar{x}$.
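
The centering step as a short sketch (assuming, as above, that instances are the rows of X):

```python
import numpy as np

def center(X):
    """Subtract the column-wise mean so the centered data has mean zero."""
    x_bar = X.mean(axis=0)
    return X - x_bar, x_bar
```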

11 PCA: Direction of Maximum Variance
Find the direction $u_1$ that maximizes the projected variance.
Instances $x \sim p(X)$ (assume mean $\bar{x} = 0$). The projected variance onto (normalized) $u_1$ is
$E\left[\mathrm{proj}_{u_1}(x)^2\right] = E\left[u_1^T x x^T u_1\right] = u_1^T \underbrace{E\left[x x^T\right]}_{\Sigma_{xx}} u_1$
The covariance matrix, estimated from centered data:
$E\left[x x^T\right] \approx \frac{1}{n} \sum_{i=1}^{n} \begin{pmatrix} x_{i1} \\ \vdots \\ x_{im} \end{pmatrix} \begin{pmatrix} x_{i1} & \cdots & x_{im} \end{pmatrix} = \frac{1}{n} \sum_{i=1}^{n} \begin{pmatrix} x_{i1}^2 & \cdots & x_{i1} x_{im} \\ \vdots & & \vdots \\ x_{im} x_{i1} & \cdots & x_{im}^2 \end{pmatrix}$

12 PCA: Direction of Maximum Variance
Find the direction $u_1$ that maximizes the projected variance.
Instances $x \sim p(X)$ (assume mean $\bar{x} = 0$). The projected variance onto (normalized) $u_1$ is
$E\left[\mathrm{proj}_{u_1}(x)^2\right] = E\left[u_1^T x x^T u_1\right] = u_1^T \Sigma_{xx} u_1$
The empirical covariance matrix (of centered data) is $\Sigma_{xx} = \frac{1}{n} X^T X$.
How can we find the direction $u_1$ that maximizes $u_1^T \Sigma_{xx} u_1$? And how can we kernelize it?

13 PCA: Optimization Problem
Solution for $u_1$: maximize the variance of the projected data:
$\max_{u_1} u_1^T \Sigma_{xx} u_1$, such that $u_1^T u_1 = 1$
Lagrangian: $L(u_1, \lambda_1) = u_1^T \Sigma_{xx} u_1 + \lambda_1 \left(1 - u_1^T u_1\right)$

14 PCA: Optimization Problem
Solution for $u_1$: maximize the variance of the projected data:
$\max_{u_1} u_1^T \Sigma_{xx} u_1$, such that $u_1^T u_1 = 1$
Lagrangian: $L(u_1, \lambda_1) = u_1^T \Sigma_{xx} u_1 + \lambda_1 \left(1 - u_1^T u_1\right)$
Taking its derivative and setting it to zero, $\frac{\partial L}{\partial u_1} = 2 \Sigma_{xx} u_1 - 2 \lambda_1 u_1 = 0$, gives $\Sigma_{xx} u_1 = \lambda_1 u_1$.
The solution $u_1$ must be an eigenvector of $\Sigma_{xx}$.
Variance of the projected data: $u_1^T \Sigma_{xx} u_1 = u_1^T \lambda_1 u_1 = \lambda_1$
The solution is the eigenvector $u_1$ of $\Sigma_{xx}$ with the greatest eigenvalue $\lambda_1$, called the first principal component.

15 PCA: Optimization Problem
Solution for $u_i$: maximize the variance of the projected data:
$\max_{u_i} u_i^T \Sigma_{xx} u_i$, such that $u_i^T u_i = 1$ and $u_i \perp u_1, \ldots, u_i \perp u_{i-1}$
Lagrangian: $L(u_i, \lambda_i) = u_i^T \Sigma_{xx} u_i + \lambda_i \left(1 - u_i^T u_i\right)$
Taking its derivative and setting it to zero: $\Sigma_{xx} u_i = \lambda_i u_i$
The solution $u_i$ must be an eigenvector of $\Sigma_{xx}$.
To maximize the variance of the projected data: $u_i^T \Sigma_{xx} u_i = u_i^T \lambda_i u_i = \lambda_i$
To ensure that the $u_i$ are orthogonal, $u_i$ is the eigenvector with the next-largest eigenvalue $\lambda_i \leq \lambda_{i-1}$.

16 PCA: Optimization Problem
The eigenvector decomposition implies $\Sigma_{xx} = U \Lambda U^T$ with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_m)$.
However, if $U_{1:d}$ contains only the first $d$ eigenvectors, then $Y = X U_{1:d}$ retains only a fraction of the variance: $\frac{\sum_{i=1}^{d} \lambda_i}{\mathrm{tr}(\Sigma_{xx})}$
Choose $d$ smaller than $m$ but large enough to cover most of the variance.
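
A sketch of this selection rule (not from the slides; the 95% threshold is an assumed example value):

```python
import numpy as np

def choose_d(eigenvalues, threshold=0.95):
    """Smallest d whose leading eigenvalues cover `threshold` of tr(Sigma_xx)."""
    lam = np.sort(eigenvalues)[::-1]        # decreasing eigenvalues
    frac = np.cumsum(lam) / lam.sum()       # sum_{i<=d} lambda_i / tr(Sigma_xx)
    return int(np.searchsorted(frac, threshold) + 1)
```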

17 PCA
Projection of $x$ onto the eigenspace: $y_1(x) = u_1^T x$, and $y(x) = U^T x$ with $U = (u_1 \cdots u_d)$; for the whole data set, $Y = XU$.
The eigenvector with the largest eigenvalue is the first principal component.
The remaining principal components are orthogonal directions that maximize the residual variance.
The $d$ principal components are the eigenvectors of the $d$ largest eigenvalues.

18 PCA: Reverse Projection
Observation: the $u_j$ form a basis of $\mathbb{R}^m$, and the $y_j(x) = u_j^T x$ are the coordinates of $x$ in that basis.
A data point $x_i$ can thus be reconstructed in that basis: $x_i = \sum_{j=1}^{m} \left(x_i^T u_j\right) u_j$, or $X = X U U^T$.
If the data lies (mostly) in the $d$-dimensional principal subspace, we can also reconstruct it there: $x_i \approx \sum_{j=1}^{d} \left(x_i^T u_j\right) u_j$, or $X \approx X U_{1:d} U_{1:d}^T$, where $U_{1:d}$ is the matrix of the first $d$ eigenvectors.
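
A sketch of this reconstruction (not from the slides; assumes centered data X with instances as rows and eigenvectors as columns of U). The mean squared reconstruction error is the sum of the discarded eigenvalues $\sum_{j > d} \lambda_j$:

```python
import numpy as np

def reconstruct(X, U, d):
    """Project onto the first d eigenvectors and map back: X U_{1:d} U_{1:d}^T."""
    Ud = U[:, :d]            # m x d matrix of leading eigenvectors
    return X @ Ud @ Ud.T
```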

19 Reverse Projection: Example
Morphace (Universität Basel): a 3D face model of 200 persons (150,000 features); PCA with 199 principal components.

20 PCA: Algorithm
PCA finds the dataset's principal components, which maximize the projected variance.
Algorithm:
1. Compute the data's mean: $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$
2. Compute the data's covariance: $\Sigma_{xx} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T$
3. Find the principal axes: $U = \mathrm{eigenvectors}(\Sigma_{xx})$
4. Project the data onto the first $d$ eigenvectors: $x_i \mapsto U_{1:d}^T (x_i - \mu)$
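
The four steps as a direct numpy sketch (assuming instances as rows; `eigh` is used because $\Sigma_{xx}$ is symmetric):

```python
import numpy as np

def pca(X, d):
    """PCA following the four steps above; returns projections and principal axes."""
    mu = X.mean(axis=0)                  # 1. mean
    Xc = X - mu
    Sigma = Xc.T @ Xc / len(X)           # 2. covariance (1/n) X^T X of centered data
    lam, U = np.linalg.eigh(Sigma)       # 3. eigenvalues/eigenvectors, ascending order
    lam, U = lam[::-1], U[:, ::-1]       #    reorder to decreasing eigenvalues
    return Xc @ U[:, :d], U[:, :d], lam  # 4. project onto the first d eigenvectors
```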

21 Difference Between PCA and Autoencoder
PCA: a linear mapping $y(x) = U^T x$ from $x$ to $y$.
Autoencoder: a linear mapping from $x$ to $h$, followed by a nonlinear activation function, $y = \sigma(h)$.
An autoencoder with squared loss and a linear activation function is equivalent to PCA.
Stacked autoencoders: more nonlinearity, more complex mappings.
Kernel PCA: a linear mapping in a feature space $\Phi$.

22 Overview
Principal Component Analysis (PCA)
Kernel PCA
Fisher Linear Discriminant Analysis
t-SNE

23 Kernel PCA
PCA can only capture linear subspaces.
More complex features can capture non-linearity, so we want to use PCA in high-dimensional feature spaces.
Requirement: the data interact only through inner products.

24 Kernel PCA: Example
(Figure: a data set consisting of two concentric rings.)

25 Kernel PCA: Example
PCA fails to capture the data's two-ring structure: the rings are not separated in the first two components.

26 Kernel PCA: Kernel Recap
Linear classifiers: often adequate, but not always.
Idea: the data are implicitly mapped into another space, in which they are linearly separable.
Feature mapping: $x \mapsto \varphi(x)$
Associated kernel: $\kappa(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$
Kernel = inner product = similarity of examples.
(Figure: positive (+) and negative (-) examples that are not linearly separable in the input space.)
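
As a concrete example (not from the slides), the RBF kernel used for the ring data later on; `gamma` is an assumed bandwidth hyperparameter:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K_ij = kappa(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)  # squared distances
    return np.exp(-gamma * d2)
```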

27 Kernel PCA
Covariance of centered data: $\Sigma_{xx} = \frac{1}{n} \sum_i x_i x_i^T$; eigenvectors: $\Sigma_{xx} u = \lambda u$.
In feature space, with centered data: $\Sigma_{\varphi(x)\varphi(x)} = \frac{1}{n} \sum_i \varphi(x_i) \varphi(x_i)^T$
Eigenvectors: $\Sigma_{\varphi(x)\varphi(x)} u = \lambda u$, i.e. $\frac{1}{n} \sum_i \varphi(x_i) \left(\varphi(x_i)^T u\right) = \lambda u$
Since the left-hand side is a linear combination of the $\varphi(x_i)$, all solutions live in the span of $\varphi(x_1), \ldots, \varphi(x_n)$.

28 Kernel PCA
All solutions live in the span of $\varphi(x_1), \ldots, \varphi(x_n)$.
Hence, every eigenvector $u$ must be a linear combination of $\varphi(x_1), \ldots, \varphi(x_n)$: there exist $\alpha_k$ such that $u = \sum_{k=1}^{n} \alpha_k \varphi(x_k)$.
Hence, $\Sigma_{\varphi(x)\varphi(x)} u = \lambda u$ is satisfied if the $n$ projected equations are satisfied:
$\forall i: \quad \varphi(x_i)^T \Sigma_{\varphi(x)\varphi(x)} u = \lambda \varphi(x_i)^T u$
$\frac{1}{n} \varphi(x_i)^T \sum_j \varphi(x_j) \varphi(x_j)^T \sum_{k=1}^{n} \alpha_k \varphi(x_k) = \lambda \varphi(x_i)^T \sum_{k=1}^{n} \alpha_k \varphi(x_k)$

29 Kernel PCA
The $n$ projected equations $\forall i: \varphi(x_i)^T \Sigma_{\varphi(x)\varphi(x)} u = \lambda \varphi(x_i)^T u$ become
$\frac{1}{n} \sum_{j,k} \alpha_k \, \varphi(x_i)^T \varphi(x_j) \, \varphi(x_j)^T \varphi(x_k) = \lambda \sum_{k=1}^{n} \alpha_k \, \varphi(x_i)^T \varphi(x_k)$
In matrix form, with $K = \varphi(X) \varphi(X)^T$, i.e. $K_{ij} = \varphi(x_i)^T \varphi(x_j)$:
$K^2 \alpha = n \lambda K \alpha \implies K \alpha = n \lambda \alpha$

30 Kernel PCA
This results in the eigenvalue problem $K \alpha = n \lambda \alpha$.
Centering the data in feature space:
$K^c_{ij} = \left(\varphi(x_i) - \frac{1}{n} \sum_k \varphi(x_k)\right)^T \left(\varphi(x_j) - \frac{1}{n} \sum_k \varphi(x_k)\right) = K_{ij} - \bar{k}_i - \bar{k}_j + \bar{k}$
with $\bar{k}_i = \frac{1}{n} \sum_k K_{ik}$ and $\bar{k} = \frac{1}{n^2} \sum_{j,k} K_{jk}$.

31 Kernel-PCA Algorithm
Kernel PCA finds the dataset's principal components in an implicitly defined feature space.
Algorithm:
1. Compute the kernel matrix $K$: $K_{ij} = \kappa(x_i, x_j)$
2. Center the kernel matrix: $\tilde{K} = K - \frac{1}{n} \mathbf{1}\mathbf{1}^T K - \frac{1}{n} K \mathbf{1}\mathbf{1}^T + \frac{\mathbf{1}^T K \mathbf{1}}{n^2} \mathbf{1}\mathbf{1}^T$
3. Find its eigenvectors and eigenvalues: $U, \Lambda = \mathrm{eig}(\tilde{K})$
4. Find the dual vectors: $\alpha_k = \lambda_k^{-1/2} u_k$
5. Project the data onto the subspace: $x_j \mapsto \left(\sum_{i=1}^{n} \alpha_{k,i} \tilde{K}_{ij}\right)_{k=1}^{d} = \left(\alpha_k^T \tilde{K}_{\cdot,j}\right)_{k=1}^{d}$
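
The five steps as a numpy sketch (a minimal version, assuming a precomputed kernel matrix and strictly positive leading eigenvalues):

```python
import numpy as np

def kernel_pca(K, d):
    """Kernel PCA following the five steps above; returns one d-dim row per instance."""
    n = len(K)
    one = np.full((n, n), 1.0 / n)                  # the matrix (1/n) 1 1^T
    Kc = K - one @ K - K @ one + one @ K @ one      # 2. center the kernel matrix
    lam, U = np.linalg.eigh(Kc)                     # 3. eigenvectors, ascending order
    lam, U = lam[::-1], U[:, ::-1]                  #    reorder to decreasing eigenvalues
    alpha = U[:, :d] / np.sqrt(lam[:d])             # 4. dual vectors alpha_k
    return Kc @ alpha                               # 5. projections of all instances
```

For the ring example, `kernel_pca(rbf_kernel(X), 2)` would compute the two leading kernel principal components.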

32 Kernel PCA: Ring Data Example
Kernel PCA with an RBF kernel does capture the data's structure: the resulting projections separate the two rings.

33 Overview
Principal Component Analysis (PCA)
Kernel PCA
Fisher Linear Discriminant Analysis
t-SNE

34 Fisher Discriminant Analysis (FDA)
The subspace induced by PCA maximally captures the variance of all data. This is not the correct criterion for classification.
$\Sigma u_{\mathrm{PCA}} = \lambda_{\mathrm{PCA}} u_{\mathrm{PCA}}$
(Figure: two-class data in the original space and its projection onto the PCA subspace; the classes overlap after the projection.)

35 Fisher Discriminant Analysis (FDA)
Optimization criterion of PCA: maximize the data's variance in the subspace:
$\max_{u_i} u_i^T \Sigma u_i$, where $u_i^T u_i = 1$ and $u_i \perp u_j$ for $i \neq j$
Optimization criterion of FDA: maximize the between-class variance and minimize the within-class variance within the subspace:
$\max_u \frac{u^T \Sigma_b u}{u^T \Sigma_w u}$, where
$\Sigma_w = \Sigma_{+1} + \Sigma_{-1}$ (sum of the per-class covariances)
$\Sigma_b = (\bar{x}_{+1} - \bar{x}_{-1})(\bar{x}_{+1} - \bar{x}_{-1})^T$

36 Fisher Discriminant Analysis (FDA)
Optimization criterion of FDA for $k$ classes: maximize the between-class variance and minimize the within-class variance within the subspace:
$\max_u \frac{u^T \Sigma_b u}{u^T \Sigma_w u}$, where
$\Sigma_w = \Sigma_1 + \cdots + \Sigma_k$
$\Sigma_b = \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^T$, with $n_i$ the number of instances in class $i$
This leads to the generalized eigenvalue problem $\Sigma_b u = \lambda \Sigma_w u$, which has $k - 1$ different (nontrivial) solutions.
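
A sketch of solving this generalized eigenvalue problem (not from the slides; assumes $\Sigma_w$ is positive definite, as `scipy.linalg.eigh` requires):

```python
import numpy as np
from scipy.linalg import eigh

def fda(Sb, Sw, k):
    """Solve Sigma_b u = lambda Sigma_w u; keep the k-1 leading directions."""
    lam, U = eigh(Sb, Sw)              # generalized symmetric eigenvalue problem
    order = np.argsort(lam)[::-1]      # decreasing generalized eigenvalues
    return U[:, order[:k - 1]]
```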

37 Fisher Discriminant Analysis (FDA)
The subspace induced by PCA maximally captures the variance of all data; FDA instead optimizes a criterion suited to classification.
$\Sigma_b u_{\mathrm{FIS}} = \lambda_{\mathrm{FIS}} \Sigma_w u_{\mathrm{FIS}}$
(Figure: two-class data in the original space and its projections onto the PCA and Fisher subspaces; the Fisher direction $u_{\mathrm{FIS}}$ separates the classes.)

38 Overview
Principal Component Analysis (PCA)
Kernel PCA
Fisher Linear Discriminant Analysis
t-SNE

39 t-sne Impossible to preserve all distances when projecting data into lower-dimensional space. PCA: Preserve maximum variance. Variance is squared distance. Sum is dominated by instances that are far apart. Instances that are far apart from each other shall remain as far apart in the projected space. Idea of t-sne: Preserve local neighborhood. Instances that are close to each other shall remain close in the projected space. Instances that are far apart may be moved further apart by the projection. 45

40 2D PCA for MNIST Handwritten Digits
PCA is poor at preserving closeness between similar bitmaps.

41 Local Neighborhood in Original Space
Probability that $x_i$ would pick $x_j$ as its neighbor if neighbors were picked according to a Gaussian distribution centered at $x_i$:
$p_{j|i} = \frac{\exp\left(-\frac{\|x_i - x_j\|^2}{2 \sigma_i^2}\right)}{\sum_{k \neq i} \exp\left(-\frac{\|x_i - x_k\|^2}{2 \sigma_i^2}\right)}$
Set each $\sigma_i$ such that the conditional distribution has a fixed entropy.
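
A sketch of this calibration (not from the slides): compute $p_{j|i}$ for a candidate $\sigma_i$, then binary-search $\sigma_i$ until the entropy hits the target (in the standard formulation, $\log_2$ of the desired perplexity):

```python
import numpy as np

def conditional_p(X, i, sigma):
    """p_{j|i} under a Gaussian centered at x_i with bandwidth sigma."""
    d2 = np.sum((X - X[i]) ** 2, axis=1)
    p = np.exp(-d2 / (2 * sigma ** 2))
    p[i] = 0.0                                  # x_i never picks itself
    return p / p.sum()

def find_sigma(X, i, target_entropy, lo=1e-3, hi=1e3, iters=50):
    """Binary search for sigma_i: larger sigma gives a flatter p, hence higher entropy."""
    for _ in range(iters):
        sigma = np.sqrt(lo * hi)                # bisect on a log scale
        p = conditional_p(X, i, sigma)
        h = -np.sum(p[p > 0] * np.log2(p[p > 0]))
        lo, hi = (sigma, hi) if h < target_entropy else (lo, sigma)
    return sigma
```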

42 Distance in Projected Space
Probability that $x_i$ would pick $x_j$ as its neighbor if neighbors were picked according to a Student's t-distribution centered at $y_i$:
$q_{j|i} = \frac{\left(1 + \|y_i - y_j\|^2\right)^{-1}}{\sum_{k \neq i} \left(1 + \|y_i - y_k\|^2\right)^{-1}}$
The Student's t-distribution has heavier tails: very large distances are more likely than under a Gaussian. Moving distant instances further apart therefore incurs less penalty.

43 t-sne: Optimization Criterion Move instances around in projected space to minimize Kullback-Leibler divergence: KL(p q = p j i log p j i j i q j i If p j i is large but q j i is small: large penalty. If q j i is large but p j i is small: smaller penalty. i Hence, preserves local neighborhood structure of the data. 49

44 t-sne: Optimization Move instances around in projected space to minimize Kullback-Leibler divergence Gradient for projected instance y i : KL(p q y i = 4 (pj i qj i) 1 + yi yj 2 1 (y i y j ) j i Implementation for large samples: Build quadtree over data Approximate p j i and q j i of instances in distinct branches by distances between centers of mass. 50

45 2D t-sne for MNIST Handwritten Digits Local similarities are preserved better. 51

46 Summary
PCA constructs a lower-dimensional space that preserves most of the variance.
Kernel PCA works on the kernel matrix; it is useful when there are fewer instances than features.
Fisher linear discriminant analysis maximizes the between-class variance and minimizes the within-class variance.
t-SNE finds a projection that preserves local neighborhood relations.
