Dimension Reduction (PCA, ICA, CCA, FLD,

Size: px

Start display at page:

Download "Dimension Reduction (PCA, ICA, CCA, FLD,"

Milton Bradford
6 years ago
Views:

1 Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang , Machine Learning, Spring 2011 April 6 th, 2011 Parts of the PCA slides are from previous lectures 1

2 Outline Dimension reduction Principal Components Analysis Independent Component Analysis Canonical Correlation Analysis Fisher s Linear Discriminant Topic Models and Latent Dirichlet Allocation 2

3 Dimension reduction Feature selection select a subset of features More generally, feature extraction Not limited to the original features Dimension reduction usually refers to this case 3

4 Dimension reduction Assumption: data (approximately) lies on a lower dimensional space Examples: 4

5 Outline Dimension reduction Principal Components Analysis Independent Component Analysis Canonical Correlation Analysis Fisher s Linear Discriminant Topic Models and Latent Dirichlet Allocation 5

6 Principal components analysis 6

7 Principal components analysis 7

8 Principal components analysis 8

9 Principal components analysis 9

10 Principal components analysis Assume data is centered For a projection direction v Variance of projected data 10

11 Principal components analysis Assume data is centered For a projection direction v Variance of projected data Maximize the variance of projected data 11

12 Principal components analysis Assume data is centered For a projection direction v Variance of projected data Maximize the variance of projected data How to solve this? 12

13 Principal components analysis PCA formulation As a result 13

14 Principal components analysis 14

15 Outline Dimension reduction Principal Components Analysis Independent Component Analysis Canonical Correlation Analysis Fisher s Linear Discriminant Topic Models and Latent Dirichlet Allocation 15

16 Source separation The classical cocktail party problem Separate the mixed signal into sources 16

17 Source separation The classical cocktail party problem Separate the mixed signal into sources Assumption: different sources are independent 17

18 Independent component analysis Let v 1, v 2, v 3, v d denote the projection directions of independent components ICA: find these directions such that data projected onto these directions have maximum statistical independence 18

19 Independent component analysis Let v 1, v 2, v 3, v d denote the projection directions of independent components ICA: find these directions such that data projected onto these directions have maximum statistical independence How to actually maximize independence? Minimize the mutual information Or maximize the non-gaussianity Actual formulation quite complicated! 19

20 Outline Dimension reduction Principal Components Analysis Independent Component Analysis Canonical Correlation Analysis Fisher s Linear Discriminant Topic Models and Latent Dirichlet Allocation 20

data is maximized Intuitively, find the intrinsic subspace of

21 Recall: PCA Principal component analysis Note: Find the projection direction v such that the variance of projected data is maximized Intuitively, find the intrinsic subspace of the original feature space (in terms of retaining the data variability) 21

22 Canonical correlation analysis Now consider two sets of variables x and y x is a vector of p variables y is a vector of q variables Basically, two feature spaces How to find the connection between two set of variables (or two feature spaces)? 22

23 Canonical correlation analysis Now consider two sets of variables x and y x is a vector of p variables y is a vector of q variables Basically, two feature spaces How to find the connection between two set of variables (or two feature spaces)? CCA: find a projection direction u in the space of x, and a projection direction v in the space of y, so that projected data onto u and v has max correlation Note: CCA simultaneously finds dimension reduction for two feature spaces 23

24 Canonical correlation analysis CCA formulation X is n by p: n samples in p-dimensional space Y is n by q: n samples in q-dimensional space The n samples are paired in X and Y 24

25 Canonical correlation analysis CCA formulation X is n by p: n samples in p-dimensional space Y is n by q: n samples in q-dimensional space The n samples are paired in X and Y How to solve? kind of complicated 25

26 Canonical correlation analysis CCA formulation X is n by p: n samples in p-dimensional space Y is n by q: n samples in q-dimensional space The n samples are paired in X and Y How to solve? Generalized eigenproblems! 26

27 Outline Dimension reduction Principal Components Analysis Independent Component Analysis Canonical Correlation Analysis Fisher s Linear Discriminant Topic Models and Latent Dirichlet Allocation 27

28 Fisher s linear discriminant Now come back to one feature space In addition to features, we also have label Find the dimension reduction that helps separate different classes of examples! Let s consider 2-class case 28

29 Fisher s linear discriminant Idea: maximize the ratio of between-class variance over within-class variance for the projected data 29

30 Fisher s linear discriminant 30

31 Fisher s linear discriminant Generalize to multi-class cases Still, maximizing the ratio of between-class variance over within-class variance of the projected data: 31

32 Outline Dimension reduction Principal Components Analysis Independent Component Analysis Canonical Correlation Analysis Fisher s Linear Discriminant Topic Models and Latent Dirichlet Allocation 32

33 Topic models Topic models: a class of dimension reduction models on text (from words to topics) 33

34 Topic models Topic models: a class of dimension reduction models on text (from words to topics) Bag-of-words representation of documents 34

35 Topic models Topic models: a class of dimension reduction models on text (from words to topics) Bag-of-words representation of documents Topic models for representing documents 35

36 Latent Dirichlet allocation A fully Bayesian specification of topic models 36

37 Latent Dirichlet allocation Data: words on each documents Estimation: maximizing the data likelihood difficult! 37

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani

PCA & ICA. CE-717: Machine Learning Sharif University of Technology Spring Soleymani PCA & ICA CE-717: Machine Learning Sharif University of Technology Spring 2015 Soleymani Dimensionality Reduction: Feature Selection vs. Feature Extraction Feature selection Select a subset of a given