Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models)

Transcription:

Dimension Reduction (PCA, ICA, CCA, FLD, Topic Models) Yi Zhang 10-701, Machine Learning, Spring 2011 April 6th, 2011 Parts of the PCA slides are from previous 10-701 lectures

Outline: Dimension reduction; Principal Components Analysis; Independent Component Analysis; Canonical Correlation Analysis; Fisher's Linear Discriminant; Topic Models and Latent Dirichlet Allocation

Dimension reduction: feature selection selects a subset of the original features. More generally, feature extraction builds new features that are not limited to the original ones; dimension reduction usually refers to this more general case (see the sketch below).
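To make the distinction concrete, here is a minimal sketch assuming scikit-learn and the Iris dataset (both the dataset and the particular methods are illustrative choices, not from the lecture):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)   # 150 samples, 4 original features

# Feature selection: keep a subset of the original columns (here, 2 of 4).
selected = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: build 2 new features as linear combinations of all 4.
extracted = PCA(n_components=2).fit_transform(X)

print(selected.shape, extracted.shape)  # both (150, 2), but different features
```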

Dimension reduction assumption: the data (approximately) lies in a lower-dimensional subspace of the original feature space. Examples: (illustrated by figures on the slide)

Outline: Dimension reduction; Principal Components Analysis; Independent Component Analysis; Canonical Correlation Analysis; Fisher's Linear Discriminant; Topic Models and Latent Dirichlet Allocation

Principal components analysis (a sequence of figure slides)

Principal components analysis: assume the data is centered. For a projection direction v, consider the variance of the projected data, and choose v to maximize that variance. How to solve this?

Principal components analysis: the PCA formulation and the resulting solution (reconstructed below).
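The formulation on this slide is the standard one; a reconstruction, for centered data points x_1, ..., x_n, a unit-length direction v, and sample covariance S:

$$
\max_{v}\; \frac{1}{n}\sum_{i=1}^{n}\left(v^\top x_i\right)^2 \;=\; v^\top S\, v
\qquad \text{subject to } v^\top v = 1,
\qquad S = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top .
$$

Introducing a Lagrange multiplier gives $S v = \lambda v$, so the optimal v is the leading eigenvector of S, the maximized variance equals the corresponding eigenvalue, and subsequent principal components are the next (mutually orthogonal) eigenvectors.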

Principal components analysis
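A minimal NumPy sketch of that recipe (center, form the covariance, take the top eigenvectors, project); the function name and the choice of two components are illustrative:

```python
import numpy as np

def pca_project(X, k=2):
    """Project rows of X onto the top-k principal directions."""
    Xc = X - X.mean(axis=0)               # center the data
    S = Xc.T @ Xc / Xc.shape[0]           # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :k]           # top-k eigenvectors as columns
    return Xc @ V                         # projected (n, k) data

X = np.random.randn(200, 5) @ np.random.randn(5, 5)  # correlated toy data
Z = pca_project(X, k=2)
print(Z.shape)  # (200, 2)
```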

Outline: Dimension reduction; Principal Components Analysis; Independent Component Analysis; Canonical Correlation Analysis; Fisher's Linear Discriminant; Topic Models and Latent Dirichlet Allocation

Source separation: the classical cocktail party problem. Separate the mixed signals into their underlying sources. Assumption: the different sources are statistically independent.

Independent component analysis: let v1, v2, ..., vd denote the projection directions of the independent components. ICA finds these directions such that the data projected onto them has maximum statistical independence. How to actually maximize independence? Minimize the mutual information, or maximize the non-Gaussianity; the actual formulation is quite involved!
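A minimal cocktail-party sketch using FastICA from scikit-learn (one common ICA algorithm, which works by maximizing non-Gaussianity); the synthetic sources and mixing matrix are made up for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                      # source 1: sinusoid
s2 = np.sign(np.cos(3 * t))             # source 2: square wave
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5], [0.4, 1.0]])  # unknown mixing matrix
X = S @ A.T                             # observed mixed signals

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)            # recovered sources (up to order and scale)
print(S_hat.shape)  # (2000, 2)
```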

Outline: Dimension reduction; Principal Components Analysis; Independent Component Analysis; Canonical Correlation Analysis; Fisher's Linear Discriminant; Topic Models and Latent Dirichlet Allocation

Recall PCA: principal component analysis finds the projection direction v such that the variance of the projected data is maximized. Intuitively, it finds the intrinsic subspace of the original feature space (in terms of retaining the data's variability).

Canonical correlation analysis: now consider two sets of variables x and y, where x is a vector of p variables and y is a vector of q variables; basically, two feature spaces. How do we find the connection between the two sets of variables (i.e., the two feature spaces)? CCA finds a projection direction u in the space of x and a projection direction v in the space of y such that the data projected onto u and v has maximum correlation. Note: CCA simultaneously finds a dimension reduction for both feature spaces.

Canonical correlation analysis: the CCA formulation. X is n by p (n samples in a p-dimensional space), Y is n by q (n samples in a q-dimensional space), and the n samples are paired across X and Y; the objective is given below.
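A reconstruction of the standard CCA objective in terms of the (centered) covariance blocks:

$$
\max_{u \in \mathbb{R}^p,\; v \in \mathbb{R}^q}\;
\operatorname{corr}(Xu,\, Yv)
= \frac{u^\top \Sigma_{xy}\, v}{\sqrt{u^\top \Sigma_{xx}\, u}\;\sqrt{v^\top \Sigma_{yy}\, v}},
\qquad
\Sigma_{xx} = \tfrac{1}{n}X^\top X,\;\;
\Sigma_{yy} = \tfrac{1}{n}Y^\top Y,\;\;
\Sigma_{xy} = \tfrac{1}{n}X^\top Y .
$$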

Canonical correlation analysis: how to solve this? It looks complicated, but it reduces to a generalized eigenproblem!
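A minimal sketch using scikit-learn's CCA, which handles that eigenproblem internally; the paired toy data sharing one latent signal is made up for illustration:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 1))              # shared latent signal
X = np.hstack([latent + 0.1 * rng.normal(size=(100, 1)) for _ in range(4)])  # p = 4
Y = np.hstack([latent + 0.1 * rng.normal(size=(100, 1)) for _ in range(3)])  # q = 3

cca = CCA(n_components=1)
X_c, Y_c = cca.fit_transform(X, Y)              # projections onto u and v
print(np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1])  # close to 1: maximal correlation
```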

Outline: Dimension reduction; Principal Components Analysis; Independent Component Analysis; Canonical Correlation Analysis; Fisher's Linear Discriminant; Topic Models and Latent Dirichlet Allocation

Fisher's linear discriminant: now come back to a single feature space. In addition to the features, we also have labels. Find the dimension reduction that helps separate the different classes of examples! Let's consider the 2-class case first.

Fisher's linear discriminant idea: maximize the ratio of between-class variance to within-class variance for the projected data.
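For the 2-class case with class means m_1, m_2, within-class scatter S_W, and between-class scatter S_B, the criterion being maximized is:

$$
J(v) \;=\; \frac{v^\top S_B\, v}{v^\top S_W\, v},
\qquad S_B = (m_1 - m_2)(m_1 - m_2)^\top ,
$$

which is maximized by $v \propto S_W^{-1}(m_1 - m_2)$.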

Fisher's linear discriminant

Fisher's linear discriminant: generalize to the multi-class case. Still, maximize the ratio of between-class variance to within-class variance of the projected data (see the sketch below).
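A minimal sketch of the multi-class case via scikit-learn's LinearDiscriminantAnalysis, used here purely as a supervised dimension reduction; the dataset choice is illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)        # 3 classes, 4 features

# With C classes, at most C - 1 discriminant directions exist (here, 2).
lda = LinearDiscriminantAnalysis(n_components=2)
Z = lda.fit_transform(X, y)              # projection maximizing class separation
print(Z.shape)  # (150, 2)
```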

Outline: Dimension reduction; Principal Components Analysis; Independent Component Analysis; Canonical Correlation Analysis; Fisher's Linear Discriminant; Topic Models and Latent Dirichlet Allocation

Topic models: a class of dimension reduction models for text (from words to topics). Documents are first given a bag-of-words representation, and topic models then represent each document in terms of topics.
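A minimal sketch of the bag-of-words step, assuming scikit-learn's CountVectorizer and a tiny made-up corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stock markets fell on monday",
]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)        # document-term count matrix (3, vocab size)
print(X.toarray())
print(vectorizer.get_feature_names_out())
```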

Latent Dirichlet allocation: a fully Bayesian specification of topic models.

Latent Dirichlet allocation: the data are the words in each document. Estimation: maximizing the data likelihood directly is difficult!
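Because exact maximum likelihood is intractable, practical implementations approximate it; a minimal sketch using scikit-learn's LatentDirichletAllocation (which fits the model with variational Bayes), on a tiny made-up corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stock markets fell on monday",
    "investors sold shares as markets fell",
]
X = CountVectorizer(stop_words="english").fit_transform(docs)

# Fit a 2-topic LDA model by (approximate) variational inference.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)        # per-document topic proportions (4, 2)
print(doc_topics.round(2))
```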