MULTI-VARIATE/MODALITY IMAGE ANALYSIS

1 MULTI-VARIATE/MODALITY IMAGE ANALYSIS Duygu Tosun-Turgut, Ph.D. Center for Imaging of Neurodegenerative Diseases Department of Radiology and Biomedical Imaging

2 Curse of dimensionality A major problem in imaging statistics is the curse of dimensionality. If the data x lies in high dimensional space, then an enormous amount of data is required to learn distributions or decision rules.

3 Curse of dimensionality One way to deal with dimensionality is to assume that we know the form of the probability distribution. For example, a Gaussian model in N dimensions has N mean parameters plus N(N+1)/2 covariance parameters to estimate. Requires O(N^2) data to learn reliably. This may not be practical.

4 Typical solution: divide into sub-problems Each sub-problem relates single voxel to clinical variable Known as voxel-wise, univariate, or pointwise regression Approach popularized by SPM (Friston, 1995) Dependencies between spatial variables neglected!!

5 Dimension reduction One way to avoid the curse of dimensionality is by projecting the data onto a lower-dimensional space. Techniques for dimension reduction: Principal Component Analysis (PCA), Fisher's Linear Discriminant, Multi-dimensional Scaling, Independent Component Analysis (ICA).

6 A dual goal Find a good representation (the features part). Reduce redundancy in the data (a side effect of proper features).

7 A good feature Simplifies the explanation of the input. Represents repeating patterns. When found, it makes the input simpler to describe. How do we define these abstract qualities? On to the math.

8 Linear features Z = WX, where Z (features × samples) is the feature matrix, W (features × dimensions) is the weight matrix, and X (dimensions × samples) is the input matrix.

9 A 2D example Matrix representation of the data: Z = WX, i.e. [z_1^T; z_2^T] = [w_1^T; w_2^T] [x_1^T; x_2^T]. (Figure: scatter plot of the same data in the (x_1, x_2) plane.)

10 Defining a goal Desirable features: give simple weights, avoid feature similarity. How do we define these?

11 One way to proceed Simple weights: minimize the relation between the two dimensions, i.e. decorrelate. Feature similarity: same thing, decorrelate. Decorrelate: z_1^T z_2 = 0 and w_1^T w_2 = 0.

12 Diagonalizing the covariance Covariance matrix: Cov(z_1, z_2) = [z_1^T z_1, z_1^T z_2; z_2^T z_1, z_2^T z_2] / N. Diagonalizing the covariance suppresses cross-dimensional co-activity (if z_1 is high, z_2 won't be): Cov(z_1, z_2) = [z_1^T z_1, 0; 0, z_2^T z_2] / N = I.

13 Problem definition For a given input X, find a feature matrix W so that the weights decorrelate: (WX)(WX)^T = N I, i.e. Z Z^T = N I, i.e. W Cov(X) W^T = I. Solving for diagonalization!

14 Solving for diagonalization Covariance matrices are symmetric and positive semi-definite. They therefore have orthogonal eigenvectors and real (non-negative) eigenvalues, and are factorizable as U^T A U = Λ, where U has the eigenvectors of A in its columns and Λ = diag(λ_i), with λ_i the eigenvalues of A.
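
A minimal numpy sketch of this whitening step, assuming zero-mean data arranged as dimensions × samples; the function name and the toy 2-D data are illustrative only, not part of the original slides:

```python
import numpy as np

def whiten(X):
    """Decorrelate zero-mean data X (dimensions x samples) so that
    the transformed weights Z = W X satisfy Z Z^T / N = I."""
    N = X.shape[1]
    C = X @ X.T / N                              # covariance of zero-mean X
    eigvals, U = np.linalg.eigh(C)               # C = U diag(eigvals) U^T
    W = np.diag(1.0 / np.sqrt(eigvals)) @ U.T    # scale by 1/sqrt(lambda), rotate by U^T
    return W @ X, W

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3.0, 1.2], [1.2, 1.0]], size=500).T
Z, W = whiten(X)
print(np.round(Z @ Z.T / Z.shape[1], 2))         # approximately the identity matrix
```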

15 Decorrelation An implicit Gaussian assumption N-D data has N directions of variance

16 Undoing the variance The decorrelating matrix W contains two vectors that normalize the input s variance

17 Resulting transform Input gets scaled to a well-behaved Gaussian with unit variance in all dimensions. (Figure: input data vs. transformed data/weights.)

18 A more complex case Having correlation between two dimensions. We still find the directions of maximal variance, but we also rotate in addition to scaling. (Figure: input data vs. transformed data/weights.)

19 One more detail So far we considered zero-mean inputs; the transforming operation was a rotation. If the input mean is not zero, bad things happen! Make sure that your data is zero-mean! (Figure: input data vs. transformed data/weights.)

20 Principal component analysis This transform is known as PCA. The features are the principal components. They are orthogonal to each other and produce orthogonal (white) weights. Major tool in statistics: removes dependencies from multivariate data. Also known as the KLT (Karhunen-Loève transform).

21 A better way to compute PCA The Singular Value Decomposition way: [U, S, V] = SVD(A), with A = U S V^T. Relationship to eigendecomposition: in our case (covariance input A), U and S hold the eigenvectors and eigenvalues of A. Why the SVD? More stable, more robust, fancy extensions.
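
A short numpy sketch of PCA via the SVD, under the convention that X holds samples in rows; the function name and the choice of k are illustrative:

```python
import numpy as np

def pca_svd(X, k):
    """PCA via the SVD. X is samples x dimensions; returns the top-k
    principal directions, the projected scores, and their variances."""
    Xc = X - X.mean(axis=0)                         # PCA assumes zero-mean data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                             # rows are principal directions
    scores = Xc @ components.T                      # data expressed in the new basis
    explained_variance = S[:k] ** 2 / (X.shape[0] - 1)
    return components, scores, explained_variance
```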

22 Dimensionality reduction PCA is great for high dimensional data Allows us to perform dimensionality reduction Helps us find relevant structure in data Helps us throw away things that won t matter

23 What is the number of dimensions? If the input was M-dimensional, how many dimensions do we keep? No solid answer (estimators exist). Look at the singular/eigenvalues: they show the variance of each component, and at some point it becomes small.

24 Example Eigenvalues of 1200-dimensional video data. Little variance after component 30, so we don't need to keep the rest of the data. Dimension reduction occurs by ignoring the directions in which the covariance is small.
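
One common heuristic, sketched below, is a cumulative-variance cutoff on the singular values; the 95% threshold is an arbitrary illustrative choice, not a rule from the slides:

```python
import numpy as np

def choose_k(singular_values, threshold=0.95):
    """Smallest k whose leading components explain `threshold`
    of the total variance (variance = squared singular values)."""
    var = np.asarray(singular_values) ** 2
    cumulative = np.cumsum(var) / var.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)
```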

25 Recap PCA decorrelates multivariate data, finds useful components, and reduces dimensionality. PCA is only powerful if the biological question is related to the highest variance in the dataset. If not, other techniques are more useful: Independent Component Analysis.

26 Limitations of PCA PCA may not find the best directions for discriminating between two classes. The 1st eigenvector is best for representing the probabilities; the 2nd eigenvector is best for discrimination.

27 Fisher's linear discriminant analysis (LDA) Performs dimensionality reduction while preserving as much of the class-discriminatory information as possible. Seeks to find directions along which the classes are best separated. Takes into consideration not only the within-class scatter but also the between-class scatter. (Figure: projection directions z_1 and z_2.)

28 LDA via PCA PCA is first applied to the data set to reduce its dimensionality: [x_1, x_2, ..., x_N] --PCA--> [y_1, y_2, ..., y_K]. LDA is then applied to find the most discriminative directions: [y_1, y_2, ..., y_K] --LDA--> [z_1, z_2, ..., z_(C-1)].
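
A scikit-learn sketch of this PCA-then-LDA pipeline; the iris data and the component counts are placeholders chosen only to make the example run:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# PCA reduces x_1..x_N to K components; LDA then finds at most C-1
# discriminative directions (2 here, since iris has 3 classes).
pca_lda = make_pipeline(PCA(n_components=3),
                        LinearDiscriminantAnalysis(n_components=2))
Z = pca_lda.fit_transform(X, y)
print(Z.shape)   # (150, 2)
```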

29 Where PCA fails PCA focuses on uncorrelated, Gaussian components (second-order statistics, orthogonal transformation). ICA instead focuses on independent, non-Gaussian components (higher-order statistics, non-orthogonal transformation).

30 Statistical independence If we know something about x, that should tell us nothing about y. (Figure: dependent vs. independent scatter plots.)

31 Concept of ICA A given signal x is generated by linear mixing A of independent components s: x = As. ICA is a statistical analysis method to estimate those independent components z and the unmixing rule W: z = Wx = WAs. Blind source separation. (Diagram: sources s_1..s_M → mixing A → observations x_1..x_M → unmixing W → estimated components z_1..z_M; s and A are unknown, x is observed.)

32 Example: PCA and ICA Model: [x_1; x_2] = [a_11, a_12; a_21, a_22] [s_1; s_2].

33 How does it work?

34 Rationale of ICA Find the components s_i that are as independent as possible, in the sense of maximizing some function F(s_1, s_2, ..., s_k) that measures independence. All ICs (except at most one) must be non-Gaussian. The variance of all ICs is fixed to 1. There is no hierarchy between ICs.

35 How to find ICs? Many choices of objective function F. Mutual information: MI = ∫ f(s_1, s_2, ..., s_k) log [ f(s_1, s_2, ..., s_k) / ( f_1(s_1) f_2(s_2) ... f_k(s_k) ) ] ds. Use the kurtosis of the variables to approximate the distribution function. The number of ICs is chosen by the user.

36 Difference with PCA ICA is not a dimensionality reduction technique. There is no single (exact) solution for the components (implementations in R: FastICA, PearsonICA, MLICA). ICs are of course uncorrelated, but also as independent as possible. Uninteresting for Gaussian-distributed variables.
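
The R packages above have Python counterparts; here is a minimal scikit-learn FastICA sketch on toy mixtures, where the sources and mixing matrix are invented purely for illustration:

```python
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                        # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))               # source 2: square wave (non-Gaussian)
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5],                 # "unknown" mixing matrix
              [0.5, 2.0]])
X = S @ A.T                               # observed mixtures x = A s

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)              # estimated independent components
A_hat = ica.mixing_                       # estimated mixing matrix
```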

37 Feature Extraction in ECG data (Raw Data)

38 Feature Extraction in ECG data (PCA)

39 Feature Extraction in ECG data (Extended ICA)

40 Feature Extraction in ECG data (flexible ICA)

41 Application domains of ICA Image denoising. Medical signal processing (fMRI, ECG, EEG). Feature extraction, face recognition. Compression, redundancy reduction. Watermarking. Clustering. Time series analysis (stock market, microarray data). Topic extraction. Econometrics: finding hidden factors in financial data.

42 Gene clusters [Hsiao et al. 2002] Each gene is mapped to a point based on the value assigned to the gene in the 14th (x-axis), 15th (y-axis) and 55th (z-axis) independent components: Red - enriched with liver-specific genes; Orange - enriched with muscle-specific genes; Green - enriched with vulva-specific genes; Yellow - genes not annotated as liver-, muscle-, or vulva-specific.

43 Segmentation by ICA Image I → ICA filter bank with n filters → filter responses I_1, I_2, ..., I_n → clustering → segmented image C. Above is an unsupervised setting. Segmentation (i.e., classification in this context) can also be performed by a supervised method on the output feature images I_1, I_2, ..., I_n. (Figure: a texture image and its segmentation.) Jenssen and Eltoft, ICA filter bank for segmentation of textured images, 4th International Symposium on ICA and BSS, Nara, Japan, 2003.

44 Recap: PCA vs LDA vs ICA PCA: proper for dimension reduction. LDA: proper for pattern classification if the number of training samples in each class is large. ICA: proper for blind source separation, or for classification using ICs when the class ID of the training data is not available.

45 Multimodal Data Collection Brain function (spatiotemporal, task-related): EEG and fMRI across tasks 1-N. Brain structure (spatial): T1/T2, DTI. Genetic data (spatial/chromosomal): SNP, gene expression. Covariates (age, etc.) and other measures.

46 Joint ICA A variation on ICA that looks for components that appear jointly across features or modalities: [X_Modality1, X_Modality2] = A [S_Modality1, S_Modality2].
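
A structural sketch of joint ICA under the stated model, assuming subject-by-feature matrices for two modalities; the sizes and simulated data are placeholders, so only the bookkeeping (feature-wise concatenation, shared mixing matrix) is meaningful:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_subjects, n_feat1, n_feat2, k = 40, 5000, 600, 8   # hypothetical sizes

# Simulate the jICA model: non-Gaussian joint sources with a single
# (shared) subject-by-component mixing matrix A.
A_true = rng.standard_normal((n_subjects, k))
S_true = rng.laplace(size=(k, n_feat1 + n_feat2))
X1 = A_true @ S_true[:, :n_feat1]                    # placeholder modality 1
X2 = A_true @ S_true[:, n_feat1:]                    # placeholder modality 2

# Joint ICA: concatenate along the feature axis and unmix once,
# so both modalities share the same mixing matrix.
X_joint = np.hstack([X1, X2])
ica = FastICA(n_components=k, random_state=0)
S_joint = ica.fit_transform(X_joint.T).T             # components x (feat1 + feat2)
S1_hat, S2_hat = S_joint[:, :n_feat1], S_joint[:, n_feat1:]
A_hat = ica.mixing_                                  # shared subject loadings (subjects x k)
```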

47 jica Results: Multitask fmri One component Significant at p<0.01 Calhoun VD, Adali T, Kiehl KA, Astur RS, Pekar JJ, Pearlson GD. (2006): A Method for Multitask fmri Data Fusion Applied to Schizophrenia. Hum.Brain Map. 27(7):

48 jica Results: Joint Histograms Calhoun VD, Adali T, Kiehl KA, Astur RS, Pekar JJ, Pearlson GD. (2006): A Method for Multitask fmri Data Fusion Applied to Schizophrenia. Hum.Brain Map. 27(7):

49 Parallel ICA: Two Goals Data 1 (fMRI) X1 and Data 2 (SNP) X2. Goal 1, identify hidden features: unmix X1 with W1 (loadings A1, sources S1) and X2 with W2 (loadings A2, sources S2). Goal 2, identify linked features: maximize {H(Y1) + H(Y2)} (Infomax), subject to arg max over {W1, W2, ŝ1, ŝ2} of g(A1, A2) = Correlation(A1, A2) = Cov(a_1i, a_2j)^2 / (Var(a_1i) Var(a_2j)). J. Liu, O. Demirci, and V. D. Calhoun, "A Parallel Independent Component Analysis Approach to Investigate Genomic Influence on Brain Function," IEEE Signal Proc. Letters, vol. 15.

50 Example 2: Default Mode Network & Multiple Genetic Factors fMRI component t-map and mixing coefficients (ρ = 0.31) linking the fMRI component with the SNPs component (rs6136 and other SNP identifiers). J. Liu, G. D. Pearlson, A. Windemuth, G. Ruano, N. I. Perrone-Bizzozero, and V. D. Calhoun, "Combining fMRI and SNP data to investigate connections between brain function and genetics using parallel ICA," Hum. Brain Mapp., vol. 30.

51 Partial least squares (PLS) Understanding relationships between process & response variables

52 PLS fundamentals (Figure: projections onto planes in X space (X_1, X_2, X_3) and Y space (Y_1, Y_2, Y_3).)

53 PLS is the multivariate version of regression PLS uses two different PCA models, one for the X's and one for the Y's, and finds the links between the two. Mathematically, the difference is as follows: in PCA, we are maximizing the variance that is explained by the model; in PLS, we are maximizing the covariance.

54 Conceptually, how does PLS work? A step-wise process: PLS finds a set of orthogonal components that maximize the level of explanation of both X and Y and provide a predictive equation for Y in terms of the X's. This is done by fitting a set of components to X (as in PCA), similarly fitting a set of components to Y, and reconciling the two sets of components so as to maximize the explanation of X and Y. PLS finds components of X that are also relevant to Y. Latent vectors are components that simultaneously decompose X and Y; latent vectors explain the covariance between X and Y.
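
A minimal scikit-learn sketch of PLS regression; the simulated X and Y are placeholders, and the number of components is an arbitrary choice:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))                       # process variables
B = rng.standard_normal((3, 2))
Y = X[:, :3] @ B + 0.1 * rng.standard_normal((100, 2))   # responses driven by 3 X-variables

pls = PLSRegression(n_components=3)
pls.fit(X, Y)
T = pls.x_scores_          # latent vectors: components of X that are relevant to Y
Y_pred = pls.predict(X)    # the predictive equation for Y in terms of X
```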

55 PLS: the inner relation The way PLS works, visually, is by tweaking the two PCA models (X and Y) until their covariance is optimised. It is this tweaking that led to the name partial least-squares. All three are solved simultaneously via numerical methods.

56 Canonical correlation analysis (CCA) CCA was first developed by H. Hotelling (Relations between two sets of variates, Biometrika, 28). CCA measures the linear relationship between two multidimensional variables. CCA finds two bases, one for each variable, that are optimal with respect to correlations. Applications in economics, medical studies, bioinformatics, and other areas.

57 Canonical correlation analysis (CCA) Two multidimensional variables Two different measurement on the same set of objects Web images and associated text Protein (or gene) sequences and related literature (text) Protein sequence and corresponding gene expression In classification: feature vector and class label Two measurements on the same object are likely to be correlated. May not be obvious on the original measurements. Find the maximum correlation on transformed space.

58 Canonical correlation analysis (CCA) (Diagram: measurements X and Y are transformed by W_X and W_Y into W_X^T X and W_Y^T Y; the correlation is computed between the transformed data.)

59 Problem definition Find two sets of basis vectors, one for x and the other for y, such that the correlations between the projections of the variables onto these basis vectors are maximized. Given x and y, compute two basis vectors w_x and w_y and project: x → <w_x, x> and y → <w_y, y>.

60 Problem definition Compute the two basis vectors so that the correlations of the projections onto these vectors are maximized.
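
Equivalently, CCA maximizes ρ = corr(w_x^T x, w_y^T y). A scikit-learn sketch on simulated data with a shared latent structure; the data and dimensions are illustrative only:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2))                                  # shared structure
X = latent @ rng.standard_normal((2, 10)) + 0.5 * rng.standard_normal((200, 10))
Y = latent @ rng.standard_normal((2, 6)) + 0.5 * rng.standard_normal((200, 6))

cca = CCA(n_components=2)
Xc, Yc = cca.fit_transform(X, Y)       # projections onto the basis vectors w_x, w_y
canonical_corr = [np.corrcoef(Xc[:, i], Yc[:, i])[0, 1] for i in range(2)]
print(np.round(canonical_corr, 2))
```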

61 Least squares regression Estimate the parameters β based on a set of training data (x_1, y_1), ..., (x_N, y_N). Minimize the residual sum of squares RSS(β) = Σ_{i=1}^{N} ( y_i - β_0 - Σ_{j=1}^{p} x_ij β_j )^2. The closed-form solution β̂ = (X^T X)^{-1} X^T y has poor numeric properties!
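
A numpy sketch contrasting the normal-equations solution with a numerically preferred least-squares solver; the intercept is omitted and the data are simulated for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ beta_true + 0.1 * rng.standard_normal(50)

# Normal equations beta = (X^T X)^{-1} X^T y: simple, but numerically fragile
# when X^T X is ill-conditioned.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# SVD-based solver: the numerically preferred route.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```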

62 Subset selection We want to eliminate unnecessary features. Use additional penalties to reduce coefficients: coefficient shrinkage. Ridge regression: minimize least squares subject to Σ_{j=1}^{p} β_j^2 ≤ s. The lasso: minimize least squares subject to Σ_{j=1}^{p} |β_j| ≤ s. Principal components regression: regress on M < p principal components of X. Partial least squares: regress on M < p directions of X weighted by y.
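
A scikit-learn sketch of the two shrinkage penalties in their equivalent Lagrangian (penalty-weight) form; the data and alpha values are illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=100, n_features=20, noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks coefficients toward zero
lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty: shrinks and sets some exactly to zero

# Count coefficients driven (essentially) to zero by each penalty
print((abs(ridge.coef_) < 1e-8).sum(), (abs(lasso.coef_) < 1e-8).sum())
```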

63 Prostate Cancer Data Example

64 Error comparison

65 Shrinkage methods: Ridge regression

66 Shrinkage methods: Lasso regression

67 A unifying view We can view all the linear regression techniques under a common framework: β̂ = arg min_β Σ_{i=1}^{N} ( y_i - β_0 - Σ_{j=1}^{p} x_ij β_j )^2 + λ Σ_{j=1}^{p} |β_j|^q. λ includes bias, q indicates a prior distribution on β. λ = 0: least squares. λ > 0, q = 0: subset selection (counts the number of nonzero parameters). λ > 0, q = 1: the lasso. λ > 0, q = 2: ridge regression.

68 Family of shrinkage regression

69 Atrophy associated with delayed memory Spatial correlation Ridge regression
