MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A


1 MultiDimensional Signal Processing Master Degree in Ingegneria delle Telecomunicazioni A.A. Pietro Guccione, PhD DEI - DIPARTIMENTO DI INGEGNERIA ELETTRICA E DELL'INFORMAZIONE, POLITECNICO DI BARI Pietro Guccione, Assistant Professor in Signal Processing (pietro.guccione@poliba.it)

2 Lecture 8 - Summary: Linear Dimensionality Reduction. Dimensionality Reduction; Principal Component Analysis, examples; Canonical Correlation Analysis, examples; Multiset Canonical Correlation Analysis, example; Summary.

3 Data Collection: high dimensionality More variables than observations (Hughes phenomenon): when the number of variables is too high compared to the number of samples, the algorithm may be unable to find a proper structure within the data that can be generalized to other datasets from the same experiment. This is known as the curse of dimensionality, or Hughes phenomenon, and it commonly occurs in many fields. Example: chemometrics: to determine concentrations of certain chemical compounds, calibration studies often need to analyze intensity measurements on a very large number (500-1,000 or more) of different spectral wavelengths using a small number of samples (a few dozen). [Figure: overfitting in classification]

4 Data Collection: high dimensionality The problem of high dimensionality also involves the estimation of parameters in hidden models (e.g. the number of coefficients in a regression problem) or of latent variables (the number of mixtures in a density estimation problem). [Figure: overfitting in regression] The problem of dimensionality depends on both the data and the algorithm. Possible solutions are changing the algorithm or reducing the dimensionality of the problem.

5 Dimensionality Reduction Two approaches are available to perform dimensionality reduction: feature selection, i.e. choosing a subset of all the features (the most informative ones; topic of a later lecture), and feature extraction, i.e. creating a set of new features by combining the existing ones (see the sketch below). In general, the optimal mapping y = f(x) is a non-linear function; however, feature extraction is commonly limited to linear transformations, y = Wx.
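A minimal MATLAB sketch (MATLAB is also used later in these slides) contrasting the two approaches on synthetic data; the selected columns and the matrix W are arbitrary here and only stand in for a transformation learned, for instance, by PCA:
% Feature selection vs. linear feature extraction (synthetic data).
X = randn(100, 10);                    % 100 samples, 10 original features (samples in rows)
% Feature selection: keep a subset of the existing features (columns chosen arbitrarily here).
idx  = [1 4 7];
Xsel = X(:, idx);                      % 100 x 3
% Feature extraction: build new features as linear combinations, y = W*x.
W    = randn(3, 10);                   % 3 x 10 transformation (random here; learned in PCA/CCA)
Xext = X * W';                         % 100 x 3: row n is (W * x_n)'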

6 From Dimensionality Reduction to PCA Principal Component Analysis is a standard technique for visualizing high-dimensional data and for data pre-processing. PCA reduces the dimensionality (the number of variables) of a data set while maintaining as much variance as possible. PCA finds the directions of maximum variation of the data and decorrelates the original variables by means of an orthogonal transformation. The resulting uncorrelated variables are called principal components. One can then either retain all the dimensions or reduce them.

7 PCA: mathematical details Principal Component Analysis is an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance lies on the first coordinate, the second greatest on the second coordinate, and so on. Organize the data in a matrix X [N x P], with N samples (repetitions of the experiment) and P variates (the features of the experiment). The full principal components decomposition of X can be written as T = XW ([N x P] = [N x P][P x P]) and X = TW^T, with X^T X = W Λ W^T and W^T W = W W^T = I. The principal components T (called scores) are obtained as a linear combination of the data and a set of weights (the loadings). The (column) weights W (the loadings) are the eigenvectors of the sample covariance matrix of the data; Λ is the diagonal matrix of the eigenvalues (sorted in decreasing order). Just to recall, the sample mean and covariance are μ_p = (1/N) sum_{n=1}^{N} x_{np} and q_{jk} = 1/(N-1) sum_{n=1}^{N} (x_{nj} - μ_j)(x_{nk} - μ_k).
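A short MATLAB sketch of the decomposition above on synthetic data (the data matrix and its size are hypothetical); it computes the loadings W as eigenvectors of the sample covariance and the scores as T = XW:
% PCA via eigendecomposition of the sample covariance matrix (synthetic data).
N = 200; P = 5;
X  = randn(N, P) * randn(P);                 % N samples, P correlated variates
Xc = bsxfun(@minus, X, mean(X, 1));          % remove the column means

S = (Xc' * Xc) / (N - 1);                    % sample covariance, P x P
[W, L] = eig(S);                             % eigenvectors (loadings) and eigenvalues
[lam, order] = sort(diag(L), 'descend');     % sort the eigenvalues in decreasing order
W = W(:, order);

T = Xc * W;                                  % scores: T = X*W
recErr = norm(Xc - T * W', 'fro');           % ~0, since W is orthogonal (W*W' = I)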

8 PCA: meaning In PCA, data are decomposed by projecting them onto a new space of the same dimension (T = XW). Samples are described in a multi-dimensional space. The loadings are the weights by which each standardized original variable should be multiplied to get the component score; the scores are the transformed variable values corresponding to each sample. The decomposition is done so as to maximize the variance (the energy) of the data in the first (few) dimensions.

9 PCA: dimensionality reduction Not all the principal components are equally important: their relative importance is given by the explained variance. A typical plot shows the cumulative variance explained [%] versus the number of components; if we want 99% of the variance explained, n_c = 6 components are enough in the example. In formulas, using MATLAB notation: Y = X - M = UW', so X = UW' + M, and the reduced reconstruction is X_hat = U(:,1:n_c)*W(:,1:n_c)' + M, where M is the matrix of column means.
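Continuing the sketch above (reusing X, Xc, W and the sorted eigenvalues lam), a possible way to pick n_c from the cumulative explained variance and to rebuild the reduced-rank approximation; the 99% target mirrors the example in the slide:
% Choose n_c from the cumulative explained variance and reconstruct X (continues the previous sketch).
cumVar = 100 * cumsum(lam) / sum(lam);            % cumulative explained variance [%]
nc     = find(cumVar >= 99, 1);                   % smallest number of components reaching 99%

M    = repmat(mean(X, 1), size(X, 1), 1);         % matrix of column means
U    = Xc * W;                                    % scores (same as T above)
Xhat = U(:, 1:nc) * W(:, 1:nc)' + M;              % X ~ U(:,1:nc)*W(:,1:nc)' + M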

10 PCA dimensionality reduction: example Example: hyperspectral image of the Earth acquired using a sensor with 103 bands in the visible and near-infrared range. A false-color representation of the city of Pavia, Italy, is obtained by simple superposition of all the bands.

11 PCA dimensionality reduction: example PCA on the dataset: [Figure: cumulative explained variance [%] versus number of components]

12 PCA dimensionality reduction: example [Figure: residual of the original data and its recomposition using the 1st component; the 1st and 2nd components; and the 1st, 2nd and 3rd components]

13 PCA: dimensionality reduction Another tool: the biplot. The biplot is a way to jointly represent information about the samples and the variables after dimensionality reduction. The samples are represented as dots in a plane (supposing just 2 components survive after dimensionality reduction); the variables are displayed as vectors (using the values of the loadings). Questions: Is there a particular combination of experimental conditions able to better describe a group of samples? Which samples are described by which variates/combination? [Figure: PCA of the samples AA-1, AAm-1, HEMA-1, HEMA-2, MAA-1, described by the experimental conditions monomer 1/2 ratio, crosslinker, crosslinker concentration, support, CaCl2 concentration, additive, additive concentration, with the explained variance [%] per component]

14 PCA: dimensionality reduction [Figure: biplot of the first two components, with the samples (AA-1, AAm-1, HEMA-1, HEMA-2, MAA-1) as points and the experimental conditions (additive, additive concentration, support, CaCl2 concentration, crosslinker, crosslinker concentration, monomer 1/2 ratio) as loading vectors] The vectors are the loadings, i.e. the variates, projected on the two components. The higher the score of a vector (loading) on a principal component, the higher the contribution of that variate to that component and the higher its influence on that group of samples. Similar scores of the samples mean similar behavior (in this case: similar experimental conditions).
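A sketch of how such a biplot can be produced in MATLAB, assuming the Statistics Toolbox functions zscore, pca and biplot are available; the data matrix is a random placeholder and the variable labels are taken from the figure above:
% Biplot of the first two principal components (Statistics Toolbox assumed).
X = randn(30, 7);                                          % placeholder data: 30 samples, 7 variates
varNames = {'Monomer ratio', 'Crosslinker', 'Crosslinker conc.', ...
            'Support', 'CaCl2 conc.', 'Additive', 'Additive conc.'};
[coeff, score] = pca(zscore(X));                           % loadings (coeff) and scores
figure;
biplot(coeff(:, 1:2), 'Scores', score(:, 1:2), 'VarLabels', varNames);
xlabel('First Component'); ylabel('Second Component');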

15 Example: X-ray Powder Diffraction /1 A set of X-ray Powder Diffraction (XPD) patterns is collected by changing an external, properly driven stimulus along time. The spectra are functions of the diffraction angle 2θ and of time. As the stimulus changes, the active and the silent parts of the structure behave differently. The corresponding XPD spectrum can be written as the superposition of three contributions: one coming from active atoms, one coming from active and silent atoms, and one coming from silent atoms only. The three terms (each a function of angle and time) have constraints among them, so a simple decomposition in principal components might fail. [Figure: spectra A(2θ, t) arranged in the data matrix X(n, p), n = 1, ..., N (time profile), p = 1, ..., P (~1200 angular samples)]

16 Example: X-ray Powder Diffraction /2 Decomposition of A(2θ, t) by using PCA: A(2θ, t) = R_1(2θ) f_1(t) + R_2(2θ) f_2(t) + R_3(2θ) f_3(t), where the R_i(2θ) are the three spectral contributions and the f_i(t) their time profiles. [Figure: data matrix X(n, p), n = 1, ..., N (time profile), p = 1, ..., P (~1200 angular samples)]

17 Example: X-ray Powder Diffraction /3 Normalize the data? Usually the column mean is removed, while dividing by the standard deviation depends on the energy level of each variate: μ_p = (1/N) sum_{n=1}^{N} x_{np}, σ_p^2 = 1/(N-1) sum_{n=1}^{N} (x_{np} - μ_p)^2. Z-scoring: remove the mean and divide by the standard deviation, Y = (X - 1μ') ./ (1σ'). Then apply the PCA: Y = UW'.
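A MATLAB sketch of this normalization step followed by PCA, on a generic data matrix X [N x P] (synthetic here, standing in for the XPD matrix); it mirrors the formulas above:
% Z-scoring followed by PCA (synthetic data).
N = 100; P = 50;
X     = randn(N, P);
mu    = mean(X, 1);                               % column means
sigma = std(X, 0, 1);                             % column standard deviations (1/(N-1) normalization)
Y     = bsxfun(@rdivide, bsxfun(@minus, X, mu), sigma);   % Y = (X - 1*mu') ./ (1*sigma')

[W, L] = eig(cov(Y));                             % PCA on the standardized data
[~, order] = sort(diag(L), 'descend');
W = W(:, order);                                  % loadings
U = Y * W;                                        % scores, so that Y = U*W'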

18 Example: X-ray Powder Diffraction /4 In chemometrics, PCA relates the multivariate response (the spectra = the loadings, or components) to the concentration of the analyte of interest (the scores in PCA). A way to interpret the matrix multiplication scores times loadings: X is approximated as a sum of rank-one terms, each given by one score profile (concentration) times one loading (spectrum); see the sketch below. Note that signs are not accounted for in the PCA decomposition!
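A small sketch of that interpretation, reusing the scores U and loadings W from the previous sketch (n_c is an arbitrary number of retained components): the product of the first n_c scores and loadings is the sum of rank-one terms, each one a score profile times a loading:
% Scores-times-loadings as a sum of rank-one contributions (continues the previous sketch).
nc   = 3;
Yhat = zeros(size(U, 1), size(W, 1));
for k = 1:nc
    Yhat = Yhat + U(:, k) * W(:, k)';             % k-th score profile times k-th loading (spectrum)
end
% Yhat is identical to U(:, 1:nc) * W(:, 1:nc)'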

19 Example: X-ray Powder Diffraction /5 [Figure: scores and corresponding loadings (spectra, intensity vs. 2θ [deg]) of the first three components. 1st component (score) / corresponding loading: active and silent contribution. 2nd component (score) / corresponding loading: active contribution only. 3rd component (score) / corresponding loading: silent contribution only.]

20 PCA: how many components to retain? /1 This is the main problem in PCA when an approximation of the original dimensionality is necessary: how many principal components to retain. Components are sorted from the highest variance to the lowest (PCA is a variance-maximization technique), so the first PCs, which cumulate enough variance, should be sufficient. The question is then shifted to the magnitude of the eigenvalues of the sample covariance matrix of X, Σ_X: how small can an eigenvalue be, while the corresponding principal component is still considered significant? First method, the scree plot: the sample eigenvalues from a PCA are ordered from largest to smallest. If the largest few sample eigenvalues dominate in magnitude, with the remaining sample eigenvalues very small, then the scree plot will exhibit an elbow corresponding to the division into large and small sample eigenvalues. The order number at which the elbow occurs can be used to determine how many principal components to retain.

21 PCA: how many components to retain? /2 Scree plot for Gaussian multivariate data. The scenarios may be very different, according to the relationship between the data size and the number of variates (or the rank of the covariance matrix). Example: Z is an [r x n] matrix with elements drawn from N(0,1); D is an [r x r] diagonal matrix, with D^2 = diag{20, 17, 12, 8, 3, 2, 2, ..., 2}; X = DZ is the multivariate dataset; Σ_X = n^{-1} XX^T is the sample covariance matrix. 1st case: n = 300, r = 30. 2nd case: n = 30, r = 30.
>> n=300; r=30;
>> Z = randn(r,n);                  % written as the transpose of the usual orientation (variates in rows)
>> diagvec = ones(r,1)*2;
>> diagvec(1:5) = [20 17 12 8 3];
>> D = sqrt(diag(diagvec));
>> X = D*Z;
>> lambda = eig(X*X'/n);
>> figure, plot(lambda(end:-1:1)); grid on;

22 PCA: how many components to retain? /3 1st case: n = 300, r = 30 (the scree plot shows an elbow); 2nd case: n = 30, r = 30 (no elbow).
>> n=30; r=30;
>> Z = randn(r,n);
>> diagvec = ones(r,1)*2;
>> diagvec(1:5) = [20 17 12 8 3];
>> D = sqrt(diag(diagvec));
>> X = D*Z;
>> lambda = eig(X*X'/n);
>> figure, plot(lambda(end:-1:1)); grid on;

23 PCA: how many components to retain? /4 Second method: the rank trace plot. It consists of the plot of the residual of the eigenvalues versus a properly transformed sequential number t, C_t = (sum_{i=t+1}^{r} λ_i) / (sum_{i=1}^{r} λ_i). [Figure: rank trace plots for the two cases above: an elbow is visible in the 1st case, no elbow in the 2nd]

24 The Canonical Correlation Analysis /1 CCA is a multivariate statistical technique used to analyze the relationship between two sets of variables. CCA seeks two sets of transformed variates that attain the maximum correlation across the two datasets, X = [X_1, ..., X_p]^T and Y = [Y_1, ..., Y_q]^T. CCA aims at finding the basis vectors (or coefficients) {b_x, b_y}_i such that: the correlations between the projections of the variables onto these bases are mutually maximized; each pair of basis vectors is uncorrelated with the preceding ones. The obtained projections are the canonical variables, i.e. the linear combinations of variables making up the j-th basis for X and Y: b_x^T X ([d x p][p x N]) and b_y^T Y ([d x q][q x N]), with d = min{rank(X), rank(Y)}.

25 The Canonical Correlation Analysis /2 Let x and y be the two vectors of random variables and {b_x, b_y} the basis vectors (giving the scores). Canonical correlation analysis can be defined as the problem of finding two sets of basis vectors, one for x and the other for y, such that the correlations between the projections of the variables onto these basis vectors are mutually maximized. Let us consider the first step: find the first pair of canonical variables, ξ_1 = b_x^T x and ω_1 = b_y^T y, and the maximum canonical correlation coefficient ρ_1. Problem: ρ_1 = max_{b_x, b_y} corr(b_x^T x, b_y^T y) = max_{b_x, b_y} (b_x^T Σ_XY b_y) / sqrt((b_x^T Σ_XX b_x)(b_y^T Σ_YY b_y)).

26 The Canonical Correlation Analysis /3 The subsequent canonical variables must be uncorrelated with those of the previous solutions: E[ξ_j ξ_k] = E[ω_j ω_k] = 0 for j ≠ k. The solution of the CCA problem can be obtained by solving the eigenvalue equations Σ_XX^{-1} Σ_XY Σ_YY^{-1} Σ_YX b_x = ρ^2 b_x and Σ_YY^{-1} Σ_YX Σ_XX^{-1} Σ_XY b_y = ρ^2 b_y, where Σ_XX and Σ_YY are the within-sets covariance matrices and Σ_XY, Σ_YX are the between-sets covariance matrices.
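A MATLAB sketch of CCA obtained directly from the eigenvalue equations above, on synthetic data sharing two latent sources; the Statistics Toolbox function canoncorr (which expects samples in rows) is a ready-made alternative:
% CCA of two synthetic datasets X [p x N] and Y [q x N] via the eigenvalue equations.
N = 500; p = 4; q = 3;
S = randn(2, N);                                   % shared latent sources
X = randn(p, 2) * S + 0.5 * randn(p, N);           % first set of variates
Y = randn(q, 2) * S + 0.5 * randn(q, N);           % second set of variates

Xc = X - mean(X, 2) * ones(1, N);                  % remove the means
Yc = Y - mean(Y, 2) * ones(1, N);
Cxx = Xc * Xc' / (N - 1);  Cyy = Yc * Yc' / (N - 1);   % within-sets covariances
Cxy = Xc * Yc' / (N - 1);  Cyx = Cxy';                 % between-sets covariances

[Bx, D] = eig((Cxx \ Cxy) * (Cyy \ Cyx));          % eigenvalues are the squared canonical correlations
[rho2, order] = sort(real(diag(D)), 'descend');
bx  = real(Bx(:, order(1)));                       % first canonical basis vector for X
by  = Cyy \ (Cyx * bx);                            % corresponding basis vector for Y (up to scale)
rho = sqrt(rho2(1));                               % first canonical correlation coefficient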

27 The Canonical Correlation Analysis /4 CCA is optimal in solving, in the least-squares sense, the following problem: with ξ = GX (G of size [d x p]) and ω = HY (H of size [d x q]), model HY = ν + GX + E and find G, H and ν that minimize the expected squared error, subject to H Σ_YY H^T = I_t. The solution is given by H^(t) = V_t^T Σ_YY^{-1/2} and G^(t) = V_t^T Σ_YY^{-1/2} Σ_YX Σ_XX^{-1}, where V_t = (v_1, ..., v_t) collects the first t eigenvectors of Σ_YY^{-1/2} Σ_YX Σ_XX^{-1} Σ_XY Σ_YY^{-1/2}, with t ≤ d = min(p, q) and the pairs (ξ_j, ω_j) being the canonical variate scores. The correlation, ρ_j, between ξ_j and ω_j is called the canonical correlation coefficient associated with the j-th pair of canonical variates, j = 1, 2, ..., t.

28 The Canonical Correlation Analysis /5 Example: we take a subset of a public catalogue of a large number of astronomical objects (from Izenman, Modern Multivariate Statistical Techniques, Springer, 2008): the COMBO-17 dataset. The brightness of each object in 17 passbands, the magnitude, the redshift and other variables have been arranged in X (23 variables) and Y (6 variables), according to some criterion.

29 The Canonical Correlation Analysis /6 Astronomers want to know whether groups of absolute magnitudes are correlated with each other.

30 The Canonical Correlation Analysis /7 Only one or two of the projected (canonical) correlations are large; the others are very small.

31 The Multiset Canonical Correlation Analysis /1 Given several vectors of random variables, M-CCA performs a sequence of deflationary linear transformations of the original sets that are solutions of a constrained optimization problem. The s-th stage set of canonical variables can be selected to maximize or minimize a particular function of its correlation matrix (for instance the sum of its entries or its largest eigenvalue), subject to certain restrictions.

32 M-CCA: operative example /1 General principle of linear BSS: blind source separation consists in estimating some underlying source signals s_1, ..., s_N from the mixtures x_1, ..., x_N, produced by an unknown mixing process A (see the sketch below). Functional Magnetic Resonance Imaging (fMRI) measures brain activity by detecting changes in blood flow and hence in neuronal activity. The goal of fMRI data analysis is to detect correlations between brain activation areas and a task paradigm.
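A minimal sketch of the linear mixing model behind BSS, with two synthetic sources and a random mixing matrix A (both of which are unknown in the real problem):
% Linear mixing model x = A*s (synthetic sources, unknown in practice).
N = 1000;                                           % number of time samples
S = [sin(0.02 * (1:N)); sign(sin(0.07 * (1:N)))];   % two source signals (rows)
A = randn(3, 2);                                    % unknown mixing matrix: 3 observations, 2 sources
X = A * S;                                          % observed mixtures; BSS estimates S (and A) from X only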

33 M-CCA: operative example /2 Data-driven methods make no assumptions on the response to recover. [Figure: the decompositions of subject i and subject j each yield components C_1, C_2, C_3, ..., C_n, leading to a source matching problem across subjects]

34 M-CCA: operative example /3 Task: visual N-Back. 0-Back condition: identify the number currently seen. 2-Back condition: recall the number seen two stimuli before. [Figure: sequence of stimuli over time with the correct responses for the 0-Back and 2-Back conditions]

35 M-CCA: operative example /4 Stimulus paradigm: task duration 4 min, block duration 30 sec, tasks 0-Back / 2-Back alternating in time (0B, 2B, 0B, 2B, ...). The cyclic nature of the reference temporal task paradigm may be exploited to further populate the dataset.

36 M-CCA: operative example /5 Processing pipeline: for each subject (Subj 1, ..., Subj M), the [T, V] data (T time points, V voxels) are reshaped and reduced by spatial PCA to [K, V]; M-CCA is then applied across subjects for spatial source extraction and source unmixing, followed by time-course extraction, yielding K spatial maps and K temporal trends for each subject.
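A rough MATLAB sketch of this pipeline on synthetic data (not the authors' exact implementation): per-subject spatial PCA to K components, followed by an M-CCA stage written here in one common MAXVAR-style generalized-eigenvalue form; pca is assumed available from the Statistics Toolbox, and the sizes M, T, V, K are placeholders:
% Per-subject spatial PCA followed by a MAXVAR-style M-CCA stage (sketch, synthetic data).
M = 3; T = 120; V = 500; K = 5;
data = arrayfun(@(m) randn(T, V), 1:M, 'UniformOutput', false);   % stand-ins for [T x V] fMRI data

Y = cell(1, M);
for m = 1:M
    [~, score] = pca(data{m}');               % spatial PCA: voxels as observations, time as variables
    Y{m} = score(:, 1:K)';                    % [K x V] reduced dataset for subject m
end

Z = cat(1, Y{:});                             % stacked reduced data, [M*K x V]
C = Z * Z' / (V - 1);                         % full covariance of the stacked data
D = zeros(size(C));                           % block-diagonal, within-subject part
for m = 1:M
    idx = (m - 1) * K + (1:K);
    D(idx, idx) = C(idx, idx);
end
[Wg, L] = eig(C, D);                          % multiset CCA as a generalized eigenvalue problem
[~, imax] = max(real(diag(L)));
w = real(Wg(:, imax));                        % stacked canonical vectors; w((m-1)*K+(1:K)) applies to Y{m}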

37 M-CCA: operative example /6 [Figure: most significant estimated sources (spatial maps and time courses in seconds). The working memory system involves the prefrontal cortex, parietal cortex, anterior cingulate and basal ganglia.]

38 M-CCA: operative example /7 [Figure: mean sources of the controls vs. mean sources of the patients]

39 M-CCA: operative example /8 [Figure: controls, patients, and difference maps]

40 M-CCA: operative example /9 t-score: selected features, with thresholds η_{t,inf} = -2.45 and η_{t,sup} = 2.47. Fisher score: selected features, with threshold η_w.

41 Component Analysis: summary When the number of variates is too high, a reduction can be useful. Reduction based on linear decomposition rests on the hypothesis that many of the variates are correlated among themselves and that the variates can be represented in a new space, where a reduced number of components is sufficient to represent the data. PCA is the simplest component decomposition technique; the choice of the number of components to retain is a side problem. Related decompositions can be applied to groups of datasets: Canonical Correlation Analysis, on a group of two datasets, and Multiset Canonical Correlation Analysis, which is a generalization of CCA. All of these rely on the real possibility of identifying a linear decomposition of sources starting from the variates (homogeneous variates, i.e. variates that come from the same origin and the same physical quantity, is a strong hypothesis).
