UPPSALA UNIVERSITY
Department of Mathematics
Måns Thulin
Multivariate Methods
Spring 2011

Computer exercise 3: PCA, CCA and factors

In this computer exercise the following topics are studied:

- Principal component analysis
- Factor analysis
- Canonical correlation analysis

Principal component analysis

Principal components are linear combinations of random variables, given by the eigenvalues and eigenvectors of the covariance or correlation matrix of the random variables.

Eigenvalues and eigenvectors

Sometimes a covariance or correlation matrix is given directly in the problem at hand. Consider for instance the covariance matrix in Example 8.1 on page 434 of J&W:

    Σ = [  1  -2   0 ]
        [ -2   5   0 ]
        [  0   0   2 ]

To find the principal components, we must find the eigenvalues and eigenvectors of Σ. These are found with the command eigen; see the following script, where the last line extracts the first principal component:

    Sig <- cbind(c(1,-2,0), c(-2,5,0), c(0,0,2))
    eigen(Sig)
    pc1 <- eigen(Sig)$vectors[,1]

Now let us verify that the vectors and scalars given by eigen really are the eigenvectors and corresponding eigenvalues of Σ. Make sure that you understand what the following commands do and why they yield the same result.

    Sig %*% eigen(Sig)$vectors[,1]
    eigen(Sig)$values[1] * eigen(Sig)$vectors[,1]

The vectors with the coefficients for the principal components should have length 1 and be orthogonal to each other. We can verify in R that the eigenvectors of Σ satisfy this:

    t(eigen(Sig)$vectors[,1]) %*% eigen(Sig)$vectors[,1]
    t(eigen(Sig)$vectors[,1]) %*% eigen(Sig)$vectors[,2]
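As a quick complement (a small sketch, not part of the original handout): since the total variance equals the trace of Σ, the proportion of the total variance explained by each principal component can be read off directly from the eigenvalues.

    ev <- eigen(Sig)$values
    ev/sum(ev)          # proportion of the total variance per component
    cumsum(ev)/sum(ev)  # cumulative proportion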

Principal components given a dataset

If an entire multivariate dataset is given, the key command for PCA in R is princomp. We will study a dataset of air pollution in US cities, given in the file usair.r. Seven variables were recorded for 41 cities:

- SO2: Sulphur dioxide content of air in micrograms per cubic meter
- Temp: Average annual temperature in °F
- Manuf: Number of manufacturing enterprises employing 20 or more workers
- Pop: Population size (1970 census) in thousands
- Wind: Average annual wind speed in miles per hour
- Precip: Average annual precipitation in inches
- Days: Average number of days with precipitation per year

The example is taken from a book by Everitt (Everitt, B.S. (2005), An R and S-PLUS Companion to Multivariate Analysis, Springer). We will for a moment disregard the SO2 variable and study the remaining six: two of these relate to human ecology (Pop, Manuf) and four to climate (Temp, Wind, Precip, Days). Everitt suggests looking at -1 times the temperatures, since then all six variables are such that high values represent a less attractive environment.

We start by studying all combinations of scatter plots and computing correlations:

    source("usair.r")
    usair
    pairs(usair[,-1])
    cor(usair[,-1])

Are there any outliers? For simplicity, we continue the analysis with the complete data set, but at this stage of a data analysis it is good practice to keep an eye out for possible outliers that may affect the analysis.

Since the variables are measured on different scales, it seems reasonable to base the analysis on the correlation matrix; otherwise the variables with large variances would completely dominate the principal components. The principal components are given by the eigenvectors of the correlation matrix:

    eigen(cor(usair[,-1]))

Alternatively, the command princomp can be used. This also gives us a nice summary of the principal components:

    usairpc <- princomp(usair[,-1], cor=TRUE)
    summary(usairpc, loadings=TRUE)

How are the standard deviations listed for the principal components related to the eigenvalues of the correlation matrix?

Everitt tries to interpret the first three principal components, and labels them "quality of life", "wet weather" and "climate type". Do you agree with his interpretation?

To see how much of the variation in the data is described by the principal components, we look at a screeplot:

    screeplot(usairpc, type="l")
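One way to check your answer to the question about the standard deviations and the eigenvalues (a small sketch, not part of the original exercise):

    usairpc$sdev^2                 # squared standard deviations of the components
    eigen(cor(usair[,-1]))$values  # eigenvalues of the correlation matrix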

We can now study the data set by looking at a scatter plot of the first two principal components:

    plot(usairpc$scores[,1], usairpc$scores[,2],
         ylim=range(usairpc$scores[,1]), xlab="PC1", ylab="PC2", lwd=2)

It can also be of interest to plot the first principal component versus the third.

In the above plot, we can use the command identify to see which point belongs to which city. An alternative is to plot the (abbreviated) city names directly in the plot:

    par(pty="s")
    plot(usairpc$scores[,1], usairpc$scores[,2],
         ylim=range(usairpc$scores[,1]), xlab="PC1", ylab="PC2", type="n")
    text(usairpc$scores[,1], usairpc$scores[,2],
         labels=abbreviate(row.names(usair)), cex=0.7, lwd=2)

One purpose for collecting the air pollution data was to see how the SO2 variable is related to the other variables. We can study its relation to the principal components in different ways; for instance with a bubble plot:

    plot(usairpc$scores[,1], usairpc$scores[,2],
         ylim=range(usairpc$scores[,1]), xlab="PC1", ylab="PC2", type="n")
    symbols(usairpc$scores[,1], usairpc$scores[,2], circles=sqrt(usair$SO2),
            inches=0.2, add=TRUE, bg="pink", fg="black")

Or with plots against the first three principal components:

    par(mfrow=c(1,3))
    plot(usairpc$scores[,1], usair$SO2, xlab="PC1")
    plot(usairpc$scores[,2], usair$SO2, xlab="PC2")
    plot(usairpc$scores[,3], usair$SO2, xlab="PC3")

What are your conclusions? Is SO2 related to any of the principal components?

Principal components and cluster analysis

In cluster analysis, studied in the next block of the course, the goal is to find clusters: sets of points that somehow belong together in the data set. Some methods for finding clusters require that the user states the number of clusters beforehand. If the data is high-dimensional, it may be hard to find the number of clusters by visual means. In this situation, principal components sometimes come in handy. Let's look at the four-dimensional data in data2.dat.

    data2 <- read.table("data2.dat")
    pairs(data2)

The pairwise scatter plots seem to indicate that there are indeed clusters within the data, but it is hard to say how many there are. Perhaps we can find out by looking at the principal components.

    data2p <- princomp(data2)
    screeplot(data2p, type="l")
    plot(data2p$scores[,1], data2p$scores[,2])

Judging from the plot, how many clusters would you say there are in the data set?
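The information in the screeplot can also be read off numerically (a small sketch, not part of the original handout):

    summary(data2p)  # includes the cumulative proportion of variance
    cumsum(data2p$sdev^2)/sum(data2p$sdev^2)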

The screeplot tells us that the third and fourth principal components contain little information. However, in this particular case, including the third component in the analysis might change the conclusion regarding the number of clusters. Let's examine the clusters in three dimensions; pay extra attention to the rightmost cluster in the two-dimensional scatter plot above.

    library(rgl)
    plot3d(data2p$scores[,1], data2p$scores[,3], data2p$scores[,2], col=2)

Factor analysis

The file life.dat contains data regarding life expectancy in years by country, age, and sex. The data comes from Keyfitz and Flieger (1971) and relates to life expectancies in the 1960s.

    life <- read.table("life.dat", header=TRUE)
    life
    cor(life[,2:9])

m0 is the life expectancy of a 0-year-old man, m25 is the life expectancy of a 25-year-old man, and so on. It is assumed that the variation in the data can be described by a few underlying factors. A good start could be to think about what such factors could be.

A factor analysis can be performed either by the principal component method or the maximum likelihood method. Let's start with the principal component method. As before, we can find the principal components using princomp:

    lifepc <- princomp(life[,2:9])
    summary(lifepc, loadings=TRUE)

Judging from the cumulative proportion of variance for the principal components, how many factors would you use in the factor model?

The factor loadings are now obtained by scaling the principal components with their standard deviations (see page 490 of Johnson & Wichern). Here we choose m = 3 factors.

    L1pc <- lifepc$sdev[1]*lifepc$loadings[,1]
    L2pc <- lifepc$sdev[2]*lifepc$loadings[,2]
    L3pc <- lifepc$sdev[3]*lifepc$loadings[,3]

Try to interpret the factor loadings. Sometimes the interpretation can be aided by rotating the factors using an orthogonal matrix (recall that the factor model with the rotated loadings is just as valid as the first model obtained). Many analysts like to use the varimax criterion for choosing which orthogonal matrix to use. In R, we can use the function varimax to perform the rotation:

    Lpc <- cbind(L1pc, L2pc, L3pc)
    varimax(Lpc)

Interpret the rotated factor loadings. Does your interpretation differ from the one based on the unrotated loadings?
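As a complement (a small sketch, not part of the original exercise), the communalities implied by the m = 3 solution, and hence rough estimates of the specific variances, can be computed from the loadings matrix. Note that princomp divides by n rather than n - 1, so the comparison with cov is only approximate.

    h2 <- rowSums(Lpc^2)               # communalities
    psi <- diag(cov(life[,2:9])) - h2  # (approximate) specific variances
    cbind(h2, psi)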

The maximum likelihood approach to estimation of the factor loadings is described on p. 495 in Johnson & Wichern. We do not go into details about the procedure, but we can try to use it as a black box method, just to see if it yields the same solution as the principal component method. The function factanal gives the ML estimates of the factor loadings. It also performs the test for the number of common factors described in Johnson & Wichern. The test is motivated by asymptotic results, so it might not be appropriate for our data, where we only have n = 31 observations. Nevertheless, we use it here to see what the results are.

    lifefa1 <- factanal(life[,2:9], factors=1, method="mle")
    lifefa1
    lifefa2 <- factanal(life[,2:9], factors=2, method="mle")
    lifefa2
    lifefa3 <- factanal(life[,2:9], factors=3, method="mle")
    lifefa3

The loadings given by factanal are rotated using the varimax criterion by default. Interpret the loadings for m = 3 factors. Is your interpretation the same as those from the principal component analyses?

Canonical correlation analysis

Unlike principal component analysis and factor analysis, which deal with relationships within sets of variables, canonical correlation analysis deals with relationships between sets of variables. In this exercise, we will study the salespeople data from problem 9.19 in Johnson & Wichern. The data is given in the table on page 536 and is found in the file T9-12.DAT. We will study the canonical correlations between the first three variables (sales growth, sales profitability and new account sales) and the last four variables (creativity test, mechanical reasoning test, abstract reasoning test and mathematics test).

    T912 <- read.table("T9-12.DAT")
    X1 <- T912[,1:3]
    X2 <- T912[,4:7]

The coefficients for the canonical variates can be found by solving the eigenvector equations (10-11) on page 545 of Johnson & Wichern. First, let us partition the covariance matrix:

    covmatrix <- cov(T912)
    S11 <- covmatrix[1:3,1:3]
    S22 <- covmatrix[4:7,4:7]
    S12 <- covmatrix[1:3,4:7]
    S21 <- covmatrix[4:7,1:3]

We can now solve the eigenvector equations:

    bigmatrix1 <- solve(S11) %*% S12 %*% solve(S22) %*% S21
    canon1 <- eigen(bigmatrix1)
    bigmatrix2 <- solve(S22) %*% S21 %*% solve(S11) %*% S12
    canon2 <- eigen(bigmatrix2)

Convince yourself that the non-standardized canonical variates and the (absolute values of the) canonical correlations now are given by

    canon1
    canon2
    sqrt(canon1$values)

The variates obtained using eigen are standardized in the sense that the coefficient vectors have length 1, but that is not what we are looking for here. The canonical variates should be chosen so that Var(U) = Var(V) = 1. Let's calculate the variance of U_1 = a_1' X^(1):

    varu1 <- t(canon1$vectors[,1]) %*% S11 %*% canon1$vectors[,1]

We can now use this to standardize a_1 so that Var(U_1) = 1:

    aa <- 1/sqrt(as.numeric(varu1))  # drop the 1x1 matrix to a scalar
    a <- aa*canon1$vectors[,1]
    t(a) %*% S11 %*% a  # variance after standardization

The other coefficient vectors can be standardized in the same way.

An alternative to the approach above is to use the cancor function to find the canonical correlations and variates. For the above data set, these quantities are found by writing

    can912 <- cancor(X1, X2)
    can912

The coefficient vectors given by cancor are not standardized in the right way either. Once again, we can standardize them to give linear combinations with variance 1:

    varu1c <- t(can912$xcoef[,1]) %*% S11 %*% can912$xcoef[,1]
    aa <- 1/sqrt(as.numeric(varu1c))
    ac <- aa*can912$xcoef[,1]
    t(ac) %*% S11 %*% ac

Finally, we can verify that the standardized vectors given by the two methods are the same:

    a
    ac
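As a final cross-check (a small sketch, not part of the original handout), the canonical correlations reported by cancor should agree with the square roots of the eigenvalues computed above, up to numerical error:

    can912$cor
    sqrt(canon1$values)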
