Computer exercise 3: PCA, CCA and factors
UPPSALA UNIVERSITY
Department of Mathematics
Måns Thulin
Multivariate Methods, Spring 2011

In this computer exercise the following topics are studied:

- Principal component analysis
- Factor analysis
- Canonical correlation analysis

Principal component analysis

Principal components are linear combinations of random variables, given by the eigenvalues and eigenvectors of the covariance or correlation matrix of those random variables.

Eigenvalues and eigenvectors

Sometimes a covariance or correlation matrix is given directly in the problem at hand. Consider for instance the covariance matrix in Example 8.1 on page 434 of J&W:

    Σ = (  1  -2   0 )
        ( -2   5   0 )
        (  0   0   2 )

To find the principal components, we must find the eigenvalues and eigenvectors of Σ. These are found with the command eigen; see the following script, where the last line extracts the first principal component:

    Sig <- cbind(c(1,-2,0), c(-2,5,0), c(0,0,2))
    eigen(Sig)
    pc1 <- eigen(Sig)$vectors[,1]

Now let us verify that the vectors and scalars given by eigen really are the eigenvectors and corresponding eigenvalues of Σ. Make sure that you understand what the following commands do and why they yield the same result.

    Sig %*% eigen(Sig)$vectors[,1]
    eigen(Sig)$values[1] * eigen(Sig)$vectors[,1]

The vectors with the coefficients for the principal components should have length 1 and be orthogonal to each other. We can verify that the eigenvectors of Σ satisfy this in R:

    t(eigen(Sig)$vectors[,1]) %*% eigen(Sig)$vectors[,1]
    t(eigen(Sig)$vectors[,1]) %*% eigen(Sig)$vectors[,2]
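As a small aside (an addition of ours, not part of the original handout): the eigenvalues also tell us how much of the total variance each principal component accounts for, since the proportion for component i is λi divided by the sum of all eigenvalues.

    # Proportion of total variance explained by each principal component
    ev <- eigen(Sig)$values
    ev / sum(ev)            # per-component proportions
    cumsum(ev) / sum(ev)    # cumulative proportions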
Principal components given a data set

If an entire multivariate data set is given, the key command for PCA in R is princomp. We will study a data set of air pollution in US cities, given in the file usair.r. Seven variables were recorded for 41 cities:

- SO2: sulphur dioxide content of air, in micrograms per cubic meter
- Temp: average annual temperature, in °F
- Manuf: number of manufacturing enterprises employing 20 or more workers
- Pop: population size (1970 census), in thousands
- Wind: average annual wind speed, in miles per hour
- Precip: average annual precipitation, in inches
- Days: average number of days with precipitation per year

The example is taken from a book by Everitt.[1] We will for a moment disregard the SO2 variable and study the remaining six: two of these relate to human ecology (Pop, Manuf) and four to climate (Temp, Wind, Precip, Days). Everitt suggests looking at -1 times the temperatures, since then all six variables are such that high values represent a less attractive environment.

[1] Everitt, B.S. (2005), An R and S-PLUS Companion to Multivariate Analysis, Springer.

We start by studying all combinations of scatter plots and computing correlations:

    source("usair.r")
    usair
    pairs(usair[,-1])
    cor(usair[,-1])

Are there any outliers? For simplicity, we continue the analysis with the complete data set, but at this stage of a data analysis it is good practice to keep an eye out for possible outliers that may affect the results.

Since the variables are measured on different scales, it seems reasonable to base the analysis on the correlation matrix; otherwise the variables with large variances would dominate the principal components completely. The principal components are given by the eigenvectors of the correlation matrix:

    eigen(cor(usair[,-1]))

Alternatively, the command princomp can be used. This also gives us a convenient summary of the principal components:

    usairpc <- princomp(usair[,-1], cor=TRUE)
    summary(usairpc, loadings=TRUE)

How are the standard deviations listed for the principal components related to the eigenvalues of the correlation matrix?
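One way to investigate that question is a direct numerical comparison (a small sketch added here, not part of the original handout):

    # Squared component standard deviations versus the eigenvalues
    # of the correlation matrix
    usairpc$sdev^2
    eigen(cor(usair[,-1]))$values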
Everitt tries to interpret the first three principal components, and labels them "quality of life", "wet weather" and "climate type". Do you agree with his interpretation? To see how much of the variation in the data is described by the principal components, we look at a screeplot:

    screeplot(usairpc, type="l")

We can now study the data set by looking at a scatter plot of the first two principal components:

    plot(usairpc$scores[,1], usairpc$scores[,2],
         ylim=range(usairpc$scores[,1]), xlab="PC1", ylab="PC2", lwd=2)

It can also be of interest to plot the first principal component versus the third. In the above plot, we can use the command identify to see which point belongs to which city. An alternative is to plot the (abbreviated) city names directly in the plot:

    par(pty="s")
    plot(usairpc$scores[,1], usairpc$scores[,2],
         ylim=range(usairpc$scores[,1]), xlab="PC1", ylab="PC2", type="n")
    text(usairpc$scores[,1], usairpc$scores[,2],
         labels=abbreviate(row.names(usair)), cex=0.7, lwd=2)

One purpose for collecting the air pollution data was to see how the SO2 variable is related to the other variables. We can study its relation to the principal components in different ways; for instance with a bubble plot:

    plot(usairpc$scores[,1], usairpc$scores[,2],
         ylim=range(usairpc$scores[,1]), xlab="PC1", ylab="PC2", type="n")
    symbols(usairpc$scores[,1], usairpc$scores[,2], circles=sqrt(usair$SO2),
            inches=0.2, add=TRUE, bg="pink", fg="black")

Or with plots against the first three principal components:

    par(mfrow=c(1,3))
    plot(usairpc$scores[,1], usair$SO2, xlab="PC1")
    plot(usairpc$scores[,2], usair$SO2, xlab="PC2")
    plot(usairpc$scores[,3], usair$SO2, xlab="PC3")

What are your conclusions? Is SO2 related to any of the principal components?

Principal components and cluster analysis

In cluster analysis, studied in the next block of the course, the goal is to find clusters: sets of points that somehow belong together in the data set. Some methods for finding clusters require that the user states the number of clusters beforehand. If the data is high-dimensional, it may be hard to determine the number of clusters by visual means. In this situation, principal components sometimes come in handy. Let's look at the four-dimensional data in data2.dat.

    data2 <- read.table("data2.dat")
    pairs(data2)

The pairwise scatter plots seem to indicate that there are indeed clusters within the data, but it is hard to say how many there are. Perhaps we can find out by looking at the principal components.

    data2p <- princomp(data2)
    screeplot(data2p, type="l")
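To put numbers on what the screeplot shows, here is a brief sketch of ours (summary(data2p) reports the same figures):

    # Proportion and cumulative proportion of variance per component
    pv <- data2p$sdev^2 / sum(data2p$sdev^2)
    rbind(proportion = pv, cumulative = cumsum(pv))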
    plot(data2p$scores[,1], data2p$scores[,2])

Judging from the plot, how many clusters would you say there are in the data set? The screeplot tells us that the third and fourth principal components contain little information. However, in this particular case, including the third component in the analysis might change the conclusion regarding the number of clusters. Let's examine the clusters in three dimensions; pay extra attention to the rightmost cluster in the two-dimensional scatter plot above.

    library(rgl)
    plot3d(data2p$scores[,1], data2p$scores[,3], data2p$scores[,2], col=2)

Factor analysis

The file life.dat contains data on life expectancy in years by country, age, and sex. The data come from Keyfitz and Flieger (1971) and relate to life expectancies in the 1960s.

    life <- read.table("life.dat", header=TRUE)
    life
    cor(life[,2:9])

Here m0 is the life expectancy of a 0-year-old man, m25 is the life expectancy of a 25-year-old man, and so on. It is assumed that the variation in the data can be described by a few underlying factors. A good start is to think about what such factors could be.

A factor analysis can be performed either by the principal component method or by the maximum likelihood method. Let's start with the principal component method. As before, we can find the principal components using princomp:

    lifepc <- princomp(life[,2:9])
    summary(lifepc, loadings=TRUE)

Judging from the cumulative proportion of variance for the principal components, how many factors would you use in the factor model? The factor loadings are now obtained by scaling the principal components with their standard deviations (see page 490 of Johnson & Wichern). Here we choose m = 3 factors.

    L1pc <- lifepc$sdev[1]*lifepc$loadings[,1]
    L2pc <- lifepc$sdev[2]*lifepc$loadings[,2]
    L3pc <- lifepc$sdev[3]*lifepc$loadings[,3]

Try to interpret the factor loadings. Sometimes the interpretation can be aided by rotating the factors using an orthogonal matrix (recall that the factor model with the rotated loadings is just as valid as the first model obtained). Many analysts like to use the varimax criterion for choosing which orthogonal matrix to use. In R, we can use the function varimax to perform the rotation:

    Lpc <- cbind(L1pc, L2pc, L3pc)
    varimax(Lpc)
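Before interpreting the rotated loadings, a small sketch of our own (not part of the original exercise) that illustrates why the rotated model is just as valid: an orthogonal rotation leaves the communalities, the row sums of squared loadings, unchanged.

    # Communalities h_i^2 before and after the varimax rotation
    h2_unrotated <- rowSums(Lpc^2)
    Lrot <- varimax(Lpc)$loadings
    h2_rotated <- rowSums(unclass(Lrot)^2)
    cbind(h2_unrotated, h2_rotated)   # the two columns should agree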
Interpret the rotated factor loadings. Does your interpretation differ from that based on the unrotated loadings?

The maximum likelihood approach to estimation of the factor loadings is described on p. 495 in Johnson & Wichern. We do not go into details about the procedure, but we can try to use it as a black-box method, just to see if it yields the same solution as the principal component method. The function factanal gives the ML estimates of the factor loadings. It also performs the test for the number of common factors described in Johnson & Wichern. The test is motivated by asymptotic results, so it might not be appropriate for our data, where we only have n = 31 observations. Nevertheless, we use it here to see what the results are.

    lifefa1 <- factanal(life[,2:9], factors=1, method="mle")
    lifefa1
    lifefa2 <- factanal(life[,2:9], factors=2, method="mle")
    lifefa2
    lifefa3 <- factanal(life[,2:9], factors=3, method="mle")
    lifefa3

The loadings given by factanal are rotated using the varimax criterion by default. Interpret the loadings for m = 3 factors. Is your interpretation the same as those from the principal component analyses?

Canonical correlation analysis

Unlike principal component analysis and factor analysis, which deal with relationships within sets of variables, canonical correlation analysis deals with relationships between sets of variables. In this exercise, we will study the salespeople data from Problem 9.19 in Johnson & Wichern. The data are given in the table on page 536 and can be found in the file T9-12.DAT. We will study the canonical correlations between the first three variables (sales growth, sales profitability and new-account sales) and the last four variables (creativity test, mechanical reasoning test, abstract reasoning test and mathematics test).

    T912 <- read.table("T9-12.DAT")
    X1 <- T912[,1:3]
    X2 <- T912[,4:7]

The coefficients for the canonical variates can be found by solving the eigenvector equations (10-11) on page 545 of Johnson & Wichern. First, let us partition the covariance matrix:

    covmatrix <- cov(T912)
    S11 <- covmatrix[1:3,1:3]
    S22 <- covmatrix[4:7,4:7]
    S12 <- covmatrix[1:3,4:7]
    S21 <- covmatrix[4:7,1:3]
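A quick sanity check of ours on the partition: since the covariance matrix is symmetric, the off-diagonal blocks must be each other's transposes.

    # The off-diagonal blocks of a covariance matrix satisfy S21 = t(S12)
    all.equal(S21, t(S12))   # should be TRUE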
We can now solve the eigenvector equations:

    bigmatrix1 <- solve(S11) %*% S12 %*% solve(S22) %*% S21
    canon1 <- eigen(bigmatrix1)
    bigmatrix2 <- solve(S22) %*% S21 %*% solve(S11) %*% S12
    canon2 <- eigen(bigmatrix2)

Convince yourself that the non-standardized canonical variates and the (absolute values of the) canonical correlations now are given by

    canon1
    canon2
    sqrt(canon1$values)

The coefficient vectors obtained using eigen are standardized in the sense that they have length 1, but that is not what we are looking for here. The canonical variates should be chosen so that Var(U) = Var(V) = 1. Let's calculate the variance of U1 = a1'X(1):

    varu1 <- t(canon1$vectors[,1]) %*% S11 %*% canon1$vectors[,1]

We can now use this to standardize a1 so that Var(U1) = 1 (note that varu1 is a 1x1 matrix, so we convert it to a scalar with as.numeric before multiplying):

    aa <- 1/sqrt(as.numeric(varu1))
    a <- aa * canon1$vectors[,1]
    t(a) %*% S11 %*% a    # Variance after standardization

The other coefficient vectors can be standardized in the same way.

An alternative to the approach above is to use the cancor function to find the canonical correlations and variates. For the above data set, these quantities are found by writing

    can912 <- cancor(X1, X2)
    can912

The coefficient vectors given by cancor are not standardized in the right way either. Once again, we can standardize them to give linear combinations with variance 1:

    varu1c <- t(can912$xcoef[,1]) %*% S11 %*% can912$xcoef[,1]
    aa <- 1/sqrt(as.numeric(varu1c))
    ac <- aa * can912$xcoef[,1]
    t(ac) %*% S11 %*% ac

Finally, we can verify that the standardized vectors given by the two methods are the same (possibly up to sign):

    a
    ac
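To close, here is a small verification sketch of our own (the names b, U1 and V1 below are ours, not from the handout; it uses the objects a, canon1, canon2, S22, X1 and X2 defined above): the first canonical correlation should equal the sample correlation between the first pair of canonical variates.

    # Standardize the first coefficient vector for X2, in the same way as a above
    b <- canon2$vectors[,1] /
         sqrt(as.numeric(t(canon2$vectors[,1]) %*% S22 %*% canon2$vectors[,1]))
    # Form the first pair of canonical variates and correlate them
    U1 <- as.matrix(X1) %*% a
    V1 <- as.matrix(X2) %*% b
    cor(U1, V1)             # may differ in sign from the eigenvalue-based value
    sqrt(canon1$values[1])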