Ordination & PCA


Introduction to Ordination: purpose and types; Shepard diagrams. Principal Components Analysis (PCA): properties; computing eigenvalues; computing principal components; biplots; covariance vs. correlation; types of PCAs; meaningful components; misuses of PCA; PCA session / software tutorial.

Ordination

Ordination (from the Latin ordinatio and the German Ordnung) is the arrangement of units in some order (Goodall 1954). It consists of plotting object-points along an axis representing an ordered relationship, or of forming a scatter diagram with two or more axes. The term "ordination" seems to have originated in the ecological literature and is generally not used by statisticians.

Biologists are often interested in characterizing trends of variation in the objects with respect to all descriptors, not just a few. The ordination approach permits the construction of a multidimensional space in which each axis represents a descriptor in the study. This multidimensional space is then reduced to two or three dimensions for graphical interpretation and communication, permitting the examination of relationships among objects.

Ordination in reduced space is often referred to as factor analysis (in non-biological disciplines), since it is based on the extraction of the eigenvectors, or factors, of the association matrix. In actuality, there is a fundamental difference between FA and the other ordination procedures (treated later). The domains of application of the techniques we will discuss are covered in the following table:

Method                                      Distance preserved     Variables
PCA  (Principal Components Analysis)        Euclidean distance     Quantitative data, linear relationships, beware the double-zero
PCO  (Principal Coordinates Analysis)       Any distance measure   Quantitative, semiquantitative, qualitative, or mixed
NMDS (Nonmetric Multidimensional Scaling)   Any distance measure   Quantitative, semiquantitative, qualitative, or mixed
CA   (Correspondence Analysis)              Chi-square distance    Non-negative, homogeneous quantitative data, or binary data
FA   (Factor Analysis)                      Euclidean distance     Quantitative data, linear relationships, beware the double-zero

Reduced Space

If the goal of ordination is to reduce the dimensionality of a data set and represent the result in, say, d = 2 dimensions, an obvious question is: to what extent does the reduced space preserve the distance relationships among objects? To answer this, compute the distances between all pairs of objects, both in the multidimensional space and in the reduced space, and plot the resulting pairs of values in a scatter diagram. When the projection in reduced space accounts for a high fraction of the variance, the two spaces are similar. This plot is called a Shepard diagram.

Shepard Diagram

The Shepard diagram (Shepard 1962) can be used to estimate the representativeness of ordinations obtained using any reduced-space ordination method. In PCA, the distances among objects, in both the multidimensional space and the reduced space, are calculated as Euclidean distances; the matrix F of principal components gives the coordinates of the objects in reduced space. In PCO and NMDS, Euclidean distances among the objects in reduced space are compared to the distances D_hi of the matrix D used as the basis for computing the ordination. CA uses chi-square distances on the abscissa.

[Figure: distances in reduced space (d_hi) plotted against distances in multidimensional space (D_hi), with a 45-degree reference line.] Points falling near the 45-degree line: the projection in reduced space accounts for a high fraction of the variance, and the relative positions of the objects are similar in the two spaces. Points below the line but forming a tight band: the projection accounts for a small fraction of the variance, but the relative positions of the objects are similar. A diffuse scatter: the projection accounts for a small fraction of the variance, and the relative positions of the objects differ in the two spaces.

Ordination vs. Classification

Ordination and classification are often used as complements to each other in the evaluation of EEB-related questions. With regard to multivariate data, they both (1) show relationships, (2) reduce noise, (3) identify outliers, and (4) summarize redundancy. However, they have slightly different applications. Clustering investigates pairwise distances among objects and often produces a hierarchy of relatedness. Ordination considers the variability of the whole association matrix and emphasizes gradients and relationships. Unlike direct gradient analysis, ordination and classification procedures rely solely on object-descriptor matrices; environmental interpretations are made post hoc as a separate step (in most cases).
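A Shepard diagram is easy to construct by hand. The R sketch below compares pairwise Euclidean distances in the full space with those in a 2-D PCA projection; the data are simulated purely for illustration (nothing here comes from the lecture's own example).

## Shepard diagram sketch: full-space vs. reduced-space distances
set.seed(1)
Y  <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5)  # 30 objects, 5 descriptors
pc <- prcomp(Y, center = TRUE)

D.full    <- as.vector(dist(Y))            # distances in multidimensional space
D.reduced <- as.vector(dist(pc$x[, 1:2]))  # distances in 2-D reduced space

plot(D.full, D.reduced,
     xlab = "Distance in multidimensional space (D_hi)",
     ylab = "Distance in reduced space (d_hi)")
abline(0, 1)  # 45-degree reference line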

Ordination in EEB

Mike Palmer at Oklahoma State University (the "other OSU") maintains an ordination web page that is an excellent resource. There, all of the ordination methods are explained and vocabulary is defined, with references, links to other resources and software, a listserv, etc.

History of Ordination in EEB

- Pearson develops PCA as a regression technique
- Spearman applies factor analysis to psychology
- Ramensky uses an informal ordination technique and the term "Ordnung" in ecology
- Hotelling develops PCA for understanding the correlation matrix
- Curtis and McIntosh employ the "continuum index" approach
- Williams uses Correspondence Analysis
- Goodall uses the term "ordination" for PCA
- Bray-Curtis (polar) ordination
- Kruskal develops NMDS
- 1970's: Whittaker develops the theoretical foundations of gradient analysis
- Hill revives Correspondence Analysis
- Canonical Correlation is introduced to ecology
- Fasham and Prentice use NMDS
- DCA is introduced by Hill and Gauch
- Gauch's "Multivariate Analysis in Community Ecology"
- CCA is introduced by ter Braak
- Fuzzy set ordination is introduced by Roberts
- ter Braak and Prentice's "Theory of Gradient Analysis"

Principal Component Analysis (PCA)

In a multinormal distribution, the first principal axis is the line that passes through the greatest dimension of the concentration ellipsoid describing the distribution. In the same way, the following principal axes (orthogonal to one another, i.e., at right angles, and successively shorter) pass through the following greatest dimensions of the p-dimensional ellipsoid. A maximum of p principal axes may be derived from a data table containing p variables.

Principal Axes

The principal axes of a dispersion matrix S are found by solving

    (S − λk I) uk = 0

whose characteristic equation,

    |S − λk I| = 0,

is used to compute the eigenvalues λk. The eigenvectors uk associated with the λk are found by putting the different λk values in turn into the first equation. These eigenvectors are the principal axes of the dispersion matrix S. The eigenvectors are normalized (scaled to unit length) before computing the principal components, which give the coordinates of the objects on the successive principal axes.

Vocabulary

Major axis: axis in the direction of maximum variance of a scatter of points. First principal axis: line passing through the greatest dimension of the ellipsoid; the major axis of the ellipsoid. Principal components: new variates specified by a rigid rotation of the original system of coordinates; they give the positions of the objects in the new system of coordinates. Principal component axes (aka principal axes): the system of axes resulting from the rotation just described.

Principal Component Analysis (PCA)

PCA was first described by Hotelling (1933) and more clearly articulated in a seminal paper by Rao (1964). PCA is a powerful technique in EEB because of its properties:

1. Since any dispersion matrix S is symmetric, its principal axes uk are orthogonal to one another. They correspond to linearly independent directions in the concentration ellipsoid of the distribution of objects.
2. The eigenvalues λk of a dispersion matrix S give the amount of variance corresponding to the successive principal axes.
3. Because of (1) and (2), PCA is usually capable of summarizing a dispersion matrix containing many descriptors in just 2 or 3 dimensions.

Principal Component Analysis (PCA)

Let's develop a simple numerical example involving 5 objects and 2 quantitative descriptors. NB: in practice, PCA would never be used with 2 descriptors, because one could simply draw a bivariate scatter plot.

Simple graphical interpretation: (a) the 5 objects are plotted with respect to the 2 descriptors, y1 and y2; (b) after centering the data, the objects are plotted with respect to their means (dashed lines); (c) the objects are plotted with respect to principal axes I and II, which are centered; (d) the two systems of axes, (b) and (c), are superimposed after a rotation of 26°34'.

Computing Eigenvectors

The dispersion (covariance) matrix S can be computed directly by multiplying the matrix of centered data with its transpose:

    S = [1/(n − 1)] [y − ȳ]' [y − ȳ] = | 8.2  1.6 |
                                       | 1.6  5.8 |

The corresponding characteristic equation is:

    |S − λk I| = | 8.2 − λk   1.6       | = 0
                 | 1.6        5.8 − λk  |

Solving the characteristic polynomial, the eigenvalues are λ1 = 9 and λ2 = 5. The total variance remains the same, but it is partitioned in a different way: the sum of the variances on the main diagonal of matrix S is 8.2 + 5.8 = 14, while the sum of the eigenvalues is 9 + 5 = 14. λ1 = 9 accounts for 64.3% of the variance and λ2 makes up the difference (35.7%). There are always as many eigenvalues as there are descriptors, and the successive eigenvalues account for progressively smaller fractions of the variance.

Now, introducing the λk in turn into the matrix equation (S − λk I) uk = 0 provides the eigenvectors associated with the eigenvalues. Once these vectors have been normalized (i.e., scaled to unit length, u'u = 1), they become the columns of matrix U:

    U = | 0.8944  −0.4472 |
        | 0.4472   0.8944 |

One can easily verify the orthogonality of the eigenvectors: u1'u2 = 0. NB: arccos(0.8944) = 26°34', the angle of rotation!

Computing Principal Components

The elements of the eigenvectors are also weights, or loadings, of the original descriptors in the linear combination of descriptors from which the principal components are computed. The principal components give the positions of the objects with respect to the new system of principal axes. The positions of all objects are given by the matrix F of the transformed variables, called the matrix of component scores:

    F = [y − ȳ] U

where U is the matrix of eigenvectors and [y − ȳ] is the matrix of centered observations.
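The whole example can be checked numerically. In the R sketch below, the raw data matrix Y is an assumption: the slides do not reproduce it, so the values were chosen to be consistent with the S matrix, eigenvalues, and rotation angle quoted above.

## Worked example reconstructed in R (Y is assumed, not from the slides)
Y <- matrix(c(2, 1,
              3, 4,
              5, 0,
              7, 6,
              9, 2), ncol = 2, byrow = TRUE)

Yc <- scale(Y, center = TRUE, scale = FALSE)  # centered data [y - ybar]
S  <- t(Yc) %*% Yc / (nrow(Y) - 1)            # dispersion matrix: 8.2 1.6 / 1.6 5.8

eig <- eigen(S)
eig$values                     # 9 and 5
U <- eig$vectors               # normalized eigenvectors (signs are arbitrary)
acos(abs(U[1, 1])) * 180 / pi  # 26.57 degrees, i.e., 26°34'

F.scores <- Yc %*% U           # matrix of component scores F = [y - ybar] U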

Computing Principal Components

NB: the scores are centered in this way because U multiplies the centered matrix; this would not be the case if U had been multiplied by the raw Y, as in some special forms of PCA (i.e., non-centered PCA). For our numerical example, F = [y − ȳ] U gives the component scores of the five objects.

Since the two columns of the matrix of component scores are the coordinates of the five objects with respect to the principal axes, they can be used to plot the objects with respect to principal axes I and II. PCA has simply rotated the axes by 26°34' in such a way that the new axes correspond to the two main components of variation.

NB 1: The relative positions of the objects in the rotated p-dimensional space of principal components are the same as in the p-dimensional space of the original descriptors.
NB 2: This means that Euclidean distances among objects have been preserved through the rotation of axes.
NB 3: This is one of the important properties of PCA discussed previously.

The quality of the representation in a reduced Euclidean space with only m dimensions (m ≤ p) may be assessed by an "R²-like ratio" (analogous to regression):

    R² = (λ1 + λ2 + ... + λm) / (λ1 + λ2 + ... + λp)

NB: the denominator is the trace of matrix S. Given our example, we find 9/(9 + 5) = 0.643 of the total variance along the first principal component (a confirmation of our previous summing of eigenvalues).
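Both properties are one-liners to verify, continuing from the sketch above (Yc, S, and F.scores as defined there):

## Distances are preserved by the rotation, and the R²-like ratio
## recovers the 0.643 quoted above.
all.equal(as.vector(dist(Yc)), as.vector(dist(F.scores)))  # TRUE

lambda <- eigen(S)$values
lambda[1] / sum(lambda)       # 9 / (9 + 5) = 0.643
cumsum(lambda) / sum(lambda)  # cumulative fraction of variance per axis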

Contributions of Descriptors

PCA provides the information needed to understand the role of the original descriptors in the formation of the principal components. It may also be used to show the relationships among the original descriptors in the reduced space. These can be described by projecting the descriptors into the reduced space using the matrix UΛ^1/2 of scaled loadings derived from the centered projection. The result is often portrayed as a biplot, where both observations and descriptors are graphed on the same plot.

Biplot Example

Legendre et al.: time series of 10 observations from a Canadian river; 12 descriptors, including 5 species of benthic gastropods and 7 environmental variables. NB: the species and environmental descriptor scores were all multiplied by 5 prior to plotting.

Contributions of Descriptors

One approach to studying the relationships among descriptors consists of scaling the eigenvectors in such a way that the cosines of the angles between the descriptor-axes are proportional to their covariances. In this approach, the angles between the descriptor-axes range between 0° (maximum positive covariance) and 180° (maximum negative covariance); an angle of 90° indicates a null covariance (orthogonality). This result is achieved by scaling each eigenvector k to a length equal to its standard deviation √λk. NB: using this scaling, Euclidean distances among objects are NOT preserved.
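A minimal sketch of this scaling, reusing Y, U, and lambda from the worked example above (base R's biplot() gives an analogous display directly; treating it as equivalent to this scaling is an assumption for illustration):

## Descriptor loadings scaled as U %*% Lambda^(1/2): each eigenvector is
## stretched to length sqrt(lambda_k), so angles reflect covariances.
UL <- U %*% diag(sqrt(lambda))
UL

biplot(prcomp(Y))  # observations and descriptors on the same plot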

Contributions of Descriptors

Using the diagonal matrix of eigenvalues Λ, the new matrix of eigenvectors can be computed directly by means of the expression UΛ^1/2. Thus, for our numerical example:

    UΛ^1/2 = | 2.6833  −1.0000 |
             | 1.3416   2.0000 |

Principal Components of a Correlation Matrix

Even though PCA is defined for a dispersion matrix S, it can also be carried out on a correlation matrix R, since correlations are covariances of standardized descriptors. In an R matrix, all of the diagonal elements are one. It follows that the sum of the eigenvalues, which corresponds to the total variance of the dispersion matrix, is equal to the order of R, which is given by the number of descriptors p. PCs extracted from correlation matrices are not the same as those computed from dispersion matrices. BEWARE: some software applications only allow computation from a correlation matrix, and this may be wholly inappropriate in certain situations!

In the case of correlations, the descriptors are standardized. Thus, the distances are independent of measurement units, whereas those in the space of the original descriptors vary according to the scales of measurement. When the descriptors are all of the same kind and order of magnitude, and have the same units, it is clear that the S matrix should be used. When the descriptors are of a heterogeneous nature, it is more appropriate to use an R matrix.
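In R, the choice between the two matrices is just the scale. argument of prcomp(); Y is the worked-example data again:

## Covariance-matrix PCA vs. correlation-matrix PCA
pc.S <- prcomp(Y, center = TRUE, scale. = FALSE)  # PCA of S
pc.R <- prcomp(Y, center = TRUE, scale. = TRUE)   # PCA of R
pc.S$sdev^2  # eigenvalues of S; their sum is the trace of S (14)
pc.R$sdev^2  # eigenvalues of R; their sum is p, the number of descriptors (2)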

Principal Components of a Correlation Matrix

The principal components of a correlation matrix are computed from the matrix U of the eigenvectors of R and the matrix of standardized observations:

    F = [(y − ȳ)/s_y] U

where s_y is the standard deviation of y. Principal component analysis is still only a rotation of the system of axes. However, since the descriptors are now standardized, the objects are not positioned in the same way as if the descriptors had simply been centered (i.e., PCA from S).

Standardized vs. Unstandardized

In community studies, standardization is often desirable when a small number of species are dominant across all of your samples (i.e., Simpson's dominance is high): it prevents the dominant species from "swamping" the uncommon ones, although in certain cases this may be seen as inappropriate. Standardization must be done when the different descriptors are measured in different units. Data matrices whose elements are the values of incomparable environmental variables should be standardized.

Centered vs. Uncentered

In addition to standardization, one must also consider whether the data should be "centered." The vast majority of published EEB studies use centered PCAs, but this may not always be the best approach to visualizing the data. An uncentered PCA is called for when the data exhibit between-axes heterogeneity, i.e., when there are clusters of data points such that each cluster has negligible projections on some subset of the axes, and a different subset of axes is required for each cluster. A centered PCA is appropriate when the data exhibit little or no between-axes heterogeneity, i.e., the data points have appreciable projections on all axes.

Centered vs. Uncentered

In practice, data are often obtained for which it is not immediately obvious whether the between-axes heterogeneity exceeds the within-axes heterogeneity or vice versa. When this happens, the recommendation is to do BOTH a centered and an uncentered PCA. NB: there are many analytical approaches to data centering and standardization in PCA; we have covered just the basics here.

4 Types of PCA

We have now essentially defined four basic types of PCA:

- Unstandardized, uncentered PCA
- Standardized, uncentered PCA
- Unstandardized, centered PCA
- Standardized, centered PCA

To evaluate them, consider a matrix X (from Pielou 1984) with two descriptors and ten objects, and confirm for yourself the results of each of the four analyses (a sketch of all four variants follows).
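The four variants differ only in how the data are pre-treated before the axes are extracted. A minimal sketch via singular value decomposition is below; X stands for any objects-by-descriptors matrix (Pielou's actual values are not reproduced in these notes).

## Four basic PCA types via SVD; X is an objects-by-descriptors matrix
pca4 <- function(X, center = TRUE, standardize = FALSE) {
  X  <- as.matrix(X)
  Xt <- X
  if (center)      Xt <- sweep(Xt, 2, colMeans(X), "-")
  if (standardize) Xt <- sweep(Xt, 2, apply(X, 2, sd), "/")
  sv <- svd(Xt)
  list(scores      = sv$u %*% diag(sv$d),       # object coordinates
       axes        = sv$v,                      # descriptor loadings
       eigenvalues = sv$d^2 / (nrow(Xt) - 1))
}

## e.g., pca4(X, center = FALSE, standardize = FALSE)  # unstandardized, uncentered
##       pca4(X, center = TRUE,  standardize = TRUE)   # standardized, centered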

Meaningful Components

Since the principal components correspond to progressively smaller fractions of the total variance, one must determine how many components are biologically meaningful (i.e., what is the dimensionality of the reduced space?). Shepard diagrams are one approach, but there are others, perhaps better.

An empirical rule of thumb (the Kaiser-Guttman criterion) when using the S matrix is to interpret a principal component if its eigenvalue λ is larger than the mean of the λ's. For the R matrix, meaningful components are those with eigenvalues > 1.

A scree plot (Cattell 1966) is often useful in determining d. This is simply a rank-order plot of the eigenvalues in decreasing order. [Figure: scree plot; with mean λ = 13.5, the K-G criterion suggests two PCs are probably adequate for interpretation; the smaller eigenvalues are mostly just noise.] Alternatively, all values above a line fitted through the smallest eigenvalues are considered meaningful. (A short sketch of both devices follows.)
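Both devices take a line or two in R once the eigenvalues are in hand (the eigenvalues here come from random data, purely for illustration; substitute your own):

## Kaiser-Guttman criterion and scree plot
ev <- prcomp(matrix(rnorm(100 * 8), 100, 8))$sdev^2  # illustrative eigenvalues
ev > mean(ev)  # S-matrix rule: interpret axes with lambda > mean(lambda)
               # (for an R matrix the threshold is 1)

plot(ev, type = "b", xlab = "Principal component", ylab = "Eigenvalue",
     main = "Scree plot")
abline(h = mean(ev), lty = 2)  # Kaiser-Guttman threshold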

Misuses of PCA

The power and general utility of PCA have encouraged some biologists to go beyond the limits of the model. Some transgressions have no effect; others have dramatic effects. PCA was originally defined for data with multinormal distributions, so the data should really be normalized. Deviations from normality do not necessarily bias the results; however, one should pay particular attention to the descriptors and try to ensure they are not skewed and do not contain outliers.

Technically, a dispersion matrix cannot be estimated using a number of observations n smaller than or equal to the number of descriptors p: the number of objects must be larger than the number of descriptors. Do not transpose a primary matrix and compute correlations among the objects instead of among the descriptors (this is a bit odd anyway, since PCA already provides information about the relationships of both objects and descriptors). Covariances and correlations are defined for quantitative descriptors only: do not use multi-state qualitative descriptors, for which means and variances are meaningless.

When calculated over data sets with many double-zeros, coefficients such as the covariance or correlation lead to ordinations that produce inadequate estimates of the distances among objects. This makes PCA particularly inappropriate for analyzing many biological data sets containing species-by-sample abundances, but it remains an excellent procedure for analyzing environmental, systematic, or morphometric data.

In addition to the many-zeros problem, there is a fundamental assumption that the descriptors are linearly (or at least monotonically) related to each other (lines or planes). While this may be true of certain types of data, it is rarely the case with community data where species abundances are being analyzed: most species are unimodally distributed.

Misuses of PCA

Consider a hypothetical coenocline: 3 species distributed along a gradient, each with a unimodal response function. What happens if you do a PCA on these data? Certainly there is a nonlinearity problem. In the 2-D space of PC-1 vs. PC-2, the sites trace out what is known as the "horseshoe effect" (an extreme version of the "arch" effect), where axis 2 exhibits a parabolic curve and is not a true representation of the linear gradient. (The simulation sketch below reproduces this.)

PCA Tutorial - NCSS

PCA is such a widely used procedure that almost every major software application supports it. My preferences for PCA are NCSS, SAS, and R. Let's look at a PCA example using 6 descriptors and 30 objects and work through a session in NCSS.
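A minimal simulation of the coenocline (the Gaussian response curves and their optima are assumptions chosen for illustration):

## Three species with unimodal responses along one gradient; PCA bends
## the single gradient into an arch/horseshoe on PC-1 vs. PC-2.
x  <- seq(0, 10, length.out = 50)  # site positions along the gradient
sp <- sapply(c(2, 5, 8), function(opt) 100 * exp(-(x - opt)^2 / 4))

pc <- prcomp(sp, center = TRUE)
plot(pc$x[, 1], pc$x[, 2], type = "b", xlab = "PC-1", ylab = "PC-2")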

Missing Data

NCSS needs to know how you wish to handle missing data (NB: a zero value is different from a missing value!). If data are missing, several options are available, the three most common of which are:

(a) Delete the entire row to which the observation belongs (select the "none" option); this results in an obvious loss of data, sometimes large.
(b) The "mean" option simply drops the mean of that descriptor into the matrix for analysis; while simple, this causes estimation problems later.
(c) Estimate the covariance matrix S and use these coefficients in a regression to estimate each missing datum from the data that are available; once each missing value is estimated, a new covariance matrix is calculated and the process is repeated until convergence, which is measured by the trace of the covariance matrix. This is the recommended procedure (a sketch follows below).

Outliers

There are various ways to approach dealing with outliers (to which PCA is quite sensitive because of the distortion they cause in the variance-covariance structure):

(a) Start with a full univariate EDA of all of your descriptors. You may wish to winsorize or delete selected severe outliers; if you delete an observation (row), make sure you specify a procedure for handling the resulting missing data (previous slide).
(b) There are several algorithms used to construct a PCA. The two that NCSS supports are "regular" (what we learned) and "robust"; the latter applies weights to outlying points to minimize their influence. Both S and R can be estimated robustly.
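A bare-bones sketch of option (c), iterating regression-based estimates until the trace of the covariance matrix converges. This is a simplification written only to make the idea concrete, not a reproduction of NCSS's actual algorithm:

## Iterative regression imputation, converging on the trace of cov(X)
impute.regress <- function(X, tol = 1e-6, maxit = 100) {
  X <- as.matrix(X)
  miss <- is.na(X)
  X[miss] <- colMeans(X, na.rm = TRUE)[col(X)][miss]  # initialize with means
  for (i in seq_len(maxit)) {
    tr.old <- sum(diag(cov(X)))
    for (j in which(colSums(miss) > 0)) {
      fit <- lm(X[, j] ~ X[, -j])                # regress descriptor j on the rest
      X[miss[, j], j] <- fitted(fit)[miss[, j]]  # refresh the estimates
    }
    if (abs(sum(diag(cov(X))) - tr.old) < tol) break  # trace has converged
  }
  X
}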

Rotations

In addition to the "normal" approach to constructing a PCA, there are various types of orthogonal rotation techniques. In other words, in order to reveal data structure and interpret the meaning of your axes, it may be advisable to apply an additional orthogonal rotation to your data. Two options are available in NCSS: varimax and quartimax. In varimax, the axes are rotated to maximize the sum of the variances of the squared loadings within each column of the loadings matrix. In quartimax, it is the rows of the loadings matrix that are maximized rather than the columns (as in varimax). Suggestion: start with a normal PCA, then try a rotation if necessary. (A short sketch follows below.)

NCSS Output

Without going through all of the output, I would like to draw your attention to several key points in the output of a single software application. The first is Bartlett's sphericity test (Bartlett 1950) for testing the null hypothesis that the correlation matrix is an identity matrix (all correlations are zero). If you get a probability (P) value greater than 0.05, you should not perform a PCA on the data. The scree plot in the output suggests that only the first 2 axes are meaningful.
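Both tools are easy to reproduce in R: stats::varimax() rotates a loadings matrix, and Bartlett's sphericity statistic has a standard chi-square form, implemented by hand here as a sketch (the loadings come from random data, purely for illustration):

## Varimax rotation of retained loadings
L <- prcomp(matrix(rnorm(100 * 8), 100, 8), scale. = TRUE)$rotation[, 1:2]
varimax(L)

## Bartlett's sphericity test: H0 is that cor(X) is the identity matrix
bartlett.sphericity <- function(X) {
  n <- nrow(X); p <- ncol(X); R <- cor(X)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  pchisq(chisq, df, lower.tail = FALSE)  # small P: PCA is worth doing
}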

NCSS Output

The eigenvectors are the weights that relate the scaled original variables to the factors. These coefficients may be used to determine the relative importance of each variable in forming the factor. Often, the eigenvectors are scaled so that the variances of the factor scores are equal to one; these scaled eigenvectors are given in the Score Coefficients section described later.

The communality is the proportion of the variation of a variable that is accounted for by the factors that are retained. It is the R² value that would be achieved if this variable were regressed on the retained factors. The accompanying table gives the amount added to the communality by each factor.

The outlier report is useful for detecting observations that are very different from the bulk of the data. To do this, two quantities are displayed: T² and Q. Both suggest that rows 2 and 3 are having inordinate influence and need to be scrutinized.
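Generic versions of these two screening statistics can be computed from any PCA. The sketch below uses the usual textbook definitions (T² measures distance within the retained-component space, Q the residual left outside it); NCSS's exact formulas may differ:

## T² and Q statistics per observation, for k retained components
pca.outlier.stats <- function(X, k = 2) {
  Xc <- scale(X, center = TRUE, scale = FALSE)
  pc <- prcomp(Xc, center = FALSE)
  Fk <- pc$x[, 1:k, drop = FALSE]
  T2 <- rowSums(sweep(Fk^2, 2, pc$sdev[1:k]^2, "/"))    # score distance
  Q  <- rowSums((Xc - Fk %*% t(pc$rotation[, 1:k]))^2)  # residual distance
  data.frame(T2 = T2, Q = Q)
}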

NCSS Output

The score report presents the individual factor scores, scaled so that each column has a mean of zero and a standard deviation of one. These are the values that are plotted. Remember, there is one row of score values for each observation and one column for each factor that was kept.

Shown in the output are plots of the first three axes. Note the value of plotting the third axis, even though only two are indicated as important. Observation 3 is clearly an outlier (as suggested by the previous statistical tests); this point needs to be dropped and the analysis re-run.

PCA Tutorial - R

Let's look at another tutorial, this time using R. This will give you another perspective and a broader appreciation of what is available for this type of analysis. There are two ways to perform PCA in base R: princomp() and prcomp(). The library LabDSV contains a third, called pca(), which essentially calls princomp() but adds different computing and plotting options useful for EEB.

R-Tutorial

First, make sure to install and load BOTH packages, labdsv and vegan. Next, access the Bryce Canyon data set (shipped with labdsv):

> library(vegan)
> library(labdsv)
> data(bryceveg)

> pca.1 <- pca(bryceveg, cor = TRUE, dim = 10)

This will run a PCA on the Bryce Canyon vegetation data set, using a correlation matrix and calculating scores for only the first 10 eigenvectors. There are then four aspects that need to be considered:

1. Variance explained by eigenvector
2. Cumulative variance by eigenvector
3. Species loadings by eigenvector
4. Plot scores by eigenvector

Variance Explained

> summary(pca.1, dim = 3)
Importance of components:
                        [,1] [,2] [,3]
Standard deviation       ...  ...  ...
Proportion of Variance   ...  ...  ...
Cumulative Proportion    ...  ...  ...

NB: for a reason peculiar to R, the first line is the standard deviation, so you must square it to get the variance.
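The same table can be produced with base R alone (assuming bryceveg is loaded and has no zero-variance species columns, which would break the scaling):

## Variance explained via base R, for comparison with LabDSV's summary
pc <- prcomp(bryceveg, scale. = TRUE)
summary(pc)$importance[, 1:3]  # SD, proportion, cumulative for first 3 axes
pc$sdev[1:3]^2                 # square the SDs to get the eigenvalues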

Variance Explained

We can look at the same information graphically:

> varplot.pca(pca.1)

Species Loadings

By default, small loading values are suppressed in the printout. You can see (from what is shown) that eigenvector 1 is negatively correlated with arcpat and ceamar and positively correlated with chrvis.

Plot Scores

Similar output can be obtained for the actual plot scores (here limited to just the first 3 dimensions, which is usually sufficient, and a partial listing of stands).

Plot Scores

These plot scores are typically what are used to produce the final PCA plot that we usually want in an EEB application. The defaults in LabDSV are designed to provide good output for most EEB applications (not so for the base R functions), but they can be altered as desired. Alternatively, the scores can be copied into a separate graphics program and the plot constructed there.

> plot(pca.1, title = "Bryce Canyon")

The default for plot() is PC-1 vs. PC-2; however, you can look at other dimensions and change symbols, colors, etc.:

> plot(pca.1, ax = 1, ay = 3, col = 3, pch = 3, title = "Bryce Canyon")

NB: R also supports interactive point highlighting. After creating a graph, enter:

> plotid(pca.1)

Then click on points on your graph to see what happens!

R-Tutorial - Summary
