Unconstrained Ordination


Example data: abundances of five species at six sites, with each site's rank order along the underlying gradient in parentheses.

Site   Species A   Species B   Species C   Species D   Species E
1      0 (1)       5 (1)       1 (1)       10 (4)      10 (4)
2      2 (3)       8 (3)       4 (3)       12 (6)      20 (6)
3      8 (6)       20 (6)      10 (6)      1 (2)       3 (2)
4      4 (5)       11 (5)      8 (5)       11 (5)      14 (5)
5      1 (2)       6 (2)       2 (2)       2 (3)       6 (3)
6      3 (4)       10 (4)      6 (4)       0 (1)       0 (1)

[Figure: the six sites arranged along a single ordination axis, running from low A-C / high D-E abundance at one end to high A-C / low D-E abundance at the other.]

Important Characteristics of Unconstrained Ordination Techniques

- A family of techniques with similar goals.
- Organize sampling entities (e.g., species, sites, observations) along continuous ecological gradients.
- Assess relationships within a single set of variables; no attempt is made to define the relationship between a set of independent variables and one or more dependent variables.
- Extract dominant, underlying gradients of variation (e.g., principal components) among sampling units from a set of multivariate observations; the emphasis is on variation among samples rather than on similarity (as in cluster analysis).
- Reduce the dimensionality of a multivariate data set by condensing a large number of original variables into a smaller set of new composite dimensions (e.g., principal components) with a minimum loss of information.

- Summarize data redundancy by placing similar entities in proximity in ordination space, producing a parsimonious understanding of the data in terms of a few dominant gradients of variation.
- Define new composite dimensions (e.g., principal components) as weighted, linear combinations of the original variables.
- Eliminate noise from a multivariate data set by recovering patterns in the first few composite dimensions (e.g., principal components) and deferring noise to subsequent axes.

Principal Components Analysis

[Figure: a data matrix of observations by three variables (canopy cover, snag density, canopy height) shown as a point cloud in three dimensions, with PC1 and PC2 passing through the centroid of the cloud. For example:]

PC1 = 0.8x1 - 0.4x2 + 0.1x3
PC2 = -0.1x1 - 0.1x2 + 0.9x3
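To make the geometry above concrete, here is a minimal NumPy sketch (not part of the original slides) of deriving components as weighted linear combinations of standardized variables; the data values and variable names are invented for illustration.

```python
import numpy as np

# Hypothetical observations of three variables:
# canopy cover (%), snag density, canopy height (m).
X = np.array([[80.0, 1.2, 35.0],
              [35.0, 3.3, 20.0],
              [ 5.0, 2.1,  5.0],
              [60.0, 0.8, 30.0],
              [20.0, 2.9, 12.0]])

# Standardize (correlation-matrix approach): center and scale each column.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Eigenanalysis of the correlation matrix R.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
order = np.argsort(eigvals)[::-1]          # reorder as PC1, PC2, PC3
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Column i of `eigvecs` holds the weights that define PC_i as a
# linear combination of the standardized variables.
scores = Z @ eigvecs
print("eigenvalues (PC variances):", np.round(eigvals, 3))
print("PC1 weights:", np.round(eigvecs[:, 0], 3))
```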

PCA: The Data Set

- Single set of variables; no distinction between independent and dependent variables.
- Continuous, categorical, or count variables (preferably all continuous); mixed data sets are rarely used and probably not appropriate.
- Every sample entity must be measured on the same set of variables.
- Ideally there should be more samples (rows) than variables (columns) [i.e., the data matrix should be of full rank].

Common 2-way ecological data:
- Sites-by-environmental parameters
- Species-by-niche parameters
- Species-by-behavioral characteristics
- Samples-by-species
- Specimens-by-characteristics

Sample   x1    x2    x3   ...  xp
1        x11   x12   x13  ...  x1p
2        x21   x22   x23  ...  x2p
3        x31   x32   x33  ...  x3p
...
n        xn1   xn2   xn3  ...  xnp

PCA: The Data Set

[Figure: example data set, with samples plotted as points in variable space.]

PCA: Assumptions

- Descriptive use of PCA requires "no" assumptions!
- Inferential use of PCA requires assumptions!

1. Multivariate Normality

PCA assumes that the underlying structure of the data is multivariate normal (i.e., hyperellipsoidal with normally varying density around the centroid). Such a distribution exists when each variable has a normal distribution about fixed values on all others.

PCA: Assumptions - Multivariate Normality

[Figure: a multivariate normal point cloud.]

Consequences of violating multivariate normality:
- Invalid significance tests.
- Loss of strict independence (i.e., orthogonality) among the principal components.
- Later principal components (i.e., those associated with smaller eigenvalues) will often resemble the earlier components, but with smaller principal component loadings.

Multivariate Normality - Univariate Diagnostics:
- Conduct univariate tests of normality for each variable.
- Visually inspect distribution plots (e.g., histogram, box plot, and normal quantile-quantile plot) for each variable.

Caveats:
- "Univariate" normality does not equal "multivariate" normality.
- Often used to determine whether the variables should be transformed prior to the PCA.
- It is usually assumed that univariate normality is a good step towards multivariate normality.

[Figure: example histograms, box plots, and normal quantile-quantile plots for individual variables.]
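A short sketch of these univariate diagnostics using SciPy's Shapiro-Wilk test on invented right-skewed data; the log transform shown is one common remedy, not a universal one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical right-skewed site-by-variable data (n = 50, p = 3).
X = rng.lognormal(mean=0.0, sigma=0.7, size=(50, 3))
names = ["ccov", "snag", "chgt"]   # invented variable names

for j, name in enumerate(names):
    w, p = stats.shapiro(X[:, j])                 # Shapiro-Wilk normality test
    w_t, p_t = stats.shapiro(np.log(X[:, j]))     # retest after a log transform
    print(f"{name}: raw p = {p:.4f}; log-transformed p = {p_t:.4f}")
```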

Multivariate Normality - Multivariate Diagnostics:
- Conduct a multivariate test of normality (e.g., the E-statistic).
- Visually inspect distribution plots (e.g., histogram, box plot, normal quantile-quantile plot) for each principal component (PC).

Multivariate Normality - Solutions:
- Collect a larger sample; although even an infinitely large sample will not normalize an inherently nonnormal distribution.
- Ignore the problem and do not make inferences.
- Use a nonparametric ordination technique such as NMDS.
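The slides name the E-statistic; as a stand-in, here is a sketch of Mardia's multivariate skewness and kurtosis tests, which can be implemented directly (this is illustrative and assumes Mardia's classical formulas, not the slides' own procedure).

```python
import numpy as np
from scipy import stats

def mardia_test(X):
    """Mardia's multivariate skewness and kurtosis tests (returns p-values)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False, bias=True))
    M = Xc @ S_inv @ Xc.T                 # m_ij = (x_i - xbar)' S^-1 (x_j - xbar)
    b1 = (M ** 3).sum() / n ** 2          # multivariate skewness
    b2 = (np.diag(M) ** 2).mean()         # multivariate kurtosis
    df = p * (p + 1) * (p + 2) / 6
    p_skew = stats.chi2.sf(n * b1 / 6.0, df)
    z_kurt = (b2 - p * (p + 2)) / np.sqrt(8.0 * p * (p + 2) / n)
    return p_skew, 2 * stats.norm.sf(abs(z_kurt))

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))             # synthetic multivariate-normal data
print(mardia_test(X))                     # large p-values: no evidence of nonnormality
```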

2. Independent Random Sample (and the effects of outliers)

PCA assumes that random samples of observation vectors have been drawn independently from a P-dimensional multivariate normal population; that is, sample points represent an independent, random sample of the multidimensional space.

[Figure: transect sampling design; from Urban.]

Independent Random Sample (and outliers) - Consequences:
- Invalid significance tests.
- Outliers and point clusters exert undue pull on the direction of the component axes and therefore strongly affect the ecological efficacy of the ordination.

Outliers - Univariate Diagnostics:
- Standardize the data and inspect for entities with any value more than, e.g., 2.5 standard deviations from the mean on any variable.
- Construct univariate stem-and-leaf, box, and normal probability plots for each variable and check for suspected outliers.

[Table: raw values (var1, var2, ...) alongside their standardized scores; entities exceeding the threshold on any variable are flagged.]
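A minimal sketch of the standardization screen just described (flagging values more than 2.5 standard deviations from the mean); the data and the planted outlier are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))       # synthetic data
X[7, 1] = 6.0                      # plant a hypothetical outlier

# Standardize each column, then flag any |z| > 2.5.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
for r, c in zip(*np.where(np.abs(Z) > 2.5)):
    print(f"obs {r}, variable {c}: z = {Z[r, c]:.2f}")
```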

Outliers - Multivariate Diagnostics:
- Examine deviations of each sample's average (Euclidean) distance to the other samples; standard deviation scores > 3 indicate extreme observations.
- Examine each sample's Mahalanobis distance (D²) to the group of remaining samples.
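A sketch of the Mahalanobis D² diagnostic; for simplicity it measures each sample's distance to the centroid of the full sample rather than to the group of remaining samples, and the chi-square cutoff assumes multivariate normality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
X = rng.normal(size=(40, 3))
X[5] = [5.0, -4.0, 6.0]            # hypothetical multivariate outlier

mu = X.mean(axis=0)
S_inv = np.linalg.inv(np.cov(X, rowvar=False))
d2 = np.einsum('ij,jk,ik->i', X - mu, S_inv, X - mu)   # squared Mahalanobis D^2

# Under multivariate normality, D^2 is approximately chi-square with p df.
cutoff = stats.chi2.ppf(0.999, df=X.shape[1])
print("flagged observations:", np.where(d2 > cutoff)[0])
```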

Outliers - Multivariate Diagnostics (continued):
- Construct histograms, box plots, and normal quantile-quantile plots of the principal component scores for each principal component and check for suspected outliers.
- Construct scatter plots of pairs of principal components and check for suspect points.

Independent Random Sample (and outliers) - Solutions:
- Use an intelligent sampling plan (large, representative sample).
- Use stratified random sampling when appropriate.
- Eliminate outliers.
- Ignore the problem and do not make inferences.

3. Linearity

PCA assumes that variables change linearly along underlying gradients and that linear relationships exist among variables, such that the variables can be combined in a linear fashion to create the principal components.

[Figure: point cloud in three dimensions with PC1 and PC2 through the centroid; e.g., PC1 = 0.8x1 - 0.4x2 + 0.1x3, PC2 = -0.1x1 - 0.1x2 + 0.9x3.]

Linearity - Consequences:
- Failure to identify and interpret the gradient.

[Figure: bell-shaped response curves for species A, B, and C along an environmental gradient, with sampling spanning the full A-C range.]
[Figure: the resulting ordination of such data.]

Linearity - Diagnostics:
(A) Scatter plots of pairs of variables.
(B) Scatter plots of principal component (PC) scores.

(C) Scatter plots of variables vs. principal component (PC) scores.

Linearity - Solutions:
- Sample a shorter range of the environmental gradient.
- Use alternative ordination methods, such as detrended correspondence analysis, detrended principal components analysis, or nonmetric multidimensional scaling.
- Interpret results cautiously.

PCA: Sample Size Considerations

General rules:
- More samples (rows) than variables (columns).
- Enough samples should be taken to adequately describe each distinctive community.
- Enough samples should be taken to ensure that the covariance structure of the population is estimated accurately and precisely by the sample data set (i.e., enough to ensure stable parameter estimates).

Rule of thumb: N ≥ 3P.

Sample size solutions:
- Eliminate unimportant variables.
- Sample sequentially until the mean and variance of the parameter estimates (e.g., the eigenvalues and eigenvectors) stabilize.
- Examine the stability of the results using a resampling procedure.
- Interpret results cautiously; don't extrapolate findings.

PCA: Deriving the Principal Components

Correlation vs. Covariance Matrices:

Raw data matrix (worked example, n = 3):

Obs   CCov   Snag   CHgt
1     80     1.2    35
2     35     3.3    20
3     5      2.1    5

Variance: $\sigma_j^2 = \frac{1}{n}\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)^2$

e.g., var(CHgt) = 1/3[(35 - 20)² + (20 - 20)² + (5 - 20)²] = 150

Covariance: $s_{jk} = \frac{1}{n}\sum_{i=1}^{n}(x_{ij}-\bar{x}_j)(x_{ik}-\bar{x}_k)$

e.g., cov(CCov, Snag) = 1/3[(80 - 40)(1.2 - 2.2) + (35 - 40)(3.3 - 2.2) + (5 - 40)(2.1 - 2.2)] = -14

In the covariance matrix, the diagonals are the variances and the off-diagonals are the covariances.

Correlation (computational form):

$r_{jk} = \frac{n\sum x_{ij}x_{ik} - (\sum x_{ij})(\sum x_{ik})}{\sqrt{[\,n\sum x_{ij}^2 - (\sum x_{ij})^2\,][\,n\sum x_{ik}^2 - (\sum x_{ik})^2\,]}}$

e.g., cor(CCov, Snag) = {3[(80)(1.2) + (35)(3.3) + (5)(2.1)] - (120)(6.6)} / {[3(80² + 35² + 5²) - 120²][3(1.2² + 3.3² + 2.1²) - 6.6²]}^(1/2) ≈ -0.53

Applied to a variable with itself, e.g., cor(CCov, CCov), the formula returns 1. In the correlation matrix, the diagonals measure internal association (each = 1) and the off-diagonals are the correlations.
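The hand calculations above can be verified with NumPy; bias=True reproduces the slides' 1/n divisor.

```python
import numpy as np

# The slides' worked example: three observations of
# canopy cover, snag density, and canopy height.
X = np.array([[80.0, 1.2, 35.0],
              [35.0, 3.3, 20.0],
              [ 5.0, 2.1,  5.0]])

# bias=True uses the 1/n divisor, matching the hand calculations above.
S = np.cov(X, rowvar=False, bias=True)
R = np.corrcoef(X, rowvar=False)

print("var(CHgt)      =", S[2, 2])            # 150.0
print("cov(CCov,Snag) =", S[0, 1])            # -14.0
print("cor(CCov,Snag) =", round(R[0, 1], 3))  # about -0.53
```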

Correlation vs. Covariance Matrices (continued):

Standardizing the raw data, $x_{ij}^* = \frac{x_{ij}-\bar{x}_j}{s_j}$, links the two approaches: the covariance matrix of the standardized data equals the correlation matrix of the raw data.

- The correlation matrix treats all variables as equally important (i.e., gives equal weight to all variables).
- The raw covariance matrix gives more weight to variables with larger variances (i.e., weights variables in proportion to their variance).

- Note that the solutions obtained from the correlation and raw covariance matrices will be different.
- The correlation matrix is almost always preferred, and is always more appropriate when the scale or unit of measurement differs among variables.
- The correlation matrix indicates how parsimoniously the PCA will be able to summarize the data.

PCA: Deriving the Principal Components

Eigenvalues:

Characteristic equation: $|R - \lambda I| = 0$

where R = the correlation (or covariance) matrix, λ = the vector of eigenvalue solutions, and I = the identity matrix.

- An NxP data set has P eigenvalues.
- Eigenvalues equal the variances of the corresponding PCs.
- λ_1 > λ_2 > λ_3 > ... > λ_p
- Correlation approach: Σλ_i = P = the trace of the correlation matrix.
- Covariance approach: Σλ_i = Σσ_i² = the trace of the covariance matrix.
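A minimal sketch of obtaining the eigenvalues numerically (a symmetric eigensolver rather than expanding the determinant); the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 4))                 # hypothetical N x P data
R = np.corrcoef(X, rowvar=False)

# The eigenvalues solving |R - lambda*I| = 0, via a symmetric eigensolver.
eigvals = np.linalg.eigvalsh(R)[::-1]        # sorted descending

print("eigenvalues:", np.round(eigvals, 3))
print("sum of eigenvalues:", round(eigvals.sum(), 3))   # = trace(R) = P = 4
```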

[Output: example eigenvalue listing from a PCA run.]

Eigenvectors:

Characteristic equation: $(R - \lambda_i I)v_i = 0$

where λ_i = the eigenvalue corresponding to the i-th PC, and v_i = the eigenvector associated with the i-th PC.

- Eigenvectors equal the coefficients (weights) of the variables in the linear equations that define the principal components.
- Correlation approach: v_i is proportional to the structure coefficients, or loadings (s_i).
- Covariance approach: v_i is "not" proportional to s_i.

Each eigenvector defines one component, e.g.:

PC1 = 0.8x1 - 0.4x2 + 0.1x3
PC2 = -0.1x1 - 0.1x2 + 0.9x3

[Figure: PC1 and PC2 axes through the centroid of the point cloud; side-by-side eigenvector output from the correlation and covariance approaches.]

Sample Scores:

$z_{ij} = a_{i1}x_{j1}^* + a_{i2}x_{j2}^* + \ldots + a_{ip}x_{jp}^*$

where z_ij = the score for the i-th PC and j-th sample, a_ik = the eigenvector coefficient for the i-th PC and k-th variable, and x*_jk = the standardized value for the j-th sample and k-th variable (a sketch of this computation follows the next slide).

- Scores represent the values of the new, uncorrelated variables (components) and can serve as input data for subsequent analysis by other statistical procedures.

PCA: Assessing the Importance of the PCs

- How important (significant) is each component?
- How "many" components should be retained and interpreted?

1. Latent Root Criterion: Retain components with eigenvalues > 1 (correlation approach only), because a component with an eigenvalue < 1 represents less variance than is accounted for by a single variable.
- Used to determine the "maximum" number of components to retain.
- Most reliable when the number of variables is between 20 and 50; it retains too few components when P < 20 and too many when P > 50.
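Returning to the score equation above, a minimal sketch of computing scores by projecting the standardized data onto the eigenvectors; the printed covariance matrix of the scores is (essentially) diagonal because the components are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 4))                 # hypothetical data

# Standardize, then project onto the eigenvectors of R to get PC scores.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
V = V[:, np.argsort(eigvals)[::-1]]          # columns ordered PC1, PC2, ...

scores = Z @ V                               # z_ij = sum_k a_ik * x*_jk
print(np.round(np.cov(scores, rowvar=False), 2))   # ~diagonal: scores uncorrelated
```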

1. Latent Root Criterion (example): [Output: eigenvalue table with eight eigenvalues greater than 1.] Keep 8 principal components.

2. Scree Plot Criterion: The point at which the scree plot curve first begins to straighten out is considered to indicate the maximum number of components to retain. [Figure: scree plot flattening after the fourth eigenvalue.] Keep 4 principal components?

3. Broken Stick Criterion: The point at which the scree plot curve crosses the broken-stick distribution is considered to indicate the maximum number of components to retain.

Broken stick: $\lambda_i^* = \sum_{k=i}^{p}\frac{1}{k}$

[Figure: scree plot with the broken-stick distribution overlaid.] Keep 2 or 3 principal components.

4. Relative Percent Variance Criterion: Compare the relative magnitudes of the eigenvalues to see how much of the total sample variation in the data set is accounted for by each principal component:

$\frac{\lambda_i}{\sum_{i=1}^{p}\lambda_i}$
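A sketch of the broken-stick comparison; the eigenvalues are invented, and the expected values follow the formula above (scaled so they sum to the total variance).

```python
import numpy as np

def broken_stick(p):
    """Broken-stick values: lambda*_i = sum_{k=i}^{p} 1/k."""
    return np.array([np.sum(1.0 / np.arange(i, p + 1)) for i in range(1, p + 1)])

eigvals = np.array([3.2, 1.4, 0.9, 0.3, 0.2])       # hypothetical eigenvalues
p = len(eigvals)
expected = broken_stick(p) / p * eigvals.sum()      # expected under the model

# Retain components whose observed eigenvalue exceeds the broken-stick value.
print("retain PCs:", np.where(eigvals > expected)[0] + 1)
```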

4. Relative Percent Variance Criterion:
- Measures how much of the total sample variance is accounted for by each principal component.
- The cumulative percent variance of all eigenvalues equals 100%.
- Used to evaluate the "importance" of each principal component.
- Used to determine how many principal components to retain.
- Used to evaluate the effectiveness of the ordination as a whole in parsimoniously summarizing the data structure.
- Influenced by the number of variables in the data set (decreases as P increases).
- Influenced by the number of samples (decreases as N increases).
- Should be used only in conjunction with other measures.

5. Significance Tests:
A. Parametric tests: rarely employed because of the assumptions involved (e.g., multivariate normality, independent random sample).
B. Nonparametric tests based on resampling procedures: jackknife, bootstrap, and randomization procedures are conceptually simple, computer-intensive, nonparametric procedures, involving resampling of the original data, for determining the variability of statistics with unknown or poorly known distributions.

5. Significance Tests (continued): Remember, statistical significance does not always mean ecological significance.
- A component may not describe enough variance to meet your ecological needs.
- A component may not have a meaningful ecological interpretation, as judged by the principal component loadings.
- Ultimately, the utility of each principal component must be grounded on ecological criteria.

Bootstrap Procedure: The premise is that, through resampling of the original data, confidence intervals may be constructed based on repeated recalculation of the statistic under investigation.

[Diagram: the original n x p data set yields the observed statistic Φ_i(all) by PCA; each of M bootstrap samples (n x p, drawn with replacement) is run through PCA to yield the bootstrap estimates Φ*_i(1), ..., Φ*_i(M).]

Bootstrap Procedure (continued):

Bootstrap estimate: $\bar{\Phi}_i^* = \frac{1}{M}\sum_{j=1}^{M}\Phi_{i(j)}^*$

Bootstrap standard error: $SE(\Phi_i^*) = \sqrt{\frac{\sum_{j=1}^{M}\left(\Phi_{i(j)}^* - \bar{\Phi}_i^*\right)^2}{M-1}}$

Bootstrap error ratio: $\bar{\Phi}_i^* / SE(\Phi_i^*)$, a t-statistic with N - 1 df.

Test hypothesis:
H0: Φ_i(observed) = Φ_i(no real data structure)
HA: Φ_i(observed) > Φ_i(no real data structure)

[Figure: the distribution of bootstrap estimates is translated to center on the null value Φ_i(H0); the α-level is the tail area of the translated distribution beyond Φ_i(observed).]
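A sketch of the bootstrap, taking the first eigenvalue as the statistic Φ and resampling rows with replacement, as in the diagram above; the data are synthetic.

```python
import numpy as np

def lambda1(X):
    """The statistic Phi: first eigenvalue of the correlation matrix."""
    return np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[-1]

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 4))
X[:, 1] += X[:, 0]                  # induce some real correlation structure

M = 999
boot = np.empty(M)
for j in range(M):
    rows = rng.integers(0, len(X), size=len(X))   # resample rows with replacement
    boot[j] = lambda1(X[rows])

est, se = boot.mean(), boot.std(ddof=1)
print(f"observed = {lambda1(X):.3f}")
print(f"bootstrap mean = {est:.3f}, SE = {se:.3f}, error ratio = {est/se:.1f}")
```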

Randomization Procedure: The premise is that, through resampling of the original data, we can generate the actual distribution of the statistic under the null hypothesis and test the observed sample against this distribution.

[Diagram: the original n x p data set yields the observed statistic Φ_i(all) by PCA; each of M random permutations (values randomized within columns) is run through PCA to yield the permutation estimates Φ*_i(1), ..., Φ*_i(M).]

Test hypothesis:
H0: Φ_i(observed) = Φ_i(no real data structure)
HA: Φ_i(observed) > Φ_i(no real data structure)

[Figure: distribution of permutation estimates, with Φ_i(observed) in the upper tail.] This gives a direct and intuitive interpretation of the type I error rate!
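A matching sketch of the randomization test: values are permuted independently within columns, which destroys the correlation structure while preserving each variable's marginal distribution.

```python
import numpy as np

def lambda1(X):
    return np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[-1]

rng = np.random.default_rng(8)
X = rng.normal(size=(50, 4))
X[:, 1] += X[:, 0]                               # real structure to detect
observed = lambda1(X)

M = 999
perm = np.empty(M)
for j in range(M):
    # Shuffle values independently within each column, breaking the
    # correlation structure but keeping each variable's distribution.
    Xp = np.column_stack([rng.permutation(col) for col in X.T])
    perm[j] = lambda1(Xp)

# One-tailed p-value: how often a permutation matches or beats the observed value.
p = (np.sum(perm >= observed) + 1) / (M + 1)
print(f"observed lambda_1 = {observed:.3f}, permutation p = {p:.4f}")
```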

[Output: example randomization-test results for the eigenvalues.]

PCA: Interpreting the Principal Components

1. Principal Component Structure (also called "loadings"):

$s_{ij} = v_{i(j)}\sqrt{\lambda_i}$

where s_ij = the correlation between the i-th PC and the j-th variable, v_i(j) = the eigenvector element of the j-th variable in the i-th PC (derived from the correlation matrix), and λ_i = the i-th eigenvalue (i-th PC).

- Loadings are bivariate product-moment correlations between the principal components and the original variables.
- The squared loadings indicate the percent of a variable's variance accounted for by that component.
- Note that the structure is different depending on whether the correlation matrix or the raw covariance matrix is used in the eigenanalysis.
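A sketch of the loading formula above, cross-checked against the variable-by-score correlations it is supposed to equal; the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(9)
X = rng.normal(size=(40, 3))                 # hypothetical data

eigvals, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# s_ij = v_i(j) * sqrt(lambda_i): the loading of variable j on PC_i.
loadings = V * np.sqrt(eigvals)              # column i = loadings on PC_i

# Cross-check: loadings equal the correlations of variables with PC scores.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
check = np.corrcoef(np.hstack([Z, Z @ V]), rowvar=False)[:3, 3:]
print(np.round(loadings, 3))
print(np.round(check, 3))                    # same values
```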

1. Principal Component Structure (example): [Table: loadings s_ij of each variable on each retained component.]

Significance of Structure Correlations (rules of thumb):
- |s_ij| > 0.30 significant when N > 50
- |s_ij| > 0.26 significant when N = 100
- |s_ij| > 0.18 significant when N = 200
- |s_ij| > 0.15 significant when N = 300

The disadvantage of these rules is that the number of variables being analyzed and the specific component being examined are not considered:
- As you move from the 1st to later components, the acceptable level for considering a loading significant should increase.
- As N or P increases, the acceptable level for considering a loading significant should decrease.

Interpreting Structure Correlations:
- The larger the sample size, the smaller the loading needed to be considered significant.
- The larger the number of variables being analyzed, the smaller the loading needed to be considered significant.
- The larger the number of components, the larger a loading on later components must be to be considered significant for interpretation.
- Note that significant correlation coefficients may not necessarily represent ecologically important variables.

Example (PC1 gradient): highlight the highest significant loading for each variable, then the other significant loadings. [Table: PC1 describes a conifer-to-hardwood gradient, contrasting cedar/fir, huckleberry, fern/grape, salal/plum, and currant against alder/salmonberry.]

Example (PC2 gradient): highlight loadings as above. [Table: PC2 contrasts vine, fir, fern, and hazel against plum/salal, hemlock, cedar, and forb.]

2. Final Communalities: The communality of a variable equals the squared multiple correlation for predicting the variable from the principal components; that is, the proportion of a variable's variance that is accounted for by the principal components.

$c_j = \sum_{i=1}^{P} s_{ij}^2$

- Prior communalities = 1.0.
- Final communality estimates = the squared multiple correlations for predicting the variables from the "retained" principal components.
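A sketch of final communalities as row sums of squared loadings over the k retained components; with all P components retained, every communality equals 1. Data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.normal(size=(40, 4))                 # hypothetical data

eigvals, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
loadings = V[:, order] * np.sqrt(eigvals[order])

k = 2                                        # number of retained components
communality = (loadings[:, :k] ** 2).sum(axis=1)        # c_j = sum_i s_ij^2
print("final communalities (k = 2):", np.round(communality, 3))
print("all P retained:", np.round((loadings ** 2).sum(axis=1), 3))  # all 1.0
```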

2. Final Communalities (continued):
- Final communality estimates indicate how well the original variables are accounted for by the "retained" principal components.
- Final communality estimates increase from zero to one as the number of retained principal components increases from zero to P.

3. Principal Component Scores and Biplots:
- Scatter plots of scores graphically illustrate the relationships among entities.
- Axes can be scaled in a variety of ways (don't sweat over it).
- Scatter plots of scores can be useful in evaluating model assumptions (e.g., linearity, outliers).

Enhanced Ordination Plots:
- 3-D plots of samples and variables. [Figure: three-dimensional ordination plot.]
- Overlays: sample scores scaled in relation to the magnitude of a variable. [Figure: overlay plot.]

- Overlays: scatter plot envelopes using quantile or robust spline regression. Envelopes are useful for assessing the shape of the response function (i.e., linear vs. unimodal) and, thus, determining model appropriateness. [Figure: envelope overlay.]
- Displaying groups: ordihulls and ordispiders. [Figure: convex hulls and spiders drawn around group members.]

- Displaying groups: ordiellipses and ordiarrows. [Figure: group ellipses and arrows.]
- Example of "worm tracks": marten habitat trajectories under simulated management regimes, traced through ordination space from a common starting point, with habitat extent shown along each trajectory. [Figure.]

Enhanced Ordination Plots - Fitting other variables (with permutation tests):
- Vector fitting: fit a continuous external variable as an arrow pointing in the direction of the ordination space along which it changes most rapidly. [Figure: fitted vectors with permutation-test results.]
- Factor fitting: fit a categorical external variable as the centroids of its levels. [Figure: fitted factor centroids with permutation-test results.]

- GAM surface fitting: fit a smooth response surface of an external variable across the ordination using a generalized additive model. [Figure: GAM results and fitted surface.]

PCA: Rotating the Principal Components

Purpose: To improve component interpretation by redistributing the variance from earlier components to later ones to achieve a simpler, theoretically more meaningful principal component structure; that is, by increasing the loadings of important variables and decreasing the loadings of unimportant variables.

- Orthogonal rotation: axes maintained at 90°.
  - varimax rotation
  - quartimax rotation
  - equimax rotation
- Oblique rotation: axes "not" maintained at 90°.

[Figure: loadings of variables V1-V5 plotted on PC1 vs. PC2 before and after rotation; orthogonal rotation keeps the axes at 90°, oblique rotation does not.]

Orthogonal Rotations:
- Varimax: a column rotation that simplifies the structure within each component and improves component interpretation (increases high loadings and decreases low loadings).
- Quartimax: a row rotation that simplifies the interpretation of the variables in terms of well-understood components (variables load highly on fewer components).
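Varimax is available in some libraries (e.g., newer scikit-learn FactorAnalysis), but the classic SVD-based iteration is short enough to sketch directly; this follows the widely used Kaiser-style algorithm, and the loading matrix is hypothetical.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a loading matrix L (variables x components).

    A standard SVD-based sketch of Kaiser's criterion, not the slides' own code.
    """
    p, k = L.shape
    T = np.eye(k)                          # accumulated rotation matrix
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ T                         # current rotated loadings
        # Gradient of the varimax criterion, solved by SVD.
        B = L.T @ (Lr ** 3 - (gamma / p) * Lr @ np.diag((Lr ** 2).sum(axis=0)))
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt
        d_old, d = d, s.sum()
        if d_old != 0 and d / d_old < 1 + tol:   # converged
            break
    return L @ T

# Hypothetical loadings for 4 variables on 2 retained components.
L = np.array([[0.7, 0.5], [0.6, 0.6], [0.5, -0.6], [0.6, -0.5]])
print(np.round(varimax(L), 3))
```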

Use and Limitations:
- Effective when the sample is not multivariate normal.
- Rotation always reduces the eigenvalue (variance) of the first component.
- Rotation always maintains the cumulative percent variance (i.e., the total variance accounted for by the retained components).
- The only true test of the usefulness of component rotation is whether the component interpretation improves.

Principal Components Analysis - Limitations:
- PCA can produce severely distorted ordinations of data sets with long gradients; other techniques perform better in most cases under these conditions.
- PCA assumes an underlying multivariate normal distribution, which is unlikely in most ecological data sets.
- PCA assumes a linear response model, e.g., that species respond linearly to underlying environmental gradients; this is unrealistic in many (but not all) ecological data sets, especially for long gradients.

Principal Components Analysis - Review:
- Eigenvalues: the variances of the PCs.
- Eigenvectors: the variable weights in the PC linear combinations.
- Structure coefficients (loadings): the correlations between the original variables and the PCs.
- Principal component scores: the locations of the samples on the PCs.
- Final communalities: the percent of variance in the original variables explained by the retained PCs.
