PCA Advanced Examples & Applications
- Edgar Smith
1 PCA Advanced Examples & Applications Objectives: Showcase advanced PCA analysis: - Addressing the assumptions - Improving the signal / decreasing the noise
2 Principal Components (PCA) Paper II Example: Ainley, D.G. et al. (2005). Objective: Relate densities of the 12 most abundant species of seabirds to 12 habitat variables: 5 biological, 4 oceanographic, 3 geographic (spatial) 82.3%
3 Principal Components (PCA) Paper II
Oceanographic variables examined: sea-surface temperature / salinity, thermocline depth / strength.
[Slide labels: Date; Distance to Fronts; Chl Max; Acoustic Biomass]
4 Principal Components (PCA) Paper II
Data Manipulations To Avoid Biases:
- Densities were log-transformed to meet normality assumptions.
- Nevertheless, residuals generated in the regressions for some species did not meet those assumptions (Skewness / Kurtosis Test for Normality of Residuals, p < 0.05).
- Least-squares regression analysis (ANOVA), however, is a very robust procedure with respect to non-normality (Seber, 1977; Kleinbaum et al., 1988).
- Yet, while these analyses yield the best linear unbiased estimator even in the absence of normally distributed residuals, p-values near 0.05 must be viewed with caution (Seber, 1977).
5 Principal Components (PCA) Paper II
To avoid double-absences:
- Only 15-min transects in which any given species was recorded were analyzed.
- The total sample size for the 12 species was 1209.
Is this an adequate sample size?
- Rule of thumb: 5 samples per variable (Tabachnick and Fidell 1989).
- 1209 / 12 ≈ 100 samples per variable.
6 Principal Components (PCA) Paper II
Analysis Methods:
- Principal components analysis (PCA), in combination with Sidak multiple comparison tests, was used to assess differences in habitat selection among 12 seabird species.
- To test for significant differences in habitat affinities among seabird species, two one-way ANOVAs were used: the first tested for differences among the PC1 scores of each species; the second compared the PC2 scores.
- Differences between two species were considered significant if either one or both PC scores differed significantly.
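The Sidak adjustment mentioned above can be illustrated with a short sketch. The slide does not state how many comparisons were made; the code below assumes, purely for illustration, that all pairs of the 12 species are compared, and the species labels are placeholders.

```python
from itertools import combinations


def sidak_alpha(alpha: float, n_tests: int) -> float:
    """Per-comparison significance level that keeps the family-wise rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


# Placeholder labels for the 12 species (assumption: every pair is compared).
species = [f"sp{i}" for i in range(1, 13)]
pairs = list(combinations(species, 2))
n_pairs = len(pairs)

per_test_alpha = sidak_alpha(0.05, n_pairs)
bonferroni_alpha = 0.05 / n_pairs  # shown only for comparison

print(f"{n_pairs} pairwise comparisons")
print(f"Sidak per-test alpha:      {per_test_alpha:.6f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.6f}")
```

With many comparisons the per-test threshold becomes very small, which is why pairwise differences on PC1 or PC2 need to be strong to be declared significant.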
7 Principal Components (PCA) Paper II Community-Wide Result: First and second PC axes explained 60% of variance in distribution of 12 species
8 Principal Components (PCA) Paper II
Species-specific Results:
- Species mapped onto two (independent) dimensions.
- Pair-wise associations (tested) denoted by circles.
[Figure: species plotted on the two PC axes; axis annotations: Near Fronts, Zoop Prey, Salty, Green, Fish Prey]
9 Principal Components (PCA) Comparisons
Number of Axes:
- Selected 2, easy to interpret (Ainley et al. 2005)
- Selected 6, based on eigenvalues > 1 (Weichler et al. 2004)
Display of Results:
- Plot & table of eigenvalues (Ainley et al. 2005)
- Eigenvalues & interpretation (description) (Weichler et al. 2004)
Significance Tests:
- Pairwise species comparisons (ANOVA) (Ainley et al. 2005)
- Correlations with selected variables (Weichler et al. 2004)
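The "eigenvalues > 1" criterion can be sketched as follows. The eigenvalues are hypothetical (they are not from either paper) and are assumed to come from a PCA on a correlation matrix, so they sum to the number of variables.

```python
def summarize_axes(eigenvalues):
    """Return (axis, eigenvalue, % variance, cumulative % variance) per axis."""
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for axis, ev in enumerate(eigenvalues, start=1):
        pct = 100.0 * ev / total
        cumulative += pct
        rows.append((axis, ev, pct, cumulative))
    return rows


def axes_above(eigenvalues, threshold=1.0):
    """Axes (1-based) whose eigenvalue exceeds the threshold."""
    return [axis for axis, ev in enumerate(eigenvalues, start=1) if ev > threshold]


# Hypothetical eigenvalues for a 6-variable correlation-matrix PCA.
eigenvalues = [2.7, 1.5, 0.9, 0.5, 0.3, 0.1]

rows = summarize_axes(eigenvalues)
retained = axes_above(eigenvalues)

for axis, ev, pct, cum in rows:
    print(f"Axis {axis}: eigenvalue={ev:.2f}  variance={pct:.1f}%  cumulative={cum:.1f}%")
print("Axes retained by the eigenvalue > 1 rule:", retained)
```

An axis with an eigenvalue above 1 explains more variance than a single standardized variable; choosing axes by interpretability instead, as in the first paper, is a separate judgment call.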
10 Influence of Distances in PCA PCA seeks the strongest patterns, with the largest distances: Remember: Outliers distort the real patterns in the data, by adding large distances
11 Mind the PCA Assumptions
- Because it uses linear combinations of response variables to create the axes, PCA is subject to the assumption of linearity in the relationships among the variables.
- Implicit assumption of linearity in the relationships between the responses and the gradients represented by the ordination axes.
- Ordination axes are uncorrelated (by definition, orthogonal).
- Assumptions are met by relatively homogeneous environmental datasets with few zeroes, and rarely met by species datasets.
- Difficulties arise when PCA is used on zero-rich species datasets, because they usually violate the normality and linearity assumptions (e.g., high skewness, difficult to normalize).
12 Reducing Noise / Improving Signal
Therefore, it is critical to determine whether a strong influence by dominant species / variables is consistent with the underlying assumptions and with your analysis objectives. Alternatively, consider what steps are needed to reduce their influence and to meet the assumptions.
Options for reducing the influence of dominant species:
- standardizing data to eliminate abundance differences
- deleting highly dominant / rare species
- creating subsets of variables to analyze separately
13 Standardizing the Data
Relativization re-scales all of the data at once, using a common criterion / standard.
- When it is done by columns (e.g., species), variation across plots is retained, but variation across species is standardized.
- When it is done by rows (e.g., plots), variation across species is retained, but variation across plots is standardized.
14 Data Relativizations in PC-ORD
Relativization by Maximum:
- When relativization by maximum is set for columns, each cell in a column is divided by the maximum value in that column, replacing absolute values with proportions of the maximum observed value across all sample units.
- This approach is used when the maximum observed value of a given response across all sample units is considered the maximum potential abundance for that response (e.g., a species) in this population.
- The relativized values for each response represent proportions of its maximum potential.
- It is applied to species data to equalize the influence of common / rare species and of abundant / nonabundant species.
15 Data Relativizations in PC-ORD
Relativization by Maximum: (input: x ≥ 0; output: from 0 to 1)
Divides each cell's value by the maximum (for the given row or the given column) such that values range from 0 to 1.
[Example tables with row and column sums not recovered]
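A minimal sketch of relativization by column maximum, written in plain Python rather than PC-ORD. The abundance matrix is made up (rows are sample units, columns are species), and the function name is hypothetical.

```python
def relativize_by_column_max(matrix):
    """Divide every cell by the maximum of its column (rows = samples, columns = species)."""
    if any(value < 0 for row in matrix for value in row):
        raise ValueError("relativization by maximum requires non-negative data")
    n_cols = len(matrix[0])
    col_max = [max(row[j] for row in matrix) for j in range(n_cols)]
    if any(m == 0 for m in col_max):
        raise ValueError("an all-zero column cannot be relativized by its maximum")
    return [[row[j] / col_max[j] for j in range(n_cols)] for row in matrix]


# Made-up abundances: the third species is far more abundant than the others.
abundance = [
    [0, 5, 100],
    [2, 10, 400],
    [8, 20, 50],
]

relativized = relativize_by_column_max(abundance)
for row in relativized:
    print(row)
```

After relativization every column peaks at 1, so the abundant third species no longer dominates the distances between sample units.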
16 Data Relativizations in PC-ORD General Relativization: General relativization by column totals reduces influence of responses with high total abundance relative to those with low total abundance, because observations are proportional to their intra-response total abundance. This retains the variation in abundances across sample units, but reduces the influence of very common species and increases the influence of rare species. NOTE: General relativization by row totals is applicable if your question of interest focuses on giving each sample the same influence. HINT: this is not reasonable when the columns have different variables
17 Data Relativizations PC-ORD
[Equation image not recovered; residual symbols: Y, X]
NOTE: p can take on two values: 1 or 2.
Remember: City Block vs Euclidean.
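The equation on this slide did not survive transcription. As a hedged reconstruction, general relativization by columns is usually written as below; the symbols are assumptions, not recovered from the slide: x_ij is the original value for sample unit i and column j, b_ij the relativized value, and n the number of sample units.

```latex
b_{ij} = \frac{x_{ij}}{\left( \sum_{i=1}^{n} x_{ij}^{\,p} \right)^{1/p}},
\qquad p \in \{1, 2\}
```

With p = 1 the denominator is the column total (city-block logic); with p = 2 it is the square root of the column sum of squares (Euclidean logic). Relativization by rows sums over j instead of i.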
18 Data Relativizations in PC-ORD
General Relativization: (input: x ≥ 0; output: from 0 to 1)
- If p = 1, relativization is by column (or row) totals. Appropriate for city-block distance measures (e.g., Sorensen).
- If p = 2, relativization is by the square root of the column (or row) sum of squares. Appropriate for the Euclidean distance measure.
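A small sketch of general relativization by columns for p = 1 and p = 2. It assumes the usual definition (each cell divided by the p-th root of the column sum of p-th powers); the function name and the data are hypothetical.

```python
def general_relativization(matrix, p=1):
    """Relativize columns by (sum of x**p) ** (1/p); p=1 uses totals, p=2 the Euclidean norm."""
    if p not in (1, 2):
        raise ValueError("p must be 1 or 2")
    if any(value < 0 for row in matrix for value in row):
        raise ValueError("general relativization requires non-negative data")
    n_cols = len(matrix[0])
    norms = [sum(row[j] ** p for row in matrix) ** (1.0 / p) for j in range(n_cols)]
    if any(norm == 0 for norm in norms):
        raise ValueError("an all-zero column cannot be relativized")
    return [[row[j] / norms[j] for j in range(n_cols)] for row in matrix]


# Made-up data: rows = sample units, columns = two species.
data = [
    [3, 10],
    [4, 30],
    [0, 60],
]

by_totals = general_relativization(data, p=1)
by_norm = general_relativization(data, p=2)

column_sums_p1 = [sum(row[j] for row in by_totals) for j in range(2)]
column_ss_p2 = [sum(row[j] ** 2 for row in by_norm) for j in range(2)]
print("p = 1 column sums:", column_sums_p1)
print("p = 2 column sums of squares:", column_ss_p2)
```

With p = 1 each column sums to 1; with p = 2 each column has a sum of squares of 1, which is the property that matters for Euclidean distances.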
19 General Relativizations
General Relativization (by totals) makes the area under each species' distribution response curve = 1.
[Example tables not recovered: "By columns generalized (p = 1)" and "By columns generalized (p = 2)", each with a row of column sums]
20 Other Data Relativization PC-ORD
- Deviations: Value − Mean
- Z scores: (Value − Mean) / SD
- Binary response: Above (1) / Below (0)
- Ranks: Assigns ranks, averaging ties (e.g., 0, 0, 6, 9 would receive the ranks 1.5, 1.5, 3, 4)
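The four transformations can be sketched in plain Python. Two details are assumptions, since the slide does not specify them: the binary cut-off is taken to be the mean, and SD is the sample standard deviation (n − 1 denominator).

```python
from statistics import mean, stdev


def deviations(values):
    """Value minus the mean."""
    m = mean(values)
    return [v - m for v in values]


def z_scores(values):
    """(Value - mean) / SD; undefined when every value is identical (SD = 0)."""
    m, sd = mean(values), stdev(values)
    if sd == 0:
        raise ValueError("standard deviates are undefined when SD = 0")
    return [(v - m) / sd for v in values]


def binary_about_mean(values):
    """1 if the value is above the mean, otherwise 0 (cut-off is an assumption)."""
    m = mean(values)
    return [1 if v > m else 0 for v in values]


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


column = [0, 0, 6, 9]
print("deviations:", deviations(column))
print("z scores:  ", [round(z, 3) for z in z_scores(column)])
print("binary:    ", binary_about_mean(column))
print("ranks:     ", average_ranks(column))
```

The `z_scores` guard also shows why standard deviates fail for an empty (all-zero) row or column: its SD is 0, so the division is undefined.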
21 Data Relativization Remember Note: Need to accept TEMP file
22 Relativizations Recommendations
- Do not use relativization by maximum when any data < 0.
- Do not use general relativization when any data < 0.
- Cannot use standard deviates with empty data groups (rows / columns). Why not?
- NOTE: Standard deviates are fine to use with negative data.
23 PCA Example Upwell
Where do we start? Data Exploration + Summarization
What do we look for?
- Value ranges: typos, possible transformations
- Unequal sums: different weights
24 PCA Example Upwell
What do we look for?
- -1 < Skewness < +1
- Few vacant cells
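The skewness screen can be sketched as below. The counts are made up, the moment-based skewness formula is one of several in use (software may apply a small-sample correction), and the log10(x + 1) transformation is an illustrative choice rather than the one used in the Upwell example.

```python
import math


def skewness(values):
    """Moment-based skewness: third central moment / (second central moment ** 1.5)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


def within_guideline(values, limit=1.0):
    """True when -limit < skewness < +limit."""
    return abs(skewness(values)) < limit


# Made-up counts with a long right tail.
counts = [0, 1, 1, 2, 3, 5, 8, 40, 120]
logged = [math.log10(v + 1) for v in counts]

print(f"raw skewness:    {skewness(counts):.2f}  ok={within_guideline(counts)}")
print(f"logged skewness: {skewness(logged):.2f}  ok={within_guideline(logged)}")
```

The raw counts fall outside the −1 to +1 guideline, while the log-transformed values fall inside it.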
25 PCA Example Upwell How can we solve unequal sums (weights) of variables? Relativize by Maximum (Columns)
26 PCA Example Upwell How can we solve unequal sums (weights) of variables? Standard Deviates by Columns
27 Mind your Relativizations Not all datasets amenable to all relativizations: some are mathematically incompatible, others fail to relativize the samples / species. Check Data Ranges / Sums BEFORE Check Data Ranges / Sums AFTER
28 Rotating the Ordination PCA seeks the strongest patterns, with the largest distances: The resulting ordination can be rotated to look at specific patterns
29 PCA Tools Rotation
Results: Rotation (VARIMAX)
[Figure: two panels, "Not rotated" and "Varimax rotated"; samples shown as points and species as vectors (Eup, Poapra, Broine, Agrrep, Cardra, Agrsto, Desces); axes: Axis 1 and Axis 2]
Comparison of ordination of sample units in species space before and after varimax rotation. Note the improved alignment of the species vectors with the ordination axes in the rotated ordination.
30 PCA Tools - Rotation Rotation to align patterns from separate ordinations facilitates comparisons across studies: In Ordination 1, the point cloud has been rotated to maximize loading of Variable 1 onto Axis 1. In Ordination 2, the same dominant trends were found but at an angle to those found in Ordination 1. Therefore, Ordination 2 can be rotated through an angle (shown by arrow) so that it aligns Variable 1 with Axis 1.
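Rotating an ordination through an angle so that one variable lines up with Axis 1 can be sketched in two dimensions. The loading vector and sample scores are hypothetical; this is a plain rigid rotation, not PC-ORD's implementation.

```python
import math


def rotate(points, angle):
    """Rotate 2-D points counter-clockwise by `angle` radians about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def angle_to_align_with_axis1(vector):
    """Angle that brings `vector` onto the positive Axis 1 direction."""
    return -math.atan2(vector[1], vector[0])


def distance(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Hypothetical Ordination 2: Variable 1 loads equally on both axes.
variable1 = (0.6, 0.6)
samples = [(1.0, 2.0), (-0.5, 0.3), (2.2, -1.1)]

angle = angle_to_align_with_axis1(variable1)
rotated_variable1 = rotate([variable1], angle)[0]
rotated_samples = rotate(samples, angle)

print("rotation angle (degrees):", round(math.degrees(angle), 1))
print("Variable 1 after rotation:", tuple(round(v, 4) for v in rotated_variable1))
```

The rotation changes loadings and axis correlations but leaves the distances among sample units untouched, which is why rotated and unrotated ordinations show the same point cloud.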
31 PCA Tools Rotation
Rotation aligns ordination to highlight certain patterns.
- Rotation by NEDO: stretch plot along direction of most variation for species.
- NEDO axes loadings: Axis 1: [value not recovered]; Axis 2: [value not recovered]
32 PCA Tools Rotation
Looking at a Specific Species Response
- NEDO correlations: Axis 1: [value not recovered]; Axis 2: [value not recovered]
33 PCA Tools - Rotation Rotation aligns ordination to highlight certain patterns NOTE: loadings of the species on the axes and the correlations of the species with the axes will change after rotation is implemented
34 Mind your Rotations
Report all rotations in results.
- Check Axis Correlations / % Variance BEFORE
- Check Axis Correlations / % Variance AFTER
36 PCA Next Steps Example 1 Use PCA to synthesize cross-correlated environmental variables into independent (orthogonal) patterns Use new synthetic variables to compare categorical variables (groups) using ANOVA / GLMs
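A minimal sketch of turning cross-correlated variables into uncorrelated synthetic variables, assuming NumPy is available. The data are simulated, and the PCA is run on the correlation matrix, which is an assumption (the slides do not say whether covariance or correlation was used).

```python
import numpy as np


def pca_correlation(data):
    """PCA on the correlation matrix; returns eigenvalues, eigenvectors, scores, % variance."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)  # standardize columns
    corr = np.corrcoef(z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # eigh returns ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    scores = z @ eigenvectors  # one row of synthetic variables per sample
    pct = 100.0 * eigenvalues / eigenvalues.sum()
    return eigenvalues, eigenvectors, scores, pct


# Simulated environment: two variables track one gradient, a third is independent.
rng = np.random.default_rng(0)
gradient = rng.normal(size=60)
data = np.column_stack([
    gradient + 0.3 * rng.normal(size=60),
    -gradient + 0.3 * rng.normal(size=60),
    rng.normal(size=60),
])

eigenvalues, eigenvectors, scores, pct = pca_correlation(data)
print("% variance per axis:", np.round(pct, 1))
print("correlations among scores:")
print(np.round(np.corrcoef(scores, rowvar=False), 6))
```

The score columns are mutually uncorrelated, so each can be used as a separate response or predictor in an ANOVA or GLM without the collinearity present in the original variables.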
37 PCA Example of Next Steps
Principal Component Analysis (PCA) used to assess patterns of shared variation in 71 POP analytes: 6 DDTs, 47 PCB congeners, 8 chlordane isomers, 3 hexachlorohexanes, dieldrin, mirex, aldrin, hexachlorobenzene, and 10 PBDE congeners.
Considered three categorical variables:
- Three age / sex groups: juveniles, adult males, and adult females.
- Two sample origins: necropsy (dead) / biopsy (alive).
- Two tissues sampled: serum (blood) and fat.
38 PCA Example of Next Steps
Sample Outliers:
- Data were log-transformed and examined for outliers (> 3 standard deviations from the mean).
- Two adult male outliers (one high and one low) were removed from the statistical analysis following this criterion.
Empty Variables:
- POP analytes that were below LOQ in > 75% of samples were removed to reach the recommended 5:1 sample / variable ratio.
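Both screening steps can be sketched in plain Python. All concentrations, analyte names, and LOQ values are invented, and the log base (10) and sample standard deviation are assumptions.

```python
import math
from statistics import mean, stdev


def flag_outliers(values, n_sd=3.0):
    """Flag values whose log10 lies more than n_sd standard deviations from the mean."""
    logged = [math.log10(v) for v in values]
    m, sd = mean(logged), stdev(logged)
    return [abs(x - m) > n_sd * sd for x in logged]


def drop_mostly_censored(table, loq, max_fraction=0.75):
    """Keep analytes that are below their LOQ in at most max_fraction of samples."""
    kept = {}
    for analyte, values in table.items():
        below = sum(1 for v in values if v < loq[analyte])
        if below / len(values) <= max_fraction:
            kept[analyte] = values
    return kept


# Invented concentrations for one analyte: 19 typical samples and one extreme value.
concentrations = [10, 12, 9, 11, 10, 13, 8, 10, 11, 9,
                  12, 10, 11, 9, 10, 12, 11, 10, 9, 10000]
flags = flag_outliers(concentrations)
print("outlier positions:", [i for i, flagged in enumerate(flags) if flagged])

# Invented analytes: "A" is below its LOQ in 4 of 5 samples (80%), "B" in 1 of 5.
table = {"A": [0.1, 0.1, 0.1, 0.1, 5.0], "B": [2.0, 3.0, 0.1, 4.0, 5.0]}
loq = {"A": 1.0, "B": 1.0}
kept = drop_mostly_censored(table, loq)
print("analytes kept:", sorted(kept))
```

Dropping mostly censored analytes reduces the number of variables, which raises the sample-to-variable ratio toward the 5:1 guideline.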
39 PCA Example of Next Steps Significant PCA axes selected using alpha = 0.05, using 999 randomizations. One significant PCA axis accounted for 74.89% of variance.
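The slide does not describe the randomization scheme. One common approach, sketched here as an assumption and using NumPy, shuffles each column independently to destroy the correlations and counts how often a randomized eigenvalue matches or exceeds the observed one.

```python
import numpy as np


def sorted_eigenvalues(data):
    """Eigenvalues of the correlation matrix, largest first."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def randomization_p_values(data, n_random=999, seed=0):
    """p-value per axis: share of runs (plus the observed one) with eigenvalue >= observed."""
    rng = np.random.default_rng(seed)
    observed = sorted_eigenvalues(data)
    exceed = np.zeros_like(observed)
    for _ in range(n_random):
        # Shuffle every column on its own so that cross-correlations are broken.
        shuffled = np.column_stack([rng.permutation(column) for column in data.T])
        exceed += sorted_eigenvalues(shuffled) >= observed
    return observed, (exceed + 1) / (n_random + 1)


# Simulated data: two variables share a gradient, a third is independent noise.
rng = np.random.default_rng(1)
gradient = rng.normal(size=60)
data = np.column_stack([
    gradient + 0.3 * rng.normal(size=60),
    -gradient + 0.3 * rng.normal(size=60),
    rng.normal(size=60),
])

observed, p_values = randomization_p_values(data)
significant_axes = [axis + 1 for axis, p in enumerate(p_values) if p <= 0.05]
print("observed eigenvalues:", np.round(observed, 3))
print("randomization p-values:", p_values)
print("axes significant at alpha = 0.05:", significant_axes)
```

With 999 randomizations the smallest attainable p-value is 1 / 1000 = 0.001.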
40 PCA Example of Next Steps
- Sample loading values were compared using ANOVA to assess whether common patterns of POP levels were associated with different age/sex groups (juvenile, adult male, adult female), origin (live biopsy vs. necropsy), or tissue (blubber vs. serum).
- No significant differences among age/sex groups in PCA loading values, indicating that shared variation of POPs did not differ between age/sex groups.
- Significant difference between the 2 sample origins (p = 0.02), suggesting a difference in POPs between necropsy and live animal samples.
- Significant difference between blubber samples and serum samples (p < 0.001).
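The group comparison can be sketched as a one-way ANOVA F statistic computed on axis values for each group. The numbers are invented and are not the study's data; the p-value would come from the F distribution (for example via `scipy.stats.f_oneway`, if SciPy is available).

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of values."""
    all_values = [v for group in groups for v in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    ss_between = sum(
        len(group) * (m - grand_mean) ** 2 for group, m in zip(groups, group_means)
    )
    ss_within = sum(
        (v - m) ** 2 for group, m in zip(groups, group_means) for v in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented PC1 values for three groups (e.g., juveniles, adult males, adult females).
pc1_by_group = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0],
]

f_stat, df_between, df_within = one_way_anova_f(pc1_by_group)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F means the group means on that axis differ by more than the within-group scatter would suggest.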
41 PCA Next Steps Example 2 Use PCA to synthesize cross-correlated environmental variables into independent (orthogonal) patterns Use new synthetic variables to explain other response variables (like species counts) using other statistical methods (GLMs, GAMs)
42 PCA Next Steps
Published Example: Ainley & Hyrenbach (2010).
Objective: Relate seabird densities to five cross-correlated environmental variables: MEI, PDO, upwelling 39, upwelling 36, SST
43 PCA Next Steps Objective: Also considered lagged environmental data: winter, early spring, late spring
44 PCA Next Steps
Results:
- Four PC axes described 83% of variability.
- Assessed temporal trends in PC factors using Spearman rank correlations (df = 19, rs critical = 0.433).
- Tests indicated no trends in spring-time environmental conditions sampled during the study period: PC1 (rs = 0.195, 0.50 > p > 0.20), PC2 (rs = [not recovered], 0.50 > p > 0.20), PC3 (rs = 0.005, p > 0.50), and PC4 (rs = [not recovered], p > 0.50).
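The trend test can be sketched as a Spearman rank correlation between year and a PC factor, compared against the critical value quoted on the slide (0.433 for df = 19). The years and scores below are hypothetical; 21 years are assumed because df = n − 2.

```python
def average_ranks(values):
    """Ranks starting at 1, with ties sharing the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rs = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def has_trend(rs, critical=0.433):
    """True when |rs| exceeds the critical value (0.433 is the slide's value for df = 19)."""
    return abs(rs) > critical


years = list(range(1985, 2006))                 # 21 hypothetical years
pc_scores = [(i * 8) % 21 for i in range(21)]   # hypothetical, scrambled factor values

rs = spearman(years, pc_scores)
print(f"rs = {rs:.3f}, trend detected: {has_trend(rs)}")
print("slide value rs = 0.195, trend detected:", has_trend(0.195))
```

An |rs| below the critical value, as for PC1 on the slide, gives no evidence of a monotonic trend over the study period.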
45 PCA Next Steps Results: Related seabird densities to 4 PC factors and time using GLM tests: - R squared - P value - # of variables
46 PCA Next Steps Results: Species with significant responses to PC1
47 Summary Next Steps 1
- PCA synthesized complex patterns into orthogonal axes.
- Other statistical tests were performed with the resulting PC loadings.
- This allows performing categorical comparisons (e.g., ANOVA).
48 Summary Next Steps 2
- PCA synthesized complex patterns into orthogonal axes.
- Other statistical tests were performed with the resulting PC loadings.
- This allows relating species abundances (non-normal data) to the PCA factors using other statistics (e.g., GLMs, GAMs).
Distance Measures. Objectives: Discuss Distance Measures Illustrate Distance Measures
Distance Measures Objectives: Discuss Distance Measures Illustrate Distance Measures Quantifying Data Similarity Multivariate Analyses Re-map the data from Real World Space to Multi-variate Space Distance
More informationUnconstrained Ordination
Unconstrained Ordination Sites Species A Species B Species C Species D Species E 1 0 (1) 5 (1) 1 (1) 10 (4) 10 (4) 2 2 (3) 8 (3) 4 (3) 12 (6) 20 (6) 3 8 (6) 20 (6) 10 (6) 1 (2) 3 (2) 4 4 (5) 11 (5) 8 (5)
More informationANOVA approach. Investigates interaction terms. Disadvantages: Requires careful sampling design with replication
ANOVA approach Advantages: Ideal for evaluating hypotheses Ideal to quantify effect size (e.g., differences between groups) Address multiple factors at once Investigates interaction terms Disadvantages:
More informationEXAM PRACTICE. 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False
EXAM PRACTICE 12 questions * 4 categories: Statistics Background Multivariate Statistics Interpret True / False Stats 1: What is a Hypothesis? A testable assertion about how the world works Hypothesis
More informationPrincipal Component Analysis (PCA) Theory, Practice, and Examples
Principal Component Analysis (PCA) Theory, Practice, and Examples Data Reduction summarization of data with many (p) variables by a smaller set of (k) derived (synthetic, composite) variables. p k n A
More information-Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the
1 2 3 -Principal components analysis is by far the oldest multivariate technique, dating back to the early 1900's; ecologists have used PCA since the 1950's. -PCA is based on covariance or correlation
More informationMultivariate Fundamentals: Rotation. Exploratory Factor Analysis
Multivariate Fundamentals: Rotation Exploratory Factor Analysis PCA Analysis A Review Precipitation Temperature Ecosystems PCA Analysis with Spatial Data Proportion of variance explained Comp.1 + Comp.2
More informationDimensionality Reduction Techniques (DRT)
Dimensionality Reduction Techniques (DRT) Introduction: Sometimes we have lot of variables in the data for analysis which create multidimensional matrix. To simplify calculation and to get appropriate,
More informationInferences About the Difference Between Two Means
7 Inferences About the Difference Between Two Means Chapter Outline 7.1 New Concepts 7.1.1 Independent Versus Dependent Samples 7.1. Hypotheses 7. Inferences About Two Independent Means 7..1 Independent
More informationCHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)
FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter
More informationBIO 682 Multivariate Statistics Spring 2008
BIO 682 Multivariate Statistics Spring 2008 Steve Shuster http://www4.nau.edu/shustercourses/bio682/index.htm Lecture 11 Properties of Community Data Gauch 1982, Causton 1988, Jongman 1995 a. Qualitative:
More informationPrincipal component analysis
Principal component analysis Motivation i for PCA came from major-axis regression. Strong assumption: single homogeneous sample. Free of assumptions when used for exploration. Classical tests of significance
More informationMultivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis
Multivariate Data Analysis a survey of data reduction and data association techniques: Principal Components Analysis For example Data reduction approaches Cluster analysis Principal components analysis
More informationAdvanced Mantel Test
Advanced Mantel Test Objectives: Illustrate Flexibility of Simple Mantel Test Discuss the Need and Rationale for the Partial Mantel Test Illustrate the use of the Partial Mantel Test Summary Mantel Test
More information1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College
1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative
More informationMachine Learning (Spring 2012) Principal Component Analysis
1-71 Machine Learning (Spring 1) Principal Component Analysis Yang Xu This note is partly based on Chapter 1.1 in Chris Bishop s book on PRML and the lecture slides on PCA written by Carlos Guestrin in
More informationCOMPARING SEVERAL MEANS: ANOVA
LAST UPDATED: November 15, 2012 COMPARING SEVERAL MEANS: ANOVA Objectives 2 Basic principles of ANOVA Equations underlying one-way ANOVA Doing a one-way ANOVA in R Following up an ANOVA: Planned contrasts/comparisons
More informationMultivariate Statistics 101. Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis
Multivariate Statistics 101 Ordination (PCA, NMDS, CA) Cluster Analysis (UPGMA, Ward s) Canonical Correspondence Analysis Multivariate Statistics 101 Copy of slides and exercises PAST software download
More informationROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015
ROBERTO BATTITI, MAURO BRUNATO. The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Apr 2015 http://intelligentoptimization.org/lionbook Roberto Battiti
More informationLinear Dimensionality Reduction
Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis
More informationMultivariate analysis of genetic data: exploring groups diversity
Multivariate analysis of genetic data: exploring groups diversity T. Jombart Imperial College London Bogota 01-12-2010 1/42 Outline Introduction Clustering algorithms Hierarchical clustering K-means Multivariate
More informationStatistics 202: Data Mining. c Jonathan Taylor. Week 2 Based in part on slides from textbook, slides of Susan Holmes. October 3, / 1
Week 2 Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Part I Other datatypes, preprocessing 2 / 1 Other datatypes Document data You might start with a collection of
More informationPart I. Other datatypes, preprocessing. Other datatypes. Other datatypes. Week 2 Based in part on slides from textbook, slides of Susan Holmes
Week 2 Based in part on slides from textbook, slides of Susan Holmes Part I Other datatypes, preprocessing October 3, 2012 1 / 1 2 / 1 Other datatypes Other datatypes Document data You might start with
More informationPrincipal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis. Chris Funk. Lecture 17
Principal Component Analysis-I Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 17 Outline Filters and Rotations Generating co-varying random fields Translating co-varying fields into
More information2/26/2017. This is similar to canonical correlation in some ways. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2
PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 What is factor analysis? What are factors? Representing factors Graphs and equations Extracting factors Methods and criteria Interpreting
More informationLECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS
LECTURE 4 PRINCIPAL COMPONENTS ANALYSIS / EXPLORATORY FACTOR ANALYSIS NOTES FROM PRE- LECTURE RECORDING ON PCA PCA and EFA have similar goals. They are substantially different in important ways. The goal
More informationCHAPTER 4 CRITICAL GROWTH SEASONS AND THE CRITICAL INFLOW PERIOD. The numbers of trawl and by bag seine samples collected by year over the study
CHAPTER 4 CRITICAL GROWTH SEASONS AND THE CRITICAL INFLOW PERIOD The numbers of trawl and by bag seine samples collected by year over the study period are shown in table 4. Over the 18-year study period,
More information4.1. Introduction: Comparing Means
4. Analysis of Variance (ANOVA) 4.1. Introduction: Comparing Means Consider the problem of testing H 0 : µ 1 = µ 2 against H 1 : µ 1 µ 2 in two independent samples of two different populations of possibly
More informationParametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami
Parametric versus Nonparametric Statistics-when to use them and which is more powerful? Dr Mahmoud Alhussami Parametric Assumptions The observations must be independent. Dependent variable should be continuous
More informationIntroduction to Linear regression analysis. Part 2. Model comparisons
Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual
More informationGeneralised linear models. Response variable can take a number of different formats
Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion
More informationAnalysis of Variance: Part 1
Analysis of Variance: Part 1 Oneway ANOVA When there are more than two means Each time two means are compared the probability (Type I error) =α. When there are more than two means Each time two means are
More informationPrinciple Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA
Principle Components Analysis (PCA) Relationship Between a Linear Combination of Variables and Axes Rotation for PCA Principle Components Analysis: Uses one group of variables (we will call this X) In
More informationRevision: Chapter 1-6. Applied Multivariate Statistics Spring 2012
Revision: Chapter 1-6 Applied Multivariate Statistics Spring 2012 Overview Cov, Cor, Mahalanobis, MV normal distribution Visualization: Stars plot, mosaic plot with shading Outlier: chisq.plot Missing
More informationFactor analysis. George Balabanis
Factor analysis George Balabanis Key Concepts and Terms Deviation. A deviation is a value minus its mean: x - mean x Variance is a measure of how spread out a distribution is. It is computed as the average
More informationREVIEW 8/2/2017 陈芳华东师大英语系
REVIEW Hypothesis testing starts with a null hypothesis and a null distribution. We compare what we have to the null distribution, if the result is too extreme to belong to the null distribution (p
More informationStatistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018
Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical
More informationANCOVA. Lecture 9 Andrew Ainsworth
ANCOVA Lecture 9 Andrew Ainsworth What is ANCOVA? Analysis of covariance an extension of ANOVA in which main effects and interactions are assessed on DV scores after the DV has been adjusted for by the
More informationTanagra Tutorials. In this tutorial, we show how to perform this kind of rotation from the results of a standard PCA in Tanagra.
Topic Implementing the VARIMAX rotation in a Principal Component Analysis. A VARIMAX rotation is a change of coordinates used in principal component analysis 1 (PCA) that maximizes the sum of the variances
More informationCanonical Correlation & Principle Components Analysis
Canonical Correlation & Principle Components Analysis Aaron French Canonical Correlation Canonical Correlation is used to analyze correlation between two sets of variables when there is one set of IVs
More informationANOVA Situation The F Statistic Multiple Comparisons. 1-Way ANOVA MATH 143. Department of Mathematics and Statistics Calvin College
1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College An example ANOVA situation Example (Treating Blisters) Subjects: 25 patients with blisters Treatments: Treatment A, Treatment
More informationMATH 829: Introduction to Data Mining and Analysis Principal component analysis
1/11 MATH 829: Introduction to Data Mining and Analysis Principal component analysis Dominique Guillot Departments of Mathematical Sciences University of Delaware April 4, 2016 Motivation 2/11 High-dimensional
More informationTrites and Larkin, 1996). The dashed line shows the division between the declining
Fig. 1. Locations of major geographic features cited in the text. The inserted graph shows estimated numbers of Steller sea lions (all ages) in Alaska from 1956 to 2000 (based on Trites and Larkin, 1996).
More informationNoise & Data Reduction
Noise & Data Reduction Paired Sample t Test Data Transformation - Overview From Covariance Matrix to PCA and Dimension Reduction Fourier Analysis - Spectrum Dimension Reduction 1 Remember: Central Limit
More informationExperimental Design and Data Analysis for Biologists
Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1
More informationDistances and similarities Based in part on slides from textbook, slides of Susan Holmes. October 3, Statistics 202: Data Mining
Distances and similarities Based in part on slides from textbook, slides of Susan Holmes October 3, 2012 1 / 1 Similarities Start with X which we assume is centered and standardized. The PCA loadings were
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline
More informationLinear Methods for Regression. Lijun Zhang
Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived
More informationIntroduction to Machine Learning
10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what
More informationAssignment 3. Introduction to Machine Learning Prof. B. Ravindran
Assignment 3 Introduction to Machine Learning Prof. B. Ravindran 1. In building a linear regression model for a particular data set, you observe the coefficient of one of the features having a relatively
More information1 A factor can be considered to be an underlying latent variable: (a) on which people differ. (b) that is explained by unknown variables
1 A factor can be considered to be an underlying latent variable: (a) on which people differ (b) that is explained by unknown variables (c) that cannot be defined (d) that is influenced by observed variables
More informationEigenvalues, Eigenvectors, and an Intro to PCA
Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.
More informationEigenvalues, Eigenvectors, and an Intro to PCA
Eigenvalues, Eigenvectors, and an Intro to PCA Eigenvalues, Eigenvectors, and an Intro to PCA Changing Basis We ve talked so far about re-writing our data using a new set of variables, or a new basis.
More informationInter Item Correlation Matrix (R )
7 1. I have the ability to influence my child s well-being. 2. Whether my child avoids injury is just a matter of luck. 3. Luck plays a big part in determining how healthy my child is. 4. I can do a lot
More informationPrincipal Components Analysis (PCA)
Principal Components Analysis (PCA) Principal Components Analysis (PCA) a technique for finding patterns in data of high dimension Outline:. Eigenvectors and eigenvalues. PCA: a) Getting the data b) Centering
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationStructure in Data. A major objective in data analysis is to identify interesting features or structure in the data.
Structure in Data A major objective in data analysis is to identify interesting features or structure in the data. The graphical methods are very useful in discovering structure. There are basically two