A User's Guide To Principal Components

A User's Guide To Principal Components
J. EDWARD JACKSON
A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York, Chichester, Brisbane, Toronto, Singapore

Contents

Preface

Introduction

1. Getting Started
  1.1 Introduction
  1.2 A Hypothetical Example
  1.3 Characteristic Roots and Vectors
  1.4 The Method of Principal Components
  1.5 Some Properties of Principal Components
  1.6 Scaling of Characteristic Vectors
  1.7 Using Principal Components in Quality Control

2. PCA With More Than Two Variables
  2.1 Introduction
  2.2 Sequential Estimation of Principal Components
  2.3 Ballistic Missile Example
  2.4 Covariance Matrices of Less than Full Rank
  2.5 Characteristic Roots are Equal or Nearly So
  2.6 A Test for Equality of Roots
  2.7 Residual Analysis
  2.8 When to Stop?
  2.9 A Photographic Film Example
  2.10 Uses of PCA

3. Scaling of Data
  3.1 Introduction
  3.2 Data as Deviations from the Mean: Covariance Matrices
  3.3 Data in Standard Units: Correlation Matrices
  3.4 Data are not Scaled at All: Product or Second Moment Matrices
  3.5 Double-centered Matrices
  3.6 Weighted PCA
  3.7 Complex Variables

4. Inferential Procedures
  4.1 Introduction
  4.2 Sampling Properties of Characteristic Roots and Vectors
  4.3 Optimality
  4.4 Tests for Equality of Characteristic Roots
  4.5 Distribution of Characteristic Roots
  4.6 Significance Tests for Characteristic Vectors: Confirmatory PCA
  4.7 Inference with Regard to Correlation Matrices
  4.8 The Effect of Nonnormality
  4.9 The Complex Domain

5. Putting It All Together: Hearing Loss I
  5.1 Introduction
  5.2 The Data
  5.3 Principal Component Analysis
  5.4 Data Analysis

6. Operations with Group Data
  6.1 Introduction
  6.2 Rational Subgroups and Generalized T-statistics
  6.3 Generalized T-statistics Using PCA
  6.4 Generalized Residual Analysis
  6.5 Use of Hypothetical or Sample Means and Covariance Matrices
  6.6 Numerical Example: A Color Film Process
  6.7 Generalized T-statistics and the Multivariate Analysis of Variance

7. Vector Interpretation I: Simplifications and Inferential Techniques
  7.1 Introduction
  7.2 Interpretation. Some General Rules
  7.3 Simplification
  7.4 Use of Confirmatory PCA
  7.5 Correlation of Vector Coefficients

8. Vector Interpretation II: Rotation
  8.1 Introduction
  8.2 Simple Structure
  8.3 Simple Rotation
  8.4 Rotation Methods
  8.5 Some Comments About Rotation
  8.6 Procrustes Rotation

9. A Case History: Hearing Loss II
  9.1 Introduction
  9.2 The Data
  9.3 Principal Component Analysis
  9.4 Allowance for Age
  9.5 Putting it all Together
  9.6 Analysis of Groups

10. Singular Value Decomposition: Multidimensional Scaling I
  10.1 Introduction
  10.2 R- and Q-analysis
  10.3 Singular Value Decomposition
  10.4 Introduction to Multidimensional Scaling
  10.5 Biplots
  10.6 MDPREF
  10.7 Point-Point Plots
  10.8 Correspondence Analysis
  10.9 Three-Way PCA
  10.10 N-Mode PCA

11. Distance Models: Multidimensional Scaling II
  11.1 Similarity Models
  11.2 An Example
  11.3 Data Collection Techniques
  11.4 Enhanced MDS Scaling of Similarities
  11.5 Do Horseshoes Bring Good Luck?
  11.6 Scaling Individual Differences
  11.7 External Analysis of Similarity Spaces
  11.8 Other Scaling Techniques, Including One-Dimensional Scales

12. Linear Models I: Regression; PCA of Predictor Variables
  12.1 Introduction
  12.2 Classical Least Squares
  12.3 Principal Components Regression
  12.4 Methods Involving Multiple Responses
  12.5 Partial Least-Squares Regression
  12.6 Redundancy Analysis
  12.7 Summary

13. Linear Models II: Analysis of Variance; PCA of Response Variables
  13.1 Introduction
  13.2 Univariate Analysis of Variance
  13.3 MANOVA
  13.4 Alternative MANOVA using PCA
  13.5 Comparison of Methods
  13.6 Extension to Other Designs
  13.7 An Application of PCA to Univariate ANOVA

14. Other Applications of PCA
  14.1 Missing Data
  14.2 Using PCA to Improve Data Quality
  14.3 Tests for Multivariate Normality
  14.4 Variate Selection
  14.5 Discriminant Analysis and Cluster Analysis
  14.6 Time Series

15. Flatland: Special Procedures for Two Dimensions
  15.1 Construction of a Probability Ellipse
  15.2 Inferential Procedures for the Orthogonal Regression Line
  15.3 Correlation Matrices
  15.4 Reduced Major Axis

16. Odds and Ends
  16.1 Introduction
  16.2 Generalized PCA
  16.3 Cross-Validation
  16.4 Sensitivity
  16.5 Robust PCA
  16.6 g-group PCA
  16.7 PCA When Data Are Functions
  16.8 PCA With Discrete Data
  16.9 [Odds and Ends] 2

17. What is Factor Analysis Anyhow?
  17.1 Introduction
  17.2 The Factor Analysis Model
  17.3 Estimation Methods
  17.4 Class I Estimation Procedures
  17.5 Class II Estimation Procedures
  17.6 Comparison of Estimation Procedures
  17.7 Factor Score Estimates
  17.8 Confirmatory Factor Analysis
  17.9 Other Factor Analysis Techniques
  17.10 Just What is Factor Analysis Anyhow?

18. Other Competitors
  18.1 Introduction
  18.2 Image Analysis
  18.3 Triangularization Methods
  18.4 Arbitrary Components
  18.5 Subsets of Variables
  18.6 Andrews' Function Plots

Conclusion

Appendix A. Matrix Properties
  A.1 Introduction
  A.2 Definitions
  A.3 Operations with Matrices

Appendix B. Matrix Algebra Associated with Principal Component Analysis

Appendix C. Computational Methods
  C.1 Introduction
  C.2 Solution of the Characteristic Equation
  C.3 The Power Method
  C.4 Higher-Level Techniques
  C.5 Computer Packages

Appendix D. A Directory of Symbols and Definitions for PCA
  D.1 Symbols
  D.2 Definitions

Appendix E. Some Classic Examples
  E.1 Introduction
  E.2 Examples for which the Original Data are Available
  E.3 Covariance or Correlation Matrices Only

Appendix F. Data Sets Used in This Book
  F.1 Introduction
  F.2 Chemical Example
  F.3 Grouped Chemical Example
  F.4 Ballistic Missile Example
  F.5 Black-and-White Film Example
  F.6 Color Film Example
  F.7 Color Print Example
  F.8 Seventh-Grade Tests
  F.9 Absorbence Curves
  F.10 Complex Variables Example
  F.11 Audiometric Example
  F.12 Audiometric Case History
  F.13 Rotation Demonstration
  F.14 Physical Measurements
  F.15 Rectangular Data Matrix
  F.16 Horseshoe Example
  F.17 Presidential Hopefuls
  F.18 Contingency Table Demo: Brand vs. Sex
  F.19 Contingency Table Demo: Brand vs. Age
  F.20 Three-Way Contingency Table
  F.21 Occurrence of Personal Assault
  F.22 Linnerud Data
  F.23 Bivariate Nonnormal Distribution
  F.24 Circle Data
  F.25 United States Budget

Appendix G. Tables
  G.1 Table of the Normal Distribution
  G.2 Table of the t-distribution
  G.3 Table of the Chi-square Distribution
  G.4 Table of the F-Distribution
  G.5 Table of the Lawley-Hotelling Trace Statistic
  G.6 Tables of the Extreme Roots of a Covariance Matrix

Bibliography

Author Index

Subject Index