Applied Multivariate Statistical Analysis
Richard Johnson, Dean Wichern
Sixth Edition
Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England and Associated Companies throughout the world

Visit us on the World Wide Web at: www.pearsoned.co.uk

© Pearson Education Limited 2014

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a licence permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

All trademarks used herein are the property of their respective owners. The use of any trademark in this text does not vest in the author or publisher any trademark ownership rights in such trademarks, nor does the use of such trademarks imply any affiliation with or endorsement of this book by such owners.

ISBN 10: 1-292-02494-1
ISBN 13: 978-1-292-02494-3 (Print)
ISBN 13: 978-1-292-03757-8 (PDF)

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Printed in the United States of America
Applied Multivariate Statistical Analysis: Pearson New International Edition

Cover

Chapter 1: Aspects of Multivariate Analysis
  1.1 Introduction
  1.2 Applications of Multivariate Techniques
  1.3 The Organization of Data
    Arrays
    Descriptive Statistics
    Graphical Techniques
  1.4 Data Displays and Pictorial Representations
    Linking Multiple Two-Dimensional Scatter Plots
    Graphs of Growth Curves
    Stars
    Chernoff Faces
  1.5 Distance
  1.6 Final Comments

Chapter 2: Sample Geometry and Random Sampling
  2.1 Introduction
  2.2 The Geometry of the Sample
  2.3 Random Samples and the Expected Values of the Sample Mean and Covariance Matrix
  2.4 Generalized Variance
    Generalized Variance Determined by |R| and Its Geometrical Interpretation
    Another Generalization of Variance
  2.5 Sample Mean, Covariance, and Correlation as Matrix Operations
  2.6 Sample Values of Linear Combinations of Variables

Chapter 3: Matrix Algebra and Random Vectors
  3.1 Introduction
  3.2 Some Basics of Matrix and Vector Algebra
    Vectors
  3.3 Positive Definite Matrices
  3.4 A Square-Root Matrix
  3.5 Random Vectors and Matrices
  3.6 Mean Vectors and Covariance Matrices
    Partitioning the Covariance Matrix
    The Mean Vector and Covariance Matrix for Linear Combinations of Random Variables
    Partitioning the Sample Mean Vector and Covariance Matrix
  3.7 Matrix Inequalities and Maximization
  Supplement 3A: Vectors and Matrices: Basic Concepts
    Vectors
    Matrices

Chapter 4: The Multivariate Normal Distribution
  4.1 Introduction
  4.2 The Multivariate Normal Density and Its Properties
    Additional Properties of the Multivariate Normal Distribution
  4.3 Sampling from a Multivariate Normal Distribution and Maximum Likelihood Estimation
    The Multivariate Normal Likelihood
    Maximum Likelihood Estimation of μ and Σ
    Sufficient Statistics
  4.4 The Sampling Distribution of X̄ and S
    Properties of the Wishart Distribution
  4.5 Large-Sample Behavior of X̄ and S
  4.6 Assessing the Assumption of Normality
    Evaluating the Normality of the Univariate Marginal Distributions
    Evaluating Bivariate Normality
  4.7 Detecting Outliers and Cleaning Data
    Steps for Detecting Outliers
  4.8 Transformations to Near Normality
    Transforming Multivariate Observations

Chapter 5: Inferences About a Mean Vector
  5.1 Introduction
  5.2 The Plausibility of μ0 as a Value for a Normal Population Mean
  5.3 Hotelling's T² and Likelihood Ratio Tests
    General Likelihood Ratio Method
  5.4 Confidence Regions and Simultaneous Comparisons of Component Means
    Simultaneous Confidence Statements
    A Comparison of Simultaneous Confidence Intervals with One-at-a-Time Intervals
    The Bonferroni Method of Multiple Comparisons
  5.5 Large Sample Inferences about a Population Mean Vector
  5.6 Multivariate Quality Control Charts
    Charts for Monitoring a Sample of Individual Multivariate Observations for Stability
    Control Regions for Future Individual Observations
    Control Ellipse for Future Observations
    T²-Chart for Future Observations
    Control Charts Based on Subsample Means
    Control Regions for Future Subsample Observations
  5.7 Inferences about Mean Vectors When Some Observations are Missing
  5.8 Difficulties Due to Time Dependence in Multivariate Observations
  Supplement 5A: Simultaneous Confidence Intervals and Ellipses as Shadows of the p-dimensional Ellipsoids

Chapter 6: Comparisons of Several Multivariate Means
  6.1 Introduction
  6.2 Paired Comparisons and a Repeated Measures Design
    Paired Comparisons
    A Repeated Measures Design for Comparing Treatments
  6.3 Comparing Mean Vectors from Two Populations
    Assumptions Concerning the Structure of the Data
    Further Assumptions When n1 and n2 are Small
    Simultaneous Confidence Intervals
    The Two-Sample Situation When Σ1 ≠ Σ2
    An Approximation to the Distribution of T² for Normal Populations When Sample Sizes are Not Large
  6.4 Comparing Several Multivariate Population Means (One-Way Manova)
    Assumptions about the Structure of the Data for One-Way Manova
    A Summary of Univariate Anova
    Multivariate Analysis of Variance (Manova)
  6.5 Simultaneous Confidence Intervals for Treatment Effects
  6.6 Testing for Equality of Covariance Matrices
  6.7 Two-Way Multivariate Analysis of Variance
    Univariate Two-Way Fixed-Effects Model with Interaction
    Multivariate Two-Way Fixed-Effects Model with Interaction
  6.8 Profile Analysis
  6.9 Repeated Measures Designs and Growth Curves
  6.10 Perspectives and a Strategy for Analyzing Multivariate Models

Chapter 7: Multivariate Linear Regression Models
  7.1 Introduction
  7.2 The Classical Linear Regression Model
  7.3 Least Squares Estimation
    Sum-of-Squares Decomposition
    Geometry of Least Squares
  7.4 Inferences About the Regression Model
    Inferences Concerning the Regression Parameters
    Likelihood Ratio Tests for the Regression Parameters
  7.5 Inferences from the Estimated Regression Function
    Estimating the Regression Function at z0
    Forecasting a New Observation at z0
  7.6 Model Checking and Other Aspects of Regression
    Does the Model Fit?
    Leverage and Influence
    Additional Problems in Linear Regression
  7.7 Multivariate Multiple Regression
    Other Multivariate Test Statistics
    Predictions from Multivariate Multiple Regressions
  7.8 The Concept of Linear Regression
  7.9 Comparing the Two Formulations of the Regression Model
    Mean Corrected Form of the Regression Model
    Relating the Formulations
  7.10 Multiple Regression Models with Time Dependent Errors
  Supplement 7A: The Distribution of the Likelihood Ratio for the Multivariate Multiple Regression Model

Chapter 8: Principal Components
  8.1 Introduction
  8.2 Population Principal Components
    Principal Components for Covariance Matrices with Special Structures
  8.3 Summarizing Sample Variation by Principal Components
    The Number of Principal Components
    Interpretation of the Sample Principal Components
    Standardizing the Sample Principal Components
  8.4 Graphing the Principal Components
  8.5 Large Sample Inferences
    Large Sample Properties of λ̂i and êi
    Testing for the Equal Correlation Structure
  8.6 Monitoring Quality with Principal Components
    Checking a Given Set of Measurements for Stability
    Controlling Future Values
  Supplement 8A: The Geometry of the Sample Principal Component Approximation
    The p-dimensional Geometrical Interpretation
    The n-dimensional Geometrical Interpretation

Chapter 9: Factor Analysis and Inference for Structured Covariance Matrices
  9.1 Introduction
  9.2 The Orthogonal Factor Model
  9.3 Methods of Estimation
    The Principal Component (and Principal Factor) Method
    A Modified Approach: The Principal Factor Solution
    The Maximum Likelihood Method
    A Large Sample Test for the Number of Common Factors
  9.4 Factor Rotation
    Oblique Rotations
  9.5 Factor Scores
    The Weighted Least Squares Method
    The Regression Method
  9.6 Perspectives and a Strategy for Factor Analysis
  Supplement 9A: Some Computational Details for Maximum Likelihood Estimation
    Recommended Computational Scheme
    Maximum Likelihood Estimators of ρ = L_z L_z' + Ψ_z

Chapter 10: Canonical Correlation Analysis
  10.1 Introduction
  10.2 Canonical Variates and Canonical Correlations
  10.3 Interpreting the Population Canonical Variables
    Identifying the Canonical Variables
    Canonical Correlations as Generalizations of Other Correlation Coefficients
    The First r Canonical Variables as a Summary of Variability
    A Geometrical Interpretation of the Population Canonical Correlation Analysis
  10.4 The Sample Canonical Variates and Sample Canonical Correlations
  10.5 Additional Sample Descriptive Measures
    Matrices of Errors of Approximations
    Proportions of Explained Sample Variance
  10.6 Large Sample Inferences

Chapter 11: Discrimination and Classification
  11.1 Introduction
  11.2 Separation and Classification for Two Populations
  11.3 Classification with Two Multivariate Normal Populations
    Classification of Normal Populations When Σ1 = Σ2 = Σ
    Scaling
    Fisher's Approach to Classification with Two Populations
    Is Classification a Good Idea?
    Classification of Normal Populations When Σ1 ≠ Σ2
  11.4 Evaluating Classification Functions
  11.5 Classification with Several Populations
    The Minimum Expected Cost of Misclassification Method
    Classification with Normal Populations
  11.6 Fisher's Method for Discriminating among Several Populations
    Using Fisher's Discriminants to Classify Objects
  11.7 Logistic Regression and Classification
    Introduction
    The Logit Model
    Logistic Regression Analysis
    Classification
    Logistic Regression with Binomial Responses
  11.8 Final Comments
    Including Qualitative Variables
    Classification Trees
    Neural Networks
    Selection of Variables
    Testing for Group Differences
    Graphics
    Practical Considerations Regarding Multivariate Normality

Chapter 12: Clustering, Distance Methods and Ordination
  12.1 Introduction
  12.2 Similarity Measures
    Distances and Similarity Coefficients for Pairs of Items
    Similarities and Association Measures for Pairs of Variables
    Concluding Comments on Similarity
  12.3 Hierarchical Clustering Methods
    Single Linkage
    Complete Linkage
    Average Linkage
    Ward's Hierarchical Clustering Method
    Final Comments: Hierarchical Procedures
  12.4 Nonhierarchical Clustering Methods
    K-means Method
    Final Comments: Nonhierarchical Procedures
  12.5 Clustering Based on Statistical Models
  12.6 Multidimensional Scaling
  12.7 Correspondence Analysis
    Algebraic Development of Correspondence Analysis
  12.8 Biplots for Viewing Sampling Units and Variables
    Constructing Biplots
  12.9 Procrustes Analysis: A Method for Comparing Configurations
    Constructing the Procrustes Measure of Agreement
  Supplement 12A: Data Mining
    Introduction
    The Data Mining Process
    Model Assessment
  Selected Additional References for Model Based Clustering

Appendix
  Table 1: Standard Normal Probabilities
  Table 2: Student's t-Distribution Percentage Points
  Table 3: χ² Distribution Percentage Points
  Table 4: F-Distribution Percentage Points (α = .10)
  Table 5: F-Distribution Percentage Points (α = .05)
  Table 6: F-Distribution Percentage Points (α = .01)

Index