Techniques and Applications of Multivariate Analysis
Department of Statistics
Professor Yong-Seok Choi
E-mail: yschoi@pusan.ac.kr
Home: yschoi.pusan.ac.kr
Contents

Multivariate Statistics (I) in Spring
1. Introduction of Multivariate Analysis
2. Principal Component Analysis (PCA)
3. Factor Analysis (FA)
4. Canonical Correlation Analysis (CCA)
5. Cluster Analysis (CA)

Multivariate Statistics (II) in Autumn
6. Discrimination and Classification Analysis (DCA)
7. Multidimensional Scaling (MDS)
8. Correspondence Analysis (CRA)
9. Biplots
10. Estimation and Testing
Lecture 1. Introduction of Multivariate Analysis

Lecture 1-1
1.1 Multivariate data analysis
1.3 Matrix representation of multivariate data

Lecture 1-2
1.4 Descriptive statistics
1.5 Multivariate normal distribution and its useful properties
1.6 Testing multivariate normality
1.7 Appendix: A1 and A2
1.1 Multivariate Data Analysis

Definition
A collection of techniques dealing with data containing observations on two or more variables. In general, the data contain $n$ observations $o_1, \dots, o_n$ and $p$ variables $x_1, \dots, x_p$.

Techniques based on geometrical ideas
R-Techniques: Analyses based on the matrix of covariances or correlations between variables.
- PCA/FA/CCA/Biplot
Q-Techniques: Analyses based on the matrix of distances between observations.
- CA/DA/MDS
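As a minimal sketch of the difference between the two families, the code below builds the inputs each one starts from: a p x p covariance/correlation matrix (R-techniques) and an n x n distance matrix (Q-techniques). It uses the three-student marks table from the geometrical-representation slide of this lecture. The choice of Euclidean distance and of the n - 1 divisor for the covariance are assumptions made for illustration; the slides do not fix either.

```python
import numpy as np

# Marks of three students on Mechanics, Algebra, Statistics
# (values from the geometrical-representation example in this lecture).
X = np.array([
    [77.0, 67.0, 81.0],
    [63.0, 80.0, 81.0],
    [50.0, 50.0, 50.0],
])
n, p = X.shape  # n observations (rows), p variables (columns)

# R-technique input: p x p matrices describing relations between variables.
S = np.cov(X, rowvar=False)       # sample covariance, divisor n - 1 (assumed)
R = np.corrcoef(X, rowvar=False)  # sample correlation

# Q-technique input: n x n matrix of distances between observations.
# Euclidean distance is an assumption; other dissimilarities are possible.
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))

print("Covariance matrix S (p x p):\n", S)
print("Correlation matrix R (p x p):\n", R)
print("Distance matrix D (n x n):\n", D)
```

Here n and p happen to be equal, so both matrices are 3 x 3; in general S and R are indexed by variables while D is indexed by observations.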
Geometrical Representations in 3-Dimensional Space

Students  Mechanics  Algebra  Statistics
1         77         67       81
2         63         80       81
3         50         50       50

a) n = 3 points in p-space
b) p = 3 points in n-space
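The two views of this table can be sketched in a few lines: each row is a point in p-space (one student), and each column is a point in n-space (one subject). As an illustration of why the n-space view is useful, the sketch also checks a standard fact: after centering each column at its mean, the cosine of the angle between two variable vectors equals their sample correlation. The variable names are illustrative only.

```python
import numpy as np

# Rows: students 1-3; columns: Mechanics, Algebra, Statistics.
X = np.array([
    [77.0, 67.0, 81.0],
    [63.0, 80.0, 81.0],
    [50.0, 50.0, 50.0],
])
n, p = X.shape

# a) n points in p-space: one point per student.
row_points = [X[i, :] for i in range(n)]

# b) p points in n-space: one point per subject.
col_points = [X[:, j] for j in range(p)]

# In n-space, center each variable vector at its mean.
Xc = X - X.mean(axis=0)
mech, alg = Xc[:, 0], Xc[:, 1]

# Cosine of the angle between the centered Mechanics and Algebra vectors.
cos_angle = mech @ alg / (np.linalg.norm(mech) * np.linalg.norm(alg))

# Sample correlation between Mechanics and Algebra.
r = np.corrcoef(X[:, 0], X[:, 1])[0, 1]

print("cosine of angle:", cos_angle)
print("correlation:    ", r)
```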
[Data 1] Examination marks on 5 subjects (Mardia et al., 1979, pp. 3-4)

Variables (one row per observation):
Closed-book: Mechanics, Vectors
Open-book: Algebra, Analysis, Statistics

Questions:
- How to combine or average these marks?
- What is the relationship between open-book and closed-book marks?
[Data 2] Protein consumption in 25 European countries

Protein sources: Meat, Pigs, Eggs, Milk, Fish, Cereals, Starchy foods, Nuts/Oil-seeds, Fruits/Vegetables

Questions:
- Which countries have high consumption of each kind of protein?
- What is the relationship between protein sources?
[Data 3] Fitness club data (SAS Institute Inc., 1990, Chapter 15)

Variables: Physiological vs Exercise
Physiological: Weight, Waist, Pulse
Exercise: Chins, Situps, Jumps

Questions:
- How to relate physiological variables to exercise variables?
[Data 4] Fisher's Iris flower data (Johnson & Wichern, 2002, p. 657)

Types: Setosa, Versicolour, Virginica
X1: Sepal length
X2: Sepal width
X4: Petal length
X5: Petal width

Questions:
- To which species does a new iris of unknown species belong?
- How to find the criteria for classifying?

Sir Ronald Aylmer Fisher (17 February 1890 - 29 July 1962): English statistician, evolutionary biologist, and geneticist.
[Data 5] Dissimilarity matrix for economic views of 14 institutes (Choi, 1995, Chapter 1)

Questions:
- Which institutes have a similar economic view?
- What are their economic views?
[Data 6] Contingency table for academic careers and preference grades of a renowned brand (Baek and Lee)

Preference grades range from "Very" to "Not at all".

Questions:
- Are rows and columns independent?
- Which columns match which rows?
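The first question is the classical test of independence in a two-way table. The sketch below computes the Pearson chi-square statistic by comparing observed counts with the counts expected under independence. The counts are hypothetical and chosen only for illustration; they are not the Baek and Lee data, and the row and column meanings in the comments are likewise assumed.

```python
import numpy as np

# HYPOTHETICAL counts, for illustration only (not the Baek and Lee data).
# Rows: academic career levels; columns: preference grades.
observed = np.array([
    [20.0, 15.0, 5.0],
    [10.0, 20.0, 10.0],
    [5.0, 15.0, 20.0],
])

row_totals = observed.sum(axis=1, keepdims=True)  # shape (rows, 1)
col_totals = observed.sum(axis=0, keepdims=True)  # shape (1, cols)
total = observed.sum()

# Expected counts under independence: (row total * column total) / grand total.
expected = row_totals @ col_totals / total

# Pearson chi-square statistic and its degrees of freedom.
chi2 = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print("chi-square statistic:", chi2)
print("degrees of freedom:  ", dof)
```

A large value of the statistic relative to a chi-square distribution with `dof` degrees of freedom is evidence against independence. The second question, which columns go with which rows, is the subject of correspondence analysis (CRA) later in the course.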
Remark: Objectives of Multivariate Analysis
1. Data Reduction or Structure Simplification
   - Making interpretation easier
2. Sorting and Grouping
   - Creating groups of similar objects or variables
3. Investigation of the dependence among variables
   - Making independent variables
4. Prediction
5. Hypothesis Testing