Principal component analysis


1 Principal component analysis
- Motivation for PCA came from major-axis regression.
- Strong assumption: a single homogeneous sample.
- Free of assumptions when used for exploration.
- Classical tests of significance of eigenvectors and eigenvalues assume multivariate normality.
- Bootstrap tests assume only that the sample is representative of the population.
- Can be used with multiple samples for exploration:
  - Search for structure: e.g., how many groups?
  - Not optimized for discovering group structure.
  - Classical significance tests can't be used: if structure is discovered by exploring the data, that structure can't then be tested for significance on the same data.
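To make the mechanics concrete, here is a minimal PCA sketch in Python/NumPy on simulated data (the dataset, seed, and variable names are illustrative assumptions, not the lecture's own example): the PCs are eigenvectors of the sample covariance matrix, and each eigenvalue gives the variance its PC accounts for.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: 100 draws from a correlated bivariate normal.
X = rng.multivariate_normal([0, 0], [[4.0, 2.4], [2.4, 3.0]], size=100)

S = np.cov(X, rowvar=False)              # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)     # eigh returns ascending order
order = np.argsort(eigvals)[::-1]        # re-sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = (X - X.mean(axis=0)) @ eigvecs  # projections onto the PCs
pct = 100 * eigvals / eigvals.sum()      # % of total variance per PC
print(f"PC 1 accounts for {pct[0]:.1f}% of the total variance")
```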

2 Principal component analysis
[Figure: scatterplots of scores on the first two PCs, with the percentage of variance explained by each PC on the axis labels; two apparent clusters are marked as groups.]
- MANOVA comparing the two apparent groups: p < . (highly significant).
- But: the data were sampled randomly from a single multivariate-normal population.
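The pitfall on this slide can be reproduced with a small simulation (a hypothetical sketch assuming NumPy, SciPy, and scikit-learn; the lecture names no software): sample a single homogeneous population, let a clustering algorithm "discover" two groups, and then test those groups.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# ONE homogeneous multivariate-normal population, no real group structure.
X = rng.multivariate_normal([0, 0], [[1.0, 0.5], [0.5, 1.0]], size=200)

# "Discover" two groups by clustering the same data we will then test.
labels = KMeans(n_clusters=2, n_init=10, random_state=1).fit_predict(X)
g0, g1 = X[labels == 0], X[labels == 1]

for j in range(X.shape[1]):
    F, p = f_oneway(g0[:, j], g1[:, j])
    print(f"variable {j}: F = {F:.1f}, p = {p:.2g}")
# The p-values come out tiny even though the population is homogeneous,
# because the groups were defined from the data being tested.
```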

3 Multiple groups and multiple variables
Suppose that:
- We have two or more groups (treatments, etc.) defined on extrinsic criteria.
- We wish to know whether and how we can discriminate the groups on the basis of two or more measured variables.
Things we might want to know:
- Can we discriminate the groups? If so, how well?
- How different are the groups? Are the groups significantly different?
- How do we assess significance in the presence of correlations among variables?
- Which variables are most important in discriminating the groups?
- Can group membership be predicted for unknown individuals? How good is the prediction?

4 Multiple groups and multiple variables
These questions are answered using three related methods:
(1) Discriminant function analysis (DFA), also called discriminant analysis (DA) or canonical variate analysis (CVA): determines the linear combinations of variables that best discriminate the groups.
(2) Multivariate analysis of variance (MANOVA): determines whether multivariate samples differ non-randomly (significantly).
(3) Mahalanobis distance ($D^2$): measures distances in multivariate character space in the presence of correlations among variables.
Developed independently by three mathematicians: Fisher (DFA) in England, Hotelling (MANOVA) in the United States, and Mahalanobis ($D^2$) in India. Because of differences in notation, the underlying similarities went unnoticed for years. The methods now have a common matrix formulation.

5 Discriminant analysis
Principal component analysis:
- Inherently a single-group procedure: assumes that the data represent a single homogeneous sample from a population.
- Can be used for multiple groups, but cannot take group structure into consideration.
- Often used to determine whether groups differ in terms of the variables used, but:
  - Can't use grouping information even if it exists.
  - Maximizes variance, regardless of its source.
  - Not guaranteed to discriminate groups.
Discriminant analysis:
- Explicitly a multiple-group procedure.
- Assumes that groups are known (correctly) before the analysis, on the basis of extrinsic criteria.
- Optimizes discrimination between the groups by one or more linear combinations of the variables (discriminant functions).

6 Discriminant analysis
Q: How are the groups different, and which variables contribute most to the differences?
A: For k groups, find the k − 1 linear discriminant functions (axes, vectors, functions) that maximally separate the k groups.
Discriminant functions (DFs) are eigenvectors of the among-group variance (rather than the total variance). Like PCs, discriminant functions:
- Are linear combinations of the original variables.
- Are specified by sets of eigenvector coefficients (weights), which can be rescaled as vector correlations, allowing interpretation of the contributions of individual variables.
- Have corresponding eigenvalues, which specify the proportion of among-group variance (rather than total variance) accounted for by each DF.
- Can be estimated from either the covariance matrices (one per group) or the correlation matrices.
Groups are assumed to have multivariate normal distributions with identical covariance matrices.

7 Discriminant analysis
Example with two variables:
[Figure: scatterplots of the original data and of the data with % data ellipses. Candidate projection axes A and B and the DF are drawn through the data; for each axis, histograms of the groups' projection scores are shown with the F-statistic from a one-way ANOVA of the scores. F varies with the angle of the line from the horizontal, and the DF attains the largest F.]

8 Discriminant analysis
[Figure: same layout as the previous slide for a second two-variable example — original data, data with % data ellipses, and projection-score histograms with ANOVA F-statistics for candidate axes A, B, and the DF; again the DF yields the largest F.]

9 Discriminant analysis
The discriminant functions are eigenvectors:
- For PCA, the eigenvectors are estimated from S, the covariance matrix, which accounts for the total variance of the sample.
- For DFA, the eigenvectors are estimated from a matrix that accounts for the among-group variance.
For a single variable, a measure of among-group variation, scaled by within-group variation, is the ratio $s_a^2 / s_w^2$.
Discriminant functions are eigenvectors of the matrix $\mathbf{W}^{-1}\mathbf{B}$, where:
- W = pooled within-group covariance matrix.
- B = among-group covariance matrix.
This is analogous to the univariate measure.
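A minimal sketch of this eigenproblem, assuming NumPy and made-up three-group data (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
# Three illustrative groups of 30 observations on 3 variables.
groups = [rng.multivariate_normal(m, np.eye(3), size=30)
          for m in ([0, 0, 0], [2, 1, 0], [0, 2, 1])]

n_tot = sum(len(g) for g in groups)
k = len(groups)
grand_mean = np.vstack(groups).mean(axis=0)

# Pooled within-group covariance W (within-group SSCP / (n - k)).
W = sum((len(g) - 1) * np.cov(g, rowvar=False) for g in groups) / (n_tot - k)
# Among-group covariance B (between-group SSCP / (k - 1)).
B = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                          g.mean(axis=0) - grand_mean)
        for g in groups) / (k - 1)

# Discriminant functions: eigenvectors of W^{-1} B (not symmetric, so eig).
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(W, B))
order = np.argsort(eigvals.real)[::-1]
dfs = eigvecs.real[:, order]       # columns are the discriminant functions
print("eigenvalues:", np.round(eigvals.real[order], 3))
```

With k = 3 groups, only k − 1 = 2 eigenvalues are meaningfully nonzero, matching the count of DFs on slide 6.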

10 Discriminant analysis
Thus the DFA eigenvectors:
- Maximize the ratio of among-group variation to within-group variation.
- Optimize discrimination among all groups simultaneously.
For any set of data, there exists one axis (the discriminant function, DF) for which the projections of the groups of individuals are maximally separated, as measured by ANOVA of the projections onto the axis (see the sketch below).
- For 2 groups: this single DF completely accounts for group discrimination.
- For 3+ groups, there is a series of orthogonal DFs: DF 1 accounts for the largest proportion of among-group variance, DF 2 accounts for the largest proportion of the residual among-group variance, etc.
DFs can be used as the bases of a new coordinate system for plotting scores of observations and loadings of the original variables.
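The sketch below illustrates the "maximally separated" claim on simulated two-group data, using scikit-learn's LinearDiscriminantAnalysis as a stand-in for the DFA described here (an assumption; the lecture prescribes no software): the ANOVA F for scores on DF 1 exceeds the F for either original variable.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
# Two illustrative groups of 40 observations on 2 variables.
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(40, 2)),
               rng.normal([1.5, 1.0], 1.0, size=(40, 2))])
y = np.repeat([0, 1], 40)

# Scores on the single DF (two groups => one discriminant function).
z = LinearDiscriminantAnalysis().fit(X, y).transform(X).ravel()

for j in range(X.shape[1]):
    print(f"X{j}:  F = {f_oneway(X[y == 0, j], X[y == 1, j])[0]:.1f}")
print(f"DF 1: F = {f_oneway(z[y == 0], z[y == 1])[0]:.1f}")  # largest F
```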

11 Discriminant analysis
[Figure: a two-variable example — the original data and the data with % data ellipses; scores of the observations on DF 1 vs. DF 2, with the percentage of among-group variance on the axis labels; and loadings of the original variables on DF 1 and DF 2.]

12 Discriminant analysis
[Figure: a second example in the same layout — original data and data with % data ellipses; scores on DF 1 vs. DF 2 with the percentage of among-group variance on the axis labels; and loadings of the original variables on DF 1 and DF 2.]

13 Discriminant analysis
Discriminant functions have no necessary relationship to principal components.
[Figure: four two-variable examples, each showing the PC axes and the DF axes for the same data; the two sets of axes generally differ in orientation.]

14 MANOVA
Q: Are the groups significantly heterogeneous?
A: Multivariate analysis of variance:
- The general case of testing for significant differences among a set of predefined groups (treatments), with multiple correlated variables.
- ANOVA: the special case for one variable (univariate).
- Hotelling's $T^2$ test: the special case of MANOVA for two groups.
- t-test: the special univariate case for two groups.

15 MANOVA
Discriminant functions are eigenvectors of the matrix $\mathbf{W}^{-1}\mathbf{B}$. The eigenvalues of $\mathbf{W}^{-1}\mathbf{B}$ are $\lambda_1, \lambda_2, \ldots, \lambda_p$.
A general multivariate test statistic is Wilks' lambda, $\Lambda = \prod_{j=1}^{p} \frac{1}{1 + \lambda_j}$:
- Commonly reported by statistical packages.
- The expression needed to determine its significance is complicated.
- Wilks' lambda can be transformed to an F-statistic, but the expression for this is complicated, too.
Several other test statistics are commonly reported by statistical packages:
- Varying terminology, varying assumptions.
- All reported with corresponding p-values.
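As a sketch of how Wilks' lambda relates to the eigenvalues, take W and B here to be the within-group and among-group sums-of-squares-and-cross-products (SSCP) matrices — the convention under which the product identity $\Lambda = |\mathbf{W}|/|\mathbf{W}+\mathbf{B}| = \prod_j 1/(1+\lambda_j)$ is exact. The matrices below are made up for illustration:

```python
import numpy as np

# Hypothetical within-group (W) and among-group (B) SSCP matrices.
W = np.array([[20.0, 5.0], [5.0, 15.0]])
B = np.array([[12.0, 4.0], [4.0, 8.0]])

lam = np.linalg.eigvals(np.linalg.solve(W, B)).real   # eigenvalues of W^-1 B
wilks_from_eigs = np.prod(1.0 / (1.0 + lam))          # Λ = Π 1/(1 + λ_j)
wilks_from_dets = np.linalg.det(W) / np.linalg.det(W + B)
print(wilks_from_eigs, wilks_from_dets)               # agree up to rounding
# Small Λ (near 0) => strong group differences; Λ near 1 => little difference.
```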

16 Mahalanobis distance
Q: How do we measure the distance between two groups?
A: It depends on whether we want to take correlations among variables into consideration.
- If not, just measure the Euclidean distance between the centroids.
- If so, measure the Mahalanobis distance between the centroids, which accounts for the covariance structure:
$D^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)' \, \mathbf{S}^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$
Mahalanobis distances can also be measured between individual points.
[Figure: two panels, "Euclidean distances" and "Mahalanobis distances", contrasting the two metrics on the same two-variable data.]
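A minimal sketch of the centroid-to-centroid computation with NumPy on simulated data (all names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
S_true = np.array([[1.0, 0.8], [0.8, 1.0]])   # strongly correlated variables
g1 = rng.multivariate_normal([0, 0], S_true, size=50)
g2 = rng.multivariate_normal([1, 1], S_true, size=50)

d = g1.mean(axis=0) - g2.mean(axis=0)          # difference of centroids
# Pooled within-group covariance S:
S = ((len(g1) - 1) * np.cov(g1, rowvar=False)
     + (len(g2) - 1) * np.cov(g2, rowvar=False)) / (len(g1) + len(g2) - 2)

D2 = d @ np.linalg.solve(S, d)                 # D^2 = d' S^{-1} d
print(f"Euclidean = {np.linalg.norm(d):.2f}, Mahalanobis D = {np.sqrt(D2):.2f}")
```

Because the group difference here lies across the axis of correlation, the Mahalanobis distance is larger than the Euclidean distance suggests.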

17 Classifying unknowns into predetermined groups
Context: we have k known groups of observations, plus one or more unknown observations, each assumed to be a member of one of the known groups.
Task: assign each unknown observation to one of the k groups.
Procedure:
- Find the Mahalanobis distance from the unknown observation to each of the centroids of the k groups.
- Assign the unknown to the closest group.
The procedure can be randomized (see the sketch below):
- Bootstrap the known observations by sampling within groups, with replacement.
- Assign the unknown observation to the closest group, based on the distances from the observation to the group centroids.
- Repeat many times: this gives the proportion of times the observation is assigned to each of the groups.
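Here is a sketch of the randomized (bootstrap) version of this procedure on made-up two-group data (group locations, sample sizes, and the replicate count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
groups = [rng.multivariate_normal([0, 0], np.eye(2), size=30),
          rng.multivariate_normal([2, 1], np.eye(2), size=30)]
unknown = np.array([1.2, 0.4])                 # observation to classify

def pooled_cov(gs):
    n = sum(len(g) for g in gs)
    return sum((len(g) - 1) * np.cov(g, rowvar=False) for g in gs) / (n - len(gs))

counts = np.zeros(len(groups))
n_boot = 1000
for _ in range(n_boot):
    # Resample each group with replacement (within-group bootstrap).
    boot = [g[rng.integers(0, len(g), size=len(g))] for g in groups]
    Sinv = np.linalg.inv(pooled_cov(boot))
    # Squared Mahalanobis distance from the unknown to each centroid.
    d2 = [(unknown - g.mean(axis=0)) @ Sinv @ (unknown - g.mean(axis=0))
          for g in boot]
    counts[np.argmin(d2)] += 1                 # assign to the closest group

print("classification probabilities:", counts / n_boot)
```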

18 Classifying unknowns into predetermined groups
Example: classifying an unknown observation by the bootstrap procedure.
[Figure: scatterplots of the known groups and the unknown observation, with the resulting classification probabilities for each group listed beside each panel.]

19 Assessing misclassification rates (probabilities)
We would like to know how good the discriminant functions are.
- DFA involves finding the axes of maximum discrimination for the data included in the analysis.
- We would like to know how well the procedure will generalize.
- We can't trust misclassification rates based on the observations used in the analysis.
- Ideally, we would like to have new, known data to assign to the known groups based on the discriminant functions.

20 Assessing misclassification rates (probabilities)
Alternatively, we can cross-validate. Divide all of the data into:
(1) A calibration data set: used to find the discriminant functions.
(2) A test data set: used to test the discriminant functions.
Determine how well the DFs can assign the test observations to their correct groups. The proportions of incorrect assignments are estimates of the true misclassification rates.
Problem: we need all of the data to get the best estimates of the discriminant functions.
Solution: cross-validate one observation at a time via the jackknife procedure.

21 Assessing misclassification probabilities
Cross-validation via the jackknife ("leave-one-out") procedure (see the sketch below):
- Set one observation aside.
- Estimate the discriminant functions from the remaining observations.
- Classify the withheld (known) observation using those discriminant functions.
- Repeat for all observations, leaving one out at a time.
[Figure: example with several groups and two variables — the original data and the jackknifed scores on DF 1 vs. DF 2.]
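A compact leave-one-out sketch using scikit-learn (an assumed dependency; the lecture itself prescribes no software; data are simulated):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(6)
X = np.vstack([rng.multivariate_normal([0, 0], np.eye(2), size=25),
               rng.multivariate_normal([2, 1], np.eye(2), size=25)])
y = np.repeat([0, 1], 25)

# Each fold fits the DFs without one observation, then classifies it.
acc = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=LeaveOneOut())
print(f"jackknifed misclassification rate: {1 - acc.mean():.2%}")
```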

22 Assessing misclassification probabilities
Assign each individual, in turn, to one of the known groups using the jackknife procedure, bootstrapping at each step.
[Table: for each observation, its true group and the percentage of bootstrap replicates assigning it to each group; observations assigned most often to the wrong group are counted as misclassified. The misclassification rate is the number misclassified divided by the total.]
