A Characterization of Principal Components for Projection Pursuit

By Richard J. Bolton and Wojtek J. Krzanowski


Department of Mathematical Statistics and Operational Research, University of Exeter, Laver Building, North Park Road, Exeter, EX4 4QE, U.K.

ABSTRACT

Principal Component Analysis is a technique often found to be useful for identifying structure in multivariate data. Although it has various characterizations (Rao 1964), the most familiar is as a variance-maximizing projection. Projection pursuit is a methodology for selecting low-dimensional projections of multivariate data by the optimization of some index of "interestingness" over all projection directions. Principal Component Analysis can be viewed as an example of projection pursuit and we justify its success in structure identification by characterizing it in terms of maximum likelihood under the assumption of normality.

1. INTRODUCTION

Researchers in many fields use Principal Component Analysis (PCA) as an efficient way to provide an informative, low-dimensional representation of multivariate data in which features in the data such as clustering, skewness and outliers can often be detected. However, PCA does not necessarily afford the "best" view of these structures and it may miss other interesting views of the data. With this in mind, much research has been done in recent years on approaches to identify projections that display particularly "interesting" features of the data. This body of techniques goes under the generic name "projection pursuit" (Friedman and Tukey 1974). We decide on the dimension of the required projection and choose a criterion (the projection pursuit index) that will, hopefully, find projections of the desired data structure when optimized. Local optimization of the criterion over all projections of the required dimensionality yields "interesting" projections of the data which can be displayed graphically. The projection is usually chosen to be one-, two- or three-dimensional for convenience.

PCA can be viewed as a particular case of projection pursuit in which the index of "interestingness" is the variance of the data, which is maximized over all unit length projections. This is one of the few available indices that can be maximized algebraically (see below).

Projection pursuit indices are diverse but the constructions of most are motivated by consideration of Central Limit Theorem results. Diaconis and Freedman (1984) showed that projected subspaces of high-dimensional data converged, weakly in probability, to normality. That is, most projections are approximately Gaussian. Indeed, under the Cramér-Wold theorem, if the "least normal" one-dimensional projection of the data is normal then the multivariate distribution, completely defined by its one-dimensional projections, is also normal. Consequently, most projection pursuit indices are developed from the standpoint that normality represents the notion of "uninterestingness" (Huber 1985). These indices are thus optimized to find projections showing departures from normality, yet, in the context of normality, the reason why PCA is capable of picking out interesting projections has never been explored. A maximum likelihood framework helps to clarify this link between PCA and projection pursuit.

2. A MAXIMUM LIKELIHOOD FRAMEWORK FOR PCA

Let us work with the concept of normality representing an uninteresting projection. We require projections to be of unit length and orthogonal. Given a sample $\{x_1, \ldots, x_n\} \in \mathbb{R}^p$ and a projection $a \in \mathbb{R}^p$, the projection pursuit index corresponding to PCA is

$$V = \max_{a} \{ a S a' \}, \qquad a a' = 1, \tag{1}$$

where $S = \frac{1}{n-1} \sum_{i} (x_i - \bar{x})'(x_i - \bar{x})$ is the sample variance-covariance matrix. In most standard multivariate texts it is shown algebraically that the maximizing $a$ is the eigenvector corresponding to the largest eigenvalue, $\lambda_1$, of $S$, or alternatively, $a$ is the direction of maximum variance in the data. Successive evaluations of (1), over projections orthogonal to those already found, produce projections that are the eigenvectors corresponding to the eigenvalues $\lambda_2 > \ldots > \lambda_p$.

Now we will show that this index (1) corresponds to the minimum, over all projections $a$, of the maximized log-likelihood when normality is assumed. Write $L(x; \mu, \Sigma, a)$ for the log-likelihood of $N(a\mu, a\Sigma a')$, which is maximized with the usual maximum likelihood estimators of $\mu$ and $\Sigma$ ($\hat{\mu} = \bar{x}$ and $\hat{\Sigma} = \frac{n-1}{n} S$). Then

$$L(a) = \max_{\mu, \Sigma} L(x; \mu, \Sigma, a) = -\frac{n}{2} \left( p + p \log \frac{2\pi(n-1)}{n} \right) - \frac{n}{2} \log |a S a'| \tag{2}$$

can be viewed as a projection pursuit index. Now $L(a)$ is monotonic in $\log |a S a'|$; as $|a S a'|$ increases $L(a)$ decreases, so maximizing $V$ corresponds to minimizing $L(a)$. That is, when normality is assumed the most interesting projection is the one with the smallest (maximized) likelihood. Perhaps another way of thinking about this is that when the projection is most interesting the data are less likely to be normally distributed.
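The correspondence between maximizing the variance index and minimizing the maximized normal log-likelihood can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes NumPy, a simulated sample `X` whose rows are the observations, and a single projection direction, so the additive constant is written for a one-dimensional projection (it does not depend on the direction in any case). The function names `max_normal_loglik` and `closed_form_loglik` are hypothetical.

```python
import numpy as np


def max_normal_loglik(X, a):
    """Maximized log-likelihood of a univariate normal fitted to the projected sample X a'."""
    y = X @ a                                 # projected observations, shape (n,)
    n = y.shape[0]
    var_mle = np.mean((y - y.mean()) ** 2)    # ML variance estimate, equal to (n-1)/n * a S a'
    return -0.5 * n * (1.0 + np.log(2.0 * np.pi * var_mle))


def closed_form_loglik(S, a, n):
    """The same quantity written as a constant minus (n/2) log(a S a')."""
    constant = -0.5 * n * (1.0 + np.log(2.0 * np.pi * (n - 1) / n))
    return constant - 0.5 * n * np.log(a @ S @ a)


rng = np.random.default_rng(0)
n, p = 200, 4
# Hypothetical correlated sample: independent normals mixed by a random matrix.
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

S = np.cov(X, rowvar=False)                   # sample covariance, divisor n - 1
eigvals, eigvecs = np.linalg.eigh(S)          # eigenvalues returned in ascending order
a_first = eigvecs[:, -1]                      # first principal component direction
a_last = eigvecs[:, 0]                        # last principal component direction

# Compare against many random unit-length directions.
random_dirs = rng.normal(size=(1000, p))
random_dirs /= np.linalg.norm(random_dirs, axis=1, keepdims=True)
random_logliks = np.array([max_normal_loglik(X, a) for a in random_dirs])

print("first PC :", max_normal_loglik(X, a_first))
print("last PC  :", max_normal_loglik(X, a_last))
print("random   : min", random_logliks.min(), "max", random_logliks.max())
```

Under these assumptions the first principal component should give the smallest maximized log-likelihood among the directions tried and the last principal component the largest, which is the ordering the argument above predicts.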

3. PCA AND PROJECTION PURSUIT

Interesting projections of the data selected by PCA are those which have a small maximized likelihood under normal assumptions. This does not imply that PCA maximizes departure from normality, but does suggest that there is a connection between "interesting", as defined by variance maximization in PCA, and non-normality. Structures that PCA will not so readily uncover are those that, although non-normal, have a relatively large likelihood under normal assumptions (e.g. heavy-tailed elliptical or other symmetrical distributions). It is therefore possible to create a hierarchy of structures according to their maximized likelihoods under normal assumptions and, accordingly, their relative likelihood of being selected by PCA. Consequently, we cannot discount the lowest principal components as these sometimes reveal more interesting structure than the highest principal components (Donnell, Buja and Stuetzle 1994).

Different projection pursuit indices tend to home in on different types of non-normal structures and several have been developed, each displaying different properties. The holes index (Cook, Buja and Cabrera 1993), for example, is optimized at projections displaying large "holes" at the center of the data. Optimization at projections showing skewness is characteristic of the Legendre index (Friedman 1987) and the skewness index (Cook, Buja and Cabrera 1993). The central mass index (Cook, Buja and Cabrera 1993) finds projections in which the data are concentrated centrally. Group structures in the data may be found by optimizing an index developed by Eslava and Marriott (1994). These are just a few of the many projection pursuit indices in existence, full details of which can be found, for example, in Huber (1985), Jones and Sibson (1987), Hall (1989), Posse (1990, 1995) and Nason (1995, 1996).
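The earlier remark that the lowest principal components can reveal more interesting structure than the highest can be illustrated with a small simulation. This is a sketch under stated assumptions rather than an example from the paper: the two-group sample is hypothetical, NumPy is assumed, and `group_separation` is an illustrative score (difference in group means over the pooled standard deviation) that is not one of the projection pursuit indices cited above.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group = 200
labels = np.repeat([0, 1], n_per_group)

# Hypothetical sample: one coordinate has large variance but no group structure,
# the other has small variance and separates the two groups.
x_wide = rng.normal(0.0, 5.0, size=2 * n_per_group)
x_narrow = np.where(labels == 0, -1.5, 1.5) + rng.normal(0.0, 0.3, size=2 * n_per_group)
X = np.column_stack([x_wide, x_narrow])

# Principal components from the sample covariance matrix, largest variance first.
S = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = (X - X.mean(axis=0)) @ eigvecs


def group_separation(z, labels):
    """Absolute difference in group means divided by the pooled standard deviation."""
    z0, z1 = z[labels == 0], z[labels == 1]
    pooled_sd = np.sqrt(0.5 * (z0.var(ddof=1) + z1.var(ddof=1)))
    return abs(z1.mean() - z0.mean()) / pooled_sd


separation = [group_separation(scores[:, j], labels) for j in range(2)]
print("variances  :", eigvals)
print("separation :", separation)
```

In this construction the first principal component follows the high-variance coordinate and shows almost no group separation, while the second, low-variance component carries the cluster structure. Variance alone therefore does not rank the two views by interest here.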

The implementation of existing indices and the development of new indices to find other interesting projections is an active area of research. A selection of two-dimensional projection pursuit indices has been implemented in association with a grand tour in the software XGobi (Swayne, Cook and Buja 1997), a dynamic two-dimensional graphical tool publicly available from StatLib. It is important to realise that there may be several interesting projections of a data set which are not necessarily mutually orthogonal. XGobi provides the scope and flexibility to promote local optimization of a projection pursuit index and thereby to obtain several interesting projections of the data. If we take advantage of the opportunity for user intervention we become familiar with the data through an interface that is more informative than static pairwise plots. With packages such as this making projection pursuit more accessible, the exploration of projection pursuit methods is encouraged for researchers currently reliant on PCA to display interesting features of their multivariate data.

ACKNOWLEDGEMENTS

We are grateful to the Associate Editor and referees for their suggestions, which have materially improved the presentation of this paper.

REFERENCES

Cook, D., Buja, A. and Cabrera, J. (1993), "Projection pursuit indices based on orthonormal function expansions," Journal of Computational and Graphical Statistics, 2(3).

Diaconis, P. and Freedman, D. (1984), "Asymptotics of graphical projection pursuit," The Annals of Statistics, 12.

Donnell, D. J., Buja, A. and Stuetzle, W. (1994), "Analysis of additive dependencies using smallest additive principal components (with discussion)," The Annals of Statistics, 22.

Eslava, G. and Marriott, F. H. C. (1994), "Some criteria for projection pursuit," Statistics and Computing, 4.

Friedman, J. H. and Tukey, J. W. (1974), "A projection pursuit algorithm for exploratory data analysis," IEEE Transactions on Computers, Series C, 23.

Hall, P. (1989), "On polynomial-based projection indices for exploratory projection pursuit," The Annals of Statistics, 17.

Huber, P. J. (1985), "Projection pursuit (with discussion)," The Annals of Statistics, 13.

Jones, M. C. and Sibson, R. (1987), "What is projection pursuit? (with discussion)," Journal of the Royal Statistical Society, Series A, 150.

Nason, G. P. (1995), "Three-dimensional projection pursuit," Applied Statistics, 44.

Nason, G. P. (1996), "Robust projection indices," Technical Report, School of Mathematics, University of Bristol, Bristol (submitted for publication).

Posse, C. (1990), "An effective two-dimensional projection pursuit algorithm," Communications in Statistics, Simulation and Computation, 19.

Posse, C. (1995), "Projection pursuit exploratory data analysis," Computational Statistics and Data Analysis, 20.

Rao, C. R. (1964), "The use and interpretation of principal component analysis in applied research," Sankhya, Series A, 26.

Swayne, D., Cook, D. and Buja, A. (1997), "XGobi: interactive dynamic data visualization in the X Window system," Journal of Computational and Graphical Statistics, 7(1).
