

Testing Multivariate Normality
Author(s): D. R. Cox and N. J. H. Small
Source: Biometrika, Vol. 65, No. 2 (Aug., 1978), pp. 263-72
Published by: Oxford University Press on behalf of Biometrika Trust

Biometrika (1978), 65, 2, pp. 263-72
Printed in Great Britain

Testing multivariate normality

BY D. R. COX AND N. J. H. SMALL
Department of Mathematics, Imperial College, London

SUMMARY

Previous work on testing multivariate normality is reviewed. Coordinate-dependent and invariant procedures are distinguished. The arguments for concentrating on tests of linearity of regression are indicated and such tests, both coordinate-dependent and invariant, are developed.

Some key words: Goodness of fit; Invariance; Multivariate normality; Nonlinearity; Probability plot; Transformation; Tukey's degree of freedom.

1. INTRODUCTION

There has been much recent work on testing univariate normality, stemming partly from work on weak convergence (Durbin, 1973) and partly from more empirical ideas. Unfortunately little of this work can be directly applied to testing multivariate normality. Even when $v$, the number of component variables, is only two, immediate adaptation of univariate tests such as the chi-squared goodness of fit test is clumsy, and if $v$ is larger such tests are quite impracticable. Further, the absence of a simple yet general family of distributions extending the multivariate normal precludes the use of a likelihood ratio test; see, however, Barndorff-Nielsen (1977).

Just as in other applications of significance tests, the practical purpose of the test must be considered. It is a central theme of the present paper that a main objective of tests of multivariate normality is to see whether an estimated covariance matrix provides an adequate summary of the interrelationships among a set of variables; most practical applications of multivariate analysis depend either upon a direct interpretation of one or more covariance matrices or upon some further analysis of such matrices.
While in particular applications very specific kinds of departure from multivariate normality might be of concern, the departure with the most serious consequences is often the occurrence of appreciable nonlinearity of dependence. In its simplest form, the covariance of two random variables is even qualitatively a poor indication of their association if appreciable curvature is present. Nonnormality of marginal distribution, as such, does not have this consequence. Therefore for the great majority of this paper we consider tests of linearity of regression rather than directly of normality. There is a general distinction in multivariate analysis between procedures that are invariant under arbitrary nonsingular linear transformations of the v component variables and those that are dependent on the particular coordinate system used to record the data. Despite the great theoretical power and importance of invariance considerations in multivariate analysis, there are many practical situations where the particular choice of components is important, i.e. where effects are in some sense most usefully to be detected or interpreted in particular directions in the v-dimensional space of the variables. Therefore we give separate discussions of invariant and of coordinate-dependent techniques. The coordinate-dependent procedures are, however, all invariant under scale and location changes of the components.

2. PREVIOUS WORK

An excellent broad review of the assessment of multivariate distributional properties has been given recently by Gnanadesikan (1977), so that only a brief outline of previous work on tests of multivariate normality need be given here.

A quite powerful coordinate-dependent approach is to consider parametric transformations coordinate by coordinate, e.g. of $y_s$ into

$$y_s^{(\lambda_s)} = \begin{cases} (y_s^{\lambda_s} - 1)/\lambda_s & (\lambda_s \neq 0), \\ \log y_s & (\lambda_s = 0), \end{cases}$$

for $s = 1, \ldots, v$. Then it may be reasonable to assume that for some unknown $\lambda = (\lambda_1, \ldots, \lambda_v)$ the transformed observations are multivariate normal with unknown mean and covariance matrix. The required $\lambda$ can be estimated by maximum likelihood and the null hypothesis $\lambda_s = 1$ $(s = 1, \ldots, v)$ tested by a likelihood ratio test. In some applications in which the component variables are similar in kind it may be sensible to suppose that $\lambda_1 = \ldots = \lambda_v$. This generalization of the univariate technique of Box & Cox (1964) was probably first used in unpublished work by the late T. Burnaby; it was developed entirely independently and in more detail by Andrews, Gnanadesikan & Warner (1971, 1973). This approach is, of course, coordinate-dependent, although Andrews et al. (1973) have considered the possibility of a preliminary rotation of coordinates before the consideration of transformations.

This general approach has the advantage over most of the others to be mentioned that it gives an explicit suggestion of the analysis to be adopted if clear evidence against the null hypothesis of multivariate normality is found. Indeed, the only other general procedure based on an explicit alternative model is the fitting of a mixture of normal components, usually with different means and the same covariance matrix (Day, 1969).
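As an illustration of the coordinate-by-coordinate family just described, the following is a minimal sketch of the transformation itself, not of the maximum likelihood fit. The function names `power_transform` and `transform_coordinates` are hypothetical, and the data are assumed to be held as a list of rows of positive numbers.

```python
import math

def power_transform(y, lam):
    """Apply y -> (y**lam - 1)/lam, or log(y) when lam == 0, to one coordinate."""
    if any(value <= 0 for value in y):
        raise ValueError("the transformation requires positive observations")
    if lam == 0:
        return [math.log(value) for value in y]
    return [(value ** lam - 1.0) / lam for value in y]

def transform_coordinates(rows, lams):
    """Transform an n x v data set (a list of rows), one exponent per coordinate."""
    columns = list(zip(*rows))
    transformed = [power_transform(col, lam) for col, lam in zip(columns, lams)]
    return [list(row) for row in zip(*transformed)]
```

With $\lambda_s = 1$ the transformation is only a shift by one unit, which is why $\lambda_s = 1$ for all $s$ corresponds to the null hypothesis that the untransformed data are already multivariate normal.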
A widely useful invariant graphical procedure (Healy, 1968; Cox, 1968; Andrews et al., 1971) is based on the distribution of the ordered Mahalanobis distances of the individual points from their mean in the metric defined by the sample covariance matrix. Thus if $Y_1, \ldots, Y_n$ are $n$ independent observations of a $v$-dimensional vector, $\bar{Y}$ their sample mean and $S$ their estimated covariance matrix, we compute

$$D_i^2 = (Y_i - \bar{Y})^T S^{-1} (Y_i - \bar{Y})$$

and plot the ordered $D_i^2$ against the expected order statistics for samples of size $n$ from the chi-squared distribution with $v$ degrees of freedom. It would be useful to have a significance test based on this procedure. Often, too, it will be informative to supplement the information about the distances of the individuals from the mean by some consideration of angular position (Gnanadesikan, 1977).

Important tests of univariate normality are based on standardized third and fourth cumulants, these being of particular value because of their diagnostic power in indicating the qualitative nature of any departure from normality. One simple possibility for a coordinate-dependent multivariate procedure is to examine separately the marginal distribution of each component (Andrews et al., 1973). A conservative composite significance test can be obtained from the most significant of the individual component statistics by using a Bonferroni bound, i.e. by multiplying the most extreme significance level by $v$. Alternatively a more detailed analysis may be based on the estimated correlation matrix of the original variables; details are given in an unpublished paper by N. J. H. Small.

An invariant procedure similar in spirit to that developed in § 4 of the present paper was given by Malkovich & Afifi (1973), who considered as a possible statistic the supremum of, for example, the standardized skewness over all linear combinations $a_1 y_1 + \ldots + a_v y_v$. They applied this notion to other univariate statistics.

Mardia (1970) has obtained invariant combinations of the third- and fourth-order cumulants by examining those combinations that have maximum effect on the null hypothesis distribution of the Hotelling $T^2$ statistic. Estimates of these invariant combinations are suggested for use as test statistics. Subsequently (Mardia, 1975) the relation between these and the distances $D^2$ has been explored.

3. COORDINATE-DEPENDENT PROCEDURES

3.1. General

In this section we consider tests for linearity of regression relationships which are coordinate-dependent. We deal with situations in which the $v$ component variables are to be treated symmetrically. Of course, if there are available both response and explanatory variables, we shall normally condition on the observed values of the explanatory variables and be directly interested in distributional properties only as they concern the conditional distribution of the response variable given the explanatory variables.

A complication in the discussion that follows is that when $v$ is large a natural analysis leads to rather a large number of component statistics, and some way of simplifying this procedure will be necessary; see § 3.3.

3.2. Two component variables

While the bivariate case, $v = 2$, is not of great practical interest, it is worthwhile beginning with a discussion of it. Let the observations be $(Y_{i1}, Y_{i2})$ $(i = 1, \ldots, n)$, regarded as $n$ independent observations. The null hypothesis is that these correspond to independent and identically distributed random variables $(Y_1, Y_2)$ with a bivariate normal distribution.
A simple test of the linearity of the regression of $Y_2$ on $Y_1$ is provided by $Q_{2,1}$, the standard Student $t$ statistic for the significance of the regression coefficient of $Y_2$ on $Y_1^2$ in a univariate linear model in which $Y_2$ is regressed on $Y_1$ and $Y_1^2$. In special circumstances nonlinear functions other than $Y_1^2$, for example $1/Y_1$, could be used, or an $F$ statistic could be calculated for regression on a set of such functions.

To treat the component variables symmetrically, we take with $Q_{2,1}$ the statistic $Q_{1,2}$. The joint distribution of $(Q_{2,1}, Q_{1,2})$ is complicated, even though the individual distributions of $Q_{2,1}$ and $Q_{1,2}$ are simple, so that to work with a test depending on both components we consider the asymptotic distribution. In fact under the null hypothesis $(Q_{2,1}, Q_{1,2})$ is asymptotically bivariate normal with zero mean and unit variance. It remains therefore to find the asymptotic correlation coefficient of $Q_{2,1}$ and $Q_{1,2}$. For this we may ignore the denominators of the Student $t$ statistics and examine the correlation between the random variables

$$T_{21} = \sum Y_{i2}\{(Y_{i1} - \bar{Y}_{.1})^2 - (Y_{i1} - \bar{Y}_{.1})\, m_{30}/m_{20} - m_{20}\},$$
$$T_{12} = \sum Y_{i1}\{(Y_{i2} - \bar{Y}_{.2})^2 - (Y_{i2} - \bar{Y}_{.2})\, m_{03}/m_{02} - m_{02}\},$$

where, for example, $\bar{Y}_{.1} = \sum Y_{i1}/n$, $m_{r0} = \sum (Y_{i1} - \bar{Y}_{.1})^r/n$.

We can without loss of generality take the random variables $(Y_{i1}, Y_{i2})$ as bivariate normal of zero mean, unit variances and correlation coefficient $\rho$. Then $Y_{i2} = \rho Y_{i1} + \tau Z_i$, where $(Y_{i1}, Z_i)$ are independently standard normal, so that $\tau^2 = 1 - \rho^2$. It follows that

$$T_{21} = \tau \sum Z_i (Y_{i1}^2 - 1) + O_p(1)$$

and that therefore

$$\operatorname{cov}(T_{21}, T_{12}) \sim n E\{(Y_2 - \rho Y_1)(Y_1^2 - 1)(Y_1 - \rho Y_2)(Y_2^2 - 1)\} = 2n\rho(1 - \rho^2)(2 - 3\rho^2),$$

on evaluating the relevant moments. Since each of $T_{21}$ and $T_{12}$ has asymptotic variance $2n(1 - \rho^2)$, asymptotically

$$\operatorname{corr}(Q_{2,1}, Q_{1,2}) = \rho(2 - 3\rho^2), \tag{1}$$

where $\rho = \operatorname{corr}(Y_1, Y_2)$. This is consistently estimated on replacing $\rho$ by $r_{12}$, the sample correlation coefficient of $Y_1$ and $Y_2$. Thus, if required, we can form a composite test statistic either as

$$\max(|Q_{2,1}|, |Q_{1,2}|) \tag{2}$$

or as the quadratic form

$$\begin{pmatrix} Q_{2,1} & Q_{1,2} \end{pmatrix} \begin{pmatrix} 1 & r_{12}(2 - 3r_{12}^2) \\ r_{12}(2 - 3r_{12}^2) & 1 \end{pmatrix}^{-1} \begin{pmatrix} Q_{2,1} \\ Q_{1,2} \end{pmatrix}. \tag{3}$$

The statistic (2) can, for large samples, be tested for significance from tables of the bivariate normal distribution and the statistic (3) by the chi-squared distribution with two degrees of freedom. Note from (1) that if $|\rho|$ is small $Q_{2,1}$ and $Q_{1,2}$ have a correlation with the same sign as $\rho$, whereas if $|\rho|$ is large ($\rho^2 > 2/3$) the correlation has the opposite sign; this last fact has a simple geometrical interpretation.

If information is available from several independent samples, all concerning the same two variables, a composite statistic can be formed in various ways.

3.3. More than two component variables

The ideas of § 3.2 can be generalized to $v$ component variables in several ways. Among these methods are the following.

(a) We may regress each $Y_r$ linearly on all other $Y_s$ and on $Y_t^2$ and thus obtain a Student $t$ statistic $Q^{(v)}_{r,t}$ $(t \neq r)$ for the quadratic contribution. There are $v(v - 1)$ such statistics, and they can be regarded as forming a $v \times v$ array with empty main diagonal.

(b) The statistics considered in (a) can be supplemented by a further set of statistics $Q^{(v)}_{r,tu}$ $(r \neq t \neq u)$, examining the regression of $Y_r$ on $Y_t Y_u$, adjusting for the linear terms as before. This gives a further $\tfrac{1}{2}v(v - 1)(v - 2)$ statistics, and so in all $\tfrac{1}{2}v^2(v - 1)$ statistics.

(c) Approaches (a) and (b) could be applied to marginal dependencies, regressing $Y_r$ on $Y_s^2$ and $Y_s$, omitting all other variables, i.e. obtaining the statistics $Q^{(2)}_{r,s}$ of § 3.2. More generally suitable $v'$-dimensional definitions ($v' < v$) could be examined.
(d) Instead of isolating single degrees of freedom, we may combine the contributions by forming in the standard way an $F$ statistic for, in case (a), fitting all $Y_s^2$ $(s \neq r)$ in regressing $Y_r$ on all $Y_s$ and all $Y_s^2$.

(e) We may use Tukey's degree of freedom for nonadditivity (Tukey, 1949) to obtain one degree of freedom from each variable in turn. If $Y_r$ is the dependent variable, let $\hat{Y}_r$ be the fitted value arising from linear regression on the remaining variables; then the degree of freedom gives the Student $t$ statistic associated with including $\hat{Y}_r^2$ in the model, in addition to all linear terms. While this procedure has, especially for large $v$, the advantage of limiting the number of subsidiary statistics to be examined, empirical experience suggests that the dangers of overlooking major effects are too great for the procedure to be safely recommended, at least on its own.

In all these methods nonlinear functions such as reciprocals could be used instead of squares.

The methods give rise to a set of test statistics, in general correlated. If there are an appreciable number of these they can be plotted against an appropriate probability scale, often the standard normal; it is known that moderate correlation between the values has little effect on the linearity of the plots. The plots can be replaced by or augmented by approximate significance tests and we discuss these briefly below.

The advantage of procedures that are based on single degrees of freedom is that they give more information for detailed diagnosis if evidence of a departure from the null hypothesis of multivariate normality is found. This suggests that for $v$ not exceeding about 10, one of (a)-(c) should be used. For larger values of $v$, either the variates should be split into meaningful subsections, or (d), or, conceivably for very large $v$, (e), applied.

For the remaining discussion we concentrate on (a) and (c). In either approach, the natural graphical method is to plot the ordered $Q$'s against the expected order statistics in samples of size $m$ from the standard normal distribution, where $m$ is the number of $Q$ statistics to be plotted. It is assumed that the sample size from which the statistics are computed is such that the Student $t$ distribution can be treated as effectively normal; if not, a nonlinear transformation to marginal normality could be applied. For interpretation it is essential that at least the more extreme points in the plot should be labelled with the two defining suffixes. Note that the signs of the $Q$'s are meaningful, provided that the signs of the original variables are, so that a normal plot is appropriate, rather than a half-normal plot of absolute values.

For more detailed numerical interpretation of the $Q$'s, it is natural to consider them as a square array and to examine row and column sums $Q^{(v)}_{r.}$, $Q^{(v)}_{.s}$ or $Q^{(2)}_{r.}$, $Q^{(2)}_{.s}$, or sums of squares

$$S^{(v)}_{r.} = \sum_s \{Q^{(v)}_{r,s}\}^2, \qquad S^{(v)}_{.s} = \sum_r \{Q^{(v)}_{r,s}\}^2. \tag{4}$$

It can be shown that approximately the statistics (4) have means $v - 1$ and variances $2(v - 1)\{1 + 2(v - 2)/n\}$.
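The array of marginal statistics of method (c) and the sums of squares in (4) can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the authors' program: the names `residuals_on_line`, `quadratic_t` and `curvature_array` are hypothetical, the data are assumed to be supplied as a list of $v$ columns of equal length, and the $t$ statistic is obtained by first removing the constant and linear terms from both the response and the squared variable, which gives the same value as fitting the three-term regression directly.

```python
import math

def residuals_on_line(x, y):
    """Residuals of y after least squares regression on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]

def quadratic_t(y_r, y_s):
    """Student t for the coefficient of y_s**2 when y_r is regressed on 1, y_s, y_s**2."""
    n = len(y_r)
    e_y = residuals_on_line(y_s, y_r)                      # response, linear part removed
    e_q = residuals_on_line(y_s, [val * val for val in y_s])  # orthogonalized square
    sqq = sum(q * q for q in e_q)
    coef = sum(q * e for q, e in zip(e_q, e_y)) / sqq
    rss = sum((e - coef * q) ** 2 for q, e in zip(e_q, e_y))
    return coef / math.sqrt(rss / (n - 3) / sqq)           # n - 3 residual degrees of freedom

def curvature_array(columns):
    """v x v array of marginal t statistics with empty diagonal, plus row and column sums of squares."""
    v = len(columns)
    q = [[None] * v for _ in range(v)]
    for r in range(v):
        for s in range(v):
            if r != s:
                q[r][s] = quadratic_t(columns[r], columns[s])
    row_ss = [sum(q[r][s] ** 2 for s in range(v) if s != r) for r in range(v)]
    col_ss = [sum(q[r][s] ** 2 for r in range(v) if r != s) for s in range(v)]
    return q, row_ss, col_ss
```

Here `q[r][s]` plays the role of $Q^{(2)}_{r,s}$, with $Y_r$ the dependent variable and $Y_s$ the variable that is squared. Method (a) would differ only in adjusting for all remaining variables before the $t$ statistic is formed.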
It is hard to give a firm discussion of the relative merits of the statistics $Q^{(v)}$ and $Q^{(2)}$. Computational simplicity to some extent favours $Q^{(2)}$ and this will also have advantages if $v$ is comparable with $n$ and the variables are almost independent, for then the fitting of linear regressions will effectively induce 'noise' masking the effects under study. On the other hand, if strong roughly linear relationships are known to be present, it seems sensible to eliminate them and hence to use the statistics $Q^{(v)}$. If $n$ is large compared with $v$ the simplest general procedure is to use the $Q^{(v)}$.

If information from several samples is combined it will usually be best to take the combined $Q$ as a weighted sum of the separate statistics, weighting by the sample size.

4. INVARIANT PROCEDURES

4.1. General idea

The procedures of § 3 are coordinate-dependent. They in effect look for nonlinearities associated particularly with the variables $Y_1, \ldots, Y_v$; of course an initial transformation of the original data might be made. To obtain an invariant procedure examining nonlinearity the most direct approach is to find that pair of variables, linear combinations of the original variables, such that one has maximum curvature in its regression on the other. The amount of curvature so achieved is the test statistic, and the form of the two maximizing variables will, hopefully, be a useful diagnostic tool.

4.2. Development of directions of maximum curvature: Population theory

In the following discussion we can work either with samples and sample moments, or with random variables and corresponding population moments, which is what is done here.

Suppose that the variable $Y = (Y_1, \ldots, Y_v)^T$ is standardized so that its components have mean zero and let their covariance matrix be $\Sigma = ((\sigma_{rs}))$. For the higher moments, write for $r, s, t, u = 1, \ldots, v$,

$$E(Y_r Y_s Y_t) = \mu(r, s, t), \qquad E(Y_r Y_s Y_t Y_u) = \mu(r, s, t, u).$$

Consider $X = a^T Y$ and $W = b^T Y$ with $a^T \Sigma a = b^T \Sigma b = 1$, so that $X$ and $W$ have zero mean and unit variance. Let $\gamma = \gamma_{XW}$ denote the least squares regression coefficient of $X$ on $W^2$, adjusting for linear regression on $W$. This is found most simply by considering the orthogonalized form,

$$X = \beta W + \gamma\{W^2 - W E(W^3) - 1\} + \epsilon, \tag{5}$$

where $\epsilon$ is an error term uncorrelated with $W$ and $W^2$, so that

$$\gamma_{XW} = \frac{E(XW^2) - E(W^3) E(XW)}{E(W^4) - 1 - \{E(W^3)\}^2}. \tag{6}$$

One population measure of the quadratic contribution to regression is

$$\eta_{XW} = \gamma_{XW}\,[E(W^4) - 1 - \{E(W^3)\}^2]^{1/2}. \tag{7}$$

An interpretation of $\eta^2_{XW}$ is as the proportion of the total unit variance of $X$ accounted for by the quadratic component in the least squares regression of $X$ on $W$ and $W^2$.

We can express $\gamma_{XW}$ and $\eta_{XW}$ in terms of $a$ and $b$. For fixed $b$ we wish to maximize the numerator of $\gamma_{XW}$, that is, to maximize

$$\zeta(a, b) = \sum a_r b_s b_t\, \mu(r, s, t) - \Big\{\sum b_r b_s b_t\, \mu(r, s, t)\Big\} \Big(\sum a_r b_s \sigma_{rs}\Big) \tag{8}$$

subject to $\sum a_r a_s \sigma_{rs} = \sum b_r b_s \sigma_{rs} = 1$. Consider $\zeta(a, b) - \tfrac{1}{2}\lambda \sum a_r a_s \sigma_{rs}$, where $\lambda$ is a Lagrange multiplier, and differentiate with respect to $a_u$ to give for $u = 1, \ldots, v$ at a stationary point

$$\sum b_s b_t\, \mu(u, s, t) - \Big(\sum b_t \sigma_{ut}\Big) \Big\{\sum b_r b_s b_t\, \mu(r, s, t)\Big\} - \lambda \sum a_t \sigma_{ut} = 0. \tag{9}$$

Multiplication by $b_u$ followed by summation over $u$ gives $\lambda \sum a_t b_u \sigma_{ut} = 0$, and multiplication by $a_u$ and summation gives $\zeta(a, b) - \lambda = 0$. Because it is clear that the maximized $\zeta(\cdot, b)$ is nonzero, unless all $\mu(r, s, t)$ are zero, it follows that $\sum a_t b_u \sigma_{ut} = 0$, that is that the associated $X$ and $W$ are uncorrelated. Further,

$$a_u = \Big\{\sum b_s b_t\, \mu(r, s, t)\, \sigma^{ur} - b_u \sum b_r b_s b_t\, \mu(r, s, t)\Big\} \Big/ \zeta(a, b), \tag{10}$$

where $((\sigma^{ij})) = \Sigma^{-1}$, which is assumed to exist.
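The step from the orthogonalized form (5) to (6) and (7) can be checked directly. The following is a short verification under the standardization stated above; the auxiliary symbol $U$ is introduced here only for convenience and does not appear in the original.

```latex
\begin{align*}
U &= W^2 - W\,E(W^3) - 1, \qquad E(W) = 0,\ E(W^2) = 1,\\
E(U) &= 0, \qquad E(UW) = E(W^3) - E(W^3)\,E(W^2) = 0,\\
\operatorname{var}(U) &= E\{(W^2 - 1)^2\} - 2E(W^3)\,E\{W(W^2 - 1)\} + \{E(W^3)\}^2 E(W^2)\\
  &= E(W^4) - 1 - \{E(W^3)\}^2,\\
\operatorname{cov}(X, U) &= E(XW^2) - E(W^3)\,E(XW),\\
\gamma_{XW} &= \operatorname{cov}(X, U)/\operatorname{var}(U), \qquad
\eta_{XW}^2 = \gamma_{XW}^2 \operatorname{var}(U).
\end{align*}
```

Because $U$ is uncorrelated with the constant and with $W$, the coefficient of $U$ is unaffected by the linear term, and $\gamma_{XW}^2 \operatorname{var}(U)$ is the part of the unit variance of $X$ carried by the quadratic component.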
Hence $\eta^2(b)$, the supremum of $\eta^2(a, b)$ over $a$ for fixed $b$, is

$$\eta^2(b) = \frac{\sum b_r b_s b_t b_u\, \mu(r, s, p)\, \mu(t, u, q)\, \sigma^{pq} - \{\sum b_r b_s b_t\, \mu(r, s, t)\}^2}{\sum b_r b_s b_t b_u\, \mu(r, s, t, u) - 1 - \{\sum b_r b_s b_t\, \mu(r, s, t)\}^2}. \tag{11}$$

The required directions for maximum curvature are obtained by maximizing this expression subject to $\sum b_r b_s \sigma_{rs} = 1$; except possibly in extremely special cases, this maximization has to be done numerically. The value of $b$ is obtained directly and that of $a$ by substitution in (10).

4.3. Computation of the maximum curvature

We shall continue with the notation above, although thinking rather more of using sample moments and of calculating the maximum of $\eta^2(b)$ for use as a test statistic. To avoid computational instability arising from gross differences in scale, the variates should be standardized to have unit variance as well as zero mean.

Although the constraint $b^T \Sigma b = 1$ is important for the magnitude of the curvature, $\gamma_{XW}$, it is irrelevant for $\eta_{XW}$ and if no use is made in § 4.2 of the relation $\sum b_r b_s \sigma_{rs} = 1$, then we ultimately obtain the form, homogeneous in $b$,

$$\hat{\eta}^2(b) = \frac{(\sum b_r b_s \hat{\sigma}_{rs}) \{\sum b_r b_s b_t b_u\, \hat{\mu}(r, s, p)\, \hat{\mu}(t, u, q)\, \hat{\sigma}^{pq}\} - \{\sum b_r b_s b_t\, \hat{\mu}(r, s, t)\}^2}{(\sum b_r b_s \hat{\sigma}_{rs}) \{\sum b_r b_s b_t b_u\, \hat{\mu}(r, s, t, u)\} - (\sum b_r b_s \hat{\sigma}_{rs})^3 - \{\sum b_r b_s b_t\, \hat{\mu}(r, s, t)\}^2},$$

where a circumflex denotes a sample value. This expression is now to be maximized without constraint on $b$. In nearly normal cases some simplification can be achieved by giving the denominator of $\hat{\eta}^2(b)$ its normal theory value of $2(\sum b_r b_s \hat{\sigma}_{rs})^3$ and then concentrating on the maximization of the numerator.

Given one or more starting values $b_0$ of $b$, the maximization of $\hat{\eta}^2(b)$ can be carried out by the use of a 'hill-climbing' algorithm. Suitable $b_0$ may be selected from evaluations of $\hat{\eta}^2(b)$ for a sequence of $b$ values defined by the intersections in a grid of lines of 'latitude and longitude' on a half-surface of a $v$-dimensional sphere, noting that $\hat{\eta}^2(b) = \hat{\eta}^2(-b)$. Such a grid may be formed by making uniform divisions of the angles in a system of spherical polar coordinates; the resulting points are spread fairly uniformly over the surface of the hypersphere. If each angular coordinate is divided into $m$ parts ($m > 2$) then there are $\{(m - 1)^{v-1} - 1\}/(m - 2)$ points; this increases very rapidly with $v$, even if a coarse grid, with small $m$, is employed, in which case the probability that the selected $b_0$ lead to the global rather than merely to local maxima is reduced. Also, the effort in evaluating $\hat{\eta}^2(b)$ is roughly proportional to $v^4$, assuming that an array $\hat{\mu}'(r, s, t, u) = \sum \hat{\mu}(r, s, p)\, \hat{\mu}(t, u, q)\, \hat{\sigma}^{pq}$ is used. These two facts combine to make $v = 6$ about the limit for computational feasibility. Larger numbers of variables could, for example, be dealt with by dividing them into subsets of size 6 or less.

For a test statistic we concentrate on the global maximum of $\hat{\eta}^2(b)$.
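The evaluation and maximization of the homogeneous form can be sketched as below. Several choices here are assumptions made for illustration and are not the authors' procedure: the function names (`solve`, `eta2_hat`, `maximize_eta2`) are hypothetical; the sample moments are formed with divisor $n$ directly from the projections $W_i = b^T Y_i$ instead of from a precomputed fourth-order array; random starting directions replace the latitude-longitude grid; and a simple random-perturbation search stands in for the hill-climbing algorithm.

```python
import random

def solve(matrix, rhs):
    """Solve matrix x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def eta2_hat(rows, b):
    """Homogeneous sample form of eta^2(b) for an n x v list of observations."""
    n, v = len(rows), len(b)
    means = [sum(row[p] for row in rows) / n for p in range(v)]
    y = [[row[p] - means[p] for p in range(v)] for row in rows]   # centred data
    w = [sum(bp * yp for bp, yp in zip(b, row)) for row in y]     # W = b'Y
    sigma = [[sum(row[p] * row[q] for row in y) / n for q in range(v)]
             for p in range(v)]
    s = sum(wi ** 2 for wi in w) / n      # sum b_r b_s sigma_rs
    m3 = sum(wi ** 3 for wi in w) / n     # sum b_r b_s b_t mu(r,s,t)
    m4 = sum(wi ** 4 for wi in w) / n     # sum b_r b_s b_t b_u mu(r,s,t,u)
    # m[p] = sum over r,s of b_r b_s mu(r,s,p)
    m = [sum(row[p] * wi ** 2 for row, wi in zip(y, w)) / n for p in range(v)]
    quad = sum(mp * xp for mp, xp in zip(m, solve(sigma, m)))     # m' Sigma^{-1} m
    return (s * quad - m3 ** 2) / (s * m4 - s ** 3 - m3 ** 2)

def maximize_eta2(rows, n_starts=100, n_steps=300, seed=0):
    """Crude search for the maximum: random unit starts, then random perturbations."""
    rng = random.Random(seed)
    v = len(rows[0])

    def unit(b):
        norm = sum(bi * bi for bi in b) ** 0.5
        return [bi / norm for bi in b]

    starts = [unit([rng.gauss(0.0, 1.0) for _ in range(v)]) for _ in range(n_starts)]
    best_b = max(starts, key=lambda b: eta2_hat(rows, b))
    best = eta2_hat(rows, best_b)
    step = 0.5
    for _ in range(n_steps):
        trial = unit([bi + step * rng.gauss(0.0, 1.0) for bi in best_b])
        value = eta2_hat(rows, trial)
        if value > best:
            best_b, best = trial, value
        else:
            step *= 0.98                  # shrink the step after a failed move
    return best, best_b
```

A search of this kind can stop at a local maximum, so in practice several seeds or a systematic grid of starting values would be compared, in line with the remarks above on local maxima.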
For interpretation, however, in a nonnull case, it may well be useful to know the several roughly equal local maxima.

4.4. The null hypothesis distribution

Denote the maximum of $\hat{\eta}^2(b)$ by $\hat{\eta}^2_{\max}$. To apply a significance test based on $\hat{\eta}^2_{\max}$ we need to know at least approximately its distribution under the null hypothesis of multivariate normality. Clearly this distribution depends only on $v$ and $n$. Analytical study of the distribution seems not to be feasible. Simulation shows that approximately for $n > 50$, $v \le 6$, $\log \hat{\eta}^2_{\max}$ is normally distributed with mean $\log\{(5v^2)/(8n)\}$ and standard deviation 0.90 ($v = 2$), … ($v = 3$), 0.38 ($v = 4$), 0.31 ($v = 5$), 0.17 ($v = 6$).

It would be good to have some qualitative explanation both of the log normal shape and of the form of the mean and standard deviation. The nature of the dependence on $n$ may be accounted for in general terms by the following argument. For fixed $a$ and $b$, $\hat{\eta}^2(a, b)$ has a distribution with asymptotic mean and variance $3(v - 1)/(2n)$ and $9(v - 1)/(2n^2)$ respectively, under the null hypothesis. We now consider fitting this with the log normal distribution, corresponding to the distribution $N(\mu, \sigma^2)$, which has mean $e^{\mu + \sigma^2/2}$ and variance $e^{2\mu + \sigma^2}(e^{\sigma^2} - 1)$. Upon equating moments, we obtain

$$\mu = \log\{\tfrac{3}{2}(v - 1)^{3/2}(v + 1)^{-1/2} n^{-1}\}, \qquad \sigma^2 = \log\{(v + 1)/(v - 1)\}.$$

Of course $\hat{\eta}^2_{\max} \ge \hat{\eta}^2(a, b)$, so that in fitting a log normal distribution to $\hat{\eta}^2_{\max}$ the equalities above could only be maintained by the introduction of constants that were functions of $n$ and $v$. However, the asymptotic dependence on $n$ should remain unaltered, and hence in making the transition from $\hat{\eta}^2(a, b)$ to $\hat{\eta}^2_{\max}$ the similarity, as functions of $n$, between the $\mu$ and $\sigma$ above and those obtained empirically, should be preserved.
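The moment-matching step can be reproduced numerically. The sketch below takes the asymptotic mean and variance quoted above as given and solves the two log normal moment equations for $\mu$ and $\sigma^2$; the function names `lognormal_parameters` and `fixed_direction_parameters` are hypothetical.

```python
import math

def lognormal_parameters(mean, variance):
    """Return (mu, sigma2) such that exp(N(mu, sigma2)) has the given mean and variance."""
    sigma2 = math.log(1.0 + variance / mean ** 2)   # from variance/mean^2 = exp(sigma2) - 1
    mu = math.log(mean) - 0.5 * sigma2              # from mean = exp(mu + sigma2/2)
    return mu, sigma2

def fixed_direction_parameters(v, n):
    """Log normal parameters matched to mean 3(v-1)/(2n) and variance 9(v-1)/(2n^2)."""
    mean = 3.0 * (v - 1) / (2.0 * n)
    variance = 9.0 * (v - 1) / (2.0 * n ** 2)
    return lognormal_parameters(mean, variance)
```

The ratio of variance to squared mean is $2/(v - 1)$, free of $n$, so $\sigma^2 = \log\{(v + 1)/(v - 1)\}$ does not depend on the sample size, while $\mu$ depends on $n$ only through the additive term $-\log n$.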

Note that quite apart from its use as a test statistic $\hat{\eta}^2_{\max}$ has a direct numerical interpretation as a maximal proportion of variance accounted for by quadratic regression.

5. INTERPRETATION

While the procedures of §§ 3 and 4 have been described in the first place as tests of significance, if evidence of nonlinearity is found some interpretation has always to be attempted. In the absence of a simple widely applicable alternative family of distributions, no general rules can be given, but the following comments may be helpful.

Inspection of scatter diagrams will always be required for interpretation. For the procedure of § 4 a first step will be to examine the plot for the derived variables for which $\hat{\eta}^2$ is maximal. For the coordinate-dependent approach of § 3, pairs of original variates may be plotted, or alternatively residuals of $Y_r$ and $Y_s$ from their linear regression on the remaining variables, when $Q^{(v)}_{r,s}$ appears interestingly large.

In clearcut cases the nonlinearity will arise either from a small number of aberrant points, which will then need special consideration, or from a consistent curvature. If the curvature arises in connection with only one or two component variables, it may be sensible to treat the remainder as multivariate normal and to describe separately the dependence of the anomalous variables on the remainder. Consistent patterns of signs in the curvature may indicate the general nature of appropriate transformations.

It would in principle be possible to develop techniques corresponding to those of §§ 3 and 4 but using some form of robust regression rather than least squares regression. We have not investigated this.

6. AN EXAMPLE

To illustrate the way in which the results might be applied, we now give in brief outline an analysis of some data circulated some years ago by Dr P. D. P. Wood, Milk Marketing Board, to the Multivariate Study Group of the Royal Statistical Society.
The data comprised 8 measurements on the pelvis of each of 90 Friesian cows. The upper estimates in Table 1 give the first four estimated moments of the marginal distributions. For samples of size 90 from a univariate normal distribution, the lower and upper 5% points for $g_1$ are roughly -0.6 and 0.6, and for $g_2$ … and 1.47, respectively.

A frequency plot for variable 7 showed an observation at 292 mm, the range for the other animals being 150 to 223 mm. All observations on this extreme animal were omitted in the subsequent analyses. In particular this omission reduced $g_1$ and $g_2$ for variable 7 to … and … . Study of the marginal distributions showed no other obvious outliers, although the first variable was markedly bimodal. Note that there is no evidence of systematic skewness; a log transformation was therefore not applied.

Table 1. Pelvic measurements on cows. First four marginal moments. Upper values, all 90 cows. Lower values, selected 84 cows.
[Table body, with rows including Mean (mm) and St. dev. (mm), is not recoverable from the source.]

The next step was to use the coordinate-dependent methods of § 3.3. Method (e), using Tukey's degree of freedom for nonadditivity, was tried; the largest Student $t$ statistic out of 8 is 2.58, for the regression of variable 2 on the square of the fitted value of its linear regression on the other variables, but such a value is not markedly extreme.

The array of $t$ statistics generated by method (a) contained a number of abnormally large values, mostly but not entirely connected with variable 2. Various scatter plots showed that there were 5 animals which in the space of the first 5 variables form an outlying group not on the main linear regression and therefore inducing curvature. They, too, were omitted for separate interpretation and method (a) of § 3.3 reapplied to the remaining 84 individuals. Table 2 shows the resulting Student $t$ statistics and their marginal sums of squares: see (4). The ranked Student $t$ statistics can be plotted against expected normal order statistics. There is nothing untoward.

The invariant procedure of § 4 was then applied to these remaining 84 individuals, taking 6 variables at a time for computational reasons. The largest value of $\hat{\eta}^2_{\max}$ obtained was 0.34, which is not only well short of statistical significance but corresponds to only a modest degree of curvature.

Table 2. Pelvic measurements. Curvature analysis for 84 selected cows.
[Table body, an array of Student $t$ statistics indexed by dependent variable (rows) and squared variable (columns) with sums of squares for each row and column, is not recoverable from the source.]

The marginal moments for the 84 individuals are recorded in the lower values of Table 1; there is some bimodality in variable 1.

To summarize, one outlier and five anomalous individuals have been detected. The remaining 84 individuals, while showing some evidence of marginal nonnormality, show no evidence of nonlinearity, and interpretation of the interrelationships among the 8 variables via their covariance matrix seems in order.
No doubt these conclusions could be reached via other routes.

We are grateful to Dr P. D. P. Wood for permission to use the data analysed in § 6. N. J. H. Small's work was supported by the Science Research Council.

REFERENCES

ANDREWS, D. F., GNANADESIKAN, R. & WARNER, J. L. (1971). Transformations of multivariate data. Biometrics 27.
ANDREWS, D. F., GNANADESIKAN, R. & WARNER, J. L. (1973). Methods for assessing multivariate normality. In Multivariate Analysis, Vol. 3, Ed. P. R. Krishnaiah. New York: Academic Press.
BARNDORFF-NIELSEN, O. (1977). Discussion of paper by D. R. Cox. Scand. J. Statist. 4.
BOX, G. E. P. & COX, D. R. (1964). An analysis of transformations. J. R. Statist. Soc. B 26.
COX, D. R. (1968). Notes on some aspects of regression analysis. J. R. Statist. Soc. A 131.
DAY, N. R. (1969). Divisive cluster analysis and a test for multivariate normality. Bull. I.S.I. 43, 2.

DURBIN, J. (1973). Distribution Theory for Tests Based on the Sample Distribution Function. Philadelphia: Society for Industrial and Applied Mathematics.
GNANADESIKAN, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. New York: Wiley.
HEALY, M. J. R. (1968). Multivariate normal plotting. Appl. Statist. 17.
MALKOVICH, J. F. & AFIFI, A. A. (1973). On tests for multivariate normality. J. Am. Statist. Assoc. 68.
MARDIA, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika 57.
MARDIA, K. V. (1975). Assessment of multinormality and the robustness of Hotelling's T2 test. Appl. Statist. 24.
TUKEY, J. W. (1949). One degree of freedom for non-additivity. Biometrics 5.

[Received November … Revised February 1978]


Robust Regression via Discriminant Analysis
Author(s): A. C. Atkinson and D. R. Cox
Source: Biometrika, Vol. 64, No. 1 (Apr., 1977), pp. 15-19
Published by: Oxford University Press on