
Biometrika Trust

Analysis of Variability with Large Numbers of Small Samples
Author(s): D. R. Cox and P. J. Solomon
Source: Biometrika, Vol. 73, No. 3 (Dec., 1986)
Published by: Oxford University Press on behalf of Biometrika Trust

Biometrika (1986), 73, 3. Printed in Great Britain

Analysis of variability with large numbers of small samples

BY D. R. COX
Department of Mathematics, Imperial College, London SW7 2BZ, U.K.

AND P. J. SOLOMON
Department of Statistics, University of Adelaide, Adelaide, S.A., Australia 5001

SUMMARY

Procedures are discussed for the detailed analysis of distributional form, based on many samples of size r, where especially r = 2, 3, 4. The possibility of discriminating between different kinds of departure from the standard normal assumptions is discussed. Both graphical and more formal procedures are developed and illustrated by some data on pulse rates.

Some key words: Graphical methods; Kurtosis; Nonnormality; Order statistics; Overdispersion; Pulse rate; Skewness.

1. INTRODUCTION

It is common both in some kinds of balanced experimental design and in schemes of routine testing to have quite large numbers of small groups of observations, each group obtained under the same conditions. For example, in a large study, blood pressure measurements might be taken in duplicate on each patient visit, or in certain routine chemical analyses triplicate samples from the same source might be dealt with independently. There can be major problems in ensuring the independence of such replicate observations, which is necessary to ensure that the relevant source of variability is not underestimated. Assuming that this independence is achieved, preferably by appropriate 'blinding', we use the replicate observations primarily to improve the precision of the mean, or other location estimate, but also to estimate a component of variance within samples, usually measuring sampling or measurement error. A further use is to check for gross errors affecting a single observation. In the present paper, however, we examine what further information can be extracted from such data.
Throughout we suppose that, possibly after transformation, the 'standard' assumption of a normal distribution with constant variance is at least a reasonable starting point for an analysis. Of course more detailed analysis is likely to be worthwhile only if the variation within samples is of intrinsic interest. We assume throughout that rounding errors, digit preferences and the like are relatively unimportant and that the variability studied is not an artefact.

2. SOME GENERAL IDEAS

2.1. Formulation of models

Suppose that we have m independent samples each of size r, the observations in the ith sample being $Y_{i1}, \ldots, Y_{ir}$ $(i = 1, \ldots, m)$. As a basis for the analysis we consider a number of possible models for the corresponding random variables; the simplest is the standard, normal theory one.

Normal theory model, MN. The $Y_{ij}$ are independently normally distributed with constant variance $\sigma^2$ and with $E(Y_{ij}) = \mu_i$.

As explained in § 1 the focus of interest in the present paper is not the $\{\mu_i\}$, which may indeed have some additional structure specified, for instance, by a regression or factorial model. We shall typically suppose that there are substantial differences in mean present and that no useful information about the internal variability can be recovered from the between-sample variation.

There are numerous ways in which interesting departures from MN may occur and we shall consider just three.

Systematic changes in variance, MSY. Here the assumptions of MN hold except that $\mathrm{var}(Y_{ij}) = \sigma_i^2$, which is not constant but is a function either of an explanatory variable $z_i$ characterizing the ith population or of $\mu_i$. Such variation can, if necessary, be represented in various ways (Cook & Weisberg, 1983), e.g.
$$\sigma_i^2 = e^{\beta z_i}\sigma_0^2, \qquad \sigma_i^2 = e^{\beta \mu_i}\sigma_0^2,$$
where $\sigma_0^2$ is a 'baseline' variance and $\beta$ captures the systematic dependency present. An important extension allows for more complex multidimensional dependence: for example, if the samples are arranged in a row x column array, there may be systematic differences in variance between rows, between columns, or both.

Complementary to MSY is the possibility that, while each population has a different variance, the changes in variance are random, unrelated to any observed feature. This is a model of overdispersion relative to MN.

Overdispersion model, MOD. Again the normal theory assumptions of MN are modified only by allowing each population to have a different variance, $\mathrm{var}(Y_{ij}) = \sigma_i^2$, but now $T_i = \sigma_i^2$ $(i = 1, \ldots, m)$ are independent unobserved values of a random variable T having a probability density function h(t).
In the type of application we have in mind, it will often not be feasible to estimate the form of h(t) with any precision, and it may then be adequate to assume that T has an inverse gamma distribution, with density
$$(\tfrac12 f_0\tau_0)^{\frac12 f_0}\, t^{-1-\frac12 f_0} \exp(-\tfrac12 f_0\tau_0/t)\,\{\Gamma(\tfrac12 f_0)\}^{-1}, \qquad (1)$$
where $f_0$ plays the role of an 'effective degrees of freedom' and $E(T) = \tau_0 f_0 (f_0 - 2)^{-1}$.

For our final model, we suppose that, except for location, all populations have the same distribution, which is, however, nonnormal.

Nonnormal model, MNN. All $\{Y_{ij}\}$ are independent and the density of $Y_{ij}$ is $g(y - \mu_i)$, where g(x) is a nonnormal density of zero mean. We write $\kappa_r$ for the rth cumulant of $Y_{ij}$ and $\rho_r = \kappa_r/\sigma^r$ $(r \geq 3)$.

Of course, there are other possibilities and, in particular, we could combine MSY, MOD and MNN in various ways, and also introduce models involving serial correlation.

We are interested in methods for detecting departures from MN, in the estimation of relevant parameters in MSY, MOD and MNN, and in studying the feasibility of discriminating between these three kinds of departure, in particular between MOD and MNN. While for very small values of r, separation of MOD and MNN may often not be possible, note that if there is substantial underdispersion in the sample estimates of variance, MOD can at once be eliminated from consideration.

No special model for gross errors has been included. Both MOD and MNN can simulate such errors, in the first case via an occasional very large 'true' variance and in the second via a very long-tailed error distribution. Note that with r > 2 if gross errors were detected to be predominantly in one direction, MNN rather than MOD would be required.

To some extent further analysis and interpretation may be quite strongly influenced by what kind of departure from MN is most appropriate. Thus, as between MOD and MNN, the former prompts the question 'why are some groups more variable than others?', whereas under the latter the errors may more reasonably be presumed to have a totally homogeneous structure.

2.2. Some simple properties of MOD and MNN

For the ith sample, we write
$$\bar Y_{i.} = \sum_j Y_{ij}/r, \qquad S_i = \sum_j (Y_{ij} - \bar Y_{i.})^2$$
for the sample mean and sum of squares. These are sufficient under both MN and MOD. It is convenient also to write $\hat\sigma_i = \{S_i/(r-1)\}^{1/2}$ and $\hat\tau_i = \hat\sigma_i^2$ for the usual estimates of standard deviation and variance.

For the overdispersion model MOD with the inverse gamma compounding density (1), the likelihood of the ith sample is
$$\frac{(f_0\tau_0)^{\frac12 f_0}\,\Gamma(\tfrac12 f_0 + \tfrac12 r)}{\Gamma(\tfrac12 f_0)\,\pi^{\frac12 r}\,\{f_0\tau_0 + r(\bar Y_{i.} - \mu_i)^2 + S_i\}^{\frac12 f_0 + \frac12 r}}, \qquad (2)$$
which is of the Student's t form.
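If the $\{\mu_i\}$ are regarded as nuisance parameters, we examine the marginal likelihood based on $S_i$, which is
$$\frac{S_i^{\frac12 r - \frac32}\,\{1 + S_i/(f_0\tau_0)\}^{-\frac12 f_0 - \frac12 r + \frac12}}{(f_0\tau_0)^{\frac12 r - \frac12}\, B(\tfrac12 f_0, \tfrac12 r - \tfrac12)}. \qquad (3)$$
For the general compounding density h(t), (3) is replaced by
$$\int_0^\infty t^{-1} q_{r-1}(S_i t^{-1})\, h(t)\, dt, \qquad (4)$$
where $q_{r-1}(x)$ is the probability density of the chi-squared distribution with r - 1 degrees of freedom.

Estimation under the model MOD can be based on (3). Alternatively, and also for discriminating between MOD and MNN, it is useful to record the cumulants $\{\kappa_j(\hat\tau_i)\}$ of $\hat\tau_i$ in terms of the cumulants $\{\lambda_j\}$ of the compounding density h(t). These are, with $\sigma^2 = \lambda_1 = E(T)$,
$$E(\hat\tau_i) = \sigma^2, \qquad \mathrm{var}(\hat\tau_i) = \sigma^4\left\{\frac{2}{r-1} + \frac{r+1}{r-1}\,\frac{\lambda_2}{\sigma^4}\right\}, \qquad (5)$$
and
$$\kappa_3(\hat\tau_i) = \sigma^6\left\{\frac{8}{(r-1)^2} + \frac{12(r+1)}{(r-1)^2}\,\frac{\lambda_2}{\sigma^4} + \frac{(r+1)(r+3)}{(r-1)^2}\,\frac{\lambda_3}{\sigma^6}\right\}.$$
Because $\lambda_2/\sigma^4, \lambda_3/\sigma^6, \ldots$ are the cumulants of a nonnegative random variable of unit mean, $\lambda_3/\sigma^6 \geq (\lambda_2/\sigma^4)(\lambda_2/\sigma^4 - 1)$, with equality attained by a two-point distribution with one atom at zero. Therefore
$$\kappa_3(\hat\tau_i) \geq \sigma^6\left\{\frac{8}{(r-1)^2} + \frac{(r+1)(9-r)}{(r-1)^2}\,\frac{\lambda_2}{\sigma^4} + \frac{(r+1)(r+3)}{(r-1)^2}\left(\frac{\lambda_2}{\sigma^4}\right)^2\right\}.$$
Note that negative skewness in the distribution of $\hat\tau_i$ is possible, although, for [...] of r, very large values of [...] would be necessary to bring this about.

By comparison we have under the nonnormal model MNN (Kendall & Stuart, 1969, Ch. 12)
$$E(\hat\tau_i) = \sigma^2, \qquad \mathrm{var}(\hat\tau_i) = \sigma^4\left\{\frac{2}{r-1} + \frac{\rho_4}{r}\right\},$$
$$\kappa_3(\hat\tau_i) = \sigma^6\left\{\frac{8}{(r-1)^2} + \frac{4(r-2)}{r(r-1)^2}\,\rho_3^2 + \frac{12}{r(r-1)}\,\rho_4 + \frac{\rho_6}{r^2}\right\}. \qquad (7)$$
From (5) and (7) we can compute fairly directly via the first two moments of the $\{\hat\tau_i\}$ estimates of either $\lambda_2/\sigma^4$ in MOD or $\rho_4$ in MNN. Both are dimensionless measures of the departure from the standard conditions MN.

For r > 2 we can compute for each sample scale and location invariant measures of 'shape', e.g. the standardized third cumulant. Under MN and MOD these are distributed independently of $\bar Y_{i.}$ and $S_i$, but this independence is in general lost under MNN.

We shall in the subsequent discussion suppose that the population means $\{\mu_i\}$ are such that no useful information about error properties can be gleaned from the sample means $\bar Y_{i.}$. Note, however, that under MNN $\mathrm{cov}(\bar Y_{i.}, S_i) = (r-1)\kappa_3/r$. If the $\{\mu_i\}$ are distributed with variance $\sigma_\mu^2$, this covering both systematic and random variation, we have
$$\mathrm{var}(\bar Y_{i.}) = \sigma^2/r + \sigma_\mu^2, \qquad \mathrm{var}(S_i) = 2(r-1)\sigma^4 + (r-1)^2\kappa_4/r,$$
so that
$$\mathrm{corr}(\bar Y_{i.}, S_i) = \frac{(r-1)\kappa_3/r}{(\sigma^2/r + \sigma_\mu^2)^{1/2}\,\{2(r-1)\sigma^4 + (r-1)^2\kappa_4/r\}^{1/2}}. \qquad (8)$$
Under MNN there is thus some possibility that skewness will induce a correlation between $\bar Y_{i.}$ and $S_i$, which could be misinterpreted as evidence for MSY. The correlation (8) is, however, likely to be small under the conditions we assume.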
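The remark that (5) and (7) give moment estimates of $\lambda_2/\sigma^4$ or $\rho_4$ can be sketched as follows. This is only an illustrative reading of those two equations, not code from the paper: the function names `tau_hat` and `moment_estimates` are hypothetical, and the estimates simply equate the sample mean and variance of the $\hat\tau_i$ to their expectations.

```python
from statistics import mean, variance


def tau_hat(sample):
    """Usual variance estimate S_i / (r - 1) for one small sample."""
    r = len(sample)
    ybar = sum(sample) / r
    return sum((y - ybar) ** 2 for y in sample) / (r - 1)


def moment_estimates(samples):
    """Return (sigma^2, lambda_2/sigma^4 under MOD, rho_4 under MNN).

    Assumes all samples have the same size r >= 2. The last two values
    are alternative interpretations of the same excess variability.
    """
    r = len(samples[0])
    taus = [tau_hat(s) for s in samples]
    sigma2 = mean(taus)
    # Estimate of var(tau_hat) / sigma^4.
    ratio = variance(taus) / sigma2 ** 2
    # Excess over the normal-theory value 2 / (r - 1).
    excess = ratio - 2.0 / (r - 1)
    lam2_ratio = excess * (r - 1) / (r + 1)  # from (5)
    rho4 = excess * r                        # from (7)
    return sigma2, lam2_ratio, rho4


sigma2, lam2_ratio, rho4 = moment_estimates([[0, 1, 2], [0, 2, 4], [1, 1, 4]])
```

A negative value of the excess corresponds to underdispersion, which, as noted in § 2.1, is incompatible with MOD.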


2.3. Procedures based on the distribution of $S_i$

In § 2.2 some of the properties of the sample sums of squares have been given, especially under MOD and MNN. We now consider some corresponding statistical procedures.

For preliminary analysis a probability plot of the ordered $\{S_i\}$ against the expected order statistics of the chi-squared distribution with r - 1 degrees of freedom (Pearson & Hartley, 1972, Table 20) is natural.

When r is not too small, say $r \geq 5$, and a number of $S_i$ are available, a powerful general method of analysis (Bartlett & Kendall, 1946) under MN is to employ linear methods for $\log S_i$, using the fact that under sampling a normal distribution
$$\mathrm{var}(\log S_i) = \psi'(\tfrac12 r - \tfrac12), \qquad (9)$$
where $\psi(z)$ is the digamma function and the values for r = 2, 3, 4 are 4.93, 1.64 and 0.93. We shall not make extensive use of this in the present work, partly because of the severe loss of efficiency when r = 2, 3 and partly because of the undue sensitivity to small values of $S_i$ and the failure of the method without ad hoc modification if $S_i = 0$; such values could quite easily arise from rounding errors. As one example of its use, however, one could test for departure from MN in the direction of MOD or MNN by comparing
$$(m-1)^{-1}\left\{\sum (\log S_i)^2 - \left(\sum \log S_i\right)^2/m\right\}$$
with (9).

For MOD expansion of (3) or (4) for small $1/f_0$ or small dispersion of h(x) shows that the locally most powerful test is based on the distribution of $\sum S_i^2$ given $\sum S_i$, or equivalently the marginal distribution of the dispersion index
$$I = \frac{\left\{\sum S_i^2 - \left(\sum S_i\right)^2/m\right\}(m-1)^{-1}}{\left(\sum S_i/m\right)^2\{2/(r-1)\}}. \qquad (10)$$
The divisor 2/(r - 1) ensures that, under MN, $I \to 1$ in probability as $m \to \infty$. Note that detailed specification of h(x) is unnecessary for the local optimality property.
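The normal-theory variance (9) is a trigamma value and is easy to check numerically. The sketch below uses only the Python standard library. The helper `trigamma` (an upward recurrence followed by the standard asymptotic series) and the function name `var_log_s` are assumptions introduced for illustration, not part of the paper.

```python
import math


def trigamma(x):
    """psi'(x) for x > 0: shift x upward, then use the asymptotic series."""
    total = 0.0
    while x < 6.0:
        total += 1.0 / (x * x)  # psi'(x) = psi'(x + 1) + 1 / x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))
    return total + inv + 0.5 * inv2 + series


def var_log_s(r):
    """Variance of log S_i under MN for samples of size r, equation (9)."""
    return trigamma(0.5 * (r - 1))


values = {r: round(var_log_s(r), 2) for r in (2, 3, 4)}
```

The sample variance of the $\log S_i$ can then be set against `var_log_s(r)`, bearing in mind the sensitivity to very small or zero $S_i$ noted in the text.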
For large m, I is asymptotically normal: because of the skewness of I, a rather better approximation is obtained by taking log I to be normal with mean $-(r+1)/\{m(r-1)\}$ and variance $2(r+1)/\{m(r-1)\}$.

2.4. Detection of systematic changes

We now consider the examination of possible systematic relations between variance and an explanatory variable or between variance and mean. Again graphical analysis will usually be a natural first step, supplemented where appropriate by a test statistic which we take in the form
$$T = \sqrt{m}\,\sum a_i S_i \Big/ \sum S_i, \qquad (11)$$
where we can without loss of generality scale the explanatory variable so that $\sum a_i = 0$, $\sum a_i^2 = m$. For instance if the sample means $\bar Y_{i.}$ are taken as the explanatory variable
$$a_i = (\bar Y_{i.} - \bar Y_{..})\Big/\left\{\sum_j (\bar Y_{j.} - \bar Y_{..})^2/m\right\}^{1/2},$$
where $\bar Y_{..} = \sum \bar Y_{i.}/m$.

There are two ways of obtaining a null hypothesis distribution of T. The first is to note that under MN all permutations of $\{S_1, \ldots, S_m\}$ are equally likely; hence, with $\bar S_. = \sum S_i/m$,
$$E(T) = 0, \qquad \mathrm{var}(T) = \left\{\sum (S_i - \bar S_.)^2/m\right\}\Big/\bar S_.^2. \qquad (12)$$
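As a minimal sketch, the dispersion index (10) and the log-normal approximation quoted at the start of this passage can be computed as below. The function names are hypothetical, and the z-score is just the standardization implied by the stated mean and variance of log I, not a procedure spelled out in the paper.

```python
import math


def dispersion_index(S, r):
    """Index I of (10) from the sums of squares S of m samples of size r."""
    m = len(S)
    total = sum(S)
    sample_var = (sum(s * s for s in S) - total * total / m) / (m - 1)
    return sample_var / ((total / m) ** 2 * 2.0 / (r - 1))


def log_index_z(S, r):
    """Standardized log I, using mean -c and variance 2c with c = (r+1)/{m(r-1)}."""
    m = len(S)
    c = (r + 1) / (m * (r - 1))
    return (math.log(dispersion_index(S, r)) + c) / math.sqrt(2.0 * c)


I_example = dispersion_index([1.0, 2.0, 3.0, 4.0], r=3)
z_example = log_index_z([1.0, 2.0, 3.0, 4.0], r=3)
```

Large positive values of the z-score point towards overdispersion (MOD or MNN), large negative values towards underdispersion. With only four sums of squares, as in this toy call, the asymptotic approximation is of course not to be trusted.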

Under weak assumptions, asymptotic normality will hold as $m \to \infty$: for higher permutation moments, see Cox & Hinkley (1974, p. 185). An 'exact' permutation test is in principle possible.

Under MN a more sensitive analysis is possible, by using the normal theory distribution of the ratio of quadratic forms rather than the permutation distribution. A reasonably simple approximate test is derived by writing
$$\mathrm{pr}(T \leq x) = \mathrm{pr}\left\{\sum (a_i - x/\sqrt{m})S_i \leq 0\right\} \simeq \Phi\left[\frac{(r-1)x}{\{2(r-1)(1 + x^2/m)\}^{1/2}}\right].$$
Thus for a two-sided equi-tailed test at level $2\alpha$ we need critical limits $\pm x^*$ defined by
$$\frac{(r-1)x^*}{\{2(r-1)(1 + x^{*2}/m)\}^{1/2}} = k^*_\alpha,$$
where $\Phi(-k^*_\alpha) = \alpha$. That is,
$$x^* = \{2(r-1)^{-1}\}^{1/2}\, k^*_\alpha\, \{1 - 2(r-1)^{-1}k^{*2}_\alpha/m\}^{-1/2}. \qquad (13)$$
For a more refined calculation it is possible to introduce a correction based on the standardized skewness and kurtosis of the random variable $\sum (a_i - x/\sqrt{m})S_i$, namely
$$\left\{\frac{8}{(r-1)m}\right\}^{1/2}\{\rho_{3a} - 3x/\sqrt{m} + O(m^{-1})\}, \qquad \frac{12}{(r-1)m}\{\rho_{4a} + 3 - 4x\rho_{3a}/\sqrt{m} + O(m^{-1})\},$$
where $\rho_{3a} = \sum a_i^3/m$, $\rho_{4a} = \sum a_i^4/m - 3$. Numerical work in the special case r = 2 suggests that for the purpose of the present paper, where m is likely to be quite large, use of (13) is entirely adequate. Note that the simpler approximation (13) with the 1/m term omitted is equivalent to the permutation test (12) with a normal theory value for var(T), as is clear on general grounds.

3. SAMPLES OF SIZE TWO

We now consider in more detail what is probably the most common case in applications, namely r = 2, when a large number of duplicate observations are available. Provided that the numbering of observations within a pair is uninformative, we may replace the ith pair of observations $(Y_{i1}, Y_{i2})$ by
$$\bar Y_{i.} = \tfrac12(Y_{i1} + Y_{i2}), \qquad S_i = \tfrac12(Y_{i1} - Y_{i2})^2.$$
An initial analysis for detecting systematic dependencies is to plot $S_i$ against $\bar Y_{i.}$ or some other suitable explanatory variable $z_i$. For a formal test we use § 2.4. If there is no systematic relation, attention is focused on the marginal distribution of the $\{S_i\}$. Under the standard model MN, $S_i/\sigma^2$ has the chi-squared distribution with one degree of freedom.
Probably the simplest graphical analysis is a seminormal plot of the ordered $\sqrt{S_i}$; see Pearson & Hartley (1972, Table 21). Departure in the direction of underdispersion would indicate MNN whereas overdispersion could indicate either MOD or MNN.
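For duplicate observations, the reduction to $(\bar Y_{i.}, S_i)$ and the formal test of § 2.4 might be sketched as follows. The function names are hypothetical. The statistic is taken as $T = \sqrt{m}\sum a_i S_i/\sum S_i$ with the sample means as explanatory variable, and the critical limit solves $x^*\{(r-1)/2\}^{1/2}(1 + x^{*2}/m)^{-1/2} = k^*_\alpha$, which is one reading of the normal-theory approximation of § 2.4 and should be treated as an assumption.

```python
import math
from statistics import NormalDist


def pair_summaries(pairs):
    """Replace each pair (y1, y2) by its mean and sum of squares S = (y1 - y2)^2 / 2."""
    return [((y1 + y2) / 2.0, (y1 - y2) ** 2 / 2.0) for y1, y2 in pairs]


def trend_statistic(z, S):
    """T with the explanatory values z scaled so that sum a = 0 and sum a^2 = m."""
    m = len(S)
    zbar = sum(z) / m
    scale = math.sqrt(sum((v - zbar) ** 2 for v in z) / m)
    a = [(v - zbar) / scale for v in z]
    return math.sqrt(m) * sum(ai * si for ai, si in zip(a, S)) / sum(S)


def critical_limit(alpha, r, m):
    """Approximate limit x* for a two-sided test at level 2 * alpha."""
    k = NormalDist().inv_cdf(1.0 - alpha)  # Phi(-k) = alpha
    c = 2.0 / (r - 1)
    denom = 1.0 - c * k * k / m
    if denom <= 0.0:
        raise ValueError("m is too small for this approximation")
    return math.sqrt(c) * k / math.sqrt(denom)


summaries = pair_summaries([(1, 3), (2, 2), (5, 9), (4, 10)])
T = trend_statistic([s[0] for s in summaries], [s[1] for s in summaries])
x_star = critical_limit(0.025, r=2, m=200)
```

The four pairs are purely illustrative; the approximation is intended for large m, which is why the critical limit is evaluated here at a hypothetical m = 200 rather than at m = 4.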

Under MNN, it is clear that even with a very large value of m only even-order cumulants of the distribution of the y's can be determined. Thus there is no way of studying the skewness of y, at least so long as there is no relevant information in the variation between the $\{\bar Y_{i.}\}$.

To supplement the plot of $S_i$ versus $\bar Y_{i.}$ the index of dispersion can be calculated, and by (5) and (7) regarded as estimating $1 + \tfrac32(\lambda_2/\sigma^4)$ under MOD, and $1 + \tfrac14\rho_4$ under MNN.

4. SAMPLES OF SIZE THREE

4.1. General discussion

When we have samples of size three rather than samples of size two, there is appreciable extra sensitivity in having estimates of variance with two degrees of freedom rather than one, but in some ways the more interesting difference lies in the richer possibilities for examining distributional shape and in particular of detecting skewness under the nonnormal model, MNN.

Thus an initial analysis will often consist of: (i) a plot of $S_i$ versus $\bar Y_{i.}$ or a similar explanatory variable $z_i$, supplemented where necessary by the test statistic (11); (ii) a plot of the ordered $S_i$ versus the expected order statistics from an exponential distribution, supplemented where necessary by the test statistic (10). Note that $e_{pm}$, the expected value of the pth smallest observation in samples of size m from the unit exponential distribution, is
$$e_{pm} = \frac{1}{m} + \cdots + \frac{1}{m-p+1} \simeq \begin{cases} \log\{(m+\tfrac12)/(m-p+\tfrac12)\} & (p \neq m), \\ \log(m+\tfrac12) + \gamma & (p = m), \end{cases}$$
where $\gamma$ is Euler's constant.

We now concentrate on the more detailed analysis of the variation and this is most naturally done in terms of the order statistics.

4.2. Order statistics in samples of three

It is convenient to drop temporarily the suffix indicating the particular sample and to write $Y_{(1)} \leq Y_{(2)} \leq Y_{(3)}$ for the order statistics.
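The exponential scores $e_{pm}$ used in plot (ii) of § 4.1 can be computed exactly as a partial harmonic sum and compared with the logarithmic approximation. The sketch below is illustrative only. The function names are hypothetical and the numerical value of Euler's constant is hard-coded.

```python
import math

EULER_GAMMA = 0.5772156649015329


def exp_score_exact(p, m):
    """Expected p-th smallest of m unit exponentials: 1/m + ... + 1/(m - p + 1)."""
    return sum(1.0 / k for k in range(m - p + 1, m + 1))


def exp_score_approx(p, m):
    """Logarithmic approximation to e_pm, with a separate form for p = m."""
    if p == m:
        return math.log(m + 0.5) + EULER_GAMMA
    return math.log((m + 0.5) / (m - p + 0.5))


# Scores against which the ordered S_i would be plotted, for m = 100 triples.
scores = [exp_score_exact(p, 100) for p in range(1, 101)]
```

The approximation is close over most of the range but deteriorates for p just below m, so the exact sum, which is cheap to compute, is probably the safer choice in software.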
The central results were given by Fisher (1930) in connection with a study of the exact distribution of sample estimates of skewness in which induction on sample size was used, samples of three being the 'starting point'. If $\hat\sigma^2 = \tfrac12 S$ is the usual estimate of variance, the one 'degree of freedom' that describes the skewness of the sample can be taken in dimensionless form as
$$w = (Y_{(3)} - 2Y_{(2)} + Y_{(1)})/\hat\sigma.$$
Under the standard normal model, MN, and also under MOD, (w, S) are independent, so that clear dependence between (w, S) is an indication that MNN should be considered.

To obtain the joint distribution of $(\hat\sigma, w)$ under MN and MNN it is convenient, following Fisher (1930), to write
$$Y_{(3)} = Y_{(2)} + \hat\sigma(\cos v + \sqrt3 \sin v), \qquad Y_{(1)} = Y_{(2)} - \hat\sigma(\cos v - \sqrt3 \sin v).$$
Note that $w = 2\sqrt3 \sin v$ and that $t = 6v/\pi = (6/\pi)\sin^{-1}\{w/(2\sqrt3)\}$ takes values in (-1, 1). Now the joint density of $\{Y_{(1)}, Y_{(2)}, Y_{(3)}\}$ is $6g(y_{(1)})g(y_{(2)})g(y_{(3)})$ and, on transforming to new variables $(\hat\sigma, y_{(2)}, v)$, the density of $(\hat\sigma, v)$ is
$$12\sqrt3\,\hat\sigma \int_{-\infty}^{\infty} g(x - \hat\sigma\cos v + \sqrt3\hat\sigma\sin v)\, g(x)\, g(x + \hat\sigma\cos v + \sqrt3\hat\sigma\sin v)\, dx. \qquad (14)$$
It follows immediately that for MN the statistics $\hat\sigma$ and v are indeed independent and that the marginal density of v is uniform; i.e. that of t is uniform on (-1, 1). Incidentally the usual unbiased estimate of the third cumulant is
$$\tfrac32 \sum (y_j - \bar y)^3 = \sqrt3\,\hat\sigma^3 \sin(3v),$$
showing explicitly that consideration of w or v is essentially equivalent to that of the standardized third cumulant ratio. Note that the individually standardized third cumulant lies in $(-\sqrt3, \sqrt3)$.

There are the following open questions connected with (14). (i) Does knowledge of the joint distribution of $(\hat\sigma, v)$ determine g(x) uniquely, except for a translation? (ii) Does independence of $\hat\sigma$ and v imply that g(x) is normal? (iii) Is there a simple necessary and sufficient condition for a given distribution of $(\hat\sigma, v)$ to be representable in the form (14)? (iv) What statistic or statistics are theoretically most sensitive for detecting departures from MN?

Note that under MOD, $\hat\sigma$ and v are independent, v is uniform and $\hat\sigma$ or S has the overdispersed distribution discussed in § 2.2. Also if it is required to estimate skewness under the model MNN averaging of individually standardized estimates is inappropriate, in particular in the light of the constraint mentioned above: a consistent estimate of $\kappa_3/\sigma^3$ is
$$\frac{3}{2m}\sum_i\sum_j (Y_{ij} - \bar Y_{i.})^3 \Big/ \left(\sum_i \hat\sigma_i^2/m\right)^{3/2}.$$
A reasonable analysis in practice is: (a) the examination of the marginal distribution of the $S_i$ via the probability plot and test statistic mentioned in §
4.1; (b) inspection of a scatter plot of $t_i$ versus $\hat\sigma_i$, clear dependence showing evidence against MN and MOD; (c) inspection of the marginal distribution of $t_i$ supplemented by calculation of the mean of $t_i$ as a statistic, nonzero values of which would indicate skewness in the model MNN.

5. SAMPLES OF SIZE MORE THAN THREE

We discuss only briefly corresponding procedures for samples of size four and more. The natural extension of the statistic w of § 4 is provided via linear functions of order statistics with a simple interpretation and zero expectation under normality. Thus for samples of size four we start with measures of skewness and kurtosis
$$(Y_{(4)} - Y_{(3)}) - (Y_{(2)} - Y_{(1)}), \qquad (Y_{(4)} - Y_{(3)}) + (Y_{(2)} - Y_{(1)}) - k(Y_{(3)} - Y_{(2)}),$$
choosing k so that the second of these has zero expectation under MN: in fact k = 2.465. This leads us to define
$$w' = \{Y_{(4)} - Y_{(3)} - Y_{(2)} + Y_{(1)}\}/\hat\sigma, \qquad w'' = \{Y_{(4)} - 3.465Y_{(3)} + 3.465Y_{(2)} - Y_{(1)}\}/\hat\sigma.$$
It is easily shown that, under MN and MOD, w' and w'' are independent of $\hat\sigma$, var(w') = 0.771, var(w'') = [...]. They are also uncorrelated but far from independent and far from normally distributed. It is likely, therefore, that the most effective procedure is to compute w' and w'' from each set of data and to examine the sample means and variances for consistency with the normal theory values. Departures would have a fairly clear diagnostic value. The sample estimate of cov(w', w'') could also be calculated, although it is unclear what interpretation is to be put on a nonzero value.

6. SOME EXAMPLES

Samples of size three. As an illustration, we analyse some pulse rates from the International Prospective Primary Prevention Study in Hypertension, a large scale clinical trial. Before entry and randomization to treatments, patients typically attended three qualifying visits at least one day apart. Pulse rate, the number of beats per minute, was one of several variables measured at each pre-entry visit. The data analysed here consist of three pulse rates, y, for a sample of one hundred men and one hundred women.

The analysis goes in three broad steps. First, a plot, not given here, of the within-patient sum of squares $S_i$ versus the mean pulse rate $\bar Y_{i.}$ shows clear evidence that $S_i$ increases with $\bar Y_{i.}$, thus indicating either MSY or, just conceivably, sampling correlation between $S_i$ and $\bar Y_{i.}$ under MNN. Clearly MN is inappropriate. However, use of the reciprocal pulse rate removes the systematic relation between sum of squares and mean. The reciprocal pulse rate has a direct interpretation as the mean time between beats.
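The constant k of § 5 can be checked numerically from the expected normal order statistics in samples of four. The sketch below uses Simpson's rule and the standard library only. The function names are hypothetical, and $\hat\sigma$ is taken as $\{S/(r-1)\}^{1/2}$ with r = 4, following the definition in § 2.2. By symmetry the zero-expectation condition reduces to $k = \{E(Y_{(4)}) - E(Y_{(3)})\}/E(Y_{(3)})$.

```python
import math
from statistics import NormalDist


def expected_normal_order_stat(j, n, steps=4000, lim=8.0):
    """E(Y_(j)) for n standard normal observations, by Simpson's rule."""
    nd = NormalDist()
    coef = math.factorial(n) / (math.factorial(j - 1) * math.factorial(n - j))
    h = 2.0 * lim / steps  # steps must be even for Simpson's rule
    total = 0.0
    for i in range(steps + 1):
        x = -lim + i * h
        F = nd.cdf(x)
        f = x * nd.pdf(x) * F ** (j - 1) * (1.0 - F) ** (n - j)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * f
    return coef * total * h / 3.0


e4 = expected_normal_order_stat(4, 4)
e3 = expected_normal_order_stat(3, 4)
k = (e4 - e3) / e3


def shape_stats_four(y, k_value):
    """Return (w', w'') for one sample of four; assumes the values are not all equal."""
    y1, y2, y3, y4 = sorted(y)
    ybar = (y1 + y2 + y3 + y4) / 4.0
    sigma_hat = math.sqrt(sum((v - ybar) ** 2 for v in (y1, y2, y3, y4)) / 3.0)
    w1 = ((y4 - y3) - (y2 - y1)) / sigma_hat
    w2 = ((y4 - y3) + (y2 - y1) - k_value * (y3 - y2)) / sigma_hat
    return w1, w2
```

For example, `shape_stats_four([-3, -1, 1, 3], k)` gives w' = 0 for this symmetric, equally spaced sample, with a slightly negative w''.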
The changes induced by reciprocal transformation are supported by the statistic T; for women, T = 0.897 on the original scale, but this changes to [...] for reciprocals; correspondingly for men, T changes from [...] to 0.094 under the reciprocal transformation.

The second step is a probability plot of the ordered $S_i$ versus the expected exponential order statistics. Both the pulse rates and reciprocal pulse rates show overdispersion, but less for reciprocals; see Fig. 1. The very large values of $S_i$ correspond to women with large variation in pulse rate, variation which, however, may be clinically meaningful. Therefore it is on the whole sensible to regard such values as part of the population under study rather than as aberrant values to be rejected. Supplementing the plot by the index of overdispersion I of (10) verifies the improved distributional properties of the reciprocals. Under MNN, from (7), I leads to an estimate of the dimensionless fourth cumulant, namely [...] and [...] for pulse rates for women and men, and on the reciprocal scale, [...] and 4.602, respectively.

The third step in the analysis plots $S_i$ versus the angular measure of skewness $t_i$. We found it useful to examine for comparison simulated data from other distributions such as the normal, exponential and Student's t distributions. Normal data produced a random scatter as expected whereas exponential data gave a plot which was clearly positively skew.
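The comparison with simulated data can be imitated in a few lines. This is a sketch under stated assumptions, not the authors' simulation: the angular statistic is taken as $t = (6/\pi)\sin^{-1}\{w/(2\sqrt3)\}$ with $w = (Y_{(3)} - 2Y_{(2)} + Y_{(1)})/\hat\sigma$ and $\hat\sigma^2 = \tfrac12 S$, and the seed, the sample count and the function names are arbitrary choices.

```python
import math
import random


def triple_stats(y):
    """Return (S, t) for one sample of three; t is None if all values coincide."""
    y1, y2, y3 = sorted(y)
    ybar = (y1 + y2 + y3) / 3.0
    S = sum((v - ybar) ** 2 for v in (y1, y2, y3))
    sigma_hat = math.sqrt(S / 2.0)
    if sigma_hat == 0.0:
        return S, None
    w = (y3 - 2.0 * y2 + y1) / sigma_hat
    # Clamp against rounding error before taking the inverse sine.
    s = max(-0.5, min(0.5, w / (2.0 * math.sqrt(3.0))))
    return S, 6.0 * math.asin(s) / math.pi


def mean_t(draw, n, rng):
    """Average t over n simulated triples produced by draw(rng)."""
    ts = []
    for _ in range(n):
        _, t = triple_stats([draw(rng) for _ in range(3)])
        if t is not None:
            ts.append(t)
    return sum(ts) / len(ts)


rng = random.Random(1986)
normal_mean = mean_t(lambda g: g.gauss(0.0, 1.0), 4000, rng)
expo_mean = mean_t(lambda g: g.expovariate(1.0), 4000, rng)
```

Under this definition a triple with two equal values gives t = -1 or t = 1, and an equally spaced triple gives t = 0. The mean of t should be near zero for normal data and clearly positive for exponential data; plotting $S_i$ against $t_i$ for each simulated set would reproduce the kind of comparison described in the text.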

Fig. 1. Entry pulse rate for 100 women. Triples. Crosses, $S_i$ versus exponential score, $e_i$; circles, $S_i$ versus exponential score, $e_i$, after reciprocal transformation.

Fig. 2. Entry pulse rate for 100 women. Triples. (a) $S_i$ versus angular statistic, $t_i$. (b) $S_i$ versus angular statistic, $t_i$, after reciprocal transformation.

Fig. 3. Entry pulse rate for 100 women. Pairs. Crosses, seminormal plot of ordered $\sqrt{S_i}$; circles, seminormal plot of ordered $\sqrt{S_i}$ after reciprocal transformation. Horizontal axis: seminormal expected order statistic.


Figure 2(a) shows that $S_i$ and $t_i$ for pulse rates are not independent. Reciprocals improve the symmetry; see Fig. 2(b). The skewness measure $t_i$ is characterized by concentrations of points at -1, 1 and 0, the first two values a consequence of grouping; its distribution is certainly nonuniform, both on the original and reciprocal scales, so that MOD is not appropriate. The women exhibit slightly negatively skew pulse rates, giving mean $t_i$ [...] and standard deviation [...]. On the reciprocal scale, $t_i$ has mean [...] and standard deviation 0.705: the mean is exaggerated by one very large $S_i$. By contrast, the men exhibit slight positive skewness in pulse rate with mean $t_i$ [...] and standard deviation 0.739; for reciprocals, mean $t_i$ is [...] with standard deviation [...]. The assumption of symmetry in the distribution of the reciprocals is further supported by the usual standardized measure of skewness which takes values of [...] and [...] for women and for men.

Exactly unit values of $t_i$ are accounted for by rounding. Simulations suggest, however, that rounding cannot entirely explain the anomalous distribution of $t_i$. The grouping could have been aggravated by some temporal correlation. In particular, when two out of three rates agree, do they occur together as consecutive values? Pooling men and women, of 81 individuals with two rates the same, 19 had a different rate at the second visit, whereas 34 had a different rate at the first, and 28 at the third visit. This tends to suggest that some doctors may recall the previous pulse rate at the subsequent visit.

In summary, the reciprocal pulse rates appear to have a symmetric, nonnormal, long-tailed distribution, suggesting that Student's t distribution with a small number of degrees of freedom may be appropriate. This provides a suitably flexible family of symmetrical distributions if parametric representation is required.

Samples of size two.
Partly in order to compare results, we have analysed a series of pairs of observations formed by taking just the first and last pulse rate from the data discussed above. Reciprocals certainly reduce the size of the kurtosis, being typically positive. However, reciprocals for women exhibit slight negative kurtosis as shown in Fig. 3, although testing with I, which under MNN leads to an estimate of $\rho_4$ as [...] for reciprocals, this is by no means statistically significant. Also, the other values of I vary little. In particular, for women, pulse rates give $\rho_4$ as 3.335, and for men 6.054, which changes to [...] for reciprocals.

Samples of size four. Data on strengths of yarn (Cox & Snell, 1981, p. 131) in the form of 12 samples each of size four show excellent agreement with the normal theory values for w' and w''. Namely, the observed mean w' for bobbins is -0.0178 with observed standard deviation 0.710, and for w'', the observed mean is -0.466 with standard deviation 1.561; the theoretical standard deviations are respectively [...] and [...] and the standard errors of the means thus [...] and 0.461.

ACKNOWLEDGEMENTS

We are grateful to the Science and Engineering Research Council and to Ciba-Geigy Ltd (Basle) for support of this work.

REFERENCES

BARTLETT, M. S. & KENDALL, D. G. (1946). The statistical analysis of variance-heterogeneity and the logarithmic transformation. Suppl. J. R. Statist. Soc. 8.
COOK, R. D. & WEISBERG, S. (1983). Diagnostics for heteroscedasticity in regression. Biometrika 70, 1-10.
COX, D. R. & HINKLEY, D. V. (1974). Theoretical Statistics. London: Chapman & Hall.
COX, D. R. & SNELL, E. J. (1981). Applied Statistics. London: Chapman & Hall.
FISHER, R. A. (1930). The moments of the distribution for normal samples of measures of departure from normality. Proc. R. Soc. A 130.
KENDALL, M. G. & STUART, A. (1969). Advanced Theory of Statistics, 1, 3rd ed. High Wycombe: Griffin.
PEARSON, E. S. & HARTLEY, H. O. (1972). Biometrika Tables for Statisticians, 2. High Wycombe: Griffin.

[Received December [...]. Revised April 1986]
