A Note on the Efficiency of Least-Squares Estimates
Author(s): D. R. Cox and D. V. Hinkley
Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 30, No. 2 (1968), pp. 284-289
Published by: Wiley for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2984507
Accessed: 24-10-2017 13:15 UTC

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://about.jstor.org/terms

A Note on the Efficiency of Least-squares Estimates

By D. R. COX and D. V. HINKLEY
Imperial College

[Received December 1966. Revised May 1967]

SUMMARY

A linear model is considered in which errors are independent and identically distributed with zero mean. If the error distribution is specified, except possibly for unknown parameters, the asymptotic efficiency of least-squares estimates relative to maximum-likelihood estimates can be found. For regression parameters orthogonal to the general mean the asymptotic efficiency, which is independent of the design matrix, is calculated explicitly for an Edgeworth series, for a Pearson Type VII distribution and for a log gamma distribution.

1. INTRODUCTION

Consider a linear model in which random variables $Y_1, \ldots, Y_n$ have the form

$$Y_i = \sum_{q=0}^{p} x_{iq}\beta_q + \epsilon_i = \mu_i + \epsilon_i, \tag{1}$$

say, where $\beta_0, \ldots, \beta_p$ are unknown parameters, the $x_{iq}$'s are known constants and $\epsilon_1, \ldots, \epsilon_n$ are random terms of zero mean. There is much work on the effect of departures from "standard" conditions on the analysis of (1); in the present paper we concentrate on the effect of distributional form on the large-sample properties of the least-squares estimates of the $\beta_q$'s. That is, $\epsilon_1, \ldots, \epsilon_n$ are assumed independent and identically distributed with probability density function $f(\epsilon; \lambda)$ of zero mean, $\lambda$ being an unknown parameter giving the dispersion, and possibly also the shape, of the distribution. The "standard" assumption here is, of course, that the distribution is normal.

Provided that $\operatorname{var}(\epsilon) < \infty$, the least-squares estimates $\hat\beta_q$ have an asymptotic normal distribution with covariance matrix

$$\Bigl\lVert \sum_i x_{ir} x_{is} \Bigr\rVert^{-1} \operatorname{var}(\epsilon), \tag{2}$$

under mild conditions on the $n \times (p+1)$ design matrix. The maximum-likelihood estimates $\tilde\beta_q$ also are asymptotically normal, and the asymptotic efficiency of the least-squares estimate $\hat\beta_q$ is the ratio of the asymptotic variances

$$\operatorname{var}(\tilde\beta_q) / \operatorname{var}(\hat\beta_q). \tag{3}$$

Alternatively we may restrict attention to unbiased estimates [...] $\operatorname{var}(\hat\beta_q)$ to the minimum variance (given by [...]) [...] asymptotic measure of efficiency. The results of Sections 2, 3 and 4 apply equally to the asymptotic efficiency and to this second definition. The restriction to unbiased estimates is usually artificial, although when the distribution of the $\epsilon_i$'s is symmetrical it is likely that most reasonable estimates of $\beta_q$ will be unbiased.
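The covariance expression (2) can be illustrated by simulation. The following is a minimal sketch, not taken from the paper: it assumes a single centred regressor, unit-variance uniform errors (one convenient non-normal choice), and hypothetical function names. It compares the empirical variance of the least-squares slope with $\operatorname{var}(\epsilon)/\sum_i x_i^2$.

```python
import random
import statistics


def ls_slope(x, y):
    """Least-squares slope when the regressor x is centred (sum(x) == 0)."""
    sxx = sum(xi * xi for xi in x)
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def simulate_slope_variance(n=20, reps=10000, beta0=1.0, beta1=2.0, seed=0):
    """Return (empirical, theoretical) variance of the least-squares slope."""
    rng = random.Random(seed)
    # Centred design: values symmetric about zero, so sum(x) == 0.
    x = [i - (n - 1) / 2 for i in range(n)]
    sxx = sum(xi * xi for xi in x)
    half_width = 3 ** 0.5  # uniform(-sqrt(3), sqrt(3)) has mean 0, variance 1
    slopes = []
    for _ in range(reps):
        y = [beta0 + beta1 * xi + rng.uniform(-half_width, half_width) for xi in x]
        slopes.append(ls_slope(x, y))
    empirical = statistics.pvariance(slopes)
    theoretical = 1.0 / sxx  # var(eps) / sum(x_i^2) with var(eps) = 1
    return empirical, theoretical


if __name__ == "__main__":
    emp, theo = simulate_slope_variance()
    print(f"empirical {emp:.6f}  theoretical {theo:.6f}")
```

The error distribution enters the least-squares variance only through $\operatorname{var}(\epsilon)$; how much a maximum-likelihood estimate could improve on it is the question the paper addresses.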

For non-normal distributions of the $\epsilon_i$'s, the least-squares estimates have minimum variance among unbiased estimates that are linear combinations of the $Y_i$'s. The maximum-likelihood estimates will, however, be non-linear in the $Y_i$'s and the ratio (3) less than one.

For a very careful statement of the asymptotic argument, we would need to consider a sequence of problems in which $n \to \infty$ for a fixed number of parameters. We shall, however, not do this explicitly, merely supposing that $n$ is large and that the usual arguments of asymptotic theory are applicable. As noted above, our results have also a non-asymptotic interpretation.

Recently Anscombe (1967) developed a likelihood analysis of (1) when the $\epsilon_i$'s have a Pearson Type VII distribution; that is, when

$$f(\epsilon; \lambda) = \frac{a(\lambda_2)}{\lambda_1} \left\{ 1 + \frac{\epsilon^2}{c(\lambda_2)\,\lambda_1^2} \right\}^{-\lambda_2}, \tag{4}$$

where

$$c(\lambda_2) = [\ldots], \qquad a(\lambda_2) = [\ldots]; \tag{5}$$

that is, the $\epsilon_i$'s are distributed proportionally to Student's $t$. Here parameters $\lambda_1$, $\lambda_2$ determine the scale and shape of the distribution. The present paper arose from an attempt to assess how far the estimates from Anscombe's analysis are likely to differ from least-squares estimates.

2. GENERAL RESULTS

Often the general mean is included in the linear model and is not a parameter of primary interest. Then, without loss of generality, we suppose that

$$x_{i0} = 1 \quad (i = 1, \ldots, n), \qquad \sum_i x_{iq} = 0 \quad (q = 1, \ldots, p), \tag{6}$$

and take $\beta_1, \ldots, \beta_p$ as the parameters of interest, orthogonal to the general mean. The log likelihood is

$$L = \sum_i \log f(y_i - \mu_i; \lambda) = \sum_i g(y_i - \mu_i; \lambda), \tag{7}$$

say. An [...] the estimation problem associated with (7) is regular. For instance, if the distribution of the $\epsilon_i$'s has finite range, the range of regularity of (7) depends on the $\mu_i$'s, and the usual formulae for asymptotic variances are inapplicable. We shall, however, assume that this difficulty does not arise and proceed to evaluate the expected second derivatives of the log likelihood.

First, let $r, s = 1, \ldots, p$. Then

$$\frac{\partial^2 L}{\partial\beta_r\,\partial\beta_s} = \sum_i \frac{\partial^2 g(y_i - \mu_i; \lambda)}{\partial\beta_r\,\partial\beta_s} = \sum_i g''(y_i - \mu_i; \lambda)\, x_{ir} x_{is}, \tag{8}$$

where the derivatives are for fixed $\lambda$. Thus

$$E\left( -\frac{\partial^2 L}{\partial\beta_r\,\partial\beta_s} \right) = \Bigl( \sum_i x_{ir} x_{is} \Bigr) A_\lambda, \tag{9}$$

where

$$A_\lambda = E\{-g''(\epsilon; \lambda)\} = \int \frac{\{f'(\epsilon; \lambda)\}^2}{f(\epsilon; \lambda)}\, d\epsilon. \tag{10}$$
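The quantity $A_\lambda = E\{-g''(\epsilon; \lambda)\}$ can be evaluated numerically for a given error density. The sketch below is an illustration rather than part of the paper: it assumes a standard Student $t$ density with $\nu$ degrees of freedom (unit scale), uses the regular-case identity $E\{-g''\} = E\{(g')^2\}$, and integrates with a plain Simpson rule; the function names, integration limits and grid size are arbitrary choices. For this density the closed form $(\nu+1)/(\nu+3)$ is a standard result and serves as a check.

```python
import math


def t_density(x, nu):
    """Standard Student t density with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_score(x, nu):
    """g'(x) = d/dx log f(x) for the Student t density."""
    return -(nu + 1) * x / (nu + x * x)


def simpson(func, a, b, n):
    """Composite Simpson rule on [a, b] with n (made even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def location_information(nu, limit=100.0, n=100000):
    """Approximate A = E{g'(eps)^2} for the t density by integrating on [-limit, limit]."""
    return simpson(lambda x: t_score(x, nu) ** 2 * t_density(x, nu),
                   -limit, limit, n)


if __name__ == "__main__":
    for nu in (3.0, 5.0, 9.0):
        print(nu, location_information(nu), (nu + 1) / (nu + 3))
```

Rescaling the errors by a factor $s$ divides this information by $s^2$ and multiplies the variance by $s^2$, so the product of information and variance does not depend on the scale.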

The quantity $A_\lambda$ determines the precision of estimating a location parameter from simple random samples from $f(y - \theta; \lambda)$ for known $\lambda$ (Fisher, 1925).

Next, if $\lambda_1$ is a component of the parameter $\lambda$ determining distributional form, we have similarly that

$$\frac{\partial^2 L}{\partial\beta_r\,\partial\lambda_1} = -\sum_i x_{ir}\, \frac{\partial g'(y_i - \mu_i; \lambda)}{\partial\lambda_1} \tag{11}$$

and

$$E\left( \frac{\partial^2 L}{\partial\beta_r\,\partial\lambda_1} \right) = 0 \tag{12}$$

by virtue of (6). Similarly, (6) implies that

$$E\left( \frac{\partial^2 L}{\partial\beta_0\,\partial\beta_r} \right) = 0 \quad (r = 1, \ldots, p). \tag{13}$$

Thus, by (12) and (13), the information matrix for the full set of unknown parameters partitions into the form

$$\begin{pmatrix} I_1 & 0 \\ 0 & I_2 \end{pmatrix},$$

where $I_1$ refers to the parameters $\beta_0$ and $\lambda$, and $I_2$ to the parameters $\beta_1, \ldots, \beta_p$; equation (9) gives the $(r, s)$th element of $I_2$. Further, by the orthogonality condition (6), the covariance matrix of the least-squares estimates $\hat\beta_1, \ldots, \hat\beta_p$ is $\lVert \sum_i x_{ir} x_{is} \rVert^{-1} \operatorname{var}(\epsilon)$. Hence the required measure of asymptotic efficiency (3) is

$$\{A_\lambda \operatorname{var}(\epsilon)\}^{-1}. \tag{14}$$

This is independent of the design matrix; it is the same as the efficiency of the mean of a random sample considered as an estimate of the population mean, the parameter $\lambda$ being known.

Finally, for particular distributions the expected value of the random variable in (11) may be zero. Then it is easily shown that $E(\partial^2 L / \partial\beta_0\,\partial\lambda) = 0$ also, and the result (14) applies to all the regression parameters; that is, the preliminary orthogonalization (6) is unnecessary. This is the case, for example, if $\lambda$ is a single scale parameter and the distribution is symmetric; that is, if $f(\epsilon; \lambda) = \lambda^{-1} f(\epsilon/\lambda)$ and $f(x) = f(-x)$.

3. TWO SPECIAL CASES

For the Type VII distribution given by (4) and (5) it can be shown that [...]

$$A_\lambda = [\ldots], \qquad \operatorname{var}(\epsilon) = [\ldots];$$

for the second result see, for example, Kendall and Stuart (1961, p. 59). Thus the asymptotic efficiency (14) is, for $\lambda_2 > \tfrac{3}{2}$,

$$\frac{(\lambda_2 + 1)(2\lambda_2 - 3)}{\lambda_2 (2\lambda_2 - 1)}. \tag{15}$$
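The Type VII efficiency can be tabulated directly. The sketch below rests on two assumptions that are not spelled out in the scanned text: that (15) has the closed form $(\lambda_2+1)(2\lambda_2-3)/\{\lambda_2(2\lambda_2-1)\}$, and that the Type VII shape corresponds to Student's $t$ with $\nu = 2\lambda_2 - 1$ degrees of freedom, for which the location information $(\nu+1)/(\nu+3)$ and variance $\nu/(\nu-2)$ at unit scale are standard. Under those assumptions the two routes agree, and the values match the Type VII efficiencies quoted in the paper's Table 1 (for example 0.500 at $\lambda_2 = 2$ and 0.984 at $\lambda_2 = 10$). Function names are illustrative.

```python
def type7_efficiency(lam2):
    """Efficiency (15) of least squares for Type VII errors, lam2 > 3/2."""
    if lam2 <= 1.5:
        raise ValueError("the error variance is finite only for lam2 > 3/2")
    return (lam2 + 1) * (2 * lam2 - 3) / (lam2 * (2 * lam2 - 1))


def student_t_efficiency(nu):
    """1 / (information * variance) for Student t errors, nu > 2.

    Uses information (nu + 1) / (nu + 3) and variance nu / (nu - 2);
    any common scale factor cancels in the product.
    """
    if nu <= 2:
        raise ValueError("the error variance is finite only for nu > 2")
    return (nu + 3) * (nu - 2) / ((nu + 1) * nu)


if __name__ == "__main__":
    for lam2 in (2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 10.0):
        nu = 2 * lam2 - 1  # assumed correspondence with Student's t
        print(f"{lam2:5.1f}  {type7_efficiency(lam2):.3f}  "
              f"{student_t_efficiency(nu):.3f}")
```

For example, `type7_efficiency(3.0)` gives 0.8, a 20 per cent loss of efficiency for errors proportional to Student's $t$ with 5 degrees of freedom.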

For some purposes it is instructive to express this in terms of the coefficient of kurtosis, $\gamma_2 = 6/(2\lambda_2 - 5)$; (15) then becomes

$$\frac{(1 + \tfrac{7}{6}\gamma_2)(1 + \tfrac{1}{3}\gamma_2)}{(1 + \tfrac{5}{6}\gamma_2)(1 + \tfrac{2}{3}\gamma_2)}, \tag{16}$$

now for $\lambda_2 > \tfrac{5}{2}$. Some numerical results are given in the next section.

A second interesting special case arises when the error distribution has the gamma form with a fixed coefficient of variation; that is, with a fixed index, say $\lambda$. For many purposes it is then natural to think first of a multiplicative model; that is, to take the additive model (1) as applying to the logarithms of the original observations. That is, in (1) we take $\epsilon_i$ as having a log gamma distribution of zero mean. The asymptotic efficiency (14) is then (Bartlett and Kendall, 1946)

$$\{\lambda \psi'(\lambda)\}^{-1}, \tag{17}$$

where $\psi'(\lambda)$ is the trigamma function, the derivative of the digamma function $\psi(\lambda)$. Again some numerical results are given in the next section. The parameter $\lambda$ can, if required, be expressed in terms of either the coefficient of skewness or the coefficient of kurtosis by

$$\gamma_1 = \psi''(\lambda)\{\psi'(\lambda)\}^{-3/2}, \qquad \gamma_2 = \psi'''(\lambda)\{\psi'(\lambda)\}^{-2}. \tag{18}$$

4. EDGEWORTH SERIES

To obtain the efficiency for a more general class of distributions, we have taken the Edgeworth series

$$\begin{aligned}
f(\epsilon; \gamma) = \phi(\epsilon) \Bigl[ 1 &+ \tfrac{1}{6}\gamma_1 H_3(\epsilon) + \bigl\{ \tfrac{1}{24}\gamma_2 H_4(\epsilon) + \tfrac{1}{72}\gamma_1^2 H_6(\epsilon) \bigr\} \\
&+ \bigl\{ \tfrac{1}{120}\gamma_3 H_5(\epsilon) + \tfrac{1}{144}\gamma_1\gamma_2 H_7(\epsilon) + \tfrac{1}{1296}\gamma_1^3 H_9(\epsilon) \bigr\} \\
&+ \bigl\{ \tfrac{1}{720}\gamma_4 H_6(\epsilon) + \tfrac{1}{1152}\gamma_2^2 H_8(\epsilon) + \tfrac{1}{720}\gamma_1\gamma_3 H_8(\epsilon) + \tfrac{1}{1728}\gamma_1^2\gamma_2 H_{10}(\epsilon) + \tfrac{1}{31104}\gamma_1^4 H_{12}(\epsilon) \bigr\} \Bigr],
\end{aligned} \tag{19}$$

where $\phi(\epsilon)$ is the standardized normal density function, $H_r(\epsilon)$ is the Hermite polynomial of order $r$, and the shape parameter $\lambda$ in the general notation is replaced by $\gamma$. We have without loss of generality taken the variance to be one. In terms of a notional parameter $N$, $\gamma_1 = O(N^{-1/2})$, $\gamma_2 = O(N^{-1})$, ... and (19) is derived from an asymptotic expansion as $N \to \infty$. For the Type VII distribution $N \sim \lambda_2$, and for the log gamma distribution $N \sim \lambda$. By (19)

$$g(\epsilon; \gamma) = \text{const} - \tfrac{1}{2}\epsilon^2 + \log\{1 + \tfrac{1}{6}\gamma_1 H_3(\epsilon) + \ldots\},$$

leading to a development in powers of $N^{-1/2}$ when the logarithm is expanded; we assume $f(\epsilon; \gamma)$ to be such that the expansion is valid.
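The log gamma quantities in (17) and (18), which also supply the $\gamma_1$, $\gamma_2$ values used with the Edgeworth series, can be evaluated without special-function libraries. The sketch below is illustrative: it assumes $\psi^{(m)}(x) = (-1)^{m+1} m! \sum_{k \ge 0} (x+k)^{-(m+1)}$, sums the first terms directly and approximates the remainder with a short Euler-Maclaurin tail. The helper names and the `shift` parameter are hypothetical choices, not from the paper.

```python
import math


def hurwitz_zeta(s, x, shift=20):
    """Sum over k >= 0 of (x + k)**(-s), for s > 1 and x > 0.

    The first `shift` terms are summed directly; the remainder uses
    an Euler-Maclaurin expansion about z = x + shift.
    """
    total = sum((x + k) ** (-s) for k in range(shift))
    z = x + shift
    tail = (z ** (1 - s) / (s - 1) + 0.5 * z ** (-s)
            + s * z ** (-s - 1) / 12
            - s * (s + 1) * (s + 2) * z ** (-s - 3) / 720)
    return total + tail


def polygamma(m, x):
    """m-th derivative of the digamma function (m >= 1, x > 0)."""
    return (-1) ** (m + 1) * math.factorial(m) * hurwitz_zeta(m + 1, x)


def log_gamma_efficiency(lam):
    """Efficiency (17): 1 / (lam * psi'(lam))."""
    return 1.0 / (lam * polygamma(1, lam))


def log_gamma_shape(lam):
    """Skewness and kurtosis coefficients (gamma_1, gamma_2) from (18)."""
    p1 = polygamma(1, lam)
    gamma1 = polygamma(2, lam) / p1 ** 1.5
    gamma2 = polygamma(3, lam) / p1 ** 2
    return gamma1, gamma2


if __name__ == "__main__":
    for lam in (0.5, 1.0, 1.5, 2.0, 5.0, 10.0):
        g1, g2 = log_gamma_shape(lam)
        print(f"{lam:5.1f}  {g1:7.3f}  {g2:6.3f}  {log_gamma_efficiency(lam):.3f}")
```

At $\lambda = 1$ the efficiency is $6/\pi^2 \approx 0.608$, since $\psi'(1) = \pi^2/6$; the skewness is negative for every $\lambda$ because $\psi''(\lambda) < 0$.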
It follows after appreciable calculation that

$$E\{-g''(\epsilon; \gamma)\} = 1 + \tfrac{1}{2}\gamma_1^2 + \tfrac{1}{6}\gamma_2^2 + [\ldots] + o(N^{-2}). \tag{20}$$

Since $\operatorname{var}(\epsilon) = 1$, the efficiency is the reciprocal of (20). Note that the absence of first-order terms in $\gamma_1$ and $\gamma_2$ in (20) is to be expected, because the efficiency must take a maximum value of 1 at $\gamma_1 = \gamma_2 = 0$, corresponding to the normal distribution. It is interesting that the leading term in (20) is that in $\gamma_1$, suggesting that $\gamma_1^2$, not $\gamma_2$, is a suitable first-order index of the effect of non-normality on the efficiency of the least-squares estimates. For the Type VII distribution $\gamma_1 = 0$, necessitating the use of second-order terms.

Table 1 collects some numerical results based on (15), (17) and (20). The exact values for the Type VII and log gamma distributions are compared with the values obtained by inserting the appropriate values of $\gamma_1$, $\gamma_2$ into the expression (20) derived for an Edgeworth series. Also for the log gamma distribution, an approximation based on (20) ignoring the terms of order $N^{-2}$ is given.

TABLE 1
Asymptotic efficiency of least-squares estimates

Type VII

lambda_2   gamma_2   Efficiency    Efficiency from (20)
                     from (15)     to O(N^-2)
  2.0        -         0.500           -
  2.5        -         0.700           -
  3.0      6.000       0.800         0.143
  3.5      3.000       0.857         0.400
  4.0      2.000       0.893         0.600
  4.5      1.500       0.917         0.727
  5.0      1.200       0.933         0.806
  6.0      0.857       0.955         0.891
 10.0      0.400       0.984         0.974

Log gamma

lambda   gamma_1   gamma_2   Efficiency    Efficiency from (20)   Efficiency from (20)
                             from (17)     to O(N^-2)             to O(N^-1)
  0.5    -1.535     4.000      0.405          0.087                  0.463
  1.0    -1.140     2.400      0.608          0.202                  0.606
  1.5    -0.917     1.613      0.713          0.345                  0.704
  2.0    -0.780     1.188      0.775          0.476                  0.769
  2.5    -0.688     0.931      0.816          0.579                  0.809
  3.0    -0.621     0.763      0.844          0.657                  0.838
  4.0    -0.529     0.557      0.881          0.761                  0.877
  5.0    -0.469     0.437      0.904          0.821                  0.902
 10.0    -0.324     0.210      0.951          0.928                  0.950

For the Type VII distribution, Table 1 shows that for small values of $\lambda_2$ (less than 10) the approximation to the asymptotic efficiency based on (20) is too small: for $\lambda_2 > 10$ the approximation is quite good. Asymptotically [...] the same as the true value as far as the term in $\lambda_2^{-1}$ [...].

For the log gamma distribution, the approximation based on (20) to terms of order $N^{-2}$ behaves in a similar way to that for the Type VII distribution. But if we only use (20) to terms of order $N^{-1}$ the approximation is much better, even though the relevant Edgeworth series (that is, to terms of order $N^{-1}$) is not positive definite for $\lambda < 5$. The reason for this is not clear.

5. DISCUSSION

For the Type VII distribution, the loss of efficiency is less than 10 per cent provided that $\lambda_2$ is greater than about 4; $\lambda_2 \simeq 4$ is the value given by Jeffreys (1961, Section 5.7) for the distribution of observational errors in several series which he analysed. For the log gamma distribution, the loss of efficiency is less than 10 per cent provided that $\lambda$ is greater than about 5; for $\lambda > 5$ the results of Bartlett and Kendall (1946, pp. 130-2) indicate that the main difference between the log gamma distribution and the normal distribution lies in the skewness of the former.

The asymptotic efficiency considered here involves no allowance for the effect on confidence intervals of errors in estimating the shape and dispersion of the error distribution. In fact, with moderate amounts of data the loss of effective information from incorrectly using sums of squares of residuals in the estimation of error dispersion is conceivably more serious than the loss of information from incorrectly using linear functions to estimate the $\beta_q$'s.

Similar results apply to linear autoregressive processes with non-normal innovations.

ACKNOWLEDGEMENT

Mr Hinkley's work was supported by an S.R.C. Research Studentship.

REFERENCES

ANSCOMBE, F. J. (1967). Topics in the investigation of linear relations fitted by the method of least squares. J. R. Statist. Soc. B, 29, 1-52.
BARTLETT, M. S. and KENDALL, D. G. (1946). The statistical analysis of variance-heterogeneity and the logarithmic transformation. J. R. Statist. Soc., Suppl., 8, 128-138.
FISHER, R. A. (1925). Theory of statistical estimation. Proc. Camb. Phil. Soc., 22, 700-725.
JEFFREYS, H. (1961). Theory of Probability (3rd ed.). Oxford: University Press.
KENDALL, M. G. and STUART, A. (1961). The Advanced Theory of Statistics (Volume 2). London: Griffin.