Economic and Social Review, Vol. 9, No. 4

Substituting Means for Missing Observations in Regression

D. CONNIFFE
An Foras Taluntais

I INTRODUCTION

Econometric textbooks usually discuss the procedures to be adopted when missing values occur in data intended for regression analysis. There is a large statistical literature on this topic. A variety of methods, differing both in their computational complexity and in the assumptions required, have been suggested, compared or simulated by, among others, Buck (1960), Afifi and Elashoff (1966), Haitovsky (1968), and Hocking and Smith (1968). More recently, there have been the papers of Dagenais (1973) and Beale and Little (1975). Yet textbooks sometimes give a not dishonourable mention to the simplest ad hoc procedure of all: that of replacing missing values by sample means. Apart from the obvious computational ease of the procedure, advantages over ordinary least squares (on the complete observations only) are claimed for it in some circumstances. To quote Kmenta (1971): 'One redeeming feature of the estimators... is the fact that when the correlation between X and Y is low, the mean square error of these estimators is less than that of ordinary least squares.' This note examines why the simple method of substituting means seems to give good results in certain circumstances. The objective is not to promote this simple approach instead of more sophisticated methods (no one has ever claimed that it could be generally superior), but to explain when and why its mean square error properties are superior to those of ordinary least squares, or even to those of other methods.

II MISSING VALUES OF THE DEPENDENT VARIABLE

Consider the case of a simple regression when there are $r_1$ observations on $x$ and $r_2$ ($< r_1$) complete pairs of both $x$ and $y$. The estimator of the regression coefficient obtained by replacing missing $y$ values by the sample mean and performing the usual calculations is

$$\hat\beta = \frac{S'xy}{S'x^2} = \frac{Sxy}{S'x^2} = \frac{Sx^2}{S'x^2}\,\hat\beta^*, \qquad (1)$$

where $S'$ implies summation over all $r_1$ pairs, including those in which the sample mean has been substituted, $S$ implies summation over the $r_2$ complete pairs, and $\hat\beta^*$ is the usual least squares estimator from the $r_2$ complete observations. Afifi and Elashoff (1967) gave comparisons of the mean square error of $\hat\beta$ relative to that of $\hat\beta^*$, having assumed that $y_i = \alpha + \beta x_i + e_i$, where $E(e_i) = 0$, $V(e_i) = \sigma^2(1-\rho^2)$ and $x_i$ is $N(\mu_x, \sigma_x^2)$. Using their results, the mean square error of $\hat\beta$ is

$$\frac{(r_2-1)\,\sigma^2(1-\rho^2)}{\sigma_x^2\,\{(r_2-1)+(r_1-r_2)\}^2} + \frac{(r_1-r_2)^2\,\beta^2}{\{(r_2-1)+(r_1-r_2)\}^2}, \qquad (2)$$

while that of $\hat\beta^*$ is

$$\frac{\sigma^2(1-\rho^2)}{\sigma_x^2\,(r_2-3)}. \qquad (3)$$

The first term of (2) will clearly be much smaller than (3) provided $r_1$ is considerably greater than $r_2$. The second term of (2) depends on the true coefficient $\beta$, and if this is small (2) can be less than (3). If $\beta$ is not small enough, (2) will exceed (3), so the use of $\hat\beta$ instead of $\hat\beta^*$ implies some degree of prior knowledge. But, if $\beta$ is small, has the reduction in MSE resulted from information contained in the extra incomplete observations? This is a valid question, because Hocking and Smith (1972) have shown that if $y$ and $x$ are bivariate Normal (or multivariate Normal if $x$ is a vector of explanatory variables) and $y$ values are missing from some observations, then the maximum likelihood estimate of $\beta$ is just $\hat\beta^*$. (This does not apply to the intercept term.) In formula (1), $Sx^2$ is always less than $S'x^2$, so $\hat\beta$ is obtained from $\hat\beta^*$ by 'shrinking' it. Shrinkage estimators form a class of biased estimators obtained by multiplying least squares estimators by quantities less than unity. Their properties have been discussed in the statistical literature, for example by Mayer and Wilke (1973).
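The shrinkage relation in (1), and the mean square error comparison between (2) and (3), are easy to check numerically. The following sketch simulates the situation of this section; it is illustrative only, the parameter values and variable names are arbitrary, and it assumes Python with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_estimators(x, y, r2):
    """Slope estimators when y is observed only for the first r2 of the r1
    observations on x: beta_star uses the complete pairs only, beta_tilde
    substitutes the mean of the observed y's for the missing ones."""
    xc, yc = x[:r2], y[:r2]                               # complete pairs
    Sx2 = np.sum((xc - xc.mean()) ** 2)                   # Sx^2 over complete cases
    beta_star = np.sum((xc - xc.mean()) * (yc - yc.mean())) / Sx2
    y_filled = np.concatenate([yc, np.full(x.size - r2, yc.mean())])
    Spx2 = np.sum((x - x.mean()) ** 2)                    # S'x^2 over all r1 cases
    beta_tilde = np.sum((x - x.mean()) * (y_filled - y_filled.mean())) / Spx2
    # formula (1): mean substitution simply shrinks beta_star by Sx^2 / S'x^2
    assert np.isclose(beta_tilde, beta_star * Sx2 / Spx2)
    return beta_star, beta_tilde

# Monte Carlo mean square errors for a small true slope
r1, r2, alpha, beta, sigma = 100, 20, 1.0, 0.05, 1.0
estimates = []
for _ in range(5000):
    x = rng.normal(size=r1)
    y = alpha + beta * x + sigma * rng.normal(size=r1)
    estimates.append(slope_estimators(x, y, r2))
mse_star, mse_tilde = ((np.array(estimates) - beta) ** 2).mean(axis=0)
print(f"MSE of complete-case OLS:  {mse_star:.5f}")
print(f"MSE of mean substitution:  {mse_tilde:.5f}")
```

With the small slope used here the mean-substitution estimator shows the smaller mean square error; making the true slope large enough reverses the ranking, in line with the comparison of (2) and (3).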

The optimal shrinkage factor, in a minimum mean square error sense, is

$$\frac{\beta^2}{\beta^2 + \sigma^2(1-\rho^2)/Sx^2}, \qquad (4)$$

so, provided $\beta^2/\sigma^2$ is bounded, an improvement over least squares is possible. The shrinkage estimator defined by (4) is also superior to (1), although it uses only the complete observations. Since formula (4) involves unknown parameters that must be estimated from the data, or from probably vague prior knowledge, the optimality may not be attainable in practice. But how plausible is $Sx^2/S'x^2$ as a shrinkage factor? It does not involve $\sigma^2$, $\rho$ or $\beta$, and its magnitude depends on the number of incomplete observations: the degree of shrinkage increases with the number of extra incomplete observations. But from (4) the optimal degree of shrinkage increases as $\beta^2$ decreases relative to $\sigma^2(1-\rho^2)$. Unless it could be postulated that missing values will increase in frequency in situations where the coefficient is small, $Sx^2/S'x^2$ is inappropriate as a shrinkage factor. So, even in cases where the choice of a 'good' shrinkage factor is far from clear, it can be stated, somewhat negatively, that the method of substituting the mean for missing $y$ values is unlikely to help.

III MISSING VALUES OF AN EXPLANATORY VARIABLE

The simplest case, with the dependent variable measured on $r_1$ observations and one explanatory variable with missing values for $r_1 - r_2$ of these, is a trivial one, because substituting the sample mean just gives the ordinary estimator. Suppose then that there are two explanatory variables: $x_1$, measured on all $r_1$ observations, and $x_2$, which is missing for $r_1 - r_2$ of them. It is obvious that

$$S'x_2^2 = Sx_2^2, \qquad S'x_1x_2 = Sx_1x_2, \qquad S'x_2y = Sx_2y,$$

and then

$$\hat\beta_1 = \frac{S'x_1y\,Sx_2^2 - Sx_2y\,Sx_1x_2}{S'x_1^2\,Sx_2^2 - (Sx_1x_2)^2} \qquad \text{and} \qquad \hat\beta_2 = \frac{Sx_2y\,S'x_1^2 - S'x_1y\,Sx_1x_2}{S'x_1^2\,Sx_2^2 - (Sx_1x_2)^2}. \qquad (5)$$
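As a numerical check on (5), the sketch below (illustrative only; the data-generating values are arbitrary) substitutes the complete-case mean of $x_2$, runs the standard least squares calculation on the 'completed' data, and confirms that the coefficients coincide with the expressions in (5).

```python
import numpy as np

rng = np.random.default_rng(1)

# simulated data with x2 missing for the last r1 - r2 observations
r1, r2 = 60, 40
x1 = rng.normal(size=r1)
x2 = 0.5 * x1 + rng.normal(size=r1)          # x1 and x2 positively related
y = 1.0 + 2.0 * x1 + 0.8 * x2 + rng.normal(size=r1)

x2_filled = x2.copy()
x2_filled[r2:] = x2[:r2].mean()              # substitute the complete-case mean

# least squares on the 'completed' data
X = np.column_stack([np.ones(r1), x1, x2_filled])
slopes_completed = np.linalg.lstsq(X, y, rcond=None)[0][1:]

# the same coefficients from the corrected sums appearing in (5)
def csum(u, v):
    return np.sum((u - u.mean()) * (v - v.mean()))

Sp_x1x1, Sp_x1y = csum(x1, x1), csum(x1, y)     # S' sums (all r1 observations)
S_x2x2 = csum(x2[:r2], x2[:r2])                 # S sums (complete observations)
S_x1x2 = csum(x1[:r2], x2[:r2])
S_x2y = csum(x2[:r2], y[:r2])
den = Sp_x1x1 * S_x2x2 - S_x1x2 ** 2
beta1_hat = (Sp_x1y * S_x2x2 - S_x2y * S_x1x2) / den
beta2_hat = (S_x2y * Sp_x1x1 - Sp_x1y * S_x1x2) / den
assert np.allclose(slopes_completed, [beta1_hat, beta2_hat])
print(beta1_hat, beta2_hat)                     # compare with the true values 2.0 and 0.8
```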

Taking expectations, where the model $E(y) = \alpha + \beta_1 x_1 + \beta_2 x_2$ is assumed, gives

$$E(\hat\beta_1) = \beta_1 + \beta_2\,\frac{Sx_2^2\,(S^*x_1x_2 - Sx_1x_2)}{S'x_1^2\,Sx_2^2 - (Sx_1x_2)^2} \qquad \text{and} \qquad E(\hat\beta_2) = \beta_2\,\frac{S'x_1^2\,Sx_2^2 - Sx_1x_2\,S^*x_1x_2}{S'x_1^2\,Sx_2^2 - (Sx_1x_2)^2}, \qquad (6)$$

where $S^*x_1x_2$ differs from $S'x_1x_2$ in that it contains the true (but unknown) values of $x_2$ instead of the substituted sample means. Thus the expectations (6) are not actually estimable unless further assumptions are made about the x's. However, it is clear that the estimators (5) are biased, the extent of the bias depending on the magnitude of $\beta_2$: $\hat\beta_2$ is biased downwards and $\hat\beta_1$ biased upwards (assuming $x_1$ and $x_2$ are positively related; if they are negatively related, $\hat\beta_1$ is biased downwards). The variances of $\hat\beta_1$ and $\hat\beta_2$, conditional on the x's, are

$$\frac{\sigma^2\,Sx_2^2}{S'x_1^2\,Sx_2^2 - (Sx_1x_2)^2} \qquad \text{and} \qquad \frac{\sigma^2\,S'x_1^2}{S'x_1^2\,Sx_2^2 - (Sx_1x_2)^2} \qquad (7)$$

respectively, and these are smaller than the variances of the ordinary least squares estimators, which are

$$\frac{\sigma^2\,Sx_2^2}{Sx_1^2\,Sx_2^2 - (Sx_1x_2)^2} \qquad \text{and} \qquad \frac{\sigma^2\,Sx_1^2}{Sx_1^2\,Sx_2^2 - (Sx_1x_2)^2}. \qquad (8)$$

The difference will not be appreciable in the case of $\hat\beta_2$ if $Sx_1x_2$ is small. The differences could be very large, however, if the denominators in (8) were close to zero. Excluding this case for the present, and remembering that mean square error is the sum of variance and squared bias, it is evident that the estimators (5) will only have better mean square errors than the least squares estimators if $\beta_2$ is small. As in Section II, it appears that prior knowledge is needed to justify the estimators. There are other estimators for this situation, of course. Another approach is to replace the missing values by

$$\hat x_{2i} = \bar x_2 + (x_{1i} - \bar x_1)\,Sx_1x_2/Sx_1^2, \qquad (9)$$

where $x_{1i}$ is the value of $x_1$ corresponding to the missing value, and $\bar x_1$ and $\bar x_2$ are the sample means over the $r_2$ complete observations.
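Formula (9) is just the complete-case regression prediction of $x_2$ from $x_1$. A minimal sketch of this imputation step (illustrative only; the mask and names are hypothetical) is:

```python
import numpy as np

def impute_x2(x1, x2, observed):
    """Formula (9): replace each missing x2 by
    x2_bar + (x1 - x1_bar) * Sx1x2 / Sx1^2, with all quantities
    computed from the complete observations (the True entries of `observed`)."""
    x1c, x2c = x1[observed], x2[observed]
    slope = (np.sum((x1c - x1c.mean()) * (x2c - x2c.mean()))
             / np.sum((x1c - x1c.mean()) ** 2))
    return np.where(observed, x2, x2c.mean() + slope * (x1 - x1c.mean()))

# example: the last 15 values of x2 are treated as missing
rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = 0.7 * x1 + rng.normal(size=50)
observed = np.arange(50) < 35
x2_completed = impute_x2(x1, x2, observed)
```

The 'completed' data produced in this way are then analysed by ordinary least squares, which is the estimator considered next.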

Now the standard analysis of the 'completed' data gives the usual least squares estimator for $\beta_2$, $\tilde\beta_2$ say, and

$$\tilde\beta_1 = \frac{S'x_1y\,S''x_2^2 - S''x_2y\,S''x_1x_2}{S'x_1^2\,S''x_2^2 - (S''x_1x_2)^2}$$

for $\beta_1$, where $S''$ denotes summation over the data 'completed' by (9). Conditionally on the x's, this has expectation

$$\beta_1 + \beta_2\,\frac{S^*x_1x_2\,S''x_2^2 - S^*x_2''x_2\,S''x_1x_2}{S'x_1^2\,S''x_2^2 - (S''x_1x_2)^2}, \qquad (10)$$

where $S^*x_1x_2$ and $S^*x_2''x_2$ are the cross-product sums of $x_1$, and of the 'completed' $x_2''$, with the true (but unknown) values of $x_2$. If $x_2$ is itself assumed to have a linear regression on $x_1$, the further expectation over $x_2$ is $\beta_1$. So, with this frequently made assumption, (10) is unbiased, unlike the estimator $\hat\beta_1$ given by (5). The mean square error of (10) is easily shown to be smaller than that of $\hat\beta_1$, except for small $\beta_2$ and for the circumstances to be discussed next. The estimator (10) can be improved on by various modifications, for example by starting with (9) and conducting a weighted instead of an unweighted analysis. The approach of Dagenais (1973) and Methods 4 and 5 of Beale and Little (1975) are of this type. But there is a situation where the mean square error properties of the simple method of substituting means are clearly superior to ordinary least squares on the complete observations, and even superior to the more sophisticated methods mentioned. If

$$Sx_1^2\,Sx_2^2 - (Sx_1x_2)^2 \qquad (11)$$

is close to zero, that is, if there is a severe multicollinearity problem, the variances given by (8) will be very large. Those given by (7) will be much smaller, and this reduction will more than offset the squared bias, provided $\beta_2$ is bounded. The simple method is also superior to (10), because the variance of that estimator may be shown to contain (11) as a denominator. It is easy to see why any approach based on predicting missing values from the values of another explanatory variable will fail in an extreme multicollinearity situation. Formula (9) sets up an exact linear relationship between the explanatory variables for the incomplete observations, and so makes the original multicollinearity problem even more severe. This superiority, in a mean square error sense, of the method of substituting means is another example of the fairly general finding that, in regression situations, biased estimators are most effective when extreme multicollinearity is present in the data. In summary then, although the method of substituting means for missing explanatory variables is not generally sound, it could outperform other approaches, at least in terms of mean square error, in studies where the data set is close to multicollinear.
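To see the multicollinearity point numerically, the sketch below (illustrative only; the degree of collinearity and the coefficient values are arbitrary choices) generates a nearly collinear pair $x_1$, $x_2$, deletes some $x_2$ values, and compares the Monte Carlo mean square errors of complete-case least squares and of mean substitution.

```python
import numpy as np

rng = np.random.default_rng(3)
r1, r2 = 80, 50
beta1, beta2 = 2.0, 0.3                       # beta2 kept modest, as the argument requires

def slopes(x1, x2, y):
    """OLS slope coefficients from regressing y on (1, x1, x2)."""
    X = np.column_stack([np.ones(len(y)), x1, x2])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]

complete_case, mean_subst = [], []
for _ in range(2000):
    x1 = rng.normal(size=r1)
    x2 = x1 + 0.05 * rng.normal(size=r1)      # severe multicollinearity
    y = 1.0 + beta1 * x1 + beta2 * x2 + rng.normal(size=r1)
    complete_case.append(slopes(x1[:r2], x2[:r2], y[:r2]))   # drop incomplete rows
    x2_filled = x2.copy()
    x2_filled[r2:] = x2[:r2].mean()                          # substitute the mean
    mean_subst.append(slopes(x1, x2_filled, y))

truth = np.array([beta1, beta2])
print("MSE, complete-case OLS:", ((np.array(complete_case) - truth) ** 2).mean(axis=0))
print("MSE, mean substitution:", ((np.array(mean_subst) - truth) ** 2).mean(axis=0))
```

With $x_2$ nearly an exact linear function of $x_1$, the variance explosion in the complete-case estimates dominates the squared bias introduced by the substitution, so the simple method comes out ahead, as argued above.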

REFERENCES

AFIFI, A. A. and R. M. ELASHOFF, 1966. "Missing Observations in Multivariate Statistics I: Review of the Literature", Journal of the American Statistical Association, Vol. 61.
AFIFI, A. A. and R. M. ELASHOFF, 1967. "Missing Observations in Multivariate Statistics II: Point Estimation in Simple Linear Regression", Journal of the American Statistical Association, Vol. 62.
BEALE, E. M. L. and R. J. A. LITTLE, 1975. "Missing Observations in Multivariate Analysis", Journal of the Royal Statistical Society, Series B, Vol. 37.
BUCK, S. F., 1960. "A Method of Estimation of Missing Values in Multivariate Data Suitable for Use with an Electronic Computer", Journal of the Royal Statistical Society, Series B, Vol. 22.
DAGENAIS, M. G., 1973. "The Use of Incomplete Observations in Multiple Regression Analysis", Journal of Econometrics, Vol. 1.
HAITOVSKY, Y., 1968. "Missing Data in Regression Analysis", Journal of the Royal Statistical Society, Series B, Vol. 30.
HOCKING, R. R. and W. B. SMITH, 1968. "Estimation of Parameters in the Multivariate Normal Distribution with Missing Observations", Journal of the American Statistical Association, Vol. 63.
HOCKING, R. R. and W. B. SMITH, 1972. "Optimum Incomplete Multinormal Samples", Technometrics, Vol. 14.
KMENTA, J., 1971. Elements of Econometrics, Chapter 9, New York: Macmillan.
MAYER, L. S. and T. A. WILKE, 1973. "On Biased Estimation in Linear Models", Technometrics, Vol. 15.
