Unbiased prediction in linear regression models with equi-correlated responses

Shalabh

Received: May 13, 1996; revised version: December 11, 1996

This paper considers the problem of predicting actual and mean values of the response variable in a linear regression model with equi-correlated responses. Two such predictors are presented and their efficiency properties are studied with respect to the criterion of the variance covariance matrix.

1. Introduction:

In many applications, we come across linear regression models with equi-correlated responses. For example, when observations are taken on some characteristic of the members of a family in familial studies, they generally exhibit high correlation; see, e.g., Srivastava (1984). Similarly, correlated responses are recorded when measurements are taken on the two eyes or hands of individuals for studies in the medical sciences; see, e.g., Munoz, Rosner and Carey (1986) and Rosner (1984). Likewise, in survey sampling, when a cluster
sampling procedure is adopted, fairly high values of the intra-cluster correlation are found; see, e.g., Holt and Scott (1981) and King and Evans (1986).

Estimation of parameters in linear regression models with equi-correlated responses has received considerable attention in the literature, but such is not the case with the problem of predicting some future values of the response variable given a set of values for the explanatory variables. This has inspired the present investigations.

The plan of this article is as follows. In Section 2, we describe the model and present two unbiased predictors. Their efficiency properties are also analyzed. Finally, some remarks are made.

2. Model Specification And Predictions:

Let us consider the following linear regression model:

(1)  y = Xβ + σu

where y is an n × 1 vector of n observations on the response variable, X is an n × p full column rank matrix of n observations on p explanatory variables, β is the column vector of associated regression coefficients, σ is a scalar and u is a column vector of disturbances.

Next, we assume that a set of m fixed values of the explanatory variables, in the form of an m × p matrix X_f, is given, corresponding to which m values of the response variable are to be predicted. Thus we have

(2)  y_f = X_f β + σu_f

where y_f denotes the column vector of the m values of the response variable and u_f is the corresponding vector of disturbances.

It is assumed that the values of the response variable are equi-correlated so that the disturbances have an intra-class correlation structure. Thus the disturbances are assumed to be identically distributed with mean 0, variances 1 and covariances ρ, so that we can write

(3)  E(u) = 0,  E(u_f) = 0,
     E(uu') = (1 - ρ)I_n + ρ J_n J_n' = W (say),
     E(u_f u_f') = (1 - ρ)I_m + ρ J_m J_m',
     E(u_f u') = ρ J_m J_n' = W_f (say),

where J denotes a column vector with all elements unity. Finally, we assume for the sake of simplicity in exposition that the observations in X are taken as deviations from their corresponding means and that the model contains no intercept term, so that X'J_n is a null vector.
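As a minimal sketch (mine, not part of the paper), the intra-class covariance structure in (3) can be constructed and sampled directly; the values of n and ρ below are purely illustrative.

```python
import numpy as np

# Build the intra-class covariance W = (1 - rho) I_n + rho J_n J_n' of (3)
# and draw one equi-correlated disturbance vector u from it.
rng = np.random.default_rng(0)
n, rho = 5, 0.4
Jn = np.ones((n, 1))
W = (1 - rho) * np.eye(n) + rho * (Jn @ Jn.T)

# W is positive definite for -1/(n-1) < rho < 1, so a Cholesky factor exists.
u = np.linalg.cholesky(W) @ rng.standard_normal(n)

print(np.diag(W))   # unit variances
print(W[0, 1])      # common covariance rho
```

Each component of u has unit variance and every pair shares the common covariance ρ, exactly the equi-correlated specification assumed for the model.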
For predicting future values of the response variable in a generalized linear regression model, Bibby and Toutenburg (1978) and Rao and Toutenburg (1995) have considered a variety of predictors and have presented a comprehensive discussion of their properties under a general framework; see also Chandrasekar and Prabakaran (1994). We, however, restrict our attention to two unbiased predictors, viz., the classical predictor and the optimal homogeneous predictor obtained by Goldberger (1962). They are defined as

(4)  P_C = X_f b

(5)  P_H = X_f b + W_f W^{-1}(y - Xb)

where

(6)  b = (X'X)^{-1} X'y

is the least squares estimator, which can be seen to be identical with the generalized least squares estimator of β from (1) by employing the result
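The two predictors (4) and (5) are straightforward to compute; the following sketch (my own, with simulated data rather than anything from the paper) evaluates both on a centered design so that X'J_n = 0 as assumed.

```python
import numpy as np

# Compute the classical predictor (4) and Goldberger's optimal homogeneous
# predictor (5), with b the least squares estimator of (6).
rng = np.random.default_rng(1)
n, m, p, rho, sigma = 20, 3, 2, 0.3, 0.5

X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                # deviations from means, so X' J_n = 0
Xf = rng.standard_normal((m, p))
Jn, Jm = np.ones((n, 1)), np.ones((m, 1))

W = (1 - rho) * np.eye(n) + rho * Jn @ Jn.T   # E(uu') of (3)
Wf = rho * Jm @ Jn.T                          # E(u_f u') of (3)

beta = np.array([1.0, -2.0])                  # illustrative coefficients
u = np.linalg.cholesky(W) @ rng.standard_normal(n)
y = X @ beta + sigma * u

b = np.linalg.solve(X.T @ X, X.T @ y)              # least squares (6)
PC = Xf @ b                                        # classical predictor (4)
PH = Xf @ b + Wf @ np.linalg.solve(W, y - X @ b)   # optimal homogeneous (5)
```

Because X'J_n = 0, the correction term in (5) reduces to a common shift of all m predictions, namely ρ Σ_i (y - Xb)_i / (1 + (n - 1)ρ), added to each component of P_C.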
(7)  W^{-1} = (1/(1 - ρ)) [ I_n - (ρ/(1 + (n - 1)ρ)) J_n J_n' ];

see also McElroy (1967), who has obtained a necessary and sufficient condition for the equivalence of the least squares and generalized least squares estimators when the disturbances are equi-correlated.

The vector quantities (4) and (5) are generally used to find predictions either for the actual responses (y_f) or for the mean responses (X_f β), but not for both simultaneously. Practical situations may often arise where we are required to predict both actual values and mean values; see, e.g., Zellner (1994) and Shalabh (1995) for some illustrative examples. In order to handle this problem, let us define the following target function

(8)  T = λ y_f + (1 - λ) E(y_f) = λ y_f + (1 - λ) X_f β

where λ is a nonstochastic scalar lying between 0 and 1; see Shalabh (1995) for details. The choice of λ is a matter of the practitioner's preference related to the weightage assigned to the prediction of actual responses in relation to the prediction of mean responses.

It is easy to see that

(9)   E(P_C - T) = σ E[ X_f (X'X)^{-1} X'u - λ u_f ] = 0

(10)  E(P_H - T) = σ E[ X_f (X'X)^{-1} X'u - λ u_f + (ρ/(1 + (n - 1)ρ)) J_m J_n' u ] = 0

whence follows the unbiasedness of both predictors whether they are used for mean responses or actual responses or both. It can be easily seen that the variance covariance matrices of the predictors are

(11)  V_λ(P_C) = E(P_C - T)(P_C - T)'
            = σ² [ (1 - ρ)(X_f S X_f' + λ² I_m) + λ² ρ J_m J_m' ]

(12)  V_λ(P_H) = E(P_H - T)(P_H - T)'
            = σ² [ (1 - ρ)(X_f S X_f' + λ² I_m) + ρ ( λ² + (1 - 2λ)nρ/(1 + (n - 1)ρ) ) J_m J_m' ]

where S = (X'X)^{-1}. From (11) and (12), we observe that

(13)  V_λ(P_H) - V_λ(P_C) = σ² ((1 - 2λ)nρ²/(1 + (n - 1)ρ)) J_m J_m'.

Thus both predictors are equally efficient when ρ = 0 and/or λ = 0.5. The first case (ρ = 0) is not very interesting because then the model loses its specification of equi-correlated responses. The second case (λ = 0.5) is of course interesting.
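The closed forms (7) and (11)-(13) can be checked numerically. In the sketch below (my own, not from the paper), V_λ(P_C) and V_λ(P_H) are computed exactly from the moment assumptions (3), using P_C - T = σ(Au - λu_f) with A = X_f S X' and P_H - T = σ(Bu - λu_f) with B = A + (ρ/(1 + (n - 1)ρ)) J_m J_n', and compared with the stated expressions.

```python
import numpy as np

# Numerical check of W^{-1} in (7) and of the covariance matrices (11)-(13).
rng = np.random.default_rng(2)
n, m, p, rho, lam, sigma = 12, 3, 2, 0.3, 0.7, 1.3

X = rng.standard_normal((n, p))
X -= X.mean(axis=0)                      # deviations from means: X' J_n = 0
Xf = rng.standard_normal((m, p))
Jn, Jm = np.ones((n, 1)), np.ones((m, 1))

W = (1 - rho) * np.eye(n) + rho * Jn @ Jn.T       # E(uu')
Wff = (1 - rho) * np.eye(m) + rho * Jm @ Jm.T     # E(u_f u_f')
Cuf = rho * Jn @ Jm.T                             # E(u u_f')

# (7): closed form of the inverse of W
Winv = (np.eye(n) - rho / (1 + (n - 1) * rho) * Jn @ Jn.T) / (1 - rho)
print(np.allclose(W @ Winv, np.eye(n)))           # True

S = np.linalg.inv(X.T @ X)
A = Xf @ S @ X.T                                  # P_C - T = sigma (A u - lam u_f)
c = rho / (1 + (n - 1) * rho)
B = A + c * Jm @ Jn.T                             # P_H - T = sigma (B u - lam u_f)

def V(M):
    # exact covariance of sigma (M u - lam u_f) under the moments in (3)
    return sigma**2 * (M @ W @ M.T - lam * (M @ Cuf + Cuf.T @ M.T) + lam**2 * Wff)

V11 = sigma**2 * ((1 - rho) * (Xf @ S @ Xf.T + lam**2 * np.eye(m))
                  + lam**2 * rho * Jm @ Jm.T)
V13 = sigma**2 * (1 - 2 * lam) * n * rho**2 / (1 + (n - 1) * rho) * Jm @ Jm.T
print(np.allclose(V(A), V11), np.allclose(V(B) - V(A), V13))   # True True
```

With λ = 0.7 here, the difference (13) is negative semi-definite, matching the claim below that the optimal homogeneous predictor dominates when λ exceeds 0.5.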
It implies that both predictors have identical performance properties when they are used in a situation in which the prediction of actual responses and the prediction of mean responses are equally important and thus receive equal weightage. When λ is less than 0.5, i.e., the prediction of mean responses is to be given higher weightage in comparison to the prediction of actual responses, the classical predictor remains unbeaten by the optimal homogeneous predictor. Just the reverse holds true when λ exceeds 0.5. In other words, for situations assigning higher weightage to the prediction of actual responses in comparison to the prediction of mean responses, the optimal homogeneous predictor is superior to the classical predictor with respect to the criterion of the variance covariance matrix. The aforesaid observations match the finding of Rao and Toutenburg (1995, p. 172), who have remarked that the classical predictor is more
efficient than the optimal homogeneous predictor for mean responses while the opposite is true when the aim is to predict actual responses.

Next, let us consider the expression (11). It is seen to be an increasing function of λ in the sense that, as we increase the value of λ from 0 to 1, the variability in P_C increases. This implies that the variability of the predictions arising from the classical predictor has an upward trend as λ moves from 0 to 1. In other words, the predictions have smaller dispersion when they are used for the mean values of the response variable. Their performance declines as more and more weightage is given to the prediction of actual values.

3. Some Remarks:

We have considered the problem of predicting future values of the response variable in a linear regression model having an equi-correlated covariance structure and have studied the efficiency properties of two unbiased predictors with respect to the criterion of the variance covariance matrix. Our analysis has brought out some interesting findings that may prove useful to practitioners.

It may be remarked that our investigations have assumed the parameter ρ characterizing the covariance structure to be known. When it is unknown, we may employ an estimate of it as suggested by, for instance, Fuller and Battese (1973) and Srivastava (1984). Substituting such an estimate for ρ in W_f and W in (5), we can derive a feasible version of the optimal predictor. Such a substitution will, however, disturb the optimality property of the predictor (5). It would be interesting to analyze the performance of such feasible predictors; this would be a subject matter of future work.

REFERENCES

Bibby, J. and Toutenburg, H. (1978). Prediction And Improved Estimation In Linear Models, John Wiley, New York.

Chandrasekar, B. and Prabakaran, T.E. (1994). A note on optimal vector unbiased predictor. Stat. Papers, 35, 71-80.

Fuller, W.A. and Battese, G.E. (1973). Transformations for estimation of linear models with nested-error structure. J. Amer. Statist. Assoc., 68, 626-632.

Goldberger, A.S. (1962).
Best linear unbiased prediction in the generalized linear regression model. J. Amer. Statist. Assoc., 57, 369-375.

Holt, D. and Scott, A.J. (1981). Regression analysis using survey data. The Statistician, 30, 169-178.

King, M.L. and Evans, M.A. (1986). Testing for block effects in regression models based on survey data. J. Amer. Statist. Assoc., 81, 677-679.

McElroy, F.W. (1967). A necessary and sufficient condition that ordinary least squares estimators be best linear unbiased. J. Amer. Statist. Assoc., 62, 1302-1304.

Munoz, A., Rosner, B. and Carey, V. (1986). Regression analysis in the presence of heterogeneous intraclass correlations. Biometrics, 42, 653-658.

Rao, C.R. and Toutenburg, H. (1995). Linear Models: Least Squares And Alternatives, Springer.

Rosner, B. (1984). Multivariate methods in ophthalmology with applications to other
paired-data situations. Biometrics, 40, 1025-1035.

Shalabh (1995). Performance of Stein-rule procedure for simultaneous prediction of actual and average values of study variable in linear regression model. Proceed. Fiftieth Session Int. Stat. Inst., 90, 1375-1390.

Srivastava, M.S. (1984). Estimation of intraclass correlation in familial data. Biometrika, 71, 177-185.

Zellner, A. (1994). Bayesian and non-Bayesian estimation using balanced loss functions. In Statistical Decision Theory And Related Topics V (eds. S.S. Gupta and J.O. Berger), Springer-Verlag, New York.

Shalabh
Department of Statistics
University of Jammu
Jammu-180 004, India