Unbiased prediction in linear regression models with equi-correlated responses

Shalabh

Received: May 13, 1996; revised version: December 11, 1996

This paper considers the problem of predicting the actual and the mean values of the response variable in a linear regression model with equi-correlated responses. Two such predictors are presented and their efficiency properties are studied with respect to the criterion of the covariance matrix.

1. Introduction:

In many applications, we come across linear regression models with equi-correlated responses. For example, when observations are taken on some characteristic of the members of a family in familial studies, they generally exhibit high correlation; see, e.g., Srivastava (1984). Similarly, correlated responses are recorded when measurements are taken on the two eyes or the two hands of individuals for studies in medical sciences; see, e.g., Munoz, Rosner and Carey (1986) and Rosner (1984). Likewise, in survey sampling, when a cluster

sampling procedure is adopted, fairly high values of the intra-cluster correlation are found; see, e.g., Holt and Scott (1981) and King and Evans (1986).

Estimation of parameters in linear regression models with equi-correlated responses has received considerable attention in the literature, but such is not the case with the problem of predicting some future values of the response variable for a given set of values of the explanatory variables. This has inspired the present investigations.

The plan of this article is as follows. In Section 2, we describe the model and present two unbiased predictors. Their efficiency properties are also analyzed. Finally, some remarks are made.

2. Model Specification And Predictions:

Let us consider the following linear regression model:

(1)    y = Xβ + σu

where y is an n×1 vector of n observations on the response variable, X is an n×p full column rank matrix of n observations on p explanatory variables, β is the column vector of associated regression coefficients, σ is a scalar and u is a column vector of disturbances.

Next, we assume that a set of m fixed values of the explanatory variables, in the form of an m×p matrix X_f, is given, corresponding to which m values of the response variable are to be predicted. Thus we have

(2)    y_f = X_f β + σ u_f

where y_f denotes the column vector of the m values of the response variable and u_f is the vector of disturbances. It is assumed that the values of the response variable are equi-correlated, so that the disturbances have an intra-class correlation structure. Thus the disturbances are assumed to be identically distributed with variances 1 and covariances ρ, so that we can write

(3)    E(u) = 0,  E(u_f) = 0,
       E(uu') = (1 − ρ)I_n + ρ J_n J_n' = W (say),
       E(u_f u_f') = (1 − ρ)I_m + ρ J_m J_m',
       E(u_f u') = ρ J_m J_n' = W_f (say)

where J denotes a column vector with all elements unity. Finally, we assume, for the sake of simplicity in exposition, that the observations in X are taken as deviations from their corresponding means and that the model contains no intercept term, so that X'J_n is a null vector.

For predicting future values of the response variable in a generalized linear regression model, Bibby and Toutenburg (1978) and Rao and Toutenburg (1995) have considered a variety of predictors and have presented a comprehensive discussion of their properties under a general framework; see also Chandrasekar and Prabakaran (1994). We, however, restrict our attention to two unbiased predictors, viz., the classical predictor and the optimal homogeneous predictor obtained by Goldberger (1962). They are defined as

(4)    p_C = X_f b

(5)    p_H = X_f b + W_f W^{-1}(y − Xb)

where

(6)    b = (X'X)^{-1} X'y

is the least squares estimator, which can be seen to be identical with the generalized least squares estimator of β obtained from (1) by employing

(7)    W^{-1} = (1/(1 − ρ)) [ I_n − (ρ/(1 + (n − 1)ρ)) J_n J_n' ];

see also McElroy (1967), who has obtained a necessary and sufficient condition for the equivalence of the least squares and generalized least squares estimators when the disturbances are equi-correlated.
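As a purely numerical illustration (not part of the paper), the short sketch below simulates a hypothetical data set with the intra-class covariance structure (3) and evaluates the classical predictor (4) and the optimal homogeneous predictor (5). The sample sizes, the value of ρ, the coefficients and the use of Python with numpy are all assumptions made for the example.

```python
# Illustrative sketch only: hypothetical data with the equi-correlated
# structure (3), and the two predictors (4)-(5). All settings are assumed.
import numpy as np

rng = np.random.default_rng(0)
n, m, p, rho, sigma = 20, 5, 2, 0.4, 1.5          # assumed values

# Columns of X (and X_f) are centred so that X'J_n = 0, as assumed in the text.
X = rng.normal(size=(n, p));   X -= X.mean(axis=0)
X_f = rng.normal(size=(m, p)); X_f -= X_f.mean(axis=0)
beta = np.array([1.0, -2.0])                      # hypothetical coefficients

J_n, J_m = np.ones((n, 1)), np.ones((m, 1))
W = (1 - rho) * np.eye(n) + rho * (J_n @ J_n.T)   # E(uu') in (3)
W_f = rho * (J_m @ J_n.T)                         # E(u_f u') in (3)

# One joint draw of (u, u_f) from the intra-class structure.
full_cov = np.block([[W, W_f.T],
                     [W_f, (1 - rho) * np.eye(m) + rho * (J_m @ J_m.T)]])
u_all = rng.multivariate_normal(np.zeros(n + m), full_cov)
u, u_f = u_all[:n], u_all[n:]
y = X @ beta + sigma * u                          # model (1)
y_f = X_f @ beta + sigma * u_f                    # future responses, model (2)

b = np.linalg.solve(X.T @ X, X.T @ y)             # least squares estimator (6)
p_C = X_f @ b                                     # classical predictor (4)
p_H = X_f @ b + W_f @ np.linalg.solve(W, y - X @ b)   # Goldberger predictor (5)
print("p_C:", p_C, "\np_H:", p_H, "\ny_f:", y_f)
```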

The vector quantities (4) and (5) are generally used to find predictions either for the actual responses (y_f) or for the mean responses (X_f β), but not for both simultaneously. Practical situations may often arise where we are required to predict both the actual values and the mean values; see, e.g., Zellner (1994) and Shalabh (1995) for some illustrative examples. In order to handle this problem, let us define the following target function

(8)    T = λ y_f + (1 − λ) E(y_f) = λ y_f + (1 − λ) X_f β

where λ is a nonstochastic scalar lying between 0 and 1; see Shalabh (1995) for details. The choice of λ is a matter of the practitioner's preference, reflecting the weightage assigned to the prediction of actual responses in relation to the prediction of mean responses.

It is easy to see that

(9)     E(p_C − T) = σ E[X_f (X'X)^{-1} X'u − λ u_f] = 0

(10)    E(p_H − T) = σ E[X_f (X'X)^{-1} X'u − λ u_f + (ρ/(1 + (n − 1)ρ)) J_m J_n' u] = 0

whence follows the unbiasedness of both predictors, whether they are used for mean responses, for actual responses, or for both. It can easily be seen that the variance covariance matrices of the predictors are

(11)    V_λ(p_C) = E(p_C − T)(p_C − T)' = σ²[(1 − ρ)(X_f S X_f' + λ² I_m) + λ²ρ J_m J_m']

(12)    V_λ(p_H) = E(p_H − T)(p_H − T)' = σ²[(1 − ρ)(X_f S X_f' + λ² I_m) + ρ(λ² + (1 − 2λ)nρ/(1 + (n − 1)ρ)) J_m J_m']

where S = (X'X)^{-1}. From (11) and (12), we observe that

(13)    V_λ(p_H) − V_λ(p_C) = σ² ((1 − 2λ)nρ²/(1 + (n − 1)ρ)) J_m J_m'.

Thus both predictors are equally efficient when ρ = 0 and/or λ = 0.5. The first case (ρ = 0) is not very interesting because then the model loses its specification of equi-correlated responses. The second case (λ = 0.5) is of course interesting. It implies that both predictors have identical performance properties when they are used in a situation in which the prediction of actual responses and the prediction of mean responses are equally important and thus receive equal weightage.

When λ is less than 0.5, i.e., the prediction of mean responses is to be given higher weightage in comparison with the prediction of actual responses, the classical predictor remains unbeaten by the optimal homogeneous predictor. Just the reverse holds true when λ exceeds 0.5. In other words, for situations assigning higher weightage to the prediction of actual responses in comparison with the prediction of mean responses, the optimal homogeneous predictor is superior to the classical predictor with respect to the criterion of the variance covariance matrix. These observations match the finding of Rao and Toutenburg (1995, p. 172), who have remarked that the classical predictor is more efficient than the optimal homogeneous predictor for mean responses, while the opposite is true when the aim is to predict actual responses.
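The sign behaviour implied by (13) is easy to verify numerically. The brief sketch below (an illustration only, with assumed values of n, m, ρ and σ) evaluates the closed-form difference (13) for a few values of λ and inspects its eigenvalues: a nonnegative spectrum means the classical predictor is no worse, a nonpositive spectrum means the optimal homogeneous predictor is superior, and at λ = 0.5 the difference vanishes.

```python
# Sign of the difference (13) as lambda crosses 0.5; assumed example values.
import numpy as np

n, m, rho, sigma = 20, 5, 0.4, 1.5                # assumed values
J_m = np.ones((m, 1))

def cov_difference(lam):
    """V_lambda(p_H) - V_lambda(p_C), the m x m matrix given by (13)."""
    scale = sigma**2 * (1 - 2 * lam) * n * rho**2 / (1 + (n - 1) * rho)
    return scale * (J_m @ J_m.T)

for lam in (0.2, 0.5, 0.8):
    eig = np.linalg.eigvalsh(cov_difference(lam))
    print(f"lambda = {lam}: eigenvalues range from {eig.min():+.3f} to {eig.max():+.3f}")
```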

Next, let us consider the expression (11). It is an increasing function of λ, in the sense that as the value of λ increases from 0 to 1, the variability in p_C increases. This implies that the variability of predictions arising from the classical predictor has an upward trend as λ moves from 0 to 1. In other words, the predictions have smaller dispersion when they are used for the mean values of the response variable; their performance declines as more and more weightage is given to the prediction of actual values.

3. Some Remarks:

We have considered the problem of predicting future values of the response variable in a linear regression model having an equi-correlated covariance structure and have studied the efficiency properties of two unbiased predictors with respect to the criterion of the variance covariance matrix. Our analysis has brought out some interesting findings that may prove useful to practitioners.

It may be remarked that our investigations have assumed the parameter ρ characterizing the covariance structure to be known. When it is unknown, we may employ an estimate of it suggested by, for instance, Fuller and Battese (1973) or Srivastava (1984). Substituting such an estimate for ρ in W_f and W in (5), we can derive a feasible version of the optimal predictor. Such a substitution will, however, disturb the optimality property of the predictor (5). It would be interesting to analyze the performance of such feasible predictors; this will be a subject matter of future work.
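Purely for concreteness, the sketch below shows one way such a feasible (plug-in) predictor could be assembled: ρ is replaced in (5) by a simple moment-type estimate built from the least squares residuals. This estimator is an assumption made here for the illustration and is not the Fuller and Battese (1973) or Srivastava (1984) estimator mentioned above.

```python
# Illustrative plug-in version of predictor (5); the moment-type estimate of
# rho used here is an assumption for the example, not an estimator from the paper.
import numpy as np

def feasible_optimal_predictor(y, X, X_f):
    n, m = len(y), X_f.shape[0]
    J_n, J_m = np.ones((n, 1)), np.ones((m, 1))

    b = np.linalg.solve(X.T @ X, X.T @ y)          # least squares estimator (6)
    e = y - X @ b                                   # residuals

    # Average product of distinct residual pairs divided by the average squared
    # residual, used as a crude estimate of the intra-class correlation rho.
    rho_hat = (e.sum() ** 2 - e @ e) / ((n - 1) * (e @ e))
    rho_hat = float(np.clip(rho_hat, -1.0 / (n - 1) + 1e-6, 1 - 1e-6))  # keep W positive definite

    W_hat = (1 - rho_hat) * np.eye(n) + rho_hat * (J_n @ J_n.T)
    W_f_hat = rho_hat * (J_m @ J_n.T)
    return X_f @ b + W_f_hat @ np.linalg.solve(W_hat, e), rho_hat
```

Whether, and how far, such a plug-in construction retains the efficiency properties established for (5) is exactly the question left open above.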

REFERENCES

Bibby, J. and Toutenburg, H. (1978). Prediction and Improved Estimation in Linear Models. John Wiley, New York.

Chandrasekar, B. and Prabakaran, T.E. (1994). A note on optimal vector unbiased predictor. Stat. Papers, 35, 71-80.

Fuller, W.A. and Battese, G.E. (1973). Transformations for estimation of linear models with nested-error structure. J. Amer. Statist. Assoc., 68, 626-632.

Goldberger, A.S. (1962). Best linear unbiased prediction in the generalized linear regression model. J. Amer. Statist. Assoc., 57, 369-375.

Holt, D. and Scott, A.J. (1981). Regression analysis using survey data. The Statistician, 30, 169-178.

King, M.L. and Evans, M.A. (1986). Testing for block effects in regression models based on survey data. J. Amer. Statist. Assoc., 81, 677-679.

McElroy, F.W. (1967). A necessary and sufficient condition that ordinary least squares estimators be best linear unbiased. J. Amer. Statist. Assoc., 62, 1302-1304.

Munoz, A., Rosner, B. and Carey, V. (1986). Regression analysis in the presence of heterogeneous intraclass correlations. Biometrics, 42, 653-658.

Rao, C.R. and Toutenburg, H. (1995). Linear Models: Least Squares and Alternatives. Springer.

Rosner, B. (1984). Multivariate methods in ophthalmology with applications to other paired-data situations. Biometrics, 40, 1025-1035.

Shalabh (1995). Performance of Stein-rule procedure for simultaneous prediction of actual and average values of study variable in linear regression model. Proceed. Fiftieth Session Int. Stat. Inst., 1375-1390.

Srivastava, M.S. (1984). Estimation of intraclass correlation in familial data. Biometrika, 71, 177-185.

Zellner, A. (1994). Bayesian and non-Bayesian estimation using balanced loss functions. In Statistical Decision Theory and Related Topics V (eds. S.S. Gupta and J.O. Berger), Springer-Verlag, New York.

Shalabh
Department of Statistics
University of Jammu
Jammu-180 004, India