Sampling, WLS, and Mixed Models Festschrift to Honor Professor Gary Koch

Edward J. Stanek III
Department of Public Health, University of Massachusetts, Amherst, MA
and
Julio M. Singer
Departamento de Estatística, Universidade de São Paulo, Brazil

Abstract

Mixed models may be defined with or without reference to sampling or sampling random variables, and can be used to predict realized random effects. A common application involves the estimation of latent values of study subjects measured with response error. In this context, mixed models may be specified as a sum of two random variables, with one stemming from an exchangeable distribution of latent values of study subjects and the other from the study subjects' response error distributions. Such models assign positive probabilities to both potentially realizable responses and to artificial responses that are not potentially realizable. This has an impact on the definition of the parameters associated with study subjects, on the interpretation of bias and on the evaluation of predictors. In contrast, finite population mixed models may be defined to represent the two-stage process of sampling subjects and measuring their responses. Such models assign positive probabilities only to potentially realizable responses. We consider the problem of estimating a subject's latent value measured with response error and compare the two mixed model formulations via a simple example. An analysis of the performance of the corresponding predictors over the same potentially realizable responses indicates that the optimal linear mixed model predictor (the usual BLUP) is often (but not always) more accurate than the comparable finite population mixed model BLUP. The example provides the basis for a broader discussion of other linear estimators such as weighted least squares, and the role of conditioning, sampling, and model assumptions in developing inference.

C09ed3v.doc 0/7/009 5:40 PM

Introduction

Advances in public health and health science are tied to understanding practical implications of changes in policy, programs, underlying cause of disease, prevention, and/or treatment (Koch et al., 1980). Understanding the impact of such changes is the focus of much of Biostatistics. Not only does Biostatistics embrace the theoretical underpinnings of statistical modeling, but it seeks to tie the results of studies to actual reality. It is for this reason that sampling plays an important role in many applications of Biostatistics, since estimates are needed for real populations. This is not a simple process, since it involves reconciling seemingly ad hoc approaches, such as Koch's (1967) procedure to estimate the population mean, with the fundamental basis of inference from survey sampling, when extended, for example, to response error (Koch, 1973), and to model based approaches. The struggle to understand the basic underpinnings of Biostatistics has been the focus of much of Koch's work, and continues to be of compelling interest, as discussed by Brown and Kass (2009).

We discuss a simple setting that we feel challenges the depths of understanding of statistical inference: estimating the latent value of a subject. There are many settings where interest lies in the latent value for study subjects. An example is the Seasons Study (Merriam et al., 1999; Ockene et al., 2004), where three 24-hour recall dietary interviews were collected on each study subject in each season of a year to evaluate seasonal cholesterol changes, controlling for the contribution of saturated fat intake. The 24-hour recalls were used to estimate the average saturated fat intake for each subject (the latent value) in the six weeks prior to cholesterol measure. Average saturated fat intake and the estimated standard deviation for 554 study subjects for the first season in the study are displayed in Figure 1. Both the latent value and the variance in saturated fat intake vary among subjects.
Rather than using the simple average for the season to estimate a subject's latent saturated fat intake, a more accurate estimate may be obtained by using a best linear unbiased predictor (BLUP) in a mixed model (MM) with subjects as random effects. Although mixed model BLUPs are commonly used to estimate realized subjects' latent values, a close examination reveals that a portion of the MM sample space is artificial and not potentially realizable. This prompts a reexamination of the interpretation of latent values, the bias of the MM-BLUP, and the criteria used to evaluate its performance. This also provides a context for comparison with the finite population mixed model (FPMM) considered by Stanek and Singer (2004), which includes sampling and avoids counterfactuals that are not potentially realizable.

We discuss these issues in the context of a simple problem. First, we develop a mixed model for a set of subjects whose responses follow a simple response error model by adding the assumption that the subject latent values are random effects. This introduces definitions and notation. Next, we discuss a simple example to distinguish (possibly artificial) MM-latent values from (actual) subject latent values, and MM-responses from potentially realizable responses. We follow by assuming that the set is selected from a finite population of size N = 3 and review the FPMM along with the corresponding BLUP in this context. We conclude with a discussion of the connections between the issues raised in the example and some broader ideas in Statistics.

The Mixed Model Predictor

Many frameworks can be used to develop the BLUP under a MM, as discussed by Robinson (1991). The framework we use begins with an additive response error model for each subject, and assumes exchangeability of the corresponding latent values. The study subjects constitute a set that may or may not have been obtained as a result of a probability sample from a population. In the Seasons Study, for example, the study subjects constitute a volunteer subset of members of the Fallon Health Maintenance Organization.

We start with a set of $n$ subjects, labeled $j = 1, \ldots, n$, and assume that repeated responses, $Y_{jk}$, $k = 1, \ldots, r_j$, are associated with subject $j$. The data for the set correspond to the pairs $(j, (Y_{j1}, Y_{j2}, \ldots, Y_{jr_j}))$, $j = 1, \ldots, n$. We assume that the responses associated with subject $j$ are independent and identically distributed random variables $Y_{jk}$, $k = 1, \ldots, r_j$, and define $E_R(Y_{jk}) = y_j$ as the latent value for subject $j$; we also let $\mathrm{var}_R(Y_{jk}) = \sigma_j^2$ denote the corresponding response error variance. The subscript $R$ indicates expectation with respect to the distribution of the response error. For simplicity, we consider a single measure for subject $j$ and drop the subscript $k$, so that the response error model may be written as

$$Y_j = y_j + E_j. \qquad (1)$$

When $r_j > 1$, $Y_j$ and $E_j$ correspond to the average response and average response error, respectively, and $\sigma_j^2$ represents the variance of these averages, which we assume known. The latent values, $y_j$, are the parameters of interest. Without additional assumptions, the response for subject $j$, namely $Y_j$, is the best estimator of the subject's latent value.

We define a MM by adding to the response error model the assumption that the latent values for the subjects are a realization of $P = (P_1\ P_2\ \cdots\ P_n)'$, an exchangeable vector of random variables whose possible equally likely values are the latent values of the subjects in the set. A realization of $P_j$ is a MM-latent value, which we re-parameterize as $P_j = \mu + a_j$, $j = 1, \ldots, n$, and for which we assume that $E_\xi(a_j) = 0$, $E_\xi(a_j a_{j'}) = \frac{n-1}{n}\gamma$ when $j = j'$, or $E_\xi(a_j a_{j'}) = -\frac{1}{n}\gamma$ otherwise, with

$$\mu = \frac{1}{n}\sum_{j=1}^{n} y_j \quad \text{and} \quad \gamma = \frac{1}{n-1}\sum_{j=1}^{n} (y_j - \mu)^2.$$

The subscript $\xi$ indicates expectation with respect to the distribution of latent values. One possible realization of $P$ is $y = (y_1\ y_2\ \cdots\ y_n)'$, while other possible realizations of $P$ are permutations of these values. The MM is given by

$$Y_j = \mu + a_j + E_j \qquad (2)$$

or, in matrix form, by $Y = X\mu + Za + E$, where $Y = (Y_1\ Y_2\ \cdots\ Y_n)'$, $X = 1_n$, a column vector with all elements equal to 1, $Z = I_n$, an identity matrix, $a = (a_1\ a_2\ \cdots\ a_n)'$, the vector of random effects, and $E = (E_1\ E_2\ \cdots\ E_n)'$, a vector of response errors. In the MM, $E_{\xi R}(Y) = X\mu$, while $\mathrm{var}_{\xi R}(Y) = \Omega$, where $\Omega = \Gamma + \boldsymbol{\sigma}$, $\Gamma = \gamma\left(I_n - \frac{1}{n}J_n\right)$, with $J_n = 1_n 1_n'$, and $\boldsymbol{\sigma}$ denotes an $n \times n$ matrix with diagonal elements $\sigma_j^2$ and off-diagonal elements equal to zero. Every realization of $Y$ in (1) corresponds to an identical realization of $Y$ in (2), but not vice-versa.

Let the target correspond to a linear function of $P$ given by $T = g'P$. When $g = e_j$, a vector whose elements are all equal to zero except the element in row $j$ that is equal to one, the target is $P_j$, and the corresponding BLUP is a linear function of $Y$ that is unbiased and has minimum expected mean squared error (MSE). We may show (see Appendix A for details) that the BLUP of $P_j$ is

$$\hat{P}_j = \hat{\mu} + k_j\,(Y_j - \hat{\mu}) \qquad (3)$$

where $\hat{\mu}$ is the weighted least squares (WLS) estimate of the mean, i.e., $\hat{\mu} = \sum_{j=1}^{n} w_j Y_j$, with

$$w_j = \frac{1/(\gamma + \sigma_j^2)}{\sum_{j'=1}^{n} 1/(\gamma + \sigma_{j'}^2)} \quad \text{and} \quad k_j = \frac{\gamma}{\gamma + \sigma_j^2}.$$

The expected MSE of the predictor, given by

$$\mathrm{var}_{\xi R}(\hat{P}_j - P_j) = \sigma_j^2\, k_j \left(1 + \frac{1 - k_j}{n\bar{k}}\right), \quad \text{where } \bar{k} = \frac{1}{n}\sum_{j=1}^{n} k_j,$$

is smaller than the expected MSE ($\sigma_j^2$) attained when we use the subject's response as an estimate of the subject's latent value (see Appendix B). Since the only realization of $T$ that is not artificial is $y_j$, $\hat{P}_j$ may be considered a better estimate of $y_j$ than the observed response, $Y_j$.

The mixed model given by (2) is defined for a set, and not necessarily for a potentially realized sample from a population. One way to introduce the idea of a population in the MM is to assume the latent values for the subjects in the set are the realized latent values from a sample from a population. When the population size $N$ is large, so that the finite population sampling fraction can be ignored, $\mathrm{var}_\xi(P) = \gamma I_n$, where $N$ replaces $n$ in the definition of $\gamma$.

An alternative mixed model considers the data to be the realized response of a random sample of subjects from a (finite) population (Stanek and Singer, 2004). We refer to this model as the finite population mixed model (FPMM), and note that the BLUP given by (3) is not the same as the FPMM-BLUP. These differences warrant a closer examination of the model and underlying assumptions, which we consider via a simple enumerative example.

Examples

We discuss a simple example to compare the MM and FPMM BLUPs. Since the FPMM is defined for a simple random sample from a population, we begin by defining such a population, even though the MM requires only subjects in a set. The data correspond to $n = 2$ subjects, Daisy, labeled $s = 1$, and Rose, labeled $s = 3$, who are members of a population of $N = 3$ subjects summarized in Table 1.

- Insert Table 1 here -

Note that the response error variance differs between subjects. We begin with a discussion of the MM. To match the notation used for the MM, we first order the labels from smallest to largest, indexing the smallest label ($s = 1$) by $j = 1$ and the next smallest label ($s = 3$) by $j = 2$. Next, we assume that for subject $j$, the response error can take on two equally likely values corresponding to $\sigma_j$ or $-\sigma_j$. Under the response error model (1), each response (corresponding to a pair of values for the sample set) is equally likely, with probability 1/4. With these assumptions, we display in Table 2 the potentially realizable responses corresponding to the four combinations of response error.

- Insert Table 2 here -

The potentially realizable responses in Table 2 are possible responses (which we index by $t$) for the MM when the realization of $P$ (the MM-latent values) is $y$. Since the latent values are assumed to be exchangeable and $n = 2$, there are two possible realizations of $P$. Responses for the other realization of the MM-latent values are listed in Table 3.

- Insert Table 3 here -

The responses for the MM listed in Tables 2 and 3 correspond to the equally likely realizations, $Y_t$, $t = 1, \ldots, 8$, of $Y$, each occurring with probability 1/8. The corresponding realizations, $P_t$, $t = 1, \ldots, 8$, of $P$ are the realized MM-latent values. When $t = 1, \ldots, 4$ (as in Table 2), the realizations of $P$ and $Y$ correspond to $y$ and $Y$ in (1), respectively. For such data, Daisy's realized MM-latent value is 10. The realizations of $P$ and $Y$ are artificial when $t = 5, \ldots, 8$ (as in Table 3).
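The MM predictor in (3) is easy to compute by hand. The following is a minimal sketch, not the authors' code: it assumes shrinkage constants $k_j = \gamma/(\gamma + \sigma_j^2)$ and WLS weights proportional to $1/(\gamma + \sigma_j^2)$, and the function names `latent_variance` and `mm_blup`, the latent values (10, 2), the error variances (1, 4) and the response vector (11, 0) are all illustrative assumptions.

```python
def latent_variance(latent_values):
    """gamma = sum_j (y_j - mu)^2 / (n - 1) for the latent values in the set."""
    n = len(latent_values)
    mu = sum(latent_values) / n
    return sum((y - mu) ** 2 for y in latent_values) / (n - 1)


def mm_blup(responses, error_vars, gamma):
    """Return (mu_hat, [P_hat_1, ..., P_hat_n]) for one vector of responses."""
    # WLS weights w_j are proportional to 1 / (gamma + sigma_j^2).
    precisions = [1.0 / (gamma + v) for v in error_vars]
    total = sum(precisions)
    w = [p / total for p in precisions]
    # Shrinkage constants k_j = gamma / (gamma + sigma_j^2).
    k = [gamma / (gamma + v) for v in error_vars]
    mu_hat = sum(wj * yj for wj, yj in zip(w, responses))
    # Each response is shrunk toward the WLS mean.
    blup = [mu_hat + kj * (yj - mu_hat) for kj, yj in zip(k, responses)]
    return mu_hat, blup


# Assumed illustration: latent values (10, 2), error variances (1, 4),
# and one response vector with errors +1 and -2.
gamma = latent_variance([10.0, 2.0])
mu_hat, blup = mm_blup([11.0, 0.0], [1.0, 4.0], gamma)
print(gamma, mu_hat, blup)
```

Each predictor lies between the WLS mean and the subject's own response, and the subject with the smaller response error variance is shrunk less.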
For these artificial realizations, Daisy's realized MM-latent value is 2. The BLUP of $P_j$ under the MM given by (3) for each realized MM-response is given in Table 4. The columns in Table 4 are organized in two panels, with the first panel corresponding to Daisy, and the second panel to Rose. The differences, $\hat{P}_j - P_j$, and the corresponding squared differences are given in the last two columns of each panel. Notice that the average difference is zero, satisfying the unbiased constraint given by $E_{\xi R}(\hat{P}_j - P_j) = 0$. The average squared difference, or MSE, is 0.99 for Daisy and 3.77 for Rose. These values are smaller than those that would result from a best linear unbiased estimator (BLUE) using model (1), namely $\sigma_1^2 = 1$ for Daisy, and $\sigma_3^2 = 4$ for Rose.

- Insert Table 4 here -

There are some problems with these results, which can be illustrated by focusing on the MM-responses for Daisy (first panel of Table 4). Notice from Table 1 that Daisy's latent value is 10, while 2 is also listed as a latent value for Daisy in Table 4. The MM-latent value of 2 for Daisy corresponding to the MM-responses $t = 5, \ldots, 8$ exists only in the mixed model, not in reality. Such a latent value is artificial, and one could argue that it should not be given a positive probability in the analysis. This is not due to a different interpretation of the subject labeled $j = 1$, since this label only corresponds to Daisy in the model definition.

These results shed light on the interpretation of bias and on the definition of the MSE for the MM. In order to compute bias, we need to subtract the subject's actual latent value for all settings, as shown in Table 5. Using the subject's actual latent value, the BLUP given by (3) is biased for each subject, and its MSE is larger than the MSE of the BLUE based on model (1).

- Insert Table 5 here -

In the MM, positive probability is given to MM-responses that are not potentially realizable. By averaging over these artificial responses in addition to the potentially realizable responses, the connection between the MM and reality is broken. This creates contradictions in the interpretation of results. For example, the latent value for Daisy ($j = 1$) is 10 for all potentially realizable responses, but the expected value of the corresponding MM-latent values is $E_{\xi R}(P_1) = 6$.

To retain the interpretation of the latent value for the subject, the MM-latent value should be defined only over the potentially realizable MM-responses, i.e., those corresponding to $t = 1, \ldots, 4$.
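The enumeration behind these statements can be reproduced with a short script. This is a sketch under stated assumptions rather than the authors' computation: it takes latent values (10, 2) and response error variances (1, 4) for the two subjects, lets each error equal plus or minus its standard deviation, and keeps the error variance attached to the position $j$ while the latent values are permuted. The helper names `mm_enumeration` and `mean` are hypothetical, and distinct latent values are assumed so that only the identity permutation is realizable.

```python
from itertools import permutations, product


def mm_enumeration(latent, error_vars):
    """List (realizable, MM-latent values, BLUPs) for every MM-response."""
    n = len(latent)
    mu = sum(latent) / n
    gamma = sum((y - mu) ** 2 for y in latent) / (n - 1)
    k = [gamma / (gamma + v) for v in error_vars]
    inv = [1.0 / (gamma + v) for v in error_vars]
    w = [x / sum(inv) for x in inv]
    rows = []
    for perm in permutations(range(n)):
        P = [latent[i] for i in perm]            # MM-latent values by position
        realizable = perm == tuple(range(n))     # only the identity is real
        for signs in product((1, -1), repeat=n):
            Y = [P[j] + signs[j] * error_vars[j] ** 0.5 for j in range(n)]
            mu_hat = sum(wj * yj for wj, yj in zip(w, Y))
            blup = [mu_hat + k[j] * (Y[j] - mu_hat) for j in range(n)]
            rows.append((realizable, P, blup))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


latent, error_vars = [10.0, 2.0], [1.0, 4.0]    # assumed example values
rows = mm_enumeration(latent, error_vars)
j = 0                                            # first subject (Daisy)
bias_mm = mean(b[j] - P[j] for _, P, b in rows)           # vs MM-latent value
mse_mm = mean((b[j] - P[j]) ** 2 for _, P, b in rows)
bias_actual = mean(b[j] - latent[j] for _, _, b in rows)  # vs actual value
bias_cond = mean(b[j] - latent[j] for r, _, b in rows if r)
print(len(rows), bias_mm, mse_mm, bias_actual, bias_cond)
```

Under these assumptions the predictor is unbiased only when the differences are taken against the MM-latent values over all eight responses; against the subject's actual latent value it is biased, both over all responses and over the four realizable ones.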
Thus, the target quantity must be defined conditionally on the potentially realizable responses, even though the MM is defined unconditionally. Defining the MM-latent value as the expected value of $P_j$ only over the potentially realizable responses results in a latent value equal to 10 for Daisy and in a latent value equal to 2 for Rose, which correspond to their true latent values.

Restricting evaluation of (3) to potentially realizable responses provides some insight on bias and MSE. The conditional bias is given by

$$E_R(\hat{P}_j - P_j \mid P = y) = -(1 - k_j)(y_j - \mu_w),$$

where $\mu_w = \sum_{j=1}^{n} w_j y_j$ (see Appendix C). Therefore, the conditional bias for Daisy is -0.12, while the conditional bias for Rose is 0.46. The average conditional bias over the subjects is not equal to zero. Using a similar definition for the MSE (see Appendix C), i.e.,

$$E_R\left((\hat{P}_j - P_j)^2 \mid P = y\right) = (1 - k_j)^2\left[\sum_{j'=1}^{n} w_{j'}^2 \sigma_{j'}^2 + (y_j - \mu_w)^2\right] + k_j^2 \sigma_j^2\left[1 + \frac{2(1 - k_j)}{n\bar{k}}\right], \qquad (4)$$

it follows that the MSE is 0.986 for Daisy and 3.768 for Rose, both smaller than the MSE of the simple responses, $Y_j$, $j = 1, 2$.

Estimating the Mean Latent Value

Our development has focused on estimating the MM-latent value for a subject. We can use similar methods to obtain an estimate of $T = g'P$, where $g = \frac{1}{n}1_n$ in the MM, a target that corresponds to the average MM-latent value, $\bar{P}$. The corresponding BLUE is the weighted least squares (WLS) estimator given by $\hat{\mu}$. Notice that $\bar{P}$ is equal to $\bar{y} = \frac{1}{n}\sum_{j=1}^{n} y_j$, the mean of the latent values in the response error model (1). The BLUE of $\bar{y}$ in (1) is the mean response, $\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j$. Since $\bar{P} = \bar{y}$, it is tempting to compare the BLUE obtained under model (1) with the BLUP obtained under model (2), as illustrated in Table 6.

- Insert Table 6 here -

Under model (1), there are no responses comparable to the MM-responses for $t = 5, \ldots, 8$. This is a consequence of the inclusion of artificial responses in the MM. The target parameter, $\bar{P}$, is constant over all possible MM-responses. If we define an estimator similar to $\bar{Y}$ for the MM-responses as $\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j$, then the MSE of $\bar{Y}$ and $\hat{\mu}$ can be evaluated over the same sample space. The MSE of $\hat{\mu}$, given by $E_{\xi R}(\hat{\mu} - \bar{P})^2 = \frac{1 - \bar{k}}{n\bar{k}}\,\gamma$, is less than the MSE of $\bar{Y}$, given by $E_{\xi R}(\bar{Y} - \bar{P})^2 = \frac{1}{n^2}\sum_{j=1}^{n}\sigma_j^2$, which equals the MSE of $\bar{Y}$ under model (1), i.e., $E_R(\bar{Y} - \bar{y})^2 = \frac{1}{n^2}\sum_{j=1}^{n}\sigma_j^2$, providing the usual justification for the use of $\hat{\mu}$ instead of $\bar{Y}$.

The WLS estimator is unbiased when evaluated over all responses. Evaluated over potentially realizable responses, i.e., those corresponding to $t = 1, \ldots, 4$, the bias is

$$E_{\xi R}(\hat{\mu} - T \mid P = y) = \frac{1}{n}\sum_{j=1}^{n} \frac{k_j - \bar{k}}{\bar{k}}\, y_j.$$

The unbiased property of the WLS estimate of the latent value mean holds only when expectation is taken over all possible responses, including those artificial responses that are not potentially realizable. The MSE, evaluated only over potentially realizable responses, is

$$\mathrm{MSE}_{\xi R}(\hat{\mu} - T \mid P = y) = \sum_{j=1}^{n} w_j^2 \sigma_j^2 + \left(\sum_{j=1}^{n}\left(w_j - \frac{1}{n}\right) y_j\right)^2.$$

When $n = 2$, as in the example illustrated in Table 6, this expression simplifies to $E_{\xi R}(\hat{\mu} - \bar{P})^2 = \frac{1 - \bar{k}}{2\bar{k}}\,\gamma$. When $n > 2$, as illustrated next, the conditional MSE of the MM-BLUP is not equal to its unconditional MSE, and may be larger (or smaller) than the unconditional MSE.

A Slightly Larger Example

Although in the first example, with $n = 2$, it was possible to enumerate all outcomes, some issues that occur more generally could not be revealed. We briefly discuss a second example, where the data correspond to $n = 3$ subjects, Daisy ($s = 1$), Lily ($s = 2$) and Rose ($s = 3$), to raise such issues. We order the labels in the set from smallest to largest, indexing the smallest label by $j = 1$, the next smallest label by $j = 2$, and the largest label by $j = 3$, and assume that the response error for subject $j$ can take on two equally likely values corresponding to $\sigma_j$ or $-\sigma_j$. With these assumptions, there are eight equally likely potentially realizable responses corresponding to the different combinations of response error (Table 7).

- Insert Table 7 here -

The $t = 1, \ldots, 8$ potentially realized responses in Table 7 are possible responses (which we index by $t$) for the MM when the realization of $P$ (the MM-latent values) is $y$. Since the latent values are assumed to be exchangeable and $n = 3$, there are six possible realizations of $P$. Replacing $y$ by each of the other realizations gives rise to 40 artificial responses that are not realizable, but are included with positive probability in the MM. The predictor of $P_1$ given by (3) under the MM for Daisy is listed for $t = 1, \ldots, 8$ in Table 8, and for $t = 9, \ldots, 48$ in Table 9. We summarize the results for the MM-BLUP of each subject in Table 10.

- Insert Tables 8-10 here -

Notice that when averaging over the potentially realizable responses ($t = 1, \ldots, 8$), the MM-latent value is the subject's latent value. The average squared difference between the MM-BLUP and the MM-latent value for the potentially realizable responses is larger than a similar average over the non-realizable responses for Daisy and Rose, but not for Lily.
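The conditional bias and the conditional MSE in (4) follow from a few lines of algebra. The derivation below is a sketch that assumes only the predictor in (3), the response error model $Y_j = y_j + E_j$ with independent errors of mean zero and variance $\sigma_j^2$, and the decomposition $\hat{\mu} = \mu_w + \sum_{j'} w_{j'} E_{j'}$ that holds when $P = y$; the index $j'$ is notation introduced here.

```latex
\begin{align*}
\hat{P}_j - y_j
  &= (1-k_j)(\hat{\mu} - y_j) + k_j E_j \\
  &= -(1-k_j)(y_j - \mu_w)
     + (1-k_j)\sum_{j'=1}^{n} w_{j'} E_{j'} + k_j E_j, \\[4pt]
E_R\!\left(\hat{P}_j - y_j \mid P = y\right)
  &= -(1-k_j)(y_j - \mu_w), \\[4pt]
E_R\!\left((\hat{P}_j - y_j)^2 \mid P = y\right)
  &= (1-k_j)^2\Big[(y_j - \mu_w)^2 + \sum_{j'=1}^{n} w_{j'}^2\sigma_{j'}^2\Big]
     + k_j^2\sigma_j^2 + 2k_j(1-k_j)\,w_j\,\sigma_j^2 .
\end{align*}
```

Because $w_j = k_j/(n\bar{k})$, the last two terms combine into $k_j^2\sigma_j^2\,[1 + 2(1-k_j)/(n\bar{k})]$. The cross term $2k_j(1-k_j)w_j\sigma_j^2$ arises because $E_j$ enters the predictor twice, once directly and once through $\hat{\mu}$.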
It is the overall average MSE (over $t = 1, \ldots, 48$) that is usually evaluated for the MM, even though such an average includes responses that are not potentially realizable.

It is of value to consider the MM-BLUP of $\bar{P} = 5$ in this example. Over the potentially realizable responses ($t = 1, \ldots, 8$), the average of $\hat{\mu}$ is 6.009, while over the counterfactual responses, the average is 4.798. Although the simple average of $\hat{\mu}$ over all MM-responses is equal to $\bar{P}$, this unbiased result occurs only if the artificial non-realizable responses are included. The average MSE for the potentially realizable responses ($t = 1, \ldots, 8$) is 2.667, while 3.645 is the average MSE for the counterfactual responses ($t = 9, \ldots, 48$). The simple average MSE (over all $t = 1, \ldots, 48$), 3.482, is larger than the average MSE for the potentially realizable responses, but smaller than the comparable average MSE under the response error model, 11.667.

The Finite Population Mixed Model

We define a finite population mixed model by considering the data to be the realized response of a simple random sample of subjects from a finite population, assuming a single response for each subject. We define subjects, latent values, and response in the population using notation similar to that of model (1), by defining the population as a set of $N$ subjects. We use the subscript $s$ to label subjects in the population, and note that $y$ and $E$ represent $N \times 1$ vectors of latent values and response error, respectively. With this definition, $\mu = \frac{1}{N}\sum_{s=1}^{N} y_s$ corresponds to the usual finite population mean, while $\frac{N-1}{N}\gamma$ corresponds to the usual finite population variance, where $\gamma = \frac{1}{N-1}\sum_{s=1}^{N}(y_s - \mu)^2$. We define $Y$ as an $N \times 1$ response vector with elements $Y_s = y_s + E_s$, $s = 1, \ldots, N$, so that the response error model for the population is given by

$$Y = y + E. \qquad (5)$$

We define a sample as a sequence of $n$ subjects, and use $i = 1, \ldots, n$ to index the subjects in the sequence. We index the possible sequences of subjects by $h$, where $h = 1, \ldots, H$ and $H = \frac{N!}{(N-n)!}$. Let $y_{hi}$ denote the latent value for the subject in position $i$ in sequence $h$ and define the sample vector of latent values by $y_h = (y_{h1}\ y_{h2}\ \cdots\ y_{hn})'$. This general representation of a sample was used by Godambe (1955). We define the response for sequence $h$ by $Y_h = u_h'Y$, so that the element $Y_{hi}$ denotes the response for the subject in position $i$ in sequence $h$, $Y_h = (Y_{h1}\ Y_{h2}\ \cdots\ Y_{hn})'$, and $u_h = (u_{h1}\ u_{h2}\ \cdots\ u_{hn})$ is an $N \times n$ matrix of constants with columns given by $u_{hi} = (u_{hi1}\ u_{hi2}\ \cdots\ u_{hiN})'$ for $i = 1, \ldots, n$.
The element $u_{his}$ has a value of one if subject $s$ is in position $i$ in sequence $h$, and zero otherwise. For example, when $n = 2$ and $N = 3$, the data for sequence $h$ consisting of subject $s = 3$ followed by subject $s = 1$ is $((s = 3, Y_3), (s = 1, Y_1))$, where

$$u_h = \begin{pmatrix} u_{h11} & u_{h21} \\ u_{h12} & u_{h22} \\ u_{h13} & u_{h23} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \\ 1 & 0 \end{pmatrix}.$$

Latent values and response errors for the subjects in sequence $h$ are defined in a similar manner by $y_h = u_h'y$ and $E_h = u_h'E$, respectively. While it is possible to relate the response for a subject in position $i$ in sequence $h$ to the response for subject $j$ defined by the response error model (1) (see Appendix D), it is important to note that the subject in position $i$ in sequence $h$ is not necessarily the same subject as the subject labeled $j$ in model (1), since for only one sequence will the order of the subjects in (1) match the subject's position in the sequence.

The finite population mixed model is defined by assuming that a sample corresponds to a randomly selected sequence, where $I_h$ represents an indicator random variable that has a value of one when sample sequence $h$ is selected, and zero otherwise, and subsequently summing over the possible sequences. We assume that all sample sequences are equally likely (corresponding to simple random sampling without replacement), so that $E_p(I_h) = \frac{1}{H}$ (where the subscript $p$ indicates expectation with respect to sampling). Next, let $Y_I = \sum_{h=1}^{H} I_h Y_h$ be an $n \times 1$ vector representing the sample response. Defining $U_I = \sum_{h=1}^{H} I_h u_h$ with elements $U_{is} = \sum_{h=1}^{H} I_h u_{his}$, $i = 1, \ldots, n$, $s = 1, \ldots, N$, $Y_I = U_I'Y$ is a vector of sample random variables, $Y_{Ii} = \sum_{s=1}^{N} U_{is} Y_s$, $i = 1, \ldots, n$. Using (5) and defining subject effects by $\beta_s = y_s - \mu$, $s = 1, \ldots, N$, the finite population mixed model may be written as

$$Y_{Ii} = \mu + b_i + E_{Ii} \qquad (6)$$

where $b_i = \sum_{s=1}^{N} U_{is}\beta_s$ and $E_{Ii} = \sum_{s=1}^{N} U_{is}E_s$, or in matrix form as $Y_I = X\mu + Zb + E_I$, where $b = U_I'\beta$, $b = (b_1\ b_2\ \cdots\ b_n)'$, $\beta = (\beta_1\ \beta_2\ \cdots\ \beta_N)'$ and $E_I = U_I'E$. This represents the sample random variables in the finite population as defined by Stanek and Singer (2004). The random variable $b_i$ corresponds to the deviation of the subject's latent value from the population mean for the subject in position $i$ in a randomly selected sequence.

Let the target correspond to a linear function of $P_I = X\mu + Zb$ given by $T_I = g'P_I$.
When $g = e_i$, a vector whose elements are all equal to zero except the element in row $i$ that is equal to one, the target is $P_{Ii}$, and the FPMM-BLUP is a linear function of $Y_I$ that is unbiased and has minimum expected mean squared error (MSE). We show (see Appendix E for details) that the FPMM-BLUP is

$$\hat{P}_{Ii} = \bar{Y} + k\,(Y_{Ii} - \bar{Y})$$

where $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_{Ii}$ is the sample average response, $k = \frac{\gamma}{\gamma + \bar{\sigma}^2}$, and $\bar{\sigma}^2 = \frac{1}{N}\sum_{s=1}^{N}\sigma_s^2$. The expected MSE of the predictor is

$$\mathrm{var}_{pR}(\hat{P}_{Ii} - P_{Ii}) = \frac{\bar{\sigma}^2}{n}\left[1 + (n-1)k\right].$$

When the target is the mean, $T_I = \mu$, the predictor is $\hat{P}_I = \bar{Y}$, and $\mathrm{var}_{pR}(\hat{P}_I - T_I) = (1 - f)\frac{\gamma}{n} + \frac{\bar{\sigma}^2}{n}$, with $f = n/N$ the sampling fraction.

Of particular interest is a comparison of the average MSE for potentially realizable responses. For all sample sets and subjects, the MM-BLUP MSE is smaller than the MSE of the observed response, and smaller than the MSE of the FPMM-BLUP. For a given sample sequence, the FPMM-BLUP is biased, with the bias given by $E_R(\hat{P}_{Ii} - P_{Ii} \mid I_h = 1) = -(1 - k)(y_{hi} - \mu_h)$ (see Appendix F), where $\mu_h$ is the average latent value of the subjects in sequence $h$. Conditional on a sample sequence, the MSE of the FPMM-BLUP is given by

$$\mathrm{MSE}_{pR}(\hat{P}_{Ii} - P_{Ii} \mid I_h = 1) = \left[k^2 + \frac{2k(1-k)}{n}\right]\sigma_{hi}^2 + (1-k)^2\frac{\bar{\sigma}_h^2}{n} + (1-k)^2(y_{hi} - \mu_h)^2, \qquad (7)$$

where $\bar{\sigma}_h^2$ is the average response error variance of the subjects in sequence $h$.

Examples

We consider the FPMM-BLUP for simple random samples of size $n = 2$ from the population of $N = 3$ subjects listed in Table 1. First, note that there are six possible sample sequences, with $2! = 2$ sequences for each sample set. Since the FPMM-BLUP is identical for a subject in different sequences in the same set, we list the $t = 1, \ldots, 12$ possible equally likely sample responses in Table 11, corresponding to the three sample sets.

- Insert Table 11 here -

Notice that the FPMM-BLUP is a biased predictor of the subject's latent value for each subject, but the average bias (over all subjects) is zero. The MSE differs between subjects, and exceeds the MSE of the observed response for Daisy and Rose, but is smaller (48.58 vs 100) than the MSE of the observed response for Lily. The average MSE of the FPMM-BLUP (i.e., 23.66) over all subjects is smaller than the average MSE of the observed response (i.e., 35).

Table 12 provides a summary of the MM-BLUP and the FPMM-BLUP for the three different sets of $n = 2$ from the population of $N = 3$ listed in Table 1. Recall that the MM-BLUP is defined for each particular set, while the FPMM-BLUP is defined over all sets. The results in Table 12 are arranged in panels of rows corresponding to average predictors of Daisy's, Lily's, and Rose's latent values. Columns correspond to the average predictor, the bias, and the MSE.
The bias and MSE are evaluated for the MM relative to the MM-latent value, and relative to the subject's true latent value. The potentially realizable responses correspond to rows where $t = 1, \ldots, 4$. The last three rows in Table 12 summarize the average results over potentially realizable responses, over counterfactual responses that are not potentially realizable, and over all responses. Notice that the average bias over all responses is zero for each predictor, but when bias is calculated only over potentially realizable responses, the MM-BLUP is biased, while the FPMM-BLUP is not. The results in Table 12 illustrate overlapping but distinct sample spaces that underlie the MM and the FPMM predictors.
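The FPMM enumeration can be sketched in the same way. The script below is an illustration under assumptions, not the authors' code: it takes a population of three subjects with latent values (10, 3, 2) and response error variances (1, 100, 4), samples ordered sequences of $n = 2$ subjects without replacement, lets each error equal plus or minus its standard deviation, and uses the predictor $\bar{Y} + k(Y_{Ii} - \bar{Y})$ with $k = \gamma/(\gamma + \bar{\sigma}^2)$. The function name `fpmm_summary` is hypothetical. Unlike in the MM, the error variance here travels with the subject, so every enumerated response is potentially realizable.

```python
from itertools import permutations, product


def fpmm_summary(latent, error_vars, n):
    """Bias and MSE of the FPMM-BLUP per subject, over all ordered samples."""
    labels = list(latent)
    N = len(labels)
    mu = sum(latent.values()) / N
    gamma = sum((y - mu) ** 2 for y in latent.values()) / (N - 1)
    avg_var = sum(error_vars.values()) / N
    k = gamma / (gamma + avg_var)
    diffs = {s: [] for s in labels}
    for seq in permutations(labels, n):           # ordered sample sequences
        for signs in product((1, -1), repeat=n):  # two-point response errors
            Y = [latent[s] + sg * error_vars[s] ** 0.5
                 for s, sg in zip(seq, signs)]
            ybar = sum(Y) / n
            for s, y_obs in zip(seq, Y):
                # Prediction error relative to the subject's own latent value.
                diffs[s].append(ybar + k * (y_obs - ybar) - latent[s])
    bias = {s: sum(d) / len(d) for s, d in diffs.items()}
    mse = {s: sum(x * x for x in d) / len(d) for s, d in diffs.items()}
    return k, bias, mse


# Assumed population values for illustration.
latent = {"Daisy": 10.0, "Lily": 3.0, "Rose": 2.0}
error_vars = {"Daisy": 1.0, "Lily": 100.0, "Rose": 4.0}
k, bias, mse = fpmm_summary(latent, error_vars, n=2)
average_mse = sum(mse.values()) / len(mse)
print(k, bias, mse, average_mse)
```

With these assumed values the per-subject biases are nonzero but sum to zero, and the predictor improves on the observed response only for the subject with the largest response error variance.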

An Example with N = 4 and n = 3

We consider a slightly larger example to compare the MM-BLUP and FPMM-BLUP. The example is for a population of N = 4 where a simple random sample of n = 3 subjects is selected, resulting in four possible sample sets. The population consists of the original population given in Table 1, and an additional subject, Violet, with a latent value and response variance given by $y_4 =$ and $\sigma_4^2 = 5$, respectively. The comparison is made for each sample set, assuming that the set constitutes a population for the FPMM. This means that $\gamma^2$ and $\sigma^2$ are different for different sets. We compare the MSE of the estimates of subjects' latent values from the MM-BLUP (4) and the FPMM-BLUP in Table 3. The results indicate that the MSE of the MM-BLUP is smaller than that of the FPMM-BLUP in most, but not all, settings. The estimate of Violet's latent value based on the FPMM-BLUP has smaller MSE than the MM-BLUP in sets that include Daisy.

Discussion

The comparison of the model-based formulation of the mixed model ( ) and the finite population mixed model (6) via the examples provides some insight, and also reveals the opportunity for confusion in discussions of mixed models. First, the comparison provides some clarity to Robinson's (1991) discussion of whether the MM-BLUP should be termed an estimator or a predictor, and underscores the difficulty that Henderson (1975) had in providing a convincing interpretation of the MM-BLUP. Henderson (1984, page 37) posed the problem as follows: "Which is the more logical concept, prediction of a random variable or estimation of the realized value of a random variable? If we have an animal already born, it seems reasonable to describe the evaluation of its breeding value as an estimation problem. On the other hand, if we are interested in evaluating the potential breeding value of a mating between two potential parents, this would be a problem in prediction." The terminology of estimation applies to the MM-BLUP when the animal is already born, while prediction applies to the FPMM-BLUP when the mating parents have yet to be selected.
The interpretation of unbiased is also clarified. In the mixed model, we can distinguish $E_{\xi R}(Y_j) = \mu$ from $E_{R \mid \xi}(Y_j) = P_j$ (the MM-latent value for subject $j$) and from $E_{R \mid \xi}(Y_j \mid P_j = y_j) = y_j$, the true latent value for subject $j$. If our interest is in the latent value for subject $j$, the unbiased property of the MM-BLUP is defined as $E_{\xi R}(\hat{P}_j) = \mu$. This differs from the usual definition of unbiased, given by $E_{R \mid \xi}(\hat{P}_j \mid P_j = y_j) = y_j$. Neither the MM-BLUP nor the FPMM-BLUP is unbiased when this definition is adopted. The MM-BLUP is a biased estimator of the subject's latent value, while the FPMM-BLUP is a biased predictor of the realized random effect. Including "U" in the BLUP terminology may provide reassurance that BLUPs are OK for those who consider lack of bias a prerequisite for analysis. But truth would be better served if both MM-BLUPs and FPMM-BLUPs were described as biased but more accurate ways of estimating a subject's latent value.

An important aspect of the parallel development of the MM and FPMM is illustrating the overlapping but distinct sample spaces. Since the examples we considered are small and the outcomes are discrete, it is possible to make the sample spaces explicit. More generally, the sample space is the product of possible realizations of $P$ and $E$. If response error has $m$ values for each subject, both the sample spaces for the MM and for the FPMM when $n = N$ have $m^n n!$ possible values. These sample spaces overlap for the $m^n$ values corresponding to $P = y$. The additional $(n! - 1)m^n$ values in the MM when $P \neq y$ are artificial, while the $(n! - 1)m^n$ responses in the FPMM correspond to different permutations of the subjects that are all potentially realizable. The difference between the MM-BLUP and FPMM-BLUP is due to their development over the different sample spaces.

We advocate evaluating statistics over sample spaces that are potentially realizable. This guideline requires statistics to be linked to reality, implying that only a portion of the sample space be used to evaluate estimators in the MM. It is consistent with Tukey's comment in discussion of Nelder (1977) that our focus must be on questions, not models. By limiting evaluation of the estimators from the two formulations of the mixed model to the potentially realizable sample space, we keep the focus on a real question. With this focus, as illustrated via the examples, the MM-BLUP of a subject's latent value is not uniformly more accurate than the simple sample mean, or the FPMM-BLUP.

More study in this area is clearly needed. Guidelines are lacking for estimator choice; understanding is lacking on how to artificially expand a sample space to produce more accurate estimators; practical issues where variance parameters are unknown are yet to be explored; and extensions to settings with auxiliary variables are not considered.
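The sample-space counts discussed above can be made concrete by enumeration. The sketch below assumes the set consisting of Daisy and Rose, with response errors taking the two values $\pm\sigma_s$ as in Tables 2 and 3 (so m = 2 and n = 2), and treats MM sample points as all assignments of the set's latent values to the subjects crossed with all error patterns. The names used are illustrative only.

```python
from itertools import permutations, product

latent = {"Daisy": 10, "Rose": 2}    # true latent values y_s for the set
error_sd = {"Daisy": 1, "Rose": 2}   # assumed two-point response error: +/- sd
subjects = list(latent)

sample_space = []
for assigned in permutations(latent.values()):              # n! assignments of latent values
    for signs in product((1, -1), repeat=len(subjects)):    # m^n response error patterns
        P = dict(zip(subjects, assigned))
        Y = {s: P[s] + sign * error_sd[s] for s, sign in zip(subjects, signs)}
        # A point is potentially realizable only when every subject keeps its own latent value.
        realizable = all(P[s] == latent[s] for s in subjects)
        sample_space.append((P, Y, realizable))

n_points = len(sample_space)
n_realizable = sum(1 for _, _, ok in sample_space if ok)
realizable_responses = {(Y["Daisy"], Y["Rose"]) for _, Y, ok in sample_space if ok}
print(n_points, n_realizable, sorted(realizable_responses))
```

With m = 2 and n = 2 this gives $m^n n! = 8$ sample points, of which $m^n = 4$ are potentially realizable and $(n! - 1)m^n = 4$ are artificial, matching the split between Tables 2 and 3.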
The distinction between potentially realizable points in the sample space and artificial sample points in the MM provides a context for understanding the concern expressed in much of the classical statistical literature that only variance components should be estimated and random effects should not be predicted. First, notice that even though the MM includes artificial sample points, there is an underlying physical reality to $\gamma^2$ (when defined for the set or for a population). This provides legitimacy to estimating variance components. The rationale for concern over predicting random effects in a MM is also evident. For a subject, there is a difference between the MM-latent values and the subject's latent value. In the MM, the latent value associated with a subject is not constant, but changes for different sample points. There is no reason to be interested in the latent value for the subject that is assigned to the artificial sample points. This reasoning provides the logic behind a statement that prediction of random effects has no meaning.

Our understanding of this concern changes if we consider estimation of realized random effects, where the term realized implies limiting consideration to sample points that are potentially realizable. By restricting the sample space to such points, the MM latent value is constant for a subject, and equal to the subject's latent value. Estimation of the realized random effect in the MM is meaningful, as is prediction of the realized random effect in the FPMM.

There is a simple connection between the MM and Bayesian methods. The distribution of $P$ in the MM has been termed the objective prior distribution, as in Robinson (1991). It has a simple interpretation as the distribution of subjects' latent values, and characterizes natural variation between subjects. If $n < N$, expanding the distribution of $P$ to be a subset of $n$ random variables from an exchangeable distribution of latent values in the population, which we denote by $P_N$, will expand the number of artificial responses in the MM sample space, but not alter the number of potentially realizable responses. Although each realization of $P_N$ is a set of latent values from the population, this expansion does not make the estimator based on a set of subjects from the MM more general, nor does it guarantee that the resulting estimator will be more accurate. The accuracy of estimators that are developed from such models should be evaluated only over potentially realizable points in the sample space. Such an evaluation may provide insight as to whether artificial expansion of sample spaces can give rise to more accurate estimators.

It is possible to expand the discussion of Bayesian concepts to include a distribution of fixed effects, which Robinson (1991) refers to as a subjective prior. A sample point in the resulting joint distribution must have parameters equal to those in the actual set of subjects in order for potentially realizable responses to be included in the sample space. The extension to subjective priors extends only the number of artificial points in the sample space, and does not alter the set of potentially realizable responses from which the resulting estimator should be evaluated. Still, it is possible that such an extension of the artificial sample points will produce a more accurate estimator in some settings. This is another area deserving further study.
There is a firm connection between the MM and the FPMM in the survey sampling literature dating back to Godambe's (1955) and Godambe and Joshi's (1965) important papers. This work stimulated a crisis in the foundations of statistical inference, as summarized by Cassel et al. (1977). We discuss this connection, since it provides a unifying framework for ideas of statistical inference.

Godambe (1955) concluded that there is no best linear unbiased estimator of a finite population total based on probability samples. This result was startling since the sample mean from a simple random sample is commonly presented as the BLUE of the population mean. Important ideas in Godambe's development include the very general definition of a linear estimator, and the need for additional assumptions beyond sampling to obtain an optimal estimator. The linear estimator proposed by Godambe (1955) includes separate coefficients for each subject in each position in a sample, where sample points correspond to realizations of the subset of the first $n$ random variables representing a permutation of subject values in a finite population. Subsequently, Godambe and Joshi (1965) concluded that it was sufficient for coefficients to be defined for each subject in a sample set, not a sample sequence. Optimal coefficients can incorporate subject-specific information, such as different response error variances, since subjects are identifiable in a set. Additionally, since the sample set is the starting point, the connection back to the possible sampling probabilities is not relevant, since inference is conditional on the sample set.

The setting considered by Godambe (1955) did not include response error. Adding response error to a subject's latent value in Godambe's basic model does not alter the conclusion of nonexistence of a BLUE, even though it is possible to specify a set of estimating equations. While the equations can be solved, the solution does not result in an estimator since it includes nonsample latent values. Godambe (1955) introduced additional a priori model assumptions (motivated by including an auxiliary variable) in order to develop an estimator of the population total. These assumptions are similar to the MM assumptions on latent values. As a result, the MM can be considered to be a variation on the suggestion by Godambe (1955).

These basic ideas are the foundation for superpopulation models in survey sampling. We identify aspects of these models that are related to the MM. First, there is a connection between the realized sample and the superpopulation, which we define in terms of a set of latent values given by realizations of $P_j = \mu + a_j$. These latent values need not be simply the latent values for the subjects in the sample set, but could be defined quite generally. When the latent values for the subjects in the sample set are included in this definition, it is always possible to consider the realized sample as a possible sample point in the superpopulation model. Notice how this definition obscures the interpretation of a superpopulation, since the only identifiable subjects are those in the sample. While it may be appealing to think of a superpopulation as a larger finite population (as in Voss (1999)), there is no need to do so.

The FPMM is the result of moving in a different direction as a consequence of Godambe's nonexistence results. Rather than adding assumptions to the model for a sample set, the FPMM collapses random variables to a lower dimensional space. One casualty of the collapsing is a loss in identifiability of subjects for the FPMM random variables.
This identifiability is lost when developing predictors of realized random effects, but regained once the subjects in the sample set are realized. In order to maintain these distinctions, Stanek and Singer (2004) have described the FPMM-BLUP as a predictor of the latent value of a realized subject in a position in a sample. The terminology by itself is confusing, since the position is not of substantive interest in a practical problem, and the subject (whose latent value is of interest) is not identifiable. An advantage of the approach is the inclusion only of sample points that are potentially realizable, a fact that simplifies assessing performance of the predictor. While the simple examples illustrate that the FPMM predictor may outperform the MM predictor in some settings, guidance for its use is currently lacking.

Some of the main ideas in these results are far reaching. First, we conclude that an important area for investigation of statistical inference is conditional on the sample set. Second, we conclude that it is crucial to evaluate properties of estimators over potentially realizable sample points, and not to include points in the artificial portion of a sample space. This simple guideline can eliminate debate over inclusion of prior distributions, or other artificial assumptions, since their use in developing estimators is allowed, but the evaluation of the properties of the estimators is tied to reality. Finally, these results illustrate that there is a lot to be learned. Many accepted procedures, models, and theories appear to be based on ideas that are not consistent with these two conclusions. Their re-examination may lead to a sounder basis for statistics in the future.

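Appendix A.

We develop the predictor of $P_j = X_j\mu + Z_j a$, where $X_j = 1$ and $Z_j = e_j'$, as a linear function of $Y$, i.e. $\hat{P}_j = c'Y$, that is unbiased and has minimum expected MSE in the mixed model, following the development by Goldberger (1962) as reviewed by Robinson (1991). With $X = 1_n$ and $Z = I_n$, we first note that

$$E_{\xi R}\begin{pmatrix} Y \\ P_j \end{pmatrix} = \begin{pmatrix} X \\ X_j \end{pmatrix}\mu \quad\text{and}\quad \operatorname{var}_{\xi R}\begin{pmatrix} Y \\ P_j \end{pmatrix} = \begin{pmatrix} \Omega & Z\Gamma Z_j' \\ Z_j\Gamma Z' & Z_j\Gamma Z_j' \end{pmatrix}.$$

The unbiased constraint requires that $E_{\xi R}(\hat{P}_j - P_j) = 0$. Since

$$E_{\xi R}(\hat{P}_j - P_j) = E_{\xi R}\left[c'Y - (X_j\mu + Z_j a)\right] = E_{\xi}(c'P) - X_j\mu = (c'X - X_j)\mu,$$

where $P = (P_1\; P_2\; \cdots\; P_n)'$, the unbiased constraint is given by $c'X - X_j = 0$. Minimizing

$$\operatorname{var}_{\xi R}(\hat{P}_j - P_j) = c'\Omega c - 2c'Z\Gamma Z_j' + Z_j\Gamma Z_j'$$

with respect to $c$ subject to the unbiased constraint results in

$$\hat{P}_j = X_j\hat{\mu} + Z_j\Gamma Z'\Omega^{-1}(Y - X\hat{\mu}), \quad\text{where}\quad \hat{\mu} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y.$$

Since $\Omega = \bigoplus_{j=1}^{n}(\gamma^2 + \sigma_j^2) - \frac{\gamma^2}{n}J_n$,

$$\Omega^{-1} = \frac{1}{\gamma^2}\left[\bigoplus_{j=1}^{n} k_j + \frac{1}{n(1 - \bar{k})}kk'\right],$$

where $k_j = \frac{\gamma^2}{\gamma^2 + \sigma_j^2}$, $k = (k_1\; k_2\; \cdots\; k_n)'$ and $\bar{k} = \frac{1}{n}\sum_{j=1}^{n} k_j$, so that $X'\Omega^{-1} = \frac{1}{\gamma^2(1 - \bar{k})}k'$. Using this result, $X'\Omega^{-1}X = \frac{n\bar{k}}{\gamma^2(1 - \bar{k})}$ and $\hat{\mu} = \frac{1}{n\bar{k}}k'Y$. Now

$$Z_j\Gamma Z'\Omega^{-1} = k_j e_j' - \frac{1 - k_j}{n(1 - \bar{k})}k'$$

and hence

$$Z_j\Gamma Z'\Omega^{-1}(Y - X\hat{\mu}) = k_j(Y_j - \hat{\mu}) - \frac{1 - k_j}{n(1 - \bar{k})}k'(Y - X\hat{\mu}),$$

where $k'(Y - X\hat{\mu}) = 0$, so that

$$\hat{P}_j = X_j\hat{\mu} + Z_j\Gamma Z'\Omega^{-1}(Y - X\hat{\mu}) = \hat{\mu} + k_j(Y_j - \hat{\mu}).$$

Appendix B. Mean Squared Error

The mean squared error of $\hat{P}_j$ under the model is given by $\operatorname{var}_{\xi R}(\hat{P}_j - P_j)$, where $\hat{P}_j = X_j\hat{\mu} + Z_j\Gamma Z'\Omega^{-1}(Y - X\hat{\mu})$ and $P_j = X_j\mu + Z_j a$. Notice that we can write $\hat{P}_j = c'Y$ where

$$c = \frac{1 - k_j}{n\bar{k}}k + k_j e_j.$$

First, observe that

$$E_{R \mid \xi}(\hat{P}_j) = c'P = \frac{1 - k_j}{n\bar{k}}k'P + k_j P_j = (1 - k_j)P_w + k_j P_j = P_w + k_j(P_j - P_w),$$

where $P_w = \frac{1}{n\bar{k}}\sum_{j=1}^{n} k_j P_j$. Now

$$\operatorname{var}_{\xi R}(\hat{P}_j - P_j) = \operatorname{var}_{\xi R}\left[\begin{pmatrix} c' & -Z_j \end{pmatrix}\begin{pmatrix} Y \\ a \end{pmatrix}\right] \quad\text{and}\quad \operatorname{var}_{\xi R}\begin{pmatrix} Y \\ a \end{pmatrix} = \begin{pmatrix} \Omega & \Gamma \\ \Gamma & \Gamma \end{pmatrix},$$

where $\Omega = \Gamma + \bigoplus_{j=1}^{n}\sigma_j^2$. As a result,

$$\operatorname{var}_{\xi R}(\hat{P}_j - P_j) = c'\left(\bigoplus_{i=1}^{n}\sigma_i^2 + \Gamma\right)c - 2Z_j\Gamma c + Z_j\Gamma Z_j'.$$

Using $c$ and $Z_j = e_j'$, we expand these terms. Writing $w_i = \frac{k_i}{n\bar{k}}$, these terms simplify as

$$c'\left(\bigoplus_{i=1}^{n}\sigma_i^2\right)c = k_j^2\sigma_j^2 + 2k_j(1 - k_j)w_j\sigma_j^2 + (1 - k_j)^2\sum_{i=1}^{n} w_i^2\sigma_i^2$$

and, since $(c - e_j)'1_n = 0$ and $\Gamma = \gamma^2\left(I_n - \frac{1}{n}J_n\right)$,

$$c'\Gamma c - 2Z_j\Gamma c + Z_j\Gamma Z_j' = \gamma^2(c - e_j)'(c - e_j) = \gamma^2(1 - k_j)^2\left(\sum_{i=1}^{n} w_i^2 - 2w_j + 1\right).$$

Combining all terms and simplifying the resulting expression using $k_i\sigma_i^2 = \gamma^2(1 - k_i)$,

$$\operatorname{var}_{\xi R}(\hat{P}_j - P_j) = \sigma_j^2 k_j + \frac{\gamma^2(1 - k_j)^2}{n\bar{k}}.$$

Appendix C. The Conditional MSE

We evaluate $E_R(\hat{P}_j - P_j \mid P = y)$. Now $\hat{P}_j = c'Y$, $E_R(Y \mid P = y) = y$, and $P_j = y_j$ when $P = y$. As a result,

$$E_R(\hat{P}_j - P_j \mid P = y) = E_R(c'Y \mid P = y) - y_j = c'y - y_j.$$

Since $c = \frac{1 - k_j}{n\bar{k}}k + k_j e_j$,

$$c'y = (1 - k_j)\sum_{i=1}^{n} w_i y_i + k_j y_j = \mu_w + k_j(y_j - \mu_w),$$

where $w_i = \frac{k_i}{n\bar{k}}$ and $\mu_w = \sum_{i=1}^{n} w_i y_i$. Hence,

$$E_R(\hat{P}_j - P_j \mid P = y) = -(1 - k_j)(y_j - \mu_w).$$

We evaluate $E_R\left[(\hat{P}_j - P_j)^2 \mid P = y\right]$ by noting that given $P = y$, $\hat{P}_j - P_j = c'(y + E) - y_j = c'E + (c'y - y_j)$. Now $E_R(EE') = \bigoplus_{i=1}^{n}\sigma_i^2$. As a result,

$$E_R\left[(\hat{P}_j - P_j)^2 \mid P = y\right] = c'\left(\bigoplus_{i=1}^{n}\sigma_i^2\right)c + (c'y - y_j)^2.$$

Now

$$c'\left(\bigoplus_{i=1}^{n}\sigma_i^2\right)c = k_j^2\sigma_j^2 + 2k_j(1 - k_j)w_j\sigma_j^2 + (1 - k_j)^2\sum_{i=1}^{n} w_i^2\sigma_i^2.$$

Using $c'y = (1 - k_j)\mu_w + k_j y_j$, we have $(c'y - y_j)^2 = (1 - k_j)^2(y_j - \mu_w)^2$. As a result,

$$E_R\left[(\hat{P}_j - P_j)^2 \mid P = y\right] = (1 - k_j)^2\left[\sum_{i=1}^{n} w_i^2\sigma_i^2 + (y_j - \mu_w)^2\right] + k_j\sigma_j^2\left[k_j + 2(1 - k_j)w_j\right].$$

Appendix D. Relationship Between Models for Sequences and Sets

Response for sequence $h$, given by $Y_h$, can be related to response defined by the response error model ( ). To see this, we represent the indicator variables that define sequence $h$ in $u_h$ as a permutation (defined by $v_m$) of elements in set $h$ (defined by $\delta_h$) such that $u_h = \delta_h' v_m$, where

$$\delta_h = \begin{pmatrix} \delta_{h11} & \delta_{h12} & \cdots & \delta_{h1N} \\ \delta_{h21} & \delta_{h22} & \cdots & \delta_{h2N} \\ \vdots & \vdots & & \vdots \\ \delta_{hn1} & \delta_{hn2} & \cdots & \delta_{hnN} \end{pmatrix},$$

$\delta_{hjs}$ has a value of one if the $j$th smallest subject's label in set $h$ is for subject $s$, and zero otherwise, and

$$v_m = \begin{pmatrix} v_{m11} & v_{m12} & \cdots & v_{m1n} \\ v_{m21} & v_{m22} & \cdots & v_{m2n} \\ \vdots & \vdots & & \vdots \\ v_{mn1} & v_{mn2} & \cdots & v_{mnn} \end{pmatrix}$$

with elements $v_{mji}$ having a value of one if the $j$th smallest label in set $h$ is in position $i$, and zero otherwise. For example, when $n = 2$ and $N = 3$, the data for sequence $h$ consisting of subject $s = 3$ followed by $s = 1$, $\left((s = 3, Y_{h1})\; (s = 1, Y_{h2})\right)$, is defined by using $v_m = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ to permute the subjects in set $h$ defined by $\delta_h = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$.

Appendix E.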

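The FPMM-BLUP of a sample subject's latent value, $P_{Ii} = X_i\mu + Z_i b$ where $X_i = 1$ and $Z_i = e_i'$, may be obtained similarly to the development in Appendix A. We first note that $\Gamma = \gamma^2\left(I_n - \frac{1}{N}J_n\right)$ and $\Omega = \Gamma + \sigma^2 I_n$, so that

$$E_{pR}\begin{pmatrix} Y_I \\ P_{Ii} \end{pmatrix} = \begin{pmatrix} X \\ X_i \end{pmatrix}\mu \quad\text{and}\quad \operatorname{var}_{pR}\begin{pmatrix} Y_I \\ P_{Ii} \end{pmatrix} = \begin{pmatrix} \Omega & Z\Gamma Z_i' \\ Z_i\Gamma Z' & Z_i\Gamma Z_i' \end{pmatrix}.$$

The predictor is a linear function of $Y_I$ given by $\hat{P}_{Ii} = c_i'Y_I$ such that $E_{pR}(\hat{P}_{Ii} - P_{Ii}) = 0$, which implies that $c_i'X - X_i = 0$. The FPMM-BLUP is given by

$$\hat{P}_{Ii} = X_i\hat{\mu} + Z_i\Gamma Z'\Omega^{-1}(Y_I - X\hat{\mu}), \quad\text{where}\quad \hat{\mu} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y_I.$$

Since $\Omega^{-1} = \frac{1}{\gamma^2 + \sigma^2}\left(I_n + \frac{k}{N - nk}J_n\right)$,

$$X'\Omega^{-1}X = \frac{n}{(1 - f)\gamma^2 + \sigma^2}, \quad\text{where } f = \frac{n}{N}, \qquad X'\Omega^{-1}Y_I = \frac{1}{(\gamma^2 + \sigma^2)(1 - fk)}1_n'Y_I,$$

$\hat{\mu} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y_I = \bar{Y}$, and

$$Z_i\Gamma Z'\Omega^{-1} = k\left(e_i' - \frac{1 - k}{N - nk}1_n'\right).$$

Since $Y_I - X\hat{\mu} = \left(I_n - \frac{1}{n}J_n\right)Y_I$, $Z_i\Gamma Z'\Omega^{-1}(Y_I - X\hat{\mu}) = k e_i'\left(I_n - \frac{1}{n}J_n\right)Y_I$, so that $\hat{P}_{Ii} = \bar{Y} + k(Y_{Ii} - \bar{Y})$. Using these expressions,

$$\hat{P}_{Ii} = \left[\frac{1}{n}1_n' + k e_i'\left(I_n - \frac{1}{n}J_n\right)\right]Y_I = c_i'Y_I, \quad\text{where}\quad c_i' = \frac{1}{n}1_n' + k e_i'\left(I_n - \frac{1}{n}J_n\right).$$

Now $\operatorname{var}_{pR}\begin{pmatrix} Y_I \\ b \end{pmatrix} = \begin{pmatrix} \Gamma + \sigma^2 I_n & \Gamma \\ \Gamma & \Gamma \end{pmatrix}$. As a result,

$$\operatorname{var}_{pR}(\hat{P}_{Ii} - P_{Ii}) = c_i'(\sigma^2 I_n + \Gamma)c_i - 2Z_i\Gamma c_i + Z_i\Gamma Z_i'.$$

Now

$$c_i'(\sigma^2 I_n + \Gamma)c_i = c_i'\left[(\sigma^2 + \gamma^2)I_n - \frac{\gamma^2}{N}J_n\right]c_i = (\sigma^2 + \gamma^2)c_i'c_i - \frac{\gamma^2}{N}c_i'J_n c_i,$$

and $c_i'c_i = \frac{1}{n} + k^2\left(1 - \frac{1}{n}\right)$ while $c_i'J_n c_i = 1$, resulting in

$$c_i'(\sigma^2 I_n + \Gamma)c_i = (\sigma^2 + \gamma^2)\left[\frac{1}{n} + k^2\left(1 - \frac{1}{n}\right)\right] - \frac{\gamma^2}{N}.$$

Also, $Z_i\Gamma c_i = \gamma^2\left[\frac{1}{n} + k\left(1 - \frac{1}{n}\right)\right] - \frac{\gamma^2}{N}$ and $Z_i\Gamma Z_i' = \gamma^2\left(1 - \frac{1}{N}\right)$. Combining these terms and simplifying (using $(\sigma^2 + \gamma^2)k = \gamma^2$),

$$\operatorname{var}_{pR}(\hat{P}_{Ii} - P_{Ii}) = \sigma^2\left[\frac{1}{n} + k\left(1 - \frac{1}{n}\right)\right].$$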

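Appendix F. The MSE of the FPMM-BLUP for a Sample Set

The FPMM-BLUP of a sample subject's latent value, $P_{Ii} = X_i\mu + Z_i b$ where $X_i = 1$ and $Z_i = e_i'$, is $\hat{P}_{Ii} = \bar{Y} + k(Y_{Ii} - \bar{Y})$. The predictor is developed over all possible sample sequences identified by $u_h = \delta_h' v_m$, where a realized sample sequence is the realization of $I = u_h$. We evaluate $\operatorname{var}_{pR}(\hat{P}_{Ii} - P_{Ii} \mid I)$. Since $Y_I = v_m'\delta_h Y$ and $P_I = v_m'\delta_h y$ when $I = u_h$,

$$(\hat{P}_{Ii} - P_{Ii} \mid I) = c_m'\delta_h Y - g_m'\delta_h y,$$

where we define $c_m = v_m c_i$ and $g_m = v_m e_i$. As a result,

$$\operatorname{var}_{pR}(\hat{P}_{Ii} - P_{Ii} \mid I) = c_m'\delta_h \operatorname{var}_R(Y)\,\delta_h' c_m.$$

Now $\delta_h \operatorname{var}_R(Y)\,\delta_h' = \bigoplus_{j=1}^{n}\sigma_j^2$ and $c_m' = \frac{1}{n}1_n' + k e_i'v_m'\left(I_n - \frac{1}{n}J_n\right)$. Notice that $e_i'v_m' = e_j'$, where $j$ identifies the subject in position $i$, so that we can express $c_m' = \frac{1}{n}1_n' + k e_j'\left(I_n - \frac{1}{n}J_n\right)$. As a result, defining $\sigma_h^2 = \frac{1}{n}\sum_{j=1}^{n}\sigma_j^2$,

$$\operatorname{var}_{pR}(\hat{P}_{Ii} - P_{Ii} \mid I) = c_m'\left(\bigoplus_{j=1}^{n}\sigma_j^2\right)c_m = \frac{\sigma_h^2}{n} + \frac{2k}{n}\left(\sigma_j^2 - \sigma_h^2\right) + k^2\left(\sigma_j^2 - \frac{2}{n}\sigma_j^2 + \frac{\sigma_h^2}{n}\right)$$

$$= k\left[k + \frac{2(1 - k)}{n}\right]\sigma_j^2 + \frac{(1 - k)^2}{n}\sigma_h^2.$$

Also, setting $\mu_h = \frac{1}{n}\sum_{j=1}^{n} y_j$,

$$E_R(\hat{P}_{Ii} - P_{Ii} \mid I) = (c_m - g_m)'\delta_h E_R(Y) = (c_m - g_m)'\delta_h y = -(1 - k)(y_j - \mu_h),$$

and the MSE is given by

$$\operatorname{MSE}_{pR}(\hat{P}_{Ii} - P_{Ii} \mid I) = k\left[k + \frac{2(1 - k)}{n}\right]\sigma_j^2 + \frac{(1 - k)^2}{n}\sigma_h^2 + (1 - k)^2(y_j - \mu_h)^2.$$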
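The conditional MSE expression can be checked numerically by averaging squared errors directly. The sketch below is illustrative only. It assumes the Table 1 population, a two-point response error of $\pm\sigma_s$ for every subject (the pattern shown for Daisy and Rose in Tables 2 and 3, extended here to Lily as an assumption), and helper names that do not appear in the manuscript.

```python
from itertools import combinations, product
from math import sqrt

# Table 1 population: subject -> (latent value y_s, response error variance sigma_s^2)
population = {"Daisy": (10.0, 1.0), "Lily": (3.0, 100.0), "Rose": (2.0, 4.0)}
N, n = len(population), 2

mu = sum(y for y, _ in population.values()) / N
gamma2 = sum((y - mu) ** 2 for y, _ in population.values()) / (N - 1)
sigma2 = sum(v for _, v in population.values()) / N
k = gamma2 / (gamma2 + sigma2)


def closed_form_mse(subject, sample_set):
    """Three-term conditional MSE: two variance terms plus squared bias."""
    y_j, var_j = population[subject]
    mu_h = sum(population[s][0] for s in sample_set) / n
    var_h = sum(population[s][1] for s in sample_set) / n
    return (k * (k + 2 * (1 - k) / n) * var_j
            + (1 - k) ** 2 * var_h / n
            + (1 - k) ** 2 * (y_j - mu_h) ** 2)


def brute_force_mse(subject, sample_set):
    """Average squared error over all 2^n equally likely error patterns.

    Assumption: each response error equals +sd or -sd with probability 1/2.
    """
    patterns = list(product((1.0, -1.0), repeat=n))
    total = 0.0
    for signs in patterns:
        Y = {s: population[s][0] + sign * sqrt(population[s][1])
             for s, sign in zip(sample_set, signs)}
        y_bar = sum(Y.values()) / n
        predictor = y_bar + k * (Y[subject] - y_bar)   # FPMM-BLUP
        total += (predictor - population[subject][0]) ** 2
    return total / len(patterns)


checks = {(subject, sample_set): (closed_form_mse(subject, sample_set),
                                  brute_force_mse(subject, sample_set))
          for sample_set in combinations(population, n)
          for subject in sample_set}
for key, (closed, brute) in checks.items():
    print(key, round(closed, 4), round(brute, 4))
```

Because the MSE depends only on the first two moments of the response errors, the two calculations should agree for any error distribution with mean zero and the stated variances; the two-point distribution is used only to keep the enumeration finite.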

References

Brown, E.M., and Kass, R.E. (2009). What is statistics? (with discussion). The American Statistician, 63: 105-123.

Cassel, C.M., Särndal, C.E. and Wretman, J.H. (1977). Foundations of Inference in Survey Sampling. New York, NY: John Wiley.

Godambe, V.P. (1955). A unified theory of sampling from finite populations. Journal of the Royal Statistical Society B, 17: 269-278.

Godambe, V.P. and Joshi, V.M. (1965). Admissibility and Bayes estimation in sampling from finite populations. I. The Annals of Mathematical Statistics, 36: 1707-1722.

Henderson, C.R. (1975). Best linear unbiased estimation and prediction under a selection model. Biometrics, 31: 423-447.

Henderson, C.R. (1984). Applications of Linear Models in Animal Breeding. University of Guelph, Guelph, Canada (ISBN 0-88955-030-1).

Koch, G.G. (1967). A procedure to estimate the population mean in random effects models. Technometrics, 9: 577-585.

Koch, G.G. (1973). An alternative approach to multivariate response error models for sample survey data with applications to estimators involving subclass means. Journal of the American Statistical Association, 68: 906-913.

Koch, G.G., Gillings, D.B., and Stokes, M.E. (1980). Biostatistical implications of design, sampling, and measurement to health science data analysis. Annual Review of Public Health, 1: 163-225.

Merriam, P.A., Ockene, I.S., Hebert, J.R., Milagros, C.R., and Matthews, C.E. (1999). Seasonal variation of blood cholesterol levels: study methodology. Journal of Biological Rhythms, 14(4): 330-339.

Nelder, J.A. (1977). A reformulation of linear models (with discussion). Journal of the Royal Statistical Society A, 140: 48-76.

Ockene, I.S., Chiriboga, D.E., Stanek, E.J. III, Harmatz, M.G., Nicolosi, R., Saperia, G., Well, A.D., Merriam, P.A., Reed, G., Ma, Y., Matthews, C.E. and Hebert, J.R. (2004). Seasonal variation in serum cholesterol: Treatment implications and possible mechanisms. Archives of Internal Medicine, 164: 863-870.

Robinson, G.K. (1991). That BLUP is a good thing: the estimation of random effects. Statistical Science, 6: 15-51.

Stanek, E.J. III and Singer, J.M. (2004). Predicting Random Effects from Finite Population Clustered Samples with Response Error. Journal of the American Statistical Association, 99: 1119-1130.

Voss, D.T. (1999). Resolving the Mixed Models Controversy. The American Statistician, 53: 352-356.

Table 1. Population Values and Parameters for Simple Example

Subject's Name    Subject s    Latent Value y_s    Response Variance σ_s²
Daisy             1            10                  1
Lily              2            3                   100
Rose              3            2                   4

Source: c09ed33.xls

Table 2. Potentially Realizable Responses for the Set of Subjects of s = 1 and s = 3 Assuming a Mixed Model

Potentially Realized                   Response       Mixed Model         Response     MM Response
Response t            Subject          Variance σ_j²  Latent Value P_j    Error E_j    Y_j
1                     Daisy (j = 1)    1              10                   1           11
                      Rose (j = 2)     4              2                    2           4
2                     Daisy (j = 1)    1              10                  -1           9
                      Rose (j = 2)     4              2                    2           4
3                     Daisy (j = 1)    1              10                   1           11
                      Rose (j = 2)     4              2                   -2           0
4                     Daisy (j = 1)    1              10                  -1           9
                      Rose (j = 2)     4              2                   -2           0

Source: c09ed33.xls

Table 3. Additional (Non-realizable) Responses for the Set of Subjects of s = 1 and s = 3 Assuming a Mixed Model

Nonrealizable                          Response       Mixed Model         Response     MM Response
Response t            Subject          Variance σ_j²  Latent Value P_j    Error E_j    Y_j
5                     Daisy (j = 1)    1              2                    1           3
                      Rose (j = 2)     4              10                   2           12
6                     Daisy (j = 1)    1              2                   -1           1
                      Rose (j = 2)     4              10                   2           12
7                     Daisy (j = 1)    1              2                    1           3
                      Rose (j = 2)     4              10                  -2           8
8                     Daisy (j = 1)    1              2                   -1           1
                      Rose (j = 2)     4              10                  -2           8

Source: c09ed33.xls

Table 4. Predictors of MM-Latent Values, Difference from P, and MSE for the Set of Subjects s = 1 and s = 3.

Columns for each subject: mixed model response Y_j; predictor P̂_j; mixed model latent value P_j; difference P̂_j - P_j; MSE (P̂_j - P_j)².

           Daisy (j = 1)                               Rose (j = 2)
t          Y_j    P̂_j     P_j    Diff     MSE          Y_j    P̂_j     P_j    Diff     MSE
1          11     10.90   10      0.90    0.81         4      4.41    2       2.41    5.79
2          9      8.93    10     -1.07    1.15         4      4.29    2       2.29    5.24
3          11     10.84   10      0.84    0.71         0      0.64    2      -1.36    1.86
4          9      8.87    10     -1.13    1.28         0      0.52    2      -1.48    2.19
5          3      3.13    2       1.13    1.28         12     11.48   10      1.48    2.19
6          1      1.16    2      -0.84    0.71         12     11.36   10      1.36    1.86
7          3      3.07    2       1.07    1.15         8      7.71    10     -2.29    5.24
8          1      1.10    2      -0.90    0.81         8      7.59    10     -2.41    5.79
Average    6      6       6       0       0.986        6      6       6       0       3.768

Source: c09ed33.xls

Table 5. Predictors of Subject's Latent Values, Difference from y, and MSE for the Set of Subjects s = 1 and s = 3.

Columns for each subject: mixed model response Y_j; predictor P̂_j; subject's latent value y_j; difference P̂_j - y_j; MSE (P̂_j - y_j)².

           Daisy (j = 1)                               Rose (j = 2)
t          Y_j    P̂_j     y_j    Diff     MSE          Y_j    P̂_j     y_j    Diff     MSE
1          11     10.90   10      0.90    0.81         4      4.41    2       2.41    5.79
2          9      8.93    10     -1.07    1.15         4      4.29    2       2.29    5.24
3          11     10.84   10      0.84    0.71         0      0.64    2      -1.36    1.86
4          9      8.87    10     -1.13    1.28         0      0.52    2      -1.48    2.19
5          3      3.13    10     -6.87    47.19        12     11.48   2       9.48    89.84
6          1      1.16    10     -8.84    78.16        12     11.36   2       9.36    87.65
7          3      3.07    10     -6.93    47.99        8      7.71    2       5.71    32.61
8          1      1.10    10     -8.90    79.18        8      7.59    2       5.59    31.30
Average    6      6       10     -4.0     32.058       6      6       2       4.000   32.058

Source: c09ed33.xls
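The entries of Tables 4 and 5 can be regenerated from the MM-BLUP of Appendix A, $\hat{P}_j = \hat{\mu} + k_j(Y_j - \hat{\mu})$. The following sketch assumes that $\gamma^2$ is computed from the two latent values in the set with divisor n - 1 (which gives 32 here) and that the eight sample points are those listed in Tables 2 and 3. Variable names are illustrative.

```python
from itertools import permutations, product

latent = (10.0, 2.0)       # y_s for Daisy and Rose
error_var = (1.0, 4.0)     # sigma_s^2 for Daisy and Rose
n = len(latent)

set_mean = sum(latent) / n
gamma2 = sum((y - set_mean) ** 2 for y in latent) / (n - 1)   # assumed divisor n - 1
k = [gamma2 / (gamma2 + v) for v in error_var]
w = [kj / sum(k) for kj in k]                                  # WLS weights


def mm_blup(Y):
    """MM-BLUP for each subject: shrink the response toward the WLS mean."""
    mu_hat = sum(wj * Yj for wj, Yj in zip(w, Y))
    return [mu_hat + kj * (Yj - mu_hat) for kj, Yj in zip(k, Y)]


# Enumerate the sample space: latent value assignments crossed with +/- sd errors.
rows = []
for P in permutations(latent):
    for signs in product((1.0, -1.0), repeat=n):
        Y = [p + s * v ** 0.5 for p, s, v in zip(P, signs, error_var)]
        rows.append((P, Y, mm_blup(Y)))


def average(values):
    values = list(values)
    return sum(values) / len(values)


# MSE relative to the MM-latent value (Table 4) and to the true latent value (Table 5).
mse_vs_mm_latent = [average((pred[j] - P[j]) ** 2 for P, _, pred in rows) for j in range(n)]
mse_vs_true_latent = [average((pred[j] - latent[j]) ** 2 for _, _, pred in rows) for j in range(n)]
print([round(p, 2) for p in mm_blup([11.0, 4.0])])
print([round(v, 3) for v in mse_vs_mm_latent], [round(v, 3) for v in mse_vs_true_latent])
```

Under these assumptions the first row of Table 4 and the average MSE rows of both tables are reproduced, and the Table 4 averages also agree with the closed form $\sigma_j^2 k_j + \gamma^2(1 - k_j)^2 / (n\bar{k})$ from Appendix B.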

Table 6. Estimators of the Mean Latent Value P̄ = ȳ from a Response Error Model and MM, the Difference from the Mean, and the MSE for the Set of Subjects s = 1 and s = 3.

Columns: (a) mean response Ȳ; (b) mean MM response Ȳ; (c) WLS μ̂; (d) true average latent value P̄; differences (e) Ȳ - ȳ for the mean response, (f) Ȳ - P̄ for the mean MM response, (g) μ̂ - P̄ for WLS; MSE (h) (Ȳ - ȳ)² for the mean response, (i) (Ȳ - P̄)² for the mean MM response, (j) (μ̂ - P̄)² for WLS.

t          (a)     (b)     (c)     (d)     (e)      (f)      (g)      (h)      (i)      (j)
1          7.50    7.50    7.65    6.00     1.50     1.50     1.65    2.25     2.25     2.73
2          6.50    6.50    6.61    6.00     0.50     0.50     0.61    0.25     0.25     0.37
3          5.50    5.50    5.74    6.00    -0.50    -0.50    -0.26    0.25     0.25     0.07
4          4.50    4.50    4.70    6.00    -1.50    -1.50    -1.30    2.25     2.25     1.70
5                  7.50    7.30    6.00              1.50     1.30             2.25     1.70
6                  6.50    6.26    6.00              0.50     0.26             0.25     0.07
7                  5.50    5.39    6.00             -0.50    -0.61             0.25     0.37
8                  4.50    4.35    6.00             -1.50    -1.65             2.25     2.73
Average    6       6       6       6        0        0        0       1.250    1.250    1.217

Source: c09ed33.xls
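The comparison in Table 6 between the simple mean and the WLS estimator can be reproduced for the four potentially realizable responses. This is a minimal sketch, assuming the same set (Daisy and Rose), $\gamma^2 = 32$ computed from the set with divisor n - 1, and two-point response errors of $\pm\sigma_s$. The names are illustrative.

```python
from itertools import product

latent = (10.0, 2.0)     # y_s for Daisy and Rose
error_sd = (1.0, 2.0)    # square roots of the response error variances 1 and 4
n = len(latent)

true_mean = sum(latent) / n
gamma2 = sum((y - true_mean) ** 2 for y in latent) / (n - 1)   # assumed divisor n - 1
k = [gamma2 / (gamma2 + sd ** 2) for sd in error_sd]
w = [kj / sum(k) for kj in k]                                   # WLS weights

results = []
for signs in product((1.0, -1.0), repeat=n):                    # realizable responses only
    Y = [y + s * sd for y, s, sd in zip(latent, signs, error_sd)]
    simple_mean = sum(Y) / n
    wls_mean = sum(wj * Yj for wj, Yj in zip(w, Y))
    results.append((Y, simple_mean, wls_mean))

mse_simple = sum((m - true_mean) ** 2 for _, m, _ in results) / len(results)
mse_wls = sum((m - true_mean) ** 2 for _, _, m in results) / len(results)
print(round(mse_simple, 3), round(mse_wls, 3))
```

The two averages correspond to the 1.250 and 1.217 in the last row of Table 6. The WLS estimator gives more weight to Daisy, whose response error variance is smaller.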