Sampling, WLS, and Mixed Models Festschrift to Honor Professor Gary Koch


Sampling, WLS, and Mixed Models
Festschrift to Honor Professor Gary Koch

Edward J. Stanek
Department of Public Health, University of Massachusetts, Amherst, MA
40 Arnold House, 75 N. Pleasant Street, University of Massachusetts, Amherst MA 000
43-545-38

and

Julio M. Singer
Departamento de Estatística, Universidade de São Paulo, Brazil
juliosinger@gmail.com

Running Title: Sampling, WLS, and Mixed Models

KEYWORDS: best linear unbiased predictors, latent values, prediction, shrinkage, superpopulation, design-based inference

C09ed3v6v.doc /8/009 5:5 PM

Abstract

Mixed models may be defined with or without reference to sampling or sampling random variables, and can be used to predict realized random effects, as for example when estimating the latent values of study subjects measured with response error. When the model is specified without reference to sampling, a simple mixed model includes two random variables, with one stemming from an exchangeable distribution of latent values of study subjects and the other from the study subjects' response error distributions. Positive probabilities are assigned to both potentially realizable responses and artificial responses that are not potentially realizable, resulting in artificial latent values. In contrast, finite population mixed models may be defined to represent the two-stage process of sampling subjects and measuring their responses, where positive probabilities are only assigned to potentially realizable responses. A comparison of the estimators over the same potentially realizable responses indicates that the optimal linear mixed model estimator (the usual best linear unbiased predictor, BLUP) is often (but not always) more accurate than the comparable finite population mixed model estimator (the FPMM BLUP). The example provides the basis for a broader discussion of the role of conditioning, sampling, and model assumptions in developing inference.

Introduction

Advances in public health and health science are tied to understanding practical implications of changes in policy, programs, underlying causes of diseases, prevention, and/or treatment (Koch et al. (1980)). Understanding the impact of such changes is the focus of much of Biostatistics. Not only does Biostatistics embrace the theoretical underpinnings of statistical modeling, but it seeks to tie the results of studies to reality. It is this struggle that has been the focus of much of Koch's work, and continues to be of compelling interest. For this reason, sampling plays an important role in many applications, since estimates are needed for real populations. This is not a simple process, because it involves reconciling seemingly ad hoc approaches, such as in Koch's (1967) procedure to estimate the population mean, with the fundamental basis of inference from survey sampling, when extended, for example, to response error (Koch, 1973), and to model based approaches.

We consider a simple setting that we feel challenges the depths of understanding statistical inference, namely, estimating the latent value of some characteristic (e.g., cholesterol) for a subject. A practical example is the Seasons Study (Merriam et al. (1999), Ockene et al. (2004)), where three 24-hour recall dietary interviews were collected on each study subject in each season of a year to evaluate seasonal cholesterol changes, controlling for the contribution of saturated fat intake. The 24-hour recalls were used to estimate the average saturated fat intake for each subject in the six weeks prior to cholesterol measure (the latent value). Average saturated fat intake, and the estimated standard deviation for 554 study subjects are displayed in Figure 1 and reveal that both the latent value and the variance in saturated fat intake are likely to vary among subjects.

Rather than using the simple average saturated fat intake for the season to estimate a subject's latent saturated fat intake (which is the best linear unbiased estimator from a response error model), a more accurate estimator is the BLUP from a mixed model (MM), obtained by replacing the subject effect by a random effect. Although the MM-BLUP is commonly used to estimate a subject's latent value, a close examination reveals that a portion of the MM sample space is artificial and not potentially realizable. This motivates a re-examination of the associated sample space and the criteria used to evaluate its performance. It also provides a context for comparison with the finite population mixed model (FPMM) and the FPMM-BLUP considered by Stanek and Singer (2004), where all sample points are potentially realizable.

We discuss these issues in the context of a simple problem. First, we develop a MM for a set of subjects whose responses follow a simple response error model by replacing the subject effect by a random effect. We use this setup to develop the MM-BLUP. Via a simple example, we describe the sample space for the MM, and distinguish the (artificial) MM-latent values from (potentially realizable) subject latent values, and MM-responses from potentially realizable responses. We contrast this development with a FPMM that is based on a simple random sample of a population, including a response error model for sample subjects' responses, and the corresponding FPMM-BLUP. We conclude with a discussion of the connections between the issues raised in the example and some broader ideas.

Estimating Latent Values in the Mixed Model

Many frameworks can be used to develop a best linear unbiased predictor as discussed by Robinson (1991). Although with appropriate assumptions, the expression for the BLUP may be identical when motivated from different frameworks, the differences between them are important in understanding the connection between the physical problem, the stochastic model, and the solution. These connections are important since they facilitate the interpretation of the results. The MM framework we use begins with an additive response error model for each subject, and assumes exchangeability of the corresponding latent values. The study subjects constitute a set that may or may not have been obtained as a result of a probability sample from a population. In the Seasons Study, for example, the study subjects constitute a subset of members of the Fallon Health Maintenance Organization (HMO) who volunteered to participate, and not a probability sample of Fallon HMO members.

We start with a set of $n$ subjects, labeled $j = 1, \ldots, n$, and assume that repeated responses, $Y_{jk}$, $k = 1, \ldots, r_j$, are associated with subject $j$. The data for the set correspond to the pairs $(j, (Y_{j1}, Y_{j2}, \ldots, Y_{jr_j}))$, $j = 1, \ldots, n$. We assume that the responses associated with subject $j$ are independent and correspond to identically distributed random variables $Y_{jk}$, $k = 1, \ldots, r_j$, and define $E_R(Y_{jk}) = y_j$ as the latent value for subject $j$; we also denote the corresponding response error variance by $\operatorname{var}_R(Y_{jk}) = \sigma_j^2$. The subscript $R$ indicates expectation with respect to the distribution of the response error. For simplicity, we consider a single measure for subject $j$ and drop the subscript $k$ so that the response error model may be written as

$$Y_j = y_j + E_j. \qquad (1)$$

When $r_j > 1$, $Y_j$ and $E_j$ correspond to the average response and average response error, respectively, and $\sigma_j^2$ represents the variance of these averages, which we assume known. The latent values, $y_j$, are the parameters of interest. Without additional assumptions, the response for subject $j$, namely, $Y_j$, is the best estimator of the subject's latent value.

We construct a MM by adding to the response error model the assumption that the latent values for the subjects are a realization of $P = (P_1 \; P_2 \; \cdots \; P_n)'$, a vector of exchangeable random variables whose possible equally likely values include (but are not restricted to) the latent values of the subjects in the set. Writing $N^{*}$ for the number of latent values underlying the random variables in $P$, these values could be solely the $N^{*} = n$ latent values of the subjects in the set, the $N^{*} = N$ latent values of subjects in a population from which the subjects were sampled, or some other set of values, inclusive of the latent values of the subjects in the set, which we may refer to collectively as a superpopulation. Let us define $E_\xi(P_j) = \mu$ and $E_\xi[(P_j - \mu)^2] = \frac{N^{*}-1}{N^{*}}\gamma^2$, where the subscript $\xi$ indicates expectation with respect to the distribution of latent values. The term $\mu$ is the mean latent value of the subjects in the set when $N^{*} = n$, the mean latent value of subjects in the population when $N^{*} = N$, or the mean latent value of subjects in the superpopulation. When $N^{*} = n$, $\gamma^2 = \frac{1}{n-1}\sum_{j=1}^{n}(y_j - \mu)^2$, while when $N^{*} = N$, $\gamma^2 = \frac{1}{N-1}\sum_{s=1}^{N}(y_s - \mu)^2$, where $s = 1, \ldots, N$ label subjects in a finite population.

A realization of $P_j$ is a MM-latent value which we re-parameterize as $P_j = \mu + a_j$, $j = 1, \ldots, n$, where $E_\xi(a_j) = 0$, and $E_\xi(a_j a_{j^*}) = \left(1 - \frac{1}{N^{*}}\right)\gamma^2$ when $j = j^*$, or $E_\xi(a_j a_{j^*}) = -\frac{\gamma^2}{N^{*}}$ otherwise. One possible realization of $P$ is $y = (y_1 \; y_2 \; \cdots \; y_n)'$. The MM is given by

$$Y_j = \mu + a_j + E_j \qquad (2)$$

or in matrix form, by

$$Y = X\mu + Za + E$$

where $Y = (Y_1 \; Y_2 \; \cdots \; Y_n)'$, $X = 1_n$, an $n \times 1$ column vector with all elements equal to 1, $Z = I_n$, an $n \times n$ identity matrix, $a = (a_1 \; a_2 \; \cdots \; a_n)'$, the vector of random effects, and $E = (E_1 \; E_2 \; \cdots \; E_n)'$, a vector of response errors. In the MM, $E_{\xi R}(Y) = X\mu$, while $\operatorname{var}_{\xi R}(Y) = \Omega$, where $\Omega = \Gamma + \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2)$, $\Gamma = \gamma^2\left(I_n - \frac{1}{N^{*}} J_n\right)$, with $J_n = 1_n 1_n'$, and $\operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2)$ denotes an $n \times n$ matrix with diagonal elements $\sigma_j^2$ and off-diagonal elements equal to zero. Every realization of $Y_j$ in (1) corresponds to an identical realization of $Y_j$ in (2), but not vice-versa.

Let the target correspond to $T = g'P$. When $g = e_j$, an $n \times 1$ vector whose elements are all equal to zero except the element in row $j$ that is equal to one, the target is $P_j$. The corresponding BLUP is a linear function of $Y$ that is unbiased and has minimum expected mean squared error (MSE). We may show (see Appendix A for details) that the BLUP of $P_j$ is

$$\hat{P}_j = \hat{\mu} + k_j (Y_j - \hat{\mu}) \qquad (3)$$

where $\hat{\mu}$ is the weighted least squares (WLS) estimate of the mean, i.e., $\hat{\mu} = \sum_{j=1}^{n} w_j Y_j$, with

$$w_j = \frac{1/(\gamma^2 + \sigma_j^2)}{\sum_{j^*=1}^{n} 1/(\gamma^2 + \sigma_{j^*}^2)}, \qquad k_j = \frac{\gamma^2}{\gamma^2 + \sigma_j^2}.$$

The expected MSE of the predictor is given by

$$\operatorname{var}_{\xi R}(\hat{P}_j - P_j) = \sigma_j^2 k_j + (1 - k_j)^2 \frac{\gamma^2}{n \bar{k}}, \qquad \text{where } \bar{k} = \frac{1}{n}\sum_{j=1}^{n} k_j.$$

It is smaller than $\sigma_j^2$, the expected MSE attained when we use the subject's response as an estimate of the subject's latent value (see Appendix B). With this understanding, $\hat{P}_j$ may be considered a better estimator of $y_j$ than the observed response, $Y_j$. Assuming that all realizations of $P_j$ are distinct, notice that the latent value of subject $j$, given by $y_j$, is the only realization of $T$ that in reality can occur.

The mixed model given by (2) is defined for a set. It is not necessary for the set to have been selected via probability sampling from a larger population, and in fact, the set could include all subjects in a population. An alternative mixed model considers the data to be the realized response of a random sample of subjects from a (finite) population (Stanek and Singer 2004). We refer to this model as the finite population mixed model (FPMM), and note that the BLUP given by (3) is not the same as the FPMM-BLUP. Before considering these differences, we discuss a simple enumerative example for the MM.

Examples

The data correspond to $n = 2$ subjects: Daisy, labeled $j = 1$, and Rose, labeled $j = 2$, summarized in Table 1.

- insert Table 1 here -

Note that the response error variance differs between subjects. We assume that for each subject, $j = 1, 2$, response error can take on two equally likely values corresponding to $-\sigma_j$ or $+\sigma_j$. Under the response error model (1), each response (corresponding to a pair of values for the sample set) is equally likely with probability 1/4. With these assumptions, we display in Table 2 the potentially realizable responses corresponding to the four combinations of response error.

- insert Table 2 here -

The potentially realizable responses in Table 2 coincide with the responses (which we index by $t$) for the MM when the realization of $P$ (the MM-latent values) is $y$. Since the latent values are assumed to be exchangeable and $N^{*} = n$, there are two possible realizations of $P$. Responses for the other realization of the MM-latent values are listed in Table 3.
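The MM sample space for the two-subject example is small enough to enumerate directly. The sketch below is only an illustration: the tables are not reproduced in this transcription, so the latent values (10 and 2) and response error variances (1 and 4) for Daisy and Rose are assumptions inferred from the summaries reported in the text, and the function and variable names are hypothetical.

```python
from itertools import permutations, product

# Assumed example values (the source tables are missing): latent values
# y = (10, 2) and response error variances (1, 4) for Daisy and Rose.
y = [10.0, 2.0]
sigma2 = [1.0, 4.0]
n = len(y)

mu = sum(y) / n
gamma2 = sum((v - mu) ** 2 for v in y) / (n - 1)   # between-subject variance

k = [gamma2 / (gamma2 + s2) for s2 in sigma2]       # shrinkage constants k_j
inv = [1.0 / (gamma2 + s2) for s2 in sigma2]
w = [v / sum(inv) for v in inv]                     # WLS weights w_j


def blup(resp):
    """MM-BLUP of each P_j from equation (3) for one response vector."""
    mu_hat = sum(wj * yj for wj, yj in zip(w, resp))
    return [mu_hat + kj * (yj - mu_hat) for kj, yj in zip(k, resp)]


# Enumerate the MM sample space: every ordering of the latent values
# combined with every -sigma_j / +sigma_j pattern of response error.
points = []
for p in permutations(y):
    for signs in product((-1.0, 1.0), repeat=n):
        resp = [pj + s * s2 ** 0.5 for pj, s, s2 in zip(p, signs, sigma2)]
        points.append((p, resp))

diffs = [[ph - pj for ph, pj in zip(blup(resp), p)] for p, resp in points]
mean_diff = [sum(d[j] for d in diffs) / len(diffs) for j in range(n)]
mse = [sum(d[j] ** 2 for d in diffs) / len(diffs) for j in range(n)]

print(len(points))                       # number of MM sample points
print([round(v, 3) for v in mean_diff])  # average difference per subject
print([round(v, 3) for v in mse])        # average squared difference per subject
```

The first four points use the ordering `(10, 2)` and are the potentially realizable responses; the remaining four are the artificial ones. Under these assumed inputs the average squared differences round to 0.986 and 3.768.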

- insert Table 3 here -

The responses for the MM listed in Tables 2 and 3 correspond to the equally likely realizations, $Y_t$, $t = 1, \ldots, 8$, of $Y$, each occurring with probability 1/8. The corresponding realizations, $P_t$, $t = 1, \ldots, 8$, of $P$ are the realized MM-latent values. When $t = 1, \ldots, 4$ (as in Table 2), the realizations of $P$ and $Y$ correspond to realizations of $y$ and $Y$ in (1), respectively. For such data, Daisy's realized MM-latent value is 10. The realizations of $P$ and $Y$ are artificial when $t = 5, \ldots, 8$ (as in Table 3). For these values of $t$, Daisy's realized MM-latent value is 2.

The BLUP of $P_j$ under the MM given by (3) for each realized MM-response is given in Table 4, where the first panel corresponds to Daisy, and the second panel, to Rose. The differences, $\hat{P}_j - P_j$, and the corresponding squared differences are given in the last two columns of each panel. Notice that the average difference is zero, satisfying the unbiased constraint given by $E_{\xi R}(\hat{P}_j - P_j) = 0$. The average squared difference, or MSE, is 0.99 for Daisy and 3.77 for Rose. These values are smaller than those that would result from a best linear unbiased estimator (BLUE) using model (1), namely $\sigma_1^2 = 1$ for Daisy, and $\sigma_2^2 = 4$ for Rose.

- insert Table 4 here -

There are some problems with these results which can be illustrated by focusing on the MM-responses for Daisy (first panel of Table 4). Notice in Table 1, Daisy's latent value is 10; in Table 4, 2 is also listed as a latent value for Daisy. The MM-latent value of 2 for Daisy (corresponding to the MM-responses $t = 5, \ldots, 8$) exists only in the mixed model, not in reality. Such a latent value is artificial, and one could argue that it should not be given a positive probability in the analysis. This is not due to ambiguity over which subject is labeled $j = 1$, since this label only corresponds to Daisy in the model definition.

These results shed light on the interpretation of bias and on the definition of the MSE for the MM when the target is the subject's latent value. When $y_j$ is the target, the bias is determined by subtracting the subject's actual latent value from $\hat{P}_j$ in all settings as shown in Table 5. Using the subject's actual latent value, the BLUP given by (3) is biased for each subject, and its MSE is larger than the MSE of the BLUE based on model (1).

- insert Table 5 here -

In the MM, positive probability is given to MM-responses that are not potentially realizable. By averaging over these artificial responses in addition to the potentially realizable responses, the connection between the MM and reality is broken. This creates contradictions in the interpretation of results. For example, the latent value for Daisy ($j = 1$) is 10 for all potentially realizable responses, but the expected value of the corresponding MM-latent values is $E_{\xi R}(P_1) = 6$. To retain the interpretation that the realized MM-latent value is the latent value for the subject, only the MM sample points that are potentially observable (i.e., corresponding to $t = 1, \ldots, 4$) should be given positive probability.

Restricting evaluation of (3) to potentially observable responses provides some insight on the bias and the MSE. The conditional bias is given by

$$E_R(\hat{P}_j - P_j \mid P = y) = -(1 - k_j)(y_j - \mu_w), \qquad \mu_w = \sum_{j=1}^{n} w_j y_j$$

(see Appendix C). In the example, the conditional bias for Daisy is -0.12, while the conditional bias for Rose is 0.46. The average conditional bias over the subjects is not equal to zero, although the limit of the bias with increasing numbers of measures of response is zero, since $\lim_{r_j \to \infty}(1 - k_j) = 0$. Using a similar definition for the MSE (see Appendix C), i.e.,

$$E_R\big((\hat{P}_j - P_j)^2 \mid P = y\big) = (1 - k_j)^2 \left[\sum_{j^*=1}^{n} w_{j^*}^2 \sigma_{j^*}^2 + (y_j - \mu_w)^2\right] + k_j \sigma_j^2 \left[k_j + 2(1 - k_j) w_j\right], \qquad (4)$$

it follows that the MSE for Daisy is 0.986 and 3.768 for Rose, both smaller than the MSE of the simple responses, $Y_j$, $j = 1, 2$.

Estimating the Mean Latent Value

Our development has focused on estimating the MM-latent value for a subject. We can use similar methods to obtain an estimate of $T = g'P$ where $g = \frac{1}{n} 1_n$, a target that corresponds to the average MM-latent value, $\bar{P}$. The corresponding BLUE is the weighted least squares (WLS) estimator given by $\hat{\mu}$. When $N^{*}$, the number of latent values underlying $P$, exceeds $N$, $E_\xi(T)$ is the mean of the MM-latent values in the superpopulation, while when $N^{*} = N$, $E_\xi(T)$ is the mean latent value in the population. When $N^{*} = n$, $\bar{P}$ is equal to $\bar{y} = \frac{1}{n}\sum_{j=1}^{n} y_j$, the mean of the latent values in the response error model (1). The BLUE of $\bar{y}$ in (1) is the mean response, $\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j$. Since $\bar{P} = \bar{y}$, it is tempting to compare the BLUE obtained under model (1) with the BLUP obtained under model (2) when $N^{*} = n$, as illustrated in Table 6.

- insert Table 6 here -
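The conditional bias and the conditional MSE in (4) for a subject's latent value can be checked by averaging over the four potentially realizable responses only. This is a sketch under the same assumed inputs as before (latent values 10 and 2, response error variances 1 and 4, inferred from the reported summaries rather than taken from the missing tables); the closed forms coded here are the reconstructed expressions, so agreement with the enumeration is a consistency check, not a proof.

```python
from itertools import product

y = [10.0, 2.0]        # assumed latent values (Daisy, Rose)
sigma2 = [1.0, 4.0]    # assumed response error variances
n = len(y)
mu = sum(y) / n
gamma2 = sum((v - mu) ** 2 for v in y) / (n - 1)
k = [gamma2 / (gamma2 + s2) for s2 in sigma2]
w = [kj / sum(k) for kj in k]          # identical to the WLS weights w_j
mu_w = sum(wj * yj for wj, yj in zip(w, y))

# Closed forms, conditional on P = y
bias = [-(1 - kj) * (yj - mu_w) for kj, yj in zip(k, y)]
sw2 = sum(wj ** 2 * s2 for wj, s2 in zip(w, sigma2))
mse = [(1 - kj) ** 2 * (sw2 + (yj - mu_w) ** 2)
       + kj * s2 * (kj + 2 * (1 - kj) * wj)
       for kj, yj, wj, s2 in zip(k, y, w, sigma2)]

# Direct enumeration over the potentially realizable responses only
errs = []
for signs in product((-1.0, 1.0), repeat=n):
    resp = [yj + s * s2 ** 0.5 for yj, s, s2 in zip(y, signs, sigma2)]
    mu_hat = sum(wj * r for wj, r in zip(w, resp))
    errs.append([mu_hat + kj * (r - mu_hat) - yj
                 for kj, r, yj in zip(k, resp, y)])
enum_bias = [sum(e[j] for e in errs) / len(errs) for j in range(n)]
enum_mse = [sum(e[j] ** 2 for e in errs) / len(errs) for j in range(n)]

print([round(b, 2) for b in bias])   # conditional bias per subject
print([round(m, 3) for m in mse])    # conditional MSE per subject
```

With these inputs the conditional biases round to -0.12 and 0.46 and the conditional MSEs to 0.986 and 3.768, and the enumeration reproduces the closed forms.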

Under model (1), there are no responses comparable to the MM-responses for $t = 5, \ldots, 8$. This is a consequence of the inclusion of artificial responses in the MM. The target, $\bar{P}$, is constant over all possible MM-responses. If we define an estimator similar to $\bar{Y}$ for the MM-responses as $\bar{Y} = \frac{1}{n}\sum_{j=1}^{n} Y_j$, then the MSE of $\bar{Y}$ and $\hat{\mu}$ can be evaluated over the same sample space. The MSE of $\hat{\mu}$, given by $E_{\xi R}(\hat{\mu} - \bar{P})^2 = \frac{1 - \bar{k}}{n \bar{k}}\gamma^2$ with $\bar{k} = \frac{1}{n}\sum_{j=1}^{n} k_j$, is smaller than the MSE of $\bar{Y}$, given by $E_{\xi R}(\bar{Y} - \bar{P})^2$, which equals the MSE of $\bar{Y}$ under model (1), i.e., $E_R(\bar{Y} - \bar{y})^2 = \frac{1}{n^2}\sum_{j=1}^{n}\sigma_j^2$, providing the usual justification for the use of $\hat{\mu}$ instead of $\bar{Y}$.

Evaluated over potentially realizable responses, i.e. those corresponding to $t = 1, \ldots, 4$, the bias of the WLS estimator is

$$E_{\xi R}(\hat{T} - T \mid P = y) = \frac{1}{n}\sum_{j=1}^{n} \frac{k_j - \bar{k}}{\bar{k}}\, y_j .$$

The unbiased property of the WLS estimate of the average latent value holds only when expectation is taken over all MM responses, including those artificial responses that are not potentially realizable. The MSE, evaluated only over potentially realizable responses, is

$$MSE_{\xi R}(\hat{T} - T \mid P = y) = \sum_{j=1}^{n} w_j^2 \sigma_j^2 + \left(\sum_{j=1}^{n} \left(w_j - \frac{1}{n}\right) y_j\right)^2 .$$

When $n = 2$, as in the example illustrated in Table 6, this expression simplifies to $E_{\xi R}(\hat{\mu} - \bar{P})^2 = \frac{1 - \bar{k}}{n \bar{k}}\gamma^2$. When $n > 2$, as illustrated next, the conditional MSE of the MM-BLUP is not equal to its unconditional MSE, and may be larger (or smaller) than the unconditional MSE.
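The relationship between the conditional and unconditional MSE of the WLS mean can be explored with a small helper. This is a hedged sketch: `mean_summary` is a hypothetical name, the two-subject inputs (10, 2 with variances 1, 4) and the three-subject inputs (adding a third subject with latent value 3 and variance 100) are assumptions inferred from the numerical summaries quoted in the text, and the unconditional formula is coded for the case where the latent values underlying $P$ are exactly those of the subjects in the set.

```python
def mean_summary(y, sigma2):
    """Bias and MSE of the WLS mean given P = y, plus unconditional benchmarks."""
    n = len(y)
    y_bar = sum(y) / n
    gamma2 = sum((v - y_bar) ** 2 for v in y) / (n - 1)
    k = [gamma2 / (gamma2 + s2) for s2 in sigma2]
    k_bar = sum(k) / n
    w = [kj / (n * k_bar) for kj in k]                  # WLS weights
    bias = sum(wj * yj for wj, yj in zip(w, y)) - y_bar
    conditional = sum(wj ** 2 * s2 for wj, s2 in zip(w, sigma2)) + bias ** 2
    unconditional = (1 - k_bar) / (n * k_bar) * gamma2
    simple_mean = sum(sigma2) / n ** 2                  # MSE of the mean response
    return {"bias": bias, "conditional": conditional,
            "unconditional": unconditional, "simple_mean": simple_mean}


two = mean_summary([10.0, 2.0], [1.0, 4.0])
three = mean_summary([10.0, 2.0, 3.0], [1.0, 4.0, 100.0])
for label, res in (("n = 2", two), ("n = 3", three)):
    print(label, {name: round(val, 3) for name, val in res.items()})
```

For the assumed two-subject inputs the conditional and unconditional values coincide, while for the assumed three-subject inputs they differ (about 2.667 versus 3.482), both below the 11.667 obtained for the simple mean response.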

A Slightly Larger Example

Although in the first example with $n = 2$ it was possible to enumerate all outcomes, some issues that occur more generally could not be revealed. We briefly discuss a second example where $N^{*} = n$ (the latent values underlying $P$ are exactly those of the subjects in the set) and the data correspond to $n = 3$ subjects, Daisy, Rose, and Lily, which we label $j = 1$, $j = 2$, and $j = 3$, respectively. We assume that response error for subject $j$ can take on two equally likely values corresponding to $-\sigma_j$ or $+\sigma_j$. With these assumptions, there are eight equally likely potentially realizable responses corresponding to the different combinations of response error (Table 7).

- insert Table 7 here -

The $t = 1, \ldots, 8$ potentially realized responses in Table 7 are possible responses for the MM when the realization of $P$ (the MM-latent values) is $y$. Since the latent values are assumed to be exchangeable and $N^{*} = n = 3$, there are six possible realizations of $P$. Replacing $y$ by each of these realizations gives rise to 40 artificial responses that are not realizable, but are included with positive probability in the MM. The predictor of $P_1$ given by (3) under the MM for Daisy is listed for $t = 1, \ldots, 8$ in Table 8, and for $t = 9, \ldots, 48$ in Table 9. We summarize the results for the MM-BLUP of each subject in Table 10.

- insert Tables 8-10 here -

Notice that when averaging over the potentially realizable responses ($t = 1, \ldots, 8$), the MM-latent value is the subject's latent value. The average squared difference between the MM-BLUP and the MM-latent value for the potentially realizable responses is larger than a similar average over the non-realizable responses for Daisy and Rose, but not for Lily. It is the overall average MSE (over $t = 1, \ldots, 48$) that is usually evaluated for the MM, even though such an average includes responses that are not potentially realizable.

Since $N^{*} = n$, $\bar{P}$ is a constant in this example given by $\bar{P} = 5$. It is of value to consider the MM-BLUP estimator of $\bar{P}$. Over the potentially realizable responses ($t = 1, \ldots, 8$), the average of $\hat{\mu}$ is 6.009, while over the non-observable responses, the average is 4.798. Although the simple average of $\hat{\mu}$ over all MM-responses is equal to $\bar{P}$, this unbiased result occurs only if the artificial responses are included. The average MSE for the potentially realizable responses ($t = 1, \ldots, 8$) is given by 2.667, while 3.645 is the average MSE for the artificial responses ($t = 9, \ldots, 48$). The average MSE (over all $t = 1, \ldots, 48$) given by 3.48 is larger than the average MSE for the potentially realizable responses, but smaller than the comparable average MSE under the response error model given by 11.667.

The Finite Population Mixed Model

We now consider the data to be the realized response of a simple random sample of $n$ subjects from a finite population, assuming a single response for each sample subject. We represent subjects, latent values, and response using similar notation as in model (1). We define the population as a set of $N$ labeled subjects, assigning the subscript $s$ as a label to subjects placed in alphabetical order by name, and represent the $N \times 1$ vectors of latent values and response errors by $y$ and $E$, respectively. With this definition, $\mu = \frac{1}{N}\sum_{s=1}^{N} y_s$ corresponds to the usual finite population mean, while $\frac{N-1}{N}\gamma^2$, where $\gamma^2 = \frac{1}{N-1}\sum_{s=1}^{N}(y_s - \mu)^2$, corresponds to the usual finite population variance. We define $Y$ as an $N \times 1$ response vector with elements $Y_s = y_s + E_s$, $s = 1, \ldots, N$, so that the response error model for the population is given by

$$Y = y + E. \qquad (5)$$

We define a sample as a sequence of $n$ subjects, and use $i = 1, \ldots, n$ to index the subjects in the sequence. We index the possible sequences of subjects by $h$, where $h = 1, \ldots, H$ and $H = \frac{N!}{(N-n)!}$. Let $y_{hi}$ denote the latent value for the subject in position $i$ in sequence $h$ and define the sample vector of latent values by $y_h = (y_{h1} \; y_{h2} \; \cdots \; y_{hn})'$. This general representation of a sample is similar to that proposed by Godambe (1955).

We define response for sequence $h$ by $Y_h = u_h' Y$ so that the element $Y_{hi}$ denotes response for the subject in position $i$ in sequence $h$, $Y_h = (Y_{h1} \; Y_{h2} \; \cdots \; Y_{hn})'$, and $u_h = (u_{h1} \; u_{h2} \; \cdots \; u_{hn})$ is an $N \times n$ matrix of constants with columns given by $u_{hi} = (u_{hi1} \; u_{hi2} \; \cdots \; u_{hiN})'$ for $i = 1, \ldots, n$. The element $u_{his}$ has a value of one if subject $s$ is in position $i$ in sequence $h$, and zero otherwise. For example, when $n = 2$ and $N = 3$, the data for sequence $h$ consisting of subject $s = 3$ followed by subject $s = 1$ is $\big((s = 3, Y_{h1}), (s = 1, Y_{h2})\big)$ so that

$$u_h = \begin{pmatrix} u_{h11} & u_{h21} \\ u_{h12} & u_{h22} \\ u_{h13} & u_{h23} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \\ 1 & 0 \end{pmatrix}.$$

Latent values and response errors for the subjects in sequence $h$ are defined in a similar manner by $y_h = u_h' y$ and $E_h = u_h' E$, respectively.

While it is possible to relate the response for the subject in position $i$ in sequence $h$ to the response for subject $j$ defined by the response error model (1) (see Appendix D), it is important to note that the subject in position $i = 1$, for example, in sequence $h$ is not necessarily the same subject as the subject labeled $j = 1$ in model (1), since for only one sequence will the order of the subjects in (1) match the subject's position in the sequence.
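The sequence representation can be made concrete by building the indicator matrices explicitly. The sketch below is illustrative only: `indicator_matrix` and `select` are hypothetical helper names, and the population response values in `Y_pop` are made up for the demonstration.

```python
from itertools import permutations
from math import factorial

N, n = 3, 2
subjects = range(1, N + 1)                    # labels s = 1, ..., N
sequences = list(permutations(subjects, n))   # ordered samples h = 1, ..., H
H = factorial(N) // factorial(N - n)


def indicator_matrix(seq, N):
    """N x n matrix u_h whose (s, i) entry is 1 if subject s is in position i."""
    return [[1 if seq[i] == s else 0 for i in range(len(seq))]
            for s in range(1, N + 1)]


def select(u, v):
    """Compute u' v, the population values of the sampled subjects in order."""
    n_cols = len(u[0])
    return [sum(u[s][i] * v[s] for s in range(len(v))) for i in range(n_cols)]


u = indicator_matrix((3, 1), N)   # subject 3 first, then subject 1
Y_pop = [9.0, 4.0, 13.0]          # hypothetical responses Y_1, Y_2, Y_3

print(len(sequences), H)
print(u)
print(select(u, Y_pop))
```

For the sequence `(3, 1)` the matrix has a one in row 3 of the first column and in row 1 of the second column, so `select` returns the response of subject 3 followed by that of subject 1.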

In the finite population mixed model we assume that a sample corresponds to a randomly selected sequence. To formalize this, we let $I_h$ represent an indicator random variable that has a value of one when sample sequence $h$ is selected, and zero otherwise, and let $Y = \sum_{h=1}^{H} I_h Y_h$. We assume that all sample sequences are equally likely (corresponding to simple random sampling without replacement), so that $E_p(I_h) = \frac{1}{H}$ (where the subscript $p$ indicates expectation with respect to sampling). Letting $U = \sum_{h=1}^{H} I_h u_h$ with elements $U_{is} = \sum_{h=1}^{H} I_h u_{his}$, $i = 1, \ldots, n$, $s = 1, \ldots, N$, it follows that the sample response vector is obtained by applying $U'$ to the population response vector in (5); it is a vector of sample random variables with elements $Y_i = \sum_{s=1}^{N} U_{is} Y_s$, $i = 1, \ldots, n$. Using (5) and defining subject effects by $\beta_s = y_s - \mu$, $s = 1, \ldots, N$, the finite population mixed model may be written as

$$Y_i = \mu + b_i + E_i \qquad (6)$$

where $b_i = \sum_{s=1}^{N} U_{is}\beta_s$ and $E_i = \sum_{s=1}^{N} U_{is} E_s$, or in matrix form as

$$Y = X\mu + Zb + E$$

where $b = U'\beta$, $b = (b_1 \; b_2 \; \cdots \; b_n)'$, $\beta = (\beta_1 \; \beta_2 \; \cdots \; \beta_N)'$, and the sample response error vector is $U'$ applied to the population response error vector. This represents the $n$ sample random variables in the finite population as defined by Stanek and Singer (2004). The random variable $b_i$ corresponds to the deviation of the subject's latent value from the population mean for the subject in position $i$ in a randomly selected sequence.

Let the target correspond to a linear function of $P = X\mu + Zb$ given by $T = g'P$. When $g = e_i$ (an $n \times 1$ vector whose elements are all equal to zero except the element in row $i$ that is equal to one), the target is $P_i$. The corresponding FPMM-BLUP is a linear function of $Y$ that is unbiased and has minimum expected mean squared error (MSE). We show (see Appendix E for details) that the FPMM-BLUP is

$$\hat{P}_i = \bar{Y} + k (Y_i - \bar{Y})$$

where $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ is the sample average response, $k = \frac{\gamma^2}{\gamma^2 + \bar{\sigma}^2}$, and $\bar{\sigma}^2 = \frac{1}{N}\sum_{s=1}^{N}\sigma_s^2$. The expected MSE of the predictor is

$$\operatorname{var}_{pR}(\hat{P}_i - P_i) = \bar{\sigma}^2\left[\frac{1}{n} + k\left(1 - \frac{1}{n}\right)\right].$$

When $T = \mu$, we have $\hat{P} = \bar{Y}$, and it follows that $\operatorname{var}_{pR}(\bar{Y} - \mu) = \frac{\bar{\sigma}^2}{n} + (1 - f)\frac{\gamma^2}{n}$, where $f = n/N$ is the sampling fraction.

Of particular interest is a comparison of the average MSE for potentially realizable responses. For all sample sets and subjects, the MM-BLUP MSE is smaller than the MSE of the observed response, and smaller than the MSE of the FPMM-BLUP. For a given sample sequence, the FPMM-BLUP is biased, with the bias given by $E_R(\hat{P}_i - P_i \mid h) = -(1 - k)(y_{hi} - \mu_h)$ (see Appendix F), where $\mu_h = \frac{1}{n}\sum_{i=1}^{n} y_{hi}$ is the mean latent value of the subjects in sequence $h$. Conditional on a sample sequence $h$, the MSE of the FPMM-BLUP is given by

$$MSE_{pR}(\hat{P}_i - P_i \mid h) = k^2\sigma_{hi}^2 + \frac{2k(1 - k)}{n}\sigma_{hi}^2 + (1 - k)^2\frac{\bar{\sigma}_h^2}{n} + (1 - k)^2 (y_{hi} - \mu_h)^2, \qquad (7)$$

where $\sigma_{hi}^2$ is the response error variance of the subject in position $i$ in sequence $h$ and $\bar{\sigma}_h^2 = \frac{1}{n}\sum_{i=1}^{n}\sigma_{hi}^2$.

Examples

We consider the FPMM-BLUP for simple random samples of size $n = 2$ from the population of $N = 3$ subjects listed in Table 11. First, note that there are six possible sample sequences, with $n! = 2$ sequences for each sample set. Since the FPMM-BLUP is identical for a subject in different sequences in the same set, we list the $t = 1, \ldots, 12$ possible equally likely sample responses in Table 11 corresponding to the three sample sets.
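The FPMM quantities for the three-subject population can be computed in a few lines. This is a sketch under assumptions: the population values (latent values 10, 2, 3 and response error variances 1, 4, 100 for Daisy, Rose, and Lily) are inferred from the summaries quoted in the text because the tables are missing, `conditional_mse` is a hypothetical helper, and the conditional expression coded here is the reconstructed form of (7).

```python
from itertools import combinations

names = ["Daisy", "Rose", "Lily"]
y = [10.0, 2.0, 3.0]            # assumed latent values
sigma2 = [1.0, 4.0, 100.0]      # assumed response error variances
N, n = 3, 2

mu = sum(y) / N
gamma2 = sum((v - mu) ** 2 for v in y) / (N - 1)
sigma2_bar = sum(sigma2) / N
k = gamma2 / (gamma2 + sigma2_bar)          # common FPMM shrinkage constant

# Expected MSE of the FPMM-BLUP over sampling and response error
expected_mse = sigma2_bar * (1 / n + k * (1 - 1 / n))


def conditional_mse(s, sample):
    """MSE of the FPMM-BLUP for subject s given the sampled set (equation 7)."""
    mu_h = sum(y[t] for t in sample) / n
    s2_h = sum(sigma2[t] for t in sample) / n
    return (k ** 2 * sigma2[s]
            + 2 * k * (1 - k) * sigma2[s] / n
            + (1 - k) ** 2 * s2_h / n
            + (1 - k) ** 2 * (y[s] - mu_h) ** 2)


per_subject = {}
for s in range(N):
    sets = [c for c in combinations(range(N), n) if s in c]
    per_subject[names[s]] = sum(conditional_mse(s, c) for c in sets) / len(sets)
average = sum(per_subject.values()) / N

print(round(expected_mse, 2))
print({name: round(v, 2) for name, v in per_subject.items()})
print(round(average, 2))
```

Averaging the conditional MSE over the sets that contain each subject gives a per-subject value, and averaging those over subjects recovers the expected MSE. With the assumed inputs, Lily's value rounds to 48.58 and the overall average to 23.66.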

- insert Table 11 -

Notice that the FPMM-BLUP is a biased predictor of each subject's latent value, but the average bias (over all subjects) is zero. The MSE differs between subjects, and exceeds the MSE of the observed response for Daisy and Rose, but is smaller (48.58 vs 100) than the MSE of the observed response for Lily. The average MSE of the FPMM-BLUP (i.e., 23.66) over all subjects is smaller than the average MSE of the observed response (i.e., 35).

Table 12 provides a summary of the MM-BLUP (when $N^{*} = n$, that is, when the latent values underlying $P$ are exactly those of the subjects in the set) and the FPMM-BLUP for the three different sets of $n = 2$ from the population of $N = 3$ listed in Table 11. Recall that the MM-BLUP is defined for each set, while the FPMM-BLUP is defined over all possible sets. The results in Table 12 are arranged in panels of rows corresponding to average predictors of Daisy's, Lily's, and Rose's latent values. The columns correspond to the average predictor, the bias, and the MSE. The bias and MSE are evaluated for the MM relative to the MM-latent value, and relative to the subject's true latent value. The potentially realizable responses correspond to rows where $t = 1, \ldots, 4$. The last three rows in Table 12 summarize the average results over potentially realizable responses, over artificial responses that are not potentially realizable, and over all responses. The average bias over all responses is zero for each predictor, but when bias is calculated only over potentially realizable responses, the MM-BLUP is biased, while the FPMM-BLUP is not.

The results in Table 12 illustrate overlapping but distinct sample spaces that underlie the MM and the FPMM predictors. A comparison of the accuracy of the predictors should be based on the average MSE over a common set of potentially observable responses. This corresponds to the MSE for the rows corresponding to $t = 1, \ldots, 4$ in Table 12.

An Example with $N = 4$ and $n = 3$

We consider a slightly larger example to compare the MM-BLUP and FPMM-BLUP. The example is for a population of N = 4 where a simple random sample of n = 3 subjects is selected, resulting in four possible sample sets. The population consists of the original population given in Table 1, and an additional subject, Violet, with a latent value and response variance given by $y_4 =$ and $\sigma_4^2 = 5$, respectively. The comparison is made for each sample set, assuming that the set constitutes a population for the FPMM and that the MM is defined for the latent values in the set. This means that the between subjects variance, $\gamma^2$, is identical for the FPMM and the MM in each set, but both the between subjects variance and the average response error variance, $\sigma^2$, are different for different sets. We compare the MSE of the estimates of subjects' latent values from the MM-BLUP (4) and the FPMM-BLUP (7) in Table 13. The results indicate that the MSE of the MM-BLUP is smaller than that of the FPMM-BLUP in most, but not all, settings. The FPMM-BLUP MSE is smaller for Rose in the set {Daisy, Rose, Violet} and for Violet in the set {Daisy, Lily, Violet}.

Discussion

The comparison of the model-based formulation of the mixed model () and the finite population mixed model (6) via the examples provides some insight into the interpretation of mixed models and reveals the opportunity for confusion in this context. First, the comparison provides some clarity to Robinson's (1991) discussion of whether the MM-BLUP should be termed an estimator or a predictor, and underscores the difficulty that Henderson (1975) had in providing a convincing interpretation of the MM-BLUP. Henderson (1984, page 37) posed the problem as to "Which is the more logical concept, prediction of a random variable or estimation of the realized value of a random variable? If we have an animal already born, it seems reasonable to describe the evaluation of its breeding value as an estimation problem. On the other hand, if we are interested in evaluation the potential breeding value of a mating between two potential parents, this would be a problem in prediction." The terminology of estimation applies to the MM-BLUP when the animal is already born, while prediction applies to the FPMM-BLUP when the mating parents have yet to be selected.

The interpretation of unbiased is also clarified. In the mixed model, we can distinguish $E_{\xi R}(Y_j) = \mu$ from $E_{R|\xi}(Y_j) = P_j$ (the MM-latent value for subject $j$) and from $E_{R|\xi}(Y_j \mid P = y) = y_j$, the true latent value for subject $j$. If our interest is in the latent value for subject $j$, the unbiased property of the MM-BLUP is defined as $E_{\xi R}(\hat{P}_j) = E_{\xi R}(P_j) = \mu$. This differs from the usual definition of unbiased, given by $E_{R|\xi}(\hat{P}_j \mid P = y) = y_j$. Neither the MM-BLUP nor the FPMM-BLUP is unbiased when this definition is adopted. The MM-BLUP is a biased estimator of the subject's latent value, while the FPMM-BLUP is a biased predictor of the realized random effect. Including U in the BLUP acronym may provide reassurance that BLUPs are OK for those who consider lack of bias a prerequisite for analysis. But truth would be better served if both MM-BLUPs and FPMM-BLUPs were described as biased but more accurate ways of estimating a subject's latent value.

An important aspect of the parallel development of the MM and FPMM is the description of the overlapping but distinct sample spaces. Since the examples we considered are small and the outcomes are discrete, it is possible to make the sample spaces explicit. More generally, the sample space is the product of possible realizations of $P$ and $E$. If response error has $m$ values for each subject, both the sample space for the MM and the sample space for the FPMM when $n = N$ have $m^n\, n!$ equally likely sample points. These sample spaces share $m^n$ sample points where $P = y$. The additional $(n! - 1)\,m^n$ values in the MM when $P \neq y$ are artificial, while the $(n! - 1)\,m^n$ responses in the FPMM correspond to different permutations of the subjects that are all potentially realizable, but not all observed. In this context, the difference between the MM-BLUP and the FPMM-BLUP is due to their development over different sample spaces.

We advocate evaluating statistics over sample spaces that are potentially realizable. This guideline requires statistics to be linked to reality, implying that only a portion of the sample space be used to evaluate estimators in the MM. It is consistent with Tukey's comment in the discussion of Nelder (1977) that our focus must be on questions, not models. By limiting evaluation of the estimators from the two formulations of the mixed model to the potentially realizable sample space, we keep the focus on real questions. With this focus, as illustrated via the examples, the MM-BLUP of a subject's latent value is not uniformly more accurate than the FPMM-BLUP.

More study in this area is clearly needed. Guidelines are lacking for estimator choice; understanding is lacking on how to artificially expand a sample space to produce more accurate estimators; practical issues where variance parameters are unknown are yet to be explored; and extensions to settings with auxiliary variables are not yet considered.

The distinction between potentially realizable points in the sample space and artificial sample points in the MM provides a context for understanding the concern expressed in much of the classical statistical literature that only variance components should be estimated and random effects should not be predicted. First, notice that even though the MM includes artificial sample points, $\gamma^2$ can be interpreted as the variance of the subjects' latent values in the set, in the population, or in the superpopulation, depending on how the collection of latent values underlying the MM is defined. This provides legitimacy to estimating variance components.
The rationale for concern over predicting random effects in a MM is also evident, since for a subject, there is a difference between the MM-latent values and the subject's latent value. In the MM, the latent value associated with a subject is not constant, but changes for different sample points. There is no reason to be interested in the artificial latent values assigned to a subject. This reasoning provides the logic behind a statement that prediction of random effects has no meaning. Our understanding of this concern changes if we consider estimation of realized random effects, where the term realized implies limiting consideration to sample points that are potentially realizable. By restricting the sample space to such points, the MM latent value is constant for a subject, and equal to the subject's latent value. Estimation of the realized random effect in the MM is meaningful, as is prediction of the realized random effect in the FPMM.

There is a simple connection between the MM and Bayesian methods. The distribution of $P$ in the MM has been termed the objective prior distribution, as in Robinson (1991). It has a simple interpretation as the distribution of subjects' latent values, and characterizes natural variation between subjects. When $n < N$, defining the distribution of $P$ to be a subset of random variables from an exchangeable distribution of the $N$ latent values in the population will expand the number of artificial responses in the MM sample space, but not alter the number of potentially realizable responses. Although each realization of $P$ is a set of latent values from the population, this expansion does not make the estimator based on a set of subjects from the MM more general, nor does it guarantee that the resulting estimator will be more accurate. The accuracy of estimators that are developed from such models should be evaluated only over potentially realizable points in the sample space. Such an evaluation may provide insight as to whether artificial expansion of sample spaces can give rise to more accurate estimators.

It is possible to expand the discussion of Bayesian concepts to include a distribution of fixed effects, which Robinson (1991) refers to as a subjective prior.
A sample point in the resulting joint distribution (of fixed and random effects) must have parameters equal to those for the actual set of subjects in order for potentially realizable responses to be included in the sample space. The extension to subjective priors extends only the number of artificial points in the sample space, and does not alter the set of potentially realizable responses from which the resulting estimator should be evaluated. Still, it is possible that such an extension of the artificial sample points will produce a more accurate estimator in some settings, an area deserving further study.

There is a firm connection between the MM and the FPMM in the survey sampling literature dating back to the important papers of Godambe (1955) and Godambe and Joshi (1965). Their work stimulated a crisis in the foundations of statistical inference, as summarized by Cassel et al. (1977). We discuss this connection, since it provides a unifying framework for ideas of statistical inference.

Godambe (1955) concluded that there is no best linear unbiased estimator of a finite population total based on probability samples. This result was startling since the sample mean from a simple random sample is commonly presented as the BLUE of the population mean. Important ideas in Godambe's development include the very general definition of a linear estimator, and the need for additional assumptions beyond sampling to obtain an optimal estimator. The linear estimator initially proposed by Godambe (1955) includes separate coefficients for each subject in each position in a sample, where sample points correspond to realizations of the subset of the first $n$ random variables representing a permutation of subject values in a finite population. Subsequently, Godambe and Joshi (1965) concluded that it was sufficient for coefficients to be defined for each subject in a sample set, not a sample sequence. Optimal coefficients can incorporate subject specific information, such as different response error variances, since subjects are identifiable in a set.
Additionally, since the sample set is the starting point, the connection back to the possible sampling probabilities is not relevant, since inference is conditional on the sample set.

The setting considered by Godambe (1955) did not include response error. Adding response error to a subject's latent value in Godambe's basic model does not alter the conclusion of non-existence of a BLUE, even though it is possible to specify a set of estimating equations for a target. While the equations can be solved, the solution does not result in an estimator since it includes non-sample latent values. Godambe (1955) introduced additional a priori model assumptions (motivated by including an auxiliary variable) in order to develop an estimator of the population total. These assumptions are similar to the MM assumptions on latent values. As a result, the MM can be considered to be a variation on the suggestion by Godambe (1955).

These basic ideas constitute the foundation for superpopulation models in survey sampling. We identify aspects of these models that are related to the MM. First, there is a connection between the realized sample and the superpopulation, which we define in terms of a set of latent values given by realizations of $P_j = \mu + a_j$. These latent values need not be simply the latent values for the subjects in the sample set, but could be defined quite generally. When the latent values for the subjects in the sample set are included in this definition, it is always possible to consider the realized sample as a possible sample point in the superpopulation model. Notice how this definition obscures the interpretation of a superpopulation, since the only identifiable subjects are those in the sample. While it may be appealing to think of a superpopulation as a larger finite population (as in Voss (1999)), there is no need to do so.

The FPMM is the result of moving in a different direction as a consequence of Godambe's non-existence results. Rather than including additional assumptions in the model for a sample set, the FPMM collapses random variables to a lower dimensional space. One casualty of the collapsing is a loss in identifiability of subjects for the FPMM random variables. This identifiability is lost when developing predictors of realized random effects, but regained once the subjects in the sample set are realized.
In order to maintain these distinctions, Stanek and Singer (2004) have described the FPMM-BLUP as a predictor of the latent value of a realized subject in a position in a sample. This awkward interpretation is a stumbling block for the methods, since the position is not of substantive interest in a practical problem, and the subject (whose latent value is of interest) is not identifiable. By limiting evaluation of the MSE to subjects in the sample set, the FPMM-BLUP is an estimator of a subject's latent value in the set, offering a simple, clear interpretation. While the simple examples illustrate that the FPMM predictor may outperform the MM predictor in some settings, guidance for its use is currently lacking.

Some of the main ideas in these results are far reaching. First, we conclude that an important area for investigation is that of statistical inference conditional on the sample set. Second, we conclude that it is crucial to evaluate properties of estimators over potentially realizable sample points, and not include artificial points in the sample space. This simple guideline can eliminate debate over adoption of prior distributions, or other artificial assumptions, since their use in developing estimators is allowed, but the evaluation of the properties of the estimators should be tied to reality. Finally, these results illustrate that there is a lot to be learned. Many accepted procedures, models, and theories appear to be based on ideas that are not consistent with these two conclusions. Their re-examination may lead to a sounder basis for statistical inference in the future.
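The recommendation to evaluate predictors only over potentially realizable sample points can be made concrete with a short sketch. For a fixed sample set, any linear predictor $c'Y$ of subject $j$'s latent value has bias $c'y - y_j$ and variance $\sum_t c_t^2\sigma_t^2$ over the realizable responses, so the MM-BLUP and FPMM-BLUP can be compared directly through their coefficient vectors. The set below is hypothetical (not taken from the tables), the function names are ours, and we assume that the set is treated as the population, with $\gamma^2$ supplied by the user.

```python
def mm_blup_coefficients(j, gamma2, resp_var):
    """Coefficients c with P_hat_j = c'Y for the MM-BLUP: (1 - k_j) * w + k_j * e_j."""
    k = [gamma2 / (gamma2 + s2) for s2 in resp_var]   # subject-specific shrinkage k_t
    total = sum(k)
    c = [(1 - k[j]) * kt / total for kt in k]         # (1 - k_j) * w_t, with w_t = k_t / sum(k)
    c[j] += k[j]
    return c


def fpmm_blup_coefficients(j, k, n):
    """Coefficients c with P_hat = c'Y for the FPMM-BLUP: (1 - k)/n everywhere, plus k at j."""
    c = [(1 - k) / n] * n
    c[j] += k
    return c


def conditional_bias_and_mse(c, j, latent, resp_var):
    """Bias and MSE of c'Y for subject j, holding the latent values of the set fixed."""
    bias = sum(ct * yt for ct, yt in zip(c, latent)) - latent[j]
    variance = sum(ct ** 2 * s2 for ct, s2 in zip(c, resp_var))
    return bias, variance + bias ** 2


# Hypothetical sample set of n = 3 subjects (assumed values, for illustration only).
latent = [10.0, 20.0, 30.0]
resp_var = [4.0, 16.0, 25.0]
gamma2 = 100.0                               # assumed between-subject variance for the set
n = len(latent)
k_fpmm = gamma2 / (gamma2 + sum(resp_var) / n)

for j in range(n):
    _, mse_mm = conditional_bias_and_mse(
        mm_blup_coefficients(j, gamma2, resp_var), j, latent, resp_var)
    _, mse_fp = conditional_bias_and_mse(
        fpmm_blup_coefficients(j, k_fpmm, n), j, latent, resp_var)
    print(f"subject {j}: MSE(MM-BLUP) = {mse_mm:.3f}, MSE(FPMM-BLUP) = {mse_fp:.3f}")
```

When all response error variances are equal, the two coefficient vectors coincide, so any difference in accuracy between the two predictors arises from heterogeneous response error.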

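Appendix A.

We develop the predictor of $P_j = X_j\mu + Z_j'a$, where $X_j = 1$ and $Z_j = e_j$, as a linear function of $Y$, i.e. $\hat{P}_j = c'Y$, that is unbiased and has minimum expected MSE in the mixed model, following the development by Goldberger (1962) as reviewed by Robinson (1991). We first note that

$$E_{\xi R}\begin{pmatrix} Y \\ P_j \end{pmatrix} = \begin{pmatrix} X \\ X_j \end{pmatrix}\mu \quad\text{and}\quad \mathrm{var}_{\xi R}\begin{pmatrix} Y \\ P_j \end{pmatrix} = \begin{pmatrix} \Omega & Z\Gamma Z_j \\ Z_j'\Gamma Z' & Z_j'\Gamma Z_j \end{pmatrix}.$$

The unbiased constraint requires that $E_{\xi R}(\hat{P}_j - P_j) = 0$. Since

$$E_{\xi R}(\hat{P}_j - P_j) = E_{\xi R}\left[c'Y - (X_j\mu + Z_j'a)\right] = (c'X - X_j)\mu,$$

the unbiased constraint is given by $c'X - X_j = 0$. Minimizing

$$\mathrm{var}_{\xi R}(\hat{P}_j - P_j) = c'\Omega c - 2c'Z\Gamma Z_j + Z_j'\Gamma Z_j$$

with respect to $c$ subject to the unbiased constraint results in

$$\hat{P}_j = X_j\hat{\mu} + Z_j'\Gamma Z'\Omega^{-1}(Y - X\hat{\mu}), \quad\text{where}\quad \hat{\mu} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y.$$

Since $\Omega$ is diagonal with elements $\gamma^2 + \sigma_j^2$, $\Omega^{-1} = \frac{1}{\gamma^2}\,\mathrm{diag}(k_1, \ldots, k_n)$, where $k_j = \dfrac{\gamma^2}{\gamma^2 + \sigma_j^2}$ and $k = (k_1\; k_2\; \cdots\; k_n)'$, and $X'\Omega^{-1}X = \frac{1}{\gamma^2}\sum_{j=1}^{n} k_j$. Using this result, $X'\Omega^{-1}Y = \frac{1}{\gamma^2}\sum_{j=1}^{n} k_jY_j$, so that

$$\hat{\mu} = \frac{\sum_{j=1}^{n} k_jY_j}{\sum_{j=1}^{n} k_j}.$$

Now $Z_j'\Gamma Z'\Omega^{-1} = k_je_j'$, and hence $Z_j'\Gamma Z'\Omega^{-1}(Y - X\hat{\mu}) = k_j(Y_j - \hat{\mu})$, so that

$$\hat{P}_j = X_j\hat{\mu} + Z_j'\Gamma Z'\Omega^{-1}(Y - X\hat{\mu}) = \hat{\mu} + k_j(Y_j - \hat{\mu}).$$

Appendix B. Mean Squared Error

The mean squared error of $\hat{P}_j$ under the model for $Y$ is given by $\mathrm{var}_{\xi R}(\hat{P}_j - P_j)$, where $\hat{P}_j = X_j\hat{\mu} + Z_j'\Gamma Z'\Omega^{-1}(Y - X\hat{\mu})$ and $P_j = X_j\mu + Z_j'a$. Notice that we can write $\hat{P}_j = c'Y$, where $c = (1 - k_j)w + k_je_j$ and $w = k \big/ \sum_{t=1}^{n} k_t$. First, observe that

$$E_R(\hat{P}_j) = c'P = (1 - k_j)P_w + k_jP_j = P_w + k_j(P_j - P_w), \quad\text{where}\quad P_w = \sum_{t=1}^{n} w_tP_t.$$

Now $\mathrm{var}_{\xi R}(\hat{P}_j - P_j) = \mathrm{var}_{\xi R}\left[(c'\;\; -Z_j')\begin{pmatrix} Y \\ a \end{pmatrix}\right]$, where

$$\mathrm{var}_{\xi R}\begin{pmatrix} Y \\ a \end{pmatrix} = \begin{pmatrix} \Omega & \Gamma \\ \Gamma & \Gamma \end{pmatrix}, \quad \Omega = \Gamma + D_\sigma, \quad D_\sigma = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2).$$

As a result,

$$\mathrm{var}_{\xi R}(\hat{P}_j - P_j) = c'(D_\sigma + \Gamma)c - 2Z_j'\Gamma c + Z_j'\Gamma Z_j.$$

Using $c$ and $Z_j = e_j$, we expand these terms to obtain

$$c'D_\sigma c = (1 - k_j)^2\sum_{t=1}^{n} w_t^2\sigma_t^2 + 2k_j(1 - k_j)w_j\sigma_j^2 + k_j^2\sigma_j^2,$$

$$c'\Gamma c = \gamma^2\left[(1 - k_j)^2\sum_{t=1}^{n} w_t^2 + 2k_j(1 - k_j)w_j + k_j^2\right],$$

$$Z_j'\Gamma c = \gamma^2\left[(1 - k_j)w_j + k_j\right], \quad Z_j'\Gamma Z_j = \gamma^2.$$

Combining all terms and simplifying the resulting expression using $\gamma^2 + \sigma_t^2 = \gamma^2/k_t$,

$$\mathrm{var}_{\xi R}(\hat{P}_j - P_j) = k_j\sigma_j^2 + \frac{(1 - k_j)^2\gamma^2}{\sum_{t=1}^{n} k_t}.$$

Appendix C. The Conditional MSE

We evaluate $E_R(\hat{P}_j - P_j \mid P = y)$. Since $\hat{P}_j = c'Y$ with $c = (1 - k_j)w + k_je_j$, $E_R(Y \mid P = y) = y$, and $P_j = y_j$ when $P = y$,

$$E_R(\hat{P}_j - P_j \mid P = y) = c'y - y_j = (1 - k_j)\mu_w + k_jy_j - y_j, \quad\text{where}\quad \mu_w = \sum_{t=1}^{n} w_ty_t.$$

Hence,

$$E_R(\hat{P}_j - P_j \mid P = y) = -(1 - k_j)(y_j - \mu_w).$$

We evaluate $E_R\left[(\hat{P}_j - P_j)^2 \mid P = y\right]$ by noting that given $P = y$, $\hat{P}_j - P_j = c'E + (c'y - y_j)$. Now $E_R(EE') = D_\sigma$. As a result,

$$E_R\left[(\hat{P}_j - P_j)^2 \mid P = y\right] = c'D_\sigma c + (c'y - y_j)^2.$$

Now

$$c'D_\sigma c = (1 - k_j)^2\sum_{t=1}^{n} w_t^2\sigma_t^2 + 2k_j(1 - k_j)w_j\sigma_j^2 + k_j^2\sigma_j^2.$$

Using $c'y = (1 - k_j)\mu_w + k_jy_j$,

$$(c'y - y_j)^2 = \left[y_j - (1 - k_j)\mu_w - k_jy_j\right]^2 = (1 - k_j)^2(y_j - \mu_w)^2.$$

As a result,

$$E_R\left[(\hat{P}_j - P_j)^2 \mid P = y\right] = (1 - k_j)^2\left[\sum_{t=1}^{n} w_t^2\sigma_t^2 + (y_j - \mu_w)^2\right] + k_j\sigma_j^2\left[k_j + 2(1 - k_j)w_j\right].$$

Appendix D. Relationship Between Models for Sequences and Sets

Response for sequence $h$, given by $Y_h$, can be related to response defined by the response error model (). To see this, we represent the indicator variables that define sequence $h$ in $u_h$ as a permutation (defined by $v_m$) of elements in set $h$ (defined by $\delta_h$) such that $u_h = \delta_hv_m$. Here $\delta_h$ is the array of indicators $\delta_{hjs}$, $j = 1, \ldots, n$, $s = 1, \ldots, N$, where $\delta_{hjs}$ has a value of one if the $j$th smallest subject's label in set $h$ is for subject $s$, and zero otherwise, and $v_m$ is the $n \times n$ array with elements $v_{mji}$ having a value of one if the $j$th smallest label in set $h$ is in position $i$, and zero otherwise. For example, when n = 2 and N = 3, the data for a sequence $h$ consisting of subject $s = 3$ followed by the other subject in the set is defined by using

$$v_m = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$$

to permute the subjects in set $h$ defined by $\delta_h$.

Appendix E.

The FPMM-BLUP of a sample subject's latent value, $P_i = X_i\mu + Z_i'b$, where $X_i = 1$ and $Z_i = e_i$, may be obtained similarly to the development in Appendix A. We first note that $\Gamma = \gamma^2\left(I_n - \frac{1}{N}J_n\right)$ and $\Omega = \Gamma + \sigma^2I_n$, so that

$$E_{pR}\begin{pmatrix} Y \\ P_i \end{pmatrix} = \begin{pmatrix} X \\ X_i \end{pmatrix}\mu \quad\text{and}\quad \mathrm{var}_{pR}\begin{pmatrix} Y \\ P_i \end{pmatrix} = \begin{pmatrix} \Omega & Z\Gamma Z_i \\ Z_i'\Gamma Z' & Z_i'\Gamma Z_i \end{pmatrix}.$$

The predictor is a linear function of $Y$ given by $\hat{P}_i = c_i'Y$ such that $E_{pR}(\hat{P}_i - P_i) = 0$, which implies that $c_i'X - X_i = 0$. The FPMM-BLUP is given by

$$\hat{P}_i = X_i\hat{\mu} + Z_i'\Gamma Z'\Omega^{-1}(Y - X\hat{\mu}), \quad\text{where}\quad \hat{\mu} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y.$$

Since

$$\Omega^{-1} = \frac{1}{\gamma^2 + \sigma^2}\left(I_n + \frac{k}{N - nk}J_n\right),$$

we have $X'\Omega^{-1}X = \dfrac{n}{(\gamma^2 + \sigma^2)(1 - fk)}$, where $f = n/N$, $X'\Omega^{-1}Y = \dfrac{n}{(\gamma^2 + \sigma^2)(1 - fk)}\bar{Y}$, $\hat{\mu} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y = \bar{Y}$, and

$$Z_i'\Gamma Z'\Omega^{-1} = ke_i' - \frac{k(1 - k)}{N - nk}1_n'.$$

Since $Y - X\hat{\mu} = \left(I_n - \frac{1}{n}J_n\right)Y$, $Z_i'\Gamma Z'\Omega^{-1}(Y - X\hat{\mu}) = ke_i'\left(I_n - \frac{1}{n}J_n\right)Y$, so that $\hat{P}_i = \bar{Y} + k(Y_i - \bar{Y})$. Using these expressions, $\hat{P}_i = c_i'Y$, where $c_i = \frac{1}{n}1_n + k\left(I_n - \frac{1}{n}J_n\right)e_i$.

Now

$$\mathrm{var}_{pR}\begin{pmatrix} Y \\ b \end{pmatrix} = \begin{pmatrix} \Gamma + \sigma^2I_n & \Gamma \\ \Gamma & \Gamma \end{pmatrix}.$$

As a result,

$$\mathrm{var}_{pR}(\hat{P}_i - P_i) = c_i'(\sigma^2I_n + \Gamma)c_i - 2Z_i'\Gamma c_i + Z_i'\Gamma Z_i.$$

Now

$$c_i'(\sigma^2I_n + \Gamma)c_i = c_i'\left[(\sigma^2 + \gamma^2)I_n - \frac{\gamma^2}{N}J_n\right]c_i = (\sigma^2 + \gamma^2)c_i'c_i - \frac{\gamma^2}{N}c_i'J_nc_i,$$

and $c_i'c_i = \frac{1}{n}\left[1 + (n - 1)k^2\right]$ while $c_i'J_nc_i = 1$, resulting in

$$c_i'(\sigma^2I_n + \Gamma)c_i = \frac{\sigma^2 + \gamma^2}{n}\left[1 + (n - 1)k^2\right] - \frac{\gamma^2}{N}.$$

Also, $Z_i'\Gamma c_i = \frac{\gamma^2}{n}\left[1 + (n - 1)k\right] - \frac{\gamma^2}{N}$ and $Z_i'\Gamma Z_i = \gamma^2 - \frac{\gamma^2}{N}$. Combining these terms and simplifying,

$$\mathrm{var}_{pR}(\hat{P}_i - P_i) = \frac{\sigma^2}{n}\left[1 + (n - 1)k\right].$$

Appendix F. The MSE of the FPMM-BLUP for a Sample Set

The FPMM-BLUP of a sample subject's latent value, $P_i = X_i\mu + Z_i'b$, where $X_i = 1$ and $Z_i = e_i$, is $\hat{P}_i = \bar{Y} + k(Y_i - \bar{Y})$. The predictor is developed over all possible sample sequences identified by $u_h = \delta_hv_m$, where a realized sample sequence is the realization of $u_h$. We evaluate $\mathrm{var}_{pR}(\hat{P}_i - P_i \mid h)$. The response and latent values for sequence $h$ are the response and latent values of the subjects in set $h$, rearranged by the permutation $v_m$, so that conditional on $h$, $\hat{P}_i - P_i$ is a linear combination of the responses of the subjects in the set with coefficients $c_m$ (the elements of $c_i$ rearranged by $v_m$), minus the latent value of the subject $j$ realized in position $i$. As a result, $\mathrm{var}_{pR}(\hat{P}_i - P_i \mid h) = c_m'D_hc_m$, where $D_h$ is the diagonal matrix of response error variances $\sigma_t^2$ for the subjects in set $h$, and $c_m = \frac{1}{n}1_n + k\left(e_j - \frac{1}{n}1_n\right)$. Defining $\bar{\sigma}_h^2 = \frac{1}{n}\sum_{t \in h}\sigma_t^2$,

$$\mathrm{var}_{pR}(\hat{P}_i - P_i \mid h) = c_m'D_hc_m = \frac{\bar{\sigma}_h^2}{n} + \frac{2k}{n}\left(\sigma_j^2 - \bar{\sigma}_h^2\right) + k^2\left(\sigma_j^2 - \frac{2}{n}\sigma_j^2 + \frac{\bar{\sigma}_h^2}{n}\right) = \frac{1}{n}\left\{\left[(n - 2)k + 2\right]k\sigma_j^2 + (1 - k)^2\bar{\sigma}_h^2\right\}.$$

Also, setting $\mu_h = \frac{1}{n}\sum_{t \in h}y_t$,

$$E_R(\hat{P}_i - P_i \mid h) = c_m'y_h - y_j = -(1 - k)(y_j - \mu_h),$$

where $y_h$ is the vector of latent values for the subjects in set $h$. The MSE is given by

$$\mathrm{MSE}_{pR}(\hat{P}_i - P_i \mid h) = \frac{1}{n}\left\{\left[(n - 2)k + 2\right]k\sigma_j^2 + (1 - k)^2\bar{\sigma}_h^2\right\} + (1 - k)^2(y_j - \mu_h)^2.$$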
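The conditional MSE for a sample set and the expected MSE of Appendix E can be checked against each other numerically: averaging the conditional MSE over every sample set and every subject in the set should recover $\frac{\sigma^2}{n}[1 + (n-1)k]$. The sketch below does this by exact enumeration. It is an illustration under stated assumptions: the population values are hypothetical, the function names are ours, and $\gamma^2$ is taken to be the variance of the latent values with divisor $N - 1$, consistent with $\Gamma = \gamma^2(I_n - J_n/N)$.

```python
from itertools import combinations
from statistics import mean


def conditional_mse(j, members, latent, resp_var, k):
    """MSE of the FPMM-BLUP for subject j, conditional on the sample set `members`."""
    n = len(members)
    mu_h = mean(latent[s] for s in members)        # average latent value in the set
    sig_h = mean(resp_var[s] for s in members)     # average response error variance in the set
    bias = -(1 - k) * (latent[j] - mu_h)
    variance = (((n - 2) * k + 2) * k * resp_var[j] + (1 - k) ** 2 * sig_h) / n
    return variance + bias ** 2


def average_conditional_mse(latent, resp_var, n, k):
    """Average of the conditional MSE over all sample sets of size n and all subjects in each set."""
    values = [
        conditional_mse(j, members, latent, resp_var, k)
        for members in combinations(range(len(latent)), n)
        for j in members
    ]
    return mean(values)


def expected_mse(sigma2, k, n):
    """Expected MSE of the FPMM-BLUP: (sigma^2 / n) * (1 + (n - 1) * k)."""
    return sigma2 * (1 + (n - 1) * k) / n


# Hypothetical population (assumed values, for illustration only).
latent = [10.0, 20.0, 30.0, 45.0]
resp_var = [4.0, 16.0, 25.0, 9.0]
N, n = len(latent), 3
mu = mean(latent)
gamma2 = sum((y - mu) ** 2 for y in latent) / (N - 1)   # assumed divisor N - 1
sigma2 = mean(resp_var)
k = gamma2 / (gamma2 + sigma2)

print("average conditional MSE:", average_conditional_mse(latent, resp_var, n, k))
print("expected MSE:           ", expected_mse(sigma2, k, n))
```

Agreement of the two numbers reflects that simple random sampling gives each subject, and each set containing it, equal weight; the conditional values themselves differ between subjects and sets, which is the source of the subject-specific comparisons in the examples.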

References

Brown, E.M., and Kass, R.E. (2009). What is statistics? (with discussion), The American Statistician, 63: 105-123.

Cassel, C.M., Särndal, C.E. and Wretman, J.H. (1977). Foundations of Inference in Survey Sampling, New York, NY: John Wiley.

Godambe, V.P. (1955). A unified theory of sampling from finite populations, Journal of the Royal Statistical Society B, 17: 269-278.

Godambe, V.P. and Joshi, V.M. (1965). Admissibility and Bayes estimation in sampling from finite population. I. The Annals of Mathematical Statistics, 36: 1707-1722.

Henderson, C.R. (1975). Best linear unbiased estimation and prediction under a selection model, Biometrics, 31: 423-447.

Henderson, C.R. (1984). Applications of Linear Models in Animal Breeding. University of Guelph, Guelph, Canada (ISBN 0-88955-030-1).

Koch, G.G. (1967). A procedure to estimate the population mean in random effects models, Technometrics, 9: 577-585.

Koch, G.G. (1973). An alternative approach to multivariate response error models for sample survey data with applications to estimators involving subclass means, Journal of the American Statistical Association, 68: 906-913.

Koch, G.G., Gillings, D.B., and Stokes, M.E. (1980). Biostatistical implications of design, sampling, and measurement to health science data analysis, Ann. Rev. Public Health, 1: 163-225.

Merriam, P.A., Ockene, I.S., Hebert, J.R., Milagros, C.R., and Matthews, C.E. (1999). Seasonal variation of blood cholesterol levels: study methodology, Journal of Biological Rhythms, 14(4): 330-339.

Nelder, J.A. (1977). A reformulation of linear models (with discussion), Journal of the Royal Statistical Society A, 140: 48-76.

Ockene, I.S., Chiriboga, D.E., Stanek, E.J., Harmatz, M.G., Nicolosi, R., Saperia, G., Well, A.D., Merriam, P.A., Reed, G., Ma, Y., Matthews, C.E. and Hebert, J.R. (2004). Seasonal variation in serum cholesterol: Treatment implications and possible mechanisms, Archives of Internal Medicine, 164: 863-870.

Robinson, G.K. (1991). That BLUP is a good thing: the estimation of random effects, Statistical Science, 6: 15-51.

Stanek, E.J. and Singer, J.M. (2004). Predicting Random Effects from Finite Population Clustered Samples with Response Error, Journal of the American Statistical Association, 99: 1119-1130.

Voss, D.T. (1999). Resolving the Mixed Models Controversy, The American Statistician, 53: 352-356.

List of Table Titles

Table 1. Population Values and Parameters for Simple Example
Table 2. Potentially Realizable Responses for the Set {Daisy, Rose} Assuming a Mixed Model
Table 3. Additional (non-realizable) Responses for the Set {Daisy, Rose} Assuming a Mixed Model
Table 4. Predictors of MM-Latent Values, Difference from $P$, and MSE for the Set {Daisy, Rose}.
Table 5. Predictors of Subject's Latent Values, Difference from $y$, and MSE for the Set {Daisy, Rose}.
Table 6. Estimators of the Mean Latent Value $\bar{P} = \bar{y}$ from a Response Error Model and MM, the Difference from the Mean, and the MSE for the Set {Daisy, Rose}.
Table 7. Potentially Realizable Responses for the Set {Daisy, Rose, Lily} Assuming a Response Error Model
Table 8. Predictors of Daisy's Latent Value, Difference from MM Latent Value $P$, and MSE for the Set {Daisy, Rose, Lily} for Potentially Realizable Responses.
Table 9. Predictors of Daisy's Latent Value, Difference from MM Latent Value $P$, and MSE for the Set {Daisy, Rose, Lily} for Non-realizable Responses.
Table 10. Summary of MM-Average Predictor, $\hat{P}$, MM-Latent Value, $P$, Difference, and MSE for Responses t = 1, ..., 8; t = 9, ..., 48; and t = 1, ..., 48 for the Set {Daisy, Rose, Lily}
Table 11. Finite Population Mixed Model Response and Predictors of Realized Latent Values
Table 12. Comparison of Average, Bias and MSE of Predictors of Subject's Latent Values Using the Subject's Response Error Model (RE), $Y_j$, the MM-BLUP, $\hat{P}_j$, and the FPMM-BLUP, $\hat{P}_i$.
Table 13. Comparison of the MSE between the MM-BLUP and the FPMM-BLUP for Potentially Realizable Sample Sets of Size n = 3 from a Population of N = 4.

List of Figure Titles

Figure 1. Mean vs. Standard Deviation of Saturated Fat Intake (gm/day) for n = 554 subjects with 0 or more 24hr recall measures in Seasons Study.