Estimating the Population Mean From a Simple Random Sample When Some Responses Are Missing


Edward J. Stanek, Jingsong Lu, Recai Yucel, and Elaine Puleo
Department of Biostatistics and Epidemiology
40 Arnold House
University of Massachusetts
Amherst, Massachusetts 01003

C06ed5 /4/006 0:56 AM

Abstract

We develop a design-based prediction approach to estimate the finite population mean in a simple setting where some responses are missing. The approach is based on indicator sampling random variables that operate on labeled units (subjects). Missing data mechanisms are defined that may depend on a subject, or on a selection (such as when the study design assigns groups of selected subjects to different interviewers). Using an approach usually reserved for model-based inference, we develop a predictor that equals the sample total divided by the expected sample size. The methods are direct extensions of best linear unbiased prediction (BLUP) in finite population mixed models. When the probability of missing is estimated from the sample, the empirical estimator simplifies to the mean of the realized non-missing responses. The different missing data mechanisms are revealed by the notation that accounts for the labels and sample selection. The mean squared error (MSE) of the empirical estimator is, counterintuitively, smaller than the MSE if the probability of missing is known.

KEYWORDS: Simple random sampling, Missing data, MCAR, finite population, Best linear unbiased estimator (BLUE), prediction.

1. Introduction

Statistical analysis in the presence of incomplete or missing data is a pervasive problem in sample surveys. A simple example illustrates the problem. Suppose that a voter opinion poll is conducted via a simple random telephone sample selected from a list of registered voters. Although a sample of size $n$ is selected, responses will most likely be obtained on fewer than $n$ of the selected subjects. Some of the registered voters will have answering machines and screen calls, resulting in non-response. In addition, poor interviewing skills by some interviewers may result in refusals for other contacted subjects. The first type of non-response depends on the subject, while the second type of non-response depends on the interviewer.

In the simplest setting, the probability of non-response will be unrelated to the actual voter preference of the subject. If this is true, the missing responses are called missing completely at random (MCAR) (Little and Rubin 1987). For example, if the proportion of registered voters who screen calls among those who would vote for a candidate is the same for all candidates, then the missing responses are MCAR. Also, if the proportion of refusals that result from poor interviewer skills is the same for voters of all candidates, then the missing responses are MCAR. MCAR is the simplest kind of non-response assumption. It is often assumed as a starting point in an analysis, as we do here.

How should one estimate the voter preference for a candidate when responses for some of the selected sample subjects are missing? An intuitive estimator is the simple proportion (i.e., mean) of the responding subjects who would vote for the candidate. This is the estimator described by Cochran (1977). Although intuition is a good guide in selecting this estimator, the estimator is not a simple linear function of the sample data, since the denominator is a random variable. A way around this complication is to condition on the observed sample size, $n_1$. Oh and Scheuren (1983) and Rao (1985) have used this approach to show that the estimator is unbiased. The conditional approach, however, draws into question the role of the underlying simple random sampling in statistical inference. We examine this simple problem and show how explicit specification of sampling indicator random variables will result in a probability model familiar in other problems. Straightforward application of prediction methods gives rise to a predictor that depends on the probability of missing response, which, when replaced by the sample estimate, reduces to the mean of the observed responses.

2. The Population

We define a finite population as a collection of a known number, $N$, of identifiable subjects labeled $s = 1, 2, \ldots, N$. Associated with subject $s$ is a response $y_s$, which we assume is potentially observable without error. In the voting preference survey, $y_s$ corresponds to an indicator that assumes a value of one if subject $s$ will vote for the incumbent, and zero otherwise. When there are more than two choices, response corresponds to a set of indicators for the candidates, with only one having a value of one. In this context, we will limit our interest to votes for a single candidate, and thus consider a single response variable. The assumption of no response error corresponds to each subject having no uncertainty as to their vote.

We summarize the set of population values in the vector $y = (y_1, y_2, \ldots, y_N)'$ and assume that there is interest in a $p \times 1$ vector of parameters of the form $\beta = Gy$, where $G$ is a $p \times N$ matrix of known constants. We limit our attention to a single parameter, the population mean given by

$$g'y = \mu = \frac{1}{N}\sum_{s=1}^{N} y_s,$$

defined by setting $g = \frac{1}{N}1_N$, and define the population variance as

$$\sigma^2 = \frac{1}{N-1}\sum_{s=1}^{N}(y_s - \mu)^2,$$

where $1_N$ represents an $N \times 1$ vector with all elements equal to one.

3. Sampling, Missing Data, and Prediction

Suppose that a simple random sample without replacement is to be selected from the population. We define the elements occupying the first $n$ positions in the permutation to be the sample, noting that simple random sampling implies that each permutation is equally likely. This representation has been discussed by Cassel, Särndal and Wretman (1977) and explored in the context of super-population models by Rao and Bellhouse (1978). Our discussion is closely related to these presentations, but follows the definitions and notation used by Stanek, Singer and Lençina (2004) for random variables corresponding to positions in a randomly selected permutation.

Let $i = 1, 2, \ldots, N$ index the positions in a permutation. We represent the value in position $i$ of a randomly selected permutation by the random variable $Y_i = \sum_{s=1}^{N} U_{is} y_s$, where $U_{is} = 1$ if unit $s$ is in position $i$ and $U_{is} = 0$ otherwise. The random vector $Y = (Y_1, Y_2, \ldots, Y_N)'$ represents a random permutation of the population (as in Cassel, Särndal and Wretman 1977). We can relate $Y$ to $y$ such that $Y = Uy$, where

$$U = \begin{pmatrix} U_{11} & U_{12} & \cdots & U_{1N} \\ U_{21} & U_{22} & \cdots & U_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ U_{N1} & U_{N2} & \cdots & U_{NN} \end{pmatrix}.$$
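The relation $Y = Uy$ can be made concrete with a small numerical sketch. The code below is illustrative only: the response values, the seed, and the helper name `permutation_indicator_matrix` are assumptions introduced here and are not part of the original development. It builds the indicator matrix for one randomly selected permutation and shows that multiplying by it only reorders the population values.

```python
import numpy as np

def permutation_indicator_matrix(order):
    """Return U with U[i, s] = 1 when subject s occupies position i (0-based)."""
    N = len(order)
    U = np.zeros((N, N), dtype=int)
    U[np.arange(N), order] = 1
    return U

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility only
y = np.array([1, 0, 0, 1, 1])           # hypothetical votes for N = 5 labeled subjects
order = rng.permutation(len(y))         # order[i] = subject placed in position i
U = permutation_indicator_matrix(order)
Y = U @ y                               # population values listed by position

print(U)
print("Y =", Y, "| population mean:", y.mean(), "| mean of Y:", Y.mean())
```

Each row and each column of `U` contains a single one, so the mean of `Y` equals the population mean for every permutation.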

Note that $y$ is a vector of constants indexed by the labeled subjects, while $Y$ is a vector of random variables indexed by the positions. Realizing a value of $Y_i$ will not reveal which subject is occupying position $i$ in the permutation, although it will reveal the value corresponding to the realized subject. To know which subject occupies position $i$ in a permutation, we need to know the realized value of the random variable $S_i = \sum_{s=1}^{N} s\,U_{is}$.

These subtle distinctions can be illustrated with the voting preference example. Suppose that the realized response for the first selected subject ($i = 1$) is a vote for the incumbent. Simply knowing the realized value of $Y_1$ does not tell us which subject voted for the incumbent; it only tells us that one of the subjects voted this way. In order to know which subject cast this vote, we need to know which subject occupied the first position in the permutation, i.e., the realization of $S_1$. This could be recorded along with the realized value of $Y_1$, resulting in a bivariate response. Typically, the additional variate representing the labeled unit is dropped from the analysis. Although not relevant for the present discussion, the subtle difference between the realized value of a position and the realized value of a subject is what makes interpretation of realized random effects in mixed models so challenging (see Stanek, Singer, and Lençina (2004) for additional discussion).

Since each subject has an equal chance of being assigned to a given position in a permutation, $E_{\xi_1}(Y_i) = \mu$ for $i = 1, \ldots, N$, where $\xi_1$ denotes expectation over permutations. We can summarize this expected value structure in a linear model given by $Y = X\mu + E$, where $X = 1_N$. We partition the vector of random variables into a subset representing the sample, $Y_I = (Y_1, \ldots, Y_n)'$, indexed by $i = 1, 2, \ldots, n$, and the remainder, $Y_{II} = (Y_{n+1}, \ldots, Y_N)'$, indexed by $i = n+1, \ldots, N$, such that $Y = (Y_I' \; Y_{II}')'$.

Statistical inference can be readily described using this representation. Suppose our interest is in estimating the population mean, $\mu$, using results from a simple random sample. The first step in defining statistical inference is to introduce a probability model that defines random variables, since statistical inference must involve random variables. Representing the elements in the population by $N$ random variables, each corresponding to a position in a permutation of the population, is a simple, transparent place to start. The properties of these random variables need not be dwelled on to make progress in defining what we mean by statistical inference. Yet one characteristic of the representation, that the population mean can be expressed as $\mu = \frac{1}{N}\sum_{i=1}^{N} Y_i$, is worth noting. The random variables are not completely free to vary. In fact, a parameter, $\mu$, is equal to the sum of the random variables divided by $N$. With such a representation of random variables, we can identify a correspondence between elements in the population and random variables.

The prediction approach to inference makes use of the fact that the realized sample corresponds to the realized values of $Y_I$. In order to estimate a parameter that is a linear function of $Y$, the basic problem is prediction of a linear function of $Y_{II}$ that is not observed. The linear function is determined by the parameter of interest. For example, since the population mean can be represented by

$$\mu = \frac{n}{N}\bar{y}_I + \frac{N-n}{N}\bar{Y}_{II}$$

(where $\bar{y}_I = \frac{1}{n}\sum_{i=1}^{n} y_i$, with $y_i$ representing the realized value of $Y_i$, and $\bar{Y}_{II} = \frac{1}{N-n}\sum_{i=n+1}^{N} Y_i$), an estimator of $\mu$ based on a simple random sample requires prediction of $\bar{Y}_{II}$.

We illustrate this process with a simple example. Suppose we have a population with size $N = 4$ and select a sample without replacement of size $n = 2$. We represent the population as $y = (y_1 \; y_2 \; y_3 \; y_4)'$ and a random permutation of the population as $Y = (Y_1 \; Y_2 \; Y_3 \; Y_4)'$. The first two random variables in the permutation make up the sample. A total of $N! = 24$ possible permutations can occur, with each of them equally likely. The results of three possible permutations are given in Figure 1.
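The example with $N = 4$ and $n = 2$ is small enough to enumerate in full. The sketch below lists all 24 permutations, takes the first two positions as the sample, and checks two familiar facts about the sample mean under this model. The population values are hypothetical, and the variance is computed with the $N - 1$ divisor, which is an assumption about the definition of $\sigma^2$ used here.

```python
from fractions import Fraction
from itertools import permutations

y = [Fraction(v) for v in (3, 5, 8, 12)]     # hypothetical population values
N, n = len(y), 2
mu = sum(y) / N
sigma2 = sum((v - mu) ** 2 for v in y) / (N - 1)   # assumes the N - 1 divisor

perms = list(permutations(y))                # every equally likely permutation
sample_means = [sum(p[:n]) / n for p in perms]     # first n positions form the sample

expected = sum(sample_means) / len(perms)
variance = sum((m - mu) ** 2 for m in sample_means) / len(perms)

print(len(perms), expected == mu, variance == (1 - Fraction(n, N)) * sigma2 / n)
```

Exact rational arithmetic is used so that the comparisons are equalities rather than floating point approximations.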

Figure 1. Example of Realized Samples for Three Possible Permutations where $N = 4$ and $n = 2$. [Figure not reproduced: for each of three realizations $u$ of $U$, it lists the product $uy$, the resulting permutation, and the realized sample.]

The random variables $Y_3$ and $Y_4$ will not be observed. Predicting their sum is the basic problem of inference. Predictors of this function can be developed using the approach of Royall (1988), which has been recently summarized by Valliant, Dorfman, and Royall (2000) in the context of super-population models. It is not necessary to introduce a super-population to apply the approach to simple random sampling. We assume that the predictors are a linear function of the sample, are unbiased, and will result in minimum expected MSE. The resulting predictor, $\hat{\bar{Y}}_{II}$, is called the best linear unbiased predictor (BLUP). When combined as a weighted linear function with the sample mean, the estimator of $\mu$ is the best linear unbiased estimator (BLUE). Under simple random sampling, the BLUP of $\bar{Y}_{II}$ is $\bar{y}_I$, so that the BLUE of $\mu$ is $\bar{y}_I$, the simple sample mean (Stanek, Singer, and Lençina 2004).

3.1. Two Methods of Specifying Missing Data

Under the assumption of MCAR, we specify two models that account for missing data. In each model, we assume the probability of a missing response is constant, and equal to $\pi$. The first model represents the missing data mechanism by random variables indexed by the position of a subject in the sample, $M_i$, $i = 1, \ldots, N$, where $M_i$ takes on a value of one if response is missing for position $i$, and zero otherwise. Such random variables may represent a missing data mechanism for factors determined by the study design, as for example when different interviewers are assigned groups of sample subjects to interview. The second model represents a missing data mechanism by random variables indexed by subject, $H_s$, $s = 1, \ldots, N$, where $H_s$ takes on a value of one if response is missing for subject $s$, and zero otherwise. Such random variables may represent a missing data mechanism where a factor, such as answering machine screening, depends on individual subjects. The two missing data mechanisms emphasize the distinction between subject labels and sample positions.

3.1.1. A Model for Response when Missing Data Depends on Sample Subject Position

We first consider the setting where the missing data mechanism is indexed by the position of subjects in the sample, as might occur if interviewers are assigned to consecutive selected subjects. We incorporate the missing data mechanism into the random permutation model by augmenting the $N$ random variables to a vector of $2N$ random variables.
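The difference between the two mechanisms can be seen by tracking the missing-response flags across two different permutations. The following sketch is illustrative: the population size, missing probability, seed, and the helper name `flags_by_position` are all assumptions made for the example. Position-indexed flags stay with the positions, while subject-indexed flags travel with the subjects.

```python
import numpy as np

rng = np.random.default_rng(1)               # arbitrary seed
N, pi = 6, 0.3                               # hypothetical size and missing probability
M = (rng.random(N) < pi).astype(int)         # M[i] = 1 if response is missing for position i
H = (rng.random(N) < pi).astype(int)         # H[s] = 1 if response is missing for subject s

def flags_by_position(order):
    """Missing flags seen in positions 0..N-1 under each mechanism.

    order[i] is the subject occupying position i.
    """
    position_model = M.copy()                # unaffected by who lands in the position
    subject_model = H[order]                 # follows the subject into the position
    return position_model, subject_model

order_a = rng.permutation(N)
order_b = rng.permutation(N)
m_a, h_a = flags_by_position(order_a)
m_b, h_b = flags_by_position(order_b)

print("position model:", m_a, m_b)
print("subject model: ", h_a, h_b)
```

Under the position model the two printed vectors are identical. Under the subject model they generally differ by position, although the same subjects are missing in both.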

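The first $N$ random variables in the vector correspond to potentially observed responses. The $i$th random variable is given by $(1 - M_i)Y_i$. If $i \le n$, the random variable will be realized in the sample. When the realized value of $M_i$ is $m_i = 0$, response for the subject selected in position $i$ is given by the realized value of $Y_i$, i.e., $y_i$. When the realized value of $M_i$ is $m_i = 1$, response for the subject selected in position $i$ is missing, and the value of the realization, $(1 - m_i)y_i$, is zero. Thus, the first $N$ random variables are the potentially observable responses for random variables representing a permutation.

The second $N$ random variables in the vector correspond to missing responses. The $i$th random variable is given by $M_i Y_i$. If $i \le n$, the random variable will be realized (but the value of the random variable will not be observed) in the sample. For example, when the realized value of $M_i$ is $m_i = 1$, response for the subject selected in position $i$ is missing, but the realized value of $M_i Y_i$ will correspond to the realized value of $m_i y_i$, i.e., $y_i$. Although this value will not be observed by the investigator, it will be contained in the second set of random variables. When the realized value of $M_i$ is $m_i = 0$, response for the subject selected in position $i$ is not missing, but the value of the realization, $m_i y_i$, is zero. Thus, the second $N$ random variables are the potentially observable responses for realized random variables representing a permutation where response is missing.

When the probability of missing depends on position, we represent the first $N$ random variables by the product $(I_N - M^*)Y$, where $M^* = \bigoplus_{i=1}^{N} M_i$ is a diagonal matrix with diagonal elements given by $M_i$. We partition this vector into a vector representing the sample, $Y_I^{(o)}$, and the remainder, $Y_{II}^{(o)}$, such that $(I_N - M^*)Y = (Y_I^{(o)\prime} \; Y_{II}^{(o)\prime})'$, where the superscript is a reminder that these random variables are potentially observed. The second $N$ random variables corresponding to missing responses are given by the product $Y^{(m)} = M^* Y$. We represent the $2N \times 1$ vector of random variables by $Z_1 = (Y_I^{(o)\prime} \; Y_{II}^{(o)\prime} \; Y^{(m)\prime})'$. Elements of this vector are given by

$$Z_{1i}^{(o)} = (1 - M_i)\sum_{s=1}^{N} U_{is} y_s \quad\text{and}\quad Z_{1i}^{(m)} = M_i\sum_{s=1}^{N} U_{is} y_s.$$

3.1.2. A Model for Response when Missing Data Depends on Labeled Subjects

When the probability of missing depends on the subject, we represent the potentially observable random variables by a vector in a similar manner. We form the first vector of $N$ random variables that are potentially observed by the product $U(I_N - H^*)y$, where $H^* = \bigoplus_{s=1}^{N} H_s$ is a diagonal matrix with diagonal elements given by $H_s$. We partition this vector into a vector representing the sample, $\Upsilon_I^{(o)}$, and the remainder, $\Upsilon_{II}^{(o)}$, using the same notation, but where $U(I_N - H^*)y = (\Upsilon_I^{(o)\prime} \; \Upsilon_{II}^{(o)\prime})'$. Elements of $\Upsilon_I^{(o)}$ are now of the form

$$Z_{2i}^{(o)} = \sum_{s=1}^{N} U_{is}(1 - H_s)y_s \quad\text{for } i = 1, \ldots, n.$$

When $h_s = 0$, the realized value for the subject is not missing and may be observed; when $h_s = 1$, the realized value of the random variable $Z_{2i}^{(o)}$ is zero. The $N$ random variables in the vector corresponding to missing responses are given by the product $\Upsilon^{(m)} = UH^*y$ with elements $Z_{2i}^{(m)} = \sum_{s=1}^{N} U_{is}H_s y_s$. We represent the $2N \times 1$ vector of random variables by $Z_2 = (\Upsilon_I^{(o)\prime} \; \Upsilon_{II}^{(o)\prime} \; \Upsilon^{(m)\prime})'$. The random variables in the vector $\Upsilon_I^{(o)}$ are observed as a result of sampling. The elements of $\Upsilon_{II}^{(o)}$ and $\Upsilon^{(m)}$ are not observed. Notice that unobserved random variables correspond to both the missing data, and to the portion of the population that is not included as part of the sample. Although these random variables are represented distinctly, they share the common status of missing data.

3.2. First and Second Moments

We develop the expected value and variance of the vector of random variables representing the population next. Expectation is taken with respect to random variables representing the missing data mechanism, $\xi_2$, and with respect to random permutations of the population, $\xi_1$. For example, the elements of $Z_1$ are of the form $Z_{1i}^{(o)} = (1 - M_i)\sum_{s=1}^{N} U_{is}y_s$ and $Z_{1i}^{(m)} = M_i\sum_{s=1}^{N} U_{is}y_s$. Using conditional expectation, $E_{\xi_1\xi_2}(Z_{1i}^{(o)}) = E_{\xi_2}(E_{\xi_1|\xi_2}(Z_{1i}^{(o)}))$, and since $E_{\xi_1|\xi_2}(Z_{1i}^{(o)}) = (1 - M_i)\mu$, $E_{\xi_1\xi_2}(Z_{1i}^{(o)}) = (1 - \pi)\mu$. Similarly, $E_{\xi_1\xi_2}(Z_{1i}^{(m)}) = \pi\mu$. Combining these expressions,

$$E_{\xi_1\xi_2}(Z_1) = \begin{pmatrix} (1-\pi)1_N \\ \pi 1_N \end{pmatrix}\mu.$$

The result illustrates that the expected values of random variables in the sample, $Z_{1i}^{(o)}$, are not equal to the population mean. This result is intuitive if we recall that when a response is missing, the observed response is zero (as a result of introducing the missing data random variables in the model). For example, if the probability of a missing response is $\pi = 0.20$, the expected value of a potentially observable random variable, $Z_{1i}^{(o)}$, $i = 1, \ldots, n$, is 80% of $\mu$. The expected value doesn't imply that there is bias, but simply that the expected value will be closer to zero than the population mean. The identical results are obtained taking the expected value of the random variables $Z_2$, such that

$$E_{\xi_1\xi_2}(Z_2) = \begin{pmatrix} (1-\pi)1_N \\ \pi 1_N \end{pmatrix}\mu.$$

The variance can be developed in a similar manner. Using a conditional expansion,

$$\mathrm{var}_{\xi_1\xi_2}(Z_1) = \mathrm{var}_{\xi_2}\left(E_{\xi_1|\xi_2}(Z_1)\right) + E_{\xi_2}\left(\mathrm{var}_{\xi_1|\xi_2}(Z_1)\right).$$

To evaluate this expression, we define $M = ((I_N - M^*)' \; M^{*\prime})'$, such that $Z_1 = MUy$. Then $E_{\xi_1|\xi_2}(Z_1) = M1_N\mu$, where

$$M1_N = \begin{pmatrix} 1_N \\ 0_N \end{pmatrix} + \begin{pmatrix} -m \\ m \end{pmatrix}$$

and $m = (M_1 \; M_2 \; \cdots \; M_N)'$. Note that the variance of $M1_N$ is determined by $\mathrm{var}_{\xi_2}(m)$. Since we assume the missing data random variables are independent, $\mathrm{var}_{\xi_2}(m) = \pi(1-\pi)I_N$, and hence

$$\mathrm{var}_{\xi_2}\left(E_{\xi_1|\xi_2}(Z_1)\right) = \pi(1-\pi)\mu^2\begin{pmatrix} I_N & -I_N \\ -I_N & I_N \end{pmatrix}.$$

Using the result from Stanek, Singer and Lençina (2004) that $\mathrm{var}_{\xi_1}(Uy) = \sigma^2\left(I_N - \frac{1}{N}J_N\right)$, where $J_N = 1_N 1_N'$,

$$E_{\xi_2}\left(\mathrm{var}_{\xi_1|\xi_2}(Z_1)\right) = \sigma^2 E_{\xi_2}\left[M\left(I_N - \tfrac{1}{N}J_N\right)M'\right] = \sigma^2\pi(1-\pi)\frac{N-1}{N}\begin{pmatrix} I_N & -I_N \\ -I_N & I_N \end{pmatrix} + \sigma^2\begin{pmatrix} (1-\pi)^2 & \pi(1-\pi) \\ \pi(1-\pi) & \pi^2 \end{pmatrix}\otimes\left(I_N - \tfrac{1}{N}J_N\right).$$

Combining these expressions,

$$\mathrm{var}_{\xi_1\xi_2}(Z_1) = \sigma^2\begin{pmatrix} (1-\pi)^2 & \pi(1-\pi) \\ \pi(1-\pi) & \pi^2 \end{pmatrix}\otimes\left(I_N - \tfrac{1}{N}J_N\right) + \pi(1-\pi)\left(\frac{N-1}{N}\sigma^2 + \mu^2\right)\begin{pmatrix} I_N & -I_N \\ -I_N & I_N \end{pmatrix}.$$

Identical results are obtained evaluating the variance of the random variables $Z_2$.

We can summarize the model for the population that includes missing data. The model is given by $Z_1 = X\gamma + E$, where $X = I_2 \otimes 1_N$ and $\gamma = ((1-\pi)\mu \;\; \pi\mu)'$. Notice that in this model, the sum of the parameters is equal to the population mean, $\mu$. A similar model can be expressed for $Z_2$. We drop the subscript for $Z$ in the subsequent development since the two models have the same expected value and variance.

3.3. Developing Predictors of the Mean

We use the prediction approach to estimate the population mean. First, note that we can express the population mean as a simple linear combination of the random variables, $\mu = g'Z$, where $g = \frac{1}{N}1_{2N}$. Also, we can partition $Z$ into a set of random variables corresponding to the sample, $Z_I$ (corresponding to $Y_I^{(o)}$ or $\Upsilon_I^{(o)}$), and the remaining random variables, $Z_{II}$. We partition $g' = (g_I' \; g_{II}')$ in a similar manner, where $g_I = \frac{1}{N}1_n$. The values of the sample random variables will be observed, and correspond to the response for a non-missing selected unit, or the value zero for a selected unit where response is missing. As a result, once the sample is realized, $\mu = g_I'z_I + g_{II}'Z_{II}$, and the basic problem is prediction of $g_{II}'Z_{II}$.

We require the predictor to be a linear function of the sample data, $p'Z_I$, to be unbiased, such that $E_{\xi_1\xi_2}(p'Z_I - g_{II}'Z_{II}) = 0$, resulting in the constraint that $p'1_n(1-\pi) = 1 - \frac{n}{N}(1-\pi)$, and to minimize the variance, $\mathrm{var}_{\xi_1\xi_2}(p'Z_I - g_{II}'Z_{II})$. Minimizing the variance subject to this constraint using Lagrange multipliers and simplifying leads to the best linear unbiased estimator (Lu, 2004) given by

$$\hat{\mu} = \frac{1}{n(1-\pi)}\sum_{i=1}^{n} Y_i^{(o)}. \qquad (3.1)$$

The denominator, $n(1-\pi)$, corresponds to the expected number of responding sample subjects. We refer to $\hat{\mu}$ as the average of the expected respondents. The numerator is simply the total of the realized sample, $\sum_{i=1}^{n} Y_i^{(o)}$, using a response of zero for random variables where the response is missing. The variance of the estimator is given by

$$\mathrm{var}(\hat{\mu}) = \frac{1}{n(1-\pi)}\left[\pi\mu^2 + \frac{N - n(1-\pi) - \pi}{N}\sigma^2\right].$$

The estimator can be written in a manner that emphasizes the interpretation of predicting the unobserved random variables. We express it as the weighted sum of three terms: the sample mean, $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i^{(o)}$; the predictor of response for a subject not selected in the sample, $\hat{P}_1$; and the predictor of response for the $N\pi$ subjects where response is expected to be missing, $\hat{P}_2$. Using this notation, the estimator is given by

$$\hat{\mu} = \frac{1}{N}\left[n\bar{Y} + (N-n)\hat{P}_1 + N\pi\hat{P}_2\right].$$

There is a simple intuition that corresponds to the choice of predictors. The predictor of response for a subject not selected in the sample who will respond is equal to the average response over the sample, and is given by $\hat{P}_1 = \bar{Y}$. The predictor for subjects whose response will be missing corresponds to the average response of the expected respondents, $\hat{P}_2 = \hat{\mu}$. Combining these expressions,

$$\hat{\mu} = \frac{1}{N}\left[n\bar{Y} + (N-n)\bar{Y} + N\pi\hat{\mu}\right],$$

an expression which readily can be seen to be equal to (3.1). A key feature of this decomposition is the ability to interpret terms in the estimator as a sum of realized sample values, and predictors of unobserved random variables. This provides an intuitive guide to the statistical inference that links directly to the actual statistical methods.

3.4. The Empirical Predictor

In practice, we need to know the probability of missing response in order to compute the predictor. A common practice when parameters are unknown is to replace the parameters by estimates of the parameters. The estimators may come from additional data, or directly from the sample. We refer to the resulting predictor as an empirical predictor.

In order to estimate the population mean, we need an estimate of $\pi$. We can estimate this parameter by the proportion of missing responses in the sample. Notice that if response consists solely of the realized values of $Y_I^{(o)}$, then we will not be able to distinguish whether response for position $i$ in the sample is missing, or simply represents a response of zero for the selected subject. As a result, we cannot form an unbiased estimate of $\pi$ without more information. We assume that such additional information is available. The additional information consists of the realized values, $x_i$, of $M_i$ (or $\sum_{s=1}^{N} U_{is}H_s$) for $i = 1, \ldots, n$, allowing us to know for each position in the sample whether or not response is missing. Defining $n_0$ as the number of elements of $Y_I^{(o)}$ (or $\Upsilon_I^{(o)}$) where response is missing, we estimate $\pi$ by $\hat{\pi} = \frac{n_0}{n}$. Representing the number of non-missing sample responses as $n_1 = n - n_0$, the empirical predictor simplifies to

$$\hat{\mu}_0 = \frac{1}{n(1-\hat{\pi})}\sum_{i=1}^{n} Y_i^{(o)} = \frac{1}{n_1}\sum_{i=1}^{n} Y_i^{(o)},$$

equal to the simple mean of the non-missing sample respondents. The empirical predictor simplifies to the intuitive estimator widely used, although rarely motivated in a formal fashion. Using the finite population random permutation model approach and the additional data on $M_i$ or $\sum_{s=1}^{N} U_{is}H_s$ for $i = 1, \ldots, n$, this predictor emerges as best.

We estimate the MSE by replacing $\pi$ by $\hat{\pi} = \frac{n_0}{n}$, $\mu$ by $\hat{\mu}_0 = T/n_1$ with $T = \sum_{i=1}^{n} Y_i^{(o)}$, and $\sigma^2$ by $S^2 = \frac{1}{n_1 - 1}\sum_{i=1}^{n}(1 - x_i)(Y_i^{(o)} - \hat{\mu}_0)^2$. Using these estimators,

$$\hat{V}(\hat{\mu}_0) = \frac{n_0}{n\,n_1^3}T^2 + \frac{1}{n_1}\left(1 - \frac{n_1}{N} - \frac{n_0}{nN}\right)S^2.$$

The first term in this expression inflates the variance to account for variability resulting from division by the expected number of non-missing sample responses, as opposed to the actual number of non-missing sample responses. Although for the empirical estimator we use the actual number of non-missing sample responses, the expression for the MSE still retains this term. The second term is similar to the variance of the sample mean under simple random sampling without replacement. The difference is that $S^2$ is an estimate of $\sigma^2$ that depends only on non-missing sample respondents.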
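The properties of the two predictors can be checked exactly on a toy population by enumerating every permutation and every pattern of missing responses in the sample. The sketch below is illustrative and rests on stated assumptions: the population values and the missing probability are hypothetical; the mechanism is the position-indexed one; the variance expression coded as `var_formula` is $\frac{1}{n(1-\pi)}[\pi\mu^2 + \frac{N - n(1-\pi) - \pi}{N}\sigma^2]$ with the $N - 1$ divisor in $\sigma^2$; and the empirical predictor is evaluated conditionally on at least one respondent, since it is undefined otherwise.

```python
from fractions import Fraction
from itertools import permutations, product

y = [Fraction(v) for v in (3, 5, 8, 12)]     # hypothetical population values
N, n = len(y), 2
pi = Fraction(1, 4)                          # hypothetical probability of a missing response
mu = sum(y) / N
sigma2 = sum((v - mu) ** 2 for v in y) / (N - 1)

def outcomes():
    """Yield (probability, sample values, missing flags) for every possible case."""
    perms = list(permutations(y))
    for perm in perms:
        for flags in product((0, 1), repeat=n):
            p = Fraction(1, len(perms))
            for m in flags:
                p *= pi if m else 1 - pi
            yield p, perm[:n], flags

mean_known = mse_known = Fraction(0)
mse_emp = prob_resp = Fraction(0)
for p, sample, flags in outcomes():
    total = sum(v for v, m in zip(sample, flags) if not m)   # missing responses count as zero
    est = total / (n * (1 - pi))             # sample total over expected number of respondents
    mean_known += p * est
    mse_known += p * (est - mu) ** 2
    n1 = n - sum(flags)
    if n1 > 0:                               # empirical predictor needs at least one respondent
        mse_emp += p * (total / n1 - mu) ** 2
        prob_resp += p
mse_emp /= prob_resp                         # MSE conditional on n1 >= 1

var_formula = (pi * mu**2 + (N - n * (1 - pi) - pi) / N * sigma2) / (n * (1 - pi))
print(mean_known == mu, mse_known == var_formula)
print(float(mse_known), float(mse_emp))
```

In this toy setting the predictor with known $\pi$ is unbiased and its MSE matches the variance expression, while the mean of the respondents has the smaller MSE, in line with the counterintuitive result noted in the abstract.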

3.5. An Example

We illustrate the empirical predictor with the voting example. Suppose that a telephone interview survey of $n = 400$ voters in Amherst, Massachusetts is conducted to estimate the proportion of voters who favor same sex marriage. We assume that the sample is selected based on simple random sampling of the town voter registration list containing $N = 8000$ registered voter names. We also assume that the probability of response being missing is independent of the actual subject response for all voters. As a result of the survey, suppose that $n_1 = 250$ responses are obtained, where 200 ($\hat{\mu}_0 = 0.80$) favor same sex marriage. The simple sample average is given by $\bar{y} = \frac{200}{400} = 0.5$, while the estimate of the probability of missing response is $\hat{\pi} = \frac{150}{400} = 0.375$.

We construct the estimator of the proportion of voters favoring same sex marriage by the sum of the voters favoring same sex marriage in three groups: the sampled voters who respond, $n\bar{y} = 400 \times \frac{200}{400} = 200$; the predicted number of voters who would respond, but were not included in the sample, $(N - n)\bar{y} = 7600 \times \frac{200}{400} = 3800$; and the predicted number of voters who would not respond, $[N\hat{\pi}]\hat{\mu}_0 = (8000 \times 0.375)(0.8) = 3000(0.8) = 2400$. Adding the observed number of voters favoring same sex marriage in the sample to the predicted number favoring same sex marriage who would respond and those who would not respond,

$$\hat{\mu}_0 = \frac{1}{8000}\left[200 + 3800 + 2400\right] = 0.8.$$
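The arithmetic of the three groups can be reproduced in a few lines. The sketch below uses only the figures quoted in the example; the variable names are chosen here for illustration.

```python
from fractions import Fraction

N, n = 8000, 400            # registered voters and sample size
n1, favor = 250, 200        # respondents, and respondents in favor

n0 = n - n1                 # missing responses in the sample
pi_hat = Fraction(n0, n)    # estimated probability of a missing response
y_bar = Fraction(favor, n)  # sample mean with missing responses counted as zero
mu0 = Fraction(favor, n1)   # mean of the non-missing respondents

sampled = n * y_bar                 # sampled voters in favor
unsampled = (N - n) * y_bar         # predicted: not sampled, would respond
nonresp = N * pi_hat * mu0          # predicted: would not respond
estimate = (sampled + unsampled + nonresp) / N

print(sampled, unsampled, nonresp, float(estimate))
```

The three group totals add to 6400 of 8000 voters, so the decomposition returns the respondent mean of 0.8.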

When the response is dichotomous, $S^2 = \frac{n_1}{n_1 - 1}\hat{\mu}_0(1 - \hat{\mu}_0)$, the expression for the MSE simplifies, and in our example $\hat{v}(\hat{\mu}_0) = 0.00158$. We compare the expression for the variance with an expression for the variance corresponding to the finite population variance where the assumption is made that the sample size is equal to the number of non-missing sample responses. This variance is given by $\hat{\sigma}^2 = \frac{N - n_1}{N n_1}S^2$, and in our example $\hat{\sigma}^2 = 0.0006225$. Simulation studies (Lu 2004) reveal that $\hat{\sigma}^2$ is a better approximation for the variance of $\hat{\mu}_0$ than $\hat{V}(\hat{\mu}_0)$. Using this expression and assuming asymptotic normality, we may estimate a 95% confidence interval for response as (0.751, 0.849).

4. Discussion

The simple example illustrates a design-based method that frames statistical inference as a problem of predicting values not in the sample. When some response is missing, predictors are needed both of the remaining units in the population, and of the sampled units where response is missing. This approach to inference is very similar to the approach advocated by Valliant, Dorfman, and Royall (2000), in which optimal predictors are constructed for unobserved random variables based on a model for a super-population. Both approaches distinguish between the values in a finite population and a set of random variables whose realization is the population values. The difference in the two approaches stems from accounting for the unit labels. In the super-population approach, labels are ignored. The starting point is a set of random variables that form a super-population and satisfy certain statistical properties, such as exchangeability. The finite population is considered to be a realization of that set of super-population random variables. The predictors are developed from the super-population model, and not from the finite population sampling. In the survey sampling literature, the predictors are referred to as model-based, since their derivation is based on the super-population model.

In contrast, the probability models presented in Section 3 arise directly from the sampling. Units in the population are identifiable, and the labels can be traced through the process of describing the missing data mechanism. This enables a clear accounting and interpretation of the physical processes of sampling, and processes that may result in missing data. Unlike the model-based survey approach, the random variables and their properties are based solely on the sampling design and do not require additional assumptions. However, similar to the model-based approach, the essential statistical problem is framed as a prediction problem, and makes use of the same tools in developing the best linear unbiased predictors as are used in the model-based approach.

The basic design-based prediction approach was presented in the context of simple random sampling by Stanek, Singer, and Lençina (2004). There are several innovative aspects to the application of this approach to the missing data problem. First, identifying the labeled units enables a clearer specification of the missing data mechanisms. We have distinguished the missing data mechanisms that depend on sample position (such as interviewer), from missing data mechanisms that depend on the labeled unit (such as having an answering machine). It is clearly possible to have more complex missing data models where the missing data mechanism depends on both sample position, $i$, and unit, $s$. Although the development in Section 3 assumes that the probability of response being missing is independent and identically distributed, other assumptions are possible, and will likely lead to different predictors.

A second innovative feature of the development is the representation of the problem as a double set of random variables. The two sets of random variables correspond to one set where a response will be potentially observed, and a second set consisting of the response values when response is missing. In the first set, realizations of the random variables where response is missing have a value of zero; in the second set, realizations of the random variables where response is not missing have a value of zero. Summing these random variables gives rise to a set of random variables where there is no missing data. The idea of expanding the representation of random variables for missing data is similar in concept to the expansion of random variables considered by Stanek, Singer, and Lençina (2004) used to distinguish prediction of response for a unit based on a simple random sample.

The empirical estimate provides an additional interesting aspect of the development. In the context of best linear unbiased predictors in mixed models, empirical estimates are commonly constructed by replacing variance component parameters by sample estimators. Usually, such substitution results in an elevated expected MSE for the empirical predictor due to additional variance introduced by substituting the estimators for parameters. In our application, the predictor involves a single unknown parameter, $\pi$. Replacing this parameter by the sample estimator does inflate the expected MSE. However, the expected MSE appears to dramatically overstate the variability when compared with the variance evaluated from simulation studies. In a sense, substituting for the sample size reduces the variance by accounting for the non-ignorable missing data, since the response is recorded as a value of zero when the sample respondent's response was missing.

Using finite population sampling models and a prediction approach connects estimation and prediction. This is clearly illustrated in the simple random sampling/missing data setting. The example provides a setting for distinguishing terms commonly used in statistics. The term estimator is reserved for a statistic that comes close (in terms of having small variance) to a parameter. The term predictor is reserved for a statistic that comes close (in terms of having small mean squared error) to a random variable. Since we define a parameter as a linear combination of population values, but then define a random permutation of these values as a set of random variables, the parameter has an equivalent definition as a linear combination of random variables. This process implies that an estimator of the population mean can be interpreted as a predictor of the linear combination of random variables not included in the sample.

The design-based prediction approach to finite populations can be extended to other situations. Predictors of realized random effects have been developed by Stanek and Singer (2004) in the context of two stage sampling with response error. Their development is limited to populations with equal size clusters, but the methods can be extended. Additional extensions have been made to settings where there are auxiliary variables associated with each unit in the context of simple random sampling (Li 2003; Li and Stanek, 2004). These extensions begin to develop design-based methods that may be useful for modeling survey data. Other extensions, as for example to non-ignorable missing response mechanisms, remain to be explored.

References

Cassel, C.M., Särndal, C.E. and Wretman, J.H. (1977), Foundations of Inference in Survey Sampling, New York, NY: John Wiley.

Cochran, W.G. (1977), Sampling Techniques, New York, NY: John Wiley.

Li, W. (2003), Use of Random Permutation Model in Rate Estimation and Standardization, Ph.D. Thesis, Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA.

Li, W. and Stanek, E.J. (2004), Covariance Adjusted Estimation Under a Design Based Random Permutation Model, Journal of Statistical Planning and Inference, under review.

Little, R.J.A. and Rubin, D.B. (1987), Statistical Analysis with Missing Data, New York, NY: John Wiley.

Lu, J. (2004), Estimating parameters when considering the unobserved units as missing values in simple random sampling, Master's Thesis, Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA.

Oh, H.L. and Scheuren, H.F. (1983), Weighting Adjustments for Unit Nonresponse, Incomplete Data in Sample Surveys.

Rao, J.N.K. (1985), Conditional Inference in Survey Sampling, Survey Methodology, 11, 15-31.

Rao, J.N.K. and Bellhouse, D.R. (1978), Estimation of finite population mean under generalized random permutation model, Journal of Statistical Planning and Inference, 2, 125-141.

Royall, R.M. (1988), The Prediction Approach to Sampling Theory, Handbook of Statistics Volume 6, ed. Krishnaiah, P.R. and Rao, C.R., New York, NY: Elsevier Science Publishers, 399-413.

Stanek, E.J. and Singer, J.M. (2004), Predicting Random Effects from Finite Population Clustered Samples with Response Error, Journal of the American Statistical Association, in press.

Stanek, E.J., Singer, J.M. and Lençina, V.B. (2004), A unified approach to estimation and prediction under simple random sampling, Journal of Statistical Planning and Inference, 121, 325-338.

Valliant, R., Dorfman, A.H., and Royall, R.M. (2000), Finite Population Sampling and Inference: A Prediction Approach, New York, NY: John Wiley.