A UNIFIED APPROACH TO ESTIMATION AND PREDICTION UNDER SIMPLE RANDOM SAMPLING


Edward J. Stanek III
Department of Biostatistics and Epidemiology, SPH
University of Massachusetts at Amherst, USA

Julio da Motta Singer
Departamento de Estatística, IME
Universidade de São Paulo, Brazil

Viviana Beatriz Lencina
Departamento de Investigación, FM
Universidad Nacional de Tucumán, Argentina

ABSTRACT

We consider a probability model where the design based approach to inference under simple random sampling of a finite population encompasses a simple random permutation superpopulation model. The model consists of an expanded set of random variables following a random permutation probability distribution that keeps track of both the units' labels and positions in the permutation. In particular, since we keep track of the labels, the model allows us to attack the problem of estimation of a unit's parameter. While some linear combinations of the expanded set of random variables correspond to linear combinations of the unit parameters, other linear combinations correspond to random variables known as random effects. Using a prediction technique similar to that employed under the model-based approach, we develop optimum estimators of the linear combinations of the unit parameters and optimum predictors of the random effects. The unbiased minimum variance estimator of the population mean is the sample mean and of a unit parameter is the Horvitz-Thompson estimator if the unit is included in the sample, and zero otherwise. The predictor of the random variable at a given position in the permutation is the realized unit's parameter for positions in the sample, and the sample mean for other positions. For other linear functions, unique minimum variance unbiased estimators may not exist.

Key words: Bias, finite population, optimal estimation, prediction, random effects, mixed models, super-population, design based, model based, inference.

Running Title: UNIFIED INFERENCE IN SIMPLE RANDOM SAMPLING

1. INTRODUCTION

We propose a probability model induced by a simple random sample design for a finite population that encompasses a simple random permutation super-population model. Model based prediction tools are used to optimally estimate linear combinations of random variables in the model. Appropriate linear combinations of the random variables may be constructed to represent finite population parameters, including the parameter for an individual unit. Other linear combinations correspond to random variables that are analogous to random effects. The model provides a common context for comparing results on prediction and estimation. Since the parameters can be estimated and the random variables can be predicted in a common manner, the results lead to interesting interpretations.

The probability model we propose was motivated by the desire to construct inference for a unit parameter in simple random sampling. While this problem is not of compelling interest, it is closely related to a similar common problem in two stage sampling where there is interest in predicting the parameter for a realized unit (or cluster). The investigation of that more complicated problem led us to focus on this simpler setting which still retains some essential aspects of the two stage problem. We explore the simpler setting here, deferring further comments on the two stage setting to the discussion.

Inference about a parameter for an individual labeled unit is not possible under the classical design-based approach since individual labeled units are not identifiable in the probability models generally used to link the sample to the population. In fact, the probability models used for such purposes are typically based on the distributions of exchangeable random variables which ignore labels. We overcome this problem by introducing a discrete probability model where parameters correspond to the values of the labeled units. The model is based on indicator random variables generated by a random permutation of units, as would occur in a simple random sampling design. These random variables keep track of both the unit's label and the unit's position in the permutation. Rather than characterizing a permuted finite population by $N$ random variables, the expanded framework includes $N^2$ random variables.

The model we propose does not rely on the concept of a super-population considered under the model-based approach. However, estimators/predictors of linear combinations of random variables are constructed using the prediction approach common in model-based inference. Furthermore, linear combinations of the random variables reproduce the simple random permutation super-population model.

The problem we consider is particularly simple, and hence is related to a broad literature. The general modeling framework for survey sampling is given by Cassel, Särndal and Wretman (1977), with design based and model based inference widely discussed (Bolfarine and Zacks (1992); Hedayat and Sinha (1991); Mukhopadhyay (2001); Särndal, Swensson, and Wretman (1992); Thompson (1997); Valliant, Dorfman, and Royall (2000)). Recent reviews of inference in survey sampling are given by Rao (1997, 1999a). Brewer, Hanif, and Tam (1988), Brewer (1999) and Brewer (2002) have discussed reconciling model-based and design based inference. The random permutation super-population model has been discussed by Rao and Bellhouse (1978), Mukhopadhyay (1984) and Rao (1984), and in the context of two stage sampling, by Padmawar and Mukhopadhyay (1985) and Bellhouse and Rao (1986). Model-based approaches to the two-stage problem have been studied by Scott and Smith (1969) and Fuller and Battese (1973), and recently reviewed in the context of small area estimation by Rao (1999b).

Of particular relevance are the fundamental results of Godambe (1955) and Godambe and Joshi (1965) that no uniform minimum variance unbiased linear estimator of the population total exists if coefficients are allowed to depend on the sequence of labels in the sample. Royall (1969) countered this result with the observation that if random variables representing the sampling were reduced to their usual representation, where one random variable is associated with each selected unit, optimal estimators could be obtained. Other approaches to overcome the non-existence result of Godambe have been suggested by Hartley and Rao (1968, 1969). Our approach is in the same spirit as that of Royall's 1969 result, where we reduce the most general set of random variables defined by Godambe to a set of $N^2$ random variables.

Definitions and notation are developed in Section 2 and the expanded model is fully defined in Section 3. Interest is focused on linear combinations of the random variables defined in the expanded model. Certain linear combinations simplify to non-stochastic finite population parameters; other linear combinations are random variables. Since both parameters and random variables can be defined by the linear combinations, the methods we develop in Section 4 are appropriate for both estimators (of parameters) and predictors (of random variables). For simplicity, we use the term estimator in reference to general linear combinations of random variables.

The expanded model enables estimation of the population mean, as well as parameters for labeled units. The sample mean is the best linear unbiased estimate of the population mean. For a single unit, the best linear unbiased estimator is unique and of the Horvitz-Thompson (1952) type if the unit is included in the sample, and zero otherwise. Simultaneous estimation of all unit parameters in the population does not in general lead to unique estimators. However, with different additional restrictions, different unique estimators arise. The predictor of the random variable corresponding to the $i$-th position in an ordered permutation, while not of any obvious interest, turns out to be analogous to the widely used predictor of a realized random effect in a mixed model. These results are discussed further in Section 5.

2. DEFINITIONS AND NOTATION

We consider the problem of estimating certain characteristics of a finite population of units under simple random without replacement sampling. We define a finite population as a collection of a known number, $N$, of identifiable units labeled $s = 1, \ldots, N$. Associated with unit $s$ is a parameter $y_s$. We summarize the set of parameters in the vector $y = (y_1, \ldots, y_N)'$ and assume that when unit $s$ is observed, the parameter $y_s$ is known without error. Typically, there is interest in a $p \times 1$ vector of parameters of the form $\beta = Gy$ where $G$ is a $p \times N$ matrix of known constants. For example, if $G = I_N$, with $I_N$ denoting the $N$-dimensional identity matrix, then $\beta$ is the set of individual parameters. If $G = e_s'$, where $e_s$ denotes an $N$-dimensional column vector with null elements in all positions except for the $s$-th position for which the value 1 is assigned, the parameter $\beta$ corresponds to the value $y_s$ associated with the unit labeled $s$ in the population. When $G = \frac{1}{N} 1_N'$, where $1_N$ denotes an $N$-vector with all elements equal to 1, $\beta$ corresponds to the population mean, $\mu$.

We define a probability model that links the population parameters to an expanded vector of random variables which is essentially induced by a simple random sampling design, and develop estimators of linear functions of these random variables. The proposed estimators are linear functions of the random variables that define a sample. We use the prediction approach that is common in model-based inference to develop the estimators. Before introducing the expanded model, we first review the prediction approach used in the context of super-population models.

The prediction approach is based on an underlying probability model for a vector of random variables $Y^* = (Y_1, \ldots, Y_N)'$ that characterizes a super-population. The population under study, $y = (y_1, \ldots, y_N)'$, is considered to be a realization of these super-population random variables. The vector of random variables is partitioned into a subset which we call the sample, $Y^*_S = (Y_1, \ldots, Y_n)'$, and the remainder, $Y^*_R = (Y_{n+1}, \ldots, Y_N)'$, such that $Y^* = (Y^{*\prime}_S, Y^{*\prime}_R)'$. Inference is solely based on linear models of the form

$$Y^* = X^* \beta^* + E^* \qquad (2.1)$$

where $X^*$ is a known non-stochastic matrix, $\beta^*$ is a $p$-dimensional vector of super-population parameters and $E^*$ is an $N$-dimensional vector of random errors governed by the probability model under which $E_\xi(E^*) = 0$, where $E_\xi$ denotes expectation with respect to the superpopulation. Although the super-population parameters appear in the model, they are not of primary interest. Instead, the parameters of interest are linear combinations $\beta = Gy$ of a realization of $Y^*$. The population mean and the population total are typical examples of $\beta$. Assuming that $Y^*_S$ is realized, the estimator of $\beta$ is based on the predictor of $Y^*_R$ or some functions of it, in such a way that it satisfies some optimality criteria (see Royall (1976) or Bolfarine and Zacks (1992), for example). More specifically, Valliant et al. (2000, pp. 29-30) point out that the target parameters may be written as $\beta = \beta_S + \beta_R$, where $\beta_S$ denotes the part of the linear combination observed in the sample and $\beta_R$ denotes the part associated with the non-sampled units. After selecting the sample, the problem of estimating $\beta$ is equivalent to predicting $\beta_R$ and the best linear unbiased estimate (BLUE) of $\beta$ is obtained by adding the optimal predictor of $\beta_R$ to $\beta_S$. The prediction process relies on the probability model for the super-population and does not necessarily depend on the physical process used to select the sample.

3. THE EXPANDED MODEL

Our main objective is to express an expanded set of random variables induced by the design-based approach in the form of model (2.1). We show that this model allows the construction of estimators of linear combinations of the corresponding random variables based on the same optimality criteria considered under the prediction approach. Some of these linear combinations correspond to population parameters, while others are random variables. We restrict ourselves to the case where the sample is selected by simple random sampling without replacement. We first describe the typical design-based random permutation model, and then introduce the expanded model. An advantage of the expanded model is the ability to identify a parameter associated with a labeled unit.

Assuming simple random without replacement sampling, the typical random permutation probability model assigns equal probability to all permutations of the finite population units. We index each unit's position in the permutation by $i = 1, \ldots, N$. The value in position $i$ for a randomly selected permutation is defined by the realization of the random variable $Y_i = \sum_{s=1}^{N} U_{is} y_s$ where $U_{is} = 1$ if unit $s$ is in position $i$ and $U_{is} = 0$ otherwise. The random vector $Y^* = (Y_1\ Y_2\ \cdots\ Y_N)'$ is the random permutation super-population (Cassel et al., 1977), and the random variables $Y_i$, $i = 1, \ldots, n$, correspond to a sample. This representation of random variables does not allow units to be identified and hence does not permit inference about unit parameters.

The expanded model is based on representing the random variables in the sum $\sum_{s=1}^{N} U_{is} y_s$ as individual random variables of the form $Y_{is} = U_{is} y_s$, which we summarize in an $N^2 \times 1$ vector $Y = (Y_1'\ Y_2'\ \cdots\ Y_N')'$ where $Y_s = (Y_{1s}\ Y_{2s}\ \cdots\ Y_{Ns})'$. The vector of random variables can be defined compactly as $Y = (D_y \otimes I_N)\,\mathrm{vec}(U)$, where $\otimes$ denotes the Kronecker product (Searle, 1982), $D_y$ is a diagonal matrix with the elements of $y$ along the main diagonal, $\mathrm{vec}(U)$ is a vector representing the column expansion of $U$, and

$$U = \begin{pmatrix} U_{11} & U_{12} & \cdots & U_{1N} \\ U_{21} & U_{22} & \cdots & U_{2N} \\ \vdots & \vdots & & \vdots \\ U_{N1} & U_{N2} & \cdots & U_{NN} \end{pmatrix}.$$

Given the random structure of $U$, the expected value and the variance of the expanded random vector are respectively given by

$$E(Y) = Xy \qquad (3.1)$$

and

$$\mathrm{var}(Y) = \Delta \otimes P_N \qquad (3.2)$$

where $X = I_N \otimes \frac{1}{N} 1_N$, $P_a = I_a - \frac{1}{a} J_a$ with $J_a = 1_a 1_a'$, and

$$\Delta = \frac{1}{N-1} D_y P_N D_y. \qquad (3.3)$$
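The moments in (3.1) and (3.2) can be checked numerically for a small population. The sketch below is only an illustration under stated assumptions: it enumerates every permutation of a hypothetical population, builds the expanded vector with the unit label as the outer index and the position as the inner index, and compares the exact mean and covariance with $Xy$ and with the Kronecker product of $D_y P_N D_y/(N-1)$ and $P_N$. The function names (`expanded_vector`, `moments`, `model_moments`) and the values in `y` are hypothetical and do not come from the paper.

```python
from fractions import Fraction
from itertools import permutations


def expanded_vector(perm, y):
    """Expanded vector with entries U_is * y_s (unit s outer, position i inner).

    perm[i] is the (zero-based) label of the unit occupying position i.
    """
    N = len(y)
    return [y[s] if perm[i] == s else 0 for s in range(N) for i in range(N)]


def moments(y):
    """Exact mean and covariance of the expanded vector over all N! permutations."""
    N = len(y)
    draws = [expanded_vector(p, y) for p in permutations(range(N))]
    M, K = len(draws), N * N
    mean = [Fraction(sum(d[k] for d in draws), M) for k in range(K)]
    cov = [[Fraction(sum(d[k] * d[l] for d in draws), M) - mean[k] * mean[l]
            for l in range(K)] for k in range(K)]
    return mean, cov


def model_moments(y):
    """Mean X y and covariance (D_y P_N D_y / (N - 1)) kron P_N."""
    N = len(y)
    P = [[Fraction(int(a == b)) - Fraction(1, N) for b in range(N)]
         for a in range(N)]
    delta = [[Fraction(y[s]) * P[s][t] * y[t] / (N - 1) for t in range(N)]
             for s in range(N)]
    mean = [Fraction(y[s], N) for s in range(N) for _ in range(N)]
    cov = [[delta[s][t] * P[i][j] for t in range(N) for j in range(N)]
           for s in range(N) for i in range(N)]
    return mean, cov


y = [2, 5, 11]  # hypothetical unit parameters
print(moments(y) == model_moments(y))
```

Exact rational arithmetic is used so that the comparison is an equality check rather than a tolerance check; the enumeration is only practical for very small $N$.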

The selection of a simple random sample of size $n$ from the population will result in the realization of $nN$ of the expanded random variables in the vector $Y$. We gather these random variables for the sample in the vector $Y_S = \left( \bigoplus_{s=1}^{N} (I_n\ \ 0_{n \times (N-n)}) \right) Y$ by rearranging the elements in the vector $Y$; similarly, the remaining $(N-n)N$ random variables are defined by the vector $Y_R = \left( \bigoplus_{s=1}^{N} (0_{(N-n) \times n}\ \ I_{N-n}) \right) Y$, where $\bigoplus_{s=1}^{N} A_s$ denotes a block diagonal matrix, with blocks given by $A_s$ (Searle, 1982). The variance of the rearranged expanded random vector is partitioned as

$$\mathrm{Var}\begin{pmatrix} Y_S \\ Y_R \end{pmatrix} = \begin{pmatrix} V_S & V_{SR} \\ V_{RS} & V_R \end{pmatrix}$$

where $V_S = \Delta \otimes \left( I_n - \frac{1}{N} J_n \right)$ and $V_{SR} = -\Delta \otimes \frac{1}{N} J_{n \times (N-n)}$, with $J_{n \times (N-n)} = 1_n 1_{N-n}'$.

As an illustration, consider a finite population with $N = 4$ units from which we select a simple random sample without replacement of size $n = 2$. Letting $y = (y_1\ y_2\ y_3\ y_4)'$ it follows that

$Y = (y_1 (U_{11}\ U_{21}\ U_{31}\ U_{41})\ \ y_2 (U_{12}\ U_{22}\ U_{32}\ U_{42})\ \ y_3 (U_{13}\ U_{23}\ U_{33}\ U_{43})\ \ y_4 (U_{14}\ U_{24}\ U_{34}\ U_{44}))'$,

$Y_S = (y_1 (U_{11}\ U_{21})\ \ y_2 (U_{12}\ U_{22})\ \ y_3 (U_{13}\ U_{23})\ \ y_4 (U_{14}\ U_{24}))'$ and

$Y_R = (y_1 (U_{31}\ U_{41})\ \ y_2 (U_{32}\ U_{42})\ \ y_3 (U_{33}\ U_{43})\ \ y_4 (U_{34}\ U_{44}))'$.

Supposing that the first and second selected units in a sample are units $s = 3$ and $s = 1$, respectively, the realized value of $Y_S$ is $(0\ y_1\ 0\ 0\ y_3\ 0\ 0\ 0)'$.

4. ESTIMATION

A characteristic of the proposed model is that the vector of parameters $y$ may be defined as linear combinations $LY$ of the expanded random variables. For example, setting

$$L = I_N \otimes 1_N', \qquad (4.1)$$

$LY = y$, while the value for unit $s$ in the population, $y_s$, is defined by setting

$$L = e_s' \otimes 1_N'. \qquad (4.2)$$

The population mean $\mu$ is defined by setting

$$L = \frac{1}{N} 1_{N^2}'. \qquad (4.3)$$

More generally, we can define other linear combinations of $Y$ which are stochastic. For example, a random variable corresponding to the value that will appear in the $i$-th position in a permutation is defined by setting

$$L = 1_N' \otimes e_i'. \qquad (4.4)$$

In general, for linear combinations defined in terms of the expanded random variables, we can discuss estimating a parameter or predicting a random variable. The specification of $L$ is necessary to determine whether $LY$ is fixed or random. We can encompass both the estimation and the prediction problems in the same framework. As previously noted, for simplicity we use the term estimation in reference to general linear combinations of random variables.

It is not necessary to use the expanded random variables to develop estimators for all linear combinations of $Y$. To see this, we evaluate the linear combination using the expansion given by

$$Y = \left( \frac{1}{N} 1_N \otimes I_N \right) Y^* + (P_N \otimes I_N)\, Y. \qquad (4.5)$$

For example, using (4.3), the linear combination defining the population mean simplifies to $LY = \frac{1}{N} 1_N' Y^*$. Similarly, using (4.4), the linear combination defining the random variable corresponding to the value that will appear in the $i$-th position in a permutation simplifies to $LY = e_i' Y^*$. For the linear combinations defined by (4.3) and (4.4), the optimal estimator can be developed by solely considering the random variables $Y^*$ since the first and second terms in (4.5) are orthogonal, and the second term has expected value equal to zero (Rao and Bellhouse (1978), Theorem 1.1).

Using the prediction approach, we develop the solution to the problem of estimating $LY$ based on a sample. First, we partition $LY$ into a sample component, $L_S Y_S$, and a remaining component, $L_R Y_R$. We require the predictors of $L_R Y_R$ to be linear in the sample, and represent them by $\tilde{L}_R Y_S$. Defining $C = L_S + \tilde{L}_R$, the class of estimators of $LY$ is given by $\{ C Y_S : C \text{ is a } p \times nN \text{ matrix of constants} \}$. We require the estimators to be unbiased (such that $E(CY_S - LY) = 0$), and have minimum generalized mean squared error given by

$$\mathrm{GMSE} = 1_p'\, \mathrm{Var}(CY_S - LY)\, 1_p \qquad (4.6)$$

(Bolfarine and Zacks, 1992). Using (3.1), we may write

$$E(CY_S) = C X_S y, \quad \text{where } X_S = I_N \otimes \frac{1}{N} 1_n, \qquad (4.7)$$

so that the unbiased condition reduces to $C X_S y = L X y$ for all $y$, or equivalently

$$C X_S = L X. \qquad (4.8)$$

We solve (4.8) for $C$ in terms of an arbitrary matrix, and then minimize the GMSE with respect to that matrix. When $LY$ is non-stochastic, the result is given by

$$\hat{C} = L \left( I_N \otimes \frac{1}{n} J_{N \times n} \right) + P_p\, T\, (I_N \otimes P_n) \qquad (4.9)$$

where $T$ is an arbitrary $p \times nN$ matrix resulting from use of generalized inverses to obtain the solution (see Appendix A).

Solutions to the problem of estimating a linear function $LY^*$ are developed in a similar manner. We briefly outline the solution that was first given by Royall (1976). First, note that $E(Y^*) = X\mu$, where $X = 1_N$, and $\mathrm{Var}(Y^*) = \sigma^2 P_N$, where $\sigma^2 = \frac{1}{N-1} \sum_{s=1}^{N} (y_s - \mu)^2$. Partitioning $Y^*$ into the sample, $Y^*_S = (Y_1, \ldots, Y_n)'$, and the remainder, $Y^*_R = (Y_{n+1}, \ldots, Y_N)'$, results in

$$\mathrm{Var}\begin{pmatrix} Y^*_S \\ Y^*_R \end{pmatrix} = \begin{pmatrix} V_S & V_{SR} \\ V_{RS} & V_R \end{pmatrix}$$

where $V_S = \sigma^2 \left( I_n - \frac{1}{N} J_n \right)$ and $V_{SR} = -\frac{\sigma^2}{N} J_{n \times (N-n)}$. We partition $X$ and $L$ in a similar manner resulting in $X_S = 1_n$, $X_R = 1_{N-n}$ and $LY^* = L_S Y^*_S + L_R Y^*_R$. We require the predictor of $L_R Y^*_R$ to be a linear function of the sample, $\tilde{L}_R Y^*_S$, to be unbiased, i.e. to satisfy $E(\tilde{L}_R Y^*_S) = E(L_R Y^*_R)$, and to have minimum GMSE (given by $1_p'\, \mathrm{Var}(CY^*_S - LY^*)\, 1_p$, where $C = L_S + \tilde{L}_R$). The resulting estimator is

$$\hat{C} Y^*_S = L_S Y^*_S + L_R \left[ X_R \hat{\alpha} + V_{RS} V_S^{-1} (Y^*_S - X_S \hat{\alpha}) \right] \qquad (4.10)$$

where $\hat{\alpha} = (X_S' V_S^{-1} X_S)^{-1} X_S' V_S^{-1} Y^*_S$.

4.1 Estimating $y_s$

We obtain the estimator of $LY$ with $L$ defined by (4.2) corresponding to a particular value $y_s$ associated with the unit labeled $s$. Since $p = 1$, $P_p = 0$ and (4.9) simplifies to

$$\hat{C} = \left( e_s' \otimes \frac{N}{n} 1_n' \right). \qquad (4.11)$$

This corresponds to $\frac{N}{n} y_s$ when unit $s$ is included in the sample, and zero otherwise, a Horvitz-Thompson type estimator of the unit's value. For such an estimator, $\mathrm{GMSE} = \frac{N-n}{n} y_s^2$.

4.2 Estimating $y$

We next develop simultaneous unbiased estimators of all the individual parameters, $y_s$, in a finite population. These parameters are defined by setting $L$ equal to (4.1). Since $p = N$, the solution given by (4.9) simplifies to

$$\hat{C} = \left( I_N \otimes \frac{N}{n} 1_n' \right) + P_N\, T\, (I_N \otimes P_n) \qquad (4.12)$$

where $T$ is an arbitrary $N \times nN$ matrix. In general, the second term in (4.12) is not zero, and hence there are multiple solutions, each of which has $\mathrm{GMSE} = \frac{N(N-n)}{n} \sigma^2$. Unique estimators can be obtained by imposing restrictions on the structure of the coefficients, $C$. For example, if we assume that $C = I_N \otimes v'$, where $v$ is an $n \times 1$ vector of unknown constants, following the same strategy, we may show that the unique estimator of $y$ is

$$\hat{C} Y_S = \left( I_N \otimes \frac{N}{n} 1_n' \right) Y_S. \qquad (4.13)$$

This restriction forces the coefficients to be the same for different parameters, but allows the coefficients to differ with position. However, not all structures for $C$ lead to unbiased estimators. For example, there are no solutions for $C = J_N \otimes v'$.
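As a minimal numerical check of the unit estimator described in Section 4.1, the sketch below enumerates all equally likely ordered samples from a small hypothetical population. It assumes the estimator takes the value $(N/n)\,y_s$ when unit $s$ is sampled and zero otherwise, and it computes the exact mean, the exact mean squared error per unit, and the mean squared error of the sum of the estimates. The function names and the data are hypothetical illustrations, not part of the paper.

```python
from fractions import Fraction
from itertools import permutations


def ht_unit_estimates(sample, y, n):
    """(N/n) * y_s for units in the sample, zero for all other units."""
    N = len(y)
    return [Fraction(N, n) * y[s] if s in sample else Fraction(0)
            for s in range(N)]


def exact_moments(y, n):
    """Exact mean and squared-error summaries over all ordered samples of size n."""
    N = len(y)
    samples = list(permutations(range(N), n))  # equally likely under SRS
    M = len(samples)
    est = [ht_unit_estimates(smp, y, n) for smp in samples]
    mean = [sum(e[s] for e in est) / M for s in range(N)]
    mse = [sum((e[s] - y[s]) ** 2 for e in est) / M for s in range(N)]
    # Squared error of the sum of all unit estimates (error summed over units).
    total_mse = sum((sum(e) - sum(y)) ** 2 for e in est) / M
    return mean, mse, total_mse


y = [3, 7, 10, 20]  # hypothetical unit parameters
N, n = len(y), 2
mean, mse, total_mse = exact_moments(y, n)

mu = Fraction(sum(y), N)
sigma2 = sum((v - mu) ** 2 for v in y) / (N - 1)

print(mean == y)                                    # unbiased for every unit
print(mse == [Fraction(N - n, n) * v ** 2 for v in y])
print(total_mse == Fraction(N * (N - n), n) * sigma2)
```

The last comparison uses the summed-error form of the generalized mean squared error, which is an assumption about how the criterion aggregates over the $N$ unit estimates.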

A more general class of estimators can be considered if we replace the requirement of unit unbiasedness by average unbiasedness, $1_N' E(CY_S - LY) = 0$. With this requirement, and proceeding in a manner similar to that used to obtain (4.12), the estimator of $y$ simplifies to

$$\hat{C} Y_S = \left( J_N \otimes \frac{1}{n} 1_n' \right) Y_S + P_N\, T\, Y_S. \qquad (4.14)$$

This estimator is not unique since $T$ is arbitrary. If $C = I_N \otimes v'$, a unique solution results and is given by (4.13). If $C$ is restricted to be of the form $C = J_N \otimes v'$, it follows that the unique solution is $\hat{C} Y_S = 1_N \bar{y}$, the sample mean for each element.

We illustrate these results via a simple example. Let us assume that $N = 4$, $n = 2$, $y = (y_1\ y_2\ y_3\ y_4)'$ and that the realized value of $Y_S$ is $(0\ y_1\ 0\ 0\ y_3\ 0\ 0\ 0)'$, i.e., the third unit was selected in the first position and the first unit was selected in the second position in the sample. If we require the estimator (4.12) to be linear and unbiased, only the estimator for the unselected units (and also for units for which $y_s = 0$) is unique, and equal to zero. The estimate for unit $s = 1$ is $a y_1$, while the estimate for unit $s = 3$ is $c y_3$ with $a$ and $c$ denoting functions of elements in the arbitrary matrix, $T$. If we require the estimator to be linear and unbiased, and restrict the coefficients to be of the form $C = I_N \otimes v'$, then the unique estimates for units $s = 1$ and $s = 3$ are given by the Horvitz-Thompson type estimate, $\frac{4}{2} y_s$. The estimates for units $s = 2$ and $s = 4$ are zero. Using the average unbiased constraint, and requiring estimators to be linear in the sample with coefficients of the form $C = J_N \otimes v'$, the unique estimate for all units is given by the sample mean, $\bar{y}$.
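The example with four units and a sample of two can be traced with explicit coefficient matrices. The sketch below is a hedged illustration: it assumes the two restricted solutions have coefficients $I_N \otimes (N/n)\,1_n'$ and $J_N \otimes (1/n)\,1_n'$, builds them with a small hand-written Kronecker product, and applies them to the realized sample vector. The helper names (`kron`, `matvec`) and the numeric values of `y` are hypothetical.

```python
from fractions import Fraction


def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def matvec(A, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


N, n = 4, 2
y = [3, 7, 10, 20]   # hypothetical values for y_1, ..., y_4
positions = [2, 0]   # zero-based labels: unit 3 in position 1, unit 1 in position 2

# Realized sample part of the expanded vector: entries U_is * y_s for i = 1..n,
# ordered with the unit label as the outer index.
Y_S = [y[s] if positions[i] == s else 0 for s in range(N) for i in range(n)]

I_N = [[int(a == b) for b in range(N)] for a in range(N)]
J_N = [[1] * N for _ in range(N)]

C_ht = kron(I_N, [[Fraction(N, n)] * n])    # I_N kron (N/n) 1_n'
C_mean = kron(J_N, [[Fraction(1, n)] * n])  # J_N kron (1/n) 1_n'

ht_estimates = matvec(C_ht, Y_S)
mean_estimates = matvec(C_mean, Y_S)

print(Y_S)             # [0, 3, 0, 0, 10, 0, 0, 0]
print(ht_estimates)    # values 6, 0, 20, 0 (as Fractions)
print(mean_estimates)  # every entry equals 13/2
```

With these values the first restriction returns twice the observed value for the two sampled units and zero elsewhere, while the second returns the sample mean for every unit.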

4.3 Estimating $\mu$ and predicting random variable(s) in the $i$-th position in a permutation based on $Y^*$

The linear combination $LY$ with $L$ given by (4.3) defines the population mean; setting $L$ equal to (4.4) defines the random variable that will appear in the $i$-th position in a permutation. Using (4.5), both linear combinations are equal to linear functions $LY^*$ with $L = \frac{1}{N} 1_N'$, and $L = e_i'$, respectively. Using the coefficients that define the population mean, and noting that $V_S^{-1} = \frac{1}{\sigma^2} \left( I_n + \frac{1}{N-n} J_n \right)$ and $\hat{\alpha} = \bar{y}$, the estimator (4.10) simplifies to $\bar{y}$, the sample mean.

We partition $e_i' = (e_{iS}'\ e_{iR}')$, where $e_i$ is a vector of dimension $N$, to predict the random variable in the $i$-th position in a permutation. When $i \le n$, $\hat{C} Y^*_S = e_{iS}' Y^*_S$, which will correspond to the value of the unit that is in the $i$-th position in a realized permutation, i.e. $\sum_{s=1}^{N} u_{is} y_s$ (where $u_{is}$ represents the realized value of $U_{is}$). When $i > n$, $LY^* = L_R Y^*_R$, and $\hat{C} Y^*_S = e_{iR}' \left[ X_R \hat{\alpha} + V_{RS} V_S^{-1} (Y^*_S - X_S \hat{\alpha}) \right]$, which simplifies to $\bar{y}$. The GMSE of the predictor is zero when $i \le n$, and equal to $\sigma^2 \left( 1 + \frac{1}{n} \right)$ when $i > n$.

Simultaneous predictors of the units realized in all positions are defined by setting $L = I_N$ and result in the same predictors as those obtained for the individual positions. The predictors correspond to the realized unit's values when $i \le n$, and to the sample mean when $i > n$. For the vector of predictors, $\mathrm{GMSE} = \frac{N(N-n)}{n} \sigma^2$, which is equal to the GMSE of the estimator of $y$ in Section 4.2.
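The position predictors can also be examined by brute force. The following sketch, offered only as an illustration, enumerates all permutations of a small hypothetical population, predicts the value in each position by the realized value for sampled positions and by the sample mean for the remaining positions, and computes the exact mean squared prediction error per position together with the squared error of the summed prediction errors. The function names and data values are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def position_predictors(perm, y, n):
    """Realized values for positions 1..n, sample mean for positions n+1..N."""
    N = len(y)
    sample_vals = [Fraction(y[perm[i]]) for i in range(n)]
    ybar = sum(sample_vals) / n
    return sample_vals + [ybar] * (N - n)


def gmse_by_position(y, n):
    """Exact mean squared prediction error per position and for the summed error."""
    N = len(y)
    perms = list(permutations(range(N)))
    sq = [Fraction(0)] * N
    total = Fraction(0)
    for p in perms:
        pred = position_predictors(p, y, n)
        err = [pred[i] - y[p[i]] for i in range(N)]
        for i in range(N):
            sq[i] += err[i] ** 2
        total += sum(err) ** 2
    M = len(perms)
    return [v / M for v in sq], total / M


y = [3, 7, 10, 20]  # hypothetical unit parameters
N, n = len(y), 2
per_position, total = gmse_by_position(y, n)

mu = Fraction(sum(y), N)
sigma2 = sum((v - mu) ** 2 for v in y) / (N - 1)

print(per_position[:n] == [0] * n)
print(per_position[n:] == [sigma2 * (1 + Fraction(1, n))] * (N - n))
print(total == Fraction(N * (N - n), n) * sigma2)
```

Because the summed prediction error equals $N(\bar{y} - \mu)$, the last quantity coincides with the variance of $N\bar{y}$, which is why it matches the value obtained for the unit estimators.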

5. DISCUSSION

Design based and model based methods are usually discussed as separate approaches for estimation and inference in finite population sampling. We have presented an expanded probability model induced by the possible physical process of simple random sampling. Since no super-population model is required and the probability model arises solely from sampling, we consider the resulting estimators to be design-based. No additional assumptions or concepts are required for estimation, which is accomplished by developing predictors of linear functions of the unobserved random variables. Linear functions of the expanded probability model lead to a set of random variables referred to by others as a simple random permutation super-population model (Cassel, Särndal and Wretman (1977)). Thus, the expanded model encompasses both design and model based frameworks. Although we feel that the expanded model unifies aspects of survey sampling methodology for simple random sampling, it has not yet been extended to the broad class of super-population models, including the more general random permutation superpopulation models.

Others have investigated a random permutation model in the context of a superpopulation framework, and concluded that the sample mean is the uniform minimum variance unbiased estimator of the population mean (Rao and Bellhouse, 1978). In such a framework, the likelihood is uninformative for unit parameters, and estimation has focused on the mean. Inclusion probabilities for labeled units, as opposed to the basic indicator random variables underlying unit selection, are used. Although the indicator random variables used to define the expanded probability model are not new (see, for example, Neyman (1934, 1935), and Kempthorne (1952)), their use in developing estimators of unit parameters appears to be novel.

The expanded model extends the typical permutation model to a broader set of random variables, but falls short of the very general set of random variables envisioned by Godambe (1955), which spans a space of higher dimension. The random variables in a typical permutation model span an $N - 1$ dimensional space. The random variables in the expanded model span an $(N - 1)^2$ dimensional space. Higher dimensional random variables may be postulated intermediate to Godambe's general model that may lead to new insights.

Our motivation in developing the expanded permutation model was to improve our understanding of realized random effects in the context of a mixed model. In a mixed model, a realized random effect is commonly defined as the difference between the parameter for a realized unit and the mean of a population. With this definition, the expected value of a random effect is zero. To simplify the discussion, we define a realized random effect as the parameter for a labeled unit that is realized at a particular position in a permutation. Our definition is a reparameterization of the definition commonly used for mixed models.

If a unit is included in a simple random sample, the realized random effect is simply the parameter for that unit. The value of the parameter (which is observed) is the best linear unbiased predictor. Since the predictor is the parameter for the realized unit, may we interpret the predictor as a predictor of the parameter for a specified unit? The expanded model provides the answer to this question since we can predict a random effect and a specified unit as separate linear combinations of random variables in the same model. The linear combinations that define these two quantities differ, as do their estimators. A clearer statement of the interpretation for what is commonly referred to as the predictor of a realized random effect is the predictor of a position in a permutation. In fact, since the expected value of the random variable at a position is the population mean, the predictor of this parameter will almost never equal the parameter being predicted. In an analogous manner, the predictor of a realized random effect in a simple mixed model will carry the interpretation as the predictor of the expected value of units that can occur at a position in a permutation.

Similar results based on an expanded model for cluster sampling, while outside the scope of this paper, have been developed for equal size clusters both with and without response error (Stanek and Singer, 2002a) and in an unbalanced setting (Stanek and Singer, 2002b). The expanded framework is particularly important to retain the nesting of secondary sampling units in primary sampling units in an unbalanced two stage sampling context. Such results share the basic awkward interpretation as predictors of positions, not identified units, as illustrated in the expanded simple random sampling model.

While the results presented here are for simple random sampling, extensions to many other sample settings appear to be feasible. Such extensions include adding measurement error to simple random sampling, stratified sampling, and unbalanced cluster sampling settings. Strategies that account for covariates have been initially addressed in dissertations by Lencina (2002) and Li (2002). Extensions also appear feasible for experimental studies.

There are also limitations. The two stage sample results are limited by the current lack of an optimal strategy for variance component estimation. Strategies for handling a continuous covariate are not yet developed and may not be feasible. Extensions to unequal probability sampling may be possible but have not yet been developed.

From a different perspective, in the expanded model, linear combinations of random variables that correspond to unit parameters can be defined, and have a clear interpretation. The unbiased estimator of a unit's parameter (which corresponds to the Horvitz-Thompson estimator when the unit is included in the sample, or zero otherwise) suffers from the criticism of Basu's (1971) elephant example. The estimator is not intuitive, although it clearly satisfies the constraint for unbiasedness.

Many practitioners have used the predictor of a position in a permutation as an estimate of the parameter for a unit in the population. Such an estimator corresponds to the value for the unit if it is included in the sample or to the sample mean if it is not in the sample and may be written as

$$\hat{y}_s = I\{s \in S\} \sum_{i=1}^{n} Y_{is} + I\{s \notin S\} \sum_{s^*=1}^{N} \sum_{i=1}^{n} Y_{is^*} / n$$

where $I\{\cdot\}$ denotes an indicator function and $S$ is the set of labels of the sampled units. This ad hoc estimator may be expressed in terms of the elements of the expanded random vector $Y$ but not in terms of the collapsed random variables $Y_i$. However, it is a non-linear function of $Y$, suggesting that beyond the need of keeping track of both labels and values attached to the units in the population for which we want to draw inference, a broader class of estimators is needed to obtain such a result. One way the nonlinearity can be avoided is to define an extended set of random variables, beyond those proposed in this paper. Current research is underway to investigate such expanded sets, and use them to develop linear predictors of specific units.
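To make the behaviour of the ad hoc estimator concrete, the sketch below computes its exact expectation for one labeled unit by enumerating all equally likely ordered samples. It is an illustration under assumptions: the estimator is taken to be the unit's own value when the unit is sampled and the sample mean otherwise, and the function names and data are hypothetical. The closed form used for comparison follows from the fact that, given the unit is not sampled, the sample mean has expectation equal to the mean of the other $N - 1$ units.

```python
from fractions import Fraction
from itertools import permutations


def ad_hoc_estimate(sample, y, s):
    """Value of unit s if it is in the sample, otherwise the sample mean."""
    if s in sample:
        return Fraction(y[s])
    return Fraction(sum(y[u] for u in sample), len(sample))


def expected_ad_hoc(y, n, s):
    """Exact expectation over all equally likely ordered samples of size n."""
    N = len(y)
    samples = list(permutations(range(N), n))
    return sum(ad_hoc_estimate(smp, y, s) for smp in samples) / len(samples)


def closed_form(y, n, s):
    """(n/N) y_s + (1 - n/N) * (mean of the other N - 1 units)."""
    N = len(y)
    others_mean = Fraction(sum(y) - y[s], N - 1)
    return Fraction(n, N) * y[s] + (1 - Fraction(n, N)) * others_mean


y = [3, 7, 10, 20]  # hypothetical unit parameters
n = 2
for s in range(len(y)):
    print(s + 1, y[s], expected_ad_hoc(y, n, s), closed_form(y, n, s))
```

For these hypothetical values the expectation for a unit differs from the unit's parameter unless the parameter happens to equal the mean of the other units, which illustrates why this estimator does not satisfy the unit unbiasedness constraint used in Section 4.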

Acknowledgements

The authors are grateful to the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), FINEP (PRONEX), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Brazil and to the National Institutes of Health (NIH-PHS-R01-HD36848), USA, for financial support. The authors also wish to thank Dalton Andrade, Heleno Bolfarine, John Buonaccorsi, and Oscar Loureiro for helpful comments that led to improvements in the manuscript. The authors also gratefully acknowledge helpful comments by referees that have led to improvements in the paper.

22 REFERECE Basu, D. (97). A essay o the logical foudatios of survey samplig, Part. I: V.P. Godambe ad D.A. prott, Eds., Foudatios of tatistical Iferece, Holt, Riehart ad Wisto, Toroto, Bellhouse, D.R. ad Rao, J..K. (986). O the efficiecy of predictio estimators i two-stage samplig, J. tat. Pla. Iferece, 3, Bolfarie, H. ad Zacks,. (992). Predictio Theory for Fiite Populatios. priger-verlag, ew York. Brewer, K.R.W., Haif, M., ad Tam,.M. (988). How early ca model-based predictio ad desig-based estimatio be recociled?, J. Am. tat. Assoc. 83, Brewer, K.R.W. (999). Desig-based or predictio-based iferece? tratified radom vs stratified balaced samplig, It. tat. Rev., 67, Brewer, K.R.W. (2002). Combied urvey amplig Iferece. Oxford Uiversity Press, ew York. Cassel, C.M., ärdal, C.E. ad Wretma, J.H. (977). Foudatios of Iferece i urvey amplig. Wiley, ew York. Fuller, W. A. ad Battese, G.E. (973). Trasformatios for estimatio of liear models with ester-error structure. Am. tat. Assoc. 68, Godambe, V.P. (955). A Uified Theory of amplig from Fiite Populatios. J. Roy. tatist. oc. er. B 7,

23 Godambe, V.P. ad Joshi, V.M. (965). Admissibility ad Bayes estimatio i samplig from fiite populatios:i. A.Math.tatist. 26, Graybill, F.A. (983) Matrices with Applicatios i tatistics. Wadsworth Iteratioal Group, Belmot, Calif. Hartley, H.O. ad Rao, J..K. (968). A ew Estimatio Theory for ample urveys. Biometrika 55, Hartley, H.O. ad Rao, J..K. (969). A ew Estimatio Theory for ample urveys, II. I ew Developmets i urvey amplig, Godambe ad prott (Eds), ew York: Wiley Itersciece, Hedayat,A. ad iha, B.K. (99). Desig ad Iferece i Fiite Populatio amplig. Wiley, ew York. Horvitz, D.G. ad Thompso, D.J. (952). A geeralizatio of samplig without replacemet from fiite populatio. J. Am. tat. Assoc. 47, Kempthore, O. (952). Desig ad Aalysis of Experimets. Wiley, ew York. Lecia, V.B. (2002). Modelos de Efeitos Aleatórios E Populações Fiitas, Ph.D. dissertatio i the Departmet of tatistics, Uiversity of ao Paulo, ao Paulo, Brazil. Li, W. (2002). Use of Radom Permutatio Models i Rate Estimatio ad tadardizatio. Ph.D dissertatio i the Departmet of Biostatistics ad Epidemiology, Uiversity of Massachusetts, Amherst, Massachusetts. 2

24 Mukhopadhyay, P. (984). Optimum estimatio of a fiite populatio variace uder geeralized radom permutatio models. Calcutta tat. Assoc. Bull., 33, Mukhopadhyay, P. (200). Topics i urvey amplig, priger Lecture otes i tatistics 53, ew York. eyma, J. (934). O the two differet aspects of the represetatio method: the method of stratified samplig ad the method of purposive selectio. J. Roy. tatisti. oc. 97, eyma, J.K. Iwaszkiewicz, et al. (935). tatistical problems i agricultural experimetatio. Roy. tatisti. oc. 2 (upplemet) Padmawar, V.R. ad Mukhopadhyay, P. (985). Estimatio uder two-stage radom permutatio models. Metrika 25, Rao, J..K. ad Bellhouse, D.R. (978). Optimal estimatio of a fiite populatio mea uder geeralized radom permutatio models. J. tat. Pla. Iferece 2, Rao, J..K. (997). Developmets i ample urvey Theory: A Appraisal. Ca. J. tat. 25, - 2. Rao, J..K. (999a). ome Curret Treds i ample urvey Theory ad Methods. akhyã B, 6, -25. Rao, J..K. (999b). ome recet advaces i model-based small area estimatio, urv.meth. 25, Rao, T.J. (984). ome aspects of radom permutatio models i fiite populatio samplig theory, Metrika, 3,

Royall, R.M. (1969). An Old Approach to Finite Population Sampling Theory. J. Am. Stat. Assoc. 63.

Royall, R.M. (1976). The Linear Least-Squares Prediction Approach to Two-Stage Sampling. J. Am. Stat. Assoc. 71.

Särndal, C-E., Swensson, B., and Wretman, J. (1992). Model Assisted Survey Sampling. Springer-Verlag, New York.

Searle, S.R. (1982). Matrix Algebra Useful for Statistics. John Wiley, New York.

Scott, A. and Smith, M.F. (1969). Estimation in Multi-Stage Surveys. J. Am. Stat. Assoc. 64.

Stanek, E.J. III and Singer, J.M. (2002). Predicting Realized Random Effects with Clustered Samples from Finite Populations with Response Error. Unpublished.

Stanek, E.J. III and Singer, J.M. (2002). Predicting Realized Cluster Parameters from Two Stage Sample of Unequal Size Clustered Populations. Unpublished.

Thompson, M.E. (1997). Theory of Sample Surveys. Chapman and Hall, London.

Valliant, R., Dorfman, A.H. and Royall, R.M. (2000). Finite Population Sampling and Inference: A Prediction Approach. John Wiley, New York.

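Appendix A: Optimal estimators

We solve (4.8) for $\mathbf{C}$ and then minimize the GMSE with respect to that matrix. First, note that (4.8) can be re-expressed as $\mathbf{X}_I'\mathbf{C}' = \mathbf{X}'\mathbf{L}'$. In general, for fixed matrices $\mathbf{A}$ and $\mathbf{B}$, the set of solutions to $\mathbf{A}\mathbf{W} = \mathbf{B}$ is given by

$$\mathbf{W} = \mathbf{A}^{-}\mathbf{B} + (\mathbf{I} - \mathbf{A}^{-}\mathbf{A})\mathbf{Z},$$

where $\mathbf{A}^{-}$ is a specific g-inverse of $\mathbf{A}$ and $\mathbf{Z}$ is an arbitrary matrix (as defined by Graybill (1983)). We make repeated use of this result in obtaining the solution. Setting $\mathbf{A}^{-} = n^{-1}\mathbf{X}_I$, all solutions that satisfy the constraint for unbiasedness are given by

$$\mathbf{C}' = \left(\mathbf{I}_N \otimes \frac{\mathbf{J}_{n \times N}}{n}\right)\mathbf{L}' + (\mathbf{I}_N \otimes \mathbf{P}_n)\mathbf{Z}, \tag{A.1}$$

where $\mathbf{Z}$ is an arbitrary $nN \times p$ matrix. When $\mathbf{L}\mathbf{Y}$ is non-stochastic (as in (4.1), (4.2) and (4.3)), the GMSE in (4.6) simplifies to

$$\mathrm{GMSE} = \operatorname{Var}(\mathbf{1}_p'\mathbf{C}\mathbf{Y}_I) = \mathbf{1}_p'\mathbf{C}\mathbf{V}_I\mathbf{C}'\mathbf{1}_p,$$

which is a function of $\mathbf{Z}$. Defining

$$\mathbf{c} = \mathbf{Z}\mathbf{1}_p \tag{A.2}$$

and $\mathbf{a} = \left(\mathbf{I}_N \otimes \dfrac{\mathbf{J}_{n \times N}}{n}\right)\mathbf{L}'\mathbf{1}_p$, the GMSE simplifies to

$$\mathrm{GMSE} = \mathbf{a}'\mathbf{V}_I\mathbf{a} + \mathbf{c}'(\boldsymbol{\Delta} \otimes \mathbf{P}_n)\mathbf{c} + 2\mathbf{c}'(\boldsymbol{\Delta} \otimes \mathbf{P}_n)\mathbf{a}. \tag{A.3}$$

Differentiating (A.3) with respect to $\mathbf{c}$ and setting the resulting derivatives equal to zero yields $(\boldsymbol{\Delta} \otimes \mathbf{P}_n)\hat{\mathbf{c}} = -(\boldsymbol{\Delta} \otimes \mathbf{P}_n)\mathbf{a}$. Since $\mathbf{P}_n$ is orthogonal to $\mathbf{J}_{n \times N}$, $(\boldsymbol{\Delta} \otimes \mathbf{P}_n)\mathbf{a} = \mathbf{0}$ and the solutions are given as

$$\hat{\mathbf{c}} = \left(\mathbf{I}_{nN} - \boldsymbol{\Delta}^{-}\boldsymbol{\Delta} \otimes \mathbf{P}_n^{-}\mathbf{P}_n\right)\mathbf{r}, \tag{A.4}$$

where $\mathbf{r}$ is an arbitrary vector. We replace $\mathbf{c}$ by (A.4) in equation (A.2), and solving for $\hat{\mathbf{Z}}$ results in

$$\hat{\mathbf{Z}}' = (\mathbf{1}_p')^{-}\,\mathbf{r}'\left(\mathbf{I}_{nN} - \boldsymbol{\Delta}\boldsymbol{\Delta}^{-} \otimes \mathbf{P}_n\mathbf{P}_n^{-}\right) + \mathbf{P}_p\mathbf{T}, \tag{A.5}$$

where $(\mathbf{1}_p')^{-} = \frac{1}{p}\mathbf{1}_p$, and $\mathbf{T}$ is an arbitrary matrix. Substituting (A.5) into (A.1) and simplifying,

$$\hat{\mathbf{C}} = \mathbf{L}\left(\mathbf{I}_N \otimes \frac{\mathbf{J}_{N \times n}}{n}\right) + \frac{1}{p}\mathbf{1}_p\mathbf{r}'\left[(\mathbf{I}_N - \boldsymbol{\Delta}\boldsymbol{\Delta}^{-}) \otimes \mathbf{P}_n\right] + \mathbf{P}_p\mathbf{T}(\mathbf{I}_N \otimes \mathbf{P}_n), \tag{A.6}$$

where both $\mathbf{r}$ and $\mathbf{T}$ are arbitrary, and $\boldsymbol{\Delta}^{-}$ is a g-inverse of $\boldsymbol{\Delta}$ in (3.3). To define $\boldsymbol{\Delta}^{-}$, we first let $m$ be the number of values of $y_s$, $s = 1, \ldots, N$, that are nonzero. Furthermore, let $\mathbf{y}^{-}$ represent a vector with elements equal to $1/y_s$ if $y_s \neq 0$, and zero otherwise. Finally, let $\mathbf{i}_y$ represent a vector with elements equal to one if $y_s \neq 0$, and zero otherwise. With such definitions, we define a g-inverse of $\boldsymbol{\Delta}$ as

$$\boldsymbol{\Delta}^{-} = (N-1)\,\mathbf{D}_{y^{-}}\mathbf{D}_{y^{-}} + \frac{N-1}{N-m}\,\mathbf{y}^{-}\mathbf{y}^{-\prime}. \tag{A.7}$$

Pre-multiplying this expression by $\boldsymbol{\Delta}$ yields $\boldsymbol{\Delta}\boldsymbol{\Delta}^{-} = \mathbf{D}_{i_y}$. Substituting this expression into (A.6),

$$\hat{\mathbf{C}} = \mathbf{L}\left(\mathbf{I}_N \otimes \frac{\mathbf{J}_{N \times n}}{n}\right) + \frac{1}{p}\mathbf{1}_p\mathbf{r}'(\mathbf{D}_{y0} \otimes \mathbf{P}_n) + \mathbf{P}_p\mathbf{T}(\mathbf{I}_N \otimes \mathbf{P}_n), \tag{A.8}$$

where $\mathbf{D}_{y0} = \mathbf{I}_N - \mathbf{D}_{i_y}$, a diagonal matrix with diagonal elements equal to zero for diagonal elements with $y_s \neq 0$, and one for diagonal elements with $y_s = 0$.

The general result given by (A.8) can be simplified by noting that for all $\mathbf{Y}$, $\mathbf{r}'(\mathbf{D}_{y0} \otimes \mathbf{P}_n)\mathbf{Y}_I = 0$. Noting that the GMSE will not change with different choices of the arbitrary vector $\mathbf{r}$, eliminating the term that depends on $\mathbf{r}$ will not alter the predictor or the GMSE, and simplifies the result. As a result, optimal estimators can be constructed using

$$\hat{\mathbf{C}} = \mathbf{L}\left(\mathbf{I}_N \otimes \frac{\mathbf{J}_{N \times n}}{n}\right) + \mathbf{P}_p\mathbf{T}(\mathbf{I}_N \otimes \mathbf{P}_n). \tag{A.9}$$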
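The unbiasedness property that this derivation imposes can be checked by brute force on a small population. The following is a minimal sketch in Python, not the authors' code: it enumerates every permutation of a hypothetical four-unit population, treats the first n positions as the simple random sample, and averages (i) the sample mean and (ii) the estimator of each unit's parameter that equals (N/n) y_s when unit s is in the sample and zero otherwise. The population values `y`, the sample size, and the function name `expected_estimates` are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import permutations


def expected_estimates(y, n):
    """Average two estimators over all equally likely permutations.

    y : list of unit parameters (hypothetical values)
    n : sample size; the sample is the first n positions of a permutation
    Returns (expected sample mean, list of expected unit-parameter estimates).
    """
    N = len(y)
    perms = list(permutations(range(N)))
    mean_total = Fraction(0)
    unit_totals = [Fraction(0)] * N
    for perm in perms:
        sample = perm[:n]  # labels of the units occupying the sampled positions
        mean_total += Fraction(sum(y[s] for s in sample), n)
        for s in sample:
            # Horvitz-Thompson-type estimate: y_s over the inclusion probability n/N.
            # Units outside the sample contribute an estimate of zero.
            unit_totals[s] += Fraction(N * y[s], n)
    count = len(perms)
    return mean_total / count, [t / count for t in unit_totals]


y = [3, 0, 7, 10]  # hypothetical unit parameters, one of them zero
exp_mean, exp_units = expected_estimates(y, n=2)
```

Exact rational arithmetic (`fractions.Fraction`) avoids floating-point noise, so under these assumptions `exp_mean` can be compared directly with the population mean and `exp_units` with the unit parameters themselves.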


Assignment 2 Solutions SOLUTION. ϕ 1  = 3 ϕ 1 4i ϕ 2. The other case can be dealt with in a similar way. { ϕ 2 Â} χ = { 4i ϕ 1 3 ϕ 2 } χ. PHYSICS 34 QUANTUM PHYSICS II (25) Assigmet 2 Solutios 1. With respect to a pair of orthoormal vectors ϕ 1 ad ϕ 2 that spa the Hilbert space H of a certai system, the operator  is defied by its actio

More information

Singular value decomposition. Mathématiques appliquées (MATH0504-1) B. Dewals, Ch. Geuzaine

Singular value decomposition. Mathématiques appliquées (MATH0504-1) B. Dewals, Ch. Geuzaine Lecture 11 Sigular value decompositio Mathématiques appliquées (MATH0504-1) B. Dewals, Ch. Geuzaie V1.2 07/12/2018 1 Sigular value decompositio (SVD) at a glace Motivatio: the image of the uit sphere S

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information
