Sample Allocation under a Population Model and Stratified Inclusion Probability Proportionate to Size Sampling

Section on Survey Research Methods

Sun Woong Kim (1), Steven Heeringa (2), Peter Solenberger (3)
(1) Statistics, Dongguk University, Seoul, Korea, Republic of
(2) Institute for Social Research, University of Michigan, 426 Thompson, Ann Arbor, Michigan, 48104
(3) Institute for Social Research, University of Michigan

1. Introduction

In stratified sampling, a total sample of $n$ elements is allocated to each of $h = 1, \ldots, H$ design strata, and independent samples of $n_h$ elements are selected within strata. One of the important roles of the survey sampler is to determine the sample allocation to strata that will result in the greatest precision for sample estimates of population characteristics.

Many studies have focused on sample allocation in stratified random sampling. The following approaches have been popular in survey sampling practice: (1) proportional sample allocation to strata, and (2) Neyman (1934) sample allocation. Proportional sample allocation assigns sample sizes to strata in proportion to the stratum population size. Proportional allocation can be used when information on stratum variability is lacking or stratum variances are approximately equal. Since proportional allocation results in a self-weighting sample, population estimates and their sampling variances are easily computed. Neyman allocation can be used effectively to minimize the variance of an estimator if the survey cost per sampling unit is the same in all strata but element variances, $S_h^2$, differ across strata. This allocation method requires knowledge of the values of the standard deviations, $S_h$, of the variable of interest $y$ for each stratum. This information on stratum-specific variance is often not available in practice.

A sample allocation method with practical advantages over Neyman allocation is termed x-optimal allocation. The x-optimal allocation method uses an auxiliary variable $x$, highly correlated with $y$, and replaces the stratum standard deviations of $y$ with those of $x$ in the Neyman allocation formula. Of course, this allocation is not strictly optimal if the correlation between $x$ and $y$ is not perfect. As an alternative, Dayal (1985) showed that a linear model with respect to $x$ and $y$ can be appropriately used in the allocation of a stratified random sample.
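The two classical allocations described above can be illustrated with a minimal sketch. It assumes the standard textbook forms $n_h = n N_h / N$ (proportional) and $n_h = n N_h S_h / \sum_k N_k S_k$ (Neyman, equal unit costs); the function names and the three-stratum numbers are hypothetical and chosen only for illustration. Passing the stratum standard deviations of $x$ in place of those of $y$ gives the x-optimal variant.

```python
from math import fsum

def proportional_allocation(n, N):
    """Real-valued n_h = n * N_h / N; rounding to integers is left to the user."""
    total = fsum(N)
    return [n * N_h / total for N_h in N]

def neyman_allocation(n, N, S):
    """Real-valued n_h = n * N_h * S_h / sum_k(N_k * S_k), assuming equal unit costs."""
    weights = [N_h * S_h for N_h, S_h in zip(N, S)]
    total = fsum(weights)
    return [n * w / total for w in weights]

# Hypothetical population: stratum sizes N_h and standard deviations S_h.
N = [500, 300, 200]
S = [2.0, 5.0, 10.0]

prop = proportional_allocation(100, N)
ney = neyman_allocation(100, N, S)
```

In this toy case the Neyman allocation moves sample from the large, homogeneous first stratum to the small, highly variable third stratum, which proportional allocation cannot do.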
Dayal's technique is called model-assisted allocation.

In fact, in many stratified sample designs, especially those employed in business surveys, simple random sampling without replacement can be employed to select elements within strata. But it is well known that sampling strategies with varying probabilities, such as probability proportional to size (PPS) sampling without replacement, are superior to simple random sampling with respect to the efficiency of estimators of population totals and related quantities. PPS sampling without replacement is often called inclusion probability proportional to size (IPPS) sampling or $\pi$PS sampling. A number of $\pi$PS sampling schemes have been developed to select samples of size equal to or greater than two, and most of them are not easily applicable in practice. However, some techniques, such as Sampford's (1967) method, are not restricted to a stratum sample size of $n_h = 2$ and may be an attractive option for reducing sampling variance compared to alternative designs.

Rao (1968) discusses a sample allocation approach that minimizes the expected variance of the Horvitz and Thompson (H-T) (1952) estimator under $\pi$PS sampling and a superpopulation regression model without the intercept. Rao's method for sample allocation results in the same expected sampling variance for any $\pi$PS sampling design. Rao's (1968) discussion raises several questions:

(1) It may be desirable to introduce an intercept term in the superpopulation regression model. Considering the intercept term, what is the proper strategy for sample allocation in $\pi$PS sampling?

(2) If we use Sampford's (1967) $\pi$PS sampling method, what sample allocation strategy would be appropriate?

In this paper, we attempt to answer these questions. We first review Rao's (1968) method. We show that the presence of the intercept in the model produces a more complicated allocation problem, but

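one that can be easily solved. In addition, we employ optimization theory to show how to optimally determine stratum sample sizes for Sampford's selection method.

2. Revisiting Rao's method

Consider a finite population consisting of $h = 1, \ldots, H$ strata with $N_h$ units in stratum $h$. Let $s_h$ be a sample of size $n_h$ drawn from each stratum by a given sampling design $P(\cdot)$ and let $S_h$ be the set of all possible samples from each stratum. The total sample size is:

$$n = \sum_{h=1}^{H} n_h. \tag{2.1}$$

Then the probability that the unit $i$ in the stratum $h$ will be in a sample, denoted $\pi_{hi}$, is given by

$$\pi_{hi} = \sum_{s_h \ni i,\; s_h \in S_h} P(s_h), \quad i = 1, \ldots, N_h, \; h = 1, \ldots, H, \tag{2.2}$$

which are called the first-order inclusion probabilities. Also, the probability that both of the units $i$ and $j$ will be included in a sample, denoted $\pi_{hij}$, is obtained by

$$\pi_{hij} = \sum_{s_h \ni i, j,\; s_h \in S_h} P(s_h), \quad i \ne j = 1, \ldots, N_h, \; h = 1, \ldots, H. \tag{2.3}$$

These are termed the joint selection probabilities or the second-order inclusion probabilities.

Let $y_{hi}$ be the value of $y$ for the unit $i$ in the stratum $h$. As an estimator of the population total $Y = \sum_{h=1}^{H}\sum_{i=1}^{N_h} y_{hi}$, consider the H-T estimator

$$\hat{Y}_{HT} = \sum_{h=1}^{H}\sum_{i=1}^{n_h} \frac{y_{hi}}{\pi_{hi}}. \tag{2.4}$$

If $\pi_{hi} > 0$, this estimator is an unbiased estimator of $Y$, with variance:

$$\mathrm{Var}(\hat{Y}_{HT}) = \sum_{h=1}^{H}\sum_{i=1}^{N_h}\sum_{j>i} (\pi_{hi}\pi_{hj} - \pi_{hij}) \left( \frac{y_{hi}}{\pi_{hi}} - \frac{y_{hj}}{\pi_{hj}} \right)^2. \tag{2.5}$$

Rao (1968) considered the following superpopulation regression model without the intercept:

$$y_{hi} = \beta x_{hi} + \varepsilon_{hi}, \tag{2.6}$$

where $x_{hi}$ is the value of $x$ for the unit $i$ in stratum $h$, $E_\xi(y_{hi} \mid x_{hi}) = \beta x_{hi}$, $V_\xi(y_{hi} \mid x_{hi}) = \sigma^2 x_{hi}^{g}$, and $\mathrm{Cov}_\xi(y_{hi}, y_{hj} \mid x_{hi}, x_{hj}) = 0$. Here $E_\xi$ denotes the model expectation over all the finite populations that can be drawn from the superpopulation. Then we have the following expected variance under the model (2.6):

$$E_\xi \mathrm{Var}(\hat{Y}_{HT}) = \sigma^2 \sum_{h=1}^{H}\sum_{i=1}^{N_h} x_{hi}^{g} \left( \frac{1}{\pi_{hi}} - 1 \right), \tag{2.7}$$

where $\pi_{hi} = n_h p_{hi} = n_h x_{hi} / X_h$ and $X_h = \sum_{i=1}^{N_h} x_{hi}$.

To minimize (2.7) subject to the condition (2.1), using the Lagrange multiplier $\lambda$, consider

$$E_\xi \mathrm{Var}(\hat{Y}_{HT}) + \lambda \left( \sum_{h=1}^{H} n_h - n \right) = \sigma^2 \sum_{h=1}^{H}\sum_{i=1}^{N_h} x_{hi}^{g} \left( \frac{1}{n_h p_{hi}} - 1 \right) + \lambda \left( \sum_{h=1}^{H} n_h - n \right). \tag{2.8}$$

Differentiating (2.8) with respect to $n_h$ and equating the result to zero, we have

$$n_h = \frac{1}{\sqrt{\lambda}} \left( \sigma^2 \sum_{i=1}^{N_h} \frac{x_{hi}^{g}}{p_{hi}} \right)^{1/2}. \tag{2.9}$$

Substituting (2.9) in (2.1), we have

$$\sqrt{\lambda} = \frac{1}{n} \sum_{h=1}^{H} \left( \sigma^2 \sum_{i=1}^{N_h} \frac{x_{hi}^{g}}{p_{hi}} \right)^{1/2}. \tag{2.10}$$

Replacing $\lambda$ in (2.9) with (2.10), we have the following sample allocation in each stratum:

$$n_h = n \, \frac{\left( X_h \sum_{i=1}^{N_h} x_{hi}^{g-1} \right)^{1/2}}{\sum_{h=1}^{H} \left( X_h \sum_{i=1}^{N_h} x_{hi}^{g-1} \right)^{1/2}}. \tag{2.11}$$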
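As a hedged numerical illustration of the allocation (2.11), the sketch below assumes the reading $n_h \propto \bigl(X_h \sum_i x_{hi}^{g-1}\bigr)^{1/2}$ with $X_h = \sum_i x_{hi}$, which is what the Lagrangian argument above yields when $\pi_{hi} = n_h x_{hi}/X_h$. The function name `rao_allocation` and the toy strata are hypothetical, and the returned sizes are real-valued (rounding is not addressed).

```python
from math import fsum, sqrt

def rao_allocation(n, strata_x, g=2.0):
    """Real-valued n_h proportional to sqrt(X_h * sum_i x_hi**(g - 1)).

    strata_x is a list of lists: the auxiliary values x_hi for each stratum.
    g is the variance exponent of the assumed model V(y | x) = sigma^2 * x**g.
    """
    weights = []
    for x in strata_x:
        X_h = fsum(x)  # stratum total of the size measure
        weights.append(sqrt(X_h * fsum(x_i ** (g - 1.0) for x_i in x)))
    total = fsum(weights)
    return [n * w / total for w in weights]

# Hypothetical size measures for three strata (illustrative only).
strata_x = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0], [5.0, 5.0, 5.0, 5.0, 5.0]]

alloc_g2 = rao_allocation(12, strata_x, g=2.0)
alloc_g1 = rao_allocation(12, strata_x, g=1.0)
```

The second call uses $g = 1$, for which the weights become $(N_h X_h)^{1/2}$, so the number of units in a stratum matters as well as its total size.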

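Note that if $g = 2$, the allocation under the superpopulation model and $\pi$PS sampling reduces to:

$$n_h = n \, \frac{X_h}{\sum_{h=1}^{H} X_h}, \tag{2.12}$$

which is a proportional sample allocation to the stratum. Also, Rao showed that in terms of expected variance, unstratified $\pi$PS sampling under the same superpopulation model is inferior to stratified $\pi$PS sampling with the allocation (2.11).

Neither the expected variance (2.7) nor the sample allocation (2.11) involves the joint probabilities $\pi_{hij}$ in each stratum. This indicates that under the model without the intercept (2.6), the specific properties of a given $\pi$PS sampling scheme (properties that determine the $\pi_{hij}$) are not reflected in the sample allocation, resulting in the same sample allocation for any $\pi$PS sampling. Hence the following issues, as mentioned in the Introduction, are of interest.

(1) The superpopulation regression model which we may wish to employ in many surveys may be:

$$y_{hi} = \alpha + \beta x_{hi} + \varepsilon_{hi}, \tag{3.1}$$

which is a general form, and (2.6) is a special form of (3.1) when $\alpha = 0$. Considering the intercept term $\alpha$, we need to reexamine the most appropriate sample allocation strategy for $\pi$PS sampling.

(2) Although it will be shown in the following section that using (3.1) gives a sample allocation involving the joint probabilities $\pi_{hij}$, and these differ according to the chosen $\pi$PS sampling, if we focus on Sampford's (1967) method for $\pi$PS sampling, what sample allocation strategy would be appropriate?

Section 3 will address these issues of sample allocation.

3. Alternative Sample Allocations

We assume two different models involving an intercept term:

Model I:

$$y_{hi} = \alpha + \beta x_{hi} + \varepsilon_{hi}, \quad i = 1, \ldots, N_h, \; h = 1, \ldots, H, \tag{3.1}$$

where $\varepsilon_{hi}$ is numerically negligible, that is, $x$ explains $y$ well.

Model II:

$$y_{hi} = \alpha + \beta x_{hi} + \varepsilon_{hi}, \quad i = 1, \ldots, N_h, \; h = 1, \ldots, H, \tag{3.2}$$

where $E_\xi(y_{hi} \mid x_{hi}) = \alpha + \beta x_{hi}$, $V_\xi(y_{hi} \mid x_{hi}) = \sigma^2 x_{hi}^{g}$, and $\mathrm{Cov}_\xi(y_{hi}, y_{hj} \mid x_{hi}, x_{hj}) = 0$.

Instead of (2.5) we consider the following form of the variance of the H-T estimator:

$$\mathrm{Var}(\hat{Y}_{HT}) = \sum_{h=1}^{H}\sum_{i=1}^{N_h} \left( \frac{1}{\pi_{hi}} - 1 \right) y_{hi}^2 + 2 \sum_{h=1}^{H}\sum_{i=1}^{N_h}\sum_{j>i} \frac{\pi_{hij}}{\pi_{hi}\pi_{hj}} \, y_{hi} y_{hj} - 2 \sum_{h=1}^{H}\sum_{i=1}^{N_h}\sum_{j>i} y_{hi} y_{hj}. \tag{3.3}$$

Theorem 3.1. Under Model I, the minimization of the expected variance of (2.4) under $\pi$PS sampling is equivalent to minimizing

$$\sum_{h=1}^{H} \frac{A_h}{n_h^2} + \sum_{h=1}^{H} \frac{B_h}{n_h}, \tag{3.4}$$

where

$$A_h = 2 X_h^2 \sum_{i=1}^{N_h}\sum_{j>i} \frac{\{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}}{x_{hi} x_{hj}} \tag{3.5}$$

and

$$B_h = X_h \sum_{i=1}^{N_h} \frac{(\alpha + \beta x_{hi})^2}{x_{hi}} - \beta^2 X_h^2. \tag{3.6}$$

Proof.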
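For the expected variance of (2.4) under Model I, the third term in (3.3) is a fixed value that does not involve $n_h$, and the other terms are given by:

$$\sum_{h=1}^{H} \frac{X_h}{n_h} \sum_{i=1}^{N_h} \frac{(\alpha + \beta x_{hi})^2}{x_{hi}} - \sum_{h=1}^{H}\sum_{i=1}^{N_h} (\alpha + \beta x_{hi})^2 + \sum_{h=1}^{H} \frac{2 X_h^2}{n_h^2} \sum_{i=1}^{N_h}\sum_{j>i} \frac{\{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}}{x_{hi} x_{hj}} + \sum_{h=1}^{H} \beta^2 X_h^2 - \sum_{h=1}^{H} \frac{\beta^2 X_h^2}{n_h}, \tag{3.7}$$

by noting $\sum_{i=1}^{N_h}\sum_{j>i} \pi_{hij} = n_h (n_h - 1)/2$ in the second term of (3.3). Since $\sum_{h=1}^{H}\sum_{i=1}^{N_h} (\alpha + \beta x_{hi})^2$ and $\sum_{h=1}^{H} \beta^2 X_h^2$ are also fixed, the quantity to be minimized in (3.7) is: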

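$$\sum_{h=1}^{H} \frac{X_h}{n_h} \sum_{i=1}^{N_h} \frac{(\alpha + \beta x_{hi})^2}{x_{hi}} + \sum_{h=1}^{H} \frac{2 X_h^2}{n_h^2} \sum_{i=1}^{N_h}\sum_{j>i} \frac{\{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}}{x_{hi} x_{hj}} - \sum_{h=1}^{H} \frac{\beta^2 X_h^2}{n_h}. \tag{3.8}$$

The proof follows from substitution of $A_h$ and $B_h$ in (3.8).

Remark 3.1. Minimization of (3.4) is a simple problem in terms of $n_h$ because $A_h$ and $B_h$ are known values once the joint probabilities $\pi_{hij}$ are available.

Consider Sampford's (1967) $\pi$PS sampling method for selecting $n_h$ elements in each stratum. Although we can use (3.4) to decide the stratum sample size, we still do not know the values of the joint probabilities. The following approximate expression for $\pi_{hij}$, correct to $O(N^{-4})$, may be useful:

$$\pi_{hij} \approx n_h (n_h - 1) p_{hi} p_{hj} \Bigl[ 1 + \Bigl\{ (p_{hi} + p_{hj}) - \sum_{k=1}^{N_h} p_{hk}^2 \Bigr\} + \Bigl\{ 2 (p_{hi}^2 + p_{hj}^2) - 2 \sum_{k=1}^{N_h} p_{hk}^3 - (n_h - 2) p_{hi} p_{hj} + (n_h - 3)(p_{hi} + p_{hj}) \sum_{k=1}^{N_h} p_{hk}^2 - (n_h - 3) \Bigl( \sum_{k=1}^{N_h} p_{hk}^2 \Bigr)^2 \Bigr\} \Bigr], \tag{3.9}$$

which was derived by Asok and Sukhatme (1976). From (3.4) and (3.9) we obtain the following theorem.

Theorem 3.2. Under Model I, the sample allocation problem to minimize the expected variance of (2.4) under Sampford's method, when using the joint probabilities correct to $O(N^{-4})$ given in (3.9), is equivalent to minimizing

$$\sum_{h=1}^{H} C_h n_h + \sum_{h=1}^{H} \frac{D_h}{n_h}, \tag{3.10}$$

where

$$C_h = 2 \sum_{i=1}^{N_h}\sum_{j>i} \{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}^{(1)}, \tag{3.11}$$

$$\pi_{hij}^{(1)} = (p_{hi} + p_{hj}) \sum_{k=1}^{N_h} p_{hk}^2 - p_{hi} p_{hj} - \Bigl( \sum_{k=1}^{N_h} p_{hk}^2 \Bigr)^2, \tag{3.12}$$

$$D_h = B_h - 2 \sum_{i=1}^{N_h}\sum_{j>i} \{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}^{(2)}, \tag{3.13}$$

and

$$\pi_{hij}^{(2)} = 1 + (p_{hi} + p_{hj}) - \sum_{k=1}^{N_h} p_{hk}^2 + 2 (p_{hi}^2 + p_{hj}^2) - 2 \sum_{k=1}^{N_h} p_{hk}^3 + 2 p_{hi} p_{hj} - 3 (p_{hi} + p_{hj}) \sum_{k=1}^{N_h} p_{hk}^2 + 3 \Bigl( \sum_{k=1}^{N_h} p_{hk}^2 \Bigr)^2. \tag{3.14}$$

Proof. Substituting $\pi_{hij}$ from (3.9) in (3.5) for the first term of (3.4), we get:

$$\sum_{h=1}^{H} \frac{A_h}{n_h^2} = 2 \sum_{h=1}^{H} \frac{n_h - 1}{n_h} \sum_{i=1}^{N_h}\sum_{j>i} \{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij0}, \tag{3.15}$$

where:

$$\pi_{hij0} = 1 + \Bigl\{ (p_{hi} + p_{hj}) - \sum_{k=1}^{N_h} p_{hk}^2 \Bigr\} + \Bigl\{ 2 (p_{hi}^2 + p_{hj}^2) - 2 \sum_{k=1}^{N_h} p_{hk}^3 - (n_h - 2) p_{hi} p_{hj} + (n_h - 3)(p_{hi} + p_{hj}) \sum_{k=1}^{N_h} p_{hk}^2 - (n_h - 3) \Bigl( \sum_{k=1}^{N_h} p_{hk}^2 \Bigr)^2 \Bigr\}. \tag{3.16}$$

Expressing (3.16) in terms of $n_h$, we have:

$$\pi_{hij0} = n_h \pi_{hij}^{(1)} + \pi_{hij}^{(2)}. \tag{3.17}$$

Substituting (3.17) in (3.15), we obtain

$$\sum_{h=1}^{H} \frac{A_h}{n_h^2} = 2 \sum_{h=1}^{H} n_h \sum_{i=1}^{N_h}\sum_{j>i} \{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}^{(1)} + 2 \sum_{h=1}^{H}\sum_{i=1}^{N_h}\sum_{j>i} \{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}^{(2)} - 2 \sum_{h=1}^{H}\sum_{i=1}^{N_h}\sum_{j>i} \{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}^{(1)} - 2 \sum_{h=1}^{H} \frac{1}{n_h} \sum_{i=1}^{N_h}\sum_{j>i} \{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}^{(2)}. \tag{3.18}$$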

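Since the second and third terms in (3.18) are known values, the minimization of (3.18) reduces to minimizing:

$$2 \sum_{h=1}^{H} n_h \sum_{i=1}^{N_h}\sum_{j>i} \{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}^{(1)} - 2 \sum_{h=1}^{H} \frac{1}{n_h} \sum_{i=1}^{N_h}\sum_{j>i} \{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}^{(2)}. \tag{3.19}$$

Adding $\sum_{h=1}^{H} B_h / n_h$ to (3.19), we have the following minimization problem, equivalent to the minimization of (3.4):

$$2 \sum_{h=1}^{H} n_h \sum_{i=1}^{N_h}\sum_{j>i} \{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}^{(1)} + \sum_{h=1}^{H} \frac{1}{n_h} \Bigl[ B_h - 2 \sum_{i=1}^{N_h}\sum_{j>i} \{\alpha^2 + \alpha\beta (x_{hi} + x_{hj})\} \, \pi_{hij}^{(2)} \Bigr]. \tag{3.20}$$

This completes the proof.

Remark 3.2. (3.10) is a simple allocation problem in terms of $n_h$ because $C_h$ and $D_h$ are known values.

Remark 3.3. We can define the following optimization problem with respect to $n_h$:

$$\text{Minimize} \quad \sum_{h=1}^{H} C_h n_h + \sum_{h=1}^{H} \frac{D_h}{n_h} \tag{3.21}$$

subject to lower and upper bound constraints on each $n_h$, $h = 1, \ldots, H$, (3.22) and (3.23), and

$$\sum_{h=1}^{H} n_h = n. \tag{3.24}$$

This problem may be easily handled by convex mathematical programming algorithms, and the solution provides an efficient sample allocation strategy when using Sampford's method under the model assumption of (3.1).

We obtain the following theorem regarding the minimization of the variance of the H-T estimator (2.4) in $\pi$PS sampling under the assumption of the model (3.2).

Theorem 3.3. Under Model II, minimizing the expected variance of (2.4) under $\pi$PS sampling amounts to minimizing:

$$\sum_{h=1}^{H} \frac{A_h}{n_h^2} + \sum_{h=1}^{H} \frac{B_h}{n_h}, \tag{3.25}$$

where

$$A_h = -\alpha^2 X_h^2 \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi}^2 x_{hj}^2} \, \pi_{hij} \tag{3.26}$$

and

$$B_h = \sigma^2 \sum_{i=1}^{N_h} \frac{x_{hi}^{g}}{p_{hi}}. \tag{3.27}$$

Proof. Consider a different form of (2.5) using $\pi_{hi} = n_h p_{hi}$:

$$\mathrm{Var}(\hat{Y}_{HT}) = \sum_{h=1}^{H}\sum_{i=1}^{N_h}\sum_{j>i} \Bigl( p_{hi} p_{hj} - \frac{\pi_{hij}}{n_h^2} \Bigr) \Bigl( \frac{y_{hi}}{p_{hi}} - \frac{y_{hj}}{p_{hj}} \Bigr)^2. \tag{3.28}$$

By using

$$E_\xi (y_{hi}^2) = \sigma^2 x_{hi}^{g} + \alpha^2 + \beta^2 x_{hi}^2 + 2 \alpha\beta x_{hi} \tag{3.29}$$

and

$$E_\xi (y_{hi} y_{hj}) = \alpha^2 + \alpha\beta (x_{hi} + x_{hj}) + \beta^2 x_{hi} x_{hj}, \tag{3.30}$$

together with $p_{hi} = x_{hi} / X_h$, we obtain

$$E_\xi \Bigl( \frac{y_{hi}}{p_{hi}} - \frac{y_{hj}}{p_{hj}} \Bigr)^2 = \sigma^2 \Bigl( \frac{x_{hi}^{g}}{p_{hi}^2} + \frac{x_{hj}^{g}}{p_{hj}^2} \Bigr) + \alpha^2 X_h^2 \, \frac{(x_{hj} - x_{hi})^2}{x_{hi}^2 x_{hj}^2}. \tag{3.31}$$

Then we get:

$$\begin{aligned} E_\xi \mathrm{Var}(\hat{Y}_{HT}) &= \sum_{h=1}^{H}\sum_{i=1}^{N_h}\sum_{j>i} \Bigl( p_{hi} p_{hj} - \frac{\pi_{hij}}{n_h^2} \Bigr) \sigma^2 \Bigl( \frac{x_{hi}^{g}}{p_{hi}^2} + \frac{x_{hj}^{g}}{p_{hj}^2} \Bigr) + \alpha^2 \sum_{h=1}^{H}\sum_{i=1}^{N_h}\sum_{j>i} \Bigl( p_{hi} p_{hj} - \frac{\pi_{hij}}{n_h^2} \Bigr) X_h^2 \, \frac{(x_{hj} - x_{hi})^2}{x_{hi}^2 x_{hj}^2} \\ &= E_V + \alpha^2 \sum_{h=1}^{H}\sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hj} - x_{hi})^2}{x_{hi} x_{hj}} - \sum_{h=1}^{H} \frac{\alpha^2 X_h^2}{n_h^2} \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi}^2 x_{hj}^2} \, \pi_{hij}, \end{aligned} \tag{3.32}$$

with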

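$$E_V = \sum_{h=1}^{H}\sum_{i=1}^{N_h}\sum_{j>i} \Bigl( p_{hi} p_{hj} - \frac{\pi_{hij}}{n_h^2} \Bigr) \sigma^2 \Bigl( \frac{x_{hi}^{g}}{p_{hi}^2} + \frac{x_{hj}^{g}}{p_{hj}^2} \Bigr) = \sigma^2 \sum_{h=1}^{H}\sum_{i=1}^{N_h} \frac{x_{hi}^{g}}{p_{hi}^2} \Bigl( \frac{p_{hi}}{n_h} - p_{hi}^2 \Bigr) = \sigma^2 \sum_{h=1}^{H} \frac{1}{n_h} \sum_{i=1}^{N_h} \frac{x_{hi}^{g}}{p_{hi}} - \sigma^2 \sum_{h=1}^{H}\sum_{i=1}^{N_h} x_{hi}^{g}. \tag{3.33}$$

Since the second term in (3.32) and the second term in (3.33) are fixed in terms of $n_h$, the minimization of the model expectation of (3.28) reduces to minimizing:

$$- \sum_{h=1}^{H} \frac{\alpha^2 X_h^2}{n_h^2} \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi}^2 x_{hj}^2} \, \pi_{hij} + \sum_{h=1}^{H} \frac{\sigma^2}{n_h} \sum_{i=1}^{N_h} \frac{x_{hi}^{g}}{p_{hi}}. \tag{3.34}$$

Since (3.34) equals (3.25), the proof is completed.

Remark 3.4. Minimizing (3.25) is a simple problem in terms of $n_h$ because $A_h$ and $B_h$ are known values once the joint probabilities $\pi_{hij}$ are available.

Remark 3.4. (3.33) is a different form of (2.7). The model expectation of (3.28) involves (2.7) plus the other terms due to Model II with the intercept term, as shown in (3.32).

Theorem 3.4. Under Model II, the sample allocation problem under Sampford's sampling scheme to minimize the expected variance of (2.4), when using the joint probabilities correct to $O(N^{-4})$ given in (3.9), is equivalent to minimizing:

$$\sum_{h=1}^{H} C_h n_h + \sum_{h=1}^{H} \frac{D_h}{n_h}, \tag{3.35}$$

where

$$C_h = -\alpha^2 \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij}^{(1)} \tag{3.36}$$

and

$$D_h = B_h + \alpha^2 \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij}^{(2)}, \tag{3.37}$$

with

$$\pi_{hij}^{(1)} = (p_{hi} + p_{hj}) \sum_{k=1}^{N_h} p_{hk}^2 - p_{hi} p_{hj} - \Bigl( \sum_{k=1}^{N_h} p_{hk}^2 \Bigr)^2,$$

$$\pi_{hij}^{(2)} = 1 + (p_{hi} + p_{hj}) - \sum_{k=1}^{N_h} p_{hk}^2 + 2 (p_{hi}^2 + p_{hj}^2) - 2 \sum_{k=1}^{N_h} p_{hk}^3 + 2 p_{hi} p_{hj} - 3 (p_{hi} + p_{hj}) \sum_{k=1}^{N_h} p_{hk}^2 + 3 \Bigl( \sum_{k=1}^{N_h} p_{hk}^2 \Bigr)^2,$$

and

$$B_h = \sigma^2 \sum_{i=1}^{N_h} \frac{x_{hi}^{g}}{p_{hi}}.$$

Proof. Substituting (3.9) in the first term of (3.25) and using (3.17) with (3.12) and (3.14), we obtain

$$\begin{aligned} \sum_{h=1}^{H} \frac{A_h}{n_h^2} &= -\alpha^2 \sum_{h=1}^{H} \frac{n_h - 1}{n_h} \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij0} \\ &= -\alpha^2 \sum_{h=1}^{H} \frac{n_h - 1}{n_h} \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \bigl( n_h \pi_{hij}^{(1)} + \pi_{hij}^{(2)} \bigr) \\ &= -\alpha^2 \sum_{h=1}^{H} n_h \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij}^{(1)} - \alpha^2 \sum_{h=1}^{H}\sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij}^{(2)} \\ &\quad + \alpha^2 \sum_{h=1}^{H}\sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij}^{(1)} + \alpha^2 \sum_{h=1}^{H} \frac{1}{n_h} \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij}^{(2)}. \end{aligned} \tag{3.38}$$

Since the second and third terms in (3.38) do not involve $n_h$, the minimization of (3.38) reduces to minimizing the other terms, that is,

$$-\alpha^2 \sum_{h=1}^{H} n_h \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij}^{(1)} + \alpha^2 \sum_{h=1}^{H} \frac{1}{n_h} \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij}^{(2)}. \tag{3.39}$$

Thus, the minimization of (3.25) with (3.26) and (3.27) amounts to the minimization of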

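$$-\alpha^2 \sum_{h=1}^{H} n_h \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij}^{(1)} + \alpha^2 \sum_{h=1}^{H} \frac{1}{n_h} \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij}^{(2)} + \sum_{h=1}^{H} \frac{B_h}{n_h}. \tag{3.39}$$

Accordingly, the following reduced form can be obtained from (3.39):

$$-\alpha^2 \sum_{h=1}^{H} n_h \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij}^{(1)} + \sum_{h=1}^{H} \frac{1}{n_h} \Bigl[ B_h + \alpha^2 \sum_{i=1}^{N_h}\sum_{j>i} \frac{(x_{hi} - x_{hj})^2}{x_{hi} x_{hj}} \, \pi_{hij}^{(2)} \Bigr]. \tag{3.40}$$

Hence, we have proved the theorem.

Remark 3.5. (3.35) is a simple allocation problem in terms of $n_h$ since $C_h$ and $D_h$ are known values.

Remark 3.6. In order to find a solution for $n_h$, we may define the following optimization problem:

$$\text{Minimize} \quad \sum_{h=1}^{H} C_h n_h + \sum_{h=1}^{H} \frac{D_h}{n_h} \tag{3.41}$$

subject to lower and upper bound constraints on each $n_h$, $h = 1, \ldots, H$, (3.42) and (3.43). It is noted that the condition (2.1) may not be used as the constraint, unlike in Remark 3.3.

Corollary 3.1. Under Model II without the intercept, the minimization of the expected variance of (2.4) under $\pi$PS sampling is equivalent to minimizing:

$$\sum_{h=1}^{H} \frac{G_h}{n_h}, \tag{3.44}$$

where

$$G_h = X_h \sum_{i=1}^{N_h} x_{hi}^{g-1}. \tag{3.45}$$

Proof. When $\alpha = 0$, (3.32) in Theorem 3.3 reduces to simply $E_V$, which is expressed as (3.33). $\sigma^2$ and the second term in (3.33) are fixed values with respect to $n_h$, and the minimization of (3.33) reduces to that of (3.44). Hence, we have the corollary.

Remark 3.7. (3.44) is quite a simple allocation problem in terms of $n_h$, not depending on the joint probabilities $\pi_{hij}$.

4. Discussion

We have addressed the topic of efficient sample allocation in stratified samples using more general superpopulation regression models than those investigated by Rao (1968). Under more general models that include an intercept term, we have developed several theorems useful for deciding sample allocation in $\pi$PS sampling designs. Also, through the theorems we have shown how to apply this sample allocation theory to Sampford's (1967) sampling method, one of the more common $\pi$PS sampling designs used in survey practice.

We determined that the sample allocation approaches to minimizing the model expectation of the variance of the H-T estimator may depend on the expressions of the variance. Based on the theorems developed in this paper, the optimization problem with respect to the stratum sample sizes can be solved by using software involving convex mathematical programming algorithms. This is a straightforward approach for sample allocation when using more efficient $\pi$PS sampling methods. In addition to Sampford sampling, the approach can be applied to a variety of $\pi$PS sampling without replacement designs.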
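To make the optimization step concrete, the following is a minimal sketch of one way to solve a problem of the form in Remark 3.3: minimize $\sum_h (C_h n_h + D_h / n_h)$ subject to $\sum_h n_h = n$ and box bounds on each $n_h$. It is not the software used by the authors. It assumes every $D_h > 0$ (so each term is convex for $n_h > 0$), positive lower bounds, and a feasible total. The names `allocate` and `objective`, the coefficient values, and the bounds are hypothetical. The method is bisection on the Lagrange multiplier of the sum constraint, and the result is real-valued.

```python
from math import sqrt

def objective(C, D, n_h):
    """Value of sum_h (C_h * n_h + D_h / n_h)."""
    return sum(c * m + d / m for c, d, m in zip(C, D, n_h))

def allocate(C, D, n, lower, upper, iters=200):
    """Real-valued minimizer of the objective subject to sum(n_h) = n and
    lower_h <= n_h <= upper_h. Assumes D_h > 0, lower_h > 0 and
    sum(lower) <= n <= sum(upper)."""
    def sizes(lam):
        # Stationarity of the Lagrangian: C_h - D_h / n_h**2 + lam = 0,
        # i.e. n_h = sqrt(D_h / (C_h + lam)), then clipped to the bounds.
        out = []
        for c, d, lo, hi in zip(C, D, lower, upper):
            s = c + lam
            m = sqrt(d / s) if s > 0 else hi
            out.append(min(max(m, lo), hi))
        return out

    lam_lo = -max(C)           # here every stratum sits at its upper bound
    step = 1.0
    while sum(sizes(lam_lo + step)) > n:
        step *= 2.0            # widen until the total drops to n or below
    lam_hi = lam_lo + step
    for _ in range(iters):     # bisection: the total is non-increasing in lam
        mid = 0.5 * (lam_lo + lam_hi)
        if sum(sizes(mid)) > n:
            lam_lo = mid
        else:
            lam_hi = mid
    return sizes(lam_hi)

# Hypothetical coefficients for three strata (illustrative only).
C = [0.5, 1.0, 2.0]
D = [100.0, 400.0, 900.0]
alloc = allocate(C, D, n=30, lower=[2.0] * 3, upper=[50.0] * 3)
```

If the sum constraint is dropped, as in Remark 3.6, the strata decouple, and for $C_h > 0$ each $n_h$ is simply $\sqrt{D_h / C_h}$ clipped to its bounds.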
In future work it will be important to extend the theory and methods described here to allocation problems under more complicated superpopulation models and to situations where the superpopulation model can vary across strata.

References

Dayal, S. (1985). Allocation of sample using values of auxiliary characteristic, Journal of Statistical Planning and Inference, 11, 321-328.

Horvitz, D. G. and Thompson, D. J. (1952). A generalization of sampling without replacement from a finite universe, Journal of the American Statistical Association, 47, 663-685.

Neyman, J. (1934). On the two different aspects of the representative method: the method of stratified sampling and the method of purposive selection, Journal of the Royal Statistical Society, 97, 558-606.

Rao, T. J. (1968). On the allocation of sample size in stratified sampling, Annals of the Institute of Statistical Mathematics, 20, 159-166.

Sampford, M. R. (1967). On sampling without replacement with unequal probabilities of selection, Biometrika, 54, 499-513.