COORDINATION OF PPS SAMPLES OVER TIME


Esbjörn Ohlsson, Stockholm University
Mathematical Statistics, Stockholm University, Stockholm, Sweden
esbj@matematik.su.se

ABSTRACT

Probability proportional to size (PPS) sampling finds application in business surveys both for the first stage of a multi-stage design and for direct sampling of businesses from a list frame. In both cases there is a need to update samples from time to time, while retaining as many units as possible from the old sample. In the second type of application there is also a need for negative coordination of surveys to get an even distribution of response burden. Further requirements on the sampling procedures are simplicity in application and in estimation of variance. We present various permanent random number techniques that meet these requirements, compare them to a few other methods, and present a simulation study on expected overlap.

Key Words: Positive Coordination, Negative Coordination, Overlap Control, Permanent Random Numbers

1. THE PROBLEM

Applications of PPS sampling in business surveys can be split into two main categories: (a) sampling from area frames with a multi-stage design and (b) sampling ultimate units directly from a list frame. The classical situation for (a) is a multi-stage sample where primary sampling units are drawn with probability proportional to some measure of the unit's size (PPS). Typically, due to extensive stratification, just a few units are selected in each stratum. As an example we consider a master sample of stretches of road, maintained by the Swedish National Road Administration. Even though the first stage sample size is 84, extensive stratification results in stratum sample sizes n=1 throughout. In each sampled unit, investments are made, e.g. in equipment to measure traffic flow, and this, of course, is a strong argument for retaining the sample over the years. Another argument is that the main interest in samples from this frame is in estimates of change over time, for which retaining the same units improves precision.
The sizes of the units are based on a rotating census that gives estimates of traffic mileage per year for one third of the units each year. Since these sizes change substantially over the years, there is a regular need to update the samples, if not every third year then at least every decade. Otherwise, there would eventually be a great loss in efficiency in the estimates from surveys using the master sample. Furthermore, at irregular intervals there are changes in the classification of roads that underlies the stratification: e.g., roads may change from county to national or even European highway status. Also, there are some new roads (births) and roads that are no longer in the population (deaths). We conclude that in this survey, as in most repetitive surveys, there is a need for updating the sample to account for new sizes, stratum/classification changes, and births and deaths, while retaining as many units as possible from the old sample.

In this example, the within-stratum sample sizes were n=1, as is the case with several major surveys such as the US Consumer Expenditure Survey and the US Current Population Survey. This might explain why this case has received considerable attention in the literature; see e.g. Keyfitz (1951), Kish and Scott (1971) and Causey, Cox and Ernst (1985). Several newer references can be found in the overview by Ernst (1999). In the present paper we put special emphasis on the case n=1, but we also consider PPS samples of other sizes.

The second type of application of PPS sampling is the case where we have a list frame with ultimate sampling units, often in the form of a business register. The most common design in this case seems to be a stratified simple random sample (STSI). Sample coordination with this kind of design was discussed extensively at the 1993 ICES meeting, see Ohlsson (1995) and Srinath and Carpenter (1995). It is also a topic at the ICES II meeting, see McKenzie and Gross (1999) and Royce (1999). In these cases, there is stratification both by industry and by size.
The PPS alternative means that the stratification by industry is kept while stratification by size is replaced by PPS sampling. One advantage of this kind of design is that the size measure is more extensively exploited.

One example of a PPS design of this second kind is the sample of outlets for the Swedish CPI. In an investigation of the sample of retail traders for the Swedish Consumer Price Index, Ålenius (1990) considered the sample of n=41 department stores out of N=259 units in that industrial stratum, with prices of two different commodities as target variables. With a standard size stratification, using four strata, STSI gave twice the variance of PPS sampling. Even when the STSI design was refined to use 41 strata and a ratio estimator was used (treating size as an auxiliary variable), STSI gave around 25 percent larger variance than PPS. For the NASS Crops Survey, Bailey & Kott (1997) found PPS sampling to be more efficient than STSI for most crops.

The conclusion is that there are business surveys for which a PPS design is preferable to STSI from an efficiency point of view. Furthermore, if the PPS sampling procedure is simple to implement, a PPS design is simpler to administer, with far fewer strata to construct, allocate and maintain. Even if we do not claim PPS to be generally preferable to STSI, it should be considered a strong candidate for business survey designs. A necessary condition for this is that the PPS procedure involved is simple to use in practice. Simple and efficient PPS procedures are the main topic of this paper.

2. SPECIFICATION FOR PPS COORDINATION

There is an immense number of PPS procedures available in the literature, see Brewer & Hanif (1983). Most of these are not suitable for sample coordination, though. In fact, we believe that the lack of simple, efficient PPS procedures that can produce coordinated samples may be a reason for the extensive use of STSI instead of PPS sampling. We now specify in more detail our requirements on a simple and efficient PPS sampling procedure capable of handling sample coordination.

The (stratum) population is $U = \{1, 2, \ldots, N\}$. In the frame (which is supposed to be a list even in case (a)) there is a nonnegative auxiliary variable $p_1, p_2, \ldots, p_N$. In applications, $p_i$ is usually a measure of the size of unit $i$.
We assume that the $p_i$ have been normed so that $\sum_{i \in U} p_i = 1$ within each stratum. Särndal, Swensson and Wretman (1992, p. 90) give a list of desirable properties of a PPS procedure. Applied to the situation with two samples that are to be coordinated, their first two properties give us the following three requirements:

(i) Relative simplicity in application.
(ii) For the first sample, $\pi_i = \Pr(i \in s) = n p_i$, $i \in U$.
(iii) For the second sample, $\pi'_i = \Pr(i \in s') = n' p'_i$, $i \in U'$.

Here $\pi_i = \Pr(i \in s)$ denotes the actual inclusion probability of unit $i$ in the first sample $s$. All quantities relating to the second sample $s'$ are equipped with a prime, as in $p'_i$. Särndal et al. add three conditions that enable variance estimation with the Sen-Yates-Grundy estimator. In a later article, Särndal (1996) argues for the use of procedures that allow simple, single-sum variance estimation. We agree with this point of view and get

(iv) Availability of a variance estimator, preferably expressed as a single sum. (Not relevant for n=1.)

Finally, we add three conditions that are particular to the problem of overlap control. Note that the expected number of units in common to two samples is

$$\sum_{i} \Pr(i \in s,\ i \in s') \qquad (1)$$

(v) Possibility of positive sample coordination of two or more samples with different size measures $p$, different strata and different $n$, preferably with maximization of the expected sample overlap in (1).

(vi) Possibility of negative sample coordination of two or more samples with different size measures $p$, different strata and different $n$, preferably with minimization of the expected sample overlap in (1).
(vii) On each occasion, all strata are sampled independently of each other. For the second sample this means that $\Pr(i \in s',\ j \in s') = \Pr(i \in s') \Pr(j \in s')$ whenever $i$ and $j$ are in different strata.

As noted by Ernst (1999), condition (vii) is not satisfied by most overlap procedures that allow for different stratification. This condition is important for obtaining unbiased variance estimates and, above all, for the possibility to apply the sample overlap procedure repeatedly.

Särndal et al. (1992, p. 90) remark that it is not easy to devise a (fixed-size) procedure having the properties desirable for PPS sampling, even on a single occasion. It is thus futile to hope for a procedure for sampling on two occasions, with overlap control, that fulfils all our requirements (i)-(vii). Instead, we have to look for procedures that are reasonably good at (i)-(vii). We start out with a procedure that is not fixed-size, i.e. it gives a random sample size.

3. POISSON SAMPLING AND THE IDEA OF PRN

In Poisson sampling, each unit $i$ is given an independent, uniformly distributed random number $X_i$ on the interval (0,1). Unit $i$ is included in the sample $s$ if $X_i \le n p_i$. Poisson sampling can be used for sample coordination by saving the $X_i$ as permanent random numbers (PRN). This idea is due to Brewer, Early and Joyce (1972) and means that when a second sample is to be drawn, we use the same random numbers as in the first, but we update the sizes $p'_i$, the stratification, and the sample size $n'$. Note that the quantities $p'_i$ and $n'$ relate to the stratum where unit $i$ is located in the second design, which may or may not be the same as in the first design. (For simplicity of notation, we refrain from using stratum sub-indexes.)

A virtue of Poisson sampling is that it is very simple to apply, i.e. requirement (i) is met. Further, it is readily seen that this procedure is strictly PPS, so that (ii) and (iii) are fulfilled.
A single-sum variance estimator (iv) is available, see Brewer and Hanif (1983, p. 83). The probability of including unit $i$ in two samples drawn with PRN Poisson sampling is obviously

$$\Pr(i \in s,\ i \in s') = \min(n p_i,\ n' p'_i) \qquad (2)$$

This is of course the largest possible probability, yielding a strict maximum of (1). Negative coordination can be achieved by shifting the PRN an amount $c$ to the right before the selection of the second sample, giving new random numbers $X_i^* = X_i + c$. If $m$ samples are to be negatively coordinated, the choice $c = 1/m$ should give a small sample overlap. In particular, if the target inclusion probabilities $n p_i$ are less than $1/m$ for all units in all $m$ designs, the expected overlap is 0. In the case of $m=2$, an alternative is to use antithetic random numbers, $X_i^* = 1 - X_i$, which gives minimum expected sample overlap for any target probabilities. We conclude that both (v) and (vi) are satisfied. Finally, all units are sampled independently and in particular (vii) is met.

It may appear as if we have found the optimal procedure for PPS sample coordination. However, Poisson sampling has the drawback of giving a random sample size (with expected value $n$). This has two implications. The first is that there is a risk of an empty sample in some stratum. Since the random sample size is approximately Poisson distributed, the probability of this happening when we have $H$ strata, all with expected sample size $n$, is $1 - (1 - e^{-n})^H$. In order for this quantity to be negligible we must avoid small $n$, where what counts as small of course depends on $H$. The conclusion is that Poisson sampling cannot be used in all situations, and in particular not in those with n=1 or 2. Even when the probability of some zero sample size is negligible, the randomness in sample size may seriously disturb the intended sample allocation over strata, with a loss in efficiency of the estimates.
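
The PRN Poisson scheme of this section, and the empty-stratum probability just given, can be illustrated with a minimal Python sketch. The population, the size measures, the seed and all function names are hypothetical. Wrapping the shifted numbers around at 1 is an assumption of the sketch, since the text only states the shift $X_i + c$.

```python
import math
import random


def poisson_sample(prn, target):
    """Poisson sampling: unit i is included when X_i <= target_i = n * p_i."""
    return {i for i, x in prn.items() if i in target and x <= target[i]}


def shifted(prn, c):
    """Shift every PRN by c to the right (wrap-around at 1 is an assumption)."""
    return {i: (x + c) % 1.0 for i, x in prn.items()}


def antithetic(prn):
    """Antithetic random numbers X* = 1 - X, an alternative for two samples."""
    return {i: 1.0 - x for i, x in prn.items()}


def prob_some_stratum_empty(n, H):
    """1 - (1 - exp(-n))**H: chance that at least one of H strata is empty."""
    return 1.0 - (1.0 - math.exp(-n)) ** H


rng = random.Random(2024)                       # hypothetical seed
units = [f"u{k}" for k in range(200)]
prn = {i: rng.random() for i in units}          # permanent random numbers

# Hypothetical size measures on two occasions, normed to sum to 1 below.
size1 = {i: rng.uniform(1, 10) for i in units}
size2 = {i: size1[i] * rng.uniform(0.7, 1.3) for i in units}
n1, n2 = 20, 20
t1 = {i: n1 * size1[i] / sum(size1.values()) for i in units}
t2 = {i: n2 * size2[i] / sum(size2.values()) for i in units}

s1 = poisson_sample(prn, t1)
s2_pos = poisson_sample(prn, t2)                 # positive coordination
s2_neg = poisson_sample(shifted(prn, 0.5), t2)   # negative coordination, m = 2
print(len(s1), len(s1 & s2_pos), len(s1 & s2_neg))
print(round(prob_some_stratum_empty(n=2, H=10), 3))
```

With the same PRN, a unit lies in both samples exactly when $X_i \le \min(n p_i, n' p'_i)$, which is relation (2). With a shift of 1/2 and all targets below 1/2, the two samples cannot share a unit.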

The second, less serious, drawback of the random sample size is that inference should clearly be made conditional on the sample size actually obtained. Conditional on the actual sample size, the probabilities of inclusion are no longer exactly PPS and they are in fact very hard to compute, see Aires (1999).

4. FIXED SIZE PROCEDURES

We now look at fixed-size alternatives to Poisson sampling, starting with the case n=1. In this section we restrict attention to PRN procedures. Collocated sampling, by Brewer et al. (1972), cannot be used to coordinate samples with different stratification, and will therefore not be considered here.

4.1 The case n=1

A sampling design is a probability distribution on the set of all possible samples. With this definition, PPS sampling with fixed size n=1 is unique. The traditional sampling procedure for realizing a PPS size-one sample uses just one random number. For the application of PRN sample coordination, we need a procedure that uses individual random numbers for all units. Such a procedure is Exponential sampling, presented in Ohlsson (1996). Starting with a set of PRN, $\{X_i;\ i = 1, 2, \ldots, N\}$, we compute the transformed random numbers $\xi_i = -\log(1 - X_i)/p_i$, which are exponentially distributed with mean $1/p_i$. The unit with the smallest $\xi_i$ is selected for the sample. By a well-known result from probability theory (the minimum of independent exponential variables with rates $p_i$ falls on unit $i$ with probability $p_i / \sum_j p_j = p_i$), the probability of selecting unit $i$ is $p_i$, as required. Coordination of samples is achieved by using PRN as described for Poisson sampling above.

Exponential sampling does not reach the optimal expected overlap in (v) and (vi), but is not too far away, see the numerical examples below. A formula for expected overlap for positive coordination was given in Ohlsson (1996). Since this procedure is very simple to implement, (i) is fulfilled along with (ii)-(iii); (iv) is not relevant since n=1. We just mentioned that (v)-(vi) are fulfilled. Finally, it is not hard to see that (vii) is satisfied.
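
As a minimal sketch of Exponential sampling for n=1, the following Python code selects the unit with the smallest transformed random number and reuses the same PRN after the draw probabilities are updated. The unit labels, probabilities, PRN values and function names are hypothetical, and the Monte Carlo check is only an illustration of the PPS property, not part of the procedure.

```python
import math
import random


def exponential_sample(prn, p):
    """Return the unit with the smallest xi_i = -log(1 - X_i) / p_i."""
    xi = {i: -math.log(1.0 - prn[i]) / p[i] for i in p if p[i] > 0}
    return min(xi, key=xi.get)


# Deterministic toy example with hypothetical PRN and draw probabilities.
prn = {"a": 0.9, "b": 0.1, "c": 0.5}
p_first = {"a": 0.5, "b": 0.3, "c": 0.2}
p_second = {"a": 0.5, "b": 0.1, "c": 0.4}       # updated sizes, same PRN
print(exponential_sample(prn, p_first), exponential_sample(prn, p_second))


def selection_frequencies(p, reps, seed):
    """Monte Carlo check that unit i is drawn with probability close to p_i."""
    rng = random.Random(seed)
    counts = dict.fromkeys(p, 0)
    for _ in range(reps):
        fresh = {i: rng.random() for i in p}    # new random numbers each run
        counts[exponential_sample(fresh, p)] += 1
    return {i: counts[i] / reps for i in p}


print(selection_frequencies(p_first, reps=20000, seed=1))
```

Units with zero draw probability (for example births on the first occasion) are simply skipped in this sketch.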
We conclude that Exponential sampling is a strong candidate for coordinated PPS sampling when n=1.

4.2 The case n>1

Unlike the n=1 case, PPS sampling with n>1 can be done in several different ways. Most of these cannot be used in connection with PRN, since this requires procedures that use individual random numbers for the units. A natural idea is to extend Exponential sampling to n>1 by selecting the units with the $n$ smallest transformed random numbers. Unfortunately, this yields so-called successive sampling (Cochran, 1977, Section 9A.8), which is not strictly PPS. The actual inclusion probabilities may be quite far from the target values. Cochran (1977) presents several techniques for handling this problem, of which we consider Brewer's method for n=2. In our context, this method can be applied by drawing the first unit as in Exponential sampling, but with transformed random numbers

$$\xi_i = \frac{-\log(1 - X_i)}{p_i (1 - p_i)/(1 - 2 p_i)} \qquad (3)$$

After removing the unit drawn in the first round, a second unit is drawn with the transformed random numbers of Exponential sampling, $\xi_i = -\log(1 - X_i)/p_i$. Cochran (1977, Section 9A.8) shows that these two steps yield a size n=2 sample with the required inclusion probabilities, i.e. the procedure meets (ii) and (iii). The procedure is relatively simple (i), and has the same properties as Exponential sampling as regards (v)-(vii). It also allows unbiased variance estimation with the Sen-Yates-Grundy estimator. This is not a single-sum estimator, though, so (iv) is not completely met. The extension of Brewer's method to n>2 by Sampford (1967) is rather complicated and will not be considered here.

Ohlsson (1990 and 1998) gave a modification of Poisson sampling called Sequential Poisson sampling (SPS). This procedure uses the transformed random numbers $\xi_i = X_i / p_i$ and selects the $n$ units with the smallest such numbers.

The sample will be close to a Poisson sample, since the latter selects the units with $\xi_i \le n$. Not surprisingly, the properties are approximately the same as those of Poisson sampling, but the size is fixed. The procedure is very simple (i), admits single-sum variance estimation (iv) and yields independent strata (vii). It is approximately PPS, in the sense of (ii) and (iii), a fact which is motivated by asymptotics and simulation in Ohlsson (1998). Even though sample coordination is not optimal, in terms of maximum and minimum expected overlap in (v) and (vi), respectively, we can expect the overlap not to be too far from the optimum of Poisson sampling. The case of positive coordination is investigated in the simulation study in Section 6.2.

Rosén (1997) gave a modification of SPS, called Pareto sampling (PAS), with transformed random numbers of odds ratio type

$$\xi_i = \frac{X_i/(1 - X_i)}{n p_i/(1 - n p_i)} \qquad (4)$$

The properties are similar to those of SPS, but PAS is somewhat closer to the target inclusion probabilities in (ii) and (iii). The closeness to the optimum in (v) is investigated in the simulation study below.

5. COMPARISON WITH OTHER PROCEDURES

5.1 The case n=1

In the literature, there are several (non-PRN) procedures for positive coordination (maximizing overlap) of two PPS size n=1 samples. The pioneering procedure by Keyfitz (1951) assumes the same stratification for both samples. Conditional on the first sample, Keyfitz' method focuses on the second sample. Suppose unit $i$ was selected in the first sample. Keyfitz' method retains this unit in the sample if it is increasing, i.e. $p'_i \ge p_i$. Otherwise, the unit is retained with probability $p'_i / p_i$. If the unit is rejected, one of the increasing units is selected with probability proportional to the increments $(p'_j - p_j)$. It is easily verified that this simple procedure is strictly PPS and is optimal in terms of maximizing expected sample overlap when we have the same stratification for both samples.

Keyfitz' procedure can be extended to the case with (somewhat) changing strata. We will describe this procedure for a single new stratum.
First identify an old stratum which will be considered as the predecessor of our new stratum. Units coming from other old strata, immigrants, are treated as births, i.e., they are assigned the value $p_i = 0$. Any first sample selection among immigrants is ignored. Then the original Keyfitz algorithm is applied to the new stratum, but the first two steps are only applied to a possible initial selection that is not an immigrant.

Kish and Scott (1971) note that this procedure can be far from optimal in terms of expected overlap unless we have very small differences in the stratification of the two samples. They provide three methods (besides the extended Keyfitz procedure) for the case with arbitrary stratification of the two samples. We shall consider only Method II, which is claimed by Kish and Scott to give the largest overlap of the three, without being very complicated. The procedure is an elaborate extension of Keyfitz' method. For a description, we refer to the original article. The procedure has the disadvantage of distorting the independence between the strata, i.e., it does not fulfil requirement (vii). As already noted, this implies that the procedure cannot be applied repeatedly to the same survey. Like Keyfitz' procedure, Kish and Scott's only concerns the second sample, so that (ii) is trivial. In their Section 6.2, Kish and Scott (1971) prove that the procedure fulfils (iii). The proof relies on the independence of the initial strata, though. A consequence of the dependence of the new strata is therefore that the procedure is not strictly valid for repeated use, i.e., (iii) is only valid the first time the method is applied. The expected overlap is quite close to optimum, see Section 6.1.
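
The basic Keyfitz update of this section, for n=1 and unchanged stratification, can be sketched as follows. This is only an illustration under stated assumptions: the function names and the three-unit example are hypothetical, both probability vectors are assumed to sum to 1, and births are represented by a missing or zero old probability. The second function computes the unconditional distribution of the second-sample unit exactly, to check that it equals the new draw probabilities.

```python
import random


def increments(p_old, p_new):
    """Increments p'_j - p_j of the increasing units."""
    return {j: p_new[j] - p_old.get(j, 0.0)
            for j in p_new if p_new[j] > p_old.get(j, 0.0)}


def keyfitz_second_draw(selected, p_old, p_new, rng):
    """One Keyfitz update, given the unit selected in the first sample."""
    old = p_old[selected]
    new = p_new.get(selected, 0.0)
    if new >= old or rng.random() < new / old:
        return selected                          # unit is retained
    # Rejected: draw among increasing units, proportional to the increments.
    gain = increments(p_old, p_new)
    return rng.choices(list(gain), weights=list(gain.values()), k=1)[0]


def second_sample_distribution(p_old, p_new):
    """Exact unconditional distribution of the unit in the second sample."""
    gain = increments(p_old, p_new)
    total_gain = sum(gain.values())
    dist = dict.fromkeys(set(p_old) | set(p_new), 0.0)
    for i, old in p_old.items():
        if old == 0.0:
            continue
        keep = min(1.0, p_new.get(i, 0.0) / old)
        dist[i] += old * keep
        if keep < 1.0:
            for j, g in gain.items():
                dist[j] += old * (1.0 - keep) * g / total_gain
    return dist


def expected_overlap(p_old, p_new):
    """Probability that both samples contain the same unit: sum of min(p, p')."""
    return sum(min(p_old[i], p_new.get(i, 0.0)) for i in p_old)


p_old = {"a": 0.5, "b": 0.3, "c": 0.2}           # hypothetical first occasion
p_new = {"a": 0.3, "b": 0.3, "c": 0.4}           # hypothetical second occasion
print(second_sample_distribution(p_old, p_new))
print(expected_overlap(p_old, p_new))
print(keyfitz_second_draw("a", p_old, p_new, random.Random(0)))
```

The retained mass is $\sum_i \min(p_i, p'_i)$, which is the upper limit (2) for n=1 within a single stratum.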

5.2 The case n>1

Causey, Cox and Ernst (1985) suggested a procedure which maximizes (or minimizes) the expected overlap, subject to the constraints of having the required target probabilities in both samples, i.e. conditions (ii) and (iii). The problem is solved by linear programming methods. By design, this procedure fulfils our requirements (ii)-(iii) and is optimal in terms of our choice of (v) or (vi). Ernst and Ikeda (1995) note two difficulties with the procedure which can make it unusable in practice. One is that (vii) is not fulfilled, with the same difficulty as for Kish and Scott in applying the procedure repeatedly for the same survey, especially if n>1. The second difficulty is that the transportation problem may be too large to solve in practice. In any case, (i) is not fulfilled. Ernst (1999) reviews several modifications of the Causey, Cox and Ernst procedure, none of which fulfils all our requirements.

Sunter (1989) presents an interesting procedure that is applicable for maximizing overlap of two samples with any sample size $n$. It is a generalization of Keyfitz' procedure. Like the latter, Sunter's procedure was not primarily designed for handling stratum changes and can be expected to give an overlap far from maximum when we have large stratum differences between the two samples.

6. NUMERICAL EXAMPLES

Below we report results from two numerical studies, one for n=1 and one for n=2 to 4.

6.1 The case n=1

The first study concerns expected overlap of different procedures for maximizing overlap in the case n=1. It is based on the so-called MU284 population of 284 Swedish municipalities presented in Appendix B of Särndal et al. (1992). See [...]. Draw probabilities are either equal or proportional to the number of inhabitants; for the first sample we use figures from 1975 (P75), and for the second sample those of 1985 (P85). First sample strata are either defined by the regional REG variable, giving 8 strata of sizes between 15 and 56, or by the CL variable, with 50 small strata with sizes between 5 and 8.
The expected overlap was computed exactly (no simulation) for PRN Exponential sampling (EXP), the extended Keyfitz (KEY), the Kish and Scott (K&S), and the Causey-Cox-Ernst (CCE) procedures. As benchmarks we use (a) the case without any overlap control, with both samples drawn independently (IND) and (b) the non-achievable upper limit (2), added over the strata. For further details on the study, see Ohlsson (1996).

Table 0. MU284 population. Expected overlap in eight regional (REG) strata. Unequal probabilities. 50% of the units move from each stratum to an adjacent stratum.
Columns: Stratum, $N_h$, IND, EXP, KEY, K&S, CCE, Limit. Summary rows: Sum, Percent.

For the remaining set-ups, we present no stratum details but just the sum in percent of the number of sampled units.

Table 1. n=1. MU284 population. Expected overlap in percent of total sample size. No changes in strata.
Columns: Strata, Prob, IND, EXP, KEY, K&S, CCE, Limit. Rows: Medium (REG), Unequal; Medium (REG), Equal; Small (CL), Unequal.

Table 2. n=1. MU284 population. Expected overlap in percent of total sample size. One unit changes stratum.
Columns: Strata, Prob, IND, EXP, KEY, K&S, CCE, Limit. Rows: Medium (REG), Unequal; Medium (REG), Equal; Small (CL), Unequal.

Table 3. n=1. MU284 population. Expected overlap in percent of total sample size. 50% of units change stratum.
Columns: Strata, Prob, IND, EXP, KEY, K&S, CCE, Limit. Rows: Medium (REG), Unequal; Medium (REG), Equal; Small (CL), Unequal.

Table 4. n=1. MU284 population. Expected overlap in percent of total sample size. One third of the units move to the next stratum, one sixth to the following.
Columns: Strata, Prob, IND, EXP, KEY, K&S, CCE, Limit. Rows: Medium (REG), Unequal; Medium (REG), Equal; Small (CL), Unequal.
Note: CCE (Causey-Cox-Ernst procedure) intractable in two strata. Unequal case not treated.

6.2 The case n>1

For n=2, 3, 4 we have conducted a simulation study of expected overlap for sequential Poisson sampling (SEP) and Pareto sampling (PAR). For n=2, we also consider exponential sampling (EXP), in the form using Brewer's recalculated draw probabilities. The benchmarks IND and Limit are as in the preceding section. We use data from a master frame of the Swedish National Road Administration. The statistical units are stretches of road and the size measure is derived from traffic mileage per year. The units are stratified according to region and type of road, altogether 28 strata with 2523 units. We let 10% of the units change strata between the two sampling occasions. A more detailed report of the study is given in Ohlsson (1999). The data are available at the address [...]. The simulations were run with [...] iterations.

Table 5. n=2. Road population. Expected overlap by stratum.
Columns: Stratum no., $N_h$, IND, EXP, SEP, PAR, Limit. Summary row: Sum.

For n=3 and 4 we only give the sum over strata.

Table 6. Road population. Expected overlap, aggregated over strata. 10% of units change stratum.
Columns: Sample size, IND, EXP, SEP, PAR, Limit. Rows: one per sample size n.

6.3 Inclusion probabilities

The PRN techniques for n>1 are only approximately (asymptotically) PPS. Numerical studies indicate that the approximation is very good in many situations. Ohlsson (1990, 1998) reports a simulation study on CPI data, where SEP is very close to being unbiased. Rosén (1998) studies exact actual inclusion probabilities (AIP) for SEP and PAR in an artificial, but nevertheless interesting, situation, viz. when all units have the same inclusion probability except for one odd unit. Even with sample sizes as small as n=2, the relative error in the AIP for PAR is never larger than 2% and in most cases it is much smaller. SEP is a bit further from the target probability. The AIPs were also computed in the simulation study mentioned in Section 6.2. The results are reported in Ohlsson (1999). Here again PAR is quite close to the unbiased case, with SEP performing slightly worse.
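
The comparison of actual and target inclusion probabilities discussed in this section can be illustrated with a small Monte Carlo sketch. It is not the study reported above: the six-unit population, the number of replicates, the seed and the function names are all hypothetical. The ranking variables are those of sequential Poisson sampling, $\xi_i = X_i/p_i$, and of Pareto sampling, equation (4), and the sketch assumes $n p_i < 1$ for all units.

```python
import random


def order_sample(prn, p, n, method):
    """Select the n units with the smallest transformed random numbers."""
    if method == "sps":
        xi = {i: prn[i] / p[i] for i in p}
    elif method == "pareto":
        xi = {i: (prn[i] / (1.0 - prn[i])) / (n * p[i] / (1.0 - n * p[i]))
              for i in p}
    else:
        raise ValueError(f"unknown method: {method}")
    return set(sorted(xi, key=xi.get)[:n])


def simulated_inclusion(p, n, method, reps, seed):
    """Monte Carlo estimate of the actual inclusion probabilities."""
    rng = random.Random(seed)
    counts = dict.fromkeys(p, 0)
    for _ in range(reps):
        prn = {i: rng.random() for i in p}       # fresh numbers per replicate
        for i in order_sample(prn, p, n, method):
            counts[i] += 1
    return {i: counts[i] / reps for i in p}


sizes = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 5}   # hypothetical sizes
total = sum(sizes.values())
p = {i: v / total for i, v in sizes.items()}
n = 2
for method in ("sps", "pareto"):
    est = simulated_inclusion(p, n, method, reps=20000, seed=7)
    worst = max(abs(est[i] - n * p[i]) for i in p)
    print(method, round(worst, 3))               # largest absolute deviation
```

For positive coordination, the same PRN would be passed to `order_sample` on both occasions with updated `p` and `n`, as described for Poisson sampling.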

7. CONCLUSIONS

We first consider the problem of maximizing overlap, which is the concern of the numerical studies. Using any of the procedures under consideration gives a great increase in (expected) sample overlap, as opposed to drawing independent samples (IND). For n=1, the Kish and Scott procedure is quite close to the optimal (in this respect) Causey, Cox and Ernst procedure. When there are great differences between the strata of the two samples, Keyfitz' procedure is rather far from the maximum expected overlap. Exponential sampling is a bit away from the optimum, but much less so than Keyfitz. Since K&S and CCE both suffer from the problem of dependent strata, i.e. violate (vii), and since the former does not fulfil (vi) and the latter violates (i), we consider Exponential sampling a good compromise in the search for a procedure that fulfils (i)-(iii) and (v)-(vii) as far as possible. Of the mentioned procedures, Exponential sampling and CCE are the only ones that can be used for minimizing overlap (vi).

Turning to the case n>1, we concluded in Section 5.2 that CCE can be unusable in practice and that Sunter's procedure can be expected to give a low sample overlap when there are great differences in stratification between the two samples. This leaves us with the PRN procedures, which are all equal in sample overlap in our simulation studies. Since PAR is a bit closer to the right inclusion probabilities it is generally preferable to SEP. EXP is an alternative for n=2, being strictly unbiased, but it suffers from a more complicated variance estimation procedure.

In any case, the various PRN techniques investigated here are simple and efficient for simultaneous negative and positive sample coordination of any number of surveys, with any draw probabilities and any stratification. When n>1, they all allow for variance estimation, and all but Exponential have an associated single-sum variance estimator. In summary, the PRN techniques fulfil (i)-(vii) even though they do not strictly maximize/minimize the expected sample overlap. The overall conclusion is that PRN procedures, notably
Exponential sampling for n=1 and maybe n=2, and Pareto sampling for n>1, are competitive as procedures for controlling sample overlap.

8. REFERENCES

Aires, N. (1999). Comparisons Between Conditional Poisson Sampling and Pareto πps Sampling Designs, Contributed paper, Bulletin of the International Statistical Institute, 52nd Session.
Ålenius, M. (1990). Storleksstratifiering eller pps-urval? En jämförande studie för KPI-data, F-METOD NR 25, Statistics Sweden. (In Swedish.)
Bailey, J. T. and Kott, P. S. (1997). An Application of Multiple List Frame Sampling for Multi-Purpose Surveys, ASA Proceedings of the Section on Survey Research Methods.
Brewer, K. R. W., Early, L. J. and Joyce, S. F. (1972). Selecting Several Samples from a Single Population, Australian Journal of Statistics, 14.
Brewer, K. R. W. and Hanif, M. (1983). Sampling With Unequal Probabilities, New York: Springer.
Causey, B. D., Cox, L. H. and Ernst, L. R. (1985). Application of Transportation Theory to Statistical Problems, Journal of the American Statistical Association, 80.
Cochran, W. G. (1977). Sampling Techniques, 3rd ed., New York: John Wiley.
Ernst, L. R. (1999). The Maximization and Minimization of Sample Overlap Problems: A Half Century of Results, Invited Paper, Bulletin of the International Statistical Institute, 52nd Session.
Ernst, L. R. and Ikeda, M. M. (1995). A Reduced-Size Transportation Algorithm for Maximizing the Overlap Between Surveys, Survey Methodology, 21.

Keyfitz, N. (1951). Sampling with Probabilities Proportional to Size, Journal of the American Statistical Association, 46.
Kish, L. and Scott, A. (1971). Retaining Units After Changing Strata and Probabilities, Journal of the American Statistical Association, 66.
McKenzie, R. and Gross, B. (1999). Synchronized Sampling at the Australian Bureau of Statistics. In this volume.
Ohlsson, E. (1990). Sequential Poisson Sampling from a Business Register and its Application to the Swedish Consumer Price Index, R & D Report 1990:6, Statistics Sweden.
Ohlsson, E. (1995). Coordination of Samples Using Permanent Random Numbers. In Business Survey Methods, New York: Wiley.
Ohlsson, E. (1996). Methods for PPS Size One Sample Coordination, Research Report No. 194, Institute of Actuarial Mathematics and Mathematical Statistics, Stockholm University.
Ohlsson, E. (1998). Sequential Poisson Sampling, Journal of Official Statistics, 14.
Ohlsson, E. (1999). Methods for PPS Size One Sample Coordination, Research Report No. 210, Institute of Actuarial Mathematics and Mathematical Statistics, Stockholm University.
Rosén, B. (1997b). On Sampling with Probability Proportional to Size, Journal of Statistical Planning and Inference, 62.
Rosén, B. (1998). On Inclusion Probabilities for Order Sampling, R & D Report 1998:2, Statistics Sweden. To appear in Journal of Statistical Planning and Inference?
Royce, D. (1999). Issues in Co-ordinated Sampling at Statistics Canada. In this volume.
Sampford, M. R. (1967). On Sampling Without Replacement with Unequal Probabilities of Selection, Biometrika, 54.
Särndal, C. E. (1996). Efficient Estimators With Simple Variance in Unequal Probability Sampling, Journal of the American Statistical Association, 91.
Särndal, C. E., Swensson, B. and Wretman, J. (1992). Model Assisted Survey Sampling, New York: Springer-Verlag.
Srinath, K. P. and Carpenter, R. M. (1995). Sampling Methods for Repeated Business Surveys. In Business Survey Methods, New York: Wiley.
Sunter, A. B. (1989). Updating Size Measures in a PPSWOR Design, Survey Methodology, 15.

DISCUSSION OF SESSION 31: COORDINATING SAMPLING BETWEEN AND WITHIN SURVEYS

Lawrence R. Ernst, U.S. Bureau of Labor Statistics
BLS, 2 Massachusetts Ave., N.E., Room 3160, Washington, DC, U.S.A.
ernst_l@bls.gov

1. CATEGORIZATION OF THE THREE PAPERS

Although there are three papers in this session, I view them as fitting into two categories of procedures for controlling sample overlap. The major focus of Ohlsson's paper is on an overlap procedure that he developed, exponential sampling, which is in a class of procedures most commonly used in selecting PSUs for household surveys, although some establishment surveys also use PSUs. Perhaps the key characteristic of such procedures, which originated with Keyfitz (1951), is that the sample size per stratum, n, is always small, typically either 1 or 2, and we consequently designate these procedures as S procedures.

The following are common characteristics of S procedures. For a given stratum, n is predetermined, although in some designs n may vary by stratum. Sample units are selected pps. If n > 1, the joint selection probabilities within a stratum are predetermined. Most S procedures, including Ohlsson's procedure, although not all, allow for different stratifications in the designs being overlapped. These procedures have been developed for overlap maximization and/or minimization, but not partial rotation. S procedures generally do not use PRNs, with Ohlsson's procedure the only exception that I am aware of among procedures that strictly preserve the desired selection probabilities. (We restrict the discussion to such procedures.) Finally, S procedures typically overlap only two samples at a time, with again Ohlsson's procedure an exception.

The McKenzie and Gross (M&G) and Royce papers, in contrast, discuss overlap procedures typically used for selecting establishments from a stratified list frame, with a key characteristic that these procedures must be capable of overlapping samples for which n is large. Consequently, we designate these procedures as L procedures.
In L procedures, n for a stratum may be either predetermined, as in synchronized sampling of the M&G paper, or variable, as in all the Statistics Canada (STC) procedures described in the Royce paper. The selection of the sample units is commonly, as in both of these papers, although not exclusively, with equal probability. Joint selection probabilities are generally neither predetermined nor calculated. Typically L procedures do not attempt to control overlap for units changing strata. This is because, in addition to the extra complexity of attempting to do so, the most common stratification change is the occasional establishment changing size class. Both of these papers are exceptions because they do discuss procedures that control overlap with restratification. In fact, this is a key issue in the M&G paper. L procedures are used for overlap maximization, minimization, and partial overlap, and all three applications are discussed in both of these papers. L procedures commonly use PRNs, with the procedure used by the Canadian Monthly Wholesale and Retail Trade Survey the only exception in these two papers. Some L procedures are applicable to the overlap of more than two surveys at a time.

2. OHLSSON PAPER

I consider the procedure described in Ohlsson's paper to be highly innovative and an extremely important contribution to controlling overlap. Before discussing other details of the features of this procedure, let me mention one important feature that is not mentioned in this paper, but is described in Ohlsson (1996). Suppose an initial sample has been chosen, that is one not overlapped with a previous sample, without using Ohlsson's procedure. Although it might appear to be too late then to overlap a subsequent sample with this initial sample using his procedure, it is shown in Ohlsson (1996) that for n = 1, PRNs can be assigned retrospectively, conditioned on the initial sample, and the Ohlsson procedure then applied to subsequent samples with all the properties of the procedure remaining unchanged. It is not presently known whether this result can be extended to n > 1.
Ohlsson's procedure, like all overlap procedures, has both advantages and disadvantages. The short list of disadvantages will be mentioned first. It is the only S overlap procedure that I am aware of that, in the case when n = 1 and two samples with identical stratification are overlapped, does not yield the optimal overlap. All other S procedures reduce to Keyfitz's (1951) procedure in that case. In particular, Keyfitz's procedure always retains in the new sample any sample unit in the initial design that has a selection probability at least as large in the new design as in the initial design. However, with Ohlsson's procedure, such a unit can be replaced in the new sample by a unit with a selection probability that has increased by a larger percentage in the new design. In addition, there are limitations on the use of this procedure. If n > 1 and Ohlsson's procedure was not used in selecting the initial sample, then it cannot be used to select subsequent samples, since the procedure for retrospectively assigning PRNs mentioned above has only been developed for the case n = 1. In addition, regardless of n, if another overlap procedure had previously been used that destroyed the independence of sampling from stratum to stratum, then Ohlsson's procedure cannot be used.

Most of the advantages of Ohlsson's procedure particularly apply when the surveys being overlapped have different stratifications, so we will confine our discussion to this case, which is the more common case in practice when n is small. Ohlsson's procedure is quite simple to implement. This is particularly noteworthy when n > 1, since most alternative procedures, including Causey, et al. (1985), Ernst (1986), and Ernst and Ikeda (1995), employ linear programming algorithms, which are generally not simple to implement. Also, and this is the key point of the procedure, it preserves the independence of sampling from stratum to stratum. Overlap procedures that require the same stratifications in the surveys overlapped also automatically satisfy this independence. Besides these procedures, the only other procedures that I am aware of that preserve this independence do not predetermine the sample size and hence are not S procedures. In addition to the advantages in variance estimation, Ohlsson's procedure can be applied repeatedly as a result of this independence property. Among alternative procedures, some, such as Kish and Scott (1971), Causey, et al. (1985), and Ernst and Ikeda (1995), can only be used once, since they assume this independence; while others, such as Perkins (1970) and Ernst (1986), which can be used repeatedly, tend to yield an overlap that is further from optimal. To illustrate the resulting problems, consider the selection of the PSUs for the U.S. Census Bureau's Survey of Income and Program Participation. This survey is redesigned every 10 years.
The PSUs were selected independently in the 1980s redesign. For the 1990s selection, the sample was overlapped with the 1980s sample using the Ernst and Ikeda (1995) procedure. For the 2000s redesign, since the Ernst and Ikeda procedure cannot be used again, the Ernst (1986) procedure will be used, which should not produce as large an overlap. It might have been better to have used Ohlsson's procedure throughout.

It would be interesting to empirically compare the overlap produced by Ohlsson's procedure to that produced by the Ernst (1986) procedure, which produces the best overlap among those procedures that can be used repeatedly because they do not assume independence from stratum to stratum. I suspect that the Ernst procedure would be superior when the surveys overlapped have similar stratifications, since the Ohlsson procedure is not optimal when the stratifications are identical. However, when the stratifications are very different, Ohlsson's procedure may be superior, since it is known that the Ernst procedure does not generally produce an overlap that is close to optimal under this condition.

3. MCKENZIE AND GROSS PAPER

I have always been impressed with the general approach in synchronized sampling of moving the endpoints to the right while keeping the sample size fixed, which prevents units that leave the sample at one time period as a result of more births than deaths from reentering the next time period as a result of more deaths than births. The focus in this paper, however, is not on synchronized sampling in general, but on maximizing or minimizing overlap with another stratification of the same survey or with one or more other surveys. As the authors note, the overlap attained using their procedure is far from optimal because of the partial intervals problem. The only way that I am aware of to avoid this problem without a transformation of the PRNs is to have the new sample consist of several partial intervals with only units in each partial interval from the corresponding old stratum included, if possible, rather than a single interval with all units included.
I suspect that this solution would cause too many operational problems to be seriously considered.

The fact that there is a bias with their procedure when the starting point of the selection interval is the last point that produces the maximum score is illustrated by the following simple example. Suppose a stratum in the new design consists of two complete old strata, A and B, with n = 1 for the new stratum and both old strata, and with A consisting of more units than B. Then the probability is .5 that the last point that produces the maximum score is the sample point in stratum A and the probability is .5 that it is the sample point in stratum B. Consequently, the selection of the sample unit in the new stratum would not be with equal probability, but instead each unit that was in stratum B would have a higher selection probability than each unit that was in stratum A. I agree, as mentioned in M&G, that the use of the point after the last occurrence of the maximum score rather than the point of the last occurrence as the starting point of the selection interval tends to reduce the size of the misallocation, although I believe this whole issue is complex and needs further study. Furthermore, for very small n, particularly for n = 1, this choice of starting point may reduce the effectiveness of the overlap. To illustrate, consider the above example with the point after the last occurrence of the maximum score as the starting point. Then the sample unit in the new stratum would only have been a sample unit in the old design if the stratum A sample unit and stratum B sample unit are the first and last points in the new stratum, an event with probability lower than the probability of overlap when the sample unit in the new stratum is selected independently of the selection of the sample units in the old strata.

The avoidable load measure used in the empirical study is interesting. However, it is a measure of high load, while the maximum score used in the overlap procedure attempts to minimize a somewhat different measure. For example, selecting two units, one with a high load and one with a low load, may result in a lower score but a higher contribution to the avoidable load than selecting two units with medium loads. I believe it also would be worthwhile to compare the avoidable load obtained in the empirical study using their overlap procedure to the avoidable load obtained selecting the samples for these 12 surveys independently.

4. ROYCE PAPER

Although several approaches for sample coordination are discussed in this paper, STC appears to be standardizing for within survey sample coordination around GSAM, which uses collocated random numbers (CRNs) that have at least two advantages over standard PRNs due to the equal spacing of the selection numbers. First, CRNs help reduce the variability of the sample size when using a fixed selection interval, although, particularly because of births and deaths, they do not completely eliminate it. In addition, an attempt to minimize overlap among surveys by assigning each survey different starting points an appropriate distance apart cannot fail with CRNs, as it can with PRNs if most of the PRNs are clustered close to each other. CRNs also have disadvantages in comparison with PRNs. CRNs cannot be assigned on a flow basis, which is why they could not be used in the Tax Estimates Program.
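The effect of equal spacing on sample size can be illustrated with a small sketch. It assumes the usual collocated construction, in which the units of a stratum are given random ranks R_i and a single uniform offset e, with selection number (R_i - e) / N for a stratum of N units; GSAM's actual implementation is not described here, and the function names and parameters are hypothetical. The population is held static, so the births and deaths mentioned above are not modelled.

```python
import random


def collocated_numbers(n_units, rng):
    """Assumed CRN construction: random ranks and one shared uniform offset."""
    ranks = list(range(1, n_units + 1))
    rng.shuffle(ranks)
    eps = rng.random()
    return [(r - eps) / n_units for r in ranks]


def permanent_numbers(n_units, rng):
    """Standard PRNs: independent uniform numbers, one per unit."""
    return [rng.random() for _ in range(n_units)]


def select(numbers, start, length):
    """Units whose number lies in [start, start + length), wrapping around at 1."""
    return [i for i, x in enumerate(numbers) if (x - start) % 1.0 < length]


if __name__ == "__main__":
    rng = random.Random(2024)
    n_units, length = 10, 0.25  # hypothetical stratum size and interval length
    crn_sizes, prn_sizes = set(), set()
    for _ in range(200):
        start = rng.random()
        crn_sizes.add(len(select(collocated_numbers(n_units, rng), start, length)))
        prn_sizes.add(len(select(permanent_numbers(n_units, rng), start, length)))
    print("sample sizes seen with CRNs:", sorted(crn_sizes))
    print("sample sizes seen with PRNs:", sorted(prn_sizes))
```

Because the collocated numbers are exactly 1/N apart, a fixed interval of length f captures either the integer part of fN or one more unit in a static stratum, whereas the count under independent PRNs varies binomially. The same construction shows why each number depends on N.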
A further disadvantage is that coordinating surveys with different stratifications, or restratifying, can be more complicated with CRNs, as noted by Ohlsson (1995), since the CRN for a unit is a function of the number of units in the stratum.

One of the important features of GSAM is a procedure for maximizing overlap when restratifying a survey. GSAM accomplishes this while avoiding the partial intervals problem discussed in the M&G paper and thus generally produces a larger overlap. This is done by assigning new selection numbers to the units in a new stratum in a manner that clusters together as much as possible the units that were in sample under the old stratification. Using this approach for coordinating among different surveys might lead to complications, however, because the same unit would have different selection numbers for different surveys.

An alternative to the procedure described for minimizing the overlap between the crops and livestock surveys would be to choose appropriate starting points for the selection intervals for the two surveys and move both intervals to the right. For example, if the sample for one survey was three times larger than the sample for the other in a stratum, then the selection interval for the larger sample could start at 0 and for the smaller sample at [...]. This alternative should result in a longer time period before the samples could overlap. However, I understand that this alternative was among those considered by STC, but that it produced a larger overlap than the procedure adopted.

The procedure described for reducing respondent burden in SEPH for multi-establishment enterprises results in biased estimates. In particular, I believe there may be a large underallocation of establishments in large enterprises because of this procedure.

The network sampling approach used in UES is quite interesting. It complicates estimation and variance estimation, however, as discussed in Simard and Hidiroglou (1999).

5. REFERENCES NOT LISTED IN SESSION PAPERS

Ernst, L. R. (1986), Maximizing the Overlap Between Surveys When Information Is Incomplete, European Journal of Operational Research, 27, pp.

Perkins, W. M. (1970), 1970 CPS Redesign: Proposed Method for Deriving Sample PSU Selection Probabilities Within 1970 NSR Strata, memorandum to Joseph Waksberg, Washington, DC: U.S. Bureau of the Census.

Any opinions expressed in this discussion are those of the author and do not constitute policy of BLS.


Maximizing Overlap of Large Primary Sampling Units in Repeated Sampling: A Comparison of Ernst's Method with Ohlsson's Method

Reid Rottach and Padraic Murphy, U.S. Census Bureau, 4600 Silver Hill Road, Washington DC