A Bound for the Relative Bias of the Design Effect

Alberto Padilla
Banco de México

Abstract

Design effects are typically used to compute sample sizes or standard errors from complex surveys. In this paper, we show that the design effect estimator is biased, and we present an upper bound for its relative bias. A simulation study was conducted to assess the size of the bound for the relative bias with samples drawn from two artificially generated populations using stratified and two-stage random sampling.

Key Words: Variance of variances, Confidence interval, Coefficient of variation

1. Introduction

The design effect, deff (Kish, 1965), is defined as the ratio of the variance of an estimator under a specific design to the variance of the estimator under simple random sampling without replacement, srswor. The estimator of the design effect is used, for example, in the computation of the sample size for complex sample designs and to build confidence intervals. In this article we exhibit the bias of the design effect estimator and a bound for its relative bias, that is, for the ratio of the bias to the standard error of the design effect estimator.

2.1 Definition. Design Effect Estimation and Bias

It is worth mentioning that all results are based on the design-based approach for the planning stage of a survey. The design effect, deff (Kish, 1965), is defined as the ratio of the variance of an estimator under a specific design $p$ different from simple random sampling, $v_p(\hat y)$, to the variance of the estimator under simple random sampling without replacement, $v_{srswor}(\hat y)$:

$$deff_p(\hat y) = \frac{v_p(\hat y)}{v_{srswor}(\hat y)}$$

The design effect estimator, $\widehat{deff}$ (Kish, 1965), is computed by plugging estimators of the variances into both the numerator and the denominator of the formula above. In particular, the variance estimator under srswor is generally obtained with the formula:

$$\hat v_{srswor}(\hat y) = \left(1 - \frac{n}{N}\right)\frac{1}{n}\,\frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar y\right)^2$$

where $\bar y$ denotes the mean of the $n$ sampled values. In this formula the sample is treated as if it were an srswor, which does not guarantee an unbiased estimation of the population variance under srswor. An example with a small population will shed light on this point.

Remark: we use the term relative bias for the ratio of the bias to the standard error, for which an upper bound is given.

2.1.1 Example 1. Design Effect Estimation and Bias

Table 1: Small Stratified Population

Stratum      Values $y_{hj}$           Population mean $\bar y_h$   Variance $S^2_{hU}$
1            {2, 3, 4.5, 2.5, 3.4}     3.08                         0.91
2            {11, 14, 18}              14.33                        12.33
Population                             7.30                         37.96

A simple random sample without replacement, srswor, of size 2 was drawn from each stratum. There are 30 possible samples under stratified random sampling, strs, and 70 under srswor. The population variances of the mean under the two sampling designs are $v_{srswor}(\hat y) = 4.75$ and $v_{st}(\hat y_{st}) = 0.395$, respectively. The design effect for this population is:

$$deff(\hat y_{st}) = \frac{v_{st}(\hat y_{st})}{v_{srswor}(\hat y)} = 0.083$$

As mentioned above, Kish (1965) defined the estimator of deff for each sample as:

$$\widehat{deff}_{K,i}(\hat y_{st}) = \frac{\hat v_{strs,i}(\hat y_{st})}{\hat v_{biased,srswor,i}(\hat y)}, \qquad i = 1, \dots, 30,$$

with

$$\hat v_{biased,srswor,i}(\hat y) = \left(1 - \frac{4}{8}\right)\frac{\sum_{h}\sum_{j}\left(y_{hj} - \bar y\right)^2}{(4-1)\,4}$$

where the sum runs over the 4 sampled elements and $\bar y$ is their mean. In this case,

$$\frac{1}{30}\sum_{i=1}^{30}\hat v_{biased,srswor,i}(\hat y) = 5.93,$$

which is different from $v_{srswor}(\hat y) = 4.75$. The average over all possible samples under stratified random sampling is

$$\frac{1}{30}\sum_{i=1}^{30}\widehat{deff}_{K,i}(\hat y_{st}) = 0.067 \neq deff(\hat y_{st}) = 0.083$$

This result is not surprising, since the estimator $\widehat{deff}_K$ is a ratio estimator, which is known to be biased (Cochran, 1977). It also shows the need to modify the estimator used in the denominator, using the formula proposed by Gambino (2009), to obtain an unbiased estimator:

$$\hat v_{unbiased,srswor}(\hat y) = \left(1 - \frac{n}{N}\right)\frac{1}{n\,(N-1)}\left(\hat y_{sq} - \frac{\hat y^2 - \hat v(\hat y)}{N}\right)$$

where $\hat y_{sq}$, $\hat y^2 - \hat v(\hat y)$ and $\hat v(\hat y)$ are unbiased estimators of the following population quantities, respectively: the sum of squares, the total squared, and the variance of the estimated total under the design different from srswor. Hereinafter, $\widehat{deff}_G$ will denote the deff estimator using Gambino's correction. Using Gambino's correction for the estimation of $\widehat{deff}_G$ we obtain:

$$\frac{1}{30}\sum_{i=1}^{30}\widehat{deff}_{G,i}(\hat y_{st}) = 0.084 \neq deff(\hat y_{st}) = 0.083$$

This estimator remains biased, but the bias now stems only from the use of a ratio estimator.
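
The enumeration in Example 1 is small enough to be checked directly. The following is a minimal sketch, assuming the stratum values {2, 3, 4.5, 2.5, 3.4} and {11, 14, 18}, 2 units drawn per stratum, and a Gambino-type unbiased estimator of the element variance. The function names (`v_strs_hat`, `v_biased`, `v_gambino`) are hypothetical and are not taken from the paper. If the estimators are coded as assumed here, the printed averages can be compared with the values 5.93, 0.067 and 0.084 reported in the example.

```python
from itertools import combinations
from statistics import mean, variance

# Assumed population of Example 1: two strata, 2 units sampled per stratum.
strata = [[2, 3, 4.5, 2.5, 3.4], [11, 14, 18]]
n_h = 2
N = sum(len(U) for U in strata)   # population size, 8
n = n_h * len(strata)             # total sample size, 4


def v_strs_hat(sample):
    """Unbiased strs estimator of the variance of the stratified mean."""
    return sum((len(U) / N) ** 2 * (1 - n_h / len(U)) * variance(s) / n_h
               for U, s in zip(strata, sample))


def v_biased(sample):
    """Plug-in srswor variance: the pooled sample is treated as an srswor."""
    pooled = [y for s in sample for y in s]
    return (1 - n / N) * variance(pooled) / n


def v_gambino(sample):
    """srswor variance built from an unbiased estimate of S^2 (Gambino-type)."""
    y_sq = sum(len(U) / n_h * sum(y * y for y in s)
               for U, s in zip(strata, sample))        # estimates sum of squares
    total = sum(len(U) * mean(s) for U, s in zip(strata, sample))
    v_total = N ** 2 * v_strs_hat(sample)              # variance of the total
    s2_hat = (y_sq - (total ** 2 - v_total) / N) / (N - 1)
    return (1 - n / N) * s2_hat / n


# Population quantities.
all_y = [y for U in strata for y in U]
v_srswor = (1 - n / N) * variance(all_y) / n
v_st = sum((len(U) / N) ** 2 * (1 - n_h / len(U)) * variance(U) / n_h
           for U in strata)
deff = v_st / v_srswor

# All possible stratified samples (10 x 3 = 30).
samples = [(s1, s2)
           for s1 in combinations(strata[0], n_h)
           for s2 in combinations(strata[1], n_h)]

avg_biased = mean(v_biased(s) for s in samples)
avg_unbiased = mean(v_gambino(s) for s in samples)
avg_deff_K = mean(v_strs_hat(s) / v_biased(s) for s in samples)
avg_deff_G = mean(v_strs_hat(s) / v_gambino(s) for s in samples)

print(f"v_srswor={v_srswor:.3f}  v_st={v_st:.3f}  deff={deff:.3f}")
print(f"average biased srswor variance   = {avg_biased:.3f}")
print(f"average unbiased srswor variance = {avg_unbiased:.3f}")
print(f"average deff_K = {avg_deff_K:.3f}  average deff_G = {avg_deff_G:.3f}")
```

Full enumeration is used instead of simulation because the sample space has only 30 points, so the averages are exact expectations under the design.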

2.1.2 Bounds for the Bias and Relative Bias of the Deff

Theorem: for a sample design $p$ different from srswor, with variance estimators $\hat v_p$ and $\widehat{mse}_{srswor}$, and $\widehat{mse}_{srswor} > 0$, we have:

$$E(\widehat{deff}) = \frac{E(\hat v_p) - cov(\widehat{mse}_{srswor}, \widehat{deff})}{v_{srswor} + bias(\widehat{mse}_{srswor})}$$

where $\widehat{deff} = \hat v_p / \widehat{mse}_{srswor}$ and $bias(\widehat{mse}_{srswor}) = E(\widehat{mse}_{srswor}) - v_{srswor}$.

Corollary 1: under a sample design $p$ different from srswor, with unbiased estimators of the population variances under $p$ and srswor, and using $\widehat{deff}_G$, the bias is given by:

$$E(\widehat{deff}_G) - deff = -\frac{cov(\widehat{deff}_G, \hat v_{unbiased})}{v_{srswor}}$$

Remark: in this expression, $\hat v_{unbiased}$ is computed with Gambino's formula (2009).

Corollary 2: a bound for the relative bias of $\widehat{deff}_G$ is given by $cv(\hat v_{unbiased})$, i.e., the coefficient of variation of the unbiased estimator of the population variance under srswor.

Remark: we are working with the expression $\widehat{deff}_G$, which is different from $\widehat{deff}_K$. The latter is the quantity routinely employed in practice.

2.1.3 Example 2. Design Effect Estimation and Bias: stratified random sampling

Based on the example in Cochran (1977), page 137, we simulated a small population with 3 strata and 57 elements.

Table 2: Simulated Stratified Population

Strata       $N_h$   $n_h$   $W_h$   $\bar y_h$   $S^2_{hU}$
1            13      9       0.22    2.33         1.62
2            18      7       0.32    1.61         0.08
3            26      6       0.46    5.04         1.18
Population   57      22                           3.44

Population quantity   $v_{srswor}$   $v_{strs}$   deff
Value                 0.096          0.035        0.364
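
The bound $cv(\hat v_{unbiased})$ of Corollary 2 can be traced to a standard covariance identity for ratios. The lines below are a sketch of that argument and not necessarily the author's proof. The shorthand $r$, $x$, $\sigma_r$, $\sigma_x$ and $\rho_{r,x}$ is introduced here for brevity and is not the paper's notation. It assumes that both variance estimators are unbiased and that $x > 0$.

```latex
% Assumed shorthand: r = \widehat{deff}_G = \hat v_p / x,  x = \hat v_{unbiased},
% with E(\hat v_p) = v_p and E(x) = v_{srswor}.
\begin{align*}
v_p = E(r\,x) &= E(r)\,E(x) + \operatorname{cov}(r, x) \\
\Rightarrow\quad E(r) - \frac{v_p}{v_{srswor}}
  &= -\frac{\operatorname{cov}(r, x)}{v_{srswor}} \\
\frac{\lvert E(r) - deff \rvert}{\sigma_r}
  &= \frac{\lvert \rho_{r,x} \rvert\,\sigma_x}{v_{srswor}}
  \;\le\; \frac{\sigma_x}{v_{srswor}} = cv(x)
\end{align*}
```

The last step uses only $\lvert \rho_{r,x} \rvert \le 1$, so the bound depends on the stability of the srswor variance estimator and not on the design-based numerator.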

Remark: in Table 2, the labels for columns 2 to 6 refer to population size, sample size, relative size, stratum mean and element variance within strata.

From the population defined above, we simulated the extraction of 5,000 samples of size 22 under strs, and for each sample we computed the following estimators: (i) the unbiased estimator of the variance under strs, $\hat v_{strs}$; (ii) the biased estimator of the variance under srswor, using Kish's definition, $\hat v_{biased}$; (iii) the unbiased estimator of the variance under srswor, using Gambino's correction, $\hat v_{unbiased}$; (iv) the deff estimator using Kish's formula, $\widehat{deff}_K$; and (v) the deff estimator using Gambino's correction, $\widehat{deff}_G$.

Results for the 5,000 samples for each estimator:

Estimator              Value     Bias (%)
$\hat v_{unbiased}$    0.0964    ---
$\hat v_{biased}$      0.083     -13.4%
$\hat v_{strs}$        0.035     ---
$\widehat{deff}_K$     0.4318    18.7%
$\widehat{deff}_G$     0.3724    2.4%

The bound for the relative bias, $cv(\hat v_{unbiased})$, computed with $\hat v_{unbiased}$ from the simulation, was 18.4%.

2.1.4 Example 3. Design Effect Estimation and Bias: two-stage cluster sampling

We have a small population with 8 clusters or primary sampling units, PSU, and each PSU has 8 elements or secondary sampling units, SSU. At the first stage we draw a=3 PSU, and within each selected PSU we draw b=4 SSU, so the sample size is n=ab=12 elements. The values within each cluster were simulated using uniform random variables. The minimum and maximum employed in the simulation are shown in the next table.

PSU          $\bar y$   $s^2$     min & max
1            0.33*      0.0058    0.2 and 0.5
2            0.444      0.0009    0.4 and 0.5
3            1.37*      0.0094    1.1 and 1.4
4            0.919      0.0037    0.8 and 1.0
5            0.3*       0.0064    0.1 and 0.35
6            0.610      0.0030    0.5 and 0.7
7            0.970      0.0044    0.9 and 1.1
8            0.461      0.0077    0.3 and 0.6
Population   0.650      0.1166

* One or more digits of this entry are illegible in the source; the value is shown as extracted.

The values in columns 2 and 3 refer to the within-cluster mean and variance.

Population quantity   $v_{srswor}$   $v_{clus}$   deff
Value                 0.0079         0.0212       2.6877

In this table $v_{clus}$ is the population variance under two-stage random sampling. The intraclass correlation for this population is 0.95 and was computed using the result from Cochran (1977), page 91.

We simulated the extraction of 3,500 samples of size 12, with a=3 PSU selected by srswor and b=4 SSU selected by srswor. For each sample we computed the following estimators: (i) the unbiased estimator of the variance under clus, $\hat v_{clus}$; (ii) the biased estimator of the variance under srswor, using Kish's definition, $\hat v_{biased}$; (iii) the unbiased estimator of the variance under srswor, using Gambino's correction, $\hat v_{unbiased}$; (iv) the deff estimator using Kish's formula, $\widehat{deff}_K$; and (v) the deff estimator using Gambino's correction, $\widehat{deff}_G$.

The results for the 5,000 samples for each estimator are shown below:

Estimator              Average of estimators   Relative bias
$\hat v_{unbiased}$    0.0080                  ---
$\hat v_{biased}$      0.0066                  -16.13%
$\hat v_{clus}$        0.069*                  ---
$\widehat{deff}_K$     3.9037                  45.24%
$\widehat{deff}_G$     3.2467                  20.80%

* One or more digits of this entry are illegible in the source; the value is shown as extracted.

The bound for the relative bias, $cv(\hat v_{unbiased})$, computed with $\hat v_{unbiased}$ from the simulation, was 63.44%.

2.1.5 Example 4. Design Effect Estimation and Bias: two-stage cluster sampling with several roh values

Using the same population as in Example 3 and exchanging elements between clusters, we repeated the simulations of Example 3 in order to obtain different values of the bound for the relative bias, and to compute the formula used in practice, deff ≈ [1+roh(b-1)], and compare it to the population deff. With this exchange between clusters the population mean and the variance between elements were unaffected, but roh and deff changed.
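
The practical approximation deff ≈ [1+roh(b-1)] mentioned above is a one-line computation. The sketch below evaluates it for b=4, the number of SSU per PSU in Example 3. The function name `kish_approx_deff` and the roh inputs in the loop are illustrative choices, not part of the paper.

```python
def kish_approx_deff(roh: float, b: int) -> float:
    """Approximate design effect 1 + roh*(b - 1) for b sampled elements per cluster."""
    return 1.0 + roh * (b - 1)


b = 4  # SSU drawn per selected PSU, as in Example 3
for roh in (-0.14, 0.01, 0.50, 0.96):  # illustrative roh values
    print(f"roh={roh:+.2f}  approx deff={kish_approx_deff(roh, b):.2f}")
```

Because published roh values are rounded to two decimals, results computed this way may differ in the last digit from values obtained with unrounded roh.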

roh      deff     Bound for relative bias   deff ≈ [1+roh(b-1)]
-0.14    0.70     0.5*                      0.58
-0.05    0.92     0.34                      0.85
0.01     1.07     0.34                      1.03
0.14     1.38     0.36                      1.42
0.26     1.66     0.37                      1.78
0.38     1.97     0.43                      2.15
0.50     2.25     0.45                      2.50
0.63     2.50     0.54                      2.88
0.76     2.87     0.61                      3.27
0.86     3.1*     0.62                      3.57
0.96     3.35     0.64                      3.87

* One digit of this entry is illegible in the source; the value is shown as extracted.

In the table, [1+roh(b-1)] is a good approximation to deff (population value) whenever (A-1)/A and (N-1)/N are close to unity; see Kish (1965), chapter 5. From this table, it can be seen that [1+roh(b-1)] overestimates the population deff for roh values from 0.26 to 0.96, and underestimates it when roh is negative.

3. Conclusions

An exact expression for the bias of the design effect estimator and an upper bound for the ratio of that bias to the standard error were given. The upper bound for the relative bias is given by the coefficient of variation of the unbiased estimator of the variance under simple random sampling. Based on the simulations and the extensive use of the design effect in practice, it is advisable to analyse the stability of the variance estimator under simple random sampling whenever possible. It is also advisable to work with an unbiased estimator of the variance under simple random sampling. More simulations are needed to assess the usefulness of the formula deff ≈ [1+roh(b-1)]; it seems that it tends to over- or underestimate the true deff value.

References

Cochran, W. G. (1977). Sampling Techniques, 3rd ed. New York: Wiley.

Gambino, J. G. (2009). Design effect caveats. The American Statistician, pp. 141-145.

Kish, L. (1965). Survey Sampling. New York: Wiley.

Padilla, A. M. (2011). Una cota para el sesgo relativo del efecto del diseño. Memorias electrónicas en extenso de la 4ª Semana Internacional de la Estadística y la Probabilidad, Julio 2011. CD ISBN: 978-607-487-34-5.

Rao, J. N. K. (1962). On the estimation of the relative efficiency of sampling procedures. Annals of the Institute of Statistical Mathematics, pp. 143-150.