On stratified randomized response sampling

Model Assisted Statistics and Applications 1 (2005,2006) 31–36
IOS Press

Jea-Bok Ryu (a,*), Jong-Min Kim (b), Tae-Young Heo (c) and Chun Gun Park (d)

(a) Statistics, Division of Life Science and Genetic Engineering and Statistics, Cheongju University, Cheongju, Chungbuk, 360-764, Republic of Korea
(b) Statistics, Division of Science and Mathematics, University of Minnesota, Morris, MN 56267, USA
(c) Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA
(d) National Cancer Center, Ilsan, Goyang-si, Gyeonggi-do, 411-769, Republic of Korea

Abstract. In this paper, we propose a new quantitative randomized response model based on the two-stage randomized response model of Mangat and Singh [7]. We derive an estimator of the mean of the sensitive variable and show that our method is more efficient than the randomized response estimators suggested by Greenberg et al. [3] and Gupta et al. [4].

Keywords: Quantitative randomized response technique, sensitive characteristics, stratified sampling

1. Introduction

The randomized response technique is a procedure for collecting information on sensitive characteristics without exposing the identity of the respondent. It was first introduced by Warner [8] as an alternative survey technique for questions about socially undesirable or incriminating behavior. Greenberg et al. [3] proposed and developed the unrelated question randomized response design for estimating the mean and the variance of the distribution of a quantitative variable. Gupta et al. [4] and Arnab [1] have shown that optional randomized response models are more accurate while being less intrusive. Hong et al. [5] suggest a stratified randomized response model using proportional allocation. However, their model may incur high costs because of the difficulty of obtaining a proportional sample from each stratum. To rectify this problem, Kim and Warde [6] suggest a stratified randomized response model using optimal allocation, which is more efficient than one using proportional allocation.
* Corresponding author: Jea-Bok Ryu, Statistics, Division of Life Science and Genetic Engineering and Statistics, Cheongju University, Cheongju, Chungbuk, 360-764, Republic of Korea. E-mail: jbryu@cju.ac.kr.

ISSN 1574-1699/05/06/$17.00 2005/2006 IOS Press and the authors. All rights reserved

2. A review of quantitative randomized response methods

The unrelated question randomized response method proposed by Greenberg et al. [3] is a survey procedure in which a respondent is asked one of two questions, depending on the outcome of a randomization device. For example, an interviewee operates a randomization device with two outcomes, with preassigned probabilities $P$ and $1-P$, and answers one of the following questions accordingly:

S: How many abortions have you had during your lifetime?
N: How many magazines do you subscribe to?

Here S denotes the sensitive question and N the non-sensitive question. Two independent, non-overlapping samples of sizes $n_1$ and $n_2$ are used ($n_1$ need not equal $n_2$). Let $\mu_A$ and $\mu_Y$ be the population means, and $\sigma_A^2$ and $\sigma_Y^2$ the population variances, of the sensitive and non-sensitive distributions, respectively. Unbiased estimators of the means $\mu_A$ and $\mu_Y$ of the sensitive and non-sensitive random variables are

$$\hat\mu_1 = \frac{(1-P_2)\,\bar T_1 - (1-P_1)\,\bar T_2}{P_1 - P_2} \tag{1}$$

and

$$\hat\mu_2 = \frac{P_2\,\bar T_1 - P_1\,\bar T_2}{P_2 - P_1}, \tag{2}$$

where $\bar T_i$ is the sample mean of the responses in the $i$th sample and $P_i$ is the selection probability for the sensitive question in the $i$th sample, for $i = 1, 2$ ($P_1 \neq P_2$). The variance of $\hat\mu_1$ is

$$\operatorname{Var}(\hat\mu_1) = \frac{(1-P_2)^2 \operatorname{Var}(\bar T_1) + (1-P_1)^2 \operatorname{Var}(\bar T_2)}{(P_1 - P_2)^2}, \tag{3}$$

where $\operatorname{Var}(\bar T_j) = \frac{1}{n_j}\left[\sigma_Y^2 + P_j(\sigma_A^2 - \sigma_Y^2) + P_j(1-P_j)(\mu_A - \mu_Y)^2\right]$.

In this method, if $\mu_Y$ and $\sigma_Y^2$ are known in advance, only one sample is needed. Setting $P_1 = P$ and $\bar T_1 = \bar T$ (with sample size $n$), Eqs (1) and (3) simplify to

$$\hat\mu_1 = \frac{\bar T - (1-P)\mu_Y}{P} \tag{4}$$

and

$$\operatorname{Var}(\hat\mu_1) = \frac{\operatorname{Var}(\bar T)}{P^2} = \frac{1}{nP^2}\left[\sigma_Y^2 + P(\sigma_A^2 - \sigma_Y^2) + P(1-P)(\mu_A - \mu_Y)^2\right]. \tag{5}$$

Eichhorn and Hayre [2] introduce a scrambled randomized response method for estimating the mean $\mu_A$ and the variance $\sigma_A^2$ of the sensitive variable $A$. In their method, each respondent selected in the sample is instructed to use a randomization device and generate a random number, say $B$, from some preassigned distribution. The distribution of this random variable $B$, called the scrambling variable, is assumed to be known, as are its mean $\mu_B$ and variance $\sigma_B^2$. The $i$th respondent in a sample of size $n$, drawn by simple random sampling with replacement (SRSWR), is asked to report the value $Z_i = B_i A_i/\mu_B$ as a scrambled response on the sensitive variable $A$. They show that an unbiased estimator of the population mean $\mu_A$ is

$$\hat\mu_E = \frac{1}{n}\sum_{i=1}^{n} Z_i \tag{6}$$

with variance

$$\operatorname{Var}(\hat\mu_E) = \frac{1}{n}\left[\sigma_A^2 + C_B^2(\sigma_A^2 + \mu_A^2)\right], \tag{7}$$

where $\sigma_B$ is the standard deviation of the scrambling variable $B$ and $C_B = \sigma_B/\mu_B$ is its known coefficient of variation.

Gupta et al. [4] propose an optional randomized response technique, which is more efficient than the scrambled randomized response technique of Eichhorn and Hayre [2].
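Eqs (4)–(7) can be checked numerically. The following is a minimal Monte Carlo sketch in Python with NumPy, assuming normally distributed $A$ and $Y$ and a gamma-distributed scrambling variable $B$. These distributions, the default parameter values, and the function names are our own illustrative choices, not part of the methods of Greenberg et al. [3] or Eichhorn and Hayre [2]. Because only the first two moments enter Eqs (5) and (7), the simulated variances should match the formulas up to Monte Carlo error.

```python
import numpy as np

def var_greenberg_known_mu_y(n, P, mu_A, var_A, mu_Y, var_Y):
    """Eq. (5): variance of mu_hat_1 when mu_Y and sigma_Y^2 are known."""
    return (var_Y + P * (var_A - var_Y) + P * (1 - P) * (mu_A - mu_Y) ** 2) / (n * P**2)

def var_eichhorn_hayre(n, mu_A, var_A, mu_B, var_B):
    """Eq. (7): variance of mu_hat_E, using C_B^2 = sigma_B^2 / mu_B^2."""
    cb2 = var_B / mu_B**2
    return (var_A + cb2 * (var_A + mu_A**2)) / n

def simulate(n=50, reps=20_000, P=0.7, mu_A=4.0, var_A=1.0, mu_Y=1.0, var_Y=0.5,
             mu_B=1.0, var_B=0.5, seed=1):
    """Return `reps` realizations each of mu_hat_1 (Eq. 4) and mu_hat_E (Eq. 6)."""
    rng = np.random.default_rng(seed)
    # Assumed distributions: normal A and Y; gamma B with mean mu_B and variance var_B.
    A = rng.normal(mu_A, np.sqrt(var_A), size=(reps, n))
    Y = rng.normal(mu_Y, np.sqrt(var_Y), size=(reps, n))
    B = rng.gamma(mu_B**2 / var_B, var_B / mu_B, size=(reps, n))

    # Greenberg et al.: each respondent answers S with probability P, otherwise N.
    answers_S = rng.random((reps, n)) < P
    T_bar = np.where(answers_S, A, Y).mean(axis=1)
    mu_hat_1 = (T_bar - (1 - P) * mu_Y) / P          # Eq. (4)

    # Eichhorn and Hayre: every respondent reports Z = B * A / mu_B.
    mu_hat_E = (B * A / mu_B).mean(axis=1)           # Eq. (6)
    return mu_hat_1, mu_hat_E

mu_hat_1, mu_hat_E = simulate()
print(mu_hat_1.mean(), mu_hat_1.var(), var_greenberg_known_mu_y(50, 0.7, 4.0, 1.0, 1.0, 0.5))
print(mu_hat_E.mean(), mu_hat_E.var(), var_eichhorn_hayre(50, 4.0, 1.0, 1.0, 0.5))
```

With these defaults Eq. (5) gives about 0.112 and Eq. (7) gives 0.19; both simulated means should be close to $\mu_A = 4$.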
In the optional randomized response technique, each respondent selected by SRSWR can choose one of the following two options:

(a) report the correct response $A$, or
(b) report the scrambled response $BA$, where $B$ denotes an independent scrambling variable.

In the optional procedure, both $B$ and $A$ are assumed to be positive random variables and $\mu_B = 1$. The optional randomized response model can be written as

$$Z = B^{I} A, \tag{8}$$

where $I$ is an indicator random variable defined as

$$I = \begin{cases} 1 & \text{if the response is scrambled,}\\ 0 & \text{otherwise.} \end{cases}$$

If $W$ denotes the probability that a person reports the scrambled response, then $I$ is a Bernoulli random variable with $E(I) = W$, where $W$ can be called the sensitivity of the question. They showed that an unbiased estimator of the population mean $\mu_A$ is

$$\hat\mu_G = \frac{1}{n}\sum_{i=1}^{n} Z_i, \tag{9}$$

with variance

$$\operatorname{Var}(\hat\mu_G) = \frac{1}{n}\left[\sigma_A^2 + W C_B^2(\sigma_A^2 + \mu_A^2)\right], \tag{10}$$

where $C_B = \sigma_B/\mu_B$ is the known coefficient of variation of the scrambling variable $B$.

3. Two-stage quantitative randomized response model

In this section, we propose a two-stage quantitative randomized response model. We assume that a sample of size $n$ is selected by SRSWR. The method is described as follows.

Stage 1. An individual respondent in the sample is instructed to use the randomization device $R_1$, which consists of two statements, (i) report the true response to the sensitive question and (ii) go to the randomization device $R_2$ in the second stage, represented with probabilities $P$ and $1-P$.

Stage 2. The randomization device $R_2$ consists of two statements, (i) report the true response to the sensitive question and (ii) report the scrambled response $AB$ to the sensitive question, represented with probabilities $T$ and $1-T$.

To protect the respondent's privacy, the respondent does not tell the interviewer which steps were taken. We assume that both $B$ and $A$ are positive random variables, $\mu_B = 1$, and $\sigma_B^2 = \psi^2$. As in the approach of Eichhorn and Hayre [2], the distribution of the random variable $B$, and hence its mean $\mu_B$ and variance $\sigma_B^2$, are assumed to be known. Based on this two-stage procedure, the $i$th respondent selected in a sample of size $n$, drawn by SRSWR, is asked to report the value

$$U = \alpha A + (1-\alpha)\{\beta A + (1-\beta)AB\}, \tag{11}$$

where

$$\alpha = \begin{cases} 1 & \text{if a respondent chooses statement (i) in } R_1,\\ 0 & \text{if a respondent chooses statement (ii) in } R_1, \end{cases} \qquad \beta = \begin{cases} 1 & \text{if a respondent chooses statement (i) in } R_2,\\ 0 & \text{if a respondent chooses statement (ii) in } R_2. \end{cases}$$

The expected value of the observed response is

$$E(U) = E\left[\alpha A + (1-\alpha)\{\beta A + (1-\beta)AB\}\right] = P\mu_A + (1-P)\{T\mu_A + (1-T)\mu_A\mu_B\} = \mu_A, \tag{12}$$

where the last equality uses $\mu_B = 1$, $\alpha$ is a Bernoulli random variable with $E(\alpha) = P$ and $\operatorname{Var}(\alpha) = P(1-P)$, and $\beta$ is a Bernoulli random variable with $E(\beta) = T$ and $\operatorname{Var}(\beta) = T(1-T)$.

Theorem 3.1. An unbiased estimator $\hat\mu_A$ of the population mean $\mu_A$ is given by

$$\hat\mu_A = \frac{1}{n}\sum_{i=1}^{n} U_i. \tag{13}$$

Theorem 3.2. The variance of the proposed estimator $\hat\mu_A$ is given by

$$\operatorname{Var}(\hat\mu_A) = \frac{1}{n}\left[\sigma_A^2 + (1-P)(1-T)\psi^2(\mu_A^2 + \sigma_A^2)\right]. \tag{14}$$

For $0 < P < 1$ in Eq. (14), we compare $\operatorname{Var}(\hat\mu_G)$ and $\operatorname{Var}(\hat\mu_A)$ to assess the relative efficiency of $\hat\mu_A$ with respect to $\hat\mu_G$. Since $\mu_B = 1$, we have $C_B^2 = \psi^2$, and

$$\operatorname{Var}(\hat\mu_G) - \operatorname{Var}(\hat\mu_A) = \frac{1}{n}\left[\sigma_A^2 + W\psi^2(\sigma_A^2 + \mu_A^2)\right] - \frac{1}{n}\left[\sigma_A^2 + (1-P)(1-T)\psi^2(\mu_A^2 + \sigma_A^2)\right] = \frac{\psi^2(\mu_A^2 + \sigma_A^2)}{n}\left[W - (1-P)(1-T)\right].$$

Taking $W = 1 - T$, so that a respondent in the model of Gupta et al. [4] scrambles with the same probability as at the second stage of the proposed model, this difference equals $\frac{1}{n}P(1-T)\psi^2(\mu_A^2 + \sigma_A^2) \ge 0$. Hence the proposed estimator $\hat\mu_A$ is more efficient than the estimator $\hat\mu_G$ suggested by Gupta et al. [4] whenever $W \ge (1-P)(1-T)$, and in particular when $W = 1 - T$.

4. Stratified two-stage quantitative randomized response model

In this section, we propose a two-stage quantitative randomized response technique for stratified sampling. The main advantage of the stratified approach is that it overcomes a limitation of randomized response models, namely the loss of the individual characteristics of the respondents.
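The two-stage model of Section 3, which the stratified design applies within each stratum, can be checked with a short Monte Carlo sketch. It assumes gamma-distributed $A$ and $B$ (both positive, with $\mu_B = 1$ and $\sigma_B^2 = \psi^2$); these distributions, the parameter values and the function names are hypothetical choices for illustration only. Each simulated respondent reports $A$ unless both randomization devices select statement (ii), in which case $AB$ is reported, as in Eq. (11).

```python
import numpy as np

def var_two_stage(n, P, T, mu_A, var_A, psi2):
    """Eq. (14): Var(mu_hat_A), with psi2 = sigma_B^2 and mu_B = 1."""
    return (var_A + (1 - P) * (1 - T) * psi2 * (mu_A**2 + var_A)) / n

def var_gupta(n, W, mu_A, var_A, psi2):
    """Eq. (10) with mu_B = 1, so that C_B^2 = psi2."""
    return (var_A + W * psi2 * (var_A + mu_A**2)) / n

def simulate_two_stage(n, P, T, mu_A, var_A, psi2, reps, rng):
    """Return `reps` independent realizations of mu_hat_A from Eq. (13)."""
    # Assumed (illustrative) distributions: gamma A with mean mu_A and variance var_A;
    # gamma B with mean 1 and variance psi2.
    A = rng.gamma(mu_A**2 / var_A, var_A / mu_A, size=(reps, n))
    B = rng.gamma(1 / psi2, psi2, size=(reps, n))
    alpha = rng.random((reps, n)) < P   # R1 selects statement (i) with probability P
    beta = rng.random((reps, n)) < T    # R2 selects statement (i) with probability T
    # Eq. (11): U = A if alpha = 1 or beta = 1, otherwise the scrambled response AB.
    U = np.where(alpha | beta, A, A * B)
    return U.mean(axis=1)

rng = np.random.default_rng(2)
n, P, T, mu_A, var_A, psi2 = 50, 0.5, 0.4, 4.0, 1.0, 0.5
est = simulate_two_stage(n, P, T, mu_A, var_A, psi2, reps=20_000, rng=rng)
print("mean of estimates:", est.mean())
print("simulated variance:", est.var(), "Eq. (14):", var_two_stage(n, P, T, mu_A, var_A, psi2))
# Gupta et al. with W = 1 - T: the gap should equal P(1 - T) psi^2 (mu_A^2 + sigma_A^2) / n.
gap = var_gupta(n, 1 - T, mu_A, var_A, psi2) - var_two_stage(n, P, T, mu_A, var_A, psi2)
print("Var(G) - Var(A):", gap)
```

With these values Eq. (14) gives 0.071 and Eq. (10) with $W = 1 - T = 0.6$ gives 0.122, so the gap is 0.051, in agreement with $P(1-T)\psi^2(\mu_A^2 + \sigma_A^2)/n$.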
We assume that the population is partitioned into $k$ strata, that a sample is selected by SRSWR from each stratum, and that the number of units in each stratum is known. Let $n_h$ denote the number of units in the sample from stratum $h$ and $n$ the total number of units in the samples from all strata, so that $n = \sum_{h=1}^{k} n_h$.

Stage 1. An individual respondent in the sample from stratum $h$ is instructed to use the randomization device $R_{1h}$, which consists of two statements, (i) report the true response to the sensitive question and (ii) go to the randomization device $R_{2h}$ in the second stage, represented with probabilities $P_h$ and $1-P_h$.

Stage 2. The randomization device $R_{2h}$ consists of two statements, (i) report the true response to the sensitive question and (ii) report the scrambled response $A_h B_h$ to the sensitive question, represented with probabilities $T_h$ and $1-T_h$.

Under the assumption that respondents report truthfully and that $P_h$ and $T_h$ are set by the researcher, the distribution of the random variable $B_h$, and hence its mean $\mu_{B_h}$ and variance $\sigma_{B_h}^2$, are assumed to be known. We assume that $\mu_{B_h} = 1$ and $\sigma_{B_h}^2 = \psi_h^2$ for all $h = 1, 2, \ldots, k$. The $i$th respondent selected in the sample of size $n_h$ from stratum $h$, drawn by SRSWR, is asked to report the value

$$U_h = \alpha_h A_h + (1-\alpha_h)\{\beta_h A_h + (1-\beta_h) A_h B_h\}, \tag{15}$$

where

$$\alpha_h = \begin{cases} 1 & \text{if a respondent chooses statement (i) in } R_{1h},\\ 0 & \text{if a respondent chooses statement (ii) in } R_{1h}, \end{cases} \qquad \beta_h = \begin{cases} 1 & \text{if a respondent chooses statement (i) in } R_{2h},\\ 0 & \text{if a respondent chooses statement (ii) in } R_{2h}. \end{cases}$$

As in Eq. (12), the expected value of the observed response is

$$E(U_h) = E\left[\alpha_h A_h + (1-\alpha_h)\{\beta_h A_h + (1-\beta_h)A_h B_h\}\right] = P_h\mu_{A_h} + (1-P_h)\{T_h\mu_{A_h} + (1-T_h)\mu_{A_h}\mu_{B_h}\} = \mu_{A_h}, \tag{16}$$

where $\alpha_h$ is a Bernoulli random variable with $E(\alpha_h) = P_h$ and $\operatorname{Var}(\alpha_h) = P_h(1-P_h)$, and $\beta_h$ is a Bernoulli random variable with $E(\beta_h) = T_h$ and $\operatorname{Var}(\beta_h) = T_h(1-T_h)$. By Theorems 3.1 and 3.2, an unbiased estimator of the population mean $\mu_{A_h}$ in stratum $h$ is

$$\hat\mu_{A_h} = \frac{1}{n_h}\sum_{i=1}^{n_h} U_{hi}, \tag{17}$$

and its variance is

$$\operatorname{Var}(\hat\mu_{A_h}) = \frac{1}{n_h}\left[(\mu_{A_h}^2 + \sigma_{A_h}^2)\{P_h + (1-P_h)T_h + (1-P_h)(1-T_h)(1+\psi_h^2)\} - \mu_{A_h}^2\right].$$

Since the selections in different strata are made independently, the estimators for the individual strata can be combined to obtain a mean estimator for the whole population. The mean estimator of $\mu_A$ under the stratified sampling scheme is

$$\hat\mu_A^s = \sum_{h=1}^{k} w_h\,\hat\mu_{A_h} = \sum_{h=1}^{k} \frac{w_h}{n_h}\sum_{i=1}^{n_h} U_{hi}, \tag{18}$$

where $w_h = N_h/N$ for $h = 1, 2, \ldots, k$, so that $\sum_{h=1}^{k} w_h = 1$; here $N$ is the number of units in the whole population and $N_h$ is the number of units in stratum $h$. Because each $\hat\mu_{A_h}$ is unbiased for $\mu_{A_h}$ and $\mu_A = \sum_{h=1}^{k} w_h\mu_{A_h}$, the estimator $\hat\mu_A^s$ is unbiased for the population mean $\mu_A$. The variance of $\hat\mu_A^s$ is

$$\operatorname{Var}(\hat\mu_A^s) = \sum_{h=1}^{k} \frac{w_h^2}{n_h}\left[(\mu_{A_h}^2 + \sigma_{A_h}^2)\{P_h + (1-P_h)T_h + (1-P_h)(1-T_h)(1+\psi_h^2)\} - \mu_{A_h}^2\right]. \tag{19}$$

Information on $\mu_{A_h}$ and $\sigma_{A_h}^2$ is usually unavailable; however, if prior information on them is available from past experience, an optimal allocation formula can be derived. Using the optimal-allocation approach of Kim and Warde [6], one can show that, for a fixed total sample size $n$, the variance in Eq. (19) is minimized when $n_1, n_2, \ldots, n_k$ are chosen such that

$$\frac{n_h}{n} = \frac{w_h\sqrt{(\mu_{A_h}^2 + \sigma_{A_h}^2)\{P_h + (1-P_h)T_h + (1-P_h)(1-T_h)(1+\psi_h^2)\} - \mu_{A_h}^2}}{\sum_{h=1}^{k} w_h\sqrt{(\mu_{A_h}^2 + \sigma_{A_h}^2)\{P_h + (1-P_h)T_h + (1-P_h)(1-T_h)(1+\psi_h^2)\} - \mu_{A_h}^2}}.$$

Under this optimal allocation, the variance in Eq. (19) becomes

$$\operatorname{Var}(\hat\mu_A^s) = \frac{1}{n}\left(\sum_{h=1}^{k} w_h\sqrt{(\mu_{A_h}^2 + \sigma_{A_h}^2)\{P_h + (1-P_h)T_h + (1-P_h)(1-T_h)(1+\psi_h^2)\} - \mu_{A_h}^2}\right)^2. \tag{20}$$

Theorem 4.1. Assuming optimal allocation with $k = 2$ strata, $w_1 = w_2 = 1/2$ and $D = (D_1 + D_2)/2$, the stratified estimator $\hat\mu_A^s$ is more efficient than the proposed estimator $\hat\mu_A$, where

$$D = (\mu_A^2 + \sigma_A^2)\{P + (1-P)T + (1-P)(1-T)(1+\psi^2)\} - \mu_A^2,$$

$$D_h = (\mu_{A_h}^2 + \sigma_{A_h}^2)\{P_h + (1-P_h)T_h + (1-P_h)(1-T_h)(1+\psi_h^2)\} - \mu_{A_h}^2, \quad h = 1, 2.$$

Indeed, since $P + (1-P)T + (1-P)(1-T)(1+\psi^2) = 1 + (1-P)(1-T)\psi^2$, Eq. (14) gives $\operatorname{Var}(\hat\mu_A) = D/n$, while Eq. (20) gives $\operatorname{Var}(\hat\mu_A^s) = \frac{1}{n}\left(\frac{\sqrt{D_1} + \sqrt{D_2}}{2}\right)^2 \le \frac{1}{n}\cdot\frac{D_1 + D_2}{2} = \frac{D}{n}$, with strict inequality unless $D_1 = D_2$.

For other cases, the relative efficiency can easily be checked by comparing variances under various settings of $\mu_{A_h}$, $\sigma_{A_h}^2$, $P_h$ and $T_h$. Under the conditions of Theorem 4.1, when optimal allocation is used, the stratified estimator $\hat\mu_A^s$ is more efficient than the estimator $\hat\mu_A$, which ignores the stratification.

5. Comparison and discussion

In this section, we present a numerical study of the two-stage quantitative randomized response model. Its purpose is to confirm that the proposed technique is more efficient. We compare the original quantitative randomized response model proposed by Greenberg et al. [3] ($\hat\mu_1$) with the proposed model ($\hat\mu_A$) in terms of variance, for the same sample size $n$. From Eqs (4) and (5), the sensitive-mean estimator of Greenberg et al. [3] and its variance (when $\mu_Y$ is known) are

$$\hat\mu_1 = \frac{\bar T - (1-P)\mu_Y}{P}, \qquad \operatorname{Var}(\hat\mu_1) = \frac{\operatorname{Var}(\bar T)}{P^2} = \frac{1}{nP^2}\left[\sigma_Y^2 + P(\sigma_A^2 - \sigma_Y^2) + P(1-P)(\mu_A - \mu_Y)^2\right].$$

Under the assumptions $\mu_Y = \mu_B = 1$ and $\sigma_Y^2 = \sigma_B^2 = \psi^2$,

$$\hat\mu_1 = \frac{\bar T - (1-P)}{P}, \qquad \operatorname{Var}(\hat\mu_1) = \frac{1}{nP^2}\left[\psi^2 + P(\sigma_A^2 - \psi^2) + P(1-P)(\mu_A - 1)^2\right].$$

The relative efficiency of $\hat\mu_A$ with respect to $\hat\mu_1$ is

$$\mathrm{RE} = \frac{\operatorname{Var}(\hat\mu_1)}{\operatorname{Var}(\hat\mu_A)} = \frac{\psi^2 + P(\sigma_A^2 - \psi^2) + P(1-P)(\mu_A - 1)^2}{P^2\left[(\mu_A^2 + \sigma_A^2)\{P + (1-P)T + (1-P)(1-T)(1+\psi^2)\} - \mu_A^2\right]}.$$

[Figure 1: four panels plotting relative efficiency against P, with one curve for each value of T.]

Fig. 1. The relative efficiency of $\hat\mu_A$ with respect to $\hat\mu_1$ as a function of $P = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9$ and $T = 0.2, 0.4, 0.6, 0.8$, with $\sigma_A^2 = 1$, $\sigma_Y^2 = \sigma_B^2 = 0.5$ and $\mu_A = 2$ (top left), $\mu_A = 4$ (top right), $\mu_A = 6$ (bottom left), $\mu_A = 8$ (bottom right).

Figure 1 shows that, in the settings considered, the proposed estimator $\hat\mu_A$ is more efficient than the Greenberg et al. [3] estimator $\hat\mu_1$. We can show that the proposed method is more efficient than the Greenberg et al. [3] method if the coefficient of variation satisfies $C_B = \sigma_B/\mu_B = \sigma_B \le 1.0$.

Our newly proposed two-stage quantitative randomized response model improves performance by taking advantage of the randomized response information provided by the second stage. We have shown that, under the conditions stated above, our model is more efficient than the models of Greenberg et al. [3] and Gupta et al. [4]. Additionally, we have provided a comprehensive description of the two-stage quantitative stratified randomized response model and its statistical properties. The stratified quantitative randomized response model can overcome a limitation of randomized response models, which lose the individual characteristics of the respondents.

References

[1] R. Arnab, Optional randomized response techniques for complex survey designs, Biometrical Journal 46(1) (2004), 114–124.
[2] B.H. Eichhorn and L.S. Hayre, Scrambled randomized response methods for obtaining sensitive quantitative data, Journal of Statistical Planning and Inference 7 (1983), 307–316.
[3] B.G. Greenberg, R.R. Kuebler Jr., J.R. Abernathy and D.G. Horvitz, Application of the randomized response technique in obtaining quantitative data, Journal of the American Statistical Association 66 (1971), 243–250.
[4] S. Gupta, B. Gupta and S. Singh, Estimation of sensitivity level of personal interview survey questions, Journal of Statistical Planning and Inference 100 (2002), 239–247.
[5] K. Hong, J. Yum and H. Lee, A stratified randomized response technique, Korean Journal of Applied Statistics 7 (1994), 141–147.
[6] J.-M. Kim and W.D. Warde, A stratified Warner's randomized response model, Journal of Statistical Planning and Inference 120(1–2) (2004), 155–165.
[7] N.S. Mangat and R. Singh, An alternative randomized response procedure, Biometrika 77 (1990), 439–442.
[8] S.L. Warner, Randomized response: a survey technique for eliminating evasive answer bias, Journal of the American Statistical Association 60 (1965), 63–69.