Estimation of sensitive quantitative characteristics in randomized response sampling


Kuo-Chung Huang
Department of Business Administration, Chungyu Institute of Technology, Keelung, Taiwan 20 R.O.C.

Chun-Hsiung Lan, Mei-Pei Kuo
Department of Business Administration, Nanhua University, Dalin, Chiayi, Taiwan 622 R.O.C.

E-mail: kchuang.tw@gmail.com

Journal of Statistics & Management Systems, Vol. 9 (2006). © Taru Publications

Abstract

This paper considers the problem of procuring honest responses for sensitive quantitative characteristics. An alternative survey technique is proposed, which enables us to estimate the population mean unbiasedly and to gauge how sensitive a survey topic is. An asymptotically unbiased estimator of the sensitivity level is proposed, and the conditions under which unbiased estimation of the population variance is available are also studied. In addition, an efficiency comparison is worked out to examine the performance of the proposed procedure. It is found that higher estimation efficiency results from higher variation of the randomization device.

Keywords: Percent relative efficiency, privacy protection, sample size allocation, simple random sampling with replacement.

1. Introduction

Accurate information on the survey topics is essential for parameter estimation. However, it is difficult to obtain valid and reliable information on sensitive topics. If a direct survey is used to assess a sensitive characteristic, respondents often refuse to take part or reply untruthfully, especially when they have engaged in the sensitive behavior. To improve respondent cooperation and to procure reliable data, Warner [15] first introduced the randomized response (RR) technique for qualitative characteristics. Subsequently, Greenberg et al. [8] suggested an extension of Warner's RR technique to quantitative characteristics. An excellent exposition of modifications of RR techniques and other related work can be found in Chaudhuri and Mukerjee [6]. Some recent developments are Bhargava and Singh [1], Chua and Tsui [7], Padmawar and Vijayan [12], Singh et al. [14], Chang and Huang [2], Chaudhuri [5], Singh et al. [13], Huang [10], and Chang et al. [3, 4]. In particular, to quantify the sensitivity level of certain items of inquiry in practice, Gupta et al. [9] first showed how an estimator may be developed for quantitative characteristics.

Consider a finite population in which every person has a positive value for the sensitive characteristic $X$. The problem of interest is to estimate the mean $\mu_x$, the variance $\sigma_x^2$, and the sensitivity level $W$ of $X$ from a with-replacement simple random sample of size $n$. In the Gupta et al. [9] procedure, each sampled respondent is instructed to use a randomization device and generate a positive-valued random number $S$ from a pre-assigned distribution with known mean $\mu_S$ and known variance $\gamma_S^2$. Then he or she chooses one of the following options:

(a) The respondent can report the correct response $X$, or
(b) The respondent can report the scrambled response $SX$.
The optional randomized response model considered is $Z = S^{Y} X$, where $Y = 1$ or $0$ according as the response is scrambled or not. Here $Y$ is a Bernoulli variate with $E(Y) = W$, where $W$ is the probability that a respondent will report the scrambled response rather than the actual response $X$. It is shown that, with $\mu_S = 1$, the usual sample mean $\hat\mu_{xG} = \bar Z$ is unbiased with variance given by

\[ \mathrm{Var}(\hat\mu_{xG}) = \frac{\sigma_Z^2}{n} = \frac{1}{n}\left[\sigma_x^2 + W\gamma_S^2(\sigma_x^2 + \mu_x^2)\right]. \tag{1.1} \]

Denoting by $s_Z^2 = (n-1)^{-1}\sum_{j=1}^{n}(Z_j - \bar Z)^2$ the sample variance, an unbiased estimator of $\mathrm{Var}(\hat\mu_{xG})$ is given by $\widehat{\mathrm{Var}}(\hat\mu_{xG}) = s_Z^2/n$. They also suggested

\[ \hat W_G = \frac{\dfrac{1}{n}\sum_{j=1}^{n}\log(Z_j) - \log\!\left(\dfrac{1}{n}\sum_{j=1}^{n} Z_j\right)}{E[\log(S)]} \quad\text{and}\quad \hat\sigma_{xG}^2 = \frac{s_Z^2 - \hat W_G\,\gamma_S^2\,\hat\mu_{xG}^2}{1 + \hat W_G\,\gamma_S^2} \]

as estimators of $W$ and $\sigma_x^2$, respectively.

Although this creates an appropriate environment for estimating some unknown population parameters, the Gupta et al. [9] estimator of $W$ remains biased because of the presence of the log term. They addressed the efficiency and bias of the estimate of $W$ empirically, but an analytical expression for its variance remains unavailable. In addition, the estimator $\hat\sigma_{xG}^2$ is merely a biased estimator of $\sigma_x^2$. In these regards, we suggest a survey procedure that provides an unbiased estimator of $\sigma_x^2$ and an asymptotically unbiased estimator of $W$ together with its variance. The proposed estimators and their principal properties are given in the following section. In Section 3, an efficiency comparison is carried out to study the performance of the proposed procedure.

2. The proposed procedure

In the proposed procedure, two independent samples of sizes $n_i$, $i = 1, 2$, are drawn from the population using simple random sampling with replacement such that $n_1 + n_2 = n$, the total sample size required. In the $i$th sample, each respondent is instructed to use a randomization device and generate a random number, say $S_i$, from some pre-assigned distribution such as the chi-square, uniform, or Weibull. The RR procedures are set up in such a way that the interviewer does not know what the respondent has selected. Then the respondent is requested to report one of the following two options:

(a) The correct response $X$, or
(b) The scrambled response $S_i X$.

It is supposed that $S_i$ is a positive-valued random variate with known mean $\mu_{S_i} = \theta_i$ and known variance $\sigma_{S_i}^2 = \gamma_i^2$. The optional randomized response model for the $i$th sample, $i = 1, 2$, is given by

\[ Z_i = (1 - Y)X + Y S_i X, \]

where $Y$ is a random variate with expectation $E(Y) = W$, defined as

\[ Y = \begin{cases} 1, & \text{if the response is scrambled,} \\ 0, & \text{otherwise.} \end{cases} \]
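
The response mechanism just described can be simulated to see what the interviewer actually records. The following is a minimal sketch and not part of the original paper: the function name `simulate_sample`, the callables `draw_x` and `draw_s`, the chosen distributions, and the assumption that the decision to scramble ($Y$) is drawn independently of $X$ and $S_i$ with probability $W$ are all illustrative.

```python
import random


def simulate_sample(n, draw_x, draw_s, w, rng):
    """Simulate n optional randomized responses Z = (1 - Y) X + Y S X.

    draw_x(rng) returns a respondent's true positive value X (hypothetical).
    draw_s(rng) returns one draw S from the randomization device.
    w is the probability W that a respondent scrambles the answer.
    """
    responses = []
    for _ in range(n):
        x = draw_x(rng)                    # true sensitive value, never observed
        s = draw_s(rng)                    # random number from the device
        y = 1 if rng.random() < w else 0   # Y = 1: scrambled, Y = 0: true value
        responses.append(s * x if y == 1 else x)
    return responses


rng = random.Random(2006)
# Assumed, illustrative choices: X ~ Gamma(shape 4, scale 5),
# S_1 ~ Uniform(1, 3) so theta_1 = 2, and S_2 ~ Uniform(0.2, 0.8) so theta_2 = 0.5.
z1 = simulate_sample(500, lambda r: r.gammavariate(4.0, 5.0),
                     lambda r: r.uniform(1.0, 3.0), 0.3, rng)
z2 = simulate_sample(500, lambda r: r.gammavariate(4.0, 5.0),
                     lambda r: r.uniform(0.2, 0.8), 0.3, rng)
```

Only the lists `z1` and `z2` would be available to the investigator; whether a given entry is $X$ or $S_i X$ stays hidden, which is the privacy protection the procedure relies on.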

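Under the proposed procedure, the sample response $Z_i$ for the $i$th sample, $i = 1, 2$, has expectation

\[ E(Z_i) = (1 - W)\mu_x + W\theta_i\mu_x = \mu_x + W(\theta_i - 1)\mu_x. \tag{2.1} \]

If $\bar Z_1$ and $\bar Z_2$ are the observed means for the two samples, the proposed estimators of $\mu_x$ and $W$ are respectively given by

\[ \hat\mu_x = \frac{(1-\theta_2)\bar Z_1 - (1-\theta_1)\bar Z_2}{\theta_1 - \theta_2} \quad\text{and}\quad \hat W = \frac{\bar Z_1 - \bar Z_2}{(1-\theta_2)\bar Z_1 - (1-\theta_1)\bar Z_2}, \]

provided that $\theta_1 \neq \theta_2$.

Theorem 1. The estimator $\hat\mu_x$ is unbiased with variance given by

\[ \mathrm{Var}(\hat\mu_x) = \frac{1}{(\theta_1-\theta_2)^2}\left[(1-\theta_2)^2\frac{\sigma_{Z_1}^2}{n_1} + (1-\theta_1)^2\frac{\sigma_{Z_2}^2}{n_2}\right], \tag{2.2} \]

where

\[ \sigma_{Z_i}^2 = \sigma_x^2 + W(\gamma_i^2 + \theta_i^2 - 1)\sigma_x^2 + W\left[\gamma_i^2 + (1-W)(1-\theta_i)^2\right]\mu_x^2, \quad i = 1, 2. \tag{2.3} \]

Proof. Taking the expectation of $\hat\mu_x$, the unbiasedness follows from (2.1). From the distribution of $Z_i^2$, we have

\[ E(Z_i^2) = \sigma_x^2 + \mu_x^2 + W(\gamma_i^2 + \theta_i^2 - 1)(\sigma_x^2 + \mu_x^2). \tag{2.4} \]

Using (2.1), (2.4) and the fact that $\sigma_{Z_i}^2 = E(Z_i^2) - [E(Z_i)]^2$, we get (2.3). Expression (2.2) then follows from the independence of the two samples. Hence the proof.

Theorem 2. An unbiased estimator of the variance of $\hat\mu_x$ is given by

\[ \widehat{\mathrm{Var}}(\hat\mu_x) = \frac{1}{(\theta_1-\theta_2)^2}\left[(1-\theta_2)^2\frac{s_{Z_1}^2}{n_1} + (1-\theta_1)^2\frac{s_{Z_2}^2}{n_2}\right], \]

where $s_{Z_i}^2 = (n_i - 1)^{-1}\sum_{j=1}^{n_i}(Z_{ij} - \bar Z_i)^2$, $i = 1, 2$.

Proof. It follows from $E(s_{Z_i}^2) = \sigma_{Z_i}^2$, $i = 1, 2$.

To derive the properties of the estimator $\hat W$, let us define $d_1 = \bar Z_1 - \bar Z_2$ and $d_2 = (1-\theta_2)\bar Z_1 - (1-\theta_1)\bar Z_2$. Since $E(d_1) = (\theta_1-\theta_2)W\mu_x$ and $E(d_2) = (\theta_1-\theta_2)\mu_x$, we have $\hat W = d_1/d_2$ and $W = E(d_1)/E(d_2)$. If we further denote $e_1 = [d_1 - E(d_1)]/E(d_1)$ and $e_2 = [d_2 - E(d_2)]/E(d_2)$, and assume that $|e_2| < 1$ so that $(1 + e_2)^{-1}$ can be validly expanded as a power series, it follows that $E(e_1) = E(e_2) = 0$ and

\[ \begin{aligned} E(e_1 e_2) &= \frac{(1-\theta_2)\sigma_{Z_1}^2/n_1 + (1-\theta_1)\sigma_{Z_2}^2/n_2}{(\theta_1-\theta_2)^2 W\mu_x^2}, \\ E(e_1^2) &= \frac{\sigma_{Z_1}^2/n_1 + \sigma_{Z_2}^2/n_2}{(\theta_1-\theta_2)^2 W^2\mu_x^2}, \\ E(e_2^2) &= \frac{(1-\theta_2)^2\sigma_{Z_1}^2/n_1 + (1-\theta_1)^2\sigma_{Z_2}^2/n_2}{(\theta_1-\theta_2)^2\mu_x^2}. \end{aligned} \tag{2.5} \]

The estimation error can be written in terms of $e_1$ and $e_2$ as

\[ \hat W - W = W(e_1 - e_2) + o(n^{-1}). \tag{2.6} \]

Then we have the following theorem.

Theorem 3. The estimator $\hat W$ is asymptotically unbiased with variance given by

\[ \mathrm{Var}(\hat W) = \frac{1}{(\theta_1-\theta_2)^2\mu_x^2}\left\{[1 + W(\theta_2 - 1)]^2\frac{\sigma_{Z_1}^2}{n_1} + [1 + W(\theta_1 - 1)]^2\frac{\sigma_{Z_2}^2}{n_2}\right\}, \tag{2.7} \]

which can be estimated by

\[ \widehat{\mathrm{Var}}(\hat W) = \frac{1}{(\theta_1-\theta_2)^2\hat\mu_x^2}\left\{[1 + \hat W(\theta_2 - 1)]^2\frac{s_{Z_1}^2}{n_1} + [1 + \hat W(\theta_1 - 1)]^2\frac{s_{Z_2}^2}{n_2}\right\}. \tag{2.8} \]

Proof. The asymptotic unbiasedness follows from $E(e_1) = E(e_2) = 0$. Squaring expression (2.6), omitting terms in the $e_i$ of order higher than the second, and then taking the expectation, we have $E(\hat W - W)^2 \approx W^2 E(e_1^2 - 2e_1e_2 + e_2^2)$. Substituting the expected values given in (2.5), and noting that $1 - 2W(1-\theta_i) + W^2(1-\theta_i)^2 = [1 + W(\theta_i - 1)]^2$, we get (2.7) after some algebraic simplification. Replacing $\mu_x$, $W$ and $\sigma_{Z_i}^2$ in (2.7) by the corresponding sample analogues, (2.8) then follows. Hence the theorem.

Next, let us consider the problem of unbiased estimation of the variance $\sigma_x^2$ of the sensitive characteristic $X$. For notational convenience, let us denote

\[ \begin{aligned} a &= 2(\theta_2 - 1)\gamma_1^2 + (2 - \theta_1 - \theta_2)\gamma_2^2 + (1 - 2\theta_1 - \theta_2)(1 - \theta_2)(\theta_1 - \theta_2), \\ b &= (\theta_1 - \theta_2)(1 - \gamma_1^2 - \theta_1^2), \\ c &= -2\left[(\theta_2 - 1)\gamma_1^2 - (\theta_1 - 1)\gamma_2^2 + (\theta_1 - 1)(\theta_2 - 1)(\theta_1 - \theta_2)\right]. \end{aligned} \]

An unbiased estimator of $\sigma_x^2$ is given in the following theorem.

Theorem 4. If

\[ (\theta_2 - 1)(1 - 2\theta_1 + \theta_2)\gamma_1^2 + (\theta_1 - 1)^2\gamma_2^2 = 2\theta_1(\theta_1 - 1)(\theta_2 - 1)(\theta_1 - \theta_2), \]

an unbiased estimator of $\sigma_x^2$ is given by

\[ \hat\sigma_x^2 = \frac{a\,s_{Z_1}^2 + b\,s_{Z_2}^2 + c\left(\dfrac{1}{n_1}\sum_{j=1}^{n_1} Z_{1j}^2 - \bar Z_1\bar Z_2\right)}{(\gamma_2^2 + \theta_2^2 - \gamma_1^2 - \theta_1^2)(\theta_1 - \theta_2)}, \]

provided that $\gamma_1^2 + \theta_1^2 \neq \gamma_2^2 + \theta_2^2$ and $\theta_1 \neq \theta_2$.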
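
The estimators of this section can be collected into a short numerical sketch. It is an illustration under assumptions rather than the authors' implementation: the function names, the argument names `g1` and `g2` (standing for $\gamma_1^2$ and $\gamma_2^2$), and the dictionary layout of the result are hypothetical, and the constants $a$, $b$, $c$ are coded with the signs that make $\hat\sigma_x^2$ unbiased under the condition of Theorem 4.

```python
from statistics import fmean, variance


def theorem4_coefficients(theta1, g1, theta2, g2):
    """Constants a, b, c and the denominator of the sigma_x^2 estimator."""
    a = (2 * (theta2 - 1) * g1 + (2 - theta1 - theta2) * g2
         + (1 - 2 * theta1 - theta2) * (1 - theta2) * (theta1 - theta2))
    b = (theta1 - theta2) * (1 - g1 - theta1 ** 2)
    c = -2 * ((theta2 - 1) * g1 - (theta1 - 1) * g2
              + (theta1 - 1) * (theta2 - 1) * (theta1 - theta2))
    denom = (g2 + theta2 ** 2 - g1 - theta1 ** 2) * (theta1 - theta2)
    return a, b, c, denom


def theorem4_condition_gap(theta1, g1, theta2, g2):
    """Left side minus right side of the condition in Theorem 4 (0 if it holds)."""
    lhs = (theta2 - 1) * (1 - 2 * theta1 + theta2) * g1 + (theta1 - 1) ** 2 * g2
    rhs = 2 * theta1 * (theta1 - 1) * (theta2 - 1) * (theta1 - theta2)
    return lhs - rhs


def estimate(z1, z2, theta1, g1, theta2, g2):
    """Point and variance estimates from the two samples of responses."""
    n1, n2 = len(z1), len(z2)
    m1, m2 = fmean(z1), fmean(z2)
    s1, s2 = variance(z1), variance(z2)          # sample variances, divisor n_i - 1
    d1 = m1 - m2
    d2 = (1 - theta2) * m1 - (1 - theta1) * m2
    mu_hat = d2 / (theta1 - theta2)              # estimator of mu_x
    w_hat = d1 / d2                              # estimator of W
    var_mu = ((1 - theta2) ** 2 * s1 / n1
              + (1 - theta1) ** 2 * s2 / n2) / (theta1 - theta2) ** 2
    var_w = ((1 + w_hat * (theta2 - 1)) ** 2 * s1 / n1
             + (1 + w_hat * (theta1 - 1)) ** 2 * s2 / n2) / (
                 (theta1 - theta2) ** 2 * mu_hat ** 2)
    a, b, c, denom = theorem4_coefficients(theta1, g1, theta2, g2)
    mean_sq = fmean([z * z for z in z1])         # (1/n_1) * sum of Z_1j^2
    sigma2_hat = (a * s1 + b * s2 + c * (mean_sq - m1 * m2)) / denom
    return {"mu": mu_hat, "W": w_hat, "var_mu": var_mu,
            "var_W": var_w, "sigma2": sigma2_hat}


# One device pair that satisfies the condition of Theorem 4 (assumed example).
print(theorem4_condition_gap(3, 1, 2, 3.75))     # prints 0.0
```

The variance estimate `sigma2` is only meaningful when `theorem4_condition_gap` is zero for the chosen devices; in small samples `W` can also fall outside $[0, 1]$, since nothing in the ratio $d_1/d_2$ constrains it.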

Proof. Since the two samples are independent, expression (2.1) leads to

\[ E(\bar Z_1\bar Z_2) = \left[1 + (\theta_1 + \theta_2 - 2)W + (\theta_1 - 1)(\theta_2 - 1)W^2\right]\mu_x^2. \tag{2.9} \]

From (2.4), we have

\[ E\!\left(\frac{1}{n_1}\sum_{j=1}^{n_1} Z_{1j}^2\right) = \left[1 + W(\gamma_1^2 + \theta_1^2 - 1)\right](\sigma_x^2 + \mu_x^2). \tag{2.10} \]

Using (2.9), (2.10) and the fact that $E(s_{Z_i}^2) = \sigma_{Z_i}^2$, $i = 1, 2$, the unbiasedness of $\hat\sigma_x^2$ then follows. Hence the theorem.

We now move on to study the appropriate sample size allocations for various objectives in sample surveys. Consider a linear combination of the variances of $\hat\mu_x$ and $\hat W$, given in (2.2) and (2.7) respectively:

\[ \begin{aligned} \mathrm{Var}(\hat\mu_x, \hat W) &= \alpha_1\mathrm{Var}(\hat\mu_x) + \alpha_2\mathrm{Var}(\hat W) \\ &= \frac{1}{(\theta_1-\theta_2)^2\mu_x^2}\left[\left\{\alpha_1(1-\theta_2)^2\mu_x^2 + \alpha_2[1 + W(\theta_2 - 1)]^2\right\}\frac{\sigma_{Z_1}^2}{n_1} + \left\{\alpha_1(1-\theta_1)^2\mu_x^2 + \alpha_2[1 + W(\theta_1 - 1)]^2\right\}\frac{\sigma_{Z_2}^2}{n_2}\right], \end{aligned} \]

where $\alpha_1$ and $\alpha_2$ are non-negative coefficients chosen by the investigator. If the total sample size $n$ $(= n_1 + n_2)$ is fixed, then through a simple application of the Cauchy-Schwarz inequality, the sample allocation for which $\mathrm{Var}(\hat\mu_x, \hat W)$ attains its minimum is given by

\[ \frac{n_1}{n_2} = \left[\frac{\left\{\alpha_1(1-\theta_2)^2\mu_x^2 + \alpha_2[1 + W(\theta_2 - 1)]^2\right\}\sigma_{Z_1}^2}{\left\{\alpha_1(1-\theta_1)^2\mu_x^2 + \alpha_2[1 + W(\theta_1 - 1)]^2\right\}\sigma_{Z_2}^2}\right]^{1/2}, \]

and the minimum value of $\mathrm{Var}(\hat\mu_x, \hat W)$ will be

\[ \min \mathrm{Var}(\hat\mu_x, \hat W) = \frac{\left[\left\{\alpha_1(1-\theta_2)^2\mu_x^2 + \alpha_2[1 + W(\theta_2 - 1)]^2\right\}^{1/2}\sigma_{Z_1} + \left\{\alpha_1(1-\theta_1)^2\mu_x^2 + \alpha_2[1 + W(\theta_1 - 1)]^2\right\}^{1/2}\sigma_{Z_2}\right]^2}{n(\theta_1-\theta_2)^2\mu_x^2}. \]

It is worth noting that, although the actual values of $\sigma_{Z_1}$, $\sigma_{Z_2}$ and $W$ are unknown in practice, information about them can be obtained from past experience or a pilot survey, which is helpful for practical applications (Murthy [11]).

3. Efficiency comparison

In what follows, we study the performance of the proposed procedure compared with the Gupta et al. [9] procedure. The percent relative efficiency of the proposed estimator $\hat\mu_x$ with respect to the Gupta et al. estimator $\hat\mu_{xG}$ is defined as

\[ \mathrm{PRE} = \frac{(\theta_1 - \theta_2)^2\sigma_Z^2}{\left(|1-\theta_2|\,\sigma_{Z_1} + |1-\theta_1|\,\sigma_{Z_2}\right)^2} \times 100, \]

where $\sigma_Z^2$ and $\sigma_{Z_i}^2$ are respectively given in (1.1) and (2.3). Since the above PRE expression involves too many parameters, the comparison is carried out as an empirical investigation using the same probability distribution in the competing randomization devices. To examine how the variation of the randomization devices affects the efficiency, it is assumed that the scrambling variables follow distributions with mean $\mu$ and variance $\gamma^2$ satisfying $\gamma^2 = k\mu$. Without loss of generality, we simply choose $(\mu_S, \gamma_S^2) = (1, k)$, $(\theta_1, \gamma_1^2) = (k, k^2)$ and $(\theta_2, \gamma_2^2) = (0.5, 0.5k)$, where $k = 2, 3, \ldots, 10$. Note that the PRE value remains unchanged if the coefficient of variation $CV_x = \sigma_x/\mu_x$ is fixed. The values of $CV_x$ are then chosen to be 0.2, 0.4, 0.6, 0.8 and 1. The percent relative efficiencies thus obtained are outlined in Table 1 for different values of $W$.

Table 1. Percent relative efficiency of the two competing procedures (tabulated by $CV_x$, $W$ and $k$)
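
The PRE values for the parameter grid described above can be recomputed directly from (1.1) and (2.3). The sketch below is a hedged illustration: the function names and the default `mu_x=1.0` are assumptions made here, and the absolute values $|1-\theta_i|$ in the denominator are the reading of the PRE definition that is consistent with the minimum variance of $\hat\mu_x$ (the case $\alpha_1 = 1$, $\alpha_2 = 0$).

```python
import math


def sigma_z_sq_gupta(sigma2_x, mu_x, w, g_s):
    """sigma_Z^2 from (1.1): response variance under Gupta et al. with mu_S = 1."""
    return sigma2_x + w * g_s * (sigma2_x + mu_x ** 2)


def sigma_z_sq_proposed(sigma2_x, mu_x, w, theta, g):
    """sigma_{Z_i}^2 from (2.3) for a device with mean theta and variance g."""
    return (sigma2_x + w * (g + theta ** 2 - 1) * sigma2_x
            + w * (g + (1 - w) * (1 - theta) ** 2) * mu_x ** 2)


def percent_relative_efficiency(cv_x, w, k, mu_x=1.0):
    """PRE of the proposed mean estimator relative to the Gupta et al. one."""
    sigma2_x = (cv_x * mu_x) ** 2
    theta1, g1 = k, k ** 2            # device 1: variance = k * mean
    theta2, g2 = 0.5, 0.5 * k         # device 2: variance = k * mean
    var_z = sigma_z_sq_gupta(sigma2_x, mu_x, w, k)   # (mu_S, gamma_S^2) = (1, k)
    sd_z1 = math.sqrt(sigma_z_sq_proposed(sigma2_x, mu_x, w, theta1, g1))
    sd_z2 = math.sqrt(sigma_z_sq_proposed(sigma2_x, mu_x, w, theta2, g2))
    denom = (abs(1 - theta2) * sd_z1 + abs(1 - theta1) * sd_z2) ** 2
    return 100.0 * (theta1 - theta2) ** 2 * var_z / denom


# Recompute one slice of the grid: CV_x = 0.4, W = 0.5, k = 2, ..., 10.
for k in range(2, 11):
    print(k, round(percent_relative_efficiency(0.4, 0.5, k), 2))
```

Because $\sigma_x$ enters only through $CV_x\,\mu_x$, rescaling `mu_x` leaves the result unchanged, which matches the remark that PRE depends on $\sigma_x$ and $\mu_x$ only through $CV_x$.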

From Table 1, it is seen that the proposed procedure performs better than the Gupta et al. [9] procedure in most practical situations. Even though the proposed procedure is less efficient in some cases, this can be viewed as a trade-off for being able to obtain an unbiased estimator of $\sigma_x^2$ and an asymptotically unbiased estimator of $W$. Note that the PRE value increases with $k$ and/or $W$, and decreases with $CV_x$, when the other parameters are held fixed.

References

[1] M. Bhargava and R. Singh (2000), A modified randomization device for Warner's model, Statistica, Vol. 60.
[2] H. J. Chang and K. C. Huang (2001), Estimation of proportion and sensitivity of a qualitative character, Metrika, Vol. 53.
[3] H. J. Chang, C. L. Wang and K. C. Huang (2004), Using randomized response to estimate the proportion and truthful reporting probability in a dichotomous finite population, Journal of Applied Statistics, Vol. 31.
[4] H. J. Chang, C. L. Wang and K. C. Huang (2004), On estimating the proportion of a qualitative sensitive character using randomized response sampling, Quality & Quantity, Vol. 38.
[5] A. Chaudhuri (2001), Using randomized response from a complex survey to estimate a sensitive proportion in a dichotomous finite population, Journal of Statistical Planning and Inference, Vol. 94.
[6] A. Chaudhuri and R. Mukerjee (1988), Randomized Response: Theory and Techniques, Marcel Dekker, New York.
[7] T. C. Chua and A. K. Tsui (2000), Procuring honest responses indirectly, Journal of Statistical Planning and Inference, Vol. 90.
[8] B. G. Greenberg, R. R. Kubler, J. R. Abernathy and D. G. Horvitz (1971), Applications of the RR technique in obtaining quantitative data, Journal of the American Statistical Association, Vol. 66.
[9] S. Gupta, B. Gupta and S. Singh (2002), Estimation of sensitivity level of personal interview survey questions, Journal of Statistical Planning and Inference, Vol. 100.
[10] K. C. Huang (2004), A survey technique for estimating the proportion and sensitivity in a dichotomous finite population, Statistica Neerlandica, Vol. 58.
[11] M. N. Murthy (1967), Sampling Theory and Methods, Statistical Publishing Society, Calcutta.
[12] V. R. Padmawar and K. Vijayan (2000), Randomized response revisited, Journal of Statistical Planning and Inference, Vol. 90.
[13] S. Singh, M. Mahmood and D. S. Tracy (2001), Estimation of mean and variance of stigmatized quantitative variable using distinct units in randomized response sampling, Statistical Papers, Vol. 42.
[14] S. Singh, R. Singh and N. S. Mangat (2000), Some alternative strategies to Moors' model in randomized response sampling, Journal of Statistical Planning and Inference, Vol. 83.
[15] S. L. Warner (1965), Randomized response: a survey technique for eliminating evasive answer bias, Journal of the American Statistical Association, Vol. 60.

Received April, 2005
