Estimation of Sensitive Characteristic using Two-Phase Sampling for Regression Type Estimator

Journal of Informatics and Mathematical Sciences Volume 4 (01), Number 3,. 311 317 RGN Publications htt://www.rgnublications.com Estimation of Sensitive Characteristic using Two-Phase Samling for Regression Tye Estimator M. Javed and M.L. Bansal Abstract. The roblem of underreorting and non resonse on the sensitive issues are very common in most of the surveys due to our social setu. The randomied resonse (RR) models reduce rates of non-resonse and biased resonse that would ensure resondents rivacy if they resond truthfully concerning ersonal questions. Here RR device is roosed for regression tye estimator in which two indeendent samles are drawn from the oulation. The estimator of oulation mean of sensitive variable has been develoed. Its bias and variance and otimum variance have also been derived. Introduction Warner (1965) in his ioneer model suggested the technique of randomied resonse where every erson in the oulation belongs either to sensitive grou A, or its comliment. Enquirer often feels embarrassed in asking direct questions about sensitive issues. Such questions might be about drug usage, tax evasion, history of induced abortion, illegal income etc. Horvit et al. (1967) and Greenberg et al. (1971) have extended the Warner (1965) model to the case where the resonses to the sensitive question are quantitative rather than a simle yes or no. The resondent selects one of the two questions by means of a randomiation device. According to Greenberg et al. (1971), it is essential that the mean and variance of the resonses to the unrelated question be close to those for the sensitive questions, otherwise, it will often be ossible to recognie from the resonse which question was selected. Two samle single alternate RR model roosed by Greenberg and co-workers for obtaining the data on continuous tye sensitive random variable is more racticable and easy in handling. If some auxiliary information is available then resonses can be used in regression analysis by following Singh et al. (1996), Strachan et al. (1998), and Singh and King (1999). Recent ublications on RR techniques among others are, Singh et al. (000), Chaudhuri (001, 004), Singh (00), Elffers et al. (003), Huang Key words and hrases. RR device; Ratio tye estimator; Two-hase samling; Bias and variance.

31 M. Javed and M.L. Bansal (004), Javed and Grewal (006), Grewal et al. (006) and Sidhu et al. (007) and Gjestvang and Singh (006, 007), Samuel (008) and Carel et al. (010). Proosed Randomied Resonse Device Here an RR device is roosed for ratio tye estimator in which two indeendent samles, samle-i and samle-ii of sies n and k with relacement are drawn from the oulation. The resondents in the first samle are given the RR device which contained statements regarding A based on sensitive characteristic X and Q based on unrelated characteristic Y with resective robabilities and (1 ). Then the resondents are asked to answer the selected statement in terms of numerical figures without revealing to the interviewer which one of the two statements is answered. The resondents in the second samle are asked to answer in numerical figures (without using the RR device) for two direct statements Q 1 and Q based on unrelated characteristics Y 1 and Y resectively. The statements Q 1 and Q are correlated. The characteristics Y 1 and Y are unrelated with the sensitive character X. Let = robability that sensitive question is selected by the first resondent in the first samle. 1 = robability that non-sensitive question Q is selected by the first resondent in the first samle. = q Z i = observed resonse from individual i in first samle. X i = resonse of individual i in case he/she selects sensitive question through RR device in first samle. Y i = resonse of individual i in case he/ she selects alternate question Q through RR device in first samle. Y 1 j = resonse of first question Q 1 directly asked from j-th resondent in second samle. Y j = resonse of second question Q directly asked from j-th resondent in second samle. Estimator based on roosed randomied device Here a regression tye estimator for two-hase samling is roosed using the suggested RR device. A regression tye estimator µ xlr of mean µ x is roosed in two-hase samling is as below µ xlr = 1 X + (1 )ȳ (1 ){ ȳ + β(ȳ 1 ȳ 1} (1)

Estimation of Sensitive Characteristic using Two-Phase Samling for Regression Tye Estimator 313 where ȳ 1 = 1 m y 1 j, ȳ = 1 m y j, ȳ 1 = 1 k k y 1j and ȳ = 1 n n y i, () i=1 Let β = s y 1 y s y1, (3) s y1 y = 1 m 1 s y = 1 m 1 (y 1 j ȳ 1 )(y j ȳ ), s y 1 = 1 (y 1 j ȳ 1 ) and m 1 (y j ȳ ). (4) ε 1 = ȳ1 Ȳ 1 1, ε = ȳ 1 Ȳ 1 1, ε 3 = s y 1 y S y1 y such that E(ε 3 ) = E(ε 4 ) = 0 where 1 and ε 4 = s y 1 S y 1 1 (5) E(ε 1 ε 3 ) = µ 1 ȳ 1 µ 11 and E(ε 1 ε 4 ) = µ 30 ȳ 1 µ 0 (6) µ i j = E(y 1 ȳ 1 ) i (y ȳ ) j. (7) Bias of the Estimator The roosed estimator is biased. The bias of the estimator µ xlr is given in the following theorem: Theorem 1. The bias of the estimator µ xlr to the order O(n ) is given by 1 1 B(µ xlr ) = m 1 µ1 µ 11µ 30 k µ 0 µ. (8) 0 Proof. We have E(µ xlr ) = 1 E[ X + (1 )ȳ (1 ){ȳ + β(ȳ 1 ȳ 1}] = 1 [ X + (1 )Ȳ (1 ){Ȳ + βȳ 1 E((1 + ε 3 )(1 + ε 4 ) 1 (ε ε 1 ))}]. Considering the binomial exansion of (1 + ε 4 ) 1 u to the order of O(n ) and substituting the values of E(ε 1 ε 3 ) and E(ε 1 ε 4 ) in the above exression, we have 1 1 E(µ xlr ) = X m 1 µ1 µ 11µ 30 k µ 0 µ. (9) 0 As B(µ xlr ) = E(µ xlr ) X. (10) Using (9) in (10) we get (8).

314 M. Javed and M.L. Bansal Variance of the Estimator The variance of the estimator µ xlr is given in the following theorem: Theorem. The variance of the estimator µ xlr to the order of O(n ) is given by V (µ xlr ) = 1 σ 1 + (1 ) n k 1 1 σ y N + m 1 σ y (1 ρ ) (11) where σ = σ x + qσ y + q( X Ȳ ). Proof. We have V (µ xlr ) = 1 [V ( ) + (1 ) V (ȳ lrd )] (1) where = X + (1 )ȳ and ȳ lrd = ȳ + β(ȳ 1 ȳ 1) V (µ xlr ) = 1 σ n + (1 ) V (ȳ lrd ). (13) Now V (ȳ lrd ) = E 1 V ( ȳ lrd /s) + V 1 E (ȳ lrd /s) = E 1 V ( ȳ lrd /s) + V 1 (y ) where s means reliminary samle and V (ȳ lrd /s) = E( ȳ lrd /s) [E(ȳ lrd/s)] = E({ ȳ + β(ȳ 1 ȳ 1)} ) (E{ ȳ + β(ȳ 1 ȳ 1)}). For two-hase samling one can refer Sukhatme et al. (1984) to obtain the variance exression 1 V (ȳ lrd /s) = m 1 s y (1 ρ ) where ρ is the coefficient of correlation between y 1 and y and s y is defined at (4). 1 V (ȳ lrd ) = E 1 m 1 k s y (1 ρ ) + V 1 (y ) 1 = m 1 1 s y (1 ρ ) + k 1 σ y N. (14) Using the result obtained at (14) in (13), we get (11). Otimum Variance of the Estimator We should choose that rocedure which for a fixed cost C 0 minimies the variance of the estimate. If C 1, C and C 3 are the costs er unit of collecting information on the variable Z, auxiliary variable y 1 and y under study resectively. Then, the total cost of the survey can be exressed as C 0 = nc 1 + kc + mc 3. (15)

Estimation of Sensitive Characteristic using Two-Phase Samling for Regression Tye Estimator 315 We shall now determine n, k and m so that for a fixed cost C 0, the variance of the estimator µ xlr is least. Consider therefore, the exression φ = V (µ xlr ) λ(c 0 nc 1 kc mc 3 ). (16) On differentiating φ artially with resect to n, k and m and equating them equal to ero, we have φ n = 1 σ n + λc 1, (17) φ 1 σ k = y (1 ρ ) + λc, (18) φ 1 σ m = y (1 ρ ) + λc 3. (19) Solving (17), (18) and (19) for n, k and m resectively, we have σ m n =, (0) λc 1 1 1 ρ k = σ y, (1) λc m = k C C 3. () Again differentiating φ artially with resect to λ and equating the result equal to ero, we have φ λ = C 0 nc 1 kc mc 3. (3) Substituting the values of n, k and m in (3), we have λ = σ C1 + C [qa + k C 3 ] C 0 where A = σ y 1 ρ. Now substituting the values of λ in (0), (1) and (), we get the otimum values of n, k and m resectively as n ot = C 0σ B C 1, k ot = C 0qA B C, m ot = k ot C C 3

316 M. Javed and M.L. Bansal and B = σ C1 + C [qa + k C 3 ]. where A is defined above and q = 1. Theorem 3. The otimum variance of the estimator µ xlr to the order of O(n ) is given by V (µ xlr ) = 1 σ B C 1 B C + q C 0 C 0 qa 1 B σ y + N C 0 qa ( C 3 C )A. (4) Proof. We have V (µ xlr ) = 1 σ 1 + (1 ) n k 1 N 1 σ y + m 1 σ y (1 ρ ). Substituting the otimum values of n, k and m obtained earlier in the above exression, we get (4). References [1] F.W. Carel, Gerty Peeters, J.L.M. Lensvelt-Mulders and Karin Lasthuien (010), A note on a simle and ractical randomied resonse framework for eliciting dichotomous and quantitative information, Sociological Meth. Res., 39, 83 96. [] A. Chaudhuri (001), Using randomied resonse from a comlex survey to estimate a sensitive roortion in a dichotomous finite oulation, J. Statist. Planning Inference 94, 37 4. [3] A. Chaudhuri (004), Christofides randomied resonse technique in comlex samle survey, Metrika 60, 3 8. [4] E. Elffers, P.G.M. Van Der Heijden and M. Heemans (003), Exlaining regulatory non-comliance: a survey study of rule transgression for two Deutch instrumentl laws, alying the randomied resonse method, J. Quantitative Criminology 19, 409 439. [5] C.R. Gjestvang and S. Singh (006), A new randomied resonse model, J. Roy. Stat. Soc., B, 68, 53 530. [6] C.R. Gjestvang and S. Singh (007), Forced quantitative randomied resonse model: a new device, Metrika 66(), 43 56. [7] D.G. Horvit, B.V. Shah and W.R. Simmons (1967), The unrelated question randomied resonse model, Proc. Social Statist. Sec., Am. Statist. Ass., 65 7. [8] C.H. Huang (004), A survey technique for estimating the roortion and sensitivity in a dichotomous finite oulation, Statistica Neerlandica 58, 75 8. [9] B.G. Greenberg, R.R. Kuebler, J.R. Abernathy and D.G. Horvit (1971), Alication of the randomied resonse technique in obtaining quantitative data, J. Am. Statist. Ass. 66, 43 50. [10] I.S. Grewal, M. Bansal and S.S. Sidhu (006), Poulation mean estimator corresonding to Horvit and Thomson s estimator for multi-characteristics using randomied resonse technique, Model Assisted Statistics and Alications 1(4), 15 0. [11] M. Javed and I.S. Grewal (006), On the relative efficiencies of randomied resonse devices with Greenberg unrelated question model, Model Assisted Statistics and Alications 1(4), 91 97.

Estimation of Sensitive Characteristic using Two-Phase Samling for Regression Tye Estimator 317 [1] H. Samuel (008), The multi-item randomied resonse technique, Sociological Meth. Res. 36(4), 495 514. [13] S.S. Sidhu, M.L. Bansal and S. Singh (007), Estimation of sensitive multi-characters using unknown value of unrelated question, Alied Mathematical Sciences 1(37), 1803 180. [14] S. Singh (00), A new stochastic randomied resonse model, Metrika 56, 131 14. [15] S. Singh, A.H. Joarder and M.L. King (1996), Regression analysis using scrambled resonses, Australian J. Statist. 38(), 01 11. [16] S. Singh and M.L. King (1999), Estimation of coefficient of determination using scrambled resonses, J. Indian Soc. Agric. Statist. 5(3), 338 343. [17] S. Singh, R. Singh and N.S. Mangat (000), Some alternative strategies to Moor s model in randomied resonse samling, J. Statist. Planning Inference 83, 43 55. [18] R. Strachan, M.L. King and S. Singh (1998), Likelihood based estimation of the regression model with scrambled resonse, Australian Newealand J. Statist. 40(3), 79 90. [19] P.V. Sukhatme, B.V. Sukhatme, S. Sukhatme and C. Asok (1984), Samling Theory of Surveys with Alications, Iowa State University Press, USA and Indian Society of Agricultural Statistics, New Delhi, India. [0] S.L. Warner (1965), Randomied resonse: a survey technique for eliminating evasive answer bias, J. Am. Statist. Ass. 60, 63 69. M. Javed, Deartment of Mathematics, Statistics & Physics, Punjab Agricultural University, Ludhiana 141004, India. E-mail: mjaved@au.edu M.L. Bansal, Deartment of Mathematics, Statistics & Physics, Punjab Agricultural University, Ludhiana 141004, India. Received May, 011 Acceted May 9, 01