On stratified randomized response sampling


Model Assisted Statistics and Applications 1 (2005, 2006) 31-36
IOS Press

Jea-Bok Ryu a,*, Jong-Min Kim b, Tae-Young Heo c and Chun Gun Park d
a Statistics, Division of Life Science and Genetic Engineering and Statistics, Cheongju University, Cheongju, Chungbuk, 360-764, Republic of Korea
b Statistics, Division of Science and Mathematics, University of Minnesota, Morris, MN, 56267, USA
c Department of Statistics, North Carolina State University, Raleigh, NC, 27695, USA
d National Cancer Center, Ilsan, Goyang-si, Gyeonggi-do, 411-769, Republic of Korea

Abstract. In this paper, we propose a new quantitative randomized response model based on the two-stage randomized response model of Mangat and Singh [7]. We derive the estimator of the sensitive variable mean, and show that our method is more efficient than the randomized response estimators suggested by Greenberg et al. [3] and Gupta et al. [4].

Keywords: Quantitative randomized response technique, sensitive characteristics, stratified sampling

1. Introduction

The randomized response technique is a procedure for collecting information on sensitive characteristics without exposing the identity of the respondent. It was first introduced by Warner [8] as an alternative survey technique for socially undesirable or incriminating behavior questions. Greenberg et al. [3] proposed and developed the unrelated question randomized response design for estimating the mean and the variance of the distribution of a quantitative variable. Gupta et al. [4] and Arnab [1] showed that the optional randomized response model is more accurate while being less intrusive. Hong et al. [5] suggested a stratified randomized response model using proportional allocation. However, their model may incur high costs because of the difficulty of obtaining a proportional sample from each stratum. To rectify this problem, Kim and Warde [6] suggested a stratified randomized response model using optimal allocation, which is more efficient than one using proportional allocation.
*Corresponding author: Jea-Bok Ryu, Statistics, Division of Life Science and Genetic Engineering and Statistics, Cheongju University, Cheongju, Chungbuk, 360-764, Republic of Korea. E-mail: jbryu@cju.ac.kr.

ISSN 1574-1699/05/06/$17.00 2005/2006 IOS Press and the authors. All rights reserved

2. A review of quantitative randomized response methods

The unrelated question randomized response method proposed by Greenberg et al. [3] is a survey procedure in which a respondent is asked one of two questions, depending on the outcome of a randomization device. For example, an interviewee operates a randomization device with two outcomes, with pre-assigned probabilities $P$ and $1 - P$, and the outcome determines which of the following questions is answered:

S: How many abortions have you had during your lifetime?
N: How many magazines do you subscribe to?

where S denotes the sensitive question and N the non-sensitive question. Two independent, non-overlapping samples of sizes $n_1$ and $n_2$ are used ($n_1$ need not be equal to $n_2$). Let the population means of the sensitive and non-sensitive distributions be $\mu_A$ and $\mu_Y$, respectively, and let the corresponding population variances be $\sigma_A^2$ and $\sigma_Y^2$. Unbiased estimators for the means of the sensitive and non-sensitive random variables, $\mu_A$ and $\mu_Y$, are

$$\hat\mu_1 = \frac{(1 - P_2)\,\bar T_1 - (1 - P_1)\,\bar T_2}{P_1 - P_2} \tag{1}$$

and

$$\hat\mu_2 = \frac{P_2\,\bar T_1 - P_1\,\bar T_2}{P_2 - P_1}, \tag{2}$$

where $\bar T_i$ is the total sample mean computed from the responses in the $i$th sample and $P_i$ is the selection probability for the sensitive question in the $i$th sample, for $i = 1, 2$ ($P_1 \neq P_2$). The variance of $\hat\mu_1$ is given by

$$\mathrm{Var}(\hat\mu_1) = \frac{(1 - P_2)^2\,\mathrm{Var}(\bar T_1) + (1 - P_1)^2\,\mathrm{Var}(\bar T_2)}{(P_1 - P_2)^2}, \tag{3}$$

where

$$\mathrm{Var}(\bar T_j) = \frac{1}{n_j}\left(\sigma_Y^2 + P_j(\sigma_A^2 - \sigma_Y^2) + P_j(1 - P_j)(\mu_A - \mu_Y)^2\right).$$

In this method, if $\mu_Y$ and $\sigma_Y^2$ are known in advance, only one sample is needed. So we define $P_1 = P$ and $\bar T_1 = \bar T$; then Eqs (1) and (3) are simplified as

$$\hat\mu_1 = \frac{\bar T - (1 - P)\mu_Y}{P} \tag{4}$$

and

$$\mathrm{Var}(\hat\mu_1) = \frac{\mathrm{Var}(\bar T)}{P^2} = \frac{1}{nP^2}\left(\sigma_Y^2 + P(\sigma_A^2 - \sigma_Y^2) + P(1 - P)(\mu_A - \mu_Y)^2\right). \tag{5}$$

Eichhorn and Hayre [2] introduce a scrambled randomized response method for estimating the mean $\mu_A$ and the variance $\sigma_A^2$ of the sensitive question $A$. According to them, each respondent selected in the sample is instructed to use a randomization device and generate a random number, say $B$, from some pre-assigned distribution. The distribution of the random variable $B$, also called a scrambling variable, is assumed to be known. The mean $\mu_B$ and the variance $\sigma_B^2$ of the scrambling variable are also assumed to be known. The $i$th respondent selected in the sample of size $n$, drawn by using simple random sampling with replacement (SRSWR), is requested to report the value $Z_i = B_i A_i / \mu_B$ as a scrambled response on the sensitive variable $A$. They show that an unbiased estimator of the population mean $\mu_A$ is given by

$$\hat\mu_E = \frac{1}{n}\sum_{i=1}^{n} Z_i \tag{6}$$

with variance

$$\mathrm{Var}(\hat\mu_E) = \frac{1}{n}\left(\sigma_A^2 + C_B^2(\sigma_A^2 + \mu_A^2)\right), \tag{7}$$

where $\sigma_B$ is the standard deviation of the scrambling variable $B$, and $C_B = \sigma_B/\mu_B$ denotes the known coefficient of variation of the scrambling variable $B$.

Gupta et al. [4] propose an optional randomized response technique, which is more efficient than the scrambled randomized response technique suggested by Eichhorn and Hayre [2].
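As a numerical illustration of Eqs (5) to (7), the following sketch evaluates the two closed-form variances and simulates the scrambled estimator of Eq. (6). The function names and the uniform distributions chosen for $A$ and $B$ are assumptions made for this example only; they do not come from the paper.

```python
import random


def greenberg_variance(n, p, mu_a, mu_y, var_a, var_y):
    """Variance of the single-sample Greenberg et al. estimator, Eq. (5)."""
    var_t_bar = (var_y + p * (var_a - var_y)
                 + p * (1 - p) * (mu_a - mu_y) ** 2) / n
    return var_t_bar / p ** 2


def scrambled_variance(n, mu_a, var_a, cv_b):
    """Variance of the Eichhorn-Hayre scrambled estimator, Eq. (7)."""
    return (var_a + cv_b ** 2 * (var_a + mu_a ** 2)) / n


def scrambled_estimate(true_values, draw_b, mu_b, rng):
    """Mean of the scrambled responses Z_i = B_i * A_i / mu_B, Eq. (6)."""
    z = [draw_b(rng) * a / mu_b for a in true_values]
    return sum(z) / len(z)


# Hypothetical population (an assumption of this sketch):
# A ~ Uniform(1, 3) with mean 2, B ~ Uniform(0.5, 1.5) with mean 1.
rng = random.Random(0)
sample_a = [rng.uniform(1.0, 3.0) for _ in range(20000)]
estimate = scrambled_estimate(sample_a, lambda r: r.uniform(0.5, 1.5), 1.0, rng)
```

With these settings `estimate` should lie close to the true mean 2, which is what the unbiasedness of Eq. (6) implies; the interviewer only ever sees the products $B_i A_i$, never the individual $A_i$.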
In the optional randomized response technique, each respondent, selected by SRSWR, can choose one of the following two options:

(a) The respondent can report the correct response $A$, or
(b) The respondent can report the scrambled response $BA$, where $B$ denotes the independent scrambling variable.

In the optional procedure, they assumed that both $B$ and $A$ are positive random variables and $\mu_B = 1$. The optional randomized response model can be written as

$$Z = B^{I} A, \tag{8}$$

where $I$ is an indicator random variable defined as

$$I = \begin{cases} 1 & \text{if the response is scrambled} \\ 0 & \text{otherwise.} \end{cases}$$

If $W$ denotes the probability that a person will report the scrambled response, then $I$ is a Bernoulli random variable with $E(I) = W$, where $W$ can be called the sensitivity of the question. They showed that an unbiased estimator of the population mean $\mu_A$ is given by

$$\hat\mu_G = \frac{1}{n}\sum_{i=1}^{n} Z_i, \tag{9}$$

with variance

$$\mathrm{Var}(\hat\mu_G) = \frac{1}{n}\left(\sigma_A^2 + W C_B^2(\sigma_A^2 + \mu_A^2)\right), \tag{10}$$

where $C_B = \sigma_B/\mu_B$ denotes the known coefficient of variation of the scrambling variable $B$.

3. Two-stage quantitative randomized response model

In this section, we propose a two-stage quantitative randomized response model. We assume that a sample of size $n$ is selected by SRSWR. The method is described as follows.

Stage 1: An individual respondent in the sample is instructed to use the randomization device $R_1$, which consists of two statements: (i) report the correct response $A$ to the sensitive question and (ii) go to the randomization device $R_2$ in the second stage, represented with probabilities $P$ and $1 - P$.

Stage 2: The randomization device $R_2$ consists of two statements: (i) report the correct response $A$ to the sensitive question and (ii) report the scrambled response $AB$ of the sensitive question, represented with probabilities $T$ and $1 - T$.

To protect the respondent's privacy, the respondent should not report to the interviewer which steps were taken. We assume that both $B$ and $A$ are positive random variables, $\mu_B = 1$, and $\sigma_B^2 = \psi^2$. Similar to the approach of Eichhorn and Hayre [2], the distribution of the random variable $B$, and the mean $\mu_B$ and the variance $\sigma_B^2$ of the scrambling variable, are all assumed to be known. Based on the two-stage procedure, the $i$th respondent selected in the sample of size $n$, drawn by using SRSWR, is requested to report the value

$$U = \alpha A + (1 - \alpha)\left(\beta A + (1 - \beta)AB\right), \tag{11}$$

where

$$\alpha = \begin{cases} 1 & \text{if the respondent chooses statement (i) in } R_1 \\ 0 & \text{if the respondent chooses statement (ii) in } R_1 \end{cases}
\quad\text{and}\quad
\beta = \begin{cases} 1 & \text{if the respondent chooses statement (i) in } R_2 \\ 0 & \text{if the respondent chooses statement (ii) in } R_2. \end{cases}$$

The expected value of the observed response is

$$E(U) = E\left(\alpha A + (1 - \alpha)(\beta A + (1 - \beta)AB)\right) = P\mu_A + (1 - P)\left(T\mu_A + (1 - T)\mu_A\mu_B\right) = \mu_A, \tag{12}$$

where $\alpha$ is a Bernoulli random variable with $E(\alpha) = P$, $\mathrm{Var}(\alpha) = P(1 - P)$, and $\beta$ is a Bernoulli random variable with $E(\beta) = T$, $\mathrm{Var}(\beta) = T(1 - T)$.

Theorem 3.1. An unbiased estimator, $\hat\mu_A$, of the population mean $\mu_A$ is given by

$$\hat\mu_A = \frac{1}{n}\sum_{i=1}^{n} U_i. \tag{13}$$

Theorem 3.2. The variance of the proposed estimator $\hat\mu_A$ is given by

$$\mathrm{Var}(\hat\mu_A) = \frac{1}{n}\left[\sigma_A^2 + (1 - P)(1 - T)\psi^2(\mu_A^2 + \sigma_A^2)\right]. \tag{14}$$

If $0 < P < 1$ in Eq. (14), then, to obtain the relative efficiency of $\hat\mu_A$ with respect to $\hat\mu_G$, we compare $\mathrm{Var}(\hat\mu_G)$ and $\mathrm{Var}(\hat\mu_A)$ as follows:

$$\mathrm{Var}(\hat\mu_G) - \mathrm{Var}(\hat\mu_A) = \frac{1}{n}\left(\sigma_A^2 + W\psi^2(\sigma_A^2 + \mu_A^2)\right) - \frac{1}{n}\left[\sigma_A^2 + (1 - P)(1 - T)\psi^2(\mu_A^2 + \sigma_A^2)\right] = \frac{1}{n}\left(P(\mu_A^2 + \sigma_A^2)(1 - T)\psi^2\right) \geq 0.$$

The final equality holds when $W = 1 - T$, that is, when the probability of a scrambled response in the optional model equals that of the second-stage device. We have shown that the proposed estimator $\hat\mu_A$ is more efficient than the estimator $\hat\mu_G$ suggested by Gupta et al. [4].

4. Stratified two-stage quantitative randomized response model

In this section, we propose a two-stage quantitative randomized response technique in stratified sampling. The main advantage of the stratified approach is that it overcomes the loss of the individual characteristics of the respondents.
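The two-stage procedure of Section 3 can be sketched as a small simulation. This is a minimal illustration, assuming uniform distributions for $A$ and $B$ (with $\mu_B = 1$) and hypothetical function names; it generates responses according to Eq. (11), averages them as in Eq. (13), and evaluates the variances of Eqs (10) and (14).

```python
import random


def two_stage_response(a, p, t, draw_b, rng):
    """One reported value U from Eq. (11) for a respondent with true value a."""
    if rng.random() < p:      # alpha = 1: statement (i) in R1, report A
        return a
    if rng.random() < t:      # beta = 1: statement (i) in R2, report A
        return a
    return a * draw_b(rng)    # beta = 0: report the scrambled response AB


def two_stage_variance(n, p, t, psi2, mu_a, var_a):
    """Variance of the two-stage estimator, Eq. (14), with sigma_B^2 = psi2."""
    return (var_a + (1 - p) * (1 - t) * psi2 * (mu_a ** 2 + var_a)) / n


def optional_variance(n, w, psi2, mu_a, var_a):
    """Variance of the optional estimator, Eq. (10), with mu_B = 1."""
    return (var_a + w * psi2 * (mu_a ** 2 + var_a)) / n


# Hypothetical settings (assumptions of this sketch):
# A ~ Uniform(1, 3) with mean 2, B ~ Uniform(0.5, 1.5) with mean 1.
rng = random.Random(1)
draw_b = lambda r: r.uniform(0.5, 1.5)
responses = [two_stage_response(rng.uniform(1.0, 3.0), 0.5, 0.5, draw_b, rng)
             for _ in range(20000)]
mu_hat = sum(responses) / len(responses)   # Eq. (13)
```

For $n = 100$, $P = T = 0.5$, $\psi^2 = 0.5$, $\mu_A = 2$ and $\sigma_A^2 = 1$, `two_stage_variance` gives 0.01625 and `optional_variance` with $W = 1 - T = 0.5$ gives 0.0225; the gap of 0.00625 equals $P(1 - T)\psi^2(\mu_A^2 + \sigma_A^2)/n$, in line with the comparison following Theorem 3.2.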
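We assume that the population is partitioned into strata, and a sample is selected by SRSWR from each stratum. We assume that the number of units in each stratum is known. Let $n_h$ denote the number of units in the sample from stratum $h$ and $n$ denote the total number of units in the samples from all strata, so that $n = \sum_{h=1}^{k} n_h$.

Stage 1: An individual respondent in the sample is instructed to use the randomization device $R_{1h}$, which consists of two statements: (i) report the correct response to the sensitive question and (ii) go to the randomization device $R_{2h}$ in the second stage, represented with probabilities $P_h$ and $1 - P_h$.

Stage 2: The randomization device $R_{2h}$ consists of two statements: (i) report the correct response to the sensitive question and (ii) report the scrambled response $AB$ of the sensitive question, represented with probabilities $T_h$ and $1 - T_h$.

Under the assumption that the respondent reports truthfully and that $P_h$ and $T_h$ are set by the researcher, the distribution of the random variable $B_h$, and the mean $\mu_{B_h}$ and the variance $\sigma_{B_h}^2$ of the scrambling variable, are all assumed to be known. We assume that $\mu_{B_h} = 1$ and $\sigma_{B_h}^2 = \psi_h^2$ for all $h = 1, 2, \ldots, k$. The $i$th respondent selected in the sample of size $n_h$ in stratum $h$, drawn by using SRSWR, is requested to report the value

$$U_h = \alpha_h A_h + (1 - \alpha_h)\left(\beta_h A_h + (1 - \beta_h)A_h B_h\right), \tag{15}$$

where

$$\alpha_h = \begin{cases} 1 & \text{if the respondent chooses statement (i) in } R_{1h} \\ 0 & \text{if the respondent chooses statement (ii) in } R_{1h} \end{cases}
\quad\text{and}\quad
\beta_h = \begin{cases} 1 & \text{if the respondent chooses statement (i) in } R_{2h} \\ 0 & \text{if the respondent chooses statement (ii) in } R_{2h}. \end{cases}$$

Similar to Eq. (12), the expected value of the observed response is given by

$$E(U_h) = E\left(\alpha_h A_h + (1 - \alpha_h)(\beta_h A_h + (1 - \beta_h)A_h B_h)\right) = P_h\mu_{A_h} + (1 - P_h)\left(T_h\mu_{A_h} + (1 - T_h)\mu_{A_h}\mu_{B_h}\right) = \mu_{A_h}, \tag{16}$$

where $\alpha_h$ is a Bernoulli random variable with $E(\alpha_h) = P_h$, $\mathrm{Var}(\alpha_h) = P_h(1 - P_h)$, and $\beta_h$ is a Bernoulli random variable with $E(\beta_h) = T_h$, $\mathrm{Var}(\beta_h) = T_h(1 - T_h)$. By Theorem 3.1, an unbiased estimator of the population mean $\mu_{A_h}$ in stratum $h$ is

$$\hat\mu_{A_h} = \frac{1}{n_h}\sum_{i=1}^{n_h} U_{hi} \tag{17}$$

and its variance is

$$\mathrm{Var}(\hat\mu_{A_h}) = \frac{1}{n_h}\left((\mu_{A_h}^2 + \sigma_{A_h}^2)\left(P_h + (1 - P_h)T_h + (1 - P_h)(1 - T_h)(1 + \psi_h^2)\right) - \mu_{A_h}^2\right).$$

Since the selections in different strata are made independently, the mean estimators for individual strata can be added together to obtain a mean estimator for the whole population. The mean estimator of $\mu_A$ for the stratified sampling scheme is

$$\hat\mu_A^{s} = \sum_{h=1}^{k} w_h\,\hat\mu_{A_h} = \sum_{h=1}^{k} \frac{w_h}{n_h}\sum_{i=1}^{n_h} U_{hi}, \tag{18}$$

where $w_h = N_h/N$ for $h = 1, 2, \ldots, k$, so that $w = \sum_{h=1}^{k} w_h = 1$, $N$ is the number of units in the whole population and $N_h$ is the total number of units in stratum $h$. It is easily shown that the proposed mean estimator $\hat\mu_A^{s}$ is an unbiased estimate of the population mean $\mu_A$.

The optimal allocation and the resulting variance, which the text refers to after Eq. (19), are

$$\frac{n_h}{n} = \frac{w_h\sqrt{(\mu_{A_h}^2 + \sigma_{A_h}^2)\left(P_h + (1 - P_h)T_h + (1 - P_h)(1 - T_h)(1 + \psi_h^2)\right) - \mu_{A_h}^2}}{\sum_{h=1}^{k} w_h\sqrt{(\mu_{A_h}^2 + \sigma_{A_h}^2)\left(P_h + (1 - P_h)T_h + (1 - P_h)(1 - T_h)(1 + \psi_h^2)\right) - \mu_{A_h}^2}}$$

and

$$\mathrm{Var}(\hat\mu_A^{s}) = \frac{1}{n}\left(\sum_{h=1}^{k} w_h\sqrt{(\mu_{A_h}^2 + \sigma_{A_h}^2)\left(P_h + (1 - P_h)T_h + (1 - P_h)(1 - T_h)(1 + \psi_h^2)\right) - \mu_{A_h}^2}\right)^{2}. \tag{20}$$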
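The stratified estimator of Eq. (18) and the allocation rule, under which $n_h$ is proportional to $w_h$ times the square root of the bracketed stratum variance term, can be sketched as follows. The helper names are hypothetical, the stratum sizes are returned as unrounded real numbers, and the example inputs are illustrative values rather than data from the paper.

```python
import math


def d_term(p, t, psi2, mu_a, var_a):
    """Per-respondent variance term of one stratum (n_h times Var of Eq. (17))."""
    return ((mu_a ** 2 + var_a)
            * (p + (1 - p) * t + (1 - p) * (1 - t) * (1 + psi2))
            - mu_a ** 2)


def stratified_mean(weights, stratum_responses):
    """Weighted sum of the stratum means of the reported values, Eq. (18)."""
    return sum(w * sum(u) / len(u) for w, u in zip(weights, stratum_responses))


def stratified_variance(weights, d_terms, sizes):
    """Variance of the stratified estimator for given stratum sample sizes."""
    return sum(w ** 2 * d / n_h for w, d, n_h in zip(weights, d_terms, sizes))


def optimal_allocation(n, weights, d_terms):
    """Split n so that n_h is proportional to w_h * sqrt(D_h) (not rounded)."""
    scores = [w * math.sqrt(d) for w, d in zip(weights, d_terms)]
    total = sum(scores)
    return [n * s / total for s in scores]


# Illustrative two-stratum example with equal weights (assumed values).
weights = [0.5, 0.5]
d_terms = [1.0, 4.0]
sizes = optimal_allocation(90, weights, d_terms)             # about [30, 60]
var_optimal = stratified_variance(weights, d_terms, sizes)   # about 0.025
var_equal = stratified_variance(weights, d_terms, [45, 45])  # about 0.0278
```

In this example the optimal split has a smaller variance than the equal split of the same total sample size, and `var_optimal` agrees with the closed form $(\sum_h w_h\sqrt{D_h})^2 / n = 1.5^2/90$ of Eq. (20). In practice the sizes would have to be rounded to integers.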
The variance of the mean estimator $\hat\mu_A^{s}$ is

$$\mathrm{Var}(\hat\mu_A^{s}) = \sum_{h=1}^{k} \frac{w_h^2}{n_h}\left((\mu_{A_h}^2 + \sigma_{A_h}^2)\left(P_h + (1 - P_h)T_h + (1 - P_h)(1 - T_h)(1 + \psi_h^2)\right) - \mu_{A_h}^2\right). \tag{19}$$

Information on $\mu_{A_h}$ and $\sigma_{A_h}^2$ is usually unavailable. However, if prior information on $\mu_{A_h}$ and $\sigma_{A_h}^2$ is available from past experience, then we may derive the following optimal allocation formula. Using the optimal-allocation approach of Kim and Warde [6], one can show that the variance in Eq. (19) is minimized when $n_1, n_2, \ldots, n_k$ are chosen according to the unnumbered allocation equation displayed immediately before Eq. (20). Under this optimal-allocation assumption, the variance in Eq. (19) becomes Eq. (20).

Theorem 4.1. Assuming optimal allocation, when $w_1 = w_2 = 1/2$ and $D = (D_1 + D_2)/2$, the stratified estimator $\hat\mu_A^{s}$ is more efficient than the proposed model estimator $\hat\mu_A$, where

$$D = (\mu_A^2 + \sigma_A^2)\left(P + (1 - P)T + (1 - P)(1 - T)(1 + \psi^2)\right) - \mu_A^2,$$

$$D_h = (\mu_{A_h}^2 + \sigma_{A_h}^2)\left(P_h + (1 - P_h)T_h + (1 - P_h)(1 - T_h)(1 + \psi_h^2)\right) - \mu_{A_h}^2,$$

for $h = 1$ and $2$.

For other cases, we can easily check the relative efficiency by a variance comparison with various settings of $\mu_{A_h}$, $\sigma_{A_h}^2$, $P_h$ and $T_h$. Theorem 4.1 guarantees that when optimal allocation is used, the stratified estimator, $\hat\mu_A^{s}$, is more efficient than the estimator, $\hat\mu_A$, which ignores the stratification.

Fig. 1. The relative efficiency of $\hat\mu_A$ with respect to $\hat\mu_1$ as a function of $P = 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9$ and $T = 0.2, 0.4, 0.6, 0.8$ with $\sigma_A^2 = 1$, $\sigma_Y^2 = \sigma_B^2 = 0.5$ and $\mu_A = 2$ (top left); $\mu_A = 4$ (top right); $\mu_A = 6$ (bottom left); $\mu_A = 8$ (bottom right).

5. Comparison and discussion

In this section, we present a numerical study of the two-stage quantitative randomized response model. The purpose of the study is to confirm that the proposed technique is more efficient. We compare the original quantitative randomized response model proposed by Greenberg et al. [3] ($\hat\mu_1$) with the proposed model ($\hat\mu_A$) in terms of variance. From Eqs (4) and (5), the sensitive mean estimator of Greenberg et al. [3] (when $\mu_Y$ is known) and its variance are

$$\hat\mu_1 = \frac{\bar T - (1 - P)\mu_Y}{P}, \qquad \mathrm{Var}(\hat\mu_1) = \frac{\mathrm{Var}(\bar T)}{P^2} = \frac{1}{nP^2}\left(\sigma_Y^2 + P(\sigma_A^2 - \sigma_Y^2) + P(1 - P)(\mu_A - \mu_Y)^2\right).$$

Under the assumptions $\mu_Y = \mu_B = 1$ and $\sigma_Y^2 = \sigma_B^2 = \psi^2$,

$$\hat\mu_1 = \frac{\bar T - (1 - P)}{P}, \qquad \mathrm{Var}(\hat\mu_1) = \frac{\mathrm{Var}(\bar T)}{P^2} = \frac{1}{nP^2}\left(\psi^2 + P(\sigma_A^2 - \sigma_Y^2) + P(1 - P)(\mu_A - 1)^2\right).$$

The relative efficiency of $\hat\mu_A$ with respect to $\hat\mu_1$ is as follows:

$$\mathrm{RE} = \frac{\mathrm{Var}(\hat\mu_1)}{\mathrm{Var}(\hat\mu_A)} = \frac{\psi^2 + P(\sigma_A^2 - \psi^2) + P(1 - P)(\mu_A - 1)^2}{P^2\left((\mu_A^2 + \sigma_A^2)\left(P + (1 - P)T + (1 - P)(1 - T)(1 + \psi^2)\right) - \mu_A^2\right)}.$$

Figure 1 shows that the proposed estimator, $\hat\mu_A$, is more efficient than the Greenberg et al. [3] estimator, $\hat\mu_1$, for the parameter values considered. We can show that the proposed method is more efficient than the Greenberg et al. [3] method if the coefficient of variation satisfies $C_B = \sigma_B/\mu_B = \sigma_B \leq 1.0$. Our newly proposed two-stage quantitative randomized response model improves performance by taking advantage of the randomized response information provided by the second stage. We have shown that our model is much more efficient than the models of Greenberg et al. [3] and Gupta et al. [4]. Additionally, we have provided a comprehensive description of the two-stage quantitative stratified randomized response model and its statistical properties. The stratified quantitative randomized response model can overcome a limitation of the randomized response model, which can lose the individual characteristics of the respondents.

References

[1] R. Arnab, Optional randomized response techniques for complex survey designs, Biometrical Journal 46(1) (2004), 114-124.
[2] B.H. Eichhorn and L.S. Hayre, Scrambled randomized response methods for obtaining sensitive quantitative data, Journal of Statistical Planning and Inference 7 (1983), 307-316.
[3] B.G. Greenberg, R.R. Kuebler Jr., J.R. Abernathy and D.G. Horvitz, Application of the randomized response technique in obtaining quantitative data, Journal of the American Statistical Association 66 (1971), 243-250.
[4] S. Gupta, B. Gupta and S. Singh, Estimation of sensitivity level of personal interview survey questions, Journal of Statistical Planning and Inference 100 (2002), 239-247.
[5] K. Hong, J. Yum and H. Lee, A stratified randomized response technique, Korean Journal of Applied Statistics 7 (1994), 141-147.
[6] J.-M. Kim and W.D. Warde, A stratified Warner's randomized response model, Journal of Statistical Planning and Inference 120(1-2) (2004), 155-165.
[7] N.S. Mangat and R. Singh, An alternative randomized response procedure, Biometrika 77 (1990), 439-442.
[8] S.L. Warner, Randomized response: a survey technique for eliminating evasive answer bias, Journal of the American Statistical Association 60 (1965), 63-69.