Jong-Min Kim* and Jon E. Anderson. Statistics Discipline Division of Science and Mathematics University of Minnesota at Morris

1 Jackknife Variance Estimation for Two Samples after Imputation under Two-Phase Sampling
Jong-Min Kim* and Jon E. Anderson
Statistics Discipline, Division of Science and Mathematics, University of Minnesota at Morris
2004 JSM, August 11

2 Outline
- Jackknife Background and Definition
- Ratio Estimator in Simple Random Sampling
  - Jackknife variance estimators for two samples
  - Simulation study
- Ratio Estimator in Stratified Random Sampling
  - Jackknife variance estimators for two samples
  - Simulation study
- Conclusions

3 Jackknife Background
- Introduced by Quenouille (1949, 1956) as a method to reduce bias.
- Popularized by Tukey (1958), who used it for variances and CIs.
- Arvesen (1969) was the first to propose a two-sample jackknife estimator.

4 Jackknife Definition
Let $\hat{\theta}$ be an estimate.
Let $\hat{\theta}(-j)$ be an estimator of the same form with observation $j$ deleted.
The jackknife estimate of the variance of $\hat{\theta}$ is
$$\frac{n-1}{n}\sum_{j=1}^{n}\left[\hat{\theta}(-j)-\hat{\theta}\right]^2.$$
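The definition translates almost directly into code. The following is a minimal sketch, not taken from the slides: the function name `jackknife_variance` and the data are hypothetical. For the sample mean, the delete-one jackknife variance coincides with the usual $s^2/n$, which gives a convenient sanity check.

```python
from statistics import mean

def jackknife_variance(data, estimator):
    """Delete-one jackknife variance of estimator(data)."""
    n = len(data)
    theta_hat = estimator(data)
    # theta_hat(-j): the same estimator with observation j removed.
    leave_one_out = [estimator(data[:j] + data[j + 1:]) for j in range(n)]
    return (n - 1) / n * sum((t - theta_hat) ** 2 for t in leave_one_out)

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
v_mean = jackknife_variance(sample, mean)
print(v_mean)
```

Passing a different `estimator` (a ratio, a median, and so on) reuses the same delete-one loop.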

5 Naive Jackknife Variance Estimator for Two Samples, Two-Phase, Simple Random Sampling
Two first-phase simple random samples, $s'_1$ of size $n'_1$ and $s'_2$ of size $n'_2$, are taken without replacement from a population of $N$ elements. Simple random subsamples $s_1$ of size $n_1$ and $s_2$ of size $n_2$ are then taken without replacement from $s'_1$ and $s'_2$. The ratio estimators of $\bar{Y}$ are
$$\bar{y}_{r1} = (\bar{y}_1/\bar{x}_1)\,\bar{x}'_1 = \hat{R}_1\bar{x}'_1, \qquad \bar{y}_{r2} = (\bar{y}_2/\bar{x}_2)\,\bar{x}'_2 = \hat{R}_2\bar{x}'_2,$$
where $\bar{x}'_1$ and $\bar{x}'_2$ are the means for the first-phase samples $s'_1$ and $s'_2$, and $(\bar{x}_1, \bar{y}_1)$ and $(\bar{x}_2, \bar{y}_2)$ are the means for the second-phase samples $s_1$ and $s_2$.

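6 Naive Jackknife Variance Estimator for Two Samples, SRS
Define
$$\bar{y}_{rk}(-j) = \frac{\bar{y}_k(-j)}{\bar{x}_k(-j)}\,\bar{x}'_k(-j), \quad \text{for all } j \in s'_k,\ k = 1, 2,$$
where
$$\bar{x}_k(-j) = \begin{cases} (n_k\bar{x}_k - x_j)/(n_k-1), & \text{if } j \in s_k,\\ \bar{x}_k, & \text{if } j \in s'_k - s_k,\end{cases}$$
$$\bar{y}_k(-j) = \begin{cases} (n_k\bar{y}_k - y_j)/(n_k-1), & \text{if } j \in s_k,\\ \bar{y}_k, & \text{if } j \in s'_k - s_k,\end{cases}$$
and
$$\bar{x}'_k(-j) = (n'_k\bar{x}'_k - x_j)/(n'_k-1) \quad \text{for all } j \in s'_k.$$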

7 Naive Jackknife Variance Estimator for Two Samples, SRS
Apply the usual jackknife method to $\bar{y}_{rk}(-j)$ to get
$$v_{Jrk} = \frac{n'_k-1}{n'_k}\sum_{j \in s'_k}\left[\bar{y}_{rk}(-j)-\bar{y}_{rk}\right]^2.$$
The jackknife variance estimator is a weighted average of the two estimators, given by
$$v_{Jr} = \frac{n'_1 v_{Jr1} + n'_2 v_{Jr2}}{n'_1+n'_2} = \frac{n'_1-1}{n'_1+n'_2}\sum_{j \in s'_1}\left[\bar{y}_{r1}(-j)-\bar{y}_{r1}\right]^2 + \frac{n'_2-1}{n'_1+n'_2}\sum_{l \in s'_2}\left[\bar{y}_{r2}(-l)-\bar{y}_{r2}\right]^2.$$
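The naive estimator can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: each sample is represented as a list of `(x, y)` pairs with `y = None` for units that are only in the first phase, the helper names are hypothetical, and the weights in the final average are taken to be the first-phase sample sizes. Deleting a unit from the list reproduces the case split for the delete-one means: the second-phase means change only when the deleted unit has an observed `y`, while the first-phase mean of `x` always changes.

```python
def ratio_estimate(units):
    """Two-phase ratio estimate; units is a list of (x, y), y=None if only x is known."""
    second = [(x, y) for x, y in units if y is not None]
    x_bar_first = sum(x for x, _ in units) / len(units)   # first-phase mean of x
    x_bar = sum(x for x, _ in second) / len(second)       # second-phase mean of x
    y_bar = sum(y for _, y in second) / len(second)       # second-phase mean of y
    return y_bar / x_bar * x_bar_first

def naive_jackknife(units):
    """Return the ratio estimate and its delete-one jackknife variance."""
    n_first = len(units)
    y_r = ratio_estimate(units)
    # Delete each first-phase unit in turn and recompute the ratio estimate.
    replicates = [ratio_estimate(units[:j] + units[j + 1:]) for j in range(n_first)]
    v = (n_first - 1) / n_first * sum((r - y_r) ** 2 for r in replicates)
    return y_r, v

def combine(v1, n1, v2, n2):
    """Weighted average of two variance estimates (weights assumed to be n1, n2)."""
    return (n1 * v1 + n2 * v2) / (n1 + n2)

# Hypothetical samples: five first-phase units each, y observed on three of them.
sample_1 = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, None), (5.0, None)]
sample_2 = [(2.0, 4.0), (4.0, 8.0), (6.0, 12.0), (8.0, None), (10.0, None)]
y_r1, v_1 = naive_jackknife(sample_1)
y_r2, v_2 = naive_jackknife(sample_2)
v_combined = combine(v_1, len(sample_1), v_2, len(sample_2))
```

In this toy data `y` is exactly `2 * x`, so every delete-one ratio equals 2 and the variance comes entirely from the first-phase mean of `x`.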

8 Adjusted Jackknife Variance Estimator for Two Samples, Two-Phase, Simple Random Sampling
Let us consider an adjusted jackknife variance estimator using a data imputation framework. For each sample $k = 1, 2$, we define our estimator of $\bar{Y}$ as
$$\bar{y}_{kI} = \frac{1}{n'_k}\sum_{i \in s'_k}\tilde{y}_i.$$
If observation $i$ is part of the second-phase sample $s_k$, then $\tilde{y}_i = y_i$, because $y_i$ is observed. If the value of $y$ is not directly observed because unit $i$ is part of $s'_k - s_k$, the value is obtained through ratio imputation as $\tilde{y}_i = (\bar{y}_k/\bar{x}_k)\,x_i$.

9 Adjusted Jackknife Variance Estimator for Two Samples, Two-Phase, Simple Random Sampling
Rao and Sitter (1995) proposed the following device:
$$\hat{z}_{ki}(-j) = y^*_{ki} + \left\{\frac{\bar{y}_k(-j)}{\bar{x}_k(-j)}\,x_{ki} - \frac{\bar{y}_k}{\bar{x}_k}\,x_{ki}\right\},$$
for sample $k = 1, 2$. Under this formulation, $\hat{z}_{ki}(-j) = y^*_{ki} = (\bar{y}_k/\bar{x}_k)\,x_{ki}$ for $j \in s'_k - s_k$ in sample $k = 1, 2$, and $\hat{z}_{ki}(-j) = (\bar{y}_k(-j)/\bar{x}_k(-j))\,x_{ki}$ for $j \in s_k$ in sample $k = 1, 2$.

10 Define the adjusted estimator,
$$\bar{y}^a_{kI}(-j) = \frac{1}{n'_k-1}\sum_{i=1}^{n'_k}\hat{z}_{ki}(-j).$$
Define the jackknife variance estimator for sample $k$,
$$v_{Jrk} = \frac{n'_k-1}{n'_k}\sum_{j \in s'_k}\left[\bar{y}^a_{kI}(-j)-\bar{y}_{kI}\right]^2,$$
where $\bar{y}_{kI} = \bar{y}_{rk}$ under ratio imputation. The jackknife variance estimator based on the adjusted imputed estimators $\bar{y}_{kI}$, $k = 1, 2$, is a weighted average of the two estimators, given by
$$v^a_{Jr} = \frac{n'_1 v_{Jr1} + n'_2 v_{Jr2}}{n'_1+n'_2} = \frac{n'_1-1}{n'_1+n'_2}\sum_{j \in s'_1}\left[\bar{y}^a_{1I}(-j)-\bar{y}_{1I}\right]^2 + \frac{n'_2-1}{n'_1+n'_2}\sum_{l \in s'_2}\left[\bar{y}^a_{2I}(-l)-\bar{y}_{2I}\right]^2.$$
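The adjusted replicates can be sketched as follows for a single sample. Several things here are assumptions and not taken from the slides: the `(x, y)` representation with `y = None` for imputed units, the function name, and in particular the reading of the sum in the adjusted estimator, which is taken to run over the retained first-phase units with the observed $y_i$ used for second-phase units and $\hat{z}_{ki}(-j)$ used for imputed units. Under that reading each adjusted replicate equals the delete-one ratio estimate, which the toy data below can be used to check by hand.

```python
def adjusted_jackknife(units):
    """units: list of (x, y); y is None where y was not observed (imputed units)."""
    n_first = len(units)
    sum_x = sum(x for x, y in units if y is not None)
    sum_y = sum(y for x, y in units if y is not None)
    ratio = sum_y / sum_x  # ybar_k / xbar_k over the second-phase units

    # Imputed-data estimator: observed y where available, ratio-imputed otherwise.
    y_imputed = sum(y if y is not None else ratio * x for x, y in units) / n_first

    replicates = []
    for j, (xj, yj) in enumerate(units):
        # Delete-one ratio: changes only if unit j has an observed y.
        ratio_j = (sum_y - yj) / (sum_x - xj) if yj is not None else ratio
        total = 0.0
        for i, (xi, yi) in enumerate(units):
            if i == j:
                continue  # assumed reading: the deleted unit is left out of the sum
            if yi is not None:
                total += yi
            else:
                y_star = ratio * xi
                # Adjusted imputed value z_hat_ki(-j).
                total += y_star + (ratio_j * xi - ratio * xi)
        replicates.append(total / (n_first - 1))

    v = (n_first - 1) / n_first * sum((r - y_imputed) ** 2 for r in replicates)
    return y_imputed, replicates, v

# Hypothetical sample: y observed on the first three of five first-phase units.
units = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.3), (4.0, None), (5.0, None)]
y_imputed, replicates, v_adjusted = adjusted_jackknife(units)
```

With this data the ratio is 12.3 / 6 = 2.05 and the first-phase mean of `x` is 3, so `y_imputed` matches the ratio estimate 6.15.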

11 Simulation Study Design: Simple Random Sampling
Population size is
Pop Y is related to Pop X: $Y = 0.8X + \epsilon$
[Figure: plot of Y against X.]
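A setup of this kind could be sketched as below. Only the relation Y = 0.8X + ε comes from the slide. The population size, the distributions of X and ε, the sample sizes, the seed, and all function names are assumptions chosen for illustration.

```python
import random

def make_population(size, rng):
    """Generate (x, y) pairs with y = 0.8 * x + error."""
    population = []
    for _ in range(size):
        x = rng.gauss(100.0, 10.0)   # assumed distribution of X
        eps = rng.gauss(0.0, 5.0)    # assumed error distribution
        population.append((x, 0.8 * x + eps))
    return population

def two_phase_sample(population, n_first, n_second, rng):
    """First-phase SRS of size n_first; y is kept only on an SRS subsample of size n_second."""
    first = rng.sample(population, n_first)
    observed = set(rng.sample(range(n_first), n_second))
    return [(x, y if i in observed else None) for i, (x, y) in enumerate(first)]

rng = random.Random(2004)                 # fixed seed for reproducibility
population = make_population(1000, rng)   # assumed population size
sample_1 = two_phase_sample(population, 60, 30, rng)
sample_2 = two_phase_sample(population, 60, 30, rng)
```

Here the first-phase size is twice the second-phase size; other multiples only require changing the `n_first` argument.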

12 Simulation: Simple Random Sampling
[Figure: One Sample Jackknife Variance for Ratio Estimator. Vertical axis: J. Variance; horizontal axis: Second Phase Sample Size; legend: *SecondPhase, 3*SecondPhase.]

13 Simulation: Simple Random Sampling
[Figure: Two Sample Jackknife Variance for Ratio Estimator. Vertical axis: J. Variance; horizontal axis: Second Phase Sample Size; legend: *SecondPhase, 3*SecondPhase.]

14 Simulation: Simple Random Sampling
[Figure: Comparison of One vs Two Sample Estimators. Vertical axis: St. Dev of Ests; horizontal axis: Second Phase Sample Size; legend: One Sample St. Dev, Two Sample St. Dev.]

15 Adjusted Jackknife Variance Estimator for Two Samples, Two-Phase, Stratified Random Sampling
Suppose the population of $N$ units consists of $L$ strata such that the $h$-th stratum consists of $N_h$ units and $\sum_{h=1}^{L} N_h = N$. Suppose that an auxiliary variable, $x$, closely related to an item $y$, is observed on all sample units $s'_{hk}$ in sample $k = 1, 2$ for stratum $h$. Ratio imputation uses
$$y^*_{hki} = (\bar{y}_{hk}/\bar{x}_{hk})\,x_{hki} \quad \text{for } i \in s'_{hk} - s_{hk},$$
where $\bar{y}_{hk}$ and $\bar{x}_{hk}$ are the means of $y$ and $x$ for the respondents $s_{hk}$ in stratum $h$.

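16 Under ratio imputation,
$$\bar{y}_{kI} = \sum_{h=1}^{L} W_h\,\bar{y}_{hkI} = \sum_{h=1}^{L} W_h\,(\bar{y}_{hk}/\bar{x}_{hk})\,\bar{x}'_{hk},$$
where $\bar{x}'_{hk}$ is the $x$ mean for the full sample $s'_{hk}$ from stratum $h$. Also,
$$y^*_{hki}(-hkj) = \left[\bar{y}_{hk}(-hkj)/\bar{x}_{hk}(-hkj)\right]x_{hki}$$
under ratio imputation when the $hkj$-th respondent is deleted, where
$$\bar{y}_{hk}(-hkj) = \left[n_{hk}\bar{y}_{hk} - y_{hj}\right]/(n_{hk}-1) \quad \text{and} \quad \bar{x}_{hk}(-hkj) = \left[n_{hk}\bar{x}_{hk} - x_{hj}\right]/(n_{hk}-1),$$
for $k = 1, 2$.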

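17 Define
$$\hat{z}_{hki}(-hkj) = y^*_{hki} + \left\{\frac{\bar{y}_{hk}(-hkj)}{\bar{x}_{hk}(-hkj)}\,x_{hki} - \frac{\bar{y}_{hk}}{\bar{x}_{hk}}\,x_{hki}\right\},$$
for sample $k = 1, 2$, stratum $h$. Under this formulation, $\hat{z}_{hki}(-hkj) = y^*_{hki} = (\bar{y}_{hk}/\bar{x}_{hk})\,x_{hki}$ for $hkj \in s'_{hk} - s_{hk}$ in sample $k = 1, 2$, and $\hat{z}_{hki}(-hkj) = (\bar{y}_{hk}(-hkj)/\bar{x}_{hk}(-hkj))\,x_{hki}$ for $hkj \in s_{hk}$ in sample $k = 1, 2$, stratum $h$. Define the adjusted estimator,
$$\bar{y}^a_{hkI}(-hkj) = \frac{1}{n'_{hk}-1}\sum_{i=1}^{n'_{hk}}\hat{z}_{hki}(-hkj),$$
where $n'_{hk}$ is the size of the full sample $s'_{hk}$. Using these values, the jackknife variance estimator is given by
$$v_{Jr}(\bar{y}_{kI}) = \sum_{h=1}^{L}\frac{n'_{hk}-1}{n'_{hk}}\sum_{j=1}^{n'_{hk}}\left[\bar{y}^a_{kI}(-hkj)-\bar{y}_{kI}\right]^2.$$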

18 Adjusted Jackknife Variance Estimator for Two Samples, Two-Phase, Stratified Random Sampling
Noting that
$$\bar{y}^a_{kI}(-hkj) - \bar{y}_{kI} = W_h\left[\bar{y}^a_{hkI}(-hkj) - \bar{y}_{hkI}\right],$$
where $\bar{y}^a_{hkI}(-hkj)$ is the adjusted imputed estimator of the $h$-th stratum mean $\bar{Y}_h$ when the $hkj$-th sample unit is deleted, we get
$$v_{Jr}(\bar{y}_{kI}) = \sum_{h=1}^{L} W_h^2\,v_{Jr}(\bar{y}_{hkI}) = \sum_{h=1}^{L} W_h^2\,\frac{n'_{hk}-1}{n'_{hk}}\sum_{j=1}^{n'_{hk}}\left(\bar{y}^a_{hkI}(-hkj)-\bar{y}_{hkI}\right)^2,$$
for $k = 1, 2$.

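19 The jackknife variance estimator is a weighted average of the two estimators, given by
$$v_{Jrs} = \frac{n'_1\,v_{Jr}(\bar{y}_{1I}) + n'_2\,v_{Jr}(\bar{y}_{2I})}{n'_1+n'_2}$$
$$= \frac{n'_1}{n'_1+n'_2}\sum_{h=1}^{L} W_h^2\,\frac{n'_{h1}-1}{n'_{h1}}\sum_{j=1}^{n'_{h1}}\left(\bar{y}^a_{h1I}(-h1j)-\bar{y}_{h1I}\right)^2 + \frac{n'_2}{n'_1+n'_2}\sum_{h=1}^{L} W_h^2\,\frac{n'_{h2}-1}{n'_{h2}}\sum_{j=1}^{n'_{h2}}\left(\bar{y}^a_{h2I}(-h2j)-\bar{y}_{h2I}\right)^2,$$
where $n'_1 = \sum_{h=1}^{L} n'_{h1}$ and $n'_2 = \sum_{h=1}^{L} n'_{h2}$ are the total first-phase sample sizes.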
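The two-level weighting can be illustrated with a short sketch: stratum-level jackknife variances are first combined with squared stratum weights, and the two per-sample results are then averaged with sample-size weights. The function names and all numbers below are hypothetical, and the per-stratum variances are treated as already computed inputs.

```python
def stratified_variance(weights, stratum_variances):
    """Sum over strata of W_h^2 times the stratum-level jackknife variance."""
    return sum(w ** 2 * v for w, v in zip(weights, stratum_variances))

def two_sample_variance(v1, n1, v2, n2):
    """Sample-size-weighted average of the two per-sample variance estimates."""
    return (n1 * v1 + n2 * v2) / (n1 + n2)

weights = [0.2, 0.5, 0.3]          # hypothetical stratum weights W_h
v_h_sample_1 = [1.0, 2.0, 3.0]     # hypothetical stratum variances, sample 1
v_h_sample_2 = [2.0, 2.0, 2.0]     # hypothetical stratum variances, sample 2

v_1 = stratified_variance(weights, v_h_sample_1)
v_2 = stratified_variance(weights, v_h_sample_2)
v_jrs = two_sample_variance(v_1, 30, v_2, 10)  # hypothetical total sample sizes
```

Because the weights are squared, a stratum holding half the population contributes only a quarter of its own variance to the total.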

20 Simulation Study Design: Stratified Random Sampling
- Population size is $N = 10000$.
- Pop Y is related to Pop X: $Y = 0.8X + \epsilon$.
- Three strata: $X < 90$, $90 \le X \le 110$, $110 < X$.
- Stratum 1 size: $N_1 = 1633$, so that $W_1 = N_1/N = 0.1633$.
- Stratum 2 size: $N_2 = 6805$, so that $W_2 = N_2/N = 0.6805$.
- Stratum 3 size: $N_3 = 1562$, so that $W_3 = N_3/N = 0.1562$.
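The stratum weights follow from the stratum sizes by simple division, which a few lines can confirm. This is a minimal sketch: the stratum sizes and cut points are from the slide, while the function name `stratum_index` and the zero-based indexing are illustrative choices.

```python
def stratum_index(x):
    """Zero-based stratum for a value of X: X < 90, 90 <= X <= 110, X > 110."""
    if x < 90:
        return 0
    if x <= 110:
        return 1
    return 2

stratum_sizes = [1633, 6805, 1562]                  # N_1, N_2, N_3 from the slide
N = sum(stratum_sizes)                              # total population size
weights = [n_h / N for n_h in stratum_sizes]        # W_h = N_h / N
print(N, weights)
```

The boundary values 90 and 110 both fall in the middle stratum, matching the inequalities on the slide.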

21 Simulation Study Design: Stratified Random Sampling
[Figure: Population Y Values against Population X Values, with regions labeled Stratum=1, Stratum=2 and Stratum=3.]

22 Simulation: Stratified Random Sampling
[Figure: One Sample Jackknife Variance, Stratified Sampling. Vertical axis: J. Variance; horizontal axis: Second Phase Sample Size; legend: *SecondPhase, 3*SecondPhase.]

23 Simulation: Stratified Random Sampling
[Figure: Two Sample Jackknife Variance, Stratified Sampling. Vertical axis: J. Variance; horizontal axis: Second Phase Sample Size; legend: *SecondPhase, 3*SecondPhase.]

24 Simulation: Stratified Random Sampling
[Figure: Comparison of One and Two Samples, Stratified RS. Vertical axis: St. Dev of Ests; horizontal axis: Second Phase Sample Size; legend: One Sample St. Dev, Two Sample St. Dev.]

25 Simulation: Stratified Random Sampling
[Figure: Comparison of Complete and Missing Data. Vertical axis: St. Dev of Ests; horizontal axis: Second Phase Sample Size; legend: Complete Data, Missing Data Case. Stratified RS, First SS = 2*SecondPhase.]

26 Calibration Approach to Jackknife Variance Estimation
Three major advantages of the calibration approach in survey sampling:
- Leads to consistent estimates.
- Provides an important class of techniques for the efficient combination of data sources.
- Has a computational advantage for estimates.
Apply the Tracy et al. (2003) calibration in stratified and double sampling to the jackknife variance estimator.

27 Conclusions and Future Study
- The jackknife variance estimator for two samples has a smaller SD (less variation) than the jackknife variance estimator for one sample.
- The adjusted jackknife variance estimator for two samples under stratified RS is more efficient than the adjusted jackknife variance estimator for two samples under SRS.
- As the sample sizes increase, the adjusted jackknife variance estimator for two samples is shown to be consistent.
- Future: apply the jackknife variance estimator for two samples to stratified multistage sampling.
