Jackknife Variance Estimation for Two Samples after Imputation under Two-Phase Sampling

Jong-Min Kim* and Jon E. Anderson (jongmink@mrs.umn.edu)
Statistics Discipline, Division of Science and Mathematics, University of Minnesota at Morris
2004 JSM, August 11

Outline
- Jackknife: background and definition
- Ratio estimator in simple random sampling
  - Jackknife variance estimators for two samples
  - Simulation study
- Ratio estimator in stratified random sampling
  - Jackknife variance estimators for two samples
  - Simulation study
- Conclusions

Jackknife Background
- Introduced by Quenouille (1949, 1956) as a method to reduce bias.
- Popularized by Tukey (1958), who used it for variances and confidence intervals.
- Arvesen (1969) was the first to propose a two-sample jackknife estimator.

Jackknife Definition
Let $\hat\theta$ be an estimate, and let $\hat\theta(j)$ be an estimator of the same form computed with observation $j$ deleted. The jackknife estimate of the variance of $\hat\theta$ is
$$v_J(\hat\theta) = \frac{n-1}{n}\sum_{j=1}^{n}\left[\hat\theta(j) - \hat\theta\right]^2.$$
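The definition translates directly into code. The Python sketch below is a minimal illustration, not the authors' implementation; the helper name `jackknife_variance` and the toy data are hypothetical. For the sample mean, the delete-one jackknife reproduces the familiar $s^2/n$ exactly, which makes a convenient sanity check.

```python
from statistics import mean, variance


def jackknife_variance(data, estimator):
    """Delete-one jackknife: (n - 1)/n * sum_j (theta_hat(j) - theta_hat)^2."""
    n = len(data)
    theta_hat = estimator(data)
    # theta_hat(j): the same estimator recomputed with observation j removed
    replicates = [estimator(data[:j] + data[j + 1:]) for j in range(n)]
    return (n - 1) / n * sum((t - theta_hat) ** 2 for t in replicates)


sample = [4.0, 7.0, 1.0, 9.0, 3.0, 6.0]   # hypothetical toy data
v_j = jackknife_variance(sample, mean)
# For the mean, v_J equals s^2 / n (here 8.4 / 6 = 1.4, up to rounding).
print(v_j, variance(sample) / len(sample))
```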

Naive Jackknife Variance Estimator for Two Samples, Two-Phase, Simple Random Sampling
Two first-phase simple random samples, $s'_1$ of size $n'_1$ and $s'_2$ of size $n'_2$, are taken without replacement from a population of $N$ elements. Simple random subsamples $s_1$ of size $n_1$ and $s_2$ of size $n_2$ are then taken without replacement from $s'_1$ and $s'_2$, respectively. The ratio estimators of $\bar Y$ are
$$\bar y_{r1} = \frac{\bar y_1}{\bar x_1}\,\bar x'_1 = \hat R_1 \bar x'_1, \qquad \bar y_{r2} = \frac{\bar y_2}{\bar x_2}\,\bar x'_2 = \hat R_2 \bar x'_2,$$
where $\bar x'_1$ and $\bar x'_2$ are the $x$ means for the first-phase samples $s'_1$ and $s'_2$, and $(\bar x_1, \bar y_1)$ and $(\bar x_2, \bar y_2)$ are the means for the second-phase samples $s_1$ and $s_2$.
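For one sample, the two-phase ratio estimator $\hat R_k \bar x'_k$ is a one-liner. The sketch below assumes the first-phase $x$ values and the second-phase $(x, y)$ pairs are supplied as plain Python lists; the function name and the numbers are hypothetical.

```python
from statistics import mean


def two_phase_ratio_estimate(x_first, x_second, y_second):
    """Return (ybar / xbar) * xbar' for one two-phase sample."""
    r_hat = mean(y_second) / mean(x_second)   # R_hat from the subsample s
    return r_hat * mean(x_first)              # scaled by the x mean over s'


x_first = [10.0, 12.0, 8.0, 11.0, 9.0, 14.0]  # x observed on s' (n' = 6)
x_second = [10.0, 12.0, 8.0]                  # x on the subsample s (n = 3)
y_second = [8.0, 9.5, 6.5]                    # y on the subsample s
est = two_phase_ratio_estimate(x_first, x_second, y_second)
print(est)  # R_hat = 0.8 and xbar' = 64/6, so about 8.533
```

Applying the same function to $s'_2$ and $s_2$ gives $\bar y_{r2}$.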

Naive Jackknife Variance Estimator for Two Samples, SRS
Define
$$\bar y_{rk}(j) = \frac{\bar y_k(j)}{\bar x_k(j)}\,\bar x'_k(j) \quad \text{for all } j \in s'_k,\ k = 1, 2,$$
where
$$\bar x_k(j) = \begin{cases} (n_k \bar x_k - x_j)/(n_k - 1), & j \in s_k,\\ \bar x_k, & j \in s'_k - s_k,\end{cases} \qquad \bar y_k(j) = \begin{cases} (n_k \bar y_k - y_j)/(n_k - 1), & j \in s_k,\\ \bar y_k, & j \in s'_k - s_k,\end{cases}$$
and $\bar x'_k(j) = (n'_k \bar x'_k - x_j)/(n'_k - 1)$ for all $j \in s'_k$.
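The delete-one rules above can be coded by updating the three means in constant time per deletion. This is a sketch for a single sample $k$, assuming the data arrive as the first-phase $x$ values, a $y$ list with `None` where $y$ is not observed, and boolean flags marking membership in $s_k$; the names and data are hypothetical.

```python
from statistics import mean


def naive_replicates(x_first, y, in_second):
    """Return (ybar_r, [ybar_r(j) for j in s']) for one two-phase sample."""
    n1 = len(x_first)
    xs = [x for x, s in zip(x_first, in_second) if s]
    ys = [v for v, s in zip(y, in_second) if s]
    n = len(xs)
    xbar1, xbar, ybar = mean(x_first), mean(xs), mean(ys)
    reps = []
    for j in range(n1):
        xbar1_j = (n1 * xbar1 - x_first[j]) / (n1 - 1)   # j always leaves s'
        if in_second[j]:                                 # j in s: update xbar, ybar
            xbar_j = (n * xbar - x_first[j]) / (n - 1)
            ybar_j = (n * ybar - y[j]) / (n - 1)
        else:                                            # j in s' - s: keep them
            xbar_j, ybar_j = xbar, ybar
        reps.append(ybar_j / xbar_j * xbar1_j)
    return ybar / xbar * xbar1, reps


x_first = [10.0, 12.0, 8.0, 11.0, 9.0, 14.0]
y = [8.0, 9.5, 6.5, None, None, None]                # y observed on s only
in_second = [True, True, True, False, False, False]
est, reps = naive_replicates(x_first, y, in_second)
print(est, reps)
```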

Naive Jackknife Variance Estimator for Two Samples, SRS
Applying the usual jackknife method to $\bar y_{rk}(j)$ gives
$$v_{Jrk} = \frac{n'_k - 1}{n'_k}\sum_{j \in s'_k}\left[\bar y_{rk}(j) - \bar y_{rk}\right]^2.$$
The two-sample jackknife variance estimator is a weighted average of the two one-sample estimators:
$$v_{Jr} = \frac{n'_1 v_{Jr1} + n'_2 v_{Jr2}}{n'_1 + n'_2} = \frac{n'_1 - 1}{n'_1 + n'_2}\sum_{j \in s'_1}\left[\bar y_{r1}(j) - \bar y_{r1}\right]^2 + \frac{n'_2 - 1}{n'_1 + n'_2}\sum_{l \in s'_2}\left[\bar y_{r2}(l) - \bar y_{r2}\right]^2.$$
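Given the delete-one replicates of each sample, the pooling step is a weighted average. The sketch below (hypothetical helper names, made-up replicate values) computes $v_{Jr}$ both as the weighted average and from the expanded form, and the two agree.

```python
def one_sample_jackknife(reps, est):
    """v_Jrk = (n' - 1)/n' * sum_j (rep_j - est)^2 with n' = len(reps)."""
    n = len(reps)
    return (n - 1) / n * sum((r - est) ** 2 for r in reps)


def two_sample_jackknife(reps1, est1, reps2, est2):
    """Weighted average (n'_1 v_Jr1 + n'_2 v_Jr2) / (n'_1 + n'_2)."""
    n1, n2 = len(reps1), len(reps2)
    v1 = one_sample_jackknife(reps1, est1)
    v2 = one_sample_jackknife(reps2, est2)
    return (n1 * v1 + n2 * v2) / (n1 + n2)


reps1, est1 = [1.0, 2.0, 3.0], 2.0        # hypothetical replicates, sample 1
reps2, est2 = [4.0, 6.0], 5.0             # hypothetical replicates, sample 2
v_weighted = two_sample_jackknife(reps1, est1, reps2, est2)

# Expanded form: ((n'_1 - 1) * SS_1 + (n'_2 - 1) * SS_2) / (n'_1 + n'_2)
ss1 = sum((r - est1) ** 2 for r in reps1)
ss2 = sum((r - est2) ** 2 for r in reps2)
n1, n2 = len(reps1), len(reps2)
v_expanded = ((n1 - 1) * ss1 + (n2 - 1) * ss2) / (n1 + n2)
print(v_weighted, v_expanded)  # both 1.2 up to rounding
```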

Adjusted Jackknife Variance Estimator for Two Samples, Two-Phase, Simple Random Sampling
Let us consider an adjusted jackknife variance estimator within a data-imputation framework. For each sample $k = 1, 2$, we define the estimator of $\bar Y$ as
$$\bar y_{kI} = \frac{1}{n'_k}\sum_{i \in s'_k} \tilde y_i.$$
If the unit belongs to the second-phase sample $s_k$, then $\tilde y_i = y_i$, because $y_i$ is observed. If $y_i$ is not observed because the unit belongs to $s'_k - s_k$, its value is obtained through ratio imputation as $\tilde y_i = (\bar y_k/\bar x_k)\,x_i$.
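A sketch of ratio imputation for one sample, assuming respondents (units of $s_k$) are flagged with a boolean list and unobserved $y$ values are `None`; the names and data are hypothetical. Because $\hat R_k \sum_{i \in s_k} x_i = \sum_{i \in s_k} y_i$, the imputed mean $\bar y_{kI}$ coincides with the ratio estimator $\hat R_k \bar x'_k$, which the last line prints side by side.

```python
from statistics import mean


def ratio_impute(x_first, y, respondent):
    """Return y~: observed y_i for respondents, (ybar/xbar) * x_i otherwise."""
    xs = [x for x, r in zip(x_first, respondent) if r]
    ys = [v for v, r in zip(y, respondent) if r]
    r_hat = mean(ys) / mean(xs)
    return [v if r else r_hat * x for x, v, r in zip(x_first, y, respondent)]


x_first = [10.0, 12.0, 8.0, 11.0, 9.0, 14.0]
y = [8.0, 9.5, 6.5, None, None, None]
respondent = [True, True, True, False, False, False]
y_tilde = ratio_impute(x_first, y, respondent)
y_bar_I = mean(y_tilde)                         # imputed mean ybar_kI
print(y_bar_I, 0.8 * mean(x_first))             # equals R_hat * xbar' (R_hat = 0.8)
```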

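Adjusted Jackknife Variance Estimator for Two Samples, Two-Phase, Simple Random Sampling
Rao and Sitter (1995) proposed the following device for the imputed units $i \in s'_k - s_k$ of sample $k = 1, 2$:
$$\hat z_{ki}(j) = \tilde y_{ki} + \left\{\frac{\bar y_k(j)}{\bar x_k(j)} - \frac{\bar y_k}{\bar x_k}\right\} x_{ki},$$
while the respondents $i \in s_k$ keep their observed values, $\hat z_{ki}(j) = y_{ki}$. Under this formulation, $\hat z_{ki}(j) = \tilde y_{ki} = (\bar y_k/\bar x_k)\,x_{ki}$ for $j \in s'_k - s_k$, and $\hat z_{ki}(j) = \left(\bar y_k(j)/\bar x_k(j)\right) x_{ki}$ for $j \in s_k$.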
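The device can be sketched as follows for one sample $k$, assuming a hypothetical data layout (boolean respondent flags for $s_k$, `None` for unobserved $y$) and that respondents keep their observed $y_i$; the helper name `z_hat` is ours. Deleting an imputed unit leaves every imputed value unchanged, while deleting a respondent rescales the imputed values by the delete-one ratio.

```python
from statistics import mean


def z_hat(x_first, y, respondent, j):
    """Rao-Sitter values z_i(j) for one sample after deleting unit j.

    Respondents keep their observed y_i (assumption of this sketch); each
    imputed unit i gets y~_i + (ybar(j)/xbar(j) - ybar/xbar) * x_i.
    """
    xs = [x for x, r in zip(x_first, respondent) if r]
    ys = [v for v, r in zip(y, respondent) if r]
    n, xbar, ybar = len(xs), mean(xs), mean(ys)
    r_hat = ybar / xbar
    if respondent[j]:       # deleting a respondent changes the ratio
        r_hat_j = ((n * ybar - y[j]) / (n - 1)) / ((n * xbar - x_first[j]) / (n - 1))
    else:                   # deleting a non-respondent leaves ybar, xbar unchanged
        r_hat_j = r_hat
    z = []
    for x, v, r in zip(x_first, y, respondent):
        if r:
            z.append(v)
        else:
            y_tilde = r_hat * x                     # ratio-imputed value
            z.append(y_tilde + (r_hat_j - r_hat) * x)
    return z


x_first = [10.0, 12.0, 8.0, 11.0, 9.0, 14.0]
y = [8.0, 9.5, 6.5, None, None, None]
respondent = [True, True, True, False, False, False]
z_nonresp = z_hat(x_first, y, respondent, j=4)   # delete an imputed unit
z_resp = z_hat(x_first, y, respondent, j=1)      # delete a respondent
print(z_nonresp, z_resp)
```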

Define the adjusted estimator
$$\bar y^{a}_{kI}(j) = \frac{1}{n'_k - 1}\sum_{i \in s'_k,\ i \ne j} \hat z_{ki}(j),$$
and the jackknife variance estimator for sample $k$,
$$v_{Jrk} = \frac{n'_k - 1}{n'_k}\sum_{j \in s'_k}\left[\bar y^{a}_{kI}(j) - \bar y_{kI}\right]^2,$$
where $\bar y_{kI} = \bar y_{rk}$ under ratio imputation. The jackknife variance estimator based on the adjusted imputed estimators $\bar y_{kI}$, $k = 1, 2$, is a weighted average of the two estimators:
$$v^{a}_{Jr} = \frac{n'_1 v_{Jr1} + n'_2 v_{Jr2}}{n'_1 + n'_2} = \frac{n'_1 - 1}{n'_1 + n'_2}\sum_{j \in s'_1}\left[\bar y^{a}_{1I}(j) - \bar y_{1I}\right]^2 + \frac{n'_2 - 1}{n'_1 + n'_2}\sum_{l \in s'_2}\left[\bar y^{a}_{2I}(l) - \bar y_{2I}\right]^2.$$
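Putting the pieces together for one sample gives the adjusted replicates and $v_{Jrk}$; the two-sample $v^{a}_{Jr}$ is then the $n'_k$-weighted average. In this minimal sketch (hypothetical names and data; respondents keep their observed $y_i$), the algebra of the device implies $\bar y^{a}_{kI}(j) = \hat R_k(j)\,\bar x'_k(j)$ with $\hat R_k(j) = \bar y_k(j)/\bar x_k(j)$, which gives a simple numerical check on the replicates.

```python
from statistics import mean


def adjusted_jackknife(x_first, y, respondent):
    """Return (ybar_kI, adjusted replicates ybar^a_kI(j), v_Jrk) for one sample."""
    n1 = len(x_first)
    xs = [x for x, r in zip(x_first, respondent) if r]
    ys = [v for v, r in zip(y, respondent) if r]
    n, xbar, ybar = len(xs), mean(xs), mean(ys)
    r_hat = ybar / xbar
    y_tilde = [v if r else r_hat * x for x, v, r in zip(x_first, y, respondent)]
    est = mean(y_tilde)
    reps = []
    for j in range(n1):
        # ybar(j)/xbar(j): the (n - 1) factors cancel when a respondent is deleted
        r_hat_j = (n * ybar - y[j]) / (n * xbar - x_first[j]) if respondent[j] else r_hat
        z = [yt if r else yt + (r_hat_j - r_hat) * x
             for x, yt, r in zip(x_first, y_tilde, respondent)]
        reps.append((sum(z) - z[j]) / (n1 - 1))   # average of z_i(j) over i != j
    v = (n1 - 1) / n1 * sum((a - est) ** 2 for a in reps)
    return est, reps, v


def adjusted_two_sample(sample1, sample2):
    """v^a_Jr = (n'_1 v_Jr1 + n'_2 v_Jr2) / (n'_1 + n'_2)."""
    _, _, v1 = adjusted_jackknife(*sample1)
    _, _, v2 = adjusted_jackknife(*sample2)
    n1, n2 = len(sample1[0]), len(sample2[0])
    return (n1 * v1 + n2 * v2) / (n1 + n2)


sample1 = ([10.0, 12.0, 8.0, 11.0, 9.0, 14.0],
           [8.0, 9.5, 6.5, None, None, None],
           [True, True, True, False, False, False])
sample2 = ([20.0, 22.0, 18.0, 21.0],
           [15.0, None, 13.5, None],
           [True, False, True, False])
est1, reps1, v1 = adjusted_jackknife(*sample1)
v_a = adjusted_two_sample(sample1, sample2)
print(est1, v1, v_a)
```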

Simulation Study Design: Simple Random Sampling
- Population size is $N = 10000$.
- Population $Y$ is related to population $X$ by $Y = 0.8X + \varepsilon$.

[Figure: scatter plot of population Y against population X.]
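The slides do not give the distributions of $X$ or $\varepsilon$, so the Monte Carlo sketch below assumes $X \sim N(100, 10^2)$ and $\varepsilon \sim N(0, 5^2)$, a first-phase size of twice the second-phase size (one of the settings in the plots), an arbitrary seed, and 300 replicates. It repeatedly draws a two-phase sample, computes the naive delete-one jackknife variance of $\bar y_r$ for one sample, and compares the average jackknife variance with the Monte Carlo variance of $\bar y_r$. It illustrates the design only and does not reproduce the reported results.

```python
import random
from statistics import mean, pvariance

# ASSUMED model and settings: X ~ N(100, 10^2), eps ~ N(0, 5^2), n' = 2n, arbitrary seed.
rng = random.Random(11)
N = 10_000
pop_x = [rng.gauss(100, 10) for _ in range(N)]
pop_y = [0.8 * x + rng.gauss(0, 5) for x in pop_x]   # Y = 0.8 X + eps


def one_run(n_second, first_multiple=2):
    """One two-phase draw: return (ybar_r, naive delete-one jackknife variance)."""
    s_first = rng.sample(range(N), first_multiple * n_second)   # s': SRSWOR from U
    s_second = set(rng.sample(s_first, n_second))               # s: SRSWOR from s'
    n1, n = len(s_first), n_second
    xbar1 = mean(pop_x[i] for i in s_first)
    xbar = mean(pop_x[i] for i in s_second)
    ybar = mean(pop_y[i] for i in s_second)
    est = ybar / xbar * xbar1
    reps = []
    for j in s_first:
        xbar1_j = (n1 * xbar1 - pop_x[j]) / (n1 - 1)
        if j in s_second:   # ratio of delete-one second-phase means ((n - 1) cancels)
            r_j = (n * ybar - pop_y[j]) / (n * xbar - pop_x[j])
        else:
            r_j = ybar / xbar
        reps.append(r_j * xbar1_j)
    return est, (n1 - 1) / n1 * sum((r - est) ** 2 for r in reps)


runs = [one_run(n_second=50) for _ in range(300)]
mc_var = pvariance([est for est, _ in runs])   # Monte Carlo variance of ybar_r
mean_vj = mean(v for _, v in runs)             # average jackknife variance
print(f"Monte Carlo variance {mc_var:.3f}, mean jackknife variance {mean_vj:.3f}")
```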

Simulation: Simple Random Sampling

[Figure: One-sample jackknife variance for the ratio estimator. Jackknife variance against second-phase sample size (200 to 1000), with first-phase sample size equal to 2 and 3 times the second-phase size.]

Simulation: Simple Random Sampling

[Figure: Two-sample jackknife variance for the ratio estimator. Jackknife variance against second-phase sample size (100 to 500), with first-phase sample size equal to 2 and 3 times the second-phase size.]

Simulation: Simple Random Sampling

[Figure: Comparison of one- and two-sample estimators. Standard deviation of the estimates against second-phase sample size (200 to 1000), for the one-sample and two-sample estimators.]

Adjusted Jackknife Variance Estimator for Two Samples, Two-Phase, Stratified Random Sampling
Suppose the population of $N$ units consists of $L$ strata, where the $h$-th stratum contains $N_h$ units and $\sum_{h=1}^{L} N_h = N$. Suppose that an auxiliary variable $x$, closely related to the item $y$, is observed on all units of the sample $s'_{hk}$ in sample $k = 1, 2$ for stratum $h$. Ratio imputation uses
$$\tilde y_{hki} = \frac{\bar y_{hk}}{\bar x_{hk}}\,x_{hki} \quad \text{for } i \in s'_{hk} - s_{hk},$$
where $\bar y_{hk}$ and $\bar x_{hk}$ are the means of $y$ and $x$ for the respondents $s_{hk}$ in stratum $h$.

Under ratio imputation,
$$\bar y_{kI} = \sum_{h=1}^{L} W_h \bar y_{hkI} = \sum_{h=1}^{L} W_h \frac{\bar y_{hk}}{\bar x_{hk}}\,\bar x'_{hk},$$
where $W_h = N_h/N$ and $\bar x'_{hk}$ is the $x$ mean for the full sample $s'_{hk}$ from stratum $h$. Also, under ratio imputation, when the $(hkj)$th respondent is deleted,
$$\tilde y_{hki}(hkj) = \frac{\bar y_{hk}(hkj)}{\bar x_{hk}(hkj)}\,x_{hki},$$
where $\bar y_{hk}(hkj) = (n_{hk}\bar y_{hk} - y_{hkj})/(n_{hk} - 1)$ and $\bar x_{hk}(hkj) = (n_{hk}\bar x_{hk} - x_{hkj})/(n_{hk} - 1)$, for $k = 1, 2$.
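A sketch of the stratified imputed mean for one sample $k$, assuming each stratum is given as (first-phase $x$ values, $y$ values with `None` when unobserved, respondent flags) and the weights $W_h$ are known. It also computes the imputed values after deleting one respondent in a stratum. All names, weights, and numbers are hypothetical.

```python
from statistics import mean


def respondent_means(x_first, y, respondent):
    """Return (n_hk, xbar_hk, ybar_hk) over the respondents s_hk of one stratum."""
    xs = [x for x, r in zip(x_first, respondent) if r]
    ys = [v for v, r in zip(y, respondent) if r]
    return len(xs), mean(xs), mean(ys)


def stratum_imputed_mean(x_first, y, respondent):
    """ybar_hkI = (ybar_hk / xbar_hk) * xbar'_hk under ratio imputation."""
    _, xbar, ybar = respondent_means(x_first, y, respondent)
    return ybar / xbar * mean(x_first)


def stratified_imputed_mean(strata, weights):
    """ybar_kI = sum_h W_h * ybar_hkI for one sample k."""
    return sum(w * stratum_imputed_mean(*s) for w, s in zip(weights, strata))


def imputed_after_deleting(x_first, y, respondent, j):
    """Imputed values [ybar_hk(hkj) / xbar_hk(hkj)] * x_hki when respondent j is deleted."""
    assert respondent[j], "j must be a respondent"
    n, xbar, ybar = respondent_means(x_first, y, respondent)
    ybar_j = (n * ybar - y[j]) / (n - 1)
    xbar_j = (n * xbar - x_first[j]) / (n - 1)
    return [ybar_j / xbar_j * x for x, r in zip(x_first, respondent) if not r]


# Hypothetical data per stratum: (x on s'_hk, y with None if unobserved, respondent flags)
strata = [
    ([10.0, 12.0, 8.0, 11.0], [8.0, 9.0, None, None], [True, True, False, False]),
    ([20.0, 22.0, 18.0], [15.0, None, 13.5], [True, False, True]),
]
weights = [0.3, 0.7]   # hypothetical W_h = N_h / N
y_bar_kI = stratified_imputed_mean(strata, weights)
deleted = imputed_after_deleting(*strata[0], j=0)   # stratum 1, first respondent deleted
print(y_bar_kI, deleted)
```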

Define
$$\hat z_{hki}(hkj) = \tilde y_{hki} + \left\{\frac{\bar y_{hk}(hkj)}{\bar x_{hk}(hkj)} - \frac{\bar y_{hk}}{\bar x_{hk}}\right\} x_{hki}$$
for sample $k = 1, 2$ and stratum $h$. Under this formulation, $\hat z_{hki}(hkj) = \tilde y_{hki} = (\bar y_{hk}/\bar x_{hk})\,x_{hki}$ for $hkj \in s'_{hk} - s_{hk}$, and $\hat z_{hki}(hkj) = \left(\bar y_{hk}(hkj)/\bar x_{hk}(hkj)\right) x_{hki}$ for $hkj \in s_{hk}$. Define the adjusted estimator
$$\bar y^{a}_{hkI}(hkj) = \frac{1}{n'_{hk} - 1}\sum_{i \in s'_{hk},\ i \ne j} \hat z_{hki}(hkj).$$
Using these values, the jackknife variance estimator is
$$v_{Jr}(\bar y_{kI}) = \sum_{h=1}^{L} \frac{n'_{hk} - 1}{n'_{hk}} \sum_{j=1}^{n'_{hk}} \left[\bar y^{a}_{kI}(hkj) - \bar y_{kI}\right]^2.$$

Adjusted Jackknife Variance Estimator for Two Samples, Two-Phase, Stratified Random Sampling
Noting that $\bar y^{a}_{kI}(hkj) - \bar y_{kI} = W_h\left[\bar y^{a}_{hkI}(hkj) - \bar y_{hkI}\right]$, where $\bar y^{a}_{hkI}(hkj)$ is the adjusted imputed estimator of the $h$th stratum mean $\bar Y_h$ when the $(hkj)$th sample unit is deleted, we get
$$v_{Jr}(\bar y_{kI}) = \sum_{h=1}^{L} W_h^2\, v_{Jr}(\bar y_{hkI}) = \sum_{h=1}^{L} W_h^2\,\frac{n'_{hk} - 1}{n'_{hk}} \sum_{j=1}^{n'_{hk}} \left(\bar y^{a}_{hkI}(hkj) - \bar y_{hkI}\right)^2, \quad k = 1, 2.$$
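The decomposition can be checked numerically. This sketch, under a hypothetical data layout (per stratum: first-phase $x$, $y$ with `None` when unobserved, respondent flags; respondents keep their observed $y_i$), computes the adjusted replicates stratum by stratum, then evaluates $v_{Jr}(\bar y_{kI})$ both directly, by recomputing the stratified mean with one stratum replaced by its replicate, and through the $W_h^2$ form. Names, weights, and data are hypothetical.

```python
from statistics import mean


def stratum_adjusted(x_first, y, respondent):
    """Return (ybar_hkI, [ybar^a_hkI(hkj) for every unit j of s'_hk]) via the z-device."""
    n1 = len(x_first)
    xs = [x for x, r in zip(x_first, respondent) if r]
    ys = [v for v, r in zip(y, respondent) if r]
    n, xbar, ybar = len(xs), mean(xs), mean(ys)
    r_hat = ybar / xbar
    y_tilde = [v if r else r_hat * x for x, v, r in zip(x_first, y, respondent)]
    reps = []
    for j in range(n1):
        r_hat_j = (n * ybar - y[j]) / (n * xbar - x_first[j]) if respondent[j] else r_hat
        z = [yt if r else yt + (r_hat_j - r_hat) * x
             for x, yt, r in zip(x_first, y_tilde, respondent)]
        reps.append((sum(z) - z[j]) / (n1 - 1))   # average of z_i(j) over i != j
    return mean(y_tilde), reps


def v_jr_decomposed(strata, weights):
    """sum_h W_h^2 (n'_hk - 1)/n'_hk sum_j (ybar^a_hkI(hkj) - ybar_hkI)^2."""
    total = 0.0
    for w, s in zip(weights, strata):
        est_h, reps_h = stratum_adjusted(*s)
        m = len(reps_h)
        total += w ** 2 * (m - 1) / m * sum((a - est_h) ** 2 for a in reps_h)
    return total


def v_jr_direct(strata, weights):
    """sum_h (n'_hk - 1)/n'_hk sum_j (ybar^a_kI(hkj) - ybar_kI)^2."""
    parts = [stratum_adjusted(*s) for s in strata]
    full = sum(w * est for w, (est, _) in zip(weights, parts))
    total = 0.0
    for h, (_, reps_h) in enumerate(parts):
        m = len(reps_h)
        for a in reps_h:
            # stratified estimate with stratum h's mean replaced by its replicate
            rep = sum(w * (a if g == h else est)
                      for g, (w, (est, _)) in enumerate(zip(weights, parts)))
            total += (m - 1) / m * (rep - full) ** 2
    return total


strata = [
    ([10.0, 12.0, 8.0, 11.0], [8.0, 9.0, None, None], [True, True, False, False]),
    ([20.0, 22.0, 18.0], [15.0, None, 13.5], [True, False, True]),
]
weights = [0.3, 0.7]   # hypothetical W_h
v_dec = v_jr_decomposed(strata, weights)
v_dir = v_jr_direct(strata, weights)
print(v_dec, v_dir)
```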

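The two-sample jackknife variance estimator is a weighted average of the two estimators:
$$v_{Jrs} = \frac{n'_1 v_{Jr}(\bar y_{1I}) + n'_2 v_{Jr}(\bar y_{2I})}{n'_1 + n'_2} = \frac{n'_1}{n'_1 + n'_2}\sum_{h=1}^{L} W_h^2 \frac{n'_{h1} - 1}{n'_{h1}} \sum_{j=1}^{n'_{h1}}\left(\bar y^{a}_{h1I}(h1j) - \bar y_{h1I}\right)^2 + \frac{n'_2}{n'_1 + n'_2}\sum_{h=1}^{L} W_h^2 \frac{n'_{h2} - 1}{n'_{h2}} \sum_{j=1}^{n'_{h2}}\left(\bar y^{a}_{h2I}(h2j) - \bar y_{h2I}\right)^2,$$
where $n'_1 = \sum_{h=1}^{L} n'_{h1}$ and $n'_2 = \sum_{h=1}^{L} n'_{h2}$.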
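Once each sample's stratified jackknife variance is available, the pooling weights are the total first-phase sizes $n'_k = \sum_h n'_{hk}$. A sketch with made-up per-stratum replicates (hypothetical names and values), where each stratum is supplied as a pair (stratum estimate, list of adjusted replicates):

```python
def stratified_v(weights, strata_reps):
    """sum_h W_h^2 (n'_hk - 1)/n'_hk sum_j (rep - est_h)^2; strata_reps = [(est_h, reps_h), ...]."""
    total = 0.0
    for w, (est_h, reps_h) in zip(weights, strata_reps):
        m = len(reps_h)
        total += w ** 2 * (m - 1) / m * sum((a - est_h) ** 2 for a in reps_h)
    return total


def pooled_stratified(weights, sample1, sample2):
    """v_Jrs = (n'_1 v_1 + n'_2 v_2) / (n'_1 + n'_2), with n'_k = sum_h n'_hk."""
    n1 = sum(len(reps) for _, reps in sample1)
    n2 = sum(len(reps) for _, reps in sample2)
    v1 = stratified_v(weights, sample1)
    v2 = stratified_v(weights, sample2)
    return (n1 * v1 + n2 * v2) / (n1 + n2)


weights = [0.5, 0.5]                                    # hypothetical W_h
sample1 = [(2.0, [1.0, 3.0]), (5.0, [4.0, 5.0, 6.0])]   # (est_h, replicates) per stratum
sample2 = [(0.0, [1.0, -1.0]), (1.0, [1.0, 1.0])]
v_jrs = pooled_stratified(weights, sample1, sample2)
print(v_jrs)  # 47/108, about 0.435
```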

Simulation Study Design: Stratified Random Sampling
- Population size is $N = 10000$.
- Population $Y$ is related to population $X$ by $Y = 0.8X + \varepsilon$.
- Three strata: $X < 90$, $90 \le X \le 110$, and $110 < X$.
- Stratum 1 size is $N_1 = 1633$, so $W_1 = N_1/N = 0.1633$.
- Stratum 2 size is $N_2 = 6805$, so $W_2 = N_2/N = 0.6805$.
- Stratum 3 size is $N_3 = 1562$, so $W_3 = N_3/N = 0.1562$.
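The reported weights (0.1633, 0.6805, 0.1562) are close to the probabilities 0.1587, 0.6827, and 0.1587 that a $N(100, 10^2)$ variable falls in the three intervals. The slides do not state the distribution of $X$, so the sketch below assumes that normal model and an arbitrary seed to generate a population, assign strata with the stated cut points, and recompute $N_h$ and $W_h$.

```python
import random
from statistics import NormalDist


def stratum_of(x):
    """Stratum 1: X < 90; stratum 2: 90 <= X <= 110; stratum 3: X > 110."""
    if x < 90:
        return 1
    if x <= 110:
        return 2
    return 3


rng = random.Random(2004)                        # arbitrary seed
N = 10_000
pop_x = [rng.gauss(100, 10) for _ in range(N)]  # ASSUMED: X ~ N(100, 10^2)

sizes = {1: 0, 2: 0, 3: 0}
for x in pop_x:
    sizes[stratum_of(x)] += 1
weights = {h: n_h / N for h, n_h in sizes.items()}   # W_h = N_h / N

# Stratum probabilities implied by the assumed normal model
nd = NormalDist(mu=100, sigma=10)
model = {1: nd.cdf(90), 2: nd.cdf(110) - nd.cdf(90), 3: 1 - nd.cdf(110)}
print(sizes, weights, model)
```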

Simulation Study Design: Stratified Random Sampling

[Figure: Population Y values against population X values, with strata 1, 2, and 3 marked.]

Simulation: Stratified Random Sampling

[Figure: One-sample jackknife variance, stratified sampling. Jackknife variance against second-phase sample size (100 to 500), with first-phase sample size equal to 2 and 3 times the second-phase size.]

Simulation: Stratified Random Sampling

[Figure: Two-sample jackknife variance, stratified sampling. Jackknife variance against second-phase sample size (100 to 500), with first-phase sample size equal to 2 and 3 times the second-phase size.]

Simulation: Stratified Random Sampling

[Figure: Comparison of one- and two-sample estimators, stratified random sampling. Standard deviation of the estimates against second-phase sample size (100 to 500), for the one-sample and two-sample estimators.]

Simulation: Stratified Random Sampling

[Figure: Comparison of complete and missing data, stratified random sampling with first-phase sample size equal to 2 times the second-phase size. Standard deviation of the estimates against second-phase sample size (100 to 500), for the complete-data and missing-data cases.]

Calibration Approach to Jackknife Variance Estimation
Three major advantages of the calibration approach in survey sampling:
- It leads to consistent estimates.
- It provides an important class of techniques for efficiently combining data sources.
- It has a computational advantage in producing estimates.
Apply the calibration of Tracy et al. (2003) in stratified and double sampling to the jackknife variance estimator.

Conclusions and Future Study
- The jackknife variance estimator for two samples has a smaller standard deviation (less variation) than the jackknife variance estimator for one sample.
- Under stratified random sampling, the adjusted jackknife variance estimator for two samples is more efficient than the adjusted jackknife variance estimator for two samples under simple random sampling.
- As the sample sizes increase, the simulations show the adjusted two-sample jackknife variance estimator to be consistent.
- Future: apply the jackknife variance estimator for two samples to stratified multistage sampling.