Chapter 5: Models used in conjunction with sampling
J. Kim, W. Fuller (ISU)
Nonresponse
- Unit nonresponse: weight adjustment
- Item nonresponse: imputation
Two-Phase Setup for Unit Nonresponse
- Phase one ($A$): observe $x_i$.
- Phase two ($A_R$): observe $(x_i, y_i)$.
- $\pi_{1i} = \Pr(i \in A)$: phase-one inclusion probability (known).
- $\pi_{2i|1i} = \Pr(i \in A_R \mid i \in A)$: phase-two inclusion probability (unknown).
We are interested in estimating the population mean of $Y$ using a weighted mean of the observations:
$$\bar{y}_R = \frac{\sum_{i \in A_R} w_i y_i}{\sum_{i \in A_R} w_i},$$
where $w_i = \pi_{1i}^{-1} \hat{\pi}_{2i|1i}^{-1}$.
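As a quick numeric illustration, the weighted mean above is straightforward to compute once estimated response probabilities are available. The following minimal sketch uses made-up values for $\pi_{1i}$, $\hat{\pi}_{2i|1i}$, and $y_i$; none of the numbers come from the text.

```python
# Hypothetical two-phase weighted mean: units enter the sample A with known
# probability pi1 and respond with (estimated) probability pi2_hat.
pi1 = [0.2, 0.2, 0.5, 0.5, 0.5]      # phase-one inclusion probabilities (known)
pi2_hat = [0.8, 0.5, 0.9, 0.6, 0.7]  # estimated response probabilities
y = [10.0, 12.0, 8.0, 11.0, 9.0]     # y is usable only for respondents
responded = [True, False, True, True, False]

# w_i = pi1_i^{-1} * pi2_hat_i^{-1}, and sums run over the respondent set A_R
num = sum(yi / (p1 * p2)
          for p1, p2, yi, r in zip(pi1, pi2_hat, y, responded) if r)
den = sum(1 / (p1 * p2)
          for p1, p2, r in zip(pi1, pi2_hat, responded) if r)
ybar_R = num / den   # the weighted respondent mean
```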
Two-Phase Setup for Unit Nonresponse
Regression weighting approach:
$$\bar{y}_{reg,1} = \bar{x}_N' \hat{\beta} \quad \text{or} \quad \bar{y}_{reg,2} = \bar{x}_1' \hat{\beta},$$
where $\bar{x}_1 = \left( \sum_{i \in A} \pi_{1i}^{-1} \right)^{-1} \sum_{i \in A} \pi_{1i}^{-1} x_i$ and $\hat{\beta} = \left( \sum_{i \in A_R} \pi_{1i}^{-1} x_i x_i' \right)^{-1} \sum_{i \in A_R} \pi_{1i}^{-1} x_i y_i$.
Response model approach: make a parametric model assumption $\pi_{2i|1i} = p(x_i; \phi)$ and use $\hat{\pi}_{2i|1i} = p(x_i; \hat{\phi})$.
Theorem 5.1.1
Theorem. Assume:
(i) $V\left[ N^{-1} \sum_{i \in A} \pi_{1i}^{-1} (x_i, y_i) \mid F \right] = O(n^{-1})$;
(ii) $V\left[ \hat{V}(\bar{Y}_{HT}) \mid F \right] = O(n^{-3})$;
(iii) $K_L < \pi_{2i|1i} < K_U$ and $\pi_{2i|1i}^{-1} = x_i' \alpha$ for some $\alpha$;
(iv) $x_i' \lambda = 1$ for all $i$, for some $\lambda$;
(v) the $R_i$ are independent.
Then
$$\bar{y}_{reg,1} - \bar{y}_N = \frac{1}{N} \sum_{i \in A_R} \pi_{2i}^{-1} e_i + O_p(n^{-1}),$$
where $\pi_{2i} = \pi_{1i} \pi_{2i|1i}$, $e_i = y_i - x_i' \beta_N$, and $\beta_N = \left( \sum_{i \in U} \pi_{2i|1i} x_i x_i' \right)^{-1} \sum_{i \in U} \pi_{2i|1i} x_i y_i$.
Proof of Theorem 5.1.1
Since $\hat{\beta} = \left( \sum_{i \in A_R} \pi_{1i}^{-1} x_i x_i' \right)^{-1} \sum_{i \in A_R} \pi_{1i}^{-1} x_i y_i$, we have $\hat{\beta} - \beta_N = O_p(n^{-1/2})$, where $\beta_N = \left( \sum_{i \in U} \pi_{2i|1i} x_i x_i' \right)^{-1} \sum_{i \in U} \pi_{2i|1i} x_i y_i$. Write
$$\bar{y}_{reg,1} - \bar{y}_N = \bar{x}_N' \hat{\beta} - \bar{x}_N' \beta_N - \frac{1}{N} \sum_{i=1}^N (y_i - x_i' \beta_N).$$
Using $\pi_{2i|1i}^{-1} = x_i' \alpha$,
$$\frac{1}{N} \sum_{i=1}^N (y_i - x_i' \beta_N) = \frac{1}{N} \sum_{i=1}^N (\alpha' x_i) \pi_{2i|1i} (y_i - x_i' \beta_N) = 0$$
by the definition of $\beta_N$. A transformation of the $x_i$ then shows
$$\bar{x}_N' (\hat{\beta} - \beta_N) = \frac{1}{N} \sum_{i \in A_R} \pi_{2i}^{-1} e_i + O_p(n^{-1}).$$
Variance Estimation for $\bar{y}_{reg,1}$
$$\bar{y}_{reg,1} = \sum_{i \in A_R} \bar{x}_N' \left( \sum_{j \in A_R} \pi_{1j}^{-1} x_j x_j' \right)^{-1} \pi_{1i}^{-1} x_i y_i =: \frac{1}{N} \sum_{i \in A_R} \frac{1}{\pi_{1i}} \frac{1}{\hat{\pi}_{2i|1i}} y_i.$$
For small $f = n/N$, let $\hat{b}_j = \hat{\pi}_{2j|1j}^{-1} \hat{e}_j$ with $\hat{e}_j = y_j - x_j' \hat{\beta}$, and use
$$\hat{V} = \frac{1}{N^2} \sum_{i \in A_R} \sum_{j \in A_R} \frac{\pi_{1ij} - \pi_{1i} \pi_{1j}}{\pi_{1ij}} \frac{\hat{b}_i}{\pi_{1i}} \frac{\hat{b}_j}{\pi_{1j}}.$$
Justification
Variance:
$$V\left\{ \sum_{i \in A_R} w_{2i} e_i \,\middle|\, F \right\} = \sum_{i \in U} \sum_{j \in U} (\pi_{2ij} - \pi_{2i} \pi_{2j}) w_{2i} w_{2j} e_i e_j = \sum_{i \ne j;\, i,j \in U} \pi_{2i|1i} \pi_{2j|1j} (\pi_{1ij} - \pi_{1i} \pi_{1j}) w_{2i} w_{2j} e_i e_j + \sum_{i \in U} (\pi_{2i} - \pi_{2i}^2) w_{2i}^2 e_i^2,$$
where
$$\pi_{2ij} = \begin{cases} \pi_{1ij} \pi_{2i|1i} \pi_{2j|1j} & \text{for } i \ne j \\ \pi_{1i} \pi_{2i|1i} & \text{for } i = j. \end{cases}$$
Justification (Cont'd)
Expectation of the variance estimator:
$$E\left\{ \sum_{i \in A_R} \sum_{j \in A_R} \frac{\pi_{1ij} - \pi_{1i} \pi_{1j}}{\pi_{1ij}} w_{2i} e_i w_{2j} e_j \,\middle|\, F \right\} = \sum_{i \ne j;\, i,j \in U} \pi_{2i|1i} \pi_{2j|1j} (\pi_{1ij} - \pi_{1i} \pi_{1j}) w_{2i} e_i w_{2j} e_j + \sum_{i \in U} (\pi_{1i} - \pi_{1i}^2) \pi_{2i|1i} w_{2i}^2 e_i^2$$
$$= \sum_{i \in U} \sum_{j \in U} (\pi_{2ij} - \pi_{2i} \pi_{2j}) w_{2i} e_i w_{2j} e_j + \sum_{i \in U} \pi_{2i} (\pi_{2i} - \pi_{1i}) w_{2i}^2 e_i^2,$$
where $w_{2i} = N^{-1} \pi_{2i}^{-1}$. The second term is the bias of the variance estimator and is of order $O(N^{-1})$.
Variance Estimation for $\bar{y}_{reg,2}$
$$\bar{y}_{reg,2} = \bar{x}_1' \hat{\beta}$$
$$\bar{y}_{reg,2} - \bar{y}_N = (\bar{x}_1 - \bar{x}_N)' \beta_N + \bar{x}_N' (\hat{\beta} - \beta_N) + O_p(n^{-1}) = (\bar{x}_1 - \bar{x}_N)' \beta_N + \frac{1}{N} \sum_{i \in A_R} \pi_{2i}^{-1} (y_i - x_i' \beta_N) + O_p(n^{-1}).$$
Variance estimator:
$$\hat{V}_2 = \frac{1}{N^2} \sum_{i \in A} \sum_{j \in A} \frac{\pi_{1ij} - \pi_{1i} \pi_{1j}}{\pi_{1ij}} \frac{\hat{b}_{i2}}{\pi_{1i}} \frac{\hat{b}_{j2}}{\pi_{1j}},$$
where $\hat{b}_{j2} = (x_j - \bar{x}_1)' \hat{\beta} + (N \bar{x}_1)' \left( \sum_{i \in A_R} \pi_{1i}^{-1} x_i x_i' \right)^{-1} x_j R_j \hat{e}_j$.
Response Model Approach (Propensity Score Approach)
Let
$$R_i = \begin{cases} 1 & \text{if } y_i \text{ is observed} \\ 0 & \text{otherwise.} \end{cases}$$
Assume that the true response mechanism satisfies
$$\Pr(R = 1 \mid x, y) = \Pr(R = 1 \mid x) = p(x; \phi_0) \tag{1}$$
for some $\phi_0$. The first equality is often called missing at random (MAR). Under the response model (1), a consistent estimator of $\phi_0$ can be obtained by solving
$$\hat{U}_h(\phi) \equiv \sum_{i \in A} d_i \left\{ \frac{R_i}{p(x_i; \phi)} - 1 \right\} h(x_i; \phi) = 0, \tag{2}$$
where $d_i = 1/\pi_i$, for some $h(x; \phi)$ such that $\partial \hat{U}_h(\phi) / \partial \phi$ is of full rank.
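To make the estimating equation (2) concrete, the sketch below solves it by Newton iteration under the common special case of a logistic response model $p(x; \phi) = \{1 + \exp(-\phi_0 - \phi_1 x)\}^{-1}$ with $h(x; \phi) = (1, x)'$ and $d_i \equiv 1$. The data are simulated; the model choice and the Newton solver are illustrative assumptions, not prescriptions from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
p_true = 1 / (1 + np.exp(-(0.5 + 1.0 * x)))       # true response probabilities
R = (rng.uniform(size=n) < p_true).astype(float)  # response indicators

H = np.column_stack([np.ones(n), x])              # h(x_i) = (1, x_i)'

def U(phi):
    """U_h(phi) = sum_i d_i {R_i / p(x_i; phi) - 1} h(x_i), with d_i = 1."""
    p = 1 / (1 + np.exp(-(H @ phi)))
    return H.T @ (R / p - 1)

phi = np.zeros(2)
for _ in range(50):                               # Newton iteration on U(phi) = 0
    p = 1 / (1 + np.exp(-(H @ phi)))
    # dU/dphi = -sum_i R_i (1 - p_i) / p_i * h_i h_i' (since dp/dphi = p(1-p) h)
    J = -(H * (R * (1 - p) / p)[:, None]).T @ H
    step = np.linalg.solve(J, U(phi))
    phi = phi - step
    if np.max(np.abs(step)) < 1e-10:
        break
```

With this $h$, the solution reproduces the weighted totals of $(1, x)$ over the respondents, which is the calibration property used later in condition (6).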
Once $\hat{\phi}_h$ is computed from (2), the propensity score adjusted (PSA) estimator of $Y = \sum_{i=1}^N y_i$ is given by
$$\hat{Y}_{PSA} = \sum_{i \in A_R} d_i g(x_i; \hat{\phi}_h) y_i, \tag{3}$$
where $g(x_i; \hat{\phi}_h) = \{p(x_i; \hat{\phi}_h)\}^{-1}$. The PSA estimator $\hat{Y}_{PSA}$ is asymptotically equivalent to
$$\tilde{Y}_{PSA} = \sum_{i \in A_R} d_i g(x_i; \phi_0) y_i + \left\{ \sum_{i \in A} d_i h_i - \sum_{i \in A_R} d_i g(x_i; \phi_0) h_i \right\}' B_z, \tag{4}$$
where
$$B_z = \left( \sum_{i=1}^N p_i z_i h_i' \right)^{-1} \sum_{i=1}^N p_i z_i y_i,$$
$p_i = P(R_i = 1 \mid x_i)$, and $z_i = \partial g(x_i; \phi) / \partial \phi$ evaluated at $\phi = \phi_0$.
Thus, the asymptotic variance is equal to
$$V\left( \tilde{Y}_{PSA} \mid F_N \right) = V\left( \hat{Y}_{HT} \mid F_N \right) + V\left\{ \sum_{i \in A_R} d_i p_i^{-1} \left( y_i - h_i' B_z \right) \,\middle|\, F_N \right\} = V\left( \hat{Y}_{HT} \mid F_N \right) + E\left\{ \sum_{i \in A} d_i^2 (p_i^{-1} - 1) \left( y_i - h_i' B_z \right)^2 \,\middle|\, F_N \right\},$$
where $p_i = p(x_i; \phi_0)$ and the second equality follows from the independence of the $R_i$.
Note that
$$E\left\{ \sum_{i \in A} d_i^2 (p_i^{-1} - 1) \left( y_i - h_i' B_z \right)^2 \right\} = E\left[ \sum_{i \in A} d_i^2 (p_i^{-1} - 1) \left\{ y_i - E(y_i \mid x_i) + E(y_i \mid x_i) - h_i' B_z \right\}^2 \right]$$
$$= E\left[ \sum_{i \in A} d_i^2 (p_i^{-1} - 1) \left\{ y_i - E(y_i \mid x_i) \right\}^2 \right] + E\left[ \sum_{i \in A} d_i^2 (p_i^{-1} - 1) \left\{ E(y_i \mid x_i) - h_i' B_z \right\}^2 \right],$$
and the cross-product term is zero because $y_i - E(y_i \mid x_i)$ is conditionally unbiased for zero, conditional on $x_i$ and $A$.
Thus, we have
$$V\left( \tilde{Y}_{PSA} \mid F_N \right) \ge V\left( \hat{Y}_{HT} \mid F_N \right) + E\left[ \sum_{i \in A} d_i^2 (p_i^{-1} - 1) \left\{ y_i - E(y_i \mid x_i) \right\}^2 \,\middle|\, F_N \right],$$
where the equality holds if $\hat{\phi}_h$ satisfies
$$\sum_{i \in A} d_i \left\{ \frac{R_i}{p(x_i; \phi)} - 1 \right\} E(Y \mid x_i) = 0. \tag{5}$$
Condition (5) provides a way of constructing an optimal PSA estimator. If $E(Y \mid x) = \beta_0 + \beta_1 x$, an optimal PSA estimator of $\theta$ can be obtained by solving
$$\sum_{i \in A} d_i \frac{R_i}{p(x_i; \phi)} (1, x_i) = \sum_{i \in A} d_i (1, x_i). \tag{6}$$
We now discuss variance estimation for PSA estimators of the form (3), where $\hat{p}_i = p_i(\hat{\phi})$ is constructed to satisfy (2). By (4), we can write
$$\hat{Y}_{PSA} = \sum_{i \in A} d_i \eta_i(\phi_0) + o_p(n^{-1/2} N), \tag{7}$$
where
$$\eta_i(\phi) = h_i' B_z + \frac{R_i}{p_i(\phi)} \left( y_i - h_i' B_z \right). \tag{8}$$
To derive the variance estimator, we assume that the variance estimator $\hat{V} = \sum_{i \in A} \sum_{j \in A} \Omega_{ij} q_i q_j$ satisfies $\hat{V} / V(\hat{q}_{HT} \mid F_N) = 1 + o_p(1)$ for some $\Omega_{ij}$ related to the joint inclusion probabilities, where $\hat{q}_{HT} = \sum_{i \in A} d_i q_i$ for any $q$ with a finite fourth moment.
To obtain the total variance, the finite population is divided into two groups, a population of respondents and a population of nonrespondents, so the response indicator is extended to the entire population as $R_N = \{R_1, R_2, \ldots, R_N\}$. Given the population, the sample $A$ is selected according to a probability sampling design, so the sample $A$ contains both respondents and nonrespondents. The total variance of $\hat{\eta}_{HT} = \sum_{i \in A} d_i \eta_i$ can be written as
$$V(\hat{\eta}_{HT} \mid F_N) = E\{V(\hat{\eta}_{HT} \mid F_N, R_N) \mid F_N\} + V\{E(\hat{\eta}_{HT} \mid F_N, R_N) \mid F_N\} = V_1 + V_2. \tag{9}$$
The conditional variance term $V(\hat{\eta}_{HT} \mid F_N, R_N)$ in (9) can be estimated by
$$\hat{V}_1 = \sum_{i \in A} \sum_{j \in A} \Omega_{ij} \hat{\eta}_i \hat{\eta}_j, \tag{10}$$
where $\hat{\eta}_i = \eta_i(\hat{\phi})$ is defined in (8) with $B_z$ replaced by a consistent estimator such as
$$\hat{B}_z = \left( \sum_{i \in A_R} d_i \hat{z}_i h_i' \right)^{-1} \sum_{i \in A_R} d_i \hat{z}_i y_i,$$
and $\hat{z}_i = z(x_i; \hat{\phi})$ is the value of $z_i = \partial g(x_i; \phi) / \partial \phi$ evaluated at $\phi = \hat{\phi}$.
The second term $V_2$ in (9) is
$$V\{E(\hat{\eta}_{HT} \mid F_N, R_N) \mid F_N\} = V\left( \sum_{i=1}^N \eta_i \,\middle|\, F_N \right) = \sum_{i=1}^N \frac{1 - p_i}{p_i} \left( y_i - h_i' B_z \right)^2.$$
A consistent estimator of $V_2$ can be derived as
$$\hat{V}_2 = \sum_{i \in A_R} d_i \frac{1 - \hat{p}_i}{\hat{p}_i^2} \left( y_i - h_i' \hat{B}_z \right)^2. \tag{11}$$
Therefore,
$$\hat{V}(\hat{Y}_{PSA}) = \hat{V}_1 + \hat{V}_2 \tag{12}$$
is consistent for the variance of the PSA estimator defined in (3) with $\hat{p}_i = p_i(\hat{\phi})$ satisfying (2), where $\hat{V}_1$ is given in (10) and $\hat{V}_2$ in (11). Note that the first term of the total variance is $V_1 = O_p(n^{-1} N^2)$, while the second term is $V_2 = O_p(N)$. Thus, when the sampling fraction $n N^{-1}$ is negligible, that is, $n N^{-1} = o(1)$, the second term $V_2$ can be ignored and $\hat{V}_1$ alone is a consistent estimator of the total variance. Otherwise, $V_2$ must be taken into account, and a consistent variance estimator is constructed as in (12).
5.2 Imputation
Meaning: fill in missing values with a plausible value (or a set of plausible values).
Why imputation?
- It provides a complete data file: standard complete-data methods can be applied.
- By filling in missing values, the analyses of different users are made consistent with each other.
- With a proper choice of imputation model, nonresponse bias can be reduced.
- Records with partial information are not deleted, so full use is made of the observed information (i.e., the variance is reduced).
Basic Setup
- $y_i$: study variable, subject to missingness.
- $x_i$: auxiliary variable, always observed.
- $R_i$: response indicator for $y_i$.
Imputed estimator of the total $Y = \sum_{i=1}^N y_i$:
$$\hat{Y}_I = \sum_{i \in A} \frac{1}{\pi_i} \left\{ R_i y_i + (1 - R_i) y_i^* \right\}, \tag{13}$$
where $y_i^*$ is the imputed value of $y_i$. How do we find $y_i^*$?
Lemma 1: If $y_i$ is not observed when $R_i = 0$ and we can find $y_i^*$ satisfying
$$E(y_i^* \mid R_i = 0) = E(y_i \mid R_i = 0), \tag{14}$$
then the imputed estimator $\hat{Y}_I$ in (13) is unbiased for $Y$ in the sense that $E(\hat{Y}_I - Y) = 0$.
How do we get $y_i^*$ satisfying (14)?
- Deterministic imputation: use an estimator of $E(y_i \mid R_i = 0)$.
- Stochastic imputation: generate $y_i^*$ from $f(y_i \mid R_i = 0)$.
Approaches to obtaining the conditional distribution $f(y_i \mid R_i = 0)$:
- Assume Missing Completely At Random (MCAR):
$$f(y_i \mid R_i = 0) = f(y_i \mid R_i = 1). \tag{15}$$
Under MCAR, we can estimate the parameters from the set of respondents. However, MCAR may not be realistic.
- Assume that there exists an auxiliary vector $x_i$ such that
$$f(y_i \mid x_i, R_i = 0) = f(y_i \mid x_i, R_i = 1). \tag{16}$$
Condition (16) is called Missing At Random (MAR). Under MAR, we have
$$E(y_i \mid R_i = 0) = E\left\{ E(y_i \mid x_i, R_i = 0) \mid R_i = 0 \right\} = E\left\{ E(y_i \mid x_i, R_i = 1) \mid R_i = 0 \right\}.$$
Thus, it suffices to generate $y_i^*$ from $f(y_i \mid x_i, R_i = 1)$.
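The MAR argument above can be illustrated with a small simulation: fit $E(y \mid x)$ on the respondents, then generate $y_i^*$ from the estimated respondent-conditional distribution. The normal-linear model, the logistic response probability, and all parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(scale=1.0, size=n)
p_resp = 1 / (1 + np.exp(-x))          # response depends on x only (MAR)
R = rng.uniform(size=n) < p_resp

# Fit the regression of y on x using respondents only
b1, b0 = np.polyfit(x[R], y[R], 1)     # slope first, then intercept
resid = y[R] - (b0 + b1 * x[R])
sigma = resid.std(ddof=2)

# Stochastic imputation: y* = b0 + b1 x + e*, with e* ~ N(0, sigma^2)
y_imp = y.copy()
y_imp[~R] = b0 + b1 * x[~R] + rng.normal(scale=sigma, size=(~R).sum())

ybar_I = y_imp.mean()      # imputed estimator of the mean
ybar_resp = y[R].mean()    # naive respondent mean, biased here because
                           # response probability increases with x
```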
Lemma 2: Let $y_i^*$ be the imputed value of $y_i$. If
$$E(y_i^* \mid x_i, R_i = 1) = E(y_i \mid x_i, R_i = 1) \tag{17}$$
and the MAR condition holds, then the imputed estimator $\hat{Y}_I$ in (13) is unbiased.
When does the MAR condition hold? If the response mechanism satisfies
$$\Pr(R_i = 1 \mid y_i, x_i) = \Pr(R_i = 1 \mid x_i),$$
then (16) holds.
Commonly used imputation methods:
1. Business surveys: ratio, regression, and nearest neighbor imputation.
2. Socio-economic surveys: random donor (within classes), stochastic ratio or regression, fractional imputation, multiple imputation.
A Hot Deck Imputation Procedure
- Partition the sample into $G$ groups: $A = A_1 \cup A_2 \cup \cdots \cup A_G$.
- In group $g$, we have $n_g$ elements, $r_g$ respondents, and $m_g = n_g - r_g$ nonrespondents.
- For each group $A_g$, select $m_g$ imputed values from the $r_g$ respondents with replacement (or without replacement).
- Imputation model: $y_i \sim iid(\mu_g, \sigma_g^2)$, $i \in A_g$ (respondents and missing).
Example 5.2.1: Hot Deck Imputation Under SRS
- $A_g = A_{Rg} \cup A_{Mg}$, with $A_{Rg} = \{i \in A_g;\, R_i = 1\}$ and $A_{Mg} = \{i \in A_g;\, R_i = 0\}$.
- Imputation: $y_j^* = y_i$ with probability $1/r_g$, for $i \in A_{Rg}$ and $j \in A_{Mg}$.
- Imputed estimator of $\bar{y}_N$:
$$\bar{y}_I = n^{-1} \sum_{i \in A} \left\{ R_i y_i + (1 - R_i) y_i^* \right\} =: n^{-1} \sum_{i \in A} y_{Ii}.$$
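The donor-draw step above can be sketched in a few lines for a single imputation group: each missing value is replaced by a respondent value drawn with probability $1/r_g$, with replacement. The respondent values below are illustrative, not the example's data.

```python
import random

random.seed(42)
y_resp = [7.0, 14.0, 3.0, 15.0]   # r_g = 4 respondents in group g
m_g = 2                           # number of missing values in group g

# Hot deck: each y* is a respondent value drawn with probability 1/r_g
y_star = [random.choice(y_resp) for _ in range(m_g)]

n_g = len(y_resp) + m_g
ybar_I = (sum(y_resp) + sum(y_star)) / n_g   # imputed group mean
```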
Variance of the Hot Deck Imputed Mean
$$V(\bar{y}_I) = V\{E_I(\bar{y}_I \mid \mathbf{y}_n)\} + E\{V_I(\bar{y}_I \mid \mathbf{y}_n)\} = V\left( n^{-1} \sum_{g=1}^G n_g \bar{y}_{Rg} \right) + E\left\{ n^{-2} \sum_{g=1}^G m_g (1 - r_g^{-1}) S_{Rg}^2 \right\},$$
where $\bar{y}_{Rg} = r_g^{-1} \sum_{i \in A_{Rg}} y_i$, $S_{Rg}^2 = (r_g - 1)^{-1} \sum_{i \in A_{Rg}} (y_i - \bar{y}_{Rg})^2$, and $\mathbf{y}_n = (y_1, y_2, \ldots, y_n)$.
Variance of the Hot Deck Imputed Mean (2)
Model: $y_i \sim iid(\mu_g, \sigma_g^2)$, $i \in A_g$.
$$V\{\bar{y}_I\} = V\{\bar{y}_n\} + n^{-2} \sum_{g=1}^G n_g m_g r_g^{-1} \sigma_g^2 + n^{-2} \sum_{g=1}^G m_g (1 - r_g^{-1}) \sigma_g^2 = V\{\bar{y}_n\} + n^{-2} \sum_{g=1}^G c_g \sigma_g^2.$$
- Reduced sample size: $n^{-2} \sum_g n_g m_g r_g^{-1} \sigma_g^2$.
- Randomness due to stochastic imputation: $n^{-2} \sum_g m_g (1 - r_g^{-1}) \sigma_g^2$.
Variance Estimation
- Naive approach: treat the imputed values as if they were observed.
- The naive approach underestimates the true variance!
Example: the naive estimator is $\hat{V}_I = n^{-1} S_I^2$, with
$$E\{S_I^2\} = E\left\{ (n-1)^{-1} \sum_{i=1}^n (y_{Ii} - \bar{y}_I)^2 \right\} \doteq \frac{n}{n-1} \left[ E\{(y_{Ii} - \mu)^2\} - V\{\bar{y}_I\} \right] \doteq E(S_{y,n}^2),$$
so $n^{-1} S_I^2$ fails to reflect the imputation variance. Bias-corrected estimator:
$$\hat{V} = \hat{V}_I + n^{-2} \sum_{g=1}^G c_g S_{Rg}^2.$$
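The bias correction can be computed directly from the group summaries, using $c_g = n_g m_g / r_g + m_g (1 - 1/r_g)$ as in the decomposition of $V\{\bar{y}_I\}$ above. All numbers below (group sizes, variances, and the naive variance placeholder) are illustrative.

```python
# Bias-corrected variance: V_hat = V_naive + n^{-2} * sum_g c_g * S2_Rg
groups = [
    # (n_g, r_g, S2_Rg): group size, respondents, respondent sample variance
    (6, 4, 25.9),
    (4, 3, 14.3),
]
n = sum(ng for ng, _, _ in groups)

correction = 0.0
for n_g, r_g, s2 in groups:
    m_g = n_g - r_g
    c_g = n_g * m_g / r_g + m_g * (1 - 1 / r_g)   # c_g from the decomposition
    correction += c_g * s2
correction /= n ** 2

V_naive = 1.9           # placeholder for n^{-1} S_I^2 from the imputed file
V_hat = V_naive + correction
```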
Other Approaches to Variance Estimation
- Multiple imputation: Rubin (1987)
- Adjusted jackknife: Rao and Shao (1992)
- Fractional imputation: Kim and Fuller (2004), Kim (2011)
- Linearization: Shao and Steel (1999), Kim and Rao (2009)
Fractional Imputation
Basic idea:
- Split each record with a missing item into $M$ imputed values.
- Assign fractional weights.
Example 5.2.1: Artificial Bivariate Data from SRS

Table 5.1: Sample with Missing Data

ID   Weight   Cell for x   Cell for y    x    y
 1    0.10        1            1         1    7
 2    0.10        1            1         2    M
 3    0.10        1            2         3    M
 4    0.10        1            1         M   14
 5    0.10        1            2         1    3
 6    0.10        2            1         2   15
 7    0.10        2            2         3    8
 8    0.10        2            1         3    9
 9    0.10        2            2         2    2
10    0.10        2            1         M    M

M: missing
- $x$: categorical variable with three categories (1, 2, 3). Two imputation cells:
  Cell 1: $x = 1$ with observed prob. 0.50, $x = 2$ with observed prob. 0.25, $x = 3$ with observed prob. 0.25.
  Cell 2: $x = 1$ with observed prob. 0.00, $x = 2$ with observed prob. 0.50, $x = 3$ with observed prob. 0.50.
- $y$: continuous variable. Four possible donors for cell one and three possible donors for cell two.
Fractional hot deck imputation:

Table 5.2: Fractionally Imputed Data Set

ID   Donor   w*_ij0   Final Weight   Cell for x   Cell for y    x    y
 1     -       -         0.1000          1            1         1    7
 2     1     0.3333      0.0289          1            1         2    7
       6     0.3333      0.0396          1            1         2   15
       8     0.3333      0.0315          1            1         2    9
 3     5     0.3333      0.0333          1            2         3    3
       7     0.3333      0.0333          1            2         3    8
       9     0.3333      0.0333          1            2         3    2
 4     -     0.5000      0.0500          1            1         1   14
       -     0.2500      0.0250          1            1         2   14
       -     0.2500      0.0250          1            1         3   14
 5     -       -         0.1000          1            2         1    3
Table 5.2: Fractionally Imputed Data Set (Cont'd)

ID   Donor   w*_ij0   Final Weight   Cell for x   Cell for y    x    y
 6     -       -         0.1000          2            1         2   15
 7     -       -         0.1000          2            2         3    8
 8     -       -         0.1000          2            1         3    9
 9     -       -         0.1000          2            2         2    2
10     8     0.2500      0.0225          2            1         2    9
       4     0.2500      0.0275          2            1         2   14
       1     0.2500      0.0209          2            1         3    7
       6     0.2500      0.0291          2            1         3   15
Q: How do we compute the fractional weights?
[Step 1] Compute the cell means of the control variable $z_i$, where $z_i$ consists of $I(x_i = 1)$, $I(x_i = 2)$, $I(x_i = 3)$, and $y_i$.
[Step 2] Apply the calibration weighting method to find $w_{ij}^*$ satisfying
$$\sum_{i \in A_{Rc}} w_i z_i + \sum_{i \in A_{Mc}} w_i \sum_{j=1}^M w_{ij}^* z_{ij}^* = \sum_{i \in A_c} w_i \bar{z}_c$$
and $\sum_{j=1}^M w_{ij}^* = 1$.
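Once the fractional weights are in hand, a point estimate is computed from the expanded file exactly as from complete data: each donor row carries the product of the unit weight and its (calibrated) fractional weight. The sketch below uses the four rows of Table 5.2 belonging to units 1 and 2 to show the bookkeeping; it does not re-derive the calibration step.

```python
# A fragment of a fractionally imputed file: (final_weight, y) rows.
rows = [
    (0.1000, 7.0),    # unit 1, fully observed
    (0.0289, 7.0),    # unit 2, donor row 1 of 3 (y imputed from donor 1)
    (0.0396, 15.0),   # unit 2, donor row 2 of 3 (donor 6)
    (0.0315, 9.0),    # unit 2, donor row 3 of 3 (donor 8)
]

total_weight = sum(w for w, _ in rows)
ybar_FI = sum(w * y for w, y in rows) / total_weight   # weighted estimate
```

Note that the three donor-row weights for unit 2 (0.0289 + 0.0396 + 0.0315) sum back to the unit's original weight 0.10, which is the $\sum_j w_{ij}^* = 1$ constraint in Step 2.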
Nearest Neighbor Imputation
Models for nearest neighbor imputation. Model:
$$Y_i = g(x_i, \beta) + e_i$$
- Semiparametric model: if $g(\cdot)$ were known, one could use model-based imputation such as $Y_i^* = g(x_i, \hat{\beta}) + \hat{e}_i^*$.
- Nonparametric model: the form of $g(\cdot)$ is unknown, except that it is a smooth function of $x_i$.
Models for Nearest Neighbor Imputation
Fay (1999)'s model:
$$E_\zeta(y_i) = E_\zeta(y_{nn1(i)}) = E_\zeta(y_{nn2(i)})$$
$$Var_\zeta(y_i) = Var_\zeta(y_{nn1(i)}) = Var_\zeta(y_{nn2(i)})$$
and the $y$-variables are uncorrelated, where $nn1(i)$ is the index of the nearest neighbor of unit $i$ and $nn2(i)$ is the index of the second nearest neighbor of unit $i$.
Models for Nearest Neighbor Imputation
Alternative representation of Fay (1999)'s model:
$$Y_j \sim \text{indep}(\mu_{g_i}, \sigma_{g_i}^2), \quad j \in A_{g_i},$$
where $A_{g_i} = \{i, nn1(i), nn2(i)\}$ is the index set containing unit $i$ and its two nearest neighbors. Thus, Fay (1999)'s model is a special case of the cell mean model with a much finer cell definition. Kim and Fuller (2004) proposed a variance estimation method for fractional imputation under the cell mean model.
Example: Simple Random Sample

Element   Sample Weight   Auxiliary Variable X   Variable Y
                             (House Rent)         (Income)
   1           0.1                1                   1
   2           0.1                1                   2
   3           0.1                1                   3
   4           0.1                1                   4
   5           0.1                1                   5
   6           0.1                1                   ?
   7           0.1                0                   3
   8           0.1                0                   6
   9           0.1                0                   9
  10           0.1                0                   ?
Example: Jackknife for Fractional Imputation in Kim and Fuller (2004)

Unit   w_i w*_ij   X   Y   w^(1)_i w*_ij(1)      w^(2)_i w*_ij(2)   w^(5)_i w*_ij(5)
 1       0.10      1   1   0                     0.111              0.111
 2       0.10      1   2   0.111                 0                  0.111
 3       0.10      1   3   0.111                 0.111              0.111
 4       0.10      1   4   0.111                 0.111              0.111
 5       0.10      1   5   0.111                 0.111              0
 6       0.05      1   1   0.111 (0.5 - δ_1)     0.055              0.111 (0.5 + δ_5)
         0.05      1   5   0.111 (0.5 + δ_1)     0.055              0.111 (0.5 - δ_5)
 7       0.10      0   3   0.111                 0.111              0.111
 8       0.10      0   6   0.111                 0.111              0.111
 9       0.10      0   9   0.111                 0.111              0.111
10       0.05      0   3   0.055                 0.055              0.055
         0.05      0   6   0.056                 0.056              0.056
Example (Continued)

Unit   w_i w*_ij   X   Y   w^(6)_i w*_ij(6)   w^(7)_i w*_ij(7)      w^(10)_i w*_ij(10)
 1       0.10      1   1   0.111              0.111                 0.111
 2       0.10      1   2   0.111              0.111                 0.111
 3       0.10      1   3   0.111              0.111                 0.111
 4       0.10      1   4   0.111              0.111                 0.111
 5       0.10      1   5   0.111              0.111                 0.111
 6       0.05      1   1   0                  0.055                 0.055
         0.05      1   5   0                  0.056                 0.056
 7       0.10      0   3   0.111              0                     0.111
 8       0.10      0   6   0.111              0.111                 0.111
 9       0.10      0   9   0.111              0.111                 0.111
10       0.05      0   3   0.055              0.111 (0.5 - δ_7)     0
         0.05      0   6   0.056              0.111 (0.5 + δ_7)     0
Example: Continued
If $\theta = E(X)$:
$$\hat{\theta}_I^{(k)} - \hat{\theta}_I = \hat{\theta}_n^{(k)} - \hat{\theta}_n, \quad k = 1, 2, \ldots, 10.$$
If $\theta = E(Y)$:
$$\hat{\theta}_I^{(1)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(1)} - \hat{\theta}_I + \delta_1 (y_5 - y_1)$$
$$\hat{\theta}_I^{(5)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(5)} - \hat{\theta}_I + \delta_5 (y_1 - y_5)$$
$$\hat{\theta}_I^{(7)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(7)} - \hat{\theta}_I + \delta_7 (y_8 - y_7)$$
$$\hat{\theta}_I^{(8)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(8)} - \hat{\theta}_I + \delta_8 (y_7 - y_8)$$
$$\hat{\theta}_I^{(k)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(k)} - \hat{\theta}_I, \quad k = 2, 3, 4, 6, 9, 10.$$
Variance Estimation for Fractional Imputation
The variance estimator is a function of the $\delta_k$:
$$\hat{V}_\delta = \sum_{k=1}^L c_k \left( \sum_{i \in A_R} \alpha_{\delta,i}^{(k)} y_i - \sum_{i \in A_R} \alpha_i y_i \right)^2.$$
- Naive variance estimator ($\delta_k \equiv 0$): underestimation.
- Increasing $\delta_k$ increases the value of the variance estimator.
How do we choose $\delta_k$?
$$E\left( \hat{V}_\delta \right) - Var\left( \hat{\theta}_I \right) = E\left[ \sum_{g=1}^G \sum_{i \in A_{Rg}} \left\{ \sum_{k=1}^L c_k \left( \alpha_{\delta,i}^{(k)} - \alpha_i \right)^2 - \alpha_i^2 \right\} \sigma_g^2 \right].$$
Variance Estimation for Fractional Imputation
Kim and Fuller (2004) showed that if
$$\sum_{j \in A_R} w_{ij}^{*(k)} = 1 \tag{C.1}$$
and
$$\sum_{i \in A_{Rg}} \sum_{k=1}^L c_k \left( \alpha_i^{(k)} - \alpha_i \right)^2 = \sum_{i \in A_{Rg}} \alpha_i^2, \tag{C.2}$$
then the replication variance estimator defined by
$$\hat{V}_I = \sum_{k=1}^L c_k \left( \hat{\theta}_I^{(k)} - \hat{\theta}_I \right)^2, \quad \text{where } \hat{\theta}_I^{(k)} = \sum_{i \in A_R} \alpha_i^{(k)} y_i,$$
is unbiased for the total variance under the cell mean model.
Variance Estimation for Fractional Imputation
- Condition (C.1) is needed for the replication weights to be usable for other, completely responding variables.
- Condition (C.2) is used to estimate the imputation variance unbiasedly.
- The $\delta_k$ is determined by solving a quadratic equation in $\delta_k$:
$$\sum_{i \in A_{Rg}} \left\{ c_k \left( \alpha_{\delta,i}^{(k)} - \alpha_i \right)^2 - c_k \left( \alpha_{0,i}^{(k)} - \alpha_i \right)^2 \right\} = \alpha_k^2 - \sum_{s=1}^L c_s \left( \alpha_{0,k}^{(s)} - \alpha_k \right)^2,$$
where $\alpha_{0,i}^{(k)} = \sum_{j \in A} w_i^{(k)} w_{ij}^* d_{ij}$ is the $k$-th replicate of the total weight of donor $i$ under the naive variance estimator, and $\delta_k \ge 0$.
Example: Continued
Calculation of $\delta_1$: $\delta_1 = 0.303$ solves
$$0.9 \{0.044 - 0.111\delta_1 - 0.14\}^2 + 0.9 \{0.178 + 0.111\delta_1 - 0.16\}^2 - 0.9 \{0.044 - 0.14\}^2 - 0.9 \{0.178 - 0.16\}^2$$
$$= 0.14^2 - 0.9 \{0.044 - 0.14\}^2 - 0.9 \{0.155 - 0.14\}^2 - 8 \times 0.9 \{0.111 - 0.14\}^2.$$
Similarly, we can calculate $\delta_5 = 0.429$, $\delta_7 = 0.377$, and $\delta_8 = 0.377$.
Conclusion
- Kim, Fuller, and Bell (2011) used the method to compute variance estimates for the 2000 US Census long form income data.
- Nonparametric method: theoretically challenging but practically very attractive (and popular).
Other references:
1. Chen and Shao (2001)
2. Beaumont and Bocci (2009)
3. Kim and Fuller (2004), Fuller and Kim (2005), Kim (2011)
Small Area Estimation
Basic setup:
- The original sample $A$ is decomposed into $G$ domains such that $A = A_1 \cup \cdots \cup A_G$ and $n = n_1 + \cdots + n_G$.
- $n$ is large, but $n_g$ can be very small.
Direct estimator of $Y_g = \sum_{i \in U_g} y_i$:
$$\hat{Y}_{d,g} = \sum_{i \in A_g} \frac{1}{\pi_i} y_i$$
- Unbiased.
- May have high variance.
If some auxiliary information is available, we can do better. Synthetic estimator of $Y_g$:
$$\hat{Y}_{s,g} = X_g' \hat{\beta},$$
where $X_g = \sum_{i \in U_g} x_i$ is the known total of $x_i$ in $U_g$ and $\hat{\beta}$ is an estimated regression coefficient.
- Low variance (if $x_i$ does not contain the domain indicator).
- Could be biased (unless $\sum_{i \in U_g} (y_i - x_i' \beta) = 0$).
Composite estimation: consider
$$\hat{Y}_{c,g} = \alpha_g \hat{Y}_{d,g} + (1 - \alpha_g) \hat{Y}_{s,g}$$
for some $\alpha_g \in (0, 1)$. We are interested in finding the $\alpha_g^*$ that minimizes the MSE of $\hat{Y}_{c,g}$. The optimal choice is
$$\alpha_g^* = \frac{MSE(\hat{Y}_{s,g})}{MSE(\hat{Y}_{d,g}) + MSE(\hat{Y}_{s,g})}.$$
- For the direct estimation part, $MSE(\hat{Y}_{d,g}) = V(\hat{Y}_{d,g})$ can be estimated.
- For the synthetic estimation part, $MSE(\hat{Y}_{s,g}) = E\{(\hat{Y}_{s,g} - Y_g)^2\}$ cannot be computed directly without assuming some error model.
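The composite estimator is a two-line computation once MSE estimates for the two components are in hand; the hard part, as noted above, is obtaining $MSE(\hat{Y}_{s,g})$. All input values below are illustrative placeholders.

```python
# Composite small area estimate with the optimal weight
# alpha* = MSE(synthetic) / (MSE(direct) + MSE(synthetic)).
Y_direct = 120.0
Y_synth = 100.0
mse_direct = 400.0   # = V(direct), estimable from the sampling design
mse_synth = 100.0    # requires an assumed error model in practice

alpha = mse_synth / (mse_direct + mse_synth)
Y_comp = alpha * Y_direct + (1 - alpha) * Y_synth
```

Here the noisy direct estimate gets weight 0.2, so the composite leans on the low-variance synthetic estimate.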
Area Level Estimation
Basic setup:
- Parameter of interest: $\bar{Y}_g = N_g^{-1} \sum_{i \in U_g} y_i$.
- Model:
$$\bar{Y}_g = X_g' \beta + u_g,$$
where $u_g \sim (0, \sigma_u^2)$. Also, we have
$$\hat{\bar{Y}}_{d,g} \sim (\bar{Y}_g, V_g),$$
with $V_g = V(\hat{\bar{Y}}_{d,g})$.
The two models can be written as
$$\hat{\bar{Y}}_{d,g} = \bar{Y}_g + e_g$$
$$X_g' \beta = \bar{Y}_g - u_g,$$
where $e_g$ and $u_g$ are independent error terms with zero means and variances $V_g$ and $\sigma_u^2$, respectively. Thus, the best linear unbiased predictor (BLUP) can be written as
$$\hat{\bar{Y}}_g = \alpha_g^* \hat{\bar{Y}}_{d,g} + (1 - \alpha_g^*) X_g' \beta, \tag{18}$$
where $\alpha_g^* = \sigma_u^2 / (V_g + \sigma_u^2)$.
MSE: If β, V g, and σu 2 are known, then ( ) ( ) MSE ˆȲ g = V ˆȲ g Ȳ g { ( ) = V αg ˆȲ d,g Ȳ g + ( 1 αg ) ( X ) } g β Ȳ g = ( αg ) 2 Vg + ( 1 αg ) 2 σ 2 u = αg V g = ( 1 αg ) σ 2 u. Note that, since 0 < αg < 1, ( ) MSE ˆȲ g < V g and ( ) MSE ˆȲ g < σu. 2 J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 57 / 70
When $\beta$ is unknown (and $V_g$ and $\sigma_u^2$ are known):
$$\hat{\beta} = \left( \sum_{g=1}^G w_g X_g X_g' \right)^{-1} \sum_{g=1}^G w_g X_g \hat{\bar{Y}}_{d,g},$$
where $w_g = (\sigma_u^2 + V_g)^{-1}$. The EBLUP is
$$\hat{\bar{Y}}_g(\hat{\beta}) = \alpha_g^* \hat{\bar{Y}}_{d,g} + (1 - \alpha_g^*) X_g' \hat{\beta}, \tag{19}$$
which takes the form of the composite estimator.
The MSE is
$$MSE\left\{ \hat{\bar{Y}}_g(\hat{\beta}) \right\} = V\left\{ \hat{\bar{Y}}_g(\hat{\beta}) - \bar{Y}_g \right\} = V\left\{ \alpha_g^* \left( \hat{\bar{Y}}_{d,g} - \bar{Y}_g \right) + (1 - \alpha_g^*) \left( X_g' \hat{\beta} - \bar{Y}_g \right) \right\}$$
$$= (\alpha_g^*)^2 V_g + (1 - \alpha_g^*)^2 \left\{ \sigma_u^2 + X_g' V(\hat{\beta}) X_g \right\} = \alpha_g^* V_g + (1 - \alpha_g^*)^2 X_g' V(\hat{\beta}) X_g,$$
where
$$V(\hat{\beta}) = \left( \sum_{g=1}^G w_g X_g X_g' \right)^{-1}.$$
If $\beta$ and $\sigma_u^2$ are unknown:
1. Find consistent estimators of $\beta$ and $\sigma_u^2$.
2. Use
$$\hat{\bar{Y}}_g(\hat{\alpha}_g, \hat{\beta}) = \hat{\alpha}_g \hat{\bar{Y}}_{d,g} + (1 - \hat{\alpha}_g) X_g' \hat{\beta}, \tag{20}$$
where $\hat{\alpha}_g = \hat{\sigma}_u^2 / (\hat{V}_g + \hat{\sigma}_u^2)$.
Estimation of $\sigma_u^2$ by the method of moments:
$$\hat{\sigma}_u^2 = \frac{\sum_{g=1}^G k_g \left\{ \left( \hat{\bar{Y}}_{d,g} - X_g' \hat{\beta} \right)^2 - \hat{V}_{d,g} \right\}}{G - p},$$
where $k_g \propto \left\{ \hat{\sigma}_u^2 + \hat{V}_g \right\}^{-1}$ and $\sum_{g=1}^G k_g = 1$. If $\hat{\sigma}_u^2$ is negative, it is set to zero.
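The two-step recipe above can be sketched end to end on simulated area-level data. The loop below alternates between weighted least squares for $\beta$ and a moment-type update for $\sigma_u^2$ truncated at zero; the particular divisor and the fixed-point iteration are simplifying assumptions for illustration, not the only (or the text's exact) implementation.

```python
import numpy as np

rng = np.random.default_rng(3)
G = 40
X = np.column_stack([np.ones(G), rng.normal(size=G)])  # area-level covariates
beta = np.array([5.0, 2.0])
sigma_u2 = 4.0
V = rng.uniform(1.0, 3.0, size=G)                      # known sampling variances
Ybar = (X @ beta
        + rng.normal(scale=np.sqrt(sigma_u2), size=G)  # area effects u_g
        + rng.normal(scale=np.sqrt(V), size=G))        # sampling errors e_g

sig2 = 1.0                                             # starting value
for _ in range(20):
    w = 1 / (sig2 + V)                                 # w_g = (sig2 + V_g)^{-1}
    WX = X * w[:, None]
    beta_hat = np.linalg.solve(WX.T @ X, WX.T @ Ybar)  # weighted LS for beta
    resid2 = (Ybar - X @ beta_hat) ** 2
    k = w / w.sum()                                    # normalized weights k_g
    # moment-type update, truncated at zero (divisor choice is a sketch)
    sig2 = max(0.0, (G / (G - X.shape[1])) * np.sum(k * (resid2 - V)))

alpha = sig2 / (sig2 + V)                              # alpha_g, one per area
Y_eblup = alpha * Ybar + (1 - alpha) * (X @ beta_hat)  # EBLUP (20)
```

Since each $\hat{\alpha}_g \in [0, 1)$, every EBLUP lands between the direct estimate and the synthetic value $X_g' \hat{\beta}$.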
MSE { MSE ˆȲ g (ˆα g, ˆβ) } { } = V ˆȲ g (ˆα g, ˆβ) Ȳ g { ( ) = V ˆα g ˆȲ d,g Ȳ g + ( 1 ˆα g ) ( X ˆβ )} g Ȳ g = ( αg ) 2 Vg + ( 1 αg ) } 2 {σu 2 + X g V ( ˆβ) X g +V (ˆα g ) { V g + σu 2 } = αg V g + ( 1 αg )2 X g V ( ˆβ) X g +V (ˆα g ) { V g + σu 2 } MSE estimation (Prasad and Rao, 1990): { MSE ˆ ˆȲ g (ˆα g, ˆβ) } = ˆα g ˆV g + ( 1 ˆα g )2 X g ˆV ( ˆβ) Xg { } +2 ˆV (ˆα g ) ˆV g + ˆσ u 2. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 61 / 70
Extensions
Unit level estimation: Battese, Harter, and Fuller (1988). Use a unit level model
$$y_{gi} = x_{gi}' \beta + u_g + e_{gi}$$
and
$$\hat{Y}_g = \sum_{i \in U_g} \left\{ x_{gi}' \hat{\beta} + \hat{u}_g \right\},$$
where
$$\hat{u}_g = \hat{E}(u_g \mid \hat{\bar{X}}_g, \hat{\bar{Y}}_g) = \frac{\hat{\sigma}_u^2}{\hat{\sigma}_u^2 + \hat{V}_g} \left( \hat{\bar{Y}}_g - \hat{\bar{X}}_g' \hat{\beta} \right).$$
It can be shown that
$$\hat{\bar{Y}}_g = \hat{\alpha}_g \bar{Y}_{reg,g} + (1 - \hat{\alpha}_g) \bar{Y}_{s,g},$$
where $\bar{Y}_{reg,g} = \hat{\bar{Y}}_{d,g} + (\bar{X}_g - \hat{\bar{X}}_{d,g})' \hat{\beta}$ and $\bar{Y}_{s,g} = \bar{X}_g' \hat{\beta}$.
Benchmarked small area estimation: Wang, Fuller, and Qu (2009).
- The sum of the small area estimates is not necessarily equal to
$$\hat{Y} = \sum_{i \in A} \frac{1}{\pi_i} y_i.$$
- It is desirable that the benchmarking condition hold:
$$\sum_{g=1}^G N_g \hat{\bar{Y}}_g = \hat{Y}.$$
- Idea: since $\hat{\bar{Y}}_g = X_g' \hat{\beta} + \alpha_g^* \left( \hat{\bar{Y}}_{d,g} - X_g' \hat{\beta} \right)$, we can adjust $\sigma_u^2$ so that
$$\sum_{g=1}^G N_g \alpha_g^* \left( \hat{\bar{Y}}_{d,g} - X_g' \hat{\beta} \right) = 0.$$
For other applications, see Small Area Estimation by Rao (2003).
Measurement Error
Measurement error: errors due to inaccurate measurement (e.g., interviewer effects; ambiguous questions; inaccurate memory; items impossible to measure directly).
Two aspects:
1. Bias: very hard to measure; a validation subsample is needed.
2. Variance: repeated independent determinations of the same item.
Measurement Error Models
- $X_i$: observed value of item $x$ for unit $i$.
- $x_i$: true value of item $x$ for unit $i$.
Measurement error Model 1:
$$X_i = x_i + u_i,$$
where $u_i \mid x_i \sim iid(0, \sigma_u^2)$.
Measurement error Model 2:
$$X_i = \gamma_0 + \gamma_1 x_i + u_i,$$
where $u_i \mid x_i \sim iid(0, \sigma_u^2)$. Model 2 is a possible model when the $x_i$ are observed in a validation sample.
Simple Estimators
Horvitz-Thompson estimator: $\hat{T}_X = \sum_{i \in A} w_i X_i$. Parameter: $T_x = \sum_{i=1}^N x_i$.
1. The H-T estimator is unbiased under Model 1:
$$\hat{T}_X - T_x = \left( \hat{T}_x - T_x \right) + \left( \hat{T}_X - \hat{T}_x \right).$$
The first term has zero mean over the sampling mechanism and the second term has zero mean under Model 1.
2. Variance:
$$V\left\{ \hat{T}_X - T_x \mid F_x \right\} = V\left\{ \hat{T}_x - T_x \mid F_x \right\} + E\left\{ \sum_{i \in A} w_i^2 \sigma_u^2 \right\} = V\left\{ \hat{T}_x - T_x \mid F_x \right\} + \sum_{i=1}^N w_i \sigma_u^2.$$
Naive Variance Estimator
$$\hat{V}\left( \hat{T}_X \right) = \sum_{i \in A} \sum_{j \in A} \pi_{ij}^{-1} (\pi_{ij} - \pi_i \pi_j) w_i w_j X_i X_j$$
$$E\left\{ \hat{V}(\hat{T}_X) \right\} = E\left\{ \hat{V}(\hat{T}_x \mid F_x) \right\} + E\left\{ \hat{V}(\hat{T}_u) \right\} = V\left( \hat{T}_x - T_x \mid F_x \right) + \sum_{i=1}^N (1 - \pi_i) w_i \sigma_u^2.$$
The bias of $\hat{V}(\hat{T}_X)$ as an estimator of $V\{\hat{T}_X - T_x \mid F_x\}$ is $-N \sigma_u^2$, which is negligible if $n/N = o(1)$.
Remark: The bias is negligible under the assumption that the measurement errors are independent. If this assumption does not hold, the variance estimator is no longer asymptotically unbiased and a different variance estimator may be needed.
Complex Estimators Observe (X i, y i ), where X i subject to measurement error Regression Model (under model 1) y i = β 0 + β 1 x i + e i X i = x i + u i where e i x i ( 0, σ 2 e), ui x i ( 0, σ 2 u), and ei and u i are independent. Naive approach: OLS estimator associated with y i = β 0 + β 1 X i + a i where a i = e i β 1 u i. The OLS estimator is biased because E (a i X i ) 0. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 68 / 70
Regression Coefficient
Under the assumptions on $e_i$ and $u_i$,
$$E\left\{ n^{-1} \sum_{i=1}^n \left( X_i - \bar{X} \right) y_i \,\middle|\, F_x \right\} = n^{-1} \sum_{i=1}^n (x_i - \bar{x})(\beta_0 + \beta_1 x_i) + o(1)$$
and
$$E\left\{ n^{-1} \sum_{i=1}^n \left( X_i - \bar{X} \right)^2 \,\middle|\, F_x \right\} = n^{-1} \sum_{i=1}^n (x_i - \bar{x})^2 + \sigma_u^2 + o(1),$$
so we have
$$E\left\{ \hat{\beta}_{1,OLS} \mid F_x \right\} \doteq \beta_1 \frac{\sigma_{xx}}{\sigma_{xx} + \sigma_u^2} = \beta_1 \kappa_{xx},$$
where $\sigma_{xx} = E\left\{ n^{-1} \sum_{i=1}^n (x_i - \bar{x})^2 \right\}$. Thus, the effect of measurement error is to bias the slope estimate in the direction of zero. Bias of this nature is commonly referred to as attenuation.
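Attenuation is easy to see in a simulation: regressing $y$ on the contaminated $X$ shrinks the slope by the factor $\kappa_{xx} = \sigma_{xx} / (\sigma_{xx} + \sigma_u^2)$. All parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
beta0, beta1 = 1.0, 2.0
sigma_xx, sigma_u2 = 4.0, 1.0

x = rng.normal(scale=np.sqrt(sigma_xx), size=n)       # true covariate
y = beta0 + beta1 * x + rng.normal(size=n)            # outcome
X = x + rng.normal(scale=np.sqrt(sigma_u2), size=n)   # measured with error

slope_naive = np.cov(X, y)[0, 1] / np.var(X, ddof=1)  # OLS slope of y on X
kappa = sigma_xx / (sigma_xx + sigma_u2)              # attenuation factor 0.8
```

With these values the naive slope concentrates near $\beta_1 \kappa_{xx} = 1.6$ rather than the true slope 2.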
If $\kappa_{xx}$ is known, a simple bias-correction method is to use
$$\hat{\beta}_1 = \kappa_{xx}^{-1} \hat{\beta}_{1,OLS}.$$
If $\sigma_u^2$ is known, then we use
$$\hat{\beta}_1 = \left\{ S_X^2 - \sigma_u^2 \right\}^{-1} S_{XY}.$$
If the ratio $\sigma_e^2 / \sigma_u^2$ is known, a bias-corrected estimator can be obtained by minimizing the quantity
$$Q(x_1, \ldots, x_n, \beta_0, \beta_1) = \sum_{i=1}^n \left\{ \frac{(y_i - \beta_0 - \beta_1 x_i)^2}{\sigma_e^2} + \frac{(X_i - x_i)^2}{\sigma_u^2} \right\}.$$
Here the $x_i$ are treated as parameters. This is essentially the least squares method with measurement errors. The solution can be obtained by minimizing
$$Q(\beta_0, \beta_1) = \sum_{i=1}^n \frac{(y_i - \beta_0 - \beta_1 X_i)^2}{\sigma_e^2 + \beta_1^2 \sigma_u^2}.$$
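The first two corrections above can be checked in the same simulation setting: divide the naive slope by a known $\kappa_{xx}$, or subtract a known $\sigma_u^2$ from the denominator. The data-generating values are again illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000
beta1, sigma_u2 = 2.0, 1.0
x = rng.normal(scale=2.0, size=n)                     # so sigma_xx = 4
y = 1.0 + beta1 * x + rng.normal(size=n)
X = x + rng.normal(scale=np.sqrt(sigma_u2), size=n)

S_XX = np.var(X, ddof=1)
S_XY = np.cov(X, y)[0, 1]

# Correction 1: divide the naive slope by a known kappa_xx
kappa = 4.0 / (4.0 + sigma_u2)
beta1_kappa = (S_XY / S_XX) / kappa

# Correction 2: beta1_hat = {S_X^2 - sigma_u^2}^{-1} S_XY with known sigma_u^2
beta1_su = S_XY / (S_XX - sigma_u2)
```

Both corrected estimates recover the true slope (here 2.0) up to sampling noise.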