Chapter 5: Models used in conjunction with sampling
J. Kim, W. Fuller (ISU)

2 Nonresponse
- Unit nonresponse: weight adjustment
- Item nonresponse: imputation

3 Two-Phase Setup for Unit Nonresponse
- Phase one ($A$): observe $x_i$.
- Phase two ($A_R$): observe $(x_i, y_i)$.
- $\pi_{1i} = \Pr[i \in A]$: phase-one inclusion probability (known).
- $\pi_{2i|1i} = \Pr[i \in A_R \mid i \in A]$: phase-two inclusion probability (unknown).
We are interested in estimating the population mean of $Y$ using the weighted mean of the observations:
$$\bar{y}_R = \frac{\sum_{i \in A_R} w_i y_i}{\sum_{i \in A_R} w_i}, \qquad w_i = \pi_{1i}^{-1} \hat{\pi}_{2i|1i}^{-1}.$$
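The weighted respondent mean above can be sketched numerically. This is a minimal illustration, not the authors' code: the function name `weighted_respondent_mean` and the toy values of $y_i$, $\pi_{1i}$ and $\hat{\pi}_{2i|1i}$ are assumptions made for the example.

```python
import numpy as np

def weighted_respondent_mean(y, pi1, pi2_hat):
    """Weighted mean over respondents with w_i = 1 / (pi_1i * pi2hat_i)."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / (np.asarray(pi1, dtype=float) * np.asarray(pi2_hat, dtype=float))
    return float(np.sum(w * y) / np.sum(w))

# Hypothetical respondent data: equal phase-one probabilities,
# lower estimated response probability for the third unit.
y_resp = [10.0, 12.0, 20.0]
pi1 = [0.1, 0.1, 0.1]
pi2_hat = [0.8, 0.8, 0.4]
ybar_R = weighted_respondent_mean(y_resp, pi1, pi2_hat)
```

The third unit receives twice the weight of the others because its estimated response probability is half as large.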

4 Two-Phase Setup for Unit Nonresponse
Regression weighting approach:
$$\bar{y}_{reg,1} = \bar{x}_N' \hat{\beta} \quad \text{or} \quad \bar{y}_{reg,2} = \bar{x}_1' \hat{\beta},$$
where
$$\bar{x}_1 = \Big(\sum_{i \in A} \pi_{1i}^{-1}\Big)^{-1} \Big(\sum_{i \in A} \pi_{1i}^{-1} x_i\Big), \qquad \hat{\beta} = \Big(\sum_{i \in A_R} \pi_{1i}^{-1} x_i x_i'\Big)^{-1} \Big(\sum_{i \in A_R} \pi_{1i}^{-1} x_i y_i\Big).$$
Response model approach: make a parametric model assumption $\pi_{2i|1i} = p(x_i; \phi)$ and use $\hat{\pi}_{2i|1i} = p(x_i; \hat{\phi})$.

5 Theorem
Assume that
(i) $V[N^{-1} \sum_{i \in A} \pi_{1i}^{-1} (x_i, y_i) \mid F] = O(n^{-1})$,
(ii) $V[\hat{V}(\bar{Y}_{HT}) \mid F] = O(n^{-3})$,
(iii) $K_L < \pi_{2i|1i} < K_U$ and $\pi_{2i|1i}^{-1} = x_i' \alpha$ for some $\alpha$,
(iv) $x_i' \lambda = 1$ for all $i$, for some $\lambda$,
(v) the $R_i$ are independent.
Then
$$\bar{y}_{reg,1} - \bar{y}_N = \frac{1}{N} \sum_{i \in A_R} \pi_{2i}^{-1} e_i + O_p(n^{-1}),$$
where $\pi_{2i} = \pi_{1i} \pi_{2i|1i}$, $e_i = y_i - x_i' \beta_N$, and
$$\beta_N = \Big(\sum_{i \in U} \pi_{2i|1i} x_i x_i'\Big)^{-1} \sum_{i \in U} \pi_{2i|1i} x_i y_i.$$

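6 Proof of Theorem
Since $\hat{\beta} = (\sum_{i \in A_R} \pi_{1i}^{-1} x_i x_i')^{-1} \sum_{i \in A_R} \pi_{1i}^{-1} x_i y_i$, we have $\hat{\beta} - \beta_N = O_p(n^{-1/2})$, where $\beta_N = (\sum_{i \in U} \pi_{2i|1i} x_i x_i')^{-1} \sum_{i \in U} \pi_{2i|1i} x_i y_i$.
Also,
$$\bar{y}_{reg,1} - \bar{y}_N = \bar{x}_N' \hat{\beta} - \bar{x}_N' \beta_N,$$
because
$$\sum_{i=1}^{N} (y_i - x_i' \beta_N) = \sum_{i=1}^{N} \pi_{2i|1i}^{-1} \pi_{2i|1i} (y_i - x_i' \beta_N) = \sum_{i=1}^{N} (\alpha' x_i) \pi_{2i|1i} (y_i - x_i' \beta_N) = 0.$$
Use $\pi_{2i|1i}^{-1} = x_i' \alpha$ and transform $x_i$ to show
$$\bar{x}_N' (\hat{\beta} - \beta_N) = N^{-1} \sum_{i \in A_R} \pi_{2i}^{-1} e_i + O_p(n^{-1}).$$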

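7 Variance Estimation for $\bar{y}_{reg,1}$
$$\bar{y}_{reg,1} = \sum_{i \in A_R} \bar{x}_N' \Big(\sum_{j \in A_R} \pi_{1j}^{-1} x_j x_j'\Big)^{-1} \pi_{1i}^{-1} x_i y_i =: \frac{1}{N} \sum_{i \in A_R} \frac{1}{\pi_{1i}} \frac{1}{\hat{\pi}_{2i|1i}} y_i$$
For small $f = n/N$, let $\hat{b}_j = \hat{\pi}_{2j|1j}^{-1} \hat{e}_j$ with $\hat{e}_j = y_j - x_j' \hat{\beta}$. Then
$$\hat{V} = \frac{1}{N^2} \sum_{i \in A_R} \sum_{j \in A_R} \frac{\pi_{1ij} - \pi_{1i} \pi_{1j}}{\pi_{1ij}} \frac{\hat{b}_i}{\pi_{1i}} \frac{\hat{b}_j}{\pi_{1j}}.$$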

8 Justification
Variance:
$$V\Big\{\sum_{i \in A_R} w_{2i} e_i \mid F\Big\} = \sum_{i \in U} \sum_{j \in U} (\pi_{2ij} - \pi_{2i} \pi_{2j}) w_{2i} w_{2j} e_i e_j$$
$$= \sum_{i \neq j;\, i,j \in U} \pi_{2i|1i} \pi_{2j|1j} (\pi_{1ij} - \pi_{1i} \pi_{1j}) w_{2i} w_{2j} e_i e_j + \sum_{i \in U} (\pi_{2i} - \pi_{2i}^2) w_{2i}^2 e_i^2,$$
where
$$\pi_{2ij} = \begin{cases} \pi_{1ij} \pi_{2i|1i} \pi_{2j|1j} & \text{for } i \neq j \\ \pi_{1i} \pi_{2i|1i} & \text{for } i = j. \end{cases}$$

9 Justification (Cont'd)
Expectation of the variance estimator:
$$E\Big\{\sum_{i \in A_R} \sum_{j \in A_R} \pi_{1ij}^{-1} (\pi_{1ij} - \pi_{1i} \pi_{1j}) w_{2i} e_i w_{2j} e_j \mid F\Big\}$$
$$= \sum_{i \in U} (\pi_{1i} - \pi_{1i}^2) \pi_{2i|1i} w_{2i}^2 e_i^2 + \sum_{i \neq j;\, i,j \in U} \pi_{2i|1i} \pi_{2j|1j} (\pi_{1ij} - \pi_{1i} \pi_{1j}) w_{2i} e_i w_{2j} e_j$$
$$= \sum_{i \in U} \sum_{j \in U} (\pi_{2ij} - \pi_{2i} \pi_{2j}) w_{2i} e_i w_{2j} e_j + \sum_{i \in U} \pi_{2i} (\pi_{2i} - \pi_{1i}) w_{2i}^2 e_i^2,$$
where $w_{2i} = N^{-1} \pi_{2i}^{-1}$. The second term is the bias of the variance estimator, and it is of order $O(N^{-1})$.

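10 Variance Estimation for $\bar{y}_{reg,2}$
$$\bar{y}_{reg,2} = \bar{x}_1' \hat{\beta}$$
$$\bar{y}_{reg,2} - \bar{y}_N = (\bar{x}_1 - \bar{x}_N)' \beta_N + \bar{x}_N' (\hat{\beta} - \beta_N) + O_p(n^{-1}) = (\bar{x}_1 - \bar{x}_N)' \beta_N + N^{-1} \sum_{i \in A_R} \pi_{2i}^{-1} (y_i - x_i' \beta_N) + O_p(n^{-1}).$$
Variance estimator:
$$\hat{V}_2 = \frac{1}{N^2} \sum_{i \in A} \sum_{j \in A} \frac{\pi_{1ij} - \pi_{1i} \pi_{1j}}{\pi_{1ij}} \frac{\hat{b}_{i2}}{\pi_{1i}} \frac{\hat{b}_{j2}}{\pi_{1j}},$$
where
$$\hat{b}_{j2} = (x_j - \bar{x}_1)' \hat{\beta} + (N \bar{x}_1)' \Big(\sum_{i \in A_R} \pi_{1i}^{-1} x_i x_i'\Big)^{-1} R_j x_j \hat{e}_j.$$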

11 Response model approach (Propensity score approach)
Let
$$R_i = \begin{cases} 1 & \text{if } y_i \text{ is observed} \\ 0 & \text{otherwise.} \end{cases}$$
Assume that the true response mechanism satisfies
$$\Pr(R = 1 \mid x, y) = \Pr(R = 1 \mid x) = p(x; \phi_0) \qquad (1)$$
for some $\phi_0$. The first equality is often called missing at random (MAR).
Under the response model (1), a consistent estimator of $\phi_0$ can be obtained by solving
$$\hat{U}_h(\phi) \equiv \sum_{i \in A} d_i \Big\{\frac{R_i}{p(x_i; \phi)} - 1\Big\} h(x_i; \phi) = 0, \qquad (2)$$
where $d_i = 1/\pi_i$, for some $h(x; \phi)$ such that $\partial \hat{U}_h(\phi) / \partial \phi$ is of full rank.

12 Once $\hat{\phi}_h$ is computed from (2), the propensity score adjusted (PSA) estimator of $Y = \sum_{i=1}^{N} y_i$ is given by
$$\hat{Y}_{PSA} = \sum_{i \in A_R} d_i\, g(x_i; \hat{\phi}_h)\, y_i, \qquad (3)$$
where $g(x_i; \hat{\phi}_h) = \{p(x_i; \hat{\phi}_h)\}^{-1}$.
The PSA estimator $\hat{Y}_{PSA}$ is asymptotically equivalent to
$$\tilde{Y}_{PSA} = \sum_{i \in A_R} d_i\, g(x_i; \phi_0)\, y_i + \Big(\sum_{i \in A} d_i h_i - \sum_{i \in A_R} d_i\, g(x_i; \phi_0)\, h_i\Big)' B_z, \qquad (4)$$
where
$$B_z = \Big(\sum_{i=1}^{N} p_i z_i h_i'\Big)^{-1} \sum_{i=1}^{N} p_i z_i y_i,$$
$p_i = P(R_i = 1 \mid x_i)$, and $z_i = \partial g(x_i; \phi) / \partial \phi$ evaluated at $\phi = \phi_0$.

13 Thus, the asymptotic variance is equal to
$$V(\tilde{Y}_{PSA} \mid F_N) = V(\hat{Y}_{HT} \mid F_N) + V\Big\{\sum_{i \in A_R} d_i p_i^{-1} (y_i - h_i' B_z) \mid F_N\Big\}$$
$$= V(\hat{Y}_{HT} \mid F_N) + E\Big\{\sum_{i \in A} d_i^2 (p_i^{-1} - 1) (y_i - h_i' B_z)^2 \mid F_N\Big\},$$
where $p_i = p(x_i; \phi_0)$ and the second equality follows from independence among the $R_i$'s.

14 Note that
$$E\Big\{\sum_{i \in A} d_i^2 (p_i^{-1} - 1) (y_i - h_i' B_z)^2\Big\} = E\Big[\sum_{i \in A} d_i^2 (p_i^{-1} - 1) \{y_i - E(y_i \mid x_i) + E(y_i \mid x_i) - h_i' B_z\}^2\Big]$$
$$= E\Big[\sum_{i \in A} d_i^2 (p_i^{-1} - 1) \{y_i - E(y_i \mid x_i)\}^2\Big] + E\Big[\sum_{i \in A} d_i^2 (p_i^{-1} - 1) \{E(y_i \mid x_i) - h_i' B_z\}^2\Big],$$
and the cross-product term is zero because $y_i - E(y_i \mid x_i)$ is conditionally unbiased for zero, conditional on $x_i$ and $A$.

15 Thus, we have
$$V(\tilde{Y}_{PSA} \mid F_N) \geq V(\hat{Y}_{HT} \mid F_N) + E\Big[\sum_{i \in A} d_i^2 (p_i^{-1} - 1) \{y_i - E(y_i \mid x_i)\}^2 \mid F_N\Big],$$
where the equality holds if $\hat{\phi}_h$ satisfies
$$\sum_{i \in A} d_i \Big\{\frac{R_i}{p(x_i; \phi)} - 1\Big\} E(Y \mid x_i) = 0. \qquad (5)$$
Condition (5) provides a way of constructing an optimal PSA estimator. If $E(Y \mid x) = \beta_0 + \beta_1 x$, an optimal PSA estimator of $\theta$ can be obtained by solving
$$\sum_{i \in A} d_i \frac{R_i}{p(x_i; \phi)} (1, x_i) = \sum_{i \in A} d_i (1, x_i). \qquad (6)$$
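Equation (6) can be solved numerically once a form for $p(x; \phi)$ is chosen. The sketch below assumes a logistic response model $p(x; \phi) = \{1 + \exp(-\phi_0 - \phi_1 x)\}^{-1}$, which the slides do not specify, and uses Newton's method with step halving. The function name `solve_calibration`, the simulated data, and the constant design weight are all illustrative assumptions.

```python
import numpy as np

def solve_calibration(x, r, d, tol=1e-8, max_iter=100):
    """Solve sum_A d_i {R_i / p_i(phi) - 1} (1, x_i) = 0 for a logistic p_i."""
    X = np.column_stack([np.ones_like(x), x])

    def objective(phi):
        # Convex function whose gradient is minus the estimating function.
        eta = X @ phi
        return np.sum(d * r * np.exp(-eta)) + np.sum(d * (1 - r) * eta)

    phi = np.zeros(X.shape[1])
    for _ in range(max_iter):
        e = np.exp(-(X @ phi))              # e_i = 1/p_i - 1 under the logistic model
        grad = X.T @ (d * (1 - r) - d * r * e)
        if np.max(np.abs(grad)) < tol * np.sum(d):
            break
        hess = (X * (d * r * e)[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while objective(phi - t * step) > objective(phi) and t > 1e-8:
            t *= 0.5                        # step halving guards against overshooting
        phi = phi - t * step
    return phi

# Simulated sample (assumed setup): equal weights d_i = 10, MAR response.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
d = np.full(n, 10.0)
p_true = 1 / (1 + np.exp(-(1.0 + 0.5 * x)))
r = (rng.uniform(size=n) < p_true).astype(float)
y = 2.0 + 3.0 * x + rng.normal(size=n)

phi_hat = solve_calibration(x, r, d)
p_hat = 1 / (1 + np.exp(-(phi_hat[0] + phi_hat[1] * x)))
y_psa = np.sum(d * r * y / p_hat)           # PSA estimator (3) of the total
```

At the solution, the propensity-adjusted respondent totals of $(1, x_i)$ match the full-sample totals, which is exactly the calibration property that (6) imposes.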

16 We now discuss variance estimation of PSA estimators of the form (3), where $\hat{p}_i = p_i(\hat{\phi})$ is constructed to satisfy (2). By (4), we can write
$$\hat{Y}_{PSA} = \sum_{i \in A} d_i \eta_i(\phi_0) + o_p(n^{-1/2} N), \qquad (7)$$
where
$$\eta_i(\phi) = h_i' B_z + \frac{R_i}{p_i(\phi)} (y_i - h_i' B_z). \qquad (8)$$
To derive the variance estimator, we assume that the variance estimator $\hat{V} = \sum_{i \in A} \sum_{j \in A} \Omega_{ij} q_i q_j$ satisfies $\hat{V} / V(\hat{q}_{HT} \mid F_N) = 1 + o_p(1)$ for some $\Omega_{ij}$ related to the joint inclusion probability, where $\hat{q}_{HT} = \sum_{i \in A} d_i q_i$ for any $q$ with a finite fourth moment.

17 To obtain the total variance, the finite population is divided into two groups, a population of respondents and a population of nonrespondents, so the response indicator is extended to the entire population as $R_N = \{R_1, R_2, \ldots, R_N\}$. Given the population, the sample $A$ is selected according to a probability sampling design. Then, we have both respondents and nonrespondents in the sample $A$. The total variance of $\hat{\eta}_{HT} = \sum_{i \in A} d_i \eta_i$ can be written as
$$V(\hat{\eta}_{HT} \mid F_N) = E\{V(\hat{\eta}_{HT} \mid F_N, R_N) \mid F_N\} + V\{E(\hat{\eta}_{HT} \mid F_N, R_N) \mid F_N\} = V_1 + V_2. \qquad (9)$$

18 The conditional variance term $V(\hat{\eta}_{HT} \mid F_N, R_N)$ in (9) can be estimated by
$$\hat{V}_1 = \sum_{i \in A} \sum_{j \in A} \Omega_{ij} \hat{\eta}_i \hat{\eta}_j, \qquad (10)$$
where $\hat{\eta}_i = \eta_i(\hat{\phi})$ is defined in (8) with $B_z$ replaced by a consistent estimator such as
$$\hat{B}_z = \Big(\sum_{i \in A_R} d_i \hat{z}_i h_i'\Big)^{-1} \sum_{i \in A_R} d_i \hat{z}_i y_i,$$
and $\hat{z}_i = z(x_i; \hat{\phi})$ is the value of $z_i = \partial g(x_i; \phi) / \partial \phi$ evaluated at $\phi = \hat{\phi}$.

19 The second term $V_2$ in (9) is
$$V\{E(\hat{\eta}_{HT} \mid F_N, R_N) \mid F_N\} = V\Big(\sum_{i=1}^{N} \eta_i \mid F_N\Big) = \sum_{i=1}^{N} \frac{1 - p_i}{p_i} (y_i - h_i' B_z)^2.$$
A consistent estimator of $V_2$ can be derived as
$$\hat{V}_2 = \sum_{i \in A_R} d_i \frac{1 - \hat{p}_i}{\hat{p}_i^2} (y_i - h_i' \hat{B}_z)^2. \qquad (11)$$

20 Therefore,
$$\hat{V}(\hat{Y}_{PSA}) = \hat{V}_1 + \hat{V}_2 \qquad (12)$$
is consistent for the variance of the PSA estimator defined in (3) with $\hat{p}_i = p_i(\hat{\phi})$ satisfying (2), where $\hat{V}_1$ is in (10) and $\hat{V}_2$ is in (11).
Note that the first term of the total variance is $V_1 = O_p(n^{-1} N^2)$, but the second term is $V_2 = O_p(N)$. Thus, when the sampling fraction $n N^{-1}$ is negligible, that is, $n N^{-1} = o(1)$, the second term $V_2$ can be ignored and $\hat{V}_1$ is a consistent estimator of the total variance. Otherwise, the second term $V_2$ should be taken into consideration, so that a consistent variance estimator can be constructed as in (12).

21 5.2 Imputation
Meaning: fill in missing values with a plausible value (or with a set of plausible values).
Why imputation?
- It provides a complete data file, so the standard complete-data methods can be applied.
- With the missing values filled in, the analyses from different users can be consistent.
- With a proper choice of imputation model, we may reduce the nonresponse bias.
- We do not want to delete records with partial information: imputation makes full use of the information (i.e., reduces the variance).

22 Basic setup
- $y_i$: study variable, subject to missingness.
- $x_i$: auxiliary variable, always observed.
- $R_i$: response indicator function for $y_i$.
Imputed estimator of the total $Y = \sum_{i=1}^{N} y_i$:
$$\hat{Y}_I = \sum_{i \in A} \frac{1}{\pi_i} \{R_i y_i + (1 - R_i) y_i^*\}, \qquad (13)$$
where $y_i^*$ is the imputed value of $y_i$.
How to find $y_i^*$?

23 Lemma 1: If $y_i$ is not observed at $R_i = 0$ and if we can find $y_i^*$ that satisfies
$$E(y_i^* \mid R_i = 0) = E(y_i \mid R_i = 0), \qquad (14)$$
then the imputed estimator $\hat{Y}_I$ in (13) is unbiased for $Y$ in the sense that $E(\hat{Y}_I - Y) = 0$.
How to get $y_i^*$ satisfying (14)?
- Deterministic imputation: use an estimator of $E(y_i \mid R_i = 0)$.
- Stochastic imputation: generate $y_i^*$ from $f(y_i \mid R_i = 0)$.

24 Approaches to computing the conditional distribution $f(y_i \mid R_i = 0)$:
- Assume Missing Completely At Random (MCAR):
$$f(y_i \mid R_i = 0) = f(y_i \mid R_i = 1). \qquad (15)$$
  Under MCAR, we can estimate the parameter using the set of respondents. However, MCAR may not be realistic.
- Assume that there exists an auxiliary vector $x_i$ such that
$$f(y_i \mid x_i, R_i = 0) = f(y_i \mid x_i, R_i = 1). \qquad (16)$$
  Condition (16) is called Missing At Random (MAR). Under MAR, we have
$$E(y_i \mid R_i = 0) = E\{E(y_i \mid x_i, R_i = 0) \mid R_i = 0\} = E\{E(y_i \mid x_i, R_i = 1) \mid R_i = 0\}.$$
  Thus, we have only to generate $y_i^*$ from $f(y_i \mid x_i, R_i = 1)$.

25 Lemma 2: Let $y_i^*$ be the imputed value of $y_i$. If
$$E(y_i^* \mid x_i, R_i = 1) = E(y_i \mid x_i, R_i = 1) \qquad (17)$$
and the MAR condition holds, then the imputed estimator $\hat{Y}_I$ in (13) is unbiased.
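As an illustration of deterministic imputation under MAR, the sketch below fits a regression of $y$ on $x$ among respondents and imputes the fitted mean for nonrespondents, then evaluates the imputed estimator (13). It assumes a linear form for $E(y_i \mid x_i, R_i = 1)$; the function names and the toy data are hypothetical and not taken from the slides.

```python
import numpy as np

def regression_impute(x, y, r):
    """Impute missing y by the respondent-based linear fit of y on (1, x)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r = np.asarray(r, dtype=bool)
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[r], y[r], rcond=None)  # fit on respondents only
    y_imp = y.copy()
    y_imp[~r] = X[~r] @ beta                            # estimated E(y | x, R = 1)
    return y_imp, beta

def imputed_total(y_imp, pi):
    """Imputed estimator (13): sum over the sample of y_Ii / pi_i."""
    return float(np.sum(np.asarray(y_imp) / np.asarray(pi)))

# Toy sample in which respondents satisfy y = 1 + 2x exactly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([3.0, 5.0, np.nan, 9.0, np.nan, 13.0])
r = ~np.isnan(y)
pi = np.full(6, 0.5)

y_imp, beta = regression_impute(x, y, r)
total = imputed_total(y_imp, pi)
```

A stochastic version would add a residual drawn from the respondents' residuals to each imputed value, in line with generating $y_i^*$ from $f(y_i \mid x_i, R_i = 1)$.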

26 When does the MAR condition hold? If the response mechanism satisfies
$$\Pr(R_i = 1 \mid y_i, x_i) = \Pr(R_i = 1 \mid x_i),$$
then (16) holds.
Commonly used imputation methods:
1. Business surveys: ratio, regression, and nearest neighbor imputation.
2. Socio-economic surveys: random donor (within classes), stochastic ratio or regression, fractional imputation, and multiple imputation.

27 A Hot Deck Imputation Procedure
- Partition the sample into $G$ groups: $A = A_1 \cup A_2 \cup \cdots \cup A_G$.
- In group $g$, we have $n_g$ elements, $r_g$ respondents, and $m_g = n_g - r_g$ nonrespondents.
- For each group $A_g$, select $m_g$ imputed values from the $r_g$ respondents with replacement (or without replacement).
- Imputation model: $y_i \sim iid(\mu_g, \sigma_g^2)$, $i \in A_g$ (respondents and missing).

28 Example 5.2.1: Hot Deck Imputation Under SRS
- $A_g = A_{Rg} \cup A_{Mg}$ with $A_{Rg} = \{i \in A_g; R_i = 1\}$ and $A_{Mg} = \{i \in A_g; R_i = 0\}$.
- Imputation: $y_j^* = y_i$ with probability $1/r_g$ for $i \in A_{Rg}$ and $j \in A_{Mg}$.
- Imputed estimator of $\bar{y}_N$:
$$\bar{y}_I = n^{-1} \sum_{i \in A} \{R_i y_i + (1 - R_i) y_i^*\} =: n^{-1} \sum_{i \in A} y_{Ii}$$
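The within-cell hot deck procedure can be sketched as follows, assuming missing values are coded as NaN and donors are drawn with replacement with equal probability $1/r_g$. The function name `hot_deck_impute` and the toy data are illustrative assumptions.

```python
import numpy as np

def hot_deck_impute(y, r, cell, rng):
    """Replace each missing y by a random respondent value from the same cell."""
    y_imp = np.array(y, dtype=float)
    r = np.asarray(r, dtype=bool)
    cell = np.asarray(cell)
    for g in np.unique(cell):
        missing = np.flatnonzero((cell == g) & ~r)
        if missing.size == 0:
            continue
        donors = y_imp[(cell == g) & r]
        if donors.size == 0:
            raise ValueError(f"cell {g} has no respondents to donate values")
        # Each donor is selected with probability 1/r_g, with replacement.
        y_imp[missing] = rng.choice(donors, size=missing.size, replace=True)
    return y_imp

rng = np.random.default_rng(1)
y = np.array([1.0, 2.0, np.nan, 10.0, np.nan, 12.0, np.nan])
cell = np.array([1, 1, 1, 2, 2, 2, 2])
r = ~np.isnan(y)

y_imp = hot_deck_impute(y, r, cell, rng)
ybar_I = y_imp.mean()   # imputed estimator of the mean under SRS
```

Because donors come only from the recipient's own cell, every imputed value is an actually observed value, which is the main practical appeal of hot deck methods.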

29 Variance of Hot Deck Imputed Mean
$$V(\bar{y}_I) = V\{E_I(\bar{y}_I \mid y_n)\} + E\{V_I(\bar{y}_I \mid y_n)\}$$
$$= V\Big\{n^{-1} \sum_{g=1}^{G} n_g \bar{y}_{Rg}\Big\} + E\Big\{n^{-2} \sum_{g=1}^{G} m_g (1 - r_g^{-1}) S_{Rg}^2\Big\},$$
where $\bar{y}_{Rg} = r_g^{-1} \sum_{i \in A_{Rg}} y_i$, $S_{Rg}^2 = (r_g - 1)^{-1} \sum_{i \in A_{Rg}} (y_i - \bar{y}_{Rg})^2$, and $y_n = (y_1, y_2, \ldots, y_n)$.

30 Variance of Hot Deck Imputed Sample (2)
Model: $y_i \mid i \in A_g \sim iid(\mu_g, \sigma_g^2)$
$$V\{\bar{y}_I\} = V\{\bar{y}_n\} + n^{-2} \sum_{g=1}^{G} n_g m_g r_g^{-1} \sigma_g^2 + n^{-2} \sum_{g=1}^{G} m_g (1 - r_g^{-1}) \sigma_g^2 = V\{\bar{y}_n\} + n^{-2} \sum_{g=1}^{G} c_g \sigma_g^2,$$
so that $c_g = n_g m_g r_g^{-1} + m_g (1 - r_g^{-1})$.
- Reduced sample size: $n^{-2} n_g^2 (r_g^{-1} - n_g^{-1}) \sigma_g^2$
- Randomness due to stochastic imputation: $n^{-2} m_g (1 - r_g^{-1}) \sigma_g^2$

31 Variance Estimation
Naive approach: treat imputed values as if they were observed.
The naive approach underestimates the true variance!
Example:
Naive: $\hat{V}_I = n^{-1} S_I^2$, where
$$E\{S_I^2\} = E\Big\{(n - 1)^{-1} \sum_{i=1}^{n} (y_{Ii} - \bar{y}_I)^2\Big\} \doteq (n - 1)^{-1} \Big[\sum_{i=1}^{n} E\{(y_{Ii} - \mu)^2\} - n V\{\bar{y}_I\}\Big] \doteq E(S_{y,n}^2).$$
Bias-corrected estimator:
$$\hat{V} = \hat{V}_I + n^{-2} \sum_{g=1}^{G} c_g S_{Rg}^2$$

32 Other Approaches for Variance Estimation
- Multiple imputation: Rubin (1987)
- Adjusted jackknife: Rao and Shao (1992)
- Fractional imputation: Kim and Fuller (2004), Kim (2011)
- Linearization: Shao and Steel (1999), Kim and Rao (2009)

33 Fractional imputation
Basic idea:
- Split the record with a missing item into $M$ imputed values.
- Assign fractional weights.

34 Example 5.2.1: Artificial bivariate data from SRS
Table 5.1: Sample with missing data (columns: ID, Weight, Cell for x, Cell for y, x, y; M: missing)

35 x: categorical variable with three categories (1, 2, 3). Two imputation cells (Cell 1 and Cell 2); within each cell, a missing x takes each category value with its observed probability.
y: continuous variable. Four possible donors for cell one and three possible donors for cell two.

36 Fractional hot deck imputation:
Table 5.2: Fractionally Imputed Data Set (columns: ID, Donor x, Donor y, $w_{ij0}^*$, Final Weight, Cell for x, Cell for y, x, y)

38 Q. How to compute the fractional weight?
[Step 1] Compute the cell mean of the control variable $z_i$, where $z_i$ consists of $I(x_i = 1)$, $I(x_i = 2)$, $I(x_i = 3)$, and $y_i$.
[Step 2] Apply the calibration weighting method to find $w_{ij}^*$ that satisfies
$$\sum_{i \in A_{Rc}} w_i z_i + \sum_{i \in A_{Mc}} w_i \sum_{j=1}^{M} w_{ij}^* z_{ij}^* = \sum_{i \in A_c} w_i \bar{z}_c$$
and $\sum_{j=1}^{M} w_{ij}^* = 1$, where $z_{ij}^*$ is the $j$-th imputed value of $z_i$.

39 Nearest neighbor imputation
Models for nearest neighbor imputation
Model: $Y_i = g(x_i, \beta) + e_i$
- Semiparametric model: if $g(\cdot)$ were known, one could use model-based imputation such as $Y_i^* = g(x_i, \hat{\beta}) + \hat{e}_i^*$.
- Nonparametric model: the form of $g(\cdot)$ is unknown, except that it is a smooth function of $x_i$.

40 Models for nearest neighbor imputation
Fay (1999)'s model:
$$E_\zeta(y_i) = E_\zeta(y_{nn1(i)}) = E_\zeta(y_{nn2(i)})$$
$$Var_\zeta(y_i) = Var_\zeta(y_{nn1(i)}) = Var_\zeta(y_{nn2(i)})$$
and the y-variables are uncorrelated, where $nn1(i)$ is the index of the nearest neighbor of unit $i$ and $nn2(i)$ is the index of the second nearest neighbor of unit $i$.

41 Models for nearest neighbor imputation
Alternative representation of Fay (1999)'s model:
$$Y_j \sim indep\,(\mu_{gi}, \sigma_{gi}), \quad j \in A_{gi},$$
where $A_{gi} = \{i, nn1(i), nn2(i)\}$ is the index set consisting of unit $i$ and its two nearest neighbors.
Thus, Fay (1999)'s model is a special case of the cell mean model with a much finer cell definition.
Kim and Fuller (2004) proposed a variance estimation method for fractional imputation under the cell mean model.

43 Example: Simple Random Sample
Table (columns: Sample Element, Weight, Auxiliary Variable X ("House Rent"), Y ("Income"); "?" marks a missing Y value)

44 Example: Jackknife for Fractional Imputation in Kim and Fuller (2004)
Table of jackknife replicate weights (columns: Unit, $w_i$, $w_{ij}^*$, X, Y, and the replicate weights $w_i^{(k)} w_{ij}^{*(k)}$ for $k = 1, \ldots, 5$). Surviving entries: $(0.5 - \delta_1)$, $(0.5 + \delta_5)$, $(0.5 + \delta_1)$, $(0.5 - \delta_5)$.

45 Example (Continued)
Table of jackknife replicate weights (columns: Unit, $w_i$, $w_{ij}^*$, X, Y, and the replicate weights $w_i^{(k)} w_{ij}^{*(k)}$ for $k = 6, \ldots, 10$). Surviving entries: $(0.5 - \delta_7)$, $(0.5 + \delta_7)$, $0$.

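46 Example: Continued
If $\theta = E(X)$:
$$\hat{\theta}_I^{(k)} - \hat{\theta}_I = \hat{\theta}_n^{(k)} - \hat{\theta}_n, \quad k = 1, 2, \ldots, 10.$$
If $\theta = E(Y)$:
$$\hat{\theta}_I^{(1)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(1)} - \hat{\theta}_I + \delta_1 (y_5 - y_1)$$
$$\hat{\theta}_I^{(5)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(5)} - \hat{\theta}_I + \delta_5 (y_1 - y_5)$$
$$\hat{\theta}_I^{(7)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(7)} - \hat{\theta}_I + \delta_7 (y_8 - y_7)$$
$$\hat{\theta}_I^{(8)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(8)} - \hat{\theta}_I + \delta_8 (y_7 - y_8)$$
$$\hat{\theta}_I^{(k)} - \hat{\theta}_I = \hat{\theta}_{I,naive}^{(k)} - \hat{\theta}_I, \quad k = 2, 3, 4, 6, 9, 10.$$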

47 Variance estimation for fractional imputation
The variance estimator is a function of the $\delta_k$'s:
$$\hat{V}_\delta = \sum_{k=1}^{L} c_k \Big(\sum_{i \in A_R} \alpha_{\delta,i}^{(k)} y_i - \sum_{i \in A_R} \alpha_i y_i\Big)^2$$
- Naive variance estimator ($\delta_k \equiv 0$): underestimation.
- Increasing the $\delta_k$ will increase the value of the variance estimator.
How to decide $\delta_k$?
$$E(\hat{V}_\delta) - Var(\hat{\theta}_I) = E\Big[\sum_{g=1}^{G} \sum_{i \in A_{Rg}} \Big\{\sum_{k=1}^{L} c_k (\alpha_{\delta,i}^{(k)} - \alpha_i)^2 - \alpha_i^2\Big\} \sigma_g^2\Big]$$

48 Variance estimation for fractional imputation
Kim and Fuller (2004) showed that if
$$\sum_{i \in A_R} w_{ij}^{*(k)} = 1 \qquad (C.1)$$
and
$$\sum_{i \in A_{Rg}} \sum_{k=1}^{L} c_k (\alpha_i^{(k)} - \alpha_i)^2 = \sum_{i \in A_{Rg}} \alpha_i^2, \qquad (C.2)$$
then the replication variance estimator defined by
$$\hat{V}_I = \sum_{k=1}^{L} c_k (\hat{\theta}_I^{(k)} - \hat{\theta}_I)^2,$$
where $\hat{\theta}_I^{(k)} = \sum_{i \in A_R} \alpha_i^{(k)} y_i$, is unbiased for the total variance under the cell mean model.

49 Variance estimation for fractional imputation
Condition (C.1) is needed for the replication weights to be usable for other completely responding variables.
Condition (C.2) is used to estimate the imputation variance unbiasedly.
The $\delta_k$ is determined by solving a quadratic equation in $\delta_k$:
$$\sum_{i \in A_{Rg}} c_k (\alpha_{\delta,i}^{(k)} - \alpha_i)^2 - \sum_{i \in A_{Rg}} c_k (\alpha_{0,i}^{(k)} - \alpha_i)^2 = \alpha_k^2 - \sum_{s=1}^{L} c_s (\alpha_{0,k}^{(s)} - \alpha_k)^2,$$
where $\alpha_{0,i}^{(k)} = \sum_{j \in A} w_i^{(k)} w_{ij}^* d_{ij}$ is the $k$-th replicate of the total weight of donor $i$ for the naive variance estimator ($\delta_k \equiv 0$).

50 Example: Continued
Calculation of $\delta_1$: solve the quadratic equation in $\delta_1$ obtained from the replicate weights.
Similarly, we can calculate $\delta_5 = 0.429$, $\delta_7 = 0.377$, and $\delta_8$.

51 Conclusion
- Kim, Fuller, and Bell (2011) used the method for variance estimation with the 2000 U.S. Census long form income data.
- Nonparametric method: theoretically challenging but practically very attractive (and popular).
- Other references:
  1. Chen and Shao (2001)
  2. Beaumont and Bocci (2009)
  3. Kim and Fuller (2004), Fuller and Kim (2005), Kim (2011)

52 Small area estimation
Basic setup
- The original sample $A$ is decomposed into $G$ domains such that $A = A_1 \cup \cdots \cup A_G$ and $n = n_1 + \cdots + n_G$.
- $n$ is large, but $n_g$ can be very small.
- Direct estimator of $Y_g = \sum_{i \in U_g} y_i$:
$$\hat{Y}_{d,g} = \sum_{i \in A_g} \frac{1}{\pi_i} y_i$$
  - Unbiased.
  - May have high variance.

53 If some auxiliary information is available, we can use it.
Synthetic estimator of $Y_g$:
$$\hat{Y}_{s,g} = X_g' \hat{\beta},$$
where $X_g = \sum_{i \in U_g} x_i$ is the known total of $x_i$ in $U_g$ and $\hat{\beta}$ is an estimated regression coefficient.
- Low variance (if $x_i$ does not contain the domain indicator).
- Could be biased (unless $\sum_{i \in U_g} (y_i - x_i' B) = 0$).

54 Composite estimation: consider
$$\hat{Y}_{c,g} = \alpha_g \hat{Y}_{d,g} + (1 - \alpha_g) \hat{Y}_{s,g}$$
for some $\alpha_g \in (0, 1)$. We are interested in finding the $\alpha_g^*$ that minimizes the MSE of $\hat{Y}_{c,g}$. The optimal choice is
$$\alpha_g^* \cong \frac{MSE(\hat{Y}_{s,g})}{MSE(\hat{Y}_{d,g}) + MSE(\hat{Y}_{s,g})}.$$
- For the direct estimation part, $MSE(\hat{Y}_{d,g}) = V(\hat{Y}_{d,g})$ can be estimated.
- For the synthetic estimation part, $MSE(\hat{Y}_{s,g}) = E\{(\hat{Y}_{s,g} - Y_g)^2\}$ cannot be computed directly without assuming some error model.
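The composite weighting can be illustrated with a small calculation. This is a sketch with made-up inputs: the function name `composite_estimate` and the numbers are assumptions, and the weight formula ignores any covariance between the direct and synthetic estimators.

```python
def composite_estimate(y_direct, y_synth, mse_direct, mse_synth):
    """Combine direct and synthetic estimates with alpha = MSE_s / (MSE_d + MSE_s)."""
    alpha = mse_synth / (mse_direct + mse_synth)
    return alpha * y_direct + (1.0 - alpha) * y_synth, alpha

# Hypothetical domain: noisy direct estimate, more stable synthetic estimate.
y_comp, alpha_star = composite_estimate(
    y_direct=100.0, y_synth=90.0, mse_direct=30.0, mse_synth=10.0
)
```

The less precise component receives the smaller weight: here the direct estimator has three times the MSE of the synthetic one, so it gets weight 0.25.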

55 Area level estimation
Basic setup
- Parameter of interest: $\bar{Y}_g = N_g^{-1} \sum_{i \in U_g} y_i$
- Model:
$$\bar{Y}_g = \bar{X}_g' \beta + u_g$$
  and $u_g \sim (0, \sigma_u^2)$.
- Also, we have
$$\hat{\bar{Y}}_{d,g} \sim (\bar{Y}_g, V_g)$$
  with $V_g = V(\hat{\bar{Y}}_{d,g})$.

56 The two models can be written as
$$\hat{\bar{Y}}_{d,g} = \bar{Y}_g + e_g$$
$$\bar{X}_g' \beta = \bar{Y}_g - u_g,$$
where $e_g$ and $u_g$ are independent error terms with mean zero and variances $V_g$ and $\sigma_u^2$, respectively. Thus, the best linear unbiased predictor (BLUP) can be written as
$$\hat{\bar{Y}}_g^* = \alpha_g^* \hat{\bar{Y}}_{d,g} + (1 - \alpha_g^*) \bar{X}_g' \beta, \qquad (18)$$
where $\alpha_g^* = \sigma_u^2 / (V_g + \sigma_u^2)$.

57 MSE: If $\beta$, $V_g$, and $\sigma_u^2$ are known, then
$$MSE(\hat{\bar{Y}}_g^*) = V(\hat{\bar{Y}}_g^* - \bar{Y}_g) = V\{\alpha_g^* (\hat{\bar{Y}}_{d,g} - \bar{Y}_g) + (1 - \alpha_g^*) (\bar{X}_g' \beta - \bar{Y}_g)\}$$
$$= (\alpha_g^*)^2 V_g + (1 - \alpha_g^*)^2 \sigma_u^2 = \alpha_g^* V_g = (1 - \alpha_g^*) \sigma_u^2.$$
Note that, since $0 < \alpha_g^* < 1$,
$$MSE(\hat{\bar{Y}}_g^*) < V_g \quad \text{and} \quad MSE(\hat{\bar{Y}}_g^*) < \sigma_u^2.$$
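A quick numerical check of the BLUP (18) and its MSE identity, using made-up values for one area. The function name `blup` and all inputs are illustrative assumptions.

```python
def blup(ybar_direct, xb, V_g, sigma2_u):
    """BLUP of the area mean with known beta: shrink the direct estimate toward x'beta."""
    alpha = sigma2_u / (V_g + sigma2_u)
    estimate = alpha * ybar_direct + (1.0 - alpha) * xb
    mse = alpha**2 * V_g + (1.0 - alpha) ** 2 * sigma2_u
    return estimate, alpha, mse

# Hypothetical area: direct estimate 12 with variance 4, model mean 10, sigma_u^2 = 1.
est, alpha, mse = blup(ybar_direct=12.0, xb=10.0, V_g=4.0, sigma2_u=1.0)
```

With these inputs the shrinkage weight is 0.2, and the three expressions for the MSE, $(\alpha_g^*)^2 V_g + (1 - \alpha_g^*)^2 \sigma_u^2$, $\alpha_g^* V_g$, and $(1 - \alpha_g^*) \sigma_u^2$, all equal 0.8, below both $V_g = 4$ and $\sigma_u^2 = 1$.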

58 When $\beta$ is unknown (and $V_g$ and $\sigma_u^2$ are known):
$$\hat{\beta} = \Big(\sum_{g=1}^{G} w_g \bar{X}_g \bar{X}_g'\Big)^{-1} \sum_{g=1}^{G} w_g \bar{X}_g \hat{\bar{Y}}_{d,g},$$
where $w_g = (\sigma_u^2 + V_g)^{-1}$. The EBLUP is
$$\hat{\bar{Y}}_g^*(\hat{\beta}) = \alpha_g^* \hat{\bar{Y}}_{d,g} + (1 - \alpha_g^*) \bar{X}_g' \hat{\beta}, \qquad (19)$$
which takes the form of the composite estimator.
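The weighted least squares step and the EBLUP (19) can be sketched as below, assuming $V_g$ and $\sigma_u^2$ are known. The function name `eblup` and the four-area toy data are assumptions; the direct estimates are set exactly on the regression line so that the expected result is transparent.

```python
import numpy as np

def eblup(Xbar, ybar_direct, V, sigma2_u):
    """EBLUP (19): GLS estimate of beta, then shrink each direct estimate toward Xbar_g' beta."""
    Xbar = np.asarray(Xbar, dtype=float)
    ybar_direct = np.asarray(ybar_direct, dtype=float)
    w = 1.0 / (sigma2_u + np.asarray(V, dtype=float))   # w_g = (sigma_u^2 + V_g)^{-1}
    XtW = (Xbar * w[:, None]).T
    beta_hat = np.linalg.solve(XtW @ Xbar, XtW @ ybar_direct)
    alpha = sigma2_u * w                                # alpha_g = sigma_u^2 / (V_g + sigma_u^2)
    pred = alpha * ybar_direct + (1.0 - alpha) * (Xbar @ beta_hat)
    return pred, beta_hat, alpha

# Toy data: four areas, intercept plus one covariate, noise-free direct estimates.
Xbar = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 7.0]])
beta_true = np.array([1.0, 0.5])
V = np.array([0.5, 1.0, 2.0, 4.0])
sigma2_u = 1.0
ybar_d = Xbar @ beta_true

pred, beta_hat, alpha = eblup(Xbar, ybar_d, V, sigma2_u)
```

Areas with larger sampling variance $V_g$ get smaller $\alpha_g^*$ and therefore lean more heavily on the regression-synthetic part.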

59 The MSE is
$$MSE\{\hat{\bar{Y}}_g^*(\hat{\beta})\} = V\{\hat{\bar{Y}}_g^*(\hat{\beta}) - \bar{Y}_g\} = V\{\alpha_g^* (\hat{\bar{Y}}_{d,g} - \bar{Y}_g) + (1 - \alpha_g^*) (\bar{X}_g' \hat{\beta} - \bar{Y}_g)\}$$
$$= (\alpha_g^*)^2 V_g + (1 - \alpha_g^*)^2 \{\sigma_u^2 + \bar{X}_g' V(\hat{\beta}) \bar{X}_g\} = \alpha_g^* V_g + (1 - \alpha_g^*)^2 \bar{X}_g' V(\hat{\beta}) \bar{X}_g,$$
where
$$V(\hat{\beta}) = \Big(\sum_{g=1}^{G} w_g \bar{X}_g \bar{X}_g'\Big)^{-1}.$$

60 If $\beta$ and $\sigma_u^2$ are unknown:
1. Find consistent estimators of $\beta$ and $\sigma_u^2$.
2. Use
$$\hat{\bar{Y}}_g^*(\hat{\alpha}_g, \hat{\beta}) = \hat{\alpha}_g \hat{\bar{Y}}_{d,g} + (1 - \hat{\alpha}_g) \bar{X}_g' \hat{\beta}, \qquad (20)$$
   where $\hat{\alpha}_g = \hat{\sigma}_u^2 / (\hat{V}_g + \hat{\sigma}_u^2)$.
Estimation of $\sigma_u^2$: method of moments,
$$\hat{\sigma}_u^2 = \sum_{g=1}^{G} k_g \Big\{\frac{G}{G - p} (\hat{\bar{Y}}_{d,g} - \bar{X}_g' \hat{\beta})^2 - \hat{V}_{d,g}\Big\},$$
where $k_g \propto \{\hat{\sigma}_u^2 + \hat{V}_g\}^{-1}$ and $\sum_{g=1}^{G} k_g = 1$. If $\hat{\sigma}_u^2$ is negative, then we set it to zero.

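61 MSE
$$MSE\{\hat{\bar{Y}}_g^*(\hat{\alpha}_g, \hat{\beta})\} = V\{\hat{\bar{Y}}_g^*(\hat{\alpha}_g, \hat{\beta}) - \bar{Y}_g\} = V\{\hat{\alpha}_g (\hat{\bar{Y}}_{d,g} - \bar{Y}_g) + (1 - \hat{\alpha}_g) (\bar{X}_g' \hat{\beta} - \bar{Y}_g)\}$$
$$\doteq (\alpha_g^*)^2 V_g + (1 - \alpha_g^*)^2 \{\sigma_u^2 + \bar{X}_g' V(\hat{\beta}) \bar{X}_g\} + V(\hat{\alpha}_g) \{V_g + \sigma_u^2\}$$
$$= \alpha_g^* V_g + (1 - \alpha_g^*)^2 \bar{X}_g' V(\hat{\beta}) \bar{X}_g + V(\hat{\alpha}_g) \{V_g + \sigma_u^2\}$$
MSE estimation (Prasad and Rao, 1990):
$$\widehat{MSE}\{\hat{\bar{Y}}_g^*(\hat{\alpha}_g, \hat{\beta})\} = \hat{\alpha}_g \hat{V}_g + (1 - \hat{\alpha}_g)^2 \bar{X}_g' \hat{V}(\hat{\beta}) \bar{X}_g + 2 \hat{V}(\hat{\alpha}_g) \{\hat{V}_g + \hat{\sigma}_u^2\}.$$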

62 Extensions
Unit level estimation: Battese, Harter, and Fuller (1988).
Use a unit-level model
$$y_{gi} = x_{gi}' \beta + u_g + e_{gi}$$
and
$$\hat{Y}_g = \sum_{i \in U_g} \{x_{gi}' \hat{\beta} + \hat{u}_g\},$$
where
$$\hat{u}_g = \hat{E}(u_g \mid \hat{\bar{X}}_g, \hat{\bar{Y}}_g) = \frac{\sigma_u^2}{\sigma_u^2 + \hat{V}_g} (\hat{\bar{Y}}_g - \hat{\bar{X}}_g' \hat{\beta}).$$
It can be shown that
$$\hat{\bar{Y}}_g^* = \hat{\alpha}_g \bar{Y}_{reg,g} + (1 - \hat{\alpha}_g) \bar{Y}_{s,g},$$
where $\bar{Y}_{reg,g} = \hat{\bar{Y}}_{d,g} + (\bar{X}_g - \hat{\bar{X}}_{d,g})' \hat{\beta}$ and $\bar{Y}_{s,g} = \bar{X}_g' \hat{\beta}$.

63 Benchmarked small area estimation: Wang, Fuller, and Qu (2009).
The sum of the small area estimates is not necessarily equal to
$$\hat{Y} = \sum_{i \in A} \frac{1}{\pi_i} y_i.$$
It is desirable that the benchmarking condition hold:
$$\sum_{g=1}^{G} N_g \hat{\bar{Y}}_g^* = \hat{Y}.$$
Idea: since $\hat{\bar{Y}}_g^* = \bar{X}_g' \hat{\beta} + \alpha_g^* (\hat{\bar{Y}}_{d,g} - \bar{X}_g' \hat{\beta})$, we can adjust $\sigma_u^2$ so that
$$\sum_{g=1}^{G} N_g \alpha_g^* (\hat{\bar{Y}}_{d,g} - \bar{X}_g' \hat{\beta}) = 0.$$
For other applications, read Small Area Estimation by Rao (2003).

64 Measurement Error
Measurement error: errors due to inaccurate measurement (e.g., interviewer effect, ambiguous questions, inaccurate memory, quantities that are impossible to measure directly).
Two aspects:
1. Bias: very hard to measure. A validation subsample is needed.
2. Variance: repeated independent determinations for the same item.

65 Measurement Error Model
- $X_i$: observed value of item $x$ for unit $i$
- $x_i$: true value of item $x$ for unit $i$
Measurement error Model 1:
$$X_i = x_i + u_i,$$
where $u_i \mid x_i \sim iid(0, \sigma_u^2)$.
Measurement error Model 2:
$$X_i = \gamma_0 + \gamma_1 x_i + u_i,$$
where $u_i \mid x_i \sim iid(0, \sigma_u^2)$. This is a possible model if the $x_i$ are observed in a validation sample.

66 Simple Estimators
Horvitz-Thompson estimator: $\hat{T}_X = \sum_{i \in A} w_i X_i$
Parameter: $T_x = \sum_{i=1}^{N} x_i$.
1. The H-T estimator is unbiased under Model 1:
$$\hat{T}_X - T_x = (\hat{T}_x - T_x) + (\hat{T}_X - \hat{T}_x).$$
   The first term has zero mean over the sampling mechanism, and the second term has zero mean under Model 1.
2. Variance:
$$V\{\hat{T}_X - T_x \mid F_x\} = V\{\hat{T}_x - T_x \mid F_x\} + E\Big\{\sum_{i \in A} w_i^2 \sigma_u^2\Big\} = V\{\hat{T}_x - T_x \mid F_x\} + \sum_{i=1}^{N} w_i \sigma_u^2$$

67 Naive Variance Estimator
$$\hat{V}(\hat{T}_X) = \sum_{i \in A} \sum_{j \in A} \pi_{ij}^{-1} (\pi_{ij} - \pi_i \pi_j) w_i w_j X_i X_j$$
$$E\{\hat{V}(\hat{T}_X)\} = E\{\hat{V}(\hat{T}_x) \mid F_x\} + E\{\hat{V}(\hat{T}_u)\} = V(\hat{T}_x - T_x \mid F_x) + \sum_{i=1}^{N} (1 - \pi_i) w_i \sigma_u^2.$$
The bias of $\hat{V}(\hat{T}_X)$ as an estimator of $V\{\hat{T}_X - T_x \mid F_x\}$ is $-N \sigma_u^2$, which is negligible if $n/N = o(1)$.
Remark: The bias is negligible under the assumption that the measurement errors are independent. If the assumption does not hold, then the variance estimator is no longer asymptotically unbiased and we may need to use a different variance estimator.

68 Complex Estimators
Observe $(X_i, y_i)$, where $X_i$ is subject to measurement error.
Regression model (under Model 1):
$$y_i = \beta_0 + \beta_1 x_i + e_i$$
$$X_i = x_i + u_i,$$
where $e_i \mid x_i \sim (0, \sigma_e^2)$, $u_i \mid x_i \sim (0, \sigma_u^2)$, and $e_i$ and $u_i$ are independent.
Naive approach: the OLS estimator associated with
$$y_i = \beta_0 + \beta_1 X_i + a_i,$$
where $a_i = e_i - \beta_1 u_i$. The OLS estimator is biased because $E(a_i \mid X_i) \neq 0$.

69 Regression Coefficient
Under the assumptions on $e_i$ and $u_i$,
$$E\Big\{n^{-1} \sum_{i=1}^{n} (X_i - \bar{X}) y_i \mid F_x\Big\} = n^{-1} \sum_{i=1}^{n} (x_i - \bar{x}) (\beta_0 + \beta_1 x_i) + o(1)$$
and
$$E\Big\{n^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \mid F_x\Big\} = n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 + \sigma_u^2 + o(1),$$
so we have
$$E\{\hat{\beta}_{1,OLS} \mid F_x\} \doteq \beta_1 \frac{\sigma_{xx}}{\sigma_{xx} + \sigma_u^2} = \beta_1 \kappa_{xx},$$
where $\sigma_{xx} = E\{n^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2\}$. Thus, the effect of measurement error is to bias the slope estimate in the direction of zero. Bias of this nature is commonly referred to as attenuation.

70 If $\kappa_{xx}$ is known, a simple bias-correction method is to use
$$\hat{\beta}_1 = \kappa_{xx}^{-1} \hat{\beta}_{1,OLS}.$$
If $\sigma_u^2$ is known, then we use
$$\hat{\beta}_1 = \{S_X^2 - \sigma_u^2\}^{-1} S_{XY}.$$
If the ratio $\sigma_e^2 / \sigma_u^2$ is known, a bias-corrected method can be obtained by minimizing the following quantity:
$$Q(x_1, \ldots, x_n, \beta_0, \beta_1) = \sum_{i=1}^{n} \Big\{\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{\sigma_e^2} + \frac{(X_i - x_i)^2}{\sigma_u^2}\Big\}.$$
Here, the $x_i$ are treated as parameters. This is essentially the least squares method with measurement errors. The solution can be obtained by minimizing
$$Q^*(\beta_0, \beta_1) = \sum_{i=1}^{n} \frac{(y_i - \beta_0 - \beta_1 X_i)^2}{\sigma_e^2 + \beta_1^2 \sigma_u^2}.$$
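The attenuation effect and the known-$\sigma_u^2$ correction can be illustrated by simulation. This is a sketch under assumed values ($\beta_1 = 2$, $\sigma_{xx} = \sigma_u^2 = 1$, so $\kappa_{xx} = 0.5$) with normally distributed errors; none of these settings come from the slides.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
beta0, beta1 = 1.0, 2.0
sigma2_u = 1.0                                       # measurement error variance, assumed known

x = rng.normal(0.0, 1.0, n)                          # true covariate, sigma_xx = 1
X = x + rng.normal(0.0, np.sqrt(sigma2_u), n)        # observed covariate under Model 1
y = beta0 + beta1 * x + rng.normal(0.0, 1.0, n)

S_XX = np.var(X, ddof=1)
S_XY = np.cov(X, y, ddof=1)[0, 1]

b_ols = S_XY / S_XX                                  # attenuated: close to beta1 * kappa_xx
b_corrected = S_XY / (S_XX - sigma2_u)               # bias-corrected slope
```

In this setup the naive OLS slope settles near $\beta_1 \kappa_{xx} = 1$, while the corrected slope is close to the true value 2.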

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Nonresponse weighting adjustment using estimated response probability

Nonresponse weighting adjustment using estimated response probability Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy

More information

Chapter 8: Estimation 1
