Nonresponse weighting adjustment using estimated response probability

1 Nonresponse weighting adjustment using estimated response probability
Jae-kwang Kim
Yonsei University, Seoul, Korea
December 26, 2006

2 Introduction
Nonresponse
  Unit nonresponse
  Item nonresponse
Basic strategy for nonresponse
  Unit nonresponse: call-back, nonresponse weighting adjustment
  Item nonresponse: imputation

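3 Basic Setup
Stratum             Pop. size   Mean          Sample size
Respondents         $N_R$       $\bar{Y}_R$   $n_R$
Nonrespondents      $N_M$       $\bar{Y}_M$   $n_M$
Entire population   $N$         $\bar{Y}$     $n$
Take a simple random sample (SRS) from the entire population, but observe $y$ only for the respondents.
Use the respondent mean $\bar{y}_R$ to estimate the population mean:
\[ \mathrm{Bias}(\bar{y}_R) \doteq \left( 1 - \frac{N_R}{N} \right) \left( \bar{Y}_R - \bar{Y}_M \right), \qquad \mathrm{Var}(\bar{y}_R) \doteq \frac{S_R^2}{n_R}. \]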
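
As a quick numerical check of the bias approximation on this slide, the sketch below uses hypothetical stratum sizes and means (they are not from the slides) and compares $(1 - N_R/N)(\bar{Y}_R - \bar{Y}_M)$ with the gap between the respondent mean and the overall population mean. The function name is illustrative only.

```python
def respondent_mean_bias(n_resp_pop, n_nonresp_pop, mean_resp, mean_nonresp):
    """Approximate bias of the respondent mean: (1 - N_R/N) * (Ybar_R - Ybar_M)."""
    n_pop = n_resp_pop + n_nonresp_pop
    return (1.0 - n_resp_pop / n_pop) * (mean_resp - mean_nonresp)

# Hypothetical population: 7,000 respondents with mean 52 and 3,000 nonrespondents with mean 45.
N_R, N_M = 7000, 3000
YBAR_R, YBAR_M = 52.0, 45.0

overall_mean = (N_R * YBAR_R + N_M * YBAR_M) / (N_R + N_M)
bias = respondent_mean_bias(N_R, N_M, YBAR_R, YBAR_M)

# The bias equals the distance between the respondent mean and the overall mean.
gap = YBAR_R - overall_mean
```

The bias vanishes only when the two stratum means coincide or when everyone responds, which is why a larger respondent sample alone does not repair it.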

4 Two problems:
Bias: $\bar{Y}_R \neq \bar{Y}_M$ in general.
Large variance, because $n_R < n$.

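5 Nonresponse weighting adjustment (NWA) method
Under no missing data, the Horvitz-Thompson estimator of the total is
\[ \hat{Y}_{HT} = \sum_{i \in A} \pi_i^{-1} y_i, \]
where $\pi_i = \Pr(i \in A)$ is the first-order inclusion probability of unit $i$ and $A$ is the index set of the intended sample.
Response indicator function:
\[ R_i = \begin{cases} 1 & \text{if unit } i \text{ responds,} \\ 0 & \text{if unit } i \text{ does not respond.} \end{cases} \]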

6 Idea: use a two-phase sampling approach
Population ($U$) -> [Phase 1] -> Sample ($A$) -> [Phase 2] -> Respondents ($A_R$)
Estimation: let $\phi_{i|A} = \Pr(R_i = 1 \mid A)$. If $\phi_{i|A}$ were known, then
\[ \hat{Y}_\phi = \sum_{i \in A} \pi_i^{-1} \phi_{i|A}^{-1} R_i y_i \]
would be conditionally unbiased. In practice, we use an estimator $\hat{\phi}_{i|A}$ of $\phi_{i|A}$. The NWA estimator is
\[ \hat{Y}_{NWA} = \sum_{i \in A} \pi_i^{-1} \hat{\phi}_{i|A}^{-1} R_i y_i. \]
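
The two estimators on this slide and the previous one can be sketched in a few lines. This is a minimal illustration, assuming NumPy and a toy sample of four units whose values are invented for the example; the function names are not from the slides. Whether the response probabilities passed in are the true or the estimated ones, the computation is the same.

```python
import numpy as np

def ht_total(y, pi):
    """Horvitz-Thompson total: sum of y_i / pi_i over the full sample."""
    return float(np.sum(np.asarray(y, dtype=float) / np.asarray(pi, dtype=float)))

def nwa_total(y, pi, phi, r):
    """NWA total: sum of R_i * y_i / (pi_i * phi_i); nonrespondents contribute nothing."""
    y, pi, phi, r = (np.asarray(a, dtype=float) for a in (y, pi, phi, r))
    resp = r == 1
    return float(np.sum(y[resp] / (pi[resp] * phi[resp])))

# Toy sample (hypothetical values): unit 2 does not respond, so its y is unobserved.
y = [10.0, np.nan, 30.0, 40.0]
pi = [0.1, 0.1, 0.2, 0.2]      # first-order inclusion probabilities
phi = [0.5, 0.5, 0.8, 0.8]     # response probabilities (true or estimated)
r = [1, 0, 1, 1]               # response indicators

total_nwa = nwa_total(y, pi, phi, r)          # 10/0.05 + 30/0.16 + 40/0.16
total_ht_full = ht_total([10.0, 20.0, 30.0, 40.0], pi)  # only available without nonresponse
```

Each respondent is weighted by $1/(\pi_i \phi_i)$, so a respondent with a low response probability stands in for more of the units that were sampled but not observed.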

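7 Example: logistic regression model for $\phi_{i|A}$
Model:
\[ \phi(x_i, \alpha) = \frac{\exp(x_i'\alpha)}{1 + \exp(x_i'\alpha)} \]
Estimation of $\alpha_0$ by the maximum likelihood method: solve
\[ S(\alpha) \equiv \sum_{i \in A} \left[ R_i - \phi(x_i, \alpha) \right] x_i = 0 \]
for $\alpha$. An iterative method can be used to solve the nonlinear equation:
\[ \alpha^{(t+1)} = \alpha^{(t)} - \left( \partial S / \partial \alpha' \right)^{-1} S\left( \alpha^{(t)} \right) \]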
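
The iteration on this slide is a Newton-Raphson update. A minimal sketch follows, assuming NumPy, a design matrix whose first column is an intercept, and simulated data with hypothetical coefficients; none of these specifics come from the slides.

```python
import numpy as np

def fit_logistic_mle(x, r, n_iter=25, tol=1e-10):
    """Solve S(alpha) = sum_i [R_i - phi(x_i, alpha)] x_i = 0 by Newton-Raphson."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    alpha = np.zeros(x.shape[1])
    for _ in range(n_iter):
        phi = 1.0 / (1.0 + np.exp(-x @ alpha))
        score = x.T @ (r - phi)
        # dS/dalpha' = -sum_i phi_i (1 - phi_i) x_i x_i'
        jacobian = -(x * (phi * (1.0 - phi))[:, None]).T @ x
        step = np.linalg.solve(jacobian, score)
        alpha = alpha - step
        if np.max(np.abs(step)) < tol:
            break
    return alpha

# Hypothetical sample: intercept plus one covariate, responses drawn from a logistic model.
rng = np.random.default_rng(0)
n = 500
x = np.column_stack([np.ones(n), rng.normal(size=n)])
true_alpha = np.array([0.5, 1.0])
r = (rng.random(n) < 1.0 / (1.0 + np.exp(-x @ true_alpha))).astype(float)

alpha_hat = fit_logistic_mle(x, r)
phi_hat = 1.0 / (1.0 + np.exp(-x @ alpha_hat))
score_at_solution = x.T @ (r - phi_hat)   # should be numerically zero
```

Starting from $\alpha = 0$ is a convenience choice; the sketch has no safeguard against separation or a singular Jacobian.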

8 More generally, the score equation can be weighted:
\[ S_k(\alpha) \equiv \sum_{i \in A} k_i \left[ R_i - \phi(x_i, \alpha) \right] x_i = 0 \]
for some weight $k_i$.
Should we use weights or not? What is the optimal $k_i$?

9 Asymptotic Properties of NWA estimator
Assumptions about the population and the sample:
[A.1] Sequence of finite populations with bounded fourth moments.
[A.2] No extreme weights dominate the others.
[A.3] $\sqrt{n}$-consistency holds for the mean-type estimators.

10 Assumptions about the response probability
[B.1] $\phi_{i|A}$ does not depend on the value of others (i.e. $\phi_{i|A} = \phi_i$).
[B.2] The responses are independent:
\[ \mathrm{Cov}(R_i, R_j) = \begin{cases} \phi_i (1 - \phi_i) & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases} \]
[B.3] The response probability is parametrically modelled: $\phi_i = \phi(x_i; \alpha_0)$ for some known smooth function $\phi(x_i; \cdot)$ of the parameter $\alpha$, evaluated at $\alpha = \alpha_0$.
[B.4] $\phi_i$ is uniformly bounded.

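11 Estimation of $\phi_i$
Estimation of $\alpha_0$: use the weighted score equation
\[ \frac{\partial}{\partial \alpha} \sum_{i \in A} k_i \left[ R_i \ln(\phi_i) + (1 - R_i) \ln(1 - \phi_i) \right] = 0, \]
where $k_i$ is the weight of unit $i$ in the score equation for $\alpha$.
Alternative representation:
\[ S_k(\alpha) \equiv \sum_{i \in A} k_i \left( R_i - \phi(x_i; \alpha) \right) h_i(\alpha) = 0, \]
where $h_i(\alpha) = \partial \{ \mathrm{logit}(\phi_i) \} / \partial \alpha$.
Use $\hat{\phi}_i = \phi(x_i; \hat{\alpha})$, where $\hat{\alpha}$ is the solution to $S_k(\alpha) = 0$.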
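
The alternative representation follows from the chain rule. A short derivation for a single unit, using only the definitions on this slide and treating $k_i$ as fixed:

```latex
\begin{align*}
\frac{\partial}{\partial \alpha}
  \left[ R_i \ln \phi_i + (1 - R_i) \ln (1 - \phi_i) \right]
 &= \left( \frac{R_i}{\phi_i} - \frac{1 - R_i}{1 - \phi_i} \right)
    \frac{\partial \phi_i}{\partial \alpha} \\
 &= \frac{R_i - \phi_i}{\phi_i (1 - \phi_i)} \,
    \frac{\partial \phi_i}{\partial \alpha} \\
 &= (R_i - \phi_i) \,
    \frac{\partial \, \mathrm{logit}(\phi_i)}{\partial \alpha}
  = (R_i - \phi_i) \, h_i(\alpha),
\end{align*}
% because d logit(phi) / d phi = 1 / { phi (1 - phi) }.
```

Multiplying by $k_i$ and summing over $i \in A$ gives $S_k(\alpha)$. For the logistic model, $\mathrm{logit}(\phi_i) = x_i'\alpha$, so $h_i(\alpha) = x_i$.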

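12 Basic Idea (for deriving the asymptotic properties)
Write the NWA estimator as a function of $\hat{\alpha}$:
\[ \hat{Y}_{NWA}(\hat{\alpha}) \equiv \sum_{i \in A} \pi_i^{-1} \{ \phi_i(\hat{\alpha}) \}^{-1} R_i y_i \]
Taking a Taylor expansion of $\hat{Y}_{NWA}(\hat{\alpha})$ around $\alpha = \alpha_0$:
\[ \hat{Y}_{NWA}(\hat{\alpha}) \doteq \hat{Y}_{NWA}(\alpha_0) + \left[ \frac{\partial \hat{Y}_{NWA}}{\partial \alpha'}(\alpha_0) \right] (\hat{\alpha} - \alpha_0) \]
The second term on the right-hand side does not contribute to the expectation, but does contribute to the variance.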

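13 Basic Idea - continued
Taking a Taylor expansion of $S_k(\hat{\alpha}) = 0$:
\[ 0 = S_k(\hat{\alpha}) \doteq S_k(\alpha_0) + \left[ \frac{\partial S_k}{\partial \alpha'}(\alpha_0) \right] (\hat{\alpha} - \alpha_0) \]
Combine the two expansions:
\[ \begin{aligned}
\hat{Y}_{NWA}(\hat{\alpha})
 &\doteq \hat{Y}_{NWA}(\alpha_0) - \left[ \frac{\partial \hat{Y}_{NWA}}{\partial \alpha'}(\alpha_0) \right] \left[ \frac{\partial S_k}{\partial \alpha'}(\alpha_0) \right]^{-1} S_k(\alpha_0) \\
 &\doteq \hat{Y}_{NWA}(\alpha_0) - \gamma_N' S_k(\alpha_0),
\end{aligned} \]
where $\hat{Y}_{NWA}(\alpha_0) = \sum_{i \in A} \pi_i^{-1} \phi_i^{-1} R_i y_i$ and $S_k(\alpha_0) = \sum_{i \in A} k_i (R_i - \phi_i) h_{i0}$.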

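14 Main Result
Linearization:
\[ \hat{Y}_{NWA} \doteq \sum_{i \in A} \pi_i^{-1} \left[ \pi_i \phi_i k_i h_{i0}'\gamma_N + R_i \phi_i^{-1} \left( y_i - \pi_i \phi_i k_i h_{i0}'\gamma_N \right) \right] \]
Conditional expectation:
\[ E\left( \hat{Y}_{NWA} \mid A \right) \doteq \sum_{i \in A} \pi_i^{-1} y_i \]
Conditional variance:
\[ V\left( \hat{Y}_{NWA} \mid A \right) \doteq \sum_{i \in A} \pi_i^{-2} \, \frac{1 - \phi_i}{\phi_i} \left( y_i - \pi_i \phi_i k_i h_{i0}'\gamma_N \right)^2 \]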
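
A short derivation of the two conditional moments may help here. It uses only [B.1] and [B.2]: given $A$, the indicators $R_i$ are independent with mean $\phi_i$ and variance $\phi_i (1 - \phi_i)$. The shorthand $c_i$ is introduced for brevity and is not part of the original slides.

```latex
\begin{align*}
c_i &\equiv \pi_i \phi_i k_i h_{i0}'\gamma_N, \\
E\left( \hat{Y}_{NWA} \mid A \right)
 &\doteq \sum_{i \in A} \pi_i^{-1}
    \left[ c_i + \phi_i \, \phi_i^{-1} (y_i - c_i) \right]
  = \sum_{i \in A} \pi_i^{-1} y_i, \\
V\left( \hat{Y}_{NWA} \mid A \right)
 &\doteq \sum_{i \in A} \pi_i^{-2} \phi_i^{-2} (y_i - c_i)^2 \,
    \phi_i (1 - \phi_i)
  = \sum_{i \in A} \pi_i^{-2} \, \frac{1 - \phi_i}{\phi_i} \, (y_i - c_i)^2 .
\end{align*}
```

Each term of the variance vanishes when $y_i = c_i$, which is what drives the choice of $k_i$ discussed on the next slide.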

15 Main Result - continued
The NWA estimator is asymptotically unbiased regardless of the choice of $k_i$.
The variance of the NWA estimator is minimized for
\[ k_i = \frac{y_i}{\pi_i \phi_i h_{i0}'\gamma_N}. \]
If we don't have any prior information about the distribution of $y_i$, then $k_i = (\pi_i \phi_i)^{-1}$ seems to be a reasonable choice for optimal NWA estimation.

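16 Main Result - continued
Estimate $\alpha_0$ using $k_i = (\pi_i \phi_i)^{-1}$:
\[ S_k(\alpha) = \sum_{i \in A} k_i \left( R_i - \phi(x_i; \alpha) \right) h_i(\alpha) = \sum_{i \in A} \pi_i^{-1} \phi_i^{-1} \left( R_i - \phi(x_i; \alpha) \right) h_i(\alpha) \]
$S_k(\hat{\alpha}) = 0$ is equivalent to
\[ \sum_{i \in A} \pi_i^{-1} R_i \hat{\phi}_i^{-1} h_i(\hat{\alpha}) = \sum_{i \in A} \pi_i^{-1} h_i(\hat{\alpha}). \]
Thus, optimal score equation = calibration equation.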

17 Back to Example - Logistic regression model for $\phi_i$
Under the logistic regression model, $\mathrm{logit}(\phi_i) = x_i'\alpha_0$ and
\[ S_k(\alpha) \equiv \sum_{i \in A} k_i \left[ R_i - \phi(x_i, \alpha) \right] x_i = 0. \]
Optimal score equation:
\[ \sum_{i \in A} \pi_i^{-1} R_i \hat{\phi}_i^{-1} x_i = \sum_{i \in A} \pi_i^{-1} x_i. \]
Optimal NWA estimator applied to $x$ = complete sample estimator applied to $x$.
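
The calibration property can be checked numerically. The sketch below solves the optimal score equation for a logistic model by Newton's method and then compares the two sides. It assumes NumPy, a hypothetical two-stratum set of inclusion probabilities, and simulated responses; the function names and all numeric settings are illustrative and do not come from the slides.

```python
import numpy as np

def fit_calibrated_response_model(x, r, pi, n_iter=50, tol=1e-10):
    """Solve sum_i (1/pi_i) [R_i / phi_i(alpha) - 1] x_i = 0 for a logistic phi (Newton's method)."""
    x = np.asarray(x, dtype=float)
    r = np.asarray(r, dtype=float)
    w = 1.0 / np.asarray(pi, dtype=float)
    alpha = np.zeros(x.shape[1])
    for _ in range(n_iter):
        inv_phi = 1.0 + np.exp(-x @ alpha)            # 1 / phi_i under the logistic model
        residual = x.T @ (w * (r * inv_phi - 1.0))
        # d(1/phi_i)/dalpha = -(1/phi_i - 1) x_i, so the Jacobian is
        # -sum_i w_i R_i (1/phi_i - 1) x_i x_i'
        jacobian = -(x * (w * r * (inv_phi - 1.0))[:, None]).T @ x
        step = np.linalg.solve(jacobian, residual)
        alpha = alpha - step
        if np.max(np.abs(step)) < tol:
            break
    return alpha

# Hypothetical sample of size 400 with an intercept and one covariate.
rng = np.random.default_rng(1)
n = 400
x1 = rng.normal(size=n)
x = np.column_stack([np.ones(n), x1])
pi = np.where(x1 > 0, 0.04, 0.02)                     # assumed unequal inclusion probabilities
phi_true = 1.0 / (1.0 + np.exp(-(1.0 + 0.5 * x1)))    # assumed response model
r = (rng.random(n) < phi_true).astype(float)

alpha_hat = fit_calibrated_response_model(x, r, pi)
phi_hat = 1.0 / (1.0 + np.exp(-x @ alpha_hat))

nwa_of_x = x.T @ (r / (pi * phi_hat))    # optimal NWA estimator applied to x
full_sample_of_x = x.T @ (1.0 / pi)      # complete-sample estimator applied to x
```

With an intercept in $x_i$, the first component of the check says that the adjusted respondent weights sum to the full-sample weights.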

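18 Simulation Study
An artificial bivariate population of size $N = 10,000$:
\[ \begin{pmatrix} y_i \\ x_i \end{pmatrix} \overset{\text{i.i.d.}}{\sim} N\left[ \begin{pmatrix} 2 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right], \quad i = 1, 2, \ldots, N \]
Unequal probability sampling (by stratified sampling).
Generate missing data using a logistic regression model of $R_i$ on $x_i$. About 30% missing data.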

19 Simulation Study - continued
Four estimators:
1. Two-phase estimator: NWA estimator using the true $\phi_i$
2. Unweighted NWA estimator: NWA estimator using $k_i = 1$
3. Weighted NWA estimator: NWA estimator using $k_i = 1/\pi_i$
4. Optimal NWA estimator: NWA estimator using $k_i = 1/(\pi_i \phi_i)$
10,000 Monte Carlo samples of size $n = 100$ and $n = 400$ are generated repeatedly from the fixed population.

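20 Monte Carlo standardized variances of the NWA estimators, based on 10,000 samples.
Table layout: rows are the four estimators (Two-phase, Unweighted NWA, Weighted NWA, Optimal NWA), listed once for each of the two sample sizes $n$; columns are the standardized variances for $\rho = 0.0$, $\rho = 0.3$, and $\rho = 0.6$. The numeric entries are missing from this copy.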

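21 Variance estimation
Linearization:
\[ \hat{Y}_{NWA} \doteq \sum_{i \in A} \pi_i^{-1} \eta_i, \]
where
\[ \eta_i = \pi_i \phi_i k_i h_{i0}'\gamma_N + R_i \phi_i^{-1} \left( y_i - \pi_i \phi_i k_i h_{i0}'\gamma_N \right). \]
Extended definition of $R_i$:
\[ R_i = \begin{cases} 1 & \text{if unit } i \text{ responds if sampled,} \\ 0 & \text{if unit } i \text{ does not respond if sampled,} \end{cases} \qquad i = 1, 2, \ldots, N. \]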

22 Variance estimation - continued
Classical two-phase approach:
Population ($U$) -> [Phase 1] -> Sample ($A$) -> [Phase 2] -> Respondents ($A_R$)
Reverse approach:
Population ($U$) -> Responding population ($U_R$) -> Respondents ($A_R$)

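23 Variance decomposition under the reverse approach:
\[ V\left( \hat{Y}_{NWA} \right) \doteq V_1 + V_2, \]
where
\[ V_1 = E\left\{ V\left( \sum_{i \in A} \pi_i^{-1} \eta_i \,\Big|\, R_1, R_2, \ldots, R_N \right) \right\}, \qquad V_2 = V\left\{ E\left( \sum_{i \in A} \pi_i^{-1} \eta_i \,\Big|\, R_1, R_2, \ldots, R_N \right) \right\}. \]
Variance component estimation:
\[ \hat{V}_1 = \sum_{i \in A} \sum_{j \in A} \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}} \, \frac{\hat{\eta}_i}{\pi_i} \, \frac{\hat{\eta}_j}{\pi_j}, \]
\[ \hat{V}_2 = \sum_{i \in A_R} \pi_i^{-1} \, \frac{1 - \hat{\phi}_i}{\hat{\phi}_i^2} \left( y_i - \pi_i \hat{\phi}_i k_i h_i(\hat{\alpha})'\hat{\gamma}_N \right)^2. \]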

24 Extension: Nonresponse cell method
A special case of the NWA method. Commonly used.
Partition the sample into $G$ cells: $A = A_1 \cup A_2 \cup \cdots \cup A_G$.
Assume that the response rates $\phi_i$ are constant within a cell.
For $i \in A_g$, use
\[ \hat{\phi}_i = \frac{\sum_{j \in A_g} \pi_j^{-1} R_j}{\sum_{j \in A_g} \pi_j^{-1}}. \]
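
The cell-level response rate is a design-weighted proportion of respondents. A minimal sketch, assuming NumPy and a toy sample with two hypothetical cells labelled "a" and "b"; the function name is illustrative.

```python
import numpy as np

def cell_response_rates(cell, pi, r):
    """For each unit, the weighted response rate of its cell: sum(R_j/pi_j) / sum(1/pi_j)."""
    cell = np.asarray(cell)
    w = 1.0 / np.asarray(pi, dtype=float)
    r = np.asarray(r, dtype=float)
    phi_hat = np.empty(len(cell), dtype=float)
    for g in np.unique(cell):
        in_g = cell == g
        phi_hat[in_g] = np.sum(w[in_g] * r[in_g]) / np.sum(w[in_g])
    return phi_hat

# Toy sample (hypothetical values): three units in cell "a", two in cell "b".
cell = np.array(["a", "a", "a", "b", "b"])
pi = np.array([0.1, 0.1, 0.2, 0.5, 0.5])
r = np.array([1, 0, 1, 1, 0])

phi_hat = cell_response_rates(cell, pi, r)
# Cell "a": (10 + 5) / (10 + 10 + 5) = 0.6; cell "b": 2 / (2 + 2) = 0.5
```

A cell with no respondents gets an estimated rate of zero, which cannot be inverted into a weight; in practice such cells have to be merged with others.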

25 Extension - continued
NWA cell: two cell formation criteria
  Quasi-randomization approach: equal response probability assumption
  Model-based approach: homogeneous study-item-value assumption
Cross-classification of the two dimensions of cells is not feasible: collapse the cells in an ad hoc manner.

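26 Extension - continued
Previously we used
\[ \hat{Y}_{NWA} = \sum_{g=1}^{G} \left( \sum_{i \in A_g} \pi_i^{-1} \right) \frac{\sum_{i \in A_g} \pi_i^{-1} R_i y_i}{\sum_{i \in A_g} \pi_i^{-1} R_i}. \]
Here, the cells are formed to have equal response probability.
Directly use $\hat{\phi}_i$ in the cell-weighting estimator:
\[ \hat{Y}_{NWA2} = \sum_{g=1}^{G} \left( \sum_{i \in A_g} \pi_i^{-1} \right) \frac{\sum_{i \in A_g} \pi_i^{-1} \hat{\phi}_i^{-1} R_i y_i}{\sum_{i \in A_g} \pi_i^{-1} \hat{\phi}_i^{-1} R_i}. \]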
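
Both cell-weighting estimators on this slide share one structure: a cell's full-sample weight total multiplied by a weighted respondent mean. The sketch below covers both with one function. It assumes NumPy, the same kind of toy two-cell sample used for illustration elsewhere, and a hypothetical vector `phi_hat` standing in for response probabilities estimated from some fitted model; it also assumes every cell contains at least one respondent.

```python
import numpy as np

def cell_weighted_total(cell, pi, r, y, phi_hat=None):
    """Sum over cells of (sum of 1/pi_i) times a weighted respondent mean of y.

    With phi_hat=None respondents are weighted by 1/pi_i;
    otherwise they are weighted by 1/(pi_i * phi_hat_i).
    """
    cell = np.asarray(cell)
    w = 1.0 / np.asarray(pi, dtype=float)
    r = np.asarray(r, dtype=float)
    y = np.where(r == 1, np.asarray(y, dtype=float), 0.0)  # nonrespondent y is never used
    a = w if phi_hat is None else w / np.asarray(phi_hat, dtype=float)
    total = 0.0
    for g in np.unique(cell):
        in_g = cell == g
        resp_mean = np.sum(a[in_g] * r[in_g] * y[in_g]) / np.sum(a[in_g] * r[in_g])
        total += np.sum(w[in_g]) * resp_mean
    return float(total)

# Toy sample (hypothetical values); y is unobserved for the two nonrespondents.
cell = np.array(["a", "a", "a", "b", "b"])
pi = np.array([0.1, 0.1, 0.2, 0.5, 0.5])
r = np.array([1, 0, 1, 1, 0])
y = np.array([10.0, np.nan, 30.0, 4.0, np.nan])
phi_hat = np.array([0.5, 0.5, 0.8, 0.5, 0.5])   # assumed model-based estimates

total_cell = cell_weighted_total(cell, pi, r, y)             # first estimator on the slide
total_cell_phi = cell_weighted_total(cell, pi, r, y, phi_hat)  # second estimator
```

The two totals differ only in cells where `phi_hat` varies across respondents; when it is constant within a cell it cancels from the ratio.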

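27 Extension - continued
Taylor expansion:
\[ \hat{Y}_{NWA2} = \hat{Y}_{HT} + \sum_{g=1}^{G} \sum_{i \in A_g} \pi_i^{-1} \left( \frac{R_i}{\phi_i} - 1 \right) (y_i - \bar{y}_g) \]
Variance:
\[ \mathrm{Var}\left( \hat{Y}_{NWA2} \right) = \mathrm{Var}\left( \hat{Y}_{HT} \right) + E\left\{ \sum_{g=1}^{G} \sum_{i \in A_g} \pi_i^{-2} \left( \frac{1}{\phi_i} - 1 \right) (y_i - \bar{y}_g)^2 \right\} \]
The variance will be smaller if the cells are formed with homogeneous $y$'s.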

28 Conclusion
Even if you know the true response probability, it's better to use the estimated response probability for the NWA estimation.
The maximum likelihood method may be optimal for estimating $\alpha_0$, but it is not optimal for NWA estimation.
The standard practice of calibration is indeed an optimal procedure.
Variance estimation is possible using the reverse approach.

29 In the cell-weighting NWA method, we can
  use the estimated response probability to control the bias;
  use the weighting cell to control the variance.

30 Thank You!
