A measurement error model approach to small area estimation
Jae-kwang Kim (joint work with Seunghwan Park and Seoyoung Kim)
Spring, 2015
Outline
Introduction
Basic Theory
Application to Korean LFS
Discussion
Jae-kwang Kim Survey Sampling Spring, 2015 2 / 26
Introduction
Small area estimation: we want to provide reliable estimates for areas with insufficient sample sizes.
The sample is not planned to give accurate direct estimators for these domains; some domains have few or no sample observations.
Idea: a model can be used to borrow strength from other sources of information.
Introduction
Motivation: we want to combine several sources of information to obtain improved small area estimates.
Question: how can the direct estimators be improved using auxiliary variables from other independent survey data, census data, or administrative data?
In our study:
- area-level model approach,
- several sources of auxiliary information,
- a measurement error model,
- a generalized least squares (GLS) method.
Introduction: General Setup
Study variable: $X_i$
- Survey A: directly computes $\hat X_i$, subject to sampling error.
- Survey B: computes $\hat Y_{i1}$, subject to sampling error.
- Census: measures $\hat Y_{i2}$.
$E_A(\hat X_i) \neq E_B(\hat Y_{i1})$ because of structural differences between the surveys.
Structural (systematic) differences can arise from:
- different survey modes,
- time differences,
- frame differences.
Goal: best prediction of $X_i$ by incorporating various types of auxiliary information.
Basic Steps
1. Model specification: measurement error model approach
2. Best prediction: BLUP
3. Parameter estimation: GLS method
4. MSPE estimation
Model: Measurement error model approach
Two error models (for area $i$):
Sampling error model
$$\hat X_{i,a} = X_i + a_i, \qquad \hat Y_{i,b} = Y_i + b_i,$$
where $(a_i, b_i)$ represents the sampling error such that
$$\begin{pmatrix} a_i \\ b_i \end{pmatrix} \sim \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V(a_i) & \mathrm{Cov}(a_i, b_i) \\ \mathrm{Cov}(a_i, b_i) & V(b_i) \end{pmatrix} \right].$$
Structural error model
$$Y_i = \beta_0 + \beta_1 X_i + e_i, \qquad e_i \sim (0, \sigma^2_{ei}).$$
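To make the two error models concrete, here is a minimal simulation sketch. The slides only specify first and second moments; the normal errors, the common sampling-error covariance for every area, and the uniform range for the true $X_i$ are assumptions made purely for illustration, and `simulate_areas` is a hypothetical name.

```python
import math
import random


def simulate_areas(H, beta0, beta1, sigma2_e, var_a, var_b, cov_ab, seed=0):
    """Simulate H areas from the sampling and structural error models.

    Returns a list of (X_i, Xhat_{i,a}, Yhat_{i,b}) tuples. Normality, equal
    moments across areas, and X_i ~ Uniform(0.02, 0.06) are illustrative
    assumptions, not part of the model on the slide.
    """
    if var_a <= 0 or var_b - cov_ab ** 2 / var_a <= 0:
        raise ValueError("sampling-error covariance matrix must be positive definite")
    rng = random.Random(seed)
    # Cholesky factor of [[var_a, cov_ab], [cov_ab, var_b]]
    l11 = math.sqrt(var_a)
    l21 = cov_ab / l11
    l22 = math.sqrt(var_b - l21 ** 2)
    rows = []
    for _ in range(H):
        x = rng.uniform(0.02, 0.06)                  # true X_i (hypothetical range)
        e = rng.gauss(0.0, math.sqrt(sigma2_e))      # structural error e_i
        y = beta0 + beta1 * x + e                    # structural model for Y_i
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        a = l11 * z1                                 # sampling error a_i of Survey A
        b = l21 * z1 + l22 * z2                      # correlated sampling error b_i of Survey B
        rows.append((x, x + a, y + b))               # (X_i, Xhat_{i,a}, Yhat_{i,b})
    return rows
```

A fixed `seed` makes the draws reproducible, which is convenient for checking the estimation steps that follow against known parameter values.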
Model: Measurement error model approach
The structural error model describes the relationship between the two survey measurements up to sampling error.
- $X$: target measurement item (variable of primary interest).
- $Y$: inaccurate measurement of $X$ with possible systematic bias.
If both $X$ and $Y$ measure the same item (with different survey modes), the structural error model is essentially a measurement error model ($\beta_0 = 0$, $\beta_1 = 1$ means no measurement bias).
Why consider $Y_i = \beta_0 + \beta_1 X_i + e_i$ instead of $X_i = \beta_0 + \beta_1 Y_i + e_i$?
1. We want to explain $Y$ in terms of $X$ (e.g., $\beta_0 = 0$ and $\beta_1 = 1$ means no measurement bias).
2. It can handle several $Y$ variables more easily.
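Prediction
Recall the GLS method: for $y = Z\theta + e$, $e \sim (0, V)$,
$$\hat\theta_{GLS} = (Z' V^{-1} Z)^{-1} Z' V^{-1} y.$$
GLS approach to combining the two error models, again of the form $y = Z\theta + e$, $e \sim (0, V)$:
$$\begin{pmatrix} \hat X_{i,a} \\ \beta_1^{-1}(\hat Y_{i,b} - \beta_0) \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} X_i + \begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix},$$
where $u_{1i} = a_i$ and $u_{2i} = \beta_1^{-1}(b_i + e_i)$. Thus,
$$\begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix} \sim \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V(a_i) & \beta_1^{-1}\mathrm{Cov}(a_i, b_i) \\ \beta_1^{-1}\mathrm{Cov}(a_i, b_i) & \beta_1^{-2}\{V(b_i) + \sigma^2_{ei}\} \end{pmatrix} \right].$$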
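For a single area, the stacked GLS step can be sketched in plain Python: build the 2×2 covariance of $(u_{1i}, u_{2i})$ from the slide, invert it in closed form, and apply $(Z'V^{-1}Z)^{-1}Z'V^{-1}y$ with $Z = (1, 1)'$. This sketch assumes all parameters are known; `gls_two_sources` is a hypothetical name.

```python
def gls_two_sources(x_a, y_b, beta0, beta1, var_a, var_b, cov_ab, sigma2_e):
    """GLS estimate of X_i from (x_a, x_b)' = (1, 1)' X_i + (u_1i, u_2i)'.

    Returns (estimate, variance); the variance (Z'V^{-1}Z)^{-1} treats all
    model parameters as known.
    """
    x_b = (y_b - beta0) / beta1                 # beta1^{-1} (Yhat_{i,b} - beta0)
    # Covariance matrix of (u_1i, u_2i) as on the slide
    v11 = var_a
    v12 = cov_ab / beta1
    v22 = (var_b + sigma2_e) / beta1 ** 2
    det = v11 * v22 - v12 ** 2
    if det <= 0:
        raise ValueError("covariance matrix of (u_1i, u_2i) must be positive definite")
    # Inverse of the symmetric 2x2 matrix V
    i11, i12, i22 = v22 / det, -v12 / det, v11 / det
    # Z' V^{-1} Z and Z' V^{-1} y with Z = (1, 1)'
    ztvz = i11 + 2.0 * i12 + i22
    ztvy = (i11 + i12) * x_a + (i12 + i22) * x_b
    return ztvy / ztvz, 1.0 / ztvz
```

With equal variances, no covariance, no structural error, and $\beta_0 = 0$, $\beta_1 = 1$, the estimate is simply the average of the two surveys.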
Prediction
GLS method: the best linear unbiased estimator of $X_i$ based on a linear combination of $\hat X_{i,a}$ and $\hat X_{i,b} = \beta_1^{-1}(\hat Y_{i,b} - \beta_0)$. Under the current setup,
$$\hat X_i = \alpha_i \hat X_{i,a} + (1 - \alpha_i) \hat X_{i,b},$$
where
$$\alpha_i = \frac{\sigma^2_{ei} + V(b_i) - \beta_1 \mathrm{Cov}(a_i, b_i)}{\sigma^2_{ei} + \beta_1^2 V(a_i) + V(b_i) - 2\beta_1 \mathrm{Cov}(a_i, b_i)}.$$
The GLS estimator is sometimes called a composite estimator. In practice we need to use $\hat\beta_0$, $\hat\beta_1$, and $\hat\sigma^2_{ei}$.
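The closed-form weight $\alpha_i$ translates directly into code. A minimal sketch, assuming the parameters are given (in practice they would be replaced by $\hat\beta_0$, $\hat\beta_1$, $\hat\sigma^2_{ei}$); the function names are hypothetical.

```python
def composite_weight(var_a, var_b, cov_ab, sigma2_e, beta1):
    """alpha_i from the slide: the weight on the direct estimator Xhat_{i,a}."""
    num = sigma2_e + var_b - beta1 * cov_ab
    den = sigma2_e + beta1 ** 2 * var_a + var_b - 2.0 * beta1 * cov_ab
    return num / den


def composite_estimate(x_a, y_b, beta0, beta1, var_a, var_b, cov_ab, sigma2_e):
    """Composite (GLS) estimator alpha_i * Xhat_{i,a} + (1 - alpha_i) * Xhat_{i,b}."""
    x_b = (y_b - beta0) / beta1          # Xhat_{i,b}: Survey B rescaled to the X scale
    alpha = composite_weight(var_a, var_b, cov_ab, sigma2_e, beta1)
    return alpha * x_a + (1.0 - alpha) * x_b
```

When $\sigma^2_{ei}$ is large relative to the sampling variances, $\alpha_i$ approaches 1 and the composite estimator falls back on the direct estimate from Survey A.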
Parameter estimation
The area-level model takes the form of a measurement error model (Fuller, 1987):
$$\hat Y_i = \beta_0 + \beta_1 X_i + e_i + b_i, \qquad \hat X_i = X_i + a_i.$$
We consider the generalized least squares (GLS) method for parameter estimation.
GLS estimation of $(\beta_0, \beta_1)$: minimize
$$Q(\beta_0, \beta_1) = \sum_i \frac{(\hat Y_i - \beta_0 - \beta_1 \hat X_i)^2}{V(\hat Y_i - \beta_0 - \beta_1 \hat X_i)} \qquad (1)$$
with respect to $(\beta_0, \beta_1)$.
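Parameter estimation (Cont'd)
Since
$$V(\hat Y_i - \beta_0 - \beta_1 \hat X_i) = \sigma^2_{ei} + (-\beta_1, 1)\, \Sigma_i\, (-\beta_1, 1)', \qquad (2)$$
where $\sigma^2_{ei} = V(e_i)$ and $\Sigma_i = V\{(a_i, b_i)'\}$, we can write
$$Q(\beta_0, \beta_1) = \sum_i w_i(\beta_1) (\hat Y_i - \beta_0 - \beta_1 \hat X_i)^2, \qquad (3)$$
where $w_i(\beta_1) = \{\sigma^2_{ei} + (-\beta_1, 1)\, \Sigma_i\, (-\beta_1, 1)'\}^{-1}$. Here, $\Sigma_i$ is assumed to be known.
Note that
$$\frac{\partial Q}{\partial \beta_0} = 0 \iff \sum_i w_i(\beta_1)(\hat Y_i - \beta_0 - \beta_1 \hat X_i) = 0,$$
and so
$$\hat\beta_0 = \bar y_w - \hat\beta_1 \bar x_w, \qquad (4)$$
where $(\bar x_w, \bar y_w) = \{\sum_i w_i(\hat\beta_1)\}^{-1} \sum_i w_i(\hat\beta_1)(\hat X_i, \hat Y_i)$.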
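The weights, the objective (3), and the profile solution (4) for $\beta_0$ can be sketched as follows, assuming $\Sigma_i$ and a common $\sigma^2_e$ are known. Each area is described by its $V(a_i)$, $V(b_i)$, and $\mathrm{Cov}(a_i, b_i)$; the function names are hypothetical.

```python
def weight(beta1, sigma2_e, var_a, var_b, cov_ab):
    """w_i(beta1) = 1 / {sigma2_e + (-beta1, 1) Sigma_i (-beta1, 1)'}."""
    quad = beta1 ** 2 * var_a - 2.0 * beta1 * cov_ab + var_b
    return 1.0 / (sigma2_e + quad)


def objective_q(beta0, beta1, x_hat, y_hat, var_a, var_b, cov_ab, sigma2_e):
    """Weighted objective Q(beta0, beta1) from (3)."""
    return sum(
        weight(beta1, sigma2_e, va, vb, c) * (y - beta0 - beta1 * x) ** 2
        for x, y, va, vb, c in zip(x_hat, y_hat, var_a, var_b, cov_ab)
    )


def beta0_given_beta1(beta1, x_hat, y_hat, var_a, var_b, cov_ab, sigma2_e):
    """Equation (4): beta0 = ybar_w - beta1 * xbar_w with weights w_i(beta1)."""
    w = [weight(beta1, sigma2_e, va, vb, c) for va, vb, c in zip(var_a, var_b, cov_ab)]
    sw = sum(w)
    xbar_w = sum(wi * x for wi, x in zip(w, x_hat)) / sw
    ybar_w = sum(wi * y for wi, y in zip(w, y_hat)) / sw
    return ybar_w - beta1 * xbar_w
```

Because $w_i$ depends only on $\beta_1$, $Q$ is an ordinary weighted quadratic in $\beta_0$ for fixed $\beta_1$, so (4) is its exact minimizer.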
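Plugging (4) into (3), we only need to minimize
$$Q_1(\beta_1) = \sum_i w_i(\beta_1) \{\hat Y_i - \bar y_w - \beta_1(\hat X_i - \bar x_w)\}^2. \qquad (5)$$
Thus, we need to find the solution to $\partial Q_1 / \partial \beta_1 = 0$, where
$$\frac{\partial Q_1}{\partial \beta_1} = \sum_i \frac{\partial w_i(\beta_1)}{\partial \beta_1} \{\hat Y_i - \bar y_w - \beta_1(\hat X_i - \bar x_w)\}^2 - 2 \sum_i w_i(\beta_1)(\hat X_i - \bar x_w)\{\hat Y_i - \bar y_w - \beta_1(\hat X_i - \bar x_w)\}.$$
Using
$$\frac{\partial w_i(\beta_1)}{\partial \beta_1} = -2\{w_i(\beta_1)\}^2 \{\beta_1 V(a_i) - \mathrm{Cov}(a_i, b_i)\}$$
and
$$E\left[\{\hat Y_i - \bar y_w - \beta_1(\hat X_i - \bar x_w)\}^2\right] \doteq \sigma^2_{ei} + (-\beta_1, 1)\, \Sigma_i\, (-\beta_1, 1)' = 1/w_i(\beta_1),$$
the solution to $\partial Q_1 / \partial \beta_1 = 0$ satisfies
$$\hat\beta_1 = \frac{\sum_i w_i(\hat\beta_1)\{(\hat X_i - \bar x_w)(\hat Y_i - \bar y_w) - \mathrm{Cov}(a_i, b_i)\}}{\sum_i w_i(\hat\beta_1)\{(\hat X_i - \bar x_w)^2 - V(a_i)\}}. \qquad (6)$$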
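Because the weights in (6) are evaluated at $\hat\beta_1$ itself, (6) is a fixed-point equation. One way to solve it, sketched below under the assumption that $\sigma^2_e$ and $\Sigma_i$ are known, is plain fixed-point iteration started at $\beta_1 = 1$ (a hypothetical starting value, the no-bias slope), followed by (4) for $\hat\beta_0$. The slides do not say how (6) is solved; this iteration is our choice and is not guaranteed to converge in general. `estimate_beta` is a hypothetical name.

```python
def estimate_beta(x_hat, y_hat, var_a, var_b, cov_ab, sigma2_e,
                  beta1_init=1.0, tol=1e-12, max_iter=500):
    """Solve (6) for beta1 by fixed-point iteration; return (beta0_hat, beta1_hat)."""
    b1 = beta1_init
    for _ in range(max_iter):
        # weights w_i(b1) at the current slope
        w = [1.0 / (sigma2_e + b1 ** 2 * va - 2.0 * b1 * c + vb)
             for va, vb, c in zip(var_a, var_b, cov_ab)]
        sw = sum(w)
        xbar = sum(wi * x for wi, x in zip(w, x_hat)) / sw
        ybar = sum(wi * y for wi, y in zip(w, y_hat)) / sw
        # (6): cross-products corrected by Cov(a_i, b_i), squares corrected by V(a_i)
        num = sum(wi * ((x - xbar) * (y - ybar) - c)
                  for wi, x, y, c in zip(w, x_hat, y_hat, cov_ab))
        den = sum(wi * ((x - xbar) ** 2 - va)
                  for wi, x, va in zip(w, x_hat, var_a))
        if den <= 0:
            raise ValueError("denominator of (6) is not positive")
        b1_new = num / den
        if abs(b1_new - b1) < tol:
            return ybar - b1_new * xbar, b1_new    # beta0 from (4)
        b1 = b1_new
    raise RuntimeError("fixed-point iteration for beta1 did not converge")
```

Subtracting $V(a_i)$ in the denominator is what undoes the attenuation that ordinary least squares on $\hat X_i$ would suffer from sampling error in the regressor.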
Parameter estimation: Estimation of $\sigma^2_{ei}$
Assume $\sigma^2_{ei} = \sigma^2_e$.
- We could also consider an alternative assumption such as $\sigma^2_{ei} = X_i \sigma^2_e$, but in this case a parametric model assumption is needed.
- In practice, one can consider a transformation $T(\cdot)$ such that the structural error model becomes $T(Y_i) = \beta_0 + \beta_1 T(X_i) + e_i$, $e_i \sim (0, \sigma^2_e)$.
Method-of-moments estimator of $\sigma^2_e$: solve
$$\sum_i \frac{(\hat Y_i - \hat\beta_0 - \hat\beta_1 \hat X_i)^2}{\sigma^2_e + (-\hat\beta_1, 1)\, \Sigma_i\, (-\hat\beta_1, 1)'} = H - 2, \qquad (7)$$
where $H$ is the total number of small areas; the $H - 2$ on the right-hand side reflects the two estimated regression parameters.
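Equation (7) has a single unknown, and its left-hand side decreases in $\sigma^2_e$ as long as every $(-\hat\beta_1, 1)\Sigma_i(-\hat\beta_1, 1)'$ is positive, so bisection is one simple way to solve it. The sketch below assumes that positivity, and it truncates the estimate at zero when the left-hand side is already at most $H - 2$ at $\sigma^2_e = 0$; that truncation rule is our assumption, not something stated on the slides. `sigma2_mom` is a hypothetical name.

```python
def sigma2_mom(x_hat, y_hat, var_a, var_b, cov_ab, beta0, beta1, tol=1e-14):
    """Method-of-moments estimate of sigma2_e: solve (7) by bisection."""
    H = len(x_hat)
    if H <= 2:
        raise ValueError("need more than two small areas")
    # squared residuals (Yhat_i - beta0 - beta1 * Xhat_i)^2
    r2 = [(y - beta0 - beta1 * x) ** 2 for x, y in zip(x_hat, y_hat)]
    # quadratic forms (-beta1, 1) Sigma_i (-beta1, 1)', assumed positive
    q = [beta1 ** 2 * va - 2.0 * beta1 * c + vb for va, vb, c in zip(var_a, var_b, cov_ab)]

    def excess(s2):
        # left-hand side of (7) minus (H - 2); decreasing in s2
        return sum(r / (s2 + qi) for r, qi in zip(r2, q)) - (H - 2)

    if excess(0.0) <= 0.0:
        return 0.0                       # no positive root: truncate at zero (assumption)
    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:              # widen the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```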
Parameter estimation (Cont'd)
Iterative algorithm for parameter estimation:
1. Compute the initial estimator of $(\beta_0, \beta_1)$ by setting $\hat\sigma^2_e = 0$.
2. Using the current value of $(\hat\beta_0, \hat\beta_1)$, compute $\hat\sigma^2_e$ using (7).
3. Using the current value of $\hat\sigma^2_e$, compute the updated estimator of $(\beta_0, \beta_1)$ using (4) and (6).
4. Repeat steps 2 and 3 until convergence.
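Putting steps 1 to 4 together gives a compact sketch like the one below. It makes several assumptions beyond the slides: $\Sigma_i$ is known and every $(-\beta_1, 1)\Sigma_i(-\beta_1, 1)'$ is positive; step 1 and each step 3 use a single pass of (6), with weights evaluated at the current $\hat\beta_1$ and starting from the hypothetical value $\beta_1 = 1$, so the fixed point of (6) is reached only as the outer loop converges; and $\hat\sigma^2_e$ is truncated at zero when (7) has no positive root. All function names are hypothetical.

```python
def _weights(b1, s2, var_a, var_b, cov_ab):
    # w_i(b1) = 1 / {s2 + (-b1, 1) Sigma_i (-b1, 1)'}
    return [1.0 / (s2 + b1 * b1 * va - 2.0 * b1 * c + vb)
            for va, vb, c in zip(var_a, var_b, cov_ab)]


def _beta_step(b1, s2, x, y, var_a, var_b, cov_ab):
    """One pass of (6) with weights at the current b1, then beta0 from (4)."""
    w = _weights(b1, s2, var_a, var_b, cov_ab)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    num = sum(wi * ((xi - xbar) * (yi - ybar) - c) for wi, xi, yi, c in zip(w, x, y, cov_ab))
    den = sum(wi * ((xi - xbar) ** 2 - va) for wi, xi, va in zip(w, x, var_a))
    if den <= 0.0:
        raise ValueError("denominator of (6) is not positive")
    b1_new = num / den
    return ybar - b1_new * xbar, b1_new


def _sigma2_step(b0, b1, x, y, var_a, var_b, cov_ab):
    """Solve (7) for sigma2_e by bisection, truncating at zero."""
    H = len(x)
    r2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    q = [b1 * b1 * va - 2.0 * b1 * c + vb for va, vb, c in zip(var_a, var_b, cov_ab)]

    def excess(s2):
        return sum(r / (s2 + qi) for r, qi in zip(r2, q)) - (H - 2)

    if excess(0.0) <= 0.0:
        return 0.0
    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if excess(mid) > 0.0 else (lo, mid)
    return 0.5 * (lo + hi)


def fit_area_model(x, y, var_a, var_b, cov_ab, tol=1e-12, max_iter=1000):
    """Steps 1-4; returns (beta0_hat, beta1_hat, sigma2_e_hat)."""
    if len(x) <= 2:
        raise ValueError("need more than two small areas")
    s2 = 0.0
    b0, b1 = _beta_step(1.0, s2, x, y, var_a, var_b, cov_ab)            # step 1
    for _ in range(max_iter):
        s2 = _sigma2_step(b0, b1, x, y, var_a, var_b, cov_ab)           # step 2
        b0_new, b1_new = _beta_step(b1, s2, x, y, var_a, var_b, cov_ab)  # step 3
        if abs(b0_new - b0) + abs(b1_new - b1) < tol:                    # step 4
            return b0_new, b1_new, s2
        b0, b1 = b0_new, b1_new
    raise RuntimeError("iterative algorithm did not converge")
```

When every area has the same $\Sigma_i$, the weights are equal and cancel in (6), so the slope no longer depends on $\hat\sigma^2_e$ and the loop converges after one update; with unequal $\Sigma_i$ the steps genuinely interact.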
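MSE estimation
Recall the measurement error model structure
$$\hat Y_i = \beta_0 + \beta_1 X_i + e_i + b_i, \qquad \hat X_{i,a} = X_i + a_i.$$
GLS estimator of $X_i$:
$$\begin{aligned}
\hat X_i &= \{(\beta_1, 1) V_i^{-1} (\beta_1, 1)'\}^{-1} (\beta_1, 1) V_i^{-1} (\hat Y_i - \beta_0, \hat X_{i,a})' \\
&= \alpha_i \hat X_{i,a} + (1 - \alpha_i)\{\beta_1^{-1}(\hat Y_i - \beta_0)\} \\
&= \alpha_i \hat X_{i,a} + (1 - \alpha_i) \hat X_{i,b},
\end{aligned}$$
where $V_i$ is the variance-covariance matrix of $(b_i + e_i, a_i)$ and
$$\alpha_i = \frac{\sigma^2_{ei} + V(b_i) - \beta_1 \mathrm{Cov}(a_i, b_i)}{\sigma^2_{ei} + \beta_1^2 V(a_i) + V(b_i) - 2\beta_1 \mathrm{Cov}(a_i, b_i)}.$$
MSE of $\hat X_i$:
$$\begin{aligned}
E\{(\hat X_i - X_i)^2\} &= E\left[\{\alpha_i(\hat X_{i,a} - X_i) + (1 - \alpha_i)(\hat X_{i,b} - X_i)\}^2\right] \\
&= \alpha_i^2 V(\hat X_{i,a}) + (1 - \alpha_i)^2 V(\hat X_{i,b}) + 2\alpha_i(1 - \alpha_i)\mathrm{Cov}(\hat X_{i,a}, \hat X_{i,b}) \\
&= \alpha_i V(\hat X_{i,a}) + (1 - \alpha_i)\mathrm{Cov}(\hat X_{i,a}, \hat X_{i,b}) := M_{1i}.
\end{aligned}$$
The last equality holds because $\alpha_i$ is the optimal (GLS) weight.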
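A quick numerical check of the last step: with the optimal weight, the three-term expansion collapses to $M_{1i}$. The sketch uses $\hat X_{i,b} = \beta_1^{-1}(\hat Y_{i,b} - \beta_0)$, so $V(\hat X_{i,b}) = \beta_1^{-2}\{V(b_i) + \sigma^2_{ei}\}$ and $\mathrm{Cov}(\hat X_{i,a}, \hat X_{i,b}) = \beta_1^{-1}\mathrm{Cov}(a_i, b_i)$; the function names are hypothetical and any input values are arbitrary.

```python
def mse_expansion(alpha, v_a, v_b, c_ab):
    """alpha^2 V(Xa) + (1 - alpha)^2 V(Xb) + 2 alpha (1 - alpha) Cov(Xa, Xb)."""
    return alpha ** 2 * v_a + (1.0 - alpha) ** 2 * v_b + 2.0 * alpha * (1.0 - alpha) * c_ab


def m1_from_model(var_a, var_b, cov_ab, sigma2_e, beta1):
    """Return (alpha_i, M_1i via the shortcut, M_1i via the full expansion)."""
    v_a = var_a                                  # V(Xhat_{i,a})
    v_b = (var_b + sigma2_e) / beta1 ** 2        # V(Xhat_{i,b})
    c = cov_ab / beta1                           # Cov(Xhat_{i,a}, Xhat_{i,b})
    alpha = (v_b - c) / (v_a + v_b - 2.0 * c)    # optimal (GLS) weight
    m1 = alpha * v_a + (1.0 - alpha) * c         # shortcut form of M_1i
    return alpha, m1, mse_expansion(alpha, v_a, v_b, c)
```

Rescaling $\alpha_i$ by $\beta_1^2$ in numerator and denominator recovers the slide's formula for $\alpha_i$, so the two expressions agree.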
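MSE estimation
The actual prediction for $X_i$ is computed by $\hat X_{ei} = \hat X_i(\hat\theta)$, where $\theta = (\beta_0, \beta_1, \sigma^2_e)$. Then
$$\mathrm{MSE}(\hat X_{ei}) = \mathrm{MSE}(\hat X_i) + E\{(\hat X_{ei} - \hat X_i)^2\} = M_{1i} + M_{2i}.$$
Consider a jackknife approach:
$$\hat M_{2i} = \frac{H-1}{H} \sum_{k=1}^{H} \left(\hat X_{ei}^{(-k)} - \hat X_{ei}\right)^2,$$
where $\hat X_{ei}^{(-k)}$ is the prediction recomputed after deleting area $k$, and
$$\hat M_{1i} = \hat\alpha_i^{(JK)} \hat V(a_i) + \left(1 - \hat\alpha_i^{(JK)}\right) \hat\beta_1^{-1}\widehat{\mathrm{Cov}}(a_i, b_i),$$
where
$$\hat\alpha_i^{(JK)} = \hat\alpha_i - \frac{H-1}{H} \sum_{k=1}^{H} \left(\hat\alpha_i^{(-k)} - \hat\alpha_i\right).$$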
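Given the full-data prediction and weight for area $i$ and their $H$ delete-one-area replicates (obtaining those means refitting the model $H$ times, which is not shown), the jackknife pieces reduce to a few sums. In this sketch the covariance term in $\hat M_{1i}$ is taken as $\hat\beta_1^{-1}\widehat{\mathrm{Cov}}(a_i, b_i)$, i.e. the estimated $\mathrm{Cov}(\hat X_{i,a}, \hat X_{i,b})$ that appears in $M_{1i}$; with $\hat\beta_1 = 1$ it coincides with $\widehat{\mathrm{Cov}}(a_i, b_i)$. Function and argument names are hypothetical.

```python
def jackknife_pieces(pred_full, pred_loo, alpha_full, alpha_loo):
    """Return (M2_hat, alpha_JK) for one area from H delete-one-area replicates."""
    H = len(pred_loo)
    if H != len(alpha_loo) or H < 2:
        raise ValueError("need matching replicate lists with at least two areas")
    factor = (H - 1) / H
    m2_hat = factor * sum((p - pred_full) ** 2 for p in pred_loo)
    # bias-corrected jackknife version of alpha_i
    alpha_jk = alpha_full - factor * sum(a - alpha_full for a in alpha_loo)
    return m2_hat, alpha_jk


def jackknife_mse(pred_full, pred_loo, alpha_full, alpha_loo,
                  var_a_hat, cov_ab_hat, beta1_hat):
    """MSE estimate M1_hat + M2_hat for one area."""
    m2_hat, alpha_jk = jackknife_pieces(pred_full, pred_loo, alpha_full, alpha_loo)
    # Cov(Xhat_{i,a}, Xhat_{i,b}) estimated by Cov_hat(a_i, b_i) / beta1_hat
    m1_hat = alpha_jk * var_a_hat + (1.0 - alpha_jk) * cov_ab_hat / beta1_hat
    return m1_hat + m2_hat
```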
Korean LFS Application
The Labor Force Survey is a very important economic survey measuring unemployment rates.
Several sources of information on unemployment in Korea:
1. Korean Labor Force Survey (KLF) data: 7K sample households (monthly)
2. Local Area Labor Force Survey (LALF) data: 200K sample households (quarterly)
3. Census long-form data (10% of the population)
The KLF sample is nested within the LALF sample.
Korean LFS Application
The unemployment rate for each small area is the parameter of interest.
Sources of information on unemployment for analysis district area $i$:
- $\hat X_i$: estimates from KLF data
- $\hat Y_{1i}$: estimates from LALF data
- $\hat Y_{2i}$: estimates from census data
Error sources:
- KLF: sampling error, measurement error.
- LALF: sampling error, measurement error.
- Census data: sampling error, measurement error (no updated information).
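Korean LFS Application
We can also consider the census data. Then the measurement error model for area $i$ becomes
$$\begin{pmatrix} \hat X_i \\ \hat Y_{1i} - \beta_0 \\ \hat Y_{2i} - \gamma_0 \end{pmatrix} = \begin{pmatrix} 1 \\ \beta_1 \\ \gamma_1 \end{pmatrix} X_i + \begin{pmatrix} a_i \\ b_i + \bar e_{1i} \\ \bar e_{2i} \end{pmatrix}.$$
The whole process is similar to the case of combining two surveys.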
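With three sources, the GLS step is the same formula with a 3-vector $z = (1, \beta_1, \gamma_1)'$, data vector $y = (\hat X_i, \hat Y_{1i} - \beta_0, \hat Y_{2i} - \gamma_0)'$, and a 3×3 covariance matrix $V_i$ of $(a_i, b_i + \bar e_{1i}, \bar e_{2i})$. The sketch below solves $V_i^{-1}z$ by Gaussian elimination in plain Python. The entries of $V_i$ (for example, whether $\bar e_{2i}$ is uncorrelated with the survey errors) are assumptions left to the caller, and the function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gls_scalar(y, z, V):
    """GLS for a scalar X in y = z X + u, u ~ (0, V): returns (estimate, variance)."""
    vinv_z = solve_linear(V, z)                         # V^{-1} z
    ztvz = sum(zi * wi for zi, wi in zip(z, vinv_z))    # z' V^{-1} z
    ztvy = sum(yi * wi for yi, wi in zip(y, vinv_z))    # z' V^{-1} y (V symmetric)
    return ztvy / ztvz, 1.0 / ztvz
```

With $z = (1, 1)'$ and the 2×2 covariance of $(u_{1i}, u_{2i})$, the same function reproduces the two-survey composite estimator.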
Figure: Plot of Unemployment Rate for KLF and LALF Survey for Urban Area
Figure: Plot of Residuals against estimated values for Urban Area
Korean LFS Application: Data analysis result
We consider four estimators:
- KLF: only KLF
- LALF: only LALF
- GLS 1: combines KLF and LALF
- GLS 2: combines KLF, LALF, and census data
MSE summary over the small areas:

Estimator   1st Q       Median      Mean        3rd Q
KLF         0.0000630   0.0001210   0.0002476   0.0002395
LALF        0.0001123   0.0001330   0.0001482   0.0001695
GLS 1       0.0000444   0.0000738   0.0000893   0.0001210
GLS 2       0.0000405   0.0000543   0.0000575   0.0000721
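To read the table as relative efficiency, the ratio of each estimator's mean and median MSE to those of the KLF-only estimator can be computed directly from the reported values (no new data involved; `relative_to` is a hypothetical helper):

```python
# Mean and median MSE as reported in the results table
mean_mse = {"KLF": 0.0002476, "LALF": 0.0001482, "GLS 1": 0.0000893, "GLS 2": 0.0000575}
median_mse = {"KLF": 0.0001210, "LALF": 0.0001330, "GLS 1": 0.0000738, "GLS 2": 0.0000543}


def relative_to(table, base="KLF"):
    """Ratio of each estimator's entry to the baseline estimator's entry."""
    return {name: value / table[base] for name, value in table.items()}


for label, table in (("mean", mean_mse), ("median", median_mse)):
    ratios = relative_to(table)
    print(label, {name: round(r, 3) for name, r in ratios.items()})
```

By these ratios, GLS 2 has about 23% of the KLF mean MSE (0.232) and about 45% of its median MSE (0.449), while LALF alone has a lower mean but a higher median MSE than KLF (median ratio about 1.10).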
Discussion
Model specification was very difficult!
- We built models separately for urban and rural areas, which are classified based on the proportion of households engaged in agricultural business.
- In the KLF survey, 25% of all areas have a zero unemployment rate because of the very small sample sizes of individual areas. The areas with a zero unemployment rate are excluded when the parameters are estimated.
- We considered a structural model with zero intercept: $\bar Y_{1i} = \beta_1 \bar X_i + e_i$.
- A mixture model or a zero-inflated regression model could also be considered.
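For the zero-intercept structural model, repeating the derivation of (6) with $\beta_0 = 0$ fixed (so there is no centering at the weighted means) and the same moment substitution gives the estimating equation $\hat\beta_1 = \sum_i w_i(\hat\beta_1)\{\hat X_i \hat Y_i - \mathrm{Cov}(a_i, b_i)\} / \sum_i w_i(\hat\beta_1)\{\hat X_i^2 - V(a_i)\}$. This is our own adaptation, not a formula from the slides. The sketch solves it by fixed-point iteration and, by default, drops areas whose direct estimate $\hat X_i$ is exactly zero, mirroring the exclusion described above; the function name and its flags are hypothetical.

```python
def zero_intercept_slope(x_hat, y_hat, var_a, var_b, cov_ab, sigma2_e,
                         drop_zero_x=True, beta1_init=1.0, tol=1e-12, max_iter=500):
    """Fixed-point solution of the zero-intercept analogue of (6) (an adaptation)."""
    # optionally exclude areas whose direct estimate Xhat_i is exactly zero
    rows = [r for r in zip(x_hat, y_hat, var_a, var_b, cov_ab)
            if not (drop_zero_x and r[0] == 0.0)]
    b1 = beta1_init
    for _ in range(max_iter):
        # weights w_i(b1) = 1 / {sigma2_e + (-b1, 1) Sigma_i (-b1, 1)'}
        w = [1.0 / (sigma2_e + b1 * b1 * va - 2.0 * b1 * c + vb)
             for _, _, va, vb, c in rows]
        num = sum(wi * (x * y - c) for wi, (x, y, _, _, c) in zip(w, rows))
        den = sum(wi * (x * x - va) for wi, (x, _, va, _, _) in zip(w, rows))
        if den <= 0.0:
            raise ValueError("denominator is not positive")
        b1_new = num / den
        if abs(b1_new - b1) < tol:
            return b1_new
        b1 = b1_new
    raise RuntimeError("fixed-point iteration did not converge")
```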
Summary
- Motivated by real data from the Korean Labor Force Survey, in the context of small area estimation.
- GLS prediction approach under the area-level model.
- Measurement error model for parameter estimation.
- Instead of the GLS approach, a maximum likelihood approach is also possible under parametric model assumptions.
Reference
Kim, J.K., Park, S. and Kim, S. (2015). Small area estimation combining information from several sources. Survey Methodology, in press.