A measurement error model approach to small area estimation


Jae-kwang Kim, Spring 2015
Joint work with Seunghwan Park and Seoyoung Kim

Outline: Introduction; Basic Theory; Application to the Korean LFS; Discussion
Jae-kwang Kim, Survey Sampling, Spring 2015

Introduction
Small area estimation aims to provide reliable estimates for areas with insufficient sample sizes. The sample is not planned to give accurate direct estimators for these domains, some of which have few or no sample observations. The idea is to use a model to borrow strength from other sources of information.

Introduction (cont'd)
Motivation: we want to combine several sources of information to improve small area estimates, that is, to improve the direct estimators using auxiliary variables from other independent surveys, census data, or administrative data. Our study uses an area-level model approach with several sources of auxiliary information, formulated as a measurement error model and fitted by a generalized least squares (GLS) method.

Introduction: general setup
Study variable: $X_i$.
Survey A: directly computes $\hat X_i$, subject to sampling error.
Survey B: computes $\hat Y_{i1}$, subject to sampling error.
Census: measures $\hat Y_{i2}$.
In general $E_A(\hat X_i) \neq E_B(\hat Y_{i1})$ because of structural (systematic) differences between the surveys, arising from different survey modes, different reference times, or different frames.
Goal: the best prediction of $X_i$ that incorporates the various types of auxiliary information.

Basic steps
1. Model specification: measurement error model approach
2. Best prediction: BLUP
3. Parameter estimation: GLS method
4. MSPE estimation

Model: measurement error model approach
Two error models for area $i$.
Sampling error model:
$$\hat X_{i,a} = X_i + a_i, \qquad \hat Y_{i,b} = Y_i + b_i,$$
where $(a_i, b_i)$ represents the sampling error, with
$$\begin{pmatrix} a_i \\ b_i \end{pmatrix} \sim \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V(a_i) & \mathrm{Cov}(a_i, b_i) \\ \mathrm{Cov}(a_i, b_i) & V(b_i) \end{pmatrix} \right].$$
Structural error model:
$$Y_i = \beta_0 + \beta_1 X_i + e_i, \qquad e_i \sim (0, \sigma^2_{ei}).$$
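A minimal simulation sketch of the two error models, which can be used to try out the estimators on synthetic data. It assumes normal errors, the same $\Sigma_i$ for every area, and true values $X_i$ drawn uniformly from an arbitrary range; none of these choices comes from the slides, and the function name `simulate_areas` is hypothetical.

```python
import numpy as np


def simulate_areas(H, beta0, beta1, sigma2_e, v_a, v_b, c_ab, seed=0):
    """Simulate H areas from the sampling and structural error models.

    Assumptions (not from the slides): normal errors, a common Sigma_i for
    all areas, and true values X_i drawn uniformly from [0.02, 0.06].
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.02, 0.06, size=H)                  # true area values X_i
    e = rng.normal(0.0, np.sqrt(sigma2_e), size=H)       # structural errors e_i
    Y = beta0 + beta1 * X + e                            # Y_i = beta0 + beta1 X_i + e_i
    Sigma = np.array([[v_a, c_ab], [c_ab, v_b]])         # V{(a_i, b_i)'}
    ab = rng.multivariate_normal(np.zeros(2), Sigma, size=H)
    x_hat_a = X + ab[:, 0]                               # hat X_{i,a} = X_i + a_i
    y_hat_b = Y + ab[:, 1]                               # hat Y_{i,b} = Y_i + b_i
    return X, Y, x_hat_a, y_hat_b


X, Y, x_hat_a, y_hat_b = simulate_areas(H=100, beta0=0.005, beta1=0.9,
                                        sigma2_e=1e-6, v_a=4e-5, v_b=1e-5, c_ab=5e-6)
```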

Model: measurement error model approach (cont'd)
The structural error model describes the relationship between the two survey measurements up to sampling error.
X: the target measurement item (the variable of primary interest).
Y: an inaccurate measurement of X, possibly with systematic bias.
If X and Y measure the same item (with different survey modes), the structural error model is essentially a measurement error model.
Why use $Y_i = \beta_0 + \beta_1 X_i + e_i$ instead of $X_i = \beta_0 + \beta_1 Y_i + e_i$?
1. We want to explain Y in terms of X (e.g., $\beta_0 = 0$ and $\beta_1 = 1$ means no measurement bias).
2. Several Y variables can be handled more easily.

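Prediction
Recall the GLS method: for $y = Z\theta + e$ with $e \sim (0, V)$,
$$\hat\theta_{GLS} = (Z' V^{-1} Z)^{-1} Z' V^{-1} y.$$
To combine the two error models by GLS, write
$$\begin{pmatrix} \hat X_{i,a} \\ \beta_1^{-1}(\hat Y_{i,b} - \beta_0) \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix} X_i + \begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix},$$
where $u_{1i} = a_i$ and $u_{2i} = \beta_1^{-1}(b_i + e_i)$. Thus, with $e_i$ uncorrelated with $(a_i, b_i)$,
$$\begin{pmatrix} u_{1i} \\ u_{2i} \end{pmatrix} \sim \left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} V(a_i) & \beta_1^{-1}\mathrm{Cov}(a_i, b_i) \\ \beta_1^{-1}\mathrm{Cov}(a_i, b_i) & \beta_1^{-2}\{V(b_i) + \sigma^2_{ei}\} \end{pmatrix} \right].$$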
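The GLS formula can be sketched with NumPy as below, treating the parameters as known; `gls` and `combine_area` are hypothetical helper names. `combine_area` stacks the two transformed measurements for one area and uses the covariance matrix of $(u_{1i}, u_{2i})$ given above.

```python
import numpy as np


def gls(Z, V, y):
    """theta_hat = (Z' V^{-1} Z)^{-1} Z' V^{-1} y, computed with linear solves."""
    Vinv_Z = np.linalg.solve(V, Z)
    Vinv_y = np.linalg.solve(V, y)
    return np.linalg.solve(Z.T @ Vinv_Z, Z.T @ Vinv_y)


def combine_area(x_a, y_b, beta0, beta1, sigma2_e, v_a, v_b, c_ab):
    """GLS estimate of X_i from hat X_{i,a} and beta1^{-1}(hat Y_{i,b} - beta0)."""
    y = np.array([x_a, (y_b - beta0) / beta1])
    Z = np.ones((2, 1))                                   # both rows measure X_i
    V = np.array([[v_a, c_ab / beta1],                    # V{(u_1i, u_2i)'}
                  [c_ab / beta1, (v_b + sigma2_e) / beta1**2]])
    return gls(Z, V, y)[0]


print(combine_area(x_a=0.04, y_b=0.05, beta0=0.0, beta1=1.0,
                   sigma2_e=0.0, v_a=1.0, v_b=3.0, c_ab=0.0))
```

In the usage line the two measurements are uncorrelated with variances 1 and 3, so GLS weights them 3/4 and 1/4.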

Prediction (cont'd)
The GLS method gives the best linear unbiased estimator of $X_i$ among linear combinations of $\hat X_{i,a}$ and $\hat X_{i,b} = \beta_1^{-1}(\hat Y_{i,b} - \beta_0)$. Under the current setup,
$$\hat X_i = \alpha_i \hat X_{i,a} + (1 - \alpha_i)\hat X_{i,b},$$
where
$$\alpha_i = \frac{\sigma^2_{ei} + V(b_i) - \beta_1 \mathrm{Cov}(a_i, b_i)}{\sigma^2_{ei} + \beta_1^2 V(a_i) + V(b_i) - 2\beta_1 \mathrm{Cov}(a_i, b_i)}.$$
The GLS estimator is sometimes called a composite estimator. In practice we must use the estimates $\hat\beta_0$, $\hat\beta_1$, and $\hat\sigma^2_{ei}$.
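A sketch of the closed-form weight $\alpha_i$ and the composite estimator, again treating $\beta_0$, $\beta_1$, $\sigma^2_{ei}$ and the sampling variances as known (function names are hypothetical). With $\beta_0 = 0$, $\beta_1 = 1$ and uncorrelated sampling errors, $\alpha_i$ reduces to inverse-variance weighting of $\hat X_{i,a}$ and $\hat X_{i,b}$.

```python
def composite_weight(beta1, sigma2_e, v_a, v_b, c_ab):
    """alpha_i = (s2 + V(b) - b1 Cov) / (s2 + b1^2 V(a) + V(b) - 2 b1 Cov)."""
    num = sigma2_e + v_b - beta1 * c_ab
    den = sigma2_e + beta1**2 * v_a + v_b - 2.0 * beta1 * c_ab
    return num / den


def composite_estimate(x_a, y_b, beta0, beta1, sigma2_e, v_a, v_b, c_ab):
    """Return (alpha_i, hat X_i), hat X_i = alpha_i hat X_{i,a} + (1 - alpha_i) hat X_{i,b}."""
    alpha = composite_weight(beta1, sigma2_e, v_a, v_b, c_ab)
    x_b = (y_b - beta0) / beta1          # hat X_{i,b} = beta1^{-1}(hat Y_{i,b} - beta0)
    return alpha, alpha * x_a + (1.0 - alpha) * x_b


alpha, x_hat = composite_estimate(x_a=0.04, y_b=0.05, beta0=0.0, beta1=1.0,
                                  sigma2_e=0.0, v_a=1.0, v_b=3.0, c_ab=0.0)
```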

Parameter estimation
The area-level model takes the form of a measurement error model (Fuller, 1987):
$$\hat Y_i = \beta_0 + \beta_1 X_i + e_i + b_i, \qquad \hat X_i = X_i + a_i.$$
We use generalized least squares (GLS) for parameter estimation. The GLS estimators of $\beta_0$ and $\beta_1$ minimize
$$Q_1(\beta_0, \beta_1) = \sum_i \frac{(\hat Y_i - \beta_0 - \beta_1 \hat X_i)^2}{V(\hat Y_i - \beta_0 - \beta_1 \hat X_i)} \qquad (1)$$
with respect to $(\beta_0, \beta_1)$.

Parameter estimation (cont'd)
Since
$$V(\hat Y_i - \beta_0 - \beta_1 \hat X_i) = \sigma^2_{ei} + (-\beta_1, 1)\Sigma_i(-\beta_1, 1)', \qquad (2)$$
where $\sigma^2_{ei} = V(e_i)$ and $\Sigma_i = V\{(a_i, b_i)'\}$, we can write
$$Q_1(\beta_0, \beta_1) = \sum_i w_i(\beta_1)(\hat Y_i - \beta_0 - \beta_1 \hat X_i)^2, \qquad (3)$$
where $w_i(\beta_1) = \{\sigma^2_{ei} + (-\beta_1, 1)\Sigma_i(-\beta_1, 1)'\}^{-1}$. Here $\Sigma_i$ is assumed to be known. Note that
$$\frac{\partial Q_1}{\partial \beta_0} = 0 \iff \sum_i w_i(\beta_1)(\hat Y_i - \beta_0 - \beta_1 \hat X_i) = 0,$$
and so
$$\hat\beta_0 = \bar y_w - \hat\beta_1 \bar x_w, \qquad (4)$$
where $(\bar x_w, \bar y_w) = \{\sum_i w_i(\hat\beta_1)\}^{-1}\sum_i w_i(\hat\beta_1)(\hat X_i, \hat Y_i)$.
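The weights $w_i(\beta_1)$ and equation (4) can be computed directly. This sketch expands $(-\beta_1, 1)\Sigma_i(-\beta_1, 1)' = \beta_1^2 V(a_i) - 2\beta_1\mathrm{Cov}(a_i, b_i) + V(b_i)$ and accepts either one value per area or a single scalar for each variance component; the function names are hypothetical.

```python
import numpy as np


def gls_weights(beta1, sigma2_e, v_a, v_b, c_ab):
    """w_i(beta1) = {sigma2_e + beta1^2 V(a_i) - 2 beta1 Cov(a_i, b_i) + V(b_i)}^{-1}."""
    return 1.0 / (sigma2_e + beta1**2 * v_a - 2.0 * beta1 * c_ab + v_b)


def intercept_given_slope(beta1, x_hat, y_hat, sigma2_e, v_a, v_b, c_ab):
    """Equation (4): beta0_hat = ybar_w - beta1 * xbar_w, with weights at beta1."""
    # Broadcast so that scalar variance components still give one weight per area.
    w = np.broadcast_to(gls_weights(beta1, sigma2_e, v_a, v_b, c_ab), x_hat.shape)
    xbar_w = np.sum(w * x_hat) / np.sum(w)
    ybar_w = np.sum(w * y_hat) / np.sum(w)
    return ybar_w - beta1 * xbar_w


x_hat = np.array([1.0, 2.0, 3.0])
y_hat = np.array([3.0, 5.0, 7.0])      # exactly 1 + 2 * x_hat
print(intercept_given_slope(2.0, x_hat, y_hat, sigma2_e=0.0, v_a=1.0, v_b=1.0, c_ab=0.0))
```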

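Plugging (4) into (3), we only need to minimize
$$Q_1(\beta_1) = \sum_i w_i(\beta_1)\{\hat Y_i - \bar y_w - \beta_1(\hat X_i - \bar x_w)\}^2. \qquad (5)$$
Thus we need the solution to $\partial Q_1 / \partial \beta_1 = 0$, where
$$\begin{aligned} \frac{\partial Q_1}{\partial \beta_1} = {} & \sum_i \frac{\partial w_i(\beta_1)}{\partial \beta_1}\{\hat Y_i - \bar y_w - \beta_1(\hat X_i - \bar x_w)\}^2 \\ & - 2\sum_i w_i(\beta_1)(\hat X_i - \bar x_w)\{\hat Y_i - \bar y_w - \beta_1(\hat X_i - \bar x_w)\}. \end{aligned}$$
Using
$$\frac{\partial w_i(\beta_1)}{\partial \beta_1} = -2\{w_i(\beta_1)\}^2\{\beta_1 V(a_i) - \mathrm{Cov}(a_i, b_i)\}$$
and
$$\{\hat Y_i - \bar y_w - \beta_1(\hat X_i - \bar x_w)\}^2 \approx \sigma^2_{ei} + (-\beta_1, 1)\Sigma_i(-\beta_1, 1)' = 1/w_i(\beta_1),$$
the solution to $\partial Q_1 / \partial \beta_1 = 0$ satisfies
$$\hat\beta_1 = \frac{\sum_i w_i(\hat\beta_1)\{(\hat X_i - \bar x_w)(\hat Y_i - \bar y_w) - \mathrm{Cov}(a_i, b_i)\}}{\sum_i w_i(\hat\beta_1)\{(\hat X_i - \bar x_w)^2 - V(a_i)\}}. \qquad (6)$$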
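Because $w_i$ depends on $\hat\beta_1$, equation (6) is implicit. One simple way to solve it, assumed here rather than taken from the slides, is fixed-point iteration: evaluate the weights at the current slope, apply (6), and repeat; $\hat\beta_0$ then follows from (4). Convergence is not guaranteed in general. The subtraction of $\mathrm{Cov}(a_i, b_i)$ and $V(a_i)$ is what removes the attenuation that ordinary least squares on $(\hat X_i, \hat Y_i)$ would suffer; the simulated data (normal errors, hypothetical parameter values) illustrate this.

```python
import numpy as np


def estimate_coefficients(x_hat, y_hat, sigma2_e, v_a, v_b, c_ab,
                          beta1=1.0, tol=1e-10, max_iter=200):
    """Solve (6) for beta1 by fixed-point iteration, then beta0 from (4)."""
    def weights(b1):
        # w_i(b1), broadcast to one weight per area
        d = sigma2_e + b1**2 * v_a - 2.0 * b1 * c_ab + v_b
        return np.broadcast_to(1.0 / d, x_hat.shape)

    for _ in range(max_iter):
        w = weights(beta1)
        dx = x_hat - np.sum(w * x_hat) / np.sum(w)       # hat X_i - xbar_w
        dy = y_hat - np.sum(w * y_hat) / np.sum(w)       # hat Y_i - ybar_w
        new = np.sum(w * (dx * dy - c_ab)) / np.sum(w * (dx**2 - v_a))
        converged = abs(new - beta1) < tol
        beta1 = new
        if converged:
            break
    w = weights(beta1)
    beta0 = np.sum(w * y_hat) / np.sum(w) - beta1 * np.sum(w * x_hat) / np.sum(w)
    return beta0, beta1


# Synthetic check: true beta0 = 0.5, beta1 = 1.5, V(a_i) = V(b_i) = 0.05, sigma2_e = 0.01.
rng = np.random.default_rng(1)
H = 200_000
X = rng.uniform(0.0, 1.0, H)
x_hat = X + rng.normal(0.0, np.sqrt(0.05), H)
y_hat = 0.5 + 1.5 * X + rng.normal(0.0, 0.1, H) + rng.normal(0.0, np.sqrt(0.05), H)
beta0, beta1 = estimate_coefficients(x_hat, y_hat, sigma2_e=0.01, v_a=0.05, v_b=0.05, c_ab=0.0)
naive_slope = np.polyfit(x_hat, y_hat, 1)[0]   # OLS slope on the noisy hat X_i
```

Here $V(X_i) = 1/12$ and $V(\hat X_i) \approx 0.133$, so the naive slope is expected near $1.5 \times 0.083 / 0.133 \approx 0.94$, while (6) targets 1.5.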

Parameter estimation: estimation of $\sigma^2_{ei}$
Assume $\sigma^2_{ei} = \sigma^2_e$. An alternative assumption such as $\sigma^2_{ei} = X_i\sigma^2_e$ is possible, but it requires a parametric model assumption. In practice, one can instead consider a transformation $T(\cdot)$ such that the structural error model becomes
$$T(Y_i) = \beta_0 + \beta_1 T(X_i) + e_i, \qquad e_i \sim (0, \sigma^2_e).$$
Method-of-moments estimator of $\sigma^2_e$: solve
$$\sum_i \frac{(\hat Y_i - \hat\beta_0 - \hat\beta_1 \hat X_i)^2}{\sigma^2_e + (-\hat\beta_1, 1)\Sigma_i(-\hat\beta_1, 1)'} = H - 2, \qquad (7)$$
where $H$ is the total number of small areas.
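Equation (7) has a single unknown, and its left-hand side decreases as $\sigma^2_e$ grows, so a bracketing root finder is enough. The sketch below uses bisection and returns 0 when the left-hand side is already at most $H - 2$ at $\sigma^2_e = 0$; that truncation rule and the name `estimate_sigma2_e` are assumptions, not from the slides.

```python
import numpy as np


def estimate_sigma2_e(x_hat, y_hat, beta0, beta1, v_a, v_b, c_ab, n_bisect=200):
    """Method-of-moments estimator (7), solved for sigma2_e by bisection."""
    H = x_hat.size
    if H <= 2:
        raise ValueError("need more than two areas")
    r2 = (y_hat - beta0 - beta1 * x_hat) ** 2
    quad = beta1**2 * v_a - 2.0 * beta1 * c_ab + v_b     # (-b1, 1) Sigma_i (-b1, 1)'

    def excess(s2):
        # Left-hand side of (7) minus (H - 2); decreasing in s2.
        return np.sum(r2 / (s2 + quad)) - (H - 2)

    if excess(0.0) <= 0.0:
        return 0.0                          # truncate at zero (assumed convention)
    lo, hi = 0.0, 1.0
    while excess(hi) > 0.0:                 # widen the bracket until the sign changes
        hi *= 2.0
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x_hat = np.arange(5.0)
y_hat = x_hat + np.array([1.0, -1.0, 1.0, -1.0, 1.0])   # squared residuals all equal 1
print(estimate_sigma2_e(x_hat, y_hat, 0.0, 1.0, v_a=0.0, v_b=0.1, c_ab=0.0))
```

In the usage line, (7) reads $5/(\sigma^2_e + 0.1) = 3$, so the root is $5/3 - 0.1$.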

Parameter estimation (cont'd)
Iterative algorithm for parameter estimation:
1. Compute the initial estimator of $(\beta_0, \beta_1)$ by setting $\hat\sigma^2_e = 0$.
2. Using the current value of $(\hat\beta_0, \hat\beta_1)$, compute $\hat\sigma^2_e$ from (7).
3. Using the current value of $\hat\sigma^2_e$, compute the updated estimator of $(\beta_0, \beta_1)$ from (4) and (6).
4. Repeat steps 2 and 3 until convergence.
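Putting steps 1 to 4 together gives the self-contained sketch below (hypothetical name `fit_area_model`). It performs one update of (6) and (4) per pass, starts from an assumed initial slope of 1, and solves (7) by bisection with truncation at zero. These are illustrative choices, and convergence should be checked in real applications.

```python
import numpy as np


def fit_area_model(x_hat, y_hat, v_a, v_b, c_ab, beta1=1.0, tol=1e-10, max_iter=500):
    """Iterative estimation of (beta0, beta1, sigma2_e) following steps 1-4."""
    x_hat = np.asarray(x_hat, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    H = x_hat.size
    if H <= 2:
        raise ValueError("need more than two areas")

    def quad(b1):
        # (-b1, 1) Sigma_i (-b1, 1)', one value per area
        return np.broadcast_to(b1**2 * v_a - 2.0 * b1 * c_ab + v_b, x_hat.shape)

    def update_coefficients(b1, s2):
        # One pass of (6) with weights at the current b1, then (4) with the same weights.
        w = 1.0 / (s2 + quad(b1))
        xbar_w = np.sum(w * x_hat) / np.sum(w)
        ybar_w = np.sum(w * y_hat) / np.sum(w)
        dx, dy = x_hat - xbar_w, y_hat - ybar_w
        b1 = np.sum(w * (dx * dy - c_ab)) / np.sum(w * (dx**2 - v_a))
        return ybar_w - b1 * xbar_w, b1

    def update_sigma2(b0, b1):
        # Solve (7) by bisection; return 0 if no positive root exists.
        r2 = (y_hat - b0 - b1 * x_hat) ** 2
        q = quad(b1)

        def g(s2):
            return np.sum(r2 / (s2 + q)) - (H - 2)

        if g(0.0) <= 0.0:
            return 0.0
        lo, hi = 0.0, 1.0
        while g(hi) > 0.0:
            hi *= 2.0
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if g(mid) > 0.0 else (lo, mid)
        return 0.5 * (lo + hi)

    s2 = 0.0                                           # step 1: sigma2_e = 0
    b0, b1 = update_coefficients(beta1, s2)
    for _ in range(max_iter):
        s2 = update_sigma2(b0, b1)                     # step 2
        b0_new, b1_new = update_coefficients(b1, s2)   # step 3
        converged = abs(b0_new - b0) + abs(b1_new - b1) < tol
        b0, b1 = b0_new, b1_new
        if converged:                                  # step 4
            break
    return b0, b1, s2
```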

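MSE estimation
Recall the measurement error model structure
$$\hat Y_{i,b} = \beta_0 + \beta_1 X_i + e_i + b_i, \qquad \hat X_{i,a} = X_i + a_i.$$
The GLS estimator of $X_i$ is
$$\begin{aligned} \hat X_i &= \{(\beta_1, 1)V_i^{-1}(\beta_1, 1)'\}^{-1}(\beta_1, 1)V_i^{-1}(\hat Y_{i,b} - \beta_0, \hat X_{i,a})' \\ &= \alpha_i \hat X_{i,a} + (1 - \alpha_i)\beta_1^{-1}(\hat Y_{i,b} - \beta_0) \\ &= \alpha_i \hat X_{i,a} + (1 - \alpha_i)\hat X_{i,b}, \end{aligned}$$
where $V_i$ is the variance-covariance matrix of $(b_i + e_i, a_i)'$ and
$$\alpha_i = \frac{\sigma^2_{ei} + V(b_i) - \beta_1 \mathrm{Cov}(a_i, b_i)}{\sigma^2_{ei} + \beta_1^2 V(a_i) + V(b_i) - 2\beta_1 \mathrm{Cov}(a_i, b_i)}.$$
MSE of $\hat X_i$:
$$\begin{aligned} E\{(\hat X_i - X_i)^2\} &= E\left[\{\alpha_i(\hat X_{i,a} - X_i) + (1 - \alpha_i)(\hat X_{i,b} - X_i)\}^2\right] \\ &= \alpha_i^2 V(\hat X_{i,a}) + (1 - \alpha_i)^2 V(\hat X_{i,b}) + 2\alpha_i(1 - \alpha_i)\mathrm{Cov}(\hat X_{i,a}, \hat X_{i,b}) \\ &= \alpha_i V(\hat X_{i,a}) + (1 - \alpha_i)\mathrm{Cov}(\hat X_{i,a}, \hat X_{i,b}) := M_{1i}, \end{aligned}$$
where the variances and covariance are taken given $X_i$, and the last equality holds because $\alpha_i$ is the optimal (GLS) weight.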
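A small numerical check of that last simplification, with known parameters (the function name is hypothetical). Given $X_i$, $V(\hat X_{i,b}) = \beta_1^{-2}\{V(b_i) + \sigma^2_{ei}\}$ and $\mathrm{Cov}(\hat X_{i,a}, \hat X_{i,b}) = \beta_1^{-1}\mathrm{Cov}(a_i, b_i)$, as implied by the distribution of $(u_{1i}, u_{2i})$ on the prediction slide. The full quadratic form and the short form agree at the optimal $\alpha_i$, and $M_{1i}$ is no larger than either single-source variance.

```python
def mse_composite(beta1, sigma2_e, v_a, v_b, c_ab):
    """Return (alpha_i, full MSE expression, simplified M_1i) for known parameters."""
    v1 = v_a                                     # V(hat X_{i,a} | X_i)
    v2 = (v_b + sigma2_e) / beta1**2             # V(hat X_{i,b} | X_i)
    c12 = c_ab / beta1                           # Cov(hat X_{i,a}, hat X_{i,b} | X_i)
    alpha = (v2 - c12) / (v1 + v2 - 2.0 * c12)   # optimal (GLS) weight
    full = alpha**2 * v1 + (1.0 - alpha)**2 * v2 + 2.0 * alpha * (1.0 - alpha) * c12
    simplified = alpha * v1 + (1.0 - alpha) * c12
    return alpha, full, simplified


alpha, full, simplified = mse_composite(beta1=2.0, sigma2_e=0.5, v_a=1.0, v_b=2.0, c_ab=0.3)
```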

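MSE estimation (cont'd)
The actual prediction of $X_i$ is computed as $\hat X_{ei} = \hat X_i(\hat\theta)$, where $\theta = (\beta_0, \beta_1, \sigma^2_e)$. Then
$$\mathrm{MSE}(\hat X_{ei}) = \mathrm{MSE}(\hat X_i) + E\{(\hat X_{ei} - \hat X_i)^2\} = M_{1i} + M_{2i}.$$
Consider a jackknife approach:
$$\hat M_{1i} = \hat\alpha_i^{(JK)}\hat V(a_i) + (1 - \hat\alpha_i^{(JK)})\widehat{\mathrm{Cov}}(a_i, b_i),$$
$$\hat M_{2i} = \frac{H-1}{H}\sum_{k=1}^{H}(\hat X_{ei}^{(-k)} - \hat X_{ei})^2,$$
where
$$\hat\alpha_i^{(JK)} = \hat\alpha_i - \frac{H-1}{H}\sum_{k=1}^{H}(\hat\alpha_i^{(-k)} - \hat\alpha_i)$$
and the superscript $(-k)$ denotes a quantity computed with area $k$ deleted.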
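The jackknife formulas only need the full-sample quantities and the $H$ delete-one-area replicates. The sketch assumes those replicates have already been produced by refitting the model $H$ times (for example, rerunning the iterative parameter estimation with area $k$ removed) and stored as arrays indexed `[k, i]` for deleted area `k` and target area `i`; the array layout and function name are hypothetical. `cov_term` holds the estimated covariance term entering $\hat M_{1i}$.

```python
import numpy as np


def jackknife_mse(alpha_full, alpha_rep, pred_full, pred_rep, v_a, cov_term):
    """Jackknife estimates (hat M_1i, hat M_2i, their sum) for all areas.

    alpha_full, pred_full: shape (H,), full-sample hat alpha_i and hat X_ei.
    alpha_rep, pred_rep:   shape (H, H), row k = values with area k deleted.
    """
    H = alpha_full.size
    factor = (H - 1) / H
    # Bias-corrected weight: hat alpha_i - (H-1)/H * sum_k (hat alpha_i^(-k) - hat alpha_i)
    alpha_jk = alpha_full - factor * np.sum(alpha_rep - alpha_full, axis=0)
    m1 = alpha_jk * v_a + (1.0 - alpha_jk) * cov_term
    # Extra error from estimating theta: (H-1)/H * sum_k (hat X_ei^(-k) - hat X_ei)^2
    m2 = factor * np.sum((pred_rep - pred_full) ** 2, axis=0)
    return m1, m2, m1 + m2


m1, m2, mse = jackknife_mse(
    alpha_full=np.array([0.5, 0.5]),
    alpha_rep=np.array([[0.6, 0.4], [0.6, 0.4]]),
    pred_full=np.array([1.0, 2.0]),
    pred_rep=np.array([[1.1, 2.0], [0.9, 2.2]]),
    v_a=np.array([1.0, 1.0]),
    cov_term=np.array([0.0, 0.0]),
)
```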

Korean LFS application
The Labor Force Survey is a very important economic survey that measures unemployment rates. Several sources of information on unemployment are available in Korea:
1. Korean Labor Force Survey (KLF) data: 7K sample households (monthly)
2. Local Area Labor Force Survey (LALF) data: 200K sample households (quarterly)
3. Census long-form data (10% of the population)
The KLF sample is nested within the LALF sample.

Korean LFS application (cont'd)
The unemployment rate for each small area is the parameter of interest. Several sources of information on unemployment are available for analysis district area $i$:
$\hat X_i$: estimate from KLF data
$\hat Y_{1i}$: estimate from LALF data
$\hat Y_{2i}$: estimate from census data
Error sources:
KLF: sampling error, measurement error.
LALF: sampling error, measurement error.
Census data: sampling error, measurement error (no updated information).

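Korean LFS application (cont'd)
We can also include the census data. The measurement error model then becomes
$$\begin{pmatrix} \hat X_i \\ \hat Y_{1i} - \beta_0 \\ \hat Y_{2i} - \gamma_0 \end{pmatrix} = \begin{pmatrix} 1 \\ \beta_1 \\ \gamma_1 \end{pmatrix} X_i + \begin{pmatrix} a_i \\ b_i + e_{1i} \\ e_{2i} \end{pmatrix}.$$
The whole process is similar to the case of combining two surveys.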
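The three-source case is again a GLS problem, now with $Z = (1, \beta_1, \gamma_1)'$. A sketch with known parameters follows, where `Sigma` is an assumed $3 \times 3$ covariance matrix of the error vector $(a_i, b_i + e_{1i}, e_{2i})'$ supplied by the user; the function name is hypothetical. With known parameters, the GLS variance $(Z'\Sigma^{-1}Z)^{-1}$ plays the role of $M_{1i}$.

```python
import numpy as np


def combine_three_sources(x_hat, y1_hat, y2_hat, beta0, beta1, gamma0, gamma1, Sigma):
    """GLS prediction of X_i from KLF, LALF and census estimates, known parameters.

    Sigma: assumed 3x3 covariance matrix of (a_i, b_i + e_1i, e_2i)'.
    Returns the prediction and its variance (Z' Sigma^{-1} Z)^{-1}.
    """
    m = np.array([x_hat, y1_hat - beta0, y2_hat - gamma0])   # left-hand side
    z = np.array([1.0, beta1, gamma1])                       # loadings on X_i
    Sinv_z = np.linalg.solve(Sigma, z)
    info = z @ Sinv_z                                        # Z' Sigma^{-1} Z
    return (Sinv_z @ m) / info, 1.0 / info


pred, var = combine_three_sources(1.0, 2.0, 1.0, beta0=0.0, beta1=2.0,
                                  gamma0=0.0, gamma1=1.0, Sigma=np.eye(3))
```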

Figure: Unemployment rates from the KLF and LALF surveys, urban areas

Figure: Residuals plotted against estimated values, urban areas

Korean LFS application: data analysis results
We compare the MSE of four estimators:
KLF: KLF only
LALF: LALF only
GLS 1: combines KLF and LALF
GLS 2: combines KLF, LALF, and census data

| Estimator | 1st Q | Median | Mean | 3rd Q |
|---|---|---|---|---|
| KLF | 0.0000630 | 0.0001210 | 0.0002476 | 0.0002395 |
| LALF | 0.0001123 | 0.0001330 | 0.0001482 | 0.0001695 |
| GLS 1 | 0.0000444 | 0.0000738 | 0.0000893 | 0.0001210 |
| GLS 2 | 0.0000405 | 0.0000543 | 0.0000575 | 0.0000721 |
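To read the table, the MSE of each estimator can be expressed relative to the KLF-only estimator. The short script below only rescales the median and mean columns reported above and adds no new results.

```python
# Median and mean MSE columns copied from the table above.
mse = {
    "KLF":   {"median": 0.0001210, "mean": 0.0002476},
    "LALF":  {"median": 0.0001330, "mean": 0.0001482},
    "GLS 1": {"median": 0.0000738, "mean": 0.0000893},
    "GLS 2": {"median": 0.0000543, "mean": 0.0000575},
}

# Ratio of each estimator's MSE to the KLF-only MSE (values below 1 are improvements).
relative = {
    name: {stat: value / mse["KLF"][stat] for stat, value in row.items()}
    for name, row in mse.items()
}

for name, row in relative.items():
    print(f"{name:6s} median {row['median']:.3f}  mean {row['mean']:.3f}")
```

For example, GLS 2 has roughly 0.45 times the median MSE of the KLF-only estimator.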

Discussion
Model specification was very difficult. We built separate models for urban and rural areas, which are classified by the proportion of households engaged in agricultural business. In the KLF survey, 25% of all areas have a zero unemployment rate because of the very small sample size in each area; these areas are excluded when the parameters are estimated. We considered a structural model with zero intercept,
$$Y_{1i} = \beta_1 X_i + e_i.$$
A mixture model or a zero-inflated regression model could also be considered.

Summary
Motivated by real data from the Korean Labor Force Survey, we studied small area estimation using a GLS prediction approach under an area-level model, with a measurement error model for parameter estimation. Instead of the GLS approach, a maximum likelihood approach is also possible under parametric model assumptions.

Reference
Kim, J.K., Park, S. and Kim, S. (2015). Small area estimation combining information from several sources. Survey Methodology, in press.