Introduction to Survey Data Integration

Size: px
Start display at page:

Download "Introduction to Survey Data Integration"

Transcription

1 Introduction to Survey Data Integration Jae-Kwang Kim Iowa State University May 20, 2014

2 Outline 1 Introduction 2 Survey Integration Examples 3 Basic Theory for Survey Integration 4 NASS application 5 Discussion Kim (ISU) Matching May 20, / 29

3 1. Introduction Question: What do you mean by integration in surveys? integrating survey and administrative data through unit record linkage improving coherence across data collections by using standard classifications and questions rationalizing content between surveys, including census greater standardization of methods, tools, IT systems and processes combining separate sample surveys into one survey vehicle. Reference: Bycroft, C. (2010). Integrated household surveys: a survey vehicle approach. Wellington: Statistics New Zealand. Kim (ISU) Matching May 20, / 29

4 1. Introduction Our focus is the last part of the several meaning of survey integration. combining separate sample surveys into one survey vehicle Most sample surveys are run as stand-alone surveys. However, survey integration is a new area of research to achieve the three conflicting goals: 1 Minimize the cost associated with surveys 2 Maximize the information (or efficiency of the survey estimation) 3 Minimize the respondent burden Kim (ISU) Matching May 20, / 29

5 1. Introduction Two approaches for survey integration Macro approach: Obtain improve estimates by combining all available information General Least Squares (GLS) estimation, GLS weighting Micro approach: Create a single data (single survey vehicle) that contains all available information Synthetic data imputation Record Linkage Statistical Matching, or data fusion Kim (ISU) Matching May 20, / 29

6 2. Survey Integration Examples How to combine several source of information? Example 1: US Census of housing and population Short Form: 100 % sample (obtain basic demographic information) Long form: about 16% sample (obtain other social and economic information as well as demographic information) Classical two-phase sampling problem: Calibration weighting for demographic variable to match known population counts from short form (Deming & Stephan, 1940). Kim (ISU) Matching May 20, / 29

7 2. Survey Integration Examples Example 2: Consumer Expenditure Survey (Zieschang, 1990) Diary survey: Observe X, Y Interview survey (quarterly): Observe X Two surveys are obtained independently from the same target population. (uses the same sampling frame.) Two estimates of X, ˆX 1 and ˆX 2, can be different because of the sampling errors. How to incorporate the information from the quarterly interview survey to diary survey estimate? Kim (ISU) Matching May 20, / 29

8 2. Survey Integration Examples GLS Weighting Two parameters with three estimates: 1 Survey one: Observe ˆX 1 2 Survey two: Observe ˆX 2 and Ŷ2 GLS model ˆX 1 ˆX 2 Ŷ 2 = ( X Y ) + e 1 e 2 e 3 (1) where V e 1 e 2 e 3 = V xx V xx2 V xy2 0 V xy2 V yy2 Kim (ISU) Matching May 20, / 29

9 2. Survey Integration Examples GLS Weighting (Continued) GLS model: We may write Y = Zθ + e where e (0, V ). GLS estimator of θ: ˆθ GLS = ( Z V 1 Z ) 1 Z V 1 Y Note that the GLS estimator is a linear combination of the three observed information: In fact, we have ˆθ GLS = α 1 ˆX1 + α 2 ˆX2 + α 3 Ŷ 2 and ˆX GLS = V xx2 V xx1 ˆX 1 + ˆX 2 V xx1 + V xx2 V xx1 + V xx2 ( ) Vxy2 Ŷ GLS = Ŷ 2 + ˆX GLS ˆX 2. V xx2 Kim (ISU) Matching May 20, / 29

10 2. Survey Integration Examples Example 3: SAIPE (Fay & Herriot, 1979), combines small area level information Current Population Survey (obtain Ŷi, estimated poverty rate for area i) Census long form data (x 1i : poverty rate for the census year for area i) Food stamp participation rate (x 2i ) Pseudo poverty rate for children from tax return information & tax nonfiler rate (x 3i ) Fay-Herriot model Ŷ i = Y i + e i = (x iβ + u i ) + e i x i = (1, x 1i, x 2i, x 3i ) : predictors, e i : sampling error, u i : area i random effect (=model error). Kim (ISU) Matching May 20, / 29

11 3. Basic Theory General Setup Survey A: Measures Ŷ i,a, subject to sampling error. Survey B: Measures ˆX i,b, subject to sampling error. E A (Ŷi,a) E B ( ˆX i,b ) due to the structural difference between the surveys Structural difference (or systematic difference) due to different mode of survey due to time difference due to frame difference Example: Objective Yield Survey vs Agricultural Yield Survey and December Agricultural Surveys Kim (ISU) Matching May 20, / 29

12 3. Basic Theory Two error models (for area i) Sampling error model Ŷ i,a = Y i + a i ˆX i,b = X i + b i where (a i, b i ) represents the sampling error such that ( ) [( ) ( )] ai 0 Vyy,i V, yx,i b i 0 V xy,i V xx,i Structural error modeling X i = β 0 N i + β 1 Y i + e i, e i (0, N i σ 2 e ) where N i is the (known) population size of area i. Kim (ISU) Matching May 20, / 29

13 3. Basic Theory Structural error model describes the relationship between the two survey measurement up to sampling error. Y : target measurement item (variable of primary interest) X : inaccurate measurement of Y with possible systematic bias. If both X and Y measure the same item (with different survey modes), structural error model is essentially a measurement error model. (β 0 = 0, β 1 = 1 means no measurement bias.) Kim (ISU) Matching May 20, / 29

14 3. Basic Theory If the parameters in the structural error model are known, Ŷ i,b β1 1 ( ˆX i,b β 0 N i ) is also an unbiased estimator of Y i, computed from survey B. Estimator Ŷi,b using consistent ( ˆβ 0, ˆβ 1 ) is often called synthetic estimator. How to combine the two estimators? 1 Best Unbiased Prediction approach 2 GLS (or GMM) approach Kim (ISU) Matching May 20, / 29

15 3. Basic Theory Best Unbiased Prediction approach: 1 Specify a model for sampling error: ˆf ( ˆX i, Ŷi X i, Y i ) 2 Specify a parametric model for structural error: g (Y i X i ; θ) h(x i ; α) 3 Use Bayes rule to combine the two models and obtain the posterior distribution f (Y i ˆX i, Ŷ i ; θ, α) ˆf ( ˆX i, Ŷ i X i, Y i )g(y i X i ; θ)h(x i ; α)dx i. 4 Compute conditional expectation ( E Y i ˆX ) i, Ŷi; θ, α = Y i f (Y i ˆX i, Ŷi; θ, α)dy i Kim (ISU) Matching May 20, / 29

16 3. Basic Theory Best Unbiased Prediction approach: Best prediction under correct model specification. Requires strong model assumptions. Computation can be heavy (MCMC) Depends on unknown superpopulation parameter 1 Empirical Bayes 2 Hierarchical Bayes Kim (ISU) Matching May 20, / 29

17 3. Basic Theory Alternative approach: GLS (Generalized least squares) approach Does not require distributional assumptions. Only require moment assumptions (second moment assumptions) Combine information through GLS, not through Bayes Recall : GLS method y = Zβ + e, e (0, V ) ˆβ GLS = ( Z V 1 Z ) 1 Z V 1 y Kim (ISU) Matching May 20, / 29

18 3. Basic Theory GLS approach to combine two error models: y = Zθ + e, e (0, V ) ( Ŷi,a β 1 1 ( ˆX i,b N i β 0 ) ) = ( 1 1 ) ( u1i Y i + u 2i where u 1i = a i and u 2i = β1 1 (b i + e i ). Thus, ( ) [( ) ( u1i 0 Vyy,i β, 1 1 V xy,i u 2i 0 β1 1 V xy,i β1 2 (V xx,i + N i σe) 2 ) )]. Kim (ISU) Matching May 20, / 29

19 3. Basic Theory GLS estimator : Best linear unbiased estimator of Y i based on the linear combination of Ŷ i,a and Ŷ i,b = β 1 1 ( ˆX i,b N i β 0 ). Under the current setup, where Ŷ i = ω i Ŷ i,a + (1 ω i )Ŷ i,b N i σe 2 + β1 2 ω i = V xx,i β 1 V xy,i N i σe 2 + β1 2V xx,i + V yy,i 2β 1 V xy,i The GLS estimator is sometimes called composite estimator. In practice, we need to use ˆβ 0, ˆβ 1, and ˆσ 2 e. Kim (ISU) Matching May 20, / 29

20 3. Basic Theory Parameter estimation for the structural model 1 Case 1: Matching measurement X and measurement Y is possible (e.g.: Survey A sample is a subset of survey B sample.) 2 Case 2: Matching is not possible. In Case 1, we can easily obtain a consistent estimator of the model parameters from the set where units have both X and Y observed. (Unit level modeling) In Case 2, we may use area level model to match ˆX i (from survey B) and Ŷi (from survey A) for each area. Kim (ISU) Matching May 20, / 29

21 3. Basic Theory To estimate (β 0, β 1 ), we express ˆX i,b = N i β 0 + Y i β 1 + e i + b i Ŷ i,a = Y i + a i Direct regression of ˆX i,b on (N i, Ŷ i,a ) does not work because Ŷ i,a has sampling error. The area-level model takes the form of measurement error model (Fuller, 1987). Parameter estimation can be performed using the measurement error model estimation methods. (Details skipped.) Kim (ISU) Matching May 20, / 29

22 4. NASS application Improving Crop Acreage Estimation National Agricultural Statistical Service (NASS) under US Department of Agriculture provides acreage estimates for crops. Several sources of information for crop acreage June Area Survey (JAS) data Satellite imagery classification data. Cropland Data Layer (CDL) Farm Service Agency (FSA) data Kim (ISU) Matching May 20, / 29

23 4. NASS application Improving Crop Acreage Estimation Several sources of information for crop acreage (for area i: analysis district) Ŷ i : estimates from JAS (subject to sampling error) ˆX 1i : estimates from CDL ˆX 2i : FSA estimates Even though satellite image data is not subject to sampling error, the CDL estimates are subject to sampling error due to classification from JAS data. FSA estimates can be modified to reduce the coverage error by modeling propensity scores for program participation. Thus, FSA estimates are subject to some type of sampling error where the sampling mechanism here refers to the program participation. Kim (ISU) Matching May 20, / 29

24 Figure : The 2009 cropland data layer products. The legend identifies aggregated agricultural and non-agricultural land cover categories by decreasing acreage. Kim (ISU) Matching May 20, / 29

25 4. NASS application Improving Crop Acreage Estimation We can construct separate structural error models X 1i = β 10 N i + β 11 Y i + e 1i X 2i = β 20 N i + β 21 Y i + e 2i where ( e1i e 2i ) [( 0 0 ) ( Ni σ, e N i σe2 2 )] Kim (ISU) Matching May 20, / 29

26 4. NASS application Improving Crop Acreage Estimation GLS method can be applied to separate structural error models to get Ŷ i Ŷ i,1 Ŷ i,2 = Y i + where Ŷ i,1 = ˆβ 11 1 ( ˆX 1i ˆβ 10 N i ) and Ŷ i,2 = ˆβ 21 1 ( ˆX 2i ˆβ 20 N i ) are two different synthetic estimators of Y i. u 1i u 2i u 3i We may use a simpler covariance matrix to achieve computational simplicity., Kim (ISU) Matching May 20, / 29

27 4. NASS application Some Results Figure : Comparison of the estimates in Indiana State Kim (ISU) Matching May 20, / 29

28 4. NASS application Some Results Table : CV for Corn Acreage (%) JAS GLS JAS GLS CA IN NE PN Kim (ISU) Matching May 20, / 29

29 5. Discussion Two error models: sampling error model and structural error model. GLS method provides a useful tool for combining two models. Does not rely on parametric distributional assumptions. Requires correct specification of the variance-covariance matrix for optimal estimation. Simpler form of covariance matrix can be used for computational simplicity over statistical efficiency. Kim (ISU) Matching May 20, / 29

30 REFERENCES Deming, W. E. & Stephan, F. F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Annals of Mathematical Statistics 11, Fay, R. & Herriot, R. (1979). Estimates of income for small places: an application of James-Stein procedures to census data. J. Am. Statist. Assoc. 74, Fuller, W. (1987). Measurement Error Models. Hoboken, New Jersey: John Wiley & Sons, Inc. Zieschang, K. (1990). Sample weighting method and estimation of totals in the consumer expenditure survey. J. Am. Statist. Assoc. 85, Kim (ISU) Matching May 20, / 29

A measurement error model approach to small area estimation

A measurement error model approach to small area estimation A measurement error model approach to small area estimation Jae-kwang Kim 1 Spring, 2015 1 Joint work with Seunghwan Park and Seoyoung Kim Ouline Introduction Basic Theory Application to Korean LFS Discussion

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

Combining Non-probability and Probability Survey Samples Through Mass Imputation

Combining Non-probability and Probability Survey Samples Through Mass Imputation Combining Non-probability and Probability Survey Samples Through Mass Imputation Jae-Kwang Kim 1 Iowa State University & KAIST October 27, 2018 1 Joint work with Seho Park, Yilin Chen, and Changbao Wu

More information

Small Area Modeling of County Estimates for Corn and Soybean Yields in the US

Small Area Modeling of County Estimates for Corn and Soybean Yields in the US Small Area Modeling of County Estimates for Corn and Soybean Yields in the US Matt Williams National Agricultural Statistics Service United States Department of Agriculture Matt.Williams@nass.usda.gov

More information

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70

Chapter 5: Models used in conjunction with sampling. J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Chapter 5: Models used in conjunction with sampling J. Kim, W. Fuller (ISU) Chapter 5: Models used in conjunction with sampling 1 / 70 Nonresponse Unit Nonresponse: weight adjustment Item Nonresponse:

More information

Chapter 8: Estimation 1

Chapter 8: Estimation 1 Chapter 8: Estimation 1 Jae-Kwang Kim Iowa State University Fall, 2014 Kim (ISU) Ch. 8: Estimation 1 Fall, 2014 1 / 33 Introduction 1 Introduction 2 Ratio estimation 3 Regression estimator Kim (ISU) Ch.

More information

Combining data from two independent surveys: model-assisted approach

Combining data from two independent surveys: model-assisted approach Combining data from two independent surveys: model-assisted approach Jae Kwang Kim 1 Iowa State University January 20, 2012 1 Joint work with J.N.K. Rao, Carleton University Reference Kim, J.K. and Rao,

More information

Fitting a Bayesian Fay-Herriot Model

Fitting a Bayesian Fay-Herriot Model Fitting a Bayesian Fay-Herriot Model Nathan B. Cruze United States Department of Agriculture National Agricultural Statistics Service (NASS) Research and Development Division Washington, DC October 25,

More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

Recent Advances in the analysis of missing data with non-ignorable missingness

Recent Advances in the analysis of missing data with non-ignorable missingness Recent Advances in the analysis of missing data with non-ignorable missingness Jae-Kwang Kim Department of Statistics, Iowa State University July 4th, 2014 1 Introduction 2 Full likelihood-based ML estimation

More information

Eric V. Slud, Census Bureau & Univ. of Maryland Mathematics Department, University of Maryland, College Park MD 20742

Eric V. Slud, Census Bureau & Univ. of Maryland Mathematics Department, University of Maryland, College Park MD 20742 COMPARISON OF AGGREGATE VERSUS UNIT-LEVEL MODELS FOR SMALL-AREA ESTIMATION Eric V. Slud, Census Bureau & Univ. of Maryland Mathematics Department, University of Maryland, College Park MD 20742 Key words:

More information

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING

INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Statistica Sinica 24 (2014), 1001-1015 doi:http://dx.doi.org/10.5705/ss.2013.038 INSTRUMENTAL-VARIABLE CALIBRATION ESTIMATION IN SURVEY SAMPLING Seunghwan Park and Jae Kwang Kim Seoul National Univeristy

More information

VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA

VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA Submitted to the Annals of Applied Statistics VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA By Jae Kwang Kim, Wayne A. Fuller and William R. Bell Iowa State University

More information

Small Area Confidence Bounds on Small Cell Proportions in Survey Populations

Small Area Confidence Bounds on Small Cell Proportions in Survey Populations Small Area Confidence Bounds on Small Cell Proportions in Survey Populations Aaron Gilary, Jerry Maples, U.S. Census Bureau U.S. Census Bureau Eric V. Slud, U.S. Census Bureau Univ. Maryland College Park

More information

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data

An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)

More information

A Hierarchical Clustering Algorithm for Multivariate Stratification in Stratified Sampling

A Hierarchical Clustering Algorithm for Multivariate Stratification in Stratified Sampling A Hierarchical Clustering Algorithm for Multivariate Stratification in Stratified Sampling Stephanie Zimmer Jae-kwang Kim Sarah Nusser Abstract Stratification is used in sampling to create homogeneous

More information

Small Area Estimates of Poverty Incidence in the State of Uttar Pradesh in India

Small Area Estimates of Poverty Incidence in the State of Uttar Pradesh in India Small Area Estimates of Poverty Incidence in the State of Uttar Pradesh in India Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi Email: hchandra@iasri.res.in Acknowledgments

More information

Advanced Methods for Agricultural and Agroenvironmental. Emily Berg, Zhengyuan Zhu, Sarah Nusser, and Wayne Fuller

Advanced Methods for Agricultural and Agroenvironmental. Emily Berg, Zhengyuan Zhu, Sarah Nusser, and Wayne Fuller Advanced Methods for Agricultural and Agroenvironmental Monitoring Emily Berg, Zhengyuan Zhu, Sarah Nusser, and Wayne Fuller Outline 1. Introduction to the National Resources Inventory 2. Hierarchical

More information

6. Fractional Imputation in Survey Sampling

6. Fractional Imputation in Survey Sampling 6. Fractional Imputation in Survey Sampling 1 Introduction Consider a finite population of N units identified by a set of indices U = {1, 2,, N} with N known. Associated with each unit i in the population

More information

Robust Hierarchical Bayes Small Area Estimation for Nested Error Regression Model

Robust Hierarchical Bayes Small Area Estimation for Nested Error Regression Model Robust Hierarchical Bayes Small Area Estimation for Nested Error Regression Model Adrijo Chakraborty, Gauri Sankar Datta,3 and Abhyuday Mandal NORC at the University of Chicago, Bethesda, MD 084, USA Department

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Nonresponse weighting adjustment using estimated response probability

Nonresponse weighting adjustment using estimated response probability Nonresponse weighting adjustment using estimated response probability Jae-kwang Kim Yonsei University, Seoul, Korea December 26, 2006 Introduction Nonresponse Unit nonresponse Item nonresponse Basic strategy

More information

Parametric fractional imputation for missing data analysis

Parametric fractional imputation for missing data analysis 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 Biometrika (????),??,?, pp. 1 15 C???? Biometrika Trust Printed in

More information

Small Area Estimation with Uncertain Random Effects

Small Area Estimation with Uncertain Random Effects Small Area Estimation with Uncertain Random Effects Gauri Sankar Datta 1 2 3 and Abhyuday Mandal 4 Abstract Random effects models play an important role in model-based small area estimation. Random effects

More information

Chapter 4: Imputation

Chapter 4: Imputation Chapter 4: Imputation Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Basic Theory for imputation 3 Variance estimation after imputation 4 Replication variance estimation

More information

Selection of small area estimation method for Poverty Mapping: A Conceptual Framework

Selection of small area estimation method for Poverty Mapping: A Conceptual Framework Selection of small area estimation method for Poverty Mapping: A Conceptual Framework Sumonkanti Das National Institute for Applied Statistics Research Australia University of Wollongong The First Asian

More information

Imputation for Missing Data under PPSWR Sampling

Imputation for Missing Data under PPSWR Sampling July 5, 2010 Beijing Imputation for Missing Data under PPSWR Sampling Guohua Zou Academy of Mathematics and Systems Science Chinese Academy of Sciences 1 23 () Outline () Imputation method under PPSWR

More information

Data Integration for Big Data Analysis for finite population inference

Data Integration for Big Data Analysis for finite population inference for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation

More information

Small domain estimation using probability and non-probability survey data

Small domain estimation using probability and non-probability survey data Small domain estimation using probability and non-probability survey data Adrijo Chakraborty and N Ganesh March 2018, FCSM Outline Introduction to the Problem Approaches for Combining Probability and Non-Probability

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

The Simple Regression Model. Part II. The Simple Regression Model

The Simple Regression Model. Part II. The Simple Regression Model Part II The Simple Regression Model As of Sep 22, 2015 Definition 1 The Simple Regression Model Definition Estimation of the model, OLS OLS Statistics Algebraic properties Goodness-of-Fit, the R-square

More information

Advanced Econometrics

Advanced Econometrics Based on the textbook by Verbeek: A Guide to Modern Econometrics Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna May 16, 2013 Outline Univariate

More information

Small area solutions for the analysis of pollution data TIES Daniela Cocchi, Enrico Fabrizi, Carlo Trivisano. Università di Bologna

Small area solutions for the analysis of pollution data TIES Daniela Cocchi, Enrico Fabrizi, Carlo Trivisano. Università di Bologna Small area solutions for the analysis of pollution data Daniela Cocchi, Enrico Fabrizi, Carlo Trivisano Università di Bologna TIES 00 Genova, June nd 00 Summary The small area problem in the environmental

More information

Small Domains Estimation and Poverty Indicators. Carleton University, Ottawa, Canada

Small Domains Estimation and Poverty Indicators. Carleton University, Ottawa, Canada Small Domains Estimation and Poverty Indicators J. N. K. Rao Carleton University, Ottawa, Canada Invited paper for presentation at the International Seminar Population Estimates and Projections: Methodologies,

More information

Graybill Conference Poster Session Introductions

Graybill Conference Poster Session Introductions Graybill Conference Poster Session Introductions 2013 Graybill Conference in Modern Survey Statistics Colorado State University Fort Collins, CO June 10, 2013 Small Area Estimation with Incomplete Auxiliary

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Multivariate area level models for small area estimation. a

Multivariate area level models for small area estimation. a Multivariate area level models for small area estimation. a a In collaboration with Roberto Benavent Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de Elche Multivariate area level

More information

Estimation of Mean Population in Small Area with Spatial Best Linear Unbiased Prediction Method

Estimation of Mean Population in Small Area with Spatial Best Linear Unbiased Prediction Method Journal of Physics: Conference Series PAPER OPEN ACCESS Estimation of Mean Population in Small Area with Spatial Best Linear Unbiased Prediction Method To cite this article: Syahril Ramadhan et al 2017

More information

Nonlinear Temperature Effects of Dust Bowl Region and Short-Run Adaptation

Nonlinear Temperature Effects of Dust Bowl Region and Short-Run Adaptation Nonlinear Temperature Effects of Dust Bowl Region and Short-Run Adaptation A. John Woodill University of Hawaii at Manoa Seminar in Energy and Environmental Policy March 28, 2016 A. John Woodill Nonlinear

More information

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Amang S. Sukasih, Mathematica Policy Research, Inc. Donsig Jang, Mathematica Policy Research, Inc. Amang S. Sukasih,

More information

Bootstrap inference for the finite population total under complex sampling designs

Bootstrap inference for the finite population total under complex sampling designs Bootstrap inference for the finite population total under complex sampling designs Zhonglei Wang (Joint work with Dr. Jae Kwang Kim) Center for Survey Statistics and Methodology Iowa State University Jan.

More information

New Developments in Nonresponse Adjustment Methods

New Developments in Nonresponse Adjustment Methods New Developments in Nonresponse Adjustment Methods Fannie Cobben January 23, 2009 1 Introduction In this paper, we describe two relatively new techniques to adjust for (unit) nonresponse bias: The sample

More information

Determining Changes in Welfare Distributions at the Micro-level: Updating Poverty Maps By Chris Elbers, Jean O. Lanjouw, and Peter Lanjouw 1

Determining Changes in Welfare Distributions at the Micro-level: Updating Poverty Maps By Chris Elbers, Jean O. Lanjouw, and Peter Lanjouw 1 Determining Changes in Welfare Distributions at the Micro-level: Updating Poverty Maps By Chris Elbers, Jean O. Lanjouw, and Peter Lanjouw 1 Income and wealth distributions have a prominent position in

More information

Making sense of Econometrics: Basics

Making sense of Econometrics: Basics Making sense of Econometrics: Basics Lecture 2: Simple Regression Egypt Scholars Economic Society Happy Eid Eid present! enter classroom at http://b.socrative.com/login/student/ room name c28efb78 Outline

More information

Likelihood-based inference with missing data under missing-at-random

Likelihood-based inference with missing data under missing-at-random Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric

More information

A Small Area Procedure for Estimating Population Counts

A Small Area Procedure for Estimating Population Counts Graduate heses and Dissertations Iowa State University Capstones, heses and Dissertations 2010 A Small Area Procedure for Estimating Population Counts Emily J. erg Iowa State University Follow this and

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES

REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Statistica Sinica 8(1998), 1153-1164 REPLICATION VARIANCE ESTIMATION FOR TWO-PHASE SAMPLES Wayne A. Fuller Iowa State University Abstract: The estimation of the variance of the regression estimator for

More information

Statistical perspectives on spatial social science

Statistical perspectives on spatial social science Statistical perspectives on spatial social science Discussion Sarah Nusser (nusser@iastate.edu) Center for Survey Statistics and Methodology Department of Statistics Iowa State University Morris Hansen

More information

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes

Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

Chapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28

Chapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Chapter 4 Replication Variance Estimation J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Jackknife Variance Estimation Create a new sample by deleting one observation n 1 n n ( x (k) x) 2 = x (k) = n

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

Statistical Methods for Handling Missing Data

Statistical Methods for Handling Missing Data Statistical Methods for Handling Missing Data Jae-Kwang Kim Department of Statistics, Iowa State University July 5th, 2014 Outline Textbook : Statistical Methods for handling incomplete data by Kim and

More information

g-priors for Linear Regression

g-priors for Linear Regression Stat60: Bayesian Modeling and Inference Lecture Date: March 15, 010 g-priors for Linear Regression Lecturer: Michael I. Jordan Scribe: Andrew H. Chan 1 Linear regression and g-priors In the last lecture,

More information

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D. Ruppert A. EMPIRICAL ESTIMATE OF THE KERNEL MIXTURE Here we

More information

Estimation of Complex Small Area Parameters with Application to Poverty Indicators

Estimation of Complex Small Area Parameters with Application to Poverty Indicators 1 Estimation of Complex Small Area Parameters with Application to Poverty Indicators J.N.K. Rao School of Mathematics and Statistics, Carleton University (Joint work with Isabel Molina from Universidad

More information

Non-Parametric Bootstrap Mean. Squared Error Estimation For M- Quantile Estimators Of Small Area. Averages, Quantiles And Poverty

Non-Parametric Bootstrap Mean. Squared Error Estimation For M- Quantile Estimators Of Small Area. Averages, Quantiles And Poverty Working Paper M11/02 Methodology Non-Parametric Bootstrap Mean Squared Error Estimation For M- Quantile Estimators Of Small Area Averages, Quantiles And Poverty Indicators Stefano Marchetti, Nikos Tzavidis,

More information

A BAYESIAN APPROACH TO JOINT SMALL AREA ESTIMATION

A BAYESIAN APPROACH TO JOINT SMALL AREA ESTIMATION A BAYESIAN APPROACH TO JOINT SMALL AREA ESTIMATION A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Yanping Qu IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

More information

Statistics and Econometrics I

Statistics and Econometrics I Statistics and Econometrics I Point Estimation Shiu-Sheng Chen Department of Economics National Taiwan University September 13, 2016 Shiu-Sheng Chen (NTU Econ) Statistics and Econometrics I September 13,

More information

Business Statistics: A First Course

Business Statistics: A First Course Business Statistics: A First Course 5 th Edition Chapter 7 Sampling and Sampling Distributions Basic Business Statistics, 11e 2009 Prentice-Hall, Inc. Chap 7-1 Learning Objectives In this chapter, you

More information

21. Best Linear Unbiased Prediction (BLUP) of Random Effects in the Normal Linear Mixed Effects Model

21. Best Linear Unbiased Prediction (BLUP) of Random Effects in the Normal Linear Mixed Effects Model 21. Best Linear Unbiased Prediction (BLUP) of Random Effects in the Normal Linear Mixed Effects Model Copyright c 2018 (Iowa State University) 21. Statistics 510 1 / 26 C. R. Henderson Born April 1, 1911,

More information

Contextual Effects in Modeling for Small Domains

Contextual Effects in Modeling for Small Domains University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Engineering and Information Sciences 2011 Contextual Effects in

More information

36-720: The Rasch Model

36-720: The Rasch Model 36-720: The Rasch Model Brian Junker October 15, 2007 Multivariate Binary Response Data Rasch Model Rasch Marginal Likelihood as a GLMM Rasch Marginal Likelihood as a Log-Linear Model Example For more

More information

Combining multiple observational data sources to estimate causal eects

Combining multiple observational data sources to estimate causal eects Department of Statistics, North Carolina State University Combining multiple observational data sources to estimate causal eects Shu Yang* syang24@ncsuedu Joint work with Peng Ding UC Berkeley May 23,

More information

Chapter 3: Maximum Likelihood Theory

Chapter 3: Maximum Likelihood Theory Chapter 3: Maximum Likelihood Theory Florian Pelgrin HEC September-December, 2010 Florian Pelgrin (HEC) Maximum Likelihood Theory September-December, 2010 1 / 40 1 Introduction Example 2 Maximum likelihood

More information

Multiple Regression Analysis. Part III. Multiple Regression Analysis

Multiple Regression Analysis. Part III. Multiple Regression Analysis Part III Multiple Regression Analysis As of Sep 26, 2017 1 Multiple Regression Analysis Estimation Matrix form Goodness-of-Fit R-square Adjusted R-square Expected values of the OLS estimators Irrelevant

More information

LFS quarterly small area estimation of youth unemployment at provincial level

LFS quarterly small area estimation of youth unemployment at provincial level LFS quarterly small area estimation of youth unemployment at provincial level Stima trimestrale della disoccupazione giovanile a livello provinciale su dati RFL Michele D Aló, Stefano Falorsi, Silvia Loriga

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

The Use of Survey Weights in Regression Modelling

The Use of Survey Weights in Regression Modelling The Use of Survey Weights in Regression Modelling Chris Skinner London School of Economics and Political Science (with Jae-Kwang Kim, Iowa State University) Colorado State University, June 2013 1 Weighting

More information

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design

Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design 1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary

More information

Mixed Effects Prediction under Benchmarking and Applications to Small Area Estimation

Mixed Effects Prediction under Benchmarking and Applications to Small Area Estimation CIRJE-F-832 Mixed Effects Prediction under Benchmarking and Applications to Small Area Estimation Tatsuya Kubokawa University of Tokyo January 2012 CIRJE Discussion Papers can be downloaded without charge

More information

Calibration estimation using exponential tilting in sample surveys

Calibration estimation using exponential tilting in sample surveys Calibration estimation using exponential tilting in sample surveys Jae Kwang Kim February 23, 2010 Abstract We consider the problem of parameter estimation with auxiliary information, where the auxiliary

More information

Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators

Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators Non-parametric bootstrap mean squared error estimation for M-quantile estimates of small area means, quantiles and poverty indicators Stefano Marchetti 1 Nikos Tzavidis 2 Monica Pratesi 3 1,3 Department

More information

GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY

GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY GEOGRAPHY 204: STATISTICAL PROBLEM SOLVING IN GEOGRAPHY CHAPTER 2: GEOGRAPHIC DATA Primary data - acquired directly from the original source In situ or in the field Costly time and money! Campus bike racks

More information

Section on Survey Research Methods JSM 2010 STATISTICAL GRAPHICS OF PEARSON RESIDUALS IN SURVEY LOGISTIC REGRESSION DIAGNOSIS

Section on Survey Research Methods JSM 2010 STATISTICAL GRAPHICS OF PEARSON RESIDUALS IN SURVEY LOGISTIC REGRESSION DIAGNOSIS STATISTICAL GRAPHICS OF PEARSON RESIDUALS IN SURVEY LOGISTIC REGRESSION DIAGNOSIS Stanley Weng, National Agricultural Statistics Service, U.S. Department of Agriculture 3251 Old Lee Hwy, Fairfax, VA 22030,

More information

Econometrics A. Simple linear model (2) Keio University, Faculty of Economics. Simon Clinet (Keio University) Econometrics A October 16, / 11

Econometrics A. Simple linear model (2) Keio University, Faculty of Economics. Simon Clinet (Keio University) Econometrics A October 16, / 11 Econometrics A Keio University, Faculty of Economics Simple linear model (2) Simon Clinet (Keio University) Econometrics A October 16, 2018 1 / 11 Estimation of the noise variance σ 2 In practice σ 2 too

More information

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables

Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Applied Econometrics (MSc.) Lecture 3 Instrumental Variables Estimation - Theory Department of Economics University of Gothenburg December 4, 2014 1/28 Why IV estimation? So far, in OLS, we assumed independence.

More information

Dealing With Endogeneity

Dealing With Endogeneity Dealing With Endogeneity Junhui Qian December 22, 2014 Outline Introduction Instrumental Variable Instrumental Variable Estimation Two-Stage Least Square Estimation Panel Data Endogeneity in Econometrics

More information

STAT 3A03 Applied Regression With SAS Fall 2017

STAT 3A03 Applied Regression With SAS Fall 2017 STAT 3A03 Applied Regression With SAS Fall 2017 Assignment 2 Solution Set Q. 1 I will add subscripts relating to the question part to the parameters and their estimates as well as the errors and residuals.

More information

Miscellanea A note on multiple imputation under complex sampling

Miscellanea A note on multiple imputation under complex sampling Biometrika (2017), 104, 1,pp. 221 228 doi: 10.1093/biomet/asw058 Printed in Great Britain Advance Access publication 3 January 2017 Miscellanea A note on multiple imputation under complex sampling BY J.

More information

AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE

AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE Statistica Sinica 24 (2014), 1097-1116 doi:http://dx.doi.org/10.5705/ss.2012.074 AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE Sheng Wang 1, Jun Shao

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Markus Haas LMU München Summer term 2011 15. Mai 2011 The Simple Linear Regression Model Considering variables x and y in a specific population (e.g., years of education and wage

More information

Introduction to Econometrics

Introduction to Econometrics Introduction to Econometrics Lecture 3 : Regression: CEF and Simple OLS Zhaopeng Qu Business School,Nanjing University Oct 9th, 2017 Zhaopeng Qu (Nanjing University) Introduction to Econometrics Oct 9th,

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

Economics 620, Lecture 2: Regression Mechanics (Simple Regression)

Economics 620, Lecture 2: Regression Mechanics (Simple Regression) 1 Economics 620, Lecture 2: Regression Mechanics (Simple Regression) Observed variables: y i ; x i i = 1; :::; n Hypothesized (model): Ey i = + x i or y i = + x i + (y i Ey i ) ; renaming we get: y i =

More information

Model-based Estimation of Poverty Indicators for Small Areas: Overview. J. N. K. Rao Carleton University, Ottawa, Canada

Model-based Estimation of Poverty Indicators for Small Areas: Overview. J. N. K. Rao Carleton University, Ottawa, Canada Model-based Estimation of Poverty Indicators for Small Areas: Overview J. N. K. Rao Carleton University, Ottawa, Canada Isabel Molina Universidad Carlos III de Madrid, Spain Paper presented at The First

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

Two-phase sampling approach to fractional hot deck imputation

Two-phase sampling approach to fractional hot deck imputation Two-phase sampling approach to fractional hot deck imputation Jongho Im 1, Jae-Kwang Kim 1 and Wayne A. Fuller 1 Abstract Hot deck imputation is popular for handling item nonresponse in survey sampling.

More information

Estimation of Production Functions using Average Data

Estimation of Production Functions using Average Data Estimation of Production Functions using Average Data Matthew J. Salois Food and Resource Economics Department, University of Florida PO Box 110240, Gainesville, FL 32611-0240 msalois@ufl.edu Grigorios

More information

Survey nonresponse and the distribution of income

Survey nonresponse and the distribution of income Survey nonresponse and the distribution of income Emanuela Galasso* Development Research Group, World Bank Module 1. Sampling for Surveys 1: Why are we concerned about non response? 2: Implications for

More information

Typical information required from the data collection can be grouped into four categories, enumerated as below.

Typical information required from the data collection can be grouped into four categories, enumerated as below. Chapter 6 Data Collection 6.1 Overview The four-stage modeling, an important tool for forecasting future demand and performance of a transportation system, was developed for evaluating large-scale infrastructure

More information

Key Algebraic Results in Linear Regression

Key Algebraic Results in Linear Regression Key Algebraic Results in Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Key Algebraic Results in

More information

Planned Missingness Designs and the American Community Survey (ACS)

Planned Missingness Designs and the American Community Survey (ACS) Planned Missingness Designs and the American Community Survey (ACS) Steven G. Heeringa Institute for Social Research University of Michigan Presentation to the National Academies of Sciences Workshop on

More information

Session 2.1: Terminology, Concepts and Definitions

Session 2.1: Terminology, Concepts and Definitions Second Regional Training Course on Sampling Methods for Producing Core Data Items for Agricultural and Rural Statistics Module 2: Review of Basics of Sampling Methods Session 2.1: Terminology, Concepts

More information

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census

More information

Empirical approaches in public economics

Empirical approaches in public economics Empirical approaches in public economics ECON4624 Empirical Public Economics Fall 2016 Gaute Torsvik Outline for today The canonical problem Basic concepts of causal inference Randomized experiments Non-experimental

More information

A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java

A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java To cite this

More information

On the bias of the multiple-imputation variance estimator in survey sampling

On the bias of the multiple-imputation variance estimator in survey sampling J. R. Statist. Soc. B (2006) 68, Part 3, pp. 509 521 On the bias of the multiple-imputation variance estimator in survey sampling Jae Kwang Kim, Yonsei University, Seoul, Korea J. Michael Brick, Westat,

More information

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Statistics and Applications {ISSN 2452-7395 (online)} Volume 16 No. 1, 2018 (New Series), pp 289-303 On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Snigdhansu

More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information