Graybill Conference Poster Session Introductions

2013 Graybill Conference in Modern Survey Statistics
Colorado State University, Fort Collins, CO
June 10, 2013

Small Area Estimation with Incomplete Auxiliary Information
Andreea L. Erciulescu and Wayne A. Fuller
Department of Statistics, Iowa State University

Background and Motivation
Surveys are often designed to produce specific information about totals and means, but direct estimates for small areas may not be reliable because of small sample sizes.
Model-based procedures that exploit auxiliary information have been used to construct estimates for small areas.
We fit nested models with a binary response and random area effects:
$$E(y_{ij} \mid b_i) = p_{ij}(x_{ij}, b_i) = \frac{\exp(x_{ij}\beta + b_i)}{1 + \exp(x_{ij}\beta + b_i)}.$$
The goal is to construct small area predictions for the mean of a binomial variable, using different amounts of auxiliary information.
The true small area mean of $y$ is $\theta_i = \int p_{ij}(x, b_i)\, dF_{x_i}(x)$.

Results and Conclusions
We consider three cases of auxiliary information:
(1) $F_{x_i}(\mu_{x_i}, \Sigma_{xx})$ known;
(2) $\mu_{x_i}$ unknown and fixed, estimated by $\sum_{j=1}^{n_i} \omega_{ij} x_{ij}$ with weights such that $\sum_{j=1}^{n_i} \omega_{ij} = 1$ and $E\left(\sum_{j=1}^{n_i} \omega_{ij} x_{ij}\right) = \mu_{x_i}$;
(3) $\mu_{x_i}$ unknown and random, estimated using $\mu_{x_i} \sim (\mu_x, \Sigma_{\delta\delta})$ and $x_i \mid \mu_{x_i} \sim (\mu_{x_i}, \Sigma_{xx})$.
We construct small area predictions for the area means by integrating over the covariate distribution and over the distribution of the random area effects.
We compare the biases and mean squared errors of the prediction errors in a simulation study.
We conclude that, generally, it is better to include the auxiliary information in the model and estimate its distribution than to ignore it.
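The integration step can be approximated by Monte Carlo. The sketch below is a hypothetical illustration, not the authors' predictor: it assumes draws from the covariate distribution $F_{x_i}$ and from a predictive distribution of $b_i$ are already available (how they are obtained is what distinguishes the three cases), and all parameter values are made up. With a logistic link, averaging $p$ over the draws generally differs from plugging the mean covariate and mean area effect into $p$, which is why the integration matters.

```python
import numpy as np


def expit(u):
    """Logistic function exp(u) / (1 + exp(u))."""
    return 1.0 / (1.0 + np.exp(-u))


def predict_area_mean(beta, x_draws, b_draws):
    """Monte Carlo approximation of the double integral of p(x, b) over
    F_{x_i} (rows of x_draws) and the distribution of b_i (entries of b_draws)."""
    eta = x_draws @ beta                          # x'beta for each covariate draw, shape (R,)
    p = expit(eta[:, None] + b_draws[None, :])    # p(x_r, b_s) on the R x S grid
    return float(p.mean())


rng = np.random.default_rng(2013)
R, S = 5_000, 200
# Hypothetical area: intercept plus one covariate with x ~ N(1, 0.5^2)
x_draws = np.column_stack([np.ones(R), rng.normal(1.0, 0.5, R)])
b_draws = rng.normal(0.3, 0.4, S)                 # hypothetical draws for b_i
beta = np.array([-1.0, 0.8])

theta_hat = predict_area_mean(beta, x_draws, b_draws)
# Plugging in the mean covariate and mean effect ignores the nonlinearity of p
plug_in = float(expit(x_draws.mean(axis=0) @ beta + b_draws.mean()))
print(round(theta_hat, 3), round(plug_in, 3))
```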

Variance Estimation after Multiple Imputation
Jiwei Zhao
Department of Biostatistics, Yale University School of Public Health

Background and Motivation
Consider $(R, Y, X)$, where $R = 1$ if $Y$ is observed and $X$ is fully observed.
Missing data mechanism: $p(R = 1 \mid Y, X)$. Regression model: $p(y \mid X; \theta)$.
Problem of interest: comparison of the variance of estimators of $\theta$ before and after multiple imputation (MI).
Under the MAR assumption, the estimator after MI is less efficient, and more general results are available (Wang and Robins 1998, Kim and Shao 2013).
Before MI: $V_p = I_{obs}^{-1}$. After MI: $V_{p,MI} = I_{obs}^{-1} + M^{-1} I_c^{-1} I_{mis} I_c^{-1}$.
What is the situation under nonignorable missingness?
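For intuition about the MAR formulas, the sketch below evaluates $V_p$ and $V_{p,MI}$ for a hypothetical scalar parameter. It assumes the complete-data information decomposes as $I_c = I_{obs} + I_{mis}$ (the missing information principle); the information values are made up. The extra MI term shrinks like $1/M$ but stays positive for any finite $M$.

```python
import numpy as np


def mi_variance(I_obs, I_mis, M):
    """Evaluate V_p = I_obs^{-1} and V_{p,MI} = I_obs^{-1} + M^{-1} I_c^{-1} I_mis I_c^{-1}.

    Assumes the complete-data information decomposes as I_c = I_obs + I_mis."""
    I_obs = np.atleast_2d(np.asarray(I_obs, dtype=float))
    I_mis = np.atleast_2d(np.asarray(I_mis, dtype=float))
    I_c_inv = np.linalg.inv(I_obs + I_mis)
    V_p = np.linalg.inv(I_obs)
    V_p_mi = V_p + I_c_inv @ I_mis @ I_c_inv / M
    return V_p, V_p_mi


# Hypothetical scalar information: I_obs = 4, I_mis = 1, so I_c = 5
for M in (1, 5, 50):
    V_p, V_p_mi = mi_variance(4.0, 1.0, M)
    print(M, V_p.item(), round(V_p_mi.item(), 4))
```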

Results and Conclusions
How can a preliminary estimator be obtained? Assume $p(R = 1 \mid Y, X) = p(R = 1 \mid Y)$ (Tang, Little and Raghunathan, 2003). Then
$$p(x \mid Y, R = 1) = p(x \mid Y) = \frac{p(y \mid x; \theta)\, p(x)}{\int p(y \mid x; \theta)\, p(x)\, dx}.$$
How should MI be conducted? Imputations are drawn from
$$p(y \mid X, R = 0) = \frac{p(R = 0 \mid y)\, p(y \mid X; \theta)}{\int p(R = 0 \mid y)\, p(y \mid X; \theta)\, dy}.$$
Simulation studies show that MI could improve the efficiency of the estimator of $\theta$.
Ongoing project: general results are still under investigation.
Any comments/suggestions are appreciated!
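One generic way to draw imputations from $p(y \mid X, R = 0)$ is sampling importance resampling (SIR): propose from $p(y \mid X; \theta)$ and resample with weights $p(R = 0 \mid y)$. This is a swapped-in technique for illustration, not necessarily the poster's method; the normal regression model, the logistic response model in $y$ and all parameter values are hypothetical, and $\theta$ and the response model are treated as known.

```python
import numpy as np


def expit(u):
    return 1.0 / (1.0 + np.exp(-u))


def impute_nonrespondent(x, theta, gamma, M, rng, K=2000):
    """Draw M values from p(y | x, R = 0), proportional to p(R = 0 | y) p(y | x; theta),
    by sampling importance resampling (SIR).

    Hypothetical working models (not from the poster):
      y | x ~ N(b0 + b1 * x, sigma^2)        with theta = (b0, b1, sigma)
      P(R = 1 | y) = expit(g0 + g1 * y)      with gamma = (g0, g1)
    """
    b0, b1, sigma = theta
    g0, g1 = gamma
    cand = rng.normal(b0 + b1 * x, sigma, size=K)   # proposals from p(y | x; theta)
    w = 1.0 - expit(g0 + g1 * cand)                 # importance weights p(R = 0 | y)
    idx = rng.choice(K, size=M, replace=True, p=w / w.sum())
    return cand[idx]


rng = np.random.default_rng(1)
# Respondents tend to have large y (g1 > 0), so draws for a nonrespondent are
# pulled below the regression mean b0 + b1 * x = 1.
draws = impute_nonrespondent(1.0, (0.0, 1.0, 1.0), (0.0, 1.0), M=5000, rng=rng, K=20000)
# With g1 = 0 the response is ignorable and the draws follow p(y | x; theta).
draws_mar = impute_nonrespondent(1.0, (0.0, 1.0, 1.0), (0.0, 0.0), M=5000, rng=rng, K=20000)
print(draws.mean(), draws_mar.mean())
```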

A Semiparametric Approach to Modeling Survey Data in the Presence of Informative Sampling
Wade W. Herndon, Jean Opsomer, and F. Jay Breidt
Department of Statistics, Colorado State University

Background and Motivation
Under an informative sampling design, the model that holds at the sample level does not hold at the population level.
We want to estimate $f(y_k \mid x_k)$.
Because of the informative sampling, we must include additional design variables $z_k$ to account for the design information, yielding $f_1(y_k \mid x_k, z_k)$.
We can recover the original regression relationship of interest via
$$f(y_k \mid x_k) = \int f_1(y_k \mid x_k, z_k)\, f_2(z_k \mid x_k)\, dz_k.$$
The goal is to use the model covariates to integrate the design effects out of the model.
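Once $f_1$ and $f_2$ are specified, the identity for $f(y_k \mid x_k)$ is an ordinary one-dimensional integral. The sketch below uses hypothetical Gaussian components, for which the integral has a closed form, and evaluates it by Gauss-Hermite quadrature (`numpy.polynomial.hermite_e.hermegauss`) so the computation can be checked; the poster's approach estimates these pieces nonparametrically instead.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def normal_pdf(y, mean, sd):
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def marginal_density(y, x, n_nodes=40):
    """f(y | x) = integral of f1(y | x, z) f2(z | x) dz by Gauss-Hermite quadrature.

    Hypothetical Gaussian components:
      f1(y | x, z) = N(y; 1 + 2x + 0.5z, 1)     (model including design variable z)
      f2(z | x)    = N(z; x, 0.8^2)             (design variable given x)
    """
    t, w = hermegauss(n_nodes)               # nodes/weights for weight exp(-t^2 / 2)
    z = x + 0.8 * t                           # z = E[z | x] + sd * t
    f1 = normal_pdf(y, 1.0 + 2.0 * x + 0.5 * z, 1.0)
    return float(np.sum(w * f1) / math.sqrt(2.0 * math.pi))


# Gaussian case has a closed form: y | x ~ N(1 + 2.5x, 1 + 0.5^2 * 0.8^2)
exact = float(normal_pdf(0.7, 1.0 + 2.5 * 0.3, math.sqrt(1.0 + 0.25 * 0.64)))
approx = marginal_density(0.7, 0.3)
print(exact, approx)
```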

Results and Conclusions
For many applications, the sample weights can be included as model covariates to account for the design bias, and their conditional mean given the model covariates is then estimated by a nonparametric estimator.
The full regression model is $y = x^T\beta + w\,x^T\gamma + \epsilon$.
A semiparametric model is proposed in which $(\hat\beta^T, \hat\gamma^T)$ come from the regression of $y$ on $x$ and $w$, and $E[w \mid x]$ is estimated by a nonparametric, design-based estimator.
The nonparametric estimator is combined with the parametric regression to form an estimator for $y$ that is a smooth function of $x$.
Nonparametric methods are thus used to integrate the design effects out of the model.
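A minimal sketch of the two ingredients, under stated assumptions: $(\hat\beta, \hat\gamma)$ by ordinary least squares of $y$ on $x$ and $w x$, and $E[w \mid x]$ by a Nadaraya-Watson smoother in which each unit also carries its design weight. The Gaussian kernel, the bandwidth `h`, the weighting of the smoother and the simulated data are assumptions, not the authors' design-based estimator. Under the full model, the fitted regression is $x^T\hat\beta + \hat E[w \mid x]\, x^T\hat\gamma$.

```python
import numpy as np


def fit_semiparametric(x, w, y, h):
    """Sketch of y = x'beta + w x'gamma + e with E[w | x] estimated by a smoother.

    x: (n,) scalar covariate (an intercept is added), w: (n,) sampling weights,
    y: (n,) response, h: Gaussian-kernel bandwidth (hypothetical tuning choice).
    """
    X = np.column_stack([np.ones_like(x), x])
    Z = np.hstack([X, w[:, None] * X])               # regressors x and w * x
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    beta, gamma = coef[:2], coef[2:]

    def e_w_given_x(x0):
        # Nadaraya-Watson smoother of w on x; each unit also carries its design
        # weight w_k, a Horvitz-Thompson-style choice assumed here
        k = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h) ** 2) * w[None, :]
        return (k @ w) / k.sum(axis=1)

    def predict(x0):
        x0 = np.atleast_1d(np.asarray(x0, dtype=float))
        X0 = np.column_stack([np.ones_like(x0), x0])
        return X0 @ beta + e_w_given_x(x0) * (X0 @ gamma)

    return beta, gamma, predict


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
w = 1.0 + 4.0 * rng.uniform(size=200)               # hypothetical sampling weights
y = 1.0 + 2.0 * x + w * (0.5 - 0.1 * x) + rng.normal(0.0, 0.1, 200)
beta, gamma, predict = fit_semiparametric(x, w, y, h=0.1)
print(beta, gamma, predict([0.25, 0.75]))
```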

The use of followups for propensity score adjustment with nonignorable nonresponse
Jongho Im and Jae-Kwang Kim
Department of Statistics, Iowa State University

Background and Motivation
Nonignorable nonresponse bias can be corrected with followups. Our goal is to provide a propensity score adjusted estimator
$$\hat Y = \sum_{i=1}^{n} d_i \delta_{i,t-1} y_i + \sum_{i=1}^{n} (1 - \delta_{i,t-1})\, \delta_{it}\, \frac{d_i y_i}{\hat p_{it}}$$
for $t = 1, \ldots, T$, with $\delta_{i0} = 0$.
$d_i$ is the sampling weight. $A_t$ is the set of all respondents up to the $t$-th contact, with $A_1 \subset \cdots \subset A_T$. $\delta_{it} = 1$ if $i \in A_t$ and $0$ otherwise.
$p_{it}$ is the conditional response probability at the $t$-th contact,
$$p_{it} = P(\delta_{it} = 1 \mid \delta_{i,t-1} = 0, y_i) = \{1 + \exp(\alpha_t + \phi y_i)\}^{-1}.$$
Alho (1990) considered a conditional likelihood approach to estimating $\hat p_{it}$, assuming a multinomial likelihood on $\tilde p_{it} = P(\delta_{it} = 1 \mid \delta_{i,t-1} = 0, y_i, \delta_{iT} = 1)$ instead of $p_{it}$.

Results and Conclusions
Since $E[\delta_{it} \mid \delta_{i,t-1} = 0, y_i] = p_{it}$, given the sets of respondents $A_1$ and $A_2$ we can write, with $A$ the full sample,
$$\sum_{i \in A} \frac{d_i \delta_{i1}}{p_{i1}} (1, y_i) = (N, Y) \quad \text{and} \quad \sum_{i \in A} d_i = N, \tag{1}$$
$$\sum_{i \in A} d_i \delta_{i1} (1, y_i) + \sum_{i \in A} \frac{d_i (1 - \delta_{i1})\, \delta_{i2}}{p_{i2}} (1, y_i) = (N, Y). \tag{2}$$
Equating the two expressions for $Y$, (1) and (2) give 3 equations in the 3 parameters $(\alpha_1, \alpha_2, \phi)$.
For general followup cases, in which there are more equations than parameters, we can apply the generalized method of moments (GMM).
Variance estimation is relatively easy (GMM estimator).
The method is more robust than other likelihood-based methods.
Auxiliary variable information can be added as additional calibration equations.
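With two contacts the system is just identified and can be solved numerically. The sketch below is an illustration under stated assumptions, not the authors' GMM implementation: it simulates a hypothetical sample with equal weights, eliminates $Y$ by equating the expressions from (1) and (2), and solves the three resulting equations for $(\alpha_1, \alpha_2, \phi)$ with a damped Newton iteration using a finite-difference Jacobian. With more contacts than parameters, one would instead minimize a GMM quadratic form in the stacked equations.

```python
import numpy as np


def response_prob(alpha, phi, y):
    """p = {1 + exp(alpha + phi * y)}^{-1}, the response model on the poster."""
    return 1.0 / (1.0 + np.exp(alpha + phi * y))


def equations(theta, y, d, delta1, delta2):
    """Equations (1)-(2) with Y eliminated, divided by N.

    y must hold observed values for units in A_2 and a finite placeholder
    (e.g. 0) for final nonrespondents, whose terms get zero weight."""
    a1, a2, phi = theta
    N = d.sum()
    new2 = (1.0 - delta1) * delta2                     # first response at contact 2
    w1 = d * delta1 / response_prob(a1, phi, y)
    w2 = d * new2 / response_prob(a2, phi, y)
    Y_from_1 = np.sum(w1 * y)
    Y_from_2 = np.sum(d * delta1 * y) + np.sum(w2 * y)  # the t = 2 estimator of Y
    N_from_2 = np.sum(d * delta1) + np.sum(w2)
    return np.array([w1.sum() - N, N_from_2 - N, Y_from_1 - Y_from_2]) / N


def solve(y, d, delta1, delta2, theta0=(0.0, 0.0, 0.0), tol=1e-10, max_iter=100):
    """Damped Newton iterations with a forward-difference Jacobian."""
    def f(th):
        return equations(th, y, d, delta1, delta2)

    theta = np.array(theta0, dtype=float)
    g = f(theta)
    h = 1e-6
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        J = np.column_stack([(f(theta + h * e) - g) / h for e in np.eye(3)])
        step = np.linalg.solve(J, -g)
        t = 1.0
        while True:                                    # halve until the residual shrinks
            cand = theta + t * step
            g_cand = f(cand)
            if np.linalg.norm(g_cand) < np.linalg.norm(g) or t < 1e-8:
                break
            t /= 2.0
        theta, g = cand, g_cand
    return theta, g


# Hypothetical simulation: two contacts, equal weights, true (alpha_1, alpha_2, phi)
rng = np.random.default_rng(42)
n = 20_000
y_full = rng.normal(1.0, 1.0, n)
d = np.ones(n)
true = (0.0, 0.0, -0.5)
delta1 = (rng.uniform(size=n) < response_prob(true[0], true[2], y_full)).astype(float)
resp2 = rng.uniform(size=n) < response_prob(true[1], true[2], y_full)
delta2 = np.where(delta1 == 1.0, 1.0, resp2.astype(float))   # A_1 is a subset of A_2
y = np.where(delta2 == 1.0, y_full, 0.0)                      # y missing after contact 2
theta_hat, resid = solve(y, d, delta1, delta2)
print(theta_hat, np.linalg.norm(resid))
```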

Varying Coefficient Models in Finite Population Sampling
Luis Fernando Contreras Cruz
COLPOS, Mexico

Background and Motivation
A model-assisted semiparametric method for estimating population totals is investigated; it aims to improve the precision of survey estimators by incorporating multivariate auxiliary information.
The proposed superpopulation model is a varying coefficient model (VCM).
Varying coefficient models (Hastie and Tibshirani, 1993) and many of their variations (e.g., Hoover, 1998) have gained much attention in the literature, with applications in areas such as economics, business and medical science (see Fan, 2008, for a review).
Both simulated and real data examples are given to illustrate the model and the proposed estimation methodology; they provide strong evidence that corroborates the asymptotic theory.

Results and Conclusions
A cross-validation method for choosing the smoothing parameters was proposed.
The VCM identifies nonlinear relationships between the variables.
VCM-assisted models contribute to semiparametric regression in survey sampling.
Variance estimation using cross-validation and g-weights works well in the simulation studies and in the application.
Cross-validation is used to avoid overfitting.
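To make the cross-validation step concrete, the sketch below fits $y = \beta_0(u) + \beta_1(u)x + e$ by kernel-weighted local linear least squares and picks the bandwidth by leave-one-out prediction error. The Gaussian kernel, the local linear form, the bandwidth grid and the simulated data are assumptions for illustration; the poster's model-assisted estimator of totals and its g-weight variance estimator are not reproduced here.

```python
import numpy as np


def local_linear_vcm(u0, u, x, y, h, exclude=None):
    """Local linear estimate of (beta0(u0), beta1(u0)) in y = beta0(u) + beta1(u) x + e,
    using a Gaussian kernel in u with bandwidth h."""
    k = np.exp(-0.5 * ((u - u0) / h) ** 2)
    if exclude is not None:
        k[exclude] = 0.0                          # drop one unit (leave-one-out)
    du = u - u0
    D = np.column_stack([np.ones_like(x), x, du, x * du])
    sw = np.sqrt(k)
    coef, *_ = np.linalg.lstsq(D * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1]


def loo_cv_score(u, x, y, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    errors = []
    for i in range(len(y)):
        b0, b1 = local_linear_vcm(u[i], u, x, y, h, exclude=i)
        errors.append((y[i] - b0 - b1 * x[i]) ** 2)
    return float(np.mean(errors))


def choose_bandwidth(u, x, y, grid):
    scores = [loo_cv_score(u, x, y, h) for h in grid]
    return grid[int(np.argmin(scores))], scores


rng = np.random.default_rng(5)
n = 150
u = rng.uniform(0.0, 1.0, n)                      # effect modifier
x = rng.normal(size=n)
y = np.sin(2 * np.pi * u) + (1.0 + u ** 2) * x + rng.normal(0.0, 0.3, n)
h_best, scores = choose_bandwidth(u, x, y, [0.03, 0.06, 0.12, 0.25, 0.5])
print(h_best, local_linear_vcm(0.5, u, x, y, h_best))
```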

Application of Z-estimation Theory to Calibrated Estimators for Semiparametric Models with Two-phase Stratified Sampling
Jie Kate Hu, Gary Chan, Norman Breslow
Department of Biostatistics, University of Washington, Seattle, WA

Motivation
In epidemiological studies, we are usually interested in parameters of a (semi)parametric model describing an association between an exposure and an outcome, for example the additive hazards model $\lambda(t \mid Z) = \lambda_0(t) + \theta^T Z$.
To improve efficiency, we consider a two-phase stratified sampling design and calibration estimators that use auxiliary variables available for all cohort members.
Our goal is to estimate both Euclidean and infinite-dimensional parameters simultaneously in semiparametric models, using inverse probability weighted estimating equations (IPW-EE) with calibration.

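Results
Let $X$ be the variable of interest. Motivated by the semiparametric model, $\alpha_0$ is defined as the unique solution of $\Psi(\alpha) = E\,\psi_\alpha(X) = 0$.
Let the vector $\tilde V = \tilde V(V)$ be the calibration variable. The calibrated estimator $\hat\alpha$ is obtained by solving the calibrated IPW-EE
$$\Psi_N(\alpha, \gamma) = \frac{1}{N} \sum_{i=1}^{N} \psi_{\alpha,\gamma}(X_i, V_i, R_i) = 0, \qquad \psi_{\alpha,\gamma}(X, V, R) = \begin{pmatrix} \psi_{1,\alpha,\gamma}(X, V, R) \\ \psi_{2,\gamma}(V, R) \end{pmatrix},$$
where
$$\psi_{1,\alpha,\gamma}(X, V, R) = \frac{R}{\pi_0(V)} \exp(-\gamma^T \tilde V)\, \psi_\alpha(X), \qquad \psi_{2,\gamma}(V, R) = \frac{R}{\pi_0(V)} \exp(-\gamma^T \tilde V)\, \tilde V - \tilde V.$$
Asymptotic distribution of $\hat\alpha$:
$$\sqrt{N}(\hat\alpha - \alpha_0) = -\dot\Psi^{c\,-1}_{11}\, \mathbb{G}_N \psi_{1,\alpha_0,0} + \dot\Psi^{c\,-1}_{11}\, \dot\Psi^{c}_{12}\, \dot\Psi^{c\,-1}_{22}\, \mathbb{G}_N \psi_{2,0} + o_p(1).$$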
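For the simplest estimating function, $\psi_\alpha(X) = X - \alpha$ (a cohort mean), the calibrated IPW-EE can be solved in two steps: solve the $\psi_2$ equations for $\gamma$, then solve the $\psi_1$ equation for $\alpha$ with the calibrated weights. The sketch assumes the exponential form of the weights written above, a Newton solver, $\tilde V = (1, V)$ and a hypothetical stratified phase-two design; it is not the authors' code.

```python
import numpy as np


def calibrate(R, pi0, V_tilde, tol=1e-12, max_iter=50):
    """Solve (1/N) sum_i [R_i / pi0_i * exp(-gamma' V_i) - 1] V_i = 0 for gamma
    (the psi_2 equations) by damped Newton; return gamma and the calibrated weights."""
    N, q = V_tilde.shape
    base = R / pi0                                     # IPW weights, zero when R_i = 0

    def resid(g):
        w = base * np.exp(-V_tilde @ g)
        return (w - 1.0) @ V_tilde / N, w

    gamma = np.zeros(q)
    F, w = resid(gamma)
    for _ in range(max_iter):
        if np.linalg.norm(F) < tol:
            break
        J = -(V_tilde * w[:, None]).T @ V_tilde / N     # Jacobian of F in gamma
        step = np.linalg.solve(J, -F)
        t = 1.0
        while True:                                    # halve until the residual shrinks
            F_new, w_new = resid(gamma + t * step)
            if np.linalg.norm(F_new) < np.linalg.norm(F) or t < 1e-8:
                break
            t /= 2.0
        gamma, F, w = gamma + t * step, F_new, w_new
    return gamma, w


def calibrated_mean(X, R, pi0, V_tilde):
    """Calibrated IPW estimate of alpha for psi_alpha(X) = X - alpha (a cohort mean)."""
    gamma, w = calibrate(R, pi0, V_tilde)
    X_obs = np.where(R == 1, X, 0.0)                   # X is used only where measured
    return float(np.sum(w * X_obs) / np.sum(w)), gamma, w


# Hypothetical cohort: V known for everyone, X measured at phase two only
rng = np.random.default_rng(7)
N = 4000
V = rng.normal(size=N)
X = 2.0 + V + rng.normal(0.0, 0.5, N)
pi0 = np.where(V > 0, 0.5, 0.2)                        # stratified phase-two probabilities
R = (rng.uniform(size=N) < pi0).astype(float)
V_tilde = np.column_stack([np.ones(N), V])
alpha_hat, gamma_hat, w = calibrated_mean(X, R, pi0, V_tilde)
print(alpha_hat, X.mean(), gamma_hat)
```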

Estimation of Cluster-level Regression Model under Nonresponse within Clusters
Nuanpan Nangsue
Social Sciences, University of Southampton, UK

Background and Motivation
Aim: to look at new methods of analysis that incorporate information on nonresponse in the model.
The model of interest is a cluster-level regression model relating the cluster mean $\bar Y_i$ of $y_{ij}$ to the covariates $x_i$:
$$\bar Y_i = x_i \beta + \epsilon_i. \tag{3}$$
We suppose that underlying (3) we may write
$$y_{ij} = x_i \beta + \epsilon_{ij}. \tag{4}$$
To model the response outcome $R_{ij}$, we introduce a variable $u_{ij}$ such that $R_{ij} = 1$ if $u_{ij} > 0$ and $R_{ij} = 0$ otherwise. We assume that
$$u_{ij} = z_i \gamma + \delta_{ij}. \tag{5}$$
The inferential problem is how to use the observed data on $y_{ij}$, $x_i$ and $z_i$ to make inference about $\beta$.

Results and Conclusions
To develop an estimator following the approach of Heckman (1976), we may write
$$E(y_{ij} \mid R_{ij} = 1) = x_i \beta + c\, \lambda\!\left(\frac{z_i \gamma}{\sigma_\delta}\right), \tag{6}$$
where $c = \sigma_{\epsilon\delta}/\sigma_\delta$ and $\lambda(z_i\gamma/\sigma_\delta) = \phi(z_i\gamma/\sigma_\delta)/\Phi(z_i\gamma/\sigma_\delta)$.
A simpler version of this estimator is obtained by noting that, for large $m_i$, the response rate $p_i = r_i/m_i$ ($r_i$ respondents among the $m_i$ units of cluster $i$) may be expressed approximately as
$$p_i \approx E(R_{ij}) = \Phi\!\left(\frac{z_i \gamma}{\sigma_\delta}\right) = \Phi(\Psi_i). \tag{7}$$
Now set $\hat\Psi_i = \Phi^{-1}(p_i)$ and replace $\lambda(z_i \hat\gamma/\hat\sigma_\delta)$ by $\lambda(\hat\Psi_i)$ in the Heckman two-step approach.
An approximate Heckman maximum likelihood estimator is also obtained in order to estimate the regression coefficients $\beta$ and $c$.
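The simplified second step can be sketched as an ordinary least squares regression of the respondent cluster means on $x_i$ and $\lambda(\hat\Psi_i)$, which targets $(\beta, c)$ in (6). The function names and the data are hypothetical; the data are constructed to satisfy (6) exactly so that the recovered coefficients can be checked, whereas real respondent means contain sampling noise. `statistics.NormalDist` from the Python standard library supplies $\phi$, $\Phi$ and $\Phi^{-1}$.

```python
from statistics import NormalDist

import numpy as np

STD_NORMAL = NormalDist()                      # standard normal: pdf, cdf, inv_cdf


def inverse_mills(psi):
    """lambda(psi) = phi(psi) / Phi(psi)."""
    return STD_NORMAL.pdf(psi) / STD_NORMAL.cdf(psi)


def approx_heckman_two_step(ybar_resp, X, r, m):
    """Approximate Heckman two-step estimator of (beta, c) at the cluster level.

    ybar_resp: (n,) mean of y over the respondents in each cluster
    X:         (n, p) cluster covariates x_i (include a column of ones for an intercept)
    r, m:      number of respondents and cluster sizes; p_i = r_i / m_i must lie in (0, 1)
    """
    p = np.asarray(r, dtype=float) / np.asarray(m, dtype=float)
    psi_hat = [STD_NORMAL.inv_cdf(float(pi)) for pi in p]   # Psi_hat_i = Phi^{-1}(p_i)
    lam = np.array([inverse_mills(s) for s in psi_hat])
    D = np.column_stack([X, lam])
    coef, *_ = np.linalg.lstsq(D, np.asarray(ybar_resp, dtype=float), rcond=None)
    return coef[:-1], float(coef[-1])


# Hypothetical clusters built to satisfy (6) exactly, as a check of the algebra
rng = np.random.default_rng(11)
n, m = 40, np.full(40, 50)
r = rng.integers(5, 46, size=n)                 # respondents per cluster
X = np.column_stack([np.ones(n), rng.normal(size=n)])
lam_true = np.array([inverse_mills(STD_NORMAL.inv_cdf(float(k / 50))) for k in r])
ybar = X @ np.array([1.0, 2.0]) + 0.7 * lam_true
beta_hat, c_hat = approx_heckman_two_step(ybar, X, r, m)
print(beta_hat, c_hat)
```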

Proportion estimators in dual frame surveys with auxiliary information
Hemilio Coelho (1), Camila Silva (1) and Cristiano Ferraz (2)
1. Department of Statistics, Federal University of Paraiba
2. Department of Statistics, Federal University of Pernambuco

Background and Motivation
In dual frame surveys, probability samples are drawn independently from two overlapping frames, denoted by $A$ and $B$, with $A \cap B \neq \emptyset$.
Using both frames simultaneously in a dual frame design generates three mutually exclusive domains: $a = A \cap B^c$, $b = B \cap A^c$ and $ab = A \cap B$.
Based on results of Hartley (1962), we propose three estimators of the population proportion assisted by regression models, denoted by $\hat P_1$, $\hat P_2$ and $\hat P_3$; the model used in the third estimator is a logistic regression.
The goal is to evaluate the performance of these estimators through Monte Carlo experiments. All estimators were evaluated by the mean, standard deviation, mean squared error and relative bias of their replicates.
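As a baseline, the sketch below computes a plain Hartley-type composite estimator of a proportion from Horvitz-Thompson domain totals, together with the replicate summaries listed above. It is not $\hat P_1$, $\hat P_2$ or $\hat P_3$, which are regression assisted; the fixed compositing factor `theta`, the ratio form (estimated total over estimated population size) and the tiny samples are assumptions.

```python
import numpy as np


def hartley_proportion(yA, dA, in_abA, yB, dB, in_abB, theta=0.5):
    """Hartley-type (1962) composite estimator of a proportion, in ratio form.

    yA, yB:         0/1 study variable for the samples from frames A and B
    dA, dB:         design weights of the two samples
    in_abA, in_abB: True if the unit also belongs to the other frame (domain ab)
    theta:          fixed compositing factor for the overlap domain (not optimized)
    """
    in_abA = np.asarray(in_abA, dtype=bool)
    in_abB = np.asarray(in_abB, dtype=bool)
    dA = np.asarray(dA, dtype=float)
    dB = np.asarray(dB, dtype=float)

    def composite(zA, zB):
        t_a = np.sum(dA * zA * ~in_abA)          # domain a, from sample A
        t_ab_A = np.sum(dA * zA * in_abA)        # domain ab, from sample A
        t_ab_B = np.sum(dB * zB * in_abB)        # domain ab, from sample B
        t_b = np.sum(dB * zB * ~in_abB)          # domain b, from sample B
        return t_a + theta * t_ab_A + (1.0 - theta) * t_ab_B + t_b

    zA = np.asarray(yA, dtype=float)
    zB = np.asarray(yB, dtype=float)
    return float(composite(zA, zB) / composite(np.ones_like(zA), np.ones_like(zB)))


def replicate_summary(estimates, true_value):
    """Monte Carlo summaries: replicate mean, standard deviation, MSE, relative bias."""
    est = np.asarray(estimates, dtype=float)
    mean = float(est.mean())
    return {
        "mean": mean,
        "sd": float(est.std(ddof=1)),
        "mse": float(np.mean((est - true_value) ** 2)),
        "relative_bias": (mean - true_value) / true_value,
    }


# Tiny hypothetical samples from each frame
p_hat = hartley_proportion([1, 0, 1], [10, 10, 10], [False, True, True],
                           [0, 1], [20, 20], [True, False])
print(p_hat, replicate_summary([0.4, 0.6], 0.5))
```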

Results and Conclusions
The estimators $\hat P_1$ and $\hat P_2$ showed smaller relative bias than $\hat P_3$.
In terms of standard deviation, $\hat P_3$ performed better for all sample sizes.
The relative bias of $\hat P_3$ did not change across the sample sizes considered, which suggests further study to correct this bias.
Correct specification of the model, or the amount of auxiliary information available in the study, could improve the performance of $\hat P_3$.

Impacts of Nonsampling Errors on Estimates for the Conservation Effects Assessment Project
Andreea Erciulescu and Emily Berg
Department of Statistics, Iowa State University

Background and Objectives
Conservation Effects Assessment Project (CEAP): environmental impacts of conservation practices.
Population: cultivated cropland.
Estimation domains: watersheds (8-digit watersheds nested in 4-digit watersheds), Boone/Raccoon River Watershed (Iowa).
Sample of locations classified as cultivated cropland according to the National Resources Inventory (NRI).
A computer model converts the collected data to analysis variables: soil erosion (RUSLE2), wind erosion, nitrogen run-off.
Nonsampling errors in CEAP:
Nonresponse (refusals).
Frame undercoverage (limited information on land use at the sample design stage).

Methods and Results
Auxiliary information used to evaluate bias due to nonsampling errors:
slope and soil erodibility index from the Soil Survey (known for the full population);
soil erosion based on the Universal Soil Loss Equation from the NRI (known for the NRI sample).
Means are compared using t-tests, and locations using nonparametric tests.
There is little evidence of nonresponse bias.
There is evidence of bias due to frame undercoverage, especially in southern watersheds, where slopes are steeper and changes between non-cultivated and cultivated cropland are more common.
Ongoing work: calibration to adjust for bias due to frame undercoverage; small area estimation for 8-digit watersheds.
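For example, when respondents and nonrespondents are compared on an auxiliary variable known for both, the two test statistics can be computed as below. The Welch (unequal variance) form of the t-test and the Mann-Whitney U statistic are assumed choices, since the poster does not name the specific tests, and the values are hypothetical.

```python
import numpy as np


def welch_t(a, b):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    return float(t), float(df)


def mann_whitney_u(a, b):
    """U statistic: number of pairs with a > b, counting ties as one half."""
    a = np.asarray(a, dtype=float)[:, None]
    b = np.asarray(b, dtype=float)[None, :]
    return float(np.sum(a > b) + 0.5 * np.sum(a == b))


# Hypothetical auxiliary-variable values for respondents and nonrespondents
resp = [1.0, 2.0, 3.0, 4.0]
nonresp = [2.0, 4.0, 6.0, 8.0]
print(welch_t(resp, nonresp), mann_whitney_u(resp, nonresp))
```

For p-values, `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.mannwhitneyu(a, b)` provide these tests.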

Jackknife Empirical Likelihood for Regression Imputation Estimation
Sixia Chen and Pingshou Zhong
Westat and Michigan State University

Item Nonresponse in Auxiliary Variables Used in Weighting Adjustments for Survey Sample Data
Raphael Nishimura
Michigan Program in Survey Methodology, Institute for Social Research, University of Michigan

Background and Motivation
Auxiliary variables in weighting for survey sampling:
Population aggregates (control totals) are known for the auxiliary variables: $t_x = \sum_{i \in U} x_i$.
Design weights are adjusted to match the population totals: $\sum_{i \in s} w_i x_i = t_x$.
This improves the precision of the estimates.
Calibration (Deville and Särndal, 1992); special case: the linear GREG (generalized regression) estimator.
Requirement: the auxiliary variables are observed for all sampled elements.
However, some important auxiliary variables may not be completely observed.
In practice, auxiliary variables are imputed when missing, or are not used in weighting.
What is the impact of such procedures on the survey estimates?
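The linear calibration (GREG) step can be sketched as below: the weights $w_i = d_i(1 + x_i^T\lambda)$ satisfy the calibration constraint exactly, and $\sum_{i \in s} w_i y_i$ is then the GREG estimator of the total of $y$. The data and control totals are hypothetical. The function needs a complete $x_i$ for every sampled unit, which is exactly the requirement that item nonresponse in the auxiliary variables breaks.

```python
import numpy as np


def linear_calibration(d, X, t_x):
    """Linear (GREG-type) calibration weights w_i = d_i (1 + x_i' lam) satisfying
    sum_i w_i x_i = t_x. Every sampled unit needs a complete x_i."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    t_hat = X.T @ d                                  # Horvitz-Thompson totals of x
    T = (X * d[:, None]).T @ X                       # sum_i d_i x_i x_i'
    lam = np.linalg.solve(T, np.asarray(t_x, dtype=float) - t_hat)
    return d * (1.0 + X @ lam)


# Hypothetical sample of 4 units with an intercept and one auxiliary variable
d = np.full(4, 10.0)
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 5.0], [1.0, 6.0]])
t_x = np.array([45.0, 180.0])                        # known control totals (N, t_x)
w = linear_calibration(d, X, t_x)
y = np.array([3.0, 4.0, 7.0, 8.0])
print(w, w @ X, w @ y)                               # w @ y is the GREG estimate of t_y
```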

Results and Conclusions
Missing values in auxiliary variables used in weighting adjustments:
Never use complete cases only: this gives a larger variance (reduced sample size) and potential bias.
Calibration using an auxiliary variable with imputed values is worthwhile when the data are MAR (correctly specified imputation model), the variable is highly correlated with the survey variable, and the missing rate is not high.
Otherwise, using other auxiliary variables with lower missing rates and/or higher correlation with the survey variables might be a better alternative.


More information

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level A Monte Carlo Simulation to Test the Tenability of the SuperMatrix Approach Kyle M Lang Quantitative Psychology

More information

Model Assisted Survey Sampling

Model Assisted Survey Sampling Carl-Erik Sarndal Jan Wretman Bengt Swensson Model Assisted Survey Sampling Springer Preface v PARTI Principles of Estimation for Finite Populations and Important Sampling Designs CHAPTER 1 Survey Sampling

More information

LIKELIHOOD RATIO INFERENCE FOR MISSING DATA MODELS

LIKELIHOOD RATIO INFERENCE FOR MISSING DATA MODELS LIKELIHOOD RATIO IFERECE FOR MISSIG DATA MODELS KARU ADUSUMILLI AD TAISUKE OTSU Abstract. Missing or incomplete outcome data is a ubiquitous problem in biomedical and social sciences. Under the missing

More information

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY

REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY REPLICATION VARIANCE ESTIMATION FOR THE NATIONAL RESOURCES INVENTORY J.D. Opsomer, W.A. Fuller and X. Li Iowa State University, Ames, IA 50011, USA 1. Introduction Replication methods are often used in

More information

The propensity score with continuous treatments

The propensity score with continuous treatments 7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions

Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Causal Inference with a Continuous Treatment and Outcome: Alternative Estimators for Parametric Dose-Response Functions Joe Schafer Office of the Associate Director for Research and Methodology U.S. Census

More information

Weighting in survey analysis under informative sampling

Weighting in survey analysis under informative sampling Jae Kwang Kim and Chris J. Skinner Weighting in survey analysis under informative sampling Article (Accepted version) (Refereed) Original citation: Kim, Jae Kwang and Skinner, Chris J. (2013) Weighting

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

Estimating the Marginal Odds Ratio in Observational Studies

Estimating the Marginal Odds Ratio in Observational Studies Estimating the Marginal Odds Ratio in Observational Studies Travis Loux Christiana Drake Department of Statistics University of California, Davis June 20, 2011 Outline The Counterfactual Model Odds Ratios

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse

Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse Nanhua Zhang Division of Biostatistics & Epidemiology Cincinnati Children s Hospital Medical Center (Joint work

More information

Chapter 2. Section Section 2.9. J. Kim (ISU) Chapter 2 1 / 26. Design-optimal estimator under stratified random sampling

Chapter 2. Section Section 2.9. J. Kim (ISU) Chapter 2 1 / 26. Design-optimal estimator under stratified random sampling Chapter 2 Section 2.4 - Section 2.9 J. Kim (ISU) Chapter 2 1 / 26 2.4 Regression and stratification Design-optimal estimator under stratified random sampling where (Ŝxxh, Ŝxyh) ˆβ opt = ( x st, ȳ st )

More information

ENTROPY BALANCING IS DOUBLY ROBUST QINGYUAN ZHAO. Department of Statistics, Stanford University DANIEL PERCIVAL. Google Inc.

ENTROPY BALANCING IS DOUBLY ROBUST QINGYUAN ZHAO. Department of Statistics, Stanford University DANIEL PERCIVAL. Google Inc. ENTROPY BALANCING IS DOUBLY ROBUST QINGYUAN ZHAO Department of Statistics, Stanford University DANIEL PERCIVAL Google Inc. Abstract. Covariate balance is a conventional key diagnostic for methods used

More information

A Sampling of IMPACT Research:

A Sampling of IMPACT Research: A Sampling of IMPACT Research: Methods for Analysis with Dropout and Identifying Optimal Treatment Regimes Marie Davidian Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Longitudinal analysis of ordinal data

Longitudinal analysis of ordinal data Longitudinal analysis of ordinal data A report on the external research project with ULg Anne-Françoise Donneau, Murielle Mauer June 30 th 2009 Generalized Estimating Equations (Liang and Zeger, 1986)

More information

Missing Covariate Data in Matched Case-Control Studies

Missing Covariate Data in Matched Case-Control Studies Missing Covariate Data in Matched Case-Control Studies Department of Statistics North Carolina State University Paul Rathouz Dept. of Health Studies U. of Chicago prathouz@health.bsd.uchicago.edu with

More information

Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling

Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling Nonparametric Regression Estimation of Finite Population Totals under Two-Stage Sampling Ji-Yeon Kim Iowa State University F. Jay Breidt Colorado State University Jean D. Opsomer Colorado State University

More information

arxiv: v2 [math.st] 20 Jun 2014

arxiv: v2 [math.st] 20 Jun 2014 A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun

More information

Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk

Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk Causal Inference in Observational Studies with Non-Binary reatments Statistics Section, Imperial College London Joint work with Shandong Zhao and Kosuke Imai Cass Business School, October 2013 Outline

More information

VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA

VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA Submitted to the Annals of Applied Statistics VARIANCE ESTIMATION FOR NEAREST NEIGHBOR IMPUTATION FOR U.S. CENSUS LONG FORM DATA By Jae Kwang Kim, Wayne A. Fuller and William R. Bell Iowa State University

More information

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation

Nuisance parameter elimination for proportional likelihood ratio models with nonignorable missingness and random truncation Biometrika Advance Access published October 24, 202 Biometrika (202), pp. 8 C 202 Biometrika rust Printed in Great Britain doi: 0.093/biomet/ass056 Nuisance parameter elimination for proportional likelihood

More information

New Developments in Nonresponse Adjustment Methods

New Developments in Nonresponse Adjustment Methods New Developments in Nonresponse Adjustment Methods Fannie Cobben January 23, 2009 1 Introduction In this paper, we describe two relatively new techniques to adjust for (unit) nonresponse bias: The sample

More information

arxiv: v1 [stat.me] 15 May 2011

arxiv: v1 [stat.me] 15 May 2011 Working Paper Propensity Score Analysis with Matching Weights Liang Li, Ph.D. arxiv:1105.2917v1 [stat.me] 15 May 2011 Associate Staff of Biostatistics Department of Quantitative Health Sciences, Cleveland

More information

ENTROPY BALANCING IS DOUBLY ROBUST. Department of Statistics, Wharton School, University of Pennsylvania DANIEL PERCIVAL. Google Inc.

ENTROPY BALANCING IS DOUBLY ROBUST. Department of Statistics, Wharton School, University of Pennsylvania DANIEL PERCIVAL. Google Inc. ENTROPY BALANCING IS DOUBLY ROBUST QINGYUAN ZHAO arxiv:1501.03571v3 [stat.me] 11 Feb 2017 Department of Statistics, Wharton School, University of Pennsylvania DANIEL PERCIVAL Google Inc. Abstract. Covariate

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Analyzing Pilot Studies with Missing Observations

Analyzing Pilot Studies with Missing Observations Analyzing Pilot Studies with Missing Observations Monnie McGee mmcgee@smu.edu. Department of Statistical Science Southern Methodist University, Dallas, Texas Co-authored with N. Bergasa (SUNY Downstate

More information

analysis of incomplete data in statistical surveys

analysis of incomplete data in statistical surveys analysis of incomplete data in statistical surveys Ugo Guarnera 1 1 Italian National Institute of Statistics, Italy guarnera@istat.it Jordan Twinning: Imputation - Amman, 6-13 Dec 2014 outline 1 origin

More information

Inferences on missing information under multiple imputation and two-stage multiple imputation

Inferences on missing information under multiple imputation and two-stage multiple imputation p. 1/4 Inferences on missing information under multiple imputation and two-stage multiple imputation Ofer Harel Department of Statistics University of Connecticut Prepared for the Missing Data Approaches

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Basics of Modern Missing Data Analysis

Basics of Modern Missing Data Analysis Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing

More information

Biostat 2065 Analysis of Incomplete Data

Biostat 2065 Analysis of Incomplete Data Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh September 13 & 15, 2005 1. Complete-case analysis (I) Complete-case analysis refers to analysis based on

More information

Estimation from Purposive Samples with the Aid of Probability Supplements but without Data on the Study Variable

Estimation from Purposive Samples with the Aid of Probability Supplements but without Data on the Study Variable Estimation from Purposive Samples with the Aid of Probability Supplements but without Data on the Study Variable A.C. Singh,, V. Beresovsky, and C. Ye Survey and Data Sciences, American nstitutes for Research,

More information

Missing covariate data in matched case-control studies: Do the usual paradigms apply?

Missing covariate data in matched case-control studies: Do the usual paradigms apply? Missing covariate data in matched case-control studies: Do the usual paradigms apply? Bryan Langholz USC Department of Preventive Medicine Joint work with Mulugeta Gebregziabher Larry Goldstein Mark Huberman

More information

GENERALIZED SCORE TESTS FOR MISSING COVARIATE DATA. A Dissertation LEI JIN

GENERALIZED SCORE TESTS FOR MISSING COVARIATE DATA. A Dissertation LEI JIN GENERALIZED SCORE TESTS FOR MISSING COVARIATE DATA A Dissertation by LEI JIN Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree

More information

On the Use of Linear Fixed Effects Regression Models for Causal Inference

On the Use of Linear Fixed Effects Regression Models for Causal Inference On the Use of Linear Fixed Effects Regression Models for ausal Inference Kosuke Imai Department of Politics Princeton University Joint work with In Song Kim Atlantic ausal Inference onference Johns Hopkins

More information

An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys

An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys An Overview of the Pros and Cons of Linearization versus Replication in Establishment Surveys Richard Valliant University of Michigan and Joint Program in Survey Methodology University of Maryland 1 Introduction

More information

Chapter 4. Parametric Approach. 4.1 Introduction

Chapter 4. Parametric Approach. 4.1 Introduction Chapter 4 Parametric Approach 4.1 Introduction The missing data problem is already a classical problem that has not been yet solved satisfactorily. This problem includes those situations where the dependent

More information

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes

Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Kosuke Imai Department of Politics Princeton University July 31 2007 Kosuke Imai (Princeton University) Nonignorable

More information

Econometric Analysis of Cross Section and Panel Data

Econometric Analysis of Cross Section and Panel Data Econometric Analysis of Cross Section and Panel Data Jeffrey M. Wooldridge / The MIT Press Cambridge, Massachusetts London, England Contents Preface Acknowledgments xvii xxiii I INTRODUCTION AND BACKGROUND

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Calibration estimation in survey sampling

Calibration estimation in survey sampling Calibration estimation in survey sampling Jae Kwang Kim Mingue Park September 8, 2009 Abstract Calibration estimation, where the sampling weights are adjusted to make certain estimators match known population

More information

Likelihood-based inference with missing data under missing-at-random

Likelihood-based inference with missing data under missing-at-random Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric

More information

MISSING or INCOMPLETE DATA

MISSING or INCOMPLETE DATA MISSING or INCOMPLETE DATA A (fairly) complete review of basic practice Don McLeish and Cyntha Struthers University of Waterloo Dec 5, 2015 Structure of the Workshop Session 1 Common methods for dealing

More information

Causal Inference Basics

Causal Inference Basics Causal Inference Basics Sam Lendle October 09, 2013 Observed data, question, counterfactuals Observed data: n i.i.d copies of baseline covariates W, treatment A {0, 1}, and outcome Y. O i = (W i, A i,

More information

The Use of Survey Weights in Regression Modelling

The Use of Survey Weights in Regression Modelling The Use of Survey Weights in Regression Modelling Chris Skinner London School of Economics and Political Science (with Jae-Kwang Kim, Iowa State University) Colorado State University, June 2013 1 Weighting

More information