Graybill Conference Poster Session Introductions


Graybill Conference Poster Session Introductions 2013 Graybill Conference in Modern Survey Statistics Colorado State University Fort Collins, CO June 10, 2013

Small Area Estimation with Incomplete Auxiliary Information Andreea L. Erciulescu and Wayne A. Fuller Department of Statistics, Iowa State University June 10, 2013

Background and Motivation
Surveys are often designed to obtain specific information about totals and means, but direct estimates for small areas may be unreliable because of small sample sizes.
Model-based procedures have been used to construct estimates for small areas by exploiting auxiliary information.
We fit nested models with a binary response and random area effects:
$E(y_{ij} \mid b_i) = p_{ij}(x_{ij}, b_i) = \frac{\exp(x_{ij}\beta + b_i)}{1 + \exp(x_{ij}\beta + b_i)}$
The goal is to construct small area predictions for the mean of a binomial variable, using different amounts of auxiliary information.
The true small area mean of $y$ is $\theta_i = \int p_{ij}(x_{ij}, b_i)\, dF_{x_i}(x)$.

Results and Conclusions
We consider three cases of auxiliary information:
(i) $F_{x_i}(\mu_{x_i}, \Sigma_{xx})$ known;
(ii) $\mu_{x_i}$ unknown and fixed, estimated using weights $\omega_{ij}$ such that $\sum_{j=1}^{n_i} \omega_{ij} = 1$ and $E\left(\sum_{j=1}^{n_i} \omega_{ij} x_{ij}\right) = \mu_{x_i}$;
(iii) $\mu_{x_i}$ unknown and random, estimated using $\mu_{x_i} \sim (\mu_x, \Sigma_{\delta\delta})$ and $x_i \mid \mu_{x_i} \sim (\mu_{x_i}, \Sigma_{xx})$.
We construct small area predictions for the area means by integrating over the covariate distribution and over the distribution of the random area effects.
We compare the prediction error biases and mean squared errors in a simulation study.
We conclude that, generally, it is better to include auxiliary information in the model and estimate its distribution than to ignore the auxiliary information.
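The integration over the covariate distribution can be illustrated with a short Monte Carlo sketch. It is not the authors' procedure: it assumes a single scalar covariate that is normally distributed within the area, a known random effect b_i, and a separate intercept beta0. The function name and all arguments are hypothetical.

```python
import numpy as np

def small_area_mean(beta0, beta1, b_i, mu_x, sigma_x, n_draws=200_000, seed=0):
    """Monte Carlo approximation of theta_i = integral of p(x, b_i) dF_{x_i}(x).

    Assumption: F_{x_i} is N(mu_x, sigma_x^2) and the linear predictor is
    beta0 + beta1 * x + b_i (illustrative parametrisation only).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(mu_x, sigma_x, size=n_draws)  # draws from the assumed F_{x_i}
    eta = beta0 + beta1 * x + b_i                # linear predictor x*beta + b_i
    p = 1.0 / (1.0 + np.exp(-eta))               # logistic response probability
    return float(p.mean())

# Averaging the logistic curve over x is not the same as plugging in the mean of x.
theta = small_area_mean(0.0, 1.0, 0.0, mu_x=2.0, sigma_x=2.0)
plug_in = 1.0 / (1.0 + np.exp(-2.0))
```

Here `theta` falls below `plug_in`, which is why the distribution of the covariate, and not only its mean, enters the prediction.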

Variance Estimation after Multiple Imputation Jiwei Zhao Department of Biostatistics, Yale University School of Public Health June 10, 2013

Background and Motivation
Consider $(R, Y, X)$, where $R = 1$ if $Y$ is observed and $X$ is fully observed.
Missing data mechanism: $p(R = 1 \mid Y, X)$.
Regression model: $p(Y \mid X; \theta)$.
Problem of interest: comparison of the variance of estimators of $\theta$ before and after multiple imputation (MI).
Under the MAR assumption, the estimator after MI is less efficient, and more general results have been obtained (Wang and Robins 1998, Kim and Shao 2013).
Before MI: $V_p = I_{obs}^{-1}$.
After MI: $V_{p,MI} = I_{obs}^{-1} + M^{-1} I_c^{-1} I_{mis} I_c^{-1}$.
What is the situation under nonignorable missingness?
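As a numerical illustration of the two variance expressions under MAR, the sketch below evaluates them for given information matrices. It assumes the usual decomposition in which the complete-data information is the sum of the observed and missing information, which the slide does not state explicitly. The function name is hypothetical.

```python
import numpy as np

def mi_variance(i_obs, i_mis, m):
    """Return (variance before MI, variance after MI with m imputations)."""
    i_obs = np.atleast_2d(np.asarray(i_obs, dtype=float))
    i_mis = np.atleast_2d(np.asarray(i_mis, dtype=float))
    i_c = i_obs + i_mis                 # assumption: I_c = I_obs + I_mis
    i_c_inv = np.linalg.inv(i_c)
    v_before = np.linalg.inv(i_obs)     # V_p = I_obs^{-1}
    v_after = v_before + (i_c_inv @ i_mis @ i_c_inv) / m
    return v_before, v_after

# Scalar example: I_obs = 2, I_mis = 2, M = 5 imputations.
v_before, v_after = mi_variance(2.0, 2.0, m=5)
```

In this scalar case the extra term is (1/4)(2)(1/4)/5 = 0.025, so the variance grows from 0.5 to 0.525, and the penalty vanishes as M increases.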

Results and Conclusions
How to propose a preliminary estimator? Assume $p(R = 1 \mid Y, X) = p(R = 1 \mid Y)$ (Tang, Little and Raghunathan, 2003), so that
$p(X \mid Y, R = 1) = p(X \mid Y) = \frac{p(Y \mid X; \theta)\, p(X)}{\int p(Y \mid X; \theta)\, p(X)\, dX}$
How to conduct MI? Impute from
$p(Y \mid X, R = 0) = \frac{p(R = 0 \mid Y)\, p(Y \mid X; \theta)}{\int p(R = 0 \mid Y)\, p(Y \mid X; \theta)\, dY}$
Simulation studies show that MI could improve the efficiency of the estimator of $\theta$.
Ongoing project: general results are still under investigation.
Any comments or suggestions are appreciated!

A Semiparametric Approach to Modeling Survey Data in the Presence of Informative Sampling Wade W. Herndon, Jean Opsomer, and F. Jay Breidt Department of Statistics, Colorado State University June 10, 2013

Background and Motivation
Under an informative sampling design, the model that holds at the sample level does not hold at the population level.
We want to estimate $f(y_k \mid x_k)$.
Because the sampling is informative, we must include additional design variables to account for the design information, yielding $f_1(y_k \mid x_k, z_k)$.
We can recover the original regression relationship of interest via
$f(y_k \mid x_k) = \int f_1(y_k \mid x_k, z_k)\, f_2(z_k \mid x_k)\, dz_k$
The goal is to use model covariates to integrate the design effects out of the model.

Results and Conclusions
For many applications, sample weights can be included as model covariates to account for the design bias, and then subsequently estimated by a nonparametric estimator using model covariates.
The full regression model is $y = x^T \beta + w x^T \gamma + \epsilon$.
A semiparametric model is proposed where:
$(\hat\beta^T, \hat\gamma^T)$ come from the regression of $y$ on $x$ and $w$;
$E[w \mid x]$ is estimated by a nonparametric, design-based estimator.
The nonparametric estimator is combined with the parametric regression to form an estimator for $y$ that is a smooth function of $x$.
Nonparametric methods are used here to integrate the design effects out of the model.
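The two-stage construction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' estimator: the covariate is scalar, an intercept is added, and a design-weighted Gaussian kernel smoother stands in for the unspecified nonparametric estimator of E[w | x]. All function names and the bandwidth argument are hypothetical.

```python
import numpy as np

def fit_weight_augmented(x, w, y):
    """Least squares fit of y = X beta + w * (X gamma) + error, with X = [1, x]."""
    X = np.column_stack([np.ones_like(x), x])
    Z = np.hstack([X, w[:, None] * X])           # columns for beta, then gamma
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    p = X.shape[1]
    return coef[:p], coef[p:]

def kernel_mean_weight(x0, x, w, bandwidth):
    """Design-weighted kernel estimate of E[w | x] at the points x0 (assumed form)."""
    k = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / bandwidth) ** 2)
    # Each sampled unit is weighted by w, so the smoother targets the population.
    return (k * (w * w)).sum(axis=1) / (k * w).sum(axis=1)

def predict(x0, beta, gamma, x, w, bandwidth):
    """Plug the estimated E[w | x0] into the fitted parametric regression."""
    X0 = np.column_stack([np.ones_like(x0), x0])
    w_hat = kernel_mean_weight(x0, x, w, bandwidth)
    return X0 @ beta + w_hat * (X0 @ gamma)

# Noise-free toy data, used only to check that the pieces fit together.
x = np.linspace(0.0, 1.0, 50)
w = 1.0 + x ** 2
y = 1.0 + 2.0 * x + w * (0.5 + 0.3 * x)
beta, gamma = fit_weight_augmented(x, w, y)
pred = predict(np.array([0.25, 0.75]), beta, gamma, x, w, bandwidth=0.1)
```

The prediction depends on x only, because the weight has been replaced by its smoothed conditional mean.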

The use of followups for propensity score adjustment with nonignorable nonresponse Jongho Im and Jae-Kwang Kim Department of Statistics, Iowa State University June 10, 2013

Background and Motivation
Nonignorable nonresponse bias can be corrected with followups. Our goal is to provide a propensity score adjusted estimator,
$\hat Y = \sum_{i=1}^{n} d_i \delta_{i,t-1} y_i + \sum_{i=1}^{n} (1 - \delta_{i,t-1}) \delta_{it} \frac{d_i y_i}{\hat p_{it}}$
for $t = 1, \ldots, T$ with $\delta_{i0} = 0$.
$d_i$ is the sampling weight.
$A_t$ is the set of all respondents up to the $t$-th contact; $A_1 \subset \cdots \subset A_T$.
$\delta_{it}$ is equal to 1 if $i \in A_t$ and 0 otherwise.
$p_{it}$ is the conditional response probability at the $t$-th contact,
$p_{it} \equiv P(\delta_{it} = 1 \mid \delta_{i,t-1} = 0, y_i) = \{1 + \exp(\alpha_t + \phi y_i)\}^{-1}$
Alho (1990) considered a conditional likelihood based approach to estimating $\hat p_{it}$, assuming a multinomial likelihood on $P(\delta_{it} = 1 \mid \delta_{i,t-1} = 0, y_i, \delta_{iT} = 1)$ instead of on $p_{it}$.

Results and Conclusions
Since $E[\delta_{it} \mid \delta_{i,t-1}, y_i] = p_{it}$, given the sets of respondents $A_1$ and $A_2$, we can write
$\sum_{i \in A} d_i \frac{\delta_{i1}}{p_{i1}} (1, y_i) = (N, Y)$ and $\sum_{i \in A} d_i = N$ (1)
$\sum_{i \in A} d_i \delta_{i1} (1, y_i) + \sum_{i \in A} d_i \frac{(1 - \delta_{i1}) \delta_{i2}}{p_{i2}} (1, y_i) = (N, Y)$ (2)
We have 3 equations and 3 parameters in (1) and (2).
We can apply the generalized method of moments (GMM) in general followup cases, where there are more equations than parameters.
Variance estimation is relatively easy (GMM estimator).
The approach is more robust than likelihood-based methods.
Auxiliary variable information can be incorporated as additional calibration equations.
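To make the propensity score adjusted estimator from the Background slide concrete, the sketch below evaluates it at one contact attempt, given already-estimated response probabilities. It does not solve the estimating equations (1) and (2); the fitted probabilities are taken as inputs, and the function and argument names are hypothetical.

```python
import numpy as np

def response_prob(alpha_t, phi, y):
    """Response model from the slide: p_it = 1 / (1 + exp(alpha_t + phi * y_i))."""
    return 1.0 / (1.0 + np.exp(alpha_t + phi * np.asarray(y, dtype=float)))

def followup_ps_total(d, y, delta_prev, delta_curr, p_hat):
    """Total estimate: earlier respondents unadjusted, new respondents divided by p_hat."""
    d = np.asarray(d, dtype=float)
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    delta_prev = np.asarray(delta_prev)
    delta_curr = np.asarray(delta_curr)
    early = delta_prev == 1                      # responded before contact t
    late = (delta_prev == 0) & (delta_curr == 1) # first responded at contact t
    total_early = np.sum(d[early] * y[early])
    total_late = np.sum(d[late] * y[late] / p_hat[late])
    return float(total_early + total_late)

# Four sampled units; the last one never responds, so its y is unknown (nan).
d = [2.0, 2.0, 2.0, 2.0]
y = [1.0, 2.0, 3.0, float("nan")]
total = followup_ps_total(d, y, [1, 0, 0, 0], [1, 1, 1, 0], [1.0, 0.5, 0.5, 0.5])
```

In this toy case the first unit contributes 2, and the two followup respondents contribute 8 and 12 after division by 0.5, giving 22. Nonrespondents are never indexed, so their missing y values do not enter the sum.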

Varying Coefficient Models in Finite Population Sampling Luis Fernando Contreras Cruz COLPOS Mexico June 10, 2013

Background and Motivation
A model-assisted semiparametric method of estimating population totals is investigated to improve the precision of survey estimators by incorporating multivariate auxiliary information.
The proposed superpopulation model is a varying coefficient model (VCM).
Varying coefficient models (Hastie and Tibshirani, 1993) and many of their variations (e.g. Hoover, 1998) have gained much attention in the literature.
Applications are found in various scientific areas, such as economics, business and medical science (see Fan, 2008 for a review).
Both simulated and real data examples are given to illustrate the model and the proposed estimation methodology; they provide strong evidence that corroborates the asymptotic theory.

Results and Conclusions
A way to obtain the smoothing parameters using cross-validation was proposed.
The VCM identifies nonlinear relations between the variables.
VCM-assisted models contribute to semiparametric regression in survey sampling.
Variance estimation using cross-validation and g-weights works well in the simulation studies and in the application.
Cross-validation is used to avoid overfitting.
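Choosing a smoothing parameter by cross-validation can be sketched for a simple varying coefficient model y = a(u) + b(u) x + error. The slides do not say which smoother the poster uses, so the kernel-weighted least squares fit, the Gaussian kernel and the leave-one-out criterion below are all assumptions, and every name is hypothetical.

```python
import numpy as np

def local_coefficients(u0, u, x, y, h, exclude=None):
    """Kernel-weighted least squares estimate of (a(u0), b(u0))."""
    k = np.exp(-0.5 * ((u - u0) / h) ** 2)       # Gaussian kernel weights in u
    if exclude is not None:
        k[exclude] = 0.0                         # drop one unit for leave-one-out
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(k)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

def loo_cv_score(u, x, y, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    errors = []
    for i in range(len(y)):
        a, b = local_coefficients(u[i], u, x, y, h, exclude=i)
        errors.append(y[i] - (a + b * x[i]))
    return float(np.mean(np.square(errors)))

def select_bandwidth(u, x, y, candidates):
    """Return the candidate bandwidth with the smallest cross-validation score."""
    scores = {h: loo_cv_score(u, x, y, h) for h in candidates}
    return min(scores, key=scores.get), scores

# Simulated example with coefficients that vary smoothly in u.
rng = np.random.default_rng(1)
u = rng.uniform(0.0, 1.0, 80)
x = rng.normal(size=80)
y = np.sin(2 * np.pi * u) + (1.0 + u) * x + 0.1 * rng.normal(size=80)
best, scores = select_bandwidth(u, x, y, [0.02, 0.1, 1.0])
```

A very large bandwidth forces the coefficients to be nearly constant and is penalised by the cross-validation score, while a very small one fits too few points locally; the criterion trades these off.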

Application of Z-estimation Theory to Calibrated Estimators for Semiparametric Models with Two-phase Stratified Sampling Jie Kate Hu, Gary Chan, Norman Breslow Department of Biostatistics University of Washington, Seattle, WA June 10, 2013

Motivation
In epidemiological studies, we are usually interested in parameters specified in a (semi)parametric model describing an association between an exposure and an outcome, for example $\lambda(t \mid Z) = \lambda_0(t) + \theta^T Z$.
To improve efficiency, we consider a two-phase stratified sampling design and calibration estimators that use auxiliary variables available for all cohort members.
Our goal is to estimate both Euclidean and infinite-dimensional parameters simultaneously in semiparametric models using inverse probability weighted estimating equations (IPW-EE) with calibration.

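Results
Let $X$ be the variable of interest. Motivated by the semiparametric model, $\alpha_0$ is defined as the unique solution of $\Psi(\alpha) = E\psi_\alpha(X) = 0$.
Let the vector $\tilde V = \tilde V(V)$ be the calibration variable.
The calibrated estimator $\hat\alpha$ is obtained by solving the calibrated IPW-EE
$\Psi_N(\alpha, \gamma) = \frac{1}{N} \sum_{i=1}^{N} \psi_{\alpha,\gamma}(X_i, V_i, R_i) = 0$, where $\psi_{\alpha,\gamma}(X, V, R) = (\psi_{1,\alpha,\gamma}, \psi_{2,\gamma})$ with
$\psi_{1,\alpha,\gamma}(X, V, R) = \frac{R}{\pi_0(V)} \exp(-\gamma^T \tilde V)\, \psi_\alpha(X)$
$\psi_{2,\gamma}(V, R) = \frac{R}{\pi_0(V)} \exp(-\gamma^T \tilde V)\, \tilde V - \tilde V$
Asymptotic distribution of $\hat\alpha$:
$\sqrt{N}(\hat\alpha - \alpha_0) = -(\Psi^c_{11})^{-1} G_N \psi_{1,\alpha_0,0} + (\Psi^c_{11})^{-1} \Psi^c_{12} (\Psi^c_{22})^{-1} G_N \psi_{2,0} + o_p(1)$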

Estimation of Cluster-level Regression Model under Nonresponse within Clusters Nuanpan Nangsue Social Sciences, University of Southampton, UK June 10, 2013

Background and Motivation
Aim: to examine new methods of analysis that incorporate information on nonresponse in the model.
The model of interest is a cluster-level regression model for the cluster mean $\bar Y_i$ of $y_{ij}$:
$\bar Y_i = x_i \beta + \epsilon_i$ (3)
We suppose that, underlying (3), we may write
$y_{ij} = x_i \beta + \epsilon_{ij}$ (4)
To model the response outcome $R_{ij}$, we introduce a variable $u_{ij}$ so that $R_{ij} = 1$ if $u_{ij} > 0$ and $R_{ij} = 0$ otherwise. We assume that
$u_{ij} = z_i \gamma + \delta_{ij}$ (5)
The inferential problem is how to use the observed data on $y_{ij}$, $x_i$ and $z_i$ to make inference about $\beta$.

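Results and Conclusions
To develop an estimator following the approach of Heckman (1976), we may write
$E(y_{ij} \mid R_{ij} = 1) = x_i \beta + c\, \lambda\left(\frac{z_i \gamma}{\sigma_\delta}\right)$ (6)
where $c = \sigma_{\epsilon\delta} \sigma_\delta^{-1}$ and $\lambda\left(\frac{z_i \gamma}{\sigma_\delta}\right) = \phi\left(\frac{z_i \gamma}{\sigma_\delta}\right) / \Phi\left(\frac{z_i \gamma}{\sigma_\delta}\right)$.
A simpler version of this estimator is obtained by noting that, for large $m_i$, the response rate $p_i = r_i / m_i$ may be expressed approximately as
$p_i \approx E(R_{ij}) = \Phi\left(\frac{z_i \gamma}{\sigma_\delta}\right) = \Phi(\Psi_i)$ (7)
Now set $\hat\Psi_i = \Phi^{-1}(p_i)$ and replace $\lambda\left(\frac{z_i \hat\gamma}{\hat\sigma_\delta}\right)$ by $\lambda(\hat\Psi_i)$ in the Heckman two-step approach.
An approximate Heckman maximum likelihood estimator is also obtained in order to estimate the regression coefficients $\beta$ and $c$.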
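The simplified two-step idea can be sketched in a few lines: turn each cluster's response rate into an estimated index through the inverse normal distribution function, form the ratio of the normal density to the normal distribution function, and regress the respondent cluster means on the covariate and that ratio. This is an illustration only, assuming a single scalar cluster covariate with an intercept and response rates strictly between 0 and 1. The function names are hypothetical.

```python
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()

def inverse_mills(psi):
    """lambda(psi) = phi(psi) / Phi(psi) for the standard normal distribution."""
    return _STD_NORMAL.pdf(psi) / _STD_NORMAL.cdf(psi)

def approximate_heckman_two_step(ybar_resp, x, resp_rate):
    """Regress respondent cluster means on (1, x_i, lambda(Psi_hat_i)).

    resp_rate holds p_i = r_i / m_i; values of exactly 0 or 1 are not allowed
    because the inverse normal cdf is undefined there.
    """
    psi_hat = [_STD_NORMAL.inv_cdf(p) for p in resp_rate]   # Psi_hat_i = Phi^{-1}(p_i)
    lam = np.array([inverse_mills(v) for v in psi_hat])
    X = np.column_stack([np.ones(len(lam)), np.asarray(x, dtype=float), lam])
    coef, *_ = np.linalg.lstsq(X, np.asarray(ybar_resp, dtype=float), rcond=None)
    return coef                                             # (intercept, slope, c)

# Noise-free check: the generating coefficients should be recovered.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
rates = [0.3, 0.5, 0.6, 0.8, 0.9]
lam_true = np.array([inverse_mills(_STD_NORMAL.inv_cdf(p)) for p in rates])
ybar = 1.0 + 2.0 * x + 3.0 * lam_true
coef = approximate_heckman_two_step(ybar, x, rates)
```

The appeal of the shortcut is that no probit model for the response needs to be fitted; the observed response rates carry the information about the selection index.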

Proportion estimators in dual frame surveys with auxiliary information Hemilio Coelho 1, Camila Silva 1 and Cristiano Ferraz 2 1. Department of Statistics, Federal University of Paraiba 2. Department of Statistics, Federal University of Pernambuco June 10, 2013

Background and Motivation
In dual frame surveys, probability samples are drawn independently from two overlapping frames, denoted by $A$ and $B$, with $A \cap B \neq \emptyset$.
The simultaneous use of both frames in a dual frame design generates three mutually exclusive domains: $a = A \cap B^c$, $b = B \cap A^c$ and $ab = A \cap B$.
Based on results of Hartley (1962), we propose three estimators of the population proportion assisted by regression models, denoted by $\hat P_1$, $\hat P_2$ and $\hat P_3$, where the model used in the third estimator is based on logistic regression.
The goal is to evaluate the performance of these estimators through Monte Carlo experiments.
All estimators were evaluated on the mean, standard deviation, mean squared error and relative bias over replicates.
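For orientation, the basic Hartley-type dual frame estimator that such proposals build on can be sketched as below. This is not one of the three regression-assisted estimators from the poster: it simply combines the overlap-domain estimates from the two samples with a fixed mixing constant theta, and estimates the proportion as a ratio of two such totals. The function names, the data layout and the default theta of 0.5 are assumptions.

```python
def hartley_total(sample_a, sample_b, theta=0.5):
    """Hartley-type total: Y_a + theta * Y_ab(A) + (1 - theta) * Y_ab(B) + Y_b.

    Each sample is a list of (y, weight, in_overlap) tuples, where in_overlap
    says whether the unit belongs to the overlap domain ab.
    """
    total = 0.0
    for y, w, in_overlap in sample_a:
        total += (theta if in_overlap else 1.0) * w * y
    for y, w, in_overlap in sample_b:
        total += ((1.0 - theta) if in_overlap else 1.0) * w * y
    return total

def hartley_proportion(sample_a, sample_b, theta=0.5):
    """Ratio of the estimated total of a 0/1 variable to the estimated population size."""
    ones_a = [(1.0, w, o) for _, w, o in sample_a]
    ones_b = [(1.0, w, o) for _, w, o in sample_b]
    return hartley_total(sample_a, sample_b, theta) / hartley_total(ones_a, ones_b, theta)

# Toy samples with a binary study variable.
sample_a = [(1, 10.0, False), (0, 10.0, False), (1, 10.0, True), (1, 10.0, True)]
sample_b = [(0, 5.0, False), (1, 5.0, True), (0, 5.0, True), (1, 5.0, False)]
total = hartley_total(sample_a, sample_b)
prop = hartley_proportion(sample_a, sample_b)
```

With theta = 0.5 the estimated total is 27.5 and the estimated population size is 45, so the proportion estimate is 27.5/45.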

Results and Conclusions
The results show that estimators $\hat P_1$ and $\hat P_2$ had less relative bias than estimator $\hat P_3$.
In terms of standard deviation, estimator $\hat P_3$ performed better for all sample sizes.
The relative bias of estimator $\hat P_3$ did not change across the sample sizes considered, which suggests further study to correct this bias.
Correct specification of the model, or the amount of auxiliary information present in the study, can improve the performance of estimator $\hat P_3$.

Impacts of Nonsampling Errors on Estimates for the Conservation Effects Assessment Project Andreea Erciulescu and Emily Berg Department of Statistics, Iowa State University June 10, 2013

Background and Objectives
Conservation Effects Assessment Project (CEAP): environmental impacts of conservation practices.
Population: cultivated cropland.
Estimation domains: watersheds (8-digit nested in 4-digit); Boone/Raccoon River Watershed (Iowa).
Sample of locations classified as cultivated cropland according to the National Resources Inventory (NRI).
A computer model converts the collected data to analysis variables: soil erosion (RUSLE2), wind erosion, nitrogen run-off.
Nonsampling errors in CEAP: nonresponse (refusals) and frame undercoverage (limited information on land use at the sample design stage).

Methods and Results
Auxiliary information to evaluate bias due to nonsampling errors:
Slope and soil erodibility index from the Soil Survey (known for the full population).
Soil erosion based on the Universal Soil Loss Equation from the NRI (known for the NRI sample).
Compare means using t-tests and locations using nonparametric tests.
Little evidence of nonresponse bias.
Evidence of bias due to frame undercoverage, especially in southern watersheds, where slopes are steeper and changes between non-cultivated and cultivated cropland are more common.
Ongoing work: calibration to adjust for bias due to frame undercoverage; small area estimation for 8-digit watersheds.

Jackknife Empirical Likelihood for Regression Imputation Estimation Sixia Chen and Pingshou Zhong Westat and Michigan State University June 10, 2013

Item Nonresponse in Auxiliary Variables Used in Weighting Adjustments for Survey Sample Data Raphael Nishimura Michigan Program in Survey Methodology, Institute for Social Research, University of Michigan June 10, 2013

Background and Motivation
Auxiliary variables in weighting for survey sampling:
Population aggregates (control totals) are known for the auxiliary variables: $t_x = \sum_{i \in U} x_i$.
Design weights are adjusted to match the population totals: $\sum_{i \in s} w_i x_i = t_x$.
This improves the precision of estimates.
Calibration (Deville and Särndal, 1992); special case: linear GREG (Generalized REGression) estimator.
Requirement: auxiliary variables observed for all sampled elements.
However, some important auxiliary variables may not be completely observed.
In practice, auxiliary variables are imputed when missing, or are not used in weighting.
What is the impact of such procedures on the survey estimates?
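The calibration constraint can be illustrated with the linear (GREG-type) case, where the adjusted weights have the closed form w_i = d_i (1 + x_i' lambda). The sketch below assumes fully observed auxiliary variables, which is exactly the requirement the poster questions; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def linear_calibration_weights(d, x, t_x):
    """Linear calibration: w_i = d_i * (1 + x_i' lam), chosen so that sum w_i x_i = t_x.

    d   : design weights, shape (n,)
    x   : auxiliary variables for the sampled units, shape (n, p)
    t_x : known population totals of the auxiliary variables, shape (p,)
    """
    d = np.asarray(d, dtype=float)
    x = np.asarray(x, dtype=float)
    t_x = np.asarray(t_x, dtype=float)
    ht_total = x.T @ d                       # design-weighted estimate of t_x
    m = (x * d[:, None]).T @ x               # sum_i d_i x_i x_i'
    lam = np.linalg.solve(m, t_x - ht_total)
    return d * (1.0 + x @ lam)

# Four sampled units, an intercept column and one auxiliary variable.
d = [10.0, 10.0, 10.0, 10.0]
x = [[1.0, 2.0], [1.0, 4.0], [1.0, 6.0], [1.0, 8.0]]
t_x = [45.0, 230.0]
w = linear_calibration_weights(d, x, t_x)
```

After the adjustment, the weighted sample totals of both columns of `x` reproduce `t_x` exactly. A missing entry in `x` for any sampled unit would make these weights undefined for that unit, which is why imputation or dropping the variable becomes necessary.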

Results and Conclusions
Missing values in auxiliary variables used in weighting adjustments:
Never use complete cases only: this gives larger variance (reduced sample size) and potential bias.
Calibration using an auxiliary variable with imputed values is worthwhile when the data are MAR (correctly specified imputation model), the correlation with the survey variable is high, and the missing rate is not high.
Otherwise, using other auxiliary variables with lower missing rates and/or higher correlation with the survey variables might be a better alternative.