Discussing Effects of Different MAR-Settings


Research Seminar, Department of Statistics, LMU Munich
Munich, 11.07.2014
Matthias Speidel, Jörg Drechsler, Joseph Sakshaug

Outline
- What we basically want to do: hierarchical imputation
- What we encounter: effects of MAR settings
- Questions: explanation of these effects

Our research
- DFG project
- Reason for hierarchical imputation:
  - Item nonresponse → nonresponse bias
  - Imputation → reduce or remove nonresponse bias
  - Hierarchical data → hierarchical imputation
- Aim: unbiased linear mixed model parameter estimates
- Which imputation methods work well, which do not, and why?
- Focus on dummy variable imputation

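Notation
Hierarchical data in our case:
\[
y_{j,i} = \beta_0 + b_{0j} + (\beta_1 + b_{1j})\, x_{j,i} + \varepsilon_{j,i},
\qquad i = 1, \dots, n_j, \quad j = 1, \dots, m
\]
- $n_j$: size of cluster $j$
- $m$: number of clusters
\[
\begin{pmatrix} b_{0j} \\ b_{1j} \end{pmatrix}
\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\;
\Sigma_b = \begin{pmatrix} \sigma^2_{b_0} & \sigma_{b_1 b_0} \\ \sigma_{b_0 b_1} & \sigma^2_{b_1} \end{pmatrix} \right),
\qquad \varepsilon_{j,i} \sim N(0, \sigma^2_\varepsilon)
\]
\[
\beta_j = \beta + b_j
\]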

Dummy variable imputation
- Dummy imputation is the only hierarchical imputation method implemented in SAS, SPSS and Stata.
- It estimates cluster-specific intercept and slope imputation parameters (assumed fixed, not random).
- Under which data-generating and missingness settings are the results biased?
- What explains the bias?
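To make the idea concrete, here is a minimal sketch of dummy imputation, assuming a single outcome y with missing values and one fully observed covariate x. Fitting a separate OLS line per cluster yields the same coefficients as a regression with cluster dummies and dummy-by-x interactions; the residual variance is pooled across clusters. The function name `dummy_impute` is hypothetical, and the sketch does a single stochastic imputation without drawing the parameters, so it is not the procedure of any particular software package.

```python
import numpy as np

def dummy_impute(y, x, cluster, rng):
    """Impute np.nan entries of y from cluster-specific (fixed) intercepts and slopes."""
    y_imp = y.copy()
    missing = np.isnan(y)
    betas, ssr, df = {}, 0.0, 0
    for c in np.unique(cluster):
        obs = (cluster == c) & ~missing
        n_obs = int(obs.sum())
        if n_obs < 3:
            continue  # too few observed cases to fit a line; cluster stays unimputed
        X_obs = np.column_stack([np.ones(n_obs), x[obs]])
        beta, *_ = np.linalg.lstsq(X_obs, y[obs], rcond=None)
        resid = y[obs] - X_obs @ beta
        betas[c] = beta
        ssr += float(resid @ resid)   # pooled residual sum of squares
        df += n_obs - 2               # two parameters per cluster
    if df == 0:
        return y_imp                  # no cluster could be fitted
    sigma = np.sqrt(ssr / df)
    for c, beta in betas.items():
        mis = (cluster == c) & missing
        noise = rng.normal(0.0, sigma, int(mis.sum()))
        y_imp[mis] = beta[0] + beta[1] * x[mis] + noise
    return y_imp
```

Because each cluster gets its own intercept and slope, the imputation parameters are estimated from only the observed cases of that cluster, which is where small cluster sizes and high missing rates start to matter.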

In brief: the main results
[Figure: Relative bias after dummy imputation for different settings. Panels: cluster size = 10, 25, 50; parameters $\sigma^2_{b_0}$, $\sigma^2_{b_1}$, $\sigma_{b_0 b_1}$, $\rho$. x-axis: missing rate (0.05, 0.1, 0.25, 0.5); y-axis: relative bias.]

Side results
- Not only the data setting matters, but also the shape of the missing function.
- This does not refer to MCAR vs. MAR vs. MNAR (Rubin, 1976).
- It refers to different missing functions within MAR.

General missing function
Missing function for standardized $X_i$, desired missing rate $MR \le 0.5$ and $-1 \le s \le 1$:
\[
P(Y_i = \mathrm{NA} \mid X_i, MR, s) = MR\,(1 - s) + 2\,MR\,s\,\operatorname{logit}^{-1}(x_i) \tag{1}
\]
- $s$ controls the strength and direction of the influence of $X$ on the missing probabilities.
- MCAR if $s = 0$.
- The missing probabilities are on average equal to $MR$ if $X$ is symmetrically distributed around 0 (proof missing).
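The claim about the average missing probability can be sketched in a few lines. This assumes that (1) is read with the inverse logit, $\operatorname{logit}^{-1}(x) = 1/(1 + e^{-x})$, and that the expectation is taken over an $X$ whose distribution is symmetric around 0; it is an outline under these assumptions, not the authors' proof.

```latex
% Assumption: logit^{-1}(x) = 1 / (1 + e^{-x}), and X has the same distribution as -X.
\begin{align*}
\operatorname{logit}^{-1}(x) + \operatorname{logit}^{-1}(-x)
  &= \frac{1}{1 + e^{-x}} + \frac{1}{1 + e^{x}} = 1 \\
X \overset{d}{=} -X \;\Rightarrow\;
E\!\left[\operatorname{logit}^{-1}(X)\right]
  &= E\!\left[\operatorname{logit}^{-1}(-X)\right]
   = 1 - E\!\left[\operatorname{logit}^{-1}(X)\right]
   \;\Rightarrow\;
   E\!\left[\operatorname{logit}^{-1}(X)\right] = \tfrac{1}{2} \\
E\!\left[P(Y = \mathrm{NA} \mid X, MR, s)\right]
  &= MR\,(1 - s) + 2\,MR\,s \cdot \tfrac{1}{2} = MR
\end{align*}
```

The same identity shows why the bounds are needed: with $MR \le 0.5$ and $-1 \le s \le 1$ the expression in (1) always lies between 0 and $2\,MR \le 1$.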

Illustrations missing mechanism: varying MR
[Figure: Illustration of missing probabilities for different missing rates (MR = 0.5, 0.25, 0.10, 0.05). x-axis: standardized X; y-axis: P(Y = NA).]
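The curves in this illustration can be reproduced numerically. The sketch below assumes the missing function (1) with the inverse logit and uses s = 1 for all missing rates; the value of s used for the original figure is not stated, so that choice, the function name `missing_prob`, and the standard normal X are assumptions for illustration.

```python
import numpy as np

def missing_prob(x, mr, s):
    """P(Y = NA | x) = MR (1 - s) + 2 MR s logit^{-1}(x), for standardized x."""
    if not (0.0 <= mr <= 0.5 and -1.0 <= s <= 1.0):
        raise ValueError("need 0 <= mr <= 0.5 and -1 <= s <= 1")
    inv_logit = 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))
    return mr * (1.0 - s) + 2.0 * mr * s * inv_logit

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)          # standardized X, symmetric around 0

for mr in (0.5, 0.25, 0.10, 0.05):
    p = missing_prob(x, mr, s=1.0)        # s = 1 is an assumed setting
    is_missing = rng.random(x.size) < p   # draw the missingness indicator
    print(f"MR = {mr:.2f}: mean probability = {p.mean():.3f}, "
          f"realized missing rate = {is_missing.mean():.3f}")
```

Setting s = 0 makes the probability constant at MR (MCAR), and negative s mirrors the curve so that small values of X are more likely to have a missing Y.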

Illustrations missing mechanism: varying s
[Figure: Five different shapes of missing functions (s = -1, -0.5, 0, 0.5, 1). x-axis: standardized X; y-axis: P(Y = NA).]
All settings lead to MAR, except s = 0, which gives MCAR.

Results when estimating $\Sigma_b$ after dummy imputation
[Figure: Relative empirical biases under different missing functions. Panels: $\sigma^2_{b_0}$, $\sigma^2_{b_1}$, $\sigma_{b_0 b_1}$, $\rho$. x-axis: s (from -1 to 1); y-axis: relative bias.]

Questions
1. What explains the differences in parameter estimates across different MAR settings?
2. Is it sufficient to look at the variance matrix of the imputation parameters (instead of the variances after imputation)? Is it sufficient to look at the variance matrices of a single cluster?
3. If the bias in $\operatorname{Var}(\hat\beta_j)_{\text{fix}}$ is not clearly visible, is it sufficient to compare it to the unbiased $\operatorname{Var}(\hat\beta_j)_{\text{rand}}$? How can one say that matrix A is larger than matrix B? Elementwise comparison? A multiplicative or an additive matrix? Interpretation?

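Imputation parameters' variances
Main interest is in $\operatorname{Var}(\hat\beta_j \mid \beta, \Sigma_b, \sigma^2_\varepsilon)_{\text{fix}}$, written $\operatorname{Var}(\hat\beta_j)_{\text{fix}}$ for readability:
\[
\operatorname{Var}(\hat\beta_j)_{\text{fix}} = \sigma^2_\varepsilon\, (X^T X)^{-1} \tag{2}
\]
Maybe compare it to what can be found in Goldstein (2011, p. 69), $\operatorname{Var}(\hat\beta_j \mid \beta, \Sigma_b, \sigma^2_\varepsilon)_{\text{rand}}$:
\[
\operatorname{Var}(\hat\beta_j)_{\text{rand}} = \sigma^2_\varepsilon\, (X^T X + \sigma^2_\varepsilon \Sigma_b^{-1})^{-1} \tag{3}
\]
Besides elementwise comparison, a multiplicative and an additive term:
\[
\gamma\, \operatorname{Var}(\hat\beta_j)_{\text{rand}} = \operatorname{Var}(\hat\beta_j)_{\text{fix}} \tag{4}
\]
\[
\delta + \operatorname{Var}(\hat\beta_j)_{\text{rand}} = \operatorname{Var}(\hat\beta_j)_{\text{fix}} \tag{5}
\]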

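Multiplicative disparity
Properties and interpretation of $\gamma$?
\[
\begin{aligned}
\gamma\, \operatorname{Var}(\hat\beta_j)_{\text{rand}} &= \operatorname{Var}(\hat\beta_j)_{\text{fix}} \\
\gamma\, \sigma^2_\varepsilon\, (X^T X + \sigma^2_\varepsilon \Sigma_b^{-1})^{-1} &= \sigma^2_\varepsilon\, (X^T X)^{-1} \\
\gamma &= (X^T X)^{-1} (X^T X + \sigma^2_\varepsilon \Sigma_b^{-1}) \\
&= (X^T X)^{-1} X^T X + (X^T X)^{-1} \sigma^2_\varepsilon \Sigma_b^{-1} \\
&= I + (X^T X)^{-1} \sigma^2_\varepsilon \Sigma_b^{-1} \\
&= I + \operatorname{Var}(\hat\beta_j)_{\text{fix}}\, \Sigma_b^{-1}
\end{aligned} \tag{6}
\]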

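Additive disparity
Properties and interpretation of $\delta$?
\[
\begin{aligned}
\gamma\, \operatorname{Var}(\hat\beta_j)_{\text{rand}} &= \operatorname{Var}(\hat\beta_j)_{\text{fix}} \\
\left(I + \operatorname{Var}(\hat\beta_j)_{\text{fix}}\, \Sigma_b^{-1}\right) \operatorname{Var}(\hat\beta_j)_{\text{rand}} &= \operatorname{Var}(\hat\beta_j)_{\text{fix}} \\
\operatorname{Var}(\hat\beta_j)_{\text{rand}} + \operatorname{Var}(\hat\beta_j)_{\text{fix}}\, \Sigma_b^{-1}\, \operatorname{Var}(\hat\beta_j)_{\text{rand}} &= \operatorname{Var}(\hat\beta_j)_{\text{fix}}
\end{aligned} \tag{7}
\]
Now we have the matrix $\delta$:
\[
\delta = \operatorname{Var}(\hat\beta_j)_{\text{fix}}\, \Sigma_b^{-1}\, \operatorname{Var}(\hat\beta_j)_{\text{rand}} \tag{8}
\]
\[
\delta = (X^T X)^{-1} \sigma^2_\varepsilon\, \Sigma_b^{-1}\, \sigma^2_\varepsilon\, (X^T X + \sigma^2_\varepsilon \Sigma_b^{-1})^{-1}
\]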
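The identities for γ and δ can be checked numerically for a single cluster. The sketch below uses the simulation values from the slides (σ²_ε = 92.2, Σ_b with entries 6.2, 0.4, 0.1, x with variance 81) and an assumed cluster size of 25; the variable names are illustrative. It also prints the eigenvalues of δ, since one common convention for calling a matrix A larger than B is that A - B is positive semidefinite. Whether that convention suits the question raised here is left open.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 25                                    # assumed cluster size (one of 10, 25, 50)
sigma2_eps = 92.2
Sigma_b = np.array([[6.2, 0.4],
                    [0.4, 0.1]])
x = rng.normal(0.0, 9.0, n)               # x ~ N(0, 81), i.e. standard deviation 9
X = np.column_stack([np.ones(n), x])      # design matrix of one cluster

XtX = X.T @ X
Sigma_b_inv = np.linalg.inv(Sigma_b)
var_fix = sigma2_eps * np.linalg.inv(XtX)                              # eq. (2)
var_rand = sigma2_eps * np.linalg.inv(XtX + sigma2_eps * Sigma_b_inv)  # eq. (3)

gamma = np.eye(2) + var_fix @ Sigma_b_inv   # eq. (6)
delta = var_fix @ Sigma_b_inv @ var_rand    # eq. (8)

assert np.allclose(gamma @ var_rand, var_fix)   # eq. (4)
assert np.allclose(delta + var_rand, var_fix)   # eq. (5)

# delta equals var_fix - var_rand, so it is symmetric up to rounding error
eigvals = np.linalg.eigvalsh((delta + delta.T) / 2)
print("gamma =\n", gamma)
print("eigenvalues of delta:", eigvals)
```

Note that γ is generally not symmetric, which makes it harder to interpret than δ.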

References I
Goldstein, Harvey (2011): Multilevel Statistical Models. Chichester (UK): Wiley, 4th ed.
Rubin, Donald B. (1976): Inference and Missing Data. In: Biometrika, Vol. 63, No. 3, pp. 581-592, URL http://biomet.oxfordjournals.org/content/63/3/581.short.

Additional material

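Details of imputation parameters' variance
Here $\sigma_0^2$, $\sigma_1^2$ and $\sigma_{01}$ denote the elements of $\Sigma_b$, and all sums run over the $n$ units of cluster $j$.
Dummy imputation:
\[
\operatorname{Var}(\hat\beta_j)_{\text{fix}}
= \frac{\sigma^2_\varepsilon}{n \sum x_i^2 - \left(\sum x_i\right)^2}
\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix} \tag{9}
\]
\[
= \frac{\sigma^2_\varepsilon}{n^2 \operatorname{var}(x)}
\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix} \tag{10}
\]
Mixed effects imputation:
\[
\operatorname{Var}(\hat\beta_j)_{\text{rand}}
= \sigma^2_\varepsilon \left(
\begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}
+ \frac{\sigma^2_\varepsilon}{\sigma_0^2 \sigma_1^2 - \sigma_{01}^2}
\begin{pmatrix} \sigma_1^2 & -\sigma_{01} \\ -\sigma_{01} & \sigma_0^2 \end{pmatrix}
\right)^{-1} \tag{11}
\]
To shorten the expression, I define $a = \dfrac{\sigma^2_\varepsilon}{\sigma_0^2 \sigma_1^2 - \sigma_{01}^2}$:
\[
\operatorname{Var}(\hat\beta_j)_{\text{rand}}
= \sigma^2_\varepsilon
\begin{pmatrix} n + a\,\sigma_1^2 & \sum x_i - a\,\sigma_{01} \\ \sum x_i - a\,\sigma_{01} & \sum x_i^2 + a\,\sigma_0^2 \end{pmatrix}^{-1} \tag{12}
\]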

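Details of imputation parameters' variance (cont.)
\[
\operatorname{Var}(\hat\beta_j)_{\text{rand}}
= \frac{\sigma^2_\varepsilon}{(n + a\,\sigma_1^2)\left(\sum x_i^2 + a\,\sigma_0^2\right) - \left(\sum x_i - a\,\sigma_{01}\right)^2}
\begin{pmatrix} \sum x_i^2 + a\,\sigma_0^2 & -\sum x_i + a\,\sigma_{01} \\ -\sum x_i + a\,\sigma_{01} & n + a\,\sigma_1^2 \end{pmatrix} \tag{13}
\]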
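As a sanity check, the closed forms (10) and (13) can be compared with the matrix expressions (2) and (3). The sketch below assumes the simulation values from the slides for σ²_ε and Σ_b, a cluster of size 10, and var(x) taken as the variance with divisor n; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2_eps = 10, 92.2
s0, s1, s01 = 6.2, 0.1, 0.4           # sigma_0^2, sigma_1^2, sigma_01 from Sigma_b
x = rng.normal(0.0, 9.0, n)           # x ~ N(0, 81)
Sx, Sxx = x.sum(), (x ** 2).sum()

# Closed form (10); x.var() uses divisor n, so n^2 var(x) = n * Sxx - Sx^2
var_fix = sigma2_eps / (n ** 2 * x.var()) * np.array([[Sxx, -Sx],
                                                      [-Sx, n]])

# Closed form (13) with a = sigma_eps^2 / det(Sigma_b)
a = sigma2_eps / (s0 * s1 - s01 ** 2)
det = (n + a * s1) * (Sxx + a * s0) - (Sx - a * s01) ** 2
var_rand = sigma2_eps / det * np.array([[Sxx + a * s0, -Sx + a * s01],
                                        [-Sx + a * s01, n + a * s1]])

# Matrix versions (2) and (3) for comparison
X = np.column_stack([np.ones(n), x])
Sigma_b = np.array([[s0, s01], [s01, s1]])
var_fix_matrix = sigma2_eps * np.linalg.inv(X.T @ X)
var_rand_matrix = sigma2_eps * np.linalg.inv(
    X.T @ X + sigma2_eps * np.linalg.inv(Sigma_b))

assert np.allclose(var_fix, var_fix_matrix)
assert np.allclose(var_rand, var_rand_matrix)
```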

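Details on delta
\[
\begin{aligned}
\delta &= \operatorname{Var}(\hat\beta_j)_{\text{fix}}\, \Sigma_b^{-1}\, \operatorname{Var}(\hat\beta_j)_{\text{rand}} \\
&= \frac{\sigma^2_\varepsilon}{n^2 \operatorname{var}(x)}
\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix}
\cdot \frac{1}{\sigma_0^2 \sigma_1^2 - \sigma_{01}^2}
\begin{pmatrix} \sigma_1^2 & -\sigma_{01} \\ -\sigma_{01} & \sigma_0^2 \end{pmatrix} \\
&\quad \cdot \frac{\sigma^2_\varepsilon}{(n + a\,\sigma_1^2)\left(\sum x_i^2 + a\,\sigma_0^2\right) - \left(\sum x_i - a\,\sigma_{01}\right)^2}
\begin{pmatrix} \sum x_i^2 + a\,\sigma_0^2 & -\sum x_i + a\,\sigma_{01} \\ -\sum x_i + a\,\sigma_{01} & n + a\,\sigma_1^2 \end{pmatrix}
\end{aligned} \tag{14}
\]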

Our simulation parameters
Hierarchical data in our case:
\[
y_{j,i} = 62.1 + b_{0j} + (0.8 + b_{1j})\, x_{j,i} + \varepsilon_{j,i}
\]
\[
\begin{pmatrix} b_{0j} \\ b_{1j} \end{pmatrix}
\sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},\;
\Sigma_b = \begin{pmatrix} 6.2 & 0.4 \\ 0.4 & 0.1 \end{pmatrix} \right),
\qquad \varepsilon_{j,i} \sim N(0, 92.2),
\qquad x_{j,i} \sim N(0, 81)
\]
- $n_j$ depends on the setting: 10, 25, or 50
- $m = 100$
- Each individual's class affiliation is known and included as a variable in the data set.
- Parameters are taken from NEPS data (and are similar to CILS4EU data).
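A data set with these parameters could be generated along the following lines. The sketch assumes that the second argument of N(·, ·) is a variance, in line with the notation N(0, σ²_ε) used earlier, and that all clusters have the same size n_j; the function name `simulate` and the seed handling are illustrative and not taken from the authors' code.

```python
import numpy as np

def simulate(n_j, m=100, seed=0):
    """Draw one hierarchical data set with random intercepts and slopes."""
    rng = np.random.default_rng(seed)
    beta = np.array([62.1, 0.8])                       # fixed intercept and slope
    Sigma_b = np.array([[6.2, 0.4],
                        [0.4, 0.1]])
    b = rng.multivariate_normal(np.zeros(2), Sigma_b, size=m)  # (m, 2) random effects
    cluster = np.repeat(np.arange(m), n_j)             # cluster id of each unit
    x = rng.normal(0.0, np.sqrt(81.0), m * n_j)        # variance 81 (assumed)
    eps = rng.normal(0.0, np.sqrt(92.2), m * n_j)      # variance 92.2 (assumed)
    y = (beta[0] + b[cluster, 0]) + (beta[1] + b[cluster, 1]) * x + eps
    return y, x, cluster

y, x, cluster = simulate(n_j=25)
print(y.shape, x.shape, np.unique(cluster).size)
```

Missing values in y can then be introduced by applying a missing function to the standardized x, after which the imputation method under study is run on the incomplete data.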

Matthias Speidel, Matthias.Speidel@iab.de
Jörg Drechsler, Jörg.Drechsler@iab.de
Joseph Sakshaug, Joseph.Sakshaug@iab.de
www.iab.de/en