Analyzing Pilot Studies with Missing Observations

Monnie McGee (mmcgee@smu.edu)
Department of Statistical Science, Southern Methodist University, Dallas, Texas
Co-authored with N. Bergasa (SUNY Downstate Medical Center), I. Ginsburg and D. Engler (Columbia Presbyterian Medical Center)
University of Texas at Dallas, April 19, 2005

Outline
1. Motivation: Gabapentin Study
2. Analysis with a Mixed-Effects Model
3. Other Important Facts about the Data
4. Dealing with the Real Data
5. Conclusions and Future Explorations

Gabapentin Study
- Protocol called for 16 subjects in a pre-post format
- Half randomized to receive Gabapentin
- Main outcomes: Hourly Scratching Activity (HSA) and Visual Analogue Score (VAS)
- Two quantitations: baseline and after 6 weeks
- Quantitations required a 48-hour stay in the hospital

Mixed Effects Model Analysis
$$y_{ijk} = \alpha_i + \beta_j + \gamma_{ij} + \epsilon_{ijk}$$
- $y_{ijk}$ is the response for the $i$th group and the $j$th quantitation on the $k$th subject.
- $\alpha_i$, $i = 1, 2$, represents the effect of treatment group.
- $\beta_j$, $j = 1, 2$, is the effect of the $j$th quantitation.
- $\gamma_{ij}$ is the interaction effect between group and quantitation.
- $\epsilon_{ijk} \sim N(0, \sigma^2 I)$
- The random effect is due to different initial levels of response for each subject on each quantitation.
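The model equation above does not write out the random effect. As a minimal sketch, assuming a per-subject random intercept `b_k`, 8 subjects per group, 24 hourly values per quantitation, and made-up parameter values (none of these are stated on the slide), data of this shape could be simulated as follows.

```python
import random

def simulate_trial(alpha, beta, gamma, n_per_group=8, n_hours=24,
                   sd_subject=1.0, sd_error=1.0, seed=0):
    """Simulate y_ijk = alpha_i + beta_j + gamma_ij + b_k + e_ijk.

    b_k is a hypothetical subject-level random intercept standing in for
    "different initial levels of response"; it is an assumption of this sketch.
    Returns tuples (subject, group, quantitation, hour, y).
    """
    rng = random.Random(seed)
    rows = []
    subject = 0
    for i in range(2):                          # treatment group
        for _ in range(n_per_group):
            b_k = rng.gauss(0.0, sd_subject)    # subject's own starting level
            for j in range(2):                  # quantitation: baseline, week 6
                level = alpha[i] + beta[j] + gamma[i][j] + b_k
                for hour in range(n_hours):
                    y = level + rng.gauss(0.0, sd_error)
                    rows.append((subject, i, j, hour, y))
            subject += 1
    return rows

# Illustrative parameter values only
rows = simulate_trial(alpha=[0.0, 1.0], beta=[0.0, -1.0],
                      gamma=[[0.0, 0.0], [0.0, -2.0]])
print(len(rows))  # 16 subjects x 2 quantitations x 24 hours = 768
```

Rows in this long format (one line per subject, quantitation, and hour) are the usual input layout for mixed-model software.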

LME Results for HSA

Effect         Num DF  Den DF  F Value  Pr > F
Constant       1       861     24.81    < 0.0001
Group          1       13      4.47     0.0543
Quant          1       861     8.76     0.0032
Group × Quant  1       861     1.39     0.2390

Log Likelihood: 5517.502

Show Me the Data!
- Excel Spreadsheet of the Data
- Graphical Display of HSA and VAS

Issues with the Data
- Lots of NAs in spreadsheet!
- Entire pre and/or post assessments missing for 4 subjects
- A priori difference in gabapentin and placebo groups
- Very small sample size
- Disparate beginning times
- HSA and VAS normalization
- Detection limit for HSA; finite scale for VAS

Types of Missingness
Here $R$ is the missingness indicator, $y_o$ the observed data, and $y_m$ the missing data.
- Missing Completely at Random (MCAR): the probability of an observation being missing does not depend on observed or unobserved measurements: $\Pr(R \mid y_o, y_m) = \Pr(R)$.
- Missing at Random (MAR): the probability of an observation being missing, given the observed data, does not depend on the unobserved data: $\Pr(R \mid y_o, y_m) = \Pr(R \mid y_o)$.

Types of Missingness (cont'd)
- Missing Not at Random (MNAR): the probability of an observation being missing depends on the value of the missing observation itself.
- "In most situations, the true mechanism is probably MNAR." (Carpenter & Kenward, 2005)
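The three mechanisms differ only in what the missingness probability is allowed to depend on. The following is a minimal sketch, assuming a two-measurement setting (a fully observed baseline and a follow-up that may go missing) and a logistic form for the missingness probability; the function names, `base_rate`, and `slope` are illustrative choices, not part of the talk.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def missing_mask(y_obs, y_mis, mechanism, rng, base_rate=0.3, slope=2.0):
    """Return booleans, True where the follow-up value goes missing.

    y_obs: always-observed values (e.g. baseline)
    y_mis: values subject to missingness (e.g. follow-up)
    """
    offset = math.log(base_rate / (1.0 - base_rate))
    mask = []
    for yo, ym in zip(y_obs, y_mis):
        if mechanism == "MCAR":
            p = base_rate                       # Pr(R) ignores all data
        elif mechanism == "MAR":
            p = logistic(offset + slope * yo)   # depends on observed data only
        elif mechanism == "MNAR":
            p = logistic(offset + slope * ym)   # depends on the value that is lost
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        mask.append(rng.random() < p)
    return mask

rng = random.Random(1)
baseline = [rng.gauss(0.0, 1.0) for _ in range(5000)]
followup = [b + rng.gauss(0.0, 1.0) for b in baseline]
for mech in ("MCAR", "MAR", "MNAR"):
    mask = missing_mask(baseline, followup, mech, rng)
    kept = [y for y, m in zip(followup, mask) if not m]
    print(mech, round(sum(kept) / len(kept), 2))  # mean of the surviving follow-ups
```

In this setup high values are more likely to be lost under MAR and MNAR, so the mean of the surviving follow-up values drifts below the true mean of 0, while under MCAR it stays close to it.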

Missingness in Gabapentin Data
- Because missingness was severe in hours 24-48 of each quantitation, only the first 24 hours of data were used.
- Two subjects' pre-treatment data are missing due to equipment malfunction.
- Intermittent data are missing due to eating, sleeping, showering, etc. during the hospital stay.
- Some data may be missing due to severity of scratching or severity of illness (two subjects with missing post-treatment measurements).
- Our mechanism is mostly MAR.

Now What?
Fill in the missing values and rerun the mixed model. Options:
- Mean-filled values
- Regression-mean imputation
- Last Observation Carried Forward (LOCF)
- Hot Deck (or Cold Deck) Imputation
- Likelihood-based Imputation
- Time Series Approach (Pfeffermann and Nathan, 2002)
NB: Most results pertaining to inference are asymptotic.
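The two simplest fill-in rules from the list can be sketched in a few lines. This is an illustration on a made-up hourly series with `None` marking a missing value; it is not the code used for the study.

```python
from statistics import mean, pvariance

def mean_fill(series):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in series if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in series]

def locf(series):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled

hourly = [4.0, None, 6.0, None, None, 2.0]   # hypothetical hourly scores
print(mean_fill(hourly))  # [4.0, 4.0, 6.0, 4.0, 4.0, 2.0]
print(locf(hourly))       # [4.0, 4.0, 6.0, 6.0, 6.0, 2.0]

# Mean filling shrinks the spread: the filled series varies less than the observed values
print(pvariance([4.0, 6.0, 2.0]) > pvariance(mean_fill(hourly)))  # True
```

The last line shows on a toy scale why single imputation by the mean understates variability: every filled value sits exactly at the center of the observed data.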

Results: Mean-Filled Values

Effect         Num DF  Den DF  F Value  Pr > F
Constant       1       607     101.88   < 0.0001
Group          1       13      1.47     0.2468
Quant          1       607     39.08    < 0.0001
Group × Quant  1       607     21.00    < 0.0001

Log Likelihood: 1181.314

Results: LOCF-Filled Values

Effect         Num DF  Den DF  F Value  Pr > F
Constant       1       581     89.29    < 0.0001
Group          1       13      0.907    0.3584
Quant          1       581     38.39    < 0.0001
Group × Quant  1       581     21.32    < 0.0001

Log Likelihood: 1124.822

Summary Thus Far
- Carpenter and Kenward (2005) call mean replacement and LOCF "unprincipled" methods.
- Both lead to biased estimates of parameters.
- Simple mean imputation tends to dilute associations.
- LOCF distorts mean and covariance structure, even for a single time point, even under MCAR.
- Regression mean imputation can generate unbiased estimates, but the variance is still typically underestimated.
- Can't replace entire quantitations with mean or LOCF.

Nearest Neighbor Hot Deck Imputation
Let $y_i = (y_{i1}, \ldots, y_{iK})$ be a $K \times 1$ complete-data vector of outcomes.
Let $y_i = (y_{\mathrm{obs},i}, y_{\mathrm{mis},i})$, where $y_{\mathrm{obs},i}$ is the observed part and $y_{\mathrm{mis},i}$ is the missing part of $y_i$. Then
$$\hat{y}_{it} = y_{lt} + (\bar{y}_{\mathrm{obs},i} - \bar{y}_{\mathrm{obs},l}),$$
where $\bar{y}_{\mathrm{obs},i}$ is the mean of the observed values for subject $i$.
Subject $l$ is called the donor.
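The imputation formula borrows the donor's value at the missing time point and shifts it by the difference between the two subjects' observed means. A minimal sketch, assuming series stored as lists with `None` for missing values and a donor that has already been chosen (the function name and data are illustrative):

```python
from statistics import mean

def impute_from_donor(recipient, donor):
    """Fill None entries of `recipient` using
    y_hat_it = y_lt + (ybar_obs_i - ybar_obs_l), with `donor` as subject l."""
    shift = (mean(v for v in recipient if v is not None)
             - mean(v for v in donor if v is not None))
    completed = []
    for t, value in enumerate(recipient):
        if value is None:
            if donor[t] is None:
                raise ValueError(f"donor is also missing at time index {t}")
            completed.append(donor[t] + shift)   # donor value, level-adjusted
        else:
            completed.append(value)              # observed values are kept
    return completed

recipient = [10.0, 12.0, None, 14.0]   # observed mean 12.0
donor = [5.0, 7.0, 9.0, 7.0]           # observed mean 7.0
print(impute_from_donor(recipient, donor))  # [10.0, 12.0, 14.0, 14.0]
```

The shift of 5.0 keeps the imputed value at the recipient's own level while taking the time pattern from the donor.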

Choosing a Donor
- We want a donor that is close to the subject whose observations are missing.
- "Close" is defined by a metric, e.g.
$$d(i, j) = \max_k |x_{ik} - x_{jk}|,$$
where $x_i = (x_{i1}, \ldots, x_{iK})^T$ are the values of $K$ appropriately scaled covariates for a unit $i$ at which $y_i$ is missing.

Donors for Time Series (TS) Data
Suppose subject $i$ is missing a value at time $t$. The closest donor is defined by
$$d_j(t) = \min_j \sum_{t=1}^{T} |x_{it} - x_{jt}|, \quad \text{for all } j = 1, \ldots, n-1.$$
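Both donor rules amount to computing a distance from the incomplete subject to every candidate and taking the minimum. The sketch below assumes covariates stored as plain lists and uses absolute differences in the summed distance; the absolute value in the time series version is an assumption, and all names and numbers are illustrative.

```python
def max_distance(x_i, x_j):
    """d(i, j) = max_k |x_ik - x_jk| over K scaled covariates."""
    return max(abs(a - b) for a, b in zip(x_i, x_j))

def summed_distance(x_i, x_j):
    """Sum over time points of |x_it - x_jt| (absolute value assumed)."""
    return sum(abs(a - b) for a, b in zip(x_i, x_j))

def choose_donor(target, candidates, distance=max_distance):
    """Return the key of the candidate whose covariates are closest to `target`."""
    return min(candidates, key=lambda name: distance(target, candidates[name]))

target = [0.2, 1.0]
candidates = {"s1": [0.0, 2.0], "s2": [1.1, 0.1]}   # hypothetical subjects
print(choose_donor(target, candidates))                   # s2
print(choose_donor(target, candidates, summed_distance))  # s1
```

The two metrics can disagree, as they do here: `s2` is never far off on any single coordinate, while `s1` is closer in total.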

Hot Deck Results

Effect         Num DF  Den DF  F Value  Pr > F
Constant       1       703     75.48    < 0.0001
Group          1       13      0.760    0.2468
Quant          1       703     37.61    < 0.0001
Group × Quant  1       703     16.18    < 0.0001

Log Likelihood: 1405.141

A Modification
- Hot Deck Imputation provides us with only one data set, which we take as the real data.
- Multiple Imputation (MI) provides us with multiple data sets, which we can use to estimate uncertainty about the correct nonresponse model.
- BUT: MI can be complicated.
- Estimate multiple data sets using nearest neighbor hot deck imputation (NNHDI) with additive noise.
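One way to read "NNHDI with additive noise" is to add an independent normal draw to each hot-deck value, once per completed data set. The sketch below takes that reading; it assumes the hot-deck values have already been computed and treats the second argument of the N(0, 29) noise as a variance, which the slides do not spell out. Names and data are illustrative.

```python
import random

def noisy_imputations(observed, hot_deck_values, n_sets=3, noise_var=29.0, seed=0):
    """Build n_sets completed series.

    observed:        list with None at missing positions
    hot_deck_values: list of the same length holding the single-imputation
                     value at each missing position (other entries unused)
    noise_var:       variance of the added normal noise (assumed reading of N(0, 29))
    """
    rng = random.Random(seed)
    sd = noise_var ** 0.5
    completed = []
    for _ in range(n_sets):
        completed.append([
            hot_deck_values[t] + rng.gauss(0.0, sd) if v is None else v
            for t, v in enumerate(observed)
        ])
    return completed

observed = [10.0, None, 14.0, None]
hot_deck = [None, 12.5, None, 13.0]     # hypothetical NNHDI output
for data_set in noisy_imputations(observed, hot_deck):
    print([round(v, 1) for v in data_set])
```

Observed values are identical across the completed sets; only the imputed positions vary, which is what lets the between-imputation variance reflect uncertainty about the missing values.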

Modified NNHDI Results
Results for 3 imputations of NNHDI with additive N(0, 29) noise.

Effect         Imputation  Num DF  Den DF  F Value  Pr > F
Group          A           1       13      5.15     0.0409
Group          B           1       13      7.27     0.0183
Group          C           1       13      7.07     0.0196
Quant          A           1       703     24.86    < 0.0001
Quant          B           1       703     22.50    < 0.0001
Quant          C           1       703     20.58    < 0.0001
Group × Quant  A           1       703     23.51    < 0.0001
Group × Quant  B           1       703     47.92    < 0.0001
Group × Quant  C           1       703     52.74    < 0.0001

Log Likelihoods: A: 805.89, B: 817.74, C: 814.94

Nonresponse Uncertainty
Let $\hat{\theta}_d$ and $W_d$, $d = 1, \ldots, D$, be $D$ complete-data estimates and their associated variances for $\theta$. Then the combined estimate is
$$\bar{\theta}_D = \frac{1}{D} \sum_{d=1}^{D} \hat{\theta}_d,$$
and the average within-imputation variance is
$$\bar{W}_D = \frac{1}{D} \sum_{d=1}^{D} W_d.$$

More Uncertainty
The between-imputation variance is
$$B_D = \frac{1}{D-1} \sum_{d=1}^{D} (\hat{\theta}_d - \bar{\theta}_D)^2.$$
Total variability is
$$T_D = \bar{W}_D + \frac{D+1}{D} B_D,$$
and $\hat{\gamma}_D = (1 + 1/D) B_D / T_D$ is an estimate of the fraction of information about $\theta$ missing due to nonresponse (Little and Rubin, 2002, pp. 86-87).
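The combining rules translate directly into code. A minimal sketch follows, with a hypothetical function name and toy inputs chosen so the arithmetic is easy to check by hand.

```python
from statistics import mean, variance

def pool_estimates(estimates, variances):
    """Combine D complete-data estimates and their variances.

    Returns (theta_bar, w_bar, b, total, gamma):
      theta_bar  mean of the D estimates
      w_bar      average within-imputation variance
      b          between-imputation variance (divisor D - 1)
      total      w_bar + (D + 1) / D * b
      gamma      (1 + 1/D) * b / total
    """
    D = len(estimates)
    theta_bar = mean(estimates)
    w_bar = mean(variances)
    b = variance(estimates)            # sample variance uses divisor D - 1
    total = w_bar + (D + 1) / D * b
    gamma = (1 + 1 / D) * b / total
    return theta_bar, w_bar, b, total, gamma

# Toy input: three estimates 1, 2, 3, each with variance 1
theta_bar, w_bar, b, total, gamma = pool_estimates([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(theta_bar, w_bar, b)                 # 2.0 1.0 1.0
print(round(total, 4), round(gamma, 4))    # 2.3333 0.5714
```

With D = 3 the between-imputation variance is inflated by 4/3, so here the total is 1 + 4/3 and the estimated fraction of missing information is (4/3)/(7/3) = 4/7.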

Uncertainty Calculations
For the Gabapentin Data:

Effect       $\bar{\theta}_D$  $\bar{W}_D$  $B_D$  $T_D$  $\hat{\gamma}_D$
Group        -0.862            31.4         0.081  31.6   0.003
Quant        -0.065            3.90         0.078  4.01   0.026
Interaction  0.694             8.4          0.149  8.56   0.023

Power and Size
- Case 1: Pretest/posttest study with one normally distributed random variable ($\sigma^2 = 1$) and data MCAR.
- Case 2: Case 1 with chunks of missing data (wave nonresponse).
- Case 3: Wave nonresponse for longitudinal data with no correlation, analyzed with a mixed model.
- Case 4: Same as Case 3 with AR(1) structure in the data.
- Compared size and power for 10%, 30%, and 50% missing values.

Case 1: A Simple Paired t-test
Entries are estimated size ($\mu_d = 0$) and power ($\mu_d > 0$).

             N = 10                 N = 30
% Missing    None   10%    30%     None   10%    30%
$\mu_d = 0$  0.050  0.056  0.091   0.050  0.051  0.062
$\mu_d = 2$  0.977  0.938  0.712   1      1      0.999
$\mu_d = 5$  1      1      0.987   1      1      1
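A size and power study of the Case 1 kind can be sketched with the standard library alone. The version below makes several assumptions that the slides do not confirm: pre and post scores are independent N(0, 1) and N(mu_d, 1), each pair is lost with probability `p_missing` (MCAR), and incomplete pairs are simply dropped before the t-test. It is therefore not expected to reproduce the table above, which may rest on a different handling of the missing values.

```python
import math
import random
from statistics import mean, stdev

# Two-sided 5% critical values of Student's t for df = 1..9 (enough for n <= 10)
T_CRIT = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
          6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}

def rejection_rate(mu_d, n=10, p_missing=0.0, reps=2000, seed=0):
    """Share of simulated studies whose complete-pair t-test rejects mu_d = 0."""
    if n > 10:
        raise ValueError("critical-value table only covers n <= 10")
    rng = random.Random(seed)
    rejections = 0
    for _rep in range(reps):
        diffs = []
        for _pair in range(n):
            pre = rng.gauss(0.0, 1.0)
            post = rng.gauss(mu_d, 1.0)
            if rng.random() >= p_missing:       # pair survives (MCAR)
                diffs.append(post - pre)
        if len(diffs) < 2:
            continue                            # too few pairs; counted as no rejection
        t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
        if abs(t) > T_CRIT[len(diffs) - 1]:
            rejections += 1
    return rejections / reps

print(rejection_rate(0.0))                  # empirical size, nominal level 0.05
print(rejection_rate(2.0))                  # power with complete data
print(rejection_rate(2.0, p_missing=0.3))   # power after dropping about 30% of pairs
```

Dropping incomplete pairs under MCAR leaves the test valid but costs power, because the effective sample size shrinks.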

Case 2: Paired t-test with Wave Nonresponse

             N = 10         N = 30
% Missing    30%    50%     10%    30%    50%
$\mu_d = 0$  0.052  0.053   0.050  0.050  0.051
$\mu_d = 2$  0.662  0.341   0.999  0.995  0.904

Case 3: Longitudinal Wave Nonresponse (WN) Data

                     N = 10         N = 30
Scenario     Effect  30%    50%     30%    50%
$\mu_d = 0$  Group   0.014  0.015   0.032  0.049
$\mu_d = 0$  Quant   0.053  0.049   0.051  0.035
$\mu_d = 2$  Group   0.009  0.007   0.014  0.012
$\mu_d = 2$  Quant   1      1       1      1

Case 4: Longitudinal AR(1) Data

                              N = 10         N = 30
Scenario              Effect  30%    50%     30%    50%
$\phi_1 = \phi_2$     Group   0.046  0.046   0.049  0.049
$\phi_1 = \phi_2$     Quant   0.243  0.228   0.216  0.188
$\phi_1 \neq \phi_2$  Group   0.050  0.050   0.049  0.050
$\phi_1 \neq \phi_2$  Quant   0.328  0.304   0.265  0.219

The Real Issue
- How good are the parameter estimates under the above scenarios?
- Results about estimation in the literature are asymptotic.
- Literature suggests a transformation that makes normality more accurate for small samples.
- Searle (1970) gives information matrices for mixed effects models with unbalanced data.
- Large literature on efficiency for various experimental designs in the presence of missing observations.

Remaining Issues
- Automating choice of like individuals for replacement values
- Variance of random perturbation
- Generating data substitutions from models
- Calculate efficiencies, bias, and variance
- Detection limits, a priori differences in groups, normalization, etc.

References
1. Carpenter, James and Kenward, Mike (2005). Economic and Social Research Council Missing Data Website. http://www.missingdata.org.uk.
2. Little, Roderick J.A. and Rubin, Donald B. (2002). Statistical Analysis with Missing Data (2nd edition). New York: Wiley Interscience.
3. Pfeffermann, Danny and Nathan, Gad (2002). Imputation for Wave Nonresponse: Existing Methods and a Time Series Approach, in Survey Nonresponse (Robert M. Groves, Don A. Dillman, John L. Eltinge, and Roderick J.A. Little, eds.). New York: Wiley, Chapter 28.
4. Prescott, P. and Mansson, R.A. (2002). Efficiency of Pair Wise Treatment Comparisons in Incomplete Block Experiments Subject to the Loss of a Block of Observations. Communications in Statistics: Theory and Methods, 31, 449-462.
5. Searle, S. R. (1970). Large Sample Variances of Maximum Likelihood Estimators of Variance Components Using Unbalanced Data. Biometrics, 26, 505-524.

A Priori Difference in Groups
- Reassign subjects to groups at random, regardless of true assignment.
- Calculate two-sample t-tests for each assignment.
- 1000 replications of 10000 assignments.
- Results: percentage of p-values < 0.05

Data       Min   Median  Max
Original   2.97  3.35    4.2
Mean Repl  0.07  0.21    0.38
LOCF Repl  11.6  12.5    13.4
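The reassignment check can be sketched as follows. This is an illustration only: it assumes one summary value per subject, 16 subjects split 8 versus 8, and a pooled-variance two-sample t-test, none of which is specified on the slide, and the input values are simulated rather than taken from the study.

```python
import math
import random
from statistics import mean, variance

T_CRIT_DF14 = 2.145   # two-sided 5% critical value, Student's t with 14 df

def false_positive_share(values, n_assignments=10000, seed=0):
    """Share of random 8-vs-8 splits whose pooled two-sample t-test has p < 0.05."""
    if len(values) != 16:
        raise ValueError("this sketch assumes exactly 16 subjects")
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_assignments):
        shuffled = list(values)
        rng.shuffle(shuffled)                         # ignore the true assignment
        a, b = shuffled[:8], shuffled[8:]
        pooled = (variance(a) + variance(b)) / 2      # equal group sizes
        t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / 8 + 1 / 8))
        if abs(t) > T_CRIT_DF14:
            hits += 1
    return hits / n_assignments

rng = random.Random(42)
subject_means = [rng.gauss(0.0, 1.0) for _ in range(16)]   # hypothetical data
print(false_positive_share(subject_means, n_assignments=2000))
```

Because group labels are assigned at random, the printed share estimates how often a "significant" difference appears when there is none by construction; values well away from 5% suggest the data set, or the way its gaps were filled, is distorting the test.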