The Impact of Measurement Error on Propensity Score Analysis: An Empirical Investigation of Fallible Covariates


1 The Impact of Measurement Error on Propensity Score Analysis: An Empirical Investigation of Fallible Covariates
Eun Sook Kim, Patricia Rodríguez de Gil, Jeffrey D. Kromrey, Rheta E. Lanehart, Aarti Bellara, Reginald S. Lee
Modern Modeling Methods Conference, May 21st, 2013, Windsor Locks, CT

2 Presentation Outline
- Introduction
- Background
- Purpose
- Method
- Results
  - Common Support
  - Balance
  - Bias
  - RMSE
  - Type I error
  - CI coverage and width
- Conclusions
- Further research

3 Introduction
Rubin's Causal Model (RCM)
- $T_i = Y_{1i} - Y_{0i}$
  - $T_i$ = treatment effect for individual $i$
  - $Y_{1i}$ = potential outcome for treatment
  - $Y_{0i}$ = potential outcome for control
- Fundamental Problem of Causal Inference
- Solution to estimate causality
  - $T = E(Y_{1i} \mid Z_i = 1) - E(Y_{0i} \mid Z_i = 0)$, where $(Y_1, Y_0) \perp Z$
- Assumptions
  - Strongly ignorable treatment assignment
  - Stable unit treatment value assumption (SUTVA)
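The role of the independence condition on this slide can be illustrated with a small simulation. The sketch below is not from the presentation: the confounder `x`, the assignment probabilities, and the effect size of 0.5 are all hypothetical choices. It compares the naive difference in group means when assignment depends on a confounder with the same difference when assignment is random.

```python
import random

def simulate_rcm(n=20000, effect=0.5, seed=1):
    """Naive treated-minus-control mean difference under two assignment schemes."""
    rng = random.Random(seed)
    groups = {"confounded": ([], []), "randomized": ([], [])}  # (treated, control)
    for _ in range(n):
        x = rng.gauss(0, 1)        # hypothetical confounder
        y0 = x + rng.gauss(0, 1)   # potential outcome under control
        y1 = y0 + effect           # potential outcome under treatment
        # Confounded assignment: units with x > 0 are far more likely to be treated.
        z_conf = rng.random() < (0.8 if x > 0 else 0.2)
        # Randomized assignment: Z is independent of (Y1, Y0).
        z_rand = rng.random() < 0.5
        for name, z in (("confounded", z_conf), ("randomized", z_rand)):
            treated, control = groups[name]
            # Only one of the two potential outcomes is observed for each unit.
            (treated if z else control).append(y1 if z else y0)
    mean = lambda values: sum(values) / len(values)
    return {name: mean(t) - mean(c) for name, (t, c) in groups.items()}

result = simulate_rcm()
print(result)
```

Under random assignment the difference in means should land near the true effect, while the confounded version overstates it because treated units also have higher `x`.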

4 Propensity Score Methods (PSM)
- Propensity Score (PS): Estimate of an individual's probability of being assigned to the treatment group
  $\operatorname{logit}\,\hat{P}(Z = 1) = \log\frac{\hat{P}(Z = 1)}{1 - \hat{P}(Z = 1)} = \hat{\beta}_0 + \sum_{i=1}^{p} \hat{\beta}_i x_i$
  where $p$ is the number of predictors
- Use the estimated propensity score to condition treatment and control groups
  - Caliper Matching
  - Matching without caliper
  - Stratification
  - Covariance Adjustment
  - PS Weighting
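The logistic model on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' SAS/IML code: the function names `fit_logistic` and `propensity`, the gradient-ascent fitting routine, and the toy data (one covariate, true intercept 0, true slope 1) are all assumptions made for the example.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_logistic(X, z, lr=0.5, iters=300):
    """Fit logit P(Z=1) = b0 + sum_i b_i * x_i by gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # beta[0] is the intercept
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for row, zi in zip(X, z):
            resid = zi - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity(beta, row):
    """Estimated probability of treatment for one unit's covariate row."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))

# Toy data with illustrative parameter values only.
rng = random.Random(0)
X = [[rng.gauss(0, 1)] for _ in range(1000)]
z = [1 if rng.random() < sigmoid(row[0]) else 0 for row in X]
beta_hat = fit_logistic(X, z)
ps = [propensity(beta_hat, row) for row in X]
print(beta_hat)
```

The list `ps` holds one estimated propensity score per unit, which is the quantity the conditioning methods listed on the slide operate on.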

5 Researcher's Decisions
- Covariate Selection
- PS Estimation
- Evaluate Common Support
- Trimming Samples
- Conditioning Methods
- Balance Properties
- Outcome Model

6 Covariate Selection
- Model specification error
  - Relation of covariates to outcome
  - Relation of covariates to treatment assignment
  - Brookhart et al., 2006; Rubin & Thomas, 1996; Rubin, 1997
- Measurement errors in covariates
  - Deleterious effect of measurement error on PS analysis
  - Bellara et al., 2013; Steiner, Cook, & Shadish,

7 Background: Previous Research
[Figure: Bias in Point Estimates by Covariate Reliability]

8 Background
[Figure: Type I Error Rates by Covariate Reliability]

9 Background
[Figure: CI Coverage by Covariate Reliability]

10 Purpose of the Study
- To investigate the effect of covariate selection based on measurement quality on the balance and the estimation of the treatment effect in PS analysis
  - When the covariates in a sample have various levels of measurement error
  - Select a full set or a subset of reliable covariates
- To provide guidelines for selecting covariates that could reduce selection bias more efficiently in the presence of measurement errors

11 Method
- Simulation study
- Fully crossed factorial mixed design with 6 between-subjects factors and 3 within-subjects factors
  - Between: number of covariates, population treatment effect, covariate relationship to treatment, covariate relationship to outcome, correlation among covariates, sample size
  - Within: PS conditioning methods, covariate selection, trimming
- 2160 conditions x 7 conditioning methods x 3 covariate sets x 2 trimming
- 5000 replications
- SAS IML Procedure

12 Method
Design factors in data generation (between-subjects factors):
- Number of covariates: 9, 18, 27
- Population treatment effect: 0.0, 0.2, 0.5, 0.8
- Covariate relationship to treatment assignment: 0.025, 0.050,
- Covariate relationship to outcome: 0.025, 0.050,
- Correlation among covariates: 0, .2, .5
- Sample size: 50, 100, 250, 500,

13 Method
Within-subjects factor: PS conditioning methods
- Ignoring covariates
- Matching without caliper: one-to-one matching
- Matching with caliper: caliper width = .25 SD of PS
- ANCOVA
- PS ANCOVA
- PS weighting: inverse probability of treatment weights
- Stratification: quintile
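Three of the conditioning methods on this slide are simple enough to sketch directly. The code below is an illustration under stated assumptions, not the study's implementation: the function names are hypothetical, the matching is greedy nearest-neighbour without replacement, the caliper is taken as .25 times the sample SD of the raw propensity scores, and quintiles are formed by rank.

```python
import statistics

def iptw_weights(ps, z):
    """Inverse probability of treatment weights: 1/ps for treated, 1/(1 - ps) for controls."""
    return [1 / p if t == 1 else 1 / (1 - p) for p, t in zip(ps, z)]

def quintile_strata(ps):
    """Assign each unit to one of five rank-based strata (0 = lowest scores)."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    strata = [0] * len(ps)
    for rank, i in enumerate(order):
        strata[i] = rank * 5 // len(ps)
    return strata

def caliper_match(ps, z, width_sd=0.25):
    """Greedy one-to-one matching; a pair is kept only if it falls within the caliper."""
    caliper = width_sd * statistics.stdev(ps)
    controls = [i for i, t in enumerate(z) if t == 0]
    used, pairs = set(), []
    for i in (i for i, t in enumerate(z) if t == 1):
        best = None
        for j in controls:
            distance = abs(ps[i] - ps[j])
            if j not in used and distance <= caliper and (best is None or distance < best[0]):
                best = (distance, j)
        if best is not None:
            used.add(best[1])
            pairs.append((i, best[1]))
    return pairs

# Tiny hand-made example (scores and assignments are illustrative).
ps = [0.2, 0.8, 0.25, 0.75, 0.5, 0.52]
z = [1, 1, 0, 0, 1, 0]
print(caliper_match(ps, z))
print(iptw_weights([0.25, 0.5], [1, 0]))
print(quintile_strata([0.1 * k for k in range(1, 11)]))
```

Matching without a caliper corresponds to dropping the `distance <= caliper` condition, so every treated unit receives its nearest unused control regardless of distance.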

14 Method
Within-subjects factor: Covariate selection
- Three levels of reliability in equal proportions in a single sample: .6, .8, 1.0
- Covariate selection
  - A full set
  - Covariates with high reliability (.8)
  - Covariates with perfect reliability only (1.0 only)
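The slides do not show how the fallible covariates were generated. One common way to obtain a target reliability, sketched here as an assumption, is the classical model observed = true + error, where reliability is the ratio of true-score variance to observed-score variance. The function name `add_measurement_error` and the normally distributed true scores are hypothetical.

```python
import math
import random
import statistics

def add_measurement_error(true_scores, reliability, rng):
    """Return observed = true + error, with the error variance implied by the reliability."""
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    var_true = statistics.pvariance(true_scores)
    # reliability = var_true / (var_true + var_error)
    # => var_error = var_true * (1 - reliability) / reliability
    sd_error = math.sqrt(var_true * (1 - reliability) / reliability)
    return [t + rng.gauss(0, sd_error) for t in true_scores]

rng = random.Random(42)
true_x = [rng.gauss(0, 1) for _ in range(20000)]
# The three reliability levels named on the slide.
observed = {rel: add_measurement_error(true_x, rel, rng) for rel in (0.6, 0.8, 1.0)}
empirical = {rel: statistics.pvariance(true_x) / statistics.pvariance(obs)
             for rel, obs in observed.items()}
print(empirical)
```

With a reliability of 1.0 the error standard deviation is zero and the observed scores equal the true scores; lower reliabilities inflate the observed variance accordingly.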

15 Results
- Common Support
- Balance
- Bias
- RMSE
- Type I error
- CI Coverage
- CI Width
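Several of the outcome measures listed here have standard definitions over simulation replications. The sketch below uses those textbook definitions with a normal-theory 95% interval; the function name `summarize` and the critical value 1.96 are assumptions, and the slides do not confirm that the study computed the measures exactly this way.

```python
import math

def summarize(estimates, std_errors, true_effect, crit=1.96):
    """Bias, RMSE, rejection rate, CI coverage, and mean CI width across replications."""
    reps = len(estimates)
    bias = sum(estimates) / reps - true_effect
    rmse = math.sqrt(sum((e - true_effect) ** 2 for e in estimates) / reps)
    covered = rejected = 0
    total_width = 0.0
    for est, se in zip(estimates, std_errors):
        lower, upper = est - crit * se, est + crit * se
        covered += lower <= true_effect <= upper
        rejected += abs(est / se) > crit   # test of H0: effect = 0
        total_width += upper - lower
    return {
        "bias": bias,
        "rmse": rmse,
        "rejection_rate": rejected / reps,  # a Type I error rate only when true_effect == 0
        "ci_coverage": covered / reps,
        "ci_width": total_width / reps,
    }

# Three made-up replications with a true effect of zero.
summary = summarize([0.1, -0.1, 0.3], [0.1, 0.1, 0.1], true_effect=0.0)
print(summary)
```

When the true effect is zero, the rejection rate and the CI coverage from this function sum to one, since both are based on the same interval.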

16 Common Support Coverage by Sample Size and Covariate Set

17 Common Support Coverage by Correlation among Covariates and Covariate Set

18 Common Support Coverage by Covariate Relation to Outcome and Covariate Set

19 Common Support Coverage by Covariate Relation to Treatment Assignment and Covariate Set

20 Common Support Coverage by Number of Covariates and Covariate Set

21 Distribution of Balance of Binary Covariate

22 Distribution of Balance of Binary Covariate When N >

23 Distribution of Balance of Continuous Covariate When N >

24 Distribution of Balance of Binary Covariates By Covariate Set When N >

25 Balance of Binary Covariates by Covariate Set and Conditioning Method with No Correlation Among Covariates (r = 0.0)

26 Balance of Binary Covariates by Covariate Set and Conditioning Method with High Correlation Among Covariates (r = 0.5)
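The slides do not state which balance statistic the figures plot. A widely used choice is the standardized mean difference between treated and control groups, sketched below as an assumption rather than as the study's measure; the function name is hypothetical.

```python
import math
import statistics

def standardized_mean_difference(x, z):
    """(mean_treated - mean_control) divided by the pooled standard deviation."""
    treated = [v for v, t in zip(x, z) if t == 1]
    control = [v for v, t in zip(x, z) if t == 0]
    pooled_sd = math.sqrt((statistics.variance(treated) + statistics.variance(control)) / 2)
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd

# Illustrative covariate values: both groups have variance 1 and means one unit apart.
smd = standardized_mean_difference([2, 3, 4, 1, 2, 3], [1, 1, 1, 0, 0, 0])
print(smd)
```

Values near zero indicate that the covariate is distributed similarly in the two groups after conditioning.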

27 Bias
[Figure: Distribution of Bias by Conditioning Method (Cmatch, NoCmatch, Ignore, Ancova, PS_Ancova, Weighting, Stratify)]

28 Bias
[Figure: Distribution of Bias when N > 100, by Conditioning Method (Caliper Match, NoCaliper Match, Ignore, Ancova, PS_Ancova, Weighting, Stratify)]

29 Distribution of Bias by Covariate Sets when N>

32 RMSE
[Figure: RMSE Mean Distribution by Method]

33-36 RMSE (figures)

52 Conclusions
- Model specification error had deleterious effects on propensity score analysis
  - Consistent across conditioning methods
  - Observed in most outcome variables (e.g., bias, Type I error)
  - More serious as more covariates are omitted

53 Conclusions
- When there are covariates with different levels of reliability in a single sample, omitting covariates with poor measurement quality is not recommended
  - Be more cautious when the covariates are highly related to the outcome
  - Be more cautious when the covariates are highly related to treatment assignment
  - Be more cautious when the sample size is large
- The degree of impact depends on the conditioning method (e.g., less impact on PS ANCOVA) and also on the simulation study outcome (e.g., negligible effect on balance)

54 Further Research
- Errors-in-variables model
  - Explicitly model measurement errors in propensity score estimation using the errors-in-variables logistic model rather than omitting covariates with measurement error
- Propensity score analysis with binary outcome
  - The impact of measurement error and model specification error on the estimation of binary outcomes
- The effect of misspecification of functional forms in propensity score estimation

55 Contact Information
Your comments and questions are valued and encouraged. Contact the author at:
Eun Sook Kim, Ph.D.
Department of Educational Measurement and Research
University of South Florida
4202 E. Fowler Ave. EDU 105
Tampa, FL
Office: EDU 369
Phone: (813)
