Marginal Effects in Multiply Imputed Datasets

Similar documents
mi estimate : regress newrough_ic Current Torch_height Slowoncurves E_0 G_0 Current > 2 Pressure2 Cut_speed2 Presscutspeed, noconstant

Homework Solutions Applied Logistic Regression

ECON 594: Lecture #6

Latent class analysis and finite mixture models with Stata

options description set confidence level; default is level(95) maximum number of iterations post estimation results

Marginal effects and extending the Blinder-Oaxaca. decomposition to nonlinear models. Tamás Bartus

Using generalized structural equation models to fit customized models without programming, and taking advantage of new features of -margins-

Binary Dependent Variables

2. We care about proportion for categorical variable, but average for numerical one.

Linear Modelling in Stata Session 6: Further Topics in Linear Modelling

Marginal Effects for Continuous Variables Richard Williams, University of Notre Dame, Last revised January 20, 2018

Description Quick start Menu Syntax Options Remarks and examples Stored results Methods and formulas Acknowledgment References Also see

A Journey to Latent Class Analysis (LCA)

Boolean logit and probit in Stata

Analysis of repeated measurements (KLMED8008)

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

Self-Assessment Weeks 8: Multiple Regression with Qualitative Predictors; Multiple Comparisons

Variance estimation for Generalized Entropy and Atkinson indices: the complex survey data case

sociology 362 regression

crreg: A New command for Generalized Continuation Ratio Models

Understanding the multinomial-poisson transformation

Interpreting coefficients for transformed variables

Empirical Asset Pricing

Analyzing Proportions

leebounds: Lee s (2009) treatment effects bounds for non-random sample selection for Stata

Survival Analysis. Hsueh-Sheng Wu CFDR Workshop Series Spring 2010

Introduction A research example from ASR mltcooksd mlt2stage Outlook. Multi Level Tools. Influential cases in multi level modeling

Appendix A. Numeric example of Dimick Staiger Estimator and comparison between Dimick-Staiger Estimator and Hierarchical Poisson Estimator

Sociology 362 Data Exercise 6 Logistic Regression 2

Title. Description. stata.com. Special-interest postestimation commands. asmprobit postestimation Postestimation tools for asmprobit

Practice exam questions

Very simple marginal effects in some discrete choice models *

From the help desk: Comparing areas under receiver operating characteristic curves from two or more probit or logit models

1 The basics of panel data

i (x i x) 2 1 N i x i(y i y) Var(x) = P (x 1 x) Var(x)

Interaction effects between continuous variables (Optional)

Introduction to Event History Analysis. Hsueh-Sheng Wu CFDR Workshop Series June 20, 2016

raise Coef. Std. Err. z P> z [95% Conf. Interval]

Acknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression

Estimation of multivalued treatment effects under conditional independence

8. Nonstandard standard error issues 8.1. The bias of robust standard errors

Essential of Simple regression

Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

Lecture 4: Multivariate Regression, Part 2

Recent Developments in Multilevel Modeling

Logit estimates Number of obs = 5054 Wald chi2(1) = 2.70 Prob > chi2 = Log pseudolikelihood = Pseudo R2 =

Lecture 2: Poisson and logistic regression

Lecture 12: Effect modification, and confounding in logistic regression

Description Remarks and examples Also see

Title. Description. Special-interest postestimation commands. xtmelogit postestimation Postestimation tools for xtmelogit

Ordinal Independent Variables Richard Williams, University of Notre Dame, Last revised April 9, 2017

Lab 10 - Binary Variables

Working with Stata Inference on the mean

Sociology Exam 2 Answer Key March 30, 2012

SOCY5601 Handout 8, Fall DETECTING CURVILINEARITY (continued) CONDITIONAL EFFECTS PLOTS

Section Least Squares Regression

A short guide and a forest plot command (ipdforest) for one-stage meta-analysis

sociology 362 regression

How To Do Piecewise Exponential Survival Analysis in Stata 7 (Allison 1995:Output 4.20) revised

Assessing the Calibration of Dichotomous Outcome Models with the Calibration Belt

Problem Set 10: Panel Data

Nonlinear Econometric Analysis (ECO 722) : Homework 2 Answers. (1 θ) if y i = 0. which can be written in an analytically more convenient way as

point estimates, standard errors, testing, and inference for nonlinear combinations

Lecture 5: Poisson and logistic regression

CRE METHODS FOR UNBALANCED PANELS Correlated Random Effects Panel Data Models IZA Summer School in Labor Economics May 13-19, 2013 Jeffrey M.

Lecture (chapter 13): Association between variables measured at the interval-ratio level

Trajectory analysis. Jan Helmdag University of Greifswald. September 26, Preliminaries Why? How? What to do? Application example

Longitudinal Data Analysis Using Stata Paul D. Allison, Ph.D. Upcoming Seminar: May 18-19, 2017, Chicago, Illinois

Lecture#12. Instrumental variables regression Causal parameters III

Estimation and inference for quantiles and indices of inequality and poverty with survey data

1 A Review of Correlation and Regression

Modelling Binary Outcomes 21/11/2017

Lab 07 Introduction to Econometrics

The Simulation Extrapolation Method for Fitting Generalized Linear Models with Additive Measurement Error

suppress constant term

Practice 2SLS with Artificial Data Part 1

Functional Form. So far considered models written in linear form. Y = b 0 + b 1 X + u (1) Implies a straight line relationship between y and X

Analyzing Data Collected with Complex Survey Design. Hsueh-Sheng Wu CFDR Workshop Series Summer 2011

ECO220Y Simple Regression: Testing the Slope

Lab 11 - Heteroskedasticity

Outline. Linear OLS Models vs: Linear Marginal Models Linear Conditional Models. Random Intercepts Random Intercepts & Slopes

The simulation extrapolation method for fitting generalized linear models with additive measurement error

1 Warm-Up: 2 Adjusted R 2. Introductory Applied Econometrics EEP/IAS 118 Spring Sylvan Herskowitz Section #

Generalized linear models

Measurement Error. Often a data set will contain imperfect measures of the data we would ideally like.

Introductory Econometrics. Lecture 13: Hypothesis testing in the multiple regression model, Part 1

Addition to PGLR Chap 6

Warwick Economics Summer School Topics in Microeconometrics Instrumental Variables Estimation

Title. Description. Quick start. Menu. stata.com. irt nrm Nominal response model

Estimation of pre and post treatment Average Treatment Effects (ATEs) with binary time-varying treatment using Stata

Nonlinear mixed-effects models using Stata

SIMULATION-BASED SENSITIVITY ANALYSIS FOR MATCHING ESTIMATORS

Propensity Score Matching and Analysis TEXAS EVALUATION NETWORK INSTITUTE AUSTIN, TX NOVEMBER 9, 2018

Example 7b: Generalized Models for Ordinal Longitudinal Data using SAS GLIMMIX, STATA MEOLOGIT, and MPLUS (last proportional odds model only)

The Stata Journal. Editor Nicholas J. Cox Department of Geography Durham University South Road Durham DH1 3LE UK

ECON 497 Final Exam Page 1 of 12

Confidence intervals for the variance component of random-effects linear models

Correlation and Simple Linear Regression

Estimating chopit models in gllamm Political efficacy example from King et al. (2002)

-redprob- A Stata program for the Heckman estimator of the random effects dynamic probit model

Transcription:

Marginal Effects in Multiply Imputed Datasets Daniel Klein daniel.klein@uni-kassel.de University of Kassel 14th German Stata Users Group meeting GESIS Cologne June 10, 2016 1 / 25

1 Motivation 2 Marginal effects in multiply imputed datasets Some considerations A general approach The mimrgns command 3 Summary 2 / 25

Multiple Imputation The mi commands mi introduced in Stata11 imputes missing values mi impute fits models to imputed data mi estimate some postestimation commands mi test performs data management tasks mi xeq, mi passive,... 3 / 25

Marginal effects and adjusted predictions The margins command margins introduced in Stata11 estimates marginal effects as derivatives or (semi-)elasticities estimates adjusted predictions replaces old mfx and adjust Factor variable notation introduced with and required by margins creates indicator variables, higher order terms, interactions replaces old xi 4 / 25

Subsequent enhancements mi chained equations user imputation methods predictions margins marginsplot visualizes results pairwise comparisons pwcompare contrasts 5 / 25

So, what is missing? Setting up example data version 12.1 webuse nhanes2d, clear svyset, clear drop if (hlthstat > 5) set seed 42 mi set mlong mi register imputed vitaminc zinc mi impute chained /// (pmm, knn(5)) vitaminc zinc /// = hlthstat i.sex i.ageg i.psu finalwgt i.strata ///, add(5) noisily mi svyset psu [pweight = finalwgt], strata(strata) save nhanes2d_imputed5.dta, replace 6 / 25

So, what is missing? Combining mi.... mi estimate : svy : mlogit hlthstat i.sex vitaminc i.ageg zinc Multiple-imputation estimates Imputations = 5 Survey: Multinomial logistic regression Number of obs = 10335 Number of strata = 31 Population size = 116997257 Number of PSUs = 62 Average RVI = 0.2784 Largest FMI = 0.1845 Complete DF = 31 DF adjustment: Small sample DF: min = 21.03 avg = 27.89 max = 29.17 Model F test: Equal FMI F( 32, 28.1) = 197.60 Within VCE type: Linearized Prob > F = 0.0000 hlthstat Coef. Std. Err. t P> t [95% Conf. Interval] 1 2.sex -.2535739.0757789-3.35 0.002 -.4085658 -.098582 vitaminc.3783082.0836045 4.52 0.000.207102.5495144 agegrp 2 -.2417474.1129682-2.14 0.041 -.4727367 -.0107581 3 -.5148524.116651-4.41 0.000 -.75337 -.2763347 4-1.158049.0944494-12.26 0.000-1.351177 -.9649211 5-1.464487.1040847-14.07 0.000-1.677314-1.251659 6-1.236679.1044329-11.84 0.000-1.45025-1.023107 zinc.0069707.0029981 2.33 0.028.0008261.0131153 _cons -.3710475.2991727-1.24 0.225 -.9837576.2416626 (output omitted ) 7 / 25

So, what is missing?... and margins (output omitted ) 3 (base outcome) (output omitted ) 5 2.sex -.0199766.1179985-0.17 0.867 -.2613745.2214214 vitaminc -.9080216.1067354-8.51 0.000-1.129973 -.6860703 agegrp 2.2911128.3417395 0.85 0.401 -.4076501.9898758 3 1.109931.2787963 3.98 0.000.5398748 1.679988 4 1.577982.2523318 6.25 0.000 1.062038 2.093925 5 2.092809.2107573 9.93 0.000 1.66184 2.523779 6 2.483614.2154178 11.53 0.000 2.043079 2.924149 zinc -.0107241.0038114-2.81 0.010 -.0186396 -.0028087 _cons -1.316412.3994441-3.30 0.003-2.139385 -.493439.. margins, vce(unconditional) dydx(*) predict(outcome(1)) last estimates not found r(301); 8 / 25

General considerations How to? Technically, run both, the estimation command and margins on each imputed dataset post estimates to e(b) and e(v) combine results according to Rubin s rules better yet: let mi estimate do the work 9 / 25

Point estimate Variance-covariance matrix Rubin s rules A brief review q mi = 1 M VCE mi = W + M ˆq i i=1 ( 1 + 1 ) B M with B = W = 1 M M i=1 Ŵ i ( ) 1 M (ˆq i q mi ) (ˆq i q mi ) M 1 i=1 10 / 25

Requirements Applicability of Rubin s rules Statistically, q... estimates population parameter does not depend on sample size is (asymptotic) normal Satisfied by, e.g., regression coefficients linear predictor average marginal effects (in large samples) 11 / 25

The proposed solution Isabel Cañette and Yulia Marchenko program emargins, eclass properties(mi) version 11 args outcome svy : mlogit hlthstat i.sex vitaminc i.ageg zinc margins, vce(unconditional) dydx(*) /// predict(outcome(`outcome )) post end 12 / 25

. forvalues j = 1/5 { 2. mi estimate : emargins `j 3. } Let s try Multiple-imputation estimates Imputations = 5 Average marginal effects Number of obs = 10335 Number of strata = 31 Number of PSUs = 62 Average RVI = 0.0209 Largest FMI = 0.0906 Complete DF = 31 DF adjustment: Small sample DF: min = 25.74 avg = 28.64 Within VCE type: Linearized max = 29.17 Coef. Std. Err. t P> t [95% Conf. Interval] 2.sex -.0506149.011062-4.58 0.000 -.073242 -.0279878 vitaminc.0746023.0103436 7.21 0.000.0534397.0957648 agegrp 2 -.0283339.0185412-1.53 0.137 -.0662458.0095781 3 -.0703962.0191397-3.68 0.001 -.1095317 -.0312607 4 -.1800159.0154733-11.63 0.000 -.2116561 -.1483758 5 -.2412945.0155433-15.52 0.000 -.2730762 -.2095128 6 -.2335327.0144687-16.14 0.000 -.2631215 -.203944 zinc.0010539.0004302 2.45 0.021.0001692.0019386 (output omitted ) 13 / 25

Let s try (output omitted ) Multiple-imputation estimates Imputations = 5 Average marginal effects Number of obs = 10335 Number of strata = 31 Number of PSUs = 62 Average RVI = 0.0683 Largest FMI = 0.2394 Complete DF = 31 DF adjustment: Small sample DF: min = 18.37 avg = 26.98 Within VCE type: Linearized max = 29.17 Coef. Std. Err. t P> t [95% Conf. Interval] 2.sex -.0006929.0044593-0.16 0.878 -.0098181.0084322 vitaminc -.0396292.0052561-7.54 0.000 -.0504941 -.0287643 agegrp 2.005949.0042071 1.41 0.168 -.0026535.0145515 3.0314414.0075778 4.15 0.000.015947.0469358 4.0666996.0083927 7.95 0.000.0495386.0838605 5.103145.0093551 11.03 0.000.0840132.1222768 6.1286984.0152025 8.47 0.000.0976045.1597924 zinc -.0005411.0001548-3.50 0.003 -.0008658 -.0002164 14 / 25

Adapting the general approach A simple logit model mi xeq : generate byte goodhlth = (hlthstat < 3) program emargins2, eclass properties(mi) version 11 svy : logit goodhlth i.sex vitaminc i.ageg zinc margins, vce(unconditional) dydx(*) post end. mi estimate : emargins2 varlist specification required r(198); 15 / 25

Adapting the general approach Pairwise comparisons capture program drop emargins2 program emargins2, eclass properties(mi) version 11 svy : logit goodhlth i.sex vitaminc i.ageg zinc margins i.ageg, vce(unconditional) /// post pwcompare end. mi estimate : emargins2 whatever Multiple-imputation estimates Imputations = 5 Pairwise comparisons of predictive margins Number of strata = 31 Number of PSUs = 62 Average RVI = 0.0017 Largest FMI = 0.0078 Complete DF = 31 DF adjustment: Small sample DF: min = 29.05 avg = 29.14 Within VCE type: Linearized max = 29.16 Coef. Std. Err. t P> t [95% Conf. Interval] agegrp 1.7059828.0106223 66.46 0.000.684263.7277026 2.6568665.0195888 33.53 0.000.6168121.6969209 3.5459818.0152304 35.85 0.000.5148394.5771241 4.4056276.014571 27.84 0.000.375833.4354221 5.3318254.01227 27.04 0.000.3067364.3569145 6.3026305.0155862 19.42 0.000.2707557.3345053 16 / 25

Generalizing the proposed solution mimrgns Goals syntax similar to margins output similar to margins no hard coded estimation command 17 / 25

Replicating the original example. mi estimate : svy : mlogit hlthstat i.sex vitaminc i.ageg zinc (output omitted ). mimrgns, vce(unconditional) dydx(*) predict(outcome(1)) Multiple-imputation estimates Imputations = 5 Average marginal effects Number of obs = 10335 Number of strata = 31 Number of PSUs = 62 Average RVI = 0.0209 Largest FMI = 0.0906 Complete DF = 31 DF adjustment: Small sample DF: min = 25.74 avg = 28.64 Within VCE type: Linearized max = 29.17 Expression : Pr(hlthstat==1), predict(outcome(1)) dy/dx w.r.t. : 2.sex vitaminc 2.agegrp 3.agegrp 4.agegrp 5.agegrp 6.agegrp zinc dy/dx Std. Err. t P> t [95% Conf. Interval] 2.sex -.0506149.011062-4.58 0.000 -.073242 -.0279878 vitaminc.0746023.0103436 7.21 0.000.0534397.0957648 agegrp 2 -.0283339.0185412-1.53 0.137 -.0662458.0095781 3 -.0703962.0191397-3.68 0.001 -.1095317 -.0312607 4 -.1800159.0154733-11.63 0.000 -.2116561 -.1483758 5 -.2412945.0155433-15.52 0.000 -.2730762 -.2095128 6 -.2335327.0144687-16.14 0.000 -.2631215 -.203944 zinc.0010539.0004302 2.45 0.021.0001692.0019386 Note: dy/dx for factor levels is the discrete change from the base level. 18 / 25

The logit model. mi estimate : svy : logit goodhlth i.sex vitaminc i.ageg zinc Multiple-imputation estimates Imputations = 5 Survey: Logistic regression Number of obs = 10335 Number of strata = 31 Population size = 116997257 Number of PSUs = 62 Average RVI = 0.0340 Largest FMI = 0.1144 Complete DF = 31 DF adjustment: Small sample DF: min = 24.57 avg = 27.90 max = 29.16 Model F test: Equal FMI F( 8, 29.0) = 93.99 Within VCE type: Linearized Prob > F = 0.0000 goodhlth Coef. Std. Err. t P> t [95% Conf. Interval] 2.sex -.21535.061994-3.47 0.002 -.3421322 -.0885678 vitaminc.5040865.0639858 7.88 0.000.3727249.6354481 agegrp 2 -.2307692.0919848-2.51 0.018 -.4188591 -.0426792 3 -.7052038.0684112-10.31 0.000 -.8450875 -.5653201 4-1.284035.0787597-16.30 0.000-1.445081-1.122989 5-1.608576.073101-22.00 0.000-1.758052-1.459101 6-1.746134.0925375-18.87 0.000-1.935379-1.556888 zinc.0082014.0020081 4.08 0.000.0040671.0123356 _cons -.2285908.1915549-1.19 0.244 -.6234565.166275 19 / 25

Pairwise comparisons. mimrgns i.ageg, predict(pr) vce(unconditional) pwcompare Multiple-imputation estimates Imputations = 5 Pairwise comparisons of predictive margins Number of obs = 10335 Number of strata = 31 Number of PSUs = 62 Average RVI = 0.0017 Largest FMI = 0.0078 Complete DF = 31 DF adjustment: Small sample DF: min = 29.05 avg = 29.14 Within VCE type: Linearized max = 29.16 Expression : Pr(goodhlth), predict(pr) Contrast Std. Err. [95% Conf. Interval] agegrp 2 vs 1 -.0491163.0201039 -.0902249 -.0080076 3 vs 1 -.1600011.0156144 -.191928 -.1280741 4 vs 1 -.3003552.0176029 -.3363501 -.2643604 5 vs 1 -.3741574.0152539 -.4053472 -.3429676 6 vs 1 -.4033523.0189403 -.4420859 -.3646187 3 vs 2 -.1108848.0276638 -.1674514 -.0543182 (output omitted ) 6 vs 4 -.1029971.019452 -.1427737 -.0632204 6 vs 5 -.0291949.0156898 -.0612829.002893 20 / 25

Plotting results. mimrgns i.ageg, predict(pr) vce(unconditional) pwcompare cmdmargins (output omitted ). marginsplot, noci Variables that uniquely identify margins: _pw1 _pw0 _pw enumerates all pairwise comparisons; _pw0 enumerates the reference categories; _pw1 enumerates the comparison categories. Pairwise Comparisons of Predictive Margins of agegrp Comparisons of Pr(Goodhlth) -.4 -.2 0.2.4 1 2 3 4 5 6 Comparison category Reference category 1 2 3 4 5 6 21 / 25

What is the catch? Unresolved issues Statistical non-linear (adjusted) predictions Rubin s rules applicable? higher order terms and interactions passive imputation vs. JAV... 22 / 25

What is the catch? Unresolved issues Technical non-linear (adjusted) predictions transformations not easily implemented higher order terms and interactions margins relies on factor variables JAV approach not feasible execution time need to run estimation command and margins M times cf. Miles (2015)... 23 / 25

Summary some of margins results can be combined using Rubin s rules e.g., linear predictions, average marginal effects mimrgns makes this easy there are still unresolved statistical and technical issues produced results are not guaranteed to be valid 24 / 25

References Cañette, I and Marchenko, Y. (2010). Re: st: Average marginal effects for a multiply imputed complex survey. Statalist. Miles, A. (2015). Obtaining Predictions from Models Fit to Multiply-Imputed Data. Sociological Methods and Research 45(1):175-185. 25 / 25