Alexina Mason. Department of Epidemiology and Biostatistics Imperial College, London. 16 February 2010

Strategy for modelling non-random missing data mechanisms in longitudinal studies using Bayesian methods: application to income data from the Millennium Cohort Study
Alexina Mason, Department of Epidemiology and Biostatistics, Imperial College, London
16 February 2010
With thanks to Nicky Best, Ian Plewis and Sylvia Richardson. This work was supported by an ESRC PhD studentship.

Outline
1 Motivation: Introduction; MCS income example
2 Modelling Strategy: Overview; Construct a base model; Sensitivity analysis
3 Application: Construct a base model for MCS income; Sensitivity analysis for MCS income

Why do we need a missing data strategy?
Longitudinal studies inevitably lose members over time and generally suffer from missing data. Analysis of such data is complicated by missing covariates and missing responses. Many approaches have been proposed. Which approach is appropriate depends on the mechanism that leads to the missing data, but this mechanism cannot be determined from the data. Researchers are therefore forced to make assumptions, and are strongly recommended to check the robustness of their conclusions to alternative plausible assumptions. This can be complicated, so a flexible strategy can help.

Why does the strategy use Bayesian methods?
Bayesian full probability modelling is a statistically principled method for dealing with missing data. It combines the information in the observed data with assumptions about the missing value mechanism, and it accounts for the uncertainty introduced by the missing data. Bayesian methods also allow complex models to be constructed in a modular way. For example, a Bayesian joint model may consist of submodels for analysing the question of interest, imputing missing covariates and allowing the missingness mechanism to be informative. In addition, Bayesian methods enable coherent model estimation and facilitate sensitivity analysis. However, the principles of the strategy could be adapted for a non-Bayesian framework.

Millennium Cohort Study (MCS) example
The MCS has 18,000+ cohort members born in the UK at the beginning of the Millennium. Using sweeps 1 and 2, our example predicts income for main respondents (usually the cohort member's mother) who meet three criteria: single in sweep 1, in work, and not self-employed. Motivating questions about income include: Does ethnicity affect rate of pay? How much extra do individuals earn if they have a degree? Does change in partnership status affect income?

Missingness in the MCS income dataset
The initial dataset has 559 records. In sweep 1, records are cross-classified by whether covariates are observed or missing and whether pay is observed or missing; the cell counts include 43 and 4. We restrict the dataset to individuals fully observed in sweep 1, and cross-classify sweep 2 in the same way for the remaining 505 individuals. We do not distinguish between item and sweep non-response. All the covariate missingness comes from sweep non-response.

Types of missing data
Following Rubin, missing data are generally classified into 3 types. Consider the mechanism that led to missing pay in sweep 2 ($\text{pay}_2$), and define $p_i$ to be the probability that $\text{pay}_2$ is missing for individual $i$.
Missing Completely at Random (MCAR): missingness does not depend on observed or unobserved data, e.g. $p_i = \theta_0$.
Missing at Random (MAR): missingness depends only on observed data, e.g. $p_i = \theta_0 + \theta_1 \text{pay}_{1i}$.
Missing Not at Random (MNAR): neither MCAR nor MAR holds, e.g. $p_i = \theta_0 + \delta\,\text{pay}_{2i}$ or $p_i = \theta_0 + \theta_1 \text{pay}_{1i} + \delta(\text{pay}_{2i} - \text{pay}_{1i})$.
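
A small simulation shows the practical difference between the three mechanisms. The sketch below is ours, not the authors'. It assumes a logit link for $p_i$, because the slide shows only the linear predictors. It draws hypothetical standardised pay values with a true sweep 2 on sweep 1 slope of 0.8, deletes sweep 2 pay under each mechanism, and then fits a complete-case regression of $\text{pay}_2$ on $\text{pay}_1$. Under MCAR and MAR (missingness depending only on $\text{pay}_1$), the complete-case slope should stay close to 0.8. Under the second MNAR form, it should be biased.

```python
import math
import random
import statistics

TRUE_SLOPE = 0.8  # hypothetical sweep 1 -> sweep 2 pay relationship


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate(n=20000, seed=1):
    """Hypothetical standardised pay at sweeps 1 and 2."""
    rng = random.Random(seed)
    pay1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    pay2 = [TRUE_SLOPE * p1 + rng.gauss(0.0, 0.6) for p1 in pay1]
    return pay1, pay2


def prob_missing(mechanism, pay1, pay2, theta0=-1.0, theta1=1.0, delta=1.5):
    """p_i = Pr(pay2 missing); the logit link and parameter values are assumptions."""
    if mechanism == "MCAR":
        eta = theta0
    elif mechanism == "MAR":
        eta = theta0 + theta1 * pay1
    elif mechanism == "MNAR":
        eta = theta0 + theta1 * pay1 + delta * (pay2 - pay1)
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    return expit(eta)


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def complete_case_slope(mechanism, pay1, pay2, seed=2):
    """Delete pay2 under the mechanism, then regress pay2 on pay1 in the complete cases."""
    rng = random.Random(seed)
    kept = [(x, y) for x, y in zip(pay1, pay2)
            if rng.random() >= prob_missing(mechanism, x, y)]
    xs, ys = zip(*kept)
    return ols_slope(xs, ys)


if __name__ == "__main__":
    pay1, pay2 = simulate()
    for m in ("MCAR", "MAR", "MNAR"):
        print(m, round(complete_case_slope(m, pay1, pay2), 3))
```

The MAR case is the reason the strategy starts from complete cases. When missingness depends only on a variable that is conditioned on in the model, complete-case regression remains valid. When missingness depends on the unobserved $\text{pay}_2$ itself, it does not.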

Schematic diagram
[Figure: flow chart of the strategy. Base model: 1: select MoI using complete cases (note plausible alternatives); 2: add CMoM; 3: add RMoM; 4: seek additional data; 5: elicit expert knowledge. Sensitivity analysis: 6: ASSUMPTION SENSITIVITY; 7: PARAMETER SENSITIVITY; 8: Are conclusions robust? If YES, report robustness; if NO, determine region of high plausibility and recognise uncertainty. Model checking: assess fit of validation sample; calculate DIC.]
The strategy can be thought of as consisting of two parts: constructing a base model, and assessing conclusions from this base model against a selection of well-chosen sensitivity analyses.

Construct a base model I
1 Form an initial Model of Interest (MoI) using only complete cases. This includes choosing the transform for the response, the model structure and the set of explanatory variables.
2 Add a Covariate Model of Missingness (CMoM) to produce realistic imputations of any missing covariates.
3 Add a Response Model of Missingness (RMoM) to allow informative missingness in the response.

The joint model: schematic diagram
[Figure: the joint model has three parts. The model of interest links its parameters to the response (with missingness), the fully observed covariates and the covariates with missingness. The covariate model of missingness has its own covariate model parameters. The response model of missingness links the response model parameters, the probability of missingness and the missingness indicator. The response model of missingness is required for non-ignorable missingness in the response. Information from additional sources may help with the estimation of these parameters.]
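
In symbols, one way to write the joint model in the schematic is the standard selection-model factorisation (see Little and Rubin, 2002). The notation is ours and partly hypothetical. Here y is the response, x the covariates and m the missingness indicator. β, φ and θ are the parameters of the model of interest, the covariate model of missingness and the response model of missingness respectively; φ does not appear in the slides.

```latex
f(y, x, m \mid \beta, \phi, \theta)
  = \underbrace{f(y \mid x, \beta)}_{\text{model of interest}}
    \; \underbrace{f(x \mid \phi)}_{\text{covariate model of missingness}}
    \; \underbrace{f(m \mid y, x, \theta)}_{\text{response model of missingness}}
```

Suppose f(m | y, x, θ) depends on y only through its observed part (MAR) and θ is a priori independent of β and φ. Then the last factor can be dropped when making Bayesian inference about β. Under MNAR it must be modelled, which is why that part is required for non-ignorable missingness in the response. In the MCS application, the covariate model also conditions on the fully observed sweep 1 covariates.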

Construct a base model II
4 Additional data can help with parameter estimation. Possible sources include earlier or later sweeps of the longitudinal study that are not under investigation, and another study of individuals with similar characteristics.
5 Expert knowledge can be incorporated using informative priors. Information relating to the RMoM has the potential to make a large impact, particularly regarding its functional form.

Perform a sensitivity analysis
6 Assumption sensitivity: form alternative models from the base model by changing key assumptions, including the MoI error distribution, the MoI response transform and the functional form of the RMoM.
7 Parameter sensitivity: run the base model with the parameters controlling the extent of the departure from MAR fixed to a range of plausible values.
8 Use the results of both types of sensitivity analysis to establish the variability in the quantities of interest.

Initial model of interest
We choose the log of hourly net pay as our response, with 6 explanatory variables.

Description of explanatory variables
short name | description | details
age | main respondent's age | continuous (a)
edu | educational level | 3 levels (1 = none/NVQ1; 2 = NVQ2/3; 3 = NVQ4/5) (b)
eth | ethnic group | 2 levels (1 = white; 2 = non-white)
sing (c) | single/partner | 2 levels (1 = single; 2 = partner)
reg | region of country | 2 levels (1 = London; 2 = other)
stratum | ward type by country (d) | 9 levels

(a) centred and standardised
(b) the level of National Vocational Qualification (NVQ) equivalence of the individual's highest academic or vocational educational qualification (level 3 has a degree)
(c) always single in sweep 1
(d) three strata for England (advantaged, disadvantaged and ethnic minority); two strata for each of Wales, Scotland and Northern Ireland (advantaged and disadvantaged)

We also use a t distribution with 4 degrees of freedom ($t_4$) for the errors, for robustness to outliers.

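Initial model of interest: the equations
$$y_{it} \sim t_4(\mu_{it}, \sigma^2)$$
$$\mu_{it} = \alpha_i + \gamma_{s(i)} + \sum_{k=1}^{p} \beta_k x_{kit} + \sum_{k=p+1}^{q} \beta_k z_{ki}$$
Here $y_{it}$ is the log of hourly pay (hpay) for individual $i$ at sweep $t$, and the $t_4$ errors give robustness to outliers. $\alpha_i$ are individual random effects and $\gamma_{s(i)}$ are stratum-specific intercepts. The covariates are age (main respondent's age), edu (educational level), reg (London/other) and sing (single/partner), which can change between sweeps ($x_{kit}$), and eth (ethnic group), which does not ($z_{ki}$). Vague priors are used, e.g. a Normal prior with mean 0 and a large variance for each $\beta_k$.
Alternatives: (AS1) Normal errors; (AS2) include $\text{age}^2$ and age × edu interaction terms; (AS3) cube root transform of the response.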
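
The robustness argument for $t_4$ errors can be checked numerically. The sketch below is ours, not the authors' code, and all argument values are illustrative. It evaluates the location-scale Student-t log density (σ is a scale parameter, not a standard deviation) and the Normal log density used in AS1. It also builds $\mu_{it}$ from the linear predictor above.

```python
import math


def t_logpdf(y, mu, sigma, nu=4.0):
    """Log density of a location-scale Student-t with nu degrees of freedom."""
    z = (y - mu) / sigma
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1.0) / 2.0 * math.log1p(z * z / nu))


def normal_logpdf(y, mu, sigma):
    """Log density of a Normal distribution (the AS1 alternative)."""
    z = (y - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z


def linear_predictor(alpha_i, gamma_s, beta_x, x_it, beta_z, z_i):
    """mu_it = alpha_i + gamma_s(i) + sum_k beta_k x_kit + sum_k beta_k z_ki."""
    return (alpha_i + gamma_s
            + sum(b * x for b, x in zip(beta_x, x_it))
            + sum(b * z for b, z in zip(beta_z, z_i)))


if __name__ == "__main__":
    # An observation 5 scale units from mu is penalised far less under t_4.
    print(t_logpdf(5.0, 0.0, 1.0), normal_logpdf(5.0, 0.0, 1.0))
```

For an observation 5 scale units from the mean, the $t_4$ log density is about -5.9 compared with about -13.4 for the Normal. A single outlying log pay therefore pulls the regression coefficients much less under $t_4$.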

Conclusions based on complete cases
Higher hourly pay is associated with having a degree. There is little evidence of an association between pay and ethnicity. Lower pay is associated with gaining a partner between sweeps.

Key parameter estimates based on a complete case analysis
parameter | Complete Cases
β_edu[nvq2&3] | 0.15 (0.06, 0.25)
β_edu[nvq4&5] | 0.35 (0.24, 0.45)
β_eth | (-0.18, 0.10)
β_sing | (-0.14, 0.00)
Table shows the posterior mean, with the 95% interval in brackets.

Covariate model of missingness
Assume covariates are missing at random (MAR). stratum and eth do not change between sweeps, so imputation of missing sweep 2 values is required only for the other 4 covariates. For reg, we assign the sweep 1 value. For age, edu and sing, we set up a joint imputation model, using latent variables with Normal distributions for edu and sing. Fully observed sweep 1 covariates are used as explanatory variables in the imputation model.
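
A joint imputation with latent Normal variables can be sketched as follows. The sketch rests on several assumptions, none of which come from the slides. The hypothetical means stand in for regressions on the fully observed sweep 1 covariates. The correlation values and the cut-points for edu are illustrative. Latent variances are fixed at 1, as is usual for identifiability. One draw from a trivariate Normal gives age and two latent values. Ordered cut-points turn the first latent value into edu, and a threshold at zero turns the second into sing.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_mvn(mean, cov, rng):
    """One draw from a multivariate Normal using the Cholesky factor of cov."""
    low = cholesky(cov)
    e = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * e[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def edu_from_latent(z, cuts=(0.0, 1.0)):
    """Hypothetical ordered cut-points map the latent value to edu level 1, 2 or 3."""
    if z < cuts[0]:
        return 1
    if z < cuts[1]:
        return 2
    return 3


def sing_from_latent(z):
    """Latent value above zero gives 2 (partner), otherwise 1 (single)."""
    return 2 if z > 0.0 else 1


def impute_sweep2(mean, cov, rng):
    """Jointly impute (age, edu, sing) for one individual from correlated latent Normals."""
    age, z_edu, z_sing = draw_mvn(mean, cov, rng)
    return age, edu_from_latent(z_edu), sing_from_latent(z_sing)


if __name__ == "__main__":
    rng = random.Random(42)
    mean = [0.1, 0.4, -1.0]          # hypothetical predicted means
    cov = [[1.0, 0.2, 0.1],          # hypothetical correlations,
           [0.2, 1.0, 0.0],          # unit variances on the latent scales
           [0.1, 0.0, 1.0]]
    print(impute_sweep2(mean, cov, rng))
```

Drawing the three values jointly lets the imputed edu and sing respect their correlation with age. Separate univariate imputations would lose that correlation.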

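Response model of missingness (selection model)
We allow informative missingness in the response by modelling a missing value indicator $m_i$ for sweep 2 pay ($\text{hpay}_{i2}$). Here $m_i = 1$ if $\text{hpay}_{i2}$ is observed and $m_i = 0$ if $\text{hpay}_{i2}$ is missing. We use a logit model for the response indicator, i.e. $m_i \sim \text{Bernoulli}(p_i)$, with the form of $\text{logit}(p_i)$ still to be chosen. Note that in this sub-model $p_i$ is the probability that $\text{hpay}_{i2}$ is observed. Previous work in this area informs both the choice of predictors of missing income and the functional form. Untransformed hourly pay is used in this sub-model.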

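Response model of missingness: the equations
$$m_i \sim \text{Bernoulli}(p_i)$$
$$\text{logit}(p_i) = \theta_0 + \text{Piecewise}(\text{level}_i) + \text{Piecewise}(\text{change}_i) + \sum_k \theta_k w_{ki}$$
$$\text{Piecewise}(\text{level}_i) = \begin{cases} \theta_{\text{level}[1]}(\text{level}_i - 10) & \text{level}_i < 10 \\ \theta_{\text{level}[2]}(\text{level}_i - 10) & \text{level}_i \ge 10 \end{cases}$$
$$\text{Piecewise}(\text{change}_i) = \begin{cases} \delta_1\,\text{change}_i & \text{change}_i < 0 \\ \delta_2\,\text{change}_i & \text{change}_i \ge 0 \end{cases}$$
Here $\text{level}_i = \text{hpay}_{i1}$ and $\text{change}_i = \text{hpay}_{i2} - \text{hpay}_{i1}$. The $w_{ki}$ are eth (ethnic group), sc (social class) and ctry (country), and vague priors are used throughout. The choice of functional form and the position of the knots are based on expert knowledge. Alternative (AS4): linear functional form.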
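
The piecewise logit above can be written directly as a function. This is a sketch, not the authors' code. The parameter values in the usage example are illustrative choices inside the 95% intervals on the next slide: θ_0 = 2.8, θ_level = (0.29, 0.59), δ = (0.67, -0.2). The returned probability is Pr(m_i = 1), i.e. that sweep 2 pay is observed.

```python
import math


def piecewise_level(level, theta_level1, theta_level2, knot=10.0):
    """theta_level[1] (level - knot) below the knot, theta_level[2] (level - knot) at or above it."""
    slope = theta_level1 if level < knot else theta_level2
    return slope * (level - knot)


def piecewise_change(change, delta1, delta2):
    """delta1 * change for falls in pay, delta2 * change for rises or no change."""
    return (delta1 if change < 0 else delta2) * change


def prob_observed(hpay1, hpay2, theta0, theta_level, delta, theta_w=(), w=()):
    """p_i = Pr(m_i = 1) under the piecewise logit model; hourly pay is untransformed."""
    level = hpay1
    change = hpay2 - hpay1
    eta = (theta0
           + piecewise_level(level, *theta_level)
           + piecewise_change(change, *delta)
           + sum(t * x for t, x in zip(theta_w, w)))
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    for hpay2 in (5.0, 8.0, 11.0):  # sweep 1 pay fixed at 8
        print(hpay2, prob_observed(8.0, hpay2, 2.8, (0.29, 0.59), (0.67, -0.2)))
```

With δ_1 > 0 and δ_2 < 0, the probability of response is highest when pay is unchanged and falls as pay moves in either direction. Setting the two level slopes equal and δ_1 = δ_2 gives the linear AS4 form; the knot then only shifts the intercept.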

Response model of missingness: parameter estimates
Sweep 2 pay is more likely to be missing for individuals who are non-white, who have low levels of pay in sweep 1, or whose pay changes substantially between sweeps. Social class and country make little difference.

parameter | BASE CASE
θ_0 | (1.80, 3.89)
θ_level[1] | 0.29 (0.10, 0.52)
θ_level[2] | 0.59 (0.29, 0.91)
δ_change[1] | 0.67 (0.39, 0.97)
δ_change[2] | (-0.36, -0.06)
θ_ctry[2: Wales] | (-0.82, 0.52)
θ_ctry[3: Scotland] | 0.17 (-0.46, 0.81)
θ_ctry[4: Northern Ireland] | 0.29 (-0.36, 0.97)
θ_eth | (-1.82, -0.46)
θ_sc[2] | 0.06 (-0.70, 0.85)
θ_sc[3] | (-1.08, 1.00)
θ_sc[4] | 0.15 (-0.62, 0.95)
Table shows posterior mean (95% interval).

The 95% intervals of the change parameters (δ) do not include zero. Given the model assumptions, this is evidence of informative missingness.

Impact on substantive questions
Conclusions regarding education and ethnicity are unchanged. Evidence of an association between hourly pay and gaining a partner between sweeps has strengthened.

Comparison of parameter estimates from the model of interest (complete cases) and the joint model (BASE CASE)
parameter | Complete Cases | BASE CASE
β_edu[nvq2&3] | 0.15 (0.06, 0.25) | 0.17 (0.09, 0.25)
β_edu[nvq4&5] | 0.35 (0.24, 0.45) | 0.35 (0.25, 0.44)
β_eth | (-0.18, 0.10) | (-0.18, 0.06)
β_sing | (-0.14, 0.00) | (-0.20, -0.02)
Table shows the posterior mean, with the 95% interval in brackets.
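
Because the response is log pay, each β is a log-scale effect, and exp(β) is the proportional change in hourly pay. A minimal sketch follows. Interval endpoints transform exactly because exp is monotone, so quantiles map to quantiles. Exponentiating the posterior mean only approximates the posterior mean of exp(β).

```python
import math


def multiplicative_effect(lower, upper, mean=None):
    """Exponentiate log-scale posterior summaries of a coefficient.

    Interval endpoints map exactly; exp(mean) only approximates E[exp(beta)].
    """
    out = {"lower": math.exp(lower), "upper": math.exp(upper)}
    if mean is not None:
        out["exp_of_mean"] = math.exp(mean)
    return out


if __name__ == "__main__":
    print(multiplicative_effect(-0.20, -0.02))            # BASE beta_sing
    print(multiplicative_effect(0.25, 0.44, mean=0.35))   # BASE beta_edu[nvq4&5]
```

For the BASE β_sing interval (-0.20, -0.02), this gives (0.82, 0.98) after rounding. That matches the BASE interval for sing in the tabular parameter sensitivity results. For β_edu[nvq4&5], exp(0.35) is about 1.42, i.e. roughly 42% higher hourly pay.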

Incorporating additional sources of information
Where information is limited, some parameters in the joint model can be difficult to estimate. However, we can increase the amount of information available by incorporating data from other sources, e.g. data from other studies and expert opinion.

Seek additional data
For example, imputing edu is difficult because few individuals gain qualifications between sweeps. We therefore seek another study of individuals with similar characteristics that includes education variables. We then expand the Covariate Model of Missingness (CMoM) to model data from the original study (MCS) and the additional study simultaneously, by fitting 2 sets of equations with common coefficients: 1 set for imputing the missing MCS covariates and 1 set for modelling the additional data. The extra data allow the parameters in the CMoM to be estimated with greater precision.
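
The "common coefficients" idea is easiest to see in its simplest form. A single probability of gaining a qualification between sweeps is shared by the MCS equation and the additional-study equation. This is a deliberately reduced, intercept-only sketch; the CMoM itself is a regression model. A Beta-binomial conjugate update stands in for MCMC, and all counts are hypothetical.

```python
import math


def update_beta(a, b, successes, trials):
    """Conjugate update of a Beta(a, b) prior with binomial data."""
    return a + successes, b + trials - successes


def shared_posterior(datasets, a=1.0, b=1.0):
    """Posterior for one probability shared by every (successes, trials) dataset."""
    for successes, trials in datasets:
        a, b = update_beta(a, b, successes, trials)
    return a, b


def beta_mean_sd(a, b):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, math.sqrt(var)


if __name__ == "__main__":
    mcs_only = shared_posterior([(6, 400)])                # hypothetical MCS counts
    pooled = shared_posterior([(6, 400), (60, 3000)])      # plus a hypothetical extra study
    print(beta_mean_sd(*mcs_only), beta_mean_sd(*pooled))
```

Both likelihood terms involve the same parameter, so the additional study narrows its posterior. This is the sense in which the extra data improve the precision of the CMoM parameters. The gain depends on the two populations being similar enough for the coefficients to be shared.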

Elicit expert knowledge
The Bayesian approach provides the option of including additional information through informative priors. This is of greatest potential value for the parameters associated with informative missingness. Informative priors can be formed through elicitation, but these parameters are difficult to elicit directly. Instead, we elicit information about the probability of response at design points and convert this to informative priors. A good elicitation strategy would identify and concentrate on weakly identified variables, allow for correlation between these variables, and focus on functional form.
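
One simple way to convert elicited response probabilities into statements about logit parameters is sketched below. This is an assumption on our part, not the authors' elicitation procedure. Two elicited points fix a line on the logit scale. An elicited range for a probability becomes a Normal prior on the logit scale whose central 95% interval matches that range.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def coefficients_from_design_points(x1, p1, x2, p2):
    """Intercept and slope of logit(p) = theta0 + theta1 * x through two elicited points."""
    slope = (logit(p2) - logit(p1)) / (x2 - x1)
    intercept = logit(p1) - slope * x1
    return intercept, slope


def normal_prior_on_logit(p_low, p_high, z=1.96):
    """Normal (mean, sd) on the logit scale whose central 95% range maps to [p_low, p_high]."""
    lo, hi = logit(p_low), logit(p_high)
    return (lo + hi) / 2.0, (hi - lo) / (2.0 * z)


if __name__ == "__main__":
    # Hypothetical: 60% response if hourly pay falls by 3 units, 90% if unchanged.
    print(coefficients_from_design_points(-3.0, 0.6, 0.0, 0.9))
    # Hypothetical elicited range for the response probability at one design point.
    print(normal_prior_on_logit(0.8, 0.95))
```

With more design points and elicited uncertainty at each, the same mapping can be applied point by point. The correlation between the resulting parameter priors then has to be handled explicitly, which is one reason the strategy stresses allowing for correlation.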

Base model fit using validation sample
Data were collected from 7 individuals who were originally non-contacts or refusals in sweep 2, after they were re-issued by the fieldwork agency. We set these data to missing before fitting our models, so they can now be used for model checking. For BASE, hourly pay is well estimated for all 7 individuals.
[Figure: BASE posterior predictive distribution (probability density against hourly pay) and observed value of hourly pay for 4 of the re-issued individuals, panels A to D. The true value of hourly pay is indicated by the red line.]

Assumption sensitivity analysis: description
BASE CASE key features:
MoI - $t_4$ error distribution
MoI - covariates {age, edu, eth, reg, sing}
MoI - log transform of the response
RMoM - piecewise linear functional form for level and change
Assumption sensitivity analysis differences from BASE CASE:
AS1: MoI - Normal error distribution
AS2: MoI - additional covariates $\text{age}^2$ and age × edu
AS3: MoI - cube root transform of response
AS4: RMoM - linear functional form for level and change
MoI = Model of Interest; RMoM = Response Model of Missingness

Assumption sensitivity analysis: results
Based on this sensitivity analysis, the conclusions from BASE CASE about ethnicity are robust. For gaining a partner, there is consistent evidence of an association with lower pay, but its strength is unclear.

Comparison of parameters associated with being non-white and gaining a partner between sweeps
model | β_eth | β_sing
CC (complete cases) | (-0.18, 0.10) | (-0.14, 0.00)
BASE | (-0.18, 0.06) | (-0.20, -0.02)
AS1 (Normal errors) | 0.01 (-0.12, 0.15) | (-0.23, -0.01)
AS2 (additional covariates) | (-0.18, 0.07) | (-0.20, -0.01)
AS4 (linear level & change) | (-0.18, 0.07) | (-0.25, -0.07)
AS3 (cube root transform) | (-0.12, 0.04) | (-0.14, -0.02)
Table shows the posterior mean, with the 95% interval in brackets.

AS1-AS4: model fit using validation sample
The mean square error (MSE) of the fit of hourly pay for the 7 re-issued individuals is a summary measure of model performance. The models with the linear functional form for the RMoM (AS4) and with the cube root transform (AS3) fit the re-issued individuals best.

MSE of the fit of hourly pay for the 7 re-issued individuals
model | median | 95% interval
BASE | 18.7 | (3.1, 367.0)
AS1 (Normal errors) | 16.8 | (3.2, 108.8)
AS2 (additional covariates) | 14.2 | (2.8, 295.3)
AS3 (cube root transform) | 8.0 | (1.9, 73.6)
AS4 (linear level & change) | 8.8 | (2.9, 21.7)
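
The table reports a median and 95% interval for the MSE. This suggests the MSE was computed once per posterior predictive draw and then summarised. The sketch below follows that reading, which is an assumption on our part. `draws[s][i]` is a hypothetical layout holding draw s of hourly pay for re-issued individual i.

```python
import statistics


def mse_per_draw(draws, observed):
    """MSE over the validation individuals, computed separately for each posterior draw."""
    return [statistics.fmean([(d - y) ** 2 for d, y in zip(draw, observed)])
            for draw in draws]


def median_and_interval(values):
    """Median and central 95% interval (2.5% and 97.5% points) of a list of values."""
    cuts = statistics.quantiles(values, n=40, method="inclusive")
    return statistics.median(values), cuts[0], cuts[-1]


if __name__ == "__main__":
    draws = [[9.0, 7.5], [11.0, 6.0], [10.5, 8.0]]  # hypothetical draws for 2 individuals
    observed = [10.0, 7.0]                          # hypothetical observed pay
    print(median_and_interval(mse_per_draw(draws, observed)))
```

Reporting the median rather than the mean suits this quantity. As the BASE row (3.1 to 367.0) shows, the per-draw MSE can be very right-skewed.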

Parameter sensitivity analysis: description
Recall that $\text{change}_i = \text{hpay}_{i2} - \text{hpay}_{i1}$ and
$$\text{logit}(p_i) = \cdots + \begin{cases} \delta_1\,\text{change}_i & \text{change}_i < 0 \\ \delta_2\,\text{change}_i & \text{change}_i \ge 0 \end{cases}$$
The values of $\delta_1$ and $\delta_2$ control the degree of departure from MAR, but they are difficult for the model to estimate. So a series of models is run with these two parameters fixed. There are 81 variants, formed by combining 9 values of $\delta_1$ with 9 values of $\delta_2$; the value set for both is {-1, -0.75, -0.5, -0.25, 0, 0.25, 0.5, 0.75, 1}. 9 variants have a linear functional form of change, i.e. $\delta_1 = \delta_2$. The $\delta_1 = \delta_2 = 0$ variant is equivalent to assuming the response is MAR.
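
The grid of parameter sensitivity (PS) variants is easy to enumerate. The sketch below only builds and classifies the 81 (δ_1, δ_2) pairs. Refitting the model with each pair fixed is the expensive step and is not shown.

```python
import itertools

# Value set used for both delta_1 and delta_2.
DELTA_VALUES = (-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0)


def ps_variants(values=DELTA_VALUES):
    """All (delta_1, delta_2) combinations: 9 x 9 = 81 fixed-parameter variants."""
    return list(itertools.product(values, repeat=2))


def describe(variant):
    """Classify a variant by the functional form it implies for change."""
    delta1, delta2 = variant
    if delta1 == 0.0 and delta2 == 0.0:
        return "MAR"
    if delta1 == delta2:
        return "linear in change"
    return "piecewise in change"


if __name__ == "__main__":
    variants = ps_variants()
    labels = [describe(v) for v in variants]
    print(len(variants), labels.count("MAR"), labels.count("linear in change"))  # 81 1 8
```

The 8 "linear in change" variants plus the MAR variant make up the 9 variants with δ_1 = δ_2.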

Parameter sensitivity analysis: results - tabular
Proportional increase in pay associated with selected covariates for PS variants compared with base model (BASE)
covariate | PS variant giving minimum | PS variant giving maximum | MAR (a) | BASE
edu[nvq2&3] | (1.06, 1.25) | (1.10, 1.29) | (1.06, 1.26) | (1.09, 1.29)
edu[nvq4&5] | (1.18, 1.44) | (1.31, 1.58) | (1.19, 1.46) | (1.28, 1.56)
eth | (0.82, 1.05) | (0.85, 1.12) | (0.84, 1.09) | (0.83, 1.06)
sing | (0.71, 0.84) | (1.18, 1.47) | (0.88, 0.99) | (0.82, 0.98)
Table shows the posterior mean, with the 95% interval in brackets.
(a) δ_1 = 0 and δ_2 = 0 is MAR.

Parameter sensitivity analysis: results - graphical I
[Figure: estimated proportional change in pay associated with being non-white (e^β_eth; posterior mean and 95% interval) plotted against δ_1, in separate panels for δ_2 = -1, -0.5, 0, 0.5, 1, from the PS variants.]

Parameter sensitivity analysis: results - graphical II
[Figure: estimated proportional change in pay associated with gaining a partner between sweeps (e^β_sing; posterior mean and 95% interval) plotted against δ_1, in separate panels for δ_2 = -1, -0.5, 0, 0.5, 1, from the PS variants.]

Parameter sensitivity analysis: results - graphical III
[Figure: posterior mean of the proportional change in pay associated with being non-white (e^β_eth) and with gaining a partner (e^β_sing), plotted against δ_1 and δ_2 from the PS variants. The points at the δ values relating to MAR, BASE and AS4 are marked with a red circle, blue triangle and green diamond respectively.]

Reporting robustness: ethnicity question
Does ethnicity affect rate of pay? The key points to report are as follows. There is no evidence of an association between ethnicity and hourly pay. In the base model, the proportional change in hourly pay associated with being non-white has a posterior mean of 0.94, with a 95% interval from 0.83 to 1.06. These conclusions are very robust to our sensitivity analysis. However, the results relating to gaining a partner are not robust, so we need to investigate the plausibility of different models.

Assess fit using validation sample
[Figure: mean square error of the fit of hourly pay for the 7 re-issued individuals, plotted against δ_1 and δ_2 from the PS variants. The points at the δ values relating to MAR, BASE and AS4 are marked with a red circle, blue triangle and green diamond respectively.]

Determine region of high plausibility
[Figure: MSE for the re-issued individuals and proportional change in pay with gaining a partner, each plotted against δ_1 and δ_2. The points at the δ values relating to MAR, BASE and AS4 are marked with a red circle, blue triangle and green diamond respectively.]

Recognising uncertainty: partnership question
Does change in partnership status affect income? The key points to report are as follows. There is evidence that gaining a partner is associated with a decrease in hourly pay, but the magnitude of this decrease is uncertain. Our analysis suggests that the proportional change in pay lies in the region 0.77 (0.71, 0.84) to 0.94 (0.88, 0.99), i.e. a decrease of roughly 6% to 23%. Some models run as part of the sensitivity analysis suggest that a change in partnership status is associated with an increase in pay, but these models do not fall in the region of high plausibility.
Why should gaining a partner be associated with a decrease in hourly pay? Is it a proxy for an additional child? Is there reverse causality?

Summary
Compared to a complete case analysis, implementing this strategy is time-consuming. In return, it allows realistic assumptions about the missingness mechanism to be explored and provides confidence in the conclusions to the questions of interest. The proposed strategy is flexible. Steps can be omitted if appropriate, the strategy can be extended if necessary, and it can be applied to other types of studies.

Relevant literature
The BIAS project.
Best, N. G., Spiegelhalter, D. J., Thomas, A., and Brayne, C. E. G. (1996). Bayesian Analysis of Realistically Complex Models. Journal of the Royal Statistical Society, Series A (Statistics in Society), 159(2).
Daniels, M. J. and Hogan, J. W. (2008). Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis. Chapman & Hall.
Little, R. J. A. and Rubin, D. B. (2002). Statistical Analysis with Missing Data (2nd edn). John Wiley and Sons.
Plewis, I. (2007). Non-Response in a Birth Cohort Study: The Case of the Millennium Cohort Study. International Journal of Social Research Methodology, 10(5).


More information

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness

A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model

More information

Some methods for handling missing values in outcome variables. Roderick J. Little

Some methods for handling missing values in outcome variables. Roderick J. Little Some methods for handling missing values in outcome variables Roderick J. Little Missing data principles Likelihood methods Outline ML, Bayes, Multiple Imputation (MI) Robust MAR methods Predictive mean

More information

Whether to use MMRM as primary estimand.

Whether to use MMRM as primary estimand. Whether to use MMRM as primary estimand. James Roger London School of Hygiene & Tropical Medicine, London. PSI/EFSPI European Statistical Meeting on Estimands. Stevenage, UK: 28 September 2015. 1 / 38

More information

Plausible Values for Latent Variables Using Mplus

Plausible Values for Latent Variables Using Mplus Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can

More information

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded 1 Background Latent confounder is common in social and behavioral science in which most of cases the selection mechanism

More information

Known unknowns : using multiple imputation to fill in the blanks for missing data

Known unknowns : using multiple imputation to fill in the blanks for missing data Known unknowns : using multiple imputation to fill in the blanks for missing data James Stanley Department of Public Health University of Otago, Wellington james.stanley@otago.ac.nz Acknowledgments Cancer

More information

Adjustment for Missing Confounders Using External Validation Data and Propensity Scores

Adjustment for Missing Confounders Using External Validation Data and Propensity Scores Adjustment for Missing Confounders Using External Validation Data and Propensity Scores Lawrence C. McCandless 1 Sylvia Richardson 2 Nicky Best 2 1 Faculty of Health Sciences, Simon Fraser University,

More information

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P. Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

More information

MISSING or INCOMPLETE DATA

MISSING or INCOMPLETE DATA MISSING or INCOMPLETE DATA A (fairly) complete review of basic practice Don McLeish and Cyntha Struthers University of Waterloo Dec 5, 2015 Structure of the Workshop Session 1 Common methods for dealing

More information

Millennium Cohort Study:

Millennium Cohort Study: Millennium Cohort Study: Geographic Identifiers in MCS June 2009 Jon Johnson Centre for Longitudinal Studies Faculty of Policy and Society Institute of Education, University of London Page 1 Introduction

More information

Reconstruction of individual patient data for meta analysis via Bayesian approach

Reconstruction of individual patient data for meta analysis via Bayesian approach Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi

More information

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level A Monte Carlo Simulation to Test the Tenability of the SuperMatrix Approach Kyle M Lang Quantitative Psychology

More information

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Statistical Methods. Missing Data  snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23 1 / 23 Statistical Methods Missing Data http://www.stats.ox.ac.uk/ snijders/sm.htm Tom A.B. Snijders University of Oxford November, 2011 2 / 23 Literature: Joseph L. Schafer and John W. Graham, Missing

More information

The propensity score with continuous treatments

The propensity score with continuous treatments 7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.

More information

Structural Uncertainty in Health Economic Decision Models

Structural Uncertainty in Health Economic Decision Models Structural Uncertainty in Health Economic Decision Models Mark Strong 1, Hazel Pilgrim 1, Jeremy Oakley 2, Jim Chilcott 1 December 2009 1. School of Health and Related Research, University of Sheffield,

More information

New Developments in Nonresponse Adjustment Methods

New Developments in Nonresponse Adjustment Methods New Developments in Nonresponse Adjustment Methods Fannie Cobben January 23, 2009 1 Introduction In this paper, we describe two relatively new techniques to adjust for (unit) nonresponse bias: The sample

More information

Toutenburg, Fieger: Using diagnostic measures to detect non-mcar processes in linear regression models with missing covariates

Toutenburg, Fieger: Using diagnostic measures to detect non-mcar processes in linear regression models with missing covariates Toutenburg, Fieger: Using diagnostic measures to detect non-mcar processes in linear regression models with missing covariates Sonderforschungsbereich 386, Paper 24 (2) Online unter: http://epub.ub.uni-muenchen.de/

More information

Estimation of Missing Data Using Convoluted Weighted Method in Nigeria Household Survey

Estimation of Missing Data Using Convoluted Weighted Method in Nigeria Household Survey Science Journal of Applied Mathematics and Statistics 2017; 5(2): 70-77 http://www.sciencepublishinggroup.com/j/sjams doi: 10.1168/j.sjams.20170502.12 ISSN: 2376-991 (Print); ISSN: 2376-9513 (Online) Estimation

More information

Growth Mixture Modeling and Causal Inference. Booil Jo Stanford University

Growth Mixture Modeling and Causal Inference. Booil Jo Stanford University Growth Mixture Modeling and Causal Inference Booil Jo Stanford University booil@stanford.edu Conference on Advances in Longitudinal Methods inthe Socialand and Behavioral Sciences June 17 18, 2010 Center

More information

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

More information

An Introduction to Mplus and Path Analysis

An Introduction to Mplus and Path Analysis An Introduction to Mplus and Path Analysis PSYC 943: Fundamentals of Multivariate Modeling Lecture 10: October 30, 2013 PSYC 943: Lecture 10 Today s Lecture Path analysis starting with multivariate regression

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Mark Scheme (Results) Summer 2010

Mark Scheme (Results) Summer 2010 Mark Scheme (Results) Summer 2010 GCE Statistics S1 (6683) Edexcel Limited. Registered in England and Wales No. 4496750 Registered Office: One90 High Holborn, London WC1V 7BH Edexcel is one of the leading

More information

University of Warwick institutional repository:

University of Warwick institutional repository: University of Warwick institutional repository: http://go.warwick.ac.uk/wrap This paper is made available online in accordance with publisher policies. Please scroll down to view the document itself. Please

More information

Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse

Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse Nonrespondent subsample multiple imputation in two-phase random sampling for nonresponse Nanhua Zhang Division of Biostatistics & Epidemiology Cincinnati Children s Hospital Medical Center (Joint work

More information

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University

Shu Yang and Jae Kwang Kim. Harvard University and Iowa State University Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND

More information

PIRLS 2016 Achievement Scaling Methodology 1

PIRLS 2016 Achievement Scaling Methodology 1 CHAPTER 11 PIRLS 2016 Achievement Scaling Methodology 1 The PIRLS approach to scaling the achievement data, based on item response theory (IRT) scaling with marginal estimation, was developed originally

More information

Midterm 1 ECO Undergraduate Econometrics

Midterm 1 ECO Undergraduate Econometrics Midterm ECO 23 - Undergraduate Econometrics Prof. Carolina Caetano INSTRUCTIONS Reading and understanding the instructions is your responsibility. Failure to comply may result in loss of points, and there

More information

Don t be Fancy. Impute Your Dependent Variables!

Don t be Fancy. Impute Your Dependent Variables! Don t be Fancy. Impute Your Dependent Variables! Kyle M. Lang, Todd D. Little Institute for Measurement, Methodology, Analysis & Policy Texas Tech University Lubbock, TX May 24, 2016 Presented at the 6th

More information

Comparison of multiple imputation methods for systematically and sporadically missing multilevel data

Comparison of multiple imputation methods for systematically and sporadically missing multilevel data Comparison of multiple imputation methods for systematically and sporadically missing multilevel data V. Audigier, I. White, S. Jolani, T. Debray, M. Quartagno, J. Carpenter, S. van Buuren, M. Resche-Rigon

More information

Missing covariate data in matched case-control studies: Do the usual paradigms apply?

Missing covariate data in matched case-control studies: Do the usual paradigms apply? Missing covariate data in matched case-control studies: Do the usual paradigms apply? Bryan Langholz USC Department of Preventive Medicine Joint work with Mulugeta Gebregziabher Larry Goldstein Mark Huberman

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics

More information

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling

Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Bayesian SAE using Complex Survey Data Lecture 4A: Hierarchical Spatial Bayes Modeling Jon Wakefield Departments of Statistics and Biostatistics University of Washington 1 / 37 Lecture Content Motivation

More information

Markov Chain Monte Carlo methods

Markov Chain Monte Carlo methods Markov Chain Monte Carlo methods By Oleg Makhnin 1 Introduction a b c M = d e f g h i 0 f(x)dx 1.1 Motivation 1.1.1 Just here Supresses numbering 1.1.2 After this 1.2 Literature 2 Method 2.1 New math As

More information

Overlapping Astronomical Sources: Utilizing Spectral Information

Overlapping Astronomical Sources: Utilizing Spectral Information Overlapping Astronomical Sources: Utilizing Spectral Information David Jones Advisor: Xiao-Li Meng Collaborators: Vinay Kashyap (CfA) and David van Dyk (Imperial College) CHASC Astrostatistics Group April

More information

Multiple Imputation for Missing Data in Repeated Measurements Using MCMC and Copulas

Multiple Imputation for Missing Data in Repeated Measurements Using MCMC and Copulas Multiple Imputation for Missing Data in epeated Measurements Using MCMC and Copulas Lily Ingsrisawang and Duangporn Potawee Abstract This paper presents two imputation methods: Marov Chain Monte Carlo

More information

7 Sensitivity Analysis

7 Sensitivity Analysis 7 Sensitivity Analysis A recurrent theme underlying methodology for analysis in the presence of missing data is the need to make assumptions that cannot be verified based on the observed data. If the assumption

More information

Bayesian Mixture Modeling

Bayesian Mixture Modeling University of California, Merced July 21, 2014 Mplus Users Meeting, Utrecht Organization of the Talk Organization s modeling estimation framework Motivating examples duce the basic LCA model Illustrated

More information

Strategies for dealing with Missing Data

Strategies for dealing with Missing Data Institut für Soziologie Eberhard Karls Universität Tübingen http://www.maartenbuis.nl What do we want from an analysis strategy? Simple example We have a theory that working for cash is mainly men s work

More information

F-tests for Incomplete Data in Multiple Regression Setup

F-tests for Incomplete Data in Multiple Regression Setup F-tests for Incomplete Data in Multiple Regression Setup ASHOK CHAURASIA Advisor: Dr. Ofer Harel University of Connecticut / 1 of 19 OUTLINE INTRODUCTION F-tests in Multiple Linear Regression Incomplete

More information

Multidimensional Control Totals for Poststratified Weights

Multidimensional Control Totals for Poststratified Weights Multidimensional Control Totals for Poststratified Weights Darryl V. Creel and Mansour Fahimi Joint Statistical Meetings Minneapolis, MN August 7-11, 2005 RTI International is a trade name of Research

More information

Selection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models

Selection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models Selection endogenous dummy ordered probit, and selection endogenous dummy dynamic ordered probit models Massimiliano Bratti & Alfonso Miranda In many fields of applied work researchers need to model an

More information

Estimating the long-term health impact of air pollution using spatial ecological studies. Duncan Lee

Estimating the long-term health impact of air pollution using spatial ecological studies. Duncan Lee Estimating the long-term health impact of air pollution using spatial ecological studies Duncan Lee EPSRC and RSS workshop 12th September 2014 Acknowledgements This is joint work with Alastair Rushworth

More information

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina

Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals. John W. Mac McDonald & Alessandro Rosina Mixture modelling of recurrent event times with long-term survivors: Analysis of Hutterite birth intervals John W. Mac McDonald & Alessandro Rosina Quantitative Methods in the Social Sciences Seminar -

More information

Dynamic sequential analysis of careers

Dynamic sequential analysis of careers Dynamic sequential analysis of careers Fulvia Pennoni Department of Statistics and Quantitative Methods University of Milano-Bicocca http://www.statistica.unimib.it/utenti/pennoni/ Email: fulvia.pennoni@unimib.it

More information

Analyzing Pilot Studies with Missing Observations

Analyzing Pilot Studies with Missing Observations Analyzing Pilot Studies with Missing Observations Monnie McGee mmcgee@smu.edu. Department of Statistical Science Southern Methodist University, Dallas, Texas Co-authored with N. Bergasa (SUNY Downstate

More information

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

A Note on Bayesian Inference After Multiple Imputation

A Note on Bayesian Inference After Multiple Imputation A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in

More information

The Bayesian Approach to Multi-equation Econometric Model Estimation

The Bayesian Approach to Multi-equation Econometric Model Estimation Journal of Statistical and Econometric Methods, vol.3, no.1, 2014, 85-96 ISSN: 2241-0384 (print), 2241-0376 (online) Scienpress Ltd, 2014 The Bayesian Approach to Multi-equation Econometric Model Estimation

More information

Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions Data

Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions Data Quality & Quantity 34: 323 330, 2000. 2000 Kluwer Academic Publishers. Printed in the Netherlands. 323 Note Discrete Response Multilevel Models for Repeated Measures: An Application to Voting Intentions

More information

Statistical Practice

Statistical Practice Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington Analysis of Longitudinal Data Patrick J Heagerty PhD Department of Biostatistics University of Washington Auckland 8 Session One Outline Examples of longitudinal data Scientific motivation Opportunities

More information

Modeling conditional dependence among multiple diagnostic tests

Modeling conditional dependence among multiple diagnostic tests Received: 11 June 2017 Revised: 1 August 2017 Accepted: 6 August 2017 DOI: 10.1002/sim.7449 RESEARCH ARTICLE Modeling conditional dependence among multiple diagnostic tests Zhuoyu Wang 1 Nandini Dendukuri

More information

Centering Predictor and Mediator Variables in Multilevel and Time-Series Models

Centering Predictor and Mediator Variables in Multilevel and Time-Series Models Centering Predictor and Mediator Variables in Multilevel and Time-Series Models Tihomir Asparouhov and Bengt Muthén Part 2 May 7, 2018 Tihomir Asparouhov and Bengt Muthén Part 2 Muthén & Muthén 1/ 42 Overview

More information

Tables and Figures. This draft, July 2, 2007

Tables and Figures. This draft, July 2, 2007 and Figures This draft, July 2, 2007 1 / 16 Figures 2 / 16 Figure 1: Density of Estimated Propensity Score Pr(D=1) % 50 40 Treated Group Untreated Group 30 f (P) 20 10 0.01~.10.11~.20.21~.30.31~.40.41~.50.51~.60.61~.70.71~.80.81~.90.91~.99

More information

Local Polynomial Wavelet Regression with Missing at Random

Local Polynomial Wavelet Regression with Missing at Random Applied Mathematical Sciences, Vol. 6, 2012, no. 57, 2805-2819 Local Polynomial Wavelet Regression with Missing at Random Alsaidi M. Altaher School of Mathematical Sciences Universiti Sains Malaysia 11800

More information

Time-Invariant Predictors in Longitudinal Models

Time-Invariant Predictors in Longitudinal Models Time-Invariant Predictors in Longitudinal Models Topics: What happens to missing predictors Effects of time-invariant predictors Fixed vs. systematically varying vs. random effects Model building strategies

More information

Propensity Score Adjustment for Unmeasured Confounding in Observational Studies

Propensity Score Adjustment for Unmeasured Confounding in Observational Studies Propensity Score Adjustment for Unmeasured Confounding in Observational Studies Lawrence C. McCandless Sylvia Richardson Nicky G. Best Department of Epidemiology and Public Health, Imperial College London,

More information

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models

On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Department of Mathematics Carl von Ossietzky University Oldenburg Sonja Greven Department of

More information

A weighted simulation-based estimator for incomplete longitudinal data models

A weighted simulation-based estimator for incomplete longitudinal data models To appear in Statistics and Probability Letters, 113 (2016), 16-22. doi 10.1016/j.spl.2016.02.004 A weighted simulation-based estimator for incomplete longitudinal data models Daniel H. Li 1 and Liqun

More information

Fractional Imputation in Survey Sampling: A Comparative Review

Fractional Imputation in Survey Sampling: A Comparative Review Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical

More information

A note on multiple imputation for general purpose estimation

A note on multiple imputation for general purpose estimation A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume

More information

Health utilities' affect you are reported alongside underestimates of uncertainty

Health utilities' affect you are reported alongside underestimates of uncertainty Dr. Kelvin Chan, Medical Oncologist, Associate Scientist, Odette Cancer Centre, Sunnybrook Health Sciences Centre and Dr. Eleanor Pullenayegum, Senior Scientist, Hospital for Sick Children Title: Underestimation

More information

A comparison of arm-based and contrast-based approaches to network meta-analysis (NMA)

A comparison of arm-based and contrast-based approaches to network meta-analysis (NMA) A comparison of arm-based and contrast-based approaches to network meta-analysis (NMA) Ian White Cochrane Statistical Methods Group Webinar 14 th June 2017 Background The choice between

More information

ECON Introductory Econometrics. Lecture 13: Internal and external validity

ECON Introductory Econometrics. Lecture 13: Internal and external validity ECON4150 - Introductory Econometrics Lecture 13: Internal and external validity Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 9 Lecture outline 2 Definitions of internal and external

More information

Planned Missingness Designs and the American Community Survey (ACS)

Planned Missingness Designs and the American Community Survey (ACS) Planned Missingness Designs and the American Community Survey (ACS) Steven G. Heeringa Institute for Social Research University of Michigan Presentation to the National Academies of Sciences Workshop on

More information

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks

ST440/540: Applied Bayesian Statistics. (9) Model selection and goodness-of-fit checks (9) Model selection and goodness-of-fit checks Objectives In this module we will study methods for model comparisons and checking for model adequacy For model comparisons there are a finite number of candidate

More information

Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk

Causal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk Causal Inference in Observational Studies with Non-Binary reatments Statistics Section, Imperial College London Joint work with Shandong Zhao and Kosuke Imai Cass Business School, October 2013 Outline

More information

Appendix to The Life-Cycle and the Business-Cycle of Wage Risk - Cross-Country Comparisons

Appendix to The Life-Cycle and the Business-Cycle of Wage Risk - Cross-Country Comparisons Appendix to The Life-Cycle and the Business-Cycle of Wage Risk - Cross-Country Comparisons Christian Bayer Falko Juessen Universität Bonn and IGIER Technische Universität Dortmund and IZA This version:

More information

INTRODUCTION TO MULTILEVEL MODELLING FOR REPEATED MEASURES DATA. Belfast 9 th June to 10 th June, 2011

INTRODUCTION TO MULTILEVEL MODELLING FOR REPEATED MEASURES DATA. Belfast 9 th June to 10 th June, 2011 INTRODUCTION TO MULTILEVEL MODELLING FOR REPEATED MEASURES DATA Belfast 9 th June to 10 th June, 2011 Dr James J Brown Southampton Statistical Sciences Research Institute (UoS) ADMIN Research Centre (IoE

More information

On the Choice of Parameterisation and Priors for the Bayesian Analyses of Mendelian Randomisation Studies.

On the Choice of Parameterisation and Priors for the Bayesian Analyses of Mendelian Randomisation Studies. On the Choice of Parameterisation and Priors for the Bayesian Analyses of Mendelian Randomisation Studies. E. M. Jones 1, J. R. Thompson 1, V. Didelez, and N. A. Sheehan 1 1 Department of Health Sciences,

More information

Latent Variable Model for Weight Gain Prevention Data with Informative Intermittent Missingness

Latent Variable Model for Weight Gain Prevention Data with Informative Intermittent Missingness Journal of Modern Applied Statistical Methods Volume 15 Issue 2 Article 36 11-1-2016 Latent Variable Model for Weight Gain Prevention Data with Informative Intermittent Missingness Li Qin Yale University,

More information

Polytomous Item Explanatory IRT Models with Random Item Effects: An Application to Carbon Cycle Assessment Data
