Inferences on missing information under multiple imputation and two-stage multiple imputation
|
|
- Leo Scott
- 5 years ago
- Views:
Transcription
1 p. 1/4 Inferences on missing information under multiple imputation and two-stage multiple imputation Ofer Harel Department of Statistics University of Connecticut Prepared for the Missing Data Approaches in the Health and Social Sciences: A Modern Survey meeting, October, 2010.
2 p. 2/4 Outline Background Methods Multiple imputation (MI) Two-stage MI Asymptotic results Number of imputations Discussion Current/future work
3 Background The missing data problem Most statistical analysis and estimation procedures were not designed to handle missing values. Even a small amount of missing values can cause great difficulty. The missing-data aspect is nuisance, not of primary interest. The Goal To make statistically valid inferences about population parameters from an incomplete data set. Not to estimate, predict, or recover the missing values themselves. Untestable assumptions are inevitable. Sensitivity analysis is helpful. p. 3/4
4 p. 4/4 Background Information The information concept was introduced by Fisher in the 30 s. The use of information, information matrices and other functions of the information concepts have become very common. For example The inverse of the covariance matrix. Information based tests such as AIC and BIC. The information concept became important for both frequentists and bayesians.
5 p. 5/4 Background Information I am interested in using the information concept in an incomplete-data situation. Rubin introduced the rate of the missing information, after partitioning the complete information into observed and missing parts. Both Rubin and Schafer argue for the importance of this measure. I am going to develop this concept and show its importance. I am going to touch on possible extensions. How do we define the rates of missing information?
6 p. 6/4 Why do we care? Can we use MI (or any other missing data procedure) if we have 1%, 5%, 10%, 25%, 50%, 75%, 90% missing values? what is the cut point?
7 p. 7/4 Why do we care? Can we use MI (or any other missing data procedure) if we have 1%, 5%, 10%, 25%, 50%, 75%, 90% missing values? what is the cut point? How confident are we in the results from incomplete data? When using MI, how do we determine the number of imputations? Can we evaluate the effect of a missing value or a group of missing values? Can we evaluate our missing data model, and/or its effect on our inference?
8 p. 8/4 Multiple imputation Multiple imputation (Rubin, 1987): a simulation-based approach to missing data. Imputation: Create m imputations of the missing data, Y (1) (m) mis,...,y mis, under a suitable model. Analysis: Analyze each of the m completed data sets in the same way. Combination: Combine the m sets of estimates and variances using Rubin s (1987) rules.
9 p. 9/4 Multiple imputation MI is separated into 3 stages: Imputation stage, Analysis stage, and Combining Results stage. Observed Data???? Im putation 1 2 m
10 p. 10/4 Multiple imputation Imputation: Under an ignorability assumption we draw m sets of imputations from P(Y mis Y obs ), while under non-ignorable models we draw the imputations from P(Y mis Y obs,m). Analysis: Given the complete data sets we can use any complete-data procedures to analyze the data.
11 p. 11/4 Multiple imputation Combining rules: Calculate and store the estimates ˆQ (j) and variancesu (j) for j = 1,...,m and combine: Q = m 1 m ˆQ (j) j=1 U = m 1 m j=1 U (j) m (ˆQ(j) Q) 2 B = (m 1) 1 j=1 T = U +(1+m 1 )B
12 p. 12/4 MI - Cont. Which leads to an approximate 95% interval for Q Q ±t ν T, where the degrees of freedom are ν = (m 1) [ ] (1+m 1 2 )B. T The rate of missing information is λ = U 1 (ν +1)(ν +3) 1 T 1 U 1. This is approximately λ = B U+B when ν is large.
13 p. 13/4 Rates of missing information The rate of missing information is the proportion of missing information to the complete information. The rate of missing information governs the convergence rates of the EM algorithm and the data augmentation. The lower the rate of missing information, the faster the convergence will be. The rate of missing information is important in assessing how missing information contributes to inferential uncertainty about the population quantity of interest.
14 p. 14/4 Rates of missing information The naive estimate of the rate of missing information is the rate of missing values. The naive estimate will be accurate in only very limited situations. The rate of missing information is a function not only of the amount of missing values but also of their importance. When using MI with a small number of imputations, the estimate of the rates of missing information can be very noisy.
15 p. 15/4 Two-types of missing values In many situations, missing values may be oftwo qualitatively different types. Examples include: planned missingness versus unplanned. unit non-response versus item non-response. refusal versus don t know. latent variables versus missing manifest items. deaths versus other types of dropout. dropouts versus subjects who return. dropout for reasons that might be closely related to outcomes, versus dropout for other reasons.
16 p. 16/4 Two-stage multiple imputation Extends Rubin s (1987) framework to two types of missing values. Imputation: Createmimputations ofymis A the first level of the missing data, Y A(1) A(m) mis,...,ymis. Then, for eachymis, A generate n imputations of Ymis B from a conditional predictive distribution given Y A mis. Analysis: Analyze each of the mn completed data sets in the same way. Combination: Combine the mn sets of estimates and variances by Shen s (2000) rules.
17 p. 17/4 Two-stage MI m A A A B B A B 1 n 1 n 1 n 1 n
18 Shen rules From Shen (2000, unpublished Ph.D. thesis) calculate and store: ˆQ (j,k) = estimate of Q U (j,k) = standard error 2 for j = 1,...,m and k = 1,...,n. Then: Q.. = 1 mn m n j=1 k=1 ˆQ (j,k) Qj. = 1 n n k=1 ˆQ (j,k) U = B = 1 mn W = 1 m m 1 m 1 n j=1 k=1 m j=1 U (j,k) m ) 2 (ˆQ (j.) Q.. j=1 1 n 1 n (ˆQ(j,k) Q ) 2 j. k=1 p. 18/4
19 p. 19/4 Shen rules - Cont. The total variance is T = U.. +(1+ 1 m )B +(1 1 n )W. An approximate 95% interval for Q is Q ±t ν T, where the degrees of freedom are ν 1 = 1 m(n 1) ( (1 1 n )W T ) (m 1) ( (1+ 1 m )B ) 2. T
20 p. 20/4 Shen rules - Cont. The overall estimated rate of missing information is ˆλ = B +(1 n 1 )W U.. +B +(1 n 1 )W. The estimated rate of missing information due to Y B mis if Y A mis was known is ˆλ B A = W U.. +W, and the difference ˆλ A = ˆλ ˆλ B A represents the decrease in the rate of missing information if Y A mis becomes known.
21 p. 21/4 Rates of missing information The overall estimated rate of missing information in two-stage MI is equivalent to the rate of missing information in conventional MI. The partition of the rates of missing information is based on the partition of the missing values. There are many ways to partition the missing data. Similar to conventional MI, a small number of imputations may lead to a noisy estimate of the rates of missing information.
22 p. 22/4 Asymptotic Results - Conventional MI The random imputations, Y (j) mis P(Y mis Y obs ), induces a probability distribution to B. Rubin argues that (m 1)B B χ 2 m 1, where B = V(ˆQ(Y obs,y mis ) Y obs ). Assuming that the above is reasonable, the Central Limit Theorem indicates that m/2(b B ) B N(0,1) asm, where E(B) = B and V(B) = 2B 2 /(m 1).
23 p. 23/4 Asymptotic Results - Conventional MI Because ˆλ = B/(B +U), we can use the delta method to derive m(ˆλ λ) 2B 2 U 2 (B+ U) 4 which may also be written as N(0,1), m(ˆλ λ) 2λ(1 λ) N(0,1). Therefore, an approximate 95% confidence interval for the population rate of missing information is ˆλ± 1.96ˆλ(1 ˆλ) m/2.
24 p. 24/4 Asymptotic Results - Conventional MI The rate of missing information is a proportion. In situations where the rate of missing information is close to zero or one, it may be useful to apply a normal approximation on the logit scale, Logit(p) = log[p/(1 p)]; this result in m(logit(ˆλ) Logit(λ)) N(0,2). the logit transformation is a first order variance-stabilizing transformation for ˆλ
25 p. 25/4 Asymptotic Results - two-stage MI In two-stage MI, the sources of the variance can be separated into a between and within nest. ˆQ jk Q.. = ( Q j. Q.. )+(ˆQ jk Q j. ). Source DF SS MS E(MS) Between m 1 n m j=1 ( Q j. Q.. ) 2 nb σw 2 +nσ2 B Within m(n 1) m n j=1 k=1 (ˆQ jk Q j. ) 2 W σw 2
26 p. 26/4 Asymptotic Results - two-stage MI From the ANOVA literature it is known that the mean square errors are independent chi-squares random variables such that, ( σ 2 W +nσb 2 m 1 ) 1 nb χ 2 m 1 ( σ 2 W m(n 1) ) 1 W χ 2 m(n 1).
27 p. 27/4 Asymptotic Results - two-stage MI A first-order approximation to the joint distribution ofb and W is m V B W E B W N (0,I), 0 V 2 where V 1 = 2(σ 2 W +nσ2 B )2 /n 2 and V 2 = 2(σ 2 W )2 /(n 1). Using a multivariate delta method expansion, similar to the procedure we have done for conventional MI we get
28 p. 28/4 Asymptotic Results - two-stage MI mσ 1 1 λ λ B A ˆλ ˆ λ B A N (0,I), where Σ 1 = V 1 U 2 +V 2 U 2 (1 n 1 ) V 2 U 2 (1 n 1 ) (U+B+(1 n 1 )W) 4 (U+B+(1 n 1 )W) 2 (U+W) 2 V 2 U 2 (1 n 1 ) V 2 U 2 (U+B+(1 n 1 )W) 2 (U+W) 2 (U+W) 4.
29 Asymptotic Results - two-stage MI Rewriting the covariance matrixσ1 as a function of λ andλ B A gives Σ 1 = [ λ 2(1 λ)4 1 λ λb A n 1 1 λ B A n 2(1 λ) 2 (λ B A ) 2 n ] 2 2(1 λ) 2 (λ B A ) 2 n 2(1 λ B A ) 2 (λ B A ) 2 (n 1). When the rates of missing information are close to zero or one, we can calculate an approximation on a logit scale mσ 1 Logit(λ) 2 Logit(ˆλ) Logit(λ B A ) Logit( λ B A ˆ ) N (0,I), where Σ 2 = 2(1 λ) 2 λ 2 [ λ 1 λ λb A 2(1 λ)λ B A λ(1 λ B A )n ] 2 n 1 2(1 λ)λ B A 1 λ B A n λ(1 λ B A )n 2 (n 1). p. 29/4
30 p. 30/4 Asymptotic Results - simulations Simulation studies were designed to evaluate the theoretical distributions. Simulation studies were performed for both conventional MI and two-stage MI. Different levels of rates of missing information and number of imputations were simulated for conventional MI and the estimates, variances and coverage were tested. For two-stage MI, the rates of missing information and the number of imputations in each level were simulated, and again the estimates variances (covariances) and coverage were tested. The simulated and theoretical results were very close.
31 p. 31/4 Simulations results Comparison of asymptotic and simulated behavior of the estimated rate of missing information. Plot of SE(ˆλ) 100
32 p. 32/4 Simulations results Comparison of asymptotic and simulated behavior of the estimated rate of missing information. Plot of SE(ˆλ) 100
33 Number of imputations Rubin (1987) demonstrated that a limited number of imputations (5-20) is enough. This is true since the relative efficiency on the variance scale of a point estimate based on m imputations, to one based on an infinite number of imputations, is(1+λ/m) 1. m λ p. 33/4
34 p. 34/4 Number of imputations - Cont. From the standpoint of point-estimation alone, there is little to gain from using more than a few imputations unless the rates of missing information are unusually large. When we are interested in the rates of missing information, estimates of these rates can be very unstable (Schafer, 1997). I will determine the number of imputations necessary for obtaining reliable estimates for the missing information rate.
35 p. 35/4 Number of imputations - Cont. Table 1: Recommendations for choosing m in Conventional MI to estimateλwith given accuracy ˆλ SE(ˆλ) = SE(ˆλ) = SE(ˆλ) =
36 p. 36/4 Number of imputations - Cont. For two-stage MI, one can invoke similar arguments for the relative efficiency (1+λ /m) 1, where λ = λ λb A (1 λ)(1 n 1 ). (1 λ B A ) Because0 λ B A λ, the relative efficiency lies between (1+λ/m) 1 and (1+λ/(mn)) 1.
37 Number of imputations - Cont. Recommendations for choosing m and n in two-stage MI where SE(ˆλ B A ) SE(ˆλ) 0.03 ˆλ B A n ˆλ p. 37/4
38 Number of imputations - Cont. Recommendations for choosing m and n in two-stage MI where SE(ˆλ B A ) SE(ˆλ) 0.02 ˆλ B A n ˆλ p. 38/4
39 Number of imputations - Cont. Recommendations for choosing m and n in two-stage MI where SE(ˆλ B A ) SE(ˆλ) 0.01 ˆλ B A n ˆλ p. 39/4
40 p. 40/4 Discussion When the researcher s main interest is point estimates (and their variances), it is sufficient to use a modest number of imputations. When the rates of missing information are a concern, more imputations are recommended. Simulations have been made to test the accuracy of the asymptotic distribution. In most cases, producing more imputations is not a problem. The rates of missing information are an important model diagnostic tool.
41 p. 41/4 Extensions There are numerous measures that assess the effect of an observation, group of observations, a variable, or variables and observations on the regression estimation (Outlier, influence point etc.). Can we assess the effect of a missing observation, a group of missing observations, an incomplete variable or any combination of these, on the overall estimation? We want this measure to be applicable under any missing data mechanism. And applicable to any data analysis problem.
42 p. 42/4 Extensions I call this measure "Outfluence". The Outfluence is a relative measure comparing the effect of a missing value relative to the effect of the rest of the missing values. For example, consider a survey study in which we are interested in the effect of missing income on the estimation. The outfluence will measure the effect of the missing income compared to all other missing values. We already defined the measure (Harel, 2008) and introduce its limiting distribution (Harel & Stratton, 2009). More work needs to be devoted to the development of methods that will evaluate incomplete-data methodology.
43 p. 43/4 Outfluence Consider the next setup:
44 p. 44/4 Marijuana Example A pilot study presented by Weil et al (1968) and analyzed by Schafer (1997). Three treatments (low dose, high dose and placebo) were given to each subject. The change in heart beat was recorded. We will look at the mean of placebo in 90 minute since Schafer (1997) showed missing data have a large effect on this estimate.
45 p. 45/4 Example Data 15 minutes 90 minutes Subject Placebo Low High Placebo Low High NA NA NA NA NA Means Source: Weil et al (1967)
46 p. 46/4 Example Results Missing values ˆ (4,+) (4,4) (5,4) (+,4) (9,+) B A ˆ ˆ
47 p. 47/4 Extensions and Future work Model diagnosis is an importance concept in statistics The rates of missing information can be used as a diagnostic tool (e.g. Harel & Miglioretti 2007). Many measures were developed to compare the performance of competing models in complete data. To date there is almost no research about the performance of methodology with incomplete data. How can we use measures such as AIC,BIC, DIC to evaluate our missing data modeling?
48 p. 48/4 References Harel, O. (2007) "Inferences on missing information under multiple imputation and two-stage multiple imputation" Statistical Methodology, 4, Harel, O., Miglioretti, D. (2007) "Latent class analysis and the rate of missing information" Journal of Data Science, 5, Harel, O. (2008) "Outfluence - The impact of missing values"model Assisted Statistics and Applications, 3(2), Harel, O., Stratton, J. (2009) "Inferences on the Outfluence How do missing values impact your analysis?" Communications in Statistics-Theory & Methods, In press. Rubin, D.B. (1987)Multiple Imputation for Nonresponse in Surveys. J. Wiley & Sons, New York. Schafer, J.L. (1997)Analysis of Incomplete Multivariate Data. Chapman & Hall, London. Shen, Z. (2000). Nested Multiple Imputation. Ph.D. thesis, Harvard University, Department of Statistics. Weil, A., Zinberg, N., Nelson, J. (1968) Clinical and psychological effects of marihuana in man. Science, 162,
Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23
1 / 23 Statistical Methods Missing Data http://www.stats.ox.ac.uk/ snijders/sm.htm Tom A.B. Snijders University of Oxford November, 2011 2 / 23 Literature: Joseph L. Schafer and John W. Graham, Missing
More informationF-tests for Incomplete Data in Multiple Regression Setup
F-tests for Incomplete Data in Multiple Regression Setup ASHOK CHAURASIA Advisor: Dr. Ofer Harel University of Connecticut / 1 of 19 OUTLINE INTRODUCTION F-tests in Multiple Linear Regression Incomplete
More informationHandling Data with Three Types of Missing Values
University of Connecticut DigitalCommons@UConn Doctoral Dissertations University of Connecticut Graduate School 7-30-2013 Handling Data with Three Types of Missing Values Jennifer Boyko University of Connecticut,
More informationPooling multiple imputations when the sample happens to be the population.
Pooling multiple imputations when the sample happens to be the population. Gerko Vink 1,2, and Stef van Buuren 1,3 arxiv:1409.8542v1 [math.st] 30 Sep 2014 1 Department of Methodology and Statistics, Utrecht
More informationBasics of Modern Missing Data Analysis
Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing
More informationAnalyzing Pilot Studies with Missing Observations
Analyzing Pilot Studies with Missing Observations Monnie McGee mmcgee@smu.edu. Department of Statistical Science Southern Methodist University, Dallas, Texas Co-authored with N. Bergasa (SUNY Downstate
More informationAlexina Mason. Department of Epidemiology and Biostatistics Imperial College, London. 16 February 2010
Strategy for modelling non-random missing data mechanisms in longitudinal studies using Bayesian methods: application to income data from the Millennium Cohort Study Alexina Mason Department of Epidemiology
More informationMultiple Imputation for Missing Data in Repeated Measurements Using MCMC and Copulas
Multiple Imputation for Missing Data in epeated Measurements Using MCMC and Copulas Lily Ingsrisawang and Duangporn Potawee Abstract This paper presents two imputation methods: Marov Chain Monte Carlo
More informationFractional Imputation in Survey Sampling: A Comparative Review
Fractional Imputation in Survey Sampling: A Comparative Review Shu Yang Jae-Kwang Kim Iowa State University Joint Statistical Meetings, August 2015 Outline Introduction Fractional imputation Features Numerical
More informationStatistical Practice
Statistical Practice A Note on Bayesian Inference After Multiple Imputation Xiang ZHOU and Jerome P. REITER This article is aimed at practitioners who plan to use Bayesian inference on multiply-imputed
More informationA Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness
A Bayesian Nonparametric Approach to Monotone Missing Data in Longitudinal Studies with Informative Missingness A. Linero and M. Daniels UF, UT-Austin SRC 2014, Galveston, TX 1 Background 2 Working model
More informationBayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units
Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional
More informationThe lmm Package. May 9, Description Some improved procedures for linear mixed models
The lmm Package May 9, 2005 Version 0.3-4 Date 2005-5-9 Title Linear mixed models Author Original by Joseph L. Schafer . Maintainer Jing hua Zhao Description Some improved
More informationA Note on Bayesian Inference After Multiple Imputation
A Note on Bayesian Inference After Multiple Imputation Xiang Zhou and Jerome P. Reiter Abstract This article is aimed at practitioners who plan to use Bayesian inference on multiplyimputed datasets in
More informationPackage lmm. R topics documented: March 19, Version 0.4. Date Title Linear mixed models. Author Joseph L. Schafer
Package lmm March 19, 2012 Version 0.4 Date 2012-3-19 Title Linear mixed models Author Joseph L. Schafer Maintainer Jing hua Zhao Depends R (>= 2.0.0) Description Some
More informationToutenburg, Fieger: Using diagnostic measures to detect non-mcar processes in linear regression models with missing covariates
Toutenburg, Fieger: Using diagnostic measures to detect non-mcar processes in linear regression models with missing covariates Sonderforschungsbereich 386, Paper 24 (2) Online unter: http://epub.ub.uni-muenchen.de/
More informationStatistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach
Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score
More informationBayesian inference for factor scores
Bayesian inference for factor scores Murray Aitkin and Irit Aitkin School of Mathematics and Statistics University of Newcastle UK October, 3 Abstract Bayesian inference for the parameters of the factor
More informationANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW
SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved
More informationPlausible Values for Latent Variables Using Mplus
Plausible Values for Latent Variables Using Mplus Tihomir Asparouhov and Bengt Muthén August 21, 2010 1 1 Introduction Plausible values are imputed values for latent variables. All latent variables can
More informationPIRLS 2016 Achievement Scaling Methodology 1
CHAPTER 11 PIRLS 2016 Achievement Scaling Methodology 1 The PIRLS approach to scaling the achievement data, based on item response theory (IRT) scaling with marginal estimation, was developed originally
More informationTime-Invariant Predictors in Longitudinal Models
Time-Invariant Predictors in Longitudinal Models Today s Class (or 3): Summary of steps in building unconditional models for time What happens to missing predictors Effects of time-invariant predictors
More informationA comparison of fully Bayesian and two-stage imputation strategies for missing covariate data
A comparison of fully Bayesian and two-stage imputation strategies for missing covariate data Alexina Mason, Sylvia Richardson and Nicky Best Department of Epidemiology and Biostatistics, Imperial College
More informationSome methods for handling missing values in outcome variables. Roderick J. Little
Some methods for handling missing values in outcome variables Roderick J. Little Missing data principles Likelihood methods Outline ML, Bayes, Multiple Imputation (MI) Robust MAR methods Predictive mean
More informationA weighted simulation-based estimator for incomplete longitudinal data models
To appear in Statistics and Probability Letters, 113 (2016), 16-22. doi 10.1016/j.spl.2016.02.004 A weighted simulation-based estimator for incomplete longitudinal data models Daniel H. Li 1 and Liqun
More informationMixed-Effects Pattern-Mixture Models for Incomplete Longitudinal Data. Don Hedeker University of Illinois at Chicago
Mixed-Effects Pattern-Mixture Models for Incomplete Longitudinal Data Don Hedeker University of Illinois at Chicago This work was supported by National Institute of Mental Health Contract N44MH32056. 1
More informationPrerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3
University of California, Irvine 2017-2018 1 Statistics (STATS) Courses STATS 5. Seminar in Data Science. 1 Unit. An introduction to the field of Data Science; intended for entering freshman and transfers.
More informationDetermining Sufficient Number of Imputations Using Variance of Imputation Variances: Data from 2012 NAMCS Physician Workflow Mail Survey *
Applied Mathematics, 2014,, 3421-3430 Published Online December 2014 in SciRes. http://www.scirp.org/journal/am http://dx.doi.org/10.4236/am.2014.21319 Determining Sufficient Number of Imputations Using
More informationFractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling
Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction
More informationReconstruction of individual patient data for meta analysis via Bayesian approach
Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi
More informationStreamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level
Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level A Monte Carlo Simulation to Test the Tenability of the SuperMatrix Approach Kyle M Lang Quantitative Psychology
More informationBayesian methods for missing data: part 1. Key Concepts. Nicky Best and Alexina Mason. Imperial College London
Bayesian methods for missing data: part 1 Key Concepts Nicky Best and Alexina Mason Imperial College London BAYES 2013, May 21-23, Erasmus University Rotterdam Missing Data: Part 1 BAYES2013 1 / 68 Outline
More informationEM Algorithm II. September 11, 2018
EM Algorithm II September 11, 2018 Review EM 1/27 (Y obs, Y mis ) f (y obs, y mis θ), we observe Y obs but not Y mis Complete-data log likelihood: l C (θ Y obs, Y mis ) = log { f (Y obs, Y mis θ) Observed-data
More informationMISSING or INCOMPLETE DATA
MISSING or INCOMPLETE DATA A (fairly) complete review of basic practice Don McLeish and Cyntha Struthers University of Waterloo Dec 5, 2015 Structure of the Workshop Session 1 Common methods for dealing
More informationA note on multiple imputation for general purpose estimation
A note on multiple imputation for general purpose estimation Shu Yang Jae Kwang Kim SSC meeting June 16, 2015 Shu Yang, Jae Kwang Kim Multiple Imputation June 16, 2015 1 / 32 Introduction Basic Setup Assume
More informationTime-Invariant Predictors in Longitudinal Models
Time-Invariant Predictors in Longitudinal Models Topics: What happens to missing predictors Effects of time-invariant predictors Fixed vs. systematically varying vs. random effects Model building strategies
More informationComparison of multiple imputation methods for systematically and sporadically missing multilevel data
Comparison of multiple imputation methods for systematically and sporadically missing multilevel data V. Audigier, I. White, S. Jolani, T. Debray, M. Quartagno, J. Carpenter, S. van Buuren, M. Resche-Rigon
More informationDEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS
DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS Donald B. Rubin Harvard University 1 Oxford Street, 7th Floor Cambridge, MA 02138 USA Tel: 617-495-5496; Fax: 617-496-8057 email: rubin@stat.harvard.edu
More informationAn Approximate Test for Homogeneity of Correlated Correlation Coefficients
Quality & Quantity 37: 99 110, 2003. 2003 Kluwer Academic Publishers. Printed in the Netherlands. 99 Research Note An Approximate Test for Homogeneity of Correlated Correlation Coefficients TRIVELLORE
More informationA Flexible Bayesian Approach to Monotone Missing. Data in Longitudinal Studies with Nonignorable. Missingness with Application to an Acute
A Flexible Bayesian Approach to Monotone Missing Data in Longitudinal Studies with Nonignorable Missingness with Application to an Acute Schizophrenia Clinical Trial Antonio R. Linero, Michael J. Daniels
More informationConfirmatory Factor Analysis: Model comparison, respecification, and more. Psychology 588: Covariance structure and factor models
Confirmatory Factor Analysis: Model comparison, respecification, and more Psychology 588: Covariance structure and factor models Model comparison 2 Essentially all goodness of fit indices are descriptive,
More informationInference with Imputed Conditional Means
Inference with Imputed Conditional Means Joseph L. Schafer and Nathaniel Schenker June 4, 1997 Abstract In this paper, we develop analytic techniques that can be used to produce appropriate inferences
More information6 Pattern Mixture Models
6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data
More informationStatistical Inference with Monotone Incomplete Multivariate Normal Data
Statistical Inference with Monotone Incomplete Multivariate Normal Data p. 1/4 Statistical Inference with Monotone Incomplete Multivariate Normal Data This talk is based on joint work with my wonderful
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationApproximate analysis of covariance in trials in rare diseases, in particular rare cancers
Approximate analysis of covariance in trials in rare diseases, in particular rare cancers Stephen Senn (c) Stephen Senn 1 Acknowledgements This work is partly supported by the European Union s 7th Framework
More informationMiscellanea A note on multiple imputation under complex sampling
Biometrika (2017), 104, 1,pp. 221 228 doi: 10.1093/biomet/asw058 Printed in Great Britain Advance Access publication 3 January 2017 Miscellanea A note on multiple imputation under complex sampling BY J.
More informationLikelihood-based inference with missing data under missing-at-random
Likelihood-based inference with missing data under missing-at-random Jae-kwang Kim Joint work with Shu Yang Department of Statistics, Iowa State University May 4, 014 Outline 1. Introduction. Parametric
More informationInferences on a Normal Covariance Matrix and Generalized Variance with Monotone Missing Data
Journal of Multivariate Analysis 78, 6282 (2001) doi:10.1006jmva.2000.1939, available online at http:www.idealibrary.com on Inferences on a Normal Covariance Matrix and Generalized Variance with Monotone
More informationLecture 14: Introduction to Poisson Regression
Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why
More informationModelling counts. Lecture 14: Introduction to Poisson Regression. Overview
Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week
More informationDon t be Fancy. Impute Your Dependent Variables!
Don t be Fancy. Impute Your Dependent Variables! Kyle M. Lang, Todd D. Little Institute for Measurement, Methodology, Analysis & Policy Texas Tech University Lubbock, TX May 24, 2016 Presented at the 6th
More informationConfidence Distribution
Confidence Distribution Xie and Singh (2013): Confidence distribution, the frequentist distribution estimator of a parameter: A Review Céline Cunen, 15/09/2014 Outline of Article Introduction The concept
More informationDiscussing Effects of Different MAR-Settings
Discussing Effects of Different MAR-Settings Research Seminar, Department of Statistics, LMU Munich Munich, 11.07.2014 Matthias Speidel Jörg Drechsler Joseph Sakshaug Outline What we basically want to
More informationStatistical Methods for Handling Missing Data
Statistical Methods for Handling Missing Data Jae-Kwang Kim Department of Statistics, Iowa State University July 5th, 2014 Outline Textbook : Statistical Methods for handling incomplete data by Kim and
More informationMeasurement Error and Linear Regression of Astronomical Data. Brandon Kelly Penn State Summer School in Astrostatistics, June 2007
Measurement Error and Linear Regression of Astronomical Data Brandon Kelly Penn State Summer School in Astrostatistics, June 2007 Classical Regression Model Collect n data points, denote i th pair as (η
More informationStructural Equation Modeling and Confirmatory Factor Analysis. Types of Variables
/4/04 Structural Equation Modeling and Confirmatory Factor Analysis Advanced Statistics for Researchers Session 3 Dr. Chris Rakes Website: http://csrakes.yolasite.com Email: Rakes@umbc.edu Twitter: @RakesChris
More information" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2
Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the
More informationMultiple Imputation Methods for Treatment Noncompliance and Nonresponse in Randomized Clinical Trials
UW Biostatistics Working Paper Series 2-19-2009 Multiple Imputation Methods for Treatment Noncompliance and Nonresponse in Randomized Clinical Trials Leslie Taylor UW, taylorl@u.washington.edu Xiao-Hua
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationThe Wishart distribution Scaled Wishart. Wishart Priors. Patrick Breheny. March 28. Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/11
Wishart Priors Patrick Breheny March 28 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/11 Introduction When more than two coefficients vary, it becomes difficult to directly model each element
More informationMISSING or INCOMPLETE DATA
MISSING or INCOMPLETE DATA A (fairly) complete review of basic practice Don McLeish and Cyntha Struthers University of Waterloo Dec 5, 2015 Structure of the Workshop Session 1 Common methods for dealing
More informationBayesian inference for sample surveys. Roderick Little Module 2: Bayesian models for simple random samples
Bayesian inference for sample surveys Roderick Little Module : Bayesian models for simple random samples Superpopulation Modeling: Estimating parameters Various principles: least squares, method of moments,
More informationMulti-group analyses for measurement invariance parameter estimates and model fit (ML)
LBP-TBQ: Supplementary digital content 8 Multi-group analyses for measurement invariance parameter estimates and model fit (ML) Medication data Multi-group CFA analyses were performed with the 16-item
More informationThe propensity score with continuous treatments
7 The propensity score with continuous treatments Keisuke Hirano and Guido W. Imbens 1 7.1 Introduction Much of the work on propensity score analysis has focused on the case in which the treatment is binary.
More informationShu Yang and Jae Kwang Kim. Harvard University and Iowa State University
Statistica Sinica 27 (2017), 000-000 doi:https://doi.org/10.5705/ss.202016.0155 DISCUSSION: DISSECTING MULTIPLE IMPUTATION FROM A MULTI-PHASE INFERENCE PERSPECTIVE: WHAT HAPPENS WHEN GOD S, IMPUTER S AND
More informationPACKAGE LMest FOR LATENT MARKOV ANALYSIS
PACKAGE LMest FOR LATENT MARKOV ANALYSIS OF LONGITUDINAL CATEGORICAL DATA Francesco Bartolucci 1, Silvia Pandofi 1, and Fulvia Pennoni 2 1 Department of Economics, University of Perugia (e-mail: francesco.bartolucci@unipg.it,
More informationSTATISTICAL ANALYSIS WITH MISSING DATA
STATISTICAL ANALYSIS WITH MISSING DATA SECOND EDITION Roderick J.A. Little & Donald B. Rubin WILEY SERIES IN PROBABILITY AND STATISTICS Statistical Analysis with Missing Data Second Edition WILEY SERIES
More informationREVISED PAGE PROOFS. Logistic Regression. Basic Ideas. Fundamental Data Analysis. bsa350
bsa347 Logistic Regression Logistic regression is a method for predicting the outcomes of either-or trials. Either-or trials occur frequently in research. A person responds appropriately to a drug or does
More informationLISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014
LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers
More informationIntroduction An approximated EM algorithm Simulation studies Discussion
1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse
More informationAcknowledgements. Outline. Marie Diener-West. ICTR Leadership / Team INTRODUCTION TO CLINICAL RESEARCH. Introduction to Linear Regression
INTRODUCTION TO CLINICAL RESEARCH Introduction to Linear Regression Karen Bandeen-Roche, Ph.D. July 17, 2012 Acknowledgements Marie Diener-West Rick Thompson ICTR Leadership / Team JHU Intro to Clinical
More informationImputation Algorithm Using Copulas
Metodološki zvezki, Vol. 3, No. 1, 2006, 109-120 Imputation Algorithm Using Copulas Ene Käärik 1 Abstract In this paper the author demonstrates how the copulas approach can be used to find algorithms for
More informationStatistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes
Statistical Analysis of Randomized Experiments with Nonignorable Missing Binary Outcomes Kosuke Imai Department of Politics Princeton University July 31 2007 Kosuke Imai (Princeton University) Nonignorable
More informationHealth utilities' affect you are reported alongside underestimates of uncertainty
Dr. Kelvin Chan, Medical Oncologist, Associate Scientist, Odette Cancer Centre, Sunnybrook Health Sciences Centre and Dr. Eleanor Pullenayegum, Senior Scientist, Hospital for Sick Children Title: Underestimation
More informationState Estimation of Linear and Nonlinear Dynamic Systems
State Estimation of Linear and Nonlinear Dynamic Systems Part I: Linear Systems with Gaussian Noise James B. Rawlings and Fernando V. Lima Department of Chemical and Biological Engineering University of
More informationRegression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.
TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted
More informationPlanned Missingness Designs and the American Community Survey (ACS)
Planned Missingness Designs and the American Community Survey (ACS) Steven G. Heeringa Institute for Social Research University of Michigan Presentation to the National Academies of Sciences Workshop on
More informationSTA 216, GLM, Lecture 16. October 29, 2007
STA 216, GLM, Lecture 16 October 29, 2007 Efficient Posterior Computation in Factor Models Underlying Normal Models Generalized Latent Trait Models Formulation Genetic Epidemiology Illustration Structural
More information36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression
36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form
More informationChapter 4. Parametric Approach. 4.1 Introduction
Chapter 4 Parametric Approach 4.1 Introduction The missing data problem is already a classical problem that has not been yet solved satisfactorily. This problem includes those situations where the dependent
More informationarxiv: v5 [stat.me] 13 Feb 2018
arxiv: arxiv:1602.07933 BOOTSTRAP INFERENCE WHEN USING MULTIPLE IMPUTATION By Michael Schomaker and Christian Heumann University of Cape Town and Ludwig-Maximilians Universität München arxiv:1602.07933v5
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More informationNotes on the Multivariate Normal and Related Topics
Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions
More informationLongitudinal analysis of ordinal data
Longitudinal analysis of ordinal data A report on the external research project with ULg Anne-Françoise Donneau, Murielle Mauer June 30 th 2009 Generalized Estimating Equations (Liang and Zeger, 1986)
More informationNew Developments in Nonresponse Adjustment Methods
New Developments in Nonresponse Adjustment Methods Fannie Cobben January 23, 2009 1 Introduction In this paper, we describe two relatively new techniques to adjust for (unit) nonresponse bias: The sample
More informationMixture Modeling. Identifying the Correct Number of Classes in a Growth Mixture Model. Davood Tofighi Craig Enders Arizona State University
Identifying the Correct Number of Classes in a Growth Mixture Model Davood Tofighi Craig Enders Arizona State University Mixture Modeling Heterogeneity exists such that the data are comprised of two or
More informationA class of latent marginal models for capture-recapture data with continuous covariates
A class of latent marginal models for capture-recapture data with continuous covariates F Bartolucci A Forcina Università di Urbino Università di Perugia FrancescoBartolucci@uniurbit forcina@statunipgit
More informationFlexible multiple imputation by chained equations of the AVO-95 Survey
PG/VGZ/99.045 Flexible multiple imputation by chained equations of the AVO-95 Survey TNO Prevention and Health Public Health Wassenaarseweg 56 P.O.Box 2215 2301 CE Leiden The Netherlands Tel + 31 71 518
More informationLabel Switching and Its Simple Solutions for Frequentist Mixture Models
Label Switching and Its Simple Solutions for Frequentist Mixture Models Weixin Yao Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. wxyao@ksu.edu Abstract The label switching
More informationOutline of GLMs. Definitions
Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density
More informationBiostat 2065 Analysis of Incomplete Data
Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies
More information36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)
36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)
More informationConfidence Intervals, Testing and ANOVA Summary
Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0
More informationWinLTA USER S GUIDE for Data Augmentation
USER S GUIDE for Version 1.0 (for WinLTA Version 3.0) Linda M. Collins Stephanie T. Lanza Joseph L. Schafer The Methodology Center The Pennsylvania State University May 2002 Dev elopment of this program
More informationTWO-WAY CONTINGENCY TABLES WITH MARGINALLY AND CONDITIONALLY IMPUTED NONRESPONDENTS
TWO-WAY CONTINGENCY TABLES WITH MARGINALLY AND CONDITIONALLY IMPUTED NONRESPONDENTS By Hansheng Wang A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
More informationLecture 9: Linear Regression
Lecture 9: Linear Regression Goals Develop basic concepts of linear regression from a probabilistic framework Estimating parameters and hypothesis testing with linear models Linear regression in R Regression
More informationControlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded
Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded 1 Background Latent confounder is common in social and behavioral science in which most of cases the selection mechanism
More informationUnbiased estimation of exposure odds ratios in complete records logistic regression
Unbiased estimation of exposure odds ratios in complete records logistic regression Jonathan Bartlett London School of Hygiene and Tropical Medicine www.missingdata.org.uk Centre for Statistical Methodology
More informationStatistical Models with Uncertain Error Parameters (G. Cowan, arxiv: )
Statistical Models with Uncertain Error Parameters (G. Cowan, arxiv:1809.05778) Workshop on Advanced Statistics for Physics Discovery aspd.stat.unipd.it Department of Statistical Sciences, University of
More information