1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values
Gong Tang, Department of Biostatistics, GSPH, University of Pittsburgh
NISS Workshop on Nonignorable Nonresponse, November 12-13, 2015

2 / 33
1 Introduction
  Background & Current Approaches
  Expectation-Maximization (EM) Algorithm for Regression Analysis of Data with Nonresponse
2 An approximated EM algorithm
  The algorithm
  Application in contingency table analysis
  Application in regression analyses with a continuous outcome
3 Simulation studies
4 Discussion

3 / 33 Regression Analysis of Data with Nonresponse
Consider bivariate data {(x_i, y_i), i = 1, 2, ..., n} where the x_i's are fully observed and the y_i's are observed only for i = 1, ..., m. Denote by R_i the missing-data indicator: R_i = 1 if y_i is observed and R_i = 0 otherwise. Assume that
[y_i | x_i] ~ g(y_i | x_i; θ) ∝ exp{θ S(x_i, y_i) + a(θ)},
pr[R_i = 1 | x_i, y_i] = w(x_i, y_i; ψ),
and the parameter of interest is θ.
Goal: to avoid modeling w(x_i, y_i; ψ).

4 / 33 Likelihood-based Inference (I)
The observed data are D_obs = {(x_i, R_i, R_i y_i); i = 1, ..., n}. When w(x_i, y_i; ψ) is parametrically modeled, the likelihood function is
L(θ, ψ; D_obs) = ∏_{i=1}^n p(R_i, R_i y_i | x_i; θ, ψ)
= ∏_{i=1}^m g(y_i | x_i; θ) w(x_i, y_i; ψ) × ∏_{i=m+1}^n ∫ g(y_i | x_i; θ) {1 − w(x_i, y_i; ψ)} dy_i.
When w(x_i, y_i; ψ) = w(x_i; ψ), the data are called missing at random (MAR) and L(θ, ψ; D_obs) = L(θ; D_obs) L(ψ; D_obs) (Rubin, 1976).

5 / 33 Likelihood-based Inference (II)
When data are MAR and, in addition, θ and ψ are distinct, modeling w(x, y; ψ) is not necessary for likelihood-based inference on θ.
When data are not MAR, the missingness has to be modeled and inference on θ and ψ is made jointly. Misspecification of the missing-data model w(x, y; ψ) often leads to biased estimates of θ.

6 / 33 A conditional likelihood
Assume that
[y_i | x_i] ~ g(y_i | x_i; θ)  (a parametric regression),
pr[R_i = 1 | x_i, y_i] = w(x_i, y_i; ψ) = w(y_i; ψ).
Then [X | Y, R = 1] = [X | Y, R = 0] = [X | Y]. Consider the following conditional likelihood:
CL(θ) = ∏_{R_i = 1} p(x_i | y_i; θ, F_X)  (F_X is the CDF of X)
= ∏_{R_i = 1} g(y_i | x_i; θ) p(x_i) / p(y_i; θ, F_X)  (Bayes formula)
∝ ∏_{R_i = 1} g(y_i | x_i; θ) / ∫ g(y_i | x; θ) dF_X(x).
Requires knowing F_X(·)!

7 / 33 A pseudolikelihood method (Tang, Little & Raghunathan, 2003)
Alternatively, we may either model X ~ f(x; α), obtain α̂ = argmax_α ∏_{i=1}^n f(x_i; α), and then consider the pseudolikelihood function
PL_1(θ) = ∏_{R_i = 1} g(y_i | x_i; θ) / ∫ g(y_i | x; θ) dF_X(x; α̂),
or substitute F_X(·) by the empirical distribution F_n(·):
PL_2(θ) = ∏_{R_i = 1} p(y_i | x_i; θ) / ∫ p(y_i | x; θ) dF_n(x) = ∏_{R_i = 1} p(y_i | x_i; θ) / { n^{-1} ∑_{j=1}^n p(y_i | x_j; θ) }.
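A minimal sketch of PL_2 for a normal linear regression y | x ~ N(β_0 + β_1 x, σ²); the function name, the optimizer, and the simulated data below are illustrative assumptions, not part of the talk.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_PL2(theta, x, y, r):
    # Negative log of PL_2(theta) for y | x ~ N(b0 + b1*x, s^2); theta = (b0, b1, log_s).
    # Only respondents (r == 1) contribute; the denominator averages the density of
    # y_i over the empirical distribution of all x_j (PL_2 of Tang, Little & Raghunathan, 2003).
    b0, b1, log_s = theta
    s = np.exp(log_s)
    resp = r == 1
    log_num = norm.logpdf(y[resp], loc=b0 + b1 * x[resp], scale=s)
    dens = norm.pdf(y[resp][:, None], loc=b0 + b1 * x[None, :], scale=s)
    log_den = np.log(dens.mean(axis=1))
    return -(log_num - log_den).sum()

# usage sketch on hypothetical data with outcome-dependent nonresponse w(y; psi)
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 1 + x + rng.normal(size=500)
r = (rng.uniform(size=500) < norm.cdf(0.5 + 0.5 * y)).astype(int)
fit = minimize(neg_log_PL2, x0=np.array([0.0, 0.0, 0.0]), args=(x, y, r))
print(fit.x[:2], np.exp(fit.x[2]))   # (b0, b1) and sigma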

8 / 33 Exponential tilting (Kim & Yu, 2011)
Consider the following semiparametric logistic regression model:
pr[R_i = 1 | x_i, y_i] = logit^{-1}{h(x_i) − ψ y_i},
where h(·) is unspecified and ψ is either known or estimated from an external dataset with the missing values recovered. Subsequently we would have
p(y | x, r = 0) = p(y | x, r = 1) exp(ψ y) / E[exp(ψ y) | x, r = 1].
With both p(y | x, r = 1) and E[exp(ψ y) | x, r = 1] empirically estimated, one obtains a density estimator for p(y | x, r = 0) and estimates θ = E[H(X, Y)] via
θ̂ = (1/n) { ∑_{r_i = 1} H(x_i, y_i) + ∑_{r_i = 0} Ê[H(x_i, y_i) | x_i, r_i = 0] }.
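A minimal sketch of this tilting estimator for the special case H(x, y) = y (so θ = E[Y]), with ψ treated as known; the Gaussian kernel, the bandwidth, and the function name are illustrative assumptions.

import numpy as np

def tilted_mean_estimate(x, y, r, psi, bw=0.3):
    # Exponential-tilting estimate of theta = E[Y] in the spirit of Kim & Yu (2011).
    # Assumes pr[R=1|x,y] = logit^{-1}{h(x) - psi*y} with psi known, so
    # p(y|x, r=0) is proportional to p(y|x, r=1) * exp(psi*y).
    # E^[Y | x, r=0] is a tilted, kernel-weighted average over respondents.
    resp, miss = (r == 1), (r == 0)
    xr, yr = x[resp], y[resp]
    est_missing = np.empty(miss.sum())
    for k, xi in enumerate(x[miss]):
        w = np.exp(-0.5 * ((xr - xi) / bw) ** 2) * np.exp(psi * yr)   # kernel * tilt
        est_missing[k] = np.sum(w * yr) / np.sum(w)
    return (yr.sum() + est_missing.sum()) / len(y)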

9 / 33 An instrumental variable approach (Wang, Shao & Kim, 2014)
Assume that X = (Z, U) and
E(Y | X) = β_0 + β_1 u + β_2 z,
pr(R = 1 | Z, U, Y) = pr(R = 1 | U, Y) = π(ψ; U, Y).
One may use the following estimating equations to estimate ψ:
∑_{i=1}^n { r_i / π(ψ; u_i, y_i) − 1 } (1, u_i, z_i) = 0.
Then estimate the β's via
∑_{i=1}^n { r_i / π(ψ̂; u_i, y_i) } (y_i − β_0 − β_1 u_i − β_2 z_i)(1, u_i, z_i) = 0.
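A minimal sketch of these two estimating equations, assuming (as an illustration, not stated in the talk) the parametric response model π(ψ; u, y) = expit(ψ_0 + ψ_1 u + ψ_2 y). Function and variable names are hypothetical.

import numpy as np
from scipy.optimize import root
from scipy.special import expit

def iv_estimator(z, u, y, r):
    # Instrumental-variable estimating equations in the spirit of Wang, Shao & Kim (2014):
    # z enters the regression mean but is excluded from the response model.
    n = len(r)
    yv = np.where(r == 1, y, 0.0)           # placeholder for missing y; never used when r == 0
    G = np.column_stack([np.ones(n), u, z])  # (1, u_i, z_i)

    def ee_psi(psi):
        pi = expit(psi[0] + psi[1] * u + psi[2] * yv)
        return G.T @ (r / pi - 1.0)          # for r_i = 0 the term is exactly -1
    psi_hat = root(ee_psi, x0=np.zeros(3)).x

    # beta: inverse-probability-weighted normal equations over respondents only
    resp = r == 1
    w = 1.0 / expit(psi_hat[0] + psi_hat[1] * u[resp] + psi_hat[2] * y[resp])
    Gr = G[resp]
    beta_hat = np.linalg.solve(Gr.T @ (w[:, None] * Gr), Gr.T @ (w * y[resp]))
    return psi_hat, beta_hat                 # beta_hat = (b0, b1 for u, b2 for z)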

10 / 33 The likelihood function
From the following representation:
L(θ, ψ; D_obs) = ∏_{i=1}^n p(R_i, R_i y_i | x_i; θ, ψ)
= ∏_{i=1}^n p(R_i, R_i y_i, (1 − R_i) y_i | x_i; θ, ψ) / p((1 − R_i) y_i | R_i, R_i y_i, x_i; θ, ψ)
= ∏_{i=1}^n p(R_i, y_i | x_i; θ, ψ) / p((1 − R_i) y_i | R_i, R_i y_i, x_i; θ, ψ)
= ∏_{i=1}^n p(y_i | x_i; θ) p(R_i | x_i, y_i; ψ) / p((1 − R_i) y_i | R_i, R_i y_i, x_i; θ, ψ),

11 / 33 The log-likelihood function
we have
l(θ, ψ; y_obs) = log L(θ, ψ; y_obs) = ∑_{i=1}^n { log p(y_i | x_i; θ) + log p(R_i | x_i, y_i; ψ) − log p((1 − R_i) y_i | R_i, R_i y_i, x_i; θ, ψ) }.
Let
Q(θ, ψ; θ̃, ψ̃) = ∑_{i=1}^n E[ log p(y_i | x_i; θ) + log p(R_i | x_i, y_i; ψ) | R_i, R_i y_i, x_i; θ̃, ψ̃ ],
H(θ, ψ; θ̃, ψ̃) = ∑_{i=1}^n E[ log p((1 − R_i) y_i | R_i, R_i y_i, x_i; θ, ψ) | R_i, R_i y_i, x_i; θ̃, ψ̃ ].
Then we have l(θ, ψ; y_obs) = Q(θ, ψ; θ̃, ψ̃) − H(θ, ψ; θ̃, ψ̃), for any (θ̃, ψ̃).

12 / 33 The Expectation-Maximization (EM) Algorithm (Dempster, Laird and Rubin, 1977)
From Jensen's inequality, we have H(θ, ψ; θ̃, ψ̃) ≤ H(θ̃, ψ̃; θ̃, ψ̃).
The EM algorithm is an iterative algorithm that finds a sequence {(θ^(t), ψ^(t)), t = 0, 1, 2, ...} such that
(θ^(t+1), ψ^(t+1)) = argmax_{(θ, ψ)} Q(θ, ψ; θ^(t), ψ^(t)).
This assures that l(θ^(t), ψ^(t); y_obs) ≤ l(θ^(t+1), ψ^(t+1); y_obs).
Typically logistic or probit regression models are used for w(x, y; ψ). Numerical integration or Monte Carlo methods are often necessary in the implementation.

13 / 33 When ψ = ψ_0 is known
Now consider the case where the true value of ψ, ψ_0, is known:
l(θ, ψ_0; y_obs) = log L(θ, ψ_0; y_obs) = ∑_{i=1}^n { log p(y_i | x_i; θ) + log p(R_i | x_i, y_i; ψ_0) − log p((1 − R_i) y_i | D_{i,obs}; θ, ψ_0) }.
With current estimate θ^(t), the Q-function becomes
Q(θ; θ^(t)) = ∑_{i=1}^n E[ log p(y_i | x_i; θ) + log p(R_i | x_i, y_i; ψ_0) | R_i, R_i y_i, x_i; θ^(t), ψ_0 ]
= ∑_{i=1}^n E[ log p(y_i | x_i; θ) | R_i, R_i y_i, x_i; θ^(t), ψ_0 ] + const,
because the second term does not involve θ.

14 / 33 Another view of the likelihood
Without loss of generality, assume that the regression model follows a canonical exponential family. Then
Ω(θ; θ^(t)) = ∑_{i=1}^m log g(y_i | x_i; θ) + ∑_{i=m+1}^n E[ log g(y_i | x_i; θ) | x_i, R_i = 0; θ^(t), ψ_0 ]
= ∑_{i=1}^m log g(y_i | x_i; θ) + ∑_{i=m+1}^n { θ E[S(x_i, y_i) | x_i, R_i = 0; θ^(t), ψ_0] + a(θ) },
so we need to obtain E[S(x_i, y_i) | x_i, R_i = 0; θ^(t), ψ_0] in order to carry out the EM algorithm. With w(x_i, y_i; ψ_0) known,
E[S(x_i, y_i) | x_i, R_i = 0; θ^(t), ψ_0] = ∫ S(x_i, y_i) g(y_i | x_i; θ^(t)) {1 − w(x_i, y_i; ψ_0)} dy_i / ∫ g(y_i | x_i; θ^(t)) {1 − w(x_i, y_i; ψ_0)} dy_i.

15 / 33 An alternative look at the E-step
On the other hand,
E[S(x_i, y_i) | x_i; θ^(t)] = E[ E[S(x_i, y_i) | R_i, x_i] | x_i ]
= E[S(x_i, y_i) | x_i, R_i = 1; θ^(t), ψ_0] pr[R_i = 1 | x_i; θ^(t), ψ_0] + E[S(x_i, y_i) | x_i, R_i = 0; θ^(t), ψ_0] pr[R_i = 0 | x_i; θ^(t), ψ_0].
We would have
E[S(x_i, y_i) | x_i, R_i = 0; θ^(t), ψ_0] = { E[S(x_i, y_i) | x_i; θ^(t)] − E[S(x_i, y_i) | x_i, R_i = 1; θ^(t), ψ_0] pr[R_i = 1 | x_i; θ^(t), ψ_0] } / pr[R_i = 0 | x_i; θ^(t), ψ_0].

16 / 33 Empirical replacements
It is noted that if we replace E[S(x_i, y_i) | x_i, R_i = 1; θ^(t), ψ_0] and pr[R_i = 1 | x_i; θ^(t), ψ_0] with empirical estimates, we can carry out an iterative algorithm as follows.
At the E-step:
Ê[S(x_i, y_i) | x_i, R_i = 0; θ^(t), ψ_0] = { E[S(x_i, y_i) | x_i; θ^(t)] − Ê[S(x_i, y_i) | x_i, R_i = 1] p̂r[R_i = 1 | x_i] } / { 1 − p̂r[R_i = 1 | x_i] }.
At the M-step, set
θ^(t+1) = argmax_θ Q*(θ; θ^(t)) := argmax_θ [ θ { ∑_{i=1}^m S(x_i, y_i) + ∑_{i=m+1}^n Ê[S(x_i, Y) | x_i, R_i = 0; θ^(t)] } + n a(θ) ].

17 / 33 An important observation
In a usual EM algorithm, θ^(t) may be far away from the truth θ_0. However, for the empirical estimates p̂r[R_i = 1 | x_i] and Ê[S(x_i, y_i) | x_i, R_i = 1] to be consistent with the corresponding terms under (θ^(t), ψ_0), it is required that θ^(t) be a consistent estimate of θ_0. Therefore we should start with a consistent initial estimate of θ and carry out this iterative algorithm to obtain another consistent estimate of θ at convergence.

18 / 33 A short summary of the modified EM
1. Choose an initial value θ^(0) from either a complete dataset on subjects with similar characteristics or a completed subset recovered from recalls.
2. Compute Nadaraya-Watson or local polynomial estimates of pr[R = 1 | x_i] and E[S(x_i, Y) | x_i, R = 1], for each i = m+1, m+2, ..., n.
3. At the E-step of the t-th iteration, calculate Ê[S(x_i, y_i) | x_i, R_i = 0; θ^(t), ψ_0].
4. At the M-step of the t-th iteration, find θ = θ^(t+1) that solves
∑_{i=1}^n E[S(x_i, y_i) | x_i; θ] = ∑_{i=1}^m S(x_i, y_i) + ∑_{i=m+1}^n Ê[S(x_i, Y) | x_i, R_i = 0; θ^(t)].
With a complete external dataset {(x_i, y_i); i = n+1, ..., n+n_E}, we would instead solve for θ = θ^(t+1) from
∑_{i=1}^n E[S(x_i, y_i) | x_i; θ] + ∑_{i=n+1}^{n+n_E} E[S(x_i, y_i) | x_i; θ] = ∑_{i=1}^m S(x_i, y_i) + ∑_{i=m+1}^n Ê[S(x_i, Y) | x_i, R_i = 0; θ^(t)] + ∑_{i=n+1}^{n+n_E} S(x_i, y_i).
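As a concrete illustration of steps 1-4, here is a minimal Python sketch for the normal linear regression [y | x] ~ N(β_0 + β_1 x, σ²), whose complete-data sufficient statistics per subject are (y, x y, y²). The helper names, the Gaussian kernel, the bandwidth, and the fixed iteration count are illustrative choices, not from the talk; θ^(0) is assumed to come from a consistent source such as an external complete dataset.

import numpy as np

def nw(x0, x, t, bw):
    # Nadaraya-Watson estimate of E[t | x = x0] with a Gaussian kernel (bandwidth bw)
    w = np.exp(-0.5 * ((x - x0) / bw) ** 2)
    return np.sum(w * t) / np.sum(w)

def approx_em_normal(x, y, r, theta0, bw=0.3, n_iter=50):
    # Approximated EM sketch for [y|x] ~ N(b0 + b1*x, s2) without modeling the
    # missing-data mechanism; theta0 = (b0, b1, s2) must be a consistent initial estimate.
    b0, b1, s2 = theta0
    resp, miss = (r == 1), (r == 0)
    xm = x[miss]
    # Step 2: empirical estimates, computed once, at the nonrespondents' x values
    p1 = np.array([nw(xi, x, resp.astype(float), bw) for xi in xm])   # pr^[R=1 | x]
    m1 = np.array([nw(xi, x[resp], y[resp], bw) for xi in xm])        # E^[Y   | x, R=1]
    m2 = np.array([nw(xi, x[resp], y[resp] ** 2, bw) for xi in xm])   # E^[Y^2 | x, R=1]
    X = np.column_stack([np.ones_like(x), x])
    for _ in range(n_iter):
        # Step 3 (E-step): approximate conditional moments of the missing y's
        mu_m = b0 + b1 * xm                                # E[Y | x; theta^(t)]
        ey_m = (mu_m - m1 * p1) / (1 - p1)                 # E^[Y   | x, R=0]
        ey2_m = (s2 + mu_m ** 2 - m2 * p1) / (1 - p1)      # E^[Y^2 | x, R=0]
        # Step 4 (M-step): match sufficient statistics; for this model it reduces to
        # OLS on the completed first moments plus a moment equation for the variance
        y1 = np.where(resp, y, 0.0); y1[miss] = ey_m
        y2 = np.where(resp, y ** 2, 0.0); y2[miss] = ey2_m
        b0, b1 = np.linalg.solve(X.T @ X, X.T @ y1)
        mu = b0 + b1 * x
        s2 = max((y2.sum() - (mu ** 2).sum()) / len(y), 1e-6)  # crude positivity guard
    return b0, b1, s2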

19 / 33 Here we design an iterative algorithm with a sequence {θ^(t), t = 0, 1, 2, ...} such that
θ^(t+1) = argmax_θ Q*(θ; θ^(t)) = argmax_θ { Q(θ; θ^(t)) + o(1) }.
This algorithm will yield l(θ^(t+1), ψ_0; y_obs) ≥ l(θ^(t), ψ_0; y_obs) + o(1).

20 / 33 An example in contingency table analysis
Consider discrete data {(x_i, y_i, z_i), i = 1, ..., n}: the x_i's and y_i's are fully observed; z_i is observed for i = 1, ..., c and missing for i = c+1, ..., n. Let m = n − c.
Essentially the observed data consist of the fully classified table {c_jkl} and the partially classified table {m_jk+}.
Complete external data D_E = {(x_i, y_i, z_i), i = n+1, ..., n+n_E} are fully observed.
A log-linear model is assumed with parameter θ, which determines the cell probabilities
π_jkl = pr[X = j, Y = k, Z = l], j = 1, ..., J; k = 1, ..., K; l = 1, ..., L.
Initial estimates {π^(0)_jkl} are derived from the external complete data D_E.

21 / 33 Implementation under the discrete setting
Initial estimates:
p̂r[Z = l | X = j, Y = k, R = 1] = ∑_{i=1}^c I{x_i = j, y_i = k, z_i = l} / ∑_{i=1}^c I{x_i = j, y_i = k} = c_jkl / c_jk+,
p̂r[R = 1 | X = j, Y = k] = ∑_{i=1}^c I{x_i = j, y_i = k} / ∑_{i=1}^n I{x_i = j, y_i = k} = c_jk+ / n_jk+.
At the t-th step, for each (j, k, l), we update
S_jkl = c_jkl + ∑_{i=c+1}^n I{x_i = j, y_i = k} p̂r[Z = l | X = j, Y = k, R = 0]
= c_jkl + m_jk+ { pr[Z = l | X = j, Y = k; θ^(t)] − (c_jkl / c_jk+)(c_jk+ / n_jk+) } / { 1 − c_jk+ / n_jk+ }
= c_jkl + m_jk+ { π^(t)_jkl / π^(t)_jk+ − c_jkl / n_jk+ } / ( m_jk+ / n_jk+ )
= n_jk+ π^(t)_jkl / π^(t)_jk+.
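A minimal sketch of this closed-form cell update for given count tables; the array names and shapes are illustrative.

import numpy as np

def update_cells(c, m, pi_t):
    # One E-step update of the sufficient statistics S_jkl in the discrete setting.
    #   c    : J x K x L counts from the fully classified table {c_jkl}
    #   m    : J x K     counts from the partially classified table {m_jk+}
    #   pi_t : J x K x L current cell probabilities pi^(t)_jkl
    # Returns S with S_jkl = n_jk+ * pi^(t)_jkl / pi^(t)_jk+ (the simplified form above).
    n_jk = c.sum(axis=2) + m                     # n_jk+ = c_jk+ + m_jk+
    pi_jk = pi_t.sum(axis=2, keepdims=True)      # pi^(t)_jk+
    return n_jk[:, :, None] * pi_t / pi_jk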

22 / 33 Impression on the discrete setting
In the E-step, we end up imputing all the z_i's based on (x_i, y_i), including the observed ones.
If we include the external data D_E in the algorithm by updating the sufficient statistics through
S*_jkl = n^E_jkl + S_jkl = n^E_jkl + n_jk+ π^(t)_jkl / π^(t)_jk+,
the modified EM algorithm is equivalent to running a regular EM algorithm on the fully classified table {n^E_jkl} and the partially classified table {n_jk+}, that is, removing the observed z_i's from the complete cases (which are not part of the external complete data).

23 / 33 Implementation under the continuous setting
Consider the regression analysis of [Y | X] where Y is continuous and subject to nonresponse.
If X is discrete, the empirical approximations are:
Ê[S(x_i, Y) | x_i = k, r_i = 1] = ∑_{j: r_j = 1, x_j = x_i = k} S(x_i, y_j) / #{j : r_j = 1, x_j = x_i = k},
p̂r[R_i = 1 | x_i = k] = #{j : r_j = 1, x_j = x_i = k} / #{j : x_j = x_i = k}.
If X is continuous, we use either the Nadaraya-Watson estimate or a local polynomial estimate for Ê[S(x_i, Y) | x_i, r_i = 1], and a kernel estimator for p̂r[R_i = 1 | x_i].
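A minimal sketch of the discrete-X empirical approximations above (the continuous-X kernel versions appear in the sketch after slide 18); the function name and the dictionary output keyed by the levels of X are illustrative.

import numpy as np

def empirical_estimates_discrete_x(x, y, r, s):
    # For each level k of a discrete covariate x, returns
    #   Ehat[S(x, Y) | x = k, R = 1] : average of s(k, y_j) over respondents with x_j = k
    #   prhat[R = 1 | x = k]         : respondent fraction among subjects with x_j = k
    # `s` is the sufficient-statistic function s(x, y); assumes each level has respondents.
    e_s, p1 = {}, {}
    for k in np.unique(x):
        at_k = x == k
        resp_k = at_k & (r == 1)
        p1[k] = resp_k.sum() / at_k.sum()
        e_s[k] = np.mean([s(k, yj) for yj in y[resp_k]])
    return e_s, p1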

24 / 33 Simulation settings when data are NMAR
We simulated bivariate data {(x_i, y_i), i = 1, 2, ..., N} as follows:
(i) x_i ~ N(0, 1) or x_i ~ Bin(5, 0.3).
(ii) [y_i | x_i] ~ N(β_0 + β_1 x_i, σ²), where θ = (β_0, β_1, σ²) = (1, 1, 1).
Keep n_E = 200 of the above observations as the external dataset for obtaining initial values of θ. Simulate missing y_i's among the remaining n = 1000 subjects with
pr[R_i = 1 | x_i, y_i] = Φ(ψ_0 + ψ_1 x_i + ψ_2 y_i),
where Φ(·) is the CDF of the standard Gaussian distribution.
Compare the complete-case estimate (θ̂_cc) based on the 1000 observations with missing data, the MLE from the external data (θ̂_E), and the proposed approximated EM estimate (θ̂_AEM).
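A minimal data-generation sketch for this setting (continuous-X case). The ψ values are illustrative placeholders chosen to give roughly 40% nonresponse; the talk does not report the ψ's used.

import numpy as np
from scipy.stats import norm

def simulate_nmar(N=1200, n_E=200, psi=(-0.1, 0.3, 0.3), seed=0):
    # x ~ N(0,1), y|x ~ N(1 + x, 1), i.e. theta = (1, 1, 1).
    # The first n_E pairs form the fully observed external dataset; the remaining
    # n = N - n_E are subject to nonresponse with pr[R=1|x,y] = Phi(psi0 + psi1*x + psi2*y).
    rng = np.random.default_rng(seed)
    x = rng.normal(size=N)
    y = 1 + x + rng.normal(size=N)
    ext, main = slice(0, n_E), slice(n_E, N)
    p = norm.cdf(psi[0] + psi[1] * x[main] + psi[2] * y[main])
    r = (rng.uniform(size=N - n_E) < p).astype(int)
    return (x[ext], y[ext]), (x[main], y[main], r)

An initial θ^(0) can then be fit on the external pairs and passed to the approximated EM sketch shown after slide 18.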

25 / 33 Simulation results with X ~ N(0, 1)

Method                     Quantity              β_0      β_1      σ²
Complete-case analysis     Empirical Bias*       2009    -1388    -486
                           Empirical SD*          380      422     468
                           Coverage of 95% CI      0%     8.8%   79.5%
MLEs from external subset  Empirical Bias*        -26        9    -103
                           Empirical SD*          705      707     970
                           Coverage of 95% CI   96.1%    93.8%   92.6%
Approximated EM**          Empirical Bias*        -97      -11    -161
                           Empirical SD*          551      472     624
                           Coverage of 95% CI   95.0%    95.2%   93.4%

The proportion of nonresponse was about 40%.
* Empirical biases and SDs are the actual numbers times 1000.
** The Epanechnikov kernel was used in the approximated EM; the bootstrap was used to derive the standard errors of θ̂_AEM.

26 / 33 Simulation results with X ~ Bin(5, 0.3)

Method                     Quantity              β_0      β_1      σ²
Complete-case analysis     Empirical Bias*        156    -1412    -354
                           Empirical SD*          574      446     477
                           Coverage of 95% CI   94.6%    11.6%   86.5%
MLEs from external subset  Empirical Bias*        -12       25    -111
                           Empirical SD*         1245      688     976
                           Coverage of 95% CI   94.7%    94.2%   92.5%
Approximated EM**          Empirical Bias*        -16       27     -50
                           Empirical SD*          662      492     731
                           Coverage of 95% CI   94.7%    93.9%   93.6%

The proportion of nonresponse was about 40%.
* Empirical biases and SDs are the actual numbers times 1000.
** The empirical averages were used in the approximated EM; the bootstrap was used to derive the standard errors of θ̂_AEM.

27 / 33 Simulation settings when data are MAR
We simulated bivariate data {(x_i, y_i), i = 1, 2, ..., N} as follows:
(i) x_i ~ N(0, 1).
(ii) [y_i | x_i] ~ N(β_0 + β_1 x_i, σ²), where θ = (β_0, β_1, σ²) = (1, 1, 1).
Keep n_E = 200 of the above observations as the external dataset for obtaining initial values of θ. Simulate missing y_i's among the remaining n = 1000 subjects with
pr[R_i = 1 | x_i, y_i] = Φ(ψ_0 + ψ_1 x_i),
where Φ(·) is the CDF of the standard Gaussian distribution.
Compare the complete-case estimate (θ̂_cc) based on the 1000 observations with missing data, the MLE from the external data (θ̂_E), and the proposed approximated EM estimate (θ̂_AEM).

28 / 33 Simulation results with X ~ N(0, 1)

Method                     Quantity              β_0      β_1      σ²
Complete-case analysis     Empirical Bias*        -12        1     -24
                           Empirical SD*          386      411     488
                           Coverage of 95% CI     94%    94.5%   93.4%
MLEs from external subset  Empirical Bias*        -26        9    -103
                           Empirical SD*          705      707     970
                           Coverage of 95% CI   96.1%    93.8%   92.6%
Approximated EM**          Empirical Bias*         11       11     -73
                           Empirical SD*          556      472     641
                           Coverage of 95% CI   93.9%    94.5%   93.7%

The proportion of nonresponse was about 40%.
* Empirical biases and SDs are the actual numbers times 1000.
** The Epanechnikov kernel was used in the modified EM; the bootstrap was used to derive the standard errors of θ̂_AEM.

29 / 33 Without an external complete dataset...
If one can obtain a reliable initial estimate θ^(0) and an associated variance-covariance matrix estimate V, then we can use it as the initial value and perform the M-steps via
θ^(t+1) = argmax_θ { −(θ − θ^(0))ᵀ V^{-1} (θ − θ^(0)) + (1/n) Q*(θ; θ^(t)) }.
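A minimal sketch of this penalized M-step, assuming a Q*-evaluator is available as a Python callable; the function and argument names are hypothetical and the numerical optimizer is an illustrative choice.

import numpy as np
from scipy.optimize import minimize

def penalized_m_step(theta_t, theta0, V, q_star, n):
    # Maximizes  -(theta - theta0)' V^{-1} (theta - theta0) + Q*(theta; theta^(t)) / n,
    # where q_star(theta, theta_t) evaluates the approximated Q-function Q*(theta; theta^(t)).
    V_inv = np.linalg.inv(V)
    def objective(theta):
        d = theta - theta0
        return d @ V_inv @ d - q_star(theta, theta_t) / n   # negated for minimization
    return minimize(objective, x0=theta_t).x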

30 / 33 Without an external complete dataset...
Consider missing-data mechanisms such as
pr[R_i = 1 | x_i, y_i] = w(x_i, y_i; ψ) = w(y_i; ψ).
There are two approaches for implementing the approximated EM algorithm for data with outcome-dependent nonresponse:
1. Use the pseudolikelihood estimate as the initial estimate and run the approximated EM.
2. Use the pseudolikelihood estimate as the initial estimate and incorporate the pseudolikelihood function as a component in each M-step:
θ^(t+1) = argmax_θ { l_pl(θ) + Q*(θ; θ^(t)) }.
The second approach is more computationally intensive.

31 / 33 Future work
Monitor {l(θ^(t); ψ_0), Q*(θ^(t+1); θ^(t)) − Q(θ^(t+1); θ^(t)); t = 0, 1, 2, ...}.
Search for a target function l*(θ; P_n^(X,Y,R), θ^(0)) so that θ̂_AEM is a stationary point of l*(θ; P_n^(X,Y,R), θ^(0)).
Variance estimation.
Link to integrative data analysis.

32 / 33 Acknowledgment Megan Hunt Olson, University of Wisconsin, Green Bay. Yang Zhang, Amgen Inc.

33 / 33 References
1. Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B 39, 1-37.
2. Tang, G., Little, R. J. A. and Raghunathan, T. (2003). Analysis of multivariate missing data with nonignorable nonresponse. Biometrika 90, 747-764.
3. Kim, J. K. and Yu, C. L. (2011). A semi-parametric estimation of mean functionals with non-ignorable missing data. Journal of the American Statistical Association 106, 157-165.
4. Wang, S., Shao, J. and Kim, J. K. (2014). Identifiability and estimation in problems with nonignorable nonresponse. Statistica Sinica 24, 1097-1116.