An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data


An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data. Jae-Kwang Kim, Iowa State University. June 28, 2012. Joint work with Dr. Ming Zhou (when he was a PhD student at ISU).

2 / 47 Reference: Zhou, M. and Kim, J.K. (2012). An efficient method of estimation for longitudinal surveys with monotone missing data. Biometrika, accepted for publication.

3 / 47 Outline 1 Basic Setup 2 Propensity Score Method 3 GLS Approach with Propensity Score Method 4 Application to Longitudinal Missing 5 Simulation Study 6 Conclusion

4 / 47 Basic Setup
X, Y: random variables from some distribution.
θ: parameter of interest, defined through E{U(θ; X, Y)} = 0.
Examples:
1. θ = E(Y): U(θ; X, Y) = Y - θ
2. θ = F_Y^{-1}(1/2): U(θ) = F_Y(θ) - 1/2
3. θ a regression coefficient: U(θ) = X(Y - X'θ).

5 / 47 Basic Setup
Estimator of θ: solve
Û_n(θ) ≡ Σ_{i=1}^n U(θ; x_i, y_i) = 0
to get ˆθ_n, where (x_i, y_i) are IID realizations of (X, Y). Under some conditions, ˆθ_n converges in probability to θ and is asymptotically normal as n → ∞. What if some of the y_i are missing?
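As a minimal numerical sketch (not from the slides; the data and parameter values are made up), the estimating-equation view of the mean: with U(θ; x, y) = y - θ, the equation Û_n(θ) = 0 has derivative ∂Û_n/∂θ = -n, so a single Newton step from any starting value lands on the sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(1.0, 1.0, n)
y = 2.0 * (x - 1.0) + rng.normal(size=n)   # hypothetical data, E(Y) = 0

def solve_mean(y):
    # Solve U_n(theta) = sum_i (y_i - theta) = 0 by Newton-Raphson.
    # dU/dtheta = -n, so one step from theta = 0 is exact.
    theta = 0.0
    theta = theta + np.sum(y - theta) / len(y)
    return theta

theta_hat = solve_mean(y)
```

The same template covers the median and regression examples by swapping in the corresponding U, usually with an iterative root finder instead of a closed-form step.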

6 / 47 Basic Setup
x_1, ..., x_n are fully observed. Some of the y_i are not observed. Let
r_i = 1 if y_i is observed, and r_i = 0 if y_i is missing.
Note that r_i is also a random variable (whose probability distribution is generally unknown).

7 / 47 Basic Setup
Complete-Case (CC) method: solve
Σ_{i=1}^n r_i U(θ; x_i, y_i) = 0.
Biased unless Pr(r = 1 | X, Y) does not depend on (X, Y), i.e. biased unless the set of respondents is a simple random sample from the original data.

8 / 47 Basic Setup
Weighted Complete-Case (WCC) method: solve
Û_W(θ) ≡ Σ_{i=1}^n r_i w_i U(θ; x_i, y_i) = 0
for some weights w_i. The weights are often called propensity scores (or propensity weights). The choice
w_i = 1 / Pr(r_i = 1 | x_i, y_i)
makes the resulting estimator consistent, but requires some assumption about Pr(r_i = 1 | x_i, y_i).
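A small sketch (not from the slides; synthetic data, with a response probability I chose to depend on x only) contrasting the CC and WCC estimators of θ = E(Y):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(1.0, 1.0, n)
y = 2.0 * (x - 1.0) + rng.normal(size=n)     # so E(Y) = 0
p = 1.0 / (1.0 + np.exp(-(0.5 + x)))         # true response probability (depends on x)
r = rng.binomial(1, p)                       # response indicators

theta_cc = y[r == 1].mean()                  # complete-case: biased, since high-x
                                             # (hence high-y) units respond more often
theta_wcc = np.sum(r * y / p) / n            # weighted with w_i = 1/p_i: consistent
```

With the inverse-probability weights, each respondent stands in for 1/p_i similar units, so theta_wcc is close to 0 while theta_cc overshoots.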

9 / 47 Basic Setup
Justification for using w_i = 1/Pr(r_i = 1 | x_i, y_i): note that
E{Û_W(θ) | x_1, ..., x_n, y_1, ..., y_n} = Û_n(θ),
where the expectation is taken with respect to r. Thus, the probability limit of the solution to Û_W(θ) = 0 is equal to the probability limit of the solution to Û_n(θ) = 0. No distributional assumptions are made about (X, Y).

10 / 47 Propensity score method
Idea: for simplicity, assume that Pr(r_i = 1 | x_i, y_i) takes a parametric form
Pr(r_i = 1 | x_i, y_i) = π(x_i, y_i; φ*)
for some unknown φ*, where the functional form of π(·) is known. For example,
π(x, y; φ) = exp(φ_0 + φ_1 x + φ_2 y) / {1 + exp(φ_0 + φ_1 x + φ_2 y)}.
Propensity score approach to missing data: obtain ˆθ_PS which solves
Û_PS(θ) ≡ n^{-1} Σ_{i=1}^n {r_i / π(x_i, y_i; ˆφ)} U(θ; x_i, y_i) = 0
for some ˆφ which converges to φ* in probability.
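A sketch of the two-step PS approach under an assumed MAR logistic model in which π depends on x only (synthetic data; not from the slides): step 1 estimates φ by logistic maximum likelihood via Newton-Raphson, step 2 solves the weighted estimating equation for θ = E(Y).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(1.0, 1.0, n)
y = 2.0 * (x - 1.0) + rng.normal(size=n)     # E(Y) = 0
phi_true = np.array([0.5, 1.0])              # assumed "true" phi*
p = 1.0 / (1.0 + np.exp(-(phi_true[0] + phi_true[1] * x)))
r = rng.binomial(1, p)

# Step 1: solve S(phi) = sum_i {r_i - pi_i(phi)} (1, x_i)' = 0 by Newton-Raphson
# (this is ordinary logistic regression of r on x).
X = np.column_stack([np.ones(n), x])
phi = np.zeros(2)
for _ in range(25):
    pi = 1.0 / (1.0 + np.exp(-X @ phi))
    score = X.T @ (r - pi)
    hess = -(X * (pi * (1 - pi))[:, None]).T @ X   # dS/dphi'
    phi = phi - np.linalg.solve(hess, score)

# Step 2: PS estimator of theta = E(Y) with estimated weights.
pi_hat = 1.0 / (1.0 + np.exp(-X @ phi))
theta_ps = np.sum(r * y / pi_hat) / n
```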

11 / 47 Propensity score method
Issues: identifiability. Model parameters may not be fully identifiable from the observed sample. May assume
Pr(r_i = 1 | x_i, y_i) = Pr(r_i = 1 | x_i).
This condition is often called MAR (missing at random). For longitudinal data with a monotone missing pattern, the MAR condition means
Pr(r_{i,t} = 1 | x_i, y_{i1}, ..., y_{it}) = Pr(r_{i,t} = 1 | x_i, y_{i1}, ..., y_{i,t-1}).
That is, the response probability at time t may depend on the y-values observed before time t.

12 / 47 Propensity score method
Issues: estimation of φ. Maximum likelihood method: solve
S(φ) ≡ Σ_{i=1}^n {r_i - π_i(φ)} q_i(φ) = 0,
where q_i(φ) = ∂ logit{π_i(φ)} / ∂φ. The maximum likelihood method does not always lead to efficient estimation (see Example 1 next).
Inference using ˆθ_PS: note that ˆθ_PS = ˆθ_PS(ˆφ). We need to incorporate the sampling variability of ˆφ in making inference about θ using ˆθ_PS.

13 / 47 Propensity score method
Example 1. Response model:
π_i(φ) = exp(φ_0 + φ_1 x_i) / {1 + exp(φ_0 + φ_1 x_i)}.
Parameter of interest: θ = E(Y). PS estimator of θ: solve jointly
U(θ, φ) = Σ_{i=1}^n r_i {π_i(φ)}^{-1} (y_i - θ) = 0
S(φ) = Σ_{i=1}^n {r_i - π_i(φ)} (1, x_i)' = (0, 0)'.

14 / 47 Propensity score method
Example 1 (Cont'd). Taylor linearization:
ˆθ_PS(ˆφ) ≅ ˆθ_PS(φ*) - E(∂U/∂φ') {E(∂S/∂φ')}^{-1} S(φ*)
          = ˆθ_PS(φ*) - {Cov(U, S)} {V(S)}^{-1} S(φ*),
by the property of zero-mean functions (i.e. if E(U) = 0, then E(∂U/∂φ') = -Cov(U, S)). So we have
V{ˆθ_PS(ˆφ)} = V{ˆθ_PS(φ*) | S} ≤ V{ˆθ_PS(φ*)},
where
V{ˆθ | S} = V(ˆθ) - Cov(ˆθ, S) {V(S)}^{-1} Cov(S, ˆθ).
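The variance inequality V{ˆθ_PS(ˆφ)} ≤ V{ˆθ_PS(φ*)} can be checked by a small Monte Carlo experiment: even when φ* is known, plugging in the ML estimate ˆφ gives a less variable PS estimator. A sketch with synthetic data and an assumed logistic response model (not from the slides):

```python
import numpy as np

rng = np.random.default_rng(6)
n, B = 1000, 1000
est_true, est_est = [], []
for _ in range(B):
    x = rng.normal(1.0, 1.0, n)
    y = 2.0 * (x - 1.0) + rng.normal(size=n)
    pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))        # true response probability
    r = rng.binomial(1, pi)
    est_true.append(np.sum(r * y / pi) / n)      # PS estimator with known phi*

    # Re-estimate phi by logistic ML even though phi* is known.
    Z = np.column_stack([np.ones(n), x])
    phi = np.array([0.5, 1.0])
    for _ in range(5):
        p = 1.0 / (1.0 + np.exp(-Z @ phi))
        phi += np.linalg.solve((Z * (p * (1 - p))[:, None]).T @ Z, Z.T @ (r - p))
    pi_hat = 1.0 / (1.0 + np.exp(-Z @ phi))
    est_est.append(np.sum(r * y / pi_hat) / n)   # PS estimator with phi-hat

ratio = np.var(est_est) / np.var(est_true)       # should come out below 1
```

The reduction is exactly the projection term Cov(ˆθ, S){V(S)}^{-1}Cov(S, ˆθ) on the slide.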

15 / 47 GLS approach with propensity score method
Motivation: the propensity score method is used to reduce the bias, rather than to reduce the variance. In the previous example, the PS estimator of θ_x = E(X) is
ˆθ_x,PS = { Σ_{i=1}^n r_i ˆπ_i^{-1} x_i } / { Σ_{i=1}^n r_i ˆπ_i^{-1} },
where ˆπ_i = π_i(ˆφ). Note that ˆθ_x,PS is not necessarily equal to x̄_n = n^{-1} Σ_{i=1}^n x_i. How can we incorporate the extra information in x̄_n?

16 / 47 GLS approach with propensity score method
GLS (or GMM) approach: let θ = (θ_x, θ_y). We have three estimators for two parameters. Find θ that minimizes
Q_PS(θ) = ( x̄_n - θ_x, ˆθ_x,PS - θ_x, ˆθ_y,PS - θ_y ) ˆV^{-1} ( x̄_n - θ_x, ˆθ_x,PS - θ_x, ˆθ_y,PS - θ_y )',
where ˆθ_PS = ˆθ_PS(ˆφ) and ˆV estimates the variance of ( x̄_n, ˆθ_x,PS, ˆθ_y,PS )'. Computation of ˆV is somewhat cumbersome.

17 / 47 GLS approach with propensity score method
Alternative GLS (or GMM) approach: find (θ, φ) that minimizes
Q*(θ, φ) = ( x̄_n - θ_x, ˆθ_x,PS(φ) - θ_x, ˆθ_y,PS(φ) - θ_y, S(φ)' ) ˆV^{-1} ( x̄_n - θ_x, ˆθ_x,PS(φ) - θ_x, ˆθ_y,PS(φ) - θ_y, S(φ)' )'.
Computation of ˆV is easier since we can treat φ as if it were known. It can be shown that Q*(θ, ˆφ) = Q_PS(θ), and so minimizing Q*(θ, ˆφ) is equivalent to minimizing Q_PS(θ).
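A sketch of the GLS combination on synthetic data (not from the slides), under the simplification that the response probabilities are known, so the S(φ) block that handles the variability of ˆφ is omitted; ˆV is built from unit-level influence values:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
x = rng.normal(1.0, 1.0, n)                      # E(X) = 1
y = 2.0 * (x - 1.0) + rng.normal(size=n)         # E(Y) = 0
p = 1.0 / (1.0 + np.exp(-(0.5 + x)))
r = rng.binomial(1, p)
pi_hat = p                                       # treat phi as known (sketch only)

# Three estimators for two parameters (theta_x, theta_y).
g = np.array([x.mean(),
              np.sum(r * x / pi_hat) / n,
              np.sum(r * y / pi_hat) / n])
A = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # maps theta to E(g)

# Estimate V from the unit-level influence values of the three estimators.
infl = np.column_stack([x, r * x / pi_hat, r * y / pi_hat])
V = np.cov(infl, rowvar=False) / n

# GLS: minimize (g - A theta)' V^{-1} (g - A theta).
Vinv = np.linalg.inv(V)
theta_hat = np.linalg.solve(A.T @ Vinv @ A, A.T @ Vinv @ g)
```

The GLS solution pulls ˆθ_y,PS toward the value implied by the discrepancy between x̄_n and ˆθ_x,PS, which is exactly the efficiency gain the slide is after.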

18 / 47 GLS approach with propensity score method
Justification for the equivalence: may write
Q*(θ, φ) = ( Û_PS(θ, φ)', S(φ)' ) ( V_11 V_12; V_21 V_22 )^{-1} ( Û_PS(θ, φ)', S(φ)' )'
         = Q_1(θ | φ) + Q_2(φ),
where
Q_1(θ | φ) = ( Û_PS - V_12 V_22^{-1} S )' { V( U_PS - V_12 V_22^{-1} S ) }^{-1} ( Û_PS - V_12 V_22^{-1} S )
Q_2(φ) = S(φ)' {ˆV(S)}^{-1} S(φ).
For the MLE ˆφ, we have Q_2(ˆφ) = 0 and Q_1(θ | ˆφ) = Q_PS(θ).

19 / 47 GLS approach with propensity score method
Back to Example 1. Response model:
π_i(φ) = exp(φ_0 + φ_1 x_i) / {1 + exp(φ_0 + φ_1 x_i)}.
Three direct PS estimators of (1, θ_x, θ_y):
( ˆθ_1,PS, ˆθ_x,PS, ˆθ_y,PS ) = n^{-1} Σ_{i=1}^n r_i ˆπ_i^{-1} (1, x_i, y_i).
Also x̄_n = n^{-1} Σ_{i=1}^n x_i is available. What is the optimal estimator of θ_y?

20 / 47 GLS approach with propensity score method
Example 1 (Cont'd). Minimize
( x̄_n - θ_x, ˆθ_1,PS(φ) - 1, ˆθ_x,PS(φ) - θ_x, ˆθ_y,PS(φ) - θ_y, S(φ)' ) ˆV^{-1} ( x̄_n - θ_x, ˆθ_1,PS(φ) - 1, ˆθ_x,PS(φ) - θ_x, ˆθ_y,PS(φ) - θ_y, S(φ)' )'
with respect to (θ_x, θ_y, φ), where
S(φ) = Σ_{i=1}^n { r_i / π_i(φ) - 1 } h_i(φ) = 0
with h_i(φ) = π_i(φ) (1, x_i)'.

21 / 47 GLS approach with propensity score method
Example 1 (Cont'd). Equivalently, minimize
( ˆθ_y,PS(φ) - θ_y, ˆθ_1,PS(φ) - 1, ˆθ_x,PS(φ) - x̄_n, S(φ)' ) ˆV^{-1} ( ˆθ_y,PS(φ) - θ_y, ˆθ_1,PS(φ) - 1, ˆθ_x,PS(φ) - x̄_n, S(φ)' )'
with respect to (θ_y, φ), since the optimal estimator of θ_x is x̄_n.

22 / 47 GLS approach with propensity score method
Example 1 (Cont'd). The solution can be written as
ˆθ_y,opt = ˆθ_y,PS + ( 1 - ˆθ_1,PS ) ˆB_0 + ( x̄_n - ˆθ_x,PS ) ˆB_1 + { 0 - S(ˆφ) }' Ĉ,
where
( ˆB_0, ˆB_1, Ĉ' )' = { Σ_{i=1}^n r_i b_i (1, x_i, h_i')' (1, x_i, h_i') }^{-1} Σ_{i=1}^n r_i b_i (1, x_i, h_i')' y_i
and b_i = ˆπ_i^{-2} (1 - ˆπ_i). Note that the last term {0 - S(ˆφ)}' Ĉ, which is equal to zero, does not contribute to the point estimation. But it is used for variance estimation.

23 / 47 GLS approach with propensity score method
Example 1 (Cont'd). That is, for variance estimation, we simply express
ˆθ_y,opt = n^{-1} Σ_{i=1}^n ˆη_i,
where
ˆη_i = ˆB_0 + x_i ˆB_1 + h_i' Ĉ + (r_i / ˆπ_i) ( y_i - ˆB_0 - x_i ˆB_1 - h_i' Ĉ ),
and apply the standard variance formula to ˆη_i. This idea can be extended to the survey sampling setup.

24 / 47 GLS approach with propensity score method
Example 1 (Cont'd). The optimal estimator is linear in y. That is, we can write
ˆθ_y,opt = n^{-1} Σ_{i=1}^n (r_i / ˆπ_i) g_i y_i = Σ_{r_i = 1} w_i y_i,
where g_i satisfies
Σ_{i=1}^n (r_i / ˆπ_i) g_i (1, x_i, h_i') = Σ_{i=1}^n (1, x_i, h_i').
Thus, the estimator is doubly robust under E_ζ(y | x) = β_0 + β_1 x, in the sense that ˆθ_y,opt is consistent when either the response model or the superpopulation model holds.

25 / 47 Application to longitudinal missing
Basic Setup. X_i is always observed and remains unchanged for t = 0, 1, ..., T. Y_it is the response for subject i at time t. r_it: the response indicator for subject i at time t. Assuming no missingness in the baseline year, Y_0 can be absorbed into X. Monotone missing pattern:
r_it = 0 ⇒ r_{i,t+1} = 0, t = 1, ..., T - 1.
L_{i,t} = (X_i, Y_{i1}, ..., Y_{i,t}): measurements up to time t. The parameter of interest is µ_t = E(Y_it).

26 / 47 Application to longitudinal missing
Missing mechanism.
Missing completely at random (MCAR): P(r_it = 1 | r_{i,t-1} = 1, L_{i,t}) = P(r_it = 1 | r_{i,t-1} = 1).
Covariate-dependent missing (CDM): P(r_it = 1 | r_{i,t-1} = 1, L_{i,t}) = P(r_it = 1 | r_{i,t-1} = 1, X_i).
Missing at random (MAR): P(r_it = 1 | r_{i,t-1} = 1, L_{i,t}) = P(r_it = 1 | r_{i,t-1} = 1, L_{i,t-1}).
Missing not at random (MNAR): missing at random does not hold.

27 / 47 Application to longitudinal missing
Motivation: panel attrition is frequently encountered in panel surveys, and classical methods often assume covariate-dependent missingness, which can be unrealistic. We want to develop a PS method under MAR that makes full use of the available information.

28 / 47 Application to longitudinal missing
Modeling the Propensity Score. Under MAR, in the longitudinal data case, we consider the conditional probabilities
p_it := P(r_it = 1 | r_{i,t-1} = 1, L_{i,t-1}), t = 1, ..., T.
Then
π_it = Π_{j=1}^t p_ij.
Thus π_t can be modeled through modeling p_t with p_t(L_{t-1}; φ_t). This kind of modeling is also adopted in Robins et al. (1995).
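A sketch of the sequential modeling on a synthetic monotone-missing panel (not from the slides). For simplicity the baseline Y_0 is carried explicitly in L_{t-1} rather than absorbed into X, and each p_t is assumed logistic-linear in L_{t-1}; each conditional model is fitted only on the units still responding at t-1, and the cumulative π_t is the running product of fitted p_j's:

```python
import numpy as np

rng = np.random.default_rng(4)
n, T = 5000, 3
x = rng.normal(1.0, 1.0, n)

def fit_logistic(L, resp):
    # Newton-Raphson logistic fit of resp on the columns of L (intercept added).
    Z = np.column_stack([np.ones(len(resp)), L])
    phi = np.zeros(Z.shape[1])
    for _ in range(25):
        pr = 1.0 / (1.0 + np.exp(-Z @ phi))
        phi += np.linalg.solve((Z * (pr * (1 - pr))[:, None]).T @ Z, Z.T @ (resp - pr))
    return phi

# Hypothetical panel: drop-out at t depends on (X, Y_{t-1}) -- MAR, monotone.
y = np.zeros((n, T + 1))
y[:, 0] = 2 * (x - 1) + rng.normal(size=n)           # baseline, fully observed
r = np.ones((n, T + 1), dtype=int)
for t in range(1, T + 1):
    y[:, t] = 2 * (x - 1) + 2 * y[:, t - 1] + rng.normal(size=n)
    p_t = 1.0 / (1.0 + np.exp(-(1 + 2 * x - y[:, t - 1] / (t + 1))))
    r[:, t] = r[:, t - 1] * rng.binomial(1, p_t)     # monotone by construction

# Sequential fits of p_t(L_{t-1}; phi_t), accumulating pi_t = p_1 * ... * p_t.
pis, pi_cum = [], np.ones(n)
for t in range(1, T + 1):
    L_prev = np.column_stack([x] + [y[:, j] for j in range(t)])  # L_{t-1}
    at_risk = r[:, t - 1] == 1
    phi_t = fit_logistic(L_prev[at_risk], r[at_risk, t])
    Z = np.column_stack([np.ones(n), L_prev])
    pi_cum = pi_cum * (1.0 / (1.0 + np.exp(-Z @ phi_t)))
    pis.append(pi_cum.copy())

mu1_ps = np.sum(r[:, 1] * y[:, 1] / pis[0]) / n      # PS estimator of E(Y_1) = 0
```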

29 / 47 Application to longitudinal missing
Score Function for Longitudinal Data. Under parametric models for the p_t's, the partial likelihood for (φ_1, ..., φ_T) is
L(φ_1, ..., φ_T) = Π_{i=1}^n Π_{t=1}^T [ p_it^{r_it} (1 - p_it)^{1 - r_it} ]^{r_{i,t-1}},
and the corresponding score function is (S_1(φ_1), ..., S_T(φ_T)), where
S_t(φ_t) = Σ_{i=1}^n r_{i,t-1} { r_it - p_it(φ_t) } q_it(φ_t) = 0
with q_it(φ_t) = ∂ logit{p_it(φ_t)} / ∂φ_t. Under a logistic regression model such that p_t = 1 / {1 + exp(-φ_t' L_{t-1})}, we have q_it(φ_t) = L_{t-1}.

30 / 47 Application to longitudinal missing
Example 2. Assume T = 3. Parameters of interest: µ_x = E(X), µ_t = E(Y_t), t = 1, 2, 3. PS estimator of µ_p at year t:
ˆθ_p,t = n^{-1} Σ_{i=1}^n r_it ˆπ_it^{-1} y_ip, p ≤ t.
Estimator under t = 0 (baseline year): ˆθ_x,0 = n^{-1} Σ_{i=1}^n x_i.
Estimators under t = 1: ˆθ_x,1, ˆθ_1,1.
Estimators under t = 2: ˆθ_x,2, ˆθ_1,2, ˆθ_2,2.
Estimators under t = 3: ˆθ_x,3, ˆθ_1,3, ˆθ_2,3, ˆθ_3,3.
In total, (T + 1) p + T(T + 1)/2 estimators for p + T parameters, where p = dim(X).

31 / 47 Application to longitudinal missing
GMM for the Longitudinal Data Case. Need to incorporate auxiliary information sequentially. T = 1 is already covered in Example 1. For t = 2, we have auxiliary information about µ_x from the t = 0 sample (i.e. x̄_n) and further auxiliary information about µ_1 from the t = 1 sample (i.e. ˆθ_1,opt). Thus, the optimal estimator of θ_2 takes the form
ˆθ_2,opt = ˆθ_2,2 + ( x̄_n - ˆθ_x,2 ) ˆB_1 + ( ˆθ_1,opt - ˆθ_1,2 ) ˆB_2
for some ˆB_1 and ˆB_2.

32 / 47 Application to longitudinal missing
The auxiliary information up to time t can be incorporated by the estimating function Ẽ(ξ_{t-1}) = 0, where Ẽ(·) = n^{-1} Σ_{i=1}^n (·)_i and
ξ_{t-1} := ( (r_0/π_0) u_1 L_0', (r_1/π_1) u_2 L_1', ..., (r_{t-1}/π_{t-1}) u_t L_{t-1}' )',
with L_0 = X, L_1 = (X', Y_1)', ..., L_{t-1} = (X', Y_1, ..., Y_{t-1})' and u_t = r_t / p_t - 1. Note that E{ξ_{t-1}} = 0 because E{u_t | L_{t-1}, r_{t-1} = 1} = 0.

33 / 47 Application to longitudinal missing
The score function can be written as S_t := (S_1', ..., S_t')' = n Ẽ{ψ_{t-1}}, where
ψ_{t-1} = ( r_0 u_1 p_1 L_0', r_1 u_2 p_2 L_1', ..., r_{t-1} u_t p_t L_{t-1}' )'
        = ( (r_1 - p_1 r_0) L_0', (r_2 - p_2 r_1) L_1', ..., (r_t - p_t r_{t-1}) L_{t-1}' )'.

34 / 47 Application to longitudinal missing
GMM for the Longitudinal Data Case. This motivates minimizing the following quadratic form:
Q_t = ( Ẽ{r_t Y_t/ˆπ_t} - µ_t, Ẽ{ˆξ_{t-1}}', Ẽ{ˆψ_{t-1}}' ) ˆV^{-1} ( Ẽ{r_t Y_t/ˆπ_t} - µ_t, Ẽ{ˆξ_{t-1}}', Ẽ{ˆψ_{t-1}}' )',
where ˆV estimates the variance of ( Ẽ{r_t Y_t/π_t}, Ẽ{ξ_{t-1}}', Ẽ{ψ_{t-1}}' )', recalling that E{ξ_{t-1}} = 0 and E{ψ_{t-1}} = 0.

35 / 47 Application to longitudinal missing
Optimal PS Estimator. Theorem 1. Assume the logistic-type response model, where the score function for (φ_1, ..., φ_T) is Ẽ(ψ_{T-1}) = 0. For each year t, the optimal estimator of µ_t = E{Y_t} within the class
ˆµ_{t,B_t,C_t} = Ẽ{r_t Y_t / ˆπ_t} - B_t' Ẽ{ˆξ_{t-1}} - C_t' Ẽ{ˆψ_{t-1}}
is given by ˆµ_{t,ˆB_t,Ĉ_t}, where ˆB_t = (ˆB_1t', ..., ˆB_tt')' and Ĉ_t = (Ĉ_1t', ..., Ĉ_tt')' with
( ˆB_{j,t}', Ĉ_{j,t}' )' = [ Ẽ{ r_{j-1} û_j^2 ( ˆπ_{j-1}^{-1} L_{j-1}', ˆp_j L_{j-1}' )' ( ˆπ_{j-1}^{-1} L_{j-1}', ˆp_j L_{j-1}' ) } ]^{-1} Ẽ{ û_j ( ˆπ_{j-1}^{-1} L_{j-1}', ˆp_j L_{j-1}' )' r_t Y_t / ˆπ_t }.

36 / 47 Application to longitudinal missing
Variance Estimation. Theorem 2. The estimator Ŷ_t,opt is asymptotically equivalent to Ẽ{η_t}, where
η_t = r_t Y_t / π_t - Σ_{j=1}^t r_{j-1} u_j ( π_{j-1}^{-1} L_{j-1}', p_j L_{j-1}' ) D_{j,t},
with
D_{j,t} = [ E{ r_{j-1} u_j^2 ( π_{j-1}^{-1} L_{j-1}', p_j L_{j-1}' )' ( π_{j-1}^{-1} L_{j-1}', p_j L_{j-1}' ) } ]^{-1} E{ u_j ( π_{j-1}^{-1} L_{j-1}', p_j L_{j-1}' )' r_t Y_t / π_t }.
Thus the variance of Ŷ_t,opt can be consistently estimated by
n^{-1} (n - 1)^{-1} Σ_{i=1}^n { ˆη_{ti} - Ẽ(ˆη_t) }^2.

37 / 47 Application to longitudinal missing
Properties of the Optimal Estimator. Ŷ_t,opt is asymptotically normal with mean µ_t and variance equal to the lower bound of the asymptotic variance over the family
Ŷ_t,PSA - B' Ẽ{ˆξ_{t-1}}.
Computational advantage: r_{i-1} u_i L_{i-1} and r_{j-1} u_j L_{j-1} are orthogonal (uncorrelated) for i ≠ j. Variance estimation is also very convenient, as implied by Theorem 2.

38 / 47 Numerical Study
Robins et al. (1995) Estimator. Robins et al. (1995) proposed a class of estimators of µ_t in the longitudinal data case with monotone missingness, incorporating a regression model E(Y_it | X_i) = m(X_i; β_t). When m(X_i; β_t) = X_i' β_t, the weighted estimating equation method in Robins et al. (1995) gives an estimator ˆµ_t (out of that family) that solves
Ẽ[ (r_t / ˆπ_t) { Y_t - µ_t - β_1,t' (X - Ẽ[X]) } ( 1, X' - Ẽ[X]' )' ] = 0,
which gives
ˆµ_t = Ẽ{r_t Y_t / ˆπ_t} / Ẽ{r_t / ˆπ_t} - ˆβ_1,t' ( Ẽ{r_t X / ˆπ_t} / Ẽ{r_t / ˆπ_t} - X̄_n ).
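A single-time-period sketch of this regression-augmented estimator on synthetic data (not from the slides); as a simplification, π is treated as known and ˆβ_1,t is replaced by a π-weighted least-squares slope, a stand-in for the weighted estimating equation in Robins et al. (1995):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000
x = rng.normal(1.0, 1.0, n)
y_t = 2.0 * (x - 1.0) + rng.normal(size=n)       # E(Y_t) = 0
pi_t = 1.0 / (1.0 + np.exp(-(0.5 + x)))
r_t = rng.binomial(1, pi_t)
w = r_t / pi_t                                   # IPW weights (pi treated as known)

# Hajek-type PS estimator and weighted means.
mu_ps = np.sum(w * y_t) / np.sum(w)
xb = np.sum(w * x) / np.sum(w)

# pi-weighted least-squares slope of y on x (stand-in for beta-hat_{1,t}).
beta1 = np.sum(w * (x - xb) * (y_t - mu_ps)) / np.sum(w * (x - xb) ** 2)

# Regression augmentation: shift the weighted x-mean back to the full-sample mean.
mu_rrz = mu_ps - beta1 * (xb - x.mean())
```

The correction term uses the fact that X̄_n is known for everyone, which is precisely the auxiliary information the PS estimator alone ignores.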

39 / 47 Numerical Study
Estimators under Study.
The estimator using the full sample, i.e. Ẽ{Y_t}, for reference.
The naive estimator, i.e. the simple average over the complete cases: ˆµ_t,naive = Ẽ{r_t Y_t} / Ẽ{r_t}.
The direct propensity score adjusted estimator: ˆµ_t,PSA = Ẽ{r_t Y_t / ˆπ_t}.
The estimator using weighted estimating equations of Robins et al. (1995), denoted by ˆµ_t,RRZ.
Our estimator given in Theorem 1, denoted by ˆµ_t,opt.

40 / 47 Numerical Study
Simulation Study I.
Y_0 = 2(X - 1) + e_0, Y_t = 2(X - 1) + 2Y_{t-1} + e_t for t ≥ 1,
where X ~ N(1, 1), e_t = 0.5 e_{t-1} + v_t, e_0 ~ N(0, 1), and v_t ~ N(0, 1) independently across t. The missing indicator r_t follows
P(r_t = 1 | X, Y_{t-1}, r_{t-1} = 1) = expit{1 + 2X - Y_{t-1}/(t + 1)},
and there is no missingness in the baseline year.
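The data-generating process above can be replicated directly. Note one assumption in this sketch: the scanned slide dropped the operator in front of the Y_{t-1}/(t+1) term, and the minus sign used below is my reading (it reproduces response rates that decline over time, as reported on the next slide):

```python
import numpy as np

rng = np.random.default_rng(7)
n, T, B = 300, 3, 200                            # slide uses B = 10000; B = 200 here
rates = np.zeros(T)                              # Monte Carlo response rates by wave
for b in range(B):
    x = rng.normal(1.0, 1.0, n)
    e = rng.normal(size=n)                       # e_0
    y_prev = 2 * (x - 1) + e                     # Y_0, fully observed
    r_prev = np.ones(n, dtype=int)
    for t in range(1, T + 1):
        p_t = 1.0 / (1.0 + np.exp(-(1 + 2 * x - y_prev / (t + 1))))
        r_t = r_prev * rng.binomial(1, p_t)      # monotone drop-out
        rates[t - 1] += r_t.mean() / B
        e = 0.5 * e + rng.normal(size=n)         # AR(1) errors e_t
        y_prev = 2 * (x - 1) + 2 * y_prev + e    # Y_t
        r_prev = r_t
```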

41 / 47 Numerical Study Simulation Study I We used B = 10000 Monte Carlo samples of size n = 300 for this simulation. The response rates for t = 1, 2, 3 are 0.93, 0.87, 0.75 respectively. We also computed variance estimator of the optimal estimator using the formula in Theorem 2. The relative biases of the variance estimator, for t = 1, 2, 3 are 0.0149, 0.0159, 0.0172 respectively.

42 / 47 Numerical Study
Results from Simulation Study I
Table: Comparison of different methods when n = 300, T = 3 with Monte Carlo sample size 10000 for Simulation Study I, using the full data as baseline.
100 * RMSE / RMSE.Full:
        t = 1   t = 2   t = 3
Full     100     100     100
Naive    134     118     260
PS       101     102     157
RRZ      100     101     105
Opt      100     100     101

43 / 47 Numerical Study
Simulation Study II.
Y_0 = 2(X - 1)^{1/3} + e_0, Y_t = 2(X - 1)^{1/3} + 2Y_{t-1} + e_t for t ≥ 1,
where X ~ N(1, 1), e_t = 0.5 e_{t-1} + v_t, e_0 ~ N(0, 1), and v_t ~ N(0, 1) independently across t. The missing indicator r_t follows
P(r_t = 1 | X, Y_{t-1}, r_{t-1} = 1) = expit{1 + 2X - Y_{t-1}/(t + 1)}.

44 / 47 Numerical Study
Simulation Study II. B = 10000, n = 300; the response rates for t = 1, 2, 3 are 0.92, 0.85, 0.74, respectively. Using the same variance estimation formula, the relative biases of the variance estimator of the optimal estimator for t = 1, 2, 3 are 0.0029, 0.0058, 0.0066, respectively.

45 / 47 Numerical Study
Results from Simulation Study II
100 * RMSE / RMSE.Full:
        t = 1   t = 2   t = 3
Full     100     100     100
Naive    133     115     229
PS       102     106     144
RRZ      102     103     118
Opt      101     103     108

46 / 47 Concluding Remarks
We adopted the GLS (GMM) technique and constructed an optimal estimator within a class of unbiased estimators. Under a monotone missing pattern, applying the GLS (GMM) method to estimate µ_X, µ_1, ..., µ_T simultaneously is exactly the same as the proposed sequential approach (estimating µ_X, µ_1, ..., µ_T one by one). The method is directly applicable when the baseline-year sample is selected by a complex probability sampling design. Extensions to non-monotone missing patterns and to time-dependent covariates are important topics for further investigation.

47 / 47 Thank You! Questions? : jkim@iastate.edu