An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data


Jae-Kwang Kim, Iowa State University, June 28. Joint work with Dr. Ming Zhou (when he was a PhD student at ISU).

Reference: Zhou, M. and Kim, J. K. (2012). An efficient method of estimation for longitudinal surveys with monotone missing data. Biometrika, accepted for publication.

Outline
1. Basic setup
2. Propensity score method
3. GLS approach with propensity score method
4. Application to longitudinal missing data
5. Simulation study
6. Conclusion

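Basic Setup

$X$, $Y$: random variables from some distribution.
$\theta$: parameter of interest, defined through $E\{U(\theta; X, Y)\} = 0$.

Examples:
1. $\theta = E(Y)$: $U(\theta; X, Y) = Y - \theta$.
2. $\theta = F_Y^{-1}(1/2)$, the median: $U(\theta; X, Y) = I(Y \le \theta) - 1/2$, so that $E\{U(\theta; X, Y)\} = F_Y(\theta) - 1/2$.
3. $\theta$ is a regression coefficient: $U(\theta; X, Y) = X(Y - X\theta)$.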

Basic Setup

Estimator of $\theta$: solve
$$\hat U_n(\theta) \equiv \sum_{i=1}^n U(\theta; x_i, y_i) = 0$$
to get $\hat\theta_n$, where $(x_i, y_i)$ are IID realizations of $(X, Y)$. Under some regularity conditions, $\hat\theta_n$ converges in probability to $\theta$ and is asymptotically normally distributed as $n \to \infty$.

What if some of the $y_i$ are missing?
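To make the estimating-equation step concrete, here is a minimal Python sketch (standard library only) that solves $\hat U_n(\theta) = 0$ by bisection for the first two examples above, the mean and the median. It assumes the summed estimating function is monotone in $\theta$ and changes sign on the bracket `[lo, hi]`; the helper name `solve_estimating_equation` and the simulated data are illustrative, not from the slides.

```python
import random


def solve_estimating_equation(u, data, lo, hi, tol=1e-10):
    """Find theta with sum_i u(theta, y_i) = 0 by bisection.

    Assumes the sum is non-increasing in theta and changes sign on [lo, hi].
    """
    def total(theta):
        return sum(u(theta, y) for y in data)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if total(mid) > 0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
    return 0.5 * (lo + hi)


def u_mean(theta, y):
    return y - theta  # U = Y - theta, so theta = E(Y)


def u_median(theta, y):
    # -(I(Y <= theta) - 1/2): sign flipped so the sum is non-increasing in theta
    return 0.5 - (1.0 if y <= theta else 0.0)


random.seed(1)
ys = [random.gauss(2.0, 1.0) for _ in range(101)]  # odd n: unique sample median
theta_mean = solve_estimating_equation(u_mean, ys, -100.0, 100.0)
theta_median = solve_estimating_equation(u_median, ys, -100.0, 100.0)
print(round(theta_mean, 6), round(theta_median, 6))
```

For the mean, the root is the sample mean; for the median, the summed function is a step function, so bisection returns the middle order statistic. A regression coefficient (example 3) would more typically be solved in closed form or by Newton's method.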

Basic Setup

$x_1, \ldots, x_n$ are fully observed, but some of the $y_i$ are not observed. Let
$$r_i = \begin{cases} 1 & \text{if } y_i \text{ is observed,} \\ 0 & \text{if } y_i \text{ is missing.} \end{cases}$$
Note that $r_i$ is also a random variable, whose probability distribution is generally unknown.

Basic Setup: Complete-Case (CC) method

Solve
$$\sum_{i=1}^n r_i U(\theta; x_i, y_i) = 0.$$
The resulting estimator is biased in general unless $\Pr(r = 1 \mid X, Y)$ does not depend on $(X, Y)$, that is, unless the set of respondents is a simple random sample from the original data.

Basic Setup: Weighted Complete-Case (WCC) method

Solve
$$\hat U_W(\theta) \equiv \sum_{i=1}^n r_i w_i U(\theta; x_i, y_i) = 0$$
for some weights $w_i$. The response probability $\Pr(r_i = 1 \mid x_i, y_i)$ is often called the propensity score, and the $w_i$ are called propensity weights. The choice
$$w_i = \frac{1}{\Pr(r_i = 1 \mid x_i, y_i)}$$
makes the resulting estimator consistent. This requires some assumption about $\Pr(r_i = 1 \mid x_i, y_i)$.

Basic Setup: Justification for using $w_i = 1/\Pr(r_i = 1 \mid x_i, y_i)$

Note that
$$E\{\hat U_W(\theta) \mid x_1, \ldots, x_n, y_1, \ldots, y_n\} = \hat U_n(\theta),$$
where the expectation is taken with respect to $r$. Thus, the probability limit of the solution to $\hat U_W(\theta) = 0$ is equal to the probability limit of the solution to $\hat U_n(\theta) = 0$. No distributional assumptions are made about $(X, Y)$.
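The unbiasedness argument can be checked exactly on a toy example. The sketch below enumerates all $2^n$ response patterns for $n = 5$, assuming the $r_i$ are independent Bernoulli($\pi_i$) given the data, and compares $E\{\hat U_W(\theta) \mid \text{data}\}$ with $\hat U_n(\theta)$ for $U = y - \theta$. The data values and response probabilities are made up for illustration; the unweighted (complete-case) version is included to show that it does not recover $\hat U_n(\theta)$.

```python
from itertools import product


def expected_sum(u_vals, pi, weighted=True):
    """Exact E_r{ sum_i r_i w_i U_i } over all 2^n response patterns.

    Assumes r_i ~ Bernoulli(pi_i) independently given the data (x_i, y_i);
    w_i = 1/pi_i if weighted, else w_i = 1 (complete-case).
    """
    total = 0.0
    for r in product((0, 1), repeat=len(u_vals)):
        prob = 1.0
        for ri, p in zip(r, pi):
            prob *= p if ri else 1.0 - p  # probability of this response pattern
        total += prob * sum(ri * (1.0 / p if weighted else 1.0) * u
                            for ri, p, u in zip(r, pi, u_vals))
    return total


y = [1.2, -0.4, 3.1, 0.7, 2.2]
theta = 1.0
u_vals = [yi - theta for yi in y]  # U(theta; y_i) = y_i - theta
pi = [0.9, 0.5, 0.3, 0.8, 0.6]     # hypothetical response probabilities

u_full = sum(u_vals)                             # U_n(theta) from the full data
u_w = expected_sum(u_vals, pi)                   # E{U_W(theta) | data}
u_cc = expected_sum(u_vals, pi, weighted=False)  # E{CC estimating function | data}
print(round(u_full, 6), round(u_w, 6), round(u_cc, 6))
```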

Propensity score method: Idea

For simplicity, assume that $\Pr(r_i = 1 \mid x_i, y_i)$ takes a parametric form:
$$\Pr(r_i = 1 \mid x_i, y_i) = \pi(x_i, y_i; \phi^*)$$
for some unknown $\phi^*$, where the functional form of $\pi(\cdot)$ is known. For example,
$$\pi(x, y; \phi^*) = \frac{\exp(\phi_0^* + \phi_1^* x + \phi_2^* y)}{1 + \exp(\phi_0^* + \phi_1^* x + \phi_2^* y)}.$$
Propensity score approach to missing data: obtain $\hat\theta_{PS}$, which solves
$$\hat U_{PS}(\theta) \equiv \sum_{i=1}^n \frac{r_i}{\pi(x_i, y_i; \hat\phi)} U(\theta; x_i, y_i) = 0$$
for some $\hat\phi$ that converges to $\phi^*$ in probability.

Propensity score method: Issues

Identifiability: the model parameters may not be fully identifiable from the observed sample. We may assume
$$\Pr(r_i = 1 \mid x_i, y_i) = \Pr(r_i = 1 \mid x_i).$$
This condition is often called MAR (missing at random). For longitudinal data with a monotone missing pattern, the MAR condition means
$$\Pr(r_{i,t} = 1 \mid x_i, y_{i1}, \ldots, y_{it}) = \Pr(r_{i,t} = 1 \mid x_i, y_{i1}, \ldots, y_{i,t-1}).$$
That is, the response probability at time $t$ may depend on the values of $y$ observed up to time $t - 1$, but not on $y_{it}$ itself.

Propensity score method: Issues (cont'd)

Estimation of $\phi$: the maximum likelihood method solves
$$S(\phi) \equiv \sum_{i=1}^n \{r_i - \pi_i(\phi)\} q_i(\phi) = 0,$$
where $q_i(\phi) = \partial\,\mathrm{logit}\{\pi_i(\phi)\}/\partial\phi$. The maximum likelihood method does not always lead to efficient estimation of $\theta$ (see Example 1 next).

Inference using $\hat\theta_{PS}$: note that $\hat\theta_{PS} = \hat\theta_{PS}(\hat\phi)$. We need to incorporate the sampling variability of $\hat\phi$ in making inference about $\theta$ using $\hat\theta_{PS}$.

Propensity score method: Example 1

Response model:
$$\pi_i(\phi^*) = \frac{\exp(\phi_0^* + \phi_1^* x_i)}{1 + \exp(\phi_0^* + \phi_1^* x_i)}.$$
Parameter of interest: $\theta = E(Y)$.
PS estimator of $\theta$: solve jointly
$$U(\theta, \phi) = \sum_{i=1}^n r_i \{\pi_i(\phi)\}^{-1}(y_i - \theta) = 0,$$
$$S(\phi) = \sum_{i=1}^n \{r_i - \pi_i(\phi)\}(1, x_i)^\top = (0, 0)^\top.$$
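A minimal sketch of Example 1, assuming a hypothetical data-generating model ($x \sim N(0,1)$, $y = 1 + x + e$, true response logit $0.5 + x$). It solves $S(\phi) = 0$ by Newton-Raphson and then solves $U(\theta, \hat\phi) = 0$, which gives a ratio form. The helper `fit_logistic_1d` is illustrative; any logistic-regression routine returning the MLE would serve. The sketch also computes the PS estimator of $\theta_x = E(X)$, which generally differs from $\bar x_n$.

```python
import math
import random


def expit(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def fit_logistic_1d(x, r, iters=100, tol=1e-12):
    """Newton-Raphson solving S(phi) = sum_i {r_i - pi_i(phi)} (1, x_i)' = 0."""
    f0 = f1 = 0.0
    for _ in range(iters):
        p = [expit(f0 + f1 * xi) for xi in x]
        s0 = sum(ri - pi for ri, pi in zip(r, p))
        s1 = sum((ri - pi) * xi for ri, pi, xi in zip(r, p, x))
        w = [pi * (1.0 - pi) for pi in p]  # Fisher information weights
        a, b = sum(w), sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        d0, d1 = (c * s0 - b * s1) / det, (a * s1 - b * s0) / det  # 2x2 inverse
        f0, f1 = f0 + d0, f1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return f0, f1


random.seed(2012)
n = 500
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + xi + random.gauss(0.0, 1.0) for xi in x]
r = [1 if random.random() < expit(0.5 + xi) else 0 for xi in x]  # MAR given x

phi0, phi1 = fit_logistic_1d(x, r)
pi_hat = [expit(phi0 + phi1 * xi) for xi in x]

# Root of U(theta, phi_hat) = sum_i r_i (y_i - theta) / pi_i = 0 (a ratio form).
wsum = sum(ri / p for ri, p in zip(r, pi_hat))
theta_y_ps = sum(ri * yi / p for ri, yi, p in zip(r, y, pi_hat)) / wsum
theta_x_ps = sum(ri * xi / p for ri, xi, p in zip(r, x, pi_hat)) / wsum
xbar = sum(x) / n
print(round(theta_y_ps, 4), round(theta_x_ps, 4), round(xbar, 4))
```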

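Propensity score method: Example 1 (cont'd)

Taylor linearization (with $U$ standardized by $n^{-1}$, so that $\partial U/\partial\theta \cong -1$):
$$\hat\theta_{PS}(\hat\phi) \cong \hat\theta_{PS}(\phi^*) - E\left(\frac{\partial U}{\partial\phi^\top}\right)\left\{E\left(\frac{\partial S}{\partial\phi^\top}\right)\right\}^{-1}S(\phi^*) = \hat\theta_{PS}(\phi^*) - \mathrm{Cov}(U, S)\{V(S)\}^{-1}S(\phi^*),$$
by the property of zero-mean functions: if $E(U) = 0$, then $E(\partial U/\partial\phi^\top) = -\mathrm{Cov}(U, S)$, and in particular $E(\partial S/\partial\phi^\top) = -V(S)$. So we have
$$V\{\hat\theta_{PS}(\hat\phi)\} \cong V\{\hat\theta_{PS}(\phi^*) \mid S\} \le V\{\hat\theta_{PS}(\phi^*)\},$$
where
$$V(\hat\theta \mid S) = V(\hat\theta) - \mathrm{Cov}(\hat\theta, S)\{V(S)\}^{-1}\mathrm{Cov}(S, \hat\theta).$$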
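The variance reduction from estimating $\phi$ can be seen in a sample-based sketch: regress the linearized contributions $d_i = r_i(y_i - \hat\theta)/\hat\pi_i$ on the score contributions $S_i = (r_i - \hat\pi_i)(1, x_i)^\top$ and use the residual sum of squares, the sample analogue of $V(\hat\theta) - \mathrm{Cov}(\hat\theta, S)\{V(S)\}^{-1}\mathrm{Cov}(S, \hat\theta)$. This is a common linearization device, shown under the same hypothetical data model as the Example 1 sketch, and not necessarily the authors' exact variance estimator.

```python
import math
import random


def expit(u):  # numerically stable logistic function
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def fit_logistic_1d(x, r, iters=100, tol=1e-12):
    """Newton-Raphson solving sum_i {r_i - expit(phi0 + phi1 x_i)} (1, x_i)' = 0."""
    f0 = f1 = 0.0
    for _ in range(iters):
        p = [expit(f0 + f1 * xi) for xi in x]
        s0 = sum(ri - pi for ri, pi in zip(r, p))
        s1 = sum((ri - pi) * xi for ri, pi, xi in zip(r, p, x))
        w = [pi * (1.0 - pi) for pi in p]
        a, b = sum(w), sum(wi * xi for wi, xi in zip(w, x))
        c = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * c - b * b
        d0, d1 = (c * s0 - b * s1) / det, (a * s1 - b * s0) / det
        f0, f1 = f0 + d0, f1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return f0, f1


random.seed(2012)
n = 500
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + xi + random.gauss(0.0, 1.0) for xi in x]
r = [1 if random.random() < expit(0.5 + xi) else 0 for xi in x]
phi0, phi1 = fit_logistic_1d(x, r)
pi_hat = [expit(phi0 + phi1 * xi) for xi in x]
theta = (sum(ri * yi / p for ri, yi, p in zip(r, y, pi_hat))
         / sum(ri / p for ri, p in zip(r, pi_hat)))

d = [ri * (yi - theta) / p for ri, yi, p in zip(r, y, pi_hat)]  # U_i(theta, phi_hat)
s0 = [ri - p for ri, p in zip(r, pi_hat)]                       # S_i = (r_i - pi_i)(1, x_i)'
s1 = [(ri - p) * xi for ri, p, xi in zip(r, pi_hat, x)]

# gamma = {sum S_i S_i'}^{-1} sum S_i d_i, i.e. V(S)^{-1} Cov(S, U) in sample form
a, b = sum(u * u for u in s0), sum(u * v for u, v in zip(s0, s1))
c = sum(v * v for v in s1)
e0, e1 = sum(u * di for u, di in zip(s0, d)), sum(v * di for v, di in zip(s1, d))
det = a * c - b * b
g0, g1 = (c * e0 - b * e1) / det, (a * e1 - b * e0) / det

v_known = sum(di * di for di in d) / n ** 2                # treats phi as known
resid = [di - g0 * u - g1 * v for di, u, v in zip(d, s0, s1)]
v_hat = sum(e * e for e in resid) / n ** 2                 # accounts for estimating phi
print(math.sqrt(v_known), math.sqrt(v_hat))
```

Because the residual sum of squares of a least-squares fit can never exceed the total sum of squares, `v_hat <= v_known` always holds here, mirroring the inequality on the slide.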

GLS approach with propensity score method: Motivation

The propensity score method is used to reduce the bias, rather than to reduce the variance. In the previous example, the PS estimator for $\theta_x = E(X)$ is
$$\hat\theta_{x,PS} = \frac{\sum_{i=1}^n r_i\hat\pi_i^{-1}x_i}{\sum_{i=1}^n r_i\hat\pi_i^{-1}},$$
where $\hat\pi_i = \pi_i(\hat\phi)$. Note that $\hat\theta_{x,PS}$ is not necessarily equal to $\bar x_n = n^{-1}\sum_{i=1}^n x_i$. How can we incorporate the extra information in $\bar x_n$?

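GLS approach with propensity score method: GLS (or GMM) approach

Let $\theta = (\theta_x, \theta_y)$. We have three estimators for two parameters. Find $\theta$ that minimizes
$$Q_{PS}(\theta) = \begin{pmatrix}\bar x_n - \theta_x \\ \hat\theta_{x,PS} - \theta_x \\ \hat\theta_{y,PS} - \theta_y\end{pmatrix}^\top \left\{\hat V\begin{pmatrix}\bar x_n \\ \hat\theta_{x,PS} \\ \hat\theta_{y,PS}\end{pmatrix}\right\}^{-1}\begin{pmatrix}\bar x_n - \theta_x \\ \hat\theta_{x,PS} - \theta_x \\ \hat\theta_{y,PS} - \theta_y\end{pmatrix},$$
where $\hat\theta_{PS} = \hat\theta_{PS}(\hat\phi)$. Computing $\hat V$ is somewhat cumbersome, because it must account for the estimation of $\phi$.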

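GLS approach with propensity score method: Alternative GLS (or GMM) approach

Find $(\theta, \phi)$ that minimizes
$$\begin{pmatrix}\bar x_n - \theta_x \\ \hat\theta_{x,PS}(\phi) - \theta_x \\ \hat\theta_{y,PS}(\phi) - \theta_y \\ S(\phi)\end{pmatrix}^\top \left\{\hat V\begin{pmatrix}\bar x_n \\ \hat\theta_{x,PS}(\phi) \\ \hat\theta_{y,PS}(\phi) \\ S(\phi)\end{pmatrix}\right\}^{-1}\begin{pmatrix}\bar x_n - \theta_x \\ \hat\theta_{x,PS}(\phi) - \theta_x \\ \hat\theta_{y,PS}(\phi) - \theta_y \\ S(\phi)\end{pmatrix}.$$
Computing $\hat V$ is easier here, since we can treat $\phi$ as if it were known. Let $Q^*(\theta, \phi)$ be the above objective function. It can be shown that $Q^*(\theta, \hat\phi) = Q_{PS}(\theta)$, so minimizing $Q^*(\theta, \hat\phi)$ is equivalent to minimizing $Q_{PS}(\theta)$.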

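GLS approach with propensity score method: Justification for the equivalence

Let $\hat U_{PS}(\theta, \phi)$ denote the vector of the first three components above. We may write
$$Q^*(\theta, \phi) = \begin{pmatrix}\hat U_{PS}(\theta, \phi) \\ S(\phi)\end{pmatrix}^\top\begin{pmatrix}V_{11} & V_{12} \\ V_{21} & V_{22}\end{pmatrix}^{-1}\begin{pmatrix}\hat U_{PS}(\theta, \phi) \\ S(\phi)\end{pmatrix} = Q_1(\theta \mid \phi) + Q_2(\phi),$$
where
$$Q_1(\theta \mid \phi) = \left(\hat U_{PS} - V_{12}V_{22}^{-1}S\right)^\top\left\{V\left(\hat U_{PS} - V_{12}V_{22}^{-1}S\right)\right\}^{-1}\left(\hat U_{PS} - V_{12}V_{22}^{-1}S\right),$$
$$Q_2(\phi) = S(\phi)^\top\{\hat V(S)\}^{-1}S(\phi).$$
For the MLE $\hat\phi$, we have $Q_2(\hat\phi) = 0$ and $Q_1(\theta \mid \hat\phi) = Q_{PS}(\theta)$.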
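The decomposition $Q^* = Q_1 + Q_2$ is the usual block-inverse (Schur complement) identity. The sketch below checks it in exact rational arithmetic with scalar blocks for brevity; `v11`, `v12`, `v22` and the values standing in for $\hat U_{PS}$ and $S$ are arbitrary illustrative numbers, not taken from the slides. Setting $S = 0$, as happens at the MLE $\hat\phi$, makes $Q_2$ vanish.

```python
from fractions import Fraction as F


def q_joint(u, s, v11, v12, v22):
    """(u, s) V^{-1} (u, s)' for V = [[v11, v12], [v12, v22]] with scalar blocks."""
    det = v11 * v22 - v12 * v12
    return (u * u * v22 - 2 * u * s * v12 + s * s * v11) / det


def q_split(u, s, v11, v12, v22):
    """Return (Q1, Q2): Q1 uses u residualized on s, Q2 is the quadratic form in s."""
    resid = u - v12 / v22 * s                     # U - V12 V22^{-1} S
    q1 = resid * resid / (v11 - v12 * v12 / v22)  # V(resid) = V11 - V12 V22^{-1} V21
    q2 = s * s / v22                              # S' V22^{-1} S
    return q1, q2


v11, v12, v22 = F(4), F(1), F(2)  # illustrative positive-definite V
u, s = F(3, 2), F(-1, 3)          # illustrative values of U_PS and S
q1, q2 = q_split(u, s, v11, v12, v22)
print(q_joint(u, s, v11, v12, v22), q1 + q2)
```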

GLS approach with propensity score method: Back to Example 1

Response model:
$$\pi_i(\phi^*) = \frac{\exp(\phi_0^* + \phi_1^* x_i)}{1 + \exp(\phi_0^* + \phi_1^* x_i)}.$$
Three direct PS estimators of $(1, \theta_x, \theta_y)$:
$$(\hat\theta_{1,PS}, \hat\theta_{x,PS}, \hat\theta_{y,PS}) = n^{-1}\sum_{i=1}^n r_i\hat\pi_i^{-1}(1, x_i, y_i).$$
$\bar x_n = n^{-1}\sum_{i=1}^n x_i$ is also available. What is the optimal estimator of $\theta_y$?

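GLS approach with propensity score method: Example 1 (cont'd)

Minimize
$$\begin{pmatrix}\bar x_n - \theta_x \\ \hat\theta_{1,PS}(\phi) - 1 \\ \hat\theta_{x,PS}(\phi) - \theta_x \\ \hat\theta_{y,PS}(\phi) - \theta_y \\ S(\phi)\end{pmatrix}^\top\left\{\hat V\begin{pmatrix}\bar x_n \\ \hat\theta_{1,PS}(\phi) \\ \hat\theta_{x,PS}(\phi) \\ \hat\theta_{y,PS}(\phi) \\ S(\phi)\end{pmatrix}\right\}^{-1}\begin{pmatrix}\bar x_n - \theta_x \\ \hat\theta_{1,PS}(\phi) - 1 \\ \hat\theta_{x,PS}(\phi) - \theta_x \\ \hat\theta_{y,PS}(\phi) - \theta_y \\ S(\phi)\end{pmatrix}$$
with respect to $(\theta_x, \theta_y, \phi)$, where
$$S(\phi) = \sum_{i=1}^n\left\{\frac{r_i}{\pi_i(\phi)} - 1\right\}h_i(\phi) = 0, \qquad h_i(\phi) = \pi_i(\phi)(1, x_i)^\top.$$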

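GLS approach with propensity score method: Example 1 (cont'd)

Equivalently, since the optimal estimator of $\theta_x$ is $\bar x_n$, minimize
$$\begin{pmatrix}\hat\theta_{y,PS}(\phi) - \theta_y \\ \hat\theta_{1,PS}(\phi) - 1 \\ \hat\theta_{x,PS}(\phi) - \bar x_n \\ S(\phi)\end{pmatrix}^\top\left\{\hat V\begin{pmatrix}\hat\theta_{y,PS}(\phi) \\ \hat\theta_{1,PS}(\phi) \\ \hat\theta_{x,PS}(\phi) - \bar x_n \\ S(\phi)\end{pmatrix}\right\}^{-1}\begin{pmatrix}\hat\theta_{y,PS}(\phi) - \theta_y \\ \hat\theta_{1,PS}(\phi) - 1 \\ \hat\theta_{x,PS}(\phi) - \bar x_n \\ S(\phi)\end{pmatrix}$$
with respect to $(\theta_y, \phi)$.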

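GLS approach with propensity score method: Example 1 (cont'd)

The solution can be written as
$$\hat\theta_{y,opt} = \hat\theta_{y,PS} + \left(1 - \hat\theta_{1,PS}\right)\hat B_0 + \left(\bar x_n - \hat\theta_{x,PS}\right)\hat B_1 + \left\{0 - n^{-1}S(\hat\phi)\right\}^\top\hat C,$$
where
$$\begin{pmatrix}\hat B_0 \\ \hat B_1 \\ \hat C\end{pmatrix} = \left\{\sum_{i=1}^n r_i b_i\begin{pmatrix}1 \\ x_i \\ h_i\end{pmatrix}\begin{pmatrix}1 \\ x_i \\ h_i\end{pmatrix}^\top\right\}^{-1}\sum_{i=1}^n r_i b_i\begin{pmatrix}1 \\ x_i \\ h_i\end{pmatrix}y_i$$
and $b_i = \hat\pi_i^{-2}(1 - \hat\pi_i)$. Note that the last term, $\{0 - n^{-1}S(\hat\phi)\}^\top\hat C$, is equal to zero and so does not contribute to the point estimate, but it is used for variance estimation.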

GLS approach with propensity score method: Example 1 (cont'd)

That is, for variance estimation, we simply express
$$\hat\theta_{y,opt} = n^{-1}\sum_{i=1}^n\hat\eta_i,$$
where
$$\hat\eta_i = \hat B_0 + x_i\hat B_1 + h_i^\top\hat C + \frac{r_i}{\hat\pi_i}\left(y_i - \hat B_0 - x_i\hat B_1 - h_i^\top\hat C\right),$$
and apply the standard variance formula to the $\hat\eta_i$. This idea can be extended to the survey sampling setup.
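A minimal sketch of $\hat\theta_{y,opt}$ for Example 1, computed both from the regression form and from the linearized $\hat\eta_i$. It assumes hypothetical data of the same kind as before (logistic response depending on $x$ only), uses $z_i = (1, x_i, h_i^\top)^\top$ with $h_i = \hat\pi_i(1, x_i)^\top$, and does the linear algebra in plain Python; the helper names `solve` and `fit_logistic` are illustrative. The variance estimate applies the usual formula for a sample mean to the $\hat\eta_i$.

```python
import math
import random


def expit(u):  # numerically stable 1 / (1 + exp(-u))
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def solve(a, b):  # Gaussian elimination with partial pivoting for a @ x = b
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda k: abs(m[k][c]))
        m[c], m[piv] = m[piv], m[c]
        for k in range(c + 1, n):
            f = m[k][c] / m[c][c]
            for j in range(c, n + 1):
                m[k][j] -= f * m[c][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][j] * x[j] for j in range(k + 1, n))) / m[k][k]
    return x


def fit_logistic(z, r, iters=100, tol=1e-12):  # Newton-Raphson logistic MLE
    d, phi = len(z[0]), [0.0] * len(z[0])
    for _ in range(iters):
        p = [expit(sum(a * b for a, b in zip(zi, phi))) for zi in z]
        score = [sum((ri - pi) * zi[k] for zi, ri, pi in zip(z, r, p)) for k in range(d)]
        info = [[sum(pi * (1 - pi) * zi[j] * zi[k] for zi, pi in zip(z, p))
                 for k in range(d)] for j in range(d)]
        step = solve(info, score)
        phi = [a + s for a, s in zip(phi, step)]
        if max(abs(s) for s in step) < tol:
            break
    return phi


random.seed(7)                                    # hypothetical data, MAR given x
n = 400
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + xi + random.gauss(0.0, 1.0) for xi in x]
r = [1 if random.random() < expit(0.5 + xi) else 0 for xi in x]
phi = fit_logistic([(1.0, xi) for xi in x], r)
pi = [expit(phi[0] + phi[1] * xi) for xi in x]
z = [(1.0, xi, p, p * xi) for xi, p in zip(x, pi)]  # z_i = (1, x_i, h_i')'
b = [(1.0 - p) / p ** 2 for p in pi]                # b_i = pi_i^{-2}(1 - pi_i)

# (B0, B1, C')' = {sum r_i b_i z_i z_i'}^{-1} sum r_i b_i z_i y_i
M = [[sum(ri * bi * zi[j] * zi[k] for ri, bi, zi in zip(r, b, z)) for k in range(4)]
     for j in range(4)]
v = [sum(ri * bi * zi[j] * yi for ri, bi, zi, yi in zip(r, b, z, y)) for j in range(4)]
coef = solve(M, v)

zbar_n = [sum(zi[j] for zi in z) / n for j in range(4)]                         # targets
zbar_ps = [sum(ri * zi[j] / p for ri, zi, p in zip(r, z, pi)) / n for j in range(4)]
theta_y_ps = sum(ri * yi / p for ri, yi, p in zip(r, y, pi)) / n
# theta_y,PS + (1 - theta_1,PS) B0 + (xbar_n - theta_x,PS) B1 + {0 - S(phi_hat)/n}' C
theta_opt = theta_y_ps + sum((a - c) * cf for a, c, cf in zip(zbar_n, zbar_ps, coef))

fit = [sum(zj * cf for zj, cf in zip(zi, coef)) for zi in z]  # B0 + x_i B1 + h_i'C
eta = [fi + ri / p * (yi - fi) for fi, ri, p, yi in zip(fit, r, pi, y)]
eta_bar = sum(eta) / n
v_opt = sum((e - eta_bar) ** 2 for e in eta) / (n * (n - 1))
print(round(theta_opt, 4), round(eta_bar, 4), round(math.sqrt(v_opt), 4))
```

Because $\hat\phi$ is the MLE, the last two components of `zbar_n - zbar_ps` (that is, $-n^{-1}S(\hat\phi)$) are numerically zero, so $\hat C$ changes the $\hat\eta_i$ but not the point estimate.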

GLS approach with propensity score method: Example 1 (cont'd)

The optimal estimator is linear in $y$. That is, we can write
$$\hat\theta_{y,opt} = \frac{1}{n}\sum_{i=1}^n\frac{r_i g_i}{\hat\pi_i}y_i = \sum_{r_i = 1}w_i y_i, \qquad w_i = \frac{g_i}{n\hat\pi_i},$$
where $g_i$ satisfies the calibration condition
$$\sum_{i=1}^n\frac{r_i}{\hat\pi_i}g_i(1, x_i, h_i^\top) = \sum_{i=1}^n(1, x_i, h_i^\top).$$
Thus, $\hat\theta_{y,opt}$ is doubly robust under $E_\zeta(y \mid x) = \beta_0 + \beta_1 x$, in the sense that it is consistent when either the response model or the superpopulation model holds.
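The linear form and its calibration property can be checked numerically. Writing $\hat M = \sum_i r_i b_i z_i z_i^\top$, $\bar z_n = n^{-1}\sum_i z_i$ and $\bar z_{PS} = n^{-1}\sum_i r_i z_i/\hat\pi_i$, the weights $w_i = r_i\{(n\hat\pi_i)^{-1} + b_i z_i^\top\hat M^{-1}(\bar z_n - \bar z_{PS})\}$ reproduce the regression form of $\hat\theta_{y,opt}$ and satisfy $\sum_{r_i = 1}w_i z_i = \bar z_n$, with $g_i = n\hat\pi_i w_i$. This weight formula is derived here from the regression form rather than quoted from the slides, and the data are hypothetical.

```python
import math
import random


def expit(u):  # numerically stable 1 / (1 + exp(-u))
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def solve(a, b):  # Gaussian elimination with partial pivoting for a @ x = b
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda k: abs(m[k][c]))
        m[c], m[piv] = m[piv], m[c]
        for k in range(c + 1, n):
            f = m[k][c] / m[c][c]
            for j in range(c, n + 1):
                m[k][j] -= f * m[c][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][j] * x[j] for j in range(k + 1, n))) / m[k][k]
    return x


def fit_logistic(z, r, iters=100, tol=1e-12):  # Newton-Raphson logistic MLE
    d, phi = len(z[0]), [0.0] * len(z[0])
    for _ in range(iters):
        p = [expit(sum(a * b for a, b in zip(zi, phi))) for zi in z]
        score = [sum((ri - pi) * zi[k] for zi, ri, pi in zip(z, r, p)) for k in range(d)]
        info = [[sum(pi * (1 - pi) * zi[j] * zi[k] for zi, pi in zip(z, p))
                 for k in range(d)] for j in range(d)]
        step = solve(info, score)
        phi = [a + s for a, s in zip(phi, step)]
        if max(abs(s) for s in step) < tol:
            break
    return phi


random.seed(7)                                    # hypothetical data, MAR given x
n = 400
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + xi + random.gauss(0.0, 1.0) for xi in x]
r = [1 if random.random() < expit(0.5 + xi) else 0 for xi in x]
phi = fit_logistic([(1.0, xi) for xi in x], r)
pi = [expit(phi[0] + phi[1] * xi) for xi in x]
z = [(1.0, xi, p, p * xi) for xi, p in zip(x, pi)]  # z_i = (1, x_i, h_i')'
b = [(1.0 - p) / p ** 2 for p in pi]                # b_i = pi_i^{-2}(1 - pi_i)

M = [[sum(ri * bi * zi[j] * zi[k] for ri, bi, zi in zip(r, b, z)) for k in range(4)]
     for j in range(4)]
zbar_n = [sum(zi[j] for zi in z) / n for j in range(4)]
zbar_ps = [sum(ri * zi[j] / p for ri, zi, p in zip(r, z, pi)) / n for j in range(4)]
diff = [a - c for a, c in zip(zbar_n, zbar_ps)]

lam = solve(M, diff)                                # M^{-1}(zbar_n - zbar_ps)
w = [ri * (1.0 / (n * p) + bi * sum(zj * lj for zj, lj in zip(zi, lam)))
     for ri, p, bi, zi in zip(r, pi, b, z)]         # zero for nonrespondents
g = [n * p * wi for p, wi in zip(pi, w)]            # g_i in n^{-1} sum r_i g_i y_i / pi_i
calib = [sum(wi * zi[j] for wi, zi in zip(w, z)) for j in range(4)]
theta_opt = sum(wi * yi for wi, yi in zip(w, y))

# Regression form for comparison: theta_y,PS + (zbar_n - zbar_ps)' (B0, B1, C')'
v = [sum(ri * bi * zi[j] * yi for ri, bi, zi, yi in zip(r, b, z, y)) for j in range(4)]
coef = solve(M, v)
theta_reg = (sum(ri * yi / p for ri, yi, p in zip(r, y, pi)) / n
             + sum(dj * cj for dj, cj in zip(diff, coef)))
print(round(theta_opt, 6), round(theta_reg, 6))
```

Since the weights calibrate to $(1, \bar x_n)$, a linear superpopulation model for $y$ is fitted exactly by the weighted sum, which is the intuition behind the double robustness stated above.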

Application to longitudinal missing data: Basic Setup

$X_i$ is always observed and remains unchanged for $t = 0, 1, \ldots, T$.
$Y_{it}$ is the response for subject $i$ at time $t$, and $r_{it}$ is the response indicator for subject $i$ at time $t$.
Assuming no missingness in the baseline year, $Y_0$ can be absorbed into $X$.
Monotone missing pattern: $r_{it} = 0 \Rightarrow r_{i,t+1} = 0$ for $t = 1, \ldots, T - 1$.
$L_{i,t} = (X_i^\top, Y_{i1}, \ldots, Y_{it})^\top$ denotes the measurements up to time $t$.
The parameter of interest is $\mu_t = E\{Y_{it}\}$.

Application to longitudinal missing data: Missing mechanism

Missing completely at random (MCAR): $P(r_{it} = 1 \mid r_{i,t-1} = 1, L_{i,t}) = P(r_{it} = 1 \mid r_{i,t-1} = 1)$.
Covariate-dependent missing (CDM): $P(r_{it} = 1 \mid r_{i,t-1} = 1, L_{i,t}) = P(r_{it} = 1 \mid r_{i,t-1} = 1, X_i)$.
Missing at random (MAR): $P(r_{it} = 1 \mid r_{i,t-1} = 1, L_{i,t}) = P(r_{it} = 1 \mid r_{i,t-1} = 1, L_{i,t-1})$.
Missing not at random (MNAR): missing at random does not hold.

Application to longitudinal missing data: Motivation

Panel attrition is frequently encountered in panel surveys, while classical methods often assume covariate-dependent missingness, which can be unrealistic. We want to develop a PS method under MAR that makes full use of the available information.

Application to longitudinal missing data: Modeling the propensity score

Under MAR, in the longitudinal data case, we consider the conditional probabilities
$$p_{it} := P(r_{it} = 1 \mid r_{i,t-1} = 1, L_{i,t-1}), \qquad t = 1, \ldots, T.$$
Then
$$\pi_{it} = \prod_{j=1}^t p_{ij}.$$
$\pi_t$ can then be modeled by modeling $p_t$ as $p_t(L_{t-1}; \phi_t)$. This kind of modeling is also adopted in Robins et al. (1995).

Application to longitudinal missing data: Score function for longitudinal data

Under parametric models for the $p_t$'s, the partial likelihood for $\phi_1, \ldots, \phi_T$ is
$$L(\phi_1, \ldots, \phi_T) = \prod_{i=1}^n\prod_{t=1}^T\left[p_{it}^{r_{i,t}}(1 - p_{it})^{1 - r_{i,t}}\right]^{r_{i,t-1}},$$
and the corresponding score function is $(S_1(\phi_1), \ldots, S_T(\phi_T))$, where
$$S_t(\phi_t) = \sum_{i=1}^n r_{i,t-1}\{r_{it} - p_{it}(\phi_t)\}q_{it}(\phi_t) = 0,$$
with $q_{it}(\phi_t) = \partial\,\mathrm{logit}\{p_{it}(\phi_t)\}/\partial\phi_t$. Under a logistic regression model $p_t = 1/\{1 + \exp(-\phi_t^\top L_{t-1})\}$, we have $q_{it}(\phi_t) = L_{i,t-1}$.
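Fitting the propensity model wave by wave amounts to $T$ separate logistic regressions, each on the units still responding at $t - 1$, with $q_{it} = L_{i,t-1}$. The sketch below does this for data generated from the Simulation Study I design given later in these slides (with an intercept added to $L_{t-1}$ and $Y_0$ folded into it, both assumptions of the sketch), then forms $\hat\pi_{it} = \prod_{j \le t}\hat p_{ij}$. The helper names `simulate`, `fit_waves`, and `fit_logistic` are illustrative.

```python
import math
import random


def expit(u):  # numerically stable 1 / (1 + exp(-u))
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def solve(a, b):  # Gaussian elimination with partial pivoting for a @ x = b
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda k: abs(m[k][c]))
        m[c], m[piv] = m[piv], m[c]
        for k in range(c + 1, n):
            f = m[k][c] / m[c][c]
            for j in range(c, n + 1):
                m[k][j] -= f * m[c][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][j] * x[j] for j in range(k + 1, n))) / m[k][k]
    return x


def fit_logistic(z, r, iters=100, tol=1e-12):  # solves S_t(phi_t) = 0 by Newton-Raphson
    d, phi = len(z[0]), [0.0] * len(z[0])
    for _ in range(iters):
        p = [expit(sum(a * b for a, b in zip(zi, phi))) for zi in z]
        score = [sum((ri - pi) * zi[k] for zi, ri, pi in zip(z, r, p)) for k in range(d)]
        info = [[sum(pi * (1 - pi) * zi[j] * zi[k] for zi, pi in zip(z, p))
                 for k in range(d)] for j in range(d)]
        step = solve(info, score)
        phi = [a + s for a, s in zip(phi, step)]
        if max(abs(s) for s in step) < tol:
            break
    return phi


def simulate(n, T, seed):
    """Monotone-missing panel following the Simulation Study I design."""
    rng, data = random.Random(seed), []
    for _ in range(n):
        x, e = rng.gauss(1.0, 1.0), rng.gauss(0.0, 1.0)
        y, r = [2 * (x - 1) + e], [1]                      # Y_0 observed for everyone
        for t in range(1, T + 1):
            e = 0.5 * e + rng.gauss(0.0, 1.0)
            y.append(2 * (x - 1) + 2 * y[t - 1] + e)
            p = expit(1 + 2 * x - y[t - 1] / (t + 1))      # depends on L_{t-1} only (MAR)
            r.append(1 if r[-1] and rng.random() < p else 0)  # once missing, always missing
        data.append((x, y, r))
    return data


def fit_waves(data, T):
    """Logistic fit of r_t on L_{t-1} among units with r_{t-1} = 1, for t = 1..T.

    Returns phi_1..phi_T and pi_it = prod_{j<=t} p_ij (None where undefined).
    """
    phis, pi = [], [[1.0] + [None] * T for _ in data]
    for t in range(1, T + 1):
        idx = [i for i, d in enumerate(data) if d[2][t - 1] == 1]
        Lt = [[1.0, data[i][0]] + data[i][1][:t] for i in idx]  # (1, X, Y_0, ..., Y_{t-1})
        phi = fit_logistic(Lt, [data[i][2][t] for i in idx])
        phis.append(phi)
        for i, l in zip(idx, Lt):
            pi[i][t] = pi[i][t - 1] * expit(sum(a * b for a, b in zip(l, phi)))
    return phis, pi


data = simulate(600, 3, seed=3)
phis, pi = fit_waves(data, 3)
print([[round(v, 2) for v in phi] for phi in phis])
```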

Application to longitudinal missing data: Example 2

Assume $T = 3$. Parameters of interest: $\mu_x = E(X)$ and $\mu_t = E(Y_t)$, $t = 1, 2, 3$.
PS estimator of $\mu_p$ at year $t$:
$$\hat\theta_{p,t} = n^{-1}\sum_{i=1}^n r_{it}\hat\pi_{it}^{-1}y_{ip}, \qquad p \le t.$$
Estimator under $t = 0$ (baseline year): $\hat\theta_{x,0} = n^{-1}\sum_{i=1}^n x_i$.
Estimators under $t = 1$: $\hat\theta_{x,1}, \hat\theta_{1,1}$.
Estimators under $t = 2$: $\hat\theta_{x,2}, \hat\theta_{1,2}, \hat\theta_{2,2}$.
Estimators under $t = 3$: $\hat\theta_{x,3}, \hat\theta_{1,3}, \hat\theta_{2,3}, \hat\theta_{3,3}$.
In total, there are $(T + 1)p + T(T + 1)/2$ estimators for $p + T$ parameters, where $p = \dim(x)$.
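The count of available direct PS estimators can be checked by enumerating them: year $t$ supplies the $p$ components for $\mu_x$ plus $\hat\theta_{s,t}$ for $s = 1, \ldots, t$. A small sketch, with hypothetical labels:

```python
def estimator_table(T, p):
    """Direct PS estimators available at each year t = 0, ..., T.

    Year t supplies the p components of mu_x and mu_1, ..., mu_t, each as
    n^{-1} sum_i r_it pi_it^{-1} (value); at t = 0 every pi_i0 = 1.
    The labels are hypothetical.
    """
    return {t: [f"x{k + 1}" for k in range(p)] + [f"Y{s}" for s in range(1, t + 1)]
            for t in range(T + 1)}


T, p = 3, 1
table = estimator_table(T, p)
for t, names in table.items():
    print(f"t = {t}:", names)
n_estimators = sum(len(v) for v in table.values())
print(n_estimators, "estimators for", p + T, "parameters")
```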

Application to longitudinal missing data: GMM for the longitudinal data case

We need to incorporate auxiliary information sequentially. The case $T = 1$ is already covered in Example 1. For $t = 2$, we have auxiliary information about $\mu_x$ from the $t = 0$ sample (i.e., $\bar x_n$) and further auxiliary information about $\mu_1$ from the $t = 1$ sample (i.e., $\hat\theta_{1,opt}$). Thus, the optimal estimator of $\theta_2$ takes the form
$$\hat\theta_{2,opt} = \hat\theta_{2,2} + \left(\bar x_n - \hat\theta_{x,2}\right)^\top\hat B_1 + \left(\hat\theta_{1,opt} - \hat\theta_{1,2}\right)\hat B_2$$
for some $\hat B_1$ and $\hat B_2$.

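Application to longitudinal missing data: GMM for the longitudinal data case (cont'd)

The auxiliary information up to time $t$ can be incorporated through the estimating equation $\tilde E(\xi_{t-1}) = 0$, where $\tilde E(\cdot) = n^{-1}\sum_{i=1}^n(\cdot)_i$ and
$$\xi_{t-1} := \begin{pmatrix}\frac{r_0}{\pi_0}u_1 X \\ \frac{r_1}{\pi_1}u_2\begin{pmatrix}X \\ Y_1\end{pmatrix} \\ \vdots \\ \frac{r_{t-1}}{\pi_{t-1}}u_t\begin{pmatrix}X \\ Y_1 \\ \vdots \\ Y_{t-1}\end{pmatrix}\end{pmatrix} = \begin{pmatrix}\frac{r_0}{\pi_0}u_1 L_0 \\ \frac{r_1}{\pi_1}u_2 L_1 \\ \vdots \\ \frac{r_{t-1}}{\pi_{t-1}}u_t L_{t-1}\end{pmatrix},$$
with $u_t = r_t/p_t - 1$. Note that $E\{\xi_{t-1}\} = 0$ because $E\{u_t \mid L_{t-1}, r_{t-1} = 1\} = 0$.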

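Application to longitudinal missing data: Score function

The score function can be written as $S_t := (S_1^\top, \ldots, S_t^\top)^\top = n\tilde E\{\psi_{t-1}\}$, where
$$\psi_{t-1} = \begin{pmatrix}(r_1 - p_1 r_0)X \\ (r_2 - p_2 r_1)\begin{pmatrix}X \\ Y_1\end{pmatrix} \\ \vdots \\ (r_t - p_t r_{t-1})\begin{pmatrix}X \\ Y_1 \\ \vdots \\ Y_{t-1}\end{pmatrix}\end{pmatrix} = \begin{pmatrix}r_0 u_1 p_1 L_0 \\ r_1 u_2 p_2 L_1 \\ \vdots \\ r_{t-1}u_t p_t L_{t-1}\end{pmatrix}.$$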
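Two facts used in the last two slides can be verified exactly for any $p_t \in (0, 1)$: under monotone missingness ($r_t = 1$ implies $r_{t-1} = 1$) each block of $\psi_{t-1}$ satisfies $r_t - p_t r_{t-1} = r_{t-1}u_t p_t$, and $E\{u_t \mid L_{t-1}, r_{t-1} = 1\} = 0$. The sketch also checks the conditional second moment $E\{u_t^2 \mid L_{t-1}, r_{t-1} = 1\} = 1/p_t - 1$, which appears in variance calculations of this kind. Exact rational arithmetic is used so the checks are equalities, not tolerances; the function names are illustrative.

```python
from fractions import Fraction as F

PAIRS = [(0, 0), (1, 0), (1, 1)]  # allowed (r_{t-1}, r_t) under monotone missingness


def u(r_t, p_t):
    """u_t = r_t / p_t - 1."""
    return F(r_t) / p_t - 1


def check(p_t):
    """Verify the psi identity for every monotone pair, then return the first two
    moments of u_t given r_{t-1} = 1 and L_{t-1} (so that p_t is fixed)."""
    for r_prev, r_t in PAIRS:
        # psi block: (r_t - p_t r_{t-1}) L_{t-1} == r_{t-1} u_t p_t L_{t-1}
        assert r_t - p_t * r_prev == r_prev * u(r_t, p_t) * p_t
    mean = p_t * u(1, p_t) + (1 - p_t) * u(0, p_t)               # should be 0
    second = p_t * u(1, p_t) ** 2 + (1 - p_t) * u(0, p_t) ** 2  # should be 1/p_t - 1
    return mean, second


for k in range(1, 10):
    print(F(k, 10), check(F(k, 10)))
```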

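Application to longitudinal missing data: GMM for the longitudinal data case (cont'd)

This motivates minimizing the quadratic form
$$Q_t = \begin{pmatrix}\tilde E\{r_t Y_t/\hat\pi_t\} - \mu_t \\ \tilde E\{\hat\xi_{t-1}\} - E\{\xi_{t-1}\} \\ \tilde E\{\hat\psi_{t-1}\} - E\{\psi_{t-1}\}\end{pmatrix}^\top\left[\hat V\begin{pmatrix}\tilde E\{r_t Y_t/\pi_t\} \\ \tilde E\{\xi_{t-1}\} \\ \tilde E\{\psi_{t-1}\}\end{pmatrix}\right]^{-1}\begin{pmatrix}\tilde E\{r_t Y_t/\hat\pi_t\} - \mu_t \\ \tilde E\{\hat\xi_{t-1}\} - E\{\xi_{t-1}\} \\ \tilde E\{\hat\psi_{t-1}\} - E\{\psi_{t-1}\}\end{pmatrix}.$$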

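Application to longitudinal missing data: Optimal PS estimator

Theorem 1. Assume a logistic-type response model, for which the score function for $(\phi_1, \ldots, \phi_T)$ is $\tilde E(\psi_{T-1}) = 0$. For each year $t$, the optimal estimator of $\mu_t = E\{Y_t\}$ among the class
$$\hat\mu_{t,B_t,C_t} = \tilde E\{r_t Y_t/\hat\pi_t\} - B_t^\top\tilde E\{\hat\xi_{t-1}\} - C_t^\top\tilde E\{\hat\psi_{t-1}\}$$
is given by $\hat\mu_{t,\hat B_t,\hat C_t}$, where $\hat B_t = (\hat B_{1t}^\top, \ldots, \hat B_{tt}^\top)^\top$ and $\hat C_t = (\hat C_{1t}^\top, \ldots, \hat C_{tt}^\top)^\top$ with
$$\begin{pmatrix}\hat B_{j,t} \\ \hat C_{j,t}\end{pmatrix} = \left[\tilde E\left\{\frac{r_t\hat\pi_j}{\hat\pi_t\hat p_j}\left(\frac{1}{\hat p_j} - 1\right)\begin{pmatrix}\hat\pi_{j-1}^{-1}L_{j-1} \\ \hat p_j L_{j-1}\end{pmatrix}\begin{pmatrix}\hat\pi_{j-1}^{-1}L_{j-1} \\ \hat p_j L_{j-1}\end{pmatrix}^\top\right\}\right]^{-1}\tilde E\left\{\left(\frac{1}{\hat p_j} - 1\right)\begin{pmatrix}\hat\pi_{j-1}^{-1}L_{j-1} \\ \hat p_j L_{j-1}\end{pmatrix}\frac{r_t Y_t}{\hat\pi_t}\right\}.$$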

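Application to longitudinal missing data: Variance estimation

Theorem 2. The estimator $\hat Y_{t,opt}$ (that is, $\hat\mu_{t,\hat B_t,\hat C_t}$ of Theorem 1) is asymptotically equivalent to $\tilde E\{\eta_t\}$, where
$$\eta_t = \frac{r_t y_t}{\pi_t} - \sum_{j=1}^t D_{j,t}^\top r_{j-1}u_j\begin{pmatrix}\pi_{j-1}^{-1}L_{j-1} \\ p_j L_{j-1}\end{pmatrix},$$
with
$$D_{j,t} = \left[E\left\{r_{j-1}u_j^2\begin{pmatrix}\pi_{j-1}^{-1}L_{j-1} \\ p_j L_{j-1}\end{pmatrix}\begin{pmatrix}\pi_{j-1}^{-1}L_{j-1} \\ p_j L_{j-1}\end{pmatrix}^\top\right\}\right]^{-1}E\left\{u_j\begin{pmatrix}\pi_{j-1}^{-1}L_{j-1} \\ p_j L_{j-1}\end{pmatrix}\frac{r_t Y_t}{\pi_t}\right\}.$$
Thus, the variance of $\hat Y_{t,opt}$ can be consistently estimated by
$$(n - 1)^{-1}\tilde E\left[\{\hat\eta_t - \tilde E(\hat\eta_t)\}^2\right] = n^{-1}(n - 1)^{-1}\sum_{i=1}^n\{\hat\eta_{it} - \tilde E(\hat\eta_t)\}^2.$$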
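Putting Theorems 1 and 2 together, the sketch below computes $\hat\mu_{t,opt}$ and its variance estimate for $t = 3$ on data from the Simulation Study I design. It uses $\hat z_{ij} = (\hat\pi_{i,j-1}^{-1}L_{i,j-1}^\top, \hat p_{ij}L_{i,j-1}^\top)^\top$, the sample version of $D_{j,t}$ with weights $r_t\hat\pi_j/(\hat\pi_t\hat p_j)\cdot(1/\hat p_j - 1)$, and the $\hat\eta_{it}$. It is a sketch under stated assumptions (intercept included in $L_{t-1}$, $Y_0$ folded into $L_0$, plain-Python linear algebra), not the authors' code. Under this design $E(Y_t) = 0$ for every $t$, which gives a rough sanity check.

```python
import math
import random

def expit(u):  # numerically stable 1 / (1 + exp(-u))
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def solve(a, b):  # Gaussian elimination with partial pivoting for a @ x = b
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda k: abs(m[k][c]))
        m[c], m[piv] = m[piv], m[c]
        for k in range(c + 1, n):
            f = m[k][c] / m[c][c]
            for j in range(c, n + 1):
                m[k][j] -= f * m[c][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (m[k][n] - sum(m[k][j] * x[j] for j in range(k + 1, n))) / m[k][k]
    return x

def fit_logistic(z, r, iters=100, tol=1e-12):  # Newton-Raphson logistic MLE
    d, phi = len(z[0]), [0.0] * len(z[0])
    for _ in range(iters):
        p = [expit(sum(a * b for a, b in zip(zi, phi))) for zi in z]
        score = [sum((ri - pi) * zi[k] for zi, ri, pi in zip(z, r, p)) for k in range(d)]
        info = [[sum(pi * (1 - pi) * zi[j] * zi[k] for zi, pi in zip(z, p))
                 for k in range(d)] for j in range(d)]
        step = solve(info, score)
        phi = [a + s for a, s in zip(phi, step)]
        if max(abs(s) for s in step) < tol:
            break
    return phi

# Simulation Study I design with T = 3; the baseline Y_0 is folded into L_0.
rng, n, T, rows = random.Random(11), 800, 3, []
for _ in range(n):
    x, e = rng.gauss(1.0, 1.0), rng.gauss(0.0, 1.0)
    y, r = [2 * (x - 1) + e], [1]
    for t in range(1, T + 1):
        e = 0.5 * e + rng.gauss(0.0, 1.0)
        y.append(2 * (x - 1) + 2 * y[t - 1] + e)
        r.append(1 if r[-1] and rng.random() < expit(1 + 2 * x - y[t - 1] / (t + 1)) else 0)
    rows.append((x, y, r))
X, Y, R = zip(*rows)

def L(i, j):  # L_{i,j} = (1, X_i, Y_i0, ..., Y_ij)
    return [1.0, X[i]] + Y[i][:j + 1]

p = [[None] * (T + 1) for _ in range(n)]     # p_ij, defined when r_{i,j-1} = 1
pi = [[1.0] + [None] * T for _ in range(n)]  # pi_ij = prod_{k <= j} p_ik
for j in range(1, T + 1):
    idx = [i for i in range(n) if R[i][j - 1]]
    phi = fit_logistic([L(i, j - 1) for i in idx], [R[i][j] for i in idx])
    for i in idx:
        p[i][j] = expit(sum(a * b for a, b in zip(L(i, j - 1), phi)))
        pi[i][j] = pi[i][j - 1] * p[i][j]

def z_vec(i, j):  # z_ij = (L_{i,j-1}' / pi_{i,j-1}, p_ij L_{i,j-1}')'
    l = L(i, j - 1)
    return [v / pi[i][j - 1] for v in l] + [p[i][j] * v for v in l]

def mu_opt(t):
    """Theorem 1 estimate of mu_t, Theorem 2 variance, and max |E~{psi_hat}|."""
    eta = [Y[i][t] / pi[i][t] if R[i][t] else 0.0 for i in range(n)]  # r_t Y_t / pi_t
    resp, psi_max = [i for i in range(n) if R[i][t]], 0.0
    for j in range(1, t + 1):
        d = 2 * (j + 2)
        A, c = [[0.0] * d for _ in range(d)], [0.0] * d
        for i in resp:  # both E~ terms involve only the time-t respondents
            z, k = z_vec(i, j), 1.0 / p[i][j] - 1.0
            w = pi[i][j] / (pi[i][t] * p[i][j]) * k
            for a in range(d):
                c[a] += k * z[a] * Y[i][t] / pi[i][t] / n
                for b in range(d):
                    A[a][b] += w * z[a] * z[b] / n
        D = solve(A, c)   # stacked (B_jt', C_jt')'
        zbar = [0.0] * d  # E~{r_{j-1} u_j z_j} = (E~{xi_j}', E~{psi_j}')'
        for i in range(n):
            if R[i][j - 1]:
                z, u = z_vec(i, j), R[i][j] / p[i][j] - 1.0
                eta[i] -= u * sum(a * b for a, b in zip(D, z))
                zbar = [s + u * v / n for s, v in zip(zbar, z)]
        psi_max = max(psi_max, max(abs(v) for v in zbar[d // 2:]))
    mu = sum(eta) / n
    return mu, sum((e - mu) ** 2 for e in eta) / (n * (n - 1)), psi_max

mu3, var3, psi3 = mu_opt(3)
print(round(mu3, 3), round(math.sqrt(var3), 3))
```

`psi3` reports $\max|\tilde E\{\hat\psi\}|$, which is numerically zero at the MLE, so the $\hat C$ part changes the $\hat\eta_{it}$ (and hence the variance estimate) but not the point estimate.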

Application to longitudinal missing data: Properties of our optimal estimator

$\hat Y_{t,opt}$ is asymptotically normal with mean $\mu_t$ and variance equal to the lower bound of the asymptotic variance within the family $\hat Y_{t,PSA} - B^\top\tilde E\{\hat\xi_{t-1}\}$.
It has a computational advantage because $r_{i-1}u_i L_{i-1}$ and $r_{j-1}u_j L_{j-1}$ are orthogonal (uncorrelated) for $i \ne j$.
Variance estimation is also very convenient, as implied by Theorem 2.

Numerical Study: Robins et al. (1995) estimator

Robins et al. (1995) proposed a class of estimators of $\mu_t$ for longitudinal data with monotone missingness, incorporating a regression model $E(Y_{it} \mid X_i) = m(X_i; \beta_t)$. When $m(X_i; \beta_t) = X_i^\top\beta_t$, the weighted estimating equation method of Robins et al. (1995) gives an estimator $\hat\mu_t$ (from that class) that solves
$$\tilde E\left[\frac{r_t}{\hat\pi_t}\left\{Y_t - \mu_t - \beta_{1,t}^\top\left(X - \tilde E[X]\right)\right\}\begin{pmatrix}1 \\ X - \tilde E(X)\end{pmatrix}\right] = 0,$$
which gives
$$\hat\mu_t = \frac{\tilde E\{r_t Y_t/\hat\pi_t\}}{\tilde E\{r_t/\hat\pi_t\}} - \hat\beta_{1,t}^\top\left(\frac{\tilde E\{r_t X/\hat\pi_t\}}{\tilde E\{r_t/\hat\pi_t\}} - \bar X_n\right).$$
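For scalar $X$, the estimating equation above is weighted least squares of $Y_t$ on $(1, X - \bar X_n)$ over the time-$t$ respondents with weights $1/\hat\pi_t$, and $\hat\mu_t$ is the fitted intercept. The sketch below assumes known, hypothetical response probabilities in place of $\hat\pi_t$ to keep it short; `rrz_mu` is an illustrative name.

```python
import math
import random


def rrz_mu(x, y, r, pi):
    """Solve E~[(r/pi){Y - mu - beta (X - xbar)} (1, X - xbar)'] = 0 for scalar X.

    Equivalent to weighted least squares of Y on (1, X - xbar) over the
    respondents with weights 1/pi; mu is the fitted intercept.
    """
    n = len(x)
    xbar = sum(x) / n
    w = [ri / p for ri, p in zip(r, pi)]
    sw = sum(w)
    xw = sum(wi * xi for wi, xi in zip(w, x)) / sw  # E~{r X/pi} / E~{r/pi}
    yw = sum(wi * yi for wi, yi in zip(w, y)) / sw  # E~{r Y/pi} / E~{r/pi}
    beta = (sum(wi * (xi - xw) * (yi - yw) for wi, xi, yi in zip(w, x, y))
            / sum(wi * (xi - xw) ** 2 for wi, xi in zip(w, x)))
    return yw - beta * (xw - xbar), beta, xbar


random.seed(95)
n = 500
x = [random.gauss(1.0, 1.0) for _ in range(n)]
y = [2.0 * (xi - 1.0) + random.gauss(0.0, 1.0) for xi in x]
pi = [1.0 / (1.0 + math.exp(-(0.5 + xi))) for xi in x]  # hypothetical known pi_t
r = [1 if random.random() < p else 0 for p in pi]
mu, beta, xbar = rrz_mu(x, y, r, pi)
print(round(mu, 4), round(beta, 4))
```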

Numerical Study: Estimators under study

The estimator using the full sample, $\tilde E\{Y_t\}$, for reference.
The naive estimator, i.e., the simple average of the complete cases: $\hat\mu_{t,naive} = \tilde E\{r_t Y_t\}/\tilde E\{r_t\}$.
The direct propensity score adjusted estimator: $\hat\mu_{t,PSA} = \tilde E\{r_t Y_t/\hat\pi_t\}$.
The estimator using the weighted estimating equations of Robins et al. (1995), denoted by $\hat\mu_{t,RRZ}$.
Our estimator given in Theorem 1, denoted by $\hat\mu_{t,opt}$.

Numerical Study: Simulation Study I

$Y_0 = 2(X - 1) + e_0$ and $Y_t = 2(X - 1) + 2Y_{t-1} + e_t$ for $t \ge 1$, where $X \sim N(1, 1)$, $e_t = 0.5e_{t-1} + v_t$, $e_0 \sim N(0, 1)$, and $v_t \sim N(0, 1)$, independently for different $t$. The response indicator $r_t$ follows
$$P(r_t = 1 \mid X, Y_{t-1}, r_{t-1} = 1) = \mathrm{expit}\{1 + 2X - Y_{t-1}/(t + 1)\},$$
and there is no missingness in the baseline year.
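The two data-generating processes can be simulated as follows. This is a sketch of the designs as stated on these slides (Simulation Study II replaces $2(X - 1)$ by $2(X - 1)^{1/3}$); the function name, seed, and sample size are arbitrary, and using the real cube root when $X < 1$ is an assumption. With a large simulated sample, the cumulative response rates can be compared with those reported for the studies.

```python
import math
import random


def expit(u):  # numerically stable logistic function
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def simulate_panel(n, T=3, study=1, seed=0):
    """One Monte Carlo sample from Simulation Study I (study=1) or II (study=2)."""
    rng = random.Random(seed)
    if study == 1:
        def g(x):
            return 2.0 * (x - 1.0)
    else:
        def g(x):  # assumption: real cube root, negative when X < 1
            return 2.0 * math.copysign(abs(x - 1.0) ** (1.0 / 3.0), x - 1.0)
    sample = []
    for _ in range(n):
        x = rng.gauss(1.0, 1.0)  # X ~ N(1, 1)
        e = rng.gauss(0.0, 1.0)  # e_0 ~ N(0, 1)
        y, r = [g(x) + e], [1]   # no missingness in the baseline year
        for t in range(1, T + 1):
            e = 0.5 * e + rng.gauss(0.0, 1.0)  # e_t = 0.5 e_{t-1} + v_t
            y.append(g(x) + 2.0 * y[t - 1] + e)
            p = expit(1.0 + 2.0 * x - y[t - 1] / (t + 1))
            r.append(1 if r[t - 1] == 1 and rng.random() < p else 0)
        sample.append((x, y, r))
    return sample


T = 3
sample = simulate_panel(20000, T=T, study=1, seed=42)
rates = [sum(r[t] for _, _, r in sample) / len(sample) for t in range(1, T + 1)]
full = [sum(y[t] for _, y, _ in sample) / len(sample) for t in range(1, T + 1)]
naive = [sum(y[t] for _, y, r in sample if r[t]) / sum(r[t] for _, _, r in sample)
         for t in range(1, T + 1)]
print("response rates:", [round(v, 3) for v in rates])
print("full-sample means:", [round(v, 3) for v in full])
print("naive means:", [round(v, 3) for v in naive])
```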

Numerical Study: Simulation Study I (cont'd)

We used B = … Monte Carlo samples of size n = 300 for this simulation. The response rates for t = 1, 2, 3 are 0.93, 0.87, and 0.75, respectively. We also computed the variance estimator of the optimal estimator using the formula in Theorem 2. The relative biases of the variance estimator for t = 1, 2, 3 are …, …, and …, respectively.

Numerical Study: Results from Simulation Study I

Table: Comparison of the methods when n = 300 and T = 3 (Simulation Study I, Monte Carlo sample size …), using the full data as the baseline. The reported measure is 100 × RMSE/RMSE(Full) at t = 1, 2, 3 for the Full, Naive, PS, RRZ, and Opt estimators.

Numerical Study: Simulation Study II

$Y_0 = 2(X - 1)^{1/3} + e_0$ and $Y_t = 2(X - 1)^{1/3} + 2Y_{t-1} + e_t$ for $t \ge 1$, where $X \sim N(1, 1)$, $e_t = 0.5e_{t-1} + v_t$, $e_0 \sim N(0, 1)$, and $v_t \sim N(0, 1)$, independently for different $t$. The response indicator $r_t$ follows
$$P(r_t = 1 \mid X, Y_{t-1}, r_{t-1} = 1) = \mathrm{expit}\{1 + 2X - Y_{t-1}/(t + 1)\}.$$

Numerical Study: Simulation Study II (cont'd)

B = 10000 and n = 300; the response rates for t = 1, 2, 3 are 0.92, 0.85, and 0.74, respectively. Using the same variance formula for the optimal estimator, the relative biases of the variance estimator for t = 1, 2, 3 in this simulation study are …, …, and …, respectively.

Numerical Study: Results from Simulation Study II

Table: the reported measure is 100 × RMSE/RMSE(Full) at t = 1, 2, 3 for the Full, Naive, PS, RRZ, and Opt estimators.

Concluding Remarks

We adopted the GLS (GMM) technique and constructed an optimal estimator within a class of unbiased estimators.
Under a monotone missing pattern, applying the GLS (GMM) method to estimate $\mu_X, \mu_1, \ldots, \mu_T$ simultaneously is exactly the same as what we proposed (estimating $\mu_X, \mu_1, \ldots, \mu_T$ one by one).
This method is directly applicable when the baseline-year sample is selected by a complex probability sampling design.
Extensions to non-monotone missing patterns and time-dependent covariates are important topics for further investigation.

Thank you! Questions? Email: jkim@iastate.edu


Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Institute of Statistics

More information

Imputation for Missing Data under PPSWR Sampling

Imputation for Missing Data under PPSWR Sampling July 5, 2010 Beijing Imputation for Missing Data under PPSWR Sampling Guohua Zou Academy of Mathematics and Systems Science Chinese Academy of Sciences 1 23 () Outline () Imputation method under PPSWR

More information

Chapter 3: Element sampling design: Part 1

Chapter 3: Element sampling design: Part 1 Chapter 3: Element sampling design: Part 1 Jae-Kwang Kim Fall, 2014 Simple random sampling 1 Simple random sampling 2 SRS with replacement 3 Systematic sampling Kim Ch. 3: Element sampling design: Part

More information

Economics 582 Random Effects Estimation

Economics 582 Random Effects Estimation Economics 582 Random Effects Estimation Eric Zivot May 29, 2013 Random Effects Model Hence, the model can be re-written as = x 0 β + + [x ] = 0 (no endogeneity) [ x ] = = + x 0 β + + [x ] = 0 [ x ] = 0

More information

A weighted simulation-based estimator for incomplete longitudinal data models

A weighted simulation-based estimator for incomplete longitudinal data models To appear in Statistics and Probability Letters, 113 (2016), 16-22. doi 10.1016/j.spl.2016.02.004 A weighted simulation-based estimator for incomplete longitudinal data models Daniel H. Li 1 and Liqun

More information

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai

Weighting Methods. Harvard University STAT186/GOV2002 CAUSAL INFERENCE. Fall Kosuke Imai Weighting Methods Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Weighting Methods Stat186/Gov2002 Fall 2018 1 / 13 Motivation Matching methods for improving

More information

Lecture 6: Discrete Choice: Qualitative Response

Lecture 6: Discrete Choice: Qualitative Response Lecture 6: Instructor: Department of Economics Stanford University 2011 Types of Discrete Choice Models Univariate Models Binary: Linear; Probit; Logit; Arctan, etc. Multinomial: Logit; Nested Logit; GEV;

More information

Covariate Balancing Propensity Score for General Treatment Regimes

Covariate Balancing Propensity Score for General Treatment Regimes Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian

More information

5 Methods Based on Inverse Probability Weighting Under MAR

5 Methods Based on Inverse Probability Weighting Under MAR 5 Methods Based on Inverse Probability Weighting Under MAR The likelihood-based and multiple imputation methods we considered for inference under MAR in Chapters 3 and 4 are based, either directly or indirectly,

More information

Calibration estimation using exponential tilting in sample surveys

Calibration estimation using exponential tilting in sample surveys Calibration estimation using exponential tilting in sample surveys Jae Kwang Kim February 23, 2010 Abstract We consider the problem of parameter estimation with auxiliary information, where the auxiliary

More information

Comment: Understanding OR, PS and DR

Comment: Understanding OR, PS and DR Statistical Science 2007, Vol. 22, No. 4, 560 568 DOI: 10.1214/07-STS227A Main article DOI: 10.1214/07-STS227 c Institute of Mathematical Statistics, 2007 Comment: Understanding OR, PS and DR Zhiqiang

More information

Two-phase sampling approach to fractional hot deck imputation

Two-phase sampling approach to fractional hot deck imputation Two-phase sampling approach to fractional hot deck imputation Jongho Im 1, Jae-Kwang Kim 1 and Wayne A. Fuller 1 Abstract Hot deck imputation is popular for handling item nonresponse in survey sampling.

More information

7 Sensitivity Analysis

7 Sensitivity Analysis 7 Sensitivity Analysis A recurrent theme underlying methodology for analysis in the presence of missing data is the need to make assumptions that cannot be verified based on the observed data. If the assumption

More information

Time Series Analysis

Time Series Analysis Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Regression based methods, 1st part: Introduction (Sec.

More information

Modelling Survival Events with Longitudinal Data Measured with Error

Modelling Survival Events with Longitudinal Data Measured with Error Modelling Survival Events with Longitudinal Data Measured with Error Hongsheng Dai, Jianxin Pan & Yanchun Bao First version: 14 December 29 Research Report No. 16, 29, Probability and Statistics Group

More information

analysis of incomplete data in statistical surveys

analysis of incomplete data in statistical surveys analysis of incomplete data in statistical surveys Ugo Guarnera 1 1 Italian National Institute of Statistics, Italy guarnera@istat.it Jordan Twinning: Imputation - Amman, 6-13 Dec 2014 outline 1 origin

More information

AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE

AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE Statistica Sinica 24 (2014), 1097-1116 doi:http://dx.doi.org/10.5705/ss.2012.074 AN INSTRUMENTAL VARIABLE APPROACH FOR IDENTIFICATION AND ESTIMATION WITH NONIGNORABLE NONRESPONSE Sheng Wang 1, Jun Shao

More information

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley

Panel Data Models. James L. Powell Department of Economics University of California, Berkeley Panel Data Models James L. Powell Department of Economics University of California, Berkeley Overview Like Zellner s seemingly unrelated regression models, the dependent and explanatory variables for panel

More information

Introduction to Survey Data Integration

Introduction to Survey Data Integration Introduction to Survey Data Integration Jae-Kwang Kim Iowa State University May 20, 2014 Outline 1 Introduction 2 Survey Integration Examples 3 Basic Theory for Survey Integration 4 NASS application 5

More information

Propensity Score Weighting with Multilevel Data

Propensity Score Weighting with Multilevel Data Propensity Score Weighting with Multilevel Data Fan Li Department of Statistical Science Duke University October 25, 2012 Joint work with Alan Zaslavsky and Mary Beth Landrum Introduction In comparative

More information

2 Naïve Methods. 2.1 Complete or available case analysis

2 Naïve Methods. 2.1 Complete or available case analysis 2 Naïve Methods Before discussing methods for taking account of missingness when the missingness pattern can be assumed to be MAR in the next three chapters, we review some simple methods for handling

More information

Flexible Estimation of Treatment Effect Parameters

Flexible Estimation of Treatment Effect Parameters Flexible Estimation of Treatment Effect Parameters Thomas MaCurdy a and Xiaohong Chen b and Han Hong c Introduction Many empirical studies of program evaluations are complicated by the presence of both

More information

University of Oxford. Statistical Methods Autocorrelation. Identification and Estimation

University of Oxford. Statistical Methods Autocorrelation. Identification and Estimation University of Oxford Statistical Methods Autocorrelation Identification and Estimation Dr. Órlaith Burke Michaelmas Term, 2011 Department of Statistics, 1 South Parks Road, Oxford OX1 3TG Contents 1 Model

More information

B y t = γ 0 + Γ 1 y t + ε t B(L) y t = γ 0 + ε t ε t iid (0, D) D is diagonal

B y t = γ 0 + Γ 1 y t + ε t B(L) y t = γ 0 + ε t ε t iid (0, D) D is diagonal Structural VAR Modeling for I(1) Data that is Not Cointegrated Assume y t =(y 1t,y 2t ) 0 be I(1) and not cointegrated. That is, y 1t and y 2t are both I(1) and there is no linear combination of y 1t and

More information

Causal Inference Basics

Causal Inference Basics Causal Inference Basics Sam Lendle October 09, 2013 Observed data, question, counterfactuals Observed data: n i.i.d copies of baseline covariates W, treatment A {0, 1}, and outcome Y. O i = (W i, A i,

More information

Chapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28

Chapter 4. Replication Variance Estimation. J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Chapter 4 Replication Variance Estimation J. Kim, W. Fuller (ISU) Chapter 4 7/31/11 1 / 28 Jackknife Variance Estimation Create a new sample by deleting one observation n 1 n n ( x (k) x) 2 = x (k) = n

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates

Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Improving Efficiency of Inferences in Randomized Clinical Trials Using Auxiliary Covariates Anastasios (Butch) Tsiatis Department of Statistics North Carolina State University http://www.stat.ncsu.edu/

More information

Biostat 2065 Analysis of Incomplete Data

Biostat 2065 Analysis of Incomplete Data Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh September 13 & 15, 2005 1. Complete-case analysis (I) Complete-case analysis refers to analysis based on

More information

Econ 583 Final Exam Fall 2008

Econ 583 Final Exam Fall 2008 Econ 583 Final Exam Fall 2008 Eric Zivot December 11, 2008 Exam is due at 9:00 am in my office on Friday, December 12. 1 Maximum Likelihood Estimation and Asymptotic Theory Let X 1,...,X n be iid random

More information

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models

The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models The consequences of misspecifying the random effects distribution when fitting generalized linear mixed models John M. Neuhaus Charles E. McCulloch Division of Biostatistics University of California, San

More information

Basic math for biology

Basic math for biology Basic math for biology Lei Li Florida State University, Feb 6, 2002 The EM algorithm: setup Parametric models: {P θ }. Data: full data (Y, X); partial data Y. Missing data: X. Likelihood and maximum likelihood

More information

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 18.466 Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed 1. MLEs in exponential families Let f(x,θ) for x X and θ Θ be a likelihood function, that is, for present purposes,

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models

Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Introduction to Empirical Processes and Semiparametric Inference Lecture 25: Semiparametric Models Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Chapter 2. Section Section 2.9. J. Kim (ISU) Chapter 2 1 / 26. Design-optimal estimator under stratified random sampling

Chapter 2. Section Section 2.9. J. Kim (ISU) Chapter 2 1 / 26. Design-optimal estimator under stratified random sampling Chapter 2 Section 2.4 - Section 2.9 J. Kim (ISU) Chapter 2 1 / 26 2.4 Regression and stratification Design-optimal estimator under stratified random sampling where (Ŝxxh, Ŝxyh) ˆβ opt = ( x st, ȳ st )

More information

Statistical Inference of Covariate-Adjusted Randomized Experiments

Statistical Inference of Covariate-Adjusted Randomized Experiments 1 Statistical Inference of Covariate-Adjusted Randomized Experiments Feifang Hu Department of Statistics George Washington University Joint research with Wei Ma, Yichen Qin and Yang Li Email: feifang@gwu.edu

More information

Statistical Methods. Missing Data snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23

Statistical Methods. Missing Data  snijders/sm.htm. Tom A.B. Snijders. November, University of Oxford 1 / 23 1 / 23 Statistical Methods Missing Data http://www.stats.ox.ac.uk/ snijders/sm.htm Tom A.B. Snijders University of Oxford November, 2011 2 / 23 Literature: Joseph L. Schafer and John W. Graham, Missing

More information

2018 2019 1 9 sei@mistiu-tokyoacjp http://wwwstattu-tokyoacjp/~sei/lec-jhtml 11 552 3 0 1 2 3 4 5 6 7 13 14 33 4 1 4 4 2 1 1 2 2 1 1 12 13 R?boxplot boxplotstats which does the computation?boxplotstats

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE

Chapter 6. Panel Data. Joan Llull. Quantitative Statistical Methods II Barcelona GSE Chapter 6. Panel Data Joan Llull Quantitative Statistical Methods II Barcelona GSE Introduction Chapter 6. Panel Data 2 Panel data The term panel data refers to data sets with repeated observations over

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Recap. Vector observation: Y f (y; θ), Y Y R m, θ R d. sample of independent vectors y 1,..., y n. pairwise log-likelihood n m. weights are often 1

Recap. Vector observation: Y f (y; θ), Y Y R m, θ R d. sample of independent vectors y 1,..., y n. pairwise log-likelihood n m. weights are often 1 Recap Vector observation: Y f (y; θ), Y Y R m, θ R d sample of independent vectors y 1,..., y n pairwise log-likelihood n m i=1 r=1 s>r w rs log f 2 (y ir, y is ; θ) weights are often 1 more generally,

More information

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses

Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association for binary responses Outline Marginal model Examples of marginal model GEE1 Augmented GEE GEE1.5 GEE2 Modeling the scale parameter ϕ A note on modeling correlation of binary responses Using marginal odds ratios to model association

More information

Joint work with Nottingham colleagues Simon Preston and Michail Tsagris.

Joint work with Nottingham colleagues Simon Preston and Michail Tsagris. /pgf/stepx/.initial=1cm, /pgf/stepy/.initial=1cm, /pgf/step/.code=1/pgf/stepx/.expanded=- 10.95415pt,/pgf/stepy/.expanded=- 10.95415pt, /pgf/step/.value required /pgf/images/width/.estore in= /pgf/images/height/.estore

More information

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score

Causal Inference with General Treatment Regimes: Generalizing the Propensity Score Causal Inference with General Treatment Regimes: Generalizing the Propensity Score David van Dyk Department of Statistics, University of California, Irvine vandyk@stat.harvard.edu Joint work with Kosuke

More information

Integrated approaches for analysis of cluster randomised trials

Integrated approaches for analysis of cluster randomised trials Integrated approaches for analysis of cluster randomised trials Invited Session 4.1 - Recent developments in CRTs Joint work with L. Turner, F. Li, J. Gallis and D. Murray Mélanie PRAGUE - SCT 2017 - Liverpool

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

Weakly dependent functional data. Piotr Kokoszka. Utah State University. Siegfried Hörmann. University of Utah

Weakly dependent functional data. Piotr Kokoszka. Utah State University. Siegfried Hörmann. University of Utah Weakly dependent functional data Piotr Kokoszka Utah State University Joint work with Siegfried Hörmann University of Utah Outline Examples of functional time series L 4 m approximability Convergence of

More information

TAMS39 Lecture 2 Multivariate normal distribution

TAMS39 Lecture 2 Multivariate normal distribution TAMS39 Lecture 2 Multivariate normal distribution Martin Singull Department of Mathematics Mathematical Statistics Linköping University, Sweden Content Lecture Random vectors Multivariate normal distribution

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics Jiti Gao Department of Statistics School of Mathematics and Statistics The University of Western Australia Crawley

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

F & B Approaches to a simple model

F & B Approaches to a simple model A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 215 http://www.astro.cornell.edu/~cordes/a6523 Lecture 11 Applications: Model comparison Challenges in large-scale surveys

More information