Chapter 4: Imputation
Jae-Kwang Kim
Department of Statistics, Iowa State University
Outline

1 Introduction
2 Basic theory for imputation
3 Variance estimation after imputation
4 Replication variance estimation
5 Multiple imputation
6 Fractional imputation

Ch 4 2 / 79
Introduction

Basic setup:
- $Y$: a vector of random variables with distribution $F(y; \theta)$.
- $y_1, \ldots, y_n$ are $n$ independent realizations of $Y$.
- We are interested in estimating $\psi$, which is implicitly defined by $E\{U(\psi; Y)\} = 0$.
- Under complete observation, a consistent estimator $\hat{\psi}_n$ of $\psi$ can be obtained by solving the estimating equation for $\psi$:
$$\sum_{i=1}^n U(\psi; y_i) = 0.$$
- A special case of the estimating function is the score function. In this case, $\psi = \theta$.
- The sandwich variance estimator is often used to estimate the variance of $\hat{\psi}_n$:
$$\hat{V}(\hat{\psi}_n) = \hat{\tau}_u^{-1} \hat{V}(U)\, \hat{\tau}_u^{-1\prime},$$
where $\tau_u = E\{\partial U(\psi; y)/\partial \psi'\}$.
1. Introduction

Missing data setup:
- Suppose that $y_i$ is not fully observed.
- $y_i = (y_{\mathrm{obs},i}, y_{\mathrm{mis},i})$: (observed, missing) part of $y_i$.
- $\delta_i$: response indicator function for $y_i$.
- Under the existence of missing data, we can use the following estimator: $\hat{\psi}$, the solution to
$$\sum_{i=1}^n E\{U(\psi; y_i) \mid y_{\mathrm{obs},i}, \delta_i\} = 0. \quad (1)$$
- Equation (1) is often called the expected estimating equation.
1. Introduction

Motivation (for imputation): computing the conditional expectation in (1) can be a challenging problem.
1. The conditional expectation depends on unknown parameter values. That is,
$$E\{U(\psi; y_i) \mid y_{\mathrm{obs},i}, \delta_i\} = E\{U(\psi; y_i) \mid y_{\mathrm{obs},i}, \delta_i; \theta, \phi\},$$
where $\theta$ is the parameter in $f(y; \theta)$ and $\phi$ is the parameter in $p(\delta \mid y; \phi)$.
2. Even if we know $\eta = (\theta, \phi)$, computing the conditional expectation is numerically difficult.
1. Introduction

Imputation: a Monte Carlo approximation of the conditional expectation (given the observed data):
$$E\{U(\psi; y_i) \mid y_{\mathrm{obs},i}, \delta_i\} \cong \frac{1}{m} \sum_{j=1}^m U\left(\psi; y_{\mathrm{obs},i}, y_{\mathrm{mis},i}^{(j)}\right).$$
1. Bayesian approach: generate $y_{\mathrm{mis},i}^*$ from
$$f(y_{\mathrm{mis},i} \mid y_{\mathrm{obs}}, \delta) = \int f(y_{\mathrm{mis},i} \mid y_{\mathrm{obs}}, \delta; \eta)\, p(\eta \mid y_{\mathrm{obs}}, \delta)\, d\eta.$$
2. Frequentist approach: generate $y_{\mathrm{mis},i}^*$ from $f(y_{\mathrm{mis},i} \mid y_{\mathrm{obs},i}, \delta; \hat{\eta})$, where $\hat{\eta}$ is a consistent estimator.
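As a toy illustration of the frequentist version, the Monte Carlo average below approximates $E\{U(\psi; y) \mid y_{\mathrm{obs}}\}$ under an assumed imputation model $y_{\mathrm{mis}} \mid y_{\mathrm{obs}} \sim N(y_{\mathrm{obs}}, 1)$ with $U(\psi; y) = y_{\mathrm{mis}} - \psi$; both choices are illustrative assumptions, made so the conditional expectation has the closed form $y_{\mathrm{obs}} - \psi$ that the approximation can be checked against.

```python
import numpy as np

rng = np.random.default_rng(0)

y_obs = 1.5          # observed part of one unit
psi = 0.7
m = 200_000          # number of imputed draws

# Frequentist-style imputation: draw y_mis from f(y_mis | y_obs; eta_hat),
# taken here to be N(y_obs, 1) purely for illustration.
y_mis = rng.normal(loc=y_obs, scale=1.0, size=m)

# Monte Carlo approximation of E{U(psi; y) | y_obs}
U_bar = np.mean(y_mis - psi)

exact = y_obs - psi   # closed form for this toy model
print(U_bar, exact)
```

The approximation error is of order $m^{-1/2}$, which is why a large $m$ is used here.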
Example 4.1

Basic setup:
- Let $(x, y)$ be a vector of bivariate random variables.
- Assume that the $x_i$ are always observed and the $y_i$ are subject to missingness in the sample, and that the probability of missingness does not depend on the value of $y_i$.
- In this case, an imputed estimator of $\theta = E(Y)$ based on single imputation can be computed as
$$\hat{\theta}_I = \frac{1}{n} \sum_{i=1}^n \{\delta_i y_i + (1 - \delta_i)\, y_i^*\}, \quad (2)$$
where $y_i^*$ is an imputed value for $y_i$.
- Imputation model: $y_i \sim N(\beta_0 + \beta_1 x_i, \sigma_e^2)$ for some $(\beta_0, \beta_1, \sigma_e^2)$.
Example 4.1 (Cont'd)

Deterministic imputation: use $y_i^* = \hat{\beta}_0 + \hat{\beta}_1 x_i$, where
$$(\hat{\beta}_0, \hat{\beta}_1) = \left(\bar{y}_r - \hat{\beta}_1 \bar{x}_r,\; S_{xxr}^{-1} S_{xyr}\right).$$
Note that $E(\hat{\theta}_I - \theta) \cong 0$ and
$$V(\hat{\theta}_I) \cong \frac{1}{n}\sigma_y^2 + \left(\frac{1}{r} - \frac{1}{n}\right)\sigma_e^2 = \frac{\sigma_y^2}{r}\left\{1 - \left(1 - \frac{r}{n}\right)\rho^2\right\}.$$
Stochastic imputation: use $y_i^* = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i^*$, where $e_i^* \sim (0, \hat{\sigma}_e^2)$. The imputed estimator under stochastic imputation satisfies
$$V(\hat{\theta}_I) \cong \frac{1}{n}\sigma_y^2 + \left(\frac{1}{r} - \frac{1}{n}\right)\sigma_e^2 + \frac{n-r}{n^2}\sigma_e^2,$$
where the third term represents the additional variance due to stochastic imputation.
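A small simulation can be used to check the two variance formulas. The setup below is an illustrative assumption (β0 = 1, β1 = 2, σe = 1, so σy² = 5 and ρ² = 4/5, with a 60% MCAR response rate); the stochastic-imputation variance should exceed the deterministic one by roughly (n − r)σe²/n².

```python
import numpy as np

rng = np.random.default_rng(1)
n, B = 200, 2000
beta0, beta1, sig_e = 1.0, 2.0, 1.0      # sigma_y^2 = 5, rho^2 = 4/5

est_det, est_sto = [], []
for _ in range(B):
    x = rng.normal(size=n)
    y = beta0 + beta1 * x + rng.normal(scale=sig_e, size=n)
    delta = rng.random(n) < 0.6          # MCAR response indicator
    xr, yr = x[delta], y[delta]
    # OLS fit on the respondents
    b1 = np.sum((xr - xr.mean()) * (yr - yr.mean())) / np.sum((xr - xr.mean()) ** 2)
    b0 = yr.mean() - b1 * xr.mean()
    yhat = b0 + b1 * x
    est_det.append(np.where(delta, y, yhat).mean())
    # stochastic imputation: add residuals drawn from the respondents
    e_star = rng.choice(yr - yhat[delta], size=n, replace=True)
    est_sto.append(np.where(delta, y, yhat + e_star).mean())

v_det, v_sto = np.var(est_det, ddof=1), np.var(est_sto, ddof=1)
# theoretical deterministic variance, using the expected respondent count r = 0.6 n
r_bar = 0.6 * n
v_theory = 5.0 / n + (1.0 / r_bar - 1.0 / n) * sig_e ** 2
print(v_det, v_sto, v_theory)
```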
Remark

- Deterministic imputation is unbiased for estimating the mean but may not be unbiased for estimating a proportion. For example, if $\theta = \Pr(Y < c) = E\{I(Y < c)\}$, the imputed estimator $\hat{\theta} = n^{-1}\sum_{i=1}^n \{\delta_i I(y_i < c) + (1 - \delta_i) I(y_i^* < c)\}$ is unbiased if $E\{I(Y < c)\} = E\{I(Y^* < c)\}$, which holds only when the marginal distribution of $y^*$ is the same as the marginal distribution of $y$.
- In general, under deterministic imputation, we have $E(y) = E(y^*)$ but $V(y) > V(y^*)$. For regression imputation, $V(y^*) = \rho^2 \sigma_y^2 < \sigma_y^2 = V(y)$.
- Imputation increases the variance (of the imputed estimator) because the imputed values are positively correlated.
- Variance estimation is complicated because of the correlation between the imputed values.
2. Basic Theory for Imputation

Lemma 4.1. Let $\hat{\theta}$ be the solution to $\hat{U}(\theta) = 0$, where $\hat{U}(\theta)$ is a function of the complete observations $y_1, \ldots, y_n$ and the parameter $\theta$. Let $\theta_0$ be the solution to $E\{\hat{U}(\theta)\} = 0$. Then, under some regularity conditions,
$$\hat{\theta} - \theta_0 \cong -\left[E\{\dot{U}(\theta_0)\}\right]^{-1} \hat{U}(\theta_0),$$
where $\dot{U}(\theta) = \partial \hat{U}(\theta)/\partial \theta'$ and the notation $A_n \cong B_n$ means that $A_n B_n^{-1} = 1 + R_n$ for some $R_n$ which converges to zero in probability.
Remark (about Lemma 4.1)

- The proof is based on Taylor linearization:
$$0 = \hat{U}(\hat{\theta}) \cong \hat{U}(\theta_0) + \dot{U}(\theta_0)(\hat{\theta} - \theta_0) \cong \hat{U}(\theta_0) + E\{\dot{U}(\theta_0)\}(\hat{\theta} - \theta_0),$$
where the second (approximate) equality follows from $\dot{U}(\theta_0) = E\{\dot{U}(\theta_0)\} + o_p(1)$ and $\hat{\theta} = \theta_0 + o_p(1)$.
- We need to assume that $E\{\dot{U}(\theta_0)\}$ is nonsingular.
- Also, we need conditions for $\hat{\theta} \stackrel{p}{\to} \theta_0$.
- Lemma 4.1 can be used to establish the asymptotic normality of $\hat{\theta}$ (use Slutsky's theorem).
2. Basic Theory for Imputation

Basic setup (Case 1: $\psi = \eta$):
- $\mathbf{y} = (y_1, \ldots, y_n) \sim f(\mathbf{y}; \theta)$, $\boldsymbol{\delta} = (\delta_1, \ldots, \delta_n) \sim P(\boldsymbol{\delta} \mid \mathbf{y}; \phi)$.
- $\mathbf{y} = (y_{\mathrm{obs}}, y_{\mathrm{mis}})$: (observed, missing) part of $\mathbf{y}$.
- $y_{\mathrm{mis}}^{(1)}, \ldots, y_{\mathrm{mis}}^{(m)}$: $m$ imputed values of $y_{\mathrm{mis}}$ generated from
$$f(y_{\mathrm{mis}} \mid y_{\mathrm{obs}}, \delta; \hat{\eta}_p) = \frac{f(\mathbf{y}; \hat{\theta}_p)\, P(\delta \mid \mathbf{y}; \hat{\phi}_p)}{\int f(\mathbf{y}; \hat{\theta}_p)\, P(\delta \mid \mathbf{y}; \hat{\phi}_p)\, d\mu(y_{\mathrm{mis}})},$$
where $\hat{\eta}_p = (\hat{\theta}_p, \hat{\phi}_p)$ is a preliminary estimator of $\eta = (\theta, \phi)$.
- Using the $m$ imputed values, the imputed score function is computed as
$$\bar{S}_{\mathrm{imp},m}(\eta \mid \hat{\eta}_p) \equiv \frac{1}{m} \sum_{j=1}^m S_{\mathrm{com}}\left(\eta; y_{\mathrm{obs}}, y_{\mathrm{mis}}^{(j)}, \delta\right),$$
where $S_{\mathrm{com}}(\eta; \mathbf{y})$ is the score function of $\eta = (\theta, \phi)$ under complete response.
2. Basic Theory for Imputation

Lemma 4.2 (Asymptotic results for $m = \infty$). Assume that $\hat{\eta}_p$ converges in probability to $\eta$. Let $\hat{\eta}_{\mathrm{imp},m}$ be the solution to
$$\frac{1}{m} \sum_{j=1}^m S_{\mathrm{com}}\left(\eta; y_{\mathrm{obs}}, y_{\mathrm{mis}}^{(j)}, \delta\right) = 0,$$
where $y_{\mathrm{mis}}^{(1)}, \ldots, y_{\mathrm{mis}}^{(m)}$ are the imputed values generated from $f(y_{\mathrm{mis}} \mid y_{\mathrm{obs}}, \delta; \hat{\eta}_p)$. Then, under some regularity conditions, for $m \to \infty$,
$$\hat{\eta}_{\mathrm{imp},\infty} \cong \hat{\eta}_{\mathrm{MLE}} + J_{\mathrm{mis}}\left(\hat{\eta}_p - \hat{\eta}_{\mathrm{MLE}}\right) \quad (3)$$
and
$$V(\hat{\eta}_{\mathrm{imp},\infty}) \doteq I_{\mathrm{obs}}^{-1} + J_{\mathrm{mis}}\left\{V(\hat{\eta}_p) - V(\hat{\eta}_{\mathrm{MLE}})\right\} J_{\mathrm{mis}}',$$
where $J_{\mathrm{mis}} = I_{\mathrm{com}}^{-1} I_{\mathrm{mis}}$ is the fraction of missing information.
Remark

- Equation (3) means that
$$\hat{\eta}_{\mathrm{imp},\infty} \cong (I - J_{\mathrm{mis}})\, \hat{\eta}_{\mathrm{MLE}} + J_{\mathrm{mis}}\, \hat{\eta}_p. \quad (4)$$
That is, $\hat{\eta}_{\mathrm{imp},\infty}$ is a convex combination of $\hat{\eta}_{\mathrm{MLE}}$ and $\hat{\eta}_p$.
- Note that $\hat{\eta}_{\mathrm{imp},\infty}$ is a one-step EM update with initial estimate $\hat{\eta}_p$. Let $\hat{\eta}^{(t)}$ be the $t$-th EM update of $\eta$, computed by solving $\bar{S}\left(\eta \mid \hat{\eta}^{(t-1)}\right) = 0$ with $\hat{\eta}^{(0)} = \hat{\eta}_p$. Equation (4) implies that
$$\hat{\eta}^{(t)} = (I - J_{\mathrm{mis}})\, \hat{\eta}_{\mathrm{MLE}} + J_{\mathrm{mis}}\, \hat{\eta}^{(t-1)}.$$
Thus, we can obtain
$$\hat{\eta}^{(t)} = \hat{\eta}_{\mathrm{MLE}} + (J_{\mathrm{mis}})^t \left(\hat{\eta}^{(0)} - \hat{\eta}_{\mathrm{MLE}}\right),$$
which justifies $\lim_{t \to \infty} \hat{\eta}^{(t)} = \hat{\eta}_{\mathrm{MLE}}$.
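The contraction above can be seen in a one-dimensional sketch; the scalar fraction of missing information and the numerical values below are illustrative assumptions.

```python
J_mis = 0.4        # scalar "fraction of missing information" (illustrative)
eta_mle = 2.5      # MLE that the iteration should converge to
eta0 = 0.0         # preliminary estimator eta^(0)

eta = eta0
for t in range(50):
    # one EM update: eta^(t) = (1 - J_mis) * eta_MLE + J_mis * eta^(t-1)
    eta = (1 - J_mis) * eta_mle + J_mis * eta

# closed form: eta^(t) = eta_MLE + J_mis^t * (eta^(0) - eta_MLE)
closed = eta_mle + J_mis ** 50 * (eta0 - eta_mle)
print(eta, closed)
```

Because |J_mis| < 1, each step shrinks the distance to the MLE by the same factor, which is exactly the geometric convergence claimed above.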
Proof for Lemma 4.2
Wang and Robins (1998)

Theorem 4.1 (Asymptotic results for $m < \infty$). Let $\hat{\eta}_p$ be a preliminary $\sqrt{n}$-consistent estimator of $\eta$ with variance $V_p$. Under some regularity conditions, the solution $\hat{\eta}_{\mathrm{imp},m}$ to
$$\bar{S}_m(\eta \mid \hat{\eta}_p) \equiv \frac{1}{m} \sum_{j=1}^m S_{\mathrm{com}}\left(\eta; y_{\mathrm{obs}}, y_{\mathrm{mis}}^{(j)}, \delta\right) = 0$$
has mean $\eta_0$ and asymptotic variance
$$V(\hat{\eta}_{\mathrm{imp},m}) \doteq I_{\mathrm{obs}}^{-1} + J_{\mathrm{mis}}\left\{V_p - I_{\mathrm{obs}}^{-1}\right\} J_{\mathrm{mis}}' + m^{-1} I_{\mathrm{com}}^{-1} I_{\mathrm{mis}} I_{\mathrm{com}}^{-1}, \quad (5)$$
where $J_{\mathrm{mis}} = I_{\mathrm{com}}^{-1} I_{\mathrm{mis}}$.
2. Basic Theory for Imputation

Remark

- If we use $\hat{\eta}_p = \hat{\eta}_{\mathrm{MLE}}$, then the asymptotic variance in (5) is
$$V(\hat{\eta}_{\mathrm{imp},m}) \doteq I_{\mathrm{obs}}^{-1} + m^{-1} I_{\mathrm{com}}^{-1} I_{\mathrm{mis}} I_{\mathrm{com}}^{-1}.$$
- In Bayesian imputation (or multiple imputation), the posterior values of $\eta$ are independently generated from $\eta \sim N(\hat{\eta}_{\mathrm{MLE}}, I_{\mathrm{obs}}^{-1})$, which implies that we can use $V_p = I_{\mathrm{obs}}^{-1} + m^{-1} I_{\mathrm{obs}}^{-1}$. Thus, the asymptotic variance in (5) for multiple imputation is
$$V(\hat{\eta}_{\mathrm{imp},m}) \doteq I_{\mathrm{obs}}^{-1} + m^{-1} J_{\mathrm{mis}} I_{\mathrm{obs}}^{-1} J_{\mathrm{mis}}' + m^{-1} I_{\mathrm{com}}^{-1} I_{\mathrm{mis}} I_{\mathrm{com}}^{-1}.$$
- The second term is the additional price we pay for generating the posterior values, rather than using $\hat{\eta}_{\mathrm{MLE}}$ directly.
Remark (about Theorem 4.1)

The variance in (5) has three components. Writing
$$\hat{\eta}_{\mathrm{imp},m} = \hat{\eta}_{\mathrm{MLE}} + (\hat{\eta}_{\mathrm{imp},\infty} - \hat{\eta}_{\mathrm{MLE}}) + (\hat{\eta}_{\mathrm{imp},m} - \hat{\eta}_{\mathrm{imp},\infty}),$$
we can establish that the three terms are independent and satisfy
$$V(\hat{\eta}_{\mathrm{MLE}}) = I_{\mathrm{obs}}^{-1},$$
$$V(\hat{\eta}_{\mathrm{imp},\infty} - \hat{\eta}_{\mathrm{MLE}}) = J_{\mathrm{mis}}\left\{V_p - I_{\mathrm{obs}}^{-1}\right\} J_{\mathrm{mis}}',$$
$$V(\hat{\eta}_{\mathrm{imp},m} - \hat{\eta}_{\mathrm{imp},\infty}) = m^{-1} I_{\mathrm{com}}^{-1} I_{\mathrm{mis}} I_{\mathrm{com}}^{-1}.$$
Sketched proof for Theorem 4.1

By Lemma 4.1 applied to $\bar{S}(\eta \mid \hat{\eta}_p) = 0$, we have
$$\hat{\eta}_{\mathrm{imp},\infty} - \eta_0 \cong I_{\mathrm{com}}^{-1}\, \bar{S}(\eta \mid \hat{\eta}_p).$$
Similarly, we can write
$$\hat{\eta}_{\mathrm{imp},m} - \eta_0 \cong I_{\mathrm{com}}^{-1}\, \bar{S}_m(\eta \mid \hat{\eta}_p).$$
Thus,
$$V(\hat{\eta}_{\mathrm{imp},m} - \hat{\eta}_{\mathrm{imp},\infty}) \cong I_{\mathrm{com}}^{-1}\, V\left\{\bar{S}_m(\eta \mid \hat{\eta}_p) - \bar{S}(\eta \mid \hat{\eta}_p)\right\} I_{\mathrm{com}}^{-1}$$
and
$$V\left\{\bar{S}_m(\eta \mid \hat{\eta}_p) - \bar{S}(\eta \mid \hat{\eta}_p)\right\} = \frac{1}{m} V\{S_{\mathrm{com}}(\eta) \mid y_{\mathrm{obs}}, \delta; \hat{\eta}_p\} = \frac{1}{m} V\{S_{\mathrm{mis}}(\eta) \mid y_{\mathrm{obs}}, \delta; \hat{\eta}_p\}$$
$$= \frac{1}{m} E\left\{S_{\mathrm{mis}}(\eta)^{\otimes 2} \mid y_{\mathrm{obs}}, \delta; \hat{\eta}_p\right\} \cong \frac{1}{m} E\left\{S_{\mathrm{mis}}(\eta)^{\otimes 2} \mid y_{\mathrm{obs}}, \delta; \eta\right\}.$$
2. Basic Theory for Imputation

Basic setup (Case 2: $\psi \neq \eta$):
- The parameter $\psi$ is defined by $E\{U(\psi; y)\} = 0$. Under complete response, a consistent estimator of $\psi$ can be obtained by solving $U(\psi; \mathbf{y}) = 0$.
- Assume that some part of $\mathbf{y}$, denoted by $y_{\mathrm{mis}}$, is not observed, and that $m$ imputed values, say $y_{\mathrm{mis}}^{(1)}, \ldots, y_{\mathrm{mis}}^{(m)}$, are generated from $f(y_{\mathrm{mis}} \mid y_{\mathrm{obs}}, \delta; \hat{\eta}_{\mathrm{MLE}})$, where $\hat{\eta}_{\mathrm{MLE}}$ is the MLE of $\eta_0$.
- The imputed estimating function using the $m$ imputed values is computed as
$$\bar{U}_{\mathrm{imp},m}(\psi \mid \hat{\eta}_{\mathrm{MLE}}) = \frac{1}{m} \sum_{j=1}^m U(\psi; \mathbf{y}^{(j)}), \quad (6)$$
where $\mathbf{y}^{(j)} = (y_{\mathrm{obs}}, y_{\mathrm{mis}}^{(j)})$.
- Let $\hat{\psi}_{\mathrm{imp},m}$ be the solution to $\bar{U}_{\mathrm{imp},m}(\psi \mid \hat{\eta}_{\mathrm{MLE}}) = 0$. We are interested in the asymptotic properties of $\hat{\psi}_{\mathrm{imp},m}$.
Theorem 4.2

Suppose that the parameter of interest $\psi_0$ is estimated by solving $U(\psi) = 0$ under complete response. Then, under some regularity conditions, the solution to
$$E\{U(\psi) \mid y_{\mathrm{obs}}, \delta; \hat{\eta}_{\mathrm{MLE}}\} = 0 \quad (7)$$
has mean $\psi_0$ and asymptotic variance $\tau^{-1} \Omega\, \tau^{-1\prime}$, where
$$\tau = E\{\partial U(\psi_0)/\partial \psi'\}, \qquad \Omega = V\left\{\bar{U}(\psi_0 \mid \eta_0) + \kappa\, S_{\mathrm{obs}}(\eta_0)\right\},$$
and $\kappa = E\{U(\psi_0)\, S_{\mathrm{mis}}(\eta_0)'\}\, I_{\mathrm{obs}}^{-1}$.
Sketched proof

Writing $\bar{U}(\psi \mid \eta) \equiv E\{U(\psi) \mid y_{\mathrm{obs}}, \delta; \eta\}$, the solution to (7) can be treated as the solution to the joint estimating equation
$$U^*(\psi, \eta) \equiv \begin{bmatrix} U_1(\psi, \eta) \\ U_2(\eta) \end{bmatrix} = 0,$$
where $U_1(\psi, \eta) = \bar{U}(\psi \mid \eta)$ and $U_2(\eta) = S_{\mathrm{obs}}(\eta)$. We can apply a Taylor expansion to get
$$\begin{pmatrix} \hat{\psi} \\ \hat{\eta} \end{pmatrix} \cong \begin{pmatrix} \psi_0 \\ \eta_0 \end{pmatrix} - \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}^{-1} \begin{bmatrix} U_1(\psi_0, \eta_0) \\ U_2(\eta_0) \end{bmatrix},$$
where
$$\begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{bmatrix} E(\partial U_1/\partial \psi') & E(\partial U_1/\partial \eta') \\ E(\partial U_2/\partial \psi') & E(\partial U_2/\partial \eta') \end{bmatrix}.$$
Sketched proof (Cont'd)

Note that
$$B_{11} = E\{\partial U(\psi)/\partial \psi'\}, \quad B_{12} = E\{U(\psi)\, S_{\mathrm{mis}}(\eta_0)'\}, \quad B_{21} = 0, \quad B_{22} = -I_{\mathrm{obs}}.$$
Thus,
$$\hat{\psi} \cong \psi_0 - B_{11}^{-1}\left\{U_1(\psi_0, \eta_0) - B_{12} B_{22}^{-1} U_2(\eta_0)\right\} = \psi_0 - B_{11}^{-1}\left\{U_1(\psi_0, \eta_0) + B_{12} I_{\mathrm{obs}}^{-1} U_2(\eta_0)\right\}.$$
Alternative approach

Use Randles' (1982) theorem: an estimator $\hat{\theta}(\hat{\beta})$ is asymptotically equivalent to $\hat{\theta}(\beta)$ if
$$E\left\{\frac{\partial}{\partial \beta}\, \hat{\theta}(\beta)\right\} = 0.$$
Thus, writing $\hat{\theta}_k(\beta) = \hat{\theta}(\beta) + k' S_{\mathrm{obs}}(\beta)$, we have:
1. $\hat{\theta}(\hat{\beta}) = \hat{\theta}_k(\hat{\beta})$ holds for any $k$, since $S_{\mathrm{obs}}(\hat{\beta}) = 0$.
2. If we can find $k = k^*$ such that $\hat{\theta}_{k^*}(\beta)$ satisfies Randles' condition, then we can safely ignore the effect of the sampling variability of $\hat{\beta}$ and assume that $\hat{\theta}_{k^*}(\hat{\beta}) \cong \hat{\theta}_{k^*}(\beta)$.
Example 4.2

Under the setup of Example 4.1, we are interested in obtaining the asymptotic variance of the regression imputation estimator
$$\hat{\theta}_{Id} = \frac{1}{n} \sum_{i=1}^n \left\{\delta_i y_i + (1 - \delta_i)\left(\hat{\beta}_0 + \hat{\beta}_1 x_i\right)\right\},$$
where $\hat{\beta} = (\hat{\beta}_0, \hat{\beta}_1)$ is the solution to
$$S_{\mathrm{obs}}(\beta) = \frac{1}{\sigma_e^2} \sum_{i=1}^n \delta_i (y_i - \beta_0 - \beta_1 x_i)(1, x_i)' = 0.$$
The imputed estimator is the solution to
$$E\{U(\theta) \mid y_{\mathrm{obs}}, \delta; \hat{\beta}\} = 0,$$
where $U(\theta) = \sum_{i=1}^n (y_i - \theta)/\sigma_e^2$.
Example 4.2 (Cont'd)

We have
$$E\{U(\theta) \mid y_{\mathrm{obs}}, \delta; \beta\} \equiv \bar{U}(\theta \mid \beta) = \sum_{i=1}^n \{\delta_i y_i + (1 - \delta_i)(\beta_0 + \beta_1 x_i) - \theta\}/\sigma_e^2.$$
Thus, using the linearization formula in Theorem 4.2, we have
$$\bar{U}_l(\theta \mid \beta) = \bar{U}(\theta \mid \beta) + (\kappa_0, \kappa_1)\, S_{\mathrm{obs}}(\beta), \quad (8)$$
where
$$(\kappa_0, \kappa_1)' = I_{\mathrm{obs}}^{-1}\, E\{S_{\mathrm{mis}}(\beta)\, U(\theta)\}. \quad (9)$$
Example 4.2 (Cont'd)

In this example, we have
$$\begin{pmatrix} \kappa_0 \\ \kappa_1 \end{pmatrix} = \left[E\left\{\sum_{i=1}^n \delta_i (1, x_i)'(1, x_i)\right\}\right]^{-1} E\left\{\sum_{i=1}^n (1 - \delta_i)(1, x_i)'\right\} = E\left\{\left(-1 + (n/r)(1 - g\bar{x}_r),\; (n/r)\, g\right)'\right\},$$
where $g = (\bar{x}_n - \bar{x}_r) \big/ \left\{\sum_{i=1}^n \delta_i (x_i - \bar{x}_r)^2 / r\right\}$. Thus,
$$\bar{U}_l(\theta \mid \beta)\, \sigma_e^2 = \sum_{i=1}^n \left[\delta_i (y_i - \theta) + (1 - \delta_i)(\beta_0 + \beta_1 x_i - \theta) + \delta_i (y_i - \beta_0 - \beta_1 x_i)(\kappa_0 + \kappa_1 x_i)\right].$$
Example 4.2 (Cont'd)

Note that the solution to $\bar{U}_l(\theta \mid \beta) = 0$ leads to
$$\hat{\theta}_{Id,l} = \frac{1}{n} \sum_{i=1}^n \left\{\beta_0 + \beta_1 x_i + \delta_i (1 + \kappa_0 + \kappa_1 x_i)(y_i - \beta_0 - \beta_1 x_i)\right\} = \frac{1}{n} \sum_{i=1}^n d_i,$$
where $1 + \kappa_0 + \kappa_1 x_i = (n/r)\{1 + g(x_i - \bar{x}_r)\}$. Thus, $\hat{\theta}_{Id}$ is asymptotically equivalent to $\hat{\theta}_{Id,l}$, which is the sample mean of the $d_i$, the influence function of unit $i$ on $\hat{\theta}_{Id}$. Under a uniform response mechanism, $1 + \kappa_0 + \kappa_1 x_i \cong n/r$ and the asymptotic variance of $\hat{\theta}_{Id,l}$ is
$$\frac{1}{n}\beta_1^2 \sigma_x^2 + \frac{1}{r}\sigma_e^2 = \frac{1}{n}\sigma_y^2 + \left(\frac{1}{r} - \frac{1}{n}\right)\sigma_e^2.$$
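The identity between the imputed estimator and the sample mean of the influence functions evaluated at $\hat{\beta}$ can be verified numerically; it holds exactly because the OLS normal equations make the respondent residual terms vanish. The simulated data below are an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
delta = rng.random(n) < 0.7               # uniform (MCAR) response

xr, yr = x[delta], y[delta]
r = delta.sum()
# OLS on respondents
b1 = np.sum((xr - xr.mean()) * (yr - yr.mean())) / np.sum((xr - xr.mean()) ** 2)
b0 = yr.mean() - b1 * xr.mean()

# regression imputation estimator
theta_id = np.where(delta, y, b0 + b1 * x).mean()

# influence-function form: d_i = b0 + b1*x_i + delta_i (n/r){1 + g(x_i - xbar_r)} e_i
g = (x.mean() - xr.mean()) / (np.sum((xr - xr.mean()) ** 2) / r)
d = (b0 + b1 * x) + delta * (n / r) * (1 + g * (x - xr.mean())) * (y - b0 - b1 * x)
print(theta_id, d.mean())
```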
Remark

Let $X_1, \ldots, X_n$ be an IID sample from $f(x; \theta_0)$, $\theta_0 \in \Theta$, and suppose we are interested in estimating $\gamma_0 = \gamma(\theta_0)$, where $\gamma(\cdot): \Theta \to \mathbb{R}^k$. An estimator $\hat{\gamma} = \hat{\gamma}_n$ is called asymptotically linear if there exists a random vector $\psi(X)$ such that
$$\sqrt{n}\,(\hat{\gamma}_n - \gamma_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \psi(X_i) + o_p(1) \quad (10)$$
with $E_{\theta_0}\{\psi(X)\} = 0$ and $E_{\theta_0}\{\psi(X)\psi(X)'\}$ finite and nonsingular. Here, $Z_n = o_p(1)$ means that $Z_n$ converges to zero in probability.
Remark

- The function $\psi(x)$ is referred to as an influence function. The phrase "influence function" was used by Hampel (JASA, 1974) and is motivated by the fact that, to first order, $\psi(x)$ is the influence of a single observation on the estimator $\hat{\gamma} = \hat{\gamma}(X_1, \ldots, X_n)$.
- The asymptotic properties of an asymptotically linear estimator $\hat{\gamma}_n$ can be summarized by considering only its influence function. Since $\psi(X)$ has zero mean, the CLT tells us that
$$\frac{1}{\sqrt{n}} \sum_{i=1}^n \psi(X_i) \stackrel{L}{\to} N\left[0, E_{\theta_0}\{\psi(X)\psi(X)'\}\right]. \quad (11)$$
Thus, combining (10) with (11) and applying Slutsky's theorem, we have
$$\sqrt{n}\,(\hat{\gamma}_n - \gamma_0) \stackrel{L}{\to} N\left[0, E_{\theta_0}\{\psi(X)\psi(X)'\}\right].$$
3. Variance Estimation After Imputation
Deterministic Imputation

- In Example 4.2, the imputed estimator can be written as $\hat{\theta}_{Id}(\hat{\beta})$. Note that we can write the deterministic imputed estimator as
$$\hat{\theta}_{Id} = n^{-1} \sum_{i=1}^n \hat{y}_i(\hat{\beta}),$$
where $\hat{y}_i(\hat{\beta}) = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
- In general, the asymptotic variance of $\hat{\theta}_{Id} = \hat{\theta}(\hat{\beta})$ is different from the asymptotic variance of $\hat{\theta}(\beta)$.
As in Example 4.2, if we can find $d_i = d_i(\beta)$ such that
$$\hat{\theta}_{Id}(\hat{\beta}) = n^{-1} \sum_{i=1}^n d_i(\hat{\beta}) \cong n^{-1} \sum_{i=1}^n d_i(\beta),$$
then the asymptotic variance of $\hat{\theta}_{Id}$ is equal to the asymptotic variance of $\bar{d}_n = n^{-1} \sum_{i=1}^n d_i(\beta)$. Note that, if the $(x_i, y_i, \delta_i)$ are IID, then the $d_i = d(x_i, y_i, \delta_i)$ are also IID. Thus, the variance of $\bar{d}_n = n^{-1} \sum_{i=1}^n d_i$ is unbiasedly estimated by
$$\hat{V}(\bar{d}_n) = \frac{1}{n} \frac{1}{n-1} \sum_{i=1}^n \left(d_i - \bar{d}_n\right)^2. \quad (12)$$
Unfortunately, we cannot compute $\hat{V}(\bar{d}_n)$ in (12), since $d_i = d_i(\beta)$ is a function of unknown parameters. Thus, we use $\hat{d}_i = d_i(\hat{\beta})$ in (12) to get a consistent variance estimator for the imputed estimator.
Stochastic Imputation

Instead of deterministic imputation, suppose that a stochastic imputation is used, such that
$$\hat{\theta}_I = n^{-1} \sum_{i=1}^n \left\{\delta_i y_i + (1 - \delta_i)\left(\hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{e}_i^*\right)\right\},$$
where the $\hat{e}_i^*$ are the additional noise terms in the stochastic imputation. Often the $\hat{e}_i^*$ are randomly selected from the empirical distribution of the sample residuals among the respondents. The variance of the imputed estimator can be decomposed into two parts:
$$V(\hat{\theta}_I) = V(\hat{\theta}_{Id}) + V(\hat{\theta}_I - \hat{\theta}_{Id}), \quad (13)$$
where the first part is the deterministic part and the second part is the additional variance due to stochastic imputation. The first part can be estimated by the linearization method discussed above. The second part is called the imputation variance.
If we require the imputation mechanism to satisfy
$$\sum_{i=1}^n (1 - \delta_i)\, \hat{e}_i^* = 0,$$
then the imputation variance is equal to zero. Often the variance of $\hat{\theta}_I - \hat{\theta}_{Id} = n^{-1} \sum_{i=1}^n (1 - \delta_i)\hat{e}_i^*$ can be computed under the known imputation mechanism. For example, if simple random sampling without replacement is used, then
$$V(\hat{\theta}_I - \hat{\theta}_{Id}) = E\left\{V(\hat{\theta}_I \mid y_{\mathrm{obs}}, \delta)\right\} = n^{-2}\, m\, (1 - m/r)\,(r-1)^{-1} \sum_{i=1}^n \delta_i \hat{e}_i^2,$$
where $m = n - r$.
Imputed estimator for general parameters

We now discuss the general case of parameter estimation, where the parameter of interest $\psi$ is estimated by the solution $\hat{\psi}_n$ to
$$\sum_{i=1}^n U(\psi; y_i) = 0 \quad (14)$$
under complete response of $y_1, \ldots, y_n$. Under the existence of missing data, we can use the imputed estimating equation
$$\bar{U}_m(\psi) \equiv m^{-1} \sum_{i=1}^n \sum_{j=1}^m U(\psi; y_i^{(j)}) = 0, \quad (15)$$
where $y_i^{(j)} = (y_{i,\mathrm{obs}}, y_{i,\mathrm{mis}}^{(j)})$ and the $y_{i,\mathrm{mis}}^{(j)}$ are randomly generated from the conditional distribution $h(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}, \delta_i; \hat{\eta}_p)$, with $\hat{\eta}_p$ estimated by solving
$$\hat{U}_p(\eta) \equiv \sum_{i=1}^n U_p(\eta; y_{i,\mathrm{obs}}) = 0. \quad (16)$$
To apply the linearization method, we first compute the conditional expectation of $U(\psi; y_i)$ given $(y_{i,\mathrm{obs}}, \delta_i)$, evaluated at $\hat{\eta}_p$. That is, compute
$$\bar{U}(\psi \mid \hat{\eta}_p) = \sum_{i=1}^n \bar{U}_i(\psi \mid \hat{\eta}_p) = \sum_{i=1}^n E\{U(\psi; y_i) \mid y_{i,\mathrm{obs}}, \delta_i; \hat{\eta}_p\}. \quad (17)$$
Let $\hat{\psi}_R$ be the solution to $\bar{U}(\psi \mid \hat{\eta}_p) = 0$. Using the linearization technique, we have
$$\bar{U}(\psi \mid \hat{\eta}_p) \cong \bar{U}(\psi \mid \eta_0) + E\left\{\frac{\partial}{\partial \eta'} \bar{U}(\psi \mid \eta_0)\right\} (\hat{\eta}_p - \eta_0) \quad (18)$$
and
$$0 = \hat{U}_p(\hat{\eta}_p) \cong \hat{U}_p(\eta_0) + E\left\{\frac{\partial}{\partial \eta'} \hat{U}_p(\eta_0)\right\} (\hat{\eta}_p - \eta_0). \quad (19)$$
Thus, combining (18) and (19), we have
$$\bar{U}(\psi \mid \hat{\eta}_p) \cong \bar{U}(\psi \mid \eta_0) + \kappa(\psi)\, \hat{U}_p(\eta_0), \quad (20)$$
where
$$\kappa(\psi) = -E\left\{\frac{\partial}{\partial \eta'} \bar{U}(\psi \mid \eta_0)\right\} \left[E\left\{\frac{\partial}{\partial \eta'} \hat{U}_p(\eta_0)\right\}\right]^{-1}.$$
Thus, writing
$$\bar{U}_l(\psi \mid \eta_0) = \sum_{i=1}^n \left\{\bar{U}_i(\psi \mid \eta_0) + \kappa(\psi)\, U_p(\eta_0; y_{i,\mathrm{obs}})\right\} = \sum_{i=1}^n q_i(\psi \mid \eta_0),$$
with $q_i(\psi \mid \eta_0) = \bar{U}_i(\psi \mid \eta_0) + \kappa(\psi)\, U_p(\eta_0; y_{i,\mathrm{obs}})$, the variance of $\bar{U}(\psi \mid \hat{\eta}_p)$ is asymptotically equal to the variance of $\bar{U}_l(\psi \mid \eta_0)$.
Thus, the sandwich-type variance estimator for $\hat{\psi}_R$ is
$$\hat{V}(\hat{\psi}_R) = \hat{\tau}_q^{-1}\, \hat{\Omega}_q\, \hat{\tau}_q^{-1\prime}, \quad (21)$$
where
$$\hat{\tau}_q = n^{-1} \sum_{i=1}^n \dot{q}_i(\hat{\psi}_R \mid \hat{\eta}_p), \qquad \hat{\Omega}_q = n^{-1}(n-1)^{-1} \sum_{i=1}^n (\hat{q}_i - \bar{q}_n)^{\otimes 2},$$
$\dot{q}_i(\psi \mid \eta) = \partial q_i(\psi \mid \eta)/\partial \psi'$, $\bar{q}_n = n^{-1} \sum_{i=1}^n \hat{q}_i$, and $\hat{q}_i = q_i(\hat{\psi}_R \mid \hat{\eta}_p)$. Note that
$$\hat{\tau}_q = n^{-1} \sum_{i=1}^n \dot{q}_i(\hat{\psi}_R \mid \hat{\eta}_p) = n^{-1} \sum_{i=1}^n \frac{\partial}{\partial \psi'} E\{U(\hat{\psi}_R; y_i) \mid y_{i,\mathrm{obs}}, \delta_i; \hat{\eta}_p\},$$
because $\hat{\eta}_p$ is the solution to (16).
Example 4.3

Assume that the original sample is decomposed into $G$ disjoint groups (often called imputation cells) and that the sample observations are IID within the same cell. That is,
$$y_i \mid i \in S_g \;\stackrel{\mathrm{i.i.d.}}{\sim}\; (\mu_g, \sigma_g^2), \quad (22)$$
where $S_g$ is the set of sample indices in cell $g$. Assume there are $n_g$ sample elements in cell $g$, of which $r_g$ are observed. For deterministic imputation, let $\hat{\mu}_g = r_g^{-1} \sum_{i \in S_g} \delta_i y_i$ be the $g$-th cell mean of $y$ among the respondents. The (deterministically) imputed estimator of $\theta = E(Y)$ is, under MAR,
$$\hat{\theta}_{Id} = n^{-1} \sum_{g=1}^G \sum_{i \in S_g} \{\delta_i y_i + (1 - \delta_i)\hat{\mu}_g\} = n^{-1} \sum_{g=1}^G n_g \hat{\mu}_g. \quad (23)$$
Example 4.3 (Cont'd)

Using the linearization technique in (20), the imputed estimator can be expressed as
$$\hat{\theta}_{Id} \cong n^{-1} \sum_{g=1}^G \sum_{i \in S_g} \left\{\mu_g + \frac{n_g}{r_g}\, \delta_i (y_i - \mu_g)\right\}, \quad (24)$$
and the plug-in variance estimator can be expressed as
$$\hat{V}(\hat{\theta}_{Id}) = \frac{1}{n} \frac{1}{n-1} \sum_{i=1}^n \left(\hat{d}_i - \bar{d}_n\right)^2, \quad (25)$$
where $\hat{d}_i = \hat{\mu}_g + (n_g/r_g)\,\delta_i(y_i - \hat{\mu}_g)$ for $i \in S_g$ and $\bar{d}_n = n^{-1} \sum_{i=1}^n \hat{d}_i$.
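A numerical sketch of (23)-(25), with four illustrative cells whose sizes, means, and response rate are all assumptions made for the example:

```python
import numpy as np

rng = np.random.default_rng(3)
G, n_g = 4, 50
n = G * n_g
cell = np.repeat(np.arange(G), n_g)
y = np.array([0.0, 1.0, 2.0, 3.0])[cell] + rng.normal(size=n)
delta = rng.random(n) < 0.7                          # response within cells

r_g = np.array([delta[cell == g].sum() for g in range(G)])
mu_hat = np.array([y[(cell == g) & delta].mean() for g in range(G)])

# imputed estimator (23): algebraically identical to n^{-1} sum_g n_g mu_hat_g
theta_id = np.where(delta, y, mu_hat[cell]).mean()
theta_alt = (n_g * mu_hat).sum() / n

# plug-in variance estimator (25)
d_hat = mu_hat[cell] + (n_g / r_g)[cell] * delta * (y - mu_hat[cell])
v_hat = np.sum((d_hat - d_hat.mean()) ** 2) / (n * (n - 1))
print(theta_id, theta_alt, v_hat)
```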
Example 4.3 (Cont'd)

If a stochastic imputation is used, where each imputed value is randomly selected from the set of respondents in the same cell, then we can write
$$\hat{\theta}_{Is} = n^{-1} \sum_{g=1}^G \sum_{i \in S_g} \{\delta_i y_i + (1 - \delta_i) y_i^*\}. \quad (26)$$
Writing
$$\hat{\theta}_{Is} = \hat{\theta}_{Id} + n^{-1} \sum_{g=1}^G \sum_{i \in S_g} (1 - \delta_i)(y_i^* - \hat{\mu}_g),$$
the variance of the second term can be estimated by
$$n^{-2} \sum_{g=1}^G \sum_{i \in S_g} (1 - \delta_i)(y_i^* - \hat{\mu}_g)^2$$
if the imputed values are generated independently, conditional on the respondents.
4. Replication Variance Estimation
Replication variance estimation (under complete response)

Let $\hat{\theta}_n$ be the complete-sample estimator of $\theta$. The replication variance estimator of $\hat{\theta}_n$ takes the form
$$\hat{V}_{\mathrm{rep}}(\hat{\theta}_n) = \sum_{k=1}^L c_k \left(\hat{\theta}_n^{(k)} - \hat{\theta}_n\right)^2, \quad (27)$$
where $L$ is the number of replicates, $c_k$ is the replication factor associated with replication $k$, and $\hat{\theta}_n^{(k)}$ is the $k$-th replicate of $\hat{\theta}_n$. If $\hat{\theta}_n = \sum_{i=1}^n y_i/n$, then we can write $\hat{\theta}_n^{(k)} = \sum_{i=1}^n w_i^{(k)} y_i$ for some replication weights $w_1^{(k)}, w_2^{(k)}, \ldots, w_n^{(k)}$.
Replication variance estimation (under complete response)

For example, in the jackknife method, we have $L = n$, $c_k = (n-1)/n$, and
$$w_i^{(k)} = \begin{cases} (n-1)^{-1} & \text{if } i \neq k \\ 0 & \text{if } i = k. \end{cases}$$
If we apply this jackknife method to $\hat{\theta}_n = \sum_{i=1}^n y_i/n$, the resulting jackknife estimator in (27) is algebraically equivalent to
$$\frac{1}{n} \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y}_n)^2.$$
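The stated algebraic equivalence is easy to confirm numerically (toy data, sizes chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
y = rng.normal(size=n)

theta = y.mean()
# delete-one jackknife replicates of the sample mean
reps = np.array([np.delete(y, k).mean() for k in range(n)])
v_jk = (n - 1) / n * np.sum((reps - theta) ** 2)

# textbook identity: equals {n(n-1)}^{-1} * sum (y_i - ybar)^2
v_lin = np.sum((y - theta) ** 2) / (n * (n - 1))
print(v_jk, v_lin)
```

The identity follows because deleting unit $k$ shifts the mean by $(\bar{y} - y_k)/(n-1)$, so the squared deviations of the replicates are just rescaled squared residuals.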
Replication variance estimation (under complete response)

Under some regularity conditions, for $\hat{\theta}_n = g(\bar{y}_n)$, the replication variance estimator of $\hat{\theta}_n$, defined by
$$\hat{V}_{\mathrm{rep}}(\hat{\theta}_n) = \sum_{k=1}^L c_k \left(\hat{\theta}_n^{(k)} - \hat{\theta}_n\right)^2 \quad (28)$$
with $\hat{\theta}_n^{(k)} = g(\bar{y}_n^{(k)})$, satisfies
$$\hat{V}_{\mathrm{rep}}(\hat{\theta}_n) \cong \left\{g'(\bar{y}_n)\right\}^2 \hat{V}_{\mathrm{rep}}(\bar{y}_n).$$
Thus, the replication variance estimator (28) is asymptotically equivalent to the linearized variance estimator.
Remark

If the parameter of interest, denoted by $\psi$, is estimated by $\hat{\psi}$, obtained by solving an estimating equation $\sum_{i=1}^n U(\psi; y_i) = 0$, then a consistent variance estimator can be obtained by the sandwich formula. The complete-sample variance estimator of $\hat{\psi}$ is
$$\hat{V}(\hat{\psi}) = \hat{\tau}_u^{-1}\, \hat{\Omega}_u\, \hat{\tau}_u^{-1\prime}, \quad (29)$$
where
$$\hat{\tau}_u = n^{-1} \sum_{i=1}^n \dot{U}(\hat{\psi}; y_i), \qquad \hat{\Omega}_u = n^{-1}(n-1)^{-1} \sum_{i=1}^n (\hat{u}_i - \bar{u}_n)^{\otimes 2},$$
$\dot{U}(\psi; y) = \partial U(\psi; y)/\partial \psi'$, $\bar{u}_n = n^{-1} \sum_{i=1}^n \hat{u}_i$, and $\hat{u}_i = U(\hat{\psi}; y_i)$.
If we want to use a replication method of the form (27), we can construct the replication variance estimator of $\hat{\psi}$ as
$$\hat{V}_{\mathrm{rep}}(\hat{\psi}) = \sum_{k=1}^L c_k \left(\hat{\psi}^{(k)} - \hat{\psi}\right)^2, \quad (30)$$
where $\hat{\psi}^{(k)}$ is computed from
$$\hat{U}^{(k)}(\psi) \equiv \sum_{i=1}^n w_i^{(k)} U(\psi; y_i) = 0. \quad (31)$$
A one-step approximation of $\hat{\psi}^{(k)}$ is to use
$$\hat{\psi}_1^{(k)} = \hat{\psi} - \left\{\dot{U}^{(k)}(\hat{\psi})\right\}^{-1} \hat{U}^{(k)}(\hat{\psi}) \quad (32)$$
or, even more simply, to use
$$\hat{\psi}_1^{(k)} = \hat{\psi} - \left\{\dot{U}(\hat{\psi})\right\}^{-1} \hat{U}^{(k)}(\hat{\psi}), \quad (33)$$
where $\dot{U}^{(k)}(\psi) = \partial \hat{U}^{(k)}(\psi)/\partial \psi'$. The replication variance estimator using (33) is algebraically equivalent to
$$\left\{\dot{U}(\hat{\psi})\right\}^{-1} \left[\sum_{k=1}^L c_k \left\{\hat{U}^{(k)}(\hat{\psi}) - \hat{U}(\hat{\psi})\right\}^{\otimes 2}\right] \left\{\dot{U}(\hat{\psi})'\right\}^{-1},$$
which is very close to the sandwich variance formula in (29).
Back to Example 4.1

For the regression imputation in Example 4.1,
$$\hat{\theta}_{Id} = \frac{1}{n} \sum_{i=1}^n \left\{\delta_i y_i + (1 - \delta_i)\left(\hat{\beta}_0 + \hat{\beta}_1 x_i\right)\right\}.$$
The replication variance estimator of $\hat{\theta}_{Id}$ is computed as
$$\hat{V}_{\mathrm{rep}}(\hat{\theta}_{Id}) = \sum_{k=1}^L c_k \left(\hat{\theta}_{Id}^{(k)} - \hat{\theta}_{Id}\right)^2, \quad (34)$$
where
$$\hat{\theta}_{Id}^{(k)} = \sum_{i=1}^n w_i^{(k)} \left\{\delta_i y_i + (1 - \delta_i)\left(\hat{\beta}_0^{(k)} + \hat{\beta}_1^{(k)} x_i\right)\right\}$$
and $(\hat{\beta}_0^{(k)}, \hat{\beta}_1^{(k)})$ is the solution to
$$\sum_{i=1}^n w_i^{(k)} \delta_i (y_i - \beta_0 - \beta_1 x_i)(1, x_i) = (0, 0).$$
Example 4.4

We now return to the setup of an earlier example with binary $y$. In this case, the deterministically imputed estimator of $\theta = E(Y)$ is constructed as
$$\hat{\theta}_{Id} = n^{-1} \sum_{i=1}^n \{\delta_i y_i + (1 - \delta_i)\hat{p}_{0i}\}, \quad (35)$$
where $\hat{p}_{0i}$ is the predictor of $y_i$ given $x_i$ and $\delta_i = 0$. That is,
$$\hat{p}_{0i} = \frac{p(x_i; \hat{\beta})\{1 - \pi(x_i, 1; \hat{\phi})\}}{\{1 - p(x_i; \hat{\beta})\}\{1 - \pi(x_i, 0; \hat{\phi})\} + p(x_i; \hat{\beta})\{1 - \pi(x_i, 1; \hat{\phi})\}},$$
where $\hat{\beta}$ and $\hat{\phi}$ are jointly estimated by the EM algorithm.
Example 4.4 (Cont'd)

E-step:
$$\bar{S}_1\left(\beta \mid \beta^{(t)}, \phi^{(t)}\right) = \sum_{\delta_i = 1} \{y_i - p_i(\beta)\}\, x_i + \sum_{\delta_i = 0} \sum_{j=0}^1 w_{ij(t)}^* \{j - p_i(\beta)\}\, x_i,$$
where
$$w_{ij(t)}^* = \Pr\left(Y_i = j \mid x_i, \delta_i = 0; \beta^{(t)}, \phi^{(t)}\right) = \frac{\Pr\left(Y_i = j \mid x_i; \beta^{(t)}\right) \Pr\left(\delta_i = 0 \mid x_i, j; \phi^{(t)}\right)}{\sum_{y=0}^1 \Pr\left(Y_i = y \mid x_i; \beta^{(t)}\right) \Pr\left(\delta_i = 0 \mid x_i, y; \phi^{(t)}\right)},$$
and
$$\bar{S}_2\left(\phi \mid \beta^{(t)}, \phi^{(t)}\right) = \sum_{\delta_i = 1} \{\delta_i - \pi(x_i, y_i; \phi)\}\,(x_i', y_i)' + \sum_{\delta_i = 0} \sum_{j=0}^1 w_{ij(t)}^* \{\delta_i - \pi(x_i, j; \phi)\}\,(x_i', j)'.$$
Example 4.4 (Cont'd)

M-step: the parameter estimates are updated by solving
$$\left[\bar{S}_1\left(\beta \mid \beta^{(t)}, \phi^{(t)}\right),\; \bar{S}_2\left(\phi \mid \beta^{(t)}, \phi^{(t)}\right)\right] = (0, 0)$$
for $\beta$ and $\phi$.
Example 4.4 (Cont'd)

For replication variance estimation, we can use (34) with
$$\hat{\theta}_{Id}^{(k)} = \sum_{i=1}^n w_i^{(k)} \left\{\delta_i y_i + (1 - \delta_i)\hat{p}_{0i}^{(k)}\right\}, \quad (36)$$
where
$$\hat{p}_{0i}^{(k)} = \frac{p(x_i; \hat{\beta}^{(k)})\{1 - \pi(x_i, 1; \hat{\phi}^{(k)})\}}{\{1 - p(x_i; \hat{\beta}^{(k)})\}\{1 - \pi(x_i, 0; \hat{\phi}^{(k)})\} + p(x_i; \hat{\beta}^{(k)})\{1 - \pi(x_i, 1; \hat{\phi}^{(k)})\}}$$
and $(\hat{\beta}^{(k)}, \hat{\phi}^{(k)})$ is obtained by solving the mean score equations with the weights replaced by the replication weights $w_i^{(k)}$.
Example 4.4 (Cont'd)

That is, $(\hat{\beta}^{(k)}, \hat{\phi}^{(k)})$ is the solution to
$$\bar{S}_1^{(k)}(\beta, \phi) \equiv \sum_{\delta_i = 1} w_i^{(k)} \{y_i - p(x_i; \beta)\}\, x_i + \sum_{\delta_i = 0} w_i^{(k)} \sum_{y=0}^1 w_{iy}^*(\beta, \phi)\{y - p(x_i; \beta)\}\, x_i = 0,$$
$$\bar{S}_2^{(k)}(\beta, \phi) \equiv \sum_{\delta_i = 1} w_i^{(k)} \{\delta_i - \pi(x_i, y_i; \phi)\}\,(x_i', y_i)' + \sum_{\delta_i = 0} w_i^{(k)} \sum_{y=0}^1 w_{iy}^*(\beta, \phi)\{\delta_i - \pi(x_i, y; \phi)\}\,(x_i', y)' = 0,$$
where
$$w_{iy}^*(\beta, \phi) = \frac{\Pr(Y_i = y \mid x_i; \beta)\{1 - \pi(x_i, y; \phi)\}}{\sum_{z=0}^1 \Pr(Y_i = z \mid x_i; \beta)\{1 - \pi(x_i, z; \phi)\}}.$$
6. Fractional Imputation
Monte Carlo EM

Remark:
- Monte Carlo EM can be used as a frequentist approach to imputation.
- Convergence is not guaranteed (for fixed $m$).
- The E-step can be computationally heavy (one may use an MCMC method).
Parametric Fractional Imputation (Kim, 2011)

Parametric fractional imputation:
1. Generate more than one (say $m$) imputed values of $y_{\mathrm{mis},i}$: $y_{\mathrm{mis},i}^{*(1)}, \ldots, y_{\mathrm{mis},i}^{*(m)}$, from some (initial) density $h(y_{\mathrm{mis},i})$.
2. Create a weighted data set
$$\left\{\left(w_{ij}^*, y_{ij}^*\right);\; j = 1, 2, \ldots, m;\; i = 1, 2, \ldots, n\right\},$$
where $\sum_{j=1}^m w_{ij}^* = 1$, $y_{ij}^* = (y_{\mathrm{obs},i}, y_{\mathrm{mis},i}^{*(j)})$, $w_{ij}^* \propto f(y_{ij}^*, \delta_i; \hat{\eta})/h(y_{\mathrm{mis},i}^{*(j)})$, $\hat{\eta}$ is the maximum likelihood estimator of $\eta$, and $f(y, \delta; \eta)$ is the joint density of $(y, \delta)$.
3. The weights $w_{ij}^*$ are the normalized importance weights and can be called fractional weights.

If $y_{\mathrm{mis},i}$ is categorical, then simply use all possible values of $y_{\mathrm{mis},i}$ as the imputed values.
Remark

- Importance sampling idea: for sufficiently large $m$,
$$\sum_{j=1}^m w_{ij}^*\, g\left(y_{ij}^*\right) \cong \frac{\int g(y_i)\, \dfrac{f(y_i, \delta_i; \hat{\eta})}{h(y_{\mathrm{mis},i})}\, h(y_{\mathrm{mis},i})\, dy_{\mathrm{mis},i}}{\int \dfrac{f(y_i, \delta_i; \hat{\eta})}{h(y_{\mathrm{mis},i})}\, h(y_{\mathrm{mis},i})\, dy_{\mathrm{mis},i}} = E\{g(y_i) \mid y_{\mathrm{obs},i}, \delta_i; \hat{\eta}\}$$
for any $g$ such that the expectation exists.
- In the importance sampling literature, $h(\cdot)$ is called the proposal distribution and $f(\cdot)$ is called the target distribution.
- We do not need to compute the conditional distribution $f(y_{\mathrm{mis},i} \mid y_{\mathrm{obs},i}, \delta_i; \eta)$. Only the joint distribution $f(y_{\mathrm{obs},i}, y_{\mathrm{mis},i}, \delta_i; \eta)$ is needed, because
$$\frac{f(y_{\mathrm{obs},i}, y_{\mathrm{mis},i}^{*(j)}, \delta_i; \hat{\eta})/h(y_{\mathrm{mis},i}^{*(j)})}{\sum_{k=1}^m f(y_{\mathrm{obs},i}, y_{\mathrm{mis},i}^{*(k)}, \delta_i; \hat{\eta})/h(y_{\mathrm{mis},i}^{*(k)})} = \frac{f(y_{\mathrm{mis},i}^{*(j)} \mid y_{\mathrm{obs},i}, \delta_i; \hat{\eta})/h(y_{\mathrm{mis},i}^{*(j)})}{\sum_{k=1}^m f(y_{\mathrm{mis},i}^{*(k)} \mid y_{\mathrm{obs},i}, \delta_i; \hat{\eta})/h(y_{\mathrm{mis},i}^{*(k)})}.$$
EM algorithm by fractional imputation

1. Imputation step: generate $y_{i,\mathrm{mis}}^{*(j)} \sim h(y_{i,\mathrm{mis}})$.
2. Weighting step: compute
$$w_{ij(t)}^* \propto f\left(y_{ij}^*, \delta_i; \hat{\eta}_{(t)}\right)/h\left(y_{i,\mathrm{mis}}^{*(j)}\right),$$
where $\sum_{j=1}^m w_{ij(t)}^* = 1$.
3. M-step: update
$$\hat{\eta}_{(t+1)} = \arg\max_{\eta} \sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^* \log f\left(\eta; y_{ij}^*, \delta_i\right).$$
4. (Optional) Check whether $w_{ij(t)}^*$ is too large for some $j$. If so, set $h(y_{i,\mathrm{mis}}) = f(y_{i,\mathrm{mis}} \mid y_{i,\mathrm{obs}}; \hat{\eta}_{(t)})$ and go to Step 1.
5. Repeat Steps 2-4 until convergence.
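The steps above can be sketched for a toy problem: estimating the mean $\mu$ of $N(\mu, 1)$ when 40 of 100 values are missing completely at random. The proposal $h = N(0, 3^2)$ and all numerical constants are assumptions made for illustration; the fixed point should approximate the respondent mean (the MLE here), up to importance-sampling error. Note the draws are made only once, and only the fractional weights change across iterations.

```python
import numpy as np

rng = np.random.default_rng(5)
n, r, m = 100, 60, 4000
y_obs = rng.normal(loc=2.0, scale=1.0, size=r)     # respondents

# Step 1 (imputation, done once): draw from proposal h = N(0, 3^2)
y_imp = rng.normal(loc=0.0, scale=3.0, size=(n - r, m))
h_dens = np.exp(-0.5 * (y_imp / 3.0) ** 2) / 3.0   # density up to a shared constant

mu = 0.0                                           # initial value of eta = mu
for _ in range(100):
    # Step 2 (weighting): w*_ij proportional to f(y*_ij; mu) / h(y*_ij), normalized over j
    w = np.exp(-0.5 * (y_imp - mu) ** 2) / h_dens
    w /= w.sum(axis=1, keepdims=True)
    # Step 3 (M-step): weighted complete-data MLE of mu
    mu = (y_obs.sum() + (w * y_imp).sum()) / n

print(mu, y_obs.mean())
```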
Remark

- Imputation step + weighting step = E-step.
- The imputed values are not changed across the EM iterations; only the fractional weights are changed.
  1. Computationally efficient (because we use importance sampling only once).
  2. Convergence is achieved (because the imputed values are not changed); see Theorem 4.5.
- For sufficiently large $t$, $\hat{\eta}_{(t)} \to \hat{\eta}^*$. Also, for sufficiently large $m$, $\hat{\eta}^* \to \hat{\eta}_{\mathrm{MLE}}$.
- For estimation of $\psi$ in $E\{U(\psi; Y)\} = 0$, simply solve
$$\frac{1}{n} \sum_{i=1}^n \sum_{j=1}^m w_{ij}^*\, U(\psi; y_{ij}^*) = 0,$$
where $w_{ij}^* = w_{ij}^*(\hat{\eta})$ and $\hat{\eta}$ is obtained from the above EM algorithm.
Theorem 4.5 (Theorem 1 of Kim (2011))

Let
$$Q^*\left(\eta \mid \hat{\eta}_{(t)}\right) = \sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^* \log f\left(\eta; y_{ij}^*, \delta_i\right).$$
If
$$Q^*\left(\hat{\eta}_{(t+1)} \mid \hat{\eta}_{(t)}\right) \geq Q^*\left(\hat{\eta}_{(t)} \mid \hat{\eta}_{(t)}\right), \quad (37)$$
then
$$l_{\mathrm{obs}}^*\left(\hat{\eta}_{(t+1)}\right) \geq l_{\mathrm{obs}}^*\left(\hat{\eta}_{(t)}\right), \quad (38)$$
where $l_{\mathrm{obs}}^*(\eta) = \sum_{i=1}^n \ln\{f_{\mathrm{obs}(i)}^*(y_{i,\mathrm{obs}}, \delta_i; \eta)\}$ and
$$f_{\mathrm{obs}(i)}^*(y_{i,\mathrm{obs}}, \delta_i; \eta) = \frac{1}{m} \sum_{j=1}^m f\left(y_{ij}^*, \delta_i; \eta\right)/h_m\left(y_{i,\mathrm{mis}}^{*(j)}\right).$$
Proof

By Jensen's inequality,
$$l_{\mathrm{obs}}^*\left(\hat{\eta}_{(t+1)}\right) - l_{\mathrm{obs}}^*\left(\hat{\eta}_{(t)}\right) = \sum_{i=1}^n \ln\left\{\sum_{j=1}^m w_{ij(t)}^* \frac{f(y_{ij}^*, \delta_i; \hat{\eta}_{(t+1)})}{f(y_{ij}^*, \delta_i; \hat{\eta}_{(t)})}\right\} \geq \sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^* \ln\left\{\frac{f(y_{ij}^*, \delta_i; \hat{\eta}_{(t+1)})}{f(y_{ij}^*, \delta_i; \hat{\eta}_{(t)})}\right\} = Q^*\left(\hat{\eta}_{(t+1)} \mid \hat{\eta}_{(t)}\right) - Q^*\left(\hat{\eta}_{(t)} \mid \hat{\eta}_{(t)}\right).$$
Therefore, (37) implies (38).
Example 4.11: Return to Example 3.15

Fractional imputation:
1. Imputation step: generate $y_i^{*(1)}, \ldots, y_i^{*(m)}$ from $f(y_i \mid x_i; \hat{\theta}_{(0)})$.
2. Weighting step: using the $m$ imputed values generated from Step 1, compute the fractional weights as
$$w_{ij(t)}^* \propto \frac{f\left(y_i^{*(j)} \mid x_i; \hat{\theta}_{(t)}\right)\left\{1 - \pi\left(x_i, y_i^{*(j)}; \hat{\phi}_{(t)}\right)\right\}}{f\left(y_i^{*(j)} \mid x_i; \hat{\theta}_{(0)}\right)},$$
where
$$\pi(x_i, y_i; \hat{\phi}) = \frac{\exp\left(\hat{\phi}_0 + \hat{\phi}_1 x_i + \hat{\phi}_2 y_i\right)}{1 + \exp\left(\hat{\phi}_0 + \hat{\phi}_1 x_i + \hat{\phi}_2 y_i\right)}.$$
Example 4.11 (Cont'd)

Using the imputed data and the fractional weights, the M-step can be implemented by solving
$$\sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^*\, S\left(\theta; x_i, y_i^{*(j)}\right) = 0$$
and
$$\sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^* \left\{\delta_i - \pi\left(\phi; x_i, y_i^{*(j)}\right)\right\} \left(1, x_i, y_i^{*(j)}\right) = 0, \quad (39)$$
where $S(\theta; x_i, y_i) = \partial \log f(y_i \mid x_i; \theta)/\partial \theta$.
Example 4.12: Back to Example 3.18 (GLMM)

- Level-1 model: $y_{ij} \sim f_1(y_{ij} \mid x_{ij}, a_i; \theta_1)$ for some fixed $\theta_1$, with $a_i$ random.
- Level-2 model: $a_i \sim f_2(a_i; \theta_2)$.
- Latent variable: $a_i$.
- We are interested in generating $a_i$ from
$$p(a_i \mid x_i, y_i; \theta_1, \theta_2) \propto \left\{\prod_{j=1}^{n_i} f_1(y_{ij} \mid x_{ij}, a_i; \theta_1)\right\} f_2(a_i; \theta_2).$$
Example 4.12 (Cont'd)

E-step:
1. Imputation step: generate $a_i^{*(1)}, \ldots, a_i^{*(m)}$ from $f_2(a_i; \hat{\theta}_2^{(t)})$.
2. Weighting step: using the $m$ imputed values generated from Step 1, compute the fractional weights as
$$w_{ij(t)}^* \propto g_1\left(y_i \mid x_i, a_i^{*(j)}; \hat{\theta}_1^{(t)}\right),$$
where $g_1(y_i \mid x_i, a_i; \theta_1) = \prod_{j=1}^{n_i} f_1(y_{ij} \mid x_{ij}, a_i; \theta_1)$.

M-step: update the parameters by solving
$$\sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^*\, S_1\left(\theta_1; x_i, y_i, a_i^{*(j)}\right) = 0, \qquad \sum_{i=1}^n \sum_{j=1}^m w_{ij(t)}^*\, S_2\left(\theta_2; a_i^{*(j)}\right) = 0.$$
Example 4.13: Measurement error model

- We are interested in estimating $\theta$ in $f(y \mid x; \theta)$.
- Instead of observing $x$, we observe $z$, which can be highly correlated with $x$.
- Thus, $z$ is an instrumental variable for $x$:
$$f(y \mid x, z) = f(y \mid x) \quad \text{and} \quad f(y \mid z = a) \neq f(y \mid z = b) \text{ for } a \neq b.$$
- In addition to the original sample, we have a separate calibration sample that observes $(x_i, z_i)$.
Example 4.13 (Cont'd)

Table: Data structure

                     Z   X   Y
Calibration sample   o   o
Original sample      o       o

The goal is to generate $x^*$ in the original sample from
$$f(x_i \mid z_i, y_i) \propto f(x_i \mid z_i)\, f(y_i \mid x_i, z_i) = f(x_i \mid z_i)\, f(y_i \mid x_i).$$
Obtain a consistent estimator $\hat{f}(x \mid z)$ from the calibration sample.

E-step:
1. Generate $x_i^{*(1)}, \ldots, x_i^{*(m)}$ from $\hat{f}(x_i \mid z_i)$.
2. Compute the fractional weights associated with $x_i^{*(j)}$:
$$w_{ij}^* \propto f\left(y_i \mid x_i^{*(j)}; \hat{\theta}\right).$$

M-step: solve the weighted score equation for $\theta$.
Remarks for computation

Recall that, writing
$$\bar{S}^*(\eta \mid \tilde{\eta}) = \sum_{i=1}^n \sum_{j=1}^m w_{ij}^*(\tilde{\eta})\, S\left(\eta; y_{ij}^*, \delta_i\right),$$
where $w_{ij}^*(\eta)$ is the fractional weight associated with $y_{ij}^*$, given by
$$w_{ij}^*(\eta) = \frac{f\left(y_{ij}^*, \delta_i; \eta\right)/h_m\left(y_{i,\mathrm{mis}}^{*(j)}\right)}{\sum_{k=1}^m f\left(y_{ik}^*, \delta_i; \eta\right)/h_m\left(y_{i,\mathrm{mis}}^{*(k)}\right)}, \quad (40)$$
and $S(\eta; y, \delta) = \partial \log f(y, \delta; \eta)/\partial \eta$, the EM algorithm for fractional imputation can be expressed as
$$\hat{\eta}_{(t+1)} \leftarrow \text{solve } \bar{S}^*\left(\eta \mid \hat{\eta}_{(t)}\right) = 0.$$
Remarks for computation

Instead of the EM algorithm, a Newton-type algorithm can also be used. The Newton-type algorithm for computing the MLE from the fractionally imputed data is
$$\hat{\eta}_{(t+1)} = \hat{\eta}_{(t)} + \left\{I_{\mathrm{obs}}^*\left(\hat{\eta}_{(t)}\right)\right\}^{-1} \bar{S}^*\left(\hat{\eta}_{(t)} \mid \hat{\eta}_{(t)}\right),$$
where
$$I_{\mathrm{obs}}^*(\eta) = -\sum_{i=1}^n \sum_{j=1}^m w_{ij}^*(\eta)\, \dot{S}\left(\eta; y_{ij}^*, \delta_i\right) - \sum_{i=1}^n \sum_{j=1}^m w_{ij}^*(\eta)\left\{S\left(\eta; y_{ij}^*, \delta_i\right) - \bar{S}_i^*(\eta)\right\}^{\otimes 2},$$
$\dot{S}(\eta; y, \delta) = \partial S(\eta; y, \delta)/\partial \eta'$, and $\bar{S}_i^*(\eta) = \sum_{j=1}^m w_{ij}^*(\eta)\, S\left(\eta; y_{ij}^*, \delta_i\right)$.
Estimation of a general parameter

- The parameter $\Psi$ is defined through $E\{U(\Psi; Y)\} = 0$.
- The FI estimator of $\Psi$ is computed by solving
$$\sum_{i=1}^n \sum_{j=1}^m w_{ij}^*(\hat{\eta})\, U\left(\Psi; y_{ij}^*\right) = 0. \quad (41)$$
- Note that $\hat{\eta}$ is the solution to
$$\sum_{i=1}^n \sum_{j=1}^m w_{ij}^*(\hat{\eta})\, S\left(\hat{\eta}; y_{ij}^*\right) = 0.$$
Estimation of a general parameter

We can use either the linearization method or a replication method for variance estimation. For the linearization method, using Theorem 4.2, we can use the sandwich formula
$$\hat{V}(\hat{\Psi}) = \hat{\tau}_q^{-1}\, \hat{\Omega}_q\, \hat{\tau}_q^{-1\prime}, \quad (42)$$
where
$$\hat{\tau}_q = n^{-1} \sum_{i=1}^n \sum_{j=1}^m w_{ij}^*\, \dot{U}\left(\hat{\Psi}; y_{ij}^*\right), \qquad \hat{\Omega}_q = n^{-1}(n-1)^{-1} \sum_{i=1}^n \left(\hat{q}_i - \bar{q}_n^*\right)^{\otimes 2},$$
with $\hat{q}_i = \bar{U}_i^* + \hat{\kappa}\, \bar{S}_i^*$, where $(\bar{U}_i^*, \bar{S}_i^*) = \sum_{j=1}^m w_{ij}^*\left(U_{ij}^*, S_{ij}^*\right)$, $U_{ij}^* = U(\hat{\Psi}; y_{ij}^*)$, $S_{ij}^* = S(\hat{\eta}; y_{ij}^*)$, and
$$\hat{\kappa} = \sum_{i=1}^n \sum_{j=1}^m w_{ij}^*(\hat{\eta})\left(U_{ij}^* - \bar{U}_i^*\right) S_{ij}^{*\prime} \left\{I_{\mathrm{obs}}^*(\hat{\eta})\right\}^{-1}.$$
Estimation of a general parameter

For the replication method, we first obtain the $k$-th replicate $\hat{\eta}^{(k)}$ of $\hat{\eta}$ by solving
$$\sum_{i=1}^n w_i^{(k)} \sum_{j=1}^m w_{ij}^*(\eta)\, S\left(\eta; y_{ij}^*\right) = 0.$$
Once $\hat{\eta}^{(k)}$ is obtained, the $k$-th replicate $\hat{\Psi}^{(k)}$ of $\hat{\Psi}$ is obtained by solving
$$\sum_{i=1}^n w_i^{(k)} \sum_{j=1}^m w_{ij}^*\left(\hat{\eta}^{(k)}\right) U\left(\Psi; y_{ij}^*\right) = 0$$
for $\Psi$. The replication variance estimator of $\hat{\Psi}$ from (41) is
$$\hat{V}_{\mathrm{rep}}(\hat{\Psi}) = \sum_{k=1}^L c_k \left(\hat{\Psi}^{(k)} - \hat{\Psi}\right)^2.$$
Example 4.15: Nonparametric Fractional Imputation

- Bivariate data $(x_i, y_i)$: the $x_i$ are completely observed but $y_i$ is subject to missingness.
- The joint distribution of $(x, y)$ is completely unspecified.
- Assume MAR in the sense that $P(\delta = 1 \mid x, y)$ does not depend on $y$.
- Without loss of generality, assume that $\delta_i = 1$ for $i = 1, \ldots, r$ and $\delta_i = 0$ for $i = r+1, \ldots, n$.
- We are only interested in estimating $\theta = E(Y)$.
Example 4.15 (Cont'd)

Let $K_h(x_i, x_j) = K((x_i - x_j)/h)$ be a kernel function with bandwidth $h$ such that
$$K(x) \geq 0, \quad \int K(x)\, dx = 1, \quad \int x K(x)\, dx = 0, \quad \sigma_K^2 \equiv \int x^2 K(x)\, dx > 0.$$
Examples include the following:
- Boxcar kernel: $K(x) = \frac{1}{2} I(x)$
- Gaussian kernel: $K(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}x^2\right)$
- Epanechnikov kernel: $K(x) = \frac{3}{4}(1 - x^2)\, I(x)$
- Tricube kernel: $K(x) = (1 - |x|^3)^3\, I(x)$
where
$$I(x) = \begin{cases} 1 & \text{if } |x| \leq 1 \\ 0 & \text{if } |x| > 1. \end{cases}$$
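The kernel conditions can be checked numerically with Riemann sums on a fine grid; note that the tricube kernel as listed needs a normalizing constant of 70/81 to integrate to exactly 1, which is included in the sketch below.

```python
import numpy as np

def boxcar(x):       return 0.5 * (np.abs(x) <= 1)
def gaussian(x):     return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)
def epanechnikov(x): return 0.75 * (1 - x ** 2) * (np.abs(x) <= 1)
def tricube(x):      return (70 / 81) * (1 - np.abs(x) ** 3) ** 3 * (np.abs(x) <= 1)

grid = np.linspace(-6, 6, 200_001)
dx = grid[1] - grid[0]
for K in (boxcar, gaussian, epanechnikov, tricube):
    mass = np.sum(K(grid)) * dx          # integral of K: should be ~1
    mean = np.sum(grid * K(grid)) * dx   # first moment: should be ~0 by symmetry
    print(K.__name__, round(mass, 5), round(mean, 5))
```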
Example 4.15 (Cont'd)

Nonparametric regression estimator of $m(x) = E(Y \mid x)$:
$$\hat{m}(x) = \sum_{i=1}^r l_i(x)\, y_i, \quad (43)$$
where
$$l_i(x) = \frac{K\left(\frac{x - x_i}{h}\right)}{\sum_{j=1}^r K\left(\frac{x - x_j}{h}\right)}.$$
The estimator in (43) is often called the Nadaraya-Watson kernel estimator. Under some regularity conditions and the optimal choice of $h$ (with $h = O(n^{-1/5})$), it can be shown that
$$E\left[\{\hat{m}(x) - m(x)\}^2\right] = O(n^{-4/5}).$$
Thus, its convergence rate is slower than that of a parametric estimator.
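A minimal Nadaraya-Watson implementation follows; the Gaussian kernel, the noiseless linear trend, and the evaluation point are illustrative assumptions. With a symmetric design around the interior point x = 0.5, the estimate there is exact by symmetry.

```python
import numpy as np

def nadaraya_watson(x0, x, y, h):
    # Gaussian-kernel Nadaraya-Watson estimator m_hat(x0) = sum_i l_i(x0) y_i;
    # the 1/sqrt(2*pi) constant cancels in the weight ratio
    w = np.exp(-0.5 * ((x0 - x) / h) ** 2)
    return np.sum(w * y) / np.sum(w)

x = np.linspace(0.0, 1.0, 201)
y = 2.0 * x                       # noiseless linear trend, so m(x) = 2x
m_hat = nadaraya_watson(0.5, x, y, h=0.05)
print(m_hat)
```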
Example 4.15 (Cont'd)

However, the imputed estimator of $\theta$ using (43) can achieve $\sqrt{n}$-consistency. That is,
$$\hat{\theta}_{NP} = \frac{1}{n}\left\{\sum_{i=1}^r y_i + \sum_{i=r+1}^n \hat{m}(x_i)\right\} \quad (44)$$
achieves
$$\sqrt{n}\left(\hat{\theta}_{NP} - \theta\right) \to N(0, \sigma^2), \quad (45)$$
where $\sigma^2 = E\{v(x)/\pi(x)\} + V\{m(x)\}$, $m(x) = E(y \mid x)$, $v(x) = V(y \mid x)$, and $\pi(x) = E(\delta \mid x)$. Result (45) was first proved by Cheng (1994).
79 Example 4.15 (Cont'd) We can express ˆθ_NP in (44) as a nonparametric fractional imputation (NFI) estimator of the form ˆθ_NFI = n^{−1} { ∑_{i=1}^r y_i + ∑_{i=r+1}^n ∑_{j=1}^r w*_ij y_i^{*(j)} } where w*_ij = l_j(x_i), with l_j(·) defined after (43), and y_i^{*(j)} = y_j. Variance estimation can be implemented by a resampling method, such as the bootstrap. Ch 4 79 / 79
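Putting the pieces together, the following sketch computes the NFI estimator with Gaussian-kernel fractional weights and a naive bootstrap variance. The simulation design (linear m(x), logistic response model) and bandwidth are illustrative assumptions, not part of the text.

```python
import numpy as np

def theta_nfi(x, y, delta, h=0.1):
    """NFI estimate of E(Y): respondents keep y_i; each nonrespondent i
    receives every respondent value y_j as an imputed value y_i^{*(j)} = y_j
    with fractional weight w_ij* = l_j(x_i)."""
    xr, yr = x[delta == 1], y[delta == 1]    # donors (respondents)
    xm = x[delta == 0]                       # recipients (nonrespondents)
    K = np.exp(-0.5 * ((xm[:, None] - xr[None, :]) / h) ** 2)
    w = K / K.sum(axis=1, keepdims=True)     # fractional weights, rows sum to 1
    return (yr.sum() + (w @ yr).sum()) / len(x)

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
y = 1 + x + rng.normal(size=n)               # theta = E(Y) = 1
p = 1 / (1 + np.exp(-x))                     # MAR: response depends on x only
delta = (rng.uniform(size=n) < p).astype(int)

est = theta_nfi(x, y, delta)

# Naive bootstrap variance: resample (x, y, delta) rows and recompute
boot = []
for _ in range(200):
    idx = rng.integers(0, n, n)
    boot.append(theta_nfi(x[idx], y[idx], delta[idx]))
print(est, np.var(boot, ddof=1))
```

Here the respondent mean alone would be biased upward (response probability increases with x, and m(x) is increasing), while the NFI estimate should land near θ = 1 under MAR.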