1 / 33  An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values

Gong Tang
Department of Biostatistics, GSPH, University of Pittsburgh
NISS Workshop on Nonignorable Nonresponse, November 12-13, 2015
2 / 33  Outline

1 Introduction
  Background & current approaches
  Expectation-Maximization (EM) algorithm for regression analysis of data with nonresponse
2 An approximated EM algorithm
  The algorithm
  Application in contingency table analysis
  Application in regression analyses with a continuous outcome
3 Simulation studies
4 Discussion
3 / 33  Regression Analysis of Data with Nonresponse

Consider bivariate data $\{(x_i, y_i),\ i = 1, 2, \ldots, n\}$ where the $x_i$'s are fully observed and the $y_i$'s are observed only for $i = 1, \ldots, m$. Denote by $R_i$ the missing-data indicator: $R_i = 1$ if $y_i$ is observed and $R_i = 0$ otherwise. Assume that
\[ [y_i \mid x_i] \sim g(y_i \mid x_i; \theta) \propto \exp\{\theta S(x_i, y_i) + a(\theta)\}, \]
\[ \mathrm{pr}[R_i = 1 \mid x_i, y_i] = w(x_i, y_i; \psi), \]
and the parameter of interest is $\theta$.
Goal: to avoid modeling $w(x_i, y_i; \psi)$.
4 / 33  Likelihood-based Inference (I)

The observed data are $D_{\mathrm{obs}} = \{x_i, R_i, R_i y_i;\ i = 1, \ldots, n\}$. When $w(x_i, y_i; \psi)$ is parametrically modeled, the likelihood function is
\[
L(\theta, \psi; D_{\mathrm{obs}}) = \prod_{i=1}^{n} p(R_i, R_i y_i \mid x_i; \theta, \psi)
= \prod_{i=1}^{m} g(y_i \mid x_i; \theta)\, w(x_i, y_i; \psi) \prod_{i=m+1}^{n} \int g(y_i \mid x_i; \theta)\{1 - w(x_i, y_i; \psi)\}\, dy_i.
\]
When $w(x_i, y_i; \psi) = w(x_i; \psi)$, the data are called missing at random (MAR) and $L(\theta, \psi; D_{\mathrm{obs}}) \propto L(\theta; D_{\mathrm{obs}})\, L(\psi; D_{\mathrm{obs}})$ (Rubin, 1976).
5 / 33  Likelihood-based Inference (II)

When the data are MAR and, in addition, $\theta$ and $\psi$ are distinct, modeling $w(x, y; \psi)$ is not necessary for likelihood-based inference on $\theta$.
When the data are not MAR, the missingness has to be modeled and inference on $\theta$ and $\psi$ is made jointly.
Misspecification of the missing-data model $w(x, y; \psi)$ often leads to biased estimates of $\theta$.
6 / 33  A conditional likelihood

Assume that
\[ [y_i \mid x_i] \sim g(y_i \mid x_i; \theta) \quad \text{(a parametric regression)}, \]
\[ \mathrm{pr}[R_i = 1 \mid x_i, y_i] = w(x_i, y_i; \psi) = w(y_i; \psi). \]
Then $[X \mid Y, R = 1] = [X \mid Y, R = 0] = [X \mid Y]$. Consider the following conditional likelihood:
\[
CL(\theta) = \prod_{R_i = 1} p(x_i \mid y_i; \theta, F_X) \quad (F_X \text{ is the CDF of } X)
= \prod_{R_i = 1} \frac{g(y_i \mid x_i; \theta)\, p(x_i)}{p(y_i; \theta, F_X)} \quad \text{(Bayes formula)}
\propto \prod_{R_i = 1} \frac{g(y_i \mid x_i; \theta)}{\int g(y_i \mid x; \theta)\, dF_X(x)}.
\]
Requires knowing $F_X(\cdot)$!
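A minimal numerical sketch of this conditional likelihood, under illustrative assumptions not stated on the slide: $y \mid x \sim N(\theta x, 1)$, $F_X$ known and taken as $N(0,1)$, and the denominator integral approximated by Monte Carlo draws from $F_X$. The sample sizes, grid, and `cond_loglik` helper are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def cond_loglik(theta, x_obs, y_obs, fx_draws):
    # log CL(theta) for y|x ~ N(theta*x, 1): numerator is log g(y_i|x_i; theta)
    # up to an additive constant; the denominator integral over the known F_X
    # is approximated by averaging g(y_i|x; theta) over draws from F_X
    # (the same constant cancels between numerator and denominator).
    num = -0.5 * (y_obs - theta * x_obs) ** 2
    dens = np.exp(-0.5 * (y_obs[:, None] - theta * fx_draws[None, :]) ** 2)
    return float(np.sum(num - np.log(dens.mean(axis=1))))

# illustrative respondent data; F_X taken as the known N(0,1)
x_obs = rng.normal(size=200)
y_obs = 1.5 * x_obs + rng.normal(size=200)
fx_draws = rng.normal(size=5000)

# maximize CL(theta) over a grid; the maximizer should sit near the truth 1.5
grid = np.linspace(0.5, 2.5, 81)
theta_hat = grid[np.argmax([cond_loglik(t, x_obs, y_obs, fx_draws) for t in grid])]
```

A grid search keeps the sketch short; in practice one would use a proper optimizer.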
7 / 33  A pseudolikelihood method (Tang, Little & Raghunathan, 2003)

Alternatively, we may either model $X \sim f(x; \alpha)$, obtain $\hat{\alpha} = \arg\max_{\alpha} \prod_{i=1}^{n} f(x_i; \alpha)$, and consider the pseudolikelihood function
\[
PL_1(\theta) = \prod_{R_i = 1} \frac{g(y_i \mid x_i; \theta)}{\int g(y_i \mid x; \theta)\, dF_X(x; \hat{\alpha})},
\]
or substitute the empirical distribution $F_n(\cdot)$ for $F_X(\cdot)$:
\[
PL_2(\theta) = \prod_{R_i = 1} \frac{p(y_i \mid x_i; \theta)}{\int p(y_i \mid x; \theta)\, dF_n(x)}
= \prod_{R_i = 1} \frac{p(y_i \mid x_i; \theta)}{\frac{1}{n} \sum_{j=1}^{n} p(y_i \mid x_j; \theta)}.
\]
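The $PL_2$ form is convenient to code because the denominator is just an average over all $n$ observed covariates. A sketch under assumed specifics (model $y \mid x \sim N(\theta x, 1)$, a hypothetical logistic response depending on $y$ only, as the method requires; sizes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def pl2_loglik(theta, x_all, x_obs, y_obs):
    # log PL_2(theta) for y|x ~ N(theta*x, 1): the integral over F_X is
    # replaced by the empirical average over all n observed covariates,
    # so neither a model for X nor knowledge of F_X is needed.
    num = -0.5 * (y_obs - theta * x_obs) ** 2
    dens = np.exp(-0.5 * (y_obs[:, None] - theta * x_all[None, :]) ** 2)
    return float(np.sum(num - np.log(dens.mean(axis=1))))

# nonignorable response that depends on y only (illustrative mechanism)
n = 1000
x = rng.normal(size=n)
y = x + rng.normal(size=n)                       # true theta = 1
r = rng.uniform(size=n) < 1 / (1 + np.exp(-(0.5 + y)))

grid = np.linspace(0.0, 2.0, 81)
theta_hat = grid[np.argmax([pl2_loglik(t, x, x[r], y[r]) for t in grid])]
```

Even though the response probability depends on the outcome, maximizing $PL_2$ recovers the slope, which is the point of the method.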
8 / 33  Exponential tilting (Kim & Yu, 2011)

Consider the following semiparametric logistic regression model:
\[ \mathrm{pr}[R_i = 1 \mid x_i, y_i] = \mathrm{logit}^{-1}\{h(x_i) - \psi y_i\}, \]
where $h(\cdot)$ is unspecified and $\psi$ is either known or estimated from an external dataset with the missing values recovered. It follows that
\[ p(y \mid x, r = 0) = p(y \mid x, r = 1)\, \frac{\exp(\psi y)}{E[\exp(\psi Y) \mid x, r = 1]}. \]
With both $p(y \mid x, r = 1)$ and $E[\exp(\psi Y) \mid x, r = 1]$ estimated empirically, one obtains a density estimator for $p(y \mid x, r = 0)$ and estimates $\theta = E[H(X, Y)]$ via
\[ \hat{\theta} = \frac{1}{n} \Big\{ \sum_{r_i = 1} H(x_i, y_i) + \sum_{r_i = 0} \hat{E}[H(x_i, Y) \mid x_i, r_i = 0] \Big\}. \]
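The tilting identity can be checked numerically: reweighting respondents by $\exp(\psi y)$ should reproduce nonrespondent moments. A sketch with a single $x$-cell, so the conditioning on $x$ is implicit (all values are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

def tilted_mean(y_resp, psi):
    # E[Y | r=0] via exponential tilting: reweight respondents' y-values
    # by exp(psi*y); within one x-cell this is the Kim-Yu density ratio
    # applied to the mean functional.
    w = np.exp(psi * y_resp)
    return float(np.sum(w * y_resp) / np.sum(w))

# simulate with logit pr[r=1|y] = h - psi*y, matching the tilting identity
n, psi, h = 200_000, 0.7, 0.3
y = rng.normal(size=n)
r = rng.uniform(size=n) < 1 / (1 + np.exp(-(h - psi * y)))
est = tilted_mean(y[r], psi)    # tilted respondent mean
direct = y[~r].mean()           # direct nonrespondent mean (oracle check)
```

Here `direct` is available only because the simulation generates the "missing" values; in a real analysis only `est` is computable.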
9 / 33  An instrumental variable approach (Wang, Shao & Kim, 2014)

Assume that $X = (Z, U)$ and
\[ E(Y \mid X) = \beta_0 + \beta_1 u + \beta_2 z, \]
\[ \mathrm{pr}(R = 1 \mid Z, U, Y) = \mathrm{pr}(R = 1 \mid U, Y) = \pi(\psi; U, Y). \]
One may use the following estimating equations to estimate $\psi$:
\[ \sum_{i=1}^{n} \Big\{ \frac{r_i}{\pi(\psi; u_i, y_i)} - 1 \Big\} (1, u_i, z_i)^{T} = 0. \]
Then estimate the $\beta$'s via
\[ \sum_{i=1}^{n} \frac{r_i}{\pi(\hat{\psi}; u_i, y_i)} (y_i - \beta_0 - \beta_1 u_i - \beta_2 z_i)(1, u_i, z_i)^{T} = 0. \]
10 / 33  The likelihood function

From the following representation:
\[
L(\theta, \psi; D_{\mathrm{obs}}) = \prod_{i=1}^{n} p(R_i, R_i y_i \mid x_i; \theta, \psi)
= \prod_{i=1}^{n} \frac{p(R_i, R_i y_i, (1 - R_i) y_i \mid x_i; \theta, \psi)}{p((1 - R_i) y_i \mid R_i, R_i y_i, x_i; \theta, \psi)}
= \prod_{i=1}^{n} \frac{p(R_i, y_i \mid x_i; \theta, \psi)}{p((1 - R_i) y_i \mid R_i, R_i y_i, x_i; \theta, \psi)}
= \prod_{i=1}^{n} \frac{p(y_i \mid x_i; \theta)\, p(R_i \mid x_i, y_i; \psi)}{p((1 - R_i) y_i \mid R_i, R_i y_i, x_i; \theta, \psi)},
\]
11 / 33  The log-likelihood function

we have
\[
\ell(\theta, \psi; y_{\mathrm{obs}}) = \log L(\theta, \psi; y_{\mathrm{obs}})
= \sum_{i=1}^{n} \{\log p(y_i \mid x_i; \theta) + \log p(R_i \mid x_i, y_i; \psi)\}
- \sum_{i=1}^{n} \log p((1 - R_i) y_i \mid R_i, R_i y_i, x_i; \theta, \psi).
\]
Let
\[
Q(\theta, \psi; \tilde{\theta}, \tilde{\psi}) = \sum_{i=1}^{n} E[\log p(y_i \mid x_i; \theta) + \log p(R_i \mid x_i, y_i; \psi) \mid R_i, R_i y_i, x_i; \tilde{\theta}, \tilde{\psi}],
\]
\[
H(\theta, \psi; \tilde{\theta}, \tilde{\psi}) = \sum_{i=1}^{n} E[\log p((1 - R_i) y_i \mid R_i, R_i y_i, x_i; \theta, \psi) \mid R_i, R_i y_i, x_i; \tilde{\theta}, \tilde{\psi}].
\]
Then $\ell(\theta, \psi; y_{\mathrm{obs}}) = Q(\theta, \psi; \tilde{\theta}, \tilde{\psi}) - H(\theta, \psi; \tilde{\theta}, \tilde{\psi})$, for any $(\tilde{\theta}, \tilde{\psi})$.
12 / 33  The Expectation-Maximization (EM) Algorithm (Dempster, Laird and Rubin, 1977)

By Jensen's inequality, $H(\theta, \psi; \tilde{\theta}, \tilde{\psi}) \le H(\tilde{\theta}, \tilde{\psi}; \tilde{\theta}, \tilde{\psi})$.
The EM algorithm is an iterative algorithm that finds a sequence $\{(\theta^{(t)}, \psi^{(t)}),\ t = 0, 1, 2, \ldots\}$ such that
\[ (\theta^{(t+1)}, \psi^{(t+1)}) = \arg\max_{(\theta, \psi)} Q(\theta, \psi; \theta^{(t)}, \psi^{(t)}). \]
This assures that $\ell(\theta^{(t)}, \psi^{(t)}; y_{\mathrm{obs}}) \le \ell(\theta^{(t+1)}, \psi^{(t+1)}; y_{\mathrm{obs}})$.
Typically, logistic or probit regression models are used for $w(x, y; \psi)$. Numerical integration or Monte Carlo methods are often necessary in the implementation.
13 / 33  When $\psi = \psi_0$ is known

Now consider the case where the true value of $\psi$, $\psi_0$, is known:
\[
\ell(\theta, \psi_0; y_{\mathrm{obs}}) = \sum_{i=1}^{n} \{\log p(y_i \mid x_i; \theta) + \log p(R_i \mid x_i, y_i; \psi_0) - \log p((1 - R_i) y_i \mid D_{i,\mathrm{obs}}; \theta, \psi_0)\}.
\]
With the current estimate $\theta^{(t)}$, the Q-function becomes
\[
Q(\theta; \theta^{(t)}) = \sum_{i=1}^{n} E[\log p(y_i \mid x_i; \theta) + \log p(R_i \mid x_i, y_i; \psi_0) \mid R_i, R_i y_i, x_i; \theta^{(t)}, \psi_0],
\]
which, up to a term not involving $\theta$, equals $\sum_{i=1}^{n} E[\log p(y_i \mid x_i; \theta) \mid R_i, R_i y_i, x_i; \theta^{(t)}, \psi_0]$, because the second term does not involve $\theta$.
14 / 33  Another view of the likelihood

Without loss of generality, assume that the regression model follows a canonical exponential family. Then
\[
Q(\theta; \theta^{(t)}) = \sum_{i=1}^{m} \log g(y_i \mid x_i; \theta) + \sum_{i=m+1}^{n} E[\log g(y_i \mid x_i; \theta) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0]
= \sum_{i=1}^{m} \log g(y_i \mid x_i; \theta) + \sum_{i=m+1}^{n} \{\theta\, E[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0] + a(\theta)\},
\]
so we need to obtain $E[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0]$ in order to carry out the EM algorithm. With $w(x_i, y_i; \psi_0)$ known,
\[
E[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0] = \frac{\int S(x_i, y_i)\, g(y_i \mid x_i; \theta^{(t)})\{1 - w(x_i, y_i; \psi_0)\}\, dy_i}{\int g(y_i \mid x_i; \theta^{(t)})\{1 - w(x_i, y_i; \psi_0)\}\, dy_i}.
\]
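The slides leave $w$ generic. A quadrature sketch of this ratio of integrals under assumed specifics: $S(x, y) = y$, $y \mid x \sim N(\theta^{(t)} x, 1)$, and a hypothetical logistic response model $w(y) = \mathrm{logit}^{-1}(a + by)$ with $\psi_0 = (a, b)$:

```python
import numpy as np

def estep_mean(x, theta_t, psi0):
    # E[Y | x, R=0; theta_t, psi0] for y|x ~ N(theta_t * x, 1), computed as a
    # ratio of Riemann sums over a fine grid; the Gaussian normalizing
    # constant cancels between numerator and denominator.
    a, b = psi0
    mu = theta_t * x
    y, dy = np.linspace(mu - 8.0, mu + 8.0, 4001, retstep=True)
    g = np.exp(-0.5 * (y - mu) ** 2)                 # N(mu, 1), up to a constant
    one_minus_w = 1 - 1 / (1 + np.exp(-(a + b * y)))  # 1 - w(y; psi0)
    num = np.sum(y * g * one_minus_w) * dy
    den = np.sum(g * one_minus_w) * dy
    return float(num / den)
```

Two sanity checks follow from the formula itself: if $b = 0$ the constant $1 - w$ cancels and the conditional mean is just $\mu$; if the response probability increases in $y$, nonrespondents concentrate below $\mu$, and conversely.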
15 / 33  An alternative look at the E-step

On the other hand,
\[
E[S(x_i, y_i) \mid x_i; \theta^{(t)}] = E\big[E[S(x_i, y_i) \mid R_i, x_i] \mid x_i\big]
= E[S(x_i, y_i) \mid x_i, R_i = 1; \theta^{(t)}, \psi_0]\, \mathrm{pr}[R_i = 1 \mid x_i; \theta^{(t)}, \psi_0]
+ E[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0]\, \mathrm{pr}[R_i = 0 \mid x_i; \theta^{(t)}, \psi_0].
\]
We would therefore have
\[
E[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0]
= \frac{E[S(x_i, y_i) \mid x_i; \theta^{(t)}] - E[S(x_i, y_i) \mid x_i, R_i = 1; \theta^{(t)}, \psi_0]\, \mathrm{pr}[R_i = 1 \mid x_i; \theta^{(t)}, \psi_0]}{\mathrm{pr}[R_i = 0 \mid x_i; \theta^{(t)}, \psi_0]}.
\]
16 / 33  Empirical replacements

Note that if we replace $E[S(x_i, y_i) \mid x_i, R_i = 1; \theta^{(t)}, \psi_0]$ and $\mathrm{pr}[R_i = 1 \mid x_i; \theta^{(t)}, \psi_0]$ by empirical estimates, we can carry out the following iterative algorithm.
At the E-step:
\[
\hat{E}[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0]
= \frac{E[S(x_i, y_i) \mid x_i; \theta^{(t)}] - \hat{E}[S(x_i, y_i) \mid x_i, R_i = 1]\, \hat{\mathrm{pr}}[R_i = 1 \mid x_i]}{1 - \hat{\mathrm{pr}}[R_i = 1 \mid x_i]}.
\]
At the M-step, find $\theta = \theta^{(t+1)}$ that solves
\[
\theta^{(t+1)} = \arg\max_{\theta} Q^{*}(\theta; \theta^{(t)})
:= \arg\max_{\theta} \Big[ \theta \Big\{ \sum_{i=1}^{m} S(x_i, y_i) + \sum_{i=m+1}^{n} \hat{E}[S(x_i, Y) \mid x_i, R_i = 0; \theta^{(t)}] \Big\} + n\, a(\theta) \Big].
\]
17 / 33  An important observation

In a usual EM algorithm, $\theta^{(t)}$ may be far away from the truth $\theta_0$. However, in order for the empirical estimates $\hat{\mathrm{pr}}[R_i = 1 \mid x_i]$ and $\hat{E}[S(x_i, y_i) \mid x_i, R_i = 1]$ to be self-consistent with the corresponding terms under $(\theta^{(t)}, \psi_0)$, $\theta^{(t)}$ must be a consistent estimate of $\theta$. Therefore we should start with a consistent initial estimate of $\theta$ and carry out this iterative algorithm to obtain another consistent estimate of $\theta$ at convergence.
18 / 33  A short summary of the modified EM

1 Choose the initial value $\theta^{(0)}$ from either a complete dataset from subjects with similar characteristics or a completed subset recovered from recalls.
2 Compute the Nadaraya-Watson estimates or local polynomial estimates of $\mathrm{pr}[R = 1 \mid x_i]$ and $E[S(x_i, Y) \mid x_i, R = 1]$, for each $i = m + 1, m + 2, \ldots, n$.
3 At the E-step of the $t$-th iteration, calculate $\hat{E}[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0]$.
4 At the M-step of the $t$-th iteration, find $\theta = \theta^{(t+1)}$ that solves
\[
\sum_{i=1}^{n} E[S(x_i, y_i) \mid x_i; \theta] = \sum_{i=1}^{m} S(x_i, y_i) + \sum_{i=m+1}^{n} \hat{E}[S(x_i, Y) \mid x_i, R_i = 0; \theta^{(t)}].
\]
With a complete external dataset $\{x_i, y_i;\ i = n + 1, \ldots, n + n_E\}$, we would instead solve $\theta = \theta^{(t+1)}$ from
\[
\sum_{i=1}^{n} E[S(x_i, y_i) \mid x_i; \theta] + \sum_{i=n+1}^{n+n_E} E[S(x_i, y_i) \mid x_i; \theta]
= \sum_{i=1}^{m} S(x_i, y_i) + \sum_{i=m+1}^{n} \hat{E}[S(x_i, Y) \mid x_i, R_i = 0; \theta^{(t)}] + \sum_{i=n+1}^{n+n_E} S(x_i, y_i).
\]
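The four steps above can be sketched end-to-end for a toy version of the problem. This is a minimal illustration under stated assumptions, not the talk's implementation: $y \mid x \sim N(\beta_0 + \beta_1 x, 1)$ with $\sigma^2$ fixed at 1, a Gaussian kernel (in place of the Epanechnikov kernel used in the talk, for numerical stability at sparse $x$), a hypothetical logistic response model only to generate data, and the `approx_em` helper and all constants are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)

def approx_em(x, y, r, x_ext, y_ext, h=0.4, n_iter=5):
    # Step 1: initial value theta^(0) = OLS on the external complete data.
    X_ext = np.column_stack([np.ones_like(x_ext), x_ext])
    beta = np.linalg.lstsq(X_ext, y_ext, rcond=None)[0]

    # Step 2: kernel estimates of pr[R=1|x] and E[Y|x, R=1] at each
    # nonrespondent's x (Gaussian kernel; respondents only for the mean).
    x_mis = x[~r]
    K = np.exp(-0.5 * ((x_mis[:, None] - x[None, :]) / h) ** 2)
    p_hat = np.clip((K * r).sum(axis=1) / K.sum(axis=1), 0.1, 0.9)
    m1_hat = (K * np.where(r, y, 0.0)).sum(axis=1) / (K * r).sum(axis=1)

    X = np.column_stack([np.ones_like(x), x])
    y_work = np.where(r, y, 0.0)
    for _ in range(n_iter):
        # Step 3 (E-step): empirical replacement for E[Y | x_i, R_i=0].
        mu_mis = beta[0] + beta[1] * x_mis
        y_work[~r] = (mu_mis - m1_hat * p_hat) / (1 - p_hat)
        # Step 4 (M-step): for the linear mean, matching the sufficient
        # statistics reduces to least squares on the filled-in outcomes.
        beta = np.linalg.lstsq(X, y_work, rcond=None)[0]
    return beta

# illustrative NMAR data: response probability decreasing in y
n, n_ext = 2000, 500
x_all = rng.normal(size=n + n_ext)
y_all = 1.0 + 1.0 * x_all + rng.normal(size=n + n_ext)   # truth (1, 1)
x, y = x_all[:n], y_all[:n]
x_ext, y_ext = x_all[n:], y_all[n:]
r = rng.uniform(size=n) < 1 / (1 + np.exp(-(1.5 - y)))

b0, b1 = approx_em(x, y, r, x_ext, y_ext)
```

Consistent with the "important observation" slide, the iteration starts from a consistent estimate (the external-data OLS) and refines it; it is not expected to recover from a badly biased start.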
19 / 33

Here we design an iterative algorithm with a sequence $\{\theta^{(t)},\ t = 0, 1, 2, \ldots\}$ such that
\[
\theta^{(t+1)} = \arg\max_{\theta} Q^{*}(\theta; \theta^{(t)}) = \arg\max_{\theta} \{Q(\theta; \theta^{(t)}) + o(1)\}.
\]
This algorithm will yield $\ell(\theta^{(t+1)}, \psi_0; y_{\mathrm{obs}}) \ge \ell(\theta^{(t)}, \psi_0; y_{\mathrm{obs}}) + o(1)$.
20 / 33  An example in contingency table analysis

Consider discrete data $\{(x_i, y_i, z_i),\ i = 1, \ldots, n\}$: the $x_i$'s and $y_i$'s are fully observed; $z_i$ is observed for $i = 1, \ldots, c$ and missing for $i = c + 1, \ldots, n$. Let $m = n - c$.
Essentially, the observed data comprise the fully classified table $\{c_{jkl}\}$ and the partially classified table $\{m_{jk+}\}$.
Complete external data $D_E = \{(x_i, y_i, z_i),\ i = n + 1, \ldots, n + n_E\}$ are fully observed.
A log-linear model is assumed with parameter $\theta$, which leads to
\[ \pi_{jkl} = \mathrm{pr}[X = j, Y = k, Z = l], \quad j = 1, \ldots, J;\ k = 1, \ldots, K;\ l = 1, \ldots, L. \]
Initial estimates $\{\pi^{(0)}_{jkl}\}$ are derived from the external complete data $D_E$.
21 / 33  Implementation under the discrete setting

Initial estimates:
\[
\hat{\mathrm{pr}}[Z = l \mid x = j, y = k, R = 1] = \frac{\sum_{i=1}^{c} I\{x_i = j, y_i = k, z_i = l\}}{\sum_{i=1}^{c} I\{x_i = j, y_i = k\}} = \frac{c_{jkl}}{c_{jk+}},
\]
\[
\hat{\mathrm{pr}}[R = 1 \mid x = j, y = k] = \frac{\sum_{i=1}^{c} I\{x_i = j, y_i = k\}}{\sum_{i=1}^{n} I\{x_i = j, y_i = k\}} = \frac{c_{jk+}}{n_{jk+}}.
\]
At the $t$-th step, for each $(j, k, l)$, we update
\[
S_{jkl} = c_{jkl} + \sum_{i=c+1}^{n} I\{x_i = j, y_i = k\}\, \hat{\mathrm{pr}}[Z = l \mid x = j, y = k, R = 0]
= c_{jkl} + m_{jk+}\, \frac{\mathrm{pr}[Z = l \mid x = j, y = k; \theta^{(t)}] - \frac{c_{jkl}}{c_{jk+}} \cdot \frac{c_{jk+}}{n_{jk+}}}{1 - \frac{c_{jk+}}{n_{jk+}}}
= c_{jkl} + m_{jk+} \Big( \frac{\pi^{(t)}_{jkl}}{\pi^{(t)}_{jk+}} - \frac{c_{jkl}}{n_{jk+}} \Big) \Big/ \frac{m_{jk+}}{n_{jk+}}
= n_{jk+}\, \frac{\pi^{(t)}_{jkl}}{\pi^{(t)}_{jk+}}.
\]
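The algebraic collapse of the long empirical-replacement form to $S_{jkl} = n_{jk+}\, \pi^{(t)}_{jkl} / \pi^{(t)}_{jk+}$ can be verified numerically on a random table; the `estep_counts` helper and the table dimensions are invented for the check:

```python
import numpy as np

rng = np.random.default_rng(3)

def estep_counts(c_jkl, m_jk, pi_t):
    # E-step update of the cell counts for the partially classified table:
    # compute both the empirical-replacement form and its closed-form
    # simplification S_jkl = n_jk+ * pi_jkl / pi_jk+; they should agree.
    c_jk = c_jkl.sum(axis=2, keepdims=True)          # c_{jk+}
    n_jk = c_jk + m_jk[..., None]                    # n_{jk+} = c_{jk+} + m_{jk+}
    pi_jk = pi_t.sum(axis=2, keepdims=True)          # pi^(t)_{jk+}
    pr_mis = (pi_t / pi_jk - (c_jkl / c_jk) * (c_jk / n_jk)) / (1 - c_jk / n_jk)
    long_form = c_jkl + m_jk[..., None] * pr_mis
    short_form = n_jk * pi_t / pi_jk
    return long_form, short_form

# random 2 x 3 x 2 table with positive counts and a normalized pi^(t)
c = rng.integers(5, 30, size=(2, 3, 2)).astype(float)
m = rng.integers(3, 15, size=(2, 3)).astype(float)
pi = rng.uniform(0.5, 2.0, size=(2, 3, 2))
pi /= pi.sum()
long_form, short_form = estep_counts(c, m, pi)
```

A side effect worth noting: summing the update over $l$ returns exactly $n_{jk+}$, i.e., the E-step redistributes each $(j,k)$ margin across $l$ according to the current model, which is the "imputing all $z_i$'s" point of the next slide.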
22 / 33  Impression on the discrete setting

In the E-step, we end up imputing all the $z_i$'s based on $(x_i, y_i)$, including the observed ones.
If we include the external data $D_E$ in the algorithm and update the sufficient statistics through
\[ S^{*}_{jkl} = n^{E}_{jkl} + S_{jkl} = n^{E}_{jkl} + n_{jk+}\, \frac{\pi^{(t)}_{jkl}}{\pi^{(t)}_{jk+}}, \]
the modified EM algorithm is equivalent to running a regular EM algorithm on the fully classified table $\{n^{E}_{jkl}\}$ and the partially classified table $\{n_{jk+}\}$; that is, the observed $z_i$'s among the complete cases (which are not part of the external complete data) are effectively discarded.
23 / 33  Implementation under the continuous setting

Consider the regression analysis of $[Y \mid X]$ where $Y$ is continuous and subject to nonresponse.
If $X$ is discrete, the empirical approximations are
\[
\hat{E}[S(x_i, Y) \mid x_i = k, r_i = 1] = \frac{\sum_{j:\ r_j = 1,\ x_j = x_i = k} S(x_i, y_j)}{\#\{j : r_j = 1, x_j = x_i = k\}},
\]
\[
\hat{\mathrm{pr}}[r_i = 1 \mid x_i = k] = \frac{\#\{j : r_j = 1, x_j = x_i = k\}}{\#\{j : x_j = x_i = k\}}.
\]
If $X$ is continuous, we use either the Nadaraya-Watson estimate or a local polynomial estimate for $\hat{E}[S(x_i, Y) \mid x_i = k, r_i = 1]$, and a kernel estimator for $\hat{\mathrm{pr}}[r_i = 1 \mid x_i = k]$.
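For continuous $X$, both nonparametric pieces come from the same kernel weights. A sketch with an Epanechnikov kernel and $S(x, y) = y$; the data-generating model and bandwidth are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

def nw_at(x0, x, r, y, h=0.3):
    # Kernel estimates at a point x0 with an Epanechnikov kernel:
    #   p_hat ~ pr[r=1 | x0]   (kernel-smoothed response rate, all subjects)
    #   m_hat ~ E[Y | x0, r=1] (Nadaraya-Watson regression, respondents only;
    #                           nonrespondents' y are masked, never used)
    u = (x - x0) / h
    k = np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)
    p_hat = np.sum(k * r) / np.sum(k)
    kr = k * r
    m_hat = np.sum(kr * np.where(r, y, 0.0)) / np.sum(kr)
    return float(p_hat), float(m_hat)

# sanity check on data where the truth at x0 = 0 is known:
# E[Y|x=0] = sin(0) = 0 and pr[r=1] = 0.7 (MCAR here, just to test)
n = 5000
x = rng.uniform(-1, 1, size=n)
y = np.sin(x) + 0.3 * rng.normal(size=n)
r = rng.uniform(size=n) < 0.7
p_hat, m_hat = nw_at(0.0, x, r, y)
```

In the algorithm these two quantities are computed once, at each nonrespondent's $x_i$, before iterating (step 2 of the summary).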
24 / 33  Simulation settings when data are NMAR

We simulated bivariate data $\{(x_i, y_i),\ i = 1, 2, \ldots, N\}$ as follows:
(i) $x_i \sim N(0, 1)$ or $x_i \sim \mathrm{Bin}(5, 0.3)$;
(ii) $[y_i \mid x_i] \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, where $\theta = (\beta_0, \beta_1, \sigma^2) = (1, 1, 1)$.
Keep $n_E = 200$ of the above observations as the external dataset for obtaining the initial value of $\theta$.
Simulate missing $y_i$'s for the remaining $n = 1000$ subjects with
\[ \mathrm{pr}[R_i = 1 \mid x_i, y_i] = \Phi(\psi_0 + \psi_1 x_i + \psi_2 y_i), \]
where $\Phi(\cdot)$ is the CDF of the standard Gaussian distribution.
Compare the complete-case estimate ($\hat{\theta}_{cc}$) based on the 1000 observations with missing data, the MLE from the external data ($\hat{\theta}_E$), and the proposed approximated EM estimates ($\hat{\theta}_{AEM}$).
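One replicate of this design can be generated without evaluating $\Phi$ explicitly, via the latent probit representation $R_i = 1\{\varepsilon_i < \psi_0 + \psi_1 x_i + \psi_2 y_i\}$, $\varepsilon_i \sim N(0,1)$. The $\psi$ values below are illustrative; the slides do not report the ones used:

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_nmar(n=1000, n_ext=200, beta=(1.0, 1.0), sigma2=1.0,
                  psi=(0.5, 0.2, -0.5)):
    # x ~ N(0,1), y|x ~ N(beta0 + beta1*x, sigma2), probit response
    # pr[R=1|x,y] = Phi(psi0 + psi1*x + psi2*y), plus an external complete
    # dataset of size n_ext for the initial estimate.
    N = n + n_ext
    x = rng.normal(size=N)
    y = beta[0] + beta[1] * x + np.sqrt(sigma2) * rng.normal(size=N)
    x_ext, y_ext = x[n:], y[n:]          # external complete data
    x, y = x[:n], y[:n]
    lin = psi[0] + psi[1] * x + psi[2] * y
    r = rng.normal(size=n) < lin         # equivalent to pr[R_i=1] = Phi(lin_i)
    return x, y, r, x_ext, y_ext

x, y, r, x_ext, y_ext = simulate_nmar()
```

With $\psi_2 < 0$, large outcomes are less likely to be observed, so the complete-case mean of $y$ is biased downward, which is what drives the poor complete-case coverage in the next slide.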
25 / 33  Simulation results with $X \sim N(0, 1)$

Coverage of the 95% CI ($\beta_0$, $\beta_1$, $\sigma^2$):
Complete-case analysis: 0%, 8.8%, 79.5%
MLEs from external subset: 96.1%, 93.8%, 92.6%
Approximated EM**: 95.0%, 95.2%, 93.4%

The proportion of nonresponse was about 40%.
** The Epanechnikov kernel was used in the approximated EM. Bootstrap was used to derive the standard errors of $\hat{\theta}_{AEM}$.
26 / 33  Simulation results with $X \sim \mathrm{Bin}(5, 0.3)$

Coverage of the 95% CI ($\beta_0$, $\beta_1$, $\sigma^2$):
Complete-case analysis: 94.6%, 11.6%, 86.5%
MLEs from external subset: 94.7%, 94.2%, 92.5%
Approximated EM**: 94.7%, 93.9%, 93.6%

The proportion of nonresponse was about 40%.
** The empirical averages were used in the approximated EM. Bootstrap was used to derive the standard errors of $\hat{\theta}_{AEM}$.
27 / 33  Simulation settings when data are MAR

We simulated bivariate data $\{(x_i, y_i),\ i = 1, 2, \ldots, N\}$ as follows:
(i) $x_i \sim N(0, 1)$;
(ii) $[y_i \mid x_i] \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, where $\theta = (\beta_0, \beta_1, \sigma^2) = (1, 1, 1)$.
Keep $n_E = 200$ of the above observations as the external dataset for obtaining the initial value of $\theta$.
Simulate missing $y_i$'s for the remaining $n = 1000$ subjects with
\[ \mathrm{pr}[R_i = 1 \mid x_i, y_i] = \Phi(\psi_0 + \psi_1 x_i), \]
where $\Phi(\cdot)$ is the CDF of the standard Gaussian distribution.
Compare the complete-case estimate ($\hat{\theta}_{cc}$) based on the 1000 observations with missing data, the MLE from the external data ($\hat{\theta}_E$), and the proposed approximated EM estimates ($\hat{\theta}_{AEM}$).
28 / 33  Simulation results with $X \sim N(0, 1)$

Coverage of the 95% CI ($\beta_0$, $\beta_1$, $\sigma^2$):
Complete-case analysis: 94%, 94.5%, 93.4%
MLEs from external subset: 96.1%, 93.8%, 92.6%
Approximated EM**: 93.9%, 94.5%, 93.7%

The proportion of nonresponse was about 40%.
** The Epanechnikov kernel was used in the modified EM. Bootstrap was used to derive the standard errors of $\hat{\theta}_{AEM}$.
29 / 33  Without an external complete dataset...

If one can obtain a reliable initial estimate $\theta^{(0)}$ and an associated variance-covariance matrix estimate $V$, then we can use it as the initial value and perform the M-steps via
\[
\theta^{(t+1)} = \arg\max_{\theta} \Big\{ -(\theta - \theta^{(0)})^{T} V^{-1} (\theta - \theta^{(0)}) + \frac{1}{n} Q^{*}(\theta; \theta^{(t)}) \Big\}.
\]
30 / 33  Without an external complete dataset...

Consider missing-data mechanisms such as
\[ \mathrm{pr}[R_i = 1 \mid x_i, y_i] = w(x_i, y_i; \psi) = w(y_i; \psi). \]
There are two approaches for implementing the approximated EM algorithm for data with outcome-dependent nonresponse:
1 Use the pseudolikelihood estimate as the initial estimate and run the approximated EM.
2 Use the pseudolikelihood estimate as the initial estimate and incorporate the pseudolikelihood function as a component in each M-step:
\[ \theta^{(t+1)} = \arg\max_{\theta} \{ \ell_{pl}(\theta) + Q^{*}(\theta; \theta^{(t)}) \}. \]
The second approach is more computationally intensive.
31 / 33  Future work

Need to monitor $\{\ell(\theta^{(t)}; \psi_0),\ Q^{*}(\theta^{(t+1)}; \theta^{(t)}) - Q(\theta^{(t+1)}; \theta^{(t)});\ t = 0, 1, 2, \ldots\}$.
Search for a target function $\ell^{*}(\theta; P_n^{(X,Y,R)}, \theta^{(0)})$ so that $\hat{\theta}_{AEM}$ is a stationary point of $\ell^{*}(\theta; P_n^{(X,Y,R)}, \theta^{(0)})$.
Variance estimation.
Link to integrative data analysis.
32 / 33  Acknowledgment

Megan Hunt Olson, University of Wisconsin-Green Bay.
Yang Zhang, Amgen Inc.
33 / 33  References

1. Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B 39, 1-38.
2. Tang, G., Little, R. J. A. and Raghunathan, T. (2003). Analysis of multivariate missing data with nonignorable nonresponse. Biometrika 90.
3. Kim, J. K. and Yu, C. L. (2011). A semi-parametric estimation of mean functionals with non-ignorable missing data. Journal of the American Statistical Association 106.
4. Wang, S., Shao, J. and Kim, J. K. (2014). Identifiability and estimation in problems with nonignorable nonresponse. Statistica Sinica 24, 1097-1116.
More informationOptimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.
Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may
More informationLOCAL LINEAR REGRESSION FOR GENERALIZED LINEAR MODELS WITH MISSING DATA
The Annals of Statistics 1998, Vol. 26, No. 3, 1028 1050 LOCAL LINEAR REGRESSION FOR GENERALIZED LINEAR MODELS WITH MISSING DATA By C. Y. Wang, 1 Suojin Wang, 2 Roberto G. Gutierrez and R. J. Carroll 3
More informationLikelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University
Likelihood, MLE & EM for Gaussian Mixture Clustering Nick Duffield Texas A&M University Probability vs. Likelihood Probability: predict unknown outcomes based on known parameters: P(x q) Likelihood: estimate
More informationA STRATEGY FOR STEPWISE REGRESSION PROCEDURES IN SURVIVAL ANALYSIS WITH MISSING COVARIATES. by Jia Li B.S., Beijing Normal University, 1998
A STRATEGY FOR STEPWISE REGRESSION PROCEDURES IN SURVIVAL ANALYSIS WITH MISSING COVARIATES by Jia Li B.S., Beijing Normal University, 1998 Submitted to the Graduate Faculty of the Graduate School of Public
More informationEstimating the parameters of hidden binomial trials by the EM algorithm
Hacettepe Journal of Mathematics and Statistics Volume 43 (5) (2014), 885 890 Estimating the parameters of hidden binomial trials by the EM algorithm Degang Zhu Received 02 : 09 : 2013 : Accepted 02 :
More informationRobustness to Parametric Assumptions in Missing Data Models
Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice
More informationCalibration Estimation of Semiparametric Copula Models with Data Missing at Random
Calibration Estimation of Semiparametric Copula Models with Data Missing at Random Shigeyuki Hamori 1 Kaiji Motegi 1 Zheng Zhang 2 1 Kobe University 2 Renmin University of China Institute of Statistics
More informationEM Algorithm. Expectation-maximization (EM) algorithm.
EM Algorithm Outline: Expectation-maximization (EM) algorithm. Examples. Reading: A.P. Dempster, N.M. Laird, and D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc.,
More informationComputing the MLE and the EM Algorithm
ECE 830 Fall 0 Statistical Signal Processing instructor: R. Nowak Computing the MLE and the EM Algorithm If X p(x θ), θ Θ, then the MLE is the solution to the equations logp(x θ) θ 0. Sometimes these equations
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationSTATISTICAL ANALYSIS WITH MISSING DATA
STATISTICAL ANALYSIS WITH MISSING DATA SECOND EDITION Roderick J.A. Little & Donald B. Rubin WILEY SERIES IN PROBABILITY AND STATISTICS Statistical Analysis with Missing Data Second Edition WILEY SERIES
More informationAccelerating the EM Algorithm for Mixture Density Estimation
Accelerating the EM Algorithm ICERM Workshop September 4, 2015 Slide 1/18 Accelerating the EM Algorithm for Mixture Density Estimation Homer Walker Mathematical Sciences Department Worcester Polytechnic
More informationComparison between conditional and marginal maximum likelihood for a class of item response models
(1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia
More informationCausal Inference in Observational Studies with Non-Binary Treatments. David A. van Dyk
Causal Inference in Observational Studies with Non-Binary reatments Statistics Section, Imperial College London Joint work with Shandong Zhao and Kosuke Imai Cass Business School, October 2013 Outline
More informationLecture Notes: Some Core Ideas of Imputation for Nonresponse in Surveys. Tom Rosenström University of Helsinki May 14, 2014
Lecture Notes: Some Core Ideas of Imputation for Nonresponse in Surveys Tom Rosenström University of Helsinki May 14, 2014 1 Contents 1 Preface 3 2 Definitions 3 3 Different ways to handle MAR data 4 4
More information6. Fractional Imputation in Survey Sampling
6. Fractional Imputation in Survey Sampling 1 Introduction Consider a finite population of N units identified by a set of indices U = {1, 2,, N} with N known. Associated with each unit i in the population
More informationChap 2. Linear Classifiers (FTH, ) Yongdai Kim Seoul National University
Chap 2. Linear Classifiers (FTH, 4.1-4.4) Yongdai Kim Seoul National University Linear methods for classification 1. Linear classifiers For simplicity, we only consider two-class classification problems
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationMultistate Modeling and Applications
Multistate Modeling and Applications Yang Yang Department of Statistics University of Michigan, Ann Arbor IBM Research Graduate Student Workshop: Statistics for a Smarter Planet Yang Yang (UM, Ann Arbor)
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationDiscussion of Maximization by Parts in Likelihood Inference
Discussion of Maximization by Parts in Likelihood Inference David Ruppert School of Operations Research & Industrial Engineering, 225 Rhodes Hall, Cornell University, Ithaca, NY 4853 email: dr24@cornell.edu
More informationExponential Family and Maximum Likelihood, Gaussian Mixture Models and the EM Algorithm. by Korbinian Schwinger
Exponential Family and Maximum Likelihood, Gaussian Mixture Models and the EM Algorithm by Korbinian Schwinger Overview Exponential Family Maximum Likelihood The EM Algorithm Gaussian Mixture Models Exponential
More informationEmpirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design
1 / 32 Empirical Likelihood Methods for Two-sample Problems with Data Missing-by-Design Changbao Wu Department of Statistics and Actuarial Science University of Waterloo (Joint work with Min Chen and Mary
More informationMachine Learning: A Statistics and Optimization Perspective
Machine Learning: A Statistics and Optimization Perspective Nan Ye Mathematical Sciences School Queensland University of Technology 1 / 109 What is Machine Learning? 2 / 109 Machine Learning Machine learning
More informationNearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2
Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, 2013
Bayesian Methods Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2013 1 What about prior n Billionaire says: Wait, I know that the thumbtack is close to 50-50. What can you
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationMultiple Imputation for Missing Data in Repeated Measurements Using MCMC and Copulas
Multiple Imputation for Missing Data in epeated Measurements Using MCMC and Copulas Lily Ingsrisawang and Duangporn Potawee Abstract This paper presents two imputation methods: Marov Chain Monte Carlo
More informationNew mixture models and algorithms in the mixtools package
New mixture models and algorithms in the mixtools package Didier Chauveau MAPMO - UMR 6628 - Université d Orléans 1ères Rencontres BoRdeaux Juillet 2012 Outline 1 Mixture models and EM algorithm-ology
More informationSubsample ignorable likelihood for regression analysis with missing data
Appl. Statist. (2011) 60, Part 4, pp. 591 605 Subsample ignorable likelihood for regression analysis with missing data Roderick J. Little and Nanhua Zhang University of Michigan, Ann Arbor, USA [Received
More information12 - Nonparametric Density Estimation
ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6
More informationSemiparametric Gaussian Copula Models: Progress and Problems
Semiparametric Gaussian Copula Models: Progress and Problems Jon A. Wellner University of Washington, Seattle 2015 IMS China, Kunming July 1-4, 2015 2015 IMS China Meeting, Kunming Based on joint work
More informationGeneralized Linear Models Introduction
Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,
More informationColumbia University. Columbia University Biostatistics Technical Report Series. Handling Missing Data by Deleting Completely Observed Records
Columbia University Columbia University Biostatistics Technical Report Series Year 2006 Paper 13 Handling Missing Data by Deleting Completely Observed Records Cuiling Wang Myunghee Cho Paik Columbia University,
More informationNonparameteric Regression:
Nonparameteric Regression: Nadaraya-Watson Kernel Regression & Gaussian Process Regression Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,
More informationGeneralized Linear Models
Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.
More informationStatistical Estimation
Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from
More information