
Slide 1: An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values
Gong Tang
Department of Biostatistics, GSPH, University of Pittsburgh
NISS Workshop on Nonignorable Nonresponse, November 12-13, 2015

Slide 2: Outline
1. Introduction
   - Background & current approaches
   - Expectation-Maximization (EM) algorithm for regression analysis of data with nonresponse
2. An approximated EM algorithm
   - The algorithm
   - Application in contingency table analysis
   - Application in regression analyses with a continuous outcome
3. Simulation studies
4. Discussion

Slide 3: Regression Analysis of Data with Nonresponse
Consider bivariate data $\{x_i, y_i,\ i = 1, 2, \ldots, n\}$ where the $x_i$'s are fully observed and the $y_i$'s are observed only for $i = 1, \ldots, m$. Denote by $R_i$ the missing-data indicator: $R_i = 1$ if $y_i$ is observed and $R_i = 0$ otherwise. Assume that
$$[y_i \mid x_i] \sim g(y_i \mid x_i; \theta) \propto \exp\{\theta s(x_i, y_i) + a(\theta)\},$$
$$\mathrm{pr}[R_i = 1 \mid x_i, y_i] = w(x_i, y_i; \psi),$$
and the parameter of interest is $\theta$. Goal: to avoid modeling $w(x_i, y_i; \psi)$.

Slide 4: Likelihood-based Inference (I)
The observed data are $D_{obs} = \{x_i, R_i, R_i y_i;\ i = 1, \ldots, n\}$. When $w(x_i, y_i; \psi)$ is parametrically modeled, the likelihood function is
$$L(\theta, \psi; D_{obs}) = \prod_{i=1}^n p(R_i, R_i y_i \mid x_i; \theta, \psi) = \prod_{i=1}^m g(y_i \mid x_i; \theta)\, w(x_i, y_i; \psi) \prod_{i=m+1}^n \int g(y_i \mid x_i; \theta)\{1 - w(x_i, y_i; \psi)\}\, dy_i.$$
When $w(x_i, y_i; \psi) = w(x_i; \psi)$, the data are called missing at random (MAR) and $L(\theta, \psi; D_{obs}) \propto L(\theta; D_{obs})\, L(\psi; D_{obs})$ (Rubin, 1976).

Slide 5: Likelihood-based Inference (II)
When the data are MAR and $\theta$ and $\psi$ are distinct, modeling $w(x, y; \psi)$ is not necessary under likelihood-based inference.
When the data are not MAR, the missingness has to be modeled and inference on $\theta$ and $\psi$ is made jointly.
Misspecification of the missing-data model $w(x, y; \psi)$ often leads to biased estimates of $\theta$.

Slide 6: A conditional likelihood
Assume that $[y_i \mid x_i] \sim g(y_i \mid x_i; \theta)$ (a parametric regression) and
$$\mathrm{pr}[R_i = 1 \mid x_i, y_i] = w(x_i, y_i; \psi) = w(y_i; \psi).$$
Then $[X \mid Y, R = 1] = [X \mid Y, R = 0] = [X \mid Y]$. Consider the following conditional likelihood:
$$CL(\theta) = \prod_{R_i = 1} p(x_i \mid y_i; \theta, F_X) \quad (F_X \text{ is the CDF of } X)$$
$$= \prod_{R_i = 1} \frac{g(y_i \mid x_i; \theta)\, p(x_i)}{p(y_i; \theta, F_X)} \quad \text{(Bayes formula)}$$
$$\propto \prod_{R_i = 1} \frac{g(y_i \mid x_i; \theta)}{\int g(y_i \mid x; \theta)\, dF_X(x)}.$$
Requires knowing $F_X(\cdot)$!

Slide 7: A pseudolikelihood method (Tang, Little & Raghunathan, 2003)
Alternatively, we may either model $X \sim f(x; \alpha)$, obtain $\hat\alpha = \arg\max_\alpha \prod_{i=1}^n f(x_i; \alpha)$, and consider the pseudolikelihood function
$$PL_1(\theta) = \prod_{R_i = 1} \frac{g(y_i \mid x_i; \theta)}{\int g(y_i \mid x; \theta)\, dF_X(x; \hat\alpha)},$$
or substitute $F_X(\cdot)$ by the empirical distribution $F_n(\cdot)$:
$$PL_2(\theta) = \prod_{R_i = 1} \frac{p(y_i \mid x_i; \theta)}{\int p(y_i \mid x; \theta)\, dF_n(x)} = \prod_{R_i = 1} \frac{p(y_i \mid x_i; \theta)}{\frac{1}{n} \sum_{j=1}^n p(y_i \mid x_j; \theta)}.$$
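The $PL_2$ objective is easy to compute directly. Below is a minimal sketch, assuming a normal linear model with probit, outcome-dependent missingness; the function name `pl2_negloglik` and all tuning choices are ours, not from the talk.

```python
# Maximizing PL_2 for y | x ~ N(b0 + b1*x, s2); a sketch, not the
# authors' implementation.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = 1.0 + 1.0 * x + rng.normal(size=n)
r = rng.uniform(size=n) < norm.cdf(0.5 + 0.5 * y)   # response depends on y only

def pl2_negloglik(theta):
    b0, b1, log_s = theta
    s = np.exp(log_s)                        # enforce sigma > 0
    y_obs, x_obs = y[r], x[r]
    # p(y_i | x_j; theta) for every observed i and every j = 1..n
    dens = norm.pdf(y_obs[:, None], loc=b0 + b1 * x[None, :], scale=s)
    cond = norm.pdf(y_obs, loc=b0 + b1 * x_obs, scale=s)
    return -np.sum(np.log(cond) - np.log(dens.mean(axis=1)))

fit = minimize(pl2_negloglik, x0=np.zeros(3), method="Nelder-Mead")
print(fit.x[:2], np.exp(fit.x[2]))           # (b0, b1) and sigma estimates
```

Note that $PL_2$ uses only the conditional density ratio among respondents, so no model for the missingness mechanism is needed.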

Slide 8: Exponential tilting (Kim & Yu, 2011)
Consider the following semiparametric logistic regression model:
$$\mathrm{pr}[R_i = 1 \mid x_i, y_i] = \mathrm{logit}^{-1}\{h(x_i) - \psi y_i\},$$
where $h(\cdot)$ is unspecified and $\psi$ is either known or estimated from an external dataset with the missing values recovered. Subsequently we have
$$p(y \mid x, r = 0) = p(y \mid x, r = 1)\, \frac{\exp(\psi y)}{E[\exp(\psi y) \mid x, r = 1]}.$$
With both $p(y \mid x, r = 1)$ and $E[\exp(\psi y) \mid x, r = 1]$ empirically estimated, one obtains a density estimator for $p(y \mid x, r = 0)$ and estimates $\theta = E[H(X, Y)]$ via
$$\hat\theta = \frac{1}{n} \Big\{ \sum_{r_i = 1} H(x_i, y_i) + \sum_{r_i = 0} \hat E[H(x_i, y_i) \mid x_i, r_i = 0] \Big\}.$$
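As a sketch of how the tilted density is used in practice, the snippet below estimates $\theta = E[Y]$ with a Gaussian kernel in $x$; the bandwidth, the kernel, and the assumption that $\psi$ is known are all our illustrative choices, not prescriptions from Kim & Yu (2011).

```python
# Exponential-tilting estimator of E[Y]: impute E[Y | x, r=0] by
# tilting the respondents' kernel weights with exp(psi * y).
import numpy as np

def tilting_mean(x, y, r, psi, h=0.3):
    xo, yo = x[r], y[r]                     # respondents
    tilt = np.exp(psi * yo)                 # exp(psi * y) tilt
    imputed = []
    for xi in x[~r]:
        k = np.exp(-0.5 * ((xo - xi) / h) ** 2)   # Gaussian kernel in x
        imputed.append(np.sum(k * tilt * yo) / np.sum(k * tilt))
    return (yo.sum() + np.sum(imputed)) / len(x)
```

With $\psi = 0$ the tilt vanishes and this reduces to ordinary kernel-regression imputation under MAR.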

Slide 9: An instrumental variable approach (Wang, Shao & Kim, 2014)
Assume that $X = (Z, U)$ and
$$E(Y \mid X) = \beta_0 + \beta_1 u + \beta_2 z,$$
$$\mathrm{pr}(R = 1 \mid Z, U, Y) = \mathrm{pr}(R = 1 \mid U, Y) = \pi(\psi; U, Y).$$
One may use the following estimating equations to estimate $\psi$:
$$\sum_{i=1}^n \Big\{ \frac{r_i}{\pi(\psi; u_i, y_i)} - 1 \Big\} (1, u_i, z_i)^T = 0.$$
Then estimate the $\beta$'s via
$$\sum_{i=1}^n \frac{r_i}{\pi(\hat\psi; u_i, y_i)}\, (y_i - \beta_0 - \beta_1 u_i - \beta_2 z_i)(1, u_i, z_i)^T = 0.$$

Slide 10: The likelihood function
From the following representation:
$$L(\theta, \psi; D_{obs}) = \prod_{i=1}^n p(R_i, R_i y_i \mid x_i; \theta, \psi) = \prod_{i=1}^n \frac{p(R_i, R_i y_i, (1 - R_i) y_i \mid x_i; \theta, \psi)}{p((1 - R_i) y_i \mid R_i, R_i y_i, x_i; \theta, \psi)}$$
$$= \prod_{i=1}^n \frac{p(R_i, y_i \mid x_i; \theta, \psi)}{p((1 - R_i) y_i \mid R_i, R_i y_i, x_i; \theta, \psi)} = \prod_{i=1}^n \frac{p(y_i \mid x_i; \theta)\, p(R_i \mid x_i, y_i; \psi)}{p((1 - R_i) y_i \mid R_i, R_i y_i, x_i; \theta, \psi)},$$

Slide 11: The log-likelihood function
We have
$$l(\theta, \psi; y_{obs}) = \log L(\theta, \psi; y_{obs}) = \sum_{i=1}^n \{\log p(y_i \mid x_i; \theta) + \log p(R_i \mid x_i, y_i; \psi) - \log p((1 - R_i) y_i \mid R_i, R_i y_i, x_i; \theta, \psi)\}.$$
Let
$$Q(\theta, \psi; \tilde\theta, \tilde\psi) = \sum_{i=1}^n E[\log p(y_i \mid x_i; \theta) + \log p(R_i \mid x_i, y_i; \psi) \mid R_i, R_i y_i, x_i; \tilde\theta, \tilde\psi],$$
$$H(\theta, \psi; \tilde\theta, \tilde\psi) = \sum_{i=1}^n E[\log p((1 - R_i) y_i \mid R_i, R_i y_i, x_i; \theta, \psi) \mid R_i, R_i y_i, x_i; \tilde\theta, \tilde\psi].$$
Then we have $l(\theta, \psi; y_{obs}) = Q(\theta, \psi; \tilde\theta, \tilde\psi) - H(\theta, \psi; \tilde\theta, \tilde\psi)$ for any $(\tilde\theta, \tilde\psi)$.

Slide 12: The Expectation-Maximization (EM) Algorithm (Dempster, Laird and Rubin, 1977)
From Jensen's inequality, we have $H(\theta, \psi; \tilde\theta, \tilde\psi) \le H(\tilde\theta, \tilde\psi; \tilde\theta, \tilde\psi)$.
The EM algorithm is an iterative algorithm that finds a sequence $\{(\theta^{(t)}, \psi^{(t)}),\ t = 0, 1, 2, \ldots\}$ such that
$$(\theta^{(t+1)}, \psi^{(t+1)}) = \arg\max_{(\theta, \psi)} Q(\theta, \psi; \theta^{(t)}, \psi^{(t)}).$$
This assures that $l(\theta^{(t)}, \psi^{(t)}; y_{obs}) \le l(\theta^{(t+1)}, \psi^{(t+1)}; y_{obs})$.
Typically logistic or probit regression models are used for $w(x, y; \psi)$. Numerical integration or Monte Carlo methods are often necessary in the implementation.

Slide 13: When $\psi = \psi_0$ is known
Now consider when the true value of $\psi$, $\psi_0$, is known:
$$l(\theta, \psi_0; y_{obs}) = \log L(\theta, \psi_0; y_{obs}) = \sum_{i=1}^n \{\log p(y_i \mid x_i; \theta) + \log p(R_i \mid x_i, y_i; \psi_0) - \log p((1 - R_i) y_i \mid D_{i,obs}; \theta, \psi_0)\}.$$
With current estimate $\theta^{(t)}$, the Q-function becomes
$$Q(\theta; \theta^{(t)}) = \sum_{i=1}^n E[\log p(y_i \mid x_i; \theta) + \log p(R_i \mid x_i, y_i; \psi_0) \mid R_i, R_i y_i, x_i; \theta^{(t)}, \psi_0],$$
which, up to a term not involving $\theta$, equals
$$\sum_{i=1}^n E[\log p(y_i \mid x_i; \theta) \mid R_i, R_i y_i, x_i; \theta^{(t)}, \psi_0],$$
because the second term does not involve $\theta$.

Slide 14: Another view of the likelihood
Without loss of generality, assume that the regression model follows a canonical exponential family. Then
$$Q(\theta; \theta^{(t)}) = \sum_{i=1}^m \log g(y_i \mid x_i; \theta) + \sum_{i=m+1}^n E[\log g(y_i \mid x_i; \theta) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0]$$
$$= \sum_{i=1}^m \log g(y_i \mid x_i; \theta) + \sum_{i=m+1}^n \{\theta\, E[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0] + a(\theta)\},$$
so we need to obtain $E[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0]$ in order to carry out the EM algorithm. With $w(x_i, y_i; \psi_0)$ known,
$$E[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0] = \frac{\int s(x_i, y_i)\, g(y_i \mid x_i; \theta^{(t)})\{1 - w(x_i, y_i; \psi_0)\}\, dy_i}{\int g(y_i \mid x_i; \theta^{(t)})\{1 - w(x_i, y_i; \psi_0)\}\, dy_i}.$$

Slide 15: An alternative look at the E-step
On the other hand, by the law of total expectation, $E[S(x_i, y_i) \mid x_i; \theta^{(t)}] = E[\,E[S(x_i, y_i) \mid R_i, x_i] \mid x_i]$, i.e.,
$$E[S(x_i, y_i) \mid x_i; \theta^{(t)}] = E[S(x_i, y_i) \mid x_i, R_i = 1; \theta^{(t)}, \psi_0]\, \mathrm{pr}[R_i = 1 \mid x_i; \theta^{(t)}, \psi_0] + E[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0]\, \mathrm{pr}[R_i = 0 \mid x_i; \theta^{(t)}, \psi_0].$$
We would have
$$E[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0] = \frac{E[S(x_i, y_i) \mid x_i; \theta^{(t)}] - E[S(x_i, y_i) \mid x_i, R_i = 1; \theta^{(t)}, \psi_0]\, \mathrm{pr}[R_i = 1 \mid x_i; \theta^{(t)}, \psi_0]}{\mathrm{pr}[R_i = 0 \mid x_i; \theta^{(t)}, \psi_0]}.$$

Slide 16: Empirical replacements
Note that if we replace $E[S(x_i, y_i) \mid x_i, R_i = 1; \theta^{(t)}, \psi_0]$ and $\mathrm{pr}[R_i = 0 \mid x_i; \theta^{(t)}, \psi_0]$ with empirical estimates, we can carry out an iterative algorithm as follows.
At the E-step:
$$\hat E[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0] = \frac{E[S(x_i, y_i) \mid x_i; \theta^{(t)}] - \hat E[S(x_i, y_i) \mid x_i, R_i = 1]\, \widehat{\mathrm{pr}}[R_i = 1 \mid x_i]}{1 - \widehat{\mathrm{pr}}[R_i = 1 \mid x_i]}.$$
At the M-step, find $\theta = \theta^{(t+1)}$ that solves
$$\theta^{(t+1)} = \arg\max_\theta Q^*(\theta; \theta^{(t)}) := \arg\max_\theta \Big[\theta \Big\{\sum_{i=1}^m S(x_i, y_i) + \sum_{i=m+1}^n \hat E[S(x_i, Y) \mid x_i, R_i = 0; \theta^{(t)}]\Big\} + n\, a(\theta)\Big].$$

Slide 17: An important observation
In a usual EM algorithm, $\theta^{(t)}$ may be far from the truth $\theta_0$. However, for the empirical estimates $\widehat{\mathrm{pr}}[R_i = 1 \mid x_i]$ and $\hat E[S(x_i, y_i) \mid x_i, R_i = 1]$ to be self-consistent with the corresponding terms under $(\theta^{(t)}, \psi_0)$, $\theta^{(t)}$ must be a consistent estimate of $\theta$. Therefore we should start with a consistent initial estimate of $\theta$ and carry out this iterative algorithm to obtain another consistent estimate of $\theta$ at convergence.

Slide 18: A short summary of the modified EM
1. Choose the initial value $\theta^{(0)}$ from either a complete dataset from subjects with similar characteristics or a completed subset recovered from recalls.
2. Compute the Nadaraya-Watson estimates or local polynomial estimates of $\mathrm{pr}[R = 1 \mid x_i]$ and $E[S(x_i, Y) \mid x_i, R = 1]$ for each $i = m + 1, m + 2, \ldots, n$.
3. At the E-step of the $t$th iteration, calculate $\hat E[S(x_i, y_i) \mid x_i, R_i = 0; \theta^{(t)}, \psi_0]$.
4. At the M-step of the $t$th iteration, find $\theta = \theta^{(t+1)}$ that solves
$$\sum_{i=1}^n E[S(x_i, y_i) \mid x_i; \theta] = \sum_{i=1}^m S(x_i, y_i) + \sum_{i=m+1}^n \hat E[S(x_i, Y) \mid x_i, R_i = 0; \theta^{(t)}].$$
With a complete external dataset $\{x_i, y_i;\ i = n + 1, \ldots, n + n_E\}$, we would instead solve for $\theta = \theta^{(t+1)}$ from
$$\sum_{i=1}^n E[S(x_i, y_i) \mid x_i; \theta] + \sum_{i=n+1}^{n+n_E} E[S(x_i, y_i) \mid x_i; \theta] = \sum_{i=1}^m S(x_i, y_i) + \sum_{i=m+1}^n \hat E[S(x_i, Y) \mid x_i, R_i = 0; \theta^{(t)}] + \sum_{i=n+1}^{n+n_E} S(x_i, y_i).$$
(A code sketch of these steps follows.)
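To make steps 1-4 concrete, here is a minimal sketch for the normal linear model $y \mid x \sim N(\beta_0 + \beta_1 x, \sigma^2)$, whose sufficient statistics given $x$ are $(Y, Y^2)$. The Gaussian kernel, bandwidth, fixed iteration count, and function names (`nw`, `approx_em`) are our illustrative choices, not prescriptions from the talk.

```python
# Approximated EM for y | x ~ N(b0 + b1*x, s2) with nonignorable
# nonresponse; a sketch under the assumptions stated above.
import numpy as np

def nw(x0, x, v, h=0.3):
    """Nadaraya-Watson estimate of E[v | x = x0] with a Gaussian kernel."""
    k = np.exp(-0.5 * ((x[None, :] - x0[:, None]) / h) ** 2)
    return k @ v / k.sum(axis=1)

def approx_em(x, y, r, theta0, n_iter=50, h=0.3):
    b0, b1, s2 = theta0                 # step 1: consistent initial estimate
    xm = x[~r]                          # covariates of nonrespondents
    # step 2: empirical estimates, computed once outside the loop
    phat = nw(xm, x, r.astype(float), h)       # pr-hat[R=1 | x]
    e1_y = nw(xm, x[r], y[r], h)               # E-hat[Y   | x, R=1]
    e1_y2 = nw(xm, x[r], y[r] ** 2, h)         # E-hat[Y^2 | x, R=1]
    X = np.column_stack([np.ones_like(x), x])
    for _ in range(n_iter):
        # step 3 (E-step): E-hat[S | x, R=0] via the slide-15 decomposition
        mu = b0 + b1 * xm
        ey0 = (mu - e1_y * phat) / (1 - phat)
        ey20 = (mu ** 2 + s2 - e1_y2 * phat) / (1 - phat)
        # step 4 (M-step): complete-data normal equations with imputed moments
        ytil = np.where(r, y, 0.0)
        ytil[~r] = ey0
        b0, b1 = np.linalg.solve(X.T @ X, X.T @ ytil)
        mu_all = b0 + b1 * x
        sq = np.where(r, (y - mu_all) ** 2, 0.0)
        sq[~r] = ey20 - 2.0 * mu_all[~r] * ey0 + mu_all[~r] ** 2
        s2 = sq.mean()
    return b0, b1, s2
```

Because the normal score is linear in $(Y, Y^2)$, the M-step reduces to ordinary least squares on the completed first moments plus a variance update from the completed second moments.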

Slide 19: Here we design an iterative algorithm with a sequence $\{\theta^{(t)},\ t = 0, 1, 2, \ldots\}$ such that
$$\theta^{(t+1)} = \arg\max_\theta Q^*(\theta; \theta^{(t)}) = \arg\max_\theta \{Q(\theta; \theta^{(t)}) + o(1)\}.$$
This algorithm yields $l(\theta^{(t+1)}, \psi_0; y_{obs}) \ge l(\theta^{(t)}, \psi_0; y_{obs}) + o(1)$.

Slide 20: An example in contingency table analysis
Consider discrete data $\{x_i, y_i, z_i,\ i = 1, \ldots, n\}$:
- The $x_i$'s and $y_i$'s are fully observed; $z_i$ is observed for $i = 1, \ldots, c$ and missing for $i = c + 1, \ldots, n$. Let $m = n - c$.
- Essentially the observed data comprise the fully classified table $\{c_{jkl}\}$ and the partially classified table $\{m_{jk+}\}$.
- Complete external data $D_E = \{x_i, y_i, z_i,\ i = n + 1, \ldots, n + n_E\}$ are fully observed.
- A log-linear model with parameter $\theta$ is assumed, which leads to cell probabilities $\pi_{jkl} = \mathrm{pr}[X = j, Y = k, Z = l]$, $j = 1, \ldots, J$; $k = 1, \ldots, K$; $l = 1, \ldots, L$.
- Initial estimates $\{\pi^{(0)}_{jkl}\}$ are derived from the external complete data $D_E$.

Slide 21: Implementation under the discrete setting
Initial estimates:
$$\widehat{\mathrm{pr}}[Z = l \mid x = j, y = k, R = 1] = \frac{\sum_{i=1}^c I\{x_i = j, y_i = k, z_i = l\}}{\sum_{i=1}^c I\{x_i = j, y_i = k\}} = \frac{c_{jkl}}{c_{jk+}},$$
$$\widehat{\mathrm{pr}}[R = 1 \mid x = j, y = k] = \frac{\sum_{i=1}^c I\{x_i = j, y_i = k\}}{\sum_{i=1}^n I\{x_i = j, y_i = k\}} = \frac{c_{jk+}}{n_{jk+}}.$$
At the $t$th step, for each $(j, k, l)$, we update
$$S_{jkl} = c_{jkl} + \sum_{i=c+1}^n I\{x_i = j, y_i = k\}\, \widehat{\mathrm{pr}}[Z = l \mid x = j, y = k, R = 0]$$
$$= c_{jkl} + m_{jk+} \cdot \frac{\mathrm{pr}[Z = l \mid x = j, y = k; \theta^{(t)}] - \frac{c_{jkl}}{c_{jk+}} \cdot \frac{c_{jk+}}{n_{jk+}}}{1 - \frac{c_{jk+}}{n_{jk+}}} = c_{jkl} + m_{jk+} \cdot \frac{\pi^{(t)}_{jkl}/\pi^{(t)}_{jk+} - c_{jkl}/n_{jk+}}{m_{jk+}/n_{jk+}} = n_{jk+}\, \frac{\pi^{(t)}_{jkl}}{\pi^{(t)}_{jk+}}.$$
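A quick numerical check of the algebra above (our own illustration, with arbitrary counts): the long form of $S_{jkl}$ collapses to $n_{jk+}\,\pi^{(t)}_{jkl}/\pi^{(t)}_{jk+}$.

```python
# Verify S_jkl = n_{jk+} * pi_jkl / pi_{jk+} numerically.
import numpy as np

rng = np.random.default_rng(1)
J, K, L = 2, 3, 2
c = rng.integers(5, 50, size=(J, K, L)).astype(float)    # fully classified counts
m = rng.integers(5, 50, size=(J, K)).astype(float)       # partially classified (z missing)
pi = rng.dirichlet(np.ones(J * K * L)).reshape(J, K, L)  # current pi^{(t)}

c_jk = c.sum(axis=2)          # c_{jk+}
n_jk = c_jk + m               # n_{jk+}
pi_jk = pi.sum(axis=2)        # pi^{(t)}_{jk+}

# long form from the slide: c_jkl + m_{jk+} * pr-hat[Z = l | jk, R = 0]
phat_r0 = (pi / pi_jk[..., None]
           - (c / c_jk[..., None]) * (c_jk / n_jk)[..., None]) / (m / n_jk)[..., None]
S_long = c + m[..., None] * phat_r0
S_short = n_jk[..., None] * pi / pi_jk[..., None]
print(np.allclose(S_long, S_short))   # True
```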

Slide 22: Impression on the discrete setting
In the E-step, we end up imputing all $z_i$'s based on $(x_i, y_i)$, including the observed ones. If we include the external data $D_E$ in the algorithm, updating the sufficient statistics through
$$S^*_{jkl} = n^E_{jkl} + S_{jkl} = n^E_{jkl} + n_{jk+}\, \frac{\pi^{(t)}_{jkl}}{\pi^{(t)}_{jk+}},$$
the modified EM algorithm is equivalent to running a regular EM algorithm on the fully classified table $\{n^E_{jkl}\}$ and the partially classified table $\{n_{jk+}\}$, i.e., removing the observed $z_i$'s from the complete cases (which are not part of the external complete data).

Slide 23: Implementation under the continuous setting
Consider the regression analysis of $[Y \mid X]$ where $Y$ is continuous and subject to nonresponse.
If $X$ is discrete, the empirical approximations are
$$\hat E[S(x_i, Y) \mid x_i = k, r_i = 1] = \frac{\sum_{j:\, r_j = 1,\, x_j = x_i = k} s(x_i, y_j)}{\#\{j : r_j = 1, x_j = x_i = k\}}, \qquad \widehat{\mathrm{pr}}[r_i = 1 \mid x_i = k] = \frac{\#\{j : r_j = 1, x_j = x_i = k\}}{\#\{j : x_j = x_i = k\}}.$$
If $X$ is continuous, we use either the Nadaraya-Watson estimate or a local polynomial estimate for $\hat E[S(x_i, Y) \mid x_i = k, r_i = 1]$, and a kernel estimator for $\widehat{\mathrm{pr}}[r_i = 1 \mid x_i = k]$.
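For the discrete-$X$ case these are plain group averages; a small helper (our naming, shown for the component $S = Y$) makes that explicit.

```python
# Group-wise empirical estimates for discrete X: respondent averages of Y
# and group response rates (a sketch of the slide's formulas).
import numpy as np

def group_estimates(x, y, r):
    out = {}
    for k in np.unique(x):
        in_k = (x == k)
        obs_k = in_k & r
        e_s = y[obs_k].mean()              # E-hat[S(k, Y) | x = k, R = 1]
        p1 = obs_k.sum() / in_k.sum()      # pr-hat[R = 1 | x = k]
        out[k] = (e_s, p1)
    return out
```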

Slide 24: Simulation settings when data are NMAR
We simulated bivariate data $\{x_i, y_i,\ i = 1, 2, \ldots, N\}$ as follows:
(i) $x_i \sim N(0, 1)$ or $x_i \sim \mathrm{Bin}(5, 0.3)$;
(ii) $[y_i \mid x_i] \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, where $\theta = (\beta_0, \beta_1, \sigma^2) = (1, 1, 1)$.
Keep $n_E = 200$ of the above observations as the external dataset for obtaining initial values of $\theta$. Simulate missing $y_i$'s among the remaining $n = 1000$ subjects with
$$\mathrm{pr}[R_i = 1 \mid x_i, y_i] = \Phi(\psi_0 + \psi_1 x_i + \psi_2 y_i),$$
where $\Phi(\cdot)$ is the standard Gaussian CDF. Compare the complete-case estimate ($\hat\theta_{cc}$) based on the 1000 observations with missing data, the MLE from the external data ($\hat\theta_E$), and the proposed approximated EM estimate ($\hat\theta_{AEM}$).
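The design is straightforward to reproduce. In the sketch below the probit coefficients $(\psi_0, \psi_1, \psi_2)$ are our own picks, since the slide does not report them; they give a nonresponse rate in the ballpark of the stated 40%.

```python
# Hedged reconstruction of the NMAR simulation design.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2015)
N, n_E = 1200, 200
x = rng.normal(size=N)                    # alternatively: rng.binomial(5, 0.3, size=N)
y = 1.0 + 1.0 * x + rng.normal(size=N)    # theta = (beta0, beta1, sigma2) = (1, 1, 1)
x_E, y_E = x[:n_E], y[:n_E]               # complete external dataset (n_E = 200)
x_m, y_m = x[n_E:], y[n_E:]               # the n = 1000 subjects with nonresponse
psi0, psi1, psi2 = 0.9, 0.5, -0.5         # our choice, not from the slide
r = rng.uniform(size=N - n_E) < norm.cdf(psi0 + psi1 * x_m + psi2 * y_m)
```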

Slide 25: Simulation results with $X \sim N(0, 1)$ (NMAR)

                                           $\beta_0$   $\beta_1$   $\sigma^2$
  Complete-case analysis
      Empirical bias*
      Empirical SD*
      Coverage of 95% CI                   0%          8.8%        79.5%
  MLEs from external subset
      Empirical bias*
      Empirical SD*
      Coverage of 95% CI                   96.1%       93.8%       92.6%
  Approximated EM**
      Empirical bias*
      Empirical SD*
      Coverage of 95% CI                   95.0%       95.2%       93.4%

The proportion of nonresponse was about 40%.
* Empirical biases and SDs were the actual numbers times
** The Epanechnikov kernel was used in the approximated EM. The bootstrap was used to derive the standard errors of $\hat\theta_{AEM}$.

Slide 26: Simulation results with $X \sim \mathrm{Bin}(5, 0.3)$ (NMAR)

                                           $\beta_0$   $\beta_1$   $\sigma^2$
  Complete-case analysis
      Empirical bias*
      Empirical SD*
      Coverage of 95% CI                   94.6%       11.6%       86.5%
  MLEs from external subset
      Empirical bias*
      Empirical SD*
      Coverage of 95% CI                   94.7%       94.2%       92.5%
  Approximated EM**
      Empirical bias*
      Empirical SD*
      Coverage of 95% CI                   94.7%       93.9%       93.6%

The proportion of nonresponse was about 40%.
* Empirical biases and SDs were the actual numbers times
** The empirical averages were used in the approximated EM. The bootstrap was used to derive the standard errors of $\hat\theta_{AEM}$.

Slide 27: Simulation settings when data are MAR
We simulated bivariate data $\{x_i, y_i,\ i = 1, 2, \ldots, N\}$ as follows:
(i) $x_i \sim N(0, 1)$;
(ii) $[y_i \mid x_i] \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$, where $\theta = (\beta_0, \beta_1, \sigma^2) = (1, 1, 1)$.
Keep $n_E = 200$ of the above observations as the external dataset for obtaining initial values of $\theta$. Simulate missing $y_i$'s among the remaining $n = 1000$ subjects with
$$\mathrm{pr}[R_i = 1 \mid x_i, y_i] = \Phi(\psi_0 + \psi_1 x_i),$$
where $\Phi(\cdot)$ is the standard Gaussian CDF. Compare the complete-case estimate ($\hat\theta_{cc}$) based on the 1000 observations with missing data, the MLE from the external data ($\hat\theta_E$), and the proposed approximated EM estimate ($\hat\theta_{AEM}$).

Slide 28: Simulation results with $X \sim N(0, 1)$ (MAR)

                                           $\beta_0$   $\beta_1$   $\sigma^2$
  Complete-case analysis
      Empirical bias*
      Empirical SD*
      Coverage of 95% CI                   94%         94.5%       93.4%
  MLEs from external subset
      Empirical bias*
      Empirical SD*
      Coverage of 95% CI                   96.1%       93.8%       92.6%
  Approximated EM**
      Empirical bias*
      Empirical SD*
      Coverage of 95% CI                   93.9%       94.5%       93.7%

The proportion of nonresponse was about 40%.
* Empirical biases and SDs were the actual numbers times
** The Epanechnikov kernel was used in the modified EM. The bootstrap was used to derive the standard errors of $\hat\theta_{AEM}$.

Slide 29: Without an external complete dataset...
If one can obtain a reliable initial estimate $\theta^{(0)}$ and the associated variance-covariance matrix estimate $V$, then we can use it as the initial value and perform the M-steps via
$$\theta^{(t+1)} = \arg\max_\theta \Big\{ -(\theta - \theta^{(0)})^T V^{-1} (\theta - \theta^{(0)}) + \frac{1}{n} Q^*(\theta; \theta^{(t)}) \Big\}.$$
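A minimal sketch of this penalized M-step, assuming a user-supplied callable for $Q^*(\cdot\,; \theta^{(t)})$ and treating $V$ as given; all names here are ours.

```python
# Penalized M-step: trade off fidelity to the initial estimate theta0
# (with covariance V) against the approximated Q-function.
import numpy as np
from scipy.optimize import minimize

def penalized_m_step(theta0, V, q_star, theta_t, n):
    V_inv = np.linalg.inv(V)
    def neg_obj(th):
        pen = (th - theta0) @ V_inv @ (th - theta0)
        return pen - q_star(th, theta_t) / n   # minimizing this maximizes the objective
    return minimize(neg_obj, x0=theta_t, method="Nelder-Mead").x
```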

Slide 30: Without an external complete dataset...
Consider missing-data mechanisms such as
$$\mathrm{pr}[R_i = 1 \mid x_i, y_i] = w(x_i, y_i; \psi) = w(y_i; \psi).$$
There are two approaches for implementing the approximated EM algorithm for data with outcome-dependent nonresponse:
1. Use the pseudolikelihood estimate as the initial estimate and run the approximated EM.
2. Use the pseudolikelihood estimate as the initial estimate and incorporate the pseudolikelihood function as a component in each M-step:
$$\theta^{(t+1)} = \arg\max_\theta \{l_{pl}(\theta) + Q^*(\theta; \theta^{(t)})\}.$$
The second approach is more computationally intensive.

Slide 31: Future work
- Monitor $\{l(\theta^{(t)}; \psi_0),\ Q^*(\theta^{(t+1)}; \theta^{(t)}) - Q(\theta^{(t+1)}; \theta^{(t)});\ t = 0, 1, 2, \ldots\}$.
- Search for a target function $l(\theta; P_n^{(X,Y,R)}, \theta^{(0)})$ so that $\hat\theta_{AEM}$ is a stationary point of $l(\theta; P_n^{(X,Y,R)}, \theta^{(0)})$.
- Variance estimation.
- Link to integrative data analysis.

Slide 32: Acknowledgment
Megan Hunt Olson, University of Wisconsin, Green Bay.
Yang Zhang, Amgen Inc.

Slide 33: References
1. Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B 39, 1-38.
2. Tang, G., Little, R. J. A. and Raghunathan, T. (2003). Analysis of multivariate missing data with nonignorable nonresponse. Biometrika 90, 747-764.
3. Kim, J. K. and Yu, C. L. (2011). A semiparametric estimation of mean functionals with non-ignorable missing data. Journal of the American Statistical Association 106, 157-165.
4. Wang, S., Shao, J. and Kim, J. K. (2014). An instrumental variable approach for identification and estimation with nonignorable nonresponse. Statistica Sinica 24, 1097-1116.
