Biostat 2065 Analysis of Incomplete Data
Gong Tang
Department of Biostatistics, University of Pittsburgh
October 20, 2005

1. Large-sample inference based on ML

Let $\hat\theta$ be the MLE. Large-sample theory implies that, approximately, $(\hat\theta - \theta) \sim N(0, C)$, where $C = J^{-1}(\theta)$ is the inverse of the expected information. Therefore a natural estimate of $C$ is $C_1 = J^{-1}(\hat\theta)$. An alternative estimate is $C_2 = I^{-1}(\hat\theta)$, the inverse of the observed information at the MLE. When misspecification is plausible, a robust (sandwich) estimate of the variance is
$$C_3 = I^{-1}(\hat\theta)\, K(\hat\theta)\, I^{-1}(\hat\theta), \qquad K(\theta) = \sum_{i=1}^{n} \frac{\partial l_i(\theta)}{\partial \theta}\, \frac{\partial l_i(\theta)}{\partial \theta^{T}},$$
where $l_i(\theta)$ denotes the loglikelihood contribution of unit $i$. In the Newton-Raphson algorithm, $C_2$ is computed as part of an iteration step, and the components of $C_3$ can easily be obtained from the scores and $C_2$. When the EM algorithm or one of its variants is used for ML estimation, additional steps are needed to compute standard errors of the estimates; for example, further calculation of $C_2$ or Louis's formula may be necessary.
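As a numerical illustration of these variance estimates, the sketch below computes $C_2$ from a finite-difference Hessian and $C_3$ as the sandwich estimate. The per-unit loglikelihood loglik_i, the step size, and the data format are assumptions for illustration, not part of the notes.

```python
import numpy as np

def num_grad(f, theta, eps=1e-5):
    """Central-difference gradient of a scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = eps
        g[k] = (f(theta + e) - f(theta - e)) / (2.0 * eps)
    return g

def ml_variance_estimates(loglik_i, theta_hat, data, eps=1e-5):
    """Return C2 = I^{-1}(theta_hat) and the sandwich estimate C3.

    loglik_i(theta, x) is the loglikelihood contribution of one unit x;
    the total loglikelihood is the sum of loglik_i over the units in data.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    d = theta_hat.size
    total = lambda th: sum(loglik_i(th, x) for x in data)

    # Observed information I(theta_hat): negative Hessian of the total
    # loglikelihood, obtained by differencing the numerical gradient.
    H = np.zeros((d, d))
    for j in range(d):
        e = np.zeros(d)
        e[j] = eps
        H[:, j] = (num_grad(total, theta_hat + e, eps) -
                   num_grad(total, theta_hat - e, eps)) / (2.0 * eps)
    I_obs = -0.5 * (H + H.T)              # symmetrized observed information
    C2 = np.linalg.inv(I_obs)

    # K(theta_hat): sum of outer products of the per-unit scores.
    K = np.zeros((d, d))
    for x in data:
        s = num_grad(lambda th: loglik_i(th, x), theta_hat, eps)
        K += np.outer(s, s)
    C3 = C2 @ K @ C2                      # robust (sandwich) estimate
    return C2, C3
```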

2. Supplemented EM algorithm

The Supplemented EM (SEM) algorithm (Meng & Rubin, 1991) is a way to calculate the large-sample covariance matrix of $\hat\theta$ using only:
1. code for the E and M steps of EM,
2. code for the large-sample complete-data variance-covariance matrix $V_{com}$,
3. standard matrix operations.

Recall that $DM = I - i_{obs}\, i_{com}^{-1}$, where $DM$ is the derivative of the EM mapping, $i_{com}$ is the complete information and $i_{obs} = I(\theta \mid Y_{obs})$ is the observed information. Therefore
$$I^{-1}(\theta \mid Y_{obs}) = i_{obs}^{-1} = i_{com}^{-1}(I - DM)^{-1} = V_{com}(I - DM)^{-1}.$$
Denote $V_{obs} = I^{-1}(\theta \mid Y_{obs})$; then
$$V_{obs} = V_{com}(I - DM + DM)(I - DM)^{-1} = V_{com}\{I + DM(I - DM)^{-1}\} := V_{com} + \Delta V,$$
where $\Delta V = V_{com}\, DM\, (I - DM)^{-1}$ is the increase in variance due to missing data. Although the EM mapping $M$ has no explicit form, its derivative $DM$ can be approximated effectively using some extra EM steps.

3. Implementation of SEM

First obtain the MLE $\hat\theta$, then run a sequence of SEM iterations as follows. The initial value is chosen close to $\hat\theta$, and at step $t$ the current estimate is $\theta^{(t)}$.
1. Run the usual E and M steps to obtain $\theta^{(t+1)}$.
2. Fix $i = 1$ and calculate $\theta^{(t)}(i) = (\hat\theta_1, \ldots, \hat\theta_{i-1}, \theta_i^{(t)}, \hat\theta_{i+1}, \ldots, \hat\theta_d)$, which equals $\hat\theta$ except in the $i$th component.
3. Treating $\theta^{(t)}(i)$ as the current estimate of $\theta$, run one iteration of EM to obtain $\theta^{(t+1)}(i)$.
4. Obtain the ratios
$$r_{ij}^{(t)} = \frac{\theta_j^{(t+1)}(i) - \hat\theta_j}{\theta_i^{(t)} - \hat\theta_i}, \qquad j = 1, \ldots, d.$$
5. Repeat steps 2 to 4 for $i = 2, \ldots, d$.

The output of step $t$ is $\theta^{(t+1)}$ and $\{r_{ij}^{(t)} : i, j = 1, \ldots, d\}$. As $t$ increases, the elements $r_{ij}^{(t)}$ converge, and their limits form an approximation of $DM$.
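The following Python sketch implements this procedure, approximating $DM$ by the ratios $r_{ij}^{(t)}$ and then forming $V_{obs} = V_{com} + \Delta V$. The names em_step (one combined E and M update) and V_com, and the fixed number of iterations, are assumptions for illustration; in practice each row of the ratio matrix is monitored and frozen once it has stabilized.

```python
import numpy as np

def sem_covariance(em_step, theta_hat, theta0, V_com, n_iter=20):
    """Supplemented EM: approximate DM and return V_obs = V_com + dV.

    em_step(theta) performs one E step and one M step and returns the
    updated parameter vector; theta_hat is the MLE obtained by running
    EM to convergence; theta0 is a starting value close to theta_hat.
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    d = theta_hat.size
    theta_t = np.asarray(theta0, dtype=float)
    R = np.zeros((d, d))

    for _ in range(n_iter):
        theta_next = em_step(theta_t)              # step 1: usual EM update
        for i in range(d):
            theta_i = theta_hat.copy()             # step 2: equal to theta_hat ...
            theta_i[i] = theta_t[i]                # ... except in component i
            theta_i_next = em_step(theta_i)        # step 3: one EM step from theta(i)
            R[i, :] = ((theta_i_next - theta_hat) /
                       (theta_t[i] - theta_hat[i]))    # step 4: ratios r_ij
        theta_t = theta_next                       # the ratios stabilize as t grows

    DM = R                                         # limit of the ratio matrix
    dV = V_com @ DM @ np.linalg.inv(np.eye(d) - DM)    # increase due to missing data
    return V_com + dV                              # V_obs = V_com + dV
```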

4. Other methods

1. Bootstrapping the observed data (a minimal sketch is given below).
2. Bayesian methods: using the posterior variance under a flat prior.
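A minimal sketch of the bootstrap approach, assuming a routine run_em that returns the MLE for a given data set; the number of replicates and the unit-level resampling scheme are illustrative assumptions.

```python
import numpy as np

def bootstrap_se(run_em, data, B=200, seed=0):
    """Bootstrap standard errors: resample observed units, re-fit by EM,
    and take the empirical covariance of the replicate estimates."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    estimates = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # sample units with replacement
        estimates.append(run_em(data[idx]))     # MLE on the bootstrap sample
    estimates = np.asarray(estimates)
    return np.cov(estimates, rowvar=False)      # bootstrap variance-covariance matrix
```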

5. The ECM algorithm

There are situations where the M step does not have an explicit solution even when the complete data are from an exponential family, so an iterative procedure is usually required within each M step. Sometimes the M step can be modified into several conditional maximization (CM) steps in order to avoid iterative M steps. This modification is called the ECM algorithm.

Suppose the parameter is $\theta = \{\theta_1, \theta_2, \ldots, \theta_S\}$ and the current estimate is $\theta^{(t)} = \{\theta_1^{(t)}, \ldots, \theta_S^{(t)}\}$. The M step then consists of $S$ CM-steps:
1. At the $(t + 1/S)$th CM-step, let $\tilde\theta = \{\theta_1, \theta_2^{(t)}, \ldots, \theta_S^{(t)}\}$ and maximize $Q(\tilde\theta; \theta^{(t)})$ with respect to $\theta_1$. Denote the maximizer by $\theta_1^{(t+1/S)}$.
2. Similarly, at the $(t + 2/S)$th CM-step, let $\tilde\theta = \{\theta_1^{(t+1/S)}, \theta_2, \theta_3^{(t)}, \ldots, \theta_S^{(t)}\}$ and maximize $Q(\tilde\theta; \theta^{(t)})$ with respect to $\theta_2$. Denote the maximizer by $\theta_2^{(t+2/S)}$.
3. Continue sequentially, maximizing the Q-function with respect to $\theta_s$ with all other components fixed at their most recent values, for $s = 1, 2, \ldots, S$.

After all $S$ CM-steps, $\theta^{(t+1)} = \{\theta_1^{(t+1/S)}, \ldots, \theta_S^{(t+S/S)}\}$ is the updated estimate of $\theta$ for the subsequent E step. Since each CM step increases $Q$, ECM is a GEM algorithm and monotonically increases the likelihood of $\theta$. Under the same conditions that guarantee the convergence of EM, ECM converges to a stationary point of the likelihood, i.e., a solution to the score equation of $\theta$. A schematic sketch of one ECM iteration is given below.
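The sketch below shows only the structure of one ECM iteration; the callables e_step and cm_steps, and the representation of $\theta$ as a list of $S$ blocks, are illustrative assumptions, since the notes describe the algorithm generically rather than as code.

```python
def ecm_iteration(theta, e_step, cm_steps):
    """One ECM iteration.

    e_step(theta) returns whatever the Q-function needs (for example, the
    expected complete-data sufficient statistics given theta and Y_obs).
    cm_steps is a list of S functions; cm_steps[s](theta, expectations)
    maximizes Q over block s with the other blocks held fixed, and returns
    the updated value of block s.  theta is a list of S parameter blocks.
    """
    expectations = e_step(theta)              # E step at theta^(t)
    theta = list(theta)
    for s, cm in enumerate(cm_steps):         # S conditional maximizations
        theta[s] = cm(theta, expectations)    # blocks 0..s-1 already updated
    return theta                              # theta^(t+1)
```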

Example 8.6. A multivariate normal regression model with incomplete data. Model: $y_i \sim N_K(X_i \beta, \Sigma)$, $i = 1, 2, \ldots, n$. Here the joint M step over $(\beta, \Sigma)$ has no closed form, but two CM steps do: given $\Sigma$, the update of $\beta$ is a generalized least squares fit, and given $\beta$, the update of $\Sigma$ is the average of the residual cross-products. A sketch of these two CM updates follows.
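A minimal sketch of the two CM updates for the complete-data case; array shapes and names are illustrative assumptions, and in the incomplete-data ECM the E step would first replace the missing parts of the sufficient statistics by their conditional expectations.

```python
import numpy as np

def cm_beta(X, Y, Sigma):
    """CM1: given Sigma, update beta by generalized least squares.
    X has shape (n, K, p); Y has shape (n, K); Sigma is K x K."""
    Sinv = np.linalg.inv(Sigma)
    A = sum(X[i].T @ Sinv @ X[i] for i in range(len(Y)))
    b = sum(X[i].T @ Sinv @ Y[i] for i in range(len(Y)))
    return np.linalg.solve(A, b)

def cm_sigma(X, Y, beta):
    """CM2: given beta, update Sigma by the average residual cross-product."""
    n = len(Y)
    resid = [Y[i] - X[i] @ beta for i in range(n)]
    return sum(np.outer(r, r) for r in resid) / n
```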

6. Univariate t with unknown degrees of freedom

Suppose that the observed data consist of a random sample $X = (x_1, x_2, \ldots, x_n)$ from a Student's t distribution with center $\mu$, scale parameter $\sigma$, and unknown degrees of freedom $\nu$, with density
$$f(x_i; \theta) = \frac{\Gamma(\nu/2 + 1/2)}{(\pi\nu\sigma^2)^{1/2}\,\Gamma(\nu/2)\,\{1 + (x_i - \mu)^2/(\nu\sigma^2)\}^{(\nu+1)/2}}.$$
An augmented complete data set can be defined as $Y = (Y_{obs}, Y_{mis})$, where $Y_{obs} = X$ and $Y_{mis} = W = (w_1, w_2, \ldots, w_n)$ is a vector of unobserved positive quantities, such that the pairs $(w_i, x_i)$ are independent across units $i$, with
$$(x_i \mid w_i; \theta) \sim N(\mu, \sigma^2/w_i), \qquad (w_i; \theta) \sim \chi^2_\nu / \nu.$$
The M step is complicated by the estimation of $\nu$. It can be replaced by two CM-steps:
1. CM1: For current parameters $\theta^{(t)} = (\mu^{(t)}, \sigma^{(t)}, \nu^{(t)})$, maximize $Q$ with respect to $(\mu, \sigma)$ with $\nu = \nu^{(t)}$ fixed. This yields $(\mu, \sigma) = (\mu^{(t+1)}, \sigma^{(t+1)})$.
2. CM2: Maximize $Q$ with respect to $\nu$ with $(\mu, \sigma) = (\mu^{(t+1)}, \sigma^{(t+1)})$.

The complete-data loglikelihood is
$$l(\mu, \sigma^2, \nu; Y) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2}\sum_{i=1}^{n} \frac{w_i (x_i - \mu)^2}{\sigma^2} + \frac{n\nu}{2}\log(\nu/2) - n\log\Gamma(\nu/2) + \Big(\frac{\nu}{2} - 1\Big)\sum_{i=1}^{n}\log w_i - \frac{\nu}{2}\sum_{i=1}^{n} w_i.$$

From the complete-data loglikelihood, the sufficient statistics are $\sum_i w_i$, $\sum_i w_i x_i$, $\sum_i w_i x_i^2$, and $\sum_i \log w_i$; the E step replaces them by their conditional expectations given $Y_{obs}$ and $\theta^{(t)}$. Since $\nu$ is a scalar, the maximizer $\nu^{(t+1)}$ can be found by an iterative one-dimensional search. A sketch of the full ECM iteration is given below.
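This sketch of one ECM iteration uses the standard conditional expectations of the gamma-distributed weights, $E(w_i \mid x_i, \theta) = (\nu+1)/\{\nu + (x_i - \mu)^2/\sigma^2\}$ and $E(\log w_i \mid x_i, \theta) = \psi\{(\nu+1)/2\} - \log\{(\nu + (x_i - \mu)^2/\sigma^2)/2\}$, which are not spelled out in the notes; the numerical search bounds for $\nu$ are illustrative assumptions.

```python
import numpy as np
from scipy.special import digamma, gammaln
from scipy.optimize import minimize_scalar

def ecm_step_t(x, mu, sigma2, nu):
    """One ECM iteration for the univariate t with unknown df.

    E step: conditional expectations of w_i and log w_i given x_i, theta^(t).
    CM1:    update (mu, sigma2) with nu fixed at nu^(t).
    CM2:    update nu by a one-dimensional search on the nu-part of Q.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    delta = (x - mu) ** 2 / sigma2
    w = (nu + 1.0) / (nu + delta)                                   # E(w_i | x_i)
    logw = digamma((nu + 1.0) / 2.0) - np.log((nu + delta) / 2.0)   # E(log w_i | x_i)

    # CM1: weighted mean and weighted average squared deviation.
    mu_new = np.sum(w * x) / np.sum(w)
    sigma2_new = np.sum(w * (x - mu_new) ** 2) / n

    # CM2: maximize the nu-part of Q, which depends only on the E-step sums.
    s_w, s_logw = np.sum(w), np.sum(logw)
    def neg_q_nu(nu_):
        return -(n * nu_ / 2.0 * np.log(nu_ / 2.0) - n * gammaln(nu_ / 2.0)
                 + (nu_ / 2.0 - 1.0) * s_logw - nu_ / 2.0 * s_w)
    nu_new = minimize_scalar(neg_q_nu, bounds=(0.01, 200.0), method='bounded').x

    return mu_new, sigma2_new, nu_new
```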

7. ECME algorithm

The ECME (Expectation/Conditional Maximization Either) algorithm replaces some of the CM steps of ECM, which maximize the constrained expected complete-data loglikelihood (the Q-function), with steps that maximize the correspondingly constrained actual likelihood function.
1. ECME shares the stable monotone convergence and simplicity of implementation of EM and ECM.
2. ECME can have a substantially faster convergence rate than EM or ECM, because:
(a) in some of ECME's M steps, the actual likelihood (rather than an approximation of it) is being conditionally maximized;
(b) ECME allows faster computation with constrained maximization.

Example 8.9. Univariate t with unknown degrees of freedom. An ECME algorithm is obtained by retaining the E and CM1 steps of the previous example, but replacing the CM2 step by maximizing the observed-data loglikelihood with respect to $\nu$, with $(\mu, \sigma)$ held at their current values. A sketch of this modified CM2 step is given below.
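A sketch of the ECME modification of the CM2 step for Example 8.9: the observed-data t loglikelihood is maximized over $\nu$ with $(\mu, \sigma^2)$ held fixed, so it can replace the CM2 block of the ecm_step_t sketch above. The search bounds are illustrative assumptions.

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

def cm2_ecme_nu(x, mu, sigma2):
    """ECME CM2 step: maximize the observed-data t loglikelihood over nu
    with (mu, sigma2) fixed at their newest values."""
    x = np.asarray(x, dtype=float)
    delta = (x - mu) ** 2 / sigma2

    def neg_loglik(nu):
        # Observed-data loglikelihood of the t(mu, sigma2, nu) density.
        return -np.sum(gammaln((nu + 1.0) / 2.0) - gammaln(nu / 2.0)
                       - 0.5 * np.log(np.pi * nu * sigma2)
                       - (nu + 1.0) / 2.0 * np.log1p(delta / nu))

    return minimize_scalar(neg_loglik, bounds=(0.01, 200.0), method='bounded').x
```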