EM Algorithm II. September 11, 2018

1 EM Algorithm II September 11, 2018

2 Review EM 1/27
(Y_obs, Y_mis) ~ f(y_obs, y_mis | θ); we observe Y_obs but not Y_mis.
Complete-data log-likelihood: l_C(θ | Y_obs, Y_mis) = log f(Y_obs, Y_mis | θ)
Observed-data log-likelihood: l_O(θ | Y_obs) = log ∫ f(Y_obs, y_mis | θ) dy_mis
EM algorithm:
E step: h^(k)(θ) = E{ l_C(θ | Y_obs, Y_mis) | Y_obs, θ^(k) }. Note, the integration is with respect to Y_mis | Y_obs, so
  E{ l_C(θ | Y_obs, Y_mis) | Y_obs, θ^(k) } = ∫ l_C(θ | Y_obs, y_mis) f(y_mis | Y_obs, θ^(k)) dy_mis.
M step: θ^(k+1) = arg max_θ h^(k)(θ)
Ascent property: l_O(θ^(k) | Y_obs) is non-decreasing in k. If you can calculate it, it is a good idea to monitor it for debugging purposes.
Issues: 1. EM can get trapped in a local maximum. 2. Convergence can be slow.
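To make the notation concrete, here is a minimal generic EM driver in Python. It is only a sketch: the `e_step`, `m_step`, and `obs_loglik` callables are hypothetical placeholders that must be supplied for each specific model. The optional `obs_loglik` monitors l_O(θ^(k) | Y_obs) so the ascent property can be used for debugging, as suggested above.

```python
import numpy as np

def em(theta0, e_step, m_step, obs_loglik=None, max_iter=500, tol=1e-8):
    """Generic EM driver (a sketch; e_step/m_step are problem-specific).

    e_step(theta)     -> sufficient statistics / ingredients of h^(k)
    m_step(stats)     -> updated theta, the maximizer of h^(k)
    obs_loglik(theta) -> optional l_O(theta | Y_obs), monitored for the ascent property
    """
    theta = np.asarray(theta0, dtype=float)
    prev_ll = -np.inf
    for k in range(max_iter):
        stats = e_step(theta)                      # E step: E{l_C | Y_obs, theta^(k)}
        theta_new = np.asarray(m_step(stats), dtype=float)   # M step
        if obs_loglik is not None:
            ll = obs_loglik(theta_new)
            # Ascent property: l_O should never decrease (up to round-off).
            assert ll >= prev_ll - 1e-10, "observed-data log-likelihood decreased"
            prev_ll = ll
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new, k + 1
        theta = theta_new
    return theta, max_iter
```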

3 Standard errors for the EM estimates 2/27
Numerical approximation of the Hessian matrix. Note: l(θ) = observed-data log-likelihood.
We estimate the gradient using
  ∇l(θ)_i = ∂l(θ)/∂θ_i ≈ { l(θ + δ_i e_i) − l(θ − δ_i e_i) } / (2 δ_i),
where e_i is a unit vector with 1 for the ith element and 0 otherwise.
In calculating derivatives using this formula, generally start with some medium-sized δ and then repeatedly halve it until the estimated derivative stabilizes.
We can estimate the Hessian by applying the above formula twice:
  ∇²l(θ)_{ij} ≈ { l(θ + δ_i e_i + δ_j e_j) − l(θ + δ_i e_i − δ_j e_j) − l(θ − δ_i e_i + δ_j e_j) + l(θ − δ_i e_i − δ_j e_j) } / (4 δ_i δ_j).
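A direct implementation of these central-difference formulas, as a sketch: the `loglik` argument is the user-supplied observed-data log-likelihood, and a single common δ is used here, whereas in practice one would start with a moderate δ and halve it until the estimates stabilize, as noted above.

```python
import numpy as np

def num_grad(loglik, theta, delta=1e-4):
    """Central-difference gradient of the observed-data log-likelihood."""
    theta = np.asarray(theta, dtype=float)
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = delta
        g[i] = (loglik(theta + e) - loglik(theta - e)) / (2 * delta)
    return g

def num_hessian(loglik, theta, delta=1e-4):
    """Apply the central-difference formula twice to estimate the Hessian."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    H = np.zeros((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p); ei[i] = delta
            ej = np.zeros(p); ej[j] = delta
            H[i, j] = (loglik(theta + ei + ej) - loglik(theta + ei - ej)
                       - loglik(theta - ei + ej) + loglik(theta - ei - ej)) / (4 * delta**2)
    return H

# Standard errors: square roots of the diagonal of the inverse negative Hessian at the MLE,
# e.g. se = np.sqrt(np.diag(np.linalg.inv(-num_hessian(loglik, theta_hat))))
```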

4 Louis estimator (1982) 3/27
l_C(θ | Y_obs, Y_mis) = log f(Y_obs, Y_mis | θ)
l_O(θ | Y_obs) = log ∫ f(Y_obs, y_mis | θ) dy_mis
∇l_C(θ | Y_obs, Y_mis), ∇l_O(θ | Y_obs): gradients of l_C, l_O
∇²l_C(θ | Y_obs, Y_mis), ∇²l_O(θ | Y_obs): second derivatives of l_C, l_O
We can prove that
(5) ∇l_O(θ | Y_obs) = E{ ∇l_C(θ | Y_obs, Y_mis) | Y_obs }
(6) ∇²l_O(θ | Y_obs) = E{ ∇²l_C(θ | Y_obs, Y_mis) | Y_obs } + E{ [∇l_C(θ | Y_obs, Y_mis)]² | Y_obs } − [∇l_O(θ | Y_obs)]²
MLE: θ̂ = arg max_θ l_O(θ | Y_obs)
Louis variance estimator: { −∇²l_O(θ | Y_obs) }^{−1} evaluated at θ = θ̂
Note: All of the conditional expectations can be computed within the EM algorithm using only ∇l_C and ∇²l_C, the first and second derivatives of the complete-data log-likelihood. The Louis estimator should be evaluated at the last step of EM.

5 Proof of (5) 4/27
Proof: By the definition of l_O(θ | Y_obs),
  ∇l_O(θ | Y_obs) = ∂/∂θ log{ ∫ f(Y_obs, y_mis | θ) dy_mis }
                  = [ ∂/∂θ ∫ f(Y_obs, y_mis | θ) dy_mis ] / ∫ f(Y_obs, y_mis | θ) dy_mis
                  = ∫ [∂f(Y_obs, y_mis | θ)/∂θ] dy_mis / ∫ f(Y_obs, y_mis | θ) dy_mis.   (7)
Multiplying and dividing the integrand of the numerator by f(Y_obs, y_mis | θ) gives (5):
  ∇l_O(θ | Y_obs) = ∫ { [∂f(Y_obs, y_mis | θ)/∂θ] / f(Y_obs, y_mis | θ) } f(Y_obs, y_mis | θ) dy_mis / ∫ f(Y_obs, y_mis | θ) dy_mis
                  = ∫ [∂ log f(Y_obs, y_mis | θ)/∂θ] f(Y_obs, y_mis | θ) dy_mis / ∫ f(Y_obs, y_mis | θ) dy_mis
                  = ∫ ∇l_C(θ | Y_obs, y_mis) f(y_mis | Y_obs, θ) dy_mis
                  = E{ ∇l_C(θ | Y_obs, Y_mis) | Y_obs }.

6 Proof of (6) 5/27
Proof: We take an additional derivative of ∇l_O(θ | Y_obs) in expression (7) to obtain
  ∇²l_O(θ | Y_obs) = ∫ [∂²f(Y_obs, y_mis | θ)/∂θ∂θ'] dy_mis / ∫ f(Y_obs, y_mis | θ) dy_mis
                     − { ∫ [∂f(Y_obs, y_mis | θ)/∂θ] dy_mis / ∫ f(Y_obs, y_mis | θ) dy_mis }²
                   = ∫ [∂²f(Y_obs, y_mis | θ)/∂θ∂θ'] dy_mis / f(Y_obs | θ) − { ∇l_O(θ | Y_obs) }².   (8)
To see how the first term breaks down, we take an additional derivative of both sides of
  ∫ [∂f(Y_obs, y_mis | θ)/∂θ] dy_mis = ∫ [∂ log f(Y_obs, y_mis | θ)/∂θ] f(Y_obs, y_mis | θ) dy_mis

7 to obtain
  ∫ [∂²f(Y_obs, y_mis | θ)/∂θ∂θ'] dy_mis = ∫ [∂² log f(Y_obs, y_mis | θ)/∂θ∂θ'] f(Y_obs, y_mis | θ) dy_mis
                                            + ∫ [∂ log f(Y_obs, y_mis | θ)/∂θ]² f(Y_obs, y_mis | θ) dy_mis.
Thus the first term in equation (8) equals
  E{ ∇²l_C(θ | Y_obs, Y_mis) | Y_obs } + E{ [∇l_C(θ | Y_obs, Y_mis)]² | Y_obs }.

8 Convergence rate of EM 7/27
Let I_C(θ) and I_O(θ) denote the complete information and observed information, respectively. One can show that when EM converges, the linear convergence rate, defined as (θ^(k+1) − θ̂)/(θ^(k) − θ̂), is approximately 1 − I_O(θ̂)/I_C(θ̂). (Shown later.) This means that:
  When the fraction of missing information is small, EM converges quickly.
  Otherwise, EM converges slowly.

9 Towards SEM algorithm 8/27
The EM algorithm does not generate the asymptotic covariance matrix (standard errors) of the parameters as a byproduct.
The asymptotic covariance matrix for θ̂, denoted V, can be found as { −∇²l_O(θ̂ | Y_obs) }^{−1}. However, the derivatives of l_O can be difficult to evaluate directly.
In contrast, ∇²l_C(θ̂ | Y_obs, Y_mis), and hence I_OC ≡ E{ −∇²l_C(θ̂ | Y_obs, Y_mis) | Y_obs }, is relatively easy to evaluate.
The Louis estimator of the covariance matrix, i.e. the inverse of
  −E{ ∇²l_C(θ̂ | Y_obs, Y_mis) | Y_obs } − E{ [∇l_C(θ̂ | Y_obs, Y_mis)]² | Y_obs } + [∇l_O(θ̂ | Y_obs)]²,
requires calculation of the conditional expectation of the square of the complete-data score function, which is specific to each problem.
The Supplemented EM (SEM) algorithm (Meng & Rubin, 1991) obtains the covariance matrix using only the code for computing the complete-data covariance matrix, the code for EM itself, and code for standard matrix operations.

10 Towards SEM algorithm (continued) 9/27
EM defines a mapping M: θ^(k+1) = M(θ^(k)), where M(θ) = (M_1(θ), ..., M_p(θ)).
Let {DM}_{ij} = ( ∂M_j(θ)/∂θ_i ) evaluated at θ = θ̂, which is a p × p matrix.
We can show that θ^(k+1) − θ̂ ≈ DM (θ^(k) − θ̂), which means DM is the rate of convergence of EM.
Proof: Because θ^(k+1) = M(θ^(k)) and θ̂ = M(θ̂), we have θ^(k+1) − θ̂ = M(θ^(k)) − M(θ̂). A Taylor series expansion of the right-hand side gives θ^(k+1) − θ̂ ≈ DM (θ^(k) − θ̂).
It has been shown that V = I_OC^{−1} (I − DM)^{−1}, which means the observed-data asymptotic variance can be obtained by inflating the complete-data asymptotic variance I_OC^{−1} by the factor (I − DM)^{−1}.

11 SEM algorithm 10/27
SEM consists of three parts:
1. The evaluation of I_OC
2. The evaluation of DM
3. The evaluation of V
Evaluation of I_OC ≡ E{ −∇²l_C(θ̂ | Y_obs, Y_mis) | Y_obs }:
Example 1 (grouped multinomial): l_C(θ | X) = (x_1 + y_4) log θ + (y_2 + y_3) log(1 − θ).
Example 2 (normal mixtures): l_C(µ, σ, p | x, y) = Σ_i Σ_j y_ij { log p_j + log φ(x_i | µ_j, σ_j) }.
f(Y_obs, Y_mis) in the exponential family: l_C(θ | X) = S(X)' η(θ) − B(θ).
In these cases l_C, ∇l_C and ∇²l_C are linear in x_1, Σ_i y_ij and S(X) (the sufficient statistics). Recall that we evaluate E(sufficient statistics | Y_obs, θ^(k)) at every E step. We easily obtain I_OC by plugging in E(sufficient statistics | Y_obs, θ̂) at the last E step.

12 SEM algorithm (continued) 11/27
Evaluation of DM = {r_ij}:
For a scalar θ, we can use the sequence θ^(k) to obtain DM. For a vector θ, we cannot do so, because θ_i^(k+1) − θ̂_i ≈ Σ_j DM_ij (θ_j^(k) − θ̂_j).
Each DM_ij is the component-wise rate of convergence of the following "forced EM":
1. Run EM to get the MLE θ̂.
2. Pick a starting point θ^(0) some small distance from θ̂, but not equal to θ̂ in any component.
3. Repeat the following until each r_ij^(k) is stable:
  (a) Calculate θ^(k) = M(θ^(k−1)) using one step of EM.
  (b) For each i = 1, ..., p:
    i. Let θ^(k)(i) = (θ̂_1, ..., θ̂_{i−1}, θ_i^(k), θ̂_{i+1}, ..., θ̂_p) (replace the ith element of θ̂ with the ith element of θ^(k)).
    ii. Perform one step of EM on θ^(k)(i) to obtain M[θ^(k)(i)].
    iii. Obtain r_ij^(k) = { M_j[θ^(k)(i)] − θ̂_j } / { θ_i^(k) − θ̂_i } for j = 1, ..., p.

13 SEM algorithm (continued) 12/27
Note:
  The MLE θ̂ should be obtained at a very low tolerance ε.
  The final r_ij is taken to be the first value of r_ij^(k) satisfying |r_ij^(k) − r_ij^(k−1)| < ε, where k can be different for different (i, j).
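Below is a minimal sketch of this forced-EM computation of DM, assuming a user-supplied `em_step` function that performs one full E plus M update M(θ); the stopping rule and starting value follow the recipe on the two slides above.

```python
import numpy as np

def sem_dm(em_step, theta_hat, theta0, tol=1e-6, max_iter=200):
    """Numerically estimate DM for the SEM algorithm (a sketch).

    em_step(theta) -> one full E+M update, i.e. the EM mapping M(theta)
    theta_hat      -> converged MLE from a low-tolerance EM run
    theta0         -> starting value differing from theta_hat in every component
    """
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta = np.asarray(theta0, dtype=float)
    p = theta_hat.size
    R_prev = np.full((p, p), np.inf)
    DM = np.full((p, p), np.nan)
    done = np.zeros((p, p), dtype=bool)
    for _ in range(max_iter):
        theta = np.asarray(em_step(theta), dtype=float)    # theta^(k) = M(theta^(k-1))
        R = np.empty((p, p))
        for i in range(p):
            t_i = theta_hat.copy()
            t_i[i] = theta[i]                              # theta^(k)(i)
            M_ti = np.asarray(em_step(t_i), dtype=float)   # one EM step from theta^(k)(i)
            R[i, :] = (M_ti - theta_hat) / (theta[i] - theta_hat[i])
        newly = (~done) & (np.abs(R - R_prev) < tol)
        DM[newly] = R[newly]                               # freeze r_ij at first stable value
        done |= newly
        if done.all():
            break
        R_prev = R
    return DM

# Then V = np.linalg.inv(I_OC) @ np.linalg.inv(np.eye(p) - DM), as on slide 10.
```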

14 MCEM algorithm 13/27
EM algorithm: the analytical integration required for the E-step can be difficult.
The Monte Carlo EM algorithm (Wei & Tanner, 1990) replaces the analytical integration in the E-step by a Monte Carlo integration procedure, with samples drawn by MCMC techniques such as the Gibbs sampler or the Metropolis–Hastings algorithm.
MCE step: Simulate a sample Y_mis,1, ..., Y_mis,m from f(Y_mis | Y_obs, θ^(k)) and calculate
  h^(k)(θ) = m^{−1} Σ_{j=1}^m l_C(θ | Y_obs, Y_mis,j).
M step: θ^(k+1) = arg max_θ h^(k)(θ).
Choose m to guarantee convergence. Wei & Tanner recommend starting with a small value of m and then increasing m as θ^(k) moves closer to the maximizer.
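A schematic MCEM loop for a scalar parameter might look as follows. This is a sketch only: `sample_mis` and `comp_loglik` are hypothetical user-supplied callables (the sampler would typically be a Gibbs or Metropolis–Hastings routine), and the schedule for growing m is one simple choice among many.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def mcem(theta0, sample_mis, comp_loglik, m0=50, max_iter=100):
    """Monte Carlo EM for a scalar parameter (a sketch).

    sample_mis(theta, m)       -> m draws from f(Y_mis | Y_obs, theta)
    comp_loglik(theta, y_mis)  -> complete-data log-likelihood l_C(theta | Y_obs, y_mis)
    """
    theta, m = float(theta0), m0
    for k in range(max_iter):
        draws = sample_mis(theta, m)                        # MCE step
        h = lambda t: np.mean([comp_loglik(t, y) for y in draws])
        theta = minimize_scalar(lambda t: -h(t)).x          # M step
        m = int(m * 1.2)                                    # increase m as theta^(k) stabilizes
    return theta
```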

15 Bayesian EM 14/27
Suppose we have a prior π(θ) and wish to find the mode of the log posterior = l_O(θ | Y_obs) + log π(θ).
E step: h^(k)(θ) = E{ l_C(θ | Y_obs, Y_mis) + log π(θ) | Y_obs, θ^(k) }
               = E{ l_C(θ | Y_obs, Y_mis) | Y_obs, θ^(k) } + log π(θ)
M step: θ^(k+1) = arg max_θ h^(k)(θ)

16 Bayesian EM: Example 15/27
Observed data: (y_1, y_2, y_3) ~ multinomial(n; 1/2 + θ/4, (1 − θ)/2, θ/4)
Complete data: (x_0, x_1, y_2, y_3) ~ multinomial(n; 1/2, θ/4, (1 − θ)/2, θ/4), where x_0 + x_1 = y_1.
Prior: θ ~ Beta(ν_1, ν_2), i.e. π(θ) = [Γ(ν_1 + ν_2) / (Γ(ν_1) Γ(ν_2))] θ^{ν_1 − 1} (1 − θ)^{ν_2 − 1}.
No prior:
  l_C(θ | x_0, x_1, y_2, y_3) ∝ (x_1 + y_3) log θ + y_2 log(1 − θ)
  E step: ω_12^(k) = E(x_1 | θ^(k), y_1) = θ^(k) y_1 / (θ^(k) + 2)
  M step: θ^(k+1) = (ω_12^(k) + y_3) / (ω_12^(k) + y_2 + y_3)
With prior θ ~ Beta(ν_1, ν_2):
  l_C(θ | x_0, x_1, y_2, y_3) + log π(θ) ∝ (x_1 + y_3 + ν_1 − 1) log θ + (y_2 + ν_2 − 1) log(1 − θ)
  E step: same as in the no-prior case
  M step: θ^(k+1) = (ω_12^(k) + y_3 + ν_1 − 1) / (ω_12^(k) + y_2 + y_3 + ν_1 + ν_2 − 2)
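The two M-step formulas differ only through the prior pseudo-counts, so a single update function covers both cases. The sketch below uses hypothetical illustrative counts (not taken from the slides); setting ν_1 = ν_2 = 1 (a flat prior) recovers the no-prior update.

```python
def em_update(theta, y1, y2, y3, nu1=1.0, nu2=1.0):
    """One Bayesian EM step for the multinomial example; nu1 = nu2 = 1 gives the
    no-prior (maximum likelihood) update."""
    w12 = theta * y1 / (theta + 2)                                   # E step: E(x1 | theta, y1)
    return (w12 + y3 + nu1 - 1) / (w12 + y2 + y3 + nu1 + nu2 - 2)    # M step

# Illustrative run with hypothetical counts:
theta = 0.5
for _ in range(100):
    theta = em_update(theta, y1=125, y2=18, y3=20, nu1=2, nu2=2)
print(round(theta, 4))
```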

17 Generalized EM algorithm 16/27
Generalized EM (GEM):
E step: evaluate h^(k)(θ) as before.
M step: Choose θ^(k+1) such that h^(k)(θ^(k+1)) ≥ h^(k)(θ^(k)) (do not necessarily maximize h^(k)(θ), just increase it).
Note: This retains the ascent property of EM.
EM gradient algorithm (Lange, 1995): a class of GEM.
M step: Do one step of Newton–Raphson:
  θ^(k+1) = θ^(k) + α^(k) d^(k), where d^(k) = −{ ∂²h^(k)(θ)/∂θ∂θ' }^{−1} ∂h^(k)(θ)/∂θ evaluated at θ = θ^(k).
Start with α^(k) = 1 and do step-halving until h^(k)(θ^(k+1)) ≥ h^(k)(θ^(k)).
Lange pointed out that one step of Newton–Raphson saves us from performing iterations within iterations and yet displays the same local rate of convergence as a full EM algorithm that maximizes h^(k)(θ) at each iteration.
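The M step of the EM gradient algorithm can be sketched as below; `h`, `grad_h`, and `hess_h` are hypothetical callables giving the value, gradient, and Hessian of h^(k)(θ) after the current E step, and the step-halving loop enforces the GEM ascent condition.

```python
import numpy as np

def em_gradient_mstep(theta, h, grad_h, hess_h):
    """One GEM/M step of Lange's EM gradient algorithm (a sketch)."""
    theta = np.asarray(theta, dtype=float)
    d = -np.linalg.solve(hess_h(theta), grad_h(theta))   # one Newton-Raphson direction
    alpha, h0 = 1.0, h(theta)
    while h(theta + alpha * d) < h0:                     # step-halving keeps h^(k) non-decreasing
        alpha /= 2
        if alpha < 1e-10:
            return theta                                 # give up; keep the current theta
    return theta + alpha * d
```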

18 ECM algorithm 17/27
EM is unattractive if maximizing the expected complete-data log-likelihood h^(k)(θ) is complicated. In many cases, maximizing h^(k)(θ) is relatively simple when conditioning on some of the parameters being estimated.
The Expectation Conditional Maximization (ECM) algorithm (Meng & Rubin, 1993) replaces a (complicated) M step with a sequence of conditional maximization (CM) steps.
E step: evaluate h^(k)(θ) as before.
CM steps: Partition θ into T parts: θ = (θ_1, ..., θ_T). For t = 1, ..., T, obtain
  θ_t^(k+1) = arg max_{θ_t} h^(k)(θ_1^(k+1), ..., θ_{t−1}^(k+1), θ_t, θ_{t+1}^(k), ..., θ_T^(k)).
Note:
  ECM shares all the appealing convergence properties of EM, such as the ascent property.
  It typically needs more E- and CM-iterations, but can be faster in total computer time.

19 ECM algorithm: example 18/27
Complete data: y_1, ..., y_n ~ gamma(α, β) with density f(y | α, β) = y^{α−1} e^{−y/β} / { β^α Γ(α) }.
Observed data: y_O = a censored version of the complete data.
Complete-data log-likelihood:
  l_C(α, β | y_1, ..., y_n) = (α − 1) Σ_i log y_i − Σ_i y_i / β − n { α log β + log Γ(α) }.
Define ȳ = n^{−1} Σ_i y_i, ḡ = n^{−1} Σ_i log y_i, and ψ(α) = Γ'(α)/Γ(α).
E step:
  ω^(k) = E(ȳ | y_O, α^(k), β^(k))
  τ^(k) = E(ḡ | y_O, α^(k), β^(k))
CM steps:
  Given α^(k): β^(k+1) = ω^(k) / α^(k)
  Given β^(k+1): α^(k+1) = ψ^{−1}( τ^(k) − log β^(k+1) )
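The two CM steps can be coded directly. The sketch below assumes the E-step expectations ω and τ have already been computed (they depend on the specific censoring mechanism), and it inverts the digamma function ψ numerically with a root finder.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def ecm_cm_steps(alpha, omega, tau):
    """CM steps for the censored-gamma example (a sketch).

    omega, tau: E-step quantities E(ybar | y_O, alpha, beta) and E(gbar | y_O, alpha, beta).
    """
    beta_new = omega / alpha                               # CM step 1: given alpha
    # CM step 2: given beta_new, solve psi(alpha) = tau - log(beta_new)
    target = tau - np.log(beta_new)
    alpha_new = brentq(lambda a: digamma(a) - target, 1e-6, 1e6)
    return alpha_new, beta_new
```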

20 ECME algorithm 19/27
The ECM Either (ECME) algorithm (Liu & Rubin, 1994) is a generalization of the ECM algorithm. It replaces some CM-steps of ECM, which maximize the constrained expected complete-data log-likelihood, with steps that maximize the correspondingly constrained observed-data log-likelihood.
E step: evaluate h^(k)(θ) as before.
CM steps: Partition θ into T parts: θ = (θ_1, ..., θ_T). For t = 1, ..., T, obtain either
  θ_t^(k+1) = arg max_{θ_t} h^(k)(θ_1^(k+1), ..., θ_{t−1}^(k+1), θ_t, θ_{t+1}^(k), ..., θ_T^(k))
or
  θ_t^(k+1) = arg max_{θ_t} l_O(θ_1^(k+1), ..., θ_{t−1}^(k+1), θ_t, θ_{t+1}^(k), ..., θ_T^(k) | Y_obs).
Note:
  ECME shares with both EM and ECM their stable monotone convergence and simplicity of implementation.
  It can converge substantially faster than either EM or ECM, measured by either the number of iterations or actual computer time.

21 Example: Mixed-effects model 20/27
For a longitudinal dataset of i = 1, ..., N subjects, each with t = 1, ..., n_i measurements of the response, a simple linear mixed-effects model is given by
  Y_it = X_it β + b_i + ε_it,  b_i ~ N(0, σ_b²),  ε_i ~ N_{n_i}(0, σ_ε² I_{n_i}),  b_i, ε_i independent.
Observed-data log-likelihood:
  l(β, σ_b², σ_ε² | Y_1, ..., Y_N) = Σ_i { −(1/2) (Y_i − X_i β)' Σ_i^{−1} (Y_i − X_i β) − (1/2) log |Σ_i| },
where {Σ_i}_{tt} = σ_b² + σ_ε² and {Σ_i}_{tt'} = σ_b² for t ≠ t'.
In fact, this likelihood can be directly maximized over (β, σ_b², σ_ε²) by using Newton–Raphson or Fisher scoring.
Note: Given (σ_b², σ_ε²), and hence Σ_i, we obtain the β that maximizes the likelihood by solving
  ∂l(β, σ_b², σ_ε² | Y_1, ..., Y_N)/∂β = Σ_i X_i' Σ_i^{−1} (Y_i − X_i β) = 0,
  β = ( Σ_{i=1}^N X_i' Σ_i^{−1} X_i )^{−1} Σ_{i=1}^N X_i' Σ_i^{−1} Y_i.
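This profile step for β is plain generalized least squares and is easy to code; a minimal sketch, assuming the X_i, Y_i, and Σ_i are supplied as lists of NumPy arrays:

```python
import numpy as np

def gls_beta(X_list, Y_list, Sigma_list):
    """Closed-form beta given the Sigma_i (generalized least squares)."""
    p = X_list[0].shape[1]
    A = np.zeros((p, p))
    b = np.zeros(p)
    for X, Y, S in zip(X_list, Y_list, Sigma_list):
        Si = np.linalg.inv(S)
        A += X.T @ Si @ X          # sum_i X_i' Sigma_i^{-1} X_i
        b += X.T @ Si @ Y          # sum_i X_i' Sigma_i^{-1} Y_i
    return np.linalg.solve(A, b)
```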

22 Example: Mixed-effects model (continued) 21/27
Complete-data log-likelihood: the b_i are treated as missing data.
Let ε_i = Y_i − X_i β − b_i. We know that
  (b_i, ε_i)' ~ N( 0, diag(σ_b², σ_ε² I_{n_i}) ),
so
  l_C(β, σ_b², σ_ε² | ε_1, ..., ε_N, b_1, ..., b_N) = Σ_i { −b_i²/(2σ_b²) − (1/2) log σ_b² − ε_i'ε_i/(2σ_ε²) − (n_i/2) log σ_ε² }.
Given the complete data, the parameters that maximize l_C are
  σ_b² = N^{−1} Σ_{i=1}^N b_i²
  σ_ε² = ( Σ_{i=1}^N n_i )^{−1} Σ_{i=1}^N ε_i' ε_i
  β = ( Σ_{i=1}^N X_i' X_i )^{−1} Σ_{i=1}^N X_i' (Y_i − b_i).

23 Example: Mixed-effects model (continued) 22/27
E step: we need to evaluate
  E( b_i² | Y_i, β^(k), σ_b^{2(k)}, σ_ε^{2(k)} ),
  E( ε_i'ε_i | Y_i, β^(k), σ_b^{2(k)}, σ_ε^{2(k)} ),
  E( b_i | Y_i, β^(k), σ_b^{2(k)}, σ_ε^{2(k)} ).
We use the relationship E(b_i² | Y_i) = { E(b_i | Y_i) }² + Var(b_i | Y_i). Thus we need to calculate E(b_i | Y_i) and Var(b_i | Y_i).
Recall the conditional distribution for multivariate normal variables:
  (Y_i, b_i)' ~ N( (X_i β, 0)', [ σ_b² e_{n_i} e_{n_i}' + σ_ε² I_{n_i},  σ_b² e_{n_i} ;  σ_b² e_{n_i}',  σ_b² ] ),  e_{n_i} = (1, 1, ..., 1)'.
Let Σ_i = σ_b² e_{n_i} e_{n_i}' + σ_ε² I_{n_i}. We know that
  E(b_i | Y_i) = 0 + σ_b² e_{n_i}' Σ_i^{−1} (Y_i − X_i β)
  Var(b_i | Y_i) = σ_b² − σ_b² e_{n_i}' Σ_i^{−1} e_{n_i} σ_b².

24 Example: Mixed-effects model (continued) 23/27
Similarly, we use the relationship E(ε_i ε_i' | Y_i) = E(ε_i | Y_i) E(ε_i | Y_i)' + Var(ε_i | Y_i).
We can derive
  (Y_i, ε_i)' ~ N( (X_i β, 0)', [ σ_b² e_{n_i} e_{n_i}' + σ_ε² I_{n_i},  σ_ε² I_{n_i} ;  σ_ε² I_{n_i},  σ_ε² I_{n_i} ] ).
Let Σ_i = σ_b² e_{n_i} e_{n_i}' + σ_ε² I_{n_i}. Then we have
  E(ε_i | Y_i) = 0 + σ_ε² Σ_i^{−1} (Y_i − X_i β)
  Var(ε_i | Y_i) = σ_ε² I_{n_i} − σ_ε⁴ Σ_i^{−1}.
M step of the standard EM algorithm:
  σ_b^{2(k+1)} = N^{−1} Σ_{i=1}^N E( b_i² | Y_i, β^(k), σ_b^{2(k)}, σ_ε^{2(k)} )   (1)
  σ_ε^{2(k+1)} = ( Σ_{i=1}^N n_i )^{−1} Σ_{i=1}^N E( ε_i'ε_i | Y_i, β^(k), σ_b^{2(k)}, σ_ε^{2(k)} )   (2)
  β^(k+1) = ( Σ_{i=1}^N X_i' X_i )^{−1} Σ_{i=1}^N X_i' E( Y_i − b_i | Y_i, β^(k), σ_b^{2(k)}, σ_ε^{2(k)} )   (3)
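Putting the E step (conditional means and variances of b_i and ε_i) and the M step (1)-(3) together, one EM iteration for this random-intercept model can be sketched as follows. This is a minimal illustration assuming the X_i and Y_i are NumPy arrays, and it uses E(ε_i'ε_i | Y_i) = E(ε_i | Y_i)'E(ε_i | Y_i) + tr Var(ε_i | Y_i).

```python
import numpy as np

def em_step_lmm(X_list, Y_list, beta, sb2, se2):
    """One EM step for the random-intercept model (a sketch of updates (1)-(3))."""
    N = len(Y_list)
    p = X_list[0].shape[1]
    sum_b2, sum_ee, sum_n = 0.0, 0.0, 0
    A = np.zeros((p, p))
    b_vec = np.zeros(p)
    for X, Y in zip(X_list, Y_list):
        n_i = len(Y)
        e = np.ones(n_i)
        Sigma = sb2 * np.outer(e, e) + se2 * np.eye(n_i)
        Si = np.linalg.inv(Sigma)
        r = Y - X @ beta
        Eb = sb2 * e @ Si @ r                      # E(b_i | Y_i)
        Vb = sb2 - sb2 * e @ Si @ e * sb2          # Var(b_i | Y_i)
        Ee = se2 * Si @ r                          # E(eps_i | Y_i)
        Ve = se2 * np.eye(n_i) - se2**2 * Si       # Var(eps_i | Y_i)
        sum_b2 += Eb**2 + Vb                       # E(b_i^2 | Y_i)
        sum_ee += Ee @ Ee + np.trace(Ve)           # E(eps_i' eps_i | Y_i)
        sum_n += n_i
        A += X.T @ X
        b_vec += X.T @ (Y - Eb)                    # X_i' E(Y_i - b_i | Y_i)
    return np.linalg.solve(A, b_vec), sum_b2 / N, sum_ee / sum_n
```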

25 ECME algorithm: Mixed-effects model (continued) 24/27
M step of the ECME algorithm:
  Partition the parameter vector (β, σ_b², σ_ε²) into β and (σ_b², σ_ε²).
  First maximize the expected complete-data log-likelihood over (σ_b², σ_ε²), using (1) and (2).
  Given (σ_b^{2(k+1)}, σ_ε^{2(k+1)}), calculate Σ_i^(k+1) = σ_b^{2(k+1)} e_{n_i} e_{n_i}' + σ_ε^{2(k+1)} I_{n_i} and obtain the β that maximizes the observed-data log-likelihood:
  β^(k+1) = ( Σ_{i=1}^N X_i' {Σ_i^(k+1)}^{−1} X_i )^{−1} Σ_{i=1}^N X_i' {Σ_i^(k+1)}^{−1} Y_i.
Convergence is accelerated: in some of ECME's M steps the observed-data likelihood is being conditionally maximized, rather than an approximation to it (the expected complete-data log-likelihood).

26 Accelerated EM 25/27
Aitken's acceleration method (Louis, 1982)
Suppose θ^(k) → θ̂ as k → ∞. Then
  θ̂ = θ^(k) + Σ_{h=0}^∞ [ θ^(k+h+1) − θ^(k+h) ].
Now
  θ^(k+h+2) − θ^(k+h+1) = M(θ^(k+h+1)) − M(θ^(k+h))
                        ≈ J(θ^(k+h)) [ θ^(k+h+1) − θ^(k+h) ]
                        ≈ J(θ^(k)) [ θ^(k+h+1) − θ^(k+h) ]
                        ≈ { J(θ^(k)) }^{h+1} [ θ^(k+1) − θ^(k) ],
where M is the mapping defined by EM and J is the Jacobian of M. Thus
  θ̂ ≈ θ^(k) + Σ_{h=0}^∞ { J(θ^(k)) }^h [ θ^(k+1) − θ^(k) ]
     = θ^(k) + { I − J(θ^(k)) }^{−1} [ θ^(k+1) − θ^(k) ],
by which we can produce the effect of an infinite number of iterations with the following algorithm.

27 Accelerated EM 26/27
The algorithm:
1. From θ^(k), produce θ^(k+1) using EM.
2. Estimate { I − J(θ^(k)) }^{−1} by ( I − Ĵ )^{−1} (see below).
3. Compute the accelerated update θ̃^(k+1) = θ^(k) + ( I − Ĵ )^{−1} [ θ^(k+1) − θ^(k) ].
4. Use θ̃^(k+1) in place of θ^(k) in step 1.
Louis (1982) showed that ( I − Ĵ )^{−1} = I_OC (I_O)^{−1}, where I_OC = E{ −∇²l_C(θ̂ | Y_obs, Y_mis) | Y_obs } and I_O can be obtained by the Louis formula.
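A sketch of this accelerated iteration in code, assuming an `em_step` function for the EM mapping M and a `jacobian_est` callable that supplies an estimate Ĵ of J(θ^(k)) (for example by finite differences of M, or via I_OC and the Louis information as on this slide):

```python
import numpy as np

def aitken_accelerate(em_step, theta0, jacobian_est, max_iter=100, tol=1e-10):
    """Aitken-accelerated EM (a sketch)."""
    theta = np.asarray(theta0, dtype=float)
    p = theta.size
    for _ in range(max_iter):
        theta_em = np.asarray(em_step(theta), dtype=float)     # step 1: one EM update
        J = jacobian_est(theta)                                # step 2: estimate J(theta^(k))
        step = np.linalg.solve(np.eye(p) - J, theta_em - theta)
        theta_new = theta + step                               # step 3: accelerated update
        if np.max(np.abs(theta_new - theta)) < tol:
            return theta_new
        theta = theta_new                                      # step 4: feed back into step 1
    return theta
```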

28 References 27/27
Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39.
Lange, K. (1995). A gradient algorithm locally equivalent to the EM algorithm. Journal of the Royal Statistical Society, Series B, 57.
Liu, C. and Rubin, D. B. (1994). The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika, 81.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44.
Meng, X.-L. and Rubin, D. B. (1991). Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm. Journal of the American Statistical Association, 86.
Meng, X.-L. and Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 80.
Wei, G. C. G. and Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association, 85.
