EM Algorithm II. September 11, 2018
Spencer Fleming
Review EM

Setup: $(Y_{\rm obs}, Y_{\rm mis}) \sim f(y_{\rm obs}, y_{\rm mis} \mid \theta)$; we observe $Y_{\rm obs}$ but not $Y_{\rm mis}$.

Complete-data log-likelihood: $\ell_C(\theta \mid Y_{\rm obs}, Y_{\rm mis}) = \log f(Y_{\rm obs}, Y_{\rm mis} \mid \theta)$

Observed-data log-likelihood: $\ell_O(\theta \mid Y_{\rm obs}) = \log \int f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}$

EM algorithm:
- E step: $h^{(k)}(\theta) = E\{\ell_C(\theta \mid Y_{\rm obs}, Y_{\rm mis}) \mid Y_{\rm obs}, \theta^{(k)}\}$. Note that the integration is with respect to $Y_{\rm mis} \mid Y_{\rm obs}$, so
  $$E\{\ell_C(\theta \mid Y_{\rm obs}, Y_{\rm mis}) \mid Y_{\rm obs}, \theta^{(k)}\} = \int \ell_C(\theta \mid Y_{\rm obs}, y_{\rm mis})\, f(y_{\rm mis} \mid Y_{\rm obs}, \theta^{(k)})\, dy_{\rm mis}.$$
- M step: $\theta^{(k+1)} = \arg\max_\theta h^{(k)}(\theta)$

Ascent property: $\ell_O(\theta^{(k)} \mid Y_{\rm obs})$ is non-decreasing in $k$. If you can calculate it, it is a good idea to monitor it for debugging purposes.

Issues: (1) EM can get trapped in a local maximum; (2) convergence can be slow.
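The E and M steps above can be sketched in a few lines of code. The sketch below is illustrative, not part of the slides: it uses the grouped multinomial (genetic linkage) example that appears later in this lecture, with the classic counts (125, 18, 20, 34) long associated with that example, and monitors the observed-data log-likelihood to check the ascent property.

```python
import math

# Grouped multinomial (genetic linkage): cell probabilities
# (1/2 + t/4, (1-t)/4, (1-t)/4, t/4); classic counts below.
y1, y2, y3, y4 = 125, 18, 20, 34

def l_obs(t):
    """Observed-data log-likelihood, up to an additive constant."""
    return y1 * math.log(2 + t) + (y2 + y3) * math.log(1 - t) + y4 * math.log(t)

theta = 0.5
history = [l_obs(theta)]
for _ in range(200):
    # E step: y1 = x0 + x1 splits cell 1 into probabilities 1/2 and t/4,
    # so x1 | y1 ~ Binomial(y1, t/(t+2)) and E(x1 | y1, t) = y1*t/(t+2).
    x1 = y1 * theta / (theta + 2)
    # M step: maximize (x1+y4) log t + (y2+y3) log(1-t) in closed form.
    theta = (x1 + y4) / (x1 + y2 + y3 + y4)
    history.append(l_obs(theta))  # monitor l_obs for debugging (ascent property)
```

Each iteration increases `l_obs`, and `theta` settles near 0.6268, the MLE for these counts.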
Standard errors for the EM estimates

Numerical approximation of the Hessian matrix. Note: $\ell(\theta)$ denotes the observed-data log-likelihood. We estimate the gradient by central differences,
$$\{\nabla \ell(\theta)\}_i = \frac{\partial \ell(\theta)}{\partial \theta_i} \approx \frac{\ell(\theta + \delta_i e_i) - \ell(\theta - \delta_i e_i)}{2\delta_i},$$
where $e_i$ is a unit vector with 1 in the $i$th element and 0 elsewhere. When calculating derivatives with this formula, generally start with a medium-sized $\delta$ and then repeatedly halve it until the estimated derivative stabilizes.

We can estimate the Hessian by applying the formula twice:
$$\{\nabla^2 \ell(\theta)\}_{ij} \approx \frac{\ell(\theta + \delta_i e_i + \delta_j e_j) - \ell(\theta + \delta_i e_i - \delta_j e_j) - \ell(\theta - \delta_i e_i + \delta_j e_j) + \ell(\theta - \delta_i e_i - \delta_j e_j)}{4\delta_i \delta_j}.$$
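The two difference formulas can be written down directly. A minimal sketch (the helper names are illustrative, and the quadratic test function at the end is made up so that its true derivatives are known):

```python
def num_gradient(l, theta, delta=1e-5):
    """Central-difference estimate of the gradient of l at theta (a list)."""
    g = []
    for i in range(len(theta)):
        up, dn = theta[:], theta[:]
        up[i] += delta
        dn[i] -= delta
        g.append((l(up) - l(dn)) / (2 * delta))
    return g

def num_hessian(l, theta, delta=1e-4):
    """Estimate the Hessian by applying the central-difference formula twice."""
    p = len(theta)
    H = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            vals = []
            for si, sj in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
                t = theta[:]
                t[i] += si * delta   # theta +/- delta_i e_i
                t[j] += sj * delta   # ... +/- delta_j e_j
                vals.append(l(t))
            H[i][j] = (vals[0] - vals[1] - vals[2] + vals[3]) / (4 * delta ** 2)
    return H

# made-up quadratic "log-likelihood" with gradient -(2t1 + t2, t1 + 4t2)
# and constant Hessian [[-2, -1], [-1, -4]]
l = lambda th: -(th[0] ** 2 + th[0] * th[1] + 2 * th[1] ** 2)
G = num_gradient(l, [1.0, 0.0])   # true gradient at (1, 0) is (-2, -1)
H = num_hessian(l, [0.3, -0.2])
```

In practice one would wrap this in the step-halving loop described above, shrinking `delta` until the estimates stabilize.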
Louis estimator (Louis, 1982)

$\ell_C(\theta \mid Y_{\rm obs}, Y_{\rm mis}) = \log f(Y_{\rm obs}, Y_{\rm mis} \mid \theta)$ and $\ell_O(\theta \mid Y_{\rm obs}) = \log \int f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}$. Write $\ell'_C$, $\ell'_O$ for the gradients of $\ell_C$, $\ell_O$, and $\ell''_C$, $\ell''_O$ for their second derivatives. We can prove that

(5) $\ell'_O(\theta \mid Y_{\rm obs}) = E\{\ell'_C(\theta \mid Y_{\rm obs}, Y_{\rm mis}) \mid Y_{\rm obs}\}$

(6) $\ell''_O(\theta \mid Y_{\rm obs}) = E\{\ell''_C(\theta \mid Y_{\rm obs}, Y_{\rm mis}) \mid Y_{\rm obs}\} + E\{[\ell'_C(\theta \mid Y_{\rm obs}, Y_{\rm mis})]^2 \mid Y_{\rm obs}\} - [\ell'_O(\theta \mid Y_{\rm obs})]^2$

(for vector $\theta$, the squares denote outer products such as $\ell'_C \ell_C^{\prime T}$).

MLE: $\hat\theta = \arg\max_\theta \ell_O(\theta \mid Y_{\rm obs})$. Louis variance estimator: $\{-\ell''_O(\theta \mid Y_{\rm obs})\}^{-1}$ evaluated at $\theta = \hat\theta$.

Note: all of the conditional expectations can be computed in the EM algorithm using only $\ell'_C$ and $\ell''_C$, the first and second derivatives of the complete-data log-likelihood. The Louis estimator should be evaluated at the last step of EM, where the term $[\ell'_O(\hat\theta \mid Y_{\rm obs})]^2$ vanishes.
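For a scalar parameter the formula can be checked numerically. The sketch below is illustrative, not from the slides: it applies Louis' formula to the grouped multinomial example used later in the lecture, with the classic counts (125, 18, 20, 34). Here $\ell'_C = (x_1+y_4)/\theta - (y_2+y_3)/(1-\theta)$ and $x_1 \mid Y_{\rm obs} \sim \mathrm{Binomial}(y_1, \theta/(\theta+2))$, so every conditional expectation is closed-form, and the result is compared against a finite-difference second derivative of $\ell_O$.

```python
import math

y1, y2, y3, y4 = 125, 18, 20, 34

# run EM to a tight tolerance; Louis' estimator is evaluated at the last step
theta = 0.5
for _ in range(500):
    x1 = y1 * theta / (theta + 2)
    theta = (x1 + y4) / (x1 + y2 + y3 + y4)

p = theta / (theta + 2)            # x1 | Y_obs ~ Binomial(y1, p)
m = y1 * p                         # E(x1 | Y_obs)
v = y1 * p * (1 - p)               # Var(x1 | Y_obs)

# E{-l''_C | Y_obs}: l''_C = -(x1+y4)/t^2 - (y2+y3)/(1-t)^2 is linear in x1
i_oc = (m + y4) / theta**2 + (y2 + y3) / (1 - theta)**2
# At theta-hat, E{[l'_C]^2 | Y_obs} = Var(l'_C | Y_obs) = v / theta^2,
# since E{l'_C | Y_obs} = l'_O vanishes at the MLE.
i_o = i_oc - v / theta**2          # observed information by identity (6)
louis_var = 1 / i_o

# independent check: numerical -l''_O at theta-hat
l_obs = lambda t: y1 * math.log(2 + t) + (y2 + y3) * math.log(1 - t) + y4 * math.log(t)
d = 1e-4
i_o_num = -(l_obs(theta + d) - 2 * l_obs(theta) + l_obs(theta - d)) / d**2
```

For these counts both routes give an observed information near 377.5, versus a complete information `i_oc` near 435.3, so the Louis variance is larger than the naive complete-data one.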
Proof of (5)

By the definition of $\ell_O(\theta \mid Y_{\rm obs})$,
$$\ell'_O(\theta \mid Y_{\rm obs}) = \frac{\partial}{\partial\theta} \log \int f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis} = \frac{\int \frac{\partial}{\partial\theta} f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}}{\int f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}}. \tag{7}$$
Multiplying and dividing the integrand of the numerator by $f(Y_{\rm obs}, y_{\rm mis} \mid \theta)$ gives (5):
$$\ell'_O(\theta \mid Y_{\rm obs}) = \frac{\int \frac{\partial f(Y_{\rm obs}, y_{\rm mis} \mid \theta)/\partial\theta}{f(Y_{\rm obs}, y_{\rm mis} \mid \theta)}\, f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}}{\int f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}} = \frac{\int \frac{\partial \log f(Y_{\rm obs}, y_{\rm mis} \mid \theta)}{\partial\theta}\, f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}}{\int f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}}$$
$$= \int \ell'_C(\theta \mid Y_{\rm obs}, y_{\rm mis})\, f(y_{\rm mis} \mid Y_{\rm obs}, \theta)\, dy_{\rm mis} = E\{\ell'_C(\theta \mid Y_{\rm obs}, Y_{\rm mis}) \mid Y_{\rm obs}\}.$$
Proof of (6)

We take an additional derivative of $\ell'_O(\theta \mid Y_{\rm obs})$ in expression (7) to obtain
$$\ell''_O(\theta \mid Y_{\rm obs}) = \frac{\int \frac{\partial^2}{\partial\theta^2} f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}}{\int f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}} - \left\{\frac{\int \frac{\partial}{\partial\theta} f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}}{\int f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}}\right\}^2 = \frac{\int \frac{\partial^2}{\partial\theta^2} f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}}{f(Y_{\rm obs} \mid \theta)} - \left\{\ell'_O(\theta \mid Y_{\rm obs})\right\}^2. \tag{8}$$
To see how the first term breaks down, we take an additional derivative of
$$\int \frac{\partial \log f(Y_{\rm obs}, y_{\rm mis} \mid \theta)}{\partial\theta}\, f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis} = \int \frac{\partial}{\partial\theta} f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}$$
to obtain
$$\int \frac{\partial^2}{\partial\theta^2} f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis} = \int \frac{\partial^2 \log f(Y_{\rm obs}, y_{\rm mis} \mid \theta)}{\partial\theta^2}\, f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis} + \int \left[\frac{\partial \log f(Y_{\rm obs}, y_{\rm mis} \mid \theta)}{\partial\theta}\right]^2 f(Y_{\rm obs}, y_{\rm mis} \mid \theta)\, dy_{\rm mis}.$$
Thus the first term in equation (8) equals
$$E\{\ell''_C(\theta \mid Y_{\rm obs}, Y_{\rm mis}) \mid Y_{\rm obs}\} + E\{[\ell'_C(\theta \mid Y_{\rm obs}, Y_{\rm mis})]^2 \mid Y_{\rm obs}\},$$
which combined with (8) gives (6).
Convergence rate of EM

Let $I_C(\theta)$ and $I_O(\theta)$ denote the complete information and the observed information, respectively. One can show that when EM converges, the linear convergence rate, $(\theta^{(k+1)} - \hat\theta)/(\theta^{(k)} - \hat\theta)$, approaches $1 - I_O(\hat\theta)/I_C(\hat\theta)$ (shown later). This means that when the fraction of missing information is small, EM converges quickly; otherwise EM converges slowly.
Towards the SEM algorithm

The EM algorithm does not generate the asymptotic covariance matrix (standard errors) of the parameters as a byproduct. The asymptotic covariance matrix of $\hat\theta$, denoted $V$, can be found as $\{-\ell''_O(\hat\theta \mid Y_{\rm obs})\}^{-1}$. However, the derivatives can be difficult to evaluate directly. In contrast, $\ell''_C(\hat\theta \mid Y_{\rm obs}, Y_{\rm mis})$, and hence $I_{OC} = E\{-\ell''_C(\hat\theta \mid Y_{\rm obs}, Y_{\rm mis}) \mid Y_{\rm obs}\}$, is relatively easy to evaluate.

The Louis estimator of the covariance matrix, i.e. the inverse of
$$-E\{\ell''_C(\hat\theta \mid Y_{\rm obs}, Y_{\rm mis}) \mid Y_{\rm obs}\} - E\{[\ell'_C(\hat\theta \mid Y_{\rm obs}, Y_{\rm mis})]^2 \mid Y_{\rm obs}\} + [\ell'_O(\hat\theta \mid Y_{\rm obs})]^2,$$
requires calculation of the conditional expectation of the square of the complete-data score function, which is specific to each problem. The Supplemented EM (SEM) algorithm (Meng & Rubin, 1991) obtains the covariance matrix using only the code for computing the complete-data covariance matrix, the code for EM itself, and code for standard matrix operations.
Towards the SEM algorithm (continued)

EM defines a mapping $M$: $\theta^{(k+1)} = M(\theta^{(k)})$, where $M(\theta) = (M_1(\theta), \dots, M_p(\theta))$. Let $\{DM\}_{ij} = (\partial M_j(\theta)/\partial\theta_i)\big|_{\theta=\hat\theta}$, a $p \times p$ matrix. We can show that $\theta^{(k+1)} - \hat\theta \approx DM\,(\theta^{(k)} - \hat\theta)$, which means $DM$ is the rate of convergence of EM.

Proof: because $\theta^{(k+1)} = M(\theta^{(k)})$ and $\hat\theta = M(\hat\theta)$, we have $\theta^{(k+1)} - \hat\theta = M(\theta^{(k)}) - M(\hat\theta)$. A Taylor series expansion of the right-hand side about $\hat\theta$ gives $\theta^{(k+1)} - \hat\theta \approx DM\,(\theta^{(k)} - \hat\theta)$.

It has been shown that $V = I_{OC}^{-1}(I - DM)^{-1}$, which means the observed-data asymptotic variance can be obtained by inflating the complete-data asymptotic variance by the factor $(I - DM)^{-1}$.
SEM algorithm

SEM consists of three parts:
1. The evaluation of $I_{OC}$
2. The evaluation of $DM$
3. The evaluation of $V$

Evaluation of $I_{OC} = E\{-\ell''_C(\hat\theta \mid Y_{\rm obs}, Y_{\rm mis}) \mid Y_{\rm obs}\}$:

Example 1 (grouped multinomial): $\ell_C(\theta \mid X) = (x_1 + y_4)\log\theta + (y_2 + y_3)\log(1 - \theta)$.
Example 2 (normal mixtures): $\ell_C(\mu, \sigma, p \mid x, y) = \sum_i \sum_j y_{ij}\{\log p_j + \log\phi(x_i \mid \mu_j, \sigma_j)\}$.
$f(Y_{\rm obs}, Y_{\rm mis})$ in an exponential family: $\ell_C(\theta \mid X) = S(X)^T\eta(\theta) - B(\theta)$.

In each case $\ell_C$, $\ell'_C$ and $\ell''_C$ are linear functions of $x_1$, $\sum_i y_{ij}$ and $S(X)$, the sufficient statistics. Recall that we evaluate $E(\text{sufficient statistics} \mid Y_{\rm obs}, \theta^{(k)})$ at every E step, so we easily obtain $I_{OC}$ by plugging in $E(\text{sufficient statistics} \mid Y_{\rm obs}, \hat\theta)$ at the last E step.
SEM algorithm (continued)

Evaluation of $DM = \{r_{ij}\}$: for a scalar $\theta$, we can use the sequence $\theta^{(k)}$ itself to obtain $DM$. For a vector $\theta$ we cannot, because $\theta_i^{(k+1)} - \hat\theta_i \approx \sum_j DM_{ij}(\theta_j^{(k)} - \hat\theta_j)$ mixes all components. Each $DM_{ij}$ is instead the component-wise rate of convergence of the following forced EM:

1. Run EM to get the MLE $\hat\theta$.
2. Pick a starting point $\theta^{(0)}$ a small distance from $\hat\theta$, differing from $\hat\theta$ in every component.
3. Repeat the following until each $r_{ij}^{(k)}$ is stable:
   (a) Calculate $\theta^{(k)} = M(\theta^{(k-1)})$ using one step of EM.
   (b) For each $i = 1, \dots, p$:
       i. Let $\theta^{(k)}(i) = (\hat\theta_1, \dots, \hat\theta_{i-1}, \theta_i^{(k)}, \hat\theta_{i+1}, \dots, \hat\theta_p)$ (replace the $i$th element of $\hat\theta$ with the $i$th element of $\theta^{(k)}$).
       ii. Perform one step of EM on $\theta^{(k)}(i)$ to obtain $M[\theta^{(k)}(i)]$.
       iii. Obtain $r_{ij}^{(k)} = \{M_j[\theta^{(k)}(i)] - \hat\theta_j\} / \{\theta_i^{(k)} - \hat\theta_i\}$ for $j = 1, \dots, p$.
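For a scalar parameter the three parts collapse to simple arithmetic. The sketch below is illustrative: it runs SEM on the grouped multinomial example of the earlier slide, with the classic counts (125, 18, 20, 34), estimating the scalar $DM$ from the forced-EM ratio $r^{(k)}$ and forming $V = I_{OC}^{-1}(1 - DM)^{-1}$.

```python
y1, y2, y3, y4 = 125, 18, 20, 34

def M(t):
    """One EM step: E step E(x1 | Y_obs, t), then the closed-form M step."""
    x1 = y1 * t / (t + 2)
    return (x1 + y4) / (x1 + y2 + y3 + y4)

# part 0: obtain the MLE at very low tolerance
theta_hat = 0.5
for _ in range(1000):
    theta_hat = M(theta_hat)

# part 1: I_OC = E{-l''_C | Y_obs} at theta-hat (linear in E(x1 | Y_obs))
p = theta_hat / (theta_hat + 2)
m, v = y1 * p, y1 * p * (1 - p)
i_oc = (m + y4) / theta_hat**2 + (y2 + y3) / (1 - theta_hat)**2

# part 2: DM from the forced EM; for scalar theta no component loop is needed
t = theta_hat + 0.05
r_prev, r = None, None
while r_prev is None or abs(r - r_prev) > 1e-6:
    t_next = M(t)
    r_prev, r = r, (t_next - theta_hat) / (t - theta_hat)
    t = t_next
dm = r

# part 3: V = I_OC^{-1} (1 - DM)^{-1}
V = (1 / i_oc) / (1 - dm)
```

Here `dm` lands near $1 - I_O/I_{OC}$ (about 0.13 for these counts), so `V` matches the Louis variance $1/I_O$.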
SEM algorithm (continued)

Note: the MLE $\hat\theta$ should be obtained at very low tolerance (i.e., with a very small $\epsilon$). The final $r_{ij}$ is taken to be the first value of $r_{ij}^{(k)}$ satisfying $|r_{ij}^{(k)} - r_{ij}^{(k-1)}| < \epsilon$, where $k$ can be different for different $(i, j)$.
MCEM algorithm

In the EM algorithm, the analytical integration of the likelihood required for the E step can be difficult. The Monte Carlo EM algorithm (Wei & Tanner, 1990) replaces the analytical integration in the E step with a Monte Carlo integration procedure based on MCMC sampling techniques such as the Gibbs sampler or the Metropolis-Hastings algorithm.

MCE step: simulate a sample $Y_{{\rm mis},1}, \dots, Y_{{\rm mis},m}$ from $f(Y_{\rm mis} \mid Y_{\rm obs}, \theta^{(k)})$ and calculate
$$h^{(k)}(\theta) = m^{-1}\sum_{j=1}^m \ell_C(\theta \mid Y_{\rm obs}, Y_{{\rm mis},j}).$$
M step: $\theta^{(k+1)} = \arg\max_\theta h^{(k)}(\theta)$.

Choose $m$ to guarantee convergence: Wei & Tanner recommend starting with a small value of $m$ and then increasing $m$ as $\theta^{(k)}$ moves closer to the true maximizer.
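A sketch of MCEM on the grouped multinomial example from earlier slides (classic counts 125, 18, 20, 34). This problem does not actually need Monte Carlo, since $E(x_1 \mid Y_{\rm obs}, \theta)$ is available in closed form, so the simulation is purely illustrative: the missing split $x_1$ is drawn from its exact conditional $\mathrm{Binomial}(y_1, \theta/(\theta+2))$ rather than by MCMC, and $m$ grows across iterations as Wei & Tanner recommend.

```python
import random

random.seed(1)
y1, y2, y3, y4 = 125, 18, 20, 34

def draw_binomial(n, p):
    """Simple Binomial(n, p) sampler via Bernoulli draws (stdlib only)."""
    return sum(random.random() < p for _ in range(n))

theta, m = 0.5, 10                    # start with a small Monte Carlo size m
for k in range(40):
    p = theta / (theta + 2)
    # MCE step: average over m simulated completions; l_C is linear in x1,
    # so only the average of the draws matters for the M step.
    x1_bar = sum(draw_binomial(y1, p) for _ in range(m)) / m
    # M step: closed-form maximizer of the averaged complete-data likelihood
    theta = (x1_bar + y4) / (x1_bar + y2 + y3 + y4)
    m = min(m + 50, 500)              # increase m as theta stabilizes
```

With the Monte Carlo size capped at 500, `theta` fluctuates mildly around the deterministic-EM answer (about 0.6268 for these counts); driving the noise to zero requires letting $m$ keep growing.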
Bayesian EM

Suppose we have prior $\pi(\theta)$ and wish to find the mode of the log posterior $\ell_O(\theta \mid Y_{\rm obs}) + \log\pi(\theta)$.

E step: $h^{(k)}(\theta) = E\{\ell_C(\theta \mid Y_{\rm obs}, Y_{\rm mis}) + \log\pi(\theta) \mid Y_{\rm obs}, \theta^{(k)}\} = E\{\ell_C(\theta \mid Y_{\rm obs}, Y_{\rm mis}) \mid Y_{\rm obs}, \theta^{(k)}\} + \log\pi(\theta)$
M step: $\theta^{(k+1)} = \arg\max_\theta h^{(k)}(\theta)$
Bayesian EM: Example

Observed data: $(y_1, y_2, y_3) \sim \text{multinomial}\{n;\ \tfrac12 + \tfrac\theta4,\ \tfrac{1-\theta}2,\ \tfrac\theta4\}$
Complete data: $(x_0, x_1, y_2, y_3) \sim \text{multinomial}\{n;\ \tfrac12,\ \tfrac\theta4,\ \tfrac{1-\theta}2,\ \tfrac\theta4\}$, where $x_0 + x_1 = y_1$.

Prior: $\theta \sim \text{Beta}(\nu_1, \nu_2)$, i.e. $\pi(\theta) = \frac{\Gamma(\nu_1+\nu_2)}{\Gamma(\nu_1)\Gamma(\nu_2)}\,\theta^{\nu_1-1}(1-\theta)^{\nu_2-1}$.

No prior:
- $\ell_C(\theta \mid x_0, x_1, y_2, y_3) \propto (x_1 + y_3)\log\theta + y_2\log(1-\theta)$
- E step: $\omega^{(k)} = E(x_1 \mid \theta^{(k)}, y_1) = \theta^{(k)} y_1/(\theta^{(k)} + 2)$
- M step: $\theta^{(k+1)} = (\omega^{(k)} + y_3)/(\omega^{(k)} + y_2 + y_3)$

With the Beta$(\nu_1, \nu_2)$ prior:
- objective $\propto (x_1 + y_3 + \nu_1 - 1)\log\theta + (y_2 + \nu_2 - 1)\log(1-\theta)$
- E step: same as with no prior
- M step: $\theta^{(k+1)} = (\omega^{(k)} + y_3 + \nu_1 - 1)/(\omega^{(k)} + y_2 + y_3 + \nu_1 + \nu_2 - 2)$
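The two cases differ only in the M-step shifts. A sketch with made-up counts $(y_1, y_2, y_3) = (100, 30, 20)$ (hypothetical, for illustration only):

```python
def posterior_mode(nu1, nu2, y1=100, y2=30, y3=20, iters=200):
    """Bayesian EM for the 3-cell multinomial with a Beta(nu1, nu2) prior.
    nu1 = nu2 = 1 (flat prior) reproduces the no-prior update exactly."""
    theta = 0.5
    for _ in range(iters):
        omega = theta * y1 / (theta + 2)          # E step: E(x1 | theta, y1)
        theta = (omega + y3 + nu1 - 1) / (omega + y2 + y3 + nu1 + nu2 - 2)
    return theta

mle = posterior_mode(1, 1)    # flat prior: identical to the no-prior column
mode = posterior_mode(4, 2)   # Beta(4, 2) pulls the mode toward larger theta
```

Because $\log\pi(\theta)$ does not depend on $Y_{\rm mis}$, the E step is untouched; only the M step changes.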
Generalized EM algorithm

Generalized EM (GEM):
E step: evaluate $h^{(k)}(\theta)$ as before.
M step: choose $\theta^{(k+1)}$ such that $h^{(k)}(\theta^{(k+1)}) \ge h^{(k)}(\theta^{(k)})$ (do not necessarily maximize $h^{(k)}(\theta)$, just increase it).
Note: this retains the ascent property of EM.

EM gradient algorithm (Lange, 1995), a class of GEM:
M step: do one step of Newton-Raphson, $\theta^{(k+1)} = \theta^{(k)} + \alpha^{(k)} d^{(k)}$, where
$$d^{(k)} = -\left\{\frac{\partial^2 h^{(k)}(\theta)}{\partial\theta\,\partial\theta^T}\right\}^{-1}\frac{\partial h^{(k)}(\theta)}{\partial\theta}\Bigg|_{\theta=\theta^{(k)}}.$$
Start with $\alpha^{(k)} = 1$; do step-halving until $h^{(k)}(\theta^{(k+1)}) \ge h^{(k)}(\theta^{(k)})$.

Lange pointed out that one step of Newton-Raphson saves us from performing iterations within iterations, and yet still displays the same local rate of convergence as a full EM algorithm that maximizes $h^{(k)}(\theta)$ at each iteration.
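For a scalar $\theta$ the EM gradient M step is one Newton update of $h^{(k)}$ with step-halving. A sketch on the grouped multinomial example from earlier slides (classic counts), where $h^{(k)}(t) = (\omega + y_4)\log t + (y_2+y_3)\log(1-t)$ with $\omega = E(x_1 \mid Y_{\rm obs}, \theta^{(k)})$; the domain guard in the halving loop is an added safeguard, not part of the slide.

```python
import math

y1, y2, y3, y4 = 125, 18, 20, 34

theta = 0.5
for k in range(100):
    omega = y1 * theta / (theta + 2)            # E step
    a, b = omega + y4, y2 + y3
    h = lambda t: a * math.log(t) + b * math.log(1 - t)
    h1 = a / theta - b / (1 - theta)            # h'(theta)
    h2 = -a / theta**2 - b / (1 - theta)**2     # h''(theta) < 0 (h concave)
    d = -h1 / h2                                # one Newton-Raphson direction
    alpha = 1.0                                 # step-halving until ascent
    while not (0 < theta + alpha * d < 1) or h(theta + alpha * d) < h(theta):
        alpha /= 2
    theta += alpha * d
```

The fixed point satisfies $h'(\theta) = 0$ with $\omega$ evaluated at $\theta$ itself, the same stationarity condition as full EM, so the iterates approach the same MLE (about 0.6268 here).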
ECM algorithm

EM is unattractive if maximizing the expected complete-data log-likelihood $h^{(k)}(\theta)$ is complicated. In many cases, maximizing $h^{(k)}(\theta)$ is relatively simple when conditioning on some of the parameters being estimated. The Expectation Conditional Maximization (ECM) algorithm (Meng & Rubin, 1993) replaces a (complicated) M step with a sequence of conditional maximization (CM) steps.

E step: evaluate $h^{(k)}(\theta)$ as before.
CM steps: partition $\theta$ into $T$ parts, $\theta = (\theta_1, \dots, \theta_T)$. For $t = 1, \dots, T$, obtain
$$\theta_t^{(k+1)} = \arg\max_{\theta_t} h^{(k)}(\theta_1^{(k+1)}, \dots, \theta_{t-1}^{(k+1)}, \theta_t, \theta_{t+1}^{(k)}, \dots, \theta_T^{(k)}).$$

Notes: ECM shares all the appealing convergence properties of EM, such as the ascent property. It typically needs more E and CM iterations, but can be faster in total computer time.
ECM algorithm: example

Complete data: $y_1, \dots, y_n \sim \text{gamma}(\alpha, \beta)$ with density $f(y \mid \alpha, \beta) = \dfrac{y^{\alpha-1} e^{-y/\beta}}{\beta^\alpha\,\Gamma(\alpha)}$.
Observed data: $y_O$, a censored version of the complete data.
Complete-data log-likelihood:
$$\ell_C(\alpha, \beta \mid y_1, \dots, y_n) = (\alpha - 1)\sum_i \log y_i - \sum_i y_i/\beta - n\{\alpha\log\beta + \log\Gamma(\alpha)\}.$$
Define $\bar y = n^{-1}\sum_i y_i$, $\bar g = n^{-1}\sum_i \log y_i$, and $\psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$.

E step: $\omega^{(k)} = E(\bar y \mid y_O, \alpha^{(k)}, \beta^{(k)})$ and $\tau^{(k)} = E(\bar g \mid y_O, \alpha^{(k)}, \beta^{(k)})$.
CM steps: given $\alpha^{(k)}$, set $\beta^{(k+1)} = \omega^{(k)}/\alpha^{(k)}$; given $\beta^{(k+1)}$, set $\alpha^{(k+1)} = \psi^{-1}(\tau^{(k)} - \log\beta^{(k+1)})$.
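A sketch of the two CM steps on fully observed gamma data, so the E step is trivial and $\omega$, $\tau$ are just $\bar y$ and $\bar g$ (handling actual censoring would require the conditional expectations). The digamma $\psi$ and its inverse are not in the Python standard library, so they are approximated numerically here via a finite difference of `math.lgamma` and a bisection; the data are simulated with made-up parameters $\alpha = 2$, $\beta = 3$.

```python
import math
import random

random.seed(7)
alpha_true, beta_true = 2.0, 3.0
y = [random.gammavariate(alpha_true, beta_true) for _ in range(2000)]
n = len(y)
ybar = sum(y) / n                        # plays the role of omega (no censoring)
gbar = sum(math.log(v) for v in y) / n   # plays the role of tau

def psi(a, h=1e-5):
    """digamma(a), approximated by a central difference of log-gamma."""
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def psi_inv(x, lo=1e-6, hi=100.0):
    """Invert psi by bisection; psi is strictly increasing on (0, inf)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if psi(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha, beta = 1.0, ybar
for _ in range(100):
    beta = ybar / alpha                      # CM step 1: given alpha
    alpha = psi_inv(gbar - math.log(beta))   # CM step 2: given beta
```

At convergence the pair satisfies both stationarity conditions, $\beta = \bar y/\alpha$ and $\psi(\alpha) = \bar g - \log\beta$, which are exactly the gamma MLE equations.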
ECME algorithm

The ECM Either (ECME) algorithm (Liu & Rubin, 1994) is a generalization of the ECM algorithm. It replaces some CM steps of ECM, which maximize the constrained expected complete-data log-likelihood, with steps that maximize the correspondingly constrained observed-data log-likelihood.

E step: evaluate $h^{(k)}(\theta)$ as before.
CM steps: partition $\theta$ into $T$ parts, $\theta = (\theta_1, \dots, \theta_T)$. For $t = 1, \dots, T$, obtain either
$$\theta_t^{(k+1)} = \arg\max_{\theta_t} h^{(k)}(\theta_1^{(k+1)}, \dots, \theta_{t-1}^{(k+1)}, \theta_t, \theta_{t+1}^{(k)}, \dots, \theta_T^{(k)})$$
or
$$\theta_t^{(k+1)} = \arg\max_{\theta_t} \ell_O(\theta_1^{(k+1)}, \dots, \theta_{t-1}^{(k+1)}, \theta_t, \theta_{t+1}^{(k)}, \dots, \theta_T^{(k)} \mid Y_{\rm obs}).$$

Notes: ECME shares with both EM and ECM their stable monotone convergence and simplicity of implementation, and it converges substantially faster than either EM or ECM, measured by either the number of iterations or actual computer time.
Example: Mixed-effects model

For a longitudinal dataset of $i = 1, \dots, N$ subjects, each with $t = 1, \dots, n_i$ measurements of the response, a simple linear mixed-effects model is
$$Y_{it} = X_{it}^T\beta + b_i + \epsilon_{it}, \qquad b_i \sim N(0, \sigma_b^2), \quad \epsilon_i \sim N_{n_i}(0, \sigma_\epsilon^2 I_{n_i}), \quad b_i, \epsilon_i \text{ independent},$$
where $X_i$ stacks the covariate rows $X_{it}^T$. The observed-data log-likelihood is
$$\ell(\beta, \sigma_b^2, \sigma_\epsilon^2 \mid Y_1, \dots, Y_N) = \sum_i\left\{-\tfrac12(Y_i - X_i\beta)^T\Sigma_i^{-1}(Y_i - X_i\beta) - \tfrac12\log|\Sigma_i|\right\}$$
(up to an additive constant), where $\{\Sigma_i\}_{tt} = \sigma_b^2 + \sigma_\epsilon^2$ and $\{\Sigma_i\}_{tt'} = \sigma_b^2$ for $t \ne t'$.

In fact, this likelihood can be directly maximized over $(\beta, \sigma_b^2, \sigma_\epsilon^2)$ by using Newton-Raphson or Fisher scoring. Note: given $(\sigma_b^2, \sigma_\epsilon^2)$, and hence $\Sigma_i$, the $\beta$ that maximizes the likelihood solves
$$\frac{\partial\ell}{\partial\beta} = \sum_i X_i^T\Sigma_i^{-1}(Y_i - X_i\beta) = 0 \quad\Longrightarrow\quad \beta = \left(\sum_{i=1}^N X_i^T\Sigma_i^{-1}X_i\right)^{-1}\sum_{i=1}^N X_i^T\Sigma_i^{-1}Y_i.$$
Example: Mixed-effects model (continued)

Complete-data log-likelihood: the $b_i$ are treated as missing data. Let $\epsilon_i = Y_i - X_i\beta - b_i e_{n_i}$, where $e_{n_i} = (1, \dots, 1)^T$. We know that
$$\begin{pmatrix} b_i \\ \epsilon_i \end{pmatrix} \sim N\left\{\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_b^2 & 0 \\ 0 & \sigma_\epsilon^2 I_{n_i} \end{pmatrix}\right\},$$
so
$$\ell_C(\beta, \sigma_b^2, \sigma_\epsilon^2 \mid \epsilon_1, \dots, \epsilon_N, b_1, \dots, b_N) \propto \sum_i\left\{-\frac{b_i^2}{2\sigma_b^2} - \frac12\log\sigma_b^2 - \frac{\epsilon_i^T\epsilon_i}{2\sigma_\epsilon^2} - \frac{n_i}2\log\sigma_\epsilon^2\right\}.$$
Given the complete data, the maximizing parameters are
$$\sigma_b^2 = N^{-1}\sum_{i=1}^N b_i^2, \qquad \sigma_\epsilon^2 = \Big(\sum_{i=1}^N n_i\Big)^{-1}\sum_{i=1}^N \epsilon_i^T\epsilon_i, \qquad \beta = \Big(\sum_{i=1}^N X_i^TX_i\Big)^{-1}\sum_{i=1}^N X_i^T(Y_i - b_i e_{n_i}).$$
Example: Mixed-effects model (continued)

E step: evaluate $E(b_i^2 \mid Y_i, \beta^{(k)}, \sigma_b^{2(k)}, \sigma_\epsilon^{2(k)})$, $E(\epsilon_i\epsilon_i^T \mid Y_i, \beta^{(k)}, \sigma_b^{2(k)}, \sigma_\epsilon^{2(k)})$ and $E(b_i \mid Y_i, \beta^{(k)}, \sigma_b^{2(k)}, \sigma_\epsilon^{2(k)})$.

We use the relationship $E(b_i^2 \mid Y_i) = \{E(b_i \mid Y_i)\}^2 + \mathrm{Var}(b_i \mid Y_i)$, so we need $E(b_i \mid Y_i)$ and $\mathrm{Var}(b_i \mid Y_i)$. Recall the conditional distribution for multivariate normal variables:
$$\begin{pmatrix} Y_i \\ b_i \end{pmatrix} \sim N\left\{\begin{pmatrix} X_i\beta \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_b^2 e_{n_i}e_{n_i}^T + \sigma_\epsilon^2 I_{n_i} & \sigma_b^2 e_{n_i} \\ \sigma_b^2 e_{n_i}^T & \sigma_b^2 \end{pmatrix}\right\}, \qquad e_{n_i} = (1, 1, \dots, 1)^T.$$
Let $\Sigma_i = \sigma_b^2 e_{n_i}e_{n_i}^T + \sigma_\epsilon^2 I_{n_i}$. We know that
$$E(b_i \mid Y_i) = 0 + \sigma_b^2 e_{n_i}^T\Sigma_i^{-1}(Y_i - X_i\beta), \qquad \mathrm{Var}(b_i \mid Y_i) = \sigma_b^2 - \sigma_b^2 e_{n_i}^T\Sigma_i^{-1}e_{n_i}\sigma_b^2.$$
Example: Mixed-effects model (continued)

Similarly, we use the relationship $E(\epsilon_i\epsilon_i^T \mid Y_i) = E(\epsilon_i \mid Y_i)E(\epsilon_i \mid Y_i)^T + \mathrm{Var}(\epsilon_i \mid Y_i)$. We can derive
$$\begin{pmatrix} Y_i \\ \epsilon_i \end{pmatrix} \sim N\left\{\begin{pmatrix} X_i\beta \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_b^2 e_{n_i}e_{n_i}^T + \sigma_\epsilon^2 I_{n_i} & \sigma_\epsilon^2 I_{n_i} \\ \sigma_\epsilon^2 I_{n_i} & \sigma_\epsilon^2 I_{n_i} \end{pmatrix}\right\}.$$
With $\Sigma_i = \sigma_b^2 e_{n_i}e_{n_i}^T + \sigma_\epsilon^2 I_{n_i}$, we then have
$$E(\epsilon_i \mid Y_i) = 0 + \sigma_\epsilon^2\Sigma_i^{-1}(Y_i - X_i\beta), \qquad \mathrm{Var}(\epsilon_i \mid Y_i) = \sigma_\epsilon^2 I_{n_i} - \sigma_\epsilon^4\Sigma_i^{-1}.$$

M step of the standard EM algorithm:
$$\sigma_b^{2(k+1)} = N^{-1}\sum_{i=1}^N E(b_i^2 \mid Y_i, \beta^{(k)}, \sigma_b^{2(k)}, \sigma_\epsilon^{2(k)}) \tag{1}$$
$$\sigma_\epsilon^{2(k+1)} = \Big(\sum_{i=1}^N n_i\Big)^{-1}\sum_{i=1}^N E(\epsilon_i^T\epsilon_i \mid Y_i, \beta^{(k)}, \sigma_b^{2(k)}, \sigma_\epsilon^{2(k)}) \tag{2}$$
$$\beta^{(k+1)} = \Big(\sum_{i=1}^N X_i^TX_i\Big)^{-1}\sum_{i=1}^N X_i^T E(Y_i - b_i e_{n_i} \mid Y_i, \beta^{(k)}, \sigma_b^{2(k)}, \sigma_\epsilon^{2(k)}), \tag{3}$$
where $E(\epsilon_i^T\epsilon_i \mid Y_i) = \mathrm{tr}\{E(\epsilon_i\epsilon_i^T \mid Y_i)\}$.
ECME algorithm: Mixed-effects model (continued)

M step of the ECME algorithm: partition the parameter vector $(\beta, \sigma_b^2, \sigma_\epsilon^2)$ into $\beta$ and $(\sigma_b^2, \sigma_\epsilon^2)$.
- First maximize the expected complete-data log-likelihood over $(\sigma_b^2, \sigma_\epsilon^2)$, given by (1) and (2).
- Given $(\sigma_b^{2(k+1)}, \sigma_\epsilon^{2(k+1)})$, calculate $\Sigma_i^{(k+1)} = \sigma_b^{2(k+1)}e_{n_i}e_{n_i}^T + \sigma_\epsilon^{2(k+1)}I_{n_i}$ and obtain the $\beta$ that maximizes the observed-data log-likelihood,
$$\beta^{(k+1)} = \left[\sum_{i=1}^N X_i^T\{\Sigma_i^{(k+1)}\}^{-1}X_i\right]^{-1}\sum_{i=1}^N X_i^T\{\Sigma_i^{(k+1)}\}^{-1}Y_i.$$

Convergence is accelerated: in some of ECME's M steps, the observed-data likelihood is being conditionally maximized, rather than an approximation to it (the expected complete-data log-likelihood).
Accelerated EM

Aitken's acceleration method (Louis, 1982). Suppose $\theta^{(k)} \to \hat\theta$ as $k \to \infty$. Then
$$\hat\theta = \theta^{(k)} + \sum_{h=0}^\infty\left[\theta^{(k+h+1)} - \theta^{(k+h)}\right].$$
Now
$$\theta^{(k+h+2)} - \theta^{(k+h+1)} = M(\theta^{(k+h+1)}) - M(\theta^{(k+h)}) \approx J(\theta^{(k+h)})\left[\theta^{(k+h+1)} - \theta^{(k+h)}\right] \approx J(\theta^{(k)})\left[\theta^{(k+h+1)} - \theta^{(k+h)}\right] \approx \left\{J(\theta^{(k)})\right\}^{h+1}\left[\theta^{(k+1)} - \theta^{(k)}\right],$$
where $M$ is the mapping defined by EM and $J$ is the Jacobian of $M$. Thus
$$\hat\theta \approx \theta^{(k)} + \sum_{h=0}^\infty\left\{J(\theta^{(k)})\right\}^h\left[\theta^{(k+1)} - \theta^{(k)}\right] = \theta^{(k)} + \left\{I - J(\theta^{(k)})\right\}^{-1}\left[\theta^{(k+1)} - \theta^{(k)}\right],$$
by which we can produce the effect of an infinite number of iterations with the following algorithm.
Accelerated EM (continued)

The algorithm:
1. From $\theta^{(k)}$, produce $\theta^{(k+1)}$ using EM.
2. Estimate $\{I - J(\theta^{(k)})\}^{-1}$ by $(I - \hat J)^{-1}$ (see below).
3. Compute the accelerated update $\tilde\theta^{(k+1)} = \theta^{(k)} + (I - \hat J)^{-1}\left[\theta^{(k+1)} - \theta^{(k)}\right]$.
4. Use $\tilde\theta^{(k+1)}$ in step 1.

Louis (1982) showed that $(I - \hat J)^{-1} = I_{OC}(I_O)^{-1}$, where $I_{OC} = E\{-\ell''_C(\hat\theta \mid Y_{\rm obs}, Y_{\rm mis}) \mid Y_{\rm obs}\}$ and $I_O$ can be obtained by the Louis formula.
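For a scalar parameter, $\hat J$ can also be estimated from successive EM steps as $(M(M(\theta)) - M(\theta))/(M(\theta) - \theta)$, rather than from Louis' identity $I_{OC}(I_O)^{-1}$; this substitution is an illustrative simplification, not the slide's recipe. The sketch below accelerates the grouped multinomial EM from earlier slides with the classic counts.

```python
y1, y2, y3, y4 = 125, 18, 20, 34

def M(t):
    """One EM step for the grouped multinomial example."""
    x1 = y1 * t / (t + 2)
    return (x1 + y4) / (x1 + y2 + y3 + y4)

theta = 0.2
for _ in range(25):
    t1 = M(theta)                                 # step 1: one ordinary EM step
    if abs(t1 - theta) < 1e-12:
        break                                     # converged; avoid 0/0 below
    t2 = M(t1)
    J_hat = (t2 - t1) / (t1 - theta)              # step 2: scalar Jacobian estimate
    theta = theta + (t1 - theta) / (1 - J_hat)    # step 3: accelerated update
```

Each accelerated update costs two EM steps but converges far faster than plain EM; a handful of updates reach machine precision here.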
References

Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society B, 39, 1-38.
Lange, K. (1995). A gradient algorithm locally equivalent to the EM algorithm. Journal of the Royal Statistical Society B, 57.
Liu, C. and Rubin, D.B. (1994). The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika, 81.
Louis, T.A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society B, 44, 226-233.
Meng, X.-L. and Rubin, D.B. (1991). Using EM to obtain asymptotic variance-covariance matrices: The SEM algorithm. Journal of the American Statistical Association, 86, 899-909.
Meng, X.-L. and Rubin, D.B. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika, 80, 267-278.
Wei, G.C.G. and Tanner, M.A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association, 85, 699-704.
Genet. Sel. Evol. 33 001) 443 45 443 INRA, EDP Sciences, 001 Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Louis Alberto GARCÍA-CORTÉS a, Daniel SORENSEN b, Note a
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random
More informationStat 710: Mathematical Statistics Lecture 12
Stat 710: Mathematical Statistics Lecture 12 Jun Shao Department of Statistics University of Wisconsin Madison, WI 53706, USA Jun Shao (UW-Madison) Stat 710, Lecture 12 Feb 18, 2009 1 / 11 Lecture 12:
More informationIntroduction An approximated EM algorithm Simulation studies Discussion
1 / 33 An Approximated Expectation-Maximization Algorithm for Analysis of Data with Missing Values Gong Tang Department of Biostatistics, GSPH University of Pittsburgh NISS Workshop on Nonignorable Nonresponse
More informationEconometrics I, Estimation
Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the
More informationPosterior Simulation via the Signed Root Log-Likelihood Ratio
Bayesian Analysis (2010) 5, Number 4, pp. 787 816 Posterior Simulation via the Signed Root Log-Likelihood Ratio S. A. Kharroubi and T. J. Sweeting Abstract. We explore the use of importance sampling based
More informationMarkov Chain Monte Carlo
Markov Chain Monte Carlo Recall: To compute the expectation E ( h(y ) ) we use the approximation E(h(Y )) 1 n n h(y ) t=1 with Y (1),..., Y (n) h(y). Thus our aim is to sample Y (1),..., Y (n) from f(y).
More informationTesting for a Global Maximum of the Likelihood
Testing for a Global Maximum of the Likelihood Christophe Biernacki When several roots to the likelihood equation exist, the root corresponding to the global maximizer of the likelihood is generally retained
More informationSTATISTICAL INFERENCE WITH DATA AUGMENTATION AND PARAMETER EXPANSION
STATISTICAL INFERENCE WITH arxiv:1512.00847v1 [math.st] 2 Dec 2015 DATA AUGMENTATION AND PARAMETER EXPANSION Yannis G. Yatracos Faculty of Communication and Media Studies Cyprus University of Technology
More information1 One-way analysis of variance
LIST OF FORMULAS (Version from 21. November 2014) STK2120 1 One-way analysis of variance Assume X ij = µ+α i +ɛ ij ; j = 1, 2,..., J i ; i = 1, 2,..., I ; where ɛ ij -s are independent and N(0, σ 2 ) distributed.
More informationLOGISTIC REGRESSION Joseph M. Hilbe
LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of
More informationComparison between conditional and marginal maximum likelihood for a class of item response models
(1/24) Comparison between conditional and marginal maximum likelihood for a class of item response models Francesco Bartolucci, University of Perugia (IT) Silvia Bacci, University of Perugia (IT) Claudia
More informationBayesian inference for factor scores
Bayesian inference for factor scores Murray Aitkin and Irit Aitkin School of Mathematics and Statistics University of Newcastle UK October, 3 Abstract Bayesian inference for the parameters of the factor
More informationDefault Priors and Effcient Posterior Computation in Bayesian
Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature
More informationA Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java
IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java To cite this
More informationBayesian inference for multivariate extreme value distributions
Bayesian inference for multivariate extreme value distributions Sebastian Engelke Clément Dombry, Marco Oesting Toronto, Fields Institute, May 4th, 2016 Main motivation For a parametric model Z F θ of
More informationFractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling
Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction
More informationLinear Mixed Models for Longitudinal Data with Nonrandom Dropouts
Journal of Data Science 4(2006), 447-460 Linear Mixed Models for Longitudinal Data with Nonrandom Dropouts Ahmed M. Gad and Noha A. Youssif Cairo University Abstract: Longitudinal studies represent one
More informationOn the Geometry of EM algorithms
On the Geometry of EM algorithms David R. Hunter Technical report no. 0303 Department of Statistics Penn State University University Park, PA 16802-2111 email: dhunter@stat.psu.edu phone: (814) 863-0979
More informationMAXIMUM LIKELIHOOD INFERENCE IN ROBUST LINEAR MIXED-EFFECTS MODELS USING MULTIVARIATE t DISTRIBUTIONS
Statistica Sinica 17(2007), 929-943 MAXIMUM LIKELIHOOD INFERENCE IN ROBUST LINEAR MIXED-EFFECTS MODELS USING MULTIVARIATE t DISTRIBUTIONS Peter X.-K. Song 1, Peng Zhang 2 and Annie Qu 3 1 University of
More informationStat 451 Lecture Notes Numerical Integration
Stat 451 Lecture Notes 03 12 Numerical Integration Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapter 5 in Givens & Hoeting, and Chapters 4 & 18 of Lange 2 Updated: February 11, 2016 1 / 29
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More informationFitting Narrow Emission Lines in X-ray Spectra
Outline Fitting Narrow Emission Lines in X-ray Spectra Taeyoung Park Department of Statistics, University of Pittsburgh October 11, 2007 Outline of Presentation Outline This talk has three components:
More informationProbabilistic Graphical Models
2016 Robert Nowak Probabilistic Graphical Models 1 Introduction We have focused mainly on linear models for signals, in particular the subspace model x = Uθ, where U is a n k matrix and θ R k is a vector
More informationLecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models
Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance
More informationOn Markov chain Monte Carlo methods for tall data
On Markov chain Monte Carlo methods for tall data Remi Bardenet, Arnaud Doucet, Chris Holmes Paper review by: David Carlson October 29, 2016 Introduction Many data sets in machine learning and computational
More informationLecture 4: Types of errors. Bayesian regression models. Logistic regression
Lecture 4: Types of errors. Bayesian regression models. Logistic regression A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting more generally COMP-652 and ECSE-68, Lecture
More informationStochastic approximation EM algorithm in nonlinear mixed effects model for viral load decrease during anti-hiv treatment
Stochastic approximation EM algorithm in nonlinear mixed effects model for viral load decrease during anti-hiv treatment Adeline Samson 1, Marc Lavielle and France Mentré 1 1 INSERM E0357, Department of
More informationFoundations of Statistical Inference
Foundations of Statistical Inference Julien Berestycki Department of Statistics University of Oxford MT 2016 Julien Berestycki (University of Oxford) SB2a MT 2016 1 / 32 Lecture 14 : Variational Bayes
More informationNormalising constants and maximum likelihood inference
Normalising constants and maximum likelihood inference Jakob G. Rasmussen Department of Mathematics Aalborg University Denmark March 9, 2011 1/14 Today Normalising constants Approximation of normalising
More informationGibbs Sampling in Endogenous Variables Models
Gibbs Sampling in Endogenous Variables Models Econ 690 Purdue University Outline 1 Motivation 2 Identification Issues 3 Posterior Simulation #1 4 Posterior Simulation #2 Motivation In this lecture we take
More informationMaximum Likelihood Estimation
Maximum Likelihood Estimation Assume X P θ, θ Θ, with joint pdf (or pmf) f(x θ). Suppose we observe X = x. The Likelihood function is L(θ x) = f(x θ) as a function of θ (with the data x held fixed). The
More information1. Fisher Information
1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score
More informationTHE NESTED DIRICHLET DISTRIBUTION AND INCOMPLETE CATEGORICAL DATA ANALYSIS
Statistica Sinica 19 (2009), 251-271 THE NESTED DIRICHLET DISTRIBUTION AND INCOMPLETE CATEGORICAL DATA ANALYSIS Kai Wang Ng 1, Man-Lai Tang 2, Guo-Liang Tian 1 and Ming Tan 3 1 The University of Hong Kong,
More informationOn Bayesian Computation
On Bayesian Computation Michael I. Jordan with Elaine Angelino, Maxim Rabinovich, Martin Wainwright and Yun Yang Previous Work: Information Constraints on Inference Minimize the minimax risk under constraints
More informationNew Bayesian methods for model comparison
Back to the future New Bayesian methods for model comparison Murray Aitkin murray.aitkin@unimelb.edu.au Department of Mathematics and Statistics The University of Melbourne Australia Bayesian Model Comparison
More informationAn Introduction to mixture models
An Introduction to mixture models by Franck Picard Research Report No. 7 March 2007 Statistics for Systems Biology Group Jouy-en-Josas/Paris/Evry, France http://genome.jouy.inra.fr/ssb/ An introduction
More informationSTAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.
STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review
More information6 Pattern Mixture Models
6 Pattern Mixture Models A common theme underlying the methods we have discussed so far is that interest focuses on making inference on parameters in a parametric or semiparametric model for the full data
More information