Label Switching and Its Simple Solutions for Frequentist Mixture Models

Size: px
Start display at page:

Download "Label Switching and Its Simple Solutions for Frequentist Mixture Models"

Transcription

1 Label Switching and Its Simple Solutions for Frequentist Mixture Models Weixin Yao Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. Abstract The label switching problem for Bayesian mixtures has been extensively researched in recent years. However, there is much less attention on the label switching issue for frequentist mixture models. In this article, we discuss the label switching problem and the importance of solving it for frequentist mixture models when using simulation study or bootstrap to evaluate the performance of mixture model estimators. We argue that many existing labeling methods for Bayesian mixtures can not be simply applied to frequentist mixture models. Two new simple but effective labeling methods are proposed for frequentist mixture models. The new labeling methods can incorporate the information of component labels of each sample, which is available for the simulation study or parametric bootstrap for frequentist mixture models. Our empirical studies demonstrate that the new proposed methods work well and provide better results than the traditionally used order constraint labeling. In addition, the simulation studies also demonstrate that the simple order constraint labeling can sometimes lead to sever biased and even meaningless estimates, and thus might provide misleading estimated variation. Key words: Complete likelihood; Label switching; Mixture models; 1

2 1 Introduction Label switching has long been known to be a big challenging problem for Bayesian mixture modeling. It occurs due to the invariance of mixture likelihood to the permutations of component labels. Many methods have been proposed to solve the label switching for Bayesian mixtures. See, for example, Stephens (2000), Celeux, Hurn, and Robert (2000), Chung, Loken, and Schafer (2004), Geweke (2007), Yao and Lindsay (2009), Grün and Leisch (2009), Sperrin, Jaki, and Wit (2010), and Papastamoulis and Iliopoulos (2010). However, there is much less attention on the label switching issue for frequentist mixture models. (As far as we know, all the label switching paper by far are devoted to the Bayesian mixture models.) One of the reasons is that the label switching issue is more obvious and sever for Bayesian mixture analysis. In this article, we will discuss the label switching problem and the importance of solving it for frequentist mixture models when using simulation study or bootstrap to evaluate the performance of mixture model estimators. We argue that many existing labeling methods for Bayesian mixtures can not be simply applied to frequentist mixture models. Two new simple but effective labeling methods are then proposed for frequentist mixture models. Let x = (x 1,, x n ) be independent identically distributed (iid) observations from a mixture density with m (m is assumed to be known and finite) components: p(x; θ) = π 1 f(x; λ 1 ) + π 2 f(x; λ 2 ) + + π m f(x; λ m ), (1.1) where θ = (π 1,..., π m 1, λ 1,..., λ m ), π j likelihood function for x = (x 1,, x n ) is > 0 for all js, and m j=1 π j = 1. Then the L(θ; x) = n {π 1 f(x i ; λ 1 ) + π 2 f(x i ; λ 2 ) + + π m f(x i ; λ m )}. (1.2) i=1 2

3 The maximum likelihood estimator (MLE) of θ, by maximizing (1.2), is straightforward using the EM algorithm (Dempster et al. 1977). For a general introduction to mixture models, see, for example, Lindsay (1995), Böhning (1999), McLachlan and Peel (2000), and Frühwirth-Schnatter (2006). For any permutation ω = (ω(1),..., ω(m)) of the identity permutation (1,..., m), define the corresponding permutation of the parameter vector θ by θ ω = (πω(1),..., πω(m 1), λω(1),..., λω(m)). A special feature of mixture model is that the likelihood function L(θ ω ; x) is numerically the same as L(θ; x) for any permutation ω. Hence if ˆθ is the MLE, ˆθ ω is the MLE for any permutation ω. If one is only interested in a point estimator, the MLE suffices for the purpose without any label switching problem. (Note that for Bayesian mixtures, the label switching needs to be solved even for a point estimator.) However, if one wants to use a simulation study or bootstrap approach to evaluate the variation of the MLE for mixture models, the label switching problem will similarly occur. Given a sequence of raw unlabeled estimates (ˆθ 1,..., ˆθ N ) of θ, in order to measure their variation, one must first label these samples, i.e., find the labels (ω 1,..., ω N ) such that (ˆθ ω 1 1,..., ˆθ ω N N ) have the same label meaning. Without correct labels, the estimates tend to have serious bias and the estimated variation might be also misleading. Theoretically, one may also estimate the variation of parameter estimates by their asymptotic covariance matrix and compute them by inverting the observed or expected information matrix at the MLE. In practice, however, this may be tedious analytically or computationally, although there have been many computation simpler methods proposed to estimate the observed information matrix. See, for example, Louis (1982), Meilijson (1989), Meng and Rubin (1991), and McLachlan and Krishnan (1997, Sect. 3

4 4.5). However, it is well known that the estimates of the covariance matrix of the MLE based on the expected or observed information matrices are guaranteed to be valid inferentially only asymptotically. Basford, Greenway, McLachlan, and Peel (1997) compared the bootstrap and information-based approaches for some normal mixture models and found that unless the sample size was very large, the standard errors obtained by an information-based approach were too unstable to be recommended. Therefore, the bootstrap approach is usually preferred and thus solving labeling switching is crucial. One of the most commonly used solutions to label switching is to simply put an explicit parameter constraint on all the estimates so that only one permutation can satisfy it for each estimate. One main problem with the order constraint labeling is that it can only use the label information of one component parameter at a time, which is not desirable when many component parameters can simultaneously provide label information. Another main problem with identifiability constraint labeling is the choice of constraint, especially for multivariate problems. Different order constraints may generate markedly different results; it is difficult to anticipate the overall effect. In addition, as demonstrated by Stephens (2000), for Bayesian mixtures, many choices of identifiability constraint do not completely remove the symmetry of the posterior distribution. As a result, label switching problem may remain after imposing an identifiability constraint. Therefore, it is expected that many choices of identifiability constraint don t work well for frequentist mixture models, either, which is also verified by our simulation studies in Section 3. Although, many other labeling methods have been proposed for Bayesian mixture models. However, many of them depend on the special structure of Bayesian mixture models and cannot be directly applied to frequentist mixture models. For example, the popular relabeling algorithm (Stephens, 2000) based on Kullback-Liebler divergence needs to calculate the classification probabilities for the same set of observations for all 4

5 sampled parameters. However, in frequency simulation study, each parameter estimate is usually estimated based on different generated sets of observations. The maximum a posterior (MAP) (Marin, Mengersen, and Robert, 2005) and posterior modes associated labeling (Yao and Lindsay, 2009) both depend on the special structure of posterior distribution. There are some exceptions, though, such as the normal likelihood based clustering method of Yao and Lindsay (2009) and the method of data-dependent priors (Chung, Loken, and Schafer, 2004), although it requires more research about how to apply the latter method when the data is not univariate. In this article, we will propose two simple but effective labeling methods for frequentist mixture models. In a simulation study or parametric bootstrap, the component labels for each observation are known. The proposed labeling methods try to make use of this valuable component labels information. The first method is to do labeling by maximizing the complete likelihood. The second method is to do labeling by minimizing the Euclidean distance between the classification probabilities and the latent true labels. Our empirical studies demonstrate that the new proposed methods work well and provide better labeling results than the traditionally used order constraint labeling. It is well known that the order constraint labeling methods don t work well and cannot completely remove the label switching in many cases for Bayesian mixture models. In this article, we use the simulation studies to demonstrate the similar undesirable results for frequentist mixture models, i.e., the simple order constraint labeling can sometimes lead to sever biased and even meaningless estimates, and thus might provide misleading estimated variation for frequentist mixture models. The structure of the paper is as follows. Section 2 introduces our new labeling methods. In Section 3, we use a simulation study to compare the proposed labeling methods to the traditionally used order constraint labeling. We summarize the proposed labeling methods and discuss some future research work in Section 4. 5

6 2 New Labeling Methods Suppose (ˆθ 1,..., ˆθ N ) are N raw unlabeled maximum likelihood estimates of mixture model (1.1) in a simulation study or bootstrap procedure. Our objective is to find the labels (ω 1,..., ω N ) such that (ˆθ ω 1 1,..., ˆθ ω N N ) have the same meaning of component labels. estimates. Then we can use the labeled samples to evaluate the variation of parameter In a simulation study or parametric bootstrap, the latent component labels for each observation are known. Our proposed new labeling methods try to make use of this valuable component labels information. Suppose x = {x 1,..., x n } is a typical generated data and the corresponding unlabeled MLE is ˆθ. Define the latent variable z = {z ij, i = 1,..., n, j = 1,..., m}, where 1, if the i th observation x i is from the j th component ; z ij = 0, otherwise. Complete likelihood based labeling: The first proposed method is to find the label ω for ˆθ by maximizing the complete likelihood of (x, z) over ω L(ˆθ ω ; x, z) = n m {ˆπ ω j f(x i ; ˆλ ω j )} z ij, (2.1) i=1 j=1 where ˆπ ω j = ˆπω(j), and ˆλ ω j = ˆλω(j). Unlike the mixture likelihood, the complete likelihood L(ˆθ; x, z) is not invariant to the permutation of component labels, since the variable z carries the label information. Therefore L(ˆθ; x, z) carries the label information of parameter ˆθ and can be used to do labeling. Here, we make use of the information of latent variable z to break the permutation symmetry of mixture likelihood. 6

7 Note that log{l(ˆθ ω ; y, z)} = = = where n i=1 n i=1 n i=1 m j=1 [ z ij log{ˆπ ω j f(x i ; ˆλ ω ] j )} (2.2) [ { m ˆπ ω j f(x i ; ˆλ ω }] j ) n m [ z ij log p(x i ; ˆθ ω + z ij log{p(x i ; ˆθ ω ] )} ) i=1 j=1 m z ij log p ij (ˆθ ω n ) + log p(x i ; ˆθ ω ) (2.3) j=1 j=1 i=1 p ij (θ ω ) = πω j f(x i ; λ ω j ) p(x i ; θ ω ) and p(x i ; θ ω ) = m j=1 πω j f(x i ; λ ω j ). Notice that the second term of (2.3) is log mixture likelihood and thus is invariant to the permutation of component labels of ˆθ. Therefore, we have the following result. Theorem 2.1. Maximizing L(ˆθ ω ; y, z) with respect to ω in (2.1) is equivalent to maximizing l 1 (ˆθ ω ; y, z) = n m z ij log p ij (ˆθ ω ), (2.4) i=1 j=1 which is equivalent to minimizing the Kullback-Leibler divergence if we consider z ij as the true classification probability and p ij (ˆθ) as the estimated classification. In practice, it is usually easier to work on (2.4) than (2.1), since the classification probabilities p ij (θ) is a byproduct of an EM algorithm. In addition, note that p ij (θ ω ) = p iω(j)(θ). Therefore, we don t need to find the classification probabilities for each permutation of θ and thus the computation of the complete likelihood labeling is usually very fast. Distance based labeling: The second method is to do labeling by minimizing the following Euclidian distance between the classification probabilities and the true latent 7

8 labels over ω l 2 (ˆθ ω ; y, z) = n i=1 m j=1 { p ij (ˆθ ω ) z ij } 2 (2.5) Here, we want to find the labels such that the estimated classification probabilities are as similar to latent labels as possible based on the Euclidian distance. Note that n m {p ij (ˆθ ω } 2 n m ) z ij = [{p ij (ˆθ ω ] n m )} 2 + {z ij } 2 2 z ij p ij (ˆθ ω ) i=1 j=1 i=1 j=1 i=1 j=1 The first part of above formula is invariant to the label. Thus to minimize the above Euclidian distance is equivalent to maximizing the second part. Therefore, we have the following result. Theorem 2.2. Minimizing l 2 (ˆθ ω ; y, z) with respect to ω in (2.5) is equivalent to maximizing l 3 (ˆθ ω ; y, z) = n m z ij p ij (ˆθ ω ). (2.6) i=1 j=1 Note that the objective function (2.4) and (2.6) are very similar, except that (2.4) uses log transformation for p ij (ˆθ ω ) but (2.6) does not. Both of the above two proposed labeling methods can be also applied to nonparametric bootstrap. Based on the MLE for the original samples (x 1,..., x n ), we can get the estimated classification probabilities {ˆp ij, i = 1,..., n, j = 1,..., m}. Then we can simply replace z ij in (2.4) by ˆp ij or let 1, if ˆp ij > ˆp il for all l j; z ij = 0, otherwise. 8

9 3 Simulation Study In this section, we will use a simulation study to compare the proposed complete likelihood based labeling method (COMPLH) and the Euclidean distance based labeling method (DISTLAT) with traditionally used order constraint labeling and the normal likelihood based labeling method (NORMLH) (Yao and Lindsay, 2009) for both univariate and multivariate normal mixture models. It is well known that the normal mixture models with unequal variance have unbounded likelihood function. Therefore, the maximum likelihood estimate (MLE) is not well defined. Similar unboundness issue also exists for multivariate normal mixture models with unequal covariance. There has been considerable research dealing with the unbounded mixture likelihood issue. See, for example, Hathaway (1985, 1986), Chen, Tan, and Zhang (2008), Chen and Tan (2009), and Yao (2010). However, since our focus is not directly on parameter estimation but on how to label them after they are derived, we will assume, without loss of generality and for simplicity of computation only, equal variance (covariance) for univariate (multivariate) normal mixture models when using EM algorithm to find the MLE. The EM algorithm is run based on 20 randomly chosen initial values and stops until the maximum difference between the updated parameter estimates of two consecutive iterations is less than To compare different labeling results, we report the average and standard deviation of labeled MLEs for different labeling methods. It is expected that the ideally labeled estimates should have small bias. Therefore, the bias is a good indicator of how well each labeling method performs. Note that the estimated standard errors by each labeling method cannot be directly used to compare different labeling methods since the true standard errors are unknown even in simulation setting. In addition, as discussed in Section 1, the estimated standard errors obtained by an information-based approach were usually too unstable when the sample size is not too large. 9

10 Example 1. We are interested in evaluating the performance of MLE for the mixture model π 1 N(µ 1, 1) + (1 π 2 )N(µ 2, 1), where π 1 = 0.3 and µ 1 = 0. We consider the following four cases for µ 2 : I) µ 2 = 0.5; II) µ 2 = 1; III) µ 2 = 1.5; IV) µ 2 = 2. The above four cases have unequal component proportions and the separation of two mixture components increases from Case I to Case IV. For each case, we run 500 replicates for both sample size 50 and 200 and find the MLE for each replicate by EM algorithm assuming equal variance. We consider five labeling methods: order constraint labeling based on component means (OC-µ), order constraint labeling based on component proportions (OC-π), normal likelihood based labeling (NORMLH), complete likelihood based labeling (COMPLH), and the Euclidean distance based labeling method (DISTLAT). Table 1 and 2 report the average and standard deviation (Std) of the MLEs based on different labeling methods for n = 50 and 200, respectively. Since the equal variance is assumed, the variance estimates do not carry the labeling information and are the same for different labeling methods. Therefore, we did not report the variance estimates in the Tables for simplicity of comparison. From Tables 1 and 2, we can see that COMPLH and DISTLAT have similar results and have smaller bias than all other labeling methods, especially when two components are close, and NORMLH also has slightly smaller bias than OC-µ and OC-π, which have large bias, especially when the components are close. In addition, we can see that all labeling methods perform better and have smaller bias when sample size increases. Therefore, it is expected that those five labeling methods will provide similar labeled estimates for all four cases considered in this example when sample size is large enough. 10

11 Example 2. We generate independent and identically distributed (iid) samples (x 1,..., x n ) from π 1 N µ 11 µ 12, (1 π 2 )N µ 21 µ 22, , where µ 11 = 0 and µ 12 = 0. We consider the following four cases for π 1, µ 21, µ 22, σ 2 : I. π 1 = 0.3, µ 21 = 0.5, µ 22 = 0.5; II. π 1 = 0.3, µ 21 = 1, µ 22 = 1; III. π 1 = 0.3, µ 21 = 1.5, µ 22 = 1.5; IV. π 1 = 0.5, µ 21 = 3, µ 22 = 0. For each case, we run 500 replicates for both sample size 100 and 400 and find the MLE for each replicate by EM algorithm assuming equal covariance. We consider the following six labeling methods: order constraint labeling based on the component means of first dimension (OC-µ 1 ), order constraint labeling based on the component means of second dimension (OC-µ 2 ), order constraint labeling based on the component proportions (OCπ), NORMLH, COMPLH, and DISTLAT. Table 3 and 4 report the average and standard deviation (Std) of the MLEs based on different labeling methods for n = 100 and 400, respectively. From the tables, we can see that, for Case I to Case III, COMPLH and DISTLAT have similar results and provide smaller bias of labeled estimate than all other labeling methods, especially when the components are close and the sample size is not large. In addition, NORMLH also provide smaller bias than OC-µ 1, OC-µ 2, and, OC-π. For Case IV, the component means of the second dimension are the same but the component means of the first dimension are very separate. In addition, the component proportions are the same. In this case, 11

12 the component means of the second dimension and component proportions don t have any label information. From the tables, we can see that OC-µ 2 and OC-π provide unreasonable estimates and have large bias for both n = 100 and n = 400. Note that in this case, OC-µ 2 and OC-π won t work well even when the sample size is larger than 400 due to the wrong order constraint component parameters used. However, OC-µ 1, NORMLH, COMPLH, and DISTLAT all work well for both n = 100 and n = 400. Example 3. (x 1,..., x n ) from We generate independent and identically distributed (iid) samples 3 π j N µ j1 j=1 µ j2, , where π 1 = 0.2, π 2 = 0.3, π 3 = 0.5, µ 11 = 0, and µ 12 = 0. We consider the following three cases for (µ 21, µ 22, µ 31, µ 32 ): I. µ 21 = 0.5, µ 22 = 0.5, µ 31 = 1, µ 32 = 1; II. µ 21 = 1, µ 22 = 1, µ 31 = 2, µ 32 = 2; III. µ 21 = 0, µ 22 = 2, µ 31 = 0, µ 32 = 4. For each case, we run 500 replicates for both sample size 100 and 400 and find the MLE for each replicate by EM algorithm assuming equal covariance. We consider the following six labeling methods: OC-µ 1, OC-µ 2, OC-π, NORMLH, COMPLH, and DISTLAT. Table 5 and 6 report the average and standard deviation (Std) of the MLEs based on different labeling methods for n = 100 and 400, respectively. The findings are similar to Example 2. From the tables, we can see that COMPLH and DISTLAT produce similar results in most of cases and have overall better performance than all other labeling methods, especially when sample size is small. 12

13 4 Summary Label switching issue has not received as much attention for frequentist mixture models as for Bayesian mixture models. In this article, we explain the importance of solving label switching issue for frequentist mixture models and propose a new labeling method. Based on the simulation study, we can see that the proposed complete likelihood based labeling method and the Euclidean distance based labeling method by incorporating the information of latent labels have overall better performance than all other methods considered in our simulation study. In addition, it can be seen that the order constraint labeling methods only work well when the constrained component parameters have enough label information but usually provide poor estimates and even unreasonable estimates if the constrained component parameters are not very separate. Therefore, in practice, the choice of constraint is very sensitive and thus difficult, especially for multivariate problems. Different order constraints may generate markedly different results and it is difficult to anticipate the overall effect. As explained in Section 1, many of the labeling methods proposed for Bayesian mixtures cannot be directly applied to frequentist mixtures. However, it requires more research whether we can apply, either directly or after some revision, some of the Bayesian labeling methods to frequentist mixtures. 5 Acknowledgements This work is related to my Ph.D. dissertation. I am indebted to my dissertation advisor, Bruce G. Lindsay, for his assistance and counsel in this research. 13

14 References Basford, K. E., Greenway, D. R., McLachlan, G. J., and Peel, D. (1997). Standard errors of fitted means under normal mixture models. Computational Statistics, 12:1-17. Böhning, D. (1999). Computer-Assisted Analysis of Mixtures and Applications, Boca Raton, FL: Chapman and Hall/CRC. Celeux, G. (1998), Bayesian inference for mixtures: The label switching problem. In Compstat 98-Proc. in Computational Statistics (eds. R. Payne and P.J. Green), Physica, Heidelberg. Celeux, G., Hurn, M., and Robert, C. P. (2000). Computational and inferential difficulties with mixture posterior distributions. Journal of the American Statistical Association, 95, Chen, J., Tan, X., and Zhang, R. (2008). Inference for normal mixture in mean and variance. Statistica Sincia, 18, Chen, J. and Tan, X. (2009). Inference for multivariate normal mixtures. Journal of Multivariate Analysis, 100, Chung, H., Loken, E., and Schafer, J. L. (2004). Difficulties in drawing inferences with finite-mixture models: a simple example with a simple solution. The American Statistican, 58, Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of Royal Statistical Society, Ser B., 39, Frühwirth-Schnatter, S. (2001). Markov chain Monte Carlo estimation of classical and 14

15 dynamic switching and mixture models. Journal of the American Statistical Association, 96, (2006). Finite Mixture and Markov Switching Models, Springer, Geweke, J. (2007). Interpretation and inference in mixture models: Simple MCMC works. Computational Statistics and Data Analysis, 51, Grün, B. and Leisch, F. (2009). Dealing with label switching in mixture models under genuine multimodality. Journal of Multivariate Analysis, 100, Hathaway, R. J. (1985). A constrained formulation of maximum-likelihood estimation for normal mixture distributions. Annals of Statistics, 13, Hathaway, R. J. (1986). A constrained EM algorithm for univariate mixtures. Journal of Statistical Computation and Simulation, 23, Lindsay, B. G., (1995). Mixture Models: Theory, Geometry, and Applications. NSF- CBMS Regional Conference Series in Probability and Statistics v 5, Hayward, CA: Institure of Mathematical Statistics. Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society, Series B, 44: Marin, J.-M., Mengersen, K. L. and Robert, C. P. (2005). Bayesian modelling and inference on mixtures of distributions. Handbook of Statistics 25 (eds. D. Dey and C.R. Rao), North-Holland, Amsterdam. McLachlan, G. J. and Krishnan, T. (1997). The EM Algorithm and Extensions. Wiley, New York. McLachlan, G. J. and Peel, D. (2000). Finite Mixture Models. New York: Wiley. 15

16 Meilijson, I. (1989). A fast improvement of the EM algorithm in its own terms. Journal of the Royal Statistical Society, Series B, 51: Meng, X. L. and Rubin, D. B. (1991). Using EM to obtain asymptotic variancecovariance matrices: the SEM algorithm. Journal of the American Statistical Association, 86: Papastamoulis, P. and Iliopoulos, G. (2010), An artificial allocations based solution to the label switching problem in Bayesian analysis of mixtures of distributions, Journal of Computational and Graphical Statistics, 19, Sperrin, M., Jaki, T., and Wit, E. (2010), Probabilistic relabeling strategies for the label switching problem in Bayesian mixture models, Statistics and Computing, 20, Stephens, M. (2000). Dealing with label switching in mixture models. Journal of Royal Statistical Society, Ser B., 62, Yao, W. (2010). A profile likelihood method for normal mixture with unequal variance. Journal of Statistical Planning and Inference, 140, Yao, W. and Lindsay, B. G. (2009). Bayesian mixture labeling by highest posterior density. Journal of American Statistical Association, 104,

17 Table 1: Average (Std) of Point Estimates Over 500 Repetitions When n = 50 for Example 1. Case TRUE OC-µ OC-π NORMLH COMPLH DISTLAT I µ 1 : (0.705) 0.250(1.456) (1.240) (1.356) (1.371) µ 2 : (0.640) 0.355(0.476) 0.851(0.457) 0.645(0.529) 0.622(0.518) π 1 : (0.284) 0.257(0.149) 0.353(0.245) 0.304(0.207) 0.298(0.201) II µ 1 : (0.722) 0.479(1.497) (1.22) 0.075(1.309) 0.121(1.348) µ 2 : (0.617) 0.748(0.516) 1.257(0.470) 1.152(0.499) 1.106(0.491) π1 : (0.277) 0.266(0.151) 0.359(0.241) 0.328(0.220) 0.312(0.207) III µ 1 : (0.711) 0.441(1.436) 0.026(1.110) 0.031(1.112) 0.071(1.167) µ 1 : (0.575) 1.229(0.576) 1.644(0.412) 1.638(0.415) 1.598(0.416) π 1 : (0.238) 0.289(0.144) 0.351(0.208) 0.351(0.208) 0.336(0.196) IV µ 1 : (0.709) 0.288(1.322) (0.901) (0.940) 0.002(0.975) µ 2 : (0.461) 1.782(0.594) 2.100(0.366) 2.083(0.366) 2.067(0.365) π 1 : (0.191) 0.294(0.129) 0.333(0.177) 0.326(0.170) 0.320(0.163) Table 2: Average (Std) of Point Estimates Over 500 Repetitions When n = 200 for Example 1. Case TRUE OC-µ OC-π NORMLH COMPLH DISTLAT I µ 1 : (0.718) 0.122(1.345) (0.718) (1.282) (1.29) µ 2 : (0.613) 0.401(0.381) 0.720(0.366) 0.588(0.367) 0.571(0.369) π 1 : (0.293) 0.251(0.162) 0.318(0.235) 0.273(0.192) 0.269(0.187) II µ 1 : (0.728) 0.342(1.332) (1.130) 0.050(1.178) 0.077(1.199) µ 2 : (0.559) 0.786(0.424) 1.133(0.353) 1.078(0.366) 1.052(0.371) π1 : (0.271) 0.270(0.159) 0.339(0.229) 0.317(0.211) 0.307(0.202) III µ 1 : (0.621) 0.113(1.029) (0.792) (0.797) (0.797) µ 1 : (0.358) 1.351(0.429) 1.540(0.273) 1.537(0.275) 1.537(0.275) π 1 : (0.190) 0.287(0.139) 0.307(0.165) 0.306(0.163) 0.306(0.163) IV µ 1 : (0.367) 0.017(0.532) (0.418) (0.428) (0.428) µ 1 : (0.220) 1.964(0.277) 2.002(0.187) 2.000(0.189) 2.000(0.189) π 1 : (0.103) 0.300(0.092) 0.303(0.099) 0.302(0.097) 0.302(0.097) 17

18 Table 3: Average (Std) of Point Estimates Over 500 Repetitions When n = 100 for Example 2. Case TRUE OC-µ 1 OC-µ 2 OC-π NORMLH COMPLH DISTLAT I µ 11 : (0.633) 0.306(0.819) 0.308(1.136) 0.252(1.130) 0.145(1.091) 0.164(1.099) µ 12 : (0.92) (0.604) 0.395(1.175) 0.408(1.180) 0.241(1.153) 0.244(1.155) µ 21 : (0.593) 0.363(0.868) 0.360(0.366) 0.417(0.358) 0.524(0.404) 0.504(0.399) µ 22 : (0.816) 0.976(0.639) 0.338(0.363) 0.325(0.359) 0.493(0.392) 0.490(0.388) π 1 : (0.278) 0.512(0.278) 0.269(0.155) 0.273(0.160) 0.309(0.202) 0.303(0.196) II µ 11 : (0.598) 0.444(0.937) 0.435(1.101) 0.281(1.040) 0.258(1.019) 0.264(1.024) µ 12 : (0.916) (0.631) 0.392(1.123) 0.272(1.070) 0.213(1.027) 0.222(1.036) µ 21 : (0.536) 0.775(0.708) 0.784(0.400) 0.939(0.367) 0.962(0.383) 0.956(0.382) µ 22 : (0.782) 1.232(0.534) 0.793(0.399) 0.913(0.378) 0.972(0.396) 0.963(0.390) π 1 : (0.258) 0.431(0.256) 0.280(0.148) 0.300(0.174) 0.322(0.196) 0.317(0.191) III µ 11 : (0.563) 0.23(0.841) 0.233(0.902) 0.150(0.814) 0.136(0.789) 0.138(0.797) µ 12 : (0.823) 0.036(0.540) 0.215(0.859) 0.107(0.727) 0.115(0.736) 0.120(0.743) µ 21 : (0.354) 1.355(0.505) 1.352(0.391) 1.435(0.329) 1.448(0.331) 1.447(0.319) µ 22 : (0.507) 1.536(0.344) 1.357(0.380) 1.465(0.293) 1.458(0.303) 1.453(0.308) π 1 : (0.177) 0.343(0.175) 0.296(0.117) 0.309(0.137) 0.311(0.139) 0.309(0.137) IV µ 11 : (0.257) 1.536(1.510) 1.629(1.568) 0.030(0.257) 0.030(0.256) 0.030(0.256) µ 12 : (0.228) (0.207) (0.266) (0.228) (0.227) (0.227) µ 21 : (0.248) 1.493(1.510) 1.399(1.434) 2.998(0.248) 2.998(0.248) 2.998(0.248) µ 22 : (0.225) 0.129(0.159) 0.008(0.177) 0.004(0.225) 0.004(0.224) 0.004(0.225) π 1 : (0.075) 0.493(0.075) 0.444(0.050) 0.504(0.075) 0.504(0.075) 0.504(0.075) 18

19 Table 4: Average (Std) of Point Estimates Over 500 Repetitions When n = 400 for Example 2. Case TRUE OC-µ 1 OC-µ 2 OC-π NORMLH COMPLH DISTLAT I µ 11 : (0.544) 0.263(0.767) 0.259(0.991) 0.118(0.949) 0.124(0.952) 0.125(0.954) µ 12 : (0.820) (0.580) 0.292(1.060) 0.144(1.030) 0.172(1.034) 0.173(1.035) µ 21 : (0.519) 0.370(0.700) 0.374(0.310) 0.515(0.325) 0.509(0.320) 0.508(0.320) µ 22 : (0.739) 0.857(0.573) 0.369(0.304) 0.517(0.312) 0.490(0.315) 0.488(0.313) π 1 : (0.286) 0.474(0.286) 0.266(0.167) 0.301(0.207) 0.294(0.200) 0.293(0.199) II µ 11 : (0.488) 0.258(0.712) 0.210(0.798) 0.102(0.706) 0.104(0.708) 0.104(0.708) µ 12 : (0.800) (0.494) 0.246(0.860) 0.132(0.776) 0.130(0.773) 0.130(0.773) µ 21 : (0.345) 0.809(0.537) 0.857(0.316) 0.965(0.275) 0.963(0.277) 0.963(0.277) µ 22 : (0.518) 1.121(0.388) 0.863(0.322) 0.977(0.278) 0.979(0.276) 0.979(0.276) π 1 : (0.211) 0.379(0.221) 0.289(0.136) 0.309(0.164) 0.310(0.164) 0.310(0.164) III µ 11 : (0.217) 0.005(0.252) (0.224) (0.232) (0.232) (0.232) µ 12 : (0.287) 0.007(0.208) 0.022(0.294) 0.013(0.259) 0.013(0.259) 0.013(0.259) µ 21 : (0.112) 1.492(0.188) 1.503(0.117) 1.502(0.122) 1.502(0.122) 1.502(0.122) µ 22 : (0.158) 1.508(0.135) 1.493(0.143) 1.501(0.120) 1.501(0.119) 1.501(0.119) π 1 : (0.067) 0.305(0.071) 0.302(0.063) 0.302(0.063) 0.302(0.063) 0.302(0.063) IV µ 11 : (0.097) 1.401(1.51) 1.474(1.533) 0.001(0.097) 0.001(0.097) 0.001(0.097) µ 12 : (0.082) (0.063) 0.001(0.086) (0.082) (0.082) (0.082) µ 21 : (0.095) 1.601(1.500) 1.527(1.475) 3.001(0.095) 3.001(0.095) 3.001(0.095) µ 22 : (0.082) 0.053(0.063) 0.002(0.078) 0.005(0.082) 0.005(0.082) 0.005(0.082) π 1 : (0.031) 0.498(0.031) 0.475(0.019) 0.500(0.031) 0.499(0.031) 0.500(0.031) 19

20 Table 5: Average (Std) of Point Estimates Over 500 Repetitions When n = 100 for Example 3. Case TRUE OC-µ 1 OC-µ 2 OC-π NORMLH COMPLH DISTLAT I µ 11 : (0.719) 0.403(1.110) 0.435(1.504) 0.260(1.480) 0.045(1.255) 0.079(1.225) µ 12 : (1.100) (0.695) 0.477(1.465) 0.400(1.440) (1.187) 0.045(1.164) µ 21 : (0.353) 0.636(0.853) 0.627(0.863) 0.802(0.837) 0.658(0.911) 0.642(0.988) µ 22 : (0.874) 0.620(0.404) 0.663(0.873) 0.718(0.903) 0.746(0.891) 0.735(0.977) µ 31 : (0.666) 0.715(1.090) 0.691(0.389) 0.692(0.364) 1.052(0.522) 1.034(0.505) µ 32 : (1.030) 1.625(0.604) 0.683(0.414) 0.705(0.382) 1.080(0.535) 1.043(0.521) π 1 : (0.159) 0.255(0.187) 0.123(0.079) 0.127(0.084) 0.182(0.132) 0.192(0.137) π 2 : (0.184) 0.460(0.182) 0.314(0.094) 0.313(0.102) 0.352(0.187) 0.332(0.190) II µ 11 : (0.671) 0.265(0.922) 0.776(1.530) (0.752) 0.089(0.973) 0.180(0.943) µ 12 : (1.070) (0.618) 0.905(1.599) 0.039(0.815) 0.149(1.085) 0.145(1.039) µ 21 : (0.449) 1.353(0.948) 1.084(0.995) 1.415(0.625) 1.248(0.807) 1.248(1.061) µ 22 : (0.988) 1.228(0.444) 1.058(1.001) 1.144(0.575) 1.256(0.809) 1.352(0.977) µ 31 : (0.566) 1.788(0.954) 1.547(0.506) 2.018(0.852) 2.070(0.555) 1.979(0.484) µ 32 : (0.852) 2.387(0.609) 1.543(0.541) 2.323(0.683) 2.102(0.566) 2.009(0.498) π 1 : (0.131) 0.233(0.126) 0.141(0.076) 0.223(0.123) 0.204(0.118) 0.210(0.113) π 2 : (0.167) 0.432(0.167) 0.325(0.074) 0.450(0.154) 0.380(0.166) 0.337(0.179) III µ 11 : (0.595) (0.630) (1.120) (0.734) (0.819) (0.699) µ 12 : (1.890) 0.159(0.763) 1.441(2.138) 0.901(1.770) 0.350(1.255) 0.380(1.219) µ 21 : (0.193) 0.049(0.754) 0.010(0.483) 0.438(0.726) 0.033(0.781) 0.031(0.958) µ 22 : (1.650) 2.415(0.864) 1.840(1.444) 1.930(1.270) 2.331(0.993) 2.400(1.244) µ 31 : (0.616) (0.779) 0.006(0.295) 0.009(0.329) 0.015(0.537) (0.407) µ 32 : (1.780) 4.107(0.551) 3.400(0.827) 3.850(0.508) 3.999(0.491) 3.900(0.455) π 1 : (0.159) 0.232(0.113) 0.154(0.076) 0.206(0.106) 0.212(0.107) 0.223(0.105) π 2 : (0.156) 0.369(0.151) 0.325(0.061) 0.303(0.129) 0.341(0.143) 0.297(0.146) 20

21 Table 6: Average (Std) of Point Estimates Over 500 Repetitions When n = 400 for Example 3. Case TRUE OC-µ 1 OC-µ 2 OC-π NORMLH COMPLH DISTLAT I µ 11 : (0.678) 0.233(1.040) 0.485(1.406) 0.463(1.410) (1.185) (1.116) µ 12 : (1.020) (0.658) 0.406(1.320) 0.392(1.320) (1.11) (1.059) µ 21 : (0.325) 0.669(0.784) 0.537(0.751) 0.491(0.735) 0.682(0.718) 0.741(0.861) µ 22 : (0.678) 0.588(0.294) 0.581(0.766) 0.524(0.761) 0.664(0.705) 0.729(0.819) µ 31 : (0.603) 0.837(0.886) 0.717(0.330) 0.785(0.318) 1.061(0.418) 0.999(0.389) µ 32 : (0.890) 1.461(0.545) 0.697(0.339) 0.769(0.320) 1.036(0.435) 0.976(0.394) π 1 : (0.166) 0.231(0.171) 0.111(0.077) 0.111(0.078) 0.157(0.124) 0.168(0.125) π 2 : (0.190) 0.468(0.201) 0.309(0.092) 0.316(0.105) 0.364(0.187) 0.322(0.189) II µ 11 : (0.492) 0.132(0.717) 0.733(1.403) 0.063(0.392) 0.062(0.782) 0.041(0.550) µ 12 : (0.638) (0.496) 0.701(1.388) 0.074(0.362) 0.012(0.713) 0.007(0.483) µ 21 : (0.427) 1.266(0.850) 0.988(0.908) 1.318(1.110) 1.220(0.763) 1.318(1.001) µ 22 : (0.916) 1.146(0.438) 0.999(0.924) 1.265(1.130) 1.229(0.746) 1.324(0.997) µ 31 : (0.512) 1.960(0.672) 1.637(0.405) 1.977(0.327) 2.076(0.388) 1.999(0.327) µ 32 : (0.661) 2.252(0.481) 1.621(0.409) 1.982(0.339) 2.080(0.398) 1.991(0.343) π 1 : (0.102) 0.215(0.104) 0.143(0.078) 0.251(0.094) 0.203(0.105) 0.226(0.098) π 2 : (0.164) 0.414(0.157) 0.325(0.062) 0.267(0.180) 0.362(0.161) 0.298(0.174) III µ 11 : (0.424) (0.320) 0.024(0.739) (0.217) 0.001(0.482) 0.009(0.299) µ 12 : (1.790) 0.016(0.441) 0.748(1.609) 0.050(0.398) 0.052(0.598) 0.062(0.533) µ 21 : (0.092) 0.022(0.568) (0.202) 0.033(0.727) 0.004(0.569) 0.013(0.697) µ 22 : (1.630) 2.167(0.718) 1.735(1.157) 2.192(0.958) 2.162(0.791) 2.178(0.916) µ 31 : (0.471) 0.016(0.433) 0.001(0.160) (0.192) 0.011(0.237) (0.194) µ 32 : (1.730) 4.022(0.289) 3.722(0.568) 3.964(0.277) 3.992(0.272) 3.965(0.275) π 1 : (0.153) 0.214(0.080) 0.167(0.071) 0.221(0.077) 0.206(0.083) 0.217(0.078) π 2 : (0.146) 0.324(0.123) 0.323(0.045) 0.284(0.122) 0.309(0.111) 0.288(0.119) 21

Label switching and its solutions for frequentist mixture models

Label switching and its solutions for frequentist mixture models Journal of Statistical Computation and Simulation ISSN: 0094-9655 (Print) 1563-5163 (Online) Journal homepage: http://www.tandfonline.com/loi/gscs20 Label switching and its solutions for frequentist mixture

More information

Bayesian Mixture Labeling by Minimizing Deviance of. Classification Probabilities to Reference Labels

Bayesian Mixture Labeling by Minimizing Deviance of. Classification Probabilities to Reference Labels Bayesian Mixture Labeling by Minimizing Deviance of Classification Probabilities to Reference Labels Weixin Yao and Longhai Li Abstract Solving label switching is crucial for interpreting the results of

More information

A Simple Solution to Bayesian Mixture Labeling

A Simple Solution to Bayesian Mixture Labeling Communications in Statistics - Simulation and Computation ISSN: 0361-0918 (Print) 1532-4141 (Online) Journal homepage: http://www.tandfonline.com/loi/lssp20 A Simple Solution to Bayesian Mixture Labeling

More information

Relabel mixture models via modal clustering

Relabel mixture models via modal clustering Communications in Statistics - Simulation and Computation ISSN: 0361-0918 (Print) 1532-4141 (Online) Journal homepage: http://www.tandfonline.com/loi/lssp20 Relabel mixture models via modal clustering

More information

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for

More information

Computation of an efficient and robust estimator in a semiparametric mixture model

Computation of an efficient and robust estimator in a semiparametric mixture model Journal of Statistical Computation and Simulation ISSN: 0094-9655 (Print) 1563-5163 (Online) Journal homepage: http://www.tandfonline.com/loi/gscs20 Computation of an efficient and robust estimator in

More information

Root Selection in Normal Mixture Models

Root Selection in Normal Mixture Models Root Selection in Normal Mixture Models Byungtae Seo a, Daeyoung Kim,b a Department of Statistics, Sungkyunkwan University, Seoul 11-745, Korea b Department of Mathematics and Statistics, University of

More information

EM Algorithm II. September 11, 2018

EM Algorithm II. September 11, 2018 EM Algorithm II September 11, 2018 Review EM 1/27 (Y obs, Y mis ) f (y obs, y mis θ), we observe Y obs but not Y mis Complete-data log likelihood: l C (θ Y obs, Y mis ) = log { f (Y obs, Y mis θ) Observed-data

More information

Robust mixture regression using the t-distribution

Robust mixture regression using the t-distribution This is the author s final, peer-reviewed manuscript as accepted for publication. The publisher-formatted version may be available through the publisher s web site or your institution s library. Robust

More information

Bayesian Methods for Machine Learning

Bayesian Methods for Machine Learning Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),

More information

Biostat 2065 Analysis of Incomplete Data

Biostat 2065 Analysis of Incomplete Data Biostat 2065 Analysis of Incomplete Data Gong Tang Dept of Biostatistics University of Pittsburgh October 20, 2005 1. Large-sample inference based on ML Let θ is the MLE, then the large-sample theory implies

More information

Testing for a Global Maximum of the Likelihood

Testing for a Global Maximum of the Likelihood Testing for a Global Maximum of the Likelihood Christophe Biernacki When several roots to the likelihood equation exist, the root corresponding to the global maximizer of the likelihood is generally retained

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate

More information

An empirical comparison of EM, SEM and MCMC performance for problematic Gaussian mixture likelihoods

An empirical comparison of EM, SEM and MCMC performance for problematic Gaussian mixture likelihoods Statistics and Computing 14: 323 332, 2004 C 2004 Kluwer Academic Publishers. Manufactured in The Netherlands. An empirical comparison of EM, SEM and MCMC performance for problematic Gaussian mixture likelihoods

More information

Model-based cluster analysis: a Defence. Gilles Celeux Inria Futurs

Model-based cluster analysis: a Defence. Gilles Celeux Inria Futurs Model-based cluster analysis: a Defence Gilles Celeux Inria Futurs Model-based cluster analysis Model-based clustering (MBC) consists of assuming that the data come from a source with several subpopulations.

More information

KANSAS STATE UNIVERSITY

KANSAS STATE UNIVERSITY ROBUST MIXTURES OF REGRESSION MODELS by XIUQIN BAI M.S., Kansas State University, USA, 2010 AN ABSTRACT OF A DISSERTATION submitted in partial fulfillment of the requirements for the degree DOCTOR OF PHILOSOPHY

More information

Different points of view for selecting a latent structure model

Different points of view for selecting a latent structure model Different points of view for selecting a latent structure model Gilles Celeux Inria Saclay-Île-de-France, Université Paris-Sud Latent structure models: two different point of views Density estimation LSM

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Estimating the parameters of hidden binomial trials by the EM algorithm

Estimating the parameters of hidden binomial trials by the EM algorithm Hacettepe Journal of Mathematics and Statistics Volume 43 (5) (2014), 885 890 Estimating the parameters of hidden binomial trials by the EM algorithm Degang Zhu Received 02 : 09 : 2013 : Accepted 02 :

More information

Bayesian finite mixtures with an unknown number of. components: the allocation sampler

Bayesian finite mixtures with an unknown number of. components: the allocation sampler Bayesian finite mixtures with an unknown number of components: the allocation sampler Agostino Nobile and Alastair Fearnside University of Glasgow, UK 2 nd June 2005 Abstract A new Markov chain Monte Carlo

More information

Adaptive Metropolis with Online Relabeling

Adaptive Metropolis with Online Relabeling Adaptive Metropolis with Online Relabeling Anonymous Unknown Abstract We propose a novel adaptive MCMC algorithm named AMOR (Adaptive Metropolis with Online Relabeling) for efficiently simulating from

More information

U-Likelihood and U-Updating Algorithms: Statistical Inference in Latent Variable Models

U-Likelihood and U-Updating Algorithms: Statistical Inference in Latent Variable Models U-Likelihood and U-Updating Algorithms: Statistical Inference in Latent Variable Models Jaemo Sung 1, Sung-Yang Bang 1, Seungjin Choi 1, and Zoubin Ghahramani 2 1 Department of Computer Science, POSTECH,

More information

PACKAGE LMest FOR LATENT MARKOV ANALYSIS

PACKAGE LMest FOR LATENT MARKOV ANALYSIS PACKAGE LMest FOR LATENT MARKOV ANALYSIS OF LONGITUDINAL CATEGORICAL DATA Francesco Bartolucci 1, Silvia Pandofi 1, and Fulvia Pennoni 2 1 Department of Economics, University of Perugia (e-mail: francesco.bartolucci@unipg.it,

More information

Robust Estimation of the Number of Components for Mixtures of Linear Regression Models

Robust Estimation of the Number of Components for Mixtures of Linear Regression Models Noname manuscript No. (will be inserted by the editor) Robust Estimation of the Number of Components for Mixtures of Linear Regression Models Meng Li Sijia Xiang Weixin Yao Received: date / Accepted: date

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a

More information

Communications in Statistics - Simulation and Computation. Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study

Communications in Statistics - Simulation and Computation. Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study Comparison of EM and SEM Algorithms in Poisson Regression Models: a simulation study Journal: Manuscript ID: LSSP-00-0.R Manuscript Type: Original Paper Date Submitted by the Author: -May-0 Complete List

More information

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.

More information

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1) HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter

More information

Answers and expectations

Answers and expectations Answers and expectations For a function f(x) and distribution P(x), the expectation of f with respect to P is The expectation is the average of f, when x is drawn from the probability distribution P E

More information

Hmms with variable dimension structures and extensions

Hmms with variable dimension structures and extensions Hmm days/enst/january 21, 2002 1 Hmms with variable dimension structures and extensions Christian P. Robert Université Paris Dauphine www.ceremade.dauphine.fr/ xian Hmm days/enst/january 21, 2002 2 1 Estimating

More information

Maximum Likelihood Estimation. only training data is available to design a classifier

Maximum Likelihood Estimation. only training data is available to design a classifier Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

A Note on the Expectation-Maximization (EM) Algorithm

A Note on the Expectation-Maximization (EM) Algorithm A Note on the Expectation-Maximization (EM) Algorithm ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign March 11, 2007 1 Introduction The Expectation-Maximization

More information

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

Lecture 3 September 1

Lecture 3 September 1 STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling

Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Fractional Hot Deck Imputation for Robust Inference Under Item Nonresponse in Survey Sampling Jae-Kwang Kim 1 Iowa State University June 26, 2013 1 Joint work with Shu Yang Introduction 1 Introduction

More information

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution

MH I. Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution MH I Metropolis-Hastings (MH) algorithm is the most popular method of getting dependent samples from a probability distribution a lot of Bayesian mehods rely on the use of MH algorithm and it s famous

More information

Machine Learning Techniques for Computer Vision

Machine Learning Techniques for Computer Vision Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved

More information

The EM Algorithm for the Finite Mixture of Exponential Distribution Models

The EM Algorithm for the Finite Mixture of Exponential Distribution Models Int. J. Contemp. Math. Sciences, Vol. 9, 2014, no. 2, 57-64 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ijcms.2014.312133 The EM Algorithm for the Finite Mixture of Exponential Distribution

More information

New Bayesian methods for model comparison

New Bayesian methods for model comparison Back to the future New Bayesian methods for model comparison Murray Aitkin murray.aitkin@unimelb.edu.au Department of Mathematics and Statistics The University of Melbourne Australia Bayesian Model Comparison

More information

Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction

Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Xiaodong Lin 1 and Yu Zhu 2 1 Statistical and Applied Mathematical Science Institute, RTP, NC, 27709 USA University of Cincinnati,

More information

Approximate Likelihoods

Approximate Likelihoods Approximate Likelihoods Nancy Reid July 28, 2015 Why likelihood? makes probability modelling central l(θ; y) = log f (y; θ) emphasizes the inverse problem of reasoning y θ converts a prior probability

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

Lecture 4: Probabilistic Learning

Lecture 4: Probabilistic Learning DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods

More information

Bayes: All uncertainty is described using probability.

Bayes: All uncertainty is described using probability. Bayes: All uncertainty is described using probability. Let w be the data and θ be any unknown quantities. Likelihood. The probability model π(w θ) has θ fixed and w varying. The likelihood L(θ; w) is π(w

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample

More information

Accelerating the EM Algorithm for Mixture Density Estimation

Accelerating the EM Algorithm for Mixture Density Estimation Accelerating the EM Algorithm ICERM Workshop September 4, 2015 Slide 1/18 Accelerating the EM Algorithm for Mixture Density Estimation Homer Walker Mathematical Sciences Department Worcester Polytechnic

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

More information

A BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain

A BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain A BAYESIAN MATHEMATICAL STATISTICS PRIMER José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Bayesian Statistics is typically taught, if at all, after a prior exposure to frequentist

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

The Expectation Maximization Algorithm

The Expectation Maximization Algorithm The Expectation Maximization Algorithm Frank Dellaert College of Computing, Georgia Institute of Technology Technical Report number GIT-GVU-- February Abstract This note represents my attempt at explaining

More information

Clustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation

Clustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide

More information

Likelihood-Based Methods

Likelihood-Based Methods Likelihood-Based Methods Handbook of Spatial Statistics, Chapter 4 Susheela Singh September 22, 2016 OVERVIEW INTRODUCTION MAXIMUM LIKELIHOOD ESTIMATION (ML) RESTRICTED MAXIMUM LIKELIHOOD ESTIMATION (REML)

More information

Bayesian inference for multivariate skew-normal and skew-t distributions

Bayesian inference for multivariate skew-normal and skew-t distributions Bayesian inference for multivariate skew-normal and skew-t distributions Brunero Liseo Sapienza Università di Roma Banff, May 2013 Outline Joint research with Antonio Parisi (Roma Tor Vergata) 1. Inferential

More information

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units

Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Bayesian nonparametric estimation of finite population quantities in absence of design information on nonsampled units Sahar Z Zangeneh Robert W. Keener Roderick J.A. Little Abstract In Probability proportional

More information

An introduction to Variational calculus in Machine Learning

An introduction to Variational calculus in Machine Learning n introduction to Variational calculus in Machine Learning nders Meng February 2004 1 Introduction The intention of this note is not to give a full understanding of calculus of variations since this area

More information

Statistical Estimation

Statistical Estimation Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from

More information

Monte Carlo Integration using Importance Sampling and Gibbs Sampling

Monte Carlo Integration using Importance Sampling and Gibbs Sampling Monte Carlo Integration using Importance Sampling and Gibbs Sampling Wolfgang Hörmann and Josef Leydold Department of Statistics University of Economics and Business Administration Vienna Austria hormannw@boun.edu.tr

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java

A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java IOP Conference Series: Earth and Environmental Science PAPER OPEN ACCESS A Comparative Study of Imputation Methods for Estimation of Missing Values of Per Capita Expenditure in Central Java To cite this

More information

PARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS

PARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS Statistica Sinica 15(2005), 831-840 PARAMETER CONVERGENCE FOR EM AND MM ALGORITHMS Florin Vaida University of California at San Diego Abstract: It is well known that the likelihood sequence of the EM algorithm

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton

Last lecture 1/35. General optimization problems Newton Raphson Fisher scoring Quasi Newton EM Algorithm Last lecture 1/35 General optimization problems Newton Raphson Fisher scoring Quasi Newton Nonlinear regression models Gauss-Newton Generalized linear models Iteratively reweighted least squares

More information

Dynamic sequential analysis of careers

Dynamic sequential analysis of careers Dynamic sequential analysis of careers Fulvia Pennoni Department of Statistics and Quantitative Methods University of Milano-Bicocca http://www.statistica.unimib.it/utenti/pennoni/ Email: fulvia.pennoni@unimib.it

More information

Overlapping Astronomical Sources: Utilizing Spectral Information

Overlapping Astronomical Sources: Utilizing Spectral Information Overlapping Astronomical Sources: Utilizing Spectral Information David Jones Advisor: Xiao-Li Meng Collaborators: Vinay Kashyap (CfA) and David van Dyk (Imperial College) CHASC Astrostatistics Group April

More information

Adaptive Metropolis with Online Relabeling

Adaptive Metropolis with Online Relabeling Adaptive Metropolis with Online Relabeling Rémi Bardenet LAL & LRI, University Paris-Sud 91898 Orsay, France bardenet@lri.fr Olivier Cappé LTCI, Telecom ParisTech & CNRS 46, rue Barrault, 7513 Paris, France

More information

Bayesian Modelling and Inference on Mixtures of Distributions Modelli e inferenza bayesiana per misture di distribuzioni

Bayesian Modelling and Inference on Mixtures of Distributions Modelli e inferenza bayesiana per misture di distribuzioni Bayesian Modelling and Inference on Mixtures of Distributions Modelli e inferenza bayesiana per misture di distribuzioni Jean-Michel Marin CEREMADE Université Paris Dauphine Kerrie L. Mengersen QUT Brisbane

More information

An Introduction to mixture models

An Introduction to mixture models An Introduction to mixture models by Franck Picard Research Report No. 7 March 2007 Statistics for Systems Biology Group Jouy-en-Josas/Paris/Evry, France http://genome.jouy.inra.fr/ssb/ An introduction

More information

Estimation for nonparametric mixture models

Estimation for nonparametric mixture models Estimation for nonparametric mixture models David Hunter Penn State University Research supported by NSF Grant SES 0518772 Joint work with Didier Chauveau (University of Orléans, France), Tatiana Benaglia

More information

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach

Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Statistical Methods for Handling Incomplete Data Chapter 2: Likelihood-based approach Jae-Kwang Kim Department of Statistics, Iowa State University Outline 1 Introduction 2 Observed likelihood 3 Mean Score

More information

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences

Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Genet. Sel. Evol. 33 001) 443 45 443 INRA, EDP Sciences, 001 Alternative implementations of Monte Carlo EM algorithms for likelihood inferences Louis Alberto GARCÍA-CORTÉS a, Daniel SORENSEN b, Note a

More information

STATS 306B: Unsupervised Learning Spring Lecture 2 April 2

STATS 306B: Unsupervised Learning Spring Lecture 2 April 2 STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised

More information

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION

COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:

More information

NONPARAMETRIC BAYESIAN INFERENCE ON PLANAR SHAPES

NONPARAMETRIC BAYESIAN INFERENCE ON PLANAR SHAPES NONPARAMETRIC BAYESIAN INFERENCE ON PLANAR SHAPES Author: Abhishek Bhattacharya Coauthor: David Dunson Department of Statistical Science, Duke University 7 th Workshop on Bayesian Nonparametrics Collegio

More information

The Polya-Gamma Gibbs Sampler for Bayesian. Logistic Regression is Uniformly Ergodic

The Polya-Gamma Gibbs Sampler for Bayesian. Logistic Regression is Uniformly Ergodic he Polya-Gamma Gibbs Sampler for Bayesian Logistic Regression is Uniformly Ergodic Hee Min Choi and James P. Hobert Department of Statistics University of Florida August 013 Abstract One of the most widely

More information

Inferring biological dynamics Iterated filtering (IF)

Inferring biological dynamics Iterated filtering (IF) Inferring biological dynamics 101 3. Iterated filtering (IF) IF originated in 2006 [6]. For plug-and-play likelihood-based inference on POMP models, there are not many alternatives. Directly estimating

More information

Mixtures of Rasch Models

Mixtures of Rasch Models Mixtures of Rasch Models Hannah Frick, Friedrich Leisch, Achim Zeileis, Carolin Strobl http://www.uibk.ac.at/statistics/ Introduction Rasch model for measuring latent traits Model assumption: Item parameters

More information

NONPARAMETRIC MIXTURE OF REGRESSION MODELS

NONPARAMETRIC MIXTURE OF REGRESSION MODELS NONPARAMETRIC MIXTURE OF REGRESSION MODELS Huang, M., & Li, R. The Pennsylvania State University Technical Report Series #09-93 College of Health and Human Development The Pennsylvania State University

More information

CSE446: Clustering and EM Spring 2017

CSE446: Clustering and EM Spring 2017 CSE446: Clustering and EM Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin, Dan Klein, and Luke Zettlemoyer Clustering systems: Unsupervised learning Clustering Detect patterns in unlabeled

More information

Long-Run Covariability

Long-Run Covariability Long-Run Covariability Ulrich K. Müller and Mark W. Watson Princeton University October 2016 Motivation Study the long-run covariability/relationship between economic variables great ratios, long-run Phillips

More information

COM336: Neural Computing

COM336: Neural Computing COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk

More information

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference

Preliminaries The bootstrap Bias reduction Hypothesis tests Regression Confidence intervals Time series Final remark. Bootstrap inference 1 / 171 Bootstrap inference Francisco Cribari-Neto Departamento de Estatística Universidade Federal de Pernambuco Recife / PE, Brazil email: cribari@gmail.com October 2013 2 / 171 Unpaid advertisement

More information

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level

Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level Streamlining Missing Data Analysis by Aggregating Multiple Imputations at the Data Level A Monte Carlo Simulation to Test the Tenability of the SuperMatrix Approach Kyle M Lang Quantitative Psychology

More information

Bagging During Markov Chain Monte Carlo for Smoother Predictions

Bagging During Markov Chain Monte Carlo for Smoother Predictions Bagging During Markov Chain Monte Carlo for Smoother Predictions Herbert K. H. Lee University of California, Santa Cruz Abstract: Making good predictions from noisy data is a challenging problem. Methods

More information

Computer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization

Computer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization Prof. Daniel Cremers 6. Mixture Models and Expectation-Maximization Motivation Often the introduction of latent (unobserved) random variables into a model can help to express complex (marginal) distributions

More information

A Note on Lenk s Correction of the Harmonic Mean Estimator

A Note on Lenk s Correction of the Harmonic Mean Estimator Central European Journal of Economic Modelling and Econometrics Note on Lenk s Correction of the Harmonic Mean Estimator nna Pajor, Jacek Osiewalski Submitted: 5.2.203, ccepted: 30.0.204 bstract The paper

More information

Data Preprocessing. Cluster Similarity

Data Preprocessing. Cluster Similarity 1 Cluster Similarity Similarity is most often measured with the help of a distance function. The smaller the distance, the more similar the data objects (points). A function d: M M R is a distance on M

More information

Semi-Parametric Importance Sampling for Rare-event probability Estimation

Semi-Parametric Importance Sampling for Rare-event probability Estimation Semi-Parametric Importance Sampling for Rare-event probability Estimation Z. I. Botev and P. L Ecuyer IMACS Seminar 2011 Borovets, Bulgaria Semi-Parametric Importance Sampling for Rare-event probability

More information

Theory of Maximum Likelihood Estimation. Konstantin Kashin

Theory of Maximum Likelihood Estimation. Konstantin Kashin Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical

More information

Nonparametric Modal Regression

Nonparametric Modal Regression Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric

More information

Robust Monte Carlo Methods for Sequential Planning and Decision Making

Robust Monte Carlo Methods for Sequential Planning and Decision Making Robust Monte Carlo Methods for Sequential Planning and Decision Making Sue Zheng, Jason Pacheco, & John Fisher Sensing, Learning, & Inference Group Computer Science & Artificial Intelligence Laboratory

More information

Bayesian inference for factor scores

Bayesian inference for factor scores Bayesian inference for factor scores Murray Aitkin and Irit Aitkin School of Mathematics and Statistics University of Newcastle UK October, 3 Abstract Bayesian inference for the parameters of the factor

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

K-ANTITHETIC VARIATES IN MONTE CARLO SIMULATION ISSN k-antithetic Variates in Monte Carlo Simulation Abdelaziz Nasroallah, pp.

K-ANTITHETIC VARIATES IN MONTE CARLO SIMULATION ISSN k-antithetic Variates in Monte Carlo Simulation Abdelaziz Nasroallah, pp. K-ANTITHETIC VARIATES IN MONTE CARLO SIMULATION ABDELAZIZ NASROALLAH Abstract. Standard Monte Carlo simulation needs prohibitive time to achieve reasonable estimations. for untractable integrals (i.e.

More information