Why MixTRV? Motivations for a new Matlab package for Gaussian mixture inference, statistical hypothesis testing & model-based classification/clustering

Alexandre Lourme
Institut de Mathématiques de Bordeaux (IMB), UMR 5251, Université Bordeaux 1
Faculté d'Economie Gestion & AES, Université Montesquieu - Bordeaux IV

Introduction

Nowadays, modeling heterogeneous continuous data with a finite mixture of Gaussians is common, and software dedicated to this task has been proliferating for some years. The oldest programs, such as SNOB (Wallace and Boulton, 1968) or EMMIX (McLachlan et al., 1999), are gradually being replaced by a new generation of Matlab or R packages dedicated to Gaussian modeling within a wider scope: discriminant analysis, cluster analysis, high-dimensional data processing, variable selection, statistical hypothesis testing, etc. These recent packages differ not only in their purpose but also in the inferential method (maximum likelihood, maximum completed likelihood, minimum message length, Bayesian inference, etc.), in the parsimonious models they consider, in the embedded model selection criteria, etc. MixTRV is a Matlab package for classification, clustering and statistical hypothesis testing; it infers twenty-two parsimonious Gaussian mixture models by maximum likelihood in supervised, semi-supervised or unsupervised contexts. MixTRV is close to recent R packages such as bgmm (Biecek et al., 2012), mclust (Fraley et al., 2012), pgmm (McNicholas et al., 2011), mixmod (Lebret et al., n.d.) or upclass (Russell et al., n.d.). Nevertheless, MixTRV differs from the latter packages by several stability properties of its underlying parsimonious models, properties that matter both for representing and for interpreting the inferred model.

Several model families

When K Gaussians fit a sample of d-dimensional heterogeneous data, the Gaussian k (k ∈ {1,...,K}) is characterized by a center µ_k ∈ R^d, a covariance matrix Σ_k ∈ R^{d×d} (symmetric positive definite) and a weight π_k > 0 (with π_1 + ... + π_K = 1). In order to reduce the squared error of the inferred model, it is usual to consider a family of parsimonious models combining constraints on the previous parameters. In this spirit, each of bgmm, mclust, pgmm, mixmod, upclass and MixTRV is characterized by its own collection of models, defined by specific constraints on π_k, µ_k, Σ_k (k = 1,...,K). Let us review the model families of these packages so as to highlight, further on, the advantages of MixTRV.

mclust. Each covariance matrix Σ_k is symmetric positive definite, so its eigenvalues are positive real numbers α_{k,1},...,α_{k,d} > 0 and Σ_k is diagonalizable in an orthonormal basis of eigenvectors. So, writing Λ_k = diag(α_{k,1},...,α_{k,d}), there exists an orthogonal matrix D_k ∈ R^{d×d} such that:

Σ_k = D_k Λ_k D_k'.    (1)

The columns of D_k form an orthonormal basis of R^d. The canonical directions of this basis on the one hand and the principal axes of the ellipsoidal iso-density contours of the Gaussian k on the other hand are pairwise parallel. So, the matrix D_k characterizes the orientation of the Gaussian k.
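As a concrete illustration of decomposition (1), here is a minimal Matlab sketch (an arbitrary illustrative covariance matrix, not code taken from MixTRV) that recovers an orientation matrix D_k and the eigenvalue matrix Λ_k from Σ_k by eigendecomposition, with the eigenvalues sorted in decreasing order:

% Illustrative covariance matrix (any symmetric positive definite matrix works)
Sigma = [4.0 1.2; 1.2 1.0];

% Eigendecomposition: Sigma = D * Lambda * D'
[D, Lambda] = eig(Sigma);

% Sort the eigenvalues in decreasing order and reorder the eigenvectors accordingly
[alpha, idx] = sort(diag(Lambda), 'descend');
D      = D(:, idx);          % orientation (orthogonal matrix)
Lambda = diag(alpha);        % diagonal matrix of eigenvalues

% Check the reconstruction of decomposition (1)
reconstruction_error = norm(Sigma - D * Lambda * D')   % should be close to 0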

The models of mclust described in Fraley et al. (2012) propose ten covariance structures based on the following decomposition:

Σ_k = α_{k,1} D_k L_k D_k'    (2)

where α_{k,1} denotes the largest eigenvalue of Σ_k and L_k = Λ_k / α_{k,1}. These models are named EII, VII, EEI, VEI, EVI, VVI, EEE, EEV, VEV, VVV. The letter V or E in first position indicates that the α_{k,1} (k = 1,...,K) are variable (V) or equal (E). V, E or I stands in second position when the matrices L_k (k = 1,...,K) are assumed to be variable (V), equal to each other (E), or all equal to the identity matrix (I). The letter V, E or I in third position means that the orthogonal matrices D_k (k = 1,...,K) are variable (V), equal (E) or that each of them is a permutation matrix (I); in the latter case the permutation matrices are homogeneous (resp. heterogeneous) when the matrices L_k (k = 1,...,K) are supposed to be equal (resp. variable). As regards the other parameters, mclust considers the weights π_k and the centers µ_k (k = 1,...,K) as free. So the mclust model family consists of the ten previous covariance structures.

mixmod. The mixmod software described in Biernacki et al. (2006) considers fourteen parsimonious covariance structures. These models are based on the following decomposition, derived from (1):

Σ_k = λ_k D_k A_k D_k'    (3)

where λ_k = |Σ_k|^{1/d} is the volume of the component k and A_k = Λ_k / λ_k its shape. Decompositions (2) and (3) are close since each of them is obtained from (1) by normalizing the matrix Λ_k; they only differ in the normalizing factor, which is the volume λ_k of the component k in (3) and the largest eigenvalue α_{k,1} of Σ_k in (2). The fourteen covariance models of Biernacki et al. (2006) combine constraints on the volume (λ_k), the shape (A_k) and the orientation (D_k) of the components, and are divided into three families. The spherical family includes two covariance models named λI and λ_kI; both assume that the matrices A_k and D_k are equal to the identity matrix, and the first (resp. the second) model considers the volumes λ_1,...,λ_K as homogeneous (resp. heterogeneous). The diagonal family consists of four covariance models named λB, λ_kB, λB_k, λ_kB_k, depending on whether (i) the volumes are homogeneous (λ) or free (λ_k) and (ii) the matrices D_kA_kD_k' (k = 1,...,K) are diagonal and equal (B) or just diagonal (B_k). The general family is composed of eight covariance models obtained by assuming the volumes (λ/λ_k), the shapes (A/A_k) and the orientations (D/D_k) to be homogeneous or heterogeneous; so λDA_kD' means that both volumes and orientations are homogeneous whereas shapes are free. mixmod considers no parsimonious hypotheses about the centers µ_k (k = 1,...,K), but the weights are either free (π_k) or equal (π). Combining these two assumptions about the weights with the fourteen covariance structures leads to the wide mixmod model family made of twenty-eight parsimonious models. The standard homoscedastic model with free weights, noted π_kλDAD', is one of them.
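The volume/shape/orientation reading of decompositions (2) and (3) can be checked numerically; the following minimal Matlab sketch (illustrative values, not mixmod or mclust code) extracts λ_k, A_k, D_k and L_k from a single covariance matrix and verifies both reconstructions:

% Illustrative covariance matrix of one component
Sigma = [4.0 1.2; 1.2 1.0];
d = size(Sigma, 1);

% Orientation D and eigenvalues alpha, sorted in decreasing order
[D, Lambda] = eig(Sigma);
[alpha, idx] = sort(diag(Lambda), 'descend');
D = D(:, idx);

% Volume lambda = |Sigma|^(1/d) and shape A = diag(alpha)/lambda (so det(A) = 1)
lambda = det(Sigma)^(1/d);
A = diag(alpha) / lambda;

% L of decomposition (2) normalizes by the largest eigenvalue instead
L = diag(alpha) / alpha(1);

% Both reconstructions recover Sigma (errors close to 0)
err3 = norm(Sigma - lambda   * D * A * D')   % decomposition (3)
err2 = norm(Sigma - alpha(1) * D * L * D')   % decomposition (2)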

pgmm. The models of pgmm are mixtures of factor analyzers (see McLachlan and Peel, 2000, Chap. 8). Such models are often used to fit high-dimensional data (see Bouveyron and Brunet-Saumard, 2014), but they are suitable for modeling Gaussian data even outside the specific context of high dimension. For a common given dimension q (q ≥ 1) of the latent spaces (see McLachlan and Peel, 2000), pgmm proposes twelve covariance structures gathered in a family called EPGMM and described in McNicholas and Murphy (2010). These structures rest on the hypothesis that each matrix Σ_k can be decomposed according to:

Σ_k = B_k B_k' + ω_k Δ_k    (4)

where B_k is a matrix with d rows and q columns (q independent of k), ω_k is a scalar and Δ_k is a diagonal matrix with determinant 1. When q is small compared to d, the columns of B_k define q directions in R^d which are close to the factorial axes of the Gaussian k; the term ω_kΔ_k is intended to capture the residual variability of the Gaussian k in the other directions of R^d. Each EPGMM model name has four letters among C (for constrained) and U (for unconstrained). In position 1 (resp. 2, 3), the letter C or U indicates whether the parameter B_k (resp. Δ_k, ω_k) is homogeneous or free with respect to k. In fourth position, the letter C or U indicates whether the matrices Δ_k are equal or not equal to the identity matrix. When the matrices Δ_k are assumed to be equal to the identity matrix, they are necessarily homogeneous with respect to k; so, if a model name ends with C then its second letter is also C. Combining the previous hypotheses leads to twelve covariance models. The model called CUCU, for example, supposes that the matrices B_k (k = 1,...,K) are homogeneous and the coefficients ω_k (k = 1,...,K) also, but that the matrices Δ_k (k = 1,...,K) are free.

upclass. Generally, in a model-based discriminant analysis context, model inference involves only the labelled data whereas the unlabelled data are used only in the classification step. The upclass software makes use of the information held in the unlabelled data at the inference step, by estimating a model on both labelled and unlabelled data (see Russell et al., n.d.). The parsimonious models of upclass are the same as those of mclust, so they inherit both their advantages and their drawbacks.

bgmm. Unlike the previously reviewed packages, bgmm (see Biecek et al., 2012) allows parsimonious hypotheses to be combined on both the covariance matrices Σ_k and the centers µ_k (k = 1,...,K). Each bgmm model name consists of four signs. The first (resp. the second) sign is a letter, either E or D, depending on whether the centers µ_k (resp. the covariance matrices Σ_k) are equal or different. The third sign is E when the d variances Σ_k(i,i) (i = 1,...,d) of each component and its d(d-1)/2 covariances Σ_k(i,j) (1 ≤ i < j ≤ d) are homogeneous, and D otherwise. The last sign is 0 if the d(d-1)/2 covariances Σ_k(i,j) (1 ≤ i < j ≤ d) of each component are supposed to be null, and D otherwise. For example, the model DDD0 assumes that the centers and the variances are free whereas the component covariances are null.

mixtrv. As each covariance matrix Σ_k is symmetric positive definite, it can be decomposed according to:

Σ_k = T_k R_k T_k    (5)

where T_k is the diagonal matrix of component standard deviations and R_k the associated correlation matrix. So T_k(i,j) = √Σ_k(i,j) if i = j and 0 otherwise, and R_k = T_k^{-1} Σ_k T_k^{-1}. Moreover the standardized mean V_k is defined by:

V_k = T_k^{-1} µ_k.    (6)

So the matrix T_k appears to be a scale parameter and T_k^{-1} a normalizing parameter; then V_k and R_k are respectively the center and the covariance matrix of the k-th normalized Gaussian component.
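Decompositions (5) and (6) are straightforward to compute; here is a minimal Matlab sketch (illustrative values, not the MixTRV implementation) that extracts T_k, R_k and V_k from a component's parameters and checks that they recover Σ_k and µ_k:

% Illustrative parameters of one Gaussian component
mu    = [10; -2];
Sigma = [4.0 1.2; 1.2 1.0];

% T: diagonal matrix of standard deviations;  R: correlation matrix (eq. (5))
T = diag(sqrt(diag(Sigma)));
R = T \ Sigma / T;            % R = inv(T) * Sigma * inv(T)

% V: standardized mean (eq. (6))
V = T \ mu;                   % V = inv(T) * mu

% The decomposition is canonical: T, R and V recover Sigma and mu exactly
err_Sigma = norm(Sigma - T * R * T)   % close to 0
err_mu    = norm(mu - T * V)          % close to 0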
MixTRV offers twenty-two parsimonious models. Each of them has a four-letter name. H or F stands in first, third and fourth position depending on whether the weights π_k, the correlation matrices R_k and the standardized mean vectors V_k are homogeneous (H) or free (F) with respect to k, and F, P or H in second position indicates that the standard deviation matrices T_k (k = 1,...,K) are free (F), proportional (P) or homogeneous (H).

For example, the model FPHF of MixTRV assumes that the weights and the standardized means are free across the components whereas the standard deviations are proportional and the correlations are homogeneous. One can check that the covariance structures of FPHF in MixTRV and of πλ_kDAD' in mixmod are the same. More generally, it often happens that several models belonging to different packages among mclust, mixmod, pgmm, bgmm and MixTRV share identical covariance structures; the covariance structures common to these packages are summarized in Table 1, Column 3. Like bgmm, MixTRV makes it possible to consider parsimonious hypotheses on the centers: when the vectors V_k and the matrices T_k are simultaneously homogeneous with respect to k, the centers µ_k (k = 1,...,K) are equal. Unlike (2), (3) and (4), the decomposition (5) is canonical; this ensures the existence and the uniqueness of each parameter V_k, R_k and T_k. Although it is not the main virtue of MixTRV, this advantage is convenient for inferring V_k, R_k, T_k and helpful for interpreting these parameters.

Stability properties

The five properties described below separate MixTRV from the other packages mclust, mixmod, pgmm, upclass and bgmm. Indeed, the MixTRV model family is the only one whose parsimonious models all satisfy each of the following properties.

Property 1 (Model Structure Scale Invariance). A random vector X ∈ R^d with a parametric distribution is scale invariant if the models of SX and of X are subject to the same constraints, whatever the diagonal positive definite matrix S ∈ R^{d×d}.

Illustrations. If X is distributed as a mixture of K Gaussians with homogeneous correlation matrices, R_1 = ... = R_K, then the component correlation matrices of SX are themselves homogeneous; so the model FFHF of MixTRV is scale invariant. If X is distributed as a mixture of K Gaussians with equal variances within each component, Σ_k(1,1) = ... = Σ_k(d,d), the component variances of SX are generally not equal; so the models DDED in bgmm, VII in mclust, π_kλI in mixmod, etc., are not scale invariant. Column 4 of Table 1 summarizes which models of bgmm, pgmm, mclust, mixmod and MixTRV satisfy (or not) Property 1: MixTRV is the only package all of whose models are scale invariant.

Here is one reason for discarding models that do not satisfy Property 1: they often lead to unsuitable graphical representations. For example, Fig. 1a depicts two Gaussian isodensity contours of a bivariate mixmod model π_kλ_kDA_kD', within orthonormal axes. Fig. 1b shows that when the x-axis scale is changed, the main axes of the ellipses are no longer parallel, whereas the two Gaussians represented still have the same orientation.

Fig. 1: The evidence of equal orientations depends on the axis scaling. (a) Orthonormal axes; (b) changed x-axis scale.
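The two illustrations of Property 1 given above can be checked numerically; the following minimal Matlab sketch (illustrative matrices and a hypothetical rescaling) shows that a component correlation matrix is unchanged by a diagonal rescaling whereas equal within-component variances are not preserved:

% Covariance of one component with equal variances on the diagonal
Sigma = [1.0 0.3; 0.3 1.0];
S     = diag([10 1]);                    % rescaling, e.g. first variable minutes -> other units

Sigma_rescaled = S * Sigma * S;          % covariance of S*X for this component

% Correlation matrices before and after rescaling are identical
T  = diag(sqrt(diag(Sigma)));            R  = T  \ Sigma          / T;
Ts = diag(sqrt(diag(Sigma_rescaled)));   Rs = Ts \ Sigma_rescaled / Ts;
norm(R - Rs)                             % close to 0: the constraint R_1 = ... = R_K survives

% Equal variances, however, are destroyed by the rescaling
diag(Sigma)'                             % [1 1]   : equal
diag(Sigma_rescaled)'                    % [100 1] : no longer equal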

Property 2 (Model Rank Scale Invariance). Γ denoting a likelihood-based model selection criterion among AIC (Akaike, 1974), BIC (Schwarz, 1978) and ICL (Biernacki et al., 2000), a set of models is Γ-scale invariant if rescaling the data does not change the model ranks related to Γ. For a model family to be AIC/BIC/ICL-scale invariant, each model of the family must satisfy Property 1. As each of bgmm, pgmm, mclust and mixmod includes at least one non-scale-invariant structure, these four packages are neither AIC- nor BIC- nor ICL-scale invariant. On the contrary, it can be proved that the MixTRV model collection is AIC-, BIC- and ICL-scale invariant (see Biernacki and Lourme, in press). Column 5 of Table 1 recalls that MixTRV is the only model collection which satisfies Property 2.

Illustration. Let us consider the following experimental design in order to illustrate the importance of Property 2. Each Gaussian mixture model of bgmm, pgmm, mclust, mixmod and MixTRV is fitted to the famous Old Faithful geyser eruptions (see Azzalini and Bowman, 1990), with K = 2 classes interpreted as short and long eruptions, and the four best models according to BIC within each family are recorded. This leads to Tables 2a, 2b and 2c, depending on whether the two variables Duration and Waiting are measured in minutes × minutes, in seconds × minutes, or both standardized (divided by their standard deviation). One can observe from Table 2 that for each of bgmm, pgmm, mclust and mixmod, the list of the four best models according to BIC depends on the units of the data, whereas the list remains unchanged for MixTRV.

When a model family does not satisfy Property 2, the model selection procedure depends on the measurement units and the constraints of the selected model cannot then be interpreted as a property of the data. In the previous example about the Old Faithful eruptions, the best mixmod model for BIC is π_kλ_kDA_kD' when Duration and Waiting are both measured in minutes, whereas π_kλ_kD_kAD_k' is preferred when both variables are standardized; so homogeneous orientation of the distributions is not an intrinsic property of short and long Old Faithful eruptions. On the contrary, the MixTRV model selected by BIC is FFHF for any measurement units, which supports that homogeneous correlation of Waiting and Duration among short and long eruptions is an intrinsic property of the Old Faithful eruptions.
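One way to see why Property 1 for every model of a family entails Property 2 is the following outline (a sketch of the argument in the spirit of Biernacki and Lourme, not a quotation of it). Writing the rescaled sample as Sx_1,...,Sx_n and ν_m for the number of free parameters of a scale-invariant model m:

% The parameter space of a scale-invariant model m is mapped onto itself by the
% rescaling, so only the Jacobian of y = Sx enters the maximized log-likelihood:
\ell^{\max}_m(Sx_1,\dots,Sx_n) = \ell^{\max}_m(x_1,\dots,x_n) - n\,\log\det S
% The number of free parameters \nu_m is unchanged, hence
\mathrm{BIC}_m(SX) = -2\,\ell^{\max}_m(SX) + \nu_m \log n = \mathrm{BIC}_m(X) + 2n\,\log\det S

The shift 2n log det S is the same for every model of the family, so the BIC differences between models, and therefore the BIC ranks (and likewise for AIC and ICL), do not depend on the units.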

Property 3 (Consistency of Canonical Projections). A random vector X ∈ R^d with a parametric distribution is consistent by projection onto the canonical planes if any random vector X' ∈ R^2 consisting of two distinct components of X is subject to the same constraints as X.

Illustrations. If X = (X_1,...,X_d) is distributed as a mixture of K Gaussians with homogeneous standardized means, V_1 = ... = V_K, then the component standardized mean vectors of X' = (X_i, X_j) are themselves homogeneous whatever the couple of distinct indexes (i, j); so the structure FFFH of MixTRV is consistent by projection onto the canonical planes. But if X = (X_1,...,X_d) is distributed as a mixture of K Gaussians with homogeneous volumes, λ_1 = ... = λ_K, the component volumes of X' = (X_1, X_2), for example, are generally not equal; so the mixmod model π_kλD_kA_kD_k' is not consistent by projection onto the canonical planes. Column 6 of Table 1 displays the bgmm, pgmm, mclust, mixmod and MixTRV models satisfying (or not) Property 3: MixTRV and bgmm are the only packages all of whose parsimonious models are consistent by projection onto the canonical planes.

Models which do not satisfy Property 3 are not easy to represent in dimension 2. For example, Fig. 2 depicts two Gaussian component isodensity contours related to a trivariate mixmod model π_kλD_kAD_k': the structure consisting of homogeneous volumes and shapes but free orientations does not persist by projection onto the x-y canonical plane.

Fig. 2: Unsustainability of the structure "homogeneous shapes, homogeneous volumes, free orientations" in the canonical planes.
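The volume illustration above is easy to reproduce; the following minimal Matlab sketch (illustrative diagonal covariance matrices) exhibits two trivariate components with equal volumes whose margins on the (X_1, X_2) canonical plane have different volumes:

% Two diagonal covariance matrices with the same volume |Sigma|^(1/3) = 1
Sigma1 = diag([1 1 1]);
Sigma2 = diag([4 0.5 0.5]);
vol = @(S) det(S)^(1/size(S,1));
[vol(Sigma1), vol(Sigma2)]            % [1 1]          : homogeneous volumes in R^3

% Margins on the canonical plane (X1, X2): keep rows/columns 1 and 2
M1 = Sigma1([1 2], [1 2]);
M2 = Sigma2([1 2], [1 2]);
[vol(M1), vol(M2)]                    % [1 1.4142...]  : volumes no longer equal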

Property 4 (Characterization of a Model by Bivariate Margins). A random vector X ∈ R^d with a parametric distribution is characterizable by its bivariate margins if its parameter necessarily satisfies any constraint complied with by the parameters of its projections onto the canonical planes. (This property is the converse of Property 3.)

Illustrations. Let X = (X_1,...,X_d) be a random vector in R^d distributed as a mixture of K Gaussians. If the component standard deviation matrices of (X_i, X_j) are proportional whatever the couple of distinct indexes (i, j), then the component standard deviation matrices of X are themselves proportional; so the model FPFF of MixTRV is characterizable by its bivariate margins. On the contrary, it is possible that every couple of margins (X_i, X_j) is distributed according to the model DDED of bgmm whereas X is not distributed according to DDED; so the bgmm model DDED is not characterizable by its bivariate margins. Column 7 of Table 1 summarizes which models of bgmm, pgmm, mclust, mixmod and MixTRV satisfy (or not) Property 4: MixTRV is the only package all of whose models are characterizable by their bivariate margins.

Property 5 (Likelihood Ratio Test Scale Invariance). A model collection is scale invariant as regards the Likelihood Ratio Test (LRT) if changing the units of the data leaves the likelihood ratio of any couple of nested models unchanged.

Illustration. The mclust model family is not scale invariant as regards the LRT. For example, the ratio of maximized likelihoods of the mclust models VEV and VVV (the latter being more complex than the former by three degrees of freedom), inferred on the turtle data of Jolicoeur and Mosimann (1960), is not the same when the three variables carapace length, width and height are all measured in cm as when carapace length and width are standardized (divided by their standard deviation). This means that, for a given significance level of the LRT, the parameter L_k of (2), which relates to the shape of the male and female turtle distributions, will be considered as homogeneous or free depending on the units of the data. So homogeneous distribution shapes is not an intrinsic property of male and female turtles, since it also depends on the measurement units.

Actually, none of the mclust, mixmod, pgmm and bgmm model families satisfies Property 5: within each of these collections there exists a couple of nested models whose likelihood ratio varies according to the units of the data. On the contrary, the MixTRV model set is LRT-scale invariant: for any couple of nested MixTRV models, the likelihood ratio remains the same whatever the units of the data. Table 1, Column 8 recalls that MixTRV is the only model family which satisfies Property 5.
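A sketch of why the likelihood ratio becomes unit-free when both nested models are scale invariant (an outline of the argument, using the same rescaling Y = SX as above):

% Nested models m_0 \subset m_1, rescaled sample Sx_1,...,Sx_n. When m_i is scale
% invariant, its maximized likelihood only picks up the Jacobian factor:
\sup_{\theta \in m_i} L(\theta; Sx_1,\dots,Sx_n) = (\det S)^{-n} \sup_{\theta \in m_i} L(\theta; x_1,\dots,x_n), \quad i = 0, 1
% The common factor cancels in the likelihood ratio:
\lambda(SX) = \frac{\sup_{m_0} L(\,\cdot\,; SX)}{\sup_{m_1} L(\,\cdot\,; SX)} = \frac{\sup_{m_0} L(\,\cdot\,; X)}{\sup_{m_1} L(\,\cdot\,; X)} = \lambda(X)

When one of the two constrained parameter sets is not mapped onto itself by the rescaling (for instance the VEV structure of mclust), its supremum does not transform by the factor (det S)^{-n} alone, and the ratio, hence the test decision, may depend on the measurement units.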

References

Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716-723.

Azzalini, A. and Bowman, A. W. (1990). A look at some data on the Old Faithful geyser. Applied Statistics, 39(3):357-365.

Biecek, P., Szczurek, E., Vingron, M., and Tiuryn, J. (2012). The R package bgmm: Mixture modeling with uncertain knowledge. Journal of Statistical Software, 47(3).

Biernacki, C., Celeux, G., and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):719-725.

Biernacki, C., Celeux, G., Govaert, G., and Langrognet, F. (2006). Model-based cluster and discriminant analysis with the mixmod software. Computational Statistics & Data Analysis, 51(2):587-600.

Biernacki, C. and Lourme, A. (in press). Stable and visualizable Gaussian parsimonious clustering models. Statistics and Computing.

Bouveyron, C. and Brunet-Saumard, C. (2014). Model-based clustering of high-dimensional data: A review. Computational Statistics & Data Analysis, 71:52-78.

Fraley, C., Raftery, A. E., Murphy, T. B., and Scrucca, L. (2012). mclust version 4 for R: Normal mixture modeling for model-based clustering, classification, and density estimation. Technical Report 597, Department of Statistics, University of Washington.

Jolicoeur, P. and Mosimann, J. E. (1960). Size and shape variation in the painted turtle. A principal component analysis. Growth, 24:339-354.

Lebret, R., Iovleff, S., Langrognet, F., Biernacki, C., Celeux, G., and Govaert, G. (n.d.). Rmixmod: The R package of the model-based unsupervised, supervised and semi-supervised classification mixmod library. Journal of Statistical Software.

McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley Series in Probability and Statistics. Wiley-Interscience.

McLachlan, G. J., Peel, D., Basford, K. E., and Adams, P. (1999). The EMMIX software for the fitting of mixtures of normal and t-components. Journal of Statistical Software, 4(2).

McNicholas, P. D. and Murphy, T. B. (2010). Model-based clustering of microarray expression data via latent Gaussian mixture models. Bioinformatics, 26(21):2705-2712.

McNicholas, P. D., Murphy, T. B., Jampani, K., McDaid, A., and Banks, L. (2011). pgmm version 1.0 for R: Model-based clustering and classification via latent Gaussian mixture models. Technical report, Department of Mathematics and Statistics, University of Guelph.

Russell, N., Cribbin, L., and Murphy, T. B. (n.d.). upclass: An R package for updating model-based classification rules.

Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2):461-464.

Wallace, C. S. and Boulton, D. M. (1968). An information measure for classification. The Computer Journal, 11(2):185-194.

Table 1: The parsimonious models of bgmm, pgmm, mixmod, mclust and MixTRV: common covariance structures, and summary of which models/families satisfy (+) or not (-) the stability properties 1 to 5. The covariance structures considered are:
bgmm: [D/E]DDD, [D/E]DD0, [D/E]DED, [D/E]DE0, [D/E]EDD, [D/E]ED0, [D/E]EED, [D/E]EE0;
pgmm: UUUU, UUCU, CUUU, CUCU, UCUU, UCCU, CCUU, CCCU, UCUC, UCCC, CCUC, CCCC;
mclust: VVV, VEV, EEV, EEE, VVI, EVI, VEI, EEI, VII, EII;
mixmod: [π_k/π]λ_kD_kA_kD_k', [π_k/π]λD_kA_kD_k', [π_k/π]λ_kD_kAD_k', [π_k/π]λD_kAD_k', [π_k/π]λ_kDA_kD', [π_k/π]λDA_kD', [π_k/π]λ_kDAD', [π_k/π]λDAD', [π_k/π]λ_kB_k, [π_k/π]λB_k, [π_k/π]λ_kB, [π_k/π]λB, [π_k/π]λ_kI, [π_k/π]λI;
mixtrv: [F/H]FFF, [F/H]PFF, [F/H]HFF, [F/H]FHF, [F/H]PHF, [F/H]HHF, [F/H]FFH, [F/H]PFH, [F/H]HFH, [F/H]FHH, [F/H]PHH.

Table 2: The four best models according to BIC within each family (bgmm, pgmm, mclust, mixmod, MixTRV), inferred on the Old Faithful data (K = 2), when the measurement units of Duration × Waiting vary: (a) min × min (original units), (b) sec × min, (c) standardized × standardized.
