Parsimonious Gaussian Mixture Models


1 Parsimonious Gaussian Mixture Models Brendan Murphy, Department of Statistics, Trinity College Dublin, Ireland. [Map: olive oil sampling regions: East Liguria, West Liguria, Umbria, North Apulia, Coastal Sardinia, Inland Sardinia, South Apulia, Calabria, Sicily] 1

2 Outline Data From Food Authenticity Studies Italian Olive Oils Italian Wines Background: Model-Based Clustering Data Reduction and Clustering Factor Analysis Methods: Parsimonious Gaussian Mixture Models 2

3 Acknowledgements This work has been done in collaboration with: Paul McNicholas, Department of Statistics, Trinity College Dublin. This work is supported by a Science Foundation Ireland Basic Research Grant (04/BR/M0057). Much of this work was carried out while visiting CSSS. Thanks to Nema for producing the results of Slide 33 at extremely short notice. 3

4 Food Authenticity Studies An authentic food is one that is exactly what it claims to be. Important aspects of food description include: Process history Geographic origin Species or variety Purity and adulteration Food producers and consumers need to be assured of the authenticity of their food purchases. Food authenticity studies are concerned with establishing if foods are authentic or not. 4

5 Analytical Techniques Many analytical chemistry techniques are used in food authenticity studies. These include: Gas chromatography Mass spectroscopy Vibrational spectroscopic techniques (Raman, ultraviolet, mid-infrared, near-infrared and visible). These techniques have been shown to be capable of discriminating between sets of similar biological materials. Some of these techniques are slow and difficult, while others are quick and easy. 5

6 Italian Olive Oils Forina et al (1982, 1983) report the percentage composition of eight fatty acids found in the lipid fraction of 572 Italian olive oils. Fatty Acids palmitic palmitoleic stearic oleic linoleic linolenic arachidic eicosenoic These data are available in the GGobi package (Swayne et al, 2003). 6

7 Italian Olive Oils: Origin The data are used to classify the olive oil samples to their geographic origin. [Map: sampling regions: East Liguria, West Liguria, Umbria, North Apulia, Coastal Sardinia, Inland Sardinia, South Apulia, Calabria, Sicily] 7

8 Italian Wines Forina et al (1986) used twenty-eight chemical properties of Italian wines from the Asti region to classify the wines into their specific type (Barolo, Grignolino, Barbera). A subset of thirteen variables is available from the gclus library (Hurley, 2004) for R. These data are also in the UCI Machine Learning Database. Chemical Properties Alcohol Malic Acid Ash Alcalinity of Ash Magnesium Total Phenols Flavanoids Nonflavanoid Phenols Proanthocyanins Color Intensity Hue OD280/OD315 of Diluted Wines Proline 8

9 Model-based Clustering and Discriminant Analysis Model-based clustering (Banfield and Raftery, 1993; Fraley and Raftery, 2000; 2002) uses normal mixtures to develop a flexible suite of cluster analysis methods. Model-based clustering uses constraints on the group covariance matrices; the constraints use the eigenvalue decomposition of the covariance matrices to impose shape restrictions on the groups. Bensmail and Celeux (1996) developed discriminant analysis methods using the same covariance decomposition. The decomposition is of the form Σ_g = λ_g D_g A_g D_g^T, where λ_g is a constant, D_g is an orthonormal matrix, and A_g is a diagonal matrix with det(A_g) = 1. 9

10 Covariance Parameters Interpretations for the parameters are: λ_g (volume), A_g (shape), D_g (orientation). These parameters can be constrained in various ways, giving the models EII, VII, EEI, VEI, EVI, VVI, EEE, EEV, VEV, VVV. 10
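As an aside (a minimal sketch, not part of the original slides, using an arbitrary simulated covariance), the volume/shape/orientation components can be read off a plain eigendecomposition:

```r
# Sketch: recover volume (lambda), shape (A) and orientation (D) from a
# sample covariance matrix, so that Sigma = lambda * D %*% A %*% t(D).
set.seed(1)
X <- matrix(rnorm(600), ncol = 3) %*% matrix(c(2, 0, 0, 1, 1, 0, 0, 0, 0.5), 3, 3)
Sigma <- cov(X)
e <- eigen(Sigma)
lambda <- prod(e$values)^(1 / 3)            # volume: det(Sigma)^(1/p)
A <- diag(e$values / lambda)                # shape: diagonal, det(A) = 1
D <- e$vectors                              # orientation: orthonormal eigenvectors
max(abs(Sigma - lambda * D %*% A %*% t(D))) # reconstruction error, ~0
```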

11 Model-Based Clustering Let y_1, y_2, ..., y_M be unlabelled data that we wish to cluster. Model-based clustering is based on the likelihood function, f(y_1, ..., y_M | θ_1, ..., θ_G) = ∏_{m=1}^{M} Σ_{g=1}^{G} π_g f(y_m | θ_g). The likelihood function is maximized using the EM algorithm to get estimates for the parameters. Observations are clustered on the basis of estimated values for the posterior probability of component membership, P{Component g | y} = π̂_g f(y | θ̂_g) / Σ_{g'=1}^{G} π̂_{g'} f(y | θ̂_{g'}). 11
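A minimal base-R sketch of this clustering rule (not from the slides; it assumes the mixture parameters have already been estimated, and the helper names dmvnorm_basic and posterior are hypothetical):

```r
# E-step: posterior probabilities of component membership for one observation y.
dmvnorm_basic <- function(x, mu, Sigma) {
  p <- length(mu)
  R <- chol(Sigma)                                   # Sigma = t(R) %*% R
  z <- backsolve(R, x - mu, transpose = TRUE)        # solves t(R) z = x - mu
  exp(-0.5 * sum(z^2) - sum(log(diag(R))) - 0.5 * p * log(2 * pi))
}
posterior <- function(y, pi_g, mu_list, Sigma_list) {
  dens <- mapply(function(pi, mu, S) pi * dmvnorm_basic(y, mu, S),
                 pi_g, mu_list, Sigma_list)
  dens / sum(dens)                                   # P(component g | y)
}
```

Each observation is then assigned to the component with the largest posterior probability.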

12 Data Reduction, Clustering and Classification Chang (1983) showed that the principal components corresponding to the larger eigenvalues do not necessarily contain information about group structure. Data reduction and clustering separately may not be a good idea! 12

13 Model-based Clustering & Variable Selection Raftery and Dean (2004) recently developed a version of model-based clustering that includes variable selection. With their method, variables are selected in a step-wise manner. Their method involves the stages: Find the variable with the greatest evidence of clustering given the already selected variables. Remove a variable from the set of selected variables if it no longer has evidence of clustering. This is one approach that avoids the problems of data reduction followed by clustering. 13

14 Factor Analysis The factor analysis model assumes that observed values are conditionally independent given a latent variable. Specifically, X = µ + ΛU + ε, or element-wise, X_j = µ_j + λ_{j1} U_1 + λ_{j2} U_2 + ... + λ_{jq} U_q + ε_j for j = 1, 2, ..., p, where, independently, U ~ MVN(0, I), ε ~ MVN(0, Ψ) and Ψ = diag(σ²_1, σ²_2, ..., σ²_p). 14

15 Factor Analysis The resulting distribution for X is X ~ MVN(µ, ΛΛ^T + Ψ). Λ is called the loading matrix. Ψ is the noise variance. Λ is not defined uniquely: if Λ is replaced by Λ* = ΛD, where D is orthonormal, then Λ*(Λ*)^T + Ψ = ΛΛ^T + Ψ. As a result, the covariance has pq - q(q-1)/2 + p free parameters. 15
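A quick simulation check of this covariance identity (a sketch, not from the slides, with arbitrary parameter values):

```r
# Simulate from X = mu + Lambda U + eps and compare the sample covariance
# with Lambda %*% t(Lambda) + Psi.
set.seed(2)
p <- 8; q <- 2; n <- 1e5
Lambda <- matrix(rnorm(p * q), p, q)
Psi <- diag(runif(p, 0.2, 1))
U <- matrix(rnorm(n * q), n, q)
eps <- matrix(rnorm(n * p), n, p) %*% diag(sqrt(diag(Psi)))
X <- U %*% t(Lambda) + eps                         # mu = 0 here
max(abs(cov(X) - (Lambda %*% t(Lambda) + Psi)))    # small for large n
p * q - q * (q - 1) / 2 + p                        # free covariance parameters
```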

16 Probabilistic Principal Components Analysis Tipping and Bishop (1999a) developed the probabilistic principal components analysis (PPCA) model. This model is equivalent to imposing an isotropy constraint on the noise variance, Ψ = diag(ψ, ψ, ..., ψ) = ψI, in the factor analysis model. The covariance in this model has pq - q(q-1)/2 + 1 free parameters. 16

17 Mixture of Factor Analyzers Model The Mixture of Factor Analyzers (MFA) model assumes a normal mixture model. The covariance of each component has a factor analysis covariance structure. So, X ~ Σ_{g=1}^{G} π_g MVN(µ_g, Λ_g Λ_g^T + Ψ_g). This model was developed by Ghahramani and Hinton (1997) and further developed by McLachlan et al (2002, 2003). Tipping and Bishop (1999b) developed a Mixture of Probabilistic Principal Components Analysers (MPPCA) model (Ψ_g = σ²_g I). 17

18 Constraints We can constrain the Λ_g and Ψ_g parameters in the MFA model across groups to reduce the number of parameters. We also have the option of assuming that Ψ_g = ψ_g I. This leads to eight Parsimonious Gaussian Mixture Models:
ModelID  Loading        Noise          Isotropic
CCC      Constrained    Constrained    Constrained
CCU      Constrained    Constrained    Unconstrained
CUC      Constrained    Unconstrained  Constrained
CUU      Constrained    Unconstrained  Unconstrained
UCC      Unconstrained  Constrained    Constrained
UCU      Unconstrained  Constrained    Unconstrained
UUC      Unconstrained  Unconstrained  Constrained
UUU      Unconstrained  Unconstrained  Unconstrained
For example, with G = 2 and p = 8 the CCC model has 9 covariance parameters when q = 1 and 22 when q = 3.
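The covariance-parameter counts tabulated on this slide can be reproduced from the free-parameter formulas on Slides 15 and 16. The function below is a counting sketch, not code from the talk (pgmm_cov_params is a hypothetical name); it matches the surviving entries for the CCC model.

```r
# Covariance parameters for each PGMM, given G groups, p variables, q factors.
pgmm_cov_params <- function(model, G, p, q) {
  load_per_group <- p * q - q * (q - 1) / 2
  loading <- if (substr(model, 1, 1) == "C") load_per_group else G * load_per_group
  noise_groups <- if (substr(model, 2, 2) == "C") 1 else G      # Psi shared or group-specific
  noise_size <- if (substr(model, 3, 3) == "C") 1 else p        # isotropic or diagonal
  loading + noise_groups * noise_size
}
models <- c("CCC", "CCU", "CUC", "CUU", "UCC", "UCU", "UUC", "UUU")
sapply(models, pgmm_cov_params, G = 2, p = 8, q = 1)   # CCC gives 9
sapply(models, pgmm_cov_params, G = 2, p = 8, q = 3)   # CCC gives 22
```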

19 Model Fitting The Parsimonious Gaussian mixture models are fitted using the AECM algorithm (Meng and van Dyk, 1997). The ECM algorithm (Meng and Rubin, 1993) replaces the M-step by a series of conditional maximization steps. The AECM algorithm (Meng and van Dyk, 1997) allows a different specification of the complete-data for each conditional maximization step. McLachlan and Krishnan (1997) give an extensive review of the EM algorithm and its variants. McLachlan and Peel (2000) give extensive details of the fitting algorithm in the UUU case. 19

20 Three Likelihoods The likelihood function for this mixture is L = f(x | π_g, µ_g, Λ_g, Ψ_g) = ∏_{n=1}^{N} Σ_{g=1}^{G} π_g φ(x_n | µ_g, Λ_g Λ_g^T + Ψ_g). The first complete-data likelihood function is L_1 = f(x, z | π_g, µ_g, Λ_g, Ψ_g) = ∏_{n=1}^{N} ∏_{g=1}^{G} [π_g φ(x_n | µ_g, Λ_g Λ_g^T + Ψ_g)]^{z_ng}. The second complete-data likelihood function is L_2 = f(x, z, u | π_g, µ_g, Λ_g, Ψ_g) = ∏_{n=1}^{N} ∏_{g=1}^{G} [π_g φ(x_n | µ_g + Λ_g u_n, Ψ_g) φ(u_n | 0, I)]^{z_ng}. 20

21 AECM: Stage 1 (π_g and µ_g) The missing data are the component membership labels z_ng. These are replaced by their expected values, ẑ_ng ∝ π̂_g φ(x_n | µ̂_g, Λ̂_g Λ̂_g^T + Ψ̂_g). This leads to the expected complete-data log-likelihood, Q_1 = Σ_{g=1}^{G} N_g log π_g - (Np/2) log 2π - Σ_{g=1}^{G} (N_g/2) log|Λ_g Λ_g^T + Ψ_g| - Σ_{g=1}^{G} (N_g/2) tr{S_g (Λ_g Λ_g^T + Ψ_g)^{-1}}, where N_g = Σ_{n=1}^{N} ẑ_ng and S_g = (1/N_g) Σ_{n=1}^{N} ẑ_ng (x_n - µ̂_g)(x_n - µ̂_g)^T. 21

22 AECM: Stage 1 (π_g and µ_g) Maximizing Q_1 with respect to µ_g and π_g gives the estimates µ̂_g = Σ_{n=1}^{N} ẑ_ng x_n / Σ_{n=1}^{N} ẑ_ng and π̂_g = N_g / N. 22
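In code, the stage-1 updates are one line each; the sketch below is illustrative and not from the slides (zhat is assumed to be the N x G matrix of posterior membership probabilities and X the N x p data matrix):

```r
# Stage-1 AECM updates for the mixing proportions and group means.
stage1_updates <- function(X, zhat) {
  Ng  <- colSums(zhat)            # effective group sizes N_g
  pig <- Ng / nrow(X)             # pi_g = N_g / N
  mug <- t(zhat) %*% X / Ng       # G x p matrix; row g is mu_g
  list(pi = pig, mu = mug)
}
```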

23 AECM: Stage 2 (Λ_g and Ψ_g) The missing data are the z_ng and the latent variables u_n. The expected complete-data log-likelihood can be shown to be Q_2 = C + Σ_{g=1}^{G} N_g [ log π_g + (1/2) log|Ψ_g^{-1}| - (1/2) tr{Ψ_g^{-1} S_g} + tr{Ψ_g^{-1} Λ_g B̂_g S_g} - (1/2) tr{Λ_g^T Ψ_g^{-1} Λ_g (I - B̂_g Λ̂_g + B̂_g S_g B̂_g^T)} ], where B̂_g = Λ̂_g^T (Λ̂_g Λ̂_g^T + Ψ̂_g)^{-1}. Maximizing this with respect to Λ_g and Ψ_g gives new estimates for these parameters. How we do this depends on the constraints... 23

24 AECM: An Aside Graybill (1983) and Lütkepohl (1996) give matrix differential results that help with the maximization. In particular, the following useful identities: ∂ log|X| / ∂X = X^{-1}, ∂ tr(XA) / ∂X = A^T, ∂ tr(AXB) / ∂X = A^T B^T, ∂ tr(XAXB) / ∂X = B^T X^T A^T + A^T X^T B^T. 24

25 AECM: Stage 2 (Λ_g and Ψ_g Constraints) Constraints are implemented by replacing the Λ_g and Ψ_g terms with the appropriate version and then maximizing. For example, the UCU estimates are: B̂_g = Λ̂_g^T (Λ̂_g Λ̂_g^T + Ψ̂)^{-1}, Λ̂_g^new = S_g B̂_g^T (I - B̂_g Λ̂_g + B̂_g S_g B̂_g^T)^{-1}, Ψ̂^new = Σ_{g=1}^{G} π̂_g diag{S_g - Λ̂_g^new B̂_g S_g}. The CUU constrained model is more complicated than the others. 25
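A direct transcription of the UCU updates into R (a sketch, not from the talk; Lambda_list, Psi, S_list and pig are assumed to hold the current loadings, the shared diagonal noise matrix, the weighted group scatter matrices and the mixing proportions):

```r
# One stage-2 AECM update under the UCU constraints.
stage2_ucu <- function(Lambda_list, Psi, S_list, pig) {
  G <- length(Lambda_list); p <- nrow(Psi)
  Lambda_new <- vector("list", G)
  Psi_new <- matrix(0, p, p)
  for (g in seq_len(G)) {
    Lg <- Lambda_list[[g]]; Sg <- S_list[[g]]
    Bg <- t(Lg) %*% solve(Lg %*% t(Lg) + Psi)                  # B_g
    Theta <- diag(ncol(Lg)) - Bg %*% Lg + Bg %*% Sg %*% t(Bg)  # I - B_g Lambda_g + B_g S_g B_g^T
    Lambda_new[[g]] <- Sg %*% t(Bg) %*% solve(Theta)
    Psi_new <- Psi_new + pig[g] * diag(diag(Sg - Lambda_new[[g]] %*% Bg %*% Sg))
  }
  list(Lambda = Lambda_new, Psi = Psi_new)
}
```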

26 Model Selection Model selection was done using BIC, BIC = 2(Maximized Log-Likelihood) - log(n)(Number of Parameters). Three model features are chosen: constraints; components (G); latent factor dimension (q). For small problems, an exhaustive search was possible. For larger problems, a local search strategy can be used. 26
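An exhaustive BIC search over the three model features can be sketched as below (an illustration only; fit_pgmm is a placeholder for whatever routine returns the maximized log-likelihood and parameter count of a fitted model):

```r
# Pick the constraint set, G and q with the largest BIC.
bic_search <- function(X, models, Gs, qs, fit_pgmm) {
  grid <- expand.grid(model = models, G = Gs, q = qs, stringsAsFactors = FALSE)
  grid$BIC <- mapply(function(m, G, q) {
    fit <- fit_pgmm(X, m, G, q)
    2 * fit$loglik - log(nrow(X)) * fit$npar
  }, grid$model, grid$G, grid$q)
  grid[which.max(grid$BIC), ]
}
```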

27 Results: Italian Olive Oils The eight Parsimonious Gaussian mixture models (CCC, CCU, ..., UUU), G = 1, 2, ..., 14 and q = 1, 2, ..., 5 were fitted. The best model is a UCU model with (G = 7, q = 5). [Figure: BIC values over latent dimension q and number of components G] 27

28 Results: Italian Olive Oils Classification table for the best PGMM (Rand Index = 0.90, Adjusted Rand Index = 0.64): N. Apulia 24, 1; Calabria 48, 8; S. Apulia; Sicily; I. Sardinia 64, 1; C. Sardinia 33; E. Liguria 50; W. Liguria 50; Umbria 51. 28
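The quoted Rand and adjusted Rand indices can be computed from any such cross-tabulation; a base-R sketch (not from the slides, with a hypothetical function name) is:

```r
# Rand and adjusted Rand indices for two label vectors a and b.
rand_indices <- function(a, b) {
  tab <- table(a, b)
  n <- sum(tab)
  comb2 <- function(x) x * (x - 1) / 2
  sum_ij <- sum(comb2(tab))
  sum_a  <- sum(comb2(rowSums(tab)))
  sum_b  <- sum(comb2(colSums(tab)))
  total  <- comb2(n)
  rand <- (total + 2 * sum_ij - sum_a - sum_b) / total
  exp_ij <- sum_a * sum_b / total
  ari <- (sum_ij - exp_ij) / ((sum_a + sum_b) / 2 - exp_ij)
  c(Rand = rand, ARI = ari)
}
```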

29 Results: Italian Olive Oils Classification table for the best model found using mclust (Rand Index = 0.93, Adjusted Rand Index = 0.78): N. Apulia 25; Calabria 56; S. Apulia; Sicily 36; I. Sardinia 65; C. Sardinia 33; E. Liguria 50; W. Liguria 50; Umbria 51. 29

30 Results: Italian Wines Fitted eight Parsimonious Gaussian Mixture Models (CCC, CCU, ..., UUU), G = 1, 2, ..., 8 and q = 1, 2, ..., 5. The best model is a CUU model with (G = 4, q = 2). [Figure: BIC values over latent dimension q and number of components G] 30

31 Results: Italian Wines Classification table for the best PGMM (Rand Index = 0.91, Adjusted Rand Index = 0.79): Barolo 59; Grignolino; Barbera 48.

32 Results: Italian Wines (mclust) Classification table for the best model found using mclust (Rand Index = 0.80, Adjusted Rand Index = 0.48). [Table: cluster counts for Barolo, Grignolino, Barbera]

33 Results: Italian Wines (Variable Selection) Using the variable selection method of Raftery and Dean (2004) we get (Rand Index = 0.88): Barolo 51, 8; Grignolino; Barbera 1, 47. Variables Selected: Malic Acid, Proline, Flavanoids, Color Intensity. 33

34 Italian Wines (A Little Known Fact?) The wines in this study were from the years: [Table: vintage years for Barolo, Grignolino and Barbera]. Could this be affecting the results? 34

35 Results: Italian Wines (A Little Deeper) Returning to the PGMM results... and assuming that the data are in year order... [Table: cluster by wine type (Barolo, Grignolino, Barbera)] The middle two clusters are almost grouped by year. 35

36 Results: Italian Wines (A Little Deeper) Returning to the mclust results... [Table: cluster by wine type (Barolo, Grignolino, Barbera)] 36

37 Discriminant Analysis Results A quick study of the use of the PGMM for discriminant analysis (semi-supervised) found the following misclassification rates. The data were randomly split into training and test sets in a 50:50 ratio; the results for 50 random splits are below. Italian Olive Oils: model UCU, q = 4, misclassification rate 5.6% (1.6). Italian Wines: model UUU, q = 3, misclassification rate 1.8% (1.4). 37
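The split-and-refit procedure can be sketched as follows (an illustration only; pgmm_da is a placeholder for a routine that fits the semi-supervised model on the training data and returns predicted labels for the test data):

```r
# Misclassification rate over repeated random 50:50 train/test splits.
misclass_rate <- function(X, y, pgmm_da, nsplits = 50) {
  rates <- replicate(nsplits, {
    idx <- sample(nrow(X), nrow(X) %/% 2)           # training half
    pred <- pgmm_da(X[idx, ], y[idx], X[-idx, ])    # predict the held-out half
    mean(pred != y[-idx])
  })
  c(mean = mean(rates), sd = sd(rates))             # e.g. a "5.6% (1.6)" style summary
}
```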

38 Conclusions Data reduction can improve clustering and classification results. Combining variable selection and clustering can give improved results. The constrained mixture of factor analyzers model leads to a family of Parsimonious Gaussian Mixture Models. These models should be especially useful in high-dimensional problems. For fixed q, the number of parameters grows linearly in dimension. Incorporating a LASSO-type constraint on the loading matrix may give a sparse solution and effectively do variable selection. 38
