Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions
1 Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions
Parthan Kasarapu & Lloyd Allison, Monash University, Australia
September 8, 2015
2 Presentation Outline
- Mixture modelling problem
- Minimum Message Length framework
- MML-based search method
- Evaluation of the proposed method
- von Mises-Fisher mixtures and applications
3 Mixture models
Pr(x; M) = Σ_{j=1}^{K} w_j f_j(x; Θ_j)
4 Mixture models
Pr(x; M) = Σ_{j=1}^{K} w_j f_j(x; Θ_j)
Ubiquitously used for modelling multi-modal data.
Component probability distributions of various kinds:
- Poisson, Exponential, Weibull, ...
- multivariate Gaussian (Euclidean)
- multivariate von Mises-Fisher (directional)
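As an illustration of the mixture density above (a sketch, not the authors' implementation), a Gaussian mixture Pr(x; M) = Σ_j w_j f_j(x; Θ_j) can be evaluated directly as a weighted sum of component densities:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, weights, means, covs):
    """Evaluate Pr(x; M) = sum_j w_j * f_j(x; Theta_j) for Gaussian components."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

# Two-component mixture in 2-D
weights = [0.6, 0.4]
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), np.eye(2)]
p = mixture_density(np.array([0.0, 0.0]), weights, means, covs)
```

At the origin the first component dominates, so `p` is close to 0.6 times the standard bivariate normal density there.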
5 The Problem
- Estimation of the parameters of the components: the Expectation-Maximization (EM) algorithm.
- Determination of a suitable number of components: an objective function to compare two mixtures.
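A minimal sketch of the EM algorithm mentioned above, for a one-dimensional Gaussian mixture with fixed K (illustrative only; in the talk's method the usual M-step estimates are replaced by MML estimates):

```python
import numpy as np

def em_gaussian_mixture(x, k, iters=200):
    """Standard EM for a 1-D Gaussian mixture with k components."""
    # Initialise means at spread-out quantiles of the data.
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    var = np.full(k, np.var(x))
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility r[i, j] of component j for point i.
        dens = w / np.sqrt(2 * np.pi * var) * np.exp(
            -0.5 * (x[:, None] - mu) ** 2 / var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum-likelihood updates.
        n_j = r.sum(axis=0)
        w = n_j / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n_j
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_j
    return w, mu, var

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-4, 1, 500), rng.normal(4, 1, 500)])
w, mu, var = em_gaussian_mixture(x, k=2)
```

On this well-separated two-cluster sample, EM recovers means near -4 and 4 with roughly equal weights.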
6-9 Motivation
[Figure, repeated across four slides: the iris data classes (setosa, versicolor, virginica) plotted against the first two principal components.]
10 Motivation
[Figure: three panels of the iris data (setosa, versicolor, virginica) plotted against the first two principal components.]
Statistical model selection is important.
11 Model selection and inference
Several candidate models: which one to choose?
A criterion to compare models, based on the model's complexity and its goodness-of-fit.
12 Minimum Message Length (MML) Framework
Conceptualized by Wallace and Boulton (1968).
I(H&D) = I(H) + I(D|H)
where I(H) is the first part of the message and I(D|H) is the second part.
13 Minimum Message Length (MML) Framework
Conceptualized by Wallace and Boulton (1968).
Two-part message: I(H&D) = I(H) + I(D|H)
- I(H): model complexity (first part)
- I(D|H): goodness-of-fit (second part)
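The two-part trade-off can be made concrete with a toy scorer (a BIC-style sketch, not the Wallace-Freeman formulation used in the talk): the first part charges for the model's parameters, the second part is the data's negative log-likelihood in bits, and the better-fitting hypothesis yields the shorter total message.

```python
import numpy as np

def two_part_length_bits(x, mu, var, n_params):
    """Toy two-part message length, in bits: a BIC-style model cost
    (first part) plus the negative log-likelihood of the data under a
    Gaussian hypothesis (second part)."""
    first = 0.5 * n_params * np.log2(len(x))
    second = 0.5 * np.sum((x - mu) ** 2 / var
                          + np.log(2 * np.pi * var)) / np.log(2)
    return first + second

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)
good = two_part_length_bits(x, mu=0.0, var=1.0, n_params=2)
bad = two_part_length_bits(x, mu=5.0, var=1.0, n_params=2)  # poor fit, longer message
```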
14 MML parameter estimation (Wallace and Freeman, 1987)
Single component (H) with parameters Θ_j:
I(H&D) = I(Θ_j) + I(D|H) + constant, where I(Θ_j) = log ( √F(Θ_j) / h(Θ_j) )
- Prior density h(Θ_j)
- Expected Fisher information F(Θ_j)
- Negative log-likelihood I(D|H)
15 MML parameter estimation (Wallace and Freeman, 1987)
Mixture with K components (H):
I(H&D) = I(K) + I(w) + Σ_{j=1}^{K} I(Θ_j) + I(D|H) + constant
where I(K) + I(w) + Σ_j I(Θ_j) together form the first part of the message.
16 MML parameter estimation (Wallace and Freeman, 1987)
Mixture with K components (H):
I(H&D) = I(K) + I(w) + Σ_{j=1}^{K} I(Θ_j) + I(D|H) + constant
An EM algorithm to estimate parameters: component parameters are updated using their MML estimates, and I(H&D) is the scoring function.
17 Determining the number of components K
Several scoring functions:
- AIC & BIC (Akaike, 1974; Schwarz et al., 1978)
- MDL (Rissanen, 1978)
- Approximated MML (Oliver et al., 1996; Roberts et al., 1998)
- ICL (Biernacki et al., 2000)
- MML-like (Figueiredo and Jain, 2002)
18 Determining the number of components K
Several scoring functions:
- AIC & BIC (Akaike, 1974; Schwarz et al., 1978)
- MDL (Rissanen, 1978)
- Approximated MML (Oliver et al., 1996; Roberts et al., 1998)
- ICL (Biernacki et al., 2000)
- MML-like (Figueiredo and Jain, 2002)
We propose a comprehensive MML formulation that avoids such approximations.
19 Determining the number of components K
Search method: existing approaches
- Choose the K that has the best EM outcome.
- Figueiredo and Jain (2002) propose an improved method: begin with a large number of components and iteratively eliminate the redundant ones.
- MML-based Snob (Wallace and Boulton, 1968): perturb the current mixture; assumes the attributes are independent.
20 Proposed search method
Basic idea: perturb a K-component mixture through a series of operations so that the mixture escapes a presumably sub-optimal state and reaches an improved one.
Operations include:
- Split
- Delete
- Merge
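The control flow of the proposed search can be sketched as a greedy loop over the perturbation operators (a toy sketch: the real operations act on mixture parameters, whereas here the "mixture" is reduced to its component count, with a score minimised at K = 3):

```python
def search_mixture(data, init, operations, score):
    """Greedy search sketch: start from an initial mixture and apply
    split/delete/merge perturbations, keeping any candidate that
    shortens the scoring function (the total message length I(H & D))."""
    current = init(data)
    improved = True
    while improved:
        improved = False
        for op in operations:
            candidate = op(current, data)
            if candidate is not None and score(candidate, data) < score(current, data):
                current, improved = candidate, True
                break
    return current

# Toy stand-ins: the "mixture" is just its component count K,
# and the score is minimised at K = 3 (a three-component optimum).
score = lambda k, data: (k - 3) ** 2
split = lambda k, data: k + 1
delete = lambda k, data: k - 1 if k > 1 else None
best_k = search_mixture(None, lambda d: 1, [split, delete], score)
```

Starting from one component, repeated splits lower the score until K = 3, after which no perturbation improves it and the loop stops.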
21 Illustrative example of the search method
[Figure: the original mixture, with three components.]
22 Illustrative example of the search method
[Figures: the original mixture with three components; the search begins with a one-component mixture.]
23 Illustrative example of the search method
[Figures (a)-(c): message lengths before and after splitting.]
Split operation: a parent component is split to find locally optimal children, leading to a (K+1)-component mixture.
24 Illustrative example of the search method
[Figures: (d) initial means, (e) I = 2269 bits, (f) I = 2246 bits.]
25 Illustrative example of the search method
[Figures (g)-(i): message lengths before and after deleting.]
Delete operation: a component is deleted to find an optimal (K-1)-component mixture.
26 Illustrative example of the search method
[Figures: (j) merging, (k) initialization, (l) resulting message length.]
Merge operation: a pair of close components is merged to find an optimal (K-1)-component mixture.
27 Evolution of the mixture model
Figure: variation of the first part, the second part, and the total message length (in thousands of bits) with an increasing number of components.
28 Performance of the proposed method
Comparison with the search method of Figueiredo and Jain (2002).
Figure: Gaussian mixture simulations. (a) Percentage of correct selections (proposed vs. FJ) with varying separation δ, for a fixed sample size. (b) Average number of inferred mixture components with different sample sizes and δ = 0.2 between component means.
29 Performance of the proposed method
Comparison methodology:
ΔI_MML = I_MML(M_FJ) - I_MML(M*) and ΔI_FJ = I_FJ(M_FJ) - I_FJ(M*)
where M_FJ and M* are the mixtures inferred by the FJ method and the proposed method, respectively.
Figure: (a) difference in message lengths of inferred mixtures under the proposed (ΔI_MML) and MML-like (ΔI_FJ) scoring functions, against separation δ; (b) box-whisker plot of the empirical KL-divergence of inferred mixtures (proposed vs. FJ), against separation δ.
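The empirical KL-divergence used in the comparison can be estimated by Monte Carlo, KL(p‖q) ≈ (1/n) Σ_i [log p(x_i) - log q(x_i)] with x_i drawn from p (a generic sketch; the paper's exact estimator may differ). For two unit-variance Gaussians with means 0 and 1, the true value is 0.5:

```python
import numpy as np
from scipy.stats import norm

def mc_kl(samples_from_p, logpdf_p, logpdf_q):
    """Monte Carlo estimate of KL(p || q) = E_p[log p(X) - log q(X)]."""
    return np.mean(logpdf_p(samples_from_p) - logpdf_q(samples_from_p))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 200_000)                    # draws from p = N(0, 1)
kl = mc_kl(x, norm(0, 1).logpdf, norm(1, 1).logpdf)  # true KL is 0.5
```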
30 Mixtures of von Mises-Fisher (vMF) distributions
A vMF distribution is analogous to a symmetric Gaussian wrapped on the hypersphere, and is suitable for modelling directional data.
Mixtures of vMF distributions are inferred for:
- describing protein data
- high-dimensional text clustering
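The vMF density underlying these mixtures is f(x; μ, κ) = C_d(κ) exp(κ μᵀx) for unit vectors x, with normaliser C_d(κ) = κ^{d/2-1} / ((2π)^{d/2} I_{d/2-1}(κ)), where I_ν is the modified Bessel function of the first kind. A direct sketch of its log-density:

```python
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind

def vmf_logpdf(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit
    (d-1)-sphere, with mean direction mu and concentration kappa."""
    d = len(mu)
    log_c = ((d / 2 - 1) * np.log(kappa)
             - (d / 2) * np.log(2 * np.pi)
             - np.log(iv(d / 2 - 1, kappa)))
    return log_c + kappa * np.dot(x, mu)

mu = np.array([0.0, 0.0, 1.0])
lp = vmf_logpdf(mu, mu, kappa=2.0)  # log-density at the mean direction
```

For d = 3 the normaliser reduces to the closed form κ / (4π sinh κ), which gives a quick sanity check.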
31 Mixture modelling of protein directional data
[Figure: successive protein backbone positions P_{i-1}, P_i, P_{i+1} with the direction angles θ and φ in a local coordinate frame X1, X2, X3.]
Data corresponds to unit vectors on the sphere: a set of co-latitude θ ∈ [0, π] and longitude φ ∈ [0, 2π) pairs.
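The (θ, φ) pairs described above map to unit vectors in the usual spherical-coordinate way (a small utility sketch):

```python
import numpy as np

def sphere_to_unit_vector(theta, phi):
    """Map co-latitude theta in [0, pi] and longitude phi in [0, 2*pi)
    to a unit vector on the sphere."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

v = sphere_to_unit_vector(np.pi / 2, 0.0)  # on the equator, zero longitude
```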
32 Mixture modelling of protein directional data
[Figure.]
33 Optimal number of vMF mixture components
Figure: the inferred 37-component mixture, plotted as co-latitude θ against longitude φ.
34 Improved descriptors of protein data
Table: total message length (in millions of bits) and bits per residue under two null models, a uniform distribution and the vMF mixture.
35 Text clustering
Data corresponds to the normalized vector representations of text documents (Banerjee et al., 2005).
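Producing the normalized representations is a one-liner in practice: each document feature vector (e.g. a tf-idf row; the particular weighting scheme here is an assumption) is scaled to unit L2 norm so that it lies on the unit hypersphere:

```python
import numpy as np

def to_unit_sphere(vectors):
    """L2-normalise each row so documents lie on the unit hypersphere;
    zero rows are left untouched to avoid division by zero."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.where(norms == 0, 1.0, norms)

docs = np.array([[3.0, 4.0], [0.0, 0.0], [2.0, 0.0]])
unit = to_unit_sphere(docs)
```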
36 Text clustering
Data corresponds to the normalized vector representations of text documents (Banerjee et al., 2005).
Table: clustering performance on two datasets, (a) Classic3 (d = 4358) and (b) CMU Newsgroup (d = 6448), for true and inferred numbers of clusters, comparing methods of vMF parameter estimation (Banerjee, Tanabe, Sra, Song, MML) on message length, average F-measure, and mutual information. The MML mixtures consistently have lower message lengths.
37 Summary
- MML-based parameter estimation of multivariate Gaussian and vMF distributions.
- Design of the mixture modelling apparatus: selection of the optimal number of components.
- Applications to modelling protein directional data and text clustering.
P. Kasarapu and L. Allison. Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions. Machine Learning, 100(2-3):333-378, 2015.
38 Thank you.
39 References
H. Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716-723, 1974.
A. Banerjee, I. S. Dhillon, J. Ghosh, and S. Sra. Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal of Machine Learning Research, 6:1345-1382, 2005.
C. Biernacki, G. Celeux, and G. Govaert. Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):719-725, 2000.
M. A. T. Figueiredo and A. K. Jain. Unsupervised learning of finite mixture models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(3):381-396, 2002.
J. J. Oliver, R. A. Baxter, and C. S. Wallace. Unsupervised learning using MML. In Machine Learning: Proceedings of the 13th International Conference, 1996.
J. Rissanen. Modeling by shortest data description. Automatica, 14(5):465-471, 1978.
S. Roberts, D. Husmeier, I. Rezek, and W. Penny. Bayesian approaches to Gaussian mixture modeling. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1133-1142, 1998.
G. Schwarz. Estimating the dimension of a model. The Annals of Statistics, 6(2):461-464, 1978.
C. S. Wallace and D. M. Boulton. An information measure for classification. Computer Journal, 11(2):185-194, 1968.
C. S. Wallace and P. R. Freeman. Estimation and inference by compact coding. Journal of the Royal Statistical Society: Series B (Methodological), 49(3):240-265, 1987.
More informationCOURSE INTRODUCTION. J. Elder CSE 6390/PSYC 6225 Computational Modeling of Visual Perception
COURSE INTRODUCTION COMPUTATIONAL MODELING OF VISUAL PERCEPTION 2 The goal of this course is to provide a framework and computational tools for modeling visual inference, motivated by interesting examples
More informationAn Analysis of the Difference of Code Lengths Between Two-Step Codes Based on MDL Principle and Bayes Codes
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 3, MARCH 2001 927 An Analysis of the Difference of Code Lengths Between Two-Step Codes Based on MDL Principle Bayes Codes Masayuki Goto, Member, IEEE,
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationBayesian methods for graph clustering
Author manuscript, published in "Advances in data handling and business intelligence (2009) 229-239" DOI : 10.1007/978-3-642-01044-6 Bayesian methods for graph clustering P. Latouche, E. Birmelé, and C.
More informationIntroduction to Probabilistic Graphical Models
Introduction to Probabilistic Graphical Models Kyu-Baek Hwang and Byoung-Tak Zhang Biointelligence Lab School of Computer Science and Engineering Seoul National University Seoul 151-742 Korea E-mail: kbhwang@bi.snu.ac.kr
More informationSparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference
Sparse Bayesian Logistic Regression with Hierarchical Prior and Variational Inference Shunsuke Horii Waseda University s.horii@aoni.waseda.jp Abstract In this paper, we present a hierarchical model which
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More information10708 Graphical Models: Homework 2
10708 Graphical Models: Homework 2 Due Monday, March 18, beginning of class Feburary 27, 2013 Instructions: There are five questions (one for extra credit) on this assignment. There is a problem involves
More informationPresentation in Convex Optimization
Dec 22, 2014 Introduction Sample size selection in optimization methods for machine learning Introduction Sample size selection in optimization methods for machine learning Main results: presents a methodology
More informationDetection of Outliers in Regression Analysis by Information Criteria
Detection of Outliers in Regression Analysis by Information Criteria Seppo PynnÄonen, Department of Mathematics and Statistics, University of Vaasa, BOX 700, 65101 Vaasa, FINLAND, e-mail sjp@uwasa., home
More informationClustering with k-means and Gaussian mixture distributions
Clustering with k-means and Gaussian mixture distributions Machine Learning and Category Representation 2014-2015 Jakob Verbeek, ovember 21, 2014 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.14.15
More informationParametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory
Statistical Inference Parametric Inference Maximum Likelihood Inference Exponential Families Expectation Maximization (EM) Bayesian Inference Statistical Decison Theory IP, José Bioucas Dias, IST, 2007
More informationClustering using the Minimum Message Length Criterion and Simulated Annealing
Clustering using the Minimum Message Length Criterion and Simulated Annealing Ian Davidson CSIRO Australia, Division of Information Technology, 723 Swanston Street, Carlton, Victoria, Australia 3053, inpd@mel.dit.csiro.au
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationForecasting Wind Ramps
Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators
More informationPATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 2: PROBABILITY DISTRIBUTIONS Parametric Distributions Basic building blocks: Need to determine given Representation: or? Recall Curve Fitting Binary Variables
More informationUnsupervised Learning with Permuted Data
Unsupervised Learning with Permuted Data Sergey Kirshner skirshne@ics.uci.edu Sridevi Parise sparise@ics.uci.edu Padhraic Smyth smyth@ics.uci.edu School of Information and Computer Science, University
More informationPCA and admixture models
PCA and admixture models CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar, Alkes Price PCA and admixture models 1 / 57 Announcements HW1
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationLecture 6: Gaussian Mixture Models (GMM)
Helsinki Institute for Information Technology Lecture 6: Gaussian Mixture Models (GMM) Pedram Daee 3.11.2015 Outline Gaussian Mixture Models (GMM) Models Model families and parameters Parameter learning
More informationNonparametric Bayes Density Estimation and Regression with High Dimensional Data
Nonparametric Bayes Density Estimation and Regression with High Dimensional Data Abhishek Bhattacharya, Garritt Page Department of Statistics, Duke University Joint work with Prof. D.Dunson September 2010
More informationData Preprocessing. Cluster Similarity
1 Cluster Similarity Similarity is most often measured with the help of a distance function. The smaller the distance, the more similar the data objects (points). A function d: M M R is a distance on M
More information