EBEM: An Entropy-based EM Algorithm for Gaussian Mixture Models

Antonio Peñalver Benavent, Francisco Escolano Ruiz and Juan M. Sáez Martínez
Robot Vision Group, Alicante University, Alicante, Spain

Abstract

In this paper we address the problem of estimating the parameters of a Gaussian mixture model. Although the EM algorithm yields the maximum-likelihood solution, it requires a careful initialization of the parameters, and the optimal number of kernels in the mixture may be unknown beforehand. We propose a criterion based on the entropy of the pdf (probability density function) associated with each kernel to measure the quality of a given mixture model. A novel method for estimating Shannon entropy based on Entropic Spanning Graphs is developed, and a modification of the classical EM algorithm to find the optimal number of kernels in the mixture is presented. We test our algorithm in probability density estimation, pattern recognition and color image segmentation.

1 Introduction

Gaussian mixture models have been widely used for density estimation, pattern recognition and function approximation. One of the most common methods for fitting mixtures to data is the EM algorithm [5]. However, this algorithm is prone to initialization errors and may converge to local maxima of the log-likelihood function. In addition, it requires the number of elements (kernels) in the mixture to be known beforehand (model selection). The so-called model-selection problem has been addressed in many ways [13][14][7][6]. In this paper we propose a method that, starting with only one kernel, finds the maximum-likelihood solution. To do so, it tests whether the underlying pdf of each kernel is Gaussian and, if not, replaces that kernel with two kernels adequately separated from each other. To detect non-Gaussianity we compare the entropy of the underlying pdf with the theoretical entropy of a Gaussian. After the kernel with the worst degree of Gaussianity has been split in two, new EM steps are performed to obtain a new maximum-likelihood solution.

The paper is organized as follows. In Section 2 we review the basic principles of Gaussian mixture models and the usual EM algorithm. In Section 3 we introduce Entropic Spanning Graphs, their relation to the estimation of Rényi's entropy of order α, and how Shannon entropy can be estimated through them. In Section 4 we present our maximum-entropy criterion and the modified EM algorithm that exploits it. In Section 5 we show some experimental results. Conclusions and future work are presented in Section 6.

2 Gaussian Mixture Models

2.1 Definition

A d-dimensional random variable y follows a finite-mixture distribution when its pdf p(y|Θ) can be described by a weighted sum of known pdfs named kernels. When all these kernels are Gaussian, the mixture is named in the same way:

p(y \mid \Theta) = \sum_{i=1}^{K} \pi_i \, p(y \mid \Theta_i)    (1)

where 0 ≤ π_i ≤ 1, i = 1, ..., K, and \sum_{i=1}^{K} \pi_i = 1, with K the number of kernels, π_1, ..., π_K the a priori probabilities of each kernel, and Θ_i the parameters describing kernel i. In Gaussian mixtures, Θ_i = {μ_i, Σ_i}, that is, the mean vector and the covariance matrix. The set of parameters of a given mixture is Θ ≡ {Θ_1, ..., Θ_K, π_1, ..., π_K}. Obtaining the optimal set of parameters Θ* is usually posed in terms of maximizing the log-likelihood of the pdf to be estimated:

\ell(Y \mid \Theta) = \log p(Y \mid \Theta) = \log \prod_{n=1}^{N} p(y_n \mid \Theta) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, p(y_n \mid \Theta_k)    (2)

with Θ* = \arg\max_{\Theta} \ell(\Theta) and Y = {y_1, ..., y_N} a set of N i.i.d. samples of the variable y.
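
As a concrete illustration of Eqs. (1)-(2), the short Python/NumPy sketch below evaluates the mixture density and its log-likelihood for a given parameter set; the function and variable names are ours and are not part of the paper.

import numpy as np

def gaussian_pdf(Y, mu, Sigma):
    # Multivariate Gaussian density N(y | mu, Sigma), evaluated at each row of Y.
    d = mu.shape[0]
    diff = Y - mu
    expo = -0.5 * np.sum(diff @ np.linalg.inv(Sigma) * diff, axis=1)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return np.exp(expo) / norm

def mixture_log_likelihood(Y, priors, mus, Sigmas):
    # Eq. (2): sum over samples of the log of the weighted kernel densities of Eq. (1).
    density = sum(pi_k * gaussian_pdf(Y, mu_k, S_k)
                  for pi_k, mu_k, S_k in zip(priors, mus, Sigmas))
    return np.sum(np.log(density))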

2.2 EM Algorithm

The EM (Expectation-Maximization) algorithm [5] is an iterative procedure for finding maximum-likelihood solutions to problems involving hidden variables. In the case of Gaussian mixtures [11], these variables are a set of N labels Z = {z^(1), ..., z^(N)} associated with the samples. Each label is a binary vector z^(i) = [z_1^(i), ..., z_K^(i)], with z_m^(i) = 1 and z_p^(i) = 0 for p ≠ m, indicating that y^(i) has been generated by kernel m. The EM algorithm generates a sequence of estimates of the set of parameters {Θ(t), t = 1, 2, ...} by alternating an expectation step and a maximization step until convergence.

The expectation step estimates the expected value of the hidden variables given the visible data Y and the current estimate of the parameters Θ(t). The probability that y_n was generated by kernel k is

p(k \mid y_n) = \frac{\pi_k \, p(y_n \mid k)}{\sum_{j=1}^{K} \pi_j \, p(y_n \mid j)}    (3)

In the maximization step we re-estimate the new parameters Θ(t+1) given the expected Z:

\pi_k = \frac{1}{N} \sum_{n=1}^{N} p(k \mid y_n), \quad
\mu_k = \frac{\sum_{n=1}^{N} p(k \mid y_n) \, y_n}{\sum_{n=1}^{N} p(k \mid y_n)}, \quad
\Sigma_k = \frac{\sum_{n=1}^{N} p(k \mid y_n)(y_n - \mu_k)(y_n - \mu_k)^T}{\sum_{n=1}^{N} p(k \mid y_n)}    (4)

A detailed description of this classic algorithm is given in [11]. Here we focus on the fact that if K is unknown beforehand it cannot be estimated by maximizing the log-likelihood, because \ell(\Theta) grows with K. Conversely, in a classical EM algorithm with a fixed number of kernels the density can be underestimated, giving a poor description of the data. In the next section we describe the use of entropy to test whether a given kernel properly describes the underlying data.

3 Shannon Entropy Estimation

Several methods for the nonparametric estimation of the differential entropy of a continuous random variable have been widely used (see [1] for a detailed overview). Entropic Spanning Graphs built from the data to estimate Rényi's α-entropy [9] belong to the class of so-called non plug-in methods. Rényi's α-entropy of a probability density function f is defined as

H_\alpha(f) = \frac{1}{1-\alpha} \ln \int_z f^{\alpha}(z) \, dz    (5)

for α ∈ (0, 1). The α-entropy converges to the Shannon entropy -\int f(z) \ln f(z) \, dz as α → 1, so the latter can be obtained from the former.

3.1 Rényi's Entropy and Entropic Spanning Graphs

A graph G consists of a set of vertices X_n = {x_1, ..., x_n}, with x_i ∈ R^d, and edges {e} connecting pairs of vertices: e_ij = (x_i, x_j). If we denote by M(X_n) the possible sets of edges in the class of acyclic graphs spanning X_n (spanning trees), the total edge-length functional of the Euclidean power-weighted Minimal Spanning Tree is

L_\gamma^{MST}(X_n) = \min_{M(X_n)} \sum_{e \in M(X_n)} |e|^{\gamma}    (6)

with γ ∈ (0, d) and |e| the Euclidean distance between graph vertices. Intuitively, the length of the MST spanning a concentrated, nonuniform set of points increases at a slower rate than that of the MST spanning uniformly distributed points. This fact motivates the use of the MST as a test of randomness of a set of points [10]. In [8] it was shown that in a d-dimensional feature space, with d ≥ 2,

H_\alpha(X_n) = \frac{d}{\gamma} \left[ \ln \frac{L_\gamma(X_n)}{n^{\alpha}} - \ln \beta_{L_\gamma,d} \right]    (7)

is an asymptotically unbiased and almost surely consistent estimator of the α-entropy of f, where α = (d - γ)/d and \beta_{L_\gamma,d} is a constant bias correction that depends on the graph minimization criterion but is independent of f. Closed-form expressions are not available for \beta_{L_\gamma,d}; only approximations and bounds are known: (i) Monte Carlo simulation of uniform random samples on the unit cube [0, 1]^d; (ii) the large-d approximation (γ/2) ln(d/(2πe)) given in [2].
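
The following sketch estimates the α-entropy of Eq. (7) from a sample matrix, assuming SciPy's minimum-spanning-tree routine and a value of ln β_{L_γ,d} supplied by the caller (obtained, e.g., by Monte Carlo simulation or the large-d approximation above); the function and argument names are ours.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def renyi_entropy_mst(X, gamma, log_beta):
    # X: (n, d) sample matrix; gamma in (0, d); log_beta = ln(beta_{L_gamma,d}).
    # Assumes no duplicated samples (zero off-diagonal distances would be read as missing edges).
    n, d = X.shape
    alpha = (d - gamma) / d
    weights = squareform(pdist(X, metric="euclidean")) ** gamma   # power-weighted edges, Eq. (6)
    L = minimum_spanning_tree(weights).sum()                      # total MST length L_gamma(X_n)
    return (d / gamma) * (np.log(L / n ** alpha) - log_beta)      # Eq. (7)
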
We can estimate H_α(f) for different values of α = (d - γ)/d by changing the edge-weight exponent γ. Since γ modifies the edge weights monotonically, the graph itself is the same for every value of γ, and only the total length in Eq. (7) needs to be recomputed.

3.2 Rényi Extrapolation of Shannon Entropy

Entropic spanning graphs are suitable for estimating the α-entropy with α ∈ [0, 1), so the Shannon entropy cannot be estimated directly with this method. Figure 1 shows H_α for Gaussian distributions with different covariance matrices: the shape of the function depends neither on the nature of the data nor on their size. Since the value of H_α cannot be computed directly at α = 1, we approximate it by means of a continuous function that captures the tendency of H_α in the neighborhood of 1. From a value of α ∈ [0, 1), we compute the tangent line y = mx + b to H_α at that point, using m = H'_α, x = α and y = H_α. This line is continuous, so its value at x = 1 can always be evaluated; the resulting point at α = 1 differs depending on the α used. From now on, we call α* the α value that generates the correct entropy value at α = 1 following the described procedure. Experimentally, we have verified that this value does not depend on the nature of the probability distribution associated with the data, but it does depend on the number of samples used for the estimation and on the problem dimension. As H_α is a monotonically decreasing function, we can estimate α* in the Gaussian case by means of a dichotomic search between two well-separated α values, for a fixed number of samples and problem dimension and different covariance matrices. Experimentally, we have verified that α* is almost constant for diagonal covariance matrices with variance values greater than 0.5.

Figure 1. H_α for Gaussian distributions with different covariance matrices.

In order to appreciate the effects of the dimension and the number of samples, we generated a new experiment in which we computed α* for a set of 1000 random distributions with 2 ≤ D ≤ 5. Experimentally, we have verified that the shape of the underlying curve is well fitted by a three-parameter function of the number of samples N and the problem dimension D, with constants a, b and c to be estimated. To estimate these values we used Monte Carlo simulation, minimizing the mean square error between the expression and the data; we obtained a = 1.271. Figure 2 shows the function for different values of the dimension and the number of samples.

Figure 2. α* for dimensions between 2 and 5 and different numbers of samples.

4 The EBEM Algorithm

Given a set of kernels for the mixture (initially a single kernel), we evaluate the real global entropy H(Y) and the theoretical maximum entropy H_max(Y) of the mixture by considering the individual entropies of each kernel and the prior probabilities: H(Y) = \sum_{k=1}^{K} \pi_k H_k(Y) and H_max(Y) = \sum_{k=1}^{K} \pi_k H_{max,k}(Y), with H_{max,k}(Y) = \frac{1}{2} \log[(2\pi e)^d |\Sigma_k|]. If the ratio H(Y)/H_max(Y) is above a given threshold, we consider that all kernels are well fitted. Otherwise, we select the kernel with the lowest individual ratio and replace it by two kernels that are conveniently placed and initialized. Then a new EM with K + 1 kernels starts.

4.1 Introducing a New Kernel

A low local ratio H(Y)/H_max(Y) indicates that multimodality arises and thus the kernel must be replaced by two other kernels. The difficulty in the split step is that the original covariance matrix must generate two new matrices under two constraints: the overall dispersion must remain almost constant and the new matrices must be positive definite. In [12] the split is based on the constraint of preserving the first two moments before and after the operation in one-dimensional settings. This is an ill-posed problem, because the number of equations is smaller than the number of unknowns and the positive definiteness of the covariance matrices must also be retained.
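
As an illustration of the ratio test above, the sketch below computes the per-kernel maximum entropy H_{max,k} in closed form and compares a prior-weighted entropy estimate against it, returning the index of the kernel to split (or None when every kernel is considered well fitted). The per-kernel entropy estimates are assumed to come from the MST estimator and the Shannon extrapolation of Section 3; the names, and the default threshold of 0.15 borrowed from the experiments in Section 5, are our own choices.

import numpy as np

def kernel_max_entropy(Sigma):
    # Entropy of a Gaussian with covariance Sigma: (1/2) log((2*pi*e)^d |Sigma|).
    d = Sigma.shape[0]
    return 0.5 * np.log(((2 * np.pi * np.e) ** d) * np.linalg.det(Sigma))

def select_kernel_to_split(priors, Sigmas, estimated_entropies, entropy_th=0.15):
    # estimated_entropies[k]: estimated entropy of the data assigned to kernel k.
    max_entropies = [kernel_max_entropy(S) for S in Sigmas]
    ratios = [h / h_max for h, h_max in zip(estimated_entropies, max_entropies)]
    global_ratio = np.dot(priors, estimated_entropies) / np.dot(priors, max_entropies)
    if global_ratio >= entropy_th:
        return None                    # all kernels are considered well fitted
    return int(np.argmin(ratios))      # kernel with the lowest individual ratio
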
4.2 The Split Equations

From the definition of the mixture in Eq. (1), and assuming that component K is the one with the lowest Gaussianity ratio, it must be decomposed into components K_1 and K_2 with parameters Θ_{k1} = (μ_{k1}, Σ_{k1}) and Θ_{k2} = (μ_{k2}, Σ_{k2}). The corresponding priors, mean vectors and covariance matrices should satisfy the following split equations:

\pi = \pi_1 + \pi_2
\pi \mu = \pi_1 \mu_1 + \pi_2 \mu_2
\pi (\Sigma + \mu \mu^T) = \pi_1 (\Sigma_1 + \mu_1 \mu_1^T) + \pi_2 (\Sigma_2 + \mu_2 \mu_2^T)    (8)
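
For concreteness, a candidate split (π_1, μ_1, Σ_1), (π_2, μ_2, Σ_2) of a component (π, μ, Σ) can be checked against Eq. (8) with a few lines of NumPy; this is only a sketch, and the tuple layout is our own.

import numpy as np

def split_preserves_moments(pi, mu, Sigma, part1, part2, tol=1e-8):
    # Check the split equations (8): priors, first and second moments are preserved.
    pi1, mu1, S1 = part1
    pi2, mu2, S2 = part2
    ok_prior = np.isclose(pi, pi1 + pi2, atol=tol)
    ok_mean = np.allclose(pi * mu, pi1 * mu1 + pi2 * mu2, atol=tol)
    lhs = pi * (Sigma + np.outer(mu, mu))
    rhs = pi1 * (S1 + np.outer(mu1, mu1)) + pi2 * (S2 + np.outer(mu2, mu2))
    ok_cov = np.allclose(lhs, rhs, atol=tol)
    return ok_prior and ok_mean and ok_cov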

In [15] an RJMCMC-based algorithm is presented. To solve the equations shown above, a spectral decomposition of the current covariance matrix is used, and the original problem is replaced by that of estimating the new eigenvalues and eigenvectors of the new covariance matrices from the original one. The main limitation is that all covariance matrices of the mixture are constrained to share the same eigenvector matrix. More recently, in [4] new eigenvector matrices are obtained by applying a rotation matrix.

Let \Sigma = V \Lambda V^T be the spectral decomposition of the covariance matrix of the component with the lowest entropy ratio, with \Lambda = diag(\lambda_1, ..., \lambda_d) a diagonal matrix containing the eigenvalues of \Sigma in increasing order; let π, π_1, π_2 be the priors of the original and the two new components, μ, μ_1, μ_2 the means, and Σ, Σ_1, Σ_2 the covariance matrices. Let D also be a d x d rotation matrix whose columns are orthonormal unit vectors; D is built by generating its lower triangular part independently from d(d-1)/2 different uniform U(0, 1) densities. The proposed split operation is given by:

\pi_1 = u_1 \pi, \quad \pi_2 = (1 - u_1) \pi
\mu_1 = \mu - \left( \sum_{i=1}^{d} u_2^i \sqrt{\lambda_i} V_i \right) \sqrt{\pi_2 / \pi_1}
\mu_2 = \mu + \left( \sum_{i=1}^{d} u_2^i \sqrt{\lambda_i} V_i \right) \sqrt{\pi_1 / \pi_2}
\Lambda_1 = diag(u_3) \, diag(\iota - u_2) \, diag(\iota + u_2) \, \Lambda \, \frac{\pi}{\pi_1}
\Lambda_2 = diag(\iota - u_3) \, diag(\iota - u_2) \, diag(\iota + u_2) \, \Lambda \, \frac{\pi}{\pi_2}
V_1 = D V, \quad V_2 = D^T V    (9)

where ι is a d x 1 vector of ones, and u_1, u_2 = (u_2^1, u_2^2, ..., u_2^d)^T and u_3 = (u_3^1, u_3^2, ..., u_3^d)^T are the 2d + 1 random variables needed to construct the priors, means and eigenvalues of the new components of the mixture. They are drawn as u_1 ~ be(2, 2), u_2^1 ~ be(1, 2d), u_2^j ~ U(-1, 1), u_3^1 ~ be(1, d), u_3^j ~ U(0, 1), for j = 2, ..., d, with be(·,·) a Beta distribution.

A graphical description of the splitting process in the 2-D case is shown in Fig. 3: the directions and magnitudes of variability are defined by the eigenvectors and eigenvalues of the covariance matrix. A full algorithmic description of the process is given in Fig. 4.

Figure 3. 2-D example of splitting one kernel.

EBEM ALGORITHM
  Initialization: start with a unique kernel: K <- 1, Θ_1 <- {μ_1, Σ_1} with random values.
  repeat:                                   // main loop
      repeat:                               // E and M steps
          estimate the log-likelihood at iteration i: l_i
      until |l_i - l_{i-1}| < CONVERGENCE_TH
      evaluate H(Y) and H_max(Y) globally
      if H(Y)/H_max(Y) < ENTROPY_TH
          select the kernel with the lowest ratio and decompose it into K_1 and K_2
          initialize the new parameters Θ_1 and Θ_2 (Eq. 9): the new means μ_1, μ_2,
          the new eigenvalue matrices Λ_1, Λ_2, the new eigenvector matrices V_1, V_2,
          and the new priors π_1, π_2
      else
          Final <- True
  until Final = True

Figure 4. Entropy-based EM algorithm.

5 Experiments

In order to test our approach we have performed several experiments with synthetic and image data. In the first one we generated 2500 samples from 5 bi-dimensional Gaussians with prior probabilities π_k = 0.2 for every k and means μ_1 = [-1, 1]^T, μ_2 = [6, 3]^T, μ_3 = [3, 6]^T, μ_4 = [2, 2]^T, μ_5 = [0, 0]^T. We used a Gaussianity threshold of 0.15, together with a convergence threshold for the EM algorithm. Our algorithm converges after 30 iterations, correctly finding K = 5. Figure 5 shows the evolution of the algorithm.

Figure 5. Evolution of our algorithm.
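
A minimal sketch of the split operation of Eq. (9) is given below, assuming the Beta/uniform draws described above. All names are ours; the random rotation D is produced here via a QR factorization rather than the lower-triangular construction of the paper, and placing the two new means on opposite sides of μ is the sign convention assumed in this sketch.

import numpy as np

def split_kernel(pi, mu, Sigma, rng=None):
    # Split one Gaussian kernel (pi, mu, Sigma) into two, following Eq. (9).
    if rng is None:
        rng = np.random.default_rng()
    d = mu.shape[0]
    lam, V = np.linalg.eigh(Sigma)                 # eigenvalues in increasing order, eigenvectors as columns
    u1 = rng.beta(2, 2)
    u2 = np.concatenate(([rng.beta(1, 2 * d)], rng.uniform(-1, 1, d - 1)))
    u3 = np.concatenate(([rng.beta(1, d)], rng.uniform(0, 1, d - 1)))
    pi1, pi2 = u1 * pi, (1 - u1) * pi
    shift = (V * np.sqrt(lam)) @ u2                # sum_i u2_i * sqrt(lambda_i) * V_i
    mu1 = mu - shift * np.sqrt(pi2 / pi1)
    mu2 = mu + shift * np.sqrt(pi1 / pi2)
    lam1 = u3 * (1 - u2) * (1 + u2) * lam * (pi / pi1)
    lam2 = (1 - u3) * (1 - u2) * (1 + u2) * lam * (pi / pi2)
    D, _ = np.linalg.qr(rng.uniform(0, 1, (d, d))) # random rotation (QR-based stand-in, an assumption)
    V1, V2 = D @ V, D.T @ V
    Sigma1 = V1 @ np.diag(lam1) @ V1.T             # rebuild positive-definite covariances
    Sigma2 = V2 @ np.diag(lam2) @ V2.T
    return (pi1, mu1, Sigma1), (pi2, mu2, Sigma2)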

We have also tested our approach on the unsupervised segmentation of color images. At each pixel i of the image we compute a 3-dimensional feature vector x_i with its components in the RGB color space. As a result of our algorithm we obtain the number of components (classes) M and a label y_i ∈ {1, 2, ..., M} indicating the class from which the i-th pixel came. Our image model therefore states that each pixel is generated by one of the Gaussian densities in the mixture model. We have used different entropy thresholds and a convergence threshold of 0.1 for the EM algorithm. In Fig. 6 we show some color image segmentation results for three different images of 189 x 189 pixels. Each row shows the original image and the resulting segmented images with increasing entropy threshold: the higher the demanded threshold, the larger the number of kernels generated and therefore the number of colors detected in the image. No post-processing has been applied to the resulting images.

Figure 6. Color image segmentation results.

Finally, we have applied the proposed method to the well-known Iris data set [3], which contains 3 classes of 50 (4-dimensional) instances, each referring to a type of iris plant: Versicolor, Virginica and Setosa. We generated 300 training samples from the means and covariances of the original classes and checked the performance on a classification problem with the original 150 samples. Starting with K = 1, the method correctly selected K = 3. A maximum a posteriori classifier was then built, with a classification performance of 98%.

6 Conclusions and Future Work

In this paper we have presented a method for finding the optimal number of kernels in a Gaussian mixture, based on maximum entropy. We start the algorithm with only one kernel and decide whether to split it on the basis of the entropy of the underlying pdf. The algorithm converges in a few iterations and is suitable for density estimation, pattern recognition and unsupervised color image segmentation. We are currently exploring new methods for estimating the MST with O(log n) cost and ways to select the data features efficiently.

References

[1] E. Beirlant, E. Dudewicz, L. Gyorfi, and E. van der Meulen. Nonparametric entropy estimation. International Journal on Mathematical and Statistical Sciences, 6(1):17-39.
[2] D. Bertsimas and G. V. Ryzin. An asymptotic determination of the minimum spanning tree and minimum matching constants in geometrical probability. Operations Research Letters, 9(1).
[3] C. Blake and C. Merz. UCI repository of machine learning databases. University of California, Irvine, Dept. of Information and Computer Sciences.
[4] P. Dellaportas and I. Papageorgiou. Multivariate mixtures of normals with unknown number of components. Statistics and Computing, to appear.
[5] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood estimation from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1-38.
[6] M. Figueiredo and A. Jain. Unsupervised selection and estimation of finite mixture models. In International Conference on Pattern Recognition (ICPR 2000), Barcelona, Spain. IEEE.
[7] M. Figueiredo, J. Leitao, and A. Jain. On fitting mixture models. In Energy Minimization Methods in Computer Vision and Pattern Recognition, Lecture Notes in Computer Science, 1654:54-69.
[8] A. Hero and O. Michel. Asymptotic theory of greedy approximations to minimal k-point random graphs. IEEE Transactions on Information Theory, 45(6).
[9] A. Hero and O. Michel. Applications of entropic graphs. IEEE Signal Processing Magazine, 19(5):85-95.
[10] R. Hoffman and A. Jain. A test of randomness based on the minimal spanning tree. Pattern Recognition Letters, 1(1).
[11] R. Redner and H. Walker. Mixture densities, maximum likelihood, and the EM algorithm. SIAM Review, 26(2).
[12] S. Richardson and P. Green. On Bayesian analysis of mixtures with an unknown number of components (with discussion). Journal of the Royal Statistical Society B.
[13] N. Vlassis and A. Likas. A kurtosis-based dynamic approach to Gaussian mixture modeling. IEEE Transactions on Systems, Man, and Cybernetics, 29(4).
[14] N. Vlassis and A. Likas. A greedy EM algorithm for Gaussian mixture learning. Neural Processing Letters, 15(1):77-87.
[15] Z. Zhang, K. Chan, Y. Wu, and C. Chen. Learning a multivariate Gaussian mixture model with the reversible jump MCMC algorithm. Statistics and Computing, 2004.
