Accelerating the EM Algorithm for Mixture Density Estimation
Slide 1/18: Accelerating the EM Algorithm for Mixture Density Estimation

Homer Walker, Mathematical Sciences Department, Worcester Polytechnic Institute. ICERM Workshop, September 4, 2015.

Joint work with Josh Plasse (WPI/Imperial College). Research supported in part by DOE Grant DE-SC and NSF Grant DMS.
Slide 2/18: Mixture Densities

Consider a (finite) mixture density

    p(x \mid \Phi) = \sum_{i=1}^{m} \alpha_i \, p_i(x \mid \phi_i).

Problem: Estimate Φ = (α_1, ..., α_m, φ_1, ..., φ_m) using an unlabeled sample {x_k}_{k=1}^{N} on the mixture.

Maximum-Likelihood Estimate (MLE): Determine Φ* = arg max_Φ L(Φ), where

    L(\Phi) \equiv \sum_{k=1}^{N} \log p(x_k \mid \Phi).
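As a concrete illustration (a sketch made here, not code from the talk), the log-likelihood L(Φ) of a univariate Gaussian mixture can be evaluated as follows; the function name and argument layout are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import norm

def mixture_log_likelihood(x, alphas, mus, sigmas):
    """L(Phi) = sum_k log( sum_i alpha_i p_i(x_k | phi_i) ) for a
    univariate Gaussian mixture.  Illustrative sketch; names are not
    taken from the talk."""
    x = np.asarray(x, dtype=float)
    # densities[k, i] = p_i(x_k | mu_i, sigma_i)
    densities = norm.pdf(x[:, None], loc=mus, scale=sigmas)
    mixture = densities @ np.asarray(alphas)   # p(x_k | Phi)
    return np.sum(np.log(mixture))
```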
Slide 3/18: The EM (Expectation-Maximization) Algorithm

The general formulation and name were given in:

A. P. Dempster, N. M. Laird, and D. B. Rubin (1977), Maximum likelihood from incomplete data via the EM algorithm, J. Royal Statist. Soc. Ser. B (Methodological), 39.

General idea: Determine the next approximate MLE to maximize the expectation of the complete-data log-likelihood function, given the observed incomplete data and the current approximate MLE.

Marvelous property: The log-likelihood function increases at each iteration.
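In standard EM notation (added here for clarity, not reproduced from the slide), this general idea reads: given the current estimate Φ^c, form the expected complete-data log-likelihood and maximize it,

    Q(\Phi \mid \Phi^{c}) = \mathbb{E}\big[\log(\text{complete-data likelihood of } \Phi) \,\big|\, x_1, \dots, x_N,\ \Phi^{c}\big],
    \qquad
    \Phi^{+} = \arg\max_{\Phi} Q(\Phi \mid \Phi^{c}),

and the "marvelous property" is the resulting ascent, L(\Phi^{+}) \ge L(\Phi^{c}).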
Slide 4/18: The EM Algorithm for Mixture Densities

For a mixture density, an EM iteration is

    \alpha_i^{+} = \frac{1}{N} \sum_{k=1}^{N} \frac{\alpha_i^{c}\, p_i(x_k \mid \phi_i^{c})}{p(x_k \mid \Phi^{c})},
    \qquad
    \phi_i^{+} = \arg\max_{\phi_i} \sum_{k=1}^{N} \log p_i(x_k \mid \phi_i)\, \frac{\alpha_i^{c}\, p_i(x_k \mid \phi_i^{c})}{p(x_k \mid \Phi^{c})}.

For a derivation, convergence analysis, history, etc., see:

R. A. Redner and HW (1984), Mixture densities, maximum likelihood, and the EM algorithm, SIAM Review, 26.
Slide 5/18: Particular Example: Normal (Gaussian) Mixtures

Assume (multivariate) normal densities. For each i, φ_i = (µ_i, Σ_i) and

    p_i(x \mid \phi_i) = \frac{1}{(2\pi)^{n/2} (\det \Sigma_i)^{1/2}}\, e^{-(x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i)/2}.

EM iteration: For i = 1, ..., m,

    \alpha_i^{+} = \frac{1}{N} \sum_{k=1}^{N} \frac{\alpha_i^{c}\, p_i(x_k \mid \phi_i^{c})}{p(x_k \mid \Phi^{c})},

    \mu_i^{+} = \left\{ \sum_{k=1}^{N} x_k\, \frac{\alpha_i^{c}\, p_i(x_k \mid \phi_i^{c})}{p(x_k \mid \Phi^{c})} \right\} \Big/ \left\{ \sum_{k=1}^{N} \frac{\alpha_i^{c}\, p_i(x_k \mid \phi_i^{c})}{p(x_k \mid \Phi^{c})} \right\},

    \Sigma_i^{+} = \left\{ \sum_{k=1}^{N} (x_k - \mu_i^{+})(x_k - \mu_i^{+})^{T}\, \frac{\alpha_i^{c}\, p_i(x_k \mid \phi_i^{c})}{p(x_k \mid \Phi^{c})} \right\} \Big/ \left\{ \sum_{k=1}^{N} \frac{\alpha_i^{c}\, p_i(x_k \mid \phi_i^{c})}{p(x_k \mid \Phi^{c})} \right\}.
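As a hedged illustration of these updates (a sketch written here, not the speaker's MATLAB code), one EM sweep for a multivariate normal mixture might look like the following; `em_step` and its argument layout are choices made for this example.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_step(X, alphas, mus, Sigmas):
    """One EM update for a mixture of multivariate normals.

    X: (N, d) sample; alphas: (m,); mus: (m, d); Sigmas: (m, d, d).
    Returns updated (alphas, mus, Sigmas).  Illustrative sketch only.
    """
    X = np.asarray(X, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    Sigmas = np.asarray(Sigmas, dtype=float)
    m = len(alphas)
    # w[k, i] = alpha_i^c p_i(x_k | phi_i^c) / p(x_k | Phi^c)  ("responsibilities")
    dens = np.column_stack([
        alphas[i] * multivariate_normal.pdf(X, mean=mus[i], cov=Sigmas[i])
        for i in range(m)
    ])
    w = dens / dens.sum(axis=1, keepdims=True)

    alphas_new = w.mean(axis=0)                       # alpha_i^+
    mus_new = (w.T @ X) / w.sum(axis=0)[:, None]      # mu_i^+
    Sigmas_new = np.empty_like(Sigmas)
    for i in range(m):
        Xc = X - mus_new[i]
        Sigmas_new[i] = (w[:, i, None] * Xc).T @ Xc / w[:, i].sum()   # Sigma_i^+
    return alphas_new, mus_new, Sigmas_new
```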
Slide 6/18: EM Iterations Demo

A Univariate Normal Mixture:

    p_i(x \mid \phi_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-(x - \mu_i)^2/(2\sigma_i^2)}  for i = 1, ..., 5.

Sample of 100,000 observations.

    [\alpha_1, ..., \alpha_5] = [.2, .3, .3, .1, .1],
    [\mu_1, ..., \mu_5] = [0, 1, 2, 3, 4],
    [\sigma_1^2, ..., \sigma_5^2] = [.2, 2, .5, .1, .1].

EM iterations on the means:

    \mu_i^{+} = \left\{ \sum_{k=1}^{N} x_k\, \frac{\alpha_i\, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \right\} \Big/ \left\{ \sum_{k=1}^{N} \frac{\alpha_i\, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \right\}.

[Figures: population mixture density with the EM iterates on the means; log residual norm vs. iteration number.]
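The demo setup above can be reproduced roughly as follows. This sketch uses the stated mixture parameters; the starting guess, stopping rule, and random seed are illustrative choices, not values from the talk.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alphas = np.array([.2, .3, .3, .1, .1])
mus_true = np.array([0., 1., 2., 3., 4.])
sigmas = np.sqrt([.2, 2., .5, .1, .1])

# Draw a sample of 100,000 observations from the mixture.
N = 100_000
labels = rng.choice(5, size=N, p=alphas)
x = rng.normal(mus_true[labels], sigmas[labels])

# EM on the means only (proportions and variances held fixed),
# starting from a rough initial guess.
mus = np.array([-1., 0., 1., 2., 3.])
for it in range(200):
    dens = alphas * norm.pdf(x[:, None], loc=mus, scale=sigmas)   # (N, 5)
    w = dens / dens.sum(axis=1, keepdims=True)
    mus_new = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    if np.linalg.norm(mus_new - mus) < 1e-8:
        break
    mus = mus_new
```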
Slide 7/18: Anderson Acceleration

Derived from a method of D. G. Anderson, Iterative procedures for nonlinear integral equations, J. Assoc. Comput. Machinery, 12 (1965).

Consider a fixed-point iteration x^+ = g(x), g : R^n -> R^n.

Anderson Acceleration: Given x_0 and mmax >= 1, set x_1 = g(x_0).
Iterate: For k = 1, 2, ...
  Set m_k = min{mmax, k}.
  Set F_k = (f_{k-m_k}, ..., f_k), where f_i = g(x_i) - x_i.
  Solve \min_{\alpha \in \mathbb{R}^{m_k+1}} \|F_k \alpha\|_2 subject to \sum_{i=0}^{m_k} \alpha_i = 1.
  Set x_{k+1} = \sum_{i=0}^{m_k} \alpha_i\, g(x_{k-m_k+i}).
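A minimal sketch of this scheme (illustrative names, not code from the talk): the constrained least-squares problem is solved by eliminating the last weight. Applied to EM, `g` would be the map taking the current parameter vector to the parameters produced by one EM sweep.

```python
import numpy as np

def anderson_accelerate(g, x0, mmax=5, maxit=100, tol=1e-10):
    """Anderson acceleration for the fixed-point iteration x <- g(x).

    x0 is a 1-D parameter vector.  `g`, `mmax`, `maxit`, and `tol` are
    illustrative names; this is a sketch of the algorithm on the slide.
    """
    x = np.asarray(x0, dtype=float)
    X_hist = [x]            # iterates x_i
    G_hist = [g(x)]         # g(x_i)
    x = G_hist[0]           # x_1 = g(x_0)
    for k in range(1, maxit + 1):
        X_hist.append(x)
        G_hist.append(g(x))
        f = [gi - xi for gi, xi in zip(G_hist, X_hist)]   # residuals f_i = g(x_i) - x_i
        if np.linalg.norm(f[-1]) < tol:
            return x
        mk = min(mmax, k)
        F = np.column_stack(f[-(mk + 1):])          # f_{k-mk}, ..., f_k
        G = np.column_stack(G_hist[-(mk + 1):])     # g(x_{k-mk}), ..., g(x_k)
        # Solve min ||F a||_2 s.t. sum(a) = 1 by eliminating the last weight.
        dF = F[:, :-1] - F[:, [-1]]
        gamma, *_ = np.linalg.lstsq(dF, -F[:, -1], rcond=None)
        a = np.append(gamma, 1.0 - gamma.sum())
        x = G @ a                                   # x_{k+1} = sum_i a_i g(x_{k-mk+i})
        # Keep only the most recent mmax+1 entries to bound storage.
        X_hist, G_hist = X_hist[-(mmax + 1):], G_hist[-(mmax + 1):]
    return x
```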
Slide 8/18: EM Iterations Demo (cont.)

Same univariate normal mixture and means-only EM iteration as on Slide 6: sample of 100,000 observations, [\alpha_1, ..., \alpha_5] = [.2, .3, .3, .1, .1], [\mu_1, ..., \mu_5] = [0, 1, 2, 3, 4], [\sigma_1^2, ..., \sigma_5^2] = [.2, 2, .5, .1, .1].

[Figures: population mixture density with the EM iterates on the means; log residual norm vs. iteration number.]
Slide 9/18: EM Convergence and Separation

Redner-W (1984): For mixture densities, the convergence is linear and depends on the separation of the component populations:

- well-separated (fast convergence) if, whenever i ≠ j,

      \frac{p_i(x \mid \phi_i^{*})}{p(x \mid \Phi^{*})} \cdot \frac{p_j(x \mid \phi_j^{*})}{p(x \mid \Phi^{*})} \approx 0  for all x \in \mathbb{R}^n;

- poorly separated (slow convergence) if, for some i ≠ j,

      \frac{p_i(x \mid \phi_i^{*})}{p(x \mid \Phi^{*})} \approx \frac{p_j(x \mid \phi_j^{*})}{p(x \mid \Phi^{*})}  for all x \in \mathbb{R}^n.
Slide 10/18: Example: EM Convergence and Separation

A Univariate Normal Mixture:

    p_i(x \mid \phi_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-(x - \mu_i)^2/(2\sigma_i^2)}  for i = 1, ..., 3.

EM iterations on the means:

    \mu_i^{+} = \left\{ \sum_{k=1}^{N} x_k\, \frac{\alpha_i\, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \right\} \Big/ \left\{ \sum_{k=1}^{N} \frac{\alpha_i\, p_i(x_k \mid \phi_i)}{p(x_k \mid \Phi)} \right\}.

Sample of 100,000 observations. [\alpha_1, \alpha_2, \alpha_3] = [.3, .3, .4], [\sigma_1^2, \sigma_2^2, \sigma_3^2] = [1, 1, 1].
[\mu_1, \mu_2, \mu_3] = [0, 2, 4], [0, 1, 2], [0, .5, 1].

[Figure: log residual norm vs. iteration number for the three mean configurations.]
Slide 11/18: Experiments with Multivariate Normal Mixtures

Experiment with Anderson acceleration applied to the EM iteration for normal mixtures (the α_i^+, µ_i^+, Σ_i^+ updates of Slide 5), for i = 1, ..., m.

Assume m is known. Ultimate interest: very large N.
Slide 12/18: Experiments with Multivariate Normal Mixtures (cont.)

Two issues:

Good initial guess? Use K-means:
- Fast clustering algorithm. Usually gives good results.
- Apply several times to random subsets of the sample.
- Choose the clustering with minimal sum of within-class distances.
- Use the proportions, means, and covariance matrices of the clusters as the initial guess.

Preserving constraints? Iterate on:
- α_i, i = 1, ..., m;
- the Cholesky factors of each Σ_i.
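A rough sketch of this initialization and constraint-preserving parameterization, assuming scikit-learn's KMeans as the clustering routine (the talk does not specify its K-means implementation); the helper name, subset size, and restart count are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_initial_guess(X, m, n_restarts=5, subset_size=10_000, seed=0):
    """Initial (alphas, mus, Sigmas, Ls) from K-means on random subsets.

    Keeps the clustering with the smallest within-class sum of squares.
    Illustrative sketch; names and defaults are not from the talk.
    """
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        idx = rng.choice(len(X), size=min(subset_size, len(X)), replace=False)
        km = KMeans(n_clusters=m, n_init=1,
                    random_state=int(rng.integers(1 << 31))).fit(X[idx])
        if best is None or km.inertia_ < best.inertia_:
            best = km
    labels = best.predict(X)
    alphas = np.bincount(labels, minlength=m) / len(X)
    mus = np.array([X[labels == i].mean(axis=0) for i in range(m)])
    Sigmas = np.array([np.cov(X[labels == i].T) for i in range(m)])
    # Iterating on Cholesky factors keeps each covariance symmetric positive
    # definite under acceleration: Sigma_i is reconstructed as L_i L_i^T.
    Ls = np.array([np.linalg.cholesky(S) for S in Sigmas])
    return alphas, mus, Sigmas, Ls
```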
Slide 13/18: Experiments with Generated Data

All computing in MATLAB. Mixtures with m = 5 subpopulations.

Generated data in R^d for d = 2, 5, 10, 15, 20:
- For each d, randomly generated 100 true parameter sets {α_i, µ_i, Σ_i}_{i=1}^{5}.
- For each {α_i, µ_i, Σ_i}_{i=1}^{5}, randomly generated a sample of size N = 1,000,000.

Compared (unaccelerated) EM with EM+AA for mmax = 5, 10, 15, 20, 25, 30.
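One trial of such an experiment might be generated as sketched below. The distributions used here to draw the "true" parameters are assumptions made for illustration; the talk does not specify its generator.

```python
import numpy as np

def random_mixture_and_sample(d, m=5, N=1_000_000, seed=0):
    """Randomly draw 'true' (alphas, mus, Sigmas) and a sample of size N.

    Parameter distributions below are illustrative guesses, not the
    experiment's actual generator.
    """
    rng = np.random.default_rng(seed)
    alphas = rng.dirichlet(np.ones(m))                     # mixing proportions
    mus = rng.uniform(-5, 5, size=(m, d))                  # component means
    A = rng.standard_normal((m, d, d))
    Sigmas = A @ A.transpose(0, 2, 1) + 0.5 * np.eye(d)    # SPD covariances

    labels = rng.choice(m, size=N, p=alphas)
    X = np.empty((N, d))
    for i in range(m):
        mask = labels == i
        X[mask] = rng.multivariate_normal(mus[i], Sigmas[i], size=mask.sum())
    return alphas, mus, Sigmas, X
```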
Slide 14/18: Experiments with Generated Data (cont.)

A look at failures. Failure modes:
- failure to converge within 300 iterations;
- \sum_{k=1}^{N} \alpha_i\, p_i(x_k)/p(x_k) = 0 for some i.

[Table: failure totals by mmax.]

There were trials in which all methods failed, 26 trials in which EM failed and EM+AA succeeded for at least one mmax, 15 trials in which EM failed and EM+AA succeeded for all mmax, 20 trials in which EM succeeded and EM+AA failed for all mmax, and 21 trials in which EM succeeded and EM+AA failed for at least one mmax.
Slide 15/18: Experiments with Generated Data (cont.)

Performance profiles (Dolan-Moré, 2002) for (unaccelerated) EM (mmax = 0) and EM+AA with mmax = 5 over all trials.

[Figures: performance profiles for iteration numbers (left) and run times (right).]
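For reference, a Dolan-Moré performance profile can be computed roughly as follows; `costs` is an illustrative name for a trials-by-solvers array of iteration counts or run times, and the sketch assumes at least one solver succeeds on each trial.

```python
import numpy as np

def performance_profile(costs, taus):
    """Dolan-More performance profiles.

    costs: (n_trials, n_solvers) array of a cost measure (iterations or
           run time); np.inf marks a failed trial.  Illustrative sketch;
           assumes at least one solver succeeds on every trial.
    Returns rho: (len(taus), n_solvers), the fraction of trials in which
    each solver is within a factor tau of the best solver on that trial.
    """
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)    # best solver per trial
    ratios = costs / best                      # performance ratios (inf for failures)
    rho = np.array([(ratios <= tau).mean(axis=0) for tau in taus])
    return rho
```

Plotting rho[:, s] against taus gives the profile curve for solver s; the EM and EM+AA curves on this slide are profiles of this kind.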
Slide 16/18: An Experiment with Real Data

- Remotely sensed data from near Tollhouse, CA. (Thanks to Brett Bader, Digital Globe.)
- N = ... observations of 16-dimensional multispectral data.
- Modeled with a mixture of m = 3 multivariate normals.
- Applied (unaccelerated) EM and EM+AA with mmax = 5, 10, 15, 20, 25, 30.
Slide 17/18: An Experiment with Real Data (cont.)

[Figures: log residual norms vs. iteration numbers (left); Bayes classification of the data based on the MLE (right).]
Slide 18/18: In Conclusion...

Anderson acceleration is a promising tool for accelerating the EM algorithm that may improve both robustness and efficiency.

Future work:
- Expand the generated-data experiments to include more trials, larger data sets, well-controlled separation experiments, partially labeled samples, and other parametric PDF forms.
- Look for more data from real applications.
More information