Mixture Models. Michael Kuhn

Size: px
Start display at page:

Download "Mixture Models. Michael Kuhn"

Transcription

1 Mixture Models Michael Kuhn

2 Objec<ves Students will be able to explain what is a normal mixture model Students will be able to explain steps of the Expecta<on-Maximiza<on Algorithm Describe methods to select the number of components apply methods from mclust to their own data

3 Mixture Models Probabilis<c model for subpopula<ons with different probability density func<ons A probability density func<on generated by summing together mul<ple probability density func<ons gx y(x) = a i f(x) i=1 p

4 Mixture Models Probabilis<c model for subpopula<ons with different probability density func<ons A probability density func<on generated by summing together mul<ple probability density func<ons number of components y(x) = GX a i f(x) mixing coefficient (a i sum to 1) mixture model g=1 p component density func<on

5 Interpreta<on of Mixture Models Arise when there is a latent variable indica<ng group membership observa<ons (x i,z i ) z i drawn from {1,,G} x i drawn from f z ( θ z ) Semi-parametric method of es<ma<ng density probability density x

6 Normal (Gaussian) Mixture Models Normal distribu<ons have convenient X proper<es univariate X 1 (x µ) 2 f(x ) = (x µ, )= p exp p mul<variate f(x µ, ) = exp 1 2 (x µ)t 1 (x µ) p (2 )k

7 Diversity of Normal Mixture Models Examples from McLachlan & Peel (2004)

8 Diversity of Normal Mixture Models Examples from McLachlan & Peel (2004)

9 X X p Probability p p(x = 3) = f (3)dx probability density f (x) x= x

10 X p Likelihood p p(x 1,x 2,x 3 )=f (x 1 )f (x 2 )f (x 3 )dx 1 dx 2 dx 3 probability density f (x) x

11 p X p Likelihood likelihood( x 1,x 2,x 3 )=f (x 1 )f (x 2 )f (x 3 ) probability density f (x) x

12 X p Likelihood log likelihood( x 1,x 2,x 3 ) = log f (x 1 ) + log f (x 2 ) + log f (x 3 ) probability density f (x) X x

13 X Likelihood p L( x 1,x 2,...,x N )= NX log f (x i ) i=1 probability density f (x) X x

14 X Likelihoods p L( x 1,x 2,...,x N )= NX log f (x i ) i=1 probability density f (x) X x

15 p X p Likelihoods p For a normal distribu<on (µ, 2 ) the parameters are = {µ, 2 } What is the name probability density f (x) for the maximum likelihood es<mator for the center of a normal distribu<ons? x

16 Normal Mixture Models The maximum-likelihood (ML) es<mates for μ and Σ can be easily es<mated. unbiased es<mator of mean unbiased es<mator of covariance matrix x = 1 n nx i=1 X Q = 1 n 1 x i X nx (x i x)(x i x) T i=1

17 Expecta<on Maximiza<on algorithm Maximum-likelihood es<ma<on (MLE) method for a case where some variables are hidden In the case of mixture models, the hidden variables are the group a par<cular point belongs to

18 Expecta<on Maximiza<on algorithm E step: creates a func<on for the Expecta<on of the log-likelihood using the parameters from the M step M step: computes the parameters that maximizes the func<on from the E step

19 Expecta<on Maximiza<on algorithm X E step: creates a func<on for the Expecta<on of the log-likelihood using the parameters from the M step M step: computes the parameters that maximizes the func<on from the E step Q( (t) )=E Z X, (t) [L( ; X, Z)] Formally, (t+1) = arg max X Q( (t) )

20 Expecta<on Maximiza<on algorithm group 1 group 2 x1 x3 x2 x4 x5 For points {x i } the indicator variables variables z ig = 1 if the i th point is in the g th cluster and 0 otherwise In general z ig are not known, so we es<mate the group assignments probabilis<cally

21 Likelihood for EM X Observed Data Likelihood Data are X={x 1, x 2, x N } L obs. ( X) = Complete Data Likelihood X " nx G # X X log a i f(x i g ) i=1 g=1 X X " X Data are X,Z = {(x 1,z 1 ), (x 2,z 2 ), (x N,z N )} L compl. ( X, Z) = nx i=1 g=1 # GX z ig log a i f(x i g )

22 u t X M step 1 = 2 = a 1 = 1 n a 2 = 1 n X 2 component model y(x) =a 1 (x; µ 1, 1)+au t 2 (x; µ X 2, 2) nx Pr(1 x i ) i=1 nx Pr(2 x i ) i=1 µ 1 = 1 na 1 X xpr(1 xi ) µ 2 = 1 X xpr(2 xi ) na 2 v u t 1 nx (x µ 1 ) na 2 Pr(1 x i ) 1 v u t 1 na 2 i=1 nx (x µ 2 ) 2 Pr(2 x i ) i=1 E step v u t X Pr(g x i )= (x i ; µ j, j)/y(x i )

23 An example of the E-M algorithm The intrinsic distribu<on of the simulated data density x x

24 E M algorithm: ini<al guess grp1 grp2 n = length(x) grp1 = which(x < 1.0) grp2 = which(x >= 1.0) z.indicator1 = rep(0,100) z.indicator2 = rep(0,100) z.indicator1[grp1] = 1.0 z.indicator2[grp2] = 1.0 Frequency What methods would you as an ini<al guess for clustering in a mul<variate dataset? x

25 X M step 1 = 2 = a 1 = 1 n a 2 = 1 n nx Pr(1 x i ) i=1 nx Pr(2 x i ) i=1 µ 1 = 1 na 1 X xpr(1 xi ) µ 2 = 1 X xpr(2 xi ) na 2 v u t 1 nx (x µ 1 ) na 2 Pr(1 x i ) 1 v u t 1 na 2 i=1 nx (x µ 2 ) 2 Pr(2 x i ) i=

26 Number of Components Use methods for model selec<on Penalized likelihoods Log-likelihood: (stricter) (less strict) Individual values of the BIC and AIC by themselves are meaningless. They are only useful for comparison between models.

27 BIC BIC E V Number of components

28 mclust EM algorithm for fiung models BIC for model selec<on Op<onal, restric<ve or unrestric<ve priors

29 Open R and the notes from the school webpage.

30 Bayesian Methods MCMC methods useful for parameter es<ma<on and model selec<on Bayesian methods are par<cularly useful for evalua<ng the effect of uncertain<es on model parameters

31 Problems Interpreta<on of results Mul<modality of Density Nonintui<ve group membership Fiung Mul<modality of likelihood Infinite likelihood Irregularity Noniden<fability of components Slow convergence

32 Singulari<es The likelihood of mixture models is unbound at the edge of parameter space A component can shrink on a given point, the mean becomes the point, and the variance goes to zero. Mixture model codes must implement strategies to avoid this effect

33 Noniden<fiability Permuta<on of labels A challenge for implementa<ons of MCMC Point process representa<on σ Components with equal means and covariance matrices Interpreta<on of the likelihood ra<o sta<s<c is not straighxorward μ

34 Rela<onship to Other Cluster Analysis Algorithms Normal models are a soz classifier Condi<onal probabili<es es<mated Classifica<on based on es<mated probabil<es Hard classifica<on methods K-means DBSCAN

35 Mo<va<ons for Use of Mixture Models in Astronomy

36 Inves<ga<ng Multmodality GRB dura<ons from BATSE Mukherjee et al. (1998) log GRB Dura<on (T90) [sec]

37 Inves<ga<ng Mul<modality BATSE GRBs Mukherjee et al. (1998)

38 Removal of Contaminants HST observa<ons of galaxies GCs Background galaxies Field stars Mixture model approach to separate GCs and contaminants Distribu<on of galaxies obtained from an offtarget field Jordán et al. (2009)

39 Red and Blue Galaxies Well-known bimodality to galaxy color distribu<ons Color distribu<ons overlap Simple color cuts lead to bias when examining the galaxy mass func<on for red and blue galaxies Taylor et al. (2015)

40 Red and Blue Galaxies

41 Red and Blue Galaxies >23,000 galaxies from GAMA 40 parameter model for red and blue galaxies astrophysically mo<vated MCMC Expected shape to mass func<ons Mixture models reveal a longer distribu<on of red galaxies that merge with blue popula<on at low masses Taylor et al. (2015)

42 Star Cluster Modeling In star-forming regions young stars are ozen distributed in a number of groups NGC 6357 IRAC NGC 6357 Stellar Members

43 Star Cluster Modeling Isothermal King Model Ellipsoid Multivariate Normal Log surface density [stars pc 2 ] Length Scale [pc] Adap<vely smoothed data are shown in black Models are shown in gray

44 Comparison of Methods Mixture Model Classification Minimum spanning tree method Pruned MST y x Two simulated spherical clusters 1) 300 points 2) 700 points

45 Comparison of Methods Mixture Model Classification Minimum spanning tree method Pruned MST y x Two simulated spherical clusters 1) 300 points 2) 700 points

46 Extreme Deconvolu<on Technique described by Bovy et al. (2011) Deals with heteroscedas<c errors Makes use of mixture models to approximate arbitrary density func<ons

47 Examples in R

Latent Dirichlet Alloca/on

Latent Dirichlet Alloca/on Latent Dirichlet Alloca/on Blei, Ng and Jordan ( 2002 ) Presented by Deepak Santhanam What is Latent Dirichlet Alloca/on? Genera/ve Model for collec/ons of discrete data Data generated by parameters which

More information

Machine Learning and Data Mining. Bayes Classifiers. Prof. Alexander Ihler

Machine Learning and Data Mining. Bayes Classifiers. Prof. Alexander Ihler + Machine Learning and Data Mining Bayes Classifiers Prof. Alexander Ihler A basic classifier Training data D={x (i),y (i) }, Classifier f(x ; D) Discrete feature vector x f(x ; D) is a con@ngency table

More information

Model-based cluster analysis: a Defence. Gilles Celeux Inria Futurs

Model-based cluster analysis: a Defence. Gilles Celeux Inria Futurs Model-based cluster analysis: a Defence Gilles Celeux Inria Futurs Model-based cluster analysis Model-based clustering (MBC) consists of assuming that the data come from a source with several subpopulations.

More information

Data Preprocessing. Cluster Similarity

Data Preprocessing. Cluster Similarity 1 Cluster Similarity Similarity is most often measured with the help of a distance function. The smaller the distance, the more similar the data objects (points). A function d: M M R is a distance on M

More information

An Introduc+on to Sta+s+cs and Machine Learning for Quan+ta+ve Biology. Anirvan Sengupta Dept. of Physics and Astronomy Rutgers University

An Introduc+on to Sta+s+cs and Machine Learning for Quan+ta+ve Biology. Anirvan Sengupta Dept. of Physics and Astronomy Rutgers University An Introduc+on to Sta+s+cs and Machine Learning for Quan+ta+ve Biology Anirvan Sengupta Dept. of Physics and Astronomy Rutgers University Why Do We Care? Necessity in today s labs Principled approach:

More information

Mul$variate clustering. Astro 585 Spring 2013

Mul$variate clustering. Astro 585 Spring 2013 Mul$variate clustering Astro 585 Spring 2013 Astronomical context Astronomers have constructed classifica

More information

Machine Learning for Data Science (CS4786) Lecture 12

Machine Learning for Data Science (CS4786) Lecture 12 Machine Learning for Data Science (CS4786) Lecture 12 Gaussian Mixture Models Course Webpage : http://www.cs.cornell.edu/courses/cs4786/2016fa/ Back to K-means Single link is sensitive to outliners We

More information

CS 6140: Machine Learning Spring 2017

CS 6140: Machine Learning Spring 2017 CS 6140: Machine Learning Spring 2017 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Logis@cs Assignment

More information

CS 6140: Machine Learning Spring What We Learned Last Week. Survey 2/26/16. VS. Model

CS 6140: Machine Learning Spring What We Learned Last Week. Survey 2/26/16. VS. Model Logis@cs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Assignment

More information

CS 6140: Machine Learning Spring 2016

CS 6140: Machine Learning Spring 2016 CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa?on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Logis?cs Assignment

More information

STAD68: Machine Learning

STAD68: Machine Learning STAD68: Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! h0p://www.cs.toronto.edu/~rsalakhu/ Lecture 1 Evalua;on 3 Assignments worth 40%. Midterm worth 20%. Final

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and

More information

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012 Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood

More information

Quan&fying Uncertainty. Sai Ravela Massachuse7s Ins&tute of Technology

Quan&fying Uncertainty. Sai Ravela Massachuse7s Ins&tute of Technology Quan&fying Uncertainty Sai Ravela Massachuse7s Ins&tute of Technology 1 the many sources of uncertainty! 2 Two days ago 3 Quan&fying Indefinite Delay 4 Finally 5 Quan&fying Indefinite Delay P(X=delay M=

More information

Density Estimation. Seungjin Choi

Density Estimation. Seungjin Choi Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/

More information

CS 6140: Machine Learning Spring What We Learned Last Week 2/26/16

CS 6140: Machine Learning Spring What We Learned Last Week 2/26/16 Logis@cs CS 6140: Machine Learning Spring 2016 Instructor: Lu Wang College of Computer and Informa@on Science Northeastern University Webpage: www.ccs.neu.edu/home/luwang Email: luwang@ccs.neu.edu Sign

More information

Maximum Likelihood Estimation. only training data is available to design a classifier

Maximum Likelihood Estimation. only training data is available to design a classifier Introduction to Pattern Recognition [ Part 5 ] Mahdi Vasighi Introduction Bayesian Decision Theory shows that we could design an optimal classifier if we knew: P( i ) : priors p(x i ) : class-conditional

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

More information

DART Tutorial Sec'on 1: Filtering For a One Variable System

DART Tutorial Sec'on 1: Filtering For a One Variable System DART Tutorial Sec'on 1: Filtering For a One Variable System UCAR The Na'onal Center for Atmospheric Research is sponsored by the Na'onal Science Founda'on. Any opinions, findings and conclusions or recommenda'ons

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample

More information

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project

Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore

More information

Logis&c Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Logis&c Regression. Robot Image Credit: Viktoriya Sukhanova 123RF.com Logis&c Regression These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these

More information

Bias/variance tradeoff, Model assessment and selec+on

Bias/variance tradeoff, Model assessment and selec+on Applied induc+ve learning Bias/variance tradeoff, Model assessment and selec+on Pierre Geurts Department of Electrical Engineering and Computer Science University of Liège October 29, 2012 1 Supervised

More information

Computer Vision. Pa0ern Recogni4on Concepts Part I. Luis F. Teixeira MAP- i 2012/13

Computer Vision. Pa0ern Recogni4on Concepts Part I. Luis F. Teixeira MAP- i 2012/13 Computer Vision Pa0ern Recogni4on Concepts Part I Luis F. Teixeira MAP- i 2012/13 What is it? Pa0ern Recogni4on Many defini4ons in the literature The assignment of a physical object or event to one of

More information

CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on

CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on CSCI 360 Introduc/on to Ar/ficial Intelligence Week 2: Problem Solving and Op/miza/on Professor Wei-Min Shen Week 13.1 and 13.2 1 Status Check Extra credits? Announcement Evalua/on process will start soon

More information

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf 1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

More information

Augmented Likelihood Estimators for Mixture Models

Augmented Likelihood Estimators for Mixture Models Augmented Likelihood Estimators for Mixture Models Markus Haas Jochen Krause Marc S. Paolella Swiss Banking Institute, University of Zurich What is mixture degeneracy? mixtures under study are finite convex

More information

Introduc)on to Ar)ficial Intelligence

Introduc)on to Ar)ficial Intelligence Introduc)on to Ar)ficial Intelligence Lecture 13 Approximate Inference CS/CNS/EE 154 Andreas Krause Bayesian networks! Compact representa)on of distribu)ons over large number of variables! (OQen) allows

More information

Founda'ons of Large- Scale Mul'media Informa'on Management and Retrieval. Lecture #3 Machine Learning. Edward Chang

Founda'ons of Large- Scale Mul'media Informa'on Management and Retrieval. Lecture #3 Machine Learning. Edward Chang Founda'ons of Large- Scale Mul'media Informa'on Management and Retrieval Lecture #3 Machine Learning Edward Y. Chang Edward Chang Founda'ons of LSMM 1 Edward Chang Foundations of LSMM 2 Machine Learning

More information

Lecture 4: Probabilistic Learning

Lecture 4: Probabilistic Learning DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods

More information

COM336: Neural Computing

COM336: Neural Computing COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk

More information

Non-Parametric Bayes

Non-Parametric Bayes Non-Parametric Bayes Mark Schmidt UBC Machine Learning Reading Group January 2016 Current Hot Topics in Machine Learning Bayesian learning includes: Gaussian processes. Approximate inference. Bayesian

More information

Class Notes. Examining Repeated Measures Data on Individuals

Class Notes. Examining Repeated Measures Data on Individuals Ronald Heck Week 12: Class Notes 1 Class Notes Examining Repeated Measures Data on Individuals Generalized linear mixed models (GLMM) also provide a means of incorporang longitudinal designs with categorical

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1) HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter

More information

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2

Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, Jeffreys priors. exp 1 ) p 2 Stat260: Bayesian Modeling and Inference Lecture Date: February 10th, 2010 Jeffreys priors Lecturer: Michael I. Jordan Scribe: Timothy Hunter 1 Priors for the multivariate Gaussian Consider a multivariate

More information

More on Unsupervised Learning

More on Unsupervised Learning More on Unsupervised Learning Two types of problems are to find association rules for occurrences in common in observations (market basket analysis), and finding the groups of values of observational data

More information

CSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015

CSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015 CSE446: Linear Regression Regulariza5on Bias / Variance Tradeoff Winter 2015 Luke ZeElemoyer Slides adapted from Carlos Guestrin Predic5on of con5nuous variables Billionaire says: Wait, that s not what

More information

A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University

A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University A6523 Modeling, Inference, and Mining Jim Cordes, Cornell University Lecture 23 Birthday problem comments Construc

More information

Lecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1

Lecture 13: Data Modelling and Distributions. Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Lecture 13: Data Modelling and Distributions Intelligent Data Analysis and Probabilistic Inference Lecture 13 Slide No 1 Why data distributions? It is a well established fact that many naturally occurring

More information

Gaussian Mixture Models

Gaussian Mixture Models Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some

More information

Gaussian Mixture Models

Gaussian Mixture Models Gaussian Mixture Models David Rosenberg, Brett Bernstein New York University April 26, 2017 David Rosenberg, Brett Bernstein (New York University) DS-GA 1003 April 26, 2017 1 / 42 Intro Question Intro

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

Generative Model (Naïve Bayes, LDA)

Generative Model (Naïve Bayes, LDA) Generative Model (Naïve Bayes, LDA) IST557 Data Mining: Techniques and Applications Jessie Li, Penn State University Materials from Prof. Jia Li, sta3s3cal learning book (Has3e et al.), and machine learning

More information

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions

Lecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K

More information

Regression Clustering

Regression Clustering Regression Clustering In regression clustering, we assume a model of the form y = f g (x, θ g ) + ɛ g for observations y and x in the g th group. Usually, of course, we assume linear models of the form

More information

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014. Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)

More information

Unsupervised Learning: K- Means & PCA

Unsupervised Learning: K- Means & PCA Unsupervised Learning: K- Means & PCA Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a func>on f : X Y But, what if we don t have labels? No labels = unsupervised learning

More information

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a

Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Bayesian Decision Theory

Bayesian Decision Theory Bayesian Decision Theory Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551, Fall 2017 c 2017, Selim Aksoy (Bilkent University) 1 / 46 Bayesian

More information

Machine Learning & Data Mining CS/CNS/EE 155. Lecture 8: Hidden Markov Models

Machine Learning & Data Mining CS/CNS/EE 155. Lecture 8: Hidden Markov Models Machine Learning & Data Mining CS/CNS/EE 155 Lecture 8: Hidden Markov Models 1 x = Fish Sleep y = (N, V) Sequence Predic=on (POS Tagging) x = The Dog Ate My Homework y = (D, N, V, D, N) x = The Fox Jumped

More information

Lecture 6: Gaussian Mixture Models (GMM)

Lecture 6: Gaussian Mixture Models (GMM) Helsinki Institute for Information Technology Lecture 6: Gaussian Mixture Models (GMM) Pedram Daee 3.11.2015 Outline Gaussian Mixture Models (GMM) Models Model families and parameters Parameter learning

More information

Experimental Designs for Planning Efficient Accelerated Life Tests

Experimental Designs for Planning Efficient Accelerated Life Tests Experimental Designs for Planning Efficient Accelerated Life Tests Kangwon Seo and Rong Pan School of Compu@ng, Informa@cs, and Decision Systems Engineering Arizona State University ASTR 2015, Sep 9-11,

More information

Sta$s$cal sequence recogni$on

Sta$s$cal sequence recogni$on Sta$s$cal sequence recogni$on Determinis$c sequence recogni$on Last $me, temporal integra$on of local distances via DP Integrates local matches over $me Normalizes $me varia$ons For cts speech, segments

More information

Bayesian networks Lecture 18. David Sontag New York University

Bayesian networks Lecture 18. David Sontag New York University Bayesian networks Lecture 18 David Sontag New York University Outline for today Modeling sequen&al data (e.g., =me series, speech processing) using hidden Markov models (HMMs) Bayesian networks Independence

More information

Machine Learning & Data Mining CS/CNS/EE 155. Lecture 11: Hidden Markov Models

Machine Learning & Data Mining CS/CNS/EE 155. Lecture 11: Hidden Markov Models Machine Learning & Data Mining CS/CNS/EE 155 Lecture 11: Hidden Markov Models 1 Kaggle Compe==on Part 1 2 Kaggle Compe==on Part 2 3 Announcements Updated Kaggle Report Due Date: 9pm on Monday Feb 13 th

More information

Expectation maximization

Expectation maximization Expectation maximization Subhransu Maji CMSCI 689: Machine Learning 14 April 2015 Motivation Suppose you are building a naive Bayes spam classifier. After your are done your boss tells you that there is

More information

Introduc)on to Ar)ficial Intelligence

Introduc)on to Ar)ficial Intelligence Introduc)on to Ar)ficial Intelligence Lecture 10 Probability CS/CNS/EE 154 Andreas Krause Announcements! Milestone due Nov 3. Please submit code to TAs! Grading: PacMan! Compiles?! Correct? (Will clear

More information

Mul$variate clustering & classifica$on with R

Mul$variate clustering & classifica$on with R Mul$variate clustering & classifica$on with R Eric Feigelson Penn State Cosmology on the Beach, 2014 Astronomical context Astronomers have constructed classifica@ons of celes@al objects for centuries:

More information

Different points of view for selecting a latent structure model

Different points of view for selecting a latent structure model Different points of view for selecting a latent structure model Gilles Celeux Inria Saclay-Île-de-France, Université Paris-Sud Latent structure models: two different point of views Density estimation LSM

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Computer Vision Group Prof. Daniel Cremers. 3. Regression

Computer Vision Group Prof. Daniel Cremers. 3. Regression Prof. Daniel Cremers 3. Regression Categories of Learning (Rep.) Learnin g Unsupervise d Learning Clustering, density estimation Supervised Learning learning from a training data set, inference on the

More information

Unsupervised learning (part 1) Lecture 19

Unsupervised learning (part 1) Lecture 19 Unsupervised learning (part 1) Lecture 19 David Sontag New York University Slides adapted from Carlos Guestrin, Dan Klein, Luke Ze@lemoyer, Dan Weld, Vibhav Gogate, and Andrew Moore Bayesian networks enable

More information

Machine Learning Lecture 5

Machine Learning Lecture 5 Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory

More information

Likelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University

Likelihood, MLE & EM for Gaussian Mixture Clustering. Nick Duffield Texas A&M University Likelihood, MLE & EM for Gaussian Mixture Clustering Nick Duffield Texas A&M University Probability vs. Likelihood Probability: predict unknown outcomes based on known parameters: P(x q) Likelihood: estimate

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Mathematical Formulation of Our Example

Mathematical Formulation of Our Example Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot

More information

Machine Learning for Signal Processing Bayes Classification and Regression

Machine Learning for Signal Processing Bayes Classification and Regression Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For

More information

Finite Singular Multivariate Gaussian Mixture

Finite Singular Multivariate Gaussian Mixture 21/06/2016 Plan 1 Basic definitions Singular Multivariate Normal Distribution 2 3 Plan Singular Multivariate Normal Distribution 1 Basic definitions Singular Multivariate Normal Distribution 2 3 Multivariate

More information

STATS 306B: Unsupervised Learning Spring Lecture 2 April 2

STATS 306B: Unsupervised Learning Spring Lecture 2 April 2 STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised

More information

The generative approach to classification. A classification problem. Generative models CSE 250B

The generative approach to classification. A classification problem. Generative models CSE 250B The generative approach to classification The generative approach to classification CSE 250B The learning process: Fit a probability distribution to each class, individually To classify a new point: Which

More information

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall

Machine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 Discriminative vs Generative Models Discriminative: Just learn a decision boundary between your

More information

Classification Methods II: Linear and Quadratic Discrimminant Analysis

Classification Methods II: Linear and Quadratic Discrimminant Analysis Classification Methods II: Linear and Quadratic Discrimminant Analysis Rebecca C. Steorts, Duke University STA 325, Chapter 4 ISL Agenda Linear Discrimminant Analysis (LDA) Classification Recall that linear

More information

PMR Learning as Inference

PMR Learning as Inference Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning

More information

Parametric Techniques Lecture 3

Parametric Techniques Lecture 3 Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to

More information

Regulatory Inferece from Gene Expression. CMSC858P Spring 2012 Hector Corrada Bravo

Regulatory Inferece from Gene Expression. CMSC858P Spring 2012 Hector Corrada Bravo Regulatory Inferece from Gene Expression CMSC858P Spring 2012 Hector Corrada Bravo 2 Graphical Model Let y be a vector- valued random variable Suppose some condi8onal independence proper8es hold for some

More information

Be#er Generaliza,on with Forecasts. Tom Schaul Mark Ring

Be#er Generaliza,on with Forecasts. Tom Schaul Mark Ring Be#er Generaliza,on with Forecasts Tom Schaul Mark Ring Which Representa,ons? Type: Feature- based representa,ons (state = feature vector) Quality 1: Usefulness for linear policies Quality 2: Generaliza,on

More information

Bayes Decision Theory

Bayes Decision Theory Bayes Decision Theory Minimum-Error-Rate Classification Classifiers, Discriminant Functions and Decision Surfaces The Normal Density 0 Minimum-Error-Rate Classification Actions are decisions on classes

More information

Mixtures of Gaussians continued

Mixtures of Gaussians continued Mixtures of Gaussians continued Machine Learning CSE446 Carlos Guestrin University of Washington May 17, 2013 1 One) bad case for k-means n Clusters may overlap n Some clusters may be wider than others

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Clustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning

Clustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades

More information

Parametric Techniques

Parametric Techniques Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure

More information

Probability and Structure in Natural Language Processing

Probability and Structure in Natural Language Processing Probability and Structure in Natural Language Processing Noah Smith, Carnegie Mellon University 2012 Interna@onal Summer School in Language and Speech Technologies Quick Recap Yesterday: Bayesian networks

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

Pattern Recognition. Parameter Estimation of Probability Density Functions

Pattern Recognition. Parameter Estimation of Probability Density Functions Pattern Recognition Parameter Estimation of Probability Density Functions Classification Problem (Review) The classification problem is to assign an arbitrary feature vector x F to one of c classes. The

More information

Sta$s$cal Significance Tes$ng In Theory and In Prac$ce

Sta$s$cal Significance Tes$ng In Theory and In Prac$ce Sta$s$cal Significance Tes$ng In Theory and In Prac$ce Ben Cartere8e University of Delaware h8p://ir.cis.udel.edu/ictir13tutorial Hypotheses and Experiments Hypothesis: Using an SVM for classifica$on will

More information

11. Learning graphical models

11. Learning graphical models Learning graphical models 11-1 11. Learning graphical models Maximum likelihood Parameter learning Structural learning Learning partially observed graphical models Learning graphical models 11-2 statistical

More information

COMS 4771 Lecture Course overview 2. Maximum likelihood estimation (review of some statistics)

COMS 4771 Lecture Course overview 2. Maximum likelihood estimation (review of some statistics) COMS 4771 Lecture 1 1. Course overview 2. Maximum likelihood estimation (review of some statistics) 1 / 24 Administrivia This course Topics http://www.satyenkale.com/coms4771/ 1. Supervised learning Core

More information

4 Gaussian Mixture Models

4 Gaussian Mixture Models 4 Gaussian Mixture Models Once you have a collection of feature vectors you will need to describe their distribution. You will do this using a Gaussian Mixture Model. The GMM comprises a collection of

More information

Lecture : Probabilistic Machine Learning

Lecture : Probabilistic Machine Learning Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning

More information

Least Squares Parameter Es.ma.on

Least Squares Parameter Es.ma.on Least Squares Parameter Es.ma.on Alun L. Lloyd Department of Mathema.cs Biomathema.cs Graduate Program North Carolina State University Aims of this Lecture 1. Model fifng using least squares 2. Quan.fica.on

More information

Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction

Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Degenerate Expectation-Maximization Algorithm for Local Dimension Reduction Xiaodong Lin 1 and Yu Zhu 2 1 Statistical and Applied Mathematical Science Institute, RTP, NC, 27709 USA University of Cincinnati,

More information

CSCI-567: Machine Learning (Spring 2019)

CSCI-567: Machine Learning (Spring 2019) CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March

More information

LECTURE NOTE #3 PROF. ALAN YUILLE

LECTURE NOTE #3 PROF. ALAN YUILLE LECTURE NOTE #3 PROF. ALAN YUILLE 1. Three Topics (1) Precision and Recall Curves. Receiver Operating Characteristic Curves (ROC). What to do if we do not fix the loss function? (2) The Curse of Dimensionality.

More information

Bayesian Decision and Bayesian Learning

Bayesian Decision and Bayesian Learning Bayesian Decision and Bayesian Learning Ying Wu Electrical Engineering and Computer Science Northwestern University Evanston, IL 60208 http://www.eecs.northwestern.edu/~yingwu 1 / 30 Bayes Rule p(x ω i

More information