Machine Learning. Clustering 1. Hamid Beigy. Sharif University of Technology. Fall 1395


1 Machine Learning: Clustering 1. Hamid Beigy, Sharif University of Technology, Fall 1395. Some slides are taken from P. Rai's slides.

2 Table of contents: 1 Introduction; 2 K-means clustering; 3 Hierarchical Clustering; 4 Model-based clustering; 5 Cluster validation and assessment.

3 Table of contents: 1 Introduction; 2 K-means clustering; 3 Hierarchical Clustering; 4 Model-based clustering; 5 Cluster validation and assessment.

4 Introduction
Clustering is the process of grouping a set of data objects into multiple groups or clusters so that objects within a cluster have high similarity, but are very dissimilar to objects in other clusters. Dissimilarities and similarities are assessed based on the feature values describing the objects and often involve distance measures. Clustering is usually an unsupervised learning problem: we are given N unlabeled examples X = {x_1, ..., x_N}, x_i \in R^D, and the number K of desired partitions. Assume there are K clusters C_1, ..., C_K; the goal is to group the examples into K homogeneous partitions. (Picture courtesy: Data Clustering: 50 Years Beyond K-Means, A. K. Jain, 2008.)

5 Introduction
A good clustering is one that achieves high within-cluster similarity and low inter-cluster similarity.
Applications of clustering: document/image/webpage clustering; image segmentation; clustering web-search results; clustering (people) nodes in (social) networks/graphs; use as a pre-processing phase.

6 Comparing clustering methods
Clustering methods can be compared using the following aspects:
The partitioning criteria: in some methods, all the objects are partitioned so that no hierarchy exists among the clusters, while in other methods the clusters are organized hierarchically.
Separation of clusters: in some methods, the data are partitioned into mutually exclusive clusters, while in other methods the clusters need not be exclusive, that is, a data object may belong to more than one cluster.
Similarity measure: some methods determine the similarity between two objects by the distance between them, while in other methods the similarity may be defined by connectivity based on density or contiguity.
Clustering space: many clustering methods search for clusters within the entire data space. These methods are useful for low-dimensional data sets. With high-dimensional data, however, there can be many irrelevant attributes, which can make similarity measurements unreliable; consequently, clusters found in the full space are often meaningless. It is often better to instead search for clusters within different subspaces of the same data set.

7 Types of Clustering
1 Flat or partitional clustering: the partitions are independent of each other.
2 Hierarchical clustering: the partitions can be visualized using a tree structure (a dendrogram). It is possible to view partitions at different levels of granularity (i.e., clusters can be refined or coarsened) by using different values of K.

8 Table of contents: 1 Introduction; 2 K-means clustering; 3 Hierarchical Clustering; 4 Model-based clustering; 5 Cluster validation and assessment.

9 Flat Clustering: K-means algorithm (Lloyd, 1957)
Associate a prototype \mu_k, k = 1, ..., K, with each cluster, and let r_{nk} indicate whether x_n \in C_k. The goal is to minimize
J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \|x_n - \mu_k\|^2
over {r_{nk}} and {\mu_k}. Optimization is performed by alternating minimization:
Optimize over {r_{nk}} for fixed {\mu_k}.
Optimize over {\mu_k} for fixed {r_{nk}}.
Initialize with arbitrary choices of {\mu_k} and repeat the iterative updates until convergence. The algorithm is guaranteed to converge, since the objective is monotonically decreasing.

10 Flat Clustering: K-means algorithm (Lloyd, 1957)
Input: N examples {x_1, ..., x_N}, x_n \in R^D; the number of partitions K.
Initialize: K cluster means \mu_1, ..., \mu_K, each \mu_k \in R^D. They are usually initialized randomly, but good initialization is crucial; many smarter initialization heuristics exist (e.g., K-means++, Arthur & Vassilvitskii, 2007).
Repeat:
(Re-)assign each example x_n to its closest cluster center (based on the smallest Euclidean distance):
r_{nk} = 1 if k = \arg\min_j \|x_n - \mu_j\|^2, and r_{nk} = 0 otherwise.
Let C_k be the set of examples assigned to cluster k with center \mu_k. Update the cluster means:
\mu_k = \frac{1}{|C_k|} \sum_{x_n \in C_k} x_n = \frac{\sum_{n=1}^{N} r_{nk} x_n}{\sum_{n=1}^{N} r_{nk}}
Stop: when the cluster means or the loss (defined later) do not change by much.
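A minimal NumPy sketch of the alternating updates just described; the function and variable names are illustrative and not from the slides.

```python
import numpy as np

def kmeans(X, K, n_iters=100, tol=1e-6, seed=0):
    """Minimal Lloyd's algorithm: alternate assignments and mean updates."""
    rng = np.random.default_rng(seed)
    # Initialize the centers by picking K distinct examples at random.
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(n_iters):
        # Assignment step: index of the closest center for each point.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points.
        new_mu = np.array([X[z == k].mean(axis=0) if np.any(z == k) else mu[k]
                           for k in range(K)])
        if np.linalg.norm(new_mu - mu) < tol:   # stop when the means barely move
            mu = new_mu
            break
        mu = new_mu
    return mu, z
```

Calling mu, z = kmeans(X, K=3) on an (N, D) array X returns the learned centers and the hard assignments.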

11 K-means example. (Figure omitted.)

12 K-means example
(Figure omitted: K-means in one dimension; panels (a)-(f) show the initial dataset and the cluster means \mu_1 and \mu_2 after iterations t = 1, ..., 5, at which point the algorithm has converged.)

13 K-means example
Replace the RGB value (a 3-D vector) at each pixel with one of K prototypes.

14 Decrease in objective function
(Plot omitted: the K-means objective J decreases monotonically over the iterations.)

15 K-means objective function
Consider the K-means objective function
J(X, \mu, r) = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \|x_n - \mu_k\|^2
It is a non-convex objective function, so it may have many local minima, and it is NP-hard to minimize in general (note that r is discrete). The K-means algorithm is a heuristic to optimize this function; it alternates between the following two steps:
assign points to the closest centers;
recompute the center means.
The algorithm usually converges to a local minimum. Multiple runs with different initializations are usually tried to find a good solution.

16 K-means (choosing K)
One way to select K for the K-means algorithm is to try different values of K, plot the K-means objective versus K, and look at the elbow of the plot (in the example plot, K = 6 is the elbow point). We can also use an information criterion such as AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) and choose the K that gives the smallest AIC/BIC (both penalize large values of K).
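A quick sketch of the elbow heuristic, here using scikit-learn's KMeans purely for convenience; the synthetic data and all names are illustrative assumptions. The final objective (inertia) is computed for several values of K and plotted so that the bend can be read off.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic data with four well-separated blobs, just for illustration.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 3, 6, 9)])

Ks = range(1, 11)
# inertia_ is the final K-means objective J of each fitted model.
objective = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
             for k in Ks]

plt.plot(list(Ks), objective, marker="o")
plt.xlabel("K")
plt.ylabel("K-means objective J")
plt.show()
```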

17 K-means (limitations)
K-means makes hard assignments of points to clusters: a point either completely belongs to a cluster or does not belong at all; there is no notion of a soft assignment.
K-means works well only if the clusters are roughly of equal sizes. Probabilistic clustering methods such as Gaussian mixture models (which model each cluster using a Gaussian distribution) can handle both of these issues.
K-means also works well only when the clusters are round-shaped, and does badly if the clusters have non-convex shapes; kernel K-means or spectral clustering can handle non-convex clusters.

18 Kernel K-means
The idea is to replace the Euclidean distance/similarity computations in K-means by their kernelized versions. The feature maps \phi(x_n) never have to be computed or stored for the data {x_n}, n = 1, ..., N, because the computations only depend on the kernel:
d^2(x_n, \mu_k) = \|\phi(x_n) - \phi(\mu_k)\|^2 = K(x_n, x_n) + K(\mu_k, \mu_k) - 2 K(\mu_k, x_n)
Computing K(x_n, x_n) above is straightforward; computing the terms involving \mu_k requires some simple algebra (see the next slide).

19 Kernel K-means
Computing K(x_n, x_n) is easy: simply evaluate the kernel function. Compute K(\mu_k, \mu_k) as follows (assume N_k is the number of points in cluster k):
K(\mu_k, \mu_k) = \phi(\mu_k)^T \phi(\mu_k) = \left[ \frac{1}{N_k} \sum_{n=1}^{N_k} \phi(x_n) \right]^T \left[ \frac{1}{N_k} \sum_{m=1}^{N_k} \phi(x_m) \right] = \frac{1}{N_k^2} \sum_{n=1}^{N_k} \sum_{m=1}^{N_k} \phi(x_n)^T \phi(x_m) = \frac{1}{N_k^2} \sum_{n=1}^{N_k} \sum_{m=1}^{N_k} K(x_n, x_m)
K(\mu_k, x_n) can be computed as
K(\mu_k, x_n) = \phi(\mu_k)^T \phi(x_n) = \left[ \frac{1}{N_k} \sum_{m=1}^{N_k} \phi(x_m) \right]^T \phi(x_n) = \frac{1}{N_k} \sum_{m=1}^{N_k} K(x_n, x_m)
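A small NumPy sketch of these kernelized distance computations, assuming a precomputed kernel matrix and a current hard assignment of points to clusters; the names are illustrative.

```python
import numpy as np

def kernel_kmeans_distances(K_mat, z, K):
    """d^2(x_n, mu_k) = K(x_n, x_n) + K(mu_k, mu_k) - 2 K(mu_k, x_n),
    computed purely from the kernel matrix K_mat and the assignments z.
    Assumes every cluster is non-empty."""
    N = K_mat.shape[0]
    d2 = np.zeros((N, K))
    for k in range(K):
        idx = np.where(z == k)[0]
        Nk = len(idx)
        # K(mu_k, mu_k) = (1 / N_k^2) * sum over n, m in C_k of K(x_n, x_m)
        mu_mu = K_mat[np.ix_(idx, idx)].sum() / (Nk ** 2)
        # K(mu_k, x_n) = (1 / N_k) * sum over m in C_k of K(x_n, x_m), for every n
        mu_x = K_mat[:, idx].sum(axis=1) / Nk
        d2[:, k] = np.diag(K_mat) + mu_mu - 2.0 * mu_x
    return d2
```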

20 Table of contents: 1 Introduction; 2 K-means clustering; 3 Hierarchical Clustering; 4 Model-based clustering; 5 Cluster validation and assessment.

21 Hierarchical Clustering
A hierarchical clustering method works by grouping data objects into a hierarchy or tree of clusters. There are two main strategies: agglomerative hierarchical clustering (AGNES), which builds the hierarchy bottom-up by merging clusters, and divisive hierarchical clustering (DIANA), which builds it top-down by splitting clusters. (Figure omitted: agglomerative and divisive hierarchical clustering on the data objects {a, b, c, d, e}.)

22 Distance measures (linkage measures)
How do we measure the distance between two clusters, where each cluster is generally a set of points? Four widely used measures for the distance between clusters C_i and C_j are:
Minimum distance (single link): d_min(C_i, C_j) = \min_{p \in C_i, q \in C_j} \|p - q\|. Single link can result in chaining (clusters can get very long).
Maximum distance (complete link): d_max(C_i, C_j) = \max_{p \in C_i, q \in C_j} \|p - q\|. Complete link results in small, round-shaped clusters.
Mean distance: d_mean(C_i, C_j) = \|\mu_i - \mu_j\|.
Average distance (average link): d_avg(C_i, C_j) = \frac{1}{|C_i| |C_j|} \sum_{p \in C_i} \sum_{q \in C_j} \|p - q\|. Average link is a compromise between single and complete linkage.
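For reference, a brief sketch of agglomerative clustering with these linkage choices using SciPy; the synthetic data are only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

# 'single', 'complete', and 'average' correspond to the minimum, maximum,
# and average inter-cluster distances defined above.
for method in ["single", "complete", "average"]:
    Z = linkage(X, method=method)                      # the merge tree (dendrogram data)
    labels = fcluster(Z, t=2, criterion="maxclust")    # cut the tree into 2 clusters
    print(method, np.bincount(labels)[1:])             # cluster sizes
```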

23 Hierarchical methods
(Figures omitted: agglomerative (AGNES) and divisive (DIANA) hierarchical clustering on the data objects {a, b, c, d, e}, and the corresponding dendrogram representation with levels l = 0, ..., 4 and a similarity scale.)

24 Table of contents: 1 Introduction; 2 K-means clustering; 3 Hierarchical Clustering; 4 Model-based clustering; 5 Cluster validation and assessment.

25 Model-based clustering
K-means is closely related to a probabilistic model known as the Gaussian mixture model:
p(x) = \sum_{k=1}^{K} \pi_k N(x | \mu_k, \Sigma_k)
\pi_k, \mu_k, \Sigma_k are parameters; the \pi_k are called mixing proportions and each Gaussian is called a mixture component. The model is simply a weighted sum of Gaussians, but it is much more powerful than a single Gaussian, because it can model multi-modal distributions. Note that for p(x) to be a probability distribution, we require that \sum_k \pi_k = 1 and \pi_k > 0 for all k; thus we may interpret the \pi_k as probabilities themselves. The set of parameters is \theta = \{\{\pi_k\}, \{\mu_k\}, \{\Sigma_k\}\}. (Figure omitted: a single Gaussian fit to data versus a mixture of three Gaussians.)
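A tiny sketch, with assumed example parameters, showing that the mixture density above is literally a weighted sum of Gaussian densities.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Assumed example parameters of a 2-component mixture in two dimensions.
pis = np.array([0.4, 0.6])
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]

def gmm_density(x):
    """p(x) = sum_k pi_k N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=S)
               for pi, mu, S in zip(pis, mus, Sigmas))

print(gmm_density(np.array([1.0, 1.0])))
```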

26 Model-based clustering (cont.)
Let us use a K-dimensional binary random variable z in which a particular element z_k equals 1 and all other elements are 0. The values of z_k therefore satisfy z_k \in \{0, 1\} and \sum_k z_k = 1.
We define the joint distribution p(x, z) in terms of a marginal distribution p(z) and a conditional distribution p(x | z). The marginal distribution over z is specified in terms of \pi_k, such that
p(z_k = 1) = \pi_k
We can write this distribution in the form
p(z) = \prod_{k=1}^{K} \pi_k^{z_k}
The conditional distribution of x given a particular value for z is a Gaussian:
p(x | z_k = 1) = N(x | \mu_k, \Sigma_k)
This can also be written in the form
p(x | z) = \prod_{k=1}^{K} N(x | \mu_k, \Sigma_k)^{z_k}

27 Model-based clustering (cont.)
The marginal distribution of x equals
p(x) = \sum_z p(z) p(x | z) = \sum_{k=1}^{K} \pi_k N(x | \mu_k, \Sigma_k)
We can write p(z_k = 1 | x) as
\gamma(z_k) = p(z_k = 1 | x) = \frac{p(z_k = 1) p(x | z_k = 1)}{p(x)} = \frac{p(z_k = 1) p(x | z_k = 1)}{\sum_{j=1}^{K} p(z_j = 1) p(x | z_j = 1)} = \frac{\pi_k N(x | \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j N(x | \mu_j, \Sigma_j)}
We shall view \pi_k as the prior probability of z_k = 1, and the quantity \gamma(z_k) as the corresponding posterior probability once we have observed x.

28 Gaussian mixture model (example)
(Figure omitted, Bishop Fig. 2.23: a mixture of 3 Gaussians in a two-dimensional space. (a) Contours of constant density for each mixture component, denoted red, blue and green, with the mixing coefficients shown below each component; (b) contours of the marginal density p(x) of the mixture distribution; (c) a surface plot of p(x). Also Bishop Fig. 9.5: 500 points drawn from this mixture.)
From the sum and product rules, the marginal density is given by p(x) = \sum_{k=1}^{K} p(k) p(x | k), in which we can view \pi_k = p(k) as the prior probability of picking the k-th component and N(x | \mu_k, \Sigma_k) = p(x | k) as its density; the mixing coefficients therefore satisfy the requirements to be probabilities.

29 Model-based clustering (cont.)
Let X = {x_1, ..., x_N} be drawn i.i.d. from a mixture of Gaussians. The log-likelihood of the observations equals
\ln p(X | \mu, \pi, \Sigma) = \sum_{n=1}^{N} \ln \left[ \sum_{k=1}^{K} \pi_k N(x_n | \mu_k, \Sigma_k) \right]
Taking the derivative of \ln p(X | \mu, \pi, \Sigma) with respect to \mu_k and setting it equal to zero, we obtain
0 = \sum_{n=1}^{N} \underbrace{\frac{\pi_k N(x_n | \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j N(x_n | \mu_j, \Sigma_j)}}_{\gamma(z_{nk})} \, \Sigma_k (x_n - \mu_k)
Multiplying by \Sigma_k^{-1} and then simplifying, we obtain
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) x_n, \qquad N_k = \sum_{n=1}^{N} \gamma(z_{nk})

30 Model-based clustering (cont.)
Setting the derivative of \ln p(X | \mu, \pi, \Sigma) with respect to \Sigma_k equal to zero, we obtain
\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T
We maximize \ln p(X | \mu, \pi, \Sigma) with respect to \pi_k subject to the constraint \sum_{k=1}^{K} \pi_k = 1. This can be achieved using a Lagrange multiplier and maximizing the quantity
\ln p(X | \mu, \pi, \Sigma) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)
which gives
0 = \sum_{n=1}^{N} \frac{N(x_n | \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j N(x_n | \mu_j, \Sigma_j)} + \lambda
If we now multiply both sides by \pi_k and sum over k, making use of the constraint \sum_{k=1}^{K} \pi_k = 1, we find \lambda = -N. Using this to eliminate \lambda and rearranging, we obtain
\pi_k = \frac{N_k}{N}

31 EM for Gaussian mixture models
1 Initialize \mu_k, \Sigma_k, and \pi_k, and evaluate the initial value of the log-likelihood.
2 E step: evaluate \gamma(z_{nk}) using the current parameter values:
\gamma(z_{nk}) = \frac{\pi_k N(x_n | \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j N(x_n | \mu_j, \Sigma_j)}
3 M step: re-estimate the parameters using the current values of \gamma(z_{nk}):
\mu_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) x_n
\Sigma_k = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k)(x_n - \mu_k)^T
\pi_k = \frac{N_k}{N}
where N_k = \sum_{n=1}^{N} \gamma(z_{nk}).
4 Evaluate the log-likelihood \ln p(X | \mu, \pi, \Sigma) = \sum_{n=1}^{N} \ln \left[ \sum_{k=1}^{K} \pi_k N(x_n | \mu_k, \Sigma_k) \right] and check for convergence of either the parameters or the log-likelihood. If the convergence criterion is not satisfied, return to step 2.
Please read Section 9.2 of Bishop.
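A compact NumPy/SciPy sketch of these EM updates; the initialization and stopping rule are deliberately simplistic and the names are illustrative, so treat it as a sketch rather than a reference implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a Gaussian mixture: alternate the E step (responsibilities)
    and the M step (means, covariances, mixing proportions)."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(N, size=K, replace=False)].copy()
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E step: gamma[n, k] proportional to pi_k N(x_n | mu_k, Sigma_k).
        dens = np.column_stack([pis[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                                for k in range(K)])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M step: N_k, then the mu_k, Sigma_k, pi_k updates from the slide.
        Nk = gamma.sum(axis=0)
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            Xc = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * Xc).T @ Xc / Nk[k] + 1e-6 * np.eye(D)
        pis = Nk / N
    return pis, mus, Sigmas, gamma
```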

32 Model-based clustering (example)
(Figure omitted, Bishop Section 9.2, Mixtures of Gaussians: EM applied to a Gaussian mixture; panels (a)-(f) show the fit after an increasing number of EM iterations L, e.g. L = 2 and L = 5.)

33 Model-based clustering (GMM vs K-means)
1 Consider a dataset clustered by K-means and by a GMM.
2 For the GMM clustering (rightmost figure), the most probable cluster for each point has been labeled.
3 K-means, unlike GMM, tends to learn equi-sized clusters.
4 In what situation are the results of GMM equivalent to the results of K-means? (Work it out as an exercise.)

34 Table of contents: 1 Introduction; 2 K-means clustering; 3 Hierarchical Clustering; 4 Model-based clustering; 5 Cluster validation and assessment.

35 Cluster validation and assessment
How good is the clustering generated by a method? How can we compare the clusterings generated by different methods? Clustering is an unsupervised learning technique and it is hard to evaluate the quality of the output of any given method. If we use probabilistic models, we can always evaluate the likelihood of a test set, but this has two drawbacks:
1 It does not directly assess any clustering that is discovered by the model.
2 It does not apply to non-probabilistic methods.
We therefore discuss some performance measures not based on likelihood. The goal of clustering is to assign points that are similar to the same cluster, and to ensure that points that are dissimilar are in different clusters. There are several ways of measuring these quantities:
1 Internal criterion: typical objective functions in clustering formalize the goal of attaining high intra-cluster similarity and low inter-cluster similarity. But good scores on an internal criterion do not necessarily translate into good effectiveness in an application. An alternative to internal criteria is direct evaluation in the application of interest.
2 External criterion: suppose we have labels for each object. Then we can compare the clustering with the labels using various metrics. We will use some of these metrics later, when we compare clustering methods.

36 Purity
Purity is a simple and transparent evaluation measure. Consider the clustering in the figure (three clusters with labeled objects inside, based on Figure 16.4 of Manning et al.). Let N_{ij} be the number of objects in cluster i that belong to class j, and let N_i = \sum_{j=1}^{C} N_{ij} be the total number of objects in cluster i. We define the purity of cluster i as
p_i = \max_j \frac{N_{ij}}{N_i}
and the overall purity of a clustering as
purity = \sum_i \frac{N_i}{N} p_i
For the figure, the purity is
\frac{6}{17} \cdot \frac{5}{6} + \frac{6}{17} \cdot \frac{4}{6} + \frac{5}{17} \cdot \frac{3}{5} = \frac{5 + 4 + 3}{17} \approx 0.71
Bad clusterings have purity values close to 0; a perfect clustering has a purity of 1. High purity is easy to achieve when the number of clusters is large; in particular, purity is 1 if each point gets its own cluster. Thus we cannot use purity to trade off the quality of the clustering against the number of clusters.
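A tiny sketch of the purity computation from cluster assignments and class labels; the 17-point example below assumes the object layout of the Manning et al. figure (cluster sizes 6, 6, 5 with majority class counts 5, 4, 3).

```python
import numpy as np

def purity(clusters, classes):
    """Overall purity = (1/N) * sum_i max_j N_ij."""
    clusters = np.asarray(clusters)
    classes = np.asarray(classes)
    total = 0
    for c in np.unique(clusters):
        members = classes[clusters == c]
        _, counts = np.unique(members, return_counts=True)
        total += counts.max()              # majority class count in cluster c
    return total / len(clusters)

# Assumed layout matching the figure.
clusters = [0]*6 + [1]*6 + [2]*5
classes = ['x']*5 + ['o'] + ['x'] + ['o']*4 + ['d'] + ['x']*2 + ['d']*3
print(round(purity(clusters, classes), 2))   # about 0.71
```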

37 Rand index
Let U = {u_1, ..., u_R} and V = {v_1, ..., v_C} be two different (flat) clusterings of N data points. For example, U might be the estimated clustering and V a reference clustering derived from the class labels. Define a 2 x 2 contingency table containing the following numbers:
1 TP is the number of pairs that are in the same cluster in both U and V (true positives);
2 TN is the number of pairs that are in different clusters in both U and V (true negatives);
3 FN is the number of pairs that are in different clusters in U but the same cluster in V (false negatives);
4 FP is the number of pairs that are in the same cluster in U but different clusters in V (false positives).
The Rand index is defined as
RI = \frac{TP + TN}{TP + FP + FN + TN}
The Rand index can be interpreted as the fraction of (pairwise) clustering decisions that are correct. Clearly RI \in [0, 1].
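A short pair-counting sketch of the Rand index; a naive loop over all pairs is fine at this scale.

```python
from itertools import combinations

def rand_index(U, V):
    """Fraction of point pairs on which the two clusterings U and V agree."""
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(U)), 2):
        same_u = U[i] == U[j]
        same_v = V[i] == V[j]
        if same_u and same_v:
            tp += 1          # same cluster in both
        elif not same_u and not same_v:
            tn += 1          # different clusters in both
        elif same_u and not same_v:
            fp += 1          # same in U, different in V
        else:
            fn += 1          # different in U, same in V
    return (tp + tn) / (tp + tn + fp + fn)
```

Applied to the 17-point example used for purity above, this gives RI of roughly 0.68, matching the worked example on the next slide.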

38 Rand index (example)
Consider the clustering of Figure 25.1 (three clusters with labeled objects inside, based on Figure 16.4 of Manning et al. 2008). The three clusters contain 6, 6 and 5 points, so we have
TP + FP = \binom{6}{2} + \binom{6}{2} + \binom{5}{2} = 40
The number of true positives is
TP = \binom{5}{2} + \binom{4}{2} + \binom{3}{2} + \binom{2}{2} = 20
Then FP = 40 - 20 = 20. Similarly, FN = 24 and TN = 72. Hence the Rand index is
RI = \frac{20 + 72}{20 + 20 + 24 + 72} = 0.68
The Rand index only achieves its lower bound of 0 if TP = TN = 0, which is a rare event. We can therefore define an adjusted Rand index
ARI = \frac{index - E[index]}{\max index - E[index]}

39 Mutual information
Another measure of cluster quality is the mutual information between U and V. Let P_{UV}(i, j) = \frac{|u_i \cap v_j|}{N} be the probability that a randomly chosen object belongs to cluster u_i in U and to cluster v_j in V. Let P_U(i) = \frac{|u_i|}{N} be the probability that a randomly chosen object belongs to cluster u_i in U, and P_V(j) = \frac{|v_j|}{N} the probability that it belongs to cluster v_j in V. Then the mutual information is defined as
I(U, V) = \sum_{i=1}^{R} \sum_{j=1}^{C} P_{UV}(i, j) \log \frac{P_{UV}(i, j)}{P_U(i) P_V(j)}
This lies between 0 and \min\{H(U), H(V)\}. The maximum value can be achieved by using lots of small clusters, which have low entropy. To compensate for this, we can use the normalized mutual information (NMI)
NMI(U, V) = \frac{I(U, V)}{\frac{1}{2}[H(U) + H(V)]}
This lies between 0 and 1. Please read Section 25.1 of Murphy.
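A small sketch of I(U, V) and this NMI computed from two label vectors; the natural logarithm is used here, and any base works as long as it is used consistently.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log(p)).sum()

def mutual_information(U, V):
    U, V = np.asarray(U), np.asarray(V)
    mi = 0.0
    for ui in np.unique(U):
        for vj in np.unique(V):
            p_uv = np.mean((U == ui) & (V == vj))   # P_UV(i, j)
            if p_uv > 0:
                p_u = np.mean(U == ui)              # P_U(i)
                p_v = np.mean(V == vj)              # P_V(j)
                mi += p_uv * np.log(p_uv / (p_u * p_v))
    return mi

def nmi(U, V):
    return mutual_information(U, V) / (0.5 * (entropy(U) + entropy(V)))
```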

More information