Machine Learning. Clustering 1. Hamid Beigy. Sharif University of Technology. Fall 1395
|
|
- Archibald Ferguson
- 5 years ago
- Views:
Transcription
1 Machine Learning Clustering 1 Hamid Beigy Sharif University of Technology Fall Some slides are taken from P. Rai slides Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
2 Table of contents 1 Introduction 2 K-means clustering 3 Hierarchical Clustering 4 Model-based clustering 5 Cluster validation and assessment Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
3 Table of contents 1 Introduction 2 K-means clustering 3 Hierarchical Clustering 4 Model-based clustering 5 Cluster validation and assessment Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
4 Introduction g Clustering is the process of grouping a set of data objects into multiple groups or clusters so that objects within a cluster have high similarity, but are very dissimilar to objects in other clusters. Dissimilarities and similarities are assessed based on the feature values describing the objects and often involve distance measures. Clustering is usually an unsupervised learning problem. Consider a dataset X = {x 1,..., x N },x i R D. Assume there are K clusters C 1,..., C K. The goal is to group the examples into K homogeneous partitions. an unsupervised learning problem N unlabeled examples {x 1,...,x N }; no. of desired partition roup the examples into K homogeneous partitions Picture courtesy: Data Clustering: 50 Years Beyond K-Means, A.K. Jain (2008) Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
5 Introduction A good clustering is one that achieves: High within-cluster similarity Low inter-cluster similarity Applications of clustering Document/Image/Webpage Clustering Image Segmentation Clustering web-search results Clustering (people) nodes in (social) networks/graphs Pre-processing phase Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
6 Comparing clustering methods The clustering methods can be compared using the following aspects: The partitioning criteria : In some methods, all the objects are partitioned so that no hierarchy exists among the clusters. Separation of clusters : In some methods, data partitioned into mutually exclusive clusters while in some other methods, the clusters may not be exclusive, that is, a data object may belong to more than one cluster. Similarity measure : Some methods determine the similarity between two objects by the distance between them; while in other methods, the similarity may be defined by connectivity based on density or contiguity. Clustering space : Many clustering methods search for clusters within the entire data space. These methods are useful for low-dimensionality data sets. With high- dimensional data, however, there can be many irrelevant attributes, which can make similarity measurements unreliable. Consequently, clusters found in the full space are often meaningless. Its often better to instead search for clusters within different subspaces of the same data set. Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
7 1 Flat or Partitional clustering Types of Clustering Flat or Partitional clustering (Partitions are independent f each of each other) Partitions are independent of each other Hierarchical clustering (Partitions can be visualized using a tree structure (a 2 Hierarchical clustering dendrogram)) Partitions can be visualized using a tree structure (a dendrogra Machine Learning (CS771A) Possible to view partitions at di erent levels of granularities (i. refine/coarsen clusters) Possible using to view di erent partitions K at different levels of granularities (i.e., can refine/coarsen clusters) using different K. Clustering: K-means and Kernel K-means Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
8 Table of contents 1 Introduction 2 K-means clustering 3 Hierarchical Clustering 4 Model-based clustering 5 Cluster validation and assessment Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
9 Flat Clustering: K-means algorithm (Lloyd, 1957) Associate a prototype µ k, k = 1,..., K with each cluster. Let r nk be the indicator of x n C k. The goal is to minimize Goal is to find {r nk } and {µ k }. J = N n=1 k=1 K r nk x n µ k 2 Optimization is performed by alternating minimization Optimize over {r nk } for a fixed {µ k }. Optimize over {µ k } for a fixed {r nk }. Initialize with arbitrary choices of {µ k } Iterative updates continue till convergence Iterative updates continue till convergence Guaranteed to converge, as objective is monotonic decreasing Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
10 Flat Clustering: K-means algorithm (Lloyd, 1957) Input: N examples {x 1,..., x N }; x n R D ; the number of partitions K Initialize: K cluster means µ 1,..., µ K, each µ k R D. Usually initialized randomly, but good initialization is crucial; many smarter initialization heuristics exist (e.g., K-means++, Arthur & Vassilvitskii, 2007) Repeat: (Re)-Assign each example xn to its closest cluster center (based on the smallest Euclidean distance) r nk = { 1 if k = argmin x n µ j 2 j 0 otherwise. Let C k is the set of examples assigned to cluster k with center µ k. Update the cluster means µ k = 1 C k N n=1 x n = r nkx n N x n C k n=1 r nk Stop: when cluster means or the loss (defined later) doesn t change by much Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
11 Kmeans Example -means example Roland Memisevic Machine Learning 14 Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
12 Kmeans 336 Example Representative-based Clustering (a) Initial dataset µ 1 = 2 µ 2 = (b) Iteration: t = 1 µ 1 = 2.5 µ 2 = µ 1 = (c) Iteration: t = 2 µ 2 = (d) Iteration: t = 3 µ 1 = 4.75 µ 2 = µ 1 = 7 (e) Iteration: t = µ 2 = (f) Iteration: t = 5(converged) Figure K-means in one dimension. Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
13 Kmeans Example I Replace the RGB-value (a 3-D vector) at each pixel with one of K prototypes: Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
14 Decrease in objective function ecrease in objective function 1000 J Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
15 K-means objective function Consider the K-means objective function J(X, µ, r) = N n=1 k=1 K r nk x n µ k 2 It is a non-convex objective function, so may have many local minima. Also NP-hard to minimize in general (note that r is discrete) The K-means algorithm is a heuristic to optimize this function K-means algorithm alternated between the following two steps assign points to closest centers recompute the center means The algorithm usually converges to a local minima. Multiple runs with different initializations are usually tried to find a good solution. Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
16 ect K for the K-means algorithm is to try di e K-means (choosing K) means objective versus K, and look at the elb Try different values of K, plot the K-means objective versus K. We can also use information criterion such as AIC (Akaike Information Criterion) or BIC plot, (Bayesian K = Information 6 iscriterion) the elbow and choose Kpoint that gives smallest AIC/BIC (both penalize large K values) Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
17 Kernel K-means or Spectral clustering can handle non-convex ns Hamid Beigy or(sharif Spectral University of Technology) clustering Machine Learning can handle non-convex Fall / 34 K-means (limitations) ly is the clusters are roughtly of equal sizes Makes hard assignments of points to clusters lustering A point either methods completely belongs such to a cluster asorgaussian doesn t belong at allmixture mod No notion of a soft assignment. ues (model each cluster using a Gaussian distribu Works well only is the clusters are roughly of equal sizes. Probabilistic clustering methods such as Gaussian mixture models can handle both these issues works well only when the clusters are round-shap usters have non-convex shapes K-means also works well only when the clusters are round-shaped and does badly if the clusters have non-convex shapes
18 Kernel K-means t have to be computed/stored for data {x n } N n= because computations only depend on kernel 1 The idea is to replace the Euclidean distance/similarity computations in K-means by the kernelized versions ing k(x n, x n ) above is straightforward. Comput e involving µ d k 2 (x s n, µ requires k ) = ϕ(x n ) ϕ(µ some k ) simple algebra (s 2 = K(x n, x n ) + K(µ k, µ k ) 2K(µ k, x n ) Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
19 Kernel K-means Computing K(x n, x n ) is easy: Simply compute this kernel function. Compute K(µ k, µ k ) as follows (assume N k is the no. of points in cluster k) K(µ k, µ k ) = ϕ T (µ k )ϕ(µ k ) = = 1 N 2 k = 1 N 2 k K(µ k, x n ) can be computed as N k N k n=1 m=1 N k N k n=1 m=1 [ 1 ] N T [ k 1 ϕ(x n ) N k n=1 ϕ T (x n )ϕ(x m ) K(x n, x m ) K(µ k, x n ) = ϕ T (µ k )ϕ(x n ) = = 1 N k N k N k m=1 m=1 [ 1 N k N k n=1 ] N k ϕ(x n ) N k n=1 T ϕ(x n )] ϕ(x n ) ϕ T (x n )ϕ(x m ) = 1 N k N k m=1 K(x n, x m ) Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
20 Table of contents 1 Introduction 2 K-means clustering 3 Hierarchical Clustering 4 Model-based clustering 5 Cluster validation and assessment Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
21 Hierarchical Clustering 0 Chapter A hierarchical 10 Clusterclustering Analysis: Basic method Concepts works andby Methods grouping data objects into a hierarchy or tree of clusters. Agglomerative (AGNES) Step 0 Step 1 Step 2 Step 3 Step 4 a ab b c cde abcde d de e Step 4 Step 3 Step 2 Step 1 Step 0 Divisive (DIANA) Figure Hierarchical 10.6 Agglomerative clustering andmethods divisive hierarchical clustering on data objects {a, b, c, d, e}. Agglomerative hierarchical clustering Divisive hierarchical Level clustering a b c d e l = l = 1 l = 2 Hamid Beigy (Sharif University of Technology) Machine Learning Fall / scale
22 compute the dissimilarity between two clusters R and S? kdistance or single-link: measures (linkage results in measures) chaining (clusters hierarchical can methods get very l d(r, S) = min How measure the distance between two clusters, where d(x R, x each cluster S ) x R 2R,x S 2S is generally a set. Four widely used measures for distance between clusters are as follows Minimum distance d min (C i, C j ) = min { p q } p C i,q C j k or complete-link: results in small, round shaped clusters Maximum distance d max (C i, C j ) = max { p q } p C i,q C j Mean distance e-link: compromise between single and complexte linkage Average distance d(r, S) = max d(x R, x S ) x R 2R,x S 2S d(r, S) = 1 d mean (C i, C j ) = µ i µ j X d(x R, x S ) d min (C i, C j ) = 1 R S p q xn Ri 2R,x N j p C S 2S i,q C j Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
23 e Hierarchical methods Step 4 Step 3 Step 2 Step 1 Step 0 Divisive (DIANA) Agglomerative and divisive hierarchical clustering on data objects {a, b, c, d, e}. Level l = 0 l = 1 l = 2 l = 3 l = 4 a b c d e Similarity scale Dendrogram representation for hierarchical clustering of data objects {a, b, c, d, e}. different clusters. This is a single-linkage approach in that each cluster is represent Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
24 Table of contents 1 Introduction 2 K-means clustering 3 Hierarchical Clustering 4 Model-based clustering 5 Cluster validation and assessment Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
25 Model-based clustering K-means is closely related to a probabilistic model known as the Gaussian mixture model. K p(x) = π k N (x µ k, Σ k ) k=1 π k, µ k, Σ k are parameters. π k are called mixing proportions and each Gaussian is called a mixture component. The model is simply a weighted sum of Gaussian. But it is much more powerful than a Gaussian mixture models example Gaussian mixture single Gaussian, because it can model multi-modal distributions. Gaussian mixture models A Gaussian fit to Note that for p(x) to be I Aa mixture probability of threedistribution, Gaussians. we require that k π k = 1 and that for all k we have π k > 0. Thus, we may interpret the π k as probabilities themselves. Set of parameters θ = {{π k }, {µ k }, Roland {Σ k Memisevic }} Machine Learning 21 Gaussian mixture Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
26 Model-based clustering (cont.) Let use a K-dimensional binary random variable z in which a particular element z k equals to 1 and other elements are 0. The values of z k therefore satisfy z k {0, 1} and k z k = 1 We define the joint distribution p(x, z) in terms of a marginal distribution p(z) and a conditional distribution p(x z). The marginal distribution over z is specified in terms of π k, such that We can write this distribution in the form of p(z k = 1) = π k p(z k = 1) = The conditional distribution of x given a particular value for z is a Gaussian This can also be written in the form of K k=1 π z k k p(x z k = 1) = N (x µ k, Σ k ) p(x z k = 1) = K N (x µ k, Σ k ) z k k=1 Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
27 Model-based clustering (cont.) The marginal distribution of x equals to p(x) = z K p(z)p(x z) = π k N (x µ k, Σ k ) k=1 We can write p(z k = 1 x) as γ(z k ) = p(z k = 1 x) = p(z k = 1)p(x z k = 1) p(x) p(z k = 1)p(x z k = 1) = K j=1 p(z j = 1)p(x z j = 1) = π k N (x µ k, Σ k ) K j=1 π jn (x µ j, Σ j ) We shall view π k as the prior probability of z k = 1, and the quantity γ(z k ) as the corresponding posterior probability once we have observed x. Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
28 Gaussian mixture model (example) PROBABILITY DISTRIBUTIONS 1 (a) 1 (b) Mixtures of Gaussians 433 Figure 2.23 Illustration of a mixture of 3 Gaussians in a two-dimensional space. (a) Contours of constant density for each of the mixture components, in which the 3 components are denoted red, blue and green, and the values of the mixing coefficients are shown below each component. (b) Contours of the marginal probability density (a) p(x) of the mixture distribution. (c) A(b) surface plot of the distribution p(x). (c) We therefore see that the mixing coefficients satisfy the requirements to be probabilities From the sum and product rules, the marginal density is given by 0 p(x) = K p(k)p(x k) 0 (2.191) which is equivalent to (2.188) in which we can view π k = p(k) as the prior prob- 500 points of picking drawn from the kthe component, mixture of 3 Gaussians and the density shownn in(x µ Figure k, Σ2.23. k )=p(x k) (a) Samples Figure 9.5 Example ofability as Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34 k=1
29 Model-based clustering (cont.) Let X = {x 1,..., x N } be drawn i.i.d. from mixture of Gaussian. The log-likelihood of the observations equals to [ N K ] ln p(x µ, π, Σ) = ln π k N (x n µ k, Σ k ) n=1 Finding the derivatives of ln p(x µ, π, Σ) with respect to µ k and setting it equal to zero, we obtain N π k N (x n µ k, Σ k ) 0 = K n=1 j=1 π Σ k (x n µ k ) jn (x n µ j, Σ j ) }{{} γ(z nk ) Multiplying by Σ 1 k k=1 and then simplifying, we obtain µ k = 1 N γ(z nk )x n N k N k = n=1 N γ(z nk ) n=1 Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
30 Model-based clustering (cont.) Finding the derivatives of ln p(x µ, π, Σ) with respect to Σ k and setting it equal to zero, we obtain Σ k = 1 N k N n=1 γ(z nk )(x n µ k )(x n µ k ) T We maximize ln p(x µ, π, Σ) with respect to π k with constraint K k=1 π k = 1. This can be achieved using a Lagrange multiplier and maximizing the following quantity ( K ) ln p(x µ, π, Σ) + λ π k 1. which gives N n=1 k=1 π k N (x n µ k, Σ k ) K j=1 π jn (x n µ j, Σ j ) + λ If we now multiply both sides by π k and sum over k making use of the constraint K k=1 π k = 1, we find λ = N. Using this to eliminate λ and rearranging we obtain π k = N k N Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
31 EM for Gassian mixture models 1 Initialize µ k, Σ k, and π k, and evaluate the initial value of the log likelihood. 2 E step Evaluate γ(z nk ) using the current parameter values γ(z nk ) = π kn (x n µ k, Σ k ) K j=1 π jn (x n µ j, Σ j ) 3 M step Re-estimate the parameters using the current value of γ(z nk ) µ k = 1 N γ(z nk )x n N k n=1 Σ k = 1 N γ(z nk )(x n µ k )(x n µ k ) T N k π k = N k N n=1 where N k = N n=1 γ(z nk). 4 Evaluate the log likelihood ln p(x µ, π, Σ) = [ N n=1 ln K ] k=1 π kn (x n µ k, Σ k ) and check for convergence of either the parameters or the log likelihood. If the convergence criterion is not satisfied return to step 2. Please read section 9.2 of Bishop. Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
32 Model-based clustering (example) 9.2. Mixtures of Gaussians L = (a) (b) (c) 2 2 L =2 2 L =5 2 L = (d) (e) (f) 2 Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
33 Model-based clustering (GMM vs K-means) vs K-means 1 consider a dataset clustered by K-means and GMM.. M clustering (rightmost figure), the most probable cluster for each point ha 2 For the GMM clustering, the most probable cluster for each point has been labeled. 3 K-means, unlike GMM, tends to learn equi-sized clusters. 4 In what situation, the results of GMM is equivalent to the results of K-means? (do it.) Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
34 Table of contents 1 Introduction 2 K-means clustering 3 Hierarchical Clustering 4 Model-based clustering 5 Cluster validation and assessment Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
35 Cluster validation and assessment How good is the clustering generated by a method? How can we compare the clusterings generated by different methods? Clustering is an unupervised learning technique and it is hard to evaluate the quality of the output of any given method. If we use probabilistic models, we can always evaluate the likelihood of a test set, but this has two drawbacks: 1 It does not directly assess any clustering that is discovered by the model. 2 It does not apply to non-probabilistic methods. We discuss some performance measures not based on likelihood. The goal of clustering is to assign points that are similar to the same cluster, and to ensure that points that are dissimilar are in different clusters. There are several ways of measuring these quantities 1 Internal criterion : Typical objective functions in clustering formalize the goal of attaining high intra-cluster similarity and low inter-cluster similarity. But good scores on an internal criterion do not necessarily translate into good effectiveness in an application. An alternative to internal criteria is direct evaluation in the application of interest. 2 External criterion : Suppose we have labels for each object. Then we can compare the clustering with the labels using various metrics. We will use some of these metrics later, when we compare clustering methods. Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
36 troduction Purity Purity is a simple and transparent evaluation measure. Consider the following clustering. Let N ij be the number of objects in cluster i that belongs to class j and N i = C j=1 N ij be the total number of objects in cluster i Three clusters with labeled objects inside. Based on Figure 16.4 of (Manning et We define purity of cluster i as p i max j ( Nij purity i N i ), and the overall purity of a clustering as ing is an unupervised learning technique, so it is hard to evaluate the quality of givenfor method. the aboveiffigure, we use the purity probabilistic is models, we can always evaluate the lik set,butthishastwodrawbacks: 6 5 first,itdoesnotdirectlyassessanycluster 17 6 red by the model; and second, it does not = = apply to non-probabilistic method Bad clusterings have purity values close to 0, a perfect clustering has a purity of 1. uss some performance measures not based on likelihood. High purity is easy to achieve when the number of clusters is large. In particular, purity is itively, 1 if the eachgoal pointof gets clustering its own cluster. is to Thus, assign we cannot points use purity that to aretrade similar off theto quality theofsam ensure the that clustering points against that theare number dissimilar of clusters. are in different clusters. There are se N i N p i. Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
37 Rand index Let U = {u 1,..., u R } and V = {v 1,..., v C } be two different (flat) clustering of N data points. For example, U might be the estimated clustering and V is reference clustering derived from the class labels. Define a 2 2 contingency table, containing the following numbers: 1 TP is the number of pairs that are in the same cluster in both U and V (true positives); 2 TN is the number of pairs that are in different clusters in bothu and V (true negatives); 3 FN is the number of pairs that are in different clusters in U but the same cluster in V (false negatives); 4 FP is the number of pairs that are in the same cluster in U but different clusters in V (false positives). Rand index is defined as RI TP + TN TP + FP + FN + TN Rand index can be interpreted as the fraction of clustering decisions that are correct. Clearly RI [0, 1]. Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
38 Rand index (example) Introduction 877 Consider the following clustering The three clusters contain 6, 6 and 5 points, so we have ( ) ( ) ( ) Figure 25.1 Three clusters with labeled objects 6 inside. 6 Based on 5 Figure 16.4 of (Manning et al. 2008). TP + FP = + + = Clustering The number is an of unupervised true positives learning ( ) technique, ( ) so( it) is hard ( ) to evaluate the quality of the output of any given method. If we use probabilistic 5 5 models, 3 we can 2 always evaluate the likelihood of TP = = 20. atestset,butthishastwodrawbacks: 2 first,itdoesnotdirectlyassessanyclusteringthatis discovered Then FP = by 40 the model; 20 = 20. and Similarly, second, it FN does = 24 not and apply TN to = non-probabilistic 72. methods. So now we discuss some performance measures not based on likelihood. Hence Rand index Intuitively, the goal of clustering is to20 assign + 72points that are similar to the same cluster, and to ensure that points that RI are = dissimilar are 24 in + 72 different = clusters. There are several ways of Rand measuring index only these achieves quantities its lower e.g., see bound (Jainofand 0 ifdubes TP = 1988; TN = Kaufman 0, whichand is arousseeuw rare event. 1990). We However, can define these an adjusted internal criteria Rand index may be of limited use. An alternative is to rely on some external form of data with which to validate the method. For example, suppose we have labels for each index E[index] object, as in Figure (Equivalently, ARI we can have a reference clustering; given a clustering, we can induce a set of labels and vice versa.) maxthen indexwe can E[index]. compare the clustering with the labels using various metrics which we describe below. We will use some of these metrics later, when Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
39 Mutual information Another measure of cluster quality is computing mutual information between U and V. Let P UV (i, j) = u i v j N be the probability that a randomly chosen object belongs to cluster u i in U and v j in V. Let P U (i) = u i N be the be the probability that a randomly chosen object belongs to cluster u i in U. Let P V (j) = v j N be the be the probability that a randomly chosen object belongs to cluster v j in V. Then mutual information is defined R C I(U, V ) P UV (i, j) log P UV (i, j) P U (i)p V (j). i=1 j=1 This lies between 0 and min{h(u), H(V )}. The maximum value can be achieved by using a lots of small clusters, which have low entropy. To compensate this, we can use normalized mutual information (NMI) This lies between 0 and 1. Please read section 25.1 of Murphy. I(U, V ) NMI (U, V ) 1 2 [H(U) + H(V )]. Hamid Beigy (Sharif University of Technology) Machine Learning Fall / 34
Clustering and Gaussian Mixture Models
Clustering and Gaussian Mixture Models Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 25, 2016 Probabilistic Machine Learning (CS772A) Clustering and Gaussian Mixture Models 1 Recap
More informationComputer Vision Group Prof. Daniel Cremers. 14. Clustering
Group Prof. Daniel Cremers 14. Clustering Motivation Supervised learning is good for interaction with humans, but labels from a supervisor are hard to obtain Clustering is unsupervised learning, i.e. it
More informationClustering using Mixture Models
Clustering using Mixture Models The full posterior of the Gaussian Mixture Model is p(x, Z, µ,, ) =p(x Z, µ, )p(z )p( )p(µ, ) data likelihood (Gaussian) correspondence prob. (Multinomial) mixture prior
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationClustering. CSL465/603 - Fall 2016 Narayanan C Krishnan
Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification
More informationLatent Variable Models and Expectation Maximization
Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2 Agenda Expectation-maximization
More informationData Preprocessing. Cluster Similarity
1 Cluster Similarity Similarity is most often measured with the help of a distance function. The smaller the distance, the more similar the data objects (points). A function d: M M R is a distance on M
More informationPattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM
Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures
More informationMixtures of Gaussians. Sargur Srihari
Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm
More informationClustering. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 8, / 26
Clustering Professor Ameet Talwalkar Professor Ameet Talwalkar CS26 Machine Learning Algorithms March 8, 217 1 / 26 Outline 1 Administration 2 Review of last lecture 3 Clustering Professor Ameet Talwalkar
More informationLatent Variable Models and Expectation Maximization
Latent Variable Models and Expectation Maximization Oliver Schulte - CMPT 726 Bishop PRML Ch. 9 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 218 Outlines Overview Introduction Linear Algebra Probability Linear Regression 1
More informationCSE446: Clustering and EM Spring 2017
CSE446: Clustering and EM Spring 2017 Ali Farhadi Slides adapted from Carlos Guestrin, Dan Klein, and Luke Zettlemoyer Clustering systems: Unsupervised learning Clustering Detect patterns in unlabeled
More informationComputer Vision Group Prof. Daniel Cremers. 6. Mixture Models and Expectation-Maximization
Prof. Daniel Cremers 6. Mixture Models and Expectation-Maximization Motivation Often the introduction of latent (unobserved) random variables into a model can help to express complex (marginal) distributions
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1394 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1394 1 / 34 Table
More informationExpectation Maximization
Expectation Maximization Bishop PRML Ch. 9 Alireza Ghane c Ghane/Mori 4 6 8 4 6 8 4 6 8 4 6 8 5 5 5 5 5 5 4 6 8 4 4 6 8 4 5 5 5 5 5 5 µ, Σ) α f Learningscale is slightly Parameters is slightly larger larger
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationClustering and Gaussian Mixtures
Clustering and Gaussian Mixtures Oliver Schulte - CMPT 883 2 4 6 8 1 12 14 16 18 2 4 6 8 1 12 14 16 18 5 1 15 2 25 5 1 15 2 25 2 4 6 8 1 12 14 2 4 6 8 1 12 14 5 1 15 2 25 5 1 15 2 25 detected tures detected
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Expectation Maximization (EM) and Mixture Models Hamid R. Rabiee Jafar Muhammadi, Mohammad J. Hosseini Spring 203 http://ce.sharif.edu/courses/9-92/2/ce725-/ Agenda Expectation-maximization
More informationClustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.
Clustering K-means Machine Learning CSE546 Carlos Guestrin University of Washington November 4, 2014 1 Clustering images Set of Images [Goldberger et al.] 2 1 K-means Randomly initialize k centers µ (0)
More informationPerformance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project
Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore
More informationLinear & nonlinear classifiers
Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table
More informationClustering. Léon Bottou COS 424 3/4/2010. NEC Labs America
Clustering Léon Bottou NEC Labs America COS 424 3/4/2010 Agenda Goals Representation Capacity Control Operational Considerations Computational Considerations Classification, clustering, regression, other.
More informationCSCI-567: Machine Learning (Spring 2019)
CSCI-567: Machine Learning (Spring 2019) Prof. Victor Adamchik U of Southern California Mar. 19, 2019 March 19, 2019 1 / 43 Administration March 19, 2019 2 / 43 Administration TA3 is due this week March
More informationClustering with k-means and Gaussian mixture distributions
Clustering with k-means and Gaussian mixture distributions Machine Learning and Object Recognition 2017-2018 Jakob Verbeek Clustering Finding a group structure in the data Data in one cluster similar to
More informationClustering with k-means and Gaussian mixture distributions
Clustering with k-means and Gaussian mixture distributions Machine Learning and Category Representation 2014-2015 Jakob Verbeek, ovember 21, 2014 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.14.15
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationLatent Variable Models and EM Algorithm
SC4/SM8 Advanced Topics in Statistical Machine Learning Latent Variable Models and EM Algorithm Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/atsml/
More informationGaussian Mixture Models
Gaussian Mixture Models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Some slides courtesy of Eric Xing, Carlos Guestrin (One) bad case for K- means Clusters may overlap Some
More informationMixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate
Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means
More informationPrinciples of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata
Principles of Pattern Recognition C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata e-mail: murthy@isical.ac.in Pattern Recognition Measurement Space > Feature Space >Decision
More informationIntelligent Systems:
Intelligent Systems: Undirected Graphical models (Factor Graphs) (2 lectures) Carsten Rother 15/01/2015 Intelligent Systems: Probabilistic Inference in DGM and UGM Roadmap for next two lectures Definition
More informationClustering, K-Means, EM Tutorial
Clustering, K-Means, EM Tutorial Kamyar Ghasemipour Parts taken from Shikhar Sharma, Wenjie Luo, and Boris Ivanovic s tutorial slides, as well as lecture notes Organization: Clustering Motivation K-Means
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 Discriminative vs Generative Models Discriminative: Just learn a decision boundary between your
More informationSparse Linear Models (10/7/13)
STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine
More informationSTATS 306B: Unsupervised Learning Spring Lecture 5 April 14
STATS 306B: Unsupervised Learning Spring 2014 Lecture 5 April 14 Lecturer: Lester Mackey Scribe: Brian Do and Robin Jia 5.1 Discrete Hidden Markov Models 5.1.1 Recap In the last lecture, we introduced
More informationIntroduction to Machine Learning
Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 20: Expectation Maximization Algorithm EM for Mixture Models Many figures courtesy Kevin Murphy s
More informationData Mining Techniques
Data Mining Techniques CS 6220 - Section 2 - Spring 2017 Lecture 6 Jan-Willem van de Meent (credit: Yijun Zhao, Chris Bishop, Andrew Moore, Hastie et al.) Project Project Deadlines 3 Feb: Form teams of
More informationThe Expectation-Maximization Algorithm
The Expectation-Maximization Algorithm Francisco S. Melo In these notes, we provide a brief overview of the formal aspects concerning -means, EM and their relation. We closely follow the presentation in
More informationProbabilistic clustering
Aprendizagem Automática Probabilistic clustering Ludwig Krippahl Probabilistic clustering Summary Fuzzy sets and clustering Fuzzy c-means Probabilistic Clustering: mixture models Expectation-Maximization,
More informationClustering K-means. Machine Learning CSE546. Sham Kakade University of Washington. November 15, Review: PCA Start: unsupervised learning
Clustering K-means Machine Learning CSE546 Sham Kakade University of Washington November 15, 2016 1 Announcements: Project Milestones due date passed. HW3 due on Monday It ll be collaborative HW2 grades
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationCENG 793. On Machine Learning and Optimization. Sinan Kalkan
CENG 793 On Machine Learning and Optimization Sinan Kalkan 2 Now Introduction to ML Problem definition Classes of approaches K-NN Support Vector Machines Softmax classification / logistic regression Parzen
More informationData Mining Techniques
Data Mining Techniques CS 622 - Section 2 - Spring 27 Pre-final Review Jan-Willem van de Meent Feedback Feedback https://goo.gl/er7eo8 (also posted on Piazza) Also, please fill out your TRACE evaluations!
More informationMachine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.
Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning
More informationFINAL: CS 6375 (Machine Learning) Fall 2014
FINAL: CS 6375 (Machine Learning) Fall 2014 The exam is closed book. You are allowed a one-page cheat sheet. Answer the questions in the spaces provided on the question sheets. If you run out of room for
More informationMachine Learning. Gaussian Mixture Models. Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall
Machine Learning Gaussian Mixture Models Zhiyao Duan & Bryan Pardo, Machine Learning: EECS 349 Fall 2012 1 The Generative Model POV We think of the data as being generated from some process. We assume
More informationData Exploration and Unsupervised Learning with Clustering
Data Exploration and Unsupervised Learning with Clustering Paul F Rodriguez,PhD San Diego Supercomputer Center Predictive Analytic Center of Excellence Clustering Idea Given a set of data can we find a
More informationK-Means and Gaussian Mixture Models
K-Means and Gaussian Mixture Models David Rosenberg New York University October 29, 2016 David Rosenberg (New York University) DS-GA 1003 October 29, 2016 1 / 42 K-Means Clustering K-Means Clustering David
More informationIntroduction to Machine Learning Midterm, Tues April 8
Introduction to Machine Learning 10-701 Midterm, Tues April 8 [1 point] Name: Andrew ID: Instructions: You are allowed a (two-sided) sheet of notes. Exam ends at 2:45pm Take a deep breath and don t spend
More informationClustering VS Classification
MCQ Clustering VS Classification 1. What is the relation between the distance between clusters and the corresponding class discriminability? a. proportional b. inversely-proportional c. no-relation Ans:
More informationExpectation maximization
Expectation maximization Subhransu Maji CMSCI 689: Machine Learning 14 April 2015 Motivation Suppose you are building a naive Bayes spam classifier. After your are done your boss tells you that there is
More informationMixture Models and EM
Mixture Models and EM Yoan Miche CIS, HUT March 17, 27 Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 1 / 23 Mise en Bouche Yoan Miche (CIS, HUT) Mixture Models and EM March 17, 27 2 / 23 Mise
More informationECE 5984: Introduction to Machine Learning
ECE 5984: Introduction to Machine Learning Topics: (Finish) Expectation Maximization Principal Component Analysis (PCA) Readings: Barber 15.1-15.4 Dhruv Batra Virginia Tech Administrativia Poster Presentation:
More informationReview and Motivation
Review and Motivation We can model and visualize multimodal datasets by using multiple unimodal (Gaussian-like) clusters. K-means gives us a way of partitioning points into N clusters. Once we know which
More informationInstance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows Kn-Nearest
More informationLecture 8: Clustering & Mixture Models
Lecture 8: Clustering & Mixture Models C4B Machine Learning Hilary 2011 A. Zisserman K-means algorithm GMM and the EM algorithm plsa clustering K-means algorithm K-means algorithm Partition data into K
More informationClustering with k-means and Gaussian mixture distributions
Clustering with k-means and Gaussian mixture distributions Machine Learning and Category Representation 2012-2013 Jakob Verbeek, ovember 23, 2012 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.12.13
More informationSTATS 306B: Unsupervised Learning Spring Lecture 2 April 2
STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationFinal Exam, Machine Learning, Spring 2009
Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3
More informationMachine Learning - MT Clustering
Machine Learning - MT 2016 15. Clustering Varun Kanade University of Oxford November 28, 2016 Announcements No new practical this week All practicals must be signed off in sessions this week Firm Deadline:
More informationUnsupervised Learning
2018 EE448, Big Data Mining, Lecture 7 Unsupervised Learning Weinan Zhang Shanghai Jiao Tong University http://wnzhang.net http://wnzhang.net/teaching/ee448/index.html ML Problem Setting First build and
More informationGaussian Mixture Models, Expectation Maximization
Gaussian Mixture Models, Expectation Maximization Instructor: Jessica Wu Harvey Mudd College The instructor gratefully acknowledges Andrew Ng (Stanford), Andrew Moore (CMU), Eric Eaton (UPenn), David Kauchak
More informationCS181 Midterm 2 Practice Solutions
CS181 Midterm 2 Practice Solutions 1. Convergence of -Means Consider Lloyd s algorithm for finding a -Means clustering of N data, i.e., minimizing the distortion measure objective function J({r n } N n=1,
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationTechnical Details about the Expectation Maximization (EM) Algorithm
Technical Details about the Expectation Maximization (EM Algorithm Dawen Liang Columbia University dliang@ee.columbia.edu February 25, 2015 1 Introduction Maximum Lielihood Estimation (MLE is widely used
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More informationClustering. Stephen Scott. CSCE 478/878 Lecture 8: Clustering. Stephen Scott. Introduction. Outline. Clustering.
1 / 19 sscott@cse.unl.edu x1 If no label information is available, can still perform unsupervised learning Looking for structural information about instance space instead of label prediction function Approaches:
More informationMultivariate Statistics
Multivariate Statistics Chapter 6: Cluster Analysis Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2017/2018 Master in Mathematical Engineering
More informationLinear Models for Classification
Linear Models for Classification Oliver Schulte - CMPT 726 Bishop PRML Ch. 4 Classification: Hand-written Digit Recognition CHINE INTELLIGENCE, VOL. 24, NO. 24, APRIL 2002 x i = t i = (0, 0, 0, 1, 0, 0,
More informationData Mining. Dimensionality reduction. Hamid Beigy. Sharif University of Technology. Fall 1395
Data Mining Dimensionality reduction Hamid Beigy Sharif University of Technology Fall 1395 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1395 1 / 42 Outline 1 Introduction 2 Feature selection
More informationSeries 7, May 22, 2018 (EM Convergence)
Exercises Introduction to Machine Learning SS 2018 Series 7, May 22, 2018 (EM Convergence) Institute for Machine Learning Dept. of Computer Science, ETH Zürich Prof. Dr. Andreas Krause Web: https://las.inf.ethz.ch/teaching/introml-s18
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationMachine Learning, Midterm Exam
10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have
More informationCOMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017
COMS 4721: Machine Learning for Data Science Lecture 16, 3/28/2017 Prof. John Paisley Department of Electrical Engineering & Data Science Institute Columbia University SOFT CLUSTERING VS HARD CLUSTERING
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationUnsupervised machine learning
Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels
More informationMachine Learning and Bayesian Inference. Unsupervised learning. Can we find regularity in data without the aid of labels?
Machine Learning and Bayesian Inference Dr Sean Holden Computer Laboratory, Room FC6 Telephone extension 6372 Email: sbh11@cl.cam.ac.uk www.cl.cam.ac.uk/ sbh11/ Unsupervised learning Can we find regularity
More informationLecture 10. Announcement. Mixture Models II. Topics of This Lecture. This Lecture: Advanced Machine Learning. Recap: GMMs as Latent Variable Models
Advanced Machine Learning Lecture 10 Mixture Models II 30.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ Announcement Exercise sheet 2 online Sampling Rejection Sampling Importance
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More informationLecture 14. Clustering, K-means, and EM
Lecture 14. Clustering, K-means, and EM Prof. Alan Yuille Spring 2014 Outline 1. Clustering 2. K-means 3. EM 1 Clustering Task: Given a set of unlabeled data D = {x 1,..., x n }, we do the following: 1.
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu November 3, 2015 Methods to Learn Matrix Data Text Data Set Data Sequence Data Time Series Graph
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationData Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Expectation Maximization Mark Schmidt University of British Columbia Winter 2018 Last Time: Learning with MAR Values We discussed learning with missing at random values in data:
More informationStatistical Learning. Dong Liu. Dept. EEIS, USTC
Statistical Learning Dong Liu Dept. EEIS, USTC Chapter 6. Unsupervised and Semi-Supervised Learning 1. Unsupervised learning 2. k-means 3. Gaussian mixture model 4. Other approaches to clustering 5. Principle
More informationUnsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto
Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian
More informationIntroduction to Machine Learning Midterm Exam
10-701 Introduction to Machine Learning Midterm Exam Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes, but
More informationCh 4. Linear Models for Classification
Ch 4. Linear Models for Classification Pattern Recognition and Machine Learning, C. M. Bishop, 2006. Department of Computer Science and Engineering Pohang University of Science and echnology 77 Cheongam-ro,
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationBAYESIAN DECISION THEORY
Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework Due Oct 15, 10.30 am Rules Please follow these guidelines. Failure to do so, will result in loss of credit. 1. Homework is due on the due date
More informationApplying cluster analysis to 2011 Census local authority data
Applying cluster analysis to 2011 Census local authority data Kitty.Lymperopoulou@manchester.ac.uk SPSS User Group Conference November, 10 2017 Outline Basic ideas of cluster analysis How to choose variables
More informationFINAL EXAM: FALL 2013 CS 6375 INSTRUCTOR: VIBHAV GOGATE
FINAL EXAM: FALL 2013 CS 6375 INSTRUCTOR: VIBHAV GOGATE You are allowed a two-page cheat sheet. You are also allowed to use a calculator. Answer the questions in the spaces provided on the question sheets.
More informationStatistical Machine Learning
Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x
More information