Clustering. CSL465/603 - Fall 2016 Narayanan C Krishnan


1 Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in

2 Supervised vs Unsupervised Learning Supervised learning - given $\{(x_i, y_i)\}_{i=1}^{N}$, learn a function $f: X \rightarrow Y$. Categorical output - classification; continuous output - regression. Unsupervised learning - given $\{x_i\}_{i=1}^{N}$, can we infer the structure of the data? Learning without a teacher. Clustering CSL465/603 - Machine Learning 2

3 Why Unsupervised Learning? Unlabeled data is cheap; labeled data is expensive and cumbersome to collect. Exploratory data analysis. Preprocessing step for supervised learning algorithms. Analysis of data in high dimensional spaces. Clustering CSL465/603 - Machine Learning 3

4 Cluster Analysis Discover groups such that samples within a group are more similar to each other than samples across groups Clustering CSL465/603 - Machine Learning 4

5 Applications of Clustering (1) Unsupervised image segmentation Clustering CSL465/603 - Machine Learning 5

6 Applications of Clustering (2) Image Compression Clustering CSL465/603 - Machine Learning 6

7 Applications of Clustering (3) Social network clustering Clustering CSL465/603 - Machine Learning 7

8 Applications of Clustering (4) Recommendation Systems Clustering CSL465/603 - Machine Learning 8

9 Components of Clustering A dissimilarity (similarity) function Measures the distance/dissimilarity between examples A loss function Evaluates the clusters An algorithm that optimizes this loss function Clustering CSL465/603 - Machine Learning 9

10 Proximity Matrices Data is directly represented in terms of proximity between pairs of objects. Subjectively judged dissimilarities are seldom distances in the strict sense (they need not satisfy the properties of a distance measure). Symmetrize by replacing the proximity matrix $D$ with $(D + D^T)/2$. Clustering CSL465/603 - Machine Learning 10

11 Dissimilarity Based on Attributes (1) Data point $x_i$ has $D$ features. Attributes are real-valued. (Squared) Euclidean distance between the data points: $D(x_i, x_j) = \sum_{d=1}^{D} (x_{id} - x_{jd})^2$. The resulting clusters are invariant to rotation and translation, but not to scaling. If features have different scales - standardize the data. Clustering CSL465/603 - Machine Learning 11

12 Dissimilarity Based on Attributes (2) Data point $x_i$ has $D$ features. Attributes are real-valued. Any $L_p$ norm: $D(x_i, x_j) = \sum_{d=1}^{D} |x_{id} - x_{jd}|^p$. Cosine distance between the data points: $D(x_i, x_j) = \frac{\sum_{d=1}^{D} x_{id} x_{jd}}{\sqrt{\sum_{d=1}^{D} x_{id}^2}\,\sqrt{\sum_{d=1}^{D} x_{jd}^2}}$. Clustering CSL465/603 - Machine Learning 12
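To make these dissimilarities concrete, here is a minimal NumPy sketch (not part of the slides; function and variable names are illustrative) computing the squared Euclidean, $L_p$, and cosine measures for two feature vectors:

```python
import numpy as np

def squared_euclidean(xi, xj):
    """Sum of squared coordinate differences (slide 11)."""
    return np.sum((xi - xj) ** 2)

def lp_dissimilarity(xi, xj, p=2):
    """Sum of absolute coordinate differences raised to the power p (slide 12)."""
    return np.sum(np.abs(xi - xj) ** p)

def cosine_similarity(xi, xj):
    """Cosine of the angle between xi and xj (slide 12)."""
    return np.dot(xi, xj) / (np.linalg.norm(xi) * np.linalg.norm(xj))

# Example with two 3-dimensional points (standardized features assumed).
xi = np.array([1.0, 0.5, -2.0])
xj = np.array([0.0, 1.5, -1.0])
print(squared_euclidean(xi, xj), lp_dissimilarity(xi, xj, p=1), cosine_similarity(xi, xj))
```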

13 Dissimilarity Based on Attributes (3) Data point $x_i$ has $D$ features. Attributes are ordinal, e.g., grades A, B, C, D, or answers to a survey question - strongly agree, agree, neutral, disagree. Replace the $M$ ordinal values by the quantitative representations $\frac{m - 1/2}{M}$, $m = 1, \ldots, M$. Clustering CSL465/603 - Machine Learning 13

14 Dissimilarity Based on Attributes (4) Data point $x_i$ has $D$ features. Attributes are categorical - the values of an attribute are unordered. Define an explicit difference between every pair of values, i.e., a matrix with entries $d_{11}, d_{12}, \ldots, d_{MM}$. Often, for identical values $d_{m,m'} = 0$ (if $m = m'$) and for different values $d_{m,m'} = 1$ (if $m \neq m'$). Clustering CSL465/603 - Machine Learning 14

15 Loss Function for Clustering (1) Assign each observation to a cluster without regard to a probability model describing the data. Let $K$ be the number of clusters and $k$ index the clusters. Each observation is assigned to one and only one cluster; view the assignment as a function $C(i) = k$. Loss function: $W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} d(x_i, x_{i'})$. It characterizes the extent to which observations assigned to the same cluster tend to be close to one another - the within-cluster distance/scatter. Clustering CSL465/603 - Machine Learning 15

16 Loss Function for Clustering (2) Consider the total point scatter $T = \frac{1}{2} \sum_{i=1}^{N} \sum_{i'=1}^{N} d_{ii'}$. This can be divided as $T = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \left( \sum_{C(i')=k} d_{ii'} + \sum_{C(i') \neq k} d_{ii'} \right)$, i.e., $T = W(C) + B(C)$. Clustering CSL465/603 - Machine Learning 16

17 Loss Function for Clustering (3) The function $B(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i') \neq k} d_{ii'}$ is the between-cluster distance/scatter. Since $T$ is constant (it does not depend on the assignment $C$), minimizing $W(C)$ is equivalent to maximizing $B(C)$. Clustering CSL465/603 - Machine Learning 17
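As a quick sanity check on this decomposition, the following sketch (hypothetical helper names, assuming squared Euclidean dissimilarities) computes $W(C)$, $B(C)$ and $T$ for an arbitrary assignment and verifies that $T = W(C) + B(C)$:

```python
import numpy as np

def pairwise_sq_dists(X):
    """Matrix of squared Euclidean distances d_ii' between all pairs of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def scatter_decomposition(X, assignments):
    """Return (W, B, T) using the 1/2-weighted sums from the slides."""
    D = pairwise_sq_dists(X)
    same = assignments[:, None] == assignments[None, :]
    W = 0.5 * np.sum(D[same])     # within-cluster scatter
    B = 0.5 * np.sum(D[~same])    # between-cluster scatter
    T = 0.5 * np.sum(D)           # total point scatter
    return W, B, T

X = np.random.randn(50, 2)
assignments = np.random.randint(0, 3, size=50)   # arbitrary C(i) in {0, 1, 2}
W, B, T = scatter_decomposition(X, assignments)
print(np.isclose(T, W + B))                      # True: T = W(C) + B(C)
```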

18 Combinatorial Clustering Minimize $W$ over all possible assignments of $N$ data points to $K$ clusters. Unfortunately this is feasible only for very small data sets: the number of distinct assignments is $S(N, K) = \frac{1}{K!} \sum_{k=1}^{K} (-1)^{K-k} \binom{K}{k} k^{N}$. For example, $S(10, 4) = 34{,}105$ while $S(19, 4) \approx 10^{10}$. Not a practical clustering algorithm. Clustering CSL465/603 - Machine Learning 18
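A small sketch (illustrative, not from the slides) that evaluates the formula above and reproduces the quoted values:

```python
from math import comb, factorial

def num_assignments(N, K):
    """Number of distinct ways to assign N points to K non-empty clusters
    (a Stirling number of the second kind)."""
    return sum((-1) ** (K - k) * comb(K, k) * k ** N for k in range(1, K + 1)) // factorial(K)

print(num_assignments(10, 4))   # 34105
print(num_assignments(19, 4))   # about 10^10, as on the slide
```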

19 K-Means Clustering (1) Most popular iterative descent clustering method. Suppose all variables/features are real-valued and we use the squared Euclidean distance as the dissimilarity measure: $d(x_i, x_{i'}) = \|x_i - x_{i'}\|^2$. The within-cluster scatter can be written as $W(C) = \frac{1}{2} \sum_{k=1}^{K} \sum_{C(i)=k} \sum_{C(i')=k} \|x_i - x_{i'}\|^2 = \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - \bar{x}_k\|^2$, where $\bar{x}_k$ is the mean of the points in cluster $k$ and $N_k$ is their number. Clustering CSL465/603 - Machine Learning 19

20 K-Means Clustering (2) Find $C^* = \arg\min_{C} \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - \bar{x}_k\|^2$. Note that for any set $S$ of points, $\bar{x}_S = \arg\min_{m} \sum_{i \in S} \|x_i - m\|^2$. So instead solve the enlarged problem $C^* = \arg\min_{C,\, \{m_k\}_{k=1}^{K}} \sum_{k=1}^{K} N_k \sum_{C(i)=k} \|x_i - m_k\|^2$. Clustering CSL465/603 - Machine Learning 20

21 K-Means Clustering (3) Find the optimal solution using Expectation Maximization - an iterative procedure consisting of two steps. Expectation step (E step): fix the mean vectors $\{m_k\}_{k=1}^{K}$ and find the optimal assignment $C$. Maximization step (M step): fix the cluster assignment $C$ and find the optimal mean vectors $\{m_k\}_{k=1}^{K}$. Each step of this procedure reduces the loss function value. Clustering CSL465/603 - Machine Learning 21
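A minimal NumPy sketch of this alternation, assuming the means are initialized from randomly chosen data points (function and variable names are illustrative):

```python
import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    """Plain K-means: alternate the assignment (E) and mean-update (M) steps."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=K, replace=False)]    # initial cluster means
    for _ in range(n_iters):
        # E step: assign each point to its nearest mean
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=-1)
        C = np.argmin(dists, axis=1)
        # M step: recompute each mean as the average of its assigned points
        new_means = np.array([X[C == k].mean(axis=0) if np.any(C == k) else means[k]
                              for k in range(K)])
        if np.allclose(new_means, means):
            break
        means = new_means
    return C, means

# Toy data: three well-separated blobs.
X = np.vstack([np.random.randn(100, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
assignments, centers = kmeans(X, K=3)
```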

22 K-Means Clustering Illustration (1) Clustering CSL465/603 - Machine Learning 22

23 K-Means Clustering Illustration (2) Clustering CSL465/603 - Machine Learning 23

24 K-Means Clustering Illustration (3) Clustering CSL465/603 - Machine Learning 24

25 K-Means Clustering Illustration (4) Clustering CSL465/603 - Machine Learning 25

26 K-Means Clustering Illustration (5) Clustering CSL465/603 - Machine Learning 26

27 K-Means Clustering Illustration (6) Clustering CSL465/603 - Machine Learning 27

28 K-Means Clustering Illustration (7) Clustering CSL465/603 - Machine Learning 28

29 K-Means Clustering Illustration (8) Clustering CSL465/603 - Machine Learning 29

30 K-Means Clustering Illustration (9) Clustering CSL465/603 - Machine Learning 30

31 K-Means Clustering Illustration (10) Blue point - Expectation step; red point - Maximization step. Clustering CSL465/603 - Machine Learning 31

32 How to Choose K? Similar to choosing $k$ in $k$-NN. The loss function generally decreases as $K$ increases. Clustering CSL465/603 - Machine Learning 32
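One common heuristic consistent with this observation is to plot the final within-cluster loss against $K$ and look for a knee in the curve. A hedged sketch using scikit-learn's KMeans, whose inertia_ attribute holds the within-cluster sum of squares:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.vstack([np.random.randn(100, 2) + c for c in ([0, 0], [5, 5], [0, 5])])
losses = [KMeans(n_clusters=K, n_init=10).fit(X).inertia_ for K in range(1, 10)]
# losses decreases with K; a "knee" in this curve is a common heuristic choice of K.
print(losses)
```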

33 Limitations of K-Means Clustering Hard assignments are susceptible to noise/outliers Assumes spherical (convex) clusters with uniform prior on the clusters Clusters can change arbitrarily for different K and initializations Clustering CSL465/603 - Machine Learning 33

34 K-Medoids K-Means is suitable only when using the Euclidean distance, is susceptible to outliers, and faces a challenge when the centroid of a cluster is not a valid data point. Generalizing K-Means to arbitrary distance measures: replace the mean calculation by a medoid (generalized median) calculation. This ensures the cluster center is a medoid - always a valid data point. It increases computation, as we now have to find the medoid. Clustering CSL465/603 - Machine Learning 34
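A sketch of the medoid update under an arbitrary precomputed dissimilarity matrix (illustrative names; a full K-medoids/PAM implementation also needs an outer loop alternating these two steps):

```python
import numpy as np

def update_medoids(D, C, K):
    """Given a precomputed dissimilarity matrix D and assignments C, pick as the
    medoid of each cluster the member minimizing total dissimilarity to the others."""
    medoids = np.empty(K, dtype=int)
    for k in range(K):
        members = np.where(C == k)[0]
        within = D[np.ix_(members, members)].sum(axis=1)
        medoids[k] = members[np.argmin(within)]   # a valid data point, by construction
    return medoids

def assign_to_medoids(D, medoids):
    """Assign every point to its closest medoid."""
    return np.argmin(D[:, medoids], axis=1)
```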

35 Soft K-Means as Gaussian Mixture Models (1) Probabilistic clusters: each cluster is associated with a Gaussian distribution $N(\mu_k, \Sigma_k)$, and each cluster also has a prior probability $\pi_k$. Then the likelihood of a data point drawn from the $K$ clusters is $P(x) = \sum_{k=1}^{K} \pi_k P(x \mid \mu_k, \Sigma_k)$, where $\sum_{k=1}^{K} \pi_k = 1$. Clustering CSL465/603 - Machine Learning 35

36 Soft K-Means as Gaussian Mixture Models (2) Given $N$ iid data points, the likelihood function $P(x_1, \ldots, x_N)$ is $P(x_1, \ldots, x_N) = \prod_{i=1}^{N} P(x_i)$. Clustering CSL465/603 - Machine Learning 36

37 Soft K-Means as Gaussian Mixture Models (3) Given $N$ iid data points, the likelihood function is $P(x_1, \ldots, x_N) = \prod_{i=1}^{N} P(x_i) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$. Let us take the log likelihood. Clustering CSL465/603 - Machine Learning 37

38 Soft K-Means as Gaussian Mixture Models (4) Given $N$ iid data points, the likelihood function is $P(x_1, \ldots, x_N) = \prod_{i=1}^{N} P(x_i) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$. The log likelihood is $\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$. Clustering CSL465/603 - Machine Learning 38

39 Soft K-Means as Gaussian Mixture Models (5) Problem with maximum likelihood: in $\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$ the sum over the components appears inside the log, thus coupling all the parameters, and the optimization can run into singularities. Clustering CSL465/603 - Machine Learning 39

40 Soft K-Means as Gaussian Mixture Models (6) Latent variables: each data point $x_i$ is associated with a latent variable $z_i = (z_{i1}, \ldots, z_{iK})$, where $z_{ik} \in \{0, 1\}$, $\sum_{k=1}^{K} z_{ik} = 1$, and $P(z_{ik} = 1) = \pi_k$. Given the complete data $(X, Z)$, we look at maximizing $P(X, Z \mid \pi_k, \mu_k, \Sigma_k)$. Clustering CSL465/603 - Machine Learning 40

41 Soft K-Means as Gaussian Mixture Models (7) Latent variables: each data point $x_i$ is associated with a latent variable $z_i = (z_{i1}, \ldots, z_{iK})$, where $z_{ik} \in \{0, 1\}$, $\sum_{k=1}^{K} z_{ik} = 1$, and $P(z_{ik} = 1) = \pi_k$. Let the probability $P(z_{ik} = 1 \mid x_i)$ be denoted as $\gamma(z_{ik})$. From Bayes' theorem, $\gamma(z_{ik}) = P(z_{ik} = 1 \mid x_i) = \frac{P(z_{ik} = 1)\, P(x_i \mid z_{ik} = 1)}{P(x_i)}$, where the marginal distribution is $P(x_i) = \sum_{z_i} P(x_i, z_i) = \sum_{k=1}^{K} P(z_{ik} = 1)\, P(x_i \mid z_{ik} = 1)$. Clustering CSL465/603 - Machine Learning 41

42 Soft K-Means as Gaussian Mixture Models (8) Now, $P(z_{ik} = 1) = \pi_k$ and $P(x_i \mid z_{ik} = 1) = N(x_i \mid \mu_k, \Sigma_k)$. Therefore $\gamma(z_{ik}) = P(z_{ik} = 1 \mid x_i) = \frac{\pi_k\, N(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, N(x_i \mid \mu_j, \Sigma_j)}$. Clustering CSL465/603 - Machine Learning 42
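A hedged NumPy/SciPy sketch of this responsibility computation (the E step of the algorithm described later); multivariate_normal from scipy.stats supplies $N(x \mid \mu, \Sigma)$, and the function name is illustrative:

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pis, mus, Sigmas):
    """gamma[i, k] = pi_k N(x_i | mu_k, Sigma_k) / sum_j pi_j N(x_i | mu_j, Sigma_j)."""
    K = len(pis)
    weighted = np.column_stack([
        pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
        for k in range(K)
    ])
    return weighted / weighted.sum(axis=1, keepdims=True)
```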

43 Estimating the mean $\mu_k$ (1) Begin with the log-likelihood function $\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$, take the derivative with respect to $\mu_k$ and equate it to 0. Clustering CSL465/603 - Machine Learning 43

44 Estimating the mean $\mu_k$ (2) $\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik})\, x_i$, where $N_k = \sum_{i=1}^{N} \gamma(z_{ik})$ is the effective number of points assigned to cluster $k$. So the mean of the $k$-th Gaussian component is the weighted mean of all the points in the dataset, where the weight of the $i$-th data point is the posterior probability that component $k$ was responsible for generating $x_i$. Clustering CSL465/603 - Machine Learning 44

45 Estimating the Covariance $\Sigma_k$ Begin with the log-likelihood function $\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$. Taking the derivative with respect to $\Sigma_k$ and equating it to 0 gives $\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik}) (x_i - \mu_k)(x_i - \mu_k)^T$. Similar to the result for a single Gaussian fit to the dataset, but each data point is weighted by the corresponding posterior probability. Clustering CSL465/603 - Machine Learning 45

46 Estimating the mixing coefficients $\pi_k$ Begin with the log-likelihood function $\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k)$. Maximize the log-likelihood w.r.t. $\pi_k$, subject to the condition that $\sum_{k=1}^{K} \pi_k = 1$. Use a Lagrange multiplier $\lambda$ and maximize $\sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k P(x_i \mid \mu_k, \Sigma_k) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)$. Solving this results in $\pi_k = \frac{N_k}{N}$. Clustering CSL465/603 - Machine Learning 46

47 Soft K-Means as Gaussian Mixture Models (9) In summary: $\pi_k = \frac{N_k}{N}$, $\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik})\, x_i$, and $\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik}) (x_i - \mu_k)(x_i - \mu_k)^T$. But what if $z_{ik}$ (and hence $\gamma(z_{ik})$) is unknown? Use the EM algorithm! Clustering CSL465/603 - Machine Learning 47

48 EM for GMM First choose initial values for $\pi_k, \mu_k, \Sigma_k$, then alternate between the Expectation and Maximization steps. Expectation step (E): given the current parameters, compute the posterior probabilities $\gamma(z_{ik})$. Maximization step (M): given the posterior probabilities, update $\pi_k, \mu_k, \Sigma_k$. Clustering CSL465/603 - Machine Learning 48
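Putting the E and M steps together, a compact NumPy/SciPy sketch of EM for a GMM (a sketch only; production code should also monitor the log likelihood for convergence and guard against singular covariances):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iters=100, seed=0):
    """EM for a Gaussian mixture: alternate responsibilities (E) and parameter updates (M)."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(N, size=K, replace=False)]                       # initial means
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # E step: responsibilities gamma[i, k] (slide 42)
        weighted = np.column_stack([
            pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
            for k in range(K)
        ])
        gamma = weighted / weighted.sum(axis=1, keepdims=True)
        # M step: update pi_k, mu_k, Sigma_k (slides 44-47)
        Nk = gamma.sum(axis=0)                      # effective cluster sizes
        pis = Nk / N
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pis, mus, Sigmas, gamma
```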

49 EM for GMM Illustration (1) Clustering CSL465/603 - Machine Learning 49

50 EM for GMM Illustration (2) Clustering CSL465/603 - Machine Learning 50

51 EM for GMM Illustration (3) Clustering CSL465/603 - Machine Learning 51

52 EM for GMM Illustration (4) Clustering CSL465/603 - Machine Learning 52

53 EM for GMM Illustration (5) Clustering CSL465/603 - Machine Learning 53

54 EM for GMM Illustration (6) Clustering CSL465/603 - Machine Learning 54

55 Practical Issues with EM for GMM Takes many more iterations than k-means, and each iteration requires more computation. Run k-means first, and then EM for GMM; the covariances can be initialized to the covariances of the clusters obtained from k-means. EM is not guaranteed to find the global maximum of the log likelihood function. Check for convergence: stop when the log likelihood does not change significantly between two iterations. Clustering CSL465/603 - Machine Learning 55
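If scikit-learn is available, this advice can be followed directly: run KMeans first and pass its cluster statistics as initial values to GaussianMixture. A hedged example assuming a data matrix X (e.g., from the earlier K-means sketch); the parameter names reflect the scikit-learn API as I understand it, so check your installed version's documentation:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

K = 3
kmeans = KMeans(n_clusters=K, n_init=10).fit(X)

# Initialize the mixture from the k-means solution: weights from cluster sizes,
# means from the centroids. (GaussianMixture's default init_params='kmeans'
# does something similar internally.)
counts = np.bincount(kmeans.labels_, minlength=K)
gmm = GaussianMixture(
    n_components=K,
    weights_init=counts / counts.sum(),
    means_init=kmeans.cluster_centers_,
    tol=1e-4,   # stop when the change in the lower bound on the log likelihood is small
).fit(X)

labels = gmm.predict(X)        # hard assignments
gamma = gmm.predict_proba(X)   # soft responsibilities
```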

56 Hierarchical Clustering (1) Organize clusters in a hierarchical fashion Produces a rooted binary tree (dendrogram) Clustering CSL465/603 - Machine Learning 56

57 Hierarchical Clustering (2) Bottom-up (agglomerative): recursively merge the two groups with the smallest between-cluster dissimilarity. Top-down (divisive): recursively split the least coherent cluster. Users can choose a cut through the hierarchy to represent the most natural division into clusters. Clustering CSL465/603 - Machine Learning 57

58 Hierarchical Clustering (3) Bottom-up (agglomerative): recursively merge the two groups with the smallest between-cluster dissimilarity. Top-down (divisive): recursively split the least coherent cluster. Both share a monotonicity property - the dissimilarity between merged clusters increases monotonically with the level of the merger. Cophenetic correlation coefficient - the correlation between the $N(N-1)/2$ pairwise observation dissimilarities and the cophenetic dissimilarities derived from the dendrogram. Cophenetic dissimilarity - the inter-group dissimilarity at which two observations are first joined together in the same cluster. Clustering CSL465/603 - Machine Learning 58
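With SciPy, the cophenetic correlation coefficient can be computed directly from a linkage matrix. The sketch below assumes a data matrix X and uses average linkage as one possible choice:

```python
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

dists = pdist(X)                        # the N(N-1)/2 pairwise dissimilarities
Z = linkage(dists, method='average')    # agglomerative clustering (dendrogram encoding)
c, coph_dists = cophenet(Z, dists)      # correlation between dists and cophenetic dissimilarities
print(c)                                # closer to 1 => the dendrogram preserves the distances well
```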

59 Agglomerative Clustering (1) Single linkage - distance between the two most similar points in $G$ and $H$: $D_{SL}(G, H) = \min_{i \in G,\, j \in H} d(i, j)$. Also referred to as nearest-neighbor linkage. Results in extended clusters through chaining. May violate the compactness property (clusters with large diameter). Clustering CSL465/603 - Machine Learning 59

60 Agglomerative Clustering (2) Complete linkage - distance between the two most dissimilar points in $G$ and $H$: $D_{CL}(G, H) = \max_{i \in G,\, j \in H} d(i, j)$. The furthest-neighbor technique. Forces spherical clusters with consistent diameter. May violate the closeness property. Clustering CSL465/603 - Machine Learning 60

61 Agglomerative Clustering (3) Average linkage (group average) - average dissimilarity between the groups: $D_{GA}(G, H) = \frac{1}{N_G N_H} \sum_{i \in G} \sum_{j \in H} d(i, j)$. Less affected by outliers. Clustering CSL465/603 - Machine Learning 61

62 Agglomerative Clustering (4) Average Linkage Complete Linkage Single Linkage Clustering CSL465/603 - Machine Learning 62

63 Density-Based Clustering (1) (Extra Topic) DBSCAN Density Based Spatial Clustering of Applications with Noise Proposed by Ester, Kriegel, Sander and Xu (KDD 96) KDD 2014 Test of Time Award Winner Basic Idea Clusters are dense regions in the data space, separated by regions of lower object density Discovers clusters of arbitrary shape in spatial databases with noise Clustering CSL465/603 - Machine Learning 63

64 Density-Based Clustering (2) Why Density-Based Clustering? Results of a k-medoid algorithm for k=4 Clustering CSL465/603 - Machine Learning 64

65 Density-Based Clustering (3) Principle: for any point in a cluster, the local point density around that point has to exceed some threshold, and the set of points from one cluster must be spatially connected. DBSCAN defines two parameters: $\varepsilon$ - the radius of the neighborhood of a point $p$, $N_\varepsilon(p) = \{q \in X \mid d(p, q) \leq \varepsilon\}$; and MinPts - the minimum number of points required in the neighborhood $N_\varepsilon(p)$. Clustering CSL465/603 - Machine Learning 65

66 ε-Neighborhood ε-Neighborhood - the objects within a radius of ε from an object: $N_\varepsilon(p) = \{q \in X \mid d(p, q) \leq \varepsilon\}$. High density - the ε-neighborhood of an object contains at least MinPts objects. (Figure: ε-neighborhoods around points p and q.) Clustering CSL465/603 - Machine Learning 66

67 Core, Border and Outlier Points (Figure: ε = 1, MinPts = 5; core, border and outlier points.) Given ε and MinPts, categorize the objects into three exclusive groups. Core point - has at least MinPts points within its ε-neighborhood (an interior point of a cluster). Border point - has fewer than MinPts points within its ε-neighborhood, but is in the neighborhood of a core point. Noise/Outlier - any point that is neither a core nor a border point. Clustering CSL465/603 - Machine Learning 67
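A small sketch (illustrative names; the ε-neighborhood here includes the point itself) that labels every point as core, border, or outlier given ε and MinPts:

```python
import numpy as np

def classify_points(X, eps, min_pts):
    """Return an array with 'core', 'border' or 'outlier' for every point."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = D <= eps                              # epsilon-neighborhoods (point included)
    is_core = neighbors.sum(axis=1) >= min_pts
    labels = np.full(len(X), 'outlier', dtype=object)
    labels[is_core] = 'core'
    # Border: not core, but inside the epsilon-neighborhood of some core point.
    is_border = ~is_core & (neighbors & is_core[None, :]).any(axis=1)
    labels[is_border] = 'border'
    return labels
```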

68 Density Reachability (1) Directly density-reachable: an object q is directly density-reachable from object p if p is a core object and q is in p's ε-neighborhood. Direct density reachability is asymmetric. (Figure: MinPts = 4; q lies in the ε-neighborhood of core point p.) Clustering CSL465/603 - Machine Learning 68

69 Density Reachability (2) Density-reachable (directly and indirectly): a point p is directly density-reachable from p2, p2 is directly density-reachable from p1, and p1 is directly density-reachable from q, so q, p1, p2, p form a chain and p is indirectly density-reachable from q. (Figure: chain of points q, p1, p2, p.) Clustering CSL465/603 - Machine Learning 69

70 Density-Connectivity Density-reachability is not symmetric, so it is not good enough on its own to describe clusters. Density-connected: a pair of points p and q are density-connected if they are commonly density-reachable from a point o. Density-connectivity is symmetric. (Figure: p and q both density-reachable from o.) Clustering CSL465/603 - Machine Learning 70

71 Cluster in DBSCAN Given a dataset X, parameter ε and threshold MinPts, a cluster C is a subset of objects satisfying two criteria. Connected: $\forall p, q \in C$, p and q are density-connected. Maximal: $\forall p, q \in X$, if $p \in C$ and q is density-reachable from p, then $q \in C$. Clustering CSL465/603 - Machine Learning 71

72 DBSCAN - Algorithm Input: dataset X, parameters ε and MinPts. For each object $p \in X$: if p is a core object and not processed, then C = retrieve all objects density-reachable from p, mark all objects in C as processed, and report C as a cluster; else mark p as an outlier. If p is a border point, no points are density-reachable from p and the DBSCAN algorithm visits the next point in X. Clustering CSL465/603 - Machine Learning 72
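A compact sketch of the loop described above (illustrative; it expands each cluster by a breadth-first search over ε-neighborhoods rather than reproducing the exact bookkeeping of the original paper):

```python
import numpy as np
from collections import deque

def dbscan(X, eps, min_pts):
    """Return cluster labels (0, 1, ...) with -1 marking noise/outliers."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.where(D[i] <= eps)[0] for i in range(len(X))]
    is_core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(len(X), -1)          # -1 = not yet assigned / outlier
    cluster_id = 0
    for p in range(len(X)):
        if not is_core[p] or labels[p] != -1:
            continue                       # skip non-core or already processed points
        # Grow a new cluster: all points density-reachable from core point p.
        labels[p] = cluster_id
        queue = deque([p])
        while queue:
            q = queue.popleft()
            for r in neighbors[q]:
                if labels[r] == -1:
                    labels[r] = cluster_id
                    if is_core[r]:         # only core points propagate reachability
                        queue.append(r)
        cluster_id += 1
    return labels
```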

73 DBSCAN Algorithm Illustration (1) ε = 2, MinPts = 3. For each object $p \in X$: if p is a core object and not processed, then C = retrieve all objects density-reachable from p, mark all objects in C as processed, and report C as a cluster; else mark p as an outlier. Clustering CSL465/603 - Machine Learning 73

74 DBSCAN Algorithm Illustration (2) ε = 2, MinPts = 3. For each object $p \in X$: if p is a core object and not processed, then C = retrieve all objects density-reachable from p, mark all objects in C as processed, and report C as a cluster; else mark p as an outlier. Clustering CSL465/603 - Machine Learning 74

75 DBSCAN Algorithm Illustration (3) ε = 2, MinPts = 3. For each object $p \in X$: if p is a core object and not processed, then C = retrieve all objects density-reachable from p, mark all objects in C as processed, and report C as a cluster; else mark p as an outlier. Clustering CSL465/603 - Machine Learning 75

76 DBSCAN Example (1) Where it works Original Points Clustering CSL465/603 - Machine Learning 76

77 DBSCAN Example (2) Where it does not work Varying densities Original points Clustering CSL465/603 - Machine Learning 77

78 Summary Unsupervised Learning K-means clustering Expectation Maximization for discovering the clusters K-medoids clustering Gaussian Mixture Models Expectation Maximization for estimating the parameters of the Gaussian mixtures Hierarchical Clustering Agglomerative Clustering Density Based Clustering DBSCAN Clustering CSL465/603 - Machine Learning 78
