Unsupervised Learning: K-Means, Gaussian Mixture Models


Unsupervised Learning: K-Means, Gaussian Mixture Models These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution. Please send comments and corrections to Eric. Robot Image Credit: Viktoriya Sukhanova 123RF.com

Unsupervised Learning


Unsupervised Learning Supervised learning used labeled data pairs (x, y) to learn a function f : X → Y. But what if we don't have labels? Most data doesn't! No labels = unsupervised learning. Only some points labeled = semi-supervised learning; labels may be expensive to obtain, so we only get a few.

Unsupervised Learning Major categories (and their uses): clustering, density estimation, dimensionality reduction (feature selection)

Kernel Density Estimation Some material adapted from slides by Le Song, GT.
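The kernel density estimation slides here were figures in the original deck. As a rough illustration of the idea (place a kernel on every sample and average them), below is a minimal one-dimensional sketch; the Gaussian kernel, the bandwidth value, and the function name are my own choices rather than anything specified on the slides.

```python
import numpy as np

def gaussian_kde_1d(data, query_points, bandwidth=0.3):
    """1-D kernel density estimate: average a Gaussian kernel centered on each sample."""
    data = np.asarray(data, dtype=float)
    query_points = np.asarray(query_points, dtype=float)
    # Scaled differences between every query point and every sample
    u = (query_points[:, None] - data[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    # Average of the kernels, normalized by the bandwidth
    return kernel.sum(axis=1) / (len(data) * bandwidth)

# Example: samples drawn from two well-separated Gaussians
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2, 0.5, 200), rng.normal(3, 1.0, 200)])
xs = np.linspace(-5, 7, 200)
density = gaussian_kde_1d(samples, xs)
```

A smaller bandwidth yields a spikier estimate that follows individual samples; a larger one smooths the two modes together.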


K-Means Clustering Some material adapted from slides by Andrew Moore, CMU. Visit http://www.autonlab.org/tutorials/ for Andrew's repository of Data Mining tutorials.

Clustering Data

K-Means Clustering K-Means(k, X): Randomly choose k cluster center locations (centroids). Loop until convergence: assign each point to the cluster of the closest centroid, then re-estimate the cluster centroids based on the data assigned to each cluster.
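A minimal sketch of the loop just described, assuming Euclidean distance; the iteration cap and the choice of initial centroids as randomly selected data points are my own additions, not part of the slides.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Plain K-Means: random initial centroids, then alternate assign / re-estimate."""
    rng = np.random.default_rng(seed)
    # Randomly choose k data points as the initial cluster centers (centroids)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: re-estimate each centroid as the mean of its assigned points
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):  # converged
            break
        centroids = new_centroids
    # Final assignment against the final centroids
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    return centroids, labels
```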


K-Means Animation Example generated by Andrew Moore using Dan Pelleg's superduper fast K-means system: Dan Pelleg and Andrew Moore. Accelerating Exact k-means Algorithms with Geometric Reasoning. Proc. Conference on Knowledge Discovery in Databases, 1999.


K-Means Objective Function K-means finds a local optimum of the following objective function:
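The formula itself appeared only as a figure in the original slide; the objective K-Means locally minimizes is the within-cluster sum of squared distances, where $\mu_i$ is the centroid of cluster $c_i$:

$$ J(c, \mu) = \sum_{i=1}^{k} \sum_{x \in c_i} \lVert x - \mu_i \rVert^2 $$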

Problems with K-Means Very sensitive to the initial points. Remedies: do many runs of K-Means, each with different initial centroids, or seed the centroids using a better method than choosing them at random (e.g., farthest-first sampling; a sketch follows below). Must manually choose k. Remedy: learn the optimal k for the clustering; note that this requires a performance measure.
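A small sketch of the farthest-first seeding mentioned above: start from one random point, then repeatedly add the data point farthest from every centroid chosen so far. The function name and details are my own, not from the slides.

```python
import numpy as np

def farthest_first_centroids(X, k, seed=0):
    """Farthest-first seeding: greedily pick points that are far from all chosen centroids."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Distance from every point to its nearest already-chosen centroid
        dists = np.linalg.norm(
            X[:, None, :] - np.array(centroids)[None, :, :], axis=2
        ).min(axis=1)
        centroids.append(X[dists.argmax()])
    return np.array(centroids)
```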

Choosing the key parameter k:
1. The Elbow Method: $\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in c_i} \mathrm{dist}(x, c_i)^2$
2. X-means clustering: repeatedly attempting subdivision
3. Others: an Information Criterion approach, an Information Theoretic approach, the Silhouette Method, cross-validation
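A sketch of the elbow method using scikit-learn's KMeans, whose inertia_ attribute is exactly the SSE above; the synthetic data and the range of candidate k values are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def elbow_curve(X, k_values):
    """SSE (scikit-learn's inertia_) for each candidate k; plot it and look for
    the 'elbow' where the curve stops dropping sharply."""
    return {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
            for k in k_values}

# Example on synthetic 2-D data with three true clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in [(0, 0), (4, 4), (0, 5)]])
curve = elbow_curve(X, range(1, 9))
```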


Problems with K-Means How do you tell it which clustering you want? For example, two different clusterings of the same data can both be reasonable with k = 2. Constrained clustering techniques (semi-supervised) address this: a same-cluster constraint (must-link) forces two points into the same cluster, and a different-cluster constraint (cannot-link) keeps them apart.

Gaussian Mixture Models and Expectation Maximization

Gaussian Mixture Models Recall the Gaussian distribution:
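The distribution itself was shown as a figure; for reference, the multivariate Gaussian density with mean $\mu$ and covariance $\Sigma$ in $d$ dimensions is

$$ \mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left( -\tfrac{1}{2} (x - \mu)^{\top} \Sigma^{-1} (x - \mu) \right) $$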

The GMM assumption There are k components; the i-th component is called ω_i. Component ω_i has an associated mean vector μ_i. Each component generates data from a Gaussian with mean μ_i and covariance matrix σ²I. Assume that each datapoint is generated according to the following recipe: pick a component at random, choosing component i with probability P(ω_i); then draw the datapoint ~ N(μ_i, σ²I). Copyright 2001, 2004, Andrew W. Moore

The General GMM assumption There are k components; the i-th component is called ω_i. Component ω_i has an associated mean vector μ_i. Each component generates data from a Gaussian with mean μ_i and its own covariance matrix Σ_i. Assume that each datapoint is generated according to the following recipe: pick a component at random, choosing component i with probability P(ω_i); then draw the datapoint ~ N(μ_i, Σ_i). Copyright 2001, 2004, Andrew W. Moore
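A short sketch of this generative recipe for the general GMM, using numpy; the example component parameters at the end are made up for illustration.

```python
import numpy as np

def sample_gmm(n, means, covariances, weights, seed=0):
    """Follow the GMM recipe: pick component i with probability P(omega_i),
    then draw the datapoint from N(mu_i, Sigma_i)."""
    rng = np.random.default_rng(seed)
    components = rng.choice(len(weights), size=n, p=weights)
    points = np.array([
        rng.multivariate_normal(means[i], covariances[i]) for i in components
    ])
    return points, components

# Example: three 2-D components with different shapes (values made up for illustration)
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0]), np.array([-3.0, 5.0])]
covs = [np.eye(2), 0.5 * np.eye(2), np.array([[1.0, 0.8], [0.8, 1.0]])]
X, z = sample_gmm(500, means, covs, weights=[0.5, 0.3, 0.2])
```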

Expectation-Maximization for GMMs Iterate until convergence. On the t-th iteration let our estimates be λ_t = {μ_1(t), μ_2(t), ..., μ_c(t)}. E-step: compute the expected classes of all datapoints for each class; this just requires evaluating a Gaussian at each x_k. M-step: estimate μ given our data's class membership distributions. Copyright 2001, 2004, Andrew W. Moore
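The E-step and M-step formulas were figures in the original slide; in standard form for this simplified setting (fixed spherical covariance $\sigma^2 I$ and given class priors $P(\omega_i)$), they are

E-step (responsibilities):
$$ P(\omega_i \mid x_k, \lambda_t) = \frac{p(x_k \mid \omega_i, \mu_i(t))\, P(\omega_i)}{\sum_{j=1}^{c} p(x_k \mid \omega_j, \mu_j(t))\, P(\omega_j)} $$

M-step (re-estimated means):
$$ \mu_i(t+1) = \frac{\sum_k P(\omega_i \mid x_k, \lambda_t)\, x_k}{\sum_k P(\omega_i \mid x_k, \lambda_t)} $$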

E.M. for General GMMs Iterate. On the t-th iteration let our estimates be λ_t = {μ_1(t), ..., μ_c(t), Σ_1(t), ..., Σ_c(t), p_1(t), ..., p_c(t)}, where p_i(t) is shorthand for the estimate of P(ω_i) on the t-th iteration. E-step: compute the expected clusters of all datapoints; again, this just requires evaluating a Gaussian at each x_k. M-step: estimate μ, Σ, and p given our data's class membership distributions, where R = #records. Copyright 2001, 2004, Andrew W. Moore
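A compact sketch of this general EM loop written with numpy and scipy; the initialization choices, the fixed iteration cap, and the small regularization term added to each covariance are my own additions rather than anything specified on the slides.

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iters=100, seed=0):
    """EM for a general GMM with full covariances, mirroring the E/M steps above."""
    rng = np.random.default_rng(seed)
    R, d = X.shape                                   # R = #records, d = dimensionality
    mu = X[rng.choice(R, size=k, replace=False)]     # initial means: random data points
    sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d)] * k)  # initial covariances
    p = np.full(k, 1.0 / k)                          # initial mixing weights P(omega_i)

    for _ in range(n_iters):
        # E-step: responsibilities P(omega_i | x_k, lambda_t), one row per record
        resp = np.array([
            p[i] * multivariate_normal(mu[i], sigma[i]).pdf(X) for i in range(k)
        ]).T
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, and covariances from the responsibilities
        Nk = resp.sum(axis=0)                        # effective number of points per component
        p = Nk / R
        mu = (resp.T @ X) / Nk[:, None]
        for i in range(k):
            diff = X - mu[i]
            sigma[i] = (resp[:, i, None] * diff).T @ diff / Nk[i] + 1e-6 * np.eye(d)

    return p, mu, sigma, resp
```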

Gaussian Mixture Example: Start. Advance apologies: in black and white this example will be incomprehensible. Copyright 2001, 2004, Andrew W. Moore

After first iteration

After 2nd iteration

After 3rd iteration

After 4th iteration

After 5th iteration

After 6th iteration

After 20th iteration

Some Bio Assay data

GMM clustering of the assay data

Resulting Density Estimator

Other Methods and Reading: Hidden Markov Models (Andrew Ng's notes), Mean Shift (Mean Shift Tutorial), Deep Methods (Tutorial on Variational Autoencoders)