Lecture 18 Nov 07 2008

Review: Clustering
Grouping similar objects into clusters.
Hierarchical clustering: the agglomerative approach (HAC) iteratively merges similar clusters; different linkage algorithms compute the distances among clusters.
Non-hierarchical clustering: K-means starts with a set of initial seeds (centers) and iteratively goes through reassignment and re-centering steps until convergence.

More about K-means
It always converges (fast).
It converges to a local optimum, and different initial seeds lead to different local optima. To address this (a sketch of the first remedy follows this slide):
- run many random restarts and pick the best result w.r.t. MSE;
- choose initial seeds that are far apart.
Other problems:
- it is best suited for cases where the clusters are all spherical and similar in size;
- it does not allow an object to partially belong to multiple clusters.
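One way to realize the random-restart remedy, as a minimal numpy sketch (the function names and the simple seeding scheme are mine, not from the lecture):

```python
import numpy as np

def kmeans_once(X, k, rng, n_iter=100):
    """One K-means run from a random seeding; returns (centers, labels, MSE)."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Reassignment step: each point goes to its closest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Re-centering step: each center becomes the mean of its assigned points
        # (an empty cluster keeps its old center).
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return centers, labels, d2[np.arange(len(X)), labels].mean()

def kmeans_restarts(X, k, n_restarts=10, seed=0):
    """Run K-means several times and keep the solution with the lowest MSE."""
    rng = np.random.default_rng(seed)
    return min((kmeans_once(X, k, rng) for _ in range(n_restarts)),
               key=lambda run: run[2])
```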

Soft vs. hard clustering
Hard clustering: each data point is deterministically assigned to one and only one cluster. But in reality clusters may overlap.
Soft clustering: data points are assigned to clusters with certain probabilities.

How can we extend K-means to do soft clustering?
Given a set of cluster centers μ_1, μ_2, ..., μ_k, instead of directly assigning every data point to its closest cluster, we can assign it partially based on the distances (one possible weighting is sketched below).
If each point only partially belongs to a particular cluster, then when computing the centroid, should we still use it as if it were fully there?
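As one illustrative answer, assuming the weights decay with squared distance (this particular weighting is my choice; the slide only poses the question):

```python
import numpy as np

def soft_weights(X, centers):
    """Partial assignments: w[i, j] decreases with the squared distance from
    point i to center j and sums to 1 over the clusters."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    d2 -= d2.min(axis=1, keepdims=True)      # shift for numerical stability
    w = np.exp(-d2)                          # one illustrative decay function
    return w / w.sum(axis=1, keepdims=True)

def soft_centroids(X, w):
    """Each point contributes to every centroid in proportion to its weight."""
    return (w.T @ X) / w.sum(axis=0)[:, None]
```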

Gaussians for representing a cluster
What exactly is a cluster? Intuitively, it is a tightly packed, ball-shaped thing. We can use a Gaussian (normal) distribution to describe it. Let's first review what a Gaussian distribution is.

Side track: the Gaussian distribution
Univariate Gaussian distribution N(μ, σ^2): μ is the mean (center of the mass), σ^2 is the variance (spread of the mass).
Multivariate Gaussian distribution N(μ, Σ): μ = (μ_1, μ_2) is the mean vector and Σ is the covariance matrix, e.g. in two dimensions

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix}$$
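For reference, the corresponding density functions (standard formulas, added here; d is the dimension of x):

$$N(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\, \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

$$N(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}\, \exp\!\left(-\tfrac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$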

Mixture of Gaussians
Assume that we have k clusters in our data, and that each cluster contains data generated from a Gaussian distribution.
Overall process of generating data:
- first, randomly select one of the clusters according to a prior distribution over the clusters;
- then, draw a random sample from the Gaussian distribution of that particular cluster.
This is similar to the generative model we learned for the Bayes classifier. The difference? Here we don't know the cluster membership of each data point (unsupervised).
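To make the generative process concrete, a minimal sketch that samples from a two-component mixture (the mixture weights, means, and covariances below are made-up illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mixture: prior over clusters, cluster means, cluster covariances.
priors = np.array([0.4, 0.6])
means = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
covs = [np.eye(2), np.array([[1.0, 0.5], [0.5, 2.0]])]

def sample_mixture(n):
    """Pick a cluster from the prior, then sample from that cluster's Gaussian."""
    X = np.empty((n, 2))
    z = rng.choice(len(priors), size=n, p=priors)   # hidden cluster memberships
    for i, j in enumerate(z):
        X[i] = rng.multivariate_normal(means[j], covs[j])
    return X, z

X, z = sample_mixture(500)   # in clustering, z would not be observed
```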

Clustering using mixture-of-Gaussians models
Given a set of data points, and assuming that we know there are k clusters in the data, we need to:
- assign the data points to the k clusters (soft assignment);
- learn the Gaussian distribution parameters of each cluster: μ and Σ.

A simpler problem
If we know the parameters of each Gaussian, (μ_1, Σ_1), (μ_2, Σ_2), ..., (μ_K, Σ_K), together with the cluster priors α_j = P(C_j), we can compute the probability of each data point x_i belonging to each cluster:

$$P(C_j \mid x_i) = \frac{P(x_i \mid C_j)\, P(C_j)}{P(x_i)}, \qquad P(x_i \mid C_j) = \frac{1}{(2\pi)^{d/2}\, |\Sigma_j|^{1/2}} \exp\!\left[-\tfrac{1}{2}(x_i - \mu_j)^T \Sigma_j^{-1} (x_i - \mu_j)\right]$$

This is the same as making a prediction with the Bayes classifier.
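A minimal sketch of this computation, assuming SciPy is available (function and variable names are mine):

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, priors, means, covs):
    """E-step: P(C_j | x_i) for every point i and cluster j, via Bayes rule."""
    n, k = X.shape[0], len(priors)
    lik = np.empty((n, k))
    for j in range(k):
        # P(x_i | C_j): Gaussian density of cluster j evaluated at every point
        lik[:, j] = multivariate_normal.pdf(X, mean=means[j], cov=covs[j])
    joint = lik * priors                              # P(x_i | C_j) P(C_j)
    return joint / joint.sum(axis=1, keepdims=True)   # divide by P(x_i)
```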

Another simpler problem
If we know which points belong to which cluster, we can estimate the Gaussian parameters easily:

Cluster prior: $$\hat{\alpha}_j = \frac{n_j}{n}, \qquad n_j = |C_j|$$

Cluster mean: $$\hat{\mu}_j = \frac{1}{n_j} \sum_{x_i \in C_j} x_i$$

Cluster covariance: $$\hat{\Sigma}_j = \frac{1}{n_j} \sum_{x_i \in C_j} (x_i - \hat{\mu}_j)(x_i - \hat{\mu}_j)^T$$

What we have is slightly different: for each data point x_i we only have the soft memberships P(C_j | x_i) for j = 1, 2, ..., K.

Modifications
Replace the hard counts with the soft memberships P(C_j | x_i); each point now contributes to every cluster in proportion to its membership probability (a numpy sketch of these updates follows).

Cluster prior: $$\hat{\alpha}_j = \frac{1}{n} \sum_{i=1}^{n} P(C_j \mid x_i)$$

Cluster mean: $$\hat{\mu}_j = \frac{\sum_{i=1}^{n} P(C_j \mid x_i)\, x_i}{\sum_{i=1}^{n} P(C_j \mid x_i)}$$

Cluster covariance: $$\hat{\Sigma}_j = \frac{\sum_{i=1}^{n} P(C_j \mid x_i)\, (x_i - \hat{\mu}_j)(x_i - \hat{\mu}_j)^T}{\sum_{i=1}^{n} P(C_j \mid x_i)}$$
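A minimal numpy sketch of these weighted updates (my own names; R is the n-by-k matrix of memberships P(C_j | x_i)):

```python
import numpy as np

def m_step(X, R):
    """M-step: re-estimate priors, means, covariances from soft memberships R."""
    n, d = X.shape
    Nk = R.sum(axis=0)                       # soft count of points per cluster
    priors = Nk / n                          # cluster priors
    means = (R.T @ X) / Nk[:, None]          # weighted means, shape (k, d)
    covs = []
    for j in range(R.shape[1]):
        diff = X - means[j]
        covs.append((R[:, j, None] * diff).T @ diff / Nk[j])   # weighted covariance
    return priors, means, covs
```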

A procedure similar to K-means
Randomly initialize the Gaussian parameters, then repeat until convergence:
1. Compute P(C_j | x_i) for all data points and all clusters. This is called the E-step, because it computes the expected values of the cluster memberships for each data point.
2. Re-compute the parameters of each Gaussian. This is called the M-step, because it performs maximum-likelihood estimation of the parameters.
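Putting the two steps together, a compact end-to-end sketch (my own code, not from the lecture; it runs a fixed number of iterations for simplicity and assumes SciPy is available):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm_em(X, k, n_iter=100, seed=0):
    """Fit a k-component Gaussian mixture to X (n x d) with plain EM."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Random initialization: uniform priors, means at random data points, identity covariances.
    priors = np.full(k, 1.0 / k)
    means = X[rng.choice(n, size=k, replace=False)]
    covs = [np.eye(d) for _ in range(k)]

    for _ in range(n_iter):
        # E-step: R[i, j] = P(C_j | x_i)
        lik = np.column_stack([multivariate_normal.pdf(X, mean=means[j], cov=covs[j])
                               for j in range(k)])
        joint = lik * priors
        R = joint / joint.sum(axis=1, keepdims=True)

        # M-step: weighted maximum-likelihood re-estimation
        Nk = R.sum(axis=0)
        priors = Nk / n
        means = (R.T @ X) / Nk[:, None]
        covs = [((R[:, j, None] * (X - means[j])).T @ (X - means[j])) / Nk[j]
                for j in range(k)]
    return priors, means, covs, R
```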

Q: Why are these two points red when they appear to be closer to blue?

K-means is a special case
We get K-means if we make the following restrictions:
- all Gaussians have the identity covariance matrix (i.e., spherical Gaussians);
- use hard assignment in the E-step, assigning each data point to its most likely cluster.

Behavior of EM
It is guaranteed to converge. In practice it may converge slowly; one can stop early if the change in log-likelihood is smaller than a threshold.
Like K-means, it converges to a local optimum, so multiple restarts are recommended (a brief example of both remedies follows).
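Both remedies are available as library options; a minimal sketch assuming scikit-learn is installed (the parameter values are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 2))   # placeholder data

# tol: stop early once the per-iteration gain in average log-likelihood falls below it.
# n_init: run EM from several random initializations and keep the best fit.
gmm = GaussianMixture(n_components=3, tol=1e-3, n_init=10, random_state=0).fit(X)
soft_assignments = gmm.predict_proba(X)   # P(C_j | x_i) for every data point
```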