CHAPTER 7: CLUSTERING


Semiparametric Density Estimation

Parametric: Assume a single model for $p(x \mid C_i)$ (Chapters 4 and 5).
Semiparametric: $p(x \mid C_i)$ is a mixture of densities. Multiple possible explanations/prototypes: different handwriting styles, accents in speech.
Nonparametric: No model; the data speaks for itself (Chapter 8).

Mixture Densities

$$p(x) = \sum_{i=1}^{k} p(x \mid G_i)\, P(G_i)$$

where $G_i$ are the components/groups/clusters, $P(G_i)$ are the mixture proportions (priors), and $p(x \mid G_i)$ are the component densities.

Gaussian mixture where $p(x \mid G_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$: parameters $\Phi = \{P(G_i), \mu_i, \Sigma_i\}_{i=1}^{k}$, unlabeled sample $X = \{x^t\}_t$ (unsupervised learning).
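As an illustration, here is a minimal sketch of evaluating such a mixture density in Python, assuming NumPy and SciPy are available; the two-component parameters below are made-up illustrative values, not taken from the text:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, priors, means, covs):
    """p(x) = sum_i P(G_i) * N(x; mu_i, Sigma_i)."""
    return sum(P * multivariate_normal.pdf(x, mean=mu, cov=S)
               for P, mu, S in zip(priors, means, covs))

# Two-component mixture in 2D (illustrative parameters)
priors = [0.6, 0.4]
means  = [np.zeros(2), np.array([3.0, 3.0])]
covs   = [np.eye(2), 0.5 * np.eye(2)]
print(mixture_density(np.array([1.0, 1.0]), priors, means, covs))
```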

Classes vs. Clusters

Supervised: $X = \{x^t, r^t\}_t$. Classes $C_i$, $i = 1, \ldots, K$, where $p(x \mid C_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$ and $\Phi = \{P(C_i), \mu_i, \Sigma_i\}_{i=1}^{K}$:

$$p(x) = \sum_{i=1}^{K} p(x \mid C_i)\, P(C_i)$$

with parameters estimated from the labels $r_i^t$:

$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \qquad
m_i = \frac{\sum_t r_i^t x^t}{\sum_t r_i^t}, \qquad
S_i = \frac{\sum_t r_i^t (x^t - m_i)(x^t - m_i)^T}{\sum_t r_i^t}$$

Unsupervised: $X = \{x^t\}_t$. Clusters $G_i$, $i = 1, \ldots, k$, where $p(x \mid G_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$ and $\Phi = \{P(G_i), \mu_i, \Sigma_i\}_{i=1}^{k}$:

$$p(x) = \sum_{i=1}^{k} p(x \mid G_i)\, P(G_i)$$

but here the labels $r_i^t$ are unknown.
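With labels available, these estimates are simple weighted averages. A minimal NumPy sketch, assuming the labels are given as a one-hot matrix R (an encoding choice of this sketch, not prescribed by the slides):

```python
import numpy as np

def supervised_estimates(X, R):
    """MLE of priors, means, covariances from data X (N x d)
    and one-hot labels R (N x K)."""
    N_i = R.sum(axis=0)                      # per-class counts
    priors = N_i / len(X)                    # P^(C_i)
    means = (R.T @ X) / N_i[:, None]         # m_i
    covs = []
    for i in range(R.shape[1]):
        D = X - means[i]
        covs.append((R[:, i, None] * D).T @ D / N_i[i])  # S_i
    return priors, means, np.array(covs)
```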

Clustering

Unsupervised learning problem: we are only given a data description, i.e., $X = \{x^t\}$; no class labels are provided. Our goal is to find groups in the data, where each group possibly represents similar objects; for example, finding groups in online news articles, where individual groups contain articles related to sports, business, politics, etc.

Different methods are available for clustering:
- k-means
- E-M algorithm
- Hierarchical clustering
- Spectral clustering

k-means Clustering

Find k reference vectors (prototypes/codebook vectors/codewords) which best represent the data.

Reference vectors: $m_j$, $j = 1, \ldots, k$. Use the nearest (most similar) reference:

$$\|x^t - m_i\| = \min_j \|x^t - m_j\|$$

Reconstruction error:

$$E\big(\{m_i\}_{i=1}^{k} \mid X\big) = \sum_t \sum_i b_i^t \,\|x^t - m_i\|^2, \qquad
b_i^t = \begin{cases} 1 & \text{if } \|x^t - m_i\| = \min_j \|x^t - m_j\| \\ 0 & \text{otherwise} \end{cases}$$

Encoding/Decoding

[Figure: encoding/decoding with reference vectors]

$$b_i^t = \begin{cases} 1 & \text{if } \|x^t - m_i\| = \min_j \|x^t - m_j\| \\ 0 & \text{otherwise} \end{cases}$$

k-means Clustering

[Figure: k-means clustering illustration]

k-means Clustering

This is an iterative algorithm. It takes as input k, the number of reference vectors (cluster centers). It starts with random initializations (guesses) of the k cluster centers and repeats the following two steps until convergence (see the sketch below):
1. Assign each data point to its closest cluster center. All data points close to a cluster center form a group; there are k such groups.
2. In each group, the average of all data points is computed and assigned as the new cluster center.
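A minimal NumPy sketch of this procedure; the initialization by sampling data points and the convergence test are implementation choices of this sketch, not prescribed by the slides:

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Plain k-means: alternate hard assignment and centroid update."""
    rng = np.random.default_rng(seed)
    # Random initialization: pick k data points as initial centers
    m = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: b[t] = index of nearest reference vector
        d = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)
        b = d.argmin(axis=1)
        # Update step: each center becomes the mean of its group
        new_m = np.array([X[b == i].mean(axis=0) if np.any(b == i) else m[i]
                          for i in range(k)])
        if np.allclose(new_m, m):   # stop when centers no longer move
            break
        m = new_m
    return m, b
```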


Expectation-Maximization (EM)

In k-means, we approached clustering as finding codebook vectors that minimize the reconstruction error. Now our approach is probabilistic, and we look for the component density parameters that maximize the likelihood of the sample.

The log likelihood with a mixture model, given the sample $X = \{x^t\}_t$, is

$$\mathcal{L}(\Phi \mid X) = \sum_t \log p(x^t \mid \Phi) = \sum_t \log \sum_{i=1}^{k} p(x^t \mid G_i)\, P(G_i)$$

where $\Phi$ includes the priors $P(G_i)$ and the parameters of the component densities $p(x \mid G_i)$. Unfortunately, we cannot solve this optimization problem analytically and must resort to iterative optimization.

Expectation-Maximization (EM)

The Expectation-Maximization (E-M) algorithm is used in maximum likelihood estimation where the problem involves two sets of random variables:
- Observable variable X
- Hidden variable Z

The goal of the E-M algorithm is to find the parameter vector $\Phi$ that maximizes the likelihood of the observed values of X, $\mathcal{L}(\Phi \mid X)$. In cases where this is not feasible, we associate an extra hidden variable Z and express the underlying model using X and Z: we assume hidden variables z which, when known, make the optimization much simpler.

Complete likelihood, $\mathcal{L}_c(\Phi \mid X, Z)$, in terms of x and z.
Incomplete likelihood, $\mathcal{L}(\Phi \mid X)$, in terms of x.

E- and M-steps

Iterate the two steps:
1. E-step: Estimate z given X and the current $\Phi$.
2. M-step: Find the new $\Phi$ given z, X, and the old $\Phi$.

$$\text{E-step:}\quad Q(\Phi \mid \Phi^l) = E\big[\mathcal{L}_c(\Phi \mid X, Z) \,\big|\, X, \Phi^l\big]$$
$$\text{M-step:}\quad \Phi^{l+1} = \arg\max_\Phi\, Q(\Phi \mid \Phi^l)$$

An increase in Q increases the incomplete likelihood:

$$\mathcal{L}(\Phi^{l+1} \mid X) \geq \mathcal{L}(\Phi^l \mid X)$$

EM in Gaussian Mixtures

$z_i^t = 1$ if $x^t$ belongs to $G_i$, 0 otherwise (analogous to the labels $r_i^t$ of supervised learning); assume $p(x \mid G_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$.

E-step:

$$E\big[z_i^t \,\big|\, X, \Phi^l\big] = P(G_i \mid x^t, \Phi^l) \equiv h_i^t
= \frac{p(x^t \mid G_i, \Phi^l)\, P(G_i)}{\sum_j p(x^t \mid G_j, \Phi^l)\, P(G_j)}$$

M-step:

$$P(G_i) = \frac{\sum_t h_i^t}{N}, \qquad
m_i^{l+1} = \frac{\sum_t h_i^t x^t}{\sum_t h_i^t}, \qquad
S_i^{l+1} = \frac{\sum_t h_i^t \big(x^t - m_i^{l+1}\big)\big(x^t - m_i^{l+1}\big)^T}{\sum_t h_i^t}$$

Use the estimated labels in place of the unknown labels.
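A compact sketch of these E- and M-steps in Python, assuming SciPy for the Gaussian densities; the small ridge added to the covariances is a numerical-stability choice of this sketch, not part of the algorithm above:

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iters=100, seed=0):
    """EM for a Gaussian mixture: soft E-step, weighted M-step."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    P = np.full(k, 1.0 / k)                       # priors P(G_i)
    m = X[rng.choice(N, size=k, replace=False)]   # means m_i
    S = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
    for _ in range(n_iters):
        # E-step: posterior responsibilities h[t, i]
        h = np.column_stack([P[i] * multivariate_normal.pdf(X, m[i], S[i])
                             for i in range(k)])
        h /= h.sum(axis=1, keepdims=True)
        # M-step: re-estimate priors, means, covariances
        Ni = h.sum(axis=0)
        P = Ni / N
        m = (h.T @ X) / Ni[:, None]
        for i in range(k):
            D = X - m[i]
            S[i] = (h[:, i, None] * D).T @ D / Ni[i] + 1e-6 * np.eye(d)
    return P, m, S, h
```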

EM in Gaussian Mixtures

If all component densities share a common covariance matrix $S = s^2 I$, then $p(x \mid G_i) \sim \mathcal{N}(m_i, s^2 I)$, and maximizing in the M-step reduces to minimizing

$$\sum_t \sum_i h_i^t \,\|x^t - m_i\|^2$$

where $h_i^t$ is a number between 0 and 1. This problem looks very similar to the k-means objective, except that $b_i^t$ of k-means makes a hard assignment while $h_i^t$ of E-M makes a soft assignment.

[Figure: EM solution on sample data; the contour marks where $P(G_1 \mid x) = h_1 = 0.5$.]

After Clustering

Dimensionality reduction methods find correlations between features and group features; clustering methods find similarities between instances and group instances.

Clustering allows knowledge extraction through the number of clusters, the prior probabilities, and the cluster parameters, i.e., center and range of features. Example: CRM, customer segmentation.

Mixture of Mixtures

In classification, the input comes from a mixture of classes (supervised). If each class is also a mixture, e.g., of Gaussians (unsupervised), we have a mixture of mixtures:

$$p(x \mid C_i) = \sum_{j=1}^{k_i} p(x \mid G_{ij})\, P(G_{ij}), \qquad
p(x) = \sum_{i=1}^{K} p(x \mid C_i)\, P(C_i)$$

Hierarchical Clustering

Cluster based on similarities/distances. Distance measures between instances $x^r$ and $x^s$:

Minkowski ($L_p$) distance (Euclidean for $p = 2$):

$$d_m(x^r, x^s) = \left[\sum_{j=1}^{d} \big|x_j^r - x_j^s\big|^p\right]^{1/p}$$

City-block distance:

$$d_{cb}(x^r, x^s) = \sum_{j=1}^{d} \big|x_j^r - x_j^s\big|$$
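These two distances are straightforward to compute; a minimal NumPy sketch:

```python
import numpy as np

def minkowski(xr, xs, p=2):
    """Minkowski (L_p) distance; p=2 gives Euclidean."""
    return np.sum(np.abs(xr - xs) ** p) ** (1.0 / p)

def city_block(xr, xs):
    """City-block (L_1) distance."""
    return np.sum(np.abs(xr - xs))
```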

Agglomerative Clustering

Start with N groups, each containing one instance, and merge the two closest groups at each iteration.

Distance between two groups $G_i$ and $G_j$:

Single-link:
$$d(G_i, G_j) = \min_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$$

Complete-link:
$$d(G_i, G_j) = \max_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)$$

Average-link, centroid:
$$d(G_i, G_j) = \underset{x^r \in G_i,\, x^s \in G_j}{\text{ave}}\, d(x^r, x^s)$$
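In practice, a library routine such as SciPy's `linkage` covers all three linkage criteria; a short sketch (the toy data here are purely illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))  # toy data

# method: 'single' = single-link (min), 'complete' = complete-link (max),
# 'average' = average-link group distance
Z = linkage(X, method='single', metric='euclidean')

# Cut the merge tree to obtain, e.g., 3 flat clusters;
# scipy.cluster.hierarchy.dendrogram(Z) would plot the dendrogram.
labels = fcluster(Z, t=3, criterion='maxclust')
```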

Example: Single-Link Clustering

[Figure: single-link clustering of sample data and the resulting dendrogram]

Choosing k

- Defined by the application, e.g., image quantization.
- Plot the data (after PCA) and check visually for clusters.
- Incremental (leader-cluster) algorithm: add one cluster at a time until an "elbow" appears (in reconstruction error, log likelihood, or intergroup distances), as in the sketch below.
- Manually check the clusters for meaning.
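A quick way to look for the elbow, reusing the `k_means` sketch from above; the data here are random and purely illustrative:

```python
import numpy as np

def reconstruction_error(X, m, b):
    """Sum of squared distances of each point to its assigned center."""
    return np.sum(np.linalg.norm(X - m[b], axis=1) ** 2)

# Run k-means for increasing k and watch where the error stops
# dropping sharply (the "elbow").
X = np.random.default_rng(1).normal(size=(200, 2))
for k in range(1, 8):
    m, b = k_means(X, k)
    print(k, reconstruction_error(X, m, b))
```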