CS 536: Machine Learning. Nonparametric Density Estimation; Unsupervised Learning - Clustering


CS 536: Machine Learning
Nonparametric Density Estimation; Unsupervised Learning - Clustering
Fall 2005, Ahmed Elgammal, Dept. of Computer Science, Rutgers University

Outline
- Density estimation
- Nonparametric kernel density estimation
- Mixture densities
- Unsupervised learning - clustering:
  - Hierarchical clustering
  - K-means clustering
  - Mean shift clustering
  - Spectral clustering, graph cuts
  - Application to image segmentation

Density Estimation
- Parametric: assume a single model for p(x | C_i) (Chapters 4 and 5).
- Semiparametric: p(x | C_i) is a mixture of densities. Multiple possible explanations/prototypes: different handwriting styles, accents in speech.
- Nonparametric: no model; the data speaks for itself (Chapter 8).

Nonparametric Density Estimation
Density estimation: given a sample S = {x_i}, i = 1..N, from a distribution, obtain an estimate of the density function f(x) at any point x.
- Parametric: assume a parametric density family f(x; θ), e.g., N(µ, σ²), and obtain the best estimator θ̂ of θ.
  Advantages: efficient; robust to noise (robust estimators can be used).
  Problem with parametric methods: an incorrectly specified parametric model has a bias that cannot be removed even by a large number of samples.
- Nonparametric: directly obtain a good estimate f̂(x) of the entire density f(x) from the sample. Most famous example: the histogram.

Kernel Density Estimation
1950s onward (Fix & Hodges '51, Rosenblatt '56, Parzen '62, Cencov '62).
Given a set of samples S = {x_i}, i = 1..N, we can obtain an estimate for the density at x as:

\hat{f}(x) = \frac{1}{Nh} \sum_{i=1}^{N} K\left(\frac{x - x_i}{h}\right) = \frac{1}{N} \sum_{i=1}^{N} K_h(x - x_i)

where K_h(t) = K(t/h)/h is called the kernel function (window function) and h is the scale or bandwidth. K satisfies certain conditions, e.g.:

\int K_h(t)\, dt = 1, \qquad K_h(t) \ge 0
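A minimal NumPy sketch of this estimator for 1-D data with a Gaussian kernel (function names are illustrative, not from the lecture):

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel: nonnegative and integrates to 1."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x, samples, h):
    """f_hat(x) = (1/(N*h)) * sum_i K((x - x_i) / h), evaluated on a grid x."""
    u = (x - samples[:, None]) / h          # shape (N, len(x))
    return gaussian_kernel(u).mean(axis=0) / h

# Usage: estimate the density of 200 samples from a bimodal mixture.
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(-2, 0.5, 100), rng.normal(1, 1.0, 100)])
grid = np.linspace(-5, 5, 401)
f_hat = kde(grid, samples, h=0.3)
```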

Kernel Estimation
- There is a variety of kernel shapes with different properties. The Gaussian kernel is typically used for its continuity and differentiability.
- Multivariate case (kernel product): use the same kernel function with a different bandwidth h_j for each dimension:

\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} \prod_{j=1}^{d} K_{h_j}(x_j - x_{ij})

- General form (avoids having to store all the samples):

\hat{f}(x) = \sum_{i=1}^{N} \alpha_i K_h(x - x_i)

Kernel Density Estimation: Advantages
- Converges to any density shape with sufficient samples: asymptotically the estimate converges to any density.
- No need for model specification.
- Unlike histograms, density estimates are smooth, continuous, and differentiable.
- Easily generalizes to higher dimensions.
- All other parametric/nonparametric density estimation methods, e.g., histograms, are asymptotically kernel methods.
- In many applications, the densities are multivariate and multimodal with irregular cluster shapes.

Example: color clusters
- Cluster shapes are irregular.
- Cluster boundaries are not well defined.
(From Comaniciu and Meer, "Mean shift: a robust approach toward feature space analysis.")

Convergence - KDE
[Figures over two slides: density estimation using a Gaussian kernel vs. a uniform kernel.]

Scale Selection
- An important problem with a large literature.
- Small h results in ragged densities.
- Large h results in over-smoothing.
- The best choice of h depends on the number of samples:
  - small N: wide kernels
  - large N: narrow kernels

\lim_{N \to \infty} h(N) = 0

Optimal Scale
- The optimal kernel and optimal scale can be obtained by minimizing the mean integrated squared error, if we know the density!
- Normal reference rule:

h_{opt} = (4/3)^{1/5}\, \hat{\sigma}\, N^{-1/5} \approx 1.06\, \hat{\sigma}\, N^{-1/5}

Scale Selection
[Figure: kernel density estimates at several bandwidths.]
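The normal reference rule is a one-liner; a sketch under the same 1-D assumption, with σ estimated from the sample:

```python
import numpy as np

def normal_reference_bandwidth(samples):
    """h_opt = (4/3)^(1/5) * sigma_hat * N^(-1/5), i.e. about 1.06 * sigma_hat * N^(-1/5)."""
    sigma_hat = np.std(samples, ddof=1)
    return (4.0 / 3.0) ** 0.2 * sigma_hat * len(samples) ** -0.2
```

Since the rule minimizes the MISE under a normality assumption, it tends to over-smooth multimodal densities.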

[Figure from R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley, New York, 2nd edition, 2000.]

Density Estimation (recap)
- Parametric: assume a single model for p(x | C_i) (Chapters 4 and 5).
- Semiparametric: p(x | C_i) is a mixture of densities. Multiple possible explanations/prototypes: different handwriting styles, accents in speech.
- Nonparametric: no model; the data speaks for itself (Chapter 8).

Mixture Densities

p(x) = \sum_{i=1}^{k} p(x \mid G_i)\, P(G_i)

where G_i are the components/groups/clusters, P(G_i) are the mixture proportions (priors), and p(x | G_i) are the component densities.
Gaussian mixture: p(x | G_i) ~ N(µ_i, Σ_i), with parameters Φ = {P(G_i), µ_i, Σ_i}, i = 1..k, estimated from an unlabeled sample X = {x^t} (unsupervised learning).

Classes vs. Clusters
Supervised: X = {x^t, r^t}, classes C_i, i = 1..K:

p(x) = \sum_{i=1}^{K} p(x \mid C_i)\, P(C_i), \quad p(x \mid C_i) \sim N(\mu_i, \Sigma_i), \quad \Phi = \{P(C_i), \mu_i, \Sigma_i\}_{i=1}^{K}

with estimates

\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}, \quad m_i = \frac{\sum_t r_i^t x^t}{\sum_t r_i^t}, \quad S_i = \frac{\sum_t r_i^t (x^t - m_i)(x^t - m_i)^T}{\sum_t r_i^t}

Unsupervised: X = {x^t}, clusters G_i, i = 1..k:

p(x) = \sum_{i=1}^{k} p(x \mid G_i)\, P(G_i), \quad p(x \mid G_i) \sim N(\mu_i, \Sigma_i), \quad \Phi = \{P(G_i), \mu_i, \Sigma_i\}_{i=1}^{k}

but the labels r_i^t are unknown.
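A short sketch of evaluating such a mixture density with SciPy; the two-component parameters below are made up for illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, priors, means, covs):
    """p(x) = sum_i P(G_i) * N(x; mu_i, Sigma_i)."""
    return sum(P * multivariate_normal.pdf(x, mean=mu, cov=S)
               for P, mu, S in zip(priors, means, covs))

# Illustrative two-component 2-D Gaussian mixture.
priors = [0.6, 0.4]
means = [np.zeros(2), np.array([3.0, 3.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]
print(mixture_density(np.array([1.0, 1.0]), priors, means, covs))
```

Estimating Φ from an unlabeled sample (e.g., by EM) is the subject of the mixture-model chapter; only the evaluation of p(x) is shown here.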

k-means Clustering
- Find k reference vectors (prototypes/codebook vectors/codewords) which best represent the data.
- Reference vectors m_j, j = 1..k.
- Use the nearest (most similar) reference:

\| x^t - m_i \| = \min_j \| x^t - m_j \|

- Reconstruction error:

E(\{m_i\}_{i=1}^{k} \mid X) = \sum_t \sum_i b_i^t \| x^t - m_i \|^2, \qquad b_i^t = \begin{cases} 1 & \text{if } \| x^t - m_i \| = \min_j \| x^t - m_j \| \\ 0 & \text{otherwise} \end{cases}

Encoding/Decoding
The indicator b_i^t encodes x^t by its nearest reference vector m_i; decoding reconstructs x^t as m_i.
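A compact NumPy sketch of the standard alternating minimization of this reconstruction error (random initialization, no handling of empty clusters):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Alternate the nearest-reference assignment b_i^t and the centroid update for m_i."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)]   # initial reference vectors
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - m[None, :, :], axis=2)  # (N, k) distances
        b = d.argmin(axis=1)                                        # index of nearest m_j
        new_m = np.array([X[b == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_m, m):   # converged: centroids no longer move
            break
        m = new_m
    return m, b
```

Each iteration can only decrease E, so the procedure converges, though only to a local minimum that depends on the initialization.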

k-means Clustering
[Figures: the k-means algorithm and an example run.]

[Figures: original image; clusters on intensity; clusters on color.]
K-means clustering using intensity alone and color alone, K = 5; each segmented image is labeled with the cluster means.

[Figures: original image; clusters on color.]
K-means using color alone, 11 segments.

[Figure] K-means using color alone, 11 segments.
[Figure] K-means using color and position, 20 segments.
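A sketch of the color-plus-position variant using scikit-learn; `img` is assumed to be an (H, W, 3) array, and `pos_weight`, which trades off spatial versus color similarity, is an illustrative knob rather than a value from the slides:

```python
import numpy as np
from sklearn.cluster import KMeans

def segment(img, k=20, pos_weight=1.0):
    """Cluster pixels on (R, G, B, x, y); paint each pixel with its cluster's mean color."""
    h, w, _ = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    colors = img.reshape(-1, 3).astype(float)
    feats = np.column_stack([colors,
                             pos_weight * xs.reshape(-1, 1),
                             pos_weight * ys.reshape(-1, 1)])
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(feats)
    means = np.array([colors[labels == i].mean(axis=0) for i in range(k)])
    return means[labels].reshape(h, w, 3)
```

Dropping the two position columns gives the color-alone segmentations shown above.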

Hierarchical Clustering
- Cluster based on similarities/distances.
- Distance measures between instances x^r and x^s:
  - Minkowski (L_p) distance (Euclidean for p = 2):

d_m(x^r, x^s) = \left[ \sum_{j=1}^{d} |x_j^r - x_j^s|^p \right]^{1/p}

  - City-block distance:

d_{cb}(x^r, x^s) = \sum_{j=1}^{d} |x_j^r - x_j^s|

Hierarchical Clustering
Agglomerative clustering: clustering by merging, bottom-up. Each data point starts as its own cluster; clusters are merged recursively.
Algorithm:
- Make each point a separate cluster.
- Until the clustering is satisfactory: merge the two clusters with the smallest inter-cluster distance.
Divisive clustering: clustering by splitting, top-down. The entire data set is regarded as one cluster; clusters are split recursively.
Algorithm:
- Construct a single cluster containing all points.
- Until the clustering is satisfactory: split the cluster that yields the two components with the largest inter-cluster distance.

Hierarchical Clustering
Two main issues:
- What is a good inter-cluster distance?
  - Single-link clustering: distance between the closest elements; tends to produce extended clusters.
  - Complete-link clustering: the maximum distance between elements; tends to produce rounded clusters.
  - Group-average clustering: the average distance between elements; tends to produce rounded clusters.
- How many clusters are there? (model selection)
Dendrograms give a picture of the output as the clustering process continues.

Agglomerative Clustering
Start with N groups, each containing one instance, and merge the two closest groups at each iteration.
Distance between two groups G_i and G_j:
- Single-link: d(G_i, G_j) = \min_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)
- Complete-link: d(G_i, G_j) = \max_{x^r \in G_i,\, x^s \in G_j} d(x^r, x^s)
- Average-link and centroid distances are also used.
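These linkages are available off the shelf; a sketch with SciPy (the three-cluster cut is an arbitrary illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).normal(size=(30, 2))
Z = linkage(X, method='single')                    # or 'complete', 'average'
labels = fcluster(Z, t=3, criterion='maxclust')    # cut the dendrogram into 3 clusters
```

`scipy.cluster.hierarchy.dendrogram(Z)` draws the dendrogram discussed above.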

Example: Single-Link Clustering
[Figure: dendrogram.]

Choosing k
- Defined by the application, e.g., image quantization.
- Plot the data (after PCA) and check for clusters.
- Incremental (leader-cluster) algorithm: add one cluster at a time until an "elbow" appears (in reconstruction error / log likelihood / intergroup distances).
- Manual check for meaning.
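A sketch of the elbow heuristic on top of the `kmeans` sketch above: compute the reconstruction error for increasing k and look for where the curve flattens.

```python
import numpy as np

def reconstruction_error(X, m, b):
    """E({m_i} | X) = sum_t || x^t - m_{b_t} ||^2."""
    return np.sum((X - m[b]) ** 2)

# errors = [reconstruction_error(X, *kmeans(X, k)) for k in range(1, 11)]
# Plot errors against k; the "elbow" suggests a reasonable number of clusters.
```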

[Figure]

Mean Shift
Given a sample S = {s_i : s_i ∈ R^n} and a kernel K, the sample mean using K at point x is:

m(x) = \frac{\sum_i s_i\, K(x - s_i)}{\sum_i K(x - s_i)}

Iterations of the form x ← m(x) will lead to the local mode of the density.
Let x be the center of the window; iterate until convergence:
- Compute the sample mean m(x) from the samples inside the window.
- Replace x with m(x).

[Figure]

Mean Shift
Given a sample S = {s_i : s_i ∈ R^n} and a kernel K, the sample mean using K at point x is m(x) as above.
- Fukunaga and Hostetler (1975) introduced the mean shift as the difference m(x) - x using a flat kernel.
- Iterations of the form x ← m(x) will lead to the density mode.
- Cheng (1995) generalized the definition using general kernels and weighted data:

m(x) = \frac{\sum_i s_i\, w(s_i)\, K(x - s_i)}{\sum_i w(s_i)\, K(x - s_i)}

- Recently popularized by D. Comaniciu and P. Meer ('99+).
- Applications: clustering [Cheng, Fu '85], image filtering and segmentation [Meer '99], and tracking [Meer '00].

Mean Shift
Iterations of the form x ← m(x) are called the mean shift algorithm. If K is a Gaussian kernel K_σ, the density estimate using K is

\hat{P}(x) = C \sum_i w(s_i)\, K_\sigma(x - s_i)

Since the Gaussian kernel satisfies K_\sigma'(t) = -\frac{t}{\sigma^2} K_\sigma(t), we can show that

m(x) - x = \sigma^2\, \frac{\nabla \hat{P}(x)}{\hat{P}(x)}

i.e., the mean shift is in the gradient direction of the density estimate.

Mean Shift
- The mean shift is in the gradient direction of the density estimate.
- Successive iterations converge to a local maximum of the density, i.e., a stationary point where m(x) = x.
- Mean shift is a steepest-ascent-like procedure with variable-size steps that lead to fast convergence: a well-adjusted steepest ascent.

[Figure]
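A minimal sketch of the mean shift iteration with a Gaussian kernel (uniform weights; the weighted variant only changes the line computing `w`):

```python
import numpy as np

def mean_shift_mode(x, S, sigma, tol=1e-5, max_iter=500):
    """Iterate x <- m(x) = sum_i s_i K(x - s_i) / sum_i K(x - s_i) until it stops moving."""
    for _ in range(max_iter):
        w = np.exp(-np.sum((S - x) ** 2, axis=1) / (2.0 * sigma ** 2))  # Gaussian kernel values
        m = (w[:, None] * S).sum(axis=0) / w.sum()                      # sample mean under K
        if np.linalg.norm(m - x) < tol:    # stationary point: m(x) = x
            return m
        x = m
    return x
```

Running this from every sample and grouping samples whose iterations converge to the same mode gives mean shift clustering.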

Mean Shift and Image Filtering
Discontinuity-preserving smoothing: recall that averaging or Gaussian filters blur images and do not preserve region boundaries.
Mean shift application:
- Represent each pixel by its spatial location x^s and its range value x^r (color, intensity).
- Look for modes in the joint spatial-range space.
- Use a product of two kernels, a spatial kernel with bandwidth h_s and a range kernel with bandwidth h_r:

K_{h_s, h_r}(x) = C\, k\!\left( \left\| \frac{x^s}{h_s} \right\|^2 \right) k\!\left( \left\| \frac{x^r}{h_r} \right\|^2 \right)

Algorithm: for each pixel x_i = (x_i^s, x_i^r), apply mean shift until convergence; let the convergence point be (y_s, y_r). Assign z_i = (x_i^s, y_r) as the filter output.
Results: see the paper.

[Figure]
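A per-pixel sketch of this filtering step, assuming flattened arrays `S_s` (all pixel positions) and `S_r` (all pixel colors); a practical implementation would restrict each iteration to a window around the current point rather than using all N pixels:

```python
import numpy as np

def mean_shift_filter_pixel(xs, xr, S_s, S_r, hs, hr, n_iter=20):
    """Mean shift in the joint spatial-range space for one pixel (xs: position, xr: color)."""
    for _ in range(n_iter):
        w = (np.exp(-np.sum(((S_s - xs) / hs) ** 2, axis=1)) *
             np.exp(-np.sum(((S_r - xr) / hr) ** 2, axis=1)))  # product of the two kernels
        xs = (w[:, None] * S_s).sum(axis=0) / w.sum()
        xr = (w[:, None] * S_r).sum(axis=0) / w.sum()
    return xr  # the filter output keeps the original position and the converged range y_r
```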

Graph Cut
What is a graph cut? Given an undirected, weighted graph G = (V, E), remove a subset of edges to partition the graph into two disjoint sets of vertices A and B (two subgraphs):

A \cup B = V, \qquad A \cap B = \emptyset

Graph Cut
Each cut corresponds to a cost: the sum of the weights of the edges that have been removed:

cut(A, B) = \sum_{u \in A,\, v \in B} w(u, v)

Graph Cut
In many applications it is desirable to find the cut with minimum cost: the minimum cut. This is a well-studied problem in graph theory with many applications, and efficient algorithms exist for finding minimum cuts.

Graph-Theoretic Clustering
- Represent tokens using a weighted graph; the weights reflect similarity between tokens (the affinity matrix).
- Cut up this graph to get subgraphs such that:
  - similarity within sets is maximum,
  - similarity between sets is minimum,
i.e., a minimum cut.

[Figure]

Use an exponential function for the edge weights, where d(i, j) is the feature distance between elements i and j:

w(i, j) = e^{-(d(i, j)/\sigma)^2}
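Building that affinity matrix from a feature array, as a small sketch with Euclidean feature distance:

```python
import numpy as np

def affinity_matrix(X, sigma):
    """A[i, j] = exp(-(d(i, j) / sigma)^2), with d the Euclidean distance between features."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.exp(-(d / sigma) ** 2)
```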

Scale Affects Affinity
[Figures: affinity matrices for w(i, j) = e^{-(d(i, j)/\sigma)^2} with σ = 0.1, σ = 0.2, σ = 1.]

Eigenvectors and Clustering
Simplest idea: we want a vector w_n giving the association between each element and cluster n. We want the elements within this cluster to, on the whole, have strong affinity with one another. We could maximize

w_n^T A w_n

which sums, over all pairs (i, j), the association of element i with cluster n, times the affinity between i and j, times the association of element j with cluster n.

Eigenvectors and Clustering
We could maximize w_n^T A w_n, but we need the constraint w_n^T w_n = 1. Using a Lagrange multiplier λ, maximize

w_n^T A w_n + \lambda (w_n^T w_n - 1)

Differentiation gives

A w_n = \lambda w_n

This is an eigenvalue problem: choose the eigenvector of A with the largest eigenvalue (see the sketch below).

Example
[Figures: points, eigenvector, affinity matrix.]
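The corresponding computation for a single cluster, as a sketch (the membership threshold is an illustrative choice):

```python
import numpy as np

def leading_cluster(A, threshold=0.1):
    """Solve A w = lambda w and read cluster membership off the leading eigenvector."""
    eigvals, eigvecs = np.linalg.eigh(A)      # A is symmetric; eigenvalues ascending
    w = eigvecs[:, -1]                        # eigenvector with the largest eigenvalue
    w = w * np.sign(w[np.argmax(np.abs(w))])  # fix the arbitrary sign
    return np.abs(w) > threshold              # elements strongly associated with the cluster
```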

Example
[Figures: points, affinity matrix, first eigenvector; the three eigenvectors corresponding to the next three eigenvalues of the affinity matrix.]

Too many clusters! More obvious clusters:
[Figure: eigenvalues for three different scales of the affinity matrix.]

More Than Two Segments
Two options:
- Recursively split each side to get a tree, continuing until the eigenvalues are too small.
- Use the other eigenvectors, as in the algorithm below.
Algorithm:
- Construct an affinity matrix A.
- Compute the eigenvalues and eigenvectors of A.
- Until there are sufficient clusters:
  - Take the eigenvector corresponding to the largest unprocessed eigenvalue; zero all components for elements already clustered, and threshold the remaining components to determine which elements belong to this cluster (you can choose a threshold by clustering the components, or use a fixed threshold).
  - If all elements are accounted for, there are sufficient clusters.

Caveat: we can end up with eigenvectors that do not split clusters, because any linear combination of eigenvectors with the same eigenvalue is also an eigenvector.
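A sketch of that loop with a fixed threshold; elements left unassigned when the eigenvectors run out keep the label -1:

```python
import numpy as np

def eigen_clusters(A, threshold=0.1):
    """Peel off clusters from successive eigenvectors of the affinity matrix A."""
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    labels = -np.ones(A.shape[0], dtype=int)
    for idx in order:
        v = np.abs(eigvecs[:, idx])
        v[labels >= 0] = 0.0                 # zero components of already-clustered elements
        members = v > threshold
        if members.any():
            labels[members] = labels.max() + 1
        if (labels >= 0).all():              # all elements accounted for
            break
    return labels
```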

Sources
- R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification. Wiley, New York, 2nd edition, 2000.
- Ethem Alpaydin. Introduction to Machine Learning, Chapter 7.
- Forsyth and Ponce. Computer Vision: A Modern Approach, Chapter 14 (14.1, 14.2, 14.4).
- Slides by D. Forsyth @ Berkeley.
- Slides by Ethem Alpaydin.