Clustering Methods without Given Number of Clusters


Peng Xu, Fei Liu

1 Introduction

As we know, the k-means method is a very effective clustering algorithm. Its most powerful features are scalability and simplicity. However, its main disadvantage is that we must know the number of clusters in advance, which is usually difficult in practice. In this paper, we propose a new approach, peak-searching clustering, that clusters data without being given the number of clusters. Our method is based on the similarity graph [1] of the data points. Through the relationships between points, we capture those points that are near the cluster centers. By finding the peak areas of the stationary distribution of the random walk on the corresponding similarity graph, we capture the points near the cluster center areas, which we call peak points in this paper. The advantage of our method is that it does not need the number of clusters as an input: the algorithm estimates the number of clusters in the data set, which we believe can be a good indicator of the true value.

2 Peak-searching clustering

Denote by $X := \{x^{(i)}\}_{i=1,\dots,m}$ the sample data set, where m is the sample size, n is the dimension, and $x^{(i)} \in \mathbb{R}^n$. Since we do not know how many clusters there are in X, we try to obtain a good estimate of the number of clusters or of the possible cluster centers. Assume the data set is dividable; then the points in one cluster tend to be near each other. Our simple strategy is to capture those points that are near the cluster centers.

2.1 Degree and stationary distribution

Consider the similarity graph $G = (V, E)$ with $V = \{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$, and let $W \in \mathbb{R}^{m \times m}$ be the corresponding weighted adjacency matrix. The generalized degree of a point $x^{(i)}$ is defined as $d_i = \sum_{j=1}^{m} W_{i,j}$. The degree matrix is defined as $D = \mathrm{diag}(d) = \mathrm{diag}(d_1, d_2, \dots, d_m)$. Here, $d_i$ is determined by i, or equivalently by $x^{(i)}$; there is no essential difference. In the following we may write $d = d(x^{(i)})$ for the mapping $x^{(i)} \mapsto d(x^{(i)})$ to simplify the exposition.
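The degree definitions above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the helper names `degrees` and `transition_matrix` and the random toy matrix are hypothetical and not part of the paper. The final check anticipates the random-walk argument of this section: for a symmetric W, the normalized degree vector is stationary for the transition matrix $P = D^{-1}W$.

```python
import numpy as np

def degrees(W):
    """Generalized degree d_i = sum_j W[i, j] of every vertex."""
    return np.asarray(W, dtype=float).sum(axis=1)

def transition_matrix(W):
    """Row-stochastic random-walk matrix P = D^{-1} W (assumes all degrees > 0)."""
    W = np.asarray(W, dtype=float)
    return W / degrees(W)[:, None]

# Hypothetical toy graph: a random symmetric weight matrix with positive entries.
rng = np.random.default_rng(0)
A = rng.random((6, 6))
W = (A + A.T) / 2.0

d = degrees(W)
P = transition_matrix(W)
pi = d / d.sum()  # candidate stationary distribution, proportional to d

# (pi P)_j = sum_i d_i W_ij / (d_i * sum(d)) = d_j / sum(d), using the symmetry of W.
print(np.allclose(pi @ P, pi))
```

Because only the relative size of the entries of $\pi$ matters for locating peaks, the normalization by `d.sum()` can be skipped in practice and d used directly.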
Similarly, $W_{i,j}$ is the same as $W(x^{(i)}, x^{(j)})$, and i can be used to denote the point $x^{(i)}$ without ambiguity.

Now consider the random walk on the graph G with weighted adjacency matrix W. The random walk has a stationary distribution $\pi$ over all the points. It is reasonable to claim that the closer a point i is to a cluster center, the higher $\pi_i$ is. That is to say, $\pi_i$ is a local maximum of the stationary distribution if i is the closest point to the center. From the theory of stochastic processes, we know that $\pi$ is the solution of the linear system

$\pi P = \pi$, subject to $\sum_{i=1}^{m} \pi_i = 1$,

where $P = D^{-1} W$ is the corresponding transition matrix. Since G is undirected, W is symmetric and we can solve for $\pi$ directly: $\pi \propto d$, because $(dP)_j = \sum_{i=1}^{m} d_i W_{i,j} / d_i = \sum_{i=1}^{m} W_{i,j} = d_j$. So we can use d to indicate the relative value of $\pi$.

2.2 Peak-searching process

For a well-defined clustering problem, there should be several peaks in the set $\{(x^{(i)}, d(x^{(i)})) : i = 1, 2, \dots, m\}$. That is, if we consider the mapping d from $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$ to $\mathbb{R}$, then the mapping should have several local-maximum areas, which are the center areas of the clusters. Given the degrees of all the points, we want to capture one point per such area in order to locate one cluster center. We call these points the peak points of the clusters.

To find such points, we first take the point with the highest degree as the first peak point. Note that peak points should not be close to each other, because each of them is near the center of a different cluster. Therefore, if we have an appropriate neighborhood of the first peak point, then the point of highest degree outside the neighborhood will

be the second peak point. Theoretically, we can capture all the remaining peak points by cutting off appropriate neighborhoods of the existing peak points. But it is difficult to determine the size of the neighborhood. Here we introduce the concept of the persistency of a point. As we increase the size of the neighborhood, the highest degree outside the neighborhood decreases. Specifically, we consider a k-nearest neighborhood of the first peak point. As we increase k from 1 to m, more and more points are included in the neighborhood, and the highest degree outside the growing neighborhood decreases. But there exists some resistance against this downward tendency. We illustrate this with a 1D example, Figure 1.

[Figure 1: left panel, d as a function of x; right panel, the maximum value of d outside the removed neighborhood]

Figure 1: In the left figure, there are two peaks and we need to capture the peak points $x_1$ and $x_2$. We take $x_1$ as the first peak point because it has the maximum value of d. Then we continuously remove the neighborhood of $x_1$, that is, we remove the interval $(x_1 - k, x_1 + k)$ with growing k. As k grows, the maximum value of d outside the interval drops, as shown in the right figure. But once it has dropped to $d(x_2)$, it keeps that level until $x_2$ is included in the neighborhood of $x_1$, and then the value goes down again. In this example k is continuous, but the basic idea is the same in the discrete situation.

We call such resistance against the downward tendency the persistency of a point. In the example above, as we cut off larger and larger neighborhoods of $x_1$, $x_2$ shows up to resist the drop. The persistency of $x_2$ is the length of the period during which it holds the maximum value of d outside the neighborhood of $x_1$. The persistency of any other point is defined in the same way. To simplify the exposition, we write $s(k)$ for the maximum value of d outside the k-nearest neighborhood of some points. $s$ also depends on those points, but for simplicity we just write $s(k)$ when this causes no ambiguity. Peak points tend to have higher persistency than non-peak points. In the example above, $x_2$ has the highest persistency, so we pick it as the second peak point.

Suppose we have already found N peak points.
We search for the (N + 1)-th peak point (if there is any) by the following rule: we cut off the k-nearest neighborhoods of the N current peak points simultaneously and observe the point with the highest d among the remaining points; then we pick the most persistent point during the growth of k from 1 to m. The point we pick is the potential (N + 1)-th peak point.

Now we consider the termination of the searching process. One way is to set a lower bound for the persistency, since a short persistency does not reflect a stable structure. But the lower bound varies between data sets and is not easy to select. Here we use another method. We preprocess the data to find those points that are likely to be near the centers of clusters. Recall that d indicates how likely a point is to be near a center. If $d_i$ is higher than most of the $d_j$ where j is near i, then i is more likely to be a center. To compare $d_i$ with the $d_j$ where j is near i, we simply compare it with the weighted average of the $d_j$, where the weight is $W_{i,j}$, an indicator of how near j is to i. We call this weighted average $h_i$, and we can compute h by

$h = d P^{T} = d W D^{-1}$, that is, $h_i = \sum_{j=1}^{m} W_{i,j} d_j / d_i$.

The termination condition is then: for a potential peak point $x^{(i)}$, if $d_i > h_i$, then i is considered a real peak point and the searching process proceeds; otherwise i is not considered a real peak point and the process terminates.

The idea of finding h comes from the theory of the heat equation: h is a smoothed version of d, obtained by convolving d with a probability distribution. It drags down the peaks and raises the valleys. Heat behaves in the same way: as time goes on, heat flows from positions with high temperature to positions with low temperature. And the solution to the

heat equation is a convolution of the initial data with the heat kernel. If we want to know where the peaks are, we simply find where the values are dragged down. The dragged-down areas should be the peak areas.

Algorithm 1 Peak-searching clustering

Input:
  $\{x^{(i)} : i = 1, 2, \dots, m\}$, input data
  W, weighted adjacency matrix
Output:
  peak points (cluster centers)
  cluster labels of all points
Pseudo code:
  1. Compute $d_i = \sum_{j=1}^{m} W_{i,j}$.
  2. Compute $h_i = \left( \sum_{j=1}^{m} W_{i,j} d_j \right) / d_i$.
  3. Add the $x^{(i)}$ which has the highest $d_i$ to the set of peak points.
  4. Find all peak points; repeat until stop:
       Set the persistency of all points to 0
       for k = 1, 2, ..., m
           Find the $x^{(i)}$ which has the highest $d_i$ among those points that are not in the k-nearest neighborhood of any current peak point
           Let the persistency of $x^{(i)}$ increase by 1
       Find the $x^{(c)}$ which has the highest persistency
       if $d_c > h_c$
           add $x^{(c)}$ to the set of peak points
       else
           stop finding peak points

2.3 Clustering

After getting all the peak points, we take the number of peak points as the number of clusters. Since the peak points are quite close to the cluster centers, we can directly regard the peak points as the cluster centers and assign all the other points to clusters based on their distances to the peak points. Each point shares the cluster label of its nearest peak point.

3 Experiment

In this section, we use both simulated data and real data to demonstrate the utility of our approach. In the real data experiment, we compare our method with the dpmeans method [2] and the k-means method. Throughout the experiment, we use the normalized mutual information (NMI) between the ground truth classes and the algorithm outputs for evaluation. When using k-means, we use our estimate of the number of clusters as the input. For dpmeans, we apply the max-min random selection method to estimate λ based on our estimate of the number of clusters.

Construction of graph: We can use either the mutual k-nearest neighbor graph or the Gaussian weighted graph in our experiment.
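The two graph constructions just named can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the Gaussian similarity is assumed to be $\exp(-\|x_i - x_j\|^2 / (2\sigma^2))$ as in the tutorial [1], self-loops are assumed to be excluded, and the mutual k-nearest neighbor graph is assumed to be unweighted (entries 0 or 1).

```python
import numpy as np

def pairwise_sq_dists(X):
    """Squared Euclidean distances between all rows of X (shape m x n)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return (diff ** 2).sum(axis=-1)

def gaussian_graph(X, sigma):
    """Fully connected graph with weights exp(-||xi - xj||^2 / (2 sigma^2))."""
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # assumption: no self-loops
    return W

def mutual_knn_graph(X, k):
    """Unweighted mutual k-NN graph: i and j are joined only if each one
    is among the k nearest neighbors of the other."""
    D2 = pairwise_sq_dists(X)
    m = D2.shape[0]
    k = min(k, m - 1)                        # a point has at most m - 1 neighbors
    np.fill_diagonal(D2, np.inf)             # a point is not its own neighbor
    nearest = np.argsort(D2, axis=1)[:, :k]  # indices of the k nearest points
    is_nn = np.zeros((m, m), dtype=bool)
    is_nn[np.repeat(np.arange(m), k), nearest.ravel()] = True
    return (is_nn & is_nn.T).astype(float)

# Hypothetical toy data: two well-separated pairs of points.
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
W_gauss = gaussian_graph(X_toy, sigma=1.0)
W_mknn = mutual_knn_graph(X_toy, k=1)
```

Both functions return a symmetric matrix, which is what the argument $\pi \propto d$ relies on. The dense m-by-m distance computation is only practical for small data sets.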
In this paper, we use Gaussian weights, and we choose the parameter σ in the Gaussian similarity function to be the mean variance of the original data. We have also run experiments with the mutual k-nearest neighbor graph, and we can generally achieve good performance when setting k to 5%m; these results are not shown in this paper.

First, we use 3 simulated sets of data drawn from Gaussian distributions on the 2D plane to show how our algorithm works (Figures 2, 3, 4, 5). We also apply our method to 5 UCI data sets. For each set of real data, we first apply PCA to the original data (keeping over 9% principal components) and then run our clustering method. The results are shown in Table 1.

Algorithm 1, step 5: Set the peak points as the cluster centers, and let every other point share the cluster label of its nearest peak point.
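Steps 1 to 5 of Algorithm 1 can be put together as in the sketch below. It is one possible reading of the pseudocode under stated assumptions, not the authors' implementation: the function name is hypothetical, neighborhoods are measured by Euclidean distance, the k-nearest neighborhood of a peak point is taken to be the peak itself plus its k nearest other points, all degrees are assumed positive, and ties are broken by NumPy's default ordering.

```python
import numpy as np

def peak_searching_clustering(X, W):
    """Sketch of Algorithm 1: returns (indices of peak points, cluster labels)."""
    X = np.asarray(X, dtype=float)
    W = np.asarray(W, dtype=float)
    m = X.shape[0]

    d = W.sum(axis=1)   # step 1: degrees
    h = (W @ d) / d     # step 2: weighted average of the neighbors' degrees

    # Euclidean distances, used for the neighborhoods and the final assignment.
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

    def ranks_from(p):
        """rank[j] = position of point j when sorted by distance to p (p is 0)."""
        dp = dist[p].copy()
        dp[p] = -1.0    # force the peak itself to rank 0
        rank = np.empty(m, dtype=int)
        rank[np.argsort(dp)] = np.arange(m)
        return rank

    peaks = [int(np.argmax(d))]      # step 3: first peak point
    min_rank = ranks_from(peaks[0])  # smallest rank of each point over all peaks

    while len(peaks) < m:            # step 4: search for further peak points
        persistency = np.zeros(m, dtype=int)
        for k in range(1, m):
            outside = min_rank > k   # outside every k-nearest neighborhood
            if not outside.any():
                break
            i = int(np.argmax(np.where(outside, d, -np.inf)))
            persistency[i] += 1
        c = int(np.argmax(persistency))
        if persistency[c] == 0 or d[c] <= h[c]:
            break                    # candidate is not a real peak: stop
        peaks.append(c)
        min_rank = np.minimum(min_rank, ranks_from(c))

    labels = np.argmin(dist[:, peaks], axis=1)  # step 5: nearest peak point
    return peaks, labels

# Hypothetical 1D toy data: a group of five points and a group of four points.
x = np.array([0, 1, 2, 3, 4, 20, 21, 22, 23], dtype=float)
W_demo = np.exp(-(x[:, None] - x[None, :]) ** 2 / 2.0)  # Gaussian weights, sigma = 1
np.fill_diagonal(W_demo, 0.0)
peaks_demo, labels_demo = peak_searching_clustering(x[:, None], W_demo)
print(len(peaks_demo), labels_demo)
```

On this toy input the search stops after two peak points, because the third candidate is an edge point whose degree is below its smoothed value h. The sketch recomputes the persistency counts from scratch for every new peak, which costs on the order of m squared operations per peak and is meant for clarity rather than speed.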

[Figure 2 panels: data with labels; data without labels]

Figure 2: The left figure shows the original sample points generated by 3 Gaussian distributions with covariance matrix equal to I, and mean equal to ( 3, ) for the red-colored set, (, 3) for the green-colored set, and (3, ) for the blue-colored set. The number of sample points in each colored set is: 5 red, green, 5 blue.

[Figure 3 panels: the degree d of each point; h of each point; the comparison of d and h]

Figure 3: In the left figure, d represents the stationary distribution of the random walk on the graph G. The value of d successfully reflects how near a point is to a cluster center: the peak areas in d are exactly the center areas of the clusters. The middle figure shows h, a smoothed version of d. In the right figure, we mark the points with d > h in red and those with d < h in blue. We see that this criterion is a good indicator of whether or not a point is in the center area of a cluster.

[Figure 4 panels: persistency, with 1 peak point; persistency, with 2 peak points; persistency, with 3 peak points]

Figure 4: These three figures show the value of $s(k)$ for the current peak points. $s(k)$ drops as k grows, but as we can see, there are points that resist the drop. In the figures, we mark the largest persistency period with red lines. In the first figure, we use 1 peak point (the point with the highest value of d), keep cutting off its k-nearest neighborhood, and capture the second peak point by its persistent behavior. In the middle figure, we use peak points 1 and 2 simultaneously, cut off their k-nearest neighborhoods, and capture the third peak point. In the right figure, as we have already found 3 peak points, there is no significant persistency anymore, and what we get is a fake peak point. By the corresponding values of h and d, it is not in a center area, and the searching process terminates.

[Figure 5 panels: peak points; assigned labels]

Figure 5: Once the search for peak points terminates, we can assign cluster labels to all the points. The left figure shows the peak points found, which are marked in red. The right figure shows the assigned labels.

data set     PSC        dpmeans    k-means
Iris(3)      .7()       .77()      .79()
Wine(3)      .35()      .33()      .59()
Seeds(3)     .97(3)     .57()      .99(3)
Soybean()    .7375()    .93(5)     .73()
Pima()       .57()      .9(5)      .()

Table 1: UCI data sets: NMI (number of clusters). The first column gives the names of the data sets, with the number of true classes in parentheses. In the other columns, the numbers in parentheses are the numbers of clusters output by the algorithms. For k-means, this number is the same as that of peak-searching clustering (PSC).

4 Discussion

In the previous sections, we have presented a new clustering method, peak-searching clustering. It performs well in many cases. There are a few points we should mention here.

First, our method is based on the similarity graph, and therefore essentially on the Euclidean distance between data points. As a result, our method can only deal with linearly separable problems, the same as the k-means approach. For a structure like a ring within a ring, we cannot separate the two rings.

Second, PSC is a very intuitive and straightforward method. We start from the connections between points and try to distinguish points lying in the center from those lying on the margin. We give reasonable clusters based on those center points. However, for the clustering problem itself, the number of clusters is probably undeterminable in practice, since there is no absolute standard for clustering. A good clustering method can only provide reasonable solutions instead of the right solution, which is exactly what our approach does.

The last point is the parameter selection problem. In PSC, the construction of the similarity graph turns out to be a very important step. Whether we use the k-NN graph or the Gaussian weighted graph, we have a parameter k or σ to determine. In some extreme cases, the clustering result is quite sensitive to the parameter selection.
In our experiment, we choose σ as the mean of the variances of the data points over all attributes. This is probably not the best choice. How to select σ might be a good problem to work on.

5 Conclusion

In this paper, we start from a simple intuition and provide a new clustering method that does not need to know the number of clusters: peak-searching clustering. We explain our idea from the perspective of random walks. We also introduce the concept of persistency in the peak-searching process. In the experiments, PSC does a good job on clustering problems.

References

[1] U. von Luxburg. A tutorial on spectral clustering. Tech. Rep. 149, Max Planck Institute for Biological Cybernetics, August 2006.
[2] B. Kulis and M. I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In ICML, 2012.