REGIONALIZATION AS SPATIAL DATA MINING PROBLEM BASED ON CLUSTERING: REVIEW


Geetinder Saini 1, Kamaljit Kaur 2
1 Department of Computer Science & Engineering
2 Assistant Professor, Department of Computer Science & Engineering
Sri Guru Granth Sahib World University, Fatehgarh Sahib, Punjab, India

ABSTRACT: Regionalization is one of the biggest problems faced by spatial data mining when representing economic and social geography. The problem can be addressed with spatial clustering algorithms that group spatial objects. The main purpose of regionalization is to find compact and dense regions that also show a homogeneous distribution of non-spatial variables. In this paper, various clustering algorithms used to solve regionalization issues in spatial data mining are studied, and the performance of K-means and Ward's algorithm is compared on the cohesion, variance, precision and recall parameters.

Keywords: Spatial data mining, Regionalization, Data clustering, K-Means, Ward's Method, Single linkage, Double linkage, Average Linkage, DBSCAN Clustering, Cohesion, Variance, Precision and Recall

[1] INTRODUCTION
Spatial data mining is the process of discovering interesting, previously unknown but potentially useful patterns in spatial databases. Extracting such patterns from spatial datasets is more difficult than extracting the corresponding patterns from traditional numeric and categorical data because of the complexity of spatial data types, spatial relationships and spatial autocorrelation. Spatial data are data related to objects that occupy space. A spatial database stores spatial objects represented by spatial data types, together with the spatial relationships among such objects [7][12]. Spatial data mining techniques fall into several groups: clustering, outlier detection, association and co-location, classification, and trend detection. Clustering is the most common technique used in spatial data mining.
Clustering is the process of partitioning a set of data objects into subsets such that the data elements in a cluster are similar to one another and different from the elements of other clusters [1]. The set of clusters resulting from a cluster analysis can be referred to as a clustering. Clustering is a critical task in data mining: similar data are put in one group and dissimilar data in other groups. In this context, different clustering methods may generate different clusterings on the same data set. The partitioning is not performed by humans, but by the clustering algorithm.

Spatial clustering is an important component of spatial data mining. It aims to group similar spatial objects into clusters so that objects within a cluster are highly similar to one another but dissimilar to objects in other clusters [13]. Spatial clustering is applicable to many problems. An important application area for spatial clustering algorithms is social and economic geography. Within this scope, regionalization, a classical methodical problem of social geography, can be considered [13]. Cluster analysis is widely used for data analysis; it organizes a set of data items into groups or clusters so that items in the same group are similar to each other and different from those in other groups. Cluster analysis has a wide range of applications in business intelligence, image pattern recognition, web search, biology, space and security [16].

[2] REGIONALIZATION
Regionalization is one of the important tasks in spatial data mining. It is the process of dividing a large area into smaller regions: a large set of spatial objects is delineated into a smaller number of spatially contiguous regions while a homogeneity measure of the derived regions is optimized. Regionalization is thus a classification procedure applied to spatial objects with an area representation, which groups them into homogeneous contiguous regions. The intent of regionalization is to find spatially compact and dense regions of arbitrary shape with a homogeneous internal distribution of non-spatial variables [5]. For many applications, e.g. direct mailing, it is helpful to have special-purpose regions that depend on the kind of homogeneity one is interested in [13].
Different types of techniques are used for regionalization, and clustering is the most commonly used of them.

[3] VARIOUS DATA CLUSTERING TECHNIQUES FOR REGIONALIZATION
In spatial data mining, many clustering methods have been developed, and they can be classified into different categories. Clustering methods can be broadly classified into two groups: partitioning clustering and hierarchical clustering. Partitioning clustering methods, such as K-means and the self-organizing map (SOM), divide a set of data items into a number of non-overlapping clusters. A data item is assigned to the closest cluster based on a proximity or dissimilarity measure. Hierarchical clustering, on the other hand, organizes data items into a hierarchy with a sequence of nested partitions or groupings. Commonly used hierarchical clustering methods include Ward's method (Ward, 1963), single-linkage clustering, average-linkage clustering, and complete-linkage clustering. Some common techniques used to solve regionalization issues are described below.

[3.1] Partitional Clustering

Partitional clustering methods determine a partition that divides a group of points into clusters such that the points in a cluster are more similar to each other than to points in different clusters. These methods start with some arbitrary initial clusters and iteratively reallocate points to clusters until a stopping criterion is met. They tend to find clusters with hyperspherical shapes [14]. Partitional clustering algorithms include k-means and k-medoids.

[3.1.1] K-Means Clustering
K-means is a partitioning method and one of the simplest unsupervised learning algorithms for solving the clustering problem. It estimates the mean vectors of a set of K groups. For spatial data mining, K-means attempts to find K center locations such that the sum of the distances from every object to its nearest center is minimized. The K-means algorithm is:
1. Select initial means for the K clusters.
2. a) Calculate the dissimilarity between each object and the mean of each cluster.
   b) Allocate each object to the cluster whose mean is nearest to it.
   c) Recompute the mean of each cluster from the objects allocated to it, so that the intra-cluster dissimilarity is minimized.
3. Repeat step 2 until a complete pass through all the objects results in no object moving from one cluster to another. The clusters are then stable and the clustering process ends [11].

K-Means Algorithm Properties:
- There are always K clusters.
- There is always at least one object in each cluster.
- The clusters are non-hierarchical and they do not overlap.
- Every member of a cluster is closer to the mean of its own cluster than to the mean of any other cluster.
- Results depend on the initial choice of centers.
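The K-means steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study (which was done in MATLAB): Euclidean distance is assumed as the dissimilarity, the initial means are drawn at random from the objects, and the names `kmeans`, `max_iter` and `seed` are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # Step 1: select k distinct objects as the initial means (assumption).
    means = rng.sample(points, k)
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Steps 2a and 2b: allocate every object to its nearest mean,
        # using Euclidean distance as the dissimilarity (assumption).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, means[c])) for p in points
        ]
        # Step 3: stop when a complete pass moves no object.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 2c: relocate each mean to the centroid of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old mean if a cluster became empty
                means[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, means


# Two well-separated groups of three objects each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, means = kmeans(points, k=2)
```

Because the result depends on the initial means, practical use would typically repeat the run with several seeds and keep the partition with the smallest within-cluster sum of distances.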

Figure 1. K-means algorithm process [22].

[3.2] Hierarchical Clustering
A hierarchical method creates a hierarchical decomposition of a given set of data objects, building a hierarchy of clusters in the form of a tree or dendrogram. In agglomerative hierarchical clustering, each object is first assigned to its own cluster, so that N objects give N clusters. The two clusters with the most similar behavior are found and merged into a single cluster. The distances between the merged cluster and each of the old clusters are then computed. This procedure is repeated until all objects are grouped into the required K clusters [6]. There are two approaches to hierarchical clustering: "bottom up", i.e. grouping small clusters into larger ones, called agglomerative clustering, and "top down", i.e. splitting larger clusters into smaller ones, called divisive clustering.

[3.2.1] Agglomerative (Bottom Up)
Agglomerative hierarchical clustering, or bottom-up clustering, starts with individual data objects and progressively groups them into larger clusters until the root cluster containing all the data objects is formed. This is done with a greedy approach that, at each step, merges the clusters that are most similar to each other according to a user-provided cluster dissimilarity function. In this bottom-up method, clusters have sub-clusters, which in turn have sub-clusters, and so on. It starts by letting each object form its own cluster and iteratively merges clusters into larger and larger clusters until all the objects are in a single cluster or a certain termination condition is satisfied. The single cluster becomes the root of the hierarchy. For the merging step, it finds the two clusters that are closest to each other and combines them to form one cluster [1].

Ward's Method
Ward's method is an agglomerative hierarchical clustering method.
Ward's clustering method reduces the number of clusters one at a time, starting from one cluster per compound and ending when one cluster comprises all the compounds. At each reduction, the method merges the two clusters whose merger gives the smallest increase in the total sum of squares of the distances of each point to its cluster centroid. Thus, Ward's algorithm forms clusters by selecting, at each step, the merge that minimizes the within-cluster sum of squares, or error sum of squares (ESS) [3]:

$$ESS_k = \sum_{i=1}^{n} x_{ik}^2 - \frac{1}{n} \left( \sum_{i=1}^{n} x_{ik} \right)^2 \qquad (1)$$

where:
x_{ik}: the attribute value of molecule i in cluster k
n: the size of cluster k

The ESS values of all clusters are summed to give the total error sum of squares:

$$E = \sum_{k=1}^{K} ESS_k \qquad (2)$$

where:
K: the number of clusters

Algorithm for Ward's clustering [3]:
1. Start with the largest number of clusters, each cluster consisting of exactly one compound. The value of E is then 0.
2. Reduce the number of clusters by one by merging the two clusters whose merger gives the smallest increase in the total error sum of squares.
3. If more than one cluster remains, go back to step 2.
4. Display the results in the form of an inverted tree showing, at each stage, which two clusters were merged and the corresponding total error sum of squares (E) or total number of clusters (K).

Single-Linkage Clustering
Single linkage, also called nearest neighbor or shortest distance, is a method of calculating distances between clusters. In single linkage, the distance between two clusters is the distance between their two closest objects. It is a bottom-up strategy: each object is first placed in a separate cluster, and at each step the closest pair of clusters is merged, until some termination condition is satisfied. This requires a notion of cluster proximity. For single linkage, the proximity of two clusters is defined as the minimum distance between any two points, one from each cluster [11]. The main drawback of this method is the chaining phenomenon: clusters may be forced together because single objects are close to each other, even though many of the objects in each cluster are very distant from each other.

Complete-Linkage Clustering
Complete-linkage clustering is also known as the farthest-neighbor or maximum method. The distance between two clusters is taken to be the maximum of the distances between all pairs of objects (variable vectors) drawn one from each cluster [6].
Average-Linkage Clustering
Average linkage is another method of calculating the distance between clusters. The distance between two clusters is defined as the average of the distances between all pairs of objects, one taken from the first cluster and one from the second [9].
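To make the difference between the merge criteria concrete, the sketch below computes the single-, complete- and average-linkage distances and the Ward merge cost for the same pair of clusters. It is an illustration only: Euclidean distance is assumed, the ESS of a multi-attribute cluster is taken as the sum of the single-attribute ESS of equation (1) over all attributes (an assumption, since the text defines ESS for one attribute), and all function names are hypothetical.

```python
import math
from itertools import product


def pair_distances(a, b):
    """Distances between every object of cluster a and every object of b."""
    return [math.dist(p, q) for p, q in product(a, b)]


def single_link(a, b):
    return min(pair_distances(a, b))  # closest pair of objects


def complete_link(a, b):
    return max(pair_distances(a, b))  # farthest pair of objects


def average_link(a, b):
    d = pair_distances(a, b)
    return sum(d) / len(d)  # mean over all cross-cluster pairs


def ess(cluster):
    """Error sum of squares: sum(x^2) - (sum(x))^2 / n, added over attributes."""
    n = len(cluster)
    return sum(sum(x * x for x in col) - sum(col) ** 2 / n
               for col in zip(*cluster))


def ward_increase(a, b):
    """Increase in the total ESS if clusters a and b were merged."""
    return ess(a + b) - ess(a) - ess(b)


a = [(0.0, 0.0), (0.0, 2.0)]
b = [(3.0, 0.0), (3.0, 2.0)]
print(single_link(a, b), complete_link(a, b), average_link(a, b))
print(ward_increase(a, b))
```

An agglomerative procedure would evaluate one of these functions for every pair of current clusters and merge the pair with the smallest value; only the choice of function distinguishes the four methods.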

[3.3] DBSCAN Clustering
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) grows clusters according to the density of neighboring objects. It is based on the concepts of density reachability and density connectivity, both of which depend on two input parameters: the size of the epsilon neighborhood (ε) and the minimum number of points required in that neighborhood. The first parameter controls the size of the neighborhood and thereby the size of the clusters. The algorithm starts with an arbitrary point that has not been visited [6]. The ε-neighborhood of this point is retrieved, and if it contains sufficiently many points, a cluster is started. Otherwise the point is labeled as noise. The minimum-points parameter therefore affects the detection of outliers. DBSCAN targets low-dimensional spatial data; a related density-based algorithm is DENCLUE [12].

[4] EXISTING METHODS USED FOR SOLVING REGIONALIZATION ISSUES
Researchers have used various clustering methods to solve the regionalization issue. Some used existing algorithms, some improved existing algorithms, some presented new algorithms that combine two algorithms, and others compared hybrid clustering algorithms. This section reviews previous studies from the literature that present clustering methods for solving the regionalization issue in spatial data mining.

Xie et al.'s scheme [3] proposed a spatial clustering algorithm for efficient processing of objects with neighborhood relations. The cluster membership of an object is therefore determined by its own spatial attributes as well as the attributes of the objects in its neighborhood. Clusters with the shortest distance, based on geomorphologic discrepancy laws, are combined. The drawback of this method is that regional homogeneity is not guaranteed.

Sharma et al.'s scheme [12] proposed an efficient clustering technique for regionalization of a spatial database (RCSDB).
This algorithm combines a spatial density approach with a covariance-based method to inductively find spatially dense and non-spatially homogeneous clusters of arbitrary shape. RCSDB takes into account spatial point distributions as well as the distribution of several non-spatial characteristics. RCSDB classifies a database of geographical locations into homogeneous, planar and density-connected subsets called regions. It finds internally density-connected sets.

Srinivas et al.'s scheme [13] presented a comparative study of the regionalization techniques used in spatial data mining. They divided regionalization techniques into four groups: the conventional clustering method, the maximization of regional compactness approach, the explicit spatial contiguity constraint approach, and the density-based approach.

Lokesh Kumar et al.'s scheme [4] proposed an algorithm to solve regionalization, a prominent problem from social geography, by combining the spatial density clustering approach with a covariance-based method to inductively find spatially dense and non-spatially homogeneous clusters of arbitrary shape.

Ildiko Pelczer et al.'s scheme [8] applies cluster analysis to achieve a regionalization of the Sonora River Basin in the State of Sonora, Mexico, into homogeneous zones. The identification of homogeneous zones is fundamental for the study of climatic variations throughout the basin. The study analyzes the frequency of rain and flood events and other variables that can be significant in the definition of similar areas. For this purpose, hierarchical and non-hierarchical algorithms were applied in six experiments based on the precipitation and temperature data sets available from traditional weather stations. To validate the results, four indices applicable to both types of algorithms were applied. The experiments showed that better results were achieved when several variables were considered together than when each parameter was analyzed alone, and that working with average values can mask maximum and minimum values that influence climatic variability. By comparing the results of the cluster analysis with ancillary data, the authors concluded that the K-means algorithm was an effective method for obtaining climatically homogeneous zones.

Sheng-Tun et al.'s scheme [6] discussed the results of cluster analysis using data generated from the discrete wavelet transform and the continuous wavelet transform. Data generated from the continuous wavelet transform provide detailed time-variation features that can be used to detect the spatial variation of air pollutants in a selected time period.

Christina et al.'s scheme [5] performs regionalization using three agglomerative clustering methods and develops a system to study quality distribution. Three different hybrid clustering methods are analyzed for grouping sites into non-overlapping, contiguous and homogeneous regions. The paper also validates the homogeneity of the regions formed and suggests future lines of research for improving these methods. Its results show that the clusters formed for grouping sites are homogeneous and that the hybrid of Ward's method with K-means performs better than the others for regionalization.

Ramachandra Rao et al.'s scheme [10] investigates three hybrid clustering algorithms to determine the effectiveness of hybrid-cluster analysis in regionalization. In these hybrids, a partitional clustering procedure identifies groups of similar catchments by refining the clusters derived from agglomerative hierarchical clustering algorithms. The hierarchical clustering algorithms used are the single-linkage, complete-linkage and Ward's algorithms, while the partitional clustering algorithm used is K-means. The regions given by the clustering algorithms are, in general, not statistically homogeneous. The hybrid-cluster analysis is found to be useful in minimizing the effort needed to identify homogeneous regions. The hybrid of Ward's and K-means algorithms is better for regionalization than the other hybrids. The hybrid method provides enough flexibility and offers prospects for improvement in regionalization studies.

[5] EXPERIMENTAL RESULT
Of the many clustering techniques used to solve regionalization issues, we examined two: K-means and Ward's algorithm.

[5.1] Data Used in the Study
Different spatial datasets are used for the comparison of K-means and Ward's algorithm. The datasets used to test and compare the clustering algorithms are obtained from the UCI Machine Learning repository [17]. The experiments are implemented in MATLAB. Three datasets are used: the 3D Road Network (North Jutland, Denmark) Data Set, the Gas Sensor Array Drift Dataset at Different Concentrations Data Set, and the Water Treatment Plant Dataset.

[5.2] Evaluation Measure
The regions formed by the clustering algorithms are tested under the following measures:
(1) Cohesion: measures how closely the objects in a cluster are related to each other.
(2) Variance: measures how well separated the clusters are from each other.
(3) Precision: the fraction of retrieved instances that are relevant.
(4) Recall: the fraction of relevant instances that are retrieved.

A region can be regarded as acceptably homogeneous if HM < 1, possibly homogeneous if 1 < HM < 2, and definitely heterogeneous if HM > 2, where HM is the heterogeneity measure [5]. K-means and Ward's algorithm are used for spatial analysis of the data, and the performance of the algorithms is evaluated by comparing their results. Ward's algorithm is found to provide better cohesion values than K-means. Because Ward's algorithm merges the data objects that result in the minimum within-cluster variance, it achieves a better cohesion value than the K-means algorithm. For an algorithm to find homogeneous clusters, the right selection of its parameters is essential. In the context of regionalization, it is natural to use clustering algorithms that can find arbitrarily shaped clusters.

Table I. Analysis of the average cohesion, average variance, precision and recall of the K-means algorithm and Ward's algorithm on the 3D Road Network (North Jutland, Denmark) Data Set [17], the Gas Sensor Array Drift Dataset at Different Concentrations Data Set [17] and the Water Treatment Plant Dataset [17].
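The precision and recall definitions and the HM thresholds of Section 5.2 can be sketched in Python as follows. This is a hedged illustration, not the evaluation code of the study: treating the members of one derived region as the "retrieved" instances and the members of a reference class as the "relevant" ones is an assumption, the handling of the boundary values HM = 1 and HM = 2 (which the text leaves open) is an assumption, and the function names are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # instances that are both
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def homogeneity_class(hm):
    """Label a region from its heterogeneity measure HM."""
    if hm < 1:
        return "acceptably homogeneous"
    if hm < 2:  # assumption: HM == 1 is counted as "possibly homogeneous"
        return "possibly homogeneous"
    return "definitely heterogeneous"  # assumption: includes HM == 2


# Hypothetical example: object ids placed in one region vs. a reference class.
region = [1, 2, 3, 4]
reference = [2, 3, 5]
p, r = precision_recall(region, reference)
print(p, r, homogeneity_class(0.7), homogeneity_class(2.4))
```

Here two of the four objects in the region belong to the reference class, so the precision is 2/4, and two of the three reference objects were captured, so the recall is 2/3.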

From the graphs in Figures 3, 4 and 5, it can be seen that as the number of clusters increases, the clusters tend to be more homogeneous.

Figure 3. Homogeneity measure of K-means vs. Ward's algorithm on the first dataset.
Figure 4. Homogeneity measure of K-means vs. Ward's algorithm on the second dataset.

Figure 5. Homogeneity measure of K-means vs. Ward's algorithm on the third dataset.

[6] CONCLUSION
This paper presented various data clustering techniques for the regionalization issue and analyzed the clustering methods used by different researchers for grouping sites into contiguous, non-overlapping and homogeneous regions. We compared the performance of two clustering algorithms, K-means and Ward's clustering algorithm, on the three data sets. The result analysis of K-means and Ward's algorithm on the different data sets shows non-overlapping clusters based on the feature vectors. Selecting the optimum number of clusters plays a vital role in obtaining homogeneous clusters. When the number of clusters is less than five, at least one cluster remains heterogeneous. When the number of clusters is six, all the clusters are homogeneous. Thus, for the data sets used, our analysis found six to be the optimum number of clusters. We found that Ward's algorithm gives more cohesion and homogeneity with fewer clusters than K-means for our data sets. In future, this work can be extended to other clustering algorithms related to regionalization.

International Journal of Computer Engineering and Applications, Volume VI, Issue II/III, May 14

REFERENCES
[1] Jiawei Han, Data Mining: Concepts and Techniques, 2006.
[2] Margaret H. Dunham, Data Mining: Introductory and Advanced Concepts (Pearson Education, 2006).
[3] Caixiang Xie, Shilin Chen, Fengmei Suo, and Dan Yang, Regionalization of Chinese Medicinal Plants Based on Spatial Data Mining, Seventh International Conference on Fuzzy Systems and Knowledge Discovery, pp., 2010.
[4] Lokesh Kumar Sharma, Simon Scheider, Willy Kloesgen, Om Prakash Vyas, Efficient clustering technique for regionalisation of a spatial database, Int. J. of Business Intelligence and Data Mining, 2008, Vol. 3, No. 1, pp.
[5] J. Christina, Dr. K. Komathy, Analysis of Hard Clustering Algorithms Applicable to Regionalization, Proceedings of 2013 IEEE Conference on Information and Communication Technologies (ICT 2013).
[6] Sheng-Tun Li, Shih-Wei Chou, Jeng-Jong Pan, Multi-Resolution Spatio-temporal Data Mining for the Study of Air Pollutant Regionalization, Proceedings of the 33rd Hawaii International Conference on System Sciences.
[7] N. Sumathi, R. Geetha, Spatial data mining - techniques, trends and its applications, Journal of Computer Applications, Vol. 1, No. 4, Oct-Dec 2008.
[8] Pelczer, Ramos, Domínguez, González, Establishment of regional homogeneous zones in a watershed using clustering algorithms, International Journal of Business Intelligence and Data Mining, Volume 3, Number 1, 25 April 2008, pp. (16).
[9] Rao, Regionalization of Indiana Watersheds for Flood Flow Predictions Phase I: Studies in Regionalization of Indiana Watersheds, FHWA/IN/JTRP-2002/02, Joint Transportation Research Program, Indiana Department of Transportation and Purdue University, West Lafayette, Indiana, doi: /
[10] Rao, Srinivas, Regionalization of watersheds by hybrid-cluster analysis, Journal of Hydrology 318 (2006).
[11] Ramachandra Rao and V.V. Srinivas (2006), Regionalization of watersheds by fuzzy cluster analysis, Journal of Hydrology, ScienceDirect, pp.
[12] L.K. Sharma, S. Scheider, W. Kloesgen and O.P. Vyas, Efficient clustering technique for regionalisation of a spatial database, International Journal Business Intelligence and Data Mining, Vol. 3, No. 1, pp., 2008.
[13] PVS Srinivas, Susanta K Satpathy, Lokesh K Sharma, and Ajaya K Akasapu (2011), Regionalisation as Spatial Data Mining Problem: A Comparative Study, Proc. International Journal of Computer Trends and Technology, Vol. 18, No. 5, pp.
[14] Xin Wang, Jing Wang, Using Clustering methods in geospatial information systems, GEOMATICA, Vol. 64, No. 3, 2010, pp. 347 to 361.
[15] Teknomo, Kardi, K-Means Clustering tutorial\kmean\
[16] Assuncao, Neves, Efficient regionalization techniques for socio-economic geographical units using minimum spanning trees, International Journal of Geographical Information Science, Vol. 20, No. 7, August 2006.
[17] The UCI Machine learning [online]. Available:


More information

International Journal of Research in Computer and Communication Technology, Vol 3, Issue 7, July

International Journal of Research in Computer and Communication Technology, Vol 3, Issue 7, July Hybrid SVM Data mining Techniques for Weather Data Analysis of Krishna District of Andhra Region N.Rajasekhar 1, Dr. T. V. Rajini Kanth 2 1 (Assistant Professor, Department of Computer Science & Engineering,

More information

Clustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation

Clustering by Mixture Models. General background on clustering Example method: k-means Mixture model based clustering Model estimation Clustering by Mixture Models General bacground on clustering Example method: -means Mixture model based clustering Model estimation 1 Clustering A basic tool in data mining/pattern recognition: Divide

More information

Classification of High Spatial Resolution Remote Sensing Images Based on Decision Fusion

Classification of High Spatial Resolution Remote Sensing Images Based on Decision Fusion Journal of Advances in Information Technology Vol. 8, No. 1, February 2017 Classification of High Spatial Resolution Remote Sensing Images Based on Decision Fusion Guizhou Wang Institute of Remote Sensing

More information

To Predict Rain Fall in Desert Area of Rajasthan Using Data Mining Techniques

To Predict Rain Fall in Desert Area of Rajasthan Using Data Mining Techniques To Predict Rain Fall in Desert Area of Rajasthan Using Data Mining Techniques Peeyush Vyas Asst. Professor, CE/IT Department of Vadodara Institute of Engineering, Vadodara Abstract: Weather forecasting

More information

Overview of clustering analysis. Yuehua Cui

Overview of clustering analysis. Yuehua Cui Overview of clustering analysis Yuehua Cui Email: cuiy@msu.edu http://www.stt.msu.edu/~cui A data set with clear cluster structure How would you design an algorithm for finding the three clusters in this

More information

International Journal of Remote Sensing, in press, 2006.

International Journal of Remote Sensing, in press, 2006. International Journal of Remote Sensing, in press, 2006. Parameter Selection for Region-Growing Image Segmentation Algorithms using Spatial Autocorrelation G. M. ESPINDOLA, G. CAMARA*, I. A. REIS, L. S.

More information
