Clustering analysis of vegetation data
Valentin Gjorgjioski (1), Sašo Džeroski (1) and Matt White (2)

(1) Jožef Stefan Institute, Jamova cesta 39, SI-1000 Ljubljana, Slovenia
(2) Arthur Rylah Institute for Environmental Research, Dept. of Sustainability & Environment, Heidelberg 3084, VIC, Australia

1 Introduction

Vegetation may be described as the plant life of a region. The study of patterns and processes in vegetation at various scales of space and time is useful for understanding landscapes, ecological processes and environmental history, and for predicting ecosystem attributes such as productivity. Generalized vegetation descriptions, maps and other graphical representations of vegetation types have become fundamental to land use planning and management. They are widely used as biodiversity surrogates in conservation assessments and can provide a useful summary of many non-vegetation landscape elements, such as animal habitats, agricultural suitability, and the location and abundance of timber and other forest resources.

We use clustering or classification of vegetation data to obtain such descriptions, maps and other representations. Clustering vegetation data is an instance of clustering, a well-known machine learning problem that aims to partition a data set into subsets so that the data in each subset share some common trait. Summaries of vegetation classification and its methods can be found in the numerous texts that focus on this discipline [6, 3].

In our work we deal with vegetation data organized in a relational model. To be able to apply classical machine learning approaches, we first need to preprocess the data. We preprocess the data using simple aggregation techniques, and we use several approaches to analyze it: predictive clustering trees (PCTs) [1], k-means and hierarchical agglomerative clustering (HAC). We applied these algorithms and obtained satisfactory results.

The rest of the paper is organized as follows. First, we discuss the data set and the problem in detail.
Further on, we describe the preprocessing needed to make the data suitable for classical data mining approaches, and in the next section we describe our data mining setup and experiments. Next, we present the results of the experiments, and finally we conclude with a discussion and proposals for further work.

2 Dataset and problem description

The problem is to produce a classification and clustering of vegetation properties, which is an easier problem than the classification of vegetation in general. We aim to solve the easier problem first and later advance to solving the more general problem. Mapping such a classification over the whole landscape is also desired, so we build a predictive clustering model that can later be mapped onto the landscape.

The data were collected from across the State of Victoria, Australia, an area of approximately 22,000,000 hectares. The State is relatively varied climatically and geologically and supports some 4,000 indigenous vascular plant species. The landscape is divided into about … quadrats of 30 × 30 m resolution, referred to as sites in the rest of this paper. For this study we have about … sites and about 4,600 species. For each site, ordinal categories represent the abundance of each species. Furthermore, for each site we have environmental (climatic, radiometric, topographic) and spectral variables, extracted for the same locations from a stack of data themes stored in a GIS. In addition, further information is known about the species: their physiognomy (leaf type, plant size and general architecture), phenology (flowering time) and phylogeny (i.e., genus, family). Figure 1 depicts the relationship between site properties and species properties.

Fig. 1. Species and site properties

The data are therefore relational, with one-to-many relationships. To handle them, we aggregate the data with simple aggregation techniques, as described in the next section.

3 Preprocessing

First, with the help of an expert, we convert the ordinal abundance categories to numeric values, using the following mapping:

1 (0-5%) as …
2 (5-25%) as 15
3 (25-50%) as …
4 (50-75%) as …
5 (75-100%) as 87.5
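The two values that survive in the mapping (15 for 5-25% and 87.5 for 75-100%) are the midpoints of their intervals. The Python sketch below assumes that every class is mapped to its interval midpoint, so the values it produces for classes 1, 3 and 4 (2.5, 37.5, 62.5) are assumptions rather than the values used in the study. The species identifiers are hypothetical.

```python
# Ordinal cover-abundance classes and their % cover intervals (Section 3).
COVER_CLASSES = {
    1: (0.0, 5.0),
    2: (5.0, 25.0),
    3: (25.0, 50.0),
    4: (50.0, 75.0),
    5: (75.0, 100.0),
}


def class_to_cover(code: int) -> float:
    """Map an ordinal class to % cover, ASSUMING the interval midpoint is used.

    Only class 2 (-> 15) and class 5 (-> 87.5) are confirmed by the text;
    the values for classes 1, 3 and 4 are assumptions.
    """
    low, high = COVER_CLASSES[code]
    return (low + high) / 2.0


def convert_site(observations: dict) -> dict:
    """Convert {species: ordinal class} for one site into {species: % cover}."""
    return {species: class_to_cover(code) for species, code in observations.items()}


if __name__ == "__main__":
    # Hypothetical species identifiers, for illustration only.
    print(convert_site({"sp_a": 4, "sp_b": 2}))
```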
Next, as suggested by the experts, we remove measurements of exotic species and of species with very low cover (0.5). After cleaning the data, we aggregate the cover abundance of the species in each site by species properties: for every species property, we aggregate over each of its values, and we do this for every site. In effect, we generate a new feature for every value of a given nominal property. An example of feature generation for the autflow property and value 1 is given in Fig. 2; the same procedure is given as Algorithm 1.

$$\mathrm{autflow}_1(S_i) = \frac{\sum_{S_p \in S_i,\ \mathrm{autflow}(S_p) = 1} \mathrm{cover}(S_i, S_p)}{\mathrm{cover}(S_i)}, \qquad \text{given that} \qquad \mathrm{cover}(S_i) = \sum_{S_p \in S_i} \mathrm{cover}(S_i, S_p)$$

Fig. 2. Example of feature generation for the autflow property, given the site S_i and species S_p

Algorithm 1. Function that computes the value of the new feature for each site

    function generatefeature(attribute, avalue)
        feature ← the new feature for the pair (attribute, avalue)
        for each site S do
            sum ← 0; sum1 ← 0
            for each species sp in S do
                sum ← sum + speciesabundance(S, sp)
                if getavalue(sp, attribute) = avalue then
                    sum1 ← sum1 + speciesabundance(S, sp)
            setfeaturevalue(S, feature, sum1 / sum)

4 Methodology

We use three approaches:

- predictive clustering trees for multi-target prediction (PCTs),
- k-means clustering,
- hierarchical agglomerative clustering (HAC).

4.1 Predictive Clustering Trees

Predictive modeling aims at constructing models that can predict a target property of an object from a description of the object. Predictive models are learned from sets of examples, where each example has the form (D, T), with D being an object description and T a target property value. While a variety of representations, ranging from propositional to first-order logic, have been used for D, T is almost always considered to consist of a single target attribute called the class, which is either discrete (classification problem) or continuous (regression problem).

Clustering [2], on the other hand, is concerned with grouping objects into subsets of objects (called clusters) that are similar w.r.t. their description D. There is no target property defined in clustering tasks. In conventional clustering, the notion of a distance (or conversely, similarity) is crucial: examples are considered to be points in a metric space, and clusters are constructed such that examples in the same cluster are close according to a particular distance metric. A centroid (or prototypical example) may be used as a representative of a cluster. The centroid is the point with the lowest average (squared) distance to all the examples in the cluster, i.e., the mean or medoid of the examples.

Predictive clustering [1] combines elements of both prediction and clustering, and it is implemented in the Clus system, which can be obtained at …

4.2 K-means Clustering

We describe k-means clustering only briefly in this section. The algorithm starts by partitioning the input points into k initial sets, either at random or using some heuristic. It then calculates the mean point, or centroid, of each set, and constructs a new partition by associating each point with the closest centroid. The centroids are then recalculated for the new clusters, and the algorithm alternates these two steps until convergence, which is reached when the points no longer switch clusters (or, alternatively, when the centroids no longer change).
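To make Algorithm 1 and the k-means loop concrete, here is a minimal Python sketch; it is not the implementation used in the study. `generate_feature` follows the Fig. 2 formula, assuming per-site cover values (after the class-to-cover conversion) stored as nested dicts and a separate table of species attributes; this data layout and all names are assumptions for illustration. `kmeans` alternates the two steps described above, but instead of a random initial partition it uses another common heuristic (k randomly chosen input points as initial centroids), and it keeps a centroid unchanged if its cluster becomes empty.

```python
import random


def generate_feature(site_cover, species_attrs, attribute, avalue):
    """Algorithm 1: for every site, the fraction of total cover contributed by
    species whose `attribute` equals `avalue` (e.g. autflow == 1)."""
    features = {}
    for site, cover in site_cover.items():
        total = sum(cover.values())  # cover(S_i)
        matching = sum(
            c for species, c in cover.items()
            if species_attrs[species].get(attribute) == avalue
        )
        # Guard against empty sites (an assumption; the paper does not cover this case).
        features[site] = matching / total if total > 0 else 0.0
    return features


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    # Component-wise mean of a non-empty list of equal-length vectors.
    return [sum(coords) / len(points) for coords in zip(*points)]


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means; returns (cluster index per point, centroids)."""
    rng = random.Random(seed)
    # Initialisation heuristic (assumed): k distinct input points as centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point switched clusters
        assignment = new_assignment
        # Update step: recompute each centroid from its current members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = centroid(members)
    return assignment, centroids


if __name__ == "__main__":
    # Toy data, made up for illustration: % cover per species per site.
    site_cover = {"s1": {"a": 15.0, "b": 87.5}, "s2": {"a": 62.5}, "s3": {"b": 37.5}}
    species_attrs = {"a": {"autflow": 1}, "b": {"autflow": 0}}
    f1 = generate_feature(site_cover, species_attrs, "autflow", 1)
    f0 = generate_feature(site_cover, species_attrs, "autflow", 0)
    vectors = [[f1[s], f0[s]] for s in sorted(site_cover)]
    print(kmeans(vectors, k=2))
```

Each (attribute, value) pair yields one column, so stacking several `generate_feature` outputs gives one numeric vector per site, which is the kind of input the `kmeans` sketch expects.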
4.3 Hierarchical Agglomerative Clustering

In this section, we briefly discuss hierarchical agglomerative clustering (HAC) (see, e.g., [4]). HAC is one of the most widely used clustering approaches. It produces a nested hierarchy of groups of similar objects, based on a matrix containing the pairwise distances between all objects. HAC repeats the following three steps until all objects are in the same cluster:

1. Search the distance matrix for the two closest objects or clusters.
2. Join the two objects (clusters) to produce a new cluster.
3. Update the distance matrix to include the distances between the new cluster and all other clusters (objects).

There are four well-known HAC algorithms: single-link, complete-link, group-average and centroid clustering, which differ in the cluster similarity measure they employ. We decided to use single-link HAC because it is usually considered to be the simplest approach and has the smallest time complexity. Furthermore, this approach can produce much better clusterings than PCTs, and a comparison with better approaches was out of the scope of this work. Single-link HAC computes the distance between two clusters as the distance between their closest pair of objects. The HAC implementation that we use has a computational cost of O(N^2), with N the number of objects (sites), and for efficiency it uses a next-best-merge array [4]. An important drawback of single-link HAC is that it suffers from the chaining effect [4], which in some cases may result in undesirable elongated clusters. Because the merge criterion is strictly local (it only takes the two closest objects into account), a chain of points can be extended over a long distance without regard to the overall shape of the emerging cluster.

5 Experimental setup

In our experimental setup, the attributes obtained by aggregation are the target attributes, while the properties of the sites are used as descriptive attributes. To provide better experiments, comparison and analysis of results, we use several experimental setups. First, we take only selected attributes; using this selected set, we experiment with PCTs under a size constraint [5] of 6 clusters. Next, we perform experiments using all of the available attributes, constraining PCTs to 6 and 12 clusters and HAC to 6 clusters, and setting k = 6 for k-means clustering.

6 Results

Here, we present only the results from PCTs; in the Appendix we give the k-means and PCT results in detail. We would like to emphasize that HAC produced very unbalanced clusters (four clusters of size 1, one cluster of size 2 and one cluster of size 29,673), which are useless in this case. Figure 3 shows a map colored in six colors according to the tree generated by Clus, which is given in Figure 4. The expert provided excellent feedback on this clustering and its visualization.
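The three-step merge loop of Section 4.3 and the chaining effect can be illustrated with a deliberately naive single-link sketch in Python. It recomputes all cluster-to-cluster distances at every step, so it is far slower than the O(N^2) next-best-merge implementation mentioned in Section 4.3 and is only meant for toy data; the one-dimensional points are made up. It shows how single-link merging can follow a chain of points and absorb two separate groups into one cluster while an outlier stays alone, which is one way of arriving at very unbalanced partitions; it is not a claim about what happened with the vegetation data.

```python
import math


def single_link_hac(points, n_clusters):
    """Naive single-link HAC: start from singletons and repeatedly join the two
    clusters whose closest pair of points is nearest, until n_clusters remain.
    Returns clusters as lists of point indices, largest first."""
    clusters = [[i] for i in range(len(points))]

    def link(a, b):
        # Single-link distance: distance between the closest pair of objects.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Step 1: search for the two closest clusters.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = link(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        # Steps 2 and 3: join them; distances are recomputed on the next pass.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return sorted(clusters, key=len, reverse=True)


if __name__ == "__main__":
    # Toy 1-D data (made up): two tight groups joined by a chain, plus one outlier.
    xs = [0.0, 1.0, 2.0, 3.5, 5.0, 6.5, 8.0, 10.0, 11.0, 12.0, 30.0]
    points = [(x,) for x in xs]
    for k in (2, 3):
        print(k, [len(c) for c in single_link_hac(points, k)])
```

With two clusters requested, both groups and the chain between them end up in a single cluster of 10 points and the outlier is left alone (sizes [10, 1]); with three clusters, the sizes are [7, 3, 1], because the chain attaches to the left group before the gap of 2 to the right group is bridged.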
Maps for the other results have not yet been produced; this is planned as further work. The first tree, with 6 clusters, was obtained using only selected aggregated attributes, while the subsequent experiments were performed using all of the available aggregated attributes. The tree given in Figure 5 is the result of the PCT algorithm with the constraint set to 12 leaves, using the aggregation of all available attributes. A description of the clusters in terms of size and the lifelook and sprflow attributes is given in Figure 6.

We can conclude that the elements are well distributed across the clusters: no cluster is too small or too large. In terms of the lifelook attribute, the clusters are well separated: the two major lifelook types differ between clusters, and together they account for 30% or more of all elements in a cluster. Considering that there are 27 lifelook types, this percentage is far from small. On the other hand, in terms of the sprflow attribute, the clusters are impure, with a small exception in some clusters (A, B, F).

Fig. 3. Map

The k-means clustering results are presented in detail in the Appendix. For k-means clustering, we calculate four standard statistics (average, standard deviation, minimum and maximum) over the descriptive attributes in each cluster, to give some description of the clusters and later to visualize the clusters on a map.

7 Discussion and further work

This study shows how different dealing with vegetation data, and more generally with environmental data, is from classical data mining problems. In this work we focused mainly on aggregation and data preprocessing, after which we applied classical algorithms. The results obtained are very promising. We continue to work on this problem by adapting classical algorithms to use hierarchical information about species, and by mining directly over species without aggregation. In that case, we consider species as complex/structured data, and we propose developing new (or adapting classical) algorithms to handle these types of problems.
Fig. 4. Clus tree with 6 leaves

Fig. 5. Clus tree with 12 leaves

References

1. H. Blockeel, L. De Raedt, and J. Ramon. Top-down induction of clustering trees. In 15th Int'l Conf. on Machine Learning, pages 55-63.
2. L. Kaufman and P. J. Rousseeuw, editors. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons.
3. P. Legendre and L. Legendre. Numerical Ecology. Elsevier, Amsterdam.
4. C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press.
5. J. Struyf and S. Džeroski. Constraint based induction of multi-objective regression trees. In 4th Int'l Workshop on Knowledge Discovery in Inductive Databases: Revised Selected and Invited Papers, volume 3933 of LNCS. Springer.
6. D. Sun, R. J. Hnatiuk, and V. J. Neldner. Review of vegetation classification and mapping systems undertaken by major forested land management agencies in Australia. Australian Journal of Botany, 45(6), 1997.
Cluster  Size  Main lifelook types   sprflow=1
A        1425  ss 19%, mt 18%        76%
B        4740  s 15%, mtg 15%        79%
C        5668  h 36%, s 17%          65%
D        2250  s 20%, mtg 19%        53%
E        9350  t 21%, s 15%          65%
F        6226  t 20%, s 17%          75%

Fig. 6. Description of the clusters in terms of size and the lifelook and sprflow attributes

A Appendix: Extended results

For each cluster we provide the size of the cluster, four statistics on the descriptive attributes, and part of the prototype of the cluster. For each prototype we show just its most important values.
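The four per-cluster statistics (minimum, maximum, average and standard deviation of each descriptive attribute) can be computed with a short Python sketch. It assumes each site is a dict of descriptive attribute values and that cluster labels come from some clustering step; the attribute name `annual_rainfall` is hypothetical. It uses the population standard deviation, since the paper does not say which variant was used.

```python
import statistics
from collections import defaultdict


def describe_clusters(assignment, rows, attributes):
    """Return {cluster: {"size": n, "stats": {attr: (min, max, avg, stddev)}}}.

    `assignment[i]` is the cluster label of site i and `rows[i]` is a dict of
    that site's descriptive attribute values.
    """
    members = defaultdict(list)
    for cluster, row in zip(assignment, rows):
        members[cluster].append(row)
    summary = {}
    for cluster, cluster_rows in sorted(members.items()):
        stats = {}
        for attr in attributes:
            values = [r[attr] for r in cluster_rows]
            stats[attr] = (
                min(values),
                max(values),
                statistics.fmean(values),
                statistics.pstdev(values),  # population std (assumption); 0.0 for one value
            )
        summary[cluster] = {"size": len(cluster_rows), "stats": stats}
    return summary


if __name__ == "__main__":
    # Made-up sites with one hypothetical descriptive attribute.
    sites = [{"annual_rainfall": 600.0}, {"annual_rainfall": 800.0}, {"annual_rainfall": 1200.0}]
    print(describe_clusters([0, 0, 1], sites, ["annual_rainfall"]))
```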
Cluster A (size 1398): lifelook=ss 19%, leaftype=… 34%, sprflow=1 76%, sumflow=1 58%, autflow=… 64%, winflow=… 52%, hitecat=1 36%, aquatic=… 99%, fleshyf=… 91%, fleshyl=… 80%
Cluster B (size 7205): lifelook=mtg 19%, leaftype=… 57%, sprflow=1 81%, sumflow=1 73%, autflow=… 68%, winflow=… 75%, hitecat=2 31%, aquatic=… 94%, fleshyf=… 95%, fleshyl=… 97%
Cluster C (size 1461): lifelook=h 19%, leaftype=… 48%, sprflow=1 75%, sumflow=1 66%, autflow=… 58%, winflow=… 78%, hitecat=1 35%, aquatic=… 91%, fleshyf=… 90%, fleshyl=… 78%
Cluster D (size 2297): lifelook=s 20%, leaftype=… 54%, sprflow=1 53%, sumflow=1 83%, autflow=… 80%, winflow=… 91%, hitecat=2 33%, aquatic=… 99%, fleshyf=… 92%, fleshyl=… 100%
Cluster E (size …): lifelook=t 21%, leaftype=scle 39%, sprflow=1 65%, sumflow=1 66%, autflow=… 74%, winflow=… 82%, hitecat=2 21%, aquatic=… 99%, fleshyf=… 94%, fleshyl=… 100%
Cluster F (size 6318): lifelook=t 20%, leaftype=… 40%, sprflow=1 75%, sumflow=1 71%, autflow=… 70%, winflow=… 74%, hitecat=2 24%, aquatic=… 98%, fleshyf=… 95%, fleshyl=… 99%

Fig. 7. PCT results obtained using all attributes
Cluster A (size 9526): lifelook=s 28%, leaftype=scle 54%, sprflow=1 78%, sumflow=1 71%, autflow=… 70%, winflow=… 64%, hitecat=3 20%, aquatic=… 100%, fleshyf=… 92%, fleshyl=… 98%
Cluster B (size 1065): lifelook=ltg 31%, leaftype=… 64%, sprflow=1 92%, sumflow=1 87%, autflow=… 86%, winflow=… 79%, hitecat=3 54%, aquatic=… 86%, fleshyf=… 98%, fleshyl=… 99%
Cluster C (size 3853): lifelook=h 28%, leaftype=… 62%, sprflow=1 83%, sumflow=1 65%, autflow=… 69%, winflow=… 66%, hitecat=1 52%, aquatic=… 95%, fleshyf=… 95%, fleshyl=… 89%
Cluster D (size 7116): lifelook=mtg 27%, leaftype=… 69%, sprflow=1 80%, sumflow=1 87%, autflow=… 62%, winflow=… 89%, hitecat=2 44%, aquatic=… 95%, fleshyf=… 97%, fleshyl=… 98%
Cluster E (size 2530): lifelook=t 28%, leaftype=… 40%, sprflow=… 57%, sumflow=… 68%, autflow=… 92%, winflow=… 85%, hitecat=4 30%, aquatic=… 99%, fleshyf=… 89%, fleshyl=… 99%
Cluster F (size 5589): lifelook=t 17%, leaftype=… 34%, sprflow=… 52%, sumflow=1 63%, autflow=… 74%, winflow=… 89%, hitecat=2 28%, aquatic=… 99%, fleshyf=… 94%, fleshyl=… 98%

Fig. 8. k-means results obtained using all attributes and k = 6
More informationAUTOMATIC GENERALIZATION OF LAND COVER DATA
POSTER SESSIONS 377 AUTOMATIC GENERALIZATION OF LAND COVER DATA OIliJaakkola Finnish Geodetic Institute Geodeetinrinne 2 FIN-02430 Masala, Finland Abstract The study is related to the production of a European
More informationOutline. 15. Descriptive Summary, Design, and Inference. Descriptive summaries. Data mining. The centroid
Outline 15. Descriptive Summary, Design, and Inference Geographic Information Systems and Science SECOND EDITION Paul A. Longley, Michael F. Goodchild, David J. Maguire, David W. Rhind 2005 John Wiley
More informationPerformance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization
Performance Evaluation of the Matlab PCT for Parallel Implementations of Nonnegative Tensor Factorization Tabitha Samuel, Master s Candidate Dr. Michael W. Berry, Major Professor Abstract: Increasingly
More informationAN INVESTIGATION OF AUTOMATIC CHANGE DETECTION FOR TOPOGRAPHIC MAP UPDATING
AN INVESTIGATION OF AUTOMATIC CHANGE DETECTION FOR TOPOGRAPHIC MAP UPDATING Patricia Duncan 1 & Julian Smit 2 1 The Chief Directorate: National Geospatial Information, Department of Rural Development and
More informationKyoto and Carbon Initiative - the Ramsar / Wetlands International perspective
Kyoto and Carbon Initiative - the Ramsar / Wetlands International perspective (the thoughts of Max Finlayson, as interpreted by John Lowry) Broad Requirements Guideline(s) for delineating wetlands (specifically,
More informationStatistical Machine Learning
Statistical Machine Learning Christoph Lampert Spring Semester 2015/2016 // Lecture 12 1 / 36 Unsupervised Learning Dimensionality Reduction 2 / 36 Dimensionality Reduction Given: data X = {x 1,..., x
More informationAn open source approach for the intrinsic assessment of the temporal accuracy, up-todateness and lineage of OpenStreetMap
An open source approach for the intrinsic assessment of the temporal accuracy, up-todateness and lineage of OpenStreetMap Marco Minghini 1,2, Maria Antonia Brovelli 2, Francesco Frassinelli 2 1 European
More informationNatural Resource Management. Northern Tasmania. Strategy. Appendix 2
Natural Resource Management Strategy Northern Tasmania 2015 2020 Appendix 2 Appendix 2 Appendix 2. Carbon Planting Spatial Prioritisation In support of this Strategy s development and implementation, work
More informationQuality and Coverage of Data Sources
Quality and Coverage of Data Sources Objectives Selecting an appropriate source for each item of information to be stored in the GIS database is very important for GIS Data Capture. Selection of quality
More informationSTRUCTURAL BIOINFORMATICS I. Fall 2015
STRUCTURAL BIOINFORMATICS I Fall 2015 Info Course Number - Classification: Biology 5411 Class Schedule: Monday 5:30-7:50 PM, SERC Room 456 (4 th floor) Instructors: Vincenzo Carnevale - SERC, Room 704C;
More informationUnit 5.2. Ecogeographic Surveys - 1 -
Ecogeographic Surveys Unit 5.2 Ecogeographic Surveys - 1 - Objectives Ecogeographic Surveys - 2 - Outline Introduction Phase 1 - Project Design Phase 2 - Data Collection and Analysis Phase 3 - Product
More informationClustering with k-means and Gaussian mixture distributions
Clustering with k-means and Gaussian mixture distributions Machine Learning and Category Representation 2014-2015 Jakob Verbeek, ovember 21, 2014 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.14.15
More informationDiscriminative Direction for Kernel Classifiers
Discriminative Direction for Kernel Classifiers Polina Golland Artificial Intelligence Lab Massachusetts Institute of Technology Cambridge, MA 02139 polina@ai.mit.edu Abstract In many scientific and engineering
More informationClustering. Stephen Scott. CSCE 478/878 Lecture 8: Clustering. Stephen Scott. Introduction. Outline. Clustering.
1 / 19 sscott@cse.unl.edu x1 If no label information is available, can still perform unsupervised learning Looking for structural information about instance space instead of label prediction function Approaches:
More informationPrepared by: Precious Annie Lopez & Nolwenn Boucher
Recognizing the ancestral land and biodiversity conservation efforts of indigenous people in Quinchao municipality using GIS- and participatory 3Dmapping tool Prepared by: Precious Annie Lopez & Nolwenn
More informationSPECCHIO for Australia: taking spectroscopy data from the sensor to discovery for the Australian remote sensing community
University of Wollongong Research Online Faculty of Science, Medicine and Health - Papers Faculty of Science, Medicine and Health 2013 SPECCHIO for Australia: taking spectroscopy data from the sensor to
More informationAUTOMATED TEMPLATE MATCHING METHOD FOR NMIS AT THE Y-12 NATIONAL SECURITY COMPLEX
AUTOMATED TEMPLATE MATCHING METHOD FOR NMIS AT THE Y-1 NATIONAL SECURITY COMPLEX J. A. Mullens, J. K. Mattingly, L. G. Chiang, R. B. Oberer, J. T. Mihalczo ABSTRACT This paper describes a template matching
More informationPopulation Ecology and the Distribution of Organisms. Essential Knowledge Objectives 2.D.1 (a-c), 4.A.5 (c), 4.A.6 (e)
Population Ecology and the Distribution of Organisms Essential Knowledge Objectives 2.D.1 (a-c), 4.A.5 (c), 4.A.6 (e) Ecology The scientific study of the interactions between organisms and the environment
More informationSIF_7.1_v2. Indicator. Measurement. What should the measurement tell us?
Indicator 7 Area of natural and semi-natural habitat Measurement 7.1 Area of natural and semi-natural habitat What should the measurement tell us? Natural habitats are considered the land and water areas
More informationAn Empirical Study of Building Compact Ensembles
An Empirical Study of Building Compact Ensembles Huan Liu, Amit Mandvikar, and Jigar Mody Computer Science & Engineering Arizona State University Tempe, AZ 85281 {huan.liu,amitm,jigar.mody}@asu.edu Abstract.
More informationMoreton Bay and Key Geographic Concepts Worksheet
Moreton Bay and Key Geographic Concepts Worksheet The Australian Curriculum: Geography draws on seven key geographic concepts: place scale space environment change interconnection sustainability They are
More informationLAND COVER CATEGORY DEFINITION BY IMAGE INVARIANTS FOR AUTOMATED CLASSIFICATION
LAND COVER CATEGORY DEFINITION BY IMAGE INVARIANTS FOR AUTOMATED CLASSIFICATION Nguyen Dinh Duong Environmental Remote Sensing Laboratory Institute of Geography Hoang Quoc Viet Rd., Cau Giay, Hanoi, Vietnam
More informationScience Standards of Learning for Virginia Public Schools
Standards of Learning for Virginia Public Schools Reasoning, and Logic Life Processes Earth Patterns, Cycles, and Change Kindergarten Key Concepts Rainforest Desert K.1 The student will conduct investigations
More information6.047 / Computational Biology: Genomes, Networks, Evolution Fall 2008
MIT OpenCourseWare http://ocw.mit.edu 6.047 / 6.878 Computational Biology: Genomes, Networks, Evolution Fall 2008 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationCOURSES OUTSIDE THE JOURNALISM SCHOOL
COURSES OUTSIDE THE JOURNALISM SCHOOL Students are not limited to the science courses on this list and may select classes from additional specialties. In accordance with University policy, 400-level courses
More informationMining Climate Data. Michael Steinbach Vipin Kumar University of Minnesota /AHPCRC
Mining Climate Data Michael Steinbach Vipin Kumar University of Minnesota /AHPCRC Collaborators: G. Karypis, S. Shekhar (University of Minnesota/AHPCRC) V. Chadola, S. Iyer, G. Simon, P. Zhang (UM/AHPCRC)
More informationNetwork regression with predictive clustering trees
Data Min Knowl Disc (2012) 25:378 413 DOI 10.1007/s10618-012-0278-6 Network regression with predictive clustering trees Daniela Stojanova Michelangelo Ceci Annalisa Appice Sašo Džeroski Received: 17 November
More informationLearning Decision Trees
Learning Decision Trees Machine Learning Fall 2018 Some slides from Tom Mitchell, Dan Roth and others 1 Key issues in machine learning Modeling How to formulate your problem as a machine learning problem?
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Ensembles Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique Fédérale de Lausanne
More informationClustering and Gaussian Mixture Models
Clustering and Gaussian Mixture Models Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 25, 2016 Probabilistic Machine Learning (CS772A) Clustering and Gaussian Mixture Models 1 Recap
More informationOverview of Remote Sensing in Natural Resources Mapping
Overview of Remote Sensing in Natural Resources Mapping What is remote sensing? Why remote sensing? Examples of remote sensing in natural resources mapping Class goals What is Remote Sensing A remote sensing
More information3-LS1-1 From Molecules to Organisms: Structures and Processes
3-LS1-1 From Molecules to Organisms: Structures and Processes 3-LS1-1. Develop models to describe that organisms have unique and diverse life cycles but all have in common birth, growth, reproduction,
More informationThe Road to Data in Baltimore
Creating a parcel level database from high resolution imagery By Austin Troy and Weiqi Zhou University of Vermont, Rubenstein School of Natural Resources State and local planning agencies are increasingly
More informationA Comparison of Categorical Attribute Data Clustering Methods
A Comparison of Categorical Attribute Data Clustering Methods Ville Hautamäki 1, Antti Pöllänen 1, Tomi Kinnunen 1, Kong Aik Lee 2, Haizhou Li 2, and Pasi Fränti 1 1 School of Computing, University of
More informationDynamic Modeling of Land Use and Coverage at Quarta Colônia, RS, Brazil
11 16 April 2010 Dynamic Modeling of Land Use and Coverage at Quarta Colônia, RS, Brazil Rudiney Soares PEREIRA Renata FERRARI Mariane Alves DAL SANTO w w w. f i g 2 0 1 0. c o m Introduction We have a
More informationUNITED NATIONS E/CONF.96/CRP. 5
UNITED NATIONS E/CONF.96/CRP. 5 ECONOMIC AND SOCIAL COUNCIL Eighth United Nations Regional Cartographic Conference for the Americas New York, 27 June -1 July 2005 Item 5 of the provisional agenda* COUNTRY
More informationOntario Science Curriculum Grade 9 Academic
Grade 9 Academic Use this title as a reference tool. SCIENCE Reproduction describe cell division, including mitosis, as part of the cell cycle, including the roles of the nucleus, cell membrane, and organelles
More informationTo Predict Rain Fall in Desert Area of Rajasthan Using Data Mining Techniques
To Predict Rain Fall in Desert Area of Rajasthan Using Data Mining Techniques Peeyush Vyas Asst. Professor, CE/IT Department of Vadodara Institute of Engineering, Vadodara Abstract: Weather forecasting
More information