Formalizing natural-language spatial relations between linear objects with topological and metric properties


JUN XU*

Institute of Geographic Science and Natural Resources Research, Chinese Academy of Science, 11A Datun Road, Beijing, People's Republic of China

(Received 9 February 2006; in final form 5 July 2006)

People usually use qualitative terms to express spatial relations, while current geographic information systems (GISs) all use quantitative approaches to store spatial information. The abilities of current GISs to represent and query spatial information about geographic space are limited. Based on the result of a human-subject test of natural-language descriptions of spatial relations between linear geographic objects, this paper defines a series of quantitative indices that are related to natural-language spatial relation terms, and uses these indices to formalize the ambiguous natural-language representation with a decision-tree algorithm. The result indicates that using both topological indices and metric indices can formalize the natural-language spatial predicates better than using only topological indices. The rules extracted from the trees are used to characterize the spatial relations into qualitative description groups. Using these rules, a prototype of an intelligent natural-language interface for the ESRI software ArcGIS that can query spatial relations between two linear objects in natural English is implemented using SNePS (the Semantic Network Processing System).

Keywords: Spatial relations; Linear objects; Formalization; Topology; Geometric indices; Natural-language queries

1. Introduction

When using natural language to express spatial relations, people usually use qualitative descriptions, such as 'Buffalo is close to Niagara Falls' or 'NY-384 goes along Niagara river'.
Current geographic information systems (GISs), however, all use quantitative methods to store and retrieve spatial information. They can therefore query only a few spatial relations based on quantitative values, for example 'the restaurants within two miles of the park'. Current GIS models are limited in their ability to represent and retrieve spatial information about geographic space. Spatial relations are one of the most distinctive aspects of spatial or geographical information, and thus a better understanding of the cognitive aspects of spatial relations, and their formalization in computational models, is critical to the advancement of geographical information science (Mark et al. 1999).

*Corresponding author. Email: xujun@lreis.ac.cn. International Journal of Geographical Information Science, 2007, Taylor & Francis.

Qualitative spatial knowledge representation and reasoning is the central area of Naive Geography (Egenhofer and Mark 1995). It is concerned with capturing everyday common-sense knowledge of the physical world (Weld and De Kleer 1990, Cohn and Hazarika 2001). Many computational models of qualitative spatial

relations have been developed. Region Connection Calculus (RCC) is a region-based method of representing topological spatial relations (Randell et al. 1992, Cohn 1996, 1997, Cohn et al. 1997a, 1997b). An alternative approach to representing and reasoning about topological relations is point-set topological spatial relations (Egenhofer and Franzosa 1991, Egenhofer and Herring 1994). Directional spatial relations have been described by Frank (1991) and Zhan and Peuquet (1987). Hernandez (1991) represented the order and orientation of spatial relations in a two-dimensional space qualitatively. Freksa and co-workers introduced an orientation grid for representing qualitative orientation information (Freksa 1992, Freksa and Zimmermann 1992). Qualitative distance and proximity relations were studied by using fuzzy set membership (Gahegan 1995, Hernandez et al. 1995, Yao 2002). Other methods in qualitative spatial reasoning include an algebraic approach (Smith and Park 1992), partially ordered sets (Kainz et al. 1993), a computational model for characterizing spatial prepositions (Abella and Kender 1993), approaches combining different information (Bennett et al. 1997, Clementini et al. 1997), and a proximity approach for formalizing a region-based theory of space (Vakarelov et al. 2002).

The problem of 'understanding' natural language can be treated as a problem of 'translating' between natural languages and formal languages within a very limited domain (Frank and Mark 1991). To bridge the gap between natural-language terms and a computational model of spatial relations, it is necessary to understand fully the relationship between the ambiguous natural-language representations and the geometric spatial relations of geographic objects, and to formalize the qualitative natural-language terms. Mark et al. (1995) designed several human-subject protocols to explore, evaluate or refine computational models of spatial relations in natural language.
Human-subject experiments have been conducted to confirm these formal models of spatial relations. Mark and Egenhofer (1994, 1995) refined and calibrated the meaning of spatial predicates from English and Spanish concerning line-region relations through cross-linguistic human-subject testing. Based on their study, Egenhofer and Shariff (1998) and Shariff et al. (1998) developed a formal model to capture the metric and topological detail of natural-language spatial relations, and implemented the natural-language-like query in a geographic database. Topological relations were identified by the 9-intersection model. Two groups of metric details were derived: splitting ratios, which are the normalized values of lengths and areas of intersections; and closeness measures, which are the normalized distances between disjoint object parts. The resulting model of topological and metric properties was calibrated for 64 English-language terms about spatial relations between a line and a region. Recently, Nedas et al. (2006) used topological and metric models to specify the geometry of line-line spatial relations.

In this paper, the spatial relationship between two linear objects is studied. A series of topological and metric indices describing the spatial relations of two linear objects are defined. Based on the result of a human-subject test, these indices are used to formalize the natural-language terms about spatial relations of two linear objects using a decision-tree data mining algorithm. Finally, the formalized rules are applied in the ESRI software ArcGIS to fulfil natural-language queries of spatial relations between two linear geographic objects.

2. Natural-language descriptions of spatial relations

The human-subject study was conducted to find how people choose words to describe spatial relations between linear objects in different situations. It is a human-computer interactive procedure. A series of maps, with each map showing

two linear geographic objects in different geometric configurations, were displayed to the human subjects, and a sentence describing the spatial relations of the two objects was provided to the subjects. Comparing the sentence with the map, the subjects determined whether the sentence described the relation correctly, and chose the degree of their agreement from given options ranging from 'strongly agree' to 'strongly disagree'.

The syntactic frame of the sentences used to describe spatial relations consists of a Figure (the located object), a Ground (the reference object with respect to which the Figure is located), and a spatial relation predicate (verbs, prepositions, or verbs with prepositions) to indicate the nature of the relationship between the Figure and the Ground (Talmy 1983, Herskovits 1986). Here are a few examples of sentences describing the relations between two linear objects:

'I-990 connects with '
'I-25 crosses I-80.'
'Colorado River goes along I-15.'

It is impossible to test all spatial predicates in the huge variety of selections. Only ten spatial predicates were chosen arbitrarily from some commonly used terms. In order to compare semantic similarities and differences, some of the chosen terms have very close meanings. These are: 'crosses', 'is perpendicular to', 'intersects', 'goes along', 'is parallel to', 'is coincident with', 'meets', 'connects with', 'merges with', and 'flows into'. Additionally, 114 pairs of linear objects in different geometric configurations were selected to test them. The spatial predicates are used to describe the 114 situations. In total, 1048 combinations of spatial relations and predicates were tested. They were divided into three groups, and each human subject only needed to answer one group of questions so that they could finish in one hour.
The research was conducted in the Geographic Information and Analysis Laboratory at the State University of New York at Buffalo. Human subjects were recruited on campus. Most of them were undergraduate and graduate students. One hundred and eight human subjects from 31 departments participated in this study, of whom 55 were male and 53 were female. They were divided randomly into three groups to answer different groups of questions, with 36 people in each group.

The results indicated that the topological and metric properties of the linear objects affect people's agreement with the sentences. However, they have different degrees of importance for different natural-language terms (Xu 2005). For example, the terms 'crosses', 'meets' and 'merges with' are dominantly affected by topological relations, and metric measures do not have significant additional effects. However, metric details such as distance and angle have a primary effect on the terms 'goes along', 'parallel to' and 'perpendicular to', while topology has only a secondary effect.

3. Formal model and indices

Following the result of the human-subject study, the geometric indices, including topological and metric indices, were used to formalize the natural-language spatial relation terms.

3.1 9-intersection model of topological relations

In this paper, the 9-intersection model is used to model topological relations between two linear geographic objects. The 9-intersection model is a comprehensive model for binary topological spatial relations (Egenhofer and Herring 1994). It

3.2 Metric indices for line-line relations

Topology alone is not enough to characterize spatial relations in all cases. In some cases, metric details have an even greater effect on natural-language descriptions. Similar to the line-region metric indices used by Shariff et al. (1998) and the line-line indices used by Nedas et al. (2006), a series of metric indices of two linear objects are measured in this study, but different measurements are used. According to the results of the human-subject test, the angle and distance between two objects and how the objects split each other play important roles when people perceive and describe spatial relations between two linear objects. Thus the metric indices used here should include orientation, distance and splitting details. In the following definitions of metric indices, A represents the Figure object, and B represents the Ground object in the spatial relations.

Direction. The relative direction of one linear object to another one is described by angles. There are two kinds of angles. One is a local angle (LA) (figure 1a), which is the exact angle at the intersection where the two lines cross; the other is a global angle (GA) (figure 1b), which is the angle between the general orientations of the two objects. With the long axes of the bounding boxes representing the general orientations of the objects, the global angle between two linear objects is the acute angle between the two long axes of the bounding boxes. The bounding box used

here differs from the ordinary bounding box defined by the maximum and minimum coordinates of the objects. It is centered at the object's centre of mass and oriented along the object's principal inertia axes, with dimensions proportional to the object's maximum and minimum moments of inertia (Abella and Kender 1993). The ranges of LA and GA are from 0 to 90 degrees. If two lines have no intersection, LA has no value (-9999). If two lines have more than one intersection, only the biggest angle of the intersections is recorded as LA.

Splitting. Splitting determines how a line is divided by another line. There are different kinds of splitting:

Intersect splitting (IS) describes how an intersection separates the lines (figure 2a). There are two ISs, IS for the Figure object and IS for the Ground object. As the directions of the linear objects are not considered, an IS is always the ratio of the length of the shorter part of the linear object to the length of the entire object. The range of ISs is from 0 to 50%. If there is no

intersection, or the two objects intersect at more than one point, there is no value (-9999) for this index.

Alongness splitting (AS) describes how the coincident part separates the lines (figure 2b). There are two ASs, AS for the Figure object and AS for the Ground object. The ranges of ASs are from 0 to 100%. If the two objects intersect at only one point, the value is 0, and if the two objects have no intersection, there is no value (-9999) for this index.

Interior traversal splitting (ITS) describes how two intersections separate the lines (figure 2c). There are two ITSs, ITS for the Figure object and ITS for the Ground object. The ranges of ITSs are from 0 to 100%. If the two objects intersect at only one point, the value is 0; if the two objects have no intersection, there is no value (-9999) for this index; and if the two objects have more than two intersections, the farthest two intersections are counted.
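The splitting indices above can be illustrated with a minimal Python sketch. It assumes that the intersection points have already been located and are given as arc-length positions along the line, and it uses hypothetical function names. The IS formula follows the definition in the text. The ITS formula (length between the two farthest intersections divided by the total length) is an assumption made for illustration only, since the text states the value range and special cases but not the exact ratio.

```python
import math

NO_VALUE = -9999  # sentinel used in the paper when an index is undefined


def polyline_length(points):
    """Total length of a polyline given as a list of (x, y) vertices."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def intersect_splitting(points, cut_positions):
    """IS in percent: shorter part / whole length, for exactly one intersection.

    cut_positions holds the arc-length distance from the first vertex to
    each intersection point (assumed to be computed elsewhere).
    """
    if len(cut_positions) != 1:
        return NO_VALUE  # no intersection, or more than one
    total = polyline_length(points)
    cut = cut_positions[0]
    return 100.0 * min(cut, total - cut) / total


def interior_traversal_splitting(points, cut_positions):
    """ITS in percent (ASSUMED formula: span between the two farthest
    intersections along the line, divided by the whole length)."""
    if not cut_positions:
        return NO_VALUE
    if len(cut_positions) == 1:
        return 0.0
    total = polyline_length(points)
    return 100.0 * (max(cut_positions) - min(cut_positions)) / total
```

For a straight line of length 10 cut once at distance 2 from its start, `intersect_splitting` gives 20%, and the result is the same if the cut lies at distance 8, because direction is ignored.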

Figure 4. Metric indices of spatial relations: overlap ratios.

Longest-distance-length ratio (LDLR) is the ratio of the longest distance between the two objects to the length of the objects. There are two LDLRs, LDLR for the Figure object and LDLR for the Ground object:

\[
\mathrm{LDLR}_1 = \frac{\mathrm{LD}}{\mathrm{length}(A)}, \qquad \mathrm{LDLR}_2 = \frac{\mathrm{LD}}{\mathrm{length}(B)}.
\]

Shortest-longest-distance ratio (SLDR) is the ratio of the shortest distance to the longest distance:

\[
\mathrm{SLDR} = \frac{\mathrm{SD}}{\mathrm{LD}}.
\]

Overlay. The overlaid area of the two bounding boxes of the objects reflects not only the distance between the two objects, but also the shapes or curvatures of the objects. In figure 4, the overlaid part of the bounding boxes is shaded with vertical lines. The overlap ratio (OR) is the ratio of the shaded area to the area of the bounding box. There are two ORs, OR for the Figure object and OR for the Ground object:

\[
\mathrm{OR}_1 = \frac{\mathrm{area}(C)}{\mathrm{area}(A)}, \qquad \mathrm{OR}_2 = \frac{\mathrm{area}(C)}{\mathrm{area}(B)}.
\]

The splitting and distance indices in this paper are very similar to the splitting and closeness measures used recently by Nedas et al. (2006) to capture metric details of line-line relations. The alongness splitting and intersect splitting are actually the same as the alongness and interior splitting in their work. The interior traversal splitting corresponds to their exterior splitting in concept, although the formulations differ. The distance ratios in this paper and the closeness measures in their paper are all about how far away one line is from the other line, except that the distance ratios focus on the distance of the interior points of a line to the other line, while the closeness measures consider both interior points and boundaries. In addition to the distance ratios, the overlaid areas of bounding boxes are used to represent the closeness of two lines in this paper, and the angle indices between two lines are also considered because many natural-language descriptions of spatial relations contain directional meanings.
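As an illustration of the angle index, the global angle (GA) can be sketched in Python as the acute angle between the principal axes of the two objects. This is only a sketch under a simplifying assumption: each line is represented by its vertices treated as equal point masses, rather than by the full inertia-based bounding box of Abella and Kender (1993). The function names are hypothetical.

```python
import math


def principal_axis_angle(points):
    """Orientation (radians) of the principal axis of a 2-D point set.

    Vertices are treated as equal point masses (a simplifying assumption).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Direction of maximum variance of the second-moment matrix.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def global_angle(points_a, points_b):
    """Acute angle in degrees (0 to 90) between the two principal axes."""
    diff = abs(principal_axis_angle(points_a) - principal_axis_angle(points_b))
    diff = math.degrees(diff) % 180.0
    return min(diff, 180.0 - diff)
```

Folding the difference into the range 0 to 90 degrees matches the stated range of GA, so a line and its reverse produce the same value.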

4. Formalization of natural-language terms with indices

There are 15 metric indices, plus the eight intersection values of topological relations, so there are in total 23 indices for spatial relations. For a natural-language spatial relation predicate, only some of these indices are needed to characterize it, and the indices needed differ from term to term. In this paper, a decision-tree data mining algorithm is used to extract the geometric factors that determine people's choices of natural-language spatial predicates.

4.1 Decision tree

The decision-tree algorithm is one of the most widely used and practical methods for data mining. It is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree. Learned trees can also be re-represented as sets of Horn clauses to improve human readability (Mitchell 1997).

The decision-tree approach was first developed by Quinlan (1986). A tree is generated from training data in a top-down recursive divide-and-conquer manner. The initial state of a decision tree is the root node, which is assigned all the examples from the training set. First, select an attribute to place at the root node and make one branch for each possible value. This splits up the example set into subsets, one for every value of the attribute. The process is repeated recursively for each branch. If all instances at a node have the same classification, stop developing that part of the tree (Mitchell 1997, Witten and Frank 2000).

The question is how to choose the best attribute at the current node. The ID3 tree algorithm is described here. We should choose the attribute that produces the purest child nodes. We measure the purity with entropy, following the standard definitions in Mitchell (1997):

\[
\mathrm{Entropy}(S) = -\sum_{i} p_i \log_2 p_i,
\]

\[
\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v),
\]

where p_i is the proportion of S belonging to class i, Values(A) is the set of all possible values for attribute A, and S_v is the subset of S for which attribute A has value v.
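The attribute choice described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard ID3 definitions of entropy and information gain from Mitchell (1997) and nominal attributes stored in dictionaries; the function names and the toy attribute names in the usage note are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting on one nominal attribute.

    rows is a list of dicts mapping attribute names to nominal values.
    """
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s)
                    for s in subsets.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute with the largest information gain at the current node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

For example, with four toy instances in which a hypothetical attribute `R11` separates the 'yes' and 'no' answers perfectly while a second attribute does not, `best_attribute` selects `R11`, because its gain equals the full entropy of the node.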
To divide the data set, we select the attribute that yields the maximum Information Gain (Mitchell 1997, Witten and Frank 2000). The recursion stops when one of the following conditions holds: all samples at a given node belong to the same class; there are no attributes remaining for further partitioning, in which case majority voting is employed for classifying the leaf; or there is no sample at the node (Mitchell 1997).

4.2 Numeric values

The ID3 tree described above only works when all of the attributes are nominal, whereas the metric indices we will use are numeric attributes. C4.5 is a software extension of the basic ID3 algorithm designed by Quinlan (1993) to address the

issues not dealt with by ID3, such as numeric attributes, missing values, noisy data and pruning decision trees. Continuous-valued attributes can be incorporated into the learned tree by dynamically defining new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals. The C4.5 algorithm restricts the possibilities to a two-way, or binary, split. In particular, for an attribute A that is continuous-valued, the algorithm can dynamically create a new Boolean attribute A_c that is true if A < c and false otherwise. The selected threshold, c, should produce the greatest information gain. By sorting the samples according to the continuous attribute A, then identifying adjacent samples that differ in their target classification, a set of candidate thresholds can be generated midway between the corresponding values of A. The value of c that maximizes information gain always lies at such a boundary (Mitchell 1997, Witten and Frank 2000).

4.3 Extract geometric factors with J4.8 tree

The Waikato Environment for Knowledge Analysis (WEKA) software (Witten and Frank 2000) was used to extract the geometric factors that affect the natural-language description of spatial relations. The WEKA classifier package has its own version of C4.5 known as J4.8, which can classify numeric attributes. Every natural-language term studied in the human-subject test was used to describe the spatial relations of 114 pairs of linear objects, and each combination was evaluated by 36 human subjects. Thus there should be 4104 training instances for each natural-language term. Since some people chose not to answer some of the questions, the actual number of instances is less than this number.
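The binary threshold search for numeric attributes described in Section 4.2 can be sketched as follows. This is a simplified illustration and not the C4.5 or J4.8 implementation: it assumes that samples with equal attribute values do not need special handling, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def candidate_thresholds(values, labels):
    """Midpoints between adjacent sorted samples whose classes differ."""
    pairs = sorted(zip(values, labels))
    return [(a + b) / 2.0
            for (a, la), (b, lb) in zip(pairs, pairs[1:])
            if la != lb and a != b]


def best_threshold(values, labels):
    """Return (c, gain) for the Boolean test value < c with the largest gain."""
    base = entropy(labels)
    best = (None, 0.0)
    for c in candidate_thresholds(values, labels):
        below = [l for v, l in zip(values, labels) if v < c]
        above = [l for v, l in zip(values, labels) if v >= c]
        gain = base - (len(below) * entropy(below)
                       + len(above) * entropy(above)) / len(labels)
        if gain > best[1]:
            best = (c, gain)
    return best
```

Only the class boundaries in the sorted order are evaluated, which is what keeps the search cheap: for six samples with classes no, no, yes, yes, yes, no in sorted order, just two midpoints are tested instead of five.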
The human subjects were asked to choose their answer from seven choices: 'strongly agree', 'agree', 'somewhat agree', 'neutral', 'somewhat disagree', 'disagree' and 'strongly disagree'. In practical applications, we are only interested in positive answers, which mean that the spatial relation of two linear objects can be described by a certain natural-language word. The answers 'strongly agree' and 'agree' are grouped into class 'yes', and the other answers are grouped into class 'no'. So there are only two values in the target attribute.

4.4 Result and analysis

The J4.8 decision-tree algorithm performs a cross-validation using the training data if no test file is provided. If ten-fold cross-validation is used, then 9/10 of the training data is used to construct the model and 1/10 of the training data is used to test the model. This process is then repeated ten times so that every training instance is used exactly once as test data. The ten different error estimates are then averaged to yield an overall error estimate.

Except for 'is coincident with', which was misunderstood by many human subjects, and 'flows into', which was intended for another use, eight spatial predicates in the test are formalized by the J4.8 decision-tree algorithm using the survey result. Each of them was classified with topological indices, then with metric indices, then with both topological and metric indices. The classification accuracies of the three methods for each spatial predicate are compared in table 1. The classification accuracy of 'yes' is the percentage of cases that are correctly classified in all cases classed as 'yes', the classification accuracy of 'no' is the percentage of cases that are correctly classified in all cases classed as 'no',

and the overall accuracy is the percentage of cases that are correctly classified in all cases.

From table 1, we can see that for those spatial predicates on which topological relations have more effect, such as 'crosses', 'meets', 'intersects' and 'merges with', the three methods produce almost the same classification accuracies, but the overall accuracies of the methods with metric indices only and with both kinds of indices are slightly higher than those of the method with only topological indices. For spatial predicates primarily affected by metric details, such as 'is parallel to', 'is perpendicular to' and 'goes along', the methods using metric indices and both kinds of indices have almost the same classification accuracies, and they are both better than the method using only topological indices. The method using topological indices classified all 'yes' cases incorrectly. Generally, the results of using topological and metric indices together are better than those of using only topological indices. The following sections show some examples of decision trees with both kinds of indices.

Crosses. There are 4078 instances for 'crosses'. Figure 5 shows the decision tree of 'crosses'. Each line in figure 6 represents a node in the tree. A node with one or more '|' characters before the rule is a child node of the node that the right-most

line of '|' characters terminates at. The next part of the line declares the rule. A leaf node is followed by a colon and a class designation. Nodes that generate a classification are followed by a number (sometimes two) in parentheses. The first number shows how many instances in the training set reach this node. The second number, if it exists (if not, it is taken to be 0.0), represents the number of instances incorrectly classified by the node.

The stratified cross-validation result of the classification of 'crosses' shows that 3288 instances are correctly classified in the total of 4078 samples. The overall accuracy of the classification is 89.5%. The accuracies of class 'yes' and 'no' are 81.4% and 93.2%, respectively. Three rules of 'crosses' can be extracted from the tree in figure 5. They are as follows.

If R11 = 1 and AS2 < 6.35 and IS2 <= 12 and GA <= 56.9 and ITS1 <= 0, then cross = yes
If R11 = 1 and AS2 < 6.35 and IS2 <= 12 and GA > 56.9, then cross = yes
If R11 = 1 and AS2 < 6.35 and IS2 > 12, then cross = yes

The rules indicate that the topological relation plays an important role in determining whether two linear objects cross. If two linear objects cross, then they must share at least one interior point. With the same topological index of R11 = 1, the splitting ratios and their orientations decide if the two objects cross.

'is parallel to'. The decision tree of the spatial predicate 'is parallel to' is a very simple tree with two leaves (figure 6).

SLDR <= 25.41 : no (3580.0/298.0)
SLDR > 25.41 : yes (499.0/117.0)
Number of Leaves : 2
Size of the tree : 3

Figure 6. J4.8 pruned tree of 'parallel'.

It shows that when the ratio of the shortest distance to the longest distance between two lines is greater than 25.41%, the two lines are parallel to each other. Otherwise they are not parallel. However, this rule is not true when two linear objects are far away from each other. This question will be discussed later in Section 6.2.
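To show how such extracted rules could be applied in a query, the following Python sketch encodes the 'is parallel to' rule and the three 'crosses' rules as predicates over a dictionary of index values. It is an illustration only, not the ArcGIS implementation: the dictionary layout and function names are hypothetical, and the first 'crosses' rule is assumed to use the complement of the GA > 56.9 test, as the binary structure of the tree implies.

```python
def is_parallel(indices):
    """'is parallel to' rule from the two-leaf tree: SLDR > 25.41 -> yes."""
    return indices["SLDR"] > 25.41


def crosses(indices):
    """The three 'crosses' rules, which share R11 = 1 and AS2 < 6.35."""
    if indices["R11"] != 1 or not indices["AS2"] < 6.35:
        return False
    if indices["IS2"] > 12:    # third rule
        return True
    if indices["GA"] > 56.9:   # second rule (IS2 <= 12 holds here)
        return True
    # First rule; GA <= 56.9 is ASSUMED as the complement of the test above.
    return indices["ITS1"] <= 0
```

Writing the rules as nested tests rather than three separate conjunctions mirrors the path from the root of the tree to each 'yes' leaf, so each index is compared at most once.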
There are 4079 instances classified, and the overall accuracy is 89.83%. The accuracy of class 'yes' is 76.6%, and of class 'no' is 91.7%.

'is perpendicular to'. Figure 7 is the decision tree of 'is perpendicular to'. The overall accuracy of 4087 instances is 85%. The accuracy of class 'yes' is 70.4%, and of class 'no' is 89.3%. The tree in figure 7 can be explained as follows.

If GA > 53.32 and OR1 > 0 and SDLR1 <= 1.15 and GA <= 78.28 and LDLR1 <= 77.54 and ITS1 <= 18.66, then perpendicular = yes
If GA > 78.28 and OR1 > 0 and SDLR1 <= 1.15, then perpendicular = yes

From these rules, we can see that when the angle between the two general orientations of the linear objects is bigger than 78.28 degrees, the two objects are

perpendicular to each other. Otherwise, they are perpendicular if the angle is bigger than 53.32 degrees, the overlap ratio is greater than 0, the shortest-distance-length ratio is less than 1.15, the longest-distance-length ratio is less than 77.54, and the interior traversal splitting is less than 18.66. This means that if two linear objects are perpendicular to each other, then the two objects must be close to each other, and the global angle between them must be big enough.

'meets'. The decision tree of the spatial predicate 'meets' is a very simple tree with two leaves (figure 8). This means that if one linear object is close enough to another one, then it meets the other one. The overall accuracy of 4081 instances of 'meets' is 81.21%. The accuracy of class 'yes' is 74.7%, and that of class 'no' is 98.6%.

5. Application of natural-language queries in ArcGIS

Based on the rules extracted from the decision trees, a natural-language interface to query spatial relations of linear geographic objects is developed in ArcGIS. The interface can not only perform the basic functions of displaying and querying spatial data in natural spoken English, but can also query the spatial relations of two linear objects in English according to the formalized rules of spatial relation predicates.

The interface is implemented with SNePS (the Semantic Network Processing System). SNePS is a semantic-network knowledge representation system that is used for research in artificial intelligence and in cognitive science (Shapiro and Rapaport 1995). Users can interact with SNePS with a variety of interface languages, including natural language.

An interface between SNePS and ArcGIS is built to process natural-language queries of spatial relations. The handshake protocol used by Shapiro et al. (1992) in the ARC/INFO SNACTor system is used. There are two sides of the interface, the SNePS side and the ArcGIS side. The communication medium is a file system. SNePS accepts and processes the natural-language input, issues a command, generates a command file, and waits for a result to be generated. On the ArcGIS side, the program is implemented with ArcObjects in a Visual Basic Environment (VBE). ArcGIS reads the command file, executes the command, generates a result, writes the result to a result file if needed, builds the newly acquired information to the semantic network, and waits for the next command. Figure 9 illustrates the structure of the interface between SNePS and ArcGIS.

SNePS needs a grammar file and a lexicon in order to process natural-language inputs. The grammar file includes all the rules that SNePS will follow to parse the sentences. The lexicon file defines the words and phrases that will be read by SNePS. Suppose we have a lexicon containing a Yellowstone National Park scenario and all the terms for operating ArcGIS; we can then query the spatial relations of the roads and rivers in Yellowstone National Park.

The following is a sample run of the natural-language interface in SNePS. The user input in natural language is shown at ':' prompts. The system's response follows the user input, separated by a blank line. The results in ArcGIS are shown in figure 10.

: Find roads that cross Yellowstone River.

There are 1 features selected.
NE-Entrance Rd

: Find roads that are parallel to US-191.

There are 1 features selected.
N-Entrance Rd

SNePS is used because it is an intelligent system representing knowledge in a semantic network. It can build, use, and retrieve information from a propositional semantic network (Shapiro and The SNePS Implementation Group 1999), and can infer unknown knowledge from what it has been told as well. Since SNePS has


: Is N-Entrance Rd perpendicular to US-191?

No, N-Entrance Rd is not perpendicular to US-191.

6. Discussion

6.1 Comparison with the method of Shariff

Metric indices were first used by Shariff et al. (1998) as refinements of the 9-intersection model to categorize the natural-language spatial relations between a line and a region. They divided the metric parameters calculated from the human-subject test into 10 clusters. Since the same natural-language term could have several different geometric configurations, the 9-intersection topological relation of a particular configuration was used to tag the spatial term. When people query a natural-language spatial relation in a GIS database, the spatial term of the query is first used to find the relevant topological and metric parameters and their values from the Metric Table of Spatial Terms, and the values of these parameters are then built into an SQL query to select all the configurations that fulfil the query criteria.

This paper uses the 9-intersection topological model and metric indices to formalize the natural-language spatial terms of two linear objects. Unlike Shariff's method, which uses the topological relation as a tag of the spatial term, all the values of topological indices and metric indices are used together to classify the spatial terms with the decision-tree data mining algorithm, and only the indices that are found to classify the data better are finally used in the spatial query.

6.2 The limitations of the result

There are an unlimited number of possible metric configurations between two linear objects. Only 114 geometric configurations of linear geographic objects were used in the human-subject test of this study. Although as many different styles of combination as possible are included, they cannot cover all situations. The data mining result can divide these situations well, but may not work well in some other situations.
To display the objects clearly to the human subjects when zooming in to the two selected objects, all pairs of linear objects were selected within a short distance of each other on the whole map. When the decision rules extracted from these sample data are applied to a query of the whole map, some problems show up. For example, the rules of 'parallel to' include 'If SLDR>25.41, then parallel=yes', which means that two linear objects are parallel if the ratio of the shortest distance to the longest distance between them is larger than 25.41%. When two linear objects are close to each other, this rule works well. But when they are far enough apart, this condition is always satisfied no matter what the metric configuration is. Figure 11 is the query result of 'Find roads that are parallel to US-191' using this rule. All the roads in the map are selected regardless of whether they really are parallel to US-191. To resolve this problem, a constraint on the distance between two objects is currently added to the rule. In the future, a human-subject test with more complete scenarios should be conducted.

7. Conclusions and future work

This paper introduced a method of formalizing natural-language spatial relation terms with quantitative values. Metric indices were defined to describe the metric

16 392 /. Xu Figure 11. The wrong query of 'parallel to'. properties between two linear objects, and the values of the 9-intersection model were used to describe the topological properties between them. Based on the results of a human-subject test, these metric and topological indices were used to characterize the natural-language terms with the J4.8 decision-tree algorithm, so that the decision rules could be extracted to classify whether the spatial relation between two linear objects falls into the range that can be described by a certain natural-language term. It revealed that using topological and metric properties together can produce better results in formalizing natural-language terms than using only topological properties. The method built a bridge between the ambiguous natural-language representations and the geometric spatial relations of linear geographic objects, so as to define the natural-language spatial relations with quantitative indices. Based on the formalized rules, a natural-language interface for querying spatial relations was implemented in ArcGIS. The scenarios of linear objects and spatial predicates that have been studied in the human-subject test are incomplete. The incompleteness of the test leads to a usage limitation of the formalized rules, such as the problem of the rule about 'is parallel to' that has been discussed above. More human-subject studies need to be done on the other spatial relations between simple lines and spatial relations between nonsimple lines. Meanwhile, the scale issue has to be considered. The decision-tree algorithm used in this paper splits the continuous attributes into two groups at a threshold value. Although the selected threshold produces the greatest information gain, it is arbitrary to separate the fuzzy set at a specific point. In natural-language concepts, there are no distinct boundaries between 'far' and 'near', and 'big' and 'small'. The boundaries between them are fuzzy, and gradually changed. 
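The difference between a crisp split and a graded boundary can be sketched in a few lines of Python. This is only an illustration, not the paper's method and not a fuzzy decision tree: the function names, the cut-off of 78.5 degrees, and the linear ramp between 60 and 90 degrees are all hypothetical choices, and the global angle is assumed to lie between 0 and 90 degrees.

```python
def crisp_perpendicular(angle_deg, threshold=78.5):
    """Crisp split of the kind a decision tree produces: a yes/no answer
    at a single cut-off. The threshold value is a hypothetical example."""
    return angle_deg >= threshold


def fuzzy_perpendicular(angle_deg, low=60.0, high=90.0):
    """Graded membership in [0, 1]: 0 at or below `low`, 1 at or above
    `high`, and linear in between. Both bounds are illustrative
    assumptions, not values taken from the human-subject test."""
    if angle_deg <= low:
        return 0.0
    if angle_deg >= high:
        return 1.0
    return (angle_deg - low) / (high - low)


# Two nearly identical angles land in different crisp classes,
# while their graded memberships stay close to each other.
for angle in (78.0, 79.0):
    print(angle, crisp_perpendicular(angle), round(fuzzy_perpendicular(angle), 3))
```

With the crisp rule, a one-degree change flips the answer; with the graded function, the same change moves the membership value only slightly, which is closer to how such terms are used in natural language.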
If you can say that two roads having a global angle of 79 degrees are perpendicular, then you can hardly say that two roads having a global angle of 78 degrees are not perpendicular. Fuzzy decision trees do not categorize samples into different classes, but give the possibility of each sample's being a member of each class. The fuzzy rules extracted from fuzzy decision trees can reflect human cognition better. In the future, a fuzzy decision-tree algorithm will be used to formalize the natural-language spatial relation terms.

The current natural-language query application in ArcGIS uses topological and metric information, but does not consider the context. Advanced research on the natural-language interface may take the context into account, so that the system understands the situation and is able to search more effectively and correctly. For example, when we query for something that flows into Niagara River, the system would search only river layers instead of all feature layers, having deduced that Niagara River is a water body. Thus the system becomes more intelligent. This should be feasible with contextual vocabulary acquisition (CVA), a SNePS-based computational project that enables computers to work out the meaning of a word from its context (Rapaport and Kibby 2002).

References

ABELLA, A. and KENDER, J.R., 1993, Qualitative describing objects using spatial prepositions. In Proceedings of IEEE Workshop on Qualitative Vision, pp
BENNETT, B., COHN, A.G. and ISLI, A., 1997, Combining multiple representations in a spatial reasoning system. In Proceedings of the 9th IEEE International Conference on Tools with Artificial Intelligence (TAI-97), pp , Newport Beach, CA.
CLEMENTINI, E., DI FELICE, P. and HERNANDEZ, D., 1997, Qualitative representation of positional information. Artificial Intelligence, 95, pp
COHN, A.G., 1996, Calculi for qualitative spatial reasoning.
In Proceedings of the International Conference on Artificial Intelligence and Symbolic Mathematical Computation (AISMC-3), J. Calmet, J.A. Campbell and J. Pfalzgraf (Eds). Lecture Notes in Computer Science Vol (Berlin: Springer-Verlag), pp
COHN, A.G., 1997, Qualitative spatial representation and reasoning techniques. In KI-97: Advances in Artificial Intelligence, G. Brewka, C. Habel and B. Nebel (Eds) (Berlin: Springer-Verlag), pp
COHN, A.G., BENNETT, B., GOODDAY, J. and GOTTS, N.M., 1997a, Qualitative spatial representation and reasoning with region connection calculus. Geoinformatica, 1, pp
COHN, A.G., BENNETT, B., GOODDAY, J. and GOTTS, N.M., 1997b, Representing and reasoning with qualitative spatial relations about regions. In Spatial and Temporal Reasoning, O. Stock (Ed.) (Dordrecht: Kluwer Academic), pp
COHN, A.G. and HAZARIKA, S.M., 2001, Qualitative spatial representation and reasoning: an overview. Fundamenta Informaticae, 43, pp
EGENHOFER, M.J. and FRANZOSA, R.D., 1991, Point-set topological spatial relations. International Journal of Geographical Information Systems, 5, pp
EGENHOFER, M.J. and HERRING, J., 1994, Categorizing binary topological relations between regions, lines, and points in geographic databases. Technical Report, Department of Surveying Engineering, University of Maine.
EGENHOFER, M.J. and MARK, D.M., 1995, Naive geography. In Proceedings of the International Conference on Spatial Information Theory: A Theoretical Basis for GIS (COSIT'95), A.U. Frank and W. Kuhn (Eds) (Berlin: Springer-Verlag), pp
EGENHOFER, M.J. and SHARIFF, R., 1998, Metric details for natural-language spatial relations. ACM Transactions on Information Systems, 16, pp
FRANK, A.U., 1991, Qualitative spatial reasoning about cardinal directions. In Proceedings of Autocarto 10, Baltimore, MD, pp
FRANK, A.U. and MARK, D.M., 1991, Language issues for GIS. In Geographical Information Systems: Principles and Applications, D.J. Maguire, M.F. Goodchild and D.W. Rhind (Eds). Vol. 1 (London: Longmans), pp
FREKSA, C., 1992, Using orientation information for qualitative spatial reasoning. In Proceedings of the International GIS Conference From Space to Territory: Theories and Methods of Spatio-Temporal Reasoning in Geographic Space, Pisa, Italy, A.U. Frank, I. Campari and U. Formentini (Eds) (London: Springer-Verlag), pp
FREKSA, C. and ZIMMERMANN, K., 1992, On the utilization of spatial structures for cognitively plausible and efficient reasoning. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Chicago, IL, October
GAHEGAN, M., 1995, Proximity operators for qualitative spatial reasoning. In Proceedings of the International Conference on Spatial Information Theory: A Theoretical Basis for GIS (COSIT'95), Semmering, Austria, A. Frank and W. Kuhn (Eds). Lecture Notes in Computer Science Vol. 988 (Berlin: Springer-Verlag), pp
HERNANDEZ, D., 1991, Relative representation of spatial knowledge: the 2-D case. In Cognitive and Linguistic Aspects of Geographic Space, D. Mark and A. Frank (Eds) (Dordrecht: Kluwer Academic).
HERNANDEZ, D., CLEMENTINI, E. and DI FELICE, P., 1995, Qualitative distance. In Proceedings of the International Conference on Spatial Information Theory: A Theoretical Basis for GIS (COSIT'95), Semmering, Austria, A. Frank and W. Kuhn (Eds). Lecture Notes in Computer Science Vol. 988 (Berlin: Springer-Verlag), pp
HERSKOVITS, A., 1986, Language and Spatial Cognition: An Interdisciplinary Study of the Preposition in English (New York: Cambridge University Press).
KAINZ, W., EGENHOFER, M.J. and GREASLEY, I., 1993, Modeling spatial relations and operations with partially ordered sets. International Journal of Geographical Information Systems, 7, pp
MARK, D.M., COMAS, D., EGENHOFER, M.J., FREUNDSCHUH, S.M., GOULD, M.D. and NUNES, J., 1995, Evaluating and refining computational models of spatial relations through cross-linguistic human-subject testing. In Proceedings of the International Conference on Spatial Information Theory: A Theoretical Basis for GIS (COSIT'95), Semmering, Austria, A. Frank and W. Kuhn (Eds). Lecture Notes in Computer Science Vol. 988 (Berlin: Springer-Verlag), pp
MARK, D.M. and EGENHOFER, M.J., 1994, Calibrating the meanings of spatial predicates from natural language: line-region relations. In Proceedings of the 6th International Symposium on 'Spatial Data Handling' (SDH '94), September 1994, Edinburgh, UK (London: Taylor & Francis), Vol. 1, pp
MARK, D.M. and EGENHOFER, M.J., 1995, Topology of prototypical spatial relations between lines and regions in English and Spanish. In Proceedings of 'Auto Carto 12', Charlotte, North Carolina, March 1995, pp
MARK, D.M., FREKSA, C., HIRTLE, S.C., LLOYD, R. and TVERSKY, B., 1999, Cognitive models of geographical space. International Journal of Geographical Information Science, 13, pp
MITCHELL, T.M., 1997, Machine Learning (New York: McGraw-Hill).
NEDAS, K.A., EGENHOFER, M.J. and WILMSEN, D., 2006, Metric details of topological line-line relations. International Journal of Geographical Information Science (in press).
QUINLAN, J.R., 1993, C4.5: Programs for Machine Learning (San Mateo, CA: Morgan Kaufmann).
QUINLAN, J.R., 1986, Induction of decision trees. Machine Learning, 1, pp
RANDELL, D.A., CUI, Z. and COHN, A.G., 1992, A spatial logic based on regions and connection. In Proceedings of the 3rd International Conference on Knowledge Representation and Reasoning, October 1992, Cambridge, MA (San Francisco: Morgan Kaufmann), pp
RAPAPORT, W.J. and KIBBY, M.W., 2002, ROLE: Contextual vocabulary acquisition: from algorithm to curriculum. In Proceedings of the 6th World Multiconference on Systemics, Cybernetics and Informatics (SCI 2002), Orlando, FL, Vol.11: Concepts and Applications of Systemics, Cybernetics, and Informatics I (Orlando, FL: International Institute of Informatics and Systemics), pp
SHAPIRO, S.C., CHALUPSKY, H., CHOU, H. and MARK, D.M., 1992, Intelligent user interfaces: Connecting ARC/INFO and SNACTor, a semantic network based system for planning actions. In Proceedings of the Twelfth Annual ESRI User Conference, 8-12 June 1992, Palm Springs, CA (Redlands, CA: Environmental Systems Research Institute), Vol. 3, pp
SHAPIRO, S.C. and RAPAPORT, W.J., 1995, An introduction to a computational reader of narrative. In Deixis in Narrative: A Cognitive Science Perspective, J.F. Duchan, G.A. Bruder and L.E. Hewitt (Eds) (Hillsdale, NJ: Lawrence Erlbaum Associates), pp
SHAPIRO, S.C. and THE SNEPS IMPLEMENTATION GROUP, 1999, SNePS 2.5 User's Manual. Available online at:
SHARIFF, R., EGENHOFER, M.J. and MARK, D.M., 1998, Natural-language spatial relations between linear and areal objects: the topology and metric of English-language terms. International Journal of Geographical Information Science, 12, pp
SMITH, T.R. and PARK, K.K., 1992, Algebraic approach to spatial reasoning. International Journal of Geographical Information Systems, 6, pp
TALMY, L., 1983, How language structures space. In Spatial Orientation: Theory, Research and Application, H. Pick and L. Acredolo (Eds) (New York: Plenum Press).
VAKARELOV, D., DIMOV, G., DUNTSCH, I. and BENNETT, B., 2002, A proximity approach to some region-based theories of space. Journal of Applied Non-Classical Logics, 12, pp
WELD, D.S. and DE KLEER, J., 1990, Readings in Qualitative Reasoning about Physical Systems (San Mateo, CA: Morgan Kaufmann).
WITTEN, I.H. and FRANK, E., 2000, Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations (San Francisco, CA: Morgan Kaufmann).
XU, J., 2005, Natural-language representation and query of linear geographic objects in GIS. PhD dissertation, Department of Geography, State University of New York at Buffalo.
YAO, X., 2002, Qualitative georeferencing and proximity modeling for geospatial information system. PhD dissertation, Department of Geography, State University of New York at Buffalo.
ZHAN, C. and PEUQUET, D., 1987, An algorithm to determine the directional relationship between arbitrarily-shaped polygons in the plane. Pattern Recognition, 20, pp


More information

GeoVISTA Center, Department of Geography, The Pennsylvania State University, PA, USA

GeoVISTA Center, Department of Geography, The Pennsylvania State University, PA, USA Formally grounding spatio-temporal thinking Klippel, A., Wallgrün, J. O., Yang, J., Li, R., & Dylla, F. (in print, 2012). Formally grounding spatio temporal thinking. Cognitive Processing. Alexander Klippel

More information

Decision Trees Entropy, Information Gain, Gain Ratio

Decision Trees Entropy, Information Gain, Gain Ratio Changelog: 14 Oct, 30 Oct Decision Trees Entropy, Information Gain, Gain Ratio Lecture 3: Part 2 Outline Entropy Information gain Gain ratio Marina Santini Acknowledgements Slides borrowed and adapted

More information

EVALUATING RISK FACTORS OF BEING OBESE, BY USING ID3 ALGORITHM IN WEKA SOFTWARE

EVALUATING RISK FACTORS OF BEING OBESE, BY USING ID3 ALGORITHM IN WEKA SOFTWARE EVALUATING RISK FACTORS OF BEING OBESE, BY USING ID3 ALGORITHM IN WEKA SOFTWARE Msc. Daniela Qendraj (Halidini) Msc. Evgjeni Xhafaj Department of Mathematics, Faculty of Information Technology, University

More information

Decision Tree Analysis for Classification Problems. Entscheidungsunterstützungssysteme SS 18

Decision Tree Analysis for Classification Problems. Entscheidungsunterstützungssysteme SS 18 Decision Tree Analysis for Classification Problems Entscheidungsunterstützungssysteme SS 18 Supervised segmentation An intuitive way of thinking about extracting patterns from data in a supervised manner

More information

Aijun An and Nick Cercone. Department of Computer Science, University of Waterloo. methods in a context of learning classication rules.

Aijun An and Nick Cercone. Department of Computer Science, University of Waterloo. methods in a context of learning classication rules. Discretization of Continuous Attributes for Learning Classication Rules Aijun An and Nick Cercone Department of Computer Science, University of Waterloo Waterloo, Ontario N2L 3G1 Canada Abstract. We present

More information

Foundations of Classification

Foundations of Classification Foundations of Classification J. T. Yao Y. Y. Yao and Y. Zhao Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 {jtyao, yyao, yanzhao}@cs.uregina.ca Summary. Classification

More information

INVESTIGATING GEOSPARQL REQUIREMENTS FOR PARTICIPATORY URBAN PLANNING

INVESTIGATING GEOSPARQL REQUIREMENTS FOR PARTICIPATORY URBAN PLANNING INVESTIGATING GEOSPARQL REQUIREMENTS FOR PARTICIPATORY URBAN PLANNING E. Mohammadi a, *, A. J.S. Hunter a a Geomatics Department, Schulich School of Engineering, University of Calgary, Calgary, T2N 1N4,

More information

Parts 3-6 are EXAMPLES for cse634

Parts 3-6 are EXAMPLES for cse634 1 Parts 3-6 are EXAMPLES for cse634 FINAL TEST CSE 352 ARTIFICIAL INTELLIGENCE Fall 2008 There are 6 pages in this exam. Please make sure you have all of them INTRODUCTION Philosophical AI Questions Q1.

More information

Final exam of ECE 457 Applied Artificial Intelligence for the Fall term 2007.

Final exam of ECE 457 Applied Artificial Intelligence for the Fall term 2007. Fall 2007 / Page 1 Final exam of ECE 457 Applied Artificial Intelligence for the Fall term 2007. Don t panic. Be sure to write your name and student ID number on every page of the exam. The only materials

More information

Decision Trees. CS57300 Data Mining Fall Instructor: Bruno Ribeiro

Decision Trees. CS57300 Data Mining Fall Instructor: Bruno Ribeiro Decision Trees CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Classification without Models Well, partially without a model } Today: Decision Trees 2015 Bruno Ribeiro 2 3 Why Trees? } interpretable/intuitive,

More information

Classification: Decision Trees

Classification: Decision Trees Classification: Decision Trees Outline Top-Down Decision Tree Construction Choosing the Splitting Attribute Information Gain and Gain Ratio 2 DECISION TREE An internal node is a test on an attribute. A

More information

An Introduction to Geographic Information System

An Introduction to Geographic Information System An Introduction to Geographic Information System PROF. Dr. Yuji MURAYAMA Khun Kyaw Aung Hein 1 July 21,2010 GIS: A Formal Definition A system for capturing, storing, checking, Integrating, manipulating,

More information

Chapter 6: Classification

Chapter 6: Classification Chapter 6: Classification 1) Introduction Classification problem, evaluation of classifiers, prediction 2) Bayesian Classifiers Bayes classifier, naive Bayes classifier, applications 3) Linear discriminant

More information

Modern Information Retrieval

Modern Information Retrieval Modern Information Retrieval Chapter 8 Text Classification Introduction A Characterization of Text Classification Unsupervised Algorithms Supervised Algorithms Feature Selection or Dimensionality Reduction

More information

Fuzzy Systems. Introduction

Fuzzy Systems. Introduction Fuzzy Systems Introduction Prof. Dr. Rudolf Kruse Christoph Doell {kruse,doell}@iws.cs.uni-magdeburg.de Otto-von-Guericke University of Magdeburg Faculty of Computer Science Department of Knowledge Processing

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 4: Vector Data: Decision Tree Instructor: Yizhou Sun yzsun@cs.ucla.edu October 10, 2017 Methods to Learn Vector Data Set Data Sequence Data Text Data Classification Clustering

More information

Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events

Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events Pairing Transitive Closure and Reduction to Efficiently Reason about Partially Ordered Events Massimo Franceschet Angelo Montanari Dipartimento di Matematica e Informatica, Università di Udine Via delle

More information

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University

Text Mining. Dr. Yanjun Li. Associate Professor. Department of Computer and Information Sciences Fordham University Text Mining Dr. Yanjun Li Associate Professor Department of Computer and Information Sciences Fordham University Outline Introduction: Data Mining Part One: Text Mining Part Two: Preprocessing Text Data

More information

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 8, 2018

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 8, 2018 Data Mining CS57300 Purdue University Bruno Ribeiro February 8, 2018 Decision trees Why Trees? interpretable/intuitive, popular in medical applications because they mimic the way a doctor thinks model

More information

A CARTOGRAPHIC DATA MODEL FOR BETTER GEOGRAPHICAL VISUALIZATION BASED ON KNOWLEDGE

A CARTOGRAPHIC DATA MODEL FOR BETTER GEOGRAPHICAL VISUALIZATION BASED ON KNOWLEDGE A CARTOGRAPHIC DATA MODEL FOR BETTER GEOGRAPHICAL VISUALIZATION BASED ON KNOWLEDGE Yang MEI a, *, Lin LI a a School Of Resource And Environmental Science, Wuhan University,129 Luoyu Road, Wuhan 430079,

More information

An Empirical Study of Building Compact Ensembles

An Empirical Study of Building Compact Ensembles An Empirical Study of Building Compact Ensembles Huan Liu, Amit Mandvikar, and Jigar Mody Computer Science & Engineering Arizona State University Tempe, AZ 85281 {huan.liu,amitm,jigar.mody}@asu.edu Abstract.

More information

Data Mining and Decision Systems Assessed Coursework

Data Mining and Decision Systems Assessed Coursework Data Mining and Decision Systems 08338 Assessed Coursework Data Mining of Legacy Data Student Number: 201305389 Stage5: Due 2pm 14 December 2015 Report (PDF File with TurnItIn Report) Date: Sunday, 13

More information

Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees

Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees Comparison of Shannon, Renyi and Tsallis Entropy used in Decision Trees Tomasz Maszczyk and W lodzis law Duch Department of Informatics, Nicolaus Copernicus University Grudzi adzka 5, 87-100 Toruń, Poland

More information

GeoAgent-based Knowledge Acquisition, Representation, and Validation

GeoAgent-based Knowledge Acquisition, Representation, and Validation GeoAgent-based Knowledge Acquisition, Representation, and Validation Chaoqing Yu Department of Geography, the Pennsylvania State University 302 Walker Building, University Park, PA 16802 Email: cxy164@psu.edu

More information

Decision Tree Learning Lecture 2

Decision Tree Learning Lecture 2 Machine Learning Coms-4771 Decision Tree Learning Lecture 2 January 28, 2008 Two Types of Supervised Learning Problems (recap) Feature (input) space X, label (output) space Y. Unknown distribution D over

More information

Decision trees. Special Course in Computer and Information Science II. Adam Gyenge Helsinki University of Technology

Decision trees. Special Course in Computer and Information Science II. Adam Gyenge Helsinki University of Technology Decision trees Special Course in Computer and Information Science II Adam Gyenge Helsinki University of Technology 6.2.2008 Introduction Outline: Definition of decision trees ID3 Pruning methods Bibliography:

More information

Interpreting Low and High Order Rules: A Granular Computing Approach

Interpreting Low and High Order Rules: A Granular Computing Approach Interpreting Low and High Order Rules: A Granular Computing Approach Yiyu Yao, Bing Zhou and Yaohua Chen Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 E-mail:

More information

Generalization Error on Pruning Decision Trees

Generalization Error on Pruning Decision Trees Generalization Error on Pruning Decision Trees Ryan R. Rosario Computer Science 269 Fall 2010 A decision tree is a predictive model that can be used for either classification or regression [3]. Decision

More information

Deriving place graphs from spatial databases

Deriving place graphs from spatial databases 25 Deriving place graphs from spatial databases Ehsan Hamzei ehamzei@student.unimelb.edu.au Hua Hua Research School of Computer Science Australian National University hua.hua@anu.edu.au Martin Tomko tomkom@unimelb.edu.au

More information

Decision Trees / NLP Introduction

Decision Trees / NLP Introduction Decision Trees / NLP Introduction Dr. Kevin Koidl School of Computer Science and Statistic Trinity College Dublin ADAPT Research Centre The ADAPT Centre is funded under the SFI Research Centres Programme

More information

Similarity-based Classification with Dominance-based Decision Rules

Similarity-based Classification with Dominance-based Decision Rules Similarity-based Classification with Dominance-based Decision Rules Marcin Szeląg, Salvatore Greco 2,3, Roman Słowiński,4 Institute of Computing Science, Poznań University of Technology, 60-965 Poznań,

More information

CHAPTER-17. Decision Tree Induction

CHAPTER-17. Decision Tree Induction CHAPTER-17 Decision Tree Induction 17.1 Introduction 17.2 Attribute selection measure 17.3 Tree Pruning 17.4 Extracting Classification Rules from Decision Trees 17.5 Bayesian Classification 17.6 Bayes

More information

Why Is Cartographic Generalization So Hard?

Why Is Cartographic Generalization So Hard? 1 Why Is Cartographic Generalization So Hard? Andrew U. Frank Department for Geoinformation and Cartography Gusshausstrasse 27-29/E-127-1 A-1040 Vienna, Austria frank@geoinfo.tuwien.ac.at 1 Introduction

More information

Lecture 7. Logic. Section1: Statement Logic.

Lecture 7. Logic. Section1: Statement Logic. Ling 726: Mathematical Linguistics, Logic, Section : Statement Logic V. Borschev and B. Partee, October 5, 26 p. Lecture 7. Logic. Section: Statement Logic.. Statement Logic..... Goals..... Syntax of Statement

More information

Incremental Construction of Complex Aggregates: Counting over a Secondary Table

Incremental Construction of Complex Aggregates: Counting over a Secondary Table Incremental Construction of Complex Aggregates: Counting over a Secondary Table Clément Charnay 1, Nicolas Lachiche 1, and Agnès Braud 1 ICube, Université de Strasbourg, CNRS 300 Bd Sébastien Brant - CS

More information

Detailed and Integrated Representation of Spatial Relations

Detailed and Integrated Representation of Spatial Relations Detailed and Integrated Representation of Spatial Relations WU Changbin, Chen Xia College of Geographical science Nanjing Normal University, Nanjing 210023, China Abstract. Spatial relation is one of the

More information