Line generalization: least square with double tolerance

J. Jaafar
Department of Surveying Sc. & Geomatics, Faculty of Architecture, Planning & Surveying, Universiti Teknologi MARA, Shah Alam, Malaysia

Abstract

A new approach to line generalization using a least squares method with double tolerance (LS:DT) is introduced in this work. In this technique, anchor points that preserve the line caricature are first identified using the Douglas & Peucker technique. Once these anchor points are located, a least squares line is constructed through the set of points lying between each pair of anchor points, using the anchor points as guides. As a result, each least squares line begins and ends at projected anchor points. A shift tolerance is used to control the allowable projected distance for the anchor points. With the introduction of the shift tolerance, the least squares lines are adjusted to enhance the generalization effect. Since the least squares lines are not linked together, common intersection points (IP) are established. Joining the corresponding IPs and projected points then forms the generalized line. It is found that, apart from exhibiting a global approach towards generalization, LS:DT is capable of minimising length and areal (polygonal) distortion with respect to the original line and area while still preserving its caricature. Another advantage of LS:DT is its capability to perform generalization by the Douglas & Peucker technique, by least squares, or by a combination of the two, depending on the shift tolerance specified.

136 Management Information Systems

1 Introduction

Detecting building edges or lines from remotely sensed data is a task of great importance for the construction of 3D models and cartographic applications. The possibility of extracting building edges by vectorising a remotely sensed dataset is appealing. In most cases, however, vectorised (X,Y) lines extracted from remotely sensed data record far more data points than the required minimum. Reducing the number of points describing a line by eliminating unnecessary points, without jeopardising its true shape, is therefore a problem of high importance.

There are various approaches to the reduction of unnecessary points in the representation of a line [1]. Several algorithms are available to remove unwanted details, or to select or emphasise particular features. Most simplification algorithms incorporate some mechanism to control the amount of detail that is removed. For example, the simplest and most frequently used method of reduction is to retain every nth point and delete the intermediate points, where n is a numeric threshold. The disadvantage of this method is that critical points such as edge corners may be removed, while straight lines remain over-defined. Furthermore, the result of the simplification may be a poor caricature of the original line [2].

A well-known approach is that of Douglas & Peucker [3]. Their technique selects, in a recursive fashion, coordinates that fall outside a predefined bandwidth. This technique has been widely publicised and has been incorporated into various GIS software packages. Furthermore, it is capable of preserving the line or its caricature [4]. Nevertheless, for generalizing detected edges or lines from remotely sensed data, a more global approach should be investigated, in which all the vectorised points have the same merit. This leads to an approach suggested by Cromley [2], known as principal axis line simplification.
In this technique a centre line is constructed from a set of points within a specified threshold. It thus exhibits a more global approach, in which all the points are honoured in constructing the centre line. In this approach, the point established at the end of a constructed centre line automatically becomes the starting point for the next centre line. Unfortunately, this results in the choice of a fixed end node before the overall nature of the next data series is considered.

In this work, a new approach based on the least squares technique is introduced. The next section reviews the basic concept of least squares methods, and the subsequent sections discuss the methodology and results of the proposed new algorithm.

2 Least square line

From a set of N data points $(x_1, y_1), \ldots, (x_N, y_N)$, where the abscissas are distinct, it is possible to determine a linear function of the form:

$$y = f(x) = Ax + B \qquad (1)$$
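As a minimal sketch, and not code taken from this work, a line of the form in Equation 1 can be fitted to a set of points in Python using the standard closed-form least squares expressions for the gradient A and intercept B. The function names `fit_line` and `rmse` and the sample points are illustrative assumptions. The form y = Ax + B cannot represent a vertical line, so the sketch rejects input in which all abscissas coincide.

```python
import math


def fit_line(points):
    """Return (A, B) of the least squares line y = A*x + B (hypothetical helper)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = n * sxx - sx * sx
    if det == 0:
        # All x values coincide: the line is vertical and has no finite gradient.
        raise ValueError("at least two distinct abscissas are required")
    a = (n * sxy - sx * sy) / det
    b = (sy * sxx - sx * sxy) / det
    return a, b


def rmse(points, a, b):
    """Root mean square of the residuals f(x_k) - y_k for the line y = a*x + b."""
    return math.sqrt(sum((a * x + b - y) ** 2 for x, y in points) / len(points))


# Illustrative data lying exactly on y = 2x + 1.
sample = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
A, B = fit_line(sample)
error = rmse(sample, A, B)
```

For points that lie exactly on a line, as in the sample, the fitted gradient and intercept reproduce that line and the RMSE is zero; noisy points give a non-zero RMSE.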

Since in reality the data points might not lie on a straight line that satisfies Equation 1, the true value $f(x_k)$ has to satisfy Equation 2, where $e_k$ is known as the deviation or residual:

$$f(x_k) = y_k + e_k \qquad (2)$$

The Root Mean Square Error (RMSE) of the function can be expressed by the following norm:

$$\mathrm{RMSE} = \left( \frac{1}{N} \sum_{k=1}^{N} \left| f(x_k) - y_k \right|^2 \right)^{1/2} \qquad (3)$$

The least squares line defined by Equation 1 for a given set of data points minimises Equation 3. The solution for a least squares line can be obtained by solving the following normal equations:

$$\left( \sum_{k=1}^{N} x_k^2 \right) A + \left( \sum_{k=1}^{N} x_k \right) B = \sum_{k=1}^{N} x_k y_k \qquad (4)$$

$$\left( \sum_{k=1}^{N} x_k \right) A + N B = \sum_{k=1}^{N} y_k \qquad (5)$$

where A is the gradient of the least squares line, B is the intercept on the y-axis and N is the total number of data points.

2.1 Douglas & Peucker distance tolerance (First Tolerance)

The first question to be faced is the choice of anchor points. White [5] states that the anchor points selected by the Douglas & Peucker technique are almost identical to those selected by visual inspection, and thus the line's caricature is maintained (Muller [4]). Anchor points are hence first identified using the Douglas & Peucker [3] technique. In order to determine the positions of the anchor points, the distance tolerance (first tolerance) is the first input to the generalization process. Once the anchor points are determined, the points that lie between and including a pair of identified anchor points are used to determine the position of the least squares line. The anchor points are then shifted to represent the start and end points of the least squares line. Figure 1 shows the initial anchor points, the projected points and the least squares line for a given distance tolerance.

Figure 1: Anchor, projected points and the least squares line (labels: anchor point, projected point, projected distance, original line, least squares line)

In Figure 1, points A, B and C are the anchor points detected using the Douglas & Peucker technique for a specified distance tolerance. Using anchor points A and B as a guide, the first least squares line ($A'$ to $B_A$) was determined based on the set A, 1, 2, 3, 4, 5 and B. As a result, points A and B are projected to new positions ($A'$ and $B_A$) that delineate the start and end points of the least squares line. The coordinates of each new position are computed using Equations 6 and 7:

$$X_p = \frac{B_2 - B_1}{A_1 - A_2} \qquad (6)$$

$$Y_p = A_1 X_p + B_1 \qquad (7)$$

where $X_p, Y_p$ are the coordinates of the projected point on the least squares line, $A_1$ is the gradient of the least squares line, $A_2$ is the gradient of the projected line, $B_1$ is the value where the least squares line intercepts the y-axis, and $B_2$ is the value where the projected line intercepts the y-axis.

Similarly, for the next least squares line ($B_C$ to $C'$), points B and C are projected to points $B_C$ and $C'$ respectively. However, as shown in Figure 1, the two least squares lines are not joined together to form a continuous line. A simple solution to this problem is to extend the two least squares lines in search of a common intersection point (IP). The IP then replaces the two projected points ($B_A$ and $B_C$) and acts as the connection between the two least squares lines. Figure 2 shows a common IP ($B'$) located by extending the two least squares lines.

Even though this might solve the problem to a certain extent, there might be cases where the IP is shifted too far from its original (anchor) position, for example at points P, Q and R, and thus affects the line caricature, as shown in Figure 3. In order to reduce this effect and enhance the generalization capability, a shift tolerance (second tolerance) is introduced.
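The first-tolerance stage described in this section can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the program used in this work: the names `douglas_peucker_anchors`, `fit_line` and `project` are hypothetical, the sample line is invented, and the projection is taken to be the foot of the perpendicular from the anchor point to the least squares line (that is, the projected line is assumed to have gradient -1/A1).

```python
import math


def point_line_distance(p, a, b):
    """Perpendicular distance from p to the straight line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def douglas_peucker_anchors(points, tol):
    """Indices of the anchor points kept for the distance (first) tolerance."""
    keep = {0, len(points) - 1}
    stack = [(0, len(points) - 1)]
    while stack:
        first, last = stack.pop()
        best, best_d = None, tol
        for i in range(first + 1, last):
            d = point_line_distance(points[i], points[first], points[last])
            if d > best_d:
                best, best_d = i, d
        if best is not None:  # farthest point exceeds tol: keep it and split
            keep.add(best)
            stack.append((first, best))
            stack.append((best, last))
    return sorted(keep)


def fit_line(points):
    """Gradient and intercept (A, B) of the least squares line y = A*x + B."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = n * sxx - sx * sx  # zero only for a vertical run of points
    return (n * sxy - sx * sy) / det, (sy * sxx - sx * sxy) / det


def project(p, a, b):
    """Assumed projection: foot of the perpendicular from p onto y = a*x + b."""
    px, py = p
    x = (px + a * (py - b)) / (1.0 + a * a)
    return x, a * x + b


# Invented sample line: a noisy rise followed by a noisy fall.
line = [(0, 0), (1, 1.2), (2, 1.8), (3, 3.1), (4, 4),
        (5, 3.1), (6, 1.9), (7, 1.1), (8, 0)]
anchors = douglas_peucker_anchors(line, tol=0.5)

# One least squares line per pair of consecutive anchors, with projected ends.
segments = []
for i, j in zip(anchors, anchors[1:]):
    a, b = fit_line(line[i:j + 1])
    segments.append((project(line[i], a, b), project(line[j], a, b)))
```

In this sample the two projected end points around the middle anchor do not coincide, which is the gap that the common intersection point is meant to close.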

Figure 2: The intersection point (IP) between two least squares lines

Figure 3: Relationship between IP and anchor point positions (legend: least squares line, Douglas & Peucker line, intersection point, anchor point)

2.2 Shift tolerance (Second tolerance)

With the inclusion of the second tolerance (st), the IP position is controlled to a certain extent, which leads to the least squares with double tolerance technique (LS:DT). Anchor points that have a projected distance greater than the specified second tolerance retain their position. This leads to four possibilities (cases) in determining the IP position (Figure 4).

Figure 4: The relationship of Intersection Point (IP), second tolerance (st), least squares line (LS), Least Square with Double Tolerance (LS:DT) and the projected distance ($d_1$ and $d_2$), shown for Cases I to IV

Case I
Referring to Figure 4 (Case I), if st is greater than both projected distances ($d_1$ and $d_2$), the two least squares lines are extended to determine the IP position. In this case, both lines inherit the least squares properties. The generalized line (LS:DT) is then A-IP-C.

Case II
If one of the projected distances is greater than the second tolerance, for example $d_2 > st$ and $d_1 < st$, the IP position is as portrayed in Figure 4 (Case II). For this special case, the least squares line from point C is forced to pass through point B. Therefore, the line joining point A and the IP has the properties of a least squares line, but the line from the IP to C is shifted slightly.

Case III
The third case is the reverse of the second: the condition is $d_2 < st$ and $d_1 > st$. The new IP position is shown in Figure 4 (Case III).

Case IV
In this case, both of the projected distances are greater than st. Therefore, the IP retains the Douglas & Peucker anchor point position (Figure 4, Case IV). Thus, if the user specifies a zero value for the second tolerance, the result of the generalization process is the same as that of the Douglas & Peucker technique.
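The four cases can be sketched in Python as below. This is one possible reading rather than a definitive implementation, and several details are assumptions: a line is stored as a hypothetical `(gradient, intercept)` pair, the projected distance is taken as the perpendicular distance from the anchor point to each least squares line, and "forced to pass through point B" in Cases II and III is interpreted as translating the affected line, with its gradient unchanged, so that it passes through the anchor. A distance equal to st is treated like a distance greater than st, so that st = 0 always returns the anchor.

```python
import math


def project(p, line):
    """Foot of the perpendicular from p onto y = a*x + b (assumed projection)."""
    a, b = line
    px, py = p
    x = (px + a * (py - b)) / (1.0 + a * a)
    return x, a * x + b


def intersect(line1, line2):
    """Intersection of two non-parallel lines given as (gradient, intercept)."""
    (a1, b1), (a2, b2) = line1, line2
    if a1 == a2:
        raise ValueError("parallel lines have no single intersection point")
    x = (b2 - b1) / (a1 - a2)
    return x, a1 * x + b1


def through(p, gradient):
    """Line with the given gradient translated to pass through p (assumption)."""
    return gradient, p[1] - gradient * p[0]


def resolve_ip(anchor, line1, line2, st):
    """Choose the IP for one anchor shared by two least squares lines."""
    q1, q2 = project(anchor, line1), project(anchor, line2)
    d1 = math.hypot(q1[0] - anchor[0], q1[1] - anchor[1])
    d2 = math.hypot(q2[0] - anchor[0], q2[1] - anchor[1])
    if d1 < st and d2 < st:   # Case I: both lines keep least squares properties
        return intersect(line1, line2)
    if d1 < st:               # Case II: second line forced through the anchor
        return intersect(line1, through(anchor, line2[0]))
    if d2 < st:               # Case III: first line forced through the anchor
        return intersect(through(anchor, line1[0]), line2)
    return anchor             # Case IV: Douglas & Peucker anchor retained


# Invented example: anchor B and the two least squares lines that meet near it.
B = (4.0, 4.0)
ls_ab = (1.0, 0.2)    # y = x + 0.2, projected distance of B is about 0.14
ls_bc = (-1.0, 8.4)   # y = -x + 8.4, projected distance of B is about 0.28

ip_case1 = resolve_ip(B, ls_ab, ls_bc, st=0.5)
ip_case2 = resolve_ip(B, ls_ab, ls_bc, st=0.2)
ip_case4 = resolve_ip(B, ls_ab, ls_bc, st=0.0)
```

Raising st moves the result from the anchor (Case IV) towards the pure least squares intersection (Case I), which matches the behaviour described for small and large second tolerances.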

3 Outcomes

Figure 5 shows the effect of st on the generalization process. Comparing Figures 3 and 5, it can be seen that, by introducing st, the IP position (P, Q and R) can be controlled to a certain extent. Flexibility in the generalization process can now be achieved, especially in the determination of correct building outlines (by visual inspection for acceptable edge intersection). Figure 6 shows the overall procedure for the least squares with double tolerance (LS:DT) technique.

Figure 5: Effect of second tolerance on IP position (legend: intersection point (IP), anchor point)

An interactive C++ program with graphical output was written to accomplish the task. In this study, the Douglas & Peucker technique is used as the basis for comparison with the generalization result achieved by LS:DT. The dataset consists of a vectorised building outline (Karlsruhe Castle, Germany) derived by a subtraction technique using a Digital Elevation Model (DEM) and a LiDAR Digital Surface Model (DSM) [6]. The comparison of the proposed generalization technique (LS:DT) [6] with the Douglas & Peucker method is based on the following criteria:

• Percentage distortion in total length,
• Percentage distortion in area (polygons), and
• Visual inspection of the generalized line.

The first criterion for the evaluation process examines the difference in length of the generalized line with respect to its original length. The second criterion examines the capability of the generalization technique to preserve an enclosed area. The capability to minimise areal distortion would be a benefit to various cartographic and GIS applications such as the construction of 3D models. In addition to the quantitative assessment outlined above, a qualitative assessment is also carried out. Generalization is a basic human activity involving intellectual functions, so part of the evaluation has to be carried out graphically, and the outcome is hard to describe verbally.
Consequently, the third criterion for assessment involves graphical output. No comment is made on the graphical presentation of the Douglas & Peucker approach, since it is a well-known technique for preserving a line caricature [4]. However, the LS:DT approach is judged on its capability to position the IP based on the st defined.

Figure 6: Overall procedure for the LS:DT technique (flowchart steps: input; identify anchor points using the Douglas & Peucker method; construct least squares lines using the anchor points as guides; if the second tolerance is 0, generalise using Douglas & Peucker; join IP and projected points to form the generalised line, LS:DT)

4 Results and discussion

Figure 7 shows the generalized building outline produced by the Douglas & Peucker and LS:DT algorithms for various first and second tolerance (ft and st) values. Visually, the generalized building outlines produced by the two techniques are almost identical. The quantitative assessment in Table 1, however, reveals differences between the two approaches. The LS:DT method shows a smaller areal and length distortion than the Douglas & Peucker technique. Apart from better preserving the enclosed area, the percentage differences in length (Criterion 1) are also much smaller than those generated by the Douglas & Peucker technique at various tolerances. Even though the percentage differences are not pronounced for the dataset used, it should be noted that the effect is directly proportional to the dimension of the generalized object. In other words, the effect will be larger on buildings with bigger dimensions.

From Figure 7 it is apparent that, by altering the second tolerance (ft = 8, st = 0.60), the IP position may be displaced distinctly away from the Douglas & Peucker anchor points. Experimenting with various second tolerances to enhance the degree of generalization should be carried out interactively. With the introduction of the second tolerance, the generalization process is more flexible.
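The two quantitative criteria can be illustrated with a short Python sketch. The exact formulas used for Table 1 are not stated in the text, so the definition below, 100 × |generalized − original| / original, is an assumption, as are the function names and the small test polygon. Area is computed with the standard shoelace formula for a simple closed polygon.

```python
import math


def path_length(points, closed=False):
    """Total length of a polyline; closed=True adds the closing edge."""
    pairs = list(zip(points, points[1:]))
    if closed:
        pairs.append((points[-1], points[0]))
    return sum(math.hypot(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in pairs)


def polygon_area(points):
    """Area of a simple polygon by the shoelace formula (vertices in order)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def percent_distortion(original, generalized):
    """Assumed definition: absolute change as a percentage of the original."""
    return 100.0 * abs(generalized - original) / original


# Invented outline: a 10 x 10 square with one slightly displaced mid-edge vertex,
# and a generalized version with that vertex removed.
original = [(0, 0), (5, 0.2), (10, 0), (10, 10), (0, 10)]
generalized = [(0, 0), (10, 0), (10, 10), (0, 10)]

length_pct = percent_distortion(path_length(original, closed=True),
                                path_length(generalized, closed=True))
area_pct = percent_distortion(polygon_area(original),
                              polygon_area(generalized))
```

For this invented outline the area changes from 99 to 100 square units, so the areal distortion is about 1%, while the perimeter changes by far less.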

Figure 7: Generalization of vectorised LiDAR dataset (Karlsruhe Castle, Germany) at various first and second tolerances (ft and st)

Table 1: Percentage differences between original data and generalization method

5 Conclusions

An approach to line generalization using the LS:DT algorithm, which has the unique capability to preserve area and length, is discussed. It is shown that the approach is capable of performing a flexible generalization by specifying appropriate first and second tolerance values. The major advantage of the proposed method (LS:DT) is that the generalization can be carried out interactively, by specifying the first and second tolerances in a search for the best representation of detected building edges.

For an overall summary, the following points are noted:

• By specifying an appropriate value of st, the LS:DT algorithm can enhance the Douglas & Peucker procedure while still preserving the line caricature,
• The LS:DT technique shows reduced area and length distortion compared to the Douglas & Peucker technique,
• The LS:DT algorithm performs the Douglas & Peucker generalization when a zero value is specified for st, and
• LS:DT constructs a full least squares solution when a large value is specified for st; this might benefit various applications such as the creation of 3D models from remotely sensed data.

Acknowledgement

LiDAR data used courtesy of TopoSys GmbH, Germany. Research and computing facilities were made available by the Department of Surveying Sc. & Geomatics, Universiti Teknologi MARA, Malaysia.

References

[1] Buttenfield, B. P. and McMaster, R. B., 1991, Map Generalisation: Making Rules for Knowledge Representation, (UK: Longman Group Limited).
[2] Cromley, R. G., 1992, Principal axis line simplification. Computers and Geosciences, Vol. 18(8), pp. 1003-1011.
[3] Douglas, D. H. and Peucker, T. K., 1973, Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. The Canadian Cartographer, Vol. 10(2), pp. 110-122.
[4] Muller, J. C., 1987, Fractal and automated line generalization. The Cartographic Journal, Vol. 24, pp. 27-34.
[5] White, E. R., 1985, Assessment of line-generalization algorithms using characteristic points. The American Cartographer, Vol. 12(1), pp. 17-27.
[6] Jaafar, J., 2000, An evaluation of the generation and potential applications of digital surface models, Unpublished PhD thesis, (The University of Nottingham, Nottingham, UK).