
Texas A&M University
CVEN 658 Civil Engineering Applications of GIS

Hotspot Analysis of Highway Accident Spatial Pattern Based on Network Spatial Weights

Instructor: Dr. Francisco Olivera
Author: Zachry Department of Civil Engineering
December 06, 2010

ABSTRACT

Traffic accidents are increasingly recognized as a social and public health challenge due to the increased mobility of today's society. Factors that influence traffic accidents include equipment failure, roadway design, poor roadway maintenance and driver behavior. From empirical experience, we know that spatial patterns exist in traffic incidents. Some places are more likely to have accidents than others, because of poor roadway design or because more aggressive drivers are present in that area. Identifying the areas where traffic incidents frequently happen helps road managers allocate resources to those areas, either to improve the roadway conditions or to develop strategies that discourage aggressive driving. In this paper, I apply spatial statistics techniques in ArcGIS to study the spatial relationships among highway incidents in Houston. The study involves three major steps: constructing a network dataset, generating a spatial weights matrix and conducting the statistical analysis. The Generate Network Spatial Weights function is used to obtain spatial weights of incidents that are based on the road network rather than on straight-line distances. The resulting spatial weights matrix is then used in Hot Spot Analysis (Getis-Ord Gi*) to obtain the final results.

INTRODUCTION

Due to the increased mobility of today's society, road traffic accident analysis and prevention are increasingly recognized as an important topic. In the United States in 2009, there were 30,797 fatal crashes, and 33,808 people died in these accidents. That is approximately 4 deaths every hour. A study released by the World Health Organization shows that an estimated 50,000,000 people are injured and 1,200,000 people are killed in road crashes each year worldwide. Accidents are projected to increase by an estimated 65% over the next 20 years unless new preventive actions are taken.
However, highway accidents are distributed over all sections of the road, and inspecting every single location where an accident happens is impractical. Identifying the areas (or road sections) where traffic accidents frequently happen helps road managers allocate resources to those areas, either to improve the roadway conditions or to develop strategies that avoid road accidents or reduce losses. Factors that contribute to road accident occurrence include traffic volume, roadway design, weather, the configuration of the highway network and highway maintenance, and these factors all exhibit strong spatial patterns (Xie and Yan 2008). Thus, investigating the spatial patterns of traffic accidents is a crucial step in understanding how, where and when traffic accidents happen.

To identify the areas where traffic accidents frequently happen, I introduce cluster analysis. A cluster occurs when features in an area are found to have similarly high or similarly low values. Identifying the locations of accident clusters can help to identify the causes of the accidents. Comparing these locations with locations where no cluster occurs can also reveal the causes that lead to accidents. GIS technology, a valuable tool that combines spatial information with other data, has been widely used in road accident analysis to visualize accident data and analyze hotspots on highways. Here, an accident hotspot is a cluster of individual accidents. A GIS can hold a large amount of data that can be easily stored, shared, analyzed and managed (Erdogan et al. 2007).

Most existing studies consider only geometric distance and do not take the road network into consideration. However, accidents are network based, and it is important to take the road network into account when measuring the distance between two accident locations. In this study, distances between accidents are defined based on the configuration of the road network, so that the spatial relationships between accident data are defined on the highway network. To do this, a network dataset is created as the basis for the accident analysis. The Generate Network Spatial Weights function is then used to calculate the weights matrix for the accident data. Finally, Hot Spot Analysis (Getis-Ord Gi*) is used to obtain the spatial relationships among the traffic accident data.

LITERATURE REVIEW

GIS-based accident information systems provide a platform for spatial analysis of accident data, which is almost impossible with a non-spatial database.
Since 1990, GIS technologies and their applications to traffic safety and accident analysis have gained popularity among agencies and researchers. Erdogan et al. (2007) summarized existing methods used in traffic accident analysis, which include intersection or segment analysis, proximity analysis, spatial query analysis, cluster analysis and density analysis. They also applied two statistical methods, kernel density analysis and repeatability analysis, to analyze the accidents and determine the accident hot spots. The study results showed that crossroads and junction points are places where accidents frequently happen. Erdogan (2009) studied the inter-province differences in traffic accidents and mortality. He used GIS to extract features that can influence accidents, such as the day, temperature, humidity, weather conditions and month in which the accidents occurred. The CFS method was applied to select the important features that influence traffic accidents, and SVM and ANN were used to classify the traffic accident dataset. The study results show that the proposed model predicts traffic accidents better than the SVM or ANN models alone. Anderson (2009) used Geographical Information Systems (GIS) and Kernel Density Estimation to study the spatial relationships among injury-related accident data, and then used a K-means clustering algorithm to identify the accident hot spots. Based on collision and accident attribute data in London, UK, five groups and 15 clusters were created.

There is no universally accepted definition of an accident hotspot. Hauer (1997) describes two methods that are widely used to rank accident locations: one is based on accident rate, the other on accident frequency. Road accident hotspot analysis usually focuses on road segments or junctions; area-based road accident analysis is seldom used in existing studies. A comprehensive understanding of the factors that contribute to accidents, for example the severity of the accident and the surrounding environment, is important in hotspot analysis. Because the GIS platform can link a large number of disparate databases, it allows both historical and statistical analysis of traffic accidents. The most commonly used tool in traffic accident analysis is the spatial analysis extension, which provides various ways to conduct accident analysis.
METHODOLOGY

The purpose of studying the distribution of traffic accidents is to find clusters of accidents that share the same feature, such as the clearance time, the number of people injured or the number of deaths. In this study, the clearance time is used as the attribute feature. An accident with a long clearance time is defined as a serious traffic accident, and an accident with a short clearance time is defined as a minor traffic accident. The basic idea of the network-based hotspot analysis is first to calculate the network spatial weights between every pair of accidents and then to use the Hot Spot Analysis (Getis-Ord Gi*) function in ArcGIS to find the locations where traffic accidents with long clearance times happen. Three steps are involved:

Data Preparation

Data used for the network-based accident hotspot analysis include accident data and road network data. The accident data include the longitude and latitude of the accident locations, the roadway name, the cross street name and the clearance time. The road network data should include the line features of the road network, the length of each road section, the longitude and latitude of the road and the turn features.

Network Dataset

Before generating network spatial weights, a network dataset is needed to represent the distances among accident locations. To create a network dataset, we first need to enable the Network Analyst extension in ArcCatalog. In ArcCatalog, go to the directory where the road network shapefile is located and choose New Network Dataset to start defining the attributes of the network dataset. In the following steps, we need to define the name of the network dataset, the network connectivity, the elevation field settings, the turn information and the driving directions. After all the settings are defined, click Yes to build the network. Then close ArcCatalog. The created network dataset is a model of the transportation network and offers functions that can model impedances, restrictions and hierarchy for the network. A shapefile-based network dataset consists of three files: the line shapefile that represents the roadways, a junction shapefile that represents the points where two roadways intersect, and the network dataset itself.

Generate Network Spatial Weights

Unlike traditional statistical methods, spatial statistics take space and spatial relationships into consideration. Network spatial weights are a conceptualization of the spatial relationship between any two points and are very important in hotspot analysis. Different definitions of the weights will lead to different results.
Euclidean distance, contiguity, and fixed or inverse distance are the most commonly used weighting schemes. Because spatial relationships among traffic accident data are closely related to the road network, defining the spatial relationships in terms of the real road network will be more accurate. In this study, weights among accidents are calculated based on the road network. The recently developed Generate Network Spatial Weights tool in ArcGIS provides this function. Figure 1 illustrates the different conceptualizations of spatial relationships.

Fig. 1. Most commonly used spatial weights (panels: Inverse Distance, Distance Band, Zone of Indifference, Network Spatial Weights)

Inverse distance assumes that correlation exists among all features and that the correlation becomes smaller as the distance between features grows larger. A fixed distance band lets one specify a distance within which features are closely related and beyond which they are uncorrelated. Thus the weight within that distance is a fixed number and drops immediately to zero beyond it. The zone of indifference combines the inverse distance and distance band methods: the weight within a given distance is a fixed number, and beyond that distance it gradually goes to zero. The network spatial weights differ from the previous three methods in that they define the weights among objects based on a network dataset. Since traffic accidents are network based, it is more appropriate to define the distance among accident points by using network spatial weights.

Hot Spot Analysis (Getis-Ord Gi*)

After generating the network spatial weights, the next step is to conduct the traffic accident hotspot analysis. The Hot Spot Analysis tool in ArcGIS applies the Getis-Ord Gi* statistic and calculates a z-value that indicates, at each location, whether features with high or low values are clustered together. In this study, the duration of the accident is used as the criterion to identify where accidents with longer duration are clustered together and where accidents with shorter duration are clustered together. The statistical definition of Getis-Ord Gi* is as follows:

$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j}\,x_j - \bar{X}\sum_{j=1}^{n} w_{i,j}}{S\,\sqrt{\dfrac{n\sum_{j=1}^{n} w_{i,j}^{2} - \left(\sum_{j=1}^{n} w_{i,j}\right)^{2}}{n-1}}}$$

$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^{2}}{n} - \bar{X}^{2}}$$

Where
$x_j$: the attribute value for feature $j$.
$n$: sample size.
$w_{i,j}$: spatial weight between features $i$ and $j$.

The outcome of the Gi* statistic is a z-value for each feature. A higher z-value indicates a cluster of accidents that last for a longer period, while a lower z-value indicates that a large number of accidents with shorter duration are located around this area. The hot spot analysis begins with a null hypothesis that no spatial pattern exists among the studied features. In this study, the null hypothesis is that no spatial correlation exists among traffic accidents. If the null hypothesis is true, the z-scores should follow a standard normal distribution. The z-score is used as the criterion to decide whether or not the null hypothesis should be rejected, while the p-value gives the probability of obtaining a pattern at least this extreme if the null hypothesis were true, that is, the risk of rejecting it falsely.
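The Gi* statistic can be sketched in a few lines of plain Python. This is a minimal illustration of the commonly published form of the statistic, not the ArcGIS implementation; the function name `getis_ord_gi_star`, the five clearance times and the binary neighbor weights are all hypothetical.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return one Gi* z-score per feature.

    values:  list of attribute values x_j (for example clearance times)
    weights: n x n list of lists, weights[i][j] = w_ij (w_ii is included)
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(x * x for x in values) / n - mean * mean)
    z_scores = []
    for i in range(n):
        row = weights[i]
        w_sum = sum(row)
        w_sq_sum = sum(w * w for w in row)
        numerator = sum(w * x for w, x in zip(row, values)) - mean * w_sum
        # Guard against tiny negative values caused by rounding.
        spread = max(0.0, (n * w_sq_sum - w_sum ** 2) / (n - 1))
        denominator = std * math.sqrt(spread)
        # The statistic is undefined when a feature is tied to every feature equally.
        z_scores.append(numerator / denominator if denominator > 0 else float("nan"))
    return z_scores

# Hypothetical data: five accidents along one road, clearance time in minutes.
clearance = [60.0, 55.0, 30.0, 10.0, 5.0]
# Assumed weights: 1 for the feature itself and its immediate neighbors, else 0.
w = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(5)] for i in range(5)]
z = getis_ord_gi_star(clearance, w)
```

With these assumed inputs the first accident, surrounded by long clearance times, gets a positive z-score and the last one a negative z-score.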

Fig. 2. Normal distribution, the p-values and z-scores

At the tails of the normal distribution, z-values are either very high or very low and the p-values are relatively small. This means that the null hypothesis is unlikely to be true in such cases, which means a spatial pattern exists. The outcome of the hotspot analysis is a z-score and a p-value for each accident. Thus, if most accidents in an area have high z-scores and low p-values, it is very likely that the area is accident prone and that actions are needed to prevent accidents there or to reduce their impact.

APPLICATION

I chose Houston highway accident data to conduct the accident hotspot analysis. The Getis-Ord Gi* statistic is used to obtain the p-values and z-scores. Network spatial weights and Euclidean distance are used as two different methods to calculate the spatial distance between traffic accidents. Before conducting the hotspot analysis, one first needs to construct the network dataset, which provides the basic structure for calculating the network spatial weights. The Generate Network Spatial Weights function is then used to obtain the spatial weights. The last step is the Getis-Ord Gi* analysis of accident hotspots.

Data Description

The data used in this study are Houston highway accident data and the Houston highway network shapefile. Accident data can be obtained from police reports and should include basic accident attributes, for example the geographic coordinates, the accident duration and the corresponding street information. Figure 3 shows the basic information for the accident data, which includes the latitude and longitude of the accident location, the incident ID, the incident duration and so on. The accident data are in an Excel file, so we first need to add the Excel data through the Add Data dialog box. To display the accident locations on the map, one needs to use the Make XY Event Layer tool to create a point feature layer that can be saved as a shapefile.

Fig. 3. The basic information for accident data

Road network data should contain the basic information needed to create a network dataset. The Houston highway network data were obtained from the Houston-Galveston Area Council website and contain the basic information required for creating the network dataset.

Generate Network Dataset

The Houston highway network I obtained is a simple line feature file, which contains one network impedance value: distance. To create a network dataset, one needs to start ArcCatalog, enable the Network Analyst extension and then create the network in ArcCatalog by choosing the New Network Dataset option, as shown in Figure 4. I named the new network dataset hgac_majthrfare_nd, used global turns and chose the length of the road as the cost. The summary of the newly created network is shown in Figure 5. If everything is right, choose Finish to generate the network dataset.

Fig. 4. New Network Dataset function

Fig. 5. The summary of the new network dataset

After the network dataset has been created successfully, three files exist: two shapefiles and one shapefile-based network dataset. Figure 6 shows the three files of the network dataset. The hgac_majthrfare_nd file contains the basic network dataset information, and the network analysis functions operate on this shapefile-based network dataset. The hgac_majthrfare_nd_junctions shapefile represents the intersections of the road network.

Fig. 6. Shapefiles of network dataset

After creating the network dataset in ArcCatalog, one can open the newly created feature in ArcMap. The distance between two points will be calculated along the network instead of as a straight-line distance. Figure 7 shows the travel distance between point 1 and point 2, which is longer than the straight-line distance. If point 1 and point 2 are two accident locations, it is more reasonable to use this distance to represent their spatial relationship, since road accidents are closely related to the road network.
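The difference between straight-line and network distance can be illustrated with a small shortest-path computation. The sketch below uses Dijkstra's algorithm on a hypothetical L-shaped road with three nodes; the graph, the coordinates and the function name `network_distance` are assumptions made for illustration and do not reproduce how ArcGIS solves the network.

```python
import heapq
import math

def network_distance(graph, source, target):
    """Shortest travel distance along the network (Dijkstra's algorithm).

    graph: dict mapping node -> list of (neighbor, edge_length)
    Returns math.inf when the target cannot be reached.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, length in graph.get(node, []):
            new_dist = dist + length
            if new_dist < best.get(neighbor, math.inf):
                best[neighbor] = new_dist
                heapq.heappush(queue, (new_dist, neighbor))
    return math.inf

# Hypothetical L-shaped road: A to B is 3 miles, B to C is 4 miles.
coords = {"A": (0.0, 0.0), "B": (3.0, 0.0), "C": (3.0, 4.0)}
graph = {
    "A": [("B", 3.0)],
    "B": [("A", 3.0), ("C", 4.0)],
    "C": [("B", 4.0)],
}
straight = math.hypot(coords["C"][0] - coords["A"][0],
                      coords["C"][1] - coords["A"][1])
along_road = network_distance(graph, "A", "C")
```

In this toy case the straight-line distance from A to C is 5 miles, while the distance along the road is 7 miles.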

Fig. 7. Network distance between two points

Generate Network Spatial Weights

After generating the road network dataset, we can calculate the network spatial weights. To generate network spatial weights, a point feature class is needed to represent both the feature origins and the feature destinations. In our case, the accident locations are used as both. The Generate Network Spatial Weights function first locates the accidents on the highway network and then uses the travel distance to calculate the weight between each accident location and every other accident location. Figure 8 shows the process of creating the network spatial weights among the accident data. The accident data and the Houston highway network data are first displayed on the map, and then the Generate Network Spatial Weights tool is opened. The input feature class is the accident shapefile, the input network is the Houston highway network and the impedance attribute is miles in this study.

Fig. 8. The generate network spatial weights function

The output of the Generate Network Spatial Weights function is a spatial weights matrix file, which contains the spatial relationships among all objects. Figure 9 shows the spatial weights matrix file in table format: FieldID is the ID of the FROM feature, NID is the ID of the TO feature and WEIGHT represents the spatial relationship between the FROM feature and the TO feature. This file will be used to represent the spatial relationships among accident points in the Hot Spot Analysis (Getis-Ord Gi*).
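To make the FROM, TO and WEIGHT layout concrete, the sketch below builds such rows from pairwise network distances using inverse-distance weighting. It is only an illustration under stated assumptions: the function `inverse_distance_weights`, the `cutoff` parameter and the distances in miles are hypothetical, and the actual ArcGIS tool may apply other options (such as row standardization) that are not modeled here.

```python
def inverse_distance_weights(distances, cutoff=None):
    """Build (from_id, to_id, weight) rows from pairwise network distances.

    distances: dict {(from_id, to_id): network distance}
    cutoff:    optional maximum distance; farther pairs get no row
    """
    rows = []
    for (from_id, to_id), d in sorted(distances.items()):
        if from_id == to_id or d <= 0:
            continue  # skip self-pairs and coincident points
        if cutoff is not None and d > cutoff:
            continue  # treat distant pairs as unrelated
        rows.append((from_id, to_id, 1.0 / d))
    return rows

# Hypothetical network distances (miles) among three accidents.
distances = {
    (1, 2): 2.0, (2, 1): 2.0,
    (1, 3): 8.0, (3, 1): 8.0,
    (2, 3): 4.0, (3, 2): 4.0,
}
rows = inverse_distance_weights(distances, cutoff=5.0)
```

With a cutoff of 5 miles, accidents 1 and 3 are treated as unrelated, so only the pairs (1, 2) and (2, 3) appear in the table, in both directions.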

Fig. 9. Network based spatial weights matrix

Hot Spot Analysis (Getis-Ord Gi*)

Several functions in ArcGIS can conduct accident hotspot analysis. One of them is the Hot Spot Analysis (Getis-Ord Gi*) function. This function calculates the Getis-Ord Gi* statistic for each accident to tell us where accidents with long clearance times are clustered together and where accidents with short clearance times are clustered together. In this study, I use two methods to compute the Getis-Ord Gi* statistic of the accident data with different spatial weights: one is the most commonly used Euclidean distance, the other is the network spatial weights. Figure 10 shows the Hot Spot Analysis (Getis-Ord Gi*) function in ArcGIS. The input feature class is the accident data; different choices for the conceptualization of spatial relationships will lead to different results. Available options are inverse distance, inverse distance squared, fixed distance band, zone of indifference, and get spatial weights from file; a distance band or threshold distance can also be specified.

Fig. 10. Hot spot analysis (Getis-Ord Gi*) function

I first chose inverse distance as the conceptualization of spatial relationships and Euclidean distance as the distance method, so that the relationships among accidents are calculated based on the inverse Euclidean distance. That is, nearby accidents have a closer relationship than accidents located far away. The results of the hotspot analysis of the accident data are shown in Figure 11. The blue points indicate where accidents with shorter clearance times are clustered together, while the red points indicate where accidents with longer clearance times are clustered together. The Euclidean distance based hotspot analysis can identify the areas where accidents with long clearance times cluster and where accidents with short clearance times cluster.

Fig. 11. Euclidean distance based hot spot analysis

I then conducted the network-based hotspot analysis by choosing the get spatial weights from file option and using the created network spatial weights (swm) file to define the spatial relationships of the accident data. The distance between any two accidents is calculated along the network. Figure 12 shows the results of the hotspot analysis: the red points indicate the locations where accidents with longer clearance times are clustered together, and the blue points indicate the locations where accidents with shorter clearance times are clustered together. The network-based hotspot analysis is able to identify the road links where accidents frequently happen.
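The red and blue classes on such maps correspond to significantly high and significantly low z-scores. The sketch below converts a z-score to a two-tailed p-value under the standard normal distribution and labels the feature accordingly. The 0.05 significance level is an assumption, since the text does not state which threshold was used, and the function names are hypothetical.

```python
import math

def two_tailed_p(z):
    """Two-tailed p-value of a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def classify(z, alpha=0.05):
    """Label a feature from its Gi* z-score (alpha is an assumed threshold)."""
    if two_tailed_p(z) >= alpha:
        return "not significant"
    return "hot spot" if z > 0 else "cold spot"

# Hypothetical z-scores for three accidents.
labels = [classify(z) for z in (2.5, -2.5, 0.5)]
```

Here a hot spot would be drawn in red (long clearance times clustered) and a cold spot in blue (short clearance times clustered).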

Fig. 12. Network based hot spot analysis

COMPARISON WITH OTHER METHODS

Two other methods that can be used to study the spatial distribution of the accident data are the central feature method and the point density method. The Central Feature tool identifies the most centrally located feature in the accident data. Figure 13 shows how the Central Feature function works. The input features are the highway accidents; I chose Euclidean distance to calculate the distance between each pair of features, and the roadway is used to group the features. The output of the method is a point feature located at the center of the studied objects. In this study, it is the center of the accidents that happen on the same road section. Figure 14 shows the accident center of each road section. If one wants to find the best location from which to deal with potential future accidents, the Central Feature tool can be used.
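The idea behind the central feature method can be sketched as follows: within each group (here, each roadway), pick the accident whose total Euclidean distance to all other accidents in the group is smallest. The function names, road names and coordinates are hypothetical, and the sketch is not the ArcGIS implementation.

```python
import math
from collections import defaultdict

def central_feature(points):
    """Return the point with the smallest total Euclidean distance to the others."""
    return min(
        points,
        key=lambda p: sum(math.hypot(p[0] - q[0], p[1] - q[1]) for q in points),
    )

def central_feature_by_group(records):
    """records: iterable of (group, x, y); returns {group: central (x, y)}."""
    groups = defaultdict(list)
    for group, x, y in records:
        groups[group].append((x, y))
    return {group: central_feature(pts) for group, pts in groups.items()}

# Hypothetical accidents on two roadways (coordinates in arbitrary units).
records = [
    ("I-10", 0.0, 0.0), ("I-10", 1.0, 0.0), ("I-10", 5.0, 0.0),
    ("US-59", 0.0, 0.0), ("US-59", 0.0, 2.0), ("US-59", 0.0, 3.0),
]
centers = central_feature_by_group(records)
```

For the assumed data, the central accident is the one at (1, 0) on the first road and the one at (0, 2) on the second.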

Fig. 13. Central feature function

Fig. 14. Accident center of each link

The point density method shows where the accidents are concentrated by displaying the accident attribute on the map. This analysis can be carried out with the Point Density function in ArcMap, as shown in Figure 15. The input point feature is the accident data, the population field is LnDuration and the output cell size is 500. Figure 16 shows the accident density map. This map offers a general view of where accidents are densely located. In the Houston accident analysis, however, the density map offers very little information: because the highways are densely distributed at the center of the network, the accidents are also densely located there.

Fig. 15. Point density feature function
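A simplified version of a point density surface can be sketched by summing the population field per grid cell and dividing by the cell area. This is a deliberate simplification: it uses the cell itself as the neighborhood, whereas a GIS tool typically sums over a neighborhood around each cell. The function name and the sample points are hypothetical; only the cell size of 500 comes from the text.

```python
from collections import defaultdict

def point_density(points, cell_size):
    """Sum point values per square grid cell and divide by the cell area.

    points: iterable of (x, y, value), value being the population field
    Returns {(col, row): density} for cells that contain at least one point.
    """
    totals = defaultdict(float)
    for x, y, value in points:
        cell = (int(x // cell_size), int(y // cell_size))
        totals[cell] += value
    area = cell_size * cell_size
    return {cell: total / area for cell, total in totals.items()}

# Hypothetical accidents: (x, y, LnDuration) in map units.
accidents = [(100.0, 100.0, 2.0), (400.0, 450.0, 3.0), (900.0, 100.0, 1.0)]
density = point_density(accidents, cell_size=500)
```

The first two accidents fall in the same cell, so that cell receives a higher density than the cell holding the third accident.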

Fig. 16. Accidents density map

CONCLUSIONS

Hotspot analysis performs better than the central feature and point density functions in identifying accident-prone areas. The central feature method can only point out the accident center of the studied objects and cannot point out where accidents frequently happen. The point density function can point out the areas where accidents frequently happen, but it only displays a density map and offers only a general view of where accidents are more likely to happen. Hotspot analysis identifies the locations where accidents frequently happen by using a statistical method, which makes it more reliable.

Network-based hotspot analysis identifies the road sections where accidents happen, while the Euclidean distance based hotspot analysis can only point out the areas where accidents frequently happen. Because accidents are closely related to the road network, it is more reasonable to calculate the spatial pattern of traffic accidents based on the network. The results of this project show that network-based hotspot analysis is able to point out the links where accidents happen.

FUTURE RESEARCH

The network dataset should be refined according to real highway network conditions. Because of the lack of data, the Houston highway network dataset was simplified. The cost for the highway network is based only on the length of the link, and I did not take other impedance factors into consideration. In future research, the network dataset should be refined if the relevant information is available, in order to make the results more accurate.

In this study, I focused only on identifying the locations where accidents frequently happen. The next step is to study the factors that may influence traffic accidents. One way to identify these factors is to study the similarities among accident-prone areas, so that transportation agencies can take proper actions to prevent accidents by controlling these factors.

REFERENCE

Xie, Z., and Yan, J. (2008). Kernel density estimation of traffic accidents in a network space. Computers, Environment and Urban Systems, 32(5), 396-406.
Erdogan, S., Yilmaz, I., Baybura, T., and Gullu, M. (2007). Geographical information systems aided traffic accident analysis system case study: City of Afyonkarahisar. Accident Analysis and Prevention, 40(1), 174-181.
Erdogan, S. (2009). Explorative spatial analysis of traffic accident statistics and road mortality among the provinces of Turkey. Journal of Safety Research, 40(5), 341-351.
Anderson, T.K. (2009). Kernel density estimation and K-means clustering to profile road accident hotspots. Accident Analysis and Prevention, 41(3), 359-364.
Hauer, E. (1997). Observational before-after studies in road safety. Pergamon, Oxford.