Identifying Vehicular Crash High Risk Locations along Highways via Spatial Autocorrelation Indices and Kernel Density Estimation

Size: px

Start display at page:

Download "Identifying Vehicular Crash High Risk Locations along Highways via Spatial Autocorrelation Indices and Kernel Density Estimation"

Jessie Jacobs
5 years ago
Views:

World Journal of Engneerng and Technology, 2017, 5, 198-215 http://www.scrp.

1 World Journal of Engneerng and Technology, 2017, 5, ISSN Onlne: ISSN Prnt: Identfyng Vehcular Crash Hgh Rsk Locatons along Hghways va Spatal Autocorrelaton Indces and Kernel Densty Estmaton Azad Abdulhafedh 1 Department of Cvl and Envronmental Engneerng, Unversty of Mssour-Columba, MO, USA How to cte ths paper: Abdulhafedh, A. (2017) Identfyng Vehcular Crash Hgh Rsk Locatons along Hghways va Spatal Autocorrelaton Indces and Kernel Densty Estmaton. World Journal of Engneerng and Technology, 5, Receved: February 27, 2017 Accepted: May 7, 2017 Publshed: May 10, 2017 Copyrght 2017 by author and Scentfc Research Publshng Inc. Ths work s lcensed under the Creatve Commons Attrbuton Internatonal Lcense (CC BY 4.0). Open Access Abstract Identfyng vehcular crash hgh rsk locatons along hghways s mportant for understandng the causes of vehcle crashes and to determne effectve countermeasures based on the analyss. Ths paper presents a GIS approach to examne the spatal patterns of vehcle crashes and determnes f they are spatally clustered, dspersed, or random. Moran s I and Gets-Ord G* statstc are employed to examne spatal patterns, clusters mappng of vehcle crash data, and to generate hgh rsk locatons along hghways. Kernel Densty Estmaton (KDE) s used to generate crash concentraton maps that show the road densty of crashes. The proposed approach s evaluated usng the 2013 vehcle crash data n the state of Indana. Results show that the approach s effcent and relable n dentfyng vehcle crash hot spots and unsafe road locatons. Keywords Spatal Autocorrelaton, Kernel Densty, Moran s I, G* statstc, Hot Spots Analyss 1. Introducton Identfyng vehcular crash hgh rsk locatons along hghways s a useful tool that can help transportaton agences allocate lmted resources more effcently, and fnd effectve countermeasures. A crash hot spot s a locaton showng concentraton of ncdents, and hot spot analyss s a method for analyzng the spatal tendency between ponts or events wthn ths locaton [1]. If a feature s spatal tendency s hgh, and the values of ts neghborng features s also hgh, t s a part of a hot spot, and f the tendency of a feature and ts neghborhoods s low, t s a part of a cold spot. Spatal patterns of traffc crash data can be analyzed by 1 PhD n Cvl Engneerng DOI: /wjet May 10, 2017

2 spatal autocorrelaton, whch s a measure of the correlaton of an observaton wth other observatons through space. The spatal autocorrelaton phenomenon can be summarzed by the Tobler s frst law of Geography that everythng s usually related to all else but those whch are near to each other are more related when compared to those that are further away [2]. Most statstcal analyses are based on the assumpton that the values of observatons n each sample are ndependent of one another. Spatal autocorrelaton volates ths assumpton, because samples taken from nearby locatons are related to each other, and hence, they are statstcally not ndependent of one another [1] [3]. To assess spatal autocorrelaton, a dstance measure must be specfed n order to defne what s meant by two observatons beng close together. These dstances are usually presented n the form of a weght matrx, whch defnes the relatonshps between locatons at whch the observatons occur [4]. If data were collected at n locatons, then the weght matrx wll be n x n wth zeroes on the dagonal. The weght matrx s often row-standardzed, (.e. all weghts n a row sum to one), and can be constructed gven a varety of assumptons, such as [1]: A constant dstance that represents the weght for any two dfferent locatons. A fxed weght for all observatons wthn a specfed dstance. k nearest neghbors that represents a fxed weght, and all others non-neghbors are zero. Weght could be proportonal to the nverse dstance, or nverse dstance squared. There are a number of ndces or statstcs that attempt to measure spatal autocorrelaton for contnuous data, such as Moran s I, Geary s C, and Gets-Ord G statstc [5]. These ndces can be used as Global or Local measures dependng on the scope of the analyss. Global mples that all elements n the weght matrx are ncluded n the calculaton of spatal autocorrelaton provdng a sngle measurement of spatal autocorrelaton for an entre data set. Local ndces calculate spatal autocorrelaton for all areal unts of analyss. In other words, the global autocorrelaton s the extent to whch ponts that are close together n space have smlar values, and the local autocorrelaton s the extent to whch ponts that are close to a gven pont or area have smlar values. Anseln [6] outlned a general class of local ndcators of spatal autocorrelaton termed the Local Indcator of Spatal Autocorrelaton (LISA) statstc, whch mples that the LISA statstc decomposes global results nto ther local parts. For example, a sgnfcant global ndex at a gven spatal pont or secton may hde large spatal patches of no autocorrelaton, and LISA can detect ths and show us the locaton of these nsgnfcant patches n space. Conversely, an nsgnfcant global result may hde patches of strong autocorrelaton, and LISA can detect ths agan. Generally, both hot spots and cold spots can be dentfed as locatons for whch the LISA statstc s sgnfcant [6]. Hgh and sgnfcant values of Moran s I and G statstc n a spot ndcate a hgh spatal clusterng (hot spot), whereas low and sgnfcant values ndcate a low spatal clusterng (cold spot). The type of clus- 199

3 terng and ts statstcal sgnfcance s evaluated based on a confdence level and on the output z-scores and the correspondent p-values. These wll determne whether a data pont or a locaton belongs to a hot spot (denoted by Hgh-Hgh, HH), cold spot (denoted by Low-Low, LL) or an outler (a hgh data value surrounded by low data values or vce versa, denoted by Hgh-Low, HL or Low-Hgh, LH). Other methods for studyng spatal patterns of crash data as pont events have recently been developed. One of the most wdely used s the Kernel Densty Estmaton (KDE). The goal of KDE s to develop a contnuous surface of densty estmates of dscrete events such as road crashes by summng the number of events wthn a search bandwdth. Many recent studes have used the 2-D planar KDE for hot spot analyss. However, ths method has been crtczed n relaton to the fact that road crashes usually happen on the road lnks and need to be consdered n a road network space represented by 1-D dmenson. Therefore, some studes have extended the planar KDE to network spaces, whch estmates the crash densty over a dstance unt n a 1-D measurement nstead of an area unt [7] [8] [9]. However, a major weakness of the KDE methods s that t cannot be tested for statstcal sgnfcance [8] [10]. 2. Lterature Revew Vehcle crashes have been nvestgated from dfferent spatal and temporal perspectves by dfferent researchers usng vared procedures. The Black and Thomas [11] paper was the frst major work that clearly dstngushes traffc crash hotspots. Ther work ndcated that a postve network autocorrelaton of road crashes can cause the spatal clusterng of traffc hotspots. However, the analyss focused on network autocorrelatons at a global level wthn an entre dataset by usng Moran s I and the assocated z-score tests. Flahaut [12] ntroduced the use of the local ndces of spatal autocorrelaton (LISA) to examne the crash patterns of road networks n a Belgan provnce. Ths work explaned the advantages of hotspots and further developed logstc regresson models to explan traffc crash hotspots wth road characterstcs and local envronmental condtons. Yamada and Thll [7] explaned and compared hotspot methods by Kernel Densty Estmaton (KDE) at the planar and network-constraned. Ther work ndcated that the planar KDE analyss can produce over-detectng cluster patterns. More recently, Yamada and Thll [13] ntroduced a method called local ndcators of network-constraned clusters (LINCS) to dentfy hotspots by usng the network-based approach. Theoretcally, traffc crashes can occur at every possble locaton over the entre road network. However, t s mpractcal to examne the clusterng pattern at every possble pont usng the network-based approach. Hence, they suggested usng reference ponts along the network wth an equal nterval dstances. Schwetzer [14] used a process called kernel smoothng for hot spot analyss. Ths process creates local estmates of the measure of the spatal ntensty usng the count of frequency of ponts wthn a gven dstance of each pont, relatve to symmetrc dstrbuton. Xe and Yan [8] used a planar and a network-based KDE approach to examne traffc crashes n the state of Ken- 200

4 tucky. In mplementng the network KDE, they suggested usng a lnear segment of roads as the basc unt for aggregatng crashes, calculatng densty, and for vsualzaton. They found that segments of shorter length are more capable of showng the local varatons of the segments, and concluded that the networkconstraned KDE s more approprate than the planar KDE for traffc crash analyss. Erdogan et al. [15] used a repeatablty analyss to dentfy hotspots wth the hghest 5% and 1% area of the Posson dstrbuton over ten years, usng a bandwdth of 500 m. They concluded that repeatablty analyss determned more hot spot locatons than the Kernel Densty analyss. Yamada and Thll [16] appled the network-based framework to analyze hghway crashes that occurred on a small hghway network n Buffalo, New York. The method was mplemented n conjuncton wth Monte Carlo smulaton to obtan crtera aganst whch statstcal nferences from the observed patterns can be made. They found that ncorporatng GIS and spatal statstcal approaches can effectvely detect crash hotspots. 3. Moran s I Moran s I [17] s one of the oldest ndces of spatal autocorrelaton and can be used to test for global and local spatal autocorrelaton among contnuous data. For any contnuous varable, x a mean x, can be calculated and the devaton of any observaton from that mean can be calculated based on the cross products of the devatons from the mean. The statstc then compares the value of the varable at any one locaton wth the values at all other locatons [18] [19] [20]. For n observatons on a varable x at locatons, j, Moran s I s calculated by Equaton (1) as follows: ( )( ) n n j j j 0 n ( )) 2 n w x x x x I = S x x where, x : s the mean of the varable x; w j : are the elements of the weght matrx; S 0 : s the sum of the elements of the weght matrx: S0 n n = j Values for ths ndex typcally, range from 1.0 to +1.0, where a value of -1.0 ndcates negatve spatal autocorrelaton, and a value of +1.0 ndcates postve spatal autocorrelaton. When nearby ponts or segments have smlar values, ther cross product s hgh. Conversely, when nearby ponts or segments have dssmlar values, ther cross-product s low. The expectaton of Moran s I s: ( ) E I 1 = n 1 wth a Moran s I value larger than E(I), ndcates postve spatal autocorrelaton, and a Moran s I less than E(I), ndcates negatve spatal autocorrelaton. In Moran s formulaton, the weght varable, w j, s a contguty matrx. If zone j s adjacent to zone, the product receves a weght of 1.0. Otherwse, the product w j (1) (2) 201

5 receves a weght of 0.0. The z-scores of Moran s I can be computed by Equaton (3): ( ) ( ) I E I Z = (3) V I where E(I) s the expected value of I, and V(I) s the varance of I, as shown n Equaton (4): 2 2 ( ) ( ) ( ) V I = E I E I (4) The dstrbuton of the z-scores s assumed to be approxmately normal wth a mean of 0.0 and a varance of 1.0 (Clff and Ord 1981). A statstcally sgnfcant postve z-score ndcates that the dstrbuton of the observatons s spatally autocorrelated producng Hgh-Hgh (HH) clusters, whereas a negatve z-score ndcates that the observatons tend to be more dssmlar producng Low-Low (LL) clusters. A z-score close to zero ndcates that observatons are randomly and ndependently dstrbuted n space. By assumng a z-score s from a standard normal dstrbuton, ther assocated p-value can be obtaned, and can be used to determne the sgnfcance of the ndex at each locaton [4]. To determne f the z- score s statstcally sgnfcant, t should be compared to the range of values for a partcular confdence level. For example, at a sgnfcance level of 95%, a z-score would have to be less than 1.96 or greater than to be statstcally sgnfcant. The null hypothess H 0 s that there s no spatal autocorrelaton among the observatons. The null hypothess can be rejected, f the p-value shows that the z-score s sgnfcant. 4. Gets-Ord G Statstc The Gets-Ord G statstc s another ndex of spatal autocorrelaton [21] that can dstngush between postve spatal autocorrelaton wth hgh values from postve spatal autocorrelaton wth low values. The General (Global) G statstc computes a sngle statstc for the entre study area, whle the local G statstc s an ndcator for local autocorrelaton for each data pont. There are two types of G statstcs, although almost the two types produce dentcal results [22] [23]. The frst one, G, does not nclude the autocorrelaton of a zone wth tself, whereas the ncludes the nteracton of a zone wth tself (.e. the G statstc does not nclude the value of X tself, but only the neghborhood values, but ncludes X as well as the neghborhood values), and formally both can be computed by the formulae [5]: G G n ( ) j j ( ) w d = n x j j d x n ( ) j 1 j ( ) w d d = n = x j = where, d s the neghborhood (threshold) dstance, and w j s the weght matrx j 1 j x j (5) (6) 202

6 that has only 1.0 or 0.0 values, 1.0 f j s wthn d dstance of, and 0.0 f ts beyond that dstance. These formulae ndcate that the cross-product of the value of X at locaton and at another locaton j s weghted by a dstance weght, w j whch s defned by ether a 1.0 f the two locatons are equal to or closer than a threshold dstance, d, or a 0.0 otherwse. The G statstc can vary between 0.0 and 1.0. The statstcal sgnfcance of the local autocorrelaton between each pont and ts neghbors s assessed by the z-score test and the p-value. The expected G value for a threshold dstance, d, s defned as: W E G( d ) = (7) n n 1 ( ) where, W s the sum of weghts for all pars of locatons ( W n = n j w j ), and n s the number of observatons. Assumng normal dstrbuton, the varance of G(d) s defned as [24]: 2 2 ( ) = ( ) ( ) Var G d E G E G (8) The standard error of G(d) s the square root of the varance of G. Therefore, a z-test can be computed by: ( ) = ( ) SE.. G d Var G d (9) G d Z G( d) = ( ) E G( d) SE.. G( d) Where, a postve z-value ndcates spatal clusterng of hgh values, whle a negatve z-value ndcates spatal clusterng of low values. Sometmes, the G statstc may not follow a normal standard error, and the dstrbuton of the statstc may not be normally dstrbuted, such as the case of a skewed varable wth some ponts havng very hgh values whle the majorty of other ponts havng low values. In ths case, a permutaton type smulaton should be used [6] [25], wth a randomzaton dstrbuton to test the null hypothess of no local autocorrelaton (H 0 ). Ths wll mantan the dstrbuton of the varable z but wll estmate the value of G under random assgnment of ths varable, and the user can take the usual 95% or 99% confdence ntervals based on the level used. 5. Planar Kernel Densty Estmaton Kernel Densty Estmaton s a non-parametrc method to estmate the probablty densty functon of a varable that produces a smooth densty surface of pont events over a 2-D geographc space (.e. planar space). Kernel densty estmatons are closely related to hstograms, but can be constructed wth propertes such as smoothness or contnuty by usng a sutable kernel. The dsadvantages of hstograms provde the motvaton for kernel estmaton. When we construct a hstogram, we need to consder the wdth of the bns n whch the whole data nterval s dvded by, and the end ponts of the bns. As a result, the problems wth hstograms are that they are not smooth, and therefore we can allevate these problems by usng kernel densty estmaton that centers a kernel functon (10) 203

7 at each data pont [26]. KDE tends to produce a smooth densty surface of pont events over space by computng event ntensty as densty estmaton. The general form of a KDE n a 2-D space s gven by [8]: n 1 ds λ ( s) = k (11) 1 2 π r r Where λ(s) s the densty at locaton s, r s the search radus (bandwdth) of the KDE, k s the weght of a pont at dstance d s to locaton s. The kernel functon k s usually consdered as a functon of the rato between d s and r. As a result, the longer the dstance between a pont and locaton s, the less that pont s weghted for calculatng the overall densty. All ponts wthn the bandwdth r of locaton s are summed for calculatng the densty at s. A number of dstrbutons can be used to measure the spatal weghts k, such as Gaussan, Quartc, Conc, Mnmum varance functon, negatve exponental, and epanchnekov [27] [28]. Some of the mostly used forms of kernel functons are [29]: The Gaussan functon: d k r The Quartc functon: d k r s 1 d = EXP 2π 2r s d = K 1 r 2 s 2 2 s 2 (12) (13) Where K s a scalng factor to ensure the total volume under Quartc curve s 1.0, and usually used as ¾. The mnmum varance functon: d k r s 3 d = r To fnd the KDE value, two key parameters must be chosen: the kernel functon k; and the search radus (bandwdth) r. Many studes have found that the type of the dstrbuton of the kernel functon k has a very lttle effect on the results compared to the choce of search bandwdth r [1] [26] [29] [30] [31]. The value of search bandwdth r s very mportant because t usually determnes the smoothness of the estmated densty and can affect the outcome. If t s too small, t wll not produce a contnuous smooth surface, and too large bandwdth wll suppress spatal varaton of events. Therefore, the bandwdth of the kernel densty estmaton often proves to be more nfluental to outcomes than the kernel shape dstrbuton. Hence, an optmal value of r must be chosen that mnmzes the sum of the squared errors of the kernel estmaton [32]. There s no unque defnton of the optmal search radus, and dfferent optmalty crtera have been used. For example, ESRI ArcGIS 10.2 uses the followng formula as a default optmal search radus [33]: r = 0.9 mn SD, Dm n ln ( 2) (15) 2 s 2 (14) 204

8 where, SD: the standard dstance D m : the medan dstance n: the number of ponts f no populaton feld s used, or f a populaton feld s suppled, n s the sum of the populaton feld values, and mn: means that whchever of the two optons that results n a smaller value wll be used. Slverman [26] suggested a rule of thumb for calculatng the optmal search radus as follows: = 4 (16) r SD 5 3 n where, the SD s the standard devaton of the samples provded that the kernel functon s Gaussan type and that samples follow normal dstrbuton. The cell sze depends on the user choce and the dataset. Okabe et al. [9] suggested usng a cell sze of (r/10) as a rule of thumb. 6. Network Kernel Densty Estmaton In the real world, there are many knds of network-constraned events, such as traffc crashes, street crmes, leakages n gas ppe lnes along roadways, and rver contamnaton. In planar KDE, the space s characterzed as a 2-D homogeneous Eucldan space and densty s usually estmated at a large number of locatons that are regularly spaced over a grd. However, n analyzng the hot spots of network-constrant events, the assumpton of homogenety of 2-D space does not hold and the relevant KDE methods may produce based results [9]. Therefore, the planar KDE has been extended to the network KDE, whch dffers from the planar KDE n several aspects: () the network space s used as the pont event context; () both search bandwdth r and kernel functon k are based on network dstance (calculated as the shortest path dstance n a network) nstead of straght-lne Eucldean dstance; and () densty s measured per lnear unt nstead of area unt. The network KDE s a 1-D measurement whle the planar KDE s a 2-D measurement [9]. The network KDE s an extenson of the planar 2-D KDE and t uses the followng equaton for the densty estmaton of network-constraned pont events n a network space [8]: λ ( s) n 1 ds = k r r (17) Instead of calculatng the kernel densty over an area unt, the equaton estmates the densty over a lnear unt, and any of the dfferent forms of kernel functons k may be used. 7. Data The analyss s conducted on a dataset that presents a road network n the state of Indana as shown n Fgure 1. The data ncludes a crash pont layer and a road layer wth crash records of the year 2013 that ncludes all types of crashes (.e. fatal, njury, and property damage). The dataset ncludes 2983 crash pont 205

9 Fgure 1. Roads and Crashes of Indana network used n the analyss. events on the network. 8. Results and Dscusson The Global Moran s I evaluates whether the overall network crashes are clustered, dspersed, or random, and assesses the overall spatal pattern of the crash data. The GIS spatal statstcs tool s used to compute the Global Moran s I, and fve values are generated from runnng ths tool: The Moran s Index, the Expected Index, the Varance, the z-score, and the p-value as shown n Table 1. The results of the analyss are nterpreted wthn the context of the null hypothess. For the Global Moran s I statstc, the null hypothess states that the attrbutes (.e. crashes) beng analyzed are randomly dstrbuted among the features n the study area (.e. no global spatal autocorrelaton exsts for the entre network). However, snce the p-value beng generated s less than 0.01 (usng a confdence level of 99%), then ths ndcates that the Global Moran s I spatal autocorrelaton s sgnfcant, and hence, we can reject the null hypothess, and state that t s qute possble that the spatal dstrbuton of the overall network crashes s the result of clusterng pattern, and there s less than 1% probablty that ths pattern could be the result of random process. Smlarly, the Global (General) Gets-Ord statstc evaluates whether the overall network crashes are clustered, dspersed, or random, and assesses the overall pattern and trend of the crash data, and fve values are generated from runnng the ArcMap spatal statstc tool: The General G statstc, the Expected Index, the Varance, the z-score, and the p-value as shown n Table 2. Snce the p-value beng generated s less than 0.01 (usng a confdence level of 99%), then ths ndcates that the General G statstc spatal autocorrelaton s sgnfcant, and hence, we can reject the null hypothess, and state that t s qute possble that the spatal dstrbuton of the overall network crashes s the result of clusterng patterns, and there s less than 1% probablty that ths pattern 206

10 Table 1. Global Moran s I Summery for Indana road network. Global Moran s I Expected Index Varance z-score p-value Decson sgnfcant Table 2. General G statstc Summery for Indana road network General G Expected Index Varance z-score p-value Decson sgnfcant could be the result of random process. Ths result s analogous to the Global Moran s I n determnng the overall clusterng pattern of the crashes. Next, the statstcally sgnfcant hot spots, cold spots, and spatal outlers are dentfed usng the Anseln Local Moran s I, and the local statstc. The z-scores and p-values can be used to evaluate the statstcal sgnfcance of the computed ndex values. Ths method can dstngush between a statstcally sgnfcant cluster of hgh values (HH), cluster of low values (LL), and outlers n whch a hgh value s surrounded by low values (HL), and outlers n whch a low value s surrounded by hgh values (LH). Table 3 shows the HH, LL, HL, LH dentfed by both Moran s I and statstc. Fgure 2 shows the HH, LL, HL, LH dentfed by Moran s I. Fgure 3 shows the HH, LL, HL, LH of Moran s I wth renderng that clearly llustrates the range of the z-scores of the dentfed clusters between the range LL < - 2.0, and the HH > 2.0. It can be seen that the statstc has dentfed a larger number of sgnfcant hot spots (157 HHs) and sgnfcant cold spots (307 LLs) than the Moran s I (102 HHs, 287 LLs). However, Moran s I has dentfed a larger number of sgnfcant outlers (79 HLs, 82 LHs) than the (48 HLs, 0.0 LHs). In addton, the extent and locatons of hot spots, cold spots and outlers dffer from one method to the other. For example, cluster # 1 s dentfed by the statstc as purely HH hot spot, whle t has been dentfed as a mxed HH and LH hot spot by Moran s I. Clusters # 3, 4, and 5 are dentfed by the as purely cold spots, whle they have been dentfed as mxed HH, LH, and non-sgnfcant hot spots by Moran s I. Clusters # 2 and 6 are dentfed as non-sgnfcant spots by both methods. Fgure 4 shows the HH, LL, HL, LH dentfed by the statstc. Fgure 5 shows the HH, LL, HL, LH of wth renderng that clearly llustrates the range of the z-scores of the dentfed clusters between the range of LL < - 2.0, and the HH > 2.0. The planar Kernel Densty Estmaton s determned by the ArcMap spatal analyst tools. Kernel Densty calculates the densty of a pont around each output raster cell, and a smoothly curved surface s ftted over each pont. Densty surfaces show where pont features are concentrated. The surface value s hghest at the locaton of the pont beng analyzed and decreases wth ncreasng dstance from the pont, reachng zero at the search radus dstance from the pont. Fgure 6 shows the hot spots dentfed by the planar KDE, and ther average dens- 207

11 Fgure 2. Hot spots and outlers by Anseln Moran s I. Fgure 3. Hot Spots by Moran s I wth z-scores renderng. Fgure 4. Hot Spots and Outlers by G* statstc. tes per km 2. It can be seen that the planar KDE has dentfed seven clusters wth dfferent crash denstes. For example, cluster # 1 contans 8 densty levels rang- 208

Fgure 5. Hot Spots by G* statstc wth z-scores renderng. Fgure 6. Hot Spots by Planar Kernel Densty Estmaton. Table 3. Hot spots, Cold Spots, and outlers of Indana network by Moran s I and G *.

12 Fgure 5. Hot Spots by G* statstc wth z-scores renderng. Fgure 6. Hot Spots by Planar Kernel Densty Estmaton. Table 3. Hot spots, Cold Spots, and outlers of Indana network by Moran s I and G *. Method Hot Spot HH Cold Spot LL Outler HL Outler LH Anseln Moran s I G * statstc ng from the hghest densty value of 147 crashes/km 2 to the lowest densty value of 16 crashes/km 2. Cluster # 2 and # 3 contan 3 densty levels rangng from the hghest densty of 65 crashes/km 2 to the lowest densty of 16 crashes/km 2. Cluster # 4 and # 5 contan 2 densty levels that decreases from the hghest value of 48 crashes/km 2 to the lowest value of 16 crashes/km 2. Cluster # 6 and # 7 contan only one densty level of at least 16 crashes/km 2. The remanng whte raster area contans the mnmum densty (between 0 15 crashes/km 2 ). The network-constraned Kernel Densty Estmaton s determned usng the SANET V4.1 software [34]. Tradtonally, network events are analyzed wth spatal methods assumng Eucldean dstance on a 2-D plane, however, ths assumpton does not hold n practce when analyzng network events, because Eucldean 209

13 dstances and ther correspondng network shortest-path dstances are sgnfcantly dfferent. Alternatvely, network spatal analyss assumes the shortest-path dstance on networks that enables more practcal nvestgaton of network events than planar spatal analyss. Hence, planar KDE s lkely to lead to false conclusons when appled to network events [35]. A clear example s provded n Fgure 7, whch shows that for a road segment AB, the planar KDE consders 14 crashes wthn the crcular area surroundng the segment, whle the network KDE consders only 11 crashes on that segment. Fgure 8 shows the hot spots dentfed by the network KDE, and ther average denstes per lnear km. Fgure 9 shows the hot spots by the network KDE wth renderng of ther z-scores. It can be notced from Fgure 8 that the network KDE has dentfed dfferent clusterng patterns. For example, cluster # 1 s dentfed by the network KDE as purely hgh densty hot spot smlar to the pattern. The average densty at ths juncton s (between 4.8 to 5.7 crashes/km). Clusters # 3, 4, and 5 are dentfed by the network KDE as purely hgh densty hot spots (ther densty between 4.8 to 5.7 crashes/km), whle they have been dentfed as mxed HH, LH, and non-sgnfcant hot spots by Moran s I, and purely LL cold spots by. Clusters # 2 and 6 are dentfed by the network KDE as mxed hgh densty hot spots and low densty spots (densty for the low spots s between 1.8 to 2.7 crashes/km, and densty for the hgh spots s between 4.8 to 5.8 crashes/km), whle they are dentfed as non-sgnfcant spots by both Moran s I and statstc. The crash densty for the cold spots s between 2.8 to 4.7 crashes/km as shown n Fgure 8. Snce each method has dentfed dfferent clusterng patterns, therefore we recommend usng a combnaton of these methods n hot spot analyss. Comparable results can show more dverse and flexble nterpretatons among the clusterng patterns. Usng only one method can result n msleadng conclusons. For example, as we saw above, cluster # 1 s dentfed as a pure HH hot spot n Fgure 7. An example of dfferent outcomes between the planar KDE and the network KDE. 210

Fgure 8. Hot Spots by Network Kernel Densty Estmaton. Fgure 9. Hot Spots by network KDE wth z-score renderng. and hgh densty spot n network KDE (densty from 4.8 to 5.

14 Fgure 8. Hot Spots by Network Kernel Densty Estmaton. Fgure 9. Hot Spots by network KDE wth z-score renderng. and hgh densty spot n network KDE (densty from 4.8 to 5.7 crashes/km), whle t s dentfed as a mxed HH and LH n Moran s I. Lkewse, cluster # 2 s dentfed as a pure non-sgnfcant spot n both Moran s I and, whle t s a mxed HH densty spot (wth densty from 4.8 to 5.7 crashes/km) and low densty spot (from 1.8 to 2.7 crashes/km) n network KDE. Hence, usng a combnaton of these methods would probably produce more relable results than usng one method alone. Table 4 shows a comparson between some characterstcs of Moran s I, 9. Concluson statstc, and network KDE n dentfyng hot spots. Hot pot analyss focuses on hghlghtng areas whch have hgher than average ncdence of events, and t s a valuable technque for vsualzng the concentraton of events on networks. Ths paper presented two methods: Moran s I and Gets-Ord statstc based on network spatal autocorrelaton and another thrd method, kernel densty estmaton (KDE) to examne the spatal patterns of 211

15 Table 4. Comparson between Moran s I, G* statstc, and network KDE. Moran s I G* statstc Network KDE measures spatal correlaton, dentfes hot spots wth hgh-hgh values, and cold spots wth low-low values Identfes outlers (dspersed ncdents) wth hgh-low values and low-hgh values Lookng at the value n the context of ts neghbors values wthn the nverse dstance between locatons does not nclude the nteracton of a zone wth tself but only wth ts neghborhoods n measurng spatal correlaton reports an ndex-value, and a z-score measures spatal correlaton, dentfes hot spots wth hgh-hgh values, and cold spots wth low-low values does not dentfy outlers Lookng at the value n the context of ts neghbors values that fall wthn a specfed dstance of each other ncludes the nteracton of a zone wth tself n addton to ts neghborhoods n measurng spatal correlaton reports a combned ndex-value and a z-score measures probablty densty functon, dentfes only hot spots n term of densty per lnear dstance unt does not dentfy outlers conduct densty calculaton based on the user-specfed search radus and raster cell sze does not nclude the nteracton of a zone wth tself but only wth ts neghborhoods n measurng kernel densty reports a lnear densty value, and a z-score reports a p-value reports a p-value does not report a p-value presents the statstcal sgnfcance of clusterng presents the statstcal sgnfcance of clusterng does not present the statstcal sgnfcance of clusterng ranges from -1.0 to ranges from 0.0 to any postve value vehcle crashes and determnes f they are spatally clustered, dspersed, or random usng the 2013 vehcle crash data n the state of Indana. The Global values of both Moran s I and showed that t s qute possble that the spatal dstrbuton of the overall network crashes s the result of clusterng patterns, and there s less than 1% probablty that ths pattern could be the result of random process. The local Moran s I and the road network. The dentfed dfferent clusterng patterns on statstc has dentfed a larger number of sgnfcant hot spots (157 HHs) and sgnfcant cold spots (307 LLs) than the Moran s I (102 HHs, 287 LLs). However, Moran s I has dentfed a larger number of sgnfcant outlers (79 HLs, 82 LHs) than the (48 HLs, 0.0 LHs). The kernel densty estmaton s evaluated as planar KDE and network KDE. In planar KDE, the space s characterzed as a 2-D homogeneous Eucldan space that does not hold when analyzng network events. Therefore, the planar KDE has been extended to the network KDE, whch s a 1-D measurement whle the planar KDE s a 2-D measurement. In applyng the planar KDE to the network, t dentfed seven clusters wth dfferent crash denstes. Cluster # 1 contaned 8 densty levels, cluster # 2 and # 3 contaned 3 densty levels, cluster # 4 and # 5 contaned 2 densty levels, and cluster # 6 and # 7 contaned only one densty level. The network KDE dentfed dfferent patterns of clusters than what Moran s I and statstc have dentfed. For example, cluster # 1 s dentfed as a pure HH hot spot n both and network KDE (densty from 4.8 to 5.7 crashes/km), whle t s dentfed as a mxed HH and LH n Moran s I. Lkewse, cluster # 2 s den- 212

16 tfed as a pure non-sgnfcant spot n both Moran s I and, whle t s a mxed HH densty spot (wth densty from 4.8 to 5.7 crashes/km) and low densty spot (from 1.8 to 2.7 crashes/km) n network KDE. Snce each method has dentfed dfferent clusterng patterns, therefore ths paper recommends usng a combnaton of these methods n hot spot analyss. Comparable results from these methods can produce more relable nterpretatons among the clusterng patterns. Usng only one method can probably produce msleadng results. References [1] Baley, T.C. and Gatrell, A.C. (1995) Interactve Spatal Data Analyss. Addson Wesley Longman Ltd., Harlow, England. [2] Tobler, W.R. (1970) A Computer Move Smulatng Urban Growth n the Detrot Regon. Economc Geography, 46, [3] Black, W. (1992) Network Autocorrelaton n Transport Network and Flow Systems. Geographcal Analyss, 24, [4] Clff, A.D. and Ord, J.K. (1981) Spatal Processes: Models and Applcatons. Pon, London, UK. [5] Fscher, M. and Wang, J. (2011) Spatal Data Analyss: Models, Methods, and Technques. Sprnger, New York. [6] Anseln, L. (1995) Local Indcators of Spatal Assocaton LISA. Geographcal Analyss, 27, [7] Yamada, I. and Thll, J.C. (2004) Comparson of Planar and Network K-Functons n Traffc Accdent Analyss. Journal of Transport Geography, 12, [8] Xe, Z. and Yan, J. (2008) Kernel Densty Estmaton of Traffc Accdents n a Network Space. Computers, Envronment and Urban Systems, 32, [9] Okabe, A., Satoh, T. and Sughara, K. (2009) A Kernel Densty Estmaton Method for Networks, Its Computatonal Method and a GIS-Based Tool. Internatonal Journal of Geographcal Informaton Scence, 23, [10] Anderson, T.K. (2009) Kernel Densty Estmaton and K-Means Clusterng to Profle Road Accdent Hot Spots. Accdent Analyss & Preventon, 41, [11] Black, W. and Thomas, I. (1998) Accdents on Belgum s Motorway: A Network Autocorrelaton Analyss. Journal of Transport Geography, 6, [12] Flahaut, B. (2004) Impact of Infrastructure and Local Envronment on Road Unsafety: Logstc Modelng wth Spatal Autocorrelaton. Accdent Analyss & Preventon, 36, [13] Yamada, I. and Thll, J.C. (2007) Local Indcators of Network-Constraned Clusters n Spatal Pont Patterns. Geographcal Analyss, 39, [14] Schwetzer, L. (2006) Envronmental Justce and Hazmat Transport: A Spatal Analyss n Southern Calforna. Transportaton Research Part D: Transport and Envronment, 11, [15] Erdogan, S., Ylmaz, I., Baybura, T. and Gullu, M. (2008) Geographcal Informaton Systems Aded Traffc Accdent Analyss System Case Study: Cty of Afyonkarahsar. Accdent Analyss & Preventon, 40,

17 [16] Yamada, I. and Thll, J.C. (2010) Local Indcators of Network-Constraned Clusters n Spatal Patterns Represented by a Lnk Attrbute. Annals of the Assocaton of Amercan Geographers, 100, [17] Moran, P. (1948) The Interpretaton of Statstcal Maps. Journal of the Royal Statstcal Socety, 10, [18] Anseln, L. (1992) Space Stat: A Program for the Statstcal Analyss of Spatal Data. Natonal Center for Geographc Informaton and Analyss, Unversty of Calforna, Santa Barbara, CA. [19] Goodchld, M.F. (1987) Spatal Autocorrelaton. Concepts and Technques n Modern Geography, 48, [20] Grffth, D.A. (1987) Spatal Autocorrelaton: A Prmer. Resource Publcatons n Geography, The Assocaton of Amercan Geographers, Washngton DC. [21] Gets, A. and Ord, J.K. (1992) The Analyss of Spatal Assocaton by Use of Dstance Statstcs. Geographcal Analyss, 24, [22] Gets, A. and Ord, J.K. (1996) Local Spatal Statstcs: An Overvew. Geo Informaton Internatonal, Cambrdge, England. [23] Berglund, S. and Karlstrom, A. (1999) Identfyng Local Spatal Assocaton n Flow Data Geographcal Systems. Transportaton Geography, 1, [24] Lee, J. and Wong, D.W.S. (2005) Statstcal Analyss wth ArcVew GIS and ArcGIS. Wley & Sons, Inc., New York. [25] Mobley, L.R., Kuo, T.M., Drscoll, D., Clayton, L. and Anseln, L. (2008) Heterogenety n Mammography Use across the Naton: Separatng Evdence of Dspartes from the Dsproportonate Effects of Geography. Internatonal Journal of Health Geographcs, 7, [26] Slverman, B.W. (1986) Densty Estmaton for Statstcs and Data Analyss. Chapman Hall, London. [27] Levne, N. (2004) CrmeStat III: A Spatal Statstcs Program for the Analyss of Crme Incdent Locatons. Ned Levne & Assocates The Natonal Insttute of Justce, Houston, TX; Washngton DC. [28] Gbn, M., Longley, P. and Atknson, P. (2007) Kernel Densty Estmaton and Percent Volume Contours n General Practce Catchment Area Analyss n Urban Areas. Proceedngs of the GIScence Research UK Conference (GISRUK) 2007, Maynooth, Ireland. [29] Schabenberger, O. and Gotway, C.A. (2005) Statstcal Methods for Spatal Data Analyss. Chapman & Hall/CRC, Boca Raton, FL. [30] O Sullvan, D. and Unwn, D.J. (2002) Geographc Informaton Analyss. John Wley, Hoboken, NJ. [31] O Sullvan, D. and Wong, D.W.S. (2007) A Surface-Based Approach to Measurng Spatal Segregaton. Geographc Analyss, 39, [32] Scott, D.W. (1992) Multvarate Densty Estmaton, Theory, Practce, and Vsualzaton. John Wley & Sons, New York. [33] ArcGIS Resources. [34] Okabe, A., Okunuk, K. and Shode, S. (2006) The SANET Toolbox: New Methods for Network Spatal Analyss. Transactons n GIS, 10,

18 [35] Okabe, A. and Sughara, K. (2012) Spatal Analyss along Networks: Statstcal and Computatonal Methods. John Wley & Sons, New York. Submt or recommend next manuscrpt to SCIRP and we wll provde best servce for you: Acceptng pre-submsson nqures through Emal, Facebook, LnkedIn, Twtter, etc. A wde selecton of journals (nclusve of 9 subjects, more than 200 journals) Provdng 24-hour hgh-qualty servce User-frendly onlne submsson system Far and swft peer-revew system Effcent typesettng and proofreadng procedure Dsplay of the result of downloads and vsts, as well as the number of cted artcles Maxmum dssemnaton of your research work Submt your manuscrpt at: Or contact wjet@scrp.org 215

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence