Does the Dispersion Parameter of Negative Binomial Models Truly Estimate the Level of Dispersion in Over-dispersed Crash Data with a Long Tail?


Yajie Zou, Ph.D.
Research Associate, Smart Transportation Applications & Research Laboratory, University of Washington
Email: zouyajie@uw.edu

Lingtao Wu
Ph.D. Candidate, Zachry Department of Civil Engineering, Texas A&M University, 3136 TAMU, College Station, Texas
& Research Assistant, Research Institute of Highway, MOT, China, 8th Xitucheng Rd., Haidian District, Beijing, China
Email: wulingtao@gmail.com

Dominique Lord, Ph.D.
Associate Professor and Zachry Development Professor I, Zachry Department of Civil Engineering, Texas A&M University, 3136 TAMU, College Station, Texas
Email: d-lord@tamu.edu

ABSTRACT
Despite the many statistical models that have been proposed for modeling motor vehicle crashes, the most commonly used statistical tool remains the negative binomial (NB) model. Crash data collected for safety studies may exhibit over-dispersion and a long tail (i.e., a few sites have an unusually high number of crashes). However, some studies have shown that NB models cannot adequately handle over-dispersed count data with a long tail. So far, no work has investigated the performance of the dispersion parameter of the NB model when analyzing over-dispersed crash data with a long tail. The dispersion parameter of the NB model plays an important role in various types of transportation safety analysis. The first objective of this study is to examine whether the dispersion parameter can truly reflect the level of dispersion in over-dispersed crash data with a long tail. The second objective is to determine whether the dispersion term of the Sichel (SI) model can be used as an alternative to the dispersion parameter of the NB model. To accomplish these objectives, 3,000 data sets are simulated from NB and SI regression models using different values describing the mean and the dispersion level. For the simulated data sets, the dispersion parameter and dispersion term are estimated and compared with the true values. To complement the simulation study, crash data collected in Texas are also used to compare the dispersion parameter and dispersion term. The results suggest that the dispersion parameter of the NB model can erroneously estimate the level of dispersion in over-dispersed count data with a long tail, and that the dispersion term of the SI model is more reliable in estimating the true level of dispersion. Considering these findings, it is believed that the dispersion term may offer a viable alternative for analyzing over-dispersed crash data with a long tail.
Impact on Industry: The dispersion term of the SI model can be used to obtain reliable empirical Bayes (EB) estimates. The SI-based EB estimates can provide accurate hotspot identification results by ranking crash-prone sites for safety improvement programs.
Keywords: Sichel, Negative binomial, Dispersion parameter, Traffic crashes, Empirical Bayes

1. Introduction
From a statistical point of view, highway crashes can be treated as random events by assuming that there is an underlying mean crash rate for each individual site (Park et al., 2010). What makes crash data difficult to model is that they often exhibit over-dispersion, meaning that the variance is greater than the mean (Park and Lord, 2009). Lord et al. (2005) provided a fundamental explanation showing that over-dispersion arises from the actual nature of the crash process. To accommodate over-dispersion in crash data, transportation safety analysts have proposed many mixed-Poisson models, such as the negative binomial (NB, also known as Poisson-gamma) model (Miaou and Lord, 2003; Poch and Mannering, 1996), zero-inflated models (Shankar et al., 1997), the Poisson-lognormal (Aguero-Valverde and Jovanis, 2008), the Conway-Maxwell-Poisson (Lord et al., 2008b) and the Poisson-Weibull (Cheng et al., 2013). For a comprehensive review of the mixed-Poisson models used in transportation safety analysis, see Mannering and Bhat (2013). These statistical models are, in fact, approximations used for modeling crash data. Among these mixed-Poisson models, the NB model remains the most frequently used for accommodating the over-dispersion observed in crash data (Lord and Mannering, 2010). Reasons for the popularity of NB models include: (1) the NB model provides a simple way to describe the relationship between the mean and the variance (Lord and Mannering, 2010); and (2) the dispersion parameter of the NB model plays an important role in transportation safety analysis.
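One concrete example of that role is the empirical Bayes (EB) estimate used with NB safety performance functions (Hauer, 1997). In its common form the weight is w = 1/(1 + αμ), where μ is the model-predicted crash count and α the dispersion parameter. The EB estimate is then w·μ + (1 - w)·y, where y is the observed count. The sketch below is a minimal illustration that assumes μ and y refer to the same site and analysis period. The function names are hypothetical.

```python
def eb_weight(mu: float, alpha: float) -> float:
    """Weight on the model prediction; equals 1 when alpha = 0 (Poisson case)."""
    if mu < 0 or alpha < 0:
        raise ValueError("mu and alpha must be non-negative")
    return 1.0 / (1.0 + alpha * mu)


def eb_estimate(observed: float, mu: float, alpha: float) -> float:
    """EB estimate: weighted average of the model prediction and the observed count."""
    w = eb_weight(mu, alpha)
    return w * mu + (1.0 - w) * observed


if __name__ == "__main__":
    # Hypothetical site: the model predicts 2 crashes and 10 were observed.
    for alpha in (0.25, 0.5, 1.5):
        print(alpha, round(eb_estimate(10, 2.0, alpha), 3))
```

For this hypothetical site, the EB estimate rises from about 4.67 to 8.0 as α goes from 0.25 to 1.5. An underestimated α therefore pulls site rankings toward the model prediction, which is why a biased α̂ matters for hotspot identification.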
Besides the mixed-Poisson models, random-parameters count models (Anastasopoulos and Mannering, 2009; Chen and Tarko, 2013), finite mixture and Markov switching models (Malyshkina et al., 2009; Park and Lord, 2009; Zou et al., 2012), generalized ordered-response models (Bhat et al., 2014; Castro et al., 2012) and quantile regression models (Qin and Reyes, 2011) have been proposed for analyzing crash-frequency data. The dispersion parameter of the NB model (usually denoted as α) is critical for estimating the weight factor of the empirical Bayes (EB) method (Hauer, 1997; Hauer et al., 1988) and for building confidence intervals for evaluating and screening highway projects (Wood, 2005). Since these two types of analysis are commonly used in highway safety, it is necessary to obtain reliable estimates of the dispersion parameter. It has been shown that a low sample mean and a small sample size can significantly influence the estimation of the dispersion parameter of NB models under both the maximum likelihood and the Bayesian estimation methods (Lord, 2006; Lord and Miranda-Moreno, 2008; Maher and Summersgill, 1996). To avoid or minimize an unreliably estimated dispersion parameter, Lord (2006) also summarized the minimum sample size required for different sample means. For NB models, the gamma distribution assumed for the error term in the mean of the Poisson variable can restrict the model's ability to account for heterogeneity across observations (Park et al., 2010). For example, Guo and Trivedi (2002) reported that NB regression models have difficulty modeling heavily over-dispersed data with a long tail and a relatively high mean, because a negligible probability is usually assigned to high counts. Recently, the Sichel distribution (SI, also known as the Poisson-generalized inverse Gaussian distribution) was introduced by Zou et al. (2013) for calculating EB estimates. The SI distribution is a compound Poisson distribution, which mixes the Poisson distribution with the generalized inverse Gaussian distribution. Previous studies (Gupta and Ong, 2005; Stein et al., 1987) have shown that the SI distribution is useful as a model for over-dispersed count data with a long tail. Among the mixed-Poisson models, the NB and SI models both have a quadratic variance-mean relationship. Similar to the dispersion parameter of the NB model, a dispersion term of the SI model can be defined to measure the level of dispersion in the data. This dispersion term can easily be used by transportation safety analysts to obtain reliable EB estimates within the SI modeling framework (Zou et al., 2013).

Considering the importance of the dispersion parameter of the NB model in transportation safety analysis, the objectives of this study are to examine whether the traditionally used dispersion parameter can truly reflect the level of dispersion in over-dispersed crash data with a long tail, and whether the dispersion term of the SI model can be used as an alternative to the dispersion parameter. To accomplish these objectives, 3,000 data sets are simulated from NB and SI regression models using different values describing the mean and the dispersion level. For the simulated data sets, the dispersion parameter and dispersion term are estimated and compared with the true values. To complement the simulation study, crash data collected in Texas are also used to compare the dispersion parameter and dispersion term. This study demonstrates that the dispersion parameter of the NB model can be biased for over-dispersed count data with a long tail, and that the dispersion term of the SI model is a more reliable estimator of the true level of dispersion.

2. Background
This section briefly describes the characteristics of the NB and SI models.
The NB models have the following probabilistic structure. The number of crashes Y_it at the i-th site and time period t, conditional on its mean θ_it, is Poisson distributed and independent over all sites and time periods (Miaou and Lord, 2003):

$$Y_{it} \mid \theta_{it} \sim \mathrm{Poisson}(\theta_{it}), \quad i = 1, 2, \ldots, I, \quad t = 1, 2, \ldots, T \qquad (1)$$

The mean of the Poisson is structured as:

$$\theta_{it} = f(X; \beta)\exp(e_{it})$$

In this expression, f(·) is a function of the covariates, β is a vector of unknown coefficients, and e_it is the model error, which is independent of all the covariates. exp(e_it) is assumed to be independent and gamma distributed with a mean equal to 1 and a variance α. It can then be derived that Y_it, conditional on μ_it = f(X; β) and α, is distributed as an NB random variable with mean μ_it and variance μ_it + αμ_it². The probability density function (PDF) of the NB model is defined as follows (for the complete derivation of the NB model, see Hilbe, 2011):

$$f(y_{it}; \mu_{it}, \alpha) = \frac{\Gamma(y_{it} + 1/\alpha)}{\Gamma(1/\alpha)\,\Gamma(y_{it} + 1)} \left(\frac{1}{1 + \alpha\mu_{it}}\right)^{1/\alpha} \left(\frac{\alpha\mu_{it}}{1 + \alpha\mu_{it}}\right)^{y_{it}} \qquad (2)$$

Here y_it is the response variable for observation i and time period t, μ_it is the mean response of observation i and time period t, and α is the dispersion parameter. Compared with the Poisson distribution, the NB distribution allows for over-dispersion. If α → 0, the crash variance equals the crash mean and the NB model converges to the Poisson model.

The SI distribution has recently been used for modeling motor vehicle crashes (Zou et al., 2013). The SI models have the following probabilistic structure. The number of crashes Y_it at the i-th site and time period t, conditional on its mean θ_it, is Poisson distributed and independent over all sites and time periods:

$$Y_{it} \mid \theta_{it} \sim \mathrm{Poisson}(\theta_{it}), \quad i = 1, 2, \ldots, I, \quad t = 1, 2, \ldots, T \qquad (3)$$

The mean of the Poisson is structured as:

$$\theta_{it} = f(X; \beta)\exp(e_{it})$$

In this expression, f(·) is a function of the covariates, β is a vector of unknown coefficients, and e_it is the model error, which is independent of all the covariates. exp(e_it) is assumed to be independent and generalized inverse Gaussian (GIG) distributed with a mean equal to 1 and a variance 2σ(ν + 1)/c + 1/c² - 1. It can then be shown that Y_it, conditional on μ_it = f(X; β), σ and ν, is distributed as an SI random variable with mean μ_it and variance μ_it + μ_it²(2σ(ν + 1)/c + 1/c² - 1). The PDF of the SI distribution, SI(μ, σ, ν), is given by:

$$p(y; \mu, \sigma, \nu) = \frac{(\mu/c)^{y}\,K_{y+\nu}(\omega)}{y!\,(\omega\sigma)^{y+\nu}\,K_{\nu}(1/\sigma)}, \quad y = 0, 1, 2, \ldots \qquad (4)$$

The symbols in Eq. (4) are defined as follows:
y is the response variable;
μ is the mean response of observation i;
σ is the scale parameter, with σ > 0;
ν is the shape parameter;
ω² = σ⁻² + 2μ(cσ)⁻¹;
c = K_{ν+1}(1/σ)/K_ν(1/σ);
K_λ(t) is the modified Bessel function of the third kind, given by

$$K_{\lambda}(t) = \frac{1}{2}\int_{0}^{\infty} x^{\lambda-1}\exp\left\{-\frac{t}{2}\left(x + x^{-1}\right)\right\}dx$$
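The two PMFs above can be checked numerically. The sketch below uses only Python's standard library and evaluates Eq. (2) and Eq. (4). It checks that each PMF sums to one, with mean μ and variance μ + αμ² for the NB and μ + μ²h(σ, ν) for the SI. The Bessel function comes from the equivalent representation K_λ(t) = ∫₀^∞ exp(-t cosh u) cosh(λu) du (substitute x = e^u in the definition above), evaluated with the trapezoidal rule in the log domain.

The quadrature choice and all function names are assumptions made for illustration; this is not how gamlss computes these quantities.

```python
import math


def _log_cosh(x: float) -> float:
    """Numerically stable log(cosh(x))."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)


def log_bessel_k(lam: float, t: float, n: int = 2000) -> float:
    """log K_lam(t) for t > 0, from K_lam(t) = int_0^inf exp(-t cosh u) cosh(lam u) du.

    Trapezoidal rule in the log domain, so large orders (large counts y) do not overflow.
    """
    lam = abs(lam)  # K_{-lam}(t) = K_lam(t)

    def g(u: float) -> float:  # log of the integrand
        return -t * math.cosh(u) + _log_cosh(lam * u)

    u_peak = math.asinh(lam / t)  # approximate location of the integrand's maximum
    g_peak = g(u_peak)
    upper = u_peak + 1.0
    while g(upper) > g_peak - 60.0:  # truncate once the integrand is negligible
        upper += 1.0
    h = upper / n
    logs = [g(i * h) for i in range(n + 1)]
    m = max(logs)
    total = sum(math.exp(v - m) for v in logs)
    total -= 0.5 * (math.exp(logs[0] - m) + math.exp(logs[-1] - m))  # end-point weights
    return math.log(h) + m + math.log(total)


def nb_pmf(mu: float, alpha: float, y_max: int) -> list:
    """Eq. (2): NB probabilities for y = 0..y_max (mean mu, variance mu + alpha*mu^2)."""
    r = 1.0 / alpha
    log_odds = math.log(alpha * mu) - math.log1p(alpha * mu)
    return [math.exp(math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
                     - r * math.log1p(alpha * mu) + y * log_odds)
            for y in range(y_max + 1)]


def si_c(sigma: float, nu: float) -> float:
    """c = K_{nu+1}(1/sigma) / K_nu(1/sigma)."""
    return math.exp(log_bessel_k(nu + 1, 1 / sigma) - log_bessel_k(nu, 1 / sigma))


def si_dispersion_term(sigma: float, nu: float) -> float:
    """h(sigma, nu) = 2*sigma*(nu + 1)/c + 1/c^2 - 1."""
    c = si_c(sigma, nu)
    return 2 * sigma * (nu + 1) / c + 1 / c**2 - 1


def si_pmf(mu: float, sigma: float, nu: float, y_max: int) -> list:
    """Eq. (4): SI probabilities for y = 0..y_max."""
    log_k_nu = log_bessel_k(nu, 1 / sigma)
    c = si_c(sigma, nu)
    omega = math.sqrt(1 / sigma**2 + 2 * mu / (c * sigma))
    return [math.exp(y * math.log(mu / c) + log_bessel_k(y + nu, omega)
                     - math.lgamma(y + 1) - (y + nu) * math.log(omega * sigma) - log_k_nu)
            for y in range(y_max + 1)]


def moments(pmf: list) -> tuple:
    """Total probability, mean and variance of a PMF truncated at len(pmf) - 1."""
    total = sum(pmf)
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum(y * y * p for y, p in enumerate(pmf)) - mean**2
    return total, mean, var


if __name__ == "__main__":
    print(moments(nb_pmf(3.0, 0.5, 400)))  # close to (1, 3, 7.5)
    h = si_dispersion_term(1.0, -1.0)
    print(h, moments(si_pmf(3.0, 1.0, -1.0, 300)))  # variance close to 3 + 9h
```

The same helper can also show the NB limit. For ν = 2 and a large σ (e.g., 100), si_dispersion_term returns a value very close to 1/ν = 0.5, consistent with the GIG mixing distribution tending to a gamma.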

When σ → ∞ and ν > 0, it can be shown that the SI distribution reduces to the NB distribution. Note that the NB and SI models both have a quadratic variance-mean relationship, that is, VAR(y) = μ + h(·)μ², where μ = E(y) and h(·) is a function of the parameters of the mixing distribution. For the NB model, h(·) = α is the dispersion parameter. For the SI model, h(σ, ν) = 2σ(ν + 1)/c + 1/c² - 1 can be viewed as a dispersion term. Like the dispersion parameter of the NB model, this dispersion term can also be used to measure the level of dispersion.

For over-dispersed crash data, the SI model is usually more flexible than the NB model. The variance-to-mean function of the NB model is VAR(y)/E(y) = 1 + αμ. For the SI model it is VAR(y)/E(y) = 1 + μ[2σ(ν + 1)/c + 1/c² - 1]. Since the SI model has three parameters, crash data with a fixed mean and a fixed variance-to-mean ratio can be matched by many combinations of σ and ν.

Although many methods are available for estimating the dispersion parameter and dispersion term, three are commonly used by transportation safety modelers: the method of moments, weighted regression analysis and maximum likelihood estimation (MLE). Lord (2006) compared these three estimators and found that MLE usually provides better estimation results than the other two. The MLE method is therefore adopted in this study. More details about parameter estimation are given in Rigby et al. (2008).

3. Methodology and simulation protocol
This section describes the methodology and simulation protocol used for estimating the dispersion parameter and the dispersion term under different scenarios.
A simulation analysis was carried out because, when analyzing real crash data, the true values of the regression parameters and the dispersion level are seldom known in practice. In a simulation, by contrast, it is possible to generate crash data with known regression parameters and dispersion levels. To construct crash data that are convincingly similar to empirical crash data, we first summarize previous work on the application of Poisson-gamma models in traffic safety. Table 1 summarizes eight crash datasets from seven published papers, including statistics on crash counts, the type and location of crash sites, explanatory variables and reported dispersion parameters for NB models. The mean number of crashes for the eight datasets ranges from 0.9 to … and the corresponding standard deviation is between 0.69 and … . As documented in these seven studies, the reported dispersion parameters of NB regression models for various traffic facilities are all below 1 except in the last study. The high dispersion parameter (α = 1.69) found in the last study is probably explained by the preponderance of zeros in the data (about 80% of three-legged intersections report zero crashes). Overall, Table 1 provides guidelines for assigning the value of the dispersion parameter in our simulation framework.

Table 1. Characteristics of crash datasets and reported values of the dispersion parameter.
Columns: Study | Location | Crashes (Min, Max, Mean, SD*) | Explanatory variables | Observations | Reported value of dispersion parameter
Studies: Anastasopoulos and Mannering (2009); Chang (2005); El-Basyouny and Sayed (2006); Lord et al. (2008b); Lord et al. (2008b); Kumara and Chin (2003); Chin and Quddus (2003); Miaou (1994).
Locations: Rural interstate highways; Freeway; Urban arterial; Urban intersection; Rural highways; Urban intersection; Urban intersection; Rural interstate highways.
Explanatory variables: Pavement characteristics, geometric characteristics and traffic flow characteristics; Geometric characteristics, traffic flow characteristics and weather information; Geometric characteristics and traffic flow characteristics; Traffic flow characteristics (N/A, N/A); Geometric characteristics and traffic flow characteristics; Geometric characteristics, traffic flow characteristics and traffic device information; Geometric characteristics, traffic flow characteristics and traffic device information; Geometric characteristics and traffic flow characteristics.
* SD = Standard Deviation.

The following section presents the simulation protocol used to assess how well the dispersion term of the SI model estimates the dispersion parameter of the NB model. Two experiments are designed. In the first experiment, NB data were generated and the NB and SI regression models were estimated using the MLE method. In the second experiment, SI data were generated and the NB and SI regression models were estimated in the same way.

3.1 Experiment one
To examine the accuracy of parameter estimates from the SI models, 100 datasets of 1,000 observations each were randomly generated for each of 15 scenarios. The scenarios correspond to different dispersion parameters and sample means. The 15 scenarios include simulated datasets with dispersion parameter α = 0.25, 0.5, 0.75, 0.95 and 1.5. For each dispersion parameter, three sample means were considered:
- high mean (HM): β0 = 2.4, β1 = 0.15, β2 = -0.15; resulting sample mean approximately 11.0;
- moderate mean (MM): β0 = 1.7, β1 = 0.15, β2 = -0.15; resulting sample mean approximately 5.5;
- low mean (LM): β0 = 0.2, β1 = 0.15, β2 = -0.15; resulting sample mean approximately 1.2.

For each scenario, the parameter estimates from the NB and SI models were compared with the known parameter values used to generate the crash datasets. The values of the dispersion parameter were selected according to the findings in Table 1. In experiment one, the NB data were generated using the following steps:
(1) Simulate values for the covariates X1 and X2, each from a uniform distribution on [0, 1].
(2) Generate a mean μ_i for observation i with known regression parameters:
$$\mu_i = \exp(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i})$$
(3) Generate a discrete count Y_i, where the Poisson mean of observation i is μ_i multiplied by a gamma-distributed error with mean 1 and variance equal to the dispersion parameter α:
$$Y_i \sim \mathrm{Poisson}(\theta_i), \quad \theta_i = \mu_i \exp(e_i), \quad \exp(e_i) \sim \mathrm{gamma}(1, \alpha)$$
(4) Repeat steps (1) to (3) 1,000 times.

3.2 Experiment two
The dispersion term h(σ, ν) = 2σ(ν + 1)/c + 1/c² - 1 of the SI model depends on two parameters, so many combinations of σ and ν can produce the same value of the dispersion term. To generate crash datasets similar to empirical crash data, observed crash data are used to help select plausible values of σ and ν. Specifically, SI models were applied to five crash datasets with different crash means and variances, which may reasonably reflect traffic sites with different safety performance. The estimated scale parameter σ and shape parameter ν are provided in Table 2. The reader is referred to the studies listed in Table 2 for details of the datasets, the explanatory variables considered and the functional forms. The shape and scale parameters are assumed to be fixed throughout this paper.

Table 2. Estimated values of scale and shape parameters for crash datasets with different characteristics.
Columns: Datasets | Crashes (Min, Max, Mean, SD*) | Observations | Scale parameter σ | Shape parameter ν
Datasets: Michigan data (Geedipally et al., 2012); Texas data (Zou et al., 2013); Washington data (Lord et al., 2008a); California data (Lord et al., 2008a); Indiana data (Cheng et al., 2013).
* SD = Standard Deviation.

The shape parameter ν takes negative values, between -5 and -1. The scale parameter σ is positive by definition, and most datasets report small values of σ, with the exception of the Indiana data. The modeling results for the Indiana data indicate that the SI model is converging to the Poisson-inverse Gaussian model. To make the results of this experiment applicable to empirical data, the values of σ and ν were selected to represent the underlying characteristics of true accident count distributions. The parameters assigned in simulating the crash data are given in Table 3.

Table 3. Assigned values of the scale and shape parameters in five scenarios.
Columns: Scenario | Scale parameter σ | Shape parameter ν | Dispersion term*
* Dispersion term = h(σ, ν) = 2σ(ν + 1)/c + 1/c² - 1.

To investigate the accuracy of parameter estimates from the NB models for SI data, 100 datasets of 1,000 observations each were randomly generated for each of 15 scenarios. The scenarios correspond to different dispersion terms and sample means. The 15 scenarios include simulated datasets with dispersion term h(σ, ν) = 0.25, 0.5, 0.75, 0.95 and 1.55. For each dispersion term, the same three sample-mean settings as in experiment one were used (HM, MM and LM above). For each scenario, the parameter estimates from the NB and SI models were compared with the known regression parameter values. In experiment two, the SI data were generated using the following steps:
(1) Simulate values for the covariates X1 and X2, each from a uniform distribution on [0, 1].
(2) Generate a mean μ_i for observation i with known regression parameters:
$$\mu_i = \exp(\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i})$$
(3) Generate a discrete count Y_i, where the Poisson mean of observation i is μ_i multiplied by a generalized inverse Gaussian (GIG) distributed error with mean 1, scale parameter σ and shape parameter ν:
$$Y_i \sim \mathrm{Poisson}(\theta_i), \quad \theta_i = \mu_i \exp(e_i), \quad \exp(e_i) \sim \mathrm{GIG}(1, \sigma, \nu)$$
(4) Repeat steps (1) to (3) 1,000 times.

The probability density functions of the gamma and GIG distributions can be found in Rigby et al. (2008). Note that the gamma is a limiting distribution of the GIG, obtained by letting σ → ∞ for ν > 0. The coefficients of the NB and SI models were estimated using the gamlss package (Rigby and Stasinopoulos, 2013) in the software R. At the end of the experiments, the estimated regression parameters, dispersion parameter and dispersion term were recorded and compared with the known parameter values.

4. Simulation results
This section summarizes the simulation output in three parts: a comparison of the characteristics of the datasets simulated from the NB and SI regression models; the simulation results for the NB data; and the simulation results for the SI data.

4.1 Characteristics of the simulated datasets
Tables 4 and 5 show the characteristics of crash counts for the 100 simulated datasets (each containing 1,000 observations) generated from the NB and SI regression models, respectively. For the same simulation setting (i.e., crash mean and dispersion level), the distributions of crash counts simulated from the NB and SI regression models follow a similar pattern. However, crash counts generated from SI models usually have a longer tail than those generated from NB models. Moreover, as the dispersion parameter/term increases, the difference in tail behavior (see the 100% quantile column in Tables 4 and 5) becomes more pronounced. The reason is that NB models can only generate over-dispersed count data with a relatively short tail.
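The NB data-generation steps of experiment one can be sketched in standard-library Python. This is an illustrative reimplementation, not the authors' R/gamlss code. The following choices are assumptions:
- Poisson draws use Knuth's multiplication method, which is adequate for the moderate means involved here.
- The coefficients (1.7, 0.15, -0.15) are hypothetical values chosen to give a sample mean near 5.5.
- The quantile rule is a simple nearest-index rule; the paper does not state which quantile definition Tables 4 and 5 use.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method (adequate for lam up to a few hundred)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_nb_dataset(beta, alpha, n=1000, seed=None):
    """Steps (1)-(4) of experiment one: one NB dataset of n counts."""
    rng = random.Random(seed)
    b0, b1, b2 = beta
    counts = []
    for _ in range(n):  # step (4): repeat n times
        x1, x2 = rng.random(), rng.random()          # step (1): X1, X2 ~ U[0, 1]
        mu = math.exp(b0 + b1 * x1 + b2 * x2)        # step (2): log-linear mean
        # step (3): gamma error with mean 1 and variance alpha,
        # i.e. shape 1/alpha and scale alpha in random.gammavariate(shape, scale)
        z = rng.gammavariate(1.0 / alpha, alpha)
        counts.append(poisson_draw(mu * z, rng))
    return counts


def quantile_summary(counts, qs=(0, 10, 30, 50, 70, 90, 100)):
    """Nearest-index q-quantiles plus the mean, in the layout of Tables 4 and 5."""
    s = sorted(counts)
    out = {f"{q}%": s[round(q / 100 * (len(s) - 1))] for q in qs}
    out["mean"] = sum(s) / len(s)
    return out


if __name__ == "__main__":
    data = simulate_nb_dataset((1.7, 0.15, -0.15), alpha=0.5, seed=42)
    print(quantile_summary(data))
```

Averaging quantile_summary over 100 seeds would give one row of a table laid out like Table 4. Reproducing Table 5 would additionally require a GIG sampler, which the standard library does not provide.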

Table 4. Characteristics of crash counts for the 100 simulated datasets generated from NB regression models.
Layout: three panels, for β0 = 0.2, β0 = 1.7 and β0 = 2.4. Within each panel, rows correspond to the dispersion parameter, and columns give the average values of the q-quantiles^a of crash count at 0%, 10%, 30%, 50%, mean, 70%, 90% and 100%.
a The q-quantile of a set of values divides them so that q% of the values lie below and (100 - q)% of the values lie above.
b Average values of the % quantiles for the 100 simulated datasets.
c Average values of the % quantiles for the 100 simulated datasets.

Table 5. Characteristics of crash counts for the 100 simulated datasets generated from SI regression models.
Layout: three panels, for β0 = 0.2, β0 = 1.7 and β0 = 2.4. Within each panel, rows correspond to the dispersion term, and columns give the average values of the q-quantiles^a of crash count at 0%, 10%, 30%, 50%, mean, 70%, 90% and 100%.
a The q-quantile of a set of values divides them so that q% of the values lie below and (100 - q)% of the values lie above.

4.2 Results for experiment one
Table 6 shows the means and standard deviations of the estimated regression parameters βk (k = 0, 1, 2) at α = 0.25, 0.5, 0.75, 0.95 and 1.5 for the three sample means. As indicated in Table 6, the NB estimates of βk are reliable under all scenarios. Interestingly, the SI estimates are very similar to the NB estimates. In fact, the means and standard deviations of the estimates from NB and SI models are almost identical for many scenarios. For the low sample mean scenario, the standard deviation is slightly larger than for the moderate and high sample mean scenarios, indicating that the regression parameter estimates tend to be less stable when the sample mean is low. Overall, the estimates of βk (k = 0, 1, 2) from NB and SI models are generally reliable for different dispersion parameters and sample means.

Table 6 also presents the means and standard deviations of the estimated dispersion parameter under different scenarios. The dispersion term of the SI model is calculated as h(σ, ν) = 2σ(ν + 1)/c + 1/c² - 1 and is treated as the estimated dispersion parameter for the simulated NB data. The dispersion term of the SI models adequately estimates the dispersion parameter of the NB models. One notable feature is that, in each scenario, the standard deviation is generally larger when the sample mean is low, even with a sample size of 1,000. For example, when the true dispersion parameter α = 0.25, the standard deviation of the NB estimates is 0.05 for the low sample mean, 0.02 for the moderate sample mean and 0.01 for the high sample mean. This finding suggests that the dispersion parameter estimated from data characterized by low sample means can be relatively unreliable even if the sample size is sufficient.
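The low-mean effect can be explored with a much simpler estimator than the full gamlss fit. The sketch below deliberately simplifies by treating the true means μ_i as known and maximizing the NB log-likelihood of Eq. (2) over α alone. The maximization uses a golden-section search on log α, which assumes a unimodal profile likelihood. The paper instead estimates β and α jointly by MLE.

The data generator repeats the experiment-one steps so that the block is self-contained. The coefficient values are hypothetical.

```python
import math
import random


def poisson_draw(lam, rng):
    """Knuth's multiplication method."""
    threshold, k, p = math.exp(-lam), 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate(beta, alpha, n, seed):
    """Experiment-one style NB data; returns the counts and the true means mu_i."""
    rng = random.Random(seed)
    counts, mus = [], []
    for _ in range(n):
        mu = math.exp(beta[0] + beta[1] * rng.random() + beta[2] * rng.random())
        counts.append(poisson_draw(mu * rng.gammavariate(1 / alpha, alpha), rng))
        mus.append(mu)
    return counts, mus


def nb_loglik(alpha, counts, mus):
    """Sum of log Eq. (2) over observations, with mu_i treated as known."""
    r = 1.0 / alpha
    ll = 0.0
    for y, mu in zip(counts, mus):
        ll += (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
               - r * math.log1p(alpha * mu)
               + y * (math.log(alpha * mu) - math.log1p(alpha * mu)))
    return ll


def mle_alpha(counts, mus, lo=1e-3, hi=20.0, iters=60):
    """Golden-section search on log(alpha) for the maximum of nb_loglik."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)

    def f(s):  # negative log-likelihood as a function of s = log(alpha)
        return -nb_loglik(math.exp(s), counts, mus)

    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return math.exp((a + b) / 2.0)


if __name__ == "__main__":
    for b0 in (0.2, 1.7, 2.4):  # low, moderate and high mean settings
        ests = [mle_alpha(*simulate((b0, 0.15, -0.15), 0.5, 1000, s)) for s in range(3)]
        print(b0, [round(e, 3) for e in ests])
```

Repeating the loop over more seeds should show a wider spread of α̂ at β0 = 0.2 than at β0 = 2.4, in line with the pattern reported in Table 6. Because the μ_i are known here, the spread will be smaller than when β is also estimated.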

Table 6. Simulation results for α under different scenarios in experiment one. Entries are mean^a (standard deviation^b); columns are β0 = 0.2 (LM) NB | SI, β0 = 1.7 (MM) NB | SI, β0 = 2.4 (HM) NB | SI.

Estimated values for regression parameter β0 under different scenarios
α = 0.25 | (0.08) | 0.1 (0.08) | 1.70 (0.05) | 1.69 (0.05) | 2.40 (0.05) | 2.40 (0.05)
α = 0.5 | (0.1) | 0.1 (0.1) | 1.70 (0.07) | 1.70 (0.07) | 2.40 (0.06) | 2.40 (0.06)
α = 0.75 | (0.1) | 0.19 (0.11) | 1.70 (0.08) | 1.70 (0.09) | 2.41 (0.07) | 2.40 (0.07)
α = 0.95 | (0.11) | 0.19 (0.11) | 1.70 (0.08) | 1.70 (0.08) | 2.40 (0.07) | 2.40 (0.07)
α = 1.5 | (0.1) | 0.1 (0.1) | 1.70 (0.11) | 1.70 (0.11) | 2.39 (0.1) | 2.39 (0.1)

Estimated values for regression parameter β1 under different scenarios
α = 0.25 | (0.1) | 0.13 (0.11) | 0.15 (0.07) | 0.15 (0.07) | 0.16 (0.07) | 0.16 (0.07)
α = 0.5 | (0.1) | 0.17 (0.13) | 0.13 (0.10) | 0.13 (0.10) | 0.15 (0.10) | 0.15 (0.10)
α = 0.75 | (0.14) | 0.17 (0.14) | 0.15 (0.11) | 0.15 (0.11) | 0.14 (0.10) | 0.15 (0.09)
α = 0.95 | (0.14) | 0.15 (0.14) | 0.16 (0.1) | 0.16 (0.1) | 0.15 (0.11) | 0.15 (0.11)
α = 1.5 | (0.14) | 0.14 (0.14) | 0.15 (0.14) | 0.15 (0.14) | 0.16 (0.13) | 0.16 (0.13)

Estimated values for regression parameter β2 under different scenarios
α = 0.25 | (0.11) | (0.11) | (0.08) | (0.08) | (0.05) | (0.06)
α = 0.5 | (0.13) | (0.13) | (0.09) | (0.09) | (0.08) | (0.08)
α = 0.75 | (0.13) | (0.14) | (0.11) | (0.11) | (0.10) | (0.10)
α = 0.95 | (0.13) | (0.13) | (0.1) | (0.1) | (0.10) | (0.10)
α = 1.5 | (0.16) | (0.16) | (0.14) | (0.14) | (0.15) | (0.15)

Estimated values for dispersion parameter α under different scenarios
α = 0.25 | (0.05) | 0.6 (0.05) | 0.5 (0.0) | 0.5 (0.0) | 0.5 (0.01) | 0.5 (0.01)
α = 0.5 | (0.07) | 0.51 (0.08) | 0.50 (0.04) | 0.51 (0.04) | 0.50 (0.0) | 0.50 (0.03)
α = 0.75 | (0.09) | 0.78 (0.10) | 0.74 (0.04) | 0.76 (0.05) | 0.74 (0.04) | 0.75 (0.04)
α = 0.95 | (0.09) | 0.97 (0.10) | 0.95 (0.05) | 0.97 (0.06) | 0.95 (0.05) | 0.96 (0.06)
α = 1.5 | (0.1) | 1.53 (0.13) | 1.49 (0.08) | 1.5 (0.10) | 1.50 (0.07) | 1.53 (0.08)

a mean. b standard deviation. LM, the low sample mean scenario; MM, the moderate sample mean scenario; HM, the high sample mean scenario.
In summary, the simulation results for the NB data show the following characteristics:
(1) The estimates from both NB and SI models are very close to the true regression parameters under all scenarios;
(2) The dispersion parameter and dispersion term perform very well in estimating the true dispersion parameter under all scenarios;
(3) The dispersion parameter estimated from NB and SI models becomes slightly unreliable for low sample means, even when the sample size is sufficient.

4.3 Results for experiment two
In this experiment, the crash datasets were generated from SI models with known regression parameters. The simulated datasets were then used to estimate the regression parameters of the NB and SI models. Figures 1-3 show boxplots of the estimated regression parameters βk (k = 0, 1, 2) at dispersion term = 0.25, 0.5, 0.75, 0.95 and 1.55 for the three sample means. The regression parameter estimates from NB models are similar to those from SI models, and the estimates become slightly unstable when the sample mean is low. Notably, the parameter estimates from NB models are less reliable than those from SI models when the dispersion term is large (see, for example, subfigures (d) and (e) in Figures 1-3).

[Figure 1 panels: (a) dispersion term = 0.25; (b) 0.5; (c) 0.75; (d) 0.95; (e) 1.55]
Fig. 1. Boxplots of estimated values for regression parameter β0 under different scenarios in experiment two. LM, the low sample mean scenario; MM, the moderate sample mean scenario; HM, the high sample mean scenario. True parameter values are indicated by red horizontal lines.

20 (a) Dispersion term = 0.5 (b) Dispersion term = 0.5 (c) Dispersion term = 0.75 (d) Dispersion term = 0.95

21 (e) Dispersion term = 1.55 Figure. Boxplots of estimated values for regression parameter 1 under different scenarios in experiment two. LM, MM, HM and the red horizontal lines have the same meaning as those in Fig. 1. (a) Dispersion term = 0.5 (b) Dispersion term = 0. 5

Figure 3 panels (continued): (c) Dispersion term = 0.75; (d) Dispersion term = 0.95; (e) Dispersion term = 1.55
Figure 3. Boxplots of estimated values for the regression parameter under different scenarios in experiment two. LM, MM, HM and the red horizontal lines have the same meaning as in Fig. 1.

Figure 4 presents the boxplots of the estimated values for the dispersion term under different scenarios. For the simulated SI data, the dispersion parameter of the NB models is treated as the estimate of the dispersion term. The subfigures fall into three general types. In subfigure 4 (a), both the NB and SI models under-estimate the dispersion term, especially when the sample mean is moderate or high, although the SI models provide slightly larger estimates than the NB models. In subfigures 4 (b) to (d), the SI estimates of the dispersion term are generally appropriate regardless of the sample mean, whereas the NB estimates are usually lower than the true dispersion term. In subfigure 4 (e), although the SI estimates appear adequate, their distribution is right-skewed, which inflates the mean of the estimated dispersion term. As expected, the NB estimator is seriously biased in this scenario.

Figure 4 panels: (a) Dispersion term = 0.5; (b) Dispersion term = 0.5; (c) Dispersion term = 0.75; (d) Dispersion term = 0.95; (e) Dispersion term = 1.55
Figure 4. Boxplots of estimated values for the dispersion term under different scenarios in experiment two. LM, MM, HM and the red horizontal lines have the same meaning as in Fig. 1.

In summary, the simulation results for the SI data show the following: (1) the NB and SI models both provide adequate estimates of the regression parameters under all scenarios; (2) under all scenarios, the dispersion parameter of the NB models consistently provides significantly biased estimates of the true dispersion term, especially when the simulated crash counts have a very long tail; and (3) the SI models perform well in estimating the dispersion term, except when the dispersion term = 1.55.

Estimation Bias
The bias of an estimator is defined as the difference between the estimator's expected value and the true value of the parameter being estimated (Francis et al., 2012). The estimation bias of the dispersion parameter is calculated as follows:

$$\mathrm{Bias}(\hat{\alpha}) = E(\hat{\alpha}) - \alpha \qquad (5)$$

where $\alpha$ is the true value of the dispersion parameter and $\hat{\alpha}$ is its estimator. Likewise, the estimation bias of the dispersion term $h(\sigma,\gamma) = 2\sigma(\gamma+1)/c + 1/c^2 - 1$ is calculated as follows:

$$\mathrm{Bias}\big(h(\hat{\sigma},\hat{\gamma})\big) = E\big(h(\hat{\sigma},\hat{\gamma})\big) - h(\sigma,\gamma) \qquad (6)$$

where $h(\sigma,\gamma)$ is the true value of the dispersion term and $h(\hat{\sigma},\hat{\gamma})$ is its estimator. Under each scenario, the bias of the dispersion parameter $\alpha$ and of the dispersion term $h(\sigma,\gamma)$ is calculated as the difference between their average estimates over the 100 replications and the true parameter values assigned to that scenario. Tables 7 and 8 provide the estimation bias for the dispersion parameter and the dispersion term, respectively. For experiment one, the estimation bias is negligible in all scenarios, which means the dispersion term can be used as a robust estimate of the dispersion parameter in this setting. For experiment two, the estimation bias of the NB models is consistently larger than that of the SI models, especially when the dispersion term = 0.75, 0.95 and 1.55. Moreover, for the same dispersion term, the NB estimates deteriorate as the sample mean increases. For the SI models, on the other hand, the estimates are usually acceptable except in the last scenario (dispersion term = 1.55). As shown in Figure 4 (e), the SI estimates in that scenario are unstable, and a few large outliers appear at one end of the boxplot. These outliers substantially increase the average estimate and thus result in an unsatisfactory estimation bias.
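The dispersion term in Eq. (6) can be evaluated numerically once the Bessel-function ratio $c$ is fixed. The following is a minimal sketch, assuming $c = K_{\gamma+1}(1/\sigma)/K_{\gamma}(1/\sigma)$ with $K$ the modified Bessel function of the second kind (the ratio used in the gamlss SICHEL parameterization; it is an assumption here, not quoted from this section). To stay standard-library-only, $K$ is computed by the trapezoid rule on its integral representation, which is a choice made for the example rather than the authors' method. The helper names `bessel_k`, `sichel_dispersion_term` and `estimation_bias` are hypothetical.

```python
import math
from typing import Sequence


def bessel_k(nu: float, x: float, t_max: float = 20.0, step: float = 0.01) -> float:
    """Modified Bessel function of the second kind K_nu(x), for x > 0.

    Uses K_nu(x) = integral_0^inf exp(-x cosh t) cosh(nu t) dt with the trapezoid
    rule on [0, t_max]; the integrand decays very fast, so truncation is negligible
    for moderate nu and x.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    n = int(round(t_max / step))
    total = 0.0
    for k in range(n + 1):
        t = k * step
        e = -x * math.cosh(t)
        # cosh(nu t) split into two exponentials to avoid overflow for large nu * t
        f = 0.5 * (math.exp(e + nu * t) + math.exp(e - nu * t))
        total += 0.5 * f if k in (0, n) else f
    return total * step


def sichel_dispersion_term(sigma: float, gamma: float) -> float:
    """h(sigma, gamma) = 2 sigma (gamma + 1) / c + 1 / c^2 - 1, with an assumed c."""
    x = 1.0 / sigma
    c = bessel_k(gamma + 1.0, x) / bessel_k(gamma, x)  # assumed definition of c
    return 2.0 * sigma * (gamma + 1.0) / c + 1.0 / c**2 - 1.0


def estimation_bias(estimates: Sequence[float], true_value: float) -> float:
    """Eqs. (5) and (6): average estimate over the replications minus the true value."""
    return sum(estimates) / len(estimates) - true_value
```

A quick sanity check under this parameterization: with $\gamma = -0.5$ the two Bessel functions coincide, so $c = 1$ and $h(\sigma,\gamma) = \sigma$, which is the Poisson-inverse Gaussian special case; `sichel_dispersion_term(0.8, -0.5)` therefore returns 0.8.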

Table 7. Estimation bias for the dispersion parameter in experiment one
Dispersion parameter | Scenario (HM, MM, LM) | Estimation bias: NB | Estimation bias: SI

Table 8. Estimation bias for the dispersion term in experiment two
Dispersion parameter | Scenario (HM, MM, LM) | Estimation bias: NB | Estimation bias: SI
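The replication-based bias calculation behind Tables 7 and 8 can be illustrated with a small simulation. This sketch simplifies the study design under stated assumptions: it generates intercept-only Poisson-gamma (NB) counts instead of counts from the paper's regression models, it estimates the dispersion parameter with the method-of-moments formula $\hat{\alpha} = (s^2 - \bar{y})/\bar{y}^2$ instead of fitting an NB model by maximum likelihood, and the mean, $\alpha$, sample size and seed are hypothetical choices. The function names are also hypothetical.

```python
import math
import random


def poisson_draw(lam: float, rng: random.Random) -> int:
    """Poisson variate by Knuth's multiplication method (adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_nb_counts(mu: float, alpha: float, n: int, rng: random.Random) -> list:
    """Poisson-gamma (NB) counts with E[y] = mu and Var[y] = mu + alpha * mu**2.

    The gamma multiplier has shape 1/alpha and scale alpha, so its mean is 1
    and its variance is alpha.
    """
    return [poisson_draw(mu * rng.gammavariate(1.0 / alpha, alpha), rng) for _ in range(n)]


def moment_alpha(y: list) -> float:
    """Method-of-moments dispersion estimate (s^2 - ybar) / ybar^2 (stand-in for the NB MLE)."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    return (s2 - ybar) / ybar ** 2


def replication_bias(mu: float, alpha: float, n: int = 1000, reps: int = 100, seed: int = 1) -> float:
    """Average estimate over `reps` simulated data sets minus the true alpha, as in Eq. (5)."""
    rng = random.Random(seed)
    estimates = [moment_alpha(simulate_nb_counts(mu, alpha, n, rng)) for _ in range(reps)]
    return sum(estimates) / reps - alpha
```

With the defaults, `replication_bias(2.0, 0.5)` averages 100 moment estimates, each from 1,000 NB counts; for data that really are NB the result should sit close to zero, though the exact value depends on the seed.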

5. Observed data
One observed crash dataset was used to examine whether the dispersion term can be used as an alternative to the dispersion parameter. This dataset contains crash data collected on 1,499 four-lane undivided rural segments in Texas over the five-year period from 1997 to 2001. The data were collected as part of NCHRP Project 17-29 (Lord et al., 2008a) and have been used extensively in previous studies. The mean and variance of the crash data are 2.84 and 32.4, respectively. Note that the crash count has a long tail (the maximum number of crashes is 97). Table 9 provides the summary statistics for the Texas data.

Table 9. Summary statistics of characteristics for individual road segments in the Texas data
Variable | Min | Max | Mean (SD) | Sum
Number of crashes (5 years) | | 97 | 2.84 (5.69) | 4,253
Average daily traffic (ADT) over the 5 years (F) | | | |
Lane width (LW) | | | (1.59) | -
Total shoulder width (SW) | | | (8.0) | -
Curve density (CD) | | | (2.35) | -
Segment length (L) (miles) | | | (0.67) |
SD = standard deviation.

Three subsets of 1,000 observations were randomly sampled from the whole dataset. For the entire dataset and each subset, the NB and SI models were fitted, and the dispersion parameter and the dispersion term were calculated, respectively. The following mean functional form is adopted:

$$\mu_i = \beta_0 \, L_i \, F_i^{\beta_1} \exp\left(\beta_2 LW_i + \beta_3 SW_i + \beta_4 CD_i\right) \qquad (7)$$

where $\mu_i$ is the estimated number of crashes at segment $i$; $L_i$ is the segment length in miles for segment $i$; $F_i$ is the flow (ADT over five years) traveling on segment $i$; $LW_i$ is the lane width for segment $i$; $SW_i$ is the total shoulder width in feet for segment $i$; $CD_i$ is the curve density (curves per mile) for segment $i$; and $\boldsymbol{\beta} = (\beta_0, \beta_1, \beta_2, \beta_3, \beta_4)'$ are the estimated coefficients. The modeling results for the NB and SI models are provided in Tables 10 and 11, respectively. First, for the Texas data, the goodness-of-fit statistics (log-likelihood, Akaike information criterion (AIC) and Bayesian information criterion (BIC)) indicate that the entire crash dataset and subsets 1 and 2 are better described by the SI models.
In addition, for the full dataset and all three subsets, the estimated dispersion parameters are all smaller than the estimated dispersion terms, and the relative difference can be as large as 50% (see, for example, the estimated values for subset 2). These results from the real crash data appear to agree with the outcome of experiment two, in which the dispersion parameter of the NB models consistently under-estimated the dispersion level of the simulated over-dispersed count data with a long tail. Since the goodness-of-fit statistics favor the SI models for the Texas data, the estimated dispersion term may better reflect the actual level of dispersion in this crash dataset. Second, the estimated regression coefficients (i.e., $\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$ and $\beta_4$) from the NB and SI models differ only slightly for all tested datasets. This supports the finding from the two simulation experiments that both models provide similar regression coefficient estimates.

Table 10. Modeling results for the full dataset and three subsets from the NB models
NB estimate | Full dataset (Value, SE) | Subset 1* (Value, SE) | Subset 2 (Value, SE) | Subset 3 (Value, SE)
Rows: Intercept $\ln(\beta_0)$; Ln(ADT); Lane width; Total shoulder width; Curve density; Dispersion parameter; Observations; Log-likelihood; AIC; BIC
* The maximum numbers of crashes for subsets 1, 2 and 3 are 97, 97 and 41, respectively.
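Two ingredients of the Texas analysis, the mean function of Eq. (7) and the information criteria used to compare the NB and SI fits, can be sketched in a few lines. This is an illustration under assumptions, not the authors' code: the function names are hypothetical, the intercept is passed as $\ln(\beta_0)$ to match the way Tables 10 and 11 report it, and the parameter counts (six for the NB model: five coefficients plus $\alpha$; seven for the SI model: five coefficients plus the two Sichel parameters) are assumed.

```python
import math


def predicted_crashes(length_mi: float, flow: float, lane_width: float,
                      shoulder_width: float, curve_density: float,
                      ln_b0: float, b1: float, b2: float, b3: float, b4: float) -> float:
    """Eq. (7): mu = b0 * L * F**b1 * exp(b2*LW + b3*SW + b4*CD), with b0 = exp(ln_b0)."""
    return length_mi * flow ** b1 * math.exp(
        ln_b0 + b2 * lane_width + b3 * shoulder_width + b4 * curve_density
    )


def aic(loglik: float, k: int) -> float:
    """Akaike information criterion, -2 lnL + 2k (smaller is better)."""
    return -2.0 * loglik + 2.0 * k


def bic(loglik: float, k: int, n: int) -> float:
    """Bayesian information criterion, -2 lnL + k ln(n) (smaller is better)."""
    return -2.0 * loglik + k * math.log(n)


# Assumed free-parameter counts: NB = 5 coefficients + alpha; SI = 5 coefficients + 2 Sichel parameters.
K_NB, K_SI = 6, 7
```

For example, with hypothetical log-likelihoods of -2,000 for the NB model and -1,990 for the SI model on 1,499 segments, `aic` gives 4,012 and 3,994, and `bic` also favors the SI model despite its extra parameter; these numbers are placeholders, not values from Tables 10 and 11.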

Table 11. Modeling results for the full dataset and three subsets from the SI models
SI estimate | Full dataset (Value, SE) | Subset 1 (Value, SE) | Subset 2 (Value, SE) | Subset 3 (Value, SE)
Rows: Intercept $\ln(\beta_0)$; Ln(ADT); Lane width; Total shoulder width; Curve density; Scale parameter; Scale parameter; Dispersion term; Observations; Log-likelihood; AIC; BIC

6. Discussion
The results of this paper deserve further discussion. Although different models have been proposed for analyzing over-dispersed data, the NB model is still frequently used by traffic safety researchers. However, the results from the simulation experiments raise a few issues about applying NB models to over-dispersed crash data with a long tail. Based on the simulation results in this study, the following conclusions can be made: (1) when the crash data are generated from NB models, the dispersion parameter and the dispersion term both perform very well in estimating the true dispersion parameter; (2) when the crash data are generated from SI models (i.e., the simulated crash counts contain large outliers), the dispersion parameter of the NB models consistently provides biased estimates of the true dispersion term. In sum, when large outliers are present in the crash counts, the dispersion parameter of the NB model can be biased, and the dispersion term of the SI model is more likely to reveal the true level of dispersion in the over-dispersed crash data. Considering these findings, transportation safety researchers are recommended to use the dispersion term of the SI model in crash data analysis, especially when a long tail is present.

Since EB estimates can be used to identify hotspots by ranking crash-prone locations and to assess the effects of implemented treatments, it is important to obtain reliable EB estimates. The dispersion parameter of NB models has been used extensively in the EB method. Similarly, practitioners can easily use the dispersion term of the SI model to obtain reliable EB estimates. Within the SI modeling framework, the long-term mean for a site $i$ using the EB method is given by (for the complete derivation, see Zou et al. (2013)):

$$\pi_i = w_i \mu_i + (1 - w_i)\, y_i \qquad (8)$$

where $\pi_i$ is the EB estimate of the expected number of crashes per year for site $i$; $\mu_i$ is the number of crashes estimated by the crash prediction model (here, an SI model) for site $i$; $w_i$ is the weight factor, estimated as a function of $\mu_i$ and the dispersion term $h(\sigma,\gamma) = 2\sigma(\gamma+1)/c + 1/c^2 - 1$; and $y_i$ is the observed number of crashes per year at site $i$. So far, the NB distribution has been the model most frequently used by transportation safety analysts for calculating EB estimates (Cheng and Washington, 2005; Huang et al., 2009), and the dispersion parameter can be assumed to be fixed for the entire dataset (Miaou, 1996) or to vary over different sites and periods (Hauer, 2001). However, many safety researchers may not be aware that the dispersion parameter of NB models can be biased, as demonstrated in experiment two. Lord (2006) showed that even a small error in the specification of the dispersion parameter can greatly affect the EB estimate. The dispersion term of SI models may provide a more reliable estimate of the level of dispersion in the data. Recently, two studies (Wu et al., 2014; Zou et al., 2013) compared the effects of the dispersion parameter and the dispersion term on the precision of the EB analysis. Zou et al. (2013) found that the choice of crash prediction model (i.e., the SI or NB model) affects the value of the weight factor used to compute the EB output. Moreover, Wu et al. (2014) conducted a simulation study whose results suggest that the SI-based EB method consistently provides better hotspot identification results than the NB-based EB method.
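Equation (8) is simple to evaluate once a weight is chosen. The following is a minimal sketch that assumes the weight has the same form as the usual NB-based EB weight, $w_i = 1/(1 + k\,\mu_i)$, where $k$ is either the NB dispersion parameter $\alpha$ or the SI dispersion term $h(\sigma,\gamma)$; that form for the SI case is an assumption here (the actual derivation is in Zou et al. (2013)). The function names and all numbers are hypothetical, and $\mu_i$ and $y_i$ are assumed to refer to the same period.

```python
def eb_weight(mu: float, dispersion: float) -> float:
    """Assumed EB weight w = 1 / (1 + dispersion * mu); dispersion is alpha (NB) or h (SI)."""
    return 1.0 / (1.0 + dispersion * mu)


def eb_estimate(mu: float, y: float, dispersion: float) -> float:
    """Eq. (8): pi = w * mu + (1 - w) * y."""
    w = eb_weight(mu, dispersion)
    return w * mu + (1.0 - w) * y


# Hypothetical site: model prediction 2 crashes/yr, observed 4 crashes/yr.
print(eb_estimate(2.0, 4.0, 0.5))   # dispersion value 0.5 -> 3.0
print(eb_estimate(2.0, 4.0, 0.25))  # smaller dispersion value 0.25 -> about 2.667
```

Because $w_i$ decreases as the dispersion value grows, an under-estimated dispersion value shifts weight toward the model prediction $\mu_i$ and away from the observed count $y_i$; in the example, lowering the value from 0.5 to 0.25 moves the EB estimate from 3.0 to about 2.67.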
Thus, transportation safety researchers have the option of using the dispersion term to calculate EB estimates. Note that the dispersion term of the SI models is not significantly more difficult to estimate than the dispersion parameter of NB models, since the gamlss package (Rigby and Stasinopoulos, 2013) in R has built-in functions that can handle both NB and SI models. In addition, the difference in computation time between the NB and SI models is very small, and the total time the models take to converge is a non-issue. For example, for the Texas data, both models (NB and SI) converged after 1 minute. Overall, neither the estimation of the dispersion term nor its computation time is a concern for transportation safety analysts.

7. Summary and conclusions
Given the importance of the dispersion parameter in various types of transportation safety studies, the objective of this paper was to investigate whether the dispersion parameter can truly reflect the level of dispersion in over-dispersed crash data with a long tail and whether the dispersion term of the SI model can be used as an alternative. The performance of the dispersion parameter and the dispersion term was examined using simulated datasets generated from various NB and SI regression models. Appropriate sample means and dispersion levels were selected to generate over-dispersed datasets that closely resemble empirical crash data. Crash counts simulated from SI regression models were found to usually have a longer tail than those generated from NB regression models. Moreover, the simulation results show that the dispersion parameter of NB models consistently underestimated the dispersion level of over-dispersed crash data generated from SI regression models, whereas the newly introduced dispersion term of SI models estimated the true level of dispersion with small estimation bias. Overall, considering that the dispersion parameter can be a biased estimator of the level of dispersion in the data, it is believed that the dispersion term may offer a viable alternative for analyzing over-dispersed crash data with a long tail. For future work, it would be useful to implement the SI-based EB method for identifying hotspots using crash severity data. Some new criteria (Cheng and Washington, 2008) could be considered to evaluate the effectiveness of the SI-based EB and NB-based EB methods.

References
Aguero-Valverde, J., Jovanis, P.P., 2008. Analysis of road crash frequency with spatial models. Transportation Research Record: Journal of the Transportation Research Board 2061.
Anastasopoulos, P.C., Mannering, F.L., 2009. A note on modeling vehicle accident frequencies with random-parameters count models. Accident Analysis & Prevention 41.
Bhat, C.R., Born, K., Sidharthan, R., Bhat, P.C., 2014. A Count Data Model with Endogenous Covariates: Formulation and Application to Roadway Crash Frequency at Intersections. Analytic Methods in Accident Research 1.
Castro, M., Paleti, R., Bhat, C.R., 2012. A latent variable representation of count data models to accommodate spatial and temporal dependence: Application to predicting crash frequency at intersections. Transportation Research Part B: Methodological 46.
Chang, L.-Y., 2005. Analysis of freeway accident frequencies: negative binomial regression versus artificial neural network. Safety Science 43.
Chen, E., Tarko, A.P., 2013. Modeling safety of highway work zones with random parameters and random effects models. Analytic Methods in Accident Research.
Cheng, L., Geedipally, S.R., Lord, D., 2013. The Poisson-Weibull generalized linear model for analyzing motor vehicle crash data. Safety Science 54.
Cheng, W., Washington, S., 2008. New Criteria for Evaluating Methods of Identifying Hot Spots. Transportation Research Record: Journal of the Transportation Research Board 2083.
Cheng, W., Washington, S.P., 2005. Experimental Evaluation of Hotspot Identification Methods. Accident Analysis and Prevention 37.
Chin, H.C., Quddus, M.A., 2003. Modeling Count Data with Excess Zeroes: An Empirical Application to Traffic Accidents. Sociological Methods & Research 32.
El-Basyouny, K., Sayed, T., 2006. Comparison of two negative binomial regression techniques in developing accident prediction models. Transportation Research Record: Journal of the Transportation Research Board 1950.
Francis, R.A., Geedipally, S.R., Guikema, S.D., Dhavala, S.S., Lord, D., LaRocca, S., 2012. Characterizing the Performance of the Conway-Maxwell Poisson Generalized Linear Model. Risk Analysis 32.
Geedipally, S.R., Lord, D., Dhavala, S.S., 2012. The Negative Binomial-Lindley Generalized Linear Model: Characteristics and Application using Crash Data. Accident Analysis and Prevention 45.
Guo, J., Trivedi, P., 2002. Flexible parametric models for long-tailed patent count distributions. Oxford Bulletin of Economics and Statistics 64.
Gupta, R.C., Ong, S., 2005. Analysis of long-tailed count data by Poisson mixtures. Communications in Statistics: Theory and Methods 34.
Hauer, E., Observational Before/After Studies in Road Safety. Estimating the Effect of Highway and Traffic Engineering Measures on Road Safety.
Hauer, E., 2001. Overdispersion in modelling accidents on road sections and in Empirical Bayes estimation. Accident Analysis & Prevention 33.
Hauer, E., Ng, J.C., Lovell, J., Estimation of safety at signalized intersections (with discussion and closure).
Hilbe, J.M., 2011. Negative binomial regression. Cambridge University Press.
Huang, H., Chin, H.C., Haque, M.M., 2009. Empirical Evaluation of Alternative Approaches in Identifying Crash Hot Spots. Transportation Research Record 2103.
Kumara, S., Chin, H.C., 2003. Modeling accident occurrence at signalized tee intersections with special emphasis on excess zeros. Traffic Injury Prevention 4.
Lord, D., 2006. Modeling motor vehicle crashes using Poisson-gamma models: Examining the effects of low sample mean values and small sample size on the estimation of the fixed dispersion parameter. Accident Analysis & Prevention 38.
Lord, D., Geedipally, S.R., Persaud, B.N., Washington, S.P., van Schalkwyk, I., Ivan, J.N., Lyon, C., Jonsson, T., 2008a. Methodology to predict the safety performance of rural multilane highways.
Lord, D., Guikema, S.D., Geedipally, S.R., 2008b. Application of the Conway-Maxwell-Poisson Generalized Linear Model for Analyzing Motor Vehicle Crashes. Accident Analysis and Prevention 40.


More information

Prediction of Bike Rental using Model Reuse Strategy

Prediction of Bike Rental using Model Reuse Strategy Prediction of Bike Rental using Model Reuse Strategy Arun Bala Subramaniyan and Rong Pan School of Computing, Informatics, Decision Systems Engineering, Arizona State University, Tempe, USA. {bsarun, rong.pan}@asu.edu

More information

Transportation and Road Weather

Transportation and Road Weather Portland State University PDXScholar TREC Friday Seminar Series Transportation Research and Education Center (TREC) 4-18-2014 Transportation and Road Weather Rhonda Young University of Wyoming Let us know

More information

DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1

DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1 DEVELOPMENT OF CRASH PREDICTION MODEL USING MULTIPLE REGRESSION ANALYSIS Harshit Gupta 1, Dr. Siddhartha Rokade 2 1 PG Student, 2 Assistant Professor, Department of Civil Engineering, Maulana Azad National

More information

High Friction Surface Treatment on Bridge Safety

High Friction Surface Treatment on Bridge Safety High Friction Surface Treatment on Bridge Safety Brian Porter/Rebecca Szymkowski- WisDOT Andrea Bill- UW-Madison TOPS Lab Objectives Weather in WI can be harsh Bridges can be problematic in inclement weather

More information

Modeling Simple and Combination Effects of Road Geometry and Cross Section Variables on Traffic Accidents

Modeling Simple and Combination Effects of Road Geometry and Cross Section Variables on Traffic Accidents Modeling Simple and Combination Effects of Road Geometry and Cross Section Variables on Traffic Accidents Terrance M. RENGARASU MS., Doctoral Degree candidate Graduate School of Engineering, Hokkaido University

More information

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1

Parametric Modelling of Over-dispersed Count Data. Part III / MMath (Applied Statistics) 1 Parametric Modelling of Over-dispersed Count Data Part III / MMath (Applied Statistics) 1 Introduction Poisson regression is the de facto approach for handling count data What happens then when Poisson

More information

EFFECT OF HIGHWAY GEOMETRICS ON ACCIDENT MODELING

EFFECT OF HIGHWAY GEOMETRICS ON ACCIDENT MODELING Sustainable Solutions in Structural Engineering and Construction Edited by Saha, S., Lloyd, N., Yazdani, S., and Singh, A. Copyright 2015 ISEC Press ISBN: 978-0-9960437-1-7 EFFECT OF HIGHWAY GEOMETRICS

More information

IDAHO TRANSPORTATION DEPARTMENT

IDAHO TRANSPORTATION DEPARTMENT RESEARCH REPORT IDAHO TRANSPORTATION DEPARTMENT RP 191A Potential Crash Reduction Benefits of Safety Improvement Projects Part A: Shoulder Rumble Strips By Ahmed Abdel-Rahim Mubassira Khan University of

More information

arxiv: v1 [stat.ap] 11 Aug 2008

arxiv: v1 [stat.ap] 11 Aug 2008 MARKOV SWITCHING MODELS: AN APPLICATION TO ROADWAY SAFETY arxiv:0808.1448v1 [stat.ap] 11 Aug 2008 (a draft, August, 2008) A Dissertation Submitted to the Faculty of Purdue University by Nataliya V. Malyshkina

More information

FREEWAY INCIDENT FREQUENCY ANALYSIS BASED ON CART METHOD

FREEWAY INCIDENT FREQUENCY ANALYSIS BASED ON CART METHOD XUECAI XU, Ph.D. E-mail: xuecai_xu@hust.edu.cn Huazhong University of Science and Technology & University of Hong Kong China ŽELJKO ŠARIĆ, Ph.D. Candidate E-mail: zeljko.saric@fpz.hr Faculty of Transport

More information

Planning Level Regression Models for Crash Prediction on Interchange and Non-Interchange Segments of Urban Freeways

Planning Level Regression Models for Crash Prediction on Interchange and Non-Interchange Segments of Urban Freeways Planning Level Regression Models for Crash Prediction on Interchange and Non-Interchange Segments of Urban Freeways Arun Chatterjee, Professor Department of Civil and Environmental Engineering The University

More information

Impact of Day-to-Day Variability of Peak Hour Volumes on Signalized Intersection Performance

Impact of Day-to-Day Variability of Peak Hour Volumes on Signalized Intersection Performance Impact of Day-to-Day Variability of Peak Hour Volumes on Signalized Intersection Performance Bruce Hellinga, PhD, PEng Associate Professor (Corresponding Author) Department of Civil and Environmental Engineering,

More information

ANALYSIS OF INTRINSIC FACTORS CONTRIBUTING TO URBAN ROAD CRASHES

ANALYSIS OF INTRINSIC FACTORS CONTRIBUTING TO URBAN ROAD CRASHES S. Raicu, et al., Int. J. of Safety and Security Eng., Vol. 7, No. 1 (2017) 1 9 ANALYSIS OF INTRINSIC FACTORS CONTRIBUTING TO URBAN ROAD CRASHES S. RAICU, D. COSTESCU & S. BURCIU Politehnica University

More information

Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes

Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes Confirmatory and Exploratory Data Analyses Using PROC GENMOD: Factors Associated with Red Light Running Crashes Li wan Chen, LENDIS Corporation, McLean, VA Forrest Council, Highway Safety Research Center,

More information

Comparative Analysis of Zonal Systems for Macro-level Crash Modeling: Census Tracts, Traffic Analysis Zones, and Traffic Analysis Districts

Comparative Analysis of Zonal Systems for Macro-level Crash Modeling: Census Tracts, Traffic Analysis Zones, and Traffic Analysis Districts Comparative Analysis of Zonal Systems for Macro-level Crash Modeling: Census Tracts, Traffic Analysis Zones, and Traffic Analysis Districts Qing Cai* Mohamed Abdel-Aty Jaeyoung Lee Naveen Eluru Department

More information

MODELING OF 85 TH PERCENTILE SPEED FOR RURAL HIGHWAYS FOR ENHANCED TRAFFIC SAFETY ANNUAL REPORT FOR FY 2009 (ODOT SPR ITEM No.

MODELING OF 85 TH PERCENTILE SPEED FOR RURAL HIGHWAYS FOR ENHANCED TRAFFIC SAFETY ANNUAL REPORT FOR FY 2009 (ODOT SPR ITEM No. MODELING OF 85 TH PERCENTILE SPEED FOR RURAL HIGHWAYS FOR ENHANCED TRAFFIC SAFETY ANNUAL REPORT FOR FY 2009 (ODOT SPR ITEM No. 2211) Submitted to: Ginger McGovern, P.E. Planning and Research Division Engineer

More information

Risk Assessment of Highway Bridges: A Reliability-based Approach

Risk Assessment of Highway Bridges: A Reliability-based Approach Risk Assessment of Highway Bridges: A Reliability-based Approach by Reynaldo M. Jr., PhD Indiana University-Purdue University Fort Wayne pablor@ipfw.edu Abstract: Many countries are currently experiencing

More information

$QDO\]LQJ$UWHULDO6WUHHWVLQ1HDU&DSDFLW\ RU2YHUIORZ&RQGLWLRQV

$QDO\]LQJ$UWHULDO6WUHHWVLQ1HDU&DSDFLW\ RU2YHUIORZ&RQGLWLRQV Paper No. 001636 $QDO\]LQJ$UWHULDO6WUHHWVLQ1HDU&DSDFLW\ RU2YHUIORZ&RQGLWLRQV Duplication for publication or sale is strictly prohibited without prior written permission of the Transportation Research Board

More information

Modeling of Accidents Using Safety Performance Functions

Modeling of Accidents Using Safety Performance Functions Modeling of Accidents Using Safety Performance Functions Khair S. Jadaan, Lamya Y. Foudeh, Mohammad N. Al-Marafi, and Majed Msallam Abstract Extensive research has been carried out in the field of road

More information

Re-visiting crash-speed relationships: A new perspective in crash modelling

Re-visiting crash-speed relationships: A new perspective in crash modelling 1 1 1 1 1 1 1 1 0 1 0 Re-visiting crash-speed relationships: A new perspective in crash modelling Maria-Ioanna M. Imprialou a, Mohammed Quddus a, David E. Pitfield a, Dominique Lord b a School of Civil

More information

Varieties of Count Data

Varieties of Count Data CHAPTER 1 Varieties of Count Data SOME POINTS OF DISCUSSION What are counts? What are count data? What is a linear statistical model? What is the relationship between a probability distribution function

More information

Mixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data

Mixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data Mixtures of Negative Binomial distributions for modelling overdispersion in RNA-Seq data Cinzia Viroli 1 joint with E. Bonafede 1, S. Robin 2 & F. Picard 3 1 Department of Statistical Sciences, University

More information

MODELING ACCIDENT FREQUENCIES AS ZERO-ALTERED PROBABILITY PROCESSES: AN EMPIRICAL INQUIRY

MODELING ACCIDENT FREQUENCIES AS ZERO-ALTERED PROBABILITY PROCESSES: AN EMPIRICAL INQUIRY Pergamon PII: SOOOl-4575(97)00052-3 Accid. Anal. and Prev., Vol. 29, No. 6, pp. 829-837, 1997 0 1997 Elsevier Science Ltd All rights reserved. Printed in Great Britain OOOI-4575/97 $17.00 + 0.00 MODELING

More information

Real-time Traffic Safety Evaluation Models And Their Application For Variable Speed Limits

Real-time Traffic Safety Evaluation Models And Their Application For Variable Speed Limits University of Central Florida Electronic Theses and Dissertations Doctoral Dissertation (Open Access) Real-time Traffic Safety Evaluation Models And Their Application For Variable Speed Limits 2013 Rongjie

More information

Hot Spot Analysis: Improving a Local Indicator of Spatial Association for Application in Traffic Safety

Hot Spot Analysis: Improving a Local Indicator of Spatial Association for Application in Traffic Safety Hot Spot Analysis: Improving a Local Indicator of Spatial Association for Application in Traffic Safety Elke Moons, Tom Brijs and Geert Wets Transportation Research Institute, Hasselt University, Science

More information

Markov-switching autoregressive latent variable models for longitudinal data

Markov-switching autoregressive latent variable models for longitudinal data Markov-swching autoregressive latent variable models for longudinal data Silvia Bacci Francesco Bartolucci Fulvia Pennoni Universy of Perugia (Italy) Universy of Perugia (Italy) Universy of Milano Bicocca

More information

FACTORS ASSOCIATED WITH MEDIAN RELATED CRASH FREQUENCY AND SEVERITY

FACTORS ASSOCIATED WITH MEDIAN RELATED CRASH FREQUENCY AND SEVERITY The Pennsylvania State University The Graduate School Department of Civil and Environmental Engineering FACTORS ASSOCIATED WITH MEDIAN RELATED CRASH FREQUENCY AND SEVERITY A Dissertation in Civil Engineering

More information

Safety Performance Functions for Partial Cloverleaf On-Ramp Loops for Michigan

Safety Performance Functions for Partial Cloverleaf On-Ramp Loops for Michigan 1 1 1 1 1 1 1 1 0 1 0 1 0 Safety Performance Functions for Partial Cloverleaf On-Ramp Loops for Michigan Elisha Jackson Wankogere Department of Civil and Construction Engineering Western Michigan University

More information

How GIS Can Help With Tribal Safety Planning

How GIS Can Help With Tribal Safety Planning How GIS Can Help With Tribal Safety Planning Thomas A. Horan, PhD Brian Hilton, PhD Arman Majidi, MAIS Center for Information Systems and Technology Claremont Graduate University Goals & Objectives This

More information

Transport Data Analysis and Modeling Methodologies

Transport Data Analysis and Modeling Methodologies 1 Transport Data Analysis and Modeling Methodologies Lab Session #12 (Random Parameters Count-Data Models) You are given accident, evirnomental, traffic, and roadway geometric data from 275 segments of

More information

A Joint Econometric Framework for Modeling Crash Counts by Severity

A Joint Econometric Framework for Modeling Crash Counts by Severity A Joint Econometric Framework for Modeling Crash Counts by Severity Shamsunnahar Yasmin Postdoctoral Associate Department of Civil, Environmental & Construction Engineering University of Central Florida

More information

NCHRP Inclusion Process and Literature Review Procedure for Part D

NCHRP Inclusion Process and Literature Review Procedure for Part D NCHRP 17-7 Inclusion Process and Literature Review Procedure for Part D Geni Bahar, P. Eng. Margaret Parkhill, P. Eng. Errol Tan, P. Eng. Chris Philp, P. Eng. Nesta Morris, M.Sc. (Econ) Sasha Naylor, EIT

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

Evaluation of Road Safety in Portugal: A Case Study Analysis. Instituto Superior Técnico

Evaluation of Road Safety in Portugal: A Case Study Analysis. Instituto Superior Técnico Evaluation of Road Safety in Portugal: A Case Study Analysis Ana Fernandes José Neves Instituto Superior Técnico OUTLINE Objectives Methodology Results Road environments Expected number of road accidents

More information

The LmB Conferences on Multivariate Count Analysis

The LmB Conferences on Multivariate Count Analysis The LmB Conferences on Multivariate Count Analysis Title: On Poisson-exponential-Tweedie regression models for ultra-overdispersed count data Rahma ABID, C.C. Kokonendji & A. Masmoudi Email Address: rahma.abid.ch@gmail.com

More information

A NOTE ON GENERALIZED ORDERED OUTCOME MODELS

A NOTE ON GENERALIZED ORDERED OUTCOME MODELS A NOTE ON GENERALIZED ORDERED OUTCOME MODELS Naveen Eluru* Associate Professor Department of Civil, Environmental & Construction Engineering University of Central Florida Tel: 1-407-823-4815, Fax: 1-407-823-3315

More information

Geospatial Big Data Analytics for Road Network Safety Management

Geospatial Big Data Analytics for Road Network Safety Management Proceedings of the 2018 World Transport Convention Beijing, China, June 18-21, 2018 Geospatial Big Data Analytics for Road Network Safety Management ABSTRACT Wei Liu GHD Level 1, 103 Tristram Street, Hamilton,

More information

Effectiveness of Experimental Transverse- Bar Pavement Marking as Speed-Reduction Treatment on Freeway Curves

Effectiveness of Experimental Transverse- Bar Pavement Marking as Speed-Reduction Treatment on Freeway Curves Effectiveness of Experimental Transverse- Bar Pavement Marking as Speed-Reduction Treatment on Freeway Curves Timothy J. Gates, Xiao Qin, and David A. Noyce Researchers performed a before-and-after analysis

More information

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago

Mohammed. Research in Pharmacoepidemiology National School of Pharmacy, University of Otago Mohammed Research in Pharmacoepidemiology (RIPE) @ National School of Pharmacy, University of Otago What is zero inflation? Suppose you want to study hippos and the effect of habitat variables on their

More information

Travel Time Calculation With GIS in Rail Station Location Optimization

Travel Time Calculation With GIS in Rail Station Location Optimization Travel Time Calculation With GIS in Rail Station Location Optimization Topic Scope: Transit II: Bus and Rail Stop Information and Analysis Paper: # UC8 by Sutapa Samanta Doctoral Student Department of

More information

Department of Civil Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States b

Department of Civil Engineering, University of Memphis, 3815 Central Avenue, Memphis, TN 38152, United States b 0 0 0 Prediction of Secondary Crash Frequency on Highway Networks Afrid A. Sarker a,c, Rajesh Paleti b, Sabyasachee Mishra a,c*, Mihalis M. Golias a,c, Philip B Freeze d a Department of Civil Engineering,

More information

Subject CS1 Actuarial Statistics 1 Core Principles

Subject CS1 Actuarial Statistics 1 Core Principles Institute of Actuaries of India Subject CS1 Actuarial Statistics 1 Core Principles For 2019 Examinations Aim The aim of the Actuarial Statistics 1 subject is to provide a grounding in mathematical and

More information

Modeling Crash Frequency of Heavy Vehicles in Rural Freeways

Modeling Crash Frequency of Heavy Vehicles in Rural Freeways Journal of Traffic and Logistics Engineering Vol. 4, No. 2, December 2016 Modeling Crash Frequency of Heavy Vehicles in Rural Freeways Reza Imaninasab School of Civil Engineering, Iran University of Science

More information

Safety Effects of Icy-Curve Warning Systems

Safety Effects of Icy-Curve Warning Systems Safety Effects of Icy-Curve Warning Systems Zhirui Ye, David Veneziano, and Ian Turnbull The California Department of Transportation (Caltrans) deployed an icy-curve warning system (ICWS) on a 5-mi section

More information

Integrating the macroscopic and microscopic traffic safety analysis using hierarchical models

Integrating the macroscopic and microscopic traffic safety analysis using hierarchical models University of Central Florida Electronic Theses and Dissertations Doctoral Dissertation (Open Access) Integrating the macroscopic and microscopic traffic safety analysis using hierarchical models 2017

More information

ROBUSTNESS OF TWO-PHASE REGRESSION TESTS

ROBUSTNESS OF TWO-PHASE REGRESSION TESTS REVSTAT Statistical Journal Volume 3, Number 1, June 2005, 1 18 ROBUSTNESS OF TWO-PHASE REGRESSION TESTS Authors: Carlos A.R. Diniz Departamento de Estatística, Universidade Federal de São Carlos, São

More information

Use of Crash Report Data for Safety Engineering in Small- and Mediumsized

Use of Crash Report Data for Safety Engineering in Small- and Mediumsized Use of Crash Report Data for Safety Engineering in Small- and Mediumsized MPOs 2015 AMPO Annual Conference Sina Kahrobaei, Transportation Planner Doray Hill, Jr., Director October 21, 2015 San Angelo MPO,

More information

A COMPARATIVE STUDY OF THE APPLICATION OF THE STANDARD KERNEL DENSITY ESTIMATION AND NETWORK KERNEL DENSITY ESTIMATION IN CRASH HOTSPOT IDENTIFICATION

A COMPARATIVE STUDY OF THE APPLICATION OF THE STANDARD KERNEL DENSITY ESTIMATION AND NETWORK KERNEL DENSITY ESTIMATION IN CRASH HOTSPOT IDENTIFICATION A COMPARATIVE STUDY OF THE APPLICATION OF THE STANDARD KERNEL DENSITY ESTIMATION AND NETWORK KERNEL DENSITY ESTIMATION IN CRASH HOTSPOT IDENTIFICATION Yue Tang Graduate Research Assistant Department of

More information

CATEGORICAL MODELING TO EVALUATE ROAD SAFETY AT THE PLANNING LEVEL

CATEGORICAL MODELING TO EVALUATE ROAD SAFETY AT THE PLANNING LEVEL CATEGORICAL MODELING TO EVALUATE ROAD SAFETY AT THE PLANNING LEVEL Sara Ferreira Assistant Professor of Civil Engineering, School of Engineering, University of Porto, Rua Dr Roberto Frias, Porto, Portugal,

More information

Specification testing in panel data models estimated by fixed effects with instrumental variables

Specification testing in panel data models estimated by fixed effects with instrumental variables Specification testing in panel data models estimated by fixed effects wh instrumental variables Carrie Falls Department of Economics Michigan State Universy Abstract I show that a handful of the regressions

More information

FINAL REPORT. City of Toronto. Contract Project No: B

FINAL REPORT. City of Toronto. Contract Project No: B City of Toronto SAFETY IMPACTS AND REGULATIONS OF ELECTRONIC STATIC ROADSIDE ADVERTISING SIGNS TECHNICAL MEMORANDUM #2B BEFORE/AFTER COLLISION ANALYSIS AT MID-BLOCK LOCATIONS FINAL REPORT 3027 Harvester

More information

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Probability Sampling Procedures Collection of Data Measures

More information

Petr Volf. Model for Difference of Two Series of Poisson-like Count Data

Petr Volf. Model for Difference of Two Series of Poisson-like Count Data Petr Volf Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodárenskou věží 4, 182 8 Praha 8 e-mail: volf@utia.cas.cz Model for Difference of Two Series of Poisson-like

More information

Xiaoguang Wang, Assistant Professor, Department of Geography, Central Michigan University Chao Liu,

Xiaoguang Wang,   Assistant Professor, Department of Geography, Central Michigan University Chao Liu, Xiaoguang Wang, Email: wang9x@cmich.edu Assistant Professor, Department of Geography, Central Michigan University Chao Liu, Email: cliu8@umd.edu Research Associate, National Center for Smart Growth, Research

More information

Markov switching multinomial logit model: an application to accident injury severities

Markov switching multinomial logit model: an application to accident injury severities Markov switching multinomial logit model: an application to accident injury severities Nataliya V. Malyshkina, Fred L. Mannering arxiv:0811.3644v1 [stat.ap] 21 Nov 2008 School of Civil Engineering, 550

More information