TRB Paper # Examining the Crash Variances Estimated by the Poisson-Gamma and Conway-Maxwell-Poisson Models


TRB Paper #11-2877
Examining the Crash Variances Estimated by the Poisson-Gamma and Conway-Maxwell-Poisson Models

Srinivas Reddy Geedipally (corresponding author)
Engineering Research Associate
Texas Transportation Institute
Texas A&M University
3135 TAMU
College Station, TX 77843-3135
Tel. (979) 862-1651
Fax. (979) 845-66
Email: srinivas-g@ttimail.tamu.edu

Dominique Lord
Associate Professor
Zachry Department of Civil Engineering
Texas A&M University
3136 TAMU
College Station, TX 77843-3136
Tel. (979) 458-3949
Fax. (979) 845-6481
Email: d-lord@tamu.edu

Word Count: 4,235 + 3,000 (4 tables + 8 figures) = 7,235 words

November 15, 2010

ABSTRACT

The Poisson-gamma (negative binomial or NB) distribution is still the most common probabilistic distribution used by transportation safety analysts for modeling motor vehicle crashes. Recent studies have shown that the Conway-Maxwell-Poisson (COM-Poisson) distribution is also a promising distribution for developing crash prediction models. The objectives of this study were to investigate and compare the crash variances predicted by the COM-Poisson GLM and the traditional NB model. The comparison was carried out using the functional forms most commonly employed by transportation safety analysts, which link crashes to the entering flows and other explanatory variables at intersections or on segments. To accomplish the objectives of the study, several NB and COM-Poisson GLMs, including flow-only models and models with several covariates, were developed and compared using two datasets. The first dataset contained crash data collected at signalized 4-legged intersections in Toronto, Ont. The second dataset included data collected for rural 4-lane undivided highways in Texas. The results of this study show that the trend of the crash variance predicted by the COM-Poisson GLM is similar to that predicted by the NB model. The Spearman's rank correlation coefficients between the crash variances predicted by the COM-Poisson and NB models indicate a nearly perfect monotone increasing relationship, with very highly correlated values. This means that a site characterized by a large variance will essentially be identified as such whether the NB or the COM-Poisson model is used.

INTRODUCTION

In highway safety, the traditional Poisson and mixed-Poisson models are the most common probabilistic models used for analyzing crash data. Crash data have often been found to exhibit over-dispersion (i.e., the variance is larger than the mean), and thus mixed-Poisson models (such as the Poisson-gamma or negative binomial) are generally preferred over the traditional Poisson model. The Conway-Maxwell-Poisson (COM-Poisson) distribution is another generalization of the Poisson distribution that can be used for analyzing crash data. It was originally developed in 1962 as a method for modeling both under-dispersed and over-dispersed count data (1). After a long period in which it was not widely used, the COM-Poisson distribution was revived by Shmueli et al. (2). The COM-Poisson model can also handle under-dispersed data (which the NB GLM cannot model, or for which it has difficulty converging) and datasets that contain intermingled over- and under-dispersed counts (for dual-link models only, since the dispersion characteristic is captured using the covariate-dependent shape parameter).

Recent research in highway safety has shown that the dispersion parameter of the Poisson-gamma model can potentially depend on the covariates of the model and could vary from one observation to another (3-7). This characteristic has been shown to be especially important when the mean function is mis-specified, as in models that incorporate only entering traffic flows (8). Furthermore, previous studies have reported that Poisson-gamma models with a varying dispersion parameter provide a better statistical fit (9-11). Similarly, the shape parameter of the COM-Poisson model can be related to covariates through a link function, which allows the amount of over-dispersion or under-dispersion to vary across observations. COM-Poisson models with a varying shape parameter are therefore also expected to provide an improved statistical fit.
The primary objective of this research was to examine whether the Poisson-gamma and COM-Poisson models show similar trends when estimating crash variance. To accomplish this objective, NB and COM-Poisson GLMs were developed and compared using two datasets. The first dataset contained crash data collected at 4-legged signalized intersections in Toronto, Ont. The second dataset included data collected for rural 4-lane undivided highways in Texas. Both flow-only models and models with covariates were evaluated.

This paper is organized as follows. The first section provides a brief overview of the characteristics of Poisson-gamma and COM-Poisson models. The second section describes the methodology for estimating and comparing the models. The third section presents the summary statistics of the two datasets. The fourth section presents the results of the analysis. The last section summarizes the research and outlines avenues for further work.

BACKGROUND

This section briefly describes the characteristics of the Poisson-gamma and COM-Poisson models, respectively.

POISSON-GAMMA MODEL

The Poisson-gamma (negative binomial or NB) distribution is the most common probabilistic distribution used by transportation safety analysts for modeling motor vehicle crashes (3, 4, 12, 13). The Poisson-gamma model has the following structure (14): the number of crashes $Y_{it}$ for a particular site i and time period t, conditional on its mean $\mu_{it}$, is Poisson distributed and independent over all sites and time periods:

$$Y_{it} \mid \mu_{it} \sim \mathrm{Po}(\mu_{it}), \qquad i = 1, 2, \ldots, I; \quad t = 1, 2, \ldots, T \qquad (1)$$

The mean of the Poisson is structured as:

$$\mu_{it} = f(X; \beta)\exp(e_{it}) \qquad (2)$$

where $f(\cdot)$ is a function of the covariates (X); $\beta$ is a vector of unknown coefficients; and $e_{it}$ is the model error, independent of all the covariates, with $\exp(e_{it})$ gamma distributed with mean 1 and variance $1/\phi$. With this characteristic, it can be shown that $Y_{it}$, conditional on $\mu_{it}$ and $\phi$, is distributed as a Poisson-gamma random variable with mean $\mu_{it}$ and variance $\mu_{it} + \mu_{it}^{2}/\phi$. (Note: other variance functions exist for the Poisson-gamma model, but they are not covered here because they are seldom used in highway safety studies. The reader is referred to references 15 and 16 for a description of alternative variance functions.) The probability density function (PDF) of the Poisson-gamma structure described above is given by:

$$f(y_{it}; \phi, \mu_{it}) = \frac{\Gamma(y_{it} + \phi)}{\Gamma(\phi)\, y_{it}!} \left(\frac{\phi}{\mu_{it} + \phi}\right)^{\phi} \left(\frac{\mu_{it}}{\mu_{it} + \phi}\right)^{y_{it}} \qquad (3)$$

where $y_{it}$ = response variable for observation i and time period t; $\mu_{it}$ = mean response for observation i and time period t; and $\phi$ = inverse dispersion parameter of the Poisson-gamma distribution. Note that as $\phi \to \infty$, the crash variance equals the crash mean and the model reverts to the standard Poisson regression model.
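To make Equation (3) and its variance function concrete, the following Python sketch evaluates the Poisson-gamma probability on the log scale and recovers the mean $\mu$ and the variance $\mu + \mu^2/\phi$ by summing a truncated series. The function names and the truncation point `y_max` are illustrative choices, not part of the original study.

```python
from math import exp, lgamma, log


def nb_pmf(y: int, mu: float, phi: float) -> float:
    """Poisson-gamma (NB) probability of y crashes, Equation (3).

    mu  -- mean number of crashes
    phi -- inverse dispersion parameter (variance = mu + mu**2 / phi)
    """
    log_p = (
        lgamma(y + phi) - lgamma(phi) - lgamma(y + 1)
        + phi * log(phi / (mu + phi))
        + y * log(mu / (mu + phi))
    )
    return exp(log_p)


def nb_moments(mu: float, phi: float, y_max: int = 2000) -> tuple[float, float]:
    """Mean and variance from the pmf summed up to y_max (tail assumed negligible)."""
    probs = [nb_pmf(y, mu, phi) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    mu, phi = 10.0, 2.0  # illustrative values only
    mean, var = nb_moments(mu, phi)
    print(f"numerical mean={mean:.4f}, variance={var:.4f}")
    print(f"closed-form variance mu + mu^2/phi = {mu + mu ** 2 / phi:.4f}")
```

Both printed variances should agree (60 for these inputs); increasing `phi` pulls the variance toward the mean, which is the Poisson limit noted above.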

The term $\phi$ is usually defined as the "inverse dispersion parameter" of the Poisson-gamma distribution. (Note: in the statistical and econometric literature, $\alpha = 1/\phi$ is usually defined as the dispersion parameter; in some published documents, the variable $\alpha$ has also been defined as the over-dispersion parameter.) This term has traditionally been assumed to be fixed, with a single value applied to the entire dataset in the study. As discussed above, recent research in highway safety has shown that the dispersion parameter can potentially depend on the covariates of the model and could vary from one observation to another (3-7).

COM-POISSON MODEL

The COM-Poisson distribution has recently been used for modeling motor vehicle crashes (17, 18). Shmueli et al. (2) elucidated the statistical properties of the COM-Poisson distribution using the formulation given by Conway and Maxwell (1), and Kadane et al. (19) developed the conjugate distributions for the parameters of the COM-Poisson distribution. Its probability mass function (PMF) is given by Equations (4) and (5):

$$P(Y = y) = \frac{1}{Z(\lambda, \nu)} \frac{\lambda^{y}}{(y!)^{\nu}} \qquad (4)$$

$$Z(\lambda, \nu) = \sum_{n=0}^{\infty} \frac{\lambda^{n}}{(n!)^{\nu}} \qquad (5)$$

where Y is a discrete count; $\lambda$ is a centering parameter that is approximately the mean of the observations in many cases; and $\nu$ is the shape parameter of the COM-Poisson distribution. The centering parameter $\lambda$ is approximately the mean when $\nu$ is close to one, but differs substantially from the mean for small $\nu$. Given that $\nu$ would be expected to be small for over-dispersed data, a COM-Poisson model based on the original formulation is difficult to interpret and use for over-dispersed data. To circumvent this problem, Guikema and Coffelt (20) proposed a re-parameterization of the COM-Poisson distribution that substitutes $\mu = \lambda^{1/\nu}$ to provide a clear centering parameter. This new formulation is summarized in Equations (6) and (7):

$$P(Y = y) = \frac{1}{S(\mu, \nu)} \left(\frac{\mu^{y}}{y!}\right)^{\nu} \qquad (6)$$

$$S(\mu, \nu) = \sum_{n=0}^{\infty} \left(\frac{\mu^{n}}{n!}\right)^{\nu} \qquad (7)$$
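The two COM-Poisson parameterizations differ only by the substitution $\lambda = \mu^{\nu}$. The Python sketch below evaluates Equations (4)-(7) with the infinite sums truncated at `n_max` and works on the log scale for numerical stability. The truncation is an assumption of the sketch: when $\nu$ is very small the terms decay slowly and `n_max` must be increased.

```python
from math import exp, lgamma, log


def _log_sum_exp(values: list[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))


def com_pmf_lambda(y: int, lam: float, nu: float, n_max: int = 500) -> float:
    """Original COM-Poisson, Eqs. (4)-(5): lam**y / (y!)**nu / Z(lam, nu)."""
    log_terms = [n * log(lam) - nu * lgamma(n + 1) for n in range(n_max + 1)]
    log_z = _log_sum_exp(log_terms)
    return exp(y * log(lam) - nu * lgamma(y + 1) - log_z)


def com_pmf_mu(y: int, mu: float, nu: float, n_max: int = 500) -> float:
    """Re-parameterized COM-Poisson, Eqs. (6)-(7): (mu**y / y!)**nu / S(mu, nu)."""
    log_terms = [nu * (n * log(mu) - lgamma(n + 1)) for n in range(n_max + 1)]
    log_s = _log_sum_exp(log_terms)
    return exp(nu * (y * log(mu) - lgamma(y + 1)) - log_s)


if __name__ == "__main__":
    mu, nu = 5.0, 0.4        # hypothetical values; nu < 1 means over-dispersion
    lam = mu ** nu           # substitution linking the two parameterizations
    for y in (0, 3, 10):
        print(y, com_pmf_lambda(y, lam, nu), com_pmf_mu(y, mu, nu))
```

Both columns should match for every count, and setting `nu = 1` reduces either function to the ordinary Poisson probability.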
The mean and variance of Y are given in terms of the new formulation as

$$E[Y] = \frac{1}{\nu}\frac{\partial \log S}{\partial \log \mu}, \qquad \mathrm{Var}[Y] = \frac{1}{\nu^{2}}\frac{\partial^{2} \log S}{\partial (\log \mu)^{2}}$$

with asymptotic approximations

$$E[Y] \approx \mu + \frac{1}{2\nu} - \frac{1}{2}, \qquad \mathrm{Var}[Y] \approx \frac{\mu}{\nu}$$

that are especially accurate once $\mu > 10$. With this new parameterization, the integral part of $\mu$ is now the mode, leaving $\mu$ as a reasonable approximation of the mean. The substitution also allows $\nu$ to keep its role as a shape parameter: if $\nu < 1$, the variance is greater than the mean, while $\nu > 1$ leads to under-dispersion.

Guikema and Coffelt (20) developed a COM-Poisson GLM framework for modeling discrete count data. Their approach relied on MCMC to fit a dual-link GLM based on the COM-Poisson distribution, and it used the reformulation above to provide a more direct centering parameter than the original COM-Poisson formulation. Sellers and Shmueli (21) developed a maximum likelihood estimation (MLE) approach for a single-link GLM based on the original COM-Poisson distribution. Equations (8) and (9) describe the dual-link framework, in which both the mean and the variance depend on the covariates. In these equations, $x_i$ and $z_j$ are covariates; p covariates are assumed to be used in the centering link function and q covariates in the shape link function (similar to the varying dispersion parameter of the Poisson-gamma model proposed by 3, 7, 9, 22). The sets of covariates used in the two link functions do not necessarily have to be identical.

$$\ln(\mu) = \beta_0 + \sum_{i=1}^{p} \beta_i x_i \qquad (8)$$

$$\ln(\nu) = \gamma_0 + \sum_{j=1}^{q} \gamma_j z_j \qquad (9)$$

The GLM framework can model under-dispersed datasets, over-dispersed datasets, and datasets that contain intermingled under-dispersed and over-dispersed counts (for dual-link models only, since the dispersion characteristic is captured using the covariate-dependent shape parameter). The variance is allowed to depend on the covariate values, which can be important if high (or low) values of some covariates tend to decrease the variance while high (or low) values of other covariates tend to increase it.
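The moment formulas and the dual-link structure can be checked numerically. The Python sketch below, a minimal illustration rather than the estimation procedure used in the study, computes the exact COM-Poisson mean and variance from the truncated series of Equation (7), compares them with the asymptotic approximations, and maps covariates to $\mu$ and $\nu$ through the log links of Equations (8) and (9). All coefficient and covariate values are hypothetical.

```python
from math import exp, lgamma, log


def com_moments(mu: float, nu: float, n_max: int = 1000) -> tuple[float, float]:
    """Mean and variance of the (mu, nu) COM-Poisson from the truncated series."""
    log_w = [nu * (n * log(mu) - lgamma(n + 1)) for n in range(n_max + 1)]
    m = max(log_w)
    w = [exp(v - m) for v in log_w]          # unnormalized probabilities
    total = sum(w)
    mean = sum(n * wn for n, wn in enumerate(w)) / total
    var = sum((n - mean) ** 2 * wn for n, wn in enumerate(w)) / total
    return mean, var


def com_moments_approx(mu: float, nu: float) -> tuple[float, float]:
    """Asymptotic approximations: E[Y] ~ mu + 1/(2 nu) - 1/2, Var[Y] ~ mu / nu."""
    return mu + 1.0 / (2.0 * nu) - 0.5, mu / nu


def dual_link(x, z, beta, gamma):
    """Eqs. (8)-(9): log links for the centering (mu) and shape (nu) parameters.

    beta = [beta_0, beta_1, ..., beta_p]; gamma = [gamma_0, gamma_1, ..., gamma_q].
    """
    mu = exp(beta[0] + sum(b * xi for b, xi in zip(beta[1:], x)))
    nu = exp(gamma[0] + sum(g * zj for g, zj in zip(gamma[1:], z)))
    return mu, nu


if __name__ == "__main__":
    # Hypothetical coefficients and covariates, for illustration only.
    mu, nu = dual_link(x=[1.0], z=[1.0], beta=[2.0, 0.5], gamma=[-0.5, -0.2])
    print("mu, nu:", mu, nu)
    print("exact :", com_moments(mu, nu))
    print("approx:", com_moments_approx(mu, nu))
```

Because $\nu < 1$ in this example, the variance returned by both routines exceeds the mean, illustrating the over-dispersed case.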
The parameters have a direct link to either the mean or the variance, providing insight into the behavior and driving factors in the problem, and the mean and variance of the predicted counts are readily approximated from the covariate values and regression parameter estimates.

METHODOLOGY

This section describes the methodology used for estimating the different NB and COM-Poisson models. For each dataset, COM-Poisson GLMs and NB models were initially estimated with a fixed shape parameter and a fixed dispersion parameter, respectively. The models were then re-estimated using different parameterizations of a varying shape parameter and a varying dispersion parameter. The functional forms used for the models were the following (note: the centering parameter $\mu$ is the mean for the NB model, whereas it is approximately the mode for the COM-Poisson model):

Toronto intersection data:

Centering parameter:
$$\mu_i = \beta_0 F_{Maj,i}^{\beta_1} F_{Min,i}^{\beta_2} \qquad (10)$$

Shape parameter (of COM-Poisson model):
$$\nu_i = \gamma_0 F_{Maj,i}^{\gamma_1} F_{Min,i}^{\gamma_2} \qquad (11)$$

Dispersion parameter (of NB model):
$$\phi_i = \gamma_0 F_{Maj,i}^{\gamma_1} F_{Min,i}^{\gamma_2} \qquad (12)$$

Texas segment data:

Centering parameter:
$$\mu_j = \beta_0 \times L_j \times F_j^{\beta_1} \times e^{\beta_2 LW_j + \beta_3 SW_j + \beta_4 CD_j} \qquad (13)$$

Shape parameter (of COM-Poisson model):
Model 1: $$\nu_j = \gamma_0 L_j \qquad (14)$$
Model 2: $$\nu_j = \gamma_0 / L_j \qquad (15)$$
Model 3: $$\nu_j = \gamma_0 L_j^{\gamma_1} \qquad (16)$$

Dispersion parameter (of NB model):
Model 1: $$\phi_j = \gamma_0 L_j \qquad (17)$$
Model 2: $$\phi_j = \gamma_0 / L_j \qquad (18)$$
Model 3: $$\phi_j = \gamma_0 L_j^{\gamma_1} \qquad (19)$$

where $\mu_i$ = the mean number of crashes for intersection i; $\mu_j$ = the mean number of crashes per year for segment j; $F_{Maj,i}$ = entering flow for the major approach (average annual daily traffic or AADT) for intersection i; $F_{Min,i}$ = entering flow for the minor approach (AADT) for intersection i; $F_j$ = flow traveling on segment j (AADT) for time period t; $L_j$ = length in miles for segment j; $LW_j$ = lane width in ft for segment j; $SW_j$ = total shoulder width in ft for segment j; $CD_j$ = curve density (curves per mile) for segment j; and the $\beta$'s and $\gamma$'s = estimated coefficients.

The coefficients of the COM-Poisson GLMs and NB models were estimated using the software WinBUGS (23). Vague (non-informative) hyper-priors were used for both the COM-Poisson and NB GLMs. Three Markov chains were used in the estimation process, and the Gelman-Rubin (G-R) convergence statistic was used to verify that the simulation runs converged properly.

DATA DESCRIPTION

This section describes the characteristics of the two datasets. The first dataset contained crash data collected in 1995 at 4-legged signalized intersections located in Toronto, Ont. The data have previously been used for several research projects and have been found to be of relatively good quality (3, 17, 24-26). In total, 868 signalized intersections were included in this dataset. The second dataset contained crash data collected from 1997 to 2001 at 4-lane rural undivided segments in Texas. The data were provided by the Texas Department of Public Safety (DPS) and the Texas Department of Transportation (TxDOT) and were used for NCHRP Project 17-29 (Methodology for Estimating the Safety Performance of Multilane Rural Highways) (27). The final database included 1,499 segments, each at least 0.1 mile long. Table 1 presents the summary statistics for the two datasets used in this study.
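The paper states only that three chains and the Gelman-Rubin statistic were used to check convergence. As a rough illustration of what that check computes, the Python sketch below implements the classic potential scale reduction factor (R-hat) for a single parameter. It is an assumption-laden sketch: WinBUGS reports its own Brooks-Gelman-Rubin variant, so its numbers need not match, and the simulated chains are hypothetical. Values close to 1 (a commonly used rule of thumb is below about 1.1) suggest the chains have mixed.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains: list[list[float]]) -> float:
    """Potential scale reduction factor (R-hat) for one scalar parameter.

    chains -- m chains of equal length n, with burn-in already discarded.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    w = mean([variance(c) for c in chains])   # mean within-chain variance
    b = n * variance(chain_means)             # between-chain variance
    var_hat = (n - 1) / n * w + b / n         # pooled estimate of posterior variance
    return (var_hat / w) ** 0.5


if __name__ == "__main__":
    rng = random.Random(42)
    mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
    stuck = mixed[:2] + [[x + 3.0 for x in mixed[2]]]   # third chain offset by 3
    print("well-mixed R-hat:", round(gelman_rubin(mixed), 3))
    print("shifted R-hat   :", round(gelman_rubin(stuck), 3))
```

The shifted example shows how a chain stuck in a different region inflates the between-chain variance and pushes R-hat well above 1.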

Table 1. Summary Statistics for the Toronto and Texas Data
Min. Max. Average Total Crashes 54 11.56 (1.2) 13 Toronto Major AADT 5469 72178 2844.81 (166.4) -- Minor AADT 53 42644 111.18 (8599.4) -- Crashes 97 2.84 (5.69) 4,253 Length (miles).1 6.275.55 (.67) 83.5 AADT 42 24,8 6,613.61 (41.1) -- Texas Lane Width (Feet) 9.75 16.5 12.57 (1.59) Shoulder Width (Right + Left) (Feet) 4 9.96 (8.2) -- Number of Horizontal Curves 16.7 (1.32) 152

RESULTS

This section presents the modeling results for the COM-Poisson GLMs and the NB models and is divided into two parts. The first part explains the modeling results for the Toronto data; the second part provides details about the modeling results for the Texas data.

TORONTO DATA

Table 2 summarizes the results of the COM-Poisson and NB GLMs for the Toronto data. The coefficients for the flow parameters are below one, which indicates that crash risk increases at a decreasing rate as traffic flow increases. It should be pointed out that the 95% marginal posterior credible intervals for each of the coefficients did not include the origin. The Deviance Information Criterion (DIC) values show that there is no significant difference in fit among the various models (a result that supports the finding of Lord et al. (17)). However, as expected, the NB model with a varying dispersion parameter showed a slightly better fit.

Table 2. Modeling Results for the COM-Poisson and NB GLMs using the Toronto Data
COM-Poisson NB Estimates Fixed shape parameter Varying shape parameter Fixed dispersion parameter Varying dispersion parameter Ln( ) -11.53 (.4159) -1.67 (.5249) -1.11 (.4794) -1.28 (.451) 1.635 (.4742).5498 (.4841).671 (.46).6161 (.4695) 2.795 (.311).7971 (.345).6852 (.21).6943 (.2295).348 (.283) -- -- -- -- -- 7.12 (.619) -- Ln( ) -- 4.882 (1.753) -- 2.764 (1.663) 1 -- -.5945 (.1734) -- -.49 (.1856) 2 --.133 (.5463) --.3671 (.149) DIC 4953.7 4937.48 4777.59 4762.38

Figure 1 illustrates the frequency distribution of the varying shape parameter of the COM-Poisson distribution across all observations. Interestingly, the frequency distribution can be approximated by a normal- or lognormal-shaped distribution. The highest frequency (i.e., the mode) occurred in the 0.3-0.4 range, and the average shape parameter value was 0.36, which is slightly higher than the value found for the fixed shape parameter (0.34).
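DIC is the criterion used above to compare model fit. The minimal Python sketch below shows how it is assembled from posterior deviance, assuming a toy one-parameter Poisson model with hypothetical counts and hypothetical posterior draws (not the study's data or WinBUGS output): $\bar{D}$ is the posterior mean deviance, $\hat{D}$ is the deviance at the posterior mean, $p_D = \bar{D} - \hat{D}$ is the effective number of parameters, and $DIC = \bar{D} + p_D$. Lower values indicate a better trade-off between fit and complexity.

```python
from math import lgamma, log


def poisson_deviance(lam: float, ys: list[int]) -> float:
    """Deviance D(theta) = -2 log L for i.i.d. Poisson counts with mean lam."""
    return -2.0 * sum(y * log(lam) - lam - lgamma(y + 1) for y in ys)


def dic(draws: list[float], ys: list[int]) -> tuple[float, float]:
    """Return (DIC, pD) from posterior draws of a single Poisson mean."""
    d_bar = sum(poisson_deviance(t, ys) for t in draws) / len(draws)  # posterior mean deviance
    d_hat = poisson_deviance(sum(draws) / len(draws), ys)            # deviance at posterior mean
    p_d = d_bar - d_hat                                               # effective number of parameters
    return d_bar + p_d, p_d


if __name__ == "__main__":
    crashes = [2, 0, 5, 3, 1, 4]                  # hypothetical counts
    posterior = [2.2, 2.6, 2.5, 2.4, 2.8, 2.3]    # hypothetical posterior draws
    total, p_d = dic(posterior, crashes)
    print(f"DIC = {total:.2f}, pD = {p_d:.3f}")
```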

[Figure 1. Frequency Distribution of the Varying Shape Parameter of the COM-Poisson Model (Mean = 0.36); x-axis: shape parameter, y-axis: frequency]

Figure 2 illustrates the distribution of the varying (inverse) dispersion parameter across all observations. As seen, there is wide variation in the dispersion parameter among observations. The highest frequency (i.e., the mode) occurred in the 9-10 range, whereas the average dispersion parameter value was 7.0. Nevertheless, the average value of the varying dispersion parameter is very close to the fixed dispersion parameter (7.1).

[Figure 2. Frequency Distribution of the Inverse Dispersion Parameter of the NB Model (Mean = 7.0); x-axis: inverse dispersion parameter, y-axis: frequency]

Figure 3 compares the crash variance predicted by the COM-Poisson and NB models. For sites with a crash mean less than 2, both models predict almost the same variance. However, for sites with a higher mean, the NB model predicts a slightly higher variance than the COM-Poisson model, although the shapes of the two curves are similar. The discrepancy between the fit and the smaller variance can be explained by the fact that the variance, via the dispersion parameter, is estimated directly from the data and independently of the mean (see 28 for additional information on this topic). The Spearman's rank correlation coefficient between the variances predicted by the two models was 0.999, which indicates a nearly perfect monotone increasing relationship. For this dataset, the variance may be slightly better captured by the COM-Poisson than by the NB model, especially at larger mean values.

[Figure 3. Crash Variance versus Crash Mean for the Toronto Data]

Figure 4 illustrates the frequency distribution of crash variance predicted by the COM-Poisson and NB models. As discussed above, the frequency of sites with low variance is higher with the COM-Poisson model than with the NB model.
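The Spearman rank correlation reported above compares how the two models order sites by predicted variance. The Python sketch below is a minimal illustration under stated assumptions: it uses the NB variance $\mu + \mu^2/\phi$ and the asymptotic COM-Poisson variance $\mu/\nu$, treats $\mu$ as the crash mean for both models (a simplification, since the COM-Poisson $\mu$ is approximately the mode and the approximation is least accurate for small means), and uses hypothetical site means with round values of $\phi$ and $\nu$ chosen for illustration rather than taken from the fitted models.

```python
def nb_variance(mu: float, phi: float) -> float:
    """Poisson-gamma variance: mu + mu**2 / phi."""
    return mu + mu ** 2 / phi


def com_variance_approx(mu: float, nu: float) -> float:
    """Asymptotic COM-Poisson variance mu / nu (most accurate for larger mu)."""
    return mu / nu


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a: list[float], b: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)


if __name__ == "__main__":
    site_means = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 40.0]   # hypothetical site means
    phi, nu = 7.1, 0.35                                    # illustrative fixed values
    var_nb = [nb_variance(m, phi) for m in site_means]
    var_com = [com_variance_approx(m, nu) for m in site_means]
    print("Spearman rho:", spearman(var_nb, var_com))
```

With fixed $\phi$ and $\nu$, both variance functions increase monotonically in $\mu$, so $\rho$ is 1 (up to rounding) whatever values are chosen. A coefficient below one requires the two models to order sites differently, for example because the dispersion or shape parameter varies across sites or because the two models fit slightly different means.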

[Figure 4. Frequency Distribution of Crash Variance for the Toronto Data; series: COM, NB; x-axis: variance, y-axis: frequency]

TEXAS DATA

Table 3 presents the results of the COM-Poisson GLM for the Texas data. The coefficient for the flow parameter is above one for all models except Model 1, which indicates that crash risk increases at an increasing rate as traffic flow increases. It should be pointed out that the 95% marginal posterior credible intervals for each of the coefficients did not include the origin. The DIC values show that Model 3 fits the data better than all other models. Model 3 was also found to be the best model in Geedipally et al. (22).

Table 3. Modeling Results for the COM-Poisson GLM using the Texas Data
Estimates Fixed shape Model 1 Model 2 Model 3 Ln( ) -8.845 (.673) -5.746 (.239) -27.18 (3.741) -13.59 (.947) 1 1.298 (.81).975 (.29) 3.97 (.366) 1.764 (.18) 2 -.14 (.19) -.17 (.17) -.11 (.51) -.99 (.26) 3 -.18 (.4) -.18 (.3) -.19 (.12) -.18 (.6) 4.94 (.13).139 (.9).168 (.39).96 (.18).419 (.26) -- -- -- Ln( ) -- -.321 (.26) -2.984 (.78) -1.792 (.19) 1 -- -- -- -.548 (.43) DIC 5159.3 6481.6 576.2 5.6

Table 4 presents the results of the NB GLM for the Texas data. The coefficient for the flow parameter is below one for all models, which indicates that crash risk increases at a decreasing rate as traffic flow increases. It should be pointed out that the 95% marginal posterior credible intervals for each of the coefficients did not include the origin. The DIC values show that Model 1 fits the data better than all other models, although Model 3 is a close second. In addition, the NB model fits the data slightly better than the COM-Poisson model.
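The Texas functional forms can be made concrete with a short sketch. The Python below assumes the centering function of Equation (13) and the three segment-length parameterizations $\gamma_0 L_j$, $\gamma_0 / L_j$ and $\gamma_0 L_j^{\gamma_1}$ of Equations (14)-(19); the coefficient values are hypothetical and are not the Table 3 or Table 4 estimates. It shows why a flow coefficient above one means crashes grow faster than proportionally with AADT: doubling $F_j$ multiplies $\mu_j$ by $2^{\beta_1}$.

```python
from math import exp


def texas_mu(beta, length, aadt, lane_width, shoulder_width, curve_density):
    """Eq. (13): mu_j = b0 * L_j * F_j**b1 * exp(b2*LW_j + b3*SW_j + b4*CD_j)."""
    b0, b1, b2, b3, b4 = beta
    return (b0 * length * aadt ** b1
            * exp(b2 * lane_width + b3 * shoulder_width + b4 * curve_density))


def varying_parameter(model: int, gamma0: float, length: float, gamma1: float = 1.0) -> float:
    """Eqs. (14)-(19): COM-Poisson shape or NB inverse dispersion as a function of L_j."""
    if model == 1:
        return gamma0 * length              # gamma0 * L_j
    if model == 2:
        return gamma0 / length              # gamma0 / L_j
    if model == 3:
        return gamma0 * length ** gamma1    # gamma0 * L_j**gamma1
    raise ValueError("model must be 1, 2 or 3")


if __name__ == "__main__":
    beta = (1e-4, 1.2, -0.05, -0.02, 0.1)   # hypothetical coefficients
    base = texas_mu(beta, 0.5, 6000, 12.0, 10.0, 1.0)
    doubled = texas_mu(beta, 0.5, 12000, 12.0, 10.0, 1.0)
    print("mu ratio when AADT doubles:", doubled / base, "vs 2**b1 =", 2 ** beta[1])
```

Written this way, Models 1 and 2 are the special cases $\gamma_1 = 1$ and $\gamma_1 = -1$ of Model 3, which is one way to read why Model 3, with its extra coefficient, can fit at least as flexibly.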

Table 4. Modeling Results for the NB GLM using the Texas Data
Estimates Fixed dispersion Model 1 Model 2 Model 3 Ln( ) -6.384 (.412) -5.597 (.285) -6.752 (.35) -5.977 (.388) 1.983 (.43).915 (.33) 1.4 (.31).945 (.38) 2 -.55 (.17) -.71 (.14) -.43 (.14) -.61 (.14) 3 -.1 (.3) -.11 (.3) -.9 (.3) -.11 (.3) 4.67 (.12).95 (.14).62 (.1).78 (.13) 2.55 (.234) -- -- -- Ln( ) -- 1.485 (.81) 1.52 (.134) 1.144 (.99) 1 -- -- --.51 (.91) DIC 4784. 479.6 4988.9 4732.6

Figure 5 shows the frequency distribution of the varying shape parameter of the COM-Poisson distribution across observations for the Texas data. As with the Toronto data, the frequency distribution can be approximated by a normal- or lognormal-shaped distribution. The highest frequency (i.e., the mode) occurred in the 0.2-0.3 range, and the average shape parameter value was 0.313, which is slightly lower than the value found for the fixed shape parameter (0.419).

[Figure 5. Frequency Distribution of Shape Parameter of COM-Poisson (Mean = 0.313); x-axis: shape parameter, y-axis: frequency]

Figure 6 illustrates the distribution of the varying (inverse) dispersion parameter of the NB model across observations for the Texas data. This frequency distribution can be approximated by a skewed normal or lognormal distribution. The highest frequency (i.e., the mode) occurred in the 1-2 range, whereas the average dispersion parameter value was 2.45, which is very close to the fixed dispersion parameter (2.55).

[Figure 6. Frequency Distribution of Inverse Dispersion Parameter of NB (Mean = 2.45); x-axis: inverse dispersion parameter, y-axis: frequency]

Figure 7 compares the crash variance predicted by the COM-Poisson and NB models. Both models predict almost the same variance for a given mean. However, it should be noted that for higher crash means the NB model predicts a slightly higher variance than the COM-Poisson model. The Spearman's rank correlation coefficient between the variances was 0.99, which confirms a strong positive monotone association.

[Figure 7. Crash Variance versus Crash Mean for the Texas Data; series: COM, NB (Note: both axes use a logarithmic scale)]

Figure 8 presents the frequency distribution of crash variance predicted by the COM-Poisson and NB models. Unlike the Toronto data, the frequency of sites in each variance range is almost the same for both models.

[Figure 8. Frequency Distribution of Crash Variance for Texas Data; series: COM, NB; x-axis: variance, y-axis: frequency]

SUMMARY AND CONCLUSIONS

This paper has documented the differences in the estimation of crash variance using the COM-Poisson and NB models. The NB model is the most commonly used model for analyzing motor vehicle crashes. Recently, the COM-Poisson model was introduced for traffic crash data modeling. The COM-Poisson model introduces a covariate-dependent shape parameter that captures the dispersion in the data; thus, it has the capability to handle datasets that contain intermingled over- and under-dispersed counts. The objectives of this study were to investigate and compare the crash variances predicted by the COM-Poisson and NB models. To accomplish these objectives, several NB and COM-Poisson GLMs were developed using two datasets. The first dataset contained crash data collected at 4-legged signalized intersections in Toronto, Ont. The second dataset included data collected for rural 4-lane undivided highways in Texas.

The results of this study show that the trend of the crash variance predicted by the COM-Poisson GLM is similar to that predicted by the NB model. The Spearman's rank correlation coefficients between the crash variances predicted by the COM-Poisson and NB models are 0.999 and 0.99 for the Toronto and Texas data, respectively. This means that a site characterized by a large variance will essentially be identified as such whether the NB or the COM-Poisson model is used. This characteristic was found for both flow-only models and models with covariates. For the latter, the results may indicate that the variation observed for the variance is data specific (see 22) rather than attributable to the model specification, as suggested by Mitra and Washington (8). Further work is needed on this topic.
Constraining the parameters to be constant across observations may lead to inconsistent and biased estimates (29). To overcome or minimize this problem in count data models, Anastasopoulos and Mannering (30) suggested using random-parameter models. Further work is therefore needed to determine how the crash variances estimated by the NB and COM-Poisson models differ when a random-parameter model is used in conjunction with a varying shape parameter. Note that such codes are not yet available for the COM-Poisson model.

The next step is to examine how these slight differences in the variance observed for high means influence typical highway safety studies when a varying dispersion or shape parameter is used. Examples include evaluating the effects of interventions and identifying hazardous sites. The approach used by Geedipally and Lord (31) could be used for such an evaluation. An analysis with a varying shape parameter should also be conducted for an under-dispersed dataset.

REFERENCES

1. Conway, R.W., and W.L. Maxwell. A Queuing Model with State Dependent Service Rates. Journal of Industrial Engineering, Vol. 12, 1962, pp. 132-136.
2. Shmueli, G., T.P. Minka, J.B. Kadane, S. Borle, and P. Boatwright. A Useful Distribution for Fitting Discrete Data: Revival of the Conway-Maxwell-Poisson Distribution. Journal of the Royal Statistical Society, Part C, Vol. 54, 2005, pp. 127-142.
3. Miaou, S.-P., and D. Lord. Modeling Traffic Crash-Flow Relationships for Intersections: Dispersion Parameter, Functional Form, and Bayes Versus Empirical Bayes Methods. In Transportation Research Record: Journal of the Transportation Research Board, No. 1840, Transportation Research Board of the National Academies, Washington, D.C., 2003, pp. 31-40.
4. Geedipally, S.R., and D. Lord. Effects of the Varying Dispersion Parameter of Poisson-Gamma Models on the Estimation of Confidence Intervals of Crash Prediction Models. In Transportation Research Record: Journal of the Transportation Research Board, No. 2061, Transportation Research Board of the National Academies, Washington, D.C., 2008, pp. 46-54.
5. Lord, D., S.P. Washington, and J.N. Ivan. Poisson, Poisson-Gamma and Zero-Inflated Regression Models of Motor Vehicle Crashes: Balancing Statistical Fit and Theory. Accident Analysis & Prevention, Vol. 37, No. 1, 2005, pp. 35-46.
6. Hauer, E. Observational Before-After Studies in Road Safety: Estimating the Effect of Highway and Traffic Engineering Measures on Road Safety. Elsevier Science Ltd, Oxford, 1997.
7. Heydecker, B.G., and J. Wu. Identification of Sites for Road Accident Remedial Work by Bayesian Statistical Methods: An Example of Uncertain Inference. Advances in Engineering Software, Vol. 32, 2001, pp. 859-869.

8. Mitra, S., and S.P. Washington. On the Nature of Over-Dispersion in Motor Vehicle Crash Prediction Models. Accident Analysis & Prevention, Vol. 39, No. 3, 2007, pp. 459-468.
9. Hauer, E. Overdispersion in Modelling Accidents on Road Sections and in Empirical Bayes Estimation. Accident Analysis & Prevention, Vol. 33, No. 6, 2001, pp. 799-808.
10. Lord, D., and P.Y-J. Park. Investigating the Effects of the Fixed and Varying Dispersion Parameters of Poisson-Gamma Models on Empirical Bayes Estimates. Accident Analysis & Prevention, Vol. 40, No. 4, 2008, pp. 1441-1457.
11. El-Basyouny, K., and T. Sayed. Comparison of Two Negative Binomial Regression Techniques in Developing Accident Prediction Models. In Transportation Research Record: Journal of the Transportation Research Board, No. 1950, Transportation Research Board of the National Academies, Washington, D.C., 2006, pp. 9-16.
12. Poch, M., and F.L. Mannering. Negative Binomial Analysis of Intersection-Accident Frequencies. Journal of Transportation Engineering, ASCE, Vol. 122, No. 2, 1996, pp. 105-113.
13. Lord, D., and F. Mannering. The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives. Transportation Research Part A, Vol. 44, No. 5, 2010, pp. 291-305.
14. Lord, D. Modeling Motor Vehicle Crashes Using Poisson-Gamma Models: Examining the Effects of Low Sample Mean Values and Small Sample Size on the Estimation of the Fixed Dispersion Parameter. Accident Analysis & Prevention, Vol. 38, No. 4, 2006, pp. 751-766.
15. Cameron, A.C., and P.K. Trivedi. Regression Analysis of Count Data. Cambridge University Press, Cambridge, U.K., 1998.
16. Maher, M.J., and I. Summersgill. A Comprehensive Methodology for the Fitting of Predictive Accident Models. Accident Analysis & Prevention, Vol. 28, No. 3, 1996, pp. 281-296.
17. Lord, D., S.D. Guikema, and S. Geedipally. Application of the Conway-Maxwell-Poisson Generalized Linear Model for Analyzing Motor Vehicle Crashes. Accident Analysis & Prevention, Vol. 40, No. 3, 2008, pp. 1123-1134.
18. Lord, D., S.R. Geedipally, and S. Guikema. Extension of the Application of Conway-Maxwell-Poisson Models: Analyzing Traffic Crash Data Exhibiting Under-Dispersion. Risk Analysis, 2010, in press (http://dx.doi.org/10.1111/j.1539-6924.2010.01417.x).
19. Kadane, J.B., G. Shmueli, T.P. Minka, S. Borle, and P. Boatwright. Conjugate Analysis of the Conway-Maxwell-Poisson Distribution. Bayesian Analysis, Vol. 1, 2006, pp. 363-374.
20. Guikema, S.D., and J.P. Coffelt. A Flexible Count Data Regression Model for Risk Analysis. Risk Analysis, Vol. 28, 2008, pp. 213-223.

21. Sellers, K.F., and G. Shmueli. A Flexible Regression Model for Count Data. Annals of Applied Statistics, 2010, in press.
22. Geedipally, S.R., D. Lord, and B.-J. Park. Analyzing Different Parameterizations of the Varying Dispersion Parameter as a Function of Segment Length. Transportation Research Record: Journal of the Transportation Research Board, No. 2103, 2009, pp. 108-118.
23. Spiegelhalter, D.J., A. Thomas, N.G. Best, and D. Lunn. WinBUGS Version 1.4.1 User Manual. MRC Biostatistics Unit, Cambridge, 2003. Available from: <http://www.mrc-bsu.cam.ac.uk/bugs/welcome.shtml>.
24. Lord, D. The Prediction of Accidents on Digital Networks: Characteristics and Issues Related to the Application of Accident Prediction Models. Ph.D. Dissertation. Department of Civil Engineering, University of Toronto, Toronto, Ontario, 2000.
25. Miaou, S.-P., and J.J. Song. Bayesian Ranking of Sites for Engineering Safety Improvements: Decision Parameter, Treatability Concept, Statistical Criterion and Spatial Dependence. Accident Analysis & Prevention, Vol. 37, No. 4, 2005, pp. 699-720.
26. Miranda-Moreno, L.F., and L. Fu. Traffic Safety Study: Empirical Bayes or Full Bayes? Paper 07-1680. Presented at the 86th Annual Meeting of the Transportation Research Board, Washington, D.C., 2007.
27. Lord, D., S.R. Geedipally, B.N. Persaud, S.P. Washington, I. van Schalkwyk, J.N. Ivan, C. Lyon, and T. Jonsson. Methodology for Estimating the Safety Performance of Multilane Rural Highways. NCHRP Web-Only Document 126, National Cooperative Highway Research Program, Washington, D.C., 2008 (http://onlinepubs.trb.org/onlinepubs/nchrp/nchrp_w126.pdf, accessed on June 24, 2010).
28. Heydecker, B.G., and J. Wu. Identification of Sites for Road Accident Remedial Work by Bayesian Statistical Methods: An Example of Uncertain Inference. Advances in Engineering Software, Vol. 32, 2001, pp. 859-869.
29. Washington, S.P., M.G. Karlaftis, and F.L. Mannering. Statistical and Econometric Methods for Transportation Data Analysis. Second Edition, Chapman & Hall/CRC, Boca Raton, FL, 2010.
30. Anastasopoulos, P.C., and F.L. Mannering. A Note on Modeling Vehicle Accident Frequencies with Random-Parameters Count Models. Accident Analysis & Prevention, Vol. 41, No. 1, 2009, pp. 153-159.
31. Geedipally, S.R., and D. Lord. Hot Spot Identification by Modeling Single-Vehicle and Multi-Vehicle Crashes Separately. Transportation Research Record: Journal of the Transportation Research Board, No. 2147, 2010, pp. 97-103.